Voice-driven Instructions
Last updated: 2024-07-19 10:08:06
Request Parameters
Parameter | Type | Required | Description |
ReqId | String | Yes | Unique identifier for a single drive request. Each audio segment is assigned its own UUID. |
SessionId | String | Yes | Unique identifier for the session. |
Command | String | Yes | Set to SEND_AUDIO to send audio. |
Data | Object | Yes | Data object. Its fields are listed below. |
Data field | Type | Required | Description |
Audio | String | Yes | Raw audio bytes, Base64-encoded into a string. Only PCM audio with a 16 kHz sampling rate, 16-bit sample depth, and a single (mono) channel is supported. |
Seq | int | Yes | Audio packet sequence number. Must start from 1. |
IsFinal | bool | No | Whether this is the final packet. Defaults to false. |
Note:
1. When streaming in real time from a microphone, send one packet for every 160 ms of captured audio (5,120 bytes) with no additional wait between packets. When sending from an offline audio file, use 160 ms (5,120-byte) packets and wait 120 ms between packets.
2. The last data packet is sized to the audio data actually remaining (it must be less than 160 ms).
3. After all data packets have been sent, send one more packet with IsFinal=true and the Audio field left empty. This signals the end of the audio and returns the Digital Human to a silent state.
4. The real-time rate of sending audio, that is, the sending interval divided by the duration of audio in each packet, must stay within [0.75, 1]. A rate below 0.75 triggers throttling, and a rate above 1 causes video stuttering. For example, with 160 ms audio packets the sending interval must be no less than 120 ms and no more than 160 ms.
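The packet arithmetic in the notes above can be sketched in Python as follows. This is a minimal illustration, not an official client: the names `iter_packets` and `interval_is_allowed` are hypothetical, and the choice to give the closing empty packet the next Seq value is an assumption that the notes do not state explicitly.

```python
from typing import Iterator, Tuple

SAMPLE_RATE_HZ = 16_000   # 16 kHz, per the Audio field description
BYTES_PER_SAMPLE = 2      # 16-bit samples
CHANNELS = 1              # mono
PACKET_MS = 160           # packet duration from Note 1

# 16000 samples/s * 2 bytes * 1 channel * 0.160 s = 5120 bytes
PACKET_BYTES = SAMPLE_RATE_HZ * BYTES_PER_SAMPLE * CHANNELS * PACKET_MS // 1000


def iter_packets(pcm: bytes) -> Iterator[Tuple[int, bytes, bool]]:
    """Yield (seq, audio_bytes, is_final) tuples for one audio segment."""
    seq = 0
    for offset in range(0, len(pcm), PACKET_BYTES):
        seq += 1  # Seq starts from 1
        # The last slice simply holds whatever data remains (Note 2).
        yield seq, pcm[offset:offset + PACKET_BYTES], False
    # Note 3: close the segment with an empty packet flagged as final.
    # Assumption: the closing packet continues the Seq numbering.
    yield seq + 1, b"", True


def interval_is_allowed(interval_ms: float, packet_ms: float = PACKET_MS) -> bool:
    """Note 4: interval divided by packet duration must lie within [0.75, 1]."""
    return 0.75 <= interval_ms / packet_ms <= 1.0
```

A sender would iterate over `iter_packets(pcm)` and pause between packets for an interval that `interval_is_allowed` accepts, for example 120 ms when reading from a file.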
Request Sample
{
    "Header": {},
    "Payload": {
        "ReqId": "d7aa08da33dd4a662ad5be508c5b77cf",
        "SessionId": "m123adfafvbadsafd",
        "Command": "SEND_AUDIO",
        "Data": {
            "Audio": "The value of the audio binary data encoded in Base64",
            "Seq": 1,
            "IsFinal": false
        }
    }
}
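As a rough illustration of how a message of this shape might be assembled, the Python sketch below builds the JSON from raw PCM bytes. The helper name `build_send_audio` is hypothetical, and reusing one ReqId for every packet of the same audio segment is an assumption drawn from the ReqId description, not a confirmed requirement. How the message is transported to the service is outside the scope of this sketch.

```python
import base64
import json
import uuid


def build_send_audio(session_id: str, req_id: str, seq: int,
                     audio: bytes, is_final: bool = False) -> str:
    """Serialize one SEND_AUDIO message in the shape of the request sample."""
    message = {
        "Header": {},
        "Payload": {
            "ReqId": req_id,
            "SessionId": session_id,
            "Command": "SEND_AUDIO",
            "Data": {
                # Raw PCM bytes are Base64-encoded into a string.
                "Audio": base64.b64encode(audio).decode("ascii"),
                "Seq": seq,
                "IsFinal": is_final,
            },
        },
    }
    return json.dumps(message)


# Assumption: one ReqId per audio segment, shared by all of its packets.
# uuid4().hex yields 32 hex characters, matching the sample's ReqId format.
req_id = uuid.uuid4().hex
first = build_send_audio("m123adfafvbadsafd", req_id, 1, b"\x00\x01" * 2560)
last = build_send_audio("m123adfafvbadsafd", req_id, 2, b"", is_final=True)
```

Here `first` carries one full 5,120-byte packet, and `last` is the empty closing packet with IsFinal set to true.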