Request parameters
Parameter Name | Type | Required | Description |
ReqId | string | Yes | Unique identifier for each request, a 32-character UUID |
StreamId | string | Yes | Conversation ID, used to distinguish multiple rounds of conversation, obtained through the Creating a Long Connection Channel API |
VirtualmanProjectId | string | Yes | Digital Human Project ID, available in the digital human project |
InputText | string | No | Request text content. Cannot be empty when DriverType is TEXT. |
SpeechParam | SpeechParam | No | Define the detailed parameters of the output audio. |
DriverType | string | Yes | Driver type. 1. TEXT: text-driven; 2. CHAT: text dialogue-driven; 3. STREAM_TEXT: streaming text-driven. |
ChatCommand | string | No | Dialogue command; the default value is CHATTING. 1. CHATTING: dialogue; 2. START_CHAT: start a conversation; 3. STOP_CHAT: end the conversation. |
InputTextType | string | No | The type of InputText; the default is MARKDOWN. 1. MARKDOWN: Markdown format, which includes plain text and supports streaming; 2. SSML: standard SSML format, which does not support streaming. |
Seq | int | No | Streaming Text Fragment ID |
IsFinal | bool | No | End marker of streaming text fragments, which should be passed in for each streaming text segment. |
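
The constraints in the table above can be checked on the client before a request is sent. The following Python sketch is one possible way to do that, not an official SDK: the function name `build_payload` and its argument names are hypothetical, and the returned dictionary corresponds to the `Payload` object of the request example.

```python
import uuid

DRIVER_TYPES = {"TEXT", "CHAT", "STREAM_TEXT"}
INPUT_TEXT_TYPES = {"MARKDOWN", "SSML"}


def build_payload(stream_id, project_id, driver_type, input_text="",
                  input_text_type="MARKDOWN"):
    """Assemble a request Payload dict (hypothetical helper)."""
    if driver_type not in DRIVER_TYPES:
        raise ValueError(f"unknown DriverType: {driver_type}")
    if input_text_type not in INPUT_TEXT_TYPES:
        raise ValueError(f"unknown InputTextType: {input_text_type}")
    # InputText cannot be empty when DriverType is TEXT.
    if driver_type == "TEXT" and not input_text:
        raise ValueError("InputText cannot be empty when DriverType is TEXT")
    # SSML input does not support streaming.
    if driver_type == "STREAM_TEXT" and input_text_type == "SSML":
        raise ValueError("SSML input does not support streaming")
    return {
        "ReqId": uuid.uuid4().hex,  # 32 hexadecimal characters
        "StreamId": stream_id,
        "VirtualmanProjectId": project_id,
        "DriverType": driver_type,
        "InputText": input_text,
        "InputTextType": input_text_type,
    }
```

Rejecting invalid combinations locally is a design choice of this sketch; the service may report the same problems through ErrorCode and ErrorMessage.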

SpeechParam
Parameter Name | Type | Required | Description |
TimbreKey | string | No | Timbre value; defaults to the timbre configured in the digital human project. The list of available timbres can be obtained through Querying Timbre Lists by Pagination. |
Speed | float | No | Speech speed in the range [0.5, 2.0], where 1.0 is normal speed, 0.5 is the slowest, and 2.0 is the fastest. If not specified, the speech speed configured in the digital human project is used. |
Volume | int | No | Volume level in the range [-10, 10]. The default is 0, which represents normal volume. The higher the value, the louder the volume. |
EmotionCategory | string | No | Controls the emotion of the synthesized audio, supported only for multi-emotion timbres. See Querying Timbre Lists by Pagination for available values. |
EmotionIntensity | int | No | Controls the intensity of the synthesized audio emotion, with a range of [50,200]. This is only effective when EmotionCategory is not empty. |
SmartActionEnabled | bool | No | Whether to enable intelligent actions. Disabled by default. |
SubtitleType | int | No | Subtitle return mode. 0: by character (default); 1: by word. |
TimbreLanguage | string | No | Timbre language, see the Personal Asset Management API Paginated Query Timbre List for available languages. For multi-language timbres, the corresponding language must be selected during synthesis. |
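
The numeric ranges documented for SpeechParam lend themselves to a small validation helper. The sketch below is illustrative only: `build_speech_param` and its argument names are hypothetical, and only the fields with documented ranges are covered. Fields left as `None` are omitted so that the project defaults apply.

```python
def build_speech_param(timbre_key=None, speed=None, volume=None,
                       emotion_category=None, emotion_intensity=None):
    """Build a SpeechParam dict, checking the documented value ranges."""
    param = {}
    if timbre_key is not None:
        param["TimbreKey"] = timbre_key
    if speed is not None:
        if not 0.5 <= speed <= 2.0:
            raise ValueError("Speed must be within [0.5, 2.0]")
        param["Speed"] = float(speed)
    if volume is not None:
        if not -10 <= volume <= 10:
            raise ValueError("Volume must be within [-10, 10]")
        param["Volume"] = int(volume)
    if emotion_category:
        param["EmotionCategory"] = emotion_category
    if emotion_intensity is not None:
        # EmotionIntensity only takes effect when EmotionCategory is set;
        # this sketch chooses to reject it rather than silently drop it.
        if not emotion_category:
            raise ValueError("EmotionIntensity requires EmotionCategory")
        if not 50 <= emotion_intensity <= 200:
            raise ValueError("EmotionIntensity must be within [50, 200]")
        param["EmotionIntensity"] = int(emotion_intensity)
    return param
```

For example, `build_speech_param(speed=1.5, volume=3)` yields a dictionary containing only `Speed` and `Volume`.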

Response parameters
Parameter Name | Type | Required | Description |
ReqId | string | Yes | Single request ID, consistent with the request parameter |
StreamId | string | Yes | Conversation ID, used to distinguish multiple rounds of conversation, consistent with the request parameter |
DriverRspType | string | Yes | Response type. 1. REPLY: returns ReplyRsp, corresponding to session information; 2. SPEECH: returns SpeechRsp, corresponding to audio content. |
ReplyRsp | ReplyRsp | No | Session response, returned when DriverRspType is REPLY |
SpeechRsp | SpeechRsp | No | Audio content, returned when DriverRspType is SPEECH |
ErrorCode | int | Yes | Error Code |
ErrorMessage | string | Yes | Error message. |
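
Since DriverRspType decides which of ReplyRsp and SpeechRsp carries the data, a client typically branches on it. The following Python sketch shows one way to do so. The function name and callback arguments are hypothetical, the `Payload` envelope is taken from the response examples, and treating a non-zero ErrorCode as a failure is an assumption based on those examples showing `"ErrorCode": 0` for normal responses.

```python
import json


def handle_message(raw, on_reply, on_speech):
    """Route one response message to the matching callback."""
    payload = json.loads(raw)["Payload"]
    # Assumption: ErrorCode 0 means success, as in the response examples.
    if payload["ErrorCode"] != 0:
        raise RuntimeError(
            f"error {payload['ErrorCode']}: {payload['ErrorMessage']}")
    rsp_type = payload["DriverRspType"]
    if rsp_type == "REPLY":
        return on_reply(payload["ReplyRsp"])
    if rsp_type == "SPEECH":
        return on_speech(payload["SpeechRsp"])
    raise ValueError(f"unexpected DriverRspType: {rsp_type}")
```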

ReplyRsp
Parameter Name | Type | Required | Description |
ReplyType | string | Yes | The reply message type. 1. cloudAiGpt: Tencent Cloud large model dialogue 2. yunxiaowei: Tencent Cloud Xiaowei customer service dialogue 3. cloudAiWaiting: Script for waiting for the first package due to timeout. 4. cloudAiTimeOut: Script for timeout without response and the session ends. 5. sensitive: Fixed script returned when the input text or reply contains sensitive content. 6. input: Content of InputText when it is plain text or streaming text. 7. enhanceText: When dialogue service is not configured, matches the content in script management. |
ReplyPro | string | No | Broadcast content, including SSML tags |
ReplyDisplay | string | Yes | Display content, including rich text tags |
InteractionType | string | No | Special message type |
InteractionContent | string | No | Special message content, used to deliver special messages such as pop-ups, images, and other non-text content. |
Uninterrupt | bool | Yes | Whether the current broadcast can be interrupted |
Muted | bool | Yes | Whether audio recording is turned off during the current broadcast |
SeqNo | int | Yes | Clause number. When ReplyType is cloudAiGpt, the sequence number of normal replies starts from 1; other fixed phrases start from 0. |
ContentType | int | Yes | The reply message content type. 0: unknown; 1: ordinary string; 2: ordered list; 3: unordered list; 4: image link; 5: HTTP link; 6: table; 8: title; 9: SSML. |
TtsSupport | bool | Yes | Whether the current clause is broadcast |
IsFinal | bool | Yes | Whether this is the last sentence |
IsHighLight | bool | Yes | Whether highlighted display is needed |

SpeechRsp
Parameter Name | Type | Required | Description |
Audio | string | Yes | Base64-encoded PCM audio data |
ThDim | int | Yes | Mouth shape dimension |
ThFeat | Array of float | Yes | Mouth shape data |
Phn | Array of [PhnInfo] | Yes | Phoneme information |
Word | Array of [WordInfo] | Yes | Word segmentation information |
Final | bool | Yes | End of sentence marker |
SentenceFinal | bool | Yes | End of Streaming Clause marker |
Sampling | int | Yes | Sample rate |
Action | Array of [Action] | Yes | Action information |
Subtitle | Array of [SubtitleInfo] | Yes | Subtitles information |
RealThType | string | Yes | Mouth shape parameter |
Expression | Array of [Expression] | Yes | Emoji information |
SeqNo | int | Yes | Serial number of a clause. |
SentenceStart | bool | Yes | Start of a clause |
ThFeatFinal | bool | Yes | End of mouth shape marker |
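
The Audio field has to be Base64-decoded before it can be played or stored. The Python sketch below decodes it and estimates the clip duration from Sampling. The table does not state the PCM sample width or channel count, so the 16-bit mono layout (`bytes_per_sample=2`) is an assumption, and the function name is hypothetical.

```python
import base64


def decode_audio(speech_rsp, bytes_per_sample=2):
    """Return (raw PCM bytes, estimated duration in ms or None).

    Assumes mono PCM with `bytes_per_sample` bytes per sample; the
    sample width is not documented in the table above.
    """
    pcm = base64.b64decode(speech_rsp["Audio"])
    sampling = speech_rsp["Sampling"]
    if sampling <= 0:
        # Sampling is 0 in responses that carry no audio.
        return pcm, None
    samples = len(pcm) // bytes_per_sample
    return pcm, samples * 1000 / sampling
```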

PhnInfo
Parameter Name | Type | Required | Description |
Phn | string | Yes | phoneme |
Start | string | Yes | Start time in units of 0.1 microseconds; divide the value by 10,000 to get milliseconds |
End | string | Yes | End time in units of 0.1 microseconds; divide the value by 10,000 to get milliseconds |
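
Timestamps are delivered as strings in units of 0.1 microseconds, so they need converting before use. The following Python sketch applies the divide-by-10,000 rule from the table; the helper names are hypothetical.

```python
TICKS_PER_MS = 10_000  # 10,000 units of 0.1 microseconds = 1 millisecond


def ticks_to_ms(ticks):
    """Convert a timestamp string in 0.1-microsecond units to milliseconds."""
    return int(ticks) / TICKS_PER_MS


def phoneme_timeline(phn_list):
    """Return (phoneme, start_ms, duration_ms) for each PhnInfo entry."""
    timeline = []
    for item in phn_list:
        start = ticks_to_ms(item["Start"])
        end = ticks_to_ms(item["End"])
        timeline.append((item["Phn"], start, end - start))
    return timeline
```

Applied to the first two phonemes of the response example, `sil0` spans 0 to 20 ms and `z4` spans 20 to 110 ms.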

WordInfo
Parameter Name | Type | Required | Description |
Phn | string | Yes | phoneme |
Word | string | Yes | Corresponding word |

Action
Parameter Name | Type | Required | Description |
Pos | string | Yes | Action Name |
Start | string | Yes | Start time point in units of 0.1 microseconds; divide the value by 10,000 to get milliseconds |

SubtitleInfo
Parameter Name | Type | Required | Description |
Word | string | Yes | Corresponding word |
Start | string | Yes | Start time point in units of 0.1 microseconds; divide the value by 10,000 to get milliseconds |
End | string | Yes | End time point in units of 0.1 microseconds; divide the value by 10,000 to get milliseconds |
PosStart | string | Yes | Starting Unicode position in the text. The range is left-closed and right-open: [PosStart, PosEnd) |
PosEnd | string | Yes | Ending Unicode position in the text. The range is left-closed and right-open: [PosStart, PosEnd) |
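
The left-closed, right-open range [PosStart, PosEnd) maps directly onto Python slicing, which indexes strings by Unicode code point. The sketch below extracts the subtitle text for an entry and looks up the entry active at a given playback time. The helper names are hypothetical, and treating the time window as half-open as well is an assumption of this sketch.

```python
def subtitle_span(text, subtitle):
    """Return text[PosStart:PosEnd], a half-open range like the API's."""
    return text[int(subtitle["PosStart"]):int(subtitle["PosEnd"])]


def active_subtitle(subtitles, time_ms):
    """Return the SubtitleInfo entry covering time_ms, or None."""
    for item in subtitles:
        start_ms = int(item["Start"]) / 10_000
        end_ms = int(item["End"]) / 10_000
        # Assumption: Start is inclusive and End is exclusive.
        if start_ms <= time_ms < end_ms:
            return item
    return None
```

For instance, with the text `"abcdef"` and an entry whose PosStart is `"2"` and PosEnd is `"4"`, `subtitle_span` returns `"cd"`.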

Expression
Parameter Name | Type | Required | Description |
Name | string | Yes | Emoji name |
Start | string | Yes | Start time point in units of 0.1 microseconds; divide the value by 10,000 to get milliseconds |
End | string | Yes | End time point in units of 0.1 microseconds; divide the value by 10,000 to get milliseconds |
Loc | string | Yes | Unicode location of emoji in text |
Flag | string | Yes | B: this text contains the start of an emoji; I: this text is part of the middle of an emoji; E: this text contains the end of an emoji; S: this text contains both the start and the end of an emoji |

Request example
{
  "Header": {},
  "Payload": {
    "VirtualmanProjectId": "253b2a182d694a60bed82635b18025a2",
    "InputText": "In the AI industry, which domains have better foundational conditions for AI development?",
    "ReqId": "d7aa08da33dd4a662ad5be508c5b77cf",
    "StreamId": "92597c35-3a99-415e-9bae-3124771b7749",
    "DriverType": "TEXT",
    "SpeechParam": {
      "TimbreKey": ""
    }
  }
}

Response example
// DriverRspType is REPLY
{
  "Header": {
    "RequestID": "fe0e4c13f2a34cb69b2475d8483f28de",
    "SessionID": "gza802cc9317231084402578413",
    "DialogID": "",
    "Code": 0,
    "Message": ""
  },
  "Payload": {
    "DriverRspType": "REPLY",
    "ErrorCode": 0,
    "ErrorMessage": "",
    "ReplyRsp": {
      "ContentType": 1,
      "InteractionContent": "",
      "InteractionType": "",
      "IsFinal": true,
      "IsHighLight": true,
      "Muted": false,
      "ReplyDisplay": "Which domains have a foundation for AI development",
      "ReplyPro": "<speak>Which domains show superior conditions for AI development?</speak>",
      "ReplyType": "input",
      "SeqNo": 2,
      "TtsSupport": true,
      "Uninterrupt": false
    },
    "ReqId": "fe0e4c13f2a34cb69b2475d8483f28de",
    "SpeechRsp": {
      "Action": [],
      "Audio": "",
      "Expression": [],
      "Final": false,
      "Phn": [],
      "RealThType": "",
      "Sampling": 0,
      "SentenceFinal": false,
      "SentenceStart": false,
      "SeqNo": 0,
      "Subtitle": [],
      "ThDim": 0,
      "ThFeat": [],
      "ThFeatFinal": false,
      "Word": []
    },
    "StreamId": "92597c35-3a99-415e-9bae-3124771b7749"
  }
}
// DriverRspType is SPEECH
{
  "Header": {
    "RequestID": "fe0e4c13f2a34cb69b2475d8483f28de",
    "SessionID": "gza802cc9317231084402578413",
    "DialogID": "",
    "Code": 0,
    "Message": ""
  },
  "Payload": {
    "DriverRspType": "SPEECH",
    "ErrorCode": 0,
    "ErrorMessage": "",
    "ReplyRsp": {
      "ContentType": 0,
      "InteractionContent": "",
      "InteractionType": "",
      "IsFinal": false,
      "IsHighLight": false,
      "Muted": false,
      "ReplyDisplay": "",
      "ReplyPro": "",
      "ReplyType": "",
      "SeqNo": 0,
      "TtsSupport": false,
      "Uninterrupt": false
    },
    "ReqId": "fe0e4c13f2a34cb69b2475d8483f28de",
    "SpeechRsp": {
      "Action": [],
      "Audio": "", // Content too long, not displayed
      "Expression": [],
      "Final": false,
      "Phn": [
        {"End": "200000", "Phn": "sil0", "Start": "0"},
        {"End": "1100000", "Phn": "z4", "Start": "200000"},
        {"End": "2700000", "Phn": "ai4", "Start": "1100000"},
        {"End": "3800000", "Phn": "r2", "Start": "2700000"},
        {"End": "5100000", "Phn": "en2", "Start": "3800000"},
        {"End": "5800000", "Phn": "g1", "Start": "5100000"},
        {"End": "7300000", "Phn": "ong1", "Start": "5800000"},
        {"End": "8100000", "Phn": "zh4", "Start": "7300000"},
        {"End": "9000000", "Phn": "iii4", "Start": "8100000"},
        {"End": "9800000", "Phn": "n2", "Start": "9000000"},
        {"End": "11300000", "Phn": "eng2", "Start": "9800000"},
        {"End": "12800000", "Phn": "ch3", "Start": "11300000"},
        {"End": "14000000", "Phn": "an3", "Start": "12800000"},
        {"End": "16000000", "Phn": "ie4", "Start": "14000000"},
        {"End": "17100000", "Phn": "zh1", "Start": "16000000"},
        {"End": "19200000", "Phn": "ong1", "Start": "17100000"},
        {"End": "24200000", "Phn": "sil0", "Start": "19200000"}
      ],
      "RealThType": "3D_standard",
      "Sampling": 24000,
      "SentenceFinal": false,
      "SentenceStart": true,
      "SeqNo": 1,
      "Subtitle": [
        {"End": "2700000", "PosEnd": "1", "PosStart": "0", "Start": "200000", "Word": "at"},
        {"End": "5100000", "PosEnd": "2", "PosStart": "1", "Start": "2700000", "Word": "user"},
        {"End": "7300000", "PosEnd": "3", "PosStart": "2", "Start": "5100000", "Word": "work"},
        {"End": "9000000", "PosEnd": "4", "PosStart": "3", "Start": "7300000", "Word": "intelligent"},
        {"End": "11300000", "PosEnd": "5", "PosStart": "4", "Start": "9000000", "Word": "can"},
        {"End": "14000000", "PosEnd": "6", "PosStart": "5", "Start": "11300000", "Word": "product"},
        {"End": "16000000", "PosEnd": "7", "PosStart": "6", "Start": "14000000", "Word": "industry"},
        {"End": "19200000", "PosEnd": "9", "PosStart": "7", "Start": "16000000", "Word": "in,"}
      ],
      "ThDim": 52,
      "ThFeat": [], // Content too long, not displayed
      "ThFeatFinal": false,
      "Word": [
        {"Phn": "z-ai4", "Word": "at"},
        {"Phn": "r-en2|g-ng1", "Word": "manual"},
        {"Phn": "zh-iii4|n-eng2", "Word": "intelligent"},
        {"Phn": "ch-an3|ie4", "Word": "industry"},
        {"Phn": "zh-ong1", "Word": "in"}
      ]
    },
    "StreamId": "92597c35-3a99-415e-9bae-3124771b7749"
  }
}