Tencent Cloud AI Digital Human

Endpoint Rendering Driver API

Last updated: 2025-01-17 16:20:52

After you create a long connection channel (see Creating a Long Connection Channel), you can send text over the WebSocket persistent connection to obtain driving data.

Request Parameters

| Parameter Name | Type | Required | Description |
| --- | --- | --- | --- |
| ReqId | string | Yes | Unique identifier for each request, a 32-character UUID. |
| StreamId | string | Yes | Conversation ID, used to distinguish multiple rounds of conversation. Obtained through the Creating a Long Connection Channel API. |
| VirtualmanProjectId | string | Yes | Digital human project ID, available in the digital human project. |
| InputText | string | No | Request text content. Cannot be empty when DriverType is TEXT. |
| SpeechParam | SpeechParam | No | Detailed parameters of the output audio. |
| DriverType | string | Yes | Drive type. 1. TEXT: text-driven; 2. CHAT: text dialogue-driven; 3. STREAM_TEXT: streaming text-driven. |
| ChatCommand | string | No | Dialogue command, default CHATTING. 1. CHATTING: dialogue; 2. START_CHAT: start a conversation; 3. STOP_CHAT: end the conversation. |
| InputTextType | string | No | Type of InputText, default MARKDOWN. 1. MARKDOWN: Markdown format, includes plain text, supports streaming; 2. SSML: SSML standard format, does not support streaming. |
| Seq | int | No | Streaming text fragment ID. |
| IsFinal | bool | No | End marker of streaming text fragments. Pass it with every streaming text segment. |
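
As an illustration of the required fields, the following Python sketch assembles a TEXT-driven request. The `Header`/`Payload` envelope is taken from the Request Sample in this document; the function name `build_text_request` is hypothetical, and generating `ReqId` with `uuid.uuid4().hex` is only one way to obtain a 32-character identifier.

```python
import json
import uuid


def build_text_request(stream_id: str, project_id: str, text: str) -> str:
    """Serialize a TEXT-driven request in the Header/Payload envelope."""
    if not text:
        # InputText cannot be empty when DriverType is TEXT.
        raise ValueError("InputText must be non-empty for DriverType TEXT")
    payload = {
        "ReqId": uuid.uuid4().hex,  # 32 hexadecimal characters, no hyphens
        "StreamId": stream_id,      # returned when the channel was created
        "VirtualmanProjectId": project_id,
        "InputText": text,
        "DriverType": "TEXT",
    }
    # ensure_ascii=False keeps non-ASCII input text readable on the wire.
    return json.dumps({"Header": {}, "Payload": payload}, ensure_ascii=False)
```

The returned string would then be sent as a text frame on the WebSocket connection opened for the channel.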

SpeechParam

| Parameter Name | Type | Required | Description |
| --- | --- | --- | --- |
| TimbreKey | string | No | Timbre value. Defaults to the timbre configured in the digital human project. The list of available timbres can be obtained through Querying Timbre Lists by Pagination. |
| Speed | float | No | Speech speed, in the range [0.5, 2.0]. 1.0 is normal speed, 0.5 is the slowest, and 2.0 is the fastest. If not specified, the speech speed configured in the digital human project is used. |
| Volume | int | No | Volume level, in the range [-10, 10]. The default is 0, which represents normal volume. Higher values are louder. |
| EmotionCategory | string | No | Controls the emotion of the synthesized audio. Supported only for multi-emotion timbres. See Querying Timbre Lists by Pagination for available values. |
| EmotionIntensity | int | No | Controls the intensity of the synthesized audio emotion, in the range [50, 200]. Effective only when EmotionCategory is not empty. |
| SmartActionEnabled | bool | No | Whether to enable intelligent actions. Disabled by default. |
| SubtitleType | int | No | Subtitle return mode. Default is by character. 0: by character; 1: by word. |
| TimbreLanguage | string | No | Timbre language. See Querying Timbre Lists by Pagination (Personal Asset Management API) for available languages. For multi-language timbres, the corresponding language must be selected during synthesis. |
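
The ranges listed above can be checked on the client before a request is sent. The following is a minimal sketch; the function name `validate_speech_param` is hypothetical and the check covers only the limits stated in this table, not any server-side validation.

```python
def validate_speech_param(param: dict) -> list[str]:
    """Return a list of range violations; an empty list means the values fit."""
    problems = []
    speed = param.get("Speed")
    if speed is not None and not 0.5 <= speed <= 2.0:
        problems.append("Speed must be within [0.5, 2.0]")
    volume = param.get("Volume")
    if volume is not None and not -10 <= volume <= 10:
        problems.append("Volume must be within [-10, 10]")
    intensity = param.get("EmotionIntensity")
    if intensity is not None:
        if not 50 <= intensity <= 200:
            problems.append("EmotionIntensity must be within [50, 200]")
        if not param.get("EmotionCategory"):
            # The table states that intensity only applies with a category.
            problems.append("EmotionIntensity has no effect without EmotionCategory")
    subtitle_type = param.get("SubtitleType")
    if subtitle_type is not None and subtitle_type not in (0, 1):
        problems.append("SubtitleType must be 0 (by character) or 1 (by word)")
    return problems
```

For example, `validate_speech_param({"Speed": 3.0})` reports the speed violation, while an empty dictionary passes because every field is optional.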

Persistent Connection Downstream Message

| Parameter Name | Type | Required | Description |
| --- | --- | --- | --- |
| ReqId | string | Yes | Single request ID, consistent with the request parameter. |
| StreamId | string | Yes | Conversation ID, used to distinguish multiple rounds of conversation. Consistent with the request parameter. |
| DriverRspType | string | Yes | Response type. 1. REPLY: returns ReplyRsp, corresponding to session information; 2. SPEECH: returns SpeechRsp, corresponding to audio content. |
| ReplyRsp | ReplyRsp | No | Session response, returned when DriverRspType is REPLY. |
| SpeechRsp | SpeechRsp | No | Audio content, returned when DriverRspType is SPEECH. |
| ErrorCode | int | Yes | Error code. |
| ErrorMessage | string | Yes | Error message. |
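
A client typically branches on DriverRspType to pick the populated structure. The sketch below assumes the `Header`/`Payload` envelope shown in the Response Sample and treats an ErrorCode of 0 as success, which matches the samples but is not stated explicitly in this document. The function name `handle_downstream` is hypothetical.

```python
import json


def handle_downstream(raw: str) -> tuple[str, dict]:
    """Parse one downstream message and return (response type, body)."""
    payload = json.loads(raw)["Payload"]
    if payload["ErrorCode"] != 0:
        # Assumption: a non-zero ErrorCode signals a failed request.
        raise RuntimeError(f"{payload['ErrorCode']}: {payload['ErrorMessage']}")
    kind = payload["DriverRspType"]
    if kind == "REPLY":
        return kind, payload["ReplyRsp"]    # session information
    if kind == "SPEECH":
        return kind, payload["SpeechRsp"]   # audio and driving data
    raise ValueError(f"unexpected DriverRspType: {kind!r}")
```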

ReplyRsp

| Parameter Name | Type | Required | Description |
| --- | --- | --- | --- |
| ReplyType | string | Yes | Reply message type. 1. cloudAiGpt: Tencent Cloud large model dialogue; 2. yunxiaowei: Tencent Cloud Xiaowei customer service dialogue; 3. cloudAiWaiting: waiting script played when the first packet times out; 4. cloudAiTimeOut: script played when no response arrives before the timeout and the session ends; 5. sensitive: fixed script returned when the input text or the reply contains sensitive content; 6. input: content of InputText when it is plain text or streaming text; 7. enhanceText: content matched in script management when no dialogue service is configured. |
| ReplyPro | string | No | Broadcast content, including SSML tags. |
| ReplyDisplay | string | Yes | Display content, including rich text tags. |
| InteractionType | string | No | Special message type. |
| InteractionContent | string | No | Special message content, used to deliver non-text content such as pop-ups and images. |
| Uninterrupt | bool | Yes | Whether the current broadcast can be interrupted. |
| Muted | bool | Yes | Whether audio recording is turned off during the current broadcast. |
| SeqNo | int | Yes | Clause number. When ReplyType is cloudAiGpt, the sequence number of a normal reply starts from 1; other fixed phrases start from 0. |
| ContentType | int | Yes | Reply message content type. 0: unknown; 1: ordinary string; 2: ordered list; 3: unordered list; 4: image link; 5: HTTP link; 6: table; 8: title; 9: SSML. |
| TtsSupport | bool | Yes | Whether the current clause is broadcast. |
| IsFinal | bool | Yes | Whether this is the last sentence. |
| IsHighLight | bool | Yes | Whether highlighted display is needed. |
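
Because a reply can arrive as several clauses, a client may want to reassemble the display text. The following sketch orders clauses by SeqNo and reports whether the final clause has been seen. Joining clauses without a separator and the function name `assemble_display_text` are assumptions for illustration, not behavior documented here.

```python
def assemble_display_text(replies: list[dict]) -> tuple[str, bool]:
    """Join ReplyDisplay values in SeqNo order and report completeness."""
    ordered = sorted(replies, key=lambda reply: reply["SeqNo"])
    # Assumption: clauses are concatenated as-is, without a separator.
    text = "".join(reply["ReplyDisplay"] for reply in ordered)
    complete = any(reply["IsFinal"] for reply in ordered)
    return text, complete
```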

SpeechRsp

| Parameter Name | Type | Required | Description |
| --- | --- | --- | --- |
| Audio | string | Yes | Base64-encoded PCM audio data. |
| ThDim | int | Yes | Mouth shape dimension. |
| ThFeat | Array of float | Yes | Mouth shape data. |
| Phn | Array of [PhnInfo] | Yes | Phoneme information. |
| Word | Array of [WordInfo] | Yes | Word segmentation information. |
| Final | bool | Yes | End-of-sentence marker. |
| SentenceFinal | bool | Yes | End marker of a streaming clause. |
| Sampling | int | Yes | Sample rate. |
| Action | Array of [Action] | Yes | Action information. |
| Subtitle | Array of [SubtitleInfo] | Yes | Subtitle information. |
| RealThType | string | Yes | Mouth shape parameter. |
| Expression | Array of [Expression] | Yes | Expression information. |
| SeqNo | int | Yes | Serial number of a clause. |
| SentenceStart | bool | Yes | Start-of-clause marker. |
| ThFeatFinal | bool | Yes | End marker of mouth shape data. |
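
To play or measure the audio, the Audio field first has to be base64-decoded. The sketch below computes the duration of one SpeechRsp chunk from Audio and Sampling. It assumes 16-bit mono PCM, which this document does not state, so both values are exposed as parameters; the function name `pcm_duration_ms` is hypothetical.

```python
import base64


def pcm_duration_ms(speech_rsp: dict, bytes_per_sample: int = 2, channels: int = 1) -> float:
    """Duration of the decoded PCM chunk in milliseconds."""
    pcm = base64.b64decode(speech_rsp["Audio"])
    rate = speech_rsp["Sampling"]
    if rate <= 0:
        # REPLY messages carry an empty SpeechRsp with Sampling set to 0.
        return 0.0
    # Assumption: 16-bit mono samples unless the caller says otherwise.
    frames = len(pcm) / (bytes_per_sample * channels)
    return frames * 1000.0 / rate
```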

PhnInfo

| Parameter Name | Type | Required | Description |
| --- | --- | --- | --- |
| Phn | string | Yes | Phoneme. |
| Start | string | Yes | Start time, in units of 0.1 microseconds. Divide the value by 10,000 to get milliseconds. |
| End | string | Yes | End time, in units of 0.1 microseconds. Divide the value by 10,000 to get milliseconds. |

WordInfo

| Parameter Name | Type | Required | Description |
| --- | --- | --- | --- |
| Phn | string | Yes | Phoneme. |
| Word | string | Yes | Corresponding word. |

Action

| Parameter Name | Type | Required | Description |
| --- | --- | --- | --- |
| Pos | string | Yes | Action name. |
| Start | string | Yes | Start time point, in units of 0.1 microseconds. Divide the value by 10,000 to get milliseconds. |

SubtitleInfo

| Parameter Name | Type | Required | Description |
| --- | --- | --- | --- |
| Word | string | Yes | Corresponding word. |
| Start | string | Yes | Start time point, in units of 0.1 microseconds. Divide the value by 10,000 to get milliseconds. |
| End | string | Yes | End time point, in units of 0.1 microseconds. Divide the value by 10,000 to get milliseconds. |
| PosStart | string | Yes | Starting Unicode location in the text. The range is left-closed and right-open: [PosStart, PosEnd). |
| PosEnd | string | Yes | Ending Unicode location in the text. The range is left-closed and right-open: [PosStart, PosEnd). |
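
The time and position fields are returned as strings, so a client has to convert them before use. The following sketch turns SubtitleInfo entries into (start ms, end ms, text) cues. It assumes that PosStart and PosEnd are Unicode code point offsets into the spoken text, which matches Python string indexing; the names `ticks_to_ms` and `subtitle_cues` are hypothetical.

```python
TICKS_PER_MS = 10_000  # one tick is 0.1 microseconds


def ticks_to_ms(value: str) -> float:
    """Convert a string timestamp in 0.1-microsecond units to milliseconds."""
    return int(value) / TICKS_PER_MS


def subtitle_cues(text: str, subtitles: list[dict]) -> list[tuple[float, float, str]]:
    """Build (start_ms, end_ms, slice) cues from SubtitleInfo entries."""
    cues = []
    for item in subtitles:
        start, end = int(item["PosStart"]), int(item["PosEnd"])
        # [PosStart, PosEnd) is left-closed and right-open, like a Python slice.
        cues.append((ticks_to_ms(item["Start"]), ticks_to_ms(item["End"]), text[start:end]))
    return cues
```

With the values from the Response Sample, a Start of "200000" converts to 20 ms and an End of "2700000" converts to 270 ms.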

Expression

| Parameter Name | Type | Required | Description |
| --- | --- | --- | --- |
| Name | string | Yes | Expression name. |
| Start | string | Yes | Start time point, in units of 0.1 microseconds. Divide the value by 10,000 to get milliseconds. |
| End | string | Yes | End time point, in units of 0.1 microseconds. Divide the value by 10,000 to get milliseconds. |
| Loc | string | Yes | Unicode location of the expression in the text. |
| Flag | string | Yes | B: this text contains the start of an expression; I: this text is part of the middle of an expression; E: this text contains the end of an expression; S: this text contains both the start and the end of an expression. |

Request Sample

{
  "Header": {},
  "Payload": {
    "VirtualmanProjectId": "253b2a182d694a60bed82635b18025a2",
    "InputText": "In the AI industry, which domains have better foundational conditions for AI development?",
    "ReqId": "d7aa08da33dd4a662ad5be508c5b77cf",
    "StreamId": "92597c35-3a99-415e-9bae-3124771b7749",
    "DriverType": "TEXT",
    "SpeechParam": {
      "TimbreKey": ""
    }
  }
}
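
The sample above uses DriverType TEXT. For STREAM_TEXT, each fragment carries Seq and IsFinal. The sketch below splits a list of text chunks into one message per fragment. This document does not state whether fragments share a ReqId or whether Seq starts at 0 or 1; reusing one ReqId and starting at 1 are assumptions here, and the function name `stream_text_requests` is hypothetical.

```python
import json
import uuid


def stream_text_requests(stream_id: str, project_id: str, chunks: list[str],
                         first_seq: int = 1) -> list[str]:
    """Serialize one STREAM_TEXT message per chunk, marking the last as final."""
    req_id = uuid.uuid4().hex  # assumption: all fragments share one ReqId
    messages = []
    for index, chunk in enumerate(chunks):
        payload = {
            "ReqId": req_id,
            "StreamId": stream_id,
            "VirtualmanProjectId": project_id,
            "InputText": chunk,
            "DriverType": "STREAM_TEXT",
            "Seq": first_seq + index,
            # IsFinal is passed with every fragment, true only on the last one.
            "IsFinal": index == len(chunks) - 1,
        }
        messages.append(json.dumps({"Header": {}, "Payload": payload}, ensure_ascii=False))
    return messages
```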

Response Sample

// DriverRspType is REPLY
{
  "Header": {
    "RequestID": "fe0e4c13f2a34cb69b2475d8483f28de",
    "SessionID": "gza802cc9317231084402578413",
    "DialogID": "",
    "Code": 0,
    "Message": ""
  },
  "Payload": {
    "DriverRspType": "REPLY",
    "ErrorCode": 0,
    "ErrorMessage": "",
    "ReplyRsp": {
      "ContentType": 1,
      "InteractionContent": "",
      "InteractionType": "",
      "IsFinal": true,
      "IsHighLight": true,
      "Muted": false,
      "ReplyDisplay": "Which domains have a foundation for AI development",
      "ReplyPro": "<speak>Which domains show superior conditions for AI development?</speak>",
      "ReplyType": "input",
      "SeqNo": 2,
      "TtsSupport": true,
      "Uninterrupt": false
    },
    "ReqId": "fe0e4c13f2a34cb69b2475d8483f28de",
    "SpeechRsp": {
      "Action": [],
      "Audio": "",
      "Expression": [],
      "Final": false,
      "Phn": [],
      "RealThType": "",
      "Sampling": 0,
      "SentenceFinal": false,
      "SentenceStart": false,
      "SeqNo": 0,
      "Subtitle": [],
      "ThDim": 0,
      "ThFeat": [],
      "ThFeatFinal": false,
      "Word": []
    },
    "StreamId": "92597c35-3a99-415e-9bae-3124771b7749"
  }
}

// DriverRspType is SPEECH
{
  "Header": {
    "RequestID": "fe0e4c13f2a34cb69b2475d8483f28de",
    "SessionID": "gza802cc9317231084402578413",
    "DialogID": "",
    "Code": 0,
    "Message": ""
  },
  "Payload": {
    "DriverRspType": "SPEECH",
    "ErrorCode": 0,
    "ErrorMessage": "",
    "ReplyRsp": {
      "ContentType": 0,
      "InteractionContent": "",
      "InteractionType": "",
      "IsFinal": false,
      "IsHighLight": false,
      "Muted": false,
      "ReplyDisplay": "",
      "ReplyPro": "",
      "ReplyType": "",
      "SeqNo": 0,
      "TtsSupport": false,
      "Uninterrupt": false
    },
    "ReqId": "fe0e4c13f2a34cb69b2475d8483f28de",
    "SpeechRsp": {
      "Action": [],
      "Audio": "", // Content too long, not displayed
      "Expression": [],
      "Final": false,
      "Phn": [
        { "End": "200000", "Phn": "sil0", "Start": "0" },
        { "End": "1100000", "Phn": "z4", "Start": "200000" },
        { "End": "2700000", "Phn": "ai4", "Start": "1100000" },
        { "End": "3800000", "Phn": "r2", "Start": "2700000" },
        { "End": "5100000", "Phn": "en2", "Start": "3800000" },
        { "End": "5800000", "Phn": "g1", "Start": "5100000" },
        { "End": "7300000", "Phn": "ong1", "Start": "5800000" },
        { "End": "8100000", "Phn": "zh4", "Start": "7300000" },
        { "End": "9000000", "Phn": "iii4", "Start": "8100000" },
        { "End": "9800000", "Phn": "n2", "Start": "9000000" },
        { "End": "11300000", "Phn": "eng2", "Start": "9800000" },
        { "End": "12800000", "Phn": "ch3", "Start": "11300000" },
        { "End": "14000000", "Phn": "an3", "Start": "12800000" },
        { "End": "16000000", "Phn": "ie4", "Start": "14000000" },
        { "End": "17100000", "Phn": "zh1", "Start": "16000000" },
        { "End": "19200000", "Phn": "ong1", "Start": "17100000" },
        { "End": "24200000", "Phn": "sil0", "Start": "19200000" }
      ],
      "RealThType": "3D_standard",
      "Sampling": 24000,
      "SentenceFinal": false,
      "SentenceStart": true,
      "SeqNo": 1,
      "Subtitle": [
        { "End": "2700000", "PosEnd": "1", "PosStart": "0", "Start": "200000", "Word": "at" },
        { "End": "5100000", "PosEnd": "2", "PosStart": "1", "Start": "2700000", "Word": "user" },
        { "End": "7300000", "PosEnd": "3", "PosStart": "2", "Start": "5100000", "Word": "work" },
        { "End": "9000000", "PosEnd": "4", "PosStart": "3", "Start": "7300000", "Word": "intelligent" },
        { "End": "11300000", "PosEnd": "5", "PosStart": "4", "Start": "9000000", "Word": "can" },
        { "End": "14000000", "PosEnd": "6", "PosStart": "5", "Start": "11300000", "Word": "product" },
        { "End": "16000000", "PosEnd": "7", "PosStart": "6", "Start": "14000000", "Word": "industry" },
        { "End": "19200000", "PosEnd": "9", "PosStart": "7", "Start": "16000000", "Word": "in," }
      ],
      "ThDim": 52,
      "ThFeat": [], // Content too long, not displayed
      "ThFeatFinal": false,
      "Word": [
        { "Phn": "z-ai4", "Word": "at" },
        { "Phn": "r-en2|g-ong1", "Word": "manual" },
        { "Phn": "zh-iii4|n-eng2", "Word": "intelligent" },
        { "Phn": "ch-an3|ie4", "Word": "industry" },
        { "Phn": "zh-ong1", "Word": "in" }
      ]
    },
    "StreamId": "92597c35-3a99-415e-9bae-3124771b7749"
  }
}