Endpoint Rendering Driver API

Last updated: 2025-01-17 16:20:52

After creating a channel with the Create a Persistent Connection Channel API, you can send text over the WebSocket persistent connection to obtain driving data.
Request Parameters

| Parameter Name | Type | Required | Description |
| --- | --- | --- | --- |
| ReqId | string | Yes | Unique identifier of each request: a 32-character UUID (no hyphens, as in the request sample). |
| StreamId | string | Yes | Conversation ID, used to distinguish multiple rounds of conversation. Obtained from the Create a Persistent Connection Channel API. |
| VirtualmanProjectId | string | Yes | Digital human project ID, available in the digital human project. |
| InputText | string | No | Request text content. Must not be empty when DriverType is TEXT. |
| SpeechParam | SpeechParam | No | Detailed parameters of the output audio. |
| DriverType | string | Yes | Drive type. TEXT: text-driven; CHAT: text dialogue-driven; STREAM_TEXT: streaming text-driven. |
| ChatCommand | string | No | Dialogue command. Default: CHATTING. CHATTING: dialogue; START_CHAT: start a conversation; STOP_CHAT: end the conversation. |
| InputTextType | string | No | Format of InputText. Default: MARKDOWN. MARKDOWN: Markdown format, which includes plain text; supports streaming. SSML: standard SSML format; does not support streaming. |
| Seq | int | No | ID of the streaming text fragment. |
| IsFinal | bool | No | End marker of the streaming text fragments; pass it with every streaming text segment. |
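The rules in the request table can be captured in a small message builder. The following is a minimal Python sketch, not an official SDK: the function names are hypothetical, and it assumes that all fragments of one STREAM_TEXT request share a single ReqId and that Seq counts from 0, neither of which the table states. The Header/Payload envelope follows the request sample.

```python
import json
import uuid
from typing import Any, Iterable, Iterator, Optional

# Allowed values taken from the request parameter table.
DRIVER_TYPES = {"TEXT", "CHAT", "STREAM_TEXT"}
INPUT_TEXT_TYPES = {"MARKDOWN", "SSML"}


def build_driver_request(
    stream_id: str,
    project_id: str,
    driver_type: str,
    input_text: Optional[str] = None,
    input_text_type: str = "MARKDOWN",
    speech_param: Optional[dict] = None,
    req_id: Optional[str] = None,
) -> dict:
    """Build one upstream message using the Header/Payload layout of the request sample."""
    if driver_type not in DRIVER_TYPES:
        raise ValueError(f"unknown DriverType: {driver_type}")
    if input_text_type not in INPUT_TEXT_TYPES:
        raise ValueError(f"unknown InputTextType: {input_text_type}")
    if driver_type == "TEXT" and not input_text:
        raise ValueError("InputText must not be empty when DriverType is TEXT")
    if driver_type == "STREAM_TEXT" and input_text_type == "SSML":
        raise ValueError("SSML input does not support streaming")
    payload: dict[str, Any] = {
        "ReqId": req_id or uuid.uuid4().hex,  # 32 hex characters, no hyphens
        "StreamId": stream_id,
        "VirtualmanProjectId": project_id,
        "DriverType": driver_type,
        "InputTextType": input_text_type,
    }
    if input_text is not None:
        payload["InputText"] = input_text
    if speech_param:
        payload["SpeechParam"] = speech_param
    return {"Header": {}, "Payload": payload}


def stream_text_requests(stream_id: str, project_id: str, fragments: Iterable[str]) -> Iterator[dict]:
    """Yield one STREAM_TEXT message per fragment; only the last has IsFinal=True.

    Assumption: all fragments share one ReqId and Seq starts at 0.
    """
    parts = list(fragments)
    req_id = uuid.uuid4().hex
    for seq, text in enumerate(parts):
        msg = build_driver_request(
            stream_id, project_id, "STREAM_TEXT", input_text=text, req_id=req_id
        )
        msg["Payload"]["Seq"] = seq
        msg["Payload"]["IsFinal"] = seq == len(parts) - 1
        yield msg


def encode(msg: dict) -> str:
    """Serialize a message as JSON text for the WebSocket connection."""
    return json.dumps(msg, ensure_ascii=False)
```

The resulting string is sent as a text frame over the persistent connection; how it is sent depends on the WebSocket client you use.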
SpeechParam

| Parameter Name | Type | Required | Description |
| --- | --- | --- | --- |
| TimbreKey | string | No | Timbre value. Defaults to the timbre configured in the digital human project. The available timbres can be obtained through Querying Timbre Lists by Pagination. |
| Speed | float | No | Speech speed in the range [0.5, 2.0]: 1.0 is normal speed, 0.5 is the slowest, and 2.0 is the fastest. Defaults to the speech speed configured in the digital human project. |
| Volume | int | No | Volume in the range [-10, 10]. Default: 0 (normal volume). Higher values are louder. |
| EmotionCategory | string | No | Emotion of the synthesized audio; supported only by multi-emotion timbres. See Querying Timbre Lists by Pagination for available values. |
| EmotionIntensity | int | No | Intensity of the synthesized emotion, in the range [50, 200]. Takes effect only when EmotionCategory is not empty. |
| SmartActionEnabled | bool | No | Whether to enable intelligent actions. Disabled by default. |
| SubtitleType | int | No | How subtitles are returned. Default: 0. 0: by character; 1: by word. |
| TimbreLanguage | string | No | Timbre language. See Querying Timbre Lists by Pagination in the Personal Asset Management API documentation for available languages. For multi-language timbres, the language must be specified during synthesis. |
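The value ranges above can be checked on the client before a request is sent. This is a minimal Python sketch with a hypothetical function name; it omits unset fields so the documented defaults apply, and it is deliberately stricter than the API in one place: it rejects EmotionIntensity without EmotionCategory instead of letting the server ignore it.

```python
from typing import Any, Optional


def build_speech_param(
    timbre_key: Optional[str] = None,
    speed: Optional[float] = None,
    volume: Optional[int] = None,
    emotion_category: Optional[str] = None,
    emotion_intensity: Optional[int] = None,
    smart_action_enabled: Optional[bool] = None,
    subtitle_type: Optional[int] = None,
    timbre_language: Optional[str] = None,
) -> dict:
    """Return a SpeechParam object holding only the fields that were set."""
    if speed is not None and not 0.5 <= speed <= 2.0:
        raise ValueError("Speed must be within [0.5, 2.0]")
    if volume is not None and not -10 <= volume <= 10:
        raise ValueError("Volume must be within [-10, 10]")
    if emotion_intensity is not None:
        # Stricter than the API: the table says the value is only ignored without a category.
        if not emotion_category:
            raise ValueError("EmotionIntensity requires EmotionCategory")
        if not 50 <= emotion_intensity <= 200:
            raise ValueError("EmotionIntensity must be within [50, 200]")
    if subtitle_type not in (None, 0, 1):
        raise ValueError("SubtitleType must be 0 (by character) or 1 (by word)")
    fields: dict[str, Any] = {
        "TimbreKey": timbre_key,
        "Speed": speed,
        "Volume": volume,
        "EmotionCategory": emotion_category,
        "EmotionIntensity": emotion_intensity,
        "SmartActionEnabled": smart_action_enabled,
        "SubtitleType": subtitle_type,
        "TimbreLanguage": timbre_language,
    }
    # Drop unset fields so the server-side defaults apply.
    return {name: value for name, value in fields.items() if value is not None}
```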
Persistent Connection Downstream Message

| Parameter Name | Type | Required | Description |
| --- | --- | --- | --- |
| ReqId | string | Yes | Request ID; identical to the ReqId in the request. |
| StreamId | string | Yes | Conversation ID, used to distinguish multiple rounds of conversation; identical to the StreamId in the request. |
| DriverRspType | string | Yes | Response type. REPLY: ReplyRsp is returned, carrying session information; SPEECH: SpeechRsp is returned, carrying audio content. |
| ReplyRsp | ReplyRsp | No | Session response; returned when DriverRspType is REPLY. |
| SpeechRsp | SpeechRsp | No | Audio content; returned when DriverRspType is SPEECH. |
| ErrorCode | int | Yes | Error code. |
| ErrorMessage | string | Yes | Error message. |
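A client typically branches on DriverRspType for every downstream message. The following minimal Python sketch (hypothetical function name) shows that dispatch; it assumes that an ErrorCode of 0 means success, as in the response samples, and treats any other value as a failure.

```python
import json
from typing import Any


def handle_downstream(raw: str) -> tuple[str, dict[str, Any]]:
    """Parse one downstream message and return (DriverRspType, body).

    body is ReplyRsp for REPLY messages and SpeechRsp for SPEECH messages.
    Assumption: ErrorCode 0 means success, as in the response samples.
    """
    payload = json.loads(raw)["Payload"]
    if payload.get("ErrorCode", 0) != 0:
        raise RuntimeError(
            f"driver error {payload['ErrorCode']}: {payload.get('ErrorMessage', '')}"
        )
    kind = payload["DriverRspType"]
    if kind == "REPLY":
        return kind, payload.get("ReplyRsp", {})
    if kind == "SPEECH":
        return kind, payload.get("SpeechRsp", {})
    raise ValueError(f"unexpected DriverRspType: {kind}")
```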
ReplyRsp

| Parameter Name | Type | Required | Description |
| --- | --- | --- | --- |
| ReplyType | string | Yes | Reply message type. cloudAiGpt: Tencent Cloud large model dialogue; yunxiaowei: Tencent Cloud Xiaowei customer service dialogue; cloudAiWaiting: script played while the first packet is delayed past the timeout; cloudAiTimeOut: script played when no response arrives before the timeout, after which the session ends; sensitive: fixed script returned when the input text or reply contains sensitive content; input: the content of InputText when it is plain text or streaming text; enhanceText: content matched from script management when no dialogue service is configured. |
| ReplyPro | string | No | Broadcast content, including SSML tags. |
| ReplyDisplay | string | Yes | Display content, including rich text tags. |
| InteractionType | string | No | Special message type. |
| InteractionContent | string | No | Special message content, used to deliver non-text content such as pop-ups and images. |
| Uninterrupt | bool | Yes | Whether the current broadcast is uninterruptible. |
| Muted | bool | Yes | Whether the current broadcast is muted. |
| SeqNo | int | Yes | Clause number. When ReplyType is cloudAiGpt, normal replies are numbered from 1; other fixed scripts are numbered from 0. |
| ContentType | int | Yes | Content type of the reply. 0: unknown; 1: plain string; 2: ordered list; 3: unordered list; 4: image link; 5: HTTP link; 6: table; 8: title; 9: SSML. |
| TtsSupport | bool | Yes | Whether the current clause is broadcast. |
| IsFinal | bool | Yes | Whether this is the last clause. |
| IsHighLight | bool | Yes | Whether the clause should be highlighted. |
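A reply can arrive as several ReplyRsp clauses. As a minimal Python sketch (hypothetical function name), the clauses of one reply can be ordered by SeqNo, collected up to the clause with IsFinal set, and split into display text and SSML to broadcast according to TtsSupport. It assumes the caller has already buffered all clauses of one reply.

```python
from typing import Any, Iterable

# ContentType codes from the ReplyRsp table (code 7 is not listed there).
CONTENT_TYPES = {
    0: "unknown", 1: "string", 2: "ordered list", 3: "unordered list",
    4: "image link", 5: "HTTP link", 6: "table", 8: "title", 9: "SSML",
}


def collect_reply(clauses: Iterable[dict[str, Any]]) -> dict[str, list[str]]:
    """Order ReplyRsp clauses by SeqNo and stop at the clause with IsFinal=True."""
    display, spoken, kinds = [], [], []
    for clause in sorted(clauses, key=lambda c: c.get("SeqNo", 0)):
        display.append(clause.get("ReplyDisplay", ""))
        kinds.append(CONTENT_TYPES.get(clause.get("ContentType", 0), "unknown"))
        # Only clauses with TtsSupport=True are broadcast; ReplyPro carries their SSML.
        if clause.get("TtsSupport"):
            spoken.append(clause.get("ReplyPro", ""))
        if clause.get("IsFinal"):
            break
    return {"display": display, "spoken": spoken, "content_types": kinds}
```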
SpeechRsp

| Parameter Name | Type | Required | Description |
| --- | --- | --- | --- |
| Audio | string | Yes | Base64-encoded PCM audio data. |
| ThDim | int | Yes | Mouth shape dimension. |
| ThFeat | Array of float | Yes | Mouth shape data. |
| Phn | Array of PhnInfo | Yes | Phoneme information. |
| Word | Array of WordInfo | Yes | Word segmentation information. |
| Final | bool | Yes | End-of-sentence marker. |
| SentenceFinal | bool | Yes | End-of-clause marker for streaming. |
| Sampling | int | Yes | Audio sample rate in Hz (24000 in the response sample). |
| Action | Array of Action | Yes | Action information. |
| Subtitle | Array of SubtitleInfo | Yes | Subtitle information. |
| RealThType | string | Yes | Mouth shape parameter type (3D_standard in the response sample). |
| Expression | Array of Expression | Yes | Expression (emoji) information. |
| SeqNo | int | Yes | Clause sequence number. |
| SentenceStart | bool | Yes | Start-of-clause marker. |
| ThFeatFinal | bool | Yes | End-of-mouth-shape-data marker. |
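When scheduling playback, it helps to know how long each Audio chunk lasts and how ThFeat splits into frames. The minimal Python sketch below (hypothetical function names) rests on two assumptions the table does not state: the PCM data is 16-bit mono, and ThFeat is a flat concatenation of frames of ThDim values each.

```python
import base64
from typing import Sequence


def pcm_duration_ms(audio_b64: str, sampling: int, sample_width: int = 2, channels: int = 1) -> float:
    """Playback length of one base64-encoded PCM chunk, in milliseconds.

    Assumption: 16-bit (2-byte) mono samples; the table does not state the PCM layout.
    """
    pcm = base64.b64decode(audio_b64)
    samples = len(pcm) // (sample_width * channels)
    return samples * 1000.0 / sampling


def split_mouth_frames(th_feat: Sequence[float], th_dim: int) -> list[list[float]]:
    """Split the flat ThFeat array into rows of ThDim values each.

    Assumption: ThFeat is a row-major concatenation of ThDim-sized frames.
    """
    if not th_feat:
        return []  # e.g. REPLY messages carry an empty ThFeat with ThDim 0
    if th_dim <= 0 or len(th_feat) % th_dim:
        raise ValueError("ThFeat length is not a positive multiple of ThDim")
    return [list(th_feat[i:i + th_dim]) for i in range(0, len(th_feat), th_dim)]
```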
PhnInfo

| Parameter Name | Type | Required | Description |
| --- | --- | --- | --- |
| Phn | string | Yes | Phoneme. |
| Start | string | Yes | Start time in units of 0.1 microseconds (100 ns); divide by 10,000 to get milliseconds. |
| End | string | Yes | End time in units of 0.1 microseconds (100 ns); divide by 10,000 to get milliseconds. |
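The time conversion above applies to every Start and End field in the downstream messages. A minimal Python sketch (hypothetical function names) that turns PhnInfo entries into a millisecond timeline:

```python
from typing import Any

TICKS_PER_MS = 10_000  # one tick is 0.1 microseconds (100 ns)


def ticks_to_ms(value: str) -> float:
    """Convert a Start/End string such as "200000" to milliseconds."""
    return int(value) / TICKS_PER_MS


def phoneme_timeline(phns: list[dict[str, Any]]) -> list[tuple[str, float, float]]:
    """Turn PhnInfo entries into (phoneme, start_ms, end_ms) tuples."""
    return [(p["Phn"], ticks_to_ms(p["Start"]), ticks_to_ms(p["End"])) for p in phns]
```

With the first two phonemes of the SPEECH response sample, `phoneme_timeline([{"Phn": "sil0", "Start": "0", "End": "200000"}, {"Phn": "z4", "Start": "200000", "End": "1100000"}])` returns `[("sil0", 0.0, 20.0), ("z4", 20.0, 110.0)]`.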
WordInfo

| Parameter Name | Type | Required | Description |
| --- | --- | --- | --- |
| Phn | string | Yes | Phonemes of the word. |
| Word | string | Yes | Corresponding word. |
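The WordInfo Phn strings in the response sample (for example `z-ai4` and `ch-an3|ie4`) appear to group phonemes by syllable. The minimal Python sketch below splits them accordingly; the format is inferred from the sample only and is not specified by the table, and the function name is hypothetical.

```python
def split_word_phonemes(phn: str) -> list[list[str]]:
    """Split a WordInfo Phn string into one phoneme list per syllable.

    Format inferred from the response sample only: "|" separates syllables and
    "-" separates phonemes within a syllable, e.g. "ch-an3|ie4".
    """
    return [syllable.split("-") for syllable in phn.split("|") if syllable]
```

For Mandarin words, the number of returned syllables equals the number of characters in the word.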
Action

| Parameter Name | Type | Required | Description |
| --- | --- | --- | --- |
| Pos | string | Yes | Action name. |
| Start | string | Yes | Start time in units of 0.1 microseconds; divide by 10,000 to get milliseconds. |
SubtitleInfo

| Parameter Name | Type | Required | Description |
| --- | --- | --- | --- |
| Word | string | Yes | Corresponding word. |
| Start | string | Yes | Start time in units of 0.1 microseconds; divide by 10,000 to get milliseconds. |
| End | string | Yes | End time in units of 0.1 microseconds; divide by 10,000 to get milliseconds. |
| PosStart | string | Yes | Starting Unicode position in the text (inclusive). The covered text is the half-open interval [PosStart, PosEnd). |
| PosEnd | string | Yes | Ending Unicode position in the text (exclusive). The covered text is the half-open interval [PosStart, PosEnd). |
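Because [PosStart, PosEnd) is half-open, it maps directly onto Python slicing. The minimal Python sketch below (hypothetical function name) attaches the covered text and a millisecond time range to each subtitle. It assumes positions count Unicode code points, as Python string indexing does; if the service counted UTF-16 code units instead, characters outside the Basic Multilingual Plane would shift the offsets. The test uses the Mandarin clause whose pinyin phonemes appear in the SPEECH response sample.

```python
from typing import Any

TICKS_PER_MS = 10_000  # Start/End are in 0.1-microsecond units


def subtitle_spans(text: str, subtitles: list[dict[str, Any]]) -> list[tuple[str, float, float]]:
    """Return (covered_text, start_ms, end_ms) for each SubtitleInfo entry.

    Assumption: PosStart/PosEnd count Unicode code points, like Python str indexing.
    """
    spans = []
    for sub in subtitles:
        start, end = int(sub["PosStart"]), int(sub["PosEnd"])
        spans.append(
            (text[start:end], int(sub["Start"]) / TICKS_PER_MS, int(sub["End"]) / TICKS_PER_MS)
        )
    return spans
```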
Expression

| Parameter Name | Type | Required | Description |
| --- | --- | --- | --- |
| Name | string | Yes | Expression (emoji) name. |
| Start | string | Yes | Start time in units of 0.1 microseconds; divide by 10,000 to get milliseconds. |
| End | string | Yes | End time in units of 0.1 microseconds; divide by 10,000 to get milliseconds. |
| Loc | string | Yes | Unicode position of the emoji in the text. |
| Flag | string | Yes | Position tag. B: the text contains the start of an emoji; I: the text is in the middle of an emoji; E: the text contains the end of an emoji; S: the text contains both the start and the end of an emoji. |
Request Sample
{
    "Header": {},
    "Payload": {
        "VirtualmanProjectId": "253b2a182d694a60bed82635b18025a2",
        "InputText": "In the AI industry, which domains have better foundational conditions for AI development?",
        "ReqId": "d7aa08da33dd4a662ad5be508c5b77cf",
        "StreamId": "92597c35-3a99-415e-9bae-3124771b7749",
        "DriverType": "TEXT",
        "SpeechParam": {
            "TimbreKey": ""
        }
    }
}
Response Sample
//DriverRspType is REPLY
{
  "Header": {
    "RequestID": "fe0e4c13f2a34cb69b2475d8483f28de",
    "SessionID": "gza802cc9317231084402578413",
    "DialogID": "",
    "Code": 0,
    "Message": ""
  },
  "Payload": {
    "DriverRspType": "REPLY",
    "ErrorCode": 0,
    "ErrorMessage": "",
    "ReplyRsp": {
      "ContentType": 1,
      "InteractionContent": "",
      "InteractionType": "",
      "IsFinal": true,
      "IsHighLight": true,
      "Muted": false,
      "ReplyDisplay": "Which domains have better foundational conditions for AI development?",
      "ReplyPro": "<speak>Which domains have better foundational conditions for AI development?</speak>",
      "ReplyType": "input",
      "SeqNo": 2,
      "TtsSupport": true,
      "Uninterrupt": false
    },
    "ReqId": "fe0e4c13f2a34cb69b2475d8483f28de",
    "SpeechRsp": {
      "Action": [],
      "Audio": "",
      "Expression": [],
      "Final": false,
      "Phn": [],
      "RealThType": "",
      "Sampling": 0,
      "SentenceFinal": false,
      "SentenceStart": false,
      "SeqNo": 0,
      "Subtitle": [],
      "ThDim": 0,
      "ThFeat": [],
      "ThFeatFinal": false,
      "Word": []
    },
    "StreamId": "92597c35-3a99-415e-9bae-3124771b7749"
  }
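}

In the SPEECH sample below, the synthesized clause is Mandarin (在人工智能产业中, "in the AI industry", plus a trailing comma): Phn holds pinyin phonemes with tone numbers, and the Word values in Subtitle and Word are English glosses of the individual Chinese characters and words.

//DriverRspType is SPEECH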
{
  "Header": {
    "RequestID": "fe0e4c13f2a34cb69b2475d8483f28de",
    "SessionID": "gza802cc9317231084402578413",
    "DialogID": "",
    "Code": 0,
    "Message": ""
  },
  "Payload": {
    "DriverRspType": "SPEECH",
    "ErrorCode": 0,
    "ErrorMessage": "",
    "ReplyRsp": {
      "ContentType": 0,
      "InteractionContent": "",
      "InteractionType": "",
      "IsFinal": false,
      "IsHighLight": false,
      "Muted": false,
      "ReplyDisplay": "",
      "ReplyPro": "",
      "ReplyType": "",
      "SeqNo": 0,
      "TtsSupport": false,
      "Uninterrupt": false
    },
    "ReqId": "fe0e4c13f2a34cb69b2475d8483f28de",
    "SpeechRsp": {
      "Action": [],
      "Audio": "", // Content too long, not displayed
      "Expression": [],
      "Final": false,
      "Phn": [
        {
          "End": "200000",
          "Phn": "sil0",
          "Start": "0"
        },
        {
          "End": "1100000",
          "Phn": "z4",
          "Start": "200000"
        },
        {
          "End": "2700000",
          "Phn": "ai4",
          "Start": "1100000"
        },
        {
          "End": "3800000",
          "Phn": "r2",
          "Start": "2700000"
        },
        {
          "End": "5100000",
          "Phn": "en2",
          "Start": "3800000"
        },
        {
          "End": "5800000",
          "Phn": "g1",
          "Start": "5100000"
        },
        {
          "End": "7300000",
          "Phn": "ong1",
          "Start": "5800000"
        },
        {
          "End": "8100000",
          "Phn": "zh4",
          "Start": "7300000"
        },
        {
          "End": "9000000",
          "Phn": "iii4",
          "Start": "8100000"
        },
        {
          "End": "9800000",
          "Phn": "n2",
          "Start": "9000000"
        },
        {
          "End": "11300000",
          "Phn": "eng2",
          "Start": "9800000"
        },
        {
          "End": "12800000",
          "Phn": "ch3",
          "Start": "11300000"
        },
        {
          "End": "14000000",
          "Phn": "an3",
          "Start": "12800000"
        },
        {
          "End": "16000000",
          "Phn": "ie4",
          "Start": "14000000"
        },
        {
          "End": "17100000",
          "Phn": "zh1",
          "Start": "16000000"
        },
        {
          "End": "19200000",
          "Phn": "ong1",
          "Start": "17100000"
        },
        {
          "End": "24200000",
          "Phn": "sil0",
          "Start": "19200000"
        }
      ],
      "RealThType": "3D_standard",
      "Sampling": 24000,
      "SentenceFinal": false,
      "SentenceStart": true,
      "SeqNo": 1,
      "Subtitle": [
        {
          "End": "2700000",
          "PosEnd": "1",
          "PosStart": "0",
          "Start": "200000",
          "Word": "at"
        },
        {
          "End": "5100000",
          "PosEnd": "2",
          "PosStart": "1",
          "Start": "2700000",
          "Word": "user"
        },
        {
          "End": "7300000",
          "PosEnd": "3",
          "PosStart": "2",
          "Start": "5100000",
          "Word": "work"
        },
        {
          "End": "9000000",
          "PosEnd": "4",
          "PosStart": "3",
          "Start": "7300000",
          "Word": "intelligent"
        },
        {
          "End": "11300000",
          "PosEnd": "5",
          "PosStart": "4",
          "Start": "9000000",
          "Word": "can"
        },
        {
          "End": "14000000",
          "PosEnd": "6",
          "PosStart": "5",
          "Start": "11300000",
          "Word": "product"
        },
        {
          "End": "16000000",
          "PosEnd": "7",
          "PosStart": "6",
          "Start": "14000000",
          "Word": "industry"
        },
        {
          "End": "19200000",
          "PosEnd": "9",
          "PosStart": "7",
          "Start": "16000000",
          "Word": "in,"
        }
      ],
      "ThDim": 52,
      "ThFeat": [], // Content too long, not displayed
      "ThFeatFinal": false,
      "Word": [
        {
          "Phn": "z-ai4",
          "Word": "at"
        },
        {
          "Phn": "r-en2|g-ong1",
          "Word": "artificial"
        },
        {
          "Phn": "zh-iii4|n-eng2",
          "Word": "intelligent"
        },
        {
          "Phn": "ch-an3|ie4",
          "Word": "industry"
        },
        {
          "Phn": "zh-ong1",
          "Word": "in"
        }
      ]
    },
    "StreamId": "92597c35-3a99-415e-9bae-3124771b7749"
  }
}
 
 
