Video Production API - Basic Edition

API Description
Produces a video from SSML text and a digital human. The finished video and subtitle file are returned through the Audio and Video Production Progress Query API.
Note:
Advanced parameters such as anchor position are not supported by this API. To use them, switch to the Video Production API - Advanced Edition.
Calling Protocol
HTTPS + JSON
POST     /v2/ivh/videomaker/broadcastservice/videomake
Header    Content-Type: application/json;charset=utf-8
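The protocol above can be sketched in Python as follows. This is a minimal illustration, not an official client: the host in BASE_URL is a placeholder (the page gives only the path), the function name is hypothetical, and any authentication or signing the service requires is not described here and is therefore omitted. The Header/Payload envelope follows the Request Sample on this page.

```python
import json
import urllib.request

# Placeholder host (assumption): the page documents only the path, so
# substitute the gateway address from your own service configuration.
BASE_URL = "https://example.invalid"
VIDEOMAKE_PATH = "/v2/ivh/videomaker/broadcastservice/videomake"


def build_videomake_request(payload: dict) -> urllib.request.Request:
    """Prepare (but do not send) the HTTPS + JSON POST request."""
    # Envelope shape taken from the Request Sample: an empty Header
    # object plus the request parameters under Payload.
    body = json.dumps({"Header": {}, "Payload": payload}).encode("utf-8")
    return urllib.request.Request(
        url=BASE_URL + VIDEOMAKE_PATH,
        data=body,
        method="POST",
        headers={"Content-Type": "application/json;charset=utf-8"},
    )
```

The request object can then be passed to urllib.request.urlopen once the real host and any required credentials are in place.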
Request Parameters
Parameters	Type	Mandatory	Description
VirtualmanKey	string	Yes	Enumerated value that defines the broadcasting role, clothing, pose, and resolution. Note: Available values can be queried through the Query Image Asset Information - Query All Images under Anchors API.
InputSsml	string	Yes	The text to be broadcast. SSML tags are supported; see the Digital Human SSML Markup Language Specification for supported tag types and usage examples. The content must not include line breaks, and symbols must be escaped. The upper limit is 20,000 characters (counted as Unicode characters). This field is required if DriverType is empty or set to Text.
SpeechParam	object	Yes	Detailed parameters of the audio.
SpeechParam.Speed	float	Yes	Speech rate. 1.0 is normal speed and the range is [0.5, 1.5], where 0.5 is the slowest and 1.5 is the fastest. Speech rate control has no effect when DriverType is an audio-driven type.
SpeechParam.TimbreKey	string	No	Timbre key. The avatar's own timbre is used by default.
SpeechParam.Volume	int	No	Volume level, ranging from 0 to 10. The default is 0, which represents normal volume. The higher the value, the louder the volume.
SpeechParam.EmotionCategory	string	No	Controls the emotion of the synthesized audio. Supported only for multi-emotion timbres. See the Personal Asset Management API Paginated Query Timbre List for available values.
SpeechParam.EmotionIntensity	int	No	Controls the intensity of the synthesized audio emotion, with a range of [50, 200]. Only effective when EmotionCategory is not empty.
VideoParam	object	No	Detailed parameters for video synthesis.
VideoParam.Format	string	No	Video output format. The default is TransparentWebm. TransparentWebm: WebM video with a transparent background. GreenScreenMp4: MP4 video with a green screen background.
CallbackUrl	string	No	If a callback URL is provided, the video production result is sent to that URL as a POST request in a fixed format. For the format, see Appendix II: Callback Request Body Format. Note: 1. The CallbackUrl length must be less than 1000 characters. 2. The request is sent only once. If it fails for any reason, it is not resent.
DriverType	string	No	Driver type. The default is Text. 1. Text: text-driven; requires the InputSsml field. 2. OriginalVoice: driven by the original voice audio; requires the InputAudioUrl field. 3. ModulatedVoice: driven by modulated voice audio; the timbre can be specified using SpeechParam.TimbreKey. If not specified, the anchor's default timbre is used.
InputAudioUrl	string	No	The audio URL used to drive the digital human. This field is required when DriverType is OriginalVoice or ModulatedVoice. Audio format requirements: 1. For small sample avatars, the duration should not exceed 60 minutes and not be less than 0.5 seconds. For non-avatars, the duration should not exceed 10 minutes and not be less than 0.5 seconds. 2. Supported formats: WAV, MP3, WMA, M4A, and AAC.
VideoStorageS3Url	string	No	An authenticated S3-protocol storage URL. If provided, the finished video is uploaded to that URL.
SubtitleStorageS3Url	string	No	An authenticated S3-protocol storage URL. If provided, the finished subtitles are uploaded to that URL.
ConcurrencyType	string	No	Concurrency type used for the video production task. By default, dedicated concurrency is used first, followed by shared concurrency. 1. Exclusive: dedicated concurrency. If no dedicated concurrency is available, the task submission fails. 2. Shared: shared concurrency.
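The constraints stated in the parameter descriptions above can be checked on the client before submitting a task. The following is a sketch only: validate_payload and its error messages are hypothetical, and it treats InputSsml as required only for the Text driver, which is one reading of the description. The service remains the authority on what it accepts.

```python
ALLOWED_DRIVERS = {"Text", "OriginalVoice", "ModulatedVoice"}
ALLOWED_FORMATS = {"TransparentWebm", "GreenScreenMp4"}
ALLOWED_CONCURRENCY = {"Exclusive", "Shared"}


def _in_range(value, low, high) -> bool:
    """True if value is a real number (not bool) within [low, high]."""
    is_number = isinstance(value, (int, float)) and not isinstance(value, bool)
    return is_number and low <= value <= high


def validate_payload(payload: dict) -> list[str]:
    """Return a list of problems found; an empty list means none found."""
    errors = []
    if not payload.get("VirtualmanKey"):
        errors.append("VirtualmanKey is required")

    # DriverType defaults to Text when empty or missing.
    driver = payload.get("DriverType") or "Text"
    if driver not in ALLOWED_DRIVERS:
        errors.append(f"unknown DriverType: {driver}")
    elif driver == "Text":
        ssml = payload.get("InputSsml") or ""
        if not ssml:
            errors.append("InputSsml is required for the Text driver")
        if "\n" in ssml or "\r" in ssml:
            errors.append("InputSsml must not contain line breaks")
        # len() counts Unicode code points, matching the documented limit.
        if len(ssml) > 20000:
            errors.append("InputSsml exceeds 20,000 characters")
    elif not payload.get("InputAudioUrl"):
        errors.append("InputAudioUrl is required for audio-driven types")

    speech = payload.get("SpeechParam")
    if not isinstance(speech, dict):
        errors.append("SpeechParam is required")
    else:
        if not _in_range(speech.get("Speed"), 0.5, 1.5):
            errors.append("SpeechParam.Speed is required and must be in [0.5, 1.5]")
        if "Volume" in speech and not _in_range(speech["Volume"], 0, 10):
            errors.append("SpeechParam.Volume must be in [0, 10]")
        if "EmotionIntensity" in speech and not _in_range(
            speech["EmotionIntensity"], 50, 200
        ):
            errors.append("SpeechParam.EmotionIntensity must be in [50, 200]")

    fmt = (payload.get("VideoParam") or {}).get("Format")
    if fmt is not None and fmt not in ALLOWED_FORMATS:
        errors.append(f"unknown VideoParam.Format: {fmt}")

    concurrency = payload.get("ConcurrencyType")
    if concurrency is not None and concurrency not in ALLOWED_CONCURRENCY:
        errors.append(f"unknown ConcurrencyType: {concurrency}")

    if len(payload.get("CallbackUrl") or "") >= 1000:
        errors.append("CallbackUrl must be shorter than 1000 characters")
    return errors
```

Returning a list rather than raising on the first problem lets a caller report every issue in one pass.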
Response Parameter
Parameters	Type	Mandatory	Description
TaskId	string	Yes	The video production task ID. Use the TaskId to access the Audio and Video Production Progress Query API to obtain the production progress and results.
Request Sample
{
    "Header": {},
    "Payload": {
        "VirtualmanKey": "123",
        "InputSsml": "Hello, I am the virtual <phoneme alphabet=\"py\" ph=\"fu4\">anchor</phoneme>",
        "SpeechParam": {
            "Speed": 1.0
        },
        "VideoParam": {
            "Format": "GreenScreenMp4"
        }
    }
}
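Because SSML attributes contain double quotes, writing the request body by hand is error-prone. One way to avoid this, sketched below, is to keep the SSML as an ordinary string and let a JSON serializer produce the escapes. The helper names are hypothetical, and the sketch covers only JSON-level escaping and line-break removal; any escaping required by the SSML specification itself is outside its scope.

```python
import json


def make_input_ssml(ssml: str) -> str:
    """Join lines with single spaces, since InputSsml must not contain line breaks."""
    return " ".join(ssml.splitlines())


def serialize_request(virtualman_key: str, ssml: str, speed: float = 1.0) -> str:
    """Build the JSON body; json.dumps escapes the quotes inside the SSML."""
    payload = {
        "VirtualmanKey": virtualman_key,
        "InputSsml": make_input_ssml(ssml),
        "SpeechParam": {"Speed": speed},
    }
    # ensure_ascii=False keeps non-ASCII text readable in the output.
    return json.dumps({"Header": {}, "Payload": payload}, ensure_ascii=False)


ssml_text = 'Hello, I am the virtual\n<phoneme alphabet="py" ph="fu4">anchor</phoneme>'
body = serialize_request("123", ssml_text)
```

Parsing body with json.loads returns the SSML with its original quotes, which confirms that the escaping is applied only at the JSON layer.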
Response Sample
{
    "Header": {
        "Code": 0,
        "DialogID": "",
        "Message": "",
        "RequestID": "123"
    },
    "Payload": {
        "TaskId": "123"
    }
}
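A response of this shape can be unpacked as in the sketch below. It assumes that a Header.Code of 0 indicates success, which the sample suggests but this page does not state explicitly; extract_task_id is a hypothetical helper name.

```python
import json


def extract_task_id(response_text: str) -> str:
    """Return Payload.TaskId, raising if the response does not look successful."""
    doc = json.loads(response_text)
    header = doc.get("Header") or {}
    code = header.get("Code")
    # Assumption: Code == 0 means success, as in the Response Sample.
    if code != 0:
        raise RuntimeError(
            f"videomake failed: Code={code}, "
            f"Message={header.get('Message')!r}, "
            f"RequestID={header.get('RequestID')!r}"
        )
    task_id = (doc.get("Payload") or {}).get("TaskId")
    if not task_id:
        raise RuntimeError("response did not contain a TaskId")
    return task_id


sample = '{"Header": {"Code": 0, "DialogID": "", "Message": "", "RequestID": "123"}, "Payload": {"TaskId": "123"}}'
task_id = extract_task_id(sample)
```

The returned TaskId is the value to pass to the Audio and Video Production Progress Query API.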
 
 


