Audio Production API


Last updated: 2024-07-18 18:18:42


API Description
This API produces audio from input text so that you can preview how the text will sound. To find the timbres available for an avatar, call the Query Supported Timbres for VirtualmanKey API. Some avatars do not support changing the timbre.
Calling Protocol
Protocol: HTTPS + JSON
Method and path: POST /v2/ivh/videomaker/broadcastservice/tts
Header: Content-Type: application/json;charset=utf-8
Request Parameters
Parameter	Type	Mandatory	Description
TimbreKey	string	No	Timbre key. Required when VirtualmanKey is empty.
VirtualmanKey	string	No	Avatar key, an enumerated value that defines the role, clothing, pose, and resolution for the broadcast. Required when TimbreKey is empty. By default, the first matching timbre for the avatar is used to produce the audio.
InputSsml	string	Yes	Text to be broadcast. SSML tags are supported. Maximum 20,000 characters (counted as Unicode characters).
Speed	float	Yes	Speech rate. Range: [0.5, 1.5], where 1.0 is normal speed, 0.5 is the slowest, and 1.5 is the fastest.
AudioStorageS3Url	string	No	An authenticated S3-protocol storage URL. If provided, the finished audio is uploaded to that URL.
SampleRate	int	No	Sample rate. Supported values: 16000 (16k) and 24000 (24k). Default: 24000.
Codec	string	No	Audio format. Supported values: mp3 and wav. Default: mp3.
SentenceMaxWords	int	No	Maximum number of characters per sentence. Range: 0 to 999. If 0 or no value is provided, the default of 30 is used.
SentenceDisplayPunctuation	string	No	Punctuation marks to display within sentences. The special value "0" displays no punctuation marks; "1" (default) displays all punctuation marks. You can also specify a custom set of punctuation marks to display.
SentenceSplitPunctuation	string	No	Punctuation marks used to split sentences. Default: . ; ?! ... !?
Volume	int	No	Volume level. Range: 0 to 10. The default, 0, is normal volume; higher values are louder.
EmotionCategory	string	No	Emotion of the synthesized audio. Supported only by multi-emotion timbres. For valid values, see the Timbre List API (section 4.5 of the Personal Asset Management API).
EmotionIntensity	int	No	Intensity of the synthesized audio emotion. Range: [50, 200]. Takes effect only when EmotionCategory is not empty.
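The constraints listed for the request parameters can be checked on the client before a request is sent. The following is a minimal sketch in Python, not part of the API itself: the function name validate_payload and the wording of its messages are hypothetical, and it assumes that Python's len() on a string is an acceptable approximation of the "Unicode characters" count. The service remains the authority on what it accepts.

```python
def _is_int(value) -> bool:
    # bool is a subclass of int in Python, so exclude it explicitly.
    return isinstance(value, int) and not isinstance(value, bool)


def validate_payload(payload: dict) -> list[str]:
    """Return a list of problems found; an empty list means these checks passed."""
    problems = []

    # TimbreKey and VirtualmanKey are each optional, but not both empty.
    if not payload.get("TimbreKey") and not payload.get("VirtualmanKey"):
        problems.append("TimbreKey and VirtualmanKey cannot both be empty")

    # InputSsml is mandatory and limited to 20,000 characters.
    text = payload.get("InputSsml")
    if not isinstance(text, str) or not text:
        problems.append("InputSsml is required")
    elif len(text) > 20000:
        problems.append("InputSsml exceeds 20,000 characters")

    # Speed is mandatory and must lie in [0.5, 1.5].
    speed = payload.get("Speed")
    if isinstance(speed, bool) or not isinstance(speed, (int, float)):
        problems.append("Speed is required and must be a number")
    elif not 0.5 <= speed <= 1.5:
        problems.append("Speed must be within [0.5, 1.5]")

    # Optional enumerated values, checked only when present.
    if payload.get("SampleRate", 24000) not in (16000, 24000):
        problems.append("SampleRate must be 16000 or 24000")
    if payload.get("Codec", "mp3") not in ("mp3", "wav"):
        problems.append("Codec must be mp3 or wav")

    # Optional integer ranges, checked only when present.
    ranges = (
        ("SentenceMaxWords", 0, 999),
        ("Volume", 0, 10),
        ("EmotionIntensity", 50, 200),
    )
    for name, low, high in ranges:
        value = payload.get(name)
        if value is not None and not (_is_int(value) and low <= value <= high):
            problems.append(f"{name} must be an integer within [{low}, {high}]")

    return problems
```

Returning a list of problems rather than raising on the first one lets a caller report every issue in a payload at once.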
Response Parameters
Parameter	Type	Mandatory	Description
TaskId	string	Yes	Task ID for the audio production. Pass the TaskId to the Audio and Video Production Progress Query API to obtain the production progress and the download link for the finished audio.
Request Sample
{
    "Header": {},
    "Payload": {
        "VirtualmanKey": "123",
        "InputSsml": "Hello, virtual anchor",
        "Speed": 1
    }
}
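A request like the sample above could be assembled with Python's standard library as sketched below. This is an illustration under stated assumptions: the host in BASE_URL is a placeholder because this section gives only the path, the helper name build_tts_request is hypothetical, and any authentication or signing the service requires is not described here and is therefore omitted.

```python
import json
import urllib.request

# Placeholder host (assumption): this section documents only the path.
BASE_URL = "https://example.invalid"
TTS_PATH = "/v2/ivh/videomaker/broadcastservice/tts"


def build_tts_request(payload: dict, base_url: str = BASE_URL) -> urllib.request.Request:
    """Wrap the payload in the Header/Payload envelope and build a POST request."""
    envelope = {"Header": {}, "Payload": payload}
    # ensure_ascii=False keeps non-ASCII text readable; the body is sent as UTF-8.
    body = json.dumps(envelope, ensure_ascii=False).encode("utf-8")
    return urllib.request.Request(
        url=base_url + TTS_PATH,
        data=body,
        method="POST",
        headers={"Content-Type": "application/json;charset=utf-8"},
    )


request = build_tts_request(
    {"VirtualmanKey": "123", "InputSsml": "Hello, virtual anchor", "Speed": 1}
)
# Sending is left to the caller, for example urllib.request.urlopen(request),
# once the real host and any required credentials are in place.
```

Building the request separately from sending it keeps the envelope and header easy to inspect before any network call is made.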
Response Sample
{
    "Header": {
        "Code": 0,
        "DialogID": "",
        "Message": "",
        "RequestID": "123"
    },
    "Payload": {
        "TaskId": "123"
    }
}
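The TaskId can be read from a response shaped like the sample above. The sketch below assumes that a Code of 0 in the response Header indicates success, which the sample suggests but this section does not state outright; the names TtsApiError and extract_task_id are hypothetical.

```python
import json


class TtsApiError(Exception):
    """Raised when the response does not carry a usable TaskId."""


def extract_task_id(response_text: str) -> str:
    """Parse the JSON response body and return Payload.TaskId."""
    data = json.loads(response_text)
    header = data.get("Header") or {}

    # Assumption: Code 0 means success; anything else is treated as a failure.
    code = header.get("Code")
    if code != 0:
        raise TtsApiError(
            f"request {header.get('RequestID')!r} failed with code {code}: "
            f"{header.get('Message')}"
        )

    task_id = (data.get("Payload") or {}).get("TaskId")
    if not task_id:
        raise TtsApiError("response contains no TaskId")
    return task_id
```

The returned value is what the Audio and Video Production Progress Query API expects when polling for progress and the download link.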
 
 
 


