Tencent Cloud AI Digital Human
Pricing Guide
Last updated: 2025-03-13 16:47:47

Basic Structure


Tencent Cloud AI Digital Human (TCADH) sells three products: Image Procurement, Broadcasting Service, and Interactive Service. Image Procurement is required and is used together with the Broadcasting Service, the Interactive Service, or both. No single product can be applied to a final application scenario on its own, so a combination purchase is needed.
2D/3D Avatar
Avatar customization
Customize your exclusive digital human avatar. A 3D avatar additionally requires purchasing the cloud-driven engine; a 2D avatar does not. You can purchase the avatar through either customization or rental.
Renewal of customized Avatar
A customized avatar is valid for 1 year by default. Purchase this service to keep using a customized avatar after it expires. A rented avatar does not need this renewal service: after it expires, you can simply purchase the rental again.
Application scenarios
Conversation interaction
Cloud Rendering
After the avatar is rendered and generated by cloud services, it is pushed to the terminal for real-time display. You need to purchase "Cloud Rendering Session Driver Concurrency". API and SDK are supported. Mutually exclusive with local rendering.
Local Rendering
The avatar is rendered and displayed locally on the terminal; the cloud service is responsible only for pushing conversation content. You need to purchase the "Local Rendering Session Driver Usage Package" or a terminal-authorized license. API and SDK are supported. Mutually exclusive with cloud rendering.

Audio-video broadcasting
Generate video (including audio)
Generate a video from preset text using a specified virtual avatar and voice. You need to purchase the "Video Broadcast Synthesis Hour Package" (which includes audio synthesis capabilities).
Generate audio only
Generate audio from preset text using a specified voice. You need to purchase the "Audio Broadcast Synthesis Hour Package" (required only when generating audio without video).
Concurrent audio-video broadcasting
Increase the number of concurrent channels to generate videos or audio faster; the generated results are not affected. Optional.
Voice customization
Voice replication
Train and generate a specified voice timbre from the voice materials you provide. The resulting voice can be used in the application scenarios.
Renewal of customized voice
The replicated voice has a default validity period of one year. This service is specifically for purchase and use after the replicated voice expires.
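
The purchase rules in this structure (Image Procurement is required, at least one service must be combined with it, a 3D avatar needs the cloud-driven engine, and cloud and local rendering are mutually exclusive) can be sketched as a small check. This is only an illustration: the `Order` class and its field names are hypothetical and are not part of any TCADH API or console.

```python
from dataclasses import dataclass


@dataclass
class Order:
    """Hypothetical order model; field names are illustrative, not TCADH names."""
    avatar_procurement: bool = False   # Image Procurement (rental or customization)
    broadcasting: bool = False         # Broadcasting Service
    interactive: bool = False          # Interactive Service
    avatar_dimension: str = "2D"       # "2D" or "3D"
    cloud_driven_engine: bool = False  # extra purchase needed for 3D avatars
    cloud_rendering: bool = False
    local_rendering: bool = False


def validate_order(order: Order) -> list[str]:
    """Return the list of rule violations described in the Basic Structure section."""
    problems = []
    if not order.avatar_procurement:
        problems.append("Image Procurement is required.")
    if not (order.broadcasting or order.interactive):
        problems.append("Add Broadcasting Service or Interactive Service.")
    if order.avatar_dimension == "3D" and not order.cloud_driven_engine:
        problems.append("A 3D avatar also needs the cloud-driven engine.")
    if order.cloud_rendering and order.local_rendering:
        problems.append("Cloud rendering and local rendering are mutually exclusive.")
    return problems


if __name__ == "__main__":
    print(validate_order(Order(avatar_procurement=True, broadcasting=True)))
    print(validate_order(Order(interactive=True, cloud_rendering=True, local_rendering=True)))
```

An empty list means the combination satisfies the rules stated above; it does not confirm availability or pricing.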

Introduction to Image

Introduction to Image Categories
Image Type
Definition
Use Cases
Example
2D Premium
A digital human for broadcasting and interactive scenarios is generated by recording motion materials in a professional studio and training for about two weeks. The premium avatar can insert specified motions at random within the text and supports a diverse range of motions.
Applicable to customers in finance and media who have specific requirements for the appearance and motion of digital humans.



2D Small Sample - General Lip Movement
Train a digital human from video footage of a real person. The appearance of the digital human is consistent with that of the real person, and the mouth shape uses general lips and teeth generated by the large model. The requirements for training video materials are lower. For details, see Image Recording Guide - General Lip Shape.
Applicable to customers who have no specific requirements for the lip shape of the digital human and who lack good shooting conditions.



2D Small Sample - Exclusive Lip Shape
Train a digital human with a real person's video material. The appearance of the digital human is consistent with that of the real person, and the mouth shape will use the real person's exclusive lip and teeth. The training video material should have no other voices or obvious environmental sounds. For details, see Image Recording Guide - Exclusive Mouth Shape.
Applicable to customers who have requirements for the image replication of digital humans and good shooting conditions.
2D Small Sample - High-Precision Version
Train a digital human with a 4K real-person video material. The material collection requirements and the final lips and teeth effect are the same as those of 2D Small Sample (Exclusive Lip Shape). The resolution of the final digital human is upgraded to 4K. For details, see Avatar Recording Guide - High-Precision Version.
Applicable to large conferences, face-to-face dialogue, product launch events, large screen scenarios.
2D Small Sample - Photo Avatar
A digital human can be trained with one photo; this version is designed for low-cost and quick turnaround. Generally, it is ready for use within 10 minutes after material submission.
Applicable to pan-internet and entertainment scenarios.



3D Cartoon
Set the facial features, hairstyle, clothing, accessories, and other details of the digital human according to the customer's requirements to complete the concept art. After the customer reviews and approves the final design, model production begins. After stages such as skeleton rigging, rendering, and UE optimization, a digital human covering interactive and broadcast scenarios is delivered.
Applicable when an existing 2D mascot is to be upgraded to a 3D avatar to serve users.



3D Semi-Realistic
Set the facial features, hairstyle, clothing, accessories, and other details of the digital human according to the customer's requirements to complete the concept art. After the customer reviews and approves the final design, model production begins. After stages such as skeleton rigging, rendering, and UE optimization, a digital human covering interactive and broadcast scenarios is delivered.
Suitable for scenarios that need a degree of realism but not high precision, such as news reading and mobile smart customer service.



3D Realistic
Set the facial features, hairstyle, clothing, accessories, and other details of the digital human according to the customer's requirements to complete the concept art. After the customer reviews and approves the final design, model production begins. After stages such as skeleton rigging, rendering, and UE optimization, a digital human covering interactive and broadcast scenarios is delivered.
Suitable for scenarios that need high realism and high-precision display, such as brand promotion and large-screen interaction.




Image comparison


| Item | 2D Small Sample - General Lip Movement | 2D Small Sample - Exclusive Mouth Shape | 2D Small Sample - High-Precision Version | 2D Small Sample - Photo Avatar |
| --- | --- | --- | --- | --- |
| Recording requirements | Record a video of at least 1 minute. There is no requirement for the sound in the recording. | Record a video of at least 3 minutes. The recording environment must be quiet, and only the voice of the person being filmed may be recorded. | Same recording standard as Exclusive Mouth Shape. The video resolution must be 4K. | Only one clear front-facing photo of the person is required. |
| Delivery cycle | A demo is delivered within 1 day for the customer to confirm the effect. The avatar can be used after the customer clicks to confirm. | A demo is delivered within 2 days for the customer to confirm the effect. The avatar can be used after the customer clicks to confirm. | A demo is delivered within 3 days for the customer to confirm the effect. The avatar can be used after the customer clicks to confirm. | Available within 10 minutes. |
| Finished product effect | Uses lips and teeth generated by the large model. | Uses the subject's own recorded lip movement, with better facial resolution. | Same effect as Exclusive Mouth Shape, output in 4K resolution for higher definition. | Uses lips and teeth generated by the large model; the body cannot sway, even slightly. |
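
As a rough illustration of the recording requirements compared above, the following sketch lists which 2D small-sample types a given set of materials could qualify for. The function and parameter names are hypothetical, and the thresholds are taken only from the comparison (at least 1 minute, at least 3 minutes with quiet audio, 4K, one front photo); actual acceptance depends on the recording guides.

```python
def eligible_types(video_minutes=0.0, quiet_audio=False, is_4k=False, has_front_photo=False):
    """Return the 2D small-sample types whose recording requirements are met.

    Thresholds come from the comparison above; this is a sketch, not an
    official acceptance check.
    """
    types = []
    if video_minutes >= 1:
        # General lip movement: at least 1 minute, no sound requirement.
        types.append("General Lip Movement")
    if video_minutes >= 3 and quiet_audio:
        # Exclusive mouth shape: at least 3 minutes, quiet, subject's voice only.
        types.append("Exclusive Mouth Shape")
        if is_4k:
            # High-precision: same standard as exclusive, plus 4K video.
            types.append("High-Precision Version")
    if has_front_photo:
        # Photo avatar: one clear front-facing photo.
        types.append("Photo Avatar")
    return types


if __name__ == "__main__":
    print(eligible_types(video_minutes=3, quiet_audio=True))
    print(eligible_types(has_front_photo=True))
```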
General lip movement vs exclusive lip movement



General lip movement vs photo avatar



Exclusive lip movement vs high-precision version




Price Description

Avatar Procurement

Avatar Procurement is the purchase of an avatar, through either Avatar Rental or Avatar Customization. Voice Replication is also supported.
Avatar Rental: Rent an avatar from the Public Basic Image Library. The rental is non-exclusive: you have only the right to use the avatar, ownership remains with Tencent, and Tencent may lease the same avatar to others. Suitable for customers who do not need an exclusive avatar and whose business is at an early stage.
Avatar Customization: Customize the digital human's avatar through recording and training or through modeling. Suitable for customers who have requirements for their own avatar and need to own it.
Voice Replication: Replicate a specific voice by collecting speech data and training on it.

1. Avatar customization
Note: The avatar customization quota takes effect immediately after purchase and is valid for one year.
Image Type
Feature Description
Price
2D Small Sample - General Lip Movement
Cloud services only. Supports text-driven and original-audio-driven modes.
You can customize a digital human by providing 1 minute of video footage. Includes 1 default voice type.
Clothing style, pose, and motion are determined by the video material.
Background replacement is supported only when the material is shot against a solid-color green screen.
200 USD/each

2D Small Sample - Exclusive Mouth Shape
Supports text-driven and original-audio-driven modes.
You can customize a broadcasting digital human with one piece of 3-minute video footage. Includes 1 default voice type.
Clothing style, pose, and motion are determined by the video material.
Background replacement is supported only when the material is shot against a solid-color green screen.
1,000 USD/each
2D Small Sample Photo
Supports text-driven and original-audio-driven modes.
A digital human can be trained from one photo, at low cost and with fast turnaround.
2.5 USD/each
3D Cartoon
Not limited to cloud services; private deployment is supported.
Supports text-driven, audio-driven, and monocular-camera video-driven modes. Includes 1 set of clothing, 8 motions, and 1 voice type.
The accuracy of 3D cartoon supporting assets is Grade B.
Contact us to get a quotation.
3D Semi-Realistic
Not limited to cloud services; private deployment is supported.
Customized based on the "Yunyi" body model. Supports text-driven, audio-driven, and monocular-camera video-driven modes. Includes 1 set of clothing, 8 motions, and 1 voice type.
The accuracy of 3D semi-realistic supporting assets is Grade A.
Contact us to get a quotation.
3D Realistic
Not limited to cloud services; private deployment is supported.
Supports text-driven, audio-driven, and monocular-camera video-driven modes. Based on the default version of the 3D portrait (refer to the body template of the YouYou image), face shape, hairstyle, clothing, and motion are customized as required. The complete model set includes 1 face shape, 1 hairstyle, 1 set of clothing, and an action library of 8 motions.
Additional customization of hairstyle, clothing, motion, or expression requires adding extra items to the cart.
The accuracy of 3D realistic supporting assets is Grade S.
Contact us to get a quotation.
2. Voice replication
The quota of voice replication takes effect immediately after purchase and is valid for one year.
Category
Feature Description
Price
Voice Reproduce (VRS) - Ultra-fast Version
Provide just seconds of audio data to get an exclusive AI-customized timbre within 10 minutes. Mainly used together with photo avatars, where immediate usability matters. See Voice Clone Recording Guide - Ultra-fast Version.
2.5 USD/each
Voice Reproduce (VRS) - Ultra-fast Version (Minority Language)
Same features as above, with support for multiple languages. For details, see Appendix 4 - Language List.
50 USD/each

Broadcasting Service

Broadcasting Service is the capability to produce audio and video broadcasts with an avatar. It is offered in three categories: Video Generation Service - Hourly Package, Audio Generation Service - Hourly Package, and Video Generation Concurrency Service. The Video Generation Service - Hourly Package and the concurrency service are charged by image type, and packages for different image types are not interchangeable.
Video Generation Service - Hourly Package: A video duration resource package used to produce broadcast video (including audio).
Audio Generation Service - Hourly Package: An audio duration resource package used to produce broadcast audio only.
Video Generation Concurrency Service: The number of videos that can be generated online at the same time.
1. Video Generation Service - Hourly Package
| Image Type | Feature Description | Price |
| --- | --- | --- |
| 2D Small Sample - General Lip Movement | Lease/Clone Digital Human Voice and Video Generation Duration: 10 Hours | 1,800 USD/each |
| 2D Small Sample - Exclusive Mouth Shape | Lease/Clone Digital Human Voice and Video Generation Duration: 10 Hours | 1,800 USD/each |
| 2D Small Sample - Photo Digital Human | Lease/Clone Digital Human Voice and Video Generation Duration: 10 Hours | 1,800 USD/each |
| 3D Realistic | Lease/Clone Digital Human Voice and Video Generation Duration: 10 Hours | 3,600 USD/each |
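
To budget video generation, the package prices above can be turned into a small estimate. The sketch below assumes that packages are bought in whole units and that several packages for the same image type can be stacked; the source does not state either point, so treat both as assumptions. The function name is hypothetical.

```python
import math

# Each Video Generation Service package covers 10 hours (from the table above).
VIDEO_PACKAGE_HOURS = 10

# Price per package in USD, keyed by image type as listed in the table.
VIDEO_PACKAGE_PRICE_USD = {
    "2D Small Sample - General Lip Movement": 1800,
    "2D Small Sample - Exclusive Mouth Shape": 1800,
    "2D Small Sample - Photo Digital Human": 1800,
    "3D Realistic": 3600,
}


def video_package_cost(image_type, hours_needed):
    """Return (packages, total_usd) for the requested hours of video.

    Assumes whole packages that can be stacked for one image type; packages
    for different image types are not interchangeable.
    """
    packages = math.ceil(hours_needed / VIDEO_PACKAGE_HOURS)
    return packages, packages * VIDEO_PACKAGE_PRICE_USD[image_type]


if __name__ == "__main__":
    print(video_package_cost("3D Realistic", 25))
```

Under these assumptions, 25 hours of 3D Realistic video needs 3 packages, because partial packages are rounded up.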
2. Audio Generation Service - Hourly Package
| Package Type | Feature Description | Price |
| --- | --- | --- |
| General Audio Broadcast Synthesis | Cloud service only. Audio generation with a leased or cloned digital human voice; duration: 1 hour. | 10 USD/each |
3. Broadcast Concurrency
| Image Type | Feature Description | Price |
| --- | --- | --- |
| 2D Small Sample - General Lip Movement | Supports 2D small sample - general lip movement, with a maximum resolution of 1080p. | 500 USD/month/channel |
| 2D Small Sample - Exclusive Mouth Shape | Supports 2D small sample - exclusive mouth shape, with a maximum resolution of 1080p. | 500 USD/month/channel |
| 2D Small Sample Photo | Supports 2D small sample photos, with a maximum resolution of 1080p. | 500 USD/month/channel |
| 3D Realistic | Supports 3D realistic, with a maximum resolution of 1080p. | 800 USD/month/channel |

Interaction Service (Cloud Rendering Session - Driven Concurrency)

Interaction Service is the avatar's voice interaction capability, commonly used in intelligent customer service, digital human live streaming, and similar scenarios. It is sold as interactive concurrency, that is, the number of interactions and streams that can be online at the same time. Interactive concurrency is provided separately for each image type, and concurrency purchased for one image type cannot be used with another.
| Image Type | Feature Description | Price |
| --- | --- | --- |
| 2D Small Sample - General Lip Movement | Supports 2D small sample - general lip movement, with a maximum resolution of 1080p. | 500 USD/month/channel |
| 2D Small Sample - Exclusive Mouth Shape | Supports 2D small sample - exclusive mouth shape, with a maximum resolution of 1080p. | 500 USD/month/channel |
| 2D Small Sample Photo | Supports 2D small sample photos, with a maximum resolution of 1080p. | 500 USD/month/channel |
| 3D Realistic | Supports 3D realistic, with a maximum resolution of 1080p. | 800 USD/month/channel |
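
Because concurrency is priced per month for each concurrent path (channel) and cannot be shared across image types, the cost is a sum over image types. The following is a minimal sketch using the rates listed above; the function name and the input shape (a mapping from image type to channel count) are assumptions made for illustration.

```python
# Monthly price in USD for one concurrent path (channel), from the table above.
CONCURRENCY_USD_PER_MONTH = {
    "2D Small Sample - General Lip Movement": 500,
    "2D Small Sample - Exclusive Mouth Shape": 500,
    "2D Small Sample Photo": 500,
    "3D Realistic": 800,
}


def concurrency_cost(channels_by_type, months):
    """Total USD for the given channel counts per image type over `months`.

    Each image type is billed separately because concurrency purchased for
    one image type cannot be used with another.
    """
    return sum(
        CONCURRENCY_USD_PER_MONTH[image_type] * channels * months
        for image_type, channels in channels_by_type.items()
    )


if __name__ == "__main__":
    # Two 3D Realistic channels plus one photo-avatar channel for three months.
    print(concurrency_cost({"3D Realistic": 2, "2D Small Sample Photo": 1}, 3))
```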

Avatar Customization Service in Operation

It can be used to extend the validity period of avatar customization and voice replication.
| Image Type | Feature Description | Price |
| --- | --- | --- |
| 3D Avatar | Supports renewal of the on-shelf service for custom avatars in the 3D cartoon, 3D semi-realistic, and 3D realistic styles. | 84 USD/month/each |
| Voice Reproduce (VRS) - Ultra-fast Version (Minority Language) | Supports renewal of the on-shelf service for cloned voices. | 4 USD/month/each |

Privatized Service

If you need to purchase a privatized service, please contact your business manager for a quotation.

