2D/3D Avatar | Avatar customization | Customize an exclusive digital human avatar. A 3D avatar also requires purchasing the cloud-driven engine; a 2D avatar does not. Avatars can be purchased either as a customization or as a rental. | | |
Renewal of customized avatar | A customized avatar is valid for 1 year by default. This service is purchased to keep using a customized avatar after it expires. A rented avatar does not need this service: when the rental expires, purchase the rental again directly. | | | |
Application scenarios | Conversation interaction | Cloud Rendering | The avatar is rendered in the cloud and pushed to the terminal for real-time display. You need to purchase "Cloud Rendering Session Driver Concurrency". API and SDK are supported. Mutually exclusive with local rendering. | |
| | Local Rendering | The avatar is rendered and displayed locally on the terminal; the cloud service only pushes the conversation content. You need to purchase the "Local Rendering Session Driver Usage Package" or a terminal-authorized license. API and SDK are supported. Mutually exclusive with cloud rendering. | |
| Audio-video broadcasting | Generate video (including audio) | Generate a video from preset text using a specified avatar and voice. You need to purchase the "Video Broadcast Synthesis Hour Package" (which includes audio synthesis). | |
| | Generate audio only | Generate audio from preset text using a specified voice. You need to purchase the "Audio Broadcast Synthesis Hour Package" (required when generating audio only). | |
| | Concurrent audio-video broadcasting | Adds concurrent channels so that videos or audio are generated faster; the generated results are unaffected. Optional purchase. | |
Voice customization | Voice replication | Trains a specified voice timbre from the voice recordings you provide, for use in the application scenarios above. | | |
Renewal of customized voice | A replicated voice is valid for 1 year by default. This service is purchased to keep using a replicated voice after it expires. | | |
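Concurrency only changes how fast a batch finishes, not what is generated. As a rough illustration, assuming every job takes the same time and each channel handles one job at a time (a simplification, not a documented scheduling rule), the batch duration shrinks with the channel count roughly as below. The function names are hypothetical.

```python
import math


def batch_rounds(num_jobs: int, channels: int) -> int:
    """Number of rounds needed when each channel processes one job at a time."""
    if num_jobs < 0 or channels < 1:
        raise ValueError("num_jobs must be >= 0 and channels must be >= 1")
    return math.ceil(num_jobs / channels)


def estimated_wall_time(num_jobs: int, minutes_per_job: float, channels: int) -> float:
    """Rough batch duration in minutes.

    Assumption (not from the product docs): all jobs take the same time and
    channels run fully in parallel, so only the number of rounds matters.
    """
    return batch_rounds(num_jobs, channels) * minutes_per_job


if __name__ == "__main__":
    # 10 jobs that each take about 6 minutes to generate: 1 channel vs 4 channels.
    print(estimated_wall_time(10, 6, 1))  # 60
    print(estimated_wall_time(10, 6, 4))  # 18
```

Real jobs vary in length, so treat this as an upper-level estimate of the speed-up, not a guarantee.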
Avatar Type | Definition | Use Cases | Example |
2D Premium | Motion material is recorded in a professional studio and the model is trained for about two weeks, producing a digital human for broadcast and interactive scenarios. A premium avatar can insert specified motions at any point in the text, and its motions are diverse. | Suitable for finance and media customers with requirements for the digital human's appearance and motions. | |
2D Small Sample - General Lip Movement | Trains a digital human from a video of a real person. The digital human looks the same as the person, but its mouth uses generic lips and teeth generated by a large model. The requirements for the training video are less strict. For details, see Image Recording Guide - General Lip Shape. | Suitable for customers with no specific lip-shape requirements or without good filming conditions. | |
2D Small Sample - Exclusive Lip Shape | Trains a digital human from a video of a real person. The digital human looks the same as the person, and its mouth reproduces the person's own lips and teeth. The training video must contain no other voices and no noticeable background noise. For details, see Image Recording Guide - Exclusive Mouth Shape. | Suitable for customers who require a faithful replica of the person and have good filming conditions. | |
2D Small Sample - High-Precision Version | Trains a digital human from a 4K video of a real person. Material requirements and the resulting lips and teeth are the same as for 2D Small Sample - Exclusive Lip Shape, but the final digital human is output at 4K. For details, see Avatar Recording Guide - High-Precision Version. | Suitable for large conferences, face-to-face dialogue, product launch events, and large-screen scenarios. | |
2D Small Sample - Photo Avatar | Trains a digital human from a single photo; this version is designed for low cost and fast turnaround. It is generally ready for use within 10 minutes of submitting the material. | Suitable for general internet and entertainment scenarios. | |
3D Cartoon | Concept art is created with the facial features, hairstyle, clothing, accessories, and other details the customer specifies. Once the customer reviews and approves the final design, the model is produced. After rigging, rendering, UE optimization, and other stages, the output is a digital human covering both interactive and broadcast scenarios. | Suitable when an existing 2D mascot is to be upgraded to a 3D avatar that serves users. | |
3D Semi-Realistic | Concept art is created with the facial features, hairstyle, clothing, accessories, and other details the customer specifies. Once the customer reviews and approves the final design, the model is produced. After rigging, rendering, UE optimization, and other stages, the output is a digital human covering both interactive and broadcast scenarios. | Suitable for scenarios that need some realism but not high precision, such as news reading and mobile smart customer service. | |
3D Realistic | Concept art is created with the facial features, hairstyle, clothing, accessories, and other details the customer specifies. Once the customer reviews and approves the final design, the model is produced. After rigging, rendering, UE optimization, and other stages, the output is a digital human covering both interactive and broadcast scenarios. | Suitable for scenarios that need high realism and high-precision display, such as brand promotion and large-screen interaction. | |
| 2D Small Sample - General Lip Movement | 2D Small Sample - Exclusive Mouth Shape | 2D Small Sample - High-Precision Version | 2D Small Sample - Photo Avatar |
Recording requirements | Record a video of at least 1 minute. There are no audio requirements. | Record a video of at least 3 minutes in a quiet environment; only the subject's voice may be audible. | Same recording standard as Exclusive Mouth Shape, and the video resolution must be 4K. | Only one clear, front-facing photo of the person is required. |
Delivery cycle | A demo is delivered within 1 day for the customer to check the effect; the avatar can be used once the customer confirms it. | A demo is delivered within 2 days for the customer to check the effect; the avatar can be used once the customer confirms it. | A demo is delivered within 3 days for the customer to check the effect; the avatar can be used once the customer confirms it. | Available within 10 minutes. |
Finished product effect | Uses generic lips and teeth generated by a large model. | Reproduces the person's own lip movements, with better facial resolution. | Same effect as Exclusive Mouth Shape, output at 4K for higher definition. | Uses generic lips and teeth generated by a large model; the body stays still and cannot sway slightly. |
General lip movement vs exclusive lip movement | | | | |
General lip movement vs photo avatar | | | | |
Exclusive lip movement vs high-precision version | | | | |
Avatar Type | Feature Description | Price |
2D Small Sample - General Lip Movement | Cloud services only. Can be driven by text or original audio. A digital human is customized from 1 minute of video footage and includes 1 default voice. Clothing, pose, and motions follow the video footage. Background replacement is supported only if the footage has a solid-color green-screen background. | 200 USD/each |
2D Small Sample - Exclusive Mouth Shape | Can be driven by text or original audio. A broadcast digital human is customized from one 3-minute video and includes 1 default voice. Clothing, pose, and motions follow the video footage. Background replacement is supported only if the footage has a solid-color green-screen background. | 1,000 USD/each |
2D Small Sample - Photo Avatar | Can be driven by text or original audio. A digital human is trained from a single photo, at low cost and with fast customization. | 2.5 USD/each |
3D Cartoon | Not limited to cloud services; private deployment is supported. Can be driven by text, audio, or monocular camera video. Includes 1 outfit, 8 motions, and 1 voice. The accompanying 3D cartoon assets are Grade B precision. | Contact us to get a quotation. |
3D Semi-Realistic | Not limited to cloud services; private deployment is supported. Customized from the "Yunyi" body model. Can be driven by text, audio, or monocular camera video. Includes 1 outfit, 8 motions, and 1 voice. The accompanying 3D semi-realistic assets are Grade A precision. | Contact us to get a quotation. |
3D Realistic | Not limited to cloud services; private deployment is supported. Can be driven by text, audio, or monocular camera video. Built on the default 3D portrait (see the body template of the YouYou avatar), with face shape, hairstyle, clothing, and motions customized as required. The complete model set includes 1 face shape, 1 hairstyle, 1 outfit, and a library of 8 motions. Additional hairstyles, clothing, motions, or expressions must be added to the cart as extra items. The accompanying 3D realistic assets are Grade S precision. | Contact us to get a quotation. |
Category | Feature Description | Price |
Voice Reproduce (VRS) - Ultra-fast Version | Provide a few seconds of audio and get an exclusive AI-customized timbre within 10 minutes. Mainly used together with photo avatars, with an emphasis on immediate usability. See Voice Clone Recording Guide - Ultra-fast Version. | 2.5 USD/each |
Voice Reproduce (VRS) - Ultra-fast Version (Minority Language) | Ultra-fast voice cloning for minority languages. | 50 USD/each |
Avatar Type | Feature Description | Price |
2D Small Sample - General Lip Movement | 10 hours of voice and video generation with a rented or cloned digital human | 1800 USD/each |
2D Small Sample - Exclusive Mouth Shape | 10 hours of voice and video generation with a rented or cloned digital human | 1800 USD/each |
2D Small Sample - Photo Avatar | 10 hours of voice and video generation with a rented or cloned digital human | 1800 USD/each |
3D Realistic | 10 hours of voice and video generation with a rented or cloned digital human | 3600 USD/each |
Hour Package Type | Feature Description | Price |
General Audio Broadcast Synthesis | Cloud services only. 1 hour of audio generation using the voice of a rented or cloned digital human. | 10 USD/each |
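To estimate how many hour packages a workload needs, here is a minimal sketch. It assumes packages are bought whole and that consumption equals the total generated duration; the actual metering and rounding rules are not stated here. Prices are copied from the hour-package tables above, and the function names are hypothetical.

```python
import math

# Hour-package sizes and prices (USD) copied from the tables above.
VIDEO_PACKAGE_HOURS = 10
VIDEO_PACKAGE_PRICE = {
    "2D Small Sample - General Lip Movement": 1800,
    "2D Small Sample - Exclusive Mouth Shape": 1800,
    "2D Small Sample - Photo Avatar": 1800,
    "3D Realistic": 3600,
}
AUDIO_PACKAGE_HOURS = 1
AUDIO_PACKAGE_PRICE = 10


def packages_needed(hours: float, package_hours: int) -> int:
    """Smallest number of whole packages that covers `hours` of generation.

    Assumption: packages are bought whole; partial packages are not sold.
    """
    if hours < 0:
        raise ValueError("hours must be non-negative")
    return math.ceil(hours / package_hours)


def video_cost(avatar_type: str, hours: float) -> int:
    """Estimated USD cost of video hour packages for one avatar type."""
    return packages_needed(hours, VIDEO_PACKAGE_HOURS) * VIDEO_PACKAGE_PRICE[avatar_type]


def audio_cost(hours: float) -> int:
    """Estimated USD cost of audio-only hour packages."""
    return packages_needed(hours, AUDIO_PACKAGE_HOURS) * AUDIO_PACKAGE_PRICE


if __name__ == "__main__":
    print(video_cost("3D Realistic", 25))  # 3 packages -> 10800
    print(audio_cost(2.5))                 # 3 packages -> 30
```

Rounding up with `math.ceil` reflects the whole-package assumption: 25 hours of 3D Realistic video needs three 10-hour packages, not two and a half.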
Avatar Type | Feature Description | Price |
2D Small Sample - General Lip Movement | Supports 2D Small Sample - General Lip Movement avatars at a maximum resolution of 1080p. | 500 USD/month/channel |
2D Small Sample - Exclusive Mouth Shape | Supports 2D Small Sample - Exclusive Mouth Shape avatars at a maximum resolution of 1080p. | 500 USD/month/channel |
2D Small Sample - Photo Avatar | Supports 2D Small Sample - Photo Avatar avatars at a maximum resolution of 1080p. | 500 USD/month/channel |
3D Realistic | Supports 3D Realistic avatars at a maximum resolution of 1080p. | 800 USD/month/channel |
Avatar Type | Feature Description | Price |
2D Small Sample - General Lip Movement | Supports 2D Small Sample - General Lip Movement avatars at a maximum resolution of 1080p. | 500 USD/month/channel |
2D Small Sample - Exclusive Mouth Shape | Supports 2D Small Sample - Exclusive Mouth Shape avatars at a maximum resolution of 1080p. | 500 USD/month/channel |
2D Small Sample - Photo Avatar | Supports 2D Small Sample - Photo Avatar avatars at a maximum resolution of 1080p. | 500 USD/month/channel |
3D Realistic | Supports 3D Realistic avatars at a maximum resolution of 1080p. | 800 USD/month/channel |
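The concurrency prices above are quoted per channel per month. Assuming cost scales linearly with channels and months and no discounts apply (an assumption, not a stated billing rule), a budget estimate is a single multiplication. The dictionary and function names below are hypothetical.

```python
# USD per concurrent channel per month, copied from the tables above.
CHANNEL_PRICE_PER_MONTH = {
    "2D Small Sample - General Lip Movement": 500,
    "2D Small Sample - Exclusive Mouth Shape": 500,
    "2D Small Sample - Photo Avatar": 500,
    "3D Realistic": 800,
}


def concurrency_cost(avatar_type: str, channels: int, months: int) -> int:
    """Estimated cost in USD, assuming linear pricing with no discounts."""
    if channels < 1 or months < 1:
        raise ValueError("channels and months must be >= 1")
    return CHANNEL_PRICE_PER_MONTH[avatar_type] * channels * months


if __name__ == "__main__":
    print(concurrency_cost("3D Realistic", 2, 3))  # 4800
    print(concurrency_cost("2D Small Sample - General Lip Movement", 4, 12))  # 24000
```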
Type | Feature Description | Price |
3D Avatar | Renews the on-shelf service for custom avatars in the 3D cartoon, 3D semi-realistic, and 3D realistic styles. | 84 USD/month/each |
Voice Reproduce (VRS) - Ultra-fast Version (Minority Language) | Renews the on-shelf service for cloned voices. | 4 USD/month/each |