I. Custom Material Self-Check Items
To customize an Avatar, you need to submit a video of a real person that is at least 1 minute long. Before submitting, check each of the following items:
1. Video quality: The face should be sharp, not blurry, with well-defined edges even when zoomed in. The footage should be stable, with no camera shake.
2. Model performance: The model should look directly at the camera, without significant head turns or tilts, and the face should remain unobstructed throughout the video.
3. Filming key points: The video should begin with 1 to 3 seconds of silence with the mouth closed. It must be a single unedited take with no skipped frames, and the total length must exceed 1 minute.
4. Audio: No audio recording is required. The model can simply keep their mouth naturally closed throughout the video.
5. Filming background: If a cutout (background removal) is required, film against a green screen or white screen. The screen should fill the entire background, with no other objects in the frame.
Video Format Requirements:
1. The file size must not exceed 5 GB, and the duration must be between 1 and 10 minutes.
2. The video format should be either MP4 or MOV.
3. The resolution should be 1080p (1920 x 1080) or 4K (3840 x 2160), in a 16:9 (landscape) or 9:16 (portrait) aspect ratio.
4. The video frame rate should be no less than 25 fps and no more than 60 fps.
5. The person's head must appear upright in the video. If the footage shows the person sideways, rotate the video to correct the orientation before submitting.
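The format requirements above are mechanical enough to check before uploading. The Python sketch below assumes you have already read the file's metadata into a hypothetical `VideoMeta` record (for example, using a probing tool of your choice). The record and function names, the reading of "5 GB" as 5 * 1024**3 bytes, and the exact 1080p/4K pixel sizes are assumptions, not part of any Avatar API. The upright-head requirement (item 5) still needs a visual check.

```python
from dataclasses import dataclass

# Assumption: "5 GB" is read as 5 * 1024**3 bytes; use 5 * 10**9 if decimal GB is meant.
MAX_SIZE_BYTES = 5 * 1024**3
MIN_DURATION_S, MAX_DURATION_S = 60.0, 600.0  # 1 to 10 minutes
ALLOWED_CONTAINERS = {"mp4", "mov"}
# 1080p and 4K, each in landscape (16:9) and portrait (9:16) orientation.
ALLOWED_RESOLUTIONS = {(1920, 1080), (1080, 1920), (3840, 2160), (2160, 3840)}
MIN_FPS, MAX_FPS = 25.0, 60.0


@dataclass
class VideoMeta:
    """Hypothetical metadata record for one submitted video."""
    container: str    # file extension or container name, e.g. "mp4"
    size_bytes: int
    duration_s: float
    width: int        # displayed width after any rotation is applied
    height: int
    fps: float


def check_format(meta: VideoMeta) -> list[str]:
    """Return a list of problems; an empty list means every check passed."""
    problems = []
    if meta.container.lower() not in ALLOWED_CONTAINERS:
        problems.append(f"format {meta.container!r} is not MP4 or MOV")
    if meta.size_bytes > MAX_SIZE_BYTES:
        problems.append("file is larger than 5 GB")
    if not MIN_DURATION_S <= meta.duration_s <= MAX_DURATION_S:
        problems.append(f"duration {meta.duration_s:.1f} s is outside 1-10 minutes")
    if (meta.width, meta.height) not in ALLOWED_RESOLUTIONS:
        problems.append(
            f"resolution {meta.width}x{meta.height} is not 1080p or 4K at 16:9 or 9:16"
        )
    if not MIN_FPS <= meta.fps <= MAX_FPS:
        problems.append(f"frame rate {meta.fps} fps is outside 25-60 fps")
    return problems


# Example: a 95-second 4K landscape MP4 at 30 fps passes every check.
print(check_format(VideoMeta("mp4", 2_000_000_000, 95.0, 3840, 2160, 30.0)))  # []
```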
II. Filming Guide (Text Version)
Filming Location Setup
1. Location selection
Note:
If you need to replace the background in post-production, film against a green screen or white screen. If you want a fixed background, film on site in a suitable environment; that background will be kept in all videos generated afterward.
On-site filming: Record in a well-lit, stable indoor or outdoor location. There are no specific audio requirements. (On-site filming means the background is fixed and cannot be replaced with another background in post-production.)
Green screen filming: Record in front of a well-lit, stable green screen. There are no specific audio requirements.
White screen filming: Record in a well-lit, stable, quiet room in front of a white wall or white screen. There are no specific audio requirements. (White screen filming currently does not support tables or chairs in the shot.)
2. Model clothing and style selection
Model: The model should have well-defined facial features, an attractive appearance, good on-camera presence, clear speech, and a natural manner. Models with extensive on-camera experience are preferred.
Clothing:
On-site filming: No specific color requirements for clothing.
Green screen filming: Avoid reflective materials and checkered or striped patterns. Do not wear green or green-like colors (such as yellow, green, or yellow-green), as they interfere with cutting out the background.
White screen filming: Avoid white or near-white clothing. White is acceptable only where it does not form the outer edge of the body's outline (e.g., an inner layer under a suit).
Hairstyle: Hair should be neat, without a noticeable parting or stray strands. Avoid dangling earrings. (These requirements apply only to green screen and white screen filming; there are no such restrictions for on-site filming.)
On-site filming example 1:
On-site filming example 2:
Green screen filming example:
White screen filming example:
Dual Avatar example: (Currently in testing, stay tuned for updates)
3. Filming equipment and lighting
Keep the camera stable, with no shaking, and keep the lighting consistent throughout the recording.
The recording resolution should be 1080p or higher, and HDR mode should not be enabled during filming.
The green screen should be smooth, without wrinkles, and should fully cover the entire frame.
4. Mobile filming standards
An iPhone is the preferred device for mobile filming. Use the rear camera in Video mode (not Cinematic mode), with the zoom set to 1x, the resolution set to 4K, and the frame rate set to 30 fps. Disable PAL formats, HDR video, and Auto FPS. You can change these settings under Settings > Camera > Record Video.
The specific settings are shown in the figure below:
Filming and Recording
1. Video recording position
2. Real-time monitoring and preview during filming
You can use software such as OBS to preview the cutout in real time. This lets you catch problems such as reflective accessories or green spill on the face and clothing before they end up in the footage, and fix them on set, avoiding reshoots and delays in the customization process.
3. Filming and recording (no audio required)
Shot selection: If the final video will be shown in portrait (vertical) format, shoot in portrait; likewise, shoot in landscape (horizontal) for landscape output. For full-body shots, make the subject as large as possible within the frame.
Recording process: Record using either of the following options. Throughout the recording, the model should look directly at the camera without glancing sideways, and all movements must stay within the frame.
Option 1: From the start of the recording, the model keeps their mouth closed and makes natural, subtle movements.
Option 2: At the start of the recording, the model keeps their mouth closed for 1 to 3 seconds. After that, the model can speak naturally (without exaggerated lip movements) while making natural, subtle movements.
You may stop recording once the duration exceeds 1 minute.
Movement suggestions: While speaking, the model can use neutral, general-purpose hand gestures. If unsure which gestures to use, the model can cross their hands in front of their body. Keep gestures small, slow, and gentle, and never let them block the neck or face. Avoid gestures with a specific meaning or direction, because the footage must suit any type of script. (If the Avatar will ultimately be used for real-time interaction, additional hand-movement requirements apply; see Section IV of this page.)
III. Post-Processing
1. Editing
Trim the beginning and end of the video to remove any unnecessary footage.
Ensure that the frame rate of the editing project matches that of the recorded material to avoid misalignment between the audio and lip movements.
2. Color correction and beautification
Correct any imperfections in the footage to ensure the model looks their best, but retain the natural texture of the model's skin. Avoid making the skin appear too white or too smooth.
3. Audio adjustment
If the audio contains noise, remove it to ensure good voice quality. Any audio kept in sync with the video should be clear.
4. Cutout
If you are able to perform the cutout yourself, you can apply it to the original video before submission. The available output options depend on which video material you provide.
Case 1: You provide a video that has already been cut out and placed on a green background (Video 2 in the figure below). The Avatar platform outputs video with that same green background (Video 3 in the figure below).
Clients can submit the cut-out, green-background video for training, and the Avatar platform uses the green background directly as the background of the output video. This approach makes customization more efficient and shortens delivery time. Follow these cutout guidelines:
Remove the green screen background and eliminate any green spill on the actor. Preview the result against several other background colors to confirm that the cutout is clean and will blend seamlessly with any background.
After the clean cutout, fill the background with a pure green color, #00ff00 (R:0, G:255, B:0).
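The fill step amounts to standard alpha compositing ("over") onto a solid color. The sketch below assumes each cut-out frame is available as straight (non-premultiplied) 8-bit RGBA pixels; decoding and re-encoding video frames is left to whatever video tool you use, and the function and type names are hypothetical. Passing a different `bg` color is one way to run the check against other background colors.

```python
# A frame is a list of rows; each pixel is (r, g, b, a) with 8-bit values.
Pixel = tuple[int, int, int, int]
RGB = tuple[int, int, int]

PURE_GREEN: RGB = (0, 255, 0)  # #00ff00


def composite_over(frame: list[list[Pixel]], bg: RGB = PURE_GREEN) -> list[list[RGB]]:
    """Alpha-composite a straight-alpha RGBA frame over a solid background color."""
    out = []
    for row in frame:
        new_row = []
        for r, g, b, a in row:
            t = a / 255.0  # 0.0 = fully transparent, 1.0 = fully opaque
            # out = alpha * foreground + (1 - alpha) * background, per channel
            new_row.append(
                tuple(round(t * fg + (1.0 - t) * bk) for fg, bk in zip((r, g, b), bg))
            )
        out.append(new_row)
    return out


# An opaque skin-tone pixel is kept; a fully transparent pixel becomes pure green.
print(composite_over([[(200, 150, 120, 255), (10, 10, 10, 0)]]))
# [[(200, 150, 120), (0, 255, 0)]]
```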
In this case, the videos and video streams output by the Avatar interaction & broadcasting API do not support background replacement: (1) the background of the output cannot be replaced, and (2) WebM videos with transparent backgrounds are not supported. After receiving the output video from the Avatar, the client must remove the green background themselves for their use case.
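For the client-side green removal, dedicated tools (such as the chroma key filters in OBS or ffmpeg) are the practical choice. Purely to illustrate the idea, the sketch below is a basic distance-based chroma key, not the Avatar platform's method: pixels close to #00ff00 become transparent, distant pixels stay opaque, and a linear ramp in between softens the edges. The `inner`/`outer` thresholds are illustrative assumptions, and the sketch does no spill suppression.

```python
import math

KEY: tuple[int, int, int] = (0, 255, 0)  # the #00ff00 background of the output video


def key_alpha(rgb, inner: float = 60.0, outer: float = 140.0) -> int:
    """Map one (r, g, b) pixel to an 8-bit alpha by its RGB distance from KEY.

    distance <= inner -> 0 (transparent); distance >= outer -> 255 (opaque);
    in between -> linear ramp. Threshold values are illustrative, not tuned.
    """
    d = math.dist(rgb, KEY)
    if d <= inner:
        return 0
    if d >= outer:
        return 255
    return round(255 * (d - inner) / (outer - inner))


def key_frame(frame):
    """Turn a frame of (r, g, b) pixels into (r, g, b, a) pixels."""
    return [[(*px, key_alpha(px)) for px in row] for row in frame]


# The green background becomes transparent; the subject pixel stays opaque.
print(key_frame([[(0, 255, 0), (200, 150, 120)]]))
# [[(0, 255, 0, 0), (200, 150, 120, 255)]]
```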
Case 2: In addition to the original recorded video, you provide a separate video with an alpha channel (Video 2 in the figure below). The Avatar platform then supports background replacement in its output (Video 3 in the figure below).
You need to provide both Video 1, the original recorded video (a processed version is also acceptable), and Video 2, the alpha channel video. The two videos must have exactly the same resolution and duration.
In this case, the videos and video streams output by the Avatar interaction & broadcasting API support background replacement.
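Because the two Case 2 videos must match exactly in resolution and duration, a quick comparison before submission can catch mismatches. This sketch assumes a hypothetical `ClipInfo` record filled in from each file's metadata; the default tolerance of 0 seconds reflects "exactly the same", and comparing frame counts is another option if your metadata provides them.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ClipInfo:
    """Hypothetical metadata for one video file."""
    width: int
    height: int
    duration_s: float


def check_alpha_pair(original: ClipInfo, alpha: ClipInfo, tol_s: float = 0.0) -> list[str]:
    """Compare Video 1 (original) with Video 2 (alpha channel); empty list = match."""
    problems = []
    if (original.width, original.height) != (alpha.width, alpha.height):
        problems.append(
            f"resolution differs: {original.width}x{original.height} "
            f"vs {alpha.width}x{alpha.height}"
        )
    if abs(original.duration_s - alpha.duration_s) > tol_s:
        problems.append(
            f"duration differs: {original.duration_s} s vs {alpha.duration_s} s"
        )
    return problems


video1 = ClipInfo(1920, 1080, 180.0)
video2 = ClipInfo(1920, 1080, 180.0)
print(check_alpha_pair(video1, video2))  # []
```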
IV. Recording Requirements for Avatars in Interactive Scenes
If the Avatar is intended for real-time interactive scenes, there are additional requirements for the model's hand movements during the 3-5 minute video recording. The specific requirements are as follows:
Keep each movement brief (see the hand movement illustration). After each movement, quickly return the hands to the starting position (see the hand position reset illustration). Leave at least 5 seconds between movements.
Note: The final Avatar will replicate the movements exactly as they were performed during filming. If no movements are performed throughout the recording, the final Avatar will also have no hand movements.
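If you log the start and end time of each hand movement while reviewing the footage, the timing rules in this section can be checked mechanically: each movement lasts at most 2 seconds (per the hand movement illustration section), movements are at least 5 seconds apart, and the recording runs 3 to 5 minutes. This is a hypothetical helper, not part of any Avatar tooling, and it assumes the 5-second gap is measured from the end of one movement to the start of the next.

```python
Interval = tuple[float, float]  # (start_s, end_s) of one logged hand movement


def check_movements(movements: list[Interval], recording_s: float,
                    max_len_s: float = 2.0, min_gap_s: float = 5.0) -> list[str]:
    """Return timing problems found in a list of hand-movement intervals."""
    problems = []
    if not 180.0 <= recording_s <= 300.0:
        problems.append(f"recording is {recording_s:.0f} s, expected 3-5 minutes")
    prev_end = None
    for start, end in sorted(movements):
        if not 0.0 <= start < end <= recording_s:
            problems.append(f"movement {start}-{end} s is not a valid interval in the recording")
            continue
        if end - start > max_len_s:
            problems.append(f"movement at {start} s lasts {end - start:.1f} s (max {max_len_s} s)")
        # Assumption: the gap runs from the previous movement's end to this one's start.
        if prev_end is not None and start - prev_end < min_gap_s:
            problems.append(
                f"only {start - prev_end:.1f} s before movement at {start} s (min {min_gap_s} s)"
            )
        prev_end = end
    return problems


# Two short movements 8.5 s apart in a 200-second recording pass every check.
print(check_movements([(10.0, 11.5), (20.0, 21.0)], 200.0))  # []
```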
1. Hand movement illustration:
The model can make general-purpose hand movements. Each movement should last no more than 2 seconds, after which the hands quickly return to the starting position. This segment is used for the speaking mode in Avatar interactive scenes. The illustration is as follows:
2. Hand position reset illustration:
In this segment, the model continues speaking naturally while keeping the hands free of any noticeable movement. This segment is used for the listening/waiting mode in Avatar interactive scenes. The illustration is as follows:
3. Reference video of a demo recording: