Issue Description | Answer | Optimization Suggestions | Example |
--- | --- | --- | --- |
What impact will there be if the video material contains cuts or frame skips in the middle? | The generated Avatar will show frame skips at the same positions. When exporting the demo, you can manually select segments, but the selection must include at least 1 minute of continuous footage. | Trim only the beginning and end of the video; do not cut anything out of the middle. The material you provide must be continuous, uninterrupted footage. | |
What impact will there be if beauty filters applied to the video material cause jitter (on the face, waist, etc.)? | The generated Avatar will show the same jitter at the same positions. When exporting the demo, you can manually select segments, but the selection must include at least 1 minute of continuous footage. | If the video needs beautification, check the processed footage for any jitter before submitting it. | |
What impact will there be if the person in the video turns their head at a large angle, shows a pronounced side profile, or looks up and down? | 1. Face detection may fail during training, reducing the effective video duration to less than 3 minutes, which can degrade the final lip-sync quality. 2. Lip-sync quality will be noticeably worse just before and after large head turns. | The person should avoid large head turns. When filming a seated person from an angle, make sure the mouth stays fully visible. | |
What impact will there be if the face or chin is blocked by hands or other objects in the video material? | 1. Obstructions to the face can cause training to end prematurely, reducing the effective video duration to less than 3 minutes, which can degrade the final lip-sync quality. 2. If the chin is blocked, the resulting Avatar may have missing parts where the chin was covered; for example, the mouth may be drawn over the hand during lip-syncing. | Keep the hands out of the head area while gesturing; the face must remain fully visible throughout the video. | |
What impact will there be if the video is shorter than 3 minutes and is padded out by repeating and stitching segments together, or if the same passage of text is read repeatedly? | Insufficient variety in lip movements significantly reduces the accuracy of the lip-syncing. | Record 3 to 5 minutes of video, reading varied text rather than repeating the same passage. | |
What impact will camera shaking have? | The generated Avatar may also exhibit shaking. You will need to manually select stable, continuous segments for training. If there is no stable segment of at least one minute, re-recording will be necessary. | Ensure that the camera remains fixed and stable throughout the entire recording, with no changes in the camera position. | |
What impact will there be if clothes, chairs, tables, or other props are the same color as the green screen? | 1. Segmentation quality is likely to be poor. 2. During green screen removal, items of the same color may be partially keyed out, causing color changes. | Avoid green-toned clothing and props. | |
How do I handle strong green light reflected from the person, tables, and props? | During training, applying stronger green screen removal may distort colors in the areas where green is removed. If stronger removal is not applied, the resulting Avatar may show green spill. | Reduce green reflections at the filming stage, for example by adjusting the lighting and the distance between the person and the green screen. | |
How do I handle reflections and green tints in eyeglasses? | Green areas on the lenses may be detected as background and removed during segmentation. | During filming, adjust the lighting or the angle of the glasses so that the lenses do not reflect green. | |
How can I reduce the reflection of green light onto a person? | 1. Choose Oxford fabric for the green screen and install backlighting to ensure no light seeps through from behind. 2. Keep the model at least 1.5 meters away from the green screen. 3. Surround the model with lights to eliminate green tints on the contours. 4. Light the green screen and model separately. 5. Use black cloths to cover surrounding green areas that are not needed for the shot, reducing diffused light. | | |
What impact will there be if there are multiple voices speaking in the video? | It can interfere with lip movement recognition, negatively affecting the quality of the lip-syncing. | Ensure a quiet environment during recording. If this is not possible, adjust the microphone's pickup range to minimize the capture of other voices or apply post-processing to the audio. | |
What impact will there be if a blue screen is used for recording? | Blue light reflected onto the body cannot be removed automatically. You will need to key out the blue light manually and provide the processed material together with its channel file; production can then proceed normally. | Prepare a green screen in advance instead of recording against a blue screen. | |
What impact will there be if the gaze does not focus on the camera and is unsteady? | The generated Avatar's gaze will also appear unsteady and unfocused. | It is recommended to maintain direct eye contact with the camera throughout the entire recording. | |
What impact will there be if the recording does not start with 1-3 seconds of silence? | 1. The generated Avatar's mouth may remain open during silent moments. 2. During training, frames between speech segments where the mouth is closed can be selected manually as silent frames, but this may produce unnatural transitions. | Start the recording with 1-3 seconds of silence, keeping the mouth closed. | |
What impact will there be if the hair parting or stray hairs are prominent? | 1. Stray hairs outside the main mass of hair may disappear during segmentation. 2. A prominent parting may cause the outer hair to disappear or flicker, and the parting itself may not be segmented properly. | Before recording, style your hair so that the green screen does not show through the parting, and minimize stray hairs as much as possible. | |
What impact will there be if earrings are worn? | 1. If the background around the earrings is a green screen, the earrings may disappear or flicker after segmentation. 2. If the area around the earrings is covered by hair, there will be no impact, and the segmentation will proceed normally. | It is recommended to avoid wearing earrings. If earrings are worn, ensure that they remain within the area covered by hair in the frame. | |
What impact will there be if metal accessories, such as jewelry, buttons, watches, or necklaces, are worn? | 1. They may reflect the green screen, which can cause color changes or flickering after green screen removal. 2. A necklace that is very thick or sits close to the head may cause face detection to fail. | Avoid wearing metal accessories. If you do wear them, choose ones that reflect as little of the green screen as possible. | |
What impact will there be if there are small enclosed gaps under the arms or between the legs? | If the gaps are very dark or very small, segmentation may be inaccurate. | Adjust the pose during filming to either widen the gaps or close them completely. | |
What impact will there be if movements extend beyond the frame? | When the generated Avatar performs the same action, the parts that go outside the frame will disappear. During training, you will need to manually select continuous segments where the actions stay within the frame. | Keep movements within the frame, avoiding any actions that extend beyond the boundaries of the screen. | |
What impact will there be if the video contains many specific gestures with low reusability (e.g., counting 1, 2, 3 on the fingers)? | When the generated Avatar performs random actions, these specific gestures may not match the text being spoken, producing unnatural results. | Favor general-purpose gestures that fit a wide range of content. | |
What impact will there be if the video has significant background noise? | It can reduce the accuracy of the lip-syncing. | Record with a microphone. You can also lower the recording level and raise your speaking volume accordingly. | |
What should I consider when recording in a seated position at a table? | Make sure the table does not reflect green from the green screen and stays steady without shaking. | | |
Can I record in side profile? | Yes, as long as the facial features and mouth remain fully visible at all times. Avoid turning too far to the side. | | |
Is it necessary to record audio when capturing the Avatar? | Yes. Audio is required and must be synchronized with the video. The algorithm uses the audio and video as a paired set for lip-sync training, so the audio that matches the video is essential. | | |
What should be done if a fill light or other objects appear in the frame? | 1. Ensure that the person is fully within the green screen area. 2. Ensure that any unnecessary objects do not overlap or intersect with the body in the frame, maintaining a clear separation. | | |
What should be done if the text is read incorrectly? | 1. Mispronounced words during the video recording process can be ignored. 2. If a word is mispronounced during audio recording, pause for two seconds and then re-read the sentence. | | |
Can the client's own text be used? | Yes, and it is recommended that the client reads text that aligns with the type of content being produced. | | |
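
Whether a take meets the 1-3 second leading-silence recommendation can be screened before submission. The Python sketch below is not part of any official tooling, and `read_pcm16` and `leading_silence_seconds` are hypothetical names. It assumes the audio track has already been exported as a 16-bit PCM WAV file (for example with ffmpeg), and it treats a 20 ms window as silent when its RMS level is below -40 dBFS; both thresholds are illustrative assumptions, not documented requirements.

```python
import array
import math
import sys
import wave


def read_pcm16(path):
    """Read a 16-bit PCM WAV file; return (interleaved samples, rate, channels)."""
    with wave.open(path, "rb") as wav:
        if wav.getsampwidth() != 2:
            raise ValueError("expected 16-bit PCM audio")
        rate, channels = wav.getframerate(), wav.getnchannels()
        data = wav.readframes(wav.getnframes())
    samples = array.array("h", data)
    if sys.byteorder == "big":
        samples.byteswap()  # WAV sample data is little-endian
    return samples, rate, channels


def leading_silence_seconds(samples, rate, channels=1,
                            threshold_dbfs=-40.0, window_ms=20):
    """Seconds of near-silence at the start of the audio.

    A window counts as silent when its RMS level is below threshold_dbfs
    (relative to 16-bit full scale). Both defaults are assumptions.
    """
    # Samples per window, covering all interleaved channels.
    window = max(1, rate * window_ms // 1000) * channels
    threshold = 32768 * 10 ** (threshold_dbfs / 20)
    silent = 0
    for start in range(0, len(samples), window):
        chunk = samples[start:start + window]
        rms = math.sqrt(sum(s * s for s in chunk) / len(chunk))
        if rms >= threshold:
            break  # first window with audible sound ends the leading silence
        silent += len(chunk)
    return silent / channels / rate
```

A take falls within the recommendation when `1 <= leading_silence_seconds(*read_pcm16("take.wav")) <= 3`, where the file name is a placeholder. Breathing or room noise above the threshold ends the measured silence early, so this works best as a quick screen, not a final verdict.
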

Issue Description | Answer | Optimization Suggestions |
--- | --- | --- |
What impact do reverb and noise have? | They can easily lead to poor results after voice training. | 1. Record in a room with minimal echo and good soundproofing (e.g., a bedroom). 2. Record with a microphone and adjust its settings to minimize noise pickup. 3. Apply post-production audio processing to reduce reverb and noise and improve the demo quality. |
What impact will there be if the Mandarin pronunciation is not standard? | After voice training, the pronunciation may sound unusual. | It is recommended to use standard Mandarin with clear enunciation. |
What impact will there be if ASR segmentation yields fewer than 50 sentences? | This usually means the recording is too short. Too few sentences can severely degrade voice quality, and additional recording will be required. | Follow the voice recording guidelines: the recording should contain at least 100 segments and last more than 10 minutes. |
What happens if the audio amplitude is too high (clipping or popping)? | The trained voice may reproduce the same audio artifacts. | Adjust the recording equipment to improve audio quality, or provide post-processed audio material. |
What impact will there be if the audio contains noticeable mouth noises (saliva sounds) or breathing noises? | The trained voice may reproduce the same noises. | Avoid these noises during recording, or provide post-processed audio material. |
How should I choose a proper location for audio recording? | It is recommended to record in a quiet location with plenty of soft materials, such as in bedrooms or cars. | |
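
The duration and clipping recommendations in the table above can be screened automatically before the audio is submitted. The following Python sketch is a hypothetical helper, not part of the product. It expects `samples` to be a sequence of signed 16-bit integers (for example, decoded from a PCM WAV file with Python's `wave` module). The clipping limit it uses (at most 0.1% of samples within 1% of full scale) is an illustrative assumption. It cannot count ASR sentences, so the 100-segment guideline still has to be checked separately.

```python
FULL_SCALE = 32767  # largest positive signed 16-bit sample value


def clipping_ratio(samples, margin=0.99):
    """Fraction of samples whose magnitude reaches margin * FULL_SCALE."""
    if not samples:
        return 0.0
    limit = margin * FULL_SCALE
    clipped = sum(1 for s in samples if abs(s) >= limit)
    return clipped / len(samples)


def check_voice_take(samples, rate, channels=1,
                     min_minutes=10.0, max_clip_ratio=0.001):
    """Return a list of problems found in a voice take (empty if none).

    min_minutes mirrors the "more than 10 minutes" guideline; the clipping
    limit max_clip_ratio is an illustrative assumption.
    """
    problems = []
    minutes = len(samples) / channels / rate / 60
    if minutes <= min_minutes:
        problems.append(
            f"too short: {minutes:.1f} min (must exceed {min_minutes:g} min)")
    ratio = clipping_ratio(samples)
    if ratio > max_clip_ratio:
        problems.append(f"clipping: {ratio:.2%} of samples near full scale")
    return problems
```

An empty list means neither check found a problem. For example, a 12-minute mono recording at 48 kHz with no samples near full scale passes both checks, so `check_voice_take(samples, 48000)` returns `[]`.
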