Feature | Description |
Increased compatibility | A source video can be transcoded to formats (such as MP4) that are compatible with more types of devices for smooth playback. |
Increased bandwidth adaptability | A source video can be transcoded for output in multiple definitions such as LD, SD, HD, and UHD. End users can select the most appropriate bitrate depending on their network conditions. |
Improved playback efficiency | The moov atom can be moved from the end of an MP4 file to the beginning of the file, allowing the video to be played before it is entirely downloaded. |
Reduced bandwidth consumption | With a more advanced codec (such as H.265), the bitrate of a video can be substantially reduced while retaining the original quality, which reduces bandwidth consumption. |
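To illustrate the "Improved playback efficiency" row above, the following minimal sketch checks whether the `moov` box of an MP4 file precedes the `mdat` (media data) box. It relies only on the standard MP4 box layout (a 4-byte big-endian size followed by a 4-byte type). The function names `top_level_boxes`, `is_fast_start`, and `make_box` are hypothetical helpers written for this example and are not part of the service.

```python
import io
import struct
from typing import BinaryIO, List, Tuple


def top_level_boxes(stream: BinaryIO) -> List[Tuple[str, int]]:
    """Return (type, size) for each top-level box in an MP4 stream."""
    boxes = []
    while True:
        header = stream.read(8)
        if len(header) < 8:
            break  # end of stream
        size, box_type = struct.unpack(">I4s", header)
        header_len = 8
        if size == 1:
            # A size of 1 means a 64-bit size follows the type field.
            ext = stream.read(8)
            if len(ext) < 8:
                break
            size = struct.unpack(">Q", ext)[0]
            header_len = 16
        elif size == 0:
            # A size of 0 means the box runs to the end of the file.
            boxes.append((box_type.decode("latin-1"), 0))
            break
        if size < header_len:
            break  # malformed box; stop scanning
        boxes.append((box_type.decode("latin-1"), size))
        stream.seek(size - header_len, 1)  # skip the payload
    return boxes


def is_fast_start(stream: BinaryIO) -> bool:
    """True if the moov box appears before the mdat box."""
    types = [t for t, _ in top_level_boxes(stream)]
    if "moov" not in types or "mdat" not in types:
        return False
    return types.index("moov") < types.index("mdat")


def make_box(box_type: bytes, payload: bytes = b"") -> bytes:
    """Build a tiny synthetic box for demonstration purposes."""
    return struct.pack(">I4s", 8 + len(payload), box_type) + payload


# Synthetic example: moov placed at the end, as in a file that has not been optimized.
slow = make_box(b"ftyp", b"isom") + make_box(b"mdat", b"y" * 20) + make_box(b"moov", b"x" * 10)
print(is_fast_start(io.BytesIO(slow)))
```

A file for which `is_fast_start` returns `False` is the kind of input that benefits from moving the moov atom to the beginning.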
Category | Parameter | Description |
Input | Container format | 3GP, AVI, FLV, MP4, M3U8, MPG, ASF, WMV, MKV, MOV, TS, WebM, MXF. |
| Video codec | AV1, AVS2, H.264/AVC, H.263, H.263+, H.265, MPEG-1, MPEG-2, MPEG-4, MJPEG, VP8, VP9, RealVideo, Windows Media Video, QuickTime. |
| Audio codec | AAC, ADPCM, AMR, DSD, MP1, MP2, MP3, PCM, RealAudio, Windows Media Audio, Vorbis, AC-3. |
Output | Container format | Video: FLV, MP4, HLS (M3U8 + TS), MXF. |
| | Audio: MP3, MP4, Ogg, FLAC, M4A. |
| | Image: GIF, WebP. |
| Video codec | H.264/AVC, H.265/HEVC, AV1. |
| Audio codec | MP3, AAC, FLAC, MP2, Vorbis. |
Packaging | Delete video streams | If this is enabled, the transcoding result will contain only audio streams. |
| Delete audio streams | If this is enabled, the transcoding result will contain only video streams. |
Enhancement Type | Features | Description |
Video Enhancement | Super Resolution | Identifies the content and contours in a video and reconstructs its details and local features in high definition, converting low-resolution video into high-resolution video. Suitable for scenarios such as old film restoration. |
| Low-light Enhancement | Environmental conditions and camera hardware limitations can leave some scenes short on brightness and contrast, resulting in dark images or missing details. Low-light enhancement significantly improves the details and contrast in dark areas, improving perceived visual quality. |
| HDR | Supports HDR10 and HLG, which offer a wider color gamut and more color detail for higher-quality video content. |
| Comprehensive Enhancement | Uses AI-based analysis to automatically balance the texture content in the image, enhancing key details while removing compression artifacts and jagged edges, thereby improving the overall perceived quality of the image. |
| Color Enhancement | Brings the image closer to real colors and moderately boosts them to suit viewer preferences. |
| Detail Enhancement | Detail enhancement focuses on the details in the video that require attention (e.g., the grass on a sports field), making the image content clearer and richer. |
| Face Enhancement | Enhances the areas of the video that viewers focus on most, such as faces, making the details in these areas clearer and improving perceived quality. |
| Scratches Removal | Repairs scratches and snowflake-like spots in the video, restoring damaged content. |
| Artifacts Removal | Repeated compression, whether within one transcoding pass or across multiple transcoding passes, introduces blocking, ringing, chroma bleeding, and mosquito noise that distort the picture. Artifact removal repairs these encoding-induced distortions. |
| Video Noise Reduction | Random noise may be introduced during shooting by the camera and the environment. Denoising eliminates random noise in the image while preserving detail. |
Audio Enhancement | Audio Noise Reduction | Removes device noise, environmental noise, and similar interference. Suitable for scenarios such as class recordings and post-production of outdoor footage. |
| Audio Separation | Separates human voices from background sounds, or singing voices from accompaniment in audio-video files, creating independent audio materials for post-production artistic processing. |
| Volume Equalization | 1. Loudness Normalization: Maintains a consistent overall loudness level so that playback volume stays similar across content, avoiding audio that is too loud or too quiet. 2. Volume Leveling: Smooths overly loud audio segments, avoiding sudden volume changes and providing a more stable listening experience. |
| Audio Improvement | 1. Noise removal: Reduces unwanted noise or interference in the audio, improving the quality and clarity of the audio. 2. De-essing (Sibilance Suppression): Sibilance refers to the sharp, piercing sounds in audio, often produced when the sound source is close to the microphone. Suppressing sibilance aims to reduce or eliminate this unnatural sound, thereby improving audio quality. |
Parameter | Description |
Type | The watermark type. Watermarks can be static or animated. |
Position | The relative position of a watermark in the video. |
ImageSize | The size of the watermark in the video. |
ImageContent | Binary data of a watermark. |
Parameter | Description |
Format | The screenshot format (only JPG is supported currently). |
Width | Screenshot width (px). Value range: 128-4096 |
Height | Screenshot height (px). Value range: 128-4096 |
FillType | The fill mode (FillType) specifies how the source video image is processed when its aspect ratio does not match the specified aspect ratio of the screenshot. The following fill modes are supported. Scale to fill: Source video images are stretched to match the aspect ratio of screenshots, which may cause images to appear distorted. Black bars: The aspect ratio of source video images is retained, and the empty spaces are painted black. White bars: The aspect ratio of source video images is retained, and the empty spaces are painted white. Gaussian blur: The aspect ratio of source video images is retained, and Gaussian blur is applied to the empty spaces. |
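The geometry behind the fill modes can be sketched as follows. This is a minimal illustration, not the service's implementation: `stretch_to_fill` and `fit_with_bars` are hypothetical names, and the centered placement of the image between the bars is an assumption. The black bars, white bars, and Gaussian blur modes share the same geometry and differ only in what is drawn in the padding.

```python
from typing import Tuple

MIN_SIZE, MAX_SIZE = 128, 4096  # screenshot Width/Height range from the table


def check_size(width: int, height: int) -> None:
    """Reject screenshot dimensions outside the documented range."""
    for name, value in (("Width", width), ("Height", height)):
        if not MIN_SIZE <= value <= MAX_SIZE:
            raise ValueError(f"{name} must be within {MIN_SIZE}-{MAX_SIZE}, got {value}")


def stretch_to_fill(dst_w: int, dst_h: int) -> Tuple[int, int, int, int]:
    """Scale to fill: the image covers the whole screenshot, with no padding."""
    check_size(dst_w, dst_h)
    return dst_w, dst_h, 0, 0


def fit_with_bars(src_w: int, src_h: int, dst_w: int, dst_h: int) -> Tuple[int, int, int, int]:
    """Aspect-preserving fit: return (scaled_w, scaled_h, pad_x, pad_y).

    pad_x / pad_y are the bar thicknesses on the left and top (assumed centered).
    """
    check_size(dst_w, dst_h)
    if src_w * dst_h >= src_h * dst_w:
        # Source is relatively wider than the target: width is the limit.
        scaled_w = dst_w
        scaled_h = max(1, src_h * dst_w // src_w)
    else:
        # Source is relatively taller than the target: height is the limit.
        scaled_h = dst_h
        scaled_w = max(1, src_w * dst_h // src_h)
    return scaled_w, scaled_h, (dst_w - scaled_w) // 2, (dst_h - scaled_h) // 2


# A 1920x1080 source placed into a square 1280x1280 screenshot.
print(fit_with_bars(1920, 1080, 1280, 1280))
```

Integer arithmetic is used so that the scaled dimensions never exceed the target because of floating-point rounding.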
Parameter | Description |
Format | The screenshot format (only JPG is supported currently). |
Width | Screenshot width (px). Value range: 128-4096 |
Height | Screenshot height (px). Value range: 128-4096 |
SampleType | How sampling intervals are measured. Sampling intervals can be measured in two ways: By percent: Intervals are measured by percent. For example, if Interval is set to 5 (%), 20 screenshots will be generated for a video. By time: Intervals are measured by time. For example, if Interval is set to 10 (sec), the number of screenshots generated will depend on the video length. |
Interval | The sampling interval. If the interval measurement (SampleType) is by percent, this parameter is a percent value. If the interval measurement is by time, this parameter is a time value (sec). |
FillType | The fill mode (FillType) specifies how the source video image is processed when its aspect ratio does not match the specified aspect ratio of the screenshot. The following fill modes are supported. Scale to fill: Source video images are stretched to match the aspect ratio of screenshots, which may cause images to appear distorted. Black bars: The aspect ratio of source video images is retained, and the empty spaces are painted black. White bars: The aspect ratio of source video images is retained, and the empty spaces are painted white. Gaussian blur: The aspect ratio of source video images is retained, and Gaussian blur is applied to the empty spaces. |
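The relationship between SampleType and Interval in the table above can be sketched as a small function that lists the capture times. This is an illustration under stated assumptions: the function name `sample_times`, the string values `"percent"` and `"time"`, and the choice to take the first screenshot at the start of the video are all hypothetical and not confirmed by the service.

```python
from typing import List


def sample_times(duration_sec: float, sample_type: str, interval: float) -> List[float]:
    """Return the capture times (in seconds) for sampled screenshots.

    Assumption: sampling starts at 0 and never reaches the end of the video.
    """
    if interval <= 0:
        raise ValueError("interval must be positive")
    if sample_type == "percent":
        if interval > 100:
            raise ValueError("a percent interval cannot exceed 100")
        count = int(100 // interval)  # e.g. 5 (%) -> 20 screenshots
        return [duration_sec * (i * interval) / 100 for i in range(count)]
    if sample_type == "time":
        times = []
        i = 0
        # Multiply instead of accumulating to avoid floating-point drift.
        while i * interval < duration_sec:
            times.append(i * interval)
            i += 1
        return times
    raise ValueError(f"unknown sample type: {sample_type!r}")


# By percent: always 20 screenshots at 5 %, whatever the video length.
print(len(sample_times(200, "percent", 5)))
# By time: the count depends on the video length.
print(sample_times(35, "time", 10))
```

With percent sampling the number of screenshots is fixed by Interval alone, whereas with time sampling it grows with the video length, which matches the two examples in the SampleType row.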
Parameter | Description |
Format | The format of the image sprite (only JPG is supported currently). |
Width | The width of the subimage in an image sprite. |
Height | The height of the subimage in an image sprite. |
Rows | The number of image rows in a sprite. |
Columns | The number of image columns in a sprite. |
SampleType | How sampling intervals are measured. Currently, only sampling by time is supported. |
Interval | The time interval for image sampling. |
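As a sketch of how the image sprite parameters fit together, the following example checks the overall sprite size (Width x Columns and Height x Rows, each limited to 128-4096) and locates the subimage that covers a given playback time. The function names are hypothetical, and the layout rules used here (first sample at 0 seconds, subimages filled row by row, a new sprite started once Rows x Columns subimages are used) are assumptions for illustration only.

```python
from typing import Tuple


def validate_sprite(width: int, height: int, rows: int, columns: int) -> Tuple[int, int]:
    """Return (sprite_width, sprite_height), or raise if outside 128-4096."""
    sprite_w = width * columns
    sprite_h = height * rows
    if not 128 <= sprite_w <= 4096:
        raise ValueError(f"sprite width {sprite_w} is outside 128-4096")
    if not 128 <= sprite_h <= 4096:
        raise ValueError(f"sprite height {sprite_h} is outside 128-4096")
    return sprite_w, sprite_h


def tile_for_time(t_sec: float, interval_sec: float, width: int, height: int,
                  rows: int, columns: int) -> Tuple[int, int, int]:
    """Return (sprite_index, x, y) of the subimage covering time t_sec.

    Assumes row-major placement and a first sample at 0 seconds.
    """
    validate_sprite(width, height, rows, columns)
    frame = int(t_sec // interval_sec)           # index of the sampled frame
    sprite_index, pos = divmod(frame, rows * columns)
    row, col = divmod(pos, columns)
    return sprite_index, col * width, row * height


# 160x90 subimages in a 10x10 grid, sampled every 10 seconds.
print(validate_sprite(160, 90, 10, 10))
print(tile_for_time(125, 10, 160, 90, 10, 10))
```

A player can use the returned offsets to crop the matching thumbnail out of the sprite when the user hovers over the progress bar.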
Width x Columns (i.e., sprite width) should be within the range of 128-4096.
Height x Rows (i.e., sprite height) should be within the range of 128-4096.
Parameter | Description |
Format | The format of the animated image (only GIF and WebP are supported currently). |
Width | The animated image width. Value range: 128–4096 px. |
Height | The animated image height. Value range: 128–4096 px. |
FPS | The frame rate. Value range: 1–60 fps. |
Recognition Type | Description |
Face recognition | Quickly recognizes facial information in a video based on deep learning and locates the frames in which a person is present as well as the position of the person's face. You can use custom person libraries or the public person libraries provided by the video AI service to recognize faces. |
Speech recognition | Quickly recognizes the speech in a video and converts it to text based on deep learning. You can specify custom keywords and locate the time points in the video at which the keywords are spoken. |
Text recognition | Recognizes text in a video, including vertically oriented text, and automatically extracts keywords from the text. |
Frame tag recognition | Uses deep learning to automatically recognize tags in the video frames captured at the custom frame capturing interval, and locates the tags in the video. Frame tags are divided into nine categories, such as people, landscape, artificial object, building, plant, animal, and food, covering various aspects of daily life. You can use custom tags based on the tag system. It has transfer learning capabilities, so you can customize classifiers simply by providing the raw user data. In this way, it meets the requirements of different types of users and makes the tag system more flexible. |
Opening and ending credits recognition | Automatically recognizes and locates the time points of opening and ending credits of movies and TV series based on the video image characteristics, text, speech, and other information. |
Analysis Type | Description |
Intelligent categorization | Recommends a category for the target video by analyzing the video content. Currently, it supports 19 categories, including food, travel, animation, and music. Custom categories are also supported as a paid feature. |
Intelligent video tagging | Recognizes the five tags that best fit the video content based on Tencent's deep learning solution. It is suitable for video recommendation and search scenarios. You can customize the number of tags returned through the API. |
Smart cover | Automatically generates a video cover based on characteristic information such as video image texture and scene recognition. It lets you output cover images quickly, making video production easier and helping improve video click rates. |
Detection Type | Detection Item Description |
Video image auditing | Moderates the video image to detect erotic and non-compliant content. Erotic content detection: `porn` (pornographic content), `vulgar` (vulgar content), `intimacy` (content that displays intimacy), `sexy` (content that displays sexiness). Illegal and non-compliant content detection: `guns` (weapons and guns), `bloody` (bloodiness), `explosion` (explosions and fires), `violation_photo` (banned icons). |
Audio auditing | Moderates the speech in the audio based on the following: Erotic content detection: Analyzes speech in the audio to detect keywords related to erotic content. Illegal and non-compliant content detection: Analyzes speech in the audio to detect keywords related to illegal and non-compliant content. |
Text auditing | Moderates the text in video images, specifically including: Erotic content detection: Analyzes text in the video image to detect keywords related to erotic content. Illegal and non-compliant content detection: Analyzes text in the video image to detect keywords related to illegal and non-compliant content. |