Overview
The Smart Captions and Subtitles feature offers real-time speech recognition for video files and live streams, converting speech into subtitles in multiple languages. It is ideal for live broadcasts and international video transcription, and supports customizable hotwords and glossary libraries for improved accuracy.
Key features
Comprehensive Platform Support: Offers processing capabilities for on-demand files, live streams, and RTC streams. Real-time captioning for live broadcasts supports both real-time dynamic and delayed steady-state display modes, with a low integration barrier and no changes required on the playback side.
High Accuracy: Utilizes large-scale models, and supports hotwords and glossary databases, achieving industry-leading accuracy.
Rich Language Variety: Supports hundreds of languages, including various dialects. Capable of recognizing mixed-language speech, such as combinations of Chinese and English.
Customizable Styles: Enables embedding open subtitles into videos, with customizable subtitle styles (font, size, color, background, position, etc.).
Scenario 1: Processing Offline Files
Method 1: Initiating a Zero-Code Task from the Console
Initiating a Task Manually
1. Specify an input file.
You can choose a video file from a Tencent Cloud Object Storage (COS) bucket or provide a video download URL. The current subtitle generation and translation feature does not support using AWS S3 as an input file source.
2. Process the input file.
Select Create Orchestration and insert the "Smart Subtitle" node.
System preset templates are shown in the table below:
| Template Name (ID) | Template Capability |
| --- | --- |
| Generate_Chinese_Subtitle_For_Chinese_Video (100) | Recognize the Chinese speech in the source video and generate a Chinese subtitle file (WebVTT format). |
| Generate_English_Subtitle_For_Chinese_Video (121) | Recognize the Chinese speech in the source video, translate it into English, and generate an English subtitle file. |
| Generate_Chinese_And_English_Subtitle_For_Chinese_Video (122) | Recognize the Chinese speech in the source video, translate it into English, and generate a Chinese-English bilingual subtitle file. |
| Generate_English_Subtitle_For_English_Video (200) | Recognize the English speech in the source video and generate an English subtitle file. |
| Generate_Chinese_Subtitle_For_English_Video (211) | Recognize the English speech in the source video, translate it into Chinese, and generate a Chinese subtitle file. |
| Generate_Chinese_And_English_Subtitle_For_English_Video (212) | Recognize the English speech in the source video, translate it into Chinese, and generate an English-Chinese bilingual subtitle file. |
3. Specify an output path.
Specify the storage path of the output file.
4. Initiate a task.
Click Create to initiate a task.
Automatically Triggering a Task Through the Orchestration (Optional)
If you want video files uploaded to the COS bucket to be automatically processed with smart subtitles according to preset parameters, you can:
1. Enter On-demand Orchestration in the menu, click Create VOD Orchestration, select the Smart Subtitle node in the task configuration, and configure parameters such as the trigger bucket and directory.
2. Go to the On-demand Orchestration list, find the new orchestration, and turn on its Enable switch. From then on, any new video file added to the trigger directory automatically initiates a task according to the orchestration's preset process and parameters, and the processed files are saved to the output path configured in the orchestration.
Note:
It takes 3-5 minutes for the orchestration to take effect after being enabled.
Method 2: API Call
Method 1
Call the ProcessMedia API and initiate a task by specifying the template ID. Example:
{
"InputInfo": {
"Type": "URL",
"UrlInputInfo": {
"Url": "https://test-1234567.cos.ap-guangzhou.myqcloud.com/video/test.mp4"
}
},
"SmartSubtitlesTask": {
"Definition": 122
},
"OutputStorage": {
"CosOutputStorage": {
"Bucket": "test-1234567",
"Region": "ap-guangzhou"
},
"Type": "COS"
},
"OutputDir": "/output/",
"Action": "ProcessMedia",
"Version": "2019-06-12"
}
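If the request succeeds, ProcessMedia returns a task ID that can be used later to query progress. The response is shaped roughly as follows (both values below are illustrative placeholders):

```json
{
    "Response": {
        "TaskId": "24000022-WorkflowTask-b20a8exxxxxxx1tt110253",
        "RequestId": "6ca31e3a-6b8e-4b4e-9256-fdc700064ef3"
    }
}
```

Save the TaskId; it is the identifier accepted by the DescribeTaskDetail query API described below.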
Method 2
Call the ProcessMedia API and initiate a task by specifying the orchestration ID (ScheduleId). Example:
{
"InputInfo": {
"Type": "COS",
"CosInputInfo": {
"Bucket": "facedetectioncos-125*****11",
"Region": "ap-guangzhou",
"Object": "/video/123.mp4"
}
},
"ScheduleId": 12345,
"Action": "ProcessMedia",
"Version": "2019-06-12"
}
Note:
If a callback address is set, see the ParseNotification document for the response packet format.
Subtitle Application to Videos (Optional Capability)
Call the ProcessMedia API to initiate a transcoding task, specify the path of the VTT subtitle file, and set the subtitle rendering styles through the SubtitleTemplate field. Example:
{
"MediaProcessTask": {
"TranscodeTaskSet": [
{
"Definition": 100040,
"OverrideParameter": {
"SubtitleTemplate": {
"Path": "https://test-1234567.cos.ap-nanjing.myqcloud.com/mps_autotest/subtitle/1.vtt",
"StreamIndex": 2,
"FontType": "simkai.ttf",
"FontSize": "10px",
"FontColor": "0xFFFFFF",
"FontAlpha": 0.9
}
}
}
]
},
"InputInfo": {
"Type": "URL",
"UrlInputInfo": {
"Url": "https://test-1234567.cos.ap-nanjing.myqcloud.com/mps_autotest/subtitle/123.mkv"
}
},
"OutputStorage": {
"Type": "COS",
"CosOutputStorage": {
"Bucket": "test-1234567",
"Region": "ap-nanjing"
}
},
"OutputDir": "/mps_autotest/output2/",
"Action": "ProcessMedia",
"Version": "2019-06-12"
}
Querying Task Results
Via the Console
Navigate to the VOD Tasks page in the console; the list displays the tasks that have just been initiated. When the subtask status is "Successful", click View Result to preview the subtitles.
The generated VTT subtitle file can be found in Orchestration > COS Bucket > Output Bucket.
Sample Chinese-English subtitles:
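As an illustration, a Chinese-English bilingual WebVTT file generated by template 122 contains cues like the following (timestamps and text are made up):

```
WEBVTT

1
00:00:01.000 --> 00:00:03.500
大家好，欢迎收看本期节目。
Hello everyone, and welcome to this episode.

2
00:00:03.500 --> 00:00:06.000
今天我们来介绍智能字幕功能。
Today we will introduce the smart subtitle feature.
```

Each cue carries the recognized source line followed by its translation, which players render as two stacked subtitle lines.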
Event Notification Callbacks
When initiating a media processing task with ProcessMedia, you can configure event callbacks through the TaskNotifyConfig parameter. Upon task completion, the results are sent to the configured callback address, and you can parse them using ParseNotification.
Querying Task Results by Calling an API
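A query request needs only the task ID returned when the task was created, plus the standard Action and Version fields. A minimal example:

```json
{
    "TaskId": "24000022-WorkflowTask-b20a8exxxxxxx1tt110253",
    "Action": "DescribeTaskDetail",
    "Version": "2019-06-12"
}
```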
Call the DescribeTaskDetail API and fill in the task ID (for example, 24000022-WorkflowTask-b20a8exxxxxxx1tt110253 or 24000022-ScheduleTask-774f101xxxxxxx1tt110253) to query task results.
Scenario 2: Live Streams
There are currently two solutions for using subtitles and translation in live streams: enable the subtitle feature through the Cloud Streaming Services (CSS) console, or use MPS to call back recognized text and embed it into the live stream. Enabling the subtitle feature through the CSS console is recommended. The two solutions are introduced below:
Method 1: Enabling the Subtitle Feature in the CSS Console
1. Configure the live subtitling feature.
2. Obtain subtitle streams.
Subtitles are displayed when the transcoded stream is played. To generate the transcoded stream address, append an underscore and the name of the transcoding template bound with the subtitle template to the live stream's StreamName. For example, if the StreamName is teststream and the bound transcoding template is named subtitle720p (an illustrative name), the transcoded StreamName is teststream_subtitle720p. For the detailed rules of splicing playback addresses, see Splicing Playback URLs.
Note:
Currently, subtitles can be displayed in two forms: real-time dynamic subtitles and delayed steady-state subtitles. With real-time dynamic subtitles, the displayed text is corrected word by word in real time as the speech is recognized. With delayed steady-state subtitles, the broadcast is displayed with a configured delay and complete sentences are shown at once, which provides a better viewing experience.
Method 2: Calling Back Text through MPS
Currently, live stream smart subtitle tasks cannot be initiated from the MPS console; you can initiate them through the API.
Note:
Currently, using MPS to process live streams requires an intelligent identification template, which performs automatic speech recognition or speech translation.
Call the ProcessLiveStream API to initiate the task. Example:
{
"Url": "http://5000-wenzhen.liveplay.myqcloud.com/live/123.flv",
"AiRecognitionTask": {
"Definition": 10101
},
"OutputStorage": {
"CosOutputStorage": {
"Bucket": "6c0f30dfvodgzp*****0800-10****53",
"Region": "ap-guangzhou-2"
},
"Type": "COS"
},
"OutputDir": "/6c0f30dfvodgzp*****0800/0d1409d3456551**********652/",
"TaskNotifyConfig": {
"NotifyType": "URL",
"NotifyUrl": "http://****.qq.com/callback/qtatest/?token=*****"
},
"Action": "ProcessLiveStream",
"Version": "2019-06-12"
}