Overview
The Smart Captions and Subtitles feature provides real-time speech recognition for video files and live streams, converting speech into subtitles in multiple languages. It is well suited to live broadcasts and international video transcription, and supports customizable hotwords and glossary libraries for improved accuracy.
Key features
Comprehensive Platform Support: Processes video-on-demand files, live streams, and RTC streams. Real-time captioning for live broadcasts supports both steady and gradient display modes, with a low integration barrier and no modifications required on the playback end.
High Accuracy: Utilizes large-scale models and supports hotword and glossary libraries, achieving industry-leading recognition accuracy.
Rich Language Variety: Supports hundreds of languages, including various dialects. Capable of recognizing mixed-language speech, such as combinations of Chinese and English.
Customizable Styles: Supports burning open (hardcoded) subtitles into videos, with customizable subtitle styles (font, size, color, background, position, etc.).
Scenario 1: Offline File Processing
1. Zero-Code Automatic Generation
1.1 Specify an input file.
You can choose a video file from a Tencent Cloud Object Storage (COS) bucket or provide a video download URL. The current subtitle generation and translation feature does not support using AWS S3 as an input file source.
1.2 Process the input file.
Select Create Orchestration. Add a smart recognition node to the orchestration to enable Automatic Speech Recognition (ASR) and speech translation. Choose a system preset template that matches your business scenario, or create a custom template. The system preset templates and their capabilities are shown in the table below:
| Template ID | Template Capability |
| --- | --- |
| 10101 | Recognizes Chinese speech in the source video and generates a Chinese subtitle file (VTT format). |
| 10102 | Recognizes English speech in the source video and generates an English subtitle file (VTT format). |
| 10103 | Recognizes Chinese speech in the source video, translates it into English, and generates a Chinese-English bilingual subtitle file. |
| 10104 | Recognizes English speech in the source video, translates it into Chinese, and generates an English-Chinese bilingual subtitle file. |
| 10105 | Recognizes Japanese speech in the source video and generates a Japanese subtitle file. |
| 10106 | Recognizes Korean speech in the source video and generates a Korean subtitle file. |
1.3 Specify an output path.
Select a save path for the output file from COS.
1.4 Initiate a task.
Click Create to initiate a task.
1.5 View the output.
After the task is completed, the automatically generated VTT subtitle file can be found in Orchestration > COS Bucket > Output Bucket.
Sample Chinese subtitles:
Sample Chinese-English subtitles:
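For reference, a bilingual subtitle file generated by template 10103 looks roughly like the following WebVTT snippet (the timestamps, text, and cue layout here are illustrative only; the actual output may differ):

WEBVTT

00:00:01.000 --> 00:00:03.500
大家好，欢迎收看本期节目。
Hello everyone, and welcome to this episode.

00:00:03.500 --> 00:00:06.200
今天我们介绍智能字幕功能。
Today we will introduce the smart subtitles feature.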
2. API Calling
1. Initiate a task by setting ScheduleId to the orchestration ID. For details, see the ProcessMedia API documentation. Example:
{
    "InputInfo": {
        "Type": "COS",
        "CosInputInfo": {
            "Bucket": "facedetectioncos-125*****11",
            "Region": "ap-guangzhou",
            "Object": "/video/123.mp4"
        }
    },
    "ScheduleId": 20073,
    "Action": "ProcessMedia",
    "Version": "2019-06-12"
}
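For programmatic access, the same request can be sent with one of the Tencent Cloud SDKs. Below is a minimal sketch using the Python SDK (tencentcloud-sdk-python); the credentials are placeholders, the bucket, object, and ScheduleId values are taken from the example above, and error handling is kept to a minimum:

import json

from tencentcloud.common import credential
from tencentcloud.common.exception.tencent_cloud_sdk_exception import TencentCloudSDKException
from tencentcloud.mps.v20190612 import mps_client, models

# Replace with your own API key pair; avoid hardcoding keys in production.
cred = credential.Credential("YOUR_SECRET_ID", "YOUR_SECRET_KEY")
client = mps_client.MpsClient(cred, "ap-guangzhou")

# Build the ProcessMedia request from the same JSON body shown above.
# Action and Version are filled in automatically by the SDK.
req = models.ProcessMediaRequest()
req.from_json_string(json.dumps({
    "InputInfo": {
        "Type": "COS",
        "CosInputInfo": {
            "Bucket": "facedetectioncos-125*****11",
            "Region": "ap-guangzhou",
            "Object": "/video/123.mp4"
        }
    },
    "ScheduleId": 20073
}))

try:
    resp = client.ProcessMedia(req)
    # The returned TaskId identifies the task for later status queries.
    print(resp.TaskId)
except TencentCloudSDKException as err:
    print(err)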
2. Embedding Subtitles into the Video (Optional)
To burn the generated subtitle file into the video, initiate a transcoding task whose SubtitleTemplate points to the VTT file. Example:
{
    "MediaProcessTask": {
        "TranscodeTaskSet": [
            {
                "Definition": 206390,
                "OverrideParameter": {
                    "Container": "mp4",
                    "RemoveVideo": 0,
                    "RemoveAudio": 0,
                    "VideoTemplate": {
                        "Codec": "libx264",
                        "Fps": 30,
                        "Bitrate": 2346,
                        "ResolutionAdaptive": "close",
                        "Width": 1920,
                        "Height": 0,
                        "Gop": 0,
                        "FillType": "black"
                    },
                    "AudioTemplate": {
                        "Codec": "libmp3lame",
                        "Bitrate": 0,
                        "SampleRate": 32000,
                        "AudioChannel": 2
                    },
                    "SubtitleTemplate": {
                        "Path": "https://lily-125*****27.cos.ap-nanjing.myqcloud.com/mps_autotest/subtitle/1.vtt",
                        "StreamIndex": 2,
                        "FontType": "simkai.ttf",
                        "FontSize": "10px",
                        "FontColor": "0xFFFFFF",
                        "FontAlpha": 0.9
                    }
                }
            }
        ]
    },
    "InputInfo": {
        "Type": "URL",
        "UrlInputInfo": {
            "Url": "https://lily-125*****27.cos.ap-nanjing.myqcloud.com/mps_autotest/subtitle/123.mkv"
        }
    },
    "OutputStorage": {
        "Type": "COS",
        "CosOutputStorage": {
            "Bucket": "lily-125*****27",
            "Region": "ap-nanjing"
        }
    },
    "OutputDir": "/mps_autotest/output2/",
    "Action": "ProcessMedia",
    "Version": "2019-06-12"
}
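Both ProcessMedia requests above return a TaskId, which can be polled with the DescribeTaskDetail API until processing finishes. A minimal sketch, assuming the Python SDK client from the earlier example:

import time

from tencentcloud.mps.v20190612 import models

def wait_for_task(client, task_id, interval=10):
    # Poll the task status until it leaves the WAITING/PROCESSING states.
    # "client" is the MpsClient created earlier; "task_id" is the TaskId
    # returned by ProcessMedia.
    while True:
        req = models.DescribeTaskDetailRequest()
        req.TaskId = task_id
        resp = client.DescribeTaskDetail(req)
        if resp.Status == "FINISH":
            return resp
        time.sleep(interval)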
Scenario 2: Live Streams
There are currently two solutions for using subtitles and translations in live streams: enable the subtitle feature through the Cloud Streaming Services (CSS) console, or use MPS to call back the recognized text and render it into the live stream yourself. Enabling the subtitle feature through the CSS console is recommended. The two solutions are described below:
Solution 1: Enabling the Subtitle Feature in the CSS Console
1. Configure the live subtitling feature.
2. Obtain subtitle streams.
Subtitles are displayed when the playback client pulls the transcoding stream. To generate the transcoding stream address, append an underscore and the name of the transcoding template bound with the subtitle template (_<transcoding template name>) to the corresponding live stream's StreamName. For detailed rules on splicing playback addresses, see Splicing Playback URLs.
Note:
Currently, subtitles can be displayed in two modes: real-time dynamic subtitles and delayed steady-state subtitles. With real-time dynamic subtitles, the displayed text is corrected word by word as the speech is recognized, so the output changes in real time. With delayed steady-state subtitles, playback is delayed by the configured amount of time, but complete sentences are displayed at once, which provides a better viewing experience.
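For example, assuming a playback domain of example.myqcloud.com, the default AppName of live, a StreamName of teststream, and a transcoding template named subtitle bound with the subtitle template (all names here are hypothetical), the addresses would be spliced as follows:

Original stream: http://example.myqcloud.com/live/teststream.flv
Subtitled stream: http://example.myqcloud.com/live/teststream_subtitle.flv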
Solution 2: Calling Back Text through MPS
1. Initiate a task via API. Use a preset subtitle template to start a live stream recognition task. For details, see ProcessLiveStream. Example:
{
    "Url": "http://5000-wenzhen.liveplay.myqcloud.com/live/123.flv",
    "AiRecognitionTask": {
        "Definition": 10101
    },
    "OutputStorage": {
        "CosOutputStorage": {
            "Bucket": "6c0f30dfvodgzp*****0800-10****53",
            "Region": "ap-guangzhou-2"
        },
        "Type": "COS"
    },
    "OutputDir": "/6c0f30dfvodgzp*****0800/0d1409d3456551**********652/",
    "TaskNotifyConfig": {
        "NotifyType": "URL",
        "NotifyUrl": "http://****.qq.com/callback/qtatest/?token=*****"
    },
    "Action": "ProcessLiveStream",
    "Version": "2019-06-12"
}
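2. Receive the recognized text. MPS posts the recognition results as JSON to the NotifyUrl configured above; your service can then render the text into the live stream. Below is a minimal sketch of a callback receiver using only the Python standard library (the port is arbitrary, and the payload is simply logged here; see the ProcessLiveStream callback documentation for the exact field names):

import json
from http.server import BaseHTTPRequestHandler, HTTPServer

class CallbackHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        # Read and decode the JSON body that MPS posts to NotifyUrl.
        length = int(self.headers.get("Content-Length", 0))
        payload = json.loads(self.rfile.read(length))
        # Log the raw callback; the recognized subtitle text is carried in
        # the live stream AI recognition result fields of the payload.
        print(json.dumps(payload, ensure_ascii=False, indent=2))
        # Return HTTP 200 to acknowledge receipt of the notification.
        self.send_response(200)
        self.end_headers()

if __name__ == "__main__":
    HTTPServer(("0.0.0.0", 5000), CallbackHandler).serve_forever()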