Use Cases
Tencent Real-Time Communication (TRTC) supports the speech-to-text feature, which converts the audio streams of specified users or all users in a room into corresponding Chinese text for use cases such as real-time captions.
Prerequisites
Go to the purchase page to buy an RTC-Engine package of any version to unlock the speech-to-text feature.
Note:
The speech-to-text feature incurs fees based on usage. See Fee Details for more information.
Feature Overview
After a task is initiated, the TRTC AI Service sends an Automatic Speech Recognition (ASR) bot into the TRTC room. The bot pulls the streams of specified users or all users, performs speech-to-text recognition, and relays the recognition results to the client and server in real time.
Integration Guide
Step 1: Receiving Speech-to-Text Results
Method 1: Receiving Text Messages via Client SDK
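The client receives recognition results through the custom message callback. The following example uses the Web SDK:
// Listen for custom messages; speech-to-text results arrive through this event.
trtc.on(TRTC.EVENT.CUSTOM_MESSAGE, event => {
  // event.data is binary; decode it to a string before use.
  const data = new TextDecoder().decode(event.data);
  console.log(`received custom msg from ${event.userId}, message: ${data}`);
});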
Data field description
Real-Time Captions
Field | Type | Description |
type | Integer | Message type. 10000: delivered for real-time captions, including the message that completes a sentence. |
sender | String | Speaker's userid. |
receiver | Array | List of recipient userids. The message is actually broadcast to the whole room. |
payload.text | String | Recognized text, Unicode encoded. |
payload.start_time | String | Message start time, measured from when the task started. |
payload.end_time | String | Message end time, measured from when the task started. |
payload.end | Boolean | true indicates that this message contains a complete sentence. |
{
"type": 10000,
"sender": "user_a",
"payload": {
"text":"",
"start_time":"00:00:02",
"end_time":"00:00:05",
"end": true
}
}
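A message of this shape can be decoded and filtered before it reaches the UI. The following is a minimal sketch, assuming the custom message body is UTF-8 encoded JSON with the fields listed in the table above; `parseCaptionMessage` and `CAPTION_MESSAGE_TYPE` are hypothetical names used for illustration and are not part of the TRTC SDK.

```javascript
// Message type for real-time caption messages, per the field table above.
const CAPTION_MESSAGE_TYPE = 10000;

// Hypothetical helper: decode the raw bytes of a custom message and return
// a caption object, or null if the message is not a caption message.
function parseCaptionMessage(rawData) {
  let message;
  try {
    message = JSON.parse(new TextDecoder().decode(rawData));
  } catch (err) {
    return null; // Not valid JSON, so not a speech-to-text message.
  }
  if (!message || message.type !== CAPTION_MESSAGE_TYPE || !message.payload) {
    return null; // Some other custom message sent in the room.
  }
  const { text, start_time, end_time, end } = message.payload;
  return {
    speaker: message.sender,
    text,
    startTime: start_time,
    endTime: end_time,
    isFinal: end === true, // true when the sentence is complete
  };
}

// Usage with bytes that have the same shape as the sample message.
const sampleBytes = new TextEncoder().encode(JSON.stringify({
  type: 10000,
  sender: "user_a",
  payload: { text: "", start_time: "00:00:02", end_time: "00:00:05", end: true },
}));
console.log(parseCaptionMessage(sampleBytes));
```

Inside the `TRTC.EVENT.CUSTOM_MESSAGE` handler, `event.data` would be passed in place of `sampleBytes`. Returning `null` for anything that is not a caption keeps other custom messages in the room from being mistaken for captions.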
Note:
Callback examples:
Transcription: A sentence is transcribed in full and pushed as a single message.
"How's the weather today?"
Captions: A sentence is pushed in segments as it is recognized. Each segment contains the previous one, so captions can be updated in real time.
"Today"
"Today's weather"
"How's the weather today?"
Message sequence: Caption message > Caption message > ... > Caption message (end = true)
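Because each caption segment contains the previous one, a client would typically overwrite the in-progress line rather than append to it, and commit the line once a message arrives with end = true. The following is a minimal sketch of that idea; `CaptionBuffer` is a hypothetical helper, not an SDK class, and it assumes each incoming caption has already been reduced to `{ speaker, text, isFinal }` (from `sender`, `payload.text`, and `payload.end`).

```javascript
// Hypothetical helper that keeps one in-progress caption line per speaker.
class CaptionBuffer {
  constructor() {
    this.finalized = [];      // completed sentences, in arrival order
    this.pending = new Map(); // speaker userid -> latest partial text
  }

  // caption: { speaker, text, isFinal }
  push(caption) {
    if (caption.isFinal) {
      // The sentence is complete: drop the partial line and commit the text.
      this.pending.delete(caption.speaker);
      this.finalized.push({ speaker: caption.speaker, text: caption.text });
    } else {
      // Each segment contains the previous one, so overwrite, do not append.
      this.pending.set(caption.speaker, caption.text);
    }
  }

  // Lines to display: completed sentences followed by in-progress ones.
  lines() {
    const live = [...this.pending].map(([speaker, text]) => ({ speaker, text }));
    return [...this.finalized, ...live];
  }
}

// Usage with the caption sequence from the note above.
const captions = new CaptionBuffer();
captions.push({ speaker: "user_a", text: "Today", isFinal: false });
captions.push({ speaker: "user_a", text: "Today's weather", isFinal: false });
console.log(captions.lines()); // one in-progress line: "Today's weather"
captions.push({ speaker: "user_a", text: "How's the weather today?", isFinal: true });
console.log(captions.lines()); // one completed line: "How's the weather today?"
```

Keying the in-progress text by speaker lets several users talk at once without their partial captions overwriting each other.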
Method 2: Receiving via Server-side Callbacks
The speech-to-text service also provides server-side event callbacks so that your server can receive conversation messages in real time. See Detailed Callback Events.
Step 2: Initiating a Speech-to-Text Task
TRTC provides the following Tencent Cloud APIs for initiating and managing speech-to-text tasks:
Note:
The speech-to-text feature has a concurrency limit of 100 tasks per SDKAppId. Submit a ticket if you need to increase this limit.