Speech-to-Text

Use Cases
Tencent Real-Time Communication (TRTC) supports the speech-to-text feature, which converts the audio streams of specified users or all users in a room into corresponding Chinese text for effects such as real-time captions.
Prerequisites
Log in to the TRTC console, activate the TRTC service, and create an RTC-Engine application.
Go to the purchase page to buy an RTC-Engine package of any version to unlock the speech-to-text feature.
Note:
The speech-to-text feature incurs fees based on usage. See Fee Details for more information.
Feature Overview
After a task is initiated, the TRTC AI Service sends an Automatic Speech Recognition (ASR) bot into the TRTC room. The bot pulls the streams of the specified users, or of all users, performs speech-to-text recognition, and relays the recognition results to the client and server in real time.
Integration Guide
Step 1: Receiving Speech-to-Text Results
Method 1: Receiving Text Messages via Client SDK
Use the custom message receiving feature of the TRTC SDK to listen for callbacks on the client and receive speech-to-text results in real time.
The following example shows the client callback on the web:
trtc.on(TRTC.EVENT.CUSTOM_MESSAGE, event => { // Receive custom messages.
   // event.userId: The userId of the ASR bot.
   // event.cmdId: The message ID, fixed at 1 for transcriptions and captions.
   // event.seq: The sequence number of the message.
   // event.data: An ArrayBuffer holding the transcription or caption content; see the data field explanation below.
   const data = new TextDecoder().decode(event.data)
   console.log(`received custom msg from ${event.userId}, message: ${data}`)
})
Data field explanation
Real-Time Captions
Field Name	Type	Meaning
type	Integer	Message type. 10000 indicates a real-time caption or a complete sentence.
sender	String	userId of the speaker.
receiver	Array	List of recipient userIds. The message is actually broadcast to the whole room.
payload.text	String	Recognized text, Unicode encoded.
payload.start_time	String	Start time of the message, measured from the start of the task (for example, "00:00:02").
payload.end_time	String	End time of the message, measured from the start of the task (for example, "00:00:05").
payload.end	Boolean	If true, the message is a complete sentence.
{
  "type": 10000,
  "sender": "user_a",
  "payload": {
    "text": "",
    "start_time": "00:00:02",
    "end_time": "00:00:05",
    "end": true
  }
}
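The decoding step in the callback can be followed by parsing and validation. The snippet below is a minimal sketch, assuming the message body is the JSON shown above and that the time fields use the HH:MM:SS form seen in the example. The names parseCaptionMessage and toSeconds are hypothetical helpers for illustration, not TRTC SDK APIs.

```javascript
// Sketch: decode and validate a speech-to-text custom message.
// parseCaptionMessage and toSeconds are hypothetical helpers, not TRTC SDK APIs.
const CAPTION_TYPE = 10000; // message type listed in the field table

function toSeconds(hms) {
  // Assumes an "HH:MM:SS" string such as "00:00:02"; returns null otherwise.
  const parts = String(hms).split(':').map(Number);
  if (parts.length !== 3 || parts.some(Number.isNaN)) return null;
  return parts[0] * 3600 + parts[1] * 60 + parts[2];
}

function parseCaptionMessage(data) {
  let msg;
  try {
    // TextDecoder.decode accepts an ArrayBuffer (event.data) or a typed array.
    msg = JSON.parse(new TextDecoder().decode(data));
  } catch (err) {
    return null; // not JSON, so not a caption message
  }
  if (!msg || msg.type !== CAPTION_TYPE ||
      typeof msg.payload !== 'object' || msg.payload === null) {
    return null;
  }
  const { text, start_time, end_time, end } = msg.payload;
  return {
    sender: msg.sender,
    text: typeof text === 'string' ? text : '',
    startSeconds: toSeconds(start_time),
    endSeconds: toSeconds(end_time),
    isFinal: end === true,
  };
}

// Usage with locally built bytes standing in for event.data.
const sample = new TextEncoder().encode(JSON.stringify({
  type: 10000,
  sender: 'user_a',
  payload: { text: 'Today', start_time: '00:00:02', end_time: '00:00:05', end: false },
}));
const parsed = parseCaptionMessage(sample);
console.log(parsed);
```

Returning null for anything that is not a type 10000 message lets the same callback ignore unrelated custom messages sent in the room.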
Note:
Callback examples:
Transcription: Each complete sentence is transcribed and pushed as a single message.
"How's the weather today?"
Captions: A sentence is pushed in segments, and each segment contains the previous one so that captions update in real time.
"Today"
"Today's weather"
"How's the weather today?"
Message sequence: Caption message > Caption message > ... > Caption message (end = true)
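The caption sequence described in this note can be handled by keeping one in-progress line per speaker and committing it when end is true. The snippet below is a minimal sketch under that assumption. CaptionTracker is a hypothetical helper, not part of the TRTC SDK, and it expects the already parsed JSON message as input.

```javascript
// Sketch: keep one in-progress caption per speaker and commit it when end is true.
// CaptionTracker is a hypothetical helper, not part of the TRTC SDK.
class CaptionTracker {
  constructor() {
    this.partial = new Map(); // sender -> latest in-progress text
    this.finished = [];       // completed sentences in arrival order
  }

  // msg is the parsed JSON object described in the field table.
  handle(msg) {
    const { sender, payload } = msg;
    if (payload.end) {
      // Complete sentence: clear the in-progress line and store the final text.
      this.partial.delete(sender);
      this.finished.push({ sender, text: payload.text });
    } else {
      // Each segment already contains the previous one, so replace instead of appending.
      this.partial.set(sender, payload.text);
    }
  }

  // Text to render for a speaker right now, or '' if nothing is in progress.
  current(sender) {
    return this.partial.get(sender) || '';
  }
}

// Usage: feed the three caption segments from the note in order.
const tracker = new CaptionTracker();
const segments = ['Today', "Today's weather", "How's the weather today?"];
segments.forEach((text, i) => {
  tracker.handle({
    type: 10000,
    sender: 'user_a',
    payload: { text, end: i === segments.length - 1 },
  });
});
console.log(tracker.finished);
```

Replacing rather than concatenating matters here: because every segment repeats the earlier text, appending would duplicate words on screen.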
Method 2: Receiving via Server-side Callbacks
The speech-to-text service also provides server-side event callbacks so that your server can receive conversation messages in real time. See Detailed Callback Events.
Step 2: Initiating a Speech-to-Text Task
TRTC provides the following Tencent Cloud APIs for initiating and managing speech-to-text tasks:
Start a speech-to-text task: StartAITranscription
Query a speech-to-text task: DescribeAITranscription
Stop a speech-to-text task: StopAITranscription
Note:
The speech-to-text feature has a concurrency limit of 100 tasks per SDKAppId. Submit a ticket if you need to increase this limit.
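Because tasks are limited per SDKAppId, a server that starts many tasks may want to track how many are running before it calls StartAITranscription. The snippet below is a minimal sketch of such local bookkeeping. TaskSlots is a hypothetical helper that does not call any Tencent Cloud API, and it assumes at most one task per room, which the source does not state.

```javascript
// Sketch: local bookkeeping for the per-SDKAppId concurrency limit.
// TaskSlots is a hypothetical helper; it does not call any Tencent Cloud API.
// Assumption: at most one speech-to-text task per room.
class TaskSlots {
  constructor(limit = 100) { // 100 is the limit stated in the note above
    this.limit = limit;
    this.active = new Set(); // room IDs that currently hold a slot
  }

  // Call before StartAITranscription; returns false when no slot is free.
  acquire(roomId) {
    if (this.active.has(roomId)) return true; // already counted
    if (this.active.size >= this.limit) return false;
    this.active.add(roomId);
    return true;
  }

  // Call after StopAITranscription succeeds or the task ends on its own.
  release(roomId) {
    this.active.delete(roomId);
  }

  // Number of tasks that can still be started.
  get available() {
    return this.limit - this.active.size;
  }
}

// Usage with a small limit to make the behavior visible.
const slots = new TaskSlots(2);
console.log(slots.acquire('room-1'));
console.log(slots.acquire('room-2'));
console.log(slots.acquire('room-3')); // no slot left
slots.release('room-1');
console.log(slots.acquire('room-3')); // slot freed by the release
```

The service enforces the real limit, so this counter is only a local guard; if several servers share one SDKAppId, the count would need to live in shared storage.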
