Deploying ASR
Last updated: 2024-12-02 11:16:20

Use cases

Draw and guess: This API pulls a user's audio from the room, recognizes it in real time, and sends the resulting text to your business server through a callback, where your business logic evaluates it.
Audio audit: This API delivers the audio data streams to the speech recognition API, and the recognized text is then filtered by keyword according to your business rules.
Real-time subtitles: This API recognizes the audio data in a room in real time and converts it into text, which is displayed at the frontend.
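
As an illustration of the audio audit case, the following minimal sketch filters recognized sentences against a keyword list. The function name `find_flagged_sentences`, the keyword list, and the case-insensitive substring matching are assumptions made for this example and are not part of the product. The sentence objects use the `Voice` and `Index` fields of the recognition result webhook.

```python
from typing import Dict, Iterable, List


def find_flagged_sentences(sentences: Iterable[Dict], keywords: Iterable[str]) -> List[Dict]:
    """Return the index and matched keywords of each sentence that hits a keyword.

    Matching is a case-insensitive substring check. This is an assumption for
    illustration; a real audit policy may need tokenization or fuzzy matching.
    """
    lowered = [k.lower() for k in keywords if k]
    flagged = []
    for sentence in sentences:
        text = sentence.get("Voice", "").lower()
        hits = [k for k in lowered if k in text]
        if hits:
            flagged.append({"Index": sentence.get("Index"), "Keywords": hits})
    return flagged
```

For example, calling `find_flagged_sentences(result, ["some keyword"])` on the `Result` array of a webhook returns one entry per sentence that contains the keyword, which the business server can then act on.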

Architecture

The following figure shows the detailed process:




Application strengths

Real-time return: The audio data in a Tencent Real-Time Communication (TRTC) room is recognized and returned in real time, which is fast and efficient.
Simple process: TRTC and Automatic Speech Recognition (ASR) are deeply integrated, so that the data streams are fully pipelined without complex operations.
Flexible use: The data can be associated with the business logic in real time after it is returned to the business server.

Must-knows

Speech recognition generally takes a long time to process, so Async Execution is enabled during function deployment.
The recognition results are sent to the business server. WebSocket connections are not supported, so the recognition results cannot be sent to clients.
The default authentication type is App Authentication. For more information, see Application Management. You can change the authentication type to Authentication-Free during a test. For more information, see [Step 3. Create an API with Mock as the Backend Type](https://www.tencentcloud.com/document/product/628/44318).

Directions

1. Activate the service

You must activate Tencent Cloud ASR. For more information, see [Activate the service](https://www.tencentcloud.com/document/product/1118/43344#3.-.E6.96.B0.E6.89.8B.E5.85.A5.E9.97.A8).

2. Deploy the function

1. Log in to the SLS console.
2. Click Create application to go to the "Create application" page.
3. Select Live Stream Real-time ASR and set parameters on the Basic Configuration page.
   Application name: Specify a custom name for the application.
   Region: Select the region based on your business needs.
   Key information: You can view the key information of your Tencent Cloud account on the Manage API Key page.
4. Click Complete.
5. Click the function name in the Function name column in the Cloud function section to go to the details page of the function, and click "Trigger management" on the left to view the access path.

3. Configure the ASR startup API

Protocol: HTTPS
Method: POST
URL: https://service-xxx-xxxx.sh.apigw.tencentcs.com/release/asr_speech
Request parameters:

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| SdkAppId | Int | Yes | Application ID. Each TRTC application has a unique application ID. |
| RoomId | Int | No | Room ID of the integer type. Each room of a TRTC application has a unique room ID. |
| StrRoomId | String | No | Room ID of the string type. Either RoomId or StrRoomId must be configured. If both are configured, RoomId is used. |
| UserId | String | Yes | ID of the user who uses the recording service. Each user of a TRTC application has a unique user ID. |
| UserSig | String | Yes | Signature of the user who uses the recording service. The signature is used for login authentication of the user. |
| Callback | String | No | The address to which a webhook is sent by using the POST method when the recording ends. |

Sample request:
{
    "SdkAppId": 1400000000,
    "RoomId": 43474,
    "UserId": "user_55952145",
    "UserSig": "eJwtzNEKgkAUBNBxxxxxxx",
    "Callback": "https://xxxxxxxx.com/post/xxx"
}
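
The request above can be assembled and sent from Python roughly as follows. This is a minimal sketch under stated assumptions: the helper names `build_asr_request` and `start_asr` are hypothetical, the `Content-Type: application/json` header is assumed, and the API is assumed to use the Authentication-Free type (App Authentication would require additional signing that is not shown here). Only the Python standard library is used.

```python
import json
import urllib.request
from typing import Optional


def build_asr_request(sdk_app_id: int, user_id: str, user_sig: str,
                      room_id: Optional[int] = None,
                      str_room_id: Optional[str] = None,
                      callback: Optional[str] = None) -> dict:
    """Build the JSON body for the ASR startup API (hypothetical helper).

    Enforces the rule from the parameter table: either RoomId or StrRoomId
    must be configured.
    """
    if room_id is None and str_room_id is None:
        raise ValueError("Either RoomId or StrRoomId must be configured")
    body = {"SdkAppId": sdk_app_id, "UserId": user_id, "UserSig": user_sig}
    if room_id is not None:
        body["RoomId"] = room_id
    if str_room_id is not None:
        body["StrRoomId"] = str_room_id
    if callback is not None:
        body["Callback"] = callback
    return body


def start_asr(url: str, body: dict, timeout: float = 10.0) -> bytes:
    """POST the body as JSON and return the raw response bytes.

    The response format is not described in this document, so the bytes are
    returned unparsed. The Content-Type header is an assumption.
    """
    request = urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read()
```

To use it, pass the access path shown under "Trigger management" as `url` to `start_asr`, together with the body returned by `build_asr_request`.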

Recognition result webhook API

Webhook parameters:

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| SdkAppId | Int | Yes | Application ID. |
| RoomId | Int | Yes | Room ID of the integer type. |
| UserId | String | Yes | ID of the recognized user. |
| StrRoomId | String | Yes | Room ID of the string type. |
| Result | Array | Yes | Results of audio recognition in the format of [{},{},{},{}]. |
| Status | String | Yes | Recognition status of the current user. Valid values: normal and finished. |

The value of Result is a JSON array that contains the following objects:

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| Voice | String | Yes | Text of the current sentence, encoded in UTF-8. |
| Index | Integer | Yes | Sequence number of the current sentence in the entire audio stream. The sequence number starts from 0. |
| StartTime | Integer | Yes | Start time of the current sentence in the entire audio stream. |
| EndTime | Integer | Yes | End time of the current sentence in the entire audio stream. |
| Message | String | Yes | Execution result of the recognition task, such as recognition finished, recognition in progress, and recognition failed. |

Sample result:
{
    "RequestID": "95941e2c85898384a95b81c2a5******",
    "SdkAppId": 1400000000,
    "RoomId": 43474,
    "UserId": "user_55952145",
    "Status": "recognizing/finished",
    "Result": [{
        "Voice": "Real-time voice recognition",
        "Index": 0,
        "StartTime": 0,
        "EndTime": 1024,
        "Message": "success"
    }]
}
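
On the business server, a webhook like the one above could be received and summarized along these lines. This is a sketch, not a reference implementation: the names `summarize_result`, `AsrWebhookHandler`, and `serve`, the port 8080, joining sentences with a space, and replying with HTTP 200 are all assumptions. The document only specifies that the result is sent with the POST method and carries the fields listed above.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer


def summarize_result(payload: dict) -> dict:
    """Extract the user, completion flag, and ordered transcript from a payload."""
    # Sort by Index so sentences appear in their order within the audio stream.
    sentences = sorted(payload.get("Result", []), key=lambda s: s["Index"])
    return {
        "user_id": payload.get("UserId"),
        "finished": payload.get("Status") == "finished",
        "text": " ".join(s["Voice"] for s in sentences),
    }


class AsrWebhookHandler(BaseHTTPRequestHandler):
    """Minimal receiver that accepts a JSON POST body (assumed format)."""

    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        try:
            payload = json.loads(self.rfile.read(length))
        except json.JSONDecodeError:
            payload = None
        if not isinstance(payload, dict):
            # Reject bodies that are not a JSON object.
            self.send_response(400)
            self.end_headers()
            return
        print(summarize_result(payload))  # replace with real business logic
        self.send_response(200)
        self.end_headers()


def serve(port: int = 8080) -> None:
    """Run the receiver until interrupted. The port is an arbitrary choice."""
    HTTPServer(("0.0.0.0", port), AsrWebhookHandler).serve_forever()
```

Calling `serve()` starts the receiver. The address passed as `Callback` in the startup request would then need to point to this server.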
