Use cases
Draw and guess: The audio of a user in the room is pulled through this API for real-time recognition, converted into text, and sent via callback to the customer's business server for business logic decisions.
Audio moderation: The API delivers the audio streams to the speech recognition service, where the recognized text can be filtered by keyword as the business requires.
Real-time subtitles: The audio data in a room is recognized through this API in real time and converted into text, which is displayed at the frontend.
Architecture
The following figure shows the detailed process:
Application strengths
Real-time return: The audio data in a Tencent Real-Time Communication (TRTC) room is recognized and returned in real time, which is fast and efficient.
Simple process: TRTC and Automatic Speech Recognition (ASR) are deeply integrated, so that the data streams are fully pipelined without complex operations.
Flexible use: The data can be associated with the business logic in real time after it is returned to the business server.
Must-knows
Speech recognition generally takes a long time, so Async Execution is enabled when the function is deployed. The recognition results are sent to the business server; because WebSocket connections are not supported, the results cannot be pushed to clients directly.
The default authentication type is App Authentication. For more information, see Application Management. You can change the authentication type to Authentication-Free during a test. For more information, see [Step 3. Create an API with Mock as the Backend Type](https://www.tencentcloud.com/document/product/628/44318!6af3e7433c19e0c5b321cde7fed199bd).
Directions
1. Activate the service
You must activate Tencent Cloud ASR. For more information, see [Activate the service](https://www.tencentcloud.com/document/product/1118/43344#3.-.E6.96.B0.E6.89.8B.E5.85.A5.E9.97.A8!673aaf6cd84c2d22e15eb0eeef6866e8).
2. Deploy the function
2. Click Create application to go to the "Create application" page.
3. Select Live Stream Real-time ASR and set parameters on the Basic Configuration page.
Application name: Specify a custom name for the application.
Region: Select the region based on the actual business.
Key information: You can view the key information of your Tencent Cloud account on the Manage API Key page.
4. Click Complete.
5. Click the function name in the Function name column of the Cloud function section to go to the function details page, and click Trigger management on the left to view the access path.
3. Configure the ASR startup API
Protocol: HTTPS
Method: POST
URL: https://service-xxx-xxxx.sh.apigw.tencentcs.com/release/asr_speech
Request parameters:
| Parameter | Type | Description |
| --- | --- | --- |
| SdkAppId | Integer | Application ID. Each TRTC application has a unique application ID. |
| RoomId | Integer | Room ID of the integer type. Each room of a TRTC application has a unique room ID. |
| StrRoomId | String | Room ID of the string type. Either RoomId or StrRoomId must be configured. If both are configured, RoomId is used. |
| UserId | String | ID of the user who uses the recording service. Each user of a TRTC application has a unique user ID. |
| UserSig | String | Signature of the user who uses the recording service. The signature is used for login authentication of the user. |
| Callback | String | The address to which a webhook is sent by using the POST method when the recording ends. |
Sample request:
{
"SdkAppId": 1400000000,
"RoomId": 43474,
"UserId": "user_55952145",
"UserSig": "eJwtzNEKgkAUBNBxxxxxxx",
"Callback": "https:xxxxxxxx.com/post/xxx"
}
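As a sketch, a business server could issue the start call above as follows. The gateway URL, IDs, and credentials are placeholders, and `build_start_request`/`start_recognition` are hypothetical helper names, not part of the API:

```python
import json
import urllib.request

# Placeholder gateway URL -- use the access path shown under "Trigger management".
API_URL = "https://service-xxx-xxxx.sh.apigw.tencentcs.com/release/asr_speech"

def build_start_request(sdk_app_id, room_id, user_id, user_sig, callback_url):
    """Assemble the JSON body for the ASR startup API (field names as above)."""
    return {
        "SdkAppId": sdk_app_id,
        "RoomId": room_id,        # use "StrRoomId" instead for string room IDs
        "UserId": user_id,
        "UserSig": user_sig,
        "Callback": callback_url, # webhook URL that will receive the results
    }

def start_recognition(body):
    """POST the request body to the deployed function over HTTPS."""
    req = urllib.request.Request(
        API_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read().decode("utf-8"))
```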
Recognition result webhook API
Webhook parameters:
| Parameter | Type | Description |
| --- | --- | --- |
| SdkAppId | Integer | Application ID. |
| RoomId | Integer | Room ID of the integer type. |
| UserId | String | ID of the recognized user. |
| StrRoomId | String | Room ID of the string type. |
| Result | Array | Results of audio recognition in the format of [{},{},{},{}]. |
| Status | String | Recognition status of the current user. Valid values: normal and finished. |
The value of Result is a JSON array that contains the following objects:
| Parameter | Type | Description |
| --- | --- | --- |
| Voice | String | Text of the current sentence in UTF-8. |
| Index | Integer | Sequence number of the current sentence in the entire audio stream, starting from 0. |
| StartTime | Integer | Start time of the current sentence in the entire audio stream. |
| EndTime | Integer | End time of the current sentence in the entire audio stream. |
| Message | String | Execution result of the recognition task, such as recognition finished, recognition in progress, and recognition failed. |
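For instance, a business server can stitch the per-sentence results into a running transcript by ordering on Index. A minimal sketch (`assemble_transcript` is a hypothetical helper):

```python
def assemble_transcript(result):
    """Join the Voice text of each sentence, ordered by its Index field.

    `result` is the Result array from the webhook payload described above.
    """
    sentences = sorted(result, key=lambda s: s["Index"])
    return " ".join(s["Voice"] for s in sentences)
```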
Sample result:
{
"RequestID": "95941e2c85898384a95b81c2a5******",
"SdkAppId": 1400000000,
"RoomId": 43474,
"UserId": "user_55952145",
"Status": "recognizing/finished",
"Result": [{
"Voice": "Real-time voice recognition",
"Index": 0,
"StartTime": 0,
"EndTime": 1024,
"Message": "success"
}]
}
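A minimal webhook receiver on the business server might look like the following standard-library sketch. Field names follow the tables above; the port, handler name, and `extract_sentences` helper are assumptions for illustration, and a production server would add authentication and error handling:

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

def extract_sentences(payload):
    """Pull (user, index, text) tuples out of one webhook payload."""
    return [(payload["UserId"], s["Index"], s["Voice"])
            for s in payload.get("Result", [])]

class AsrCallbackHandler(BaseHTTPRequestHandler):
    """Accepts the recognition-result POST webhook and returns HTTP 200."""

    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        payload = json.loads(self.rfile.read(length))
        for user, index, text in extract_sentences(payload):
            print(user, index, text)          # hand off to business logic here
        if payload.get("Status") == "finished":
            print("recognition finished for", payload["UserId"])
        self.send_response(200)
        self.end_headers()

# To run locally (blocks forever):
# HTTPServer(("0.0.0.0", 8080), AsrCallbackHandler).serve_forever()
```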