Deploying ASR

Last updated: 2024-12-02 11:16:20

    Use cases

    Draw and guess: The audio of a user in the room is pulled through this API, recognized in real time, and converted into text. The text is then sent to your business server through a callback so that your business logic can evaluate it.
    Audio audit: The audio streams are delivered to the speech recognition API for recognition and keyword-based filtering. How the results are used depends on your business.
    Real-time subtitles: The audio data in a room is recognized through this API in real time and converted into text, which is displayed at the frontend.

    Architecture

    The following figure shows the detailed process:
    
    
    

    Application strengths

    Real-time return: The audio data in a Tencent Real-Time Communication (TRTC) room is recognized and returned in real time, which is fast and efficient.
    Simple process: TRTC and Automatic Speech Recognition (ASR) are deeply integrated, so that the data streams are fully pipelined without complex operations.
    Flexible use: The data can be associated with the business logic in real time after it is returned to the business server.

    Must-knows

    Speech recognition generally takes a long time to process, so Async Execution is enabled when the function is deployed.
    The recognition results are sent to the business server. WebSocket connections are not supported. Therefore, the recognition results cannot be sent to clients.
    The default authentication type is App Authentication. For more information, see Application Management. For testing, you can change the authentication type to Authentication-Free. For more information, see [Step 3. Create an API with Mock as the Backend Type](https://www.tencentcloud.com/document/product/628/44318).

    Directions

    1. Activate the service

    You must activate Tencent Cloud ASR. For more information, see [Activate the service](https://www.tencentcloud.com/document/product/1118/43344#3.-.E6.96.B0.E6.89.8B.E5.85.A5.E9.97.A8).

    2. Deploy the function

    1. Log in to the SLS console.
    2. Click Create application to go to the "Create application" page.
    3. Select Live Stream Real-time ASR and set parameters on the Basic Configuration page.
    Application name: Specify a custom name for the application.
    Region: Select the region based on the actual business.
    Key information: You can view the API key information of your Tencent Cloud account on the Manage API Key page.
    4. Click Complete.
    5. Click the function name in the Function name column in the Cloud function section to go to the details page of the function, and click "Trigger management" on the left to view the access path.

    3. Configure the ASR startup API

    Protocol: HTTPS
    Method: POST
    URL: https://service-xxx-xxxx.sh.apigw.tencentcs.com/release/asr_speech
    Request parameters:
    | Parameter | Type | Required | Description |
    |---|---|---|---|
    | SdkAppId | Int | Yes | Application ID. Each TRTC application has a unique application ID. |
    | RoomId | Int | No | Room ID of the integer type. Each room of a TRTC application has a unique room ID. |
    | StrRoomId | String | No | Room ID of the string type. Either RoomId or StrRoomId must be configured. If both are configured, RoomId is used. |
    | UserId | String | Yes | ID of the user who uses the recording service. Each user of a TRTC application has a unique user ID. |
    | UserSig | String | Yes | Signature of the user who uses the recording service. The signature is used for login authentication of the user. |
    | Callback | String | No | The address to which a webhook is sent by using the POST method when the recording ends. |
    Sample request:
    {
        "SdkAppId": 1400000000,
        "RoomId": 43474,
        "UserId": "user_55952145",
        "UserSig": "eJwtzNEKgkAUBNBxxxxxxx",
        "Callback": "https:xxxxxxxx.com/post/xxx"
    }
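
The request above can be assembled programmatically. The following is a minimal Python sketch, not an official client. The helper names (`build_start_payload`, `build_start_request`, `start_asr`), the `Content-Type: application/json` header, and the timeout are assumptions made for illustration. The URL is the placeholder from this document and must be replaced with your own access path. The sketch also assumes that the API has been switched to Authentication-Free for testing; with App Authentication, additional credentials would be needed that are not shown here.

```python
import json
import urllib.request
from typing import Optional

# Placeholder URL copied from this document; replace it with the access path
# shown under "Trigger management" for your function.
ASR_START_URL = "https://service-xxx-xxxx.sh.apigw.tencentcs.com/release/asr_speech"


def build_start_payload(
    sdk_app_id: int,
    user_id: str,
    user_sig: str,
    room_id: Optional[int] = None,
    str_room_id: Optional[str] = None,
    callback: Optional[str] = None,
) -> dict:
    """Build the JSON body for the ASR startup API (hypothetical helper)."""
    if room_id is None and str_room_id is None:
        raise ValueError("Either RoomId or StrRoomId must be configured")
    payload = {"SdkAppId": sdk_app_id, "UserId": user_id, "UserSig": user_sig}
    # RoomId is used when both are configured, so only one of them is sent.
    if room_id is not None:
        payload["RoomId"] = room_id
    else:
        payload["StrRoomId"] = str_room_id
    if callback is not None:
        payload["Callback"] = callback
    return payload


def build_start_request(payload: dict, url: str = ASR_START_URL) -> urllib.request.Request:
    """Wrap the payload in an HTTPS POST request without sending it."""
    data = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        url,
        data=data,
        method="POST",
        # Assumption: the gateway accepts a JSON body with this header.
        headers={"Content-Type": "application/json"},
    )


def start_asr(payload: dict, url: str = ASR_START_URL, timeout: float = 10.0) -> str:
    """Send the request and return the raw response body as text."""
    request = build_start_request(payload, url)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8")
```

Separating `build_start_request` from `start_asr` makes it possible to inspect or log the exact body before anything is sent over the network.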

    Recognition result webhook API

    Webhook parameters:
    | Parameter | Type | Required | Description |
    |---|---|---|---|
    | SdkAppId | Int | Yes | Application ID. |
    | RoomId | Int | Yes | Room ID of the integer type. |
    | UserId | String | Yes | ID of the recognized user. |
    | StrRoomId | String | Yes | Room ID of the string type. |
    | Result | Array | Yes | Results of audio recognition in the format of [{},{},{},{}]. |
    | Status | String | Yes | Recognition status of the current user. Valid values: normal and finished. |
    The value of Result is a JSON array that contains the following objects:
    | Parameter | Type | Required | Description |
    |---|---|---|---|
    | Voice | String | Yes | Text of the current sentence in UTF-8. |
    | Index | Integer | Yes | Sequence number of the current sentence in the entire audio stream. The sequence number starts from 0. |
    | StartTime | Integer | Yes | Start time of the current sentence in the entire audio stream. |
    | EndTime | Integer | Yes | End time of the current sentence in the entire audio stream. |
    | Message | String | Yes | Execution result of the recognition task, such as recognition finished, recognition in progress, and recognition failed. |
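
The webhook parameters and the Result objects described above can be modeled as typed records on the business server. The following Python sketch is one possible way to do this. The class and function names (`Sentence`, `RecognitionCallback`, `parse_callback`) are hypothetical. The unit of StartTime and EndTime is not stated in this document, so the values are kept as plain integers. The room fields are read leniently because the sample result in this document does not include StrRoomId.

```python
import json
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Sentence:
    voice: str        # Text of the sentence (Voice)
    index: int        # Sequence number in the audio stream, starting from 0
    start_time: int   # StartTime, unit not specified in the document
    end_time: int     # EndTime, unit not specified in the document
    message: str      # Execution result of the recognition task


@dataclass
class RecognitionCallback:
    sdk_app_id: int
    user_id: str
    status: str
    room_id: Optional[int]
    str_room_id: Optional[str]
    sentences: List[Sentence]


def parse_callback(body: bytes) -> RecognitionCallback:
    """Parse a webhook body into typed records, ordered by sentence Index."""
    doc = json.loads(body.decode("utf-8"))
    sentences = [
        Sentence(
            voice=item["Voice"],
            index=item["Index"],
            start_time=item["StartTime"],
            end_time=item["EndTime"],
            message=item["Message"],
        )
        for item in doc.get("Result", [])
    ]
    sentences.sort(key=lambda s: s.index)
    return RecognitionCallback(
        sdk_app_id=doc["SdkAppId"],
        user_id=doc["UserId"],
        status=doc["Status"],
        room_id=doc.get("RoomId"),        # read leniently, see lead-in
        str_room_id=doc.get("StrRoomId"),
        sentences=sentences,
    )
```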
    Sample result:
    {
        "RequestID": "95941e2c85898384a95b81c2a5******",
        "SdkAppId": 1400000000,
        "RoomId": 43474,
        "UserId": "user_55952145",
        "Status": "recognizing/finished",
        "Result": [{
            "Voice": "Real-time voice recognition",
            "Index": 0,
            "StartTime": 0,
            "EndTime": 1024,
            "Message": "success"
        }]
    }
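
Because the recognition results are posted to the business server, that server needs an HTTP endpoint to receive them. The following is a minimal Python sketch of such a receiver that uses only the standard library. The names `extract_text`, `AsrCallbackHandler`, and `serve`, the port, and the choice to answer with status 200 are assumptions; this document does not specify what response the caller expects.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer
from typing import List, Tuple


def extract_text(body: bytes) -> Tuple[str, List[str]]:
    """Return (Status, sentence texts ordered by Index) from a webhook body."""
    doc = json.loads(body.decode("utf-8"))
    results = sorted(doc.get("Result", []), key=lambda r: r.get("Index", 0))
    return doc.get("Status", ""), [r.get("Voice", "") for r in results]


class AsrCallbackHandler(BaseHTTPRequestHandler):
    def do_POST(self) -> None:
        length = int(self.headers.get("Content-Length", 0))
        body = self.rfile.read(length)
        try:
            status, texts = extract_text(body)
        except (ValueError, AttributeError):
            # Malformed JSON, bad encoding, or a body that is not a JSON object.
            self.send_response(400)
            self.end_headers()
            return
        # Business logic (subtitles, keyword filtering, and so on) goes here.
        print(status, " ".join(texts))
        # Assumption: a plain 200 response is sufficient for the caller.
        self.send_response(200)
        self.end_headers()


def serve(port: int = 8080) -> None:
    """Block forever and handle webhook POST requests on the given port."""
    HTTPServer(("", port), AsrCallbackHandler).serve_forever()
```

Calling `serve()` starts the receiver. The address of this endpoint would then be passed as the Callback parameter of the startup request.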
    