Real-Time Speech Recognition

Last updated: 2022-08-30 11:17:56

    Connection Preparations

    SDK acquisition

    The real-time speech recognition SDK and demo for Android can be downloaded here.

    Notes on connection

    Before calling the API, read the API description of real-time speech recognition to understand its requirements and usage.
    The API requires an internet connection (over GPRS, 3G, Wi-Fi, etc.) and Android 4.0 or later.

    Development environment

    Import the ASR SDK AAR package speech_release.aar:
    implementation(name: 'speech_release', ext: 'aar')
    Add the OkHttp3, Okio, GSON, and SLF4J dependencies in the build.gradle file:
    implementation 'com.squareup.okhttp3:okhttp:4.2.2'
    implementation 'com.squareup.okio:okio:1.11.0'
    implementation 'com.google.code.gson:gson:2.8.5'
    implementation 'org.slf4j:slf4j-api:1.7.25'
    Add the following permissions in AndroidManifest.xml:
    <uses-permission android:name="android.permission.RECORD_AUDIO"/>
    <uses-permission android:name="android.permission.INTERNET"/>
    <uses-permission android:name="android.permission.ACCESS_NETWORK_STATE"/>

    Quick Connection

    Starting real-time speech recognition

    int appid = XXX;
    int projectid = XXX;
    String secretId = "XXX";

    // For convenience, the SDK provides local signing. To keep the secretKey secure,
    // generate the signature on a third-party server in a production environment.
    AbsCredentialProvider credentialProvider = new LocalCredentialProvider("your secretKey");

    final AAIClient aaiClient;
    try {
        // 1. Initialize the AAIClient object.
        aaiClient = new AAIClient(this, appid, projectid, secretId, credentialProvider);

        /**
         * You can also authenticate with temporary credentials:
         * 1. Obtain the temporary credentials through STS (implement this on your server side).
         * 2. Call the API with the temporary credentials.
         */
        // aaiClient = new AAIClient(MainActivity.this, appid, projectId, "temporary secretId", "temporary secretKey", "corresponding token", credentialProvider);

        // 2. Initialize the ASR request.
        final AudioRecognizeRequest audioRecognizeRequest = new AudioRecognizeRequest.Builder()
                .pcmAudioDataSource(new AudioRecordDataSource()) // Use the microphone as the audio source
                .build();

        // 3. Initialize the ASR result listener.
        final AudioRecognizeResultListener audioRecognizeResultListener = new AudioRecognizeResultListener() {
            @Override
            public void onSliceSuccess(AudioRecognizeRequest audioRecognizeRequest, AudioRecognizeResult audioRecognizeResult, int i) {
                // Returns the recognition result of an audio segment
            }

            @Override
            public void onSegmentSuccess(AudioRecognizeRequest audioRecognizeRequest, AudioRecognizeResult audioRecognizeResult, int i) {
                // Returns the recognition result of an audio stream
            }

            @Override
            public void onSuccess(AudioRecognizeRequest audioRecognizeRequest, String s) {
                // Returns all recognition results
            }

            @Override
            public void onFailure(AudioRecognizeRequest audioRecognizeRequest, ClientException e, ServerException e1) {
                // Recognition failed
            }
        };

        // 4. Start ASR.
        new Thread(new Runnable() {
            @Override
            public void run() {
                if (aaiClient != null) {
                    aaiClient.startAudioRecognize(audioRecognizeRequest, audioRecognizeResultListener);
                }
            }
        }).start();

    } catch (ClientException e) {
        e.printStackTrace();
    }

    Stopping real-time speech recognition

    // 1. Get the request ID.
    final int requestId = audioRecognizeRequest.getRequestId();
    // 2. Call the stop method.
    new Thread(new Runnable() {
        @Override
        public void run() {
            if (aaiClient != null) {
                // Stop ASR and wait for the current task to finish
                aaiClient.stopAudioRecognize(requestId);
            }
        }
    }).start();

    Canceling real-time speech recognition

    // 1. Get the request ID.
    final int requestId = audioRecognizeRequest.getRequestId();
    // 2. Call the cancel method.
    new Thread(new Runnable() {
        @Override
        public void run() {
            if (aaiClient != null) {
                // Cancel ASR and discard the current task
                aaiClient.cancelAudioRecognize(requestId);
            }
        }
    }).start();

    Descriptions of Main API Classes and Methods

    Calculating signature

    To calculate the signature, you need to implement the AbsCredentialProvider API yourself. The SDK calls this method internally; the upper layer doesn't need to care about where the source string comes from.
    The signature calculation function is as follows:
    /**
     * Signature function: encrypts the source string; the algorithm is described below.
     * @param source Source string
     * @return Encrypted ciphertext
     */
    String getAudioRecognizeSign(String source);
    Signature algorithm: the source is first encrypted with HMAC-SHA1 using the SecretKey, and the ciphertext is then Base64-encoded to produce the final signature string, i.e., sign = Base64Encode(HmacSha1(source, secretKey)).
    The SDK provides the implementation class LocalCredentialProvider for testing purposes. To keep the SecretKey secure, we recommend using it only in the test environment and, in the production environment, implementing the AbsCredentialProvider method in the upper layer so that signing happens on your server.
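    For reference, a minimal local implementation of the algorithm above might look like the following sketch. It is for testing only; it assumes AbsCredentialProvider is an abstract class requiring only getAudioRecognizeSign (use implements instead of extends if the SDK declares it as an interface), uses android.util.Base64 for encoding, and the class name is made up.
    import javax.crypto.Mac;
    import javax.crypto.spec.SecretKeySpec;

    // Test-only sketch: computes sign = Base64Encode(HmacSha1(source, secretKey)).
    // In production, send `source` to your server and compute the signature there.
    public class HmacSha1CredentialProvider extends AbsCredentialProvider {

        private final String secretKey; // never ship a real key inside the APK

        public HmacSha1CredentialProvider(String secretKey) {
            this.secretKey = secretKey;
        }

        @Override
        public String getAudioRecognizeSign(String source) {
            try {
                Mac mac = Mac.getInstance("HmacSHA1");
                mac.init(new SecretKeySpec(secretKey.getBytes("UTF-8"), "HmacSHA1"));
                byte[] cipher = mac.doFinal(source.getBytes("UTF-8"));
                return android.util.Base64.encodeToString(cipher, android.util.Base64.NO_WRAP);
            } catch (Exception e) {
                throw new RuntimeException("Failed to sign the request", e);
            }
        }
    }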

    Initializing AAIClient

    AAIClient is a core class of ASR, which you can call to start, stop, and cancel speech recognition.
    public AAIClient(Context context, int appid, int projectId, String secreteId, AbsCredentialProvider credentialProvider) throws ClientException
    Parameter | Type | Required | Description
    --- | --- | --- | ---
    context | Context | Yes | Context
    appid | Int | Yes | AppID registered with Tencent Cloud
    projectId | Int | No | Your projectId
    secreteId | String | Yes | Your SecretId
    credentialProvider | AbsCredentialProvider | Yes | Authentication class
    Sample:
    try {
        AAIClient aaiClient = new AAIClient(context, appid, projectId, secretId, credentialProvider);
    } catch (ClientException e) {
        e.printStackTrace();
    }
    If AAIClient is no longer needed, you need to call the release() method to release resources:
    aaiClient.release();

    Configuring global parameters

    You need to call the static methods of the ClientConfiguration class to modify the global configuration.
    Method | Description | Default Value | Valid Range
    --- | --- | --- | ---
    setMaxAudioRecognizeConcurrentNumber | Maximum number of concurrent speech recognition requests | 2 | 1-5
    setMaxRecognizeSliceConcurrentNumber | Maximum number of concurrent segments for speech recognition | 5 | 1-5
    setAudioRecognizeSliceTimeout | HTTP read timeout period | 5000 ms | 500-10000 ms
    setAudioRecognizeConnectTimeout | HTTP connection timeout period | 5000 ms | 500-10000 ms
    setAudioRecognizeWriteTimeout | HTTP write timeout period | 5000 ms | 500-10000 ms
    Sample:
    ClientConfiguration.setMaxAudioRecognizeConcurrentNumber(2);
    ClientConfiguration.setMaxRecognizeSliceConcurrentNumber(5);
    ClientConfiguration.setAudioRecognizeSliceTimeout(2000);
    ClientConfiguration.setAudioRecognizeConnectTimeout(2000);
    ClientConfiguration.setAudioRecognizeWriteTimeout(2000);

    Setting result listener

    AudioRecognizeResultListener can be used to listen for speech recognition results. It has the following four callback APIs:
    Speech recognition result callback API for an audio segment:
    void onSliceSuccess(AudioRecognizeRequest request, AudioRecognizeResult result, int order);
    Parameter | Type | Description
    --- | --- | ---
    request | AudioRecognizeRequest | Speech recognition request
    result | AudioRecognizeResult | Speech recognition result of the audio segment
    order | Int | Sequence of the audio stream the segment belongs to
    Speech recognition result callback API for an audio stream:
    void onSegmentSuccess(AudioRecognizeRequest request, AudioRecognizeResult result, int order);
    Parameter | Type | Description
    --- | --- | ---
    request | AudioRecognizeRequest | Speech recognition request
    result | AudioRecognizeResult | Speech recognition result of the audio stream
    order | Int | Sequence of the audio stream
    Callback API returning all recognition results:
    void onSuccess(AudioRecognizeRequest request, String result);
    Parameter | Type | Description
    --- | --- | ---
    request | AudioRecognizeRequest | Speech recognition request
    result | String | All recognition results
    Callback API for a failed ASR request:
    void onFailure(AudioRecognizeRequest request, final ClientException clientException, final ServerException serverException, String response);
    Parameter | Type | Description
    --- | --- | ---
    request | AudioRecognizeRequest | Speech recognition request
    clientException | ClientException | Client exception
    serverException | ServerException | Server exception
    response | String | JSON string returned by the server
    For the sample code, see Demo.

    Setting speech recognition parameters

    By constructing the AudioRecognizeConfiguration class, you can set the speech recognition configuration:
    Parameter | Type | Required | Description | Default Value
    --- | --- | --- | --- | ---
    setSilentDetectTimeOut | Boolean | No | Whether to enable silence detection. When enabled, the silent part before the actual speech starts is not recognized | true
    audioFlowSilenceTimeOut | Int | No | Silence detection timeout. Recording stops automatically once silence lasts longer than this period | 5000 ms
    minAudioFlowSilenceTime | Int | No | Minimum silence interval used to segment two audio streams | 2000 ms
    minVolumeCallbackTime | Int | No | Volume callback interval | 80 ms
    Sample:
    AudioRecognizeConfiguration audioRecognizeConfiguration = new AudioRecognizeConfiguration.Builder()
            .setSilentDetectTimeOut(true)   // Enable silence detection; false means the silent part is not checked
            .audioFlowSilenceTimeOut(5000)  // Silence timeout to stop recording
            .minAudioFlowSilenceTime(2000)  // Interval between two audio streams during recognition
            .minVolumeCallbackTime(80)      // Volume callback interval
            .build();

    // Start ASR
    new Thread(new Runnable() {
        @Override
        public void run() {
            if (aaiClient != null) {
                aaiClient.startAudioRecognize(audioRecognizeRequest, audioRecognizeResultListener, audioRecognizeConfiguration);
            }
        }
    }).start();

    Setting status listener

    AudioRecognizeStateListener can be used to listen for speech recognition status. It has the following APIs:
    Method | Description
    --- | ---
    onStartRecord | Recording started
    onStopRecord | Recording stopped
    onVoiceFlowStart | Audio stream started
    onVoiceFlowFinish | Audio stream finished
    onVoiceFlowStartRecognize | Audio stream recognition started
    onVoiceFlowFinishRecognize | Audio stream recognition finished
    onVoiceVolume | Volume callback
    onNextAudioData | Returns the audio stream to the host layer for recording caching; called back only when new AudioRecordDataSource(true) is used

    Setting timeout listener

    AudioRecognizeTimeoutListener can be used to listen for speech recognition timeouts. It has the following two APIs:
    Method | Description
    --- | ---
    onFirstVoiceFlowTimeout | Detects a timeout of the first audio stream
    onNextVoiceFlowTimeout | Detects a timeout of the next audio stream
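    A minimal timeout listener might look like the following sketch. The callback parameter list is an assumption based on the other listener callbacks in this document; check the SDK for the authoritative signatures.
    // Sketch: react to speech start/continuation timeouts, e.g., by stopping the request.
    AudioRecognizeTimeoutListener audioRecognizeTimeoutListener = new AudioRecognizeTimeoutListener() {
        @Override
        public void onFirstVoiceFlowTimeout(AudioRecognizeRequest audioRecognizeRequest) {
            // No speech detected before the first audio stream started
        }

        @Override
        public void onNextVoiceFlowTimeout(AudioRecognizeRequest audioRecognizeRequest) {
            // No speech detected before the next audio stream started
        }
    };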
    Status listener sample:
    AudioRecognizeStateListener audioRecognizeStateListener = new AudioRecognizeStateListener() {
        @Override
        public void onStartRecord(AudioRecognizeRequest audioRecognizeRequest) {
            // Recording started
        }

        @Override
        public void onStopRecord(AudioRecognizeRequest audioRecognizeRequest) {
            // Recording stopped
        }

        @Override
        public void onVoiceFlowStart(AudioRecognizeRequest audioRecognizeRequest, int i) {
            // Audio stream started
        }

        @Override
        public void onVoiceFlowFinish(AudioRecognizeRequest audioRecognizeRequest, int i) {
            // Audio stream finished
        }

        @Override
        public void onVoiceFlowStartRecognize(AudioRecognizeRequest audioRecognizeRequest, int i) {
            // Audio stream recognition started
        }

        @Override
        public void onVoiceFlowFinishRecognize(AudioRecognizeRequest audioRecognizeRequest, int i) {
            // Audio stream recognition finished
        }

        @Override
        public void onVoiceVolume(AudioRecognizeRequest audioRecognizeRequest, int i) {
            // Volume callback
        }

        /**
         * Returns the audio stream to the host layer for the recording cache service.
         * Because this method runs on the SDK thread, it is mostly used for file operations;
         * the host should create a dedicated thread for the actual business logic.
         * Called back only when new AudioRecordDataSource(true) is used.
         * @param audioDatas audio data buffer
         */
        @Override
        public void onNextAudioData(final short[] audioDatas, final int readBufferLength) {
        }
    };

    Descriptions of other important classes

    AudioRecognizeRequest

    If both templateName and customTemplate are set, templateName takes precedence.
    Parameter | Type | Required | Description | Default Value
    --- | --- | --- | --- | ---
    pcmAudioDataSource | PcmAudioDataSource | Yes | Audio data source | None
    templateName | String | No | Template name set in the console | None
    customTemplate | AudioRecognizeTemplate | No | Custom template | ("16k_zh", 1)

    AudioRecognizeResult

    Speech recognition result object, which corresponds to the AudioRecognizeRequest object and is used to return the speech recognition result.
    Parameter | Type | Description
    --- | --- | ---
    code | Int | Recognition status code
    message | String | Recognition prompt message
    text | String | Recognition result
    seq | Int | Sequence number of the audio segment
    voiceId | String | ID of the audio stream the segment belongs to
    cookie | String | Cookie value

    AudioRecognizeTemplate

    Custom audio template, for which you need to set the following parameters:
    Parameter | Type | Required | Description
    --- | --- | --- | ---
    engineModelType | String | Yes | Engine model type
    resType | Int | Yes | Result return method
    Sample:
    AudioRecognizeTemplate audioRecognizeTemplate = new AudioRecognizeTemplate("16k_zh", 1);

    PcmAudioDataSource

    Implement this API class to recognize mono-channel PCM audio data with a 16 kHz sample rate. It mainly includes the following methods:
    Adds data to the speech recognizer: copy length samples starting at index 0 into the audioPcmData array and return the number of samples actually copied.
    int read(short[] audioPcmData, int length);
    Callback invoked when recognition starts, where you can perform initialization.
    void start() throws AudioRecognizerException;
    Callback invoked when recognition ends, where you can perform cleanup.
    void stop();
    Gets the path of the PCM recording source file saved by the SDK.
    void savePcmFileCallBack(String filePath);
    Gets the path of the WAV recording source file saved by the SDK.
    void saveWaveFileCallBack(String filePath);
    Sets the maximum amount of data the speech recognizer reads each time.
    int maxLengthOnceRead();
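    As an illustration, a minimal custom data source backed by an in-memory queue might look like the following sketch. It assumes PcmAudioDataSource is an interface with exactly the methods above; the class name, queue, and push helper are made up.
    import java.util.concurrent.BlockingQueue;
    import java.util.concurrent.LinkedBlockingQueue;

    // Sketch: feeds 16 kHz mono PCM samples pushed by the app into the recognizer.
    public class QueuePcmDataSource implements PcmAudioDataSource {

        private final BlockingQueue<short[]> queue = new LinkedBlockingQueue<>();

        // Called by the app whenever new PCM samples are available.
        public void push(short[] samples) {
            queue.offer(samples);
        }

        @Override
        public int read(short[] audioPcmData, int length) {
            short[] chunk = queue.poll();
            if (chunk == null) {
                return 0; // no data available yet
            }
            int n = Math.min(length, chunk.length);
            System.arraycopy(chunk, 0, audioPcmData, 0, n);
            return n; // number of samples actually copied
        }

        @Override
        public void start() throws AudioRecognizerException {
            // Initialization work before recognition starts
        }

        @Override
        public void stop() {
            queue.clear(); // cleanup when recognition ends
        }

        @Override
        public void savePcmFileCallBack(String filePath) {
            // Path of the PCM recording file saved by the SDK (if enabled)
        }

        @Override
        public void saveWaveFileCallBack(String filePath) {
            // Path of the WAV recording file saved by the SDK (if enabled)
        }

        @Override
        public int maxLengthOnceRead() {
            return 1024; // max number of samples the recognizer reads at a time
        }
    }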

    AudioRecordDataSource

    Implementation class of the PcmAudioDataSource API that reads audio data directly from the microphone for real-time recognition.

    AudioFileDataSource

    Implementation class of the PcmAudioDataSource API that directly reads mono-channel PCM audio files with a 16 kHz sample rate.
    Note:
    Data in other formats cannot be recognized accurately.
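    For example, recognizing a local file instead of the microphone might look like this; the AudioFileDataSource(String) constructor and the file path are assumptions based on the demo project.
    // Sketch: recognize a local 16 kHz mono PCM file.
    AudioRecognizeRequest fileRequest = new AudioRecognizeRequest.Builder()
            .pcmAudioDataSource(new AudioFileDataSource("/sdcard/test.pcm")) // hypothetical path
            .build();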

    AAILogger

    You can use AAILogger to enable or disable log output at the DEBUG, INFO, WARN, and ERROR levels:
    public static void disableDebug();
    public static void disableInfo();
    public static void disableWarn();
    public static void disableError();
    public static void enableDebug();
    public static void enableInfo();
    public static void enableWarn();
    public static void enableError();
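    For example, to keep only warnings and errors in a release build:
    // Silence verbose logs; keep warnings and errors.
    AAILogger.disableDebug();
    AAILogger.disableInfo();
    AAILogger.enableWarn();
    AAILogger.enableError();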

    Guide for Local Audio Data Caching

    You can save the audio locally in the host layer by following the steps below:
    1. Set isSaveAudioRecordFiles to true when initializing new AudioRecordDataSource(isSaveAudioRecordFiles).
    2. Add the logic for creating the recording file in the AudioRecognizeStateListener.onStartRecord callback. You can customize the path and file name.
    3. Add the stream-closing logic in the AudioRecognizeStateListener.onStopRecord callback, and optionally convert the PCM file to a WAV file there.
    4. Add the logic for writing the audio stream to the local file in the AudioRecognizeStateListener.onNextAudioData callback.
    5. Because these callbacks all run on the SDK thread, slow writes could affect the SDK's internal responsiveness; we therefore recommend performing the steps above on a single-threaded executor, as in the sketch after this list. For more information, see the sample code in the MainActivity class in the demo project.
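    A minimal sketch of steps 2-5 follows. It assumes the listener is created inside an Activity (for getFilesDir()); the file path, variable names, and executor setup are illustrative, and only the callback signatures come from this document.
    import java.io.DataOutputStream;
    import java.io.FileOutputStream;
    import java.io.IOException;
    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.Executors;

    // Step 5: a single-threaded executor keeps file writes off the SDK thread
    // while preserving the order of the callbacks.
    final ExecutorService fileExecutor = Executors.newSingleThreadExecutor();
    final DataOutputStream[] pcmStream = new DataOutputStream[1];
    final String pcmPath = getFilesDir() + "/record.pcm"; // hypothetical path

    AudioRecognizeStateListener cachingListener = new AudioRecognizeStateListener() {
        @Override
        public void onStartRecord(AudioRecognizeRequest request) {
            // Step 2: create the recording file; path and name are up to you.
            fileExecutor.execute(new Runnable() {
                @Override
                public void run() {
                    try {
                        pcmStream[0] = new DataOutputStream(new FileOutputStream(pcmPath));
                    } catch (IOException e) {
                        e.printStackTrace();
                    }
                }
            });
        }

        @Override
        public void onStopRecord(AudioRecognizeRequest request) {
            // Step 3: close the stream; optionally convert the PCM file to WAV here.
            fileExecutor.execute(new Runnable() {
                @Override
                public void run() {
                    try {
                        if (pcmStream[0] != null) pcmStream[0].close();
                    } catch (IOException e) {
                        e.printStackTrace();
                    }
                }
            });
        }

        @Override
        public void onNextAudioData(final short[] audioDatas, final int readBufferLength) {
            // Step 4: append the audio data; requires new AudioRecordDataSource(true) (step 1).
            fileExecutor.execute(new Runnable() {
                @Override
                public void run() {
                    try {
                        for (int i = 0; i < readBufferLength; i++) {
                            // writeShort() is big-endian; reverse bytes to store little-endian PCM
                            pcmStream[0].writeShort(Short.reverseBytes(audioDatas[i]));
                        }
                    } catch (IOException e) {
                        e.printStackTrace();
                    }
                }
            });
        }

        // Remaining callbacks are not needed for caching.
        @Override public void onVoiceFlowStart(AudioRecognizeRequest request, int seq) {}
        @Override public void onVoiceFlowFinish(AudioRecognizeRequest request, int seq) {}
        @Override public void onVoiceFlowStartRecognize(AudioRecognizeRequest request, int seq) {}
        @Override public void onVoiceFlowFinishRecognize(AudioRecognizeRequest request, int seq) {}
        @Override public void onVoiceVolume(AudioRecognizeRequest request, int volume) {}
    };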