Real-Time Speech Recognition

Connection Preparations
SDK acquisition
The real-time speech recognition SDK and demo for Android can be downloaded here.
Notes on connection
Before calling the API, read the API description of real-time speech recognition to understand its usage requirements and directions.
The API requires the phone to have an internet connection (for example, over GPRS, 3G, or Wi-Fi) and to run Android 4.0 or later.
Development environment
Import the AAR package 
speech_release.aar: ASR SDK.
implementation(name: 'speech_release', ext: 'aar')
Add dependencies
Add the OkHttp3, Okio, GSON, and SLF4J dependencies in the build.gradle file:
  implementation 'com.squareup.okhttp3:okhttp:4.2.2' 
  implementation 'com.squareup.okio:okio:1.11.0'
  implementation 'com.google.code.gson:gson:2.8.5'
  implementation 'org.slf4j:slf4j-api:1.7.25'
Add the following permissions in AndroidManifest.xml:
  <uses-permission android:name="android.permission.RECORD_AUDIO"/>
  <uses-permission android:name="android.permission.INTERNET"/>
  <uses-permission android:name="android.permission.ACCESS_NETWORK_STATE"/>
Quick Connection
Starting real-time speech recognition
int appid = XXX;
int projectId = XXX;
String secretId = "XXX";

// For convenience, the SDK provides a local signature provider. To keep the secretKey secure, generate the signature on a third-party server in a production environment.
AbsCredentialProvider credentialProvider = new LocalCredentialProvider("your secretKey");

final AAIClient aaiClient;
try {
    // 1. Initialize the AAIClient object.
    aaiClient = new AAIClient(this, appid, projectId, secretId, credentialProvider);

    /*
     * You can also authenticate with temporary credentials:
     * 1. Obtain temporary credentials through STS. Implement this step on your server side.
     * 2. Call the API with the temporary credentials.
     */
    // aaiClient = new AAIClient(MainActivity.this, appid, projectId, "temporary secretId", "temporary secretKey", "corresponding token", credentialProvider);

    // 2. Initialize the ASR request.
    final AudioRecognizeRequest audioRecognizeRequest = new AudioRecognizeRequest.Builder()
            .pcmAudioDataSource(new AudioRecordDataSource()) // Set the voice source to microphone input
            .build();

    // 3. Initialize the ASR result listener.
    final AudioRecognizeResultListener audioRecognizeResultListener = new AudioRecognizeResultListener() {
        @Override
        public void onSliceSuccess(AudioRecognizeRequest audioRecognizeRequest, AudioRecognizeResult audioRecognizeResult, int i) {
            // Returns the recognition result of the voice segment
        }

        @Override
        public void onSegmentSuccess(AudioRecognizeRequest audioRecognizeRequest, AudioRecognizeResult audioRecognizeResult, int i) {
            // Returns the recognition result of the voice stream
        }

        @Override
        public void onSuccess(AudioRecognizeRequest audioRecognizeRequest, String s) {
            // Returns all recognition results
        }

        // Signature as documented in "Setting result listener"; response is the JSON string returned by the server
        @Override
        public void onFailure(AudioRecognizeRequest audioRecognizeRequest, ClientException e, ServerException e1, String response) {
            // Recognition failed
        }
    };

    // 4. Start ASR.
    new Thread(new Runnable() {
        @Override
        public void run() {
            if (aaiClient != null) {
                aaiClient.startAudioRecognize(audioRecognizeRequest, audioRecognizeResultListener);
            }
        }
    }).start();

} catch (ClientException e) {
    e.printStackTrace();
}
Stopping real-time speech recognition
// 1. Get the request ID.
final int requestId = audioRecognizeRequest.getRequestId();
// 2. Call the stop method.
new Thread(new Runnable() {
    @Override
    public void run() {
        if (aaiClient != null) {
            // Stop ASR and wait for the current task to end
            aaiClient.stopAudioRecognize(requestId);
        }
    }
}).start();
Canceling real-time speech recognition
// 1. Get the request ID.
final int requestId = audioRecognizeRequest.getRequestId();
// 2. Call the cancel method.
new Thread(new Runnable() {
    @Override
    public void run() {
        if (aaiClient != null) {
            // Cancel ASR and discard the current task
            aaiClient.cancelAudioRecognize(requestId);
        }
    }
}).start();
Descriptions of Main API Classes and Methods
Calculating signature
To calculate the signature, you need to implement the AbsCredentialProvider API yourself. The SDK calls this method internally, so the upper layer does not need to care about where the source string comes from.
The signature calculation function is as follows:
/**
 * Signature function: signs the original string. The algorithm is described below.
 * @param source Original string
 * @return Signature string
 */
String getAudioRecognizeSign(String source);
Signature algorithm
The source is first signed with SecretKey using HMAC-SHA1, and the result is then Base64-encoded to get the final signature string, i.e., sign=Base64Encode(HmacSha1(source,secretKey)).
The SDK provides the implementation class LocalCredentialProvider for testing purposes. To keep SecretKey secure, we recommend that you use it only in the test environment and implement the method of the AbsCredentialProvider API in the upper layer in the production environment.
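The formula sign=Base64Encode(HmacSha1(source,secretKey)) can be sketched in plain Java as follows. This is a minimal illustration, not the SDK's own code: the class name SignatureSketch and the choice of UTF-8 for converting strings to bytes are assumptions. On Android versions below API level 26, java.util.Base64 is not available, so android.util.Base64 with the NO_WRAP flag would be needed instead.

```java
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.util.Base64;
import javax.crypto.Mac;
import javax.crypto.spec.SecretKeySpec;

/** Hypothetical helper, not part of the SDK. */
final class SignatureSketch {

    /** Computes Base64Encode(HmacSha1(source, secretKey)). UTF-8 encoding is an assumption. */
    static String sign(String source, String secretKey) {
        try {
            Mac mac = Mac.getInstance("HmacSHA1");
            mac.init(new SecretKeySpec(secretKey.getBytes(StandardCharsets.UTF_8), "HmacSHA1"));
            byte[] digest = mac.doFinal(source.getBytes(StandardCharsets.UTF_8));
            return Base64.getEncoder().encodeToString(digest);
        } catch (GeneralSecurityException e) {
            // Covers NoSuchAlgorithmException and InvalidKeyException
            throw new IllegalStateException("Unable to compute HMAC-SHA1", e);
        }
    }

    public static void main(String[] args) {
        // In production this call would run on a server that holds the SecretKey.
        System.out.println(sign("example source string", "example secret key"));
    }
}
```

A production implementation of getAudioRecognizeSign would typically send source to a server that runs this computation and returns the signature, so that SecretKey never ships inside the app.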
Initializing AAIClient
AAIClient is a core class of ASR, which you can call to start, stop, and cancel speech recognition.
public AAIClient(Context context, int appid, int projectId, String secreteId, AbsCredentialProvider credentialProvider) throws ClientException
Parameter	Type	Required	Description
context	Context	Yes	Context
appid	Int	Yes	AppID registered with Tencent Cloud
projectId	Int	No	Your projectId
secreteId	String	Yes	Your SecretId
credentialProvider	AbsCredentialProvider	Yes	Authentication class
Sample:
try {
    AAIClient aaiClient = new AAIClient(context, appid, projectId, secretId, credentialProvider);
} catch (ClientException e) {
    e.printStackTrace();
}
When AAIClient is no longer needed, call the release() method to release its resources:
aaiClient.release();
Configuring global parameters
Call the static methods of the ClientConfiguration class to modify the global configuration.
Method	Description	Default Value	Valid Range
setMaxAudioRecognizeConcurrentNumber	Maximum number of concurrent speech recognition requests	2	1 - 5
setMaxRecognizeSliceConcurrentNumber	Maximum number of concurrent segments for speech recognition	5	1 - 5
setAudioRecognizeSliceTimeout	HTTP read timeout period	5000ms	500 - 10000ms
setAudioRecognizeConnectTimeout	HTTP connection timeout period	5000ms	500 - 10000ms
setAudioRecognizeWriteTimeout	HTTP write timeout period	5000ms	500 - 10000ms
Sample:
ClientConfiguration.setMaxAudioRecognizeConcurrentNumber(2);
ClientConfiguration.setMaxRecognizeSliceConcurrentNumber(5);
ClientConfiguration.setAudioRecognizeSliceTimeout(2000);
ClientConfiguration.setAudioRecognizeConnectTimeout(2000);
ClientConfiguration.setAudioRecognizeWriteTimeout(2000);
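This guide does not say how the SDK treats values outside the valid ranges listed above. One defensive option is to check values on the app side before handing them to ClientConfiguration. The sketch below shows such a check; the class ConfigRangeSketch and the method requireInRange are hypothetical names, not SDK APIs.

```java
/** Hypothetical helper, not part of the SDK. */
final class ConfigRangeSketch {

    /** Returns value unchanged if it lies in [min, max]; otherwise throws IllegalArgumentException. */
    static int requireInRange(String name, int value, int min, int max) {
        if (value < min || value > max) {
            throw new IllegalArgumentException(
                    name + " must be between " + min + " and " + max + ", got " + value);
        }
        return value;
    }

    public static void main(String[] args) {
        // Ranges are taken from the table of global parameters.
        int sliceTimeoutMs = requireInRange("sliceTimeout", 2000, 500, 10000);
        int concurrentRequests = requireInRange("concurrentRequests", 2, 1, 5);
        System.out.println(sliceTimeoutMs + " ms, " + concurrentRequests + " requests");
        // The checked values would then be passed on, for example:
        // ClientConfiguration.setAudioRecognizeSliceTimeout(sliceTimeoutMs);
    }
}
```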
Setting result listener
AudioRecognizeResultListener can be used to listen for speech recognition results. It has the following four callback APIs:
Speech recognition result callback API for audio segment
void onSliceSuccess(AudioRecognizeRequest request, AudioRecognizeResult result, int order);
Parameters	Type	Description
request	AudioRecognizeRequest	Speech recognition request
result	AudioRecognizeResult	Speech recognition result of the audio segment
order	Int	Sequence of the audio stream of the audio segment
Speech recognition result callback API for audio stream
void onSegmentSuccess(AudioRecognizeRequest request, AudioRecognizeResult result, int order);
Parameters	Type	Description
request	AudioRecognizeRequest	Speech recognition request
result	AudioRecognizeResult	Speech recognition result of the audio segment
order	Int	Sequence of the audio stream
Return all recognition results
void onSuccess(AudioRecognizeRequest request, String result);
Parameters	Type	Description
request	AudioRecognizeRequest	Speech recognition request
result	String	All recognition results
Callback API for failed ASR requests
void onFailure(AudioRecognizeRequest request, final ClientException clientException, final ServerException serverException, String response);
Parameters	Type	Description
request	AudioRecognizeRequest	Speech recognition request
clientException	ClientException	Client exception
serverException	ServerException	Server exception
response	String	JSON string returned by the server
For the sample code, see Demo.
Setting speech recognition parameters
By constructing the AudioRecognizeConfiguration class, you can set the speech recognition configuration:
Parameter	Type	Required	Description	Default Value
setSilentDetectTimeOut	Boolean	No	Specifies whether to enable silence detection. After it is enabled, the silent part before the actual speech starts will not be recognized	true
audioFlowSilenceTimeOut	Int	No	Timeout period for speech start detection. Recording stops automatically after the timeout period elapses	5000ms
minAudioFlowSilenceTime	Int	No	Minimum period for segmenting two audio streams	2000ms
minVolumeCallbackTime	Int	No	Volume callback time	80ms
Sample:
AudioRecognizeConfiguration audioRecognizeConfiguration = new AudioRecognizeConfiguration.Builder()
        .setSilentDetectTimeOut(true) // Enable silence detection. false means the silent part will not be checked
        .audioFlowSilenceTimeOut(5000) // Silence detection timeout after which recording stops
        .minAudioFlowSilenceTime(2000) // Interval time during voice stream recognition
        .minVolumeCallbackTime(80) // Volume callback time
        .build();

// Start ASR
new Thread(new Runnable() {
    @Override
    public void run() {
        if (aaiClient!=null) {
            aaiClient.startAudioRecognize(audioRecognizeRequest, audioRecognizeResultListener, audioRecognizeConfiguration);
        }
    }
}).start();
Setting status listener
AudioRecognizeStateListener can be used to listen for speech recognition status. It has the following APIs:
Method	Description
onStartRecord	Start of recording
onStopRecord	Stop of recording
onVoiceFlowStart	Start of audio stream
onVoiceFlowFinish	End of audio stream
onVoiceFlowStartRecognize	Start of audio stream recognition
onVoiceFlowFinishRecognize	End of audio stream recognition
onVoiceVolume	Volume
onNextAudioData	Return of the audio stream to the host layer for recording caching. It takes effect only when true is passed in for new AudioRecordDataSource(true)
Setting timeout listener
AudioRecognizeTimeoutListener can be used to listen for speech recognition timeouts. It has the following two APIs:
Method	Description
onFirstVoiceFlowTimeout	Detects the timeout of the first audio stream
onNextVoiceFlowTimeout	Detects the timeout of the next audio stream
Sample (AudioRecognizeStateListener):
AudioRecognizeStateListener audioRecognizeStateListener = new AudioRecognizeStateListener() {
    @Override
    public void onStartRecord(AudioRecognizeRequest audioRecognizeRequest) {
        // Recording starts
    }

    @Override
    public void onStopRecord(AudioRecognizeRequest audioRecognizeRequest) {
        // Recording ends
    }

    @Override
    public void onVoiceFlowStart(AudioRecognizeRequest audioRecognizeRequest, int i) {
        // Voice stream starts
    }

    @Override
    public void onVoiceFlowFinish(AudioRecognizeRequest audioRecognizeRequest, int i) {
        // Voice stream ends
    }

    @Override
    public void onVoiceFlowStartRecognize(AudioRecognizeRequest audioRecognizeRequest, int i) {
        // Voice stream recognition starts
    }

    @Override
    public void onVoiceFlowFinishRecognize(AudioRecognizeRequest audioRecognizeRequest, int i) {
        // Voice stream recognition ends
    }

    @Override
    public void onVoiceVolume(AudioRecognizeRequest audioRecognizeRequest, int i) {
        // Volume callback
    }

    /**
     * Returns the audio stream to the host layer for the recording cache service.
     * This method runs on the SDK thread and is mostly used for file operations, so the host
     * should create a separate thread for its business logic.
     * It is called back only when the data source is created with new AudioRecordDataSource(true).
     * @param audioDatas Audio data
     * @param readBufferLength Length of the data read
     */
    @Override
    public void onNextAudioData(final short[] audioDatas, final int readBufferLength) {
    }
};
Descriptions of other important classes
AudioRecognizeRequest
If both templateName and customTemplate are set, templateName takes precedence.
Parameter	Type	Required	Description	Default Value
pcmAudioDataSource	PcmAudioDataSource	Yes	Audio data source	None
templateName	String	No	Template name set in the console	None
customTemplate	AudioRecognizeTemplate	No	Custom template	("16k_zh", 1)
AudioRecognizeResult
Speech recognition result object, which corresponds to the AudioRecognizeRequest object and is used to return the speech recognition result.
Parameter	Type	Description
code	Int	Recognition status code
message	String	Recognition prompt message
text	String	Recognition result
seq	Int	Sequence number of the audio segment
voiceId	String	ID of the audio stream of the audio segment
cookie	String	Cookie value
AudioRecognizeTemplate
Custom audio template, for which you need to set the following parameters:
Parameter	Type	Required	Description
engineModelType	String	Yes	Engine model type
resType	Int	Yes	Result return method
Sample:
AudioRecognizeTemplate audioRecognizeTemplate = new AudioRecognizeTemplate("16k_zh",1);
PcmAudioDataSource
You can implement this API class to recognize mono-channel PCM audio data with a sample rate of 16 kHz. It mainly includes the following APIs:
Add data to the speech recognizer: copy up to length samples into the audioPcmData array starting from subscript 0, and return the actual length of the copied data.
int read(short[] audioPcmData, int length);
Callback function when recognition is started, where you can perform initialization.
void start() throws AudioRecognizerException;
Callback function when recognition is ended, where you can perform cleanup.
void stop();
Get the path of the SDK recording source file in PCM format.
void savePcmFileCallBack(String filePath);
Get the path of the SDK recording source file in WAV format.
void saveWaveFileCallBack(String filePath);
Set the maximum amount of data read by the speech recognizer each time.
int maxLengthOnceRead();
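The read contract described above can be illustrated with an in-memory source. This is a sketch only: the class InMemoryPcmSketch is hypothetical, it mirrors the method shapes listed above without implementing the SDK interface (so that it compiles on its own), and returning -1 at the end of the data is an assumption, since this guide does not specify the end-of-data behavior.

```java
/** Hypothetical in-memory PCM source; mirrors read() and maxLengthOnceRead() but is not an SDK class. */
final class InMemoryPcmSketch {
    private final short[] samples;
    private int position;

    InMemoryPcmSketch(short[] samples) {
        this.samples = samples.clone();
    }

    /** Copies up to length samples into audioPcmData from index 0 and returns the number copied. */
    int read(short[] audioPcmData, int length) {
        int remaining = samples.length - position;
        if (remaining <= 0) {
            return -1; // Assumption: -1 signals end of data; the guide does not define this.
        }
        int count = Math.min(Math.min(length, audioPcmData.length), remaining);
        System.arraycopy(samples, position, audioPcmData, 0, count);
        position += count;
        return count;
    }

    /** Upper bound on the samples requested per read; the value 4 is arbitrary for this sketch. */
    int maxLengthOnceRead() {
        return 4;
    }

    public static void main(String[] args) {
        InMemoryPcmSketch source = new InMemoryPcmSketch(new short[] {10, 20, 30, 40, 50});
        short[] buffer = new short[source.maxLengthOnceRead()];
        int count;
        while ((count = source.read(buffer, buffer.length)) > 0) {
            System.out.println("read " + count + " samples");
        }
    }
}
```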
AudioRecordDataSource
Implementation class of the PcmAudioDataSource API, which can directly read the audio data input by the mic for real-time recognition.
AudioFileDataSource
Implementation class of the PcmAudioDataSource API, which can directly read mono-channel PCM audio data files with a sample rate of 16 kHz.
Note:
Data in other formats cannot be recognized accurately.
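Raw PCM files carry no header, whereas WAV files begin with a RIFF/WAVE container signature. A simple guard against accidentally passing a WAV file where raw PCM is expected is to inspect the first 12 bytes of the file. The sketch below shows that check; the class WavHeaderCheckSketch is hypothetical, and whether AudioFileDataSource tolerates a WAV header is not stated in this guide.

```java
import java.nio.charset.StandardCharsets;

/** Hypothetical helper, not part of the SDK. */
final class WavHeaderCheckSketch {

    /** Returns true if the bytes start with "RIFF" and carry "WAVE" at offset 8. */
    static boolean looksLikeWav(byte[] head) {
        if (head == null || head.length < 12) {
            return false;
        }
        String riff = new String(head, 0, 4, StandardCharsets.US_ASCII);
        String wave = new String(head, 8, 4, StandardCharsets.US_ASCII);
        return riff.equals("RIFF") && wave.equals("WAVE");
    }

    public static void main(String[] args) {
        // Bytes 4 to 7 hold the chunk size in a real file; zeros are used here for brevity.
        byte[] wavHead = {'R', 'I', 'F', 'F', 0, 0, 0, 0, 'W', 'A', 'V', 'E'};
        byte[] rawPcm = new byte[12];
        System.out.println(looksLikeWav(wavHead) + " " + looksLikeWav(rawPcm));
    }
}
```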
AAILogger
You can use AAILogger to enable or disable log output at the DEBUG, INFO, WARN, and ERROR levels.
public static void disableDebug();
public static void disableInfo();
public static void disableWarn();
public static void disableError();
public static void enableDebug();
public static void enableInfo();
public static void enableWarn();
public static void enableError();
Guide for Local Audio Data Caching
You can save audio locally in the host layer by following the steps below:
1. Set isSaveAudioRecordFiles to true during the initialization of new AudioRecordDataSource(isSaveAudioRecordFiles).
2. Add the logic for creating the recording file in the AudioRecognizeStateListener.onStartRecord callback function. You can customize the path and filename.
3. Add the stream closing logic in the AudioRecognizeStateListener.onStopRecord callback function and optionally save PCM files as WAV files.
4. Add the logic for writing audio streams to local files in the AudioRecognizeStateListener.onNextAudioData callback function.
5. As the callback functions all run on the SDK thread, slow writes may affect the internal running smoothness of the SDK. To avoid this, we recommend you complete the above steps in a single thread pool. For more information, see the sample code in the MainActivity class in the demo project.
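The single-thread approach from step 5 can be sketched as follows. This is an illustration under stated assumptions, not the demo project's code: the class AudioCacheWriterSketch and its methods are hypothetical, samples are assumed to be stored as 16-bit little-endian PCM, and the buffer is copied on the assumption that the SDK may reuse it after the callback returns. In an app, the stream passed to the constructor would be the file stream created in onStartRecord.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.util.Arrays;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

/** Hypothetical helper, not part of the SDK: runs all audio writes on one worker thread. */
final class AudioCacheWriterSketch {
    private final ExecutorService worker = Executors.newSingleThreadExecutor();
    private final OutputStream out;

    AudioCacheWriterSketch(OutputStream out) {
        this.out = out;
    }

    /** Intended to be called from onNextAudioData; queues the write and returns immediately. */
    void append(short[] audioDatas, int readBufferLength) {
        // Copy first: this sketch assumes the SDK may reuse the buffer after the callback returns.
        final short[] copy = Arrays.copyOf(audioDatas, readBufferLength);
        worker.execute(() -> {
            // Assumption: samples are written as 16-bit little-endian PCM.
            ByteBuffer bytes = ByteBuffer.allocate(copy.length * 2).order(ByteOrder.LITTLE_ENDIAN);
            bytes.asShortBuffer().put(copy);
            try {
                out.write(bytes.array());
            } catch (IOException e) {
                throw new RuntimeException(e);
            }
        });
    }

    /** Intended to be called from onStopRecord; queues the close behind all pending writes. */
    void finish() {
        worker.execute(() -> {
            try {
                out.close();
            } catch (IOException e) {
                throw new RuntimeException(e);
            }
        });
        worker.shutdown(); // Already queued tasks still run; new tasks are rejected.
    }

    /** Waits for the queued work to complete; returns false on timeout or interruption. */
    boolean awaitFinished(long timeoutMillis) {
        try {
            return worker.awaitTermination(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }

    public static void main(String[] args) {
        ByteArrayOutputStream sink = new ByteArrayOutputStream(); // Stands in for a file stream.
        AudioCacheWriterSketch writer = new AudioCacheWriterSketch(sink);
        writer.append(new short[] {1, 2, 3}, 2); // Only the first two samples are valid.
        writer.finish();
        System.out.println(writer.awaitFinished(5000) + " " + sink.size());
    }
}
```

Because a single worker thread processes tasks in submission order, the chunks reach the file in the order the callbacks delivered them, and the SDK thread only pays for the array copy.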
