Real-Time Speech Recognition
Last updated: 2022-08-30 11:17:56
Connection Preparations

SDK acquisition

The real-time speech recognition SDK and demo for Android can be downloaded here.

Notes on connection

Before calling the API, read the API description of real-time speech recognition to understand its requirements and usage.
The API requires an internet connection (GPRS, 3G, Wi-Fi, etc.) and Android 4.0 or later.

Development environment

Import the ASR SDK AAR package speech_release.aar:
implementation(name: 'speech_release', ext: 'aar')
Add the OkHttp3, Okio, GSON, and SLF4J dependencies in the build.gradle file:
implementation 'com.squareup.okhttp3:okhttp:4.2.2'
implementation 'com.squareup.okio:okio:1.11.0'
implementation 'com.google.code.gson:gson:2.8.5'
implementation 'org.slf4j:slf4j-api:1.7.25'
Add the following permissions in AndroidManifest.xml:
<uses-permission android:name="android.permission.RECORD_AUDIO"/>
<uses-permission android:name="android.permission.INTERNET"/>
<uses-permission android:name="android.permission.ACCESS_NETWORK_STATE"/>

Quick Connection

Starting real-time speech recognition

int appid = XXX;
int projectid = XXX;
String secretId = "XXX";

// For convenience, the SDK provides a local signature implementation. To keep your secretKey secure, generate the signature on your own server in production.
AbsCredentialProvider credentialProvider = new LocalCredentialProvider("your secretKey");

final AAIClient aaiClient;
try {
    // 1. Initialize the AAIClient object.
    aaiClient = new AAIClient(this, appid, projectid, secretId, credentialProvider);

    /**
     * You can also use temporary credentials for authentication:
     * 1. Obtain temporary credentials through STS (implement this step on your server).
     * 2. Call the API with the temporary credentials.
     */
    // aaiClient = new AAIClient(MainActivity.this, appid, projectid, "temporary secretId", "temporary secretKey", "corresponding token", credentialProvider);

    // 2. Initialize the ASR request.
    final AudioRecognizeRequest audioRecognizeRequest = new AudioRecognizeRequest.Builder()
            .pcmAudioDataSource(new AudioRecordDataSource()) // Set the audio source to microphone input
            .build();

    // 3. Initialize the ASR result listener.
    final AudioRecognizeResultListener audioRecognizeResultListener = new AudioRecognizeResultListener() {
        @Override
        public void onSliceSuccess(AudioRecognizeRequest audioRecognizeRequest, AudioRecognizeResult audioRecognizeResult, int i) {
            // Return the recognition result of the audio segment
        }

        @Override
        public void onSegmentSuccess(AudioRecognizeRequest audioRecognizeRequest, AudioRecognizeResult audioRecognizeResult, int i) {
            // Return the recognition result of the audio stream
        }

        @Override
        public void onSuccess(AudioRecognizeRequest audioRecognizeRequest, String s) {
            // Return all recognition results
        }

        @Override
        public void onFailure(AudioRecognizeRequest audioRecognizeRequest, ClientException e, ServerException e1) {
            // Recognition failed
        }
    };

    // 4. Start ASR.
    new Thread(new Runnable() {
        @Override
        public void run() {
            if (aaiClient != null) {
                aaiClient.startAudioRecognize(audioRecognizeRequest, audioRecognizeResultListener);
            }
        }
    }).start();

} catch (ClientException e) {
    e.printStackTrace();
}

Stopping real-time speech recognition

// 1. Get the request ID.
final int requestId = audioRecognizeRequest.getRequestId();
// 2. Call the stop method.
new Thread(new Runnable() {
    @Override
    public void run() {
        if (aaiClient != null) {
            // Stop ASR and wait for the current task to finish
            aaiClient.stopAudioRecognize(requestId);
        }
    }
}).start();

Canceling real-time speech recognition

// 1. Get the request ID.
final int requestId = audioRecognizeRequest.getRequestId();
// 2. Call the cancel method.
new Thread(new Runnable() {
    @Override
    public void run() {
        if (aaiClient != null) {
            // Cancel ASR and discard the current task
            aaiClient.cancelAudioRecognize(requestId);
        }
    }
}).start();

Descriptions of Main API Classes and Methods

Calculating signature

To calculate the signature, implement the AbsCredentialProvider API yourself. The SDK calls this method internally, so the upper layer does not need to invoke it directly.
The signature calculation function is as follows:
/**
 * Signature function: encrypts the original string. The encryption algorithm is described below.
 * @param source Original string
 * @return Encrypted ciphertext
 */
String getAudioRecognizeSign(String source);
Signature algorithm: the source is encrypted with HMAC-SHA1 using SecretKey as the key, and the ciphertext is then Base64-encoded to produce the final signature string, i.e., sign = Base64Encode(HmacSha1(source, secretKey)).
The SDK provides an implementation class LocalCredentialProvider for testing purposes. To keep your SecretKey secure, use it only in the test environment; in the production environment, implement the method in the AbsCredentialProvider API in the upper layer.
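As a sketch of the signing logic, the class below computes sign = Base64Encode(HmacSha1(source, secretKey)) using the JDK's javax.crypto. The class name HmacSha1Signer is illustrative; the AbsCredentialProvider base class itself comes from the SDK, so only the signature computation is shown here. In production this code should run on your server, with the client receiving only the resulting signature.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;
import javax.crypto.Mac;
import javax.crypto.spec.SecretKeySpec;

// Computes sign = Base64Encode(HmacSha1(source, secretKey)), the algorithm the SDK
// expects from AbsCredentialProvider.getAudioRecognizeSign. Class name is illustrative.
final class HmacSha1Signer {
    static String getAudioRecognizeSign(String source, String secretKey) {
        try {
            Mac mac = Mac.getInstance("HmacSHA1");
            mac.init(new SecretKeySpec(secretKey.getBytes(StandardCharsets.UTF_8), "HmacSHA1"));
            byte[] digest = mac.doFinal(source.getBytes(StandardCharsets.UTF_8));
            return Base64.getEncoder().encodeToString(digest);
        } catch (java.security.GeneralSecurityException e) {
            throw new IllegalStateException("HmacSHA1 unavailable", e);
        }
    }
}
```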

Initializing AAIClient

AAIClient is a core class of ASR, which you can call to start, stop, and cancel speech recognition.
public AAIClient(Context context, int appid, int projectId, String secreteId, AbsCredentialProvider credentialProvider) throws ClientException
Parameter | Type | Required | Description
context | Context | Yes | Context
appid | Int | Yes | AppID registered with Tencent Cloud
projectId | Int | No | Your projectId
secreteId | String | Yes | Your SecretId
credentialProvider | AbsCredentialProvider | Yes | Authentication class
Sample:
try {
    AAIClient aaiClient = new AAIClient(context, appid, projectId, secretId, credentialProvider);
} catch (ClientException e) {
    e.printStackTrace();
}
If AAIClient is no longer needed, you need to call the release() method to release resources:
aaiClient.release();

Configuring global parameters

You need to call the static methods of the ClientConfiguration class to modify the global configuration.
Method | Description | Default Value | Valid Range
setMaxAudioRecognizeConcurrentNumber | Maximum number of concurrent speech recognition requests | 2 | 1-5
setMaxRecognizeSliceConcurrentNumber | Maximum number of concurrent audio segments per recognition request | 5 | 1-5
setAudioRecognizeSliceTimeout | HTTP read timeout period | 5000 ms | 500-10000 ms
setAudioRecognizeConnectTimeout | HTTP connection timeout period | 5000 ms | 500-10000 ms
setAudioRecognizeWriteTimeout | HTTP write timeout period | 5000 ms | 500-10000 ms
Sample:
ClientConfiguration.setMaxAudioRecognizeConcurrentNumber(2);
ClientConfiguration.setMaxRecognizeSliceConcurrentNumber(5);
ClientConfiguration.setAudioRecognizeSliceTimeout(2000);
ClientConfiguration.setAudioRecognizeConnectTimeout(2000);
ClientConfiguration.setAudioRecognizeWriteTimeout(2000);

Setting result listener

AudioRecognizeResultListener can be used to listen on speech recognition results. It has the following four APIs:
Speech recognition result callback API for an audio segment:
void onSliceSuccess(AudioRecognizeRequest request, AudioRecognizeResult result, int order);
Parameter | Type | Description
request | AudioRecognizeRequest | Speech recognition request
result | AudioRecognizeResult | Speech recognition result of the audio segment
order | Int | Sequence number of the audio stream that contains the audio segment
Speech recognition result callback API for an audio stream:
void onSegmentSuccess(AudioRecognizeRequest request, AudioRecognizeResult result, int order);
Parameter | Type | Description
request | AudioRecognizeRequest | Speech recognition request
result | AudioRecognizeResult | Speech recognition result of the audio stream
order | Int | Sequence number of the audio stream
Callback API returning all recognition results:
void onSuccess(AudioRecognizeRequest request, String result);
Parameter | Type | Description
request | AudioRecognizeRequest | Speech recognition request
result | String | All recognition results
Callback API for a failed ASR request:
void onFailure(AudioRecognizeRequest request, final ClientException clientException, final ServerException serverException, String response);
Parameter | Type | Description
request | AudioRecognizeRequest | Speech recognition request
clientException | ClientException | Client exception
serverException | ServerException | Server exception
response | String | JSON string returned by the server
For the sample code, see Demo.

Setting speech recognition parameters

By constructing the AudioRecognizeConfiguration class, you can set the speech recognition configuration:
Parameter | Type | Required | Description | Default Value
setSilentDetectTimeOut | Boolean | No | Whether to enable silence detection. When enabled, the silent part before actual speech starts is not recognized | true
audioFlowSilenceTimeOut | Int | No | Speech start detection timeout period. Recording stops automatically if no speech is detected within this period | 5000 ms
minAudioFlowSilenceTime | Int | No | Minimum silence interval used to split two audio streams | 2000 ms
minVolumeCallbackTime | Int | No | Volume callback interval | 80 ms
Sample:
AudioRecognizeConfiguration audioRecognizeConfiguration = new AudioRecognizeConfiguration.Builder()
        .setSilentDetectTimeOut(true) // Enable silence detection; false means the silent part is not checked
        .audioFlowSilenceTimeOut(5000) // Speech start detection timeout before recording stops
        .minAudioFlowSilenceTime(2000) // Minimum silence interval between two audio streams
        .minVolumeCallbackTime(80) // Volume callback interval
        .build();

// Start ASR
new Thread(new Runnable() {
    @Override
    public void run() {
        if (aaiClient != null) {
            aaiClient.startAudioRecognize(audioRecognizeRequest, audioRecognizeResultListener, audioRecognizeConfiguration);
        }
    }
}).start();

Setting status listener

AudioRecognizeStateListener can be used to listen on speech recognition status. It has the following APIs:
Method | Description
onStartRecord | Recording starts
onStopRecord | Recording stops
onVoiceFlowStart | Audio stream starts
onVoiceFlowStartRecognize | Audio stream recognition starts
onVoiceFlowFinishRecognize | Audio stream recognition ends
onVoiceVolume | Volume callback
onNextAudioData | Returns the audio stream to the host layer for recording caching. Takes effect only when true is passed in for new AudioRecordDataSource(true)

Setting timeout listener

AudioRecognizeTimeoutListener can be used to listen on speech recognition timeout. It has the following two APIs:
Method | Description
onFirstVoiceFlowTimeout | Timeout detected for the first audio stream
onNextVoiceFlowTimeout | Timeout detected for a subsequent audio stream
Sample:
AudioRecognizeStateListener audioRecognizeStateListener = new AudioRecognizeStateListener() {
    @Override
    public void onStartRecord(AudioRecognizeRequest audioRecognizeRequest) {
        // Recording starts
    }

    @Override
    public void onStopRecord(AudioRecognizeRequest audioRecognizeRequest) {
        // Recording stops
    }

    @Override
    public void onVoiceFlowStart(AudioRecognizeRequest audioRecognizeRequest, int i) {
        // Audio stream starts
    }

    @Override
    public void onVoiceFlowFinish(AudioRecognizeRequest audioRecognizeRequest, int i) {
        // Audio stream ends
    }

    @Override
    public void onVoiceFlowStartRecognize(AudioRecognizeRequest audioRecognizeRequest, int i) {
        // Audio stream recognition starts
    }

    @Override
    public void onVoiceFlowFinishRecognize(AudioRecognizeRequest audioRecognizeRequest, int i) {
        // Audio stream recognition ends
    }

    @Override
    public void onVoiceVolume(AudioRecognizeRequest audioRecognizeRequest, int i) {
        // Volume callback
    }

    /**
     * Returns the audio stream to the host layer for the recording caching service.
     * Because this method runs on the SDK thread and is mostly used for file operations,
     * the host should create a dedicated thread for the business logic.
     * It is called back only when new AudioRecordDataSource(true) is used.
     * @param audioDatas audio data
     */
    @Override
    public void onNextAudioData(final short[] audioDatas, final int readBufferLength) {
    }
};

Descriptions of other important classes

AudioRecognizeRequest

If both templateName and customTemplate are set, templateName is used preferentially.
Parameter | Type | Required | Description | Default Value
pcmAudioDataSource | PcmAudioDataSource | Yes | Audio data source | None
templateName | String | No | Template name set in the console | None
customTemplate | AudioRecognizeTemplate | No | Custom template | ("16k_zh", 1)

AudioRecognizeResult

Speech recognition result object, which corresponds to the AudioRecognizeRequest object and is used to return the speech recognition result.
Parameter | Type | Description
code | Int | Recognition status code
message | String | Recognition prompt message
text | String | Recognition result
seq | Int | Sequence number of the audio segment
voiceId | String | ID of the audio stream that contains the audio segment
cookie | String | Cookie value

AudioRecognizeTemplate

Custom audio template, for which you need to set the following parameters:
Parameter | Type | Required | Description
engineModelType | String | Yes | Engine model type
resType | Int | Yes | Result return method
Sample:
AudioRecognizeTemplate audioRecognizeTemplate = new AudioRecognizeTemplate("16k_zh", 1);

PcmAudioDataSource

Implement this API class to recognize mono-channel PCM audio data with a sample rate of 16 kHz. It mainly includes the following APIs:
Adds data to the speech recognizer: copies up to length values starting from subscript 0 into the audioPcmData array and returns the number of values actually copied.
int read(short[] audioPcmData, int length);
Callback invoked when recognition starts; perform initialization here.
void start() throws AudioRecognizerException;
Callback invoked when recognition ends; perform cleanup here.
void stop();
Gets the path of the SDK recording source file in PCM format.
void savePcmFileCallBack(String filePath);
Gets the path of the SDK recording source file in WAV format.
void saveWaveFileCallBack(String filePath);
Sets the maximum amount of data the speech recognizer reads each time.
int maxLengthOnceRead();
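As a sketch of the read() contract described above, the hypothetical class below feeds 16 kHz mono PCM samples from an in-memory buffer. A local PcmSource interface mirroring the documented methods stands in for the SDK's PcmAudioDataSource so the example is self-contained; the real interface, its exception types, and the end-of-data convention come from the SDK and may differ.

```java
// Local stand-in mirroring the documented PcmAudioDataSource methods (assumption: the
// real SDK interface has the same shape; file-path callbacks omitted for brevity).
interface PcmSource {
    int read(short[] audioPcmData, int length); // copy up to `length` samples, return count copied
    void start();
    void stop();
    int maxLengthOnceRead();
}

// Feeds pre-recorded 16 kHz mono PCM samples from memory.
class BufferPcmSource implements PcmSource {
    private final short[] samples;
    private int position;

    BufferPcmSource(short[] samples) {
        this.samples = samples;
    }

    @Override
    public void start() {
        position = 0; // initialization when recognition starts
    }

    @Override
    public int read(short[] audioPcmData, int length) {
        int remaining = samples.length - position;
        if (remaining <= 0) {
            return -1; // assumption: a negative count signals end of data
        }
        int count = Math.min(length, remaining);
        System.arraycopy(samples, position, audioPcmData, 0, count);
        position += count;
        return count; // actual number of samples copied
    }

    @Override
    public void stop() {
        position = samples.length; // cleanup when recognition ends
    }

    @Override
    public int maxLengthOnceRead() {
        return 640; // 40 ms of audio at 16 kHz
    }
}
```

The same pattern is what AudioFileDataSource uses internally for PCM files: read() advances a cursor and returns how many samples were actually delivered.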

AudioRecordDataSource

Implementation class of the PcmAudioDataSource API, which can directly read the audio data input by the mic for real-time recognition.

AudioFileDataSource

Implementation class of the PcmAudioDataSource API, which can directly read mono-channel PCM audio data files with a sample rate of 16 kHz.
Note:
Data in other formats cannot be recognized accurately.

AAILogger

You can use AAILogger to choose to output logs at the DEBUG, INFO, WARN, or ERROR level.
public static void disableDebug();
public static void disableInfo();
public static void disableWarn();
public static void disableError();
public static void enableDebug();
public static void enableInfo();
public static void enableWarn();
public static void enableError();

Guide for Local Audio Data Caching

You can choose to save audios in the host layer locally by following the steps below:
1. Set isSaveAudioRecordFiles to true during the initialization of new AudioRecordDataSource(isSaveAudioRecordFiles).
2. Add the file logic for creating the recording in the AudioRecognizeStateListener.onStartRecord callback function. You can customize the path and filename.
3. Add the stream closing logic in the AudioRecognizeStateListener.onStopRecord callback function and optionally save PCM files as WAV files.
4. Add the logic for writing audio streams to local files in the AudioRecognizeStateListener.onNextAudioData callback function.
5. Because the callback functions all run on the SDK thread, slow writes may affect the smoothness of the SDK's internal operation; we recommend completing the steps above in a single-thread pool. For more information, see the sample code in the MainActivity class in the demo project.
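Step 3 mentions optionally saving the cached PCM data as a WAV file. Since a 16 kHz, mono, 16-bit WAV file is just a 44-byte RIFF header followed by the raw samples, a minimal sketch of the conversion (assuming that recording format, which matches what this SDK produces; the WavWriter class name is illustrative) could look like:

```java
import java.io.IOException;
import java.io.OutputStream;
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Writes a 44-byte RIFF/WAVE header followed by the raw PCM bytes.
// Assumes 16 kHz sample rate, mono, 16-bit little-endian samples.
final class WavWriter {
    static void writeWav(OutputStream out, byte[] pcm, int sampleRate) throws IOException {
        int byteRate = sampleRate * 2;           // channels (1) * bytes per sample (2)
        ByteBuffer header = ByteBuffer.allocate(44).order(ByteOrder.LITTLE_ENDIAN);
        header.put("RIFF".getBytes());
        header.putInt(36 + pcm.length);          // RIFF chunk size = file size - 8
        header.put("WAVE".getBytes());
        header.put("fmt ".getBytes());
        header.putInt(16);                       // fmt subchunk size
        header.putShort((short) 1);              // audio format 1 = PCM
        header.putShort((short) 1);              // channels = mono
        header.putInt(sampleRate);
        header.putInt(byteRate);
        header.putShort((short) 2);              // block align = channels * bytes per sample
        header.putShort((short) 16);             // bits per sample
        header.put("data".getBytes());
        header.putInt(pcm.length);               // data subchunk size
        out.write(header.array());
        out.write(pcm);
    }
}
```

A natural place to call this is the onStopRecord callback, after the PCM stream written in onNextAudioData has been closed.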