Note: This API is v2.0 and differs from API 3.0 in parameter styles and error codes.
This API recognizes a real-time audio stream over the WebSocket protocol and returns recognition results synchronously, achieving instant speech-to-text conversion.
Before using this API, activate the service in the ASR console. Then, go to the Manage API Key page to create a key and generate the AppID, SecretId, and SecretKey required for calculating authentication signatures during API calls.
When integrating the real-time speech recognition API, you need to comply with the following requirements:
Item | Description |
---|---|
Language | Supported languages include Mandarin, English, Korean, Japanese, Thai, Bahasa, Vietnamese, Malay, Filipino, Portuguese, Turkish, Arabic, Spanish, Hindi, and French. You can set the language through the engine_model_type parameter. |
Industry | General, finance, gaming, education, and healthcare. |
Audio attribute | Sample rate: 16000 Hz or 8000 Hz; bit depth: 16 bits; channels: mono |
Audio format | PCM, WAV, Opus, Speex, SILK, MP3, M4A, and AAC |
Request protocol | WSS |
Request address | wss://asr.cloud.tencent.com/asr/v2/<appid>?{request parameter} |
API authentication | Signature-based authentication. For more information, see Signature generation. |
Response format | JSON |
Data sending | We recommend you send a packet of 40 ms in length every 40 ms (i.e., real-time factor (RTF) of 1:1). The corresponding PCM is 640 bytes at a sample rate of 8 kHz or 1,280 bytes at a sample rate of 16 kHz. An audio sending rate exceeding the 1:1 RTF or an audio packet sending interval exceeding 6 seconds may cause an error in the engine. In this case, the backend will return an error and close the connection. |
Concurrency limit | The number of concurrent connections per account is 50. If you need more concurrent connections, submit a ticket for application. |
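The packet sizes in the data sending requirement follow directly from the audio attributes (16-bit mono PCM). A minimal sketch of the arithmetic, in Python:

```python
# Bytes in one 40 ms packet of 16-bit (2-byte) mono PCM.
def pcm_chunk_size(sample_rate_hz: int, packet_ms: int = 40) -> int:
    return sample_rate_hz * 2 * packet_ms // 1000

assert pcm_chunk_size(8000) == 640     # 8 kHz -> 640 bytes per 40 ms
assert pcm_chunk_size(16000) == 1280   # 16 kHz -> 1,280 bytes per 40 ms
```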
The API call process can be divided into two phases: handshake and recognition. In both phases, the backend returns a text message containing a serialized JSON string in the following format:
Field | Type | Description |
---|---|---|
code | Integer | The status code. 0: normal; other values: an error occurred. |
message | String | The error message. This field explains the cause of the error. Note that the returned messages are subject to changes as the service is updated or optimized. |
voice_id | String | The unique ID of the audio stream, which is generated by the client during the handshake and assigned in the call parameters. |
message_id | String | The unique ID of the message. |
result | Result | The latest speech recognition result. |
final | Integer | If this field returns 1, the entire audio stream has been recognized. |
The structure of the recognition result Result is as follows:
Field | Type | Description |
---|---|---|
slice_type | Integer | The recognition result type. 0: the recognition of a paragraph has started; 1: voice_text_str is an unstable result (the recognition result of the paragraph may change); 2: voice_text_str is a stable result (the recognition result of the paragraph will not change). Depending on the sent audio, the following slice_type sequences may be returned: 0-1-2 (started, recognizing (1 may be returned multiple times), stopped recognizing a paragraph); 0-2 (started, stopped recognizing a paragraph); 2 (stopped recognizing a paragraph). |
index | Integer | The number of the result of the current paragraph in the entire audio stream, which increases from 0. |
start_time | Integer | The start time (in ms) of the result of the current paragraph in the entire audio stream. |
end_time | Integer | The end time (in ms) of the result of the current paragraph in the entire audio stream. |
voice_text_str | String | The text result of the current paragraph, encoded in UTF-8. |
word_size | Integer | The number of word results of the current paragraph |
word_list | Word Array | The list of words contained in the current paragraph. The structure of a word is as follows: word (String): the content of the word; start_time (Integer): the start time of the word in the entire audio stream; end_time (Integer): the end time of the word in the entire audio stream; stable_flag (Integer): the status of the result. 0: unstable; 1: stable. |
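For illustration, below is a minimal Python sketch that dispatches a received message using the fields described in the two tables above. handle_message is a hypothetical helper, not part of the API:

```python
import json

def handle_message(raw: str) -> None:
    # Hypothetical helper: parse one text message from the backend.
    msg = json.loads(raw)
    if msg["code"] != 0:
        raise RuntimeError(f"ASR error {msg['code']}: {msg['message']}")
    if msg.get("final") == 1:
        print("entire audio stream recognized; server will close the connection")
        return
    result = msg.get("result")
    if result is not None:
        kind = "stable" if result["slice_type"] == 2 else "partial"
        print(f"[{kind}] {result['start_time']}-{result['end_time']} ms: "
              f"{result['voice_text_str']}")
```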
During the handshake, the client initiates a WebSocket connection request with a URL in the following format:
wss://asr.cloud.tencent.com/asr/v2/<appid>?{request parameter}
Here, the <appid> needs to be replaced with the AppID of your Tencent Cloud account, which can be obtained on the Manage API Key page. The format of the {request parameter} is as follows:
key1=value1&key2=value2... (both `key` and `value` need to be URL-encoded)
Parameter description:
Parameter | Required | Type | Description |
---|---|---|---|
secretid | Yes | String | The SecretId of the key of your Tencent Cloud account, which can be obtained on the Manage API Key page. |
timestamp | Yes | Integer | The current UNIX timestamp in seconds. If the difference between the UNIX timestamp and the current time is too large, a signature expiration error may occur. |
expired | Yes | Integer | The UNIX timestamp of the signature expiration time in seconds. expired must be greater than timestamp , and expired - timestamp must be smaller than 90 days. |
nonce | Yes | Integer | A random positive integer, which you need to generate on your own and can contain up to ten digits. |
engine_model_type | Yes | String | The engine model type. For phone call scenarios: • 8k_zh: 8 kHz, for Mandarin in general scenarios; • 8k_en: 8 kHz, for English. For non-phone call scenarios: • 16k_zh: 16 kHz, for Mandarin in general scenarios; • 16k_en: 16 kHz, for English; • 16k_ko: 16 kHz, for Korean; • 16k_ja: 16 kHz, for Japanese; • 16k_th: 16 kHz, for Thai; • 16k_id: 16 kHz, for Bahasa; • 16k_vi: 16 kHz, for Vietnamese; • 16k_ms: 16 kHz, for Malay; • 16k_fil: 16 kHz, for Filipino; • 16k_pt: 16 kHz, for Portuguese; • 16k_tr: 16 kHz, for Turkish; • 16k_ar: 16 kHz, for Arabic; • 16k_es: 16 kHz, for Spanish; • 16k_hi: 16 kHz, for Hindi; • 16k_fr: 16 kHz, for French. |
voice_id | Yes | String | The unique identifier of each audio, which you need to generate on your own. |
voice_format | No | Integer | The audio encoding format. The default value is 4. 1: PCM; 4: Speex (SP); 6: SILK; 8: MP3; 10: Opus (see the Opus audio stream encapsulation description below); 12: WAV; 14: M4A (each segment must be a complete M4A audio); 16: AAC. |
needvad | No | Integer | Whether to enable voice activity detection (VAD). 0: disables VAD; 1: enables VAD. If an audio segment exceeds 60 seconds in length, you should enable VAD. |
hotword_id | No | String | The ID of the keyword list. If this parameter is set, the corresponding list will take effect; otherwise, the default keyword list will be used. |
reinforce_hotword | No | Integer | Keyword enhancement. 0 (default value): disabled; 1: enabled. When enabled (only 8k_zh and 16k_zh are supported), homophone replacement takes effect: homophones of the words and phrases configured in the keyword list are replaced with the configured keywords. |
customization_id | No | String | The ID of the self-adaptive learning model. If this parameter is set, the corresponding model will take effect; otherwise, the most recently launched model will be used. |
filter_dirty | No | Integer | Whether to filter restricted words (for the Mandarin engine only). 0 (default value): does not filter; 1: filters; 2: replaces restricted words with "*". |
filter_modal | No | Integer | Whether to filter interjections (for the Mandarin engine only). 0 (default value): does not filter; 1: filters; 2: filters strictly. |
filter_punc | No | Integer | Whether to filter the period at the end of a sentence (for the Mandarin engine only). 0 (default value): no; 1: yes. |
filter_empty_result | No | Integer | Whether to call back empty recognition results. 0: calls back empty results; 1 (default value): does not call back empty results. Note: if slice_type=0 and slice_type=2 callbacks need to arrive in pairs, set filter_empty_result=0. Paired callbacks are generally required in outbound calling scenarios, where slice_type=0 is used to determine whether a human voice has appeared. |
convert_num_mode | No | Integer | Whether to intelligently convert Chinese numbers to Arabic numerals (for the Mandarin engine only). 0: directly outputs Chinese numbers; 1 (default value): intelligently converts based on the scenario; 3: enables mathematical number conversion. |
word_info | No | Integer | Whether to display the word-level timestamp. 0 (default value): does not display; 1: displays timestamps without punctuation marks; 2: displays timestamps with punctuation marks. Supported engines include 16k_zh, 16k_en, and 16k_ja. |
vad_silence_time | No | Integer | The speech segmentation detection threshold in ms. A silence longer than this threshold will be considered as segmentation (this parameter is commonly used in customer service scenarios and requires needvad = 1). Value range: 240–2000. We recommend you not adjust this parameter; otherwise, the recognition result may be affected. Only the 16k_zh engine is supported. |
max_speak_time | No | Integer | Forced sentence segmentation. Value range: 5000–90000 (in ms); default value: 0 (disabled). For continuous speech without pauses, this parameter forces a sentence break (the result becomes stable, i.e., slice_type=2). For example, in game commentary scenarios where the commentator speaks continuously and sentences cannot be segmented, setting this parameter to 10000 produces a slice_type=2 callback every 10 seconds. |
noise_threshold | No | Float | The noise detection threshold. Default value: 0; value range: [-1, 1]. The larger the value, the more likely an audio clip is judged to be noise; the smaller the value, the more likely it is judged to be human voice. Caution: adjusting this parameter may affect recognition accuracy. |
signature | Yes | String | The API signature parameter. |
hotword_list | No | String | The temporary keyword list. This parameter is used to improve recognition accuracy. |
input_sample_rate | No | Integer | Upsampling support. When the engine sample rate does not match, 8 kHz PCM audio can be upsampled to 16 kHz for recognition, which can effectively improve recognition accuracy. Only 8000 is supported: for example, if 8000 is passed in, PCM audio with an 8 kHz sample rate can be recognized normally under the 16k_zh engine. Note: this parameter applies only to PCM audio. If no value is passed in, the default behavior is kept, i.e., the engine sample rate must equal the PCM audio sample rate. |
Sort all parameters except signature in lexicographical order and concatenate them into the request URL as the original signature string. Here, Appid=1259228442 and SecretId=AKID*****************************0r are concatenated into the following string:
asr.cloud.tencent.com/asr/v2/1259228442?engine_model_type=16k_zh&expired=1592380492&filter_dirty=1&filter_modal=1&filter_punc=1&needvad=1&nonce=1592294092123&secretid=AKID*****************************u0r&timestamp=1592294092&voice_format=1&voice_id=RnKu9FODFHK5FPpsrN
Encrypt the original signature string with HMAC-SHA1 using SecretKey, and then Base64-encode the result. For example, performing this operation on the original signature string generated in the previous step with SecretKey=kFpwoX5RYQ2SkqpeHgqmSzHK7h3A2fni yields:
Base64Encode(HmacSha1("asr.cloud.tencent.com/asr/v2/1259228442?engine_model_type=16k_zh&expired=1592380492&filter_dirty=1&filter_modal=1&filter_punc=1&needvad=1&nonce=1592294092123&secretid=AKID******************************0r&timestamp=1592294092&voice_format=1&voice_id=RnKu9FODFHK5FPpsrN", "kFpwoX5RYQ2SkqpeHgqmSzHK7h3A2fni"))
The obtained signature value is as follows:
HepdTRX6u155qIPKNKC+3U0j1N0=
After signature is URL-encoded (URL encoding is required; otherwise, authentication may fail), the final request URL obtained through concatenation is:
wss://asr.cloud.tencent.com/asr/v2/1259228442?engine_model_type=16k_zh&expired=1592380492&filter_dirty=1&filter_modal=1&filter_punc=1&needvad=1&nonce=1592294092123&secretid=AKID*****************************u0r&timestamp=1592294092&voice_format=1&voice_id=RnKu9FODFHK5FPpsrN&signature=HepdTRX6u155qIPKNKC%2B3U0j1N0%3D
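Putting these three steps together, below is a minimal Python sketch of the signing flow. The credentials are placeholders, and build_ws_url is an illustrative helper, not part of any SDK:

```python
import base64
import hashlib
import hmac
import time
import uuid
from urllib.parse import quote

# Placeholder credentials for illustration only.
APPID = "1259228442"
SECRET_ID = "AKID********************************"
SECRET_KEY = "********************************"

def build_ws_url(params: dict) -> str:
    # 1. Sort all parameters except signature lexicographically and
    #    concatenate them into the original signature string.
    query = "&".join(f"{k}={params[k]}" for k in sorted(params))
    string_to_sign = f"asr.cloud.tencent.com/asr/v2/{APPID}?{query}"
    # 2. Encrypt with HMAC-SHA1 using SecretKey, then Base64-encode.
    digest = hmac.new(SECRET_KEY.encode(), string_to_sign.encode(), hashlib.sha1).digest()
    signature = base64.b64encode(digest).decode()
    # 3. URL-encode the signature and append it to the request URL.
    return f"wss://{string_to_sign}&signature={quote(signature, safe='')}"

now = int(time.time())
url = build_ws_url({
    "secretid": SECRET_ID,
    "timestamp": now,
    "expired": now + 3600,  # must be greater than timestamp
    "nonce": 12345,
    "engine_model_type": "16k_zh",
    "voice_id": uuid.uuid4().hex[:16],
    "voice_format": 1,  # PCM
})
```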
When sending compressed Opus audio, the size of each frame must be fixed at 640 samples (i.e., 640 shorts are compressed at a time); otherwise, decompression will fail. The concatenated frames can be passed to the server, and each frame must meet the following format requirements:
Each frame of compressed data is encapsulated as follows:
OpusHead (four bytes) | Frame data length (two bytes) | One frame of compressed Opus data |
---|---|---|
`opus` | Length (`len`) | Opus-encoded data of length `len` |
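For illustration, the envelope can be produced as follows. This is a minimal sketch; the byte order of the two-byte length field is assumed to be big-endian here, since this page does not specify it, and should be verified against a working client:

```python
import struct

def wrap_opus_frame(opus_frame: bytes) -> bytes:
    # 4-byte header "opus" + 2-byte frame length + one compressed Opus frame.
    # Big-endian length is an assumption; verify the byte order before use.
    return b"opus" + struct.pack(">H", len(opus_frame)) + opus_frame
```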
After the client initiates a connection request, the backend will establish a connection and verify the signature. If the verification is passed, the backend will return an acknowledgment message with the code 0 to indicate that the handshake is successful; otherwise, the backend will return a message with a code other than 0 and close the connection.
{"code":0,"message":"success","voice_id":"RnKu9FODFHK5FPpsrN"}
After the handshake is successful, the backend will proceed to the recognition phase, where the client uploads audio data and receives the recognition result messages.
During recognition, the client continuously uploads binary messages containing binary audio stream data to the backend. We recommend you send a packet of 40 ms in length every 40 ms (i.e., real-time factor (RTF) of 1:1). The corresponding PCM is 640 bytes at a sample rate of 8 kHz or 1,280 bytes at a sample rate of 16 kHz. An audio sending rate exceeding the 1:1 RTF or an audio packet sending interval exceeding 6 seconds may cause an error in the engine. In this case, the backend will return an error and close the connection.
After the audio stream is uploaded, the client needs to send the following text message to the backend to end the recognition.
{"type": "end"}
While uploading data, the client needs to receive the real-time recognition result returned by the backend synchronously. Below is a sample result:
{"code":0,"message":"success","voice_id":"RnKu9FODFHK5FPpsrN","message_id":"RnKu9FODFHK5FPpsrN_11_0","result":{"slice_type":0,"index":0,"start_time":0,"end_time":1240,"voice_text_str":"real-time","word_size":0,"word_list":[]}}
{"code":0,"message":"success","voice_id":"RnKu9FODFHK5FPpsrN","message_id":"RnKu9FODFHK5FPpsrN_33_0","result":{"slice_type":2,"index":0,"start_time":0,"end_time":2840,"voice_text_str":"real-time speech recognition","word_size":0,"word_list":[]}}
After the backend recognizes all the uploaded audio data, it will return a message with the value of final being 1 and close the connection.
{"code":0,"message":"success","voice_id":"CzhjnqBkv8lk5pRUxhpX","message_id":"CzhjnqBkv8lk5pRUxhpX_241","final":1}
If an error occurs during recognition, the backend will return a message with a code other than 0 and close the connection.
{"code":4008,"message":"The backend recognition server timed out while waiting for an audio segment.","voice_id":"CzhjnqBkv8lk5pRUxhpX","message_id":"CzhjnqBkv8lk5pRUxhpX_241"}
Value | Description |
---|---|
4001 | The parameter is invalid. For more information, see the message field. |
4002 | Authentication failed. |
4003 | The AppID has not activated the service. Activate the service in the console first. |
4004 | No free tier is available. |
4005 | The service has been interrupted due to overdue payments under your account. Top up your account in time. |
4006 | The current number of concurrent calls under the account exceeded the limit. |
4007 | Audio decoding failed. Check whether the format of the uploaded audio data is consistent with the call parameter. |
4008 | The client timed out while uploading data. |
4009 | The client was disconnected. |
4010 | An unknown text message was uploaded from the client. |
5000 | A backend error occurred. Try again. |
5001 | Recognition failed on the backend recognition server. Try again. |
5002 | Recognition failed on the backend recognition server. Try again. |