Submitting Speech Recognition Job

Overview
This API is used to submit a speech recognition job.
Note: 
For constraints on speech recognition, see Specifications and Limits.
﻿
Request
Sample request
POST /asr_jobs HTTP/1.1
Host: <BucketName-APPID>.ci.<Region>.myqcloud.com
Date: <GMT Date>
Authorization: <Auth String>
Content-Length: <length>
Content-Type: application/xml
﻿
<body>
Note: 
Authorization: Auth String (for more information, see Request Signature).
When this feature is used by a sub-account, relevant permissions must be granted as instructed in Authorization Granularity Details.
Request headers
This API only uses Common Request Headers.
Request body
This request requires the following request body:
<Request>
    <Tag>SpeechRecognition</Tag>
    <Input>
        <Object>input/test.mp3</Object>
        <Url></Url>
    </Input>
    <Operation>
        <SpeechRecognition>
            <EngineModelType>16k_zh_video</EngineModelType>
            <ChannelNum>1</ChannelNum>
            <FilterDirty>1</ChannelNum>
            <FilterModal>1</ChannelNum>
        </SpeechRecognition>
        <Output>
            <Region>ap-chongqing</Region>
            <Bucket>test-123456789</Bucket>
            <Object>output/asr.txt</Object>
        </Output>
        <UserData>This is my data.</UserData>
        <JobLevel>0</JobLevel>
    </Operation>
    <QueueId>pcd463e1467964d39ad2d3f66aacd8199</QueueId>
    <CallBack>http://callback.demo.com</CallBack>
    <CallBackFormat>JSON<CallBackFormat>
</Request>
The nodes are described as follows:
Node Name (Keyword)
Parent Node
Description
Type
Required
Request
None
Request container
Container
Yes
Request has the following sub-nodes:
Node Name (Keyword)
Parent Node
Description
Type
Required
Tag
Request
Job type: SpeechRecognition.
String
Yes
Input
Request
Information about the object to be operated
Container
Yes
Operation
Request
Operation rule
Container
Yes
QueueId
Request
ID of the queue where the job is in
String
Yes
CallBackFormat
Request
Job callback format, which can be JSON or XML (default). It has a higher priority than that of the queue.
String
No
CallBackType
Request
Job callback type, which can be Url (default) or TDMQ. It has a higher priority than that of the queue.
String
No
CallBack
Request
Job callback address, which has a higher priority than that of the queue. If it is set to no, no callbacks will be generated at the callback address of the queue.
String
No
CallBackMqConfig
Request
TDMQ configuration for job callback as described in Structure > CallBackMqConfig, which is required if CallBackType is TDMQ.
Container
No
Input has the following sub-nodes:
Node Name (Keyword)
Parent Node
Description
Type
Required
Object
Request.Input
Media filename
String
No
Url
Request.Input
URL that supports downloading over the public network. At least one of Url and Object must be passed in. If both parameters are passed in, Object is preferred.
String
No
Operation has the following sub-nodes:
Node Name (Keyword)
Parent Node
Description
Type
Required
SpeechRecognition
Request.Operation
Speech recognition parameter
Container
Yes
Output
Request.Operation
Result output address
Container
Yes
UserData
Request.Operation
The user information passed through, which is printable ASCII codes of up to 1,024 in length.
String
No
JobLevel
Request.Operation
Job priority. The greater the value, the higher the priority. Valid values: 0, 1, 2. Default value: 0.
String
No
SpeechRecognition has the following sub-nodes:
﻿
SpeechRecognition has the following sub-nodes:
Node Name (Keyword)
Parent Node
Description
Type
Required
EngineModelType
Request.Operation.Speech Recognition
Engine model type. 
Phone call scenario:
8k_zh: 8 kHz Mandarin for general fields (applied to dual-channel audios)
8k_zh_s: 8 kHz Mandarin for speaker diarization (applied only to mono-channel audios)
Non-phone call scenario:
16k_zh: 16 kHz Mandarin for general fields
16k_zh_video: 16 kHz Mandarin for audiovisual fields
16k_en: 16 kHz English
16k_ca: 16 kHz Cantonese
String
Yes
ChannelNum
Request.Operation.Speech Recognition
Number of sound channels:
1: Mono
2: Dual (for the 8k_zh engine only)
Integer
Yes
ResTextFormat
Request.Operation.Speech Recognition
Format of the returned recognition result.
0: recognition result text, including the list of segment timestamps
1: recognition result details, including the list of word timestamps (generally used to generate subtitles and for the 16 kHz Mandarin engine only).
Integer
Yes
FilterDirty
Request.Operation.Speech Recognition
Whether to filter restricted words (for the Mandarin engine only).
0: Does not filter.
1: Filters.
2: Replaces restricted words with *. Default value: 0.
Integer
No
FilterModal
Request.Operation.Speech Recognition
Whether to filter modal particles (for the Mandarin engine only).
0: Does not filter.
1: Filters partially.
2: Filters strictly. Default value: 0.
Integer
No
ConvertNumMode
Request.Operation.Speech Recognition
Whether to intelligently convert Chinese numbers to Arabic numerals (for the Mandarin engine only):
0: Directly outputs Chinese numbers.
1: Intelligently converts based on the scenario. Default value: 1.
Integer
No
Output has the following sub-nodes:
Node Name (Keyword)
Parent Node
Description
Type
Required
Region
Request.Operation.Output
Bucket region
String
Yes
Bucket
Request.Operation.Output
Result storage bucket
String
Yes
Object
Request.Operation.Output
Result file name
String
Yes
Response
Response headers
This API only returns Common Response Headers.
Response body
The response body returns application/xml data. The following contains all the nodes:
<Response>
    <JobsDetail>
        <Code>Success</Code>
        <CreationTime>2021-08-05T15:43:50+0800</CreationTime>
        <EndTime>-</EndTime>
        <Input>
            <BucketId>test-1234567890</BucketId>
            <Object>input/test.mp3</Object>
            <Region>ap-chongqing</Region>
        </Input>
        <JobId>s58ccb634149211ed84ce2b1cd7fbb14a</JobId>
        <Message/>
        <Operation>
            <JobLevel>0</JobLevel>
            <Output>
                <Bucket>test-1234567890</Bucket>
                <Object>output/asr.txt</Object>
                <Region>ap-chongqing</Region>
            </Output>
            <SpeechRecognition>
                <ChannelNum>1</ChannelNum>
                <ConvertNumMode>0</ConvertNumMode>
                <EngineModelType>16k_zh_video</EngineModelType>
                <FilterDirty>0</FilterDirty>
                <FilterModal>0</FilterModal>
                <FilterPunc>0</FilterPunc>
                <OutputFileType>txt</OutputFileType>
                <ResTextFormat>0</ResTextFormat>
                <SpeakerDiarization>0</SpeakerDiarization>
                <SpeakerNumber>0</SpeakerNumber>
            </SpeechRecognition>
            <UserData>This is my data.</UserData>
            <JobLevel>0</JobLevel>
        </Operation>
        <QueueId>pcd463e1467964d39ad2d3f66aacd8199</QueueId>
        <StartTime>-</StartTime>
        <State>Submitted</State>
        <Tag>SpeechRecognition</Tag>
    </JobsDetail>
</Response>
The nodes are described as follows:
Node Name (Keyword)
Parent Node
Description
Type
Response
None
Result storage container
Container
Response has the following sub-nodes:
Node Name (Keyword)
Parent Node
Description
Type
JobsDetail
Response
Job details
Container
JobsDetail has the following sub-nodes:
Node Name (Keyword)
Parent Node
Description
Type
Code
Response.JobsDetail
Error code, which is returned only if State is Failed
String
Message
Response.JobsDetail
Error message, which is returned only if State is Failed
String
JobId
Response.JobsDetail
Job ID
String
Tag
Response.JobsDetail
Job type: SpeechRecognition
String
State
Response.JobsDetail
Job status. Valid values: Submitted, Running, Success, Failed, Pause, Cancel.
String
CreationTime
Response.JobsDetail
Job creation time
String
StartTime
Response.JobsDetail
Job start time
String
EndTime
Response.JobsDetail
Job end time
String
QueueId
Response.JobsDetail
ID of the queue where the job is in
String
Input
Response.JobsDetail
Same as the Request.Input node in the request
Container
Operation
Response.JobsDetail
Operation rule. Up to 6 operation rules are supported.
Container
Operation has the following sub-nodes:
Node Name (Keyword)
Parent Node
Description
Type
SpeechRecognition
Response.JobsDetail.Operation
Same as Request.Operation.SpeechRecognition in the request.
Container
Output
Response.JobsDetail.Operation
Same as Request.Operation.Output in the request.
Container
UserData
Response.JobsDetail.Operation
The user information passed through.
String
JobLevel
Response.JobsDetail.Operation
Job priority
String
SpeechRecognitionResult
Response.JobsDetail.Operation
Speech recognition job result. It is not returned when there is no result.
Container
SpeechRecognitionResult has the following sub-nodes:
Node Name (Keyword)
Parent Node
Description
Type
AudioTime
Response.JobsDetail.Operation.SpeechRecognitionResult
Audio duration (seconds)
String
Result
Response.JobsDetail.Operation.SpeechRecognitionResult
Speech recognition result
String
FlashResult
Response.JobsDetail.Operation.SpeechRecognitionResult
Result of fast speech recognition
Container array
ResultDetail
Response.JobsDetail.Operation.SpeechRecognitionResult
Recognition result details, including the time offsets of words in each sentence (generally used to generate subtitles). This field will not be empty if ResTextFormat in the speech recognition request is 1.
Note: This field may be empty, indicating that no valid values can be obtained.
Container array
FlashResult has the following sub-nodes:
Node Name (Keyword)
Parent Node
Description
Type
channel_id
Response.JobsDetail.Operation.SpeechRecognitionResult.FlashResult
Sound channel ID, which starts from 0 and is corresponding to the number of audio channels
Integer
text
Response.JobsDetail.Operation.SpeechRecognitionResult.FlashResult
Complete audio recognition results of the sound channel
String
sentence_list
Response.JobsDetail.Operation.SpeechRecognitionResult.FlashResult
List of recognition results at sentence/paragraph level
Container array
sentence_list has the following sub-nodes:
Node Name (Keyword)
Parent Node
Description
Type
text
Response.JobsDetail.Operation.SpeechRecognitionResult.FlashResult.sentence_list
Sentence/paragraph-level text
String
start_time
Response.JobsDetail.Operation.SpeechRecognitionResult.FlashResult.sentence_list
Start time
Integer
end_time
Response.JobsDetail.Operation.SpeechRecognitionResult.FlashResult.sentence_list
End time
Integer
speaker_id
Response.JobsDetail.Operation.SpeechRecognitionResult.FlashResult.sentence_list
Speaker ID (If speaker_diarization is set in the request, speakers can be distinguished by speaker_id.)
Integer
word_list
Response.JobsDetail.Operation.SpeechRecognitionResult.FlashResult.sentence_list
List of recognition results at word level
Container array
word_list has the following sub-nodes:
Node Name (Keyword)
Parent Node
Description
Type
word
Response.JobsDetail.Operation.SpeechRecognitionResult.FlashResult.sentence_list.word_list
Word-level text
String
start_time
Response.JobsDetail.Operation.SpeechRecognitionResult.FlashResult.sentence_list.word_list
Start time
Integer
end_time
Response.JobsDetail.Operation.SpeechRecognitionResult.FlashResult.sentence_list.word_list
End time
Integer
ResultDetail has the following sub-nodes:
Node Name (Keyword)
Parent Node
Description
Type
FinalSentence
Response.JobsDetail.Operation.SpeechRecognitionResult.ResultDetail
Final recognition result of a single sentence
String
SliceSentence
Response.JobsDetail.Operation.SpeechRecognitionResult.ResultDetail
Middle recognition result of a single sentence, which is split into multiple words using spaces
String
StartMs
Response.JobsDetail.Operation.SpeechRecognitionResult.ResultDetail
Start time of a single sentence (millisecond)
String
EndMs
Response.JobsDetail.Operation.SpeechRecognitionResult.ResultDetail
End time of a single sentence (millisecond)
String
WordsNum
Response.JobsDetail.Operation.SpeechRecognitionResult.ResultDetail
Number of words in a single sentence
String
SpeechSpeed
Response.JobsDetail.Operation.SpeechRecognitionResult.ResultDetail
Speed of a single sentence. Unit: words/second
String
SpeakerId
Response.JobsDetail.Operation.SpeechRecognitionResult.ResultDetail
Sound channel or speaker ID (If speaker_diarization is set or ChannelNum is set to Dual in the request, the speakers or channels can be distinguished.)
String
Words
Response.JobsDetail.Operation.SpeechRecognitionResult.ResultDetail
Word details in a single sentence
Container array
Words has the following sub-nodes:
Node Name (Keyword)
Parent Node
Description
Type
Word
Response.JobsDetail.Operation.SpeechRecognitionResult.ResultDetail.Words
Word text
String
OffsetStartMs
Response.JobsDetail.Operation.SpeechRecognitionResult.ResultDetail.Words
Start time offset in a sentence
String
OffsetEndMs
Response.JobsDetail.Operation.SpeechRecognitionResult.ResultDetail.Words
End time offset in a sentence
String
Error codes
No special error message will be returned for this request. For the common error messages, please see Error Codes.
Examples
Request
POST /asr_jobs HTTP/1.1
Authorization: q-sign-algorithm=sha1&q-ak=AKIDZfbOAo7cllgPvF9cXFrJD0a1ICvR****&q-sign-time=1497530202;1497610202&q-key-time=1497530202;1497610202&q-header-list=&q-url-param-list=&q-signature=28e9a4986df11bed0255e97ff90500557e0ea057
Host: test-1234567890.ci.ap-beijing.myqcloud.com
Content-Length: 166
Content-Type: application/xml
﻿
<Request>
    <Tag>SpeechRecognition</Tag>
    <Input>
        <Object>input/test.mp3</Object>
    </Input>
    <Operation>
        <SpeechRecognition>
            <EngineModelType>16k_zh_video</EngineModelType>
            <ChannelNum>1</ChannelNum>
            <FilterDirty>1</ChannelNum>
            <FilterModal>1</ChannelNum>
        </SpeechRecognition>
        <Output>
            <Region>ap-chongqing</Region>
            <Bucket>test-123456789</Bucket>
            <Object>output/asr.txt</Object>
        </Output>
        <UserData>This is my data.</UserData>
        <JobLevel>0</JobLevel>
    </Operation>
    <QueueId>pcd463e1467964d39ad2d3f66aacd8199</QueueId>
    <CallBack>http://callback.demo.com</CallBack>
    <CallBackFormat>JSON<CallBackFormat>
</Request>
Response
HTTP/1.1 200 OK
Content-Type: application/xml
Content-Length: 230
Connection: keep-alive
Date: Mon, 28 Jun 2022 15:23:12 GMT
Server: tencent-ci
x-ci-request-id: NTk0MjdmODlfMjQ4OGY3XzYzYzhf****
﻿
<Response>
    <JobsDetail>
        <Code>Success</Code>
        <CreationTime>2021-08-05T15:43:50+0800</CreationTime>
        <EndTime>-</EndTime>
        <Input>
            <BucketId>test-1234567890</BucketId>
            <Object>input/test.mp3</Object>
            <Region>ap-chongqing</Region>
        </Input>
        <JobId>s58ccb634149211ed84ce2b1cd7fbb14a</JobId>
        <Message/>
        <Operation>
            <JobLevel>0</JobLevel>
            <Output>
                <Bucket>test-1234567890</Bucket>
                <Object>output/asr.txt</Object>
                <Region>ap-chongqing</Region>
            </Output>
            <SpeechRecognition>
                <ChannelNum>1</ChannelNum>
                <ConvertNumMode>0</ConvertNumMode>
                <EngineModelType>16k_zh_video</EngineModelType>
                <FilterDirty>0</FilterDirty>
                <FilterModal>0</FilterModal>
                <FilterPunc>0</FilterPunc>
                <OutputFileType>txt</OutputFileType>
                <ResTextFormat>0</ResTextFormat>
                <SpeakerDiarization>0</SpeakerDiarization>
                <SpeakerNumber>0</SpeakerNumber>
            </SpeechRecognition>
            <UserData>This is my data.</UserData>
            <JobLevel>0</JobLevel>
        </Operation>
        <QueueId>pcd463e1467964d39ad2d3f66aacd8199</QueueId>
        <StartTime>-</StartTime>
        <State>Submitted</State>
        <Tag>SpeechRecognition</Tag>
    </JobsDetail>
</Response>
﻿

Was this page helpful?

You can also Contact Sales or Submit a Ticket for help.

Yes

Node Name (Keyword)	Parent Node	Description	Type	Required
Request	None	Request container	Container	Yes

tencent cloud

New User Offers

Next-Generation CDN：EdgeOne

Elasticsearch Service free trial

Free Tier

Tencent Cloud Startup Program

Special Offers

Lighthouse Special Offers

Cloud Object Storage Special Offers

Featured Products

New Products

Education

Tencent Cloud Online Education Solutions

Gaming

Gaming Solution

Game Media Solutions

E-commerce

E-commerce retail solutions

Audio & Video

Audio/Video Solution

LVB Recording Solution

Interactive Classroom Solution

Interactive Live Streaming Solution

Audio Chat Social Networking Solution

Financial Services

Financial Services Solution

Compute

Cloud Virtual Machine

Auto Scaling

Batch Compute

CVM Dedicated Host

Database

TencentDB for MySQL

TencentDB for Redis®

TencentDB for CTSDB

TDSQL for MySQL

Data Transfer Service

TencentDB for MongoDB

TencentDB for PostgreSQL

TencentDB for SQL Server

Video Service

Cloud Streaming Services

Video on Demand

Media Processing Service

Cloud Application Rendering

Cloud Contact Center

Game Multimedia Engine

Chat

Real-time Communication

Tencent Effect SDK

AI and Machine Learning

Image Creation Large Model

Face Fusion

eKYC

Optical Character Recognition

Video Creation Large Model

Industry Applications

Tencent HealthCare Omics Platform

Container and Middleware

TDMQ for CKafka

Serverless Cloud Function

Tencent Kubernetes Engine

Tencent Kubernetes Engine for Serverless

Networking

Cloud Load Balancer

Virtual Private Cloud

Direct Connect

Cloud Connect Network

NAT Gateway

VPN Connection

Bandwidth Package

Anycast Internet Acceleration

Elastic Network Interface

Flow Logs

Global Application Acceleration Platform

Security

Captcha

Cloud Workload Protection Platform

Data Security Governance Center

Key Management Service