CreateSpeechJobs

This API (`CreateSpeechJobs`) is used to submit a speech recognition job.

Sample request:

```plaintext
POST /asr_jobs HTTP/1.1
Host: <BucketName-APPID>.ci.<Region>.myqcloud.com
Date: <GMT Date>
Authorization: <Auth String>
Content-Length: <length>
Content-Type: application/xml

<body>
```
```xml
<Request>
  <Tag>SpeechRecognition</Tag>
  <Input>
    <Object></Object>
  </Input>
  <Operation>
    <SpeechRecognition></SpeechRecognition>
    <Output>
      <Region></Region>
      <Bucket></Bucket>
      <Object></Object>
    </Output>
  </Operation>
  <QueueId></QueueId>
</Request>
```
| Node Name (Keyword) | Parent Node | Description | Type | Required |
| --- | --- | --- | --- | --- |
| Request | None | Request container | Container | Yes |
`Request` has the following sub-nodes:

| Node Name (Keyword) | Parent Node | Description | Type | Required |
| --- | --- | --- | --- | --- |
| Tag | Request | Job type, which currently can only be `SpeechRecognition`. | String | Yes |
| Input | Request | Speech file to be processed | Container | Yes |
| Operation | Request | Operation rule | Container | Yes |
| QueueId | Request | ID of the queue the job is in | String | Yes |
`Input` has the following sub-nodes:

| Node Name (Keyword) | Parent Node | Description | Type | Required |
| --- | --- | --- | --- | --- |
| Object | Request.Input | Key of the speech file in COS. The bucket is specified by `Host`. | String | Yes |
`Operation` has the following sub-nodes:

| Node Name (Keyword) | Parent Node | Description | Type | Required |
| --- | --- | --- | --- | --- |
| SpeechRecognition | Request.Operation | Job type parameters, which take effect only if `Tag` is `SpeechRecognition`. | Container | No |
| Output | Request.Operation | Result output address | Container | Yes |
`SpeechRecognition` has the following sub-nodes:

| Node Name (Keyword) | Parent Node | Description | Type | Required |
| --- | --- | --- | --- | --- |
| EngineModelType | Request.Operation.SpeechRecognition | Engine model type.<br>Phone call scenarios:<br>• 8k_zh: 8 kHz, for Mandarin in general scenarios (available for dual-channel audio).<br>• 8k_zh_s: 8 kHz, for Mandarin with speaker separation (available for mono-channel audio only).<br>Non-phone call scenarios:<br>• 16k_zh: 16 kHz, for Mandarin in general scenarios.<br>• 16k_zh_video: 16 kHz, for Mandarin in audio/video scenarios.<br>• 16k_en: 16 kHz, for English.<br>• 16k_ca: 16 kHz, for Cantonese. | String | Yes |
| ChannelNum | Request.Operation.SpeechRecognition | Number of audio channels. 1: mono; 2: dual (for the 8k_zh engine only). | Integer | Yes |
| ResTextFormat | Request.Operation.SpeechRecognition | Format of the returned recognition result. 0: recognition result text, including the list of segment timestamps; 1: recognition result details, including the list of word timestamps (generally used to generate subtitles; for the 16k Mandarin engines only). | Integer | Yes |
| FilterDirty | Request.Operation.SpeechRecognition | Whether to filter restricted words (for Mandarin engines only). 0 (default): does not filter; 1: filters; 2: replaces restricted words with "*". | Integer | No |
| FilterModal | Request.Operation.SpeechRecognition | Whether to filter interjections (for Mandarin engines only). 0 (default): does not filter; 1: filters; 2: filters strictly. | Integer | No |
| ConvertNumMode | Request.Operation.SpeechRecognition | Whether to intelligently convert Chinese numerals to Arabic numerals (for Mandarin engines only). 0: outputs Chinese numerals directly; 1 (default): converts intelligently based on the scenario. | Integer | No |
`Output` has the following sub-nodes:

| Node Name (Keyword) | Parent Node | Description | Type | Required |
| --- | --- | --- | --- | --- |
| Region | Request.Operation.Output | Bucket region | String | Yes |
| Bucket | Request.Operation.Output | Result storage bucket | String | Yes |
| Object | Request.Operation.Output | Result filename | String | Yes |
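Putting the request parameters above together, here is a minimal sketch in Python of building the request body with the standard library. The object key, queue ID, bucket, and region values are illustrative placeholders; only the node names come from this reference, and computing the `Authorization` header is assumed to happen elsewhere.

```python
import xml.etree.ElementTree as ET

def build_asr_job_body(object_key, queue_id, out_region, out_bucket, out_object,
                       engine="16k_zh", channels=1, res_format=0):
    """Build the XML body for a CreateSpeechJobs request.

    Node names follow the parameter tables above; all argument
    values here are placeholders for illustration.
    """
    req = ET.Element("Request")
    ET.SubElement(req, "Tag").text = "SpeechRecognition"
    inp = ET.SubElement(req, "Input")
    ET.SubElement(inp, "Object").text = object_key
    op = ET.SubElement(req, "Operation")
    sr = ET.SubElement(op, "SpeechRecognition")
    ET.SubElement(sr, "EngineModelType").text = engine
    ET.SubElement(sr, "ChannelNum").text = str(channels)
    ET.SubElement(sr, "ResTextFormat").text = str(res_format)
    out = ET.SubElement(op, "Output")
    ET.SubElement(out, "Region").text = out_region
    ET.SubElement(out, "Bucket").text = out_bucket
    ET.SubElement(out, "Object").text = out_object
    ET.SubElement(req, "QueueId").text = queue_id
    return ET.tostring(req, encoding="unicode")

body = build_asr_job_body("audio/sample.mp3", "queue-1", "ap-guangzhou",
                          "examplebucket-1250000000", "result/sample.txt")
print(body)
```

The resulting string is what goes into the POST body, with `Content-Type: application/xml`.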
Sample response:

```xml
<Response>
  <JobsDetail>
    <Code></Code>
    <Message></Message>
    <JobId></JobId>
    <State></State>
    <CreationTime></CreationTime>
    <QueueId></QueueId>
    <Tag></Tag>
    <Input>
      <Object></Object>
    </Input>
    <Operation>
      <SpeechRecognition></SpeechRecognition>
      <Output>
        <Region></Region>
        <Bucket></Bucket>
        <Object></Object>
      </Output>
      <MediaInfo></MediaInfo>
    </Operation>
  </JobsDetail>
</Response>
```
| Node Name (Keyword) | Parent Node | Description | Type |
| --- | --- | --- | --- |
| Response | None | Response container | Container |
`Response` has the following sub-nodes:

| Node Name (Keyword) | Parent Node | Description | Type |
| --- | --- | --- | --- |
| JobsDetail | Response | Job details | Container |
`JobsDetail` has the following sub-nodes:

| Node Name (Keyword) | Parent Node | Description | Type |
| --- | --- | --- | --- |
| Code | Response.JobsDetail | Error code, which is meaningful only if `State` is `Failed` | String |
| Message | Response.JobsDetail | Error description, which is meaningful only if `State` is `Failed` | String |
| JobId | Response.JobsDetail | Job ID | String |
| Tag | Response.JobsDetail | Job type: `SpeechRecognition` | String |
| State | Response.JobsDetail | Job status. Valid values: `Submitted`, `Running`, `Success`, `Failed`, `Pause`, `Cancel` | String |
| CreationTime | Response.JobsDetail | Job creation time | String |
| QueueId | Response.JobsDetail | ID of the queue the job is in | String |
| Input | Response.JobsDetail | Input resource address of the job | Container |
| Operation | Response.JobsDetail | Operation rule | Container |
`Input` has the following sub-nodes:

Same as the `Request.Input` node in the request.

`Operation` has the following sub-nodes:

| Node Name (Keyword) | Parent Node | Description | Type |
| --- | --- | --- | --- |
| TemplateId | Response.JobsDetail.Operation | Job template ID | String |
| Output | Response.JobsDetail.Operation | File output address | Container |
| MediaInfo | Response.JobsDetail.Operation | Information about the transcoded output video. This node is not returned if there is no output video. | Container |
`Output` has the following sub-nodes:

Same as the `Request.Operation.Output` node in the request.

`SpeechRecognition` has the following sub-nodes:

Same as the `Request.Operation.SpeechRecognition` node in the request.
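To act on a submitted job, a client typically pulls `JobId` and `State` out of the response body. The sketch below parses a response shaped like the sample above with the standard library; the field values in the sample string are made up for illustration.

```python
import xml.etree.ElementTree as ET

# A response body in the shape documented above; all values are invented.
sample = ("<Response><JobsDetail>"
          "<Code>Success</Code><Message></Message>"
          "<JobId>j1234567890abcdef</JobId><State>Submitted</State>"
          "<CreationTime>2021-01-01T00:00:00+0800</CreationTime>"
          "<QueueId>queue-1</QueueId><Tag>SpeechRecognition</Tag>"
          "</JobsDetail></Response>")

def parse_job_detail(xml_text):
    """Extract the commonly needed fields from a CreateSpeechJobs response."""
    detail = ET.fromstring(xml_text).find("JobsDetail")
    return {
        "JobId": detail.findtext("JobId"),
        "State": detail.findtext("State"),
        # Code and Message are meaningful only when State is Failed.
        "Code": detail.findtext("Code"),
        "Message": detail.findtext("Message"),
    }

job = parse_job_detail(sample)
print(job["JobId"], job["State"])  # → j1234567890abcdef Submitted
```

Since `Code` and `Message` carry error information only when `State` is `Failed`, a caller would normally branch on `State` before reading them.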