POST /template HTTP/1.1
Host: <BucketName-APPID>.ci.<Region>.myqcloud.com
Date: <GMT Date>
Authorization: <Auth String>
Content-Length: <length>
Content-Type: application/xml

<body>
<Request>
  <Tag>SpeechRecognition</Tag>
  <Name>TemplateName</Name>
  <SpeechRecognition>
    <EngineModelType>16k_zh</EngineModelType>
    <ResTextFormat>1</ResTextFormat>
    <FilterDirty>0</FilterDirty>
    <FilterModal>1</FilterModal>
    <ConvertNumMode>0</ConvertNumMode>
    <SpeakerDiarization>1</SpeakerDiarization>
    <SpeakerNumber>0</SpeakerNumber>
    <FilterPunc>0</FilterPunc>
    <OutputFileType>txt</OutputFileType>
  </SpeechRecognition>
</Request>
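The request body above can also be generated programmatically instead of written by hand. The following is a minimal sketch using Python's standard xml.etree.ElementTree module; the helper name build_template_body is hypothetical and not part of any SDK, and the node names are taken from the sample body.

```python
import xml.etree.ElementTree as ET


def build_template_body(name: str, params: dict) -> bytes:
    """Serialize a SpeechRecognition template request body to UTF-8 XML.

    Hypothetical helper for illustration; not part of an official SDK.
    """
    root = ET.Element("Request")
    ET.SubElement(root, "Tag").text = "SpeechRecognition"
    ET.SubElement(root, "Name").text = name
    speech = ET.SubElement(root, "SpeechRecognition")
    # Each key becomes a child node of SpeechRecognition, in insertion order.
    for key, value in params.items():
        ET.SubElement(speech, key).text = str(value)
    return ET.tostring(root, encoding="unicode").encode("utf-8")


# Same values as the sample request body.
body = build_template_body(
    "TemplateName",
    {
        "EngineModelType": "16k_zh",
        "ResTextFormat": 1,
        "FilterDirty": 0,
        "FilterModal": 1,
        "ConvertNumMode": 0,
        "SpeakerDiarization": 1,
        "SpeakerNumber": 0,
        "FilterPunc": 0,
        "OutputFileType": "txt",
    },
)
print(body.decode("utf-8"))
```

The output is a single-line XML document; whitespace between nodes is not significant, so it is equivalent to the indented sample.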
Node Name (Keyword) | Parent Node | Description | Type | Required |
Request | None | Request container | Container | Yes |
Request has the following sub-nodes:
Node Name (Keyword) | Parent Node | Description | Type | Required | Constraints |
Tag | Request | Template tag: SpeechRecognition. | String | Yes | None |
Name | Request | Template name, which can contain letters, digits, underscores (_), hyphens (-), and asterisks (*). | String | Yes | None |
SpeechRecognition | Request | Speech recognition parameter. | Container | Yes | None |
SpeechRecognition has the following sub-nodes:
Node Name (Keyword) | Parent Node | Description | Type | Required |
EngineModelType | Request.SpeechRecognition | Engine model type, divided into phone call and non-phone call scenarios. Phone call scenarios: 8k_zh: 8 kHz, for Mandarin in general scenarios (available for dual-channel audio). 8k_zh_s: 8 kHz, for Mandarin with speaker separation (available for mono-channel audio only). 8k_en: 8 kHz, for English. Non-phone call scenarios: 16k_zh: 16 kHz, for Mandarin in general scenarios. 16k_zh_video: 16 kHz, for audio/video scenarios. 16k_en: 16 kHz, for English. 16k_ca: 16 kHz, for Cantonese. 16k_ja: 16 kHz, for Japanese. 16k_zh_edu: For Mandarin in education scenarios. 16k_en_edu: For English in education scenarios. 16k_zh_medical: For healthcare scenarios. 16k_th: For Thai. 16k_zh_dialect: Multi-dialect, for up to 23 dialects. Fast ASR is only supported for 8k_zh, 16k_zh, 16k_en, and 16k_zh_video. | String | Yes |
ChannelNum | Request.SpeechRecognition | This parameter is supported only for general ASR. Number of sound channels: 1: Mono. If EngineModelType is not the phone call scenario, only mono channel is supported. 2: Dual (for the 8k_zh engine only, where the two channels correspond to the caller and callee respectively). | Integer | Yes |
ResTextFormat | Request.SpeechRecognition | This parameter is supported only for general ASR. Format of the returned recognition result. 0: Recognition result text, including the list of segment timestamps. 1: Detailed word-level recognition result, excluding punctuation marks but including the speech speed value (the list of word timestamps, generally used to generate subtitles). 2: Detailed word-level recognition result, including punctuation marks and the speech speed value. | Integer | Yes |
FilterDirty | Request.SpeechRecognition | Whether to filter restricted words (for the Mandarin engine only). 0: Does not filter. 1: Filters. 2: Replaces restricted words with *. Default value: 0. | Integer | No |
FilterModal | Request.SpeechRecognition | Whether to filter modal particles (for the Mandarin engine only). 0: Does not filter. 1: Filters partially. 2: Filters strictly. Default value: 0. | Integer | No |
ConvertNumMode | Request.SpeechRecognition | This parameter is supported only for general ASR. Whether to intelligently convert Chinese numbers to Arabic numerals (for the Mandarin engine only). 0: Directly outputs Chinese numbers. 1: Intelligently converts based on the scenario. 3: Enables mathematical number conversion. Default value: 0. | Integer | No |
SpeakerDiarization | Request.SpeechRecognition | Whether to enable speaker separation. 0: No. 1: Yes (for mono-channel audio with the 8k_zh, 16k_zh, or 16k_zh_video engine only). Default value: 0. Note: In the 8 kHz phone call scenario, we recommend using dual channels to distinguish the caller from the callee by setting ChannelNum=2, in which case speaker separation does not need to be enabled. | Integer | No |
SpeakerNumber | Request.SpeechRecognition | This parameter is supported only for general ASR. Number of speakers to be separated (with speaker separation enabled). Value range: 0–10. 0: Automatic separation (currently for six or fewer people only). 1–10: Specified number of speakers to be separated. Default value: 0. | Integer | No |
FilterPunc | Request.SpeechRecognition | Whether to filter punctuation marks (currently for the Mandarin engine only): 0: Does not filter. 1: Filters the punctuation mark at the end of the sentence. 2: Filters all punctuation marks. Default value: 0. | Integer | No |
OutputFileType | Request.SpeechRecognition | Output file type. Valid values: txt (default), srt. Fast ASR supports txt only. | String | No |
FlashAsr | Request.SpeechRecognition | Whether to enable fast ASR. Valid values: true, false (default). | String | No |
Format | Request.SpeechRecognition | Audio format for fast ASR. Supported formats include WAV, PCM, OGG-OPUS, SPEEX, SILK, MP3, M4A, and AAC. | String | Yes if FlashAsr is true |
FirstChannelOnly | Request.SpeechRecognition | Whether to recognize only the first sound channel (in fast ASR mode). Valid values: 0 (recognizes all sound channels); 1 (recognizes the first sound channel). Default value: 1. | Integer | No |
WordInfo | Request.SpeechRecognition | Whether to display word-level timestamps (in fast ASR mode). Valid values: 0 (no); 1 (yes, excluding punctuation timestamps); 2 (yes, including punctuation timestamps). Default value: 0. | Integer | No |
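Several of the parameters above constrain one another, so it can help to check a parameter set on the client side before sending it. The sketch below encodes only a few of the rules stated in the table (dual channel for 8k_zh only, speaker separation engines, the SpeakerNumber range, and the fast ASR requirements). The function name check_speech_params is hypothetical, and the service may apply further checks that are not covered here.

```python
# Engine lists copied from the parameter table.
FLASH_ENGINES = {"8k_zh", "16k_zh", "16k_en", "16k_zh_video"}
DIARIZATION_ENGINES = {"8k_zh", "16k_zh", "16k_zh_video"}


def check_speech_params(params: dict) -> list:
    """Return a list of problems found; an empty list means none were found.

    Hypothetical client-side check covering only some documented rules.
    """
    problems = []
    engine = params.get("EngineModelType")
    if not engine:
        problems.append("EngineModelType is required")

    if int(params.get("ChannelNum", 1)) == 2 and engine != "8k_zh":
        problems.append("ChannelNum=2 is available for the 8k_zh engine only")

    if (int(params.get("SpeakerDiarization", 0)) == 1
            and engine not in DIARIZATION_ENGINES):
        problems.append("SpeakerDiarization=1 requires 8k_zh, 16k_zh, or 16k_zh_video")

    if not 0 <= int(params.get("SpeakerNumber", 0)) <= 10:
        problems.append("SpeakerNumber must be between 0 and 10")

    # FlashAsr is documented as a String with the values true and false.
    if str(params.get("FlashAsr", "false")).lower() == "true":
        if engine not in FLASH_ENGINES:
            problems.append("Fast ASR is not supported for this engine")
        if not params.get("Format"):
            problems.append("Format is required when FlashAsr is true")

    return problems


print(check_speech_params({"EngineModelType": "16k_en", "ChannelNum": 2}))
```

Returning a list rather than raising on the first problem lets a caller report every issue at once.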
<Response>
  <Template>
    <TemplateId></TemplateId>
    <Name>TemplateName</Name>
    <State>Normal</State>
    <Tag>SpeechRecognition</Tag>
    <CreateTime></CreateTime>
    <UpdateTime></UpdateTime>
    <BucketId></BucketId>
    <Category>Custom</Category>
    <SpeechRecognition>
      <EngineModelType>16k_zh</EngineModelType>
      <ResTextFormat>1</ResTextFormat>
      <FilterDirty>0</FilterDirty>
      <FilterModal>1</FilterModal>
      <ConvertNumMode>0</ConvertNumMode>
      <SpeakerDiarization>1</SpeakerDiarization>
      <SpeakerNumber>0</SpeakerNumber>
      <FilterPunc>0</FilterPunc>
      <OutputFileType>txt</OutputFileType>
    </SpeechRecognition>
  </Template>
</Response>
Node Name (Keyword) | Parent Node | Description | Type |
Response | None | Result storage container | Container |
Response has the following sub-nodes:
Node Name (Keyword) | Parent Node | Description | Type |
TemplateId | Response.Template | Template ID | String |
Name | Response.Template | Template name | String |
BucketId | Response.Template | Template bucket | String |
Category | Response.Template | Template category: Custom or Official | String |
Tag | Response.Template | Template tag: SpeechRecognition. | String |
UpdateTime | Response.Template | Update time | String |
CreateTime | Response.Template | Creation time | String |
SpeechRecognition | Response.Template | Same as Request.SpeechRecognition in the request body. | Container |
POST /template HTTP/1.1
Authorization: q-sign-algorithm=sha1&q-ak=AKIDZfbOAo7cllgPvF9cXFrJD0a1ICvR****&q-sign-time=1497530202;1497610202&q-key-time=1497530202;1497610202&q-header-list=&q-url-param-list=&q-signature=28e9a4986df11bed0255e97ff90500557e0e****
Host: test-1234567890.ci.ap-chongqing.myqcloud.com
Content-Length: 1666
Content-Type: application/xml

<Request>
  <Tag>SpeechRecognition</Tag>
  <Name>TemplateName</Name>
  <SpeechRecognition>
    <EngineModelType>16k_zh</EngineModelType>
    <ResTextFormat>1</ResTextFormat>
    <FilterDirty>0</FilterDirty>
    <FilterModal>1</FilterModal>
    <ConvertNumMode>0</ConvertNumMode>
    <SpeakerDiarization>1</SpeakerDiarization>
    <SpeakerNumber>0</SpeakerNumber>
    <FilterPunc>0</FilterPunc>
    <OutputFileType>txt</OutputFileType>
  </SpeechRecognition>
</Request>
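As a rough illustration of how the sample request above maps onto client code, the sketch below assembles an equivalent request object with Python's standard urllib.request module. It makes several assumptions: the helper name make_create_template_request is hypothetical, HTTPS is assumed as the scheme, and the Authorization value is expected to be computed elsewhere (signature calculation is not shown). The request is only constructed here, not sent.

```python
import urllib.request


def make_create_template_request(bucket: str, region: str,
                                 auth: str, body: bytes) -> urllib.request.Request:
    """Build (but do not send) a POST /template request.

    Hypothetical helper; `auth` must be a signature string computed separately.
    """
    # Host pattern taken from the request syntax: <BucketName-APPID>.ci.<Region>.myqcloud.com
    url = f"https://{bucket}.ci.{region}.myqcloud.com/template"
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={
            "Authorization": auth,
            "Content-Type": "application/xml",
        },
    )


req = make_create_template_request(
    "test-1234567890", "ap-chongqing", "<Auth String>", b"<Request/>"
)
print(req.get_method(), req.full_url)
```

Passing the result to urllib.request.urlopen would send it; urllib fills in the Host and Content-Length headers at that point.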
HTTP/1.1 200 OK
Content-Type: application/xml
Content-Length: 100
Connection: keep-alive
Date: Thu, 14 Jul 2022 12:37:29 GMT
Server: tencent-ci
x-ci-request-id: NTk0MjdmODlfMjQ4OGY3XzYzYzhf****

<Response>
  <Template>
    <TemplateId>t1460606b9752148c4ab182f55163ba7cd</TemplateId>
    <Name>TemplateName</Name>
    <State>Normal</State>
    <Tag>SpeechRecognition</Tag>
    <CreateTime>2020-08-05T11:35:24+0800</CreateTime>
    <UpdateTime>2020-08-31T16:15:20+0800</UpdateTime>
    <BucketId>test-1234567890</BucketId>
    <Category>Custom</Category>
    <SpeechRecognition>
      <EngineModelType>16k_zh</EngineModelType>
      <ChannelNum>1</ChannelNum>
      <ResTextFormat>0</ResTextFormat>
      <FilterDirty>1</FilterDirty>
      <FilterModal>0</FilterModal>
      <ConvertNumMode>1</ConvertNumMode>
      <SpeakerDiarization>0</SpeakerDiarization>
      <SpeakerNumber>0</SpeakerNumber>
      <FilterPunc>0</FilterPunc>
    </SpeechRecognition>
  </Template>
</Response>
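A caller usually needs the TemplateId from a response like the one above in order to reference the template later. The following sketch parses a response body with Python's standard xml.etree.ElementTree module. The function name parse_template_response is hypothetical, and the embedded sample is an abbreviated copy of the sample response.

```python
import xml.etree.ElementTree as ET


def parse_template_response(xml_text: str) -> dict:
    """Flatten Response.Template into a dict; SpeechRecognition becomes a nested dict.

    Hypothetical helper for illustration only.
    """
    template = ET.fromstring(xml_text).find("Template")
    if template is None:
        raise ValueError("response has no Template node")
    result = {}
    for child in template:
        if child.tag == "SpeechRecognition":
            result[child.tag] = {param.tag: param.text for param in child}
        else:
            result[child.tag] = child.text
    return result


# Abbreviated copy of the sample response body.
sample = (
    "<Response><Template>"
    "<TemplateId>t1460606b9752148c4ab182f55163ba7cd</TemplateId>"
    "<Name>TemplateName</Name><State>Normal</State>"
    "<Tag>SpeechRecognition</Tag><Category>Custom</Category>"
    "<SpeechRecognition><EngineModelType>16k_zh</EngineModelType>"
    "<ChannelNum>1</ChannelNum></SpeechRecognition>"
    "</Template></Response>"
)
info = parse_template_response(sample)
print(info["TemplateId"], info["SpeechRecognition"]["EngineModelType"])
```

All values are returned as strings, as they appear in the XML; converting numeric parameters such as ChannelNum is left to the caller.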