Creating Speech Recognition Template
Last updated: 2024-03-01 17:29:56

Feature Description

This API is used to create a speech recognition template.


Request

Sample request

POST /template HTTP/1.1
Host: <BucketName-APPID>.ci.<Region>.myqcloud.com
Date: <GMT Date>
Authorization: <Auth String>
Content-Length: <length>
Content-Type: application/xml

<body>
Note:
Authorization: Auth String (for more information, see Request Signature).
When this feature is used by a sub-account, relevant permissions must be granted. For more information, see Authorization Granularity Details.
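The Request Signature document is the authoritative description of the Auth String. As a rough sketch only, assuming CI uses the same HMAC-SHA1 scheme as the COS XML API (the sample Authorization header in the Samples section has that q-sign-algorithm=sha1 shape), the string could be computed as follows. The function names and parameters are hypothetical, and the output should be checked against the Request Signature document before use.

```python
import hashlib
import hmac
from urllib.parse import quote


def _encode(value: str) -> str:
    # Percent-encode everything except RFC 3986 unreserved characters.
    return quote(value, safe="-_.~")


def _key_list_and_string(pairs: dict) -> tuple:
    # Lowercase and encode keys, sort them, then build "k1;k2" and "k1=v1&k2=v2".
    items = sorted((_encode(k.lower()), _encode(str(v))) for k, v in pairs.items())
    key_list = ";".join(k for k, _ in items)
    kv_string = "&".join(f"{k}={v}" for k, v in items)
    return key_list, kv_string


def sign(secret_id: str, secret_key: str, method: str, path: str,
         start: int, end: int, headers=None, params=None) -> str:
    """Hypothetical helper: build a COS-style Authorization value (assumed scheme)."""
    key_time = f"{start};{end}"
    # SignKey: HMAC-SHA1 of KeyTime keyed with the SecretKey, as lowercase hex.
    sign_key = hmac.new(secret_key.encode(), key_time.encode(), hashlib.sha1).hexdigest()
    header_list, header_string = _key_list_and_string(headers or {})
    param_list, param_string = _key_list_and_string(params or {})
    # HttpString: method, path, parameters, and headers, each followed by a newline.
    http_string = f"{method.lower()}\n{path}\n{param_string}\n{header_string}\n"
    # StringToSign: algorithm, KeyTime, and the SHA-1 hex digest of HttpString.
    string_to_sign = f"sha1\n{key_time}\n{hashlib.sha1(http_string.encode()).hexdigest()}\n"
    signature = hmac.new(sign_key.encode(), string_to_sign.encode(), hashlib.sha1).hexdigest()
    return (f"q-sign-algorithm=sha1&q-ak={secret_id}&q-sign-time={key_time}"
            f"&q-key-time={key_time}&q-header-list={header_list}"
            f"&q-url-param-list={param_list}&q-signature={signature}")
```

The sample request on this page signs no headers and no URL parameters, which corresponds to calling sign(secret_id, secret_key, "POST", "/template", start, end) with both dictionaries left empty.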

Request headers

This API only uses common request headers. For more information, see Common Request Headers.

Request body

This request requires the following request body:
<Request>
<Tag>SpeechRecognition</Tag>
<Name>TemplateName</Name>
<SpeechRecognition>
<EngineModelType>16k_zh</EngineModelType>
<ResTextFormat>1</ResTextFormat>
<FilterDirty>0</FilterDirty>
<FilterModal>1</FilterModal>
<ConvertNumMode>0</ConvertNumMode>
<SpeakerDiarization>1</SpeakerDiarization>
<SpeakerNumber>0</SpeakerNumber>
<FilterPunc>0</FilterPunc>
<OutputFileType>txt</OutputFileType>
</SpeechRecognition>
</Request>
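A minimal sketch of assembling this body with Python's standard xml.etree.ElementTree module. The helper name build_template_body is hypothetical, and parameters are passed through as child elements of SpeechRecognition without any validation. The usage line also sets ChannelNum, which the node table marks as required even though the sample body omits it.

```python
import xml.etree.ElementTree as ET


def build_template_body(name: str, params: dict) -> bytes:
    """Serialize a <Request> body for POST /template (hypothetical helper)."""
    root = ET.Element("Request")
    ET.SubElement(root, "Tag").text = "SpeechRecognition"
    ET.SubElement(root, "Name").text = name
    speech = ET.SubElement(root, "SpeechRecognition")
    for key, value in params.items():
        # Each parameter becomes a child element; integers are written as text.
        ET.SubElement(speech, key).text = str(value)
    # UTF-8 output without an XML declaration.
    return ET.tostring(root, encoding="utf-8")


if __name__ == "__main__":
    body = build_template_body(
        "TemplateName",
        {"EngineModelType": "16k_zh", "ChannelNum": 1, "ResTextFormat": 1},
    )
    print(body.decode())
```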
The nodes are described as follows:

Node Name (Keyword) | Parent Node | Description | Type | Required
Request | None | Request container | Container | Yes

Request has the following sub-nodes:

Node Name (Keyword) | Parent Node | Description | Type | Required | Constraints
Tag | Request | Template tag: SpeechRecognition. | String | Yes | None
Name | Request | Template name, which can contain letters, digits, underscores (_), hyphens (-), and asterisks (*). | String | Yes | None
SpeechRecognition | Request | Speech recognition parameters. | Container | Yes | None
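The Name constraint above can be checked on the client before the request is sent. This sketch assumes "letters" means ASCII letters and enforces no length limit, since none is stated on this page; the function name is hypothetical.

```python
import re

# Letters, digits, underscores, hyphens, and asterisks, per the Name description.
# Assumption: ASCII letters only; no length limit is enforced.
_NAME_PATTERN = re.compile(r"[A-Za-z0-9_*-]+")


def is_valid_template_name(name: str) -> bool:
    """Return True if name uses only the characters allowed for Name."""
    return _NAME_PATTERN.fullmatch(name) is not None
```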
SpeechRecognition has the following sub-nodes:
EngineModelType | Parent node: Request.SpeechRecognition | Type: String | Required: Yes
Engine model type, divided into phone call and non-phone call scenarios.
Phone call scenarios:
8k_zh: 8 kHz, Mandarin in general scenarios (supports dual-channel audio).
8k_zh_s: 8 kHz, Mandarin with speaker separation (mono-channel audio only).
8k_en: 8 kHz, English.
Non-phone call scenarios:
16k_zh: 16 kHz, Mandarin in general scenarios.
16k_zh_video: 16 kHz, audio/video scenarios.
16k_en: 16 kHz, English.
16k_ca: 16 kHz, Cantonese.
16k_ja: 16 kHz, Japanese.
16k_zh_edu: Mandarin in education scenarios.
16k_en_edu: English in education scenarios.
16k_zh_medical: Healthcare scenarios.
16k_th: Thai.
16k_zh_dialect: Multiple dialects (up to 23).
Fast ASR supports only 8k_zh, 16k_zh, 16k_en, and 16k_zh_video.

ChannelNum | Parent node: Request.SpeechRecognition | Type: Integer | Required: Yes
Number of audio channels (general ASR only):
1: Mono. Engines outside the phone call scenarios support mono only.
2: Dual (8k_zh engine only; the two channels carry the caller and the callee respectively).

ResTextFormat | Parent node: Request.SpeechRecognition | Type: Integer | Required: Yes
Format of the returned recognition result (general ASR only):
0: Recognition result text, including a list of segment timestamps.
1: Detailed word-level result without punctuation marks, including the speech rate (a list of word timestamps, typically used to generate subtitles).
2: Detailed word-level result with punctuation marks and the speech rate.

FilterDirty | Parent node: Request.SpeechRecognition | Type: Integer | Required: No
Whether to filter restricted words (Mandarin engines only):
0: Do not filter.
1: Filter.
2: Replace restricted words with *.
Default value: 0.

FilterModal | Parent node: Request.SpeechRecognition | Type: Integer | Required: No
Whether to filter modal particles (Mandarin engines only):
0: Do not filter.
1: Filter partially.
2: Filter strictly.
Default value: 0.

ConvertNumMode | Parent node: Request.SpeechRecognition | Type: Integer | Required: No
Whether to intelligently convert Chinese numbers to Arabic numerals (Mandarin engines only):
0: Output Chinese numbers as-is.
1: Convert intelligently based on the scenario.
3: Enable conversion of mathematical numbers (general ASR only).
Default value: 0.

SpeakerDiarization | Parent node: Request.SpeechRecognition | Type: Integer | Required: No
Whether to enable speaker separation:
0: No.
1: Yes (mono-channel audio with the 8k_zh, 16k_zh, or 16k_zh_video engine only).
Default value: 0.
Note: For 8 kHz phone call audio, we recommend setting ChannelNum=2 so that the two channels distinguish the caller from the callee; speaker separation is then unnecessary.

SpeakerNumber | Parent node: Request.SpeechRecognition | Type: Integer | Required: No
Number of speakers to separate when speaker separation is enabled (general ASR only). Value range: 0–10.
0: Automatic separation (currently supports up to six speakers).
1–10: Separate the specified number of speakers.
Default value: 0.

FilterPunc | Parent node: Request.SpeechRecognition | Type: Integer | Required: No
Whether to filter punctuation marks (currently Mandarin engines only):
0: Do not filter.
1: Filter the punctuation mark at the end of each sentence.
2: Filter all punctuation marks.
Default value: 0.

OutputFileType | Parent node: Request.SpeechRecognition | Type: String | Required: No
Output file type. Valid values: txt (default), srt. Fast ASR supports only txt.

FlashAsr | Parent node: Request.SpeechRecognition | Type: String | Required: No
Whether to enable fast ASR. Valid values: true, false (default).

Format | Parent node: Request.SpeechRecognition | Type: String | Required: Yes if FlashAsr is true
Audio format for fast ASR. Supported formats: WAV, PCM, OGG-OPUS, SPEEX, SILK, MP3, M4A, and AAC.

FirstChannelOnly | Parent node: Request.SpeechRecognition | Type: Integer | Required: No
Whether to recognize only the first audio channel (fast ASR only). Valid values: 0 (recognize all channels), 1 (recognize only the first channel). Default value: 1.

WordInfo | Parent node: Request.SpeechRecognition | Type: Integer | Required: No
Whether to return word-level timestamps (fast ASR only). Valid values: 0 (no), 1 (yes, excluding punctuation timestamps), 2 (yes, including punctuation timestamps). Default value: 0.
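Several of these parameters constrain each other (for example, ChannelNum=2 is valid only with 8k_zh). The hypothetical check_params helper below is a sketch that encodes only the cross-field rules stated in the node descriptions; treating a missing ChannelNum as 1 and comparing Format case-insensitively are assumptions, and the service may apply further checks of its own.

```python
# Engine sets taken from the EngineModelType and SpeakerDiarization descriptions.
FAST_ASR_ENGINES = {"8k_zh", "16k_zh", "16k_en", "16k_zh_video"}
DIARIZATION_ENGINES = {"8k_zh", "16k_zh", "16k_zh_video"}
# Format values from the Format description, compared in lowercase (an assumption).
FAST_ASR_FORMATS = {"wav", "pcm", "ogg-opus", "speex", "silk", "mp3", "m4a", "aac"}


def check_params(p: dict) -> list:
    """Return a list of problems found in a SpeechRecognition parameter dict."""
    errors = []
    engine = p.get("EngineModelType")
    if engine is None:
        errors.append("EngineModelType is required")
    # Assumption: a missing ChannelNum is treated as mono for these checks.
    channels = int(p.get("ChannelNum", 1))
    if channels == 2 and engine != "8k_zh":
        errors.append("ChannelNum=2 is supported only by 8k_zh")
    if int(p.get("SpeakerDiarization", 0)) == 1 and (
        engine not in DIARIZATION_ENGINES or channels != 1
    ):
        errors.append("SpeakerDiarization=1 requires mono audio and 8k_zh, 16k_zh, or 16k_zh_video")
    if not 0 <= int(p.get("SpeakerNumber", 0)) <= 10:
        errors.append("SpeakerNumber must be between 0 and 10")
    if str(p.get("FlashAsr", "false")).lower() == "true":
        if engine not in FAST_ASR_ENGINES:
            errors.append("fast ASR supports only 8k_zh, 16k_zh, 16k_en, and 16k_zh_video")
        fmt = p.get("Format")
        if fmt is None:
            errors.append("Format is required when FlashAsr is true")
        elif str(fmt).lower() not in FAST_ASR_FORMATS:
            errors.append(f"unsupported fast ASR Format: {fmt}")
    return errors
```

An empty list means none of these rules were violated; it does not guarantee the service will accept the template.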


Response

Response headers

This API only returns common response headers. For more information, see Common Response Headers.

Response body

The response body contains application/xml data with the following nodes:
<Response>
<Template>
<TemplateId></TemplateId>
<Name>TemplateName</Name>
<State>Normal</State>
<Tag>SpeechRecognition</Tag>
<CreateTime></CreateTime>
<UpdateTime></UpdateTime>
<BucketId></BucketId>
<Category>Custom</Category>
<SpeechRecognition>
<EngineModelType>16k_zh</EngineModelType>
<ResTextFormat>1</ResTextFormat>
<FilterDirty>0</FilterDirty>
<FilterModal>1</FilterModal>
<ConvertNumMode>0</ConvertNumMode>
<SpeakerDiarization>1</SpeakerDiarization>
<SpeakerNumber>0</SpeakerNumber>
<FilterPunc>0</FilterPunc>
<OutputFileType>txt</OutputFileType>
</SpeechRecognition>
</Template>
</Response>
The nodes are described as follows:

Node Name (Keyword) | Parent Node | Description | Type
Response | None | Result storage container | Container

The Template container under Response has the following sub-nodes:

Node Name (Keyword) | Parent Node | Description | Type
TemplateId | Response.Template | Template ID | String
Name | Response.Template | Template name | String
BucketId | Response.Template | Bucket that the template belongs to | String
Category | Response.Template | Template category: Custom or Official | String
Tag | Response.Template | Template tag: SpeechRecognition | String
UpdateTime | Response.Template | Update time | String
CreateTime | Response.Template | Creation time | String
SpeechRecognition | Response.Template | Same as Request.SpeechRecognition in the request body | Container
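A sketch of reading these nodes from a response body with xml.etree.ElementTree. The function name is hypothetical; all values are returned as strings exactly as they appear in the XML (None for empty elements), so converting integers or timestamps is left to the caller.

```python
import xml.etree.ElementTree as ET


def parse_template_response(body) -> dict:
    """Flatten Response/Template into a dict (hypothetical helper)."""
    template = ET.fromstring(body).find("Template")
    if template is None:
        raise ValueError("response has no <Template> element")
    # Scalar fields such as TemplateId, Name, State, and BucketId.
    info = {child.tag: child.text for child in template if child.tag != "SpeechRecognition"}
    # Nested SpeechRecognition parameters, kept as a separate dict.
    speech = template.find("SpeechRecognition")
    info["SpeechRecognition"] = {c.tag: c.text for c in speech} if speech is not None else {}
    return info
```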

Error codes

There are no special error messages for this request. For common error messages, see Error Codes.
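The common error format is defined in Error Codes. Assuming, as in the COS XML API, that a failed request returns an <Error> element with Code, Message, and RequestId children, a caller could summarize failures as follows. The helper name is hypothetical, and the element names should be confirmed against the Error Codes document.

```python
import xml.etree.ElementTree as ET


def describe_error(status: int, body: bytes) -> str:
    """Summarize a response; assumes the <Error><Code/><Message/><RequestId/> layout."""
    if 200 <= status < 300:
        return "ok"
    try:
        root = ET.fromstring(body)
    except ET.ParseError:
        return f"HTTP {status}: non-XML error body"
    code = root.findtext("Code", default="?")
    message = root.findtext("Message", default="")
    request_id = root.findtext("RequestId", default="")
    return f"HTTP {status} {code}: {message} (request {request_id})"
```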

Samples

Request

POST /template HTTP/1.1
Authorization: q-sign-algorithm=sha1&q-ak=AKIDZfbOAo7cllgPvF9cXFrJD0a1ICvR****&q-sign-time=1497530202;1497610202&q-key-time=1497530202;1497610202&q-header-list=&q-url-param-list=&q-signature=28e9a4986df11bed0255e97ff90500557e0e****
Host: test-1234567890.ci.ap-chongqing.myqcloud.com
Content-Length: 1666
Content-Type: application/xml

<Request>
<Tag>SpeechRecognition</Tag>
<Name>TemplateName</Name>
<SpeechRecognition>
<EngineModelType>16k_zh</EngineModelType>
<ResTextFormat>1</ResTextFormat>
<FilterDirty>0</FilterDirty>
<FilterModal>1</FilterModal>
<ConvertNumMode>0</ConvertNumMode>
<SpeakerDiarization>1</SpeakerDiarization>
<SpeakerNumber>0</SpeakerNumber>
<FilterPunc>0</FilterPunc>
<OutputFileType>txt</OutputFileType>
</SpeechRecognition>
</Request>
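The sample request above can be assembled with Python's urllib.request, as a sketch. It assumes HTTPS (the sample does not show a scheme), takes an Authorization value computed as described in Request Signature, and lets urllib derive Host and Content-Length from the URL and body. The function name is hypothetical, and the call that actually sends the request is left commented out.

```python
import urllib.request


def build_create_request(bucket: str, region: str, body: bytes, auth: str) -> urllib.request.Request:
    """Build (but do not send) a POST /template request (hypothetical helper)."""
    # Host follows the <BucketName-APPID>.ci.<Region>.myqcloud.com pattern; HTTPS is assumed.
    url = f"https://{bucket}.ci.{region}.myqcloud.com/template"
    # urllib adds Host and Content-Length itself when the request is sent.
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={"Content-Type": "application/xml", "Authorization": auth},
    )


# Placeholder body and Auth String; replace both with real values before sending.
req = build_create_request("test-1234567890", "ap-chongqing", b"<Request></Request>", "<Auth String>")
# with urllib.request.urlopen(req) as resp:
#     print(resp.status, resp.read().decode())
```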

Response

HTTP/1.1 200 OK
Content-Type: application/xml
Content-Length: 100
Connection: keep-alive
Date: Thu, 14 Jul 2022 12:37:29 GMT
Server: tencent-ci
x-ci-request-id: NTk0MjdmODlfMjQ4OGY3XzYzYzhf****

<Response>
<Template>
<TemplateId>t1460606b9752148c4ab182f55163ba7cd</TemplateId>
<Name>TemplateName</Name>
<State>Normal</State>
<Tag>SpeechRecognition</Tag>
<CreateTime>2020-08-05T11:35:24+0800</CreateTime>
<UpdateTime>2020-08-31T16:15:20+0800</UpdateTime>
<BucketId>test-1234567890</BucketId>
<Category>Custom</Category>
<SpeechRecognition>
<EngineModelType>16k_zh</EngineModelType>
<ChannelNum>1</ChannelNum>
<ResTextFormat>0</ResTextFormat>
<FilterDirty>1</FilterDirty>
<FilterModal>0</FilterModal>
<ConvertNumMode>1</ConvertNumMode>
<SpeakerDiarization>0</SpeakerDiarization>
<SpeakerNumber>0</SpeakerNumber>
<FilterPunc>0</FilterPunc>
</SpeechRecognition>
</Template>
</Response>
