Real-Time Speech Recognition
Last updated: 2022-07-25 10:03:59

Connection Preparations

SDK acquisition

The real-time speech recognition SDK and demo for iOS can be downloaded here.

Notes on connection

Before calling the API, read the API description of real-time speech recognition to understand its requirements and how it is used.
The API requires the phone to have an internet connection (GPRS, 3G, Wi-Fi, etc.) and requires iOS 9.0 or later.

Development environment

Add the following settings to the project's Info.plist file:
Set NSAppTransportSecurity policies and add the following content:
<key>NSAppTransportSecurity</key>
<dict>
    <key>NSExceptionDomains</key>
    <dict>
        <key>qcloud.com</key>
        <dict>
            <key>NSExceptionAllowsInsecureHTTPLoads</key>
            <true/>
            <key>NSExceptionMinimumTLSVersion</key>
            <string>TLSv1.2</string>
            <key>NSIncludesSubdomains</key>
            <true/>
            <key>NSRequiresCertificateTransparency</key>
            <false/>
        </dict>
    </dict>
</dict>
Request the system's microphone permission by adding the following content:
<key>NSMicrophoneUsageDescription</key>
<string>Your microphone is required to capture audio</string>
Add dependent libraries in the project and add the following libraries in Build Phases' Link Binary With Libraries:
AVFoundation.framework
AudioToolbox.framework
QCloudSDK.framework
CoreTelephony.framework
libWXVoiceSpeex.a


Quick Connection

Connection process and demo

The following describes the connection process and provides demos for two cases: capturing audio for recognition with the built-in recorder, and providing your own audio data.

Demo for capturing audio for recognition with built-in recorder

1. Import the header file of QCloudSDK and change the filename extension from .m to .mm.
#import <QCloudSDK/QCloudSDK.h>
2. Create a QCloudConfig instance.
// Create a QCloudConfig instance
QCloudConfig *config = [[QCloudConfig alloc] initWithAppId:kQDAppId
                                                  secretId:kQDSecretId
                                                 secretKey:kQDSecretKey
                                                 projectId:kQDProjectId];
config.sliceTime = 600;                       // Voice slice duration: 600 ms
config.enableDetectVolume = YES;              // Whether to detect volume
config.endRecognizeWhenDetectSilence = YES;   // Stop recognition when silence is detected
3. Create a QCloudRealTimeRecognizer instance.
QCloudRealTimeRecognizer *recognizer = [[QCloudRealTimeRecognizer alloc] initWithConfig:config];
4. Set the delegate and implement the QCloudRealTimeRecognizerDelegate methods (a minimal sketch follows this list).
recognizer.delegate = self;
5. Start recognition.
[recognizer start];
6. End recognition.
[recognizer stop];
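The following is a minimal sketch of the delegate implementation referenced in step 4, based on the QCloudRealTimeRecognizerDelegate methods described later in this document. The logging statements are illustrative only; production code would typically update the UI or buffer the results instead.
// Minimal sketch of a QCloudRealTimeRecognizerDelegate implementation (illustrative only)
- (void)realTimeRecognizerOnSliceRecognize:(QCloudRealTimeRecognizer *)recognizer response:(QCloudRealTimeResponse *)response {
    // Called for each voice slice with the partial recognition result
    NSLog(@"Slice result: %@", response);
}

- (void)realTimeRecognizerDidFinish:(QCloudRealTimeRecognizer *)recognizer result:(NSString *)result {
    // Called once when a recognition session finishes successfully
    NSLog(@"Final result: %@", result);
}

- (void)realTimeRecognizerDidError:(QCloudRealTimeRecognizer *)recognizer error:(NSError *)error voiceId:(NSString * _Nullable)voiceId {
    // Called when a recognition session fails; voiceId is set if the error came from the backend
    NSLog(@"Recognition failed (voiceId: %@): %@", voiceId, error.localizedDescription);
}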

Sample for providing audio data

1. Import the header file of QCloudSDK and change the filename extension from .m to .mm.
#import <QCloudSDK/QCloudSDK.h>
2. Create a QCloudConfig instance.
// Create a QCloudConfig instance
QCloudConfig *config = [[QCloudConfig alloc] initWithAppId:kQDAppId
                                                  secretId:kQDSecretId
                                                 secretKey:kQDSecretKey
                                                 projectId:kQDProjectId];
config.sliceTime = 600;                       // Voice slice duration: 600 ms
config.enableDetectVolume = YES;              // Whether to detect volume
config.endRecognizeWhenDetectSilence = YES;   // Stop recognition when silence is detected
3. Customize QCloudDemoAudioDataSource and implement the QCloudAudioDataSource protocol (a sketch follows the protocol description later in this document).
QCloudDemoAudioDataSource *dataSource = [[QCloudDemoAudioDataSource alloc] init];
4. Create a QCloudRealTimeRecognizer instance.
QCloudRealTimeRecognizer *recognizer = [[QCloudRealTimeRecognizer alloc] initWithConfig:config dataSource:dataSource];
5. Set the delegate and implement the QCloudRealTimeRecognizerDelegate methods.
recognizer.delegate = self;
6. Start recognition.
[recognizer start];
7. End recognition.
[recognizer stop];

Descriptions of main API classes

QCloudRealTimeRecognizer initialization description

QCloudRealTimeRecognizer is the real-time speech recognition class, which provides two initialization methods.
/**
* Initialization method where the built-in recorder is used to capture audio
* @param config Configuration parameters, see QCloudConfig Definition
*/
- (instancetype)initWithConfig:(QCloudConfig *)config;

/**
* Initialization method where the caller provides the voice data
* @param config Configuration parameters, see QCloudConfig Definition
* @param dataSource Voice data source which must implement QCloudAudioDataSource protocol
*/
- (instancetype)initWithConfig:(QCloudConfig *)config dataSource:(id<QCloudAudioDataSource>)dataSource;

QCloudConfig initialization method description

/**
* Initialization method - direct authentication
* @param appid Tencent Cloud `appId`
* @param secretId Tencent Cloud `secretId`
* @param secretKey Tencent Cloud `secretKey`
* @param projectId Tencent Cloud `projectId`
*/
- (instancetype)initWithAppId:(NSString *)appid
secretId:(NSString *)secretId
secretKey:(NSString *)secretKey
projectId:(NSString *)projectId;

/**
* Initialization method - authentication through STS temporary credentials
* @param appid Tencent Cloud `appId`
* @param secretId Tencent Cloud temporary `secretId`
* @param secretKey Tencent Cloud temporary `secretKey`
* @param token Corresponding `token`
*/
- (instancetype)initWithAppId:(NSString *)appid
secretId:(NSString *)secretId
secretKey:(NSString *)secretKey
token:(NSString *)token;
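For reference, a minimal sketch of creating a QCloudConfig with STS temporary credentials might look as follows. The tmpSecretId, tmpSecretKey, and tmpToken values are placeholders for the temporary credentials issued by your own server; they are not part of the SDK.
// Sketch: authentication with STS temporary credentials (placeholder values)
QCloudConfig *config = [[QCloudConfig alloc] initWithAppId:kQDAppId
                                                  secretId:tmpSecretId    // temporary secretId from your server
                                                 secretKey:tmpSecretKey   // temporary secretKey from your server
                                                     token:tmpToken];     // token issued with the temporary credentials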


QCloudRealTimeRecognizerDelegate method description

/**
* Real-time recording recognition is divided into multiple flows. Each flow can be understood as a sentence, and a recognition session can include multiple sentences.
* Each flow contains multiple voice data packets (seq). The seq of each flow starts from 0.
*/
@protocol QCloudRealTimeRecognizerDelegate <NSObject>

@required
/**
* Fragmented recognition result of each voice packet
* @param response Recognition result of the voice fragment
*/
- (void)realTimeRecognizerOnSliceRecognize:(QCloudRealTimeRecognizer *)recognizer response:(QCloudRealTimeResponse *)response;

@optional
/**
* Callback for a successful single recognition
* @param recognizer Real-time ASR instance
* @param result Total text from a single recognition
*/
- (void)realTimeRecognizerDidFinish:(QCloudRealTimeRecognizer *)recognizer result:(NSString *)result;
/**
* Callback for a failed single recognition
* @param recognizer Real-time ASR instance
* @param error Error message
* @param voiceId The corresponding voiceId, included if the error was returned from the backend
*/
- (void)realTimeRecognizerDidError:(QCloudRealTimeRecognizer *)recognizer error:(NSError *)error voiceId:(NSString * _Nullable)voiceId;

//////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////
/**
* Callback for recording start
* @param recognizer Real-time ASR instance
* @param error Error message if recording failed to start
*/
- (void)realTimeRecognizerDidStartRecord:(QCloudRealTimeRecognizer *)recognizer error:(NSError *)error;
/**
* Callback for recording end
* @param recognizer Real-time ASR instance
*/
- (void)realTimeRecognizerDidStopRecord:(QCloudRealTimeRecognizer *)recognizer;
/**
* Real-time callback for the recording volume
* @param recognizer Real-time ASR instance
* @param volume Audio volume level in the range of -40 to 0
*/
- (void)realTimeRecognizerDidUpdateVolume:(QCloudRealTimeRecognizer *)recognizer volume:(float)volume;


//////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////
/**
* Callback when recognition of a voice flow starts
* @param recognizer Real-time ASR instance
* @param voiceId The voiceId corresponding to the voice stream, a unique identifier
* @param seq The sequence number of the flow
*/
- (void)realTimeRecognizerOnFlowRecognizeStart:(QCloudRealTimeRecognizer *)recognizer voiceId:(NSString *)voiceId seq:(NSInteger)seq;
/**
* Callback when recognition of a voice flow ends
* @param recognizer Real-time ASR instance
* @param voiceId The voiceId corresponding to the voice stream, a unique identifier
* @param seq The sequence number of the flow
*/
- (void)realTimeRecognizerOnFlowRecognizeEnd:(QCloudRealTimeRecognizer *)recognizer voiceId:(NSString *)voiceId seq:(NSInteger)seq;

//////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////
/**
* Voice stream recognition started
* @param recognizer Real-time ASR instance
* @param voiceId The voiceId corresponding to the voice stream, a unique identifier
* @param seq The sequence number of the flow
*/
- (void)realTimeRecognizerOnFlowStart:(QCloudRealTimeRecognizer *)recognizer voiceId:(NSString *)voiceId seq:(NSInteger)seq;
/**
* Voice stream recognition ended
* @param recognizer Real-time ASR instance
* @param voiceId The voiceId corresponding to the voice stream, a unique identifier
* @param seq The sequence number of the flow
*/
- (void)realTimeRecognizerOnFlowEnd:(QCloudRealTimeRecognizer *)recognizer voiceId:(NSString *)voiceId seq:(NSInteger)seq;

@end


QCloudAudioDataSource protocol description

If you provide your own audio data instead of capturing it with the SDK's built-in recorder, you need to implement all methods in this protocol, as QDAudioDataSource does in the demo project.
/**
* Voice data source. If you need to provide your own voice data, implement all methods in this protocol
* Provide voice data that meets the following requirements:
* Sampling rate: 16k
* Audio format: PCM
* Encoding: 16-bit single channel
*/
@protocol QCloudAudioDataSource <NSObject>

@required

/**
* Indicates whether the data source is working. Set to YES after executing start, set to NO after executing stop
*/
@property (nonatomic, assign) BOOL running;

/**
* The SDK will call the start method. Classes implementing this protocol need to initialize the data source.
*/
- (void)start:(void(^)(BOOL didStart, NSError *error))completion;
/**
* The SDK will call the stop method. Classes implementing this protocol need to stop providing data
*/
- (void)stop;
/**
* The SDK will call this method on the object implementing this protocol to read voice data. If there is not enough voice data for the expected length, it should return nil.
* @param expectLength The expected number of bytes to read. If the returned NSData is less than the expected length, the SDK will throw an exception.
*/
- (nullable NSData *)readData:(NSInteger)expectLength;

@end
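As an illustration, a custom data source that feeds 16 kHz, 16-bit mono PCM from a bundled file might be sketched as follows. The class name and file are hypothetical; refer to QDAudioDataSource in the demo project for the reference implementation.
// Hypothetical example of a QCloudAudioDataSource implementation backed by a PCM file
@interface PCMFileDataSource : NSObject <QCloudAudioDataSource>
@property (nonatomic, assign) BOOL running;
@property (nonatomic, strong) NSInputStream *stream;
@end

@implementation PCMFileDataSource

- (void)start:(void (^)(BOOL didStart, NSError *error))completion {
    // Open the PCM file (the file name is a placeholder) and mark the source as running
    NSString *path = [[NSBundle mainBundle] pathForResource:@"sample" ofType:@"pcm"];
    self.stream = [NSInputStream inputStreamWithFileAtPath:path];
    [self.stream open];
    self.running = YES;
    if (completion) {
        completion(YES, nil);
    }
}

- (void)stop {
    // Stop providing data and release the stream
    self.running = NO;
    [self.stream close];
    self.stream = nil;
}

- (nullable NSData *)readData:(NSInteger)expectLength {
    // Return exactly expectLength bytes, or nil if not enough data is available
    if (!self.running) {
        return nil;
    }
    NSMutableData *data = [NSMutableData dataWithLength:expectLength];
    NSInteger bytesRead = [self.stream read:(uint8_t *)data.mutableBytes maxLength:(NSUInteger)expectLength];
    if (bytesRead < expectLength) {
        return nil;
    }
    return data;
}

@end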