
Real-Time Speech Recognition

Last updated: 2022-07-25 10:03:59

    Connection Preparations

    SDK acquisition

    The real-time speech recognition SDK and demo for iOS can be downloaded here.

    Notes on connection

Before calling the API, read the API description of real-time speech recognition to understand its requirements and usage.
The API requires an internet connection (over GPRS, 3G, Wi-Fi, etc.) and iOS 9.0 or later.

    Development environment

Add the following settings to the project's Info.plist file:
Set the NSAppTransportSecurity policy by adding the following content:
<key>NSAppTransportSecurity</key>
<dict>
    <key>NSExceptionDomains</key>
    <dict>
        <key>qcloud.com</key>
        <dict>
            <key>NSExceptionAllowsInsecureHTTPLoads</key>
            <true/>
            <key>NSExceptionMinimumTLSVersion</key>
            <string>TLSv1.2</string>
            <key>NSIncludesSubdomains</key>
            <true/>
            <key>NSRequiresCertificateTransparency</key>
            <false/>
        </dict>
    </dict>
</dict>
Request the system's microphone permission by adding the following content:
<key>NSMicrophoneUsageDescription</key>
<string>Your mic is required to capture audio</string>
Add the dependent libraries to the project under Build Phases > Link Binary With Libraries:
    AVFoundation.framework
    AudioToolbox.framework
    QCloudSDK.framework
    CoreTelephony.framework
    libWXVoiceSpeex.a

    Quick Connection

    Connection process and demo

The following describes the connection process and provides demos for two scenarios: capturing audio for recognition with the built-in recorder, and providing your own audio data.

Demo for capturing audio for recognition with the built-in recorder

    1. Import the header file of QCloudSDK and change the filename extension from .m to .mm.
#import <QCloudSDK/QCloudSDK.h>
    2. Create a QCloudConfig instance.
// Create a QCloudConfig instance
QCloudConfig *config = [[QCloudConfig alloc] initWithAppId:kQDAppId
                                                  secretId:kQDSecretId
                                                 secretKey:kQDSecretKey
                                                 projectId:kQDProjectId];
config.sliceTime = 600;                     // Audio slice duration: 600 ms
config.enableDetectVolume = YES;            // Whether to detect the volume
config.endRecognizeWhenDetectSilence = YES; // Stop recognition when silence is detected
    3. Create a QCloudRealTimeRecognizer instance.
    QCloudRealTimeRecognizer *recognizer = [[QCloudRealTimeRecognizer alloc] initWithConfig:config];
4. Set the delegate and implement the QCloudRealTimeRecognizerDelegate methods.
recognizer.delegate = self;
5. Start recognition.
[recognizer start];
6. End recognition.
[recognizer stop];
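Putting these steps together, the following is a minimal sketch of a view controller that drives the built-in recorder. It is a sketch only: the kQD* credential constants are assumed to be defined as in the demo project, and the NSLog calls stand in for real UI updates.
#import <UIKit/UIKit.h>
#import <QCloudSDK/QCloudSDK.h>

@interface QDRecognizeViewController : UIViewController <QCloudRealTimeRecognizerDelegate>
@property (nonatomic, strong) QCloudRealTimeRecognizer *recognizer;
@end

@implementation QDRecognizeViewController

- (void)viewDidLoad {
    [super viewDidLoad];
    // Assumes the kQD* constants hold your credentials, as in the demo project.
    QCloudConfig *config = [[QCloudConfig alloc] initWithAppId:kQDAppId
                                                      secretId:kQDSecretId
                                                     secretKey:kQDSecretKey
                                                     projectId:kQDProjectId];
    config.endRecognizeWhenDetectSilence = YES;
    self.recognizer = [[QCloudRealTimeRecognizer alloc] initWithConfig:config];
    self.recognizer.delegate = self;
}

// Hook these up to buttons to start and stop a recognition session.
- (void)startTapped { [self.recognizer start]; }
- (void)stopTapped  { [self.recognizer stop]; }

// Required: partial result for each voice packet.
- (void)realTimeRecognizerOnSliceRecognize:(QCloudRealTimeRecognizer *)recognizer
                                  response:(QCloudRealTimeResponse *)response {
    NSLog(@"slice result: %@", response);
}

// Optional: full text of a finished recognition.
- (void)realTimeRecognizerDidFinish:(QCloudRealTimeRecognizer *)recognizer
                             result:(NSString *)result {
    NSLog(@"final result: %@", result);
}

// Optional: a recognition failed.
- (void)realTimeRecognizerDidError:(QCloudRealTimeRecognizer *)recognizer
                             error:(NSError *)error
                           voiceId:(NSString * _Nullable)voiceId {
    NSLog(@"error: %@, voiceId: %@", error, voiceId);
}

@end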

    Sample for providing audio data

    1. Import the header file of QCloudSDK and change the filename extension from .m to .mm.
#import <QCloudSDK/QCloudSDK.h>
    2. Create a QCloudConfig instance.
// Create a QCloudConfig instance
QCloudConfig *config = [[QCloudConfig alloc] initWithAppId:kQDAppId
                                                  secretId:kQDSecretId
                                                 secretKey:kQDSecretKey
                                                 projectId:kQDProjectId];
config.sliceTime = 600;                     // Audio slice duration: 600 ms
config.enableDetectVolume = YES;            // Whether to detect the volume
config.endRecognizeWhenDetectSilence = YES; // Stop recognition when silence is detected
3. Create a custom QCloudDemoAudioDataSource class that implements the QCloudAudioDataSource protocol.
    QCloudDemoAudioDataSource *dataSource = [[QCloudDemoAudioDataSource alloc] init];
    4. Create a QCloudRealTimeRecognizer instance.
    QCloudRealTimeRecognizer *recognizer = [[QCloudRealTimeRecognizer alloc] initWithConfig:config dataSource:dataSource];
5. Set the delegate and implement the QCloudRealTimeRecognizerDelegate methods.
recognizer.delegate = self;
6. Start recognition.
[recognizer start];
7. End recognition.
[recognizer stop];
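As a sketch of how these steps fit together, the snippet below wires up the QCloudDemoAudioDataSource from the demo project; a possible implementation of the protocol itself is sketched in the QCloudAudioDataSource section further down.
// Sketch: recognize speech from caller-supplied audio data.
QCloudConfig *config = [[QCloudConfig alloc] initWithAppId:kQDAppId
                                                  secretId:kQDSecretId
                                                 secretKey:kQDSecretKey
                                                 projectId:kQDProjectId];
// The data source must supply 16 kHz, 16-bit mono PCM (see the protocol description below).
QCloudDemoAudioDataSource *dataSource = [[QCloudDemoAudioDataSource alloc] init];
QCloudRealTimeRecognizer *recognizer = [[QCloudRealTimeRecognizer alloc] initWithConfig:config
                                                                             dataSource:dataSource];
recognizer.delegate = self; // self implements QCloudRealTimeRecognizerDelegate
[recognizer start];         // the SDK calls the data source's start and readData methods
// ... once the audio has been fully delivered:
[recognizer stop];          // the SDK calls the data source's stop method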

    Descriptions of main API classes

    QCloudRealTimeRecognizer initialization description

    QCloudRealTimeRecognizer is the real-time speech recognition class, which provides two initialization methods.
/**
 * Initialization method where the built-in recorder is used to capture audio
 * @param config Configuration parameters; see the QCloudConfig definition
 */
- (instancetype)initWithConfig:(QCloudConfig *)config;

/**
 * Initialization method where the caller provides the voice data
 * @param config Configuration parameters; see the QCloudConfig definition
 * @param dataSource Voice data source, which must implement the QCloudAudioDataSource protocol
 */
- (instancetype)initWithConfig:(QCloudConfig *)config dataSource:(id<QCloudAudioDataSource>)dataSource;

    QCloudConfig initialization method description

/**
 * Initialization method - direct authentication
 * @param appid Tencent Cloud `appId`
 * @param secretId Tencent Cloud `secretId`
 * @param secretKey Tencent Cloud `secretKey`
 * @param projectId Tencent Cloud `projectId`
 */
- (instancetype)initWithAppId:(NSString *)appid
                     secretId:(NSString *)secretId
                    secretKey:(NSString *)secretKey
                    projectId:(NSString *)projectId;

/**
 * Initialization method - authentication through STS temporary credentials
 * @param appid Tencent Cloud `appId`
 * @param secretId Tencent Cloud temporary `secretId`
 * @param secretKey Tencent Cloud temporary `secretKey`
 * @param token Corresponding `token`
 */
- (instancetype)initWithAppId:(NSString *)appid
                     secretId:(NSString *)secretId
                    secretKey:(NSString *)secretKey
                        token:(NSString *)token;
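For the temporary-credential variant, your app would typically fetch credentials from your own server, which requests them from STS. A sketch follows; QDTemporaryCredential and fetchTemporaryCredential are hypothetical placeholders for your own credential-fetching code.
// Hypothetical helper that fetches temporary credentials (secretId,
// secretKey, token) from your own server, which calls Tencent Cloud STS.
QDTemporaryCredential *cred = [QDCredentialProvider fetchTemporaryCredential];
QCloudConfig *config = [[QCloudConfig alloc] initWithAppId:kQDAppId
                                                  secretId:cred.secretId
                                                 secretKey:cred.secretKey
                                                     token:cred.token];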
    

    QCloudRealTimeRecognizerDelegate method description

    /**
    * Real-time recording recognition is divided into multiple flows. Each flow can be understood as a sentence. A recognition session can include multiple sentences.
    * Each flow contains multiple seq voice data packets. Each flow's seq starts from 0
    */
    @protocol QCloudRealTimeRecognizerDelegate <NSObject>
    
    @required
    /**
    * Fragmented recognition result of each voice packet
    * @param response Recognition result of the voice fragment
    */
    - (void)realTimeRecognizerOnSliceRecognize:(QCloudRealTimeRecognizer *)recognizer response:(QCloudRealTimeResponse *)response;
    
    @optional
/**
 * Callback for a successful single recognition
 * @param recognizer Real-time ASR instance
 * @param result Full text from a single recognition
 */
- (void)realTimeRecognizerDidFinish:(QCloudRealTimeRecognizer *)recognizer result:(NSString *)result;
/**
 * Callback for a failed single recognition
 * @param recognizer Real-time ASR instance
 * @param error Error message
 * @param voiceId The `voiceId` is included if the error is returned from the backend
 */
- (void)realTimeRecognizerDidError:(QCloudRealTimeRecognizer *)recognizer error:(NSError *)error voiceId:(NSString * _Nullable)voiceId;
    
    //////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////
/**
 * Callback for recording start
 * @param recognizer Real-time ASR instance
 * @param error The error message if recording failed to start
 */
- (void)realTimeRecognizerDidStartRecord:(QCloudRealTimeRecognizer *)recognizer error:(NSError *)error;
    /**
    * Callback for recording end
    * @param recognizer Real-time ASR instance
    */
    - (void)realTimeRecognizerDidStopRecord:(QCloudRealTimeRecognizer *)recognizer;
    /**
* Real-time callback for recording volume
    * @param recognizer Real-time ASR instance
    * @param volume Audio volume level in the range of -40 to 0
    */
    - (void)realTimeRecognizerDidUpdateVolume:(QCloudRealTimeRecognizer *)recognizer volume:(float)volume;
    
    
    //////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////
    /**
    * Start recognition of the voice stream
    * @param recognizer Real-time ASR instance
    * @param voiceId The voiceId corresponding to the voice stream, a unique identifier
    * @param seq The sequence number of the flow
    */
    - (void)realTimeRecognizerOnFlowRecognizeStart:(QCloudRealTimeRecognizer *)recognizer voiceId:(NSString *)voiceId seq:(NSInteger)seq;
    /**
    * End recognition of the voice stream
    * @param recognizer Real-time ASR instance
    * @param voiceId The voiceId corresponding to the voice stream, a unique identifier
    * @param seq The sequence number of the flow
    */
    - (void)realTimeRecognizerOnFlowRecognizeEnd:(QCloudRealTimeRecognizer *)recognizer voiceId:(NSString *)voiceId seq:(NSInteger)seq;
    
    //////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////
    /**
    * Voice stream recognition started
    * @param recognizer Real-time ASR instance
    * @param voiceId The voiceId corresponding to the voice stream, a unique identifier
    * @param seq The sequence number of the flow
    */
    - (void)realTimeRecognizerOnFlowStart:(QCloudRealTimeRecognizer *)recognizer voiceId:(NSString *)voiceId seq:(NSInteger)seq;
    /**
    * Voice stream recognition ended
    * @param recognizer Real-time ASR instance
    * @param voiceId The voiceId corresponding to the voice stream, a unique identifier
    * @param seq The sequence number of the flow
    */
    - (void)realTimeRecognizerOnFlowEnd:(QCloudRealTimeRecognizer *)recognizer voiceId:(NSString *)voiceId seq:(NSInteger)seq;
    
    @end
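To make the flow/seq model concrete, below is a sketch of the optional flow callbacks that simply logs sentence boundaries; per the protocol comment above, each flow corresponds to one sentence within a recognition session.
// Sketch: log when each flow (sentence) of the session starts and ends.
- (void)realTimeRecognizerOnFlowStart:(QCloudRealTimeRecognizer *)recognizer
                              voiceId:(NSString *)voiceId
                                  seq:(NSInteger)seq {
    NSLog(@"flow %ld started, voiceId=%@", (long)seq, voiceId);
}

- (void)realTimeRecognizerOnFlowEnd:(QCloudRealTimeRecognizer *)recognizer
                            voiceId:(NSString *)voiceId
                                seq:(NSInteger)seq {
    NSLog(@"flow %ld ended, voiceId=%@", (long)seq, voiceId);
}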
    

    QCloudAudioDataSource protocol description

If you provide your own audio data instead of capturing it with the SDK's built-in recorder, you need to implement all methods in this protocol, in the same way as the QDAudioDataSource implementation in the demo project.
/**
 * Voice data source. If you need to provide your own voice data, implement all methods in this protocol.
 * The voice data must meet the following requirements:
 * Sample rate: 16 kHz
 * Audio format: PCM
 * Encoding: 16-bit, mono
 */
    @protocol QCloudAudioDataSource <NSObject>
    
    @required
    
    /**
    * Indicates whether the data source is working. Set to YES after executing start, set to NO after executing stop
    */
    @property (nonatomic, assign) BOOL running;
    
    /**
    * The SDK will call the start method. Classes implementing this protocol need to initialize the data source.
    */
    - (void)start:(void(^)(BOOL didStart, NSError *error))completion;
    /**
    * The SDK will call the stop method. Classes implementing this protocol need to stop providing data
    */
    - (void)stop;
    /**
    * The SDK will call this method on the object implementing this protocol to read voice data. If there is not enough voice data for the expected length, it should return nil.
    * @param expectLength The expected number of bytes to read. If the returned NSData is less than the expected length, the SDK will throw an exception.
    */
    - (nullable NSData *)readData:(NSInteger)expectLength;
    
    @end
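As an illustration, a minimal file-backed implementation might look like the following. QDFileAudioDataSource and the test.pcm resource are hypothetical names used for this sketch; the QDAudioDataSource class in the demo project is the authoritative reference.
#import <QCloudSDK/QCloudSDK.h>

// Sketch: a data source that feeds 16 kHz, 16-bit mono PCM from a bundled file.
@interface QDFileAudioDataSource : NSObject <QCloudAudioDataSource>
@property (nonatomic, assign) BOOL running;
@property (nonatomic, strong) NSFileHandle *fileHandle;
@end

@implementation QDFileAudioDataSource

- (void)start:(void (^)(BOOL didStart, NSError *error))completion {
    NSString *path = [[NSBundle mainBundle] pathForResource:@"test" ofType:@"pcm"];
    self.fileHandle = [NSFileHandle fileHandleForReadingAtPath:path];
    self.running = (self.fileHandle != nil);
    if (completion) {
        completion(self.running, self.running ? nil :
                   [NSError errorWithDomain:@"QDFileAudioDataSource" code:-1 userInfo:nil]);
    }
}

- (void)stop {
    self.running = NO;
    [self.fileHandle closeFile];
    self.fileHandle = nil;
}

- (nullable NSData *)readData:(NSInteger)expectLength {
    if (!self.running) {
        return nil;
    }
    NSData *data = [self.fileHandle readDataOfLength:(NSUInteger)expectLength];
    // Per the protocol contract, return nil when fewer bytes than expected remain.
    if (data.length < (NSUInteger)expectLength) {
        return nil;
    }
    return data;
}

@end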