
Real-Time Speech Recognition

Last updated: 2022-07-25 10:03:59

    Connection Preparations

    SDK acquisition

    The real-time speech recognition SDK and demo for iOS can be downloaded here.

    Notes on connection

Before calling the API, read the API description of real-time speech recognition to understand its requirements and usage.
The API requires an internet connection (over GPRS, 3G, Wi-Fi, etc.) and iOS 9.0 or later.

    Development environment

Add the following settings to the project's Info.plist file:
Set the NSAppTransportSecurity policy by adding the following content:
<key>NSAppTransportSecurity</key>
<dict>
    <key>NSExceptionDomains</key>
    <dict>
        <key>qcloud.com</key>
        <dict>
            <key>NSExceptionAllowsInsecureHTTPLoads</key>
            <true/>
            <key>NSExceptionMinimumTLSVersion</key>
            <string>TLSv1.2</string>
            <key>NSIncludesSubdomains</key>
            <true/>
            <key>NSRequiresCertificateTransparency</key>
            <false/>
        </dict>
    </dict>
</dict>
Request the system's microphone permission by adding the following content:
<key>NSMicrophoneUsageDescription</key>
<string>Your mic is required to capture audio</string>
Add the dependent libraries to the project under Build Phases > Link Binary With Libraries:
    AVFoundation.framework
    AudioToolbox.framework
    QCloudSDK.framework
    CoreTelephony.framework
    libWXVoiceSpeex.a

    Quick Connection

    Connection process and demo

The following describes the connection process and provides demos for two scenarios: capturing audio for recognition with the built-in recorder, and providing your own audio data.

Demo for capturing audio for recognition with the built-in recorder

    1. Import the header file of QCloudSDK and change the filename extension from .m to .mm.
#import <QCloudSDK/QCloudSDK.h>
    2. Create a QCloudConfig instance.
// Create a QCloudConfig instance
QCloudConfig *config = [[QCloudConfig alloc] initWithAppId:kQDAppId
                                                  secretId:kQDSecretId
                                                 secretKey:kQDSecretKey
                                                 projectId:kQDProjectId];
config.sliceTime = 600;                     // Audio slice duration: 600 ms
config.enableDetectVolume = YES;            // Whether to detect the volume
config.endRecognizeWhenDetectSilence = YES; // Stop recognition when silence is detected
    3. Create a QCloudRealTimeRecognizer instance.
    QCloudRealTimeRecognizer *recognizer = [[QCloudRealTimeRecognizer alloc] initWithConfig:config];
4. Set the delegate and implement the QCloudRealTimeRecognizerDelegate methods.
recognizer.delegate = self;
5. Start recognition.
[recognizer start];
6. End recognition.
[recognizer stop];
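Putting these steps together, the following is a minimal sketch of a view controller that drives the built-in recorder. It is a sketch only: the kQD* credential constants are assumed to be defined as in the demo project, and the NSLog calls stand in for real UI updates.
#import <UIKit/UIKit.h>
#import <QCloudSDK/QCloudSDK.h>

@interface QDRecognizeViewController : UIViewController <QCloudRealTimeRecognizerDelegate>
@property (nonatomic, strong) QCloudRealTimeRecognizer *recognizer;
@end

@implementation QDRecognizeViewController

- (void)viewDidLoad {
    [super viewDidLoad];
    // Assumes the kQD* constants hold your credentials, as in the demo project.
    QCloudConfig *config = [[QCloudConfig alloc] initWithAppId:kQDAppId
                                                      secretId:kQDSecretId
                                                     secretKey:kQDSecretKey
                                                     projectId:kQDProjectId];
    config.endRecognizeWhenDetectSilence = YES;
    self.recognizer = [[QCloudRealTimeRecognizer alloc] initWithConfig:config];
    self.recognizer.delegate = self;
}

// Hook these up to buttons to start and stop a recognition session.
- (void)startTapped { [self.recognizer start]; }
- (void)stopTapped  { [self.recognizer stop]; }

// Required: partial result for each voice packet.
- (void)realTimeRecognizerOnSliceRecognize:(QCloudRealTimeRecognizer *)recognizer
                                  response:(QCloudRealTimeResponse *)response {
    NSLog(@"slice result: %@", response);
}

// Optional: full text of a finished recognition.
- (void)realTimeRecognizerDidFinish:(QCloudRealTimeRecognizer *)recognizer
                             result:(NSString *)result {
    NSLog(@"final result: %@", result);
}

// Optional: a recognition failed.
- (void)realTimeRecognizerDidError:(QCloudRealTimeRecognizer *)recognizer
                             error:(NSError *)error
                           voiceId:(NSString * _Nullable)voiceId {
    NSLog(@"error: %@, voiceId: %@", error, voiceId);
}

@end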

    Sample for providing audio data

    1. Import the header file of QCloudSDK and change the filename extension from .m to .mm.
#import <QCloudSDK/QCloudSDK.h>
    2. Create a QCloudConfig instance.
// Create a QCloudConfig instance
QCloudConfig *config = [[QCloudConfig alloc] initWithAppId:kQDAppId
                                                  secretId:kQDSecretId
                                                 secretKey:kQDSecretKey
                                                 projectId:kQDProjectId];
config.sliceTime = 600;                     // Audio slice duration: 600 ms
config.enableDetectVolume = YES;            // Whether to detect the volume
config.endRecognizeWhenDetectSilence = YES; // Stop recognition when silence is detected
3. Create a custom QCloudDemoAudioDataSource class that implements the QCloudAudioDataSource protocol.
    QCloudDemoAudioDataSource *dataSource = [[QCloudDemoAudioDataSource alloc] init];
    4. Create a QCloudRealTimeRecognizer instance.
    QCloudRealTimeRecognizer *recognizer = [[QCloudRealTimeRecognizer alloc] initWithConfig:config dataSource:dataSource];
5. Set the delegate and implement the QCloudRealTimeRecognizerDelegate methods.
recognizer.delegate = self;
6. Start recognition.
[recognizer start];
7. End recognition.
[recognizer stop];
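As a sketch of how these steps fit together, the snippet below wires up the QCloudDemoAudioDataSource from the demo project; a possible implementation of the protocol itself is sketched in the QCloudAudioDataSource section further down.
// Sketch: recognize speech from caller-supplied audio data.
QCloudConfig *config = [[QCloudConfig alloc] initWithAppId:kQDAppId
                                                  secretId:kQDSecretId
                                                 secretKey:kQDSecretKey
                                                 projectId:kQDProjectId];
// The data source must supply 16 kHz, 16-bit mono PCM (see the protocol description below).
QCloudDemoAudioDataSource *dataSource = [[QCloudDemoAudioDataSource alloc] init];
QCloudRealTimeRecognizer *recognizer = [[QCloudRealTimeRecognizer alloc] initWithConfig:config
                                                                             dataSource:dataSource];
recognizer.delegate = self; // self implements QCloudRealTimeRecognizerDelegate
[recognizer start];         // the SDK calls the data source's start and readData methods
// ... once the audio has been fully delivered:
[recognizer stop];          // the SDK calls the data source's stop method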

    Descriptions of main API classes

    QCloudRealTimeRecognizer initialization description

    QCloudRealTimeRecognizer is the real-time speech recognition class, which provides two initialization methods.
/**
 * Initialization method where the built-in recorder is used to capture audio
 * @param config Configuration parameters; see the QCloudConfig definition
 */
- (instancetype)initWithConfig:(QCloudConfig *)config;

/**
 * Initialization method where the caller provides the voice data
 * @param config Configuration parameters; see the QCloudConfig definition
 * @param dataSource Voice data source, which must implement the QCloudAudioDataSource protocol
 */
- (instancetype)initWithConfig:(QCloudConfig *)config dataSource:(id<QCloudAudioDataSource>)dataSource;

    QCloudConfig initialization method description

/**
 * Initialization method - direct authentication
 * @param appid Tencent Cloud `appId`
 * @param secretId Tencent Cloud `secretId`
 * @param secretKey Tencent Cloud `secretKey`
 * @param projectId Tencent Cloud `projectId`
 */
- (instancetype)initWithAppId:(NSString *)appid
                     secretId:(NSString *)secretId
                    secretKey:(NSString *)secretKey
                    projectId:(NSString *)projectId;

/**
 * Initialization method - authentication through STS temporary credentials
 * @param appid Tencent Cloud `appId`
 * @param secretId Tencent Cloud temporary `secretId`
 * @param secretKey Tencent Cloud temporary `secretKey`
 * @param token Corresponding `token`
 */
- (instancetype)initWithAppId:(NSString *)appid
                     secretId:(NSString *)secretId
                    secretKey:(NSString *)secretKey
                        token:(NSString *)token;
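For the temporary-credential variant, your app would typically fetch credentials from your own server, which requests them from STS. A sketch follows; QDTemporaryCredential and fetchTemporaryCredential are hypothetical placeholders for your own credential-fetching code.
// Hypothetical helper that fetches temporary credentials (secretId,
// secretKey, token) from your own server, which calls Tencent Cloud STS.
QDTemporaryCredential *cred = [QDCredentialProvider fetchTemporaryCredential];
QCloudConfig *config = [[QCloudConfig alloc] initWithAppId:kQDAppId
                                                  secretId:cred.secretId
                                                 secretKey:cred.secretKey
                                                     token:cred.token];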
    

    QCloudRealTimeRecognizerDelegate method description

    /**
    * Real-time recording recognition is divided into multiple flows. Each flow can be understood as a sentence. A recognition session can include multiple sentences.
    * Each flow contains multiple seq voice data packets. Each flow's seq starts from 0
    */
    @protocol QCloudRealTimeRecognizerDelegate <NSObject>
    
    @required
    /**
    * Fragmented recognition result of each voice packet
    * @param response Recognition result of the voice fragment
    */
    - (void)realTimeRecognizerOnSliceRecognize:(QCloudRealTimeRecognizer *)recognizer response:(QCloudRealTimeResponse *)response;
    
    @optional
/**
 * Callback for a successful single recognition
 * @param recognizer Real-time ASR instance
 * @param result Full text from a single recognition
 */
- (void)realTimeRecognizerDidFinish:(QCloudRealTimeRecognizer *)recognizer result:(NSString *)result;
/**
 * Callback for a failed single recognition
 * @param recognizer Real-time ASR instance
 * @param error Error message
 * @param voiceId The `voiceId` is included if the error is returned from the backend
 */
- (void)realTimeRecognizerDidError:(QCloudRealTimeRecognizer *)recognizer error:(NSError *)error voiceId:(NSString * _Nullable)voiceId;
    
    //////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////
/**
 * Callback for recording start
 * @param recognizer Real-time ASR instance
 * @param error The error message if recording failed to start
 */
- (void)realTimeRecognizerDidStartRecord:(QCloudRealTimeRecognizer *)recognizer error:(NSError *)error;
    /**
    * Callback for recording end
    * @param recognizer Real-time ASR instance
    */
    - (void)realTimeRecognizerDidStopRecord:(QCloudRealTimeRecognizer *)recognizer;
    /**
* Real-time callback for recording volume
    * @param recognizer Real-time ASR instance
    * @param volume Audio volume level in the range of -40 to 0
    */
    - (void)realTimeRecognizerDidUpdateVolume:(QCloudRealTimeRecognizer *)recognizer volume:(float)volume;
    
    
    //////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////
    /**
    * Start recognition of the voice stream
    * @param recognizer Real-time ASR instance
    * @param voiceId The voiceId corresponding to the voice stream, a unique identifier
    * @param seq The sequence number of the flow
    */
    - (void)realTimeRecognizerOnFlowRecognizeStart:(QCloudRealTimeRecognizer *)recognizer voiceId:(NSString *)voiceId seq:(NSInteger)seq;
    /**
    * End recognition of the voice stream
    * @param recognizer Real-time ASR instance
    * @param voiceId The voiceId corresponding to the voice stream, a unique identifier
    * @param seq The sequence number of the flow
    */
    - (void)realTimeRecognizerOnFlowRecognizeEnd:(QCloudRealTimeRecognizer *)recognizer voiceId:(NSString *)voiceId seq:(NSInteger)seq;
    
    //////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////
    /**
    * Voice stream recognition started
    * @param recognizer Real-time ASR instance
    * @param voiceId The voiceId corresponding to the voice stream, a unique identifier
    * @param seq The sequence number of the flow
    */
    - (void)realTimeRecognizerOnFlowStart:(QCloudRealTimeRecognizer *)recognizer voiceId:(NSString *)voiceId seq:(NSInteger)seq;
    /**
    * Voice stream recognition ended
    * @param recognizer Real-time ASR instance
    * @param voiceId The voiceId corresponding to the voice stream, a unique identifier
    * @param seq The sequence number of the flow
    */
    - (void)realTimeRecognizerOnFlowEnd:(QCloudRealTimeRecognizer *)recognizer voiceId:(NSString *)voiceId seq:(NSInteger)seq;
    
    @end
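To make the flow/seq model concrete, below is a sketch of the optional flow callbacks that simply logs sentence boundaries; per the protocol comment above, each flow corresponds to one sentence within a recognition session.
// Sketch: log when each flow (sentence) of the session starts and ends.
- (void)realTimeRecognizerOnFlowStart:(QCloudRealTimeRecognizer *)recognizer
                              voiceId:(NSString *)voiceId
                                  seq:(NSInteger)seq {
    NSLog(@"flow %ld started, voiceId=%@", (long)seq, voiceId);
}

- (void)realTimeRecognizerOnFlowEnd:(QCloudRealTimeRecognizer *)recognizer
                            voiceId:(NSString *)voiceId
                                seq:(NSInteger)seq {
    NSLog(@"flow %ld ended, voiceId=%@", (long)seq, voiceId);
}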
    

    QCloudAudioDataSource protocol description

If you provide your own audio data instead of capturing it with the SDK's built-in recorder, you need to implement all methods in this protocol, in the same way as the QDAudioDataSource implementation in the demo project.
/**
 * Voice data source. If you need to provide your own voice data, implement all methods in this protocol.
 * The voice data must meet the following requirements:
 * Sample rate: 16 kHz
 * Audio format: PCM
 * Encoding: 16-bit, mono
 */
    @protocol QCloudAudioDataSource <NSObject>
    
    @required
    
    /**
    * Indicates whether the data source is working. Set to YES after executing start, set to NO after executing stop
    */
    @property (nonatomic, assign) BOOL running;
    
    /**
    * The SDK will call the start method. Classes implementing this protocol need to initialize the data source.
    */
    - (void)start:(void(^)(BOOL didStart, NSError *error))completion;
    /**
    * The SDK will call the stop method. Classes implementing this protocol need to stop providing data
    */
    - (void)stop;
    /**
    * The SDK will call this method on the object implementing this protocol to read voice data. If there is not enough voice data for the expected length, it should return nil.
    * @param expectLength The expected number of bytes to read. If the returned NSData is less than the expected length, the SDK will throw an exception.
    */
    - (nullable NSData *)readData:(NSInteger)expectLength;
    
    @end
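As an illustration, a minimal file-backed implementation might look like the following. QDFileAudioDataSource and the test.pcm resource are hypothetical names used for this sketch; the QDAudioDataSource class in the demo project is the authoritative reference.
#import <QCloudSDK/QCloudSDK.h>

// Sketch: a data source that feeds 16 kHz, 16-bit mono PCM from a bundled file.
@interface QDFileAudioDataSource : NSObject <QCloudAudioDataSource>
@property (nonatomic, assign) BOOL running;
@property (nonatomic, strong) NSFileHandle *fileHandle;
@end

@implementation QDFileAudioDataSource

- (void)start:(void (^)(BOOL didStart, NSError *error))completion {
    NSString *path = [[NSBundle mainBundle] pathForResource:@"test" ofType:@"pcm"];
    self.fileHandle = [NSFileHandle fileHandleForReadingAtPath:path];
    self.running = (self.fileHandle != nil);
    if (completion) {
        completion(self.running, self.running ? nil :
                   [NSError errorWithDomain:@"QDFileAudioDataSource" code:-1 userInfo:nil]);
    }
}

- (void)stop {
    self.running = NO;
    [self.fileHandle closeFile];
    self.fileHandle = nil;
}

- (nullable NSData *)readData:(NSInteger)expectLength {
    if (!self.running) {
        return nil;
    }
    NSData *data = [self.fileHandle readDataOfLength:(NSUInteger)expectLength];
    // Per the protocol contract, return nil when fewer bytes than expected remain.
    if (data.length < (NSUInteger)expectLength) {
        return nil;
    }
    return data;
}

@end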