Scenario Solution

Last updated: 2024-07-25 16:45:57

    Scenario Introduction

    A voice chat room provides a virtual space for audio-only online social interaction. Typically, the room contains several seats where anchors and co-speakers can engage in voice conversations, while other listeners can join the room to listen in. The numbers of seats and listeners vary by room type. Tencent Cloud's Tencent Real-Time Communication (TRTC) supports up to 50 people chatting on the mic simultaneously, with smooth transitions between speaking and listening and a voice chat latency of less than 300 ms. It includes a variety of audio effects, such as voice changing, ambiance effects, and reverb, to enrich the chat experience. Combined with Instant Messaging (IM), it supports various forms of message interaction, such as public chat, private chat, group chat, likes, and gift sending, creating a lively and engaging chat experience.
    
    
    

    Implementation Scheme

    Implementing a complete voice chat room scenario usually involves several functional modules: Room Management, Seat Management, Audio Stream Management, On-Cloud Recording, etc. The key actions and feature points under each functional module are as follows:
    Room Management: Room list, create a room, enter a room, exit a room, and terminate a room.
    Seat Management: Become a speaker, invite a listener to speak, become a listener, remove a speaker, mute a seat, lock a seat, and move a seat.
    Audio Stream Management: Publishing/playback architecture solution and real-time stream subscription mode.
    On-Cloud Recording: TRTC On-Cloud Recording.
    The overall business architecture of the voice chat room is shown in the figure below. Room owners create voice chat rooms, and users can choose to join rooms that interest them. After entering a room, users can take a seat and engage in voice interaction with the other speakers. Because of compliance requirements, the voice content in the room also needs to be recorded and reviewed.
    
    
    

    Room Management

    The Room Management Module is primarily responsible for maintaining the room list and includes the following features:
    Create Room: After users log in to the business system, they can create a room. The room list needs to be updated after a room is created.
    Enter Room: Users can choose to enter an existing room. Upon entering, the current list of room members should be updated.
    Exit Room: Users can choose to exit the current room. Upon exiting, the current list of room members needs to be updated with a delete operation.
    Terminate Room: After all users exit the room, it needs to be terminated. Upon termination, the room list needs to be updated with a delete operation.
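The four operations above can be sketched as a small in-memory registry. This is only an illustration of how the room list and member list change on each operation; the names `Room`, `RoomRegistry`, and their methods are hypothetical and are not part of any Tencent Cloud API, and a real business backend would persist this state rather than keep it in memory.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Room:
    room_id: str
    owner_id: str
    members: set[str] = field(default_factory=set)


class RoomRegistry:
    """Hypothetical in-memory stand-in for the business backend's room list."""

    def __init__(self) -> None:
        self.rooms: dict[str, Room] = {}

    def create_room(self, room_id: str, owner_id: str) -> Room:
        # Creating a room adds an entry to the room list.
        if room_id in self.rooms:
            raise ValueError(f"room {room_id} already exists")
        room = Room(room_id, owner_id)
        self.rooms[room_id] = room
        return room

    def enter_room(self, room_id: str, user_id: str) -> None:
        # Entering a room updates that room's member list.
        self.rooms[room_id].members.add(user_id)

    def exit_room(self, room_id: str, user_id: str) -> None:
        # Exiting removes the member; the room is terminated once it is empty.
        room = self.rooms[room_id]
        room.members.discard(user_id)
        if not room.members:
            del self.rooms[room_id]

    def list_rooms(self) -> list[str]:
        return sorted(self.rooms)
```

In this sketch, termination is triggered by the last member leaving, which matches the rule stated above; a product that terminates the room as soon as the owner leaves would put that check in `exit_room` instead.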

    Scheme Architecture

    In the overall architecture of room management, there are primarily three major modules involved:
    Room Management: Mainly used for the maintenance and administration of the room list, such as synchronizing the properties and status of rooms. Features include querying the room list, entering/exiting rooms, and creating/terminating rooms.
    IM Group Management: This module is primarily used for managing room member lists, signaling transmission, and message interactions. For instance, it handles actions such as approving/rejecting a speaking request, inviting a listener to speak/removing a speaker, muting/unmuting a seat, and locking/unlocking a seat. These features are organized per group, and include creating groups, joining groups, exiting groups, and terminating groups.
    TRTC Room Management: This module is mainly used for the interaction and transmission of audio streams. For instance, it carries the voice and music sent by anchors and played back by audiences. It is organized per TRTC room. Features include entering/exiting TRTC rooms.
    
    
    

    Specific Implementation

    In room management, different user roles have different feature permissions and implementation processes. In voice chat rooms, there are mainly two roles: the room owner and the listeners. For a detailed description and differences of roles, see the table below:
    Room Owner
    Description: The room owner has the highest authority in the room and can create or terminate the room.
    Differences: The role must be an anchor. Creates or terminates rooms/IM groups/RTC rooms.
    Listeners
    Description: Participants in the room, who can also take the mic to become anchors.
    Differences: The role can be either audience or anchor. Enter/exit the room.

    Implementation Process

    Room Owner
    
    
    
    1. Obtain the room list.
    2. Create the corresponding room through the business API.
    3. Create an IM group.
    4. Enter the Room/IM Group/RTC Room, and interact with others.
    5. Exit the IM Group/RTC Room/Room.
    6. Terminate the IM Group.
    Listeners
    
    
    
    1. Obtain the room list.
    2. Enter the Room/IM Group/RTC Room, and interact with others.
    3. Exit the IM Group/RTC Room/Room.

    Seat Management

    In a voice chat room, the seats are usually ordered and limited in number. For example, a listener needs the room owner's approval before speaking, which keeps the conversation orderly. Generally, the number of seats in a room does not exceed 10. Seat management is mainly responsible for managing the number of seats in a room according to the business scenario, as well as the status of all current seats in the room. The main features of seat management include: becoming a speaker, inviting a listener to speak, becoming a listener, removing a speaker, muting a seat, locking a seat, and moving a seat.
    After users enter the room, they can only request to speak when there are idle seats available.
    After the room owner approves a user to become a speaker, the seat status needs to be changed to non-idle.
    After the user stops streaming and becomes a listener, the seat status needs to be reset.
    The room owner has the authority to lock the seat, invite a listener to speak, remove a speaker, mute a speaker, etc.
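The seat rules above amount to a small state table. The sketch below shows one way to model it: a seat is idle only when it is neither occupied nor locked, taking a seat marks it non-idle, and leaving it resets the seat. The names `Seat` and `SeatTable` are hypothetical; in a real deployment this state would be stored and synchronized through IM rather than held locally.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional


@dataclass
class Seat:
    user_id: Optional[str] = None
    locked: bool = False
    muted: bool = False

    @property
    def idle(self) -> bool:
        # A seat can be requested only if nobody holds it and it is not locked.
        return self.user_id is None and not self.locked


class SeatTable:
    """Hypothetical model of the seat list for one room."""

    def __init__(self, seat_count: int) -> None:
        self.seats = [Seat() for _ in range(seat_count)]

    def first_idle(self) -> Optional[int]:
        for index, seat in enumerate(self.seats):
            if seat.idle:
                return index
        return None

    def take_seat(self, user_id: str) -> int:
        """Call after the room owner approves the request to speak."""
        index = self.first_idle()
        if index is None:
            raise RuntimeError("no idle seat available")
        self.seats[index].user_id = user_id
        return index

    def leave_seat(self, user_id: str) -> None:
        """Reset the seat when the speaker becomes a listener or is removed."""
        for seat in self.seats:
            if seat.user_id == user_id:
                seat.user_id = None
                seat.muted = False

    def lock(self, index: int) -> None:
        if self.seats[index].user_id is not None:
            raise ValueError("remove the speaker before locking the seat")
        self.seats[index].locked = True

    def speakers(self) -> set[str]:
        return {s.user_id for s in self.seats if s.user_id is not None}
```

Refusing to lock an occupied seat is a design choice made for this sketch, not a rule stated in the text; a product could equally allow locking and removing the speaker in a single step.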

    Scheme Architecture

    The seat management architecture integrates Tencent Real-Time Communication (TRTC) and Instant Messaging (IM). In the seat management architecture, the room owner has the highest authority and can invite a listener to speak, remove a speaker, mute/unmute a seat, and lock/unlock a seat. Listeners can request to speak, become speakers, and interact with other speakers in the room.
    
    
    

    Specific Implementation

    In seat management, different user roles have different feature permissions and implementation processes, primarily involving two roles: the room owner and the listeners. For details on role descriptions and their differences, see the table below:
    Room Owner
    Description: The user with the highest authority over seats. The room owner is responsible for managing all seats. When the room owner exits the room, all seats are automatically dissolved.
    Differences: The role must be an anchor. Enters the room and becomes a speaker; approves or rejects speaking requests; invites a listener to speak/removes a speaker; mutes/unmutes a seat; locks/unlocks a seat.
    Listeners
    Description: Room seat participants who engage in voice interactions.
    Differences: The role can be either audience or anchor. Request to speak/become a listener.

    Implementation Process

    Room Owner
    
    
    
    1. Anchors enter the Room Lobby and obtain the room list.
    2. Anchors create and enter the room as room owners.
    3. The room owner obtains the seat list based on group attributes and becomes a speaker.
    4. Listeners become speakers. After becoming speakers, they can interact with other speakers. There are two ways for listeners to become speakers: they can either request to speak actively, which the room owner approves, or the room owner can invite them to speak, which they accept.
    5. Speakers become listeners. There are two ways to become listeners: they can become listeners actively, or the room owner can forcibly remove them.
    6. The room owner exits and terminates the room (the room is dissolved, and all speakers are forcibly removed and exit the room).
    Listeners
    
    
    
    1. Listeners enter the Room Lobby and obtain the room list.
    2. Listeners select and enter the room.
    3. Listeners obtain the seat list based on group attributes.
    4. Listeners request to speak. After approval from the room owner, they interact with other speakers.
    5. Speakers become listeners and exit the room.

    Audio Stream Management

    Interactive voice chat scenarios typically use the RTC stream access solution, because it is simple and quick to integrate and provides the low latency that real-time interaction requires. The figure below shows a classic publishing/playback architecture for real-time interactive voice chat, with two roles: speakers and listeners.
    
    
    
    For real-time stream subscription within the room, TRTC offers two subscription modes: Automatic Subscription and Manual Subscription.
    Automatic Subscription: Upon entering the room, users will immediately receive the room's audio and video streams, with audio automatically playing and video automatically starting to decode.
    Manual Subscription: After entering the room, users need to manually call startRemoteView to start subscribing to and decoding the video stream, and manually call muteRemoteAudio (with mute set to false) to start audio playback.
    TRTC uses the Automatic Subscription mode by default, where users subscribe to the audio and video streams of all anchors in the room upon entering, which gives a faster instant-start experience. The Manual Subscription mode, on the other hand, offers greater flexibility and customizability, allowing users to selectively subscribe to audio and video streams.
    Note:
    Compared to the Manual Subscription mode, Automatic Subscription does not require complex media stream subscription management. For voice chat scenarios without special requirements, Automatic Subscription is recommended.

    On-Cloud Recording

    TRTC's newly upgraded On-Cloud Recording does not depend on Cloud Streaming Services. It does not require a relayed push for cloud live streaming and uses TRTC's internal real-time recording cluster for audio and video recording, offering a more comprehensive and unified recording experience.
    Single Stream Recording: With TRTC's On-Cloud Recording feature, you can record each user's audio stream in the room as a separate file.
    
    
    
    Mixed Stream Recording: Mix and record the audio media streams from the same room into a single file.
    
    
    
    Note:
    For a detailed introduction and activation guide to TRTC On-Cloud Recording, see On-Cloud Recording.

    Key Business Logic

    Ghost Mic Handling Solution

    A ghost mic, also known as a blast mic or black mic, refers to the phenomenon where a user who has not become a speaker can still speak and be heard by other users. The fundamental cause of a ghost mic is an inconsistency between the seat status and the TRTC user role status. There are several possible reasons for this issue.
    When a speaker becomes a listener, the seat list is updated accordingly. However, if the seat information callback is not triggered or is intercepted, the local TRTC operations for switching to the audience role and turning off the mic are not performed. This can result in the listener still being able to speak.
    When a speaker becomes a listener, the seat list is updated accordingly. However, after the seat information callback is received, the local call to the TRTC API for switching to the audience role fails, resulting in the listener still being able to speak.
    The app is cracked and the UserSig is intercepted, allowing attackers to enter the TRTC room as an anchor and speak at will.

    Detection and Handling of Ghost Mics

    By detecting ghost mics, you can proactively identify and promptly handle them. It is recommended to use a server-side detection solution: real-time anchor list comparison.
    Principle of the Solution: In the voice chat room scenario, user roles are divided into anchors and audiences, and only anchors can publish local audio. Therefore, ghost mics can be detected by comparing the seat list with the TRTC role list. TRTC provides server-side room and media event callbacks. By listening for events such as entering the room, switching roles, and leaving the room, you can maintain a real-time anchor list for the current room. Then, by comparing the TRTC real-time anchor list with the full seat list, ghost mics can be identified, and actions such as removing the user from the room or muting the user can be performed.
    1. The Tencent Real-Time Communication (TRTC) console supports self-service configuration of callback information. Once the configuration is complete, event callback notifications can be received. For more details, see Callback Configuration.
    2. Receive and parse callback event packages, pay attention to events 103/104/105, and maintain the real-time list of online anchors in the current room. For more details, see Callback Event.
    Event 103 (enter room):
    {
        "EventGroupId": 1,              # Room event group
        "EventType": 103,               # Enter room event
        "CallbackTs": 1687679847972,    # Callback time, in milliseconds
        "EventInfo": {
            "RoomId": "123456",         # Room ID
            "EventTs": 1687679847,      # Event occurrence time, in seconds
            "EventMsTs": 1687679847899, # Event occurrence time, in milliseconds
            "UserId": "1a99b0a9",       # Username
            "Role": 20,                 # User role. 20: Anchor; 21: Audience
            "TerminalType": 2,          # Terminal type
            "UserType": 3,              # User type
            "Reason": 1                 # Specific reason
        }
    }
    Event 104 (exit room):
    {
        "EventGroupId": 1,              # Room event group
        "EventType": 104,               # Exit room event
        "CallbackTs": 1687679847972,    # Callback time, in milliseconds
        "EventInfo": {
            "RoomId": "123456",         # Room ID
            "EventTs": 1687679847,      # Event occurrence time, in seconds
            "EventMsTs": 1687679847899, # Event occurrence time, in milliseconds
            "UserId": "1a99b0a9",       # Username
            "Role": 20,                 # User role. 20: Anchor; 21: Audience
            "Reason": 1                 # Specific reason
        }
    }
    Event 105 (switch role):
    {
        "EventGroupId": 1,              # Room event group
        "EventType": 105,               # Switch role event
        "CallbackTs": 1687679847972,    # Callback time, in milliseconds
        "EventInfo": {
            "RoomId": "123456",         # Room ID
            "EventTs": 1687679847,      # Event occurrence time, in seconds
            "EventMsTs": 1687679847899, # Event occurrence time, in milliseconds
            "UserId": "1a99b0a9",       # Username
            "Role": 20                  # User role. 20: Anchor; 21: Audience
        }
    }
    Note:
    105-Switch Role Event is only triggered by changes in the user role after entering the room. Therefore, you also need to supplement the user role list based on the initial role information in 103-Enter Room Event, as well as update the user role list according to 104-Exit Room Event, to maintain a more accurate list of room user roles.
    3. Periodically compare the seat list and the real-time TRTC anchor list for each room, identify ghost mics, and mute or remove them accordingly.
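Steps 2 and 3 can be sketched as follows. The example keeps a per-room set of anchors from the 103/104/105 event payloads shown above and reports any anchor who does not hold a seat. The class `AnchorTracker` and its method names are hypothetical, the payloads are assumed to be already parsed into dictionaries (the `#` annotations above are not part of the transmitted JSON), and the seat list is assumed to come from the business backend. Signature verification, out-of-order delivery, and persistence are deliberately left out.

```python
from __future__ import annotations

ROLE_ANCHOR = 20  # "Role" value for anchors in the callback payload
EVENT_ENTER, EVENT_EXIT, EVENT_SWITCH_ROLE = 103, 104, 105


class AnchorTracker:
    """Hypothetical tracker for the real-time anchor list of each room."""

    def __init__(self) -> None:
        self.anchors: dict[str, set[str]] = {}

    def handle_event(self, payload: dict) -> None:
        event_type = payload.get("EventType")
        if event_type not in (EVENT_ENTER, EVENT_EXIT, EVENT_SWITCH_ROLE):
            return  # not a room membership or role event
        info = payload["EventInfo"]
        room = self.anchors.setdefault(str(info["RoomId"]), set())
        user_id = info["UserId"]
        if event_type == EVENT_EXIT:
            room.discard(user_id)
        elif info["Role"] == ROLE_ANCHOR:
            # 103 carries the initial role, 105 carries the new role.
            room.add(user_id)
        else:
            room.discard(user_id)

    def find_unseated_anchors(self, room_id: str, seated: set[str]) -> set[str]:
        """Anchors in the TRTC room who do not occupy any seat."""
        return self.anchors.get(room_id, set()) - seated


tracker = AnchorTracker()
tracker.handle_event({"EventType": 103, "EventInfo": {
    "RoomId": "123456", "UserId": "alice", "Role": 20}})
tracker.handle_event({"EventType": 103, "EventInfo": {
    "RoomId": "123456", "UserId": "bob", "Role": 21}})
tracker.handle_event({"EventType": 105, "EventInfo": {
    "RoomId": "123456", "UserId": "bob", "Role": 20}})
suspects = tracker.find_unseated_anchors("123456", seated={"alice"})
```

Here `suspects` contains only `bob`, who switched to the anchor role without holding a seat. A periodic job would run this comparison for each active room and then mute or remove the reported users. Because a seat update and its matching role event can arrive a moment apart, a production check would likely wait for a short grace period before acting.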

    Anti-Stutter Solution When Switching on and off the Mic

    Problem Description

    Due to differences in the mechanisms of mobile device systems, the performance of switching on and off the mic in the voice chat scenario may differ between Android and iOS. On iOS devices, there may be brief audio stutters when switching on and off the mic.

    Cause Analysis

    This is related to the iOS system's audio mechanism. The startLocalAudio and stopLocalAudio operations access and release the microphone device permissions, respectively. SDK's audio re-capturing causes the AVAudioSession to restart the audio driver, resulting in a temporary audio stutter when switching on and off the mic.

    Solutions

    The typical sequence of switching on and off the mic in TRTC is shown in the figure below. Switching roles simultaneously starts or stops local audio capture and publishing. This solution works normally on the Android platform.
    On iOS, during the mic off operation, it is possible to stop streaming simply by switching the audience role, without the need to call stopLocalAudio to stop audio capture and release mic permissions, thereby avoiding audio stutters during mic on/off.
    
    
    
    Note:
    In the anti-stutter solution, not calling stopLocalAudio keeps the mic in a continuous capturing state even while the user is a listener, which may lead users to believe they are still being recorded.

    Best Practices for Audio Configuration

    In audio configuration, audio quality and volume type are two distinct concepts. In TRTC, audio quality can be set when enabling local audio capture and publishing with startLocalAudio(TRTCAudioQuality), or set separately with setAudioQuality(TRTCAudioQuality). The volume type is determined by a combination of factors such as the room entry scenario and the audio quality setting, and it can also be forcibly specified with setSystemVolumeType(TRTCSystemVolumeType).

    Best Practices for Audio Quality Configuration

    The TRTC SDK provides three finely tuned audio quality modes to meet the diverse audio quality needs of various vertical scenarios.
    Voice Mode (TRTCAudioQualitySpeech)
    Audio Quality Parameters: Sampling rate 16 kHz; mono; encoding bitrate 16 kbps.
    Explanation: It has strong network resilience and performs well in poor network environments, making it suitable for applications primarily focused on voice communication, such as online meetings and voice calls.
    Default Mode (TRTCAudioQualityDefault)
    Audio Quality Parameters: Sampling rate 48 kHz; mono; encoding bitrate 50 kbps.
    Explanation: The default SDK setting. It offers better fidelity for music than the Voice Mode, while transmitting much less data than the Music Mode. This makes it versatile and suitable for various scenarios.
    Music Mode (TRTCAudioQualityMusic)
    Audio Quality Parameters: Sampling rate 48 kHz; full-band stereo; encoding bitrate 128 kbps.
    Explanation: In this mode, audio transmission consumes a significant data volume, ensuring that the music signal is restored with high fidelity across all frequency bands. It is suitable for scenarios requiring high-quality music transmission.
    As can be seen from the table, from Voice Mode to Music Mode, the audio quality effect improves, but the data volume of audio transmission also increases.
    In the scenario of a voice chat room, it is recommended to choose the Voice Mode for pure voice communication, which can achieve better smoothness under weak network conditions.
    For voice chat rooms with a need to play background music, it is recommended to choose the Default Mode or Music Mode to achieve good audio detail restoration.
    Considering the bandwidth pressure on downstream audience networks, to ensure a good user experience, it is advisable to use the Music Mode cautiously in business scenarios with ten or more seats.
    Note:
    TRTC audio quality supports dynamic adjustment, which means audio quality can be dynamically adjusted during the streaming process by calling setAudioQuality(TRTCAudioQuality).
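The bandwidth trade-off behind these recommendations can be made concrete with a little arithmetic. The sketch below multiplies the encoding bitrates from the table by the number of seats to estimate a listener's worst-case downstream audio bitrate, under the assumptions that every seat is publishing at the same time, that the listener pulls each speaker's stream separately, and that protocol overhead is ignored. The function names and the mode keys are hypothetical, and `suggest_mode` is only one possible reading of the guidance above, not an SDK rule.

```python
# Encoding bitrates (kbps) taken from the audio quality table above.
BITRATE_KBPS = {"speech": 16, "default": 50, "music": 128}


def downstream_audio_kbps(mode: str, seat_count: int) -> int:
    """Worst-case audio bitrate a listener receives if all seats publish."""
    return BITRATE_KBPS[mode] * seat_count


def suggest_mode(has_background_music: bool, seat_count: int) -> str:
    """One possible reading of the recommendations (an assumption)."""
    if not has_background_music:
        return "speech"
    # With ten or more seats, Music Mode is used cautiously; fall back here.
    return "default" if seat_count >= 10 else "music"


for mode in ("speech", "default", "music"):
    print(mode, downstream_audio_kbps(mode, 10), "kbps")
```

For a ten-seat room this gives 160 kbps in Voice Mode, 500 kbps in Default Mode, and 1280 kbps in Music Mode, which is why Music Mode deserves caution as the seat count grows.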

    Best Practices for Volume Type Configuration

    The TRTC SDK provides three control modes for system volume types to meet the different needs of volume types in various scenarios.
    Full Call Volume (TRTCSystemVolumeTypeVOIP): The advantage of this mode is that the audio module does not need to switch working modes during mic on/off, enabling seamless mic switching. It is suitable for applications where users frequently switch mics. If the scenario selected upon entering the room is TRTCAppSceneVideoCall or TRTCAppSceneAudioCall, the SDK will automatically use this mode.
    Automatic Switching Mode (TRTCSystemVolumeTypeAuto): Also known as Voice Call on Mic, Media Off Mic. This mode ensures that the anchor uses call volume when on the mic, while the off-mic audience uses media volume. It is suitable for live streaming scenarios. If the scenario selected upon entering the room is TRTCAppSceneLIVE or TRTCAppSceneVoiceChatRoom, the SDK will automatically use this mode.
    Full Media Volume (TRTCSystemVolumeTypeMedia): Uses media volume for the entire call. It is suitable for music scenarios with demanding audio quality requirements. If your users mainly use external devices (such as external sound cards), this mode can be adopted.
    In call scenarios, it is recommended to choose the default Full Call Volume, where the audio module does not need to switch working modes.
    In voice chat room scenarios, it is recommended to use the default Automatic Switching mode for pure voice communication, that is, Voice Call on Mic, Media Off Mic.
    Voice chat rooms that need background music can consider setting the volume to Full Media Volume throughout to avoid users perceiving stuttering or sudden volume changes in music when going on and off the mic.
    Note:
    If you need to specify a volume type, it is recommended to call setSystemVolumeType once after entering the room and before starting to stream. Do not call it during the mic on/off.
    Call Volume supports the phone's built-in AEC feature and allows audio pickup via the mic on Bluetooth headphones, but the disadvantage is the audio quality is relatively average.
    Media Volume does not support the phone's built-in AEC feature and does not support audio pickup via the mic on Bluetooth headphones, but it has better music playback performance.

    Single-Stream Volume Evaluation

    In voice chat room scenarios, some customers may opt to push and pull RTC single streams for speakers, while pulling mixed streams from the room for audiences, aiming to save bandwidth costs. However, in voice chat room scenarios, it is usually necessary to provide UI prompts based on the volume level of the speakers, such as sound waves or volume bars. While volume evaluation and feedback for single-channel audio is straightforward to implement in TRTC rooms, achieving this in audio-only mix streams requires specialized techniques. Below are the specific implementations of two solutions.

    Single-Stream Volume Evaluation in RTC Room

    Step One: Enable Volume Prompts
    Enable the volume callback through the enableAudioVolumeEvaluation API, and optionally enable the local voice detection feature. After enabling this feature, the SDK will provide feedback in the onUserVoiceVolume callback about the volume of both local users and remote streaming users, the maximum volume value, as well as the local voice detection result.
    Note:
    Starting from TRTC SDK version 10.2, the local voice detection feature has been added. Once enabled, the local voice detection result will be displayed in TRTCVolumeInfo.vad (for users in the anchor role). Operations such as muteLocalAudio and setAudioCaptureVolume(0) will not affect the voice detection result, making it convenient to remind users to turn on their mics.
    Step Two: Listen to the Volume Callback
    Listen to the onUserVoiceVolume callback in TRTCCloudListener. This callback provides information on the volume levels of both local and remote users' streams, as well as the maximum volume value of remote users. Based on these volume levels, you can adjust the UI to display corresponding voice waveforms.
    Note:
    Rendering voice waveform animations for speakers can be determined by the volume level in the onUserVoiceVolume callback. The activation and deactivation of voice waveform animations (user's mic on/off state) are recommended to be determined based on the onUserAudioAvailable callback.

    Evaluation of Single Stream Volume in Audio-Only Mixed Stream

    
    
    
    The process of evaluating single-stream volume in an audio-only mixed stream is shown in the diagram above. Speakers listen for the volume callback to obtain both local and remote volume levels, and then insert the local volume value and user information into the stream as SEI messages. After mixing, these messages are transmitted to listeners. Alternatively, the room owner can send all speakers' volume values from the callback through SEI messages. The diagram below shows the sequence of the entire process:
    
    
    
    Note:
    If there is a requirement for mixing and relaying to CDN while transmitting SEI:
    The room entry scenario must be set to LIVE and cannot be set to pure audio entry, otherwise SEI messages will not be transmitted.
    If the mixing API adopts setMixTranscodingConfig, then the mixing mode cannot use the PureAudio audio-only mode.
    If the mixing API adopts startPublishMediaStream, then the media stream transcoding configuration parameters must carry the TRTCVideoLayout parameter.
    As shown below, the audience will see the volume levels of the respective speakers in the SEI messages parsed from the mixed stream.
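The content of the SEI message is defined by the application. As a sketch of what the speaker side might encode and the listener side might parse, the example below packs user IDs and volume levels into a compact JSON payload. The payload layout (`type`, `list`, `uid`, `vol`), the function names, and the clamping of volume to the range 0 to 100 are all assumptions made for illustration; they are not a format defined by TRTC. Sending the bytes through the SDK and extracting them from the player are outside the scope of this sketch.

```python
from __future__ import annotations

import json


def encode_volume_payload(volumes: dict[str, int]) -> bytes:
    """Build the bytes a speaker (or the room owner) would send as SEI."""
    entries = [
        {"uid": uid, "vol": max(0, min(100, vol))}  # clamp to an assumed 0-100
        for uid, vol in sorted(volumes.items())
    ]
    message = {"type": "volume", "list": entries}
    return json.dumps(message, separators=(",", ":")).encode("utf-8")


def decode_volume_payload(data: bytes) -> dict[str, int]:
    """Parse SEI bytes on the listener side; ignore anything unrecognized."""
    try:
        message = json.loads(data.decode("utf-8"))
    except ValueError:  # covers invalid UTF-8 and invalid JSON
        return {}
    if not isinstance(message, dict) or message.get("type") != "volume":
        return {}
    return {entry["uid"]: entry["vol"] for entry in message.get("list", [])}


payload = encode_volume_payload({"alice": 42, "bob": 130})
levels = decode_volume_payload(payload)
```

Here `levels` maps `alice` to 42 and `bob` to 100 after clamping. Keeping the payload small matters because it is sent repeatedly, and returning an empty result for unrecognized data lets the listener skip SEI messages that serve other purposes.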
    
    
    

    Scenario Gameplay

    In the voice chat room scenario, the room owner and several speakers interact online through voice, and there may also be listeners who cannot speak but only listen and interact through sending gifts and chat messages. Different room themes are usually set to attract users with similar interests for viewing and interaction. Common themes include FM radio, Karaoke chat, game interaction, and sports event streaming.

    FM Radio Room

    There may be a solo live broadcast by an anchor or a host with several fixed chatting guests, while background music and sound effects are played simultaneously. Listeners can request to speak by giving gifts to participate in voice interaction.
    This scenario typically involves a large number of audiences, with infrequent mic switching. It is suitable to use the solution of speakers pushing and pulling RTC streams while the audience pulls CDN mixed streams. When audiences become speakers, they switch from the RTMP channel to the TRTC channel to enter the room and stream in real time. This solution balances real-time interaction with cost.
    Note:
    For this scenario, it is recommended to use the solution of speakers pushing and pulling RTC streams while the audience pulls CDN mixed streams.
    When audiences switch between the RTMP and RTC protocol for mic-connecting, ensure a smooth transition between on-mic and off-mic states.

    Karaoke Voice Chat Room

    Usually, there is one administrator, and everyone can select songs, comment, guess songs, continue singing, etc. It mainly consists of two models: Multi-Person Co-Anchoring and Multi-Person Mic Rotation. In the Multi-Person Co-Anchoring mode, one person sings while other co-anchoring users can listen and speak simultaneously, but the lead singer cannot hear the other speakers. However, the audience in the room can hear all the voices. The Multi-Person Mic Rotation mode allows a person to sing a portion of a song, after which it automatically transitions to the next person; meanwhile, other users can only listen and comment during the waiting period, without participating in voice chat.
    Online Karaoke scenarios require high synchronization and allow audiences to join in singing at any time, making it suitable to use the solution of the speaker pushing and pulling RTC streams while the audience pulls RTC mixed streams. Here, a mixing robot is needed to initiate the mixing command and push the mixed stream back to the TRTC room for the listeners to pull and watch.
    Note:
    For this scenario, it is recommended to use the solution of the speaker pushing and pulling RTC streams while the audience pulls RTC mixed streams.
    For specific technical details and precautions regarding the implementation of Karaoke Voice Chat Rooms, please refer to Online Karaoke Scenario Solutions.

    Interactive Gaming Room

    In scenarios like Werewolf, Murder Mystery, Dubbing, Truth or Dare, and Draw and Guess, rooms are created based on the game's progression, and the speaking permissions of players are controlled in sequence according to the game's progress.
    In interactive gaming scenarios, the number of participants is typically limited, and players need to go on and off the mic frequently as the game progresses. This scenario is suitable for the conventional approach of having the speaker pull and push RTC streams while the audience pulls RTC single streams. Game participants can request to speak at any time or choose to mute themselves. When a player's character dies, the player is forced to become a listener or exit the room.
    Note:
    This scenario recommends the solution of the speaker pulling and pushing RTC streams while the audience pulls RTC single streams.
    Interactive gaming rooms usually include the playing of local game audio effects, so attention must be paid to AEC processing and the selection of volume types.

    Supporting Products for the Solution

    System Level
    Product Name
    Application Scenarios
    Access Layer
    Provides low-latency, high-quality real-time interactive live streaming solutions for multi-person voice interaction, serving as the foundational capability for voice chat scenarios.
    Access Layer
    Provides room management and seat management capabilities based on group features, enables the sending and receiving of rich media messages such as live streaming room-wide messaging, public screen messages, as well as custom signaling and other communication needs.
    Cloud Services
    Provides real-time audio and video relayed push, along with accelerated media stream distribution services, as well as additional capabilities such as recording and pornography detection.
    Cloud Services
    For audio-video media, it offers an integrated high-quality media service that includes production and upload, storage, transcoding, media processing, media AI, accelerated distribution and playback, and copyright protection.
    Data Storage
    Provides storage services for audio recording files and audio slicing files.
    