tencent cloud


Scenario Solution

Last updated: 2024-07-25 17:01:48

    Scenario Introduction

    According to iiMedia Research, China had about 510 million online Karaoke users in 2021, a penetration rate of approximately 49.7%. Online Karaoke offers a more immersive experience, and its diverse gameplay meets the personalized needs of different user groups, making it one of the main offerings in online pan-entertainment. Building on advances in network technology, online Karaoke apps keep launching new singing modes and gameplay, and these ever-richer features have made the apps more practical and engaging. The following sections describe in detail an online Karaoke solution built on Tencent Real-Time Communication (TRTC).
    
    
    

    Implementation Scheme

    Typically, a complete online Karaoke implementation involves several functional modules: room management, seat management, song selection management, Karaoke management, and scoring management. The key actions and feature points of each module are as follows:
    Room Management: room list, creating a room, entering a room, exiting a room, and terminating a room.
    Seat Management: becoming a speaker/listener, seat control, changing a speaker, locking a seat, inviting a listener to speak, and muting a speaker.
    Song Selection Management: song list display, song search, song selection, queue management, and the selected song list.
    Karaoke Management: Karaoke play modes, starting/stopping/switching songs, accompaniment and vocal volume adjustment, reverb/sound effects, switching between original vocals and accompaniment, and lyrics synchronization.
    Scoring Management: singing scoring and pitch line display.
    The overall business flow of the online Karaoke scenario is shown in the following diagram. The room owner creates a Karaoke room, and users choose a Karaoke room they are interested in and enter it. After entering, users can request to become speakers to join the interaction. Once on a seat, a speaker can select favorite songs and wait in line, then sing along with the accompaniment when their turn comes. Alternatively, a speaker can join another singer directly in a chorus. Solo singing and chorus are the two Karaoke play modes. During singing, each line is scored for pitch, and the whole song receives an overall score when the singing ends.
    
    
    

    Room Management

    The Room Management module maintains the room list and provides creating, entering, exiting, and terminating rooms. In addition, unlike a regular room, a Karaoke room requires a separate Karaoke room identifier to start management of its related components: song selection management, Karaoke management, scoring management, and so on.
    Create a Room: After logging in to the business system, users can create a room. The room list must be updated after the room is created.
    Enter a Room: Users can enter an existing room. When a user enters, the room's member list must be updated.
    Exit a Room: Users can exit the current room. When a user exits, they must be removed from the room's member list.
    Terminate a Room: After all users have exited, the room must be terminated and then removed from the room list.
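    The four room operations above amount to maintaining a room list plus a member list per room. A minimal in-memory sketch of that bookkeeping, assuming a single backend process; the `RoomRegistry` class and its methods are hypothetical, and a production backend would persist this state and broadcast changes to clients (for example through IM):

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical in-memory room registry; a real business backend would persist this state. */
final class RoomRegistry {
    // roomId -> userIds of current members, kept in join order
    private final Map<String, Set<String>> rooms = new LinkedHashMap<>();

    /** Create a room (adds it to the room list); the owner is the first member. */
    synchronized boolean createRoom(String roomId, String ownerId) {
        if (rooms.containsKey(roomId)) {
            return false; // room ID already in use
        }
        Set<String> members = new LinkedHashSet<>();
        members.add(ownerId);
        rooms.put(roomId, members);
        return true;
    }

    /** Enter an existing room; false if the room does not exist or the user is already in it. */
    synchronized boolean enterRoom(String roomId, String userId) {
        Set<String> members = rooms.get(roomId);
        return members != null && members.add(userId);
    }

    /** Exit a room (removes the user from the member list); terminates the room once it is empty. */
    synchronized void exitRoom(String roomId, String userId) {
        Set<String> members = rooms.get(roomId);
        if (members == null) {
            return;
        }
        members.remove(userId);
        if (members.isEmpty()) {
            rooms.remove(roomId); // terminate: delete the room from the room list
        }
    }

    /** Snapshot of the room list. */
    synchronized List<String> roomList() {
        return new ArrayList<>(rooms.keySet());
    }

    /** Snapshot of a room's member list (empty if the room does not exist). */
    synchronized Set<String> members(String roomId) {
        Set<String> members = rooms.get(roomId);
        if (members == null) {
            return Set.of();
        }
        return new LinkedHashSet<>(members);
    }
}
```

    Terminating the room inside `exitRoom` ties the room's lifetime to its last member, matching the rule that a room is terminated after all users exit.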
    
    
    
    Note:
    Room management is required for online Karaoke but is not its core module. It can be implemented by integrating your business system with the IM and TRTC SDKs. For details, see Voice Chat Room > Room Management.

    Seat Management

    Seats in a Karaoke room are usually ordered and limited in number. Seat management defines the number of seats in the room according to the business scenario and manages the status of every seat in the current room. It includes becoming a speaker or listener, seat control, changing a speaker, locking seats, inviting listeners to speak, and muting speakers.
    After entering a room, a user can apply only for an idle seat.
    After the room owner approves a user's speaker request, the corresponding seat's status changes to occupied.
    When a user stops streaming and becomes a listener, the corresponding seat's status reverts to idle.
    The room owner has the authority to lock seats, invite listeners to speak, remove speakers, mute speakers, and so on.
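    The seat rules above form a small state machine. The sketch below models it with three hypothetical states (`IDLE`, `OCCUPIED`, `LOCKED`); the class and method names are illustrative, and allowing only idle seats to be locked is an assumption rather than a documented rule. In a real app, every state change would also be broadcast to the room, for example through IM messages.

```java
import java.util.Arrays;

/** Hypothetical seat states; a real app would also sync them to everyone in the room. */
enum SeatStatus { IDLE, OCCUPIED, LOCKED }

/** Minimal owner-side seat bookkeeping for a fixed number of seats. */
final class SeatManager {
    private final SeatStatus[] states;
    private final String[] occupants;
    private final boolean[] muted;

    SeatManager(int seatCount) {
        states = new SeatStatus[seatCount];
        Arrays.fill(states, SeatStatus.IDLE);
        occupants = new String[seatCount];
        muted = new boolean[seatCount];
    }

    /** Only idle seats can be applied for. */
    boolean canApply(int seat) {
        return states[seat] == SeatStatus.IDLE;
    }

    /** Owner approves a speaker request (or a listener accepts an invitation): seat becomes occupied. */
    boolean approve(int seat, String userId) {
        if (!canApply(seat)) {
            return false;
        }
        states[seat] = SeatStatus.OCCUPIED;
        occupants[seat] = userId;
        return true;
    }

    /** Speaker becomes a listener, or the owner removes them: the seat reverts to idle. */
    void release(int seat) {
        if (states[seat] == SeatStatus.OCCUPIED) {
            states[seat] = SeatStatus.IDLE;
            occupants[seat] = null;
            muted[seat] = false;
        }
    }

    /** Owner locks or unlocks a seat; in this sketch only an idle seat can be locked. */
    boolean setLocked(int seat, boolean locked) {
        SeatStatus from = locked ? SeatStatus.IDLE : SeatStatus.LOCKED;
        if (states[seat] != from) {
            return false;
        }
        states[seat] = locked ? SeatStatus.LOCKED : SeatStatus.IDLE;
        return true;
    }

    /** Owner mutes or unmutes the speaker on an occupied seat. */
    void setMuted(int seat, boolean mute) {
        if (states[seat] == SeatStatus.OCCUPIED) {
            muted[seat] = mute;
        }
    }

    SeatStatus statusOf(int seat) { return states[seat]; }
    String occupantOf(int seat) { return occupants[seat]; }
    boolean isMuted(int seat) { return muted[seat]; }
}
```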
    Note:
    Seat management is required for online Karaoke but is not its core module. It can be implemented by integrating your business system with the IM and TRTC SDKs. For details, see Voice Chat Room > Seat Management.

    Song Selection Management

    Basic Introduction

    Song selection management is an important part of the online Karaoke scenario, covering song list display, song search, song selection, queue management, and the selected song list. Each Karaoke room needs its own selected song list and automatic queue management, which must be implemented by the business backend. For overseas users, we recommend implementing accompaniment-related features such as song list display and song search with accompaniment library products.

    Implementation Process

    
    
    
    Song selection management mainly involves the business-side app, the business backend, and the music library, with the following responsibilities:
    Business-Side App:
    Call the song selection API to report song information.
    Call the song-switching API to notify the business backend.
    Call the singing confirmation API to notify the business backend.
    Business Backend:
    Maintain the selected song list.
    Send notifications to tell the business-side app to switch the song.
    Music Library:
    Provide authorized music resources for TRTC to play.
    Provide lyric files and pitch files matching the music resources.
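    On the business backend, the selected song list can be as simple as a per-room FIFO queue. A sketch under that assumption, mapping the three app-side calls above (song selection, song switching, singing confirmation) onto hypothetical methods of a hypothetical `SongQueue` class; how the backend pushes the switch notification to apps is left out.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

/** Hypothetical per-room selected song list maintained by the business backend. */
final class SongQueue {
    /** One selected song and the user who will sing it. */
    static final class Entry {
        final String songId;
        final String singerId;

        Entry(String songId, String singerId) {
            this.songId = songId;
            this.singerId = singerId;
        }
    }

    private final Deque<Entry> pending = new ArrayDeque<>();
    private Entry current; // song currently being sung, or null

    /** Song selection API: append a song to the selected song list. */
    synchronized void select(String songId, String singerId) {
        pending.addLast(new Entry(songId, singerId));
    }

    /** Song-switching API: advance to the next song (null when the list is empty).
     *  After this, the backend would notify every app in the room to switch songs. */
    synchronized Entry switchToNext() {
        current = pending.pollFirst();
        return current;
    }

    /** Singing confirmation API: true only if the caller is the singer of the current song. */
    synchronized boolean confirmSinging(String singerId) {
        return current != null && current.singerId.equals(singerId);
    }

    /** Clear every pending song a user selected, e.g. when they become a listener. */
    synchronized void removeSinger(String singerId) {
        pending.removeIf(e -> e.singerId.equals(singerId));
    }

    /** Snapshot of the pending songs, in play order. */
    synchronized List<Entry> selectedList() {
        return new ArrayList<>(pending);
    }
}
```

    `removeSinger` covers the rule, described in the solo singing flow, that a singer's selected songs are cleared when they become a listener.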

    Karaoke Management

    The Karaoke management module mainly covers Karaoke play modes, starting/stopping/switching songs, accompaniment and vocal volume adjustment, reverb/sound effects, switching between original vocals and accompaniment, and lyrics synchronization. The following sections describe its implementation in detail through two typical play modes: solo singing and real-time chorus.

    Solo Singing

    Solo singing is used mainly in interactive Karaoke scenarios with multiple participants. After anchors or audience members become speakers, they can select songs. Each successful selection is shown on the shared selected song list, and when a speaker's turn comes, that speaker plays the song's accompaniment, sings, and is scored.
    Solution Architecture
    The solution relies on the music library for song and lyric resources and on TRTC for transmitting the singer's vocals and the song accompaniment. The solution architecture is as follows:
    
    
    
    Specific Implementation
    In the solo singing scenario, the two roles, singer and audience, follow different implementation processes, described below:
    Singer
    Description: An anchor or audience member who has become a speaker and then selects and sings songs. When the singer leaves the room, the room is automatically dissolved and the selected song list is cleared.
    Differences: Must be in the anchor role; publishes audio and video upstream (black frames if there is no camera video); plays the BGM; sends SEI messages carrying lyric information; selects songs.
    Audience
    Description: Plays the media streams of the singer or other participants.
    Differences: In the audience role, but can switch to the anchor role by becoming a speaker; pulls audio and video streams; receives SEI messages carrying lyric information.
    Implementation Process
    Singer
    
    
    
    1. The anchor/audience creates/enters a TRTC room, and automatically becomes a singer after selecting a song.
    2. After the singer selects a song, the song and lyric files are downloaded, and the song is played through the BGM interface.
    3. If the singer is not publishing video, video upstream must be enabled (with black frames), because SEI messages are carried in the video stream.
    4. The lyric progress is synchronized to everyone in the room through SEI messages.
    5. When the singer becomes a listener, all songs they selected will be cleared, and they revert to their original role.
    6. After the anchor/audience exits the room, the TRTC room will be dissolved.
    Note:
    Speakers on seats can select songs for themselves or for others, but the BGM must always be played by the singer's own device; otherwise, latency (about 300 ms or more) can desynchronize the singer and the song.
    Audience
    
    
    
    1. Anchor/Co-Anchor/Audience can create/enter a TRTC room.
    2. Listen for song changes in the room and load the corresponding lyrics.
    3. Pull the singer's stream.
    4. Parse the SEI messages sent by the singer and synchronize the lyrics.
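    The singer and audience flows above hinge on carrying lyric progress in SEI messages. A minimal sketch of such a payload, assuming a hypothetical `songId|progressMs` text format (the document does not specify one): the singer would pass `encode()` to the SDK's SEI send call (on Android, `TRTCCloud.sendSEIMsg(byte[], int)`), and the audience would feed the bytes received in the `onRecvSEIMsg` callback into `decode()`, ignoring payloads whose song ID does not match the song currently loaded. Because SEI travels inside the video stream, keep the payload short.

```java
import java.nio.charset.StandardCharsets;

/** Hypothetical SEI payload carrying lyric progress from the singer to the audience. */
final class LyricSei {
    final String songId;
    final long progressMs;

    LyricSei(String songId, long progressMs) {
        this.songId = songId;
        this.progressMs = progressMs;
    }

    /** Singer side: serialize as "songId|progressMs" in UTF-8. */
    byte[] encode() {
        return (songId + "|" + progressMs).getBytes(StandardCharsets.UTF_8);
    }

    /** Audience side: parse a payload; returns null for foreign or malformed data. */
    static LyricSei decode(byte[] data) {
        String s = new String(data, StandardCharsets.UTF_8);
        int sep = s.lastIndexOf('|'); // last separator, so song IDs may contain '|'
        if (sep <= 0) {
            return null;
        }
        try {
            return new LyricSei(s.substring(0, sep), Long.parseLong(s.substring(sep + 1)));
        } catch (NumberFormatException e) {
            return null;
        }
    }
}
```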

    Real-Time Chorus

    In a real-time chorus, all co-anchoring (on-mic) singers play the song accompaniment simultaneously and sing together over their mics. In duet mode, the lead and backing vocalists can hear each other; in multi-person mode, all choristers hear each other with almost no delay, achieving a true real-time chorus.
    Solution Architecture
    In terms of media streams, the singers publish their streams and play each other's streams. The lead singer publishes the accompaniment, while the other singers play the accompaniment locally, synchronized via NTP. In addition, a mixing bot mixes the accompaniment and all singers' voices into a single stream and pushes it back to the TRTC room, so the audience can hear the synchronized voices of all singers by pulling just that one stream. The real-time chorus solution architecture is shown in the figure below:
    
    
    
    The advantages of this solution are:
    It reduces end-to-end latency.
    It provides a solution for users to join the chorus midway.
    It accurately synchronizes accompaniment, lyrics, and vocals between different ends.
    It compensates for differences in device performance and local clock accuracy across ends, and reduces the impact of network latency.
    Note:
    Depending on business needs, you can implement real-time chorus for audio-only or for audio and video scenarios. In an audio-only scenario, black video frames must be sent so that SEI messages can carry lyric synchronization data.
    The lead singer uses a sub-instance so that it can publish both the accompaniment and vocals at the same time; other singers only pull each other's vocal streams and play the accompaniment locally; the audience only pulls the single mixed stream.
    Specific Implementation
    In online Karaoke rooms, different roles have different feature permissions and implementation processes. There are three roles: lead singer, chorus, and audience.
    Lead Singer
    Description: Selects songs, sends chorus signaling, and sends SEI messages.
    Differences: Must be in the anchor role; publishes accompaniment and vocals; selects songs and initiates the chorus; pushes back the mixed stream; sends SEI messages.
    Chorus
    Description: Receives and processes chorus signaling and joins the chorus from a seat.
    Differences: Must be in the anchor role; publishes vocals; plays the accompaniment locally; receives chorus signaling.
    Audience
    Description: After entering the Karaoke room, pulls the stream of the singers on the seats and can also take a seat to join the chorus.
    Differences: Must be in the audience role; pulls the mixed stream; receives SEI messages; can request to become an anchor.
    Implementation Process
    Lead Singer
    
    
    
    1. The lead singer selects a song and sends the chorus signaling.
    2. The lead singer creates a sub-instance to publish vocals and accompaniment, and pulls the vocal streams of the other singers.
    3. After publishing starts, the lead singer initiates the mixed-stream push task.
    4. Once singing starts, the lead singer plays the accompaniment and synchronizes the lyrics through the playback progress callback.
    5. The lead singer sends SEI messages to synchronize the song progress on the audience side.
    6. All singers calibrate their local song playback progress against NTP time.
    Chorus
    
    
    
    1. Each chorus member publishes a vocal stream and pulls the vocal streams of the other singers on seats.
    2. Chorus members listen for chorus signaling and preload the accompaniment resources when they receive it.
    3. After singing starts, chorus members play the accompaniment locally and synchronize the lyrics through the playback progress callback.
    4. All singers calibrate their local song playback progress against NTP time.
    Audience
    
    
    
    1. Upon entering the TRTC room, the audience receives the mixed chorus stream.
    2. Parse the song progress information in the SEI of the mixed stream for lyric synchronization.
    3. After an audience member becomes a speaker, they stop pulling the mixed stream, switch to pulling the vocal streams of the singers on seats, and start chorus mode.

    Scoring Management

    Basic Introduction

    Scoring is another mainstream Karaoke feature. It evaluates pitch accuracy and sound quality during singing, and its results can be used to compare scores after several people have sung.

    Implementation Process

    In scoring management, singers and audience members follow different processes. Scoring is usually performed locally on the singer's device and synchronized to the others in the room.
    Singer
    
    
    
    1. The anchor/audience member creates or enters a TRTC room and automatically becomes a singer after selecting a song.
    2. After the singer selects a song, the song and lyric files are downloaded, and the song is played through the BGM interface.
    3. The vocals captured by TRTC and the BGM playback progress are fed to the scoring module in real time.
    4. The scoring module produces scoring data in real time, which is synchronized to everyone in the room through SEI.
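    The scoring module itself is not described here. As a rough illustration only, and not Tencent's scoring algorithm, the sketch below scores one lyric line as the share of voiced frames whose sung pitch lies within a tolerance of the reference note from the MIDI pitch file. It assumes a separate pitch tracker has already turned the captured vocals into one frequency (Hz) per frame aligned with the BGM progress, and that reference notes have been resampled to the same frames; both inputs and the `PitchScorer` class are hypothetical.

```java
/** Illustrative per-line pitch score; not the algorithm of any Tencent scoring module. */
final class PitchScorer {
    /** Convert a frequency in Hz to a fractional MIDI note number (A4 = 440 Hz = note 69). */
    static double hzToMidi(double hz) {
        return 69.0 + 12.0 * (Math.log(hz / 440.0) / Math.log(2.0));
    }

    /**
     * Score one lyric line from 0 to 100: the percentage of voiced frames whose sung pitch
     * is within toleranceSemitones of the reference MIDI note. Frames with hz <= 0
     * (unvoiced) or without a reference note (refMidi < 0) are skipped.
     */
    static int scoreLine(double[] sungHz, int[] refMidi, double toleranceSemitones) {
        int counted = 0;
        int hits = 0;
        for (int i = 0; i < Math.min(sungHz.length, refMidi.length); i++) {
            if (sungHz[i] <= 0 || refMidi[i] < 0) {
                continue;
            }
            counted++;
            if (Math.abs(hzToMidi(sungHz[i]) - refMidi[i]) <= toleranceSemitones) {
                hits++;
            }
        }
        return counted == 0 ? 0 : (int) Math.round(100.0 * hits / counted);
    }
}
```

    This sketch ignores octave errors and timing offsets between the singer and the reference, which a production scorer would need to handle.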
    Audience
    The audience process is identical to that of the audience role in solo singing; see the audience implementation process described there.

    Key Business Logic

    Accompaniment Synchronization Solution

    In real-time scenarios, the accompaniment progress must be kept synchronized after the performance starts; otherwise, accompaniment errors add to end-to-end latency. Because the local clocks of different devices are not consistent, synchronization is based on NTP time, provided by Tencent Cloud's self-developed NTP service. Users who join the chorus midway must also synchronize the accompaniment progress before they can sing along.
    The approach is as follows: the lead singer agrees with the others to start the accompaniment at a future point in time (for example, 3 seconds later), and the other users join the chorus at that moment. All ends use NTP time, which is synchronized after the TRTC SDK is initialized.
    
    
    
    The specific process is as follows:
    1. All ends calibrate against NTP and obtain the latest NTP time T from the TRTC cloud.
    2. The lead singer sends the chorus signaling (a custom message) specifying the agreed chorus start time T2.
    3. The lead singer preloads the accompaniment locally and schedules playback to start at T2.
    4. The other chorus participants perform step 3 when they receive the chorus signaling.
    5. During playback, each end checks its local accompaniment progress and seeks to correct it when the difference between the expected progress (TE) and the current progress (TC) exceeds 50 ms.
    Note:
    The 50 ms threshold is a typical value and can be adjusted to your business's tolerance; we recommend keeping it close to 50 ms.
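    The timing arithmetic in steps 2 to 5 can be sketched as follows. All times are NTP milliseconds. The code takes the current NTP time as a parameter instead of reading it, because the SDK call that provides it (on Android, reportedly `TXLiveBase.getNetworkTimestamp()`) should be checked against your SDK version. Reading TE as the expected progress derived from NTP time and TC as the player's current progress is our interpretation, and the `AccompanimentSync` class is hypothetical.

```java
/** Hypothetical helper for NTP-scheduled accompaniment start and drift correction. */
final class AccompanimentSync {
    /** Typical drift tolerance from the note above. */
    static final long DEFAULT_TOLERANCE_MS = 50;

    /** How long to wait (ms) from NTP time now until the agreed start time T2; 0 if T2 has passed. */
    static long delayUntilStart(long ntpNowMs, long startNtpMs) {
        return Math.max(0, startNtpMs - ntpNowMs);
    }

    /** Expected accompaniment position (TE) at NTP time now, given the agreed start time T2. */
    static long expectedPositionMs(long ntpNowMs, long startNtpMs) {
        return Math.max(0, ntpNowMs - startNtpMs);
    }

    /**
     * Compare TE with the player's current position (TC).
     * Returns the position to seek to, or -1 if the drift is within tolerance.
     */
    static long seekTargetOrNone(long ntpNowMs, long startNtpMs, long currentPositionMs, long toleranceMs) {
        long expected = expectedPositionMs(ntpNowMs, startNtpMs);
        return Math.abs(expected - currentPositionMs) > toleranceMs ? expected : -1;
    }

    public static void main(String[] args) {
        long t = 1_000_000L;  // NTP time T obtained in step 1
        long t2 = t + 3_000L; // start time T2 agreed in step 2 (3 s later)
        System.out.println(delayUntilStart(t, t2)); // 3000
        // 10 s after T2 the player reports 9,920 ms: 80 ms behind, so seek
        System.out.println(seekTargetOrNone(t2 + 10_000L, t2, 9_920L, DEFAULT_TOLERANCE_MS)); // 10000
        // 30 ms behind is within tolerance: no seek
        System.out.println(seekTargetOrNone(t2 + 10_000L, t2, 9_970L, DEFAULT_TOLERANCE_MS)); // -1
    }
}
```

    A user who joins midway can use `expectedPositionMs` as the initial seek target, so the same calculation also covers late joiners.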

    Lyrics Synchronization Solution

    In the lyrics synchronization solution, the three roles perform the following actions:
    Lead Singer: NTP time synchronization, black frame insertion, sending SEI messages, local lyric synchronization, and updating the lyrics control.
    Chorus: NTP time synchronization, local lyric synchronization, and updating the lyrics control.
    Audience: NTP time synchronization, receiving SEI messages, and updating the lyrics control.
    The lead singer and chorus members update the lyric progress locally from the playback progress of the synchronized accompaniment, while the audience updates the local lyric progress from the SEI messages sent by the lead singer, which carry the latest lyric progress. The overall process of the lyrics synchronization solution is shown in the following diagram.
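    Whichever source the progress comes from (the local playback callback for the lead singer and chorus, or SEI for the audience), updating the lyrics control reduces to finding the line whose start time is the latest one not after the current progress. A sketch assuming the lyric file has already been parsed into strictly increasing line start times in milliseconds; the parsing step and the `LyricTimeline` class are hypothetical.

```java
import java.util.Arrays;

/** Hypothetical lyric timeline: start time of each lyric line, in ms, strictly increasing. */
final class LyricTimeline {
    private final long[] lineStartMs;

    LyricTimeline(long[] lineStartMs) {
        this.lineStartMs = lineStartMs.clone();
    }

    /** Index of the line to highlight at progressMs, or -1 before the first line starts. */
    int lineAt(long progressMs) {
        int i = Arrays.binarySearch(lineStartMs, progressMs);
        // Exact match: that line starts right now. Otherwise binarySearch returns
        // -(insertionPoint) - 1, and the active line is the one before the insertion point.
        return i >= 0 ? i : -i - 2;
    }
}
```

    For example, with lines starting at 1000, 5000, and 9000 ms, a progress of 4999 ms maps to line 0 and 9500 ms maps to line 2. The binary search keeps each update O(log n), which matters little for a single song but costs nothing.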
    
    
    

    Music Scoring Integration Solution

    Scoring is an indispensable feature in the online Karaoke scenario. You need to obtain the standard audio files and MIDI pitch files of the music resources in advance; we then recommend using the music scoring feature of the Intelligent Music Platform to score the singer's voice captured by TRTC on-cloud recording. The overall process of the music scoring integration solution is shown in the following diagram.
    
    
    
    1. The business backend starts the on-cloud recording task when the singer begins to sing, and stops the on-cloud recording task when the singer finishes singing.
    2. The TRTC backend will upload the recorded singing clip media file to the COS bucket specified when initiating the recording task.
    3. After the recording file is uploaded, the TRTC backend sends the on-cloud recording results to the business backend through a callback.
    4. The business backend uses the Intelligent Music Platform's music scoring feature to create a music scoring task.
    5. The Intelligent Music Platform reads the singing clip and the standard pitch file from the COS bucket for scoring.
    6. The Intelligent Music Platform will write the JSON file containing the scoring results into the specified path in COS.
    7. After the music scoring is completed, the Intelligent Music Platform will call back the music scoring results to the business backend.
    8. The business backend reads the music scoring results JSON file from COS according to the callback path.
    9. The business backend analyzes the music scoring results and displays the scoring results on the singer's App.
    Note:
    Input files for the Intelligent Music Platform's music scoring must be in MP3 or WAV format. If the on-cloud recording is in HLS or AAC format, transcode it first.
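    Before creating the scoring task in step 4, the business backend can check whether the recording needs transcoding. A small sketch that decides this from the file extension, following the note above; treating the extension as a reliable indicator of the format, and the `ScoringInput` helper itself, are assumptions.

```java
import java.util.Locale;

/** Hypothetical helper: decide whether a recording must be transcoded before scoring. */
final class ScoringInput {
    /** Music scoring accepts MP3 or WAV; anything else (e.g. HLS .m3u8 or AAC) needs transcoding. */
    static boolean needsTranscoding(String recordingFileName) {
        String name = recordingFileName.toLowerCase(Locale.ROOT);
        return !(name.endsWith(".mp3") || name.endsWith(".wav"));
    }
}
```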

    Best Practices for Audio Tuning Strategies

    In the Karaoke scenario, audio quality depends mainly on the sampling rate, number of channels, bitrate, and 3A processing (acoustic echo cancellation, noise suppression, and automatic gain control). For each room scenario, we recommend the audio parameters and volume mixing schemes below, along with commonly used solutions for aligning vocals and accompaniment.
    1. Recommended Parameter Configuration for Different Scenarios
    Solo Singing: entry mode LIVE if video or CDN push is required, or VOICE_CHATROOM for audio-only or pure RTC; audio quality MUSIC; volume type TRTCSystemVolumeTypeMedia; hidden (experimental) interface enableBlackStream.
    Real-Time Chorus: entry mode LIVE if video or CDN push is required, or VOICE_CHATROOM for audio-only or pure RTC; audio quality MUSIC; volume type TRTCSystemVolumeTypeMedia; hidden (experimental) interfaces enableBlackStream, enableChorus, and setLowLatencyModeEnabled.
    Chat and Listen to Music: entry mode VOICE_CHATROOM; audio quality DEFAULT; volume type TRTCSystemVolumeTypeAuto; no hidden interfaces.
    2. Recommended Volume Ratio for Different Scenarios
    The TRTC SDK has default values for voice capture volume and music playback volume. If, with the defaults, the accompaniment masks the voice in the live streaming room, adjust the voice and music volumes to the recommended values below.
    Solo Singing and Real-Time Chorus: voice capture volume 60; music playback volume 50; reverb effect enabled.
    Chat and Listen to Music: voice capture volume 100; music playback volume 30; reverb effect disabled.
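    The recommended values in the two tables above can be kept as data so that one scenario choice drives all audio settings. The sketch below only encodes the tables (it makes no SDK calls); the `KaraokeScenario` enum is hypothetical, and sharing the 60/50 volumes between solo singing and real-time chorus reflects our reading of the volume table. On Android, the strings would map to constants such as `TRTCCloudDef.TRTC_APP_SCENE_LIVE`, `TRTC_APP_SCENE_VOICE_CHATROOM`, `TRTC_AUDIO_QUALITY_MUSIC`, and `TRTC_AUDIO_QUALITY_DEFAULT`, and the volumes would be applied through `TXAudioEffectManager` (for example `setVoiceCaptureVolume` and `setAllMusicVolume`); verify these names against your SDK version.

```java
import java.util.List;

/** Hypothetical encoding of the recommended settings tables; values copied from the tables. */
enum KaraokeScenario {
    SOLO_SINGING("LIVE", "MUSIC", "TRTCSystemVolumeTypeMedia", 60, 50, true,
            List.of("enableBlackStream")),
    REAL_TIME_CHORUS("LIVE", "MUSIC", "TRTCSystemVolumeTypeMedia", 60, 50, true,
            List.of("enableBlackStream", "enableChorus", "setLowLatencyModeEnabled")),
    CHAT_AND_LISTEN("VOICE_CHATROOM", "DEFAULT", "TRTCSystemVolumeTypeAuto", 100, 30, false,
            List.of());

    final String sceneForVideoOrCdn;     // entry mode when video or CDN push is required
    final String audioQuality;
    final String volumeType;
    final int voiceCaptureVolume;
    final int musicPlaybackVolume;
    final boolean reverb;
    final List<String> experimentalApis; // called via callExperimentalAPI

    KaraokeScenario(String sceneForVideoOrCdn, String audioQuality, String volumeType,
                    int voiceCaptureVolume, int musicPlaybackVolume, boolean reverb,
                    List<String> experimentalApis) {
        this.sceneForVideoOrCdn = sceneForVideoOrCdn;
        this.audioQuality = audioQuality;
        this.volumeType = volumeType;
        this.voiceCaptureVolume = voiceCaptureVolume;
        this.musicPlaybackVolume = musicPlaybackVolume;
        this.reverb = reverb;
        this.experimentalApis = experimentalApis;
    }

    /** Entry mode (app scene): VOICE_CHATROOM unless video or CDN push is required. */
    String appScene(boolean needsVideoOrCdn) {
        return needsVideoOrCdn ? sceneForVideoOrCdn : "VOICE_CHATROOM";
    }
}
```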
    3. Voice and Accompaniment Synchronization Alignment
    Even when the singer sings exactly in time with the lyrics and accompaniment, the remote audience may perceive some latency and misalignment among the vocals, accompaniment, and lyrics. The causes are the jitter buffer for local vocal capture, the jitter buffer for song playback mixing, and the gap between when the singer hears the accompaniment and when they start singing. The following two methods can reduce this effect.
    Enable Chorus Mode
    Android
```java
// Requires: import org.json.JSONException; import org.json.JSONObject;
JSONObject jsonObject = new JSONObject();
try {
    jsonObject.put("api", "enableChorus");
    JSONObject params = new JSONObject();
    params.put("enable", true);
    // 0 = vocal stream, 1 = accompaniment stream (see the note below)
    params.put("audioSource", 0);
    jsonObject.put("params", params);
    // Pass the JSON string directly; wrapping it in String.format is unnecessary
    mTRTCCloud.callExperimentalAPI(jsonObject.toString());
} catch (JSONException e) {
    e.printStackTrace();
}
```
    iOS
```objc
// Build the experimental-API JSON and enable chorus mode
NSDictionary *jsonDic = @{
    @"api": @"enableChorus",
    @"params": @{
        @"enable": @(YES),
        // 0 = vocal stream, 1 = accompaniment stream (see the note below)
        @"audioSource": @(0)
    }
};
NSData *jsonData = [NSJSONSerialization dataWithJSONObject:jsonDic options:NSJSONWritingPrettyPrinted error:nil];
NSString *jsonString = [[NSString alloc] initWithData:jsonData encoding:NSUTF8StringEncoding];
[trtcCloud callExperimentalAPI:jsonString];
```
    Note:
    audioSource is 0 for the vocal stream and 1 for the accompaniment stream. With single-instance streaming, set audioSource to 0 for all streams.
    Enable Low Latency Mode
    Android
```java
// Requires: import org.json.JSONException; import org.json.JSONObject;
JSONObject jsonObject = new JSONObject();
try {
    jsonObject.put("api", "setLowLatencyModeEnabled");
    JSONObject params = new JSONObject();
    params.put("enable", true);
    jsonObject.put("params", params);
    // Pass the JSON string directly; wrapping it in String.format is unnecessary
    mTRTCCloud.callExperimentalAPI(jsonObject.toString());
} catch (JSONException e) {
    e.printStackTrace();
}
```
    iOS
```objc
// Build the experimental-API JSON and enable low latency mode
NSDictionary *jsonDic = @{
    @"api": @"setLowLatencyModeEnabled",
    @"params": @{
        @"enable": @(YES)
    }
};
NSData *jsonData = [NSJSONSerialization dataWithJSONObject:jsonDic options:NSJSONWritingPrettyPrinted error:nil];
NSString *jsonString = [[NSString alloc] initWithData:jsonData encoding:NSUTF8StringEncoding];
[trtcCloud callExperimentalAPI:jsonString];
```

    Scenario Gameplay

    Solo Singing

    After becoming speakers, audience members can select songs and wait in line, then sing solo when their song starts. This play mode is relatively simple and can be implemented with TRTC single-instance mixing and streaming.

    Real-Time Chorus

    After becoming speakers, audience members sing a song together with the lead singer. This play mode is relatively complex: the lead singer must use TRTC dual-instance streaming, and all ends must handle accompaniment synchronization and lyric synchronization.

    Mass Singing Competition

    Users choose song rooms of different categories according to their preferences. The room plays random music clips, and any user in the room can grab the mic at any time to sing the current clip.

    Segmental Singing

    The same song is divided into segments assigned to different speakers. After the lead singer sings a segment, the other speakers sing their assigned segments in turn.

    Cross-Room Singing Competition

    Anchors in different rooms compete by singing, and the audience in each room supports its own anchor. Besides the Karaoke scenario, this play mode involves TRTC cross-room competition, so pay attention to the subscription logic for audio and video streams across rooms.

    Supporting Products for the Solution

    The supporting products, by system level, are as follows:
    Access Layer, Tencent Real-Time Communication (TRTC): Provides a low-latency, high-quality real-time interactive audio live streaming solution for multiple participants; it is the foundation for online Karaoke scenarios.
    Access Layer, Instant Messaging (IM): Provides room management and seat management based on group features, supports sending and receiving rich media messages such as room-wide messages and public screen messages, and meets custom signaling and other communication needs.
    Access Layer, Intelligent Music Platform: Based on Tencent Media Lab's self-developed music understanding technology, helps users understand, analyze, and create music, with capabilities such as lyric recognition, intelligent composition, song recognition, and music scoring.
    