TDMQ for RabbitMQ Broker Down

Last updated: 2024-09-26 15:49:18

    Background

    TDMQ for RabbitMQ is a message queue service independently developed by Tencent. It supports the AMQP 0-9-1 protocol and is fully compatible with all components and concepts of the open-source RabbitMQ. It also offers compute-storage separation and flexible scaling in and out. Its highly flexible routing accommodates a variety of business message delivery rules, and it can buffer upstream traffic pressure to keep the message system running stably. Persistence underpins the reliability of TDMQ for RabbitMQ: mark exchanges, queues, and messages as persistent to prevent loss of metadata and message content after a service restart. Messages are stored under a three-replica policy, which keeps 3 backups of user data available and allows quick data migration if a physical machine fails. The service availability reaches 99.95%.
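    The persistence settings described above can be sketched with an AMQP 0-9-1 client. This is a minimal illustration, assuming the open-source pika client for Python; the exchange, queue, and routing key names are hypothetical, and the helpers expect a channel that the caller has already opened.

```python
def declare_durable_topology(channel, exchange, queue, routing_key):
    """Declare a durable exchange and a durable queue, then bind them."""
    # durable=True keeps the exchange definition across broker restarts.
    channel.exchange_declare(exchange=exchange, exchange_type="direct", durable=True)
    # durable=True keeps the queue definition across broker restarts.
    channel.queue_declare(queue=queue, durable=True)
    channel.queue_bind(queue=queue, exchange=exchange, routing_key=routing_key)


def publish_persistent(channel, exchange, routing_key, body):
    """Publish one message marked persistent (delivery_mode=2)."""
    # Assumption: pika is installed. It is imported lazily so that the
    # declaration helper above works with any channel-like object.
    import pika

    channel.basic_publish(
        exchange=exchange,
        routing_key=routing_key,
        body=body,
        properties=pika.BasicProperties(delivery_mode=2),
    )
```

    All three pieces matter: a persistent message published to a non-durable queue is still lost when the queue definition disappears on restart.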
    To help users verify the disaster recovery capabilities of TDMQ for RabbitMQ when an entire availability zone fails, CFG provides an availability zone-level broker down fault experiment action for cross-AZ TDMQ for RabbitMQ instances, which simulates a real availability zone disaster. By running a disaster recovery experiment on the Broker, users can verify the disaster recovery capabilities of their business, test the durability mechanism, and evaluate the scope and duration of the fault's impact.
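    One way to evaluate the duration of the fault's impact is to probe the service at a fixed interval during the experiment and then summarize the failed probes. The sketch below covers only the summarizing step; the sample format, a list of (timestamp in seconds, probe succeeded) pairs, is an assumption made for illustration and is not something CFG produces.

```python
def outage_windows(samples):
    """Return (start, end) pairs, one per run of consecutive failed probes.

    samples: iterable of (timestamp_seconds, ok) pairs sorted by time.
    A window starts at the first failed probe and ends at the next
    successful one. A window still open when the data ends is closed at
    the last timestamp seen.
    """
    windows = []
    start = None
    last_ts = None
    for ts, ok in samples:
        if not ok and start is None:
            start = ts  # first failed probe of a new outage
        elif ok and start is not None:
            windows.append((start, ts))  # first successful probe ends it
            start = None
        last_ts = ts
    if start is not None:
        windows.append((start, last_ts))
    return windows


def total_outage_seconds(samples):
    """Sum the lengths of all outage windows."""
    return sum(end - start for start, end in outage_windows(samples))
```

    The measured duration is only as precise as the probe interval, so a shorter interval gives a tighter estimate of a momentary disconnection.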

    Must-Knows

    The broker down fault experiment action supports only TDMQ for RabbitMQ instances deployed across availability zones; fault injection cannot be performed on other instances. Upgrade an instance that is not deployed across availability zones to a cross-availability zone instance before conducting the experiment.
    To inject faults into multiple instances, split them into multiple action groups, with each action group processing one instance.
    After fault injection, producers or consumers may be disconnected momentarily. Proceed with caution.
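    Because producers and consumers may be disconnected momentarily, client code usually needs to retry. The sketch below is a generic retry helper with exponential backoff that uses only the Python standard library. The function name, the default delays, and the default exception types are illustrative assumptions; pass the connection exceptions raised by your own AMQP client through retry_on.

```python
import time


def call_with_retry(operation, retry_on=(ConnectionError, OSError), attempts=5,
                    base_delay=0.5, max_delay=8.0, sleep=time.sleep):
    """Run operation(); on a connection-type error, wait and try again."""
    for attempt in range(1, attempts + 1):
        try:
            return operation()
        except retry_on:
            if attempt == attempts:
                raise  # out of attempts: surface the last error to the caller
            # Exponential backoff: base_delay, then 2x, 4x, ... capped at max_delay.
            sleep(min(max_delay, base_delay * 2 ** (attempt - 1)))
```

    A producer could wrap its connect-and-publish routine as call_with_retry(send_order), where send_order is a hypothetical function that opens a connection and publishes one message.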

    Experiment Preparation

    Prepare a TDMQ for RabbitMQ instance deployed across availability zones that is ready for the experiment.

    Step 1: Create an experiment

    2. In the left sidebar, select the Experiment Management page, and click Create a New Experiment.
    3. Click Skip and create a blank experiment.
    4. After filling in the basic information, go to Experiment Object Configuration. Select the Middleware > RabbitMQ object type, and click Add Instance. All RabbitMQ instances in the target region are then listed, and you can filter them by Instance ID and Instance Name.
    5. After you select the target instance, click Add Now to add the experiment action.
    6. Select the experiment action Broker Down, and then click Next.
    7. Select the corresponding availability zone for injection, then click Confirm.
    8. Click Next to go to Global Configuration. See Quick Start for Global Configuration.
    9. After confirmation, click Submit.
    10. Click Experiment Details to start the experiment.

    Step 2: Execute the experiment

    1. Before executing the experiment, go to TDMQ > RabbitMQ > Cluster Management and open the RabbitMQ Console through the Web Console Access Address of the corresponding instance so that you can observe the instance.
    2. In the RabbitMQ Console, check the running status of each Broker.
    3. Because the experiment is manual, fault actions must be executed manually. Click Execute in Action Card to start fault injection.
    4. Once the fault injection succeeds, click the action card to view the execution details. They show that the fault was initiated and that the broker in the specified availability zone of the instance has gone offline.
    5. Go to the RabbitMQ Web Console, where one broker node is now shown as down and not running.
    The Tencent Cloud TDMQ > RabbitMQ > Cluster Management page also shows the instance in abnormal status, indicating that a broker has not started up.
    6. Click the Execute button of the recovery action to start the recovery.
    7. Recovery takes some time. Once it is complete, the RabbitMQ Web Console shows that the down broker node has restarted.
    The Tencent Cloud TDMQ > RabbitMQ > Cluster Management page also shows that the instance has returned to normal status.
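    The console checks in the steps above can also be scripted. Open-source RabbitMQ exposes node status through the management plugin's HTTP API at /api/nodes, where each node object carries a name and a running flag. The sketch below assumes that the Web Console Access Address of your instance serves that same API and accepts HTTP basic authentication; confirm both for your instance before relying on it. The function names are illustrative.

```python
import base64
import json
import urllib.request


def down_nodes(nodes):
    """Return the names of nodes whose 'running' flag is not true."""
    return [n.get("name", "<unknown>") for n in nodes if not n.get("running", False)]


def fetch_nodes(base_url, user, password, timeout=5):
    """GET <base_url>/api/nodes with basic auth and return the parsed JSON list."""
    request = urllib.request.Request(base_url.rstrip("/") + "/api/nodes")
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    request.add_header("Authorization", "Basic " + token)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)
```

    Calling down_nodes(fetch_nodes(url, user, password)) after fault injection should list the broker in the injected availability zone, and an empty list after recovery indicates that every node reports as running.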
    
    