Simulating MongoDB Storage Node Fault

Last updated: 2024-09-26 15:47:38

    Background

    TencentDB for MongoDB instances provide multiple data storage nodes and high availability (HA) mechanisms to ensure data security and service availability. To help users verify how their business uses TencentDB for MongoDB and how it recovers from disasters, Tencent Smart Advisor-Chaotic Fault Generator provides multiple fault scenarios that simulate storage node faults.

    Experiment Implementation

    Experiment Preparation

    Purchase a cross-availability-zone TencentDB for MongoDB instance, deploy a test environment on a local or cloud server, and connect it to the MongoDB instance.
    Prepare a script that simulates common client requests, for example:
    #!/usr/bin/python
    """
    Simulate simple reads and writes over a large number of records (set by element_num below). While the data is being inserted, inject faults through the Chaotic Fault Generator and observe the differences before and after fault injection.
    This script is for reference only. In real experiments, simulate faults with business scenarios that are closer to the production environment.
    """
    import pymongo
    import random

    # Replace this with the URI of the MongoDB instance where the fault is injected. See [Instance Details - Network Configuration - Connection Address] for the MongoDB instance.
    mongodbUri = 'mongodb://mongouser:thepasswordA1@ip:port/admin'
    client = pymongo.MongoClient(mongodbUri)
    # Select the database
    db = client.somedb
    # Drop the collection so that each run starts from an empty data set
    db.user.drop()
    # Number of documents to insert. A larger value leaves more time to observe the experiment.
    element_num = 3 * 10 ** 6
    for id in range(element_num):
        # Insert a random document
        name = random.choice(['R9', 'cat', 'owen', 'lee', 'J'])
        sex = random.choice(['male', 'female'])
        try:
            db.user.insert_one({'id': id, 'name': name, 'sex': sex})
        except Exception as e:
            # Report the failed id and the error, then continue with the next document
            print('error id', id, e)
    # Query all documents
    content = db.user.find()
    for i in content:
        print(i)

    Experiment Steps

    Step 1: Create an Experiment

    1. Log in to Tencent Smart Advisor > Chaotic Fault Generator, go to the Experiment Management page, and click Create a New Experiment.
    2. Click Skip to create a blank experiment.

    Step 2: Add MongoDB Instances and Actions

    1. In the experiment object configuration, enter or select a VPC to filter by, then add the MongoDB instances for the experiment in batches by instance ID.
    2. Add experiment actions.
    MongoDB master node restart: Simulates the impact of a MongoDB primary node restart on the business and exercises the MongoDB HA mechanism.
    MongoDB primary-secondary switch: Simulates a primary-secondary node switch in MongoDB and the scenario in which the node IP changes after the switch. The action supports two execution modes: prefer a switch within the same availability zone, or prefer a cross-availability-zone switch.
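Because the node IP can change after a primary-secondary switch, a client that is pinned to a single `ip:port` may lose track of the primary. One common way to reduce that risk is a connection string that lists several members and names the replica set. The sketch below only builds such a string; the helper name `build_replica_set_uri`, the addresses, and the replica set name `rs0` are illustrative assumptions, and the connection address shown in the instance's console should take precedence.

```python
from urllib.parse import quote_plus


def build_replica_set_uri(user, password, hosts, replica_set, auth_db="admin"):
    """Build a MongoDB URI that lists several replica set members as seeds."""
    if not hosts:
        raise ValueError("at least one host:port pair is required")
    # Percent-encode credentials so characters such as '@' or ':' stay valid in the URI.
    credentials = f"{quote_plus(user)}:{quote_plus(password)}"
    seed_list = ",".join(hosts)
    return f"mongodb://{credentials}@{seed_list}/{auth_db}?replicaSet={replica_set}"


uri = build_replica_set_uri(
    "mongouser",
    "thepasswordA1",
    ["10.0.0.1:27017", "10.0.0.2:27017"],  # placeholder addresses, not real nodes
    "rs0",  # placeholder replica set name
)
print(uri)
```

The resulting string can be passed to `pymongo.MongoClient` in place of the single-host `mongodbUri` used in the reference script.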

    Step 3: Add Monitoring Metrics as Required

    Step 4: Go to Experiment Details and Execute an Experiment

    Check Results

    Use the script to simulate user behavior in a production environment, and observe the response in three phases: before, during, and after the fault.
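To compare the three phases, it can help to count successful and failed operations per second and read the numbers next to the fault injection time. The following is a minimal sketch of such a counter. The class name `InsertTimeline` is hypothetical, and the timestamps are synthetic stand-ins for `time.time()` values that a real run would record around each `insert_one` call.

```python
from collections import Counter


class InsertTimeline:
    """Count successful and failed operations per one-second bucket."""

    def __init__(self):
        self.ok = Counter()
        self.failed = Counter()

    def record(self, timestamp, success):
        """Register one operation that finished at the given timestamp."""
        bucket = int(timestamp)
        if success:
            self.ok[bucket] += 1
        else:
            self.failed[bucket] += 1

    def report(self):
        """Return (second, ok_count, failed_count) tuples in time order."""
        seconds = sorted(set(self.ok) | set(self.failed))
        return [(s, self.ok[s], self.failed[s]) for s in seconds]


timeline = InsertTimeline()
# Synthetic samples: two successes, one failure during the fault, one success after it.
for ts, success in [(100.1, True), (100.7, True), (101.2, False), (102.5, True)]:
    timeline.record(ts, success)
for row in timeline.report():
    print(row)
```

A drop in the success count together with a rise in failures marks the fault window, and a later spike in successes corresponds to the backlog being drained.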

    Primary Node Restart

    Before a Fault
    Business Performance: A large volume of data is continuously inserted into the database, and MongoDB processes it at a stable rate.
    Instance State: The primary, secondary, and hidden nodes are configured and in effect.
    During a Fault
    Business Performance: When an insertion from the script fails, it can be retried through a configuration-driven retry mechanism. The MongoDB node abnormality is detected, and a temporary data backlog builds up.
    Instance State: A new primary node is selected from the secondary nodes. Hidden nodes are not selected, so the original secondary node becomes the primary node and the original primary node becomes a secondary node.
    After a Fault
    Business Performance: The backlog and the newly arriving data are inserted at the same time, so MongoDB processes data at nearly twice the previous rate.
    Instance State: MongoDB automatically restores the primary-secondary relationship and selects the node with the higher weight as the primary node. If the weights are identical, the roles remain unchanged.
    Conclusion
    After the experiment completes, check that the data in the database is normal, that the business was not noticeably affected by the fault, and that the overall result is as expected.
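The reference script only prints the failed id and moves on, so documents that fail during the restart are never written. The retry behavior described for the fault phase could be sketched as below. This is an assumption about how a client might retry, not the product's own mechanism: `insert_with_retry` is a hypothetical helper, and the flaky function stands in for a real insert. With pymongo, the operation would typically be something like `lambda: db.user.insert_one(doc)` with `retry_on=(pymongo.errors.AutoReconnect,)`.

```python
import time


def insert_with_retry(operation, retry_on=(Exception,), attempts=5, delay=0.5, sleep=time.sleep):
    """Call operation() until it succeeds or the attempts are used up."""
    for attempt in range(1, attempts + 1):
        try:
            return operation()
        except retry_on:
            if attempt == attempts:
                raise  # give up and surface the last error
            sleep(delay * attempt)  # linear backoff between attempts


# Stand-in for an insert that fails twice while a new primary is being selected.
calls = {"count": 0}


def flaky_insert():
    calls["count"] += 1
    if calls["count"] < 3:
        raise ConnectionError("primary unavailable")
    return "inserted"


result = insert_with_retry(flaky_insert, retry_on=(ConnectionError,), delay=0)
print(result, calls["count"])
```

The `attempts` and `delay` values are placeholders; in a real experiment they would be tuned so that the total retry window is longer than the observed failover time.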

    Primary-secondary Switch

    Before a Fault
    Business Performance: A large volume of data is continuously inserted into the database, and MongoDB processes it at a stable rate.
    Instance State: The primary, secondary, and hidden nodes are configured and in effect.
    During a Fault
    Business Performance: The primary role switches to another node, and the business sees behavior similar to a node restart, resulting in a short-term data backlog.
    Instance State: Depending on the action mode, a secondary node in the same availability zone or in a different availability zone is preferentially selected and promoted to primary. The original primary node is restarted as a secondary node.
    After a Fault
    Business Performance: After the availability zone switch, business operations are normal, and data insertion and query operations are unaffected.
    Manual Recovery from a Fault
    Instance State: Execute the fault recovery action to restore the state before the fault.
    Conclusion
    The business is not noticeably affected by the fault, the data in the database is intact, and overall performance is normal and as expected.
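One way to back up the "data is intact" conclusion is to compare the `id` values stored in the collection with the range the script tried to insert. The sketch below is a minimal check under that assumption. `check_ids` is a hypothetical helper, and the sample list stands in for ids that could be read with a query such as `db.user.find({}, {'id': 1, '_id': 0})`.

```python
from collections import Counter


def check_ids(found_ids, expected_count):
    """Return (missing, duplicated) ids for the expected range 0..expected_count-1."""
    counts = Counter(found_ids)
    missing = [i for i in range(expected_count) if counts[i] == 0]
    duplicated = sorted(i for i, n in counts.items() if n > 1)
    return missing, duplicated


# Sample data: id 2 was never written and id 1 was written twice.
missing, duplicated = check_ids([0, 1, 1, 3, 4], expected_count=5)
print("missing:", missing, "duplicated:", duplicated)
```

Missing ids point to inserts that failed during the fault and were not retried, while duplicated ids suggest a retry repeated a write that had already succeeded.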
    