Background
MongoDB instances provide multiple data storage nodes and HA mechanisms to ensure data security and high service availability. To verify how your business uses TencentDB for MongoDB and how it handles disaster recovery, Tencent Smart Advisor-Chaotic Fault Generator provides multiple fault scenarios that simulate storage node faults.
Experiment Implementation
Experiment Preparation
Purchase cross-availability zone cloud MongoDB instances, deploy a local or cloud server-side test environment, and connect to the MongoDB instances.
Prepare a script that simulates common client requests, for example:
"""
This script simulates operations on a large data volume (element_num records) through simple data reads and writes. While the data is being inserted, inject faults through the Chaotic Fault Generator and observe the differences before and after fault injection.
This script is for reference only. In real experiments, it is advisable to simulate faults with business scenarios that are closer to the production environment.
"""
import random

import pymongo

# Replace ip and port with the address of the instance under test.
mongodbUri = 'mongodb://mongouser:thepasswordA1@ip:port/admin'
client = pymongo.MongoClient(mongodbUri)
db = client.somedb
db.user.drop()  # Start from an empty collection.
element_num = 3 * 10 ** 6
for id in range(element_num):
    name = random.choice(['R9', 'cat', 'owen', 'lee', 'J'])
    sex = random.choice(['male', 'female'])
    try:
        db.user.insert_one({'id': id, 'name': name, 'sex': sex})
    except Exception as e:
        # Log the failed record and keep going so the fault window stays visible.
        print('error id', id, e)
# Read back and print every record that was inserted.
content = db.user.find()
for i in content:
    print(i)
Experiment Steps
Step 1: Create an Experiment
2. Click Skip and create a blank experiment.
Step 2: Add MongoDB Instances and Actions
1. In the experiment object configuration, filter by entering or selecting a VPC, and add MongoDB instances to the experiment by entering instance IDs in batches.
2. Add experiment actions.
MongoDB primary node restart: Simulate the impact of a MongoDB primary node restart on the business and on the MongoDB HA mechanism.
MongoDB primary-secondary switch: Simulate a primary-secondary node switch in MongoDB and the scenario in which the node IP changes after the switch. The action supports two execution modes: prefer switching within the same availability zone, or prefer switching across availability zones.
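Because the node IP can change after a switch, one common way to keep a client connected is to list several replica set members in the connection string instead of a single address. The following is a minimal sketch, assuming the instance exposes more than one member address and a replica set name. `build_mongo_uri`, the host list, and the set name `rs0` are hypothetical and are not taken from the TencentDB console.

```python
from urllib.parse import quote_plus


def build_mongo_uri(user, password, hosts, database="admin", replica_set=None):
    """Build a MongoDB connection string that lists several members."""
    if not hosts:
        raise ValueError("at least one host:port pair is required")
    # Percent-encode credentials so characters such as '@' or ':' stay valid.
    credentials = f"{quote_plus(user)}:{quote_plus(password)}"
    uri = f"mongodb://{credentials}@{','.join(hosts)}/{database}"
    if replica_set:
        # replicaSet is a standard connection string option.
        uri += f"?replicaSet={replica_set}"
    return uri


# Hypothetical addresses and set name, for illustration only.
example_uri = build_mongo_uri(
    "mongouser", "thepasswordA1",
    ["10.0.0.1:27017", "10.0.0.2:27017"],
    replica_set="rs0",
)
```

The resulting string can be passed to `pymongo.MongoClient` in place of the single-address URI used in the reference script.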
Step 3: Add Monitoring Metrics as Required
Step 4: Go to Experiment Details and Execute an Experiment
Check Results
Use the script to simulate user behavior in a production environment, and observe the response at three stages: before, during, and after a fault.
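To compare the three stages, it helps to record when each insert succeeds and then count the successes per time window. The sketch below is one hedged way to do that. `inserts_per_window` is a hypothetical helper, and the timestamps are assumed to come from `time.monotonic()` calls placed after each successful `insert_one` in the script.

```python
from collections import Counter


def inserts_per_window(timestamps, window_seconds=1.0):
    """Count successful inserts in consecutive fixed-size time windows.

    Element k of the result is the number of inserts whose timestamp falls
    in window k, counted from the earliest timestamp.
    """
    if not timestamps:
        return []
    start = min(timestamps)
    counts = Counter(int((t - start) // window_seconds) for t in timestamps)
    return [counts.get(k, 0) for k in range(max(counts) + 1)]


# Illustrative timestamps: steady inserts, a gap (the fault), then a burst.
sample = [0.0, 0.5, 1.0, 1.5, 4.0, 4.2, 4.4, 4.6]
rates = inserts_per_window(sample)
```

Windows with a count of zero mark the period in which no insert succeeded, and a higher count afterwards corresponds to the backlog being drained.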
Primary Node Restart
Before a Fault
Business Performance: A large volume of data is continuously inserted into the database, and MongoDB processes it at a stable rate.
Instance State: The primary, secondary, and hidden nodes have been configured and are in effect.
During a Fault
Business Performance: When an insertion issued by the script fails, it can be retried through a configuration-driven retry mechanism. The MongoDB node abnormality is detected, and a temporary data backlog occurs.
Instance State: A new primary node is selected from the secondary nodes. Because hidden nodes are not selected, the original secondary node becomes the primary node, and the original primary node becomes a secondary node.
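The retry behavior mentioned above can be sketched as a small wrapper with exponential backoff. This is a minimal illustration, not the mechanism that the product or the driver actually uses. `retry_with_backoff` and its parameters are hypothetical. With pymongo, the retriable exception would typically be `pymongo.errors.AutoReconnect`, which the driver raises when the connection to the server is lost.

```python
import time


def retry_with_backoff(operation, max_attempts=5, base_delay=0.5,
                       retriable=(Exception,), sleep=time.sleep):
    """Call operation() until it succeeds or max_attempts is reached.

    The delay doubles after each failed attempt: base_delay, 2x, 4x, ...
    The last exception is re-raised when all attempts fail.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return operation()
        except retriable:
            if attempt == max_attempts:
                raise
            sleep(base_delay * 2 ** (attempt - 1))


# Hypothetical usage with the script's insert (requires pymongo):
#   retry_with_backoff(
#       lambda: db.user.insert_one({'id': id, 'name': name, 'sex': sex}),
#       retriable=(pymongo.errors.AutoReconnect,),
#   )
```

A retried insert can be applied twice if the first attempt reached the server before the connection dropped, so a unique key on the record is worth considering when retries are enabled.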
After a Fault
Business Performance: The backlogged data and the newly arriving data are inserted at the same time, so MongoDB processes data at nearly twice the normal rate.
Instance State: MongoDB automatically restores the primary-secondary relationship and selects the node with the higher weight as the primary node. If the weights are identical, the roles remain unchanged.
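One way to observe the role change from the client side is to compare replies of the `hello` command taken before and after the fault. This is a sketch under stated assumptions: the replies are plain dictionaries such as those returned by `client.admin.command('hello')` in pymongo, and the `primary` field follows the MongoDB replica set reply format. `primary_changed` and the sample addresses are hypothetical, and this document does not confirm whether a TencentDB endpoint exposes member addresses in this way.

```python
def primary_changed(before, after):
    """Return (changed, old_primary, new_primary) from two hello replies."""
    old_primary = before.get("primary")
    new_primary = after.get("primary")
    changed = (old_primary is not None and new_primary is not None
               and old_primary != new_primary)
    return changed, old_primary, new_primary


# Illustrative replies, trimmed to the one field used here.
reply_before = {"setName": "rs0", "primary": "10.0.0.1:27017"}
reply_after = {"setName": "rs0", "primary": "10.0.0.2:27017"}
result = primary_changed(reply_before, reply_after)
```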
Conclusion
After the experiment completes, verify that the database data is normal, that the business is insensitive to the fault, and that the overall behavior is as expected.
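Checking that the data is normal can be made concrete by comparing the `id` values read back from the collection against the range the script tried to insert. The following is a hedged sketch. `check_ids` is a hypothetical helper, and it assumes each record carries the sequential `id` field used by the reference script.

```python
from collections import Counter


def check_ids(found_ids, expected_count):
    """Report ids missing from, or duplicated in, the inserted data.

    Expected ids are 0 .. expected_count - 1, matching range(element_num).
    """
    counts = Counter(found_ids)
    missing = [i for i in range(expected_count) if i not in counts]
    duplicated = sorted(i for i, n in counts.items() if n > 1)
    return missing, duplicated


# Illustrative input: id 2 was lost and id 3 was written twice.
missing, duplicated = check_ids([0, 1, 3, 3, 4], expected_count=5)
```

With pymongo, the ids could be gathered with `(doc['id'] for doc in db.user.find({}, {'id': 1}))` and passed in together with `element_num`.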
Primary-secondary Switch
Before a Fault
Business Performance: A large volume of data is continuously inserted into the database, and MongoDB processes it at a stable rate.
Instance State: The primary, secondary, and hidden nodes have been configured and are in effect.
During a Fault
Business Performance: An availability zone switch occurs on the node, and the business experiences a situation similar to a node restart, resulting in a short-term data backlog.
Instance State: Depending on the action mode, a secondary node in the same availability zone or in a different availability zone is preferentially selected and promoted to primary. The original primary node is restarted as a secondary node.
After a Fault
Business Performance: After the availability zone switch, business operations are normal, and data insertion and data queries are unaffected.
Manual Recovery from a Fault
Instance State: Execute the fault recovery action to restore the instance to its pre-fault state.
Conclusion
The business is insensitive to the fault, the data in the database is intact, and overall performance is normal and as expected.