Background
Message middleware plays a crucial role in distributed systems. In production environments, however, several factors can cause high disk I/O load on broker nodes. Common real-world scenarios include:
High-concurrency writes: A large number of messages are written to the broker at the same time. This can occur in high-traffic message production environments, such as large-scale real-time data transmission or high-frequency log recording.
Message storage pressure: The broker stores a very large volume of messages, or the individual messages are large. This can occur in long-running systems or in applications that need to retain a large number of historical messages.
High volume of message replication and synchronization: When the broker is part of a cluster, replicating and synchronizing a large number of messages increases the disk I/O load. This includes copying messages from one broker to others to ensure high availability and data redundancy.
Index and retrieval operations: If message storage on the broker uses index structures, maintaining and reading those indexes at large data volumes can also increase the disk I/O load. This can occur in applications that require quick message retrieval, such as querying messages based on specific conditions.
Disk fault and recovery: When a disk fault occurs or data recovery is needed, the broker performs disk I/O-intensive operations, such as restoring data from backups or repairing disks.
To address these scenarios, CFG provides a CKafka Broker high disk I/O load experiment action. This action tests how well the business system handles unexpected delays and recovers when CKafka brokers are under high disk I/O load, which helps improve business security and stability.
Must-Knows
Instance type: Fault injection with this action is available only for CKafka Pro Edition instances. CKafka Standard Edition instances do not support experiments yet.
Instance status (recommended, not mandatory): Use an instance that carries real message production and consumption traffic and has more than 3 topic partitions, so that the impact of the fault on the business is easier to observe.
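The two items above can be turned into a small pre-flight check before an experiment is created. The following is a minimal sketch only: the `instance` dictionary and its keys (`edition`, `partition_count`, `has_live_traffic`) are hypothetical names chosen for illustration and do not correspond to any CKafka or CFG API response. It separates the hard requirement (Pro Edition) from the non-mandatory recommendations.

```python
from typing import Any, Dict, List, Tuple

# "Greater than 3" partitions, as recommended in the Must-Knows.
MIN_RECOMMENDED_PARTITIONS = 4


def precheck(instance: Dict[str, Any]) -> Tuple[List[str], List[str]]:
    """Return (blockers, warnings) for a hypothetical instance description.

    The dictionary keys are assumptions for this sketch, not real API fields.
    """
    blockers: List[str] = []
    warnings: List[str] = []

    # Hard requirement: only Pro Edition instances support this fault action.
    if instance.get("edition") != "pro":
        blockers.append("Instance is not CKafka Pro Edition.")

    # Recommendation: more than 3 topic partitions.
    if int(instance.get("partition_count", 0)) < MIN_RECOMMENDED_PARTITIONS:
        warnings.append("Fewer than 4 topic partitions; impact may be hard to observe.")

    # Recommendation: real production and consumption traffic.
    if not instance.get("has_live_traffic", False):
        warnings.append("No live traffic; impact may be hard to observe.")

    return blockers, warnings
```

An empty `blockers` list means the instance meets the mandatory requirement; entries in `warnings` do not prevent the experiment but suggest that its effect may be less visible.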
Experiment Preparation
Prepare a CKafka Pro Edition instance for the experiment.
Step 1: Create an experiment
2. In the left sidebar, select Experiment Management, and click Create a New Experiment.
3. Select Skip and create a blank experiment.
4. After filling in the basic information, go to Experiment Object Configuration. Select Middleware > CKafka as the object type, and click Add Instance. All CKafka instances in the target region are then listed. You can filter instances by Instance ID, VPC ID, Subnet ID, and Tags.
5. After you select the target instance, click Add Now to add the experiment action.
6. Select the experiment action Broker disk IO high load, and then click Next.
7. After setting action parameters, click Confirm.
8. Click Next to enter the Global Configuration. For global configuration, see Quick Start.
9. After confirmation, click Submit.
10. Click Experiment Details to start the experiment.
Step 2: Execute the experiment
Note:
Currently, the CKafka console does not provide disk I/O load monitoring metrics. When CKafka traffic is high, you can observe the production and consumption duration monitoring metrics instead. For disk I/O load monitoring data, submit a ticket to contact CKafka Ops.
1. Before the experiment, observe the instance monitoring data, focusing on the monitoring metrics in Advanced Monitoring. You can view them in the CKafka console.
2. Because the experiment is executed manually, fault actions must be triggered manually. Click Execute in the Action Card to start fault injection.
3. While fault injection is in progress, you can click the links in the logs to view monitoring metrics such as production and consumption duration in Advanced Monitoring.
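Because disk I/O load is not directly visible in the console, production duration can also be sampled on the client side and compared before and during fault injection. The following is a minimal sketch under stated assumptions: `send` is a hypothetical blocking callable that you supply (for example, a wrapper around your Kafka client's synchronous produce call), and the `factor` threshold of 2.0 is an arbitrary illustration, not a value recommended by CKafka or CFG.

```python
import statistics
import time
from typing import Callable, Dict, List


def measure_send_latency_ms(send: Callable[[bytes], None],
                            payload: bytes, count: int) -> List[float]:
    """Call the user-supplied blocking `send` `count` times; return durations in ms."""
    samples: List[float] = []
    for _ in range(count):
        start = time.perf_counter()
        send(payload)  # hypothetical: must block until the broker acknowledges
        samples.append((time.perf_counter() - start) * 1000.0)
    return samples


def summarize(samples: List[float]) -> Dict[str, float]:
    """Return average, nearest-rank p99, and maximum of the samples."""
    ordered = sorted(samples)
    # Nearest-rank percentile: ceil(0.99 * n) - 1, computed with integers.
    p99_index = max(0, (99 * len(ordered) + 99) // 100 - 1)
    return {
        "avg": statistics.fmean(ordered),
        "p99": ordered[p99_index],
        "max": ordered[-1],
    }


def is_degraded(baseline: Dict[str, float], during: Dict[str, float],
                factor: float = 2.0) -> bool:
    """True if p99 during the fault exceeds `factor` times the baseline p99."""
    return during["p99"] > factor * baseline["p99"]
```

One way to use it is to collect a baseline with `summarize(measure_send_latency_ms(send, payload, 1000))` before clicking Execute, repeat the measurement while fault injection is in progress, and pass both summaries to `is_degraded`. Client-side numbers include network time, so they complement rather than replace the Advanced Monitoring metrics.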