Background
Message middleware plays a crucial role in distributed systems. In production environments, however, several factors can drive up the CPU load on broker nodes. Common scenarios include:
High message throughput: If a specific topic or partition within the CKafka cluster receives a very high message throughput, the broker node needs to process a large number of read and write operations.
Large number of consumer groups: If a large number of consumer groups subscribe to the same topic or partition, the broker must distribute and manage messages for each consumer group.
Replication and synchronization: If the CKafka cluster has data replication and synchronization enabled, the broker must handle the read and write operations for replication tasks as well as the synchronization traffic with other brokers.
Compression and decompression: If messages are stored compressed, the broker needs to perform compression and decompression operations, which may consume significant CPU resources.
Index maintenance and log compaction: CKafka uses indexes to speed up message lookups. If an index grows too large or logs need compaction, the broker must perform index maintenance and compaction operations.
High concurrent connections: If there are a large number of producers and consumers connected to the broker, the broker needs to process the establishment and maintenance of these connections, increasing CPU load.
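To get a feel for the compression cost mentioned in the list above, the following minimal Python sketch measures the CPU time spent compressing and decompressing a sample payload. It is an illustration only: zlib from the Python standard library stands in for whichever codec your producers use, the function name compression_cpu_cost and the sample payload are hypothetical, and the result says nothing about how a CKafka broker works internally.

```python
import time
import zlib


def compression_cpu_cost(payload: bytes, rounds: int = 50) -> dict:
    """Measure CPU seconds spent compressing and decompressing a payload.

    Hypothetical helper for illustration; zlib is a stand-in codec.
    """
    if rounds < 1:
        raise ValueError("rounds must be at least 1")

    # process_time() counts CPU time of this process, not wall-clock time.
    start = time.process_time()
    for _ in range(rounds):
        compressed = zlib.compress(payload, 6)
    compress_cpu_s = time.process_time() - start

    start = time.process_time()
    for _ in range(rounds):
        restored = zlib.decompress(compressed)
    decompress_cpu_s = time.process_time() - start

    return {
        "original_bytes": len(payload),
        "compressed_bytes": len(compressed),
        "compress_cpu_s": compress_cpu_s,
        "decompress_cpu_s": decompress_cpu_s,
        "roundtrip_ok": restored == payload,
    }


if __name__ == "__main__":
    # Hypothetical message batch: repetitive JSON-like text compresses well.
    sample = b'{"order_id": 12345, "status": "PAID"}' * 2000
    print(compression_cpu_cost(sample))
```

Running the sketch with payloads that resemble your own messages gives a rough idea of how much CPU time compression adds per batch on the machine where it runs.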
When a broker node experiences high CPU load, several issues may arise:
Increased delay: High CPU load can slow down message processing, which increases message transmission and processing delays. Consumers may read messages from CKafka more slowly and may not receive the latest messages in time.
Decreased throughput: When CPU resources are occupied by high-load tasks, the CKafka Broker may be unable to process additional messages, which lowers overall throughput. This affects both the speed at which producers send messages and the speed at which consumers consume them.
Network congestion: High CPU load may prevent the CKafka Broker from processing network requests in time, leading to network congestion. This affects data replication and synchronization with other brokers and can cause increased replication delays or untimely synchronization.
Increased response time: Under high CPU load, the CKafka Broker may be unable to respond to client requests in time, so clients wait longer. This affects the performance and response time of applications that interact with the CKafka cluster.
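The relationship between load and delay described above can be illustrated with a simple queueing model. The sketch below assumes a single-server M/M/1 queue, where the mean response time equals the service time divided by (1 - utilization). This is a textbook simplification and not a model of CKafka itself; the function name and the 2 ms service time are hypothetical values chosen for illustration.

```python
def mean_response_time(service_time_s: float, utilization: float) -> float:
    """Mean response time of an M/M/1 queue (assumed model, not CKafka-specific).

    service_time_s: average time to serve one request on an idle server.
    utilization: fraction of server capacity in use, 0 <= utilization < 1.
    """
    if service_time_s <= 0:
        raise ValueError("service_time_s must be positive")
    if not 0.0 <= utilization < 1.0:
        raise ValueError("utilization must be in [0, 1)")
    return service_time_s / (1.0 - utilization)


if __name__ == "__main__":
    # Hypothetical 2 ms service time at several utilization levels.
    for rho in (0.5, 0.8, 0.9, 0.95):
        latency_ms = mean_response_time(0.002, rho) * 1000
        print(f"utilization={rho:.2f} mean_response_time_ms={latency_ms:.1f}")
```

Under this assumption, response time grows slowly at moderate utilization and rises sharply as utilization approaches 100%, which is why a broker that looks healthy at moderate load can degrade quickly once its CPU is close to saturation.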
To address these issues, CFG provides a CKafka Broker high CPU load experiment action. The action tests how well the business system handles unexpected delays and recovers when CKafka Brokers are under high CPU load, which helps improve business security and stability.
Must-Knows
Instance type: Fault injection with this action is available only on CKafka Pro Edition instances. CKafka Standard Edition instances are not yet supported.
Instance status (recommended, not mandatory): Use an instance that carries real message production and consumption traffic and has more than 3 topic partitions, so that the impact of the fault on the business is easier to observe.
Experiment Preparation
Prepare a CKafka Pro Edition instance for the experiment.
Step 1: Create an experiment
2. In the left sidebar, select Experiment Management, and click Create a New Experiment.
3. Click Skip and create a blank experiment.
4. After filling in the basic information, go to Experiment Object Configuration. Select the Middleware > CKafka object type, and click Add Instance. All CKafka instances in the target region are then listed. You can filter instances by Instance ID, VPC ID, Subnet ID, and Tags.
5. After you select the target instance, click Add Now to add the experiment action.
6. Select the experiment action Broker-CPU High Load, and then click Next.
7. Set the action parameters. This document uses a CPU load rate of 80% and a duration of 200 s. Then click Confirm.
8. Click Next to enter the Global Configuration. For global configuration, see Quick Start.
9. After confirmation, click Submit.
10. Click Experiment Details to start the experiment.
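To make the two action parameters from step 7 concrete, the sketch below shows one common way a target CPU load rate can be approximated on a single core: a duty cycle that spins for a fraction of each short period and sleeps for the rest. This is a conceptual illustration only. It is not how CFG implements the Broker-CPU High Load action, and the function names duty_cycle and burn_one_core and the 0.1 s period are assumptions.

```python
import time


def duty_cycle(load_rate: float, period_s: float = 0.1) -> tuple[float, float]:
    """Split one period into (busy, idle) seconds for the requested load rate."""
    if not 0.0 <= load_rate <= 1.0:
        raise ValueError("load_rate must be between 0 and 1")
    if period_s <= 0:
        raise ValueError("period_s must be positive")
    busy_s = period_s * load_rate
    return busy_s, period_s - busy_s


def burn_one_core(load_rate: float, duration_s: float, period_s: float = 0.1) -> float:
    """Keep one core busy for roughly load_rate of duration_s.

    Returns the number of seconds scheduled as busy time.
    Hypothetical illustration; not the CFG implementation.
    """
    busy_s, idle_s = duty_cycle(load_rate, period_s)
    deadline = time.monotonic() + duration_s
    total_busy_s = 0.0
    while time.monotonic() < deadline:
        spin_until = time.monotonic() + busy_s
        while time.monotonic() < spin_until:
            pass  # busy-wait: this is the part that consumes CPU
        total_busy_s += busy_s
        time.sleep(idle_s)  # yield the core for the rest of the period
    return total_busy_s


if __name__ == "__main__":
    # The document uses 80% for 200 s; a short run is enough to see the effect.
    print(burn_one_core(load_rate=0.8, duration_s=0.5))
```

A short period keeps the average load close to the target when it is sampled over one-second or longer monitoring intervals. Run such a loop only on a test machine you own, never against production brokers.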
Step 2: Execute the experiment
1. Before the experiment, observe the instance monitoring data, focusing on the metrics in Advanced Monitoring. You can view them in the CKafka console.
2. Because the experiment is executed manually, fault actions must also be triggered manually. Click Execute in the Action Card to start fault injection.
3. While fault injection is in progress, you can click the links in the logs to view the metrics in Advanced Monitoring.
4. Observe that the CPU utilization reaches the set value.
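If you export CPU utilization samples from monitoring, the check in step 4 can be expressed as a small script. The sketch below is a minimal example under stated assumptions: the function name reached_target, the 5-point tolerance, the requirement of 3 consecutive samples, and the sample values are all hypothetical and do not come from CKafka monitoring.

```python
from typing import Sequence


def reached_target(
    samples_pct: Sequence[float],
    target_pct: float,
    tolerance_pct: float = 5.0,
    min_consecutive: int = 3,
) -> bool:
    """Return True if CPU utilization stays at or near the target long enough.

    A sample counts when it is at least target_pct - tolerance_pct. The check
    passes once min_consecutive such samples occur in a row.
    """
    if min_consecutive < 1:
        raise ValueError("min_consecutive must be at least 1")
    threshold = target_pct - tolerance_pct
    streak = 0
    for value in samples_pct:
        streak = streak + 1 if value >= threshold else 0
        if streak >= min_consecutive:
            return True
    return False


if __name__ == "__main__":
    # Hypothetical per-minute CPU utilization (%) around a fault injection.
    samples = [22.0, 35.0, 78.0, 81.0, 80.0, 79.0, 30.0]
    print(reached_target(samples, target_pct=80.0))
```

Requiring several consecutive samples avoids treating a single spike as evidence that the injected load took effect.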