Background
A CVM kernel fault can cause business failures and undermine the stability of the whole system. Kernel faults, and the CVM failures that follow, may be triggered by hardware faults, kernel software defects, driver problems, incompatibilities, and other issues. For services that depend on high availability, such failures can cause significant inconvenience and loss to users.
To improve business reliability and stability, run kernel fault experiments. An experiment verifies how a kernel fault affects your business and exposes the issues it causes in advance, so that real faults can be resolved quickly and effectively. When handling a kernel fault, assign personnel with sufficient system knowledge and experience to avoid further damage to the system.
Experiment Implementation
Step 1: Experiment Preparation
Prepare several CVM instances on which TencentCloud Automation Tools (TAT) has been installed.
Step 2: Experiment Orchestration
2. Click Skip and create a blank experiment. Fill in the basic information of the experiment and action groups, and add a target CVM instance.
3. Add an experiment action: select the Kernel Fault action under CPU Resources, click Next, and go to parameter configuration.
4. Configure the fault action parameters. This action has no required parameters, so you can click Confirm to finish adding it.
5. Confirm the configuration, and click Submit experiment to complete the creation.
Step 3: Experiment Execution
1. Go to experiment details, and click Go to the action group for execution.
2. Click Execute to start fault task assignment.
3. Check the fault results: existing connections are interrupted and the instance restarts.
4. Execute recovery actions.
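One way to confirm the restart described in the fault results is to compare the instance's boot time before and after the experiment. The following is a minimal sketch, assuming a Linux CVM instance where /proc/stat is available; the function names parse_btime, read_boot_time, and has_rebooted are hypothetical helpers for illustration and are not part of the platform.

```python
from pathlib import Path


def parse_btime(stat_text: str) -> int:
    """Return the boot time (seconds since the epoch) from /proc/stat content."""
    for line in stat_text.splitlines():
        fields = line.split()
        # The boot time is reported on a line of the form "btime <seconds>".
        if len(fields) == 2 and fields[0] == "btime":
            return int(fields[1])
    raise ValueError("no btime line found")


def read_boot_time(stat_path: str = "/proc/stat") -> int:
    """Read the current boot time from the running Linux instance."""
    return parse_btime(Path(stat_path).read_text())


def has_rebooted(btime_before: int, btime_after: int, tolerance: int = 2) -> bool:
    """Treat a boot time that moved forward by more than `tolerance` seconds as a reboot.

    The tolerance is an assumption meant to absorb small clock adjustments.
    """
    return btime_after - btime_before > tolerance
```

Record read_boot_time() on the instance before executing the action, record it again after you can reconnect, and pass both values to has_rebooted().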
Note:
Operating systems differ in how they handle kernel faults; an automatic restart is a common policy. If the operating system stops responding, you can manually execute the recovery action on the platform to force a restart.
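On Linux, for example, the restart policy after a kernel panic is controlled by the kernel.panic and kernel.panic_on_oops sysctl settings. The sketch below, which assumes a Linux instance and uses hypothetical helper names, reads both values and describes the resulting policy, so that you can predict before the experiment whether the instance will restart on its own or need a forced restart.

```python
from pathlib import Path


def read_sysctl_int(name: str, root: str = "/proc/sys/kernel") -> int:
    """Read an integer kernel sysctl value, e.g. name="panic"."""
    return int(Path(root, name).read_text().strip())


def describe_panic_policy(panic_timeout: int, panic_on_oops: int) -> str:
    """Summarize the reboot behavior implied by kernel.panic and kernel.panic_on_oops."""
    if panic_timeout > 0:
        reboot = f"reboots {panic_timeout}s after a panic"
    elif panic_timeout < 0:
        reboot = "reboots immediately after a panic"
    else:
        # kernel.panic = 0 means the kernel waits indefinitely instead of rebooting.
        reboot = "does not reboot automatically after a panic"
    if panic_on_oops:
        oops = "an oops escalates to a panic"
    else:
        oops = "an oops does not escalate to a panic"
    return f"{reboot}; {oops}"


def current_panic_policy() -> str:
    """Describe the policy of the running Linux instance."""
    return describe_panic_policy(
        read_sysctl_int("panic"), read_sysctl_int("panic_on_oops")
    )
```

If the reported policy says the instance does not reboot automatically, plan to execute the recovery action manually after the fault is injected.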