This document describes how to get started quickly with Tencent Smart Advisor-Chaotic Fault Generator (CFG).
Introduction
1. Overview: Provides beginner operation guides, access to the event experience, and popular experiment templates to help you stay up to date on product information.
2. Experiment Management: Create a new experiment and manage all historical experiments.
3. Action Library Management: View details of the platform's fault action library and manage custom action scripts.
4. Template Library Management: View platform-recommended experiment templates and manage the custom template library.
5. Agent Management: Manage the agents pre-installed in clusters for TKE-type fault actions.
Directions
To validate a system's fault tolerance, availability, and other metrics, you can inject appropriate faults into it and observe how it behaves. This helps you uncover potential issues in the system and address them promptly. The following walkthrough uses the High CPU Utilization experiment as an example to show how to quickly create a chaos engineering experiment.
Step 1: Create an Experiment
1. Go to the Experiment Management page and create a new experiment.
2. When creating a new experiment, you can either select a template from the platform-recommended Industry Experience Library or click Skip to create a blank experiment. If you use a template, the basic experiment information and fault action orchestration details are populated automatically, and you only need to select the instance resources.
3. On the Basic Information page, enter the Experiment Name, Experiment Description, and Tag. Tags help you manage and search for experiments. Click Next.
4. Go to the Experiment Object Configuration page, fill in the Action Group orchestration details, and select Add Instance.
Within an action group, faults can only be injected into objects of the same type.
You can add multiple actions to an action group as needed and combine and orchestrate them flexibly.
5. Click Add Instance and select the instance resources you want to inject faults into, for example, a CVM instance.
You can search by instance type and instance name.
Batch addition of instances is supported.
6. After adding the instances, click Add an experiment action.
7. Click Add Now and select an experiment action, for example, the High CPU Utilization fault action under the CPU Resources category. Then click Next.
The platform provides a wide range of atomic fault actions that you can search.
You can also upload custom scripts to implement custom fault injection that meets specific business needs.
8. Set the action parameters. Under General Parameters, configure the wait time before and after the action and the timeout period to control the pace of the experiment.
In Action Parameters, set the Duration of the action. When you are done, click Confirm to finish adding the action.
9. Click Next to go to the Global Configuration page, where you can select the action Execution Method and configure the Guardrail Policy and Monitoring Metrics.
Click Choose guardrail policy to configure a guardrail policy (not covered in this tutorial).
Click Add monitoring metrics and select a metric such as CPU Usage to observe the effect of the fault injection in real time.
10. When the configuration is complete, click Submit. The experiment is created, and the system redirects you to the Experiment Details page, where it runs an environment pre-check covering agent installation, TAT installation status, operating system version, and other relevant factors. This check helps ensure that the experiment can proceed smoothly.
Note:
The environment check only serves as a risk warning and does not block the experiment. Even if the pre-check fails, you can still run the experiment, but the experiment may fail. To make sure the experiment runs correctly, follow the pre-check guidance to resolve any issues before you continue.
11. After the environment check passes, click View Detection Details to view the details.
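The configuration built in steps 4 to 8 can be summarized as a small data model. The sketch below is a hypothetical, local Python model, not the CFG API or SDK: the class names, field names, placeholder instance IDs, and the assumption that actions in a group run one after another are all illustrative. It encodes two rules stated above: an action group targets a single object type, and each action carries wait times, a duration, and a timeout.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Instance:
    object_type: str   # e.g. "CVM"; hypothetical field name
    instance_id: str   # placeholder ID, not a real resource


@dataclass
class Action:
    name: str
    wait_before_s: int = 0  # General Parameters: wait time before the action
    duration_s: int = 60    # Action Parameters: Duration
    wait_after_s: int = 0   # General Parameters: wait time after the action
    timeout_s: int = 600    # General Parameters: timeout period (not used in the estimate below)


@dataclass
class ActionGroup:
    instances: List[Instance] = field(default_factory=list)
    actions: List[Action] = field(default_factory=list)

    def validate(self) -> None:
        """Raise ValueError if the group breaks the rules described in steps 4 to 7."""
        if not self.instances:
            raise ValueError("add at least one instance")
        types = {inst.object_type for inst in self.instances}
        if len(types) > 1:
            # Within one action group, faults target only one object type.
            raise ValueError(f"mixed object types in one group: {sorted(types)}")
        if not self.actions:
            raise ValueError("add at least one experiment action")

    def planned_seconds(self) -> int:
        """Estimate the group's run time, ASSUMING actions run sequentially."""
        return sum(a.wait_before_s + a.duration_s + a.wait_after_s for a in self.actions)


if __name__ == "__main__":
    group = ActionGroup(
        instances=[Instance("CVM", "ins-example1"), Instance("CVM", "ins-example2")],
        actions=[Action("High CPU Utilization", wait_before_s=30, duration_s=300, wait_after_s=60)],
    )
    group.validate()
    print(group.planned_seconds())  # 390
```

Under this assumption, a 30-second pre-wait, a 300-second fault, and a 60-second post-wait add up to 390 seconds for the group, which is one way to reason about how the General Parameters set the pace of an experiment.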
Step 2: Execute the Experiment
1. Go to the Experiment Details page, and click Execute in the upper right corner to start the experiment.
Note:
If you selected the Automatic execution method when creating the experiment, the actions start automatically after you click Execute in the upper right corner, with no manual intervention required.
If the execution method is Manual, after you click Execute in the upper right corner, you still need to click Start in the action group to proceed.
If any action fails, the system automatically switches to Manual mode, and you must click Execute or Skip for that action in the action group.
2. In the experiment action group area, select the fault action you want to execute, then click Execute to start injecting the fault.
3. While the experiment is running, click an action card to expand it and view the action's execution details, or click View Log to see its logs.
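The execution rules in the notes of Step 2 behave like a small state machine. The following sketch simulates them locally; it is a hypothetical model, not CFG code. `execute` stands in for the platform running a fault action, and `operator` stands in for the clicks you make in Manual mode. As a simplification, Manual mode is modeled as the operator choosing Execute or Skip for each action; clicking Start in the action group is not modeled separately.

```python
from enum import Enum
from typing import Callable, Dict, List, Tuple


class Mode(Enum):
    AUTOMATIC = "Automatic"
    MANUAL = "Manual"


def run_action_group(
    actions: List[str],
    mode: Mode,
    execute: Callable[[str], bool],
    operator: Callable[[str], str],
) -> Tuple[List[Tuple[str, str]], Mode]:
    """Run actions in order and return (log, final mode).

    execute(name) returns True on success (hypothetical stand-in for the platform).
    operator(name) returns "execute" or "skip" (stand-in for a manual click).
    """
    log: List[Tuple[str, str]] = []
    i = 0
    while i < len(actions):
        name = actions[i]
        # In Manual mode, nothing runs until the operator decides.
        if mode is Mode.MANUAL and operator(name) == "skip":
            log.append((name, "skipped"))
            i += 1
            continue
        ok = execute(name)
        log.append((name, "succeeded" if ok else "failed"))
        if ok:
            i += 1
        else:
            # Any failure switches the run to Manual mode. The same action is
            # retried or skipped on the next pass, per the operator's choice;
            # the loop keeps retrying for as long as the operator picks "execute".
            mode = Mode.MANUAL
    return log, mode


if __name__ == "__main__":
    attempts: Dict[str, int] = {}

    def flaky_execute(name: str) -> bool:
        # Simulated fault action: "inject-b" fails on its first attempt only.
        attempts[name] = attempts.get(name, 0) + 1
        return not (name == "inject-b" and attempts[name] == 1)

    log, final_mode = run_action_group(
        ["inject-a", "inject-b", "inject-c"], Mode.AUTOMATIC, flaky_execute, lambda name: "execute"
    )
    print(log)
    print(final_mode)
```

In this run, the first action succeeds automatically, the second fails once and flips the run into Manual mode, and the remaining work only proceeds because the simulated operator chooses Execute. The run then stays in Manual mode until the end, mirroring the note that a failure requires manual intervention.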
Step 3: End the Experiment
1. Once the fault action is successfully executed, click End Experiment.
2. Fill in the Experiment Results to record any issues encountered during the experiment, the emergency response measures taken, and other relevant details for later review and analysis.
3. Click Generate Experiment Report to export a report of the experiment with one click.
The CFG experiment report includes basic experiment information, experiment action groups, experiment logs, issue records, and more.
Step 4: Template Library Management
For experiments that you run frequently or that have proven effective in past runs, you can extract the experiment orchestration elements and save them as a template in the template library. You can then reuse the template in future experiments to save time. You can also enable or disable custom templates in Template Library Management.
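To make the idea of extracting orchestration elements concrete, the sketch below models a template as an experiment with its instance bindings removed. This matches the note in Step 1 that an experiment created from a template only needs instance resources to be selected. It is a hypothetical local model, not the CFG API: the dictionary keys and placeholder instance IDs are illustrative, and treating a disabled template as unusable is an assumption.

```python
import copy
from dataclasses import dataclass
from typing import Any, Dict, List


@dataclass
class Template:
    name: str
    description: str
    tags: List[str]
    action_groups: List[Dict[str, Any]]  # orchestration only: object type + actions
    enabled: bool = True                 # assumption: disabled templates cannot be used


def extract_template(experiment: Dict[str, Any]) -> Template:
    """Keep the reusable orchestration and drop the instance bindings."""
    groups = [
        {"object_type": g["object_type"], "actions": copy.deepcopy(g["actions"])}
        for g in experiment["action_groups"]
    ]
    return Template(experiment["name"], experiment["description"], list(experiment["tags"]), groups)


def create_from_template(tpl: Template, name: str, instances: List[List[str]]) -> Dict[str, Any]:
    """Build a new experiment; only the instance resources need to be supplied."""
    if not tpl.enabled:
        raise ValueError(f"template {tpl.name!r} is disabled")
    if len(instances) != len(tpl.action_groups):
        raise ValueError("supply one list of instances per action group")
    groups = [
        {**copy.deepcopy(g), "instances": list(ids)}
        for g, ids in zip(tpl.action_groups, instances)
    ]
    return {"name": name, "description": tpl.description, "tags": list(tpl.tags), "action_groups": groups}


if __name__ == "__main__":
    past_run = {
        "name": "CVM high CPU drill",
        "description": "High CPU Utilization on CVM",
        "tags": ["cpu"],
        "action_groups": [{
            "object_type": "CVM",
            "instances": ["ins-old1"],
            "actions": [{"name": "High CPU Utilization", "duration_s": 300}],
        }],
    }
    tpl = extract_template(past_run)
    new_run = create_from_template(tpl, "CVM high CPU drill, round 2", [["ins-new1", "ins-new2"]])
    print(new_run["action_groups"][0]["instances"])
```

Deep-copying the action lists keeps the template independent of the experiments created from it, so editing one experiment's action parameters does not silently change the template.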