Concept | Description | Example |
Chaos engineering | Chaos engineering is the discipline of running experiments on distributed systems. The experiments refine the team's understanding of how the system actually behaves and expose previously unknown weaknesses. The goal is to build the system's ability to withstand out-of-control conditions in the production environment, and confidence in that ability. | - |
Experiment | The process of verifying and improving system availability by injecting specified faults into specified locations of the system and observing the experimental results. | - |
Action | An atomic fault injected into the system during an experiment. Actions cover fault injection scenarios across IaaS, PaaS, and SaaS. Within an experiment, users can freely combine and orchestrate multiple actions. An action group is a collection of actions. | High CPU usage, CVM shutdown, and database primary/secondary switch |
Object | The instance that an action targets. | CVM and MySQL |
Template | A valuable or frequently used experiment or scenario saved for quick reuse. A template includes the basic experiment information and the action orchestration plan, so a later run only needs to specify the experiment object. | Cross-AZ disaster recovery experiment template and network fault template |
Monitoring metrics | Steady-state metrics configured in advance to determine whether the system is running stably and whether fault injection succeeded. Observing how these metrics change during an experiment shows the system's response in real time. | Disk usage (%) |
Guardrail policy | A set of alarm metrics and trigger policies. When an alarm metric reaches its trigger threshold, the system can automatically stop the experiment and roll back the actions, which limits the impact scope of the experiment. | If disk usage reaches 90%, the experiment stops automatically. |
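
The relationship between actions, monitoring metrics, and a guardrail policy can be illustrated with a minimal Python sketch. It is not the product's API: the class names `Action`, `Guardrail`, and `Experiment`, the metric name `disk_usage_percent`, and the sample values are all hypothetical, and the assumption that actions are rolled back in reverse order is ours. The sketch injects each action, reads metric samples in order, and stops as soon as a sample reaches the guardrail threshold.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Action:
    """Hypothetical atomic fault: one callable injects it, one rolls it back."""
    name: str
    inject: Callable[[], None]
    rollback: Callable[[], None]


@dataclass
class Guardrail:
    """Hypothetical guardrail: trips when the metric reaches the threshold."""
    metric: str
    threshold: float

    def tripped(self, value: float) -> bool:
        return value >= self.threshold


@dataclass
class Experiment:
    actions: List[Action]
    guardrail: Guardrail
    log: List[str] = field(default_factory=list)

    def run(self, samples: List[float]) -> str:
        """Inject all actions, then check metric samples against the guardrail."""
        injected: List[Action] = []
        for action in self.actions:
            action.inject()
            injected.append(action)
            self.log.append(f"inject {action.name}")

        status = "completed"
        for value in samples:
            if self.guardrail.tripped(value):
                status = "stopped"
                self.log.append(f"guardrail {self.guardrail.metric}={value}")
                break

        # Assumption: roll back in reverse order, whether stopped or completed.
        for action in reversed(injected):
            action.rollback()
            self.log.append(f"rollback {action.name}")
        return status


def noop() -> None:
    """Placeholder for a real fault injection or rollback call."""


def demo(samples: List[float]):
    experiment = Experiment(
        actions=[Action("high-disk-usage", noop, noop)],
        guardrail=Guardrail("disk_usage_percent", 90.0),
    )
    return experiment.run(samples), experiment.log


if __name__ == "__main__":
    status, log = demo([72.0, 85.0, 91.0, 95.0])
    print(status)
    for entry in log:
        print(entry)
```

With the samples above, the third reading (91.0) reaches the 90% threshold, so the run ends before the fourth reading is examined and the injected action is rolled back. In a real setup the samples would come from a monitoring source rather than a fixed list.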