Overview

Last updated: 2024-09-26 15:34:19
    Tencent Smart Advisor-Chaotic Fault Generator (CFG) provides efficient, convenient, safe, and reliable fault injection services. It also provides industry templates, monitoring guardrails, and other core functions. CFG helps users promptly discover disaster recovery risks in their business and verify the effectiveness of high-availability plans, thereby improving system availability and resilience.

    Basic Concepts

    Before you use CFG, familiarize yourself with the following concepts to get started with the product faster.
    | Concept | Description | Example |
    | --- | --- | --- |
    | Chaos engineering | The discipline of conducting experiments on distributed systems. Through practice, it refines the understanding of a system and uncovers its unknown weaknesses. The goal is to build the system's ability to withstand out-of-control conditions in the production environment, and confidence in that ability. | - |
    | Experiment | The process of verifying and improving system availability by injecting specified faults into specified locations of the system and observing the results. | - |
    | Action | An atomic fault action injected into the system during an experiment. Actions cover fault injection scenarios across IaaS, PaaS, and SaaS. Within an experiment, users can freely combine and orchestrate multiple actions. An action group is a collection of actions. | High CPU usage, CVM shutdown, and database primary/secondary switchover |
    | Object | The instance that an action acts on. | CVM and MySQL |
    | Template | A saved experiment or scenario that is valuable and frequently used, kept for quick reuse. A template includes the basic experiment information and the action orchestration plan, so you only need to specify the experiment object when you reuse it. | Cross-AZ disaster recovery experiment template and network fault template |
    | Monitoring metrics | Steady-state metrics configured in advance to determine whether the system is running stably and whether the fault injection succeeded. Observing how these metrics change during an experiment shows system changes in real time. | Disk usage (%) |
    | Guardrail policy | A set of alarm metrics and trigger policies. When an alarm metric reaches its trigger threshold, the system can automatically stop the experiment and roll back the actions to limit the impact scope of the experiment. | If the disk usage (%) reaches 90%, the experiment will automatically stop. |
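
To make the relationship between actions, monitoring metrics, and guardrail policies more concrete, the following is a minimal Python sketch of the idea. It is not the CFG API: the `Action`, `Guardrail`, and `run_experiment` names, the `disk_usage_percent` metric name, and the sample values are all hypothetical and exist only for illustration. The sketch assumes the guardrail triggers when a metric sample is greater than or equal to the threshold, matching the "reaches 90%" example above.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List


@dataclass
class Action:
    """Hypothetical atomic fault action with an inject step and a rollback step."""
    name: str
    inject: Callable[[], None]
    rollback: Callable[[], None]


@dataclass
class Guardrail:
    """Hypothetical guardrail: triggers once a metric reaches the threshold."""
    metric: str
    threshold: float

    def triggered(self, value: float) -> bool:
        return value >= self.threshold


def run_experiment(actions: List[Action], guardrail: Guardrail,
                   samples: Iterable[float]) -> str:
    """Inject all actions, watch metric samples, and always roll back.

    Returns "stopped_by_guardrail" if any sample reaches the threshold,
    otherwise "completed".
    """
    injected: List[Action] = []
    status = "completed"
    try:
        for action in actions:
            action.inject()
            injected.append(action)
        for value in samples:
            if guardrail.triggered(value):
                status = "stopped_by_guardrail"
                break
    finally:
        # Roll back in reverse order so later faults are undone first.
        for action in reversed(injected):
            action.rollback()
    return status


# Illustrative run: the injected "fault" only writes to a log.
log: List[str] = []
cpu_load = Action("high_cpu",
                  inject=lambda: log.append("inject high_cpu"),
                  rollback=lambda: log.append("rollback high_cpu"))
disk_guardrail = Guardrail(metric="disk_usage_percent", threshold=90.0)
result = run_experiment([cpu_load], disk_guardrail, [72.0, 85.5, 91.0, 95.0])
print(result)
```

In this sketch the rollback runs in a `finally` block, so the injected faults are undone whether the experiment finishes normally, is stopped by the guardrail, or fails partway through injection.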

    

    