Overview

Tencent Cloud Smart Advisor (TSA) is a cloud governance platform that provides multiple vertical applications for IT operations management (ITOM). Drawing on the experience of Tencent Cloud's large-scale operations and maintenance experts, it offers governance solutions such as Cloud Risk Assessment and Chaotic Fault Generator that optimize cloud infrastructure and improve system security and service reliability.
Out-of-the-box cloud resource risk assessment discovers business risks and provides online optimization suggestions based on your actual needs, improving business continuity. Combined with efficient and safe fault experiment services, TSA helps you promptly discover disaster recovery risks in your business and verify the effectiveness of your high-availability plans, thereby improving system availability and resilience.
TSA - Cloud Risk Assessment
Cloud Risk Assessment is an out-of-the-box product that assesses risks in Tencent Cloud resources. After a CAM service role is granted to Cloud Risk Assessment, it can quickly assess and analyze risks in your cloud resources, application architecture, business performance, and security, and then offer online optimization suggestions based on your actual business usage, helping improve system security, business stability, and service reliability.
List of supported products
Cloud Risk Assessment provides a wide variety of assessment items, flexible assessment configurations, and system optimization suggestions to help you improve business continuity.
It offers risk assessment items across multiple dimensions, such as security, reliability, cost, service restrictions, and performance, for different Tencent Cloud products. For the cloud products that currently support assessment, see Assessment Settings.
Support for more Tencent Cloud products and services, along with more risk assessment items, will be added over time.
TSA - Chaotic Fault Generator
Chaotic Fault Generator (CFG) provides efficient, convenient, safe, and reliable fault injection services. It also provides core functions such as industry templates and monitoring guardrails, helping you promptly discover business disaster recovery risks and verify the effectiveness of high-availability plans, thereby improving system availability and resilience.
Basic Concepts
Understanding the following concepts before you use CFG will help you get started with the product faster.
Concept	Description	Example
Chaos engineering	A discipline of conducting experiments on distributed systems. Through practice, it refines the understanding of a system and uncovers the system's unknown weaknesses. Its purpose is to build both the system's ability to withstand uncontrolled conditions in the production environment and confidence in that ability.	-
Experiment	The process of verifying and improving system availability by injecting specified faults into specified parts of the system and observing the results.	-
Action	An atomic fault injected into the system during an experiment, covering fault injection scenarios across IaaS, PaaS, and SaaS. Within an experiment, you can freely combine and orchestrate multiple actions. An action group is a collection of actions.	High CPU usage, CVM shutdown, and database primary/secondary switchover
Object	The instance that an action acts on.	CVM and MySQL
Template	Valuable, frequently used experiments and scenarios can be saved as experiment templates for quick reuse. A template contains the basic experiment information and the action orchestration, so you only need to specify the experiment objects when you reuse it.	Cross-AZ disaster recovery experiment template and network fault template
Monitoring metrics	Steady-state metrics configured in advance to determine whether the system is running stably and whether fault injection has succeeded. Observing how these metrics change during an experiment lets you perceive system changes in real time.	Disk usage (%)
Guardrail policy	Alarm metrics and trigger policies. When an alarm metric reaches its trigger threshold, the system can automatically stop the experiment and roll back the actions, limiting the impact scope of the experiment.	If disk usage reaches 90%, the experiment stops automatically.
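The way these concepts fit together can be illustrated with a minimal Python sketch. It assumes a hypothetical in-memory model: the classes `Action`, `GuardrailPolicy`, and `Experiment`, the metric name `disk_usage_percent`, the action name "disk fill", and the instance ID `cvm-1` are all illustrative and are not part of the TSA or CFG API. An experiment runs groups of actions against objects, samples steady-state metrics after each injection, and, when a guardrail metric reaches its threshold, stops and rolls back the injected actions. The guardrail mirrors the table's example of stopping at 90% disk usage.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

# Hypothetical model of CFG concepts for illustration only; not the TSA/CFG API.


@dataclass
class Action:
    name: str    # atomic fault, e.g. "high CPU usage"
    target: str  # the object the action acts on, e.g. a CVM instance ID


@dataclass
class GuardrailPolicy:
    metric: str       # alarm metric, e.g. "disk_usage_percent"
    threshold: float  # trigger threshold; "reaches" is modeled as >=

    def triggered(self, value: float) -> bool:
        return value >= self.threshold


@dataclass
class Experiment:
    action_groups: List[List[Action]]  # each inner list is one action group
    guardrails: List[GuardrailPolicy]
    log: List[str] = field(default_factory=list)

    def run(self, sample_metrics: Callable[[], Dict[str, float]]) -> str:
        injected: List[Action] = []
        for group in self.action_groups:
            for action in group:
                self.log.append(f"inject {action.name} on {action.target}")
                injected.append(action)
                # Observe the steady-state metrics after each injection.
                metrics = sample_metrics()
                for policy in self.guardrails:
                    value = metrics.get(policy.metric)
                    if value is not None and policy.triggered(value):
                        # Stop the experiment and roll back, newest action first.
                        for done in reversed(injected):
                            self.log.append(f"roll back {done.name} on {done.target}")
                        return f"stopped: {policy.metric}={value}"
        return "completed"


if __name__ == "__main__":
    # Simulated readings: disk usage crosses 90% after the second action.
    readings = iter([{"disk_usage_percent": 70.0}, {"disk_usage_percent": 92.0}])
    experiment = Experiment(
        action_groups=[[Action("high CPU usage", "cvm-1"), Action("disk fill", "cvm-1")]],
        guardrails=[GuardrailPolicy("disk_usage_percent", 90.0)],
    )
    print(experiment.run(lambda: next(readings)))  # stopped: disk_usage_percent=92.0
    print("\n".join(experiment.log))
```

Rolling back in reverse injection order is an assumed design choice here, so that later faults, which may build on earlier ones, are undone first; this page does not describe how CFG itself orders rollbacks.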