High Utilization of CVM Resources (CPU, Memory, and Disk)

Last updated: 2025-03-24 15:23:00

Background
Cloud Virtual Machine (CVM) is one of the most basic and widely used cloud resources. On a CVM instance, program errors, improper configuration, and other factors can cause faults such as high CPU utilization, high memory utilization, and high disk partition utilization. These faults degrade CVM performance and can even make services unavailable, causing losses for users.
To improve CVM reliability and stability, run fault simulation experiments to verify that the system keeps operating normally when CPU, memory, or disk utilization is excessively high, so that contingency plans can be prepared in advance.
Experiment Implementation
Step 1: Experiment Preparation
Prepare a CVM instance available for the experiment.
Go to the agent management page and install an agent on the CVM node. For the installation steps, see Agent Management.
Step 2: Experiment Orchestration
1. Log in to Tencent Cloud Smart Advisor > Chaotic Fault Generator, go to the Experiment Management page, click Create a New Experiment, and then click Skip and create a blank experiment.
2. Fill in the basic information of the experiment.
3. Fill in the experiment action group information, and select Compute-CVM.
4. Add experiment instances.
5. To add an experiment action, click Add Now, and configure the fault action parameters.
Configure the High CPU utilization fault action parameters.
Note:
CPU Utilization: The CPU load percentage, from 0 to 100.
Duration: How long the fault action lasts. When it elapses, the agent automatically recovers from the fault.
Scheduling Priority: Affects the priority of the fault process in CPU scheduling. A lower nice value makes the process more likely to get a CPU time slice, which raises its execution priority. This parameter takes effect only when utilization is 100%.
Configure the High memory utilization fault action parameters.
Note:
Memory Usage Rate: The memory load percentage, from 0 to 100.
Duration: How long the fault action lasts. When it elapses, the agent automatically recovers from the fault.
Enable OOM Protection: If enabled, the fault process is less likely to be OOM-killed, and business processes are killed first.
Memory Occupation Rate: The amount by which memory usage increases per second.
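The relationship between Memory Usage Rate and Memory Occupation Rate can be sketched as simple arithmetic. The following is an illustration only, not the agent's implementation: the function name `seconds_to_target` is hypothetical, and the use of MB and MB per second as units is an assumption that the source does not confirm.

```python
def seconds_to_target(total_mb, used_mb, target_percent, rate_mb_per_s):
    """Estimate how long a linear ramp takes to reach the target usage.

    Assumption: memory grows by a constant rate_mb_per_s until
    usage reaches target_percent of total_mb.
    """
    if rate_mb_per_s <= 0:
        raise ValueError("rate_mb_per_s must be positive")
    target_mb = total_mb * target_percent / 100.0
    # If usage is already at or above the target, nothing is left to fill.
    remaining_mb = max(0.0, target_mb - used_mb)
    return remaining_mb / rate_mb_per_s


if __name__ == "__main__":
    # Hypothetical instance: 1000 MB total, 400 MB used, 80% target, 100 MB/s.
    print(seconds_to_target(1000, 400, 80, 100))
```

Under these assumptions, a slower occupation rate stretches the ramp, so the Duration should be long enough for the target utilization to be reached and observed.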
Configure the High disk usage fault action parameters.
Note:
Disk Directory: The directory to fill, that is, the directory where files are written.
File size: The size of the file to write.
Disk Usage Rate: The target disk utilization. The tool reads the current disk usage through staf commands and calculates the file size required to reach the specified utilization.
Reserved space: The amount of free space to leave on the disk.
Duration: How long the fault action lasts. When it elapses, the agent automatically recovers from the fault.
If file size, disk utilization, and reserved space are all specified, the priority is disk utilization > reserved space > file size.
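The parameter priority described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the agent's code: the function name `fill_size_bytes` and its parameter names are hypothetical, all sizes are taken to be in bytes, and utilization is taken to be used space divided by total space.

```python
def fill_size_bytes(total, used, percent=None, reserve=None, size=None):
    """Return how many bytes to write into the target directory.

    Priority: disk utilization (percent) > reserved space (reserve)
    > file size (size). Lower-priority parameters are ignored when a
    higher-priority one is given.
    """
    free = total - used
    if percent is not None:
        # Bytes needed so that used space reaches percent of the total.
        wanted = total * percent // 100 - used
    elif reserve is not None:
        # Fill everything except the space that must remain free.
        wanted = free - reserve
    elif size is not None:
        wanted = size
    else:
        raise ValueError("one of percent, reserve, or size is required")
    # Never return a negative size or more than the free space.
    return max(0, min(wanted, free))


if __name__ == "__main__":
    # Hypothetical disk: 1000 bytes total, 400 used; all three parameters set.
    print(fill_size_bytes(1000, 400, percent=80, reserve=100, size=50))
```

In this sketch the percent value wins whenever it is present, which mirrors the stated priority order. The exact rounding and the way the real tool measures usage may differ.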
Configure the Disk IO load fault action parameters.
Note:
Disk Directory: The directory in which to raise disk IO load. The load applies to the disk on which the directory resides.
Mode: Read and write modes are both available for generating high load.
Block Size: The block size for each read or write.
Number of Blocks: The number of blocks to copy.
Duration: How long the fault action lasts. When it elapses, the agent automatically recovers from the fault.
6. After configuring the action parameters, click Next. Configure Guardrail Policy and Monitoring Metrics according to your actual situation. After all configurations are completed, click Submit to create the experiment.
Step 3: Experiment Execution
1. Click Execute for the high CPU utilization action to start the experiment.
2. Observe Monitoring Metrics. The CPU load rises to the specified utilization. Execute the rollback action, and the CPU load recovers.
3. Execute the high memory utilization action with the configured occupation rate. Memory utilization rises to the specified value. Execute the rollback action to restore the steady state.
Note:
The injection tool collects the memory utilization metric from /proc/meminfo and calculates it as: Percent = (MemTotal - MemAvailable) / MemTotal.
Tencent Cloud Observability Platform, the metric observation system provided by the cloud platform, also collects its data from /proc/meminfo. Its algorithm excludes buffer and system cache usage, so its result differs from that of the injection tool: Percent = (MemTotal - MemFree - Buffers - Cached - SReclaimable + Shmem) / MemTotal.
The memory information of this experiment instance is shown below, followed by the results of substituting these values into the two formulas:
[root@VM-22-12-tencentos ~]# cat /proc/meminfo
MemTotal:        1721620 kB    //Total system memory (RAM) size
MemFree:          111260 kB    //Unused memory size
MemAvailable:     349964 kB     //Memory size available for starting a new process. Usage of system cache and buffer is considered for this value.
Buffers:           59624 kB    //Memory size for file system buffer
Cached:           570612 kB    //Memory size for file system cache
......
Shmem:            269980 kB    //Shared memory size
......
SReclaimable:      46308 kB    //Reclaimable cache size of kernel memory
Utilization calculated by the injection tool: (1721620-349964)/1721620 = 79.7%
Utilization calculated by Tencent Cloud Observability Platform: (1721620-111260-59624-570612-46308+269980)/1721620 = 69.9%
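The two formulas can be checked against the sample values with a short script. This is a sketch for verification only: the function names are hypothetical, and the sample text reproduces only the fields that the formulas use.

```python
def parse_meminfo(text):
    """Parse /proc/meminfo-style text into a dict of integer values (kB)."""
    fields = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        parts = rest.split()
        # Skip lines that are not "Name:   <number> [unit]".
        if sep and parts and parts[0].isdigit():
            fields[name.strip()] = int(parts[0])
    return fields


def injection_tool_percent(m):
    """Percent = (MemTotal - MemAvailable) / MemTotal."""
    return 100.0 * (m["MemTotal"] - m["MemAvailable"]) / m["MemTotal"]


def observability_percent(m):
    """Percent = (MemTotal - MemFree - Buffers - Cached
                  - SReclaimable + Shmem) / MemTotal."""
    used = (m["MemTotal"] - m["MemFree"] - m["Buffers"] - m["Cached"]
            - m["SReclaimable"] + m["Shmem"])
    return 100.0 * used / m["MemTotal"]


# Values copied from the experiment instance shown above.
SAMPLE = """\
MemTotal:        1721620 kB
MemFree:          111260 kB
MemAvailable:     349964 kB
Buffers:           59624 kB
Cached:           570612 kB
Shmem:            269980 kB
SReclaimable:      46308 kB
"""

if __name__ == "__main__":
    mem = parse_meminfo(SAMPLE)
    print(round(injection_tool_percent(mem), 2))
    print(round(observability_percent(mem), 2))
```

With the sample values, the script gives roughly 79.67 and 69.92. On a live Linux machine, the same functions can be applied to the contents of /proc/meminfo read with `open("/proc/meminfo").read()`.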
4. Execute the high disk utilization action, then log in to the machine and run the df command to confirm that the disk has reached the specified utilization. Execute the rollback action to restore the normal state.
(Screenshots: in fault; after rollback)
5. Execute the disk IO high load action, then go to the terminal and use the iostat command to observe the disk IO load.
(Screenshots: in fault; after rollback)
