Problem Description
When the CPU utilization of TDSQL-C for MySQL exceeds 80%, it may lead to slow business responses, timeouts, and database connection failures.
Note:
When the CPU utilization gets too high, and both CPU and memory need to be expanded, we recommend adjusting the compute configuration and adjusting the storage space to upgrade instance specifications to ensure normal business operation. Subsequently, you can refer to this document for troubleshooting and optimization. Failure Risk
If the CPU utilization of TDSQL-C for MySQL remains high for a long time, the overall performance of a database will be severely compromised, and under extreme circumstances, instances may be hung up.
When the high availability (HA) system detects that an instance is hung up, it will trigger a primary-replica switch to ensure high availability of business. During the switch, the business will be unavailable for a short period of time and instances are typically unavailable for less than 60 seconds. If the switch occurs during business peak hours, it will seriously affect the stability and continuity of the business.
To protect your business from being affected by CPU resource shortage, we recommend that you optimize the application or upgrade the CPU resources for the instances with high CPU utilization. A primary-replica switch is accompanied by a disconnection lasting for just seconds; therefore, for persistent connections, your application should have a reconnection mechanism.
Possible Causes
In TDSQL-C for MySQL, CPU resources are mainly occupied by two types of threads: system threads and user threads. Therefore, on Cloud Virtual Machine (CVM) dedicated to TDSQL-C for MySQL, focusing on these two types of threads can address most failure scenarios.
User Threads
User threads are busy, and in most scenarios, this is caused by slow queries. Besides slow queries, factors such as heavy computation workload and high queries per second (QPS) also contribute.
Slow queries
Long-lasting computations are executed, such as order by, group by, temporary tables, and join. This type of operations has low query efficiency, causing a single SQL statement to occupy CPU for a long period of time.
Heavy computation workload
The large volume of data leads to huge computation workload.
High QPS
High QPS stress makes the CPU time fully occupied. For example, if a 4-core server sustains a high QPS of 20,000 to 30,000, the total CPU time can be very long even when the CPU time of a single SQL statement is short.
System Threads
In a production environment, system thread issues occur less frequently. Generally, it is rare for multiple system threads to run at full capacity simultaneously. As long as CVM has at least 4 available CPU cores, the CPU utilization will not be too high. However, there are some bugs that may affect the CPU utilization, as shown in the figure below:
Solutions
As most CPU issues are caused by busy user threads, the following section focuses on the solutions to high CPU utilization caused by user threads.
Slow queries: Use DBbrain to identify and optimize slow queries. For details, see Slow Queries. Heavy computation workload: The CPU utilization is too high due to the large volume of data being processed. For details, see Heavy Computation Workload. High QPS: The CPU utilization is too high due to too many access requests. For details, see High QPS. Directions
Use DBbrain to identify and optimize the SQL statements which cause high CPU utilization:
The default value of TDSQL-C for MySQL slow query time (long_query_time) is 1s. When performance issues occur, if you find no slow queries, it is recommended to reduce the parameter value, and then observe the slow queries during a business cycle and optimize them accordingly. If no slow queries are found after you adjust the parameter value and the CPU utilization is still high, it is recommended to upgrade the CPU configuration to improve the overall database performance.
Heavy Computation Workload
If the data volume is large, even with proper indexing and execution plans, it will lead to high CPU utilization. Combining this with MySQL's one-thread-per-connection characteristic, it does not require too much concurrency to max out the CPU utilization.
Generally, there are two common solutions:
Enable read/write separation. Run queries on a read-only replica node where the business access pressure is low.
Optimize your program by splitting a large SQL query into smaller ones.
Upgrade the CPU configuration to improve the overall database performance.
Mount read-only instances to share the load of read-write instances.
Optimize query statements to enhance efficiency.
Was this page helpful?