Tencent Cloud

High Workload

Last updated: 2024-12-13 14:48:39
This article describes how to troubleshoot TKE cluster issues caused by high loads.
Error Description
High loads prevent node processes from getting the CPU time they need to function properly, which can lead to network timeouts, health check failures, and service unavailability.
Troubleshooting
At times, a node's load increases even though CPU us (user) is low and CPU id (idle) is high. This is usually caused by a file I/O bottleneck, which results in excessive I/O wait. This in turn drives up the load and degrades the performance of other processes.
This article uses top, atop, and iotop to determine whether the performance issue is caused by a disk I/O bottleneck.
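A raw load average is easier to judge when it is related to the number of CPU cores. The following is a minimal sketch, not part of the original procedure: it parses the header line of the top output used in this article and divides each load average by the core count (24, taken from the numcpu field of the atop output in this article). The function name load_per_core is hypothetical.

```python
import re


def load_per_core(top_header, cpu_count):
    """Return the 1-, 5- and 15-minute load averages divided by the core count."""
    match = re.search(
        r"load average:\s*([\d.]+),\s*([\d.]+),\s*([\d.]+)", top_header
    )
    if match is None:
        raise ValueError("no load average found in header line")
    return tuple(float(value) / cpu_count for value in match.groups())


# Header line copied from the top output in this article.
header = "top - 19:42:06 up 23:59,  2 users,  load average: 34.64, 35.80, 35.76"

# 24 cores is an assumption taken from "numcpu 24" in the atop output.
one, five, fifteen = load_per_core(header, cpu_count=24)
print(f"load per core: {one:.2f} (1 min), {five:.2f} (5 min), {fifteen:.2f} (15 min)")
```

As a rule of thumb only, a sustained value above 1 per core suggests that more tasks are runnable or blocked than the node can serve. It does not by itself tell you whether CPU or disk I/O is the cause.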
Query average load and wait time
1. Log in to the node and run top to check the current load. Output similar to the following is displayed:
Note: 
 A high load average means that many tasks are running or waiting to run. On Linux, it also counts tasks blocked in uninterruptible sleep, typically waiting on disk I/O. Use the Cpu(s) and Mem summary lines and the %CPU and %MEM columns to see which processes are consuming most of the resources.
 top - 19:42:06 up 23:59,  2 users,  load average: 34.64, 35.80, 35.76
 Tasks: 679 total,   1 running, 678 sleeping,   0 stopped,   0 zombie
 Cpu(s): 15.6%us,  1.7%sy,  0.0%ni, 74.7%id,  7.9%wa,  0.0%hi,  0.1%si,  0.0%st
 Mem:  32865032k total, 30989168k used,  1875864k free,   370748k buffers
 Swap:  8388604k total,     5440k used,  8383164k free,  7982424k cached

   PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
  9783 mysql     20   0 17.3g  16g 8104 S 186.9 52.3   3752:33 mysqld
  5700 nginx     20   0 1330m  66m 9496 S  8.9  0.2   0:20.82 php-fpm
  6424 nginx     20   0 1330m  65m 8372 S  8.3  0.2   0:04.97 php-fpm
  6573 nginx     20   0 1330m  64m 7368 S  8.3  0.2   0:01.49 php-fpm
  5927 nginx     20   0 1320m  56m 9272 S  7.6  0.2   0:12.54 php-fpm
  5956 nginx     20   0 1330m  65m 8500 S  7.6  0.2   0:12.70 php-fpm
  6126 nginx     20   0 1321m  57m 8964 S  7.3  0.2   0:09.72 php-fpm
  6127 nginx     20   0 1319m  54m 9520 S  6.6  0.2   0:08.73 php-fpm
  6131 nginx     20   0 1320m  56m 9404 S  6.6  0.2   0:09.43 php-fpm
  6174 nginx     20   0 1321m  56m 8444 S  6.3  0.2   0:08.92 php-fpm
  5790 nginx     20   0 1319m  54m 9468 S  5.6  0.2   0:17.33 php-fpm
  6575 nginx     20   0 1320m  55m 8212 S  5.6  0.2   0:02.11 php-fpm
  6160 nginx     20   0 1310m  44m 8296 S  4.0  0.1   0:10.05 php-fpm
  5597 nginx     20   0 1310m  46m 9556 S  3.6  0.1   0:21.03 php-fpm
  5786 nginx     20   0 1310m  45m 8528 S  3.6  0.1   0:15.53 php-fpm
  5797 nginx     20   0 1310m  46m 9444 S  3.6  0.1   0:14.02 php-fpm
  6158 nginx     20   0 1310m  45m 8324 S  3.6  0.1   0:10.20 php-fpm
  5698 nginx     20   0 1310m  46m 9184 S  3.3  0.1   0:20.62 php-fpm
  5779 nginx     20   0 1309m  44m 8336 S  3.3  0.1   0:15.34 php-fpm
  6540 nginx     20   0 1306m  40m 7884 S  3.3  0.1   0:02.46 php-fpm
  5553 nginx     20   0 1300m  36m 9568 S  3.0  0.1   0:21.58 php-fpm
  5722 nginx     20   0 1310m  45m 8552 S  3.0  0.1   0:17.25 php-fpm
  5920 nginx     20   0 1302m  36m 8208 S  3.0  0.1   0:14.23 php-fpm
  6432 nginx     20   0 1310m  45m 8420 S  3.0  0.1   0:05.86 php-fpm
  5285 nginx     20   0 1302m  38m 9696 S  2.7  0.1   0:23.41 php-fpm
2. The results include the CPU wa value. wa (I/O wait) is the percentage of time the CPU was idle while waiting for outstanding I/O requests to complete. By default, top shows the average across all cores. Press 1 to view the wa value of each core, as shown below:
Note: 
wa is normally close to 0%. If it stays above 1%, the storage is likely a bottleneck and cannot keep up with the CPU.
 top - 19:42:08 up 23:59,  2 users,  load average: 34.64, 35.80, 35.76
 Tasks: 679 total,   1 running, 678 sleeping,   0 stopped,   0 zombie
 Cpu0  : 29.5%us,  3.7%sy,  0.0%ni, 48.7%id, 17.9%wa,  0.0%hi,  0.1%si,  0.0%st
 Cpu1  : 29.3%us,  3.7%sy,  0.0%ni, 48.9%id, 17.9%wa,  0.0%hi,  0.1%si,  0.0%st
 Cpu2  : 26.1%us,  3.1%sy,  0.0%ni, 64.4%id,  6.0%wa,  0.0%hi,  0.3%si,  0.0%st
 Cpu3  : 25.9%us,  3.1%sy,  0.0%ni, 65.5%id,  5.4%wa,  0.0%hi,  0.1%si,  0.0%st
 Cpu4  : 24.9%us,  3.0%sy,  0.0%ni, 66.8%id,  5.0%wa,  0.0%hi,  0.3%si,  0.0%st
 Cpu5  : 24.9%us,  2.9%sy,  0.0%ni, 67.0%id,  4.8%wa,  0.0%hi,  0.3%si,  0.0%st
 Cpu6  : 24.2%us,  2.7%sy,  0.0%ni, 68.3%id,  4.5%wa,  0.0%hi,  0.3%si,  0.0%st
 Cpu7  : 24.3%us,  2.6%sy,  0.0%ni, 68.5%id,  4.2%wa,  0.0%hi,  0.3%si,  0.0%st
 Cpu8  : 23.8%us,  2.6%sy,  0.0%ni, 69.2%id,  4.1%wa,  0.0%hi,  0.3%si,  0.0%st
 Cpu9  : 23.9%us,  2.5%sy,  0.0%ni, 69.3%id,  4.0%wa,  0.0%hi,  0.3%si,  0.0%st
 Cpu10 : 23.3%us,  2.4%sy,  0.0%ni, 68.7%id,  5.6%wa,  0.0%hi,  0.0%si,  0.0%st
 Cpu11 : 23.3%us,  2.4%sy,  0.0%ni, 69.2%id,  5.1%wa,  0.0%hi,  0.0%si,  0.0%st
 Cpu12 : 21.8%us,  2.4%sy,  0.0%ni, 60.2%id, 15.5%wa,  0.0%hi,  0.0%si,  0.0%st
 Cpu13 : 21.9%us,  2.4%sy,  0.0%ni, 60.6%id, 15.2%wa,  0.0%hi,  0.0%si,  0.0%st
 Cpu14 : 21.4%us,  2.3%sy,  0.0%ni, 72.6%id,  3.7%wa,  0.0%hi,  0.0%si,  0.0%st
 Cpu15 : 21.5%us,  2.2%sy,  0.0%ni, 73.2%id,  3.1%wa,  0.0%hi,  0.0%si,  0.0%st
 Cpu16 : 21.2%us,  2.2%sy,  0.0%ni, 73.6%id,  3.0%wa,  0.0%hi,  0.0%si,  0.0%st
 Cpu17 : 21.2%us,  2.1%sy,  0.0%ni, 73.8%id,  2.8%wa,  0.0%hi,  0.0%si,  0.0%st
 Cpu18 : 20.9%us,  2.1%sy,  0.0%ni, 74.1%id,  2.9%wa,  0.0%hi,  0.0%si,  0.0%st
 Cpu19 : 21.0%us,  2.1%sy,  0.0%ni, 74.4%id,  2.5%wa,  0.0%hi,  0.0%si,  0.0%st
 Cpu20 : 20.7%us,  2.0%sy,  0.0%ni, 73.8%id,  3.4%wa,  0.0%hi,  0.0%si,  0.0%st
 Cpu21 : 20.8%us,  2.0%sy,  0.0%ni, 73.9%id,  3.2%wa,  0.0%hi,  0.0%si,  0.0%st
 Cpu22 : 20.8%us,  2.0%sy,  0.0%ni, 74.4%id,  2.8%wa,  0.0%hi,  0.0%si,  0.0%st
 Cpu23 : 20.8%us,  1.9%sy,  0.0%ni, 74.4%id,  2.8%wa,  0.0%hi,  0.0%si,  0.0%st
 Mem:  32865032k total, 30209248k used,  2655784k free,   370748k buffers
 Swap:  8388604k total,     5440k used,  8383164k free,  7986552k cached
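The wa percentages that top reports come from the cumulative per-CPU counters in /proc/stat. The sketch below shows one way to compute the same kind of figure from two snapshots and to flag cores above the 1% threshold mentioned in the note. The two snapshots are made-up illustrative values, not data from the node in this article, and the function names are hypothetical.

```python
def parse_cpu_times(stat_text):
    """Map each CPU label ('cpu', 'cpu0', ...) to its list of tick counters."""
    times = {}
    for line in stat_text.splitlines():
        fields = line.split()
        if fields and fields[0].startswith("cpu"):
            times[fields[0]] = [int(value) for value in fields[1:]]
    return times


def iowait_percent(before, after):
    """Return the iowait share, in percent, of the time between two samples."""
    result = {}
    for label, new in after.items():
        old = before.get(label)
        if old is None:
            continue
        # First eight fields: user nice system idle iowait irq softirq steal.
        # The guest fields that follow are ignored in this sketch.
        deltas = [n - o for n, o in zip(new[:8], old[:8])]
        total = sum(deltas)
        result[label] = 100.0 * deltas[4] / total if total > 0 else 0.0
    return result


# Illustrative snapshots only (assumed values, two cores).
SAMPLE_BEFORE = """cpu  1000 0 100 8000 500 0 10 0 0 0
cpu0 500 0 50 4000 400 0 5 0 0 0
cpu1 500 0 50 4000 100 0 5 0 0 0
"""
SAMPLE_AFTER = """cpu  1055 0 107 8114 524 0 10 0 0 0
cpu0 530 0 54 4048 418 0 5 0 0 0
cpu1 525 0 53 4066 106 0 5 0 0 0
"""

THRESHOLD = 1.0  # percent, taken from the note above

shares = iowait_percent(parse_cpu_times(SAMPLE_BEFORE), parse_cpu_times(SAMPLE_AFTER))
for label, share in shares.items():
    flag = "  <-- above threshold" if share > THRESHOLD else ""
    print(f"{label:5s} wa = {share:5.1f}%{flag}")
```

On a live node, the two snapshots would be the contents of /proc/stat read a few seconds apart. A single read is not meaningful on its own because the counters accumulate since boot.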
Monitoring Disk I/O Statistics
1. Run atop to check disk I/O. In the following example, disk sda shows busy 100%, which means the disk is saturated and has become the bottleneck.
ATOP - lemp              2017/01/23  19:42:32              ---------                10s elapsed
PRC | sys    3.18s | user  33.24s | #proc    679 | #tslpu    28 | #zombie    0 | #exit      0 |
CPU | sys      29% | user    330% | irq       1% | idle   1857% | wait    182% | curscal  69% |
CPL | avg1   33.00 | avg5   35.29 | avg15  35.59 | csw    62610 | intr   76926 | numcpu    24 |
MEM | tot    31.3G | free    2.1G | cache   7.6G | dirty  41.0M | buff  362.1M | slab    1.2G |
SWP | tot     8.0G | free    8.0G |              |              | vmcom  23.9G | vmlim  23.7G |
DSK |          sda | busy    100% | read       4 | write   1789 | MBw/s   2.84 | avio 5.58 ms |
NET | transport    | tcpi   10357 | tcpo    9065 | udpi       0 | udpo       0 | tcpao    174 |
NET | network      | ipi    10360 | ipo     9065 | ipfrw      0 | deliv  10359 | icmpo      0 |
NET | eth0      4% | pcki    6649 | pcko    6136 | si 1478 Kbps | so 4115 Kbps | erro       0 |
NET | lo      ---- | pcki    4082 | pcko    4082 | si 8967 Kbps | so 8967 Kbps | erro       0 |

PID   TID  THR  SYSCPU  USRCPU  VGROW  RGROW  RDDSK  WRDSK ST EXC S CPUNR  CPU CMD       1/12
 9783     -  156   0.21s  19.44s     0K  -788K     4K  1344K --   - S     4 197% mysqld
 5596     -    1   0.10s   0.62s 47204K 47004K     0K   220K --   - S    18   7% php-fpm
 6429     -    1   0.06s   0.34s 19840K 19968K     0K     0K --   - S    21   4% php-fpm
 6210     -    1   0.03s   0.30s -5216K -5204K     0K     0K --   - S    19   3% php-fpm
 5757     -    1   0.05s   0.27s 26072K 26012K     0K     4K --   - S    13   3% php-fpm
 6433     -    1   0.04s   0.28s -2816K -2816K     0K     0K --   - S    11   3% php-fpm
 5846     -    1   0.06s   0.22s -2560K -2660K     0K     0K --   - S     7   3% php-fpm
 5791     -    1   0.05s   0.21s  5764K  5692K     0K     0K --   - S    22   3% php-fpm
 5860     -    1   0.04s   0.21s 48088K 47724K     0K     0K --   - S     1   3% php-fpm
 6231     -    1   0.04s   0.20s  -256K    -4K     0K     0K --   - S     1   2% php-fpm
 6154     -    1   0.03s   0.21s -3004K -3184K     0K     0K --   - S    21   2% php-fpm
 6573     -    1   0.04s   0.20s  -512K  -168K     0K     0K --   - S     4   2% php-fpm
 6435     -    1   0.04s   0.19s -3216K -2980K     0K     0K --   - S    15   2% php-fpm
 5954     -    1   0.03s   0.20s     0K   164K     0K     4K --   - S     0   2% php-fpm
 6133     -    1   0.03s   0.19s 41056K 40432K     0K     0K --   - S    18   2% php-fpm
 6132     -    1   0.02s   0.20s 37836K 37440K     0K     0K --   - S    11   2% php-fpm
 6242     -    1   0.03s   0.19s -12.2M -12.3M     0K     4K --   - S    12   2% php-fpm
 6285     -    1   0.02s   0.19s 39516K 39420K     0K     0K --   - S     3   2% php-fpm
 6455     -    1   0.05s   0.16s 29008K 28560K     0K     0K --   - S    14   2% php-fpm
2. Use one of the following methods to view per-process disk I/O usage:
In atop, press d to view per-process disk I/O usage, as shown below:
  ATOP - lemp               2017/01/23  19:42:46               ---------               2s elapsed
  PRC | sys    0.24s | user   1.99s | #proc    679 | #tslpu    54 | #zombie    0 | #exit      0 |
  CPU | sys      11% | user    101% | irq       1% | idle   2089% | wait    208% | curscal  63% |
  CPL | avg1   38.49 | avg5   36.48 | avg15  35.98 | csw     4654 | intr    6876 | numcpu    24 |
  MEM | tot    31.3G | free    2.2G | cache   7.6G | dirty  48.7M | buff  362.1M | slab    1.2G |
  SWP | tot     8.0G | free    8.0G |              |              | vmcom  23.9G | vmlim  23.7G |
  DSK |          sda | busy    100% | read       2 | write    362 | MBw/s   2.28 | avio 5.49 ms |
  NET | transport    | tcpi    1031 | tcpo     968 | udpi       0 | udpo       0 | tcpao     45 |
  NET | network      | ipi     1031 | ipo      968 | ipfrw      0 | deliv   1031 | icmpo      0 |
  NET | eth0      1% | pcki     558 | pcko     508 | si  762 Kbps | so 1077 Kbps | erro       0 |
  NET | lo      ---- | pcki     406 | pcko     406 | si 2273 Kbps | so 2273 Kbps | erro       0 |

    PID          TID         RDDSK         WRDSK        WCANCL         DSK        CMD         1/5
   9783            -            0K          468K           16K         40%        mysqld
   1930            -            0K          212K            0K         18%        flush-8:0
   5896            -            0K          152K            0K         13%        nginx
    880            -            0K          148K            0K         13%        jbd2/sda5-8
   5909            -            0K           60K            0K          5%        nginx
   5906            -            0K           36K            0K          3%        nginx
   5907            -           16K            8K            0K          2%        nginx
   5903            -           20K            0K            0K          2%        nginx
   5901            -            0K           12K            0K          1%        nginx
   5908            -            0K            8K            0K          1%        nginx
   5894            -            0K            8K            0K          1%        nginx
   5911            -            0K            8K            0K          1%        nginx
   5900            -            0K            4K            4K          0%        nginx
   5551            -            0K            4K            0K          0%        php-fpm
   5913            -            0K            4K            0K          0%        nginx
   5895            -            0K            4K            0K          0%        nginx
   6133            -            0K            0K            0K          0%        php-fpm
   5780            -            0K            0K            0K          0%        php-fpm
   6675            -            0K            0K            0K          0%        atop
Alternatively, run iotop -oPa to view per-process disk I/O usage, as shown below:
  Total DISK READ: 15.02 K/s | Total DISK WRITE: 3.82 M/s
    PID  PRIO  USER     DISK READ  DISK WRITE  SWAPIN     IO>    COMMAND
   1930 be/4 root          0.00 B   1956.00 K  0.00 % 83.34 % [flush-8:0]
   5914 be/4 nginx         0.00 B      0.00 B  0.00 % 36.56 % nginx: cache manager process
    880 be/3 root          0.00 B     21.27 M  0.00 % 35.03 % [jbd2/sda5-8]
   5913 be/2 nginx        36.00 K   1000.00 K  0.00 %  8.94 % nginx: worker process
   5910 be/2 nginx         0.00 B   1048.00 K  0.00 %  8.43 % nginx: worker process
   5896 be/2 nginx        56.00 K    452.00 K  0.00 %  6.91 % nginx: worker process
   5909 be/2 nginx        20.00 K   1144.00 K  0.00 %  6.24 % nginx: worker process
   5890 be/2 nginx        48.00 K    692.00 K  0.00 %  6.07 % nginx: worker process
   5892 be/2 nginx        84.00 K    736.00 K  0.00 %  5.71 % nginx: worker process
   5901 be/2 nginx        20.00 K    504.00 K  0.00 %  5.46 % nginx: worker process
   5899 be/2 nginx         0.00 B    596.00 K  0.00 %  5.14 % nginx: worker process
   5897 be/2 nginx        28.00 K   1388.00 K  0.00 %  4.90 % nginx: worker process
   5908 be/2 nginx        48.00 K    700.00 K  0.00 %  4.43 % nginx: worker process
   5905 be/2 nginx        32.00 K   1140.00 K  0.00 %  4.36 % nginx: worker process
   5900 be/2 nginx         0.00 B   1208.00 K  0.00 %  4.31 % nginx: worker process
   5904 be/2 nginx        36.00 K   1244.00 K  0.00 %  2.80 % nginx: worker process
   5895 be/2 nginx        16.00 K    780.00 K  0.00 %  2.50 % nginx: worker process
   5907 be/2 nginx         0.00 B   1548.00 K  0.00 %  2.43 % nginx: worker process
   5903 be/2 nginx        36.00 K   1032.00 K  0.00 %  2.34 % nginx: worker process
   6130 be/4 nginx         0.00 B     72.00 K  0.00 %  2.18 % php-fpm: pool www
   5906 be/2 nginx        12.00 K    844.00 K  0.00 %  2.10 % nginx: worker process
   5889 be/2 nginx        40.00 K   1164.00 K  0.00 %  2.00 % nginx: worker process
   5894 be/2 nginx        44.00 K    760.00 K  0.00 %  1.61 % nginx: worker process
   5902 be/2 nginx        52.00 K    992.00 K  0.00 %  1.55 % nginx: worker process
   5893 be/2 nginx        64.00 K    972.00 K  0.00 %  1.22 % nginx: worker process
   5814 be/4 nginx        36.00 K     44.00 K  0.00 %  1.06 % php-fpm: pool www
   6159 be/4 nginx         4.00 K      4.00 K  0.00 %  1.00 % php-fpm: pool www
   5693 be/4 nginx         0.00 B      4.00 K  0.00 %  0.86 % php-fpm: pool www
   5912 be/2 nginx        68.00 K    300.00 K  0.00 %  0.72 % nginx: worker process
   5911 be/2 nginx        20.00 K    788.00 K  0.00 %  0.72 % nginx: worker process
Run man iotop to view the descriptions of these options:
   -o, --only
          Only show processes or threads actually doing I/O, instead of showing all processes or threads. This can be dynamically toggled by pressing o.
   -P, --processes
          Only show processes. Normally iotop shows all threads.
   -a, --accumulated
          Show accumulated I/O instead of bandwidth. In this mode, iotop shows the amount of I/O processes have done since iotop started.
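With many worker processes, it can help to total the accumulated writes per command rather than per PID. The following sketch parses a few rows copied from the iotop -oPa output above and sums the DISK WRITE column by command name. It assumes that the K, M, and G suffixes are powers of 1024 and that the output was produced in accumulated mode (-a); the function name written_bytes_by_command is hypothetical.

```python
import re
from collections import defaultdict

# Assumption: iotop size suffixes are powers of 1024.
UNITS = {"B": 1, "K": 1024, "M": 1024 ** 2, "G": 1024 ** 3, "T": 1024 ** 4}

# One process row of accumulated-mode output:
# PID PRIO USER <read> <unit> <write> <unit> <swapin> % <io> % COMMAND
ROW = re.compile(
    r"^\s*(?P<pid>\d+)\s+(?P<prio>\S+)\s+(?P<user>\S+)\s+"
    r"(?P<read>[\d.]+)\s+(?P<read_unit>[BKMGT])\s+"
    r"(?P<write>[\d.]+)\s+(?P<write_unit>[BKMGT])\s+"
    r"(?P<swapin>[\d.]+)\s+%\s+(?P<io>[\d.]+)\s+%\s+(?P<command>.+?)\s*$"
)


def written_bytes_by_command(iotop_text):
    """Sum the accumulated DISK WRITE column per command name."""
    totals = defaultdict(float)
    for line in iotop_text.splitlines():
        match = ROW.match(line)
        if match is None:
            continue  # skip the summary and header lines
        size = float(match.group("write")) * UNITS[match.group("write_unit")]
        totals[match.group("command")] += size
    return dict(totals)


# Rows copied from the iotop output in this article.
SAMPLE = """\
Total DISK READ: 15.02 K/s | Total DISK WRITE: 3.82 M/s
    PID  PRIO  USER     DISK READ  DISK WRITE  SWAPIN     IO>    COMMAND
   1930 be/4 root          0.00 B   1956.00 K  0.00 % 83.34 % [flush-8:0]
   5914 be/4 nginx         0.00 B      0.00 B  0.00 % 36.56 % nginx: cache manager process
    880 be/3 root          0.00 B     21.27 M  0.00 % 35.03 % [jbd2/sda5-8]
   5913 be/2 nginx        36.00 K   1000.00 K  0.00 %  8.94 % nginx: worker process
   5910 be/2 nginx         0.00 B   1048.00 K  0.00 %  8.43 % nginx: worker process
"""

totals = written_bytes_by_command(SAMPLE)
for command, size in sorted(totals.items(), key=lambda item: item[1], reverse=True):
    print(f"{size / 1024:12.2f} K  {command}")
```

In this sample, the journaling thread jbd2/sda5-8 accounts for the largest share of accumulated writes, followed by the nginx worker processes taken together.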
Other Reasons
Running non-Kubernetes services, such as databases, directly on the node may also cause high loads.