High Workload

Last updated: 2024-12-13 14:48:39
    This article describes how to troubleshoot TKE cluster issues caused by high loads.

    Error Description

    High loads prevent processes on a node from getting the CPU time they need to function properly, which can lead to network timeouts, health check failures, and service unavailability.

    Troubleshooting

    At times, a node's load increases even though CPU 'us' (user) is low and CPU 'id' (idle) is high. This is usually caused by a file I/O bottleneck, which results in excessive I/O wait; the resulting high load in turn degrades the performance of other processes. This article uses top, atop, and iotop to diagnose whether a performance issue is caused by a disk I/O bottleneck.
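    top is preinstalled on most Linux distributions, but atop and iotop often are not. They can usually be installed from the distribution's package repositories (the package names below are the common ones and may vary by distribution; on CentOS, atop may require the EPEL repository):
    # CentOS / RHEL
    yum install -y atop iotop sysstat
    # Ubuntu / Debian
    apt-get install -y atop iotop sysstat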

    Query average load and wait time

    1. Log in to your node and use top to query the current load. The following results are displayed:
    Note:
    A high load average means many processes are running on, or waiting for, the node's resources. You can use the values in the Cpu(s), Mem, %CPU, and %MEM fields to see which processes are consuming a large portion of the resources.
    top - 19:42:06 up 23:59, 2 users, load average: 34.64, 35.80, 35.76
    Tasks: 679 total, 1 running, 678 sleeping, 0 stopped, 0 zombie
    Cpu(s): 15.6%us, 1.7%sy, 0.0%ni, 74.7%id, 7.9%wa, 0.0%hi, 0.1%si, 0.0%st
    Mem: 32865032k total, 30989168k used, 1875864k free, 370748k buffers
    Swap: 8388604k total, 5440k used, 8383164k free, 7982424k cached
    
    PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
    9783 mysql 20 0 17.3g 16g 8104 S 186.9 52.3 3752:33 mysqld
    5700 nginx 20 0 1330m 66m 9496 S 8.9 0.2 0:20.82 php-fpm
    6424 nginx 20 0 1330m 65m 8372 S 8.3 0.2 0:04.97 php-fpm
    6573 nginx 20 0 1330m 64m 7368 S 8.3 0.2 0:01.49 php-fpm
    5927 nginx 20 0 1320m 56m 9272 S 7.6 0.2 0:12.54 php-fpm
    5956 nginx 20 0 1330m 65m 8500 S 7.6 0.2 0:12.70 php-fpm
    6126 nginx 20 0 1321m 57m 8964 S 7.3 0.2 0:09.72 php-fpm
    6127 nginx 20 0 1319m 54m 9520 S 6.6 0.2 0:08.73 php-fpm
    6131 nginx 20 0 1320m 56m 9404 S 6.6 0.2 0:09.43 php-fpm
    6174 nginx 20 0 1321m 56m 8444 S 6.3 0.2 0:08.92 php-fpm
    5790 nginx 20 0 1319m 54m 9468 S 5.6 0.2 0:17.33 php-fpm
    6575 nginx 20 0 1320m 55m 8212 S 5.6 0.2 0:02.11 php-fpm
    6160 nginx 20 0 1310m 44m 8296 S 4.0 0.1 0:10.05 php-fpm
    5597 nginx 20 0 1310m 46m 9556 S 3.6 0.1 0:21.03 php-fpm
    5786 nginx 20 0 1310m 45m 8528 S 3.6 0.1 0:15.53 php-fpm
    5797 nginx 20 0 1310m 46m 9444 S 3.6 0.1 0:14.02 php-fpm
    6158 nginx 20 0 1310m 45m 8324 S 3.6 0.1 0:10.20 php-fpm
    5698 nginx 20 0 1310m 46m 9184 S 3.3 0.1 0:20.62 php-fpm
    5779 nginx 20 0 1309m 44m 8336 S 3.3 0.1 0:15.34 php-fpm
    6540 nginx 20 0 1306m 40m 7884 S 3.3 0.1 0:02.46 php-fpm
    5553 nginx 20 0 1300m 36m 9568 S 3.0 0.1 0:21.58 php-fpm
    5722 nginx 20 0 1310m 45m 8552 S 3.0 0.1 0:17.25 php-fpm
    5920 nginx 20 0 1302m 36m 8208 S 3.0 0.1 0:14.23 php-fpm
    6432 nginx 20 0 1310m 45m 8420 S 3.0 0.1 0:05.86 php-fpm
    5285 nginx 20 0 1302m 38m 9696 S 2.7 0.1 0:23.41 php-fpm
    2. Check the wa value in the output. wa (wait) is the percentage of CPU time spent waiting for I/O to complete. By default, top shows the average across all cores. Press 1 to view the wa value of each core, as shown below. A scriptable alternative using standard tools is sketched after this output.
    Note:
    wa is usually close to 0%. If it stays consistently above 1%, storage has become a bottleneck and cannot keep up with the CPU.
    top - 19:42:08 up 23:59, 2 users, load average: 34.64, 35.80, 35.76
    Tasks: 679 total, 1 running, 678 sleeping, 0 stopped, 0 zombie
    Cpu0 : 29.5%us, 3.7%sy, 0.0%ni, 48.7%id, 17.9%wa, 0.0%hi, 0.1%si, 0.0%st
    Cpu1 : 29.3%us, 3.7%sy, 0.0%ni, 48.9%id, 17.9%wa, 0.0%hi, 0.1%si, 0.0%st
    Cpu2 : 26.1%us, 3.1%sy, 0.0%ni, 64.4%id, 6.0%wa, 0.0%hi, 0.3%si, 0.0%st
    Cpu3 : 25.9%us, 3.1%sy, 0.0%ni, 65.5%id, 5.4%wa, 0.0%hi, 0.1%si, 0.0%st
    Cpu4 : 24.9%us, 3.0%sy, 0.0%ni, 66.8%id, 5.0%wa, 0.0%hi, 0.3%si, 0.0%st
    Cpu5 : 24.9%us, 2.9%sy, 0.0%ni, 67.0%id, 4.8%wa, 0.0%hi, 0.3%si, 0.0%st
    Cpu6 : 24.2%us, 2.7%sy, 0.0%ni, 68.3%id, 4.5%wa, 0.0%hi, 0.3%si, 0.0%st
    Cpu7 : 24.3%us, 2.6%sy, 0.0%ni, 68.5%id, 4.2%wa, 0.0%hi, 0.3%si, 0.0%st
    Cpu8 : 23.8%us, 2.6%sy, 0.0%ni, 69.2%id, 4.1%wa, 0.0%hi, 0.3%si, 0.0%st
    Cpu9 : 23.9%us, 2.5%sy, 0.0%ni, 69.3%id, 4.0%wa, 0.0%hi, 0.3%si, 0.0%st
    Cpu10 : 23.3%us, 2.4%sy, 0.0%ni, 68.7%id, 5.6%wa, 0.0%hi, 0.0%si, 0.0%st
    Cpu11 : 23.3%us, 2.4%sy, 0.0%ni, 69.2%id, 5.1%wa, 0.0%hi, 0.0%si, 0.0%st
    Cpu12 : 21.8%us, 2.4%sy, 0.0%ni, 60.2%id, 15.5%wa, 0.0%hi, 0.0%si, 0.0%st
    Cpu13 : 21.9%us, 2.4%sy, 0.0%ni, 60.6%id, 15.2%wa, 0.0%hi, 0.0%si, 0.0%st
    Cpu14 : 21.4%us, 2.3%sy, 0.0%ni, 72.6%id, 3.7%wa, 0.0%hi, 0.0%si, 0.0%st
    Cpu15 : 21.5%us, 2.2%sy, 0.0%ni, 73.2%id, 3.1%wa, 0.0%hi, 0.0%si, 0.0%st
    Cpu16 : 21.2%us, 2.2%sy, 0.0%ni, 73.6%id, 3.0%wa, 0.0%hi, 0.0%si, 0.0%st
    Cpu17 : 21.2%us, 2.1%sy, 0.0%ni, 73.8%id, 2.8%wa, 0.0%hi, 0.0%si, 0.0%st
    Cpu18 : 20.9%us, 2.1%sy, 0.0%ni, 74.1%id, 2.9%wa, 0.0%hi, 0.0%si, 0.0%st
    Cpu19 : 21.0%us, 2.1%sy, 0.0%ni, 74.4%id, 2.5%wa, 0.0%hi, 0.0%si, 0.0%st
    Cpu20 : 20.7%us, 2.0%sy, 0.0%ni, 73.8%id, 3.4%wa, 0.0%hi, 0.0%si, 0.0%st
    Cpu21 : 20.8%us, 2.0%sy, 0.0%ni, 73.9%id, 3.2%wa, 0.0%hi, 0.0%si, 0.0%st
    Cpu22 : 20.8%us, 2.0%sy, 0.0%ni, 74.4%id, 2.8%wa, 0.0%hi, 0.0%si, 0.0%st
    Cpu23 : 20.8%us, 1.9%sy, 0.0%ni, 74.4%id, 2.8%wa, 0.0%hi, 0.0%si, 0.0%st
    Mem: 32865032k total, 30209248k used, 2655784k free, 370748k buffers
    Swap: 8388604k total, 5440k used, 8383164k free, 7986552k cached
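    If you need these metrics from a script rather than an interactive top session, standard procps tools can provide them. A minimal sketch (the threshold comments are rules of thumb, not fixed limits):
    # Compare the 1-minute load average with the number of CPU cores;
    # a load far above the core count means processes are queuing.
    cat /proc/loadavg
    nproc

    # Sample CPU usage once per second, 5 times; the "wa" column is the
    # percentage of CPU time spent waiting for I/O.
    vmstat 1 5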

    Monitoring Disk I/O Statistics

    1. Use atop to query disk I/O. In the following example, disk sda shows busy 100%, meaning it is saturated and has become the bottleneck.
    ATOP - lemp 2017/01/23 19:42:32 --------- 10s elapsed
    PRC | sys 3.18s | user 33.24s | #proc 679 | #tslpu 28 | #zombie 0 | #exit 0 |
    CPU | sys 29% | user 330% | irq 1% | idle 1857% | wait 182% | curscal 69% |
    CPL | avg1 33.00 | avg5 35.29 | avg15 35.59 | csw 62610 | intr 76926 | numcpu 24 |
    MEM | tot 31.3G | free 2.1G | cache 7.6G | dirty 41.0M | buff 362.1M | slab 1.2G |
    SWP | tot 8.0G | free 8.0G | | | vmcom 23.9G | vmlim 23.7G |
    DSK | sda | busy 100% | read 4 | write 1789 | MBw/s 2.84 | avio 5.58 ms |
    NET | transport | tcpi 10357 | tcpo 9065 | udpi 0 | udpo 0 | tcpao 174 |
    NET | network | ipi 10360 | ipo 9065 | ipfrw 0 | deliv 10359 | icmpo 0 |
    NET | eth0 4% | pcki 6649 | pcko 6136 | si 1478 Kbps | so 4115 Kbps | erro 0 |
    NET | lo ---- | pcki 4082 | pcko 4082 | si 8967 Kbps | so 8967 Kbps | erro 0 |
    
    PID TID THR SYSCPU USRCPU VGROW RGROW RDDSK WRDSK ST EXC S CPUNR CPU CMD 1/12
    9783 - 156 0.21s 19.44s 0K -788K 4K 1344K -- - S 4 197% mysqld
    5596 - 1 0.10s 0.62s 47204K 47004K 0K 220K -- - S 18 7% php-fpm
    6429 - 1 0.06s 0.34s 19840K 19968K 0K 0K -- - S 21 4% php-fpm
    6210 - 1 0.03s 0.30s -5216K -5204K 0K 0K -- - S 19 3% php-fpm
    5757 - 1 0.05s 0.27s 26072K 26012K 0K 4K -- - S 13 3% php-fpm
    6433 - 1 0.04s 0.28s -2816K -2816K 0K 0K -- - S 11 3% php-fpm
    5846 - 1 0.06s 0.22s -2560K -2660K 0K 0K -- - S 7 3% php-fpm
    5791 - 1 0.05s 0.21s 5764K 5692K 0K 0K -- - S 22 3% php-fpm
    5860 - 1 0.04s 0.21s 48088K 47724K 0K 0K -- - S 1 3% php-fpm
    6231 - 1 0.04s 0.20s -256K -4K 0K 0K -- - S 1 2% php-fpm
    6154 - 1 0.03s 0.21s -3004K -3184K 0K 0K -- - S 21 2% php-fpm
    6573 - 1 0.04s 0.20s -512K -168K 0K 0K -- - S 4 2% php-fpm
    6435 - 1 0.04s 0.19s -3216K -2980K 0K 0K -- - S 15 2% php-fpm
    5954 - 1 0.03s 0.20s 0K 164K 0K 4K -- - S 0 2% php-fpm
    6133 - 1 0.03s 0.19s 41056K 40432K 0K 0K -- - S 18 2% php-fpm
    6132 - 1 0.02s 0.20s 37836K 37440K 0K 0K -- - S 11 2% php-fpm
    6242 - 1 0.03s 0.19s -12.2M -12.3M 0K 4K -- - S 12 2% php-fpm
    6285 - 1 0.02s 0.19s 39516K 39420K 0K 0K -- - S 3 2% php-fpm
    6455 - 1 0.05s 0.16s 29008K 28560K 0K 0K -- - S 14 2% php-fpm
    2. Use one of the following methods to view per-process disk I/O usage (sysstat-based alternatives are sketched at the end of this section):
    In atop, press d to display per-process disk I/O, as shown below:
    ATOP - lemp 2017/01/23 19:42:46 --------- 2s elapsed
    PRC | sys 0.24s | user 1.99s | #proc 679 | #tslpu 54 | #zombie 0 | #exit 0 |
    CPU | sys 11% | user 101% | irq 1% | idle 2089% | wait 208% | curscal 63% |
    CPL | avg1 38.49 | avg5 36.48 | avg15 35.98 | csw 4654 | intr 6876 | numcpu 24 |
    MEM | tot 31.3G | free 2.2G | cache 7.6G | dirty 48.7M | buff 362.1M | slab 1.2G |
    SWP | tot 8.0G | free 8.0G | | | vmcom 23.9G | vmlim 23.7G |
    DSK | sda | busy 100% | read 2 | write 362 | MBw/s 2.28 | avio 5.49 ms |
    NET | transport | tcpi 1031 | tcpo 968 | udpi 0 | udpo 0 | tcpao 45 |
    NET | network | ipi 1031 | ipo 968 | ipfrw 0 | deliv 1031 | icmpo 0 |
    NET | eth0 1% | pcki 558 | pcko 508 | si 762 Kbps | so 1077 Kbps | erro 0 |
    NET | lo ---- | pcki 406 | pcko 406 | si 2273 Kbps | so 2273 Kbps | erro 0 |
    
    PID TID RDDSK WRDSK WCANCL DSK CMD 1/5
    9783 - 0K 468K 16K 40% mysqld
    1930 - 0K 212K 0K 18% flush-8:0
    5896 - 0K 152K 0K 13% nginx
    880 - 0K 148K 0K 13% jbd2/sda5-8
    5909 - 0K 60K 0K 5% nginx
    5906 - 0K 36K 0K 3% nginx
    5907 - 16K 8K 0K 2% nginx
    5903 - 20K 0K 0K 2% nginx
    5901 - 0K 12K 0K 1% nginx
    5908 - 0K 8K 0K 1% nginx
    5894 - 0K 8K 0K 1% nginx
    5911 - 0K 8K 0K 1% nginx
    5900 - 0K 4K 4K 0% nginx
    5551 - 0K 4K 0K 0% php-fpm
    5913 - 0K 4K 0K 0% nginx
    5895 - 0K 4K 0K 0% nginx
    6133 - 0K 0K 0K 0% php-fpm
    5780 - 0K 0K 0K 0% php-fpm
    6675 - 0K 0K 0K 0% atop
    You can also use iotop -oPa to view process disk I/O usage, as shown below:
    Total DISK READ: 15.02 K/s | Total DISK WRITE: 3.82 M/s
    PID PRIO USER DISK READ DISK WRITE SWAPIN IO> COMMAND
    1930 be/4 root 0.00 B 1956.00 K 0.00 % 83.34 % [flush-8:0]
    5914 be/4 nginx 0.00 B 0.00 B 0.00 % 36.56 % nginx: cache manager process
    880 be/3 root 0.00 B 21.27 M 0.00 % 35.03 % [jbd2/sda5-8]
    5913 be/2 nginx 36.00 K 1000.00 K 0.00 % 8.94 % nginx: worker process
    5910 be/2 nginx 0.00 B 1048.00 K 0.00 % 8.43 % nginx: worker process
    5896 be/2 nginx 56.00 K 452.00 K 0.00 % 6.91 % nginx: worker process
    5909 be/2 nginx 20.00 K 1144.00 K 0.00 % 6.24 % nginx: worker process
    5890 be/2 nginx 48.00 K 692.00 K 0.00 % 6.07 % nginx: worker process
    5892 be/2 nginx 84.00 K 736.00 K 0.00 % 5.71 % nginx: worker process
    5901 be/2 nginx 20.00 K 504.00 K 0.00 % 5.46 % nginx: worker process
    5899 be/2 nginx 0.00 B 596.00 K 0.00 % 5.14 % nginx: worker process
    5897 be/2 nginx 28.00 K 1388.00 K 0.00 % 4.90 % nginx: worker process
    5908 be/2 nginx 48.00 K 700.00 K 0.00 % 4.43 % nginx: worker process
    5905 be/2 nginx 32.00 K 1140.00 K 0.00 % 4.36 % nginx: worker process
    5900 be/2 nginx 0.00 B 1208.00 K 0.00 % 4.31 % nginx: worker process
    5904 be/2 nginx 36.00 K 1244.00 K 0.00 % 2.80 % nginx: worker process
    5895 be/2 nginx 16.00 K 780.00 K 0.00 % 2.50 % nginx: worker process
    5907 be/2 nginx 0.00 B 1548.00 K 0.00 % 2.43 % nginx: worker process
    5903 be/2 nginx 36.00 K 1032.00 K 0.00 % 2.34 % nginx: worker process
    6130 be/4 nginx 0.00 B 72.00 K 0.00 % 2.18 % php-fpm: pool www
    5906 be/2 nginx 12.00 K 844.00 K 0.00 % 2.10 % nginx: worker process
    5889 be/2 nginx 40.00 K 1164.00 K 0.00 % 2.00 % nginx: worker process
    5894 be/2 nginx 44.00 K 760.00 K 0.00 % 1.61 % nginx: worker process
    5902 be/2 nginx 52.00 K 992.00 K 0.00 % 1.55 % nginx: worker process
    5893 be/2 nginx 64.00 K 972.00 K 0.00 % 1.22 % nginx: worker process
    5814 be/4 nginx 36.00 K 44.00 K 0.00 % 1.06 % php-fpm: pool www
    6159 be/4 nginx 4.00 K 4.00 K 0.00 % 1.00 % php-fpm: pool www
    5693 be/4 nginx 0.00 B 4.00 K 0.00 % 0.86 % php-fpm: pool www
    5912 be/2 nginx 68.00 K 300.00 K 0.00 % 0.72 % nginx: worker process
    5911 be/2 nginx 20.00 K 788.00 K 0.00 % 0.72 % nginx: worker process
    Use man iotop to view the descriptions of the following parameters:
    -o, --only
    Only show processes or threads actually doing I/O, instead of showing all processes or threads. This can be dynamically toggled by pressing o.
    -P, --processes
    Only show processes. Normally iotop shows all threads.
    
    -a, --accumulated
    Show accumulated I/O instead of bandwidth. In this mode, iotop shows the amount of I/O processes have done since iotop started.
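    If atop or iotop is not available on a node, the sysstat package offers similar views. A minimal sketch (iostat's %util column corresponds roughly to atop's busy percentage, and pidstat -d reports per-process disk I/O):
    # Extended per-device statistics, 1-second interval, 3 samples;
    # a %util near 100% indicates the device is saturated.
    iostat -x 1 3

    # Per-process read/write throughput, 1-second interval, 3 samples.
    pidstat -d 1 3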

    Other Reasons

    Deploying non-Kubernetes services, such as databases, on the node may also cause high loads.
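    To identify such processes, you can list the heaviest CPU consumers together with the cgroup they run in. A minimal sketch (the "kubepods" cgroup path referenced in the comment depends on the container runtime and cgroup version, so verify it on your node):
    # Top CPU consumers with their cgroup; processes whose cgroup path
    # does not contain "kubepods" were not started by Kubernetes.
    ps -eo pid,comm,%cpu,cgroup --sort=-%cpu | head -n 15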