TKE clusters reserve some node resources to run system components (such as kubelet, kube-proxy, and the container runtime). As a result, a node's total resources can differ from its allocatable resources. This document describes the new node resource reservation algorithm for TKE clusters running Kubernetes 1.30 and later, helping you set reasonable resource requests and limits for Pods when deploying an application.
Applicable Scope
This algorithm is applicable to both normal and native nodes in clusters with Kubernetes versions 1.30 and later.
Definition
Resources used by the entire machine = Resources occupied by system processes to maintain node operation + Resources occupied by Pod operation
Note:
This document primarily discusses the reservation algorithm for resources occupied by system processes to maintain node operation.
Node CPU Reservation Rules
Because CPU is a compressible resource and some kernel threads run per CPU, we aim to keep the algorithm consistent while providing more resources on large instance types. Under the updated algorithm, the reserved CPU is the sum of the following:
6% of the first core.
1% of the second core (up to 2 cores).
0.5% of each of the next 2 cores (up to 4 cores).
0.25% of each core above 4 cores.
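The tiered rule above can be sketched as a short Python function. This is a minimal illustration, not TKE's actual implementation; the function name `reserved_cpu_cores` and the rounding to four decimals are assumptions made for this example.

```python
def reserved_cpu_cores(total_cores: int) -> float:
    """Return the CPU (in cores) reserved under the tiered rule.

    Hypothetical helper for illustration only.
    """
    if total_cores < 1:
        raise ValueError("total_cores must be at least 1")

    # (number of cores in the tier, fraction of each core that is reserved)
    tiers = [(1, 0.06), (1, 0.01), (2, 0.005)]

    remaining = total_cores
    reserved = 0.0
    for tier_size, fraction in tiers:
        used = min(remaining, tier_size)
        reserved += used * fraction
        remaining -= used

    # Every core beyond the first 4 contributes 0.25% of a core.
    reserved += remaining * 0.0025

    # Round to hide floating-point noise (rounding is an assumption).
    return round(reserved, 4)


if __name__ == "__main__":
    for cores in (1, 2, 4, 8, 16, 32, 64, 128, 256):
        print(cores, reserved_cpu_cores(cores))
```

For the core counts 1, 2, 4, 8, 16, 32, 64, 128, and 256, this reproduces the New Algorithm column of the CPU comparison table.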
Note that creating too many Pods on small machines may leave the reserved CPU insufficient and affect system stability. The following behaviors can also consume system resources; you can use custom parameters to adjust the reserved resources accordingly:
Printing excessive container logs (which are piped to containerd and finally compressed and written to disk by kubelet);
Running exec probes too frequently (such as once per second); each probe executes a command in the container via runc and can consume a large amount of CPU;
Deploying other services on nodes (services not managed as Pods consume reserved node resources, leading to system instability).
The following table compares the CPU reserved by the old and new algorithms:
CPU (Cores) | Old Algorithm (Cores) | New Algorithm (Cores) |
1 | 0.1 | 0.06 |
2 | 0.1 | 0.07 |
4 | 0.1 | 0.08 |
8 | 0.2 | 0.09 |
16 | 0.4 | 0.11 |
32 | 0.8 | 0.15 |
64 | 1.6 | 0.23 |
128 | 2.4 | 0.39 |
256 | 3.04 | 0.71 |
Node Memory Reservation Rules
Because memory is an incompressible resource, configure the reserved memory with caution. Tests show that memory usage is closely related to the number of Pods and the machine specifications. After tuning, the new algorithm is min(old algorithm, 20 MiB * number of Pods + 256 MiB). Note that deploying other services on a node also consumes reserved node resources and reduces node stability. If you need to deploy services that are not managed as Pods, we recommend adjusting the reserved resources.
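As a minimal sketch of the formula min(old algorithm, 20 MiB * number of Pods + 256 MiB), the following Python function takes the old algorithm's reservation as an input rather than recomputing it, since the old rule is documented elsewhere. The function and parameter names are hypothetical. Working in MiB avoids unit conversion; the Pod-count columns of the comparison table appear to list the MiB result divided by 1000 (for example, 1536 MiB for 64 Pods is shown as 1.54), which is an inference from the numbers rather than something this document states.

```python
def new_reserved_memory_mib(old_reserved_mib: float, pod_count: int) -> float:
    """Return the memory (in MiB) reserved under the new algorithm.

    old_reserved_mib: reservation produced by the old algorithm, in MiB
                      (supplied by the caller; not derived here).
    pod_count:        number of Pods used in the formula.
    Hypothetical helper for illustration only.
    """
    if pod_count < 0:
        raise ValueError("pod_count must not be negative")

    # Per-Pod term of the formula: 20 MiB per Pod plus a 256 MiB base.
    pod_based_mib = 20 * pod_count + 256

    # The new algorithm never reserves more than the old one did.
    return min(old_reserved_mib, pod_based_mib)


if __name__ == "__main__":
    # Illustrative inputs: an old reservation of 1800 MiB with 64 Pods.
    print(new_reserved_memory_mib(1800, 64))  # 1536
```

Because of the min, small machines keep their old reservation, while larger machines are capped by the Pod-based term.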
The following table compares the memory reserved by the old algorithm with the memory reserved by the new algorithm at different Pod counts:
Memory Size (GiB) | Old Algorithm (GiB) | 16 Pods (GiB) | 32 Pods (GiB) | 64 Pods (GiB) | 128 Pods (GiB) | 256 Pods (GiB) |
1 | 0.25 | 0.25 | 0.25 | 0.25 | 0.25 | 0.25 |
2 | 0.5 | 0.5 | 0.5 | 0.5 | 0.5 | 0.5 |
4 | 1 | 0.58 | 0.9 | 1 | 1 | 1 |
8 | 1.8 | 0.58 | 0.9 | 1.54 | 1.8 | 1.8 |
16 | 2.6 | 0.58 | 0.9 | 1.54 | 2.6 | 2.6 |
32 | 3.56 | 0.58 | 0.9 | 1.54 | 2.82 | 3.56 |
64 | 5.48 | 0.58 | 0.9 | 1.54 | 2.82 | 5.38 |
128 | 9.32 | 0.58 | 0.9 | 1.54 | 2.82 | 5.38 |
256 | 11.88 | 0.58 | 0.9 | 1.54 | 2.82 | 5.38 |
FAQs
What Are the Differences Between the New and Old Algorithms?
Old algorithm: reserves a fixed amount of resources based on machine specifications. For details, see Resource Reservation Description. This algorithm is relatively conservative and reserves a large amount of resources, so fewer resources are available for business use.
New algorithm: derives the resource reservation formula from large-scale load tests on various machine specifications with different numbers of Pods. It provides more resources for business use on large instance types.
Why Is kube-reserved Configured But Not system-reserved?
Kubelet computes the allocatable node resources as capacity - kubeReserved - systemReserved - evictionHard, which controls the amount of resources available for Pods. According to the Kubernetes documentation, kubeReserved can be used to restrict the resource usage of kubelet and the container runtime, while systemReserved can be used to restrict the resource usage of system services. However, these restrictions take effect only if you enable them through enforce-node-allocatable and specify kube-reserved-cgroup and system-reserved-cgroup, which imposes additional requirements on the cgroup layout of the system. Moreover, for the sake of overall stability, neither the official provider of the community edition nor many cloud vendors enable these restrictions. Therefore, TKE does not impose resource limits on kubelet or system components. In practice, it makes no difference whether the reserved resources are configured entirely with kube-reserved or split between kube-reserved and system-reserved.
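The arithmetic behind this can be sketched in a few lines of Python. This is only an illustration of the expression capacity - kubeReserved - systemReserved - evictionHard; the function name and all numeric values are hypothetical and are not TKE defaults.

```python
def allocatable_mib(capacity: int, kube_reserved: int = 0,
                    system_reserved: int = 0, eviction_hard: int = 0) -> int:
    """Return allocatable memory in MiB for the given reservations.

    Hypothetical helper; mirrors only the subtraction described above.
    """
    return capacity - kube_reserved - system_reserved - eviction_hard


if __name__ == "__main__":
    # Illustrative numbers (assumptions): 8192 MiB node, 1536 MiB reserved
    # in total, 100 MiB hard eviction threshold.
    all_in_kube = allocatable_mib(8192, kube_reserved=1536, eviction_hard=100)
    split = allocatable_mib(8192, kube_reserved=1000, system_reserved=536,
                            eviction_hard=100)
    print(all_in_kube, split)  # 6556 6556
```

Only the sum of the two reservations enters the result, so when no cgroup enforcement is enabled, how the total is divided between kube-reserved and system-reserved does not change what is available to Pods.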
Does the New Algorithm Support Nodes of Earlier Versions?
Not yet. In the future, the new algorithm will be applied to existing nodes through node upgrades.