In many TKE scenarios, such as Kubernetes version upgrades and kernel version upgrades, you must remove a node and then add it back. This document describes the process of removing and re-adding a node in detail. This operation can be divided into the following steps:
Before removing and re-adding a node in a cluster, you must first drain the Pods on the node to be removed so that they run on a different node. Draining deletes the Pods on the node one by one and then reconstructs them on another node.
To streamline node maintenance operations, Kubernetes introduced the drain command. It works as follows:
For versions after Kubernetes 1.4, the drain operation first cordons the node and then deletes all the Pods on the node. If a Pod is managed by a controller such as a Deployment, the controller reconstructs the Pod when it detects that the number of Pod replicas has decreased, and schedules it to another node that meets the conditions. If a Pod is a bare Pod that is not managed by a controller, it is not reconstructed after it is drained.
This process deletes Pods first and then re-creates them. It is not a rolling update. Therefore, during the process, some requests for drained services may fail. If all the Pods of a drained service are on the drained node, the service may become completely unavailable.
To avoid this situation, Kubernetes versions 1.4 and later introduced PodDisruptionBudget (PDB). In a PDB policy file, you select a business (a group of Pods) and declare the minimum number of replicas that this business can tolerate. When you then execute the drain operation, Pods are no longer deleted directly. Instead, the eviction API checks whether deleting a Pod satisfies the PDB policy, and the Pod is deleted only if it does, which protects business availability. Note that the impact of the drain operation on businesses can be controlled only if PDB is correctly configured.
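As an illustration of the PDB policy file described above, the following is a minimal sketch that prints a PDB manifest. The PDB name `web-pdb`, the label `app: web`, and the minimum of 2 replicas are hypothetical example values, and the `policy/v1` API version is an assumption that holds for Kubernetes 1.21 and later (earlier clusters use `policy/v1beta1`).

```shell
#!/bin/sh
# Sketch only: print a PodDisruptionBudget manifest for a group of Pods.
# Arguments (hypothetical example values): PDB name, value of the "app" label,
# and the minimum number of replicas that must stay available.
pdb_manifest() {
  name="$1"
  app="$2"
  min_available="$3"
  cat <<EOF
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: ${name}
spec:
  minAvailable: ${min_available}
  selector:
    matchLabels:
      app: ${app}
EOF
}

# Print the manifest for review.
pdb_manifest web-pdb web 2
```

After reviewing the output, you could pipe it to `kubectl apply -f -` to create the PDB in the cluster.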
The draining process involves reconstructing Pods, which may affect services in the cluster. Therefore, it is recommended that you perform the following checks before performing draining:
- If a Pod stores data in a hostPath volume, the data will be lost when the Pod is scheduled to another node, which may affect the business. If the data is important, back it up before draining.
Note: Currently, kubelet pulls images serially. If a large number of Pods are scheduled to the same node in a short period of time, the Pods may take longer to launch.
Currently, there are two ways to complete drainage for TKE clusters:
When the Pods running on a node are drained, this node is cordoned.
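For reference, draining from the command line might look like the sketch below. The node name `example-node` is a hypothetical placeholder, and the flags shown are assumptions about a typical setup: `--ignore-daemonsets` is usually needed because DaemonSet-managed Pods cannot be rescheduled elsewhere, and `--delete-emptydir-data` (named `--delete-local-data` in kubectl versions earlier than 1.20) permits evicting Pods that use emptyDir volumes, whose data is then lost.

```shell
#!/bin/sh
# Sketch only: build the kubectl command for draining a node.
# NODE is a hypothetical placeholder; use a name listed by `kubectl get nodes`.
NODE="${NODE:-example-node}"

drain_command() {
  # --ignore-daemonsets: skip DaemonSet-managed Pods instead of aborting.
  # --delete-emptydir-data: allow evicting Pods that use emptyDir volumes.
  echo "kubectl drain $1 --ignore-daemonsets --delete-emptydir-data"
}

# By default only print the command for review; set RUN=1 to execute it.
cmd="$(drain_command "$NODE")"
if [ "${RUN:-0}" = "1" ]; then
  $cmd
else
  echo "$cmd"
fi
```

Printing the command by default is a deliberate choice here, because draining disrupts running workloads and is worth reviewing before it is executed.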
Note:
- Note the node ID to be used for re-adding the node to the cluster.
- If the node is pay-as-you-go, make sure not to select Terminate pay-as-you-go nodes. Terminated nodes cannot be restored.
Note: Mount Data Disk and Container Directory are not selected by default.
If you need to store the container and image on the data disk, select Mount Data Disk. When Mount Data Disk is selected, formatted system disks of ext3, ext4 or XFS file systems will be mounted directly. Data disks of other file systems or unformatted data disks will be automatically formatted as ext4 and mounted.
If you need to keep the data on the data disk and mount the data disk without formatting it, perform the steps below:
- On the CVM Configuration page, do not select Mount Data Disk.
- Open Advanced Settings. In the Custom Data area, enter the following node initialization script and select Cordon this node.
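    # Stop kubelet and all containers before changing the Docker data directory.
    systemctl stop kubelet
    docker stop $(docker ps -a | awk '{ print $1}' | tail -n +2)
    systemctl stop dockerd
    # Mount the existing data disk at /data without formatting it.
    echo '/dev/vdb /data ext4 noatime,acl,user_xattr 1 1' >> /etc/fstab
    mount -a
    # Point Docker at the data disk instead of /var/lib/docker.
    sed -i 's#"graph": "/var/lib/docker",#"data-root": "/data/docker",#g' /etc/docker/daemon.json
    # Restart the services.
    systemctl start dockerd
    systemctl start kubelet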
Note: After the node is added successfully, it is still cordoned.
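Because the re-added node remains cordoned, it accepts no new Pods until it is uncordoned. The sketch below shows one way this might be done from the command line. The node name `example-node` is a hypothetical placeholder. A cordoned node is shown with `SchedulingDisabled` in the STATUS column of `kubectl get node`.

```shell
#!/bin/sh
# Sketch only: print the commands for checking and uncordoning a node.
# NODE is a hypothetical placeholder; use a name listed by `kubectl get nodes`.
NODE="${NODE:-example-node}"

uncordon_commands() {
  # First show the node status, then make the node schedulable again.
  echo "kubectl get node $1"
  echo "kubectl uncordon $1"
}

# By default only print the commands for review; set RUN=1 to execute them.
if [ "${RUN:-0}" = "1" ]; then
  uncordon_commands "$NODE" | sh
else
  uncordon_commands "$NODE"
fi
```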