Pod Remains in Terminating
Last updated: 2024-12-13 14:48:39
This article describes the causes that can leave a Pod in the Terminating status and how to troubleshoot these issues.

Possible Causes

Insufficient disk space
Files with the i attribute exist
A bug in Docker version 17
Finalizers exist
A bug in earlier versions of kubelet list-watch
Dockerd and containerd statuses are not in sync
A bug in Daemonset Controller

Troubleshooting

Checking if disk space is sufficient

If the disk where the Docker data directory resides is full, Docker cannot function properly: it cannot delete or create containers, so it cannot respond to kubelet's requests to delete containers. Run kubectl describe pod <pod-name> to query the events; you will see a message similar to the following:
Normal Killing 39s (x735 over 15h) kubelet, 10.179.80.31 Killing container with id docker://apigateway:Need to kill Pod
For solutions and more information, see Disk Full.
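The following is a minimal check, assuming the Docker data directory is the default /var/lib/docker (it may be a custom path such as /data/docker on your nodes):
# Find the configured Docker data directory
docker info | grep "Docker Root Dir"
# Check free space on the filesystem that holds it
df -h /var/lib/docker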

Checking to see if files with the i attribute exist

Error description

Use man chattr to display a description of the i attribute, as shown below:
A file with the 'i' attribute cannot be modified: it cannot be deleted or renamed, no link can be created to this file and no data can be written to the file. Only the superuser or a process possessing the CAP_LINUX_IMMUTABLE capability can set or clear this attribute.
Note:
If the container image file itself or files stored in the container have the i attribute, they cannot be modified or deleted.
When Pods are deleted, container directories are cleaned. If the directories have files that cannot be deleted, the directories cannot be deleted, which causes the Pods to remain in the Terminating status. In this case, kubelet displays the following error message:
Sep 27 14:37:21 VM_0_7_centos kubelet[14109]: E0927 14:37:21.922965 14109 remote_runtime.go:250] RemoveContainer "19d837c77a3c294052a99ff9347c520bc8acb7b8b9a9dc9fab281fc09df38257" from runtime service failed: rpc error: code = Unknown desc = failed to remove container "19d837c77a3c294052a99ff9347c520bc8acb7b8b9a9dc9fab281fc09df38257": Error response from daemon: container 19d837c77a3c294052a99ff9347c520bc8acb7b8b9a9dc9fab281fc09df38257: driver "overlay2" failed to remove root filesystem: remove /data/docker/overlay2/b1aea29c590aa9abda79f7cf3976422073fb3652757f0391db88534027546868/diff/usr/bin/bash: operation not permitted
Sep 27 14:37:21 VM_0_7_centos kubelet[14109]: E0927 14:37:21.923027 14109 kuberuntime_gc.go:126] Failed to remove container "19d837c77a3c294052a99ff9347c520bc8acb7b8b9a9dc9fab281fc09df38257": rpc error: code = Unknown desc = failed to remove container "19d837c77a3c294052a99ff9347c520bc8acb7b8b9a9dc9fab281fc09df38257": Error response from daemon: container 19d837c77a3c294052a99ff9347c520bc8acb7b8b9a9dc9fab281fc09df38257: driver "overlay2" failed to remove root filesystem: remove /data/docker/overlay2/b1aea29c590aa9abda79f7cf3976422073fb3652757f0391db88534027546868/diff/usr/bin/bash: operation not permitted

Solution

Permanent solution: do not include files with the i attribute in container images, and do not set the i attribute on files inside running containers.
Temporary solution:
1. Use the file path in the kubelet log and run the chattr -i <file> command, as shown below:
chattr -i /data/docker/overlay2/b1aea29c590aa9abda79f7cf3976422073fb3652757f0391db88534027546868/diff/usr/bin/bash
2. Wait for kubelet to retry the removal. The Pod can then be deleted.
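If you are not sure which files carry the i attribute, you can scan the container layer first. The following is a sketch; <container-layer-id> is a placeholder for the overlay2 directory shown in the kubelet log above:
# Recursively list file attributes and print files whose flags include "i" (immutable)
lsattr -R /data/docker/overlay2/<container-layer-id>/diff 2>/dev/null | awk 'NF==2 && $1 ~ /i/ {print $2}'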

Checking for the bug in Docker version 17

Error description

Docker hangs without any response. Running kubectl describe pod <pod-name> returns the following results:
Warning FailedSync 3m (x408 over 1h) kubelet, 10.179.80.31 error determining status: rpc error: code = DeadlineExceeded desc = context deadline exceeded
The cause is likely to be a bug in Docker version 17. You can use kubectl -n cn-staging delete pod apigateway-6dc48bf8b6-clcwk --force --grace-period=0 to force delete the pod, but you can still see it using docker ps.
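To confirm the dockerd version running on the affected node before upgrading, you can run the following:
# Print the Docker server (dockerd) version
docker version --format '{{.Server.Version}}'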

Solution

Upgrade Docker to version 18, which includes a newer dockerd and fixes many bugs.
If the problem persists, submit a ticket for further assistance. We do not recommend that you force delete the pod as this may impact your business.

Checking for Finalizers

Error description

If a Kubernetes resource carries finalizers in its metadata, it is usually managed by an application, and the finalizers field contains identifiers of that application. For example, resources created by Rancher carry a Rancher finalizers identifier.
Such resources can only be deleted after the responsible application cleans them up and removes the finalizers identifiers.

Solution

Use kubectl edit to manually remove the finalizers from the resource; it can then be deleted.
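The following is a minimal sketch for a Pod; the name my-pod and the namespace default are placeholders, and the same pattern applies to other resource types:
# Show the finalizers currently set on the resource
kubectl -n default get pod my-pod -o jsonpath='{.metadata.finalizers}'
# Remove all finalizers with a merge patch (or use kubectl edit and delete the entries manually)
kubectl -n default patch pod my-pod --type=merge -p '{"metadata":{"finalizers":null}}'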

Checking for a bug in an earlier version of kubelet list-watch

In Kubernetes v1.8.13, kubelet list-watch has a bug that prevents kubelet from receiving the deletion event after a Pod is deleted, so the Pod is never actually removed and remains in the Terminating status.
Refer to Updating Clusters for instructions on how to update Kubernetes.
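You can check the kubelet version reported by each node to see whether it is affected, for example:
# List each node with its kubelet version
kubectl get nodes -o custom-columns=NAME:.metadata.name,KUBELET:.status.nodeInfo.kubeletVersion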

Checking if dockerd and containerd are synchronized

Error description

If you use the AUFS storage driver and the disk is full, the kernel may panic and output the following error message:
aufs au_opts_verify:1597:dockerd[5347]: dirperm1 breaks the protection by the permission bits on the lower branch
If this happens, it may lead to status synchronization issues, and dockerd logs may contain records similar to the following:
Sep 18 10:19:49 VM-1-33-ubuntu dockerd[4822]: time="2019-09-18T10:19:49.903943652+08:00" level=error msg="Failed to log msg \"\" for logger json-file: write /opt/docker/containers/54922ec8b1863bcc504f6dac41e40139047f7a84ff09175d2800100aaccbad1f/54922ec8b1863bcc504f6dac41e40139047f7a84ff09175d2800100aaccbad1f-json.log: no space left on device"

Analysis

You can use the following methods to find out whether dockerd and containerd are in sync.
Use kubectl describe pod to obtain the container ID, then run docker ps to query the status that dockerd reports for that container.
Use docker-container-ctr to query the container status in containerd, as shown below:
$ docker-container-ctr --namespace moby --address /var/run/docker/containerd/docker-containerd.sock task ls |grep a9a1785b81343c3ad2093ad973f4f8e52dbf54823b8bb089886c8356d4036fe0
a9a1785b81343c3ad2093ad973f4f8e52dbf54823b8bb089886c8356d4036fe0 30639 STOPPED
If containerd reports the container as STOPPED (or returns no record) while dockerd reports it as running, the container status is not synchronized between dockerd and containerd.
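For the dockerd side of the comparison, you can query the container state directly; the ID below is the one from the example above:
# Ask dockerd for the container's state (running, exited, and so on)
docker inspect -f '{{.State.Status}}' a9a1785b81343c3ad2093ad973f4f8e52dbf54823b8bb089886c8356d4036fe0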

Solution

Temporary solution: run docker container prune or restart dockerd (see the sketch below).
Permanent solution: use containerd directly as the container runtime, bypassing dockerd, to avoid the bug in dockerd.
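A sketch of the temporary workaround; the Docker service name is usually docker, but it can differ by distribution:
# Remove all stopped containers so that stale records are cleared
docker container prune -f
# Or restart the Docker daemon
systemctl restart docker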

Checking for the Daemonset Controller bug

Kubernetes 1.10 and 1.11 have a bug that causes DaemonSet Pods to remain in the Terminating status. The DaemonSet controller reuses the scheduler's predicates logic, which sorts the nodeSelector array (passed by pointer) derived from nodeAffinity. As a result, the in-memory spec differs from the spec stored by the apiserver. Because the DaemonSet controller uses the spec to calculate the DaemonSet hash for version control, this difference causes the Pod to get stuck in a loop of being launched and stopped.

Solution

Temporary solution: make sure DaemonSets that use the rollingUpdate strategy select nodes with nodeSelector rather than nodeAffinity (you can check the current configuration as shown below).
Permanent solution: refer to Updating Clusters for instructions on how to update Kubernetes to 1.12.
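To check whether a DaemonSet relies on nodeAffinity, you can inspect its Pod template; my-daemonset and the kube-system namespace are placeholders:
# Print any affinity configured in the DaemonSet's Pod template; empty output means nodeAffinity is not used
kubectl -n kube-system get daemonset my-daemonset -o jsonpath='{.spec.template.spec.affinity}'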