This article describes the causes that will lead a Pod to become stuck in the ContainerCreating
or Waiting
status and how to troubleshoot this issue. Refer to the following instructions to troubleshoot and solve these issues.
Possible Causes
Incorrect Pod configurations
Volume failed to mount
Insufficient disk space
Node memory fragmentation
The value of Limit is too small or uses the wrong unit
Failure to pull the image
CNI network error
controller-manager exception
New Docker installed without completely uninstalling the old version
Duplicate container names
Troubleshooting
Checking Pod configuration
1. Make sure the image is properly packaged.
2. Make sure the container parameters are configured correctly.
Checking volume mounting
In the following two scenarios, volume mounting issues may cause exceptions:
1. A volume fails to unmount due to Pod float
Analysis
The default volume in a managed Kubernetes cluster is usually a storage class cloud disk. If a node malfunctions and causes kubelet to fail or not be able to communicate with apiserver and the time threshold is reached, the Pods on the node are drained and backup Pods on another node are automatically started. This is called Pod floating. Drained Pods cannot function properly nor are they aware of their states. Therefore, the volume mounted to the node is not properly unmounted.
cloud-controller-manager requires the volume to unmount properly in order to invoke vendor APIs to unmount disks from the node. Pod floating causes cloud-controller-manager to force unmount a volume after the time threshold is reached and mount it to the node where the Pod is scheduled.
Impact
The Pod may spend an extended period of time in ContainerCreating but will launch successfully.
2. Hitting a subpath bug when mounting configmap/secret
If you modify the content of configmap or secret that is already mounted and the container restarts in place, such as restarting after being killed for failing a liveness check, this bug is triggered.
In this case, the container continuously fails to launch. The error messages are as follows:
$ kubectl -n prod get pod -o yaml manage-5bd487cf9d-bqmvm
...
lastState: terminated
containerID: containerd://e6746201faa1dfe7f3251b8c30d59ebf613d99715f3b800740e587e681d2a903
exitCode: 128
finishedAt: 2019-09-15T00:47:22Z
message: 'failed to create containerd task: OCI runtime create failed: container_linux.go:345:
starting container process caused "process_linux.go:424: container init
caused \\"rootfs_linux.go:58: mounting \\\\\\"/var/lib/kubelet/pods/211d53f4-d08c-11e9-b0a7-b6655eaf02a6/volume-subpaths/manage-config-volume/manage/0\\\\\\"
to rootfs \\\\\\"/run/containerd/io.containerd.runtime.v1.linux/k8s.io/e6746201faa1dfe7f3251b8c30d59ebf613d99715f3b800740e587e681d2a903/rootfs\\\\\\"
at \\\\\\"/run/containerd/io.containerd.runtime.v1.linux/k8s.io/e6746201faa1dfe7f3251b8c30d59ebf613d99715f3b800740e587e681d2a903/rootfs/app/resources/application.properties\\\\\\"
caused \\\\\\"no such file or directory\\\\\\"\\"": unknown'
For more information on how to resolve this issue, see pr82784. Checking for insufficient disk space
A Pod uses the CRI APIs to create containers when it launches. This usually involves creating directories and files for the new containers under the data directory. If there is not enough disk space, container creation will fail with the following error messages:
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Warning FailedCreatePodSandBox 2m (x4307 over 16h) kubelet, 10.179.80.31 (combined from similar events): Failed create pod sandbox: rpc error: code = Unknown desc = failed to create a sandbox for pod "apigateway-6dc48bf8b6-l8xrw": Error response from daemon: mkdir /var/lib/docker/aufs/mnt/1f09d6c1c9f24e8daaea5bf33a4230de7dbc758e3b22785e8ee21e3e3d921214-init: no space left on device
For more information and further instructions, see Disk Full. Checking for node memory fragmentation
If node memory is severely fragmented or lacks large page memory, requests for more memory will fail even though there is plenty of memory left. For instructions on troubleshooting and solutions, refer to Memory Fragmentation. Checking for limit configuration
Error description
Run kubectl describe pod
and get the following message:
Pod sandbox changed, it will be killed and re-created.
kubelet outputs the following error message:
to start sandbox container for pod ... Error response from daemon: OCI runtime create failed: container_linux.go:348: starting container process caused "process_linux.go:301: running exec setns process for init caused \\"signal: killed\\"": unknown
Solution
If the value of limit is too small, Sandbox will fail to run. This will cause the Pod to remain in the ContainerCreating or Waiting status. This is usually a memory limit unit issue.
For example, if you used m
as the memory limit unit, then Kubernetes reads it as byte. The correct unit to use is Mi
or M
. If you set a memory limit to 1024m, that translates to 1.024 bytes, which causes a container to be killed by cgroup-oom every it attempts to launch. This results in the Pod remaining in the ContainerCreating state.
Checking for image pull failures
The failure to pull an image produces the same issue. There are many reasons why image pull may fail. The common ones are as follows:
The wrong image is used
kubelet cannot access the image registry. For example, images hosted on gcr.io are more difficult to access from Mainland China.
imagePullSecret is not present or is incorrect when pulling private images.
Image pull times out due to the size of the image. Adjust the value of --runtime-request-timeout
and --image-pull-progress-deadline
.
Pull the image again after checking the above items and check the state of the Pod.
Checking for CNI errors
Make sure that CNI is configured and running properly. If not, you get the following messages:
Cannot configure Pod network
Cannot assign Pod IP address
Checking for controller-manager issues
Make sure the Master kube-controller-manager is running properly. Restart it if it is not.
Checking for existing Docker versions
If the node already has Docker installed or installed Docker without completely uninstalling the old Docker, a Pod may encounter the same issue.
For example, if you have installed Docker multiple times using the following command in CentOS:
Due to the incompatibility issue among components of different versions, dockerd continuously fails to create containers. This results in the Pod remaining in the ContainerCreating status. Use kubectl describe pod
and get the following error messages:
Type Reason Age From Message
---- ------ ---- ---- -------
Warning FailedCreatePodSandBox 18m (x3583 over 83m) kubelet, 192.168.4.5 (combined from similar events): Failed create pod sandbox: rpc error: code = Unknown desc = failed to start sandbox container for pod "nginx-7db9fccd9b-2j6dh": Error response from daemon: ttrpc: client shutting down: read unix @->@/containerd-shim/moby/de2bfeefc999af42783115acca62745e6798981dff75f4148fae8c086668f667/shim.sock: read: connection reset by peer: unknown
Normal SandboxChanged 3m12s (x4420 over 83m) kubelet, 192.168.4.5 Pod sandbox changed, it will be killed and re-created.
Choose a Docker version to keep and completely uninstall the other versions.
Checking for duplicate container names
Duplicate container names on the same node cause sandbox creation failures, which leads to Pods remaining in the ContainerCreating and Waiting statuses.
Run kubectl describe pod
and get the following error messages:
Warning FailedCreatePodSandBox 2m kubelet, 10.205.8.91 Failed create pod sandbox: rpc error: code = Unknown desc = failed to create a sandbox for pod "lomp-ext-d8c8b8c46-4v8tl": operation timeout: context deadline exceeded
Warning FailedCreatePodSandBox 3s (x12 over 2m) kubelet, 10.205.8.91 Failed create pod sandbox: rpc error: code = Unknown desc = failed to create a sandbox for pod "lomp-ext-d8c8b8c46-4v8tl": Error response from daemon: Conflict. The container name "/k8s_POD_lomp-ext-d8c8b8c46-4v8tl_default_65046a06-f795-11e9-9bb6-b67fb7a70bad_0" is already in use by container "30aa3f5847e0ce89e9d411e76783ba14accba7eb7743e605a10a9a862a72c1e2". You have to remove (or rename) that container to be able to reuse that name.
Change container names and make sure there are no duplicate container names on the same node.
Was this page helpful?