Memory Fragmentation

This article describes how to determine whether a TKE cluster issue is caused by memory fragmentation and how to troubleshoot it.
Problem Analysis
When a memory page allocation fails, the kernel logs an error message such as the following:
mysqld: page allocation failure. order:4, mode:0x10c0d0
mysqld: the process that requested the memory.
order: the size of the request, expressed as a power of two. The kernel tried to allocate 2^order contiguous memory pages. This example has an order of 4, which means 2^4 = 16 contiguous pages.
mode: the memory allocation flags (GFP mask). The flags are defined in the kernel source file include/linux/gfp.h, and the value is usually the bitwise OR of several flags. Flag values differ between kernel versions. For example, in newer kernels GFP_KERNEL is __GFP_RECLAIM | __GFP_IO | __GFP_FS, and __GFP_RECLAIM is ___GFP_DIRECT_RECLAIM | ___GFP_KSWAPD_RECLAIM.
Note: 
When the value of order is 0, not even a single page could be allocated, which means the system has no available memory.
When the value of order is large, the memory is likely fragmented: free memory may exist, but no contiguous block of the requested size can be allocated.
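To make the order arithmetic concrete, the following sketch reads kernel log lines from standard input and prints the order, the number of contiguous pages (2^order), and the contiguous size in KiB for each allocation failure. The function name summarize_alloc_failures is hypothetical, and the 4096-byte default page size is an assumption; check `getconf PAGESIZE` on the node before relying on the KiB figure.

```shell
# Summarize "page allocation failure" lines read from stdin.
# Argument 1 (optional): page size in bytes. The 4096 default is an assumption.
summarize_alloc_failures() {
  local page_size="${1:-4096}"
  awk -v ps="$page_size" '
    /page allocation failure/ {
      order = ""; mode = ""
      for (i = 1; i <= NF; i++) {
        # Fields look like "order:4," and "mode:0x10c0d0".
        if ($i ~ /^order:/) { order = $i; sub(/^order:/, "", order); sub(/,$/, "", order) }
        if ($i ~ /^mode:/)  { mode = $i;  sub(/^mode:/, "", mode);  sub(/,$/, "", mode) }
      }
      if (order == "") next
      pages = 2 ^ order
      printf "order=%d pages=%d contiguous_kib=%d mode=%s\n", order, pages, pages * ps / 1024, mode
    }'
}
```

One possible way to use it is to pipe the kernel log into the function, for example `dmesg | summarize_alloc_failures`. The mode value is passed through unchanged because the flag bits differ between kernel versions.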
Error Description
Container fails to launch
Kubernetes creates a netns for each Pod to isolate its network namespace. When the kernel initializes a netns, it creates a cache for the nf_conntrack table, which requires a large contiguous block of memory pages. If system memory is already fragmented, this allocation fails and the kernel logs the following error message (affected kernel versions: v2.6.33 - v4.6):
runc:[1:CHILD]: page allocation failure: order:6, mode:0x10c0d0
The Pod remains in the ContainerCreating status because dockerd fails to launch the container. The related dockerd log entries are as follows:
Jan 23 14:15:31 dc05 dockerd: time="2019-01-23T14:15:31.288446233+08:00" level=error msg="containerd: start container" error="oci runtime error: container_linux.go:247: starting container process caused \\"process_linux.go:245: running exec setns process for init caused \\\\\\"exit status 6\\\\\\"\\"\\n" id=5b9be8c5bb121264899fac8d9d36b02150269d41ce96ba6ad36d70b8640cb01c
Jan 23 14:15:31 dc05 dockerd: time="2019-01-23T14:15:31.317965799+08:00" level=error msg="Create container failed with error: invalid header field value \\"oci runtime error: container_linux.go:247: starting container process caused \\\\\\"process_linux.go:245: running exec setns process for init caused \\\\\\\\\\\\\\"exit status 6\\\\\\\\\\\\\\"\\\\\\"\\\\n\\""
The kubelet log entries are as follows:
Jan 23 14:15:31 dc05 kubelet: E0123 14:15:31.352386   26037 remote_runtime.go:91] RunPodSandbox from runtime service failed: rpc error: code = 2 desc = failed to start sandbox container for pod "matchdataserver-1255064836-t4b2w": Error response from daemon: {"message":"invalid header field value \\"oci runtime error: container_linux.go:247: starting container process caused \\\\\\"process_linux.go:245: running exec setns process for init caused \\\\\\\\\\\\\\"exit status 6\\\\\\\\\\\\\\"\\\\\\"\\\\n\\""}
Jan 23 14:15:31 dc05 kubelet: E0123 14:15:31.352496   26037 kuberuntime_sandbox.go:54] CreatePodSandbox for pod "matchdataserver-1255064836-t4b2w_basic(485fd485-1ed6-11e9-8661-0a587f8021ea)" failed: rpc error: code = 2 desc = failed to start sandbox container for pod "matchdataserver-1255064836-t4b2w": Error response from daemon: {"message":"invalid header field value \\"oci runtime error: container_linux.go:247: starting container process caused \\\\\\"process_linux.go:245: running exec setns process for init caused \\\\\\\\\\\\\\"exit status 6\\\\\\\\\\\\\\"\\\\\\"\\\\n\\""}
Jan 23 14:15:31 dc05 kubelet: E0123 14:15:31.352518   26037 kuberuntime_manager.go:618] createPodSandbox for pod "matchdataserver-1255064836-t4b2w_basic(485fd485-1ed6-11e9-8661-0a587f8021ea)" failed: rpc error: code = 2 desc = failed to start sandbox container for pod "matchdataserver-1255064836-t4b2w": Error response from daemon: {"message":"invalid header field value \\"oci runtime error: container_linux.go:247: starting container process caused \\\\\\"process_linux.go:245: running exec setns process for init caused \\\\\\\\\\\\\\"exit status 6\\\\\\\\\\\\\\"\\\\\\"\\\\n\\""}
Jan 23 14:15:31 dc05 kubelet: E0123 14:15:31.352580   26037 pod_workers.go:182] Error syncing pod 485fd485-1ed6-11e9-8661-0a587f8021ea ("matchdataserver-1255064836-t4b2w_basic(485fd485-1ed6-11e9-8661-0a587f8021ea)"), skipping: failed to "CreatePodSandbox" for "matchdataserver-1255064836-t4b2w_basic(485fd485-1ed6-11e9-8661-0a587f8021ea)" with CreatePodSandboxError: "CreatePodSandbox for pod \\"matchdataserver-1255064836-t4b2w_basic(485fd485-1ed6-11e9-8661-0a587f8021ea)\\" failed: rpc error: code = 2 desc = failed to start sandbox container for pod \\"matchdataserver-1255064836-t4b2w\\": Error response from daemon: {\\"message\\":\\"invalid header field value \\\\\\"oci runtime error: container_linux.go:247: starting container process caused \\\\\\\\\\\\\\"process_linux.go:245: running exec setns process for init caused \\\\\\\\\\\\\\\\\\\\\\\\\\\\\\"exit status 6\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\"\\\\\\\\\\\\\\"\\\\\\\\n\\\\\\"\\"}"
Jan 23 14:15:31 dc05 kubelet: I0123 14:15:31.372181   26037 kubelet.go:1916] SyncLoop (PLEG): "matchdataserver-1255064836-t4b2w_basic(485fd485-1ed6-11e9-8661-0a587f8021ea)", event: &pleg.PodLifecycleEvent{ID:"485fd485-1ed6-11e9-8661-0a587f8021ea", Type:"ContainerDied", Data:"5b9be8c5bb121264899fac8d9d36b02150269d41ce96ba6ad36d70b8640cb01c"}
Jan 23 14:15:31 dc05 kubelet: W0123 14:15:31.372225   26037 pod_container_deletor.go:77] Container "5b9be8c5bb121264899fac8d9d36b02150269d41ce96ba6ad36d70b8640cb01c" not found in pod's containers
Jan 23 14:15:31 dc05 kubelet: I0123 14:15:31.678211   26037 kuberuntime_manager.go:383] No ready sandbox for pod "matchdataserver-1255064836-t4b2w_basic(485fd485-1ed6-11e9-8661-0a587f8021ea)" can be found. Need to start a new one
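Use cat /proc/buddyinfo to view the free memory blocks tracked by the buddy allocator. For each zone, the columns from left to right show the number of free blocks of order 0, 1, 2, and so on. If no large contiguous blocks are available, the right-hand columns are all 0, as shown below: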
$ cat /proc/buddyinfo
Node 0, zone      DMA      1      0      1      0      2      1      1      0      1      1      3
Node 0, zone    DMA32   2725    624    489    178      0      0      0      0      0      0      0
Node 0, zone   Normal   1163   1101    932    222      0      0      0      0      0      0      0
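Reading the columns by eye is error-prone, so here is a minimal sketch that sums, for each zone, the free blocks at or above a given order. The function name free_blocks_at_or_above and the default minimum order of 4 are hypothetical choices for illustration; the input is expected on standard input in the /proc/buddyinfo layout shown above.

```shell
# Print, per zone, the number of free blocks at or above a minimum order.
# Reads /proc/buddyinfo-formatted text from stdin.
# Argument 1 (optional): minimum order, default 4 (an illustrative choice).
free_blocks_at_or_above() {
  local min_order="${1:-4}"
  awk -v min="$min_order" '
    $1 == "Node" && $3 == "zone" {
      total = 0
      # Field 5 holds the order-0 count, field 6 the order-1 count, and so on.
      for (i = 5 + min; i <= NF; i++) total += $i
      node = $2; sub(/,$/, "", node)
      printf "node=%s zone=%s free_blocks_order_ge_%d=%d\n", node, $4, min, total
    }'
}
```

On a node this could be run as `free_blocks_at_or_above 4 < /proc/buddyinfo`. For the sample output above, the DMA zone reports 9 such blocks while DMA32 and Normal report 0, which matches the failing order:4 and order:6 requests.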
System OOM
Memory fragmentation leaves the system without large contiguous blocks of memory pages, so allocations that need such blocks fail even though plenty of memory is free in total. The system then assumes it is out of memory and tries to terminate processes in order to release memory, which leads to system OOM errors.
Directions
1. Drop the cache periodically, or whenever large contiguous memory blocks run short:
echo 3 > /proc/sys/vm/drop_caches
2. Compact the memory. Note that this operation is resource intensive and may cause business interruptions. Run the following command:
echo 1 > /proc/sys/vm/compact_memory
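Because compaction is expensive, one option is to trigger it only when a zone is actually short of large blocks. The sketch below is one way to do that, not a recommended policy: the function names zone_is_fragmented and compact_if_fragmented, the default zone Normal, the default order 4, and the default threshold of 10 blocks are all hypothetical. Writing to /proc/sys/vm/compact_memory requires root, and the file is only present on kernels built with compaction support.

```shell
# Exit status 0 when the named zone has fewer than min_blocks free blocks
# of order >= min_order, 1 when it has enough, 2 when the zone is not found.
# Reads /proc/buddyinfo-formatted text from stdin.
# All three defaults are illustrative assumptions, not tuned values.
zone_is_fragmented() {
  local zone="${1:-Normal}" min_order="${2:-4}" min_blocks="${3:-10}"
  awk -v zone="$zone" -v min="$min_order" -v need="$min_blocks" '
    $1 == "Node" && $4 == zone {
      found = 1
      # Field 5 holds the order-0 count, so order k is field 5 + k.
      for (i = 5 + min; i <= NF; i++) total += $i
    }
    END {
      if (!found) exit 2
      if (total + 0 < need + 0) exit 0
      exit 1
    }'
}

# Run the compaction command only when the check above reports a shortage.
# Must be run as root on the node.
compact_if_fragmented() {
  if zone_is_fragmented "$@" < /proc/buddyinfo; then
    echo 1 > /proc/sys/vm/compact_memory
  fi
}
```

For example, `compact_if_fragmented Normal 4 10` would compact only when the Normal zone has fewer than 10 free blocks of order 4 or higher. Whether those numbers suit a given workload is something to measure rather than assume.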
