Tencent Cloud


Fixing Docker Stopping Without Restarting After a containerd OOM

Last updated: 2024-05-27 16:08:53

    Problem Description

    In Docker 19 and later, if high system memory usage causes containerd to be killed by the Out of Memory (OOM) killer, Docker may stop and not restart automatically. You can reproduce the issue by running pkill -9 containerd; systemctl is-active dockerd containerd. After containerd is killed, systemd also stops dockerd, so dockerd is no longer reported as active.
    In the worst case, regular nodes become NotReady after an OOM event, and if this happens on a master node of an independent cluster, it can trigger a cascading (avalanche) failure.

    Problem Analysis

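    Originally, the Docker community declared the dependency between the two services as BindsTo=containerd.service in dockerd.service. With this dependency, systemd actively stops dockerd whenever containerd stops, including when containerd is killed with SIGKILL (kill -9), which is also the signal the OOM killer sends. Because this stop is initiated by systemd rather than caused by dockerd failing, the Restart= setting in dockerd.service does not apply, and dockerd is not recovered. For more information, see: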
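To check whether a node still has this dependency, you can look for BindsTo= in dockerd.service. The sketch below is a hypothetical helper, not part of the official fix. It assumes the unit file path that the fix script in this document edits, and it inspects only that file, not drop-in directories. On a live node, systemctl show -p BindsTo dockerd reports the effective value, including drop-ins.

```shell
#!/bin/bash
# Hypothetical helper (not part of the official fix): returns 0 if the given
# dockerd unit file declares BindsTo= on containerd.service, 1 if it does not,
# and 2 if the file cannot be read.
# Assumption: the default path matches the one edited by the fix script.
is_bound_to_containerd() {
    local unit="${1:-/usr/lib/systemd/system/dockerd.service}"
    if [[ ! -r "${unit}" ]]; then
        echo "cannot read ${unit}" >&2
        return 2
    fi
    # BindsTo= makes systemd stop dockerd whenever containerd.service stops.
    grep -Eq '^[[:space:]]*BindsTo=.*containerd\.service' "${unit}"
}

# Example: is_bound_to_containerd && echo "affected: dockerd is bound to containerd"
```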

    Fixing Incremental Nodes

    Incremental (newly added) nodes have included this fix since April 20, 2023.

    Fixing Legacy Nodes

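    For legacy (existing) nodes, run the following script as root to apply the fix:
    #!/bin/bash
    set -euo pipefail

    CONTAINERD_UNIT=/usr/lib/systemd/system/containerd.service
    DOCKERD_UNIT=/usr/lib/systemd/system/dockerd.service

    # Insert "Key=Value" right after the ExecStart= line of containerd.service,
    # unless a line starting with "Key=" is already present.
    insert_if_absent() {
        local line="${1}"
        local lead="${line%%=*}="
        if ! grep -q "^${lead}" "${CONTAINERD_UNIT}"; then
            sed -i "/^ExecStart=/a${line}" "${CONTAINERD_UNIT}"
        fi
    }

    # Make containerd very unlikely to be chosen by the OOM killer, and have
    # systemd restart it 5 seconds after it exits, including after SIGKILL.
    insert_if_absent OOMScoreAdjust=-999
    insert_if_absent RestartSec=5
    insert_if_absent Restart=always

    # Replace the hard BindsTo= dependency with a soft Wants= dependency so that
    # systemd no longer stops dockerd when containerd stops.
    # Note: the Wants= edit assumes dockerd.service already has a Wants= line.
    sed -i '/BindsTo/d' "${DOCKERD_UNIT}"
    sed -i 's/^Wants.*/Wants=network-online.target containerd.service/' "${DOCKERD_UNIT}"

    # Reload unit files so systemd picks up the changes.
    systemctl daemon-reload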
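After running the script, you can confirm that the unit files contain the intended settings. The following is a hypothetical check (not part of the official fix) that assumes the same unit file paths as the script and looks for the exact lines the script writes. A pre-existing line with a different value, such as Restart=on-failure, is reported as missing because the script leaves such lines untouched.

```shell
#!/bin/bash
# Hypothetical post-fix check (not part of the official fix): returns 0 if the
# unit files contain the settings written by the fix script, 1 otherwise.
# Assumption: the default paths match the ones the script edits; override via $1/$2.
check_unit_fix() {
    local containerd_unit="${1:-/usr/lib/systemd/system/containerd.service}"
    local dockerd_unit="${2:-/usr/lib/systemd/system/dockerd.service}"
    local status=0
    local setting
    # containerd.service must contain all three lines inserted by the script.
    for setting in OOMScoreAdjust=-999 RestartSec=5 Restart=always; do
        if ! grep -qx "${setting}" "${containerd_unit}"; then
            echo "missing in ${containerd_unit}: ${setting}" >&2
            status=1
        fi
    done
    # dockerd.service must no longer bind itself to containerd.service...
    if grep -q 'BindsTo' "${dockerd_unit}"; then
        echo "still present in ${dockerd_unit}: BindsTo" >&2
        status=1
    fi
    # ...and should instead declare a soft Wants= dependency on it.
    if ! grep -Eq '^Wants=.*containerd\.service' "${dockerd_unit}"; then
        echo "no Wants= on containerd.service in ${dockerd_unit}" >&2
        status=1
    fi
    return "${status}"
}

# Example: check_unit_fix && echo "unit files look fixed"
```

On a live node, systemctl show -p Restart -p OOMScoreAdjust containerd should report the same values after systemctl daemon-reload.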
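    To verify the fix, run the following command. It force-kills containerd and then prints the state of dockerd and containerd. After the fix, dockerd should still be reported as active, and containerd should return to active shortly afterward (it may briefly be reported as activating until systemd restarts it after RestartSec=5). Because pkill matches any process whose name contains containerd, this command may also kill containerd-shim processes, so run it only on a node where disrupting running containers is acceptable. You can also run a docker run command to confirm that containers can still be started.
    pkill -9 containerd; systemctl is-active dockerd containerd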
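Immediately after the kill, containerd may not have been restarted yet, so a single systemctl is-active call can catch it mid-restart. The sketch below is a hypothetical helper that assumes the unit names dockerd and containerd used in this document. It polls each unit once per second until both are active or a timeout expires. It checks the units one at a time because, with several units listed, systemctl is-active returns 0 if at least one of them is active.

```shell
#!/bin/bash
# Hypothetical helper: returns 0 only if both dockerd and containerd are active.
# Assumption: unit names dockerd and containerd, as used in this document.
all_active() {
    local unit
    for unit in dockerd containerd; do
        # Check units one by one: with several units, "systemctl is-active"
        # succeeds as soon as any one of them is active.
        systemctl is-active --quiet "${unit}" || return 1
    done
}

# Wait until both units are active, polling once per second,
# for at most $1 seconds (default 30).
wait_until_active() {
    local timeout="${1:-30}"
    local elapsed=0
    until all_active; do
        if (( elapsed >= timeout )); then
            echo "dockerd/containerd not active after ${timeout}s:" >&2
            systemctl is-active dockerd containerd >&2
            return 1
        fi
        sleep 1
        elapsed=$(( elapsed + 1 ))
    done
    echo "dockerd and containerd are active"
}

# Example: pkill -9 containerd; wait_until_active 30
```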
    