Tencent Cloud

Fixing Docker Stopping and Not Restarting After an OOM

Last updated: 2024-05-27 16:08:53

    Problem Description

    In Docker 19 and later versions, when excessive system memory usage causes containerd to hit an Out of Memory (OOM) condition, Docker may stop and not restart automatically. The issue can be reproduced by running pkill -9 containerd; systemctl is-active dockerd containerd. After this command, dockerd has been stopped by systemd.
    In the worst case, general nodes become NotReady after an OOM, and a failure on the primary node of an independent cluster can trigger an avalanche effect.

    Problem Analysis

    Originally, the Docker community declared the dependency between dockerd and containerd as dockerd.service BindsTo containerd.service. With this setting, systemd actively stops dockerd whenever containerd is forcibly terminated, for example by kill -9. Because dockerd is then stopped by systemd itself rather than failing on its own, a Restart setting in the dockerd unit does not bring it back. For more information, see:

    Fixing Incremental Nodes

    Incremental (newly added) nodes have included the fix since April 20, 2023.
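
Whether a given node still carries the old dependency can be checked before running any repair. The following is a minimal sketch, assuming the dockerd unit file lives at /usr/lib/systemd/system/dockerd.service (the path edited by the repair script in this document). The name `needs_fix` is a hypothetical helper, not part of any Docker or systemd tooling.

```shell
#!/bin/bash
# Hypothetical helper: report whether a dockerd unit file still binds
# dockerd to containerd.
# Exit status: 0 = BindsTo on containerd.service found (fix needed),
#              1 = not found, 2 = unit file missing.
needs_fix() {
    # Default path is an assumption taken from the repair script.
    local unit_file="${1:-/usr/lib/systemd/system/dockerd.service}"
    [ -f "${unit_file}" ] || return 2
    grep -Eq '^BindsTo=.*containerd\.service' "${unit_file}"
}
```

Calling `needs_fix` without arguments checks the default path; passing a path checks a copy of the unit file instead.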

    Fixing Legacy Nodes

    For legacy nodes, you can fix the problem with the following script:
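    #!/bin/bash
    # Append a "Key=value" line after ExecStart= in containerd.service,
    # unless a line with the same key is already present.
    insert_if_absent() {
        line="${1}"
        lead="$(echo "${line}" | cut -f1 -d=)="
        if ! grep -q "^${lead}" /usr/lib/systemd/system/containerd.service 2>/dev/null; then
            sed -i "/^ExecStart=/a${line}" /usr/lib/systemd/system/containerd.service
        fi
    }
    
    # Make containerd an unlikely OOM-killer target and restart it if it exits.
    insert_if_absent OOMScoreAdjust=-999
    insert_if_absent RestartSec=5
    insert_if_absent Restart=always
    
    # Replace the BindsTo dependency with a weaker Wants dependency, so that
    # systemd no longer stops dockerd when containerd is killed.
    sed -i '/BindsTo/d' /usr/lib/systemd/system/dockerd.service
    sed -i 's/^Wants.*/Wants=network-online.target containerd.service/' /usr/lib/systemd/system/dockerd.service
    
    # Make systemd load the edited unit files.
    systemctl daemon-reload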
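
Before editing the live unit file, the containerd changes can be previewed on a scratch copy. This is a sketch under stated assumptions: `insert_if_absent_in` and `preview_fix` are hypothetical names introduced here, the first being a variant of insert_if_absent that takes the target file as a second argument. It assumes GNU sed, as the script above does.

```shell
#!/bin/bash
# Hypothetical variant of insert_if_absent that edits the file given as
# the second argument instead of the live containerd.service.
insert_if_absent_in() {
    local line="${1}" file="${2}"
    local lead="${line%%=*}="          # key part, e.g. "Restart="
    if ! grep -q "^${lead}" "${file}"; then
        # GNU sed one-line form of the "a" (append) command.
        sed -i "/^ExecStart=/a${line}" "${file}"
    fi
}

# Apply the three containerd settings to a temporary copy of the given
# unit file and print the resulting diff; the original is left untouched.
preview_fix() {
    local src="${1}" scratch
    scratch="$(mktemp)" || return 1
    cp "${src}" "${scratch}" || { rm -f "${scratch}"; return 1; }
    insert_if_absent_in OOMScoreAdjust=-999 "${scratch}"
    insert_if_absent_in RestartSec=5 "${scratch}"
    insert_if_absent_in Restart=always "${scratch}"
    # diff exits 1 when the files differ, which is the expected case here.
    diff "${src}" "${scratch}" || true
    rm -f "${scratch}"
}
```

For example, `preview_fix /usr/lib/systemd/system/containerd.service` prints only the lines that would be added. A key that already exists in the file is left as it is, even if its value differs.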
    To verify the fix, forcibly terminate containerd and check the state of both services with the command below. If Docker can still run after containerd is killed, the fix has succeeded. As a further check, execute a docker run command.
    pkill -9 containerd; systemctl is-active dockerd containerd
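
When scripting this check, note that systemctl is-active prints one state per unit and exits with 0 if at least one of the listed units is active, so the exit status alone does not show that both services are up. With RestartSec=5, containerd may also need a few seconds to return to the active state. The sketch below polls until every unit reports active. `all_active` and `wait_all_active` are hypothetical helper names, and the timeout value is an arbitrary choice.

```shell
#!/bin/bash
# Succeed only if every line of the given `systemctl is-active` output
# is exactly "active".
all_active() {
    local states="${1}" state
    [ -n "${states}" ] || return 1
    while IFS= read -r state; do
        [ "${state}" = "active" ] || return 1
    done <<< "${states}"
    return 0
}

# Poll the given units once per second until all of them are active,
# or fail after the given number of seconds.
# Usage: wait_all_active TIMEOUT_SECONDS UNIT...
wait_all_active() {
    local timeout="${1}"; shift
    local waited=0
    until all_active "$(systemctl is-active "$@")"; do
        [ "${waited}" -lt "${timeout}" ] || return 1
        sleep 1
        waited=$((waited + 1))
    done
}
```

For example, `wait_all_active 30 dockerd containerd` can be run right after the pkill command; a non-zero exit status means at least one unit did not become active within 30 seconds.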
    