Check Item | Description | Risk Level | Self-Heal Action |
FDPressure | Too many files are open. Checks whether the number of file descriptors on the server has reached 90% of the maximum value. | low | - |
RuntimeUnhealthy | Listing containerd tasks failed | low | RestartRuntime |
KubeletUnhealthy | Calling the kubelet healthz endpoint failed | low | RestartKubelet |
ReadonlyFilesystem | Filesystem is read-only | high | - |
OOMKilling | Process has been OOM-killed | high | - |
TaskHung | Task blocked for longer than the threshold | high | - |
UnregisterNetDevice | Net device unregistered | high | - |
KernelOopsDivideError | Kernel oops with divide error | high | - |
KernelOopsNULLPointer | Kernel oops with NULL pointer | high | - |
Ext4Error | Ext4 filesystem error | high | - |
Ext4Warning | Ext4 filesystem warning | high | - |
IOError | IOError | high | - |
MemoryError | MemoryError | high | - |
DockerHung | Task blocked for longer than the threshold | high | - |
KubeletRestart | Kubelet restart | low | - |
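
The FDPressure row describes a threshold check: file descriptor usage is compared against 90% of the maximum. The sketch below illustrates that comparison on a Linux host. It assumes the counters come from `/proc/sys/fs/file-nr` (allocated handles, unused handles, system-wide maximum); the table does not say which data source the actual checker uses, and the function names `fd_usage_ratio` and `has_fd_pressure` are hypothetical.

```python
def fd_usage_ratio(file_nr_text: str) -> float:
    """Return allocated / maximum file handles.

    `file_nr_text` is the content of /proc/sys/fs/file-nr, which holds three
    whitespace-separated integers: allocated, unused, maximum.
    """
    allocated, _unused, maximum = (int(field) for field in file_nr_text.split())
    return allocated / maximum


def has_fd_pressure(file_nr_text: str, threshold: float = 0.90) -> bool:
    """True when usage has reached the threshold (90% by default, as in the table)."""
    return fd_usage_ratio(file_nr_text) >= threshold


if __name__ == "__main__":
    # Assumption: running on Linux, where this procfs file exists.
    with open("/proc/sys/fs/file-nr") as f:
        print(has_fd_pressure(f.read()))
```

Keeping the parsing separate from the file read makes the threshold logic easy to exercise with sample strings on any platform.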
Run the kubectl create -f demo-HealthCheckPolicy.yaml command to create self-heal rules for a cluster:

    apiVersion: config.tke.cloud.tencent.com/v1
    kind: HealthCheckPolicy
    metadata:
      name: test-all
      namespace: cls-xxxxxxxx  # the ID of the cluster
    spec:
      machineSetSelector:
        matchLabels:
          key: fake-label
      rules:
      - action: RestartKubelet
        enabled: true
        name: FDPressure
      - action: RestartKubelet
        autoRepairEnabled: true
        enabled: true
        name: RuntimeUnhealthy
      - action: RestartKubelet
        autoRepairEnabled: true
        enabled: true
        name: KubeletUnhealthy
      - action: RestartKubelet
        enabled: true
        name: ReadonlyFilesystem
      - action: RestartKubelet
        enabled: true
        name: OOMKilling
      - action: RestartKubelet
        enabled: true
        name: TaskHung
      - action: RestartKubelet
        enabled: true
        name: UnregisterNetDevice
      - action: RestartKubelet
        enabled: true
        name: KernelOopsDivideError
      - action: RestartKubelet
        enabled: true
        name: KernelOopsNULLPointer
      - action: RestartKubelet
        enabled: true
        name: Ext4Error
      - action: RestartKubelet
        enabled: true
        name: Ext4Warning
      - action: RestartKubelet
        enabled: true
        name: IOError
      - action: RestartKubelet
        enabled: true
        name: MemoryError
      - action: RestartKubelet
        enabled: true
        name: DockerHung
      - action: RestartKubelet
        enabled: true
        name: KubeletRestart
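
Because the policy repeats the same rule shape for every check item, the manifest can also be generated rather than written by hand. The following is a minimal sketch that mirrors the example values above (policy name `test-all`, the `cls-xxxxxxxx` placeholder, the `key: fake-label` selector) and prints the manifest as JSON, which `kubectl create -f` accepts alongside YAML. The helper `build_policy` and the constants `CHECK_ITEMS` and `AUTO_REPAIR_ITEMS` are hypothetical names introduced for illustration, not part of any TKE tooling.

```python
import json

# Check item names taken from the table in this document.
CHECK_ITEMS = [
    "FDPressure", "RuntimeUnhealthy", "KubeletUnhealthy", "ReadonlyFilesystem",
    "OOMKilling", "TaskHung", "UnregisterNetDevice", "KernelOopsDivideError",
    "KernelOopsNULLPointer", "Ext4Error", "Ext4Warning", "IOError",
    "MemoryError", "DockerHung", "KubeletRestart",
]

# In the example manifest, only these two rules set autoRepairEnabled.
AUTO_REPAIR_ITEMS = {"RuntimeUnhealthy", "KubeletUnhealthy"}


def build_policy(name: str, cluster_id: str, check_items=CHECK_ITEMS) -> dict:
    """Build a HealthCheckPolicy manifest shaped like the example above."""
    rules = []
    for item in check_items:
        rule = {"action": "RestartKubelet", "enabled": True, "name": item}
        if item in AUTO_REPAIR_ITEMS:
            rule["autoRepairEnabled"] = True
        rules.append(rule)
    return {
        "apiVersion": "config.tke.cloud.tencent.com/v1",
        "kind": "HealthCheckPolicy",
        "metadata": {"name": name, "namespace": cluster_id},
        "spec": {
            "machineSetSelector": {"matchLabels": {"key": "fake-label"}},
            "rules": rules,
        },
    }


if __name__ == "__main__":
    # Replace the placeholder with the real cluster ID before applying.
    print(json.dumps(build_policy("test-all", "cls-xxxxxxxx"), indent=2))
```

Redirecting the output to a file gives a manifest that can be passed to `kubectl create -f` in the same way as the YAML version.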
In the YAML configuration file of the MachineSet, set the parameter healthCheckPolicyName: test-all:

    apiVersion: node.tke.cloud.tencent.com/v1beta1
    kind: MachineSet
    spec:
      type: Hosted
      displayName: demo-machineset
      replicas: 2
      autoRepair: true
      deletePolicy: Random
      healthCheckPolicyName: test-all
      instanceTypes:
      - C3.LARGE8
      subnetIDs:
      - subnet-xxxxxxxx
      - subnet-yyyyyyyy
      ......