Check item | Description | Risk level | Self-healing action |
FDPressure | Too many files opened (checks whether the host's file descriptor count has reached 90% of the maximum) | low | - |
RuntimeUnhealthy | List containerd task failed | low | RestartRuntime |
KubeletUnhealthy | Call kubelet healthz failed | low | RestartKubelet |
ReadonlyFilesystem | Filesystem is readonly | high | - |
OOMKilling | Process has been oom-killed | high | - |
TaskHung | Task blocked for longer than the threshold | high | - |
UnregisterNetDevice | Net device unregister | high | - |
KernelOopsDivideError | Kernel oops with divide error | high | - |
KernelOopsNULLPointer | Kernel oops with NULL pointer | high | - |
Ext4Error | Ext4 filesystem error | high | - |
Ext4Warning | Ext4 filesystem warning | high | - |
IOError | IOError | high | - |
MemoryError | MemoryError | high | - |
DockerHung | Task blocked for longer than the threshold | high | - |
KubeletRestart | Kubelet restart | low | - |
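The FDPressure row describes a threshold check: file descriptor usage compared against 90% of the maximum. As a rough illustration of that kind of check (not the detector TKE actually runs), the sketch below reads the Linux `/proc/sys/fs/file-nr` counters. The function names `fd_usage` and `has_fd_pressure` and the choice of that file as the data source are assumptions made for this example.

```python
from pathlib import Path

# Threshold taken from the FDPressure row of the table: 90% of the maximum.
FD_PRESSURE_PERCENT = 90


def fd_usage(file_nr_text: str) -> tuple[int, int]:
    """Return (in_use, maximum) parsed from the text of /proc/sys/fs/file-nr.

    The file holds three whitespace-separated integers: allocated handles,
    allocated-but-unused handles, and the system-wide maximum.
    """
    allocated, unused, maximum = (int(field) for field in file_nr_text.split())
    return allocated - unused, maximum


def has_fd_pressure(file_nr_text: str, percent: int = FD_PRESSURE_PERCENT) -> bool:
    """True when usage has reached `percent` of the maximum (integer math only)."""
    in_use, maximum = fd_usage(file_nr_text)
    return in_use * 100 >= maximum * percent


if __name__ == "__main__":
    proc_file = Path("/proc/sys/fs/file-nr")  # present on Linux hosts only
    if proc_file.exists():
        text = proc_file.read_text()
        in_use, maximum = fd_usage(text)
        print(f"in_use={in_use} max={maximum} pressure={has_fd_pressure(text)}")
```

Because the table assigns no self-healing action to FDPressure, a check like this would only report the condition; it would not restart anything.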
kubectl create -f demo-HealthCheckPolicy.yaml
Create the self-healing rule in the cluster. The YAML configuration is as follows:

apiVersion: config.tke.cloud.tencent.com/v1
kind: HealthCheckPolicy
metadata:
  name: test-all
  namespace: cls-xxxxxxxx  # cluster ID
spec:
  machineSetSelector:
    matchLabels:
      key: fake-label
  rules:
  - action: RestartKubelet
    enabled: true
    name: FDPressure
  - action: RestartKubelet
    autoRepairEnabled: true
    enabled: true
    name: RuntimeUnhealthy
  - action: RestartKubelet
    autoRepairEnabled: true
    enabled: true
    name: KubeletUnhealthy
  - action: RestartKubelet
    enabled: true
    name: ReadonlyFilesystem
  - action: RestartKubelet
    enabled: true
    name: OOMKilling
  - action: RestartKubelet
    enabled: true
    name: TaskHung
  - action: RestartKubelet
    enabled: true
    name: UnregisterNetDevice
  - action: RestartKubelet
    enabled: true
    name: KernelOopsDivideError
  - action: RestartKubelet
    enabled: true
    name: KernelOopsNULLPointer
  - action: RestartKubelet
    enabled: true
    name: Ext4Error
  - action: RestartKubelet
    enabled: true
    name: Ext4Warning
  - action: RestartKubelet
    enabled: true
    name: IOError
  - action: RestartKubelet
    enabled: true
    name: MemoryError
  - action: RestartKubelet
    enabled: true
    name: DockerHung
  - action: RestartKubelet
    enabled: true
    name: KubeletRestart
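The `rules` list in the policy above repeats the same three or four fields for each of the 15 check items, which makes it easy to mistype by hand. One possible way to avoid that is to generate the section from a list of names, as in the sketch below. The helper `render_rules` is hypothetical; the field names are copied from the manifest above, and which items carry `autoRepairEnabled: true` simply mirrors that manifest rather than any documented rule.

```python
# Check item names, in the order they appear in the table and the manifest.
CHECK_ITEMS = [
    "FDPressure", "RuntimeUnhealthy", "KubeletUnhealthy", "ReadonlyFilesystem",
    "OOMKilling", "TaskHung", "UnregisterNetDevice", "KernelOopsDivideError",
    "KernelOopsNULLPointer", "Ext4Error", "Ext4Warning", "IOError",
    "MemoryError", "DockerHung", "KubeletRestart",
]

# Items written with autoRepairEnabled: true in the manifest above.
AUTO_REPAIR_ITEMS = frozenset({"RuntimeUnhealthy", "KubeletUnhealthy"})


def render_rules(names, auto_repair=AUTO_REPAIR_ITEMS,
                 action="RestartKubelet", indent="  "):
    """Render the `rules:` section of the spec as YAML text.

    `indent` is the indentation of the `rules:` key itself (two spaces
    places it directly under `spec:`).
    """
    lines = [f"{indent}rules:"]
    for name in names:
        lines.append(f"{indent}- action: {action}")
        if name in auto_repair:
            lines.append(f"{indent}  autoRepairEnabled: true")
        lines.append(f"{indent}  enabled: true")
        lines.append(f"{indent}  name: {name}")
    return "\n".join(lines)


print(render_rules(CHECK_ITEMS))
```

The printed text can be pasted under `spec:` in demo-HealthCheckPolicy.yaml before running the `kubectl create` command.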
Set healthCheckPolicyName: test-all in the MachineSet to reference the policy. The YAML configuration is as follows:

apiVersion: node.tke.cloud.tencent.com/v1beta1
kind: MachineSet
spec:
  type: Hosted
  displayName: demo-machineset
  replicas: 2
  autoRepair: true
  deletePolicy: Random
  healthCheckPolicyName: test-all
  instanceTypes:
  - C3.LARGE8
  subnetIDs:
  - subnet-xxxxxxxx
  - subnet-yyyyyyyy
  ......