When you deploy or run services, you may trigger high-risk operations at different levels, causing service failures of varying severity. To help you assess and avoid operational risks, this document describes the consequences of these high-risk operations and the corresponding solutions. The tables below list the high-risk operations you may trigger when working with clusters, networking and load balancing, logs, and cloud disks.

| Category | High-risk Operation | Consequence | Solution |
|---|---|---|---|
| Master and etcd nodes | Modifying the security groups of nodes in a cluster | The master node may become unavailable | Configure security groups as recommended by Tencent Cloud |
| Master and etcd nodes | Letting a node expire or terminating it | The master node becomes unavailable | Unrecoverable |
| Master and etcd nodes | Reinstalling the operating system | Master components are deleted | Unrecoverable |
| Master and etcd nodes | Upgrading the master or etcd component version on your own | The cluster may become unavailable | Roll back to the original version |
| Master and etcd nodes | Deleting or formatting core directories on the node, such as /etc/kubernetes | The master node becomes unavailable | Unrecoverable |
| Master and etcd nodes | Changing the node IP | The master node becomes unavailable | Change back to the old IP |
| Master and etcd nodes | Modifying parameters of core components (such as etcd, kube-apiserver, and docker) on your own | The master node may become unavailable | Configure parameters as recommended by Tencent Cloud |
| Master and etcd nodes | Changing the master or etcd certificate on your own | The cluster may become unavailable | Unrecoverable |
| Worker node | Modifying the security groups of nodes in a cluster | Nodes may become unavailable | Configure security groups as recommended by Tencent Cloud |
| Worker node | Letting a node expire or terminating it | The node becomes unavailable | Unrecoverable |
| Worker node | Reinstalling the operating system | Node components are deleted | Remove the node and add it back to the cluster |
| Worker node | Upgrading the node component version on your own | The node may become unavailable | Roll back to the original version |
| Worker node | Changing the node IP | The node becomes unavailable | Change back to the old IP |
| Worker node | Modifying parameters of core components (such as etcd, kube-apiserver, and docker) on your own | The node may become unavailable | Configure parameters as recommended by Tencent Cloud |
| Worker node | Modifying the operating system configuration | The node may become unavailable | Try to restore the configuration, or delete the node and purchase a new one |
| Others | Modifying permissions in CAM | Some cluster resources, such as cloud load balancers, may fail to be created | Restore the permissions |

| High-risk Operation | Consequence | Solution |
|---|---|---|
| Setting the kernel parameter net.ipv4.ip_forward=0 | Network connectivity is lost | Set the kernel parameter to net.ipv4.ip_forward=1 |
| Setting the kernel parameter net.ipv4.tcp_tw_recycle=1 | NAT exception | Set the kernel parameter to net.ipv4.tcp_tw_recycle=0 |
| Not opening UDP port 53 for the container CIDR in the security group configuration of the node | In-cluster DNS cannot work normally | Configure security groups as recommended by Tencent Cloud |
| Modifying or deleting LB tags added by TKE | A new LB is purchased | Restore the LB tags |
| Creating custom listeners on a TKE-managed LB through the LB console | The modification is reset by TKE | Create listeners automatically through the Service YAML |
| Binding custom backend real servers (RS) to a TKE-managed LB through the LB console | The modification is reset by TKE | Do not bind backend RS manually |
| Modifying the certificate of a TKE-managed LB through the LB console | The modification is reset by TKE | Manage the certificate automatically through the Ingress YAML |
| Modifying a TKE-managed LB listener name through the LB console | The modification is reset by TKE | Do not modify TKE-managed LB listener names |
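
The two kernel parameters listed above can be checked on a node before they cause trouble. The following is a minimal sketch, not an official TKE tool: it assumes a Linux node where sysctl values are exposed as files under `/proc/sys`, and the names `RECOMMENDED`, `sysctl_path`, and `check_kernel_params` are hypothetical helpers introduced only for this illustration. Parameters that do not exist on the node are skipped (for example, `net.ipv4.tcp_tw_recycle` was removed in Linux 4.12).

```python
from pathlib import Path

# Values taken from the "Solution" column of the table above.
RECOMMENDED = {
    "net.ipv4.ip_forward": "1",
    "net.ipv4.tcp_tw_recycle": "0",
}


def sysctl_path(name, root="/proc/sys"):
    """Map a dotted sysctl name to its file under the given root."""
    return Path(root).joinpath(*name.split("."))


def check_kernel_params(root="/proc/sys", recommended=RECOMMENDED):
    """Return {name: (current, expected)} for each parameter that differs.

    Parameters whose file is missing on this kernel are skipped.
    """
    mismatches = {}
    for name, expected in recommended.items():
        try:
            current = sysctl_path(name, root).read_text().strip()
        except FileNotFoundError:
            continue  # parameter not present on this kernel
        if current != expected:
            mismatches[name] = (current, expected)
    return mismatches


if __name__ == "__main__":
    problems = check_kernel_params()
    for name, (current, expected) in problems.items():
        print(f"{name} = {current} (recommended: {expected})")
    if not problems:
        print("Kernel parameters match the recommended values.")
```

The sketch only reads values and reports differences; it does not change any setting. The `root` argument exists so the check can be pointed at a test directory instead of the live `/proc/sys`.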