High-risk Operations of Container Service

Last updated: 2024-12-11 18:14:24
When deploying or running your business workloads, you may trigger high-risk operations at different levels, causing service failures of varying severity. To help you estimate and avoid operational risks, this document describes the consequences of these high-risk operations and the corresponding solutions. Below you can find the high-risk operations you may trigger when dealing with clusters, networking and load balancing, logs, and cloud disks.

    Clusters

| Category | High-risk Operation | Consequence | Solution |
|---|---|---|---|
| Master and etcd nodes | Modifying the security groups of nodes in a cluster | The master node may become unavailable | Configure security groups as recommended by Tencent Cloud |
| Master and etcd nodes | Node expires or is terminated | The master node becomes unavailable | Unrecoverable |
| Master and etcd nodes | Reinstalling the operating system | Master components are deleted | Unrecoverable |
| Master and etcd nodes | Upgrading the master or etcd component version on your own | The cluster may become unavailable | Roll back to the original version |
| Master and etcd nodes | Deleting or formatting core directory data such as /etc/kubernetes on the node | The master node becomes unavailable | Unrecoverable |
| Master and etcd nodes | Changing the node IP | The master node becomes unavailable | Change back to the original IP |
| Master and etcd nodes | Modifying parameters of core components (etcd, kube-apiserver, docker, etc.) on your own | The master node may become unavailable | Configure parameters as recommended by Tencent Cloud |
| Master and etcd nodes | Changing the master or etcd certificate on your own | The cluster may become unavailable | Unrecoverable |
| Worker node | Modifying the security groups of nodes in a cluster | Nodes may become unavailable | Configure security groups as recommended by Tencent Cloud |
| Worker node | Node expires or is terminated | The node becomes unavailable | Unrecoverable |
| Worker node | Reinstalling the operating system | Node components are deleted | Remove the node and add it back to the cluster |
| Worker node | Upgrading the node component version on your own | The node may become unavailable | Roll back to the original version |
| Worker node | Changing the node IP | The node becomes unavailable | Change back to the original IP |
| Worker node | Modifying parameters of core components (etcd, kube-apiserver, docker, etc.) on your own | The node may become unavailable | Configure parameters as recommended by Tencent Cloud |
| Worker node | Modifying the operating system configuration | The node may become unavailable | Restore the configuration, or delete the node and purchase a new one |
| Others | Modifying permissions in CAM | Some cluster resources, such as cloud load balancers, may fail to be created | Restore the permissions |
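When a worker node's components are deleted (for example, after reinstalling the operating system), the table above recommends removing the node and adding it back to the cluster. A minimal sketch of the removal half, assuming `kubectl` is configured against the cluster; the function name and node name are illustrative, not part of any TKE tooling:

```shell
# recreate_node: cordon, drain, and delete a worker node so that it can be
# re-added to the cluster (e.g. through the TKE console).
recreate_node() {
    node="$1"
    # Evict workloads first. DaemonSet Pods cannot be evicted, and emptyDir
    # data is lost on eviction, hence the two flags.
    kubectl drain "$node" --ignore-daemonsets --delete-emptydir-data
    # Remove the Node object from the cluster.
    kubectl delete node "$node"
}

# Usage (node name is a hypothetical example):
# recreate_node 10.0.0.12
```

After the Node object is deleted, the machine can be re-added through the TKE console, which reinstalls the node components.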
    

    Networking and Load Balancing

| High-risk Operation | Consequence | Solution |
|---|---|---|
| Setting the kernel parameter net.ipv4.ip_forward to 0 | Network connectivity is lost | Set net.ipv4.ip_forward=1 |
| Setting the kernel parameter net.ipv4.tcp_tw_recycle to 1 | NAT exceptions | Set net.ipv4.tcp_tw_recycle=0 |
| The node's security group configuration does not open UDP port 53 to the container CIDR | In-cluster DNS cannot work normally | Configure security groups as recommended by Tencent Cloud |
| Modifying or deleting the LB tags added by TKE | A new LB is purchased | Restore the LB tags |
| Creating custom listeners on a TKE-managed LB through the LB console | The modification is reset by TKE | Create listeners automatically through the Service YAML |
| Binding custom backend real servers to a TKE-managed LB through the LB console | The modification is reset by TKE | Do not manually bind backend real servers |
| Modifying the certificate of a TKE-managed LB through the LB console | The modification is reset by TKE | Manage the certificate automatically through the Ingress YAML |
| Modifying a TKE-managed LB listener name through the LB console | The modification is reset by TKE | Do not modify TKE-managed LB listener names |
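Before changing anything, the two kernel-parameter rows above can be verified read-only on a node. A minimal sketch, assuming a Linux node (`/proc/sys` mirrors sysctl names with dots replaced by slashes; the function name and messages are illustrative). Note that net.ipv4.tcp_tw_recycle was removed in Linux 4.12, so on newer kernels there is nothing to set:

```shell
# check_param: read-only check of a net.ipv4 kernel parameter against its
# recommended value, printing the sysctl command needed to fix a mismatch.
check_param() {
    file="/proc/sys/net/ipv4/$1"
    # tcp_tw_recycle was removed in Linux 4.12, so the file may be absent.
    if [ ! -f "$file" ]; then
        echo "$1: not present on this kernel"
        return 0
    fi
    cur=$(cat "$file")
    if [ "$cur" = "$2" ]; then
        echo "$1: OK ($cur)"
    else
        echo "$1: is $cur, fix with: sysctl -w net.ipv4.$1=$2"
    fi
}

check_param ip_forward 1       # must stay 1 for container networking
check_param tcp_tw_recycle 0   # must stay 0 behind NAT
```

Persisting a fix across reboots additionally requires an entry in /etc/sysctl.conf (or a file under /etc/sysctl.d/).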
    

    Logs

| High-risk Operation | Consequence | Solution | Notes |
|---|---|---|---|
| Deleting the /tmp/ccs-log-collector/pos directory on the host | Logs are collected again | None | The pos files record how far each log file has been collected |
| Deleting the /tmp/ccs-log-collector/buffer directory on the host | Logs are lost | None | The buffer directory contains log cache files |
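Before cleaning up anything under /tmp on a node, it can help to confirm whether the collector's state directories from the table above are present. A small sketch (the helper name and messages are illustrative):

```shell
# check_dir: report whether a log-collector state directory exists on the
# node. Deleting pos causes re-collection; deleting buffer loses logs.
check_dir() {
    if [ -d "$1" ]; then
        echo "$1: present"
    else
        echo "$1: missing"
    fi
}

check_dir /tmp/ccs-log-collector/pos
check_dir /tmp/ccs-log-collector/buffer
```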
    

    Cloud Disks

| High-risk Operation | Consequence | Solution |
|---|---|---|
| Manually unmounting a cloud disk through the console | Writes from the Pod report IO errors | Delete the mount directory on the node and reschedule the Pod |
| Unmounting the disk's mount path on the node | Pod data is written to the local disk | Re-mount the corresponding directory onto the Pod |
| Directly operating on the CBS block device on the node | Pod data is written to the local disk | None |
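The "written to the local disk" consequence above is silent: writes succeed, they just no longer land on the cloud disk. One way to verify that a volume path on the node is still a real mount point is to consult /proc/self/mounts. A minimal sketch, assuming a Linux node (the helper name and example path are illustrative):

```shell
# check_mount: report whether a path appears as a mount point in
# /proc/self/mounts (field 2 of each line, delimited by spaces).
check_mount() {
    path="$1"
    if grep -q " $path " /proc/self/mounts; then
        echo "$path is mounted"
    else
        echo "$path is NOT mounted"
    fi
}

# Example path: a kubelet volume directory on the node.
check_mount /var/lib/kubelet
```

If the path is reported as not mounted while the Pod is still running, the Pod is writing to the node's local disk and should be rescheduled after the disk is re-attached.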
    