tencent cloud


Practice Tutorial for Cluster Capacity Planning

Last updated: 2024-11-07 15:05:18
    When you use TDMQ for CKafka (CKafka), the main specifications are bandwidth and storage, together with availability zone (AZ) distribution and the number of partitions. These metrics determine the cluster's load capacity to some extent. In actual operation, however, the real load on a cluster varies with the business scenario and depends on factors such as message size, whether messages are compressed, the ratio of produced to consumed messages, the number of topic replicas, and other key attributes. Bandwidth and storage utilization alone are therefore not a sufficient basis for deciding whether to scale the cluster.
    To keep your business running stably and to plan and manage cluster capacity sensibly, Advanced Monitoring now provides a cluster load metric. This metric gives a simpler view of the cluster's current load and can serve as a reference when assessing whether the CKafka cluster needs to be scaled out.

    Applicable Scenarios

    CKafka Pro Edition.
    Business scenarios in which bandwidth utilization is low but cluster load is high. In these cases, scale out the cluster bandwidth based on the cluster load metric rather than on bandwidth utilization.

    Metric Viewing Path

    View the overall cluster load at the node level. For more information, see Querying Advanced Monitoring (Pro Edition).

    Reference Policies

    To ensure the stability of your production business and the processing performance of the CKafka cluster, plan cluster capacity according to the cluster's deployment method and load status. If the cluster load exceeds the applicable reference value below, increase the cluster bandwidth specification promptly.
    Single-AZ deployment
    When the cluster is deployed in a single AZ, keep the maximum cluster load at or below about 70%.
    Multi-AZ deployment
    When the cluster is deployed in multiple AZs, a certain level of redundancy needs to be considered so that if an unexpected exception occurs in one AZ, the remaining AZs can handle the business load normally. For example:
    Two-AZ deployment: If one AZ becomes unavailable, half of the cluster's nodes remain. To keep those nodes at or below 70% load, keep the normal cluster load below 35% (70% × 1/2).
    Three-AZ deployment: If one AZ becomes unavailable, 2/3 of the cluster's nodes remain. To keep those nodes at or below 70% load, keep the normal cluster load below about 47% (70% × 2/3 ≈ 46.7%).
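    The reference values above follow one rule: the normal load ceiling is 70% multiplied by the fraction of nodes that survive the loss of one AZ. The sketch below reproduces that arithmetic and turns it into a simple scale-out check. It assumes nodes are split evenly across AZs and that a failed AZ's traffic is absorbed evenly by the remaining nodes. The function names are hypothetical illustrations, not part of any CKafka API or SDK, and the peak load value is assumed to be read manually from Advanced Monitoring.

```python
from fractions import Fraction

# Reference peak load (percent) for the nodes still serving traffic,
# taken from the single-AZ guidance above.
TARGET_LOAD_PCT = Fraction(70)


def recommended_max_load_pct(az_count: int) -> Fraction:
    """Hypothetical helper: normal-operation load ceiling (percent) for az_count AZs.

    Assumes nodes are split evenly across AZs and that, when one AZ fails,
    its traffic is absorbed evenly by the remaining (az_count - 1) AZs.
    """
    if az_count < 1:
        raise ValueError("az_count must be >= 1")
    if az_count == 1:
        # Single AZ: there is no AZ-level redundancy to reserve.
        return TARGET_LOAD_PCT
    # Fraction of nodes left after losing one AZ, e.g. 1/2 for two AZs.
    surviving = Fraction(az_count - 1, az_count)
    return TARGET_LOAD_PCT * surviving


def should_scale_out(peak_load_pct: float, az_count: int) -> bool:
    """Hypothetical helper: True if observed peak load exceeds the ceiling."""
    return Fraction(peak_load_pct) > recommended_max_load_pct(az_count)


if __name__ == "__main__":
    for azs in (1, 2, 3):
        ceiling = float(recommended_max_load_pct(azs))
        print(f"{azs} AZ(s): keep load below about {ceiling:.0f}%")
    print(should_scale_out(40.0, az_count=2))  # 40% > 35%
    print(should_scale_out(40.0, az_count=3))  # 40% < about 46.7%
```

    Exact fractions are used so that the two-AZ ceiling comes out as exactly 35% and the three-AZ ceiling as 140/3 (about 46.7%), avoiding floating-point rounding at the boundary.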