Resource Specification Selection and Optimization Suggestions

Last updated: 2024-08-02 12:44:13
    This document describes how to choose instance specifications for Tencent Cloud TCHouse-D and provides optimization suggestions for when resources are insufficient.
    Note:
    For different types of business, it is recommended to configure resource isolation policies or split clusters, for example, one cluster for real-time reporting and one cluster for real-time risk control.
    When a business serves multiple ToB tenants simultaneously, it is recommended to isolate resources or split clusters based on the actual situation to reduce mutual interference. For example, if a SaaS service serves 200 tenants simultaneously, split it into 4 clusters, each supporting 50 tenants.
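    For reference, open-source Apache Doris (which TCHouse-D is compatible with) supports in-cluster resource isolation via BE resource tags. The sketch below is illustrative only, with placeholder host names, table, and group names; check whether your TCHouse-D version exposes this feature before relying on it.

```sql
-- Hypothetical sketch using Apache Doris resource tags (hosts and names are placeholders).
-- Assign BE nodes to isolation groups:
ALTER SYSTEM MODIFY BACKEND "be_host_1:9050" SET ("tag.location" = "group_report");
ALTER SYSTEM MODIFY BACKEND "be_host_2:9050" SET ("tag.location" = "group_risk");

-- Pin a table's replicas to one group at creation time:
CREATE TABLE report_events (
    dt DATE,
    user_id BIGINT,
    event VARCHAR(64)
)
DUPLICATE KEY (dt, user_id)
DISTRIBUTED BY HASH(user_id) BUCKETS 16
PROPERTIES ("replication_allocation" = "tag.location.group_report: 3");
```

    Splitting into separate clusters remains the stronger isolation boundary; tags only partition resources within one cluster.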

    Resource Specifications and Adaptation Scenes

    When purchasing a Tencent Cloud TCHouse-D cluster, you need to select the computing resource specifications and storage resource specifications of the FE node and BE node, and choose whether to enable high availability.

    Resource Specifications and Recommended Scenes

| Model Type | Compute Node Specifications | Recommended Storage Type | Recommended Scenes |
| --- | --- | --- | --- |
| Standard | 4-core 16 GB | High Performance Cloud Disk, SSD Cloud Disk, Enhanced SSD Cloud Disk | Limited to POC feature testing or personal learning, mainly for experiencing and testing product capabilities |
| Standard | 8-core 32 GB | High Performance Cloud Disk, SSD Cloud Disk, Enhanced SSD Cloud Disk | Recommended for test environments, supporting medium-scale data and moderately complex data analysis |
| Standard | 16-core 64 GB | High Performance Cloud Disk, SSD Cloud Disk, Enhanced SSD Cloud Disk | Recommended for production environments, supporting larger-scale, more complex data analysis as well as high-concurrency scenes |
| Standard | 32-core and above | High Performance Cloud Disk, SSD Cloud Disk, Enhanced SSD Cloud Disk | Recommended for production environments, supporting large-scale, highly complex data analysis, high concurrency, and other demanding scenes |

    High Availability and Node Quantity Suggestions

| Scene | High Availability Selection | Recommended Minimum FE Nodes | Recommended Minimum BE Nodes |
| --- | --- | --- | --- |
| POC feature testing | Non-high availability | 1 | 3 |
| Production scenario (query high availability) | Read high availability | 3 | 3, scale out on demand |
| Production scenario (query-write high availability) | Read-write high availability | 5 | 3, scale out on demand |
| Cross-AZ high availability scenario | Read-write high availability + 3-AZ deployment | 5 | 3, scale in increments of 3 |
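    The 3-BE minimum follows from the default 3-replica storage model: each tablet keeps three copies on distinct BE nodes. A minimal DDL sketch in standard Doris syntax (table and column names are illustrative):

```sql
-- Illustrative table; with "replication_num" = "3" every tablet needs
-- three distinct BE nodes, hence the minimum of 3 BE nodes above.
CREATE TABLE orders (
    order_id BIGINT,
    order_date DATE,
    amount DECIMAL(16, 2)
)
DUPLICATE KEY (order_id)
DISTRIBUTED BY HASH(order_id) BUCKETS 16
PROPERTIES ("replication_num" = "3");
```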

    Examples of Resource Specification Selection

    Note:
    The following content is for reference only. The performance may vary greatly in different business scenes.
    1. Scene 1: Product feature verification and simple data analysis
    FE: High availability not enabled, single node, 4-core 16 GB
    BE: 3 nodes, 4-core 16 GB per node
    2. Scene 2: Simple query of small- to medium-sized data, such as hundreds of GB of data, less than 1,000 QPS
    FE: High availability not enabled, single node, 8-core 32 GB.
    BE: 3 nodes, 8-core 32 GB per node
    3. Scene 3: Production scene, TB-level data volume, involving complex queries such as multi-table join and GROUP BY
    FE: High availability enabled, 3 nodes, 16-core 64 GB per node
    BE: 3 nodes, 16-core 64 GB per node
    4. Scene 4: Production business, TB-level data volume, complex queries, and a large number of high-concurrency point queries
    FE: High availability enabled, 3 nodes, 16-core 64 GB per node
    BE: 6 nodes, 16-core 64 GB per node

    Resource Monitoring and Optimization Suggestions

    Operations such as large-scale data import, data queries, concurrent queries, and multi-table joins consume large amounts of CPU and memory. If CPU/memory utilization continuously exceeds 85%, the cluster becomes unstable. It is recommended to optimize the business or change the configuration.

    Resource Usage Monitoring

    You can go to Cluster Management > Cluster Monitoring to check the CPU and memory usage of each BE and FE node, as shown in the following figures.
    [Figure: Cluster Monitoring > BE metrics]
    [Figure: Cluster Monitoring > FE metrics]

    Resource Scale-out Suggestions

    When the CPU or memory usage of FE and BE continuously exceeds 85%, consider upgrading or scaling out resources.
    Note:
    The main reasons for the high CPU and memory usage of FE and BE are as follows:
    High usage of FE CPU: Multiple concurrent queries and a large number of complex queries.
    High usage of FE memory: Too much metadata (unreasonable partitioning) and frequent table deletion.
    High usage of BE CPU: Large amounts of data imported and large amounts of complex queries (such as aggregate queries).
    High usage of BE memory: Large amounts of data imported and large amounts of complex queries (such as aggregate queries).
| Common Scene | Resource Consumption | Optimization Suggestion (usage continuously above 85%) |
| --- | --- | --- |
| Continuous large-volume data import | High CPU and memory usage on FE and BE | If the bottleneck is FE: vertical upgrade is recommended. If the bottleneck is BE: vertical upgrade is recommended. |
| Frequent point queries / high concurrency | High CPU usage on FE and BE | If the bottleneck is FE: vertical upgrade is recommended. If the bottleneck is BE: vertical upgrade is recommended. |
| Frequent metadata changes and deletions | High memory usage on FE | Upgrade FE vertically to increase memory. |
| Many multi-table joins / aggregation queries | High CPU and memory usage on BE | Scale out BE horizontally; vertical upgrade is also an option. |
| Highly concurrent data writes | High CPU and memory usage on BE | Scale out BE horizontally; vertical upgrade is also an option. |
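    Since unreasonable partitioning is the main driver of FE memory pressure, keeping partition counts small and bounded helps avoid it. A sketch in standard Doris syntax (table, column, and property values are illustrative; tune for your data volume):

```sql
-- Illustrative: monthly range partitions with a bounded dynamic-partition
-- window keep the partition count, and thus FE metadata, small,
-- compared with e.g. one partition per hour.
CREATE TABLE events (
    dt DATE,
    user_id BIGINT,
    payload VARCHAR(256)
)
DUPLICATE KEY (dt, user_id)
PARTITION BY RANGE (dt) ()
DISTRIBUTED BY HASH(user_id) BUCKETS 16
PROPERTIES (
    "dynamic_partition.enable" = "true",
    "dynamic_partition.time_unit" = "MONTH",
    "dynamic_partition.start" = "-12",   -- retain 12 months
    "dynamic_partition.end" = "1",       -- pre-create 1 month ahead
    "dynamic_partition.prefix" = "p",
    "dynamic_partition.buckets" = "16",
    "replication_num" = "3"
);
```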

    Cluster Scaling Must-Knows

| Operation Type | Must-Knows |
| --- | --- |
| Scale-out | Reads and writes remain available during horizontal scale-out, though some jitter may occur. The operation takes about 5 to 15 minutes; perform it during off-peak hours. When both data volume and query volume are growing, horizontal scale-out is the preferred option. |
| Scale-in | Only one node type can be scaled in per operation, such as FE only or BE only. FE scale-in: multiple FE nodes can be scaled in at one time. BE scale-in: scaling in multiple BE nodes at one time may cause data loss or take a long time; scale in nodes one by one. Reads and writes remain available during scale-in, though some jitter may occur. |
| Vertical upgrade/downgrade | The system cannot be read or written during the change. Compute specifications can be upgraded or downgraded; storage specifications can only be upgraded. The adjustment applies to all nodes in the cluster. |
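    In TCHouse-D, scale-in is performed from the console. For background, in Doris-style systems a safe single-node BE removal is a decommission that migrates the node's tablets to the remaining BEs first, which is why one-at-a-time scale-in is safer. A sketch in standard Doris syntax (host and port are placeholders):

```sql
-- Gracefully drain one BE before removal; its tablets are migrated
-- to the remaining nodes before the node is dropped.
ALTER SYSTEM DECOMMISSION BACKEND "be_host_1:9050";

-- Watch progress: the node's TabletNum should fall toward 0.
SHOW BACKENDS;
```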

    Business Optimization Suggestions

    Usage recommendations:
    - If you often perform point queries on a high-cardinality column, create a Bloom filter index on that column.
    - If you often run fixed-pattern aggregate queries on a table, create a materialized view on it.
    - Design partitions and buckets according to the business scene; too many partitions and buckets cause excessive FE memory usage.
    - For exploratory SQL queries where not all data is needed, add a LIMIT clause to cap the number of returned rows, which also speeds up the query.
    - Use CSV for data import and avoid the JSON format.
    Try to avoid:
    - Avoid SELECT * queries.
    - Avoid enabling profiles globally (this consumes extra resources; enable profiles only for the SQL statements that need them).
    - When creating a table: avoid enabling merge_on_write (this feature is not yet mature).
    - When creating a table: avoid enabling auto bucket (this feature is not yet mature).
    - When creating a table: avoid enabling dynamic schema tables (this feature is not yet mature).
    - Avoid joining multiple large tables. When such joins are unavoidable, join large tables pairwise with Colocation Join, or use pre-aggregated tables and indexes to speed up queries.
    Parameter optimization:
    - When a SQL statement involves many concurrent operations, increase the parallel_fragment_exec_instance_num parameter. The default value is 200; it can be increased in multiples (such as 400 and 800) and should be kept within 2,000.
    - Control the compaction speed. If the monitoring metric base_compaction_score exceeds 200 and keeps rising (see Cluster Monitoring > BE metrics), increase the compaction_task_num_per_disk parameter (system default 2; it can be raised to 4 or greater).
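    The usage recommendations above can be sketched with standard Doris statements (table, column, and view names are illustrative; adapt to your own schema):

```sql
-- Bloom filter index on a frequently point-queried, high-cardinality column:
ALTER TABLE orders SET ("bloom_filter_columns" = "user_id");

-- Materialized view for a fixed-pattern aggregate query:
CREATE MATERIALIZED VIEW orders_daily AS
SELECT order_date, SUM(amount)
FROM orders
GROUP BY order_date;

-- Cap rows returned during data exploration (and select only needed columns):
SELECT order_id, amount FROM orders ORDER BY order_date DESC LIMIT 100;

-- Raise per-query parallelism for a heavy session, per the guidance above:
SET parallel_fragment_exec_instance_num = 400;
```

    Note that compaction_task_num_per_disk is a BE configuration item rather than a session variable; in TCHouse-D it is adjusted through cluster configuration management rather than SQL.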
    
    