Test Scheme Introduction

Last updated: 2024-07-31 09:18:34
    This document describes how to use TPC-H, a business intelligence (decision support) benchmark, to test the performance of Tencent Cloud TCHouse-D. It gives a reference test scheme based on the TPC-H query performance of a 16-core cluster on a 100 GB data set.

    About TPC-H Performance Test

    TPC-H is a decision support benchmark that consists of a suite of business-oriented ad hoc queries and concurrent data modifications. The queries and the data populating the database have been chosen to have broad industry-wide relevance. The benchmark demonstrates the ability of a decision support system to examine large volumes of data, execute highly complex queries, and answer critical business questions. The performance metric reported by TPC-H is the TPC-H Composite Query-per-Hour Performance Metric (QphH@Size), which reflects multiple aspects of the system's ability to process queries.
    TPC-H simulates a data warehouse of a sales system. The benchmark test includes 22 queries in total, and the main evaluation metric is the response time of each query, that is, the time required from submitting the query to returning the result. TPC-H test results can comprehensively reflect the system's ability to process queries.

    Test Scheme Introduction

    Test Environment Preparation

    Hardware Environment

    In the reference scheme given in this document, the tested cluster consists of 1 FE and 3 BEs, with the FE and BE node processes deployed separately. The specifications are listed below. Note that the test itself does not fully consume these hardware resources.
    | Node Type | Node Specifications |
    | 1 FE, standard | CPU: 4 cores; Memory: 16 GB; Hard disk: Enhanced SSD Cloud Disk 200 GB |
    | 3 BEs, standard | CPU: 16 cores; Memory: 64 GB; Hard disk: Enhanced SSD Cloud Disk 1000 GB |

    Software Version

    Tencent Cloud TCHouse-D 1.2.7

    Test Script Preparation

    Download the TPC-H toolkit from Toolkit Address and compile it.

    TPC-H 100 GB Data Test

    Generate a 100 GB data set.

    sh gen-tpch-data.sh -s 100 -c 10
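    The generated data is shown in the following table:
    | TPC-H table name | Number of rows |
    | Region Table (region) | 5 |
    | Country Table (nation) | 25 |
    | Supplier Table (supplier) | 1 million |
    | Parts List (part) | 20 million |
    | Parts Supply List (partsupp) | 80 million |
    | Customer Table (customer) | 15 million |
    | Order Table (orders) | 150 million |
    | Order Details Table (lineitem) | 600 million |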
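
    The row counts in this table follow from the scale factor passed with -s: apart from the fixed-size region and nation tables, every table grows linearly with it. The following is a minimal sketch of that relationship. The function name tpch_expected_rows is hypothetical and not part of the toolkit, the per-unit base counts are those of the TPC-H specification, and the lineitem figure is only approximate.

```shell
#!/bin/sh
# Hypothetical helper, not part of the TPC-H toolkit.
# Prints the expected row count of each TPC-H table for a given scale factor.
# region and nation are fixed-size; lineitem is only approximately 6,000,000 x SF.
tpch_expected_rows() {
    sf="$1"
    printf 'region %d\n' 5
    printf 'nation %d\n' 25
    printf 'supplier %d\n' "$((10000 * sf))"
    printf 'part %d\n' "$((200000 * sf))"
    printf 'partsupp %d\n' "$((800000 * sf))"
    printf 'customer %d\n' "$((150000 * sf))"
    printf 'orders %d\n' "$((1500000 * sf))"
    printf 'lineitem %d\n' "$((6000000 * sf))"
}

# Scale factor 100 corresponds to the 100 GB data set used in this document.
tpch_expected_rows 100
```

    Comparing these figures with a count(*) on each table after the import is a quick way to confirm that no data file was skipped.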

    Create a table

    Before creating the tables, edit the doris-cluster.conf configuration file and set FE_HOST, PASSWORD, and DB for your cluster:
    # cat doris-cluster.conf
    # Any of FE host
    export FE_HOST=''
    # http_port in fe.conf
    export FE_HTTP_PORT=8030
    # query_port in fe.conf
    export FE_QUERY_PORT=9030
    # Doris username
    export USER='root'
    # Doris password
    export PASSWORD=''
    # The database where TPC-H tables located
    export DB='tpch_100g_decimalv3'
    # The scale of testing data
    export SCALE='100g' # only support '100g' or '1t'
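
    Since the later steps depend on the values in this file, it can help to check it before running them. The sketch below sources the file in a subshell and reports values that look wrong. The function name check_tpch_conf is hypothetical and not part of the toolkit, and the check assumes that only an empty FE_HOST or DB, or a SCALE other than '100g' or '1t', is an error (an empty PASSWORD may be valid for your account).

```shell
#!/bin/sh
# Hypothetical helper, not part of the TPC-H toolkit: validate doris-cluster.conf.
# Pass a path that contains a slash (for example ./doris-cluster.conf) so that
# the dot command does not search PATH for the file.
check_tpch_conf() {
    conf="$1"
    (
        # Source the file in a subshell so its exports do not leak into the caller.
        . "$conf" || exit 2
        status=0
        if [ -z "${FE_HOST:-}" ]; then echo "FE_HOST is empty"; status=1; fi
        if [ -z "${DB:-}" ]; then echo "DB is empty"; status=1; fi
        case "${SCALE:-}" in
            100g|1t) ;;
            *) echo "SCALE must be '100g' or '1t'"; status=1 ;;
        esac
        exit "$status"
    )
}

# Example (assumes the file is in the current directory):
#   check_tpch_conf ./doris-cluster.conf && sh create-tpch-tables.sh
```

    The function returns a non-zero status when any check fails, so it can gate the table creation step as in the commented example.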
    Create the tables:
    sh create-tpch-tables.sh

    Import Data

    sh load-tpch-data.sh
    After the import completes, check the size of each table:
    MySQL [tpch100g]> show data;
    | TableName | Size | ReplicaCount |
    | customer | 1.317 GB | 24 |
    | lineitem | 20.880 GB | 96 |
    | nation | 2.571 KB | 1 |
    | orders | 6.302 GB | 96 |
    | part | 752.470 MB | 24 |
    | partsupp | 4.375 GB | 24 |
    | region | 1.090 KB | 1 |
    | supplier | 85.528 MB | 12 |
    | Total | 33.693 GB | 278 |
    | Quota | 1024.000 TB | 1073741824 |
    | Left | 1023.967 TB | 1073741546 |
    11 rows in set (0.00 sec)
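
    To cross-check a result like this one, the per-table rows can be summed and compared with the Total row. The following is a minimal sketch that assumes the pipe-delimited layout shown here; the function name sum_show_data is hypothetical, and the demo input is a copy of the per-table rows from this document.

```shell
#!/bin/sh
# Hypothetical helper: sum the per-table rows of a pasted `show data` result.
# The header and the summary rows (Total, Quota, Left) are skipped.
sum_show_data() {
    awk -F'|' '
        function trim(s) { gsub(/^[ \t]+|[ \t]+$/, "", s); return s }
        NF >= 4 {
            name = trim($2)
            if (name == "TableName" || name == "Total" || name == "Quota" || name == "Left") next
            split(trim($3), size, " ")   # "752.470 MB" becomes size[1] and size[2]
            if      (size[2] == "KB") gb = size[1] / 1024 / 1024
            else if (size[2] == "MB") gb = size[1] / 1024
            else if (size[2] == "GB") gb = size[1]
            else if (size[2] == "TB") gb = size[1] * 1024
            else next
            total_gb += gb
            replicas += trim($4)
        }
        END { printf "%.3f GB in %d replicas\n", total_gb, replicas }
    '
}

sum_show_data <<'EOF'
| customer | 1.317 GB | 24 |
| lineitem | 20.880 GB | 96 |
| nation | 2.571 KB | 1 |
| orders | 6.302 GB | 96 |
| part | 752.470 MB | 24 |
| partsupp | 4.375 GB | 24 |
| region | 1.090 KB | 1 |
| supplier | 85.528 MB | 12 |
EOF
```

    For these rows the sum comes to about 33.692 GB across 278 replicas, which agrees with the reported Total of 33.693 GB up to the rounding of the displayed values.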


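    Query Test

    Run the 22 queries. Each line of the output shows the response time of one query in milliseconds.
    [root@9 tpch-tools]# sh bin/run-tpch-queries.sh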
    q1: 2103
    q2: 305
    q3: 792
    q4: 516
    q5: 1036
    q6: 60
    q7: 493
    q8: 954
    q9: 4411
    q10: 870
    q11: 183
    q12: 1847
    q13: 2886
    q14: 165
    q15: 255
    q16: 398
    q17: 520
    q18: 1665
    q19: 468
    q20: 347
    q21: 1741
    q22: 412
    total time: 22427 ms
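
    When comparing several runs, it is convenient to reduce output like this to a one-line summary. The following is a minimal sketch that assumes the "qN: milliseconds" line format shown here; the function name summarize_tpch_timings is hypothetical and not part of the toolkit, and the demo input is a small excerpt of the results in this document.

```shell
#!/bin/sh
# Hypothetical helper: summarize "qN: <ms>" lines from the query run output.
# Lines that do not start with a query label (such as "total time: ...") are ignored.
summarize_tpch_timings() {
    awk -F': *' '
        /^[ \t]*q[0-9]+:/ {
            name = $1
            gsub(/^[ \t]+/, "", name)    # drop leading indentation from the label
            ms = $2 + 0
            total += ms
            count++
            if (ms > max_ms) { max_ms = ms; slowest = name }
        }
        END { printf "queries=%d total_ms=%d slowest=%s (%d ms)\n", count, total, slowest, max_ms }
    '
}

summarize_tpch_timings <<'EOF'
q1: 2103
q6: 60
q9: 4411
total time: 6574 ms
EOF
```

    Piping the full output of the query script into the function should report 22 queries, and for the run in this document the per-query times add up to the reported total of 22427 ms, with q9 (4411 ms) as the slowest query.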
    This completes TPC-H data generation, table creation, data import, and querying for the 100 GB data set scenario.
