tencent cloud


Data Partition and Bucket

Last updated: 2024-06-27 10:54:28
    To store and process large amounts of data, Doris applies the divide-and-conquer principle and partitions data.
    Doris supports two levels of data partitioning. The first level is Partition, which supports range and list partitioning. The second level is Bucket, which supports hash partitioning. Both levels divide data horizontally.
    It is also possible to use only one level of partitioning; in that case, only bucketing is supported. Partition and Bucket are introduced separately below.

    Partition

    Partition is used to divide data into different intervals, which logically can be understood as dividing the original table into multiple sub-tables. It is convenient to manage data by partition, for example, deleting data is faster.
    You can specify one or more columns as the partition columns, but they must be KEY columns. The usage of multi-column partitions is introduced in the multi-column partition below.
    Regardless of the type of the partition columns, partition values must be enclosed in double quotes.
    There is no theoretical limit on the number of partitions.
    When you create a table without using Partition, the system automatically generates a Partition with the same name as the table. This Partition contains all the data in the table, and is neither visible to users nor modifiable.
    Partitions must not have overlapping ranges.

    Range Partition

    Partition supports specifying only the upper bound via VALUES LESS THAN (...). The system uses the upper bound of the previous partition as the lower bound of the new partition, generating a left-closed, right-open interval. It also supports specifying both the upper and lower bounds via VALUES [...), which likewise generates a left-closed, right-open interval.
    Specifying both the upper and lower bounds with VALUES [...) is easier to understand. The following example illustrates how partition ranges change when partitions are added or deleted using the VALUES LESS THAN (...) statement.
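    For reference, a partition added with the VALUES [...) form might look like the following sketch against the expamle_range_tbl table created below (the partition name p201701_fixed is illustrative):

```sql
-- Hypothetical example: specify both bounds with VALUES [...),
-- producing the left-closed, right-open interval [2017-01-01, 2017-02-01).
ALTER TABLE example_db.expamle_range_tbl
ADD PARTITION `p201701_fixed` VALUES [("2017-01-01"), ("2017-02-01"));
```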
    -- Range Partition
    
    CREATE TABLE IF NOT EXISTS example_db.expamle_range_tbl
    (
    `user_id` LARGEINT NOT NULL COMMENT "user id",
    `date` DATE NOT NULL COMMENT "data import date",
    `timestamp` DATETIME NOT NULL COMMENT "data import timestamp",
    `city` VARCHAR(20) COMMENT "user's city",
    `age` SMALLINT COMMENT "user's age",
    `sex` TINYINT COMMENT "user's gender",
    `last_visit_date` DATETIME REPLACE DEFAULT "1970-01-01 00:00:00" COMMENT "last visit time of user",
    `cost` BIGINT SUM DEFAULT "0" COMMENT "total consumption of user",
    `max_dwell_time` INT MAX DEFAULT "0" COMMENT "user's maximum stay time",
    `min_dwell_time` INT MIN DEFAULT "99999" COMMENT "user's minimum stay time"
    )
    ENGINE=OLAP
    AGGREGATE KEY(`user_id`, `date`, `timestamp`, `city`, `age`, `sex`)
    PARTITION BY RANGE(`date`)
    (
    PARTITION `p201701` VALUES LESS THAN ("2017-02-01"),
    PARTITION `p201702` VALUES LESS THAN ("2017-03-01"),
    PARTITION `p201703` VALUES LESS THAN ("2017-04-01")
    )
    DISTRIBUTED BY HASH(`user_id`) BUCKETS 16
    PROPERTIES
    (
    "replication_num" = "3"
    );
    As shown in the expamle_range_tbl example, after the table is created, view the partitions and you will find that the following three partitions were automatically generated:
    show partitions from expamle_range_tbl;
    p201701: [MIN_VALUE, 2017-02-01)
    p201702: [2017-02-01, 2017-03-01)
    p201703: [2017-03-01, 2017-04-01)
    When we use SQL to add a partition - p201705 VALUES LESS THAN ("2017-06-01"):
    alter table expamle_range_tbl add partition p201705 VALUES LESS THAN ("2017-06-01");
    show partitions from expamle_range_tbl;
    The partition result is as follows:
    p201701: [MIN_VALUE, 2017-02-01)
    p201702: [2017-02-01, 2017-03-01)
    p201703: [2017-03-01, 2017-04-01)
    p201705: [2017-04-01, 2017-06-01)
    Now we delete the partition - p201703:
    alter table expamle_range_tbl drop partition p201703;
    show partitions from expamle_range_tbl;
    The partition result is as follows:
    p201701: [MIN_VALUE, 2017-02-01)
    p201702: [2017-02-01, 2017-03-01)
    p201705: [2017-04-01, 2017-06-01)
    Note
    Note that the partition ranges of p201702 and p201705 haven't changed, but there is now a gap between these two partitions: [2017-03-01, 2017-04-01). That is to say, data falling within this gap range cannot be imported.
    Now we go on and delete the partition - p201702:
    alter table expamle_range_tbl drop partition p201702;
    show partitions from expamle_range_tbl;
    The partition result is as follows:
    p201701: [MIN_VALUE, 2017-02-01)
    p201705: [2017-04-01, 2017-06-01)
    Note
    The gap range expands to: [2017-02-01, 2017-04-01).
    Then we add a partition - p201702new VALUES LESS THAN ("2017-03-01"):
    alter table expamle_range_tbl add partition p201702new VALUES LESS THAN ("2017-03-01");
    show partitions from expamle_range_tbl;
    The partition result is as follows:
    p201701: [MIN_VALUE, 2017-02-01)
    p201702new: [2017-02-01, 2017-03-01)
    p201705: [2017-04-01, 2017-06-01)
    Note
    We can see that the gap range shrinks to: [2017-03-01, 2017-04-01).
    Now we delete partition p201701, and add partition p201612 VALUES LESS THAN ("2017-01-01"):
    alter table expamle_range_tbl drop partition p201701;
    alter table expamle_range_tbl add partition p201612 VALUES LESS THAN ("2017-01-01");
    show partitions from expamle_range_tbl;
    The partition result is as follows:
    p201612: [MIN_VALUE, 2017-01-01)
    p201702new: [2017-02-01, 2017-03-01)
    p201705: [2017-04-01, 2017-06-01)
    Note
    That is, a new gap range appears: [2017-01-01, 2017-02-01).
    In summary, deletion of a partition will not change the range of existing partitions, but might result in gaps. When a partition is added via the VALUES LESS THAN statement, the lower bound of one partition is the upper bound of its previous partition.

    Multi-column partition

    In addition to the single-column partitioning above, multi-column partitioning is also supported. Here's an example:
    -- Range Partition
    
    CREATE TABLE IF NOT EXISTS example_db.expamle_range_multi_partiton_key_tbl
    (
    `user_id` LARGEINT NOT NULL COMMENT "user id",
    `date` DATE NOT NULL COMMENT "data import date",
    `timestamp` DATETIME NOT NULL COMMENT "data import timestamp",
    `city` VARCHAR(20) COMMENT "user's city",
    `age` SMALLINT COMMENT "user's age",
    `sex` TINYINT COMMENT "user's gender",
    `last_visit_date` DATETIME REPLACE DEFAULT "1970-01-01 00:00:00" COMMENT "last visit time of user",
    `cost` BIGINT SUM DEFAULT "0" COMMENT "total consumption of user",
    `max_dwell_time` INT MAX DEFAULT "0" COMMENT "user's maximum stay time",
    `min_dwell_time` INT MIN DEFAULT "99999" COMMENT "user's minimum stay time"
    )
    ENGINE=OLAP
    AGGREGATE KEY(`user_id`, `date`, `timestamp`, `city`, `age`, `sex`)
    PARTITION BY RANGE(`date`, `user_id`)
    (
    PARTITION `p201701_1000` VALUES LESS THAN ("2017-02-01", "1000"),
    PARTITION `p201702_2000` VALUES LESS THAN ("2017-03-01", "2000"),
    PARTITION `p201703_all` VALUES LESS THAN ("2017-04-01")
    )
    DISTRIBUTED BY HASH(`user_id`) BUCKETS 16
    PROPERTIES
    (
    "replication_num" = "3"
    );
    In the above example, we specify date (DATE type) and user_id (LARGEINT type) as partition columns. The final result of the sample partition is as follows:
    show partitions from expamle_range_multi_partiton_key_tbl;
    * p201701_1000: [(MIN_VALUE, MIN_VALUE), ("2017-02-01", "1000") )
    * p201702_2000: [("2017-02-01", "1000"), ("2017-03-01", "2000") )
    * p201703_all: [("2017-03-01", "2000"), ("2017-04-01", MIN_VALUE))
    Note
    In the last partition, the user only specified a partition value for the date column, so the partition value of the user_id column is automatically filled with MIN_VALUE. When data is imported, the system compares the data with the partition values in order and puts each row into its corresponding partition.
    For example:
    * Data --> Partition
    * 2017-01-01, 200 --> p201701_1000
    * 2017-01-01, 2000 --> p201701_1000
    * 2017-02-01, 100 --> p201701_1000
    * 2017-02-01, 2000 --> p201702_2000
    * 2017-02-15, 5000 --> p201702_2000
    * 2017-03-01, 2000 --> p201703_all
    * 2017-03-10, 1 --> p201703_all
    * 2017-04-01, 1000 --> Unable to import
    * 2017-05-01, 1000 --> Unable to import
    Verification method: insert a row of data and check which partition it is stored in. The partition whose VisibleVersionTime and VisibleVersion fields have just been updated is the one that holds the newly inserted data.
    insert into expamle_range_multi_partiton_key_tbl values (200, '2017-01-01', '2017-01-01 12:00:05', 'Beijing', 25, 1, '2017-01-01 12:00:05', 100, 30, 10);
    insert into expamle_range_multi_partiton_key_tbl values (2000, '2017-01-01', '2017-01-01 16:10:05', 'Shanghai', 33, 1, '2017-01-01 16:10:05', 800, 50, 1);
    insert into expamle_range_multi_partiton_key_tbl values (200, '2017-02-01', '2017-01-01 16:10:05', 'Guangzhou', 22, 0, '2017-02-01 16:10:05', 80, 200, 1);
    show partitions from expamle_range_multi_partiton_key_tbl\G

    List Partition

    The partition columns support the BOOLEAN, TINYINT, SMALLINT, INT, BIGINT, LARGEINT, DATE, DATETIME, CHAR, and VARCHAR data types, and the partition values are enumeration values. A partition is hit only if the data matches one of the enumeration values of the target partition. List partition uses VALUES IN (...) to specify the enumeration values contained in each partition. The following example illustrates how partitions change when a partition is added or deleted.
    -- List Partition
    
    CREATE TABLE IF NOT EXISTS example_db.expamle_list_tbl
    (
    `user_id` LARGEINT NOT NULL COMMENT "user id",
    `date` DATE NOT NULL COMMENT "data import date",
    `timestamp` DATETIME NOT NULL COMMENT "data import timestamp",
    `city` VARCHAR(20) NOT NULL COMMENT "User's city",
    `age` SMALLINT COMMENT "user's age",
    `sex` TINYINT COMMENT "user's gender",
    `last_visit_date` DATETIME REPLACE DEFAULT "1970-01-01 00:00:00" COMMENT "last visit time of user",
    `cost` BIGINT SUM DEFAULT "0" COMMENT "total consumption of user",
    `max_dwell_time` INT MAX DEFAULT "0" COMMENT "user's maximum stay time",
    `min_dwell_time` INT MIN DEFAULT "99999" COMMENT "user's minimum stay time"
    )
    ENGINE=olap
    AGGREGATE KEY(`user_id`, `date`, `timestamp`, `city`, `age`, `sex`)
    PARTITION BY LIST(`city`)
    (
    PARTITION `p_cn` VALUES IN ("Beijing", "Shanghai", "Hong Kong"),
    PARTITION `p_usa` VALUES IN ("New York", "San Francisco"),
    PARTITION `p_jp` VALUES IN ("Tokyo")
    )
    DISTRIBUTED BY HASH(`user_id`) BUCKETS 16
    PROPERTIES
    (
    "replication_num" = "3"
    );
    As shown in the expamle_list_tbl example, after the table is created, the following three partitions will be automatically generated:
    show partitions from expamle_list_tbl;
    p_cn: ("Beijing", "Shanghai", "Hong Kong")
    p_usa: ("New York", "San Francisco")
    p_jp: ("Tokyo")
    When we add a partition p_uk VALUES IN ("London"):
    alter table expamle_list_tbl add partition p_uk VALUES IN ("London");
    show partitions from expamle_list_tbl;
    The partition result is as follows:
    p_cn: ("Beijing", "Shanghai", "Hong Kong")
    p_usa: ("New York", "San Francisco")
    p_jp: ("Tokyo")
    p_uk: ("London")
    When we delete partition p_jp, the result of the partition is as follows:
    alter table expamle_list_tbl drop partition p_jp;
    show partitions from expamle_list_tbl;
    p_cn: ("Beijing", "Shanghai", "Hong Kong")
    p_usa: ("New York", "San Francisco")
    p_uk: ("London")

    Multi-column partition

    List partitioning also supports multiple partition columns. Here's an example:
    -- List Partition
    
    CREATE TABLE IF NOT EXISTS example_db.expamle_list_multi_partiton_key_tbl
    (
    `user_id` LARGEINT NOT NULL COMMENT "user id",
    `date` DATE NOT NULL COMMENT "data import date",
    `timestamp` DATETIME NOT NULL COMMENT "data import timestamp",
    `city` VARCHAR(20) NOT NULL COMMENT "User's city",
    `age` SMALLINT COMMENT "user's age",
    `sex` TINYINT COMMENT "user's gender",
    `last_visit_date` DATETIME REPLACE DEFAULT "1970-01-01 00:00:00" COMMENT "last visit time of user",
    `cost` BIGINT SUM DEFAULT "0" COMMENT "total consumption of user",
    `max_dwell_time` INT MAX DEFAULT "0" COMMENT "user's maximum stay time",
    `min_dwell_time` INT MIN DEFAULT "99999" COMMENT "user's minimum stay time"
    )
    ENGINE=olap
    AGGREGATE KEY(`user_id`, `date`, `timestamp`, `city`, `age`, `sex`)
    PARTITION BY LIST(`user_id`, `city`)
    (
    PARTITION `p1_city` VALUES IN (("1", "Beijing"), ("1", "Shanghai")),
    PARTITION `p2_city` VALUES IN (("2", "Beijing"), ("2", "Shanghai")),
    PARTITION `p3_city` VALUES IN (("3", "Beijing"), ("3", "Shanghai"))
    )
    DISTRIBUTED BY HASH(`user_id`) BUCKETS 16
    PROPERTIES
    (
    "replication_num" = "3"
    );
    In the above example, we specify user_id (LARGEINT type) and city (VARCHAR type) as partition columns. The final partitions obtained in the above example are as follows:
    show partitions from expamle_list_multi_partiton_key_tbl;
    * p1_city: [("1", "Beijing"), ("1", "Shanghai")]
    * p2_city: [("2", "Beijing"), ("2", "Shanghai")]
    * p3_city: [("3", "Beijing"), ("3", "Shanghai")]
    When users insert data, the values of partition columns will be compared in order to obtain the corresponding partition. The examples are as follows:
    * Data ---> partition
    * 1, Beijing ---> p1_city
    * 1, Shanghai ---> p1_city
    * 2, Shanghai ---> p2_city
    * 3, Beijing ---> p3_city
    * 1, Tianjin ---> Unable to import
    * 4, Beijing ---> Unable to import
    Verification method: insert a row of data and check which partition it is stored in. The partition whose VisibleVersionTime and VisibleVersion fields have just been updated is the one that holds the newly inserted data.
    insert into expamle_list_multi_partiton_key_tbl values (1, '2017-01-01', '2017-01-01 12:00:05', 'Beijing', 25, 1, '2017-01-01 12:00:05', 100, 30, 10);
    show partitions from expamle_list_multi_partiton_key_tbl\G

    Bucket

    The data is divided into different Buckets based on the hash value of the bucket column.
    If Partition is used, the DISTRIBUTED ... statement describes the division rules of data within each partition. If Partition is not used, it describes the division rules of the data in the entire table.
    Multiple bucket columns can be selected, but they must be Key columns. The bucket column can be the same or different from the Partition column.
    The choice of bucket column is a trade-off between query throughput and query concurrency:
    1. If you specify multiple bucket columns, the data will be distributed more evenly. However, if the query condition does not include equality conditions on all bucket columns, the system will scan all buckets; scanning them in parallel largely increases the query throughput and decreases the latency of a single query. This method is suitable for high-throughput, low-concurrency query scenarios.
    2. If you specify only one or a few bucket columns, a point query may scan only one bucket. Thus, when multiple point queries run concurrently, they are likely to scan different buckets with no interference between their IO operations (especially when different buckets are stored on different disks). This approach is suitable for high-concurrency point query scenarios.
    There is no theoretical upper limit to the number of buckets.
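    When only one level is used, the table is defined without a PARTITION BY clause. A minimal sketch of such a bucket-only table (the table name and columns below are illustrative, not from the examples above):

```sql
-- Hypothetical single-level table: no PARTITION BY clause,
-- data is distributed into 16 buckets by hash of user_id.
CREATE TABLE IF NOT EXISTS example_db.example_bucket_only_tbl
(
    `user_id` LARGEINT NOT NULL COMMENT "user id",
    `cost` BIGINT SUM DEFAULT "0" COMMENT "total consumption of user"
)
ENGINE=OLAP
AGGREGATE KEY(`user_id`)
DISTRIBUTED BY HASH(`user_id`) BUCKETS 16
PROPERTIES ("replication_num" = "3");
```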

    Best Practice

    Selection of Bucket Columns

    It is recommended to choose columns with high cardinality as bucket columns to avoid data skew.
    To recover data conveniently, it is recommended that the size of a single Bucket should not be too large and be kept within 10GB. Therefore, please consider the number of Buckets reasonably when creating tables or adding Partitions, and different Partitions can specify different numbers of Buckets.

    Recommendations on the number and data volume of Partition and Buckets

    The total number of tablets in a table equals (Partition num * Bucket num).
    Regardless of capacity expansion, the recommended number of tablets in a table is slightly more than the number of disks in the entire cluster.
    The data volume of a single tablet theoretically has no upper or lower limit, but 1 GB - 10 GB is recommended. Too small a tablet puts stress on data aggregation and metadata management, while too large a tablet complicates data migration and replica completion, and increases the cost of failed Schema Change or Rollup operations (these operations are performed at the Tablet level).
    If you cannot satisfy both the ideal data volume and the ideal quantity at the same time, prioritize the ideal data volume.
    When a table is created, you specify the same number of buckets for each partition. But when dynamically adding partitions (ADD PARTITION), you can specify the bucket number of the new partition separately. This feature can help you to deal with data reduction or expansion conveniently.
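    As a sketch of specifying a per-partition bucket count at ADD PARTITION time, using the expamle_range_tbl table from the earlier example (the partition name is illustrative):

```sql
-- Hypothetical: the new partition gets 32 buckets even though the table
-- was created with 16; existing partitions keep their original bucket count.
ALTER TABLE example_db.expamle_range_tbl
ADD PARTITION p201706 VALUES LESS THAN ("2017-07-01")
DISTRIBUTED BY HASH(`user_id`) BUCKETS 32;
```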
    Once the number of buckets for a partition is specified, it cannot be changed. So when determining the number of buckets, you need to consider future cluster expansion in advance. For example, if there are only 3 hosts, each with one disk, and the number of buckets is set to 3 or less, then even adding machines later cannot increase query concurrency.
    For example, suppose there are 10 BEs, each with one disk:
    * 500 MB total table size: 4-8 tablets.
    * 5 GB: 8-16 tablets.
    * 50 GB: 32 tablets.
    * 500 GB: divide the table into partitions of about 50 GB each, with 16-32 tablets per partition.
    * 5 TB: divide the table into partitions of about 50 GB each, with 16-32 tablets per partition.
    Note
    The data volume of the table can be viewed using the SHOW DATA command. Divide the returned result by the number of replicas to get the data volume of the table.

    Compound Partition vs. Single Partition

    Compound Partition

    The first layer of data partition is called Partition. You can specify a dimension column as a partition column (currently only integer and time columns are supported), and specify the value range of each partition.
    The second layer is called Distribution, or bucket. Users can perform HASH distribution on data by specifying the number of buckets and one or more dimension columns as the bucket columns.
    Compound partition is recommended for the following scenarios:
    Scenarios with time dimensions or similar dimensions with ordered values, which can be used as partition columns. The partition granularity can be evaluated according to the data import frequency, data volume, etc.
    Scenarios with a need to delete historical data: if, for example, you only need to keep the data of the last N days, you can use compound partition and delete historical data by dropping historical partitions. You can also send DELETE statements within specified partitions.
    Scenarios with a need to avoid data skew: You can specify the number of buckets for each partition separately. For example, if you choose to partition the data by day, and the data volume per day varies greatly, you can customize the number of buckets for each partition. For the choice of bucket column, it is recommended to select the columns with variety in values.
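    The historical-data scenario above can be sketched against the expamle_range_tbl table from the earlier example (partition names are illustrative):

```sql
-- Hypothetical cleanup: drop a whole historical partition at once,
-- or delete only matching rows within a specific partition.
ALTER TABLE example_db.expamle_range_tbl DROP PARTITION p201701;
DELETE FROM example_db.expamle_range_tbl PARTITION p201702
WHERE `city` = "Beijing";
```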

    Single partition

    You can also choose not to use compound partition, i.e., single partition with no partition columns. In this case, data is only distributed by hash.

    FAQs

    Failed to create partition [xxx] . Timeout

    In Doris, tables are created in the order of partition granularity, and this error may be reported when the creation of a partition fails. Even if you don't use partitioning, Failed to create partition may still be reported when table creation goes wrong, because, as mentioned earlier, Doris creates an unmodifiable default partition for tables without a specified partition. Usually, this error occurs because tablet creation failed in BE. You can follow the steps below for troubleshooting:
    1. In the fe.log, find the Failed to create partition log of the corresponding time point. In the log, find a number pair that looks like {10001-10010}. The first number of the pair is the Backend ID, and the second number is the Tablet ID. For example, the above-mentioned pair of numbers means that on Backend ID 10001, the creation of Tablet ID 10010 failed.
    2. After finding the target Backend, go to the corresponding be.INFO log, and find the log of the target tablet, and then check the error message.
    3. The following are some common tablet creation failures, including but not limited to:
    * The task was not received by BE. In this case, no log related to the tablet id can be found in be.INFO, or BE reports success but the creation still fails. For these problems, see Installation and Deployment about how to check the connectivity of FE and BE.
    * Pre-allocated memory failure. It may be that the length of a row in the table exceeds 100 KB.
    * Too many open files. The number of open file descriptors exceeds the Linux system limit. In this case, you need to modify the open file descriptor limit of the Linux system.
    If a timeout error occurs when creating tablets, you can set tablet_create_timeout_second=xxx and max_create_table_timeout_second=xxx in fe.conf. The default value of tablet_create_timeout_second is 1 second, and the default value of max_create_table_timeout_second is 60 seconds. The overall timeout is min(tablet_create_timeout_second * replication_num, max_create_table_timeout_second). For specific parameter settings, see FE Configuration Item.
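    If these parameters are runtime-mutable in your Doris version, they can also be adjusted on the fly without restarting FE; a sketch (the values 10 and 120 are illustrative):

```sql
-- Hypothetical: raise the per-tablet creation timeout to 10 seconds and
-- the overall table-creation ceiling to 120 seconds.
ADMIN SET FRONTEND CONFIG ("tablet_create_timeout_second" = "10");
ADMIN SET FRONTEND CONFIG ("max_create_table_timeout_second" = "120");
```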

    Are there any limits on the number of partitions and bucket quantities?

    There is no theoretical upper limit on the number of partitions and buckets, but too many partitions and buckets may affect Doris performance.
    To avoid creating too many partitions at once, the FE parameters max_dynamic_partition_num and max_multi_partition_num limit the number of automatically created partitions and batch-created partitions, respectively.

    How to set the number of bucket and data replicas for partition?

    When manually creating a partition, you can specify its number of replicas and buckets. If not specified, the table's replication_allocation and buckets parameters are used.
    When a partition is created automatically, its number of replicas and buckets are taken from the table's dynamic_partition.replication_allocation and dynamic_partition.buckets parameters.
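    These dynamic-partition defaults can be set or changed through table properties; a sketch using the expamle_range_tbl table from the earlier example (the values 32 and 3 are illustrative):

```sql
-- Hypothetical: new automatically created partitions will use
-- 32 buckets and 3 replicas; existing partitions are unaffected.
ALTER TABLE example_db.expamle_range_tbl SET
(
    "dynamic_partition.buckets" = "32",
    "dynamic_partition.replication_allocation" = "tag.location.default: 3"
);
```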

    Can the bucket number be modified?

    The bucket number of an already created partition cannot be modified.
    For tables using Range partition, the default bucket number of the table can be modified. After the modification, manually created partitions will use the new bucket number, but the change does not apply to already created partitions. If automatic partitioning is enabled, this modification does not affect the bucket number of new automatic partitions either; for those, you need to modify dynamic_partition.buckets, and only partitions created after that modification will use the new bucket number.
    To modify the bucket number of existing partitions (this operation involves data, so remember to back up your data and watch the system load of Doris):
    For tables that do not use Range partition, you can only change the bucket number by creating a new table and importing the data into it.
    For existing partitions, you can create a temporary partition with the new bucket number, import the data of the specific partition into the temporary partition, and then replace the existing partition with the temporary partition.
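    The temporary-partition approach can be sketched as follows, assuming the expamle_range_tbl table from the earlier example and a partition p201702 covering [2017-02-01, 2017-03-01) (the temporary partition name tp201702 and the bucket count 32 are illustrative):

```sql
-- 1. Create a temporary partition covering the same range,
--    with the new bucket count.
ALTER TABLE example_db.expamle_range_tbl
ADD TEMPORARY PARTITION tp201702
VALUES [("2017-02-01"), ("2017-03-01"))
DISTRIBUTED BY HASH(`user_id`) BUCKETS 32;

-- 2. Copy the data of the original partition into the temporary partition.
INSERT INTO example_db.expamle_range_tbl TEMPORARY PARTITION (tp201702)
SELECT * FROM example_db.expamle_range_tbl PARTITION (p201702);

-- 3. Atomically replace the formal partition with the temporary one.
ALTER TABLE example_db.expamle_range_tbl
REPLACE PARTITION (p201702) WITH TEMPORARY PARTITION (tp201702);
```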

    Can the bucket key be modified?

    No, the bucket key can only be set during table creation and cannot be changed afterwards.

    Is it possible to enable the Auto-Bucket feature on an existing table?

    No, the Auto-Bucket feature can only be enabled during the creation of the table.

    Does the existing partition still exist after the partition is replaced?

    Once the partition is successfully replaced, the replaced partition will be deleted and cannot be recovered.