tencent cloud

Feedback

Sequence Column

Last updated: 2024-06-27 11:08:21
    The unique model mainly targets scenarios that require a unique primary key. It can guarantee primary key uniqueness constraints, but due to the use of REPLACE Aggregate mode, for data imported in the same batch, the replacement order is not guaranteed. In this case, the specific data finally imported into the table cannot be determined, creating uncertainty.
    To solve this problem, Doris supports sequence columns. Users specify the sequence column when importing, and under the same key column, the columns of the REPLACE Aggregate type will be replaced according to the value of the sequence column. Larger values?can replace smaller values, and vice versa cannot be replaced. This method gives the user control over the replacement order.

    Applicable Scenario

    The sequence column can only be used under the Unique data model.

    Basic Principles

    This is implemented through the addition of a hidden column __DORIS_SEQUENCE_COL__. The type of this column is specified by the user when creating the table, the specific value of this column is determined when importing, and this value is used to replace the column of REPLACE.

    Creating table

    When you are creating a Unique table, a hidden column __DORIS_SEQUENCE_COL__ will be automatically added according to the type specified by the user.

    Import

    When importing, the FE sets the value of the hidden column as the value of the order by expression (Broker load and Routine load), or the value of the function_column.sequence_col expression (stream load) during parsing. Value column will be replaced according to this value. The value of the hidden column __DORIS_SEQUENCE_COL__ can be set as one column in the data source or one column in the table structure.

    Read

    When the request contains Value column, it needs to read the __DORIS_SEQUENCE_COL__ column. This column is used as the basis for the replacement order of the REPLACE Aggregate function under the same Key column. Larger values can replace smaller values. Otherwise, they cannot be replaced.

    Cumulative Compaction

    The principle is the same as the reading process during Cumulative Compaction.

    Base Compaction

    The principle is the same as the reading process during Base Compaction.

    Using Syntax

    A property is added in the syntax when you are creating a table to identify the type of __DORIS_SEQUENCE_COL__ The main design aspect of the import syntax is to add a map from sequence columns to other columns, which will be introduced below in various import methods.

    Creating table

    When creating a Unique table, you can specify the sequence column type.
    PROPERTIES (
    "function_column.sequence_type" = 'Date',
    );
    sequence_type is used to specify the type of the sequence column, which can be integer and time type.

    Stream load

    The writing method of Stream load is to add a map corresponding to source_sequence of the hidden column in the function_column.sequence_col field in the header, for example:
    curl --location-trusted -u root -H "columns: k1,k2,source_sequence,v1,v2" -H "function_column.sequence_col: source_sequence" -T testData http://host:port/api/testDb/testTbl/_stream_load

    Broker load

    Setting the source_sequence field of the hidden column map at ORDER BY.
    LOAD LABEL db1.label1
    (
    DATA INFILE("hdfs://host:port/user/data/*/test.txt")
    INTO TABLE tbl1
    COLUMNS TERMINATED BY ","
    (k1,k2,source_sequence,v1,v2)
    ORDER BY source_sequence
    )
    WITH BROKER 'broker'
    (
    "username"="user",
    "password"="pass"
    )
    PROPERTIES
    (
    "timeout" = "3600"
    );

    Routine load

    The mapping method is the same as above, as shown in the example below:
    CREATE ROUTINE LOAD example_db.test1 ON example_tbl
    [WITH MERGE|APPEND|DELETE]
    COLUMNS(k1, k2, source_sequence, v1, v2),
    WHERE k1 > 100 and k2 like "%doris%"
    [ORDER BY source_sequence]
    PROPERTIES
    (
    "desired_concurrent_number"="3",
    "max_batch_interval" = "20",
    "max_batch_rows" = "300000",
    "max_batch_size" = "209715200",
    "strict_mode" = "false"
    )
    FROM KAFKA
    (
    "kafka_broker_lsequence_typeist" = "broker1:9092,broker2:9092,broker3:9092",
    "kafka_topic" = "my_topic",
    "kafka_partitions" = "0,1,2,3",
    "kafka_offsets" = "101,0,0,200"
    );

    Activate support for sequence column

    If you set function_column.sequence_type when creating a new table, the new table will support sequence column. For a table that does not support sequence column, if you want to use this feature, you can use the following statement to enable it.
    ALTER TABLE example_db.my_table ENABLE FEATURE "SEQUENCE_LOAD" WITH PROPERTIES ("function_column.sequence_type" = "Date")
    To determine whether a table supports sequence column, you can set a session variable to display hidden columns SET show_hidden_columns=true, and then use desc tablename. It supports if the output has a __DORIS_SEQUENCE_COL__ column then it supports; if not, it does not support.

    Example

    Here we use Stream load as an example to show how to use
    1. Create a table that supports sequence column. Table structure is as follows:
    MySQL > desc test_table;
    +-------------+--------------+------+-------+---------+---------+
    | Field | Type | Null | Key | Default | Extra |
    +-------------+--------------+------+-------+---------+---------+
    | user_id | BIGINT | No | true | NULL | |
    | date | DATE | No | true | NULL | |
    | group_id | BIGINT | No | true | NULL | |
    | modify_date | DATE | No | false | NULL | REPLACE |
    | keyword | VARCHAR(128) | No | false | NULL | REPLACE |
    +-------------+--------------+------+-------+---------+---------+
    2. Normally import data. Import the following data:
    1 2020-02-22 1 2020-02-22 a
    1 2020-02-22 1 2020-02-22 b
    1 2020-02-22 1 2020-03-05 c
    1 2020-02-22 1 2020-02-26 d
    1 2020-02-22 1 2020-02-22 e
    1 2020-02-22 1 2020-02-22 b
    Here we take Stream load as an example, map the sequence column to the modify_date column.
    curl --location-trusted -u root: -H "function_column.sequence_col: modify_date" -T testData http://host:port/api/test/test_table/_stream_load
    The result is:
    MySQL > select * from test_table;
    +---------+------------+----------+-------------+---------+
    | user_id | date | group_id | modify_date | keyword |
    +---------+------------+----------+-------------+---------+
    | 1 | 2020-02-22 | 1 | 2020-03-05 | c |
    +---------+------------+----------+-------------+---------+
    In this import, because the value of sequence column (the value in modify_date) '2020-03-05' is the largest, so the keyword column finally retains c.
    3. Guarantee of replacement order. After the above steps are completed, import the following data:
    1 2020-02-22 1 2020-02-22 a
    1 2020-02-22 1 2020-02-23 b
    Querying Data:
    MySQL [test]> select * from test_table;
    +---------+------------+----------+-------------+---------+
    | user_id | date | group_id | modify_date | keyword |
    +---------+------------+----------+-------------+---------+
    | 1 | 2020-02-22 | 1 | 2020-03-05 | c |
    +---------+------------+----------+-------------+---------+
    Since the sequence column of the newly imported data is smaller than the value already in the table, it cannot be replaced. Try importing the following data again:
    1 2020-02-22 1 2020-02-22 a
    1 2020-02-22 1 2020-03-23 w
    Querying Data:
    MySQL [test]> select * from test_table;
    +---------+------------+----------+-------------+---------+
    | user_id | date | group_id | modify_date | keyword |
    +---------+------------+----------+-------------+---------+
    | 1 | 2020-02-22 | 1 | 2020-03-23 | w |
    +---------+------------+----------+-------------+---------+
    At this point, you can replace the original data in the table.

    FAQs

    What happens if the sequence_col is not specified during import for a table that has specified function_column.sequence_type?

    The import task will not execute the judgment logic related to the sequence column, causing the import result to not meet expectations. Therefore, strongly recommend to use sequence column feature in the way of setting function_column.sequence_col, to avoid dirty data caused by forgetting to specify sequence_col column.
    Contact Us

    Contact our sales team or business advisors to help your business.

    Technical Support

    Open a ticket if you're looking for further assistance. Our Ticket is 7x24 avaliable.

    7x24 Phone Support