tencent cloud

All product documents
Tencent Cloud TCHouse-D
Sequence Column
Last updated: 2024-06-27 11:08:21
Sequence Column
Last updated: 2024-06-27 11:08:21
The unique model mainly targets scenarios that require a unique primary key. It can guarantee primary key uniqueness constraints, but due to the use of REPLACE Aggregate mode, for data imported in the same batch, the replacement order is not guaranteed. In this case, the specific data finally imported into the table cannot be determined, creating uncertainty.
To solve this problem, Doris supports sequence columns. Users specify the sequence column when importing, and under the same key column, the columns of the REPLACE Aggregate type will be replaced according to the value of the sequence column. Larger values?can replace smaller values, and vice versa cannot be replaced. This method gives the user control over the replacement order.

Applicable Scenario

The sequence column can only be used under the Unique data model.

Basic Principles

This is implemented through the addition of a hidden column __DORIS_SEQUENCE_COL__. The type of this column is specified by the user when creating the table, the specific value of this column is determined when importing, and this value is used to replace the column of REPLACE.

Creating table

When you are creating a Unique table, a hidden column __DORIS_SEQUENCE_COL__ will be automatically added according to the type specified by the user.

Import

When importing, the FE sets the value of the hidden column as the value of the order by expression (Broker load and Routine load), or the value of the function_column.sequence_col expression (stream load) during parsing. Value column will be replaced according to this value. The value of the hidden column __DORIS_SEQUENCE_COL__ can be set as one column in the data source or one column in the table structure.

Read

When the request contains Value column, it needs to read the __DORIS_SEQUENCE_COL__ column. This column is used as the basis for the replacement order of the REPLACE Aggregate function under the same Key column. Larger values can replace smaller values. Otherwise, they cannot be replaced.

Cumulative Compaction

The principle is the same as the reading process during Cumulative Compaction.

Base Compaction

The principle is the same as the reading process during Base Compaction.

Using Syntax

A property is added in the syntax when you are creating a table to identify the type of __DORIS_SEQUENCE_COL__ The main design aspect of the import syntax is to add a map from sequence columns to other columns, which will be introduced below in various import methods.

Creating table

When creating a Unique table, you can specify the sequence column type.
PROPERTIES (
"function_column.sequence_type" = 'Date',
);
sequence_type is used to specify the type of the sequence column, which can be integer and time type.

Stream load

The writing method of Stream load is to add a map corresponding to source_sequence of the hidden column in the function_column.sequence_col field in the header, for example:
curl --location-trusted -u root -H "columns: k1,k2,source_sequence,v1,v2" -H "function_column.sequence_col: source_sequence" -T testData http://host:port/api/testDb/testTbl/_stream_load

Broker load

Setting the source_sequence field of the hidden column map at ORDER BY.
LOAD LABEL db1.label1
(
DATA INFILE("hdfs://host:port/user/data/*/test.txt")
INTO TABLE tbl1
COLUMNS TERMINATED BY ","
(k1,k2,source_sequence,v1,v2)
ORDER BY source_sequence
)
WITH BROKER 'broker'
(
"username"="user",
"password"="pass"
)
PROPERTIES
(
"timeout" = "3600"
);

Routine load

The mapping method is the same as above, as shown in the example below:
CREATE ROUTINE LOAD example_db.test1 ON example_tbl
[WITH MERGE|APPEND|DELETE]
COLUMNS(k1, k2, source_sequence, v1, v2),
WHERE k1 > 100 and k2 like "%doris%"
[ORDER BY source_sequence]
PROPERTIES
(
"desired_concurrent_number"="3",
"max_batch_interval" = "20",
"max_batch_rows" = "300000",
"max_batch_size" = "209715200",
"strict_mode" = "false"
)
FROM KAFKA
(
"kafka_broker_lsequence_typeist" = "broker1:9092,broker2:9092,broker3:9092",
"kafka_topic" = "my_topic",
"kafka_partitions" = "0,1,2,3",
"kafka_offsets" = "101,0,0,200"
);

Activate support for sequence column

If you set function_column.sequence_type when creating a new table, the new table will support sequence column. For a table that does not support sequence column, if you want to use this feature, you can use the following statement to enable it.
ALTER TABLE example_db.my_table ENABLE FEATURE "SEQUENCE_LOAD" WITH PROPERTIES ("function_column.sequence_type" = "Date")
To determine whether a table supports sequence column, you can set a session variable to display hidden columns SET show_hidden_columns=true, and then use desc tablename. It supports if the output has a __DORIS_SEQUENCE_COL__ column then it supports; if not, it does not support.

Example

Here we use Stream load as an example to show how to use
1. Create a table that supports sequence column. Table structure is as follows:
MySQL > desc test_table;
+-------------+--------------+------+-------+---------+---------+
| Field | Type | Null | Key | Default | Extra |
+-------------+--------------+------+-------+---------+---------+
| user_id | BIGINT | No | true | NULL | |
| date | DATE | No | true | NULL | |
| group_id | BIGINT | No | true | NULL | |
| modify_date | DATE | No | false | NULL | REPLACE |
| keyword | VARCHAR(128) | No | false | NULL | REPLACE |
+-------------+--------------+------+-------+---------+---------+
2. Normally import data. Import the following data:
1 2020-02-22 1 2020-02-22 a
1 2020-02-22 1 2020-02-22 b
1 2020-02-22 1 2020-03-05 c
1 2020-02-22 1 2020-02-26 d
1 2020-02-22 1 2020-02-22 e
1 2020-02-22 1 2020-02-22 b
Here we take Stream load as an example, map the sequence column to the modify_date column.
curl --location-trusted -u root: -H "function_column.sequence_col: modify_date" -T testData http://host:port/api/test/test_table/_stream_load
The result is:
MySQL > select * from test_table;
+---------+------------+----------+-------------+---------+
| user_id | date | group_id | modify_date | keyword |
+---------+------------+----------+-------------+---------+
| 1 | 2020-02-22 | 1 | 2020-03-05 | c |
+---------+------------+----------+-------------+---------+
In this import, because the value of sequence column (the value in modify_date) '2020-03-05' is the largest, so the keyword column finally retains c.
3. Guarantee of replacement order. After the above steps are completed, import the following data:
1 2020-02-22 1 2020-02-22 a
1 2020-02-22 1 2020-02-23 b
Querying Data:
MySQL [test]> select * from test_table;
+---------+------------+----------+-------------+---------+
| user_id | date | group_id | modify_date | keyword |
+---------+------------+----------+-------------+---------+
| 1 | 2020-02-22 | 1 | 2020-03-05 | c |
+---------+------------+----------+-------------+---------+
Since the sequence column of the newly imported data is smaller than the value already in the table, it cannot be replaced. Try importing the following data again:
1 2020-02-22 1 2020-02-22 a
1 2020-02-22 1 2020-03-23 w
Querying Data:
MySQL [test]> select * from test_table;
+---------+------------+----------+-------------+---------+
| user_id | date | group_id | modify_date | keyword |
+---------+------------+----------+-------------+---------+
| 1 | 2020-02-22 | 1 | 2020-03-23 | w |
+---------+------------+----------+-------------+---------+
At this point, you can replace the original data in the table.

FAQs

What happens if the sequence_col is not specified during import for a table that has specified function_column.sequence_type?

The import task will not execute the judgment logic related to the sequence column, causing the import result to not meet expectations. Therefore, strongly recommend to use sequence column feature in the way of setting function_column.sequence_col, to avoid dirty data caused by forgetting to specify sequence_col column.
Was this page helpful?
You can also Contact Sales or Submit a Ticket for help.
Yes
No

Feedback

Contact Us

Contact our sales team or business advisors to help your business.

Technical Support

Open a ticket if you're looking for further assistance. Our Ticket is 7x24 available.

7x24 Phone Support
Hong Kong, China
+852 800 906 020 (Toll Free)
United States
+1 844 606 0804 (Toll Free)
United Kingdom
+44 808 196 4551 (Toll Free)
Canada
+1 888 605 7930 (Toll Free)
Australia
+61 1300 986 386 (Toll Free)
EdgeOne hotline
+852 300 80699
More local hotlines coming soon