Sequence Column

The unique model mainly targets scenarios that require a unique primary key. It can guarantee primary key uniqueness constraints, but due to the use of REPLACE Aggregate mode, for data imported in the same batch, the replacement order is not guaranteed. In this case, the specific data finally imported into the table cannot be determined, creating uncertainty.
To solve this problem, Doris supports sequence columns. Users specify the sequence column when importing, and under the same key column, the columns of the REPLACE Aggregate type will be replaced according to the value of the sequence column. Larger values?can replace smaller values, and vice versa cannot be replaced. This method gives the user control over the replacement order.
Applicable Scenario
The sequence column can only be used under the Unique data model.
Basic Principles
This is implemented through the addition of a hidden column __DORIS_SEQUENCE_COL__. The type of this column is specified by the user when creating the table, the specific value of this column is determined when importing, and this value is used to replace the column of REPLACE.
Creating table
When you are creating a Unique table, a hidden column __DORIS_SEQUENCE_COL__ will be automatically added according to the type specified by the user.
Import
When importing, the FE sets the value of the hidden column as the value of the order by expression (Broker load and Routine load), or the value of the function_column.sequence_col expression (stream load) during parsing. Value column will be replaced according to this value. The value of the hidden column __DORIS_SEQUENCE_COL__ can be set as one column in the data source or one column in the table structure.
Read
When the request contains Value column, it needs to read the __DORIS_SEQUENCE_COL__ column. This column is used as the basis for the replacement order of the REPLACE Aggregate function under the same Key column. Larger values can replace smaller values. Otherwise, they cannot be replaced.
Cumulative Compaction
The principle is the same as the reading process during Cumulative Compaction.
Base Compaction
The principle is the same as the reading process during Base Compaction.
Using Syntax
A property is added in the syntax when you are creating a table to identify the type of __DORIS_SEQUENCE_COL__
The main design aspect of the import syntax is to add a map from sequence columns to other columns, which will be introduced below in various import methods.
Creating table
When creating a Unique table, you can specify the sequence column type.
PROPERTIES (
    "function_column.sequence_type" = 'Date',
);
sequence_type is used to specify the type of the sequence column, which can be integer and time type.
Stream load
The writing method of Stream load is to add a map corresponding to source_sequence of the hidden column in the function_column.sequence_col field in the header, for example:
curl --location-trusted -u root -H "columns: k1,k2,source_sequence,v1,v2" -H "function_column.sequence_col: source_sequence" -T testData http://host:port/api/testDb/testTbl/_stream_load
Broker load
Setting the source_sequence field of the hidden column map at ORDER BY.
LOAD LABEL db1.label1
(
    DATA INFILE("hdfs://host:port/user/data/*/test.txt")
    INTO TABLE tbl1
    COLUMNS TERMINATED BY ","
    (k1,k2,source_sequence,v1,v2)
    ORDER BY source_sequence
)
WITH BROKER 'broker'
(
    "username"="user",
    "password"="pass"
)
PROPERTIES
(
    "timeout" = "3600"
);
Routine load
The mapping method is the same as above, as shown in the example below:
   CREATE ROUTINE LOAD example_db.test1 ON example_tbl 
    [WITH MERGE|APPEND|DELETE]
    COLUMNS(k1, k2, source_sequence, v1, v2),
    WHERE k1 > 100 and k2 like "%doris%"
    [ORDER BY source_sequence]
    PROPERTIES
    (
        "desired_concurrent_number"="3",
        "max_batch_interval" = "20",
        "max_batch_rows" = "300000",
        "max_batch_size" = "209715200",
        "strict_mode" = "false"
    )
    FROM KAFKA
    (
        "kafka_broker_lsequence_typeist" = "broker1:9092,broker2:9092,broker3:9092",
        "kafka_topic" = "my_topic",
        "kafka_partitions" = "0,1,2,3",
        "kafka_offsets" = "101,0,0,200"
    );
Activate support for sequence column
If you set function_column.sequence_type when creating a new table, the new table will support sequence column.
For a table that does not support sequence column, if you want to use this feature, you can use the following statement to enable it.
ALTER TABLE example_db.my_table ENABLE FEATURE "SEQUENCE_LOAD" WITH PROPERTIES ("function_column.sequence_type" = "Date")
To determine whether a table supports sequence column, you can set a session variable to display hidden columns SET show_hidden_columns=true, and then use desc tablename. It supports if the output has a __DORIS_SEQUENCE_COL__ column then it supports; if not, it does not support.
Example
Here we use Stream load as an example to show how to use
1. Create a table that supports sequence column.
Table structure is as follows:
MySQL > desc test_table;
+-------------+--------------+------+-------+---------+---------+
| Field       | Type         | Null | Key   | Default | Extra   |
+-------------+--------------+------+-------+---------+---------+
| user_id     | BIGINT       | No   | true  | NULL    |         |
| date        | DATE         | No   | true  | NULL    |         |
| group_id    | BIGINT       | No   | true  | NULL    |         |
| modify_date | DATE         | No   | false | NULL    | REPLACE |
| keyword     | VARCHAR(128) | No   | false | NULL    | REPLACE |
+-------------+--------------+------+-------+---------+---------+
2. Normally import data.
Import the following data:
1       2020-02-22      1       2020-02-22      a
1       2020-02-22      1       2020-02-22      b
1       2020-02-22      1       2020-03-05      c
1       2020-02-22      1       2020-02-26      d
1       2020-02-22      1       2020-02-22      e
1       2020-02-22      1       2020-02-22      b
Here we take Stream load as an example, map the sequence column to the modify_date column.
curl --location-trusted -u root: -H "function_column.sequence_col: modify_date" -T testData http://host:port/api/test/test_table/_stream_load
The result is:
MySQL > select * from test_table;
+---------+------------+----------+-------------+---------+
| user_id | date       | group_id | modify_date | keyword |
+---------+------------+----------+-------------+---------+
|       1 | 2020-02-22 |        1 | 2020-03-05  | c       |
+---------+------------+----------+-------------+---------+
In this import, because the value of sequence column (the value in modify_date) '2020-03-05' is the largest, so the keyword column finally retains c.
3. Guarantee of replacement order.
After the above steps are completed, import the following data:
1       2020-02-22      1       2020-02-22      a
1       2020-02-22      1       2020-02-23      b
Querying Data:
MySQL [test]> select * from test_table;
+---------+------------+----------+-------------+---------+
| user_id | date       | group_id | modify_date | keyword |
+---------+------------+----------+-------------+---------+
|       1 | 2020-02-22 |        1 | 2020-03-05  | c       |
+---------+------------+----------+-------------+---------+
Since the sequence column of the newly imported data is smaller than the value already in the table, it cannot be replaced.
Try importing the following data again:
1       2020-02-22      1       2020-02-22      a
1       2020-02-22      1       2020-03-23      w
Querying Data:
MySQL [test]> select * from test_table;
+---------+------------+----------+-------------+---------+
| user_id | date       | group_id | modify_date | keyword |
+---------+------------+----------+-------------+---------+
|       1 | 2020-02-22 |        1 | 2020-03-23  | w       |
+---------+------------+----------+-------------+---------+
At this point, you can replace the original data in the table.
FAQs
What happens if the sequence_col is not specified during import for a table that has specified function_column.sequence_type?
The import task will not execute the judgment logic related to the sequence column, causing the import result to not meet expectations. Therefore, strongly recommend to use sequence column feature in the way of setting function_column.sequence_col, to avoid dirty data caused by forgetting to specify sequence_col column.

Was this page helpful?

You can also Contact Sales or Submit a Ticket for help.

Yes

tencent cloud

Applicable Scenario

Basic Principles

Creating table

Import

Read

Cumulative Compaction

Base Compaction

Using Syntax

Creating table

Stream load

Broker load

Routine load

Activate support for sequence column

Example

FAQs

What happens if the sequence_col is not specified during import for a table that has specified function_column.sequence_type?

About Tencent Cloud

Help & Support

Resources

User Center

tencent cloud

Sign Up

Log in

Compute

Microservice

Data Migration

Database SaaS Tool

Data Security

Application Security

Big Data

Voice Technology

Internet of Things

Stream Services

Cloud Real-time Rendering

Cloud Resource Management

More

Edge Computing

Serverless

Relational Database

Networking

Business Security

Domains & Websites

Face Recognition

AI Platform Service

Middleware

Media On-Demand

Game Services

Management and Audit Tools

Container

Essential Storage Service

Enterprise Distributed DBMS

CDN and Acceleration

Security Services

Enterprise Applications

Tencent Big Model

Natural Language Processing

Communication

Media Process Services

Education Sevices

Developer Tools

Distributed cloud

Data Process and Analysis

NoSQL Database

Network Security

Cloud Security

Office Collaboration

Image Creation

Optical Character Recognition

Interactive Video Services

Media SDK

Medical Services

Monitor and Operation

Applicable Scenario

Basic Principles

Creating table

Import

Read

Cumulative Compaction

Base Compaction

Using Syntax

Creating table

Stream load

Broker load

Routine load

Activate support for sequence column

Example

FAQs

What happens if the sequence_col is not specified during import for a table that has specified function_column.sequence_type?

About Tencent Cloud

Help & Support

Resources

User Center