Tencent Cloud


Doris/TCHouse-D Data Source

Last updated: 2024-11-01 17:48:13
    The TCHouse-D data source is configured in the same way as the Doris data source. The Doris data source is used as the example in the explanation below:

    Supported Versions

    Supports Doris versions 0.x, 1.1.x, 1.2.x, and 2.x.

    Use Limits

    1. Doris writes use the Stream Load HTTP interface, so make sure the IP and port of the FE or BE in the data source are filled in correctly.
    2. Because Stream Load works by having a BE initiate the import and distribute the data, the recommended import volume per task is between 1 GB and 10 GB. The default maximum Stream Load import size is 10 GB.
    To import files larger than 10 GB, modify the BE configuration item streaming_load_max_mb.
    For example, if the file to be imported is 15 GB, set streaming_load_max_mb to 16000.
    3. The default Stream Load timeout is 600 seconds. Given the current maximum import speed of Doris, the default task timeout must be increased for files larger than about 3 GB.
    Timeout of an import task = import data volume ÷ 10 MB/s (measure the actual average import speed on your own cluster).
    For example, importing a 10 GB file: timeout = 10 GB ÷ 10 MB/s = 1000 s.
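    The sizing rule above can be sketched as a quick calculation. This is an illustrative helper only (not part of any Doris API); the 10 MB/s figure is the rule-of-thumb speed from the text, and your own cluster's measured speed should be used instead:

```python
def stream_load_timeout_s(file_size_mb: float, avg_speed_mb_s: float = 10.0) -> float:
    """Estimate a Stream Load task timeout: import data volume divided by
    the average import speed (measure the real speed on your own cluster)."""
    return file_size_mb / avg_speed_mb_s

# A 10 GB (~10000 MB) file: 10000 MB / 10 MB/s = 1000 s, well above the
# 600 s default, so the task timeout must be raised.
print(stream_load_timeout_s(10000))  # 1000.0
```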

    Doris Offline Single Table Read Node Configuration

    
    
    
    Parameter
    Description
    Data Source
    Available Doris data source to be synchronized.
    Database
    Supports selecting or manually entering the database name to read from.
    By default, the database bound to the data source is used. Other databases must be entered manually.
    If the data source network is not connected and the database information cannot be fetched directly, you can enter the database name manually. Data synchronization can still be performed as long as the Data Integration network is connected.
    Table
    Supports selecting or manually entering the table name to be read.
    Split Key
    Specify the field for data sharding. After specifying, concurrent tasks will be launched for data synchronization. You can use a column in the source data table as the partition key. It is recommended to use the primary key or indexed column as the partition key.
    Filter Conditions (Optional)
    In actual business scenarios, the data of the current day is usually selected for synchronization by setting the WHERE condition to gmt_create > $bizdate. A WHERE condition enables efficient incremental synchronization. If no WHERE statement is specified (including omitting the key or value of WHERE), full data is synchronized.
    Advanced Settings (Optional)
    You can configure parameters according to business needs.
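    As an illustration only (the substitution of scheduling parameters such as $bizdate is performed by the platform, not by user code), the incremental filter above behaves like a simple template replacement:

```python
def render_filter(template: str, bizdate: str) -> str:
    """Substitute the $bizdate scheduling parameter into a WHERE condition."""
    return template.replace("$bizdate", bizdate)

# Sync only rows created on the business date:
print(render_filter("gmt_create > $bizdate", "20240101"))  # gmt_create > 20240101
```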

    Doris Offline Single Table Write Node Configuration

    
    
    
    Parameter
    Description
    Data Destination
    Doris data source to be written into.
    Database
    Supports selecting or manually entering the database name to write to.
    By default, the database bound to the data source is used as the default database. Other databases need to be manually entered.
    If the data source network is not connected and the database information cannot be fetched directly, you can manually enter the database name. Data synchronization can still be performed when the Data Integration network is connected.
    Table
    Supports selecting or manually entering the table name to write to.
    If the data source network is not connected and the table information cannot be fetched directly, you can manually enter the table name. Data synchronization can still be performed when the Data Integration network is connected.
    Table Overwriting
    When enabled, Doris performs an atomic overwrite at the table level: before writing, a new table with the same schema is created using the CREATE TABLE LIKE statement, the new data is imported into the new table, and the old table is atomically replaced via swap.
    Maximum Number of Rows to Submit Each Time
    The number of records submitted in one batch.
    Maximum Bytes per Submission
    The maximum data volume submitted in one batch.
    Line Separator (Optional)
    The line delimiter for Doris write operations; default is '\\n'. Supports manual input. It must be consistent with the line delimiter of the created Doris table, otherwise the written data cannot be queried correctly.
    Pre-Executed SQL
    The SQL statement executed before the synchronization task. Fill in the correct SQL syntax according to the data source type, such as clearing the old data in the table before execution (truncate table tablename).
    Post-Executed SQL
    The SQL statement executed after the synchronization task. Fill in the correct SQL syntax according to the data source type, such as adding a timestamp (alter table tablename add colname timestamp DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP).
    Advanced Settings
    You can configure parameters according to business needs.
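    The two batching parameters above (maximum rows and maximum bytes per submission) bound each Stream Load batch. A minimal sketch of the flush rule, using a hypothetical helper that is not taken from the product:

```python
def should_flush(buffered_rows: int, buffered_bytes: int,
                 max_rows: int, max_bytes: int) -> bool:
    """A batch is submitted as soon as either the row limit or the
    byte limit for a single submission is reached."""
    return buffered_rows >= max_rows or buffered_bytes >= max_bytes

# 500 buffered rows hit a 500-row limit, so the batch is submitted:
print(should_flush(500, 1_048_576, 500, 10_485_760))  # True
```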

    Data Type Conversion Support

    Read

    Doris Data Type
    Internal Type
    TINYINT, SMALLINT, INT, BIGINT
    Long
    FLOAT, DOUBLE, DECIMAL
    Double
    VARCHAR, CHAR, ARRAY, STRUCT, STRING
    String
    DATE, DATETIME
    Date
    BOOLEAN
    Boolean

    Write

    Internal Type
    Doris Data Type
    Long
    TINYINT, SMALLINT, INT, BIGINT
    Double
    DOUBLE, FLOAT, DECIMAL
    String
    STRING, VARCHAR, CHAR, ARRAY, STRUCT
    Date
    DATETIME, DATE
    Boolean
    BOOLEAN
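    The read-side mapping above can be expressed as a lookup table. This is a sketch for reference only (types normalized to upper case), not the product's implementation:

```python
# Doris type -> internal type, following the read mapping above.
DORIS_TO_INTERNAL = {
    **dict.fromkeys(["TINYINT", "SMALLINT", "INT", "BIGINT"], "Long"),
    **dict.fromkeys(["FLOAT", "DOUBLE", "DECIMAL"], "Double"),
    **dict.fromkeys(["VARCHAR", "CHAR", "ARRAY", "STRUCT", "STRING"], "String"),
    **dict.fromkeys(["DATE", "DATETIME"], "Date"),
    "BOOLEAN": "Boolean",
}

print(DORIS_TO_INTERNAL["BIGINT"])  # Long
```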

    FAQs

    1. Partition not found error: "no partition for this tuple"

    Cause: The corresponding partition does not exist in Doris.
    Solution: When partitioning by time, it is recommended to enable dynamic partitioning.
    Example: table 'tbl1' has a partition column 'k1' of type DATE. The following dynamic partition rule partitions by day, keeps only the partitions of the last 7 days, and pre-creates partitions for the next 3 days.
    CREATE TABLE tbl1
    (
    k1 DATE,
    ...
    )
    PARTITION BY RANGE(k1) ()
    DISTRIBUTED BY HASH(k1)
    PROPERTIES
    (
    "dynamic_partition.enable" = "true",
    "dynamic_partition.time_unit" = "DAY",
    "dynamic_partition.start" = "-7",
    "dynamic_partition.end" = "3",
    "dynamic_partition.prefix" = "p",
    "dynamic_partition.buckets" = "32"
    );
    Assuming the current date is 2020-05-29, according to the above rule, 'tbl1' will have the following partitions:
    p20200529: ["2020-05-29", "2020-05-30")
    p20200530: ["2020-05-30", "2020-05-31")
    p20200531: ["2020-05-31", "2020-06-01")
    p20200601: ["2020-06-01", "2020-06-02")
    On the next day, 2020-05-30, a new partition 'p20200602' will be created: ["2020-06-02", "2020-06-03")
    On 2020-06-06, because dynamic_partition.start is set to -7, the partition from 7 days earlier is deleted, i.e., partition 'p20200529' is dropped.
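    The partition layout above can be reproduced with a short calculation. This is an illustration of the rule only, not Doris code; partition names follow dynamic_partition.prefix plus the day:

```python
from datetime import date, timedelta

def partitions_on(day: date, end: int = 3, prefix: str = "p"):
    """Day-level partitions that exist right after table creation:
    one for `day` itself plus `end` pre-created future days.
    Each partition covers the half-open range [day, day + 1)."""
    out = []
    for offset in range(end + 1):
        lo = day + timedelta(days=offset)
        hi = lo + timedelta(days=1)
        out.append((f"{prefix}{lo:%Y%m%d}", f'["{lo}", "{hi}")'))
    return out

for name, rng in partitions_on(date(2020, 5, 29)):
    print(name, rng)  # p20200529 ["2020-05-29", "2020-05-30") ... p20200601
```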

    2. Data synchronization batch too large results in: "The size of this batch exceed the max size of JSON type data"

    Cause: The size of a single submitted batch is too large.
    Solution: Reduce the maximum number of rows/bytes per submission, or ignore the JSON data size check.

    3. Import frequency too fast results in: "tablet writer write failed, err=-235"

    Cause: The import frequency is too fast, causing the tablet's version count to exceed the limit (default 500, controlled by the BE parameter max_tablet_version_num).
    Solution:
    1. Locate the tablet_id reported in the error:
    SHOW TABLET 28750963;
    2. Execute the DetailCmd command returned by the previous statement:
    SHOW PROC '/dbs/40637/16934967/partitions/28750944/16934968/28750963';
    
    
    
    3. The versionCount column in the result is the number of versions. If a replica has too many versions, reduce the import frequency or stop importing. If the version count does not drop after imports stop, check the be.INFO log on the corresponding BE node (search for the tablet id and the keyword "compaction") to confirm that compaction is running normally.
    4. Increasing max_tablet_version_num or tuning the tablet's compaction improves compaction efficiency and reduces the number of versions.

    4. Temporary partition header exceeds the length limit: "Bad Message 431"

    Cause: The temporary_partition header is too long.
    Solution: Reduce the number of temporary partitions.

    5. When synchronizing from DLC to Doris, Doris reports that a field exceeds the schema length

    Problem information:
    The uuid column is defined with length 200, but a uuid in the synchronized records has length 232:
    Reason: column_name[uuid], the length of input is too long than schema. first 32 bytes of input str: [0000000000000BB4E595527BE******] schema length: 200; actual length: 232; . src line [];
    Cause:
    When calculating string length, the Spark engine of DLC counts each Chinese character as one character, whereas Doris measures length in UTF-8 bytes, where one Chinese character occupies 3 bytes. This inconsistency can cause strings close to the length limit to fail on the Doris side.
    Solution:
    Fix the dirty data at the DLC source end, or increase the length of the uuid column in the Doris table.
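    The length mismatch can be reproduced directly: Python's len() counts characters the way the Spark engine does, while encoding to UTF-8 gives the byte length Doris checks against:

```python
s = "订单数据"  # four Chinese characters
char_len = len(s)                   # Spark-style character count: 4
byte_len = len(s.encode("utf-8"))   # Doris-style UTF-8 byte count: 12 (3 per character)
print(char_len, byte_len)  # 4 12
```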
    