Tencent Cloud WeData
Iceberg Data Source
Last updated: 2024-11-01 17:00:28
Data Integration provides real-time write capabilities for Iceberg. This article describes the capabilities currently supported for real-time data synchronization to Iceberg.

Supported Versions

Currently, DataInLong supports real-time writing to single Iceberg tables and to whole databases. To use real-time synchronization, observe the following version requirements:

| Data Source Type | Supported Versions |
| --- | --- |
| Iceberg | 0.13.1 and later |

Use Limits

- Upsert writing is only supported for Iceberg V2 tables.
- After writing to Iceberg, to improve query performance, downstream systems should generally merge small files through a scheduled Spark Action, and the Checkpoint interval should be configured appropriately.
- Iceberg column type changes are only supported for int to long, float to double, and increasing the precision of a Decimal type while keeping the same scale.
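The small-file merging and permitted column type changes above can be sketched with Iceberg's built-in Spark SQL procedures and DDL. This is only an illustration; the catalog, database, table, and column names below are placeholders:

```sql
-- Scheduled small-file compaction via Iceberg's Spark procedure.
-- `my_catalog` and `db.target_table` are placeholder names.
CALL my_catalog.system.rewrite_data_files(
  table => 'db.target_table',
  options => map('target-file-size-bytes', '134217728')  -- 128 MB target files
);

-- Examples of the permitted column type changes (Spark SQL):
ALTER TABLE my_catalog.db.target_table ALTER COLUMN c_int TYPE bigint;          -- int -> long
ALTER TABLE my_catalog.db.target_table ALTER COLUMN c_float TYPE double;        -- float -> double
ALTER TABLE my_catalog.db.target_table ALTER COLUMN c_dec TYPE decimal(20, 2);  -- wider precision, same scale
```

A compaction job like this is typically run on a schedule (for example, after every few checkpoints' worth of commits) rather than after every write.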

Whole Database Writing Configuration

Data Target Settings




Data Destination: Select the target data source to be synchronized.

Write Mode:
- Upsert: Update write. If the primary key does not conflict, a new row is inserted; if the primary key conflicts, the existing row is updated. Suitable for scenarios where the target table has a primary key and must be updated in real time based on the source data. Incurs some performance overhead. Upsert write mode only supports Iceberg V2 tables and requires a unique key.
- Append: Append write. Regardless of whether there is a primary key, data is appended as new rows; whether primary key conflicts occur depends on the target. Suitable for scenarios with no primary key where duplicate data is acceptable. No performance loss.
- Full Append + Incremental Upsert: Automatically switches the write method based on the synchronization phase of the source data. The full stage uses Append writes to improve performance, while the incremental stage uses Upsert writes for real-time updates.

Database/Table Matching Policy: Name matching rules for database and table objects in Iceberg:
- Default: the same name as the source database/source table.
- Custom: supports combining built-in parameters and strings to generate target database and table names.

Note: For example, if the source table name is table1 and the mapping rule is ${table_name_di_src}_inlong, the data from table1 will be mapped to table1_inlong.

The system matches the target database/table according to these rules:
- If the matched database/table does not exist in the Iceberg target, it is created automatically.
- If the matched database/table already exists in the Iceberg target, it is not recreated; the existing database/table is used by default.

Advanced Settings: Configure parameters according to business needs.

Single Table Writing Node Configuration

1. On the DataInLong page, click Real-time synchronization in the left navigation bar.
2. On the real-time synchronization page, select Single Table Synchronization at the top to create a new task (in either form or canvas mode) and enter the configuration page.



Data Destination: The Iceberg data source to write to.

Database: Supports selecting or manually entering the name of the database to write to.
- By default, the database bound to the data source is used. Other databases must be entered manually.
- If the data source network is not connected and database information cannot be fetched directly, you can enter the database name manually. Data synchronization can still be performed once the Data Integration network is connected.

Table: Supports selecting or manually entering the name of the table to write to.
- If the data source network is not connected and table information cannot be fetched directly, you can enter the table name manually. Data synchronization can still be performed once the Data Integration network is connected.

Create Target Table with One Click: When the source is MySQL, TDSQL-C MySQL, TDSQL MySQL, Oracle, PostgreSQL, OceanBase, or Dameng, the Iceberg target table can be created from the source table structure with one click.

Write Mode:
- Upsert: Update and insert. If the primary key does not conflict, a new row is inserted; if the primary key conflicts, the existing row is updated. Suitable for scenarios where the target table has a primary key and must be updated in real time based on the source data. Incurs some performance overhead. Only supports Iceberg V2 tables and requires a unique key.
- Append: Append write. Regardless of whether there is a primary key, data is appended as new rows; whether primary key conflicts occur depends on the target. Suitable for scenarios with no primary key where duplicate data is acceptable. No performance loss.

Unique Key: In Upsert write mode, a unique key must be set to ensure data ordering; multiple columns can be selected.

Advanced Settings: Configure parameters according to business needs.
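For reference, the Upsert mode described above corresponds to an Iceberg table in V2 format with upsert writes enabled. The following is a minimal Flink SQL sketch; all names are placeholders, and the exact table options depend on your catalog configuration:

```sql
-- Placeholder names; Upsert requires an Iceberg V2 table.
CREATE TABLE my_catalog.db.target_table (
  id BIGINT,
  name STRING,
  updated_at TIMESTAMP(6),
  PRIMARY KEY (id) NOT ENFORCED    -- serves as the unique key for upserts
) WITH (
  'format-version' = '2',          -- Iceberg V2 table format
  'write.upsert.enabled' = 'true'  -- enable upsert writes
);
```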

Log Collection Write Node Configuration

Data Destination: Select an available Iceberg data source in the current project.

Database/Table: Select the corresponding database and table from this data source.

Write Mode: Iceberg supports two write modes:
- Append: Append write.
- Upsert: Write messages in Upsert mode. Once set, each message is processed only once by the consumer, ensuring Exactly-Once semantics.

Unique Key: In Upsert write mode, a unique key must be set to ensure data ordering; multiple columns can be selected. In Append mode, setting a unique key is not required.

Advanced Settings (optional): Configure parameters according to business needs.

Supported Data Type Conversions for Writing

| Internal Type | Iceberg Type |
| --- | --- |
| CHAR | STRING |
| VARCHAR | STRING |
| STRING | STRING |
| BOOLEAN | BOOLEAN |
| BINARY | FIXED(L) |
| VARBINARY | BINARY |
| DECIMAL | DECIMAL(P,S) |
| TINYINT | INT |
| SMALLINT | INT |
| INTEGER | INT |
| BIGINT | LONG |
| FLOAT | FLOAT |
| DOUBLE | DOUBLE |
| DATE | DATE |
| TIME | TIME |
| TIMESTAMP | TIMESTAMP |
| TIMESTAMP_LTZ | TIMESTAMPTZ |
| INTERVAL | - |
| ARRAY | LIST |
| MULTISET | MAP |
| MAP | MAP |
| ROW | STRUCT |
| RAW | - |
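To illustrate the mapping above, here is a hypothetical Flink source schema, with the Iceberg column type each field would map to noted in comments:

```sql
-- Hypothetical table and column names; comments show the resulting
-- Iceberg types per the mapping table above.
CREATE TABLE src_orders (
  order_id   BIGINT,          -- Iceberg: LONG
  buyer_name VARCHAR(64),     -- Iceberg: STRING
  amount     DECIMAL(10, 2),  -- Iceberg: DECIMAL(10,2)
  is_paid    BOOLEAN,         -- Iceberg: BOOLEAN
  created_at TIMESTAMP(3)     -- Iceberg: TIMESTAMP
) WITH (
  'connector' = 'datagen'     -- placeholder connector for illustration
);
```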

FAQs

Submission failed due to excessively long table field length




Solution:
1. First, back up the TABLE_PARAMS table:
```shell
mysqldump -hxxx -uroot -pxxx hivemetastore TABLE_PARAMS > table_params.sql
```
2. Change the column length to 40000:
```sql
alter table TABLE_PARAMS MODIFY PARAM_VALUE VARCHAR(40000);
```
Note: When the character set is UTF-8, a length of 40000 is not supported. In that case, change the column to the text type or reduce the length to 20000.

