Tencent Cloud WeData
Iceberg Data Source
Last updated: 2024-11-01 17:00:28
Data Integration provides real-time write capabilities for Iceberg. This article describes the capabilities currently supported for real-time data synchronization to Iceberg.

Supported Versions

Currently, DataInLong supports real-time writing to single Iceberg tables and to whole databases. To use real-time synchronization, observe the following version requirements:

| Data Source Type | Supported Versions |
| --- | --- |
| Iceberg | 0.13.1 and later |

Use Limits

- Upsert writing is only supported for Iceberg V2 tables.
- After writing to Iceberg, to improve query performance, downstream systems should generally merge small files through a scheduled Spark Action, and the Checkpoint interval should be configured appropriately.
- Iceberg column type changes are only supported for int to long, float to double, and increasing the precision of a Decimal type while keeping the same scale.
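The small-file merging and permitted column type changes above can be sketched with Iceberg's built-in Spark SQL procedures and DDL. This is only an illustration; the catalog, database, table, and column names below are placeholders:

```sql
-- Scheduled small-file compaction via Iceberg's Spark procedure.
-- `my_catalog` and `db.target_table` are placeholder names.
CALL my_catalog.system.rewrite_data_files(
  table => 'db.target_table',
  options => map('target-file-size-bytes', '134217728')  -- 128 MB target files
);

-- Examples of the permitted column type changes (Spark SQL):
ALTER TABLE my_catalog.db.target_table ALTER COLUMN c_int TYPE bigint;          -- int -> long
ALTER TABLE my_catalog.db.target_table ALTER COLUMN c_float TYPE double;        -- float -> double
ALTER TABLE my_catalog.db.target_table ALTER COLUMN c_dec TYPE decimal(20, 2);  -- wider precision, same scale
```

A compaction job like this is typically run on a schedule (for example, after every few checkpoints' worth of commits) rather than after every write.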

Whole Database Writing Configuration

Data Target Settings




Data Destination: Select the target data source to be synchronized.

Write Mode:
- Upsert: Update write. If the primary key does not conflict, a new row is inserted; if the primary key conflicts, the existing row is updated. Suitable for scenarios where the target table has a primary key and must be updated in real time based on the source data. Incurs some performance overhead. Upsert write mode only supports Iceberg V2 tables and requires a unique key.
- Append: Append write. Regardless of whether there is a primary key, data is appended as new rows; whether primary key conflicts occur depends on the target. Suitable for scenarios with no primary key where duplicate data is acceptable. No performance loss.
- Full Append + Incremental Upsert: Automatically switches the write method based on the synchronization phase of the source data. The full stage uses Append writes to improve performance, while the incremental stage uses Upsert writes for real-time updates.

Database/Table Matching Policy: Name matching rules for database and table objects in Iceberg:
- Default: the same name as the source database/source table.
- Custom: supports combining built-in parameters and strings to generate target database and table names.

Note: For example, if the source table name is table1 and the mapping rule is ${table_name_di_src}_inlong, the data from table1 will be mapped to table1_inlong.

The system matches the target database/table according to these rules:
- If the matched database/table does not exist in the Iceberg target, it is created automatically.
- If the matched database/table already exists in the Iceberg target, it is not recreated; the existing database/table is used by default.

Advanced Settings: Configure parameters according to business needs.

Single Table Writing Node Configuration

1. On the DataInLong page, click Real-time synchronization in the left navigation bar.
2. On the real-time synchronization page, select Single Table Synchronization at the top to create a new task (in either form or canvas mode) and enter the configuration page.



Data Destination: The Iceberg data source to write to.

Database: Supports selecting or manually entering the name of the database to write to.
- By default, the database bound to the data source is used. Other databases must be entered manually.
- If the data source network is not connected and database information cannot be fetched directly, you can enter the database name manually. Data synchronization can still be performed once the Data Integration network is connected.

Table: Supports selecting or manually entering the name of the table to write to.
- If the data source network is not connected and table information cannot be fetched directly, you can enter the table name manually. Data synchronization can still be performed once the Data Integration network is connected.

Create Target Table with One Click: When the source is MySQL, TDSQL-C MySQL, TDSQL MySQL, Oracle, PostgreSQL, OceanBase, or Dameng, the Iceberg target table can be created from the source table structure with one click.

Write Mode:
- Upsert: Update and insert. If the primary key does not conflict, a new row is inserted; if the primary key conflicts, the existing row is updated. Suitable for scenarios where the target table has a primary key and must be updated in real time based on the source data. Incurs some performance overhead. Only supports Iceberg V2 tables and requires a unique key.
- Append: Append write. Regardless of whether there is a primary key, data is appended as new rows; whether primary key conflicts occur depends on the target. Suitable for scenarios with no primary key where duplicate data is acceptable. No performance loss.

Unique Key: In Upsert write mode, a unique key must be set to ensure data ordering; multiple columns can be selected.

Advanced Settings: Configure parameters according to business needs.
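For reference, the Upsert mode described above corresponds to an Iceberg table in V2 format with upsert writes enabled. The following is a minimal Flink SQL sketch; all names are placeholders, and the exact table options depend on your catalog configuration:

```sql
-- Placeholder names; Upsert requires an Iceberg V2 table.
CREATE TABLE my_catalog.db.target_table (
  id BIGINT,
  name STRING,
  updated_at TIMESTAMP(6),
  PRIMARY KEY (id) NOT ENFORCED    -- serves as the unique key for upserts
) WITH (
  'format-version' = '2',          -- Iceberg V2 table format
  'write.upsert.enabled' = 'true'  -- enable upsert writes
);
```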

Log Collection Write Node Configuration

Data Destination: Select an available Iceberg data source in the current project.

Database/Table: Select the corresponding database and table from this data source.

Write Mode: Iceberg supports two write modes:
- Append: Append write.
- Upsert: Write messages in Upsert mode. Once set, each message is processed only once by the consumer, ensuring Exactly-Once semantics.

Unique Key: In Upsert write mode, a unique key must be set to ensure data ordering; multiple columns can be selected. In Append mode, setting a unique key is not required.

Advanced Settings (optional): Configure parameters according to business needs.

Supported Data Type Conversions for Writing

| Internal Type | Iceberg Type |
| --- | --- |
| CHAR | STRING |
| VARCHAR | STRING |
| STRING | STRING |
| BOOLEAN | BOOLEAN |
| BINARY | FIXED(L) |
| VARBINARY | BINARY |
| DECIMAL | DECIMAL(P,S) |
| TINYINT | INT |
| SMALLINT | INT |
| INTEGER | INT |
| BIGINT | LONG |
| FLOAT | FLOAT |
| DOUBLE | DOUBLE |
| DATE | DATE |
| TIME | TIME |
| TIMESTAMP | TIMESTAMP |
| TIMESTAMP_LTZ | TIMESTAMPTZ |
| INTERVAL | - |
| ARRAY | LIST |
| MULTISET | MAP |
| MAP | MAP |
| ROW | STRUCT |
| RAW | - |
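To illustrate the mapping above, here is a hypothetical Flink source schema, with the Iceberg column type each field would map to noted in comments:

```sql
-- Hypothetical table and column names; comments show the resulting
-- Iceberg types per the mapping table above.
CREATE TABLE src_orders (
  order_id   BIGINT,          -- Iceberg: LONG
  buyer_name VARCHAR(64),     -- Iceberg: STRING
  amount     DECIMAL(10, 2),  -- Iceberg: DECIMAL(10,2)
  is_paid    BOOLEAN,         -- Iceberg: BOOLEAN
  created_at TIMESTAMP(3)     -- Iceberg: TIMESTAMP
) WITH (
  'connector' = 'datagen'     -- placeholder connector for illustration
);
```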

FAQs

Submission failed due to excessively long table field length




Solution:
1. First, back up the TABLE_PARAMS table:
```shell
mysqldump -hxxx -uroot -pxxx hivemetastore TABLE_PARAMS > table_params.sql
```
2. Change the column length to 40000:
```sql
alter table TABLE_PARAMS MODIFY PARAM_VALUE VARCHAR(40000);
```
Note: When the character set is UTF-8, a length of 40000 is not supported. In that case, change the column to the text type or reduce the length to 20000.

