Iceberg Data Source

Last updated: 2024-11-01 17:00:28
    Data Integration provides real-time writing capabilities for Iceberg. This article describes the capabilities currently supported for real-time data synchronization to Iceberg.

    Supported Versions

    Currently, DataInLong supports real-time writing to single Iceberg tables and to whole databases. Real-time synchronization requires the following version:

    Data Source Type    Version
    Iceberg             0.13.1+

    Use Limits

    Upsert writing is only supported for Iceberg V2 tables.
    To improve query performance after data is written to Iceberg, downstream jobs generally need to merge small files with a scheduled Spark Action, and the Checkpoint interval should be set to a reasonable value.
    Iceberg column type changes are limited to int to long, float to double, and increasing the precision of a Decimal type while keeping the same scale.
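
    As a rough illustration of the column type change limit above, the following Python sketch checks a proposed change against the three allowed promotions. The helper name is_supported_type_change and the lowercase type spelling are assumptions made for this example; they are not part of DataInLong or Iceberg.

```python
import re

# Hypothetical helper for illustration only: it encodes the three promotions
# listed in the use limits (int -> long, float -> double, wider decimal).
_DECIMAL = re.compile(r"^decimal\((\d+),\s*(\d+)\)$")


def is_supported_type_change(old: str, new: str) -> bool:
    """Return True if changing a column from `old` to `new` is an allowed promotion."""
    old, new = old.strip().lower(), new.strip().lower()
    if (old, new) in {("int", "long"), ("float", "double")}:
        return True
    m_old, m_new = _DECIMAL.match(old), _DECIMAL.match(new)
    if m_old and m_new:
        old_precision, old_scale = map(int, m_old.groups())
        new_precision, new_scale = map(int, m_new.groups())
        # The scale must stay the same; the precision may only increase.
        return old_scale == new_scale and new_precision > old_precision
    return False
```

    For example, decimal(10,2) to decimal(20,2) passes the check, while decimal(10,2) to decimal(20,3) or long to int does not.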

    Whole Database Writing Configuration

    Data Target Settings

    
    
    
    Parameter
        Description
    Data Destination
        Select the target data source to be synchronized.
    Write Mode
        Upsert: Update write. If the primary key does not conflict, a new row is inserted; if it conflicts, the existing row is updated. Suitable for scenarios where the target table has a primary key and must be updated in real time from the source data. Incurs some performance overhead. Upsert mode supports only Iceberg V2 tables and requires a unique key.
        Append: Append write. Data is appended as new rows regardless of whether a primary key exists. Whether a primary key conflict occurs depends on the target. Suitable for scenarios with no primary key where duplicate data is acceptable. No performance overhead.
        Full Append + Incremental Upsert: Automatically switches the write method based on the synchronization phase of the source data. The full phase uses Append writing to improve performance, and the incremental phase uses Upsert writing for real-time updates.
    Database/Table Matching Policy
        Name matching rules for database and table objects in Iceberg:
        Default: Uses the same name as the source database/source table.
        Custom: Combines built-in parameters and strings to generate the target database and table names.
        Note:
        Example: If the source table name is table1 and the mapping rule is ${table_name_di_src}_inlong, the data from table1 is mapped to table1_inlong.
        The system matches the target database/table based on the matching rule:
        If the matching database/table does not exist in the Iceberg target, it is created automatically.
        If the matching database/table already exists in the Iceberg target, it is not created again and the existing database/table is used.
    Advanced Settings
        You can configure parameters according to business needs.
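
    The matching policy described above can be sketched in Python as follows. This is only a model of the documented behavior: the function names render_target_name and resolve_target are hypothetical, and ${table_name_di_src} is the only built-in parameter assumed here because it is the only one shown in the example.

```python
from string import Template


def render_target_name(rule: str, source_table: str) -> str:
    """Expand a custom mapping rule into a target table name.

    Only ${table_name_di_src} is handled, since it is the only built-in
    parameter that appears in the documentation example.
    """
    return Template(rule).substitute(table_name_di_src=source_table)


def resolve_target(rule: str, source_table: str, existing_tables: set) -> tuple:
    """Return the target name and whether it would be created or reused."""
    target = render_target_name(rule, source_table)
    action = "use_existing" if target in existing_tables else "create"
    return target, action
```

    With the rule ${table_name_di_src}_inlong and the source table table1, render_target_name returns table1_inlong; resolve_target then reports "create" when that table is absent from the target and "use_existing" when it is already there.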

    Single Table Writing Node Configuration

    1. On the DataInLong page, click Real-time synchronization in the left navigation bar.
    2. On the real-time synchronization page, select Single Table Synchronization at the top, create a new one (in either form or canvas mode), and enter the configuration page.
    
    
    
    Parameter
        Description
    Data Destination
        The Iceberg data source to write to.
    Database
        Select or manually enter the name of the database to write to.
        By default, the database bound to the data source is used. Other databases must be entered manually.
        If the data source network is not connected and the database information cannot be fetched directly, you can enter the database name manually. Data synchronization can still run as long as the Data Integration network is connected.
    Table
        Select or manually enter the name of the table to write to.
        If the data source network is not connected and the table information cannot be fetched directly, you can enter the table name manually. Data synchronization can still run as long as the Data Integration network is connected.
    Create Target Table with One Click
        When the source is MySQL, TDSQL-C MySQL, TDSQL MySQL, Oracle, PostgreSQL, OceanBase, or Dameng, you can create the Iceberg target table from the source table structure with one click.
    Write Mode
        Upsert: Update and insert. If the primary key does not conflict, a new row is inserted; if it conflicts, the existing row is updated. Suitable for scenarios where the target table has a primary key and must be updated in real time from the source data. Incurs some performance overhead. Supports only Iceberg V2 tables and requires a unique key.
        Append: Append write. Data is appended as new rows regardless of whether a primary key exists. Whether a primary key conflict occurs depends on the target. Suitable for scenarios with no primary key where duplicate data is acceptable. No performance overhead.
    Unique Key
        In Upsert write mode, a unique key must be set to ensure data ordering. Multiple fields can be selected.
    Advanced Settings
        You can configure parameters according to business needs.
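
    To make the difference between the two write modes concrete, here is a minimal in-memory Python model of Append and Upsert. It only illustrates the row-level outcome described in the table; it is not how Iceberg or DataInLong implements writes, and the function names are made up for this sketch.

```python
def append(rows: list, incoming: list) -> list:
    """Append mode: every incoming record becomes a new row, duplicates included."""
    return rows + list(incoming)


def upsert(rows: list, incoming: list, unique_key: list) -> list:
    """Upsert mode: a record whose unique key already exists replaces that row,
    otherwise it is inserted as a new row."""
    by_key = {tuple(row[k] for k in unique_key): row for row in rows}
    for record in incoming:
        by_key[tuple(record[k] for k in unique_key)] = record
    return list(by_key.values())


target = [{"id": 1, "value": "a"}]
changes = [{"id": 1, "value": "b"}, {"id": 2, "value": "c"}]

appended = append(target, changes)          # three rows, id 1 appears twice
upserted = upsert(target, changes, ["id"])  # two rows, id 1 holds the new value
```

    The unique key is passed as a list because the Unique Key parameter allows multiple fields to be selected.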

    Log Collection Write Node Configuration

    Parameter
        Description
    Data Destination
        Select an available Iceberg data source in the current project.
    Database/Table
        Select the corresponding database and table from this data source.
    Write Mode
        Iceberg supports two write modes:
        Append: Append write.
        Upsert: Writes messages in Upsert mode. Once set, each message is processed only once by the consumer, which ensures Exactly-Once semantics.
    Unique Key
        In Upsert write mode, a unique key must be set to ensure data ordering. Multiple fields can be selected. In Append mode, a unique key is not required.
    Advanced Settings (optional)
        You can configure parameters according to business needs.

    Supported Data Type Conversions for Writing

    Internal Type    Iceberg Type
    CHAR             STRING
    VARCHAR          STRING
    STRING           STRING
    BOOLEAN          BOOLEAN
    BINARY           FIXED(L)
    VARBINARY        BINARY
    DECIMAL          DECIMAL(P,S)
    TINYINT          INT
    SMALLINT         INT
    INTEGER          INT
    BIGINT           LONG
    FLOAT            FLOAT
    DOUBLE           DOUBLE
    DATE             DATE
    TIME             TIME
    TIMESTAMP        TIMESTAMP
    TIMESTAMP_LTZ    TIMESTAMPTZ
    INTERVAL         -
    ARRAY            LIST
    MULTISET         MAP
    MAP              MAP
    ROW              STRUCT
    RAW              -
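
    For pre-checking a source schema, the conversion table above can be expressed as a simple lookup. The sketch below is an assumption-laden convenience, not a DataInLong API: the names TYPE_MAPPING and to_iceberg_type are hypothetical, and the entries marked "-" in the table are represented as None to mean that no conversion is listed.

```python
from typing import Optional

# Mirror of the conversion table; None marks types with no listed conversion.
TYPE_MAPPING = {
    "CHAR": "STRING",
    "VARCHAR": "STRING",
    "STRING": "STRING",
    "BOOLEAN": "BOOLEAN",
    "BINARY": "FIXED(L)",
    "VARBINARY": "BINARY",
    "DECIMAL": "DECIMAL(P,S)",
    "TINYINT": "INT",
    "SMALLINT": "INT",
    "INTEGER": "INT",
    "BIGINT": "LONG",
    "FLOAT": "FLOAT",
    "DOUBLE": "DOUBLE",
    "DATE": "DATE",
    "TIME": "TIME",
    "TIMESTAMP": "TIMESTAMP",
    "TIMESTAMP_LTZ": "TIMESTAMPTZ",
    "INTERVAL": None,
    "ARRAY": "LIST",
    "MULTISET": "MAP",
    "MAP": "MAP",
    "ROW": "STRUCT",
    "RAW": None,
}


def to_iceberg_type(internal_type: str) -> Optional[str]:
    """Look up the Iceberg type for an internal type name.

    Returns None when the table lists no conversion and raises KeyError
    for a type name that does not appear in the table at all.
    """
    return TYPE_MAPPING[internal_type.strip().upper()]
```

    A caller could scan the column types of a source table and flag any column for which to_iceberg_type returns None before configuring the task.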

    FAQs

    Submission failed due to excessively long table field length

    
    
    
    Solution:
    1. First, back up the TABLE_PARAMS table: mysqldump -hxxx -uroot -pxxx hivemetastore TABLE_PARAMS > table_params.sql
    2. Change the length of the PARAM_VALUE column to 40000: ALTER TABLE TABLE_PARAMS MODIFY PARAM_VALUE VARCHAR(40000);
    Note:
    With the UTF-8 character set, a length of 40000 is not supported. In that case, change the column to the TEXT type or reduce the length to 20000.
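
    The note about UTF-8 can be understood with a little arithmetic. The Python sketch below assumes MySQL's documented 65,535-byte row size limit and a 2-byte length prefix for long VARCHAR columns, and it ignores the other columns of TABLE_PARAMS, so the results are upper bounds rather than exact limits. The function name is hypothetical.

```python
# Assumptions: MySQL limits a row to 65,535 bytes, and a VARCHAR longer than
# 255 bytes stores a 2-byte length prefix. Other columns in the same table
# reduce the real limit further, so treat these values as upper bounds.
MAX_ROW_BYTES = 65535
LENGTH_PREFIX_BYTES = 2


def max_varchar_chars(bytes_per_char: int) -> int:
    """Upper bound on the VARCHAR(N) length for a given maximum character width."""
    return (MAX_ROW_BYTES - LENGTH_PREFIX_BYTES) // bytes_per_char


utf8_limit = max_varchar_chars(3)     # 3-byte utf8 (utf8mb3)
utf8mb4_limit = max_varchar_chars(4)  # 4-byte utf8mb4
```

    Under these assumptions the 3-byte utf8 character set allows at most 21844 characters, which is why 40000 is rejected and 20000 is accepted. With the 4-byte utf8mb4 character set the bound drops to 16383, so the TEXT type may be the safer choice there.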
    
    