SHOW TBLPROPERTIES `DataLakeCatalog`.`database_name`.`table_name`;
CREATE TABLE IF NOT EXISTS `DataLakeCatalog`.`dbname`.`test_v1` (`id` int, `name` string, `ts` date) PARTITIONED BY (`ts`);
CREATE TABLE IF NOT EXISTS `DataLakeCatalog`.`dbname`.`test_v2` (`id` int, `name` string, `ts` date) PARTITIONED BY (`ts`) TBLPROPERTIES (
    'format-version' = '2',                -- Create a V2 table
    'write.upsert.enabled' = 'true',       -- Perform upsert operations during writes; supported only for V2 tables
    'write.distribution-mode' = 'hash',    -- Data distribution mode during writes; set to hash to support concurrent writes
    'write.update.mode' = 'merge-on-read'  -- Update mode during writes: merge on read; supported only for V2 tables
);
SHOW TBLPROPERTIES table_name [('property_name')]
ALTER TABLE `DataLakeCatalog`.`database_name`.`table_name` SET TBLPROPERTIES ('format-version' = '2', 'write.upsert.enabled' = 'true');
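To confirm that the change took effect, you can read back a single property using the SHOW TBLPROPERTIES syntax shown above (the catalog, database, and table names are placeholders):
SHOW TBLPROPERTIES `DataLakeCatalog`.`database_name`.`table_name` ('write.upsert.enabled'); -- returns the current value of the property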
Parameters | Description |
Data Destination | Select the target data source to be synchronized. |
Write Mode | Upsert: update write. When there is no primary key conflict, a new row is inserted; when there is a primary key conflict, the existing row is updated. Suitable for scenarios where the target table has a primary key and must be updated in real time from the source data; this incurs some performance overhead (a conceptual SQL sketch of Upsert semantics follows this table). Append: append write. Data is always appended as new rows, regardless of whether a primary key exists; whether a primary key conflict occurs depends on the target. Suitable for scenarios without a primary key where duplicate data is acceptable; no performance loss. Full Append + Incremental Upsert: automatically switches the write mode based on the synchronization stage of the source data. In the full stage, Append write is used to improve performance; in the incremental stage, Upsert write is used for real-time updates. This mode currently supports MySQL, TDSQL-C MySQL, TDSQL MySQL, Oracle, and PostgreSQL data sources. |
Database/Table Matching Policy | Name matching rules for database and data table objects in DLC. |
Write Optimization | Write optimization is suitable for scenarios with frequent or real-time writing. Enabling write optimization will automatically merge files and delete expired snapshots. It is strongly recommended to enable write optimization in real-time Upsert writing scenarios. For details, please refer to Data Optimization. Note: Integration tasks currently only support the default configuration of DLC write optimization. If users need to modify the relevant configuration, please go to the DLC page to make changes. |
Data Optimization Resources | Data optimization tasks may consume a significant amount of your cluster resources, depending on the write workload. To avoid impacting normal business, it is strongly recommended to use dedicated cluster resources for data optimization. Data optimization supports SuperSQL Engine > SparkSQL and SuperSQL Engine > Spark Jobs. If you require lifecycle management, use the SuperSQL Engine > Spark Jobs engine. |
Advanced Settings | You can configure parameters according to business needs. |
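Conceptually, Upsert write behaves like a merge on the primary key: rows with a new key are inserted, and rows with an existing key replace the previous version. The sketch below only illustrates the equivalent SQL semantics; the integration task performs this automatically, and the staging source `source_changes` and key column `id` are hypothetical names.
MERGE INTO `DataLakeCatalog`.`dbname`.`test_v2` AS t
USING source_changes AS s
ON t.id = s.id
WHEN MATCHED THEN UPDATE SET *      -- existing key: update the row in place
WHEN NOT MATCHED THEN INSERT *;     -- new key: insert a new row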
Parameters | Description |
Data Destination | The DLC data source to write to. |
Database | Supports selecting or manually entering the database name to write to. By default, the database bound to the data source is used as the default database; other databases must be entered manually. If the data source network is not connected and the database information cannot be fetched directly, you can enter the database name manually; data synchronization can still be performed as long as the DataInLong network is connected. |
Table | Supports selecting or manually entering the table name to write to. If the data source network is not connected and the table information cannot be fetched directly, you can enter the table name manually; data synchronization can still be performed as long as the DataInLong network is connected. |
Write Mode | DLC real-time synchronous writing supports the following modes: Upsert: update write. When there is no primary key conflict, a new row is inserted; when there is a primary key conflict, the existing row is updated. Suitable for scenarios where the target table has a primary key and must be updated in real time from the source data; this incurs some performance overhead. Append: append write. Data is always appended as new rows, regardless of whether a primary key exists; whether a primary key conflict occurs depends on the target. Suitable for scenarios without a primary key where duplicate data is acceptable; no performance loss. Full Append + Incremental Upsert: automatically switches the write mode based on the synchronization stage of the source data. In the full stage, Append write is used to improve performance; in the incremental stage, Upsert write is used for real-time updates. This mode currently supports MySQL, TDSQL-C MySQL, TDSQL MySQL, Oracle, and PostgreSQL data sources. |
Unique Key | In Upsert write mode, a unique key must be set to ensure data ordering; multiple columns can be selected (see the identifier-fields sketch after this table). |
Write Optimization | Write optimization is suitable for scenarios with frequent or real-time writing. Enabling write optimization will automatically merge files and delete expired snapshots. It is strongly recommended to enable write optimization in real-time Upsert writing scenarios. For details, please refer to Data Optimization. Note: Integration tasks currently only support the default configuration of DLC write optimization. If users need to modify the relevant configuration, please go to the DLC page to make changes. |
Data Optimization Resources | Data optimization may generate a large number of tasks depending on the write workload, consuming your cluster resources. To avoid affecting normal business, it is strongly recommended to use dedicated cluster resources here. Data optimization supports SuperSQL Engine > SparkSQL and SuperSQL Engine > Spark Jobs. If you require lifecycle management, use the SuperSQL Engine > Spark Jobs engine. |
Advanced Settings | You can configure parameters according to business needs. |
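For Upsert writes into an Iceberg V2 table, the unique key chosen above corresponds to the table's identifier fields, which must be non-null columns. As a sketch only, assuming the engine supports Iceberg's Spark DDL extensions and that `id` is a non-null column chosen as the unique key (the integration task normally manages this for you):
ALTER TABLE `DataLakeCatalog`.`dbname`.`test_v2` SET IDENTIFIER FIELDS id; -- declare id as the identifier (unique key) field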
Internal Type | Iceberg Type |
CHAR | STRING |
VARCHAR | STRING |
STRING | STRING |
BOOLEAN | BOOLEAN |
BINARY | FIXED(L) |
VARBINARY | BINARY |
DECIMAL | DECIMAL(P,S) |
TINYINT | INT |
SMALLINT | INT |
INTEGER | INT |
BIGINT | LONG |
FLOAT | FLOAT |
DOUBLE | DOUBLE |
DATE | DATE |
TIME | TIME |
TIMESTAMP | TIMESTAMP |
TIMESTAMP_LTZ | TIMESTAMPTZ |
INTERVAL | - |
ARRAY | LIST |
MULTISET | MAP |
MAP | MAP |
ROW | STRUCT |
RAW | - |
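As an illustration of the mapping above, the sketch below shows a hypothetical DLC target table written in Spark SQL syntax; the comments note which internal type each column would originate from, and all table and column names are made up for the example.
CREATE TABLE IF NOT EXISTS `DataLakeCatalog`.`dbname`.`type_mapping_demo` (
    `small_col` int,                  -- SMALLINT -> INT
    `big_col` bigint,                 -- BIGINT -> LONG
    `ts_ltz_col` timestamp,           -- TIMESTAMP_LTZ -> TIMESTAMPTZ
    `tag_counts` map<string, int>,    -- MULTISET<STRING> -> MAP (element -> count)
    `address` struct<city: string>    -- ROW -> STRUCT
);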
Caused by: java.lang.IllegalArgumentException: Cannot write incompatible dataset to table with schema:
* mobile should be required, but is optional
    at org.apache.iceberg.types.TypeUtil.checkSchemaCompatibility(TypeUtil.java:364)
    at org.apache.iceberg.types.TypeUtil.validateWriteSchema(TypeUtil.java:323)
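This error indicates a nullability mismatch: the source field (here mobile) is optional (nullable), while the corresponding column in the Iceberg table is declared as required. If the column should in fact accept NULLs, one possible remedy, assuming the engine supports Iceberg's Spark DDL extensions, is to relax the constraint on the target table (catalog, database, and table names are placeholders):
ALTER TABLE `DataLakeCatalog`.`database_name`.`table_name` ALTER COLUMN mobile DROP NOT NULL; -- allow NULLs so the optional source field can be written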
java.lang.ArrayIndexOutOfBoundsException: 1
    at org.apache.flink.table.data.binary.BinarySegmentUtils.getLongMultiSegments(BinarySegmentUtils.java:736) ~[flink-table-blink_2.11-1.13.6.jar:1.13.6]
    at org.apache.flink.table.data.binary.BinarySegmentUtils.getLong(BinarySegmentUtils.java:726) ~[flink-table-blink_2.11-1.13.6.jar:1.13.6]
    at org.apache.flink.table.data.binary.BinarySegmentUtils.readTimestampData(BinarySegmentUtils.java:1022) ~[flink-table-blink_2.11-1.13.6.jar:1.13.6]
    at org.apache.flink.table.data.binary.BinaryRowData.getTimestamp(BinaryRowData.java:356) ~[flink-table-blink_2.11-1.13.6.jar:1.13.6]
    at org.apache.flink.table.data.RowData.lambda$createFieldGetter$39385f9c$1(RowData.java:260) ~[flink-table-blink_2.11-1.13.6.jar:1.13.6]