Overview
When using DLC Native Table (Iceberg), users can follow the process below to create native tables and complete the necessary configurations.
Step I: Enabling Managed Storage
Note:
Managed storage must be enabled by a DLC administrator.
Step II: Creating the DLC Native Table
There are two ways to create native tables.
1. Create a visual table through the console interface.
2. Create a table using SQL.
Note:
A database must be created before a DLC Native Table is created.
Creating Tables through the Console Interface
DLC provides a data management module for table creation. For detailed operations, see Data Management. Creating Tables through SQL
When creating tables through SQL, users write their own CREATE TABLE SQL statements. For DLC Native Table (Iceberg) creation, table descriptions, locations, and table formats do not need to be specified. However, some advanced parameters need to be included depending on the use case, and those parameters are added through TBLPROPERTIES.
If parameters were not included when you created the table or if certain attributes need to be modified, use the alter table set tblproperties command. After the alter table command is executed, restart the upstream import tasks to complete the attribute modification or addition.
Typical table creation statements for Append and Upsert scenarios are shown as follows. Users can adjust these statements based on their actual needs.
Append Scenario Table Creation
CREATE TABLE IF NOT EXISTS `DataLakeCatalog`.`axitest`.`append_case` (`id` int, `name` string, `pt` string)
PARTITIONED BY (`pt`)
TBLPROPERTIES (
'format-version' = '1',
'write.upsert.enabled' = 'false',
'write.distribution-mode' = 'hash',
'write.metadata.delete-after-commit.enabled' = 'true',
'write.metadata.previous-versions-max' = '100',
'write.metadata.metrics.default' = 'full',
'smart-optimizer.inherit' = 'default'
);
Upsert Scenario Table Creation
For Upsert scenario table creation, specify the version as 2, and set the write.upsert.enabled attribute to true, and configure bloom filters according to upsert key-values. If users have multiple primary keys, generally use the first two key-values for bloom filter configuration. If the upsert table is not partitioned and updates frequently with large data volumes, consider doing bucketing by primary key for distribution.
Examples for both partitioned and non-partitioned tables are provided as follows.
CREATE TABLE IF NOT EXISTS `DataLakeCatalog`.`axitest`.`upsert_case` (`id` int, `name` string, `pt` string)
PARTITIONED BY (bucket(4, `id`))
TBLPROPERTIES (
'format-version' = '2',
'write.upsert.enabled' = 'true',
'write.update.mode' = 'merge-on-read',
'write.merge.mode' = 'merge-on-read',
'write.parquet.bloom-filter-enabled.column.id' = 'true',
'dlc.ao.data.govern.sorted.keys' = 'id',
'write.distribution-mode' = 'hash',
'write.metadata.delete-after-commit.enabled' = 'true',
'write.metadata.previous-versions-max' = '100',
'write.metadata.metrics.default' = 'full',
'smart-optimizer.inherit' = 'default'
);
CREATE TABLE IF NOT EXISTS `DataLakeCatalog`.`axitest`.`upsert_case` (`id` int, `name` string, `pt` string)
TBLPROPERTIES (
'format-version' = '2',
'write.upsert.enabled' = 'true',
'write.update.mode' = 'merge-on-read',
'write.merge.mode' = 'merge-on-read',
'write.parquet.bloom-filter-enabled.column.id' = 'true',
'dlc.ao.data.govern.sorted.keys' = 'id',
'write.distribution-mode' = 'hash',
'write.metadata.delete-after-commit.enabled' = 'true',
'write.metadata.previous-versions-max' = '100',
'write.metadata.metrics.default' = 'full',
'smart-optimizer.inherit' = 'default'
);
Modifying Table Attributes
If related attribute values were not included when the user created the table, use the alter table to modify, add, or remove attribute values, as shown below. Any changes to table attribute values can be made this way. Note that the Iceberg format-version field cannot be modified. Additionally, if the table already has real-time imports from InLong/Oceanus/Flink, you need to restart the upstream import businesses after modifications.
ALTER TABLE `DataLakeCatalog`.`axitest`.`upsert_case` SET TBLPROPERTIES('commit.retry.num-retries' = '10');
ALTER TABLE `DataLakeCatalog`.`axitest`.`upsert_case` UNSET TBLPROPERTIES('write.parquet.bloom-filter-enabled.column.id');
Step III: Data Optimization and Lifecycle Configuration
Data optimization and lifecycle configuration can be done in two ways.
1. Through the console interface for visual configuration
2. Through SQL for configuration
Through the Console Interface for Configuration
Through SQL for Configuration
DLC defines detailed attributes for managing data optimization and lifecycle. You can flexibly configure data management and lifecycle based on business characteristics. For detailed data optimization and lifecycle configuration values, see Enable data optimization. Configuring the Database
The data optimization and lifecycle of the database can be adjusted through DBPROPERTIES, as shown below.
ALTER DATABASE DataLakeCatalog.my_database SET DBPROPERTIES ('smart-optimizer.inherit'='none', 'smart-optimizer.written.enable'='enable');
ALTER DATABASE DataLakeCatalog.my_database SET DBPROPERTIES ('smart-optimizer.inherit'='default');
ALTER DATABASE DataLakeCatalog.my_database SET DBPROPERTIES ('smart-optimizer.inherit'='none', 'smart-optimizer.lifecycle.enable'='disable');
Configuring the Data Table
Data optimization and lifecycle for data tables are configured through TBLPROPERTIES, as shown below.
ALTER TABLE `DataLakeCatalog`.`axitest`.`upsert_case` SET TBLPROPERTIES('smart-optimizer.inherit'='none', 'smart-optimizer.written.enable'='disable');
ALTER TABLE `DataLakeCatalog`.`axitest`.`upsert_case` SET TBLPROPERTIES('smart-optimizer.inherit'='default');
ALTER TABLE `DataLakeCatalog`.`axitest`.`upsert_case` SET TBLPROPERTIES('smart-optimizer.inherit'='none', 'smart-optimizer.lifecycle.enable'='enable', 'smart-optimizer.lifecycle.expiration'='7');
Step IV: Data Ingestion into Native Table
Step V: Viewing Data Optimization Tasks
You can view data governance tasks in the DLC console under the Data Operation and Maintenance menu by navigating to the Historical Tasks page. You can query tasks using keywords such as CALL, Auto, database name, and table name.
Note:
To view system data optimization tasks, users need the permissions of DLC administrators.
Tasks with IDs starting with "Auto" are automatically generated data optimization tasks. As shown in the table below.
You can also click View Details to check the basic information and results of running the tasks.
Was this page helpful?