tencent cloud

All product documents
Data Lake Compute
DLC Source Table Operation Configuration
Last updated: 2024-07-31 17:34:44
DLC Source Table Operation Configuration
Last updated: 2024-07-31 17:34:44

Overview

When using DLC Native Table (Iceberg), users can follow the process below to create native tables and complete the necessary configurations.





Step I: Enabling Managed Storage

Note:
Managed storage must be enabled by a DLC administrator.
Enabling managed storage requires operations in the console. For details, see Managed Storage Configuration. If you use a metadata acceleration bucket, pay attention to permission configurations. For details, see Binding of Metadata Acceleration Bucket. Note that shared engines cannot access metadata acceleration buckets.

Step II: Creating the DLC Native Table

There are two ways to create native tables.
1. Create a visual table through the console interface.
2. Create a table using SQL.
Note:
A database must be created before a DLC Native Table is created.

Creating Tables through the Console Interface

DLC provides a data management module for table creation. For detailed operations, see Data Management.

Creating Tables through SQL

When creating tables through SQL, users write their own CREATE TABLE SQL statements. For DLC Native Table (Iceberg) creation, table descriptions, locations, and table formats do not need to be specified. However, some advanced parameters need to be included depending on the use case, and those parameters are added through TBLPROPERTIES.
If parameters were not included when you created the table or if certain attributes need to be modified, use the alter table set tblproperties command. After the alter table command is executed, restart the upstream import tasks to complete the attribute modification or addition.
Typical table creation statements for Append and Upsert scenarios are shown as follows. Users can adjust these statements based on their actual needs.

Append Scenario Table Creation

CREATE TABLE IF NOT EXISTS `DataLakeCatalog`.`axitest`.`append_case` (`id` int, `name` string, `pt` string)
PARTITIONED BY (`pt`)
TBLPROPERTIES (
'format-version' = '1',
'write.upsert.enabled' = 'false',
'write.distribution-mode' = 'hash',
'write.metadata.delete-after-commit.enabled' = 'true',
'write.metadata.previous-versions-max' = '100',
'write.metadata.metrics.default' = 'full',
'smart-optimizer.inherit' = 'default'
);

Upsert Scenario Table Creation

For Upsert scenario table creation, specify the version as 2, and set the write.upsert.enabled attribute to true, and configure bloom filters according to upsert key-values. If users have multiple primary keys, generally use the first two key-values for bloom filter configuration. If the upsert table is not partitioned and updates frequently with large data volumes, consider doing bucketing by primary key for distribution.
Examples for both partitioned and non-partitioned tables are provided as follows.
// Partitioned table
CREATE TABLE IF NOT EXISTS `DataLakeCatalog`.`axitest`.`upsert_case` (`id` int, `name` string, `pt` string)
PARTITIONED BY (bucket(4, `id`))
TBLPROPERTIES (
'format-version' = '2',
'write.upsert.enabled' = 'true',
'write.update.mode' = 'merge-on-read',
'write.merge.mode' = 'merge-on-read',
'write.parquet.bloom-filter-enabled.column.id' = 'true',
'dlc.ao.data.govern.sorted.keys' = 'id',
'write.distribution-mode' = 'hash',
'write.metadata.delete-after-commit.enabled' = 'true',
'write.metadata.previous-versions-max' = '100',
'write.metadata.metrics.default' = 'full',
'smart-optimizer.inherit' = 'default'
);
// Non-partitioned table
CREATE TABLE IF NOT EXISTS `DataLakeCatalog`.`axitest`.`upsert_case` (`id` int, `name` string, `pt` string)
TBLPROPERTIES (
'format-version' = '2',
'write.upsert.enabled' = 'true',
'write.update.mode' = 'merge-on-read',
'write.merge.mode' = 'merge-on-read',
'write.parquet.bloom-filter-enabled.column.id' = 'true',
'dlc.ao.data.govern.sorted.keys' = 'id',
'write.distribution-mode' = 'hash',
'write.metadata.delete-after-commit.enabled' = 'true',
'write.metadata.previous-versions-max' = '100',
'write.metadata.metrics.default' = 'full',
'smart-optimizer.inherit' = 'default'
);

Modifying Table Attributes

If related attribute values were not included when the user created the table, use the alter table to modify, add, or remove attribute values, as shown below. Any changes to table attribute values can be made this way. Note that the Iceberg format-version field cannot be modified. Additionally, if the table already has real-time imports from InLong/Oceanus/Flink, you need to restart the upstream import businesses after modifications.
// Modify conflict retry attempts to 10

ALTER TABLE `DataLakeCatalog`.`axitest`.`upsert_case` SET TBLPROPERTIES('commit.retry.num-retries' = '10');
// Cancel bloom filter setting for the name field

ALTER TABLE `DataLakeCatalog`.`axitest`.`upsert_case` UNSET TBLPROPERTIES('write.parquet.bloom-filter-enabled.column.id');

Step III: Data Optimization and Lifecycle Configuration

Data optimization and lifecycle configuration can be done in two ways.
1. Through the console interface for visual configuration
2. Through SQL for configuration

Through the Console Interface for Configuration

DLC provides a data management module for configuration. For detailed operations, see Enable data optimization.

Through SQL for Configuration

DLC defines detailed attributes for managing data optimization and lifecycle. You can flexibly configure data management and lifecycle based on business characteristics. For detailed data optimization and lifecycle configuration values, see Enable data optimization.

Configuring the Database

The data optimization and lifecycle of the database can be adjusted through DBPROPERTIES, as shown below.
// Enable write optimization for the my_database table and do not inherit the data management policy.

ALTER DATABASE DataLakeCatalog.my_database SET DBPROPERTIES ('smart-optimizer.inherit'='none', 'smart-optimizer.written.enable'='enable');
// Set my_database to inherit the data management policy.

ALTER DATABASE DataLakeCatalog.my_database SET DBPROPERTIES ('smart-optimizer.inherit'='default');
// Disable lifecycle for the my_database table and do not inherit the data management policy.

ALTER DATABASE DataLakeCatalog.my_database SET DBPROPERTIES ('smart-optimizer.inherit'='none', 'smart-optimizer.lifecycle.enable'='disable');

Configuring the Data Table

Data optimization and lifecycle for data tables are configured through TBLPROPERTIES, as shown below.
// Disable write optimization for the upsert_cast table and do not inherit the database policy.

ALTER TABLE `DataLakeCatalog`.`axitest`.`upsert_case` SET TBLPROPERTIES('smart-optimizer.inherit'='none', 'smart-optimizer.written.enable'='disable');
// Set the upsert_cast table to inherit the database policy.

ALTER TABLE `DataLakeCatalog`.`axitest`.`upsert_case` SET TBLPROPERTIES('smart-optimizer.inherit'='default');
// Enable lifecycle for the upsert_cast table, set the lifecycle duration to 7 days, and do not inherit the database policy.

ALTER TABLE `DataLakeCatalog`.`axitest`.`upsert_case` SET TBLPROPERTIES('smart-optimizer.inherit'='none', 'smart-optimizer.lifecycle.enable'='enable', 'smart-optimizer.lifecycle.expiration'='7');

Step IV: Data Ingestion into Native Table

DLC Native Table supports multiple data ingestion methods. Depending on your data source, see DLC Native Table Lake Ingestion Practice.

Step V: Viewing Data Optimization Tasks

You can view data governance tasks in the DLC console under the Data Operation and Maintenance menu by navigating to the Historical Tasks page. You can query tasks using keywords such as CALL, Auto, database name, and table name.
Note:
To view system data optimization tasks, users need the permissions of DLC administrators.
Tasks with IDs starting with "Auto" are automatically generated data optimization tasks. As shown in the table below.




You can also click View Details to check the basic information and results of running the tasks.




Was this page helpful?
You can also Contact Sales or Submit a Ticket for help.
Yes
No

Feedback

Contact Us

Contact our sales team or business advisors to help your business.

Technical Support

Open a ticket if you're looking for further assistance. Our Ticket is 7x24 avaliable.

7x24 Phone Support