tencent cloud

Feedback

DLC Source Table Operation Configuration

Last updated: 2024-07-31 17:34:44

    Overview

    When using DLC Native Table (Iceberg), users can follow the process below to create native tables and complete the necessary configurations.
    
    
    
    

    Step I: Enabling Managed Storage

    Note:
    Managed storage must be enabled by a DLC administrator.
    Enabling managed storage requires operations in the console. For details, see Managed Storage Configuration. If you use a metadata acceleration bucket, pay attention to permission configurations. For details, see Binding of Metadata Acceleration Bucket. Note that shared engines cannot access metadata acceleration buckets.

    Step II: Creating the DLC Native Table

    There are two ways to create native tables.
    1. Create a visual table through the console interface.
    2. Create a table using SQL.
    Note:
    A database must be created before a DLC Native Table is created.

    Creating Tables through the Console Interface

    DLC provides a data management module for table creation. For detailed operations, see Data Management.

    Creating Tables through SQL

    When creating tables through SQL, users write their own CREATE TABLE SQL statements. For DLC Native Table (Iceberg) creation, table descriptions, locations, and table formats do not need to be specified. However, some advanced parameters need to be included depending on the use case, and those parameters are added through TBLPROPERTIES.
    If parameters were not included when you created the table or if certain attributes need to be modified, use the alter table set tblproperties command. After the alter table command is executed, restart the upstream import tasks to complete the attribute modification or addition.
    Typical table creation statements for Append and Upsert scenarios are shown as follows. Users can adjust these statements based on their actual needs.

    Append Scenario Table Creation

    CREATE TABLE IF NOT EXISTS `DataLakeCatalog`.`axitest`.`append_case` (`id` int, `name` string, `pt` string)
    PARTITIONED BY (`pt`)
    TBLPROPERTIES (
    'format-version' = '1',
    'write.upsert.enabled' = 'false',
    'write.distribution-mode' = 'hash',
    'write.metadata.delete-after-commit.enabled' = 'true',
    'write.metadata.previous-versions-max' = '100',
    'write.metadata.metrics.default' = 'full',
    'smart-optimizer.inherit' = 'default'
    );

    Upsert Scenario Table Creation

    For Upsert scenario table creation, specify the version as 2, and set the write.upsert.enabled attribute to true, and configure bloom filters according to upsert key-values. If users have multiple primary keys, generally use the first two key-values for bloom filter configuration. If the upsert table is not partitioned and updates frequently with large data volumes, consider doing bucketing by primary key for distribution.
    Examples for both partitioned and non-partitioned tables are provided as follows.
    // Partitioned table
    CREATE TABLE IF NOT EXISTS `DataLakeCatalog`.`axitest`.`upsert_case` (`id` int, `name` string, `pt` string)
    PARTITIONED BY (bucket(4, `id`))
    TBLPROPERTIES (
    'format-version' = '2',
    'write.upsert.enabled' = 'true',
    'write.update.mode' = 'merge-on-read',
    'write.merge.mode' = 'merge-on-read',
    'write.parquet.bloom-filter-enabled.column.id' = 'true',
    'dlc.ao.data.govern.sorted.keys' = 'id',
    'write.distribution-mode' = 'hash',
    'write.metadata.delete-after-commit.enabled' = 'true',
    'write.metadata.previous-versions-max' = '100',
    'write.metadata.metrics.default' = 'full',
    'smart-optimizer.inherit' = 'default'
    );
    // Non-partitioned table
    CREATE TABLE IF NOT EXISTS `DataLakeCatalog`.`axitest`.`upsert_case` (`id` int, `name` string, `pt` string)
    TBLPROPERTIES (
    'format-version' = '2',
    'write.upsert.enabled' = 'true',
    'write.update.mode' = 'merge-on-read',
    'write.merge.mode' = 'merge-on-read',
    'write.parquet.bloom-filter-enabled.column.id' = 'true',
    'dlc.ao.data.govern.sorted.keys' = 'id',
    'write.distribution-mode' = 'hash',
    'write.metadata.delete-after-commit.enabled' = 'true',
    'write.metadata.previous-versions-max' = '100',
    'write.metadata.metrics.default' = 'full',
    'smart-optimizer.inherit' = 'default'
    );

    Modifying Table Attributes

    If related attribute values were not included when the user created the table, use the alter table to modify, add, or remove attribute values, as shown below. Any changes to table attribute values can be made this way. Note that the Iceberg format-version field cannot be modified. Additionally, if the table already has real-time imports from InLong/Oceanus/Flink, you need to restart the upstream import businesses after modifications.
    // Modify conflict retry attempts to 10
    
    ALTER TABLE `DataLakeCatalog`.`axitest`.`upsert_case` SET TBLPROPERTIES('commit.retry.num-retries' = '10');
    // Cancel bloom filter setting for the name field
    
    ALTER TABLE `DataLakeCatalog`.`axitest`.`upsert_case` UNSET TBLPROPERTIES('write.parquet.bloom-filter-enabled.column.id');

    Step III: Data Optimization and Lifecycle Configuration

    Data optimization and lifecycle configuration can be done in two ways.
    1. Through the console interface for visual configuration
    2. Through SQL for configuration

    Through the Console Interface for Configuration

    DLC provides a data management module for configuration. For detailed operations, see Enable data optimization.

    Through SQL for Configuration

    DLC defines detailed attributes for managing data optimization and lifecycle. You can flexibly configure data management and lifecycle based on business characteristics. For detailed data optimization and lifecycle configuration values, see Enable data optimization.

    Configuring the Database

    The data optimization and lifecycle of the database can be adjusted through DBPROPERTIES, as shown below.
    // Enable write optimization for the my_database table and do not inherit the data management policy.
    
    ALTER DATABASE DataLakeCatalog.my_database SET DBPROPERTIES ('smart-optimizer.inherit'='none', 'smart-optimizer.written.enable'='enable');
    // Set my_database to inherit the data management policy.
    
    ALTER DATABASE DataLakeCatalog.my_database SET DBPROPERTIES ('smart-optimizer.inherit'='default');
    // Disable lifecycle for the my_database table and do not inherit the data management policy.
    
    ALTER DATABASE DataLakeCatalog.my_database SET DBPROPERTIES ('smart-optimizer.inherit'='none', 'smart-optimizer.lifecycle.enable'='disable');

    Configuring the Data Table

    Data optimization and lifecycle for data tables are configured through TBLPROPERTIES, as shown below.
    // Disable write optimization for the upsert_cast table and do not inherit the database policy.
    
    ALTER TABLE `DataLakeCatalog`.`axitest`.`upsert_case` SET TBLPROPERTIES('smart-optimizer.inherit'='none', 'smart-optimizer.written.enable'='disable');
    // Set the upsert_cast table to inherit the database policy.
    
    ALTER TABLE `DataLakeCatalog`.`axitest`.`upsert_case` SET TBLPROPERTIES('smart-optimizer.inherit'='default');
    // Enable lifecycle for the upsert_cast table, set the lifecycle duration to 7 days, and do not inherit the database policy.
    
    ALTER TABLE `DataLakeCatalog`.`axitest`.`upsert_case` SET TBLPROPERTIES('smart-optimizer.inherit'='none', 'smart-optimizer.lifecycle.enable'='enable', 'smart-optimizer.lifecycle.expiration'='7');

    Step IV: Data Ingestion into Native Table

    DLC Native Table supports multiple data ingestion methods. Depending on your data source, see DLC Native Table Lake Ingestion Practice.

    Step V: Viewing Data Optimization Tasks

    You can view data governance tasks in the DLC console under the Data Operation and Maintenance menu by navigating to the Historical Tasks page. You can query tasks using keywords such as CALL, Auto, database name, and table name.
    Note:
    To view system data optimization tasks, users need the permissions of DLC administrators.
    Tasks with IDs starting with "Auto" are automatically generated data optimization tasks. As shown in the table below.
    
    
    
    
    You can also click View Details to check the basic information and results of running the tasks.
    
    
    
    
    Contact Us

    Contact our sales team or business advisors to help your business.

    Technical Support

    Open a ticket if you're looking for further assistance. Our Ticket is 7x24 avaliable.

    7x24 Phone Support