/**
 * Copy-on-write is the default mode. If you cannot decide between the two modes, use copy-on-write.
 * Merge-on-read performs better in row-level update scenarios.
 *
 * Applicable scenarios of copy-on-write: higher query performance but slower writes; suitable for
 * periodic ETL tasks or batch update operations over large data volumes.
 *
 * Applicable scenarios of merge-on-read: lower query performance but faster writes; suitable for
 * scenarios with high write performance requirements. It has strong row-level update capabilities
 * and significantly improves write performance for frequent, small-scale (< 10%)
 * merge-into/update/delete operations and Oceanus (Flink stream writing) scenarios.
 */

/** Copy-on-write table */
CREATE TABLE dlc_db.iceberg_etl (
  id INT,
  name STRING,
  age INT
) TBLPROPERTIES (
  'format-version' = '2',
  'write.metadata.previous-versions-max' = '100',
  'write.metadata.delete-after-commit.enabled' = 'true'
);

/** Merge-on-read table (an alternative definition of the same table; create one or the other) */
CREATE TABLE dlc_db.iceberg_etl (
  id INT,
  name STRING,
  age INT
) TBLPROPERTIES (
  'format-version' = '2',
  'write.metadata.previous-versions-max' = '100',
  'write.metadata.delete-after-commit.enabled' = 'true',
  'write.update.mode' = 'merge-on-read',
  'write.merge.mode' = 'merge-on-read',
  'write.delete.mode' = 'merge-on-read'
);
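For reference, these are the row-level statements the two modes govern; under merge-on-read, small (< 10%) changes like the ones below are written as delete files instead of rewriting whole data files. This is a minimal sketch: the staging table dlc_db.iceberg_etl_updates is hypothetical, invented for the example.

UPDATE dlc_db.iceberg_etl SET age = age + 1 WHERE id = 1;

MERGE INTO dlc_db.iceberg_etl t
USING dlc_db.iceberg_etl_updates s
ON t.id = s.id
WHEN MATCHED THEN UPDATE SET t.name = s.name, t.age = s.age
WHEN NOT MATCHED THEN INSERT (id, name, age) VALUES (s.id, s.name, s.age);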
/** Flink stream writing table; the primary key is id */
CREATE TABLE dlc_db.iceberg_cdc_by_id (
  id INT,
  name STRING,
  age INT
) TBLPROPERTIES (
  'format-version' = '2',
  'write.metadata.previous-versions-max' = '100',
  'write.metadata.delete-after-commit.enabled' = 'true',
  'write.upsert.enabled' = 'true',
  'write.update.mode' = 'merge-on-read',
  'write.merge.mode' = 'merge-on-read',
  'write.delete.mode' = 'merge-on-read',
  'write.distribution-mode' = 'hash',
  'write.parquet.bloom-filter-enabled.column.id' = 'true',
  'dlc.ao.data.govern.sorted.keys' = 'id'
);

/** Flink stream writing table; the primary key is a composite key of id and name */
CREATE TABLE dlc_db.iceberg_cdc_by_id_and_name (
  id INT,
  name STRING,
  age INT
) TBLPROPERTIES (
  'format-version' = '2',
  'write.metadata.previous-versions-max' = '100',
  'write.metadata.delete-after-commit.enabled' = 'true',
  'write.upsert.enabled' = 'true',
  'write.update.mode' = 'merge-on-read',
  'write.merge.mode' = 'merge-on-read',
  'write.delete.mode' = 'merge-on-read',
  'write.distribution-mode' = 'hash',
  'write.parquet.bloom-filter-enabled.column.id' = 'true',
  'write.parquet.bloom-filter-enabled.column.name' = 'true',
  'dlc.ao.data.govern.sorted.keys' = 'id,name'
);
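Once rows have been streamed in, the bloom filter on the primary key column lets point lookups and MOR merges skip data files that cannot contain the key. A hypothetical lookup against the table above:

SELECT * FROM dlc_db.iceberg_cdc_by_id WHERE id = 100;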
| Attribute | Description | Configuration Guidance |
| --- | --- | --- |
| format-version | Iceberg table format version. Valid values are 1 and 2. The default is 2 for Spark Standard-S 1.1 and SuperSQL Spark 3.5, and 1 in other scenarios. | Setting it to 2 is recommended. |
| write.upsert.enabled | Whether to enable upsert. Set it to true to enable it; if the property is not set, upsert is disabled. | Set it to true if the write scenario involves upsert. |
| write.update.mode | Update mode. | The default is copy-on-write. If you cannot decide between the two modes, use copy-on-write. Merge-on-read performs better in row-level update scenarios. Copy-on-write: higher query performance but slower writes; suitable for periodic ETL tasks or batch update operations over large data volumes. Merge-on-read: lower query performance but faster writes; suitable for scenarios with high write performance requirements. It has strong row-level update capabilities and significantly improves write performance for frequent, small-scale (< 10%) merge-into/update/delete operations and Oceanus (Flink stream writing) scenarios. |
| write.merge.mode | Merge mode. | The default is copy-on-write. The same selection guidance as for write.update.mode above applies. |
| write.parquet.bloom-filter-enabled.column.{col} | Whether to enable a bloom filter on column {col}. Only applicable to Oceanus (Flink stream writing) scenarios. true enables it; it is disabled by default. | For Flink stream writing scenarios, enable it on the upstream primary key columns; if the upstream has multiple primary keys, use at most the first two. Enabling it improves MOR query and small-file merge performance. |
| write.distribution-mode | Write distribution mode. | When set to hash, written data is repartitioned automatically, which can hurt write performance in some situations, so leaving it unset is the default recommendation. For Oceanus (Flink stream writing) scenarios, setting it to hash is recommended to optimize write performance; for other scenarios, keep the default by leaving it unset. |
| write.metadata.delete-after-commit.enabled | Enables automatic cleanup of metadata files. | Setting it to true is strongly recommended. When enabled, Iceberg cleans up historical metadata files each time it generates a snapshot, preventing a large number of metadata files from accumulating. |
| write.metadata.previous-versions-max | Sets the number of historical metadata files to retain. | The default is 100; adjust it as appropriate for special cases. It must be used together with write.metadata.delete-after-commit.enabled. |
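These properties can also be adjusted after table creation with standard Iceberg SQL. A minimal sketch against the iceberg_etl table above (the retention value of 50 is an arbitrary example):

ALTER TABLE dlc_db.iceberg_etl SET TBLPROPERTIES (
  'write.metadata.delete-after-commit.enabled' = 'true',
  'write.metadata.previous-versions-max' = '50'
);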
/**
 * 1. separatorChar: Field separator, a comma (,) by default. It tells Hive how to split each row
 *    of the CSV file into fields.
 * 2. quoteChar: Quote character, a double quotation mark (") by default. If the source file has no
 *    quote character, keep the default. Quoting protects fields that contain the separator or line
 *    breaks; for example, the column1 value x1,x2 written as "x1,x2" is not incorrectly parsed as
 *    two separate fields.
 * 3. LOCATION: Change it to your COS storage path.
 * 4. skip.header.line.count in TBLPROPERTIES: The number of header rows to skip when a file is
 *    read; the default is 0, and 1 skips one row. CSV headers usually contain column names rather
 *    than actual data.
 */
CREATE EXTERNAL TABLE IF NOT EXISTS dlc_db.`csv_tb` (
  `id` int,
  `name` string
)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.OpenCSVSerde'
WITH SERDEPROPERTIES (
  'quoteChar' = '"',
  'separatorChar' = ','
)
STORED AS
  INPUTFORMAT 'org.apache.hadoop.mapred.TextInputFormat'
  OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
LOCATION 'cosn://your_cos_location'
TBLPROPERTIES ('skip.header.line.count' = '1');
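For reference, a hypothetical data file this table parses correctly: the first row is dropped by skip.header.line.count = 1, and the quoted value "x1,x2" stays a single field despite the embedded comma.

id,name
1,tom
2,"x1,x2"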
/**
 * LOCATION: Refers to the COS storage path. Everything else remains unchanged.
 */
CREATE EXTERNAL TABLE IF NOT EXISTS dlc_db.json_demo (
  `id` bigint,
  `name` string
)
ROW FORMAT SERDE 'org.apache.hive.hcatalog.data.JsonSerDe'
STORED AS TEXTFILE
LOCATION 'cosn://your_cos_location';
{"id":1,"name":"tom"}{"id":2,"name":"tony"}
/**
 * LOCATION: Refers to the COS storage path. Everything else remains unchanged.
 */
CREATE EXTERNAL TABLE IF NOT EXISTS dlc_db.parquet_demo (
  `id` int,
  `name` string
)
PARTITIONED BY (`dt` string)
STORED AS PARQUET
LOCATION 'cosn://your_cos_location';
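Partitions of an external table are not discovered automatically; they must be registered before queries can see the data. A minimal sketch with a hypothetical dt value (the same applies to the ORC and Avro tables below):

ALTER TABLE dlc_db.parquet_demo ADD IF NOT EXISTS PARTITION (dt = '2024-01-01');
-- Alternatively, discover every partition directory already present under LOCATION:
MSCK REPAIR TABLE dlc_db.parquet_demo;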
/**
 * LOCATION: Refers to the COS storage path. Everything else remains unchanged.
 */
CREATE EXTERNAL TABLE IF NOT EXISTS dlc_db.orc_demo (
  `id` int,
  `name` string
)
PARTITIONED BY (`dt` string)
STORED AS ORC
LOCATION 'cosn://your_cos_location';
/**
 * LOCATION: Refers to the COS storage path. Everything else remains unchanged.
 */
CREATE EXTERNAL TABLE IF NOT EXISTS dlc_db.avro_demo (
  `id` int,
  `name` string
)
PARTITIONED BY (`dt` string)
STORED AS AVRO
LOCATION 'cosn://your_cos_location';
/**
 * LOCATION: Refers to the COS storage path. Everything else remains unchanged.
 */
CREATE EXTERNAL TABLE dlc_db.orc_demo_with_complex_type (
  col_bigint bigint COMMENT 'id number',
  col_int int,
  col_struct struct<x: double, y: double>,
  col_array array<struct<x: double, y: double>>,
  col_map map<struct<x: int>, struct<a: int>>,
  col_decimal DECIMAL(10,2),
  col_float FLOAT,
  col_double DOUBLE,
  col_string STRING,
  col_boolean BOOLEAN,
  col_date DATE,
  col_timestamp TIMESTAMP
)
PARTITIONED BY (`dt` string)
STORED AS ORC
LOCATION 'cosn://your_cos_location';
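Populating the complex columns uses constructor functions such as named_struct, array, and map. A minimal sketch with made-up values, assuming Spark SQL syntax and a static dt partition:

INSERT INTO dlc_db.orc_demo_with_complex_type PARTITION (dt = '2024-01-01')
SELECT
  1L,                                              -- col_bigint
  2,                                               -- col_int
  named_struct('x', 1.0D, 'y', 2.0D),              -- col_struct
  array(named_struct('x', 1.0D, 'y', 2.0D)),       -- col_array
  map(named_struct('x', 1), named_struct('a', 2)), -- col_map
  CAST(12345.67 AS DECIMAL(10,2)),                 -- col_decimal
  CAST(1.5 AS FLOAT),                              -- col_float
  2.5D,                                            -- col_double
  'demo',                                          -- col_string
  true,                                            -- col_boolean
  DATE '2024-01-01',                               -- col_date
  TIMESTAMP '2024-01-01 00:00:00';                 -- col_timestamp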
CREATE EXTERNAL TABLE dlc_db.orc_demo_with_complex_partition (
  col_int int
)
PARTITIONED BY (
  pt_tinyint TINYINT,
  pt_smallint SMALLINT,
  pt_decimal DECIMAL(10,2),
  pt_string STRING,
  pt_date DATE,
  pt_timestamp TIMESTAMP
)
STORED AS ORC
LOCATION 'cosn://your_cos_location';
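When a partition is registered, the typed partition columns take matching literals. A minimal sketch with hypothetical values:

ALTER TABLE dlc_db.orc_demo_with_complex_partition ADD IF NOT EXISTS PARTITION (
  pt_tinyint = 1,
  pt_smallint = 2,
  pt_decimal = 10.24,
  pt_string = 'demo',
  pt_date = '2024-01-01',
  pt_timestamp = '2024-01-01 00:00:00'
);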