CREATE TABLE

Last updated: 2024-08-07 17:12:15

    Description

    Supported engines: Presto and SparkSQL
    Applicable table: Native Iceberg tables and external tables
    Purpose: Create a table with the specified attributes. CREATE TABLE AS is also supported.
    Storage path for the created table: The path can be a COS directory but not a file.

    Statement for External Table

    Statement

    CREATE TABLE [ IF NOT EXISTS ] table_identifier
    ( col_name[:] col_type [ COMMENT col_comment ], ... )
    USING data_source
    [ COMMENT table_comment ]
    [ OPTIONS ( 'key1'='value1', 'key2'='value2' )]
    [ PARTITIONED BY ( col_name1, transform(col_name2), ... ) ]
    [ LOCATION path ]
    [ TBLPROPERTIES ( property_name=property_value, ... ) ]

    Parameter

    USING data_source: Data source used during table creation. Currently, data_source can be CSV, ORC, PARQUET, or ICEBERG.
    table_identifier: Table name in the three-part format. Example: catalog.database.table.
    COMMENT: Table description.
    OPTIONS: Additional parameters for USING data_source, used to pass dynamic values during storage.
    PARTITIONED BY: Create partitions based on the specified columns.
    LOCATION path: Storage path of the data table.
    TBLPROPERTIES: A set of key-value pairs used to specify table parameters.
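
    For reference, a minimal sketch of an external CSV table that uses every optional clause above. The catalog, database, table, and COS bucket names are placeholders, not values from this document:

    -- all identifiers and the COS path below are hypothetical placeholders
    CREATE TABLE IF NOT EXISTS my_catalog.my_db.orders_csv(
    id bigint COMMENT 'order id',
    amount double,
    dt string
    )
    USING csv
    COMMENT 'orders exported as CSV'
    OPTIONS ('header'='true', 'sep'=',')
    PARTITIONED BY (dt)
    LOCATION 'cosn://examplebucket-1250000000/warehouse/my_db/orders_csv'
    TBLPROPERTIES ('note'='demo');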

    Detailed Description of USING and OPTIONS

    USING CSV

    Reference: Working with CSV.
    Configurations supported by CSV data tables are as follows (OPTIONS key, default value, description):
    sep or delimiter (default: ","): Delimiter between columns in CSV files. A comma is used by default.
    mode (default: PERMISSIVE): Processing mode when data conversion does not meet expectations.
    PERMISSIVE: The most relaxed mode and the default. If a row has extra columns after conversion, only the required columns are kept.
    DROPMALFORMED: Discard data that does not meet expectations. For example, a row with an extra column is discarded.
    FAILFAST: Strictly follow the CSV format. The conversion fails if the result does not meet expectations, such as a row with extra columns.
    encoding or charset (default: UTF-8): String encoding scheme. Examples: UTF-8, US-ASCII, ISO-8859-1, UTF-16BE, UTF-16LE, and UTF-16.
    quote (default: "): Quotation character used to enclose values. Remember to use an escape character when it appears inside values.
    escape (default: \): Escape character.
    charToEscapeQuoteEscaping (default: none): Escape character required inside quoted values.
    comment (default: \u0000): Character that marks lines to skip as comments.
    header (default: false): Whether a header row exists.
    inferSchema (default: false): Whether to infer the data type of each column. If inference is not performed, every column is read as string.
    ignoreLeadingWhiteSpace (default: false for read, true for write): Ignore leading whitespace.
    ignoreTrailingWhiteSpace (default: false for read, true for write): Ignore trailing whitespace.
    columnNameOfCorruptRecord (default: _corrupt_record): Name of the column that stores records that cannot be converted. If the value is inconsistent with spark.sql.columnNameOfCorruptRecord, the value specified in the CREATE TABLE statement prevails.
    nullValue (default: empty string): Storage format for null values. Null values are stored as empty strings by default, in which case the format defined by emptyValue applies.
    nanValue (default: NaN): Storage format for non-numeric values.
    positiveInf (default: Inf): Storage format for positive infinity values.
    negativeInf (default: -Inf): Storage format for negative infinity values.
    compression or codec (default: none): Name of the compression algorithm. No compression is performed by default. Abbreviations such as bzip2, deflate, gzip, lz4, and snappy can be used.
    timeZone (default: system time zone): Default time zone. If the value is inconsistent with spark.sql.session.timeZone (example: Asia/Shanghai), the value specified in the CREATE TABLE statement prevails.
    locale (default: en-US): Locale.
    dateFormat (default: yyyy-MM-dd): Default date format.
    timestampFormat (default: yyyy-MM-dd'T'HH:mm:ss.SSSXXX): Default timestamp format, which is yyyy-MM-dd'T'HH:mm:ss[.SSS][XXX] in non-LEGACY mode.
    multiLine (default: false): Allow a record to span multiple lines.
    maxColumns (default: 20480): Maximum number of columns.
    maxCharsPerColumn (default: -1): Maximum number of characters per column. -1 indicates no limit.
    escapeQuotes (default: true): Escape quotation marks.
    quoteAll (default: false): Enclose all values in quotation marks.
    samplingRatio (default: 1.0): Sampling ratio.
    enforceSchema (default: true): Forcibly apply the specified schema when reading and ignore the file headers.
    emptyValue (default: empty string for read, "" for write): Read and write formats for empty values.
    lineSep (default: none): Line separator.
    inputBufferSize (default: none): Buffer size for reads. If the value is inconsistent with spark.sql.csv.parser.inputBufferSize, the value specified in the CREATE TABLE statement prevails.
    unescapedQuoteHandling (default: STOP_AT_DELIMITER): Policy for handling unescaped quotation marks.
    STOP_AT_DELIMITER: Stop reading when a delimiter is found.
    BACK_TO_DELIMITER: Go back to the delimiter.
    STOP_AT_CLOSING_QUOTE: Stop reading when a closing quotation mark is found.
    SKIP_VALUE: Skip this column of data.
    RAISE_ERROR: Report an error.
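
    As an illustration of the options above, a hedged sketch of a CSV table that overrides the delimiter, header handling, malformed-record mode, timestamp format, and compression. The table name and COS path are hypothetical:

    -- table name and COS path are hypothetical placeholders
    CREATE TABLE my_catalog.my_db.events_csv(
    event_id bigint,
    event_time timestamp,
    payload string
    )
    USING csv
    OPTIONS (
    'sep'='|',
    'header'='true',
    'mode'='DROPMALFORMED',
    'timestampFormat'='yyyy-MM-dd HH:mm:ss',
    'compression'='gzip'
    )
    LOCATION 'cosn://examplebucket-1250000000/warehouse/my_db/events_csv';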
    
    USING ORC

    Configurations supported by ORC data tables are as follows (OPTIONS key, default value, description):
    compression or orc.compress (default: snappy): Compression algorithm. Abbreviations such as snappy, zlib, lzo, lz4, and zstd are supported. If the value is inconsistent with spark.sql.orc.compression.codec, the value specified in the CREATE TABLE statement prevails.
    mergeSchema (default: false): Whether to merge schemas. If the value is inconsistent with spark.sql.orc.mergeSchema, the value specified in the CREATE TABLE statement prevails.
    If you use the Hive reader and writer (by setting spark.sql.hive.convertMetastoreOrc=false) for read and write, OPTIONS can also accept native ORC configurations. For details, see LanguageManual ORC.
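
    For instance, a hedged sketch of an ORC table that switches the compression codec (the table name and COS path are hypothetical):

    -- table name and COS path are hypothetical placeholders
    CREATE TABLE my_catalog.my_db.events_orc(
    event_id bigint,
    payload string
    )
    USING orc
    OPTIONS ('compression'='zlib', 'mergeSchema'='false')
    LOCATION 'cosn://examplebucket-1250000000/warehouse/my_db/events_orc';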
    USING PARQUET

    Most of the parameters related to PARQUET data tables can be configured using SparkConf (recommended). Configurations supported by OPTIONS are as follows (OPTIONS key, default value, description):
    compression or parquet.compression (default: snappy): Compression algorithm; snappy is used by default. If the value is inconsistent with spark.sql.parquet.compression.codec, the value specified in the CREATE TABLE statement prevails.
    mergeSchema (default: false): Whether to merge schemas. If the value is inconsistent with spark.sql.parquet.mergeSchema, the value specified in the CREATE TABLE statement prevails.
    datetimeRebaseMode (default: EXCEPTION): Policy for date conversion during PARQUET file read. Dates are converted based on the Gregorian calendar in LEGACY mode but not converted in CORRECTED mode. In EXCEPTION mode, an error is reported when dates in different calendars are encountered. If the value is inconsistent with spark.sql.parquet.datetimeRebaseModeInRead, the value specified in the CREATE TABLE statement prevails.
    int96RebaseMode (default: EXCEPTION): Policy for time conversion during PARQUET file read. Time values are converted based on the Gregorian calendar in LEGACY mode but not converted in CORRECTED mode. In EXCEPTION mode, an error is reported when time values in different calendars are encountered. If the value is inconsistent with spark.sql.parquet.int96RebaseModeInRead, the value specified in the CREATE TABLE statement prevails.
    If you use the Hive reader and writer (by setting spark.sql.hive.convertMetastoreParquet to false) for read and write, OPTIONS can also accept native PARQUET configurations. Reference: Hadoop integration.
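
    For instance, a hedged sketch of a PARQUET table that overrides the compression codec and schema merging (the table name and COS path are hypothetical):

    -- table name and COS path are hypothetical placeholders
    CREATE TABLE my_catalog.my_db.events_parquet(
    event_id bigint,
    event_time timestamp,
    payload string
    )
    USING parquet
    OPTIONS ('compression'='gzip', 'mergeSchema'='true')
    LOCATION 'cosn://examplebucket-1250000000/warehouse/my_db/events_parquet';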

    Example

    CREATE TABLE dempts(
    id bigint COMMENT 'id number',
    num int,
    eno float,
    dno double,
    cno decimal(9,3),
    flag boolean,
    data string,
    ts_year timestamp,
    date_month date,
    bno binary,
    point struct<x: double, y: double>,
    points array<struct<x: double, y: double>>,
    pointmaps map<struct<x: int>, struct<a: int>>
    )
    USING iceberg
    COMMENT 'table documentation'
    PARTITIONED BY (bucket(16,id), years(ts_year), months(date_month), identity(bno), bucket(3,num), truncate(10,data))
    LOCATION '/warehouse/db_001/dempts'
    TBLPROPERTIES ('write.format.default'='orc');
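
    As noted in the Description, CREATE TABLE AS is also supported. A minimal hedged sketch that materializes a query result into a new table (the target and source table names are hypothetical; the source is assumed to have the same schema as the example above):

    -- target and source table names are hypothetical placeholders
    CREATE TABLE my_catalog.my_db.dempts_copy
    USING iceberg
    PARTITIONED BY (years(ts_year))
    AS SELECT id, data, ts_year FROM my_catalog.my_db.dempts;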

    Notes

    Spark's USING and Hive's STORED AS behave differently as CREATE TABLE keywords, so the file format and read method of the created table may not meet expectations (see the sketch after these notes). Note:
    USING DATA_SOURCE: Used in Spark statements. This keyword indicates the data source used as the input format for table creation and directly affects the file format and read method of the files under the table's location. The value can be CSV, TXT, Iceberg, Parquet, Orc, etc.
    STORED AS FILE_FORMAT: Used in Hive statements to specify the file format for table storage. The value can be TXT, Parquet, Orc, etc. This keyword is not recommended because the native reader/writer of Spark may not support the specified file format, such as CSV.
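
    To make the contrast concrete, a hedged side-by-side sketch (table names are hypothetical):

    -- Spark data source syntax: files under the table location are written and read as Parquet by Spark's native reader/writer
    CREATE TABLE my_db.t_spark(id bigint, name string)
    USING parquet;

    -- Hive syntax: STORED AS only declares the storage file format; Spark's native reader/writer may not support every format declared this way (such as CSV)
    CREATE TABLE my_db.t_hive(id bigint, name string)
    STORED AS parquet;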

    Statement for Native Iceberg Table

    Caution
    This statement is supported for creating native tables only.

    Statement

    CREATE TABLE [ IF NOT EXISTS ] table_identifier
    ( col_name[:] col_type [ COMMENT col_comment ], ... )
    [ COMMENT table_comment ]
    [ PARTITIONED BY ( col_name1, transform(col_name2), ... ) ]

    Parameter

    table_identifier: Table name in the three-part format. Example: catalog.db.name.

    Schemas and Data Types

    col_type
    : primitive_type
    | nested_type
    
    primitive_type
    : boolean
    | int/integer
    | long/bigint
    | float
    | double
    | decimal(p,s), p=maximum number of digits, s=maximum number of decimal places, s<=p<=38
    | date
    | timestamp, which corresponds to timestamp with time zone. The time type and timestamp without time zone are not supported.
    | string, which corresponds to the UUID type of Iceberg tables.
    | binary, which corresponds to the FIXED type of Iceberg tables.
    
    nested_type
    : struct
    | list
    | map

    Partition Transforms

    transform
    : identity, supports any type. DLC does not support this transform.
    | bucket[N], partition by hashed value mod N buckets, supporting col_type: int,long, decimal, date, timestamp, string, binary
    | truncate[L], partition by value truncated to L, supporting col_type: int,long,decimal,string
    | years, partition by year, supporting col_type: date,timestamp
    | months, partition by month, supporting col_type: date,timestamp
    | days/date, partition by date, supporting col_type: date,timestamp
    | hours/date_hour, partition by hour, supporting col_type: timestamp

    Example

    CREATE TABLE dempts(
    id bigint COMMENT 'id number',
    num int,
    eno float,
    dno double,
    cno decimal(9,3),
    flag boolean,
    data string,
    ts_year timestamp,
    date_month date,
    bno binary,
    point struct<x: double, y: double>,
    points array<struct<x: double, y: double>>,
    pointmaps map<struct<x: int>, struct<a: int>>
    )
    COMMENT 'table documentation'
    PARTITIONED BY (bucket(16,id), years(ts_year), months(date_month), identity(bno), bucket(3,num), truncate(10,data));
    