CREATE TABLE [ IF NOT EXISTS ] table_identifier
    ( col_name[:] col_type [ COMMENT col_comment ], ... )
    USING data_source
    [ COMMENT table_comment ]
    [ OPTIONS ( 'key1'='value1', 'key2'='value2' ) ]
    [ PARTITIONED BY ( col_name1, transform(col_name2), ... ) ]
    [ LOCATION path ]
    [ TBLPROPERTIES ( property_name=property_value, ... ) ]
USING data_source
: Data source used during table creation. Currently, data_source can be CSV, ORC, PARQUET, or ICEBERG.
table_identifier
: Table name in the three-part format. Example: catalog.database.table.
COMMENT
: Table description.
OPTIONS
: Additional parameters for the data source specified in USING, used to pass dynamic values at storage time.
PARTITIONED BY
: Create partitions based on specified columns.
LOCATION path
: Storage path of the data table.
TBLPROPERTIES
: A set of key-value pairs used to specify table parameters.

The keys supported by OPTIONS depend on the data source. When data_source is CSV:

| Key Supported by OPTIONS | Default Value | Description |
| --- | --- | --- |
| sep or delimiter | , | Delimiter between columns in CSV files. The comma is used by default. |
| mode | PERMISSIVE | Processing mode when data conversion does not meet expectations. PERMISSIVE: the default, more lenient mode; if a row has extra columns after conversion, only the required columns are kept. DROPMALFORMED: discard rows that do not meet expectations, for example rows with extra columns. FAILFAST: strictly follow the CSV format; conversion fails if the result does not meet expectations, such as extra columns in a row. |
| encoding or charset | UTF-8 | String encoding scheme. Examples: UTF-8, US-ASCII, ISO-8859-1, UTF-16BE, UTF-16LE, and UTF-16. |
| quote | \" | Quotation mark character (single or double). Remember to use an escape character. |
| escape | \\ | Escape character. |
| charToEscapeQuoteEscaping | - | Character used to escape quotes inside quoted values. |
| comment | \u0000 | Comment character. |
| header | false | Indicates whether a header row exists. |
| inferSchema | false | Indicates whether to infer the data type of each column. If no inference is performed, every column is typed as string. |
| ignoreLeadingWhiteSpace | Read: false; Write: true | Ignore leading whitespace. |
| ignoreTrailingWhiteSpace | Read: false; Write: true | Ignore trailing whitespace. |
| columnNameOfCorruptRecord | _corrupt_record | Name of the column that stores records that cannot be converted. If the value differs from spark.sql.columnNameOfCorruptRecord, the value specified in the CREATE TABLE statement prevails. |
| nullValue | - | Storage format for null values; the empty string by default, in which case the format defined by emptyValue is used. |
| nanValue | NaN | Storage format for non-numeric values. |
| positiveInf | Inf | Storage format for positive infinity values. |
| negativeInf | -Inf | Storage format for negative infinity values. |
| compression or codec | - | Name of the compression algorithm. No compression is performed by default. Abbreviations such as bzip2, deflate, gzip, lz4, and snappy can be used. |
| timeZone | Default system time zone | Default time zone. If the value differs from spark.sql.session.timeZone (example: Asia/Shanghai), the value specified in the CREATE TABLE statement prevails. |
| locale | en-US | Locale. |
| dateFormat | yyyy-MM-dd | Default date format. |
| timestampFormat | yyyy-MM-dd'T'HH:mm:ss.SSSXXX | Default timestamp format; yyyy-MM-dd'T'HH:mm:ss[.SSS][XXX] in non-LEGACY mode. |
| multiLine | false | Allow a record to span multiple lines. |
| maxColumns | 20480 | Maximum number of columns. |
| maxCharsPerColumn | -1 | Maximum number of characters per column; -1 means unlimited. |
| escapeQuotes | true | Escape quotation marks. |
| quoteAll | false | Enclose all values in quotation marks. |
| samplingRatio | 1.0 | Sampling ratio used for schema inference. |
| enforceSchema | true | Forcibly apply the specified schema when reading, ignoring headers in the CSV files. |
| emptyValue | Read: (empty string); Write: \"\" | Read and write formats for empty values. |
| lineSep | - | Line separator. |
| inputBufferSize | - | Buffer size for reads. If the value differs from spark.sql.csv.parser.inputBufferSize, the value specified in the CREATE TABLE statement prevails. |
| unescapedQuoteHandling | STOP_AT_DELIMITER | Policy for handling unescaped quotation marks. STOP_AT_DELIMITER: stop reading when a delimiter is found. BACK_TO_DELIMITER: go back to the delimiter. STOP_AT_CLOSING_QUOTE: stop reading when a closing quotation mark is found. SKIP_VALUE: skip this value. RAISE_ERROR: report an error. |
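For illustration, a minimal sketch of a CSV table that sets a few of the OPTIONS keys above; the catalog, database, table, and column names are hypothetical:

-- Hypothetical example: a CSV table with an explicit delimiter,
-- a header row, and gzip compression.
CREATE TABLE IF NOT EXISTS my_catalog.my_db.csv_demo(
    id   bigint,
    name string
)
USING CSV
OPTIONS (
    'sep'='|',             -- use | instead of the default comma
    'header'='true',       -- the stored files carry a header row
    'compression'='gzip'   -- compress the stored files with gzip
);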
When data_source is ORC:

| Key Supported by OPTIONS | Default Value | Description |
| --- | --- | --- |
| compression or orc.compress | snappy | Compression algorithm. Abbreviations such as snappy, zlib, lzo, lz4, and zstd are supported. If the value differs from spark.sql.orc.compression.codec, the value specified in the CREATE TABLE statement prevails. |
| mergeSchema | false | Indicates whether to merge schemas. If the value differs from spark.sql.orc.mergeSchema, the value specified in the CREATE TABLE statement prevails. |
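A comparable sketch for an ORC table, overriding the default snappy codec (all names are hypothetical):

-- Hypothetical example: an ORC table with zlib compression.
CREATE TABLE IF NOT EXISTS my_catalog.my_db.orc_demo(
    id   bigint,
    name string
)
USING ORC
OPTIONS (
    'compression'='zlib'   -- equivalent to setting orc.compress
);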
When data_source is PARQUET:

| Key Supported by OPTIONS | Default Value | Description |
| --- | --- | --- |
| compression or parquet.compression | snappy | Compression algorithm; snappy is used by default. If the value differs from spark.sql.parquet.compression.codec, the value specified in the CREATE TABLE statement prevails. |
| mergeSchema | false | Indicates whether to merge schemas. If the value differs from spark.sql.parquet.mergeSchema, the value specified in the CREATE TABLE statement prevails. |
| datetimeRebaseMode | EXCEPTION | Rebase policy for dates when reading PARQUET files. Dates are converted based on the Gregorian calendar in LEGACY mode but not converted in CORRECTED mode; in EXCEPTION mode, an error is reported when dates are in different formats. If the value differs from spark.sql.parquet.datetimeRebaseModeInRead, the value specified in the CREATE TABLE statement prevails. |
| int96RebaseMode | EXCEPTION | Rebase policy for INT96 time values when reading PARQUET files. Time values are converted based on the Gregorian calendar in LEGACY mode but not converted in CORRECTED mode; in EXCEPTION mode, an error is reported when time values are in different formats. If the value differs from spark.sql.parquet.int96RebaseModeInRead, the value specified in the CREATE TABLE statement prevails. |
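And a sketch for a PARQUET table combining both keys (all names are hypothetical):

-- Hypothetical example: a PARQUET table with gzip compression
-- and schema merging enabled on read.
CREATE TABLE IF NOT EXISTS my_catalog.my_db.parquet_demo(
    id   bigint,
    name string
)
USING PARQUET
OPTIONS (
    'compression'='gzip',  -- overrides spark.sql.parquet.compression.codec
    'mergeSchema'='true'   -- merge schemas across files when reading
);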
Example of creating an Iceberg table:

CREATE TABLE dempts(
    id bigint COMMENT 'id number',
    num int,
    eno float,
    dno double,
    cno decimal(9,3),
    flag boolean,
    data string,
    ts_year timestamp,
    date_month date,
    bno binary,
    point struct<x: double, y: double>,
    points array<struct<x: double, y: double>>,
    pointmaps map<struct<x: int>, struct<a: int>>
)
USING iceberg
COMMENT 'table documentation'
PARTITIONED BY (bucket(16,id), years(ts_year), months(date_month), identity(bno), bucket(3,num), truncate(10,data))
LOCATION '/warehouse/db_001/dempts'
TBLPROPERTIES ('write.format.default'='orc');
CREATE TABLE [ IF NOT EXISTS ] table_identifier
    ( col_name[:] col_type [ COMMENT col_comment ], ... )
    [ COMMENT table_comment ]
    [ PARTITIONED BY ( col_name1, transform(col_name2), ... ) ]
table_identifier
: Table name in the three-part format. Example: catalog.db.name.
Schemas and Data Types

col_type: primitive_type
    | nested_type

primitive_type: boolean
    | int/integer
    | long/bigint
    | float
    | double
    | decimal(p,s), where p is the maximum number of digits and s is the maximum number of decimal places, s <= p <= 38
    | date
    | timestamp (the timestamp with time zone and time without time zone types are not supported)
    | string, which corresponds to the UUID type of Iceberg tables
    | binary, which corresponds to the FIXED type of Iceberg tables

nested_type: struct
    | list
    | map
transform: identity, supports any type (DLC does not support this transformation)
    | bucket[N], partition by hashed value mod N buckets; supported col_type: int, long, decimal, date, timestamp, string, binary
    | truncate[L], partition by value truncated to L; supported col_type: int, long, decimal, string
    | years, partition by year; supported col_type: date, timestamp
    | months, partition by month; supported col_type: date, timestamp
    | days/date, partition by date; supported col_type: date, timestamp
    | hours/date_hour, partition by hour; supported col_type: timestamp
Example:

CREATE TABLE dempts(
    id bigint COMMENT 'id number',
    num int,
    eno float,
    dno double,
    cno decimal(9,3),
    flag boolean,
    data string,
    ts_year timestamp,
    date_month date,
    bno binary,
    point struct<x: double, y: double>,
    points array<struct<x: double, y: double>>,
    pointmaps map<struct<x: int>, struct<a: int>>
)
COMMENT 'table documentation'
PARTITIONED BY (bucket(16,id), years(ts_year), months(date_month), identity(bno), bucket(3,num), truncate(10,data));