FIELD
Function statement:
FIELD(<val> T, <val1> T, <val2> T, ...)
Supported engine: Presto
Usage instructions: Return the 1-based index of val in the list val1, val2, etc. If it is not found, 0 is returned.
All primitive types are supported, and parameters are compared using equals(). If val is NULL, the return value is 0.
Return type: integer
Example:
> select field('world', 'say', 'hello', 'world');
3
COALESCE
Function statement:
COALESCE(<expr1> T, <expr2> T, ...)
Supported engines: SparkSQL and Presto
Usage instructions: Return the first non-null parameter, if any. Otherwise, null is returned.
Return type: T
Example:
> SELECT coalesce(NULL, 1, NULL);
1
EXPLODE
Function statement:
EXPLODE(<expr> array<T>|map<K, V>)
Supported engine: SparkSQL
Usage instructions: Separate the elements of an array-type expr into multiple rows, or separate a map-type expr into multiple rows and columns. Use the default column name col for array elements, or key and value for map elements.
Return type: row(col T) | row(key K, value V)
Example:
SELECT explode(array(10, 20));
10
20
EXPLODE_OUTER
Function statement:
EXPLODE_OUTER(<expr> array<T>|map<K, V>)
Supported engine: SparkSQL
Usage instructions: Separate the elements of an array-type expr into multiple rows, or separate a map-type expr into multiple rows and columns. Use the default column name col for array elements, or key and value for map elements. Unlike EXPLODE, if the array or map is null or empty, a single row of NULL values is returned.
Return type: row(col T) | row(key K, value V)
Example:
SELECT explode_outer(array(10, 20));
10
20
GREATEST
Function statement:
GREATEST(<expr1> T, <expr2> T, ...)
Supported engines: SparkSQL and Presto
Usage instructions: Return the maximum value among all parameters, ignoring null values.
Return type: T
Example:
> SELECT greatest(10, 9, 2, 4, 3);
10
IF
Function statement:
IF(<expr1> boolean, <expr2> T, <expr3> U)
Supported engines: SparkSQL and Presto
Usage instructions: Return expr2 if expr1 evaluates to true; otherwise, return expr3.
Return type: T|U
Example:
> SELECT if(1 < 2, 'a', 'b');
a
INLINE
Function statement:
INLINE(a array<struct<f1:T1,...,fn:Tn>>)
Supported engine: SparkSQL
Usage instructions: Convert an array of structs into a table. By default, col1, col2, etc. are used as column names.
Return type: row(T1, ..., Tn)
Example:
> SELECT inline(array(struct(1, 'a'), struct(2, 'b')));
1 a
2 b
INLINE_OUTER
Function statement:
INLINE_OUTER(a array<struct<f1:T1,...,fn:Tn>>)
Supported engine: SparkSQL
Usage instructions: Convert an array of structs into a table. By default, col1, col2, etc. are used as column names. Unlike INLINE, if the array is null or empty, a single row of NULL values is returned.
Return type: row(T1, ..., Tn)
Example:
> SELECT inline_outer(array(struct(1, 'a'), struct(2, 'b')));
1 a
2 b
IN
Function statement:
<expr1> IN(<expr2> T, <expr3> T, ...)
Supported engines: SparkSQL and Presto
Usage instructions: Return true if expr1 is equal to any of expr2, expr3, etc.
Return type: boolean
Example:
> SELECT 1 in(1, 2, 3);
true
> SELECT 1 in(2, 3, 4);
false
ISNAN
Function statement:
ISNAN(<expr> double)
Supported engines: SparkSQL and Presto
Usage instructions: Return true if expr is NaN; otherwise, return false.
Return type: boolean
Example:
> SELECT isnan(cast('NaN' as double));
true
IFNULL
Function statement:
IFNULL(<expr1> T, <expr2> U)
Supported engine: SparkSQL
Usage instructions: Return expr2 if expr1 is null; otherwise, return expr1.
Return type: T|U
Example:
> SELECT ifnull(NULL, array('2'));
["2"]
ISNULL
Function statement:
ISNULL(<expr> T)
Supported engines: SparkSQL and Presto
Usage instructions: Return true if expr is null; otherwise, return false.
Return type: boolean
Example:
> SELECT isnull(1);
false
ISNOTNULL
Function statement:
ISNOTNULL(<expr> T)
Supported engines: SparkSQL and Presto
Usage instructions: Return true if expr is not null; otherwise, return false.
Return type: boolean
Example:
> SELECT isnotnull(1);
true
LEAST
Function statement:
LEAST(<expr1> T, <expr2> T, ...)
Supported engines: SparkSQL and Presto
Usage instructions: Return the minimum value among all parameters, ignoring null values.
Return type: T
Example:
> SELECT least(10, 9, 2, 4, 3);
2
NANVL
Function statement:
NANVL(<expr1> T, <expr2> U)
Supported engines: SparkSQL and Presto
Usage instructions: Return expr1 if expr1 is not NaN; otherwise, return expr2.
Return type: T|U
Example:
> SELECT nanvl(cast('NaN' as double), 123);
123.0
NULLIF
Function statement:
NULLIF(<expr1> T, <expr2> U)
Supported engines: SparkSQL and Presto
Usage instructions: Return null if expr1 is equal to expr2; otherwise, return expr1.
Return type: T
Example:
> SELECT nullif(2, 2);
NULL
NVL
Function statement:
NVL(<expr1> T, <expr2> U)
Supported engines: SparkSQL and Presto
Usage instructions: Return expr2 if expr1 is null; otherwise, return expr1.
Return type: T|U
Example:
> SELECT nvl(NULL, array('2'));
["2"]
NVL2
Function statement:
NVL2(<expr1> T1, <expr2> T2, <expr3> T3)
Supported engines: SparkSQL and Presto
Usage instructions: Return expr2 if expr1 is not null; otherwise, return expr3.
Return type: T2|T3
Example:
> SELECT nvl2(NULL, 2, 1);
1
POSEXPLODE
Function statement:
POSEXPLODE(<expr> array<T>|map<K, V>)
Supported engine: SparkSQL
Usage instructions: Separate the elements of an array-type expr into multiple rows, or separate a map-type expr into multiple rows and columns. Use the column name pos to represent position, with the default column name col for array elements, or key and value for map elements.
Return type: row(pos integer, col T)|row(pos integer, key K, value V)
Example:
> SELECT posexplode(array(10,20));
0 10
1 20
POSEXPLODE_OUTER
Function statement:
POSEXPLODE_OUTER(<expr> array<T>|map<K, V>)
Supported engine: SparkSQL
Usage instructions: Separate the elements of an array-type expr into multiple rows, or separate a map-type expr into multiple rows and columns. Use the column name pos to represent position, with the default column name col for array elements, or key and value for map elements.
Return type: row(pos integer, col T)|row(pos integer, key K, value V)
Example:
> SELECT posexplode_outer(array(10,20));
0 10
1 20
STACK
Function statement:
STACK(<n> integer, <expr1> T1, ..., <exprk> Tk)
Supported engine: SparkSQL
Usage instructions: Separate expr1, ..., exprk into n rows. By default, col0, col1, etc. are used as column names.
Return type: row(col0 T0, ..., coln Tn)
Example:
> SELECT stack(2, 1, 2, 3);
1 2
3 NULL
ASSERT_TRUE
Function statement:
ASSERT_TRUE(<expr> boolean)
Supported engines: SparkSQL and Presto
Usage instructions: Throw an exception if expr is not true.
Return type: boolean
Example:
> SELECT assert_true(0 < 1);
NULL
RAISE_ERROR
Function statement:
RAISE_ERROR(<error> string)
Supported engines: SparkSQL and Presto
Usage instructions: Throw an exception with error as the message.
Return type: string
Example:
> SELECT raise_error('custom error message');
java.lang.RuntimeException
custom error message
SPARK_PARTITION_ID
Function statement:
SPARK_PARTITION_ID()
Supported engine: SparkSQL
Usage instructions: Return the current partition ID.
Return type: integer
Example:
> SELECT spark_partition_id();
0
INPUT_FILE_NAME
Function statement:
INPUT_FILE_NAME()
Supported engine: SparkSQL
Usage instructions: Return the name of the file being read. If the file name is unavailable, return an empty string.
Return type: string
Example:
> SELECT input_file_name();
INPUT_FILE_BLOCK_START
Function statement:
INPUT_FILE_BLOCK_START()
Supported engine: SparkSQL
Usage instructions: Return the starting offset of the block being read. If the offset is unavailable, return -1.
Return type: bigint
Example:
> SELECT input_file_block_start();
-1
INPUT_FILE_BLOCK_LENGTH
Function statement:
INPUT_FILE_BLOCK_LENGTH()
Supported engine: SparkSQL
Usage instructions: Return the length of the block being read. If the length is unavailable, return -1.
Return type: bigint
Example:
> SELECT input_file_block_length();
-1
MONOTONICALLY_INCREASING_ID
Function statement:
MONOTONICALLY_INCREASING_ID()
Supported engine: SparkSQL
Usage instructions: Return a monotonically increasing 64-bit integer. The generated ID is guaranteed to be monotonically increasing and unique, but not consecutive. The current implementation puts the partition ID in the high 31 bits and the record number within each partition in the low 33 bits. This assumes the data frame has fewer than 1 billion partitions and fewer than 8 billion records per partition. This function is non-deterministic because its result depends on the partition ID.
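For example, under this layout the first record of partition 1 receives the ID 1 × 2^33 = 8589934592.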
Return type: bigint
Example:
> SELECT monotonically_increasing_id();
0
CURRENT_DATABASE
Function statement:
CURRENT_DATABASE()
Supported engine: SparkSQL
Usage instructions: Return the current database.
Return type: string
Example:
> SELECT current_database();
default
CURRENT_CATALOG
Function statement:
CURRENT_CATALOG()
Supported engine: SparkSQL
Usage instructions: Return the current catalog.
Return type: string
Example:
> SELECT current_catalog();
spark_catalog
CURRENT_USER
Function statement:
CURRENT_USER()
Supported engines: SparkSQL and Presto
Usage instructions: Return the current user.
Return type: string
Example:
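> SELECT current_user();
(The output is the user name of the current session.)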
REFLECT
Function statement:
REFLECT(<class> string, <method> string[, <arg1> T1[, <arg2> T2, ...]])
Supported engines: SparkSQL and Presto
Usage instructions: Call a method with reflection.
Return type: string
Example:
> select reflect('java.lang.Math', 'abs', -1);
1
JAVA_METHOD
Function statement:
JAVA_METHOD(<class> string, <method> string[, <arg1> T1[, <arg2> T2, ...]])
Supported engines: SparkSQL and Presto
Usage instructions: Call a method with reflection.
Return type: string
Example:
> select JAVA_METHOD('java.lang.Math', 'abs', -1);
1
VERSION
Function statement:
VERSION()
Supported engines: SparkSQL and Presto
Usage instructions: Return the engine version.
Return type: string
Example:
> select VERSION();
3.0.0 rce61711a5fa54ab34fc74d86d521ecaeea6b072a
TYPEOF
Function statement:
TYPEOF(<expr> T)
Supported engines: SparkSQL and Presto
Usage instructions: Return the data type of expr.
Return type: string
Example:
> SELECT typeof(1);
int
> SELECT typeof(array(1));
array<int>
CAST
Function statement:
CAST(<expr> AS <type>)
Supported engines: SparkSQL and Presto
Usage instructions: Convert expr to the specified type.
Return type: <type>
Example:
> SELECT cast('10' as int);
10
BOOLEAN
Function statement:
BOOLEAN(<expr>)
Supported engines: SparkSQL and Presto
Usage instructions: Convert expr to the boolean type.
Return type: boolean
Example:
> SELECT boolean(1);
true
BIGINT
Function statement:
BIGINT(<expr>)
Supported engines: SparkSQL and Presto
Usage instructions: Convert expr to the bigint type.
Return type: bigint
Example:
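> select typeof(bigint(1));
bigint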
BINARY
Function statement:
BINARY(<expr>)
Supported engines: SparkSQL and Presto
Usage instructions: Convert expr to the binary type.
Return type: binary
Example:
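> select typeof(binary('1'));
binary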
DOUBLE
Function statement:
DOUBLE(<expr>)
Supported engines: SparkSQL and Presto
Usage instructions: Convert expr to the double type.
Return type: double
Example:
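> select typeof(double(1));
double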
FLOAT
Function statement:
FLOAT(<expr>)
Supported engines: SparkSQL and Presto
Usage instructions: Convert expr to the float type.
Return type: float
Example:
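> select typeof(float(1));
float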
INT
Function statement:
INT(<expr>)
Supported engines: SparkSQL and Presto
Usage instructions: Convert expr to the integer type.
Return type: integer
Example:
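> select typeof(int(1));
int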
SMALLINT
Function statement:
SMALLINT(<expr>)
Supported engines: SparkSQL and Presto
Usage instructions: Convert expr to the smallint type.
Return type: smallint
Example:
> select typeof(smallint(1));
smallint
STRING
Function statement:
STRING(<expr>)
Supported engines: SparkSQL and Presto
Usage instructions: Convert expr to the string type.
Return type: string
Example:
> select typeof(string(1));
string
TINYINT
Function statement:
TINYINT(<expr>)
Supported engines: SparkSQL and Presto
Usage instructions: Convert expr to the tinyint type.
Return type: tinyint
Example:
> select typeof(tinyint(1));
tinyint
DECIMAL
Function statement:
DECIMAL(<expr>)
Supported engines: SparkSQL and Presto
Usage instructions: Convert expr to the decimal type.
Return type: decimal
Example:
> select typeof(decimal(1));
decimal(10, 0)
GET_IDCARD_BIRTHDAY
Function statement:
GET_IDCARD_BIRTHDAY(<string> idcardno)
Supported engine: SparkSQL
Usage instructions: Obtain the birth date from the ID card number.
idcardno: Required, string type. Must be a valid 15-digit or 18-digit ID card number; validity is verified, and any other non-null input is rejected. Return null if the input is null.
Return type: date
Example: (The following test ID numbers are random samples and do not represent valid ID numbers.)
> SELECT get_idcard_birthday('421081199001011222');
1990-01-01
GET_IDCARD_SEX
Function statement:
GET_IDCARD_SEX(<string> idcardno)
Supported engine: SparkSQL
Usage instructions: Obtain the gender from the ID card number.
idcardno: Required, string type. Must be a valid 15-digit or 18-digit ID card number; validity is verified, and any other non-null input is rejected. Return null if the input is null.
Return type: string
Example: (The following test ID numbers are random samples and do not represent valid ID numbers.)
> SELECT get_idcard_sex('421081199001011222');
GET_IDCARD_AGE
Function statement:
GET_IDCARD_AGE(<string> idcardno)
Supported engine: SparkSQL
Usage instructions: Obtain the age from the ID card number.
idcardno: Required, string type. Must be a valid 15-digit or 18-digit ID card number; validity is verified, and any other non-null input is rejected. Return null if the input is null.
Return type: int
Example: (The following test ID numbers are random samples and do not represent valid ID numbers.)
> SELECT get_idcard_age('421081197001021233');
53
MAX_PT
Function statement:
MAX_PT(<const string> tableFullName)
Supported engine: SparkSQL
Usage instructions: Retrieve the maximum partition value of the specified partitioned table.
tableFullName: Required, string type. Must be a constant value, otherwise an error will occur. tableFullName consists of three parts: catalog.db.table. If catalog and db are omitted, the current session's configuration is used by default, but it is recommended to specify the full name.
An error will occur if the input table is not a partitioned table.
This function only returns the maximum partition that has data. Since data presence involves the metadata center, for external tables, ensure that you have executed the ANALYZE statement to report the partition's statistics to the metadata.
Partition values are sorted in lexicographic order to determine the maximum. When there are multiple partition levels, only the first-level partition is sorted.
Return type: string
Example:
> SELECT max_pt('test.tableName');
20231024
> select * from test.tableName where dt=max_pt('test.tableName');
Equivalent to select * from test.tableName where dt='20231024';
TRANS_ARRAY
Function statement:
TRANS_ARRAY(<int> numKeys, <string> separator, <key1>, <key2>, ..., <col1>, <col2>, ...)
Supported engine: SparkSQL
Usage instructions: Split and transpose the specified columns cols into multiple rows, while supporting the specification of certain columns keys to be used as keys for the transposition.
numKeys: int type, required. Indicates the number of key columns used for transposition. Must be greater than or equal to 0.
separator: String type, required. When cols is a string, it needs to be split based on the separator; when cols is an array, this parameter can be filled arbitrarily.
keys: Any type, the number is determined by numKeys.
cols: String type or array type. All specified columns other than the keys are treated as cols, i.e., the columns to be split and transposed. All cols must be of the same type: either all strings or all arrays. A combination of strings and string arrays is also supported. If there are no cols, only one row will be output.
Note 1: The sum of the quantities of keys and cols must be greater than 0.
Note 2: When the lengths of the split cols are not equal, the number of rows after the final transpose is based on the longest column, with other columns padded with NULL.
Return type: any. The type of the keys remains unchanged; the type of each col is the element type of the string or array.
Example:
> SELECT trans_array(1, ',', key, trans1, trans2) as (key, col1, col2) from values ('1', '2,3', array('4', '5')) as tab (key, trans1, trans2);
1 2 4
1 3 5
> SELECT trans_array(0, ',', key, trans1, trans2) as (key, col1, col2) from values ('1', '2,3', array('4')) as tab (key, trans1, trans2);
1 2 4
NULL 3 NULL
> SELECT trans_array(3, ',', key, trans1, trans2) as (key, col1, col2) from values ('1', '2,3', array('4', '5')) as tab (key, trans1, trans2);
1 2,3 [4,5]
TRANS_COLS
Function statement:
TRANS_COLS(<int> numKeys, <key1>, <key2>, ..., <col1>, <col2>, ...)
Supported engine: SparkSQL
Usage instructions: Transpose the specified multiple columns cols into multiple rows, while also supporting the specification of certain columns keys as the keys for the transposition.
numKeys: int type, required. Indicates the number of keys columns used for transposition. Must be greater than or equal to 0.
keys: Any type, the number is determined by numKeys.
cols: Any type, all specified columns excluding keys are considered as cols, i.e., the columns to be transposed. All cols must be of the same type. When there are no cols, only one row will be output.
Note: The sum of the quantities of keys and cols must be greater than 0.
Return type: any. The output includes a column idx, indicating the index of the row among all transposed rows. The type of the keys remains unchanged, and the type of the cols remains unchanged.
Example:
> SELECT trans_cols(1,sid,ip1,ip2) as (idx, sid, ip) from values ('s1','0.0.0.0','1.1.1.1') as tab(sid,ip1,ip2);
1 s1 0.0.0.0
2 s1 1.1.1.1
SAMPLE
Function statement:
SAMPLE(<int/long> totalParts[, <int/long> pointParts][, <col1>, <col2>, ...])
Supported engine: SparkSQL
Usage instructions: Sampling function. Divide the data into totalParts parts, globally or by the hash value of the specified columns, and return the part specified by pointParts.
totalParts: int/long type, required. Indicates the total number of partitions needed.
pointParts: int/long type, optional. Indicates which part of the data to return. The default value is 1 and must be less than or equal to totalParts.
cols: Any type, optional. You can specify any number of columns as keys for dividing the data. The data will be partitioned based on the hash values of these columns.
Note 1: When cols are not specified, or when all the specified cols are NULL, to avoid data skew, random values will be used instead of hash values for calculation. When cols are specified, as long as at least one column is not NULL, it will affect the hash value calculation and partitioning result.
Note 2: Since the seeds for the hash values and random values are fixed, if the data order remains consistent, the data sampled within a single partition will also be consistent across repeated executions. In scenarios involving small-file merges, the data order (and therefore the sample) may change; it is recommended to specify columns for sampling in such cases.
Return type: boolean, where true indicates the data has been sampled, and false indicates the data has not been sampled.
Example:
> select * from values (1, 'm1'), (2, 't2'), (3, 'z3') as tab(dt, uid) where sample(3, 1);
2 t2
> select * from values (1, 'm1'), (2, 't2'), (3, 'z3') as tab(dt, uid) where sample(3, 1, dt);
2 t2
3 z3
TO_CHAR
Function statement:
TO_CHAR(<boolean|int|long|double|decimal> data [, <string> format])
TO_CHAR(<date|timestamp> data, <string> format)
Supported engine: SparkSQL
Usage instructions 1: Format numbers according to the specified template. The format can be omitted, which directly converts the number to a string. Otherwise, the format is defined as follows:
'0' or '9': Digit placeholder. At the beginning of the string, a 0 placeholder fills missing digits with 0, while a 9 placeholder fills them with spaces. At the end of the string, missing digits are always filled with 0.
'.' or 'D': Placeholder for the decimal point, which can appear only once.
',' or 'G': Group separator (for thousands, millions, etc.); it must be surrounded by digits 0-9.
'$': Dollar sign, which can appear only once.
'S' or 'MI': Placeholder for the positive or negative sign, which can appear at most once, at the beginning or end of the string.
'PR': Allowed to appear only once, at the end of the string. When the input is negative, the number is output as its absolute value wrapped in angle brackets.
Usage instructions 2: Format the date or time according to the specified template. The format cannot be omitted and is defined as follows:
A template composed of any characters from yyyyMMdd HH:mm:ss.SSS.
Usage instructions 3: TO_CHAR supports implicit conversion of the type of the first parameter. When the first parameter is of string type, the function will implicitly convert it to a timestamp and apply the date or time formatting logic based on the template.
Return type: string
Example:
> select to_char(124.23);
124.23
> select to_char(124.23, '00999.9999');
<space><space>124.2300
> select to_char(4124.23, '9,999.99');
4,124.23
> select to_char(-124.23, '999.99S');
124.23-
> select to_char(-4124.23, '$9,999.99PR');
<$4,124.23>
> select to_char(timestamp '2016-12-31 12:34:56', 'yyyyMMddHHmmss');
20161231123456
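Per usage instruction 3, when the first parameter is a string it is implicitly converted to a timestamp before formatting; for example:
> select to_char('2016-12-31 12:34:56', 'yyyyMMdd');
20161231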