APPROX_COUNT_DISTINCT
Function statement:
APPROX_COUNT_DISTINCT(<expr> any [, <relativeSD> integer|double|decimal])
Supported engine: SparkSQL
Usage instructions: Return the cardinality estimated by HyperLogLog++. relativeSD defines the allowable maximum relative standard deviation.
Return type: bigint
Example:
SELECT approx_count_distinct(col1) FROM (VALUES (1), (1), (2), (2), (3)) tab(col1);
3
AVG
Function statement:
AVG(<expr> integer|double|decimal)
Supported engines: SparkSQL and Presto
Usage instructions: Return the average value calculated based on the group's values.
Return type: double
Example:
> SELECT avg(col) FROM (VALUES (1), (2), (3)) AS tab(col);
2.0
> SELECT avg(col) FROM (VALUES (1), (2), (NULL)) AS tab(col);
1.5
CORR
Function statement:
CORR(<expr> integer|double|decimal, <expr> integer|double|decimal)
Supported engines: SparkSQL and Presto
Usage instructions: Return the Pearson correlation coefficient between a set of numeric pairs.
Return type: double
Example:
> SELECT corr(c1, c2) FROM (VALUES (3, 2), (3, 3), (6, 4)) as tab(c1, c2);
0.8660254037844387
COUNT
Function statement:
COUNT(*)
COUNT([DISTINCT] <col1> ANY, <col2> ANY, ...)
Supported engines: SparkSQL and Presto
Usage instructions:
COUNT(*): Return the total number of retrieved rows, including rows with null values.
COUNT(<col1> ANY, <col2> ANY, ...): Return the number of rows where all provided expressions are non-null.
COUNT([DISTINCT] <col1> ANY, <col2> ANY, ...): Return the number of unique and non-null rows for the provided expressions.
Return type: integer
Example:
> SELECT count(*) FROM (VALUES (NULL), (5), (5), (20)) AS tab(col);
4
> SELECT count(col) FROM (VALUES (NULL), (5), (5), (20)) AS tab(col);
3
> SELECT count(DISTINCT col) (FROM VALUES (NULL), (5), (5), (10)) AS tab(col);
2
COUNT_IF
Function statement:
Supported engines: SparkSQL and Presto
Usage instructions: Return the number of rows where the expression is TRUE.
Return type: int
Example:
> SELECT count_if(col % 2 = 0) FROM (VALUES (NULL), (0), (1), (2), (3)) AS tab(col);
2
> SELECT count_if(col IS NULL) FROM (VALUES (NULL), (0), (1), (2), (3)) AS tab(col);
1
COVER_POP
Function statement:
COVAR_POP(<expr1> integer|double|decimal, <expr2> integer|double|decimal)
Supported engines: SparkSQL and Presto
Usage instructions: Return the population covariance of a set of numeric pairs.
Return type: double
Example:
> SELECT covar_pop(c1, c2) FROM (VALUES (1,1), (2,2), (3,3)) AS tab(c1, c2);
0.6666666666666666
COVER_SAMP
Function statement:
COVAR_SAMP(<expr1> integer|double|decimal, <expr2> integer|double|decimal)
Supported engines: SparkSQL and Presto
Usage instructions: Return the sample covariance of a set of numeric pairs.
Return type: double
Example:
> SELECT covar_samp(c1, c2) FROM (VALUES (1,1), (2,2), (3,3)) AS tab(c1, c2);
1.0
FIRST_VALUE
Function statement:
FIRST_VALUE(<expr> T[, <isIgnoreNull> boolean])
Supported engines: SparkSQL and Presto
Usage instructions: Return the first value of expr in a set of rows. If isIgnoreNull is true, only non-null values are returned.
Return type: T
Example:
> SELECT first_value(col) FROM (VALUES (10), (5), (20)) AS tab(col);
10
> SELECT first_value(col) FROM (VALUES (NULL), (5), (20)) AS tab(col);
NULL
> SELECT first_value(col, true) FROM (VALUES (NULL), (5), (20)) AS tab(col);
5
FIRST
Function statement:
FIRST(<expr> T[, <isIgnoreNull> boolean])
Supported engine: SparkSQL
Usage instructions: Return the first value of expr in a set of rows. If isIgnoreNull is true, only non-null values are returned.
Return type: T
Example:
> SELECT first(col) FROM (VALUES (10), (5), (20)) AS tab(col);
10
> SELECT first(col) FROM (VALUES (NULL), (5), (20)) AS tab(col);
NULL
> SELECT first(col, true) FROM (VALUES (NULL), (5), (20)) AS tab(col);
5
KURTOSIS
Function statement:
KURTOSIS(<expr> integer|double|decimal)
Supported engine: SparkSQL
Usage instructions: Return the kurtosis value calculated based on the group's values.
Return type: double
Example:
> SELECT kurtosis(col) FROM (VALUES (-10), (-20), (100), (1000)) AS tab(col);
-0.7014368047529627
> SELECT kurtosis(col) FROM (VALUES (1), (10), (100), (10), (1)) as tab(col);
0.19432323191699075s
LAST_VALUE
Function statement:
LAST_VALUE(<expr> T[, <isIgnoreNull> boolean])
Supported engines: SparkSQL and Presto
Usage instructions: Return the last value of expr in a set of rows. If isIgnoreNull is true, only non-null values are returned.
Return type: T
Example:
> SELECT last_value(col) FROM (VALUES (10), (5), (20)) AS tab(col);
20
> SELECT last_value(col) FROM (VALUES (10), (5), (NULL)) AS tab(col);
NULL
> SELECT last_value(col, true) FROM (VALUES (10), (5), (NULL)) AS tab(col);
5
LAST
Function statement:
LAST(<expr> T[, <isIgnoreNull> boolean])
Supported engine: SparkSQL
Usage instructions: Return the last value of expr in a set of rows. If isIgnoreNull is true, only non-null values are returned.
Return type: T
Example:
> SELECT last(col) FROM (VALUES (10), (5), (20)) AS tab(col);
20
> SELECT last(col) FROM (VALUES (10), (5), (NULL)) AS tab(col);
NULL
> SELECT last(col, true) FROM (VALUES (10), (5), (NULL)) AS tab(col);
5
MEAN
Function statement:
MEAN(<expr> integer|double|decimal)
Supported engines: SparkSQL and Presto
Usage instructions: Return the average value calculated based on the group's values.
Return type: double
Example:
> SELECT mean(col) FROM (VALUES (1), (2), (3)) AS tab(col);
2.0
> SELECT mean(col) FROM (VALUES (1), (2), (NULL)) AS tab(col);
1.5
PERCENTILE
Function statement:
PERCENTILE(<col> ANY, <percentage> integer|double|decimal|array<double> [, <frequency> integer])
Supported engines: SparkSQL and Presto
Usage instructions: Return the exact percentile value of the numeric column col at the given percentage. The percentage value must be between 0.0 and 1.0. frequency should be a positive integer.
Return type: double
Example:
> SELECT percentile(col, 0.3) FROM (VALUES (0), (10)) AS tab(col);
3.0
> SELECT percentile(col, array(0.25, 0.75)) FROM (VALUES (0), (10)) AS tab(col);
[2.5,7.5]
SKEWNESS
Function statement:
SKEWNESS(<col> integer|double|decimal)
Supported engine: SparkSQL
Usage instructions: Return the skewness value calculated based on the group's values.
Return type: double
Example:
> SELECT skewness(col) FROM (VALUES (-10), (-20), (100), (1000)) AS tab(col);
1.1135657469022011
> SELECT skewness(col) FROM (VALUES (-1000), (-100), (10), (20)) AS tab(col);
-1.1135657469022011
PERCENTILE_APPROX
Function statement:
PERCENTILE_APPROX(<col> integer|double|decimal, <percentage> double|array<double>[, <accuracy> integer])
Supported engines: SparkSQL and Presto
Usage instructions: Return the approximate percentile of the numeric column col, which is the minimum value in the ordered col values (sorted from minimum to maximum) such that no more than the given percentage of col values are less than or equal to this value. The percentage value must be between 0.0 and 1.0. The accuracy parameter (default value: 10,000) is a positive numeric literal that controls the approximation accuracy at the cost of memory. The higher the accuracy value, the better the precision. 1.0/accuracy is the approximate relative error. When percentage is an array, each value in the percentage array must be between 0.0 and 1.0. In this case, an array of approximate percentiles for the column col corresponding to the given percentage array is returned.
Return type: integer |array<integer>
Example:
> SELECT percentile_approx(col, array(0.5, 0.4, 0.1), 100) FROM (VALUES (0), (1), (2), (10)) AS tab(col);
[1,1,0]
> SELECT percentile_approx(col, 0.5, 100) FROM (VALUES (0), (6), (7), (9), (10)) AS tab(col);
7
APPROX_PERCENTILE
Function statement:
APPROX_PERCENTILE(<col> integer|double|decimal, <percentage> double|array<double>[, <accuracy> integer])
Supported engines: SparkSQL and Presto
Usage instructions: Return the approximate percentile of the numeric column col, which is the minimum value in the ordered col values (sorted from minimum to maximum) such that no more than the given percentage of col values are less than or equal to this value. The percentage value must be between 0.0 and 1.0. The accuracy parameter (default value: 10,000) is a positive numeric literal that controls the approximation accuracy at the cost of memory. The higher the accuracy value, the better the precision. 1.0/accuracy is the approximate relative error. When percentage is an array, each value in the percentage array must be between 0.0 and 1.0. In this case, an array of approximate percentiles for the column col corresponding to the given percentage array is returned.
Return type: integer |array<integer>
Example:
> SELECT APPROX_PERCENTILE(col, array(0.5, 0.4, 0.1), 100) FROM (VALUES (0), (1), (2), (10)) AS tab(col);
[1,1,0]
> SELECT APPROX_PERCENTILE(col, 0.5, 100) FROM (VALUES (0), (6), (7), (9), (10)) AS tab(col);
7
MAX
Function statement:
MAX(<col> integer|double|decimal)
Supported engines: SparkSQL and Presto
Usage instructions: Return the maximum value of col.
Return type: same as col
Example:
> SELECT max(col) FROM (VALUES (10), (50), (20)) AS tab(col);
50
MAX_BY
Function statement:
MAX_BY(<x> T, <y> integer|double|decimal)
Supported engines: SparkSQL and Presto
Usage instructions: Return the x value associated with the maximum y value.
Return type: T
Example:
> SELECT max_by(x, y) FROM (VALUES (('a', 10)), (('b', 50)), (('c', 20))) AS tab(x, y);
b
MIN
Function statement:
MIN(<col> integer|double|decimal)
Supported engines: SparkSQL and Presto
Usage instructions: Return the minimum value of col.
Return type: same as col
Example:
> SELECT min(col) FROM (VALUES (10), (50), (20)) AS tab(col);
10
MIN_BY
Function statement:
MIN_BY(<x> T, <y> integer|double|decimal)
Supported engines: SparkSQL and Presto
Usage instructions: Return the x value associated with the minimum y value.
Return type: T
Example:
> SELECT min_by(x, y) FROM (VALUES (('a', 10)), (('b', 50)), (('c', 20))) AS tab(x, y);
a
STD
Function statement:
STD(<col> integer|double|decimal)
Supported engines: SparkSQL and Presto
Usage Instructions
SparkSQL: Return the sample standard deviation calculated based on the group's values.
Presto: Return the population standard deviation calculated based on the group's values.
Return type: double
Example:
> SELECT std(col) FROM (VALUES (1), (2), (3)) AS tab(col);
1.0
STDDEV
Function statement:
STDDEV(<col> integer|double|decimal)
Supported engines: SparkSQL and Presto
Usage instructions: Return the sample standard deviation calculated based on the group's values.
Return type: double
Example:
> SELECT stddev(col) FROM (VALUES (1), (2), (3)) AS tab(col);
1.0
STDDEV_POP
Function statement:
STDDEV_POP(<col> integer|double|decimal)
Supported engines: SparkSQL and Presto
Usage instructions: Return the population standard deviation calculated based on the group's values.
Return type: double
Example:
> SELECT stddev_pop(col) FROM (VALUES (1), (2), (3)) AS tab(col);
0.816496580927726
STDDEV_SAMP
Function statement:
STDDEV_SAMP(<col> integer|double|decimal)
Supported engines: SparkSQL and Presto
Usage instructions: Return the sample standard deviation calculated based on the group's values.
Return type: double
Example:
> SELECT stddev_samp(col) FROM (VALUES (1), (2), (3)) AS tab(col);
1.0
SUM
Function statement:
SUM(<col> integer|double|decimal)
Supported engines: SparkSQL and Presto
Usage instructions: Return the sum calculated based on the group's values.
Return type: same as col
Example:
> SELECT sum(col) FROM (VALUES (5), (10), (15)) AS tab(col);
30
> SELECT sum(col) FROM (VALUES (NULL), (10), (15)) AS tab(col);
25
> SELECT sum(col) FROM (VALUES (NULL), (NULL)) AS tab(col);
NULL
VARIANCE
Function statement:
VARIANCE(<col> integer|double|decimal)
Supported engines: SparkSQL and Presto
Usage instructions: Return the sample variance calculated based on the group's values.
Return type: double
Example:
> SELECT VARIANCE(col) FROM (VALUES (1), (2), (3)) AS tab(col);
1.0
VAR_POP
Function statement:
VAR_POP(<col> integer|double|decimal)
Supported engines: SparkSQL and Presto
Usage instructions: Return the population variance calculated based on the group's values.
Return type: double
Example:
> SELECT var_pop(col) FROM (VALUES (1), (2), (3)) AS tab(col);
0.6666666666666666
VAR_SAMP
Function statement:
VAR_SAMP(<col> integer|double|decimal)
Supported engines: SparkSQL and Presto
Usage instructions: Return the sample variance calculated based on the group's values.
Return type: double
Example:
> SELECT var_samp(col) FROM (VALUES (1), (2), (3)) AS tab(col);
1.0
HISTOGRAM_NUMERIC
Function statement:
HISTOGRAM_NUMERIC(<col> integer, <nb> integer)
Supported engine: Presto
Usage instructions: Compute a histogram of a numeric column in the group using nb bins with non-uniform intervals. The output is an array of nb (x, y) coordinates, representing the center and height of the bins.
Return type: array<struct {'x','y'}>
Example:
> SELECT histogram_numeric(col, 5) FROM (VALUES (0), (1), (2), (10)) AS tab(col);
[{"x":0,"y":1.0},{"x":1,"y":1.0},{"x":2,"y":1.0},{"x":10,"y":1.0}]
COLLECT_LIST
Function statement:
Supported engines: SparkSQL and Presto
Usage instructions: Collect and return a list of non-unique elements.
Return type: array<T>
Example:
> SELECT collect_list(col) FROM (VALUES (1), (2), (1)) AS tab(col);
[1,2,1]
COLLECT_SET
Function statement:
Supported engines: SparkSQL and Presto
Usage instructions: Collect and return a set of unique elements.
Return type: array<T>
Example:
> SELECT collect_set(col) FROM (VALUES (1), (2), (1)) AS tab(col);
[1,2]
COUNT_MIN_SKETCH
Function statement:
COUNT_MIN_SKETCH(<col> T, <eps> double, <confidence> double, <seed> integer)
Supported engine: SparkSQL
Usage instructions: Return the Count-Min Sketch of a column with the given esp, confidence, and seed. The result is binary and can be deserialized into CountMinSketch before use.
Return type: binary
Example:
> SELECT hex(count_min_sketch(col, 0.5d, 0.5d, 1)) FROM (VALUES (1), (2), (1)) AS tab(col);
0000000100000000000000030000000100000004000000005D8D6AB90000000000000000000000000000000200000000000000010000000000000000
EVERY
Function statement:
Supported engines: SparkSQL and Presto
Usage instructions: Return true if all values of col are true.
Return type: boolean
Example:
> SELECT every(col) FROM (VALUES (true), (true), (true)) AS tab(col);
true
> SELECT every(col) FROM (VALUES (NULL), (true), (true)) AS tab(col);
true
> SELECT every(col) FROM (VALUES (true), (false), (true)) AS tab(col);
false
BOOL_AND
Function statement:
Supported engines: SparkSQL and Presto
Usage instructions: Return true if all values of col are true.
Return type: boolean
Example:
> SELECT bool_and(col) FROM (VALUES (true), (true), (true)) AS tab(col);
true
> SELECT bool_and(col) FROM (VALUES (NULL), (true), (true)) AS tab(col);
true
> SELECT bool_and(col) FROM (VALUES (true), (false), (true)) AS tab(col);
false
AND
Function statement:
Supported engines: SparkSQL and Presto
Usage Instructions
Logical AND
Return type: boolean
Example:
> SELECT true and true;
true
> SELECT true and false;
false
> SELECT true and NULL;
NULL
> SELECT false and NULL;
false
OR
Function statement:
Supported engines: SparkSQL and Presto
Usage instructions: Logical OR
Return type: boolean
Example:
> SELECT true or false;
true
> SELECT false or false;
false
> SELECT true or NULL;
true
> SELECT false or NULL;
NULL
ANY
Function statement:
Supported engines: SparkSQL and Presto
Usage instructions: Return true if at least one value of col is true.
Return type: boolean
Example:
> SELECT any(col) FROM (VALUES (true), (false), (false)) AS tab(col);
true
> SELECT any(col) FROM (VALUES (NULL), (true), (false)) AS tab(col);
true
> SELECT any(col) FROM (VALUES (false), (false), (NULL)) AS tab(col);
false
SOME
Function statement:
Supported engines: SparkSQL and Presto
Usage instructions: Return true if at least one value of col is true.
Return type: boolean
Example:
> SELECT some(col) FROM (VALUES (true), (false), (false)) AS tab(col);
true
> SELECT some(col) FROM (VALUES (NULL), (true), (false)) AS tab(col);
true
> SELECT some(col) FROM (VALUES (false), (false), (NULL)) AS tab(col);
false
BOOL_OR
Function statement:
Supported engines: SparkSQL and Presto
Usage instructions: Return true if at least one value of col is true.
Return type: boolean
Example:
> SELECT BOOL_OR(col) FROM (VALUES (true), (false), (false)) AS tab(col);
true
> SELECT BOOL_OR(col) FROM (VALUES (NULL), (true), (false)) AS tab(col);
true
> SELECT BOOL_OR(col) FROM (VALUES (false), (false), (NULL)) AS tab(col);
false
BIT_AND
Function statement:
BIT_AND(<col> integer|bigint)
Supported engines: SparkSQL and Presto
Usage instructions: Return the bitwise AND of all non-null input values, or null if there are none.
Return type: same as col
Example:
> SELECT bit_and(col) FROM (VALUES (3), (5)) AS tab(col);
1
BIT_OR
Function statement:
BIT_OR(<col> integer|bigint)
Supported engines: SparkSQL and Presto
Usage instructions: Return the bitwise OR of all non-null input values, or null if there are none.
Return type: same as col
Example:
> SELECT bit_or(col) FROM (VALUES (3), (5)) AS tab(col);
7
BIT_XOR
Function statement:
BIT_XOR(<col> integer|bigint)
Supported engines: SparkSQL and Presto
Usage instructions: Return the bitwise XOR of all non-null input values, or null if there are none.
Return type: same as col
Example:
> SELECT bit_xor(col) FROM (VALUES (3), (5)) AS tab(col);
6
ARG_MIN
Function statement:
ARG_MIN(<col1>, <col2> | expr(col2))
Supported engine: SparkSQ
Usage instructions: Return the row with the minimum value of the specified column col1 based on the specified calculation expression.
col1: For sortable column names, if a constant is provided, any value may be returned. If there are multiple minimum values, one of them is returned randomly.
Return type: same as col2 or expr(col2) type
Example:
> SELECT arg_min(dt, uid) from values (1, 'm1'), (2, 't2'), (3, 'z3') as tab(dt, uid);
m1
> SELECT arg_min(dt, upper(uid)) from values (1, 'm1', 1), (2, 't2', 1), (3, 'z3', 2) as tab(dt, uid, gid) group by gid;
M1
Z3
ARG_MAX
Function statement:
ARG_MAX(<col1>, <col2> | expr(col2))
Supported engine: SparkSQL
Usage instructions: Return the row with the maximum value of the specified column col1 based on the specified calculation expression.
col1: For sortable column names, if a constant is provided, any value may be returned. If there are multiple maximum values, one of them is returned randomly.
Return type: same as col2 or expr(col2) type
Example:
> SELECT arg_max(dt, uid) from values (1, 'm1'), (2, 't2'), (3, 'z3') as tab(dt, uid);
z3
> SELECT arg_max(dt, upper(uid)) from values (1, 'm1', 1), (2, 't2', 1), (3, 'z3', 2) as tab(dt, uid, gid) group by gid;
T2
Z3
MAP_UNION_SUM
Function statement:
MAP_UNION_SUM(map<k, v> input)
Supported engine: SparkSQL
Usage instructions: Specify the column and sum all the values in the map by key.
Note 1: Aggregated data is allowed to be null. If all are null, the output is null.
Note 2: The value of the key is allowed to be null, in which case it will be converted to the number 0.
Return type: same as input type
Example:
> SELECT map_union_sum(ids) from values (map(1, 1)), (map(1,2)), (map(2,1)) as tab(ids);
{1:3,2:1}
> SELECT idx, map_union_sum(ids) from values (map(1, 1), 1), (map(1,2), 1), (map(2,1), 1) as tab(ids, idx) group by idx;
1 {1:3,2:1}
Was this page helpful?