tencent cloud

Feedback

Collection Functions

Last updated: 2024-08-07 17:33:09

    APPROX_COUNT_DISTINCT

    Function statement:
    APPROX_COUNT_DISTINCT(<expr> any [, <relativeSD> integer|double|decimal])
    Supported engine: SparkSQL
    Usage instructions: Return the cardinality estimated by HyperLogLog++. relativeSD defines the allowable maximum relative standard deviation.
    Return type: bigint
    Example:
    SELECT approx_count_distinct(col1) FROM (VALUES (1), (1), (2), (2), (3)) tab(col1);
    3

    AVG

    Function statement:
    AVG(<expr> integer|double|decimal)
    Supported engines: SparkSQL and Presto
    Usage instructions: Return the average value calculated based on the group's values.
    Return type: double
    Example:
    > SELECT avg(col) FROM (VALUES (1), (2), (3)) AS tab(col);
    2.0
    > SELECT avg(col) FROM (VALUES (1), (2), (NULL)) AS tab(col);
    1.5

    CORR

    Function statement:
    CORR(<expr> integer|double|decimal, <expr> integer|double|decimal)
    Supported engines: SparkSQL and Presto
    Usage instructions: Return the Pearson correlation coefficient between a set of numeric pairs.
    Return type: double
    Example:
    > SELECT corr(c1, c2) FROM (VALUES (3, 2), (3, 3), (6, 4)) as tab(c1, c2);
    0.8660254037844387

    COUNT

    Function statement:
    COUNT(*)
    COUNT([DISTINCT] <col1> ANY, <col2> ANY, ...)
    Supported engines: SparkSQL and Presto
    Usage instructions: COUNT(*): Return the total number of retrieved rows, including rows with null values. COUNT(<col1> ANY, <col2> ANY, ...): Return the number of rows where all provided expressions are non-null. COUNT([DISTINCT] <col1> ANY, <col2> ANY, ...): Return the number of unique and non-null rows for the provided expressions.
    Return type: integer
    Example:
    > SELECT count(*) FROM (VALUES (NULL), (5), (5), (20)) AS tab(col);
    4
    > SELECT count(col) FROM (VALUES (NULL), (5), (5), (20)) AS tab(col);
    3
    > SELECT count(DISTINCT col) (FROM VALUES (NULL), (5), (5), (10)) AS tab(col);
    2

    COUNT_IF

    Function statement:
    COUNT_IF(<expr> ANY)
    Supported engines: SparkSQL and Presto
    Usage instructions: Return the number of rows where the expression is TRUE.
    Return type: int
    Example:
    > SELECT count_if(col % 2 = 0) FROM (VALUES (NULL), (0), (1), (2), (3)) AS tab(col);
    2
    > SELECT count_if(col IS NULL) FROM (VALUES (NULL), (0), (1), (2), (3)) AS tab(col);
    1

    COVER_POP

    Function statement:
    COVAR_POP(<expr1> integer|double|decimal, <expr2> integer|double|decimal)
    Supported engines: SparkSQL and Presto
    Usage instructions: Return the population covariance of a set of numeric pairs.
    Return type: double
    Example:
    > SELECT covar_pop(c1, c2) FROM (VALUES (1,1), (2,2), (3,3)) AS tab(c1, c2);
    0.6666666666666666

    COVER_SAMP

    Function statement:
    COVAR_SAMP(<expr1> integer|double|decimal, <expr2> integer|double|decimal)
    Supported engines: SparkSQL and Presto
    Usage instructions: Return the sample covariance of a set of numeric pairs.
    Return type: double
    Example:
    > SELECT covar_samp(c1, c2) FROM (VALUES (1,1), (2,2), (3,3)) AS tab(c1, c2);
    1.0

    FIRST_VALUE

    Function statement:
    FIRST_VALUE(<expr> T[, <isIgnoreNull> boolean])
    Supported engines: SparkSQL and Presto
    Usage instructions: Return the first value of expr in a set of rows. If isIgnoreNull is true, only non-null values are returned.
    Return type: T
    Example:
    > SELECT first_value(col) FROM (VALUES (10), (5), (20)) AS tab(col);
    10
    > SELECT first_value(col) FROM (VALUES (NULL), (5), (20)) AS tab(col);
    NULL
    > SELECT first_value(col, true) FROM (VALUES (NULL), (5), (20)) AS tab(col);
    5

    FIRST

    Function statement:
    FIRST(<expr> T[, <isIgnoreNull> boolean])
    Supported engine: SparkSQL
    Usage instructions: Return the first value of expr in a set of rows. If isIgnoreNull is true, only non-null values are returned.
    Return type: T
    Example:
    > SELECT first(col) FROM (VALUES (10), (5), (20)) AS tab(col);
    10
    > SELECT first(col) FROM (VALUES (NULL), (5), (20)) AS tab(col);
    NULL
    > SELECT first(col, true) FROM (VALUES (NULL), (5), (20)) AS tab(col);
    5

    KURTOSIS

    Function statement:
    KURTOSIS(<expr> integer|double|decimal)
    Supported engine: SparkSQL
    Usage instructions: Return the kurtosis value calculated based on the group's values.
    Return type: double
    Example:
    > SELECT kurtosis(col) FROM (VALUES (-10), (-20), (100), (1000)) AS tab(col);
    -0.7014368047529627
    > SELECT kurtosis(col) FROM (VALUES (1), (10), (100), (10), (1)) as tab(col);
    0.19432323191699075s

    LAST_VALUE

    Function statement:
    LAST_VALUE(<expr> T[, <isIgnoreNull> boolean])
    Supported engines: SparkSQL and Presto
    Usage instructions: Return the last value of expr in a set of rows. If isIgnoreNull is true, only non-null values are returned.
    Return type: T
    Example:
    > SELECT last_value(col) FROM (VALUES (10), (5), (20)) AS tab(col);
    20
    > SELECT last_value(col) FROM (VALUES (10), (5), (NULL)) AS tab(col);
    NULL
    > SELECT last_value(col, true) FROM (VALUES (10), (5), (NULL)) AS tab(col);
    5

    LAST

    Function statement:
    LAST(<expr> T[, <isIgnoreNull> boolean])
    Supported engine: SparkSQL
    Usage instructions: Return the last value of expr in a set of rows. If isIgnoreNull is true, only non-null values are returned.
    Return type: T
    Example:
    > SELECT last(col) FROM (VALUES (10), (5), (20)) AS tab(col);
    20
    > SELECT last(col) FROM (VALUES (10), (5), (NULL)) AS tab(col);
    NULL
    > SELECT last(col, true) FROM (VALUES (10), (5), (NULL)) AS tab(col);
    5

    MEAN

    Function statement:
    MEAN(<expr> integer|double|decimal)
    Supported engines: SparkSQL and Presto
    Usage instructions: Return the average value calculated based on the group's values.
    Return type: double
    Example:
    > SELECT mean(col) FROM (VALUES (1), (2), (3)) AS tab(col);
    2.0
    > SELECT mean(col) FROM (VALUES (1), (2), (NULL)) AS tab(col);
    1.5

    PERCENTILE

    Function statement:
    PERCENTILE(<col> ANY, <percentage> integer|double|decimal|array<double> [, <frequency> integer])
    Supported engines: SparkSQL and Presto
    Usage instructions: Return the exact percentile value of the numeric column col at the given percentage. The percentage value must be between 0.0 and 1.0. frequency should be a positive integer.
    Return type: double
    Example:
    > SELECT percentile(col, 0.3) FROM (VALUES (0), (10)) AS tab(col);
    3.0
    > SELECT percentile(col, array(0.25, 0.75)) FROM (VALUES (0), (10)) AS tab(col);
    [2.5,7.5]

    SKEWNESS

    Function statement:
    SKEWNESS(<col> integer|double|decimal)
    Supported engine: SparkSQL
    Usage instructions: Return the skewness value calculated based on the group's values.
    Return type: double
    Example:
    > SELECT skewness(col) FROM (VALUES (-10), (-20), (100), (1000)) AS tab(col);
    1.1135657469022011
    > SELECT skewness(col) FROM (VALUES (-1000), (-100), (10), (20)) AS tab(col);
    -1.1135657469022011

    PERCENTILE_APPROX

    Function statement:
    PERCENTILE_APPROX(<col> integer|double|decimal, <percentage> double|array<double>[, <accuracy> integer])
    Supported engines: SparkSQL and Presto
    Usage instructions: Return the approximate percentile of the numeric column col, which is the minimum value in the ordered col values (sorted from minimum to maximum) such that no more than the given percentage of col values are less than or equal to this value. The percentage value must be between 0.0 and 1.0. The accuracy parameter (default value: 10,000) is a positive numeric literal that controls the approximation accuracy at the cost of memory. The higher the accuracy value, the better the precision. 1.0/accuracy is the approximate relative error. When percentage is an array, each value in the percentage array must be between 0.0 and 1.0. In this case, an array of approximate percentiles for the column col corresponding to the given percentage array is returned.
    Return type: integer |array<integer>
    Example:
    > SELECT percentile_approx(col, array(0.5, 0.4, 0.1), 100) FROM (VALUES (0), (1), (2), (10)) AS tab(col);
    [1,1,0]
    > SELECT percentile_approx(col, 0.5, 100) FROM (VALUES (0), (6), (7), (9), (10)) AS tab(col);
    7

    APPROX_PERCENTILE

    Function statement:
    APPROX_PERCENTILE(<col> integer|double|decimal, <percentage> double|array<double>[, <accuracy> integer])
    Supported engines: SparkSQL and Presto
    Usage instructions: Return the approximate percentile of the numeric column col, which is the minimum value in the ordered col values (sorted from minimum to maximum) such that no more than the given percentage of col values are less than or equal to this value. The percentage value must be between 0.0 and 1.0. The accuracy parameter (default value: 10,000) is a positive numeric literal that controls the approximation accuracy at the cost of memory. The higher the accuracy value, the better the precision. 1.0/accuracy is the approximate relative error. When percentage is an array, each value in the percentage array must be between 0.0 and 1.0. In this case, an array of approximate percentiles for the column col corresponding to the given percentage array is returned.
    Return type: integer |array<integer>
    Example:
    > SELECT APPROX_PERCENTILE(col, array(0.5, 0.4, 0.1), 100) FROM (VALUES (0), (1), (2), (10)) AS tab(col);
    [1,1,0]
    > SELECT APPROX_PERCENTILE(col, 0.5, 100) FROM (VALUES (0), (6), (7), (9), (10)) AS tab(col);
    7

    MAX

    Function statement:
    MAX(<col> integer|double|decimal)
    Supported engines: SparkSQL and Presto
    Usage instructions: Return the maximum value of col.
    Return type: same as col
    Example:
    > SELECT max(col) FROM (VALUES (10), (50), (20)) AS tab(col);
    50

    MAX_BY

    Function statement:
    MAX_BY(<x> T, <y> integer|double|decimal)
    Supported engines: SparkSQL and Presto
    Usage instructions: Return the x value associated with the maximum y value.
    Return type: T
    Example:
    > SELECT max_by(x, y) FROM (VALUES (('a', 10)), (('b', 50)), (('c', 20))) AS tab(x, y);
    b

    MIN

    Function statement:
    MIN(<col> integer|double|decimal)
    Supported engines: SparkSQL and Presto
    Usage instructions: Return the minimum value of col.
    Return type: same as col
    Example:
    > SELECT min(col) FROM (VALUES (10), (50), (20)) AS tab(col);
    10

    MIN_BY

    Function statement:
    MIN_BY(<x> T, <y> integer|double|decimal)
    Supported engines: SparkSQL and Presto
    Usage instructions: Return the x value associated with the minimum y value.
    Return type: T
    Example:
    > SELECT min_by(x, y) FROM (VALUES (('a', 10)), (('b', 50)), (('c', 20))) AS tab(x, y);
    a

    STD

    Function statement:
    STD(<col> integer|double|decimal)
    Supported engines: SparkSQL and Presto

    Usage Instructions

    SparkSQL: Return the sample standard deviation calculated based on the group's values.
    Presto: Return the population standard deviation calculated based on the group's values.
    Return type: double
    Example:
    > SELECT std(col) FROM (VALUES (1), (2), (3)) AS tab(col);
    1.0

    STDDEV

    Function statement:
    STDDEV(<col> integer|double|decimal)
    Supported engines: SparkSQL and Presto
    Usage instructions: Return the sample standard deviation calculated based on the group's values.
    Return type: double
    Example:
    > SELECT stddev(col) FROM (VALUES (1), (2), (3)) AS tab(col);
    1.0

    STDDEV_POP

    Function statement:
    STDDEV_POP(<col> integer|double|decimal)
    Supported engines: SparkSQL and Presto
    Usage instructions: Return the population standard deviation calculated based on the group's values.
    Return type: double
    Example:
    > SELECT stddev_pop(col) FROM (VALUES (1), (2), (3)) AS tab(col);
    0.816496580927726

    STDDEV_SAMP

    Function statement:
    STDDEV_SAMP(<col> integer|double|decimal)
    Supported engines: SparkSQL and Presto
    Usage instructions: Return the sample standard deviation calculated based on the group's values.
    Return type: double
    Example:
    > SELECT stddev_samp(col) FROM (VALUES (1), (2), (3)) AS tab(col);
    1.0

    SUM

    Function statement:
    SUM(<col> integer|double|decimal)
    Supported engines: SparkSQL and Presto
    Usage instructions: Return the sum calculated based on the group's values.
    Return type: same as col
    Example:
    > SELECT sum(col) FROM (VALUES (5), (10), (15)) AS tab(col);
    30
    > SELECT sum(col) FROM (VALUES (NULL), (10), (15)) AS tab(col);
    25
    > SELECT sum(col) FROM (VALUES (NULL), (NULL)) AS tab(col);
    NULL

    VARIANCE

    Function statement:
    VARIANCE(<col> integer|double|decimal)
    Supported engines: SparkSQL and Presto
    Usage instructions: Return the sample variance calculated based on the group's values.
    Return type: double
    Example:
    > SELECT VARIANCE(col) FROM (VALUES (1), (2), (3)) AS tab(col);
    1.0

    VAR_POP

    Function statement:
    VAR_POP(<col> integer|double|decimal)
    Supported engines: SparkSQL and Presto
    Usage instructions: Return the population variance calculated based on the group's values.
    Return type: double
    Example:
    > SELECT var_pop(col) FROM (VALUES (1), (2), (3)) AS tab(col);
    0.6666666666666666

    VAR_SAMP

    Function statement:
    VAR_SAMP(<col> integer|double|decimal)
    Supported engines: SparkSQL and Presto
    Usage instructions: Return the sample variance calculated based on the group's values.
    Return type: double
    Example:
    > SELECT var_samp(col) FROM (VALUES (1), (2), (3)) AS tab(col);
    1.0

    HISTOGRAM_NUMERIC

    Function statement:
    HISTOGRAM_NUMERIC(<col> integer, <nb> integer)
    Supported engine: Presto
    Usage instructions: Compute a histogram of a numeric column in the group using nb bins with non-uniform intervals. The output is an array of nb (x, y) coordinates, representing the center and height of the bins.
    Return type: array<struct {'x','y'}>
    Example:
    > SELECT histogram_numeric(col, 5) FROM (VALUES (0), (1), (2), (10)) AS tab(col);
    [{"x":0,"y":1.0},{"x":1,"y":1.0},{"x":2,"y":1.0},{"x":10,"y":1.0}]

    COLLECT_LIST

    Function statement:
    COLLECT_LIST(<col> T)
    Supported engines: SparkSQL and Presto
    Usage instructions: Collect and return a list of non-unique elements.
    Return type: array<T>
    Example:
    > SELECT collect_list(col) FROM (VALUES (1), (2), (1)) AS tab(col);
    [1,2,1]

    COLLECT_SET

    Function statement:
    COLLECT_SET(<col> T)
    Supported engines: SparkSQL and Presto
    Usage instructions: Collect and return a set of unique elements.
    Return type: array<T>
    Example:
    > SELECT collect_set(col) FROM (VALUES (1), (2), (1)) AS tab(col);
    [1,2]

    COUNT_MIN_SKETCH

    Function statement:
    COUNT_MIN_SKETCH(<col> T, <eps> double, <confidence> double, <seed> integer)
    Supported engine: SparkSQL
    Usage instructions: Return the Count-Min Sketch of a column with the given esp, confidence, and seed. The result is binary and can be deserialized into CountMinSketch before use.
    Return type: binary
    Example:
    > SELECT hex(count_min_sketch(col, 0.5d, 0.5d, 1)) FROM (VALUES (1), (2), (1)) AS tab(col);
    0000000100000000000000030000000100000004000000005D8D6AB90000000000000000000000000000000200000000000000010000000000000000

    EVERY

    Function statement:
    EVERY(<col> boolean)
    Supported engines: SparkSQL and Presto
    Usage instructions: Return true if all values of col are true.
    Return type: boolean
    Example:
    > SELECT every(col) FROM (VALUES (true), (true), (true)) AS tab(col);
    true
    > SELECT every(col) FROM (VALUES (NULL), (true), (true)) AS tab(col);
    true
    > SELECT every(col) FROM (VALUES (true), (false), (true)) AS tab(col);
    false

    BOOL_AND

    Function statement:
    BOOL_AND(<col> boolean)
    Supported engines: SparkSQL and Presto
    Usage instructions: Return true if all values of col are true.
    Return type: boolean
    Example:
    > SELECT bool_and(col) FROM (VALUES (true), (true), (true)) AS tab(col);
    true
    > SELECT bool_and(col) FROM (VALUES (NULL), (true), (true)) AS tab(col);
    true
    > SELECT bool_and(col) FROM (VALUES (true), (false), (true)) AS tab(col);
    false

    AND

    Function statement:
    <expr1> AND <expr2>
    Supported engines: SparkSQL and Presto

    Usage Instructions

    Logical AND
    Return type: boolean
    Example:
    > SELECT true and true;
    true
    > SELECT true and false;
    false
    > SELECT true and NULL;
    NULL
    > SELECT false and NULL;
    false

    OR

    Function statement:
    <expr1> OR <expr2>
    Supported engines: SparkSQL and Presto
    Usage instructions: Logical OR
    Return type: boolean
    Example:
    > SELECT true or false;
    true
    > SELECT false or false;
    false
    > SELECT true or NULL;
    true
    > SELECT false or NULL;
    NULL

    ANY

    Function statement:
    ANY(<col> boolean)
    Supported engines: SparkSQL and Presto
    Usage instructions: Return true if at least one value of col is true.
    Return type: boolean
    Example:
    > SELECT any(col) FROM (VALUES (true), (false), (false)) AS tab(col);
    true
    > SELECT any(col) FROM (VALUES (NULL), (true), (false)) AS tab(col);
    true
    > SELECT any(col) FROM (VALUES (false), (false), (NULL)) AS tab(col);
    false

    SOME

    Function statement:
    SOME(<col> boolean)
    Supported engines: SparkSQL and Presto
    Usage instructions: Return true if at least one value of col is true.
    Return type: boolean
    Example:
    > SELECT some(col) FROM (VALUES (true), (false), (false)) AS tab(col);
    true
    > SELECT some(col) FROM (VALUES (NULL), (true), (false)) AS tab(col);
    true
    > SELECT some(col) FROM (VALUES (false), (false), (NULL)) AS tab(col);
    false

    BOOL_OR

    Function statement:
    BOOL_OR(<col> boolean)
    Supported engines: SparkSQL and Presto
    Usage instructions: Return true if at least one value of col is true.
    Return type: boolean
    Example:
    > SELECT BOOL_OR(col) FROM (VALUES (true), (false), (false)) AS tab(col);
    true
    > SELECT BOOL_OR(col) FROM (VALUES (NULL), (true), (false)) AS tab(col);
    true
    > SELECT BOOL_OR(col) FROM (VALUES (false), (false), (NULL)) AS tab(col);
    false

    BIT_AND

    Function statement:
    BIT_AND(<col> integer|bigint)
    Supported engines: SparkSQL and Presto
    Usage instructions: Return the bitwise AND of all non-null input values, or null if there are none.
    Return type: same as col
    Example:
    > SELECT bit_and(col) FROM (VALUES (3), (5)) AS tab(col);
    1

    BIT_OR

    Function statement:
    BIT_OR(<col> integer|bigint)
    Supported engines: SparkSQL and Presto
    Usage instructions: Return the bitwise OR of all non-null input values, or null if there are none.
    Return type: same as col
    Example:
    > SELECT bit_or(col) FROM (VALUES (3), (5)) AS tab(col);
    7

    BIT_XOR

    Function statement:
    BIT_XOR(<col> integer|bigint)
    Supported engines: SparkSQL and Presto
    Usage instructions: Return the bitwise XOR of all non-null input values, or null if there are none.
    Return type: same as col
    Example:
    > SELECT bit_xor(col) FROM (VALUES (3), (5)) AS tab(col);
    6

    ARG_MIN

    Function statement:
    ARG_MIN(<col1>, <col2> | expr(col2))
    Supported engine: SparkSQ
    Usage instructions: Return the row with the minimum value of the specified column col1 based on the specified calculation expression.
    col1: For sortable column names, if a constant is provided, any value may be returned. If there are multiple minimum values, one of them is returned randomly.
    Return type: same as col2 or expr(col2) type
    Example:
    > SELECT arg_min(dt, uid) from values (1, 'm1'), (2, 't2'), (3, 'z3') as tab(dt, uid);
    m1
    > SELECT arg_min(dt, upper(uid)) from values (1, 'm1', 1), (2, 't2', 1), (3, 'z3', 2) as tab(dt, uid, gid) group by gid;
    M1
    Z3

    ARG_MAX

    Function statement:
    ARG_MAX(<col1>, <col2> | expr(col2))
    Supported engine: SparkSQL
    Usage instructions: Return the row with the maximum value of the specified column col1 based on the specified calculation expression.
    col1: For sortable column names, if a constant is provided, any value may be returned. If there are multiple maximum values, one of them is returned randomly.
    Return type: same as col2 or expr(col2) type
    Example:
    > SELECT arg_max(dt, uid) from values (1, 'm1'), (2, 't2'), (3, 'z3') as tab(dt, uid);
    z3
    > SELECT arg_max(dt, upper(uid)) from values (1, 'm1', 1), (2, 't2', 1), (3, 'z3', 2) as tab(dt, uid, gid) group by gid;
    T2
    Z3

    MAP_UNION_SUM

    Function statement:
    MAP_UNION_SUM(map<k, v> input)
    Supported engine: SparkSQL
    Usage instructions: Specify the column and sum all the values in the map by key.
    Note 1: Aggregated data is allowed to be null. If all are null, the output is null.
    Note 2: The value of the key is allowed to be null, in which case it will be converted to the number 0.
    Return type: same as input type
    Example:
    > SELECT map_union_sum(ids) from values (map(1, 1)), (map(1,2)), (map(2,1)) as tab(ids);
    {1:3,2:1}
    > SELECT idx, map_union_sum(ids) from values (map(1, 1), 1), (map(1,2), 1), (map(2,1), 1) as tab(ids, idx) group by idx;
    1 {1:3,2:1}
    
    Contact Us

    Contact our sales team or business advisors to help your business.

    Technical Support

    Open a ticket if you're looking for further assistance. Our Ticket is 7x24 avaliable.

    7x24 Phone Support