tencent cloud

Feedback

Window Functions

Last updated: 2024-08-07 17:33:25

    row_number

    Function statement:
    row_number()
    Supported engines: SparkSQL and Presto
    Usage instructions: Assign a unique consecutive number to each row.
    Return type: int
    Example:
    
    SELECT a, b, row_number() OVER (PARTITION BY a ORDER BY b) FROM VALUES ('A1', 2), ('A1', 1), ('A2', 3), ('A1', 1) tab(a, b);
    A1 1 1
    A1 1 2
    A1 2 3
    A2 3 1

    rank

    Function statement:
    rank()
    Supported engines: SparkSQL and Presto
    Usage instructions: Compute the rank of a value within a group of values. If there are ties, leaves gaps in the ranking sequence.
    Return type: int
    Example:
    SELECT a, b, rank() OVER (PARTITION BY a ORDER BY b) FROM VALUES ('A1', 2), ('A1', 1), ('A2', 3), ('A1', 1) tab(a, b)
    A1 1 1
    A1 1 1
    A1 2 3
    A2 3 1

    dense_rank

    Function statement:
    dense_rank()
    Supported engines: SparkSQL and Presto
    Usage instructions: Compute the rank of a value within a group of values. Unlike the rank function, dense_rank does not leave gaps in the ranking sequence in the case of ties.
    Return type: int
    Example:
    SELECT a, b, dense_rank() OVER (PARTITION BY a ORDER BY b) FROM VALUES ('A1', 2), ('A1', 1), ('A2', 3), ('A1', 1) tab(a, b)
    A1 1 1
    A1 1 1
    A1 2 2
    A2 3 1

    percent_rank()

    Function statement:
    percent_rank()
    
    Supported engines: SparkSQL and Presto
    Usage instructions: Compute the percentile rank of a value within a group of values. The return value is a decimal between 0 and 1.
    Return type: double
    Example:
    SELECT a, b, percent_rank() OVER (PARTITION BY a ORDER BY b) FROM VALUES ('A1', 2), ('A1', 1), ('A2', 3), ('A1', 1) tab(a, b)
    A1 1 0.0
    A1 1 0.0
    A1 2 1.0
    A2 3 0.0

    cume_dist

    Function statement:
    cume_dist()
    Supported engines: SparkSQL and Presto
    Usage instructions: Compute the position of a value relative to all values within a partition.
    Return type: double
    Example:
    SELECT a, b, cume_dist() OVER (PARTITION BY a ORDER BY b) FROM VALUES ('A1', 2), ('A1', 1), ('A2', 3), ('A1', 1) tab(a, b)
    A1 1 0.6666666666666666
    A1 1 0.6666666666666666
    A1 2 1.0
    A2 3 1.0

    first_value

    Function statement:
    first_value(col)
    Supported engines: SparkSQL and Presto
    Usage instructions: Return the value of the first row of a column in a partition.
    Return type: data type of the col column
    Example:
    SELECT a, b, first_value(b) OVER (PARTITION BY a ORDER BY b) FROM VALUES ('A1', 2), ('A1', 1), ('A2', 3), ('A1', 1) tab(a, b)
    A1 1 1
    A1 1 1
    A1 2 1
    A2 3 3

    last_value

    Function statement:
    last_value(col)
    Supported engines: SparkSQL and Presto
    Usage instructions: Return the value of the last row of a column in a partition.
    Return type: int
    Example:
    SELECT a, b, last_value(b) OVER (PARTITION BY a ORDER BY b) FROM VALUES ('A1', 2), ('A1', 1), ('A2', 3), ('A1', 1) tab(a, b)
    A1 1 1
    A1 1 1
    A1 2 2
    A2 3 3

    lag

    Function statement:
    lag(col[, n [, default]])
    Supported engines: SparkSQL and Presto
    Usage instructions: Return the value of the nth row above the current row in the window. The default value for n is 1, and the default value for default is null. If the value of the nth row is null, null is returned. If such an offset row does not exist (e.g., the first row of the window has no row above it when the offset is 1), default is returned. The first parameter is the column name, the second parameter is the nth row before, and the third parameter is the default value.
    Return type: data type of the col column
    Example:
    SELECT a, b, lag(b) OVER (PARTITION BY a ORDER BY b) FROM VALUES ('A1', 2), ('A1', 1), ('A2', 3), ('A1', 1) tab(a, b)
    A1 1 NULL
    A1 1 1
    A1 2 1
    A2 3 NULL

    lead

    Function statement:
    lead(col[, n[, default]])
    Supported engines: SparkSQL and Presto
    Usage instructions: Return the value of the nth row after the current row in the window. The default value for n is 1, and the default value for default is null. If the value of the nth row is null, null is returned. If such an offset row does not exist (e.g., the last row of the window has no row below it when the offset is 1), default is returned. The first parameter is the column name, the second parameter is the nth row before, and the third parameter is the default value.
    Return type: data type of the col column
    Example:
    SELECT a, b, lead(b) OVER (PARTITION BY a ORDER BY b) FROM VALUES ('A1', 2), ('A1', 1), ('A2', 3), ('A1', 1) tab(a, b)
    A1 1 1
    A1 1 2
    A1 2 NULL
    A2 3 NULL

    nth_value

    Function statement:
    nth_value(col[, n])
    Supported engines: SparkSQL and Presto
    Usage instructions: Return the value of the nth row from the start of the window. n starts from 1. If ignoreNulls=true, null will be skipped when searching for the nth row. Otherwise, every row is counted in n. If the nth row does not exist (e.g., when n is 10 but the window size is less than 10), null is returned. The first parameter is the column name, and the second parameter is the nth row.
    Return type: data type of the col column
    Example:
    SELECT a, b, nth_value(b, 2) OVER (PARTITION BY a ORDER BY b) FROM VALUES ('A1', 2), ('A1', 1), ('A2', 3), ('A1', 1) tab(a, b)
    A1 1 1
    A1 1 1
    A1 2 1
    A2 3 NULL

    ntile

    Function statement:
    ntile(n)
    Supported engines: SparkSQL and Presto
    Usage instructions: Divide the rows in the window partition into n buckets and return the bucket number of the row, ranging from 1 to n.
    Return type: int
    Example:
    SELECT a, b, ntile(2) OVER (PARTITION BY a ORDER BY b) FROM VALUES ('A1', 2), ('A1', 1), ('A2', 3), ('A1', 1) tab(a, b)
    A1 1 1
    A1 1 1
    A1 2 2
    A2 3 1

    CLUSTER_SAMPLE

    Function statement:
    CLUSTER_SAMPLE(<int> N[, <int> M]) over (PARTITION BY col1 ORDER by col2)
    Supported engine: SparkSQL
    Usage instructions: Sample within the window according to the specified proportion or quantity.
    N: Required, int type. When only N is provided, it indicates sampling N data entries. The sampling result is close to N but not guaranteed to be exactly N.
    M: Optional, int type. When M is specified, it indicates sampling the total number of entries in the M/N window. The sampling result is close to total entries of M/N .
    Return type: boolean, where true indicates the row is sampled, and false indicates it is not sampled.
    Example:
    > SELECT a, b, cluster_sample(2) OVER (PARTITION BY a ORDER BY b) FROM VALUES ('A1', 2), ('A1', 1), ('A2', 3), ('A1', 1) tab(a, b);
    A1 2 true
    A1 1 true
    A2 3 true
    A1 1 false
    > SELECT a, b from (select a, b, cluster_sample(2) OVER (PARTITION BY a ORDER BY b) as c FROM VALUES ('A1', 2), ('A1', 1), ('A2', 3), ('A1', 1) tab(a, b)) where c;
    A1 2
    A1 1
    A2 3
    
    
    Contact Us

    Contact our sales team or business advisors to help your business.

    Technical Support

    Open a ticket if you're looking for further assistance. Our Ticket is 7x24 avaliable.

    7x24 Phone Support