Window Functions

row_number
Function statement:
  row_number()
Supported engines: SparkSQL and Presto
Usage instructions: Assign a unique consecutive number to each row.
Return type: int
Example:
﻿
SELECT a, b, row_number() OVER (PARTITION BY a ORDER BY b) FROM VALUES ('A1', 2), ('A1', 1), ('A2', 3), ('A1', 1) tab(a, b);
A1	1	1
A1	1	2
A1	2	3
A2	3	1
rank
Function statement:
  rank()
Supported engines: SparkSQL and Presto
Usage instructions: Compute the rank of a value within a group of values. If there are ties, leaves gaps in the ranking sequence.
Return type: int
Example:
SELECT a, b, rank() OVER (PARTITION BY a ORDER BY b) FROM VALUES ('A1', 2), ('A1', 1), ('A2', 3), ('A1', 1) tab(a, b)
A1	1	1
A1	1	1
A1	2	3
A2	3	1
dense_rank
Function statement:
  dense_rank()
Supported engines: SparkSQL and Presto
Usage instructions: Compute the rank of a value within a group of values. Unlike the rank function, dense_rank does not leave gaps in the ranking sequence in the case of ties.
Return type: int
Example:
SELECT a, b, dense_rank() OVER (PARTITION BY a ORDER BY b) FROM VALUES ('A1', 2), ('A1', 1), ('A2', 3), ('A1', 1) tab(a, b)
A1	1	1
A1	1	1
A1	2	2
A2	3	1
percent_rank()
Function statement:
  percent_rank()
﻿
Supported engines: SparkSQL and Presto
Usage instructions: Compute the percentile rank of a value within a group of values. The return value is a decimal between 0 and 1.
Return type: double
Example:
SELECT a, b, percent_rank() OVER (PARTITION BY a ORDER BY b) FROM VALUES ('A1', 2), ('A1', 1), ('A2', 3), ('A1', 1) tab(a, b)
A1	1	0.0
A1	1	0.0
A1	2	1.0
A2	3	0.0
cume_dist
Function statement:
cume_dist()  
Supported engines: SparkSQL and Presto
Usage instructions: Compute the position of a value relative to all values within a partition.
Return type: double
Example:
SELECT a, b, cume_dist() OVER (PARTITION BY a ORDER BY b)  FROM VALUES ('A1', 2), ('A1', 1), ('A2', 3), ('A1', 1) tab(a, b)
A1	1	0.6666666666666666
A1	1	0.6666666666666666
A1	2	1.0
A2	3	1.0
first_value
Function statement:
  first_value(col)
Supported engines: SparkSQL and Presto
Usage instructions: Return the value of the first row of a column in a partition.
Return type: data type of the col column
Example:
SELECT a, b, first_value(b) OVER (PARTITION BY a ORDER BY b) FROM VALUES ('A1', 2), ('A1', 1), ('A2', 3), ('A1', 1) tab(a, b)
A1	1	1
A1	1	1
A1	2	1
A2	3	3
last_value
Function statement:
  last_value(col)
Supported engines: SparkSQL and Presto
Usage instructions: Return the value of the last row of a column in a partition.
Return type: int
Example:
SELECT a, b, last_value(b) OVER (PARTITION BY a ORDER BY b) FROM VALUES ('A1', 2), ('A1', 1), ('A2', 3), ('A1', 1) tab(a, b)
A1	1	1
A1	1	1
A1	2	2
A2	3	3
lag
Function statement:
 lag(col[, n [, default]]) 
Supported engines: SparkSQL and Presto
Usage instructions: Return the value of the nth row above the current row in the window. The default value for n is 1, and the default value for default is null. If the value of the nth row is null, null is returned. If such an offset row does not exist (e.g., the first row of the window has no row above it when the offset is 1), default is returned. The first parameter is the column name, the second parameter is the nth row before, and the third parameter is the default value.
Return type: data type of the col column
Example:
SELECT a, b, lag(b) OVER (PARTITION BY a ORDER BY b) FROM VALUES ('A1', 2), ('A1', 1), ('A2', 3), ('A1', 1) tab(a, b)
A1	1	NULL
A1	1	1
A1	2	1
A2	3	NULL
lead
Function statement:
  lead(col[, n[, default]])
Supported engines: SparkSQL and Presto
Usage instructions: Return the value of the nth row after the current row in the window. The default value for n is 1, and the default value for default is null. If the value of the nth row is null, null is returned. If such an offset row does not exist (e.g., the last row of the window has no row below it when the offset is 1), default is returned. The first parameter is the column name, the second parameter is the nth row before, and the third parameter is the default value.
Return type: data type of the col column
Example:
SELECT a, b, lead(b) OVER (PARTITION BY a ORDER BY b) FROM VALUES ('A1', 2), ('A1', 1), ('A2', 3), ('A1', 1) tab(a, b)
A1	1	1
A1	1	2
A1	2	NULL
A2	3	NULL
nth_value
Function statement:
  nth_value(col[, n])
Supported engines: SparkSQL and Presto
Usage instructions: Return the value of the nth row from the start of the window. n starts from 1. If ignoreNulls=true, null will be skipped when searching for the nth row. Otherwise, every row is counted in n. If the nth row does not exist (e.g., when n is 10 but the window size is less than 10), null is returned. The first parameter is the column name, and the second parameter is the nth row.
Return type: data type of the col column
Example:
SELECT a, b, nth_value(b, 2) OVER (PARTITION BY a ORDER BY b) FROM VALUES ('A1', 2), ('A1', 1), ('A2', 3), ('A1', 1) tab(a, b)
A1	1	1
A1	1	1
A1	2	1
A2	3	NULL 
ntile
Function statement:
  ntile(n)
Supported engines: SparkSQL and Presto
Usage instructions: Divide the rows in the window partition into n buckets and return the bucket number of the row, ranging from 1 to n.
Return type: int
Example:
SELECT a, b, ntile(2) OVER (PARTITION BY a ORDER BY b) FROM VALUES ('A1', 2), ('A1', 1), ('A2', 3), ('A1', 1) tab(a, b)
A1	1	1
A1	1	1
A1	2	2
A2	3	1
CLUSTER_SAMPLE
Function statement:
  CLUSTER_SAMPLE(<int> N[, <int> M]) over (PARTITION BY col1 ORDER by col2)
Supported engine: SparkSQL
Usage instructions: Sample within the window according to the specified proportion or quantity.
N: Required, int type. When only N is provided, it indicates sampling N data entries. The sampling result is close to N but not guaranteed to be exactly N.
M: Optional, int type. When M is specified, it indicates sampling the total number of entries in the M/N  window. The sampling result is close to total entries of M/N .
Return type: boolean, where true indicates the row is sampled, and false indicates it is not sampled.
Example:
> SELECT a, b, cluster_sample(2) OVER (PARTITION BY a ORDER BY b) FROM VALUES ('A1', 2), ('A1', 1), ('A2', 3), ('A1', 1) tab(a, b);
A1 2 true
A1 1 true
A2 3 true
A1 1 false
> SELECT a, b from (select a, b, cluster_sample(2) OVER (PARTITION BY a ORDER BY b) as c FROM VALUES ('A1', 2), ('A1', 1), ('A2', 3), ('A1', 1) tab(a, b)) where c;
A1 2
A1 1
A2 3
﻿
﻿

Was this page helpful?

You can also Contact Sales or Submit a Ticket for help.

Yes

tencent cloud

row_number

rank

dense_rank

percent_rank()

cume_dist

first_value

last_value

lag

lead

nth_value

ntile

CLUSTER_SAMPLE

About Tencent Cloud

Help & Support

Resources

User Center

tencent cloud

Sign Up

Log in

Compute

Microservice

Data Migration

Database SaaS Tool

Data Security

Application Security

Big Data

Tencent Big Model

Internet of Things

Stream Services

Cloud Real-time Rendering

Cloud Resource Management

More

Edge Computing

Serverless

Relational Database

Networking

Business Security

Domains & Websites

Face Recognition

AI Platform Service

Middleware

Media On-Demand

Game Services

Management and Audit Tools

Container

Essential Storage Service

Enterprise Distributed DBMS

CDN and Acceleration

Security Services

Enterprise Applications

Voice Technology

Natural Language Processing

Communication

Media Process Services

Education Sevices

Developer Tools

Distributed cloud

Data Process and Analysis

NoSQL Database

Network Security

Cloud Security

Office Collaboration

Image Creation

Optical Character Recognition

Interactive Video Services

Media SDK

Medical Services

Monitor and Operation

row_number

rank

dense_rank

percent_rank()

cume_dist

first_value

last_value

lag

lead

nth_value

ntile

CLUSTER_SAMPLE

About Tencent Cloud

Help & Support

Resources

User Center