tencent cloud

All product documents
Stream Compute Service
Time Window Functions
Last updated: 2023-11-08 11:33:35
Time Window Functions
Last updated: 2023-11-08 11:33:35
In stream processing, streams are unbounded, and we don't know when the data source will continue/stop sending data. Therefore, the way to do aggregations (e.g., count, sum) on streams differs from that in batch processing. In stream processing, windows are used to limit the scope of aggregation, such as "counting website clicks in the last 2 minutes" and "counting the total number of people who liked this video among the recent 100 people". Windows are equivalent to collecting a dynamic table of bounded data so that we can perform aggregations on the data in the table.
Window functions are a special type of functions that are not used in the projection list of SELECT, but in the GROUP BY clause. Stream Compute Service supports three types of window functions: TUMBLE, HOP, and SESSION.
To learn about Flink stream processing, see Timely Stream Processing.

TUMBLE

A tumbling window assigns data to non-overlapping windows with a fixed size that is customizable. We can perform aggregations on the data within the window.

Syntax

TUMBLE(time_attr, interval)
time_attr is a timestamp parameter that specifies the time when a record is processed. If specified as PROCTIME, it is an automatically generated timestamp that records the moment when the data is processed by Flink. It is generally used in Processing Time mode.
interval specifies the window size. For example, use INTERVAL '1' DAY to set a 1-day window size and INTERVAL '2' HOUR a 2-hour window size. For other usage, see Date and Time Functions.
Note
In Event Time mode (the WATERMARK FOR statement is used to define the timestamp field), the first parameter of the TUMBLE, HOP, and SESSION window functions must be the timestamp field.
In Processing Time mode, the first parameter of the TUMBLE, HOP, and SESSION window functions must be the calculated column generated by the proctime() function. In the following example, we use PROCTIME, but you should replace it with the actual column name in your job.

Identification functions

Identification functions are used to identify the start and end timestamps of windows.
Function
Description
TUMBLE_START(time-attr, size-interval)
Returns the start timestamp of the window.
TUMBLE_END(time-attr, size-interval)
Returns the end timestamp of the window.

Example

This example helps you better understand a tumbling window. It uses Event Time to count the hourly income of each user.
Example data:
username (VARCHAR)
income (BIGINT)
times (TIMESTAMP)
Tom
20
2021-11-11 10:30:00.0
Jack
10
2021-11-11 10:35:00.0
Tom
10
2021-11-11 10:35:00.0
Tom
10
2021-11-11 10:40:00.0
Tom
15
2021-11-11 11:30:00.0
Jack
10
2021-11-11 11:30:00.0
Jack
15
2021-11-11 11:40:00.0
SQL statements:
CREATE TABLE user_income (
username VARCHAR,
Income INT,
times TIMESTAMP(3),
WATERMARK FOR times AS times - INTERVAL '3' SECOND
) WITH (
...
);

CREATE TABLE output (
win_start TIMESTAMP,
win_end TIMESTAMP,
username VARCHAR,
hour_income BIGINT
)WITH(
...
);

INSERT INTO output
SELECT
TUMBLE_START(times,INTERVAL '1' HOUR),
TUMBLE_END(times,INTERVAL '1' HOUR),
username,
SUM(Income)
FROM user_income
GROUP BY TUMBLE(times,INTERVAL '1' HOUR),username;
Output result:
win_start (TIMESTAMP)
win_end (TIMESTAMP)
username (VARCHAR)
hour_income (BIGINT)
2021-11-11 10:00:00.0
2021-11-11 11:00:00.0
Tom
40
2021-11-11 10:00:00.0
2021-11-11 11:00:00.0
Jack
10
2021-11-11 11:00:00.0
2021-11-11 12:00:00.0
Tom
15
2021-11-11 11:00:00.0
2021-11-11 12:00:00.0
Jack
25

HOP

A hopping window assigns elements to fixed-size windows. Similar to a tumbling window, a hopping window supports window size customization. The other parameter controls how frequently a window is started.
A hopping window maintains a fixed window size and slides at a specified hop interval, allowing overlapping windows.
The hop interval determines the frequency at which Flink creates new windows.
If the hop interval is smaller than the window size, hopping windows are overlapping. In this case, elements are assigned to multiple windows.
If the hop interval is greater than the window size, some events may be discarded.
If the hop interval is equal to the window size, this window is equivalent to a tumbling window.

Syntax

HOP(time_attr, sliding_interval, window_size_interval)
time_attr is a timestamp parameter that specifies the time when a record is processed. If specified as PROCTIME, it is an automatically generated timestamp that records the moment when the data is processed by Flink. It is generally used in Processing Time mode.
window_size_interval specifies the window size. For example, use INTERVAL '1' DAY to set a 1-day window size and INTERVAL '2' HOUR a 2-hour window size. For other usage, see Date and Time Functions.
sliding_interval specifies the hop interval. For example, use INTERVAL '1' DAY to set a 1-day window size and INTERVAL '2' HOUR a 2-hour window size. For other usage, see Date and Time Functions.

Identification functions

Identification functions are used to identify the start and end timestamps of windows.
Function
Description
HOP_START(time-attr, slide-interval,size-interval)
Returns the start timestamp of the window.
HOP_END(time-attr, slide-interval,size-interval)
Returns the end timestamp of the window.

Example

This example helps you better understand a hopping window. It uses Event Time to count the hourly income of each user, which is updated every 30 minutes. The window size is 1 hour, with a hop interval of 10 minutes.
Example data:
username (VARCHAR)
income (BIGINT)
times (TIMESTAMP)
Tom
20
2021-11-11 10:30:00.0
Jack
10
2021-11-11 10:35:00.0
Tom
10
2021-11-11 10:35:00.0
Tom
10
2021-11-11 10:40:00.0
Tom
15
2021-11-11 11:35:00.0
Jack
10
2021-11-11 11:30:00.0
Jack
15
2021-11-11 11:40:00.0
SQL statements:
CREATE TABLE user_income (
username VARCHAR,
Income INT,
times TIMESTAMP(3),
WATERMARK FOR times AS times - INTERVAL '3' MINUTE
)WITH(
...
);

CREATE TABLE output (
win_start TIMESTAMP,
win_end TIMESTAMP,
username VARCHAR,
hour_income BIGINT
)WITH(
...
);

INSERT INTO output
SELECT
HOP_START(times,INTERVAL '30' MINUTE,INTERVAL '1' HOUR),
HOP_END(times,INTERVAL '30' MINUTE,INTERVAL '1' HOUR),
username,
SUM(income)
FROM user_income
GROUP BY HOP(times,INTERVAL '30' MINUTE,INTERVAL '1' HOUR),username;
Output result:
win_start (TIMESTAMP)
win_end (TIMESTAMP)
username (VARCHAR)
hour_income (BIGINT)
2021-11-11 10:00:00.0
2021-11-11 11:00:00.0
Tom
40
2021-11-11 10:00:00.0
2021-11-11 11:00:00.0
Jack
10
2021-11-11 10:30:00.0
2021-11-11 11:30:00.0
Jack
10
2021-11-11 10:30:00.0
2021-11-11 11:30:00.0
Tom
40
2021-11-11 11:00:00.0
2021-11-11 12:00:00.0
Tom
15
2021-11-11 11:00:00.0
2021-11-11 12:00:00.0
Jack
25
2021-11-11 11:30:00.0
2021-11-11 12:30:00.0
Jack
25
2021-11-11 11:30:00.0
2021-11-11 12:30:00.0
Tom
15

SESSION

A session window groups elements by session activity. Unlike a tumbling or hopping window, a session window does not have overlapping windows or fixed start and end timestamps. Instead, the window is closed when it no longer receives elements within a fixed time period, which is called an inactive interval. A session window is configured using a session interval. The session interval defines the length of the inactive period. When this period elapses, the current session is closed and subsequent elements are assigned to a new session window.
Session windows are based on inactivity rather than size. For example, if there is no activity (no new data) for more than 30 minutes, the existing window is closed, and a new window starts when new data is observed later.

Syntax

SESSION(time_attr, interval)
time_attr is a timestamp parameter that specifies the time when a record is processed. If specified as PROCTIME, it is an automatically generated timestamp that records the moment when the data is processed by Flink. It is generally used in Processing Time mode.
interval specifies the window size. For example, use INTERVAL '1' DAY to set a 1-day window size and INTERVAL '2' HOUR a 2-hour window size. For other usage, see Date and Time Functions.

Identification functions

Identification functions are used to identify the start and end timestamps of windows.
Function
Description
SESSION_START(time-attr, size-interval)
Returns the start timestamp of the window.
SESSION_END(time-attr, size-interval)
Returns the end timestamp of the window.

Example

This example helps you better understand a session window. It uses Event Time to count the hourly income of each user, with a session timeout period of 30 minutes.
Example data:
username (VARCHAR)
income (BIGINT)
times (TIMESTAMP)
Tom
20
2021-11-11 10:30:00.0
Jack
10
2021-11-11 10:35:00.0
Tom
10
2021-11-11 10:35:00.0
Tom
10
2021-11-11 10:40:00.0
Tom
15
2021-11-11 11:50:00.0
Jack
10
2021-11-11 11:40:00.0
Jack
15
2021-11-11 11:45:00.0
SQL statements:
CREATE TABLE user_income (
username VARCHAR,
Income INT,
times TIMESTAMP(3),
WATERMARK FOR times AS times - INTERVAL '3' MINUTE
)WITH(
...
);

CREATE TABLE output (
win_start TIMESTAMP,
win_end TIMESTAMP,
username VARCHAR,
hour_income BIGINT
)WITH(
...
);

INSERT INTO output
SELECT
SESSION_START(times,INTERVAL '30' MINUTE),
SESSION_END(times,INTERVAL '30' MINUTE),
username,
SUM(Income)
FORM user_income
GROUP BY SESSION(times,INTERVAL '30' MINUTE),username;
Output result:
win_start (TIMESTAMP)
win_end (TIMESTAMP)
username (VARCHAR)
hour_income (BIGINT)
2021-11-11 10:30:00.0
2021-11-11 11:10:00.0
Tom
40
2021-11-11 10:35:00.0
2021-11-11 11:05:00.0
Jack
10
2021-11-11 11:30:00.0
2021-11-11 12:00:00.0
Tom
15
2021-11-11 11:30:00.0
2021-11-11 12:10:00.0
Jack
25

More notes

The preceding windows all have corresponding auxiliary functions, which are the same except for the prefix. Here, the auxiliary functions for tumbling windows are used as examples.
TUMBLE_ROWTIME: (Event Time mode) Returns the timestamp of the inclusive upper bound of the tumbling window. The resulting attribute can be used in subsequent time-based operations such as interval joins and group window or over window aggregations. Example:
SELECT user,
TUMBLE_START(rowtime, INTERVAL '12' HOUR) AS sStart,
TUMBLE_ROWTIME(rowtime, INTERVAL '12' HOUR) AS snd,
SUM(amount)
FROM Orders
GROUP BY TUMBLE(rowtime, INTERVAL '12' HOUR), user
TUMBLE_PROCTIME: (Processing Time mode) Returns the timestamp of the inclusive upper bound of the tumbling window. The resulting attribute can be used in subsequent time-based operations such as interval joins and group window or over window aggregations. Example:
SELECT user,
TUMBLE_START(PROCTIME, INTERVAL '12' HOUR) AS sStart,
TUMBLE_PROCTIME(PROCTIME, INTERVAL '12' HOUR) AS snd,
SUM(amount)
FROM Orders
GROUP BY TUMBLE(PROCTIME, INTERVAL '12' HOUR), user

Was this page helpful?
You can also Contact Sales or Submit a Ticket for help.
Yes
No

Feedback

Contact Us

Contact our sales team or business advisors to help your business.

Technical Support

Open a ticket if you're looking for further assistance. Our Ticket is 7x24 avaliable.

7x24 Phone Support
Hong Kong, China
+852 800 906 020 (Toll Free)
United States
+1 844 606 0804 (Toll Free)
United Kingdom
+44 808 196 4551 (Toll Free)
Canada
+1 888 605 7930 (Toll Free)
Australia
+61 1300 986 386 (Toll Free)
EdgeOne hotline
+852 300 80699
More local hotlines coming soon