tencent cloud

All product documents
Stream Compute Service
Last updated: 2023-11-08 15:32:35
SELECT
Last updated: 2023-11-08 15:32:35

SELECT FROM

The SELECT statement must be used together with CREATE VIEW … AS or INSERT INTO; otherwise, a no operator error will occur.

Syntax

SELECT Fields to select, separated with commas
FROM Data source or view
WHERE Filter condition
Other subqueries

Example

SELECT s1.time_, s1.client_ip, s1.uri, s1.protocol_version, s2.status_code, s2.date_
FROM KafkaSource1 AS s1, KafkaSource2 AS s2
WHERE s1.time_ = s2.time_ AND s1.client_ip = s2.client_ip;

WHERE

You can use WHERE to filter the data to select from. Combine multiple filter conditions with AND or OR. When a TencentDB table is joined, only AND is supported. To use OR, see UNION ALL below.

HAVING

You can use HAVING to filter the result of GROUP BY. WHERE filters data before GROUP BY, while HAVING filters data after GROUP BY.
SELECT SUM(amount)
FROM Orders
WHERE price > 10
GROUP BY users
HAVING SUM(amount) > 50

GROUP BY

In Stream Compute Service, GROUP BY arranges query results into groups. It supports GROUP BY with a time window and GROUP BY without (continuous query).
GROUP BY with a time window does not update previous results and therefore generates append (tuple) data streams. Such data can only be written to MySQL, PostgreSQL, Kafka, and Elasticsearch sinks that do not have a primary key.
GROUP BY without a time window updates previously sent records and therefore generates upsert data streams. Such data can be written to MySQL, PostgreSQL, and Elasticsearch sinks with a primary key (the primary key must be identical to the upsert field in the GROUP BY statement).

GROUP BY with a time window

The example below defines a GROUP BY query with a time window. For details about time window functions, see Time Window Functions.
SELECT user, SUM(amount)
FROM Orders
GROUP BY TUMBLE(rowtime, INTERVAL '1' DAY), user
In Event Time mode (the WATERMARK FOR statement is used to define the timestamp field), the first parameter of the TUMBLE function must be the timestamp field. The same is true for HOP and SESSION.
In Processing Time mode, the first parameter of the TUMBLE function must be the field declared by proctime(). The same is true for HOP and SESSION.

GROUP BY without a time window (continuous query)

The example below defines a GROUP BY query without a time window. This is known as a continuous query. It determines whether to update previously sent results depending on each arriving data record and therefore generates an upsert stream.
SELECT a, SUM(b) as d
FROM Orders
GROUP BY a
Note
Out of Memory errors may occur for such queries due to too many keys or too much data. Please consider this when setting the timeout period. Do not set it too long.

JOIN

Currently, Stream Compute Service only supports Equi-JOIN. That is to say, the JOIN query must include at least one filter condition that equates a field in the left and right tables.

Inner Equi-JOIN (stream join)

There are two types of stream join: JOIN with a time range and JOIN without a time range. The former generates append (tuple) streams, while the latter generates upsert streams.

Inner JOIN with a time range

A JOIN query with a time range is also known as interval join. The WHERE clause of such queries must have at least one equality join condition and a time range. The time range can be specified using <, <=, >=, > or BETWEEN … AND.
ltime = rtime
ltime >= rtime AND ltime < rtime + INTERVAL '10' MINUTE
ltime BETWEEN rtime - INTERVAL '10' SECOND AND rtime + INTERVAL '5' SECOND
Example:
SELECT *
FROM Orders o, Shipments s
WHERE o.id = s.orderId AND
o.ordertime BETWEEN s.shiptime - INTERVAL '4' HOUR AND s.shiptime

Inner JOIN without a time range

An inner join without a time range must have at least one equality join condition, but does not need to specify a time range. This means all historical data will be used for calculation (you can specify a timeout period to exclude inactive elements).
Note
Such queries may significantly drive up memory usage. We recommend you set an appropriate timeout period to exclude inactive objects.
Such queries generate upsert streams and therefore can only use MySQL, PostgreSQL, and Elasticsearch sinks that have a primary key.
Example:
SELECT *
FROM Orders INNER JOIN Product ON Orders.productId = Product.id

Outer Equi-JOIN

‌Outer Equi-JOIN generates upsert streams and therefore can only use MySQL, PostgreSQL, and Elasticsearch sinks that accept such streams.
Note
Because the join order is not optimized, the JOIN query will follow the order of tables in the FROM clause. This may result in high state pressure and cause the query to fail.
SELECT *
FROM Orders LEFT JOIN Product ON Orders.productId = Product.id

SELECT *
FROM Orders RIGHT JOIN Product ON Orders.productId = Product.id

SELECT *
FROM Orders FULL OUTER JOIN Product ON Orders.productId = Product.id

JOIN with temporal tables

Stream Compute Service also supports joining streams and temporal tables (tables that change constantly over time). The syntax is the same, except that the temporal table must be used as the right table.
SELECT
o.amout, o.currency, r.rate, o.amount * r.rate
FROM
Orders AS o
JOIN LatestRates FOR SYSTEM_TIME AS OF o.proctime AS r
ON r.currency = o.currency
Note
Make sure you include the FOR SYSTEM_TIME AS OF clause, without which the JOIN query will still be executed, but the database will be read in its entirety only once, and the result may not meet expectations.

JOIN with user-defined table functions

You can use a user-defined table function (UDTF) as the right table in a JOIN query. The syntax is similar to other JOIN queries. You just need to put the UDTF in LATERAL TABLE( ).

Inner UDTF JOIN

SELECT users, tag
FROM Orders, LATERAL TABLE(unnest_udtf(tags)) t AS tag

Left Outer UDTF JOIN

SELECT users, tag
FROM Orders LEFT JOIN LATERAL TABLE(unnest_udtf(tags)) t AS tag ON TRUE
Note
Currently, Left Outer JOIN with UDTFs only supports ON TRUE, which is similar to CROSS JOIN.

JOIN with arrays

Stream Compute Service supports join operations with a defined array object (you can use value construction functions to construct an array object).
Example: Assume that tags is a defined array.
SELECT users, tag
FROM Orders CROSS JOIN UNNEST(tags) AS t (tag)

UNION ALL

UNION ALL combines the results of two queries.
SELECT *
FROM (
(SELECT user FROM Orders WHERE a % 2 = 0)
UNION ALL
(SELECT user FROM Orders WHERE b = 0)
)
Currently, Stream Compute Service only supports UNION ALL and not UNION. That is, it does not remove duplicate rows.
To remove duplicates, you can use DISTINCT together with UNION ALL. DISTINCT converts append (tuple) streams into upsert streams and therefore can only use MySQL, PostgreSQL, and Elasticsearch sinks that have a primary key.

OVER

You can use OVER to aggregate data streams based on hopping windows (without using GROUP BY). In the OVER clause, you can specify the partition, order, and window frame.
The example below defines an aggregate query based on topping windows. It calculates the total transaction volume (amount) for a window size of 3. To specify the number of preceding rows, use PRECEDING. FOLLOWING is not supported currently.
You can specify only one timestamp field for ORDER BY. In the example below, the proctime field declared in the data source is used.
SELECT SUM(amount) OVER (
PARTITION BY user
ORDER BY proctime
ROWS BETWEEN 2 PRECEDING AND CURRENT ROW)
FROM Orders
SELECT COUNT(amount) OVER w, SUM(amount) OVER w
FROM Orders
WINDOW w AS (
PARTITION BY user
ORDER BY proctime
ROWS BETWEEN 2 PRECEDING AND CURRENT ROW)

ORDER BY

ORDER BY sorts query results. By default, it sorts the data in ascending order (ASC). You can also ‍use DESC to sort the data in descending order.
Note
The first field specified for sorting must be the time column (Event Time or Processing Time, i.e., PROCTIME), in ascending order. After that, you can specify your own fields to sort by.
SELECT *
FROM Orders
ORDER BY `orderTime`, `username` DESC, `userId` ASC

DISTINCT

DISTINCT removes duplicates from query results. It should be added after SELECT.
SELECT DISTINCT users FROM Orders
DISTINCT generates upsert streams, so you need to use sinks that accept upsert streams. Please note that continuous use of such queries may result in high memory usage.
We recommend you set an appropriate timeout period to exclude inactive objects and reduce memory usage.

IN

You can use the IN keyword to determine whether an element exists in a specified set, such as a subquery.
Note
This operation is demanding on memory.
SELECT user, amount
FROM Orders
WHERE product IN (
SELECT product FROM NewProducts
)

EXISTS

If the result of the subquery following EXISTS has one or more rows (data exists), true is returned.
Note
This operation is demanding on memory.
SELECT user, amount
FROM Orders
WHERE product EXISTS (
SELECT product FROM NewProducts
)

ORDER BY

ORDER BY sorts data by a specified field.
Note
This operation is demanding on memory.
SELECT *
FROM Orders
ORDER BY orderTime

Grouping Sets, Rollup, Cube

Grouping Sets, Rollup, and Cube generate upsert streams. Therefore, you need to use data sinks that accept upsert streams.
SELECT SUM(amount)
FROM Orders
GROUP BY GROUPING SETS ((user), (product))

MATCH_RECOGNIZE

MATCH_RECOGNIZE performs pattern recognition on an input stream, allowing you to use a SQL query to describe complex event processing (CEP) logic.
SELECT T.aid, T.bid, T.cid
FROM MyTable
MATCH_RECOGNIZE (
PARTITION BY userid
ORDER BY proctime
MEASURES
A.id AS aid,
B.id AS bid,
C.id AS cid
PATTERN (A B C)
DEFINE
A AS name = 'a',
B AS name = 'b',
C AS name = 'c'
) AS T
The above example defines three events, A, B, and C, where the name field equates a, b, and c. PATTERN specifies the trigger rule. In the example, the trigger rule is when A, B, and C occur consecutively. MEASURES specifies the output format.
For details, see the Flink document Detecting Patterns in Tables.

Top-N

Top-N gets the N smallest or largest values from a data stream. It generates upsert streams, so you need to use data sinks that accept upsert streams.
To learn more about ‌Top-N syntax, see the Flink document Top-N.

Deduplication

In cases where the input data includes consecutive duplicate values, you can use this clause to remove the duplicates. To learn more about the syntax, see Flink document Deduplication.
Was this page helpful?
You can also Contact Sales or Submit a Ticket for help.
Yes
No

Feedback

Contact Us

Contact our sales team or business advisors to help your business.

Technical Support

Open a ticket if you're looking for further assistance. Our Ticket is 7x24 avaliable.

7x24 Phone Support