ColumnName | Type | AggregationType | Comment |
user_id | LARGEINT | - | User ID |
date | DATE | - | Data Import Date |
timestamp | DATETIME | - | Data import time, precise to the second |
city | VARCHAR(20) | - | User's city |
age | SMALLINT | - | User's age |
sex | TINYINT | - | User's gender |
last_visit_date | DATETIME | REPLACE | User's last access time |
cost | BIGINT | SUM | Total user consumption |
max_dwell_time | INT | MAX | User's maximum stay time |
min_dwell_time | INT | MIN | User's minimum stay time |
user_id | date | timestamp | city | age | sex | last_visit_date | cost | max_dwell_time | min_dwell_time |
10000 | 2017-10-01 | 2017-10-01 08:00:05 | Beijing | 20 | 0 | 2017-10-01 06:00:00 | 20 | 10 | 10 |
10000 | 2017-10-01 | 2017-10-01 09:00:05 | Beijing | 20 | 0 | 2017-10-01 07:00:00 | 15 | 2 | 2 |
10001 | 2017-10-01 | 2017-10-01 18:12:10 | Beijing | 30 | 1 | 2017-10-01 17:05:45 | 2 | 22 | 22 |
10002 | 2017-10-02 | 2017-10-02 13:10:00 | Shanghai | 20 | 1 | 2017-10-02 12:59:12 | 200 | 5 | 5 |
10003 | 2017-10-02 | 2017-10-02 13:15:00 | Guangzhou | 32 | 0 | 2017-10-02 11:20:00 | 30 | 11 | 11 |
10004 | 2017-10-01 | 2017-10-01 12:12:48 | Shenzhen | 35 | 0 | 2017-10-01 10:00:15 | 100 | 3 | 3 |
10004 | 2017-10-03 | 2017-10-03 12:38:20 | Shenzhen | 35 | 0 | 2017-10-03 10:20:22 | 11 | 6 | 6 |
user_id | cost |
10000 | 35 |
10001 | 2 |
10002 | 200 |
10003 | 30 |
10004 | 111 |
SELECT user_id, sum(cost) FROM table GROUP BY user_id;
ColumnName | Type | AggregationType | Comment |
city | VARCHAR(20) | - | User's city |
age | SMALLINT | - | User's age |
cost | BIGINT | SUM | Total user consumption |
max_dwell_time | INT | MAX | User's maximum stay time |
min_dwell_time | INT | MIN | User's minimum stay time |
city | age | cost | max_dwell_time | min_dwell_time |
Beijing | 20 | 35 | 10 | 2 |
Beijing | 30 | 2 | 22 | 22 |
Shanghai | 20 | 200 | 5 | 5 |
Guangzhou | 32 | 30 | 11 | 11 |
Shenzhen | 35 | 111 | 6 | 3 |
mysql> SELECT city, age, sum(cost), max(max_dwell_time), min(min_dwell_time) FROM table GROUP BY city, age;mysql> SELECT city, sum(cost), max(max_dwell_time), min(min_dwell_time) FROM table GROUP BY city;mysql> SELECT city, age, sum(cost), min(min_dwell_time) FROM table GROUP BY city, age;
ColumnName | Type |
user_id | BIGINT |
age | INT |
message | VARCHAR(100) |
max_dwell_time | DATETIME |
min_dwell_time | DATETIME |
ColumnName | Type |
age | INT |
user_id | BIGINT |
message | VARCHAR(100) |
max_dwell_time | DATETIME |
min_dwell_time | DATETIME |
mysql> SELECT * FROM table where age=20 and message LIKE "%error%";
EXPLAIN your_sql;
command to obtain the query execution plan, and check whether ROLLUP is hit in the plan.DESC tbl_name ALL;
command.-----> Match from left to right+----+----+----+----+----+----+| c1 | c2 | c3 | c4 | c5 |... |
+---------------+-------+--------------+------+-------+---------+-------+| IndexName | Field | Type | Null | Key | Default | Extra |+---------------+-------+--------------+------+-------+---------+-------+| test | k1 | TINYINT | Yes | true | N/A | || | k2 | SMALLINT | Yes | true | N/A | || | k3 | INT | Yes | true | N/A | || | k4 | BIGINT | Yes | true | N/A | || | k5 | DECIMAL(9,3) | Yes | true | N/A | || | k6 | CHAR(5) | Yes | true | N/A | || | k7 | DATE | Yes | true | N/A | || | k8 | DATETIME | Yes | true | N/A | || | k9 | VARCHAR(20) | Yes | true | N/A | || | k10 | DOUBLE | Yes | false | N/A | MAX || | k11 | FLOAT | Yes | false | N/A | SUM || | | | | | | || rollup_index1 | k9 | VARCHAR(20) | Yes | true | N/A | || | k1 | TINYINT | Yes | true | N/A | || | k2 | SMALLINT | Yes | true | N/A | || | k3 | INT | Yes | true | N/A | || | k4 | BIGINT | Yes | true | N/A | || | k5 | DECIMAL(9,3) | Yes | true | N/A | || | k6 | CHAR(5) | Yes | true | N/A | || | k7 | DATE | Yes | true | N/A | || | k8 | DATETIME | Yes | true | N/A | || | k10 | DOUBLE | Yes | false | N/A | MAX || | k11 | FLOAT | Yes | false | N/A | SUM || | | | | | | || rollup_index2 | k9 | VARCHAR(20) | Yes | true | N/A | || | k2 | SMALLINT | Yes | true | N/A | || | k1 | TINYINT | Yes | true | N/A | || | k3 | INT | Yes | true | N/A | || | k4 | BIGINT | Yes | true | N/A | || | k5 | DECIMAL(9,3) | Yes | true | N/A | || | k6 | CHAR(5) | Yes | true | N/A | || | k7 | DATE | Yes | true | N/A | || | k8 | DATETIME | Yes | true | N/A | || | k10 | DOUBLE | Yes | false | N/A | MAX || | k11 | FLOAT | Yes | false | N/A | SUM || | | | | | | || rollup_index3 | k4 | BIGINT | Yes | true | N/A | || | k5 | DECIMAL(9,3) | Yes | true | N/A | || | k6 | CHAR(5) | Yes | true | N/A | || | k1 | TINYINT | Yes | true | N/A | || | k2 | SMALLINT | Yes | true | N/A | || | k3 | INT | Yes | true | N/A | || | k7 | DATE | Yes | true | N/A | || | k8 | DATETIME | Yes | true | N/A | || | k9 | VARCHAR(20) | Yes | true | N/A | || | k10 | DOUBLE | Yes | false | N/A | MAX || | k11 | FLOAT | Yes | false | N/A | SUM || | | | | | | || rollup_index4 | k4 | BIGINT | Yes | true | N/A | || | k6 | CHAR(5) | Yes | true | N/A | || | k5 | DECIMAL(9,3) | Yes | true | N/A | || | k1 | TINYINT | Yes | true | N/A | || | k2 | SMALLINT | Yes | true | N/A | || | k3 | INT | Yes | true | N/A | || | k7 | DATE | Yes | true | N/A | || | k8 | DATETIME | Yes | true | N/A | || | k9 | VARCHAR(20) | Yes | true | N/A | || | k10 | DOUBLE | Yes | false | N/A | MAX || | k11 | FLOAT | Yes | false | N/A | SUM |+---------------+-------+--------------+------+-------+---------+-------+
Base(k1 ,k2, k3, k4, k5, k6, k7)rollup_index1(k9)rollup_index2(k9)rollup_index3(k4, k5, k6, k1, k2, k3, k7)rollup_index4(k4, k6, k5, k1, k2, k3, k7)
=<><=>=inbetween
, and the parallel relationship is connected with and
, for or
,!=
and so on cannot be hit, then look at the following query:
SELECT * FROM test WHERE k1 = 1 AND k2 > 3;
.
There are conditions on k1 and k2, check that only the first column of Base contains the k1 in the condition, so the longest matching prefix index is test:| 0:OlapScanNode| TABLE: test| PREAGGREGATION: OFF. Reason: No AggregateInfo| PREDICATES: `k1` = 1, `k2` > 3| partitions=1/1| rollup: test| buckets=1/10| cardinality=-1| avgRowSize=0.0| numNodes=0| tuple ids: 0
SELECT * FROM test WHERE k4 = 1 AND k5 > 3;
There are conditions on k4 and k5, check that the first column of rollup_index3, rollup_index4 contains k4, but the second column of rollup_index3 contains k5, so the longest matching prefix index.| 0:OlapScanNode| TABLE: test| PREAGGREGATION: OFF. Reason: No AggregateInfo| PREDICATES: `k4` = 1, `k5` > 3| partitions=1/1| rollup: rollup_index3| buckets=10/10| cardinality=-1| avgRowSize=0.0| numNodes=0| tuple ids: 0
SELECT * FROM test WHERE k9 IN ("xxx", "yyyy") AND k1 = 10;
.| 0:OlapScanNode| TABLE: test| PREAGGREGATION: OFF. Reason: No AggregateInfo| PREDICATES: `k9` IN ('xxx', 'yyyy'), `k1` = 10| partitions=1/1| rollup: rollup_index1| buckets=1/10| cardinality=-1| avgRowSize=0.0| numNodes=0| tuple ids: 0
SELECT * FROM test WHERE k4 < 1000 AND k5 = 80 AND k6 >= 10000;
.
There are three conditions, k4, k5, k6. The first three columns of rollup_index3 and rollup_index4 respectively contain these three columns, so the length of the prefix index matched by these two is the same, and the selection of these two can be done. The current default strategy is to select a rollup created earlier, and rollup_index3 is selected here.| 0:OlapScanNode| TABLE: test| PREAGGREGATION: OFF. Reason: No AggregateInfo| PREDICATES: `k4` < 1000, `k5` = 80, `k6` >= 10000.0| partitions=1/1| rollup: rollup_index3| buckets=10/10| cardinality=-1| avgRowSize=0.0| numNodes=0| tuple ids: 0
SELECT * FROM test WHERE k4 < 1000 AND k5 = 80 OR k6 >= 10000;
, this query cannot hit the prefix index. (Even any Min/Max, BloomFilter index in Doris storage engine cannot work).Column Type Query Type | Sum | Distinct/Count Distinct | Min | Max | APPROX_COUNT_DISTINCT |
Key | false | true | true | true | true |
Value(Sum) | true | false | false | false | false |
Value(Replace) | false | false | false | false | false |
Value(Min) | false | false | true | false | false |
Value(Max) | false | false | false | true | false |
+-------------+-------+--------------+------+-------+---------+-------+| IndexName | Field | Type | Null | Key | Default | Extra |+-------------+-------+--------------+------+-------+---------+-------+| test_rollup | k1 | TINYINT | Yes | true | N/A | || | k2 | SMALLINT | Yes | true | N/A | || | k3 | INT | Yes | true | N/A | || | k4 | BIGINT | Yes | true | N/A | || | k5 | DECIMAL(9,3) | Yes | true | N/A | || | k6 | CHAR(5) | Yes | true | N/A | || | k7 | DATE | Yes | true | N/A | || | k8 | DATETIME | Yes | true | N/A | || | k9 | VARCHAR(20) | Yes | true | N/A | || | k10 | DOUBLE | Yes | false | N/A | MAX || | k11 | FLOAT | Yes | false | N/A | SUM || | | | | | | || rollup2 | k1 | TINYINT | Yes | true | N/A | || | k2 | SMALLINT | Yes | true | N/A | || | k3 | INT | Yes | true | N/A | || | k10 | DOUBLE | Yes | false | N/A | MAX || | k11 | FLOAT | Yes | false | N/A | SUM || | | | | | | || rollup1 | k1 | TINYINT | Yes | true | N/A | || | k2 | SMALLINT | Yes | true | N/A | || | k3 | INT | Yes | true | N/A | || | k4 | BIGINT | Yes | true | N/A | || | k5 | DECIMAL(9,3) | Yes | true | N/A | || | k10 | DOUBLE | Yes | false | N/A | MAX || | k11 | FLOAT | Yes | false | N/A | SUM |+-------------+-------+--------------+------+-------+---------+-------+
SELECT SUM(k11) FROM test_rollup WHERE k1 = 10 AND k2 > 200 AND k3 in (1,2,3);
.
Firstly, judge whether the query can hit the aggregate's Rollup table. After checking the above diagram, it is positive. The conditions contain the three conditions k1, k2, k3, which are all contained in the first three columns of test_rollup, rollup1, rollup2, so the prefix index length is consistent, and then compare the number of rows. Obviously, rollup2 has the highest aggregate degree and the fewest number of rows, so you need to choose rollup2.| 0:OlapScanNode || TABLE: test_rollup || PREAGGREGATION: ON || PREDICATES: `k1` = 10, `k2` > 200, `k3` IN (1, 2, 3) || partitions=1/1 || rollup: rollup2 || buckets=1/10 || cardinality=-1 || avgRowSize=0.0 || numNodes=0 || tuple ids: 0 |
select l_partkey,l_suppkey,l_quantity from lineitem where l_partkey=xxxx;
alter table lineitem add rollup rutest(l_partkey,l_suppkey,l_quantity);
explain select l_partkey,l_suppkey,l_quantity from lineitem where l_partkey= 10187238;
Was this page helpful?