ColumnName | Type | AggregationType | Comment |
user_id | LARGEINT | - | 用户 ID |
date | DATE | - | 数据导入日期 |
timestamp | DATETIME | - | 数据导入时间,精确到秒 |
city | VARCHAR(20) | - | 用户所在城市 |
age | SMALLINT | - | 用户年龄 |
sex | TINYINT | - | 用户性别 |
last_visit_date | DATETIME | REPLACE | 用户最后一次访问时间 |
cost | BIGINT | SUM | 用户总消费 |
max_dwell_time | INT | MAX | 用户最大停留时间 |
min_dwell_time | INT | MIN | 用户最小停留时间 |
user_id | date | timestamp | city | age | sex | last_visit_date | cost | max_dwell_time | min_dwell_time |
10000 | 2017-10-01 | 2017-10-01 08:00:05 | 北京 | 20 | 0 | 2017-10-01 06:00:00 | 20 | 10 | 10 |
10000 | 2017-10-01 | 2017-10-01 09:00:05 | 北京 | 20 | 0 | 2017-10-01 07:00:00 | 15 | 2 | 2 |
10001 | 2017-10-01 | 2017-10-01 18:12:10 | 北京 | 30 | 1 | 2017-10-01 17:05:45 | 2 | 22 | 22 |
10002 | 2017-10-02 | 2017-10-02 13:10:00 | 上海 | 20 | 1 | 2017-10-02 12:59:12 | 200 | 5 | 5 |
10003 | 2017-10-02 | 2017-10-02 13:15:00 | 广州 | 32 | 0 | 2017-10-02 11:20:00 | 30 | 11 | 11 |
10004 | 2017-10-01 | 2017-10-01 12:12:48 | 深圳 | 35 | 0 | 2017-10-01 10:00:15 | 100 | 3 | 3 |
10004 | 2017-10-03 | 2017-10-03 12:38:20 | 深圳 | 35 | 0 | 2017-10-03 10:20:22 | 11 | 6 | 6 |
user_id | cost |
10000 | 35 |
10001 | 2 |
10002 | 200 |
10003 | 30 |
10004 | 111 |
SELECT user_id, sum(cost) FROM table GROUP BY user_id;
ColumnName | Type | AggregationType | Comment |
city | VARCHAR(20) | - | 用户所在城市 |
age | SMALLINT | - | 用户年龄 |
cost | BIGINT | SUM | 用户总消费 |
max_dwell_time | INT | MAX | 用户最大停留时间 |
min_dwell_time | INT | MIN | 用户最小停留时间 |
city | age | cost | max_dwell_time | min_dwell_time |
北京 | 20 | 35 | 10 | 2 |
北京 | 30 | 2 | 22 | 22 |
上海 | 20 | 200 | 5 | 5 |
广州 | 32 | 30 | 11 | 11 |
深圳 | 35 | 111 | 6 | 3 |
mysql> SELECT city, age, sum(cost), max(max_dwell_time), min(min_dwell_time) FROM table GROUP BY city, age;mysql> SELECT city, sum(cost), max(max_dwell_time), min(min_dwell_time) FROM table GROUP BY city;mysql> SELECT city, age, sum(cost), min(min_dwell_time) FROM table GROUP BY city, age;
ColumnName | Type |
user_id | BIGINT |
age | INT |
message | VARCHAR(100) |
max_dwell_time | DATETIME |
min_dwell_time | DATETIME |
ColumnName | Type |
age | INT |
user_id | BIGINT |
message | VARCHAR(100) |
max_dwell_time | DATETIME |
min_dwell_time | DATETIME |
mysql> SELECT * FROM table where age=20 and message LIKE "%error%";
EXPLAIN your_sql;
命令获得查询执行计划,在执行计划中,查看是否命中 ROLLUP。DESC tbl_name ALL;
语句显示 Base 表和所有已创建完成的 ROLLUP。-----> 从左到右匹配+----+----+----+----+----+----+| c1 | c2 | c3 | c4 | c5 |... |
+---------------+-------+--------------+------+-------+---------+-------+| IndexName | Field | Type | Null | Key | Default | Extra |+---------------+-------+--------------+------+-------+---------+-------+| test | k1 | TINYINT | Yes | true | N/A | || | k2 | SMALLINT | Yes | true | N/A | || | k3 | INT | Yes | true | N/A | || | k4 | BIGINT | Yes | true | N/A | || | k5 | DECIMAL(9,3) | Yes | true | N/A | || | k6 | CHAR(5) | Yes | true | N/A | || | k7 | DATE | Yes | true | N/A | || | k8 | DATETIME | Yes | true | N/A | || | k9 | VARCHAR(20) | Yes | true | N/A | || | k10 | DOUBLE | Yes | false | N/A | MAX || | k11 | FLOAT | Yes | false | N/A | SUM || | | | | | | || rollup_index1 | k9 | VARCHAR(20) | Yes | true | N/A | || | k1 | TINYINT | Yes | true | N/A | || | k2 | SMALLINT | Yes | true | N/A | || | k3 | INT | Yes | true | N/A | || | k4 | BIGINT | Yes | true | N/A | || | k5 | DECIMAL(9,3) | Yes | true | N/A | || | k6 | CHAR(5) | Yes | true | N/A | || | k7 | DATE | Yes | true | N/A | || | k8 | DATETIME | Yes | true | N/A | || | k10 | DOUBLE | Yes | false | N/A | MAX || | k11 | FLOAT | Yes | false | N/A | SUM || | | | | | | || rollup_index2 | k9 | VARCHAR(20) | Yes | true | N/A | || | k2 | SMALLINT | Yes | true | N/A | || | k1 | TINYINT | Yes | true | N/A | || | k3 | INT | Yes | true | N/A | || | k4 | BIGINT | Yes | true | N/A | || | k5 | DECIMAL(9,3) | Yes | true | N/A | || | k6 | CHAR(5) | Yes | true | N/A | || | k7 | DATE | Yes | true | N/A | || | k8 | DATETIME | Yes | true | N/A | || | k10 | DOUBLE | Yes | false | N/A | MAX || | k11 | FLOAT | Yes | false | N/A | SUM || | | | | | | || rollup_index3 | k4 | BIGINT | Yes | true | N/A | || | k5 | DECIMAL(9,3) | Yes | true | N/A | || | k6 | CHAR(5) | Yes | true | N/A | || | k1 | TINYINT | Yes | true | N/A | || | k2 | SMALLINT | Yes | true | N/A | || | k3 | INT | Yes | true | N/A | || | k7 | DATE | Yes | true | N/A | || | k8 | DATETIME | Yes | true | N/A | || | k9 | VARCHAR(20) | Yes | true | N/A | || | k10 | DOUBLE | Yes | false | N/A | MAX || | k11 | FLOAT | Yes | false | N/A | SUM || | | | | | | || rollup_index4 | k4 | BIGINT | Yes | true | N/A | || | k6 | CHAR(5) | Yes | true | N/A | || | k5 | DECIMAL(9,3) | Yes | true | N/A | || | k1 | TINYINT | Yes | true | N/A | || | k2 | SMALLINT | Yes | true | N/A | || | k3 | INT | Yes | true | N/A | || | k7 | DATE | Yes | true | N/A | || | k8 | DATETIME | Yes | true | N/A | || | k9 | VARCHAR(20) | Yes | true | N/A | || | k10 | DOUBLE | Yes | false | N/A | MAX || | k11 | FLOAT | Yes | false | N/A | SUM |+---------------+-------+--------------+------+-------+---------+-------+
Base(k1 ,k2, k3, k4, k5, k6, k7)rollup_index1(k9)rollup_index2(k9)rollup_index3(k4, k5, k6, k1, k2, k3, k7)rollup_index4(k4, k6, k5, k1, k2, k3, k7)
=<><=>=inbetween
,并列的且关系使用 and
连接,对于or
、!=
等这些不能命中,然后看以下查询:
SELECT * FROM test WHERE k1 = 1 AND k2 > 3;
。
有 k1 以及 k2 上的条件,检查只有 Base 的第一列含有条件里的 k1,所以匹配最长的前缀索引即 test:| 0:OlapScanNode| TABLE: test| PREAGGREGATION: OFF. Reason: No AggregateInfo| PREDICATES: `k1` = 1, `k2` > 3| partitions=1/1| rollup: test| buckets=1/10| cardinality=-1| avgRowSize=0.0| numNodes=0| tuple ids: 0
SELECT * FROM test WHERE k4 = 1 AND k5 > 3;
有 k4 以及 k5 的条件,检查 rollup_index3、rollup_index4 的第一列含有 k4,但是 rollup_index3 的第二列含有 k5,所以匹配的前缀索引最长。| 0:OlapScanNode| TABLE: test| PREAGGREGATION: OFF. Reason: No AggregateInfo| PREDICATES: `k4` = 1, `k5` > 3| partitions=1/1| rollup: rollup_index3| buckets=10/10| cardinality=-1| avgRowSize=0.0| numNodes=0| tuple ids: 0
SELECT * FROM test WHERE k9 IN ("xxx", "yyyy") AND k1 = 10;
。| 0:OlapScanNode| TABLE: test| PREAGGREGATION: OFF. Reason: No AggregateInfo| PREDICATES: `k9` IN ('xxx', 'yyyy'), `k1` = 10| partitions=1/1| rollup: rollup_index1| buckets=1/10| cardinality=-1| avgRowSize=0.0| numNodes=0| tuple ids: 0
SELECT * FROM test WHERE k4 < 1000 AND k5 = 80 AND k6 >= 10000;
。
有 k4,k5,k6 三个条件,rollup_index3 以及 rollup_index4 的前3列分别含有这三列,所以两者匹配的前缀索引长度一致,选取两者都可以,当前默认的策略为选取了比较早创建的一张 rollup,这里为 rollup_index3。| 0:OlapScanNode| TABLE: test| PREAGGREGATION: OFF. Reason: No AggregateInfo| PREDICATES: `k4` < 1000, `k5` = 80, `k6` >= 10000.0| partitions=1/1| rollup: rollup_index3| buckets=10/10| cardinality=-1| avgRowSize=0.0| numNodes=0| tuple ids: 0
SELECT * FROM test WHERE k4 < 1000 AND k5 = 80 OR k6 >= 10000;
。
则这里的查询不能命中前缀索引。(甚至 Doris 存储引擎内的任何 Min/Max,BloomFilter 索引都不能起作用)。列类型 查询类型 | Sum | Distinct/Count Distinct | Min | Max | APPROX_COUNT_DISTINCT |
Key | false | true | true | true | true |
Value(Sum) | true | false | false | false | false |
Value(Replace) | false | false | false | false | false |
Value(Min) | false | false | true | false | false |
Value(Max) | false | false | false | true | false |
+-------------+-------+--------------+------+-------+---------+-------+| IndexName | Field | Type | Null | Key | Default | Extra |+-------------+-------+--------------+------+-------+---------+-------+| test_rollup | k1 | TINYINT | Yes | true | N/A | || | k2 | SMALLINT | Yes | true | N/A | || | k3 | INT | Yes | true | N/A | || | k4 | BIGINT | Yes | true | N/A | || | k5 | DECIMAL(9,3) | Yes | true | N/A | || | k6 | CHAR(5) | Yes | true | N/A | || | k7 | DATE | Yes | true | N/A | || | k8 | DATETIME | Yes | true | N/A | || | k9 | VARCHAR(20) | Yes | true | N/A | || | k10 | DOUBLE | Yes | false | N/A | MAX || | k11 | FLOAT | Yes | false | N/A | SUM || | | | | | | || rollup2 | k1 | TINYINT | Yes | true | N/A | || | k2 | SMALLINT | Yes | true | N/A | || | k3 | INT | Yes | true | N/A | || | k10 | DOUBLE | Yes | false | N/A | MAX || | k11 | FLOAT | Yes | false | N/A | SUM || | | | | | | || rollup1 | k1 | TINYINT | Yes | true | N/A | || | k2 | SMALLINT | Yes | true | N/A | || | k3 | INT | Yes | true | N/A | || | k4 | BIGINT | Yes | true | N/A | || | k5 | DECIMAL(9,3) | Yes | true | N/A | || | k10 | DOUBLE | Yes | false | N/A | MAX || | k11 | FLOAT | Yes | false | N/A | SUM |+-------------+-------+--------------+------+-------+---------+-------+
SELECT SUM(k11) FROM test_rollup WHERE k1 = 10 AND k2 > 200 AND k3 in (1,2,3);
。
首先判断查询是否可以命中聚合的 Rollup表,经过查上面的图是可以的,然后条件中含有 k1,k2,k3 三个条件,这三个条件 test_rollup、rollup1、rollup2 的前三列都含有,所以前缀索引长度一致,然后比较行数显然 rollup2 的聚合程度最高行数最少所以选取 rollup2。| 0:OlapScanNode || TABLE: test_rollup || PREAGGREGATION: ON || PREDICATES: `k1` = 10, `k2` > 200, `k3` IN (1, 2, 3) || partitions=1/1 || rollup: rollup2 || buckets=1/10 || cardinality=-1 || avgRowSize=0.0 || numNodes=0 || tuple ids: 0 |
select l_partkey,l_suppkey,l_quantity from lineitem where l_partkey=xxxx;
alter table lineitem add rollup rutest(l_partkey,l_suppkey,l_quantity);
explain select l_partkey,l_suppkey,l_quantity from lineitem where l_partkey= 10187238;
本页内容是否解决了您的问题?