聚合函数 | SUM | MAX | SUM | COUNT | BITMAP_UNION | HLL_UNION |
Rollup | true | true | true | false | false | false |
物化视图 | true | true | true | true | true | true |
HELP CREATE MATERIALIZED VIEW
BITMAP_UNION(TO_BITMAP(COLUMN))
column 列的类型只能是整数(largeint 也不支持),或者 BITMAP_UNION(COLUMN)
且 base 表为 AGG 模型。HLL_UNION(HLL_HASH(COLUMN))
column 列的类型不能是 DECIMAL ,或者 HLL_UNION(COLUMN)
且 base 表为 AGG 模型。物化视图聚合 | 查询中聚合 |
sum | sum |
min | min |
max | max |
count | count |
bitmap_union | bitmap_union,bitmap_union_count, count(distinct) |
hll_union | hll_raw_agg, hll_union_agg, ndv, approx_count_distinct |
MySQL [test]> desc mv_test all;+-----------+---------------+-----------------+----------+------+-------+---------+--------------+| IndexName | IndexKeysType | Field | Type | Null | Key | Default | Extra |+-----------+---------------+-----------------+----------+------+-------+---------+--------------+| mv_test | DUP_KEYS | k1 | INT | Yes | true | NULL | || | | k2 | BIGINT | Yes | true | NULL | || | | k3 | LARGEINT | Yes | true | NULL | || | | k4 | SMALLINT | Yes | false | NULL | NONE || | | | | | | | || mv_2 | AGG_KEYS | k2 | BIGINT | Yes | true | NULL | || | | k4 | SMALLINT | Yes | false | NULL | MIN || | | k1 | INT | Yes | false | NULL | MAX || | | | | | | | || mv_3 | AGG_KEYS | k1 | INT | Yes | true | NULL | || | | to_bitmap(`k2`) | BITMAP | No | false | | BITMAP_UNION || | | | | | | | || mv_1 | AGG_KEYS | k4 | SMALLINT | Yes | true | NULL | || | | k1 | BIGINT | Yes | false | NULL | SUM || | | k3 | LARGEINT | Yes | false | NULL | SUM || | | k2 | BIGINT | Yes | false | NULL | MIN |+-----------+---------------+-----------------+----------+------+-------+---------+--------------+
mv_test
表一共有三张物化视图:mv_1,mv_2 和 mv_3,以及他们的表结构。DROP MATERIALIZED VIEW
HELP DROP MATERIALIZED VIEW
SHOW CREATE MATERIALIZED VIEW
create table sales_records(record_id int, seller_id int, store_id int, sale_date date, sale_amt bigint) distributed by hash(record_id) properties("replication_num" = "1");
sales_records
的表结构如下:MySQL [test]> desc sales_records;+-----------+--------+------+-------+---------+-------+| Field | Type | Null | Key | Default | Extra |+-----------+--------+------+-------+---------+-------+| record_id | INT | Yes | true | NULL | || seller_id | INT | Yes | true | NULL | || store_id | INT | Yes | true | NULL | || sale_date | DATE | Yes | false | NULL | NONE || sale_amt | BIGINT | Yes | false | NULL | NONE |+-----------+--------+------+-------+---------+-------+
sales_records
表创建一张以售卖门店分组,对相同售卖门店的销售额求和的一个物化视图。创建语句如下:MySQL [test]> create materialized view store_amt as select store_id, sum(sale_amt) from sales_records group by store_id;
Query OK, 0 rows affected (0.012 sec)
SHOW ALTER TABLE ROLLUP FROM db_name; (Version 0.12)SHOW ALTER TABLE MATERIALIZED VIEW FROM db_name; (Version 0.13)
db_name
是一个参数,您需要替换成自己真实的 db 名称。命令的结果是显示这个 db 的所有创建物化视图的任务。结果如下:+-------+---------------+---------------------+---------------------+---------------+-----------------+----------+---------------+-----------+-------------------------------------------------------------------------------------------------------------------------+----------+---------+| JobId | TableName | CreateTime | FinishedTime | BaseIndexName | RollupIndexName | RollupId | TransactionId | State | Msg | Progress | Timeout |+-------+---------------+---------------------+---------------------+---------------+-----------------+----------+---------------+-----------+-------------------------------------------------------------------------------------------------------------------------+----------+---------+| 22036 | sales_records | 2020-07-30 20:04:28 | 2020-07-30 20:04:57 | sales_records | store_amt | 22037 | 5008 | FINISHED | | NULL | 86400 |+-------+---------------+---------------------+---------------------+---------------+-----------------+----------+---------------+-----------+-------------------------------------------------------------------------------------------------------------------------+----------+---------+
store_amt
中读取聚合好的数据。达到提升查询效率的效果。
用户的查询依旧指定查询 sales_records
表,例如:SELECT store_id, sum(sale_amt) FROM sales_records GROUP BY store_id;
store_amt
。用户可以通过下面命令,检验当前查询是否匹配到了合适的物化视图。EXPLAIN SELECT store_id, sum(sale_amt) FROM sales_records GROUP BY store_id;+-----------------------------------------------------------------------------+| Explain String |+-----------------------------------------------------------------------------+| PLAN FRAGMENT 0 || OUTPUT EXPRS:<slot 2> `store_id` | <slot 3> sum(`sale_amt`) || PARTITION: UNPARTITIONED || || RESULT SINK || || 4:EXCHANGE || || PLAN FRAGMENT 1 || OUTPUT EXPRS: || PARTITION: HASH_PARTITIONED: <slot 2> `store_id` || || STREAM DATA SINK || EXCHANGE ID: 04 || UNPARTITIONED || || 3:AGGREGATE (merge finalize) || | output: sum(<slot 3> sum(`sale_amt`)) || | group by: <slot 2> `store_id` || | || 2:EXCHANGE || || PLAN FRAGMENT 2 || OUTPUT EXPRS: || PARTITION: RANDOM || || STREAM DATA SINK || EXCHANGE ID: 02 || HASH_PARTITIONED: <slot 2> `store_id` || || 1:AGGREGATE (update serialize) || | STREAMING || | output: sum(`sale_amt`) || | group by: `store_id` || | || 0:OlapScanNode || TABLE: sales_records || PREAGGREGATION: ON || partitions=1/1 || rollup: store_amt || tabletRatio=10/10 || tabletList=22038,22040,22042,22044,22046,22048,22050,22052,22054,22056 || cardinality=0 || avgRowSize=0.0 || numNodes=1 |+-----------------------------------------------------------------------------+45 rows in set (0.006 sec)
store_amt
。也就是说查询已经正确匹配到物化视图 store_amt
,并直接从物化视图中读取数据了。bitmap_union
的物化视图来提升查询速度。
通过下面语句首先创建一个存储广告点击数据明细的表,包含每条点击的点击事件,点击的是什么广告,通过什么渠道点击,以及点击的用户是谁。MySQL [test]> create table advertiser_view_record(time date, advertiser varchar(10), channel varchar(10), user_id int) distributed by hash(time) properties("replication_num" = "1");Query OK, 0 rows affected (0.014 sec)
MySQL [test]> desc advertiser_view_record;+------------+-------------+------+-------+---------+-------+| Field | Type | Null | Key | Default | Extra |+------------+-------------+------+-------+---------+-------+| time | DATE | Yes | true | NULL | || advertiser | VARCHAR(10) | Yes | true | NULL | || channel | VARCHAR(10) | Yes | false | NULL | NONE || user_id | INT | Yes | false | NULL | NONE |+------------+-------------+------+-------+---------+-------+4 rows in set (0.001 sec)
SELECT advertiser, channel, count(distinct user_id) FROM advertiser_view_record GROUP BY advertiser, channel;
bitmap_union
的物化视图从而达到一个预先精确去重的效果。
在 Doris 中,count(distinct)
聚合的结果和 bitmap_union_count
聚合的结果是完全一致的。而bitmap_union_count
等于 bitmap_union
的结果求 count,所以如果查询中涉及到 count(distinct)
则通过创建带 bitmap_union
聚合的物化视图方可加快查询。
针对这个 case,则可以创建一个根据广告和渠道分组,对 user_id
进行精确去重的物化视图。MySQL [test]> create materialized view advertiser_uv as select advertiser, channel, bitmap_union(to_bitmap(user_id)) from advertiser_view_record group by advertiser, channel;Query OK, 0 rows affected (0.012 sec)
to_bitmap
转换为 bitmap 类型然后才可以进行 bitmap_union
聚合。MySQL [test]> desc advertiser_view_record all;+------------------------+---------------+----------------------+-------------+------+-------+---------+--------------+| IndexName | IndexKeysType | Field | Type | Null | Key | Default | Extra |+------------------------+---------------+----------------------+-------------+------+-------+---------+--------------+| advertiser_view_record | DUP_KEYS | time | DATE | Yes | true | NULL | || | | advertiser | VARCHAR(10) | Yes | true | NULL | || | | channel | VARCHAR(10) | Yes | false | NULL | NONE || | | user_id | INT | Yes | false | NULL | NONE || | | | | | | | || advertiser_uv | AGG_KEYS | advertiser | VARCHAR(10) | Yes | true | NULL | || | | channel | VARCHAR(10) | Yes | true | NULL | || | | to_bitmap(`user_id`) | BITMAP | No | false | | BITMAP_UNION |+------------------------+---------------+----------------------+-------------+------+-------+---------+--------------+
advertiser_uv
中查询数据。例如原始的查询语句如下:SELECT advertiser, channel, count(distinct user_id) FROM advertiser_view_record GROUP BY advertiser, channel;
SELECT advertiser, channel, bitmap_union_count(to_bitmap(user_id)) FROM advertiser_uv GROUP BY advertiser, channel;
MySQL [test]> explain SELECT advertiser, channel, count(distinct user_id) FROM advertiser_view_record GROUP BY advertiser, channel;+-------------------------------------------------------------------------------------------------------------------------------------------------------------------+| Explain String |+-------------------------------------------------------------------------------------------------------------------------------------------------------------------+| PLAN FRAGMENT 0 || OUTPUT EXPRS:<slot 7> `advertiser` | <slot 8> `channel` | <slot 9> bitmap_union_count(`default_cluster:test`.`advertiser_view_record`.`mv_bitmap_union_user_id`) || PARTITION: UNPARTITIONED || || RESULT SINK || || 4:EXCHANGE || || PLAN FRAGMENT 1 || OUTPUT EXPRS: || PARTITION: HASH_PARTITIONED: <slot 4> `advertiser`, <slot 5> `channel` || || STREAM DATA SINK || EXCHANGE ID: 04 || UNPARTITIONED || || 3:AGGREGATE (merge finalize) || | output: bitmap_union_count(<slot 6> bitmap_union_count(`default_cluster:test`.`advertiser_view_record`.`mv_bitmap_union_user_id`)) || | group by: <slot 4> `advertiser`, <slot 5> `channel` || | || 2:EXCHANGE || || PLAN FRAGMENT 2 || OUTPUT EXPRS: || PARTITION: RANDOM || || STREAM DATA SINK || EXCHANGE ID: 02 || HASH_PARTITIONED: <slot 4> `advertiser`, <slot 5> `channel` || || 1:AGGREGATE (update serialize) || | STREAMING || | output: bitmap_union_count(`default_cluster:test`.`advertiser_view_record`.`mv_bitmap_union_user_id`) || | group by: `advertiser`, `channel` || | || 0:OlapScanNode || TABLE: advertiser_view_record || PREAGGREGATION: ON || partitions=1/1 || rollup: advertiser_uv || tabletRatio=10/10 || tabletList=22084,22086,22088,22090,22092,22094,22096,22098,22100,22102 || cardinality=0 || avgRowSize=0.0 || numNodes=1 |+-------------------------------------------------------------------------------------------------------------------------------------------------------------------+45 rows in set (0.030 sec)
user_id
字段求 count(distinct)
被改写为求 bitmap_union_count(to_bitmap)
。也就是通过 bitmap 的方式来达到精确去重的效果。where k1=1 and k2=2
就能通过索引加速查询。
但是有些情况下,用户的过滤条件无法匹配到前缀索引,例如 where k3=3
。则无法通过索引提升查询速度。创建以 k3 作为第一列的物化视图就可以解决这个问题。CREATE MATERIALIZED VIEW mv_1 as SELECT k3, k2, k1 FROM tableA ORDER BY k3;
MySQL [test]> desc tableA all;+-----------+---------------+-------+------+------+-------+---------+-------+| IndexName | IndexKeysType | Field | Type | Null | Key | Default | Extra |+-----------+---------------+-------+------+------+-------+---------+-------+| tableA | DUP_KEYS | k1 | INT | Yes | true | NULL | || | | k2 | INT | Yes | true | NULL | || | | k3 | INT | Yes | true | NULL | || | | | | | | | || mv_1 | DUP_KEYS | k3 | INT | Yes | true | NULL | || | | k2 | INT | Yes | false | NULL | NONE || | | k1 | INT | Yes | false | NULL | NONE |+-----------+---------------+-------+------+------+-------+---------+-------+
select k1, k2, k3 from table A where k3=3;
本页内容是否解决了您的问题?