Aggregate Function | SUM | MAX | SUM | COUNT | BITMAP_UNION | HLL_UNION |
Rollup | true | true | true | false | false | false |
Materialized View | true | true | true | true | true | true |
HELP CREATE MATERIALIZED VIEW
BITMAP_UNION(TO_BITMAP(COLUMN))
where the column type can only be an integer (largeint is not supported), or BITMAP_UNION(COLUMN)
with the base table as the AGG model.HLL_UNION(HLL_HASH(COLUMN))
where the column type cannot be DECIMAL, or HLL_UNION(COLUMN)
with the base table as the AGG model.Aggregate in the Materialized View | Aggregate in the Query |
sum | sum |
min | min |
max | max |
count | count |
bitmap_union | bitmap_union,bitmap_union_count, count(distinct) |
hll_union | hll_raw_agg, hll_union_agg, ndv, approx_count_distinct |
MySQL [test]> desc mv_test all;+-----------+---------------+-----------------+----------+------+-------+---------+--------------+| IndexName | IndexKeysType | Field | Type | Null | Key | Default | Extra |+-----------+---------------+-----------------+----------+------+-------+---------+--------------+| mv_test | DUP_KEYS | k1 | INT | Yes | true | NULL | || | | k2 | BIGINT | Yes | true | NULL | || | | k3 | LARGEINT | Yes | true | NULL | || | | k4 | SMALLINT | Yes | false | NULL | NONE || | | | | | | | || mv_2 | AGG_KEYS | k2 | BIGINT | Yes | true | NULL | || | | k4 | SMALLINT | Yes | false | NULL | MIN || | | k1 | INT | Yes | false | NULL | MAX || | | | | | | | || mv_3 | AGG_KEYS | k1 | INT | Yes | true | NULL | || | | to_bitmap(`k2`) | BITMAP | No | false | | BITMAP_UNION || | | | | | | | || mv_1 | AGG_KEYS | k4 | SMALLINT | Yes | true | NULL | || | | k1 | BIGINT | Yes | false | NULL | SUM || | | k3 | LARGEINT | Yes | false | NULL | SUM || | | k2 | BIGINT | Yes | false | NULL | MIN |+-----------+---------------+-----------------+----------+------+-------+---------+--------------+
mv_test
table: mv_1, mv_2 and mv_3, as well as their table structure.DROP MATERIALIZED VIEW
HELP DROP MATERIALIZED VIEW
SHOW CREATE MATERIALIZED VIEW
create table sales_records(record_id int, seller_id int, store_id int, sale_date date, sale_amt bigint) distributed by hash(record_id) properties("replication_num" = "1");
sales_records
table is as follows:MySQL [test]> desc sales_records;+-----------+--------+------+-------+---------+-------+| Field | Type | Null | Key | Default | Extra |+-----------+--------+------+-------+---------+-------+| record_id | INT | Yes | true | NULL | || seller_id | INT | Yes | true | NULL | || store_id | INT | Yes | true | NULL | || sale_date | DATE | Yes | false | NULL | NONE || sale_amt | BIGINT | Yes | false | NULL | NONE |+-----------+--------+------+-------+---------+-------+
sales_records
table, which groups by the selling store and sums the sales of the same store. The command to create the materialized view is as follows:MySQL [test]> create materialized view store_amt as select store_id, sum(sale_amt) from sales_records group by store_id;
Query OK, 0 rows affected (0.012 sec)
SHOW ALTER TABLE ROLLUP FROM db_name; (Version 0.12)SHOW ALTER TABLE MATERIALIZED VIEW FROM db_name; (Version 0.13)
db_name
is a parameter, and you need to replace it with your actual database name. The command displays all the tasks of creating materialized views in this database. The result is as follows:+-------+---------------+---------------------+---------------------+---------------+-----------------+----------+---------------+-----------+-------------------------------------------------------------------------------------------------------------------------+----------+---------+| JobId | TableName | CreateTime | FinishedTime | BaseIndexName | RollupIndexName | RollupId | TransactionId | State | Msg | Progress | Timeout |+-------+---------------+---------------------+---------------------+---------------+-----------------+----------+---------------+-----------+-------------------------------------------------------------------------------------------------------------------------+----------+---------+| 22036 | sales_records | 2020-07-30 20:04:28 | 2020-07-30 20:04:57 | sales_records | store_amt | 22037 | 5008 | FINISHED | | NULL | 86400 |+-------+---------------+---------------------+---------------------+---------------+-----------------+----------+---------------+-----------+-------------------------------------------------------------------------------------------------------------------------+----------+---------+
store_amt
. This achieves the effect of improving the efficiency of queries.
Users still specify the sales_records
table for their queries, for example:SELECT store_id, sum(sale_amt) FROM sales_records GROUP BY store_id;
store_amt
. Users can check whether the current query has been matched to the appropriate materialized view with the following command.EXPLAIN SELECT store_id, sum(sale_amt) FROM sales_records GROUP BY store_id;+-----------------------------------------------------------------------------+| Explain String |+-----------------------------------------------------------------------------+| PLAN FRAGMENT 0 || OUTPUT EXPRS:<slot 2> `store_id` | <slot 3> sum(`sale_amt`) || PARTITION: UNPARTITIONED || || RESULT SINK || || 4:EXCHANGE || || PLAN FRAGMENT 1 || OUTPUT EXPRS: || PARTITION: HASH_PARTITIONED: <slot 2> `store_id` || || STREAM DATA SINK || EXCHANGE ID: 04 || UNPARTITIONED || || 3:AGGREGATE (merge finalize) || | output: sum(<slot 3> sum(`sale_amt`)) || | group by: <slot 2> `store_id` || | || 2:EXCHANGE || || PLAN FRAGMENT 2 || OUTPUT EXPRS: || PARTITION: RANDOM || || STREAM DATA SINK || EXCHANGE ID: 02 || HASH_PARTITIONED: <slot 2> `store_id` || || 1:AGGREGATE (update serialize) || | STREAMING || | output: sum(`sale_amt`) || | group by: `store_id` || | || 0:OlapScanNode || TABLE: sales_records || PREAGGREGATION: ON || partitions=1/1 || rollup: store_amt || tabletRatio=10/10 || tabletList=22038,22040,22042,22044,22046,22048,22050,22052,22054,22056 || cardinality=0 || avgRowSize=0.0 || numNodes=1 |+-----------------------------------------------------------------------------+45 rows in set (0.006 sec)
store_amt
. In other words, the query has correctly matched the materialized view store_amt
and directly reads data from it.bitmap_union
materialized view.
First, create a table to store ad click data details, including each click event, what ad was clicked, what channel was clicked, and who the user is.MySQL [test]> create table advertiser_view_record(time date, advertiser varchar(10), channel varchar(10), user_id int) distributed by hash(time) properties("replication_num" = "1");Query OK, 0 rows affected (0.014 sec)
MySQL [test]> desc advertiser_view_record;+------------+-------------+------+-------+---------+-------+| Field | Type | Null | Key | Default | Extra |+------------+-------------+------+-------+---------+-------+| time | DATE | Yes | true | NULL | || advertiser | VARCHAR(10) | Yes | true | NULL | || channel | VARCHAR(10) | Yes | false | NULL | NONE || user_id | INT | Yes | false | NULL | NONE |+------------+-------------+------+-------+---------+-------+4 rows in set (0.001 sec)
SELECT advertiser, channel, count(distinct user_id) FROM advertiser_view_record GROUP BY advertiser, channel;
bitmap_union
to achieve a pre-accurate deduplication effect.
In Doris, the result of count(distinct)
Aggregate is completely consistent with the result of bitmap_union_count
Aggregate. The bitmap_union_count
is equal to the count of the bitmap_union
result, so if is involved in the query, count(distinct)
, by creating a materialized view with bitmap_union
Aggregate can speed up the query.
For this case, one can create a materialized view that groups by advertising and channel, and user_id
for precise deduplication.MySQL [test]> create materialized view advertiser_uv as select advertiser, channel, bitmap_union(to_bitmap(user_id)) from advertiser_view_record group by advertiser, channel;Query OK, 0 rows affected (0.012 sec)
to_bitmap
in Doris before it can be bitmap_union
aggregate.MySQL [test]> desc advertiser_view_record all;+------------------------+---------------+----------------------+-------------+------+-------+---------+--------------+| IndexName | IndexKeysType | Field | Type | Null | Key | Default | Extra |+------------------------+---------------+----------------------+-------------+------+-------+---------+--------------+| advertiser_view_record | DUP_KEYS | time | DATE | Yes | true | NULL | || | | advertiser | VARCHAR(10) | Yes | true | NULL | || | | channel | VARCHAR(10) | Yes | false | NULL | NONE || | | user_id | INT | Yes | false | NULL | NONE || | | | | | | | || advertiser_uv | AGG_KEYS | advertiser | VARCHAR(10) | Yes | true | NULL | || | | channel | VARCHAR(10) | Yes | true | NULL | || | | to_bitmap(`user_id`) | BITMAP | No | false | | BITMAP_UNION |+------------------------+---------------+----------------------+-------------+------+-------+---------+--------------+
advertiser_uv
. For example, the original query statement is as follows:SELECT advertiser, channel, count(distinct user_id) FROM advertiser_view_record GROUP BY advertiser, channel;
SELECT advertiser, channel, bitmap_union_count(to_bitmap(user_id)) FROM advertiser_uv GROUP BY advertiser, channel;
MySQL [test]> explain SELECT advertiser, channel, count(distinct user_id) FROM advertiser_view_record GROUP BY advertiser, channel;+-------------------------------------------------------------------------------------------------------------------------------------------------------------------+| Explain String |+-------------------------------------------------------------------------------------------------------------------------------------------------------------------+| PLAN FRAGMENT 0 || OUTPUT EXPRS:<slot 7> `advertiser` | <slot 8> `channel` | <slot 9> bitmap_union_count(`default_cluster:test`.`advertiser_view_record`.`mv_bitmap_union_user_id`) || PARTITION: UNPARTITIONED || || RESULT SINK || || 4:EXCHANGE || || PLAN FRAGMENT 1 || OUTPUT EXPRS: || PARTITION: HASH_PARTITIONED: <slot 4> `advertiser`, <slot 5> `channel` || || STREAM DATA SINK || EXCHANGE ID: 04 || UNPARTITIONED || || 3:AGGREGATE (merge finalize) || | output: bitmap_union_count(<slot 6> bitmap_union_count(`default_cluster:test`.`advertiser_view_record`.`mv_bitmap_union_user_id`)) || | group by: <slot 4> `advertiser`, <slot 5> `channel` || | || 2:EXCHANGE || || PLAN FRAGMENT 2 || OUTPUT EXPRS: || PARTITION: RANDOM || || STREAM DATA SINK || EXCHANGE ID: 02 || HASH_PARTITIONED: <slot 4> `advertiser`, <slot 5> `channel` || || 1:AGGREGATE (update serialize) || | STREAMING || | output: bitmap_union_count(`default_cluster:test`.`advertiser_view_record`.`mv_bitmap_union_user_id`) || | group by: `advertiser`, `channel` || | || 0:OlapScanNode || TABLE: advertiser_view_record || PREAGGREGATION: ON || partitions=1/1 || rollup: advertiser_uv || tabletRatio=10/10 || tabletList=22084,22086,22088,22090,22092,22094,22096,22098,22100,22102 || cardinality=0 || avgRowSize=0.0 || numNodes=1 |+-------------------------------------------------------------------------------------------------------------------------------------------------------------------+45 rows in set (0.030 sec)
user_id
field, count(distinct)
is rewritten as bitmap_union_count(to_bitmap)
. It achieves the effect of accurate de-duplication through the bitmap.where k1=1 and k2=2
, the query can be accelerated through the index.
However, in some cases, user filtering conditions can't match the prefix index, for example, where k3=3
. Hence, it will not be able to improve query speed through the index. Creating a materialized view with k3 as the first column can solve this problem.CREATE MATERIALIZED VIEW mv_1 as SELECT k3, k2, k1 FROM tableA ORDER BY k3;
MySQL [test]> desc tableA all;+-----------+---------------+-------+------+------+-------+---------+-------+| IndexName | IndexKeysType | Field | Type | Null | Key | Default | Extra |+-----------+---------------+-------+------+------+-------+---------+-------+| tableA | DUP_KEYS | k1 | INT | Yes | true | NULL | || | | k2 | INT | Yes | true | NULL | || | | k3 | INT | Yes | true | NULL | || | | | | | | | || mv_1 | DUP_KEYS | k3 | INT | Yes | true | NULL | || | | k2 | INT | Yes | false | NULL | NONE || | | k1 | INT | Yes | false | NULL | NONE |+-----------+---------------+-------+------+------+-------+---------+-------+
select k1, k2, k3 from table A where k3=3;
Was this page helpful?