Basic Query Optimization
When querying a partitioned table, be sure to filter on the partition field. Running EXPLAIN + SQL shows how many partitions and tablets the query scans.
Ideally, the query conditions hit both the partition key and the bucket key.
Ideally, the query conditions also hit the prefix index.
Because Doris is a columnar database, a query that reads many fields may perform worse than it would on row-based storage. Select specific fields instead of * wherever possible, and add a LIMIT clause at the end of the query.
When writing a SELECT, avoid predicates of the form function(column) = "xxxx" wherever possible; otherwise Doris cannot take advantage of predicate pushdown. Put the bare column name on the left side and, on the right side, a constant or an expression that can be evaluated and folded into a constant.
Avoid OR and UNION ALL in queries where possible. In most cases, consider using IN instead of OR.
For general data-exploration queries that do not need all of the data, add a LIMIT on the number of records returned, which also speeds up the query.
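As a rough sketch of the predicate advice above, the Python helpers below build SQL text that keeps the bare column on the left with a constant on the right, and that collapses a chain of OR comparisons into a single IN list. The table `orders` and the columns `dt`, `city`, `order_id`, and `amount` are hypothetical, and the helpers are illustrative string builders, not part of any Doris client library.

```python
from datetime import date, timedelta
from typing import Sequence


def day_range_predicate(column: str, day: date) -> str:
    """Range predicate that selects one day without wrapping the column in a function."""
    next_day = day + timedelta(days=1)
    return f"{column} >= '{day.isoformat()}' AND {column} < '{next_day.isoformat()}'"


def in_predicate(column: str, values: Sequence[str]) -> str:
    """Single IN list that replaces column = v1 OR column = v2 OR ..."""
    quoted = ", ".join("'" + v.replace("'", "''") + "'" for v in values)
    return f"{column} IN ({quoted})"


def build_query(day: date, cities: Sequence[str], limit: int) -> str:
    """Select named columns only, filter on bare columns, and cap the result size."""
    return (
        "SELECT order_id, amount FROM orders "  # hypothetical table and columns
        f"WHERE {day_range_predicate('dt', day)} "
        f"AND {in_predicate('city', cities)} "
        f"LIMIT {int(limit)}"
    )


query = build_query(date(2024, 1, 31), ["Beijing", "Shanghai"], 100)
# Prefixing the statement with EXPLAIN asks for the plan instead of the rows.
explain_query = "EXPLAIN " + query
print(explain_query)
```

Running the EXPLAIN form first is a cheap way to check how many partitions and tablets the filter selects before running the real query.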
Join Optimization
Shuffle mode optimization: In order of efficiency, Colocate Join > Bucket Shuffle > Shuffle > Broadcast. For details, see Bucket Shuffle Join.
RuntimeFilter: Useful in a join query where, in addition to the join condition, the right table has other filter conditions.
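The idea behind a runtime filter can be illustrated with a small in-memory join. This is a conceptual sketch only, not how Doris implements it: the row layout, the `key` field, and the function name are assumptions, and an exact Python set stands in for whatever filter structure the engine actually builds.

```python
def join_with_runtime_filter(left_rows, right_rows, right_predicate):
    """Join on 'key', pruning left rows with a filter built from the filtered right side."""
    # Build side: apply the extra filter condition on the right table first.
    build = {}
    for row in right_rows:
        if right_predicate(row):
            build.setdefault(row["key"], []).append(row)

    # Runtime filter: the join keys that survived the right-side filter.
    runtime_filter = set(build)

    # Probe side: left rows whose key cannot match are dropped during the scan.
    scanned = [row for row in left_rows if row["key"] in runtime_filter]

    joined = [{**left, **right} for left in scanned for right in build[left["key"]]]
    return joined, len(scanned)


left_table = [{"key": k, "amount": k * 10} for k in range(1, 7)]
right_table = [
    {"key": 1, "region": "east"},
    {"key": 2, "region": "west"},
    {"key": 3, "region": "east"},
]

joined, probed = join_with_runtime_filter(
    left_table, right_table, lambda row: row["region"] == "east"
)
print(probed, joined)
```

Only the left rows with keys 1 and 3 reach the join here, which is why the technique pays off when the right-side filter is selective.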
Use Rollup.
When a query cannot hit the prefix index of the base table, create a Rollup with an adjusted Key order so that it forms a prefix index the query can hit.
For an Aggregate table, create a Rollup on a subset of the Key columns to aggregate the data further.
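Why the Key order matters can be shown with a toy model in which rows are kept sorted by their key columns and only the leading column can be binary-searched. This is a simplified sketch, not Doris's storage format; the column names `user_id`, `city`, and `cost` and the sample rows are hypothetical.

```python
from bisect import bisect_left, bisect_right


def prefix_lookup(sorted_rows, value):
    """Binary-search rows whose leading column equals value (rows must be sorted)."""
    # The leading-column list is rebuilt on every call to keep the sketch short.
    leading = [row[0] for row in sorted_rows]
    lo = bisect_left(leading, value)
    hi = bisect_right(leading, value)
    return sorted_rows[lo:hi]


# Base table sorted by (user_id, city): only user_id lookups can use the sort order.
base_rows = sorted([
    (1, "Beijing", 20),
    (1, "Shanghai", 5),
    (2, "Beijing", 7),
    (3, "Guangzhou", 11),
])

# Rollup with the key order adjusted to (city, user_id): city becomes the prefix.
rollup_rows = sorted((city, user_id, cost) for user_id, city, cost in base_rows)

# A city filter on the base table has to look at every row ...
scan_result = [row for row in base_rows if row[1] == "Beijing"]
# ... while the rollup answers the same filter with a binary search.
rollup_result = prefix_lookup(rollup_rows, "Beijing")
print(scan_result, rollup_result)
```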
Using Materialized View
If you often perform fixed-mode aggregate queries on a table, it is recommended to create a materialized view on this table.
A materialized view can be used in every scenario that Rollup supports.
It can also provide an additional pre-aggregation on top of a Duplicate table.
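The benefit for fixed-pattern aggregate queries can be sketched as follows: the aggregation is computed once and kept next to the detail rows, so the recurring query reads a few pre-aggregated rows instead of scanning everything. This is a conceptual illustration with hypothetical columns (`day`, `city`, `cost`), not Doris's materialized view machinery.

```python
from collections import defaultdict

# Detail rows as a Duplicate-style table would keep them: nothing is merged.
detail_rows = [
    ("2024-01-01", "Beijing", 20),
    ("2024-01-01", "Shanghai", 5),
    ("2024-01-02", "Beijing", 7),
    ("2024-01-02", "Beijing", 11),
]


def build_daily_cost_view(rows):
    """Pre-aggregate SUM(cost) grouped by day, like a view over the detail table."""
    view = defaultdict(int)
    for day, _city, cost in rows:
        view[day] += cost
    return dict(view)


daily_cost_view = build_daily_cost_view(detail_rows)

# The recurring query "total cost for a given day" becomes a single lookup.
print(daily_cost_view["2024-01-02"])
```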
Index Optimization
Bitmap index: Choose a column with relatively low cardinality (between 100 and 100,000 distinct values) that query conditions filter on.
BloomFilter index: If you often perform precise point queries on a column and the column has a high cardinality, it is recommended to create a Bloom filter index on this column.
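To show why a Bloom filter suits precise point queries on a high-cardinality column, here is a minimal, self-contained Bloom filter used to decide which data blocks need to be read. It is a sketch under stated assumptions: the bit-array size, the hash count, the SHA-256 based hashing, and the block layout are all illustrative and do not reflect how Doris stores its index.

```python
import hashlib


class BloomFilter:
    """Tiny Bloom filter: may report false positives, never false negatives."""

    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = 0  # a Python int used as a bit set

    def _positions(self, value):
        # Derive several bit positions by salting the value with a seed.
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{value}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, value):
        for pos in self._positions(value):
            self.bits |= 1 << pos

    def might_contain(self, value):
        return all((self.bits >> pos) & 1 for pos in self._positions(value))


# Hypothetical data blocks, each holding some values of a high-cardinality column.
blocks = {
    "block_0": ["user_0001", "user_0002"],
    "block_1": ["user_0003", "user_0004"],
}

filters = {}
for name, values in blocks.items():
    bloom = BloomFilter()
    for value in values:
        bloom.add(value)
    filters[name] = bloom

# A point query only reads blocks whose filter says the value might be present.
candidates = [name for name, bloom in filters.items() if bloom.might_contain("user_0003")]
print(candidates)
```

A block whose filter answers "no" can be skipped safely; a "yes" still has to be confirmed by reading the block, since false positives are possible.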
Use cache.
PageCache: This configuration is enabled by default.
SqlCache: This configuration is disabled by default. It works best when concurrency is high and the query result set is small.