HASH JOIN is a common database connection algorithm that accelerates the connection process using hash tables. It is generally divided into two phases, Build and Probe phases. When the data volume in the Probe phase is large and the output data volume is small, you can enable Runtime Filter to pre-filter some data, thereby improving the performance.
The Runtime Filter of the read-only analysis engine consists of two parts, RF Build and RF Filter. RF Build is on the Build side of the HASH JOIN and is used to build a Runtime Filter. RF Filter is on the TableScan of the Probe side of the corresponding HASH JOIN table and is used to pre-filter data to improve the performance.
Runtime Filter Types
Local Runtime Filter
Local Runtime Filter is generally used in scenarios where the data is not shuffled in JOIN. In this case, the Runtime Filter of the current build node can meet the needs of the Probe side, so the data of the Runtime Filter is directly passed to the Probe side for use without network transmission.
As shown in the above figure, if the Build table data is not shuffled in JOIN, the Runtime Filter Build operator under the same plan will send the data built by it to the corresponding Runtime Filter Probe part of the current plan.
Global Runtime Filter
When the JOIN data is shuffled to different nodes for build, the Runtime Filter of the current build node cannot meet the filtering requirements. In this case, it is necessary to accept the Runtime Filters from other nodes, merge the data of Runtime Filters from all build nodes, and then use it.
When the Build table data is shuffled in JOIN, the data built by the Runtime Filter Build operator in the current plan is incomplete. In this case, the Runtime Filter must accept the data built by the operator of the current plan and other data built in the same plan, and merge the data for use.
Filter Type
For selection of filter algorithms, one or more of the following filtering algorithms are selected generally based on the data distribution.
Bloom Filter
Bloom Filter as a classic filter algorithm, can determine whether data exists through several HASH functions. The size of the Bloom filter in Runtime Filters generally depends on the NDV of the data. Certainly, some false negatives may occur in the Bloom filter, that is, the data that needs to be filtered is not correctly filtered, but such data will also be filtered out in the Probe phase of JOIN.
MIN_MAX Filter
MIN_MAX Filter collects the maximum and minimum values of the data on the Build side and judges whether the data falls within this range during filtering. If not, the data needs to be filtered out. When the data on the Build side is distributed within a certain range, the filtering effect of the MIN_MAX Filter will be better.
IN Filter
IN Filter is used for scenarios with relatively small NDV values. In such scenarios, all values of this column are sent to the Probe side for matching.
Runtime Filter in the Read-Only Analysis Engine
Enabling or Disabling Runtime Filter
By default, Runtime Filter is enabled for the read-only analysis engine. You can use the following statements to enable or disable it.
mysql> set libra_enable_runtime_filter=ON;
mysql> set libra_enable_runtime_filter=OFF;
After enabling, the optimizer will evaluate the JOIN. If the conditions are met, the Runtime Filter will be automatically enabled.
If you need to force all JOINs to enable Runtime Filter, you can configure the following parameters on the basis of the above parameters.
mysql>SET libra_enable_cost_based_runtime_filter=OFF;
Runtime Filter Plan
As shown below, this is a Local RF plan. Three types of Runtime Filters are assigned on the JOIN. In this scenario, there is no data redistribution between the Build side and Probe side of the HASH JOIN.
The plan shown below is Global RF. There is data redistribution between the Build side and the Probe side. RF can pre-filter data before network transmission, to reduce the network transmission and subsequent JOIN overhead, thereby improving the performance.
Adjusting Runtime Filter Parameters
Runtime Filter supports adjusting the following parameters.
libra_enable_runtime_filter indicates whether to enable Runtime Filter.
|
Parameter Type | BOOL. |
Default Value | ON. |
Value Range | ON: Enable Runtime Filter. OFF: Disable Runtime Filter. |
Scope | Global & Session. |
Supporting SET_VAR Hint | Yes. |
libra_runtime_filter_type indicates the Runtime Filter type that can be assigned.
|
Parameter Type | VARCHAR. |
Default Value | MIN_MAX, BLOOM_FILTER, IN_FILTER. |
Value Range | BLOOM_FILTER: Builds a Bloom Filter for the JOIN KEY of the JOIN BUILD side to filter the Probe side data. MIN_MAX: Builds maximum and minimum values for the JOIN KEY of the JOIN BUILD side to filter the Probe side data. IN: Builds a value list for the JOIN KEY of the JOIN BUILD side to filter the Probe side data. Empty string: Indicates disabling the Runtime Filter feature. |
Scope | Global & Session. |
Supporting SET_VAR Hint | Yes. |
libra_enable_cost_based_runtime_filter indicates whether to enable cost-based Runtime Filter assignment. If disabled, all Runtime Filters will be generated by default.
|
Parameter Type | BOOL. |
Default Value | ON. |
Value Range | ON: Enable cost-based Runtime Filter assignment. OFF: Disable cost-based Runtime Filter assignment. |
Scope | Global & Session. |
Supporting SET_VAR Hint | Yes. |
libra_max_in_runtime_filter_ndv indicates the maximum NDV (Number of Distinct Values) on the BUILD side when an IN type Runtime Filter is generated in cost-based Runtime Filters.
|
Parameter Type | INT. |
Default Value | 1024. |
Value Range | 0-MaxValue. |
Scope | Global & Session. |
Supporting SET_VAR Hint | Yes. |
Was this page helpful?