tencent cloud

Feedback

Runtime Filter Use Instructions

Last updated: 2024-12-17 16:47:47
    HASH JOIN is a common database connection algorithm that accelerates the connection process using hash tables. It is generally divided into two phases, Build and Probe phases. When the data volume in the Probe phase is large and the output data volume is small, you can enable Runtime Filter to pre-filter some data, thereby improving the performance.
    
    The Runtime Filter of the read-only analysis engine consists of two parts, RF Build and RF Filter. RF Build is on the Build side of the HASH JOIN and is used to build a Runtime Filter. RF Filter is on the TableScan of the Probe side of the corresponding HASH JOIN table and is used to pre-filter data to improve the performance.

    Runtime Filter Types

    Local Runtime Filter

    Local Runtime Filter is generally used in scenarios where the data is not shuffled in JOIN. In this case, the Runtime Filter of the current build node can meet the needs of the Probe side, so the data of the Runtime Filter is directly passed to the Probe side for use without network transmission.
    
    As shown in the above figure, if the Build table data is not shuffled in JOIN, the Runtime Filter Build operator under the same plan will send the data built by it to the corresponding Runtime Filter Probe part of the current plan.

    Global Runtime Filter

    When the JOIN data is shuffled to different nodes for build, the Runtime Filter of the current build node cannot meet the filtering requirements. In this case, it is necessary to accept the Runtime Filters from other nodes, merge the data of Runtime Filters from all build nodes, and then use it.
    
    When the Build table data is shuffled in JOIN, the data built by the Runtime Filter Build operator in the current plan is incomplete. In this case, the Runtime Filter must accept the data built by the operator of the current plan and other data built in the same plan, and merge the data for use.

    Filter Type

    For selection of filter algorithms, one or more of the following filtering algorithms are selected generally based on the data distribution.
    Bloom Filter
    Bloom Filter as a classic filter algorithm, can determine whether data exists through several HASH functions. The size of the Bloom filter in Runtime Filters generally depends on the NDV of the data. Certainly, some false negatives may occur in the Bloom filter, that is, the data that needs to be filtered is not correctly filtered, but such data will also be filtered out in the Probe phase of JOIN.
    MIN_MAX Filter
    MIN_MAX Filter collects the maximum and minimum values of the data on the Build side and judges whether the data falls within this range during filtering. If not, the data needs to be filtered out. When the data on the Build side is distributed within a certain range, the filtering effect of the MIN_MAX Filter will be better.
    IN Filter
    IN Filter is used for scenarios with relatively small NDV values. In such scenarios, all values of this column are sent to the Probe side for matching.

    Runtime Filter in the Read-Only Analysis Engine

    Enabling or Disabling Runtime Filter

    By default, Runtime Filter is enabled for the read-only analysis engine. You can use the following statements to enable or disable it.
    mysql> set libra_enable_runtime_filter=ON;
    mysql> set libra_enable_runtime_filter=OFF;
    After enabling, the optimizer will evaluate the JOIN. If the conditions are met, the Runtime Filter will be automatically enabled.
    If you need to force all JOINs to enable Runtime Filter, you can configure the following parameters on the basis of the above parameters.
    mysql>SET libra_enable_cost_based_runtime_filter=OFF;

    Runtime Filter Plan

    As shown below, this is a Local RF plan. Three types of Runtime Filters are assigned on the JOIN. In this scenario, there is no data redistribution between the Build side and Probe side of the HASH JOIN.
    image.png
    
    
    The plan shown below is Global RF. There is data redistribution between the Build side and the Probe side. RF can pre-filter data before network transmission, to reduce the network transmission and subsequent JOIN overhead, thereby improving the performance.
    image.png
    

    Adjusting Runtime Filter Parameters

    Runtime Filter supports adjusting the following parameters.
    libra_enable_runtime_filter indicates whether to enable Runtime Filter.
    Attribute
    Description
    Parameter Type
    BOOL.
    Default Value
    ON.
    Value Range
    ON: Enable Runtime Filter.
    OFF: Disable Runtime Filter.
    Scope
    Global & Session.
    Supporting SET_VAR Hint
    Yes.
    libra_runtime_filter_type indicates the Runtime Filter type that can be assigned.
    Attribute
    Description
    Parameter Type
    VARCHAR.
    Default Value
    MIN_MAX, BLOOM_FILTER, IN_FILTER.
    Value Range
    BLOOM_FILTER: Builds a Bloom Filter for the JOIN KEY of the JOIN BUILD side to filter the Probe side data.
    MIN_MAX: Builds maximum and minimum values for the JOIN KEY of the JOIN BUILD side to filter the Probe side data.
    IN: Builds a value list for the JOIN KEY of the JOIN BUILD side to filter the Probe side data.
    Empty string: Indicates disabling the Runtime Filter feature.
    Scope
    Global & Session.
    Supporting SET_VAR Hint
    Yes.
    libra_enable_cost_based_runtime_filter indicates whether to enable cost-based Runtime Filter assignment. If disabled, all Runtime Filters will be generated by default.
    Attribute
    Description
    Parameter Type
    BOOL.
    Default Value
    ON.
    Value Range
    ON: Enable cost-based Runtime Filter assignment.
    OFF: Disable cost-based Runtime Filter assignment.
    Scope
    Global & Session.
    Supporting SET_VAR Hint
    Yes.
    libra_max_in_runtime_filter_ndv indicates the maximum NDV (Number of Distinct Values) on the BUILD side when an IN type Runtime Filter is generated in cost-based Runtime Filters.
    Attribute
    Description
    Parameter Type
    INT.
    Default Value
    1024.
    Value Range
    0-MaxValue.
    Scope
    Global & Session.
    Supporting SET_VAR Hint
    Yes.
    
    Contact Us

    Contact our sales team or business advisors to help your business.

    Technical Support

    Open a ticket if you're looking for further assistance. Our Ticket is 7x24 avaliable.

    7x24 Phone Support