This document mainly introduces the currently supported variables. For variables in Doris, refer to the variable settings in MySQL. However, some variables are only used for compatibility with some MySQL client protocols, having no practical effect in MySQL database.
Variable Settings and View
View
You can view all or specific variables using the SHOW VARIABLES [LIKE 'xxx'];
command. For example:
SHOW VARIABLES;
SHOW VARIABLES LIKE '%time_zone%';
Settings
Some variables can be set to take effect globally or only for the current session. After global settings take effect, subsequent new session connections will continue to use the set value. If you set the variable only for the current session, the variable only affects the current session.
To take effect only in the current session, use the SET var_name=xxx;
command to set the variable. For example:
SET exec_mem_limit = 137438953472;
SET forward_to_master = true;
SET time_zone = "Asia/Shanghai";
To take effect globally, set the variable using the SET GLOBAL var_name=xxx;
command. For example:
SET GLOBAL exec_mem_limit = 137438953472
Note
Only ADMIN users can set a variable to take effect globally.
Globally effective variables do not affect variables in the current session, but only variables in new sessions.
Variables that can take effect either in the current session or globally include:
time_zone
wait_timeout
sql_mode
enable_profile
query_timeout
exec_mem_limit
batch_size
allow_partition_column_nullable
insert_visible_timeout_ms
enable_fold_constant_by_be
Variables that only take effect globally include:
default_rowset_type
In addition, variable settings also support constant expressions. For example:
SET exec_mem_limit = 10 * 1024 * 1024 * 1024;
SET forward_to_master = concat('tr', 'u', 'e');
Setting Variables in Query Statement
In some scenarios, we may need to set variables specifically for some queries.
Using the SET_VAR hint, you can set session variables in the queries (effective within a single statement). For example:
SELECT /*+ SET_VAR(exec_mem_limit = 8589934592) */ name FROM people ORDER BY name;
SELECT /*+ SET_VAR(query_timeout = 1, enable_partition_cache=true) */ sleep(3);
Note
The comment must start with /*+ and can only follow SELECT.
Supported Variables
SQL_AUTO_IS_NULL
This is for compatibility with the JDBC C3P0 connection pool. It has no practical effect.
auto_increment_increment
This is for compatibility with the MySQL client. It has no practical effect.
autocommit
This is for compatibility with the MySQL client. It has no practical effect.
batch_size
This is for specifying the number of rows in a single data packet transferred between nodes during query execution. By default, a data packet contains 1024 rows, i.e., the source node packs and sends the data to the destination node after generating 1024 rows of data.
A larger row number can improve the query throughput in big data scenarios, but it may also increase query latency in small-size query scenarios. At the same time, it would also increase the memory overhead of the query. It is recommended to set the range between 1024 and 4096.
character_set_client
This is for compatibility with the MySQL client. It has no practical effect.
character_set_connection
This is for compatibility with the MySQL client. It has no practical effect.
character_set_results
This is for compatibility with the MySQL client. It has no practical effect.
character_set_server
This is for compatibility with the MySQL client. It has no practical effect.
codegen_level
This is for setting the LLVM codegen level (currently ineffective).
collation_connection
This is for compatibility with the MySQL client. It has no practical effect.
collation_database
This is for compatibility with the MySQL client. It has no practical effect.
collation_server
This is for compatibility with the MySQL client. It has no practical effect.
delete_without_partition
When it is set to true, if you use the delete command to delete partition table data, you can do it without specifying the partition. The delete operation will automatically apply to all partitions.
But note that automatic application to all partitions might cause the delete command to trigger a large number of subtasks, resulting in longer execution time. If not necessary, it is not recommended to enable this.
disable_colocate_join
This is for controlling whether to enable the Colocation Join feature. False means that this feature is enabled, which is set by default. True means that this feature is disabled. When this feature is disabled, the query plan will not attempt to perform a Colocation Join.
enable_bucket_shuffle_join
This is for controlling whether to enable the Bucket Shuffle Join feature. True means that this feature is enabled, which is set by default. False means that this feature is disabled. When this feature is disabled, the query plan will not attempt to perform a Bucket Shuffle Join.
disable_streaming_preaggregations
This is for controlling whether to enable streaming pre-aggregations. False means that this feature is enabled, which is set by default. The setting cannot be modified currently.
enable_insert_strict
This is for setting whether to enable the Strict
mode during data import by INSERT statements. It defaults to false, i.e., the strict
mode is not enabled.
enable_spilling
This is for setting whether to enable the big data disk spilling sort. It defaults to false, which means that this feature is disabled. When the ORDER BY subclause's LIMIT condition is not specified by the user and enable_spilling
is set to true, the disk spilling sort will be enabled. After this feature is enabled, it uses the doris-scratch/
directory under BE data directory to store temporary spilling data, and will clear the temporary data after the query ends.
This feature is mainly used for sorting big data with limited memory.
Note
This feature is experimental and unstable. Enable it with caution.
exec_mem_limit
This is for setting the memory limit for a single query. It defaults to 2 GB. The unit can be B/K/KB/M/MB/G/GB/T/TB/P/PB, with B by default.
This parameter is used to limit the memory that a single instance can use in a query plan. A query plan may have multiple instances, and a BE node may execute one or more instances. This parameter cannot accurately limit a query's memory usage throughout the entire cluster, nor can it accurately limit a query's memory usage on a single BE node. The specific memory usage should be determined based on the generated query plan.
Usually, only some blocking nodes (such as sort nodes, aggregate nodes, and join nodes) consume more memory, whereas other nodes (such as scan nodes) stream data through and do not use much memory.
When the Memory Exceed Limit
error occurs, you can attempt to increase this parameter value exponentially like 4 GB, 8 GB, and 16 GB.
forward_to_master
This is for setting whether to forward some show category commands to the Master FE node for execution. It defaults to true
, meaning forwarding. There are multiple FE nodes in Doris, one of which is the Master node. Usually, users can connect to any FE node to conduct full feature operations. But for certain information view directive, only the Master FE node can access detailed information.
For a command like SHOW BACKENDS;
, if it is not forwarded to the Master FE node, the command can only show basic information like whether a node is alive or not. If it is forwarded to the Master FE, it can access more detailed information including the node's startup time and last heartbeat time.
Commands that are currently affected by this parameter are listed below:
1.1 SHOW FRONTENDS;
Forwarding to Master can view the last heartbeat information.
1.2 SHOW BACKENDS;
Forwarding to Master can view the startup time, last heartbeat information, and disk capacity information.
1.3 SHOW BROKER;
Forwarding to Master can view the startup time and last heartbeat information.
1.4 SHOW TABLET;
/ADMIN SHOW REPLICA DISTRIBUTION;
/ADMIN SHOW REPLICA STATUS;
Forwarding to Master can view tablet data in Storage Master FE metadata. Under normal circumstances, tablet information in different FE metadata should be consistent. When issues arise, you can use this method to compare the difference between the current FE and Master FE metadata.
1.5 SHOW PROC;
Forwarding to Master can view PROC information related to storage in the Master FE metadata. It's mainly used for metadata comparison.
init_connect
This is for compatibility with the MySQL client. It has no practical effect.
interactive_timeout
This is for compatibility with the MySQL client. It has no practical effect.
enable_profile
This is for setting whether to view the query profile. It defaults to false, meaning the profile is not needed.
By default, only when an error occurs in a query will BE send a profile to FE for exception checking. A normally ended query will not send a profile. Sending a profile will incur some network overhead, which is not beneficial for high-concurrency query scenarios.
To analyze a query profile, you can set this variable to true and send the query. After the query ends, you can view the profile on the current FE's web page by visiting: fe_host:fe_http_port/query
,
Here you can view the profile of the most recent 100 queries that have enable_profile
enabled.
language
This is for compatibility with the MySQL client. It has no practical effect.
license
This is for showing Doris's License only.
load_mem_limit
This is for specifying the memory limit for import operations. By default, it is set to 0, which means not using this variable but using exec_mem_limit
as the memory limit for import operations.
This variable is only used for INSERT operations. Because INSERT operations have both query and import parts, if this variable is not set, the memory limit for both query and import is exec_mem_limit
. If it is set, the memory limit of the query part of INSERT is exec_mem_limit
and the import part is load_mem_limit
.
For other types of import, such as BROKER LOAD and STREAM LOAD, the memory limit still is exec_mem_limit
.
lower_case_table_names
This is for controlling whether the user's table name is case-sensitive.
When it is set to 0, the table names are case-sensitive. The value is set to 0 by default.
When it is set to 1, the table names are case-insensitive, and doris will convert the table names to lowercase during storage and querying. Advantage: You can use any case of the table name in a sentence. The following SQL is correct:
mysql> show tables;
+------------------+
| Tables_in_testdb |
+------------------+
| cost |
+------------------+
mysql> select * from COST where COst.id < 100 order by cost.id;
Disadvantage: After creating a table, you cannot get the specified table name in the command. show tables
will display the table name in lowercase.
When it is set to 2, the table name is not case sensitive. Doris stores the specified table name in the created table command and converts it to lowercase for comparison during query.
Advantage: show tables
shows the table name specified in the table creation command; Disadvantage: Only one case of the table name can be used in the same statement. For example, the cost
table uses the table name COST
for query:
mysql> select * from COST where COST.id < 100 order by COST.id;
This variable is compatible with MySQL. It needs to be specified at the initialization of the cluster through fe.conf lower_case_table_names=
for configuration. After the cluster initialization is complete, this variable cannot be modified by the set
statement, nor can it be modified by rebooting or upgrading the cluster.
The system view table names in the information_schema are case-insensitive. When the value of lower_case_table_names
is 0, it displays as 2.
max_allowed_packet
This is for compatibility with the JDBC connection pool C3P0. It has no practical effect.
max_pushdown_conditions_per_column
This variable is set to -1 by default, which means using the configuration value in be.conf
. If set to a value greater than 0, the query in the current session will use this variable value and the configuration value in be.conf
will be ignored.
max_scan_key_num
This variable is set to -1 by default, which means using the configuration value in be.conf
. If set to a value greater than 0, the query in the current session will use this variable value and the configuration value in be.conf
will be ignored.
net_buffer_length
This is for compatibility with the MySQL client. It has no practical effect.
net_read_timeout
This is for compatibility with the MySQL client. It has no practical effect.
net_write_timeout
This is for compatibility with the MySQL client. It has no practical effect.
parallel_exchange_instance_num
This is for setting the number of exchange nodes that a higher level node uses to receive data from the lower level node in the execution plan. By default, it is -1, meaning the number of exchange nodes equals the number of execution instances of the lower level node (default behavior). When it is set to a value greater than 0 and less than the number of execution instances of the lower level node, the number of exchange nodes will be the set value.
In a distributed query execution plan, the higher level node typically uses one or more exchange nodes to receive data from instances of the lower level node executing on different BEs. Normally, the number of exchange nodes equals the number of execution instances of the lower level node.
In some aggregate query scenarios, if a large amount of data needs to be scanned at the bottom level, but the data quantity post-aggregation is small, you can try to set this variable to a smaller value to reduce resource consumption. For example, aggregate query on DUPLICATE KEY detail model.
parallel_fragment_exec_instance_num
For node scanning, you can set the number of execution instances on each BE node. It defaults to 1.
A query plan often results in a set of scan ranges, and they are data ranges that need to be scanned. This data is distributed across multiple BE nodes. A BE node may have one or more scan ranges. By default, a set of scan ranges on each BE node is processed by only one execution instance. When machine resources are sufficient, you can increase this variable for more execution instances to process a set of scan ranges simultaneously, thereby improving query efficiency.
The number of scan instances determines the number of other execution nodes at the upper level, such as aggregate nodes and join nodes. Therefore, it increases the concurrency of the entire query plan execution. Modifying this parameter can help to improve the efficiency of large queries, but a larger value can consume more machine resources, such as CPU, memory, and disk IO.
query_cache_size
This is for compatibility with the MySQL client. It has no practical effect.
query_cache_type
This is for compatibility with the JDBC connection pool C3P0. It has no practical effect.
-query_timeout
This is for setting the query timeout. This variable applies to all query statements and INSERT statements in the current connection. It defaults to 5 minutes, expressed in seconds.
resource_group
Not in use.
send_batch_parallelism
This is for setting the default parallelism for sending batch data during InsertStmt operations. If the parallelism value exceeds that in the BE configuration of max_send_batch_parallelism_per_job
, the coordinating BE will use the value of max_send_batch_parallelism_per_job
.
sql_mode
This is for specifying the SQL pattern to accommodate certain SQL dialects.
sql_safe_updates
This is for compatibility with the MySQL client. It has no practical effect.
sql_select_limit
This is for compatibility with the MySQL client. It has no practical effect.
system_time_zone
This is for displaying the current system time zone which is not modifiable.
time_zone
This is for setting the time zone for the current session. The time zone will affect the results of some time functions.
tx_isolation
This is for compatibility with the MySQL client. It has no practical effect.
version
This is for compatibility with the MySQL client. It has no practical effect.
performance_schema
This is for compatibility with MySQL JDBC of version 8.0.16 and above. It has no practical effect.
version_comment
This is for displaying the version of Doris, which is not modifiable.
wait_timeout
This is for setting the duration of idle connections. When an idle connection has no interaction with Doris within this duration, Doris will actively disconnect this link. It defaults to 8 hours, expressed in seconds.
default_rowset_type
This is for setting the default storage format of the computing node's storage engine. The currently supported storage formats include: alpha/beta.
use_v2_rollup
This is for controlling the access to rollup index data in segment v2 storage format during queries. This variable is used for verification during the segment v2 launch; it is not recommended in other cases.
rewrite_count_distinct_to_bitmap_hll
This is for controlling whether to override count distinct queries of bitmap and hll types as bitmap_union_count and hll_union_agg.
prefer_join_method
When you are choosing between broadcast join or shuffle join implementation, this variable determines which join method is preferred if the cost of broadcast join and shuffle join are equal.
The current optional value for this variable is either "broadcast" and "shuffle".
allow_partition_column_nullable
It determines whether the partition column can be NULL when you are creating tables. By default, this is set to true, meaning it allows NULL values. If set to false, the partition column must be defined as NOT NULL.
insert_visible_timeout_ms
When you are executing an insert statement, after the import action (query and insertion) is completed, there is a need to wait for the transaction to commit to make the data visible. This parameter controls the timeout period for waiting for the data to become visible. It is set to 10,000 by default, and the minimum value is 1,000.
enable_exchange_node_parallel_merge
In a sorted query, when an upper layer node receives ordered data from a lower node, it conducts a corresponding sort on the exchange node to ensure the final data is ordered. However, single-threaded merges of multichannel data can lead to bottlenecks at the exchange node if the data volume is too large.
Doris has optimized this process. If there are too many lower data nodes, the exchange node will launch multi-threaded parallel merges to accelerate the sorting process. By default, this parameter is set to false, which means the exchange node does not perform parallel merges in order to reduce additional CPU and memory consumption.
extract_wide_range_expr
This is for controlling whether the "Extract Wide Range Common Factor" optimization is enabled. There are two possible values: true and false. It is enabled by default.
enable_fold_constant_by_be
This is for controlling the method of constant folding computation. By default, it is false
, which means the computation is performed in the FE
. If it is set to true
, it is computed through the BE
via an RPC
request.
cpu_resource_limit
This is for limiting the resource consumption of a query. This is an experimental feature. Currently, it limits the number of scan threads for a query on a single node. In this way, the speed of data returned from the lower layer is reduced, thus limiting the overall computational resource consumption of the query. If it is set to 2, a query can use a maximum of 2 scan threads on a single node.
This parameter will override parallel_fragment_exec_instance_num
. If parallel_fragment_exec_instance_num
is set to 4 and this parameter is set to 2, the 4 execution instances on a single node will share a maximum of 2 scan threads.
This parameter is overridden by the cpu_resource_limit
configuration in the user property.
It defaults to -1, which means no limit.
disable_join_reorder
It is for disabling all system automatic join reorder algorithms. There are two possible values: true and false. By default, it is disabled, which means it uses the system's automatic join reorder algorithm. When it is set to true, the system will disable all automatic ordering algorithms and use the SQL's original table sequence for joins.
return_object_data_as_binary
It is for indicating whether to return bitmap/hll results in the select results. In the select into outfile statement, if the export file format is csv, the bimap/hll data will be base64 encoded, and if the file format is parquet, the data will be stored as a byte array.
block_encryption_mode
It is for controlling block encryption pattern, which defaults to empty. When AES encryption is used, it is equivalent to AES_128_ECB
, and when SM3 encryption is used, it is equivalent to SM3_128_ECB
. Optional values are:
AES_128_ECB,
AES_192_ECB,
AES_256_ECB,
AES_128_CBC,
AES_192_CBC,
AES_256_CBC,
AES_128_CFB,
AES_192_CFB,
AES_256_CFB,
AES_128_CFB1,
AES_192_CFB1,
AES_256_CFB1,
AES_128_CFB8,
AES_192_CFB8,
AES_256_CFB8,
AES_128_CFB128,
AES_192_CFB128,
AES_256_CFB128,
AES_128_CTR,
AES_192_CTR,
AES_256_CTR,
AES_128_OFB,
AES_192_OFB,
AES_256_OFB,
SM4_128_ECB,
SM4_128_CBC,
SM4_128_CFB128,
SM4_128_OFB,
SM4_128_CTR,
enable_infer_predicate
This is for controlling whether to perform predicate derivation. There are two possible values: true and false. By default, it is disabled, and the system no longer performs predicate derivation, but conducts related operations with the original predicate. When it is set to true, predicate expansion is carried out.
trim_tailing_spaces_for_external_table_query
This is for controlling whether to filter out trailing spaces when querying Hive external tables. It defaults to false.
Was this page helpful?