This document introduces some of Doris's advanced features.
Table Structure Change
You can use the ALTER TABLE command to modify the table schema, including the following changes:
Add columns.
Drop columns.
Modify column types.
Change column order.
Here are some examples.
The schema of the original table1 is as follows:
+----------+-------------+------+-------+---------+-------+
| Field    | Type        | Null | Key   | Default | Extra |
+----------+-------------+------+-------+---------+-------+
| siteid   | int(11)     | No   | true  | 10      |       |
| citycode | smallint(6) | No   | true  | N/A     |       |
| username | varchar(32) | No   | true  |         |       |
| pv       | bigint(20)  | No   | false | 0       | SUM   |
+----------+-------------+------+-------+---------+-------+
We add a new column uv with the type BIGINT, aggregation type SUM, and a default value of 0:
ALTER TABLE table1 ADD COLUMN uv BIGINT SUM DEFAULT '0' AFTER pv;
After successful submission, you can check the job progress with the following command:
SHOW ALTER TABLE COLUMN;
When the job status is FINISHED, it indicates that the job is complete.
After the ALTER TABLE job is completed, you can use DESC table1 to view the latest schema.
mysql> DESC table1;
+----------+-------------+------+-------+---------+-------+
| Field    | Type        | Null | Key   | Default | Extra |
+----------+-------------+------+-------+---------+-------+
| siteid   | int(11)     | No   | true  | 10      |       |
| citycode | smallint(6) | No   | true  | N/A     |       |
| username | varchar(32) | No   | true  |         |       |
| pv       | bigint(20)  | No   | false | 0       | SUM   |
| uv       | bigint(20)  | No   | false | 0       | SUM   |
+----------+-------------+------+-------+---------+-------+
5 rows in set (0.00 sec)
You can use the following command to cancel the currently executing job: CANCEL ALTER TABLE COLUMN FROM table1;
For more help, see HELP ALTER TABLE.
Rollup
A Rollup can essentially be understood as a materialized index of a table. Materialized means its data is physically stored independently, and index means that a Rollup can adjust its column order to increase the hit rate of the prefix index, and reduce its key columns to increase the degree of data aggregation.
Here are some examples.
The schema of the original table1 is as follows:
+----------+-------------+------+-------+---------+-------+
| Field    | Type        | Null | Key   | Default | Extra |
+----------+-------------+------+-------+---------+-------+
| siteid   | int(11)     | No   | true  | 10      |       |
| citycode | smallint(6) | No   | true  | N/A     |       |
| username | varchar(32) | No   | true  |         |       |
| pv       | bigint(20)  | No   | false | 0       | SUM   |
| uv       | bigint(20)  | No   | false | 0       | SUM   |
+----------+-------------+------+-------+---------+-------+
For table1, the detailed data uses siteid, citycode, and username as a composite key to aggregate the pv field. If the business team frequently needs to view the total pv for each city, you can create a rollup with only citycode and pv.
ALTER TABLE table1 ADD ROLLUP rollup_city(citycode, pv);
After successful submission, you can check the job progress with the following command: SHOW ALTER TABLE ROLLUP; When the job status is FINISHED, it indicates that the job is complete.
After the Rollup is created, you can use DESC table1 ALL to view the Rollup information of the table.
mysql> desc table1 all;
+-------------+----------+-------------+------+-------+---------+-------+
| IndexName   | Field    | Type        | Null | Key   | Default | Extra |
+-------------+----------+-------------+------+-------+---------+-------+
| table1      | siteid   | int(11)     | No   | true  | 10      |       |
|             | citycode | smallint(6) | No   | true  | N/A     |       |
|             | username | varchar(32) | No   | true  |         |       |
|             | pv       | bigint(20)  | No   | false | 0       | SUM   |
|             | uv       | bigint(20)  | No   | false | 0       | SUM   |
|             |          |             |      |       |         |       |
| rollup_city | citycode | smallint(6) | No   | true  | N/A     |       |
|             | pv       | bigint(20)  | No   | false | 0       | SUM   |
+-------------+----------+-------------+------+-------+---------+-------+
8 rows in set (0.01 sec)
You can use the following command to cancel the currently executing job: CANCEL ALTER TABLE ROLLUP FROM table1;
After the Rollup is created, queries do not need to specify it explicitly; the system automatically determines whether the Rollup should be used. You can check whether a Rollup is being hit by executing the command EXPLAIN your_sql;
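For example, to check whether the rollup created above would serve a per-city pv aggregation, you can explain the corresponding query (a sketch; the exact plan output varies by Doris version, but the scan node reports which index it reads, the base table1 or rollup_city):

```sql
-- If rollup_city is hit, the OlapScanNode in the plan output
-- reports rollup_city instead of the base table1 index.
EXPLAIN SELECT citycode, SUM(pv) FROM table1 GROUP BY citycode;
```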
For more help, see HELP ALTER TABLE.
Query Data Tables
Memory Limit
To prevent a single user's query from consuming too much memory, memory limits are applied to queries. By default, a single query task may use no more than 2 GB of memory on any single BE node.
If you encounter a Memory limit exceeded error, the query has exceeded this limit. In that case, first try to resolve the problem by optimizing your SQL statement; if 2 GB is genuinely insufficient, you can set the memory parameter manually.
Display query memory limits:
mysql> SHOW VARIABLES LIKE "%mem_limit%";
+----------------+------------+
| Variable_name  | Value      |
+----------------+------------+
| exec_mem_limit | 2147483648 |
+----------------+------------+
1 row in set (0.00 sec)
The unit of exec_mem_limit is bytes. You can use the SET command to change its value, for example, to 8 GB:
SET exec_mem_limit = 8589934592;
mysql> SHOW VARIABLES LIKE "%mem_limit%";
+----------------+------------+
| Variable_name  | Value      |
+----------------+------------+
| exec_mem_limit | 8589934592 |
+----------------+------------+
1 row in set (0.00 sec)
Note
The above modification is at the session level and is only effective within the current connection session. Disconnecting and reconnecting will revert it to the default value.
If you need to modify the variable globally, you can run: SET GLOBAL exec_mem_limit = 8589934592; After setting it, disconnect and log in again; the parameter then takes effect permanently.
Query Timeout
The default query timeout is 300 seconds: if a query does not finish within 300 seconds, Doris cancels it. You can use this parameter to customize a timeout for your application, implementing a blocking method similar to wait(timeout).
View Current Timeout Setting:
mysql> SHOW VARIABLES LIKE "%query_timeout%";
+---------------+-------+
| Variable_name | Value |
+---------------+-------+
| QUERY_TIMEOUT | 300   |
+---------------+-------+
1 row in set (0.00 sec)
Change the timeout to 1 minute:
SET query_timeout = 60;
Note
The current timeout check interval is 5 seconds, so timeouts shorter than 5 seconds may not be very accurate.
The above modification is also at the session level. You can use SET GLOBAL to apply the change globally.
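For instance, following the same pattern as exec_mem_limit above, to raise the timeout to 10 minutes:

```sql
-- Session level: affects only the current connection.
SET query_timeout = 600;

-- Global level: affects sessions opened after reconnecting.
SET GLOBAL query_timeout = 600;
```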
Broadcast/Shuffle Join
The system's default method for implementing a Join is to filter the small table based on conditions, broadcast it to each node containing the large table, create an in-memory Hash table, and then stream the large table's data for a Hash Join. But if the filtered data from the small table cannot fit into memory, the Join operation will fail. This typically results in a memory limit exceeded error.
If you encounter this situation, it is recommended to explicitly specify a Shuffle Join, also known as a Partitioned Join: both the small table and the large table are hashed by the join key and then a distributed join is performed, spreading the memory consumption across all computing nodes in the cluster.
Doris will automatically attempt a Broadcast Join. If the estimated size of the small table is too large, it will automatically switch to a Shuffle Join. Note that if Broadcast Join is explicitly specified under these conditions, Doris will still automatically switch to Shuffle Join.
Using Broadcast Join (the default):
mysql> select sum(table1.pv) from table1 join table2 where table1.siteid = 2;
+--------------------+
| sum(table1.pv)     |
+--------------------+
|                 10 |
+--------------------+
1 row in set (0.20 sec)
Using Broadcast Join (Explicitly Specified):
mysql> select sum(table1.pv) from table1 join [broadcast] table2 where table1.siteid = 2;
+--------------------+
| sum(table1.pv)     |
+--------------------+
|                 10 |
+--------------------+
1 row in set (0.20 sec)
Using Shuffle Join:
mysql> select sum(table1.pv) from table1 join [shuffle] table2 where table1.siteid = 2;
+--------------------+
| sum(table1.pv)     |
+--------------------+
|                 10 |
+--------------------+
1 row in set (0.15 sec)
Query Retries and High Availability
When multiple FE nodes are deployed, users can implement high availability for Doris by deploying a Cloud Load Balancer over the FE nodes.
Here are some high availability solutions:
First Solution
Implement retry and load-balancing logic in your application-layer code: if a connection fails, the application automatically retries on another connection. This requires configuring multiple Doris FE node addresses in the application.
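Such application-layer retry logic can be sketched as follows (a hypothetical helper: fe_hosts and connect are placeholders for your actual FE addresses and your MySQL-protocol driver's connect function, e.g. pymysql.connect, whose errors you would catch instead of the generic ConnectionError used here):

```python
def connect_with_retry(fe_hosts, connect, retries_per_host=1):
    """Try each Doris FE address in turn and return the first
    successful connection; raise if every host fails."""
    last_error = None
    for host, port in fe_hosts:
        for _ in range(retries_per_host):
            try:
                return connect(host=host, port=port)
            except ConnectionError as e:  # narrow to your driver's error type
                last_error = e
    raise ConnectionError(f"all FE nodes unreachable: {last_error}")
```

In production you would also re-run the failed statement after reconnecting, since the original query is cancelled when its connection drops.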
Second Solution
If you use the MySQL JDBC connector to connect to Doris, you can use the JDBC automatic retry mechanism:
jdbc:mysql://[host:port],[host:port].../[database][?propertyName1][=propertyValue1][&propertyName2][=propertyValue2]...
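For example (hypothetical FE addresses fe1 and fe2; failOverReadOnly and connectTimeout are standard MySQL Connector/J properties):

```
jdbc:mysql://fe1:9030,fe2:9030/example_db?failOverReadOnly=false&connectTimeout=1000
```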
Third Solution
Applications can connect to a MySQL Proxy deployed on the same machine as the application, achieving the desired result by configuring the Failover and Load Balance features of the MySQL Proxy.
http://dev.mysql.com/doc/refman/5.6/en/mysql-proxy-using.html