The usage of the Insert Into statement is similar to that of Insert Into statements in MySQL and other databases. However, in Doris, all data writing is a separate import job. Therefore, Insert Into is also introduced as an import method.
The main Insert Into commands include the following two types:
INSERT INTO tbl SELECT ...
INSERT INTO tbl (col1, col2, ...) VALUES (1, 2, ...), (1,3, ...);
The second command is for demo only, not in a test or production environment.
Basic Operations
Creation Import
Insert Into commands need to be submitted via the MySQL protocol and the import request will return the import result synchronously.
Syntax:
INSERT INTO table_name [partition_info] [WITH LABEL label] [col_list] [query_stmt] [VALUES];
Example:
INSERT INTO tbl2 WITH LABEL label1 SELECT * FROM tbl3;
INSERT INTO tbl1 VALUES ("qweasdzxcqweasdzxc"), ("a");
Note
When the CTE (Common Table Expressions)
is required to be used as the query part in the insert operation, WITH LABEL
and the column list part must be specified.
Example:
INSERT INTO tbl1 WITH LABEL label1
WITH cte1 AS (SELECT * FROM tbl1), cte2 AS (SELECT * FROM tbl2)
SELECT k1 FROM cte1 JOIN cte2 WHERE cte1.k1 = 1;
INSERT INTO tbl1 (k1)
WITH cte1 AS (SELECT * FROM tbl1), cte2 AS (SELECT * FROM tbl2)
SELECT k1 FROM cte1 JOIN cte2 WHERE cte1.k1 = 1;
The following mainly introduces the parameters used in the creation of import statements:
partition_info
Target partition of the import table. If the target partition is specified, only the data that meets the target partition will be imported. If not specified, the default is all partitions of this table.
col_list
The target column of the import table, which can exist in any order. If no target column is specified, the default value is all the columns of this table. If a column in the table does not exist in the target column, that column requires a default value. Otherwise, the Insert Into will fail.
If the column type of the query result is inconsistent with the type of the target column, an implicit type conversion will be invoked. If the conversion cannot be done, the Insert Into statement will report a syntax parsing error.
query_stmt
A query statement is used to import the results of the query into other tables of the Doris system. The query statement supports any SQL query syntax supported by Doris.
VALUES
Users can insert one or more sets of data through the VALUES syntax.
Note
The VALUES method is only suitable for importing a few sets of data as a DEMO, and is completely not applicable to any test and production environments. The Doris system itself is also not suitable for single-row data import scenarios. It is recommended to use the INSERT INTO SELECT method to import in batches.
WITH LABEL
The INSERT operation, as an import task, can also specify a label. If not specified, the system will automatically assign a UUID as the label. This feature requires version 0.11+.
Note
It is recommended to specify a label instead of automatically being assigned by the system. If the label is automatically assigned by the system, but the network disconnection occurs due to network errors in the process of executing the Insert Into statement, it is impossible to know whether the Insert Into was successful. However, if a label is specified, you can view the task result again through the label.
Importing Result
Insert Into is a SQL command in itself and its return result will be one of the following, depending on the execution result:
1. Result set is empty.
If the result set of the corresponding select statement for insert is empty, the return is as follows:
mysql> insert into tbl1 select * from empty_tbl;
Query OK, 0 rows affected (0.02 sec)
Query OK
indicates successful execution. 0 rows affected
means no data was imported.
2. Result set is not empty.
In the case where the result set is not empty. The possible return results are:
2.1 Insert is executed successfully and visible:
mysql> insert into tbl1 select * from tbl2;
Query OK, 4 rows affected (0.38 sec)
{'label':'insert_8510c568-9eda-4173-9e36-6adc7d35291c', 'status':'visible', 'txnId':'4005'}
mysql> insert into tbl1 with label my_label1 select * from tbl2;
Query OK, 4 rows affected (0.38 sec)
{'label':'my_label1', 'status':'visible', 'txnId':'4005'}
mysql> insert into tbl1 select * from tbl2;
Query OK, 2 rows affected, 2 warnings (0.31 sec)
{'label':'insert_f0747f0e-7a35-46e2-affa-13a235f4020d', 'status':'visible', 'txnId':'4005'}
mysql> insert into tbl1 select * from tbl2;
Query OK, 2 rows affected, 2 warnings (0.31 sec)
{'label':'insert_f0747f0e-7a35-46e2-affa-13a235f4020d', 'status':'committed', 'txnId':'4005'}
Query OK
indicates successful execution. 4 rows affected
indicates that a total of 4 lines of data were imported. 2 warnings
indicates the number of rows that were filtered.
It will also return a json string:
{'label':'my_label1', 'status':'visible', 'txnId':'4005'}
{'label':'insert_f0747f0e-7a35-46e2-affa-13a235f4020d', 'status':'committed', 'txnId':'4005'}
{'label':'my_label1', 'status':'visible', 'txnId':'4005', 'err':'some other error'}
label
is the user-specified label or automatically generated label. Label is the identifier of this import job of Insert Into. Each import job has a label that is unique within a single database.
status
represents whether the imported data is visible. If it's visible, it shows visible
, if not visible, it shows committed
.
txnId
is the id of the import transaction corresponding to this insert.
err
field will display some other unexpected errors.
When you need to view filtered rows, you can use the following statement:
show load where label="xxx";
The URL in the returned result can be used to query the error data. See the Viewing Error Row summary below.
Invisibility of the data is a temporary state. This batch of data will eventually be visible
You can use the following statement to view the visibility status of this batch of data:
show transaction where id=4005;
If the TransactionStatus
column in the returned result is visible
, it means the data is visible.
2.2 Insert execution failed
Execution failure means that no data was successfully imported, and the following is returned:
mysql> insert into tbl1 select * from tbl2 where k1 = "a";
ERROR 1064 (HY000): all partitions have no load data. url: http://10.74.167.16:8042/api/_load_error_log?file=__shard_2/error_log_insert_stmt_ba8bb9e158e4879-ae8de8507c0bf8a2_ba8bb9e158e4879_ae8de8507c0bf8a2
Where ERROR 1064 (HY000): all partitions have no load data
shows the reason of the failure. The url at the end can be used to query the error data. See the Viewing Error Row summary below.
In summary, the correct processing logic for the returned result of the insert operation should be:
If the return result is ERROR 1064 (HY000)
, then the import failed.
If the return result is Query OK
, then the execution was successful.
If rows affected
is 0, the result set is empty and no data was imported.
If rows affected
is greater than 0:
If the status
is committed
, it means the data is invisible. Use the show transaction
statement to view the status until it becomes visible
.
If the status
is visible
, the data was imported successfully.
If warnings
are more than 0, it represents there are data filtered, you need to use the show load
command to view the filtered rows.
SHOW LAST INSERT
In the previous section, we introduced how to handle subsequent processes based on the returned result of insert operations. However, it is difficult to obtain json characters from MySQL libraries in some languages. Therefore, Doris also provides SHOW LAST INSERT
command to explicitly obtain the result of the most recent insert operation.
After executing an insert operation, you can execute SHOW LAST INSERT
within the same session connection. This command will return the result of the most recent insert operation, such as:
mysql> show last insert\\G
*************************** 1. row ***************************
TransactionId: 64067
Label: insert_ba8f33aea9544866-8ed77e2844d0cc9b
Database: default_cluster:db1
Table: t1
TransactionStatus: VISIBLE
LoadedRows: 2
FilteredRows: 0
This command will return the detailed information of insert and the corresponding transaction. Therefore, after each insert operation, you can continue to execute the show last insert
command to obtain the result of the insert.
Note
This command only returns the result of the most recent insert operation in the same session connection. If the connection is disconnected or switched to a new connection, it will return an empty set.
Related System Configuration
FE configuration
timeout
Import task timeout time (in seconds), If the import task is not completed within the set timeout time, the system will cancel it, and it will become CANCELLED.
Currently, Insert Into does not support self-defined import timeout time. All Insert Into's import timeout time is uniform, the default timeout time is 1 hour. If the source file cannot be imported within the specified time, you need to adjust the FE parameter insert_load_default_timeout_second
.
At the same time, the Insert Into statement is limited by the session variable query_timeout
. You can increase the timeout time by SET query_timeout = xxx;
. Unit: seconds.
Session Variables
enable_insert_strict
Insert Into import cannot control the import error rate that can be tolerated. Users can only control this through the session parameter enable_insert_strict
.
When this parameter is set to false, it means that as long as at least one piece of data is correctly imported, it returns success. If there is a failure data, a label will also be returned.
When this parameter is set to true, if there is any error data, the import will fail.
By default, it is false. You can configure the parameter using SET enable_insert_strict = true;
.
query_timeout
Insert Into is an SQL command, so, Insert Into statement is also limited by the session variable query_timeout
. You can increase the timeout time through SET query_timeout = xxx;
, Unit: seconds.
Best Practice
Application Scenario
1. When users wish to import only a few false data to validate the features of Doris system, then INSERT syntax is appropriate. 2. When users wish to perform ETL conversion on the data already in the Doris table and import it into a new Doris table, then INSERT syntax is appropriate. 3. Users can create an external table such as a MySQL external table which maps to a table in the MySQL system. Or create a Broker external table to map data files on HDFS. Then use the INSERT syntax to import the data from the external table to the Doris table in storage. Data Volume
There is no data volume limit for Insert Into, and it can also support large quantities of data. However, Insert Into has a default timeout, if the imported data volume estimated by users is too large, the system's Insert Into import timeout time needs to be modified.
The imported data volume = 36G, approximately ≤ 3600s * 10M/s
Among them, 10M/s is the maximum import limit rate, users need to calculate the average import speed according to the current cluster condition and replace the 10M/s in the formula.
Complete Example
The user has a table called store_sales in the sales database, and has created another table called bj_store_sales also in the sales database. The user wants to import the sales records from the store_sales table into the new bj_store_sales table. The volume of data to be imported is approximately: 10G.
store_sales schema:
(id, total, user_id, sale_timestamp, region)
bj_store_sales schema:
(id, total, user_id, sale_timestamp)
About Cluster: The average import speed of the user's current cluster is about 5M/s.
Step 1: Determining whether to modify the default timeout of Insert Into
Calculate the approximate time for import
10G / 5M/s = 2000s
Modify FE Configuration
insert_load_default_timeout_second = 2000
Step 2: Creating an import task
As the user wishes to perform ETL on the data in one table and import it into the target table, the user should use the Insert Into query_stmt method for import.
INSERT INTO bj_store_sales WITH LABEL `label` SELECT id, total, user_id, sale_timestamp FROM store_sales where region = "bj";
FAQs
Viewing Error Row
As Insert Into cannot control the error rate, it can only be set as completely tolerant of error data or completely ignorrant of error data using enable_insert_strict
. Therefore, if enable_insert_strict
is set to true, Insert Into may fail. If enable_insert_strict
is set to false, only a part of qualified data may be imported.
When the returned result provides the url field, you can view the error rows using the following command: SHOW LOAD WARNINGS ON "url";
Example:
SHOW LOAD WARNINGS ON "http://ip:port/api/_load_error_log?file=__shard_13/error_log_insert_stmt_d2cac0a0a16d482d-9041c949a4b71605_d2cac0a0a16d482d_9041c949a4b71605";
The reasons of the errors often encountered include: the length of the source data column exceeding the length of the destination data column, mismatch of column types, partition incompatibility, and mismatch of column order, etc.
Was this page helpful?