tencent cloud

Feedback

INSERT INTO

Last updated: 2024-06-27 11:03:33
    The usage of the Insert Into statement is similar to that of Insert Into statements in MySQL and other databases. However, in Doris, all data writing is a separate import job. Therefore, Insert Into is also introduced as an import method.
    The main Insert Into commands include the following two types:
    INSERT INTO tbl SELECT ...
    INSERT INTO tbl (col1, col2, ...) VALUES (1, 2, ...), (1,3, ...); The second command is for demo only, not in a test or production environment.

    Basic Operations

    Creation Import

    Insert Into commands need to be submitted via the MySQL protocol and the import request will return the import result synchronously. Syntax:
    INSERT INTO table_name [partition_info] [WITH LABEL label] [col_list] [query_stmt] [VALUES];
    Example:
    INSERT INTO tbl2 WITH LABEL label1 SELECT * FROM tbl3;
    INSERT INTO tbl1 VALUES ("qweasdzxcqweasdzxc"), ("a");
    Note
    When the CTE (Common Table Expressions) is required to be used as the query part in the insert operation, WITH LABEL and the column list part must be specified.
    Example:
    INSERT INTO tbl1 WITH LABEL label1
    WITH cte1 AS (SELECT * FROM tbl1), cte2 AS (SELECT * FROM tbl2)
    SELECT k1 FROM cte1 JOIN cte2 WHERE cte1.k1 = 1;
    
    
    INSERT INTO tbl1 (k1)
    WITH cte1 AS (SELECT * FROM tbl1), cte2 AS (SELECT * FROM tbl2)
    SELECT k1 FROM cte1 JOIN cte2 WHERE cte1.k1 = 1;
    The following mainly introduces the parameters used in the creation of import statements:
    partition_info Target partition of the import table. If the target partition is specified, only the data that meets the target partition will be imported. If not specified, the default is all partitions of this table.
    col_list The target column of the import table, which can exist in any order. If no target column is specified, the default value is all the columns of this table. If a column in the table does not exist in the target column, that column requires a default value. Otherwise, the Insert Into will fail. If the column type of the query result is inconsistent with the type of the target column, an implicit type conversion will be invoked. If the conversion cannot be done, the Insert Into statement will report a syntax parsing error.
    query_stmt A query statement is used to import the results of the query into other tables of the Doris system. The query statement supports any SQL query syntax supported by Doris.
    VALUES Users can insert one or more sets of data through the VALUES syntax.
    Note
    The VALUES method is only suitable for importing a few sets of data as a DEMO, and is completely not applicable to any test and production environments. The Doris system itself is also not suitable for single-row data import scenarios. It is recommended to use the INSERT INTO SELECT method to import in batches.
    WITH LABEL The INSERT operation, as an import task, can also specify a label. If not specified, the system will automatically assign a UUID as the label. This feature requires version 0.11+.
    Note
    It is recommended to specify a label instead of automatically being assigned by the system. If the label is automatically assigned by the system, but the network disconnection occurs due to network errors in the process of executing the Insert Into statement, it is impossible to know whether the Insert Into was successful. However, if a label is specified, you can view the task result again through the label.

    Importing Result

    Insert Into is a SQL command in itself and its return result will be one of the following, depending on the execution result:
    1. Result set is empty. If the result set of the corresponding select statement for insert is empty, the return is as follows:
    mysql> insert into tbl1 select * from empty_tbl;
    Query OK, 0 rows affected (0.02 sec)
    Query OK indicates successful execution. 0 rows affected means no data was imported.
    2. Result set is not empty. In the case where the result set is not empty. The possible return results are:
    2.1 Insert is executed successfully and visible:
    mysql> insert into tbl1 select * from tbl2;
    Query OK, 4 rows affected (0.38 sec)
    {'label':'insert_8510c568-9eda-4173-9e36-6adc7d35291c', 'status':'visible', 'txnId':'4005'}
    
    mysql> insert into tbl1 with label my_label1 select * from tbl2;
    Query OK, 4 rows affected (0.38 sec)
    {'label':'my_label1', 'status':'visible', 'txnId':'4005'}
    
    mysql> insert into tbl1 select * from tbl2;
    Query OK, 2 rows affected, 2 warnings (0.31 sec)
    {'label':'insert_f0747f0e-7a35-46e2-affa-13a235f4020d', 'status':'visible', 'txnId':'4005'}
    
    mysql> insert into tbl1 select * from tbl2;
    Query OK, 2 rows affected, 2 warnings (0.31 sec)
    {'label':'insert_f0747f0e-7a35-46e2-affa-13a235f4020d', 'status':'committed', 'txnId':'4005'}
    Query OK indicates successful execution. 4 rows affected indicates that a total of 4 lines of data were imported. 2 warnings indicates the number of rows that were filtered. It will also return a json string:
    {'label':'my_label1', 'status':'visible', 'txnId':'4005'}
    {'label':'insert_f0747f0e-7a35-46e2-affa-13a235f4020d', 'status':'committed', 'txnId':'4005'}
    {'label':'my_label1', 'status':'visible', 'txnId':'4005', 'err':'some other error'}
    label is the user-specified label or automatically generated label. Label is the identifier of this import job of Insert Into. Each import job has a label that is unique within a single database. status represents whether the imported data is visible. If it's visible, it shows visible, if not visible, it shows committed. txnId is the id of the import transaction corresponding to this insert. err field will display some other unexpected errors. When you need to view filtered rows, you can use the following statement:
    show load where label="xxx";
    The URL in the returned result can be used to query the error data. See the Viewing Error Row summary below. Invisibility of the data is a temporary state. This batch of data will eventually be visible You can use the following statement to view the visibility status of this batch of data:
    show transaction where id=4005;
    If the TransactionStatus column in the returned result is visible, it means the data is visible.
    2.2 Insert execution failed Execution failure means that no data was successfully imported, and the following is returned:
    mysql> insert into tbl1 select * from tbl2 where k1 = "a";
    ERROR 1064 (HY000): all partitions have no load data. url: http://10.74.167.16:8042/api/_load_error_log?file=__shard_2/error_log_insert_stmt_ba8bb9e158e4879-ae8de8507c0bf8a2_ba8bb9e158e4879_ae8de8507c0bf8a2
    Where ERROR 1064 (HY000): all partitions have no load data shows the reason of the failure. The url at the end can be used to query the error data. See the Viewing Error Row summary below.
    In summary, the correct processing logic for the returned result of the insert operation should be:
    If the return result is ERROR 1064 (HY000), then the import failed.
    If the return result is Query OK, then the execution was successful.
    If rows affected is 0, the result set is empty and no data was imported.
    If rows affected is greater than 0:
    If the status is committed, it means the data is invisible. Use the show transaction statement to view the status until it becomes visible.
    If the status is visible, the data was imported successfully.
    If warnings are more than 0, it represents there are data filtered, you need to use the show load command to view the filtered rows.

    SHOW LAST INSERT

    In the previous section, we introduced how to handle subsequent processes based on the returned result of insert operations. However, it is difficult to obtain json characters from MySQL libraries in some languages. Therefore, Doris also provides SHOW LAST INSERT command to explicitly obtain the result of the most recent insert operation. After executing an insert operation, you can execute SHOW LAST INSERT within the same session connection. This command will return the result of the most recent insert operation, such as:
    mysql> show last insert\\G
    *************************** 1. row ***************************
    TransactionId: 64067
    Label: insert_ba8f33aea9544866-8ed77e2844d0cc9b
    Database: default_cluster:db1
    Table: t1
    TransactionStatus: VISIBLE
    LoadedRows: 2
    FilteredRows: 0
    This command will return the detailed information of insert and the corresponding transaction. Therefore, after each insert operation, you can continue to execute the show last insert command to obtain the result of the insert.
    Note
    This command only returns the result of the most recent insert operation in the same session connection. If the connection is disconnected or switched to a new connection, it will return an empty set.

    Related System Configuration

    FE configuration

    timeout Import task timeout time (in seconds), If the import task is not completed within the set timeout time, the system will cancel it, and it will become CANCELLED. Currently, Insert Into does not support self-defined import timeout time. All Insert Into's import timeout time is uniform, the default timeout time is 1 hour. If the source file cannot be imported within the specified time, you need to adjust the FE parameter insert_load_default_timeout_second. At the same time, the Insert Into statement is limited by the session variable query_timeout. You can increase the timeout time by SET query_timeout = xxx;. Unit: seconds.

    Session Variables

    enable_insert_strict Insert Into import cannot control the import error rate that can be tolerated. Users can only control this through the session parameter enable_insert_strict.
    When this parameter is set to false, it means that as long as at least one piece of data is correctly imported, it returns success. If there is a failure data, a label will also be returned.
    When this parameter is set to true, if there is any error data, the import will fail.
    By default, it is false. You can configure the parameter using SET enable_insert_strict = true;.
    query_timeout Insert Into is an SQL command, so, Insert Into statement is also limited by the session variable query_timeout. You can increase the timeout time through SET query_timeout = xxx;, Unit: seconds.

    Best Practice

    Application Scenario

    1. When users wish to import only a few false data to validate the features of Doris system, then INSERT syntax is appropriate.
    2. When users wish to perform ETL conversion on the data already in the Doris table and import it into a new Doris table, then INSERT syntax is appropriate.
    3. Users can create an external table such as a MySQL external table which maps to a table in the MySQL system. Or create a Broker external table to map data files on HDFS. Then use the INSERT syntax to import the data from the external table to the Doris table in storage.

    Data Volume

    There is no data volume limit for Insert Into, and it can also support large quantities of data. However, Insert Into has a default timeout, if the imported data volume estimated by users is too large, the system's Insert Into import timeout time needs to be modified.
    The imported data volume = 36G, approximately ≤ 3600s * 10M/s
    Among them, 10M/s is the maximum import limit rate, users need to calculate the average import speed according to the current cluster condition and replace the 10M/s in the formula.

    Complete Example

    The user has a table called store_sales in the sales database, and has created another table called bj_store_sales also in the sales database. The user wants to import the sales records from the store_sales table into the new bj_store_sales table. The volume of data to be imported is approximately: 10G.
    store_sales schema:
    (id, total, user_id, sale_timestamp, region)
    
    bj_store_sales schema:
    (id, total, user_id, sale_timestamp)
    
    About Cluster: The average import speed of the user's current cluster is about 5M/s.
    Step 1: Determining whether to modify the default timeout of Insert Into
    Calculate the approximate time for import
    10G / 5M/s = 2000s
    
    Modify FE Configuration
    insert_load_default_timeout_second = 2000
    Step 2: Creating an import task As the user wishes to perform ETL on the data in one table and import it into the target table, the user should use the Insert Into query_stmt method for import.
    INSERT INTO bj_store_sales WITH LABEL `label` SELECT id, total, user_id, sale_timestamp FROM store_sales where region = "bj";

    FAQs

    Viewing Error Row As Insert Into cannot control the error rate, it can only be set as completely tolerant of error data or completely ignorrant of error data using enable_insert_strict. Therefore, if enable_insert_strict is set to true, Insert Into may fail. If enable_insert_strict is set to false, only a part of qualified data may be imported. When the returned result provides the url field, you can view the error rows using the following command: SHOW LOAD WARNINGS ON "url"; Example:
    SHOW LOAD WARNINGS ON "http://ip:port/api/_load_error_log?file=__shard_13/error_log_insert_stmt_d2cac0a0a16d482d-9041c949a4b71605_d2cac0a0a16d482d_9041c949a4b71605";
    The reasons of the errors often encountered include: the length of the source data column exceeding the length of the destination data column, mismatch of column types, partition incompatibility, and mismatch of column order, etc.
    Contact Us

    Contact our sales team or business advisors to help your business.

    Technical Support

    Open a ticket if you're looking for further assistance. Our Ticket is 7x24 avaliable.

    7x24 Phone Support