If you need to modify or update the data in Doris, you can use the UPDATE command.
Applicable Scenario
Modify values for rows that meet certain conditions.
For point and small range update, the rows to be updated are best if they are a very small part of the entire table.
The UPDATE command can only be used in tables with Unique data models.
Definitions
Unique model: One of the data models in the Doris system. It separates columns into two categories, Key and Value. When a user imports rows with the same Key, the Value of the later row will overwrite the existing Value. The meaning is consistent with Unique in Mysql.
Basic Principles
It uses the query engine's own where filter logic to select the rows that need to be updated from the table to be updated. Then uses the Unique model's own logic of replacing old data with new data in the Value column, reinserts the updated rows into the table. This results in row-level updates.
Example:
Suppose there is an order table in Doris, where the order ID is the Key column, and the order status and order amount are Value columns. The data status is as follows:
At this point, when the user clicks Pay, the Doris system needs to change the order status of the order with id '1' to 'Pending Shipment', the Update feature is needed.
UPDATE order SET Order Status='Pending Shipment' WHERE Order ID=1;
After the user executes the UPDATE command, the system will perform the following three steps:
Step 1: Read the row that matches WHERE Order ID=1: (1,100,'Pending Payment')
Step 2: Change the order status from 'Pending Payment' to 'Pending Shipment': (1,100,'Pending Shipment')
Step 3: Reinsert the updated row back into the table, thus achieving the update effect.
Since the order table is of UNIQUE model, the later rows with the same Key will take effect, so the final effect is as follows:
Basic Operations
UPDATE Syntax
UPDATE table_name SET value=xxx WHERE condition;
table_name
: table to be updated. It must be a table of UNIQUE model to be eligible for update.
value=xxx
: columns to be updated. Left side of equation should be value column of the Table. Right side can be a constant or the transformed expression of some column in certain table.
For example: value = 1, meaning the value of updated column will change to 1. value = value +1, meaning the value of updated column will auto increment by 1.
condition
: Only rows that meet the condition will be updated. The condition must be an expression that results in a Boolean value.
For example, if k1 = 1, only the rows where the column value for k1 equals 1 will be updated. If k1 = k2, only the rows where the column value for k1 equals the column value for k2 will be updated.
Updating without a condition, i.e. full table updates, is not supported.
Sync
The Update syntax in Doris is a synchronous command. Therefore, the success of the Update statement indicates a successful update and the data are visible.
Performance
The performance of the Update statement is closely related to the number of rows to be updated and the search efficiency of the condition.
Number of rows to be updated: The more rows to be updated, the slower the Update statement will be. This follows the principle of importing. The update process in Doris is more suitable for occasional updates, such as changing the values?of a few rows, and not suitable for large scale data modifications. Large-scale modifications would make the runtime of the Update statement very long.
Search efficiency of the condition: The implementation of the Update function in Doris starts with processing the rows that meet the condition, so if the search efficiency of the condition is high, the speed of Update will also be fast.
The condition column should preferably hit the index or partition bucket trimming, so that Doris does not need to scan the entire table. It can quickly locate the rows to be updated, and hence improve the update efficiency.
Note
It is strongly discouraged to include a UNIQUE model Value column in the condition column in the Update statement, to avoid a full table scan during an update, which would decrease the update efficiency.
Concurrency Control
By default, concurrent multiple Updates are not allowed on the same table at the same time. This is primarily because, Doris currently supports row updates. It means, even if user claims something like SET v2 = 1
, in reality, other value column will be overwritten (although the value does not change).
This leads to a problem if two Update operations update the same row simultaneously - their behavior might be undefined. i.e., there may be dirty data.
However, in practical applications, if users can ensure that even if updates are concurrent, they will not operate on the same row at the same time, they can manually enable this feature. This can be done by modifying the FE configuration enable_concurrent_update
. It default to false. When it is set to true, there are no restrictions on the update concurrency. The user needs to be careful before making a critique and determine whether to change this configuration.
Risk of Usage
As Doris currently supports row updates and uses a read-before-write two-step operation, if an Update statement modifies the same row as another import or Delete statement, the result may be undefined. Therefore, users must exercise caution and control concurrency of their own User-Side for Update statements and other DML (Data Manipulation Language) statements.
Was this page helpful?