Broker Load:

```sql
LOAD LABEL example_db.label1
(
    DATA INFILE("bos://bucket/input/file")
    INTO TABLE my_table
    (k1, k2, tmpk3)
    PRECEDING FILTER k1 = 1
    SET (
        k3 = tmpk3 + 1
    )
    WHERE k1 > k2
)
WITH BROKER bos (...);
```
Stream Load:

```bash
curl --location-trusted -u user:passwd \
    -H "columns: k1, k2, tmpk3, k3 = tmpk3 + 1" \
    -H "where: k1 > k2" \
    -T file.txt \
    http://host:port/api/testDb/testTbl/_stream_load
```
Routine Load:

```sql
CREATE ROUTINE LOAD example_db.label1 ON my_table
COLUMNS(k1, k2, tmpk3, k3 = tmpk3 + 1),
PRECEDING FILTER k1 = 1,
WHERE k1 > k2
...
```
These three load types support the same column mapping, transformation, and filtering operations:

- Pre-filtering: filter the raw source data as it is read: `PRECEDING FILTER k1 = 1`
- Column mapping: read the columns of the source data in order: `(k1, k2, tmpk3)`
- Transformation: derive a table column from the source columns: `k3 = tmpk3 + 1`
- Post-filtering: filter the mapped and transformed rows by expression: `WHERE k1 > k2`
The following examples demonstrate column mapping. Suppose the source data is as follows (the column names are for illustration only):

| Column 1 | Column 2 | Column 3 | Column 4 |
| -------- | -------- | -------- | -------- |
| 1 | 100 | beijing | 1.1 |
| 2 | 200 | shanghai | 1.2 |
| 3 | 300 | guangzhou | 1.3 |
| 4 | \N | chongqing | 1.4 |
`\N` stands for null in the source file.

Suppose the table has four columns k1, k2, k3, k4. The desired import mapping is as follows:

- Column 1 -> k1
- Column 2 -> k3
- Column 3 -> k2
- Column 4 -> k4

Then the column mapping should be written in the following order:

```
(k1, k3, k2, k4)
```
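With Broker Load, for example, this mapping goes directly into the column list of the data description. The following is a minimal sketch that reuses the file path and table name from the example at the top; the label name is made up:

```sql
LOAD LABEL example_db.label_map_order
(
    DATA INFILE("bos://bucket/input/file")
    INTO TABLE my_table
    -- the four source columns are written, in order, to k1, k3, k2 and k4
    (k1, k3, k2, k4)
)
WITH BROKER bos (...);
```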
Suppose the table has only three columns k1, k2, k3. The desired import mapping is as follows:

- Column 1 -> k1
- Column 2 -> k3
- Column 3 -> k2

Then the column mapping should be written in the following order:

```
(k1, k3, k2, tmpk4)
```

Since the source file has four columns, the mapping must still list four names. tmpk4 is a custom column name that does not exist in the table, and Doris ignores this non-existent column name.
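A rough Broker Load sketch for this case, again with a made-up label and the same hypothetical file path; tmpk4 simply absorbs the fourth source column:

```sql
LOAD LABEL example_db.label_skip_col
(
    DATA INFILE("bos://bucket/input/file")
    INTO TABLE my_table
    -- tmpk4 receives the fourth source column and is discarded, since the table has no such column
    (k1, k3, k2, tmpk4)
)
WITH BROKER bos (...);
```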
Suppose the table has five columns k1, k2, k3, k4, k5. The desired import mapping is as follows:

- Column 1 -> k1
- Column 2 -> k3
- Column 3 -> k2

k4 and k5 are not supplied by the source data and should take their default values. Then the column mapping should be written in the following order:

```
(k1, k3, k2)
```

If k4 and k5 have default values, the default values will be populated. If a column has no default value but is nullable, it will be populated with null. Otherwise, the import job will report an error.
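Whether k4 and k5 can be omitted this way therefore depends on how the table was created. The sketch below is purely illustrative; the column types, the default value of k4, and the nullability of k5 are assumptions, not part of the original example:

```sql
CREATE TABLE example_db.my_table
(
    k1 INT,
    k2 VARCHAR(32),
    k3 INT,
    k4 INT DEFAULT "0",   -- has a default value, so it is filled with 0 when not mapped
    k5 VARCHAR(32) NULL   -- nullable without a default, so it is filled with null
)
DUPLICATE KEY(k1)
DISTRIBUTED BY HASH(k1) BUCKETS 1
PROPERTIES ("replication_num" = "1");
```

If k5 were declared NOT NULL and had no default value, the import job would report an error instead.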
The following examples demonstrate column transformation. Suppose the source data is as follows:

| Column 1 | Column 2 | Column 3 | Column 4 |
| -------- | -------- | -------- | -------- |
| 1 | 100 | beijing | 1.1 |
| 2 | 200 | shanghai | 1.2 |
| 3 | 300 | guangzhou | 1.3 |
| \N | 400 | chongqing | 1.4 |
Suppose the table has four columns k1, k2, k3, k4. The desired mapping and transformation relationship is as follows:

- Column 1 -> k1
- Column 2 * 100 -> k3
- Column 3 -> k2
- Column 4 -> k4

Then the column mapping and transformation should be written as:

```
(k1, tmpk3, k2, k4, k3 = tmpk3 * 100)
```

This temporarily names the second source column tmpk3 and specifies that the value of k3 in the table is tmpk3 * 100. The data in the final table is as follows:

| k1 | k2 | k3 | k4 |
| --- | --- | --- | --- |
| 1 | beijing | 10000 | 1.1 |
| 2 | shanghai | 20000 | 1.2 |
| 3 | guangzhou | 30000 | 1.3 |
| null | chongqing | 40000 | 1.4 |
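In Broker Load syntax the same transformation is expressed through the column list plus a SET clause, exactly like the first example on this page; the label is made up and the broker properties are elided:

```sql
LOAD LABEL example_db.label_multiply
(
    DATA INFILE("bos://bucket/input/file")
    INTO TABLE my_table
    (k1, tmpk3, k2, k4)
    SET (
        -- derive k3 from the temporarily named second source column
        k3 = tmpk3 * 100
    )
)
WITH BROKER bos (...);
```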
Suppose the table has four columns k1, k2, k3, k4. We want beijing, shanghai, guangzhou, and chongqing in the source data to be converted to their corresponding region ids and then imported:

- Column 1 -> k1
- Column 2 -> k2
- Column 3, converted to region id -> k3
- Column 4 -> k4

Then the column mapping and transformation should be written as:

```
(k1, k2, tmpk3, k4, k3 = case tmpk3 when "beijing" then 1 when "shanghai" then 2 when "guangzhou" then 3 when "chongqing" then 4 else null end)
```

The data in the final table is as follows:

| k1 | k2 | k3 | k4 |
| --- | --- | --- | --- |
| 1 | 100 | 1 | 1.1 |
| 2 | 200 | 2 | 1.2 |
| 3 | 300 | 3 | 1.3 |
| null | 400 | 4 | 1.4 |
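The same case expression also fits the COLUMNS clause of a Routine Load, mirroring the Routine Load example at the top; the label is made up and the data source clause is elided just as it is there:

```sql
CREATE ROUTINE LOAD example_db.label_region_id ON my_table
COLUMNS(k1, k2, tmpk3, k4,
        -- convert the temporarily named third source column into a region id
        k3 = case tmpk3 when "beijing" then 1 when "shanghai" then 2
             when "guangzhou" then 3 when "chongqing" then 4 else null end)
...
```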
Suppose the table has four columns k1, k2, k3, k4. While converting the region id, we also want to transform null values in the k1 column of the source data to 0 and then import:

- Column 1, converted to 0 if null -> k1
- Column 2 -> k2
- Column 3, converted to region id -> k3
- Column 4 -> k4

Then the column mapping and transformation should be written as:

```
(tmpk1, k2, tmpk3, k4, k1 = ifnull(tmpk1, 0), k3 = case tmpk3 when "beijing" then 1 when "shanghai" then 2 when "guangzhou" then 3 when "chongqing" then 4 else null end)
```

The data in the final table is as follows:

| k1 | k2 | k3 | k4 |
| --- | --- | --- | --- |
| 1 | 100 | 1 | 1.1 |
| 2 | 200 | 2 | 1.2 |
| 3 | 300 | 3 | 1.3 |
| 0 | 400 | 4 | 1.4 |
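Several derived columns can be combined in one SET clause. A rough Broker Load sketch, with a made-up label and the same hypothetical file path as before:

```sql
LOAD LABEL example_db.label_null_to_zero
(
    DATA INFILE("bos://bucket/input/file")
    INTO TABLE my_table
    (tmpk1, k2, tmpk3, k4)
    SET (
        k1 = ifnull(tmpk1, 0),   -- replace null in the first source column with 0
        k3 = case tmpk3 when "beijing" then 1 when "shanghai" then 2
             when "guangzhou" then 3 when "chongqing" then 4 else null end
    )
)
WITH BROKER bos (...);
```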
The following examples demonstrate column filtering. Suppose the source data is as follows:

| Column 1 | Column 2 | Column 3 | Column 4 |
| -------- | -------- | -------- | -------- |
| 1 | 100 | beijing | 1.1 |
| 2 | 200 | shanghai | 1.2 |
| 3 | 300 | guangzhou | 1.3 |
| \N | 400 | chongqing | 1.4 |
Suppose the table has four columns k1, k2, k3, k4. With the default column mapping and transformation, we can define the filter condition directly. If we want to import only the data rows whose fourth column in the source file is greater than 1.2, the filter condition is as follows:

```
where k4 > 1.2
```

The data in the final table is as follows:

| k1 | k2 | k3 | k4 |
| --- | --- | --- | --- |
| 3 | 300 | guangzhou | 1.3 |
| null | 400 | chongqing | 1.4 |
With the default mapping, the fourth column of the source file is automatically mapped to the k4 column in the table, so the condition can reference k4 directly.
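In a Broker Load statement such a post-filter is simply a WHERE clause in the data description; a minimal sketch with a made-up label:

```sql
LOAD LABEL example_db.label_filter_k4
(
    DATA INFILE("bos://bucket/input/file")
    INTO TABLE my_table
    -- default column mapping; only rows with k4 > 1.2 are kept
    WHERE k4 > 1.2
)
WITH BROKER bos (...);
```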
Suppose the table has four columns k1, k2, k3, k4. In the column transformation example, we converted the province name into an id. Here we also want to filter out the data with id 3. The transformation and filter conditions are as follows:

```
(k1, k2, tmpk3, k4, k3 = case tmpk3 when "beijing" then 1 when "shanghai" then 2 when "guangzhou" then 3 when "chongqing" then 4 else null end)
where k3 != 3
```

Note that the filter references the transformed value of k3, not the original province name in the source data. The data in the final table is as follows:

| k1 | k2 | k3 | k4 |
| --- | --- | --- | --- |
| 1 | 100 | 1 | 1.1 |
| 2 | 200 | 2 | 1.2 |
| null | 400 | 4 | 1.4 |
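Transformation and post-filtering can be combined in one statement, as in the first example on this page; a sketch with a made-up label:

```sql
LOAD LABEL example_db.label_filter_region
(
    DATA INFILE("bos://bucket/input/file")
    INTO TABLE my_table
    (k1, k2, tmpk3, k4)
    SET (
        k3 = case tmpk3 when "beijing" then 1 when "shanghai" then 2
             when "guangzhou" then 3 when "chongqing" then 4 else null end
    )
    -- the filter sees the transformed k3, not the original province name
    WHERE k3 != 3
)
WITH BROKER bos (...);
```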
Suppose the table has four columns k1, k2, k3, k4, and the region id transformation above is still applied. We want to filter out the rows whose k1 column is null, and at the same time filter out the rows whose k4 column is less than 1.2. Because the where clause selects the rows to keep, the filter condition is written as:

```
where k1 is not null and k4 >= 1.2
```

The data in the final table is as follows:

| k1 | k2 | k3 | k4 |
| --- | --- | --- | --- |
| 2 | 200 | 2 | 1.2 |
| 3 | 300 | 3 | 1.3 |
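The same condition can be attached to a Routine Load, mirroring the Routine Load example at the top; the label is made up and the data source clause is elided:

```sql
CREATE ROUTINE LOAD example_db.label_filter_null ON my_table
COLUMNS(k1, k2, tmpk3, k4,
        k3 = case tmpk3 when "beijing" then 1 when "shanghai" then 2
             when "guangzhou" then 3 when "chongqing" then 4 else null end),
WHERE k1 is not null and k4 >= 1.2
...
```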
Rows that are filtered out by the preceding filter or where column filtering conditions are counted as Unselected Rows; rows discarded because of data quality problems are counted as Filtered Rows; the rows imported successfully are Loaded Rows. Doris import jobs allow the user to set a maximum error rate (max_filter_ratio). If the error rate of the imported data is below this threshold, the erroneous rows will be ignored and the other, correct data will be imported.

The error rate is calculated as follows:

```
#Filtered Rows / (#Filtered Rows + #Loaded Rows)
```

Unselected Rows are not involved in the calculation of the error rate.
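For Broker Load, for example, the threshold is set through the max_filter_ratio property of the load statement; the value 0.1 below is only an illustration:

```sql
LOAD LABEL example_db.label_with_ratio
(
    DATA INFILE("bos://bucket/input/file")
    INTO TABLE my_table
    (k1, k2, tmpk3)
    PRECEDING FILTER k1 = 1
    SET (k3 = tmpk3 + 1)
    WHERE k1 > k2
)
WITH BROKER bos (...)
PROPERTIES
(
    -- tolerate up to 10% Filtered Rows; Unselected Rows do not count toward this ratio
    "max_filter_ratio" = "0.1"
);
```

For Stream Load, the same threshold is passed as the max_filter_ratio HTTP header.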