About DataX
As a universal offline data import tool open-sourced by Alibaba, DataX is widely used in the industry. DataX enables efficient data synchronization features between various heterogeneous data sources such as MySQL, Oracle, SQLServer, Postgre, HDFS, Hive, ADS, HBase, TableStore(OTS), MaxCompute(ODPS), Hologres, and DRDS, etc.
Importing data into Doris using DataX requires the DataX doriswriter plugin. This plugin uses the Stream Load feature of Doris for data import and works together with DataX service.
User Manual
For the code for the DataX doriswriter plugin, see here. This includes the plugin code and the development environment of the DataX project.
The doriswriter plugin depends on some modules in the DataX code, which are not in the official Maven repository. Thus, when developing the doriswriter plugin, it is necessary to download the complete DataX code library to compile and develop the plugin. Directory Structure
doriswriter/
This directory is the code directory of the doriswriter plugin. All the code in this directory is hosted in the Apache Doris repository. For the doriswriter plugin documentation, see:doriswriter/doc
.
init-env.sh
This script is primarily used to set up the DataX development environment, which includes the following operations:
1. Clone the DataX code library to local.
2. Connect doriswriter/
directory to DataX/doriswriter
directory through soft link.
3. Add the <module>doriswriter</module>
module to the DataX/pom.xml
file.
4. Change the httpclient version in the DataX/core/pom.xml
file from 4.5 to 4.5.13.
Note
Httpclient v4.5 has a bug in handling 307 redirects.
After running this script, developers may start developing or compiling in the DataX/
directory. Due to the soft link, any modifications in the DataX/doriswriter
directory will be reflected in the doriswriter/
directory, which facilitates code submission for developers.
Compilation
1. Run init-env.sh
.
2. Modify the code in DataX/doriswriter
as needed.
3. Compile doriswriter:
3.1 Compile the doriswriter plugin separately:
mvn clean install -pl plugin-rdbms-util,doriswriter -DskipTests
3.2 Compile the entire DataX project:
mvn package assembly:assembly -Dmaven.test.skip=true
The output is in target/datax/datax/
.
Note
The hdfsreader, hdfswriter, and oscarwriter plugins require additional jar packages. If you don't need these plugins, you can remove their modules from DataX/pom.xml
.
3.3 Compilation error
If you encounter the following compilation error:
Could not find artifact com.alibaba.datax:datax-all:pom:0.0.1-SNAPSHOT ...
You can try the following ways to solve it:
3.3.2 After decompression, copy the directory alibaba/datax/
obtained to the corresponding .m2/repository/com/alibaba/
of the maven in use.
3.3.3 Try compiling again.
3.3.4 Submit modifications as needed.
Sample code
Stream reads data and imports it into Doris
For instructions on how to use this sample plugin, please see here. Mysql reads and imports data into Doris
1. Mysql table structure
CREATE TABLE t_test
(
id
bigint(30) NOT NULL,
order_code
varchar(30) DEFAULT NULL COMMENT '',
line_code
varchar(30) DEFAULT NULL COMMENT '',
remark
varchar(30) DEFAULT NULL COMMENT '',
unit_no
varchar(30) DEFAULT NULL COMMENT '',
unit_name
varchar(30) DEFAULT NULL COMMENT '',
price
decimal(12,2) DEFAULT NULL COMMENT '',
PRIMARY KEY(id
) USING BTREE
)ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 ROW_FORMAT=DYNAMIC COMMENT='';
2. Doris table structure
CREATE TABLE ods_t_test
(
id
bigint(30) NOT NULL,
order_code
varchar(30) DEFAULT NULL COMMENT '',
line_code
varchar(30) DEFAULT NULL COMMENT '',
remark
varchar(30) DEFAULT NULL COMMENT '',
unit_no
varchar(30) DEFAULT NULL COMMENT '',
unit_name
varchar(30) DEFAULT NULL COMMENT '',
price
decimal(12,2) DEFAULT NULL COMMENT ''
)ENGINE=OLAP
UNIQUE KEY(id
, order_code
)
DISTRIBUTED BY HASH(order_code
) BUCKETS 1
PROPERTIES (
"replication_allocation" = "tag.location.default: 3",
"in_memory" = "false",
"storage_format" = "V2"
);
3. Create datax scriptimport_t_test.json
{
"job": {
"setting": {
"speed": {
"channel": 1
},
"errorLimit": {
"record": 0,
"percentage": 0
}
},
"content": [
{
"reader": {
"name": "mysqlreader",
"parameter": {
"username": "xxx",
"password": "xxx",
"column": ["id","order_code","line_code","remark","unit_no","unit_name","price"],
"connection": [ { "table": [ "t_test" ], "jdbcUrl": [ "jdbc:mysql://10.10.10.1:3306/demo" ] } ] }
},
"writer": {
"name": "doriswriter",
"parameter": {
"feLoadUrl": ["127.0.0.1:8030","127.0.0.2:8030"],
"beLoadUrl": ["127.0.0.3:8040","127.0.0.4:8040","127.0.0.5:8040"],
"jdbcUrl": "jdbc:mysql://127.0.0.1:9030/",
"database": "demo",
"table": "ods_t_test",
"column": ["id","order_code","line_code","remark","unit_no","unit_name","price"],
"username": "xxx",
"password": "xxx",
"postSql": [],
"preSql": [],
"loadProps": {
},
"maxBatchRows" : 300000,
"maxBatchByteSize" : 20971520
}
}
}
]
}
}
Was this page helpful?