Copy `hadoop-cos-2.x.x-${version}.jar` and `cos_api-bundle-${version}.jar` to the DataX decompression paths `plugin/reader/hdfsreader/libs/` and `plugin/writer/hdfswriter/libs/`.
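As a concrete sketch, assuming DataX was extracted to `/usr/local/datax` (a hypothetical path; substitute your own) and both jars are in the current directory, the copy looks like:

```bash
# Hypothetical DataX home; adjust the path and jar versions to your setup.
DATAX_HOME=/usr/local/datax
cp hadoop-cos-2.x.x-*.jar cos_api-bundle-*.jar ${DATAX_HOME}/plugin/reader/hdfsreader/libs/
cp hadoop-cos-2.x.x-*.jar cos_api-bundle-*.jar ${DATAX_HOME}/plugin/writer/hdfswriter/libs/
```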
Open the `bin/datax.py` script in the DataX decompression directory and modify the `CLASS_PATH` variable in the script as follows:

```python
CLASS_PATH = ("%s/lib/*:%s/plugin/reader/hdfsreader/libs/*:%s/plugin/writer/hdfswriter/libs/*:.") % (DATAX_HOME, DATAX_HOME, DATAX_HOME)
```
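After the change, the two new classpath entries should resolve to the jars copied in the previous step. A quick check, again assuming the hypothetical `/usr/local/datax` home:

```bash
# Both commands should list the hadoop-cos and cos_api-bundle jars.
ls /usr/local/datax/plugin/reader/hdfsreader/libs/ | grep -E 'hadoop-cos|cos_api-bundle'
ls /usr/local/datax/plugin/writer/hdfswriter/libs/ | grep -E 'hadoop-cos|cos_api-bundle'
```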
Configure `hdfsreader` and `hdfswriter` in the JSON configuration file:

```json
{
    "job": {
        "setting": {
            "speed": {
                "byte": 10485760
            },
            "errorLimit": {
                "record": 0,
                "percentage": 0.02
            }
        },
        "content": [
            {
                "reader": {
                    "name": "hdfsreader",
                    "parameter": {
                        "path": "testfile",
                        "defaultFS": "cosn://examplebucket-1250000000/",
                        "column": ["*"],
                        "fileType": "text",
                        "encoding": "UTF-8",
                        "hadoopConfig": {
                            "fs.cosn.impl": "org.apache.hadoop.fs.CosFileSystem",
                            "fs.cosn.userinfo.region": "ap-beijing",
                            "fs.cosn.tmp.dir": "/tmp/hadoop_cos",
                            "fs.cosn.userinfo.secretId": "COS_SECRETID",
                            "fs.cosn.userinfo.secretKey": "COS_SECRETKEY"
                        },
                        "fieldDelimiter": ","
                    }
                },
                "writer": {
                    "name": "hdfswriter",
                    "parameter": {
                        "path": "/user/hadoop/",
                        "fileName": "testfile1",
                        "defaultFS": "cosn://examplebucket-1250000000/",
                        "column": [
                            {"name": "col", "type": "string"},
                            {"name": "col1", "type": "string"},
                            {"name": "col2", "type": "string"}
                        ],
                        "fileType": "text",
                        "encoding": "UTF-8",
                        "hadoopConfig": {
                            "fs.cosn.impl": "org.apache.hadoop.fs.CosFileSystem",
                            "fs.cosn.userinfo.region": "ap-beijing",
                            "fs.cosn.tmp.dir": "/tmp/hadoop_cos",
                            "fs.cosn.userinfo.secretId": "COS_SECRETID",
                            "fs.cosn.userinfo.secretKey": "COS_SECRETKEY"
                        },
                        "fieldDelimiter": ":",
                        "writeMode": "append"
                    }
                }
            }
        ]
    }
}
```
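Before submitting the job, you can verify that the file is well-formed JSON with Python's built-in formatter (assuming the configuration was saved as `job/hdfs_job.json`, as in the run step below):

```bash
# Prints the pretty-printed configuration on success, or a parse error on failure.
python -m json.tool job/hdfs_job.json
```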
In the configuration above:

- Configure `hadoopConfig` as required for cosn.
- Configure `defaultFS` to specify the cosn path, e.g. `cosn://examplebucket-1250000000/`.
- For `fs.cosn.userinfo.region`, enter the region where your bucket resides, such as `ap-beijing`. For more information, see Regions and Access Endpoints.
- For `COS_SECRETID` and `COS_SECRETKEY`, use your own COS key information.

Run the job `hdfs_job.json` in the `job` directory:

```bash
bin/datax.py job/hdfs_job.json
```
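For larger migrations, you may need to give the DataX JVM more heap. The launcher accepts JVM options via a flag in recent DataX releases (flag support may vary; check `python bin/datax.py --help` for your version):

```bash
# Raise the DataX JVM heap for a bigger job; sizes here are illustrative.
python bin/datax.py --jvm="-Xms1g -Xmx1g" job/hdfs_job.json
```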
If output similar to the following is displayed, the job has completed successfully:

```
2020-03-09 16:49:59.543 [job-0] INFO  JobContainer -
     [total cpu info] =>
        averageCpu | maxDeltaCpu | minDeltaCpu
        -1.00%     | -1.00%      | -1.00%

     [total gc info] =>
        NAME         | totalGCCount | maxDeltaGCCount | minDeltaGCCount | totalGCTime | maxDeltaGCTime | minDeltaGCTime
        PS MarkSweep | 1            | 1               | 1               | 0.024s      | 0.024s         | 0.024s
        PS Scavenge  | 1            | 1               | 1               | 0.014s      | 0.014s         | 0.014s

2020-03-09 16:49:59.543 [job-0] INFO  JobContainer - PerfTrace not enable!
2020-03-09 16:49:59.543 [job-0] INFO  StandAloneJobContainerCommunicator - Total 2 records, 33 bytes | Speed 3B/s, 0 records/s | Error 0 records, 0 bytes | All Task WaitWriterTime 0.000s | All Task WaitReaderTime 0.033s | Percentage 100.00%
2020-03-09 16:49:59.544 [job-0] INFO  JobContainer -
Job start time           : 2020-03-09 16:49:48
Job end time             : 2020-03-09 16:49:59
Job duration             : 11s
Average job traffic      : 3B/s
Recorded write speed     : 0rec/s
Recorded read count      : 2
Read/Write failure count : 0
```
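To double-check the result on the COS side, you can list the destination path with the Hadoop CLI, since hadoop-cos registers the `cosn://` scheme (assuming a Hadoop client configured with the same cosn settings as in the job above):

```bash
# Should show the object(s) written by hdfswriter under /user/hadoop/,
# with names beginning with the configured fileName "testfile1".
hadoop fs -ls cosn://examplebucket-1250000000/user/hadoop/
```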