Download the following JAR packages to the `/data01/jars` directory of a server in the cluster:
JAR Filename | Description | Download Address |
--- | --- | --- |
cos-distcp-1.12-3.1.0.jar | COSDistCp package, used to copy data to COSN. | |
chdfs_hadoop_plugin_network-2.8.jar | OFS plugin. | |
Hadoop-COS | The version must be 8.1.5 or later. | |
cos_api-bundle | The version must match the Hadoop-COS version. | |
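Once downloaded, a quick check that all four packages are in place (the filenames follow the table above; your versions may differ):

```
# List the JARs referenced by the commands in this guide; expect the
# cos-distcp, chdfs_hadoop_plugin_network, hadoop-cos, and cos_api-bundle packages
ls -l /data01/jars/
```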
The migration destination is accessed through the COSN scheme, in the format `cosn://bucketname-appid/`; this is supported by Hadoop-COS starting from v8.1.5. Add the following configuration items to `core-site.xml` and distribute the configuration to all nodes. If only data needs to be migrated, you don't need to restart the big data components.

Key | Value | Configuration File | Description |
--- | --- | --- | --- |
fs.cosn.trsf.fs.ofs.impl | com.qcloud.chdfs.fs.CHDFSHadoopFileSystemAdapter | core-site.xml | COSN implementation class, which is required. |
fs.cosn.trsf.fs.AbstractFileSystem.ofs.impl | com.qcloud.chdfs.fs.CHDFSDelegateFSAdapter | core-site.xml | COSN implementation class, which is required. |
fs.cosn.trsf.fs.ofs.tmp.cache.dir | A local path in the format of `/data/emr/hdfs/tmp/` | core-site.xml | Temporary directory, which is required. It will be created on all cluster nodes, so make sure there is sufficient space and that the directory is writable. |
fs.cosn.trsf.fs.ofs.user.appid | `appid` of your COS bucket | core-site.xml | Required. |
fs.cosn.trsf.fs.ofs.ranger.enable.flag | false | core-site.xml | Required. Verify that the value is `false`. |
fs.cosn.trsf.fs.ofs.bucket.region | Bucket region | core-site.xml | Required. Valid values include eu-frankfurt (Frankfurt), ap-chengdu (Chengdu), and ap-singapore (Singapore). |
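For reference, a minimal sketch of the `core-site.xml` additions above; the appid, region, and cache directory values are placeholders to replace with your own:

```xml
<!-- Sketch of the core-site.xml keys from the table above -->
<property>
  <name>fs.cosn.trsf.fs.ofs.impl</name>
  <value>com.qcloud.chdfs.fs.CHDFSHadoopFileSystemAdapter</value>
</property>
<property>
  <name>fs.cosn.trsf.fs.AbstractFileSystem.ofs.impl</name>
  <value>com.qcloud.chdfs.fs.CHDFSDelegateFSAdapter</value>
</property>
<property>
  <name>fs.cosn.trsf.fs.ofs.tmp.cache.dir</name>
  <value>/data/emr/hdfs/tmp/</value>
</property>
<property>
  <name>fs.cosn.trsf.fs.ofs.user.appid</name>
  <value>1250000000</value> <!-- placeholder: your COS bucket appid -->
</property>
<property>
  <name>fs.cosn.trsf.fs.ofs.ranger.enable.flag</name>
  <value>false</value>
</property>
<property>
  <name>fs.cosn.trsf.fs.ofs.bucket.region</name>
  <value>ap-singapore</value> <!-- placeholder: your bucket's region -->
</property>
```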
This example migrates `hdfs:///data/user/target` to `cosn://{bucketname-appid}/data/user/target`. First, create a snapshot of the source directory:

hdfs dfsadmin -disallowSnapshot hdfs:///data/user/
hdfs dfsadmin -allowSnapshot hdfs:///data/user/target
hdfs dfs -deleteSnapshot hdfs:///data/user/target {current date}
hdfs dfs -createSnapshot hdfs:///data/user/target {current date}
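For instance, if the snapshot is named after the current date (`20240618` below is purely illustrative):

```
# Create the snapshot, then verify that it is visible under .snapshot/
hdfs dfs -createSnapshot hdfs:///data/user/target 20240618
hadoop fs -ls hdfs:///data/user/target/.snapshot/
```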
Next, create a temporary directory in the target bucket:

hadoop fs -libjars /data01/jars/chdfs_hadoop_plugin_network-2.8.jar -mkdir cosn://bucket-appid/distcp-tmp
Then run the migration in the background:

nohup hadoop jar /data01/jars/cos-distcp-1.12-3.1.0.jar -libjars /data01/jars/chdfs_hadoop_plugin_network-2.8.jar --src=hdfs:///data/user/target/.snapshot/{current date} --dest=cosn://{bucket-appid}/data/user/target --temp=cosn://bucket-appid/distcp-tmp/ --preserveStatus=ugpt --skipMode=length-checksum --checkMode=length-checksum --cosChecksumType=CRC32C --taskNumber=6 --workerNumber=32 --bandWidth=200 >> ./distcp.log &
The commonly used parameters are as follows:
- `--taskNumber`: number of copy processes. Example: `--taskNumber=10`.
- `--workerNumber`: number of copy threads started within each copy process. Example: `--workerNumber=4`.
- `--bandWidth`: maximum read bandwidth of each copied file, in MB/s. The default is `-1`, which indicates no limit on the read bandwidth. Example: `--bandWidth=10`.
- `--cosChecksumType`: checksum type used to verify the copy. To use `COMPOSITE_CRC32`, the Hadoop version must be 3.1.1 or later; otherwise, you need to change this parameter to `--cosChecksumType=CRC64`.

To limit the total migration bandwidth, set `workerNumber` to `1`, use the `taskNumber` parameter to control the number of concurrent migrations, and use the `bandWidth` parameter to control the bandwidth of each concurrent migration (see the sketch after the counters sample below).

After the task completes, check the counters in the output. `FILES_FAILED` indicates the number of failed files. If there is no `FILES_FAILED` counter, all files have been migrated successfully. Sample counters:

CosDistCp Counters
    BYTES_EXPECTED=10198247
    BYTES_SKIPPED=10196880
    FILES_COPIED=1
    FILES_EXPECTED=7
    FILES_FAILED=1
    FILES_SKIPPED=5
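As referenced above, a bandwidth-limited run might look like the following sketch; the `taskNumber` and `bandWidth` values are illustrative, not recommendations:

```
# With workerNumber=1, total read bandwidth is roughly
# taskNumber x bandWidth = 5 x 10 = 50 MB/s
nohup hadoop jar /data01/jars/cos-distcp-1.12-3.1.0.jar \
  -libjars /data01/jars/chdfs_hadoop_plugin_network-2.8.jar \
  --src=hdfs:///data/user/target/.snapshot/{current date} \
  --dest=cosn://{bucket-appid}/data/user/target \
  --temp=cosn://bucket-appid/distcp-tmp/ \
  --workerNumber=1 --taskNumber=5 --bandWidth=10 >> ./distcp.log &
```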
Statistics Item | Description |
--- | --- |
BYTES_EXPECTED | Total size (in bytes) to copy according to the source directory |
FILES_EXPECTED | Number of files to copy according to the source directory, including the directory itself |
BYTES_SKIPPED | Total size (in bytes) of files that can be skipped (same length or checksum value) |
FILES_SKIPPED | Number of source files that can be skipped (same length or checksum value) |
FILES_COPIED | Number of source files that are successfully copied |
FILES_FAILED | Number of source files that failed to be copied |
FOLDERS_COPIED | Number of directories that are successfully copied |
FOLDERS_SKIPPED | Number of directories that are skipped |
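If you want the counter block captured in `./distcp.log`, note that the MapReduce client may write it to stderr rather than stdout; a convenience sketch, not part of COSDistCp itself:

```
# Append `2>&1` to the nohup redirection above so counters land in the log,
# then print the "CosDistCp Counters" section after the job finishes
grep -A 10 "CosDistCp Counters" ./distcp.log
```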
To guarantee complete consistency between the HDFS and COS data, add the `--delete` parameter. When using the `--delete` parameter, you must also add the `--deleteOutput=/xxx` (custom) parameter, but not the `--diffMode` parameter.

nohup hadoop jar /data01/jars/cos-distcp-1.12-3.1.0.jar -libjars /data01/jars/chdfs_hadoop_plugin_network-2.8.jar --src=hdfs:///data/user/target/.snapshot/{current date} --dest=cosn://{bucket-appid}/data/user/target --temp=cosn://bucket-appid/distcp-tmp/ --preserveStatus=ugpt --skipMode=length-checksum --checkMode=length-checksum --cosChecksumType=CRC32C --taskNumber=6 --workerNumber=32 --bandWidth=200 --delete --deleteOutput=/dele-xx >> ./distcp.log &
After the task completes, the deleted files are moved to the `trash` directory, and the list of moved files is generated in the `/xxx/failed` directory. You can run `hadoop fs -rm URL` or `hadoop fs -rmr URL` to delete the data in the `trash` directory.