Migrating Data from Native HDFS to CHDFS

Last updated: 2022-03-30 09:30:26

    Preparations

    1. On the Tencent Cloud official website, create a CHDFS instance and a CHDFS mount point, and configure the permission information.
    2. Access the created CHDFS instance from a CVM instance in a VPC. For more information, see Creating CHDFS Instance.
    3. After the mount succeeds, open the Hadoop command line tool and run the following command to verify that the CHDFS instance works properly.
    hadoop fs -ls ofs://f4xxxxxxxxxxxxxxx.chdfs.ap-beijing.myqcloud.com/
    If the command returns a listing of the mount point's root directory without errors, the CHDFS instance works properly.
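As a rough sketch, the verification step can be wrapped in a small shell function that relies on the exit status of `hadoop fs -ls`. The function name `check_chdfs_mount` is hypothetical and not part of any Tencent Cloud or Hadoop tooling; the address in the usage comment is the placeholder from the command above and must be replaced with your own mount point.

```shell
#!/bin/sh
# Hypothetical helper (not an official tool): reports whether the given
# CHDFS mount point can be listed with the Hadoop file system shell.
check_chdfs_mount() {
    mount_uri="$1"
    # `hadoop fs -ls` exits with a non-zero status when the listing fails.
    if hadoop fs -ls "$mount_uri" >/dev/null 2>&1; then
        echo "OK: $mount_uri is reachable"
    else
        echo "FAIL: could not list $mount_uri" >&2
        return 1
    fi
}

# Usage (placeholder address, replace with your mount point):
#   check_chdfs_mount "ofs://f4xxxxxxxxxxxxxxx.chdfs.ap-beijing.myqcloud.com/"
```

Checking the exit status instead of comparing output text keeps the check independent of what the mount point happens to contain.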
    

    Migration

    After the preparations are completed, you can use DistCp, the standard tool from the Hadoop community, to perform a full or incremental migration of HDFS data. For more information, see DistCp.
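The difference between a full and an incremental run can be sketched as follows. This is a minimal illustration, assuming that an incremental migration is done with DistCp's standard `-update` option, which copies only files that are missing or differ at the destination. The helper name `build_distcp_cmd` is hypothetical, and it only prints the command so that it can be reviewed before it is run.

```shell
#!/bin/sh
# Hypothetical helper: prints (does not run) the DistCp command for a
# full or an incremental copy from a source path to a CHDFS mount point.
build_distcp_cmd() {
    mode="$1"
    src="$2"
    dst="$3"
    case "$mode" in
        full)
            # Plain copy of everything under the source path.
            echo "hadoop distcp $src $dst"
            ;;
        incremental)
            # -update copies only files that are missing or differ at the target.
            echo "hadoop distcp -update $src $dst"
            ;;
        *)
            echo "unknown mode: $mode" >&2
            return 1
            ;;
    esac
}

# Usage (placeholder addresses, replace with your own):
#   build_distcp_cmd incremental hdfs://10.0.1.11:4007/testcp ofs://f4xxxxxxxx-xxxx.chdfs.ap-beijing.myqcloud.com/
```

Note that, according to the DistCp documentation, `-update` copies the contents of the source directory into the target instead of creating the source directory under it, so it is worth checking the target path before running the printed command.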

    Notes

    Some Hadoop DistCp parameters are incompatible with CHDFS. The parameters listed in the following table have no effect if you specify them.

    Parameter    Description                                        Status
    -p[rbax]     r: replication; b: block-size; a: ACL; x: XATTR    Not effective
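One way to act on the table above is to strip the ineffective letters from a `-p` argument before passing it to DistCp. The sketch below is illustrative only: the function name `effective_preserve_flags` is hypothetical, and the remaining letters (such as `u`, `g`, and `p` for user, group, and permission in standard DistCp) are simply passed through. This document does not state whether those are honored by CHDFS, so treat that as an assumption to verify.

```shell
#!/bin/sh
# Hypothetical helper: removes the preserve letters that the table marks
# as "Not effective" on CHDFS (r, b, a, x) from a DistCp -p argument.
effective_preserve_flags() {
    # Strip the leading "-p", then delete the ineffective letters.
    letters=$(printf '%s' "${1#-p}" | tr -d 'rbax')
    # Print nothing when no letters remain: a bare "-p" has its own
    # meaning in DistCp (preserve a default set of attributes).
    if [ -n "$letters" ]; then
        printf '%s\n' "-p$letters"
    fi
}

# Usage:
#   effective_preserve_flags -prbugpax    # prints -pugp
```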

    Samples

    1. When the CHDFS instance is ready, run the following Hadoop command to perform data migration.
    hadoop distcp hdfs://10.0.1.11:4007/testcp ofs://f4xxxxxxxx-xxxx.chdfs.ap-beijing.myqcloud.com/
    Here, f4xxxxxxxx-xxxx.chdfs.ap-beijing.myqcloud.com is the mount point domain name. Replace it with the domain name of your actual mount point.
    2. After the command finishes, the details of the migration are printed in the log, as shown below:
    2019-12-31 10:59:31 [INFO ] [main:13300] [org.apache.hadoop.mapreduce.Job:] [Job.java:1385]
    Counters: 38
        File System Counters
            FILE: Number of bytes read=0
            FILE: Number of bytes written=387932
            FILE: Number of read operations=0
            FILE: Number of large read operations=0
            FILE: Number of write operations=0
            HDFS: Number of bytes read=1380
            HDFS: Number of bytes written=74
            HDFS: Number of read operations=21
            HDFS: Number of large read operations=0
            HDFS: Number of write operations=6
            OFS: Number of bytes read=0
            OFS: Number of bytes written=0
            OFS: Number of read operations=0
            OFS: Number of large read operations=0
            OFS: Number of write operations=0
        Job Counters
            Launched map tasks=3
            Other local map tasks=3
            Total time spent by all maps in occupied slots (ms)=419904
            Total time spent by all reduces in occupied slots (ms)=0
            Total time spent by all map tasks (ms)=6561
            Total vcore-milliseconds taken by all map tasks=6561
            Total megabyte-milliseconds taken by all map tasks=6718464
        Map-Reduce Framework
            Map input records=3
            Map output records=2
            Input split bytes=408
            Spilled Records=0
            Failed Shuffles=0
            Merged Map outputs=0
            GC time elapsed (ms)=179
            CPU time spent (ms)=4830
            Physical memory (bytes) snapshot=1051619328
            Virtual memory (bytes) snapshot=12525191168
            Total committed heap usage (bytes)=1383071744
        File Input Format Counters
            Bytes Read=972
        File Output Format Counters
            Bytes Written=74
        org.apache.hadoop.tools.mapred.CopyMapper$Counter
            BYTESSKIPPED=5
            COPY=1
            SKIP=2
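To check the result of a run without reading the whole log, the `name=value` counter lines can be extracted with a short script. The following is a minimal sketch, assuming the DistCp output has been saved to a file; the function name `distcp_counter` and the file name `distcp.log` are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: prints the value of one counter from a saved DistCp log.
# $1 = exact counter name (for example COPY or SKIP), $2 = path to the log file.
distcp_counter() {
    awk -F= -v name="$1" '
        {
            key = $1
            sub(/^[ \t]+/, "", key)   # ignore the indentation of counter lines
        }
        key == name { print $2; exit }
    ' "$2"
}

# Usage, assuming the job output was saved to distcp.log:
#   distcp_counter COPY distcp.log
#   distcp_counter SKIP distcp.log
```

For the log shown above, the `COPY` and `SKIP` counters would yield 1 and 2, that is, one file copied and two skipped.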