Using Semi-Managed Migration Agent

Last updated: 2024-05-30 10:50:05

    Overview

    msp-agent is a tool for migrating data to COS. You can deploy it on a server in your data center or in the cloud, where it executes the semi-managed migration tasks created in the MSP console, making it easy to migrate cloud storage data to COS.

    Supported Features

    Supports migration from all source vendors supported in the console.
    Easy-to-deploy distributed master-worker architecture for efficiently migrating high volumes of data.
    Supports resumable transfer (checkpoint restart).
    Supports traffic control (bandwidth throttling).
    Supports all options that can be configured in the console, consistent with the features of the fully managed service.
    Supports both push and pull modes. Please contact your storage architect to design the migration plan.
    Push mode: deploy the MSP Agent on the host nearest to the data source to push data to the COS target bucket.
    Pull mode: deploy the MSP Agent on the host nearest to the Tencent Cloud COS target bucket to pull data from the data source and write it to the COS target bucket.
    Supports both dedicated line and public network connection modes; see Network Traffic below.
    Supports private network global acceleration. Enabling global acceleration incurs COS global acceleration traffic fees, which are charged by COS. For more details, see Global Acceleration Overview.

    Running Environment

    msp-agent runs on Linux.

    System Deployment Method

    Installation

    Download the msp-agent installation package and decompress it. The directory structure after decompression is shown below:
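    The following sketch of the layout is inferred from the startup commands and configuration files described in this document; the shipped package may contain additional files:
    msp-agent/
    ├── master/
    │   ├── bin/
    │   │   └── start.sh
    │   └── configs/
    │       ├── pl_config.yaml
    │       ├── app_logger_config.yaml
    │       └── query_logger_config.yaml
    └── worker/
        ├── bin/
        │   └── start.sh
        └── configs/
            ├── pl_config.yaml
            ├── app_logger_config.yaml
            └── query_logger_config.yaml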
    
    
    Note:
    msp-agent adopts a distributed master-worker architecture in which one master can manage one or more workers.
    The 'master' directory is for the master, and the 'worker' directory is for the workers.
    To deploy multiple workers, copy the entire 'worker' directory and modify the relevant parameters before starting each copy as described below.
    You can also start multiple worker processes on a single server, but you must modify the relevant parameters (detailed below) to avoid port conflicts.

    Startup

    Start the master:
    cd {path-to-msp-agent}/master && ./bin/start.sh
    Start the worker:
    cd {path-to-msp-agent}/worker && ./bin/start.sh
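    For example, to run a second worker process on the same server (an illustrative sequence; the exact location of the port setting inside pl_config.yaml is described under Worker Configuration below):
    cp -r {path-to-msp-agent}/worker {path-to-msp-agent}/worker2
    # In worker2/configs/pl_config.yaml, change gRPCPort to an unused port before starting.
    cd {path-to-msp-agent}/worker2 && ./bin/start.sh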

    Configuration Parameter Description

    Both the master and worker directories contain the same configs structure:
    
    
    Among them, pl_config.yaml configures the main parameters for process execution, app_logger_config.yaml configures the application runtime log format, and query_logger_config.yaml configures the master-worker RPC communication log format.

    Log Configuration

    In most cases, you can keep the default log settings.
    Note:
    Log rolling configuration: if disk space is limited and the task is large, adjust the log configuration to save disk space (migration uses the disk only for storing logs; the files being migrated are not written to disk).
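    As a purely hypothetical illustration of a rolling-log setup (the actual key names are defined by the shipped app_logger_config.yaml and may differ):
    # Hypothetical keys for illustration only; check the shipped app_logger_config.yaml.
    maxSizeMB: 128    # roll the log file once it reaches this size
    maxBackups: 5     # keep at most this many rolled log files
    maxAgeDays: 7     # delete rolled log files older than this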

    Master Configuration

    gRPCPort
    Definition: gRPC listening port on the master.
    Description: Specifies the port used to receive the information reported by workers. This port must be open to the worker servers on the master server.

    failFilePartSize
    Definition: Part size used for recording failed files.
    Description: Specifies the part size used for recording failed files, 10,485,760 bytes by default. The maximum total size of failed-file records is this value multiplied by 10,000, i.e., approximately 100 GB. If there are both a very high number of files to migrate and many failed files (for example, more than 200 million files), you can increase this value appropriately.

    fragMaxSize
    Definition: Maximum size of an assigned task fragment.
    Description: To reduce the pressure of master-worker communication, the master packages multiple file paths into a fragment and assigns it to a worker as a subtask. fragMaxSize is the maximum total size, in bytes, of the files in each fragment. If the value is too small, master-worker communication pressure rises and server resources are wasted; if it is too large, workers report fragment completion too slowly and loads become imbalanced. The default value is 10737418240, i.e., 10 GB. When either fragMaxSize or fragMaxNum reaches its limit during packaging, the system stops adding files to the fragment.

    fragMaxNum
    Definition: Maximum number of files in an assigned task fragment.
    Description: To reduce the pressure of master-worker communication, the master packages multiple file paths into a fragment and assigns it to a worker as a subtask. fragMaxNum is the maximum number of files in each fragment. If the value is too small, master-worker communication pressure rises and server resources are wasted; if it is too large, workers report fragment completion too slowly and loads become imbalanced. The default value is 1,000. When either fragMaxSize or fragMaxNum reaches its limit during packaging, the system stops adding files to the fragment.

    secretId
    Definition: The SecretId used to request TencentCloud APIs for MSP.
    Description: The master process requests TencentCloud APIs for MSP to get the tasks created in the console, so you need to enter the SecretId of your key. Note that this is the key used to create MSP tasks, not a key of the source or target bucket.

    secretKey
    Definition: The SecretKey used to request TencentCloud APIs for MSP.
    Description: The master process requests TencentCloud APIs for MSP to get the tasks created in the console, so you need to enter the SecretKey of your key. Note that this is the key used to create MSP tasks, not a key of the source or target bucket.

    listerIp
    Definition: Private IP of the server where the master process is deployed.
    Description: You may create multiple tasks and want them to run on different clusters, so you need to enter the private IP address of the server where the master process is deployed. The master will then only run tasks that were assigned to this IP when they were created in the console. When creating tasks in the console, enter the same IP in the Master Node Private IP field.

    useAccelerateDomain
    Definition: Whether to access the internal COS bucket holding the failed-file list via the global acceleration domain.
    Description: To prevent failures in saving the failed-file list from causing task retries to fail, the failed-file list of each task is uniformly saved to a bucket in a region in the Chinese mainland. When the MSP Agent is deployed outside the Chinese mainland, it is recommended to set this parameter to true in the configuration file so that the task runs without errors.
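    The following is a hedged sketch of the master parameters in pl_config.yaml, built from the table above; the values and key nesting are illustrative, so follow the layout of the file shipped in the package:
    # Illustrative values only; keep the key paths used by the shipped pl_config.yaml.
    gRPCPort: 22011              # port workers report to; open it to worker servers
    failFilePartSize: 10485760   # default 10 MB; failed-file capacity = value x 10,000
    fragMaxSize: 10737418240     # max total bytes per fragment (default 10 GB)
    fragMaxNum: 1000             # max files per fragment (default 1,000)
    secretId: AKIDxxxxxxxx       # key used to create MSP tasks, not a bucket key
    secretKey: xxxxxxxx
    listerIp: 10.0.0.1           # also entered as Master Node Private IP in the console
    useAccelerateDomain: false   # set true when deployed outside the Chinese mainland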

    Worker Configuration

    gRPCPort
    Definition: gRPC listening port on the worker.
    Description: Specifies the port used to receive scheduling information from the master. This port must be open to the master server on the worker server. If multiple worker processes run on a single server, set this parameter to a different value for each worker to avoid startup failures caused by port conflicts.

    fileMigrateTryTimes
    Definition: Number of retries.
    Description: Specifies the maximum number of retries after a file fails to migrate.

    goroutineConcurrentNum
    Definition: Number of concurrent coroutines.
    Description: Specifies the number of concurrent coroutines, i.e., the number of files migrated concurrently. The value depends on two factors: server configuration and average file size. The more CPU cores the server has, the larger the value can be; the larger the average file size, the smaller the value can be, because with large files even a few concurrent coroutines can achieve high bandwidth usage. If the average file size is small, add more concurrent coroutines to increase the overall bandwidth.

    baseWorkerMaxConcurrentFileNum
    Definition: Size of the queue of cached files to be migrated.
    Description: To improve the assignment efficiency of distributed tasks, each worker caches a certain number of file fragments to be migrated. If the files to be migrated are small (that is, the migration QPS is high), you can increase this value so that workers are less likely to starve. However, the larger the cache, the more status data the master needs to store, which increases the master's load and instability, so choose an appropriate value.

    partSize
    Definition: Part size.
    Description: Specifies the default part size for multipart uploads during large file migration.

    downloadPartTimeout
    Definition: Download timeout period.
    Description: Specifies the file download timeout period in seconds.

    uploadPartTimeout
    Definition: Upload timeout period.
    Description: Specifies the file upload timeout period in seconds.

    perHostMaxIdle
    Definition: HTTP client concurrency.
    Description: Specifies the connection pool size per host; generally, set it to the same value as goroutineConcurrentNum.

    addr
    Definition: Private network communication address of the master.
    Description: Specifies the master address with which workers register to form a cluster. For example, if the master's listerIp is 10.0.0.1 and the gRPC port in the master configuration is 22011, set addr to 10.0.0.1:22011.

    sample
    Definition: Whether to perform spot checks.
    Description: The overview describes how data consistency is checked after migration. If source files have no Content-MD5 or CRC-64 checksum, they cannot be checked directly for data consistency; spot checks can only reduce the probability of inconsistency. If this parameter is set to true, spot checks are performed on such files.

    sampleTimes
    Definition: Number of sampled fragments per file.
    Description: Specifies the number of sampled fragments per file. Each sampled fragment adds one download request and increases download traffic usage.

    sampleByte
    Definition: Sampled fragment size in bytes.
    Description: Specifies the size of each sampled fragment. The greater the value, the higher the sampling bandwidth.

    useInternalAccDomain
    Definition: Whether to use the COS private network global acceleration domain.
    Description: Setting this parameter to true uses the private network global acceleration domain when uploading files to the target COS bucket, improving upload performance; if the source bucket is also a COS bucket, the domain is used for downloading as well. Before enabling it, ensure that global acceleration is enabled on both the target COS bucket and the source COS bucket (if the migration source is a COS bucket).
    Note:
    Enabling this configuration incurs COS global acceleration traffic fees, which are charged by COS. For more details, see Global Acceleration Overview.
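    The following is a hedged sketch of the worker parameters in pl_config.yaml, built from the table above; the values and key nesting are illustrative, so follow the layout of the file shipped in the package:
    # Illustrative values only; keep the key paths used by the shipped pl_config.yaml.
    gRPCPort: 22012                       # must be unique per worker process on a server
    addr: "10.0.0.1:22011"                # master's listerIp plus the master gRPCPort
    fileMigrateTryTimes: 3                # retries per failed file
    goroutineConcurrentNum: 64            # raise for many small files, lower for large files
    baseWorkerMaxConcurrentFileNum: 128   # size of the cached to-be-migrated queue
    partSize: 8388608                     # multipart upload part size in bytes (8 MB here)
    downloadPartTimeout: 60               # seconds
    uploadPartTimeout: 60                 # seconds
    perHostMaxIdle: 64                    # usually equal to goroutineConcurrentNum
    sample: true                          # spot-check files lacking Content-MD5/CRC-64
    sampleTimes: 3                        # sampled fragments per file
    sampleByte: 1048576                   # bytes per sampled fragment (1 MB here)
    useInternalAccDomain: false           # requires global acceleration on the buckets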

    Network Traffic

    If the data source is another cloud vendor:

    Push mode over a dedicated line: Migration performance is guaranteed by the dedicated line. The dedicated line must be connected to COS; please contact your storage architect.
    Push mode over the public network: Migration performance is limited by the public network bandwidth.
    Pull mode over a dedicated line: Performance is guaranteed as above; in addition, outbound public network traffic fees are incurred on the source vendor's side.
    Pull mode over the public network: Migration performance is limited by the public network bandwidth, and outbound public network traffic fees are incurred on the source vendor's side.

    If the data source is Tencent Cloud COS, migration between buckets in the same region uses the private network, while migration between buckets in different regions uses the public network.
    