Using Semi-Managed Migration Agent

Last updated: 2024-05-30 10:50:05

    Overview

    msp-agent is a tool for migrating data to COS. You can deploy it on a server in your data center or in the cloud, where it executes the semi-managed migration tasks created in the MSP console, making it easy to migrate cloud storage data to COS.

    Supported Features

    Supports migration from all source vendors supported in the console.
    Easy-to-deploy distributed master-worker architecture for efficiently migrating high volumes of data.
    Supports resumable transfer (checkpoint restart).
    Supports traffic control (bandwidth throttling).
    Supports all options that can be configured in the console, consistent with the features of the fully managed service.
    Supports both push and pull modes. Please contact your storage architect to design the migration plan.
    Push mode: deploy the MSP Agent on the host nearest to the data source to push data to the COS target bucket.
    Pull mode: deploy the MSP Agent on the host nearest to the Tencent Cloud COS target bucket to pull data from the data source and write it to the COS target bucket.
    Supports both dedicated line and public network connection modes; see Network Traffic below.
    Supports private network global acceleration. Enabling global acceleration incurs COS global acceleration traffic fees, which are charged by COS. For more details, see Global Acceleration Overview.

    Running Environment

    msp-agent runs on Linux.

    System Deployment Method

    Installation

    Download the msp-agent installation package and decompress it. The directory structure after decompression is shown below:
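    The following sketch of the layout is inferred from the startup commands and configuration files described in this document; the shipped package may contain additional files:
    msp-agent/
    ├── master/
    │   ├── bin/
    │   │   └── start.sh
    │   └── configs/
    │       ├── pl_config.yaml
    │       ├── app_logger_config.yaml
    │       └── query_logger_config.yaml
    └── worker/
        ├── bin/
        │   └── start.sh
        └── configs/
            ├── pl_config.yaml
            ├── app_logger_config.yaml
            └── query_logger_config.yaml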
    
    
    Note:
    msp-agent adopts a distributed master-worker architecture in which one master can manage one or more workers.
    The 'master' directory is for the master, and the 'worker' directory is for the workers.
    To deploy multiple workers, copy the entire 'worker' directory and modify the relevant parameters before starting each copy as described below.
    You can also start multiple worker processes on a single server, but you must modify the relevant parameters (detailed below) to avoid port conflicts.

    Startup

    Start the master:
    cd {path-to-msp-agent}/master && ./bin/start.sh
    Start the worker:
    cd {path-to-msp-agent}/worker && ./bin/start.sh
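    For example, to run a second worker process on the same server (an illustrative sequence; the exact location of the port setting inside pl_config.yaml is described under Worker Configuration below):
    cp -r {path-to-msp-agent}/worker {path-to-msp-agent}/worker2
    # In worker2/configs/pl_config.yaml, change gRPCPort to an unused port before starting.
    cd {path-to-msp-agent}/worker2 && ./bin/start.sh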

    Configuration Parameter Description

    Both the master and worker directories contain the same configs structure:
    
    
    Among them, pl_config.yaml configures the main parameters for process execution, app_logger_config.yaml configures the application runtime log format, and query_logger_config.yaml configures the master-worker RPC communication log format.

    Log Configuration

    In most cases, you can keep the default log settings.
    Note:
    Log rolling configuration: if disk space is limited and the task is large, adjust the log configuration to save disk space (migration uses the disk only for storing logs; the files being migrated are not written to disk).
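    As a purely hypothetical illustration of a rolling-log setup (the actual key names are defined by the shipped app_logger_config.yaml and may differ):
    # Hypothetical keys for illustration only; check the shipped app_logger_config.yaml.
    maxSizeMB: 128    # roll the log file once it reaches this size
    maxBackups: 5     # keep at most this many rolled log files
    maxAgeDays: 7     # delete rolled log files older than this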

    Master Configuration

    gRPCPort
    Definition: gRPC listening port on the master.
    Description: Specifies the port used to receive the information reported by workers. This port must be open to the worker servers on the master server.

    failFilePartSize
    Definition: Part size used for recording failed files.
    Description: Specifies the part size used for recording failed files, 10,485,760 bytes by default. The maximum total size of failed-file records is this value multiplied by 10,000, i.e., approximately 100 GB. If there are both a very high number of files to migrate and many failed files (for example, more than 200 million files), you can increase this value appropriately.

    fragMaxSize
    Definition: Maximum size of an assigned task fragment.
    Description: To reduce the pressure of master-worker communication, the master packages multiple file paths into a fragment and assigns it to a worker as a subtask. fragMaxSize is the maximum total size, in bytes, of the files in each fragment. If the value is too small, master-worker communication pressure rises and server resources are wasted; if it is too large, workers report fragment completion too slowly and loads become imbalanced. The default value is 10737418240, i.e., 10 GB. When either fragMaxSize or fragMaxNum reaches its limit during packaging, the system stops adding files to the fragment.

    fragMaxNum
    Definition: Maximum number of files in an assigned task fragment.
    Description: To reduce the pressure of master-worker communication, the master packages multiple file paths into a fragment and assigns it to a worker as a subtask. fragMaxNum is the maximum number of files in each fragment. If the value is too small, master-worker communication pressure rises and server resources are wasted; if it is too large, workers report fragment completion too slowly and loads become imbalanced. The default value is 1,000. When either fragMaxSize or fragMaxNum reaches its limit during packaging, the system stops adding files to the fragment.

    secretId
    Definition: The SecretId used to request TencentCloud APIs for MSP.
    Description: The master process requests TencentCloud APIs for MSP to get the tasks created in the console, so you need to enter the SecretId of your key. Note that this is the key used to create MSP tasks, not a key of the source or target bucket.

    secretKey
    Definition: The SecretKey used to request TencentCloud APIs for MSP.
    Description: The master process requests TencentCloud APIs for MSP to get the tasks created in the console, so you need to enter the SecretKey of your key. Note that this is the key used to create MSP tasks, not a key of the source or target bucket.

    listerIp
    Definition: Private IP of the server where the master process is deployed.
    Description: You may create multiple tasks and want them to run on different clusters, so you need to enter the private IP address of the server where the master process is deployed. The master will then only run tasks that were assigned to this IP when they were created in the console. When creating tasks in the console, enter the same IP in the Master Node Private IP field.

    useAccelerateDomain
    Definition: Whether to access the internal COS bucket holding the failed-file list via the global acceleration domain.
    Description: To prevent failures in saving the failed-file list from causing task retries to fail, the failed-file list of each task is uniformly saved to a bucket in a region in the Chinese mainland. When the MSP Agent is deployed outside the Chinese mainland, it is recommended to set this parameter to true in the configuration file so that the task runs without errors.
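    The following is a hedged sketch of the master parameters in pl_config.yaml, built from the table above; the values and key nesting are illustrative, so follow the layout of the file shipped in the package:
    # Illustrative values only; keep the key paths used by the shipped pl_config.yaml.
    gRPCPort: 22011              # port workers report to; open it to worker servers
    failFilePartSize: 10485760   # default 10 MB; failed-file capacity = value x 10,000
    fragMaxSize: 10737418240     # max total bytes per fragment (default 10 GB)
    fragMaxNum: 1000             # max files per fragment (default 1,000)
    secretId: AKIDxxxxxxxx       # key used to create MSP tasks, not a bucket key
    secretKey: xxxxxxxx
    listerIp: 10.0.0.1           # also entered as Master Node Private IP in the console
    useAccelerateDomain: false   # set true when deployed outside the Chinese mainland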

    Worker Configuration

    gRPCPort
    Definition: gRPC listening port on the worker.
    Description: Specifies the port used to receive scheduling information from the master. This port must be open to the master server on the worker server. If multiple worker processes run on a single server, set this parameter to a different value for each worker to avoid startup failures caused by port conflicts.

    fileMigrateTryTimes
    Definition: Number of retries.
    Description: Specifies the maximum number of retries after a file fails to migrate.

    goroutineConcurrentNum
    Definition: Number of concurrent coroutines.
    Description: Specifies the number of concurrent coroutines, i.e., the number of files migrated concurrently. The value depends on two factors: server configuration and average file size. The more CPU cores the server has, the larger the value can be; the larger the average file size, the smaller the value can be, because with large files even a few concurrent coroutines can achieve high bandwidth usage. If the average file size is small, add more concurrent coroutines to increase the overall bandwidth.

    baseWorkerMaxConcurrentFileNum
    Definition: Size of the queue of cached files to be migrated.
    Description: To improve the assignment efficiency of distributed tasks, each worker caches a certain number of file fragments to be migrated. If the files to be migrated are small (that is, the migration QPS is high), you can increase this value so that workers are less likely to starve. However, the larger the cache, the more status data the master needs to store, which increases the master's load and instability, so choose an appropriate value.

    partSize
    Definition: Part size.
    Description: Specifies the default part size for multipart uploads during large file migration.

    downloadPartTimeout
    Definition: Download timeout period.
    Description: Specifies the file download timeout period in seconds.

    uploadPartTimeout
    Definition: Upload timeout period.
    Description: Specifies the file upload timeout period in seconds.

    perHostMaxIdle
    Definition: HTTP client concurrency.
    Description: Specifies the connection pool size per host; generally, set it to the same value as goroutineConcurrentNum.

    addr
    Definition: Private network communication address of the master.
    Description: Specifies the master address with which workers register to form a cluster. For example, if the master's listerIp is 10.0.0.1 and the gRPC port in the master configuration is 22011, set addr to 10.0.0.1:22011.

    sample
    Definition: Whether to perform spot checks.
    Description: The overview describes how data consistency is checked after migration. If source files have no Content-MD5 or CRC-64 checksum, they cannot be checked directly for data consistency; spot checks can only reduce the probability of inconsistency. If this parameter is set to true, spot checks are performed on such files.

    sampleTimes
    Definition: Number of sampled fragments per file.
    Description: Specifies the number of sampled fragments per file. Each sampled fragment adds one download request and increases download traffic usage.

    sampleByte
    Definition: Sampled fragment size in bytes.
    Description: Specifies the size of each sampled fragment. The greater the value, the higher the sampling bandwidth.

    useInternalAccDomain
    Definition: Whether to use the COS private network global acceleration domain.
    Description: Setting this parameter to true uses the private network global acceleration domain when uploading files to the target COS bucket, improving upload performance; if the source bucket is also a COS bucket, the domain is used for downloading as well. Before enabling it, ensure that global acceleration is enabled on both the target COS bucket and the source COS bucket (if the migration source is a COS bucket).
    Note:
    Enabling this configuration incurs COS global acceleration traffic fees, which are charged by COS. For more details, see Global Acceleration Overview.
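    The following is a hedged sketch of the worker parameters in pl_config.yaml, built from the table above; the values and key nesting are illustrative, so follow the layout of the file shipped in the package:
    # Illustrative values only; keep the key paths used by the shipped pl_config.yaml.
    gRPCPort: 22012                       # must be unique per worker process on a server
    addr: "10.0.0.1:22011"                # master's listerIp plus the master gRPCPort
    fileMigrateTryTimes: 3                # retries per failed file
    goroutineConcurrentNum: 64            # raise for many small files, lower for large files
    baseWorkerMaxConcurrentFileNum: 128   # size of the cached to-be-migrated queue
    partSize: 8388608                     # multipart upload part size in bytes (8 MB here)
    downloadPartTimeout: 60               # seconds
    uploadPartTimeout: 60                 # seconds
    perHostMaxIdle: 64                    # usually equal to goroutineConcurrentNum
    sample: true                          # spot-check files lacking Content-MD5/CRC-64
    sampleTimes: 3                        # sampled fragments per file
    sampleByte: 1048576                   # bytes per sampled fragment (1 MB here)
    useInternalAccDomain: false           # requires global acceleration on the buckets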

    Network Traffic

    If the data source is another cloud vendor:

    Push mode over a dedicated line: Migration performance is guaranteed by the dedicated line. The dedicated line must be connected to COS; please contact your storage architect.
    Push mode over the public network: Migration performance is limited by the public network bandwidth.
    Pull mode over a dedicated line: Performance is guaranteed as above; in addition, outbound public network traffic fees are incurred on the source vendor's side.
    Pull mode over the public network: Migration performance is limited by the public network bandwidth, and outbound public network traffic fees are incurred on the source vendor's side.

    If the data source is Tencent Cloud COS, migration between buckets in the same region uses the private network, while migration between buckets in different regions uses the public network.
    