Installation Methods | Method 1: Compile and install PyTorch. | Method 2: Install the PyTorch communication plugin. | (Recommended) Method 3: Install the NCCL communication plugin. |
Usage Steps | Install TCCL, then recompile and install PyTorch. | Install the PyTorch communication plugin and modify the distributed communication backend. | Install the NCCL plugin and modify the startup script. |
Advantage | No intrusion into business code. | Easy to install. | Easy to install. |
Disadvantage | Requires recompiling and reinstalling PyTorch. Has requirements for the software environment. | Requires modifying the business code. Has requirements for the software environment. | Requires updating the sorted IP list after scaling out cluster nodes. |
Software Dependency on the Environment | Corresponds to NCCL version 2.12. Requires glibc 2.17 or later. Requires CUDA 10.0 or later. | The current installation package only supports PyTorch 1.12. Requires glibc 2.17 or later. Requires CUDA 10.0 or later. | Requires NCCL to be installed. |
# Uninstall the existing tccl versions and nccl plugins.
dpkg -r tccl && dpkg -r nccl-rdma-sharp-plugins
# Download and install tccl v1.5.
wget https://taco-1251783334.cos.ap-shanghai.myqcloud.com/tccl/TCCL_1.5-ubuntu.20.04.5_amd64.deb && dpkg -i TCCL_1.5-ubuntu.20.04.5_amd64.deb && rm -f TCCL_1.5-ubuntu.20.04.5_amd64.deb
# Uninstall the existing tccl versions and nccl plugins.
rpm -e tccl && rpm -e nccl-rdma-sharp-plugins-1.0-1.x86_64
# Download and install tccl v1.5.
wget https://taco-1251783334.cos.ap-shanghai.myqcloud.com/tccl/tccl-1.5-1.tl2.x86_64.rpm && rpm -ivh --nodeps --force tccl-1.5-1.tl2.x86_64.rpm && rm -f tccl-1.5-1.tl2.x86_64.rpm
#!/bin/bash
# Uninstall the current version.
pip uninstall -y torch
# Download the pytorch source code.
git clone --recursive https://github.com/pytorch/pytorch
cd pytorch
# <Important> Configure the installation path of TCCL.
export USE_SYSTEM_NCCL=1
export NCCL_INCLUDE_DIR="/opt/tencent/tccl/include"
export NCCL_LIB_DIR="/opt/tencent/tccl/lib"
# See the official website for other compilation options.
# Install in development mode.
python setup.py develop
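After the build finishes, it may help to verify that the rebuilt PyTorch actually picked up TCCL's headers and libraries. The following is an illustrative check, not part of the official procedure; it assumes the rebuilt torch is importable and that TCCL corresponds to NCCL 2.12, as noted in the table above:

# verify_build.py -- illustrative check that the rebuilt torch uses TCCL's NCCL.
import torch
import torch.cuda.nccl
import torch.distributed

print(torch.__version__)                      # the locally built version string
print(torch.distributed.is_nccl_available())  # True if the NCCL backend was compiled in
print(torch.cuda.nccl.version())              # expect 2.12.x if TCCL's headers/libs were used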
export NCCL_DEBUG=INFO
export NCCL_SOCKET_IFNAME=eth0
export NCCL_IB_GID_INDEX=3
export NCCL_IB_DISABLE=0
export NCCL_IB_HCA=mlx5_bond_0,mlx5_bond_1,mlx5_bond_2,mlx5_bond_3,mlx5_bond_4,mlx5_bond_5,mlx5_bond_6,mlx5_bond_7
export NCCL_NET_GDR_LEVEL=2
export NCCL_IB_QPS_PER_CONNECTION=4
export NCCL_IB_TC=160
export NCCL_IB_TIMEOUT=22
export NCCL_PXN_DISABLE=0
export TCCL_TOPO_AFFINITY=4
export LD_LIBRARY_PATH=/opt/tencent/tccl/lib:$LD_LIBRARY_PATH
# Uninstall the existing tccl and nccl plugins.
dpkg -r tccl && dpkg -r nccl-rdma-sharp-plugins
# Uninstall torch_tccl.
pip uninstall -y torch-tccl
# Install torch_tccl version 0.0.2.
wget https://taco-1251783334.cos.ap-shanghai.myqcloud.com/tccl/torch_tccl-0.0.2_pt1.12-py3-none-any.whl && pip install torch_tccl-0.0.2_pt1.12-py3-none-any.whl && rm -f torch_tccl-0.0.2_pt1.12-py3-none-any.whl
import torch_tccl

# args.dist_backend = "nccl"
args.dist_backend = "tccl"
torch.distributed.init_process_group(
    backend=args.dist_backend,
    init_method=args.dist_url,
    world_size=args.world_size,
    rank=args.rank
)
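To confirm the backend works end to end, a short all-reduce smoke test can be launched with torchrun. This is a minimal sketch (the file name smoke_test.py and the launch line are illustrative; it assumes torch_tccl registers the "tccl" backend as shown above and that torchrun supplies the usual rank environment variables):

# smoke_test.py -- illustrative all-reduce check; launch on each node with e.g.:
#   torchrun --nnodes=<N> --nproc_per_node=8 --node_rank=<R> --master_addr=<IP> smoke_test.py
import os

import torch
import torch.distributed as dist
import torch_tccl  # registers the "tccl" backend with torch.distributed

dist.init_process_group(backend="tccl")  # rank/world size come from torchrun's env vars
torch.cuda.set_device(int(os.environ["LOCAL_RANK"]))

x = torch.ones(1, device="cuda")
dist.all_reduce(x)  # every rank should end up with x.item() == world size
print(f"rank {dist.get_rank()}: {x.item()}")
dist.destroy_process_group()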
export NCCL_DEBUG=INFO
export NCCL_SOCKET_IFNAME=eth0
export NCCL_IB_GID_INDEX=3
export NCCL_IB_DISABLE=0
export NCCL_IB_HCA=mlx5_bond_0,mlx5_bond_1,mlx5_bond_2,mlx5_bond_3,mlx5_bond_4,mlx5_bond_5,mlx5_bond_6,mlx5_bond_7
export NCCL_NET_GDR_LEVEL=2
export NCCL_IB_QPS_PER_CONNECTION=4
export NCCL_IB_TC=160
export NCCL_IB_TIMEOUT=22
export NCCL_PXN_DISABLE=0
export TCCL_TOPO_AFFINITY=4
# Uninstall the existing tccl and nccl plugins.
dpkg -r tccl && dpkg -r nccl-rdma-sharp-plugins
# Download and install the 1.2 version of the nccl plugins.
wget https://taco-1251783334.cos.ap-shanghai.myqcloud.com/nccl/nccl-rdma-sharp-plugins_1.2_amd64.deb && dpkg -i nccl-rdma-sharp-plugins_1.2_amd64.deb
# Ensure that the same nccl plugin version is used across the cluster. The following commands download and install the 1.0 version; the more stable 1.2 version is recommended.
# wget https://taco-1251783334.cos.ap-shanghai.myqcloud.com/nccl/nccl-rdma-sharp-plugins_1.0_amd64.deb && dpkg -i nccl-rdma-sharp-plugins_1.0_amd64.deb && rm -f nccl-rdma-sharp-plugins_1.0_amd64.deb
# Uninstall the existing nccl plugins.
rpm -e nccl-rdma-sharp-plugins-1.0-1.x86_64
# Download and install the 1.2 version of the nccl plugins.
wget https://taco-1251783334.cos.ap-shanghai.myqcloud.com/nccl/nccl-rdma-sharp-plugins-1.2-1.x86_64.rpm && rpm -ivh --nodeps --force nccl-rdma-sharp-plugins-1.2-1.x86_64.rpm
# Ensure that the same nccl plugin version is used across the cluster. The following commands download and install the 1.0 version; the more stable 1.2 version is recommended.
# wget https://taco-1251783334.cos.ap-shanghai.myqcloud.com/nccl/nccl-rdma-sharp-plugins-1.0-1.x86_64.rpm && rpm -ivh --nodeps --force nccl-rdma-sharp-plugins-1.0-1.x86_64.rpm && rm -f nccl-rdma-sharp-plugins-1.0-1.x86_64.rpm
ifconfig eth0
, with one node IP per line. The format is as follows:
root@VM-125-10-tencentos:/workspace# cat ip_eth0.txt
172.16.177.28
172.16.176.11
172.16.177.25
172.16.177.12
wget https://taco-1251783334.cos.ap-shanghai.myqcloud.com/tccl/get_rdma_order_by_ip.sh && bash get_rdma_order_by_ip.sh ip_eth0.txt
apt install curl
).
root@VM-125-10-tencentos:/workspace# cat hostfile.txt
172.16.176.11
172.16.177.12
172.16.177.25
172.16.177.28
export NCCL_DEBUG=INFO
export NCCL_SOCKET_IFNAME=eth0
export NCCL_IB_GID_INDEX=3
export NCCL_IB_DISABLE=0
export NCCL_IB_HCA=mlx5_bond_0,mlx5_bond_1,mlx5_bond_2,mlx5_bond_3,mlx5_bond_4,mlx5_bond_5,mlx5_bond_6,mlx5_bond_7
export NCCL_NET_GDR_LEVEL=2
export NCCL_IB_QPS_PER_CONNECTION=4
export NCCL_IB_TC=160
export NCCL_IB_TIMEOUT=22
export NCCL_PXN_DISABLE=0
# After the node IPs have been manually sorted, there is no need to set the following variable.
# export TCCL_TOPO_AFFINITY=4
root@vm-3-17-centos:/workspace/ptm/gpt# cat hostfile
172.16.176.11 slots=8
172.16.177.12 slots=8
172.16.177.25 slots=8
172.16.177.28 slots=8

deepspeed --hostfile ./hostfile --master_addr 172.16.176.11 train.py
--node_rank
,
# on 172.16.176.11
torchrun --nnodes=4 --nproc_per_node=8 --node_rank=0 --master_addr=172.16.176.11 train.py ...
# on 172.16.177.12
torchrun --nnodes=4 --nproc_per_node=8 --node_rank=1 --master_addr=172.16.176.11 train.py ...
# on 172.16.177.25
torchrun --nnodes=4 --nproc_per_node=8 --node_rank=2 --master_addr=172.16.176.11 train.py ...
# on 172.16.177.28
torchrun --nnodes=4 --nproc_per_node=8 --node_rank=3 --master_addr=172.16.176.11 train.py ...
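To avoid hard-coding --node_rank on every machine, the rank can also be derived from the node's position in the sorted hostfile. The sketch below is a hypothetical helper (node_rank.py and local_ip_toward are not part of the official tooling); it assumes hostfile.txt is the sorted file produced earlier, with the master on the first line, and that eth0 is the interface that routes to the master:

# node_rank.py -- hypothetical helper: print this node's rank, i.e. the index
# of its IP in the sorted hostfile (first line = rank 0 = master).
import socket
import sys

def local_ip_toward(peer: str) -> str:
    # A UDP connect() sends no packets; it only selects the local address that
    # routes to the peer (the eth0 IP on these instances).
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as s:
        s.connect((peer, 1))
        return s.getsockname()[0]

if __name__ == "__main__":
    path = sys.argv[1] if len(sys.argv) > 1 else "hostfile.txt"
    with open(path) as f:
        ips = [line.split()[0] for line in f if line.strip()]
    print(ips.index(local_ip_toward(ips[0])))

Each node could then launch with, for example, torchrun ... --node_rank=$(python3 node_rank.py hostfile.txt) ... so that rank assignment always follows the sorted order.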
mpirun \
-np 64 \
-H 172.16.176.11:8,172.16.177.12:8,172.16.177.25:8,172.16.177.28:8 \
--allow-run-as-root \
-bind-to none -map-by slot \
-x NCCL_DEBUG=INFO \
-x NCCL_IB_GID_INDEX=3 \
-x NCCL_IB_DISABLE=0 \
-x NCCL_SOCKET_IFNAME=eth0 \
-x NCCL_IB_HCA=mlx5_bond_0,mlx5_bond_1,mlx5_bond_2,mlx5_bond_3,mlx5_bond_4,mlx5_bond_5,mlx5_bond_6,mlx5_bond_7 \
-x NCCL_NET_GDR_LEVEL=2 \
-x NCCL_IB_QPS_PER_CONNECTION=4 \
-x NCCL_IB_TC=160 \
-x NCCL_IB_TIMEOUT=22 \
-x NCCL_PXN_DISABLE=0 \
-x LD_LIBRARY_PATH -x PATH \
-mca coll_hcoll_enable 0 \
-mca pml ob1 \
-mca btl_tcp_if_include eth0 \
-mca btl ^openib \
all_reduce_perf -b 1G -e 1G -n 1000 -g 1