Overview
The Hyper Computing Cluster PNV4h instance is equipped with A100 GPUs and supports NVLink and NVSwitch. To enable interconnection between GPUs, you must install the nvidia-fabricmanager service that matches the installed driver version. If you are using this instance type, follow this document to install the nvidia-fabricmanager service; otherwise, you may not be able to use the GPU instance properly.
Directions
This document uses driver version 470.103.01 as an example. Follow the steps below for installation, replacing the value of the version parameter with your driver version as needed.
Installing nvidia-fabricmanager Service
The installation varies by operating system. Run the commands for your operating system.
CentOS / RHEL:
version=470.103.01
yum -y install yum-utils
yum-config-manager --add-repo https://developer.download.nvidia.com/compute/cuda/repos/rhel7/x86_64/cuda-rhel7.repo
yum install -y nvidia-fabric-manager-${version}-1
Ubuntu:
version=470.103.01
main_version=$(echo $version | awk -F '.' '{print $1}')
apt-get update
apt-get -y install nvidia-fabricmanager-${main_version}=${version}-*
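The variants above differ only in the package manager and package-name scheme: the yum package embeds the full driver version, while the apt package name uses only the major version. This mapping can be sketched in plain shell (a sketch; 470.103.01 is the example version used in this document, and the package-name patterns follow the commands above):

```shell
#!/bin/sh
# Build the fabricmanager package specs for both OS families
# from a single driver version string.
version=470.103.01
main_version=${version%%.*}   # text before the first '.' -> 470

# CentOS/RHEL (yum) package spec, as used above:
rpm_pkg="nvidia-fabric-manager-${version}-1"
# Ubuntu (apt) package spec, as used above:
deb_pkg="nvidia-fabricmanager-${main_version}=${version}-*"

echo "$rpm_pkg"
echo "$deb_pkg"
```

Parameter expansion (${version%%.*}) is an alternative to the awk pipeline used in the Ubuntu commands; both yield the same major version.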
Starting nvidia-fabricmanager Service
Run the following commands in sequence to enable the service at boot and start it immediately.
systemctl enable nvidia-fabricmanager
systemctl start nvidia-fabricmanager
Viewing nvidia-fabricmanager Service Status
Run the following command to view the service status.
systemctl status nvidia-fabricmanager
If the output contains "Active: active (running)", the service has been installed and started successfully.
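The check can also be scripted by grepping the status output for the "Active: active (running)" line that systemctl prints for a running unit. A minimal sketch (the status text here is illustrative sample output; on a real instance, set the variable with status=$(systemctl status nvidia-fabricmanager) instead):

```shell
#!/bin/sh
# Illustrative sample of systemctl status output for a running unit;
# the unit description line is an assumed example.
status="nvidia-fabricmanager.service - NVIDIA fabric manager service
   Loaded: loaded (/usr/lib/systemd/system/nvidia-fabricmanager.service; enabled)
   Active: active (running)"

# Report whether the service is running based on the Active line.
if echo "$status" | grep -q "Active: active (running)"; then
  echo "fabricmanager running"
else
  echo "fabricmanager NOT running"
fi
```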