Overview
A GPU instance needs the necessary infrastructure software installed before it can be used. For an NVIDIA GPU instance, the following software packages are required:
Hardware driver for the GPU
Libraries required by upper-level applications
To use NVIDIA GPU instances for general computing tasks, you must install the Tesla driver and the Compute Unified Device Architecture (CUDA) driver. This document describes only how to install the Tesla driver. For more information on the CUDA driver, see Installing CUDA Driver.
Directions
Installing an NVIDIA Tesla driver on a Linux instance
You can use a shell script to install the driver on a Linux instance. This method is applicable to all Linux distributions, including CentOS and Ubuntu.
When you install an NVIDIA Tesla driver on Linux, the installer compiles a kernel module. You must therefore install gcc and the packages required to compile Linux kernel modules in advance, such as kernel-devel-$(uname -r).
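As a quick pre-flight check, a sketch like the following reports whether a compiler and the build tree for the running kernel appear to be present. The function name check_build_prereqs is hypothetical, and the second check assumes the common layout in which the kernel development package exposes the build tree at /lib/modules/$(uname -r)/build.

```shell
#!/bin/sh
# Hypothetical helper: report whether the tools needed to compile the
# kernel module appear to be present. Prints one line per check.
check_build_prereqs() {
    kver=$(uname -r)

    if command -v gcc >/dev/null 2>&1; then
        echo "gcc: found"
    else
        echo "gcc: missing"
    fi

    # Assumption: the kernel development package exposes the build tree
    # for each kernel at /lib/modules/<version>/build. A dangling symlink
    # counts as missing because -e follows the link.
    if [ -e "/lib/modules/$kver/build" ]; then
        echo "kernel build tree for $kver: found"
    else
        echo "kernel build tree for $kver: missing"
    fi
}

check_build_prereqs
```

If either line reports "missing", install the corresponding package before running the driver installer.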
1. Run the following command to check whether dkms has been installed in the operating system:
If the returned result is as shown in the following figure, dkms has been installed.
If dkms is not installed, run the following command to install dkms:
3. Select the GPU type and operating system, and click SEARCH to find the driver you need to download, as shown in the following figure. Tesla V100 is used as an example here.
Note:
You can set Operating System to Linux 64-bit to download the shell setup file. If you set Operating System to a specific Linux distribution, the installation files for that distribution will be downloaded.
4. Select the required version to go to the driver download page, and click DOWNLOAD, as shown in the following figure.
5. You can skip the page for entering personal information. If the following page appears, right-click AGREE & DOWNLOAD and select Copy link address.
7. Run the wget command to download the installation package using the URL copied in Step 5, as shown in the following figure. You can also download the installation package to your local computer and upload it to the GPU instance.
8. Add execute permission to the installation package. For example, run the following command to add execute permission to the NVIDIA-Linux-x86_64-418.126.02.run file:
chmod +x NVIDIA-Linux-x86_64-418.126.02.run
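9. Run the following commands in sequence to check whether kernel-devel and gcc have been installed in the operating system (the rpm and yum commands in this step apply to RPM-based distributions such as CentOS):
rpm -qa | grep kernel-devel
rpm -qa | grep gcc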
If the returned result is as shown in the following figure, kernel-devel and gcc have been installed.
If kernel-devel and gcc are not installed, run the following command to install them:
sudo yum install -y gcc kernel-devel
Note:
If the kernel version has been upgraded, you must upgrade kernel-devel to the same version.
10. Run the following command and follow the prompts to install the driver:
sudo sh NVIDIA-Linux-x86_64-418.126.02.run
11. After the installation is complete, run the following command to verify it:
nvidia-smi
If GPU information similar to that shown in the following figure is returned, the installation is successful.
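The download, permission, install, and verify steps above can be collected into a small script. The following is only a sketch under stated assumptions, not a supported installer: the function names installer_name and install_tesla_driver are hypothetical, the URL is whatever link you copied from the download page, and wget and sudo are assumed to be available on the instance.

```shell
#!/bin/sh
# Sketch of the download, install, and verify steps as shell functions.
# Both function names are hypothetical.

# Derive the installer file name from the last path component of the URL.
installer_name() {
    basename "$1"
}

# Download the installer from the copied URL, make it executable, run it
# (the installer prompts interactively), then query the driver.
install_tesla_driver() {
    url=$1
    file=$(installer_name "$url")

    wget -O "$file" "$url" || return 1
    chmod +x "$file" || return 1
    sudo sh "$file" || return 1
    nvidia-smi
}

# Usage (replace the placeholder with the link copied from the download page):
#   install_tesla_driver "<copied-download-URL>"
```

Nothing runs until install_tesla_driver is called, so the file can be reviewed or sourced safely before use.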
Installing an NVIDIA Tesla driver on a Windows instance
3. Select the GPU type and operating system, and click SEARCH to find the driver you need to download, as shown in the following figure. Tesla V100 is used as an example here.
4. Go to the directory where the downloaded installation package is located, double-click the package and follow the prompts to install the driver, and restart the GPU instance if required.
After the installation is completed, go to Device Manager to check whether the GPU works properly.
Reasons for installation failures
If nvidia-smi does not run properly, the driver has not been installed correctly. Common reasons include:
1. The operating system does not have the required packages installed for compiling the kernel module, such as gcc and kernel-devel.
2. Multiple kernel versions are installed on the operating system. Because DKMS is configured incorrectly, the driver compiles a kernel module for a version other than the running kernel, so the kernel module fails to install.
3. The kernel was upgraded after the driver was installed, which invalidates the original installation.
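To tell these cases apart, it can help to compare the running kernel with the kernels for which a module was actually built. The following sketch shows one way to do that. It assumes the module file is named nvidia.ko (possibly with a compression suffix) somewhere under /lib/modules/<version>; the function name kernels_with_nvidia_module and its optional root argument are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: list kernel versions under a modules root that
# contain an nvidia kernel module. The root defaults to /lib/modules.
kernels_with_nvidia_module() {
    root=${1:-/lib/modules}
    for dir in "$root"/*/; do
        [ -d "$dir" ] || continue
        # Assumption: the module file is named nvidia.ko, optionally with
        # a compression suffix such as .xz.
        if [ -n "$(find "$dir" -name 'nvidia.ko*' 2>/dev/null)" ]; then
            basename "$dir"
        fi
    done
}

echo "Running kernel: $(uname -r)"
echo "Kernels with an nvidia module:"
kernels_with_nvidia_module

# If dkms is installed, also show the modules it has registered or built.
if command -v dkms >/dev/null 2>&1; then
    dkms status || true
fi
```

If the running kernel is not in the list, the module was built for a different kernel or was not built at all, which points to the second or third reason above.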