Tencent Cloud GPU Service
Not sure which Tencent Cloud GPU instance is right for you? Still looking for an elegant way to install the driver and the rest of the underlying development tools? These are common questions among deep learning researchers and developers. No worries: this document walks through best practices for Tencent Cloud GPU instances and covers everything you need to know.
Environment and objective
This tutorial uses the mainstream Tencent Cloud GPU Computing GN7 model with an NVIDIA Tesla T4 GPU as an example, specifically GN7.5XLARGE80. With 20 CPU cores and 80 GB of memory, this model delivers strong overall performance and is well suited to general deep neural network training. It is also one of the most cost-effective T4 GPU models available.
You can purchase an instance here; pay-as-you-go billing is even more flexible for experiments.
Instance in this tutorial: GPU models like GN7 (recommended) or GN10X
Recommended system image: Ubuntu 18.04 (note that you don't need to select **Automatically install GPU driver on the backend**.)
Other information: Beijing Zone 5, 1 Mbps public network bandwidth, and 100 GB system disk, which can be selected as needed.
Installing CUDA Driver/Toolkit

Install the DEB package from the official website
Download CUDA Toolkit 10.2 from NVIDIA's official website.
https://developer.nvidia.com/cuda-downloads
Select the download method based on your environment. Here, DEB package installation is selected as shown below:

Run the following command as prompted:
```
wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu1804/x86_64/cuda-ubuntu1804.pin
sudo mv cuda-ubuntu1804.pin /etc/apt/preferences.d/cuda-repository-pin-600
sudo apt-key adv --fetch-keys https://developer.download.nvidia.com/compute/cuda/repos/ubuntu1804/x86_64/7fa2af80.pub
sudo add-apt-repository "deb http://developer.download.nvidia.com/compute/cuda/repos/ubuntu1804/x86_64/ /"
sudo apt-get update
sudo apt-get -y install cuda
```
These commands add NVIDIA's official APT repository to the system and then run `apt install cuda`. Note that the final installation step takes about five minutes and may be interrupted by network or other issues; a retry or two usually resolves this.
Note: The server needs to be restarted here.
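If you are working over SSH, a single command does the job (this reboots immediately, so save any work first):

```
sudo reboot
```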
So how do you know what was installed, and how do you verify the installation?
Verify GPU driver installation - view the device information
The NVIDIA GPU driver is a kernel module that enables CUDA programming. In the steps above, the driver is installed via Dynamic Kernel Module Support (DKMS). A DKMS module lives outside the kernel source tree, but it is automatically recompiled whenever the kernel is updated or upgraded, a new initramfs is generated, and the module is dynamically loaded when the system starts. In short, the driver only needs to be installed once, which is very convenient.
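To confirm that the driver is registered with DKMS, you can ask DKMS itself; a minimal check (the output line in the comment is only indicative, versions will vary):

```
# List the modules managed by DKMS; the NVIDIA driver should show as "installed".
dkms status
# Typical output: nvidia, 440.64.00, 4.15.0-88-generic, x86_64: installed
```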
Log in after the restart and verify whether the driver module has been dynamically loaded.
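One simple way to do this is with `lsmod`, which lists the currently loaded kernel modules; a minimal sketch:

```
# The NVIDIA modules should be loaded...
lsmod | grep -i nvidia
# ...and the open-source Nouveau driver should not be.
lsmod | grep -i nouveau || echo "nouveau not loaded"
```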

As you can see, the NVIDIA driver is loaded rather than the open-source Nouveau driver.
The four kernel modules that make up the driver are in the /lib/modules/4.15.0-88-generic/updates/dkms/ directory.
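You can list them directly; using `uname -r` keeps the path correct even after a kernel upgrade:

```
# The NVIDIA kernel modules built by DKMS for the running kernel,
# typically nvidia.ko, nvidia-drm.ko, nvidia-modeset.ko, and nvidia-uvm.ko.
ls /lib/modules/$(uname -r)/updates/dkms/
```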
`nvidia-smi` is the command for viewing GPU device status, including the driver version (440.64.00), CUDA version (10.2), GPU model (NVIDIA Tesla T4), video memory (15,109 MiB), temperature, and power consumption, as shown below:

Verify CUDA Toolkit installation - compile code samples
CUDA Toolkit is a development kit that includes build commands, header files, dynamic libraries, documentation, and code samples. It is installed in the /usr/local/cuda directory.
Note: To use CUDA Toolkit, you must add it to the system `PATH` so that commands such as `nvcc` can be found.
The global `PATH` can be set in many places. The following takes `/etc/profile.d` as an example:
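A minimal sketch, assuming the toolkit is installed at the default /usr/local/cuda and using a new file named cuda.sh (the file name is arbitrary):

```
# Make the CUDA commands and libraries visible to all login shells.
sudo tee /etc/profile.d/cuda.sh > /dev/null <<'EOF'
export PATH=/usr/local/cuda/bin${PATH:+:${PATH}}
export LD_LIBRARY_PATH=/usr/local/cuda/lib64${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}
EOF

# Apply it to the current shell and confirm that nvcc is found.
source /etc/profile.d/cuda.sh
nvcc --version
```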

Compile and run the official code sample

There are about 170 official samples. Note that not all of them compile successfully (hence the `-k` option), and the full build takes only a few minutes.
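A sketch of the build step, assuming you copy the samples into your home directory with the helper script that ships with the toolkit:

```
# Copy the bundled samples to a writable location.
cuda-install-samples-10.2.sh ~
cd ~/NVIDIA_CUDA-10.2_Samples
# -k tells make to keep going when an individual sample fails to build.
make -k -j"$(nproc)"
```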
Run the samples such as deviceQuery, bandwidthTest, simpleCUBLAS, and simpleCUFFT.
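For example, from the samples' release directory (the path assumes the layout created by the sketch above):

```
cd ~/NVIDIA_CUDA-10.2_Samples/bin/x86_64/linux/release
./deviceQuery      # prints device properties; should end with "Result = PASS"
./bandwidthTest    # measures host-to-device and device-to-device bandwidth
./simpleCUBLAS     # small cuBLAS matrix-multiplication test
./simpleCUFFT      # small cuFFT convolution test
```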

Installing cuDNN
Download from the official website and install
At the time of writing, v7.6.5 is the latest stable version.
Download it at https://developer.nvidia.com/rdp/cudnn-download. You need to log in and accept the terms and conditions, which is not required for the CUDA Toolkit download.
On Ubuntu 18.04, three download packages are required, namely, the runtime library, development library, and code samples.

Download them, upload them to the GPU instance (with scp or rsync), and install them with `dpkg`.
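A sketch of the upload-and-install steps; the exact .deb file names depend on the cuDNN and CUDA versions you downloaded, and <instance-ip> is a placeholder for your instance's public IP:

```
# From your local machine: copy the three packages to the instance.
scp libcudnn7*_amd64.deb ubuntu@<instance-ip>:~

# On the GPU instance: install the runtime, dev, and samples/doc packages in order.
sudo dpkg -i libcudnn7_7.6.5.32-1+cuda10.2_amd64.deb
sudo dpkg -i libcudnn7-dev_7.6.5.32-1+cuda10.2_amd64.deb
sudo dpkg -i libcudnn7-doc_7.6.5.32-1+cuda10.2_amd64.deb
```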

Verify cuDNN installation - compile and run the sample
cuDNN code samples are in /usr/src/cudnn_samples_v7, including conv_sample, mnistCUDNN, multiHeadAttention, and RNN.
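To check the installation, copy the samples somewhere writable and build one of them, for example mnistCUDNN (the source path is where the samples/doc package installs them):

```
cp -r /usr/src/cudnn_samples_v7 ~
cd ~/cudnn_samples_v7/mnistCUDNN
make clean && make
./mnistCUDNN   # should finish with "Test passed!"
```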

Run RNN as follows:
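A sketch of one possible invocation; the argument order (sequence length, number of layers, hidden size, mini-batch size, RNN mode) is an assumption based on the sample's usage message and may differ between cuDNN versions, so run ./RNN without arguments first to see the expected parameters:

```
cd ~/cudnn_samples_v7/RNN
make clean && make
# Example run: 20-step sequences, 2 layers, hidden size 512, mini-batch 64, LSTM mode.
./RNN 20 2 512 64 2
```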

The high GFLOPS numbers show just how far computing hardware has come. Come and experience the speed and power of the GN7 series GPU models for yourself.
References