This tutorial explains how to get your NVIDIA graphics card working for deep learning.
Table of contents
- Hardware Requirements
- See where you are
- Compatibility
- Instructions with Anaconda
- Tensorflow and CUDA compatibility
- CUDA on Ubuntu
- cuDNN
- Paths
- Test if Tensorflow is working on the GPU
- gcc
- You can also look at CUDNN with PyTorch
- Still stuck
- Other Possible Problems
Hardware Requirements
Before you start, you’ll want to make sure that your graphics card is properly installed and compatible with CUDA.
- To find your graphics card, type
nvidia-smi
- If you’re on Windows, you can use a tool like Speccy.
Then, make sure it is on this list: https://developer.nvidia.com/cuda-gpus
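If you would rather script this check, here is a minimal Python sketch. It assumes nvidia-smi is on your PATH and relies on its --query-gpu CSV output. The helper names parse_gpu_csv and list_gpus are my own and not part of any NVIDIA tool.

```python
import shutil
import subprocess


def parse_gpu_csv(text):
    """Turn 'name, driver' CSV lines from nvidia-smi into (name, driver) tuples."""
    gpus = []
    for line in text.strip().splitlines():
        if not line.strip():
            continue
        # Split on the last comma so a comma inside the GPU name cannot break parsing.
        name, driver = [field.strip() for field in line.rsplit(",", 1)]
        gpus.append((name, driver))
    return gpus


def list_gpus():
    """Return the GPUs nvidia-smi reports, or [] if the tool is unavailable."""
    if shutil.which("nvidia-smi") is None:
        return []  # driver tools not installed or not on PATH
    result = subprocess.run(
        ["nvidia-smi", "--query-gpu=name,driver_version", "--format=csv,noheader"],
        capture_output=True,
        text=True,
    )
    if result.returncode != 0:
        return []  # e.g. the driver is installed but not loaded
    return parse_gpu_csv(result.stdout)


if __name__ == "__main__":
    for name, driver in list_gpus():
        print(f"{name} (driver {driver})")
```

An empty result means the driver tools could not be found or run, so fix that before checking the compatibility list.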
See where you are
Sometimes you’ll get stuck somewhere in the middle, unsure of what actually installed correctly. You don’t want to start from the beginning because you don’t want to have multiple versions conflicting, but you don’t know what you need to do next. That’s why I want to start this off with some ways for you to figure out exactly where you are in the process first.
Are GPUs Installed
Type nvidia-smi
and check that every GPU you expect is listed:
julius@julius-MS-7B09:~/git$ nvidia-smi
Tue Feb 2 15:05:43 2021
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 460.32.03 Driver Version: 460.32.03 CUDA Version: 11.2 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|===============================+======================+======================|
| 0 GeForce RTX 208... Off | 00000000:0A:00.0 Off | N/A |
| 0% 31C P8 8W / 260W | 5MiB / 11019MiB | 0% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
| 1 GeForce RTX 208... Off | 00000000:42:00.0 On | N/A |
| 41% 36C P5 21W / 260W | 632MiB / 11016MiB | 27% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=============================================================================|
| 0 N/A N/A 1255 G /usr/lib/xorg/Xorg 4MiB |
| 1 N/A N/A 1255 G /usr/lib/xorg/Xorg 350MiB |
| 1 N/A N/A 1687 G /usr/bin/gnome-shell 84MiB |
| 1 N/A N/A 45805 G ...gAAAAAAAAA --shared-files 95MiB |
| 1 N/A N/A 46570 G ...gAAAAAAAAA --shared-files 99MiB |
+-----------------------------------------------------------------------------+
Note: The CUDA Version displayed there isn’t necessarily the version you have installed. It’s the highest version that your driver can support.
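To pull those two numbers out of the banner programmatically, something like the following should work. It is only a sketch: the regular expression assumes the banner layout shown above, and parse_smi_header is a hypothetical helper name.

```python
import re


def parse_smi_header(text):
    """Extract (driver_version, max_cuda_version) from nvidia-smi's banner line."""
    match = re.search(
        r"Driver Version:\s*([\d.]+)\s+CUDA Version:\s*([\d.]+)", text
    )
    if match is None:
        return None  # banner layout differs from what this sketch assumes
    return match.group(1), match.group(2)


# Banner line copied from the output above.
header = "| NVIDIA-SMI 460.32.03 Driver Version: 460.32.03 CUDA Version: 11.2 |"
driver, max_cuda = parse_smi_header(header)
print(f"driver {driver} supports CUDA up to {max_cuda}")
```

Treat the second value as an upper bound on what the driver supports, not as the toolkit version that is installed.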
Can TensorFlow Detect GPUs
Here’s a good one-liner for this:
python -c "import tensorflow as tf; print('tf version:', tf.__version__); print('Num GPU devices: ', len(tf.config.list_physical_devices('GPU')))"
If it says Num GPU devices: 0
then your GPUs are not being recognized.
One of the most common problems is version incompatibility, so we’ll dig into that.
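The same check can be written as a small script that also tells you when TensorFlow itself is missing. This is a sketch assuming TensorFlow 2.x; gpu_report is a name I made up for illustration.

```python
def gpu_report():
    """Return TensorFlow's version and visible GPU count (None values if TF is missing)."""
    try:
        import tensorflow as tf
    except ImportError:
        return {"tf_version": None, "num_gpus": None}
    gpus = tf.config.list_physical_devices("GPU")
    return {"tf_version": tf.__version__, "num_gpus": len(gpus)}


if __name__ == "__main__":
    report = gpu_report()
    if report["tf_version"] is None:
        print("TensorFlow is not installed in this environment.")
    elif report["num_gpus"] == 0:
        print(f"TensorFlow {report['tf_version']} is installed but sees no GPUs.")
    else:
        print(f"TensorFlow {report['tf_version']} sees {report['num_gpus']} GPU(s).")
```

Separating "not installed" from "installed but no GPUs" helps when you have several environments and are not sure which one is active.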
Compatibility
Version incompatibility is probably the biggest source of problems in this whole process.
CUDA and Tensorflow and cuDNN
Before you go any further, you should have a target. You should be clear on exactly which version of everything you’re trying to install. To help you find out, TensorFlow maintains this chart with the latest compatibility information.
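One way to keep your target explicit is to write it down as data. The rows below are copied by hand from my recollection of TensorFlow's tested build configurations for Linux GPU builds, so treat them as an assumption and confirm each one against the chart. target_versions is a hypothetical helper.

```python
# Assumed rows: TensorFlow major.minor -> (CUDA version, cuDNN version).
# Verify against TensorFlow's own compatibility chart before relying on them.
TESTED_BUILDS = {
    "2.4": ("11.0", "8.0"),
    "2.3": ("10.1", "7.6"),
    "2.2": ("10.1", "7.6"),
    "2.1": ("10.1", "7.6"),
    "2.0": ("10.0", "7.4"),
    "1.15": ("10.0", "7.4"),
}


def target_versions(tf_version):
    """Map a TensorFlow version such as '2.3.1' to its (CUDA, cuDNN) target."""
    major_minor = ".".join(tf_version.split(".")[:2])
    return TESTED_BUILDS.get(major_minor)  # None if the version is not in the table


print(target_versions("2.3.1"))
```

A None result just means the table above has no row for that version, not that the version is unsupported.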
CUDA version
cat /usr/local/cuda/version.txt
You can see what versions of CUDA are available:
conda search cudatoolkit
You can also find your version by opening a command prompt and entering nvcc -V (or nvcc --version):
(tf) julius@julius-MS-7B09:~/git$ nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2019 NVIDIA Corporation
Built on Sun_Jul_28_19:07:16_PDT_2019
Cuda compilation tools, release 10.1, V10.1.243
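If you want to compare the toolkit release against your target in a script, a sketch like this reads it from the nvcc output. It assumes the "release X.Y" wording shown above; both function names are mine.

```python
import re
import shutil
import subprocess


def parse_nvcc_release(text):
    """Return the toolkit release (e.g. '10.1') from `nvcc --version` output."""
    match = re.search(r"release (\d+\.\d+)", text)
    return match.group(1) if match else None


def installed_cuda_release():
    """Run nvcc if it is on PATH and return its release, else None."""
    if shutil.which("nvcc") is None:
        return None  # toolkit missing, or its bin directory is not on PATH
    result = subprocess.run(["nvcc", "--version"], capture_output=True, text=True)
    return parse_nvcc_release(result.stdout)


# Last line of the sample output above.
sample = "Cuda compilation tools, release 10.1, V10.1.243"
print(parse_nvcc_release(sample))
```

A None from installed_cuda_release() usually points at a PATH problem rather than a missing install, so check the Paths section before reinstalling.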
cuDNN version
It’s probably here:
cat /usr/local/cuda/include/cudnn.h | grep CUDNN_MAJOR -A 2
But if you’re not sure, you can check here:
cat $(whereis cudnn.h) | grep CUDNN_MAJOR -A 2
cat $(whereis cuda)/include/cudnn.h | grep CUDNN_MAJOR -A 2
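The grep above can also be done in Python, which makes it easier to handle both header layouts. This sketch assumes the default /usr/local/cuda/include location and checks cudnn_version.h first, since cuDNN 8 and later keep the version macros there instead of in cudnn.h. The function names are hypothetical.

```python
import re
from pathlib import Path


def parse_cudnn_version(header_text):
    """Read CUDNN_MAJOR, CUDNN_MINOR and CUDNN_PATCHLEVEL from header text."""
    parts = []
    for macro in ("CUDNN_MAJOR", "CUDNN_MINOR", "CUDNN_PATCHLEVEL"):
        match = re.search(rf"#define\s+{macro}\s+(\d+)", header_text)
        if match is None:
            return None  # this header does not define the version macros
        parts.append(match.group(1))
    return ".".join(parts)


def find_cudnn_version(include_dir="/usr/local/cuda/include"):
    """Check cudnn_version.h first (cuDNN 8 and later), then cudnn.h."""
    for name in ("cudnn_version.h", "cudnn.h"):
        path = Path(include_dir) / name
        if path.is_file():
            version = parse_cudnn_version(path.read_text(errors="ignore"))
            if version is not None:
                return version
    return None


if __name__ == "__main__":
    print("cuDNN version:", find_cudnn_version())
```

On Windows you can pass the include folder of your CUDA install as include_dir.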
Windows
If you’re on Windows you should be able to find them here:
C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v10.1\include
You can also type this: where cudnn*
Instructions with Anaconda
This is my recommendation.
Install Anaconda like normal. It’s a little annoying on Windows because of how paths work. If you are on Windows, don’t add it to the PATH environment variable in the setup window.
What you’ll do is open Anaconda3 from the Start menu and …
Although the split between tensorflow-gpu and tensorflow only applies to versions <= 1.15, the distinction matters quite a lot here. If you do conda create -n tf tensorflow
it will not create a GPU version, even though it installs a 2.X version of TensorFlow. You’ll need to use conda create -n tf tensorflow-gpu
to get the GPU version.
Tensorflow and CUDA compatibility
Make sure you check what version of CUDA to install. The latest version will not necessarily be supported by TensorFlow.
Check here for the version: https://www.tensorflow.org/install/install_windows
To check your driver version, you can go to GeForce Experience and click on “Drivers”.
Install the most recent NVIDIA driver: http://www.nvidia.com/Download/index.aspx?lang=en-us
Then install the CUDA Toolkit: https://developer.nvidia.com/cuda-downloads
If you want an older version, you can go here: https://developer.nvidia.com/cuda-10.1-download-archive-base
After you input your operating system information, you have the option of downloading either the network or local installer. Either will work. “The Network Installer allows you to download only the files you need. The Local Installer is a stand-alone installer with a large initial download.”
Here’s an example extraction path: C:\Users\Julius\AppData\Local\Temp\CUDA
OK, now let’s see if that worked by running nvcc --version
Using these sites: http://blog.nitishmutha.com/tensorflow/2017/01/22/TensorFlow-with-gpu-for-windows.html
Then to upgrade it, pip install tensorflow --upgrade
You can download the wheels from here: https://github.com/mind/wheels/releases/tag/tf1.4-gpu-cuda9
CUDA on Ubuntu
You should be able to install on a Debian-based Linux with sudo apt install nvidia-cuda-toolkit
Now you should be able to see it when you check your CUDA version (nvcc --version).
cuDNN
The main page only has the current version, which is 8.1 at the time of writing.
You can find the latest version of cuDNN here. However, you’ll probably need a version from the cuDNN archive.
You can also find the release notes on old versions of cuDNN.
Windows
You’ll need to get cuDNN. The guide is here.
On Windows the procedure is similar. Download and unzip the files.
For me, they download into a folder like C:\Users\Julius\Downloads\cudnn-10.1-windows10-x64-v7.6.5.32
Then you have to copy files:
Going from here:
C:\Users\HMISYS\Downloads\cudnn-8.0-windows7-x64-v6.0\cuda\lib\x64
to here:
C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v8.0
Copy the following files into the CUDA Toolkit directory.
- Copy \cuda\bin\cudnn64_7.dll to C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v9.0\bin.
- Copy \cuda\include\cudnn.h to C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v9.0\include.
- Copy \cuda\lib\x64\cudnn.lib to C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v9.0\lib\x64.
Make sure you get the version right. You can cd to C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA
and then use dir
to see what versions you have (it should match what you previously saw).
After extracting the files, here’s my install path:
C:\Users\Julius\Downloads\cudnn-10.1-windows10-x64-v7.6.5.32
For these to work, you’ll probably need to run the command prompt as administrator.
Then, from your install path, you’ll want to:
copy cuda\bin\cudnn*.dll "C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v10.1\bin"
copy cuda\include\cudnn*.h "C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v10.1\include"
copy cuda\lib\x64\cudnn*.lib "C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v10.1\lib\x64"
Make sure your destination is right too: C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v10.1\bin
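If you end up doing this more than once, the three copy commands can be wrapped in a small Python helper. This is a sketch, not an official installer: copy_cudnn is a hypothetical function, and it assumes the cuda\bin, cuda\include and cuda\lib\x64 layout of the extracted archive shown above.

```python
import glob
import os
import shutil


def copy_cudnn(extract_dir, cuda_dir):
    """Copy cuDNN's DLLs, headers and libs from extract_dir into cuda_dir."""
    # (source pattern relative to extract_dir, destination relative to cuda_dir)
    plan = [
        (os.path.join("cuda", "bin", "cudnn*.dll"), "bin"),
        (os.path.join("cuda", "include", "cudnn*.h"), "include"),
        (os.path.join("cuda", "lib", "x64", "cudnn*.lib"), os.path.join("lib", "x64")),
    ]
    copied = []
    for pattern, dest in plan:
        dest_dir = os.path.join(cuda_dir, dest)
        if not os.path.isdir(dest_dir):
            # Refuse to create folders: a missing one usually means the wrong CUDA version.
            raise FileNotFoundError(f"destination is missing: {dest_dir}")
        for src in glob.glob(os.path.join(extract_dir, pattern)):
            copied.append(shutil.copy2(src, dest_dir))
    return copied
```

You would call it with the extraction folder and the versioned CUDA folder, for example copy_cudnn(r"C:\Users\Julius\Downloads\cudnn-10.1-windows10-x64-v7.6.5.32", r"C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v10.1"), from a prompt running as administrator.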
Installing
Once you’ve downloaded the correct version, it will probably be in a Downloads folder, somewhere like /home/julius/Downloads
You’ll need to extract it with something like this:
tar -xzvf cudnn-10.1-linux-x64-v7.6.5.32.tgz
Then you’ll need to copy the files like so:
$ sudo cp cuda/include/cudnn*.h /usr/local/cuda/include
$ sudo cp cuda/lib64/libcudnn* /usr/local/cuda/lib64
Then change the permissions like so:
$ sudo chmod a+r /usr/local/cuda/include/cudnn*.h /usr/local/cuda/lib64/libcudnn*
Paths
On Unix machines, you’ll need to add these to your .bashrc:
export CUDA_HOME=/usr/local/cuda
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/local/cuda/lib64:/usr/local/cuda/extras/CUPTI/lib64
export INCLUDE=/usr/local/cuda/include
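After editing .bashrc and opening a new terminal, you can confirm the variables took effect with a quick check like this one. It is a sketch that assumes the default /usr/local/cuda locations used above; missing_library_dirs is a name I chose.

```python
import os

# Directories the exports above are expected to add to LD_LIBRARY_PATH.
REQUIRED = ["/usr/local/cuda/lib64", "/usr/local/cuda/extras/CUPTI/lib64"]


def missing_library_dirs(ld_library_path, required=REQUIRED):
    """Return the required directories that are absent from an LD_LIBRARY_PATH string."""
    present = {entry.rstrip("/") for entry in ld_library_path.split(":") if entry}
    return [path for path in required if path not in present]


if __name__ == "__main__":
    missing = missing_library_dirs(os.environ.get("LD_LIBRARY_PATH", ""))
    print("CUDA_HOME =", os.environ.get("CUDA_HOME"))
    print("missing from LD_LIBRARY_PATH:", missing or "nothing")
```

If anything is reported missing, the shell you are testing from probably has not re-read .bashrc yet.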
Test if Tensorflow is working on the GPU
You can see all your physical devices like so:
import tensorflow as tf
tf.config.experimental.list_physical_devices()
and you can limit them to the GPU:
tf.config.experimental.list_physical_devices('GPU')
print("Num GPUs Available: ", len(tf.config.experimental.list_physical_devices('GPU')))
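Listing devices only shows that TensorFlow can see the GPU. To confirm that work actually lands on it, you can run a small operation and look at where the result lives. This sketch assumes TensorFlow 2.x with eager execution; matmul_device is a hypothetical helper.

```python
def matmul_device():
    """Run a small matmul and return the device it ran on (None without TensorFlow)."""
    try:
        import tensorflow as tf
    except ImportError:
        return None
    a = tf.random.uniform((256, 256))
    b = tf.random.uniform((256, 256))
    # With eager execution, TensorFlow places this on a GPU if one is visible.
    product = tf.matmul(a, b)
    return product.device


if __name__ == "__main__":
    device = matmul_device()
    if device is None:
        print("TensorFlow is not installed in this environment.")
    else:
        print("matmul ran on:", device)
```

A device string ending in GPU:0 means the GPU is being used. One ending in CPU:0 means TensorFlow fell back to the CPU.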
gcc
Make sure you have gcc:
gcc --version
You can also look at CUDNN with PyTorch
import torch
print(torch.__config__.show())
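The full build configuration is long, so if you only care about the CUDA and cuDNN pieces, a shorter report may be easier to read. This is a sketch: torch_cuda_report is my own name, and it assumes PyTorch may or may not be installed.

```python
def torch_cuda_report():
    """Summarize PyTorch's CUDA and cuDNN status, or return None if torch is missing."""
    try:
        import torch
    except ImportError:
        return None
    return {
        "torch": torch.__version__,
        "cuda_available": torch.cuda.is_available(),
        "cuda_build": torch.version.cuda,  # None on CPU-only builds
        "cudnn": torch.backends.cudnn.version(),  # None if cuDNN is unavailable
    }


if __name__ == "__main__":
    report = torch_cuda_report()
    if report is None:
        print("PyTorch is not installed in this environment.")
    else:
        for key, value in report.items():
            print(f"{key}: {value}")
```

Keep in mind that PyTorch wheels typically ship with their own CUDA and cuDNN libraries, so these numbers may differ from what is installed system-wide.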
Still stuck
If you’re still stuck at the end, you can try:
python -c "import tensorflow as tf; print('tf version:', tf.__version__); print(tf.config.list_physical_devices())"
This should give you an error message that you can Google.
python -c "import tensorflow as tf; print('Num GPUs Available: ', len(tf.config.experimental.list_physical_devices('GPU')))"
Check whether devices are available
from tensorflow.python.client import device_lib
print(device_lib.list_local_devices())
What to do if they are not?
If it says 110, that means 11.X, so if the latest is 11.2 that’s fine.
Unzip it, then copy
<installpath>\cuda\bin\cudnn*.dll to C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\vx.x\bin
If you downloaded it to Downloads, it’ll extract to C:\Users\Julius\Downloads\cudnn-11.2-windows-x64-v8.1.1.33
and the destination is
C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.2
so the copy commands will look like this:
copy C:\Users\Julius\Downloads\cudnn-11.2-windows-x64-v8.1.1.33\cuda\bin\cudnn*.dll "C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.2\bin"
copy C:\Users\Julius\Downloads\cudnn-11.2-windows-x64-v8.1.1.33\cuda\include\cudnn*.h "C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.2\include"
copy C:\Users\Julius\Downloads\cudnn-11.2-windows-x64-v8.1.1.33\cuda\lib\x64\cudnn*.lib "C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.2\lib\x64"
Then set your environment variables:
- Set the system variables, not the user ones.
- This might already be done, but make sure that it is.
Other Possible Problems
Verify that there aren’t conflicting drivers (Linux only)
Verify you have a CUDA-enabled GPU:
You should see something saying “NVIDIA” when you do:
lspci | grep -i nvidia
But you shouldn’t see anything from:
lsmod | grep nouveau
If you do, you’ll need to remove it.
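The nouveau check is easy to automate on Linux. The sketch below reads /proc/modules, which lists the same loaded modules that lsmod prints, and looks for a module named nouveau. Both function names are hypothetical.

```python
def nouveau_loaded(lsmod_text):
    """Return True if the nouveau module appears in lsmod-style output."""
    for line in lsmod_text.splitlines():
        fields = line.split()
        # The module name is the first column in both lsmod and /proc/modules.
        if fields and fields[0] == "nouveau":
            return True
    return False


def read_loaded_modules(path="/proc/modules"):
    """Return the kernel's module list as text, or '' if it cannot be read."""
    try:
        with open(path) as handle:
            return handle.read()
    except OSError:
        return ""  # not Linux, or /proc is unavailable


if __name__ == "__main__":
    if nouveau_loaded(read_loaded_modules()):
        print("nouveau is loaded and may conflict with the NVIDIA driver.")
    else:
        print("nouveau is not loaded.")
```

Matching on the first column avoids false positives from other modules that merely list nouveau as a dependency.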