Setting up Ubuntu Server 22.04 for TensorFlow

MY SETUP

I have a computer with 2 NVIDIA Quadro 6000s running Ubuntu Server 22.04. Getting set up for TensorFlow was difficult, so here is the set of instructions I followed, along with some articles that helped along the way.

GET VANILLA UBUNTU READY

sudo apt update
sudo apt upgrade
sudo apt-get install nano

NVIDIA DRIVER SETUP

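Purge any previously installed NVIDIA drivers with the command below. The quotes stop the shell from expanding the * against file names in the current directory.

sudo apt-get purge 'nvidia*'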

Disable the nouveau driver:

sudo nano /etc/modprobe.d/blacklist-nouveau.conf

This command opens (and creates, if needed) the config file. Put the following 2 lines in it:

blacklist nouveau
options nouveau modeset=0

Press Ctrl+X, then Y and Enter, to save the file and exit. Then regenerate the kernel initramfs and reboot with:

sudo update-initramfs -u
sudo reboot
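
After the reboot, it can be worth confirming that nouveau really is gone before installing the NVIDIA driver. Here is a minimal Python sketch, assuming the kernel's list of loaded modules is readable at /proc/modules. The function name module_loaded is just something I made up for illustration.

```python
from pathlib import Path


def module_loaded(name, modules_file="/proc/modules"):
    """Return True if a kernel module called `name` is listed in modules_file."""
    path = Path(modules_file)
    if not path.exists():
        # Not a Linux system (or /proc is unavailable): nothing to report.
        return False
    for line in path.read_text().splitlines():
        fields = line.split()
        # The first field on each line of /proc/modules is the module name.
        if fields and fields[0] == name:
            return True
    return False


if __name__ == "__main__":
    if module_loaded("nouveau"):
        print("nouveau is still loaded; recheck the blacklist file and reboot")
    else:
        print("nouveau is not loaded")
```

If it reports that nouveau is still loaded, the blacklist file or the initramfs update probably did not take effect.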

Install dependencies:

sudo apt-get install build-essential
sudo apt-get install xorg
sudo apt-get install xorg-dev
sudo apt install nvidia-settings

Download and install the NVIDIA driver. The URL below is for driver version 535.129.03; you will need to look up the download path for the driver that matches your card.

wget https://us.download.nvidia.com/XFree86/Linux-x86_64/535.129.03/NVIDIA-Linux-x86_64-535.129.03.run
sudo sh NVIDIA-Linux-x86_64-535.129.03.run

You will see a warning that the installer does not know the path for installing the libglvnd library. This can be ignored. Select "Yes" to update the X config file if prompted.

You can test the install with

nvidia-smi

This reports the driver version and the CUDA-enabled cards on the system.
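
If you would rather check the cards from a script than read the table by eye, nvidia-smi can also print selected fields as CSV. The following is a rough Python sketch, not a definitive tool. The helper names parse_gpu_csv and query_gpus are my own, and it assumes GPU names contain no commas.

```python
import shutil
import subprocess

# Fields requested from nvidia-smi via its --query-gpu option.
FIELDS = ["index", "name", "driver_version", "memory.total"]


def parse_gpu_csv(text):
    """Turn `nvidia-smi --format=csv,noheader` output into a list of dicts."""
    gpus = []
    for line in text.strip().splitlines():
        values = [v.strip() for v in line.split(",")]
        # Skip any line that does not have exactly one value per field.
        if len(values) == len(FIELDS):
            gpus.append(dict(zip(FIELDS, values)))
    return gpus


def query_gpus():
    """Run nvidia-smi and return one dict per GPU, or [] if it is unavailable."""
    if shutil.which("nvidia-smi") is None:
        return []
    result = subprocess.run(
        ["nvidia-smi", "--query-gpu=" + ",".join(FIELDS), "--format=csv,noheader"],
        capture_output=True, text=True, check=False,
    )
    return parse_gpu_csv(result.stdout) if result.returncode == 0 else []


if __name__ == "__main__":
    for gpu in query_gpus():
        print(gpu)
```

An empty result means either nvidia-smi is not on the PATH or the driver could not talk to the cards.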

INSTALL CUDNN LIBRARIES

For the TensorFlow tutorials (Zero to Hero) you will also need the NVIDIA cuDNN libraries.

sudo apt-get install zlib1g
sudo apt-get install nvidia-cudnn

The second command is a big install (30 minutes or so). Before it starts, you will need to press Enter to page through the EULA and then enter 2 for "I agree".
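
Because the install is long and easy to interrupt, a quick check that both packages actually ended up installed can save some head scratching later. This is a small Python sketch that asks dpkg-query for each package's status. The function names are illustrative, and it only looks at the package database, not at whether TensorFlow can load the libraries.

```python
import shutil
import subprocess


def status_is_installed(status_text):
    """dpkg reports fully installed packages as 'install ok installed'."""
    return status_text.strip() == "install ok installed"


def package_installed(name):
    """Return True if dpkg reports the package `name` as installed."""
    if shutil.which("dpkg-query") is None:
        return False  # not a Debian/Ubuntu system
    result = subprocess.run(
        ["dpkg-query", "-W", "-f=${Status}", name],
        capture_output=True, text=True, check=False,
    )
    # dpkg-query exits non-zero when it does not know the package at all.
    return result.returncode == 0 and status_is_installed(result.stdout)


if __name__ == "__main__":
    for pkg in ("zlib1g", "nvidia-cudnn"):
        print(pkg, "installed" if package_installed(pkg) else "not installed")
```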

PYTHON AND TENSORFLOW SETUP

Ubuntu comes with Python installed already. TensorFlow requires Python version 3.9-3.11. You can check what version you have with

python3 -V

but you should be fine on Ubuntu 22.04 (I have 3.10). Next, install pip, the Python package installer:

sudo apt install python3-pip
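
Before installing TensorFlow, the version requirement mentioned above can also be checked from Python itself. A minimal sketch, assuming the 3.9-3.11 range quoted in this article still applies (check the TensorFlow install page if in doubt). The function name is made up for illustration.

```python
import sys

# Range quoted in the text above; adjust if the TensorFlow docs change it.
MIN_VERSION = (3, 9)
MAX_VERSION = (3, 11)


def python_ok_for_tensorflow(version=None):
    """Return True if the (major, minor) version falls inside the supported range."""
    if version is None:
        version = sys.version_info[:2]
    return MIN_VERSION <= tuple(version[:2]) <= MAX_VERSION


if __name__ == "__main__":
    found = "{}.{}".format(*sys.version_info[:2])
    verdict = "supported" if python_ok_for_tensorflow() else "outside the 3.9-3.11 range"
    print("Python", found, "is", verdict)
```

Save it to a file and run it with python3 so that it checks the same interpreter pip3 will install into.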

And install TensorFlow with the command

pip3 install 'tensorflow[and-cuda]'

This command may take a little bit to complete.

Once done you can confirm it worked with

python3 -c "import tensorflow as tf; print(tf.reduce_sum(tf.random.normal([1000, 1000])))"

TensorFlow is very verbose, and you may see warnings that I've been too lazy to fix. Several lines of "something something something was already registered" are fine. So are the many lines about a NUMA node returning -1 and being reassigned to 0; that is also just a warning.

If you can't get over the NUMA node warning, you can set up a cron job to rewrite the numa_node files in the device folders so they contain something other than -1, but that's covered by smarter folks than me in other articles.

If the last line of output from the TensorFlow command above looks something like
tf.Tensor(-11.84156, shape=(), dtype=float32)
then winner, winner, chicken dinner. (The number is random, so yours will differ.) I highly recommend the TensorFlow "Zero to Hero" series on YouTube for getting started after this.
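
One caveat: the random-tensor check also succeeds when TensorFlow quietly falls back to the CPU, so it is worth asking TensorFlow which GPUs it can see. The following is a hedged sketch using tf.config.list_physical_devices. The wrapper function visible_gpus is my own naming, and setting TF_CPP_MIN_LOG_LEVEL is optional; it only hides INFO and WARNING log lines.

```python
import os

# Optional: hide TensorFlow's INFO and WARNING log lines. This must be set
# before tensorflow is imported to have any effect.
os.environ.setdefault("TF_CPP_MIN_LOG_LEVEL", "2")


def visible_gpus():
    """Return the names of GPUs TensorFlow can see, or None if TensorFlow is missing."""
    try:
        import tensorflow as tf
    except ImportError:
        return None
    return [device.name for device in tf.config.list_physical_devices("GPU")]


if __name__ == "__main__":
    gpus = visible_gpus()
    if gpus is None:
        print("TensorFlow is not installed for this interpreter")
    elif gpus:
        print("TensorFlow sees", len(gpus), "GPU(s):", gpus)
    else:
        print("TensorFlow is installed but sees no GPU; it will run on the CPU")
```

On my kind of setup I would expect two entries, one per card. An empty list usually points back at the driver or CUDA/cuDNN steps.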

REFERENCES

https://askubuntu.com/questions/841876/how-to-disable-nouveau-kernel-driver
https://gist.github.com/wangruohui/df039f0dc434d6486f5d4d098aa52d07
https://docs.nvidia.com/datacenter/tesla/tesla-installation-notes/index.html
https://docs.nvidia.com/deeplearning/cudnn/install-guide/index.html
https://www.tensorflow.org/install/pip