Setting up Ubuntu Server 22.04 for TensorFlow
MY SETUP
I have a computer with 2 NVIDIA Quadro 6000s running Ubuntu Server 22.04. Getting set up for TensorFlow was difficult, so here is the set of instructions I followed, along with some articles that helped along the way.
GET VANILLA UBUNTU READY
sudo apt update
sudo apt upgrade
sudo apt-get install nano
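Everything below assumes Ubuntu 22.04, so it can be worth confirming the release before going further. Here is a minimal Python sketch that reads /etc/os-release; the helper name parse_os_release is my own and not part of any library.

```python
from pathlib import Path


def parse_os_release(text):
    """Parse KEY=value lines (the os-release format) into a dict."""
    info = {}
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines, comments, and anything that is not KEY=value.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        info[key] = value.strip().strip('"').strip("'")
    return info


if __name__ == "__main__":
    path = Path("/etc/os-release")
    if path.exists():
        info = parse_os_release(path.read_text())
        print(info.get("NAME"), info.get("VERSION_ID"))
    else:
        print("No /etc/os-release found; is this a Linux system?")
```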
NVIDIA DRIVER SETUP
Purge any previously installed NVIDIA drivers with:
sudo apt-get purge 'nvidia*'
Disable the nouveau driver:
sudo nano /etc/modprobe.d/blacklist-nouveau.conf
This command opens a new config file in nano. Put the following 2 lines in it:
blacklist nouveau
options nouveau modeset=0
Press Ctrl+X, then Y and Enter, to save the file and exit. Then regenerate the kernel initramfs and reboot with:
sudo update-initramfs -u
sudo reboot
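After the reboot, you can sanity-check that the blacklist took effect. The sketch below is only a rough check, assuming the config file path used above and the standard /proc/modules listing; the function names are made up for illustration.

```python
from pathlib import Path


def is_blacklisted(conf_text, module="nouveau"):
    """Return True if conf_text has an uncommented 'blacklist <module>' line."""
    for line in conf_text.splitlines():
        fields = line.split("#", 1)[0].split()  # drop trailing comments
        if fields == ["blacklist", module]:
            return True
    return False


def loaded_modules(proc_modules_text):
    """Return the set of module names from /proc/modules-style text."""
    return {
        line.split()[0]
        for line in proc_modules_text.splitlines()
        if line.strip()
    }


if __name__ == "__main__":
    conf = Path("/etc/modprobe.d/blacklist-nouveau.conf")
    proc = Path("/proc/modules")
    if conf.exists():
        print("blacklist entry present:", is_blacklisted(conf.read_text()))
    if proc.exists():
        print("nouveau loaded:", "nouveau" in loaded_modules(proc.read_text()))
```

If the last line prints True, the nouveau module is still loaded and the NVIDIA installer will most likely complain.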
Install dependencies:
sudo apt-get install build-essential
sudo apt-get install xorg
sudo apt-get install xorg-dev
sudo apt install nvidia-settings
Download and install the NVIDIA driver (you will need to look up the download URL for the driver version that matches your card).
wget https://us.download.nvidia.com/XFree86/Linux-x86_64/535.129.03/NVIDIA-Linux-x86_64-535.129.03.run
sudo sh NVIDIA-Linux-x86_64-535.129.03.run
You will see a warning that the installer does not know the path for installing the libglvnd library. This can be ignored. Select "Yes" to update the X config file if prompted.
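If you need a different driver version, the URL above appears to follow a simple pattern. The sketch below just rebuilds that pattern from a version string. It is an assumption based on the single URL shown here, so check that the file actually exists on NVIDIA's site before relying on it; driver_url is a hypothetical helper name.

```python
def driver_url(version, arch="x86_64"):
    """Build a download URL following the pattern of the URL used above.

    Assumption: other versions live at the same kind of path. This is not
    a documented NVIDIA API, only a pattern inferred from one example.
    """
    base = "https://us.download.nvidia.com/XFree86"
    filename = f"NVIDIA-Linux-{arch}-{version}.run"
    return f"{base}/Linux-{arch}/{version}/{filename}"


if __name__ == "__main__":
    print(driver_url("535.129.03"))
```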
You can test the install with
nvidia-smi
This reports the driver version and lists the CUDA-enabled cards on the system.
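If you would rather check the result from a script than read the table by eye, nvidia-smi can also print CSV. Here is a rough Python sketch that runs the query and parses the rows; the parse_gpu_rows helper and the choice of fields are mine, and it simply prints a message if nvidia-smi is not on the PATH.

```python
import csv
import shutil
import subprocess

QUERY = [
    "nvidia-smi",
    "--query-gpu=index,name,driver_version,memory.total",
    "--format=csv,noheader",
]


def parse_gpu_rows(output):
    """Turn 'index, name, driver, memory' CSV lines into a list of dicts."""
    rows = []
    for fields in csv.reader(output.splitlines(), skipinitialspace=True):
        if len(fields) != 4:
            continue  # ignore blank or unexpected lines
        index, name, driver, memory = (f.strip() for f in fields)
        rows.append(
            {"index": int(index), "name": name, "driver": driver, "memory": memory}
        )
    return rows


if __name__ == "__main__":
    if shutil.which("nvidia-smi") is None:
        print("nvidia-smi not found on PATH; is the driver installed?")
    else:
        result = subprocess.run(QUERY, capture_output=True, text=True, check=False)
        for gpu in parse_gpu_rows(result.stdout):
            print(gpu)
```

On a two-card machine like mine, this should print two entries with the same driver version.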
INSTALL CUDNN LIBRARIES
For the TensorFlow tutorials (Zero to Hero) you will also need the NVIDIA cuDNN libraries.
sudo apt-get install zlib1g
sudo apt-get install nvidia-cudnn
The second command is a big install (30 minutes or so). Before the install starts, you will need to press Enter to page through the EULA and then enter 2 for "I agree".
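Once the install finishes, a quick way to see whether the system loader can find a cuDNN library is to ask Python's ctypes. This is only a rough check: whether the package puts the library somewhere the loader looks is an assumption on my part, so a None result does not prove the install failed.

```python
import ctypes.util


def find_cudnn():
    """Return the library name the loader resolves for cuDNN, or None."""
    return ctypes.util.find_library("cudnn")


if __name__ == "__main__":
    library = find_cudnn()
    if library is None:
        print("cuDNN not found by the system loader")
    else:
        print("Found cuDNN library:", library)
```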
PYTHON AND TENSORFLOW SETUP
Ubuntu comes with Python installed already. TensorFlow requires Python version 3.9-3.11. You can check what version you have with
python3 -V
but you should be fine on Ubuntu 22.04 (I have 3.10). Next, install pip, the Python package installer.
sudo apt install python3-pip
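Before installing TensorFlow, it doesn't hurt to confirm that the interpreter falls inside the 3.9-3.11 range mentioned above. A minimal sketch, with the bounds hard-coded from that range (adjust them if the TensorFlow release you install supports something different):

```python
import sys

# Bounds taken from the version range quoted above; treat them as an assumption.
MIN_VERSION = (3, 9)
MAX_VERSION = (3, 11)


def is_supported(version):
    """Return True if the (major, minor, ...) version is within the range."""
    return MIN_VERSION <= tuple(version[:2]) <= MAX_VERSION


if __name__ == "__main__":
    current = sys.version_info[:2]
    status = "OK" if is_supported(current) else "outside the supported range"
    print(f"Python {current[0]}.{current[1]}: {status}")
```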
Then install TensorFlow with:
pip3 install tensorflow[and-cuda]
This command may take a little bit to complete.
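To see which version pip actually installed without importing TensorFlow (which is slow and noisy), the standard library's importlib.metadata can look it up. A small sketch; installed_version is just an illustrative helper name.

```python
from importlib import metadata


def installed_version(package):
    """Return the installed version string of a package, or None if absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    version = installed_version("tensorflow")
    print("tensorflow:", version if version else "not installed")
```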
Once done you can confirm it worked with
python3 -c "import tensorflow as tf; print(tf.reduce_sum(tf.random.normal([1000, 1000])))"
TensorFlow is very verbose, and you may see warnings that I've been too lazy to fix. Several lines saying something "was already registered" are fine. So are the many lines about a NUMA node returning -1 and being reassigned to 0; these are also just warnings. If you can't get over the NUMA node warning, you can set up a cron job to rewrite the numa_node files in all the device folders to contain something other than -1, but that's covered by smarter folks than me in other articles.
If the last line of output from the TensorFlow command above looks something like
tf.Tensor(-11.84156, shape=(), dtype=float32)
then winner, winner, chicken dinner. I highly recommend the TensorFlow "Zero to Hero" series on YouTube for getting started after this.
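The one-liner above only proves that TensorFlow runs; it does not say whether it is using the GPUs. Here is a small sketch that lists the GPUs TensorFlow can see, with the TF_CPP_MIN_LOG_LEVEL environment variable set to quiet most of the C++ log chatter. The visible_gpus helper is my own, and it returns None if TensorFlow is not importable.

```python
import os

# Must be set before TensorFlow is imported; "2" hides INFO and WARNING logs.
os.environ.setdefault("TF_CPP_MIN_LOG_LEVEL", "2")


def visible_gpus():
    """Return the names of GPUs TensorFlow can see, or None without TensorFlow."""
    try:
        import tensorflow as tf
    except ImportError:
        return None
    return [device.name for device in tf.config.list_physical_devices("GPU")]


if __name__ == "__main__":
    gpus = visible_gpus()
    if gpus is None:
        print("TensorFlow is not installed in this environment")
    elif not gpus:
        print("TensorFlow imported, but no GPUs are visible")
    else:
        print(f"{len(gpus)} GPU(s) visible:", ", ".join(gpus))
```

An empty list here usually points back at the driver or cuDNN steps rather than at TensorFlow itself.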
References
https://askubuntu.com/questions/841876/how-to-disable-nouveau-kernel-driver
https://gist.github.com/wangruohui/df039f0dc434d6486f5d4d098aa52d07
https://docs.nvidia.com/datacenter/tesla/tesla-installation-notes/index.html
https://docs.nvidia.com/deeplearning/cudnn/install-guide/index.html
https://www.tensorflow.org/install/pip