Getting TensorFlow to work with CUDA can be a real headache. You have to make sure that the versions of TensorFlow, CUDA, and cuDNN all match up. Missing one small detail can throw everything off.
Luckily, Google provides pre-configured Docker images, so you don't have to deal with this hassle. But managing the container yourself is still a bit of a chore, right?
Making things even easier, the lovely folks at System76 created Tensorman, which abstracts away all the complexity of pulling the image, running your app, and stopping and removing the container.
Running TensorFlow with Tensorman
With Tensorman installed (apt install tensorman on Pop!_OS), it's as easy as:
$ tensorman run --gpu python -- ./script.py
This takes care of everything, even stopping and removing the container once the process exits.
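Under the hood, Tensorman drives Docker with Google's GPU-enabled images. As a rough sketch of what that single command saves you from typing (the image tag and mount paths here are illustrative, not Tensorman's exact invocation):

```shell
# Run the script in a throwaway container from Google's GPU image;
# --rm removes the container as soon as the process exits,
# --gpus all exposes the host's NVIDIA GPUs to the container.
docker run --rm --gpus all \
    -v "$PWD":/project -w /project \
    tensorflow/tensorflow:latest-gpu \
    python ./script.py
```

Tensorman wraps all of this, plus image management, into the single `tensorman run` call above.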
Testing
Here's a sample script if you want to test it out:
import tensorflow as tf
print("TensorFlow version:", tf.__version__)
print("CUDA support:", tf.test.is_built_with_cuda())
print("GPU available:", tf.config.list_physical_devices('GPU'))
matrix1 = tf.constant([[1., 2., 3.], [4., 5., 6.], [7., 8., 9.]])
matrix2 = tf.constant([[1., 2., 3.], [4., 5., 6.], [7., 8., 9.]])
product = tf.matmul(matrix1, matrix2)
print("Matrix multiplication result: ", product.numpy())
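If you want to sanity-check the result without TensorFlow in the loop, the same multiplication can be done by hand in plain Python (result[i][j] is the dot product of row i of the first matrix with column j of the second):

```python
# Plain-Python check of the 3x3 matrix multiplication from the script above.
a = [[1., 2., 3.], [4., 5., 6.], [7., 8., 9.]]
b = [[1., 2., 3.], [4., 5., 6.], [7., 8., 9.]]

# Classic triple loop: product[i][j] = sum over k of a[i][k] * b[k][j]
product = [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
           for i in range(3)]
print(product)  # [[30.0, 36.0, 42.0], [66.0, 81.0, 96.0], [102.0, 126.0, 150.0]]
```

If TensorFlow prints the same numbers, the GPU path is computing correctly.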
Removing NUMA warning messages
When running my sample script, I got the output I expected:
Ugh, I don't want all those NUMA warning messages in my output. Here's how I disabled them (thanks to this gist):
$ lspci | grep -i nvidia
10:00.0 VGA compatible controller: NVIDIA Corporation TU104 [GeForce RTX 2060] (rev a1)
10:00.1 Audio device: NVIDIA Corporation TU104 HD Audio Controller (rev a1)
10:00.2 USB controller: NVIDIA Corporation TU104 USB 3.1 Host Controller (rev a1)
10:00.3 Serial bus controller: NVIDIA Corporation TU104 USB Type-C UCSI Controller (rev a1)
$ echo 0 | sudo tee "/sys/bus/pci/devices/0000:10:00.0/numa_node"
0
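If your GPU exposes several PCI functions (like the VGA, audio, and USB functions in the lspci output above), you can find every device that needs this fix by checking the vendor file under sysfs; 0x10de is NVIDIA's PCI vendor ID. Here's a small sketch of that lookup (find_nvidia_pci_devices is a hypothetical helper for illustration, not part of Tensorman):

```python
import os

NVIDIA_VENDOR_ID = "0x10de"  # NVIDIA's PCI vendor ID

def find_nvidia_pci_devices(sysfs_root="/sys/bus/pci/devices"):
    """Return sysfs paths of all PCI devices whose vendor is NVIDIA."""
    devices = []
    for entry in sorted(os.listdir(sysfs_root)):
        vendor_file = os.path.join(sysfs_root, entry, "vendor")
        try:
            with open(vendor_file) as f:
                if f.read().strip() == NVIDIA_VENDOR_ID:
                    devices.append(os.path.join(sysfs_root, entry))
        except OSError:
            continue  # entry without a readable vendor file; skip it
    return devices

if __name__ == "__main__" and os.path.isdir("/sys/bus/pci/devices"):
    for device in find_nvidia_pci_devices():
        # Each of these is a candidate for the `echo 0 | sudo tee` fix above.
        print(device, "->", os.path.join(device, "numa_node"))
```

You'd still write the 0 with sudo tee as shown above; this just saves you from reading the lspci output by eye.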
Running again:
Still noisy, but much better.