TensorFlow GPU Detection with CUDA 12

Problem Statement

When setting up TensorFlow with CUDA 12 for GPU acceleration, you might encounter the error Could not find cuda drivers on your machine, GPU will not be used, even though the NVIDIA driver is installed correctly, environment paths are valid, and PyTorch or NVIDIA's own tools verify the GPU successfully. This typically occurs because TensorFlow ships precompiled binaries linked against specific CUDA versions, and CUDA 12 support only arrived in recent TensorFlow releases.

Common symptoms:

  • TensorFlow fails to detect GPUs while nvidia-smi shows correct drivers
  • PyTorch recognizes the GPU correctly
  • Library path validations (libcuda, libcudart, libcudnn) resolve successfully
  • Errors mention missing libraries or NUMA node issues
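When triaging these symptoms, it helps to confirm from Python that the driver itself is visible before blaming TensorFlow. The sketch below is an illustrative stdlib-only helper (not part of TensorFlow) that shells out to nvidia-smi and returns the reported driver version, or None when the tool or driver is absent:

```python
import shutil
import subprocess

def detect_nvidia_driver():
    """Return the driver version reported by nvidia-smi, or None."""
    if shutil.which("nvidia-smi") is None:
        return None  # driver/tools not installed or not on PATH
    try:
        out = subprocess.run(
            ["nvidia-smi", "--query-gpu=driver_version", "--format=csv,noheader"],
            capture_output=True, text=True, timeout=10,
        )
    except (subprocess.SubprocessError, OSError):
        return None
    lines = out.stdout.strip().splitlines()
    return lines[0] if out.returncode == 0 and lines else None

print("Driver:", detect_nvidia_driver())
```

If this prints None while you believe the driver is installed, the problem is below TensorFlow entirely and the PATH or driver install should be fixed first.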

Solutions to GPU Detection Failure

1. Install TensorFlow with Bundled CUDA (tensorflow[and-cuda])

Best for Linux & Latest GPUs

TensorFlow now bundles compatible CUDA libraries via the tensorflow[and-cuda] package. This automatically resolves version conflicts.

bash
# 1. Create a clean virtual environment
python -m venv tf-gpu-env
source tf-gpu-env/bin/activate

# 2. Install TF with bundled CUDA support (quote the extra so zsh doesn't glob the brackets)
pip install --upgrade "tensorflow[and-cuda]"

Verification:

python
import tensorflow as tf
print(tf.config.list_physical_devices('GPU'))
print(tf.sysconfig.get_build_info())  # Confirm CUDA versions

Expected output:

text
[PhysicalDevice(name='/physical_device:GPU:0', device_type='GPU')]
{'cuda_version': '12.3', ...}  # CUDA version matching your system
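Listing the device only proves it registered; a quick computation confirms the GPU actually executes kernels. A minimal smoke-test sketch, assuming tensorflow is importable (it degrades gracefully to a message otherwise):

```python
def gpu_smoke_test():
    """Run a tiny matmul on the first GPU, if TensorFlow and a GPU exist."""
    try:
        import tensorflow as tf
    except ImportError:
        return "tensorflow not installed"
    gpus = tf.config.list_physical_devices("GPU")
    if not gpus:
        return "no GPU visible to TensorFlow"
    with tf.device("/GPU:0"):
        x = tf.random.uniform((256, 256))
        y = tf.matmul(x, x)  # forces an actual kernel launch on the GPU
    return f"OK: result shape {y.shape}"

print(gpu_smoke_test())
```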

2. Use Anaconda for Environment Management

Best for Cross-Platform Stability

Conda handles complex CUDA dependencies automatically through pre-built channels.

bash
# "tensorflow-gpu" is the legacy conda package name; on newer channels
# (e.g. conda-forge) the GPU-enabled build may simply be named "tensorflow"
conda create -n tf-gpu tensorflow-gpu
conda activate tf-gpu

3. Fix CUDA Library Path Errors

Required When Not Using Bundled CUDA

If you manage CUDA yourself via LD_LIBRARY_PATH instead of the bundled libraries, make sure every CUDA library is discoverable:

bash
# Add CUDA paths to library search path (customize version)
export LD_LIBRARY_PATH=/usr/local/cuda-12.0/lib64:$LD_LIBRARY_PATH  

# Verify library discovery
ldconfig -N -v 2>/dev/null | grep libcudart
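Beyond grepping ldconfig output, you can ask the dynamic loader directly whether each library resolves. A stdlib-only sketch (an illustrative helper, not a TensorFlow API); it reports NOT FOUND rather than failing on machines without CUDA:

```python
import ctypes
import ctypes.util

def cuda_lib_loadable(name="cudart"):
    """Return True if the dynamic loader can resolve the given CUDA library."""
    path = ctypes.util.find_library(name)  # e.g. may resolve to libcudart.so.12
    candidate = path or f"lib{name}.so"
    try:
        ctypes.CDLL(candidate)
        return True
    except OSError:
        return False

for lib in ("cuda", "cudart", "cudnn"):
    print(lib, "->", "loadable" if cuda_lib_loadable(lib) else "NOT FOUND")
```

A library that greps fine in ldconfig but fails to load here usually indicates an architecture mismatch or a stale symlink.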

4. Fix NUMA Node Warning

For "negative NUMA node" errors

This kernel-related warning can be resolved by forcing NUMA node 0:

bash
# Temporarily assign NUMA node 0 to every PCI device that reports -1
for node in /sys/bus/pci/devices/*/numa_node; do
  [ "$(cat "$node")" = "-1" ] && echo 0 | sudo tee "$node"
done
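To see which devices the loop above would touch (or to confirm the fix took), you can enumerate the sysfs entries without root. A stdlib sketch, assuming a Linux sysfs layout (it returns an empty list elsewhere):

```python
import glob

def devices_with_unset_numa():
    """List PCI addresses whose sysfs numa_node reads -1 (unset)."""
    offenders = []
    for node in glob.glob("/sys/bus/pci/devices/*/numa_node"):
        try:
            with open(node) as f:
                if f.read().strip() == "-1":
                    offenders.append(node.split("/")[-2])  # the PCI address
        except OSError:
            pass  # device vanished or attribute unreadable
    return offenders

print("Devices with numa_node=-1:", devices_with_unset_numa())
```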

Persistent fix:

bash
# Find your GPU's PCI address
lspci | grep -i nvidia
# Apply NUMA override via udev (replace 0000:01:00.0 with your PCI address)
echo 'SUBSYSTEM=="pci", KERNELS=="0000:01:00.0", ATTR{numa_node}="0"' | sudo tee /etc/udev/rules.d/99-numa.rules
# Reload so the rule applies without a reboot
sudo udevadm control --reload-rules && sudo udevadm trigger

5. Manual CUDA Downgrade (If TF Versions Require CUDA 11)

Legacy Workaround Only

Use this only if your required TensorFlow version is pinned to CUDA 11 and none of the solutions above work.

  1. Uninstall existing CUDA:
bash
sudo apt-get purge "*cuda*" "*cublas*" "*cufft*" "*cusparse*"
  2. Install CUDA 11.8
  3. Install compatible cuDNN 8.6.0
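Before downgrading, check which CUDA version your target TensorFlow release was actually built against. The table below is transcribed from memory of TensorFlow's published build matrix; verify it against the official tested-configurations page before acting on it:

```python
# Tested TF <-> CUDA pairings (assumed from TensorFlow's build matrix;
# confirm at tensorflow.org/install/source#gpu before relying on them)
TF_CUDA_MATRIX = {
    "2.12": "11.8",
    "2.13": "11.8",
    "2.14": "11.8",
    "2.15": "12.2",
    "2.16": "12.3",
}

def required_cuda(tf_version):
    """Look up the CUDA version a given TF minor release was built against."""
    minor = ".".join(tf_version.split(".")[:2])
    return TF_CUDA_MATRIX.get(minor, "unknown")

print(required_cuda("2.15.1"))  # -> 12.2 per the table above
```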

Common Mistakes to Avoid

Incorrect Package Installation

diff
- # WRONG: pinned version prevents dependency resolution
- pip install tensorflow[and-cuda]==2.12.0
+ # CORRECT: let pip resolve the latest compatible versions
+ pip install "tensorflow[and-cuda]"

Unverified Virtual Environments

python
# BEFORE: stale environment, GPU missing
import tensorflow as tf
tf.config.list_physical_devices('GPU')  # ➜ []

bash
# AFTER: activate a fresh virtual environment and reinstall
source new_venv/bin/activate
pip install "tensorflow[and-cuda]"

python
import tensorflow as tf
tf.config.list_physical_devices('GPU')  # ➜ [PhysicalDevice(...)]
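A quick sanity check for the wrong-environment trap: confirm the interpreter you are running actually lives inside a virtual environment. A stdlib-only sketch:

```python
import sys

def in_virtualenv():
    """True when running inside a venv/virtualenv (prefix differs from base)."""
    return sys.prefix != getattr(sys, "base_prefix", sys.prefix)

print("Interpreter:", sys.executable)
print("Inside a virtual environment:", in_virtualenv())
```

If this prints False when you expected your new environment, the shell is still using the system interpreter and the reinstall never took effect.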

Verification Workflow

  1. Hardware Check
bash
nvidia-smi  # Verify driver and GPU detection
nvcc --version  # Check compiler version
  2. Library Validation
bash
# Check critical libraries (example)
ldconfig -p | grep libcuda.so
  3. TensorFlow Tests
python
import tensorflow as tf
# GPU availability
print("GPUs:", tf.config.list_physical_devices('GPU'))
  4. Environment Diagnostics
python
# Display TF compilation details
print(tf.sysconfig.get_build_info())
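The steps above can be folded into one diagnostic script. This is an illustrative sketch, not an official tool: it only reports findings, and it degrades gracefully when TensorFlow is not importable in the current environment:

```python
import shutil

def diagnose():
    """Collect the workflow's checks into a single report dict."""
    report = {"nvidia_smi_on_path": shutil.which("nvidia-smi") is not None}
    try:
        import tensorflow as tf
        report["tf_version"] = tf.__version__
        report["gpus"] = [d.name for d in tf.config.list_physical_devices("GPU")]
        report["build_info"] = dict(tf.sysconfig.get_build_info())
    except ImportError:
        report["tf_version"] = None  # TensorFlow not importable in this env
    return report

if __name__ == "__main__":
    for key, value in diagnose().items():
        print(f"{key}: {value}")
```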