TensorFlow GPU Detection with CUDA 12
Problem Statement
When setting up TensorFlow with CUDA 12 for GPU acceleration, you might encounter the error "Could not find cuda drivers on your machine, GPU will not be used", despite correct NVIDIA driver installation, valid environment paths, and successful verification through PyTorch or NVIDIA tools. This typically occurs because TensorFlow ships precompiled binaries linked against specific CUDA versions, and CUDA 12 support only arrived in newer TensorFlow releases.
Common symptoms:
- TensorFlow fails to detect GPUs while nvidia-smi shows correct drivers
- Torch/PyTorch recognizes the GPU correctly
- Library path validations (libcuda, libcudart, libcudnn) resolve successfully
- Errors mention missing libraries or NUMA node issues
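Before trying any fix, it helps to confirm the driver stack independently of TensorFlow. A minimal sketch (the helper name driver_visible is illustrative): if this reports True while TensorFlow still lists no GPUs, the problem is TensorFlow's CUDA linkage, not the driver.

```python
import shutil
import subprocess

def driver_visible() -> bool:
    """Return True when nvidia-smi exists and exits cleanly,
    i.e. the NVIDIA driver stack itself is healthy."""
    if shutil.which("nvidia-smi") is None:
        return False
    return subprocess.run(["nvidia-smi"], capture_output=True).returncode == 0

if __name__ == "__main__":
    print("Driver OK:", driver_visible())
```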
Solutions to GPU Detection Failure
1. Install TensorFlow with Bundled CUDA Dependencies (Recommended for TF 2.15+)
Best for Linux & Latest GPUs
TensorFlow now bundles compatible CUDA libraries via the tensorflow[and-cuda]
package. This automatically resolves version conflicts.
# 1. Create a clean virtual environment
python -m venv tf-gpu-env
source tf-gpu-env/bin/activate
# 2. Install TF with bundled CUDA support
pip install --upgrade "tensorflow[and-cuda]"  # quote the extra so shells like zsh don't glob the brackets
Verification:
import tensorflow as tf
print(tf.config.list_physical_devices('GPU'))
print(tf.sysconfig.get_build_info()) # Confirm CUDA versions
Expected Output
[PhysicalDevice(name='/physical_device:GPU:0', device_type='GPU')]
{'cuda_version': '12.3', ...} # CUDA version matching your system
2. Use Anaconda for Environment Management
Best for Cross-Platform Stability
Conda handles complex CUDA dependencies automatically through pre-built channels.
conda create -n tf-gpu tensorflow-gpu
conda activate tf-gpu
3. Fix CUDA Library Path Errors
Required When Not Using Bundled CUDA
If you manage LD_LIBRARY_PATH manually, ensure all CUDA libraries are discoverable:
# Add CUDA paths to library search path (customize version)
export LD_LIBRARY_PATH=/usr/local/cuda-12.0/lib64:$LD_LIBRARY_PATH
# Verify library discovery
ldconfig -N -v 2>/dev/null | grep libcudart
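The same discovery check can be scripted from Python's standard library; ctypes.util.find_library consults the regular loader search paths (a sketch of the idea, not TensorFlow's own lookup logic):

```python
import ctypes.util

def check_cuda_libs(names=("cuda", "cudart", "cudnn")):
    """Map each library stem to the name/path the dynamic loader
    resolves, or None when it is not discoverable."""
    return {name: ctypes.util.find_library(name) for name in names}

if __name__ == "__main__":
    for name, resolved in check_cuda_libs().items():
        print(f"lib{name}: {resolved or 'NOT FOUND'}")
```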
4. Fix NUMA Node Warning
For "negative NUMA node" errors
This kernel-related warning can be resolved by forcing NUMA node 0:
for node in /sys/bus/pci/devices/*/numa_node; do
[ "$(cat "$node")" == "-1" ] && echo 0 | sudo tee "$node"
done
Persistent fix:
# Find your GPU's PCI address
lspci | grep -i nvidia
# Apply NUMA override (replace 0000:01:00.0)
echo 'ACTION=="add", SUBSYSTEM=="pci", KERNEL=="0000:01:00.0", ATTR{numa_node}="0"' | sudo tee /etc/udev/rules.d/99-numa.rules
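To confirm the override took effect (after either the transient loop or the udev rule), the numa_node values can be read back from sysfs. A small sketch, assuming a Linux sysfs layout (numa_report is a hypothetical helper):

```python
from pathlib import Path

def numa_report(sys_root: str = "/sys/bus/pci/devices") -> dict:
    """Return {pci_address: numa_node} for every PCI device;
    empty dict when sysfs is unavailable (non-Linux, some containers)."""
    root = Path(sys_root)
    if not root.is_dir():
        return {}
    report = {}
    for dev in sorted(root.iterdir()):
        node_file = dev / "numa_node"
        if node_file.is_file():
            report[dev.name] = int(node_file.read_text().strip())
    return report

if __name__ == "__main__":
    for addr, node in numa_report().items():
        flag = "  <-- still -1, override not applied" if node == -1 else ""
        print(f"{addr}: numa_node={node}{flag}")
```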
5. Manual CUDA Downgrade (If TF Versions Require CUDA 11)
Legacy Workaround Only
Use this only when you are pinned to a TensorFlow release that predates CUDA 12 support and none of the above solutions work.
- Uninstall existing CUDA:
sudo apt-get purge "*cuda*" "*cublas*" "*cufft*" "*cusparse*"
Common Mistakes to Avoid
Incorrect Package Installation
# WRONG: Pinned version prevents dependency resolution
pip install tensorflow[and-cuda]==2.12.0
# CORRECT: Install latest compatible versions
pip install tensorflow[and-cuda]
Unverified Virtual Environments
# BEFORE: missing GPU
import tensorflow as tf
tf.config.list_physical_devices('GPU') ➜ []
# AFTER: new virtual environment created
source new_venv/bin/activate
pip install "tensorflow[and-cuda]"
python -c "import tensorflow as tf; print(tf.config.list_physical_devices('GPU'))" ➜ [PhysicalDevice(...)]
Verification Workflow
- Hardware Check
nvidia-smi # Verify driver and GPU detection
nvcc --version # Check compiler version
- Library Validation
# Check critical libraries (example)
ldconfig -p | grep libcuda.so
- TensorFlow Tests
# GPU Availability
print("GPUs:", tf.config.list_physical_devices('GPU'))
- Environment Diagnostics
# Display TF compilation details
print(tf.sysconfig.get_build_info())
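The four steps above can be folded into one diagnostic that degrades gracefully when TensorFlow is absent, so it is safe to run in any environment (a sketch; tf_gpu_status is an illustrative name):

```python
import importlib.util

def tf_gpu_status() -> str:
    """Summarize TensorFlow's GPU view in one line; returns a
    plain message instead of raising when TF is not installed."""
    if importlib.util.find_spec("tensorflow") is None:
        return "tensorflow not installed"
    import tensorflow as tf
    gpus = tf.config.list_physical_devices("GPU")
    cuda = tf.sysconfig.get_build_info().get("cuda_version", "n/a")
    return f"GPUs: {len(gpus)}, built against CUDA: {cuda}"

if __name__ == "__main__":
    print(tf_gpu_status())
```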