I'm running Kubernetes on Docker Desktop with WSL2 and have configured GPU support using the NVIDIA GPU Operator and the NVIDIA Device Plugin.
What I’ve Tried:
GPU Confirmed Working in WSL2:
- nvidia-smi runs correctly and detects my NVIDIA RTX 4070 GPU.
- Running a CUDA container works as expected:
docker run --rm --gpus all nvidia/cuda:12.6.2-base-ubuntu22.04 nvidia-smi
- The output shows the correct CUDA version and GPU details.
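In case it is relevant, the NVML library itself can also be checked directly on the WSL2 side. This is a minimal sketch assuming the usual WSL2 layout, where the Windows driver exposes its user-space libraries under /usr/lib/wsl/lib:
# The WSL2 GPU driver normally mounts libnvidia-ml here
ls -l /usr/lib/wsl/lib/libnvidia-ml.so*
# Confirm the dynamic linker can resolve it
ldconfig -p | grep -i libnvidia-ml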
Issue: Kubernetes Does Not Detect the GPU
- Running kubectl get nodes -o=jsonpath='{.items[*].status.allocatable}' does not list any GPU resources.
- Logs from kubectl logs -n gpu-operator -l app=nvidia-device-plugin-daemonset show the error 'NVML not found'.
- Although the NVIDIA GPU Operator and the Device Plugin are running, the GPU is not detected in Kubernetes.
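Once the GPU does show up as allocatable, I expect a smoke test like the one below to run nvidia-smi inside the cluster. This is only a sketch of how I plan to verify it: nvidia.com/gpu is the resource name the device plugin normally advertises, and the image tag mirrors the docker run test above.
kubectl apply -f - <<EOF
apiVersion: v1
kind: Pod
metadata:
  name: gpu-smoke-test
spec:
  restartPolicy: Never
  containers:
  - name: cuda
    image: nvidia/cuda:12.6.2-base-ubuntu22.04
    command: ["nvidia-smi"]
    resources:
      limits:
        nvidia.com/gpu: 1
EOF
# With no nvidia.com/gpu allocatable, the scheduler keeps this pod Pending
# (Insufficient nvidia.com/gpu), which matches the empty allocatable output above.
kubectl get pod gpu-smoke-test
kubectl logs gpu-smoke-test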
Steps Taken:
- Ensured that nvidia-container-runtime is set correctly by editing /etc/docker/daemon.json as follows:
{
  "default-runtime": "nvidia",
  "runtimes": {
    "nvidia": {
      "path": "/usr/bin/nvidia-container-runtime",
      "runtimeArgs": []
    }
  }
}
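To confirm the engine actually picked this configuration up, the registered runtimes and the default runtime can be queried straight from the docker CLI, and the nvidia runtime can be forced explicitly. A quick sketch, nothing GPU-Operator-specific:
# Show the default runtime and all registered runtimes as the engine reports them
docker info --format 'default runtime: {{.DefaultRuntime}}'
docker info --format '{{json .Runtimes}}'
# Bypass the default-runtime setting and select the nvidia runtime explicitly
docker run --rm --runtime=nvidia --gpus all nvidia/cuda:12.6.2-base-ubuntu22.04 nvidia-smi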
Questions:
- Why is Kubernetes failing to detect the GPU?
- Why does the NVIDIA Device Plugin report 'could not load NVML library'?
- Is a special configuration needed for the GPU Operator to work in WSL2?
- What alternative debugging steps can confirm that NVML is correctly installed?
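For the last question, the only extra NVML check I can think of is to look inside the device-plugin pod itself. This is a rough sketch: the label matches the log command above, while the availability of sh/ldconfig and the library path inside the device-plugin image are assumptions on my part.
# Pick the device-plugin pod via the same label used for the logs above
POD=$(kubectl get pods -n gpu-operator -l app=nvidia-device-plugin-daemonset -o name | head -n 1)
# Check whether an NVML library is visible from inside that pod
kubectl exec -n gpu-operator "$POD" -- sh -c 'ldconfig -p | grep -i nvidia-ml; ls -l /usr/lib/x86_64-linux-gnu/libnvidia-ml.so* 2>/dev/null'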
System Information:
- OS: Ubuntu 24.04 LTS (WSL2)
- Kubernetes: Docker Desktop with WSL2
- NVIDIA Driver: 566.36
- CUDA Version: 12.7 (verified with nvidia-smi)
- NVIDIA Container Toolkit: latest version installed (nvidia-container-toolkit)
- NVIDIA GPU: RTX 4070 Laptop GPU
- Docker runtime (output of docker info | grep -i runtime):
Runtimes: io.containerd.runc.v2 nvidia runc
Default Runtime: runc
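One more data point that may be useful: I am not sure whether Docker Desktop's Kubernetes node uses the same runtime configuration as the docker CLI inside WSL2, so the node-side view can be checked as well. In this sketch, docker-desktop is the default node name Docker Desktop creates, so that part is an assumption:
# The CONTAINER-RUNTIME column shows which runtime the kubelet is actually talking to
kubectl get nodes -o wide
# Per the jsonpath output above, no nvidia.com/gpu entry shows up under Allocatable
kubectl describe node docker-desktop | grep -iA 15 allocatable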
If anyone has successfully set up the NVIDIA GPU Operator in a WSL2 environment with Kubernetes or has insights on these issues, your help would be greatly appreciated!