Add the GPU Operator to a Rancher-deployed k8s environment
Set up Helm
# Install Helm, which is used to add the gpu-operator (macOS)
brew install helm
Add the Nvidia repo
helm repo add nvidia https://nvidia.github.io/gpu-operator
helm repo update
Install the GPU operator
david@Davids-MacBook-Air-2 ~ % helm install --wait --generate-name \
-n gpu-operator --create-namespace \
nvidia/gpu-operator
NAME: gpu-operator-1676571811
LAST DEPLOYED: Thu Feb 16 18:23:33 2023
NAMESPACE: gpu-operator
STATUS: deployed
REVISION: 1
TEST SUITE: None
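Because the install uses --generate-name, the release name (gpu-operator-1676571811 above) is only known after the command finishes, and it is needed for any later helm upgrade or helm uninstall. The following is a minimal sketch of pulling it out of the captured install output. The function name release_name is hypothetical and not part of Helm.

```shell
# Hypothetical helper: reads `helm install` output on stdin and prints the
# value of the "NAME:" line, i.e. the generated release name.
release_name() {
  # Split on ": " so that "NAMESPACE: ..." does not match the NAME field.
  awk -F': ' '$1 == "NAME" { print $2; exit }'
}

# Usage with saved output (assumes the output was written to install.log):
#   release_name < install.log
```

On a live cluster, helm list -n gpu-operator -q prints the release names in the namespace directly, so the parser is only useful when working from saved output.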
Check the status after the install
david@Davids-MacBook-Air-2 ~ % kubectl get pods -n gpu-operator
NAME                                                              READY   STATUS      RESTARTS        AGE
gpu-feature-discovery-rxh9p                                       1/1     Running     0               13h
gpu-operator-1676571811-node-feature-discovery-master-5d45zf949   1/1     Running     0               13h
gpu-operator-1676571811-node-feature-discovery-worker-zkqhn       1/1     Running     0               13h
gpu-operator-6c4c6f484-k97n9                                      1/1     Running     0               13h
nvidia-container-toolkit-daemonset-snzzv                          1/1     Running     0               13h
nvidia-cuda-validator-vwldd                                       0/1     Completed   0               7h10m
nvidia-dcgm-exporter-bmbwn                                        1/1     Running     0               13h
nvidia-device-plugin-daemonset-9jvxm                              1/1     Running     0               13h
nvidia-device-plugin-validator-dlblr                              0/1     Completed   0               7h10m
nvidia-driver-daemonset-lppgj                                     1/1     Running     32 (7h19m ago)  13h
nvidia-mig-manager-lrx6m                                          1/1     Running     0               13h
nvidia-operator-validator-5dtlz                                   1/1     Running     0               13h
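In the listing above every pod is either Running or Completed (the two validator pods run once and exit). As a rough sketch, that check can be automated by filtering the kubectl get pods output. The function name unhealthy_pods is hypothetical, and the filter assumes the default column layout where STATUS is the third column.

```shell
# Hypothetical helper: reads `kubectl get pods` output on stdin and prints
# the name and status of every pod that is neither Running nor Completed.
unhealthy_pods() {
  # NR > 1 skips the header row; $3 is the STATUS column.
  awk 'NR > 1 && $3 != "Running" && $3 != "Completed" { print $1 " " $3 }'
}

# Usage against a live cluster (needs kubectl and cluster access):
#   kubectl get pods -n gpu-operator | unhealthy_pods
```

Empty output means nothing needs attention. The filter does not look at the RESTARTS column, so a pod such as the driver daemonset with 32 restarts still passes as long as it is currently Running.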
Run a GPU test job - nvidia-smi
david@Davids-MacBook-Air-2 ~ % kubectl run gpu-test \
--rm -t -i \
--restart=Never \
--image=nvcr.io/nvidia/cuda:10.1-base-ubuntu18.04 nvidia-smi
Fri Feb 17 08:09:04 2023
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 525.60.13    Driver Version: 525.60.13    CUDA Version: 12.0     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  NVIDIA A100-SXM...  On   | 00000000:00:05.0 Off |                    0 |
| N/A   26C    P0    50W / 400W |      0MiB / 81920MiB |      0%      Default |
|                               |                      |             Disabled |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+
pod "gpu-test" deleted
david@Davids-MacBook-Air-2 ~ %
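The kubectl run test above does not ask the scheduler for a GPU. A workload would normally request one through the nvidia.com/gpu resource that the device plugin advertises. The sketch below prints a Pod manifest that does this, reusing the image from the test above. The function name gpu_test_manifest and its arguments are hypothetical, and the manifest is an illustration rather than the configuration used on this cluster.

```shell
# Hypothetical helper: prints a Pod manifest that runs nvidia-smi and
# requests GPUs via the nvidia.com/gpu resource limit.
# Optional arguments: $1 pod name, $2 container image, $3 GPU count.
gpu_test_manifest() {
  cat <<EOF
apiVersion: v1
kind: Pod
metadata:
  name: ${1:-gpu-test}
spec:
  restartPolicy: Never
  containers:
  - name: ${1:-gpu-test}
    image: ${2:-nvcr.io/nvidia/cuda:10.1-base-ubuntu18.04}
    command: ["nvidia-smi"]
    resources:
      limits:
        nvidia.com/gpu: ${3:-1}
EOF
}

# Usage against a live cluster (needs kubectl and cluster access):
#   gpu_test_manifest | kubectl apply -f -
#   kubectl logs gpu-test
#   kubectl delete pod gpu-test
```

With an explicit limit, the pod stays Pending if no node has a free GPU, which makes scheduling problems easier to tell apart from driver problems.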