This feature is offered as a "tech preview".
We release this as a tech preview in order to get early feedback from our customers. Tech previews are largely untested and unsupported, and thus not ready for production use.
That said, we believe that releasing this technology at this stage is valuable: it allows us to make the right improvements based on your feedback. A fully supported, production-ready release is planned for a later date.
Graphics Processing Units (GPUs) provide a powerful way to run compute-intensive workloads such as machine learning pipelines. SUSE’s CaaS Platform supports scheduling GPU-dependent workloads on NVIDIA GPUs as a technical preview. This section illustrates how to prepare your host machine to expose GPU devices to your containers, and how to configure Kubernetes to schedule GPU-dependent workloads.
Not every worker node in the cluster needs to have a GPU device present. On the nodes that do have one or more NVIDIA GPUs, install the drivers from NVIDIA’s repository.
# zypper addrepo --refresh https://download.nvidia.com/suse/sle15sp2/ nvidia
# zypper refresh
# zypper install nvidia-glG05 nvidia-computeG05
For most modern NVIDIA GPUs, the G05 driver will support your device. Check NVIDIA’s documentation for your GPU device model.
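If you are unsure which model is installed, you can list the NVIDIA devices detected on the host first (a quick check; the exact output depends on your hardware):
# lspci | grep -i nvidia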
OCI hooks are a way for vendors or projects to inject executable actions into the lifecycle of a container managed by the container runtime (runc). SUSE provides an OCI hook for NVIDIA GPUs that enables the container runtime, and therefore the kubelet and the Kubernetes scheduler, to query the host system for the presence of a GPU device and access it directly. Install the hook on the worker nodes with GPUs:
# zypper install nvidia-container-toolkit
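To confirm the hook is in place before running any workloads, you can list the OCI hook definitions on the node. The path below is the usual location used by the nvidia-container-toolkit package; it is given here as an assumption and may differ on your system:
# ls /usr/share/containers/oci/hooks.d/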
At this point, you should be able to run a container image that requires a GPU and access the device directly from the running container, for example using Podman:
# podman run docker.io/nvidia/cuda nvidia-smi
If that is not working, check the following:
Ensure your GPU is visible from the host system:
# lspci | grep -i nvidia
# nvidia-smi
Ensure the kernel modules are loaded:
# lsmod | grep nvidia
If they are not, try loading them explicitly and check dmesg for an error indicating why they are missing:
# nvidia-modprobe
# dmesg | tail
The Kubernetes device plugin framework allows the kubelet to advertise system hardware resources that the Kubernetes scheduler can then use as hints to schedule workloads that require such devices. The Kubernetes device plugin from NVIDIA allows the kubelet to advertise NVIDIA GPUs it finds present on the worker node. Install the device plugin using kubectl:
$ kubectl create -f https://raw.githubusercontent.com/NVIDIA/k8s-device-plugin/1.0.0-beta6/nvidia-device-plugin.yml
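Once the device plugin is running, each worker node with a GPU should advertise the nvidia.com/gpu resource in its capacity and allocatable fields. As a quick check (worker0 is just an example node name), run:
$ kubectl describe node worker0 | grep nvidia.com/gpu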
In a heterogeneous cluster, it may be preferable to prevent scheduling pods that do not require a GPU on nodes with a GPU, in order to ensure that GPU workloads are not competing for time on the hardware they need to run. To accomplish this, add a taint to the worker nodes that have GPUs:
$ kubectl taint nodes worker0 nvidia.com/gpu=:PreferNoSchedule
or
$ kubectl taint nodes worker0 nvidia.com/gpu=:NoSchedule
See the Kubernetes documentation on taints and tolerations for a discussion of the considerations for using the NoSchedule or PreferNoSchedule effects. If you use the NoSchedule effect, you must also add the appropriate toleration to infrastructure-critical DaemonSets that must run on all nodes, such as the kured, kube-proxy, and cilium DaemonSets.
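For reference, a toleration matching the NoSchedule taint above would look like the following snippet in the pod template of the DaemonSet you are editing (shown here as a sketch, not taken from the upstream manifests):
tolerations:
- key: nvidia.com/gpu
  operator: Exists
  effect: NoSchedule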
The ExtendedResourceToleration admission controller is enabled on SUSE CaaS Platform v5 by default. This is a mutating admission controller that reviews all pod requests and adds tolerations to any pod that requests an extended resource advertised by a device plugin. For the NVIDIA GPU device plugin, it will automatically add the nvidia.com/gpu toleration to pods that request the nvidia.com/gpu resource, so you will not need to add this toleration explicitly for every GPU workload.
To test your installation you can create a pod that requests GPU devices:
$ kubectl apply -f - <<EOF
apiVersion: v1
kind: Pod
metadata:
  name: gpu-pod
spec:
  containers:
  - name: cuda-container
    image: nvidia/cuda:9.0-devel
    resources:
      limits:
        nvidia.com/gpu: 1 # requesting 1 GPU
  - name: digits-container
    image: nvidia/digits:6.0
    resources:
      limits:
        nvidia.com/gpu: 1 # requesting 1 GPU
EOF
This example requests a total of two GPUs for two containers. If two GPUs are available on a worker in your cluster, this pod will be scheduled to that worker.
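Once the pod has been created, you can optionally confirm that the ExtendedResourceToleration admission controller added the nvidia.com/gpu toleration automatically (a simple check using the pod name from the example above):
$ kubectl get pod gpu-pod -o jsonpath='{.spec.tolerations}'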
After a few moments, your pod should transition to the "Running" state. If it does not, check the following:
Examine the pod events for an indication of why it is not being scheduled:
$ kubectl describe pod gpu-pod
Examine the events for the device plugin DaemonSet for any issues:
$ kubectl describe daemonset nvidia-device-plugin-daemonset --namespace kube-system
Check the logs of each pod in the DaemonSet running on a worker that has a GPU:
$ kubectl logs -l name=nvidia-device-plugin-ds --namespace kube-system
Check the kubelet log on the worker node that has a GPU. This may reveal errors the container runtime encountered while executing the OCI hook command:
# journalctl -u kubelet
If you have configured Monitoring for your cluster, you may want to use NVIDIA’s Data Center GPU Manager (DCGM) to monitor your GPUs. DCGM integrates with the Prometheus and Grafana services configured for your cluster. Follow the steps below to configure the Prometheus exporter and Grafana dashboard for your NVIDIA GPUs.
The DCGM exporter requires use of the hostPath volume type to access the kubelet socket on the host worker node. Create an appropriate Pod Security Policy and RBAC configuration to allow this:
$ kubectl apply -f - <<EOF
---
apiVersion: policy/v1beta1
kind: PodSecurityPolicy
metadata:
  name: nvidia.dcgm
spec:
  privileged: false
  seLinux:
    rule: RunAsAny
  supplementalGroups:
    rule: RunAsAny
  runAsUser:
    rule: RunAsAny
  fsGroup:
    rule: RunAsAny
  allowedHostPaths:
  - pathPrefix: /var/lib/kubelet/pod-resources
  volumes:
  - hostPath
  - configMap
  - secret
  - emptyDir
  - downwardAPI
  - projected
  - persistentVolumeClaim
  - nfs
  - rbd
  - cephFS
  - glusterfs
  - fc
  - iscsi
  - cinder
  - gcePersistentDisk
  - awsElasticBlockStore
  - azureDisk
  - azureFile
  - vsphereVolume
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: nvidia:dcgm
rules:
- apiGroups:
  - policy
  resources:
  - podsecuritypolicies
  verbs:
  - use
  resourceNames:
  - nvidia.dcgm
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: nvidia:dcgm
roleRef:
  kind: ClusterRole
  name: nvidia:dcgm
  apiGroup: rbac.authorization.k8s.io
subjects:
- kind: Group
  name: system:serviceaccounts:dcgm
  apiGroup: rbac.authorization.k8s.io
EOF
The DCGM exporter monitors GPUs on each worker node and exposes metrics that can be queried.
$ kubectl create namespace dcgm
$ kubectl create --namespace dcgm -f https://raw.githubusercontent.com/NVIDIA/gpu-monitoring-tools/master/dcgm-exporter.yaml
Check that the metrics are being collected:
$ NAME=$(kubectl get pods --namespace dcgm -l "app.kubernetes.io/name=dcgm-exporter" -o "jsonpath={ .items[0].metadata.name}")
$ kubectl port-forward --namespace dcgm $NAME 8080:9400
$ # in another terminal
$ curl http://127.0.0.1:8080/metrics
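The output should include the DCGM metrics exposed by the exporter. For example, assuming the exporter’s default metric set, you can filter for GPU utilization:
$ curl -s http://127.0.0.1:8080/metrics | grep DCGM_FI_DEV_GPU_UTIL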
After deploying Prometheus as explained in Monitoring, configure Prometheus to monitor the DCGM pods. Gather the cluster IPs of the pods to monitor:
$ kubectl get pods --namespace dcgm -l "app.kubernetes.io/name=dcgm-exporter" -o "jsonpath={ .items[*].status.podIP}"
10.244.1.10 10.244.2.68
Add the DCGM pods to Prometheus’s scrape configuration. Edit the Prometheus configmap:
$ kubectl edit --namespace monitoring configmap prometheus-server
Under the scrape_configs section, add a new job using the pod IPs found above:
scrape_configs:
...
- job_name: dcgm
  static_configs:
  - targets: ['10.244.1.10:9400', '10.244.2.68:9400']
...
Prometheus will automatically reload the new configuration.
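To verify that the new targets are being scraped, you can run a query such as the following in the Prometheus expression browser; each DCGM exporter target should report the value 1 (the job label matches the job_name configured above):
up{job="dcgm"}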
Import the DCGM Exporter dashboard into Grafana.
In the Grafana web interface, open the dashboard import page. In the Import via grafana.com field, enter the dashboard ID 12219, and click Load.
Alternatively, download the dashboard JSON definition and upload it using the JSON upload button.
On the next page, select your Prometheus data source from the dropdown menu and click Import.
The new dashboard will appear in the Grafana web interface.