:original_name: cce_10_0345.html

.. _cce_10_0345:

GPU Scheduling
==============

You can use GPUs in CCE containers.

Prerequisites
-------------

-  A GPU node has been created. For details, see :ref:`Creating a Node `.

-  The gpu-beta add-on has been installed. During the installation, select the GPU driver on the node. For details, see :ref:`gpu-beta `.

-  gpu-beta mounts the driver directory to **/usr/local/nvidia/lib64**. To use GPU resources in a container, you need to add **/usr/local/nvidia/lib64** to the **LD_LIBRARY_PATH** environment variable. Generally, you can use any of the following methods to do so:

   #. Configure the **LD_LIBRARY_PATH** environment variable in the Dockerfile used for creating an image. (Recommended)

      .. code-block::

         ENV LD_LIBRARY_PATH /usr/local/nvidia/lib64:$LD_LIBRARY_PATH

   #. Configure the **LD_LIBRARY_PATH** environment variable in the image startup command.

      .. code-block::

         /bin/bash -c "export LD_LIBRARY_PATH=/usr/local/nvidia/lib64:$LD_LIBRARY_PATH && ..."

   #. Define the **LD_LIBRARY_PATH** environment variable when creating a workload. (Ensure that this variable is not already configured in the container. Otherwise, it will be overwritten.)

      .. code-block::

         env:
           - name: LD_LIBRARY_PATH
             value: /usr/local/nvidia/lib64

Using GPUs
----------

Create a workload and request GPUs. You can specify the number of GPUs as follows:

.. code-block::

   apiVersion: apps/v1
   kind: Deployment
   metadata:
     name: gpu-test
     namespace: default
   spec:
     replicas: 1
     selector:
       matchLabels:
         app: gpu-test
     template:
       metadata:
         labels:
           app: gpu-test
       spec:
         containers:
         - image: nginx:perl
           name: container-0
           resources:
             requests:
               cpu: 250m
               memory: 512Mi
               nvidia.com/gpu: 1   # Number of requested GPUs
             limits:
               cpu: 250m
               memory: 512Mi
               nvidia.com/gpu: 1   # Maximum number of GPUs that can be used
         imagePullSecrets:
         - name: default-secret

**nvidia.com/gpu** specifies the number of GPUs to be requested. The value can be smaller than **1**. For example, **nvidia.com/gpu: 0.5** indicates that multiple pods share one GPU. In this case, all the requested GPU resources come from the same GPU card.

After **nvidia.com/gpu** is specified, workloads will not be scheduled to nodes without GPUs. If GPU resources on the node are insufficient, Kubernetes events similar to the following are reported:

-  0/2 nodes are available: 2 Insufficient nvidia.com/gpu.
-  0/4 nodes are available: 1 InsufficientResourceOnSingleGPU, 3 Insufficient nvidia.com/gpu.

To use GPUs on the CCE console, select the GPU quota and specify the percentage of GPUs reserved for the container when creating a workload.

.. figure:: /_static/images/en-us_image_0000001569022929.png
   :alt: **Figure 1** Using GPUs

   **Figure 1** Using GPUs
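After the workload is running, you can check whether the GPU is visible inside the container, for example by running **nvidia-smi**. The following is a minimal sketch, not taken from this document: the pod name **gpu-test-xxx** is a placeholder, and it assumes the driver binaries are mounted under **/usr/local/nvidia/bin** alongside the libraries mentioned above.

.. code-block::

   # Replace gpu-test-xxx with the actual pod name (kubectl get pod -n default).
   $ kubectl exec -it gpu-test-xxx -n default -- /usr/local/nvidia/bin/nvidia-smi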
GPU Node Labels
---------------

CCE will label GPU-enabled nodes after they are created. Different types of GPU-enabled nodes have different labels.

.. code-block::

   $ kubectl get node -L accelerator
   NAME           STATUS   ROLES    AGE     VERSION                                    ACCELERATOR
   10.100.2.179   Ready             8m43s   v1.19.10-r0-CCE21.11.1.B006-21.11.1.B006   nvidia-t4

When using GPUs, you can enable the affinity between pods and nodes based on labels so that the pods can be scheduled to the correct nodes.

.. code-block::

   apiVersion: apps/v1
   kind: Deployment
   metadata:
     name: gpu-test
     namespace: default
   spec:
     replicas: 1
     selector:
       matchLabels:
         app: gpu-test
     template:
       metadata:
         labels:
           app: gpu-test
       spec:
         nodeSelector:
           accelerator: nvidia-t4
         containers:
         - image: nginx:perl
           name: container-0
           resources:
             requests:
               cpu: 250m
               memory: 512Mi
               nvidia.com/gpu: 1   # Number of requested GPUs
             limits:
               cpu: 250m
               memory: 512Mi
               nvidia.com/gpu: 1   # Maximum number of GPUs that can be used
         imagePullSecrets:
         - name: default-secret
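If you prefer node affinity rules over **nodeSelector**, the same constraint can be expressed with **nodeAffinity**. The following fragment is a sketch based on the standard Kubernetes API rather than on this document; it replaces the **nodeSelector** field in the pod template spec above and assumes the node carries the **accelerator=nvidia-t4** label shown earlier.

.. code-block::

   spec:                                    # Pod template spec
     affinity:
       nodeAffinity:
         # Schedule only to nodes whose accelerator label is nvidia-t4.
         requiredDuringSchedulingIgnoredDuringExecution:
           nodeSelectorTerms:
           - matchExpressions:
             - key: accelerator
               operator: In
               values:
               - nvidia-t4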