:original_name: cce_10_0345.html

.. _cce_10_0345:

GPU Scheduling
==============

You can use GPUs in CCE containers.

Prerequisites
-------------

- A GPU node has been created. For details, see :ref:`Creating a Node <cce_10_0363>`.
- The gpu-beta add-on has been installed. During the installation, select the GPU driver for the node. For details, see :ref:`gpu-beta <cce_10_0141>`.
- gpu-beta mounts the driver directory to **/usr/local/nvidia/lib64**. To use GPU resources in a container, add **/usr/local/nvidia/lib64** to the **LD_LIBRARY_PATH** environment variable.

  Generally, you can use any of the following methods to set this variable:

  #. Configure the **LD_LIBRARY_PATH** environment variable in the Dockerfile used for creating the image. (Recommended; see the Dockerfile sketch after this list.)

     .. code-block::

        ENV LD_LIBRARY_PATH /usr/local/nvidia/lib64:$LD_LIBRARY_PATH

  #. Configure the **LD_LIBRARY_PATH** environment variable in the image startup command.

     .. code-block::

        /bin/bash -c "export LD_LIBRARY_PATH=/usr/local/nvidia/lib64:$LD_LIBRARY_PATH && ..."

  #. Define the **LD_LIBRARY_PATH** environment variable when creating a workload. (Ensure that this variable is not already configured in the container. Otherwise, it will be overwritten.)

     .. code-block::

        env:
          - name: LD_LIBRARY_PATH
            value: /usr/local/nvidia/lib64
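
  For reference, here is a minimal Dockerfile sketch of the first method. The base image and application binary are placeholders; replace them with the ones your workload actually uses.

  .. code-block::

     # Example CUDA base image (placeholder; use the image your workload is built on).
     FROM nvidia/cuda:11.8.0-runtime-ubuntu20.04
     # Make the driver libraries mounted by gpu-beta visible to the dynamic linker.
     ENV LD_LIBRARY_PATH /usr/local/nvidia/lib64:$LD_LIBRARY_PATH
     # Placeholder application binary.
     COPY my-gpu-app /usr/local/bin/my-gpu-app
     CMD ["/usr/local/bin/my-gpu-app"]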

Using GPUs
----------

Create a workload and request GPUs. You can specify the number of GPUs as follows:

.. code-block::

   apiVersion: apps/v1
   kind: Deployment
   metadata:
     name: gpu-test
     namespace: default
   spec:
     replicas: 1
     selector:
       matchLabels:
         app: gpu-test
     template:
       metadata:
         labels:
           app: gpu-test
       spec:
         containers:
         - image: nginx:perl
           name: container-0
           resources:
             requests:
               cpu: 250m
               memory: 512Mi
               nvidia.com/gpu: 1   # Number of requested GPUs
             limits:
               cpu: 250m
               memory: 512Mi
               nvidia.com/gpu: 1   # Maximum number of GPUs that can be used
         imagePullSecrets:
         - name: default-secret
**nvidia.com/gpu** specifies the number of GPUs to be requested. The value can be smaller than **1**. For example, **nvidia.com/gpu: 0.5** indicates that multiple pods share a GPU. In this case, all the requested GPU resources come from the same GPU card.
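
For example, a sketch of a half-GPU request (only the **resources** section is shown; the remaining Deployment fields are the same as above):

.. code-block::

   resources:
     requests:
       nvidia.com/gpu: 0.5   # Half a GPU; pods sharing the card draw from the same physical GPU
     limits:
       nvidia.com/gpu: 0.5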

After **nvidia.com/gpu** is specified, workloads will not be scheduled to nodes without GPUs. If GPU resources are insufficient, Kubernetes events similar to the following are reported:

- 0/2 nodes are available: 2 Insufficient nvidia.com/gpu.
- 0/4 nodes are available: 1 InsufficientResourceOnSingleGPU, 3 Insufficient nvidia.com/gpu.
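
You can inspect these events on a pending pod with **kubectl describe** (the pod name below is a placeholder):

.. code-block::

   kubectl describe pod <pod-name> -n default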

To use GPUs on the CCE console, select the GPU quota and specify the percentage of GPUs reserved for the container when creating a workload.

.. figure:: /_static/images/en-us_image_0000001569022929.png
   :alt: **Figure 1** Using GPUs

   **Figure 1** Using GPUs

GPU Node Labels
---------------

CCE will label GPU-enabled nodes after they are created. Different types of GPU-enabled nodes have different labels.

.. code-block::

   $ kubectl get node -L accelerator
   NAME           STATUS   ROLES    AGE     VERSION                                    ACCELERATOR
   10.100.2.179   Ready    <none>   8m43s   v1.19.10-r0-CCE21.11.1.B006-21.11.1.B006   nvidia-t4

When using GPUs, you can configure pod-node affinity based on these labels so that pods are scheduled to nodes with the required GPU type.

.. code-block::

   apiVersion: apps/v1
   kind: Deployment
   metadata:
     name: gpu-test
     namespace: default
   spec:
     replicas: 1
     selector:
       matchLabels:
         app: gpu-test
     template:
       metadata:
         labels:
           app: gpu-test
       spec:
         nodeSelector:
           accelerator: nvidia-t4
         containers:
         - image: nginx:perl
           name: container-0
           resources:
             requests:
               cpu: 250m
               memory: 512Mi
               nvidia.com/gpu: 1   # Number of requested GPUs
             limits:
               cpu: 250m
               memory: 512Mi
               nvidia.com/gpu: 1   # Maximum number of GPUs that can be used
         imagePullSecrets:
         - name: default-secret
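
To verify the scheduling result, apply the manifest and check which node the pod runs on (**gpu-test.yaml** is a placeholder file name):

.. code-block::

   kubectl apply -f gpu-test.yaml
   kubectl get pod -n default -l app=gpu-test -o wide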