A200161411/cloud-container-engine

proposalbot 85e1a6ed92 Changes to cce_umn from docs/doc-exports#770 (Added the support of the OS for fe

Reviewed-by: Eotvos, Oliver <oliver.eotvos@t-systems.com>
Co-authored-by: proposalbot <proposalbot@otc-service.com>
Co-committed-by: proposalbot <proposalbot@otc-service.com>

2023-06-20 14:44:25 +00:00

original_name: cce_10_0345.html

GPU Scheduling

You can use GPUs in CCE containers.

Prerequisites

A GPU node has been created. For details, see Creating a Node <cce_10_0363>.
The gpu-beta add-on has been installed. During the installation, select the GPU driver on the node. For details, see gpu-beta <cce_10_0141>.
gpu-beta mounts the driver directory to /usr/local/nvidia/lib64. To use GPU resources in a container, you need to add /usr/local/nvidia/lib64 to the LD_LIBRARY_PATH environment variable.

Generally, you can use any of the following methods to add the path to LD_LIBRARY_PATH:
1. Configure the LD_LIBRARY_PATH environment variable in the Dockerfile used for creating an image. (Recommended)
```
ENV LD_LIBRARY_PATH=/usr/local/nvidia/lib64:$LD_LIBRARY_PATH
```
2. Configure the LD_LIBRARY_PATH environment variable in the image startup command.
```
/bin/bash -c "export LD_LIBRARY_PATH=/usr/local/nvidia/lib64:$LD_LIBRARY_PATH && ..."
```
3. Define the LD_LIBRARY_PATH environment variable when creating a workload. (Ensure that the image does not already configure this variable. Otherwise, the value configured in the image will be overwritten by the value defined in the workload.)
```
env:
  - name: LD_LIBRARY_PATH
    value: /usr/local/nvidia/lib64
```
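Whichever method is used, the result can be checked from inside the container. The following is a minimal sketch, not part of CCE or gpu-beta, that tests whether an LD_LIBRARY_PATH value contains the driver directory and prepends it if it is missing. The helper names `has_driver_path` and `with_driver_path` are hypothetical and introduced only for illustration.

```python
import os

# Driver directory mounted by gpu-beta (taken from the text above).
NVIDIA_LIB_DIR = "/usr/local/nvidia/lib64"


def has_driver_path(ld_library_path: str) -> bool:
    """Return True if the driver directory is one of the colon-separated entries."""
    entries = [entry.rstrip("/") for entry in ld_library_path.split(":") if entry]
    return NVIDIA_LIB_DIR in entries


def with_driver_path(ld_library_path: str) -> str:
    """Prepend the driver directory unless it is already present."""
    if has_driver_path(ld_library_path):
        return ld_library_path
    if not ld_library_path:
        return NVIDIA_LIB_DIR
    return NVIDIA_LIB_DIR + ":" + ld_library_path


if __name__ == "__main__":
    current = os.environ.get("LD_LIBRARY_PATH", "")
    print("driver path present:", has_driver_path(current))
    print("suggested value:", with_driver_path(current))
```

Running the script in the container prints whether the current environment already includes the driver directory.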

Using GPUs

Create a workload and request GPUs. You can specify the number of GPUs as follows:

```
apiVersion: apps/v1
kind: Deployment
metadata:
  name: gpu-test
  namespace: default
spec:
  replicas: 1
  selector:
    matchLabels:
      app: gpu-test
  template:
    metadata:
      labels:
        app: gpu-test
    spec:
      containers:
      - image: nginx:perl
        name: container-0
        resources:
          requests:
            cpu: 250m
            memory: 512Mi
            nvidia.com/gpu: 1   # Number of requested GPUs
          limits:
            cpu: 250m
            memory: 512Mi
            nvidia.com/gpu: 1   # Maximum number of GPUs that can be used
      imagePullSecrets:
      - name: default-secret
```

nvidia.com/gpu specifies the number of GPUs to be requested. The value can be smaller than 1. For example, nvidia.com/gpu: 0.5 indicates that multiple pods share a GPU. In this case, all the requested GPU resources come from the same GPU card.
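To make the arithmetic of fractional requests concrete, the sketch below classifies an nvidia.com/gpu value and computes how many pods with the same fractional request could share one card at most. It is an illustration only: the function names are hypothetical, the bound is plain division, and the actual scheduler may place fewer pods (for example, because of GPU memory). It also does not check whether non-integer values above 1 are accepted.

```python
from fractions import Fraction


def describe_gpu_request(value: str) -> str:
    """Classify an nvidia.com/gpu value as a shared or a dedicated request."""
    amount = Fraction(value)
    if amount <= 0:
        raise ValueError("GPU request must be positive")
    # Values below 1 mean the pod shares a single card with other pods.
    return "shared" if amount < 1 else "dedicated"


def max_pods_per_card(value: str) -> int:
    """Arithmetic upper bound on pods with this fractional request per card."""
    amount = Fraction(value)
    if not 0 < amount < 1:
        raise ValueError("expected a fractional request between 0 and 1")
    # Fraction avoids float rounding; int() truncates, which is floor here.
    return int(1 / amount)


if __name__ == "__main__":
    for value in ("0.5", "0.3", "1"):
        print(value, describe_gpu_request(value))
    print("pods per card at 0.5:", max_pods_per_card("0.5"))
```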

After nvidia.com/gpu is specified, workloads will not be scheduled to nodes without GPUs. If no node has enough free GPU resources, Kubernetes events similar to the following are reported:

```
0/2 nodes are available: 2 Insufficient nvidia.com/gpu.
0/4 nodes are available: 1 InsufficientResourceOnSingleGPU, 3 Insufficient nvidia.com/gpu.
```
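Each event lists how many nodes were rejected and for which reason. The following sketch, which assumes the message format shown above and uses a hypothetical helper name `parse_scheduling_event`, splits such a message into the node counts and a per-reason tally. Other Kubernetes versions may word these messages differently.

```python
import re


def parse_scheduling_event(message: str):
    """Split a 'N/M nodes are available: ...' event into counts and reasons."""
    match = re.match(r"(\d+)/(\d+) nodes are available: (.*)", message.strip())
    if match is None:
        raise ValueError("unrecognized event message: " + message)
    available = int(match.group(1))
    total = int(match.group(2))
    detail = match.group(3).rstrip(".")
    reasons = {}
    # Split only at commas that are followed by a new "<count> " prefix,
    # so commas inside a reason text do not break the parsing.
    for part in re.split(r",\s+(?=\d+ )", detail):
        count, reason = part.split(" ", 1)
        reasons[reason] = int(count)
    return available, total, reasons


if __name__ == "__main__":
    event = ("0/4 nodes are available: 1 InsufficientResourceOnSingleGPU, "
             "3 Insufficient nvidia.com/gpu.")
    print(parse_scheduling_event(event))
```

In the second event above, three nodes lack enough nvidia.com/gpu in total, and one node reports InsufficientResourceOnSingleGPU.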

To use GPUs on the CCE console, select the GPU quota and specify the percentage of GPUs reserved for the container when creating a workload.

GPU Node Labels

CCE will label GPU-enabled nodes after they are created. Different types of GPU-enabled nodes have different labels.

```
$ kubectl get node -L accelerator
NAME           STATUS   ROLES    AGE     VERSION                                    ACCELERATOR
10.100.2.179   Ready    <none>   8m43s   v1.19.10-r0-CCE21.11.1.B006-21.11.1.B006   nvidia-t4
```
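If the label values are needed in a script, the table printed by the command above can be turned into a mapping from node name to accelerator label. The sketch below assumes the column layout shown in the sample output and uses a hypothetical helper name `accelerators_by_node`. It treats a row with a missing last column as a node without the label.

```python
SAMPLE_OUTPUT = """\
NAME           STATUS   ROLES    AGE     VERSION                                    ACCELERATOR
10.100.2.179   Ready    <none>   8m43s   v1.19.10-r0-CCE21.11.1.B006-21.11.1.B006   nvidia-t4
"""


def accelerators_by_node(kubectl_output: str) -> dict:
    """Map each node name to the value of its ACCELERATOR column."""
    lines = [line for line in kubectl_output.splitlines() if line.strip()]
    if not lines:
        return {}
    header = lines[0].split()
    if header[-1] != "ACCELERATOR":
        raise ValueError("expected ACCELERATOR as the last column")
    result = {}
    for line in lines[1:]:
        fields = line.split()
        # A node without the label leaves the last column empty,
        # so its row has one field fewer than the header.
        has_label = len(fields) >= len(header)
        result[fields[0]] = fields[len(header) - 1] if has_label else ""
    return result


if __name__ == "__main__":
    print(accelerators_by_node(SAMPLE_OUTPUT))
```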

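When using GPUs, you can use these labels to configure affinity between pods and nodes so that the pods are scheduled to nodes with the required GPU type. The following example uses nodeSelector to select nodes labeled accelerator: nvidia-t4.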

```
apiVersion: apps/v1
kind: Deployment
metadata:
  name: gpu-test
  namespace: default
spec:
  replicas: 1
  selector:
    matchLabels:
      app: gpu-test
  template:
    metadata:
      labels:
        app: gpu-test
    spec:
      nodeSelector:
        accelerator: nvidia-t4
      containers:
      - image: nginx:perl
        name: container-0
        resources:
          requests:
            cpu: 250m
            memory: 512Mi
            nvidia.com/gpu: 1   # Number of requested GPUs
          limits:
            cpu: 250m
            memory: 512Mi
            nvidia.com/gpu: 1   # Maximum number of GPUs that can be used
      imagePullSecrets:
      - name: default-secret
```
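Both manifests set the same nvidia.com/gpu value under requests and limits. Upstream Kubernetes treats nvidia.com/gpu as an extended resource: a request without a limit is rejected, and when both are set they must be equal. The sketch below checks a container's `resources` section against that rule before the manifest is applied. The function name `gpu_resource_problems` is hypothetical, and the check covers only this one rule.

```python
from fractions import Fraction

GPU_KEY = "nvidia.com/gpu"


def gpu_resource_problems(resources: dict) -> list:
    """Return a list of problems found in a container 'resources' mapping."""
    requests = resources.get("requests", {})
    limits = resources.get("limits", {})
    problems = []
    if GPU_KEY in requests and GPU_KEY not in limits:
        problems.append(GPU_KEY + " is requested but has no limit")
    elif GPU_KEY in requests:
        # Compare numerically so that 1, "1" and 1.0 count as equal.
        requested = Fraction(str(requests[GPU_KEY]))
        limit = Fraction(str(limits[GPU_KEY]))
        if requested != limit:
            problems.append(GPU_KEY + " request and limit differ")
    return problems


if __name__ == "__main__":
    resources = {
        "requests": {"cpu": "250m", "memory": "512Mi", GPU_KEY: 1},
        "limits": {"cpu": "250m", "memory": "512Mi", GPU_KEY: 1},
    }
    print(gpu_resource_problems(resources))
```

An empty list means the GPU settings are consistent with this rule.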
