NVIDIA GPU is a device management add-on that supports GPUs in containers. To use GPU nodes in a cluster, this add-on must be installed.
If the add-on is uninstalled, GPU pods newly scheduled to the nodes cannot run properly, but GPU pods already running on the nodes will not be affected.
After the add-on is installed, run the nvidia-smi command on the GPU node and in a container that has been allocated GPU resources to verify that the GPU device and driver are available.
GPU node:

    # If the add-on version is earlier than 2.0.0, run the following command:
    cd /opt/cloud/cce/nvidia/bin && ./nvidia-smi
    # If the add-on version is 2.0.0 or later and the driver installation path is changed, run the following command:
    cd /usr/local/nvidia/bin && ./nvidia-smi

Container:

    cd /usr/local/nvidia/bin && ./nvidia-smi
If GPU information is returned, the device is available and the add-on has been installed.
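
The check above can be wrapped in a small helper that looks for nvidia-smi in both documented directories, so the same script works regardless of the add-on version. This is only a minimal sketch: the function names `find_nvidia_smi` and `check_gpu` and the `NVIDIA_BIN_DIRS` variable are hypothetical and are not part of the add-on.

```shell
#!/bin/sh
# Candidate directories, taken from the commands documented above.
# NVIDIA_BIN_DIRS is a hypothetical override; paths must not contain spaces.
NVIDIA_BIN_DIRS="${NVIDIA_BIN_DIRS:-/opt/cloud/cce/nvidia/bin /usr/local/nvidia/bin}"

# Print the path of the first executable nvidia-smi found in the candidate
# directories. Return 1 if none of them contains one.
find_nvidia_smi() {
    for dir in $NVIDIA_BIN_DIRS; do
        if [ -x "$dir/nvidia-smi" ]; then
            printf '%s\n' "$dir/nvidia-smi"
            return 0
        fi
    done
    return 1
}

# Run nvidia-smi from whichever directory contains it. The exit status is
# that of nvidia-smi itself, or 1 if the binary could not be found.
check_gpu() {
    smi=$(find_nvidia_smi) || {
        echo "nvidia-smi not found in: $NVIDIA_BIN_DIRS" >&2
        return 1
    }
    "$smi"
}
```

After sourcing the file on a GPU node or in a GPU container, calling `check_gpu` prints the usual nvidia-smi output if the driver is installed, and returns a non-zero status otherwise.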
When the node is restarted, the driver will be downloaded and installed again. Ensure that the OBS bucket link of the driver is valid.
Component | Description | Resource Type |
---|---|---|
nvidia-driver-installer | Used for installing an NVIDIA driver on GPU nodes. | DaemonSet |
Add-on Version | Supported Cluster Version | New Feature |
---|---|---|
2.6.4 | v1.28 v1.29 | Updated the isolation logic of GPU cards. |
2.6.1 | v1.28 v1.29 | Upgraded the base images of the add-on. |
2.5.6 | v1.28 | Fixed an issue that occurred during the installation of the driver. |
2.5.4 | v1.28 | Clusters 1.28 are supported. |
2.0.69 | v1.21 v1.23 v1.25 v1.27 | Upgraded the base images of the add-on. |
2.0.48 | v1.21 v1.23 v1.25 v1.27 | Fixed an issue that occurred during the installation of the driver. |
2.0.46 | v1.21 v1.23 v1.25 v1.27 | |
1.2.28 | v1.19 v1.21 v1.23 v1.25 | |
1.2.20 | v1.19 v1.21 v1.23 v1.25 | Set the add-on alias to gpu. |
1.2.15 | v1.15 v1.17 v1.19 v1.21 v1.23 | CCE clusters 1.23 are supported. |
1.2.9 | v1.15 v1.17 v1.19 v1.21 | CCE clusters 1.21 are supported. |
1.2.2 | v1.15 v1.17 v1.19 | Supported the new EulerOS kernel. |
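
For scripted installations, the version table above can be encoded as a simple lookup. The sketch below returns the newest add-on version that this table lists for a given cluster version. The function name `newest_listed_addon_version` is hypothetical, and the mapping reflects only the rows shown here, so it may not cover releases published elsewhere.

```shell
#!/bin/sh
# Print the newest add-on version that the release table lists for the given
# cluster version (for example, v1.28). Return 1 for versions the table does
# not mention.
newest_listed_addon_version() {
    case "$1" in
        v1.28|v1.29)             echo "2.6.4" ;;
        v1.21|v1.23|v1.25|v1.27) echo "2.0.69" ;;
        v1.19)                   echo "1.2.28" ;;
        v1.15|v1.17)             echo "1.2.15" ;;
        *)
            echo "no add-on version listed for cluster $1" >&2
            return 1
            ;;
    esac
}
```

For example, `newest_listed_addon_version v1.25` prints `2.0.69`, because 2.0.69 is the highest version in the table whose supported clusters include v1.25.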