NVIDIA GPU is a device management add-on that supports GPUs in containers. To use GPU nodes in a cluster, this add-on must be installed.
If the add-on is uninstalled, GPU pods newly scheduled to the nodes cannot run properly, but GPU pods already running on the nodes will not be affected.
After the add-on is installed, run the nvidia-smi command on the GPU node and the container that schedules GPU resources to verify the availability of the GPU device and driver.
# If the add-on version is earlier than 2.0.0, run the following command: cd /opt/cloud/cce/nvidia/bin && ./nvidia-smi # If the add-on version is 2.0.0 or later and the driver installation path is changed, run the following command: cd /usr/local/nvidia/bin && ./nvidia-smi
cd /usr/local/nvidia/bin && ./nvidia-smi
If GPU information is returned, the device is available and the add-on has been installed.
When the node is restarted, the driver will be downloaded and installed again. Ensure that the OBS bucket link of the driver is valid.
Component |
Description |
Resource Type |
---|---|---|
nvidia-driver-installer |
Used for installing an NVIDIA driver on GPU nodes. |
DaemonSet |