Specifications for Custom Images Used for Training Jobs

When creating an image from locally developed models and training scripts, ensure that the image meets the specifications defined by ModelArts.

Specifications

Overview of a Basic Image Package

To facilitate code download, training log output, and log file upload to OBS, ModelArts provides basic image packages for creating custom images. The basic images provided by ModelArts have the following features:

Run the following command to obtain a ModelArts image:

docker pull <Address for obtaining a basic image>

After customizing an image, upload it to SWR. Before uploading, make sure that you have created an organization and obtained the password for logging in to SWR.

docker push swr.<region>.xxx.com/<Organization to which the target image belongs>/<Image name>
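The pull, customize, and push steps above can be sketched as a dry-run helper that prints the commands instead of running them. This is a minimal sketch under stated assumptions: the function name print_publish_commands is hypothetical, and the docker build step assumes a Dockerfile in the current directory whose FROM line names the basic image. Neither is prescribed by ModelArts.

```shell
#!/bin/sh
# Dry-run sketch: print the Docker commands for the pull / customize / push flow.
# The function name and argument layout are hypothetical, for illustration only.

# $1: address of the basic image, $2: SWR address of the customized image
print_publish_commands() {
    base_image="$1"
    target_image="$2"
    echo "docker pull ${base_image}"
    # Assumes a Dockerfile in the current directory that starts FROM the basic image.
    echo "docker build -t ${target_image} ."
    echo "docker push ${target_image}"
}
```

Calling the function with the basic image address and the target SWR address prints three commands that can be reviewed first and then run by hand after logging in to SWR.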

Obtain basic images based on chip requirements:

CPU-based Basic Images

Address for obtaining a basic image

swr.<region>.xxx.com/modelarts-job-dev-image/custom-cpu-base:1.3

Table 1 Optional parameters

| Parameter | Optional Value | Description |
|---|---|---|
| <region> | eu-de | Region where the image resides. |
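As an illustration of how the <region> parameter fills in the address, the following sketch builds the CPU basic image address from a region value. The helper name cpu_base_image is hypothetical, and xxx.com is the placeholder domain used in this document rather than a real registry host.

```shell
#!/bin/sh
# Sketch: build the CPU basic image address from the <region> parameter.
# cpu_base_image is a hypothetical helper; "xxx.com" is the placeholder
# domain used in this document, not a real registry host.

cpu_base_image() {
    region="$1"
    if [ -z "${region}" ]; then
        echo "usage: cpu_base_image <region>" >&2
        return 1
    fi
    echo "swr.${region}.xxx.com/modelarts-job-dev-image/custom-cpu-base:1.3"
}
```

For example, docker pull "$(cpu_base_image eu-de)" would pull the image for the eu-de region once the placeholder domain is replaced with the actual registry domain.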

Table 2 and Table 3 list the components and tools used by basic images.

Table 2 Components

| Component | Description |
|---|---|
| run_train.sh | Training boot script. It downloads the code directory, runs the training command, redirects the training log output, and uploads log files to OBS after the training command finishes. |

Table 3 Tool list

| Tool | Description |
|---|---|
| utils.sh | Tool script that run_train.sh depends on. It provides methods such as SK decryption, code directory download, and log file upload. |
| ip_mapper.py | Script for obtaining NIC addresses. By default, it obtains the IP address of the ib0 NIC, which training code can use to accelerate network communications. |
| dls-downloader.py | OBS download script that utils.sh depends on. |
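The boot sequence described for run_train.sh (download the code, run the training command with its log redirected, then upload the log) can be sketched as follows. This is not the real script: download_code_dir and upload_log_file are hypothetical stand-ins for the helpers in utils.sh, and here they only print what they would do. Uploading the log even when training fails is an assumption of this sketch.

```shell
#!/bin/sh
# Sketch of the boot sequence described for run_train.sh. This is NOT the real
# script: download_code_dir and upload_log_file are hypothetical stand-ins for
# the helpers that utils.sh provides, and they only print what they would do.

download_code_dir() {
    echo "would download code from $1 to $2"
}

upload_log_file() {
    echo "would upload $1 to $2"
}

# $1: code source URL, $2: local code dir, $3: log file, $4: log upload URL,
# remaining arguments: the training command to run.
run_training() {
    code_url="$1"; code_dir="$2"; log_file="$3"; log_url="$4"
    shift 4
    download_code_dir "${code_url}" "${code_dir}"
    # Redirect stdout and stderr of the training command to the log file.
    "$@" > "${log_file}" 2>&1
    status=$?
    # Assumption: upload the log whether or not training succeeded,
    # then return the exit status of the training command.
    upload_log_file "${log_file}" "${log_url}"
    return "${status}"
}
```

Returning the exit status of the training command lets the caller distinguish a failed run from a successful one even though the log is uploaded in both cases.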

GPU-based Basic Images

Table 5 and Table 6 list the components and tools used by basic images.

Table 5 Components

| Component | Description |
|---|---|
| run_train.sh | Training boot script. It downloads the code directory, runs the training command, redirects the training log output, and uploads log files to OBS after the training command finishes. |

Table 6 Tool list

| Tool | Description |
|---|---|
| utils.sh | Tool script that run_train.sh depends on. It provides methods such as SK decryption, code directory download, and log file upload. |
| ip_mapper.py | Script for obtaining NIC addresses. By default, it obtains the IP address of the ib0 NIC, which training code can use to accelerate network communications. |
| dls-downloader.py | OBS download script that utils.sh depends on. |
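To illustrate what looking up the ib0 address involves, the following sketch reads the IPv4 address of a NIC with the iproute2 ip tool. It is an illustration only and does not reproduce ip_mapper.py. The function names parse_inet_addr and nic_ipv4 are hypothetical, and the sketch assumes that iproute2 and awk are available in the image.

```shell
#!/bin/sh
# Sketch: read the IPv4 address of a NIC (ib0 by default) with the iproute2
# "ip" tool. This is an illustration, not the code of ip_mapper.py.

# Extract the address from one line of `ip -4 -o addr show` output, such as
# "4: ib0    inet 192.168.1.10/24 brd 192.168.1.255 scope global ib0"
# (the address shown is an example value).
parse_inet_addr() {
    awk '{ for (i = 1; i < NF; i++) if ($i == "inet") { split($(i + 1), a, "/"); print a[1]; exit } }'
}

# $1: NIC name (defaults to ib0). Prints nothing if the NIC has no IPv4 address.
nic_ipv4() {
    ip -4 -o addr show dev "${1:-ib0}" 2>/dev/null | parse_inet_addr
}
```

Training code could call nic_ipv4 and fall back to another interface when the output is empty, for example on a node without an ib0 NIC.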