When creating an image using locally developed models and training scripts, ensure that they meet the specifications defined by ModelArts.
To facilitate code download, training log output, and log file upload to OBS, ModelArts provides basic image packages for creating custom images. The basic images provided by ModelArts have the following features:
Run the following command to obtain a ModelArts image:
docker pull <Address for obtaining a basic image>
After customizing an image, upload it to SWR. Make sure that you have created an organization and obtained the password for logging in to SWR. For details, see .
docker push swr.<region>.xxx.com/<Organization to which the target image belongs>/<Image name>
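The push command above expects the local image to already carry its SWR address. The following is a minimal sketch of the full sequence using the standard Docker CLI. The organization `my-org` and the local image `my-train:1.0` are hypothetical, and `xxx.com` is the placeholder domain used throughout this document. The Docker steps run only when `PUSH=1` is set, so you can dry-run the script to check the composed address.

```shell
#!/bin/sh
# Sketch: tag a locally built image with its SWR address and push it.
# REGION, ORG, and LOCAL_IMAGE are hypothetical example values.
REGION="eu-de"
ORG="my-org"                 # hypothetical organization created in SWR
LOCAL_IMAGE="my-train:1.0"   # hypothetical local image name and tag

# Compose the target address: swr.<region>.xxx.com/<organization>/<image name>
swr_target() {
  printf 'swr.%s.xxx.com/%s/%s\n' "$1" "$2" "$3"
}

TARGET=$(swr_target "$REGION" "$ORG" "$LOCAL_IMAGE")
echo "Target image: $TARGET"

# Run the Docker steps only when PUSH=1 is set.
if [ "${PUSH:-0}" = "1" ]; then
  docker login "swr.$REGION.xxx.com"   # prompts for the SWR login credentials
  docker tag "$LOCAL_IMAGE" "$TARGET"  # docker push needs the registry address in the image name
  docker push "$TARGET"
fi
```

Running `PUSH=1 sh push.sh` performs the login, tag, and push. Without `PUSH=1`, the script only prints the address it would push to.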
Obtain basic images based on chip requirements:
Address for obtaining a CPU basic image
swr.<region>.xxx.com/modelarts-job-dev-image/custom-cpu-base:1.3
Parameter | Possible Value | Description
---|---|---
<region> | eu-de | Region where the image resides.
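A common way to customize an image is to start a Dockerfile from the basic image. The following sketch fills in the CPU address with the `eu-de` region and writes such a Dockerfile. It assumes that `pip` is available in the basic image and that you supply a `requirements.txt` in the build directory. Both assumptions, the directory `./custom-image`, and the tag `my-train:1.0` are hypothetical. Pulling and building run only when `BUILD=1` is set.

```shell
#!/bin/sh
# Sketch: derive a custom image from the ModelArts CPU basic image.
# Assumptions (hypothetical): pip exists in the basic image, and you place a
# requirements.txt listing extra Python dependencies in BUILD_DIR before building.
REGION="eu-de"
BASE_IMAGE="swr.${REGION}.xxx.com/modelarts-job-dev-image/custom-cpu-base:1.3"
BUILD_DIR="${BUILD_DIR:-./custom-image}"

mkdir -p "$BUILD_DIR"
# Write a minimal Dockerfile that starts from the basic image.
cat > "$BUILD_DIR/Dockerfile" <<EOF
FROM ${BASE_IMAGE}
COPY requirements.txt /tmp/requirements.txt
RUN pip install --no-cache-dir -r /tmp/requirements.txt
EOF
echo "Wrote $BUILD_DIR/Dockerfile based on $BASE_IMAGE"

# Pull and build only when BUILD=1, so the script can run without Docker.
if [ "${BUILD:-0}" = "1" ]; then
  docker pull "$BASE_IMAGE"
  docker build -t my-train:1.0 "$BUILD_DIR"   # my-train:1.0 is a hypothetical tag
fi
```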
Table 2 and Table 3 list the components and tools used by basic images.
Component | Description
---|---
run_train.sh | Training boot script. It downloads the code directory, runs the training command, redirects the training log output, and uploads log files to OBS after the training command finishes.
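The following hypothetical sketch shows the boot flow that run_train.sh is described as performing: download code, run training with its log redirected, then upload the log. It is **not** the real run_train.sh. `download_code` and `upload_log` are illustrative stand-ins that copy local directories instead of talking to OBS.

```shell
#!/bin/sh
# Hypothetical sketch of a training boot flow, NOT the real run_train.sh.
# download_code and upload_log are stand-ins that copy local directories;
# the real script downloads from and uploads to OBS.

download_code() {   # $1: source dir (stand-in for the OBS code dir), $2: local work dir
  mkdir -p "$2" && cp -R "$1"/. "$2"/
}

upload_log() {      # $1: log file, $2: destination dir (stand-in for the OBS log dir)
  mkdir -p "$2" && cp "$1" "$2"/
}

boot_train() {      # $1: code source, $2: work dir, $3: log destination, $4...: training command
  local src="$1" work="$2" dest="$3" log rc
  shift 3
  download_code "$src" "$work" || return 1
  log="$work/train.log"
  # Run the training command inside the code directory; send stdout and stderr to the log.
  ( cd "$work" && "$@" ) >"$log" 2>&1
  rc=$?
  # Upload the log whether or not training succeeded, then report the training status.
  upload_log "$log" "$dest"
  return "$rc"
}
```

For example, `boot_train ./code /tmp/work /tmp/logs python train.py` (paths and `train.py` are hypothetical) copies `./code`, runs the command in the copy, and leaves `train.log` in `/tmp/logs` even if training exits with an error. Propagating the exit code lets the caller still see the failure.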
Tool | Description
---|---
utils.sh | Tool script on which run_train.sh depends. It provides functions such as SK decryption, code directory download, and log file upload.
ip_mapper.py | Script for obtaining NIC addresses. By default, it obtains the IP address of the ib0 NIC, which training code can use to accelerate network communication.
dls-downloader.py | OBS download script on which utils.sh depends.
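The following shell sketch illustrates the idea behind ip_mapper.py: reading the IPv4 address of a NIC, ib0 by default. It is not ip_mapper.py itself. It assumes the iproute2 `ip` command is installed. If the NIC has no IPv4 address, the script reports that and continues.

```shell
#!/bin/sh
# Sketch: read the IPv4 address of a NIC (ib0 by default).
# Illustrative only; this is NOT ip_mapper.py. Requires iproute2's "ip" command.

# Print the first IPv4 address found in `ip -o -4 addr show` output read from stdin.
first_ipv4() {
  awk '{ for (i = 1; i < NF; i++) if ($i == "inet") { split($(i + 1), a, "/"); print a[1]; exit } }'
}

nic_ipv4() {   # $1: interface name, default ib0
  ip -o -4 addr show dev "${1:-ib0}" 2>/dev/null | first_ipv4
}

NIC="${NIC:-ib0}"
ADDR=$(nic_ipv4 "$NIC")
if [ -n "$ADDR" ]; then
  echo "$NIC: $ADDR"
else
  echo "No IPv4 address found on $NIC" >&2
fi
```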
Addresses for obtaining a GPU basic image
swr.<region>.xxx.com/modelarts-job-dev-image/custom-base-<cuda version>-<python version>-<os>-<arch>:<image tag>
swr.<region>.xxx.com/modelarts-job-dev-image/custom-gpu-<cuda version>-inner-moxing-<python version>:<image tag>
swr.<region>.xxx.com/modelarts-job-dev-image/custom-gpu-<cuda version>-base:<image tag>
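The three GPU address templates can be filled in mechanically from the parameters described in the following table. In this sketch, `ubuntu18.04` and `x86` come from that table. The CUDA version, Python version, and tag values are **hypothetical placeholders**, and the real version strings may use a different format. Check the images actually available in SWR for your region.

```shell
#!/bin/sh
# Sketch: fill in the GPU basic image address templates.
# CUDA, PY, and TAG below are hypothetical placeholders, not published values.
REGION="eu-de"
REGISTRY="swr.${REGION}.xxx.com/modelarts-job-dev-image"

# custom-base-<cuda version>-<python version>-<os>-<arch>:<image tag>
gpu_base_image() {        # $1 cuda, $2 python, $3 os, $4 arch, $5 tag
  printf '%s/custom-base-%s-%s-%s-%s:%s\n' "$REGISTRY" "$1" "$2" "$3" "$4" "$5"
}

# custom-gpu-<cuda version>-inner-moxing-<python version>:<image tag>
gpu_moxing_image() {      # $1 cuda, $2 python, $3 tag
  printf '%s/custom-gpu-%s-inner-moxing-%s:%s\n' "$REGISTRY" "$1" "$2" "$3"
}

# custom-gpu-<cuda version>-base:<image tag>
gpu_cuda_base_image() {   # $1 cuda, $2 tag
  printf '%s/custom-gpu-%s-base:%s\n' "$REGISTRY" "$1" "$2"
}

CUDA="cuda11.1"; PY="py37"; TAG="1.0"   # hypothetical example values
gpu_base_image "$CUDA" "$PY" ubuntu18.04 x86 "$TAG"
gpu_moxing_image "$CUDA" "$PY" "$TAG"
gpu_cuda_base_image "$CUDA" "$TAG"
```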
Parameter | Possible Value | Description
---|---|---
<region> | eu-de | Region where the image resides.
<cuda version> | | CUDA version installed in the image. NOTE: Check the CUDA version. Once the version is specified, it cannot be changed; otherwise, training will fail.
<image tag> | | Image version.
<python version> | | Python environment version.
<os> | ubuntu18.04 | Operating system.
<arch> | x86 | Architecture.
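Because the CUDA version must not change after it is specified, it helps to confirm the version inside the container before training. This sketch assumes that either `nvcc` from the CUDA toolkit or the `/usr/local/cuda/version.txt` file shipped with older toolkits is present; which one exists depends on the image. `EXPECTED_CUDA` is a hypothetical variable you would set to the version in your image name.

```shell
#!/bin/sh
# Sketch: report the CUDA version inside a container and flag a mismatch.
# Assumes nvcc or /usr/local/cuda/version.txt exists; EXPECTED_CUDA is hypothetical.

# Extract "X.Y" from `nvcc --version` output on stdin, e.g. "... release 11.1, V11.1.105".
parse_nvcc_release() {
  sed -n 's/.*release \([0-9][0-9]*\.[0-9][0-9]*\).*/\1/p' | head -n 1
}

cuda_version() {
  if command -v nvcc >/dev/null 2>&1; then
    nvcc --version | parse_nvcc_release
  elif [ -f /usr/local/cuda/version.txt ]; then
    # Older toolkits write a line such as "CUDA Version 10.2.89".
    sed -n 's/^CUDA Version \([0-9][0-9]*\.[0-9][0-9]*\).*/\1/p' /usr/local/cuda/version.txt
  fi
}

EXPECTED="${EXPECTED_CUDA:-}"
ACTUAL=$(cuda_version)
echo "CUDA version: ${ACTUAL:-unknown}"
if [ -n "$EXPECTED" ] && [ "$ACTUAL" != "$EXPECTED" ]; then
  echo "CUDA mismatch: expected $EXPECTED, found ${ACTUAL:-unknown}" >&2
fi
```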