If the framework you used to develop your algorithm is not one of the frequently used frameworks, you can build the algorithm into a custom image and use that image to create a training job.
Specify Name and Description according to actual requirements.
Parameter | Sub-Parameter | Description |
---|---|---|
One-Click Configuration | - | If you have saved job parameter configurations in ModelArts, click One-Click Configuration and select an existing job parameter configuration as prompted to quickly complete parameter setting for the job. |
Algorithm Source | Custom | For details about custom image specifications, see Specifications for Custom Images Used for Training Jobs. |
Data Source | Dataset | Select an available dataset and its version from the ModelArts Data Management module. |
Data Source | Data path | Select the training data from your OBS bucket. On the right of the Data path text box, click Select. In the dialog box that is displayed, select an OBS folder for storing data. |
Training Output Path | - | Storage path of the training result. NOTE: To minimize errors, select an empty directory for Training Output Path. Do not select the directory used for storing the dataset for Training Output Path. |
Environment Variable | - | Add environment variables based on your image file. This parameter is optional. You can click Add Environment Variable to add multiple variable parameters. |
Job Log Path | - | Select a path for storing log files generated during job running. |
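
The path rules in the table above can be checked before you submit a job. The following is a minimal sketch, not a ModelArts API: the class `CustomImageJobSettings`, its field names, and the example paths are hypothetical and only mirror the parameters described above. It flags a Training Output Path that is the same folder as the data path, and optionally warns when the output folder already contains objects.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable, List


def _as_dir(obs_path: str) -> str:
    """Normalize a folder path so a trailing slash does not affect comparison."""
    return obs_path.strip().rstrip("/") + "/"


@dataclass
class CustomImageJobSettings:
    """Hypothetical container for the settings in the table; not a ModelArts class."""

    data_path: str
    training_output_path: str
    job_log_path: str
    # Environment variables are optional, so the default is an empty mapping.
    environment_variables: Dict[str, str] = field(default_factory=dict)

    def problems(self, existing_output_objects: Iterable[str] = ()) -> List[str]:
        """Return human-readable findings; an empty list means no issue was found."""
        found = []
        if _as_dir(self.training_output_path) == _as_dir(self.data_path):
            found.append("Training Output Path must not be the dataset directory.")
        # The caller supplies the object names already in the output folder, if known.
        if list(existing_output_objects):
            found.append("Training Output Path is not empty.")
        return found
```

For example, `CustomImageJobSettings("/my-bucket/data/", "/my-bucket/data", "/my-bucket/logs/").problems()` reports the dataset-directory conflict, because the two paths differ only by a trailing slash.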
Parameter | Description |
---|---|
Resource Pool | Select resource pools for the job. CPU- and GPU-based public resource pools are supported. Their application scenarios and charges are different. |
Type | If Resource Pool is set to Public resource pools, select a resource type. Available resource types are CPU and GPU. The GPU resource delivers better performance, and the CPU resource is more cost-effective. If the selected algorithm has been defined to use the CPU or GPU, the resource type is automatically displayed on the page. Select the resource type as required. |
Specifications | Select a resource flavor based on the resource type. |
Compute Nodes | Set the number of compute nodes. If you set Compute Nodes to 1, the standalone computing mode is used. If you set Compute Nodes to a value greater than 1, the distributed computing mode is used. Select a computing mode based on the actual requirements. |
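
The Compute Nodes rule can be expressed as a small function. This is an illustrative sketch only; the function name `computing_mode` and the returned strings are assumptions made for the example, not values defined by ModelArts.

```python
def computing_mode(compute_nodes: int) -> str:
    """Map the Compute Nodes setting to the computing mode described in the table."""
    if compute_nodes < 1:
        raise ValueError("Compute Nodes must be at least 1.")
    # Exactly one node means standalone; any larger value means distributed.
    return "standalone" if compute_nodes == 1 else "distributed"
```

A training script that must run in both modes can use such a check to decide whether to set up multi-node communication.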
Parameter | Description |
---|---|
Saving Training Parameters | If you select this option, the parameter settings of the current job will be saved to facilitate subsequent job creation. Select Save Training Parameters and specify Configuration Name and Description. After a training job is created, you can switch to the Job Parameters tab page to view your saved job parameter settings. For details, see Managing Job Parameters. |
After a custom image job is created, the system authorizes ModelArts to obtain and run the image by default. When you run a custom image job for the first time, ModelArts checks the custom image. For details about the check, see Specifications for Custom Images Used for Training Jobs. If the check fails, view the cause in the log and modify the custom image accordingly.
After the image passes the check, the background starts the custom image container to run the training job. You can switch to the training job list to view basic information about training jobs. In the list, Status of a newly created training job is Initializing. If the status changes to Successful, the training job has ended and the generated model is stored in the location specified by Training Output Path. If the status changes to Running failed, click the name of the training job, view the job logs, and troubleshoot the fault based on the logs.
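
The status handling described above can be summarized in code. The sketch below is hypothetical: `next_step` is not part of any ModelArts SDK, and it covers only the three status values named in this section (other statuses may exist and fall through to the default branch).

```python
def next_step(status: str) -> str:
    """Suggest what to do for a training job status shown in the job list."""
    if status == "Initializing":
        return "Wait for the job to start."
    if status == "Successful":
        return "Collect the model from Training Output Path."
    if status == "Running failed":
        return "Click the job name, view the job logs, and troubleshoot."
    # Statuses not described in this section are left to the reader to look up.
    return "Check the training job list for details."
```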