Creating a Training Job
Function
This API is used to create a training job.
URI
POST /v2/{project_id}/training-jobs
Parameter |
Mandatory |
Type |
Description |
---|---|---|---|
project_id |
Yes |
String |
Project ID. For details, see Obtaining a Project ID. |
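The URI template above can be filled in programmatically. The following is a minimal sketch, assuming the service is reached over HTTPS; the helper name `training_jobs_url` and the host used in the usage note are hypothetical and not part of the API.

```python
from urllib.parse import quote


def training_jobs_url(endpoint: str, project_id: str) -> str:
    """Build the URL for POST /v2/{project_id}/training-jobs."""
    if not project_id:
        # project_id is marked as mandatory in the path parameter table.
        raise ValueError("project_id is mandatory")
    # Percent-encode the ID so it cannot change the path structure.
    return f"https://{endpoint}/v2/{quote(project_id, safe='')}/training-jobs"
```

For example, `training_jobs_url("modelarts.example.com", "abc123")` yields `https://modelarts.example.com/v2/abc123/training-jobs`, where both arguments are placeholders.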
Request Parameters
Parameter |
Mandatory |
Type |
Description |
---|---|---|---|
kind |
Yes |
String |
Training job type, which is job by default. Options:
|
metadata |
Yes |
JobMetadata object |
Metadata of a training job. |
algorithm |
No |
JobAlgorithm object |
Algorithm for training jobs. The following formats are supported:
|
tasks |
No |
Array of Task objects |
List of tasks in heterogeneous training jobs. If this parameter is specified, leave the spec parameter blank. |
spec |
No |
spec object |
Specifications of a training job. If this parameter is specified, leave the tasks parameter blank. |
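The constraints in the table above (kind and metadata are mandatory, and tasks and spec are not specified together) can be checked on the client before a request is sent. The following is a minimal sketch; `check_job_body` is a hypothetical helper, and treating empty values as "blank" is an assumption.

```python
def check_job_body(body: dict) -> list[str]:
    """Return a list of problems found in a create-training-job request body."""
    problems = []
    # kind and metadata are the only mandatory top-level fields.
    for field in ("kind", "metadata"):
        if field not in body:
            problems.append(f"missing mandatory field: {field}")
    # Assumption: an empty list or object counts as "left blank".
    if body.get("tasks") and body.get("spec"):
        problems.append("specify either tasks or spec, not both")
    return problems
```

An empty list means that no problem was detected by these checks; it does not guarantee that the service accepts the request.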
Parameter |
Mandatory |
Type |
Description |
---|---|---|---|
id |
No |
String |
Training job ID, which is generated and returned by ModelArts after the training job is created. |
name |
Yes |
String |
Name of a training job. The value must contain 1 to 64 characters consisting of only digits, letters, underscores (_), and hyphens (-). |
workspace_id |
No |
String |
Workspace where a job is located. The default value is 0. |
description |
No |
String |
Training job description. The value must contain 0 to 256 characters. The default value is NULL. |
create_time |
No |
Long |
Timestamp when a training job is created, in milliseconds. The value is generated and returned by ModelArts after the job is created. |
user_name |
No |
String |
Username for creating a training job. The username is generated and returned by ModelArts after the training job is created. |
annotations |
No |
Map<String,String> |
Declaration template of a training job. For heterogeneous jobs, the default value of job_template is Template RL. For other jobs, the default value is Template DL. |
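The length and character rules for name and description in the table above can be expressed as a small validator. This is a sketch only: `validate_metadata` is a hypothetical helper, and it assumes that "letters" means ASCII letters, which the table does not state.

```python
import re

# Assumption: "letters" means ASCII letters A-Z and a-z.
_NAME_RE = re.compile(r"[A-Za-z0-9_-]{1,64}")


def validate_metadata(metadata: dict) -> list[str]:
    """Check the name and description rules of the JobMetadata object."""
    problems = []
    name = metadata.get("name")
    if not isinstance(name, str) or not _NAME_RE.fullmatch(name):
        problems.append("name must be 1 to 64 digits, letters, underscores, or hyphens")
    description = metadata.get("description")
    # description is optional; only its maximum length is checked.
    if description is not None and len(description) > 256:
        problems.append("description must not exceed 256 characters")
    return problems
```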
Parameter |
Mandatory |
Type |
Description |
---|---|---|---|
id |
No |
String |
Algorithm ID. |
name |
No |
String |
Algorithm name. Leave it blank. |
subscription_id |
No |
String |
Subscription ID of the subscription algorithm. This parameter must be used together with item_version_id. |
item_version_id |
No |
String |
Version ID of the subscription algorithm. This parameter must be used together with subscription_id. |
code_dir |
No |
String |
Code directory of a training job, for example, /usr/app/. This parameter must be used together with boot_file. If id or subscription_id+item_version_id is set, leave it blank. |
boot_file |
No |
String |
Boot file of a training job, which needs to be stored in the code directory, for example, /usr/app/boot.py. This parameter must be used together with code_dir. If id or subscription_id+item_version_id is set, leave it blank. |
autosearch_config_path |
No |
String |
YAML configuration path of auto search jobs. An OBS URL is required. |
autosearch_framework_path |
No |
String |
Framework code directory of auto search jobs. An OBS URL is required. |
command |
No |
String |
Command for starting the container of the custom image of a training job in the custom image scenario. |
parameters |
No |
Array of parameters objects |
Running parameters of a training job. |
policies |
No |
policies object |
Policies supported by jobs, which are used for hyperparameter search. |
inputs |
No |
Array of Input objects |
Input of a training job. |
outputs |
No |
Array of Output objects |
Output of a training job. |
engine |
No |
engine object |
Engine of a training job. Leave this parameter blank if the job is created using id of the algorithm management API or subscription_id+item_version_id of the subscription algorithm API. |
environments |
No |
Array of Map<String,String> objects |
Environment variables of a training job. The format is key: value. Leave this parameter blank. |
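Several fields in the table above must be used in pairs or left blank depending on how the algorithm is referenced. The following sketch encodes those rules as stated in the descriptions; `check_algorithm` is a hypothetical helper, and treating empty values as "blank" is an assumption.

```python
def check_algorithm(algorithm: dict) -> list[str]:
    """Check the pairing rules of the JobAlgorithm object."""

    def has(key: str) -> bool:
        # Assumption: empty strings and empty objects count as blank.
        return bool(algorithm.get(key))

    problems = []
    if has("subscription_id") != has("item_version_id"):
        problems.append("subscription_id and item_version_id must be used together")
    if has("code_dir") != has("boot_file"):
        problems.append("code_dir and boot_file must be used together")
    # The algorithm is predefined when it is referenced by id or by subscription.
    predefined = has("id") or (has("subscription_id") and has("item_version_id"))
    if predefined and (has("code_dir") or has("boot_file")):
        problems.append("leave code_dir and boot_file blank for a predefined algorithm")
    if predefined and has("engine"):
        problems.append("leave engine blank for a predefined algorithm")
    return problems
```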
Parameter |
Mandatory |
Type |
Description |
---|---|---|---|
name |
No |
String |
Parameter name. |
value |
No |
String |
Parameter value. |
description |
No |
String |
Parameter description. |
constraint |
No |
constraint object |
Parameter constraint. |
i18n_description |
No |
i18n_description object |
Internationalization description. |
Parameter |
Mandatory |
Type |
Description |
---|---|---|---|
type |
No |
String |
Parameter type. |
editable |
No |
Boolean |
Whether the parameter is editable. |
required |
No |
Boolean |
Whether the parameter is mandatory. |
sensitive |
No |
Boolean |
Whether the parameter is sensitive. |
valid_type |
No |
String |
Valid type. |
valid_range |
No |
Array of strings |
Valid range. |
Parameter |
Mandatory |
Type |
Description |
---|---|---|---|
language |
No |
String |
Internationalization language. |
description |
No |
String |
Description. |
Parameter |
Mandatory |
Type |
Description |
---|---|---|---|
auto_search |
No |
auto_search object |
Hyperparameter search configuration. |
Parameter |
Mandatory |
Type |
Description |
---|---|---|---|
skip_search_params |
No |
String |
Hyperparameters to be skipped during the search. |
reward_attrs |
No |
Array of reward_attrs objects |
List of search metrics. |
search_params |
No |
Array of search_params objects |
Search parameters. |
algo_configs |
No |
Array of algo_configs objects |
Search algorithm configurations. |
Parameter |
Mandatory |
Type |
Description |
---|---|---|---|
name |
No |
String |
Metric name. |
mode |
No |
String |
Search direction.
|
regex |
No |
String |
Regular expression of a metric. |
Parameter |
Mandatory |
Type |
Description |
---|---|---|---|
name |
No |
String |
Hyperparameter name. |
param_type |
No |
String |
Parameter type.
|
lower_bound |
No |
String |
Lower bound of the hyperparameter. |
upper_bound |
No |
String |
Upper bound of the hyperparameter. |
discrete_points_num |
No |
String |
Number of discrete points of a continuous hyperparameter. |
discrete_values |
No |
Array of strings |
List of discrete hyperparameter values. |
Parameter |
Mandatory |
Type |
Description |
---|---|---|---|
name |
No |
String |
Name of the search algorithm. |
params |
No |
Array of AutoSearchAlgoConfigParameter objects |
Search algorithm parameters. |
Parameter |
Mandatory |
Type |
Description |
---|---|---|---|
key |
No |
String |
Parameter key. |
value |
No |
String |
Parameter value. |
type |
No |
String |
Parameter type. |
Parameter |
Mandatory |
Type |
Description |
---|---|---|---|
engine_id |
No |
String |
Engine ID selected for a training job. You can set this parameter to engine_id, engine_name + engine_version, or image_url. |
engine_name |
No |
String |
Name of the engine selected for a training job. If engine_id is set, leave this parameter blank. |
engine_version |
No |
String |
Name of the engine version selected for a training job. If engine_id is set, leave this parameter blank. |
image_url |
No |
String |
Custom image URL selected for a training job. |
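According to the table above, an engine is selected by engine_id, by engine_name together with engine_version, or by image_url. The following sketch classifies an engine object accordingly; `engine_selection` is a hypothetical helper. The table does not say how image_url combines with the other fields, so the sketch does not reject such combinations.

```python
def engine_selection(engine: dict) -> str:
    """Return which of the three documented selection styles an engine uses."""
    has_id = bool(engine.get("engine_id"))
    has_name = bool(engine.get("engine_name"))
    has_version = bool(engine.get("engine_version"))
    if has_id:
        # engine_name and engine_version are left blank when engine_id is set.
        if has_name or has_version:
            raise ValueError("leave engine_name and engine_version blank when engine_id is set")
        return "engine_id"
    if has_name or has_version:
        if not (has_name and has_version):
            raise ValueError("engine_name and engine_version must be set together")
        return "engine_name+engine_version"
    if engine.get("image_url"):
        return "image_url"
    raise ValueError("set engine_id, engine_name+engine_version, or image_url")
```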
Parameter |
Mandatory |
Type |
Description |
---|---|---|---|
role |
No |
String |
Role of a heterogeneous training job. Options:
|
algorithm |
No |
algorithm object |
Algorithm management and configuration. |
task_resource |
No |
task_resource object |
Resource flavors of a training job. |
Parameter |
Mandatory |
Type |
Description |
---|---|---|---|
job_config |
No |
job_config object |
Algorithm configuration, such as the boot file. |
code_dir |
No |
String |
Algorithm code directory, for example, /usr/app/. This parameter must be used together with boot_file. |
boot_file |
No |
String |
Code boot file of the algorithm, which needs to be stored in the code directory, for example, /usr/app/boot.py. This parameter must be used together with code_dir. |
engine |
No |
engine object |
Engine of a heterogeneous job algorithm. |
inputs |
No |
Array of inputs objects |
Data input of an algorithm. |
outputs |
No |
Array of outputs objects |
Data output of an algorithm. |
Parameter |
Mandatory |
Type |
Description |
---|---|---|---|
parameters |
No |
Array of Parameter objects |
Running parameter of an algorithm. |
inputs |
No |
Array of Input objects |
Data input of an algorithm. |
outputs |
No |
Array of Output objects |
Data output of an algorithm. |
engine |
No |
engine object |
Algorithm engine. |
Parameter |
Mandatory |
Type |
Description |
---|---|---|---|
name |
No |
String |
Parameter name. |
value |
No |
String |
Parameter value. |
description |
No |
String |
Parameter description. |
constraint |
No |
constraint object |
Parameter constraint. |
i18n_description |
No |
i18n_description object |
Internationalization description. |
Parameter |
Mandatory |
Type |
Description |
---|---|---|---|
type |
No |
String |
Parameter type. |
editable |
No |
Boolean |
Whether the parameter is editable. |
required |
No |
Boolean |
Whether the parameter is mandatory. |
sensitive |
No |
Boolean |
Whether the parameter is sensitive. |
valid_type |
No |
String |
Valid type. |
valid_range |
No |
Array of strings |
Valid range. |
Parameter |
Mandatory |
Type |
Description |
---|---|---|---|
language |
No |
String |
Internationalization language. |
description |
No |
String |
Description. |
Parameter |
Mandatory |
Type |
Description |
---|---|---|---|
name |
Yes |
String |
Name of the data input channel. |
description |
No |
String |
Description of the data input channel. |
local_dir |
No |
String |
Local directory of the container to which the data input channel is mapped. |
remote |
Yes |
InputDataInfo object |
Data input. Options:
|
remote_constraint |
No |
Array of remote_constraint objects |
Data input constraint. |
Parameter |
Mandatory |
Type |
Description |
---|---|---|---|
dataset |
No |
dataset object |
Dataset as the data input. |
obs |
No |
obs object |
OBS in which the input and output data is stored. |
Parameter |
Mandatory |
Type |
Description |
---|---|---|---|
id |
Yes |
String |
Dataset ID of a training job. |
version_id |
Yes |
String |
Dataset version ID of a training job. |
obs_url |
No |
String |
OBS URL of the dataset required by a training job. ModelArts automatically parses and generates the URL based on the dataset and dataset version IDs. For example, /usr/data/. |
Parameter |
Mandatory |
Type |
Description |
---|---|---|---|
obs_url |
Yes |
String |
OBS URL of the dataset required by a training job. For example, /usr/data/. |
Parameter |
Mandatory |
Type |
Description |
---|---|---|---|
data_type |
No |
String |
Data input type, including the data storage location and dataset. |
attributes |
No |
String |
Attributes if a dataset is used as the data input. Options:
|
Parameter |
Mandatory |
Type |
Description |
---|---|---|---|
name |
Yes |
String |
Name of the data output channel. |
description |
No |
String |
Description of the data output channel. |
local_dir |
No |
String |
Local directory of the container to which the data output channel is mapped. |
remote |
Yes |
remote object |
Description of the actual data output. |
Parameter |
Mandatory |
Type |
Description |
---|---|---|---|
obs |
Yes |
obs object |
OBS to which data is actually exported. |
Parameter |
Mandatory |
Type |
Description |
---|---|---|---|
obs_url |
Yes |
String |
OBS URL to which data is actually exported. |
Parameter |
Mandatory |
Type |
Description |
---|---|---|---|
engine_id |
No |
String |
Engine ID selected for an algorithm. |
engine_name |
No |
String |
Engine name selected for an algorithm. If engine_id is specified, leave this parameter blank. |
engine_version |
No |
String |
Engine version name selected for an algorithm. If engine_id is specified, leave this parameter blank. |
image_url |
No |
String |
Custom image URL selected by an algorithm. |
Parameter |
Mandatory |
Type |
Description |
---|---|---|---|
engine_id |
No |
String |
Engine ID of a heterogeneous job, for example, caffe-1.0.0-python2.7. |
engine_name |
No |
String |
Engine name of a heterogeneous job, for example, Caffe. |
engine_version |
No |
String |
Engine version of a heterogeneous job. |
image_url |
No |
String |
Custom image URL selected by an algorithm. |
Parameter |
Mandatory |
Type |
Description |
---|---|---|---|
name |
Yes |
String |
Name of the data input channel. |
description |
No |
String |
Description of the data input channel. |
local_dir |
No |
String |
Local directory of the container to which the data input channel is mapped. |
remote |
Yes |
remote object |
Data input. Options:
|
Parameter |
Mandatory |
Type |
Description |
---|---|---|---|
obs |
No |
obs object |
OBS in which the input and output data is stored. |
Parameter |
Mandatory |
Type |
Description |
---|---|---|---|
obs_url |
Yes |
String |
OBS URL of the dataset required by a training job. For example, /usr/data/. |
Parameter |
Mandatory |
Type |
Description |
---|---|---|---|
name |
Yes |
String |
Name of the data output channel. |
description |
No |
String |
Description of the data output channel. |
local_dir |
No |
String |
Local directory of the container to which the data output channel is mapped. |
remote |
Yes |
remote object |
Description of the actual data output. |
Parameter |
Mandatory |
Type |
Description |
---|---|---|---|
obs |
Yes |
obs object |
OBS to which data is actually exported. |
Parameter |
Mandatory |
Type |
Description |
---|---|---|---|
obs_url |
Yes |
String |
OBS URL to which data is actually exported. |
Parameter |
Mandatory |
Type |
Description |
---|---|---|---|
flavor_id |
No |
String |
Resource flavor ID of a training job. |
node_count |
Yes |
Integer |
Number of resource replicas selected for a training job. Minimum: 1 |
Parameter |
Mandatory |
Type |
Description |
---|---|---|---|
resource |
No |
resource object |
Resource flavors of a training job. Select either flavor_id or pool_id+[flavor_id]. |
volumes |
No |
Array of volumes objects |
Volumes attached to a training job. |
log_export_path |
No |
log_export_path object |
Export path of training job logs. |
Parameter |
Mandatory |
Type |
Description |
---|---|---|---|
flavor_id |
Yes |
String |
ID of the resource flavors selected for a training job. |
node_count |
No |
Integer |
Number of nodes used for creating a training job in a pool. By default, a single node is used. |
Parameter |
Mandatory |
Type |
Description |
---|---|---|---|
nfs |
No |
nfs object |
Volumes attached in NFS mode. |
Response Parameters
Status code: 201
Parameter |
Type |
Description |
---|---|---|
kind |
String |
Training job type, which is job by default. Options:
|
metadata |
JobMetadata object |
Metadata of a training job. |
status |
Status object |
Status of a training job. You do not need to set this parameter when creating a job. |
algorithm |
JobAlgorithmResponse object |
Algorithm for training jobs. The following formats are supported:
|
tasks |
Array of TaskResponse objects |
List of tasks in heterogeneous training jobs. |
spec |
spec object |
Specifications of a training job. |
Parameter |
Type |
Description |
---|---|---|
id |
String |
Training job ID, which is generated and returned by ModelArts after the training job is created. |
name |
String |
Name of a training job. The value must contain 1 to 64 characters consisting of only digits, letters, underscores (_), and hyphens (-). |
workspace_id |
String |
Workspace where a job is located. The default value is 0. |
description |
String |
Training job description. The value must contain 0 to 256 characters. The default value is NULL. |
create_time |
Long |
Timestamp when a training job is created, in milliseconds. The value is generated and returned by ModelArts after the job is created. |
user_name |
String |
Username for creating a training job. The username is generated and returned by ModelArts after the training job is created. |
annotations |
Map<String,String> |
Declaration template of a training job. For heterogeneous jobs, the default value of job_template is Template RL. For other jobs, the default value is Template DL. |
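The create_time field is a timestamp in milliseconds. The following sketch converts it to a timezone-aware datetime, assuming the value counts milliseconds since the Unix epoch in UTC (the table states only the unit); `create_time_to_datetime` is a hypothetical helper.

```python
from datetime import datetime, timedelta, timezone


def create_time_to_datetime(create_time_ms: int) -> datetime:
    """Convert a millisecond timestamp to an aware UTC datetime."""
    # Assumption: the timestamp is relative to 1970-01-01T00:00:00Z.
    epoch = datetime(1970, 1, 1, tzinfo=timezone.utc)
    # Integer milliseconds avoid the rounding of a float division by 1000.
    return epoch + timedelta(milliseconds=create_time_ms)
```

Under that assumption, a create_time of 1637045545982 corresponds to 2021-11-16 06:52:25.982 UTC.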
Parameter |
Type |
Description |
---|---|---|
phase |
String |
Level-1 status of a training job. The value is stable. Options: Creating, Pending, Running, Failed, Completed, Terminating, Terminated, and Abnormal. |
secondary_phase |
String |
Level-2 status of a training job. The value is unstable. Options: Creating, Queuing, Running, Failed, Completed, Terminating, Terminated, CreateFailed, TerminatedFailed, Unknown, and Lost. |
duration |
Long |
Running duration of a training job, in milliseconds. |
node_count_metrics |
Array<Array<Integer>> |
Node count changes during the training job running period. |
tasks |
Array of strings |
Tasks of a training job. |
start_time |
String |
Start time of a training job. The value is in timestamp format. |
task_statuses |
Array of task_statuses objects |
Status of a training job task. |
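A client that polls a job usually needs to know whether the level-1 phase can still change. The sketch below groups the documented phase values; which phases are final is an assumption based on their names and is not stated in this section. `is_finished` is a hypothetical helper.

```python
# Level-1 phase values listed in the status table.
PHASES = (
    "Creating", "Pending", "Running", "Failed",
    "Completed", "Terminating", "Terminated", "Abnormal",
)

# Assumption: these phases are final and the job will not leave them.
TERMINAL_PHASES = frozenset({"Failed", "Completed", "Terminated", "Abnormal"})


def is_finished(status: dict) -> bool:
    """Return True if the status object reports an assumed final phase."""
    phase = status.get("phase")
    if phase not in PHASES:
        raise ValueError(f"unexpected phase: {phase!r}")
    return phase in TERMINAL_PHASES
```

Only phase is used here because secondary_phase is described as unstable.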
Parameter |
Type |
Description |
---|---|---|
task |
String |
Name of a training job task. |
exit_code |
Integer |
Exit code of a training job task. |
message |
String |
Error message of a training job task. |
Parameter |
Type |
Description |
---|---|---|
id |
String |
Algorithm for training jobs. Options:
|
name |
String |
Algorithm name. |
subscription_id |
String |
Subscription ID of the subscription algorithm. This parameter must be used together with item_version_id. |
item_version_id |
String |
Version ID of the subscription algorithm. This parameter must be used together with subscription_id. |
code_dir |
String |
Code directory of a training job, for example, /usr/app/. This parameter must be used together with boot_file. If id or subscription_id+item_version_id is set, leave it blank. |
boot_file |
String |
Boot file of a training job, which needs to be stored in the code directory, for example, /usr/app/boot.py. This parameter must be used together with code_dir. If id or subscription_id+item_version_id is set, leave it blank. |
autosearch_config_path |
String |
YAML configuration path of auto search jobs. An OBS URL is required. |
autosearch_framework_path |
String |
Framework code directory of auto search jobs. An OBS URL is required. |
command |
String |
Boot command used to start the container of the custom image used by a training job. You can set this parameter to code_dir. |
parameters |
Array of Parameter objects |
Running parameters of a training job. |
policies |
policies object |
Policies supported by jobs. |
inputs |
Array of Input objects |
Input of a training job. |
outputs |
Array of Output objects |
Output of a training job. |
engine |
engine object |
Engine of a training job. Leave this parameter blank if the job is created using id of the algorithm management API or subscription_id+item_version_id of the subscription algorithm API. |
environments |
Array of Map<String,String> objects |
Environment variables of a training job. The format is key: value. Leave this parameter blank. |
Parameter |
Type |
Description |
---|---|---|
name |
String |
Parameter name. |
value |
String |
Parameter value. |
description |
String |
Parameter description. |
constraint |
constraint object |
Parameter constraint. |
i18n_description |
i18n_description object |
Internationalization description. |
Parameter |
Type |
Description |
---|---|---|
type |
String |
Parameter type. |
editable |
Boolean |
Whether the parameter is editable. |
required |
Boolean |
Whether the parameter is mandatory. |
sensitive |
Boolean |
Whether the parameter is sensitive. |
valid_type |
String |
Valid type. |
valid_range |
Array of strings |
Valid range. |
Parameter |
Type |
Description |
---|---|---|
language |
String |
Internationalization language. |
description |
String |
Description. |
Parameter |
Type |
Description |
---|---|---|
auto_search |
auto_search object |
Hyperparameter search configuration. |
Parameter |
Type |
Description |
---|---|---|
skip_search_params |
String |
Hyperparameters to be skipped during the search. |
reward_attrs |
Array of reward_attrs objects |
List of search metrics. |
search_params |
Array of search_params objects |
Search parameters. |
algo_configs |
Array of algo_configs objects |
Search algorithm configurations. |
Parameter |
Type |
Description |
---|---|---|
name |
String |
Metric name. |
mode |
String |
Search direction.
|
regex |
String |
Regular expression of a metric. |
Parameter |
Type |
Description |
---|---|---|
name |
String |
Hyperparameter name. |
param_type |
String |
Parameter type.
|
lower_bound |
String |
Lower bound of the hyperparameter. |
upper_bound |
String |
Upper bound of the hyperparameter. |
discrete_points_num |
String |
Number of discrete points of a continuous hyperparameter. |
discrete_values |
Array of strings |
List of discrete hyperparameter values. |
Parameter |
Type |
Description |
---|---|---|
name |
String |
Name of the search algorithm. |
params |
Array of AutoSearchAlgoConfigParameter objects |
Search algorithm parameters. |
Parameter |
Type |
Description |
---|---|---|
key |
String |
Parameter key. |
value |
String |
Parameter value. |
type |
String |
Parameter type. |
Parameter |
Type |
Description |
---|---|---|
name |
String |
Name of the data input channel. |
description |
String |
Description of the data input channel. |
local_dir |
String |
Local directory of the container to which the data input channel is mapped. |
remote |
InputDataInfo object |
Data input. Options:
|
remote_constraint |
Array of remote_constraint objects |
Data input constraint. |
Parameter |
Type |
Description |
---|---|---|
dataset |
dataset object |
Dataset as the data input. |
obs |
obs object |
OBS in which the input and output data is stored. |
Parameter |
Type |
Description |
---|---|---|
id |
String |
Dataset ID of a training job. |
version_id |
String |
Dataset version ID of a training job. |
obs_url |
String |
OBS URL of the dataset required by a training job. ModelArts automatically parses and generates the URL based on the dataset and dataset version IDs. For example, /usr/data/. |
Parameter |
Type |
Description |
---|---|---|
obs_url |
String |
OBS URL of the dataset required by a training job. For example, /usr/data/. |
Parameter |
Type |
Description |
---|---|---|
data_type |
String |
Data input type, including the data storage location and dataset. |
attributes |
String |
Attributes if a dataset is used as the data input. Options:
|
Parameter |
Type |
Description |
---|---|---|
name |
String |
Name of the data output channel. |
description |
String |
Description of the data output channel. |
local_dir |
String |
Local directory of the container to which the data output channel is mapped. |
remote |
remote object |
Description of the actual data output. |
Parameter |
Type |
Description |
---|---|---|
obs_url |
String |
OBS URL to which data is actually exported. |
Parameter |
Type |
Description |
---|---|---|
engine_id |
String |
Engine ID selected for a training job. You can set this parameter to engine_id, engine_name + engine_version, or image_url. |
engine_name |
String |
Name of the engine selected for a training job. If engine_id is set, leave this parameter blank. |
engine_version |
String |
Name of the engine version selected for a training job. If engine_id is set, leave this parameter blank. |
image_url |
String |
Custom image URL selected for a training job. |
Parameter |
Type |
Description |
---|---|---|
role |
String |
Role of a heterogeneous training job. Options:
|
algorithm |
algorithm object |
Algorithm management and configuration. |
task_resource |
FlavorResponse object |
Flavors of a training job or an algorithm. |
Parameter |
Type |
Description |
---|---|---|
code_dir |
String |
Absolute path of the directory where the algorithm boot file is stored. |
boot_file |
String |
Absolute path of the algorithm boot file. |
inputs |
inputs object |
Algorithm input channel. |
outputs |
outputs object |
Algorithm output channel. |
engine |
engine object |
Engine on which a heterogeneous job depends. |
Parameter |
Type |
Description |
---|---|---|
name |
String |
Name of the data input channel. |
local_dir |
String |
Local path of the container to which the data input and output channels are mapped. |
remote |
remote object |
Actual data input. Heterogeneous jobs support only OBS. |
Parameter |
Type |
Description |
---|---|---|
obs |
obs object |
OBS in which the input and output data is stored. |
Parameter |
Type |
Description |
---|---|---|
obs_url |
String |
OBS URL of the dataset required by a training job. For example, /usr/data/. |
Parameter |
Type |
Description |
---|---|---|
name |
String |
Name of the data output channel. |
local_dir |
String |
Local directory of the container to which the data output channel is mapped. |
remote |
remote object |
Description of the actual data output. |
mode |
String |
Data transmission mode. The default value is upload_periodically. |
period |
String |
Data transmission period. The default value is 30s. |
Parameter |
Type |
Description |
---|---|---|
obs |
obs object |
OBS to which data is actually exported. |
Parameter |
Type |
Description |
---|---|---|
obs_url |
String |
OBS URL to which data is actually exported. |
Parameter |
Type |
Description |
---|---|---|
engine_id |
String |
Engine ID of a heterogeneous job, for example, caffe-1.0.0-python2.7. |
engine_name |
String |
Engine name of a heterogeneous job, for example, Caffe. |
engine_version |
String |
Engine version of a heterogeneous job. |
v1_compatible |
Boolean |
Whether the v1 compatibility mode is used. |
run_user |
String |
UID of the user that the engine runs as by default. |
image_url |
String |
Custom image URL selected by an algorithm. |
Parameter |
Type |
Description |
---|---|---|
flavor_id |
String |
ID of the resource flavor. |
flavor_name |
String |
Name of the resource flavor. |
max_num |
Integer |
Maximum number of nodes in a resource flavor. |
flavor_type |
String |
Resource flavor type. Options:
|
billing |
billing object |
Billing information of a resource flavor. |
flavor_info |
flavor_info object |
Resource flavor details. |
attributes |
Map<String,String> |
Other specification attributes. |
Parameter |
Type |
Description |
---|---|---|
code |
String |
Billing code. |
unit_num |
Integer |
Number of billing units. |
Parameter |
Type |
Description |
---|---|---|
max_num |
Integer |
Maximum number of nodes that can be selected. The value 1 indicates that the distributed mode is not supported. |
cpu |
cpu object |
CPU specifications. |
gpu |
gpu object |
GPU specifications. |
npu |
npu object |
Ascend flavors. |
memory |
memory object |
Memory information. |
Parameter |
Type |
Description |
---|---|---|
arch |
String |
CPU architecture. |
core_num |
Integer |
Number of cores. |
Parameter |
Type |
Description |
---|---|---|
unit_num |
Integer |
Number of GPUs. |
product_name |
String |
Product name. |
memory |
String |
Memory. |
Parameter |
Type |
Description |
---|---|---|
unit_num |
String |
Number of NPUs. |
product_name |
String |
Product name. |
memory |
String |
Memory. |
Parameter |
Type |
Description |
---|---|---|
size |
Integer |
Memory size. |
unit |
String |
Unit of the memory size, for example, GB. |
Parameter |
Type |
Description |
---|---|---|
resource |
Resource object |
Resource flavors of a training job. Select either flavor_id or pool_id+[flavor_id]. |
volumes |
Array of volumes objects |
Volumes attached to a training job. |
log_export_path |
log_export_path object |
Export path of training job logs. |
Parameter |
Type |
Description |
---|---|---|
policy |
String |
Resource flavor policy of a training job. Options: regular |
flavor_id |
String |
Resource flavor ID of a training job. |
flavor_name |
String |
Read-only flavor name returned by ModelArts when flavor_id is used. |
node_count |
Integer |
Number of resource replicas selected for a training job. Minimum: 1 |
pool_id |
String |
Resource pool ID selected for a training job. |
flavor_detail |
flavor_detail object |
Flavors of a training job or an algorithm. |
Parameter |
Type |
Description |
---|---|---|
flavor_type |
String |
Resource flavor type. Options:
|
billing |
billing object |
Billing information of a resource flavor. |
flavor_info |
flavor_info object |
Resource flavor details. |
Parameter |
Type |
Description |
---|---|---|
code |
String |
Billing code. |
unit_num |
Integer |
Number of billing units. |
Parameter |
Type |
Description |
---|---|---|
max_num |
Integer |
Maximum number of nodes that can be selected. The value 1 indicates that the distributed mode is not supported. |
cpu |
cpu object |
CPU specifications. |
gpu |
gpu object |
GPU specifications. |
npu |
npu object |
Ascend flavors. |
memory |
memory object |
Memory information. |
disk |
disk object |
Disk information. |
Parameter |
Type |
Description |
---|---|---|
arch |
String |
CPU architecture. |
core_num |
Integer |
Number of cores. |
Parameter |
Type |
Description |
---|---|---|
unit_num |
Integer |
Number of GPUs. |
product_name |
String |
Product name. |
memory |
String |
Memory. |
Parameter |
Type |
Description |
---|---|---|
unit_num |
String |
Number of NPUs. |
product_name |
String |
Product name. |
memory |
String |
Memory. |
Parameter |
Type |
Description |
---|---|---|
size |
Integer |
Memory size. |
unit |
String |
Unit of the memory size, for example, GB. |
Parameter |
Type |
Description |
---|---|---|
size |
String |
Disk size. |
unit |
String |
Unit of the disk size. Generally, the value is GB. |
Example Requests
- The following shows how to create a training job named TestModelArtsJob with the description This is a ModelArts job. The job uses the algorithm whose ID is 3f5d6706-7b67-408d-8ba0-ec08048c45ed, specifies no input or output, and runs on a free GPU flavor.
POST https://endpoint/v2/{project_id}/training-jobs

{
  "kind" : "job",
  "metadata" : {
    "name" : "TestModelArtsJob",
    "description" : "This is a ModelArts job"
  },
  "algorithm" : {
    "id" : "3f5d6706-7b67-408d-8ba0-ec08048c45ed",
    "parameters" : [
      { "name" : "input_dir", "value" : "obs://xxx/test/moxingtest-dir/" },
      { "name" : "input_file", "value" : "obs://xxx/test/moxingtest/" },
      { "name" : "large_file_method", "value" : "1" }
    ],
    "inputs" : [ ],
    "outputs" : [ ],
    "policies" : { "auto_search" : null },
    "environments" : { }
  },
  "spec" : {
    "resource" : {
      "policy" : "regular",
      "flavor_id" : "modelarts.p3.large.public.free",
      "node_count" : 1
    },
    "log_export_path" : { "obs_url" : "" }
  }
}
- The following shows how to use a custom image to create a training job named TestModelArtsJob2 with the description This is a ModelArts job2. The job runs in a dedicated resource pool and has an NFS volume attached.
POST https://endpoint/v2/{project_id}/training-jobs

{
  "kind" : "job",
  "metadata" : {
    "name" : "TestModelArtsJob2",
    "description" : "This is a ModelArts job2"
  },
  "algorithm" : {
    "engine" : { "image_url" : "hwstaff_z00424192/fastseq:1.2" },
    "command" : "cd /home/ma-user/ddp_demo && sh run_ddp.sh",
    "parameters" : [ ],
    "inputs" : [ ],
    "outputs" : [ ],
    "policies" : { "auto_search" : null },
    "environments" : {
      "NCCL_DEBUG" : "INFO",
      "NCCL_IB_DISABLE" : "0"
    }
  },
  "spec" : {
    "resource" : {
      "policy" : "regular",
      "flavor_id" : "modelarts.pool.visual.xlarge",
      "node_count" : 1,
      "pool_id" : "poolfaf38d76"
    },
    "log_export_path" : { "obs_url" : "/xxx/limou/ddp-demo-log/" },
    "volumes" : [ {
      "nfs" : {
        "nfs_server_path" : "192.168.0.82:/",
        "local_path" : "/home/ma-user/nfs/",
        "read_only" : false
      }
    } ]
  }
}
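The example requests above can be assembled with the Python standard library. The following sketch only builds the request object and does not send it. The `X-Auth-Token` header is an assumption about how the service authenticates requests and is not described in this section; `build_create_job_request` and its arguments are hypothetical.

```python
import json
import urllib.request


def build_create_job_request(endpoint: str, project_id: str,
                             token: str, body: dict) -> urllib.request.Request:
    """Build (but do not send) a POST /v2/{project_id}/training-jobs request."""
    url = f"https://{endpoint}/v2/{project_id}/training-jobs"
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            # Assumption: token-based authentication through this header.
            "X-Auth-Token": token,
        },
        method="POST",
    )
```

Passing the returned object to `urllib.request.urlopen` would send the request; a successful call is expected to return status code 201.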
Example Responses
Status code: 201
ok
{
  "kind" : "job",
  "metadata" : {
    "id" : "425b7087-83de-49ed-9e40-5bb642be956f",
    "name" : "TestModelArtsJob",
    "description" : "This is a ModelArts job",
    "create_time" : 1637045545982,
    "workspace_id" : "0",
    "ai_project" : "default-ai-project",
    "user_name" : ""
  },
  "status" : {
    "phase" : "Creating",
    "secondary_phase" : "Creating",
    "duration" : 0,
    "start_time" : 0,
    "node_count_metrics" : null,
    "tasks" : [ "worker-0", "server-0" ]
  },
  "algorithm" : {
    "id" : "3f5d6706-7b67-408d-8ba0-ec08048c45ed",
    "name" : "ttt-obs-gpu",
    "code_dir" : "/xxx/test/moxingtest-code/",
    "boot_file" : "/xxx/test/moxingtest-code/test_obs_gpu.py",
    "parameters" : [ {
      "name" : "input_dir",
      "description" : "",
      "i18n_description" : null,
      "value" : "obs://xxx/test/moxingtest-dir/",
      "constraint" : {
        "type" : "String",
        "editable" : true,
        "required" : true,
        "sensitive" : false,
        "valid_type" : "None",
        "valid_range" : [ ]
      }
    }, {
      "name" : "input_file",
      "description" : "",
      "i18n_description" : null,
      "value" : "obs://cxxx/test/moxingtest/",
      "constraint" : {
        "type" : "String",
        "editable" : true,
        "required" : true,
        "sensitive" : false,
        "valid_type" : "None",
        "valid_range" : [ ]
      }
    }, {
      "name" : "large_file_method",
      "description" : "",
      "i18n_description" : null,
      "value" : "1",
      "constraint" : {
        "type" : "Integer",
        "editable" : true,
        "required" : true,
        "sensitive" : false,
        "valid_type" : "None",
        "valid_range" : [ ]
      }
    } ],
    "parameters_customization" : false,
    "engine" : {
      "engine_id" : "horovod-cp36-tf-1.16.2",
      "engine_name" : "Horovod",
      "engine_version" : "0.16.2-TF-1.13.1-python3.6",
      "usage" : "training",
      "support_groups" : "public,roma",
      "v1_compatible" : true,
      "run_user" : ""
    },
    "policies" : { }
  },
  "spec" : {
    "resource" : {
      "policy" : "regular",
      "turbo_range" : [ 1, 2 ],
      "flavor_id" : "modelarts.p3.large.public.free",
      "flavor_name" : "Computing GPU(V100) instance",
      "node_count" : 1,
      "flavor_detail" : {
        "flavor_type" : "GPU",
        "billing" : { "code" : "modelarts.vm.gpu.free", "unit_num" : 1 },
        "attributes" : { "is_free" : "true", "max_free_job_count" : "10" },
        "flavor_info" : {
          "cpu" : { "arch" : "x86", "core_num" : 8 },
          "gpu" : { "unit_num" : 1, "product_name" : "NVIDIA-V100", "memory" : "32GB" },
          "memory" : { "size" : 64, "unit" : "GB" }
        }
      }
    },
    "log_export_path" : { },
    "is_hosted_log" : true
  }
}
Status Codes
Status Code |
Description |
---|---|
201 |
ok |
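Because 201 is the only status code listed for a successful call, a client can check for it before reading the response body. The following sketch extracts a few fields from a response shaped like the example response; `summarize_created_job` is a hypothetical helper, and the fields it picks are an arbitrary choice for illustration.

```python
import json


def summarize_created_job(status_code: int, response_text: str) -> dict:
    """Return the job ID, name, phase, and flavor from a 201 response body."""
    if status_code != 201:
        raise RuntimeError(f"training job was not created (status {status_code})")
    job = json.loads(response_text)
    # spec is absent for heterogeneous jobs that use tasks, so read it defensively.
    resource = (job.get("spec") or {}).get("resource") or {}
    return {
        "id": job["metadata"]["id"],
        "name": job["metadata"]["name"],
        "phase": job["status"]["phase"],
        "flavor_id": resource.get("flavor_id"),
    }
```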
Error Codes
See Error Codes.