This API is used to query monitoring information about a single container of a job.
GET /v1/{project_id}/training-jobs/{job_id}/versions/{version_id}/pod/{pod_name}/metric-statistic

Parameter | Mandatory | Type | Description |
---|---|---|---|
project_id | Yes | String | Project ID |
job_id | Yes | Long | ID of a training job |
version_id | Yes | Long | Version ID of a training job |
pod_name | Yes | String | Container name, which is the same as the job log name. For details about how to obtain the value, see Obtaining the Name of a Training Job Log File. |


Parameter | Mandatory | Type | Description |
---|---|---|---|
metrics | No | String | Metrics to be queried. Separate multiple metrics with commas (,), for example, CpuUsage,MemUsage. If this parameter is left blank, all metrics are queried. Options: |
statistic_type | No | String | Metric statistics method, indicating whether to collect metric statistics based on a single GPU. This parameter applies only to GPU metric statistics. |

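The path and query parameters above can be combined into a request URL as in the following minimal sketch. The function name `build_metric_url` and the project ID `my-project` are hypothetical and used for illustration only; `endpoint` is the placeholder from the sample request. Authentication headers are not covered here.

```python
from urllib.parse import quote, urlencode


def build_metric_url(endpoint, project_id, job_id, version_id, pod_name,
                     metrics=None, statistic_type=None):
    """Build the metric-statistic URL (hypothetical helper, not part of the API)."""
    # All four path parameters are mandatory; quote() escapes unsafe characters.
    path = "/v1/{}/training-jobs/{}/versions/{}/pod/{}/metric-statistic".format(
        quote(str(project_id), safe=""),
        int(job_id),
        int(version_id),
        quote(str(pod_name), safe=""),
    )
    # Both query parameters are optional; leave them out when not set.
    query = {}
    if metrics:
        # Multiple metrics are sent as one comma-separated string.
        query["metrics"] = ",".join(metrics)
    if statistic_type:
        query["statistic_type"] = statistic_type
    url = "https://{}{}".format(endpoint, path)
    if query:
        # Keep commas literal so the list matches the documented format.
        url += "?" + urlencode(query, safe=",")
    return url


print(build_metric_url("endpoint", "my-project", 10, 10, "pod1",
                       metrics=["gpuUtil"]))
```

Omitting `metrics` produces a URL without a query string, which according to the table above queries all metrics.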

Parameter | Type | Description |
---|---|---|
error_message | String | Error message when the API call fails. This parameter is not included when the API call succeeds. |
error_code | String | Error code when the API call fails. For details, see Error Codes. This parameter is not included when the API call succeeds. |
metrics | JSON Array | Metric monitoring details. For details, see Table 4. |
interval | Integer | Query interval, in minutes. |

The following shows how to query the gpuUtil metric of the container pod1 for the job whose job_id is 10 and version_id is 10.
GET https://endpoint/v1/{project_id}/training-jobs/10/versions/10/pod/pod1/metric-statistic?metrics=gpuUtil
Successful response:
{ "metrics": [ { "metric": "gpuUtil", "value": ["1", "22", "33"] } ], "interval": 1 }
Failed response:
{ "error_message": "Error string", "error_code": "ModelArts.0105" }
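A client can tell the two response shapes apart by checking for `error_code`, which is present only when the call fails. The following is a minimal sketch of that check. The names `parse_metric_response` and `MetricQueryError` are hypothetical, and converting each value to `float` assumes the values are always numeric strings, as they are in the sample response.

```python
import json


class MetricQueryError(Exception):
    """Raised when the response body carries error_code (hypothetical class)."""


def parse_metric_response(body_text):
    """Return (interval_minutes, {metric_name: [float, ...]}) from a response body."""
    body = json.loads(body_text)
    # error_code and error_message are included only when the call fails.
    if "error_code" in body:
        raise MetricQueryError(
            "{}: {}".format(body["error_code"], body.get("error_message", "")))
    series = {}
    for item in body.get("metrics", []):
        # Assumption: values are numeric strings, as in the sample response.
        series[item["metric"]] = [float(v) for v in item["value"]]
    return body.get("interval"), series


ok = '{ "metrics": [ { "metric":"gpuUtil", "value":["1","22","33"] } ], "interval" : 1 }'
interval, series = parse_metric_response(ok)
print(interval, series)  # 1 {'gpuUtil': [1.0, 22.0, 33.0]}
```

Passing the failed sample response to the same function raises `MetricQueryError` with the error code and message in its text.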
For details about status codes, see Status Code.