Starting Intelligent Tasks

Function

This API is used to start an intelligent task, which can be an auto labeling task or an auto grouping task. Specify task_type in the request body to select the type of task to start. Datasets whose data path or work path is an OBS path in a KMS-encrypted bucket support pre-labeling but not active learning or auto grouping.

  • Auto labeling: Learning and training are performed based on the selected labels and images, and an existing model is used to quickly label the remaining images. Auto labeling includes active learning and pre-labeling.
      • Active learning: The system uses semi-supervised learning and hard example filtering to perform auto labeling, reducing the manual labeling workload and helping you find hard examples.
      • Pre-labeling: Select a model from the Model Management page for auto labeling.
  • Auto grouping: Unlabeled images are clustered using a clustering algorithm and then processed based on the clustering result. Images can be labeled or cleaned by group.

URI

POST /v2/{project_id}/datasets/{dataset_id}/tasks

Table 1 Path parameters
Parameter Mandatory Type Description
dataset_id Yes String Dataset ID.
project_id Yes String Project ID. For details about how to obtain the project ID, see Obtaining a Project ID <modelarts_03_0147>.

Request Parameters

Table 2 Request body parameters
Parameter Mandatory Type Description
collect_key_sample No Boolean

Whether to collect key samples. The options are as follows:

  • true: Collect key samples.
  • false: Do not collect key samples. (Default value)
config No SmartTaskConfig <createtask__request_smarttaskconfig> object Task configuration.
model_id No String Model ID.
task_type No String

Task type. The options are as follows:

  • auto-label: active learning
  • pre-label: pre-labeling
  • auto-grouping: auto grouping
  • auto-deploy: one-click model deployment
Table 3 SmartTaskConfig
Parameter Mandatory Type Description
algorithm_type No String

Algorithm type for auto labeling. Options:

  • fast: Only labeled samples are used for training. This type of algorithm achieves faster labeling.
  • accurate: In addition to labeled samples, unlabeled samples are used for semi-supervised training. This type of algorithm achieves more accurate labeling.
ambiguity No Boolean Whether to perform clustering based on the image blurring degree.
annotation_output No String Output path of the active learning labeling result.
collect_rule No String Sample collection rule. The default value is all, indicating full collection. Currently, only the value all is supported.
collect_sample No Boolean

Whether to enable sample collection. The options are as follows:

  • true: Enable sample collection. (Default value)
  • false: Do not enable sample collection.
confidence_scope No String Confidence range of key samples. The minimum and maximum values are separated by hyphens (-). Example: 0.10-0.90.
description No String Task description.
engine_name No String Engine name.
export_format No Integer

Format of the exported directory. The options are as follows:

  • 1: tree structure. For example: cat/1.jpg, dog/2.jpg.
  • 2: tile structure. For example: 1.jpg, 1.txt; 2.jpg, 2.txt.
export_params No ExportParams <createtask__request_exportparams> object Parameters of a dataset export task.
flavor No Flavor <createtask__request_flavor> object Training resource flavor.
image_brightness No Boolean Whether to perform clustering based on the image brightness.
image_colorfulness No Boolean Whether to perform clustering based on the image color.
inf_cluster_id No String ID of a dedicated cluster. This parameter is left blank by default, indicating that a dedicated cluster is not used. When using the dedicated cluster to deploy services, ensure that the cluster status is normal. After this parameter is set, the network configuration of the cluster is used, and the vpc_id parameter does not take effect.
inf_config_list No Array of InfConfig <createtask__request_infconfig> objects Configuration list required for running an inference task, which is optional and left blank by default.
inf_output No String Output path of inference in active learning.
infer_result_output_dir No String OBS directory for storing sample prediction results. This parameter is optional. The {service_id}-infer-result subdirectory in the output_dir directory is used by default.
key_sample_output No String Output path of hard examples in active learning.
log_url No String OBS URL of the logs of a training job. By default, this parameter is left blank.
manifest_path No String Path of the manifest file, which is used as the input for training and inference.
model_id No String Model ID.
model_name No String Model name.
model_parameter No String Model parameter.
model_version No String Model version.
n_clusters No Integer Number of clusters.
name No String Task name.
output_dir No String Sample output path. The format is as follows: Dataset output path/Dataset name-Dataset ID/annotation/auto-deploy/. Example: /test/work_1608083108676/dataset123-g6IO9qSu6hoxwCAirfm/annotation/auto-deploy/.
parameters No Array of TrainingParameter <createtask__request_trainingparameter> objects Running parameters of a training job.
pool_id No String ID of a resource pool.
property No String Attribute name.
req_uri No String Inference path of a batch job.
result_type No Integer

Processing mode of auto grouping results. The options are as follows:

  • 0: Save to OBS.
  • 1: Save to samples.
samples No Array of SampleLabels <createtask__request_samplelabels> objects List of labeling information for samples to be auto labeled.
stop_time No Integer Timeout interval, in minutes. The default value is 15 minutes. This parameter is used only in the scenario of auto labeling for videos.
time No String Timestamp in active learning.
train_data_path No String Path for storing existing training datasets.
train_url No String URL of the OBS path where the file of a training job is outputted. By default, this parameter is left blank.
version_format No String

Format of a dataset version. The options are as follows:

  • Default: default format
  • CarbonData: CarbonData (supported only by table datasets)
  • CSV: CSV
worker_server_num No Integer Number of workers in a training job.
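
For reference, here is a minimal sketch of the config object for an active learning task, combining several of the fields above; the values and OBS paths are illustrative placeholders, not defaults:

{
  "algorithm_type" : "accurate",
  "collect_sample" : true,
  "confidence_scope" : "0.10-0.90",
  "annotation_output" : "/test-bucket/output/annotation/",
  "key_sample_output" : "/test-bucket/output/hard-examples/"
}
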
Table 4 ExportParams
Parameter Mandatory Type Description
clear_hard_property No Boolean

Whether to clear hard example attributes. The options are as follows:

  • true: Clear hard example attributes. (Default value)
  • false: Do not clear hard example attributes.
export_dataset_version_format No String Format of the dataset version to which data is exported.
export_dataset_version_name No String Name of the dataset version to which data is exported.
export_dest No String

Export destination. The options are as follows:

  • DIR: Export data to OBS. (Default value)
  • NEW_DATASET: Export data to a new dataset.
export_new_dataset_name No String Name of the new dataset to which data is exported.
export_new_dataset_work_path No String Working directory of the new dataset to which data is exported.
ratio_sample_usage No Boolean

Whether to randomly allocate the training set and validation set based on the specified ratio. The options are as follows:

  • true: Allocate the training set and validation set.
  • false: Do not allocate the training set and validation set. (Default value)
sample_state No String

Sample status. The options are as follows:

  • ALL: labeled
  • NONE: unlabeled
  • UNCHECK: pending acceptance
  • ACCEPTED: accepted
  • REJECTED: rejected
  • UNREVIEWED: pending review
  • REVIEWED: reviewed
  • WORKFORCE_SAMPLED: sampled
  • WORKFORCE_SAMPLED_UNCHECK: sampling unchecked
  • WORKFORCE_SAMPLED_CHECKED: sampling checked
  • WORKFORCE_SAMPLED_ACCEPTED: sampling accepted
  • WORKFORCE_SAMPLED_REJECTED: sampling rejected
  • AUTO_ANNOTATION: to be confirmed
samples No Array of strings ID list of exported samples.
search_conditions No Array of SearchCondition <createtask__request_searchcondition> objects Exported search conditions. The relationship between multiple search conditions is OR.
train_sample_ratio No String Split ratio of the training set and validation set when a specified version is released. The default value is 1.00, indicating that all samples in the released version are used as the training set.
Table 5 SearchCondition
Parameter Mandatory Type Description
coefficient No String Filter by coefficient of difficulty.
frame_in_video No Integer A frame in the video.
hard No String

Whether a sample is a hard sample. The options are as follows:

  • 0: non-hard sample
  • 1: hard sample
import_origin No String Filter by data source.
kvp No String CT dosage (peak kilovoltage, kVp). Samples are filtered by dosage.
label_list No SearchLabels <createtask__request_searchlabels> object Label search criteria.
labeler No String Labeler.
metadata No SearchProp <createtask__request_searchprop> object Search by sample attribute.
parent_sample_id No String Parent sample ID.
sample_dir No String Directory where data samples are stored (the directory must end with a slash (/)). Only samples in the specified directory are searched for. Recursive search of directories is not supported.
sample_name No String Search by sample name, including the file name extension.
sample_time No String

When a sample is added to the dataset, an index is created based on the last modification time (accurate to the day) of the sample on OBS. You can then search for samples based on this time. The options are as follows:

  • month: Search for samples added from 30 days ago to the current day.
  • day: Search for samples added from yesterday (one day ago) to the current day.
  • yyyyMMdd-yyyyMMdd: Search for samples added in a specified period (at most 30 days), in the format of Start date-End date. For example, 20190901-20190915 indicates that samples generated from September 1 to September 15, 2019 are searched.
score No String Search by confidence.
slice_thickness No String DICOM layer thickness. Samples are filtered by layer thickness.
study_date No String DICOM scanning time.
time_in_video No String A time point in the video.
Table 6 SearchLabels
Parameter Mandatory Type Description
labels No Array of SearchLabel <createtask__request_searchlabel> objects List of label search criteria.
op No String

If you want to search for multiple labels, op must be specified. If you search for only one label, op can be left blank. The options are as follows:

  • OR: OR operation
  • AND: AND operation
Table 7 SearchLabel
Parameter Mandatory Type Description
name No String Label name.
op No String

Operation type between multiple attributes. The options are as follows:

  • OR: OR operation
  • AND: AND operation
property No Map<String,Array<String>> Label attribute, which is in the Object format and stores any key-value pairs. key indicates the attribute name, and value indicates the value list. If value is null, the search is not performed by value. Otherwise, the search value can be any value in the list.
type No Integer

Label type. The options are as follows:

  • 0: image classification
  • 1: object detection
  • 100: text classification
  • 101: named entity recognition
  • 102: text triplet relationship
  • 103: text triplet entity
  • 200: speech classification
  • 201: speech content
  • 202: speech paragraph labeling
  • 600: video classification
Table 8 SearchProp
Parameter Mandatory Type Description
op No String

Relationship between attribute values. The options are as follows:

  • AND: AND relationship
  • OR: OR relationship
props No Map<String,Array<String>> Search criteria of an attribute. Multiple search criteria can be set.
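
To show how the objects in Table 5 to Table 8 nest, here is a sketch of a search_conditions array that matches hard samples in one directory carrying a given object detection label; the label name and directory are placeholders:

[ {
  "hard" : "1",
  "sample_dir" : "/test-bucket/images/",
  "label_list" : {
    "op" : "OR",
    "labels" : [ {
      "name" : "cat",
      "type" : 1
    } ]
  }
} ]
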
Table 9 Flavor
Parameter Mandatory Type Description
code No String Attribute code of a resource specification, which is used for task creation.
Table 10 InfConfig
Parameter Mandatory Type Description
envs No Map<String,String> (Optional) Environment variable key-value pair required for running a model. By default, this parameter is left blank. To ensure data security, do not enter sensitive information, such as plaintext passwords, in environment variables.
instance_count No Integer Number of instances for model deployment, that is, the number of compute nodes.
model_id No String Model ID.
specification No String Resource specifications of real-time services. For details, see Deploying Services <modelarts_03_0082>.
weight No Integer Traffic weight allocated to a model. This parameter is mandatory only when infer_type is set to real-time. The sum of the weights must be 100.
Table 11 TrainingParameter
Parameter Mandatory Type Description
label No String Parameter name.
value No String Parameter value.
Table 12 SampleLabels
Parameter Mandatory Type Description
labels No Array of SampleLabel <createtask__request_samplelabel> objects Sample label list. If this parameter is left blank, all sample labels are deleted.
metadata No SampleMetadata <createtask__request_samplemetadata> object Key-value pair of the sample metadata attribute.
sample_id No String Sample ID.
sample_type No Integer

Sample type. The options are as follows:

  • 0: image
  • 1: text
  • 2: speech
  • 4: table
  • 6: video
  • 9: custom format
sample_usage No String

Sample usage. The options are as follows:

  • TRAIN: training
  • EVAL: evaluation
  • TEST: test
  • INFERENCE: inference
source No String Source address of sample data.
worker_id No String ID of a labeling team member.
Table 13 SampleLabel
Parameter Mandatory Type Description
annotated_by No String

Video labeling method, which is used to distinguish whether a video is labeled manually or automatically. The options are as follows:

  • human: manual labeling
  • auto: automatic labeling
id No String Label ID.
name No String Label name.
property No SampleLabelProperty <createtask__request_samplelabelproperty> object Attribute key-value pair of the sample label, such as the object shape and shape feature.
score No Float Confidence.
type No Integer

Label type. The options are as follows:

  • 0: image classification
  • 1: object detection
  • 100: text classification
  • 101: named entity recognition
  • 102: text triplet relationship
  • 103: text triplet entity
  • 200: speech classification
  • 201: speech content
  • 202: speech paragraph labeling
  • 600: video classification
Table 14 SampleLabelProperty
Parameter Mandatory Type Description
@modelarts:content No String Speech text content, which is a default attribute dedicated to the speech label (including the speech content and speech start and end points).
@modelarts:end_index No Integer

End position of the text, which is a default attribute dedicated to the named entity label. The end position does not include the character corresponding to the value of end_index. Examples are as follows.

  • If the text content is "Barack Hussein Obama II (born August 4, 1961) is an American attorney and politician.", the start_index and end_index values of "Barack Hussein Obama II" are 0 and 23, respectively.
  • If the text content is "By the end of 2018, the company has more than 100 employees.", the start_index and end_index values of "By the end of 2018" are 0 and 18, respectively.
@modelarts:end_time No String Speech end time, which is a default attribute dedicated to the speech start/end point label, in the format of hh:mm:ss.SSS. (hh indicates hour; mm indicates minute; ss indicates second; and SSS indicates millisecond.)
@modelarts:feature No Object

Shape feature, which is a default attribute dedicated to the object detection label, with type of List. The upper left corner of an image is used as the coordinate origin [0,0]. Each coordinate point is represented by [x, y]. x indicates the horizontal coordinate, and y indicates the vertical coordinate (both x and y are greater than or equal to 0). The format of each shape is as follows:

  • bndbox: consists of two points, for example, [[0,10],[50,95]]. The first point is located at the upper left corner of the rectangle and the second point is located at the lower right corner of the rectangle. That is, both the X and Y coordinates of the first point must be smaller than those of the second point.
  • polygon: consists of multiple points that are connected in sequence to form a polygon, for example, [[0,100],[50,95],[10,60],[500,400]].
  • circle: consists of the center point and radius, for example, [[100,100],[50]].
  • line: consists of two points, for example, [[0,100],[50,95]]. The first point is the start point, and the second point is the end point.
  • dashed: consists of two points, for example, [[0,100],[50,95]]. The first point is the start point, and the second point is the end point.
  • point: consists of one point, for example, [[0,100]].
  • polyline: consists of multiple points, for example, [[0,100],[50,95],[10,60],[500,400]].
@modelarts:from No String ID of the head entity in the triplet relationship label, which is a default attribute dedicated to the triplet relationship label.
@modelarts:hard No String

Whether the sample is labeled as a hard example, which is a default attribute. Options:

  • 0/false: not a hard example
  • 1/true: hard example
@modelarts:hard_coefficient No String Coefficient of difficulty of each label level, which is a default attribute. The value range is [0,1].
@modelarts:hard_reasons No String

Reasons why the sample is a hard example, which is a default attribute. Separate multiple hard example reason IDs with hyphens (-), for example, 3-20-21-19. The options are as follows:

  • 0: No target objects are identified.
  • 1: The confidence is low.
  • 2: The clustering result based on the training dataset is inconsistent with the prediction result.
  • 3: The prediction result is greatly different from the data of the same type in the training dataset.
  • 4: The prediction results of multiple consecutive similar images are inconsistent.
  • 5: There is a large offset between the image resolution and the feature distribution of the training dataset.
  • 6: There is a large offset between the aspect ratio of the image and the feature distribution of the training dataset.
  • 7: There is a large offset between the brightness of the image and the feature distribution of the training dataset.
  • 8: There is a large offset between the saturation of the image and the feature distribution of the training dataset.
  • 9: There is a large offset between the color richness of the image and the feature distribution of the training dataset.
  • 10: There is a large offset between the definition of the image and the feature distribution of the training dataset.
  • 11: There is a large offset between the number of frames of the image and the feature distribution of the training dataset.
  • 12: There is a large offset between the standard deviation of area of image frames and the feature distribution of the training dataset.
  • 13: There is a large offset between the aspect ratio of image frames and the feature distribution of the training dataset.
  • 14: There is a large offset between the area portion of image frames and the feature distribution of the training dataset.
  • 15: There is a large offset between the edge of image frames and the feature distribution of the training dataset.
  • 16: There is a large offset between the brightness of image frames and the feature distribution of the training dataset.
  • 17: There is a large offset between the definition of image frames and the feature distribution of the training dataset.
  • 18: There is a large offset between the stack of image frames and the feature distribution of the training dataset.
  • 19: The data enhancement result based on GaussianBlur is inconsistent with the prediction result of the original image.
  • 20: The data enhancement result based on fliplr is inconsistent with the prediction result of the original image.
  • 21: The data enhancement result based on Crop is inconsistent with the prediction result of the original image.
  • 22: The data enhancement result based on flipud is inconsistent with the prediction result of the original image.
  • 23: The data enhancement result based on scale is inconsistent with the prediction result of the original image.
  • 24: The data enhancement result based on translate is inconsistent with the prediction result of the original image.
  • 25: The data enhancement result based on shear is inconsistent with the prediction result of the original image.
  • 26: The data enhancement result based on superpixels is inconsistent with the prediction result of the original image.
  • 27: The data enhancement result based on sharpen is inconsistent with the prediction result of the original image.
  • 28: The data enhancement result based on add is inconsistent with the prediction result of the original image.
  • 29: The data enhancement result based on invert is inconsistent with the prediction result of the original image.
  • 30: The data is predicted to be abnormal.
@modelarts:shape No String

Object shape, which is a default attribute dedicated to the object detection label and is left empty by default. The options are as follows:

  • bndbox: rectangle
  • polygon: polygon
  • circle: circle
  • line: straight line
  • dashed: dotted line
  • point: point
  • polyline: polyline
@modelarts:source No String Speech source, which is a default attribute dedicated to the speech start/end point label and can be set to a speaker or narrator.
@modelarts:start_index No Integer Start position of the text, which is a default attribute dedicated to the named entity label. Indexing starts at 0, and the character corresponding to the value of start_index is included.
@modelarts:start_time No String Speech start time, which is a default attribute dedicated to the speech start/end point label, in the format of hh:mm:ss.SSS. (hh indicates hour; mm indicates minute; ss indicates second; and SSS indicates millisecond.)
@modelarts:to No String ID of the tail entity in the triplet relationship label, which is a default attribute dedicated to the triplet relationship label.
Table 15 SampleMetadata
Parameter Mandatory Type Description
@modelarts:hard No Double

Whether the sample is labeled as a hard sample, which is a default attribute. The options are as follows:

  • 0: non-hard sample
  • 1: hard sample
@modelarts:hard_coefficient No Double Coefficient of difficulty of each sample level, which is a default attribute. The value range is [0,1].
@modelarts:hard_reasons No Array of integers

ID of a hard sample reason, which is a default attribute. The options are as follows:

  • 0: No target objects are identified.
  • 1: The confidence is low.
  • 2: The clustering result based on the training dataset is inconsistent with the prediction result.
  • 3: The prediction result is greatly different from the data of the same type in the training dataset.
  • 4: The prediction results of multiple consecutive similar images are inconsistent.
  • 5: There is a large offset between the image resolution and the feature distribution of the training dataset.
  • 6: There is a large offset between the aspect ratio of the image and the feature distribution of the training dataset.
  • 7: There is a large offset between the brightness of the image and the feature distribution of the training dataset.
  • 8: There is a large offset between the saturation of the image and the feature distribution of the training dataset.
  • 9: There is a large offset between the color richness of the image and the feature distribution of the training dataset.
  • 10: There is a large offset between the definition of the image and the feature distribution of the training dataset.
  • 11: There is a large offset between the number of frames of the image and the feature distribution of the training dataset.
  • 12: There is a large offset between the standard deviation of area of image frames and the feature distribution of the training dataset.
  • 13: There is a large offset between the aspect ratio of image frames and the feature distribution of the training dataset.
  • 14: There is a large offset between the area portion of image frames and the feature distribution of the training dataset.
  • 15: There is a large offset between the edge of image frames and the feature distribution of the training dataset.
  • 16: There is a large offset between the brightness of image frames and the feature distribution of the training dataset.
  • 17: There is a large offset between the definition of image frames and the feature distribution of the training dataset.
  • 18: There is a large offset between the stack of image frames and the feature distribution of the training dataset.
  • 19: The data enhancement result based on GaussianBlur is inconsistent with the prediction result of the original image.
  • 20: The data enhancement result based on fliplr is inconsistent with the prediction result of the original image.
  • 21: The data enhancement result based on Crop is inconsistent with the prediction result of the original image.
  • 22: The data enhancement result based on flipud is inconsistent with the prediction result of the original image.
  • 23: The data enhancement result based on scale is inconsistent with the prediction result of the original image.
  • 24: The data enhancement result based on translate is inconsistent with the prediction result of the original image.
  • 25: The data enhancement result based on shear is inconsistent with the prediction result of the original image.
  • 26: The data enhancement result based on superpixels is inconsistent with the prediction result of the original image.
  • 27: The data enhancement result based on sharpen is inconsistent with the prediction result of the original image.
  • 28: The data enhancement result based on add is inconsistent with the prediction result of the original image.
  • 29: The data enhancement result based on invert is inconsistent with the prediction result of the original image.
  • 30: The data is predicted to be abnormal.
@modelarts:size No Array of objects Image size (width, height, and depth of the image), which is a default attribute, with type of List. In the list, the first number indicates the width (pixels), the second number indicates the height (pixels), and the third number indicates the depth (the depth can be left blank and the default value is 3). For example, [100,200,3] and [100,200] are both valid. Note: This parameter is mandatory only when the sample label list contains the object detection label.
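
Putting Table 12 to Table 14 together, a single auto-labeled object detection entry in samples could look like the following sketch; the sample ID, label name, and coordinates are placeholders:

[ {
  "sample_id" : "0a0939d6d3c6a6f321c4",
  "sample_type" : 0,
  "labels" : [ {
    "name" : "cat",
    "type" : 1,
    "annotated_by" : "auto",
    "score" : 0.95,
    "property" : {
      "@modelarts:shape" : "bndbox",
      "@modelarts:feature" : [ [ 10, 20 ], [ 120, 200 ] ]
    }
  } ]
} ]
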

Response Parameters

Status code: 200

Table 16 Response body parameters
Parameter Type Description
task_id String Task ID.

Example Requests

  • Starting an Auto Labeling (Active Learning) Task

    {
      "task_type" : "auto-label",
      "collect_key_sample" : true,
      "config" : {
        "algorithm_type" : "fast"
      }
    }
  • Starting an Auto Labeling (Pre-labeling) Task

    {
      "task_type" : "pre-label",
      "model_id" : "c4989033-7584-44ee-a180-1c476b810e46",
      "collect_key_sample" : true,
      "config" : {
        "inf_config_list" : [ {
          "specification" : "modelarts.vm.cpu.2u",
          "instance_count" : 1
        } ]
      }
    }
  • Starting an Auto Grouping Task

    {
      "type" : 2,
      "export_type" : 1,
      "config" : {
        "n_clusters" : "2",
        "ambiguity" : false,
        "image_brightness" : false,
        "image_colorfulness" : false,
        "property" : "size"
      }
    }

Example Responses

Status code: 200

OK

{
  "task_id" : "r0jT2zwxBDKf8KEnSuZ"
}

Status Codes

Status Code Description
200 OK
401 Unauthorized
403 Forbidden
404 Not Found

Error Codes

See Error Codes <modelarts_03_0095>.