Starting Intelligent Tasks

Function

This API is used to start an intelligent task, which can be an auto labeling task or an auto grouping task. Specify task_type in the request body to select the type of task to start. Datasets whose data path or work path is an OBS path in a KMS-encrypted bucket support pre-labeling but not active learning or auto grouping.

  • Auto labeling: Learning and training are performed based on the selected labels and images, and an existing model is used to quickly label the remaining images. Auto labeling includes active learning and pre-labeling.
      • Active learning: The system uses semi-supervised learning and hard example filtering to perform auto labeling, reducing the manual labeling workload and helping you find hard examples.
      • Pre-labeling: Select a model from the Model Management page for auto labeling.
  • Auto grouping: Unlabeled images are clustered using a clustering algorithm and then processed based on the clustering result. Images can be labeled or cleaned by group.

URI

POST /v2/{project_id}/datasets/{dataset_id}/tasks

Table 1 Path parameters
Parameter Mandatory Type Description
dataset_id Yes String Dataset ID.
project_id Yes String Project ID. For details about how to obtain the project ID, see Obtaining a Project ID <modelarts_03_0147>.

Request Parameters

Table 2 Request body parameters
Parameter Mandatory Type Description
collect_key_sample No Boolean

Whether to collect key samples. The options are as follows:

  • true: Collect key samples.
  • false: Do not collect key samples. (Default value)
config No SmartTaskConfig <createtask__request_smarttaskconfig> object Task configuration.
model_id No String Model ID.
task_type No String

Task type. The options are as follows:

  • auto-label: active learning
  • pre-label: pre-labeling
  • auto-grouping: auto grouping
  • auto-deploy: one-click model deployment
Table 3 SmartTaskConfig
Parameter Mandatory Type Description
algorithm_type No String

Algorithm type for auto labeling. Options:

  • fast: Only labeled samples are used for training. This type of algorithm achieves faster labeling.
  • accurate: In addition to labeled samples, unlabeled samples are used for semi-supervised training. This type of algorithm achieves more accurate labeling.
ambiguity No Boolean Whether to perform clustering based on the image blurring degree.
annotation_output No String Output path of the active learning labeling result.
collect_rule No String Sample collection rule. The default value is all, indicating full collection. Currently, only the value all is supported.
collect_sample No Boolean

Whether to enable sample collection. The options are as follows:

  • true: Enable sample collection. (Default value)
  • false: Do not enable sample collection.
confidence_scope No String Confidence range of key samples. The minimum and maximum values are separated by hyphens (-). Example: 0.10-0.90.
description No String Task description.
engine_name No String Engine name.
export_format No Integer

Format of the exported directory. The options are as follows:

  • 1: tree structure. For example: cat/1.jpg, dog/2.jpg.
  • 2: tile structure. For example: 1.jpg, 1.txt; 2.jpg, 2.txt.
export_params No ExportParams <createtask__request_exportparams> object Parameters of a dataset export task.
flavor No Flavor <createtask__request_flavor> object Training resource flavor.
image_brightness No Boolean Whether to perform clustering based on the image brightness.
image_colorfulness No Boolean Whether to perform clustering based on the image color.
inf_cluster_id No String ID of a dedicated cluster. This parameter is left blank by default, indicating that a dedicated cluster is not used. When using the dedicated cluster to deploy services, ensure that the cluster status is normal. After this parameter is set, the network configuration of the cluster is used, and the vpc_id parameter does not take effect.
inf_config_list No Array of InfConfig <createtask__request_infconfig> objects Configuration list required for running an inference task, which is optional and left blank by default.
inf_output No String Output path of inference in active learning.
infer_result_output_dir No String OBS directory for storing sample prediction results. This parameter is optional. The {service_id}-infer-result subdirectory in the output_dir directory is used by default.
key_sample_output No String Output path of hard examples in active learning.
log_url No String OBS URL of the logs of a training job. By default, this parameter is left blank.
manifest_path No String Path of the manifest file, which is used as the input for training and inference.
model_id No String Model ID.
model_name No String Model name.
model_parameter No String Model parameter.
model_version No String Model version.
n_clusters No Integer Number of clusters.
name No String Task name.
output_dir No String Sample output path. The format is as follows: Dataset output path/Dataset name-Dataset ID/annotation/auto-deploy/. Example: /test/work_1608083108676/dataset123-g6IO9qSu6hoxwCAirfm/annotation/auto-deploy/.
parameters No Array of TrainingParameter <createtask__request_trainingparameter> objects Running parameters of a training job.
pool_id No String ID of a resource pool.
property No String Attribute name.
req_uri No String Inference path of a batch job.
result_type No Integer

Processing mode of auto grouping results. The options are as follows:

  • 0: Save to OBS.
  • 1: Save to samples.
samples No Array of SampleLabels <createtask__request_samplelabels> objects List of labeling information for samples to be auto labeled.
stop_time No Integer Timeout interval, in minutes. The default value is 15 minutes. This parameter is used only in the scenario of auto labeling for videos.
time No String Timestamp in active learning.
train_data_path No String Path for storing existing training datasets.
train_url No String URL of the OBS path where the file of a training job is outputted. By default, this parameter is left blank.
version_format No String

Format of a dataset version. The options are as follows:

  • Default: default format
  • CarbonData: CarbonData (supported only by table datasets)
  • CSV: CSV
worker_server_num No Integer Number of workers in a training job.
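
For reference, here is a minimal sketch of the config object for an active learning task, combining several of the fields above; the values and OBS paths are illustrative placeholders, not defaults:

{
  "algorithm_type" : "accurate",
  "collect_sample" : true,
  "confidence_scope" : "0.10-0.90",
  "annotation_output" : "/test-bucket/output/annotation/",
  "key_sample_output" : "/test-bucket/output/hard-examples/"
}
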
Table 4 ExportParams
Parameter Mandatory Type Description
clear_hard_property No Boolean

Whether to clear hard example attributes. The options are as follows:

  • true: Clear hard example attributes. (Default value)
  • false: Do not clear hard example attributes.
export_dataset_version_format No String Format of the dataset version to which data is exported.
export_dataset_version_name No String Name of the dataset version to which data is exported.
export_dest No String

Export destination. The options are as follows:

  • DIR: Export data to OBS. (Default value)
  • NEW_DATASET: Export data to a new dataset.
export_new_dataset_name No String Name of the new dataset to which data is exported.
export_new_dataset_work_path No String Working directory of the new dataset to which data is exported.
ratio_sample_usage No Boolean

Whether to randomly allocate the training set and validation set based on the specified ratio. The options are as follows:

  • true: Allocate the training set and validation set.
  • false: Do not allocate the training set and validation set. (Default value)
sample_state No String

Sample status. The options are as follows:

  • ALL: labeled
  • NONE: unlabeled
  • UNCHECK: pending acceptance
  • ACCEPTED: accepted
  • REJECTED: rejected
  • UNREVIEWED: pending review
  • REVIEWED: reviewed
  • WORKFORCE_SAMPLED: sampled
  • WORKFORCE_SAMPLED_UNCHECK: sampling unchecked
  • WORKFORCE_SAMPLED_CHECKED: sampling checked
  • WORKFORCE_SAMPLED_ACCEPTED: sampling accepted
  • WORKFORCE_SAMPLED_REJECTED: sampling rejected
  • AUTO_ANNOTATION: to be confirmed
samples No Array of strings ID list of exported samples.
search_conditions No Array of SearchCondition <createtask__request_searchcondition> objects Exported search conditions. The relationship between multiple search conditions is OR.
train_sample_ratio No String Split ratio of the training set and validation set when a specified version is released. The default value is 1.00, indicating that all samples in the released version are used as the training set.
Table 5 SearchCondition
Parameter Mandatory Type Description
coefficient No String Filter by coefficient of difficulty.
frame_in_video No Integer A frame in the video.
hard No String

Whether a sample is a hard sample. The options are as follows:

  • 0: non-hard sample
  • 1: hard sample
import_origin No String Filter by data source.
kvp No String CT dosage (peak kilovoltage, kVp). Samples are filtered by dosage.
label_list No SearchLabels <createtask__request_searchlabels> object Label search criteria.
labeler No String Labeler.
metadata No SearchProp <createtask__request_searchprop> object Search by sample attribute.
parent_sample_id No String Parent sample ID.
sample_dir No String Directory where data samples are stored (the directory must end with a slash (/)). Only samples in the specified directory are searched for. Recursive search of directories is not supported.
sample_name No String Search by sample name, including the file name extension.
sample_time No String

When a sample is added to the dataset, an index is created based on the last modification time (accurate to the day) of the sample on OBS. You can then search for samples based on this time. The options are as follows:

  • month: Search for samples added from 30 days ago to the current day.
  • day: Search for samples added from yesterday (one day ago) to the current day.
  • yyyyMMdd-yyyyMMdd: Search for samples added in a specified period (at most 30 days), in the format of Start date-End date. For example, 20190901-20190915 indicates that samples generated from September 1 to September 15, 2019 are searched.
score No String Search by confidence.
slice_thickness No String DICOM layer thickness. Samples are filtered by layer thickness.
study_date No String DICOM scanning time.
time_in_video No String A time point in the video.
Table 6 SearchLabels
Parameter Mandatory Type Description
labels No Array of SearchLabel <createtask__request_searchlabel> objects List of label search criteria.
op No String

If you want to search for multiple labels, op must be specified. If you search for only one label, op can be left blank. The options are as follows:

  • OR: OR operation
  • AND: AND operation
Table 7 SearchLabel
Parameter Mandatory Type Description
name No String Label name.
op No String

Operation type between multiple attributes. The options are as follows:

  • OR: OR operation
  • AND: AND operation
property No Map<String,Array<String>> Label attribute, which is in the Object format and stores any key-value pairs. key indicates the attribute name, and value indicates the value list. If value is null, the search is not performed by value. Otherwise, the search value can be any value in the list.
type No Integer

Label type. The options are as follows:

  • 0: image classification
  • 1: object detection
  • 100: text classification
  • 101: named entity recognition
  • 102: text triplet relationship
  • 103: text triplet entity
  • 200: speech classification
  • 201: speech content
  • 202: speech paragraph labeling
  • 600: video classification
Table 8 SearchProp
Parameter Mandatory Type Description
op No String

Relationship between attribute values. The options are as follows:

  • AND: AND relationship
  • OR: OR relationship
props No Map<String,Array<String>> Search criteria of an attribute. Multiple search criteria can be set.
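
To show how the objects in Table 5 to Table 8 nest, here is a sketch of a search_conditions array that matches hard samples in one directory carrying a given object detection label; the label name and directory are placeholders:

[ {
  "hard" : "1",
  "sample_dir" : "/test-bucket/images/",
  "label_list" : {
    "op" : "OR",
    "labels" : [ {
      "name" : "cat",
      "type" : 1
    } ]
  }
} ]
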
Table 9 Flavor
Parameter Mandatory Type Description
code No String Attribute code of a resource specification, which is used for task creation.
Table 10 InfConfig
Parameter Mandatory Type Description
envs No Map<String,String> (Optional) Environment variable key-value pair required for running a model. By default, this parameter is left blank. To ensure data security, do not enter sensitive information, such as plaintext passwords, in environment variables.
instance_count No Integer Number of instances for model deployment, that is, the number of compute nodes.
model_id No String Model ID.
specification No String Resource specifications of real-time services. For details, see Deploying Services <modelarts_03_0082>.
weight No Integer Traffic weight allocated to a model. This parameter is mandatory only when infer_type is set to real-time. The sum of the weights must be 100.
Table 11 TrainingParameter
Parameter Mandatory Type Description
label No String Parameter name.
value No String Parameter value.
Table 12 SampleLabels
Parameter Mandatory Type Description
labels No Array of SampleLabel <createtask__request_samplelabel> objects Sample label list. If this parameter is left blank, all sample labels are deleted.
metadata No SampleMetadata <createtask__request_samplemetadata> object Key-value pair of the sample metadata attribute.
sample_id No String Sample ID.
sample_type No Integer

Sample type. The options are as follows:

  • 0: image
  • 1: text
  • 2: speech
  • 4: table
  • 6: video
  • 9: custom format
sample_usage No String

Sample usage. The options are as follows:

  • TRAIN: training
  • EVAL: evaluation
  • TEST: test
  • INFERENCE: inference
source No String Source address of sample data.
worker_id No String ID of a labeling team member.
Table 13 SampleLabel
Parameter Mandatory Type Description
annotated_by No String

Video labeling method, which is used to distinguish whether a video is labeled manually or automatically. The options are as follows:

  • human: manual labeling
  • auto: automatic labeling
id No String Label ID.
name No String Label name.
property No SampleLabelProperty <createtask__request_samplelabelproperty> object Attribute key-value pair of the sample label, such as the object shape and shape feature.
score No Float Confidence.
type No Integer

Label type. The options are as follows:

  • 0: image classification
  • 1: object detection
  • 100: text classification
  • 101: named entity recognition
  • 102: text triplet relationship
  • 103: text triplet entity
  • 200: speech classification
  • 201: speech content
  • 202: speech paragraph labeling
  • 600: video classification
Table 14 SampleLabelProperty
Parameter Mandatory Type Description
@modelarts:content No String Speech text content, which is a default attribute dedicated to the speech label (including the speech content and speech start and end points).
@modelarts:end_index No Integer

End position of the text, which is a default attribute dedicated to the named entity label. The end position does not include the character corresponding to the value of end_index. Examples are as follows.

  • If the text content is "Barack Hussein Obama II (born August 4, 1961) is an American attorney and politician.", the start_index and end_index values of "Barack Hussein Obama II" are 0 and 23, respectively.
  • If the text content is "By the end of 2018, the company has more than 100 employees.", the start_index and end_index values of "By the end of 2018" are 0 and 18, respectively.
@modelarts:end_time No String Speech end time, which is a default attribute dedicated to the speech start/end point label, in the format of hh:mm:ss.SSS. (hh indicates hour; mm indicates minute; ss indicates second; and SSS indicates millisecond.)
@modelarts:feature No Object

Shape feature, which is a default attribute dedicated to the object detection label, with type of List. The upper left corner of an image is used as the coordinate origin [0,0]. Each coordinate point is represented by [x, y]. x indicates the horizontal coordinate, and y indicates the vertical coordinate (both x and y are greater than or equal to 0). The format of each shape is as follows:

  • bndbox: consists of two points, for example, [[0,10],[50,95]]. The first point is located at the upper left corner of the rectangle and the second point is located at the lower right corner of the rectangle. That is, both the X and Y coordinates of the first point must be smaller than those of the second point.
  • polygon: consists of multiple points that are connected in sequence to form a polygon, for example, [[0,100],[50,95],[10,60],[500,400]].
  • circle: consists of the center point and radius, for example, [[100,100],[50]].
  • line: consists of two points, for example, [[0,100],[50,95]]. The first point is the start point, and the second point is the end point.
  • dashed: consists of two points, for example, [[0,100],[50,95]]. The first point is the start point, and the second point is the end point.
  • point: consists of one point, for example, [[0,100]].
  • polyline: consists of multiple points, for example, [[0,100],[50,95],[10,60],[500,400]].
@modelarts:from No String ID of the head entity in the triplet relationship label, which is a default attribute dedicated to the triplet relationship label.
@modelarts:hard No String

Whether the sample is labeled as a hard example, which is a default attribute. Options:

  • 0/false: not a hard example
  • 1/true: hard example
@modelarts:hard_coefficient No String Coefficient of difficulty of each label level, which is a default attribute. The value range is [0,1].
@modelarts:hard_reasons No String

Reasons why the sample is a hard example, which is a default attribute. Separate multiple hard example reason IDs with hyphens (-), for example, 3-20-21-19. The options are as follows:

  • 0: No target objects are identified.
  • 1: The confidence is low.
  • 2: The clustering result based on the training dataset is inconsistent with the prediction result.
  • 3: The prediction result is greatly different from the data of the same type in the training dataset.
  • 4: The prediction results of multiple consecutive similar images are inconsistent.
  • 5: There is a large offset between the image resolution and the feature distribution of the training dataset.
  • 6: There is a large offset between the aspect ratio of the image and the feature distribution of the training dataset.
  • 7: There is a large offset between the brightness of the image and the feature distribution of the training dataset.
  • 8: There is a large offset between the saturation of the image and the feature distribution of the training dataset.
  • 9: There is a large offset between the color richness of the image and the feature distribution of the training dataset.
  • 10: There is a large offset between the definition of the image and the feature distribution of the training dataset.
  • 11: There is a large offset between the number of frames of the image and the feature distribution of the training dataset.
  • 12: There is a large offset between the standard deviation of area of image frames and the feature distribution of the training dataset.
  • 13: There is a large offset between the aspect ratio of image frames and the feature distribution of the training dataset.
  • 14: There is a large offset between the area portion of image frames and the feature distribution of the training dataset.
  • 15: There is a large offset between the edge of image frames and the feature distribution of the training dataset.
  • 16: There is a large offset between the brightness of image frames and the feature distribution of the training dataset.
  • 17: There is a large offset between the definition of image frames and the feature distribution of the training dataset.
  • 18: There is a large offset between the stack of image frames and the feature distribution of the training dataset.
  • 19: The data enhancement result based on GaussianBlur is inconsistent with the prediction result of the original image.
  • 20: The data enhancement result based on fliplr is inconsistent with the prediction result of the original image.
  • 21: The data enhancement result based on Crop is inconsistent with the prediction result of the original image.
  • 22: The data enhancement result based on flipud is inconsistent with the prediction result of the original image.
  • 23: The data enhancement result based on scale is inconsistent with the prediction result of the original image.
  • 24: The data enhancement result based on translate is inconsistent with the prediction result of the original image.
  • 25: The data enhancement result based on shear is inconsistent with the prediction result of the original image.
  • 26: The data enhancement result based on superpixels is inconsistent with the prediction result of the original image.
  • 27: The data enhancement result based on sharpen is inconsistent with the prediction result of the original image.
  • 28: The data enhancement result based on add is inconsistent with the prediction result of the original image.
  • 29: The data enhancement result based on invert is inconsistent with the prediction result of the original image.
  • 30: The data is predicted to be abnormal.
@modelarts:shape No String

Object shape, which is a default attribute dedicated to the object detection label and is left empty by default. The options are as follows:

  • bndbox: rectangle
  • polygon: polygon
  • circle: circle
  • line: straight line
  • dashed: dotted line
  • point: point
  • polyline: polyline
@modelarts:source No String Speech source, which is a default attribute dedicated to the speech start/end point label and can be set to a speaker or narrator.
@modelarts:start_index No Integer Start position of the text, which is a default attribute dedicated to the named entity label. Indexing starts at 0, and the character corresponding to the value of start_index is included.
@modelarts:start_time No String Speech start time, which is a default attribute dedicated to the speech start/end point label, in the format of hh:mm:ss.SSS. (hh indicates hour; mm indicates minute; ss indicates second; and SSS indicates millisecond.)
@modelarts:to No String ID of the tail entity in the triplet relationship label, which is a default attribute dedicated to the triplet relationship label.
Table 15 SampleMetadata
Parameter Mandatory Type Description
@modelarts:hard No Double

Whether the sample is labeled as a hard sample, which is a default attribute. The options are as follows:

  • 0: non-hard sample
  • 1: hard sample
@modelarts:hard_coefficient No Double Coefficient of difficulty of each sample level, which is a default attribute. The value range is [0,1].
@modelarts:hard_reasons No Array of integers

ID of a hard sample reason, which is a default attribute. The options are as follows:

  • 0: No target objects are identified.
  • 1: The confidence is low.
  • 2: The clustering result based on the training dataset is inconsistent with the prediction result.
  • 3: The prediction result is greatly different from the data of the same type in the training dataset.
  • 4: The prediction results of multiple consecutive similar images are inconsistent.
  • 5: There is a large offset between the image resolution and the feature distribution of the training dataset.
  • 6: There is a large offset between the aspect ratio of the image and the feature distribution of the training dataset.
  • 7: There is a large offset between the brightness of the image and the feature distribution of the training dataset.
  • 8: There is a large offset between the saturation of the image and the feature distribution of the training dataset.
  • 9: There is a large offset between the color richness of the image and the feature distribution of the training dataset.
  • 10: There is a large offset between the definition of the image and the feature distribution of the training dataset.
  • 11: There is a large offset between the number of frames of the image and the feature distribution of the training dataset.
  • 12: There is a large offset between the standard deviation of area of image frames and the feature distribution of the training dataset.
  • 13: There is a large offset between the aspect ratio of image frames and the feature distribution of the training dataset.
  • 14: There is a large offset between the area portion of image frames and the feature distribution of the training dataset.
  • 15: There is a large offset between the edge of image frames and the feature distribution of the training dataset.
  • 16: There is a large offset between the brightness of image frames and the feature distribution of the training dataset.
  • 17: There is a large offset between the definition of image frames and the feature distribution of the training dataset.
  • 18: There is a large offset between the stack of image frames and the feature distribution of the training dataset.
  • 19: The data enhancement result based on GaussianBlur is inconsistent with the prediction result of the original image.
  • 20: The data enhancement result based on fliplr is inconsistent with the prediction result of the original image.
  • 21: The data enhancement result based on Crop is inconsistent with the prediction result of the original image.
  • 22: The data enhancement result based on flipud is inconsistent with the prediction result of the original image.
  • 23: The data enhancement result based on scale is inconsistent with the prediction result of the original image.
  • 24: The data enhancement result based on translate is inconsistent with the prediction result of the original image.
  • 25: The data enhancement result based on shear is inconsistent with the prediction result of the original image.
  • 26: The data enhancement result based on superpixels is inconsistent with the prediction result of the original image.
  • 27: The data enhancement result based on sharpen is inconsistent with the prediction result of the original image.
  • 28: The data enhancement result based on add is inconsistent with the prediction result of the original image.
  • 29: The data enhancement result based on invert is inconsistent with the prediction result of the original image.
  • 30: The data is predicted to be abnormal.
@modelarts:size No Array of objects Image size (width, height, and depth of the image), which is a default attribute, with type of List. In the list, the first number indicates the width (pixels), the second number indicates the height (pixels), and the third number indicates the depth (the depth can be left blank and the default value is 3). For example, [100,200,3] and [100,200] are both valid. Note: This parameter is mandatory only when the sample label list contains the object detection label.
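
Putting Table 12 to Table 14 together, a single auto-labeled object detection entry in samples could look like the following sketch; the sample ID, label name, and coordinates are placeholders:

[ {
  "sample_id" : "0a0939d6d3c6a6f321c4",
  "sample_type" : 0,
  "labels" : [ {
    "name" : "cat",
    "type" : 1,
    "annotated_by" : "auto",
    "score" : 0.95,
    "property" : {
      "@modelarts:shape" : "bndbox",
      "@modelarts:feature" : [ [ 10, 20 ], [ 120, 200 ] ]
    }
  } ]
} ]
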

Response Parameters

Status code: 200

Table 16 Response body parameters
Parameter Type Description
task_id String Task ID.

Example Requests

  • Starting an Auto Labeling (Active Learning) Task

    {
      "task_type" : "auto-label",
      "collect_key_sample" : true,
      "config" : {
        "algorithm_type" : "fast"
      }
    }
  • Starting an Auto Labeling (Pre-labeling) Task

    {
      "task_type" : "pre-label",
      "model_id" : "c4989033-7584-44ee-a180-1c476b810e46",
      "collect_key_sample" : true,
      "config" : {
        "inf_config_list" : [ {
          "specification" : "modelarts.vm.cpu.2u",
          "instance_count" : 1
        } ]
      }
    }
  • Starting an Auto Grouping Task

    {
      "type" : 2,
      "export_type" : 1,
      "config" : {
        "n_clusters" : "2",
        "ambiguity" : false,
        "image_brightness" : false,
        "image_colorfulness" : false,
        "property" : "size"
      }
    }

Example Responses

Status code: 200

OK

{
  "task_id" : "r0jT2zwxBDKf8KEnSuZ"
}

Status Codes

Status Code Description
200 OK
401 Unauthorized
403 Forbidden
404 Not Found

Error Codes

See Error Codes <modelarts_03_0095>.