ModelArts GA API 06052022 from R&D (#14)

R&D has provided the correct version of the ModelArts GA API (06052022).

Reviewed-by: Artem Goncharov <Artem.goncharov@gmail.com>

2022-05-23 16:26:34 +00:00

Creating a Dataset Export Task

Function

This API is used to create a dataset export task, which exports a dataset either to OBS or to a new dataset.

URI

POST /v2/{project_id}/datasets/{dataset_id}/export-tasks

**Table 1** Path Parameters
Parameter	Mandatory	Type	Description
dataset_id	Yes	String	Dataset ID.
project_id	Yes	String	Project ID. For details about how to obtain the project ID, see Obtaining a Project ID.
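
As a minimal sketch of how the two path parameters fit into the URI, the helper below fills in the template. The endpoint host is a placeholder assumption (this document does not give one), and the function name `export_task_url` is hypothetical.

```python
from urllib.parse import quote

# Placeholder host: the real endpoint depends on your region and is not given here.
ENDPOINT = "https://modelarts.example.com"


def export_task_url(project_id: str, dataset_id: str, endpoint: str = ENDPOINT) -> str:
    """Return the URL for POST /v2/{project_id}/datasets/{dataset_id}/export-tasks."""
    if not project_id or not dataset_id:
        raise ValueError("project_id and dataset_id are both mandatory")
    # Percent-encode each segment so unexpected characters cannot alter the path.
    return (
        f"{endpoint.rstrip('/')}/v2/{quote(project_id, safe='')}"
        f"/datasets/{quote(dataset_id, safe='')}/export-tasks"
    )


print(export_task_url("my-project-id", "my-dataset-id"))
```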

Request Parameters

**Table 2** Request body parameters
Parameter	Mandatory	Type	Description
annotation_format	No	String	Labeling format. The options are as follows: - VOC: VOC - COCO: COCO
dataset_id	No	String	Dataset ID.
dataset_type	No	Integer	Dataset type. The options are as follows: - 0: image classification - 1: object detection - 100: text classification - 101: named entity recognition - 102: text triplet - 200: sound classification - 201: speech content - 202: speech paragraph labeling - 400: table dataset - 600: video labeling - 900: custom format
export_format	No	Integer	Format of the exported directory. The options are as follows: - 1: tree structure. For example: cat/1.jpg, dog/2.jpg. - 2: tile structure. For example: 1.jpg, 1.txt; 2.jpg, 2.txt.
export_params	No	ExportParams object	Parameters of a dataset export task.
export_type	No	Integer	Export type. The options are as follows: - 0: labeled - 1: unlabeled - 2: all - 3: conditional search
path	No	String	Export output path.
sample_state	No	String	Sample status. The options are as follows: - ALL: labeled - NONE: unlabeled - UNCHECK: pending acceptance - ACCEPTED: accepted - REJECTED: rejected - UNREVIEWED: pending review - REVIEWED: reviewed - WORKFORCE_SAMPLED: sampled - WORKFORCE_SAMPLED_UNCHECK: sampling unchecked - WORKFORCE_SAMPLED_CHECKED: sampling checked - WORKFORCE_SAMPLED_ACCEPTED: sampling accepted - WORKFORCE_SAMPLED_REJECTED: sampling rejected - AUTO_ANNOTATION: to be confirmed
source_type_header	No	String	Prefix of the OBS path in the exported labeling file. The default value is obs://. You can set it to s3://. Image paths that start with obs:// cannot be parsed during training, so set the path prefix in the exported manifest file to s3:// when the export is used for training.
status	No	Integer	Task status.
task_id	No	String	Task ID.
version_format	No	String	Format of a dataset version. The options are as follows: - Default: default format - CarbonData: CarbonData (supported only by table datasets) - CSV: CSV
version_id	No	String	Dataset version ID.
with_column_header	No	Boolean	Whether to write the column name in the first line of the CSV file during export. This field is valid for the table dataset. The options are as follows: - true: Write the column name in the first line of the CSV file. (Default value) - false: Do not write the column name in the first line of the CSV file.
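
The enumerated options in Table 2 can be checked on the client before a request is sent. The following is a minimal sketch, not an official client: the function name `build_export_body` is hypothetical, and it covers only a few of the fields listed above.

```python
from typing import Any, Dict, Optional

# Allowed values copied from Table 2.
EXPORT_TYPES = {0, 1, 2, 3}   # labeled, unlabeled, all, conditional search
EXPORT_FORMATS = {1, 2}       # tree structure, tile structure
VERSION_FORMATS = {"Default", "CarbonData", "CSV"}


def build_export_body(
    path: str,
    export_type: Optional[int] = None,
    export_format: Optional[int] = None,
    version_format: Optional[str] = None,
    export_params: Optional[Dict[str, Any]] = None,
) -> Dict[str, Any]:
    """Assemble a request body, rejecting values outside the documented options."""
    if export_type is not None and export_type not in EXPORT_TYPES:
        raise ValueError(f"export_type must be one of {sorted(EXPORT_TYPES)}")
    if export_format is not None and export_format not in EXPORT_FORMATS:
        raise ValueError(f"export_format must be one of {sorted(EXPORT_FORMATS)}")
    if version_format is not None and version_format not in VERSION_FORMATS:
        raise ValueError(f"version_format must be one of {sorted(VERSION_FORMATS)}")
    candidate = {
        "path": path,
        "export_type": export_type,
        "export_format": export_format,
        "version_format": version_format,
        "export_params": export_params,
    }
    # Every field is optional, so unset fields are omitted rather than sent as null.
    return {key: value for key, value in candidate.items() if value is not None}


print(build_export_body("/test-obs/daoChu/", export_type=3,
                        export_params={"export_dest": "DIR"}))
```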

**Table 3** ExportParams
Parameter	Mandatory	Type	Description
clear_hard_property	No	Boolean	Whether to clear hard example attributes. The options are as follows: - true: Clear hard example attributes. (Default value) - false: Do not clear hard example attributes.
export_dataset_version_format	No	String	Format of the dataset version to which data is exported.
export_dataset_version_name	No	String	Name of the dataset version to which data is exported.
export_dest	No	String	Export destination. The options are as follows: - DIR: Export data to OBS. (Default value) - NEW_DATASET: Export data to a new dataset.
export_new_dataset_name	No	String	Name of the new dataset to which data is exported.
export_new_dataset_work_path	No	String	Working directory of the new dataset to which data is exported.
ratio_sample_usage	No	Boolean	Whether to randomly allocate the training set and validation set based on the specified ratio. The options are as follows: - true: Allocate the training set and validation set. - false: Do not allocate the training set and validation set. (Default value)
sample_state	No	String	Sample status. The options are as follows: - ALL: labeled - NONE: unlabeled - UNCHECK: pending acceptance - ACCEPTED: accepted - REJECTED: rejected - UNREVIEWED: pending review - REVIEWED: reviewed - WORKFORCE_SAMPLED: sampled - WORKFORCE_SAMPLED_UNCHECK: sampling unchecked - WORKFORCE_SAMPLED_CHECKED: sampling checked - WORKFORCE_SAMPLED_ACCEPTED: sampling accepted - WORKFORCE_SAMPLED_REJECTED: sampling rejected - AUTO_ANNOTATION: to be confirmed
samples	No	Array of strings	ID list of exported samples.
search_conditions	No	Array of SearchCondition objects	Exported search conditions. The relationship between multiple search conditions is OR.
train_sample_ratio	No	String	Ratio used to split samples between the training set and the validation set when the specified version is released. The default value is 1.00, indicating that all samples are used as the training set.
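
One way to keep the ExportParams object tidy in client code is a small data class that serializes only the fields that were set. This is a sketch under assumptions: the class and its `to_dict` method are illustrative rather than part of any SDK, and only a subset of the fields in Table 3 is modeled.

```python
from dataclasses import asdict, dataclass
from typing import Any, Dict, List, Optional


@dataclass
class ExportParams:
    """Illustrative subset of the ExportParams fields in Table 3."""
    export_dest: str = "DIR"  # documented default: export to OBS
    export_new_dataset_name: Optional[str] = None
    export_new_dataset_work_path: Optional[str] = None
    clear_hard_property: Optional[bool] = None
    ratio_sample_usage: Optional[bool] = None
    train_sample_ratio: Optional[str] = None  # a String in the API, e.g. "0.80"
    samples: Optional[List[str]] = None

    def to_dict(self) -> Dict[str, Any]:
        if self.export_dest not in ("DIR", "NEW_DATASET"):
            raise ValueError("export_dest must be DIR or NEW_DATASET")
        # Drop unset fields so the service applies its own defaults.
        return {key: value for key, value in asdict(self).items() if value is not None}


params = ExportParams(
    export_dest="NEW_DATASET",
    export_new_dataset_name="dataset-export-test",
    export_new_dataset_work_path="/test-obs/classify/output/",
)
print(params.to_dict())
```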

**Table 4** SearchCondition
Parameter	Mandatory	Type	Description
coefficient	No	String	Filter by coefficient of difficulty.
frame_in_video	No	Integer	A frame in the video.
hard	No	String	Whether a sample is a hard sample. The options are as follows: - 0: non-hard sample - 1: hard sample
import_origin	No	String	Filter by data source.
kvp	No	String	CT dosage, filtered by dosage.
label_list	No	SearchLabels object	Label search criteria.
labeler	No	String	Labeler.
metadata	No	SearchProp object	Search by sample attribute.
parent_sample_id	No	String	Parent sample ID.
sample_dir	No	String	Directory where data samples are stored (the directory must end with a slash (/)). Only samples in the specified directory are searched for. Recursive search of directories is not supported.
sample_name	No	String	Search by sample name, including the file name extension.
sample_time	No	String	When a sample is added to the dataset, an index is created based on the last modification time (accurate to day) of the sample on OBS. You can search for the sample based on the time. The options are as follows: - month: Search for samples added from 30 days ago to the current day. - day: Search for samples added from yesterday (one day ago) to the current day. - yyyyMMdd-yyyyMMdd: Search for samples added in a specified period (at most 30 days), in the format of Start date-End date. For example, 20190901-20190915 searches for samples added from September 1 to September 15, 2019.
score	No	String	Search by confidence.
slice_thickness	No	String	DICOM layer thickness. Samples are filtered by layer thickness.
study_date	No	String	DICOM scanning time.
time_in_video	No	String	A time point in the video.
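
The three accepted forms of sample_time in Table 4 (month, day, or a yyyyMMdd-yyyyMMdd range of at most 30 days) can be checked locally before a request is built. The sketch below is illustrative only: the function name is hypothetical, and it assumes the 30-day limit is the difference between the two dates, which this document does not spell out.

```python
from datetime import datetime


def is_valid_sample_time(value: str) -> bool:
    """Return True if value is 'month', 'day', or a yyyyMMdd-yyyyMMdd range."""
    if value in ("month", "day"):
        return True
    start_text, separator, end_text = value.partition("-")
    if not separator:
        return False
    # Require exactly eight digits per date; strptime alone accepts shorter input.
    for text in (start_text, end_text):
        if len(text) != 8 or not text.isdigit():
            return False
    try:
        start = datetime.strptime(start_text, "%Y%m%d")
        end = datetime.strptime(end_text, "%Y%m%d")
    except ValueError:
        return False  # e.g. month 13 or day 32
    # Assumption: "at most 30 days" means end - start <= 30 days.
    return 0 <= (end - start).days <= 30


print(is_valid_sample_time("20190901-20190915"))
print(is_valid_sample_time("20190901-20191231"))
```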

**Table 5** SearchLabels
Parameter	Mandatory	Type	Description
labels	No	Array of SearchLabel objects	List of label search criteria.
op	No	String	If you want to search for multiple labels, op must be specified. If you search for only one label, op can be left blank. The options are as follows: - OR: OR operation - AND: AND operation

**Table 6** SearchLabel
Parameter	Mandatory	Type	Description
name	No	String	Label name.
op	No	String	Operation type between multiple attributes. The options are as follows: - OR: OR operation - AND: AND operation
property	No	Map<String,Array<String>>	Label attribute, which is in the Object format and stores any key-value pairs. key indicates the attribute name, and value indicates the value list. If value is null, the search is not performed by value. Otherwise, the search value can be any value in the list.
type	No	Integer	Label type. The options are as follows: - 0: image classification - 1: object detection - 100: text classification - 101: named entity recognition - 102: text triplet relationship - 103: text triplet entity - 200: speech classification - 201: speech content - 202: speech paragraph labeling - 600: video classification

**Table 7** SearchProp
Parameter	Mandatory	Type	Description
op	No	String	Relationship between attribute values. The options are as follows: - AND: AND relationship - OR: OR relationship
props	No	Map<String,Array<String>>	Search criteria of an attribute. Multiple search criteria can be set.
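
To illustrate how SearchCondition, SearchLabels, SearchLabel, and SearchProp nest, the sketch below assembles one search condition as plain dictionaries and prints the resulting JSON. The label names, directory, and attribute values are made-up examples, and `label_criteria` is a hypothetical helper.

```python
import json
from typing import Any, Dict, List


def label_criteria(names: List[str], op: str = "OR") -> Dict[str, Any]:
    """Build a SearchLabels object that matches the given label names."""
    if op not in ("OR", "AND"):
        raise ValueError("op must be OR or AND")
    criteria: Dict[str, Any] = {"labels": [{"name": name} for name in names]}
    # Table 5: op must be specified only when several labels are searched.
    if len(names) > 1:
        criteria["op"] = op
    return criteria


condition = {
    "sample_dir": "/test-obs/classify/input/",   # example value; must end with "/"
    "label_list": label_criteria(["cat", "dog"]),
    # Example attribute filter; "source" and "camera" are invented values.
    "metadata": {"op": "AND", "props": {"source": ["camera"]}},
}

# Multiple entries in search_conditions are combined with OR (Table 3).
export_params = {"export_dest": "DIR", "search_conditions": [condition]}
print(json.dumps(export_params, indent=2))
```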

Response Parameters

Status code: 200

**Table 8** Response body parameters
Parameter	Type	Description
create_time	Long	Time when a task is created.
error_code	String	Error code.
error_msg	String	Error message.
export_format	Integer	Format of the exported directory. The options are as follows: - 1: tree structure. For example: cat/1.jpg, dog/2.jpg. - 2: tile structure. For example: 1.jpg, 1.txt; 2.jpg, 2.txt.
export_params	ExportParams object	Parameters of a dataset export task.
export_type	Integer	Export type. The options are as follows: - 0: labeled - 1: unlabeled - 2: all - 3: conditional search
finished_sample_count	Integer	Number of completed samples.
path	String	Export output path.
progress	Float	Percentage of current task progress.
status	String	Task status. The options are as follows: - INIT: initialized - RUNNING: running - FAILED: failed - SUCCESSED: completed
task_id	String	Task ID.
total_sample_count	Integer	Total number of samples.
update_time	Long	Time when a task is updated.
version_format	String	Format of a dataset version. The options are as follows: - Default: default format - CarbonData: CarbonData (supported only by table datasets) - CSV: CSV
version_id	String	Dataset version ID.
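
A client that reads these response fields usually needs to know whether a task has reached a final state. The sketch below classifies the four status values from Table 8; the helper names are hypothetical, and treating FAILED and SUCCESSED as the only final states is an assumption based on their descriptions.

```python
from typing import Any, Dict

# Status strings exactly as listed in Table 8.
ACTIVE_STATUSES = {"INIT", "RUNNING"}
FINAL_STATUSES = {"FAILED", "SUCCESSED"}


def is_finished(task: Dict[str, Any]) -> bool:
    """Return True once the task has failed or completed."""
    status = task.get("status")
    if status not in ACTIVE_STATUSES | FINAL_STATUSES:
        raise ValueError(f"unexpected task status: {status!r}")
    return status in FINAL_STATUSES


def describe(task: Dict[str, Any]) -> str:
    """Summarize a task using the sample counters, when both are present."""
    done = task.get("finished_sample_count")
    total = task.get("total_sample_count")
    if done is not None and total is not None:
        counts = f"{done}/{total} samples"
    else:
        counts = "sample counts unavailable"
    return f"task {task.get('task_id')}: {task.get('status')} ({counts})"


example = {"task_id": "rF9NNoB56k5rtYKg2Y7", "status": "RUNNING",
           "finished_sample_count": 5, "total_sample_count": 20}
print(describe(example), "finished:", is_finished(example))
```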

**Table 9** ExportParams
Parameter	Type	Description
clear_hard_property	Boolean	Whether to clear hard example attributes. The options are as follows: - true: Clear hard example attributes. (Default value) - false: Do not clear hard example attributes.
export_dataset_version_format	String	Format of the dataset version to which data is exported.
export_dataset_version_name	String	Name of the dataset version to which data is exported.
export_dest	String	Export destination. The options are as follows: - DIR: Export data to OBS. (Default value) - NEW_DATASET: Export data to a new dataset.
export_new_dataset_name	String	Name of the new dataset to which data is exported.
export_new_dataset_work_path	String	Working directory of the new dataset to which data is exported.
ratio_sample_usage	Boolean	Whether to randomly allocate the training set and validation set based on the specified ratio. The options are as follows: - true: Allocate the training set and validation set. - false: Do not allocate the training set and validation set. (Default value)
sample_state	String	Sample status. The options are as follows: - ALL: labeled - NONE: unlabeled - UNCHECK: pending acceptance - ACCEPTED: accepted - REJECTED: rejected - UNREVIEWED: pending review - REVIEWED: reviewed - WORKFORCE_SAMPLED: sampled - WORKFORCE_SAMPLED_UNCHECK: sampling unchecked - WORKFORCE_SAMPLED_CHECKED: sampling checked - WORKFORCE_SAMPLED_ACCEPTED: sampling accepted - WORKFORCE_SAMPLED_REJECTED: sampling rejected - AUTO_ANNOTATION: to be confirmed
samples	Array of strings	ID list of exported samples.
search_conditions	Array of SearchCondition objects	Exported search conditions. The relationship between multiple search conditions is OR.
train_sample_ratio	String	Ratio used to split samples between the training set and the validation set when the specified version is released. The default value is 1.00, indicating that all samples are used as the training set.

**Table 10** SearchCondition
Parameter	Type	Description
coefficient	String	Filter by coefficient of difficulty.
frame_in_video	Integer	A frame in the video.
hard	String	Whether a sample is a hard sample. The options are as follows: - 0: non-hard sample - 1: hard sample
import_origin	String	Filter by data source.
kvp	String	CT dosage, filtered by dosage.
label_list	SearchLabels object	Label search criteria.
labeler	String	Labeler.
metadata	SearchProp object	Search by sample attribute.
parent_sample_id	String	Parent sample ID.
sample_dir	String	Directory where data samples are stored (the directory must end with a slash (/)). Only samples in the specified directory are searched for. Recursive search of directories is not supported.
sample_name	String	Search by sample name, including the file name extension.
sample_time	String	When a sample is added to the dataset, an index is created based on the last modification time (accurate to day) of the sample on OBS. You can search for the sample based on the time. The options are as follows: - month: Search for samples added from 30 days ago to the current day. - day: Search for samples added from yesterday (one day ago) to the current day. - yyyyMMdd-yyyyMMdd: Search for samples added in a specified period (at most 30 days), in the format of Start date-End date. For example, 20190901-20190915 searches for samples added from September 1 to September 15, 2019.
score	String	Search by confidence.
slice_thickness	String	DICOM layer thickness. Samples are filtered by layer thickness.
study_date	String	DICOM scanning time.
time_in_video	String	A time point in the video.

**Table 11** SearchLabels
Parameter	Type	Description
labels	Array of SearchLabel objects	List of label search criteria.
op	String	If you want to search for multiple labels, op must be specified. If you search for only one label, op can be left blank. The options are as follows: - OR: OR operation - AND: AND operation

**Table 12** SearchLabel
Parameter	Type	Description
name	String	Label name.
op	String	Operation type between multiple attributes. The options are as follows: - OR: OR operation - AND: AND operation
property	Map<String,Array<String>>	Label attribute, which is in the Object format and stores any key-value pairs. key indicates the attribute name, and value indicates the value list. If value is null, the search is not performed by value. Otherwise, the search value can be any value in the list.
type	Integer	Label type. The options are as follows: - 0: image classification - 1: object detection - 100: text classification - 101: named entity recognition - 102: text triplet relationship - 103: text triplet entity - 200: speech classification - 201: speech content - 202: speech paragraph labeling - 600: video classification

**Table 13** SearchProp
Parameter	Type	Description
op	String	Relationship between attribute values. The options are as follows: - AND: AND relationship - OR: OR relationship
props	Map<String,Array<String>>	Search criteria of an attribute. Multiple search criteria can be set.

Example Requests

Creating an Export Task (Exporting Data to OBS)

{
  "path" : "/test-obs/daoChu/",
  "export_type" : 3,
  "export_params" : {
    "sample_state" : "",
    "export_dest" : "DIR"
  }
}

Creating an Export Task (Exporting Data to a New Dataset)

{
  "path" : "/test-obs/classify/input/",
  "export_type" : 3,
  "export_params" : {
    "sample_state" : "",
    "export_dest" : "NEW_DATASET",
    "export_new_dataset_name" : "dataset-export-test",
    "export_new_dataset_work_path" : "/test-obs/classify/output/"
  }
}
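
The example bodies above could be sent with Python's standard library roughly as follows. This is a sketch under assumptions: the host, the `PROJECT_ID` and `DATASET_ID` placeholders, and the `X-Auth-Token` authentication header are not specified in this document and must be replaced with the values for your environment. The request is only constructed, not sent.

```python
import json
import urllib.request
from typing import Any, Dict


def make_export_request(url: str, token: str, body: Dict[str, Any]) -> urllib.request.Request:
    """Build (but do not send) a POST request that creates an export task."""
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "X-Auth-Token": token,  # assumed authentication header
        },
        method="POST",
    )


# Body of the first example request (export to OBS).
body = {
    "path": "/test-obs/daoChu/",
    "export_type": 3,
    "export_params": {"sample_state": "", "export_dest": "DIR"},
}
request = make_export_request(
    "https://modelarts.example.com/v2/PROJECT_ID/datasets/DATASET_ID/export-tasks",
    "TOKEN",
    body,
)
print(request.get_method(), request.full_url)
# urllib.request.urlopen(request) would send it; omitted so the sketch runs offline.
```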

Example Responses

Status code: 200

{
  "task_id" : "rF9NNoB56k5rtYKg2Y7"
}

Status Codes

Status Code	Description
200	OK
401	Unauthorized
403	Forbidden
404	Not Found
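
A caller might map these status codes to an outcome as sketched below: return task_id on 200, otherwise raise an error that carries the documented meaning. The exception class and function name are hypothetical, and the assumption that error bodies contain error_code and error_msg is based only on those fields appearing in the response table.

```python
import json

# Meanings copied from the Status Codes table.
STATUS_MEANINGS = {200: "OK", 401: "Unauthorized", 403: "Forbidden", 404: "Not Found"}


class ExportTaskError(Exception):
    """Raised when the export task could not be created."""


def parse_create_response(status_code: int, raw_body: str) -> str:
    """Return task_id for a 200 response; raise ExportTaskError otherwise."""
    if status_code == 200:
        return json.loads(raw_body)["task_id"]
    meaning = STATUS_MEANINGS.get(status_code, "undocumented status")
    try:
        payload = json.loads(raw_body)
    except ValueError:
        payload = {}  # body was empty or not JSON
    if not isinstance(payload, dict):
        payload = {}
    # Assumption: error responses may carry error_code and error_msg.
    raise ExportTaskError(
        f"{status_code} {meaning}: "
        f"{payload.get('error_code')} {payload.get('error_msg')}"
    )


print(parse_create_response(200, '{"task_id": "rF9NNoB56k5rtYKg2Y7"}'))
```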

Error Codes

See Error Codes.

Parent topic: Data Export Task
