
Creating a Dataset Export Task
Function
This API is used to create a dataset export task that exports a dataset to OBS or to a new dataset.
URI
POST /v2/{project_id}/datasets/{dataset_id}/export-tasks
| Parameter | Mandatory | Type | Description |
|---|---|---|---|
| dataset_id | Yes | String | Dataset ID. |
| project_id | Yes | String | Project ID. For details about how to obtain the project ID, see Obtaining a Project ID. |
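As a sketch, the request path can be assembled from the two mandatory URI parameters. The function name `export_tasks_path` is hypothetical. Percent-encoding the IDs is a defensive choice and is not required by this page.

```python
from urllib.parse import quote


def export_tasks_path(project_id: str, dataset_id: str) -> str:
    """Build the path for POST /v2/{project_id}/datasets/{dataset_id}/export-tasks.

    Hypothetical helper: both URI parameters are mandatory, so empty values are rejected.
    """
    if not project_id or not dataset_id:
        raise ValueError("project_id and dataset_id are both mandatory")
    # Percent-encode each ID so a stray '/' cannot change the path structure.
    return "/v2/{}/datasets/{}/export-tasks".format(
        quote(project_id, safe=""), quote(dataset_id, safe="")
    )


print(export_tasks_path("my-project", "my-dataset"))
# /v2/my-project/datasets/my-dataset/export-tasks
```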
Request Parameters
| Parameter | Mandatory | Type | Description |
|---|---|---|---|
| annotation_format | No | String | Labeling format. Options: VOC, COCO. |
| dataset_id | No | String | Dataset ID. |
| dataset_type | No | Integer | Dataset type. Options: 0 (image classification), 1 (object detection), 100 (text classification), 101 (named entity recognition), 102 (text triplet), 200 (sound classification), 201 (speech content), 202 (speech paragraph labeling), 400 (table dataset), 600 (video labeling), 900 (custom format). |
| export_format | No | Integer | Format of the exported directory. Options: 1 (tree structure, for example cat/1.jpg and dog/2.jpg) or 2 (tile structure, for example 1.jpg, 1.txt, 2.jpg, and 2.txt). |
| export_params | No | ExportParams object | Parameters of the dataset export task. |
| export_type | No | Integer | Export type. Options: 0 (labeled), 1 (unlabeled), 2 (all), 3 (conditional search). |
| path | No | String | Export output path. |
| sample_state | No | String | Sample status. Options: ALL (labeled), NONE (unlabeled), UNCHECK (pending acceptance), ACCEPTED (accepted), REJECTED (rejected), UNREVIEWED (pending review), REVIEWED (reviewed), WORKFORCE_SAMPLED (sampled), WORKFORCE_SAMPLED_UNCHECK (sampling unchecked), WORKFORCE_SAMPLED_CHECKED (sampling checked), WORKFORCE_SAMPLED_ACCEPTED (sampling accepted), WORKFORCE_SAMPLED_REJECTED (sampling rejected), AUTO_ANNOTATION (to be confirmed). |
| source_type_header | No | String | Prefix of the OBS paths in the exported labeling file. The default value is obs://, and it can be set to s3://. Image paths that start with obs:// cannot be parsed during training. In that case, set the path prefix in the exported manifest file to s3://. |
| status | No | Integer | Task status. |
| task_id | No | String | Task ID. |
| version_format | No | String | Format of the dataset version. Options: Default (default format), CarbonData (supported only by table datasets), CSV. |
| version_id | No | String | Dataset version ID. |
| with_column_header | No | Boolean | Whether to write the column names in the first line of the CSV file during export. This field applies to table datasets. Options: true (write the column names; default) or false (do not write them). |
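No request-body field is mandatory, so a client can send only the fields it actually sets. The sketch below drops unset fields and checks the enumerated options against the lists in the request-parameter table. The function `build_export_request` and its validation rules are assumptions for illustration and are not part of an official SDK.

```python
import json
from typing import Any, Dict, Optional

# Allowed values copied from the option lists in the request-parameter table.
EXPORT_TYPES = {0: "labeled", 1: "unlabeled", 2: "all", 3: "conditional search"}
EXPORT_FORMATS = {1: "tree structure", 2: "tile structure"}
VERSION_FORMATS = {"Default", "CarbonData", "CSV"}
ANNOTATION_FORMATS = {"VOC", "COCO"}


def build_export_request(
    path: Optional[str] = None,
    export_type: Optional[int] = None,
    export_format: Optional[int] = None,
    annotation_format: Optional[str] = None,
    version_format: Optional[str] = None,
    export_params: Optional[Dict[str, Any]] = None,
) -> str:
    """Return a JSON body that contains only the fields that were set (hypothetical helper)."""
    if export_type is not None and export_type not in EXPORT_TYPES:
        raise ValueError(f"unsupported export_type: {export_type}")
    if export_format is not None and export_format not in EXPORT_FORMATS:
        raise ValueError(f"unsupported export_format: {export_format}")
    if annotation_format is not None and annotation_format not in ANNOTATION_FORMATS:
        raise ValueError(f"unsupported annotation_format: {annotation_format}")
    if version_format is not None and version_format not in VERSION_FORMATS:
        raise ValueError(f"unsupported version_format: {version_format}")
    body = {
        "path": path,
        "export_type": export_type,
        "export_format": export_format,
        "annotation_format": annotation_format,
        "version_format": version_format,
        "export_params": export_params,
    }
    # Omit unset (None) fields, since none of them is mandatory.
    return json.dumps({k: v for k, v in body.items() if v is not None})


print(build_export_request(path="/test-obs/daoChu/", export_type=3))
# {"path": "/test-obs/daoChu/", "export_type": 3}
```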
ExportParams

| Parameter | Mandatory | Type | Description |
|---|---|---|---|
| clear_hard_property | No | Boolean | Whether to clear hard example attributes. Options: true (clear them; default) or false (keep them). |
| export_dataset_version_format | No | String | Format of the dataset version to which data is exported. |
| export_dataset_version_name | No | String | Name of the dataset version to which data is exported. |
| export_dest | No | String | Export destination. Options: DIR (export data to OBS; default) or NEW_DATASET (export data to a new dataset). |
| export_new_dataset_name | No | String | Name of the new dataset to which data is exported. |
| export_new_dataset_work_path | No | String | Working directory of the new dataset to which data is exported. |
| ratio_sample_usage | No | Boolean | Whether to randomly split the data into a training set and a validation set based on the specified ratio. Options: true (split the data) or false (do not split the data; default). |
| sample_state | No | String | Sample status. Options: ALL (labeled), NONE (unlabeled), UNCHECK (pending acceptance), ACCEPTED (accepted), REJECTED (rejected), UNREVIEWED (pending review), REVIEWED (reviewed), WORKFORCE_SAMPLED (sampled), WORKFORCE_SAMPLED_UNCHECK (sampling unchecked), WORKFORCE_SAMPLED_CHECKED (sampling checked), WORKFORCE_SAMPLED_ACCEPTED (sampling accepted), WORKFORCE_SAMPLED_REJECTED (sampling rejected), AUTO_ANNOTATION (to be confirmed). |
| samples | No | Array of strings | IDs of the samples to export. |
| search_conditions | No | Array of SearchCondition objects | Search conditions for the export. Multiple conditions are combined with OR. |
| train_sample_ratio | No | String | Split ratio between the training set and the validation set when the specified version is released. The default value is 1.00, which means that all data in the released version is used as the training set. |
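The two `export_dest` values need different companion fields. The sketch below builds an `export_params` object and formats `train_sample_ratio`, which this API types as a String, with two decimal places to match the default `1.00`. The helper names, the rule that a NEW_DATASET export needs both a name and a work path, and the assumed 0 to 1 range for the ratio are all inferences from the field descriptions, not documented constraints.

```python
from typing import Any, Dict, List, Optional


def format_ratio(ratio: float) -> str:
    """Format a training-set ratio as a string such as "0.80" (assumed range: 0 to 1)."""
    if not 0.0 <= ratio <= 1.0:
        raise ValueError("train_sample_ratio is assumed to lie between 0 and 1")
    return f"{ratio:.2f}"


def build_export_params(
    export_dest: str = "DIR",
    new_dataset_name: Optional[str] = None,
    new_dataset_work_path: Optional[str] = None,
    train_ratio: Optional[float] = None,
    samples: Optional[List[str]] = None,
) -> Dict[str, Any]:
    """Build an ExportParams object (hypothetical helper)."""
    params: Dict[str, Any] = {"export_dest": export_dest}
    if export_dest == "NEW_DATASET":
        # Assumption: a new dataset needs a name and an OBS working directory.
        if not new_dataset_name or not new_dataset_work_path:
            raise ValueError("NEW_DATASET exports need a name and a work path")
        params["export_new_dataset_name"] = new_dataset_name
        params["export_new_dataset_work_path"] = new_dataset_work_path
    elif export_dest != "DIR":
        raise ValueError(f"unsupported export_dest: {export_dest}")
    if train_ratio is not None:
        # ratio_sample_usage defaults to false, so enable it explicitly.
        params["ratio_sample_usage"] = True
        params["train_sample_ratio"] = format_ratio(train_ratio)
    if samples:
        params["samples"] = list(samples)
    return params


print(build_export_params("NEW_DATASET", "dataset-export-test",
                          "/test-obs/classify/output/", train_ratio=0.8))
```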
SearchCondition

| Parameter | Mandatory | Type | Description |
|---|---|---|---|
| coefficient | No | String | Filter by difficulty coefficient. |
| frame_in_video | No | Integer | A frame in the video. |
| hard | No | String | Whether a sample is a hard sample. Options: 0 (non-hard sample) or 1 (hard sample). |
| import_origin | No | String | Filter by data source. |
| kvp | No | String | CT dosage. Samples are filtered by dosage. |
| label_list | No | SearchLabels object | Label search criteria. |
| labeler | No | String | Labeler. |
| metadata | No | SearchProp object | Search by sample attribute. |
| parent_sample_id | No | String | Parent sample ID. |
| sample_dir | No | String | Directory where data samples are stored. The directory must end with a slash (/). Only samples directly in the specified directory are searched; subdirectories are not searched recursively. |
| sample_name | No | String | Search by sample name, including the file name extension. |
| sample_time | No | String | When a sample is added to the dataset, it is indexed by its last modification time on OBS (accurate to the day), and samples can be searched by this time. Options: month (samples added from 30 days ago to the current day); day (samples added from yesterday to the current day); yyyyMMdd-yyyyMMdd (samples added in a specified period of at most 30 days, in the format Start date-End date; for example, 20190901-20190915 searches for samples generated from September 1 to September 15, 2019). |
| score | No | String | Search by confidence. |
| slice_thickness | No | String | DICOM slice thickness. Samples are filtered by slice thickness. |
| study_date | No | String | DICOM scanning time. |
| time_in_video | No | String | A time point in the video. |
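The `sample_time` filter accepts `month`, `day`, or a `yyyyMMdd-yyyyMMdd` range of at most 30 days. A minimal sketch of building and checking the range form follows. The helper name is hypothetical. The page does not say whether the 30-day limit counts both endpoints, so the inclusive count used here is an assumption.

```python
from datetime import date


def sample_time_range(start: date, end: date) -> str:
    """Build a sample_time value in the yyyyMMdd-yyyyMMdd form (hypothetical helper)."""
    if end < start:
        raise ValueError("end date must not be earlier than start date")
    # Assumption: the 30-day limit counts both the start day and the end day.
    if (end - start).days + 1 > 30:
        raise ValueError("the search period is limited to 30 days")
    return f"{start:%Y%m%d}-{end:%Y%m%d}"


# A SearchCondition that selects samples from September 1 to 15, 2019.
condition = {"sample_time": sample_time_range(date(2019, 9, 1), date(2019, 9, 15))}
print(condition)  # {'sample_time': '20190901-20190915'}
```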
SearchLabels

| Parameter | Mandatory | Type | Description |
|---|---|---|---|
| labels | No | Array of SearchLabel objects | List of label search criteria. |
| op | No | String | Logical operator between labels. It must be specified when you search for multiple labels and can be left blank when you search for only one label. Options: OR (OR operation) or AND (AND operation). |
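Because `op` is required only when more than one label is searched, a client can add it conditionally. The following sketch applies that rule. The helper name `build_search_labels` is assumed for illustration.

```python
from typing import Any, Dict, List, Optional


def build_search_labels(labels: List[Dict[str, Any]], op: Optional[str] = None) -> Dict[str, Any]:
    """Build a SearchLabels object from SearchLabel dicts (hypothetical helper)."""
    if op is not None and op not in ("OR", "AND"):
        raise ValueError(f"unsupported op: {op}")
    if len(labels) > 1 and op is None:
        # op must be specified when several labels are searched.
        raise ValueError("op is required when searching for multiple labels")
    result: Dict[str, Any] = {"labels": labels}
    if op is not None:
        result["op"] = op
    return result


print(build_search_labels([{"name": "cat"}, {"name": "dog"}], op="OR"))
```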
SearchLabel

| Parameter | Mandatory | Type | Description |
|---|---|---|---|
| name | No | String | Label name. |
| op | No | String | Logical operator between multiple attributes. Options: OR (OR operation) or AND (AND operation). |
| property | No | Map<String,Array<String>> | Label attributes, stored as an object of arbitrary key-value pairs. Each key is an attribute name and each value is a list of values. If a value is null, the attribute is not filtered by value; otherwise, the search matches any value in the list. |
| type | No | Integer | Label type. Options: 0 (image classification), 1 (object detection), 100 (text classification), 101 (named entity recognition), 102 (text triplet relationship), 103 (text triplet entity), 200 (speech classification), 201 (speech content), 202 (speech paragraph labeling), 600 (video classification). |
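The `property` matching rule can be illustrated with a local filter over an in-memory attribute dictionary. This sketch is not the service's implementation. It assumes that a null value list means only the attribute's presence is checked, and it models `op` on the same table's description of how multiple attributes combine.

```python
from typing import Dict, List, Optional


def property_matches(
    sample_attrs: Dict[str, str],
    criteria: Dict[str, Optional[List[str]]],
    op: str = "AND",
) -> bool:
    """Locally evaluate SearchLabel.property criteria against one label's attributes.

    Assumption: a null (None) value list means only the attribute's presence is checked.
    """
    if op not in ("AND", "OR"):
        raise ValueError(f"unsupported op: {op}")
    results = []
    for name, allowed in criteria.items():
        if name not in sample_attrs:
            results.append(False)
        elif allowed is None:
            results.append(True)  # not filtered by value
        else:
            results.append(sample_attrs[name] in allowed)  # any listed value matches
    if not results:
        return True  # no criteria, nothing to filter on
    return all(results) if op == "AND" else any(results)


attrs = {"color": "red", "size": "large"}
print(property_matches(attrs, {"color": ["red", "blue"], "size": None}))  # True
```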
Response Parameters
Status code: 200
| Parameter | Type | Description |
|---|---|---|
| create_time | Long | Time when the task was created. |
| error_code | String | Error code. |
| error_msg | String | Error message. |
| export_format | Integer | Format of the exported directory. Options: 1 (tree structure, for example cat/1.jpg and dog/2.jpg) or 2 (tile structure, for example 1.jpg, 1.txt, 2.jpg, and 2.txt). |
| export_params | ExportParams object | Parameters of the dataset export task. |
| export_type | Integer | Export type. Options: 0 (labeled), 1 (unlabeled), 2 (all), 3 (conditional search). |
| finished_sample_count | Integer | Number of completed samples. |
| path | String | Export output path. |
| progress | Float | Task progress, as a percentage. |
| status | String | Task status. Options: INIT (initialized), RUNNING (running), FAILED (failed), SUCCESSED (completed). |
| task_id | String | Task ID. |
| total_sample_count | Integer | Total number of samples. |
| update_time | Long | Time when the task was last updated. |
| version_format | String | Format of the dataset version. Options: Default (default format), CarbonData (supported only by table datasets), CSV. |
| version_id | String | Dataset version ID. |
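The response reports `status`, `progress`, and sample counts, so a client usually polls until the task reaches `FAILED` or `SUCCESSED`. This page does not describe the API for querying a task, so the sketch takes a hypothetical `fetch_task` callable that returns a dictionary shaped like the response table. The interval and timeout values are arbitrary.

```python
import time
from typing import Any, Callable, Dict

TERMINAL_STATES = {"FAILED", "SUCCESSED"}  # status values as spelled by the API


def wait_for_export(
    fetch_task: Callable[[], Dict[str, Any]],
    interval_s: float = 5.0,
    timeout_s: float = 600.0,
) -> Dict[str, Any]:
    """Poll a hypothetical fetch_task() until the export task finishes."""
    deadline = time.monotonic() + timeout_s
    while True:
        task = fetch_task()
        status = task.get("status")
        if status in TERMINAL_STATES:
            if status == "FAILED":
                raise RuntimeError(f"export failed: {task.get('error_code')} {task.get('error_msg')}")
            return task
        if time.monotonic() >= deadline:
            raise TimeoutError(f"task still {status} after {timeout_s} s")
        print(f"{status}: {task.get('finished_sample_count')}/{task.get('total_sample_count')} samples")
        time.sleep(interval_s)


# Usage with a fake fetch_task that walks through the documented states.
states = iter([
    {"status": "INIT"},
    {"status": "RUNNING", "finished_sample_count": 5, "total_sample_count": 10},
    {"status": "SUCCESSED", "finished_sample_count": 10, "total_sample_count": 10},
])
done = wait_for_export(lambda: next(states), interval_s=0.01)
print(done["status"])  # SUCCESSED
```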
ExportParams

| Parameter | Type | Description |
|---|---|---|
| clear_hard_property | Boolean | Whether to clear hard example attributes. Options: true (clear them; default) or false (keep them). |
| export_dataset_version_format | String | Format of the dataset version to which data is exported. |
| export_dataset_version_name | String | Name of the dataset version to which data is exported. |
| export_dest | String | Export destination. Options: DIR (export data to OBS; default) or NEW_DATASET (export data to a new dataset). |
| export_new_dataset_name | String | Name of the new dataset to which data is exported. |
| export_new_dataset_work_path | String | Working directory of the new dataset to which data is exported. |
| ratio_sample_usage | Boolean | Whether to randomly split the data into a training set and a validation set based on the specified ratio. Options: true (split the data) or false (do not split the data; default). |
| sample_state | String | Sample status. Options: ALL (labeled), NONE (unlabeled), UNCHECK (pending acceptance), ACCEPTED (accepted), REJECTED (rejected), UNREVIEWED (pending review), REVIEWED (reviewed), WORKFORCE_SAMPLED (sampled), WORKFORCE_SAMPLED_UNCHECK (sampling unchecked), WORKFORCE_SAMPLED_CHECKED (sampling checked), WORKFORCE_SAMPLED_ACCEPTED (sampling accepted), WORKFORCE_SAMPLED_REJECTED (sampling rejected), AUTO_ANNOTATION (to be confirmed). |
| samples | Array of strings | IDs of the exported samples. |
| search_conditions | Array of SearchCondition objects | Search conditions for the export. Multiple conditions are combined with OR. |
| train_sample_ratio | String | Split ratio between the training set and the validation set when the specified version is released. The default value is 1.00, which means that all data in the released version is used as the training set. |
SearchCondition

| Parameter | Type | Description |
|---|---|---|
| coefficient | String | Filter by difficulty coefficient. |
| frame_in_video | Integer | A frame in the video. |
| hard | String | Whether a sample is a hard sample. Options: 0 (non-hard sample) or 1 (hard sample). |
| import_origin | String | Filter by data source. |
| kvp | String | CT dosage. Samples are filtered by dosage. |
| label_list | SearchLabels object | Label search criteria. |
| labeler | String | Labeler. |
| metadata | SearchProp object | Search by sample attribute. |
| parent_sample_id | String | Parent sample ID. |
| sample_dir | String | Directory where data samples are stored. The directory must end with a slash (/). Only samples directly in the specified directory are searched; subdirectories are not searched recursively. |
| sample_name | String | Search by sample name, including the file name extension. |
| sample_time | String | When a sample is added to the dataset, it is indexed by its last modification time on OBS (accurate to the day), and samples can be searched by this time. Options: month (samples added from 30 days ago to the current day); day (samples added from yesterday to the current day); yyyyMMdd-yyyyMMdd (samples added in a specified period of at most 30 days, in the format Start date-End date; for example, 20190901-20190915 searches for samples generated from September 1 to September 15, 2019). |
| score | String | Search by confidence. |
| slice_thickness | String | DICOM slice thickness. Samples are filtered by slice thickness. |
| study_date | String | DICOM scanning time. |
| time_in_video | String | A time point in the video. |
SearchLabels

| Parameter | Type | Description |
|---|---|---|
| labels | Array of SearchLabel objects | List of label search criteria. |
| op | String | Logical operator between labels. It must be specified when you search for multiple labels and can be left blank when you search for only one label. Options: OR (OR operation) or AND (AND operation). |
SearchLabel

| Parameter | Type | Description |
|---|---|---|
| name | String | Label name. |
| op | String | Logical operator between multiple attributes. Options: OR (OR operation) or AND (AND operation). |
| property | Map<String,Array<String>> | Label attributes, stored as an object of arbitrary key-value pairs. Each key is an attribute name and each value is a list of values. If a value is null, the attribute is not filtered by value; otherwise, the search matches any value in the list. |
| type | Integer | Label type. Options: 0 (image classification), 1 (object detection), 100 (text classification), 101 (named entity recognition), 102 (text triplet relationship), 103 (text triplet entity), 200 (speech classification), 201 (speech content), 202 (speech paragraph labeling), 600 (video classification). |
Example Requests
Creating an Export Task (Exporting Data to OBS)
{ "path" : "/test-obs/daoChu/", "export_type" : 3, "export_params" : { "sample_state" : "", "export_dest" : "DIR" } }
Creating an Export Task (Exporting Data to a New Dataset)
{ "path" : "/test-obs/classify/input/", "export_type" : 3, "export_params" : { "sample_state" : "", "export_dest" : "NEW_DATASET", "export_new_dataset_name" : "dataset-export-test", "export_new_dataset_work_path" : "/test-obs/classify/output/" } }
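As a sketch, the second example can be sent with Python's standard library. The endpoint host `modelarts.example.com` is a placeholder, and passing an IAM token in an `X-Auth-Token` header is an assumption, since this page does not cover authentication. The function only builds the request, so it can be inspected before `urllib.request.urlopen` sends it.

```python
import json
import urllib.request


def make_export_request(endpoint: str, project_id: str, dataset_id: str,
                        body: dict, token: str) -> urllib.request.Request:
    """Build (but do not send) the POST request for an export task.

    endpoint and token are placeholders; X-Auth-Token is an assumed auth header.
    """
    url = f"{endpoint.rstrip('/')}/v2/{project_id}/datasets/{dataset_id}/export-tasks"
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json", "X-Auth-Token": token},
        method="POST",
    )


body = {
    "path": "/test-obs/classify/input/",
    "export_type": 3,
    "export_params": {
        "sample_state": "",
        "export_dest": "NEW_DATASET",
        "export_new_dataset_name": "dataset-export-test",
        "export_new_dataset_work_path": "/test-obs/classify/output/",
    },
}
req = make_export_request("https://modelarts.example.com", "my-project",
                          "my-dataset", body, token="<IAM token>")
print(req.get_method(), req.full_url)
# POST https://modelarts.example.com/v2/my-project/datasets/my-dataset/export-tasks
# To send it: urllib.request.urlopen(req)
```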
Example Responses
Status code: 200
OK
{ "task_id" : "rF9NNoB56k5rtYKg2Y7" }
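The example response contains only `task_id`, although the response table lists further fields such as `error_code` and `error_msg`. The defensive parsing sketch below reads `task_id` and raises an error if an error code is present. The helper name is hypothetical.

```python
import json


def parse_export_response(raw: str) -> str:
    """Return the task_id from a create-export-task response (hypothetical helper)."""
    data = json.loads(raw)
    if data.get("error_code"):
        raise RuntimeError(f"{data['error_code']}: {data.get('error_msg', '')}")
    task_id = data.get("task_id")
    if not task_id:
        raise ValueError("response does not contain a task_id")
    return task_id


print(parse_export_response('{ "task_id" : "rF9NNoB56k5rtYKg2Y7" }'))
# rF9NNoB56k5rtYKg2Y7
```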
Status Codes
| Status Code | Description |
|---|---|
| 200 | OK |
| 401 | Unauthorized |
| 403 | Forbidden |
| 404 | Not Found |
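Only 200 indicates success. The sketch below maps the other listed codes to messages. The hint texts interpret the standard HTTP meanings and do not describe documented ModelArts behavior. The helper name is hypothetical.

```python
STATUS_HINTS = {
    401: "Unauthorized: check that the request carries valid credentials.",
    403: "Forbidden: the account may lack permission for this project or dataset.",
    404: "Not Found: check project_id and dataset_id in the URI.",
}


def check_status(status_code: int) -> None:
    """Raise for any status code other than 200 (hypothetical helper)."""
    if status_code == 200:
        return
    hint = STATUS_HINTS.get(status_code, "unexpected status code")
    raise RuntimeError(f"HTTP {status_code}: {hint}")
```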
Error Codes
See Error Codes.