Lai, Weijian 2f0818cf3d ModelArts API 22.3.0 version-20240311

Reviewed-by: Pruthi, Vineet <vineet.pruthi@t-systems.com>
Co-authored-by: Lai, Weijian <laiweijian4@huawei.com>
Co-committed-by: Lai, Weijian <laiweijian4@huawei.com>

2024-04-05 09:35:42 +00:00

Creating an Import Task

Function

This API is used to create a dataset import task to import samples and labels from the storage system to the dataset.

URI

POST /v2/{project_id}/datasets/{dataset_id}/import-tasks

**Table 1** Path Parameters
Parameter	Mandatory	Type	Description
dataset_id	Yes	String	Dataset ID.
project_id	Yes	String	Project ID. For details about how to obtain the project ID, see Obtaining a Project ID.
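
As a minimal sketch, the request URL can be assembled from the two path parameters in Table 1. The endpoint host and the IDs below are placeholders, not values from this reference; substitute the ModelArts endpoint of your region.

```python
from urllib.parse import quote


def import_task_url(endpoint: str, project_id: str, dataset_id: str) -> str:
    """Build the URL for POST /v2/{project_id}/datasets/{dataset_id}/import-tasks."""
    base = endpoint.rstrip("/")
    # Percent-encode each path parameter so it cannot alter the path structure.
    return (
        f"{base}/v2/{quote(project_id, safe='')}"
        f"/datasets/{quote(dataset_id, safe='')}/import-tasks"
    )


# Hypothetical endpoint and IDs, for illustration only.
url = import_task_url("https://modelarts.example.com/", "my-project-id", "my-dataset-id")
print(url)
```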

Request Parameters

**Table 2** Request body parameters
Parameter	Mandatory	Type	Description
annotation_format	No	String	Format of the labeling information. Currently, only object detection is supported. The options are as follows: VOC: VOC format. COCO: COCO format.
data_source	No	DataSource object	Data source.
difficult_only	No	Boolean	Whether to import only hard examples. The options are as follows: true: Only hard examples are imported. false: All samples are imported. (Default value)
excluded_labels	No	Array of Label objects	Labels to exclude. Samples containing the specified labels are not imported.
final_annotation	No	Boolean	Whether to import data to the final state. The options are as follows: true: Import data to the final state. (Default value) false: Do not import data to the final state.
import_annotations	No	Boolean	Whether to import labels. The options are as follows: true: Import labels. (Default value) false: Do not import labels.
import_folder	No	String	Name of the subdirectory in the dataset storage directory after import. You can specify the same subdirectory for multiple import tasks to avoid importing the same samples repeatedly. This parameter does not apply to table datasets.
import_origin	No	String	Data source. The options are as follows: obs: OBS bucket (default value) dws: GaussDB(DWS) dli: DLI rds: RDS mrs: MRS inference: Inference service
import_path	No	String	OBS path or manifest path to be imported. When importing a manifest file, ensure that the path points to the manifest file itself. When importing a directory, only image classification, object detection, text classification, and sound classification datasets are supported.
import_samples	No	Boolean	Whether to import samples. The options are as follows: true: Import samples. (Default value) false: Do not import samples.
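import_type	No	String	Import mode. The options are as follows: 0: Import by directory. 1: Import by manifest file. Note that the example requests in this topic pass the string values dir and manifest instead.
included_labels	No	Array of Label objects	Labels to include. Samples containing the specified labels are imported.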
label_format	No	LabelFormat object	Label format. This parameter is used only for text datasets.
with_column_header	No	Boolean	Whether the first row of the file contains column names. This parameter applies only to table datasets. The options are as follows: true: The first row of the file contains column names. false: The first row of the file does not contain column names. (Default value)
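
Because every parameter in Table 2 is optional, a client may send only the fields it wants to override and let the service apply the documented defaults. The helper below is a sketch of that idea, not part of the API; its name is hypothetical, and the `import_type` value follows the example requests in this topic.

```python
import json
from typing import Any, Optional


def build_import_body(
    import_path: str,
    import_type: str,
    *,
    import_annotations: Optional[bool] = None,
    difficult_only: Optional[bool] = None,
    import_folder: Optional[str] = None,
) -> dict[str, Any]:
    """Return a request body that contains only the fields that were set."""
    candidate = {
        "import_type": import_type,
        "import_path": import_path,
        "import_annotations": import_annotations,
        "difficult_only": difficult_only,
        "import_folder": import_folder,
    }
    # Drop unset fields so that the documented default values apply.
    return {key: value for key, value in candidate.items() if value is not None}


body = build_import_body(
    "s3://test-obs/daoLu_images/cat-rabbit/",
    "dir",  # value used by the example requests in this topic
    import_annotations=False,
)
print(json.dumps(body, indent=2))
```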

**Table 3** DataSource
Parameter	Mandatory	Type	Description
data_path	No	String	Data source path.
data_type	No	Integer	Data type. The options are as follows: 0: OBS bucket (default value) 1: GaussDB(DWS) 2: DLI 3: RDS 4: MRS 5: AI Gallery 6: Inference service
schema_maps	No	Array of SchemaMap objects	Schema mapping information corresponding to the table data.
source_info	No	SourceInfo object	Information required for importing a table data source.
with_column_header	No	Boolean	Whether the first row of the file contains column names. This parameter applies only to table datasets. The options are as follows: true: The first row of the file contains column names. false: The first row of the file does not contain column names.

**Table 4** SchemaMap
Parameter	Mandatory	Type	Description
dest_name	No	String	Name of the destination column.
src_name	No	String	Name of the source column.

**Table 5** SourceInfo
Parameter	Mandatory	Type	Description
cluster_id	No	String	ID of an MRS cluster.
cluster_mode	No	String	Running mode of an MRS cluster. The options are as follows: 0: normal cluster 1: security cluster
cluster_name	No	String	Name of an MRS cluster.
database_name	No	String	Name of the database to which the table dataset is imported.
input	No	String	HDFS path of a table dataset.
ip	No	String	IP address of your GaussDB(DWS) cluster.
port	No	String	Port number of your GaussDB(DWS) cluster.
queue_name	No	String	DLI queue name of a table dataset.
subnet_id	No	String	Subnet ID of an MRS cluster.
table_name	No	String	Name of the table to which a table dataset is imported.
user_name	No	String	Username, which is mandatory for GaussDB(DWS) data.
user_password	No	String	User password, which is mandatory for GaussDB(DWS) data.
vpc_id	No	String	ID of the VPC where an MRS cluster resides.
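
The DataSource, SchemaMap, and SourceInfo objects nest as follows for a GaussDB(DWS) table. This is a sketch under assumptions: the helper name, the address, the port, and the database, table, and column names are invented for illustration, and the choice of which SourceInfo fields to send for GaussDB(DWS) is inferred from the field descriptions in Table 5.

```python
import os
from typing import Any, Optional

DATA_TYPE_DWS = 1  # "1: GaussDB(DWS)" in Table 3


def dws_data_source(
    ip: str,
    port: int,
    database_name: str,
    table_name: str,
    user_name: str,
    user_password: str,
    column_map: Optional[dict[str, str]] = None,
) -> dict[str, Any]:
    """Build a DataSource object for a GaussDB(DWS) table."""
    # Table 5 states that both credentials are mandatory for GaussDB(DWS) data.
    if not user_name or not user_password:
        raise ValueError("user_name and user_password are mandatory for GaussDB(DWS) data")
    source: dict[str, Any] = {
        "data_type": DATA_TYPE_DWS,
        "source_info": {
            "ip": ip,
            "port": str(port),  # Table 5 types the port as a String
            "database_name": database_name,
            "table_name": table_name,
            "user_name": user_name,
            "user_password": user_password,
        },
    }
    if column_map:
        # One SchemaMap entry per source column to destination column pair.
        source["schema_maps"] = [
            {"src_name": src, "dest_name": dest} for src, dest in column_map.items()
        ]
    return source


# Hypothetical connection details; read the real password from a secret store.
data_source = dws_data_source(
    "192.0.2.10",
    8000,
    "sales_db",
    "orders",
    "import_user",
    os.environ.get("DWS_PASSWORD", "placeholder-password"),
    column_map={"order_id": "id", "order_total": "total"},
)
```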

**Table 6** Label
Parameter	Mandatory	Type	Description
attributes	No	Array of LabelAttribute objects	Multi-dimensional attribute of a label. For example, if the label is music, attributes such as style and artist may be included.
name	No	String	Label name.
property	No	LabelProperty object	Basic attribute key-value pair of a label, such as color and shortcut keys.
type	No	Integer	Label type. The options are as follows: 0: image classification 1: object detection 100: text classification 101: named entity recognition 102: text triplet relationship 103: text triplet entity 200: speech classification 201: speech content 202: speech paragraph labeling 600: video classification
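
The numeric label types in Table 6 are easier to read in client code when given names. The enumeration below is a sketch; the member names are chosen here for illustration, and only the integer values come from the table.

```python
from enum import IntEnum
from typing import Any


class LabelType(IntEnum):
    """Label type codes listed in Table 6."""
    IMAGE_CLASSIFICATION = 0
    OBJECT_DETECTION = 1
    TEXT_CLASSIFICATION = 100
    NAMED_ENTITY_RECOGNITION = 101
    TEXT_TRIPLET_RELATIONSHIP = 102
    TEXT_TRIPLET_ENTITY = 103
    SPEECH_CLASSIFICATION = 200
    SPEECH_CONTENT = 201
    SPEECH_PARAGRAPH_LABELING = 202
    VIDEO_CLASSIFICATION = 600


def make_label(name: str, label_type: LabelType) -> dict[str, Any]:
    """Build a minimal Label object containing only name and type."""
    return {"name": name, "type": int(label_type)}


# Example: labels that could be passed in included_labels or excluded_labels.
labels = [
    make_label("cat", LabelType.OBJECT_DETECTION),
    make_label("rabbit", LabelType.OBJECT_DETECTION),
]
```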

**Table 7** LabelAttribute
Parameter	Mandatory	Type	Description
default_value	No	String	Default value of a label attribute.
id	No	String	Label attribute ID.
name	No	String	Label attribute name.
type	No	String	Label attribute type. The options are as follows: text: text select: single-choice drop-down list
values	No	Array of LabelAttributeValue objects	List of label attribute values.

**Table 8** LabelAttributeValue
Parameter	Mandatory	Type	Description
id	No	String	Label attribute value ID.
value	No	String	Label attribute value.

**Table 9** LabelProperty
Parameter	Mandatory	Type	Description
@modelarts:color	No	String	Default attribute: Label color, which is a hexadecimal code of the color. By default, this parameter is left blank. Example: #FFFFF0.
@modelarts:default_shape	No	String	Default attribute: Default shape of an object detection label (dedicated attribute). By default, this parameter is left blank. The options are as follows: bndbox: rectangle polygon: polygon circle: circle line: straight line dashed: dashed line point: point polyline: polyline
@modelarts:from_type	No	String	Default attribute: Type of the head entity in the triplet relationship label. This attribute must be specified when a relationship label is created. This parameter is used only for the text triplet dataset.
@modelarts:rename_to	No	String	Default attribute: The new name of the label.
@modelarts:shortcut	No	String	Default attribute: Label shortcut key. By default, this parameter is left blank. For example: D.
@modelarts:to_type	No	String	Default attribute: Type of the tail entity in the triplet relationship label. This attribute must be specified when a relationship label is created. This parameter is used only for the text triplet dataset.
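
A LabelProperty object can be checked on the client before it is sent. The sketch below assumes that the color is a six-digit hexadecimal code with a leading #, as in the example #FFFFF0; the reference does not say whether other forms are accepted. The helper name is hypothetical.

```python
import re
from typing import Optional

# Shape values listed for @modelarts:default_shape in Table 9.
SHAPES = {"bndbox", "polygon", "circle", "line", "dashed", "point", "polyline"}
# Assumption: colors use the six-digit form shown in the example (#FFFFF0).
HEX_COLOR = re.compile(r"#[0-9A-Fa-f]{6}")


def label_property(
    color: Optional[str] = None,
    shape: Optional[str] = None,
    shortcut: Optional[str] = None,
) -> dict[str, str]:
    """Build a LabelProperty object, rejecting values outside the documented sets."""
    prop: dict[str, str] = {}
    if color is not None:
        if not HEX_COLOR.fullmatch(color):
            raise ValueError(f"not a hexadecimal color code: {color!r}")
        prop["@modelarts:color"] = color
    if shape is not None:
        if shape not in SHAPES:
            raise ValueError(f"unknown default shape: {shape!r}")
        prop["@modelarts:default_shape"] = shape
    if shortcut is not None:
        prop["@modelarts:shortcut"] = shortcut
    return prop


prop = label_property(color="#FFFFF0", shape="bndbox", shortcut="D")
```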

**Table 10** LabelFormat
Parameter	Mandatory	Type	Description
label_type	No	String	Label type of text classification. The options are as follows: 0: The label and the text are stored in separate files, distinguished by the fixed suffix _result. For example, if the text file is abc.txt, the label file is abc_result.txt. 1: Labels and text are stored in the same file and separated by separators. Use text_sample_separator to specify the separator between the text and the labels, and text_label_separator to specify the separator between labels. (Default value)
text_label_separator	No	String	Separator between labels. By default, a comma (,) is used as the separator. The separator needs to be escaped. The separator must be a single character: a letter, a digit, or any of the following special characters: !@#$%^&*_=\|?/':.;,
text_sample_separator	No	String	Separator between the text and the labels. By default, a tab character is used as the separator. The separator needs to be escaped. The separator must be a single character: a letter, a digit, or any of the following special characters: !@#$%^&*_=\|?/':.;,
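
The two label layouts in Table 10 can be illustrated on the client side. The sketch below assumes that, for label_type 1, each line holds the text first, followed by the labels; the reference does not state the order explicitly. Both function names are hypothetical.

```python
from pathlib import PurePosixPath


def label_file_for(text_file: str) -> str:
    """label_type 0: derive the label file name by adding the _result suffix."""
    path = PurePosixPath(text_file)
    return str(path.with_name(f"{path.stem}_result{path.suffix}"))


def split_sample(
    line: str, sample_sep: str = "\t", label_sep: str = ","
) -> tuple[str, list[str]]:
    """label_type 1: split one line into its text and its list of labels."""
    # Assumption: the text comes before the first sample separator.
    text, _, label_part = line.rstrip("\n").partition(sample_sep)
    labels = [label for label in label_part.split(label_sep) if label]
    return text, labels


print(label_file_for("abc.txt"))
print(split_sample("A gripping story\tpositive,recommended\n"))
```

The default arguments mirror the documented defaults: a tab between the text and the labels, and a comma between labels.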

Response Parameters

Status code: 200

**Table 11** Response body parameters
Parameter	Type	Description
task_id	String	ID of an import task.

Example Requests

Creating an Import Task (Importing Data from OBS)

{
  "import_type" : "dir",
  "import_path" : "s3://test-obs/daoLu_images/cat-rabbit/",
  "included_tags" : [ ],
  "import_annotations" : false,
  "difficult_only" : false
}

Creating an Import Task (Importing Data from Manifest)

{
  "import_type" : "manifest",
  "import_path" : "s3://test-obs/classify/output/dataset-f9e8-gfghHSokody6AJigS5A/annotation/V002/V002.manifest",
  "included_tags" : [ "cat", "rabbit", "Cat", "Rabbit" ],
  "import_annotations" : true,
  "difficult_only" : false
}
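
The following sketch shows how the first example request could be prepared with the Python standard library. The URL is a placeholder, and token authentication through an X-Auth-Token header is an assumption that this topic does not cover; check the authentication section of the API reference before relying on it. The request is only constructed here, not sent.

```python
import json
import urllib.request


def build_import_request(url: str, token: str, body: dict) -> urllib.request.Request:
    """Prepare (but do not send) the POST request that creates an import task."""
    data = json.dumps(body).encode("utf-8")
    return urllib.request.Request(
        url,
        data=data,
        method="POST",
        headers={
            "Content-Type": "application/json",
            "X-Auth-Token": token,  # assumption: token-based authentication
        },
    )


# Hypothetical endpoint, IDs, and token, for illustration only.
request = build_import_request(
    "https://modelarts.example.com/v2/my-project-id/datasets/my-dataset-id/import-tasks",
    "my-token",
    {
        "import_type": "dir",
        "import_path": "s3://test-obs/daoLu_images/cat-rabbit/",
        "included_tags": [],
        "import_annotations": False,
        "difficult_only": False,
    },
)
# To send it: urllib.request.urlopen(request) returns the HTTP response.
```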

Example Responses

Status code: 200

{
  "task_id" : "gfghHSokody6AJigS5A_m1dYqOw8vWCAznw1V28"
}
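
A client would typically extract task_id from the response body and keep it for later use. A minimal sketch, with a hypothetical function name:

```python
import json


def parse_task_id(response_body: str) -> str:
    """Return the task_id field of a status code 200 response body."""
    payload = json.loads(response_body)
    task_id = payload.get("task_id") if isinstance(payload, dict) else None
    if not isinstance(task_id, str) or not task_id:
        raise ValueError("response body does not contain a task_id string")
    return task_id


example = '{ "task_id" : "gfghHSokody6AJigS5A_m1dYqOw8vWCAznw1V28" }'
print(parse_task_id(example))
```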

Status Codes

Status Code	Description
200	OK
401	Unauthorized
403	Forbidden
404	Not Found
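
The status codes above can be turned into readable client-side errors. This is a sketch with hypothetical helper names; it treats any code other than 200 as a failure and does not interpret the error body.

```python
# Status codes documented for this API.
KNOWN_STATUS = {200: "OK", 401: "Unauthorized", 403: "Forbidden", 404: "Not Found"}


def describe_status(code: int) -> str:
    """Return the documented description, or a marker for undocumented codes."""
    return KNOWN_STATUS.get(code, "Undocumented status")


def ensure_ok(code: int) -> None:
    """Raise an error unless the import task was created (status code 200)."""
    if code != 200:
        raise RuntimeError(
            f"Import task request failed: HTTP {code} ({describe_status(code)})"
        )


ensure_ok(200)  # passes silently
```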

Error Codes

See Error Codes.

Parent topic: Data Import Task
