This API is used to create a dataset.
POST /v2/{project_id}/datasets

URI parameters

Parameter | Mandatory | Type | Description |
---|---|---|---|
project_id | Yes | String | Project ID. For details about how to obtain the project ID, see Obtaining a Project ID. |
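
As a rough illustration of how the request URI is assembled, the sketch below substitutes a project ID into the path shown above. The endpoint host and the project ID value are placeholders chosen for this example; they do not come from this reference.

```python
from urllib.parse import quote


def create_dataset_url(endpoint: str, project_id: str) -> str:
    """Build the URI for POST /v2/{project_id}/datasets."""
    # `endpoint` is a hypothetical service host; use the one for your region.
    base = endpoint.rstrip("/")
    # Percent-encode the project ID so it is safe as a single path segment.
    return f"{base}/v2/{quote(project_id, safe='')}/datasets"


# Placeholder values, for illustration only.
print(create_dataset_url("https://modelarts.example.com", "my-project-id"))
```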

Request body parameters

Parameter | Mandatory | Type | Description |
---|---|---|---|
data_format | No | String | Data format. The options are as follows: |
data_sources | No | Array of DataSource objects | Input dataset path, which is used to synchronize source data (such as images, text files, and audio files) in the directory and its subdirectories to the dataset. For a table dataset, this parameter indicates the import directory. The work directory of a table dataset cannot be an OBS path in a KMS-encrypted bucket. |
dataset_name | Yes | String | Dataset name. The value contains 1 to 100 characters. Only letters, digits, underscores (_), and hyphens (-) are allowed, for example, dataset-9f3b. |
dataset_type | No | Integer | Dataset type. The options are as follows: |
description | No | String | Dataset description. The value is empty by default. The description contains 0 to 256 characters and does not support the following special characters: ^!<>=&"' |
import_annotations | No | Boolean | Whether to automatically import the labeling information in the input directory, supporting detection, image classification, and text classification. The options are as follows: |
import_data | No | Boolean | Whether to import data. This parameter is used only for table datasets. The options are as follows: |
label_format | No | LabelFormat object | Label format information. This parameter is used only for text datasets. |
labels | No | Array of Label objects | Dataset label list. |
managed | No | Boolean | Whether to host a dataset. The options are as follows: |
schema | No | Array of Field objects | Schema list. |
work_path | Yes | String | Output dataset path, which is used to store output files such as label files. |
work_path_type | Yes | Integer | Type of the dataset output path. The options are as follows: |
workforce_information | No | WorkforceInformation object | Team labeling information. |
workspace_id | No | String | Workspace ID. If no workspace is created, the default value is 0. If a workspace is created and used, use the actual value. |
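
The constraints on the mandatory fields can be checked on the client side before the request is sent. The following is a minimal sketch, not an official client: the function name is hypothetical, and it assumes that "letters" means ASCII letters, which this reference does not state.

```python
import re

# Assumption: "letters" means ASCII letters; the reference does not say.
DATASET_NAME_RE = re.compile(r"[A-Za-z0-9_-]{1,100}")
# Characters the description must not contain, as listed in the table.
FORBIDDEN_DESCRIPTION_CHARS = set("^!<>=&\"'")


def build_create_dataset_body(dataset_name, work_path, work_path_type,
                              description="", **optional):
    """Return a request body dict containing the three mandatory fields."""
    if not DATASET_NAME_RE.fullmatch(dataset_name):
        raise ValueError("dataset_name must be 1 to 100 letters, digits, '_' or '-'")
    if len(description) > 256:
        raise ValueError("description must not exceed 256 characters")
    if FORBIDDEN_DESCRIPTION_CHARS & set(description):
        raise ValueError("description contains an unsupported special character")
    body = {
        "dataset_name": dataset_name,
        "work_path": work_path,
        "work_path_type": work_path_type,
        "description": description,
    }
    # Optional fields (data_sources, labels, workspace_id, ...) pass through as-is.
    body.update(optional)
    return body


body = build_create_dataset_body(
    "dataset-457f", "/test-obs/classify/output/", 0, workspace_id="0"
)
print(body)
```

Validating locally only catches formatting mistakes early; the service remains the authority on whether a request is accepted.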

DataSource

Parameter | Mandatory | Type | Description |
---|---|---|---|
data_path | No | String | Data source path. |
data_type | No | Integer | Data type. The options are as follows: |
schema_maps | No | Array of SchemaMap objects | Schema mapping information corresponding to the table data. |
source_info | No | SourceInfo object | Information required for importing a table data source. |
with_column_header | No | Boolean | Whether the first row in the file contains column names. This field is valid only for table datasets. The options are as follows: |

SchemaMap

Parameter | Mandatory | Type | Description |
---|---|---|---|
dest_name | No | String | Name of the destination column. |
src_name | No | String | Name of the source column. |

SourceInfo

Parameter | Mandatory | Type | Description |
---|---|---|---|
cluster_id | No | String | ID of an MRS cluster. |
cluster_mode | No | String | Running mode of an MRS cluster. The options are as follows: |
cluster_name | No | String | Name of an MRS cluster. |
database_name | No | String | Name of the database to which the table dataset is imported. |
input | No | String | HDFS path of a table dataset. |
ip | No | String | IP address of your GaussDB(DWS) cluster. |
port | No | String | Port number of your GaussDB(DWS) cluster. |
queue_name | No | String | DLI queue name of a table dataset. |
subnet_id | No | String | Subnet ID of an MRS cluster. |
table_name | No | String | Name of the table to which a table dataset is imported. |
user_name | No | String | Username, which is mandatory for GaussDB(DWS) data. |
user_password | No | String | User password, which is mandatory for GaussDB(DWS) data. |
vpc_id | No | String | ID of the VPC where an MRS cluster resides. |

LabelFormat

Parameter | Mandatory | Type | Description |
---|---|---|---|
label_type | No | String | Label type of text classification. The options are as follows. 0: The label is stored separately from the text and is identified by the fixed suffix _result. For example, if the text file is abc.txt, the label file is abc_result.txt. 1 (default): Labels and texts are stored in the same file and separated by separators. Use text_sample_separator to specify the separator between the text and label, and text_label_separator to specify the separator between labels. |
text_label_separator | No | String | Separator between labels. By default, a comma (,) is used as the separator. The separator needs to be escaped. The separator can contain only one character, such as a letter, a digit, or any of the following special characters: !@#$%^&*_=\|?/':.;, |
text_sample_separator | No | String | Separator between the text and label. By default, a tab character is used as the separator. The separator needs to be escaped. The separator can contain only one character, such as a letter, a digit, or any of the following special characters: !@#$%^&*_=\|?/':.;, |
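
To make the two label_type layouts concrete, the sketch below derives the label file name used by label_type 0 and splits a sample line as stored under label_type 1. The function names are hypothetical. Splitting at the first occurrence of the sample separator is an assumption of this sketch; the reference does not say how a separator that also appears inside the text is handled.

```python
from pathlib import PurePosixPath


def label_file_for(text_file: str) -> str:
    """label_type 0: the label file carries the fixed suffix _result."""
    path = PurePosixPath(text_file)
    return str(path.with_name(f"{path.stem}_result{path.suffix}"))


def split_labeled_line(line: str,
                       text_sample_separator: str = "\t",
                       text_label_separator: str = ","):
    """label_type 1: text and labels share one line.

    Assumption: the line is split at the FIRST sample separator.
    """
    text, _, label_part = line.rstrip("\n").partition(text_sample_separator)
    # Drop empty entries so a line without labels yields an empty list.
    labels = [item for item in label_part.split(text_label_separator) if item]
    return text, labels


print(label_file_for("abc.txt"))
print(split_labeled_line("It is a good movie\tpositive,review"))
```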

Label

Parameter | Mandatory | Type | Description |
---|---|---|---|
attributes | No | Array of LabelAttribute objects | Multi-dimensional attribute of a label. For example, if the label is music, attributes such as style and artist may be included. |
name | No | String | Label name. |
property | No | LabelProperty object | Basic attribute key-value pair of a label, such as color and shortcut keys. |
type | No | Integer | Label type. The options are as follows: |

LabelAttribute

Parameter | Mandatory | Type | Description |
---|---|---|---|
default_value | No | String | Default value of a label attribute. |
id | No | String | Label attribute ID. |
name | No | String | Label attribute name. |
type | No | String | Label attribute type. The options are as follows: |
values | No | Array of LabelAttributeValue objects | List of label attribute values. |

LabelAttributeValue

Parameter | Mandatory | Type | Description |
---|---|---|---|
id | No | String | Label attribute value ID. |
value | No | String | Label attribute value. |

LabelProperty

Parameter | Mandatory | Type | Description |
---|---|---|---|
@modelarts:color | No | String | Default attribute: Label color, which is a hexadecimal code of the color. By default, this parameter is left blank. Example: #FFFFF0. |
@modelarts:default_shape | No | String | Default attribute: Default shape of an object detection label (dedicated attribute). By default, this parameter is left blank. The options are as follows: |
@modelarts:from_type | No | String | Default attribute: Type of the head entity in the triplet relationship label. This attribute must be specified when a relationship label is created. This parameter is used only for the text triplet dataset. |
@modelarts:rename_to | No | String | Default attribute: The new name of the label. |
@modelarts:shortcut | No | String | Default attribute: Label shortcut key. By default, this parameter is left blank. For example: D. |
@modelarts:to_type | No | String | Default attribute: Type of the tail entity in the triplet relationship label. This attribute must be specified when a relationship label is created. This parameter is used only for the text triplet dataset. |
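
A Label object with its LabelProperty can be assembled as a plain dictionary. The helper below is a hypothetical sketch. It assumes the color is a six-digit hexadecimal code with a leading #, as in the example #FFFFF0; the reference does not state whether other forms are accepted.

```python
import re

# Assumption: colors are '#' followed by exactly six hexadecimal digits.
HEX_COLOR_RE = re.compile(r"#[0-9A-Fa-f]{6}")


def make_label(name, label_type, color=None, shortcut=None):
    """Build a Label dict, adding the property object only when needed."""
    properties = {}
    if color is not None:
        if not HEX_COLOR_RE.fullmatch(color):
            raise ValueError(f"not a six-digit hexadecimal color: {color!r}")
        properties["@modelarts:color"] = color
    if shortcut is not None:
        properties["@modelarts:shortcut"] = shortcut
    label = {"name": name, "type": label_type}
    if properties:
        label["property"] = properties
    return label


print(make_label("Cat", 0, color="#3399ff", shortcut="D"))
```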

Field

Parameter | Mandatory | Type | Description |
---|---|---|---|
description | No | String | Schema description. |
name | No | String | Schema name. |
schema_id | No | Integer | Schema ID. |
type | No | String | Schema value type. |

WorkforceInformation

Parameter | Mandatory | Type | Description |
---|---|---|---|
data_sync_type | No | Integer | Synchronization type. The options are as follows: |
repetition | No | Integer | Number of persons who label each sample. The minimum value is 1. |
synchronize_auto_labeling_data | No | Boolean | Whether to synchronously update auto labeling data. The options are as follows: |
synchronize_data | No | Boolean | Whether to synchronize updated data, such as uploading files, synchronizing data sources, and assigning imported unlabeled files to team members. The options are as follows: |
task_id | No | String | ID of a team labeling task. |
task_name | Yes | String | Name of a team labeling task. The value contains 1 to 64 characters, including only letters, digits, underscores (_), and hyphens (-). |
workforces_config | No | WorkforcesConfig object | Manpower assignment of a team labeling task. You can delegate the assignment to the system administrator or do it yourself. |

WorkforcesConfig

Parameter | Mandatory | Type | Description |
---|---|---|---|
agency | No | String | Team administrator. |
workforces | No | Array of WorkforceConfig objects | List of teams that execute labeling tasks. |

WorkforceConfig

Parameter | Mandatory | Type | Description |
---|---|---|---|
workers | No | Array of Worker objects | List of labeling team members. |
workforce_id | No | String | ID of a labeling team. |
workforce_name | No | String | Name of a labeling team. The value contains 0 to 1024 characters and does not support the following special characters: !<>=&"' |

Worker

Parameter | Mandatory | Type | Description |
---|---|---|---|
create_time | No | Long | Creation time. |
description | No | String | Labeling team member description. The value contains 0 to 256 characters and does not support the following special characters: ^!<>=&"' |
| | No | String | Email address of a labeling team member. |
role | No | Integer | Role. The options are as follows: |
status | No | Integer | Current login status of a labeling team member. The options are as follows: |
update_time | No | Long | Update time. |
worker_id | No | String | ID of a labeling team member. |
workforce_id | No | String | ID of a labeling team. |
Status code: 201

Parameter | Type | Description |
---|---|---|
dataset_id | String | Dataset ID. |
error_code | String | Error code. |
error_msg | String | Error message. |
import_task_id | String | ID of an import task. |

Example requests

{ "workspace_id" : "0", "dataset_name" : "dataset-457f", "dataset_type" : 0, "data_sources" : [ { "data_type" : 0, "data_path" : "/test-obs/classify/input/cat-rabbit/" } ], "description" : "", "work_path" : "/test-obs/classify/output/", "work_path_type" : 0, "labels" : [ { "name" : "Cat", "type" : 0, "property" : { "@modelarts:color" : "#3399ff" } }, { "name" : "Rabbit", "type" : 0, "property" : { "@modelarts:color" : "#3399ff" } } ] }

{ "workspace_id" : "0", "dataset_name" : "dataset-95a6", "dataset_type" : 1, "data_sources" : [ { "data_type" : 0, "data_path" : "/test-obs/detect/input/cat-rabbit/" } ], "description" : "", "work_path" : "/test-obs/detect/output/", "work_path_type" : 0, "labels" : [ { "name" : "Cat", "type" : 1, "property" : { "@modelarts:color" : "#3399ff" } }, { "name" : "Rabbit", "type" : 1, "property" : { "@modelarts:color" : "#3399ff" } } ] }

{ "workspace_id" : "0", "dataset_name" : "dataset-de83", "dataset_type" : 400, "data_sources" : [ { "data_type" : 0, "data_path" : "/test-obs/table/input/", "with_column_header" : true } ], "description" : "", "work_path" : "/test-obs/table/output/", "work_path_type" : 0, "schema" : [ { "schema_id" : 1, "name" : "150", "type" : "STRING" }, { "schema_id" : 2, "name" : "4", "type" : "STRING" }, { "schema_id" : 3, "name" : "setosa", "type" : "STRING" }, { "schema_id" : 4, "name" : "versicolor", "type" : "STRING" }, { "schema_id" : 5, "name" : "virginica", "type" : "STRING" } ], "import_data" : true }
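
One possible way to turn a request body like the ones above into an HTTP request with the Python standard library is sketched here. The host, project ID, and token are placeholders. The X-Auth-Token header name is an assumption of this sketch, since this section does not document request headers. The request is only constructed, not sent.

```python
import json
import urllib.request


def make_create_dataset_request(url, token, body):
    """Wrap a request body dict in a POST request object (not yet sent)."""
    data = json.dumps(body).encode("utf-8")
    headers = {
        "Content-Type": "application/json",
        # Assumption: token-based authentication; header name not from this section.
        "X-Auth-Token": token,
    }
    return urllib.request.Request(url, data=data, headers=headers, method="POST")


# Body trimmed from the first example request above.
body = {
    "workspace_id": "0",
    "dataset_name": "dataset-457f",
    "dataset_type": 0,
    "data_sources": [
        {"data_type": 0, "data_path": "/test-obs/classify/input/cat-rabbit/"}
    ],
    "work_path": "/test-obs/classify/output/",
    "work_path_type": 0,
}
# Placeholder host, project ID, and token.
request = make_create_dataset_request(
    "https://modelarts.example.com/v2/my-project-id/datasets", "my-token", body
)
print(request.get_method(), request.full_url)
```

Passing the resulting object to `urllib.request.urlopen` would send it; that call is left out so the sketch runs without network access.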
Status code: 201
Created
{ "dataset_id" : "WxCREuCkBSAlQr9xrde" }
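
A caller typically needs the dataset_id on success and the error_code and error_msg fields otherwise. The sketch below shows one way to handle both cases. The exception class and function name are hypothetical, and treating every status other than 201 as a failure is an assumption based on the status codes listed in this section.

```python
import json


class CreateDatasetError(Exception):
    """Raised when the response does not describe a created dataset."""


def parse_create_dataset_response(status_code: int, body_text: str) -> str:
    """Return dataset_id for a 201 response; raise with error details otherwise."""
    payload = json.loads(body_text) if body_text.strip() else {}
    if status_code == 201 and "dataset_id" in payload:
        return payload["dataset_id"]
    raise CreateDatasetError(
        f"HTTP {status_code}: {payload.get('error_code')} {payload.get('error_msg')}"
    )


print(parse_create_dataset_response(201, '{ "dataset_id" : "WxCREuCkBSAlQr9xrde" }'))
```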

Status Code | Description |
---|---|
201 | Created |
401 | Unauthorized |
403 | Forbidden |
404 | Not Found |

See Error Codes.