Creating a Dataset
Function
This API is used to create a dataset.
URI
POST /v2/{project_id}/datasets
Parameter | Mandatory | Type | Description |
---|---|---|---|
project_id | Yes | String | Project ID. For details about how to obtain the project ID, see Obtaining a Project ID <modelarts_03_0147> . |
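As a minimal sketch of how the URI above might be assembled on the client side, the following helper fills the `{project_id}` path parameter. The function name, the endpoint host, and the sample project ID are hypothetical and serve only as illustration; the real endpoint depends on your region and is not given in this section.

```python
from urllib.parse import quote


def create_dataset_url(endpoint: str, project_id: str) -> str:
    """Return the full URL for POST /v2/{project_id}/datasets."""
    if not project_id:
        raise ValueError("project_id is mandatory")
    # Percent-encode the path parameter so unexpected characters
    # cannot change the structure of the path.
    return f"{endpoint.rstrip('/')}/v2/{quote(project_id, safe='')}/datasets"


# Hypothetical endpoint and project ID, for illustration only.
url = create_dataset_url("https://modelarts.example.com/", "0123456789abcdef")
print(url)
```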
Request Parameters
Parameter | Mandatory | Type | Description |
---|---|---|---|
data_format | No | String | Data format. The options are as follows:
|
data_sources | No | Array of DataSource <createdataset__request_datasource> objects | Input dataset path, which is used to synchronize source data (such as images, text files, and audio files) in the directory and its subdirectories to the dataset. For a table dataset, this parameter indicates the import directory. The work directory of a table dataset cannot be an OBS path in a KMS-encrypted bucket. |
dataset_name | Yes | String | Dataset name. The value contains 1 to 100 characters. Only letters, digits, underscores (_), and hyphens (-) are allowed, for example, dataset-9f3b. |
dataset_type | No | Integer | Dataset type. The options are as follows:
|
description | No | String | Dataset description. The value is empty by default. The description contains 0 to 256 characters and does not support the following special characters: ^!<>=&"' |
import_annotations | No | Boolean | Whether to automatically import the labeling information in the input directory. Importing is supported for object detection, image classification, and text classification. The options are as follows:
|
import_data | No | Boolean | Whether to import data. This parameter is used only for table datasets. The options are as follows:
|
label_format | No | LabelFormat <createdataset__request_labelformat> object | Label format information. This parameter is used only for text datasets. |
labels | No | Array of Label <createdataset__request_label> objects | Dataset label list. |
managed | No | Boolean | Whether to host a dataset. The options are as follows:
|
schema | No | Array of Field <createdataset__request_field> objects | Schema list. |
work_path | Yes | String | Output dataset path, which is used to store output files such as label files. |
work_path_type | Yes | Integer | Type of the dataset output path. The options are as follows:
|
workforce_information | No | WorkforceInformation <createdataset__request_workforceinformation> object | Team labeling information. |
workspace_id | No | String | Workspace ID. If no workspace is created, the default value is 0. If a workspace is created and used, use the actual value. |
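The length and character constraints stated for `dataset_name` and `description` can be checked on the client before sending the request. The sketch below is one way to do that; the helper names are hypothetical, and it assumes that "letters" means ASCII letters, which the table does not state explicitly.

```python
import re

# dataset_name: 1 to 100 characters; letters, digits, underscores, hyphens.
# Assumption: "letters" is read as ASCII letters only.
_DATASET_NAME_RE = re.compile(r"[A-Za-z0-9_-]{1,100}")

# description: 0 to 256 characters, none of ^ ! < > = & " '
_FORBIDDEN_DESCRIPTION_CHARS = set("^!<>=&\"'")


def is_valid_dataset_name(name: str) -> bool:
    """Check a candidate dataset_name against the documented constraints."""
    return _DATASET_NAME_RE.fullmatch(name) is not None


def is_valid_description(description: str) -> bool:
    """Check a candidate description against the documented constraints."""
    if len(description) > 256:
        return False
    return not (_FORBIDDEN_DESCRIPTION_CHARS & set(description))


print(is_valid_dataset_name("dataset-9f3b"))
print(is_valid_description("cats & dogs"))
```

The server remains the authority on what is accepted; a local check like this only catches obvious mistakes early.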
Parameter | Mandatory | Type | Description |
---|---|---|---|
data_path | No | String | Data source path. |
data_type | No | Integer | Data type. The options are as follows:
|
schema_maps | No | Array of SchemaMap <createdataset__request_schemamap> objects | Schema mapping information corresponding to the table data. |
source_info | No | SourceInfo <createdataset__request_sourceinfo> object | Information required for importing a table data source. |
with_column_header | No | Boolean | Whether the first row in the file is a column name. This field is valid for the table dataset. The options are as follows:
|
Parameter | Mandatory | Type | Description |
---|---|---|---|
dest_name | No | String | Name of the destination column. |
src_name | No | String | Name of the source column. |
Parameter | Mandatory | Type | Description |
---|---|---|---|
cluster_id | No | String | ID of an MRS cluster. |
cluster_mode | No | String | Running mode of an MRS cluster. The options are as follows:
|
cluster_name | No | String | Name of an MRS cluster. |
database_name | No | String | Name of the database to which the table dataset is imported. |
input | No | String | HDFS path of a table dataset. |
ip | No | String | IP address of your GaussDB(DWS) cluster. |
port | No | String | Port number of your GaussDB(DWS) cluster. |
queue_name | No | String | DLI queue name of a table dataset. |
subnet_id | No | String | Subnet ID of an MRS cluster. |
table_name | No | String | Name of the table to which a table dataset is imported. |
user_name | No | String | Username, which is mandatory for GaussDB(DWS) data. |
user_password | No | String | User password, which is mandatory for GaussDB(DWS) data. |
vpc_id | No | String | ID of the VPC where an MRS cluster resides. |
Parameter | Mandatory | Type | Description |
---|---|---|---|
label_type | No | String | Label type of text classification. The options are as follows:
|
text_label_separator | No | String | Separator between labels. By default, a comma (,) is used as the separator. The separator needs to be escaped. The separator can contain only one character, such as a letter, a digit, or any of the following special characters: !@#$%^&*_=|?/':.;, |
text_sample_separator | No | String | Separator between the text and label. By default, a tab character is used as the separator. The separator needs to be escaped. The separator can contain only one character, such as a letter, a digit, or any of the following special characters: !@#$%^&*_=|?/':.;, |
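To illustrate how the two separators relate, the sketch below splits one labeled text line into its text and its labels using the documented defaults (a tab between text and labels, a comma between labels). The function name is hypothetical, and the line layout (text first, then labels, one sample per line, split at the first separator) is an assumption that this section does not spell out.

```python
def split_text_sample(line, text_sample_separator="\t", text_label_separator=","):
    """Split one line into (text, labels) under the assumed layout."""
    # Assumption: the text comes first and is split off at the first
    # occurrence of the sample separator.
    text, found, label_part = line.rstrip("\n").partition(text_sample_separator)
    if not found or not label_part:
        # No separator or nothing after it: treat the sample as unlabeled.
        return text, []
    return text, label_part.split(text_label_separator)


text, labels = split_text_sample("great movie\tpositive,recommended")
print(text, labels)
```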
Parameter | Mandatory | Type | Description |
---|---|---|---|
attributes | No | Array of LabelAttribute <createdataset__request_labelattribute> objects | Multi-dimensional attribute of a label. For example, if the label is music, attributes such as style and artist may be included. |
name | No | String | Label name. |
property | No | LabelProperty <createdataset__request_labelproperty> object | Basic attribute key-value pair of a label, such as color and shortcut keys. |
type | No | Integer | Label type. The options are as follows:
|
Parameter | Mandatory | Type | Description |
---|---|---|---|
default_value | No | String | Default value of a label attribute. |
id | No | String | Label attribute ID. |
name | No | String | Label attribute name. |
type | No | String | Label attribute type. The options are as follows:
|
values | No | Array of LabelAttributeValue <createdataset__request_labelattributevalue> objects | List of label attribute values. |
Parameter | Mandatory | Type | Description |
---|---|---|---|
id | No | String | Label attribute value ID. |
value | No | String | Label attribute value. |
Parameter | Mandatory | Type | Description |
---|---|---|---|
@modelarts:color | No | String | Default attribute: Label color, which is a hexadecimal code of the color. By default, this parameter is left blank. Example: #FFFFF0. |
@modelarts:default_shape | No | String | Default attribute: Default shape of an object detection label (dedicated attribute). By default, this parameter is left blank. The options are as follows:
|
@modelarts:from_type | No | String | Default attribute: Type of the head entity in the triplet relationship label. This attribute must be specified when a relationship label is created. This parameter is used only for the text triplet dataset. |
@modelarts:rename_to | No | String | Default attribute: The new name of the label. |
@modelarts:shortcut | No | String | Default attribute: Label shortcut key. By default, this parameter is left blank. For example: D. |
@modelarts:to_type | No | String | Default attribute: Type of the tail entity in the triplet relationship label. This attribute must be specified when a relationship label is created. This parameter is used only for the text triplet dataset. |
Parameter | Mandatory | Type | Description |
---|---|---|---|
description | No | String | Schema description. |
name | No | String | Schema name. |
schema_id | No | Integer | Schema ID. |
type | No | String | Schema value type. |
Parameter | Mandatory | Type | Description |
---|---|---|---|
data_sync_type | No | Integer | Synchronization type. The options are as follows:
|
repetition | No | Integer | Number of persons who label each sample. The minimum value is 1. |
synchronize_auto_labeling_data | No | Boolean | Whether to synchronously update auto labeling data. The options are as follows:
|
synchronize_data | No | Boolean | Whether to synchronize updated data, such as uploading files, synchronizing data sources, and assigning imported unlabeled files to team members. The options are as follows:
|
task_id | No | String | ID of a team labeling task. |
task_name | Yes | String | Name of a team labeling task. The value contains 1 to 64 characters, including only letters, digits, underscores (_), and hyphens (-). |
workforces_config | No | WorkforcesConfig <createdataset__request_workforcesconfig> object | Manpower assignment of a team labeling task. You can delegate the team administrator to assign the manpower or do it by yourself. |
Parameter | Mandatory | Type | Description |
---|---|---|---|
agency | No | String | Team administrator. |
workforces | No | Array of WorkforceConfig <createdataset__request_workforceconfig> objects | List of teams that execute labeling tasks. |
Parameter | Mandatory | Type | Description |
---|---|---|---|
workers | No | Array of Worker <createdataset__request_worker> objects | List of labeling team members. |
workforce_id | No | String | ID of a labeling team. |
workforce_name | No | String | Name of a labeling team. The value contains 0 to 1024 characters and does not support the following special characters: !<>=&"' |
Parameter | Mandatory | Type | Description |
---|---|---|---|
create_time | No | Long | Creation time. |
description | No | String | Labeling team member description. The value contains 0 to 256 characters and does not support the following special characters: ^!<>=&"' |
email | No | String | Email address of a labeling team member. |
role | No | Integer | Role. The options are as follows:
|
status | No | Integer | Current login status of a labeling team member. The options are as follows:
|
update_time | No | Long | Update time. |
worker_id | No | String | ID of a labeling team member. |
workforce_id | No | String | ID of a labeling team. |
Response Parameters
Status code: 201
Parameter | Type | Description |
---|---|---|
dataset_id | String | Dataset ID. |
error_code | String | Error code. |
error_msg | String | Error message. |
import_task_id | String | ID of an import task. |
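A caller typically needs either the new `dataset_id` or the error details from the response body. The following sketch shows one way to separate the two cases using only the fields listed in the table above; the function name and the choice to raise `RuntimeError` are illustrative assumptions, not part of the API.

```python
import json


def parse_create_dataset_response(status_code: int, body: str) -> str:
    """Return dataset_id on success, or raise with the reported error."""
    payload = json.loads(body) if body else {}
    # 201 is the success status documented for this API.
    if status_code == 201 and "dataset_id" in payload:
        return payload["dataset_id"]
    raise RuntimeError(
        f"Create dataset failed (HTTP {status_code}): "
        f"{payload.get('error_code')} {payload.get('error_msg')}"
    )


dataset_id = parse_create_dataset_response(
    201, '{ "dataset_id" : "WxCREuCkBSAlQr9xrde" }'
)
print(dataset_id)
```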
Example Requests
Creating an Image Classification Dataset
{ "workspace_id" : "0", "dataset_name" : "dataset-457f", "dataset_type" : 0, "data_sources" : [ { "data_type" : 0, "data_path" : "/test-obs/classify/input/cat-dog/" } ], "description" : "", "work_path" : "/test-obs/classify/output/", "work_path_type" : 0, "labels" : [ { "name" : "Cat", "type" : 0, "property" : { "@modelarts:color" : "#3399ff" } }, { "name" : "Dog", "type" : 0, "property" : { "@modelarts:color" : "#3399ff" } } ] }
Creating an Object Detection Dataset
{ "workspace_id" : "0", "dataset_name" : "dataset-95a6", "dataset_type" : 1, "data_sources" : [ { "data_type" : 0, "data_path" : "/test-obs/detect/input/cat-dog/" } ], "description" : "", "work_path" : "/test-obs/detect/output/", "work_path_type" : 0, "labels" : [ { "name" : "Cat", "type" : 1, "property" : { "@modelarts:color" : "#3399ff" } }, { "name" : "Dog", "type" : 1, "property" : { "@modelarts:color" : "#3399ff" } } ] }
Creating a Table Dataset
{ "workspace_id" : "0", "dataset_name" : "dataset-de83", "dataset_type" : 400, "data_sources" : [ { "data_type" : 0, "data_path" : "/test-obs/table/input/", "with_column_header" : true } ], "description" : "", "work_path" : "/test-obs/table/output/", "work_path_type" : 0, "schema" : [ { "schema_id" : 1, "name" : "150", "type" : "STRING" }, { "schema_id" : 2, "name" : "4", "type" : "STRING" }, { "schema_id" : 3, "name" : "setosa", "type" : "STRING" }, { "schema_id" : 4, "name" : "versicolor", "type" : "STRING" }, { "schema_id" : 5, "name" : "virginica", "type" : "STRING" } ], "import_data" : true }
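The image classification request above can also be built programmatically rather than written by hand. The sketch below reproduces that request body from a list of label names; the function name and its parameters are hypothetical, while the field names and values are taken from the example.

```python
import json


def image_classification_request(dataset_name, data_path, work_path,
                                 label_names, color="#3399ff"):
    """Build a request body shaped like the image classification example."""
    return {
        "workspace_id": "0",
        "dataset_name": dataset_name,
        "dataset_type": 0,  # value used in the image classification example
        "data_sources": [{"data_type": 0, "data_path": data_path}],
        "description": "",
        "work_path": work_path,
        "work_path_type": 0,
        "labels": [
            {"name": name, "type": 0, "property": {"@modelarts:color": color}}
            for name in label_names
        ],
    }


body = image_classification_request(
    "dataset-457f",
    "/test-obs/classify/input/cat-dog/",
    "/test-obs/classify/output/",
    ["Cat", "Dog"],
)
print(json.dumps(body, indent=2))
```

For an object detection dataset, the example above uses `dataset_type` 1 and label `type` 1 instead; the rest of the structure is the same.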
Example Responses
Status code: 201
Created
{
"dataset_id" : "WxCREuCkBSAlQr9xrde"
}
Status Codes
Status Code | Description |
---|---|
201 | Created |
401 | Unauthorized |
403 | Forbidden |
404 | Not Found |
Error Codes
See Error Codes <modelarts_03_0095>.