proposalbot d0127cf2e3 Changes to ma_api-ref from docs/doc-exports#332 (Reduce length of special chars
Reviewed-by: gtema <artem.goncharov@gmail.com>
Co-authored-by: proposalbot <proposalbot@otc-service.com>
Co-committed-by: proposalbot <proposalbot@otc-service.com>
2022-10-18 07:11:40 +00:00

original_name

CreateDataset.html

Creating a Dataset

Function

This API is used to create a dataset.

URI

POST /v2/{project_id}/datasets

Table 1 Path Parameters
Parameter Mandatory Type Description
project_id Yes String Project ID. For details about how to obtain the project ID, see Obtaining a Project ID <modelarts_03_0147>.
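
As a minimal sketch, the request URL can be assembled from a service endpoint and the project ID. The endpoint host and the project ID below are placeholders (assumptions for illustration only); the path is the one given in the URI section.

```python
from urllib.parse import quote


def create_dataset_url(endpoint: str, project_id: str) -> str:
    """Return the full URL for POST /v2/{project_id}/datasets."""
    # Percent-encode the path parameter so stray characters cannot alter the path.
    return f"{endpoint.rstrip('/')}/v2/{quote(project_id, safe='')}/datasets"


# Hypothetical endpoint and project ID, not real values.
url = create_dataset_url("https://modelarts.example.com/", "0123456789abcdef")
print(url)
```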

Request Parameters

Table 2 Request body parameters
Parameter Mandatory Type Description
data_format No String

Data format. The options are as follows:

  • Default: default format
  • CarbonData: CarbonData (supported only by table datasets)
data_sources No Array of DataSource <createdataset__request_datasource> objects Input dataset path, which is used to synchronize source data (such as images, text files, and audio files) in the directory and its subdirectories to the dataset. For a table dataset, this parameter indicates the import directory. The work directory of a table dataset cannot be an OBS path in a KMS-encrypted bucket.
dataset_name Yes String Dataset name. The value contains 1 to 100 characters. Only letters, digits, underscores (_), and hyphens (-) are allowed, for example, dataset-9f3b.
dataset_type No Integer

Dataset type. The options are as follows:

  • 0: image classification
  • 1: object detection
  • 100: text classification
  • 101: named entity recognition
  • 102: text triplet
  • 200: sound classification
  • 201: speech content
  • 202: speech paragraph labeling
  • 400: table dataset
  • 600: video labeling
  • 900: custom format
description No String Dataset description. The value is empty by default. The description contains 0 to 256 characters and does not support the following special characters: ^!<>=&"'
import_annotations No Boolean

Whether to automatically import the labeling information in the input directory. This is supported for object detection, image classification, and text classification. The options are as follows:

  • true: Import labeling information in the input directory. (Default value)
  • false: Do not import labeling information in the input directory.
import_data No Boolean

Whether to import data. This parameter is used only for table datasets. The options are as follows:

  • true: Import data when creating a dataset.
  • false: Do not import data when creating a dataset. (Default value)
label_format No LabelFormat <createdataset__request_labelformat> object Label format information. This parameter is used only for text datasets.
labels No Array of Label <createdataset__request_label> objects Dataset label list.
managed No Boolean

Whether to host a dataset. The options are as follows:

  • true: Host a dataset.
  • false: Do not host a dataset. (Default value)
schema No Array of Field <createdataset__request_field> objects Schema list.
work_path Yes String

Output dataset path, which is used to store output files such as label files.

  • The format is /Bucket name/File path, for example, /obs-bucket/flower/rose/. (The directory is used as the path.)
  • A bucket cannot be directly used as a path.
  • The output dataset path cannot be the same as the input dataset path or any of its subdirectories.
  • The value contains 3 to 700 characters.
work_path_type Yes Integer

Type of the dataset output path. The options are as follows:

  • 0: OBS bucket (default value)
workforce_information No WorkforceInformation <createdataset__request_workforceinformation> object Team labeling information.
workspace_id No String Workspace ID. If no workspace is created, the default value is 0. If a workspace is created and used, use the actual value.
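
The constraints in Table 2 can be checked on the client before a request is sent. The following is a hedged sketch, not the service's own validation: the function name validate_create_request is hypothetical, "letters" is assumed to mean ASCII letters, and the path check is a plain string-prefix comparison.

```python
import re

# Assumption: "letters" means ASCII letters; the source does not say otherwise.
DATASET_NAME_RE = re.compile(r"[A-Za-z0-9_-]{1,100}")
FORBIDDEN_DESCRIPTION_CHARS = set("^!<>=&\"'")


def _as_dir(path):
    """Normalize a path so prefix comparison works on whole directory names."""
    return path.rstrip("/") + "/"


def validate_create_request(body):
    """Return a list of problems found in a request body (empty if none)."""
    errors = []

    name = body.get("dataset_name")
    if not isinstance(name, str) or not DATASET_NAME_RE.fullmatch(name):
        errors.append("dataset_name: 1 to 100 letters, digits, '_' or '-'")

    description = body.get("description") or ""
    if len(description) > 256 or FORBIDDEN_DESCRIPTION_CHARS & set(description):
        errors.append("description: at most 256 characters, none of ^!<>=&\"'")

    if "work_path_type" not in body:
        errors.append("work_path_type: mandatory")

    work_path = body.get("work_path")
    if not isinstance(work_path, str) or not 3 <= len(work_path) <= 700:
        errors.append("work_path: mandatory, 3 to 700 characters")
        return errors
    # "/bucket/dir/" has at least two non-empty segments; "/bucket/" has one.
    segments = [part for part in work_path.split("/") if part]
    if not work_path.startswith("/") or len(segments) < 2:
        errors.append("work_path: use /bucket-name/file-path, not a bare bucket")
    for source in body.get("data_sources", []):
        data_path = source.get("data_path")
        if data_path and _as_dir(work_path).startswith(_as_dir(data_path)):
            errors.append("work_path: must not be the input path or inside it")
    return errors
```

An empty list means that none of the documented constraints were violated; the service may still apply checks that are not described here.
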
Table 3 DataSource
Parameter Mandatory Type Description
data_path No String Data source path.
data_type No Integer

Data type. The options are as follows:

  • 0: OBS bucket (default value)
  • 1: GaussDB(DWS)
  • 2: DLI
  • 3: RDS
  • 4: MRS
  • 5: AI Gallery
  • 6: Inference service
schema_maps No Array of SchemaMap <createdataset__request_schemamap> objects Schema mapping information corresponding to the table data.
source_info No SourceInfo <createdataset__request_sourceinfo> object Information required for importing a table data source.
with_column_header No Boolean

Whether the first row in the file contains the column names. This field applies to table datasets. The options are as follows:

  • true: The first row in the file contains the column names.
  • false: The first row in the file does not contain the column names.
Table 4 SchemaMap
Parameter Mandatory Type Description
dest_name No String Name of the destination column.
src_name No String Name of the source column.
Table 5 SourceInfo
Parameter Mandatory Type Description
cluster_id No String ID of an MRS cluster.
cluster_mode No String

Running mode of an MRS cluster. The options are as follows:

  • 0: normal cluster
  • 1: security cluster
cluster_name No String Name of an MRS cluster.
database_name No String Name of the database to which the table dataset is imported.
input No String HDFS path of a table dataset.
ip No String IP address of your GaussDB(DWS) cluster.
port No String Port number of your GaussDB(DWS) cluster.
queue_name No String DLI queue name of a table dataset.
subnet_id No String Subnet ID of an MRS cluster.
table_name No String Name of the table to which a table dataset is imported.
user_name No String Username, which is mandatory for GaussDB(DWS) data.
user_password No String User password, which is mandatory for GaussDB(DWS) data.
vpc_id No String ID of the VPC where an MRS cluster resides.
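
As an illustration of how Table 3 and Table 5 fit together, the sketch below builds a DataSource entry for GaussDB(DWS). The helper name dws_data_source is hypothetical, and the choice of which SourceInfo fields to send is an assumption; the source only states that user_name and user_password are mandatory for GaussDB(DWS) data.

```python
def dws_data_source(ip, port, database_name, table_name, user_name, user_password):
    """Build a DataSource dict for a GaussDB(DWS) table (data_type 1)."""
    if not user_name or not user_password:
        raise ValueError("user_name and user_password are mandatory for GaussDB(DWS) data")
    return {
        "data_type": 1,  # 1: GaussDB(DWS), per Table 3
        "source_info": {
            "ip": ip,
            "port": str(port),  # Table 5 types port as String
            "database_name": database_name,
            "table_name": table_name,
            "user_name": user_name,
            "user_password": user_password,
        },
    }


# Placeholder connection details, for illustration only.
source = dws_data_source("192.0.2.10", 8000, "sales", "orders", "dbadmin", "example-password")
```
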
Table 6 LabelFormat
Parameter Mandatory Type Description
label_type No String

Label type of text classification. The options are as follows:

  • 0: The label is separated from the text, and they are distinguished by the fixed suffix _result. For example, the text file is abc.txt, and the label file is abc_result.txt.
  • 1: Default value. Labels and texts are stored in the same file and separated by separators. You can use text_sample_separator to specify the separator between the text and label and text_label_separator to specify the separator between labels.
text_label_separator No String Separator between labels. By default, a comma (,) is used as the separator. The separator needs to be escaped. The separator can contain only one character, such as a letter, a digit, or any of the following special characters: !@#$%^&*_=|?/':.;,
text_sample_separator No String Separator between the text and label. By default, a tab character is used as the separator. The separator needs to be escaped. The separator can contain only one character, such as a letter, a digit, or any of the following special characters: !@#$%^&*_=|?/':.;,
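
The two label layouts in Table 6 can be sketched as follows. Both helper names are hypothetical. For label_type 1, the sketch assumes that the text comes first on each line and the labels follow the first text_sample_separator; the source does not state the order.

```python
from pathlib import PurePosixPath


def label_file_for(text_file):
    """label_type 0: derive the label file name using the fixed _result suffix."""
    path = PurePosixPath(text_file)
    return str(path.with_name(path.stem + "_result" + path.suffix))


def split_sample(line, text_sample_separator="\t", text_label_separator=","):
    """label_type 1: split one line into (text, labels) using the separators."""
    # Assumption: the text precedes the first sample separator.
    text, _, label_part = line.rstrip("\n").partition(text_sample_separator)
    labels = [label for label in label_part.split(text_label_separator) if label]
    return text, labels


print(label_file_for("abc.txt"))
print(split_sample("great movie\tpositive,recommend"))
```
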
Table 7 Label
Parameter Mandatory Type Description
attributes No Array of LabelAttribute <createdataset__request_labelattribute> objects Multi-dimensional attribute of a label. For example, if the label is music, attributes such as style and artist may be included.
name No String Label name.
property No LabelProperty <createdataset__request_labelproperty> object Basic attribute key-value pair of a label, such as color and shortcut keys.
type No Integer

Label type. The options are as follows:

  • 0: image classification
  • 1: object detection
  • 100: text classification
  • 101: named entity recognition
  • 102: text triplet relationship
  • 103: text triplet entity
  • 200: speech classification
  • 201: speech content
  • 202: speech paragraph labeling
  • 600: video classification
Table 8 LabelAttribute
Parameter Mandatory Type Description
default_value No String Default value of a label attribute.
id No String Label attribute ID.
name No String Label attribute name.
type No String

Label attribute type. The options are as follows:

  • text: text
  • select: single-choice drop-down list
values No Array of LabelAttributeValue <createdataset__request_labelattributevalue> objects List of label attribute values.
Table 9 LabelAttributeValue
Parameter Mandatory Type Description
id No String Label attribute value ID.
value No String Label attribute value.
Table 10 LabelProperty
Parameter Mandatory Type Description
@modelarts:color No String Default attribute: Label color, which is a hexadecimal code of the color. By default, this parameter is left blank. Example: #FFFFF0.
@modelarts:default_shape No String

Default attribute: Default shape of an object detection label (dedicated attribute). By default, this parameter is left blank. The options are as follows:

  • bndbox: rectangle
  • polygon: polygon
  • circle: circle
  • line: straight line
  • dashed: dashed line
  • point: point
  • polyline: polyline
@modelarts:from_type No String Default attribute: Type of the head entity in the triplet relationship label. This attribute must be specified when a relationship label is created. This parameter is used only for the text triplet dataset.
@modelarts:rename_to No String Default attribute: The new name of the label.
@modelarts:shortcut No String Default attribute: Label shortcut key. By default, this parameter is left blank. For example: D.
@modelarts:to_type No String Default attribute: Type of the tail entity in the triplet relationship label. This attribute must be specified when a relationship label is created. This parameter is used only for the text triplet dataset.
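
For a text triplet dataset, Table 7 and Table 10 suggest labels shaped like the ones below. This is a sketch under assumptions: the helper names are hypothetical, the color is assumed to be a six-digit hexadecimal code as in the #FFFFF0 example, and the from_type and to_type values are assumed to be entity label names, which the source does not confirm.

```python
import re

# Assumption: six hexadecimal digits after "#", matching the example #FFFFF0.
HEX_COLOR_RE = re.compile(r"#[0-9A-Fa-f]{6}")


def entity_label(name, color=None):
    """Build a text triplet entity label (type 103)."""
    label = {"name": name, "type": 103}
    if color is not None:
        if not HEX_COLOR_RE.fullmatch(color):
            raise ValueError("color must be a hexadecimal code such as #FFFFF0")
        label["property"] = {"@modelarts:color": color}
    return label


def relationship_label(name, from_type, to_type):
    """Build a text triplet relationship label (type 102)."""
    # Both entity types must be specified when a relationship label is created.
    if not from_type or not to_type:
        raise ValueError("from_type and to_type are required for relationship labels")
    return {
        "name": name,
        "type": 102,
        "property": {
            "@modelarts:from_type": from_type,
            "@modelarts:to_type": to_type,
        },
    }


labels = [
    entity_label("Person", "#3399ff"),
    entity_label("Company"),
    relationship_label("works_for", "Person", "Company"),
]
```
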
Table 11 Field
Parameter Mandatory Type Description
description No String Schema description.
name No String Schema name.
schema_id No Integer Schema ID.
type No String Schema value type.
Table 12 WorkforceInformation
Parameter Mandatory Type Description
data_sync_type No Integer

Synchronization type. The options are as follows:

  • 0: no synchronization
  • 1: synchronize data
  • 2: synchronize labels
  • 3: synchronize data and labels
repetition No Integer Number of persons who label each sample. The minimum value is 1.
synchronize_auto_labeling_data No Boolean

Whether to synchronously update auto labeling data. The options are as follows:

  • true: Update auto labeling data synchronously.
  • false: Do not update auto labeling data synchronously.
synchronize_data No Boolean

Whether to synchronize updated data to team members. Updates include uploading files, synchronizing data sources, and assigning imported unlabeled files. The options are as follows:

  • true: Synchronize updated data to team members.
  • false: Do not synchronize updated data to team members.
task_id No String ID of a team labeling task.
task_name Yes String Name of a team labeling task. The value contains 1 to 64 characters, including only letters, digits, underscores (_), and hyphens (-).
workforces_config No WorkforcesConfig <createdataset__request_workforcesconfig> object Manpower assignment of a team labeling task. You can delegate the assignment to the team administrator or assign team members yourself.
Table 13 WorkforcesConfig
Parameter Mandatory Type Description
agency No String Team administrator.
workforces No Array of WorkforceConfig <createdataset__request_workforceconfig> objects List of teams that execute labeling tasks.
Table 14 WorkforceConfig
Parameter Mandatory Type Description
workers No Array of Worker <createdataset__request_worker> objects List of labeling team members.
workforce_id No String ID of a labeling team.
workforce_name No String Name of a labeling team. The value contains 0 to 1024 characters and does not support the following special characters: !<>=&"'
Table 15 Worker
Parameter Mandatory Type Description
create_time No Long Creation time.
description No String Labeling team member description. The value contains 0 to 256 characters and does not support the following special characters: ^!<>=&"'
email No String Email address of a labeling team member.
role No Integer

Role. The options are as follows:

  • 0: labeling personnel
  • 1: reviewer
  • 2: team administrator
  • 3: dataset owner
status No Integer

Current login status of a labeling team member. The options are as follows:

  • 0: The invitation email has not been sent.
  • 1: The invitation email has been sent but the user has not logged in.
  • 2: The user has logged in.
  • 3: The labeling team member has been deleted.
update_time No Long Update time.
worker_id No String ID of a labeling team member.
workforce_id No String ID of a labeling team.
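
The nesting of Tables 12 to 15 can be illustrated with a small builder for workforce_information. This is a sketch only: the function name and the set of fields it emits are assumptions, "letters" is assumed to mean ASCII letters, and the IDs and email address are placeholders.

```python
import re

# Assumption: "letters" means ASCII letters.
TASK_NAME_RE = re.compile(r"[A-Za-z0-9_-]{1,64}")


def workforce_information(task_name, workforce_id, worker_emails, repetition=1):
    """Build a WorkforceInformation dict with one team and its workers."""
    if not TASK_NAME_RE.fullmatch(task_name):
        raise ValueError("task_name: 1 to 64 letters, digits, '_' or '-'")
    if repetition < 1:
        raise ValueError("repetition: the minimum value is 1")
    # Role 0 is "labeling personnel" in Table 15.
    workers = [{"email": email, "role": 0} for email in worker_emails]
    return {
        "task_name": task_name,
        "repetition": repetition,
        "workforces_config": {
            "workforces": [{"workforce_id": workforce_id, "workers": workers}]
        },
    }


# Placeholder team ID and email address.
info = workforce_information("task-01", "team-id-placeholder", ["labeler@example.com"])
```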

Response Parameters

Status code: 201

Table 16 Response body parameters
Parameter Type Description
dataset_id String Dataset ID.
error_code String Error code.
error_msg String Error message.
import_task_id String ID of an import task.
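
A client might read the response body as sketched below. The function name is hypothetical, and treating any response other than 201 with a dataset_id as a failure is an assumption. The error code in the usage comment is a made-up placeholder, not a real service code.

```python
import json


def parse_create_response(status_code, body_text):
    """Return (dataset_id, import_task_id) on success, else raise RuntimeError."""
    payload = json.loads(body_text) if body_text else {}
    if status_code == 201 and "dataset_id" in payload:
        # import_task_id may be absent, for example when no import was started.
        return payload["dataset_id"], payload.get("import_task_id")
    raise RuntimeError(
        f"create dataset failed ({status_code}): "
        f"{payload.get('error_code')}: {payload.get('error_msg')}"
    )


# Success body taken from the example response in this section.
print(parse_create_response(201, '{ "dataset_id" : "WxCREuCkBSAlQr9xrde" }'))
# A failure such as (403, '{"error_code": "PLACEHOLDER", "error_msg": "denied"}')
# raises RuntimeError.
```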

Example Requests

  • Creating an Image Classification Dataset

    {
      "workspace_id" : "0",
      "dataset_name" : "dataset-457f",
      "dataset_type" : 0,
      "data_sources" : [ {
        "data_type" : 0,
        "data_path" : "/test-obs/classify/input/cat-dog/"
      } ],
      "description" : "",
      "work_path" : "/test-obs/classify/output/",
      "work_path_type" : 0,
      "labels" : [ {
        "name" : "Cat",
        "type" : 0,
        "property" : {
          "@modelarts:color" : "#3399ff"
        }
      }, {
        "name" : "Dog",
        "type" : 0,
        "property" : {
          "@modelarts:color" : "#3399ff"
        }
      } ]
    }
  • Creating an Object Detection Dataset

    {
      "workspace_id" : "0",
      "dataset_name" : "dataset-95a6",
      "dataset_type" : 1,
      "data_sources" : [ {
        "data_type" : 0,
        "data_path" : "/test-obs/detect/input/cat-dog/"
      } ],
      "description" : "",
      "work_path" : "/test-obs/detect/output/",
      "work_path_type" : 0,
      "labels" : [ {
        "name" : "Cat",
        "type" : 1,
        "property" : {
          "@modelarts:color" : "#3399ff"
        }
      }, {
        "name" : "Dog",
        "type" : 1,
        "property" : {
          "@modelarts:color" : "#3399ff"
        }
      } ]
    }
  • Creating a Table Dataset

    {
      "workspace_id" : "0",
      "dataset_name" : "dataset-de83",
      "dataset_type" : 400,
      "data_sources" : [ {
        "data_type" : 0,
        "data_path" : "/test-obs/table/input/",
        "with_column_header" : true
      } ],
      "description" : "",
      "work_path" : "/test-obs/table/output/",
      "work_path_type" : 0,
      "schema" : [ {
        "schema_id" : 1,
        "name" : "150",
        "type" : "STRING"
      }, {
        "schema_id" : 2,
        "name" : "4",
        "type" : "STRING"
      }, {
        "schema_id" : 3,
        "name" : "setosa",
        "type" : "STRING"
      }, {
        "schema_id" : 4,
        "name" : "versicolor",
        "type" : "STRING"
      }, {
        "schema_id" : 5,
        "name" : "virginica",
        "type" : "STRING"
      } ],
      "import_data" : true
    }
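
Any of the example bodies above could be sent with the Python standard library, as in the sketch below. The URL, the token value, and the X-Auth-Token header name are assumptions; this section does not describe authentication. The request is only built here, and the sending step is left as a comment because it needs a reachable endpoint and valid credentials.

```python
import json
import urllib.request


def build_create_dataset_request(url, token, body):
    """Build (but do not send) the POST request for creating a dataset."""
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "X-Auth-Token": token,  # assumed authentication header
        },
        method="POST",
    )


# Body based on the image classification example, with one label.
body = {
    "workspace_id": "0",
    "dataset_name": "dataset-457f",
    "dataset_type": 0,
    "data_sources": [{"data_type": 0, "data_path": "/test-obs/classify/input/cat-dog/"}],
    "work_path": "/test-obs/classify/output/",
    "work_path_type": 0,
    "labels": [{"name": "Cat", "type": 0, "property": {"@modelarts:color": "#3399ff"}}],
}

# Hypothetical URL and token.
request = build_create_dataset_request(
    "https://modelarts.example.com/v2/my-project-id/datasets", "my-token", body
)

# To send it:
# with urllib.request.urlopen(request) as response:
#     print(response.status, response.read().decode("utf-8"))
```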

Example Responses

Status code: 201

Created

{
  "dataset_id" : "WxCREuCkBSAlQr9xrde"
}

Status Codes

Status Code Description
201 Created
401 Unauthorized
403 Forbidden
404 Not Found

Error Codes

See Error Codes <modelarts_03_0095>.