Response body for querying the dataset list.
Function
This API is used to query, by page, the created datasets that meet the search criteria.
URI
GET /v2/{project_id}/datasets
Parameter |
Mandatory |
Type |
Description |
---|---|---|---|
project_id |
Yes |
String |
Project ID. For details about how to obtain the project ID, see Obtaining a Project ID. |
Parameter |
Mandatory |
Type |
Description |
---|---|---|---|
check_running_task |
No |
Boolean |
Whether to detect tasks (including initialization tasks) that are running in a dataset. The options are as follows:
|
contain_versions |
No |
Boolean |
Whether the dataset contains a version. |
dataset_type |
No |
Integer |
Dataset type. The options are as follows:
|
file_preview |
No |
Boolean |
Whether a dataset supports preview when it is queried. The options are as follows:
|
limit |
No |
Integer |
Maximum number of records returned on each page. The value ranges from 1 to 100. The default value is 10. |
offset |
No |
Integer |
Start page of the paging list. The default value is 0. |
order |
No |
String |
Sorting sequence of the query. The options are as follows:
|
running_task_type |
No |
Integer |
Type of the running tasks (including initialization tasks) to be detected. The options are as follows:
|
search_content |
No |
String |
Fuzzy search keyword. By default, this parameter is left blank. |
sort_by |
No |
String |
Sorting mode of the query. The options are as follows:
|
support_export |
No |
Boolean |
Whether to return only datasets that can be exported (including image classification, object detection, and custom format datasets). If this parameter is left blank or set to false, no filtering is performed. The options are as follows:
|
train_evaluate_ratio |
No |
String |
Version split ratio range used to filter datasets. The value consists of two numbers separated by a comma, which indicate the minimum and maximum split ratios, for example, 0.0,1.0. Only datasets with versions whose split ratios fall within the range are returned. Note: If this parameter is left blank or omitted, the system does not filter datasets based on the version split ratio. |
version_format |
No |
Integer |
Dataset version format used to filter datasets. The options are as follows:
|
with_labels |
No |
Boolean |
Whether to return dataset labels. The options are as follows:
|
workspace_id |
No |
String |
Workspace ID. If no workspace is created, the default value is 0. If a workspace is created and used, use the actual value. |
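The query parameters above can be assembled into a request URL as in the following sketch. It is illustrative only: the helper name build_dataset_list_url and the endpoint used in the usage note are assumptions, not part of the API. Only the parameter names, the 1 to 100 range for limit, and the default values come from the table.

```python
from urllib.parse import urlencode


def build_dataset_list_url(endpoint, project_id, limit=10, offset=0, **filters):
    """Build the GET URL for listing datasets (hypothetical helper)."""
    # The table above states that limit must be between 1 and 100.
    if not 1 <= limit <= 100:
        raise ValueError("limit must be between 1 and 100")
    params = {"limit": limit, "offset": offset}
    for name, value in filters.items():
        if value is None:
            continue  # leave optional parameters out entirely
        # Booleans are written as lowercase true/false, as in the example request.
        if isinstance(value, bool):
            value = "true" if value else "false"
        params[name] = value
    return f"https://{endpoint}/v2/{project_id}/datasets?{urlencode(params)}"
```

For example, build_dataset_list_url("modelarts.example.com", "my-project", sort_by="create_time", order="desc", file_preview=True) yields a URL whose query string is limit=10&offset=0&sort_by=create_time&order=desc&file_preview=true.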
Request Parameters
None
Response Parameters
Status code: 200
Parameter |
Type |
Description |
---|---|---|
datasets |
Array of DatasetAndFilePreview objects |
Dataset list queried by page. |
total_number |
Integer |
Total number of datasets. |
workspaceId |
String |
Workspace ID. If no workspace is created, the default value is 0. If a workspace is created and used, use the actual value. |
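Because the response reports total_number alongside one page of datasets, a client can keep requesting pages until it has seen every dataset. The sketch below shows one way to do that. The fetch_page callable is hypothetical and supplied by the caller, and the code assumes that offset counts pages (per the "Start page" description of the offset parameter) rather than records.

```python
def iter_datasets(fetch_page, limit=10):
    """Yield every dataset by requesting pages until total_number is reached.

    fetch_page is a caller-supplied function (hypothetical) that accepts
    offset and limit and returns the decoded JSON response body as a dict.
    """
    offset = 0
    seen = 0
    while True:
        body = fetch_page(offset=offset, limit=limit)
        datasets = body.get("datasets", [])
        for dataset in datasets:
            yield dataset
        seen += len(datasets)
        # Stop when everything has been returned or a page comes back empty.
        if not datasets or seen >= body.get("total_number", 0):
            break
        offset += 1  # assumption: offset is a page index, not a record index
```

The empty-page check guards against looping forever if total_number changes while the pages are being read.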
Parameter |
Type |
Description |
---|---|---|
annotated_sample_count |
Integer |
Number of labeled samples in a dataset. |
annotated_sub_sample_count |
Integer |
Number of labeled subsamples. |
content_labeling |
Boolean |
Whether to enable content labeling for the speech paragraph labeling dataset. This function is enabled by default. |
create_time |
Long |
Time when a dataset is created. |
current_version_id |
String |
Current version ID of a dataset. |
current_version_name |
String |
Current version name of a dataset. |
data_format |
String |
Data format. |
data_sources |
Array of DataSource objects |
Data source list. |
data_statistics |
Map<String,Object> |
Sample statistics on a dataset, including the statistics on sample metadata in JSON format. |
data_update_time |
Long |
Time when a sample and a label are updated. |
data_url |
String |
Data path for training. |
dataset_format |
Integer |
Dataset format. The options are as follows:
|
dataset_id |
String |
Dataset ID. |
dataset_name |
String |
Dataset name. |
dataset_tags |
Array of strings |
Key identifier list of a dataset, for example, ["Image","Object detection"]. |
dataset_type |
Integer |
Dataset type. The options are as follows:
|
dataset_version_count |
Integer |
Number of versions of a dataset. |
deleted_sample_count |
Integer |
Number of deleted samples. |
deletion_stats |
Map<String,Integer> |
Deletion reason statistics. |
description |
String |
Dataset description. |
enterprise_project_id |
String |
Enterprise project ID. |
exist_running_task |
Boolean |
Whether the dataset contains running (including initialization) tasks. The options are as follows:
|
exist_workforce_task |
Boolean |
Whether the dataset contains team labeling tasks. The options are as follows:
|
feature_supports |
Array of strings |
List of features supported by the dataset. Currently, only the value 0 is supported, indicating that the OBS file size is limited. |
import_data |
Boolean |
Whether to import data. The options are as follows:
|
import_task_id |
String |
ID of an import task. |
inner_annotation_path |
String |
Path for storing the labeling result of a dataset. |
inner_data_path |
String |
Path for storing the internal data of a dataset. |
inner_log_path |
String |
Path for storing internal logs of a dataset. |
inner_task_path |
String |
Path for storing internal tasks of a dataset. |
inner_temp_path |
String |
Path for storing internal temporary files of a dataset. |
inner_work_path |
String |
Output directory of a dataset. |
label_task_count |
Integer |
Number of labeling tasks. |
labels |
Array of Label objects |
Dataset label list. |
loading_sample_count |
Integer |
Number of loading samples. |
managed |
Boolean |
Whether a dataset is hosted. The options are as follows:
|
next_version_num |
Integer |
Number of the next version of a dataset. |
running_tasks_id |
Array of strings |
ID list of running (including initialization) tasks. |
samples |
Array of AnnotationFile objects |
Sample list. |
schema |
Array of Field objects |
Schema list. |
status |
Integer |
Dataset status. The options are as follows:
|
third_path |
String |
Third-party path. |
total_sample_count |
Integer |
Total number of dataset samples. |
total_sub_sample_count |
Integer |
Total number of subsamples generated from the parent samples. For example, the key frame images extracted from a video labeling dataset are subsamples, and their total is the number of subsamples. |
unconfirmed_sample_count |
Integer |
Number of auto labeling samples to be confirmed. |
update_time |
Long |
Time when a dataset is updated. |
versions |
Array of DatasetVersion objects |
Dataset version information. Currently, only the current version information of a dataset is recorded. |
work_path |
String |
Output dataset path, which is used to store output files such as label files. The path is an OBS path in the format of /Bucket name/File path. For example: /obs-bucket. |
work_path_type |
Integer |
Type of the dataset output path. The options are as follows:
|
workforce_descriptor |
WorkforceDescriptor object |
Team labeling information. |
workforce_task_count |
Integer |
Number of team labeling tasks of a dataset. |
workspace_id |
String |
Workspace ID. If no workspace is created, the default value is 0. If a workspace is created and used, use the actual value. |
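The work_path field is described above as an OBS path in the format /Bucket name/File path. A client that needs the bucket and the object prefix separately could split the value as sketched below. The helper name split_obs_path is an assumption made for illustration.

```python
def split_obs_path(path):
    """Split an OBS path of the form /bucket-name/file-path into its parts."""
    if not path.startswith("/"):
        raise ValueError("expected a path starting with '/'")
    # Everything up to the next slash is the bucket; the rest is the file path.
    bucket, _, key = path[1:].partition("/")
    if not bucket:
        raise ValueError("bucket name is missing")
    return bucket, key
```

With the values used in this document, split_obs_path("/test-obs/classify/output/") returns ("test-obs", "classify/output/"), and split_obs_path("/obs-bucket") returns ("obs-bucket", "").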
Parameter |
Type |
Description |
---|---|---|
data_path |
String |
Data source path. |
data_type |
Integer |
Data type. The options are as follows:
|
schema_maps |
Array of SchemaMap objects |
Schema mapping information corresponding to the table data. |
source_info |
SourceInfo object |
Information required for importing a table data source. |
with_column_header |
Boolean |
Whether the first row in the file is a column name. This field is valid for the table dataset. The options are as follows:
|
Parameter |
Type |
Description |
---|---|---|
dest_name |
String |
Name of the destination column. |
src_name |
String |
Name of the source column. |
Parameter |
Type |
Description |
---|---|---|
cluster_id |
String |
ID of an MRS cluster. |
cluster_mode |
String |
Running mode of an MRS cluster. The options are as follows:
|
cluster_name |
String |
Name of an MRS cluster. |
database_name |
String |
Name of the database to which the table dataset is imported. |
input |
String |
HDFS path of a table dataset. |
ip |
String |
IP address of your GaussDB(DWS) cluster. |
port |
String |
Port number of your GaussDB(DWS) cluster. |
queue_name |
String |
DLI queue name of a table dataset. |
subnet_id |
String |
Subnet ID of an MRS cluster. |
table_name |
String |
Name of the table to which a table dataset is imported. |
user_name |
String |
Username, which is mandatory for GaussDB(DWS) data. |
user_password |
String |
User password, which is mandatory for GaussDB(DWS) data. |
vpc_id |
String |
ID of the VPC where an MRS cluster resides. |
Parameter |
Type |
Description |
---|---|---|
attributes |
Array of LabelAttribute objects |
Multi-dimensional attribute of a label. For example, if the label is music, attributes such as style and artist may be included. |
name |
String |
Label name. |
property |
LabelProperty object |
Basic attribute key-value pair of a label, such as color and shortcut keys. |
type |
Integer |
Label type. The options are as follows:
|
Parameter |
Type |
Description |
---|---|---|
@modelarts:color |
String |
Default attribute: Label color, which is a hexadecimal code of the color. By default, this parameter is left blank. Example: #FFFFF0. |
@modelarts:default_shape |
String |
Default attribute: Default shape of an object detection label (dedicated attribute). By default, this parameter is left blank. The options are as follows:
|
@modelarts:from_type |
String |
Default attribute: Type of the head entity in the triplet relationship label. This attribute must be specified when a relationship label is created. This parameter is used only for the text triplet dataset. |
@modelarts:rename_to |
String |
Default attribute: The new name of the label. |
@modelarts:shortcut |
String |
Default attribute: Label shortcut key. By default, this parameter is left blank. For example: D. |
@modelarts:to_type |
String |
Default attribute: Type of the tail entity in the triplet relationship label. This attribute must be specified when a relationship label is created. This parameter is used only for the text triplet dataset. |
Parameter |
Type |
Description |
---|---|---|
create_time |
Long |
Time when a sample is created. |
dataset_id |
String |
Dataset ID. |
depth |
Integer |
Number of image sample channels. |
file_Name |
String |
Sample name. |
file_id |
String |
Sample ID. |
file_type |
String |
File type. |
height |
Integer |
Image sample height. |
size |
Long |
Image sample size. |
tags |
Map<String,String> |
Label information of a sample. |
url |
String |
OBS address of the preview sample. |
width |
Integer |
Image sample width. |
Parameter |
Type |
Description |
---|---|---|
description |
String |
Schema description. |
name |
String |
Schema name. |
schema_id |
Integer |
Schema ID. |
type |
String |
Schema value type. |
Parameter |
Type |
Description |
---|---|---|
add_sample_count |
Integer |
Number of added samples. |
annotated_sample_count |
Integer |
Number of labeled samples in a version. |
annotated_sub_sample_count |
Integer |
Number of labeled subsamples. |
clear_hard_property |
Boolean |
Whether to clear hard example properties during release. The options are as follows:
|
code |
String |
Status code of a preprocessing task such as rotation and cropping. |
create_time |
Long |
Time when a version is created. |
crop |
Boolean |
Whether to crop the image. This field is valid only for the object detection dataset whose labeling box is in the rectangle shape. The options are as follows:
|
crop_path |
String |
Path for storing cropped files. |
crop_rotate_cache_path |
String |
Temporary directory for executing the rotation and cropping task. |
data_path |
String |
Path for storing data. |
data_statistics |
Map<String,Object> |
Sample statistics on a dataset, including the statistics on sample metadata in JSON format. |
data_validate |
Boolean |
Whether data is validated by the validation algorithm before release. The options are as follows:
|
deleted_sample_count |
Integer |
Number of deleted samples. |
deletion_stats |
Map<String,Integer> |
Deletion reason statistics. |
description |
String |
Description of a version. |
export_images |
Boolean |
Whether to export images to the version output directory during release. The options are as follows:
|
extract_serial_number |
Boolean |
Whether to parse the subsample number during release. The field is valid for the healthcare dataset. The options are as follows:
|
include_dataset_data |
Boolean |
Whether to include the source data of a dataset during release. The options are as follows:
|
is_current |
Boolean |
Whether this version is the current version of the dataset. The options are as follows:
|
label_stats |
Array of LabelStats objects |
Label statistics list of a released version. |
label_type |
String |
Label type of a released version. The options are as follows:
|
manifest_cache_input_path |
String |
Input path for the manifest file cache during version release. |
manifest_path |
String |
Path for storing the manifest file with the released version. |
message |
String |
Task information recorded during release (for example, error information). |
modified_sample_count |
Integer |
Number of modified samples. |
previous_annotated_sample_count |
Integer |
Number of labeled samples in the parent version. |
previous_total_sample_count |
Integer |
Total number of samples in the parent version. |
previous_version_id |
String |
Parent version ID. |
processor_task_id |
String |
ID of a preprocessing task such as rotation and cropping. |
processor_task_status |
Integer |
Status of a preprocessing task such as rotation and cropping. The options are as follows:
|
remove_sample_usage |
Boolean |
Whether to clear the existing usage information of a dataset during release. The options are as follows:
|
rotate |
Boolean |
Whether to rotate the image. The options are as follows:
|
rotate_path |
String |
Path for storing the rotated file. |
sample_state |
String |
Sample status. The options are as follows:
|
status |
Integer |
Status of a dataset version. The options are as follows:
|
tags |
Array of strings |
Key identifier list of the dataset. The labeling type is used as the default label when the labeling task releases a version. For example, ["Image","Object detection"]. |
task_type |
Integer |
Labeling task type of the released version, which is the same as the dataset type. |
total_sample_count |
Integer |
Total number of version samples. |
total_sub_sample_count |
Integer |
Total number of subsamples generated from the parent samples. |
train_evaluate_sample_ratio |
String |
Ratio used to split labeled samples into training and validation sets during version release. The default value is 1.00, indicating that all labeled samples are placed in the training set. |
update_time |
Long |
Time when a version is updated. |
version_format |
String |
Format of a dataset version. The options are as follows:
|
version_id |
String |
Dataset version ID. |
version_name |
String |
Dataset version name. |
with_column_header |
Boolean |
Whether the first row in the released CSV file is a column name. This field is valid for the table dataset. The options are as follows:
|
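As a rough illustration of train_evaluate_sample_ratio, the sketch below turns the ratio string into sample counts. It assumes the ratio is the share of labeled samples that goes to the training set, which is inferred from the statement that 1.00 places all labeled samples in the training set. The function name and the rounding rule are hypothetical and may differ from what the service does.

```python
def split_sample_counts(labeled_count, train_evaluate_sample_ratio="1.00"):
    """Return (training_count, validation_count) for a version split."""
    ratio = float(train_evaluate_sample_ratio)  # the field is a String
    if not 0.0 <= ratio <= 1.0:
        raise ValueError("ratio must be between 0 and 1")
    # Assumption: the ratio is the training share; rounding is illustrative.
    training = round(labeled_count * ratio)
    return training, labeled_count - training
```

With the default ratio, split_sample_counts(10) returns (10, 0).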
Parameter |
Type |
Description |
---|---|---|
attributes |
Array of LabelAttribute objects |
Multi-dimensional attribute of a label. For example, if the label is music, attributes such as style and artist may be included. |
count |
Integer |
Number of labels. |
name |
String |
Label name. |
property |
LabelProperty object |
Basic attribute key-value pair of a label, such as color and shortcut keys. |
sample_count |
Integer |
Number of samples containing the label. |
type |
Integer |
Label type. The options are as follows:
|
Parameter |
Type |
Description |
---|---|---|
default_value |
String |
Default value of a label attribute. |
id |
String |
Label attribute ID. |
name |
String |
Label attribute name. |
type |
String |
Label attribute type. The options are as follows:
|
values |
Array of LabelAttributeValue objects |
List of label attribute values. |
Parameter |
Type |
Description |
---|---|---|
id |
String |
Label attribute value ID. |
value |
String |
Label attribute value. |
Parameter |
Type |
Description |
---|---|---|
current_task_id |
String |
ID of a team labeling task. |
current_task_name |
String |
Name of a team labeling task. |
reject_num |
Integer |
Number of rejected samples. |
repetition |
Integer |
Number of persons who label each sample. The minimum value is 1. |
is_synchronize_auto_labeling_data |
Boolean |
Whether to synchronously update auto labeling data. The options are as follows:
|
is_synchronize_data |
Boolean |
Whether to synchronize updated data, such as uploading files, synchronizing data sources, and assigning imported unlabeled files to team members. The options are as follows:
|
workers |
Array of Worker objects |
List of labeling team members. |
workforce_id |
String |
ID of a labeling team. |
workforce_name |
String |
Name of a labeling team. |
Parameter |
Type |
Description |
---|---|---|
create_time |
Long |
Creation time. |
description |
String |
Labeling team member description. The value contains 0 to 256 characters and does not support the following special characters: ^!<>=&"' |
String |
Email address of a labeling team member. |
role |
Integer |
Role. The options are as follows:
|
status |
Integer |
Current login status of a labeling team member. The options are as follows:
|
update_time |
Long |
Update time. |
worker_id |
String |
ID of a labeling team member. |
workforce_id |
String |
ID of a labeling team. |
Example Requests
Querying the Dataset List
GET https://{endpoint}/v2/{project_id}/datasets?offset=0&limit=10&sort_by=create_time&order=desc&dataset_type=0&file_preview=true
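The example request can be prepared from Python's standard library as sketched below. The request is built but not sent. The use of the X-Auth-Token header is an assumption, since this section does not describe authentication, and the function name is hypothetical.

```python
from urllib.request import Request


def make_dataset_list_request(endpoint, project_id, token, query=""):
    """Build (but do not send) the GET request for the dataset list."""
    url = f"https://{endpoint}/v2/{project_id}/datasets"
    if query:
        url = f"{url}?{query}"
    # Assumption: the service accepts token authentication in X-Auth-Token.
    headers = {"X-Auth-Token": token}
    return Request(url, headers=headers, method="GET")
```

Passing the returned object to urllib.request.urlopen would send it. The response body can then be decoded as JSON.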
Example Responses
Status code: 200
OK
{
  "total_number" : 1,
  "datasets" : [ {
    "dataset_id" : "gfghHSokody6AJigS5A",
    "dataset_name" : "dataset-f9e8",
    "dataset_type" : 0,
    "data_format" : "Default",
    "next_version_num" : 4,
    "status" : 1,
    "data_sources" : [ {
      "data_type" : 0,
      "data_path" : "/test-obs/classify/input/catDog4/"
    } ],
    "create_time" : 1605690595404,
    "update_time" : 1605690595404,
    "description" : "",
    "current_version_id" : "54IXbeJhfttGpL46lbv",
    "current_version_name" : "V003",
    "total_sample_count" : 10,
    "annotated_sample_count" : 10,
    "work_path" : "/test-obs/classify/output/",
    "inner_work_path" : "/test-obs/classify/output/dataset-f9e8-gfghHSokody6AJigS5A/",
    "inner_annotation_path" : "/test-obs/classify/output/dataset-f9e8-gfghHSokody6AJigS5A/annotation/",
    "inner_data_path" : "/test-obs/classify/output/dataset-f9e8-gfghHSokody6AJigS5A/data/",
    "inner_log_path" : "/test-obs/classify/output/dataset-f9e8-gfghHSokody6AJigS5A/logs/",
    "inner_temp_path" : "/test-obs/classify/output/dataset-f9e8-gfghHSokody6AJigS5A/temp/",
    "inner_task_path" : "/test-obs/classify/output/dataset-f9e8-gfghHSokody6AJigS5A/task/",
    "work_path_type" : 0,
    "workspace_id" : "0",
    "enterprise_project_id" : "0",
    "exist_running_task" : false,
    "exist_workforce_task" : false,
    "running_tasks_id" : [ ],
    "workforce_task_count" : 0,
    "feature_supports" : [ "0" ],
    "managed" : false,
    "import_data" : false,
    "ai_project" : "default-ai-project",
    "label_task_count" : 1,
    "dataset_format" : 0,
    "dataset_version" : "v1",
    "content_labeling" : true,
    "samples" : [ {
      "url" : "https://test-obs.obs.xxx.com:443/classify/input/catDog4/15.jpg?AccessKeyId=vprCCTxxxxxxxxxxbXr&Expires=1606100112&Signature=tuUo9jl6lqoMKAwNBz5g8dxO%2FdE%3D",
      "create_time" : 1605690596035
    }, {
      "url" : "https://test-obs.obs.xxx.com:443/classify/input/catDog4/8.jpg?AccessKeyId=vprCCTxxxxxxxxxxbXr&Expires=1606100112&Signature=NITOdBnkUXtdnKuEgDzZpkQzNfM%3D",
      "create_time" : 1605690596046
    }, {
      "url" : "https://test-obs.obs.xxx.com:443/classify/input/catDog4/9.jpg?AccessKeyId=vprCCTxxxxxxxxxxbXr&Expires=1606100112&Signature=%2BwUo1BL38%2F2d7p7anPi4fNzm1VU%3D",
      "create_time" : 1605690596050
    }, {
      "url" : "https://test-obs.obs.xxx.com:443/classify/input/catDog4/7.jpg?AccessKeyId=vprCCTxxxxxxxxxxbXr&Expires=1606100112&Signature=tOrHfcWo%2FEJ0wRzfi1M5Wk2MrXg%3D",
      "create_time" : 1605690596043
    } ],
    "files" : [ {
      "url" : "https://test-obs.obs.xxx.com:443/classify/input/catDog4/15.jpg?AccessKeyId=vprCCTxxxxxxxxxxbXr&Expires=1606100112&Signature=tuUo9jl6lqoMKAwNBz5g8dxO%2FdE%3D",
      "create_time" : 1605690596035
    }, {
      "url" : "https://test-obs.obs.xxx.com:443/classify/input/catDog4/8.jpg?AccessKeyId=vprCCTxxxxxxxxxxbXr&Expires=1606100112&Signature=NITOdBnkUXtdnKuEgDzZpkQzNfM%3D",
      "create_time" : 1605690596046
    }, {
      "url" : "https://test-obs.obs.xxx.com:443/classify/input/catDog4/9.jpg?AccessKeyId=vprCCTxxxxxxxxxxbXr&Expires=1606100112&Signature=%2BwUo1BL38%2F2d7p7anPi4fNzm1VU%3D",
      "create_time" : 1605690596050
    }, {
      "url" : "https://test-obs.obs.xxx.com:443/classify/input/catDog4/7.jpg?AccessKeyId=vprCCTxxxxxxxxxxbXr&Expires=1606100112&Signature=tOrHfcWo%2FEJ0wRzfi1M5Wk2MrXg%3D",
      "create_time" : 1605690596043
    } ]
  } ]
}
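A response like the one above can be reduced to a short summary as in the following sketch. It assumes that create_time is a Unix timestamp in milliseconds, which matches the magnitude of the example values but is not stated in the field descriptions. The function name is hypothetical.

```python
import json
from datetime import datetime, timezone


def summarize_datasets(body_text):
    """Return (dataset_name, created_at_utc, total_sample_count) per dataset."""
    body = json.loads(body_text)
    rows = []
    for dataset in body.get("datasets", []):
        # Assumption: create_time is a Unix timestamp in milliseconds.
        created = datetime.fromtimestamp(dataset["create_time"] / 1000, tz=timezone.utc)
        rows.append((dataset["dataset_name"], created, dataset.get("total_sample_count")))
    return rows
```

Under that assumption, the dataset in the example response was created on 2020-11-18 (UTC).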
Status Codes
Status Code |
Description |
---|---|
200 |
OK |
401 |
Unauthorized |
403 |
Forbidden |
404 |
Not Found |
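A client might turn the status codes in the table above into exceptions as sketched below. The exception class and function name are assumptions made for illustration. Only the four codes and their descriptions come from the table.

```python
# Status codes and descriptions taken from the table above.
STATUS_MEANINGS = {200: "OK", 401: "Unauthorized", 403: "Forbidden", 404: "Not Found"}


class DatasetListError(Exception):
    """Raised when the dataset list request does not return 200 (hypothetical)."""


def check_status(status_code):
    """Return silently for 200 and raise DatasetListError otherwise."""
    if status_code == 200:
        return
    meaning = STATUS_MEANINGS.get(status_code, "Unexpected status")
    raise DatasetListError(f"{status_code} {meaning}")
```

Codes that are not listed in the table are reported as "Unexpected status" rather than being guessed at.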
Error Codes
See Error Codes.