This API is used to add samples in batches.
POST /v2/{project_id}/datasets/{dataset_id}/data-annotations/samples
Parameter | Mandatory | Type | Description
---|---|---|---
dataset_id | Yes | String | Dataset ID.
project_id | Yes | String | Project ID. For details about how to obtain the project ID, see Obtaining a Project ID.
Parameter | Mandatory | Type | Description
---|---|---|---
final_annotation | No | Boolean | Whether to import labels directly as the final result. The options are as follows: - true: Import labels to the labeled dataset (default value). - false: Import labels to the to-be-confirmed dataset. Currently, to-be-confirmed datasets support only image classification and object detection.
label_format | No | LabelFormat object | Label format. This parameter is used only for text datasets.
samples | No | Array of Sample objects | Sample list.
Parameter | Mandatory | Type | Description
---|---|---|---
label_type | No | String | Label type of text classification. The options are as follows: - 0: The label is separated from the text, and they are distinguished by the fixed suffix _result. For example, if the text file is abc.txt, the label file is abc_result.txt. - 1: Default value. Labels and texts are stored in the same file and separated by separators. Use text_sample_separator to specify the separator between the text and label, and text_label_separator to specify the separator between labels.
text_label_separator | No | String | Separator between labels. By default, a comma (,) is used. The separator must be escaped and can contain only one character: a letter, a digit, or any of the following special characters: !@#$%^&*_=\|?/':.;,
text_sample_separator | No | String | Separator between the text and the label. By default, a tab character is used. The separator must be escaped and can contain only one character: a letter, a digit, or any of the following special characters: !@#$%^&*_=\|?/':.;,
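With label_type set to 1, each line of a labeled text file can be split using these two separators. A minimal parsing sketch, assuming the default separators (tab between text and label, comma between labels); the function name is illustrative:

```python
# Parse one line of a label_type=1 text file: text and labels share the
# line, separated by text_sample_separator; labels are separated from
# each other by text_label_separator. Defaults per the table above.
def parse_labeled_line(line, sample_sep="\t", label_sep=","):
    text, _, label_part = line.rstrip("\n").partition(sample_sep)
    labels = [label for label in label_part.split(label_sep) if label]
    return text, labels

text, labels = parse_labeled_line("the food was great\tpositive,food")
```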
Parameter | Mandatory | Type | Description
---|---|---|---
data | No | Object | Byte data of sample files. The type is java.nio.ByteBuffer. When this parameter is used, the string converted from the byte data is uploaded.
data_source | No | DataSource object | Data source.
encoding | No | String | Encoding type of sample files, which is used to upload .txt or .csv files. The value can be UTF-8, GBK, or GB2312. The default value is UTF-8.
labels | No | Array of SampleLabel objects | Sample label list.
metadata | No | SampleMetadata object | Key-value pairs of the sample metadata attributes.
name | No | String | Name of the sample file. The value contains 0 to 1,024 characters and cannot contain the special characters !<>=&"'.
sample_type | No | Integer | Sample type. The options are as follows: - 0: image - 1: text - 2: speech - 4: table - 6: video - 9: custom format
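As the request example later in this section shows, the `data` field carries the file bytes as a Base64 string. A minimal sketch of building one entry of the `samples` array this way (the helper name is illustrative):

```python
import base64
import os

def build_sample(path, sample_type=0):
    """Build one entry of the samples array: Base64-encode the file
    bytes into the data field (sample_type 0 means image)."""
    with open(path, "rb") as f:
        raw = f.read()
    return {
        "name": os.path.basename(path),
        "data": base64.b64encode(raw).decode("ascii"),
        "sample_type": sample_type,
    }
```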
Parameter | Mandatory | Type | Description
---|---|---|---
data_path | No | String | Data source path.
data_type | No | Integer | Data type. The options are as follows: - 0: OBS bucket (default value) - 1: GaussDB(DWS) - 2: DLI - 3: RDS - 4: MRS - 5: AI Gallery - 6: Inference service
schema_maps | No | Array of SchemaMap objects | Schema mapping information corresponding to the table data.
source_info | No | SourceInfo object | Information required for importing a table data source.
with_column_header | No | Boolean | Whether the first row in the file is a column name. This field is valid for table datasets. The options are as follows: - true: The first row in the file is the column name. - false: The first row in the file is not the column name.
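For table data, the fields above combine into a single `data_source` object. A hypothetical example importing a CSV file from an OBS bucket; the path and column names are illustrative, not from this document:

```python
# Hypothetical data_source for a table dataset stored in OBS (data_type 0).
# schema_maps renames the source column "t_class" to "class" on import.
data_source = {
    "data_path": "/example-bucket/table/train.csv",  # illustrative OBS path
    "data_type": 0,                                  # 0: OBS bucket (default)
    "with_column_header": True,                      # first row holds column names
    "schema_maps": [{"src_name": "t_class", "dest_name": "class"}],
}
```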
Parameter | Mandatory | Type | Description
---|---|---|---
dest_name | No | String | Name of the destination column.
src_name | No | String | Name of the source column.
Parameter | Mandatory | Type | Description
---|---|---|---
cluster_id | No | String | ID of an MRS cluster.
cluster_mode | No | String | Running mode of an MRS cluster. The options are as follows: - 0: normal cluster - 1: security cluster
cluster_name | No | String | Name of an MRS cluster.
database_name | No | String | Name of the database to which the table dataset is imported.
input | No | String | HDFS path of a table dataset.
ip | No | String | IP address of your GaussDB(DWS) cluster.
port | No | String | Port number of your GaussDB(DWS) cluster.
queue_name | No | String | DLI queue name of a table dataset.
subnet_id | No | String | Subnet ID of an MRS cluster.
table_name | No | String | Name of the table to which a table dataset is imported.
user_name | No | String | Username, which is mandatory for GaussDB(DWS) data.
user_password | No | String | User password, which is mandatory for GaussDB(DWS) data.
vpc_id | No | String | ID of the VPC where an MRS cluster resides.
Parameter | Mandatory | Type | Description
---|---|---|---
annotated_by | No | String | Labeling method, which is used to distinguish whether a sample is labeled manually or automatically. The options are as follows: - human: manual labeling - auto: automatic labeling
id | No | String | Label ID.
name | No | String | Label name.
property | No | SampleLabelProperty object | Attribute key-value pairs of the sample label, such as the object shape and shape feature.
score | No | Float | Confidence.
type | No | Integer | Label type. The options are as follows: - 0: image classification - 1: object detection - 100: text classification - 101: named entity recognition - 102: text triplet relationship - 103: text triplet entity - 200: speech classification - 201: speech content - 202: speech paragraph labeling - 600: video classification
Parameter | Mandatory | Type | Description
---|---|---|---
@modelarts:content | No | String | Speech text content, which is a default attribute dedicated to the speech label (including the speech content and speech start and end points).
@modelarts:end_index | No | Integer | End position of the text, which is a default attribute dedicated to the named entity label. The end position does not include the character corresponding to the value of end_index. Examples are as follows. - If the text content is "Barack Hussein Obama II (born August 4, 1961) is an American attorney and politician.", the start_index and end_index values of "Barack Hussein Obama II" are 0 and 23, respectively. - If the text content is "By the end of 2018, the company has more than 100 employees.", the start_index and end_index values of "By the end of 2018" are 0 and 18, respectively.
@modelarts:end_time | No | String | Speech end time, which is a default attribute dedicated to the speech start/end point label, in the format of hh:mm:ss.SSS (hh indicates hour; mm indicates minute; ss indicates second; and SSS indicates millisecond).
@modelarts:feature | No | Object | Shape feature, which is a default attribute dedicated to the object detection label, with type of List. The upper left corner of an image is used as the coordinate origin [0,0]. Each coordinate point is represented by [x, y], where x is the horizontal coordinate and y is the vertical coordinate (both greater than or equal to 0). The format of each shape is as follows: - bndbox: consists of two points, for example, [[0,10],[50,95]]. The first point is the upper left corner of the rectangle and the second point is the lower right corner. That is, the x-coordinate of the first point must be smaller than that of the second point, and the y-coordinate of the first point must be smaller than that of the second point. - polygon: consists of multiple points connected in sequence to form a polygon, for example, [[0,100],[50,95],[10,60],[500,400]]. - circle: consists of the center point and radius, for example, [[100,100],[50]]. - line: consists of two points, for example, [[0,100],[50,95]]. The first point is the start point and the second point is the end point. - dashed: consists of two points, for example, [[0,100],[50,95]]. The first point is the start point and the second point is the end point. - point: consists of one point, for example, [[0,100]]. - polyline: consists of multiple points, for example, [[0,100],[50,95],[10,60],[500,400]].
@modelarts:from | No | String | ID of the head entity in the triplet relationship label, which is a default attribute dedicated to the triplet relationship label.
@modelarts:hard | No | String | Whether the sample is labeled as a hard sample, which is a default attribute. The options are as follows: - true/1: hard sample - false/0: non-hard sample
@modelarts:hard_coefficient | No | String | Coefficient of difficulty of each label level, which is a default attribute. The value range is [0,1].
@modelarts:hard_reasons | No | String | Reasons why the sample is a hard sample, which is a default attribute. Use a hyphen (-) to separate every two hard sample reason IDs, for example, 3-20-21-19. The options are as follows: - 0: No target objects are identified. - 1: The confidence is low. - 2: The clustering result based on the training dataset is inconsistent with the prediction result. - 3: The prediction result is greatly different from the data of the same type in the training dataset. - 4: The prediction results of multiple consecutive similar images are inconsistent. - 5: There is a large offset between the image resolution and the feature distribution of the training dataset. - 6: There is a large offset between the aspect ratio of the image and the feature distribution of the training dataset. - 7: There is a large offset between the brightness of the image and the feature distribution of the training dataset. - 8: There is a large offset between the saturation of the image and the feature distribution of the training dataset. - 9: There is a large offset between the color richness of the image and the feature distribution of the training dataset. - 10: There is a large offset between the definition of the image and the feature distribution of the training dataset. - 11: There is a large offset between the number of frames of the image and the feature distribution of the training dataset. - 12: There is a large offset between the standard deviation of area of image frames and the feature distribution of the training dataset. - 13: There is a large offset between the aspect ratio of image frames and the feature distribution of the training dataset. - 14: There is a large offset between the area portion of image frames and the feature distribution of the training dataset. - 15: There is a large offset between the edge of image frames and the feature distribution of the training dataset. - 16: There is a large offset between the brightness of image frames and the feature distribution of the training dataset. - 17: There is a large offset between the definition of image frames and the feature distribution of the training dataset. - 18: There is a large offset between the stack of image frames and the feature distribution of the training dataset. - 19: The data enhancement result based on GaussianBlur is inconsistent with the prediction result of the original image. - 20: The data enhancement result based on fliplr is inconsistent with the prediction result of the original image. - 21: The data enhancement result based on Crop is inconsistent with the prediction result of the original image. - 22: The data enhancement result based on flipud is inconsistent with the prediction result of the original image. - 23: The data enhancement result based on scale is inconsistent with the prediction result of the original image. - 24: The data enhancement result based on translate is inconsistent with the prediction result of the original image. - 25: The data enhancement result based on shear is inconsistent with the prediction result of the original image. - 26: The data enhancement result based on superpixels is inconsistent with the prediction result of the original image. - 27: The data enhancement result based on sharpen is inconsistent with the prediction result of the original image. - 28: The data enhancement result based on add is inconsistent with the prediction result of the original image. - 29: The data enhancement result based on invert is inconsistent with the prediction result of the original image. - 30: The data is predicted to be abnormal.
@modelarts:shape | No | String | Object shape, which is a default attribute dedicated to the object detection label and is left empty by default. The options are as follows: - bndbox: rectangle - polygon: polygon - circle: circle - line: straight line - dashed: dotted line - point: point - polyline: polyline
@modelarts:source | No | String | Speech source, which is a default attribute dedicated to the speech start/end point label and can be set to a speaker or narrator.
@modelarts:start_index | No | Integer | Start position of the text, which is a default attribute dedicated to the named entity label. The start value begins from 0, including the character corresponding to the value of start_index.
@modelarts:start_time | No | String | Speech start time, which is a default attribute dedicated to the speech start/end point label, in the format of hh:mm:ss.SSS (hh indicates hour; mm indicates minute; ss indicates second; and SSS indicates millisecond).
@modelarts:to | No | String | ID of the tail entity in the triplet relationship label, which is a default attribute dedicated to the triplet relationship label.
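Combining the label tables above, the sketch below assembles a hypothetical object-detection sample: one bounding box whose property carries the shape and feature attributes, plus the `@modelarts:size` metadata entry that is mandatory when an object detection label is present. All names and coordinates are illustrative:

```python
import json

# Hypothetical object-detection sample: one "cat" bounding box on a
# 640x480 RGB image. A bndbox feature is two points, upper left then
# lower right; @modelarts:size is [width, height, depth].
sample = {
    "name": "2.jpg",
    "sample_type": 0,          # 0: image
    "labels": [{
        "name": "cat",
        "type": 1,             # 1: object detection
        "property": {
            "@modelarts:shape": "bndbox",
            "@modelarts:feature": [[10, 20], [200, 180]],
        },
    }],
    "metadata": {"@modelarts:size": [640, 480, 3]},
}
request_body = json.dumps({"samples": [sample]})
```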
Parameter | Mandatory | Type | Description
---|---|---|---
@modelarts:hard | No | Double | Whether the sample is labeled as a hard sample, which is a default attribute. The options are as follows: - 0: non-hard sample - 1: hard sample
@modelarts:hard_coefficient | No | Double | Coefficient of difficulty of each sample level, which is a default attribute. The value range is [0,1].
@modelarts:hard_reasons | No | Array of integers | IDs of hard sample reasons, which is a default attribute. The options are as follows: - 0: No target objects are identified. - 1: The confidence is low. - 2: The clustering result based on the training dataset is inconsistent with the prediction result. - 3: The prediction result is greatly different from the data of the same type in the training dataset. - 4: The prediction results of multiple consecutive similar images are inconsistent. - 5: There is a large offset between the image resolution and the feature distribution of the training dataset. - 6: There is a large offset between the aspect ratio of the image and the feature distribution of the training dataset. - 7: There is a large offset between the brightness of the image and the feature distribution of the training dataset. - 8: There is a large offset between the saturation of the image and the feature distribution of the training dataset. - 9: There is a large offset between the color richness of the image and the feature distribution of the training dataset. - 10: There is a large offset between the definition of the image and the feature distribution of the training dataset. - 11: There is a large offset between the number of frames of the image and the feature distribution of the training dataset. - 12: There is a large offset between the standard deviation of area of image frames and the feature distribution of the training dataset. - 13: There is a large offset between the aspect ratio of image frames and the feature distribution of the training dataset. - 14: There is a large offset between the area portion of image frames and the feature distribution of the training dataset. - 15: There is a large offset between the edge of image frames and the feature distribution of the training dataset. - 16: There is a large offset between the brightness of image frames and the feature distribution of the training dataset. - 17: There is a large offset between the definition of image frames and the feature distribution of the training dataset. - 18: There is a large offset between the stack of image frames and the feature distribution of the training dataset. - 19: The data enhancement result based on GaussianBlur is inconsistent with the prediction result of the original image. - 20: The data enhancement result based on fliplr is inconsistent with the prediction result of the original image. - 21: The data enhancement result based on Crop is inconsistent with the prediction result of the original image. - 22: The data enhancement result based on flipud is inconsistent with the prediction result of the original image. - 23: The data enhancement result based on scale is inconsistent with the prediction result of the original image. - 24: The data enhancement result based on translate is inconsistent with the prediction result of the original image. - 25: The data enhancement result based on shear is inconsistent with the prediction result of the original image. - 26: The data enhancement result based on superpixels is inconsistent with the prediction result of the original image. - 27: The data enhancement result based on sharpen is inconsistent with the prediction result of the original image. - 28: The data enhancement result based on add is inconsistent with the prediction result of the original image. - 29: The data enhancement result based on invert is inconsistent with the prediction result of the original image. - 30: The data is predicted to be abnormal.
@modelarts:size | No | Array of objects | Image size (width, height, and depth of the image), which is a default attribute, with type of List. In the list, the first number indicates the width (pixels), the second number indicates the height (pixels), and the third number indicates the depth (the depth can be left blank and the default value is 3). For example, [100,200,3] and [100,200] are both valid. Note: This parameter is mandatory only when the sample label list contains the object detection label.
Status code: 200
Parameter | Type | Description
---|---|---
error_code | String | Error code.
error_msg | String | Error message.
results | Array of UploadSampleResp objects | Response list for adding samples in batches.
success | Boolean | Whether the operation is successful. The options are as follows: - true: successful - false: failed
Adding Samples in Batches
```json
{
  "samples" : [ {
    "name" : "2.jpg",
    "data" : "/9j/4AAQSkZJRgABAQAAAQABAAD/2wBDAAgGBgcGBQgHBwcJCQgKDBQNDAsLDBkSEw8UHRofHh0aHBwgJC4nICIsIxwcKDcpLDAxNDQ0Hyc5PTgyPC4zNDL/2wBDAQkJCQwLDBgNDRgyIRwhMjIyMjIyMjIyMjIyMjIyMjIyMjIyMjIyMjIyMjIyMjIyMjIyMjIyMjIyMjIyMjIyMjL/wAARCAA1AJUDASIAAhEBAxEB/8QAHwAAAQUBAQEBAQEAAAAAAAAAAAECAwQFBgcICQoL/8QAtRAAAgEDAwIEAwUFBAQAAAF9AQIDAAQRBRIhMUEGE1FhByJxFDKBkaEII0KxwRVS0fAkM2JyggkKFhcYGRolJicoKSo0NTY3ODk6Q0RFRkdISUpTVFVWV1hZWmNkZWZnaGlqc3R1dnd4eXqDhIWGh4iJipKTlJWWl5iZmqKjpKWmp6ipqrKztLW2t7i5usLDxMXGx8jJytLT1NXW19jZ2uHi4+Tl5ufo6erx8vP09fb3+Pn6/8QAHwEAAwEBAQEBAQEBAQAAAAAAAAECAwQFBgcICQoL"
  } ]
}
```
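The request above can be sent with any HTTP client. A minimal sketch using Python's standard library, assuming IAM token authentication via the X-Auth-Token header; the endpoint, IDs, and token are placeholders:

```python
import json
import urllib.request

def build_url(endpoint, project_id, dataset_id):
    """Assemble the batch-upload URI from this API's path template."""
    return (f"{endpoint}/v2/{project_id}/datasets/{dataset_id}"
            "/data-annotations/samples")

def upload_samples(endpoint, project_id, dataset_id, token, samples):
    """POST the samples array and return the parsed JSON response."""
    req = urllib.request.Request(
        build_url(endpoint, project_id, dataset_id),
        data=json.dumps({"samples": samples}).encode("utf-8"),
        headers={"X-Auth-Token": token, "Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)  # {"success": ..., "results": [...]}
```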
Status code: 200
OK
```json
{
  "success" : true,
  "results" : [ {
    "success" : true,
    "name" : "/test-obs/classify/input/cat-dog/2.jpg",
    "info" : "960585877c92d63911ba555ab3129d36"
  } ]
}
```
Status Code | Description
---|---
200 | OK
401 | Unauthorized
403 | Forbidden
404 | Not Found
See Error Codes.