Specifications for Importing Data from an OBS Directory

When importing data from OBS, the directory structure and file names of the stored data must comply with the ModelArts specifications.

Only data of the following labeling types can be imported by Labeling Format: image classification, object detection, image segmentation, text classification, and sound classification.

  • To import data from an OBS directory, you must have read permission on that directory.
  • The OBS bucket and ModelArts must be in the same region.

Image classification

Data for image classification can be stored in two formats:

Format 1: ModelArts imageNet 1.0
  • Images with the same label must be stored in the same directory, with the label name as the directory name. If there are multiple levels of directories, the last level is used as the label name.

    In the following example, Rabbit and Panda are label names.

    dataset-import-example 
    ├─Rabbit 
    │      10.jpg 
    │      11.jpg 
    │      12.jpg 
    │ 
    └─Panda 
            1.jpg 
            2.jpg 
            3.jpg
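The directory-as-label rule above can be sketched locally before uploading to OBS. This is a minimal illustration, not ModelArts code: the function name `labels_from_directories` and the set of accepted image suffixes are assumptions made for the example.

```python
from pathlib import Path

# Assumed set of image suffixes for this sketch; the source only shows .jpg.
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png", ".bmp"}


def labels_from_directories(root):
    """Map each image (path relative to root) to its parent directory name.

    With nested directories, the last directory level is the label,
    which matches the rule described above.
    """
    root = Path(root)
    labels = {}
    for path in sorted(root.rglob("*")):
        if not path.is_file() or path.suffix.lower() not in IMAGE_SUFFIXES:
            continue
        if path.parent == root:
            continue  # images directly under the root have no label directory
        labels[path.relative_to(root).as_posix()] = path.parent.name
    return labels
```

For the example tree, calling `labels_from_directories("dataset-import-example")` would map `Rabbit/10.jpg` to `Rabbit` and `Panda/1.jpg` to `Panda`.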

Format 2: ModelArts image classification 1.0
  • Each image and its label file must be stored in the same directory. The content of the label file is used as the image's label names.

    In the following example, import-dir-1 and import-dir-2 are the imported subdirectories:

    dataset-import-example 
    ├─import-dir-1
    │      10.jpg
    │      10.txt    
    │      11.jpg 
    │      11.txt
    │      12.jpg 
    │      12.txt
    └─import-dir-2
            1.jpg 
            1.txt
            2.jpg 
            2.txt

    The following shows a label file for a single label, for example, the 1.txt file:

    Rabbit

    The following shows a label file for multiple labels, for example, the 2.txt file:

    Rabbit
    Panda
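As a rough local check of this layout, the sketch below pairs each image with the .txt file of the same base name and reads one label per line, as in the 1.txt and 2.txt examples. The function name `read_label_files`, the restriction to .jpg images, and the UTF-8 encoding of the label files are assumptions of this example, not documented ModelArts behavior.

```python
from pathlib import Path


def read_label_files(import_dir):
    """Return {image file name: [labels]} for one imported subdirectory."""
    import_dir = Path(import_dir)
    labels = {}
    for image in sorted(import_dir.glob("*.jpg")):
        label_file = image.with_suffix(".txt")  # e.g. 10.jpg -> 10.txt
        if not label_file.is_file():
            continue  # no label file next to this image
        lines = label_file.read_text(encoding="utf-8").splitlines()
        # One label per line; blank lines are ignored.
        labels[image.name] = [line.strip() for line in lines if line.strip()]
    return labels
```

An image whose label file holds two lines, such as 2.txt above, ends up with a two-element label list.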

Object detection

Data for object detection can be stored in two formats:

Format 1: ModelArts PASCAL VOC 1.0

Format 2: YOLO

Image segmentation

ModelArts image segmentation 1.0:

Text classification

TXT and CSV files can be imported for text classification. The files must be encoded in UTF-8 or GBK.
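To check the encoding requirement before uploading, a file's bytes can be test-decoded with Python's built-in codecs. The sketch below is illustrative only; `decode_text` is a hypothetical helper name. UTF-8 is tried first because it is the stricter of the two: many byte sequences that are not valid UTF-8 still decode as GBK.

```python
def decode_text(raw):
    """Decode bytes as UTF-8 or GBK and return (text, encoding name).

    Raises ValueError if the bytes are valid in neither encoding.
    """
    for encoding in ("utf-8", "gbk"):
        try:
            return raw.decode(encoding), encoding
        except UnicodeDecodeError:
            continue  # try the next candidate encoding
    raise ValueError("content is neither valid UTF-8 nor valid GBK")
```

Note that a successful GBK decode only shows the bytes are well-formed GBK, not that the file was actually written in that encoding.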

Labeled objects and labels for text classification can be stored in two formats:

Sound classification

ModelArts audio classification dir 1.0: Sound files with the same label must be stored in the same directory, and the label name is the directory name.

Example:

dataset-import-example 
├─Rabbit 
│      10.wav 
│      11.wav 
│      12.wav 
│ 
└─Panda 
        1.wav 
        2.wav 
        3.wav

Tables

CSV files can be imported from OBS. Select the directory where the files are stored. The number of columns in each CSV file must match the number of columns in the dataset schema. The schema can be obtained automatically from the CSV file.

dataset-import-example
├─table_import_1.csv
├─table_import_2.csv
├─table_import_3.csv
└─table_import_4.csv
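
The column-count requirement can be checked locally with Python's standard csv module. This is a sketch under stated assumptions: `find_column_mismatches` is a hypothetical helper, the files are assumed to be UTF-8 encoded, and the expected column count is supplied by the caller rather than read from a ModelArts schema.

```python
import csv
from pathlib import Path


def find_column_mismatches(directory, expected_columns):
    """List (file name, row number, column count) for rows of the wrong width."""
    mismatches = []
    for csv_path in sorted(Path(directory).glob("*.csv")):
        with csv_path.open(newline="", encoding="utf-8") as handle:
            for row_number, row in enumerate(csv.reader(handle), start=1):
                # Blank lines yield an empty row and are skipped.
                if row and len(row) != expected_columns:
                    mismatches.append((csv_path.name, row_number, len(row)))
    return mismatches
```

An empty result means every non-blank row in every CSV file of the directory has the expected number of columns.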