Before using ModelArts ExeML to build a model, upload data to an OBS bucket.
Uploading Data to OBS
This operation uses the OBS console to upload data. For more information about how to create a bucket and upload files, see Creating a Bucket and Uploading a File.
Perform the following operations to import data to the dataset for model training and building.
- Log in to OBS Console and create a bucket.
- Upload the local data to the OBS bucket. If you have a large amount of data, use OBS Browser+ to upload data or folders. The uploaded data must meet the dataset requirements of the ExeML project.
Requirements on Datasets
- The name of files in a dataset cannot contain Chinese characters, plus signs (+), spaces, or tabs.
- Ensure that no damaged image exists. The supported image formats include JPG, JPEG, BMP, and PNG.
- Do not store data of different projects in the same dataset.
- Prepare sufficient data and balance each class of data. To achieve better results, prepare at least 100 images of each class in a training set for image classification.
- To ensure the prediction accuracy of models, the training samples must be similar to the actual application scenarios.
- To ensure the generalization capability of models, datasets should cover all possible scenarios.
Requirements for Files Uploaded to OBS
- If you do not need to upload training data in advance, create an empty folder to store files generated in the future, for example, /bucketName/data-cat.
- If you need to upload images to be labeled in advance, create an empty folder and save the images in the folder. An example of the image directory structure is /bucketName/data-cat/cat.jpg.
- If you want to upload labeled images to the OBS bucket, upload them according to the following specifications: