Raw data usually does not meet training requirements as collected. For example, it may contain invalid or duplicate samples. To help you improve data quality, ModelArts provides the following capabilities:
- Auto Grouping: pre-classifies data through clustering so that you can label data based on the clustering results. This helps keep the number of samples for each label the same or nearly the same.
- Data Filtering: enables you to filter data based on sample attributes and auto grouping results.
- Data Feature Analysis: analyzes data features or labeling results, such as brightness and bounding box distribution, to help you assess data balance and improve model training results.
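
To make the filtering and feature analysis ideas above more concrete, the following is a minimal local sketch in plain Python. It does not use the ModelArts API or data format. The sample structure and all function names (`mean_brightness`, `drop_duplicates`, `filter_by_brightness`, `label_distribution`) are hypothetical and chosen only for illustration, and mean pixel intensity is assumed here as a simple stand-in for a brightness feature.

```python
import hashlib
from collections import Counter

# Hypothetical in-memory samples: grayscale pixel values (0-255) plus a label.
# Illustration only; this is not the ModelArts API or dataset format.
samples = [
    {"name": "a.png", "pixels": [10, 20, 30, 40], "label": "cat"},
    {"name": "b.png", "pixels": [200, 210, 220, 230], "label": "dog"},
    {"name": "c.png", "pixels": [10, 20, 30, 40], "label": "cat"},  # duplicate of a.png
    {"name": "d.png", "pixels": [120, 130, 140, 150], "label": "cat"},
    {"name": "e.png", "pixels": [0, 0, 0, 0], "label": "dog"},  # all black, treated as invalid
]


def mean_brightness(pixels):
    """Average pixel intensity, used as a simple brightness feature."""
    return sum(pixels) / len(pixels)


def drop_duplicates(items):
    """Keep the first sample for each distinct pixel content."""
    seen = set()
    unique = []
    for item in items:
        digest = hashlib.sha256(bytes(item["pixels"])).hexdigest()
        if digest not in seen:
            seen.add(digest)
            unique.append(item)
    return unique


def filter_by_brightness(items, low, high):
    """Keep samples whose mean brightness lies in [low, high]."""
    return [s for s in items if low <= mean_brightness(s["pixels"]) <= high]


def label_distribution(items):
    """Count samples per label to check class balance."""
    return Counter(s["label"] for s in items)


unique_samples = drop_duplicates(samples)
usable_samples = filter_by_brightness(unique_samples, low=20, high=220)
balance = label_distribution(usable_samples)
print(balance)  # Counter({'cat': 2, 'dog': 1})
```

In this sketch, the duplicate `c.png` and the all-black `e.png` are removed, and the remaining label counts show that `cat` has twice as many samples as `dog`, which is the kind of imbalance that feature analysis is meant to surface.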