Specifications for Importing a Manifest File

The manifest file defines the mapping between labeled objects and their content. In manifest import mode, a dataset is imported using a manifest file, which can be read from OBS. When importing a manifest file from OBS, ensure that you have permission to access the directory where the manifest file is stored.

Compiling a manifest file involves a number of requirements. Manifest import can be used to import new data from OBS, but it is generally used to migrate ModelArts data between regions or accounts. If you have labeled data in one region using ModelArts, you can obtain the manifest file of the published dataset from its output path and then use that file to import the dataset into ModelArts in another region or under another account. The imported data carries its labeling information and does not need to be labeled again, which improves development efficiency.

A manifest file that contains both the original file information and the labeling information can be used in labeling, training, and inference scenarios. A manifest file that contains only the original file information can be used in inference scenarios or to generate an unlabeled dataset. The manifest file must meet the following requirements:

Image Classification

{
    "source":"s3://path/to/image1.jpg",
    "usage":"TRAIN",
    "hard":"true",
    "hard-coefficient":0.8,
    "id":"0162005993f8065ef47eefb59d1e4970",
    "annotation": [
        {
            "type": "modelarts/image_classification",
            "name": "cat",
            "property": {
                "color":"white",
                "kind":"Persian cat"            
            },
            "hard":"true",
            "hard-coefficient":0.8,
            "annotated-by":"human",
            "creation-time":"2019-01-23 11:30:30"        
        },
        {
            "type": "modelarts/image_classification",
            "name":"animal",
            "annotated-by":"modelarts/active-learning",
            "confidence": 0.8,
            "creation-time":"2019-01-23 11:30:30"        
        }],
    "inference-loc":"/path/to/inference-output"
}
Table 1 Parameters

source (mandatory): URI of the object to be labeled. For details about data source types and examples, see Table 2.

usage (optional): Left blank by default. Possible values are as follows:

  • TRAIN: The object is used for training.
  • EVAL: The object is used for evaluation.
  • TEST: The object is used for testing.
  • INFERENCE: The object is used for inference.

If the value is left blank, you decide how to use the object.

id (optional): Sample ID exported from the system. You do not need to set this parameter when importing a sample.

annotation (optional): If left blank, the object is not labeled. The value of annotation is a list of objects. For details about their parameters, see Table 3.

inference-loc (optional): Available when the file is generated by an inference service, indicating the location of the inference result file.

Table 2 Data source types

  • OBS: "source":"s3://path-to-jpg"
  • Content: "source":"content://I love machine learning"

Table 3 annotation objects

type (mandatory): Label type. Possible values are as follows:

  • image_classification: image classification
  • text_classification: text classification
  • text_entity: named entity recognition
  • object_detection: object detection
  • audio_classification: sound classification
  • audio_content: speech labeling
  • audio_segmentation: speech paragraph labeling

name (mandatory for classification types, optional for other types): Label name. This example uses the image classification type.

id (mandatory for triplets, optional for other types): Label ID. The entity label ID of a triplet is in E+number format, for example, E1 and E2. The relationship label ID of a triplet is in R+number format, for example, R1 and R2.

property (optional): Labeling property. In this example, there are two properties: color and kind.

hard (optional): Whether the sample is a hard example. true indicates a hard example; false indicates it is not.

annotated-by (optional): Labeling source. The default value is human, indicating manual labeling.

creation-time (optional): Time when the labeling information was written, not the time when the manifest file was generated.

confidence (optional): Confidence score of machine labeling. The value ranges from 0 to 1.
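
Assuming that each sample in a manifest file is stored as one JSON object per line, the following Python sketch builds an image classification entry using the fields from Table 1 and Table 3 and appends it to a manifest file. The file name dataset.manifest, the OBS path, and the label value are placeholders.

import json
from datetime import datetime

def make_classification_sample(obs_uri, label, usage="TRAIN"):
    """Build one manifest entry (fields from Table 1 and Table 3)."""
    return {
        "source": obs_uri,                       # OBS URI of the image
        "usage": usage,                          # TRAIN / EVAL / TEST / INFERENCE
        "annotation": [
            {
                "type": "modelarts/image_classification",
                "name": label,                   # label name, mandatory for classification
                "annotated-by": "human",
                "creation-time": datetime.now().strftime("%Y-%m-%d %H:%M:%S"),
            }
        ],
    }

# Append one sample per line to a manifest file (illustrative path and label).
with open("dataset.manifest", "a", encoding="utf-8") as f:
    sample = make_classification_sample("s3://path/to/image1.jpg", "cat")
    f.write(json.dumps(sample, ensure_ascii=False) + "\n")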

Image Segmentation

{
    "annotation": [{
        "annotation-format": "PASCAL VOC",
        "type": "modelarts/image_segmentation",
        "annotation-loc": "s3://path/to/annotation/image1.xml",
        "creation-time": "2020-12-16 21:36:27",
        "annotated-by": "human"
    }],
    "usage": "train",
    "source": "s3://path/to/image1.jpg",
    "id": "16d196c19bf61994d7deccafa435398c",
    "sample-type": 0
}
Table 4 PASCAL VOC format parameters

folder (mandatory): Directory where the data source is located.

filename (mandatory): Name of the file to be labeled.

size (mandatory): Image pixel dimensions.

  • width: image width. This parameter is mandatory.
  • height: image height. This parameter is mandatory.
  • depth: number of image channels. This parameter is mandatory.

segmented (mandatory): Whether the image is segmented.

mask_source (optional): Path of the segmentation mask.

object (mandatory): Object detection information. One object{} block is generated for each labeled object.

  • name: type of the labeled content. This parameter is mandatory.
  • pose: shooting angle of the labeled content. This parameter is mandatory.
  • truncated: whether the labeled content is truncated (0 indicates that it is not truncated). This parameter is mandatory.
  • occluded: whether the labeled content is occluded (0 indicates that it is not occluded). This parameter is mandatory.
  • difficult: whether the labeled object is difficult to identify (0 indicates that it is easy to identify). This parameter is mandatory.
  • confidence: confidence score of the labeled object, ranging from 0 to 1. This parameter is optional.
  • bndbox: bounding box type. This parameter is mandatory. For details about the possible values, see Table 5.
  • mask_color: label color, represented as an RGB value. This parameter is mandatory.
Table 5 Bounding box types

polygon (Polygon): coordinates of points

  <x1>100</x1>
  <y1>100</y1>
  <x2>200</x2>
  <y2>100</y2>
  <x3>250</x3>
  <y3>150</y3>
  <x4>200</x4>
  <y4>200</y4>
  <x5>100</x5>
  <y5>200</y5>
  <x6>50</x6>
  <y6>150</y6>
  <x7>100</x7>
  <y7>100</y7>

Example:
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<annotation>
    <folder>NA</folder>
    <filename>image_0006.jpg</filename>
    <source>
        <database>Unknown</database>
    </source>
    <size>
        <width>230</width>
        <height>300</height>
        <depth>3</depth>
    </size>
    <segmented>1</segmented>
    <mask_source>obs://xianao/out/dataset-8153-Jmf5ylLjRmSacj9KevS/annotation/V001/segmentationClassRaw/image_0006.png</mask_source>
    <object>
        <name>bike</name>
        <pose>Unspecified</pose>
        <truncated>0</truncated>
        <difficult>0</difficult>
        <mask_color>193,243,53</mask_color>
        <occluded>0</occluded>
        <polygon>
            <x1>71</x1>
            <y1>48</y1>
            <x2>75</x2>
            <y2>73</y2>
            <x3>49</x3>
            <y3>69</y3>
            <x4>68</x4>
            <y4>92</y4>
            <x5>90</x5>
            <y5>101</y5>
            <x6>45</x6>
            <y6>110</y6>
            <x7>71</x7>
            <y7>48</y7>
        </polygon>
    </object>
</annotation>
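
As a reading aid, the following Python sketch parses a PASCAL VOC segmentation annotation like the one above with the standard xml.etree.ElementTree module, collecting the mask path, label color, and polygon points of each object. The local file name image_0006.xml is an assumption; in practice the file is obtained from the path in annotation-loc.

import xml.etree.ElementTree as ET

# Parse a PASCAL VOC segmentation annotation (illustrative local file name).
root = ET.parse("image_0006.xml").getroot()

mask_source = root.findtext("mask_source")
for obj in root.findall("object"):
    name = obj.findtext("name")
    mask_color = obj.findtext("mask_color")          # "R,G,B"
    polygon = obj.find("polygon")
    # Collect (x, y) pairs in order: x1/y1, x2/y2, ...
    points = []
    i = 1
    while polygon is not None and polygon.find(f"x{i}") is not None:
        x = int(polygon.findtext(f"x{i}"))
        y = int(polygon.findtext(f"y{i}"))
        points.append((x, y))
        i += 1
    print(name, mask_color, points, mask_source)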

Text Classification

{
    "source": "content://I like this product ",
    "id":"XGDVGS",
    "annotation": [
        {
            "type": "modelarts/text_classification",
            "name": " positive",
            "annotated-by": "human",
            "creation-time": "2019-01-23 11:30:30"        
        } ]
}

The text following content:// in source is the text to be labeled. The other parameters are the same as those described in Image Classification. For details, see Table 1.

Named Entity Recognition

{
    "source":"content://Michael Jordan is the most famous basketball player in the world.",
    "usage":"TRAIN",
    "annotation":[
        {
            "type":"modelarts/text_entity",
            "name":"Person",
            "property":{
                "@modelarts:start_index":0,
                "@modelarts:end_index":14
            },
            "annotated-by":"human",
            "creation-time":"2019-01-23 11:30:30"
        },
        {
            "type":"modelarts/text_entity",
            "name":"Category",
            "property":{
                "@modelarts:start_index":34,
                "@modelarts:end_index":44
            },
            "annotated-by":"human",
            "creation-time":"2019-01-23 11:30:30"
        }
    ]
}

The parameters such as source, usage, and annotation are the same as those described in Image Classification. For details, see Table 1.

Table 6 describes the property parameters. For example, if you want to extract Michael from "source":"content://Michael Jordan", the value of start_index is 0 and that of end_index is 7.

Table 6 property parameters

@modelarts:start_index (Integer): Start position of the text. The value starts from 0, and the character at start_index is included.

@modelarts:end_index (Integer): End position of the text. The character at end_index is not included.
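
The index semantics can be checked with a short Python sketch: strip the content:// prefix from source and slice the text with start_index (inclusive) and end_index (exclusive). The sample below reuses the named entity example above.

# Extract the labeled entity from a text sample using the index semantics
# in Table 6: start_index is inclusive, end_index is exclusive.
sample = {
    "source": "content://Michael Jordan is the most famous basketball player in the world.",
    "annotation": [
        {
            "type": "modelarts/text_entity",
            "name": "Person",
            "property": {"@modelarts:start_index": 0, "@modelarts:end_index": 14},
        }
    ],
}

text = sample["source"][len("content://"):]      # strip the content:// prefix
for ann in sample["annotation"]:
    prop = ann["property"]
    span = text[prop["@modelarts:start_index"]:prop["@modelarts:end_index"]]
    print(ann["name"], "->", span)                # Person -> Michael Jordan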

Text Triplet

{
    "source":"content://"Three Body" is a series of long science fiction novels created by Liu Cix.",
    "usage":"TRAIN",
    "annotation":[
        {
            "type":"modelarts/text_entity",
            "name":"Person",
            "id":"E1",
            "property":{
                "@modelarts:start_index":67,
                "@modelarts:end_index":74
            },
            "annotated-by":"human",
            "creation-time":"2019-01-23 11:30:30"
        },
        {
            "type":"modelarts/text_entity",
            "name":"Book",
            "id":"E2",
            "property":{
                "@modelarts:start_index":0,
                "@modelarts:end_index":12
            },
            "annotated-by":"human",
            "creation-time":"2019-01-23 11:30:30"
        },
        {
            "type":"modelarts/text_triplet",
            "name":"Author",
            "id":"R1",
            "property":{
                "@modelarts:from":"E1",
                "@modelarts:to":"E2"
            },
            "annotated-by":"human",
            "creation-time":"2019-01-23 11:30:30"
        },
        {
            "type":"modelarts/text_triplet",
            "name":"Works",
            "id":"R2",
            "property":{
                "@modelarts:from":"E2",
                "@modelarts:to":"E1"
            },
            "annotated-by":"human",
            "creation-time":"2019-01-23 11:30:30"
        }
    ]
}

The parameters such as source, usage, and annotation are the same as those described in Image Classification. For details, see Table 1.

Table 7 describes the property parameters. @modelarts:start_index and @modelarts:end_index have the same meaning as in named entity recognition. In this example, with source set to content://"Three Body" is a series of long science fiction novels created by Liu Cix., Liu Cix is a Person entity, "Three Body" is a Book entity, the person is the author of the book, and the book is a work of the person.

Table 7 property parameters

@modelarts:start_index (Integer): Start position of the entity text. The value starts from 0, and the character at start_index is included.

@modelarts:end_index (Integer): End position of the entity text. The character at end_index is not included.

@modelarts:from (String): ID of the entity at which the triplet relationship starts.

@modelarts:to (String): ID of the entity to which the triplet relationship points.
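
The following Python sketch illustrates how the triplet annotations fit together: entity IDs (E1, E2) are mapped to their text spans, and each text_triplet annotation is resolved through @modelarts:from and @modelarts:to. The file name triplet.manifest is a placeholder, and the sample is assumed to be stored as one JSON object per line.

import json

# Resolve the triplet relationships in the example above.
with open("triplet.manifest", encoding="utf-8") as f:       # illustrative file name
    sample = json.loads(f.readline())

text = sample["source"][len("content://"):]
entities = {}
relations = []
for ann in sample["annotation"]:
    prop = ann["property"]
    if ann["type"] == "modelarts/text_entity":
        entities[ann["id"]] = text[prop["@modelarts:start_index"]:prop["@modelarts:end_index"]]
    elif ann["type"] == "modelarts/text_triplet":
        relations.append((prop["@modelarts:from"], ann["name"], prop["@modelarts:to"]))

for src, rel, dst in relations:
    print(entities[src], f"--{rel}-->", entities[dst])       # e.g. Liu Cix --Author--> "Three Body"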

Object Detection

{
    "source":"s3://path/to/image1.jpg",
    "usage":"TRAIN",
    "hard":"true",
    "hard-coefficient":0.8,
    "annotation": [
        {
            "type":"modelarts/object_detection",
            "annotation-loc": "s3://path/to/annotation1.xml",
            "annotation-format":"PASCAL VOC",
            "annotated-by":"human",
            "creation-time":"2019-01-23 11:30:30"                
        }]
}
Table 8 PASCAL VOC format parameters

folder (mandatory): Directory where the data source is located.

filename (mandatory): Name of the file to be labeled.

size (mandatory): Image pixel dimensions.

  • width: image width. This parameter is mandatory.
  • height: image height. This parameter is mandatory.
  • depth: number of image channels. This parameter is mandatory.

segmented (mandatory): Whether the image is segmented.

object (mandatory): Object detection information. One object{} block is generated for each labeled object.

  • name: type of the labeled content. This parameter is mandatory.
  • pose: shooting angle of the labeled content. This parameter is mandatory.
  • truncated: whether the labeled content is truncated (0 indicates that it is not truncated). This parameter is mandatory.
  • occluded: whether the labeled content is occluded (0 indicates that it is not occluded). This parameter is mandatory.
  • difficult: whether the labeled object is difficult to identify (0 indicates that it is easy to identify). This parameter is mandatory.
  • confidence: confidence score of the labeled object, ranging from 0 to 1. This parameter is optional.
  • bndbox: bounding box type. This parameter is mandatory. For details about the possible values, see Table 9.
Table 9 Bounding box types

point (Point): coordinates of a point

  <x>100</x>
  <y>100</y>

line (Line): coordinates of points

  <x1>100</x1>
  <y1>100</y1>
  <x2>200</x2>
  <y2>200</y2>

bndbox (Rectangle): coordinates of the upper left and lower right points

  <xmin>100</xmin>
  <ymin>100</ymin>
  <xmax>200</xmax>
  <ymax>200</ymax>

polygon (Polygon): coordinates of points

  <x1>100</x1>
  <y1>100</y1>
  <x2>200</x2>
  <y2>100</y2>
  <x3>250</x3>
  <y3>150</y3>
  <x4>200</x4>
  <y4>200</y4>
  <x5>100</x5>
  <y5>200</y5>
  <x6>50</x6>
  <y6>150</y6>

circle (Circle): center coordinates and radius

  <cx>100</cx>
  <cy>100</cy>
  <r>50</r>

Example:
<annotation>
   <folder>test_data</folder>
   <filename>260730932.jpg</filename>
   <size>
       <width>767</width>
       <height>959</height>
       <depth>3</depth>
   </size>
   <segmented>0</segmented>
   <object>
       <name>point</name>
       <pose>Unspecified</pose>
       <truncated>0</truncated>
       <occluded>0</occluded>
       <difficult>0</difficult>
       <point>
           <x1>456</x1>
           <y1>596</y1>
       </point>
   </object>
   <object>
       <name>line</name>
       <pose>Unspecified</pose>
       <truncated>0</truncated>
       <occluded>0</occluded>
       <difficult>0</difficult>
       <line>
           <x1>133</x1>
           <y1>651</y1>
           <x2>229</x2>
           <y2>561</y2>
       </line>
   </object>
   <object>
       <name>bag</name>
       <pose>Unspecified</pose>
       <truncated>0</truncated>
       <occluded>0</occluded>
       <difficult>0</difficult>
       <bndbox>
           <xmin>108</xmin>
           <ymin>101</ymin>
           <xmax>251</xmax>
           <ymax>238</ymax>
       </bndbox>
   </object>
   <object>
       <name>boots</name>
       <pose>Unspecified</pose>
       <truncated>0</truncated>
       <occluded>0</occluded>
       <difficult>0</difficult>
       <hard-coefficient>0.8</hard-coefficient>
       <polygon>
           <x1>373</x1>
           <y1>264</y1>
           <x2>500</x2>
           <y2>198</y2>
           <x3>437</x3>
           <y3>76</y3>
           <x4>310</x4>
           <y4>142</y4>
       </polygon>
   </object>
   <object>
       <name>circle</name>
       <pose>Unspecified</pose>
       <truncated>0</truncated>
       <occluded>0</occluded>
       <difficult>0</difficult>
       <circle>
           <cx>405</cx>
           <cy>170</cy>
           <r>100</r>
       </circle>
   </object>
</annotation>
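
For reference, a Python sketch using xml.etree.ElementTree can read an annotation file like the example above and extract whichever shape from Table 9 each object uses. The local file name annotation1.xml is a placeholder for the file referenced by annotation-loc.

import xml.etree.ElementTree as ET

# Extract each object's shape (bndbox, point, line, polygon, or circle).
SHAPES = ("bndbox", "point", "line", "polygon", "circle")

root = ET.parse("annotation1.xml").getroot()     # illustrative local file name
for obj in root.findall("object"):
    name = obj.findtext("name")
    for shape in SHAPES:
        node = obj.find(shape)
        if node is not None:
            coords = {child.tag: float(child.text) for child in node}
            print(name, shape, coords)
            break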

Sound Classification

{
"source":
"s3://path/to/pets.wav", 
    "annotation": [
        {
            "type": "modelarts/audio_classification",
            "name":"cat",    
            "annotated-by":"human",
            "creation-time":"2019-01-23 11:30:30"
        } 
    ]
}

The parameters such as source, usage, and annotation are the same as those described in Image Classification. For details, see Table 1.

Speech Labeling

{
    "source":"s3://path/to/audio1.wav",
    "annotation":[
        {
            "type":"modelarts/audio_content",
            "property":{
                "@modelarts:content":"Today is a good day."
            },
            "annotated-by":"human",
            "creation-time":"2019-01-23 11:30:30"
        }
    ]
}

Speech Paragraph Labeling

{
    "source":"s3://path/to/audio1.wav",
    "usage":"TRAIN",
    "annotation":[
        {
            "type":"modelarts/audio_segmentation",
            "property":{
                "@modelarts:start_time":"00:01:10.123",
                "@modelarts:end_time":"00:01:15.456",
                "@modelarts:source":"Tom",
                "@modelarts:content":"How are you?"
            },
            "annotated-by":"human",
            "creation-time":"2019-01-23 11:30:30"
        },
        {
            "type":"modelarts/audio_segmentation",
            "property":{
                "@modelarts:start_time":"00:01:22.754",
                "@modelarts:end_time":"00:01:24.145",
                "@modelarts:source":"Jerry",
                "@modelarts:content":"I'm fine, thank you."
            },
            "annotated-by":"human",
            "creation-time":"2019-01-23 11:30:30"
        }
    ]
}
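
The @modelarts:start_time and @modelarts:end_time values use the hh:mm:ss.SSS format. A small Python helper, shown below as a sketch, converts them to seconds, for example to compute the length of a labeled segment.

# Convert the @modelarts:start_time / @modelarts:end_time strings
# ("hh:mm:ss.SSS") of a speech segment into seconds.
def to_seconds(timestamp: str) -> float:
    hours, minutes, seconds = timestamp.split(":")
    return int(hours) * 3600 + int(minutes) * 60 + float(seconds)

start = to_seconds("00:01:10.123")
end = to_seconds("00:01:15.456")
print(f"segment length: {end - start:.3f} s")    # segment length: 5.333 s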

Video Labeling

{
	"annotation": [{
		"annotation-format": "PASCAL VOC",
		"type": "modelarts/object_detection",
		"annotation-loc": "s3://path/to/annotation1_t1.473722.xml",
		"creation-time": "2020-10-09 14:08:24",
		"annotated-by": "human"
	}],
	"usage": "train",
	"property": {
		"@modelarts:parent_duration": 8,
		"@modelarts:parent_source": "s3://path/to/annotation1.mp4",
		"@modelarts:time_in_video": 1.473722
	},
	"source": "s3://input/path/to/annotation1_t1.473722.jpg",
	"id": "43d88677c1e9a971eeb692a80534b5d5",
	"sample-type": 0
}
Table 11 property parameters

@modelarts:parent_duration (Double): Duration of the labeled video, in seconds.

@modelarts:time_in_video (Double): Timestamp of the labeled video frame, in seconds.

@modelarts:parent_source (String): OBS path of the labeled video.
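
Assuming the manifest stores one JSON object per line, the following Python sketch groups labeled frame samples by the video they were extracted from, using @modelarts:parent_source and @modelarts:time_in_video from Table 11. The file name video.manifest is a placeholder.

import json
from collections import defaultdict

# Group labeled frame samples by their parent video (illustrative manifest name).
frames_by_video = defaultdict(list)
with open("video.manifest", encoding="utf-8") as f:
    for line in f:
        sample = json.loads(line)
        prop = sample["property"]
        frames_by_video[prop["@modelarts:parent_source"]].append(
            (prop["@modelarts:time_in_video"], sample["source"])
        )

for video, frames in frames_by_video.items():
    for t, frame in sorted(frames):
        print(f"{video} @ {t:.6f}s -> {frame}")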

Table 12 PASCAL VOC format parameters

folder (mandatory): Directory where the data source is located.

filename (mandatory): Name of the file to be labeled.

size (mandatory): Image pixel dimensions.

  • width: image width. This parameter is mandatory.
  • height: image height. This parameter is mandatory.
  • depth: number of image channels. This parameter is mandatory.

segmented (mandatory): Whether the image is segmented.

object (mandatory): Object detection information. One object{} block is generated for each labeled object.

  • name: type of the labeled content. This parameter is mandatory.
  • pose: shooting angle of the labeled content. This parameter is mandatory.
  • truncated: whether the labeled content is truncated (0 indicates that it is not truncated). This parameter is mandatory.
  • occluded: whether the labeled content is occluded (0 indicates that it is not occluded). This parameter is mandatory.
  • difficult: whether the labeled object is difficult to identify (0 indicates that it is easy to identify). This parameter is mandatory.
  • confidence: confidence score of the labeled object, ranging from 0 to 1. This parameter is optional.
  • bndbox: bounding box type. This parameter is mandatory. For details about the possible values, see Table 13.
Table 13 Bounding box types

point (Point): coordinates of a point

  <x>100</x>
  <y>100</y>

line (Line): coordinates of points

  <x1>100</x1>
  <y1>100</y1>
  <x2>200</x2>
  <y2>200</y2>

bndbox (Rectangle): coordinates of the upper left and lower right points

  <xmin>100</xmin>
  <ymin>100</ymin>
  <xmax>200</xmax>
  <ymax>200</ymax>

polygon (Polygon): coordinates of points

  <x1>100</x1>
  <y1>100</y1>
  <x2>200</x2>
  <y2>100</y2>
  <x3>250</x3>
  <y3>150</y3>
  <x4>200</x4>
  <y4>200</y4>
  <x5>100</x5>
  <y5>200</y5>
  <x6>50</x6>
  <y6>150</y6>

circle (Circle): center coordinates and radius

  <cx>100</cx>
  <cy>100</cy>
  <r>50</r>

Example:
<annotation>
   <folder>test_data</folder>
   <filename>260730932_t1.473722.jpg</filename>
   <size>
       <width>767</width>
       <height>959</height>
       <depth>3</depth>
   </size>
   <segmented>0</segmented>
   <object>
       <name>point</name>
       <pose>Unspecified</pose>
       <truncated>0</truncated>
       <occluded>0</occluded>
       <difficult>0</difficult>
       <point>
           <x1>456</x1>
           <y1>596</y1>
       </point>
   </object>
   <object>
       <name>line</name>
       <pose>Unspecified</pose>
       <truncated>0</truncated>
       <occluded>0</occluded>
       <difficult>0</difficult>
       <line>
           <x1>133</x1>
           <y1>651</y1>
           <x2>229</x2>
           <y2>561</y2>
       </line>
   </object>
   <object>
       <name>bag</name>
       <pose>Unspecified</pose>
       <truncated>0</truncated>
       <occluded>0</occluded>
       <difficult>0</difficult>
       <bndbox>
           <xmin>108</xmin>
           <ymin>101</ymin>
           <xmax>251</xmax>
           <ymax>238</ymax>
       </bndbox>
   </object>
   <object>
       <name>boots</name>
       <pose>Unspecified</pose>
       <truncated>0</truncated>
       <occluded>0</occluded>
       <difficult>0</difficult>
       <hard-coefficient>0.8</hard-coefficient>
       <polygon>
           <x1>373</x1>
           <y1>264</y1>
           <x2>500</x2>
           <y2>198</y2>
           <x3>437</x3>
           <y3>76</y3>
           <x4>310</x4>
           <y4>142</y4>
       </polygon>
   </object>
   <object>
       <name>circle</name>
       <pose>Unspecified</pose>
       <truncated>0</truncated>
       <occluded>0</occluded>
       <difficult>0</difficult>
       <circle>
           <cx>405</cx>
           <cy>170</cy>
           <r>100</r>
       </circle>
   </object>
</annotation>