The bucket inventory function periodically generates lists of object metadata for a bucket. Inventories help you track the status of the objects in the bucket.
An inventory is a CSV file. Inventory files are automatically uploaded to the specified bucket.
You can limit an inventory to objects that share an object name prefix. You can also set the inventory generation interval and choose whether the inventory file lists all object versions or only current versions. The object metadata you can include in an inventory covers the object size, last modification time, storage class, ETag, multipart upload status, encryption status, and replication status.
Table 1 lists all possible metadata fields that an inventory file can contain.
| Metadata | Description |
| --- | --- |
| Bucket | Name of the source bucket |
| Key | Name of an object. Each object in a bucket has a unique key. Object names in the inventory file are URL-encoded using UTF-8 and must be decoded before you can use them. |
| VersionId | Object version ID. This field is not included in the inventory file if ObjectVersions in the inventory configuration is set to Current version only. |
| IsLatest | This field is set to True if the object version is the latest. This field is not included in the inventory file if ObjectVersions in the inventory configuration is set to Current version only. |
| IsDeleteMarker | When versioning is enabled for the source bucket, deleting an object creates a new object metadata record whose IsDeleteMarker is set to true. This field is not included in the inventory file if ObjectVersions in the inventory configuration is set to Current version only. |
| Size | Object size, in bytes |
| LastModifiedDate | Object creation date or last modification date |
| ETag | Hexadecimal MD5 digest of the object. The ETag identifies the object content and shows whether the content has changed. For example, if the ETag value is A when an object is uploaded but B when the object is downloaded, the object content has changed. |
| StorageClass | Storage class of an object |
| IsMultipartUploaded | Whether an object was uploaded using multipart upload |
| ReplicationStatus | Cross-region replication status of an object |
| EncryptionStatus | Encryption status of an object |
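Because object names in the Key field are URL-encoded, they need to be decoded before use. The following is a minimal sketch of that step, not an official client. It assumes that the CSV rows carry no header line and follow the column order given by a schema string such as the fileSchema value in manifest.json. The function name `read_inventory_rows` and the sample row are made up for illustration.

```python
import csv
import io
from urllib.parse import unquote


def read_inventory_rows(csv_text, file_schema):
    """Yield one dict per inventory record, with the Key field URL-decoded.

    Assumption: rows have no header line and follow the column order
    named in file_schema (a comma-separated list of metadata fields).
    """
    columns = [name.strip() for name in file_schema.split(",")]
    for row in csv.reader(io.StringIO(csv_text)):
        if not row:
            continue  # skip blank lines
        record = dict(zip(columns, row))
        # Keys are URL-encoded using UTF-8, so decode them before use.
        record["Key"] = unquote(record["Key"], encoding="utf-8")
        # Size is reported in bytes; convert it when a value is present.
        if record.get("Size"):
            record["Size"] = int(record["Size"])
        yield record


# Hypothetical sample data, for illustration only.
sample_schema = "Bucket,Key,Size,LastModifiedDate,ETag,StorageClass"
sample_csv = (
    '"user001","photos%2F2019%2Fa%20b.jpg","1024",'
    '"2019-01-03T12:00:00.000Z","abc123","STANDARD"\n'
)

records = list(read_inventory_rows(sample_csv, sample_schema))
print(records[0]["Key"])  # photos/2019/a b.jpg
```

Passing the schema in as a parameter keeps the reader independent of which optional fields a given inventory configuration includes.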
The name of an inventory file is in the following format:
destinationPrefix/sourceBucketName/inventoryId/yyyy-MM-dd'T'HH-mm'Z'/files/UUID_index.csv
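The name format above can be split back into its fields. The sketch below is one possible way to do this, under two assumptions that the format itself does not state: the bucket name and inventory ID contain no slashes (so the path is parsed from the right, because destinationPrefix may contain slashes), and the trailing Z in the timestamp denotes UTC. `InventoryFileName` and `parse_inventory_file_name` are hypothetical names.

```python
from datetime import datetime, timezone
from typing import NamedTuple


class InventoryFileName(NamedTuple):
    destination_prefix: str
    source_bucket: str
    inventory_id: str
    generated_at: datetime
    uuid: str
    index: int


def parse_inventory_file_name(key):
    """Split an inventory file name into its fields, reading from the right."""
    parts = key.split("/")
    if len(parts) < 5 or parts[-2] != "files":
        raise ValueError(f"not an inventory file name: {key!r}")
    file_name, timestamp = parts[-1], parts[-3]
    inventory_id, source_bucket = parts[-4], parts[-5]
    # Everything left of the bucket name is the destination prefix.
    destination_prefix = "/".join(parts[:-5])

    stem, extension = file_name.rsplit(".", 1)
    if extension != "csv":
        raise ValueError(f"unexpected extension: {extension!r}")
    uuid, index = stem.rsplit("_", 1)

    # Pattern yyyy-MM-dd'T'HH-mm'Z'; treating Z as UTC is an assumption.
    generated_at = datetime.strptime(timestamp, "%Y-%m-%dT%H-%MZ").replace(
        tzinfo=timezone.utc
    )
    return InventoryFileName(
        destination_prefix, source_bucket, inventory_id,
        generated_at, uuid, int(index),
    )


parsed = parse_inventory_file_name(
    "inventory/user001/test_id/2019-01-03T12-28Z/files/"
    "0000016813AF58E66806C1E2D7F15155_1.csv"
)
print(parsed.inventory_id, parsed.index)  # test_id 1
```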
If a bucket contains a large number of objects, multiple inventory files may be generated for a single inventory configuration. Generating these files takes some time. For example, for a bucket with 200,000 objects, it takes about 1.5 minutes to generate all inventory files. One or two hours after all inventory files are generated, a manifest.json file is generated. The manifest.json file describes all inventory files generated in this run, for example:
    {
      "sourceBucket": "user001",
      "destinationBucket": "bucket001",
      "version": "2019-01-03",
      "fileFormat": "CSV",
      "fileSchema": "Bucket,Key,Size,LastModifiedDate,ETag,StorageClass,IsMultipartUploaded,ReplicationStatus,EncryptionStatus",
      "files": [
        {
          "key": "inventory%2Fuser001%2Ftest_id%2F2019-01-03T12-28Z%2Ffiles%2F0000016813AF58E66806C1E2D7F15155_1.csv",
          "size": 6705647390,
          "inventoriedRecord": 70585762
        }
      ]
    }
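A consumer would typically read manifest.json first to learn the column order and the list of inventory files. The sketch below shows one way to do that with the Python standard library. It embeds the sample manifest from this section written as strict JSON, and assumes, based on that sample, that the file keys are URL-encoded. The function name `summarize_manifest` is hypothetical.

```python
import json
from urllib.parse import unquote

# Sample manifest from this section, written as strict JSON.
manifest_text = """
{
  "sourceBucket": "user001",
  "destinationBucket": "bucket001",
  "version": "2019-01-03",
  "fileFormat": "CSV",
  "fileSchema": "Bucket,Key,Size,LastModifiedDate,ETag,StorageClass,IsMultipartUploaded,ReplicationStatus,EncryptionStatus",
  "files": [
    {
      "key": "inventory%2Fuser001%2Ftest_id%2F2019-01-03T12-28Z%2Ffiles%2F0000016813AF58E66806C1E2D7F15155_1.csv",
      "size": 6705647390,
      "inventoriedRecord": 70585762
    }
  ]
}
"""


def summarize_manifest(text):
    """Return the column list, decoded file keys, and totals from a manifest."""
    manifest = json.loads(text)
    files = manifest["files"]
    return {
        "columns": manifest["fileSchema"].split(","),
        # Assumption: keys are URL-encoded, as they appear in the sample.
        "file_keys": [unquote(entry["key"]) for entry in files],
        "total_bytes": sum(entry["size"] for entry in files),
        "total_records": sum(entry["inventoriedRecord"] for entry in files),
    }


summary = summarize_manifest(manifest_text)
print(summary["file_keys"][0])
print(summary["total_records"])  # 70585762
```

The column list taken from fileSchema can then be used to label the fields of each row in the listed CSV files.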
The name of the manifest.json file is as follows (for details about each field, see Inventory File Name):
destinationPrefix/sourceBucketName/inventoryId/yyyy-MM-dd'T'HH-mm'Z'/manifest.json
The symlink.txt file records the paths of the inventory files. It helps you locate all inventory files quickly in big data scenarios. Apache Hive is compatible with the symlink.txt file: Hive can automatically find the symlink.txt file and the inventory files recorded in it.
The name of the symlink.txt file is as follows (for details about each field, see Inventory File Name):
destinationPrefix/sourceBucketName/inventoryId/hive/dt=YYYY-MM-DD-00-00/symlink.txt
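To look up the symlink.txt file for a given day, the name can be assembled from the same fields. This is a small sketch only: it assumes the -00-00 suffix of the dt partition is literal, as the format above suggests, and the function name `symlink_key` and the argument values are made up for illustration.

```python
from datetime import date


def symlink_key(destination_prefix, source_bucket, inventory_id, day):
    """Build the symlink.txt name for one day's inventory.

    Assumption: the partition is always dt=YYYY-MM-DD-00-00, with the
    trailing -00-00 taken literally from the documented format.
    """
    partition = f"dt={day:%Y-%m-%d}-00-00"
    parts = [
        destination_prefix.strip("/"),
        source_bucket,
        inventory_id,
        "hive",
        partition,
        "symlink.txt",
    ]
    # Drop an empty prefix so the name never starts with a slash.
    return "/".join(part for part in parts if part)


key = symlink_key("inventory", "user001", "test_id", date(2019, 1, 3))
print(key)  # inventory/user001/test_id/hive/dt=2019-01-03-00-00/symlink.txt
```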