Bucket Inventory Overview

The bucket inventory function periodically generates a list of object metadata for a bucket. Inventories help you understand the status of the objects in the bucket.

An inventory is a CSV file. Inventory files are automatically uploaded to the specified bucket.

You can generate inventories for all objects that share an object name prefix. You can also set the inventory generation interval and choose whether the inventory file lists all object versions. The object metadata you can include in an inventory are the object size, last modification time, storage class, ETag, multipart upload status, encryption status, and replication status.

Constraints

Content in an Inventory File

Table 1 lists all possible metadata fields that an inventory file can contain.

Table 1 Object metadata fields allowed in an inventory file

Metadata: Description

Bucket: Name of the source bucket.

Key: Name of an object. Each object in a bucket has a unique key. Object names in the inventory file are URL-encoded using UTF-8 and must be decoded before you can use them.

VersionId: Object version ID. This field is not included in the inventory file if ObjectVersions in the inventory configuration is set to Current version only.

IsLatest: Set to True if the object version is the latest. This field is not included in the inventory file if ObjectVersions in the inventory configuration is set to Current version only.

IsDeleteMarker: When versioning is enabled for the source bucket, deleting an object creates a new object metadata record whose IsDeleteMarker is set to true. This field is not included in the inventory file if ObjectVersions in the inventory configuration is set to Current version only.

Size: Object size, in bytes.

LastModifiedDate: Object creation date or last modification date.

ETag: Hexadecimal digest of the object MD5. The ETag uniquely identifies the object content and indicates whether the content has changed. For example, if the ETag is A when an object is uploaded but B when the object is downloaded, the object content has changed.

StorageClass: Storage class of an object.

IsMultipartUploaded: Whether an object was uploaded using multipart upload.

ReplicationStatus: Cross-region replication status of an object.

EncryptionStatus: Encryption status of an object.
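
Because object names in the Key field are URL-encoded, each record has to be decoded after it is read. The following is a minimal sketch in Python, not an official client. It assumes that the CSV file has no header row and that the columns appear in the order given by FIELDS, which is a hypothetical order chosen for illustration. The sample record is made up.

```python
import csv
import io
from urllib.parse import unquote

# Hypothetical column order (assumption); use the fields configured for your inventory.
FIELDS = ["Bucket", "Key", "Size", "LastModifiedDate", "ETag", "StorageClass"]


def read_inventory_rows(csv_text, fields=FIELDS):
    """Yield one dict per inventory record, with Key decoded and Size as an int."""
    reader = csv.DictReader(io.StringIO(csv_text), fieldnames=fields)
    for row in reader:
        # Object names are URL-encoded using UTF-8 in the inventory file.
        row["Key"] = unquote(row["Key"], encoding="utf-8")
        row["Size"] = int(row["Size"])
        yield row


# Made-up record, for illustration only.
sample = (
    "user001,photos%2F2019%2Fcaf%C3%A9.jpg,1024,"
    "2019-01-03T12:28:00.000Z,0f343b0931126a20f133d67c2b018a3b,STANDARD\n"
)
rows = list(read_inventory_rows(sample))
print(rows[0]["Key"])
```

If the inventory was configured with a different set of fields, pass the matching list as the fields argument so that each value is mapped to the right name.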

Inventory File Name

The name of an inventory file is in the following format:

destinationPrefix/sourceBucketName/inventoryId/yyyy-MM-dd'T'HH-mm'Z'/files/UUID_index.csv
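
As an illustration of how this format breaks down, the sketch below splits a decoded inventory file name into its parts. It is only an example under stated assumptions: the function and class names are hypothetical, the trailing Z in the timestamp is taken to mean UTC, and the bucket name and inventory ID are assumed not to contain a slash.

```python
from datetime import datetime, timezone
from typing import NamedTuple


class InventoryFileName(NamedTuple):
    destination_prefix: str
    source_bucket: str
    inventory_id: str
    generated_at: datetime
    uuid: str
    index: int


def parse_inventory_file_name(name: str) -> InventoryFileName:
    """Split a decoded inventory file name into its components."""
    # Split from the right so a destination prefix containing "/" stays intact.
    prefix, bucket, inv_id, stamp, files_dir, filename = name.rsplit("/", 5)
    if files_dir != "files" or not filename.endswith(".csv"):
        raise ValueError(f"not an inventory file name: {name!r}")
    uuid, index = filename[: -len(".csv")].rsplit("_", 1)
    # Timestamp pattern yyyy-MM-dd'T'HH-mm'Z'; treating it as UTC is an assumption.
    generated_at = datetime.strptime(stamp, "%Y-%m-%dT%H-%MZ").replace(
        tzinfo=timezone.utc
    )
    return InventoryFileName(prefix, bucket, inv_id, generated_at, uuid, int(index))


parsed = parse_inventory_file_name(
    "inventory/user001/test_id/2019-01-03T12-28Z/files/"
    "0000016813AF58E66806C1E2D7F15155_1.csv"
)
print(parsed.source_bucket, parsed.inventory_id, parsed.index)
```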

The manifest.json File

If a bucket contains a large number of objects, multiple inventory files may be generated for a single inventory configuration, and generating them takes some time. For example, for a bucket with 200,000 objects, generating all inventory files takes about 1.5 minutes. A manifest.json file is generated one to two hours after all inventory files are generated. It contains information about all inventory files generated in that run.

The following is an example of a manifest.json file.
{
        "sourceBucket":"user001",
        "destinationBucket":"bucket001",
        "version":"2019-01-03",
        "fileFormat":"CSV",
        "fileSchema":"Bucket,Key,Size,LastModifiedDate,ETag,StorageClass,IsMultipartUploaded,ReplicationStatus,EncryptionStatus",
        "files":[
                {
                        "key":"inventory%2Fuser001%2Ftest_id%2F2019-01-03T12-28Z%2Ffiles%2F0000016813AF58E66806C1E2D7F15155_1.csv",
                        "size":6705647390,
                        "inventoriedRecord":70585762
                }
        ]
}
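
A manifest like the one above can be read with a standard JSON parser to find the inventory files of a run. The sketch below is a minimal Python example, not an official tool. It assumes that the key values are URL-encoded, as they appear in the example, and that fileSchema lists the CSV columns in order. The variable names are illustrative.

```python
import json
from urllib.parse import unquote

# Manifest content taken from the example, written as valid JSON.
manifest_text = """
{
  "sourceBucket": "user001",
  "destinationBucket": "bucket001",
  "version": "2019-01-03",
  "fileFormat": "CSV",
  "fileSchema": "Bucket,Key,Size,LastModifiedDate,ETag,StorageClass,IsMultipartUploaded,ReplicationStatus,EncryptionStatus",
  "files": [
    {
      "key": "inventory%2Fuser001%2Ftest_id%2F2019-01-03T12-28Z%2Ffiles%2F0000016813AF58E66806C1E2D7F15155_1.csv",
      "size": 6705647390,
      "inventoriedRecord": 70585762
    }
  ]
}
"""

manifest = json.loads(manifest_text)

# Column names, assumed to be in the same order as the CSV columns.
columns = manifest["fileSchema"].split(",")

# Decoded object names of the inventory files in the destination bucket.
inventory_files = [unquote(entry["key"]) for entry in manifest["files"]]

# Total number of records across all inventory files of this run.
total_records = sum(entry["inventoriedRecord"] for entry in manifest["files"])

print(columns)
print(inventory_files)
print(total_records)
```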

The name of the manifest.json file is as follows (for details about each field, see Inventory File Name):

destinationPrefix/sourceBucketName/inventoryId/yyyy-MM-dd'T'HH-mm'Z'/manifest.json

The symlink.txt File

The symlink.txt file records the paths of the inventory files, which helps you quickly locate all inventory files in big data scenarios. Apache Hive is compatible with symlink.txt: Hive can automatically find the file and the inventory files recorded in it.

The name of the symlink.txt file is as follows (for details about each field, see Inventory File Name):

destinationPrefix/sourceBucketName/inventoryId/hive/dt=YYYY-MM-DD-00-00/symlink.txt
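
To show how this name might be assembled for a given day, here is a small Python sketch. The function name and its parameters are hypothetical, and the sketch assumes that the trailing 00-00 in the dt partition value is a fixed literal, which this page does not state explicitly.

```python
from datetime import date


def symlink_path(destination_prefix, source_bucket, inventory_id, day):
    """Build the symlink.txt object name for one inventory date."""
    # Assumption: the "-00-00" suffix after the date is always literal.
    return (
        f"{destination_prefix}/{source_bucket}/{inventory_id}"
        f"/hive/dt={day:%Y-%m-%d}-00-00/symlink.txt"
    )


print(symlink_path("inventory", "user001", "test_id", date(2019, 1, 3)))
```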