Hudi

Hudi is the file organization layer of a data lake. It manages Parquet files, provides data lake capabilities on top of them, and supports multiple compute engines. It also provides insert, update, and delete (IUD) interfaces, as well as streaming primitives for inserting and updating data in HDFS datasets and for incrementally pulling data from them.

To use Hudi, ensure that the Spark2x service has been installed in the MRS cluster.
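The IUD interfaces and incremental pull described above can be sketched with the Hudi Spark data source. This is a minimal sketch and not the MRS reference procedure. It assumes that the Hudi Spark bundle is available to the Spark session. The helper names, the table name, and the default field names (id, ts, dt) are hypothetical. The hoodie.* keys are standard Hudi data source options.

```python
from typing import Dict

# Write operations that correspond to the insert, update, and delete interfaces.
VALID_OPERATIONS = {"insert", "upsert", "delete"}


def hudi_write_options(
    table_name: str,
    operation: str,
    record_key: str = "id",        # hypothetical column that identifies a record
    precombine_field: str = "ts",  # hypothetical column used to pick the latest version
    partition_field: str = "dt",   # hypothetical partition column
) -> Dict[str, str]:
    """Build Hudi data source options for one write operation."""
    if operation not in VALID_OPERATIONS:
        raise ValueError(f"unsupported operation: {operation}")
    return {
        "hoodie.table.name": table_name,
        "hoodie.datasource.write.operation": operation,
        "hoodie.datasource.write.recordkey.field": record_key,
        "hoodie.datasource.write.precombine.field": precombine_field,
        "hoodie.datasource.write.partitionpath.field": partition_field,
    }


def hudi_incremental_read_options(begin_instant: str) -> Dict[str, str]:
    """Build options that pull only the changes committed after begin_instant."""
    return {
        "hoodie.datasource.query.type": "incremental",
        "hoodie.datasource.read.begin.instanttime": begin_instant,
    }


def write_to_hudi(df, base_path: str, options: Dict[str, str]) -> None:
    """Apply one write operation; df is assumed to be a Spark DataFrame."""
    df.write.format("org.apache.hudi").options(**options).mode("append").save(base_path)


def read_incremental(spark, base_path: str, begin_instant: str):
    """Return a DataFrame of changes; spark is assumed to be a SparkSession."""
    options = hudi_incremental_read_options(begin_instant)
    return spark.read.format("org.apache.hudi").options(**options).load(base_path)
```

For example, write_to_hudi(df, path, hudi_write_options("trips", "upsert")) would insert new records and update existing ones that share a record key, while the same call with "delete" would remove the records whose keys appear in df.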

Figure 1 Basic architecture of Hudi

Feature

Key Technologies and Advantages

Two Types of Tables Supported by Hudi

Three Types of Views Supported by Hudi for Read Capabilities in Different Scenarios