When a Hudi MOR (Merge on Read) table is synchronized to Hive, two Hive tables are created: <table name>_rt and <table name>_ro. The table with the _rt suffix is the real-time view, and the table with the _ro suffix is the read-optimized view. For example, if the Hudi table synchronized to Hive is named test, two tables, test_rt and test_ro, are generated in Hive.
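The naming rule can be expressed as a small Scala function. This is a minimal sketch: the object and method names (MorHiveViews, viewNames) are hypothetical and not part of Hudi or Hive. It only derives the two view names from the Hudi table name.

```scala
// Hypothetical helper, not part of Hudi or Hive: derives the names of the two
// Hive tables that Hive sync creates for a Hudi MOR table.
object MorHiveViews {
  // Returns (real-time view, read-optimized view) for the given Hudi table name.
  def viewNames(hudiTable: String): (String, String) =
    (s"${hudiTable}_rt", s"${hudiTable}_ro")

  def main(args: Array[String]): Unit = {
    val (rt, ro) = viewNames("test")
    println(s"real-time view: $rt")
    println(s"read-optimized view: $ro")
  }
}
```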
Query the real-time view:
select count(*) from test_rt;
Incremental query on the real-time view:
-- This parameter does not need to be specified for Spark SQL.
set hive.input.format=org.apache.hudi.hadoop.hive.HoodieCombineHiveInputFormat;
set hoodie.test.consume.mode=INCREMENTAL;
-- Maximum number of commits to pull.
set hoodie.test.consume.max.commits=3;
-- Start commit of the incremental pull.
set hoodie.test.consume.start.timestamp=20201227153030;
select count(*) from default.test_rt where `_hoodie_commit_time`>'20201227153030';
To specify an end commit in addition to the start commit:
set hoodie.test.consume.mode=INCREMENTAL;
-- Start commit of the incremental pull.
set hoodie.test.consume.start.timestamp=20201227153030;
-- End commit of the incremental pull. If this parameter is not specified, the latest commit is used.
set hoodie.test.consume.end.timestamp=20210308212318;
select count(*) from default.test_rt where `_hoodie_commit_time`>'20201227153030';
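The incremental-query statements follow a fixed pattern: each setting key embeds the Hudi table name (test), while the query itself targets the real-time view (test_rt). The following is a minimal Scala sketch that generates these statements from the table name and commit times. The object, method, and parameter names (IncrementalQuery, statements, forHive, and so on) are hypothetical; the setting keys and values are taken from the examples in this section.

```scala
// Hypothetical helper, not part of Hudi: builds the SET and SELECT statements
// for an incremental query on the real-time view of a Hudi MOR table.
object IncrementalQuery {
  def statements(
      hudiTable: String,               // Hudi table name, e.g. "test" (not "test_rt")
      startTs: String,                 // start commit of the incremental pull
      endTs: Option[String] = None,    // end commit; latest commit if omitted
      maxCommits: Option[Int] = None,  // maximum number of commits to pull
      forHive: Boolean = false,        // add hive.input.format (not needed in Spark SQL)
      database: String = "default"
  ): Seq[String] = {
    val prefix = s"hoodie.$hudiTable.consume"
    val inputFormat =
      if (forHive) Seq("set hive.input.format=org.apache.hudi.hadoop.hive.HoodieCombineHiveInputFormat;")
      else Seq.empty[String]
    val settings =
      inputFormat ++
        Seq(s"set $prefix.mode=INCREMENTAL;", s"set $prefix.start.timestamp=$startTs;") ++
        maxCommits.map(n => s"set $prefix.max.commits=$n;") ++
        endTs.map(ts => s"set $prefix.end.timestamp=$ts;")
    // Filter on _hoodie_commit_time, as in the examples, so that only records
    // committed after the start commit are counted.
    settings :+
      s"select count(*) from $database.${hudiTable}_rt where `_hoodie_commit_time`>'$startTs';"
  }

  def main(args: Array[String]): Unit =
    statements("test", "20201227153030", endTs = Some("20210308212318")).foreach(println)
}
```

Running main prints the four statements of the end-commit example in the same order. Passing forHive = true and maxCommits = Some(3) instead reproduces the settings of the first incremental example; the order of SET statements does not affect the query.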
Query the read-optimized view:
select count(*) from test_ro;
To read the read-optimized view through the Spark DataSource API, set QUERY_TYPE_OPT_KEY to QUERY_TYPE_READ_OPTIMIZED_OPT_VAL:
import org.apache.hudi.DataSourceReadOptions._

spark.read.format("hudi")
  .option(QUERY_TYPE_OPT_KEY, QUERY_TYPE_READ_OPTIMIZED_OPT_VAL) // Set the query type to the read-optimized view.
  .load("/tmp/default/mor_bugx/*/*/*/*") // Path of the Hudi table to read. The table has three partition levels; the last wildcard matches the data files.
  .createTempView("mycall")
spark.sql("select * from mycall").show(100)