Yang, Tong 6182f91ba8 MRS component operation guide_normal 2.0.38.SP20 version
Reviewed-by: Hasko, Vladimir <vladimir.hasko@t-systems.com>
Co-authored-by: Yang, Tong <yangtong2@huawei.com>
Co-committed-by: Yang, Tong <yangtong2@huawei.com>
2022-12-09 14:55:21 +00:00

<a name="mrs_01_24098"></a>
<h1 class="topictitle1">Reading COW Table Views</h1>
<div id="body0000001151103117"><ul id="mrs_01_24098__ul540514510288"><li id="mrs_01_24098__li134051151102812">Reading the real-time view (using Hive and Spark SQL as examples): Directly query the Hudi table stored in Hive.<pre class="screen" id="mrs_01_24098__screen4363241182314">select count(*) from test;</pre>
</li></ul>
<ul id="mrs_01_24098__ul925257192812"><li id="mrs_01_24098__li192510576282">Reading the real-time view (using the Spark DataSource API as an example): This is similar to reading a common DataSource table.<p id="mrs_01_24098__p1547010511504"><a name="mrs_01_24098__li192510576282"></a><a name="li192510576282"></a><strong id="mrs_01_24098__b1009373132112239">QUERY_TYPE_OPT_KEY</strong> must be set to <strong id="mrs_01_24098__b96132044112239">QUERY_TYPE_SNAPSHOT_OPT_VAL</strong>.</p>
<pre class="screen" id="mrs_01_24098__screen13578121265919">import org.apache.hudi.DataSourceReadOptions._ // Provides QUERY_TYPE_OPT_KEY and QUERY_TYPE_SNAPSHOT_OPT_VAL.
spark.read.format("hudi")
.option(QUERY_TYPE_OPT_KEY, QUERY_TYPE_SNAPSHOT_OPT_VAL) // Set the query type to the real-time view.
.load("/tmp/default/cow_bugx/*/*/*/*") // Set the path of the Hudi table to be read. The current table has three levels of partitions.
.createTempView("mycall") // Register the result as a Spark temporary view.
spark.sql("select * from mycall").show(100)</pre>
</li></ul>
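<p>The option constants in the DataSource example above come from Hudi's <strong>DataSourceReadOptions</strong> object. As a minimal sketch, assuming the string values <strong>hoodie.datasource.query.type</strong> and <strong>snapshot</strong> behind those constants (verify them against the Hudi version in your cluster), the same read can be configured with a plain options map. The <strong>SnapshotReadOptions</strong> object is a hypothetical helper introduced for illustration and is not part of Hudi.</p>
<pre class="screen">// Hypothetical helper: plain-string form of the real-time (snapshot) read options.
// Assumption: these strings match QUERY_TYPE_OPT_KEY and QUERY_TYPE_SNAPSHOT_OPT_VAL.
object SnapshotReadOptions {
  val QueryTypeKey = "hoodie.datasource.query.type"
  val SnapshotValue = "snapshot"

  // Options map that can be passed to DataFrameReader.options(...).
  def options: Map[String, String] = Map(QueryTypeKey -&gt; SnapshotValue)
}

// Usage in spark-shell (requires a SparkSession named spark):
// spark.read.format("hudi")
//   .options(SnapshotReadOptions.options)
//   .load("/tmp/default/cow_bugx/*/*/*/*")
//   .createTempView("mycall")</pre>
<p>Keeping the options in a map can make it easier to switch the query type in one place without importing the constants.</p>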
<ul id="mrs_01_24098__ul1269011722918"><li id="mrs_01_24098__li569097132910">Reading the incremental view (using Hive as an example; <strong>test</strong> in the parameter names is the table name):<pre class="screen" id="mrs_01_24098__screen1795642015258">set hoodie.test.consume.mode=INCREMENTAL; -- Specify the incremental reading mode.
set hoodie.test.consume.max.commits=3; -- Specify the maximum number of commits to be consumed.
set hoodie.test.consume.start.timestamp=20201227153030; -- Specify the start commit of the incremental pull.
select count(*) from default.test where `_hoodie_commit_time`&gt;'20201227153030'; -- This filter is required. Its value is the start commit of the incremental pull.</pre>
</li></ul>
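<p>Because the table name is embedded in each parameter name, the Hive statements above change for every table. The following sketch generates them for a given table. It assumes the parameter pattern <strong>hoodie.&lt;table name&gt;.consume.*</strong> seen in the example, and <strong>HiveIncrementalQuery</strong> is a hypothetical helper, not a Hudi or Hive API.</p>
<pre class="screen">// Hypothetical helper: builds the Hive session statements for an incremental read.
// Assumption: the table name takes the place of "test" in the parameter names.
object HiveIncrementalQuery {
  def statements(db: String, table: String, startCommit: String, maxCommits: Int): Seq[String] = Seq(
    s"set hoodie.$table.consume.mode=INCREMENTAL;",
    s"set hoodie.$table.consume.max.commits=$maxCommits;",
    s"set hoodie.$table.consume.start.timestamp=$startCommit;",
    // The filter repeats the start commit, matching the required filtering condition.
    s"select count(*) from $db.$table where `_hoodie_commit_time`&gt;'$startCommit';"
  )
}

// Print the statements for the table used in the example (run in a Scala REPL or spark-shell).
HiveIncrementalQuery.statements("default", "test", "20201227153030", 3).foreach(println)</pre>
<p>Generating the statements from one start commit value keeps the <strong>start.timestamp</strong> setting and the <strong>_hoodie_commit_time</strong> filter in step.</p>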
<ul id="mrs_01_24098__ul977918108294"><li id="mrs_01_24098__li63941151115610">Reading the incremental view (using Spark SQL as an example):<pre class="screen" id="mrs_01_24098__screen1055905819578">set hoodie.test.consume.mode=INCREMENTAL; -- Specify the incremental reading mode.
set hoodie.test.consume.start.timestamp=20201227153030; -- Specify the start commit of the incremental pull.
set hoodie.test.consume.end.timestamp=20210308212318; -- Specify the end commit of the incremental pull. If this parameter is not specified, the latest commit is used.
select count(*) from default.test where `_hoodie_commit_time`&gt;'20201227153030'; -- This filter is required. Its value is the start commit of the incremental pull.</pre>
</li><li id="mrs_01_24098__li1177916103295">Reading the incremental view (using the Spark DataSource API as an example):<p id="mrs_01_24098__p1250192118478"><a name="mrs_01_24098__li1177916103295"></a><a name="li1177916103295"></a><strong id="mrs_01_24098__b111967639112239">QUERY_TYPE_OPT_KEY</strong> must be set to <strong id="mrs_01_24098__b1465126722112239">QUERY_TYPE_INCREMENTAL_OPT_VAL</strong>.</p>
<pre class="screen" id="mrs_01_24098__screen1084032313466">import org.apache.hudi.DataSourceReadOptions._ // Provides the option constants used below.
spark.read.format("hudi")
.option(QUERY_TYPE_OPT_KEY, QUERY_TYPE_INCREMENTAL_OPT_VAL) // Set the query type to the incremental mode.
.option(BEGIN_INSTANTTIME_OPT_KEY, "20210308212004") // Specify the start commit of the incremental pull.
.option(END_INSTANTTIME_OPT_KEY, "20210308212318") // Specify the end commit of the incremental pull.
.load("/tmp/default/cow_bugx/*/*/*/*") // Set the path of the Hudi table to be read. The current table has three levels of partitions.
.createTempView("mycall") // Register the result as a Spark temporary view.
spark.sql("select * from mycall where `_hoodie_commit_time`&gt;'20210308211131'") // Start the query. The statement is the same as the Hive incremental query statement.
.show(100, false)</pre>
</li></ul>
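<p>A mistyped instant time in the incremental DataSource read above is easy to overlook. The following sketch validates the begin and end instants before building the read options. It assumes the 14-digit <strong>yyyyMMddHHmmss</strong> form used in the examples and the plain-string option keys behind the <strong>DataSourceReadOptions</strong> constants (verify both against the Hudi version in your cluster). <strong>IncrementalReadOptions</strong> is a hypothetical helper, not part of Hudi.</p>
<pre class="screen">// Hypothetical helper: validates instant times and builds incremental read options.
// Assumptions: instants use the 14-digit yyyyMMddHHmmss form, and the option strings
// match QUERY_TYPE_OPT_KEY, BEGIN_INSTANTTIME_OPT_KEY, and END_INSTANTTIME_OPT_KEY.
object IncrementalReadOptions {
  private def isInstant(ts: String): Boolean = ts.matches("[0-9]{14}")

  def options(begin: String, end: Option[String] = None): Map[String, String] = {
    require(isInstant(begin), s"invalid begin instant: $begin")
    end.foreach { e =&gt;
      require(isInstant(e), s"invalid end instant: $e")
      // Fixed-width digit strings compare correctly as plain strings.
      require(begin &lt;= e, s"begin instant $begin is later than end instant $e")
    }
    Map(
      "hoodie.datasource.query.type" -&gt; "incremental",
      "hoodie.datasource.read.begin.instanttime" -&gt; begin
    ) ++ end.map("hoodie.datasource.read.end.instanttime" -&gt; _)
  }
}

// Usage in spark-shell (requires a SparkSession named spark):
// spark.read.format("hudi")
//   .options(IncrementalReadOptions.options("20210308212004", Some("20210308212318")))
//   .load("/tmp/default/cow_bugx/*/*/*/*")
//   .createTempView("mycall")</pre>
<p>When the end instant is omitted, the sketch leaves the end option unset so that the reader applies its default.</p>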
<ul id="mrs_01_24098__ul6286162417290"><li id="mrs_01_24098__li172868247297">Reading the read-optimized view: The read-optimized view of COW tables is equivalent to the real-time view.</li></ul>
</div>
<div>
<div class="familylinks">
<div class="parentlink"><strong>Parent topic:</strong> <a href="mrs_01_24037.html">Read</a></div>
</div>
</div>