Yang, Tong 6182f91ba8 MRS component operation guide_normal 2.0.38.SP20 version
Reviewed-by: Hasko, Vladimir <vladimir.hasko@t-systems.com>
Co-authored-by: Yang, Tong <yangtong2@huawei.com>
Co-committed-by: Yang, Tong <yangtong2@huawei.com>
2022-12-09 14:55:21 +00:00


<a name="mrs_01_24165"></a><a name="mrs_01_24165"></a>
<h1 class="topictitle1">Single-Table Concurrent Write</h1>
<div id="body0000001119780226"><div class="section" id="mrs_01_24165__section1782284465013"><h4 class="sectiontitle">Hudi Single-Table Concurrent Write Solution</h4><ol id="mrs_01_24165__ol931014493501"><li id="mrs_01_24165__li1931034918503">An external service (ZooKeeper or Hive MetaStore) is used as the distributed mutex lock service.</li><li id="mrs_01_24165__li153101749125012">Files can be written concurrently, but commits cannot. Each commit operation is encapsulated in a transaction.</li><li id="mrs_01_24165__li1831034918506">When a commit is performed, the system runs a conflict check. If the list of files modified by the current commit overlaps with the file list of a commit made after the instant time of the current write, the commit fails and the write operation is invalid.<p id="mrs_01_24165__p1954944919407"><a name="mrs_01_24165__li1831034918506"></a><a name="li1831034918506"></a><span><img id="mrs_01_24165__image4224550194018" src="en-us_image_0000001349090333.png"></span></p>
</li></ol>
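<p>The conflict check in the last step can be pictured as a set-overlap test. The following Python sketch is a simplified model for illustration only, not Hudi's implementation: the <strong>Commit</strong> class and the <strong>has_conflict</strong> function are hypothetical names, and instant times are assumed to be fixed-width timestamp strings that sort chronologically.</p>

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Commit:
    """Hypothetical stand-in for a commit; this is not a Hudi class."""
    instant_time: str          # fixed-width timestamp string, e.g. "20221209145521"
    modified_files: frozenset  # names of the files this commit wrote


def has_conflict(current, candidates):
    """Return True if a commit after current's instant time touched the same files."""
    for other in candidates:
        # Fixed-width timestamp strings compare chronologically.
        is_later = other.instant_time > current.instant_time
        if is_later and not current.modified_files.isdisjoint(other.modified_files):
            return True
    return False


writer_a = Commit("20221209145521", frozenset({"file1", "file2"}))
writer_b = Commit("20221209145600", frozenset({"file2", "file3"}))
writer_c = Commit("20221209145700", frozenset({"file4"}))

# writer_b committed later and also modified file2, so writer_a's commit would fail.
conflict_with_b = has_conflict(writer_a, [writer_b])
# writer_c touched a different file, so there is no overlap.
conflict_with_c = has_conflict(writer_a, [writer_c])
```

<p>In this model, two writers that modify disjoint sets of files can both commit, which is why concurrent writes to different partitions or file groups are the case most likely to succeed.</p>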
</div>
<div class="section" id="mrs_01_24165__section181975219531"><h4 class="sectiontitle">Precautions for Using the Concurrency Mechanism</h4><ol id="mrs_01_24165__ol933615619538"><li id="mrs_01_24165__li53364618532">For <strong id="mrs_01_24165__b44074231374">insert</strong> and <strong id="mrs_01_24165__b14411826772">bulk_insert</strong> operations, the current Hudi concurrency mechanism cannot guarantee that the primary key of the table is unique after data is written. You must ensure primary key uniqueness yourself.</li><li id="mrs_01_24165__li20336116145310">For incremental queries, data consumption and checkpoints may be out of order, because multiple concurrent write operations complete at different points in time.</li><li id="mrs_01_24165__li880119186387">Concurrent write is supported only after this feature is enabled.</li></ol>
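<p>Because <strong>insert</strong> and <strong>bulk_insert</strong> do not enforce key uniqueness, one option is to check each batch before writing it. The following Python sketch is an illustration under stated assumptions: <strong>duplicate_keys</strong> is a hypothetical helper, records are assumed to be dictionaries, and the check covers only a single in-memory batch. It does not detect keys that already exist in the table or that another concurrent writer is inserting.</p>

```python
from collections import Counter


def duplicate_keys(records, key_field):
    """Return the sorted primary key values that occur more than once in the batch."""
    counts = Counter(record[key_field] for record in records)
    return sorted(key for key, count in counts.items() if count > 1)


batch = [
    {"id": 1, "name": "a"},
    {"id": 2, "name": "b"},
    {"id": 1, "name": "c"},  # same primary key as the first record
]

# A non-empty result means the batch should be deduplicated before it is written.
repeated = duplicate_keys(batch, "id")
```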
</div>
<div class="section" id="mrs_01_24165__section1264615975419"><h4 class="sectiontitle">How to Use the Concurrency Mechanism</h4><ol id="mrs_01_24165__ol7621329185419"><li id="mrs_01_24165__li1262629185410">Enable the concurrent write mechanism.<p id="mrs_01_24165__p429122811019"><a name="mrs_01_24165__li1262629185410"></a><a name="li1262629185410"></a><strong id="mrs_01_24165__b1061903811542">hoodie.write.concurrency.mode=optimistic_concurrency_control</strong></p>
<p id="mrs_01_24165__p172919281904"><strong id="mrs_01_24165__b126229387549">hoodie.cleaner.policy.failed.writes=LAZY</strong></p>
</li><li id="mrs_01_24165__li1289334014547">Set the concurrency lock mode.<p id="mrs_01_24165__p979812521114"><a name="mrs_01_24165__li1289334014547"></a><a name="li1289334014547"></a>Hive MetaStore:</p>
<p id="mrs_01_24165__p197983521418"><strong id="mrs_01_24165__b834724714554">hoodie.write.lock.provider=org.apache.hudi.hive.HiveMetastoreBasedLockProvider</strong></p>
<p id="mrs_01_24165__p579816521115"><strong id="mrs_01_24165__b153481647175514">hoodie.write.lock.hivemetastore.database</strong>=&lt;<em id="mrs_01_24165__i151071519172311">database</em>_<em id="mrs_01_24165__i637231062310">name</em>&gt;</p>
<p id="mrs_01_24165__p479812521017"><strong id="mrs_01_24165__b1034914470553">hoodie.write.lock.hivemetastore.table</strong>=&lt;<em id="mrs_01_24165__i46414226234">table</em>_<em id="mrs_01_24165__i138851225172314">name</em>&gt;</p>
<p id="mrs_01_24165__p18565185614554">ZooKeeper:</p>
<p id="mrs_01_24165__p55514512577"><strong id="mrs_01_24165__b2242171121020">hoodie.write.lock.provider=org.apache.hudi.client.transaction.lock.ZookeeperBasedLockProvider</strong></p>
<p id="mrs_01_24165__p331017584584"><strong id="mrs_01_24165__b11193974100">hoodie.write.lock.zookeeper.url</strong>=<em id="mrs_01_24165__i13483210151010">&lt;zookeeper_url&gt;</em></p>
<p id="mrs_01_24165__p571718114593"><strong id="mrs_01_24165__b1392161416101">hoodie.write.lock.zookeeper.port</strong>=<em id="mrs_01_24165__i19330918201010">&lt;zookeeper_port&gt;</em></p>
<p id="mrs_01_24165__p1655111519579"><strong id="mrs_01_24165__b3206421141014">hoodie.write.lock.zookeeper.lock_key</strong>=<em id="mrs_01_24165__i949252401010">&lt;table_name&gt;</em></p>
<p id="mrs_01_24165__p161151742135813"><strong id="mrs_01_24165__b9131428111019">hoodie.write.lock.zookeeper.base_path</strong>=<em id="mrs_01_24165__i415115315108">&lt;table_path&gt;</em></p>
</li></ol>
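<p>The two steps above can be combined into a single set of write options. The following Python sketch is one way to assemble them: the property names and values are copied from the steps above, while the helper functions and the sample argument values are hypothetical.</p>

```python
# Settings from step 1: enable optimistic concurrency control.
OCC_BASE = {
    "hoodie.write.concurrency.mode": "optimistic_concurrency_control",
    "hoodie.cleaner.policy.failed.writes": "LAZY",
}


def hive_metastore_lock_options(database_name, table_name):
    """Build write options that use Hive MetaStore as the lock provider."""
    options = dict(OCC_BASE)
    options.update({
        "hoodie.write.lock.provider":
            "org.apache.hudi.hive.HiveMetastoreBasedLockProvider",
        "hoodie.write.lock.hivemetastore.database": database_name,
        "hoodie.write.lock.hivemetastore.table": table_name,
    })
    return options


def zookeeper_lock_options(zookeeper_url, zookeeper_port, table_name, table_path):
    """Build write options that use ZooKeeper as the lock provider."""
    options = dict(OCC_BASE)
    options.update({
        "hoodie.write.lock.provider":
            "org.apache.hudi.client.transaction.lock.ZookeeperBasedLockProvider",
        "hoodie.write.lock.zookeeper.url": zookeeper_url,
        "hoodie.write.lock.zookeeper.port": str(zookeeper_port),
        "hoodie.write.lock.zookeeper.lock_key": table_name,
        "hoodie.write.lock.zookeeper.base_path": table_path,
    })
    return options


# Sample values below are placeholders, not defaults.
zk_options = zookeeper_lock_options("zk-host", 2181, "sample_table", "/sample/table/path")
hms_options = hive_metastore_lock_options("sample_db", "sample_table")
```

<p>Assuming the write is issued through a PySpark DataFrame writer, the resulting dictionary could be passed with <strong>options(**zk_options)</strong> alongside the other Hudi write options. All concurrent writers to the same table should use the same lock provider and lock key so that they contend for the same lock.</p>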
<p id="mrs_01_24165__p079811529113">For more parameters, see <a href="mrs_01_24032.html">Configuration Reference</a>.</p>
<div class="caution" id="mrs_01_24165__note1588133691019"><span class="cautiontitle"><img src="public_sys-resources/caution_3.0-en-us.png"> </span><div class="cautionbody"><p id="mrs_01_24165__p88481442121020">If the cleaner policy for failed writes (<strong id="mrs_01_24165__b6847931121320">hoodie.cleaner.policy.failed.writes</strong>) is set to <strong id="mrs_01_24165__b1140333411315">LAZY</strong>, the system can only check whether written files have expired. It cannot detect or clear junk files generated by historical writes. That is, junk files are not cleared automatically in concurrent write scenarios.</p>
</div></div>
</div>
</div>
<div>
<div class="familylinks">
<div class="parentlink"><strong>Parent topic:</strong> <a href="mrs_01_24038.html">Data Management and Maintenance</a></div>
</div>
</div>