Yang, Tong 6182f91ba8 MRS component operation guide_normal 2.0.38.SP20 version
Reviewed-by: Hasko, Vladimir <vladimir.hasko@t-systems.com>
Co-authored-by: Yang, Tong <yangtong2@huawei.com>
Co-committed-by: Yang, Tong <yangtong2@huawei.com>
2022-12-09 14:55:21 +00:00


<a name="mrs_01_24089"></a>
<h1 class="topictitle1">Cleaning</h1>
<div id="body0000001104488268"><p id="mrs_01_24089__p161011423163211">Cleaning deletes data of old versions that is no longer required.</p>
<p id="mrs_01_24089__p8060118">Hudi runs a cleaner in the background that continuously deletes unneeded data of old versions. Set <strong id="mrs_01_24089__b123561012153714">hoodie.cleaner.policy</strong> to select the cleaning policy and <strong id="mrs_01_24089__b24381715193715">hoodie.cleaner.commits.retained</strong> to set the number of commits to retain.</p>
<p id="mrs_01_24089__p63519263813">You can use either of the following methods to perform cleaning:</p>
<ul id="mrs_01_24089__ul1215713174116"><li id="mrs_01_24089__li015761164114">Using Hudi CLI<p id="mrs_01_24089__p173606340425"><a name="mrs_01_24089__li015761164114"></a><a name="li015761164114"></a><strong id="mrs_01_24089__b95931311194217">cleans run --sparkMaster yarn --hoodieConfigs 'hoodie.cleaner.policy=KEEP_LATEST_COMMITS,hoodie.cleaner.commits.retained=1,hoodie.cleaner.incremental.mode=false,hoodie.keep.max.commits=3,hoodie.keep.min.commits=2'</strong></p>
</li><li id="mrs_01_24089__li4491111214115">Using APIs<p id="mrs_01_24089__p18602411144314"><a name="mrs_01_24089__li4491111214115"></a><a name="li4491111214115"></a><strong id="mrs_01_24089__b5621808477">spark-submit --master yarn --jars /opt/client/Hudi/hudi/lib/hudi-client-common-</strong><em id="mrs_01_24089__i12255131124713">xxx</em><strong id="mrs_01_24089__b47498326472">.jar --class org.apache.hudi.utilities.HoodieCleaner /opt/client/Hudi/hudi/lib/hudi-utilities_</strong><em id="mrs_01_24089__i7463143394717">xxx</em><strong id="mrs_01_24089__b1474943294713">.jar --target-base-path /tmp/default/tb_test_mor</strong></p>
</li></ul>
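<p>The value passed to <strong>--hoodieConfigs</strong> in the Hudi CLI command above is a comma-separated list of key=value pairs. The following is a minimal sketch of how that string might be assembled in a shell script. The helper name <strong>build_hoodie_configs</strong> is hypothetical and not part of Hudi, and the ordering check (retained commits below the minimum kept, which is below the maximum kept) is an assumption based on the example values, so verify it against your Hudi version.</p>

```shell
#!/usr/bin/env bash
# Sketch only: build_hoodie_configs is a hypothetical helper, not a Hudi command.
# It prints the comma-separated key=value list used with --hoodieConfigs.

build_hoodie_configs() {
  # Expect exactly three numeric arguments.
  [ "$#" -eq 3 ] || return 1
  local retained="$1"
  local min_keep="$2"
  local max_keep="$3"

  # Assumption: retained < keep.min < keep.max, as in the example values (1, 2, 3).
  if [ "$min_keep" -le "$retained" ] || [ "$max_keep" -le "$min_keep" ]; then
    return 1
  fi

  # Same keys and order as the CLI example in this topic.
  printf '%s,' \
    "hoodie.cleaner.policy=KEEP_LATEST_COMMITS" \
    "hoodie.cleaner.commits.retained=${retained}" \
    "hoodie.cleaner.incremental.mode=false" \
    "hoodie.keep.max.commits=${max_keep}"
  printf '%s\n' "hoodie.keep.min.commits=${min_keep}"
}

# Example: print the command to type at the Hudi CLI prompt.
if configs="$(build_hoodie_configs 1 2 3)"; then
  echo "cleans run --sparkMaster yarn --hoodieConfigs '${configs}'"
fi
```

<p>Keeping the three numbers as arguments makes it harder to mistype one of the keys when the retention settings change. The function returns a non-zero status without printing anything if the values are missing or out of order.</p>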
<p id="mrs_01_24089__p1925630164318">For details about other cleaning parameters, see <a href="mrs_01_24032.html">Configuration Reference</a>.</p>
</div>
<div>
<div class="familylinks">
<div class="parentlink"><strong>Parent topic:</strong> <a href="mrs_01_24038.html">Data Management and Maintenance</a></div>
</div>
</div>