Reviewed-by: Kacur, Michal <michal.kacur@t-systems.com> Co-authored-by: Yang, Tong <yangtong2@huawei.com> Co-committed-by: Yang, Tong <yangtong2@huawei.com>
<a name="mrs_01_24804"></a>
<h1 class="topictitle1">Clustering Configuration</h1>
<div id="body0000001535727658"><div class="note" id="mrs_01_24804__note8466173113919"><img src="public_sys-resources/note_3.0-en-us.png"><span class="notetitle"> </span><div class="notebody"><p id="mrs_01_24804__p246783183917">This section applies only to MRS 3.2.0 or later.</p>
<p id="mrs_01_24804__p1063145220441">Clustering is controlled by two strategy parameters: <strong id="mrs_01_24804__b3557180203119">hoodie.clustering.plan.strategy.class</strong>, which selects the file groups to be clustered, and <strong id="mrs_01_24804__b136671066313">hoodie.clustering.execution.strategy.class</strong>, which executes the resulting clustering plan. If <strong id="mrs_01_24804__b207991244113117">hoodie.clustering.plan.strategy.class</strong> is set to <strong id="mrs_01_24804__b1241365413116">SparkRecentDaysClusteringPlanStrategy</strong> or <strong id="mrs_01_24804__b10746217326">SparkSizeBasedClusteringPlanStrategy</strong>, you typically do not need to specify <strong id="mrs_01_24804__b1625316224325">hoodie.clustering.execution.strategy.class</strong>. However, if <strong id="mrs_01_24804__b189454611320">hoodie.clustering.plan.strategy.class</strong> is set to <strong id="mrs_01_24804__b56617543323">SparkSingleFileSortPlanStrategy</strong>, <strong id="mrs_01_24804__b104731612153310">hoodie.clustering.execution.strategy.class</strong> must be set to <strong id="mrs_01_24804__b4573618183313">SparkSingleFileSortExecutionStrategy</strong>.</p>
</div></div>
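<p>The pairing rule above can be checked before a write job is submitted. The following Python sketch is illustrative only: the helper names <strong>simple_name</strong> and <strong>check_strategy_pair</strong> are hypothetical and are not part of MRS or Hudi. It compares class names without their package prefix, so both short and fully qualified class names are accepted, and it assumes that an unset plan strategy means the default SparkSizeBasedClusteringPlanStrategy.</p>

```python
"""Hypothetical pre-flight check for the clustering strategy pairing rule (not an MRS or Hudi API)."""

PLAN_KEY = "hoodie.clustering.plan.strategy.class"
EXEC_KEY = "hoodie.clustering.execution.strategy.class"

# The only plan strategy in this section that requires a specific execution strategy.
SINGLE_FILE_SORT_PLAN = "SparkSingleFileSortPlanStrategy"
SINGLE_FILE_SORT_EXEC = "SparkSingleFileSortExecutionStrategy"


def simple_name(class_name: str) -> str:
    """Strip the package prefix, for example 'a.b.C' -> 'C'."""
    return class_name.rsplit(".", 1)[-1]


def check_strategy_pair(options: dict) -> list:
    """Return a list of problems; an empty list means the pairing looks valid."""
    problems = []
    # Assumption: an unset plan strategy falls back to the default from the table.
    plan = simple_name(options.get(PLAN_KEY, "SparkSizeBasedClusteringPlanStrategy"))
    execution = options.get(EXEC_KEY)
    if plan == SINGLE_FILE_SORT_PLAN:
        # SparkSingleFileSortPlanStrategy must be paired with SparkSingleFileSortExecutionStrategy.
        if execution is None or simple_name(execution) != SINGLE_FILE_SORT_EXEC:
            problems.append(
                f"{EXEC_KEY} must be {SINGLE_FILE_SORT_EXEC} when {PLAN_KEY} is {plan}"
            )
    # Other plan strategies (for example, the two size/recency-based ones) need no check here.
    return problems


if __name__ == "__main__":
    # Execution strategy missing for the single-file sort plan: one problem is reported.
    print(check_strategy_pair({PLAN_KEY: "SparkSingleFileSortPlanStrategy"}))
```

<p>The check only reports the one hard requirement stated in the note; setting an execution strategy for the other plan strategies is not treated as an error.</p>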
<div class="tablenoborder"><table cellpadding="4" cellspacing="0" summary="" id="mrs_01_24804__table18509413103710" frame="border" border="1" rules="all"><thead align="left"><tr id="mrs_01_24804__row1550981314378"><th align="left" class="cellrowborder" valign="top" width="46.714671467146715%" id="mcps1.3.2.1.4.1.1"><p id="mrs_01_24804__p73073148310">Parameter</p>
</th>
<th align="left" class="cellrowborder" valign="top" width="32.983298329832984%" id="mcps1.3.2.1.4.1.2"><p id="mrs_01_24804__p123071714832">Description</p>
</th>
<th align="left" class="cellrowborder" valign="top" width="20.3020302030203%" id="mcps1.3.2.1.4.1.3"><p id="mrs_01_24804__p2307191416317">Default Value</p>
</th>
</tr>
</thead>
<tbody><tr id="mrs_01_24804__row115100138376"><td class="cellrowborder" valign="top" width="46.714671467146715%" headers="mcps1.3.2.1.4.1.1 "><p id="mrs_01_24804__p351011135375">hoodie.clustering.inline</p>
</td>
<td class="cellrowborder" valign="top" width="32.983298329832984%" headers="mcps1.3.2.1.4.1.2 "><p id="mrs_01_24804__p25101113203711">Whether to execute clustering synchronously (inline) within the writing job.</p>
</td>
<td class="cellrowborder" valign="top" width="20.3020302030203%" headers="mcps1.3.2.1.4.1.3 "><p id="mrs_01_24804__p15510141312373">false</p>
</td>
</tr>
<tr id="mrs_01_24804__row4510013153713"><td class="cellrowborder" valign="top" width="46.714671467146715%" headers="mcps1.3.2.1.4.1.1 "><p id="mrs_01_24804__p751019134377">hoodie.clustering.inline.max.commits</p>
</td>
<td class="cellrowborder" valign="top" width="32.983298329832984%" headers="mcps1.3.2.1.4.1.2 "><p id="mrs_01_24804__p2510813203716">Number of commits after which clustering is triggered when inline clustering is enabled.</p>
</td>
<td class="cellrowborder" valign="top" width="20.3020302030203%" headers="mcps1.3.2.1.4.1.3 "><p id="mrs_01_24804__p7510121313374">4</p>
</td>
</tr>
<tr id="mrs_01_24804__row175101413163710"><td class="cellrowborder" valign="top" width="46.714671467146715%" headers="mcps1.3.2.1.4.1.1 "><p id="mrs_01_24804__p15510111319374">hoodie.clustering.plan.strategy.target.file.max.bytes</p>
</td>
<td class="cellrowborder" valign="top" width="32.983298329832984%" headers="mcps1.3.2.1.4.1.2 "><p id="mrs_01_24804__p185108132378">Maximum size, in bytes, of each file generated by clustering.</p>
</td>
<td class="cellrowborder" valign="top" width="20.3020302030203%" headers="mcps1.3.2.1.4.1.3 "><p id="mrs_01_24804__p1651011319379">1073741824 bytes (1 GB)</p>
</td>
</tr>
<tr id="mrs_01_24804__row1351021314379"><td class="cellrowborder" valign="top" width="46.714671467146715%" headers="mcps1.3.2.1.4.1.1 "><p id="mrs_01_24804__p0510131383720">hoodie.clustering.plan.strategy.small.file.limit</p>
</td>
<td class="cellrowborder" valign="top" width="32.983298329832984%" headers="mcps1.3.2.1.4.1.2 "><p id="mrs_01_24804__p1051015135377">Size threshold, in bytes. Files smaller than this size are candidates for clustering.</p>
</td>
<td class="cellrowborder" valign="top" width="20.3020302030203%" headers="mcps1.3.2.1.4.1.3 "><p id="mrs_01_24804__p9510131343717">314572800 bytes (300 MB)</p>
</td>
</tr>
<tr id="mrs_01_24804__row1451016139371"><td class="cellrowborder" valign="top" width="46.714671467146715%" headers="mcps1.3.2.1.4.1.1 "><p id="mrs_01_24804__p145101713133720">hoodie.clustering.plan.strategy.sort.columns</p>
</td>
<td class="cellrowborder" valign="top" width="32.983298329832984%" headers="mcps1.3.2.1.4.1.2 "><p id="mrs_01_24804__p17510161323712">Comma-separated list of columns by which data is sorted during clustering.</p>
</td>
<td class="cellrowborder" valign="top" width="20.3020302030203%" headers="mcps1.3.2.1.4.1.3 "><p id="mrs_01_24804__p4510813193720">None</p>
</td>
</tr>
<tr id="mrs_01_24804__row18911101151519"><td class="cellrowborder" valign="top" width="46.714671467146715%" headers="mcps1.3.2.1.4.1.1 "><p id="mrs_01_24804__p1091219119152">hoodie.layout.optimize.strategy</p>
</td>
<td class="cellrowborder" valign="top" width="32.983298329832984%" headers="mcps1.3.2.1.4.1.2 "><p id="mrs_01_24804__p79124116157">Sorting strategy used to lay out data during clustering. Three sorting modes are available: <strong id="mrs_01_24804__b267805243712">linear</strong>, <strong id="mrs_01_24804__b6841456173712">z-order</strong>, and <strong id="mrs_01_24804__b649919585375">hilbert</strong>.</p>
</td>
<td class="cellrowborder" valign="top" width="20.3020302030203%" headers="mcps1.3.2.1.4.1.3 "><p id="mrs_01_24804__p491218120153">linear</p>
</td>
</tr>
<tr id="mrs_01_24804__row35172058785"><td class="cellrowborder" valign="top" width="46.714671467146715%" headers="mcps1.3.2.1.4.1.1 "><p id="mrs_01_24804__p125181858989">hoodie.layout.optimize.enable</p>
</td>
<td class="cellrowborder" valign="top" width="32.983298329832984%" headers="mcps1.3.2.1.4.1.2 "><p id="mrs_01_24804__p5518358785">Whether to enable layout optimization. Set this parameter to <strong id="mrs_01_24804__b1648833173916">true</strong> when <strong id="mrs_01_24804__b6889113413916">z-order</strong> or <strong id="mrs_01_24804__b1670904573915">hilbert</strong> is used as the sorting strategy.</p>
</td>
<td class="cellrowborder" valign="top" width="20.3020302030203%" headers="mcps1.3.2.1.4.1.3 "><p id="mrs_01_24804__p551855814816">false</p>
</td>
</tr>
<tr id="mrs_01_24804__row17685021195718"><td class="cellrowborder" valign="top" width="46.714671467146715%" headers="mcps1.3.2.1.4.1.1 "><p id="mrs_01_24804__p176862218578">hoodie.clustering.plan.strategy.class</p>
</td>
<td class="cellrowborder" valign="top" width="32.983298329832984%" headers="mcps1.3.2.1.4.1.2 "><p id="mrs_01_24804__p136861521165716">Strategy class used to select the file groups to be clustered. By default, files smaller than the value of <strong id="mrs_01_24804__b3962162114414">hoodie.clustering.plan.strategy.small.file.limit</strong> are selected.</p>
</td>
<td class="cellrowborder" valign="top" width="20.3020302030203%" headers="mcps1.3.2.1.4.1.3 "><p id="mrs_01_24804__p1868682111574">org.apache.hudi.client.clustering.plan.strategy.SparkSizeBasedClusteringPlanStrategy</p>
</td>
</tr>
<tr id="mrs_01_24804__row166241936495"><td class="cellrowborder" valign="top" width="46.714671467146715%" headers="mcps1.3.2.1.4.1.1 "><p id="mrs_01_24804__p1162423619911">hoodie.clustering.execution.strategy.class</p>
</td>
<td class="cellrowborder" valign="top" width="32.983298329832984%" headers="mcps1.3.2.1.4.1.2 "><p id="mrs_01_24804__p1977514443132">Strategy class (subclass of RunClusteringStrategy) that defines how a clustering plan is executed.</p>
<p id="mrs_01_24804__p20624836693">The default class sorts the file groups in the plan by the columns specified in hoodie.clustering.plan.strategy.sort.columns and generates files that meet the configured target file size.</p>
</td>
<td class="cellrowborder" valign="top" width="20.3020302030203%" headers="mcps1.3.2.1.4.1.3 "><p id="mrs_01_24804__p1662473616914">org.apache.hudi.client.clustering.run.strategy.SparkSortAndSizeExecutionStrategy</p>
</td>
</tr>
<tr id="mrs_01_24804__row104496318325"><td class="cellrowborder" valign="top" width="46.714671467146715%" headers="mcps1.3.2.1.4.1.1 "><p id="mrs_01_24804__p114495383220">hoodie.clustering.plan.strategy.max.num.groups</p>
</td>
<td class="cellrowborder" valign="top" width="32.983298329832984%" headers="mcps1.3.2.1.4.1.2 "><p id="mrs_01_24804__p104495363212">Maximum number of clustering groups that can be created in a clustering plan. A larger value allows higher parallelism.</p>
</td>
<td class="cellrowborder" valign="top" width="20.3020302030203%" headers="mcps1.3.2.1.4.1.3 "><p id="mrs_01_24804__p84491739320">30</p>
</td>
</tr>
<tr id="mrs_01_24804__row19492153573213"><td class="cellrowborder" valign="top" width="46.714671467146715%" headers="mcps1.3.2.1.4.1.1 "><p id="mrs_01_24804__p10492123514324">hoodie.clustering.plan.strategy.max.bytes.per.group</p>
</td>
<td class="cellrowborder" valign="top" width="32.983298329832984%" headers="mcps1.3.2.1.4.1.2 "><p id="mrs_01_24804__p11492123533218">Maximum amount of data, in bytes, included in each clustering group.</p>
</td>
<td class="cellrowborder" valign="top" width="20.3020302030203%" headers="mcps1.3.2.1.4.1.3 "><p id="mrs_01_24804__p94925351320">2147483648 bytes (2 GB)</p>
</td>
</tr>
</tbody>
</table>
</div>
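<p>The parameters in the table above are normally passed to Hudi as write options. The following Python sketch builds such an option set as a plain dictionary and encodes two points from the table: byte sizes are written as plain integers, and <strong>hoodie.layout.optimize.enable</strong> is set to <strong>true</strong> when z-order or hilbert is used. The helper name <strong>clustering_options</strong> and the column names in the usage example are hypothetical. The dictionary is intended for a PySpark DataFrameWriter, for example df.write.format("hudi").options(**opts), which also needs the usual table name and record key options that this sketch does not cover.</p>

```python
"""Hypothetical helper that assembles the clustering options described in the table."""

MB = 1024 * 1024
GB = 1024 * MB

LAYOUT_STRATEGIES = ("linear", "z-order", "hilbert")


def clustering_options(sort_columns, layout="linear", inline=False, max_commits=4,
                       target_file_max_bytes=1 * GB, small_file_limit=300 * MB):
    """Return clustering options as strings; the defaults mirror the table's default values."""
    if layout not in LAYOUT_STRATEGIES:
        raise ValueError(f"unsupported layout strategy: {layout!r}")
    return {
        "hoodie.clustering.inline": "true" if inline else "false",
        "hoodie.clustering.inline.max.commits": str(max_commits),
        # Sizes are emitted as literal byte counts such as "1073741824".
        "hoodie.clustering.plan.strategy.target.file.max.bytes": str(target_file_max_bytes),
        "hoodie.clustering.plan.strategy.small.file.limit": str(small_file_limit),
        # Multiple sort columns are joined with commas.
        "hoodie.clustering.plan.strategy.sort.columns": ",".join(sort_columns),
        "hoodie.layout.optimize.strategy": layout,
        # The table says to enable layout optimization for z-order and hilbert.
        "hoodie.layout.optimize.enable": "false" if layout == "linear" else "true",
    }


if __name__ == "__main__":
    # "city" and "ts" are hypothetical column names.
    opts = clustering_options(["city", "ts"], layout="z-order", inline=True)
    for key, value in sorted(opts.items()):
        print(f"{key}={value}")
```

<p>Writing the sizes as computed integers rather than as expressions such as 1024 * 1024 * 1024 matters because option values are read as literal strings, not evaluated.</p>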
</div>
<div>
<div class="familylinks">
<div class="parentlink"><strong>Parent topic:</strong> <a href="mrs_01_24032.html">Hudi Configuration Reference</a></div>
</div>
</div>