Yang, Tong 6182f91ba8 MRS component operation guide_normal 2.0.38.SP20 version
Reviewed-by: Hasko, Vladimir <vladimir.hasko@t-systems.com>
Co-authored-by: Yang, Tong <yangtong2@huawei.com>
Co-committed-by: Yang, Tong <yangtong2@huawei.com>
2022-12-09 14:55:21 +00:00


<a name="mrs_01_1591"></a>
<h1 class="topictitle1">Optimizing the Design of Partitioning Method</h1>
<div id="body1596442815782"><div class="section" id="mrs_01_1591__s3fc76c7e38f24dc8a6f1187df24d8c15"><h4 class="sectiontitle">Scenarios</h4><p id="mrs_01_1591__aca5f346a8b194b3c8c0a59acee07806a">Task division can be improved by optimizing the partitioning method. If data skew occurs in a task, the entire execution is delayed. Therefore, when designing the partitioning method, ensure that data is evenly distributed across partitions.</p>
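<p>The effect of data skew can be illustrated with a minimal plain-Java sketch that does not use the Flink API. The class name <code>SkewDemo</code>, its helper methods, and the two record-to-partition assignments are hypothetical and serve only as an illustration. The sketch assumes that the partition with the most records bounds the completion time of the whole stage.</p>

```java
import java.util.Arrays;

public class SkewDemo {
    // Counts how many records land on each partition for a given assignment.
    static int[] partitionLoads(int[] assignment, int numPartitions) {
        int[] loads = new int[numPartitions];
        for (int partition : assignment) {
            loads[partition]++;
        }
        return loads;
    }

    // Returns the load of the busiest partition, which is assumed to bound
    // the completion time of the stage.
    static int maxLoad(int[] loads) {
        return Arrays.stream(loads).max().orElse(0);
    }

    public static void main(String[] args) {
        int numPartitions = 4;
        // Hypothetical assignments of 8 records to 4 partitions.
        int[] skewed = {0, 0, 0, 0, 0, 0, 1, 2};
        int[] even = {0, 1, 2, 3, 0, 1, 2, 3};

        int[] skewedLoads = partitionLoads(skewed, numPartitions);
        int[] evenLoads = partitionLoads(even, numPartitions);

        System.out.println("skewed loads: " + Arrays.toString(skewedLoads)
                + ", busiest = " + maxLoad(skewedLoads));
        System.out.println("even loads:   " + Arrays.toString(evenLoads)
                + ", busiest = " + maxLoad(evenLoads));
    }
}
```

<p>With the skewed assignment, one partition processes 6 of the 8 records while another stays idle. With the even assignment, every partition processes 2 records.</p>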
</div>
<div class="section" id="mrs_01_1591__s641e2162ad01406c9909bf2f938bdfb2"><h4 class="sectiontitle">Procedure</h4><p id="mrs_01_1591__a60d4e9ddf0f246db9aa5fa129bc8a737">Partitioning methods are as follows:</p>
<ul id="mrs_01_1591__u019c3c8e35a34da08902d748f17b7b02"><li id="mrs_01_1591__lfacfbda93a2d4e0fa3f631b254cfa91f"><strong id="mrs_01_1591__aab28394726384ba69ab7384bad54f49c">Random partitioning</strong>: randomly partitions data.<pre class="screen" id="mrs_01_1591__sbaaf765a0cf647aea4eb2bc51531dd44">dataStream.shuffle();</pre>
</li><li id="mrs_01_1591__l823c85a6ada04afa9e2135ebec0a7395"><strong id="mrs_01_1591__a758a97f6f70d4cf8bc181f9ba02fb2ce">Rebalancing (round-robin partitioning)</strong>: distributes data evenly across partitions in round-robin mode. This method is useful for optimizing performance when data skew exists.<pre class="screen" id="mrs_01_1591__s0cde5d7dcc734c929bb74e3e1082593d">dataStream.rebalance();</pre>
</li><li id="mrs_01_1591__ld890b682c32b4e55a20f5c60af669f5a"><strong id="mrs_01_1591__a46c1bb96df1a4e049bb960826d1780d0">Rescaling</strong>: assigns data to a subset of downstream instances in round-robin mode. This method is useful if you want to deliver data from each parallel instance of a data source to a subset of mappers without using rebalance(), that is, without performing a complete rebalance operation.<pre class="screen" id="mrs_01_1591__s20e8f35c72de443da2aa5fdd2f24633d">dataStream.rescale();</pre>
</li><li id="mrs_01_1591__lbcd4cc4fd7f64e95bb7d33a8ad02996a"><strong id="mrs_01_1591__aad84e5b6b766428ca65fb2dbc6729cde">Broadcast</strong>: broadcasts data to all partitions.<pre class="screen" id="mrs_01_1591__s34bf19e5ec76481dae8e4e18e104db2b">dataStream.broadcast();</pre>
</li><li id="mrs_01_1591__la3b4fbeebc13451cafc40408f6d0fc92"><strong id="mrs_01_1591__a65496f76c2a24067ac799a7ce1dd910f">User-defined partitioning</strong>: uses a user-defined partitioner to select a target task for each element. User-defined partitioning allows users to partition data based on a certain feature to optimize task execution.<p id="mrs_01_1591__a60b26702bc974869b4be041d54e269c5">The following is an example:</p>
<pre class="screen" id="mrs_01_1591__s6568a126aeee4da2b2398643b799cbb4">// Build a simple Tuple2 stream with fromElements.
DataStream&lt;Tuple2&lt;String, Integer&gt;&gt; dataStream = env.fromElements(Tuple2.of("hello",1), Tuple2.of("test",2), Tuple2.of("world",100));
// Define the partitioner. The target partition is (string length + integer value) modulo the number of partitions.
Partitioner&lt;Tuple2&lt;String, Integer&gt;&gt; strPartitioner = new Partitioner&lt;Tuple2&lt;String, Integer&gt;&gt;() {
    @Override
    public int partition(Tuple2&lt;String, Integer&gt; key, int numPartitions) {
        return (key.f0.length() + key.f1) % numPartitions;
    }
};
// Use the whole Tuple2 element as the key that is passed to the partitioner.
dataStream.partitionCustom(strPartitioner, new KeySelector&lt;Tuple2&lt;String, Integer&gt;, Tuple2&lt;String, Integer&gt;&gt;() {
    @Override
    public Tuple2&lt;String, Integer&gt; getKey(Tuple2&lt;String, Integer&gt; value) throws Exception {
        return value;
    }
}).print();</pre>
</li></ul>
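<p>To see which partition each element of the user-defined partitioning example is sent to, the partition formula can be evaluated outside Flink. The following plain-Java sketch reuses the formula from the example. The class name <code>CustomPartitionDemo</code> and the value of 4 for <code>numPartitions</code> are assumptions made for illustration. In a real job, the number of partitions depends on the parallelism of the downstream operator.</p>

```java
public class CustomPartitionDemo {
    // Same formula as the partition() method in the example:
    // (string length + integer value) modulo the number of partitions.
    static int partition(String word, int value, int numPartitions) {
        return (word.length() + value) % numPartitions;
    }

    public static void main(String[] args) {
        int numPartitions = 4; // assumed downstream parallelism

        // The three elements used in the example.
        String[] words = {"hello", "test", "world"};
        int[] values = {1, 2, 100};

        for (int i = 0; i < words.length; i++) {
            int target = partition(words[i], values[i], numPartitions);
            System.out.println("(" + words[i] + "," + values[i] + ") -> partition " + target);
        }
    }
}
```

<p>Under this assumption, ("hello",1) and ("test",2) both map to partition 2, and ("world",100) maps to partition 1. A user-defined partitioner therefore does not guarantee an even distribution by itself. Check the formula against the expected data. Note also that the Java <code>%</code> operator returns a negative result if the sum is negative.</p>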
</div>
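<p>The difference between rebalancing and rescaling can be sketched in plain Java without the Flink API. The class name <code>ChannelDemo</code> and its methods are hypothetical. The sketch assumes that the downstream parallelism is a multiple of the upstream parallelism and that each upstream instance is given a contiguous subset of downstream instances. The subsets that Flink actually selects may differ.</p>

```java
import java.util.Arrays;

public class ChannelDemo {
    // rebalance(): every upstream instance cycles through all downstream instances.
    static int[] rebalanceTargets(int downstreamParallelism) {
        int[] targets = new int[downstreamParallelism];
        for (int d = 0; d < downstreamParallelism; d++) {
            targets[d] = d;
        }
        return targets;
    }

    // rescale(): each upstream instance cycles through its own subset only.
    // Assumption: contiguous subsets of equal size, for illustration.
    static int[] rescaleTargets(int upstreamIndex, int upstreamParallelism,
                                int downstreamParallelism) {
        int subsetSize = downstreamParallelism / upstreamParallelism;
        int[] targets = new int[subsetSize];
        for (int i = 0; i < subsetSize; i++) {
            targets[i] = upstreamIndex * subsetSize + i;
        }
        return targets;
    }

    public static void main(String[] args) {
        int upstreamParallelism = 2;   // assumed
        int downstreamParallelism = 4; // assumed

        for (int u = 0; u < upstreamParallelism; u++) {
            System.out.println("upstream " + u
                    + " rebalance -> " + Arrays.toString(rebalanceTargets(downstreamParallelism))
                    + ", rescale -> " + Arrays.toString(
                            rescaleTargets(u, upstreamParallelism, downstreamParallelism)));
        }
    }
}
```

<p>In this sketch, each upstream instance sends data to all four downstream instances under rebalancing, but only to two of them under rescaling. Rescaling therefore needs fewer connections between instances, but it does not even out skew that already exists between the upstream instances.</p>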
</div>
<div>
<div class="familylinks">
<div class="parentlink"><strong>Parent topic:</strong> <a href="mrs_01_1587.html">Optimization DataStream</a></div>
</div>
</div>