Reviewed-by: Hasko, Vladimir <vladimir.hasko@t-systems.com> Co-authored-by: Yang, Tong <yangtong2@huawei.com> Co-committed-by: Yang, Tong <yangtong2@huawei.com>
<a name="mrs_01_1593"></a>
<h1 class="topictitle1">Experience Summary</h1>
<div id="body1596442815785"><div class="section" id="mrs_01_1593__s19ecc6ccab7343ef9b6eb739c889ea75"><h4 class="sectiontitle">Avoiding Data Skew</h4><p id="mrs_01_1593__a123fcd577108445ca3d3e71b24495ea2">If data skew occurs (some keys or partitions hold far more data than others), task execution times vary widely even when no garbage collection (GC) is performed. To reduce data skew, use the following methods:</p>
<ul id="mrs_01_1593__u9442100e9d674dd29edc4004b8ee5645"><li id="mrs_01_1593__lee7b72298acd41f5984d63597d11cf65">Redefine keys. Use keys of finer granularity so that data is divided into smaller, more evenly sized tasks.</li><li id="mrs_01_1593__l4a2fd8e60cc64356b55b753b0248f881">Modify the degree of parallelism (DOP).</li><li id="mrs_01_1593__l4211f9fd107048b088dee1dc46ff2d70">Call the rebalance operation to distribute data evenly across partitions.</li></ul>
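<p>One way to obtain keys of finer granularity is to append a salt to a hot key. The following is a minimal plain-Java sketch of that idea, not Flink API code. The class <code>KeySaltingSketch</code>, its methods, and the simple hash-modulo partitioning are assumptions made for illustration only; Flink assigns keys to partitions differently.</p>

```java
import java.util.Arrays;

// Hypothetical sketch in plain Java; none of these names come from Flink.
class KeySaltingSketch {

    // Maps a key to one of `parallelism` partitions by hash (simplified model).
    static int partitionFor(String key, int parallelism) {
        return Math.floorMod(key.hashCode(), parallelism);
    }

    // Appends a salt in [0, saltBuckets) so that one hot key becomes several smaller keys.
    static String saltedKey(String key, long recordIndex, int saltBuckets) {
        return key + "#" + (recordIndex % saltBuckets);
    }

    // Counts how many records land on each partition.
    // saltBuckets <= 1 means the original keys are used unchanged.
    static int[] partitionLoad(String[] keys, int parallelism, int saltBuckets) {
        int[] load = new int[parallelism];
        for (int i = 0; i < keys.length; i++) {
            String k = saltBuckets > 1 ? saltedKey(keys[i], i, saltBuckets) : keys[i];
            load[partitionFor(k, parallelism)]++;
        }
        return load;
    }

    // Returns the largest per-partition record count.
    static int max(int[] load) {
        return Arrays.stream(load).max().orElse(0);
    }

    public static void main(String[] args) {
        String[] keys = new String[1000];
        Arrays.fill(keys, "hot"); // every record shares one key: extreme skew
        System.out.println("unsalted: " + Arrays.toString(partitionLoad(keys, 4, 1)));
        System.out.println("salted:   " + Arrays.toString(partitionLoad(keys, 4, 8)));
    }
}
```

<p>In a real job, results computed per salted key would typically need a second aggregation step that merges them by the original key.</p>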
</div>
<div class="section" id="mrs_01_1593__s4e06e1d83084415c8fb9fc4e41b67c6a"><h4 class="sectiontitle">Setting Timeout Interval for the Buffer</h4><ul id="mrs_01_1593__ud2aa6819a1b74c93a4875387d84b8c63"><li id="mrs_01_1593__l42c0e46a7f1544229973c2820786ad78">During task execution, data is exchanged over the network. You can call <strong id="mrs_01_1593__b20103203822011">setBufferTimeout</strong> to specify the buffer timeout interval for data exchange between servers.</li><li id="mrs_01_1593__l5edb8842efaa40ef81e15dcf121395f6">If <strong id="mrs_01_1593__b199585383441840">setBufferTimeout</strong> is set to <strong id="mrs_01_1593__b154615168741840">-1</strong>, the buffer is flushed only when it is full, which maximizes throughput. If <strong id="mrs_01_1593__b69240166141840">setBufferTimeout</strong> is set to <strong id="mrs_01_1593__b177358734141840">0</strong>, the buffer is flushed each time a record is received, which minimizes latency. If <strong id="mrs_01_1593__b75528209741840">setBufferTimeout</strong> is set to a value greater than <strong id="mrs_01_1593__b83964167341840">0</strong>, the buffer is flushed when the timeout (in milliseconds) expires.<div class="p" id="mrs_01_1593__a9c6b7419dbbd46f5b904cb3afdbbb695">The following is an example:<pre class="screen" id="mrs_01_1593__s51fd0eecfd4d4f0a962be7d2d439f878">// Set the default buffer timeout for the whole execution environment.
env.setBufferTimeout(timeoutMillis);
// Override the buffer timeout for a single operator.
env.generateSequence(1,10).map(new MyMapper()).setBufferTimeout(timeoutMillis);</pre>
</div>
</li></ul>
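<p>The three timeout cases described above can be modeled as a single flush decision. The following plain-Java sketch is only an illustration and does not reproduce Flink's implementation. The class <code>BufferFlushSketch</code> and the method <code>shouldFlush</code> are hypothetical names, and the rule that a full buffer is always sent, regardless of the timeout, is an assumption of this model.</p>

```java
// Hypothetical model of the flush decision; not Flink's implementation.
class BufferFlushSketch {

    // Decides whether a buffer should be sent now.
    // timeoutMillis: -1 = no timeout, 0 = flush after every record, > 0 = flush after the timeout.
    static boolean shouldFlush(long timeoutMillis, boolean bufferFull, long millisSinceLastFlush) {
        if (bufferFull) {
            return true;  // assumption: a full buffer is always sent
        }
        if (timeoutMillis < 0) {
            return false; // -1: wait until the buffer is full (maximum throughput)
        }
        if (timeoutMillis == 0) {
            return true;  // 0: send after every record (minimum latency)
        }
        return millisSinceLastFlush >= timeoutMillis; // > 0: send once the timeout has elapsed
    }

    public static void main(String[] args) {
        System.out.println(shouldFlush(-1, false, 10_000)); // no timeout, buffer not full
        System.out.println(shouldFlush(0, false, 0));       // flush on every record
        System.out.println(shouldFlush(100, false, 50));    // timeout not yet reached
        System.out.println(shouldFlush(100, false, 100));   // timeout reached
    }
}
```

<p>A smaller positive timeout trades some throughput for lower latency, and a larger one does the opposite.</p>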
</div>
</div>
<div>
<div class="familylinks">
<div class="parentlink"><strong>Parent topic:</strong> <a href="mrs_01_1587.html">Optimization DataStream</a></div>
</div>
</div>