Yang, Tong 6182f91ba8 MRS component operation guide_normal 2.0.38.SP20 version
Reviewed-by: Hasko, Vladimir <vladimir.hasko@t-systems.com>
Co-authored-by: Yang, Tong <yangtong2@huawei.com>
Co-committed-by: Yang, Tong <yangtong2@huawei.com>
2022-12-09 14:55:21 +00:00


<a name="mrs_01_2014"></a>
<h1 class="topictitle1">Why Do Tasks Fail When Hash Shuffle Is Used?</h1>
<div id="body1595920219639"><div class="section" id="mrs_01_2014__s4bd4320001f14f5599ef4c532ded1d8d"><h4 class="sectiontitle">Question</h4><p id="mrs_01_2014__ada60561b390e4291aa91544af840a17d">When Hash shuffle is used to run a job that consists of 1000000 map tasks x 100000 reduce tasks, the run logs report many message failures and Executor heartbeat timeouts, and tasks fail. Why does this happen?</p>
</div>
<div class="section" id="mrs_01_2014__s5125dd1ed215437eb474b274e5161fbb"><h4 class="sectiontitle">Answer</h4><p id="mrs_01_2014__abedd3e56e5d142908aaaec8cc177a318">During the shuffle process, Hash shuffle writes the data of each reduce partition to its own disk file according to the hash result, without sorting the data.</p>
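<p>The write path described above can be sketched in plain Python. This is an illustrative model only, not Spark code: the function name <code>hash_shuffle_write</code> is hypothetical, and in-memory buckets stand in for the per-partition disk files.</p>

```python
import zlib

def hash_shuffle_write(records, num_reduce_partitions):
    """Route each (key, value) record of one map task to a per-reducer bucket.

    Each bucket stands in for one on-disk shuffle file (hypothetical model).
    Records are appended in arrival order; nothing is sorted.
    """
    buckets = {}
    for key, value in records:
        # Use a stable hash so the chosen partition is the same in every process.
        partition = zlib.crc32(key.encode("utf-8")) % num_reduce_partitions
        buckets.setdefault(partition, []).append((key, value))
    return buckets

records = [("b", 1), ("a", 2), ("b", 3), ("c", 4)]
buckets = hash_shuffle_write(records, num_reduce_partitions=3)
```

<p>In this model a single map task can create up to one bucket per reduce partition, which is why the file count grows with the product of map tasks and reduce tasks.</p>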
<p id="mrs_01_2014__a0b4a0684e9474787a773dfc5cc6f8e6f">If there are many reduce partitions, a large number of disk files are generated. In this case, 1000000 x 100000 = 10^11 shuffle files are generated. Such a large number of disk files severely degrades file read and write performance. In addition, because a large number of file handles are open at the same time, operations such as sorting and compression consume a large amount of temporary memory. This puts heavy pressure on memory management and garbage collection and can cause the Executor to stop responding to the Driver.</p>
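<p>The file count can be checked with a short calculation. The sketch below assumes one file per (map task, reduce partition) pair for Hash shuffle, as in the figures above. For comparison it also assumes that Sort shuffle writes one data file and one index file per map task, as in open-source Spark. The function names are illustrative.</p>

```python
def hash_shuffle_files(num_map_tasks, num_reduce_tasks):
    # Basic Hash shuffle: one file per (map task, reduce partition) pair.
    return num_map_tasks * num_reduce_tasks

def sort_shuffle_files(num_map_tasks):
    # Assumption: Sort shuffle writes one data file plus one index file per map task.
    return 2 * num_map_tasks

maps, reduces = 1_000_000, 100_000
print(hash_shuffle_files(maps, reduces))  # 100000000000, that is, 10^11
print(sort_shuffle_files(maps))           # 2000000
```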
<p id="mrs_01_2014__ac37d10e5db6f46a0b116d897e8f47a53">You are advised to use Sort shuffle instead of Hash shuffle to run such jobs.</p>
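<p>One way to request Sort shuffle is to pass the shuffle manager as a Spark property at submission time. The sketch below assumes a Spark version in which <code>spark.shuffle.manager</code> selects the shuffle implementation, as in open-source Spark 1.x. Check the property against the Spark version on your cluster. The helper <code>to_submit_args</code> is hypothetical and only formats the command line.</p>

```python
def to_submit_args(conf):
    """Render a dict of Spark properties as spark-submit --conf arguments."""
    args = []
    for key in sorted(conf):
        args += ["--conf", f"{key}={conf[key]}"]
    return args

# Assumption: spark.shuffle.manager selects the shuffle implementation.
conf = {"spark.shuffle.manager": "sort"}
print(" ".join(["spark-submit"] + to_submit_args(conf)))
# prints: spark-submit --conf spark.shuffle.manager=sort
```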
</div>
</div>
<div>
<div class="familylinks">
<div class="parentlink"><strong>Parent topic:</strong> <a href="mrs_01_2003.html">Spark Core</a></div>
</div>
</div>