Yang, Tong 6182f91ba8 MRS component operation guide_normal 2.0.38.SP20 version
Reviewed-by: Hasko, Vladimir <vladimir.hasko@t-systems.com>
Co-authored-by: Yang, Tong <yangtong2@huawei.com>
Co-committed-by: Yang, Tong <yangtong2@huawei.com>
2022-12-09 14:55:21 +00:00

<a name="mrs_01_2019"></a>
<h1 class="topictitle1">Why Does the Out of Memory Error Occur in NodeManager During the Execution of Spark Applications?</h1>
<div id="body1595920219934"><div class="section" id="mrs_01_2019__s37dd2e102eed42d992646c14b70a58ce"><h4 class="sectiontitle">Question</h4><p id="mrs_01_2019__a049b82b40dd74a2cb9cec139dc293ae5">When a Spark application runs with the YARN External Shuffle Service enabled and too many shuffle tasks connect at the same time, the <strong id="mrs_01_2019__abc9e3d3a2c034d3e82a601ff6fb6847b">java.lang.OutOfMemoryError: Direct buffer memory</strong> error occurs, indicating that direct buffer memory is insufficient. The error log is as follows:</p>
<pre class="screen" id="mrs_01_2019__sd57af0ee4f27416d9db9a006ebd222e6">2016-12-06 02:01:00,768 | WARN | shuffle-server-38 | Exception in connection from /192.168.101.95:53680 | TransportChannelHandler.java:79
io.netty.handler.codec.DecoderException: java.lang.OutOfMemoryError: Direct buffer memory
at io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:153)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:333)
at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:319)
at io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:787)
at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:130)
at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:511)
at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:468)
at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:382)
at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:354)
at io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:116)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.OutOfMemoryError: Direct buffer memory
at java.nio.Bits.reserveMemory(Bits.java:693)
at java.nio.DirectByteBuffer.&lt;init&gt;(DirectByteBuffer.java:123)
at java.nio.ByteBuffer.allocateDirect(ByteBuffer.java:311)
at io.netty.buffer.PoolArena$DirectArena.newChunk(PoolArena.java:434)
at io.netty.buffer.PoolArena.allocateNormal(PoolArena.java:179)
at io.netty.buffer.PoolArena.allocate(PoolArena.java:168)
at io.netty.buffer.PoolArena.reallocate(PoolArena.java:277)
at io.netty.buffer.PooledByteBuf.capacity(PooledByteBuf.java:108)
at io.netty.buffer.AbstractByteBuf.ensureWritable(AbstractByteBuf.java:251)
at io.netty.buffer.AbstractByteBuf.writeBytes(AbstractByteBuf.java:849)
at io.netty.buffer.AbstractByteBuf.writeBytes(AbstractByteBuf.java:841)
at io.netty.buffer.AbstractByteBuf.writeBytes(AbstractByteBuf.java:831)
at io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:146)
... 10 more</pre>
</div>
<div class="section" id="mrs_01_2019__sb7f6f67bdd53440397b62815f92d043b"><h4 class="sectiontitle">Answer</h4><p id="mrs_01_2019__abeb8b70e316c4d8aa0b2e114e809c81b">The Shuffle Service of YARN starts twice as many threads as there are available CPU cores, and the default size of direct buffer memory is 128 MB. If too many shuffle tasks connect at the same time, the direct buffer memory available to each thread is insufficient. For example, with 40 CPU cores, the Shuffle Service of YARN starts 80 threads, so each thread gets less than 2 MB of direct buffer memory (128 MB / 80 = 1.6 MB).</p>
<p id="mrs_01_2019__ae28f2af5d1604aec8d968e0591bf353d">To solve this problem, increase the direct buffer memory based on the number of CPU cores on the NodeManager node. For example, if there are 40 CPU cores, increase the direct buffer memory to 512 MB by adding the following option to the <span class="parmname" id="mrs_01_2019__p71df5e418c9046f782a91e60647d3d31"><b>GC_OPTS</b></span> parameter of NodeManager:</p>
<p id="mrs_01_2019__a2bff42b39b4e4f218e5db4de96d9fb29">-XX:MaxDirectMemorySize=512M</p>
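<p>The sizing arithmetic above can be sketched in a few lines of Python. This is only an illustration: the function names are hypothetical, and the even split of direct buffer memory across threads is a simplifying assumption taken from the example in this section, not a description of how the buffer allocator actually divides memory.</p>
<pre class="screen">def shuffle_threads(cpu_cores):
    # Per this section: the Shuffle Service starts twice as many
    # threads as there are available CPU cores.
    return 2 * cpu_cores


def per_thread_direct_mb(cpu_cores, max_direct_mb):
    # Assumption: direct buffer memory is shared evenly by all threads.
    return max_direct_mb / shuffle_threads(cpu_cores)


if __name__ == "__main__":
    cores = 40
    # Compare the 128 MB default with the 512 MB value suggested above.
    for max_direct_mb in (128, 512):
        share = per_thread_direct_mb(cores, max_direct_mb)
        print(f"{max_direct_mb} MB over {shuffle_threads(cores)} threads: {share:.1f} MB per thread")</pre>
<p>Under these assumptions, 40 CPU cores give 1.6 MB per thread at the 128 MB default and 6.4 MB per thread at 512 MB. For other core counts, the same calculation can serve as a rough starting point for choosing a value.</p>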
<div class="note" id="mrs_01_2019__note1439313242257"><img src="public_sys-resources/note_3.0-en-us.png"><span class="notetitle"> </span><div class="notebody"><p id="mrs_01_2019__p16281144122313">By default, <strong id="mrs_01_2019__b1772610563233">-XX:MaxDirectMemorySize</strong> is not configured in the <strong id="mrs_01_2019__b820016212413">GC_OPTS</strong> parameter. To configure it, add it to the <strong id="mrs_01_2019__b78789914242">GC_OPTS</strong> parameter as a custom option.</p>
</div></div>
<p id="mrs_01_2019__a42eea0515d6d4ce595a42ad1e02a8f30">To configure the <span class="parmname" id="mrs_01_2019__pc2d06d5aeec94130b12691108f43058a"><b>GC_OPTS</b></span> parameter, log in to FusionInsight Manager, choose <strong id="mrs_01_2019__b12293512913">Cluster &gt; </strong><em id="mrs_01_2019__i132319512919">Name of the desired cluster</em><strong id="mrs_01_2019__b2229135091"> &gt; Service<span id="mrs_01_2019__ph1367715844019">s</span> </strong>&gt; <strong id="mrs_01_2019__a6bda9b42e87d47b8ba33b7bcf8d7f67a">Yarn </strong>&gt;<strong id="mrs_01_2019__a2de281bd0b564000af0ee3a02fb2e53b"> Configuration<span id="mrs_01_2019__ph5621204114">s</span></strong>, click <strong id="mrs_01_2019__b205012457588">All Configurations</strong>, choose <span class="menucascade" id="mrs_01_2019__m3b43deb3ddcf4451a0527c932d3c006c"><b><span class="uicontrol" id="mrs_01_2019__ueb921183d82b4f869c96a04956d80c70">NodeManager &gt; System</span></b></span>, and then modify the <span class="parmname" id="mrs_01_2019__paee56ee69b41452a9fcfc2bea1672b7d"><b>GC_OPTS</b></span> parameter.</p>
<div class="tablenoborder"><table cellpadding="4" cellspacing="0" summary="" id="mrs_01_2019__t6d8ada0bb75a4143a9f4347b0343642d" frame="border" border="1" rules="all"><caption><b>Table 1 </b>Parameter description</caption><thead align="left"><tr id="mrs_01_2019__r4e780bda5b864932b17d3273d8d5f2ec"><th align="left" class="cellrowborder" valign="top" width="18.82%" id="mcps1.3.2.7.2.4.1.1"><p id="mrs_01_2019__a03f44901ec16438b938f71135582ae82"><strong id="mrs_01_2019__a2a4ac282d9444990becf7ceff0d50c04">Parameter</strong></p>
</th>
<th align="left" class="cellrowborder" valign="top" width="56.81%" id="mcps1.3.2.7.2.4.1.2"><p id="mrs_01_2019__a1f3f78b869004edbb56ea8231f11acbe"><strong id="mrs_01_2019__adc163d3ff3994bc1b05b5ae5e2d4b22b">Description</strong></p>
</th>
<th align="left" class="cellrowborder" valign="top" width="24.37%" id="mcps1.3.2.7.2.4.1.3"><p id="mrs_01_2019__a859872c74d5e4031a83328ef7e8b0a81"><strong id="mrs_01_2019__a0257745a45024bc7a7950a5a44b18904">Default Value</strong></p>
</th>
</tr>
</thead>
<tbody><tr id="mrs_01_2019__rceea590307d8479f86ff8eb46ff3c3d5"><td class="cellrowborder" valign="top" width="18.82%" headers="mcps1.3.2.7.2.4.1.1 "><p id="mrs_01_2019__a0048197656934046bdcebe69ac6bb391">GC_OPTS</p>
</td>
<td class="cellrowborder" valign="top" width="56.81%" headers="mcps1.3.2.7.2.4.1.2 "><p id="mrs_01_2019__a6a894f55124f4026846edcf69a66dd10">The GC parameter of YARN NodeManager.</p>
</td>
<td class="cellrowborder" valign="top" width="24.37%" headers="mcps1.3.2.7.2.4.1.3 "><p id="mrs_01_2019__a309ebedfb6854460a758a4eb3a3bbaf4">128M</p>
</td>
</tr>
</tbody>
</table>
</div>
</div>
</div>
<div>
<div class="familylinks">
<div class="parentlink"><strong>Parent topic:</strong> <a href="mrs_01_2003.html">Spark Core</a></div>
</div>
</div>