Reviewed-by: Hasko, Vladimir <vladimir.hasko@t-systems.com> Co-authored-by: Yang, Tong <yangtong2@huawei.com> Co-committed-by: Yang, Tong <yangtong2@huawei.com>
<a name="ALM-14007"></a><a name="ALM-14007"></a>
<h1 class="topictitle1">ALM-14007 NameNode Heap Memory Usage Exceeds the Threshold</h1>
<div id="body39484380"><div class="section" id="ALM-14007__s2e9237a567ed4b98ba149efbe0576239"><h4 class="sectiontitle">Description</h4><p id="ALM-14007__en-us_topic_0070543643_p58448573">The system checks the HDFS NameNode heap memory usage every 30 seconds and compares it with the threshold, which has a default value. This alarm is generated when the HDFS NameNode heap memory usage exceeds the threshold.</p>
<p id="ALM-14007__en-us_topic_0070543643_p56275109">To change the threshold, choose <strong id="ALM-14007__en-us_topic_0070543638_b55978213">O&M</strong> > <strong id="ALM-14007__b18216526383">Alarm ></strong> <strong id="ALM-14007__b122075817202">Thresholds</strong> > <em id="ALM-14007__i10674629125819">Name of the desired cluster</em><strong id="ALM-14007__b76731229185816"> ></strong> <strong id="ALM-14007__en-us_topic_0070543638_b5927966">HDFS</strong>.</p>
<p id="ALM-14007__p27056810165635">If <strong id="ALM-14007__b48421890111935">Trigger Count</strong> is 1, this alarm is cleared when the HDFS NameNode heap memory usage is less than or equal to the threshold. If <strong id="ALM-14007__b1862372016383">Trigger Count</strong> is greater than 1, this alarm is cleared when the HDFS NameNode heap memory usage is less than or equal to 90% of the threshold.</p>
</div>
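<p>The clearing rule described above can be sketched as a small function. This is only an illustration of the stated rule, not the monitoring code used by the system; the function name and the percentage values in the usage note are hypothetical.</p>

```python
def alarm_is_cleared(usage_pct: float, threshold_pct: float, trigger_count: int) -> bool:
    """Return True if the alarm would clear at the given heap usage (illustrative only)."""
    if trigger_count < 1:
        raise ValueError("trigger_count must be at least 1")
    # Trigger Count of 1: clear once usage is at or below the threshold.
    # Trigger Count above 1: clear only at or below 90% of the threshold.
    clear_level = threshold_pct if trigger_count == 1 else 0.9 * threshold_pct
    return usage_pct <= clear_level
```

<p>For example, with a hypothetical threshold of 95%, a usage of 90% clears the alarm when Trigger Count is 1, but not when Trigger Count is 3, because the usage must first drop to 85.5% or lower.</p>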
<div class="section" id="ALM-14007__s01878b5f946e4b429282665a37a39054"><h4 class="sectiontitle">Attribute</h4>
<div class="tablenoborder"><table cellpadding="4" cellspacing="0" summary="" id="ALM-14007__en-us_topic_0070543643_table60922588" frame="border" border="1" rules="all"><thead align="left"><tr id="ALM-14007__en-us_topic_0070543643_row22910047"><th align="left" class="cellrowborder" valign="top" width="33.33333333333333%" id="mcps1.3.2.2.1.4.1.1"><p id="ALM-14007__en-us_topic_0070543643_p43774532">Alarm ID</p>
</th>
<th align="left" class="cellrowborder" valign="top" width="33.33333333333333%" id="mcps1.3.2.2.1.4.1.2"><p id="ALM-14007__en-us_topic_0070543643_p56076189">Alarm Severity</p>
</th>
<th align="left" class="cellrowborder" valign="top" width="33.33333333333333%" id="mcps1.3.2.2.1.4.1.3"><p id="ALM-14007__en-us_topic_0070543643_p45877459">Automatically Cleared</p>
</th>
</tr>
</thead>
<tbody><tr id="ALM-14007__en-us_topic_0070543643_row25086669"><td class="cellrowborder" valign="top" width="33.33333333333333%" headers="mcps1.3.2.2.1.4.1.1 "><p id="ALM-14007__en-us_topic_0070543643_p18754291">14007</p>
</td>
<td class="cellrowborder" valign="top" width="33.33333333333333%" headers="mcps1.3.2.2.1.4.1.2 "><p id="ALM-14007__en-us_topic_0070543643_p42702639">Major</p>
</td>
<td class="cellrowborder" valign="top" width="33.33333333333333%" headers="mcps1.3.2.2.1.4.1.3 "><p id="ALM-14007__en-us_topic_0070543643_p36361702">Yes</p>
</td>
</tr>
</tbody>
</table>
</div>
</div>
<div class="section" id="ALM-14007__s7d7b3a3cd18e41d98b488ee9ed9507e3"><h4 class="sectiontitle">Parameters</h4>
<div class="tablenoborder"><table cellpadding="4" cellspacing="0" summary="" id="ALM-14007__en-us_topic_0070543643_table59616770" frame="border" border="1" rules="all"><thead align="left"><tr id="ALM-14007__en-us_topic_0070543643_row48135374"><th align="left" class="cellrowborder" valign="top" width="50%" id="mcps1.3.3.2.1.3.1.1"><p id="ALM-14007__en-us_topic_0070543643_p6651221">Name</p>
</th>
<th align="left" class="cellrowborder" valign="top" width="50%" id="mcps1.3.3.2.1.3.1.2"><p id="ALM-14007__en-us_topic_0070543643_p1878020">Meaning</p>
</th>
</tr>
</thead>
<tbody><tr id="ALM-14007__row202182325361"><td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.3.2.1.3.1.1 "><p id="ALM-14007__p192431315431">Source</p>
</td>
<td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.3.2.1.3.1.2 "><p id="ALM-14007__p692551319435">Specifies the cluster for which the alarm is generated.</p>
</td>
</tr>
<tr id="ALM-14007__en-us_topic_0070543643_row17901956"><td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.3.2.1.3.1.1 "><p id="ALM-14007__en-us_topic_0070543643_p40772349">ServiceName</p>
</td>
<td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.3.2.1.3.1.2 "><p id="ALM-14007__en-us_topic_0070543643_p14225942">Specifies the service for which the alarm is generated.</p>
</td>
</tr>
<tr id="ALM-14007__en-us_topic_0070543643_row60924614"><td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.3.2.1.3.1.1 "><p id="ALM-14007__en-us_topic_0070543643_p35946674">RoleName</p>
</td>
<td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.3.2.1.3.1.2 "><p id="ALM-14007__en-us_topic_0070543643_p25999507">Specifies the role for which the alarm is generated.</p>
</td>
</tr>
<tr id="ALM-14007__en-us_topic_0070543643_row32668979"><td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.3.2.1.3.1.1 "><p id="ALM-14007__en-us_topic_0070543643_p28941662">HostName</p>
</td>
<td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.3.2.1.3.1.2 "><p id="ALM-14007__en-us_topic_0070543643_p62573298">Specifies the host for which the alarm is generated.</p>
</td>
</tr>
<tr id="ALM-14007__en-us_topic_0070543643_row26288770"><td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.3.2.1.3.1.1 "><p id="ALM-14007__en-us_topic_0070543643_p49015621">Trigger condition</p>
</td>
<td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.3.2.1.3.1.2 "><p id="ALM-14007__en-us_topic_0070543643_p10842373">Specifies the threshold triggering the alarm. If the current indicator value exceeds this threshold, the alarm is generated.</p>
</td>
</tr>
</tbody>
</table>
</div>
</div>
<div class="section" id="ALM-14007__s82f5e85a72104a79a9b3fdbe2358da26"><h4 class="sectiontitle">Impact on the System</h4><p id="ALM-14007__en-us_topic_0070543643_p5817001">When the HDFS NameNode heap memory usage is too high, HDFS data read/write performance is affected.</p>
</div>
<div class="section" id="ALM-14007__s7a1d1f83cd8d40d1a6e4245f49f2d2a0"><h4 class="sectiontitle">Possible Causes</h4><p id="ALM-14007__en-us_topic_0070543643_p1415066">The heap memory configured for the HDFS NameNode is insufficient.</p>
</div>
<div class="section" id="ALM-14007__s456adb8ffc1644438bdc8f3c1a11babb"><h4 class="sectiontitle">Procedure</h4><p class="tableheading" id="ALM-14007__en-us_topic_0070543643_p47511487"><strong id="ALM-14007__b39105165165337">Delete unnecessary files.</strong></p>
<ol id="ALM-14007__ol14372328165349"><li id="ALM-14007__li21342252165341"><span>Log in to the node where the HDFS client is installed as user <strong id="ALM-14007__b2819430165341">root</strong>. <span id="ALM-14007__text101733453110"></span>Run <strong id="ALM-14007__b25374874165341">cd</strong> to switch to the client installation directory, and then run <strong id="ALM-14007__b27047276165341">source bigdata_env</strong> to load the environment variables.</span><p><p class="litext" id="ALM-14007__p42098893165341">If the cluster is in security mode, perform security authentication:</p>
<p class="litext" id="ALM-14007__p54567144165341">Run the <strong id="ALM-14007__b43345718165341">kinit hdfs</strong> command and enter the password as prompted. Obtain the password from the administrator.</p>
</p></li><li id="ALM-14007__li56354808165341"><span>Run the <strong id="ALM-14007__b57862548165341">hdfs dfs -rm -r </strong><em id="ALM-14007__i51000888165341">file or directory</em> command to delete unnecessary files.</span></li><li id="ALM-14007__li46367881165341"><span>Check whether the alarm is cleared.</span><p><ul class="subitemlist" id="ALM-14007__ul41169163165341"><li id="ALM-14007__li37431231165341">If yes, no further action is required.</li><li id="ALM-14007__li12030891165341">If no, go to <a href="#ALM-14007__li15187254165341">4</a>.</li></ul>
</p></li></ol>
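<p>When deciding which files to delete, it can help to see which directories hold the most files and subdirectories. The sketch below parses the text printed by <strong>hdfs dfs -count</strong> <em>path</em>, whose columns are DIR_COUNT, FILE_COUNT, CONTENT_SIZE, and PATHNAME. The helper names are hypothetical, and the paths in the usage note are made up for illustration.</p>

```python
from typing import List, NamedTuple


class CountRow(NamedTuple):
    dirs: int
    files: int
    size_bytes: int
    path: str


def parse_count_output(text: str) -> List[CountRow]:
    """Parse lines of the form: DIR_COUNT FILE_COUNT CONTENT_SIZE PATHNAME."""
    rows = []
    for line in text.splitlines():
        # Split into at most four fields so that paths containing spaces survive.
        parts = line.split(None, 3)
        if len(parts) != 4:
            continue
        try:
            dirs, files, size = (int(p) for p in parts[:3])
        except ValueError:
            # Skip a header line or any other non-numeric line.
            continue
        rows.append(CountRow(dirs, files, size, parts[3].strip()))
    return rows


def largest_by_objects(rows: List[CountRow], top: int = 5) -> List[CountRow]:
    """Return the rows with the most directories plus files, largest first."""
    return sorted(rows, key=lambda r: r.dirs + r.files, reverse=True)[:top]
```

<p>For example, save the output of <strong>hdfs dfs -count /user/*</strong> to a file, pass its content to <strong>parse_count_output</strong>, and review the top entries returned by <strong>largest_by_objects</strong> before deleting anything.</p>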
<p class="tableheading" id="ALM-14007__p34978148165341"><strong id="ALM-14007__b36471977165355">Check the NameNode JVM memory usage and configuration.</strong></p>
<ol start="4" id="ALM-14007__ol41944448165412"><li id="ALM-14007__li15187254165341"><a name="ALM-14007__li15187254165341"></a><a name="li15187254165341"></a><span>On the FusionInsight Manager portal, choose <strong id="ALM-14007__b18874158183119">Cluster > </strong><em id="ALM-14007__i6876108173116">Name of the desired cluster</em><strong id="ALM-14007__b1387513815312"> > Services</strong> > <strong id="ALM-14007__b64810847165341">HDFS</strong>.</span></li><li id="ALM-14007__li65654756165341"><span>In the <strong id="ALM-14007__b30282645162958">Basic Information</strong> area, click <strong id="ALM-14007__b22208053165341">NameNode(Active)</strong> to go to the HDFS WebUI.</span><p><div class="note" id="ALM-14007__note840916461457"><img src="public_sys-resources/note_3.0-en-us.png"><span class="notetitle"> </span><div class="notebody"><p id="ALM-14007__en-us_topic_0193189480_p91833832915">By default, the <strong id="ALM-14007__en-us_topic_0193189480_b4780151814294">admin</strong> user does not have the permissions to manage other components. If the page cannot be opened or the displayed content is incomplete when you access the native UI of a component due to insufficient permissions, you can manually create a user with the permissions to manage that component.</p>
</div></div>
</p></li><li id="ALM-14007__li13697230165341"><a name="ALM-14007__li13697230165341"></a><a name="li13697230165341"></a><span>On the HDFS WebUI, click the <strong id="ALM-14007__b54021893165341">Overview</strong> tab. In <strong id="ALM-14007__b16434995165341">Summary</strong>, check the numbers of files, directories, and blocks in the HDFS.</span></li><li id="ALM-14007__li46442940165341"><a name="ALM-14007__li46442940165341"></a><a name="li46442940165341"></a><span>On the FusionInsight Manager portal, choose <strong id="ALM-14007__b17530131611315">Cluster > </strong><em id="ALM-14007__i11533616103119">Name of the desired cluster</em><strong id="ALM-14007__b653131623114"> > Services</strong> > <strong id="ALM-14007__b35733815165341">HDFS</strong> > <strong id="ALM-14007__b5932806794051">Configurations</strong> > <strong id="ALM-14007__b11711915165341">All</strong> <strong id="ALM-14007__b15365646121318">Configurations</strong>. In <strong id="ALM-14007__b38298378165341">Search</strong>, enter <strong id="ALM-14007__b9141084165341">GC_OPTS</strong> to check the <strong id="ALM-14007__b15160900165341">GC_OPTS</strong> memory parameter of <strong id="ALM-14007__b2230378165341">HDFS->NameNode</strong>.</span></li></ol>
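<p>To read the heap settings out of a <strong>GC_OPTS</strong> value copied from the configuration page, a minimal sketch such as the following may be used. It assumes that the value is a space-separated list of standard JVM options and only looks at <strong>-Xms</strong>, <strong>-Xmx</strong>, <strong>-XX:NewSize</strong>, and <strong>-XX:MaxNewSize</strong>. The function name is hypothetical.</p>

```python
import re
from typing import Dict

# Multipliers for the size suffixes accepted by the JVM options handled here.
_UNITS = {"K": 1024, "M": 1024 ** 2, "G": 1024 ** 3}

_HEAP_OPTION = re.compile(r"-(Xms|Xmx|XX:NewSize=|XX:MaxNewSize=)(\d+)([KMGkmg]?)")


def parse_heap_options(gc_opts: str) -> Dict[str, int]:
    """Extract heap-related sizes (in bytes) from a GC_OPTS string."""
    sizes = {}
    for name, number, unit in _HEAP_OPTION.findall(gc_opts):
        # Normalize "XX:NewSize=" to "NewSize"; "Xms" and "Xmx" stay unchanged.
        key = name.rstrip("=").replace("XX:", "")
        # If an option appears more than once, this sketch keeps the last value.
        sizes[key] = int(number) * _UNITS.get(unit.upper(), 1)
    return sizes
```

<p>For example, <strong>parse_heap_options("-Xms6G -Xmx6G -XX:NewSize=512M -XX:MaxNewSize=512M")</strong> returns the four sizes in bytes, which can then be compared with the number of files, directories, and blocks shown on the HDFS WebUI.</p>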
<p class="tableheading" id="ALM-14007__p20073407165341"><strong id="ALM-14007__b38406290165417">Adjust the configuration in the system.</strong></p>
<ol start="8" id="ALM-14007__ol31069147165630"><li id="ALM-14007__li29398803165341"><span>Based on the number of file objects obtained in <a href="#ALM-14007__li13697230165341">6</a> and the NameNode heap memory parameters obtained in <a href="#ALM-14007__li46442940165341">7</a>, check whether the configured heap memory is too small for the current number of file objects.</span><p><ul class="subitemlist" id="ALM-14007__ul3266533165341"><li id="ALM-14007__li49346526165341">If yes, go to <a href="#ALM-14007__li14671769165341">9</a>.</li><li id="ALM-14007__li37645650165341">If no, go to <a href="#ALM-14007__li58431113165341">11</a>.</li></ul>
<div class="note" id="ALM-14007__note11220135014616"><img src="public_sys-resources/note_3.0-en-us.png"><span class="notetitle"> </span><div class="notebody"><p id="ALM-14007__p42201450184618">The recommended mapping between the number of HDFS file objects (filesystem objects = files + blocks) and the JVM parameters configured for NameNode is as follows:</p>
<ul id="ALM-14007__ul1521217912472"><li id="ALM-14007__li32133919470">If the number of file objects reaches 10,000,000, you are advised to set the JVM parameters as follows: -Xms6G -Xmx6G -XX:NewSize=512M -XX:MaxNewSize=512M</li><li id="ALM-14007__li1821359134715">If the number of file objects reaches 20,000,000, you are advised to set the JVM parameters as follows: -Xms12G -Xmx12G -XX:NewSize=1G -XX:MaxNewSize=1G</li><li id="ALM-14007__li1521369194710">If the number of file objects reaches 50,000,000, you are advised to set the JVM parameters as follows: -Xms32G -Xmx32G -XX:NewSize=3G -XX:MaxNewSize=3G</li><li id="ALM-14007__li721310912471">If the number of file objects reaches 100,000,000, you are advised to set the JVM parameters as follows: -Xms64G -Xmx64G -XX:NewSize=6G -XX:MaxNewSize=6G</li><li id="ALM-14007__li18213179194716">If the number of file objects reaches 200,000,000, you are advised to set the JVM parameters as follows: -Xms96G -Xmx96G -XX:NewSize=9G -XX:MaxNewSize=9G</li><li id="ALM-14007__li92138994714">If the number of file objects reaches 300,000,000, you are advised to set the JVM parameters as follows: -Xms164G -Xmx164G -XX:NewSize=12G -XX:MaxNewSize=12G</li></ul>
</div></div>
</p></li><li id="ALM-14007__li14671769165341"><a name="ALM-14007__li14671769165341"></a><a name="li14671769165341"></a><span>Modify the heap memory parameters of the NameNode based on the mapping between the number of file objects and the memory. Click <strong id="ALM-14007__b12170134216478">Save</strong> and choose <strong id="ALM-14007__b4308232137">Dashboard </strong>><strong id="ALM-14007__b830915321435"> More</strong> > <strong id="ALM-14007__b11170184254710">Restart Service</strong>.</span></li><li id="ALM-14007__li2473609165341"><span>Check whether the alarm is cleared.</span><p><ul class="subitemlist" id="ALM-14007__ul27371186165341"><li id="ALM-14007__li64937064165341">If yes, no further action is required.</li><li id="ALM-14007__li25410864165341">If no, go to <a href="#ALM-14007__li58431113165341">11</a>.</li></ul>
</p></li></ol>
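<p>The recommended mapping in the note above can be expressed as a lookup. The sketch below copies the six tiers from the note and returns the options of the smallest tier that covers the current number of file objects. Rounding up to the next tier is an assumption made for this illustration, and the function name is hypothetical.</p>

```python
from bisect import bisect_left

# (number of file objects, recommended JVM parameters), copied from the note above.
RECOMMENDED = [
    (10_000_000, "-Xms6G -Xmx6G -XX:NewSize=512M -XX:MaxNewSize=512M"),
    (20_000_000, "-Xms12G -Xmx12G -XX:NewSize=1G -XX:MaxNewSize=1G"),
    (50_000_000, "-Xms32G -Xmx32G -XX:NewSize=3G -XX:MaxNewSize=3G"),
    (100_000_000, "-Xms64G -Xmx64G -XX:NewSize=6G -XX:MaxNewSize=6G"),
    (200_000_000, "-Xms96G -Xmx96G -XX:NewSize=9G -XX:MaxNewSize=9G"),
    (300_000_000, "-Xms164G -Xmx164G -XX:NewSize=12G -XX:MaxNewSize=12G"),
]


def recommended_jvm_opts(files: int, blocks: int) -> str:
    """Return the JVM parameters of the smallest tier covering files + blocks."""
    objects = files + blocks  # file objects = files + blocks, as defined in the note
    limits = [limit for limit, _ in RECOMMENDED]
    # Assumption: round up to the first tier whose object count is >= objects.
    index = bisect_left(limits, objects)
    if index == len(RECOMMENDED):
        raise ValueError("more file objects than the largest documented tier")
    return RECOMMENDED[index][1]
```

<p>For example, 30,000,000 files and 15,000,000 blocks give 45,000,000 file objects, which falls under the 50,000,000 tier in this sketch. Object counts above 300,000,000 are not covered by the note, so the function raises an error instead of guessing.</p>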
<p class="tableheading" id="ALM-14007__p45014088165341"><strong id="ALM-14007__b54287357165635">Collect fault information.</strong></p>
<ol start="11" id="ALM-14007__ol20366342165637"><li id="ALM-14007__li58431113165341"><a name="ALM-14007__li58431113165341"></a><a name="li58431113165341"></a><span>On the FusionInsight Manager portal, choose <strong id="ALM-14007__b39977366113627">O&M</strong> > <strong id="ALM-14007__b24251979113627">Log > Download</strong>.</span></li><li id="ALM-14007__li8739951165341"><span>In <strong id="ALM-14007__b56117973165341">Service</strong>, select the following services for the required cluster:</span><p><ul class="subitemlist" id="ALM-14007__ul30797267165341"><li id="ALM-14007__li49261978165341">ZooKeeper</li><li id="ALM-14007__li40704620165341">HDFS</li></ul>
</p></li><li id="ALM-14007__li1145664103113"><span>Click <span><img id="ALM-14007__image1945644173117" src="en-us_image_0269383962.png"></span> in the upper right corner, and set <strong id="ALM-14007__b6456941173117">Start Date</strong> and <strong id="ALM-14007__b11456154113318">End Date</strong> for log collection to 10 minutes before and 10 minutes after the alarm generation time, respectively. Then, click <strong id="ALM-14007__b13456164113319">Download</strong>.</span></li><li id="ALM-14007__li29729817165341"><span>Contact the <span id="ALM-14007__text4614151421417">O&M personnel</span> and send the collected logs.</span></li></ol>
</div>
<div class="section" id="ALM-14007__section1529716184534"><h4 class="sectiontitle">Alarm Clearing</h4><p id="ALM-14007__p4677152685316">After the fault is rectified, the system automatically clears this alarm.</p>
</div>
<div class="section" id="ALM-14007__s9334d7add17b4f1bba1d923647c06918"><h4 class="sectiontitle">Related Information</h4><p id="ALM-14007__en-us_topic_0070543643_p63059085">None</p>
</div>
</div>
<div>
<div class="familylinks">
<div class="parentlink"><strong>Parent topic:</strong> <a href="mrs_01_1298.html">Alarm Reference (Applicable to MRS 3.x)</a></div>
</div>
</div>