Reviewed-by: Hasko, Vladimir <vladimir.hasko@t-systems.com> Co-authored-by: Yang, Tong <yangtong2@huawei.com> Co-committed-by: Yang, Tong <yangtong2@huawei.com>
<a name="ALM-14014"></a><a name="ALM-14014"></a>
<h1 class="topictitle1">ALM-14014 NameNode GC Time Exceeds the Threshold</h1>
<div id="body27153046"><div class="section" id="ALM-14014__s6a14ec140cc046479724474c01c99ba2"><h4 class="sectiontitle">Description</h4><p id="ALM-14014__en-us_topic_0070543651_p31450727">The system checks the garbage collection (GC) duration of the NameNode process every 60 seconds. This alarm is generated when the GC duration exceeds the threshold (12 seconds by default).</p>
<p id="ALM-14014__en-us_topic_0070543651_p14621088">This alarm is cleared when the GC duration is less than the threshold.</p>
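<p>The following Python sketch illustrates the generate/clear rule described above. It is not the FusionInsight Manager implementation. It assumes that each 60-second check yields one GC duration value in seconds, that the alarm is generated when a value exceeds the threshold and cleared when a value is less than the threshold, and that a value exactly equal to the threshold leaves the alarm state unchanged. The function name <code>evaluate_samples</code> is hypothetical.</p>

```python
from typing import Iterable, List, Tuple

DEFAULT_THRESHOLD_S = 12.0  # default threshold from this alarm's description


def evaluate_samples(samples: Iterable[float],
                     threshold_s: float = DEFAULT_THRESHOLD_S) -> List[Tuple[int, str]]:
    """Return (sample_index, event) pairs for each alarm state change.

    Hypothetical sketch: each sample is the GC duration (in seconds)
    measured during one 60-second check period.
    """
    events: List[Tuple[int, str]] = []
    active = False
    for i, gc_seconds in enumerate(samples):
        if not active and gc_seconds > threshold_s:
            # GC duration exceeds the threshold: generate the alarm.
            active = True
            events.append((i, "generated"))
        elif active and gc_seconds < threshold_s:
            # GC duration is less than the threshold: clear the alarm.
            active = False
            events.append((i, "cleared"))
        # A sample equal to the threshold neither generates nor clears the alarm.
    return events


if __name__ == "__main__":
    # One sample per 60-second check.
    print(evaluate_samples([3.0, 13.5, 12.0, 15.0, 4.2]))
    # [(1, 'generated'), (4, 'cleared')]
```

<p>Tracking an explicit <code>active</code> flag means consecutive samples above the threshold do not produce repeated alarms, which matches the description of a single alarm that is later cleared automatically.</p>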
</div>
<div class="section" id="ALM-14014__sd0c7f31f2821461682996d81d0298c3f"><h4 class="sectiontitle">Attribute</h4>
<div class="tablenoborder"><table cellpadding="4" cellspacing="0" summary="" id="ALM-14014__en-us_topic_0070543651_table43457451" frame="border" border="1" rules="all"><thead align="left"><tr id="ALM-14014__en-us_topic_0070543651_row25725008"><th align="left" class="cellrowborder" valign="top" width="33.33333333333333%" id="mcps1.3.2.2.1.4.1.1"><p id="ALM-14014__en-us_topic_0070543651_p3350926">Alarm ID</p>
</th>
<th align="left" class="cellrowborder" valign="top" width="33.33333333333333%" id="mcps1.3.2.2.1.4.1.2"><p id="ALM-14014__en-us_topic_0070543651_p2989564">Alarm Severity</p>
</th>
<th align="left" class="cellrowborder" valign="top" width="33.33333333333333%" id="mcps1.3.2.2.1.4.1.3"><p id="ALM-14014__en-us_topic_0070543651_p40828098">Automatically Cleared</p>
</th>
</tr>
</thead>
<tbody><tr id="ALM-14014__en-us_topic_0070543651_row18741641"><td class="cellrowborder" valign="top" width="33.33333333333333%" headers="mcps1.3.2.2.1.4.1.1 "><p id="ALM-14014__en-us_topic_0070543651_p41677944">14014</p>
</td>
<td class="cellrowborder" valign="top" width="33.33333333333333%" headers="mcps1.3.2.2.1.4.1.2 "><p id="ALM-14014__en-us_topic_0070543651_p20470328">Major</p>
</td>
<td class="cellrowborder" valign="top" width="33.33333333333333%" headers="mcps1.3.2.2.1.4.1.3 "><p id="ALM-14014__en-us_topic_0070543651_p47483889">Yes</p>
</td>
</tr>
</tbody>
</table>
</div>
</div>
<div class="section" id="ALM-14014__sc22595f8d7664f3b99cf12ffc5e9ada6"><h4 class="sectiontitle">Parameters</h4>
<div class="tablenoborder"><table cellpadding="4" cellspacing="0" summary="" id="ALM-14014__en-us_topic_0070543651_table20989770" frame="border" border="1" rules="all"><thead align="left"><tr id="ALM-14014__en-us_topic_0070543651_row58429415"><th align="left" class="cellrowborder" valign="top" width="50%" id="mcps1.3.3.2.1.3.1.1"><p id="ALM-14014__en-us_topic_0070543651_p35162213">Name</p>
</th>
<th align="left" class="cellrowborder" valign="top" width="50%" id="mcps1.3.3.2.1.3.1.2"><p id="ALM-14014__en-us_topic_0070543651_p29567015">Meaning</p>
</th>
</tr>
</thead>
<tbody><tr id="ALM-14014__row8875193320346"><td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.3.2.1.3.1.1 "><p id="ALM-14014__p192431315431">Source</p>
</td>
<td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.3.2.1.3.1.2 "><p id="ALM-14014__p692551319435">Specifies the cluster for which the alarm is generated.</p>
</td>
</tr>
<tr id="ALM-14014__en-us_topic_0070543651_row46118038"><td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.3.2.1.3.1.1 "><p id="ALM-14014__en-us_topic_0070543651_p44573593">ServiceName</p>
</td>
<td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.3.2.1.3.1.2 "><p id="ALM-14014__en-us_topic_0070543651_p53691289">Specifies the service for which the alarm is generated.</p>
</td>
</tr>
<tr id="ALM-14014__en-us_topic_0070543651_row13459555"><td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.3.2.1.3.1.1 "><p id="ALM-14014__en-us_topic_0070543651_p16482152">RoleName</p>
</td>
<td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.3.2.1.3.1.2 "><p id="ALM-14014__en-us_topic_0070543651_p59985950">Specifies the role for which the alarm is generated.</p>
</td>
</tr>
<tr id="ALM-14014__en-us_topic_0070543651_row3002643"><td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.3.2.1.3.1.1 "><p id="ALM-14014__en-us_topic_0070543651_p41887508">HostName</p>
</td>
<td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.3.2.1.3.1.2 "><p id="ALM-14014__en-us_topic_0070543651_p37445019">Specifies the host for which the alarm is generated.</p>
</td>
</tr>
<tr id="ALM-14014__en-us_topic_0070543651_row1460851"><td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.3.2.1.3.1.1 "><p id="ALM-14014__en-us_topic_0070543651_p51220091">Trigger Condition</p>
</td>
<td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.3.2.1.3.1.2 "><p id="ALM-14014__en-us_topic_0070543651_p55186683">Specifies the threshold for triggering the alarm. If the current indicator value exceeds this threshold, the alarm is generated.</p>
</td>
</tr>
</tbody>
</table>
</div>
</div>
<div class="section" id="ALM-14014__s18b9c1e5b7c84d309eeaeca80d91df46"><h4 class="sectiontitle">Impact on the System</h4><p id="ALM-14014__en-us_topic_0070543651_p40936348">A long GC duration of the NameNode process may make the NameNode slow to respond or unresponsive, which may interrupt HDFS services.</p>
</div>
<div class="section" id="ALM-14014__sdd903c5494904cf3afaa7f4b70f5dcae"><h4 class="sectiontitle">Possible Causes</h4><p id="ALM-14014__en-us_topic_0070543651_p27509893">The heap memory of the NameNode instance is overused or inappropriately allocated, which causes frequent GCs.</p>
</div>
<div class="section" id="ALM-14014__s256c409480c84296a7af4dba7ae4bdae"><h4 class="sectiontitle">Procedure</h4><p class="tableheading" id="ALM-14014__en-us_topic_0070543651_p13708895"><strong id="ALM-14014__b220983293118">Check the GC duration.</strong></p>
<ol id="ALM-14014__ol5463741993136"><li id="ALM-14014__li392690193121"><span>On the FusionInsight Manager portal, choose <strong id="ALM-14014__b0989163143314">O&amp;M</strong> > <strong id="ALM-14014__b1098973103314">Alarm</strong> > <strong id="ALM-14014__b39898316336">Alarms</strong>. On the displayed page, click the drop-down arrow of <strong id="ALM-14014__b622524616819">ALM-14014 NameNode GC Time Exceeds the Threshold</strong>. Check the role name in <strong id="ALM-14014__b14790172183618">Location</strong> and confirm the IP address of the instance.</span></li><li id="ALM-14014__li1875181993121"><span>On the FusionInsight Manager portal, choose <strong id="ALM-14014__b18135018354">Cluster > </strong><em id="ALM-14014__i38375019350">Name of the desired cluster</em><strong id="ALM-14014__b158135023518"> > Services</strong> > <strong id="ALM-14014__b4964353493121">HDFS</strong> > <strong id="ALM-14014__b4413862293121">Instance</strong> > <strong id="ALM-14014__b6170327993121">NameNode (IP address for which the alarm is generated)</strong>. Click the drop-down menu in the upper right corner of <strong id="ALM-14014__b3273144141318">Chart</strong>, choose <strong id="ALM-14014__b7246166191312">Customize</strong> > <strong id="ALM-14014__b15702441192211">Garbage Collection</strong>, and select<strong id="ALM-14014__b540245213164"> </strong><strong id="ALM-14014__b3190969793121">NameNode Garbage Collection (GC)</strong> to view the per-minute GC duration statistics of the NameNode process.</span></li><li id="ALM-14014__li2097964293121"><span>Check whether the per-minute GC duration of the NameNode process exceeds the threshold (12 seconds by default).</span><p><ul class="subitemlist" id="ALM-14014__ul4707031493121"><li id="ALM-14014__li4250240993121">If yes, go to <a href="#ALM-14014__li5224893093121">4</a>.</li><li id="ALM-14014__li2014311593121">If no, go to <a href="#ALM-14014__li1978585593121">7</a>.</li></ul>
</p></li><li id="ALM-14014__li5224893093121"><a name="ALM-14014__li5224893093121"></a><a name="li5224893093121"></a><span>On the FusionInsight Manager portal, choose <strong id="ALM-14014__b292118588357">Cluster > </strong><em id="ALM-14014__i169231058123518">Name of the desired cluster</em><strong id="ALM-14014__b11922758143516"> > Services</strong> > <strong id="ALM-14014__b2162943793121">HDFS</strong> > <strong id="ALM-14014__b6593994694053">Configurations</strong> > <strong id="ALM-14014__b715398193121">All</strong> <strong id="ALM-14014__b6816162115232">Configurations</strong> > <strong id="ALM-14014__b6438583193121">NameNode</strong> > <strong id="ALM-14014__b4260156993121">System</strong>, and increase the heap memory settings (such as -Xms and -Xmx) in the <strong id="ALM-14014__b2817505893121">GC_OPTS</strong> parameter as required.</span><p><div class="note" id="ALM-14014__note18667101910516"><img src="public_sys-resources/note_3.0-en-us.png"><span class="notetitle"> </span><div class="notebody"><p id="ALM-14014__p18501121155">The recommended mapping between the number of HDFS file objects (filesystem objects = files + blocks) and the NameNode JVM parameters is as follows:</p>
<ul id="ALM-14014__ul2501162119513"><li id="ALM-14014__li1650115214519">If the number of file objects reaches 10,000,000, you are advised to set the JVM parameters as follows: -Xms6G -Xmx6G -XX:NewSize=512M -XX:MaxNewSize=512M</li><li id="ALM-14014__li20502821157">If the number of file objects reaches 20,000,000, you are advised to set the JVM parameters as follows: -Xms12G -Xmx12G -XX:NewSize=1G -XX:MaxNewSize=1G</li><li id="ALM-14014__li250262119515">If the number of file objects reaches 50,000,000, you are advised to set the JVM parameters as follows: -Xms32G -Xmx32G -XX:NewSize=3G -XX:MaxNewSize=3G</li><li id="ALM-14014__li250218212057">If the number of file objects reaches 100,000,000, you are advised to set the JVM parameters as follows: -Xms64G -Xmx64G -XX:NewSize=6G -XX:MaxNewSize=6G</li><li id="ALM-14014__li1112012281252">If the number of file objects reaches 200,000,000, you are advised to set the JVM parameters as follows: -Xms96G -Xmx96G -XX:NewSize=9G -XX:MaxNewSize=9G</li><li id="ALM-14014__li650242115516">If the number of file objects reaches 300,000,000, you are advised to set the JVM parameters as follows: -Xms164G -Xmx164G -XX:NewSize=12G -XX:MaxNewSize=12G</li></ul>
</div></div>
</p></li><li id="ALM-14014__li47832793121"><span>Save the configuration and restart the NameNode instance.</span></li><li id="ALM-14014__li6142852993121"><span>Check whether the alarm is cleared.</span><p><ul class="subitemlist" id="ALM-14014__ul5129715193121"><li id="ALM-14014__li430494493121">If yes, no further action is required.</li><li id="ALM-14014__li1315622393121">If no, go to <a href="#ALM-14014__li1978585593121">7</a>.</li></ul>
</p></li></ol>
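<p>The sizing note in step 4 can be expressed as a simple lookup. The following Python sketch is illustrative only and makes several assumptions: "reaches" is read as "is greater than or equal to" a tier boundary, object counts below 10,000,000 return no recommendation because the note gives none, and counts above 300,000,000 map to the largest listed tier even though the note gives no guidance beyond it. The function name <code>recommended_namenode_jvm_opts</code> is hypothetical, and the returned string contains only the heap options from the note, not a complete <strong>GC_OPTS</strong> value.</p>

```python
from typing import Optional

# Tiers copied from the note above:
# (file objects reached, heap size for -Xms/-Xmx, new-generation size).
# File objects = files + blocks.
_TIERS = [
    (10_000_000, "6G", "512M"),
    (20_000_000, "12G", "1G"),
    (50_000_000, "32G", "3G"),
    (100_000_000, "64G", "6G"),
    (200_000_000, "96G", "9G"),
    (300_000_000, "164G", "12G"),
]


def recommended_namenode_jvm_opts(files: int, blocks: int) -> Optional[str]:
    """Return the heap options for the highest tier reached (hypothetical helper),
    or None if the object count is below the smallest listed tier."""
    objects = files + blocks
    chosen = None
    for threshold, heap, new_size in _TIERS:
        if objects >= threshold:
            chosen = (heap, new_size)
    if chosen is None:
        return None
    heap, new_size = chosen
    return (f"-Xms{heap} -Xmx{heap} "
            f"-XX:NewSize={new_size} -XX:MaxNewSize={new_size}")


if __name__ == "__main__":
    # 30,000,000 files + 25,000,000 blocks = 55,000,000 objects (50,000,000 tier)
    print(recommended_namenode_jvm_opts(30_000_000, 25_000_000))
    # -Xms32G -Xmx32G -XX:NewSize=3G -XX:MaxNewSize=3G
```

<p>Setting <code>-Xms</code> equal to <code>-Xmx</code>, as every tier in the note does, fixes the heap size at startup. Any other options already present in <strong>GC_OPTS</strong> are outside the scope of this sketch.</p>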
<p class="tableheading" id="ALM-14014__p5902117593121"><strong id="ALM-14014__b1432065693145">Collect fault information.</strong></p>
<ol start="7" id="ALM-14014__ol4320768793150"><li id="ALM-14014__li1978585593121"><a name="ALM-14014__li1978585593121"></a><a name="li1978585593121"></a><span>On the FusionInsight Manager portal, choose <strong id="ALM-14014__b39977366113627">O&amp;M</strong> > <strong id="ALM-14014__b24251979113627">Log > Download</strong>.</span></li><li id="ALM-14014__li2645568993121"><span>From the <strong id="ALM-14014__b5915043293121">Service</strong> list, select <strong id="ALM-14014__b4385497293121">NameNode</strong> in the required cluster.</span></li><li id="ALM-14014__li1145664103113"><span>Click <span><img id="ALM-14014__image1945644173117" src="en-us_image_0269383969.png"></span> in the upper right corner, and set <strong id="ALM-14014__b6456941173117">Start Date</strong> and <strong id="ALM-14014__b11456154113318">End Date</strong> for log collection to 10 minutes before and 10 minutes after the alarm generation time, respectively. Then click <strong id="ALM-14014__b13456164113319">Download</strong>.</span></li><li id="ALM-14014__li6286535393121"><span>Contact the <span id="ALM-14014__text4614151421417">O&amp;M personnel</span> and send them the collected logs.</span></li></ol>
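<p>Step 9 asks for a log collection window that starts 10 minutes before and ends 10 minutes after the alarm generation time. A minimal Python sketch of that calculation follows. The function name <code>log_collection_window</code> is hypothetical, and the alarm time is assumed to be read from the alarm details in FusionInsight Manager and expressed in the same time zone that the Download page uses.</p>

```python
from datetime import datetime, timedelta
from typing import Tuple

MARGIN = timedelta(minutes=10)  # 10 minutes on each side of the alarm time


def log_collection_window(alarm_time: datetime) -> Tuple[datetime, datetime]:
    """Return (start, end) for log collection around the alarm generation time."""
    return alarm_time - MARGIN, alarm_time + MARGIN


if __name__ == "__main__":
    # An alarm shortly after midnight: the window starts on the previous day.
    start, end = log_collection_window(datetime(2024, 3, 1, 0, 5, 0))
    print(start.strftime("%Y-%m-%d %H:%M:%S"), end.strftime("%Y-%m-%d %H:%M:%S"))
    # 2024-02-29 23:55:00 2024-03-01 00:15:00
```

<p>Using <code>timedelta</code> arithmetic rather than adjusting the minute field by hand handles day and month boundaries, as the example shows.</p>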
</div>
<div class="section" id="ALM-14014__section1529716184534"><h4 class="sectiontitle">Alarm Clearing</h4><p id="ALM-14014__p4677152685316">After the fault is rectified, the system automatically clears this alarm.</p>
</div>
<div class="section" id="ALM-14014__sf238038be8af43bebf0dd26df1929d6d"><h4 class="sectiontitle">Related Information</h4><p id="ALM-14014__en-us_topic_0070543651_p47549696">None</p>
</div>
</div>
<div>
<div class="familylinks">
<div class="parentlink"><strong>Parent topic:</strong> <a href="mrs_01_1298.html">Alarm Reference (Applicable to MRS 3.x)</a></div>
</div>
</div>