Reviewed-by: Hasko, Vladimir <vladimir.hasko@t-systems.com> Co-authored-by: Yang, Tong <yangtong2@huawei.com> Co-committed-by: Yang, Tong <yangtong2@huawei.com>
<a name="ALM-14008"></a>
<h1 class="topictitle1">ALM-14008 DataNode Heap Memory Usage Exceeds the Threshold</h1>
<div id="body29727482"><div class="section" id="ALM-14008__s644274bc531446aaa7a27d4bf78c20aa"><h4 class="sectiontitle">Description</h4><p id="ALM-14008__en-us_topic_0070543644_p59818078">The system checks the heap memory usage of HDFS DataNodes every 30 seconds and compares it with the threshold, which has a default value. This alarm is generated when the DataNode heap memory usage exceeds the threshold.</p>
<p id="ALM-14008__en-us_topic_0070543644_p1491792">You can change the threshold in <strong id="ALM-14008__en-us_topic_0070543638_b55978213">O&M</strong> > <strong id="ALM-14008__b18216526383">Alarm</strong> > <strong id="ALM-14008__b122075817202">Thresholds</strong> > <em id="ALM-14008__i10674629125819">Name of the desired cluster</em><strong id="ALM-14008__b76731229185816"> ></strong> <strong id="ALM-14008__en-us_topic_0070543638_b5927966">HDFS</strong>.</p>
<p id="ALM-14008__p974624692155">If <strong id="ALM-14008__b48421890111935">Trigger Count</strong> is 1, this alarm is cleared when the DataNode heap memory usage drops to or below the threshold. If <strong id="ALM-14008__b18586724193815">Trigger Count</strong> is greater than 1, this alarm is cleared only when the usage drops to or below 90% of the threshold.</p>
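<p>The raise and clear rules above can be sketched as a small state machine. This is a hypothetical model, not the FusionInsight Manager implementation: the class name <strong>HeapUsageAlarm</strong> and the threshold in the example are illustrative, and the sketch assumes that <strong>Trigger Count</strong> is the number of consecutive 30-second samples that must exceed the threshold before the alarm is raised.</p>

```python
from dataclasses import dataclass, field


@dataclass
class HeapUsageAlarm:
    """Hypothetical model of the ALM-14008 raise/clear rules.

    threshold: heap memory usage threshold in percent (illustrative value).
    trigger_count: ASSUMED to be the number of consecutive 30-second samples
    that must exceed the threshold before the alarm is raised.
    """

    threshold: float
    trigger_count: int = 1
    active: bool = field(default=False, init=False)
    exceed_streak: int = field(default=0, init=False)

    def clear_level(self) -> float:
        # Trigger Count 1: clear at or below the threshold itself.
        # Trigger Count > 1: clear only at or below 90% of the threshold.
        if self.trigger_count == 1:
            return self.threshold
        return 0.9 * self.threshold

    def observe(self, usage_percent: float) -> bool:
        """Process one 30-second sample and return whether the alarm is active."""
        if not self.active:
            if usage_percent > self.threshold:
                self.exceed_streak += 1
                if self.exceed_streak >= self.trigger_count:
                    self.active = True
            else:
                # ASSUMPTION: a sample at or below the threshold resets the streak.
                self.exceed_streak = 0
        elif usage_percent <= self.clear_level():
            self.active = False
            self.exceed_streak = 0
        return self.active


if __name__ == "__main__":
    alarm = HeapUsageAlarm(threshold=80.0, trigger_count=2)
    print([alarm.observe(u) for u in (85, 85, 75, 70)])
```

<p>With a threshold of 80% and a trigger count of 2, the samples 85, 85, 75, and 70 yield False, True, True, False: 75% is still above the clear level of 72% (90% of 80%), so the alarm stays active until usage drops to 70%.</p>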
</div>
<div class="section" id="ALM-14008__s221d35ee47994424b317d262585116aa"><h4 class="sectiontitle">Attribute</h4>
<div class="tablenoborder"><table cellpadding="4" cellspacing="0" summary="" id="ALM-14008__en-us_topic_0070543644_table49549872" frame="border" border="1" rules="all"><thead align="left"><tr id="ALM-14008__en-us_topic_0070543644_row53980416"><th align="left" class="cellrowborder" valign="top" width="33.33333333333333%" id="mcps1.3.2.2.1.4.1.1"><p id="ALM-14008__en-us_topic_0070543644_p10337547">Alarm ID</p>
</th>
<th align="left" class="cellrowborder" valign="top" width="33.33333333333333%" id="mcps1.3.2.2.1.4.1.2"><p id="ALM-14008__en-us_topic_0070543644_p32034983">Alarm Severity</p>
</th>
<th align="left" class="cellrowborder" valign="top" width="33.33333333333333%" id="mcps1.3.2.2.1.4.1.3"><p id="ALM-14008__en-us_topic_0070543644_p44696827">Automatically Cleared</p>
</th>
</tr>
</thead>
<tbody><tr id="ALM-14008__en-us_topic_0070543644_row63673264"><td class="cellrowborder" valign="top" width="33.33333333333333%" headers="mcps1.3.2.2.1.4.1.1 "><p id="ALM-14008__en-us_topic_0070543644_p57260782">14008</p>
</td>
<td class="cellrowborder" valign="top" width="33.33333333333333%" headers="mcps1.3.2.2.1.4.1.2 "><p id="ALM-14008__en-us_topic_0070543644_p7611740">Major</p>
</td>
<td class="cellrowborder" valign="top" width="33.33333333333333%" headers="mcps1.3.2.2.1.4.1.3 "><p id="ALM-14008__en-us_topic_0070543644_p12571170">Yes</p>
</td>
</tr>
</tbody>
</table>
</div>
</div>
<div class="section" id="ALM-14008__s1b1660d4b0054650b59ba6133a8ea7f8"><h4 class="sectiontitle">Parameters</h4>
<div class="tablenoborder"><table cellpadding="4" cellspacing="0" summary="" id="ALM-14008__en-us_topic_0070543644_table11631849" frame="border" border="1" rules="all"><thead align="left"><tr id="ALM-14008__en-us_topic_0070543644_row56960606"><th align="left" class="cellrowborder" valign="top" width="50%" id="mcps1.3.3.2.1.3.1.1"><p id="ALM-14008__en-us_topic_0070543644_p50406362">Name</p>
</th>
<th align="left" class="cellrowborder" valign="top" width="50%" id="mcps1.3.3.2.1.3.1.2"><p id="ALM-14008__en-us_topic_0070543644_p56383482">Meaning</p>
</th>
</tr>
</thead>
<tbody><tr id="ALM-14008__row13013268364"><td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.3.2.1.3.1.1 "><p id="ALM-14008__p192431315431">Source</p>
</td>
<td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.3.2.1.3.1.2 "><p id="ALM-14008__p692551319435">Specifies the cluster for which the alarm is generated.</p>
</td>
</tr>
<tr id="ALM-14008__en-us_topic_0070543644_row3659302"><td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.3.2.1.3.1.1 "><p id="ALM-14008__en-us_topic_0070543644_p27968071">ServiceName</p>
</td>
<td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.3.2.1.3.1.2 "><p id="ALM-14008__en-us_topic_0070543644_p50821317">Specifies the service for which the alarm is generated.</p>
</td>
</tr>
<tr id="ALM-14008__en-us_topic_0070543644_row54738674"><td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.3.2.1.3.1.1 "><p id="ALM-14008__en-us_topic_0070543644_p4647622">RoleName</p>
</td>
<td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.3.2.1.3.1.2 "><p id="ALM-14008__en-us_topic_0070543644_p40913065">Specifies the role for which the alarm is generated.</p>
</td>
</tr>
<tr id="ALM-14008__en-us_topic_0070543644_row32673273"><td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.3.2.1.3.1.1 "><p id="ALM-14008__en-us_topic_0070543644_p29289465">HostName</p>
</td>
<td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.3.2.1.3.1.2 "><p id="ALM-14008__en-us_topic_0070543644_p23636445">Specifies the host for which the alarm is generated.</p>
</td>
</tr>
<tr id="ALM-14008__en-us_topic_0070543644_row11401416"><td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.3.2.1.3.1.1 "><p id="ALM-14008__en-us_topic_0070543644_p51099518">Trigger condition</p>
</td>
<td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.3.2.1.3.1.2 "><p id="ALM-14008__en-us_topic_0070543644_p45420300">Specifies the threshold for triggering the alarm. The alarm is generated when the current indicator value exceeds this threshold.</p>
</td>
</tr>
</tbody>
</table>
</div>
</div>
<div class="section" id="ALM-14008__s6df276abe0f74e3b995553c81fb17ba8"><h4 class="sectiontitle">Impact on the System</h4><p id="ALM-14008__en-us_topic_0070543644_p55165723">High DataNode heap memory usage degrades the data read/write performance of HDFS.</p>
</div>
<div class="section" id="ALM-14008__s94c4a46499714855a9c37db629a757d9"><h4 class="sectiontitle">Possible Causes</h4><p id="ALM-14008__en-us_topic_0070543644_p39238611">The heap memory allocated to the HDFS DataNode is insufficient.</p>
</div>
<div class="section" id="ALM-14008__s488fb168ae97467bb98a96527e8f303b"><h4 class="sectiontitle">Procedure</h4><p class="tableheading" id="ALM-14008__en-us_topic_0070543644_p24210907"><strong id="ALM-14008__b450796571703">Delete unnecessary files.</strong></p>
<ol id="ALM-14008__ol457802417015"><li id="ALM-14008__li61210121706"><span>Log in to the node where the HDFS client is installed as user <strong id="ALM-14008__b527027971706">root</strong>. <span id="ALM-14008__text101733453110"></span>Run <strong id="ALM-14008__b45631291706">cd</strong> to switch to the client installation directory, and then run <strong id="ALM-14008__b410681611706">source bigdata_env</strong>.</span><p><p class="litext" id="ALM-14008__p340691361706">If the cluster is in security mode, perform security authentication.</p>
<p class="litext" id="ALM-14008__p81366521706">Run the <strong id="ALM-14008__b381867741706">kinit hdfs</strong> command and enter the password as prompted. Obtain the password from the administrator.</p>
</p></li><li id="ALM-14008__li330328501706"><span>Run the <strong id="ALM-14008__b550891091706">hdfs dfs -rm -r </strong><em id="ALM-14008__i260399381706">file or directory</em> command to delete unnecessary files.</span></li><li id="ALM-14008__li34861011706"><span>Check whether the alarm is cleared.</span><p><ul class="subitemlist" id="ALM-14008__ul340117221706"><li id="ALM-14008__li288602001706">If yes, no further action is required.</li><li id="ALM-14008__li559748631706">If no, go to <a href="#ALM-14008__li552961441706">4</a>.</li></ul>
</p></li></ol>
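<p>Steps 1 and 2 can be scripted, for example when several directories must be removed. The following Python sketch only chains the documented commands (<strong>cd</strong>, <strong>source bigdata_env</strong>, <strong>kinit hdfs</strong>, and <strong>hdfs dfs -rm -r</strong>) into one bash invocation. The client directory <strong>/opt/client</strong>, the example HDFS path, and the function names are hypothetical placeholders; by default the sketch performs a dry run that only prints the script.</p>

```python
import shlex
import subprocess
from typing import List


def build_cleanup_script(client_dir: str, paths: List[str], secure: bool) -> str:
    """Chain the commands from steps 1 and 2 into a single bash script."""
    steps = [f"cd {shlex.quote(client_dir)}", "source bigdata_env"]
    if secure:
        # In security mode, kinit prompts for the hdfs user's password.
        steps.append("kinit hdfs")
    for path in paths:
        # Quote each HDFS path so spaces or shell metacharacters are safe.
        steps.append(f"hdfs dfs -rm -r {shlex.quote(path)}")
    # "&&" stops the chain as soon as one command fails.
    return " && ".join(steps)


def run_cleanup(client_dir: str, paths: List[str], secure: bool,
                dry_run: bool = True) -> int:
    """Print the script (dry run) or execute it with bash; return the exit code."""
    script = build_cleanup_script(client_dir, paths, secure)
    if dry_run:
        print(script)
        return 0
    # bash is required because "source" is a shell builtin, not a program.
    return subprocess.run(["bash", "-c", script], check=False).returncode


if __name__ == "__main__":
    # Hypothetical client directory and HDFS path; replace with real values.
    run_cleanup("/opt/client", ["/tmp/obsolete_data"], secure=True)
```

<p>Running the commands through one <strong>bash -c</strong> call keeps the environment set by <strong>source bigdata_env</strong> available to the later <strong>kinit</strong> and <strong>hdfs</strong> commands.</p>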
<p class="tableheading" id="ALM-14008__p376700461706"><strong id="ALM-14008__b4499026417020">Check the DataNode JVM memory usage and configuration.</strong></p>
<ol start="4" id="ALM-14008__ol1244300017034"><li id="ALM-14008__li552961441706"><a name="ALM-14008__li552961441706"></a><a name="li552961441706"></a><span>On the FusionInsight Manager portal, choose <strong id="ALM-14008__b12281438113117">Cluster > </strong><em id="ALM-14008__i628373843114">Name of the desired cluster</em><strong id="ALM-14008__b52829386315"> > Services</strong> > <strong id="ALM-14008__b139387401706">HDFS</strong>.</span></li><li id="ALM-14008__li455705771706"><span>In the <strong id="ALM-14008__b30282645162958">Basic Information</strong> area, click <strong id="ALM-14008__b498026401706">NameNode(Active)</strong> to go to the HDFS WebUI.</span><p><div class="note" id="ALM-14008__note840916461457"><img src="public_sys-resources/note_3.0-en-us.png"><span class="notetitle"> </span><div class="notebody"><p id="ALM-14008__en-us_topic_0193189480_p91833832915">By default, the <strong id="ALM-14008__en-us_topic_0193189480_b4780151814294">admin</strong> user does not have the permissions to manage other components. If the native UI of a component cannot be opened or displays incomplete content because of insufficient permissions, manually create a user with the permissions to manage that component.</p>
</div></div>
</p></li><li id="ALM-14008__li2292511706"><a name="ALM-14008__li2292511706"></a><a name="li2292511706"></a><span>On the HDFS WebUI, click the <strong id="ALM-14008__b74820121706">DataNodes</strong> tab and check the number of blocks on each DataNode related to the alarm.</span></li><li id="ALM-14008__li421758201706"><a name="ALM-14008__li421758201706"></a><a name="li421758201706"></a><span>On the FusionInsight Manager portal, choose <strong id="ALM-14008__b1327004983117">Cluster > </strong><em id="ALM-14008__i142721049123115">Name of the desired cluster</em><strong id="ALM-14008__b327134915318"> > Services</strong> > <strong id="ALM-14008__b185693571706">HDFS</strong> > <strong id="ALM-14008__b329064911706">Configurations</strong> > <strong id="ALM-14008__b481801251706">All <strong id="ALM-14008__b7158112382314">Configurations</strong></strong>. In <strong id="ALM-14008__b309679481706">Search</strong>, enter <strong id="ALM-14008__b102760761706">GC_OPTS</strong> and check the heap memory settings in the GC_OPTS parameter of <strong id="ALM-14008__b253758231706">HDFS->DataNode</strong>.</span></li></ol>
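<p>To compare the <strong>GC_OPTS</strong> value found in step 7 with the reference values of this section, it can help to extract the heap flags programmatically. The following Python sketch is an illustrative helper with a hypothetical function name; it recognizes only the four flags used in the reference values (<strong>-Xms</strong>, <strong>-Xmx</strong>, <strong>-XX:NewSize</strong>, and <strong>-XX:MaxNewSize</strong>) and converts them to bytes, assuming the usual 1024-based JVM size suffixes k, m, g, and t.</p>

```python
import re
from typing import Dict

# Binary multipliers for JVM size suffixes (no suffix means bytes).
_UNITS = {"": 1, "k": 1024, "m": 1024 ** 2, "g": 1024 ** 3, "t": 1024 ** 4}

# Only the flags that appear in the reference values of this section.
_FLAGS = {
    "Xms": "-Xms",
    "Xmx": "-Xmx",
    "NewSize": "-XX:NewSize=",
    "MaxNewSize": "-XX:MaxNewSize=",
}


def parse_heap_settings(gc_opts: str) -> Dict[str, int]:
    """Return the heap-related flags found in a GC_OPTS string, in bytes."""
    settings: Dict[str, int] = {}
    for name, prefix in _FLAGS.items():
        # Match the flag only at the start of a token, e.g. "-Xmx6G",
        # so "-XX:MaxNewSize=" is never mistaken for "-XX:NewSize=".
        pattern = rf"(?:^|\s){re.escape(prefix)}(\d+)([kKmMgGtT]?)(?=\s|$)"
        matches = re.findall(pattern, gc_opts)
        if matches:
            # If a flag is repeated, keep the last occurrence, which the JVM applies.
            number, unit = matches[-1]
            settings[name] = int(number) * _UNITS[unit.lower()]
    return settings


if __name__ == "__main__":
    opts = "-Xms6G -Xmx6G -XX:NewSize=512M -XX:MaxNewSize=512M -XX:+UseG1GC"
    print(parse_heap_settings(opts))
```

<p>Flags other than these four, such as garbage collector options, are ignored by design.</p>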
<p class="tableheading" id="ALM-14008__p270558231706"><strong id="ALM-14008__b6519532617039">Adjust the configuration in the system.</strong></p>
<ol start="8" id="ALM-14008__ol3627381117051"><li id="ALM-14008__li410378251706"><span>Based on the number of blocks in <a href="#ALM-14008__li2292511706">6</a> and the DataNode heap memory parameters in <a href="#ALM-14008__li421758201706">7</a>, check whether the DataNode heap memory needs to be adjusted.</span><p><ul class="subitemlist" id="ALM-14008__ul120162981706"><li id="ALM-14008__li22258281706">If yes, go to <a href="#ALM-14008__li84133131706">9</a>.</li><li id="ALM-14008__li460743861706">If no, go to <a href="#ALM-14008__li435105481706">11</a>.</li></ul>
<div class="note" id="ALM-14008__note363517151125"><img src="public_sys-resources/note_3.0-en-us.png"><span class="notetitle"> </span><div class="notebody"><p id="ALM-14008__p1090415211923">The mapping between the average number of blocks of a DataNode instance and the DataNode memory is as follows:</p>
<ul id="ALM-14008__ul1790415214219"><li id="ALM-14008__li89046216219">If the average number of blocks of a DataNode instance reaches 2,000,000, the reference values of the JVM parameters of the DataNode are as follows: -Xms6G -Xmx6G -XX:NewSize=512M -XX:MaxNewSize=512M</li><li id="ALM-14008__li20904192110218">If the average number of blocks of a DataNode instance reaches 5,000,000, the reference values of the JVM parameters of the DataNode are as follows: -Xms12G -Xmx12G -XX:NewSize=1G -XX:MaxNewSize=1G</li></ul>
</div></div>
</p></li><li id="ALM-14008__li84133131706"><a name="ALM-14008__li84133131706"></a><a name="li84133131706"></a><span>Modify the DataNode heap memory parameters in GC_OPTS based on the mapping between the average number of blocks and the memory. Click <strong id="ALM-14008__b572414386215">Save</strong>, and then choose <strong id="ALM-14008__b4308232137">Dashboard</strong> > <strong id="ALM-14008__b830915321435">More</strong> > <strong id="ALM-14008__b11170184254710">Restart Service</strong>.</span></li><li id="ALM-14008__li515190191706"><span>Check whether the alarm is cleared.</span><p><ul class="subitemlist" id="ALM-14008__ul362617301706"><li id="ALM-14008__li86109541706">If yes, no further action is required.</li><li id="ALM-14008__li263987021706">If no, go to <a href="#ALM-14008__li435105481706">11</a>.</li></ul>
</p></li></ol>
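<p>The reference values in the note of step 8 can be expressed as a lookup, which makes the comparison repeatable across DataNodes. This Python sketch assumes that an average between the two listed block counts uses the 2,000,000 reference and that an average of 5,000,000 or more uses the 5,000,000 reference. The note gives no value below 2,000,000, so the sketch returns <strong>None</strong> in that case. The function names are hypothetical, and the block counts are those read from the <strong>DataNodes</strong> tab in step 6.</p>

```python
from statistics import mean
from typing import List, Optional, Tuple

# Reference values from the note in step 8, checked from the largest
# block count down: (average blocks per DataNode, -Xmx in GiB, JVM flags).
REFERENCE_HEAP: List[Tuple[int, int, str]] = [
    (5_000_000, 12, "-Xms12G -Xmx12G -XX:NewSize=1G -XX:MaxNewSize=1G"),
    (2_000_000, 6, "-Xms6G -Xmx6G -XX:NewSize=512M -XX:MaxNewSize=512M"),
]


def reference_heap(block_counts: List[int]) -> Optional[Tuple[int, str]]:
    """Return (reference -Xmx in GiB, reference flags) for the average block count.

    Returns None when the average is below 2,000,000, because the note
    gives no reference value for that range.
    """
    average = mean(block_counts)
    for min_blocks, heap_gib, flags in REFERENCE_HEAP:
        if average >= min_blocks:
            return heap_gib, flags
    return None


def heap_is_sufficient(block_counts: List[int],
                       current_xmx_gib: float) -> Optional[bool]:
    """Compare the current -Xmx (GiB) with the reference; None if none applies."""
    reference = reference_heap(block_counts)
    if reference is None:
        return None
    return current_xmx_gib >= reference[0]


if __name__ == "__main__":
    counts = [2_100_000, 2_300_000, 1_900_000]  # example values from step 6
    print(reference_heap(counts))
    print(heap_is_sufficient(counts, 4))
```

<p>For example, block counts of 2,100,000, 2,300,000, and 1,900,000 average 2,100,000, so the reference is <strong>-Xmx6G</strong>, and a DataNode configured with <strong>-Xmx4G</strong> returns <strong>False</strong>.</p>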
<p class="tableheading" id="ALM-14008__p579201181706"><strong id="ALM-14008__b3439776217056">Collect fault information.</strong></p>
<ol start="11" id="ALM-14008__ol2773642817058"><li id="ALM-14008__li435105481706"><a name="ALM-14008__li435105481706"></a><a name="li435105481706"></a><span>On the FusionInsight Manager portal, choose <strong id="ALM-14008__b39977366113627">O&M</strong> > <strong id="ALM-14008__b24251979113627">Log > Download</strong>.</span></li><li id="ALM-14008__li587099241706"><span>In <strong id="ALM-14008__b346934981706">Service</strong>, select <strong id="ALM-14008__b560506161706">HDFS</strong> for the required cluster.</span></li><li id="ALM-14008__li1145664103113"><span>Click <span><img id="ALM-14008__image1945644173117" src="en-us_image_0269383963.png"></span> in the upper right corner, and set <strong id="ALM-14008__b6456941173117">Start Date</strong> and <strong id="ALM-14008__b11456154113318">End Date</strong> for log collection to 10 minutes before and 10 minutes after the alarm generation time, respectively. Then, click <strong id="ALM-14008__b13456164113319">Download</strong>.</span></li><li id="ALM-14008__li38403331706"><span>Contact the <span id="ALM-14008__text4614151421417">O&M personnel</span> and send them the collected logs.</span></li></ol>
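<p>The log collection window in step 13 spans from 10 minutes before to 10 minutes after the alarm generation time. A minimal Python sketch (the function name and the example timestamp are hypothetical) that computes both values, including when the window crosses midnight:</p>

```python
from datetime import datetime, timedelta
from typing import Tuple


def log_collection_window(alarm_time: datetime,
                          margin_minutes: int = 10) -> Tuple[datetime, datetime]:
    """Return (Start Date, End Date) centred on the alarm generation time."""
    margin = timedelta(minutes=margin_minutes)
    return alarm_time - margin, alarm_time + margin


if __name__ == "__main__":
    start, end = log_collection_window(datetime(2024, 5, 1, 0, 5))
    print(start, end)  # 2024-04-30 23:55:00 2024-05-01 00:15:00
```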
</div>
<div class="section" id="ALM-14008__section1529716184534"><h4 class="sectiontitle">Alarm Clearing</h4><p id="ALM-14008__p4677152685316">After the fault is rectified, the system automatically clears this alarm.</p>
</div>
<div class="section" id="ALM-14008__s3a28d5bcbb76421eaf34da000c3a6bfe"><h4 class="sectiontitle">Related Information</h4><p id="ALM-14008__en-us_topic_0070543644_p48425074">None</p>
</div>
</div>
<div>
<div class="familylinks">
<div class="parentlink"><strong>Parent topic:</strong> <a href="mrs_01_1298.html">Alarm Reference (Applicable to MRS 3.x)</a></div>
</div>
</div>