Reviewed-by: Hasko, Vladimir <vladimir.hasko@t-systems.com> Co-authored-by: Yang, Tong <yangtong2@huawei.com> Co-committed-by: Yang, Tong <yangtong2@huawei.com>
<a name="ALM-14020"></a><a name="ALM-14020"></a>
<h1 class="topictitle1">ALM-14020 Number of Entries in the HDFS Directory Exceeds the Threshold</h1>
<div id="body62911159"><div class="section" id="ALM-14020__s52b94b967d164895a187f6614eedb3f1"><h4 class="sectiontitle">Description</h4><p id="ALM-14020__en-us_topic_0070543658_p61424044">Every hour, the system obtains the number of subfiles and subdirectories in each monitored directory and compares it with the threshold, that is, the maximum number of subfiles and subdirectories allowed in an HDFS directory. If the number exceeds the alarm percentage of the threshold (<strong id="ALM-14020__en-us_topic_0070543658_b15945484">90%</strong> by default), an alarm is triggered.</p>
<p id="ALM-14020__en-us_topic_0070543658_p9291628">When the number of subfiles and subdirectories in the directory for which the alarm was generated falls below the percentage of the threshold, the alarm is automatically cleared. When the monitoring switch is disabled, the alarms for all directories are cleared. If a directory is removed from the monitoring list, the alarms for that directory are cleared.</p>
<div class="note" id="ALM-14020__en-us_topic_0070543658_note16515789"><img src="public_sys-resources/note_3.0-en-us.png"><span class="notetitle"> </span><div class="notebody"><ul id="ALM-14020__en-us_topic_0070543658_ul14424376"><li id="ALM-14020__en-us_topic_0070543658_li62710527">The <strong id="ALM-14020__en-us_topic_0070543658_b27523833">dfs.namenode.fs-limits.max-directory-items</strong> parameter specifies the maximum number of subfiles and subdirectories in the HDFS directory. Its default value is <strong id="ALM-14020__en-us_topic_0070543658_b46387913">1048576</strong>. If the number of subfiles and subdirectories in a directory exceeds the parameter value, subfiles and subdirectories cannot be created in the directory.</li><li id="ALM-14020__en-us_topic_0070543658_li14838039">The <strong id="ALM-14020__en-us_topic_0070543658_b66433492">dfs.namenode.directory-items.monitor</strong> parameter specifies the list of directories to be monitored. Its default value is <strong id="ALM-14020__en-us_topic_0070543658_b61030521">/tmp,/SparkJobHistory,/mr-history</strong>.</li><li id="ALM-14020__en-us_topic_0070543658_li12403784">The <strong id="ALM-14020__en-us_topic_0070543658_b44525192">dfs.namenode.directory-items.monitor.enabled</strong> parameter is used to enable or disable the monitoring switch. Its default value is <strong id="ALM-14020__en-us_topic_0070543658_b65182411">true</strong>, which means the monitoring switch is enabled by default.</li></ul>
</div></div>
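<p>The trigger condition described above can be sketched as a small calculation. This is a minimal illustration rather than the system's actual implementation: the constant and function names are hypothetical, and the strict "exceeds" comparison is assumed from the wording of this description.</p>

```python
# Defaults taken from this page; the names themselves are hypothetical.
DEFAULT_MAX_ITEMS = 1048576      # dfs.namenode.fs-limits.max-directory-items
DEFAULT_ALARM_PERCENT = 90       # default alarm percentage


def exceeds_alarm_threshold(entry_count,
                            max_items=DEFAULT_MAX_ITEMS,
                            percent=DEFAULT_ALARM_PERCENT):
    """Return True if entry_count exceeds percent% of max_items.

    Integer arithmetic is used so that the boundary is not affected by
    floating-point rounding.
    """
    return entry_count * 100 > max_items * percent


if __name__ == "__main__":
    for count in (500000, 943718, 943719):
        print(count, exceeds_alarm_threshold(count))
```

<p>With the default values, 90% of 1048576 is 943718.4, so under this assumption the alarm would be raised once a monitored directory holds 943719 or more entries.</p>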
</div>
<div class="section" id="ALM-14020__s342a17c8ade24895b5470279244932d5"><h4 class="sectiontitle">Attribute</h4>
<div class="tablenoborder"><table cellpadding="4" cellspacing="0" summary="" id="ALM-14020__en-us_topic_0070543658_table45283976" frame="border" border="1" rules="all"><thead align="left"><tr id="ALM-14020__en-us_topic_0070543658_row53288992"><th align="left" class="cellrowborder" valign="top" width="33.33333333333333%" id="mcps1.3.2.2.1.4.1.1"><p id="ALM-14020__en-us_topic_0070543658_p21441102">Alarm ID</p>
</th>
<th align="left" class="cellrowborder" valign="top" width="33.33333333333333%" id="mcps1.3.2.2.1.4.1.2"><p id="ALM-14020__en-us_topic_0070543658_p59007741">Alarm Severity</p>
</th>
<th align="left" class="cellrowborder" valign="top" width="33.33333333333333%" id="mcps1.3.2.2.1.4.1.3"><p id="ALM-14020__en-us_topic_0070543658_p14897749">Automatically Cleared</p>
</th>
</tr>
</thead>
<tbody><tr id="ALM-14020__en-us_topic_0070543658_row65867005"><td class="cellrowborder" valign="top" width="33.33333333333333%" headers="mcps1.3.2.2.1.4.1.1 "><p id="ALM-14020__en-us_topic_0070543658_p33627165">14020</p>
</td>
<td class="cellrowborder" valign="top" width="33.33333333333333%" headers="mcps1.3.2.2.1.4.1.2 "><p id="ALM-14020__en-us_topic_0070543658_p39445854">Major</p>
</td>
<td class="cellrowborder" valign="top" width="33.33333333333333%" headers="mcps1.3.2.2.1.4.1.3 "><p id="ALM-14020__en-us_topic_0070543658_p40997607">Yes</p>
</td>
</tr>
</tbody>
</table>
</div>
</div>
<div class="section" id="ALM-14020__s788273e0e740423ea9c9befcccf415a7"><h4 class="sectiontitle">Parameters</h4>
<div class="tablenoborder"><table cellpadding="4" cellspacing="0" summary="" id="ALM-14020__en-us_topic_0070543658_table32471905" frame="border" border="1" rules="all"><thead align="left"><tr id="ALM-14020__en-us_topic_0070543658_row66214016"><th align="left" class="cellrowborder" valign="top" width="50%" id="mcps1.3.3.2.1.3.1.1"><p id="ALM-14020__en-us_topic_0070543658_p61735062">Name</p>
</th>
<th align="left" class="cellrowborder" valign="top" width="50%" id="mcps1.3.3.2.1.3.1.2"><p id="ALM-14020__en-us_topic_0070543658_p34484162">Meaning</p>
</th>
</tr>
</thead>
<tbody><tr id="ALM-14020__row0424163873318"><td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.3.2.1.3.1.1 "><p id="ALM-14020__p192431315431">Source</p>
</td>
<td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.3.2.1.3.1.2 "><p id="ALM-14020__p692551319435">Specifies the cluster for which the alarm is generated.</p>
</td>
</tr>
<tr id="ALM-14020__en-us_topic_0070543658_row41753705"><td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.3.2.1.3.1.1 "><p id="ALM-14020__en-us_topic_0070543658_p26606965">ServiceName</p>
</td>
<td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.3.2.1.3.1.2 "><p id="ALM-14020__en-us_topic_0070543658_p7680590">Specifies the service for which the alarm is generated.</p>
</td>
</tr>
<tr id="ALM-14020__en-us_topic_0070543658_row2016451"><td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.3.2.1.3.1.1 "><p id="ALM-14020__en-us_topic_0070543658_p29114809">RoleName</p>
</td>
<td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.3.2.1.3.1.2 "><p id="ALM-14020__en-us_topic_0070543658_p9489328">Specifies the role for which the alarm is generated.</p>
</td>
</tr>
<tr id="ALM-14020__en-us_topic_0070543658_row18295091"><td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.3.2.1.3.1.1 "><p id="ALM-14020__p33011655183812">NameServiceName</p>
</td>
<td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.3.2.1.3.1.2 "><p id="ALM-14020__en-us_topic_0070543658_p43446717">Specifies the NameService for which the alarm is generated.</p>
</td>
</tr>
<tr id="ALM-14020__en-us_topic_0070543658_row55476136"><td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.3.2.1.3.1.1 "><p id="ALM-14020__en-us_topic_0070543658_p64382001">Directory</p>
</td>
<td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.3.2.1.3.1.2 "><p id="ALM-14020__en-us_topic_0070543658_p47559554">Specifies the directory for which the alarm is generated.</p>
</td>
</tr>
<tr id="ALM-14020__en-us_topic_0070543658_row25382805"><td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.3.2.1.3.1.1 "><p id="ALM-14020__en-us_topic_0070543658_p42741350">Trigger Condition</p>
</td>
<td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.3.2.1.3.1.2 "><p id="ALM-14020__en-us_topic_0070543658_p39497364">Specifies the threshold triggering the alarm. If the current indicator value exceeds this threshold, the alarm is generated.</p>
</td>
</tr>
</tbody>
</table>
</div>
</div>
<div class="section" id="ALM-14020__s76a4d2743b0b48c69dccebbbf3d5fa7c"><h4 class="sectiontitle">Impact on the System</h4><p id="ALM-14020__en-us_topic_0070543658_p45169926">If the number of entries in the monitored directory exceeds 90% of the threshold, an alarm is triggered, but entries can still be added to the directory. Once the threshold itself is exceeded, adding entries to the directory fails.</p>
</div>
<div class="section" id="ALM-14020__s24569bbac8cd4ce3bfa7b2fa57236fd2"><h4 class="sectiontitle">Possible Causes</h4><p id="ALM-14020__en-us_topic_0070543658_p34885413">The number of entries in the monitored directory exceeds 90% of the threshold.</p>
</div>
<div class="section" id="ALM-14020__s605bbbb33b884a418da4cdb6b008c631"><h4 class="sectiontitle">Procedure</h4><p class="tableheading" id="ALM-14020__en-us_topic_0070543658_p7146200"><strong id="ALM-14020__b2811879591">Check whether unnecessary files exist in the system.</strong></p>
<ol id="ALM-14020__ol3801217395927"><li id="ALM-14020__li538272129594"><span>Log in to the HDFS client as user <strong id="ALM-14020__b32617149594">root</strong>. <span id="ALM-14020__text101733453110"></span>Run the <strong id="ALM-14020__b293554289594">cd</strong> command to go to the client installation directory, and run the <strong id="ALM-14020__b628722619594">source bigdata_env</strong> command to set the environment variables.</span><p><p class="litext" id="ALM-14020__p289794389594">If the cluster is in security mode, security authentication is required.</p>
<p class="litext" id="ALM-14020__p656331249594">Run the <strong id="ALM-14020__b594883529594">kinit hdfs</strong> command and enter the password as prompted. Obtain the password from the administrator.</p>
</p></li><li id="ALM-14020__li220988259594"><span>Run the following command to check whether files and directories in the directory with the alarm can be deleted:</span><p><p class="litext" id="ALM-14020__p484613039594"><strong id="ALM-14020__b146828649594">hdfs dfs -ls </strong><em id="ALM-14020__i650369129594">Directory with the alarm</em></p>
<ul class="subitemlist" id="ALM-14020__ul24554259594"><li id="ALM-14020__li330514919594">If yes, go to <a href="#ALM-14020__li352493139594">3</a>.</li><li id="ALM-14020__li599251489594">If no, go to <a href="#ALM-14020__li564838279594">5</a>.</li></ul>
</p></li><li id="ALM-14020__li352493139594"><a name="ALM-14020__li352493139594"></a><a name="li352493139594"></a><span>Run the following command to delete unnecessary files.</span><p><p class="litext" id="ALM-14020__p39165909594"><strong id="ALM-14020__b646717039594">hdfs dfs -rm -r -f </strong><em id="ALM-14020__i451744199594">File or directory path</em></p>
<div class="note" id="ALM-14020__note537615464205"><img src="public_sys-resources/note_3.0-en-us.png"><span class="notetitle"> </span><div class="notebody"><p id="ALM-14020__p0845156171416"><span id="ALM-14020__text4335154811504">Deleting a file or folder is a high-risk operation. Ensure that the file or folder is no longer required before performing this operation.</span></p>
</div></div>
</p></li><li id="ALM-14020__li279187119594"><span>Wait 1 hour and check whether the alarm is cleared.</span><p><ul class="subitemlist" id="ALM-14020__ul136007479594"><li id="ALM-14020__li488083659594">If yes, no further action is required.</li><li id="ALM-14020__li611635179594">If no, go to <a href="#ALM-14020__li564838279594">5</a>.</li></ul>
</p></li></ol>
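<p>When reviewing the listing from step 2, it can help to know how many direct entries the directory currently holds. The sketch below parses text captured from <strong>hdfs dfs -ls</strong>, relying on the "Found N items" header that the command prints before the entries. The function name and the sample listing are hypothetical, and the fallback for a missing header is an assumption.</p>

```python
import re


def count_direct_entries(ls_output):
    """Count the direct children reported by `hdfs dfs -ls <dir>` output."""
    # Preferred source: the "Found N items" header line.
    match = re.search(r"^Found (\d+) items$", ls_output, flags=re.MULTILINE)
    if match:
        return int(match.group(1))
    # Assumed fallback when no header is present: count non-blank lines.
    return sum(1 for line in ls_output.splitlines() if line.strip())


if __name__ == "__main__":
    # Hypothetical listing, for illustration only.
    sample = (
        "Found 2 items\n"
        "drwxrwxrwt   - hdfs hadoop          0 2024-01-01 10:00 /tmp/a\n"
        "-rw-r--r--   3 hdfs hadoop       1024 2024-01-01 10:05 /tmp/b\n"
    )
    print(count_direct_entries(sample))
```

<p>Only direct children are counted here, which matches the limit described for <strong>dfs.namenode.fs-limits.max-directory-items</strong>; entries in nested subdirectories are not included.</p>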
<p class="tableheading" id="ALM-14020__p552978629594"><strong id="ALM-14020__b3487472495934">Check whether the threshold is correctly configured.</strong></p>
<ol start="5" id="ALM-14020__ol3588123895957"><li id="ALM-14020__li564838279594"><a name="ALM-14020__li564838279594"></a><a name="li564838279594"></a><span>On the FusionInsight Manager portal, choose <strong id="ALM-14020__b11332233194112">Cluster > </strong><em id="ALM-14020__i933663314411">Name of the desired cluster</em><strong id="ALM-14020__b113336337419"> > Services</strong> > <strong id="ALM-14020__b2162943793121">HDFS</strong> > <strong id="ALM-14020__b6593994694053">Configurations</strong> > <strong id="ALM-14020__b715398193121">All</strong> <strong id="ALM-14020__b6816162115232">Configurations</strong>. Search for the <strong id="ALM-14020__b489518289594">dfs.namenode.fs-limits.max-directory-items</strong> parameter and check whether the parameter value is appropriate.</span><p><ul class="subitemlist" id="ALM-14020__ul435586839594"><li id="ALM-14020__li56751709594">If yes, go to <a href="#ALM-14020__li615368649594">9</a>.</li><li id="ALM-14020__li570356369594">If no, go to <a href="#ALM-14020__li152448229594">6</a>.</li></ul>
</p></li><li id="ALM-14020__li152448229594"><a name="ALM-14020__li152448229594"></a><a name="li152448229594"></a><span>Increase the parameter value.</span></li><li id="ALM-14020__li720993694211"><span>Save the configuration and click <strong id="ALM-14020__b4308232137">Dashboard </strong>><strong id="ALM-14020__b830915321435"> More</strong> > <strong id="ALM-14020__b11170184254710">Restart Service</strong>.</span></li><li id="ALM-14020__li59760009594"><span>Wait 1 hour and check whether the alarm is cleared.</span><p><ul class="subitemlist" id="ALM-14020__ul290714359594"><li id="ALM-14020__li29856729594">If yes, no further action is required.</li><li id="ALM-14020__li405128619594">If no, go to <a href="#ALM-14020__li615368649594">9</a>.</li></ul>
</p></li></ol>
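<p>If the parameter value has to be increased, the following sketch estimates the smallest value at which the current entry count would fall below the alarm percentage. It is an illustration under stated assumptions: the function name is hypothetical, the alarm percentage is taken to be the default 90%, and any upper bound that the cluster enforces on the parameter is not checked.</p>

```python
def min_limit_to_clear(entry_count, percent=90):
    """Smallest limit for which entry_count is below percent% of the limit.

    Solves entry_count < limit * percent / 100 for the smallest integer
    limit, using integer arithmetic only.
    """
    return entry_count * 100 // percent + 1


if __name__ == "__main__":
    # Hypothetical directory holding 950000 entries.
    print(min_limit_to_clear(950000))
```

<p>The result is a lower bound only. In practice a larger value that leaves room for growth is usually more useful, and deleting unnecessary files remains the preferred fix where possible.</p>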
<p class="tableheading" id="ALM-14020__p603163239594"><strong id="ALM-14020__b403622911005">Collect fault information.</strong></p>
<ol start="9" id="ALM-14020__ol173507441009"><li id="ALM-14020__li615368649594"><a name="ALM-14020__li615368649594"></a><a name="li615368649594"></a><span>On the FusionInsight Manager portal, choose <strong id="ALM-14020__b39977366113627">O&M</strong> > <strong id="ALM-14020__b24251979113627">Log > Download</strong>.</span></li><li id="ALM-14020__li164447369594"><span>Select <strong id="ALM-14020__b169608729594">HDFS</strong> in the required cluster from <strong id="ALM-14020__b184301209594">Service</strong>.</span></li><li id="ALM-14020__li1145664103113"><span>Click <span><img id="ALM-14020__image1945644173117" src="en-us_image_0269417344.png"></span> in the upper right corner, and set <strong id="ALM-14020__b6456941173117">Start Date</strong> and <strong id="ALM-14020__b11456154113318">End Date</strong> for log collection to 10 minutes before and 10 minutes after the alarm generation time, respectively. Then, click <strong id="ALM-14020__b13456164113319">Download</strong>.</span></li><li id="ALM-14020__li210214349594"><span>Contact the <span id="ALM-14020__text4614151421417">O&M personnel</span> and send the collected logs.</span></li></ol>
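<p>The log collection window in the steps above can be worked out from the alarm generation time as follows. This is a minimal sketch; the function name and the example timestamp are hypothetical.</p>

```python
from datetime import datetime, timedelta


def log_window(alarm_time, margin_minutes=10):
    """Return (start, end) covering margin_minutes on each side of alarm_time."""
    margin = timedelta(minutes=margin_minutes)
    return alarm_time - margin, alarm_time + margin


if __name__ == "__main__":
    # Hypothetical alarm generation time.
    start, end = log_window(datetime(2024, 1, 1, 12, 0))
    print("Start Date:", start)
    print("End Date:  ", end)
```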
</div>
<div class="section" id="ALM-14020__section1529716184534"><h4 class="sectiontitle">Alarm Clearing</h4><p id="ALM-14020__p4677152685316">After the fault is rectified, the system automatically clears this alarm.</p>
</div>
<div class="section" id="ALM-14020__s093cfd1873fb4b2ea5a5fc9fb0ee24ac"><h4 class="sectiontitle">Related Information</h4><p id="ALM-14020__en-us_topic_0070543658_p61675141">None</p>
</div>
</div>
<div>
<div class="familylinks">
<div class="parentlink"><strong>Parent topic:</strong> <a href="mrs_01_1298.html">Alarm Reference (Applicable to MRS 3.x)</a></div>
</div>
</div>