Reviewed-by: Hasko, Vladimir <vladimir.hasko@t-systems.com> Co-authored-by: Yang, Tong <yangtong2@huawei.com> Co-committed-by: Yang, Tong <yangtong2@huawei.com>
<a name="ALM-14006"></a><a name="ALM-14006"></a>
<h1 class="topictitle1">ALM-14006 Number of HDFS Files Exceeds the Threshold</h1>
<div id="body67038468"><div class="section" id="ALM-14006__section57589516"><h4 class="sectiontitle">Description</h4><p id="ALM-14006__p61406793">The system checks the number of HDFS files every 30 seconds and compares it with the threshold. This alarm is generated when the number of HDFS files exceeds the threshold.</p>
<p id="ALM-14006__p7894314">If <strong id="ALM-14006__b48421890111935">Trigger Count</strong> is <strong id="ALM-14006__b38110919416">1</strong>, this alarm is cleared when the number of HDFS files is less than or equal to the threshold. If <strong id="ALM-14006__b86421416103817">Trigger Count</strong> is greater than <strong id="ALM-14006__b169848102045">1</strong>, this alarm is cleared when the number of HDFS files is less than or equal to 90% of the threshold.</p>
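<p>The clearing rule above can be sketched as a small check. This is only an illustration of the rule as described in this section, not the actual FusionInsight Manager implementation; the function name <strong>should_clear</strong> and its parameters are hypothetical.</p>

```python
def should_clear(file_count: int, threshold: int, trigger_count: int) -> bool:
    """Return True if the alarm would clear under the rule described above.

    Hypothetical helper: models only the documented clearing condition.
    """
    if trigger_count < 1:
        raise ValueError("trigger_count must be at least 1")
    if trigger_count == 1:
        # Trigger Count of 1: clear at or below the threshold itself.
        return file_count <= threshold
    # Trigger Count above 1: clear only at or below 90% of the threshold.
    # Integer arithmetic avoids floating-point rounding at the boundary.
    return file_count * 10 <= threshold * 9
```

<p>For example, with a threshold of 1000 and a Trigger Count of 3, a count of 950 files would not clear the alarm under this rule, while a count of 900 would.</p>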
</div>
<div class="section" id="ALM-14006__section48543596"><h4 class="sectiontitle">Attribute</h4>
<div class="tablenoborder"><table cellpadding="4" cellspacing="0" summary="" id="ALM-14006__table35459729" frame="border" border="1" rules="all"><thead align="left"><tr id="ALM-14006__row47975022"><th align="left" class="cellrowborder" valign="top" width="33.33333333333333%" id="mcps1.3.2.2.1.4.1.1"><p id="ALM-14006__p60771574">Alarm ID</p>
</th>
<th align="left" class="cellrowborder" valign="top" width="33.33333333333333%" id="mcps1.3.2.2.1.4.1.2"><p id="ALM-14006__p23550475">Alarm Severity</p>
</th>
<th align="left" class="cellrowborder" valign="top" width="33.33333333333333%" id="mcps1.3.2.2.1.4.1.3"><p id="ALM-14006__p28540308">Auto Clear</p>
</th>
</tr>
</thead>
<tbody><tr id="ALM-14006__row30063610"><td class="cellrowborder" valign="top" width="33.33333333333333%" headers="mcps1.3.2.2.1.4.1.1 "><p id="ALM-14006__p19233316">14006</p>
</td>
<td class="cellrowborder" valign="top" width="33.33333333333333%" headers="mcps1.3.2.2.1.4.1.2 "><p id="ALM-14006__p14394746">Minor</p>
</td>
<td class="cellrowborder" valign="top" width="33.33333333333333%" headers="mcps1.3.2.2.1.4.1.3 "><p id="ALM-14006__p25123814">Yes</p>
</td>
</tr>
</tbody>
</table>
</div>
</div>
<div class="section" id="ALM-14006__section34239183"><h4 class="sectiontitle">Parameters</h4>
<div class="tablenoborder"><table cellpadding="4" cellspacing="0" summary="" id="ALM-14006__table21763021" frame="border" border="1" rules="all"><thead align="left"><tr id="ALM-14006__row17034695"><th align="left" class="cellrowborder" valign="top" width="50%" id="mcps1.3.3.2.1.3.1.1"><p id="ALM-14006__p37633021">Name</p>
</th>
<th align="left" class="cellrowborder" valign="top" width="50%" id="mcps1.3.3.2.1.3.1.2"><p id="ALM-14006__p28375860">Meaning</p>
</th>
</tr>
</thead>
<tbody><tr id="ALM-14006__row18643359173110"><td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.3.2.1.3.1.1 "><p id="ALM-14006__p156438591896">Source</p>
</td>
<td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.3.2.1.3.1.2 "><p id="ALM-14006__p187931338134115">Specifies the cluster for which the alarm is generated.</p>
</td>
</tr>
<tr id="ALM-14006__row16743315"><td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.3.2.1.3.1.1 "><p id="ALM-14006__p65062640">ServiceName</p>
</td>
<td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.3.2.1.3.1.2 "><p id="ALM-14006__p62793452">Specifies the service for which the alarm is generated.</p>
</td>
</tr>
<tr id="ALM-14006__row28270157"><td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.3.2.1.3.1.1 "><p id="ALM-14006__p35626567">RoleName</p>
</td>
<td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.3.2.1.3.1.2 "><p id="ALM-14006__p58708857">Specifies the role for which the alarm is generated.</p>
</td>
</tr>
<tr id="ALM-14006__row58617672"><td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.3.2.1.3.1.1 "><p id="ALM-14006__p51620924">HostName</p>
</td>
<td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.3.2.1.3.1.2 "><p id="ALM-14006__p56757437">Specifies the host for which the alarm is generated.</p>
</td>
</tr>
<tr id="ALM-14006__row41054890"><td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.3.2.1.3.1.1 "><p id="ALM-14006__p37111817">NameServiceName</p>
</td>
<td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.3.2.1.3.1.2 "><p id="ALM-14006__p53267196">Specifies the NameService for which the alarm is generated.</p>
</td>
</tr>
<tr id="ALM-14006__row9642721"><td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.3.2.1.3.1.1 "><p id="ALM-14006__p42862926">Trigger Condition</p>
</td>
<td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.3.2.1.3.1.2 "><p id="ALM-14006__p49344963">Specifies the threshold for triggering the alarm.</p>
</td>
</tr>
</tbody>
</table>
</div>
</div>
<div class="section" id="ALM-14006__section39717199"><h4 class="sectiontitle">Impact on the System</h4><p id="ALM-14006__p37519081">Disk storage space is insufficient, which may result in data import failure. The performance of the HDFS system is affected.</p>
</div>
<div class="section" id="ALM-14006__section21910476"><h4 class="sectiontitle">Possible Causes</h4><p id="ALM-14006__p19146727">The number of HDFS files exceeds the threshold.</p>
</div>
<div class="section" id="ALM-14006__section62976559"><h4 class="sectiontitle">Procedure</h4><p class="tableheading" id="ALM-14006__p7381094"><strong id="ALM-14006__b120458116497">Check the number of files in the system.</strong></p>
<ol id="ALM-14006__ol1313713410325"><li id="ALM-14006__li313764116325"><span>On FusionInsight Manager, check the number of HDFS files. Specifically, choose <strong id="ALM-14006__b108611933112117">Cluster</strong> > <em id="ALM-14006__i11861163316214">Name of the desired cluster</em> > <strong id="ALM-14006__b14861833152111">Services</strong> > <strong id="ALM-14006__b1286183316211">HDFS</strong>. Click the drop-down menu in the upper right corner of <strong id="ALM-14006__b5861153314212">Chart</strong>, choose <strong id="ALM-14006__b8861143314214">Customize</strong> > <strong id="ALM-14006__b8861433192115">File and Block</strong>, and select <strong id="ALM-14006__b10861133372113">HDFS File</strong> and <strong id="ALM-14006__b18611833122110">Total Blocks</strong>.</span></li><li id="ALM-14006__li1913714419324"><span>Choose <strong id="ALM-14006__b7351356175818">Cluster</strong> > <em id="ALM-14006__i149611439133012">Name of the desired cluster</em> > <strong id="ALM-14006__b33518561588">Services</strong> > <strong id="ALM-14006__b12461145125816">HDFS</strong> > <strong id="ALM-14006__b1424614459586">Configurations</strong> > <strong id="ALM-14006__b8246154515589">All Configurations</strong>, and search for the <strong id="ALM-14006__b124610452582">GC_OPTS</strong> parameter under <strong id="ALM-14006__b2246245195810">NameNode</strong>.</span></li><li id="ALM-14006__li131379417323"><span>Adjust the threshold for the number of file objects. Specifically, change the value of <strong id="ALM-14006__b560161465917">Xmx</strong> (GB) in the <strong id="ALM-14006__b66041435919">GC_OPTS</strong> parameter. The threshold y is calculated as y = 0.2007x - 0.6312, where x is the memory capacity Xmx (GB) and y is the number of files (unit: kW, that is, 10 million files). 
Adjust the memory size as required.</span></li><li id="ALM-14006__li813754113214"><span>Confirm that the value of <strong id="ALM-14006__b699418319598">GC_PROFILE</strong> is <strong id="ALM-14006__b399593114597">custom</strong> so that the <strong id="ALM-14006__b129951831175910">GC_OPTS</strong> configuration takes effect. Click <strong id="ALM-14006__b1499523115599">Save</strong> and choose <strong id="ALM-14006__b8995531195919">More</strong> > <strong id="ALM-14006__b1499543113598">Restart Instance</strong> to restart the service.</span></li><li id="ALM-14006__li20137114111326"><span>Check whether the alarm is cleared.</span><p><ul class="subitemlist" id="ALM-14006__ul32852828164725"><li id="ALM-14006__li56567483164725">If yes, no further action is required.</li><li id="ALM-14006__li18563395164725">If no, go to <a href="#ALM-14006__li57477018164725">6</a>.</li></ul>
</p></li></ol>
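<p>The threshold formula in the procedure above can be evaluated in both directions with a short script. This is a minimal sketch: the function names are hypothetical, and the conversion to an absolute file count assumes that one kW equals 10 million files, an interpretation that is consistent with the reference table in Related Information but is not stated explicitly in this document.</p>

```python
FILES_PER_KW = 10_000_000  # Assumption: 1 kW = 10 million files.


def files_supported_kw(xmx_gb: float) -> float:
    """Threshold y (in kW) for a given Xmx in GB: y = 0.2007x - 0.6312."""
    return 0.2007 * xmx_gb - 0.6312


def xmx_needed_gb(files_kw: float) -> float:
    """Invert the same formula: Xmx in GB for a target threshold in kW."""
    return (files_kw + 0.6312) / 0.2007


def files_supported(xmx_gb: float) -> int:
    """Approximate absolute file count, under the FILES_PER_KW assumption."""
    return round(files_supported_kw(xmx_gb) * FILES_PER_KW)
```

<p>For instance, <strong>files_supported_kw(64)</strong> evaluates 0.2007 x 64 - 0.6312, which is about 12.2 kW. Treat the result as an estimate and adjust the memory size as required.</p>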
<p class="tableheading" id="ALM-14006__p27240003164725"><strong id="ALM-14006__b9680952164929">Check whether unnecessary files exist in the system.</strong></p>
<ol start="6" id="ALM-14006__ol53728440164751"><li id="ALM-14006__li57477018164725"><a name="ALM-14006__li57477018164725"></a><a name="li57477018164725"></a><span>Log in to the HDFS client as user <strong id="ALM-14006__b45409754164910">root</strong>. <span id="ALM-14006__text85258205227"></span> Run <strong id="ALM-14006__b6034609164910">cd</strong> to switch to the client installation directory, and run <strong id="ALM-14006__b54311489164910">source bigdata_env</strong> to configure the environment variables.</span><p><p class="litext" id="ALM-14006__p29720893164725">If the cluster uses the security mode, perform security authentication.</p>
<p class="litext" id="ALM-14006__p58582118164725">Run the <strong id="ALM-14006__b37154451164910">kinit hdfs</strong> command and enter the password as prompted. Obtain the password from the MRS cluster administrator.</p>
</p></li><li id="ALM-14006__li55344888164725"><span>Run <strong id="ALM-14006__b40723089164910">hdfs dfs -ls </strong><em id="ALM-14006__i30963486164910">file or directory</em> to check whether the files in the directory can be deleted.</span><p><ul class="subitemlist" id="ALM-14006__ul6149432164725"><li id="ALM-14006__li22009832164725">If yes, go to <a href="#ALM-14006__li46417503164725">8</a>.</li><li id="ALM-14006__li37965972164725">If no, go to <a href="#ALM-14006__li15104347164725">9</a>.</li></ul>
</p></li><li id="ALM-14006__li46417503164725"><a name="ALM-14006__li46417503164725"></a><a name="li46417503164725"></a><span>Run the <strong id="ALM-14006__b42470439784526">hdfs dfs -rm -r</strong> <em id="ALM-14006__i115992362384526">file or directory path</em> command to delete unnecessary files. Deleted files are first moved to the recycle bin and are permanently removed only after they have stayed there longer than the value of <strong id="ALM-14006__b19616162814241">fs.trash.interval</strong> configured on the NameNode. After that period has elapsed, check whether the alarm is cleared.</span><p><div class="note" id="ALM-14006__note389114104155"><img src="public_sys-resources/note_3.0-en-us.png"><span class="notetitle"> </span><div class="notebody"><p id="ALM-14006__p7449185412811"><span id="ALM-14006__text5875253174515">Deleting a file or folder is a high-risk operation. Ensure that the file or folder is no longer required before performing this operation.</span></p>
</div></div>
<ul class="subitemlist" id="ALM-14006__ul3058569164725"><li id="ALM-14006__li13996715164725">If yes, no further action is required.</li><li id="ALM-14006__li59992164164725">If no, go to <a href="#ALM-14006__li15104347164725">9</a>.</li></ul>
</p></li></ol>
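<p>When looking for directories that hold many files, it can help to rank candidate paths by file count. The sketch below is not part of the documented procedure: it assumes you have separately captured the output of the standard <strong>hdfs dfs -count</strong> command, whose columns are DIR_COUNT, FILE_COUNT, CONTENT_SIZE, and PATHNAME, and the function names are hypothetical.</p>

```python
def parse_count_line(line: str) -> tuple[int, int, int, str]:
    """Parse one line of `hdfs dfs -count` output.

    Expected columns: DIR_COUNT FILE_COUNT CONTENT_SIZE PATHNAME.
    """
    # Split on whitespace at most three times so that a path containing
    # spaces stays intact in the last field.
    dirs, files, size, path = line.strip().split(None, 3)
    return int(dirs), int(files), int(size), path


def rank_by_file_count(lines: list[str]) -> list[tuple[str, int]]:
    """Return (path, file_count) pairs, largest file count first."""
    parsed = [parse_count_line(entry) for entry in lines if entry.strip()]
    pairs = [(path, files) for _, files, _, path in parsed]
    return sorted(pairs, key=lambda item: item[1], reverse=True)
```

<p>The ranking only indicates where files are concentrated. Whether a file can be deleted still has to be confirmed manually, as described in the steps above.</p>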
<p class="tableheading" id="ALM-14006__p27527121164725"><strong id="ALM-14006__b923415101440">Collect the fault information.</strong></p>
<ol start="9" id="ALM-14006__ol1279733116485"><li id="ALM-14006__li15104347164725"><a name="ALM-14006__li15104347164725"></a><a name="li15104347164725"></a><span>On FusionInsight Manager, choose <strong id="ALM-14006__b2479153817114">O&M</strong> > <strong id="ALM-14006__b194906387113">Log</strong> > <strong id="ALM-14006__b1949219389113">Download</strong>.</span></li><li id="ALM-14006__li1721398164725"><span>Expand the drop-down list next to the <strong id="ALM-14006__b125111256207">Service</strong> field. In the <strong id="ALM-14006__b102572025102018">Services</strong> dialog box that is displayed, select <strong id="ALM-14006__b132572251201">HDFS</strong> for the target cluster.</span></li><li id="ALM-14006__li15492585164725"><span>Click <span><img id="ALM-14006__image1945644173117" src="en-us_image_0269383961.png"></span> in the upper right corner, and set <strong id="ALM-14006__b6456941173117">Start Date</strong> and <strong id="ALM-14006__b11456154113318">End Date</strong> for log collection to 10 minutes before and 10 minutes after the alarm generation time, respectively. Then, click <strong id="ALM-14006__b13456164113319">Download</strong>.</span></li><li id="ALM-14006__li46939876164725"><span>Contact <span id="ALM-14006__text4128163424210">O&M personnel</span> and provide the collected logs.</span></li></ol>
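<p>The log collection window in the steps above is a simple offset around the alarm generation time. The following sketch computes the two values to enter as Start Date and End Date; the function name and the <strong>margin_minutes</strong> parameter are hypothetical and exist only for illustration.</p>

```python
from datetime import datetime, timedelta


def log_window(alarm_time: datetime, margin_minutes: int = 10) -> tuple[datetime, datetime]:
    """Return (start, end) for log collection around the alarm time."""
    margin = timedelta(minutes=margin_minutes)
    # Start is the margin before the alarm; end is the margin after it.
    return alarm_time - margin, alarm_time + margin
```

<p>For an alarm generated at 12:00, this yields a window from 11:50 to 12:10 on the same day.</p>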
</div>
<div class="section" id="ALM-14006__section169311343318"><h4 class="sectiontitle">Alarm Clearing</h4><p id="ALM-14006__p754913417333">This alarm is automatically cleared after the fault is rectified.</p>
</div>
<div class="section" id="ALM-14006__section29918121"><h4 class="sectiontitle">Related Information</h4><p id="ALM-14006__p1055951511"><strong id="ALM-14006__b15781654924">Configuration rules of the NameNode JVM parameter</strong></p>
<p id="ALM-14006__a912f7bc55bd44dc7af294ec1977d06db">Default value of the NameNode JVM parameter <strong id="ALM-14006__aac46de0e6b7d44c2af394062994cdd01">GC_OPTS</strong>:</p>
<p id="ALM-14006__p1279145241818">-Xms2G -Xmx4G -XX:NewSize=128M -XX:MaxNewSize=256M -XX:MetaspaceSize=128M -XX:MaxMetaspaceSize=128M -XX:+UseConcMarkSweepGC -XX:+CMSParallelRemarkEnabled -XX:CMSInitiatingOccupancyFraction=65 -XX:+PrintGCDetails -Dsun.rmi.dgc.client.gcInterval=0x7FFFFFFFFFFFFFE -Dsun.rmi.dgc.server.gcInterval=0x7FFFFFFFFFFFFFE -XX:-OmitStackTraceInFastThrow -XX:+PrintGCDateStamps -XX:+UseGCLogFileRotation -XX:NumberOfGCLogFiles=10 -XX:GCLogFileSize=1M -Djdk.tls.ephemeralDHKeySize=3072 -Djdk.tls.rejectClientInitiatedRenegotiation=true -Djava.io.tmpdir=${Bigdata_tmp_dir}</p>
<div class="p" id="ALM-14006__a2616701d47ea48779f0bc5a4f1d88df4">The number of files managed by the NameNode is proportional to the memory used by the NameNode. When the number of file objects changes, adjust <strong id="ALM-14006__b143181533192820">-Xms2G -Xmx4G -XX:NewSize=128M -XX:MaxNewSize=256M</strong> in the default value. The following table lists the reference values.
<div class="tablenoborder"><table cellpadding="4" cellspacing="0" summary="" id="ALM-14006__te26cc4df625545418ac6cc0a8f9f88f0" frame="border" border="1" rules="all"><caption><b>Table 1 </b>NameNode JVM configuration</caption><thead align="left"><tr id="ALM-14006__r12696e3d9eac47308ab8ceeeabaf9d60"><th align="left" class="cellrowborder" valign="top" width="28.01%" id="mcps1.3.8.5.2.2.3.1.1"><p id="ALM-14006__a1d09de0bdefa473fb5bbe44fa938dd42">Number of File Objects</p>
</th>
<th align="left" class="cellrowborder" valign="top" width="71.99%" id="mcps1.3.8.5.2.2.3.1.2"><p id="ALM-14006__a3cda0d608a9342f1a59ffca91ccb43fb">Reference Value</p>
</th>
</tr>
</thead>
<tbody><tr id="ALM-14006__rf83d3e2a613f4411b06455ba3d050a91"><td class="cellrowborder" valign="top" width="28.01%" headers="mcps1.3.8.5.2.2.3.1.1 "><p id="ALM-14006__a698a4c1bac0e47a39311c4d3eab22de6">10,000,000</p>
</td>
<td class="cellrowborder" valign="top" width="71.99%" headers="mcps1.3.8.5.2.2.3.1.2 "><p id="ALM-14006__ae013dee8e5c249b589373d1f0d623955">-Xms6G -Xmx6G -XX:NewSize=512M -XX:MaxNewSize=512M</p>
</td>
</tr>
<tr id="ALM-14006__r71dd137d8340469b9d6cfa56bad12a4f"><td class="cellrowborder" valign="top" width="28.01%" headers="mcps1.3.8.5.2.2.3.1.1 "><p id="ALM-14006__a068a2ba0b8aa4d6d83a392de11a0c207">20,000,000</p>
</td>
<td class="cellrowborder" valign="top" width="71.99%" headers="mcps1.3.8.5.2.2.3.1.2 "><p id="ALM-14006__a702b804579e84bd490aae5a79da4302f">-Xms12G -Xmx12G -XX:NewSize=1G -XX:MaxNewSize=1G</p>
</td>
</tr>
<tr id="ALM-14006__racee4c018a9f4a729de4be081d5da2c7"><td class="cellrowborder" valign="top" width="28.01%" headers="mcps1.3.8.5.2.2.3.1.1 "><p id="ALM-14006__afbb0ed490b2d406583d777e01adad1f0">50,000,000</p>
</td>
<td class="cellrowborder" valign="top" width="71.99%" headers="mcps1.3.8.5.2.2.3.1.2 "><p id="ALM-14006__a9f14dacaf1354a21919d6791d891e970">-Xms32G -Xmx32G -XX:NewSize=3G -XX:MaxNewSize=3G</p>
</td>
</tr>
<tr id="ALM-14006__r39bfe2109be84e2bb1757f212f3c7361"><td class="cellrowborder" valign="top" width="28.01%" headers="mcps1.3.8.5.2.2.3.1.1 "><p id="ALM-14006__a03cfef126810415795047cb2bbc7b216">100,000,000</p>
</td>
<td class="cellrowborder" valign="top" width="71.99%" headers="mcps1.3.8.5.2.2.3.1.2 "><p id="ALM-14006__a61d1ccadcb8a4b738f933413bd59a772">-Xms64G -Xmx64G -XX:NewSize=6G -XX:MaxNewSize=6G</p>
</td>
</tr>
<tr id="ALM-14006__r3ee104e2dbe84dd6b2aea0a70d6fb067"><td class="cellrowborder" valign="top" width="28.01%" headers="mcps1.3.8.5.2.2.3.1.1 "><p id="ALM-14006__a795cdaa74df54d38aeb6f1317a72d62a">200,000,000</p>
</td>
<td class="cellrowborder" valign="top" width="71.99%" headers="mcps1.3.8.5.2.2.3.1.2 "><p id="ALM-14006__aeb30301f03f549c7b5f56ce86b23a410">-Xms96G -Xmx96G -XX:NewSize=9G -XX:MaxNewSize=9G</p>
</td>
</tr>
<tr id="ALM-14006__r439309a2fb8c4f19a60122d992e22e77"><td class="cellrowborder" valign="top" width="28.01%" headers="mcps1.3.8.5.2.2.3.1.1 "><p id="ALM-14006__ad0738c3e97f14db9a3a72023f45e6c59">300,000,000</p>
</td>
<td class="cellrowborder" valign="top" width="71.99%" headers="mcps1.3.8.5.2.2.3.1.2 "><p id="ALM-14006__ab31506428f634ad58ac748c65df24232">-Xms164G -Xmx164G -XX:NewSize=12G -XX:MaxNewSize=12G</p>
</td>
</tr>
</tbody>
</table>
</div>
</div>
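<p>The reference values in Table 1 can be encoded as a lookup. The sketch below is illustrative only: the names are hypothetical, and rounding up to the next table row for file counts that fall between two rows is an assumption, not a rule stated in this document.</p>

```python
# (maximum number of file objects, reference heap settings) from Table 1.
REFERENCE_GC_OPTS = [
    (10_000_000, "-Xms6G -Xmx6G -XX:NewSize=512M -XX:MaxNewSize=512M"),
    (20_000_000, "-Xms12G -Xmx12G -XX:NewSize=1G -XX:MaxNewSize=1G"),
    (50_000_000, "-Xms32G -Xmx32G -XX:NewSize=3G -XX:MaxNewSize=3G"),
    (100_000_000, "-Xms64G -Xmx64G -XX:NewSize=6G -XX:MaxNewSize=6G"),
    (200_000_000, "-Xms96G -Xmx96G -XX:NewSize=9G -XX:MaxNewSize=9G"),
    (300_000_000, "-Xms164G -Xmx164G -XX:NewSize=12G -XX:MaxNewSize=12G"),
]


def reference_gc_opts(file_objects: int) -> str:
    """Return the reference settings of the first row that covers the count.

    Assumption: counts between two rows are rounded up to the larger row.
    """
    for limit, opts in REFERENCE_GC_OPTS:
        if file_objects <= limit:
            return opts
    raise ValueError("file count exceeds the largest row in Table 1")
```

<p>With this rounding assumption, 15,000,000 file objects map to the 20,000,000 row. Only the heap and new-generation options are covered here; the remaining options in the default <strong>GC_OPTS</strong> value stay unchanged.</p>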
</div>
</div>
<div>
<div class="familylinks">
<div class="parentlink"><strong>Parent topic:</strong> <a href="mrs_01_1298.html">Alarm Reference (Applicable to MRS 3.x)</a></div>
</div>
</div>