Reviewed-by: Hasko, Vladimir <vladimir.hasko@t-systems.com> Co-authored-by: Yang, Tong <yangtong2@huawei.com> Co-committed-by: Yang, Tong <yangtong2@huawei.com>
<a name="ALM-12017"></a><a name="ALM-12017"></a>
<h1 class="topictitle1">ALM-12017 Insufficient Disk Capacity</h1>
<div id="body66975920"><div class="section" id="ALM-12017__s7a756a3074824ff29f40824ccac74790"><h4 class="sectiontitle">Description</h4><p id="ALM-12017__en-us_topic_0070543559_p25104075">The system checks the host disk usage every 30 seconds and compares the actual disk usage with the threshold. The disk usage has a default threshold. This alarm is generated when the host disk usage exceeds the threshold.</p>
<p id="ALM-12017__p2082647611242">When the <strong id="ALM-12017__b44134084101639"><strong id="ALM-12017__b041615559258">Trigger Count</strong></strong> is 1, this alarm is cleared when the usage of a host disk partition is less than or equal to the threshold. When the <strong id="ALM-12017__b153891654103012"><strong id="ALM-12017__b53896541301">Trigger Count</strong></strong> is greater than 1, this alarm is cleared when the usage of a host disk partition is less than or equal to 90% of the threshold.</p>
</div>
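<p>As a minimal sketch, the clearing rule described above can be expressed in Python. This is an illustration only, not the FusionInsight Manager implementation. The function names <strong>clear_limit</strong> and <strong>is_alarm_cleared</strong> are hypothetical, and the default of 90 assumes the default threshold of 90%.</p>

```python
def clear_limit(threshold_percent, trigger_count):
    """Return the usage (in percent) at or below which the alarm is cleared."""
    if trigger_count < 1:
        raise ValueError("trigger_count must be at least 1")
    if trigger_count == 1:
        # Trigger Count is 1: clear at the threshold itself.
        return threshold_percent
    # Trigger Count is greater than 1: clear at 90% of the threshold.
    return threshold_percent * 90 / 100


def is_alarm_cleared(usage_percent, threshold_percent=90, trigger_count=1):
    """Apply the clearing rule to one disk partition usage sample."""
    return usage_percent <= clear_limit(threshold_percent, trigger_count)
```

<p>With a threshold of 90% and a Trigger Count greater than 1, the usage must drop to 81% or lower before the alarm is cleared. This gap keeps the alarm from being generated and cleared repeatedly when the usage hovers around the threshold.</p>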
<div class="section" id="ALM-12017__s7eaf36ea595e48c7ad5d731ce280ebd9"><h4 class="sectiontitle">Attribute</h4>
<div class="tablenoborder"><table cellpadding="4" cellspacing="0" summary="" id="ALM-12017__en-us_topic_0070543559_table47260226" frame="border" border="1" rules="all"><thead align="left"><tr id="ALM-12017__en-us_topic_0070543559_row59951591"><th align="left" class="cellrowborder" valign="top" width="33.33333333333333%" id="mcps1.3.2.2.1.4.1.1"><p id="ALM-12017__en-us_topic_0070543559_p24240706">Alarm ID</p>
</th>
<th align="left" class="cellrowborder" valign="top" width="33.33333333333333%" id="mcps1.3.2.2.1.4.1.2"><p id="ALM-12017__en-us_topic_0070543559_p17340145">Alarm Severity</p>
</th>
<th align="left" class="cellrowborder" valign="top" width="33.33333333333333%" id="mcps1.3.2.2.1.4.1.3"><p id="ALM-12017__en-us_topic_0070543559_p62374493">Auto Clear</p>
</th>
</tr>
</thead>
<tbody><tr id="ALM-12017__en-us_topic_0070543559_row19169133"><td class="cellrowborder" valign="top" width="33.33333333333333%" headers="mcps1.3.2.2.1.4.1.1 "><p id="ALM-12017__en-us_topic_0070543559_p9195913">12017</p>
</td>
<td class="cellrowborder" valign="top" width="33.33333333333333%" headers="mcps1.3.2.2.1.4.1.2 "><p id="ALM-12017__en-us_topic_0070543559_p6671514">Major</p>
</td>
<td class="cellrowborder" valign="top" width="33.33333333333333%" headers="mcps1.3.2.2.1.4.1.3 "><p id="ALM-12017__en-us_topic_0070543559_p3521800">Yes</p>
</td>
</tr>
</tbody>
</table>
</div>
</div>
<div class="section" id="ALM-12017__s349818e9d8e9413dbca3219347d41604"><h4 class="sectiontitle">Parameters</h4>
<div class="tablenoborder"><table cellpadding="4" cellspacing="0" summary="" id="ALM-12017__en-us_topic_0070543559_table16830394" frame="border" border="1" rules="all"><thead align="left"><tr id="ALM-12017__en-us_topic_0070543559_row2814577"><th align="left" class="cellrowborder" valign="top" width="50%" id="mcps1.3.3.2.1.3.1.1"><p id="ALM-12017__en-us_topic_0070543559_p26654174">Name</p>
</th>
<th align="left" class="cellrowborder" valign="top" width="50%" id="mcps1.3.3.2.1.3.1.2"><p id="ALM-12017__en-us_topic_0070543559_p11504467">Meaning</p>
</th>
</tr>
</thead>
<tbody><tr id="ALM-12017__row73657316554"><td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.3.2.1.3.1.1 "><p id="ALM-12017__p192431315431">Source</p>
</td>
<td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.3.2.1.3.1.2 "><p id="ALM-12017__p692551319435">Specifies the cluster or system for which the alarm is generated.</p>
</td>
</tr>
<tr id="ALM-12017__en-us_topic_0070543559_row59446649"><td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.3.2.1.3.1.1 "><p id="ALM-12017__en-us_topic_0070543559_p50449271">ServiceName</p>
</td>
<td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.3.2.1.3.1.2 "><p id="ALM-12017__en-us_topic_0070543559_p59859190">Specifies the service for which the alarm is generated.</p>
</td>
</tr>
<tr id="ALM-12017__en-us_topic_0070543559_row1861806"><td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.3.2.1.3.1.1 "><p id="ALM-12017__en-us_topic_0070543559_p16588560">RoleName</p>
</td>
<td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.3.2.1.3.1.2 "><p id="ALM-12017__en-us_topic_0070543559_p1496135">Specifies the role for which the alarm is generated.</p>
</td>
</tr>
<tr id="ALM-12017__en-us_topic_0070543559_row13465222"><td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.3.2.1.3.1.1 "><p id="ALM-12017__en-us_topic_0070543559_p16941207">HostName</p>
</td>
<td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.3.2.1.3.1.2 "><p id="ALM-12017__en-us_topic_0070543559_p30060528">Specifies the host for which the alarm is generated.</p>
</td>
</tr>
<tr id="ALM-12017__en-us_topic_0070543559_row2109304"><td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.3.2.1.3.1.1 "><p id="ALM-12017__en-us_topic_0070543559_p36635921">PartitionName</p>
</td>
<td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.3.2.1.3.1.2 "><p id="ALM-12017__en-us_topic_0070543559_p14719623">Specifies the device partition for which the alarm is generated.</p>
</td>
</tr>
<tr id="ALM-12017__en-us_topic_0070543559_row65367744"><td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.3.2.1.3.1.1 "><p id="ALM-12017__en-us_topic_0070543559_p60295883">Trigger Condition</p>
</td>
<td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.3.2.1.3.1.2 "><p id="ALM-12017__en-us_topic_0070543559_p52128359">Specifies the threshold triggering the alarm. If the current indicator value exceeds this threshold, the alarm is generated.</p>
</td>
</tr>
</tbody>
</table>
</div>
</div>
<div class="section" id="ALM-12017__s44d29551fde24303a025841fbafd5684"><h4 class="sectiontitle">Impact on the System</h4><p id="ALM-12017__en-us_topic_0070543559_p61647547">Service processes become unavailable.</p>
</div>
<div class="section" id="ALM-12017__sd53668685806495fb8d456ba9e2c2c11"><h4 class="sectiontitle">Possible Causes</h4><ul id="ALM-12017__en-us_topic_0070543559_ul27395440"><li id="ALM-12017__en-us_topic_0070543559_li45232374">The alarm threshold is incorrect.</li><li id="ALM-12017__en-us_topic_0070543559_li4438190">The disk configuration of the server cannot meet service requirements.</li></ul>
</div>
<div class="section" id="ALM-12017__s6fd2395d167c4db4814624ea702a37ac"><h4 class="sectiontitle">Procedure</h4><p class="tableheading" id="ALM-12017__en-us_topic_0070543559_p23949084"><strong id="ALM-12017__b457009885739">Check whether the alarm threshold is appropriate.</strong></p>
<ol id="ALM-12017__ol229057318582"><li id="ALM-12017__li3269990385745"><span>Log in to FusionInsight Manager and choose <strong id="ALM-12017__b126241333219">O&M</strong> > <strong id="ALM-12017__b156241435323">Alarm ></strong> <strong id="ALM-12017__b1562412314328">Thresholds</strong><strong id="ALM-12017__b1962413373216"> > </strong><em id="ALM-12017__i1162415315324">Name of the desired cluster</em> > <strong id="ALM-12017__b962416314323">Host</strong> > <strong id="ALM-12017__b106241931323">Disk</strong> > <strong id="ALM-12017__b4624163203210">Disk Usage</strong> and check whether the threshold (configurable, 90% by default) is appropriate.</span><p><ul class="subitemlist" id="ALM-12017__ul1854640385745"><li id="ALM-12017__li1687169885745">If yes, go to <a href="#ALM-12017__li2782670585745">4</a>.</li><li id="ALM-12017__li2443033285745">If no, go to <a href="#ALM-12017__li1280611085745">2</a>.</li></ul>
</p></li><li id="ALM-12017__li1280611085745"><a name="ALM-12017__li1280611085745"></a><a name="li1280611085745"></a><span>Choose <strong id="ALM-12017__b2586367385745">O&M</strong> > <strong id="ALM-12017__b1379910713499">Alarm ></strong> <strong id="ALM-12017__b2887114614242">Thresholds</strong><strong id="ALM-12017__b29831221166"> > </strong><em id="ALM-12017__i9983102101619">Name of the desired cluster</em> > <strong id="ALM-12017__b6413578985745">Host</strong> > <strong id="ALM-12017__b4035119385745">Disk</strong> > <strong id="ALM-12017__b2761642585745">Disk Usage</strong> and click <strong id="ALM-12017__b6659180133310">Modify</strong> in the <strong id="ALM-12017__b1374719315332">Operation</strong> column to change the alarm threshold based on site requirements, as shown in <a href="#ALM-12017__fig6063892885745">Figure 1</a>.</span><p><div class="fignone" id="ALM-12017__fig6063892885745"><a name="ALM-12017__fig6063892885745"></a><a name="fig6063892885745"></a><span class="figcap"><b>Figure 1 </b>Setting an alarm threshold</span><br><span><img id="ALM-12017__image1615410501365" src="en-us_image_0000001440977873.png"></span></div>
</p></li><li id="ALM-12017__li4783109885745"><span>After 2 minutes, check whether the alarm is cleared.</span><p><ul class="subitemlist" id="ALM-12017__ul59050785745"><li id="ALM-12017__li4814612685745">If yes, no further action is required.</li><li id="ALM-12017__li752215285745">If no, go to <a href="#ALM-12017__li2782670585745">4</a>.</li></ul>
</p></li></ol>
<p class="tableheading" id="ALM-12017__p531456685745"><strong id="ALM-12017__b98862278588">Check whether the disk usage reaches the upper limit.</strong></p>
<ol start="4" id="ALM-12017__ol1005390085829"><li id="ALM-12017__li2782670585745"><a name="ALM-12017__li2782670585745"></a><a name="li2782670585745"></a><span>In the alarm list on FusionInsight Manager, click <span><img id="ALM-12017__image168221113135319" src="en-us_image_0269383828.png"></span> in the row where the alarm is located to view the alarm host name and disk partition information in the alarm details.</span></li><li id="ALM-12017__li3937060885745"><span>Log in to the node where the alarm is generated as user <strong id="ALM-12017__b4911375485745">root</strong>. <span id="ALM-12017__text43649449460"></span></span></li><li id="ALM-12017__li1529764085745"><span>Run the <strong id="ALM-12017__b5391142133919">df -lmPT | awk '$2 != "iso9660"' | grep '^/dev/' | awk '{"readlink -m "$1 | getline real }{$1=real; print $0}' | sort -u -k 1,1</strong> command to check the system disk partition usage. Based on the disk partition name obtained in <a href="#ALM-12017__li2782670585745">4</a>, check whether the disk is mounted to one of the following directories: <strong id="ALM-12017__b4568855685745">/</strong>, <strong id="ALM-12017__b2096079285745">/opt</strong>, <strong id="ALM-12017__b5442940785745">/tmp</strong>, <strong id="ALM-12017__b2010261785745">/var</strong>, <strong id="ALM-12017__b4670583385745">/var/log</strong>, and <strong id="ALM-12017__b2507614885745">/srv/BigData</strong> (can be customized).</span><p><ul class="subitemlist" id="ALM-12017__ul3152589985745"><li id="ALM-12017__li1790212085745">If yes, the disk is a system disk. Then go to <a href="#ALM-12017__li6170195385745">10</a>.</li><li id="ALM-12017__li4078557985745">If no, the disk is not a system disk. Then go to <a href="#ALM-12017__li1190839985745">7</a>.</li></ul>
</p></li><li id="ALM-12017__li1190839985745"><a name="ALM-12017__li1190839985745"></a><a name="li1190839985745"></a><span>Run the <strong id="ALM-12017__b10661194925219">df -lmPT | awk '$2 != "iso9660"' | grep '^/dev/' | awk '{"readlink -m "$1 | getline real }{$1=real; print $0}' | sort -u -k 1,1</strong> command to check the system disk partition usage. Determine the role of the disk based on the disk partition name obtained in <a href="#ALM-12017__li2782670585745">4</a>.</span></li><li id="ALM-12017__li11884059152614"><span>Check the disk service.</span><p><div class="p" id="ALM-12017__p0769162644910">In <span id="ALM-12017__text13624174411515">MRS</span>, check whether the disk service is HDFS, Yarn, Kafka, or Supervisor.<ul id="ALM-12017__ul148852372297"><li id="ALM-12017__li10740174317299">If yes, adjust the capacity. Then go to <a href="#ALM-12017__li1354951085745">9</a>.</li><li id="ALM-12017__li1159152152914">If no, go to <a href="#ALM-12017__li1359113885745">12</a>.</li></ul>
</div>
</p></li><li id="ALM-12017__li1354951085745"><a name="ALM-12017__li1354951085745"></a><a name="li1354951085745"></a><span>After 2 minutes, check whether the alarm is cleared.</span><p><ul class="subitemlist" id="ALM-12017__ul150550185745"><li id="ALM-12017__li4676654185745">If yes, no further action is required.</li><li id="ALM-12017__li2999343985745">If no, go to <a href="#ALM-12017__li1359113885745">12</a>.</li></ul>
</p></li><li id="ALM-12017__li6170195385745"><a name="ALM-12017__li6170195385745"></a><a name="li6170195385745"></a><span>Run the <strong id="ALM-12017__b5483673385745">find / -xdev -size +500M -exec ls -l {} \;</strong> command to check whether a file larger than 500 MB exists on the node and disk.</span><p><ul class="subitemlist" id="ALM-12017__ul5159501585745"><li id="ALM-12017__li1259039885745">If yes, go to <a href="#ALM-12017__li3133628885745">11</a>.</li><li id="ALM-12017__li1318931985745">If no, go to <a href="#ALM-12017__li1359113885745">12</a>.</li></ul>
</p></li><li id="ALM-12017__li3133628885745"><a name="ALM-12017__li3133628885745"></a><a name="li3133628885745"></a><span>Handle the large file and check whether the alarm is cleared 2 minutes later.</span><p><ul class="subitemlist" id="ALM-12017__ul2585143185745"><li id="ALM-12017__li1844667285745">If yes, no further action is required.</li><li id="ALM-12017__li1778546285745">If no, go to <a href="#ALM-12017__li1359113885745">12</a>.</li></ul>
</p></li><li id="ALM-12017__li1359113885745"><a name="ALM-12017__li1359113885745"></a><a name="li1359113885745"></a><span>Contact the system administrator to expand the disk capacity.</span></li><li id="ALM-12017__li2833807185745"><span>After 2 minutes, check whether the alarm is cleared.</span><p><ul class="subitemlist" id="ALM-12017__ul5088862685745"><li id="ALM-12017__li5521138285745">If yes, no further action is required.</li><li id="ALM-12017__li4293699485745">If no, go to <a href="#ALM-12017__li5603307085745">14</a>.</li></ul>
</p></li></ol>
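<p>For reference, the large-file search in the procedure above can also be sketched in Python. This is a rough equivalent of the <strong>find</strong> command, not a replacement for it: the function name <strong>find_large_files</strong> is hypothetical, only regular files are reported, and the scan stays on the file system of the start directory in the same way as the <strong>-xdev</strong> option.</p>

```python
import os
import stat


def find_large_files(root, min_bytes=500 * 1024 * 1024):
    """Yield (path, size) for regular files under root larger than min_bytes.

    Directories on a different device than root are skipped, similar to
    the -xdev option of find. Unreadable entries are ignored.
    """
    root_dev = os.lstat(root).st_dev
    for dirpath, dirnames, filenames in os.walk(root, topdown=True):
        # Prune subdirectories that live on another file system.
        same_device = []
        for name in dirnames:
            try:
                if os.lstat(os.path.join(dirpath, name)).st_dev == root_dev:
                    same_device.append(name)
            except OSError:
                continue
        dirnames[:] = same_device

        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                info = os.lstat(path)
            except OSError:
                continue
            if stat.S_ISREG(info.st_mode) and info.st_size > min_bytes:
                yield path, info.st_size


# Example usage (run as user root on the node to scan the root file system):
# for path, size in find_large_files("/"):
#     print(size, path)
```

<p>A generator is used so that results are reported as they are found, which is useful when the scan of a large file system takes a long time.</p>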
<p class="tableheading" id="ALM-12017__p5534445785745"><strong id="ALM-12017__b657764185839">Collect fault information.</strong></p>
<ol start="14" id="ALM-12017__ol4750985985842"><li id="ALM-12017__li5603307085745"><a name="ALM-12017__li5603307085745"></a><a name="li5603307085745"></a><span>On FusionInsight Manager, choose <strong id="ALM-12017__b13819155015320">O&M</strong> > <strong id="ALM-12017__b1368243785745">Log > Download</strong>.</span></li><li id="ALM-12017__li1061898185745"><span>Select <strong id="ALM-12017__b1352831932712">OMS</strong> for <strong id="ALM-12017__b13893145519916">Service</strong> and click <strong id="ALM-12017__b20893115513911">OK</strong>.</span></li><li id="ALM-12017__li1145664103113"><span>Click <span><img id="ALM-12017__image1945644173117" src="en-us_image_0269383829.png"></span> in the upper right corner, and set <strong id="ALM-12017__b6456941173117">Start Date</strong> and <strong id="ALM-12017__b11456154113318">End Date</strong> for log collection to 10 minutes before and 10 minutes after the alarm generation time, respectively. Then, click <strong id="ALM-12017__b13456164113319">Download</strong>.</span></li><li id="ALM-12017__li495644512588"><span>Contact the <span id="ALM-12017__text4614151421417">O&M personnel</span> and send the collected log information.</span></li></ol>
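<p>The log collection window described above is a simple offset around the alarm generation time. The following Python sketch shows the calculation. The function name <strong>log_window</strong> is hypothetical, and the alarm time in the usage comment is an invented example value.</p>

```python
from datetime import datetime, timedelta


def log_window(alarm_time, margin_minutes=10):
    """Return (start, end) for log collection around the alarm time."""
    margin = timedelta(minutes=margin_minutes)
    return alarm_time - margin, alarm_time + margin


# Example usage with an invented alarm generation time:
# start, end = log_window(datetime(2024, 1, 1, 12, 0))
```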
</div>
<div class="section" id="ALM-12017__section1529716184534"><h4 class="sectiontitle">Alarm Clearing</h4><p id="ALM-12017__p4677152685316">After the fault is rectified, the system automatically clears this alarm.</p>
</div>
<div class="section" id="ALM-12017__sdc198514f48e40f5bccbcac7d37c39b0"><h4 class="sectiontitle">Related Information</h4><p id="ALM-12017__en-us_topic_0070543559_p22957827">None</p>
</div>
</div>
<div>
<div class="familylinks">
<div class="parentlink"><strong>Parent topic:</strong> <a href="mrs_01_1298.html">Alarm Reference (Applicable to MRS 3.x)</a></div>
</div>
</div>