<a name="ALM-14000"></a><a name="ALM-14000"></a>
<h1 class="topictitle1">ALM-14000 HDFS Service Unavailable</h1>
<div id="body51272698"><div class="section" id="ALM-14000__s9e3ac4c4ffe142b3bd38535b8a4d0093"><h4 class="sectiontitle">Description</h4><p id="ALM-14000__en-us_topic_0070543637_p5800237">The system checks the status of the NameService services every 60 seconds. This alarm is generated when all NameService services are abnormal, in which case the system considers the HDFS service unavailable.</p>
<p id="ALM-14000__en-us_topic_0070543637_p52202136">This alarm is cleared when at least one NameService service returns to normal, in which case the system considers the HDFS service recovered.</p>
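<p>The generation and clearing conditions above are complementary, so the alarm state after each 60-second check depends only on whether any NameService service is normal. The following Python sketch illustrates that logic. The NsStatus enumeration, the hdfs_unavailable() and alarm_event() functions, and the NameService names ns1 and ns2 are hypothetical and do not represent how FusionInsight Manager implements the check.</p>

```python
from enum import Enum
from typing import Dict, Optional


class NsStatus(Enum):
    NORMAL = "normal"
    ABNORMAL = "abnormal"


def hdfs_unavailable(statuses: Dict[str, NsStatus]) -> bool:
    """ALM-14000 condition: no NameService is normal, i.e. all are abnormal."""
    return not any(s is NsStatus.NORMAL for s in statuses.values())


def alarm_event(was_raised: bool, statuses: Dict[str, NsStatus]) -> Optional[str]:
    """Evaluate one periodic check against the previous alarm state.

    Returns "raise" when all NameServices have become abnormal, "clear" when
    at least one NameService is normal again, and None when nothing changes.
    """
    now_raised = hdfs_unavailable(statuses)
    if now_raised and not was_raised:
        return "raise"
    if was_raised and not now_raised:
        return "clear"
    return None


# Hypothetical NameService names, for illustration only.
print(alarm_event(False, {"ns1": NsStatus.ABNORMAL, "ns2": NsStatus.ABNORMAL}))  # raise
print(alarm_event(True, {"ns1": NsStatus.NORMAL, "ns2": NsStatus.ABNORMAL}))    # clear
```

<p>Because clearing only needs one normal NameService service, a federated cluster with several NameService services keeps HDFS available as long as any one of them is healthy.</p>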
</div>
<div class="section" id="ALM-14000__s313ef18ac5a6422f8d98fd5705fa77d5"><h4 class="sectiontitle">Attribute</h4>
<div class="tablenoborder"><table cellpadding="4" cellspacing="0" summary="" id="ALM-14000__en-us_topic_0070543637_table514661" frame="border" border="1" rules="all"><thead align="left"><tr id="ALM-14000__en-us_topic_0070543637_row43527725"><th align="left" class="cellrowborder" valign="top" width="33.33333333333333%" id="mcps1.3.2.2.1.4.1.1"><p id="ALM-14000__en-us_topic_0070543637_p36084843">Alarm ID</p>
</th>
<th align="left" class="cellrowborder" valign="top" width="33.33333333333333%" id="mcps1.3.2.2.1.4.1.2"><p id="ALM-14000__en-us_topic_0070543637_p37191210">Alarm Severity</p>
</th>
<th align="left" class="cellrowborder" valign="top" width="33.33333333333333%" id="mcps1.3.2.2.1.4.1.3"><p id="ALM-14000__en-us_topic_0070543637_p59698066">Automatically Cleared</p>
</th>
</tr>
</thead>
<tbody><tr id="ALM-14000__en-us_topic_0070543637_row3705162"><td class="cellrowborder" valign="top" width="33.33333333333333%" headers="mcps1.3.2.2.1.4.1.1 "><p id="ALM-14000__en-us_topic_0070543637_p31682682">14000</p>
</td>
<td class="cellrowborder" valign="top" width="33.33333333333333%" headers="mcps1.3.2.2.1.4.1.2 "><p id="ALM-14000__en-us_topic_0070543637_p16160459">Critical</p>
</td>
<td class="cellrowborder" valign="top" width="33.33333333333333%" headers="mcps1.3.2.2.1.4.1.3 "><p id="ALM-14000__en-us_topic_0070543637_p33928804">Yes</p>
</td>
</tr>
</tbody>
</table>
</div>
</div>
<div class="section" id="ALM-14000__s5f4d590ca0264792811b29ca462e8f36"><h4 class="sectiontitle">Parameters</h4>
<div class="tablenoborder"><table cellpadding="4" cellspacing="0" summary="" id="ALM-14000__en-us_topic_0070543637_table63878638" frame="border" border="1" rules="all"><thead align="left"><tr id="ALM-14000__en-us_topic_0070543637_row48934293"><th align="left" class="cellrowborder" valign="top" width="50%" id="mcps1.3.3.2.1.3.1.1"><p id="ALM-14000__en-us_topic_0070543637_p4254779">Name</p>
</th>
<th align="left" class="cellrowborder" valign="top" width="50%" id="mcps1.3.3.2.1.3.1.2"><p id="ALM-14000__en-us_topic_0070543637_p9092855">Meaning</p>
</th>
</tr>
</thead>
<tbody><tr id="ALM-14000__row686981153715"><td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.3.2.1.3.1.1 "><p id="ALM-14000__p192431315431">Source</p>
</td>
<td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.3.2.1.3.1.2 "><p id="ALM-14000__p692551319435">Specifies the cluster for which the alarm is generated.</p>
</td>
</tr>
<tr id="ALM-14000__en-us_topic_0070543637_row65432622"><td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.3.2.1.3.1.1 "><p id="ALM-14000__en-us_topic_0070543637_p65551050">ServiceName</p>
</td>
<td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.3.2.1.3.1.2 "><p id="ALM-14000__en-us_topic_0070543637_p8034867">Specifies the service for which the alarm is generated.</p>
</td>
</tr>
<tr id="ALM-14000__en-us_topic_0070543637_row5204944"><td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.3.2.1.3.1.1 "><p id="ALM-14000__en-us_topic_0070543637_p18947286">RoleName</p>
</td>
<td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.3.2.1.3.1.2 "><p id="ALM-14000__en-us_topic_0070543637_p58335197">Specifies the role for which the alarm is generated.</p>
</td>
</tr>
<tr id="ALM-14000__en-us_topic_0070543637_row55254728"><td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.3.2.1.3.1.1 "><p id="ALM-14000__en-us_topic_0070543637_p46447987">HostName</p>
</td>
<td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.3.2.1.3.1.2 "><p id="ALM-14000__en-us_topic_0070543637_p4190578">Specifies the host for which the alarm is generated.</p>
</td>
</tr>
</tbody>
</table>
</div>
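<p>When alarm records are processed by a script, the four parameters above can be held in a small immutable record, which also makes a convenient key for de-duplicating repeated reports of the same alarm. The following Python sketch is illustrative only: the AlarmParameters class, its field names, the location() format, and the sample values are hypothetical and are not part of any FusionInsight Manager API.</p>

```python
from dataclasses import dataclass


@dataclass(frozen=True)  # frozen: instances are hashable and can be set/dict keys
class AlarmParameters:
    """Hypothetical record for the ALM-14000 parameters listed in the table."""
    source: str        # Source: cluster for which the alarm is generated
    service_name: str  # ServiceName: service for which the alarm is generated
    role_name: str     # RoleName: role for which the alarm is generated
    host_name: str     # HostName: host for which the alarm is generated

    def location(self) -> str:
        """Join the parameters into one readable locator string."""
        return f"{self.source}/{self.service_name}/{self.role_name}@{self.host_name}"


# All values below are made up for illustration.
params = AlarmParameters("cluster1", "HDFS", "role-a", "host-a")
print(params.location())  # cluster1/HDFS/role-a@host-a
```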
</div>
<div class="section" id="ALM-14000__s8a48d5b3ff234dbdb7eca796d5c5a3b6"><h4 class="sectiontitle">Impact on the System</h4><p id="ALM-14000__en-us_topic_0070543637_p3892526">HDFS cannot provide services to the upper-layer components that depend on it, such as HBase and MapReduce. As a result, users cannot read or write files.</p>
</div>
<div class="section" id="ALM-14000__s33123502891b4993aaa4a6dcdc7e307c"><h4 class="sectiontitle">Possible Causes</h4><ul id="ALM-14000__en-us_topic_0070543637_ul46859150"><li id="ALM-14000__en-us_topic_0070543637_li19079167">The ZooKeeper service is abnormal.</li><li id="ALM-14000__en-us_topic_0070543637_li37494778">All NameService services are abnormal.</li></ul>
</div>
<div class="section" id="ALM-14000__s22d890e0bfe74c338c8bbf73bcd6c507"><h4 class="sectiontitle">Procedure</h4><p class="tableheading" id="ALM-14000__en-us_topic_0070543637_p17178203"><strong id="ALM-14000__b58035481162055">Check the ZooKeeper service status.</strong></p>
<ol id="ALM-14000__ol64416999162123"><li id="ALM-14000__li8332755162114"><span>On the FusionInsight Manager portal, choose <strong id="ALM-14000__b7257171810181">O&M > Alarm > Alarms</strong>. On the Alarms page, check whether <strong id="ALM-14000__b8782195610196">ALM-13000 ZooKeeper Service Unavailable</strong> is reported.</span><p><ul class="subitemlist" id="ALM-14000__ul53121644162114"><li id="ALM-14000__li39564916162114">If yes, go to <a href="#ALM-14000__li31253013162114">2</a>.</li><li id="ALM-14000__li50641647162114">If no, go to <a href="#ALM-14000__li57039289162114">4</a>.</li></ul>
</p></li><li id="ALM-14000__li31253013162114"><a name="ALM-14000__li31253013162114"></a><a name="li31253013162114"></a><span>Rectify the ZooKeeper fault by following the handling procedure of <strong id="ALM-14000__b336715715110">ALM-13000 ZooKeeper Service Unavailable</strong>, and check whether the <strong id="ALM-14000__b1267814311478">Running</strong><strong id="ALM-14000__b2679831275"> Status</strong> of the ZooKeeper service is restored to <strong id="ALM-14000__b3864518162114">Normal</strong>.</span><p><ul class="subitemlist" id="ALM-14000__ul25842178162114"><li id="ALM-14000__li44590579162114">If yes, go to <a href="#ALM-14000__li19570784162114">3</a>.</li><li id="ALM-14000__li55067136162114">If no, go to <a href="#ALM-14000__li44697640162114">7</a>.</li></ul>
</p></li><li id="ALM-14000__li19570784162114"><a name="ALM-14000__li19570784162114"></a><a name="li19570784162114"></a><span>On the <strong id="ALM-14000__b12841667162114">O&M > Alarm > Alarms</strong> page, check whether the alarm is cleared.</span><p><ul class="subitemlist" id="ALM-14000__ul32553289162114"><li id="ALM-14000__li48466139162114">If yes, no further action is required.</li><li id="ALM-14000__li33443193162114">If no, go to <a href="#ALM-14000__li57039289162114">4</a>.</li></ul>
</p></li></ol>
<p class="tableheading" id="ALM-14000__p24544152162114"><strong id="ALM-14000__b10870298162128">Handle the NameService service exception alarm.</strong></p>
<ol start="4" id="ALM-14000__ol20945793162139"><li id="ALM-14000__li57039289162114"><a name="ALM-14000__li57039289162114"></a><a name="li57039289162114"></a><span>On the FusionInsight Manager portal, choose <strong id="ALM-14000__b1737194719541">O&M > Alarm</strong> <strong id="ALM-14000__b31216961315">> Alarms</strong>. On the Alarms page, check whether <strong id="ALM-14000__b1240458115410">ALM-14010 NameService Service Unavailable</strong> is reported.</span><p><ul class="subitemlist" id="ALM-14000__ul51076941162114"><li id="ALM-14000__li40022392162114">If yes, go to <a href="#ALM-14000__li25596313162114">5</a>.</li><li id="ALM-14000__li20588296162114">If no, go to <a href="#ALM-14000__li44697640162114">7</a>.</li></ul>
</p></li><li id="ALM-14000__li25596313162114"><a name="ALM-14000__li25596313162114"></a><a name="li25596313162114"></a><span>Handle the abnormal NameService services by following the handling procedure of <strong id="ALM-14000__b243075717520">ALM-14010 NameService Service Unavailable</strong>, and check whether the ALM-14010 alarms of all NameService services are cleared.</span><p><ul class="subitemlist" id="ALM-14000__ul10300575162114"><li id="ALM-14000__li41255242162114">If yes, go to <a href="#ALM-14000__li7149629162114">6</a>.</li><li id="ALM-14000__li53340291162114">If no, go to <a href="#ALM-14000__li44697640162114">7</a>.</li></ul>
</p></li><li id="ALM-14000__li7149629162114"><a name="ALM-14000__li7149629162114"></a><a name="li7149629162114"></a><span>On the <strong id="ALM-14000__b29040226162114"><strong id="ALM-14000__b1940717581898">O&M > </strong>Alarm > Alarms<strong id="ALM-14000__b31973538119"> </strong></strong>page, check whether the alarm is cleared.</span><p><ul class="subitemlist" id="ALM-14000__ul10858825162114"><li id="ALM-14000__li60035443162114">If yes, no further action is required.</li><li id="ALM-14000__li31032697162114">If no, go to <a href="#ALM-14000__li44697640162114">7</a>.</li></ul>
</p></li></ol>
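<p>The branching in steps 1 to 6 can be summarized as a transition table. The following Python sketch encodes each "If yes" and "If no" branch exactly as written in those steps; the NEXT_STEP table, the walk() helper, and the "done" marker are hypothetical names used only for illustration. Reaching step 7 means the procedure continues with fault information collection.</p>

```python
from typing import Dict, List, Tuple, Union

Step = Union[int, str]
DONE = "done"  # stands for "No further action is required."

# (current step, answer to the step's check) -> next step, as in steps 1 to 6.
NEXT_STEP: Dict[Tuple[int, bool], Step] = {
    (1, True): 2, (1, False): 4,     # ALM-13000 reported?
    (2, True): 3, (2, False): 7,     # ZooKeeper Running Status back to Normal?
    (3, True): DONE, (3, False): 4,  # ALM-14000 cleared?
    (4, True): 5, (4, False): 7,     # ALM-14010 reported?
    (5, True): 6, (5, False): 7,     # ALM-14010 alarms of all NameServices cleared?
    (6, True): DONE, (6, False): 7,  # ALM-14000 cleared?
}


def walk(answers: Dict[int, bool]) -> List[Step]:
    """Follow the procedure from step 1 and return every step visited.

    answers maps a step number to the result of that step's check
    (True for "If yes", False for "If no").
    """
    path: List[Step] = [1]
    step: Step = 1
    while step not in (7, DONE):  # stop at log collection or at "done"
        step = NEXT_STEP[(step, answers[step])]
        path.append(step)
    return path


# ZooKeeper is healthy, ALM-14010 is handled, and ALM-14000 then clears.
print(walk({1: False, 4: True, 5: True, 6: True}))  # [1, 4, 5, 6, 'done']
```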
<p class="tableheading" id="ALM-14000__p30620565162114"><strong id="ALM-14000__b20614373162144">Collect fault information.</strong></p>
<ol start="7" id="ALM-14000__ol64763966162146"><li id="ALM-14000__li44697640162114"><a name="ALM-14000__li44697640162114"></a><a name="li44697640162114"></a><span>On the FusionInsight Manager portal, choose <strong id="ALM-14000__b39977366113627">O&M</strong> > <strong id="ALM-14000__b24251979113627">Log > Download</strong>.</span></li><li id="ALM-14000__li37045000162114"><span>From <strong id="ALM-14000__b66734448162114">Service</strong>, select the following services of the required cluster:</span><p><ul class="subitemlist" id="ALM-14000__ul26485732162114"><li id="ALM-14000__li36781182162114">ZooKeeper</li><li id="ALM-14000__li62595182162114">HDFS</li></ul>
</p></li><li id="ALM-14000__li1145664103113"><span>Click <span><img id="ALM-14000__image1945644173117" src="en-us_image_0269383957.png"></span> in the upper right corner, and set <strong id="ALM-14000__b6456941173117">Start Date</strong> and <strong id="ALM-14000__b11456154113318">End Date</strong> for log collection to 10 minutes before and 10 minutes after the alarm generation time, respectively. Then, click <strong id="ALM-14000__b13456164113319">Download</strong>.</span></li><li id="ALM-14000__li41484560162114"><span>Contact <span id="ALM-14000__text4614151421417">O&M personnel</span> and send them the collected logs.</span></li></ol>
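<p>The log collection window in step 9 is a fixed offset of 10 minutes on each side of the alarm generation time, 20 minutes in total. The following Python sketch computes the Start Date and End Date; the log_window() helper and the alarm time are hypothetical. The example alarm time is chosen close to midnight to show that the End Date can fall on the next day.</p>

```python
from datetime import datetime, timedelta
from typing import Tuple

# Margin from step 9: 10 minutes before and after the alarm generation time.
MARGIN = timedelta(minutes=10)


def log_window(alarm_time: datetime) -> Tuple[datetime, datetime]:
    """Return (Start Date, End Date) for log collection around alarm_time."""
    return alarm_time - MARGIN, alarm_time + MARGIN


# Hypothetical alarm generation time.
start, end = log_window(datetime(2024, 5, 1, 23, 55))
print(start, end)  # 2024-05-01 23:45:00 2024-05-02 00:05:00
```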
</div>
<div class="section" id="ALM-14000__section1529716184534"><h4 class="sectiontitle">Alarm Clearing</h4><p id="ALM-14000__p4677152685316">After the fault is rectified, the system automatically clears this alarm.</p>
</div>
<div class="section" id="ALM-14000__s6fc058b1eed946569b4af3ffc440a3bb"><h4 class="sectiontitle">Related Information</h4><p id="ALM-14000__en-us_topic_0070543637_p61327959">None</p>
</div>
</div>
<div>
<div class="familylinks">
<div class="parentlink"><strong>Parent topic:</strong> <a href="mrs_01_1298.html">Alarm Reference (Applicable to MRS 3.x)</a></div>
</div>
</div>