doc-exports/docs/mrs/umn/ALM-18021.html
Yang, Tong 3b1f73dece MRS UMN 2.0.38.SP20 version
Reviewed-by: Hasko, Vladimir <vladimir.hasko@t-systems.com>
Co-authored-by: Yang, Tong <yangtong2@huawei.com>
Co-committed-by: Yang, Tong <yangtong2@huawei.com>
2022-12-13 12:03:34 +00:00

<a name="ALM-18021"></a><a name="ALM-18021"></a>
<h1 class="topictitle1">ALM-18021 MapReduce Service Unavailable</h1>
<div id="body1505269773422"><div class="section" id="ALM-18021__section6317837154848"><h4 class="sectiontitle">Description</h4><p id="ALM-18021__p41982809154848">The alarm module checks the MapReduce service status every 60 seconds. This alarm is generated when the system detects that the MapReduce service is unavailable.</p>
<p id="ALM-18021__p42300969154848">The alarm is cleared when the MapReduce service recovers.</p>
</div>
<div class="section" id="ALM-18021__section45164402154848"><h4 class="sectiontitle">Attribute</h4>
<div class="tablenoborder"><table cellpadding="4" cellspacing="0" summary="" id="ALM-18021__table34437951154848" frame="border" border="1" rules="all"><thead align="left"><tr id="ALM-18021__row11436914154848"><th align="left" class="cellrowborder" valign="top" width="33.33333333333333%" id="mcps1.3.2.2.1.4.1.1"><p id="ALM-18021__p53974817154848">Alarm ID</p>
</th>
<th align="left" class="cellrowborder" valign="top" width="33.33333333333333%" id="mcps1.3.2.2.1.4.1.2"><p id="ALM-18021__p9884045154848">Alarm Severity</p>
</th>
<th align="left" class="cellrowborder" valign="top" width="33.33333333333333%" id="mcps1.3.2.2.1.4.1.3"><p id="ALM-18021__p62410141154848">Automatically Cleared</p>
</th>
</tr>
</thead>
<tbody><tr id="ALM-18021__row22056633154848"><td class="cellrowborder" valign="top" width="33.33333333333333%" headers="mcps1.3.2.2.1.4.1.1 "><p id="ALM-18021__p41756874154848">18021</p>
</td>
<td class="cellrowborder" valign="top" width="33.33333333333333%" headers="mcps1.3.2.2.1.4.1.2 "><p id="ALM-18021__p26863672154848">Critical</p>
</td>
<td class="cellrowborder" valign="top" width="33.33333333333333%" headers="mcps1.3.2.2.1.4.1.3 "><p id="ALM-18021__p28473834154848">Yes</p>
</td>
</tr>
</tbody>
</table>
</div>
</div>
<div class="section" id="ALM-18021__section24679241154848"><h4 class="sectiontitle">Parameters</h4>
<div class="tablenoborder"><table cellpadding="4" cellspacing="0" summary="" id="ALM-18021__table52861500154848" frame="border" border="1" rules="all"><thead align="left"><tr id="ALM-18021__row51266307154848"><th align="left" class="cellrowborder" valign="top" width="50%" id="mcps1.3.3.2.1.3.1.1"><p id="ALM-18021__p58930217154848">Name</p>
</th>
<th align="left" class="cellrowborder" valign="top" width="50%" id="mcps1.3.3.2.1.3.1.2"><p id="ALM-18021__p8618279154848">Meaning</p>
</th>
</tr>
</thead>
<tbody><tr id="ALM-18021__row6791175121110"><td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.3.2.1.3.1.1 "><p id="ALM-18021__p192431315431">Source</p>
</td>
<td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.3.2.1.3.1.2 "><p id="ALM-18021__p692551319435">Specifies the cluster for which the alarm is generated.</p>
</td>
</tr>
<tr id="ALM-18021__row26992038154848"><td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.3.2.1.3.1.1 "><p id="ALM-18021__p38871502154848">ServiceName</p>
</td>
<td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.3.2.1.3.1.2 "><p id="ALM-18021__p61583942154848">Specifies the service for which the alarm is generated.</p>
</td>
</tr>
<tr id="ALM-18021__row17384567154848"><td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.3.2.1.3.1.1 "><p id="ALM-18021__p65972715154848">RoleName</p>
</td>
<td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.3.2.1.3.1.2 "><p id="ALM-18021__p42189699154848">Specifies the role for which the alarm is generated.</p>
</td>
</tr>
<tr id="ALM-18021__row44162976154848"><td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.3.2.1.3.1.1 "><p id="ALM-18021__p20431338154848">HostName</p>
</td>
<td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.3.2.1.3.1.2 "><p id="ALM-18021__p44325709154848">Specifies the host for which the alarm is generated.</p>
</td>
</tr>
</tbody>
</table>
</div>
</div>
<div class="section" id="ALM-18021__section33612656154848"><h4 class="sectiontitle">Impact on the System</h4><p id="ALM-18021__p38270587154848">The cluster cannot provide the MapReduce service. For example, task logs cannot be viewed through MapReduce and the log archive function is unavailable.</p>
</div>
<div class="section" id="ALM-18021__section49170703155012"><h4 class="sectiontitle">Possible Causes</h4><ul id="ALM-18021__ul45274515155036"><li id="ALM-18021__li4817457155036">The JobHistoryServer instance is abnormal.</li><li id="ALM-18021__li3259877995713">The KrbServer service is abnormal.</li><li id="ALM-18021__li2495356195713">The ZooKeeper service is abnormal.</li><li id="ALM-18021__li2325546295713">The HDFS service is abnormal.</li><li id="ALM-18021__li39763301155138">The Yarn service is abnormal.</li></ul>
</div>
<div class="section" id="ALM-18021__sc0f25eebb8744e519b7fb96a61670ee6"><h4 class="sectiontitle">Procedure</h4><p id="ALM-18021__p35798424155927"><strong id="ALM-18021__b59551096155927">Check </strong><strong id="ALM-18021__b61838427155939">the JobHistoryServer instance status of the MapReduce service.</strong></p>
<ol id="ALM-18021__ol3997015174744"><li id="ALM-18021__li45230096174725"><span>On the FusionInsight Manager home page, choose <strong id="ALM-18021__b195002379523">Cluster</strong> &gt; <em id="ALM-18021__i1021725095817">Name of the desired cluster</em> &gt; <strong id="ALM-18021__b5400385716217">Services</strong> &gt; <strong id="ALM-18021__b1627267316217">MapReduce</strong> &gt; <strong id="ALM-18021__b1223633616217">Instance</strong>.</span></li><li id="ALM-18021__li399988916235"><span>Check whether the running status of JobHistoryServer is <strong id="ALM-18021__b18265102275910">Normal</strong>.</span><p><ul id="ALM-18021__ul5763173016244"><li id="ALM-18021__li4892352216244">If yes, go to <a href="#ALM-18021__li795116716116">11</a>.</li><li id="ALM-18021__li3765851516244">If no, go to <a href="#ALM-18021__li2896399895811">3</a>.</li></ul>
</p></li></ol>
<p id="ALM-18021__p1071823395811"><strong id="ALM-18021__b2935523795811">Check the KrbServer service status.</strong></p>
<ol start="3" id="ALM-18021__ol5576598210629"><li id="ALM-18021__li2896399895811"><a name="ALM-18021__li2896399895811"></a><a name="li2896399895811"></a><span>In the alarm list on FusionInsight Manager, check whether <strong id="ALM-18021__b117062033162513">ALM-25500 KrbServer Service Unavailable</strong> exists.</span><p><ul id="ALM-18021__ul4257191995811"><li id="ALM-18021__li4760295595811">If yes, go to <a href="#ALM-18021__li6544511395811">4</a>.</li><li id="ALM-18021__li3063412695811">If no, go to <a href="#ALM-18021__li4793762895811">5</a>.</li></ul>
</p></li><li id="ALM-18021__li6544511395811"><a name="ALM-18021__li6544511395811"></a><a name="li6544511395811"></a><span>Rectify the fault by following the steps provided in <strong id="ALM-18021__b12966124511251">ALM-25500 KrbServer Service Unavailable</strong>, and check whether the alarm is cleared.</span><p><ul id="ALM-18021__ul6656280595811"><li id="ALM-18021__li6219433795811">If yes, no further action is required.</li><li id="ALM-18021__li2287812195811">If no, go to <a href="#ALM-18021__li4793762895811">5</a>.</li></ul>
</p></li></ol>
<p id="ALM-18021__p2181874510631"><strong id="ALM-18021__b6215097810631">Check the ZooKeeper service status.</strong></p>
<ol start="5" id="ALM-18021__ol6378358710648"><li id="ALM-18021__li4793762895811"><a name="ALM-18021__li4793762895811"></a><a name="li4793762895811"></a><span>In the alarm list on FusionInsight Manager, check whether <strong id="ALM-18021__b14445105111256">ALM-13000 ZooKeeper Service Unavailable</strong> exists.</span><p><ul id="ALM-18021__ul5774263495811"><li id="ALM-18021__li4992165895811">If yes, go to <a href="#ALM-18021__li4474654695811">6</a>.</li><li id="ALM-18021__li1712251695811">If no, go to <a href="#ALM-18021__li247717695811">7</a>.</li></ul>
</p></li><li id="ALM-18021__li4474654695811"><a name="ALM-18021__li4474654695811"></a><a name="li4474654695811"></a><span>Rectify the fault by following the steps provided in <strong id="ALM-18021__b1857199152613">ALM-13000 ZooKeeper Service Unavailable</strong>, and check whether the alarm is cleared.</span><p><ul id="ALM-18021__ul59161395811"><li id="ALM-18021__li532451895811">If yes, no further action is required.</li><li id="ALM-18021__li4792066895811">If no, go to <a href="#ALM-18021__li247717695811">7</a>.</li></ul>
</p></li></ol>
<p id="ALM-18021__p5060624610650"><strong id="ALM-18021__b5280303110650">Check the HDFS service status.</strong></p>
<ol start="7" id="ALM-18021__ol1234104710656"><li id="ALM-18021__li247717695811"><a name="ALM-18021__li247717695811"></a><a name="li247717695811"></a><span>In the alarm list on FusionInsight Manager, check whether <strong id="ALM-18021__b592593502611">ALM-14000 HDFS Service Unavailable</strong> exists.</span><p><ul id="ALM-18021__ul6643354095811"><li id="ALM-18021__li6103095595811">If yes, go to <a href="#ALM-18021__li5261529695811">8</a>.</li><li id="ALM-18021__li4456030995811">If no, go to <a href="#ALM-18021__li19148237174725">9</a>.</li></ul>
</p></li><li id="ALM-18021__li5261529695811"><a name="ALM-18021__li5261529695811"></a><a name="li5261529695811"></a><span>Rectify the fault by following the steps provided in <strong id="ALM-18021__b162721650122612">ALM-14000 HDFS Service Unavailable</strong>, and check whether the alarm is cleared.</span><p><ul id="ALM-18021__ul534864411613"><li id="ALM-18021__li7348144418613">If yes, no further action is required.</li><li id="ALM-18021__li334813441864">If no, go to <a href="#ALM-18021__li19148237174725">9</a>.</li></ul>
</p></li></ol>
<p class="tableheading" id="ALM-18021__p7491116174725"><strong id="ALM-18021__b6698763616359">Check the Yarn service status.</strong></p>
<ol start="9" id="ALM-18021__ol28880248174816"><li id="ALM-18021__li19148237174725"><a name="ALM-18021__li19148237174725"></a><a name="li19148237174725"></a><span>In the alarm list on FusionInsight Manager, check whether <strong id="ALM-18021__b1392285652619">ALM-18000 Yarn Service Unavailable</strong> exists.</span><p><ul id="ALM-18021__ul2036310116510"><li id="ALM-18021__li4905018516510">If yes, go to <a href="#ALM-18021__li13219687174725">10</a>.</li><li id="ALM-18021__li600337816547">If no, go to <a href="#ALM-18021__li795116716116">11</a>.</li></ul>
</p></li><li id="ALM-18021__li13219687174725"><a name="ALM-18021__li13219687174725"></a><a name="li13219687174725"></a><span>Rectify the fault by following the steps provided in<strong id="ALM-18021__b3996303110324"> </strong><strong id="ALM-18021__b3390137202717">ALM-18000 Yarn Service Unavailable</strong>, and check whether the alarm is cleared.</span><p><ul class="subitemlist" id="ALM-18021__ul34131890174725"><li id="ALM-18021__li7503360174725">If yes, no further action is required.</li><li id="ALM-18021__li3792432174725">If no, go to <a href="#ALM-18021__li795116716116">11</a>.</li></ul>
</p></li></ol>
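<p>The checks in steps 1 to 10 can be summarized as a small decision function. The following Python sketch is illustrative only: the helper name next_step and its two inputs are hypothetical and are not part of FusionInsight Manager. It simply encodes the order in which the steps above check the related alarms.</p>

```python
# Alarms for the services MapReduce depends on, in the order the procedure checks them.
DEPENDENCY_ALARMS = [
    "ALM-25500 KrbServer Service Unavailable",
    "ALM-13000 ZooKeeper Service Unavailable",
    "ALM-14000 HDFS Service Unavailable",
    "ALM-18000 Yarn Service Unavailable",
]


def next_step(job_history_server_normal, active_alarms):
    """Return the next action suggested by the procedure (hypothetical helper).

    job_history_server_normal: True if the JobHistoryServer running status is Normal.
    active_alarms: collection of alarm names currently shown in the alarm list.
    """
    # Step 2: a Normal JobHistoryServer sends the operator straight to step 11.
    if job_history_server_normal:
        return "Collect fault information"
    # Steps 3 to 10: handle the first dependency alarm that is present.
    for alarm in DEPENDENCY_ALARMS:
        if alarm in active_alarms:
            return "Rectify " + alarm
    # No dependency alarm is present: continue with step 11.
    return "Collect fault information"
```

<p>For example, if JobHistoryServer is not Normal and both the HDFS and Yarn alarms are present, the sketch returns the HDFS alarm first, which matches the order of the steps above.</p>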
<p class="tableheading" id="ALM-18021__p25148622174725"><strong id="ALM-18021__b51806081175029">Collect fault information.</strong></p>
<ol start="11" id="ALM-18021__ol40182044175016"><li id="ALM-18021__li795116716116"><a name="ALM-18021__li795116716116"></a><a name="li795116716116"></a><span>On the FusionInsight Manager home page of the active cluster, choose <strong id="ALM-18021__b1080535644315">O&amp;M</strong> &gt; <strong id="ALM-18021__b48621597161147">Log &gt; Download</strong>.</span></li><li id="ALM-18021__li29307826161122"><span>Select<strong id="ALM-18021__b17128236161122"> MapReduce</strong> in the required cluster from <strong id="ALM-18021__b19936400161122">Service</strong><strong id="ALM-18021__b58853657161145">.</strong></span></li><li id="ALM-18021__li1145664103113"><span>Click <span><img id="ALM-18021__image1945644173117" src="en-us_image_0269417409.png"></span> in the upper right corner, and set <strong id="ALM-18021__b6456941173117">Start Date</strong> and <strong id="ALM-18021__b11456154113318">End Date</strong> for log collection to 10 minutes before and 10 minutes after the alarm generation time, respectively. Then, click <strong id="ALM-18021__b13456164113319">Download</strong>.</span></li><li id="ALM-18021__li56082450174725"><span>Contact the <span id="ALM-18021__text4614151421417">O&amp;M personnel</span> and send the collected logs.</span></li></ol>
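<p>The start and end times for log collection can be worked out from the alarm generation time. The Python sketch below is a minimal illustration: the function name log_collection_window and the sample timestamp are assumptions made for this example, not values taken from FusionInsight Manager.</p>

```python
from datetime import datetime, timedelta


def log_collection_window(alarm_time, margin_minutes=10):
    """Return (start, end) covering margin_minutes before and after alarm_time."""
    margin = timedelta(minutes=margin_minutes)
    return alarm_time - margin, alarm_time + margin


# Example with a made-up alarm generation time.
start, end = log_collection_window(datetime(2022, 12, 13, 12, 3, 34))
print(start.isoformat(sep=" "))  # 2022-12-13 11:53:34
print(end.isoformat(sep=" "))    # 2022-12-13 12:13:34
```

<p>The two printed values correspond to what would be entered as Start Date and End Date for that sample alarm time.</p>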
</div>
<div class="section" id="ALM-18021__section1529716184534"><h4 class="sectiontitle">Alarm Clearing</h4><p id="ALM-18021__p4677152685316">After the fault is rectified, the system automatically clears this alarm.</p>
</div>
<div class="section" id="ALM-18021__sb7f9c95269284e9eac231cf07c9638a5"><h4 class="sectiontitle">Related Information</h4><p id="ALM-18021__en-us_topic_0070543681_p31573738">None</p>
</div>
</div>
<div>
<div class="familylinks">
<div class="parentlink"><strong>Parent topic:</strong> <a href="mrs_01_1298.html">Alarm Reference (Applicable to MRS 3.x)</a></div>
</div>
</div>