Reviewed-by: Pruthi, Vineet <vineet.pruthi@t-systems.com> Co-authored-by: Yang, Tong <yangtong2@huawei.com> Co-committed-by: Yang, Tong <yangtong2@huawei.com>
<a name="ALM-14034"></a>
<h1 class="topictitle1">ALM-14034 Router Process Is Abnormal</h1>
<div id="body0000002008256525"><div class="section" id="ALM-14034__section979815471118"><h4 class="sectiontitle"><span id="ALM-14034__text1079812471120">Alarm Description</span></h4><p id="ALM-14034__p8353691349">The status of the Router process is checked every 20 seconds. This alarm is generated when the process status is abnormal and does not recover for a long time.</p>
<p id="ALM-14034__p197982471413">This alarm is cleared when the process status recovers.</p>
</div>
<div class="section" id="ALM-14034__section18798204714110"><h4 class="sectiontitle"><span id="ALM-14034__text2798164712118">Alarm Attributes</span></h4>
<div class="tablenoborder"><table cellpadding="4" cellspacing="0" summary="" id="ALM-14034__table87986471415" frame="border" border="1" rules="all"><thead align="left"><tr id="ALM-14034__row167981047613"><th align="left" class="cellrowborder" valign="top" width="33.33333333333333%" id="mcps1.3.2.2.1.4.1.1"><p id="ALM-14034__p12798647315"><span id="ALM-14034__text10798547517">Alarm ID</span></p>
</th>
<th align="left" class="cellrowborder" valign="top" width="33.33333333333333%" id="mcps1.3.2.2.1.4.1.2"><p id="ALM-14034__p16798124719115"><span id="ALM-14034__text157981347317">Alarm Severity</span></p>
</th>
<th align="left" class="cellrowborder" valign="top" width="33.33333333333333%" id="mcps1.3.2.2.1.4.1.3"><p id="ALM-14034__p17992471410"><span id="ALM-14034__text15799194720117">Auto Cleared</span></p>
</th>
</tr>
</thead>
<tbody><tr id="ALM-14034__row67994478118"><td class="cellrowborder" valign="top" width="33.33333333333333%" headers="mcps1.3.2.2.1.4.1.1 "><p id="ALM-14034__p18799747419">14034</p>
</td>
<td class="cellrowborder" valign="top" width="33.33333333333333%" headers="mcps1.3.2.2.1.4.1.2 "><p id="ALM-14034__p279974710111">Major</p>
</td>
<td class="cellrowborder" valign="top" width="33.33333333333333%" headers="mcps1.3.2.2.1.4.1.3 "><p id="ALM-14034__p107994471713">Yes</p>
</td>
</tr>
</tbody>
</table>
</div>
</div>
<div class="section" id="ALM-14034__section19799184712110"><h4 class="sectiontitle"><span id="ALM-14034__text27993470117">Alarm Parameters</span></h4>
<div class="tablenoborder"><table cellpadding="4" cellspacing="0" summary="" id="ALM-14034__table3799204720116" frame="border" border="1" rules="all"><thead align="left"><tr id="ALM-14034__row1879915471215"><th align="left" class="cellrowborder" valign="top" width="50%" id="mcps1.3.3.2.1.3.1.1"><p id="ALM-14034__p177993479118"><span id="ALM-14034__text207998471417">Parameter</span></p>
</th>
<th align="left" class="cellrowborder" valign="top" width="50%" id="mcps1.3.3.2.1.3.1.2"><p id="ALM-14034__p579954720114"><span id="ALM-14034__text127995473116">Description</span></p>
</th>
</tr>
</thead>
<tbody><tr id="ALM-14034__row1179918471011"><td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.3.2.1.3.1.1 "><p id="ALM-14034__p859219498522">Source</p>
</td>
<td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.3.2.1.3.1.2 "><p id="ALM-14034__p2059134995215">Specifies the cluster for which the alarm is generated.</p>
</td>
</tr>
<tr id="ALM-14034__row1279964711115"><td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.3.2.1.3.1.1 "><p id="ALM-14034__p1059010490521">ServiceName</p>
</td>
<td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.3.2.1.3.1.2 "><p id="ALM-14034__p35886492524">Specifies the service for which the alarm is generated.</p>
</td>
</tr>
<tr id="ALM-14034__row079994716117"><td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.3.2.1.3.1.1 "><p id="ALM-14034__p12587144965212">RoleName</p>
</td>
<td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.3.2.1.3.1.2 "><p id="ALM-14034__p145851849195219">Specifies the role for which the alarm is generated.</p>
</td>
</tr>
<tr id="ALM-14034__row16592124952318"><td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.3.2.1.3.1.1 "><p id="ALM-14034__p51620924">HostName</p>
</td>
<td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.3.2.1.3.1.2 "><p id="ALM-14034__p34048007">Specifies the host for which the alarm is generated.</p>
</td>
</tr>
</tbody>
</table>
</div>
</div>
<div class="section" id="ALM-14034__section0799144716115"><h4 class="sectiontitle"><span id="ALM-14034__text479911470117">Impact on the System</span></h4><p id="ALM-14034__p8799247918">If the process status is abnormal, the process cannot provide services properly. As a result, the entire service may become abnormal.</p>
</div>
<div class="section" id="ALM-14034__section1479910471912"><h4 class="sectiontitle"><span id="ALM-14034__text187997470114">Possible Causes</span></h4><p id="ALM-14034__p1626235122417">The host responds slowly to I/O requests (disk I/O or network I/O), and some processes are in the D (uninterruptible sleep) or Z (zombie) state. The process may also be suspended and enter the T (stopped) state.</p>
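<p>As a rough aid for confirming this cause on a host, the sketch below filters <strong>ps</strong> output for processes whose state code begins with D, Z, or T. The function name <strong>filter_abnormal_states</strong> is hypothetical and is not part of FusionInsight Manager; the sketch only assumes the standard <strong>pid</strong>, <strong>stat</strong>, and <strong>comm</strong> output columns of <strong>ps</strong>.</p>

```shell
# Hypothetical helper: reads "PID STAT COMMAND" lines on stdin and keeps
# only the lines whose state code starts with D, Z, or T.
# The ps header line ("PID STAT COMMAND") never matches, so it is dropped.
filter_abnormal_states() {
    awk '$2 ~ /^[DZT]/ { print }'
}

# Example usage on a live host (lists every process in an abnormal state):
#   ps -eo pid,stat,comm | filter_abnormal_states
```

<p>An empty result suggests that no process on the host is currently in one of these states. The check is a point-in-time snapshot, so a process that enters the D state only briefly may not appear.</p>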
</div>
<div class="section" id="ALM-14034__section179924719116"><h4 class="sectiontitle"><span id="ALM-14034__text1799947611">Handling Procedure</span></h4><p id="ALM-14034__p1243515278455"><strong id="ALM-14034__b34831828145411">Check whether the process is in the D, Z, or T state.</strong></p>
<ol id="ALM-14034__ol67999471216"><li id="ALM-14034__li1980611196816"><span>Log in to FusionInsight Manager and choose <strong id="ALM-14034__b1937751732912">O&amp;M</strong> &gt; <strong id="ALM-14034__b1837741713297">Alarm</strong> &gt; <strong id="ALM-14034__b6377121732913">Alarms</strong>. Wait for about 10 minutes and check whether the alarm is automatically cleared.</span><p><ul id="ALM-14034__ul10505203319910"><li id="ALM-14034__li5505533895">If the alarm is not in the list, no further action is required.</li><li id="ALM-14034__li350517336917">If the alarm is in the list, view the alarm details and record the IP address of the host where the alarm is generated. Then go to <a href="#ALM-14034__li16811215432">2</a>.</li></ul>
</p></li><li id="ALM-14034__li16811215432"><a name="ALM-14034__li16811215432"></a><a name="li16811215432"></a><span>Log in to the host where the alarm is generated as the <strong id="ALM-14034__b156544532910">root</strong> user and run the <strong id="ALM-14034__b175661745142920">su - omm</strong> command to switch to the <strong id="ALM-14034__b456614458299">omm</strong> user.</span></li><li id="ALM-14034__li129386734811"><span>Run the following command to check whether the process state is abnormal:</span><p><p id="ALM-14034__p114995439534"><strong id="ALM-14034__b105101533205318">ps ww -eo stat,cmd| grep -w org.apache.hadoop.hdfs.server.federation.router.DFSRouter | grep -v grep | awk '{print$1}'</strong></p>
</p></li><li id="ALM-14034__li3621133502116"><span>Check whether the command output contains any abnormal state (D, Z, or T).</span><p><ul id="ALM-14034__ul161804819579"><li id="ALM-14034__li670603211">If the output contains any abnormal state, go to <a href="#ALM-14034__li39471558560">5</a>.</li><li id="ALM-14034__li47070182111">If the output does not contain abnormal states, go to <a href="#ALM-14034__li17799174711116">7</a>.</li></ul>
</p></li><li id="ALM-14034__li39471558560"><a name="ALM-14034__li39471558560"></a><a name="li39471558560"></a><span>Switch to user <strong id="ALM-14034__b449481414300">root</strong> and run the <strong id="ALM-14034__b149421411305">reboot</strong> command to restart the host for which the alarm is generated. (Restarting a host is risky. Ensure that the service process is normal after the restart.)</span></li><li id="ALM-14034__li7936132616563"><span>Wait 5 minutes and check whether the alarm is cleared.</span><p><ul id="ALM-14034__ul19652752195618"><li id="ALM-14034__li1365317526566">If the alarm is cleared, no further action is required.</li><li id="ALM-14034__li2065375285614">If the alarm fails to be cleared, go to <a href="#ALM-14034__li17799174711116">7</a>.</li></ul>
</p></li></ol>
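<p>The decision in steps 3 and 4 can be sketched as a small shell function. This is only an illustration: the name <strong>check_router_state</strong> is hypothetical, and the function simply maps the state codes printed by the step 3 command to the branch described in step 4.</p>

```shell
# Hypothetical helper: reads one process state code per line on stdin
# (the output of the ps pipeline in step 3) and prints which branch of
# step 4 applies. Empty input (no matching process) is treated the same
# as "no abnormal state", which leads to log collection in step 7.
check_router_state() {
    if grep -q '^[DZT]'; then
        echo "abnormal: go to step 5"
    else
        echo "normal: go to step 7"
    fi
}

# Example usage as user omm on the alarmed host:
#   ps ww -eo stat,cmd | grep -w org.apache.hadoop.hdfs.server.federation.router.DFSRouter \
#     | grep -v grep | awk '{print $1}' | check_router_state
```

<p>The pattern matches on the first character only, so state codes with modifiers such as <strong>Dl</strong> or <strong>T+</strong> are also reported as abnormal.</p>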
<p class="tableheading" id="ALM-14034__p2079910471716"><strong id="ALM-14034__b958603455414">Collect fault information.</strong></p>
<ol start="7" id="ALM-14034__ol37994471410"><li id="ALM-14034__li17799174711116"><a name="ALM-14034__li17799174711116"></a><a name="li17799174711116"></a><span>On FusionInsight Manager, choose <strong id="ALM-14034__b1261064693015">O&amp;M</strong>. In the navigation pane on the left, choose <strong id="ALM-14034__b186101846163011">Log</strong> &gt; <strong id="ALM-14034__b761174619309">Download</strong>.</span></li><li id="ALM-14034__li177999474110"><span>Expand the drop-down list next to the <strong id="ALM-14034__b3138154923017">Service</strong> field. In the <strong id="ALM-14034__b1513820496307">Services</strong> dialog box that is displayed, select <strong id="ALM-14034__b313984916304">HDFS</strong> for the target cluster.</span></li><li id="ALM-14034__li5799147219"><span>Click the edit icon in the upper right corner, and set <strong id="ALM-14034__b14685025112516">Start Date</strong> to 10 minutes before the alarm generation time and <strong id="ALM-14034__b96858253253">End Date</strong> to 10 minutes after it. Then, click <strong id="ALM-14034__b5685225202516">Download</strong>.</span></li><li id="ALM-14034__li57991247416"><span>Contact <span id="ALM-14034__text640375883017">O&amp;M personnel</span> and provide the collected logs.</span></li></ol>
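<p>The log collection window in step 9 is simple arithmetic on the alarm generation time. The sketch below is one way to compute it, assuming GNU <strong>date</strong> is available and the alarm time is known as a Unix epoch value in seconds; the function name <strong>log_window</strong> is hypothetical, and the output is in UTC rather than the time zone shown on FusionInsight Manager.</p>

```shell
# Hypothetical helper: prints the start and end of a log window spanning
# 10 minutes before and 10 minutes after the alarm generation time.
# Argument: alarm time as a Unix epoch (seconds).
# Assumes GNU date (the "-d @EPOCH" form); times are printed in UTC.
log_window() {
    alarm_epoch=$1
    margin=600  # 10 minutes, in seconds
    date -u -d "@$((alarm_epoch - margin))" '+Start: %Y-%m-%d %H:%M:%S'
    date -u -d "@$((alarm_epoch + margin))" '+End: %Y-%m-%d %H:%M:%S'
}

# Example usage, converting a wall-clock alarm time to an epoch first:
#   log_window "$(date -u -d '2024-01-01 12:00:00' +%s)"
```

<p>Convert the printed values to the local time zone of the cluster before entering them as <strong>Start Date</strong> and <strong>End Date</strong>.</p>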
</div>
<div class="section" id="ALM-14034__section979934710111"><h4 class="sectiontitle"><span id="ALM-14034__text1379918471115">Alarm Clearance</span></h4><p id="ALM-14034__p27991247919">This alarm is automatically cleared after the fault is rectified.</p>
</div>
<div class="section" id="ALM-14034__section879913471915"><h4 class="sectiontitle"><span id="ALM-14034__text16799164711115">Related Information</span></h4><p id="ALM-14034__p1779913479110"><span id="ALM-14034__text879984715119">None.</span></p>
</div>
</div>
<div>
<div class="familylinks">
<div class="parentlink"><strong>Parent topic:</strong> <a href="mrs_01_1298.html">Alarm Reference (Applicable to MRS 3.x)</a></div>
</div>
</div>