forked from docs/doc-exports
Reviewed-by: Hasko, Vladimir <vladimir.hasko@t-systems.com> Co-authored-by: Yang, Tong <yangtong2@huawei.com> Co-committed-by: Yang, Tong <yangtong2@huawei.com>
<a name="ALM-12089"></a>
<h1 class="topictitle1">ALM-12089 Inter-Node Network Is Abnormal</h1>
<div id="body1553225075085"><div class="section" id="ALM-12089__s4ed51c9f0d9a477fbf50d8ce120581b4"><h4 class="sectiontitle">Description</h4><p id="ALM-12089__p928381012517">The alarm module checks the network health status of nodes in the cluster every 10 seconds. This alarm is generated when the network between two nodes is unreachable or the network status is unstable.</p>
</div>
<div class="section" id="ALM-12089__s513710eed6ad4012b965eb6d83223b70"><h4 class="sectiontitle">Attribute</h4>
<div class="tablenoborder"><table cellpadding="4" cellspacing="0" summary="" id="ALM-12089__en-us_topic_0070543632_table47383040" frame="border" border="1" rules="all"><thead align="left"><tr id="ALM-12089__en-us_topic_0070543632_row31563057"><th align="left" class="cellrowborder" valign="top" width="33.33333333333333%" id="mcps1.3.2.2.1.4.1.1"><p id="ALM-12089__en-us_topic_0070543632_p6470829">Alarm ID</p>
</th>
<th align="left" class="cellrowborder" valign="top" width="33.33333333333333%" id="mcps1.3.2.2.1.4.1.2"><p id="ALM-12089__en-us_topic_0070543632_p54375137">Alarm Severity</p>
</th>
<th align="left" class="cellrowborder" valign="top" width="33.33333333333333%" id="mcps1.3.2.2.1.4.1.3"><p id="ALM-12089__en-us_topic_0070543632_p42310006">Auto Clear</p>
</th>
</tr>
</thead>
<tbody><tr id="ALM-12089__en-us_topic_0070543632_row4558484"><td class="cellrowborder" valign="top" width="33.33333333333333%" headers="mcps1.3.2.2.1.4.1.1 "><p id="ALM-12089__en-us_topic_0070543632_p33692898">12089</p>
</td>
<td class="cellrowborder" valign="top" width="33.33333333333333%" headers="mcps1.3.2.2.1.4.1.2 "><p id="ALM-12089__en-us_topic_0070543632_p44770184">Major</p>
</td>
<td class="cellrowborder" valign="top" width="33.33333333333333%" headers="mcps1.3.2.2.1.4.1.3 "><p id="ALM-12089__en-us_topic_0070543632_p2506287">Yes</p>
</td>
</tr>
</tbody>
</table>
</div>
</div>
<div class="section" id="ALM-12089__en-us_topic_0070543632_section463484"><h4 class="sectiontitle">Parameters</h4>
<div class="tablenoborder"><table cellpadding="4" cellspacing="0" summary="" id="ALM-12089__en-us_topic_0070543632_table1682686" frame="border" border="1" rules="all"><thead align="left"><tr id="ALM-12089__en-us_topic_0070543632_row39854064"><th align="left" class="cellrowborder" valign="top" width="50%" id="mcps1.3.3.2.1.3.1.1"><p id="ALM-12089__en-us_topic_0070543632_p6953712">Name</p>
</th>
<th align="left" class="cellrowborder" valign="top" width="50%" id="mcps1.3.3.2.1.3.1.2"><p id="ALM-12089__en-us_topic_0070543632_p26379772">Meaning</p>
</th>
</tr>
</thead>
<tbody><tr id="ALM-12089__row1855773633815"><td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.3.2.1.3.1.1 "><p id="ALM-12089__p192431315431">Source</p>
</td>
<td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.3.2.1.3.1.2 "><p id="ALM-12089__p692551319435">Specifies the cluster or system for which the alarm is generated.</p>
</td>
</tr>
<tr id="ALM-12089__en-us_topic_0070543632_row56386813"><td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.3.2.1.3.1.1 "><p id="ALM-12089__en-us_topic_0070543632_p3929179">ServiceName</p>
</td>
<td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.3.2.1.3.1.2 "><p id="ALM-12089__en-us_topic_0070543632_p49828093">Specifies the service for which the alarm is generated.</p>
</td>
</tr>
<tr id="ALM-12089__en-us_topic_0070543632_row45799660"><td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.3.2.1.3.1.1 "><p id="ALM-12089__en-us_topic_0070543632_p18784943">RoleName</p>
</td>
<td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.3.2.1.3.1.2 "><p id="ALM-12089__en-us_topic_0070543632_p45185452">Specifies the role for which the alarm is generated.</p>
</td>
</tr>
<tr id="ALM-12089__en-us_topic_0070543632_row4015887"><td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.3.2.1.3.1.1 "><p id="ALM-12089__en-us_topic_0070543632_p56851411">HostName</p>
</td>
<td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.3.2.1.3.1.2 "><p id="ALM-12089__en-us_topic_0070543632_p41561572">Specifies the host for which the alarm is generated.</p>
</td>
</tr>
</tbody>
</table>
</div>
</div>
<div class="section" id="ALM-12089__s0c40eacbbe5d4468af530e88a6f42993"><h4 class="sectiontitle">Impact on the System</h4><p id="ALM-12089__en-us_topic_0070543632_p11044137">Functions of some components, such as HDFS and ZooKeeper, are affected.</p>
</div>
<div class="section" id="ALM-12089__s9736acd82e1f45699a9799949d179cc9"><h4 class="sectiontitle">Possible Causes</h4><ul id="ALM-12089__en-us_topic_0070543632_ul22159882"><li id="ALM-12089__li1399211015453">The node breaks down.</li><li id="ALM-12089__li82581912164517">The network is faulty.</li></ul>
</div>
<div class="section" id="ALM-12089__section15180193720484"><h4 class="sectiontitle">Procedure</h4><p class="tableheading" id="ALM-12089__p2650131165116"><strong id="ALM-12089__b166571125118">Check the network health status.</strong></p>
<ol id="ALM-12089__ol169991538233"><li id="ALM-12089__li89975381230"><span>In the alarm list on FusionInsight Manager, click the drop-down button of the alarm and view <strong id="ALM-12089__b3944155215402">Additional Information</strong>. Record the source IP address and destination IP address of the node for which the alarm is reported.</span></li><li id="ALM-12089__li189988381537"><a name="ALM-12089__li189988381537"></a><a name="li189988381537"></a><span>Log in to the node for which the alarm is reported. On the node, ping the target node to check whether the network between the two nodes is normal.</span><p><ul id="ALM-12089__ul12998183817318"><li id="ALM-12089__li199818381732">If yes, go to <a href="#ALM-12089__li1646022411214">6</a>.</li><li id="ALM-12089__li89983383314">If no, go to <a href="#ALM-12089__li184601124820">3</a>.</li></ul>
</p></li></ol>
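<p>The ping check in step 2 can be scripted. The following is a minimal sketch, assuming a Linux node with the iputils <strong>ping</strong> command (<strong>-c</strong> for the packet count, <strong>-W</strong> for the per-reply timeout in seconds). The function names and the example IP address are hypothetical and are not part of FusionInsight Manager.</p>

```python
import ipaddress
import subprocess


def build_ping_command(dest_ip, count=3, timeout_s=2):
    """Build a Linux (iputils) ping command for the destination node."""
    ipaddress.ip_address(dest_ip)  # raises ValueError on a malformed address
    return ["ping", "-c", str(count), "-W", str(timeout_s), dest_ip]


def is_reachable(dest_ip, runner=subprocess.run):
    """Return True if ping exits with status 0 (at least one reply received)."""
    result = runner(
        build_ping_command(dest_ip),
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
        check=False,
    )
    return result.returncode == 0


def next_step(dest_ip, runner=subprocess.run):
    """Map the ping result to the step number given in the procedure."""
    # Reachable: wait and re-check the alarm (step 6).
    # Unreachable: check the node status (step 3).
    return 6 if is_reachable(dest_ip, runner) else 3
```

<p>Run the script on the node for which the alarm is reported and pass the destination IP address recorded in step 1, for example <strong>next_step("192.168.0.12")</strong> (a placeholder address). The <strong>runner</strong> parameter exists only so that the check can be exercised without sending real packets.</p>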
<p id="ALM-12089__p7444924221"><strong id="ALM-12089__b139301335321">Check the node status.</strong></p>
<ol start="3" id="ALM-12089__ol346072412212"><li id="ALM-12089__li184601124820"><a name="ALM-12089__li184601124820"></a><a name="li184601124820"></a><span>On FusionInsight Manager, click <strong id="ALM-12089__b1745932414213">Host</strong> and check the host list to determine whether the faulty node has been removed from the cluster, that is, whether it is missing from the list.</span><p><ul id="ALM-12089__ul34591924623"><li id="ALM-12089__li104594241025">If the node has been removed, go to <a href="#ALM-12089__li746012241226">5</a>.</li><li id="ALM-12089__li144599241629">If the node is still in the list, go to <a href="#ALM-12089__li19460824120">4</a>.</li></ul>
</p></li><li id="ALM-12089__li19460824120"><a name="ALM-12089__li19460824120"></a><a name="li19460824120"></a><span>Check whether the faulty node is powered off.</span><p><ul id="ALM-12089__ul19460224220"><li id="ALM-12089__li114603244210">If yes, power on the faulty node and go to <a href="#ALM-12089__li189988381537">2</a>.</li><li id="ALM-12089__li34603245211">If no, contact the relevant personnel to locate the root cause. If the faulty node needs to be removed from the cluster, remove it and go to <a href="#ALM-12089__li746012241226">5</a>. Otherwise, go to <a href="#ALM-12089__li1646022411214">6</a>.</li></ul>
</p></li><li id="ALM-12089__li746012241226"><a name="ALM-12089__li746012241226"></a><a name="li746012241226"></a><span>Remove the file <strong id="ALM-12089__b104603246218">$NODE_AGENT_HOME/etc/agent/hosts.ini</strong> from all nodes in the cluster, clean up the file <strong id="ALM-12089__b1846014247211">/var/log/Bigdata/unreachable/unreachable_ip_info.log</strong>, and then manually clear the alarm.</span></li><li id="ALM-12089__li1646022411214"><a name="ALM-12089__li1646022411214"></a><a name="li1646022411214"></a><span>Wait for 30 seconds and check whether the alarm has been cleared.</span><p><ul id="ALM-12089__ul152511853174013"><li id="ALM-12089__li74987205418">If yes, no further action is required.</li><li id="ALM-12089__li6252553134015">If no, go to <a href="#ALM-12089__li69951938132">7</a>.</li></ul>
</p></li></ol>
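<p>The file cleanup in step 5 could be scripted per node. The following is a rough sketch only, under two assumptions that this document does not confirm: that "clean up" means emptying <strong>unreachable_ip_info.log</strong> rather than deleting it, and that the script runs locally on each node with sufficient permissions. The function name <strong>cleanup_node</strong> is hypothetical. The function defaults to a dry run so that the planned actions can be reviewed first.</p>

```python
from pathlib import Path

# Path taken from step 5 of the procedure.
UNREACHABLE_LOG = Path("/var/log/Bigdata/unreachable/unreachable_ip_info.log")


def cleanup_node(agent_home, log_path=UNREACHABLE_LOG, dry_run=True):
    """Apply the step 5 file cleanup on the local node.

    agent_home is the value of $NODE_AGENT_HOME on this node.
    Returns the list of actions planned (dry run) or performed.
    """
    hosts_ini = Path(agent_home) / "etc" / "agent" / "hosts.ini"
    log_path = Path(log_path)
    actions = []

    if hosts_ini.exists():
        actions.append(f"remove {hosts_ini}")
        if not dry_run:
            hosts_ini.unlink()

    if log_path.exists():
        # Assumption: "clean up" is read as emptying the file, not deleting it.
        actions.append(f"truncate {log_path}")
        if not dry_run:
            log_path.write_text("")

    return actions
```

<p>For example, <strong>cleanup_node(os.environ["NODE_AGENT_HOME"])</strong> (after <strong>import os</strong>) lists the planned actions, and passing <strong>dry_run=False</strong> performs them. The alarm still has to be cleared manually on FusionInsight Manager afterwards.</p>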
<p id="ALM-12089__p1576819394241"><strong id="ALM-12089__b18931332152418">Collect fault information.</strong></p>
<ol start="7" id="ALM-12089__ol149961038932"><li id="ALM-12089__li69951938132"><a name="ALM-12089__li69951938132"></a><a name="li69951938132"></a><span>On the FusionInsight Manager portal, choose <strong id="ALM-12089__b79951738330">O&amp;M</strong> &gt; <strong id="ALM-12089__b199959384316">Log &gt; Download</strong>.</span></li><li id="ALM-12089__li60837487151427"><span>Select <strong id="ALM-12089__b1212019253363">OmmAgent</strong> for <strong id="ALM-12089__b33891259151427">Service</strong> and click <strong id="ALM-12089__b3991118545">OK</strong>.</span></li><li id="ALM-12089__li1199518381837"><span>Click <span><img id="ALM-12089__image599573816317" src="en-us_image_0269383936.png"></span> in the upper right corner, and set <strong id="ALM-12089__b39958385313">Start Date</strong> and <strong id="ALM-12089__b179959384319">End Date</strong> for log collection to 10 minutes before and 10 minutes after the alarm generation time, respectively. Then, click <strong id="ALM-12089__b159959381732">Download</strong>.</span></li><li id="ALM-12089__li495644512588"><span>Contact the <span id="ALM-12089__text4614151421417">O&amp;M personnel</span> and send the collected log information.</span></li></ol>
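<p>The log collection window in step 9 is simple date arithmetic. As an illustration, the following sketch computes the <strong>Start Date</strong> and <strong>End Date</strong> values from an alarm generation time. The function name and the sample timestamp are hypothetical.</p>

```python
from datetime import datetime, timedelta


def log_window(alarm_time, margin_minutes=10):
    """Return (start, end): margin_minutes before and after the alarm time."""
    margin = timedelta(minutes=margin_minutes)
    return alarm_time - margin, alarm_time + margin


# Hypothetical alarm generation time, used only for illustration.
start, end = log_window(datetime(2024, 1, 1, 12, 0, 0))
print(start.isoformat(), end.isoformat())
# prints: 2024-01-01T11:50:00 2024-01-01T12:10:00
```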
</div>
<div class="section" id="ALM-12089__section1529716184534"><h4 class="sectiontitle">Alarm Clearing</h4><p id="ALM-12089__p4677152685316">After the fault is rectified, the system automatically clears this alarm.</p>
</div>
<div class="section" id="ALM-12089__sb2d3fdce13c3410687c752df0a484012"><h4 class="sectiontitle">Related Information</h4><p id="ALM-12089__en-us_topic_0070543632_p9607124">None</p>
</div>
</div>
<div>
<div class="familylinks">
<div class="parentlink"><strong>Parent topic:</strong> <a href="mrs_01_1298.html">Alarm Reference (Applicable to MRS 3.x)</a></div>
</div>
</div>