doc-exports/docs/mrs/umn/ALM-12010.html
Yang, Tong 3b1f73dece MRS UMN 2.0.38.SP20 version
Reviewed-by: Hasko, Vladimir <vladimir.hasko@t-systems.com>
Co-authored-by: Yang, Tong <yangtong2@huawei.com>
Co-committed-by: Yang, Tong <yangtong2@huawei.com>
2022-12-13 12:03:34 +00:00


<a name="ALM-12010"></a>
<h1 class="topictitle1">ALM-12010 Manager Heartbeat Interruption Between the Active and Standby Nodes</h1>
<div id="body52794692"><div class="section" id="ALM-12010__s280ef4e111974c26b59b1fff047f7699"><h4 class="sectiontitle">Description</h4><p id="ALM-12010__en-us_topic_0070543674_p63605632">This alarm is generated when the active Manager does not receive a heartbeat signal from the standby Manager within 7 seconds.</p>
<p id="ALM-12010__en-us_topic_0070543674_p35579781">This alarm is cleared when the active Manager receives heartbeat signals from the standby Manager.</p>
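<p>The timeout rule above can be illustrated with a minimal shell sketch. The function name <strong>heartbeat_state</strong> and its arguments are hypothetical and are not part of FusionInsight Manager. Only the 7-second threshold comes from this description, and treating exactly 7 seconds as a timeout is an assumption.</p>

```shell
# Hypothetical helper, not part of FusionInsight Manager.
# Arguments: time of the last heartbeat and the current time (epoch seconds),
# plus an optional timeout in seconds.
# Prints "alarm" when the timeout has elapsed without a heartbeat,
# otherwise prints "ok".
heartbeat_state() {
    last_seen=$1
    now=$2
    timeout=${3:-7}   # 7 seconds is the threshold quoted in the description
    elapsed=$((now - last_seen))
    # Assumption: reaching the threshold exactly already counts as a timeout.
    if [ "$elapsed" -ge "$timeout" ]; then
        echo "alarm"
    else
        echo "ok"
    fi
}

heartbeat_state 100 103   # 3 seconds since the last heartbeat: prints "ok"
heartbeat_state 100 108   # 8 seconds since the last heartbeat: prints "alarm"
```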
</div>
<div class="section" id="ALM-12010__s499887f79aa24499a2d2e7e398da0453"><h4 class="sectiontitle">Attribute</h4>
<div class="tablenoborder"><table cellpadding="4" cellspacing="0" summary="" id="ALM-12010__en-us_topic_0070543674_table63390002" frame="border" border="1" rules="all"><thead align="left"><tr id="ALM-12010__en-us_topic_0070543674_row446859"><th align="left" class="cellrowborder" valign="top" width="33.33333333333333%" id="mcps1.3.2.2.1.4.1.1"><p id="ALM-12010__en-us_topic_0070543674_p36195658">Alarm ID</p>
</th>
<th align="left" class="cellrowborder" valign="top" width="33.33333333333333%" id="mcps1.3.2.2.1.4.1.2"><p id="ALM-12010__en-us_topic_0070543674_p46167218">Alarm Severity</p>
</th>
<th align="left" class="cellrowborder" valign="top" width="33.33333333333333%" id="mcps1.3.2.2.1.4.1.3"><p id="ALM-12010__en-us_topic_0070543674_p48557162">Auto Clear</p>
</th>
</tr>
</thead>
<tbody><tr id="ALM-12010__en-us_topic_0070543674_row40816026"><td class="cellrowborder" valign="top" width="33.33333333333333%" headers="mcps1.3.2.2.1.4.1.1 "><p id="ALM-12010__en-us_topic_0070543674_p17763819">12010</p>
</td>
<td class="cellrowborder" valign="top" width="33.33333333333333%" headers="mcps1.3.2.2.1.4.1.2 "><p id="ALM-12010__en-us_topic_0070543674_p29583236">Major</p>
</td>
<td class="cellrowborder" valign="top" width="33.33333333333333%" headers="mcps1.3.2.2.1.4.1.3 "><p id="ALM-12010__en-us_topic_0070543674_p47431950">Yes</p>
</td>
</tr>
</tbody>
</table>
</div>
</div>
<div class="section" id="ALM-12010__s69e79a64c37f4996a9e6280d78e16d58"><h4 class="sectiontitle">Parameters</h4>
<div class="tablenoborder"><table cellpadding="4" cellspacing="0" summary="" id="ALM-12010__en-us_topic_0070543674_table16782769" frame="border" border="1" rules="all"><thead align="left"><tr id="ALM-12010__en-us_topic_0070543674_row9145947"><th align="left" class="cellrowborder" valign="top" width="50%" id="mcps1.3.3.2.1.3.1.1"><p id="ALM-12010__en-us_topic_0070543674_p2624279">Name</p>
</th>
<th align="left" class="cellrowborder" valign="top" width="50%" id="mcps1.3.3.2.1.3.1.2"><p id="ALM-12010__en-us_topic_0070543674_p11240014">Meaning</p>
</th>
</tr>
</thead>
<tbody><tr id="ALM-12010__row113122810557"><td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.3.2.1.3.1.1 "><p id="ALM-12010__p192431315431">Source</p>
</td>
<td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.3.2.1.3.1.2 "><p id="ALM-12010__p692551319435">Specifies the cluster or system for which the alarm is generated.</p>
</td>
</tr>
<tr id="ALM-12010__en-us_topic_0070543674_row38025962"><td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.3.2.1.3.1.1 "><p id="ALM-12010__en-us_topic_0070543674_p60204115">ServiceName</p>
</td>
<td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.3.2.1.3.1.2 "><p id="ALM-12010__en-us_topic_0070543674_p44695152">Specifies the service for which the alarm is generated.</p>
</td>
</tr>
<tr id="ALM-12010__en-us_topic_0070543674_row66712054"><td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.3.2.1.3.1.1 "><p id="ALM-12010__en-us_topic_0070543674_p34967293">RoleName</p>
</td>
<td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.3.2.1.3.1.2 "><p id="ALM-12010__en-us_topic_0070543674_p13778476">Specifies the role for which the alarm is generated.</p>
</td>
</tr>
<tr id="ALM-12010__en-us_topic_0070543674_row56897427"><td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.3.2.1.3.1.1 "><p id="ALM-12010__en-us_topic_0070543674_p45288850">HostName</p>
</td>
<td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.3.2.1.3.1.2 "><p id="ALM-12010__en-us_topic_0070543674_p44518222">Specifies the host for which the alarm is generated.</p>
</td>
</tr>
</tbody>
</table>
</div>
</div>
<div class="section" id="ALM-12010__s51cab4675b644be49bc4ff774ddbd51c"><h4 class="sectiontitle">Impact on the System</h4><p id="ALM-12010__en-us_topic_0070543674_p9588202">While the heartbeat is interrupted, an active/standby failover cannot be performed if the active Manager process becomes abnormal, and services are affected.</p>
</div>
<div class="section" id="ALM-12010__s7843db533b38470ea902ef6788b89a22"><h4 class="sectiontitle">Possible Causes</h4><p id="ALM-12010__en-us_topic_0070543674_p38446893"></p>
<ul id="ALM-12010__ul11347112011510"><li id="ALM-12010__li17347132014154">The link between the active and standby Managers is abnormal.</li><li id="ALM-12010__li127451022151512">The node name configuration is incorrect.</li><li id="ALM-12010__li15347620181517">The heartbeat port is blocked by the firewall.</li></ul>
</div>
<div class="section" id="ALM-12010__s8af1753e22d647b9b1328244e85fc0a1"><h4 class="sectiontitle">Procedure</h4><p class="tableheading" id="ALM-12010__en-us_topic_0070543674_p27190637"><strong id="ALM-12010__b5350194613159">Check whether the network between the active and standby Manager servers is normal.</strong></p>
<ol id="ALM-12010__ol20655039202014"><li id="ALM-12010__li3649153912014"><span>In the FusionInsight Manager portal, click <strong id="ALM-12010__b3064793094522">O&amp;M &gt; Alarm<strong id="ALM-12010__b27872374104950"> &gt; Alarms</strong></strong>, click <span><img id="ALM-12010__image4649163910207" src="en-us_image_0269383815.png"></span> in the row containing the alarm and view the IP address of the standby Manager (Peer Manager) server in the alarm details.</span></li><li id="ALM-12010__li665018399204"><span>Log in to the active Manager server as user <strong id="ALM-12010__b16650193982017">root</strong>. <span id="ALM-12010__text13862037144910"></span><span id="ALM-12010__text077751144915"></span></span></li><li id="ALM-12010__li86511539112014"><span>Run the <strong id="ALM-12010__b14650439102018">ping</strong> <em id="ALM-12010__i96503394205">standby Manager heartbeat IP address</em> command to check whether the standby Manager server is reachable.</span><p><ul class="subitemlist" id="ALM-12010__ul565043917209"><li id="ALM-12010__li665012399202">If yes, go to <a href="#ALM-12010__li206521339172011">6</a>.</li><li id="ALM-12010__li36504394207">If no, go to <a href="#ALM-12010__li18651103915205">4</a>.</li></ul>
</p></li><li id="ALM-12010__li18651103915205"><a name="ALM-12010__li18651103915205"></a><a name="li18651103915205"></a><span>Contact the network administrator to check whether the network is faulty.</span><p><ul class="subitemlist" id="ALM-12010__ul1465123917207"><li id="ALM-12010__li7651539162019">If yes, go to <a href="#ALM-12010__li166511739102017">5</a>.</li><li id="ALM-12010__li12651153932016">If no, go to <a href="#ALM-12010__li206521339172011">6</a>.</li></ul>
</p></li><li id="ALM-12010__li166511739102017"><a name="ALM-12010__li166511739102017"></a><a name="li166511739102017"></a><span>Rectify the network fault and check whether the alarm is cleared from the alarm list.</span><p><ul class="subitemlist" id="ALM-12010__ul12651143992015"><li id="ALM-12010__li66510391204">If yes, no further action is required.</li><li id="ALM-12010__li165193912202">If no, go to <a href="#ALM-12010__li206521339172011">6</a>.</li></ul>
</p></li><li class="subitemlist" id="ALM-12010__li206521339172011"><a name="ALM-12010__li206521339172011"></a><a name="li206521339172011"></a><span>Run the following command to go to the software installation directory:</span><p><p id="ALM-12010__p1652939182013"><strong id="ALM-12010__b136521139172015">cd /opt</strong></p>
</p></li><li id="ALM-12010__li206524391203"><span>Run the following command to locate the directory that contains the configuration file of the active and standby nodes:</span><p><p id="ALM-12010__p8652153962016"><strong id="ALM-12010__b16652173917208">find . -name hacom_local.xml</strong></p>
</p></li><li id="ALM-12010__li9652143912209"><span>Run the following command to go to the <strong id="ALM-12010__b1265243992012">workspace</strong> directory:</span><p><p id="ALM-12010__p36527396208"><strong id="ALM-12010__b1765203982016">cd ${BIGDATA_HOME}/om-server/OMS/workspace0</strong><strong id="ALM-12010__b1564419127399">/ha/local/hacom/conf/</strong></p>
</p></li><li id="ALM-12010__li1065213914202"><span>Run the <strong id="ALM-12010__b11213458183417">vim</strong> command to open the <strong id="ALM-12010__b521318586344">hacom_local.xml</strong> file. Check whether the local and peer nodes are correctly configured: the local node must be configured as the active node, and the peer node as the standby node.</span><p><ul id="ALM-12010__ul1365263916206"><li id="ALM-12010__li13652123919204">If yes, go to <a href="#ALM-12010__li56481639112012">12</a>.</li><li id="ALM-12010__li126521439182014">If no, go to <a href="#ALM-12010__li18655163992011">10</a>.</li></ul>
</p></li><li id="ALM-12010__li18655163992011"><a name="ALM-12010__li18655163992011"></a><a name="li18655163992011"></a><span>Modify the configuration of the active and standby nodes in the <strong id="ALM-12010__b8957024133513">hacom_local.xml</strong> file and press <strong id="ALM-12010__b59571324153518">Esc</strong> to return to the command mode. Run the <strong id="ALM-12010__b69571524173512">:wq</strong> command to save the modification and exit.</span></li><li id="ALM-12010__li1265563992014"><span>Check whether the alarm is cleared automatically.</span><p><ul id="ALM-12010__ul116551239192019"><li id="ALM-12010__li11655123992018">If yes, no further action is required.</li><li id="ALM-12010__li665543992012">If no, go to <a href="#ALM-12010__li56481639112012">12</a>.</li></ul>
</p></li></ol>
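<p>The reachability check and the configuration file lookup in the steps above could be wrapped in small shell functions, as in the following sketch. The function names are hypothetical, and the <strong>ping</strong> options assume the Linux iputils version of the command. The search root defaults to <strong>/opt</strong>, as in the procedure.</p>

```shell
# Hypothetical helpers; names and defaults are illustrative assumptions.

# Return 0 if the peer answers at least one of three echo requests.
# Assumes Linux iputils ping: -c is the request count, -W the reply
# timeout in seconds.
peer_reachable() {
    peer_ip=$1
    ping -c 3 -W 2 "$peer_ip"
}

# Print the directory of every hacom_local.xml found under a search root.
find_hacom_conf_dirs() {
    search_root=${1:-/opt}
    find "$search_root" -name hacom_local.xml -exec dirname {} \;
}

# Example usage (replace the placeholder with the standby Manager
# heartbeat IP address shown in the alarm details):
#   peer_reachable STANDBY_HEARTBEAT_IP
#   find_hacom_conf_dirs /opt
```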
<p id="ALM-12010__p151791650141914"><strong id="ALM-12010__b193901952171915">Check whether the port is disabled by the firewall.</strong></p>
<ol start="12" id="ALM-12010__ol1264983932018"><li id="ALM-12010__li56481639112012"><a name="ALM-12010__li56481639112012"></a><a name="li56481639112012"></a><span>Run the <strong id="ALM-12010__b193834425356">lsof -i :20012</strong> command to check whether the heartbeat ports of the active and standby nodes are enabled. If the command output is displayed, the ports are enabled. Otherwise, the ports are disabled by the firewall.</span><p><ul id="ALM-12010__ul20648143982016"><li id="ALM-12010__li2064816399204">If yes, go to <a href="#ALM-12010__li8648153982010">13</a>.</li><li id="ALM-12010__li116484391209">If no, go to <a href="#ALM-12010__li41244883171443">16</a>.</li></ul>
</p></li><li id="ALM-12010__li8648153982010"><a name="ALM-12010__li8648153982010"></a><a name="li8648153982010"></a><span>Run the <strong id="ALM-12010__b064853911204">iptables -P INPUT ACCEPT</strong> command to set the default INPUT policy to ACCEPT so that the connection to the server is not dropped when the rules are flushed.</span></li><li id="ALM-12010__li8648113917204"><span>Run the following command to flush the firewall rules:</span><p><p id="ALM-12010__p1564893915206"><strong id="ALM-12010__b3648539112020">iptables -F</strong></p>
</p></li><li id="ALM-12010__li5649163982013"><span>Check whether the alarm is cleared from the alarm list.</span><p><ul id="ALM-12010__ul12649143919207"><li id="ALM-12010__li76481939182016">If yes, no further action is required.</li><li id="ALM-12010__li6649839152018">If no, go to <a href="#ALM-12010__li41244883171443">16</a>.</li></ul>
</p></li></ol>
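<p>The port check above depends only on whether <strong>lsof</strong> prints anything, which the following sketch makes explicit. The function names are hypothetical. The rule backup with <strong>iptables-save</strong> is an optional precaution that is not part of the original procedure, because <strong>iptables -F</strong> removes the existing rules.</p>

```shell
# Hypothetical helpers; function names are illustrative assumptions.

# Read the output of "lsof -i :PORT" on stdin and classify it the way the
# procedure does: any output means the port is in use ("enabled").
port_state_from_lsof() {
    if grep -q .; then
        echo "enabled"
    else
        echo "disabled"
    fi
}

# Optional precaution, not in the original procedure: write the current
# rules to a file before flushing them. tee also echoes the rules to stdout.
backup_firewall_rules() {
    backup_file=$1
    iptables-save | tee "$backup_file"
}

# Example usage on a Manager node (run as root):
#   lsof -i :20012 | port_state_from_lsof
#   backup_firewall_rules /tmp/iptables.backup
```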
<p id="ALM-12010__p66076255171453"><strong id="ALM-12010__b56103124171459">Collect fault information.</strong></p>
<ol start="16" id="ALM-12010__ol4742499917152"><li id="ALM-12010__li41244883171443"><a name="ALM-12010__li41244883171443"></a><a name="li41244883171443"></a><span>On FusionInsight Manager, choose <strong id="ALM-12010__b2091290617036">O&amp;M</strong> &gt; <strong id="ALM-12010__b4582764171443">Log &gt; Download</strong>.</span></li><li id="ALM-12010__li52887856171443"><span>Select the following nodes for <strong id="ALM-12010__b1114195518811">Service</strong> and click<strong id="ALM-12010__b11411559819"> OK</strong>:</span><p><ul class="subitemlist" id="ALM-12010__ul58072211171443"><li id="ALM-12010__li2749285171443">OmmServer</li><li id="ALM-12010__li24743571171443">Controller</li><li id="ALM-12010__li21365548171443">NodeAgent</li></ul>
</p></li><li id="ALM-12010__li1145664103113"><span>Click <span><img id="ALM-12010__image1945644173117" src="en-us_image_0269383816.png"></span> in the upper right corner, and set <strong id="ALM-12010__b6456941173117">Start Date</strong> and <strong id="ALM-12010__b11456154113318">End Date</strong> for log collection to 10 minutes ahead of and after the alarm generation time, respectively. Then, click <strong id="ALM-12010__b13456164113319">Download</strong>.</span></li><li id="ALM-12010__li495644512588"><span>Contact the <span id="ALM-12010__text4614151421417">O&amp;M personnel</span> and send the collected log information.</span></li></ol>
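<p>The log collection window described above is plain arithmetic on the alarm generation time. The following sketch works in epoch seconds. The function name <strong>log_window</strong> and the sample timestamp are hypothetical; only the 10-minute margin comes from the procedure.</p>

```shell
# Hypothetical helper: prints the start and end of the log collection
# window as epoch seconds, separated by a space.
log_window() {
    alarm_epoch=$1
    margin=${2:-600}   # 10 minutes before and after, as in the procedure
    echo "$((alarm_epoch - margin)) $((alarm_epoch + margin))"
}

log_window 1670932800   # prints "1670932200 1670933400"
```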
</div>
<div class="section" id="ALM-12010__section1529716184534"><h4 class="sectiontitle">Alarm Clearing</h4><p id="ALM-12010__p4677152685316">After the fault is rectified, the system automatically clears this alarm.</p>
</div>
<div class="section" id="ALM-12010__s785de8080aae450dbd0d37da4f9f95ef"><h4 class="sectiontitle">Related Information</h4><p id="ALM-12010__en-us_topic_0070543674_p25816034">None</p>
</div>
</div>
<div>
<div class="familylinks">
<div class="parentlink"><strong>Parent topic:</strong> <a href="mrs_01_1298.html">Alarm Reference (Applicable to MRS 3.x)</a></div>
</div>
</div>