doc-exports/docs/mrs/umn/ALM-13000.html
Yang, Tong 3b1f73dece MRS UMN 2.0.38.SP20 version
Reviewed-by: Hasko, Vladimir <vladimir.hasko@t-systems.com>
Co-authored-by: Yang, Tong <yangtong2@huawei.com>
Co-committed-by: Yang, Tong <yangtong2@huawei.com>
2022-12-13 12:03:34 +00:00


<a name="ALM-13000"></a>
<h1 class="topictitle1">ALM-13000 ZooKeeper Service Unavailable</h1>
<div id="body2237679"><div class="section" id="ALM-13000__s4ed51c9f0d9a477fbf50d8ce120581b4"><h4 class="sectiontitle">Description</h4><p id="ALM-13000__en-us_topic_0070543632_p20777609">The system checks the ZooKeeper service status every 60 seconds. This alarm is generated when the ZooKeeper service is unavailable.</p>
<p id="ALM-13000__en-us_topic_0070543632_p52780758">This alarm is cleared when the ZooKeeper service recovers.</p>
</div>
<div class="section" id="ALM-13000__s513710eed6ad4012b965eb6d83223b70"><h4 class="sectiontitle">Attribute</h4>
<div class="tablenoborder"><table cellpadding="4" cellspacing="0" summary="" id="ALM-13000__en-us_topic_0070543632_table47383040" frame="border" border="1" rules="all"><thead align="left"><tr id="ALM-13000__en-us_topic_0070543632_row31563057"><th align="left" class="cellrowborder" valign="top" width="33.33333333333333%" id="mcps1.3.2.2.1.4.1.1"><p id="ALM-13000__en-us_topic_0070543632_p6470829">Alarm ID</p>
</th>
<th align="left" class="cellrowborder" valign="top" width="33.33333333333333%" id="mcps1.3.2.2.1.4.1.2"><p id="ALM-13000__en-us_topic_0070543632_p54375137">Alarm Severity</p>
</th>
<th align="left" class="cellrowborder" valign="top" width="33.33333333333333%" id="mcps1.3.2.2.1.4.1.3"><p id="ALM-13000__en-us_topic_0070543632_p42310006">Auto Clear</p>
</th>
</tr>
</thead>
<tbody><tr id="ALM-13000__en-us_topic_0070543632_row4558484"><td class="cellrowborder" valign="top" width="33.33333333333333%" headers="mcps1.3.2.2.1.4.1.1 "><p id="ALM-13000__en-us_topic_0070543632_p33692898">13000</p>
</td>
<td class="cellrowborder" valign="top" width="33.33333333333333%" headers="mcps1.3.2.2.1.4.1.2 "><p id="ALM-13000__en-us_topic_0070543632_p44770184">Critical</p>
</td>
<td class="cellrowborder" valign="top" width="33.33333333333333%" headers="mcps1.3.2.2.1.4.1.3 "><p id="ALM-13000__en-us_topic_0070543632_p2506287">Yes</p>
</td>
</tr>
</tbody>
</table>
</div>
</div>
<div class="section" id="ALM-13000__en-us_topic_0070543632_section463484"><h4 class="sectiontitle">Parameters</h4>
<div class="tablenoborder"><table cellpadding="4" cellspacing="0" summary="" id="ALM-13000__en-us_topic_0070543632_table1682686" frame="border" border="1" rules="all"><thead align="left"><tr id="ALM-13000__en-us_topic_0070543632_row39854064"><th align="left" class="cellrowborder" valign="top" width="50%" id="mcps1.3.3.2.1.3.1.1"><p id="ALM-13000__en-us_topic_0070543632_p6953712">Name</p>
</th>
<th align="left" class="cellrowborder" valign="top" width="50%" id="mcps1.3.3.2.1.3.1.2"><p id="ALM-13000__en-us_topic_0070543632_p26379772">Meaning</p>
</th>
</tr>
</thead>
<tbody><tr id="ALM-13000__row183911312123814"><td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.3.2.1.3.1.1 "><p id="ALM-13000__p192431315431">Source</p>
</td>
<td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.3.2.1.3.1.2 "><p id="ALM-13000__p692551319435">Specifies the cluster for which the alarm is generated.</p>
</td>
</tr>
<tr id="ALM-13000__en-us_topic_0070543632_row56386813"><td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.3.2.1.3.1.1 "><p id="ALM-13000__en-us_topic_0070543632_p3929179">ServiceName</p>
</td>
<td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.3.2.1.3.1.2 "><p id="ALM-13000__en-us_topic_0070543632_p49828093">Specifies the service for which the alarm is generated.</p>
</td>
</tr>
<tr id="ALM-13000__en-us_topic_0070543632_row45799660"><td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.3.2.1.3.1.1 "><p id="ALM-13000__en-us_topic_0070543632_p18784943">RoleName</p>
</td>
<td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.3.2.1.3.1.2 "><p id="ALM-13000__en-us_topic_0070543632_p45185452">Specifies the role for which the alarm is generated.</p>
</td>
</tr>
<tr id="ALM-13000__en-us_topic_0070543632_row4015887"><td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.3.2.1.3.1.1 "><p id="ALM-13000__en-us_topic_0070543632_p56851411">HostName</p>
</td>
<td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.3.2.1.3.1.2 "><p id="ALM-13000__en-us_topic_0070543632_p41561572">Specifies the host for which the alarm is generated.</p>
</td>
</tr>
</tbody>
</table>
</div>
</div>
<div class="section" id="ALM-13000__s0c40eacbbe5d4468af530e88a6f42993"><h4 class="sectiontitle">Impact on the System</h4><p id="ALM-13000__en-us_topic_0070543632_p11044137">ZooKeeper cannot provide coordination services for upper-layer components, so components that depend on ZooKeeper may not run properly.</p>
</div>
<div class="section" id="ALM-13000__s9736acd82e1f45699a9799949d179cc9"><h4 class="sectiontitle">Possible Causes</h4><ul id="ALM-13000__en-us_topic_0070543632_ul22159882"><li id="ALM-13000__li1399211015453">The DNS is installed on the ZooKeeper node.</li><li id="ALM-13000__li82581912164517">The network is faulty.</li><li id="ALM-13000__en-us_topic_0070543632_li65221217">The KrbServer service is abnormal.</li><li id="ALM-13000__en-us_topic_0070543632_li50120049">The ZooKeeper instance is abnormal.</li><li id="ALM-13000__en-us_topic_0070543632_li48427257">The disk capacity is insufficient.</li></ul>
</div>
<div class="section" id="ALM-13000__s49d1cf9060454ae2b45eb214f22343fd"><h4 class="sectiontitle">Procedure</h4><p class="tableheading" id="ALM-13000__p2650131165116"><strong id="ALM-13000__b166571125118">Check the DNS.</strong></p>
<ol id="ALM-13000__ol136651718516"><li id="ALM-13000__li116653155115"><span>Check whether the DNS is installed on the node where the ZooKeeper instance is located. To do so, run the <strong id="ALM-13000__b17665916517">cat /etc/resolv.conf</strong> command on that Linux node and check whether the file is empty.</span><p><ul class="subitemlist" id="ALM-13000__ul20665211516"><li id="ALM-13000__li366516105113">If the file is empty, go to <a href="#ALM-13000__li76816175112">2</a>.</li><li id="ALM-13000__li468111125116">If the file is not empty, go to <a href="#ALM-13000__li86969116511">3</a>.</li></ul>
</p></li><li id="ALM-13000__li76816175112"><a name="ALM-13000__li76816175112"></a><a name="li76816175112"></a><span>Run the <strong id="ALM-13000__b2681161105115">service named status</strong> command to check whether the DNS is started.</span><p><ul class="subitemlist" id="ALM-13000__ul1468117120512"><li id="ALM-13000__li36811317513">If yes, go to <a href="#ALM-13000__li86969116511">3</a>.</li><li id="ALM-13000__li46969117517">If no, go to <a href="#ALM-13000__li42741615115119">5</a>.</li></ul>
</p></li><li id="ALM-13000__li86969116511"><a name="ALM-13000__li86969116511"></a><a name="li86969116511"></a><span>Run the <strong id="ALM-13000__b11696312517">service named stop</strong> command to stop the DNS service. If "Shutting down name server BIND waiting for named to shut down (28s)" is displayed, the DNS service has been stopped. Then comment out any content in <strong id="ALM-13000__b469651165115">/etc/resolv.conf</strong>.</span></li><li id="ALM-13000__li4696912514"><span>On the <strong id="ALM-13000__b10696171195116">O&amp;M &gt; Alarm<strong id="ALM-13000__b146965165119"> &gt; Alarms</strong></strong> tab, check whether the alarm is cleared.</span><p><ul class="subitemlist" id="ALM-13000__ul116961714513"><li id="ALM-13000__li1769613120511">If yes, no further action is required.</li><li id="ALM-13000__li57116114517">If no, go to <a href="#ALM-13000__li42741615115119">5</a>.</li></ul>
</p></li></ol>
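<p>The DNS check above can be sketched as two small shell helpers. This is a minimal sketch, not part of the product: the function names <strong>resolv_conf_is_empty</strong> and <strong>print_commented_out</strong> are hypothetical, and the sketch assumes that lines starting with <strong>#</strong> or <strong>;</strong> count as already commented out. The second helper only prints a commented-out copy and does not modify the file.</p>

```shell
# Hypothetical helpers for steps 1 and 3; names are illustrative only.

# Succeeds (exit status 0) when the file has no active lines, that is, every
# line is blank or already commented out with '#' or ';'.
# A missing or unreadable file is treated as empty.
resolv_conf_is_empty() {
    file="${1:-/etc/resolv.conf}"
    if grep -Eqv '^[[:space:]]*([#;]|$)' "$file" 2>/dev/null; then
        return 1
    fi
    return 0
}

# Prints a copy of the file with every active line commented out.
# The file itself is left untouched so the result can be reviewed first.
print_commented_out() {
    file="${1:-/etc/resolv.conf}"
    sed -E '/^[[:space:]]*([#;]|$)/!s/^/#/' "$file"
}
```

<p>For example, <strong>resolv_conf_is_empty /etc/resolv.conf</strong> returns a zero exit status only when nothing is active in the file, which corresponds to the "file is empty" branch of step 1.</p>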
<p class="tableheading" id="ALM-13000__p427411154517"><strong id="ALM-13000__b14274115115113">Check the network status.</strong></p>
<ol start="5" id="ALM-13000__ol827461525111"><li id="ALM-13000__li42741615115119"><a name="ALM-13000__li42741615115119"></a><a name="li42741615115119"></a><span>On the Linux node where the ZooKeeper instance is located, run the <strong id="ALM-13000__b2274131512512">ping</strong> command to check whether the host names of the other nodes that host ZooKeeper instances can be pinged successfully.</span><p><ul class="subitemlist" id="ALM-13000__ul1227411518514"><li id="ALM-13000__li0274515195120">If yes, go to <a href="#ALM-13000__li26784425155523">9</a>.</li><li id="ALM-13000__li2027441517514">If no, go to <a href="#ALM-13000__li1227471525111">6</a>.</li></ul>
</p></li><li id="ALM-13000__li1227471525111"><a name="ALM-13000__li1227471525111"></a><a name="li1227471525111"></a><span>Correct the IP addresses in <strong id="ALM-13000__b172743151514">/etc/hosts</strong> and add any missing mappings between host names and IP addresses.</span></li><li id="ALM-13000__li1827411513519"><span>Run the <strong id="ALM-13000__b62901815165119">ping</strong> command again to check whether the host names of the other nodes that host ZooKeeper instances can be pinged successfully.</span><p><ul class="subitemlist" id="ALM-13000__ul112902156514"><li id="ALM-13000__li6290101517512">If yes, go to <a href="#ALM-13000__li129021555116">8</a>.</li><li id="ALM-13000__li1629031516519">If no, go to <a href="#ALM-13000__li42883384155523">23</a>.</li></ul>
</p></li><li id="ALM-13000__li129021555116"><a name="ALM-13000__li129021555116"></a><a name="li129021555116"></a><span>On the <strong id="ALM-13000__b729011159515">O&amp;M &gt; Alarm<strong id="ALM-13000__b1929014150513"> &gt; Alarms</strong></strong> tab, check whether the alarm is cleared.</span><p><ul class="subitemlist" id="ALM-13000__ul8290151513517"><li id="ALM-13000__li123054159513">If yes, no further action is required.</li><li id="ALM-13000__li230581545115">If no, go to <a href="#ALM-13000__li26784425155523">9</a>.</li></ul>
</p></li></ol>
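<p>The network check above can be sketched in shell as follows. This is an illustration under assumptions: the helper names <strong>lookup_in_hosts</strong> and <strong>report_unreachable</strong> are hypothetical, the host names passed to them are supplied by the operator, and the <strong>ping</strong> options assume the Linux iputils implementation.</p>

```shell
# Hypothetical helpers for steps 5 to 7; names are illustrative only.

# Prints the IP address mapped to a host name in a hosts-format file,
# or prints nothing if the file contains no mapping for that name.
lookup_in_hosts() {
    name="$1"
    file="${2:-/etc/hosts}"
    awk -v name="$name" '
        { sub(/#.*/, "") }    # ignore comments, including trailing ones
        { for (i = 2; i <= NF; i++) if ($i == name) { print $1; exit } }
    ' "$file"
}

# Pings each given host name once and reports the ones that do not answer.
report_unreachable() {
    for host in "$@"; do
        if ! ping -c 1 -W 2 "$host" >/dev/null 2>&1; then
            echo "unreachable: $host"
        fi
    done
}
```

<p>A host name that <strong>report_unreachable</strong> lists and for which <strong>lookup_in_hosts</strong> prints nothing is a candidate for a missing mapping in <strong>/etc/hosts</strong> (step 6).</p>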
<p class="tableheading" id="ALM-13000__en-us_topic_0070543632_p37872961"><strong id="ALM-13000__b25851084155520">Check the KrbServer service status (skip this step if the cluster is in normal mode).</strong></p>
<ol start="9" id="ALM-13000__ol12748298155533"><li id="ALM-13000__li26784425155523"><a name="ALM-13000__li26784425155523"></a><a name="li26784425155523"></a><span>On FusionInsight Manager, choose <strong id="ALM-13000__b12331101111912">Cluster &gt; </strong><em id="ALM-13000__i153356114193">Name of the desired cluster</em><strong id="ALM-13000__b23331011101911"> &gt; Services</strong>.</span></li><li id="ALM-13000__li15014301155523"><span>Check whether the KrbServer service is normal.</span><p><ul class="subitemlist" id="ALM-13000__ul38950957155523"><li id="ALM-13000__li22054827155523">If yes, go to <a href="#ALM-13000__li21042948155523">13</a>.</li><li id="ALM-13000__li41610586155523">If no, go to <a href="#ALM-13000__li45270224155523">11</a>.</li></ul>
</p></li><li id="ALM-13000__li45270224155523"><a name="ALM-13000__li45270224155523"></a><a name="li45270224155523"></a><span>Perform operations based on "ALM-25500 KrbServer Service Unavailable" and check whether the KrbServer service is recovered.</span><p><ul class="subitemlist" id="ALM-13000__ul42312727155523"><li id="ALM-13000__li4292620155523">If yes, go to <a href="#ALM-13000__li4125272155523">12</a>.</li><li id="ALM-13000__li12157954155523">If no, go to <a href="#ALM-13000__li42883384155523">23</a>.</li></ul>
</p></li><li id="ALM-13000__li4125272155523"><a name="ALM-13000__li4125272155523"></a><a name="li4125272155523"></a><span>On the <strong id="ALM-13000__b4778840155523">O&amp;M &gt; Alarm<strong id="ALM-13000__b27872374104950"> &gt; Alarms</strong></strong> tab, check whether the alarm is cleared.</span><p><ul class="subitemlist" id="ALM-13000__ul14135505155523"><li id="ALM-13000__li43009567155523">If yes, no further action is required.</li><li id="ALM-13000__li61222935155523">If no, go to <a href="#ALM-13000__li21042948155523">13</a>.</li></ul>
</p></li></ol>
<p class="tableheading" id="ALM-13000__p60110687155523"><strong id="ALM-13000__b63142738155543">Check the ZooKeeper service instance status.</strong></p>
<ol start="13" id="ALM-13000__ol62066454155552"><li id="ALM-13000__li21042948155523"><a name="ALM-13000__li21042948155523"></a><a name="li21042948155523"></a><span>On FusionInsight Manager, choose <strong id="ALM-13000__b1617816227218">Cluster &gt;</strong><em id="ALM-13000__i41831122182118"> Name of the desired cluster</em><strong id="ALM-13000__b1218042217212"> &gt; Services</strong> &gt; <strong id="ALM-13000__b65711644155523">ZooKeeper</strong> &gt; <strong id="ALM-13000__b54533888155523">quorumpeer</strong>.</span></li><li id="ALM-13000__li64580904155523"><span>Check whether the ZooKeeper instances are normal.</span><p><ul class="subitemlist" id="ALM-13000__ul44458358155523"><li id="ALM-13000__li26757241155523">If yes, go to <a href="#ALM-13000__li253728155523">18</a>.</li><li id="ALM-13000__li19852898155523">If no, go to <a href="#ALM-13000__li36165444155523">15</a>.</li></ul>
</p></li><li id="ALM-13000__li36165444155523"><a name="ALM-13000__li36165444155523"></a><a name="li36165444155523"></a><span>Select the instances whose status is not good, and choose <strong id="ALM-13000__b44357225155523">More</strong> &gt; <strong id="ALM-13000__b63670706155523">Restart Instance</strong>.</span></li><li id="ALM-13000__li22693465155523"><span>Check whether the instance status is good after the restart.</span><p><ul class="subitemlist" id="ALM-13000__ul62173819155523"><li id="ALM-13000__li43719856155523">If yes, go to <a href="#ALM-13000__li65308695155523">17</a>.</li><li id="ALM-13000__li51647444155523">If no, go to <a href="#ALM-13000__li253728155523">18</a>.</li></ul>
</p></li><li id="ALM-13000__li65308695155523"><a name="ALM-13000__li65308695155523"></a><a name="li65308695155523"></a><span>On the <strong id="ALM-13000__b2914599155523">O&amp;M &gt; Alarm<strong id="ALM-13000__b013318129169"> &gt; Alarms</strong></strong> tab, check whether the alarm is cleared.</span><p><ul class="subitemlist" id="ALM-13000__ul63772621155523"><li id="ALM-13000__li26231399155523">If yes, no further action is required.</li><li id="ALM-13000__li44368549155523">If no, go to <a href="#ALM-13000__li253728155523">18</a>.</li></ul>
</p></li></ol>
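<p>As an optional command-line cross-check of an instance, the standard ZooKeeper four-letter command <strong>ruok</strong> can be sent to the instance, which answers <strong>imok</strong> when it is running without errors. The sketch below is not the mechanism that FusionInsight Manager uses. The function name <strong>zk_ruok</strong> is hypothetical, the default port in the sketch is an assumption that may not match the cluster's client port, and the command only works if four-letter commands are enabled on the instance and <strong>nc</strong> is installed.</p>

```shell
# Hypothetical helper; not part of the product.
# Sends "ruok" to a ZooKeeper instance and succeeds only if it answers "imok".
# The default port below is an assumption; pass the cluster's client port.
zk_ruok() {
    host="$1"
    port="${2:-2181}"
    reply=$(printf 'ruok' | nc -w 3 "$host" "$port" 2>/dev/null)
    [ "$reply" = "imok" ]
}
```

<p>A non-zero exit status means only that no <strong>imok</strong> answer was received. This can also happen when four-letter commands are disabled, so the instance status shown on FusionInsight Manager remains the reference.</p>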
<p class="tableheading" id="ALM-13000__p37082683155523"><strong id="ALM-13000__b926938515561">Check disk status.</strong></p>
<ol start="18" id="ALM-13000__ol34760783155613"><li id="ALM-13000__li253728155523"><a name="ALM-13000__li253728155523"></a><a name="li253728155523"></a><span>On FusionInsight Manager, choose <strong id="ALM-13000__b18447182672118">Cluster &gt; </strong><em id="ALM-13000__i1944814269215">Name of the desired cluster</em><strong id="ALM-13000__b9447132642119"> &gt; Services</strong> &gt; <strong id="ALM-13000__b55512933155523">ZooKeeper</strong> &gt; <strong id="ALM-13000__b29854353155523">quorumpeer</strong>, and note the hosts on which the ZooKeeper instances are located.</span></li><li id="ALM-13000__li20551983155523"><span>On FusionInsight Manager, click <strong id="ALM-13000__b2283553155523">Host</strong>.</span></li><li id="ALM-13000__li45856550155523"><span>In the <strong id="ALM-13000__b50750122155523">Disk</strong> column, check whether disk space is insufficient (disk usage exceeds 80%) on any node where a ZooKeeper instance is located.</span><p><ul class="subitemlist" id="ALM-13000__ul64747495155523"><li id="ALM-13000__li17119187155523">If yes, go to <a href="#ALM-13000__li23393056155523">21</a>.</li><li id="ALM-13000__li44476868155523">If no, go to <a href="#ALM-13000__li42883384155523">23</a>.</li></ul>
</p></li><li id="ALM-13000__li23393056155523"><a name="ALM-13000__li23393056155523"></a><a name="li23393056155523"></a><span>Expand disk capacity. For details, see "ALM-12017 Insufficient Disk Capacity".</span></li><li id="ALM-13000__li5048138155523"><span>On the <strong id="ALM-13000__b9210918155523">O&amp;M &gt; Alarm<strong id="ALM-13000__b82438187160"> &gt; Alarms</strong></strong> tab, check whether the alarm is cleared.</span><p><ul class="subitemlist" id="ALM-13000__ul34859511155523"><li id="ALM-13000__li15789403155523">If yes, no further action is required.</li><li id="ALM-13000__li3873279155523">If no, go to <a href="#ALM-13000__li42883384155523">23</a>.</li></ul>
</p></li></ol>
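<p>The 80% disk usage criterion from step 20 can also be checked directly on a node. The following is a minimal sketch that filters the output of <strong>df -P</strong>. The function name <strong>over_threshold</strong> is hypothetical, and the sketch assumes mount points without spaces.</p>

```shell
# Hypothetical helper; name is illustrative only.
# Reads `df -P` style output on standard input and prints the mount point and
# usage of every file system whose usage exceeds the threshold (default: 80).
over_threshold() {
    threshold="${1:-80}"
    awk -v t="$threshold" 'NR > 1 {
        use = $5
        sub(/%/, "", use)              # "85%" becomes "85"
        if (use + 0 > t + 0) print $6, $5
    }'
}
```

<p>On a ZooKeeper node, it could be used as <strong>df -P | over_threshold 80</strong>. An empty result corresponds to the "no" branch of step 20.</p>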
<p class="tableheading" id="ALM-13000__p30906685155523"><strong id="ALM-13000__b62295254155656">Collect fault information.</strong></p>
<ol start="23" id="ALM-13000__ol56504106155659"><li id="ALM-13000__li42883384155523"><a name="ALM-13000__li42883384155523"></a><a name="li42883384155523"></a><span>On the FusionInsight Manager portal, choose <strong id="ALM-13000__b39977366113627">O&amp;M</strong> &gt; <strong id="ALM-13000__b24251979113627">Log &gt; Download</strong>.</span></li><li id="ALM-13000__li19589573155523"><span>From <strong id="ALM-13000__b50406136155523">Service</strong>, select the following services in the required cluster (KrbServer logs do not need to be downloaded in normal mode):</span><p><ul class="subitemlist" id="ALM-13000__ul2176619155523"><li id="ALM-13000__li56365177155523">ZooKeeper</li><li id="ALM-13000__li37524548155523">KrbServer</li></ul>
</p></li><li id="ALM-13000__li1145664103113"><span>Click <span><img id="ALM-13000__image1945644173117" src="en-us_image_0269383940.png"></span> in the upper right corner, and set <strong id="ALM-13000__b6456941173117">Start Date</strong> to 10 minutes before the alarm generation time and <strong id="ALM-13000__b11456154113318">End Date</strong> to 10 minutes after it. Then, click <strong id="ALM-13000__b13456164113319">Download</strong>.</span></li><li id="ALM-13000__li495644512588"><span>Contact the <span id="ALM-13000__text4614151421417">O&amp;M personnel</span> and send the collected log information.</span></li></ol>
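<p>The log collection window from step 25 is simple arithmetic on the alarm generation time. The sketch below computes it in shell. The function name <strong>log_window</strong> is hypothetical, the timestamp in the usage note is a made-up example, and the sketch assumes GNU <strong>date</strong> and interprets the timestamp as UTC.</p>

```shell
# Hypothetical helper; name is illustrative only.
# Prints the start and end of a window of N minutes (default: 10) before and
# after the given timestamp. Assumes GNU date; the timestamp is read as UTC.
log_window() {
    alarm_time="$1"
    minutes="${2:-10}"
    epoch=$(date -u -d "$alarm_time" +%s) || return 1
    fmt='+%Y-%m-%d %H:%M:%S'
    echo "start: $(date -u -d "@$((epoch - minutes * 60))" "$fmt")"
    echo "end: $(date -u -d "@$((epoch + minutes * 60))" "$fmt")"
}
```

<p>For example, <strong>log_window '2022-01-01 00:05:00'</strong> prints a start time of 2021-12-31 23:55:00 and an end time of 2022-01-01 00:15:00.</p>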
</div>
<div class="section" id="ALM-13000__section1529716184534"><h4 class="sectiontitle">Alarm Clearing</h4><p id="ALM-13000__p4677152685316">After the fault is rectified, the system automatically clears this alarm.</p>
</div>
<div class="section" id="ALM-13000__sb2d3fdce13c3410687c752df0a484012"><h4 class="sectiontitle">Related Information</h4><p id="ALM-13000__en-us_topic_0070543632_p9607124">None</p>
</div>
</div>