doc-exports/docs/mrs/umn/ALM-45425.html
Yang, Tong 3b1f73dece MRS UMN 2.0.38.SP20 version
Reviewed-by: Hasko, Vladimir <vladimir.hasko@t-systems.com>
Co-authored-by: Yang, Tong <yangtong2@huawei.com>
Co-committed-by: Yang, Tong <yangtong2@huawei.com>
2022-12-13 12:03:34 +00:00


<a name="ALM-45425"></a>
<h1 class="topictitle1">ALM-45425 ClickHouse Service Unavailable</h1>
<div id="body1606211201997"><div class="section" id="ALM-45425__section8280367"><h4 class="sectiontitle">Description</h4><p id="ALM-45425__p2335053105020">The alarm module checks the ClickHouse instance status every 60 seconds. This alarm is generated when the alarm module detects that all ClickHouse instances are abnormal.</p>
<p id="ALM-45425__p34897906">This alarm is cleared when the system detects that at least one ClickHouse instance has recovered.</p>
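<p>The raise and clear conditions above can be illustrated with a short Python sketch. It is not the alarm module's actual implementation: the instance names are hypothetical, each instance is reduced to a normal/abnormal flag, each element of the input list stands for one 60-second check, and an empty status report is assumed not to raise the alarm.</p>

```python
from typing import Dict, List


def should_alarm(statuses: Dict[str, bool]) -> bool:
    """Return True if the ALM-45425 condition holds for one check.

    statuses maps a (hypothetical) instance name to True when the instance
    is normal and False when it is abnormal.
    """
    # Assumption: an empty report does not raise the alarm.
    # Otherwise the condition holds only when no instance is normal.
    return bool(statuses) and not any(statuses.values())


def alarm_events(checks: List[Dict[str, bool]]) -> List[str]:
    """Replay successive checks (one per 60-second interval) and report transitions."""
    events = []
    active = False
    for i, statuses in enumerate(checks):
        now_active = should_alarm(statuses)
        if now_active and not active:
            events.append(f"check {i}: alarm raised")
        elif active and not now_active:
            events.append(f"check {i}: alarm cleared")
        active = now_active
    return events


if __name__ == "__main__":
    checks = [
        {"ClickHouseServer-1": True, "ClickHouseServer-2": False},   # one instance still normal
        {"ClickHouseServer-1": False, "ClickHouseServer-2": False},  # all instances abnormal
        {"ClickHouseServer-1": False, "ClickHouseServer-2": True},   # one instance recovered
    ]
    for event in alarm_events(checks):
        print(event)
```

<p>Running the script prints <code>check 1: alarm raised</code> followed by <code>check 2: alarm cleared</code>: a single normal instance is enough to keep the alarm from being raised or to clear it.</p>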
</div>
<div class="section" id="ALM-45425__section7414445"><h4 class="sectiontitle">Attribute</h4>
<div class="tablenoborder"><table cellpadding="4" cellspacing="0" summary="" id="ALM-45425__table45079949" frame="border" border="1" rules="all"><thead align="left"><tr id="ALM-45425__row5683496"><th align="left" class="cellrowborder" valign="top" width="33.33333333333333%" id="mcps1.3.2.2.1.4.1.1"><p id="ALM-45425__p57710042">Alarm ID</p>
</th>
<th align="left" class="cellrowborder" valign="top" width="33.33333333333333%" id="mcps1.3.2.2.1.4.1.2"><p id="ALM-45425__p44001849">Alarm Severity</p>
</th>
<th align="left" class="cellrowborder" valign="top" width="33.33333333333333%" id="mcps1.3.2.2.1.4.1.3"><p id="ALM-45425__p7380012">Auto Clear</p>
</th>
</tr>
</thead>
<tbody><tr id="ALM-45425__row60910108"><td class="cellrowborder" valign="top" width="33.33333333333333%" headers="mcps1.3.2.2.1.4.1.1 "><p id="ALM-45425__p16488194717492">45425</p>
</td>
<td class="cellrowborder" valign="top" width="33.33333333333333%" headers="mcps1.3.2.2.1.4.1.2 "><p id="ALM-45425__p588994817496">Critical</p>
</td>
<td class="cellrowborder" valign="top" width="33.33333333333333%" headers="mcps1.3.2.2.1.4.1.3 "><p id="ALM-45425__p34071398">Yes</p>
</td>
</tr>
</tbody>
</table>
</div>
</div>
<div class="section" id="ALM-45425__section66730009"><h4 class="sectiontitle">Parameters</h4>
<div class="tablenoborder"><table cellpadding="4" cellspacing="0" summary="" id="ALM-45425__table8319831" frame="border" border="1" rules="all"><thead align="left"><tr id="ALM-45425__row40868022"><th align="left" class="cellrowborder" valign="top" width="50%" id="mcps1.3.3.2.1.3.1.1"><p id="ALM-45425__p21975462">Name</p>
</th>
<th align="left" class="cellrowborder" valign="top" width="50%" id="mcps1.3.3.2.1.3.1.2"><p id="ALM-45425__p35182007">Meaning</p>
</th>
</tr>
</thead>
<tbody><tr id="ALM-45425__row594512751512"><td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.3.2.1.3.1.1 "><p id="ALM-45425__p8838358184914">Source</p>
</td>
<td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.3.2.1.3.1.2 "><p id="ALM-45425__p837170125015">Specifies the cluster or system for which the alarm is generated.</p>
</td>
</tr>
<tr id="ALM-45425__row31170320"><td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.3.2.1.3.1.1 "><p id="ALM-45425__p39123317">ServiceName</p>
</td>
<td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.3.2.1.3.1.2 "><p id="ALM-45425__p172628810500">Specifies the service for which the alarm is generated.</p>
</td>
</tr>
<tr id="ALM-45425__row13127105964111"><td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.3.2.1.3.1.1 "><p id="ALM-45425__p8127135964119">RoleName</p>
</td>
<td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.3.2.1.3.1.2 "><p id="ALM-45425__p212715599414">Specifies the role for which the alarm is generated.</p>
</td>
</tr>
<tr id="ALM-45425__row722366124213"><td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.3.2.1.3.1.1 "><p id="ALM-45425__p522314610427">HostName</p>
</td>
<td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.3.2.1.3.1.2 "><p id="ALM-45425__p222314615429">Specifies the host for which the alarm is generated.</p>
</td>
</tr>
</tbody>
</table>
</div>
</div>
<div class="section" id="ALM-45425__section63699172"><h4 class="sectiontitle">Impact on the System</h4><p id="ALM-45425__p485055019508">The ClickHouse service is abnormal and its functions are unavailable. You cannot use FusionInsight Manager to perform cluster operations on the ClickHouse service.</p>
</div>
<div class="section" id="ALM-45425__section36421639"><h4 class="sectiontitle">Possible Causes</h4><p id="ALM-45425__p9402509505">The configuration in the <strong id="ALM-45425__b138851334174811">metrika.xml</strong> file in the component configuration directory on the node of the faulty ClickHouse instance is inconsistent with the configuration stored in ZooKeeper for that instance.</p>
</div>
<div class="section" id="ALM-45425__section2425015133012"><h4 class="sectiontitle">Procedure</h4><p id="ALM-45425__p19193152810241"><strong id="ALM-45425__b1709746184817">Check whether the configuration in metrika.xml of the ClickHouse instance is correct.</strong></p>
<ol id="ALM-45425__ol237743711398"><li id="ALM-45425__li1236345173817"><span>Log in to FusionInsight Manager, choose <strong id="ALM-45425__b14218193644911">Cluster</strong> &gt; <strong id="ALM-45425__b108741637174918">Services</strong> &gt; <strong id="ALM-45425__b156675392499">ClickHouse</strong> &gt; <strong id="ALM-45425__b1931619415498">Instance</strong>, and check whether the abnormal ClickHouse instance can be located based on the alarm information.</span><p><ul id="ALM-45425__ul81040199400"><li id="ALM-45425__li17104181984011">If yes, go to <a href="#ALM-45425__li237743710398">2</a>.</li><li id="ALM-45425__li17994722194010">If no, go to <a href="#ALM-45425__li62779304563">9</a>.</li></ul>
</p></li><li id="ALM-45425__li237743710398"><a name="ALM-45425__li237743710398"></a><a name="li237743710398"></a><span>Log in to the host where the ClickHouse service is abnormal and ping the IP address of another normal ClickHouse instance node to check whether the network connection is normal.</span><p><ul id="ALM-45425__ul20377537183913"><li id="ALM-45425__li937719375399">If yes, go to <a href="#ALM-45425__li156597363713">3</a>.</li><li id="ALM-45425__li1377113743915">If no, contact the network administrator to repair the network.</li></ul>
</p></li></ol><ol start="3" id="ALM-45425__ol1377123753914"><li id="ALM-45425__li156597363713"><a name="ALM-45425__li156597363713"></a><a name="li156597363713"></a><span>Choose <strong id="ALM-45425__b8458105618494">Cluster</strong> &gt; <strong id="ALM-45425__b12605585499">Services</strong> &gt; <strong id="ALM-45425__b8274106505">ClickHouse</strong> &gt; <strong id="ALM-45425__b1829212155015">Instance</strong>, click the abnormal instance name in the <strong id="ALM-45425__b10827438185013">Role</strong> column, click <strong id="ALM-45425__b1136175115013">Configurations</strong>, search for <strong id="ALM-45425__b0684140125114">macros.id</strong> in the search box, and record the value of <strong id="ALM-45425__b3555164510513">macros.id</strong> for the current instance.</span></li><li id="ALM-45425__li19429132053415"><span>Log in to the node where the ZooKeeper client is installed and log in to the ZooKeeper client.</span><p><p id="ALM-45425__p9605547340">Switch to the client installation directory.</p>
<p id="ALM-45425__p195631949103411">Example: <strong id="ALM-45425__b516712551287">cd <span id="ALM-45425__ph381512063917">/opt/client</span></strong></p>
<p id="ALM-45425__p589718591346">Run the following command to configure environment variables:</p>
<p id="ALM-45425__p1289715917344"><strong id="ALM-45425__b1365028103518">source bigdata_env</strong></p>
<p id="ALM-45425__p57261392351">Run the following command to authenticate the user (skip this step if the cluster is in normal mode, that is, Kerberos authentication is disabled):</p>
<p id="ALM-45425__p147265993516"><strong id="ALM-45425__b132617312353">kinit</strong> <em id="ALM-45425__i717193316356">Component service user</em></p>
<p id="ALM-45425__p148895181357">Run the following command to log in to the client tool:</p>
<p id="ALM-45425__p1788951812353"><strong id="ALM-45425__b148952348114927">zkCli.sh -server</strong> <em id="ALM-45425__i1519184681114927">service IP address of the node where the ZooKeeper role instance is located</em><strong id="ALM-45425__b2077879734114927">:</strong><em id="ALM-45425__i349509323114927">client port</em></p>
</p></li><li id="ALM-45425__li1377133713911"><a name="ALM-45425__li1377133713911"></a><a name="li1377133713911"></a><span>Run the following command to check whether the ClickHouse cluster topology information can be obtained:</span><p><p id="ALM-45425__p4971108016"><strong id="ALM-45425__b19865102974119">get /clickhouse/config/</strong><em id="ALM-45425__i42031330104119">value of <strong id="ALM-45425__b6169171439">macros.id</strong> obtained in step <a href="#ALM-45425__li156597363713">3</a></em><strong id="ALM-45425__b9865142916410">/metrika.xml</strong></p>
<ul id="ALM-45425__ul10208175012116"><li id="ALM-45425__li12081950162115">If yes, go to <a href="#ALM-45425__li1462431320505">6</a>.</li><li id="ALM-45425__li196898110224">If no, go to <a href="#ALM-45425__li62779304563">9</a>.</li></ul>
</p></li></ol><ol start="6" id="ALM-45425__ol14718127162219"><li id="ALM-45425__li1462431320505"><a name="ALM-45425__li1462431320505"></a><a name="li1462431320505"></a><span>Log in to the host where the abnormal ClickHouse instance resides, go to the configuration directory of the instance, and view the metrika.xml file.</span><p><p id="ALM-45425__p16691656519"><strong id="ALM-45425__b109351647191615">cd </strong>${BIGDATA_HOME}<strong id="ALM-45425__b1293564714163">/FusionInsight_ClickHouse_</strong><em id="ALM-45425__i16916913105111">Version</em><strong id="ALM-45425__b1148072195117">/</strong><em>x_x</em><strong id="ALM-45425__b14804211517">_ClickHouseServer/etc</strong></p>
<p id="ALM-45425__p1536563275112"><strong id="ALM-45425__b2015191445216">cat metrika.xml</strong></p>
</p></li><li id="ALM-45425__li67181127152212"><span>Check whether the cluster topology information obtained from ZooKeeper in step <a href="#ALM-45425__li1377133713911">5</a> is the same as that in the <strong id="ALM-45425__b968568191720">metrika.xml</strong> file in the component configuration directory viewed in step <a href="#ALM-45425__li1462431320505">6</a>.</span><p><ul id="ALM-45425__ul14389135010239"><li id="ALM-45425__li938905072317">If yes, check whether the alarm is cleared. If the alarm persists, go to <a href="#ALM-45425__li62779304563">9</a>.</li><li id="ALM-45425__li4691434182413">If no, go to <a href="#ALM-45425__li113661428132312">8</a>.</li></ul>
</p></li><li id="ALM-45425__li113661428132312"><a name="ALM-45425__li113661428132312"></a><a name="li113661428132312"></a><span>On FusionInsight Manager, choose <strong id="ALM-45425__b5845181819176">Cluster</strong> &gt; <strong id="ALM-45425__b448582020171">Services</strong> &gt; <strong id="ALM-45425__b17941025151713">ClickHouse</strong>, click <strong id="ALM-45425__b17333202719171">More</strong>, and select <strong id="ALM-45425__b2572112941715">Synchronize Configuration</strong>. Wait 5 minutes, and then check whether the service status is normal and the alarm is cleared.</span><p><ul id="ALM-45425__ul9689114518254"><li id="ALM-45425__li106893452251">If yes, no further action is required.</li><li id="ALM-45425__li5975185212515">If no, go to <a href="#ALM-45425__li62779304563">9</a>.</li></ul>
</p></li></ol>
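<p>Steps 5 to 7 compare two copies of the same configuration. The following Python sketch shows one way to make that comparison without being misled by indentation, line breaks, attribute order, or XML comments. It assumes that you have saved the XML returned by the <strong>get</strong> command (without the surrounding zkCli.sh output) and the content of the local <strong>metrika.xml</strong> file as text. The function names are illustrative and not part of MRS, and child element order is deliberately treated as significant, which is a conservative choice.</p>

```python
import xml.etree.ElementTree as ET


def zk_config_path(macros_id: str) -> str:
    """znode read in step 5: /clickhouse/config/<macros.id>/metrika.xml."""
    return f"/clickhouse/config/{macros_id}/metrika.xml"


def canonical(xml_text: str) -> tuple:
    """Build a comparable form of an XML document.

    Leading/trailing whitespace in element text is stripped, attributes are
    sorted, comments are dropped by the default parser, and child order is kept.
    """
    def walk(elem: ET.Element) -> tuple:
        return (
            elem.tag,
            tuple(sorted(elem.attrib.items())),
            (elem.text or "").strip(),
            tuple(walk(child) for child in elem),
        )

    return walk(ET.fromstring(xml_text))


def topology_matches(zk_text: str, local_text: str) -> bool:
    """Return True if the ZooKeeper copy and the local metrika.xml are equivalent."""
    return canonical(zk_text) == canonical(local_text)


if __name__ == "__main__":
    # Hypothetical stand-ins for the two copies; real content comes from
    # the get command in step 5 and the cat command in step 6.
    zk_copy = "<config>\n  <node host='host-a' port='9000'/>\n</config>"
    local_copy = '<config><node port="9000" host="host-a"/></config>'
    print(zk_config_path("1"))
    print(topology_matches(zk_copy, local_copy))
```

<p>With the sample input, the script prints <code>/clickhouse/config/1/metrika.xml</code> and then <code>True</code>, because the two copies differ only in layout and attribute order. The <strong>macros.id</strong> value <code>1</code> is only an example.</p>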
<p class="tableheading" id="ALM-45425__p3847019615437"><strong id="ALM-45425__b1068744715437">Collect the fault information.</strong></p>
<ol start="9" id="ALM-45425__ol22771530115614"><li id="ALM-45425__li62779304563"><a name="ALM-45425__li62779304563"></a><a name="li62779304563"></a><span>On FusionInsight Manager, choose <strong id="ALM-45425__b130764118326">O&amp;M</strong>. In the navigation pane on the left, choose <strong id="ALM-45425__b16308134113329">Log</strong> &gt; <strong id="ALM-45425__b1130934163211">Download</strong>.</span></li><li id="ALM-45425__li1627710305566"><span>Expand the <strong id="ALM-45425__b204619772465020">Service</strong> drop-down list, and select <strong id="ALM-45425__b56152748265020">ClickHouse</strong> for the target cluster.</span></li><li id="ALM-45425__li1686655576"><span>Choose the corresponding host from the host list.</span></li><li id="ALM-45425__li14277163085615"><span>Click <span><img id="ALM-45425__image152778301560" src="en-us_image_0295554634.png"></span> in the upper right corner, and set <strong id="ALM-45425__b142541059865020">Start Date</strong> and <strong id="ALM-45425__b142525258565020">End Date</strong> for log collection to 1 hour before and 1 hour after the alarm generation time, respectively. Then, click <strong id="ALM-45425__b82449652165020">Download</strong>.</span></li><li id="ALM-45425__li132771130125615"><span>Contact <span id="ALM-45425__text45815613394">O&amp;M personnel</span> and provide the collected logs.</span></li></ol>
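<p>As a worked example of the time window in step 12, the sketch below computes both values from the alarm generation time. The function name and the sample time are illustrative only; enter the results in the <strong>Start Date</strong> and <strong>End Date</strong> fields in whatever format FusionInsight Manager displays.</p>

```python
from datetime import datetime, timedelta
from typing import Tuple


def log_window(alarm_time: datetime,
               margin: timedelta = timedelta(hours=1)) -> Tuple[datetime, datetime]:
    """Return (start, end) for log collection around the alarm generation time."""
    return alarm_time - margin, alarm_time + margin


if __name__ == "__main__":
    # Hypothetical alarm generation time shortly after midnight.
    start, end = log_window(datetime(2022, 12, 13, 0, 30))
    print(start)  # 2022-12-12 23:30:00
    print(end)    # 2022-12-13 01:30:00
```

<p>The sample time is chosen so that the start of the window falls on the previous day, which is easy to overlook when the date fields are filled in by hand.</p>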
</div>
<div class="section" id="ALM-45425__section169311343318"><h4 class="sectiontitle">Alarm Clearing</h4><p id="ALM-45425__p55781648135011">This alarm is automatically cleared after the fault is rectified.</p>
</div>
<div class="section" id="ALM-45425__section53362350"><h4 class="sectiontitle">Related Information</h4><p id="ALM-45425__p7522741">None</p>
</div>
</div>
<div>
<div class="familylinks">
<div class="parentlink"><strong>Parent topic:</strong> <a href="mrs_01_1298.html">Alarm Reference (Applicable to MRS 3.x)</a></div>
</div>
</div>