Reviewed-by: Pruthi, Vineet <vineet.pruthi@t-systems.com> Co-authored-by: Yang, Tong <yangtong2@huawei.com> Co-committed-by: Yang, Tong <yangtong2@huawei.com>
<a name="ALM-45444"></a>
<h1 class="topictitle1">ALM-45444 Abnormal ClickHouse Process</h1>
<div id="body63238839"><div class="note" id="ALM-45444__note8913155652611"><img src="public_sys-resources/note_3.0-en-us.png"><span class="notetitle"> </span><div class="notebody"><p id="ALM-45444__p10913195632614">This section applies only to MRS 3.3.0 or later.</p>
</div></div>
<div class="section" id="ALM-45444__section209151456122611"><h4 class="sectiontitle"><span id="ALM-45444__text14838183534515">Alarm Description</span></h4><p id="ALM-45444__p107817516304">The health check module checks each ClickHouse instance every 30 seconds. This alarm is generated when the number of consecutive check failures exceeds the threshold. When the alarm is generated, the ClickHouse process may have stopped responding, and services cannot run properly.</p>
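<p>The consecutive-failure rule described above can be sketched as follows. This is a minimal illustration, not the health check module's actual code: the class name, the threshold value of 3, and the choice to clear the alarm on the first successful check are all assumptions made for the example.</p>

```python
from dataclasses import dataclass


@dataclass
class ConsecutiveFailureMonitor:
    """Tracks consecutive health-check failures for one instance (illustrative only)."""

    threshold: int              # hypothetical value; the real threshold is not stated here
    failures: int = 0
    alarm_active: bool = False

    def record(self, check_ok: bool) -> bool:
        """Record one check result and return whether the alarm is active."""
        if check_ok:
            # Assumption: one successful check resets the streak and clears the alarm.
            self.failures = 0
            self.alarm_active = False
        else:
            self.failures += 1
            # The alarm is raised only once the streak exceeds the threshold.
            if self.failures > self.threshold:
                self.alarm_active = True
        return self.alarm_active


# Four failed checks in a row, followed by one successful check.
monitor = ConsecutiveFailureMonitor(threshold=3)
results = [monitor.record(ok) for ok in (False, False, False, False, True)]
```

<p>In this sketch the alarm becomes active on the fourth consecutive failure, because the streak must exceed the threshold rather than merely reach it.</p>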
</div>
<div class="section" id="ALM-45444__section191525620261"><h4 class="sectiontitle"><span id="ALM-45444__text66488119489">Alarm Attributes</span></h4>
<div class="tablenoborder"><table cellpadding="4" cellspacing="0" summary="" id="ALM-45444__table591515562262" frame="border" border="1" rules="all"><thead align="left"><tr id="ALM-45444__row5915956102610"><th align="left" class="cellrowborder" valign="top" width="33.33333333333333%" id="mcps1.3.3.2.1.4.1.1"><p id="ALM-45444__p19151656172610"><span id="ALM-45444__text1074744511529">Alarm ID</span></p>
</th>
<th align="left" class="cellrowborder" valign="top" width="33.33333333333333%" id="mcps1.3.3.2.1.4.1.2"><p id="ALM-45444__p2091518563265"><span id="ALM-45444__text529420513457">Alarm Severity</span></p>
</th>
<th align="left" class="cellrowborder" valign="top" width="33.33333333333333%" id="mcps1.3.3.2.1.4.1.3"><p id="ALM-45444__p13915135612268"><span id="ALM-45444__text139206232502">Auto Cleared</span></p>
</th>
</tr>
</thead>
<tbody><tr id="ALM-45444__row1391585642617"><td class="cellrowborder" valign="top" width="33.33333333333333%" headers="mcps1.3.3.2.1.4.1.1 "><p id="ALM-45444__p11915185652619">45444</p>
</td>
<td class="cellrowborder" valign="top" width="33.33333333333333%" headers="mcps1.3.3.2.1.4.1.2 "><p id="ALM-45444__p12915165652616">Critical</p>
</td>
<td class="cellrowborder" valign="top" width="33.33333333333333%" headers="mcps1.3.3.2.1.4.1.3 "><p id="ALM-45444__p1691625612265">Yes</p>
</td>
</tr>
</tbody>
</table>
</div>
</div>
<div class="section" id="ALM-45444__section19916165682616"><h4 class="sectiontitle"><span id="ALM-45444__text0580183514489">Alarm Parameters</span></h4>
<div class="tablenoborder"><table cellpadding="4" cellspacing="0" summary="" id="ALM-45444__table189161756162610" frame="border" border="1" rules="all"><thead align="left"><tr id="ALM-45444__row1891614568262"><th align="left" class="cellrowborder" valign="top" width="50%" id="mcps1.3.4.2.1.3.1.1"><p id="ALM-45444__p09167563266"><span id="ALM-45444__text12210145419505">Parameter</span></p>
</th>
<th align="left" class="cellrowborder" valign="top" width="50%" id="mcps1.3.4.2.1.3.1.2"><p id="ALM-45444__p16916185622616"><span id="ALM-45444__text1971012173566">Description</span></p>
</th>
</tr>
</thead>
<tbody><tr id="ALM-45444__row5916195672615"><td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.4.2.1.3.1.1 "><p id="ALM-45444__p591665612614">Source</p>
</td>
<td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.4.2.1.3.1.2 "><p id="ALM-45444__p189166561264">Specifies the cluster or system for which the alarm is generated.</p>
</td>
</tr>
<tr id="ALM-45444__row13916145652618"><td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.4.2.1.3.1.1 "><p id="ALM-45444__p991619562263">ServiceName</p>
</td>
<td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.4.2.1.3.1.2 "><p id="ALM-45444__p99162564261">Specifies the service for which the alarm is generated.</p>
</td>
</tr>
<tr id="ALM-45444__row391615613263"><td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.4.2.1.3.1.1 "><p id="ALM-45444__p2091675612260">RoleName</p>
</td>
<td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.4.2.1.3.1.2 "><p id="ALM-45444__p14916256182615">Specifies the role for which the alarm is generated.</p>
</td>
</tr>
<tr id="ALM-45444__row16916105602616"><td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.4.2.1.3.1.1 "><p id="ALM-45444__p89171556112620">HostName</p>
</td>
<td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.4.2.1.3.1.2 "><p id="ALM-45444__p1091719564268">Specifies the host for which the alarm is generated.</p>
</td>
</tr>
</tbody>
</table>
</div>
</div>
<div class="section" id="ALM-45444__section149178567269"><h4 class="sectiontitle"><span id="ALM-45444__text1127833410585">Impact on the System</span></h4><p id="ALM-45444__p10917125632616">If the ClickHouse process is abnormal, services cannot run properly.</p>
</div>
<div class="section" id="ALM-45444__section13917856122619"><h4 class="sectiontitle"><span id="ALM-45444__text10245783115">Possible Causes</span></h4><p id="ALM-45444__p697817318317">The ClickHouse process is running abnormally, for example, it has stopped responding to health checks.</p>
</div>
<div class="section" id="ALM-45444__section14917145612262"><h4 class="sectiontitle"><span id="ALM-45444__text35421632154">Handling Procedure</span></h4><ol id="ALM-45444__ol159175560267"><li id="ALM-45444__li129171956202619"><span>Log in to FusionInsight Manager, choose <strong id="ALM-45444__b10613153985019">O&amp;M</strong> &gt; <strong id="ALM-45444__b136147391504">Alarm</strong> &gt; <strong id="ALM-45444__b7614123916503">Alarms</strong>, and obtain the role name and the IP address of the host shown in <strong id="ALM-45444__b1614153915014">Location</strong>.</span></li><li id="ALM-45444__li49171056152620"><span>Log in to the node where the client is installed as the client installation user and run the following commands:</span><p><p id="ALM-45444__p291714564260"><strong id="ALM-45444__b1410534311503">cd </strong><em id="ALM-45444__i2105543105018">{Client installation path}</em></p>
<p id="ALM-45444__p1091718563266"><strong id="ALM-45444__b189171056162620">source bigdata_env</strong></p>
<ul id="ALM-45444__ul2091745610265"><li id="ALM-45444__li1191795642616">For a cluster with Kerberos authentication enabled (security mode):<p id="ALM-45444__p991855611266"><a name="ALM-45444__li1191795642616"></a><a name="li1191795642616"></a><strong id="ALM-45444__b09186566263">kinit</strong> <em id="ALM-45444__i59181356172615">Component service user</em></p>
<p id="ALM-45444__p2918155682611"><strong id="ALM-45444__b20961153444222">clickhouse client --host </strong><em id="ALM-45444__i4851913054222">IP address of the ClickHouseServer instance that reports the alarm</em><strong id="ALM-45444__b14091716574222"> --port </strong>9440 <strong id="ALM-45444__b19430793294222">--secure</strong></p>
</li><li id="ALM-45444__li14918205622613">For a cluster with Kerberos authentication disabled (normal mode):<p id="ALM-45444__p13943125414375"><a name="ALM-45444__li14918205622613"></a><a name="li14918205622613"></a><strong id="ALM-45444__b12287888974222">clickhouse client --host </strong><em id="ALM-45444__i8086596194222">IP address of the ClickHouseServer instance that reports the alarm</em> <strong id="ALM-45444__b6097517624222">--user </strong><em id="ALM-45444__i20592752064222">Username</em><strong id="ALM-45444__b15603673404222"> --password</strong><strong id="ALM-45444__b19596656294222"> --port </strong>9000</p>
</li></ul>
</p></li><li id="ALM-45444__li1891825612618"><span>Run the following statement and check whether a result is returned properly:</span><p><p id="ALM-45444__p1891845617264"><strong id="ALM-45444__b991815682610">SELECT 1;</strong></p>
<ul id="ALM-45444__ul6918125672612"><li id="ALM-45444__li891825632617">If yes, go to <a href="#ALM-45444__li611216137">4</a>.</li><li id="ALM-45444__li109188563267">If no, go to <a href="#ALM-45444__li179191356102616">5</a>.</li></ul>
</p></li><li id="ALM-45444__li611216137"><a name="ALM-45444__li611216137"></a><a name="li611216137"></a><span>Wait for several minutes and check whether the alarm is cleared.</span><p><ul id="ALM-45444__ul1344192818139"><li id="ALM-45444__li1244113287134">If yes, no further action is required.</li><li id="ALM-45444__li16441182851319">If no, go to <a href="#ALM-45444__li179191356102616">5</a>.</li></ul>
</p></li></ol>
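<p>The connection and <strong>SELECT 1</strong> steps above can also be run non-interactively. The following is a minimal sketch, assuming that <strong>source bigdata_env</strong> (and <strong>kinit</strong> in security mode) has already been run in the same shell. The function names, the 30-second timeout, and the placeholder address 192.0.2.10 are hypothetical; the <strong>--query</strong> option of the ClickHouse client is used to pass the statement.</p>

```python
import subprocess


def build_probe_command(host, secure, user=None):
    """Build a clickhouse client command that runs SELECT 1 non-interactively."""
    cmd = ["clickhouse", "client", "--host", host]
    if secure:
        # Security mode: port 9440 with --secure, as in the procedure above.
        cmd += ["--port", "9440", "--secure"]
    else:
        # Normal mode: user name, password prompt, and port 9000.
        cmd += ["--user", user, "--password", "--port", "9000"]
    cmd += ["--query", "SELECT 1"]
    return cmd


def probe(cmd, timeout_s=30):
    """Return True if the command exits normally and prints 1, else False."""
    try:
        done = subprocess.run(cmd, stdout=subprocess.PIPE, text=True, timeout=timeout_s)
    except (OSError, subprocess.TimeoutExpired):
        # Client not found, not executable, or no answer within the timeout.
        return False
    return done.returncode == 0 and done.stdout.strip() == "1"


# 192.0.2.10 is a placeholder address, not a real ClickHouseServer instance.
secure_cmd = build_probe_command("192.0.2.10", secure=True)
```

<p>Calling <strong>probe(secure_cmd)</strong> corresponds to the yes/no check in step 3. In normal mode the password prompt remains interactive, so the sketch is best suited to security mode.</p>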
<p id="ALM-45444__p15919856122610"><strong id="ALM-45444__b4892621132610">Collect fault information.</strong></p>
<ol start="5" id="ALM-45444__ol8919105662610"><li id="ALM-45444__li179191356102616"><a name="ALM-45444__li179191356102616"></a><a name="li179191356102616"></a><span>On FusionInsight Manager, choose <strong id="ALM-45444__b1383175624222">O&amp;M</strong>. In the navigation pane on the left, choose <strong id="ALM-45444__b1397912554222">Log</strong> &gt; <strong id="ALM-45444__b10365783414222">Download</strong>.</span></li><li id="ALM-45444__li9919165619265"><span>Expand the <strong id="ALM-45444__b7479613644222">Service</strong> drop-down list, and select <strong id="ALM-45444__b16555479604222">ClickHouse</strong> for the target cluster.</span></li><li id="ALM-45444__li18919155619260"><span>Expand the <strong id="ALM-45444__b21469614804222">Hosts</strong> drop-down list. In the <strong id="ALM-45444__b11690820864222">Select Host</strong> dialog box that is displayed, select the abnormal host, and click <strong id="ALM-45444__b18662919404222">OK</strong>.</span></li><li id="ALM-45444__li6919456142616"><span>Click the edit icon in the upper right corner, and set <strong id="ALM-45444__b19405663844222">Start Date</strong> and <strong id="ALM-45444__b12284045324222">End Date</strong> for log collection to 1 hour before and 1 hour after the alarm generation time, respectively. Then, click <strong id="ALM-45444__b13847233054222">Download</strong>.</span></li><li id="ALM-45444__li109197563265"><span>Contact <span id="ALM-45444__text209195568266">O&amp;M personnel</span> and provide the collected logs.</span></li></ol>
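<p>The log collection window in the steps above is the alarm generation time minus and plus 1 hour. A minimal sketch of that arithmetic follows; the function name and the sample alarm time are hypothetical and used only for illustration.</p>

```python
from datetime import datetime, timedelta


def log_window(alarm_time, margin=timedelta(hours=1)):
    """Return (start, end) covering one margin before and after the alarm time."""
    return alarm_time - margin, alarm_time + margin


# Hypothetical alarm generation time, chosen to show a window that crosses midnight.
start, end = log_window(datetime(2024, 5, 1, 0, 30))
```

<p>Here <strong>start</strong> falls on the previous day, which is a reminder to set both the date and the time when entering <strong>Start Date</strong>.</p>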
</div>
<div class="section" id="ALM-45444__section169191156132611"><h4 class="sectiontitle"><span id="ALM-45444__text976142215819">Alarm Clearance</span></h4><p id="ALM-45444__p7919156132617">This alarm is automatically cleared after the fault is rectified.</p>
</div>
<div class="section" id="ALM-45444__section891955662611"><h4 class="sectiontitle"><span id="ALM-45444__text13373191116114">Related Information</span></h4><p id="ALM-45444__p139191756122619"><span id="ALM-45444__text13669101910115">None.</span></p>
</div>
</div>
<div>
<div class="familylinks">
<div class="parentlink"><strong>Parent topic:</strong> <a href="mrs_01_1298.html">Alarm Reference (Applicable to MRS 3.x)</a></div>
</div>
</div>