doc-exports/docs/mrs/umn/ALM-45443.html
Yang, Tong 5914b67d13 MRS UMN Doc 20240802 version
Reviewed-by: Pruthi, Vineet <vineet.pruthi@t-systems.com>
Co-authored-by: Yang, Tong <yangtong2@huawei.com>
Co-committed-by: Yang, Tong <yangtong2@huawei.com>
2024-09-28 19:04:58 +00:00


<a name="ALM-45443"></a>
<h1 class="topictitle1">ALM-45443 Slow SQL Queries in the Cluster</h1>
<div id="body63357319"><div class="note" id="ALM-45443__note12303191265"><img src="public_sys-resources/note_3.0-en-us.png"><span class="notetitle"> </span><div class="notebody"><p id="ALM-45443__p1431519566">This section applies only to MRS 3.3.0 or later.</p>
</div></div>
<div class="section" id="ALM-45443__section4181191543314"><h4 class="sectiontitle"><span id="ALM-45443__text14838183534515">Alarm Description</span></h4><p id="ALM-45443__p182761827181">The system checks ClickHouse for slow SQL queries every minute. This alarm is generated when the execution time of an SQL statement is longer than or equal to the slow SQL threshold.</p>
<p id="ALM-45443__p88950108215">This alarm is automatically cleared when the system detects that the execution time of the SQL statement is shorter than the slow SQL threshold.</p>
</div>
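<p>The generation and clearance rule described above can be sketched as a simple comparison. This is only an illustration: the function name <strong>slow_sql_state</strong> and the 600-second threshold are assumptions chosen for the example, not values taken from MRS.</p>

```shell
#!/bin/sh
# Hypothetical sketch of the alarm rule: compare the execution time of one
# statement with the slow SQL threshold (both in whole seconds).
# slow_sql_state is an illustrative name, not part of MRS or ClickHouse.
slow_sql_state() {
    elapsed_s=$1
    threshold_s=$2
    if [ "$elapsed_s" -ge "$threshold_s" ]; then
        echo "ALARM"      # execution time at or above the threshold
    else
        echo "CLEARED"    # execution time below the threshold
    fi
}

# Example with an assumed threshold of 600 seconds.
slow_sql_state 600 600   # prints ALARM
slow_sql_state 120 600   # prints CLEARED
```

<p>A statement that runs for exactly the threshold time already counts as slow, because the comparison is "longer than or equal to".</p>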
<div class="section" id="ALM-45443__section6432132533414"><h4 class="sectiontitle"><span id="ALM-45443__text66488119489">Alarm Attributes</span></h4>
<div class="tablenoborder"><table cellpadding="4" cellspacing="0" summary="" id="ALM-45443__table15811244124611" frame="border" border="1" rules="all"><thead align="left"><tr id="ALM-45443__row115971544184611"><th align="left" class="cellrowborder" valign="top" width="33.33333333333333%" id="mcps1.3.3.2.1.4.1.1"><p id="ALM-45443__p12597174434618"><span id="ALM-45443__text1074744511529">Alarm ID</span></p>
</th>
<th align="left" class="cellrowborder" valign="top" width="33.33333333333333%" id="mcps1.3.3.2.1.4.1.2"><p id="ALM-45443__p5597114494615"><span id="ALM-45443__text529420513457">Alarm Severity</span></p>
</th>
<th align="left" class="cellrowborder" valign="top" width="33.33333333333333%" id="mcps1.3.3.2.1.4.1.3"><p id="ALM-45443__p1559716445469"><span id="ALM-45443__text139206232502">Auto Cleared</span></p>
</th>
</tr>
</thead>
<tbody><tr id="ALM-45443__row155971644124612"><td class="cellrowborder" valign="top" width="33.33333333333333%" headers="mcps1.3.3.2.1.4.1.1 "><p id="ALM-45443__p65978447466">45443</p>
</td>
<td class="cellrowborder" valign="top" width="33.33333333333333%" headers="mcps1.3.3.2.1.4.1.2 "><p id="ALM-45443__p13598344144611">Major</p>
</td>
<td class="cellrowborder" valign="top" width="33.33333333333333%" headers="mcps1.3.3.2.1.4.1.3 "><p id="ALM-45443__p175981544194611">Yes</p>
</td>
</tr>
</tbody>
</table>
</div>
</div>
<div class="section" id="ALM-45443__section105471213143515"><h4 class="sectiontitle"><span id="ALM-45443__text0580183514489">Alarm Parameters</span></h4>
<div class="tablenoborder"><table cellpadding="4" cellspacing="0" summary="" id="ALM-45443__table132164271583" frame="border" border="1" rules="all"><thead align="left"><tr id="ALM-45443__row92761127482"><th align="left" class="cellrowborder" valign="top" width="50%" id="mcps1.3.4.2.1.3.1.1"><p id="ALM-45443__p12276527485"><span id="ALM-45443__text12210145419505">Parameter</span></p>
</th>
<th align="left" class="cellrowborder" valign="top" width="50%" id="mcps1.3.4.2.1.3.1.2"><p id="ALM-45443__p72767277812"><span id="ALM-45443__text1971012173566">Description</span></p>
</th>
</tr>
</thead>
<tbody><tr id="ALM-45443__row1427616274819"><td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.4.2.1.3.1.1 "><p id="ALM-45443__p162761627283">Source</p>
</td>
<td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.4.2.1.3.1.2 "><p id="ALM-45443__p1427611271585">Specifies the cluster or system for which the alarm is generated.</p>
</td>
</tr>
<tr id="ALM-45443__row5276827783"><td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.4.2.1.3.1.1 "><p id="ALM-45443__p52764271086">ServiceName</p>
</td>
<td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.4.2.1.3.1.2 "><p id="ALM-45443__p32763271180">Specifies the service for which the alarm is generated.</p>
</td>
</tr>
<tr id="ALM-45443__row2027616271585"><td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.4.2.1.3.1.1 "><p id="ALM-45443__p122762271287">RoleName</p>
</td>
<td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.4.2.1.3.1.2 "><p id="ALM-45443__p2276327885">Specifies the role for which the alarm is generated.</p>
</td>
</tr>
<tr id="ALM-45443__row5276202713819"><td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.4.2.1.3.1.1 "><p id="ALM-45443__p202768273810">HostName</p>
</td>
<td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.4.2.1.3.1.2 "><p id="ALM-45443__p1227618271580">Specifies the host for which the alarm is generated.</p>
</td>
</tr>
</tbody>
</table>
</div>
</div>
<div class="section" id="ALM-45443__section0918121233917"><h4 class="sectiontitle"><span id="ALM-45443__text1127833410585">Impact on the System</span></h4><p id="ALM-45443__p115222903917">The performance of the ClickHouse service deteriorates, which slows the response of other services. If there are too many slow SQL statements, the service may be unavailable.</p>
</div>
<div class="section" id="ALM-45443__section15920165211392"><h4 class="sectiontitle"><span id="ALM-45443__text10245783115">Possible Causes</span></h4><ul id="ALM-45443__ul99361645184215"><li id="ALM-45443__li293674514216">The ClickHouse service is overloaded.</li><li id="ALM-45443__li12936945154211">The execution of SQL statements takes a long time.</li></ul>
</div>
<div class="section" id="ALM-45443__section1437654425314"><h4 class="sectiontitle"><span id="ALM-45443__text35421632154">Handling Procedure</span></h4><p id="ALM-45443__p15198192793613"><strong id="ALM-45443__b1078011002714">Check whether the ClickHouse service load is heavy.</strong></p>
<ol id="ALM-45443__ol1031577544"><li id="ALM-45443__li891013335416"><span>Log in to FusionInsight Manager, choose <strong id="ALM-45443__b12255162514213">O&amp;M</strong> &gt; <strong id="ALM-45443__b1025515259214">Alarm</strong> &gt; <strong id="ALM-45443__b625515259212">Alarms</strong>, and view the role name and the IP address for the hostname in <strong id="ALM-45443__b925532512210">Location</strong>.</span></li><li id="ALM-45443__li20952135802"><span>Log in to the node where the client is installed as the client installation user and run the following commands:</span><p><p id="ALM-45443__p20951185605"><strong id="ALM-45443__b4951351605">cd </strong><em id="ALM-45443__i49516515011">{Client installation path}</em></p>
<p id="ALM-45443__p10951851109"><strong id="ALM-45443__b895175403">source bigdata_env</strong></p>
<ul id="ALM-45443__ul119521151019"><li id="ALM-45443__li12952051017">For a cluster with Kerberos authentication enabled (security mode):<p id="ALM-45443__p199511657011"><a name="ALM-45443__li12952051017"></a><a name="li12952051017"></a><strong id="ALM-45443__b159515511014">kinit</strong> <em id="ALM-45443__i69513518010">Component service user</em></p>
<p id="ALM-45443__p20952752006"><strong id="ALM-45443__b26971317152718">clickhouse client --host </strong><em id="ALM-45443__i3697161712279">IP address of the ClickHouseServer instance that reports the alarm</em><strong id="ALM-45443__b18697191718273"> --port </strong><em>Port number</em> <strong id="ALM-45443__b4697151711279">--secure</strong></p>
</li><li id="ALM-45443__li59521456012">For a cluster with Kerberos authentication disabled (normal mode):<p id="ALM-45443__p4952175200"><a name="ALM-45443__li59521456012"></a><a name="li59521456012"></a><strong id="ALM-45443__b736817376278">clickhouse client --host </strong><em id="ALM-45443__i6368193732719">IP address of the ClickHouseServer instance that reports the alarm</em> <strong id="ALM-45443__b1436817375276">--user </strong><em id="ALM-45443__i16369153782713">Username</em><strong id="ALM-45443__b11369137182717"> --password</strong><strong id="ALM-45443__b4369193717274"> --port </strong><em>Port number</em></p>
</li></ul>
</p></li><li id="ALM-45443__li891033318545"><span>Run the following statement to query the system table and check whether data is being written frequently. If it is, wait until the running jobs are complete, and then check whether the alarm is cleared.</span><p><p id="ALM-45443__p89521551509"><strong id="ALM-45443__b4952195308">SELECT query_id, user, FQDN(), elapsed, query FROM system.processes ORDER BY query_id;</strong></p>
<ul id="ALM-45443__ul99521851808"><li id="ALM-45443__li18952658011">If yes, no further action is required.</li><li id="ALM-45443__li19952159015">If no, go to <a href="#ALM-45443__li1927623020184">4</a>.</li></ul>
</p></li></ol>
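<p>One way to judge whether data is being written frequently is to count the running statements that start with INSERT. The sketch below assumes that the result of the statement above has been saved as tab-separated rows with the columns in the same order (query_id, user, host, elapsed, query). The function name <strong>count_running_inserts</strong> and the sample rows are hypothetical.</p>

```shell
#!/bin/sh
# count_running_inserts (illustrative helper, not an MRS or ClickHouse tool)
# Reads tab-separated rows (query_id, user, host, elapsed, query) on stdin
# and prints how many running statements start with INSERT, ignoring case.
count_running_inserts() {
    awk -F '\t' 'toupper($5) ~ /^[ ]*INSERT/ { n++ } END { print n + 0 }'
}

# Made-up sample rows standing in for real system.processes output.
printf 'q1\talice\thost1\t3.2\tSELECT 1\nq2\tbob\thost1\t950.7\tINSERT INTO t VALUES (1)\nq3\tcarol\thost1\t12.0\tinsert into t select 2\n' | count_running_inserts
```

<p>For the sample rows the helper prints 2. What count qualifies as "frequent" depends on the workload and is not defined in this topic.</p>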
<p id="ALM-45443__p965203101"><strong id="ALM-45443__b27182813319">Check whether the SQL statements take a long time.</strong></p>
<ol start="4" id="ALM-45443__ol192761930121810"><li id="ALM-45443__li1927623020184"><a name="ALM-45443__li1927623020184"></a><a name="li1927623020184"></a><span>Check the logical cluster to which the alarm object belongs. Log in to FusionInsight Manager, click <strong id="ALM-45443__b139171459203116">Cluster</strong>, choose <strong id="ALM-45443__b529213243210">Services</strong> &gt; <strong id="ALM-45443__b8765523217">ClickHouse</strong>, and click <strong id="ALM-45443__b4693853214">Logic Cluster</strong>. On the displayed page, choose <strong id="ALM-45443__b1741973593018">Query Management</strong> &gt; <strong id="ALM-45443__b15317161653516">Ongoing Slow Queries</strong>. Identify the SQL statements that take a long time, work with the user to adjust the services and optimize the slow SQL statements, and check whether the optimization is successful.</span><p><ul id="ALM-45443__ul94391310111413"><li id="ALM-45443__li543915105142">If yes, go to <a href="#ALM-45443__li1043716190409">5</a>.</li><li id="ALM-45443__li14391510141414">If no, go to <a href="#ALM-45443__li6769733151816">6</a>.</li></ul>
</p></li><li id="ALM-45443__li1043716190409"><a name="ALM-45443__li1043716190409"></a><a name="li1043716190409"></a><span>After the SQL statements are complete, check whether the alarm is cleared.</span><p><ul id="ALM-45443__ul1437419154020"><li id="ALM-45443__li11437111916404">If yes, no further action is required.</li><li id="ALM-45443__li1743761904019">If no, go to <a href="#ALM-45443__li6769733151816">6</a>.</li></ul>
</p></li></ol>
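<p>As a command-line alternative to the Manager page, the statement used earlier in this procedure can be narrowed to long-running queries by filtering on the <strong>elapsed</strong> column of <strong>system.processes</strong>. The sketch below only builds the SQL text. The function name <strong>build_ongoing_slow_sql</strong> and the 600-second value are assumptions for illustration, and the statement is not claimed to be what the Manager page runs internally.</p>

```shell
#!/bin/sh
# build_ongoing_slow_sql THRESHOLD_SECONDS (illustrative helper)
# Prints a statement that lists running queries whose elapsed time, in
# seconds, is at or above the given threshold, longest-running first.
build_ongoing_slow_sql() {
    threshold_s=$1
    case "$threshold_s" in
        ''|*[!0-9]*) return 1 ;;   # reject empty or non-numeric input
    esac
    printf '%s\n' "SELECT query_id, user, FQDN(), elapsed, query FROM system.processes WHERE elapsed >= ${threshold_s} ORDER BY elapsed DESC;"
}

# Example with an assumed threshold of 600 seconds.
build_ongoing_slow_sql 600
```

<p>The printed statement can be pasted into the ClickHouse client session opened earlier in this procedure.</p>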
<p id="ALM-45443__p348310210267"><strong id="ALM-45443__b4892621132610">Collect fault information.</strong></p>
<ol start="6" id="ALM-45443__ol14770133318187"><li id="ALM-45443__li6769733151816"><a name="ALM-45443__li6769733151816"></a><a name="li6769733151816"></a><span>On FusionInsight Manager, choose <strong id="ALM-45443__b14542949524230">O&amp;M</strong>. In the navigation pane on the left, choose <strong id="ALM-45443__b3266283604230">Log</strong> &gt; <strong id="ALM-45443__b15042013484230">Download</strong>.</span></li><li id="ALM-45443__li10902033134212"><span>Expand the <strong id="ALM-45443__b13861038044230">Service</strong> drop-down list, and select <strong id="ALM-45443__b3274132804230">ClickHouse</strong> for the target cluster.</span></li><li id="ALM-45443__li1848161911347"><span>Expand the <strong id="ALM-45443__b12642628784230">Hosts</strong> drop-down list. In the <strong id="ALM-45443__b20837869254230">Select Host</strong> dialog box that is displayed, select the abnormal host, and click <strong id="ALM-45443__b18862369244230">OK</strong>.</span></li><li id="ALM-45443__li181213284341"><span>Click the edit icon in the upper right corner, and set <strong id="ALM-45443__b14271849014230">Start Date</strong> and <strong id="ALM-45443__b5515233344230">End Date</strong> for log collection to 1 hour ahead of and after the alarm generation time, respectively. Then, click <strong id="ALM-45443__b2987094994230">Download</strong>.</span></li><li id="ALM-45443__li1539653315345"><span>Contact <span id="ALM-45443__text193961533193414">O&amp;M personnel</span> and provide the collected logs.</span></li></ol>
</div>
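<p>The log collection window in the procedure above is plain arithmetic on the alarm generation time. The sketch below works in epoch seconds. The function name <strong>log_window</strong> and the sample timestamp are hypothetical, and converting the results into the date format that the Manager page expects is left out.</p>

```shell
#!/bin/sh
# log_window ALARM_EPOCH_SECONDS (illustrative helper)
# Prints the start and end of the collection window as epoch seconds:
# one hour before and one hour after the alarm generation time.
log_window() {
    alarm_epoch=$1
    echo "$((alarm_epoch - 3600)) $((alarm_epoch + 3600))"
}

# Example with a made-up alarm time.
log_window 1727550298   # prints 1727546698 1727553898
```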
<div class="section" id="ALM-45443__section1069512919569"><h4 class="sectiontitle"><span id="ALM-45443__text976142215819">Alarm Clearance</span></h4><p id="ALM-45443__p391831655614">This alarm is automatically cleared after the fault is rectified.</p>
</div>
<div class="section" id="ALM-45443__section891955662611"><h4 class="sectiontitle"><span id="ALM-45443__text13373191116114">Related Information</span></h4><p id="ALM-45443__p139191756122619"><span id="ALM-45443__text13669101910115">None.</span></p>
</div>
</div>
<div>
<div class="familylinks">
<div class="parentlink"><strong>Parent topic:</strong> <a href="mrs_01_1298.html">Alarm Reference (Applicable to MRS 3.x)</a></div>
</div>
</div>