doc-exports/docs/mrs/umn/ALM-13003.html
Yang, Tong 3b1f73dece MRS UMN 2.0.38.SP20 version
Reviewed-by: Hasko, Vladimir <vladimir.hasko@t-systems.com>
Co-authored-by: Yang, Tong <yangtong2@huawei.com>
Co-committed-by: Yang, Tong <yangtong2@huawei.com>
2022-12-13 12:03:34 +00:00


<a name="ALM-13003"></a>
<h1 class="topictitle1">ALM-13003 GC Duration of the ZooKeeper Process Exceeds the Threshold</h1>
<div id="body20914735"><div class="section" id="ALM-13003__section11321137"><h4 class="sectiontitle">Description</h4><p id="ALM-13003__p16372008">The system checks the garbage collection (GC) duration of the ZooKeeper process every 60 seconds. This alarm is generated when the GC duration exceeds the threshold (12 seconds by default).</p>
<p id="ALM-13003__p13130349">This alarm is cleared when the GC duration is less than the threshold.</p>
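<p>The raise and clear behavior described above can be modeled as follows. This is a minimal, hypothetical Python sketch, not the FusionInsight Manager implementation. It assumes one GC duration sample (in seconds) per 60-second check and the default 12-second threshold. The function name <strong>evaluate</strong> is illustrative only.</p>

```python
from typing import Iterable, List

# Assumed default taken from the description (12 seconds). Each sample is
# assumed to be the GC duration measured at one 60-second check; the real
# Manager logic may differ.
DEFAULT_THRESHOLD_S = 12.0


def evaluate(samples_s: Iterable[float],
             threshold_s: float = DEFAULT_THRESHOLD_S) -> List[str]:
    """Return one event per check: "raise", "clear", or "none".

    samples_s holds the GC duration (seconds) measured at each check.
    """
    active = False  # whether the alarm is currently raised
    events: List[str] = []
    for duration in samples_s:
        if duration > threshold_s and not active:
            active = True
            events.append("raise")  # GC duration exceeds the threshold
        elif duration < threshold_s and active:
            active = False
            events.append("clear")  # GC duration is back below the threshold
        else:
            events.append("none")  # alarm state unchanged
    return events
```

<p>For example, samples of 3.0, 13.5, 15.0, and 4.2 seconds produce <strong>none</strong>, <strong>raise</strong>, <strong>none</strong>, and <strong>clear</strong>: the alarm is raised on the first sample above the threshold and cleared on the first sample below it.</p>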
</div>
<div class="section" id="ALM-13003__section34781370"><h4 class="sectiontitle">Attribute</h4>
<div class="tablenoborder"><table cellpadding="4" cellspacing="0" summary="" id="ALM-13003__table56925315" frame="border" border="1" rules="all"><thead align="left"><tr id="ALM-13003__row39131625"><th align="left" class="cellrowborder" valign="top" width="33.33333333333333%" id="mcps1.3.2.2.1.4.1.1"><p id="ALM-13003__p15545074">Alarm ID</p>
</th>
<th align="left" class="cellrowborder" valign="top" width="33.33333333333333%" id="mcps1.3.2.2.1.4.1.2"><p id="ALM-13003__p51191451">Alarm Severity</p>
</th>
<th align="left" class="cellrowborder" valign="top" width="33.33333333333333%" id="mcps1.3.2.2.1.4.1.3"><p id="ALM-13003__p52866828">Auto Clear</p>
</th>
</tr>
</thead>
<tbody><tr id="ALM-13003__row54354670"><td class="cellrowborder" valign="top" width="33.33333333333333%" headers="mcps1.3.2.2.1.4.1.1 "><p id="ALM-13003__p40652136">13003</p>
</td>
<td class="cellrowborder" valign="top" width="33.33333333333333%" headers="mcps1.3.2.2.1.4.1.2 "><p id="ALM-13003__p4488739">Major</p>
</td>
<td class="cellrowborder" valign="top" width="33.33333333333333%" headers="mcps1.3.2.2.1.4.1.3 "><p id="ALM-13003__p28043591">Yes</p>
</td>
</tr>
</tbody>
</table>
</div>
</div>
<div class="section" id="ALM-13003__section44596876"><h4 class="sectiontitle">Parameters</h4>
<div class="tablenoborder"><table cellpadding="4" cellspacing="0" summary="" id="ALM-13003__table56938362" frame="border" border="1" rules="all"><thead align="left"><tr id="ALM-13003__row66416598"><th align="left" class="cellrowborder" valign="top" width="50%" id="mcps1.3.3.2.1.3.1.1"><p id="ALM-13003__p11035351">Name</p>
</th>
<th align="left" class="cellrowborder" valign="top" width="50%" id="mcps1.3.3.2.1.3.1.2"><p id="ALM-13003__p21448204">Meaning</p>
</th>
</tr>
</thead>
<tbody><tr id="ALM-13003__row1972162216335"><td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.3.2.1.3.1.1 "><p id="ALM-13003__p2036481141117">Source</p>
</td>
<td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.3.2.1.3.1.2 "><p id="ALM-13003__p187931338134115">Specifies the cluster for which the alarm is generated.</p>
</td>
</tr>
<tr id="ALM-13003__row59582941"><td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.3.2.1.3.1.1 "><p id="ALM-13003__p65062640">ServiceName</p>
</td>
<td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.3.2.1.3.1.2 "><p id="ALM-13003__p14548854">Specifies the service for which the alarm is generated.</p>
</td>
</tr>
<tr id="ALM-13003__row63830828"><td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.3.2.1.3.1.1 "><p id="ALM-13003__p35626567">RoleName</p>
</td>
<td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.3.2.1.3.1.2 "><p id="ALM-13003__p34752493">Specifies the role for which the alarm is generated.</p>
</td>
</tr>
<tr id="ALM-13003__row44336985"><td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.3.2.1.3.1.1 "><p id="ALM-13003__p51620924">HostName</p>
</td>
<td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.3.2.1.3.1.2 "><p id="ALM-13003__p45146327">Specifies the host for which the alarm is generated.</p>
</td>
</tr>
<tr id="ALM-13003__row3663763"><td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.3.2.1.3.1.1 "><p id="ALM-13003__p28329366">Trigger Condition</p>
</td>
<td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.3.2.1.3.1.2 "><p id="ALM-13003__p12977281">Specifies the threshold for triggering the alarm.</p>
</td>
</tr>
</tbody>
</table>
</div>
</div>
<div class="section" id="ALM-13003__section65827568"><h4 class="sectiontitle">Impact on the System</h4><p id="ALM-13003__p44526845">If GC of the ZooKeeper process takes too long, services may be interrupted.</p>
</div>
<div class="section" id="ALM-13003__section55577208"><h4 class="sectiontitle">Possible Causes</h4><p id="ALM-13003__p49904689">The heap memory of the ZooKeeper process is overused or inappropriately allocated, causing GC to occur frequently.</p>
</div>
<div class="section" id="ALM-13003__section30432828"><h4 class="sectiontitle">Procedure</h4><p class="tableheading" id="ALM-13003__p15748038"><strong id="ALM-13003__b787194920356">Check the GC duration.</strong></p>
<ol id="ALM-13003__ol55592094161335"><li id="ALM-13003__li34235908161327"><span>On FusionInsight Manager, choose <strong id="ALM-13003__b778341819403">O&amp;M</strong> &gt; <strong id="ALM-13003__b166511720134015">Alarm</strong> &gt; <strong id="ALM-13003__b1350392216402">Alarms</strong>. On the displayed page, click the drop-down list of <strong id="ALM-13003__b2732174014113">GC Duration of the ZooKeeper Process Exceeds the Threshold</strong> and note the IP address of the instance for which the alarm is generated.</span></li><li id="ALM-13003__li39687721161327"><span>On FusionInsight Manager, choose <strong id="ALM-13003__b132130183616">Cluster</strong> &gt; <em id="ALM-13003__i533217014369">Name of the desired cluster</em> &gt; <strong id="ALM-13003__b9332409363">Services</strong> &gt; <strong id="ALM-13003__b1333140173619">ZooKeeper</strong> &gt; <strong id="ALM-13003__b8333180103619">Instance</strong> &gt; <strong id="ALM-13003__b19509153533819">quorumpeer</strong> (the instance with the IP address noted in step 1). Click the drop-down list in the upper right corner of <strong id="ALM-13003__b1615117194310">Chart</strong>, choose <strong id="ALM-13003__b26902077434">Customize</strong> &gt; <strong id="ALM-13003__b157781292432">GC</strong>, select <strong id="ALM-13003__b17760151711432">ZooKeeper GC Duration per Minute</strong>, and click <strong id="ALM-13003__b129814612449">OK</strong> to view the per-minute GC duration statistics of the ZooKeeper process.</span></li><li id="ALM-13003__li37760361161327"><span>Check whether the per-minute GC duration of the ZooKeeper process exceeds the threshold (12 seconds by default).</span><p><ul class="subitemlist" id="ALM-13003__ul11652136161327"><li id="ALM-13003__li60588867161327">If yes, go to <a href="#ALM-13003__li1332215316392">4</a>.</li><li id="ALM-13003__li8751222161327">If no, go to <a href="#ALM-13003__li12535847161327">8</a>.</li></ul>
</p></li><li id="ALM-13003__li1332215316392"><a name="ALM-13003__li1332215316392"></a><a name="li1332215316392"></a><span>Check whether a memory leak occurs in the application.</span></li><li id="ALM-13003__li4298930161327"><span>On the <strong id="ALM-13003__b1771822012398">Home</strong> page of FusionInsight Manager, choose <strong id="ALM-13003__b772711202391">Cluster</strong> &gt; <strong id="ALM-13003__b1672812016391">Services</strong> &gt; <strong id="ALM-13003__b14729192043910">ZooKeeper</strong>. On the page that is displayed, click the <strong id="ALM-13003__b973072017394">Configuration</strong> tab and then the <strong id="ALM-13003__b19730142016394">All Configurations</strong> sub-tab, and select <strong id="ALM-13003__b18731820173912">quorumpeer</strong> &gt; <strong id="ALM-13003__b8731182043910">System</strong>. Increase the heap memory settings in the <strong id="ALM-13003__b156582605344957">GC_OPTS</strong> parameter as required.</span><p><div class="note" id="ALM-13003__note1177145816255"><img src="public_sys-resources/note_3.0-en-us.png"><span class="notetitle"> </span><div class="notebody"><p id="ALM-13003__p16954114265">Generally, set <strong id="ALM-13003__b96571347164713">-Xmx</strong> to twice the ZooKeeper data volume. For example, if the ZooKeeper data volume reaches 2 GB, set <strong id="ALM-13003__b15172185420489">GC_OPTS</strong> as follows:</p>
<p id="ALM-13003__p8696411268">-Xms4G -Xmx4G -XX:NewSize=512M -XX:MaxNewSize=512M -XX:MetaspaceSize=64M -XX:MaxMetaspaceSize=64M -XX:CMSFullGCsBeforeCompaction=1</p>
</div></div>
</p></li><li id="ALM-13003__li38690372161327"><span>Save the configuration and restart the ZooKeeper service.</span></li><li id="ALM-13003__li31219033161327"><span>Check whether the alarm is cleared.</span><p><ul class="subitemlist" id="ALM-13003__ul41810644161327"><li id="ALM-13003__li12669032161327">If yes, no further action is required.</li><li id="ALM-13003__li19558708161327">If no, go to <a href="#ALM-13003__li12535847161327">8</a>.</li></ul>
</p></li></ol>
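<p>The sizing rule in the note of step 5 can be expressed as a small helper that derives <strong>GC_OPTS</strong> from the ZooKeeper data volume. This is a sketch under stated assumptions: the function name <strong>build_gc_opts</strong> is hypothetical, the data volume is given in whole GB, and the options other than <strong>-Xms</strong> and <strong>-Xmx</strong> are copied unchanged from the documented 2 GB example, which may not suit every deployment.</p>

```python
def build_gc_opts(data_gb: int) -> str:
    """Hypothetical helper: derive a GC_OPTS value from the ZooKeeper data volume.

    Rule of thumb from the note: the heap size is twice the data volume.
    The remaining options are assumed to stay as in the 2 GB example.
    """
    if data_gb <= 0:
        raise ValueError("data_gb must be a positive number of gigabytes")
    heap_gb = 2 * data_gb  # -Xmx is twice the data volume
    # Options copied verbatim from the documented 2 GB example (assumption).
    fixed_opts = [
        "-XX:NewSize=512M",
        "-XX:MaxNewSize=512M",
        "-XX:MetaspaceSize=64M",
        "-XX:MaxMetaspaceSize=64M",
        "-XX:CMSFullGCsBeforeCompaction=1",
    ]
    # -Xms is kept equal to -Xmx, as in the documented example.
    return " ".join([f"-Xms{heap_gb}G", f"-Xmx{heap_gb}G", *fixed_opts])
```

<p>Calling <strong>build_gc_opts(2)</strong> returns the same string as the 2 GB example in the note; the result is the value you would enter for <strong>GC_OPTS</strong> in step 5.</p>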
<p class="tableheading" id="ALM-13003__p40751483161327"><strong id="ALM-13003__b48545787161340">Collect the fault information.</strong></p>
<ol start="8" id="ALM-13003__ol55955245161343"><li id="ALM-13003__li12535847161327"><a name="ALM-13003__li12535847161327"></a><a name="li12535847161327"></a><span>On FusionInsight Manager, choose <strong id="ALM-13003__b3628412154011">O&amp;M</strong>. In the navigation pane on the left, choose <strong id="ALM-13003__b136371512184016">Log</strong> &gt; <strong id="ALM-13003__b163771217407">Download</strong>.</span></li><li id="ALM-13003__li45713767161327"><span>Expand the <strong id="ALM-13003__b188921417114020">Service</strong> drop-down list, and select <strong id="ALM-13003__b1790071711408">ZooKeeper</strong> for the target cluster.</span></li><li id="ALM-13003__li8770720161327"><span>Click <span><img id="ALM-13003__image104601319175315" src="en-us_image_0263895382.png"></span> in the upper right corner, and set <strong id="ALM-13003__b154242516402">Start Date</strong> and <strong id="ALM-13003__b2431725194018">End Date</strong> for log collection to 10 minutes before and 10 minutes after the alarm generation time, respectively. Then, click <strong id="ALM-13003__b7431425124012">Download</strong>.</span></li><li id="ALM-13003__li39339725161327"><span>Contact <span id="ALM-13003__text14690443174017">O&amp;M personnel</span> and provide the collected logs.</span></li></ol>
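<p>The collection window in step 10 is simple date arithmetic: 10 minutes on either side of the alarm generation time. The following sketch computes both dates. It assumes the alarm time from the <strong>Alarms</strong> page is available as a Python <strong>datetime</strong> in the same time zone as the <strong>Start Date</strong> and <strong>End Date</strong> fields; the function name <strong>log_window</strong> is hypothetical.</p>

```python
from datetime import datetime, timedelta
from typing import Tuple

# Window size from step 10: 10 minutes before and after the alarm.
WINDOW = timedelta(minutes=10)


def log_window(alarm_time: datetime) -> Tuple[datetime, datetime]:
    """Return (start_date, end_date) for log collection around an alarm."""
    return alarm_time - WINDOW, alarm_time + WINDOW
```

<p>For an alarm generated at 2022-12-13 12:03:34, this gives a Start Date of 11:53:34 and an End Date of 12:13:34 on the same day.</p>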
</div>
<div class="section" id="ALM-13003__section169311343318"><h4 class="sectiontitle">Alarm Clearing</h4><p id="ALM-13003__p754913417333">This alarm is automatically cleared after the fault is rectified.</p>
</div>
<div class="section" id="ALM-13003__section5459996"><h4 class="sectiontitle">Related Information</h4><p id="ALM-13003__p37131656">None</p>
</div>
</div>
<div>
<div class="familylinks">
<div class="parentlink"><strong>Parent topic:</strong> <a href="mrs_01_1298.html">Alarm Reference (Applicable to MRS 3.x)</a></div>
</div>
</div>