forked from docs/doc-exports
Reviewed-by: Hasko, Vladimir <vladimir.hasko@t-systems.com> Co-authored-by: Yang, Tong <yangtong2@huawei.com> Co-committed-by: Yang, Tong <yangtong2@huawei.com>
89 lines
17 KiB
HTML
<a name="ALM-19007"></a><a name="ALM-19007"></a>
<h1 class="topictitle1">ALM-19007 HBase GC Time Exceeds the Threshold</h1>
<div id="body62523113"><div class="section" id="ALM-19007__s594d6a1d7f4a49ea986d756a9212c0b7"><h4 class="sectiontitle">Description</h4><p id="ALM-19007__en-us_topic_0070543521_p52325333">The system checks the old generation garbage collection (GC) time of the HBase service every 60 seconds. This alarm is generated when the detected old generation GC time exceeds the threshold (exceeds 5 seconds for three consecutive checks by default). To change the threshold, on the FusionInsight Manager portal, choose <strong id="ALM-19007__b146241759172115">O&M</strong> ><strong id="ALM-19007__b376819842310"> Alarm</strong> > <strong id="ALM-19007__b35164182237">Thresholds</strong> ><strong id="ALM-19007__b45571112173119"> </strong><em id="ALM-19007__i1250410253315">Name of the desired cluster</em> ><strong id="ALM-19007__b16503725193111"> HBase > GC > </strong><strong id="ALM-19007__b1626718712109">GC time for old generation</strong>. This alarm is cleared when the old generation GC time of the HBase service is shorter than or equal to the threshold.</p>
<div class="note" id="ALM-19007__note14544102852418"><img src="public_sys-resources/note_3.0-en-us.png"><span class="notetitle"> </span><div class="notebody"><p class="text" id="ALM-19007__en-us_topic_0070543520_p32794215">If the multi-instance function is enabled in the cluster and multiple HBase service instances are installed, determine the HBase service instance for which the alarm is generated based on the value of <strong id="ALM-19007__en-us_topic_0070543520_b26712487">ServiceName</strong> in <strong id="ALM-19007__en-us_topic_0070543520_b39085796">Location</strong>. For example, if the alarm is generated for the HBase1 service, <strong id="ALM-19007__en-us_topic_0070543520_b11832897">ServiceName=HBase1</strong> is displayed in <strong id="ALM-19007__en-us_topic_0070543520_b39387211">Location</strong>, and the operation object in the procedure needs to be changed from HBase to HBase1.</p>
</div></div>
</div>
<div class="section" id="ALM-19007__s86b2709a2dfc4477836b0190a7007ca1"><h4 class="sectiontitle">Attribute</h4>
<div class="tablenoborder"><table cellpadding="4" cellspacing="0" summary="" id="ALM-19007__en-us_topic_0070543521_table61629280" frame="border" border="1" rules="all"><thead align="left"><tr id="ALM-19007__en-us_topic_0070543521_row46652504"><th align="left" class="cellrowborder" valign="top" width="33.33333333333333%" id="mcps1.3.2.2.1.4.1.1"><p id="ALM-19007__en-us_topic_0070543521_p20756449">Alarm ID</p>
</th>
<th align="left" class="cellrowborder" valign="top" width="33.33333333333333%" id="mcps1.3.2.2.1.4.1.2"><p id="ALM-19007__en-us_topic_0070543521_p3550770">Alarm Severity</p>
</th>
<th align="left" class="cellrowborder" valign="top" width="33.33333333333333%" id="mcps1.3.2.2.1.4.1.3"><p id="ALM-19007__en-us_topic_0070543521_p19176948">Automatically Cleared</p>
</th>
</tr>
</thead>
<tbody><tr id="ALM-19007__en-us_topic_0070543521_row9828942"><td class="cellrowborder" valign="top" width="33.33333333333333%" headers="mcps1.3.2.2.1.4.1.1 "><p id="ALM-19007__en-us_topic_0070543521_p57946857">19007</p>
</td>
<td class="cellrowborder" valign="top" width="33.33333333333333%" headers="mcps1.3.2.2.1.4.1.2 "><p id="ALM-19007__en-us_topic_0070543521_p63183853">Major</p>
</td>
<td class="cellrowborder" valign="top" width="33.33333333333333%" headers="mcps1.3.2.2.1.4.1.3 "><p id="ALM-19007__en-us_topic_0070543521_p17618432">Yes</p>
</td>
</tr>
</tbody>
</table>
</div>
</div>
<div class="section" id="ALM-19007__s3d3c0665d10447c282aa73df14afc76b"><h4 class="sectiontitle">Parameters</h4>
<div class="tablenoborder"><table cellpadding="4" cellspacing="0" summary="" id="ALM-19007__en-us_topic_0070543521_table17806925" frame="border" border="1" rules="all"><thead align="left"><tr id="ALM-19007__en-us_topic_0070543521_row19454509"><th align="left" class="cellrowborder" valign="top" width="50%" id="mcps1.3.3.2.1.3.1.1"><p id="ALM-19007__en-us_topic_0070543521_p32311362">Name</p>
</th>
<th align="left" class="cellrowborder" valign="top" width="50%" id="mcps1.3.3.2.1.3.1.2"><p id="ALM-19007__en-us_topic_0070543521_p67083507">Meaning</p>
</th>
</tr>
</thead>
<tbody><tr id="ALM-19007__row10303193751017"><td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.3.2.1.3.1.1 "><p id="ALM-19007__p192431315431">Source</p>
</td>
<td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.3.2.1.3.1.2 "><p id="ALM-19007__p692551319435">Specifies the cluster for which the alarm is generated.</p>
</td>
</tr>
<tr id="ALM-19007__en-us_topic_0070543521_row65055018"><td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.3.2.1.3.1.1 "><p id="ALM-19007__en-us_topic_0070543521_p34965068">ServiceName</p>
</td>
<td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.3.2.1.3.1.2 "><p id="ALM-19007__en-us_topic_0070543521_p13598275">Specifies the service name for which the alarm is generated.</p>
</td>
</tr>
<tr id="ALM-19007__en-us_topic_0070543521_row55275618"><td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.3.2.1.3.1.1 "><p id="ALM-19007__en-us_topic_0070543521_p48140083">RoleName</p>
</td>
<td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.3.2.1.3.1.2 "><p id="ALM-19007__en-us_topic_0070543521_p7032652">Specifies the role name for which the alarm is generated.</p>
</td>
</tr>
<tr id="ALM-19007__en-us_topic_0070543521_row63293876"><td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.3.2.1.3.1.1 "><p id="ALM-19007__en-us_topic_0070543521_p26530317">HostName</p>
</td>
<td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.3.2.1.3.1.2 "><p id="ALM-19007__en-us_topic_0070543521_p1472067">Specifies the object (host ID) for which the alarm is generated.</p>
</td>
</tr>
</tbody>
</table>
</div>
</div>
<div class="section" id="ALM-19007__s2fe75b804f744b44a784b09b8642e9c5"><h4 class="sectiontitle">Impact on the System</h4><p id="ALM-19007__en-us_topic_0070543521_p52128573">If the old generation GC time exceeds the threshold, HBase data read and write operations are affected.</p>
</div>
<div class="section" id="ALM-19007__s987a45396a814e99909cadecae6c8567"><h4 class="sectiontitle">Possible Causes</h4><p id="ALM-19007__en-us_topic_0070543521_p61664867">The memory of HBase instances is overused, the heap memory is inappropriately allocated, or a large number of I/O operations exist in HBase. As a result, GCs occur frequently.</p>
</div>
<div class="section" id="ALM-19007__se03570b5321641bdb673263c88715fac"><h4 class="sectiontitle">Procedure</h4><p class="tableheading" id="ALM-19007__en-us_topic_0070543521_p28798367"><strong id="ALM-19007__b15184248194758">Check the GC time.</strong></p>
<ol id="ALM-19007__ol8097414195026"><li id="ALM-19007__li60562631194753"><span>On the FusionInsight Manager portal, click <span class="menucascade" id="ALM-19007__menucascade9622183132519"><b><span class="uicontrol" id="ALM-19007__uicontrol862223115259">O&M</span></b> > <b><span class="uicontrol" id="ALM-19007__uicontrol106221731182512">Alarm</span></b> > <b><span class="uicontrol" id="ALM-19007__uicontrol9622153122518">Alarms</span></b></span> and select the alarm whose <strong id="ALM-19007__b29921111194753">ID</strong> is <strong id="ALM-19007__b854544194753">19007</strong>. Then check the role name in <strong id="ALM-19007__b14790172183618">Location </strong>and confirm the IP address of the instance.</span><p><ul class="subitemlist" id="ALM-19007__ul14185721194753"><li id="ALM-19007__li18983006194753">If the role for which the alarm is generated is HMaster, go to <a href="#ALM-19007__li56776013194753">2</a>.</li><li id="ALM-19007__li61228514194753">If the role for which the alarm is generated is RegionServer, go to <a href="#ALM-19007__li29806005194753">3</a>.</li></ul>
</p></li><li id="ALM-19007__li56776013194753"><a name="ALM-19007__li56776013194753"></a><a name="li56776013194753"></a><span>On the FusionInsight Manager portal, choose <strong id="ALM-19007__b822515428018">Cluster</strong> > <em id="ALM-19007__i175981233194118">Name of the desired cluster</em> > <strong id="ALM-19007__b8192773194753">Services</strong> > <strong id="ALM-19007__b6626101194753">HBase</strong> > <strong id="ALM-19007__b59634910194753">Instance</strong> and click the HMaster for which the alarm is generated to go to the<strong id="ALM-19007__b14303164441516"> Dashboard </strong>page. Click the drop-down menu in the <strong id="ALM-19007__b1887733916215">Chart </strong>area and choose<strong id="ALM-19007__b74761236102111"> Customize</strong> > <strong id="ALM-19007__b216317116215">GC </strong>><strong id="ALM-19007__b816419114214"> Garbage Collection (GC) Time of HMaster</strong> and click <strong id="ALM-19007__b19971093194753">OK</strong> to check whether the value of <strong id="ALM-19007__b166461139164713">GC time </strong><strong id="ALM-19007__b76471639124715">for old generation</strong> is greater than the threshold (exceeds 5 seconds for three consecutive check periods by default).</span><p><ul class="subitemlist" id="ALM-19007__ul36134607194753"><li id="ALM-19007__li7045827194753">If yes, go to <a href="#ALM-19007__li25514146194753">4</a>.</li><li id="ALM-19007__li33841118194753">If no, go to <a href="#ALM-19007__li55997378194753">6</a>.</li></ul>
</p></li><li id="ALM-19007__li29806005194753"><a name="ALM-19007__li29806005194753"></a><a name="li29806005194753"></a><span>On the FusionInsight Manager portal, choose <strong id="ALM-19007__b4176110636">Cluster</strong> > <em id="ALM-19007__i44620983515">Name of the desired cluster</em> ><strong id="ALM-19007__b1346059153510"> Services</strong> > <strong id="ALM-19007__b35454329194753">HBase</strong> > <strong id="ALM-19007__b50653505194753">Instance</strong> and click the RegionServer for which the alarm is generated to go to the<strong id="ALM-19007__b1371314415199"> Dashboard </strong>page. Click the drop-down menu in the <strong id="ALM-19007__b15305114102220">Chart </strong>area and choose<strong id="ALM-19007__b17868982219"> Customize</strong> > <strong id="ALM-19007__b1812013221733">GC </strong>><strong id="ALM-19007__b312014221232"> Garbage Collection (GC) Time of RegionServer</strong> and click <strong id="ALM-19007__b14551892194753">OK</strong> to check whether the value of <strong id="ALM-19007__b4999131215019">GC time </strong><strong id="ALM-19007__b1001013165010">for old generation</strong> is greater than the threshold (exceeds 5 seconds for three consecutive check periods by default).</span><p><ul class="subitemlist" id="ALM-19007__ul12795542194753"><li id="ALM-19007__li37852590194753">If yes, go to <a href="#ALM-19007__li25514146194753">4</a>.</li><li id="ALM-19007__li46160969194753">If no, go to <a href="#ALM-19007__li55997378194753">6</a>.</li></ul>
</p></li></ol>
<p class="tableheading" id="ALM-19007__p48051021194753"><strong id="ALM-19007__b53110860195040">Check the current JVM configuration.</strong></p>
<ol start="4" id="ALM-19007__ol23971414195043"><li id="ALM-19007__li25514146194753"><a name="ALM-19007__li25514146194753"></a><a name="li25514146194753"></a><span>On the FusionInsight Manager portal, choose <strong id="ALM-19007__b18751172041"><strong id="ALM-19007__b7751971147">Cluster</strong> </strong>> <em id="ALM-19007__i183201355183515">Name of the desired cluster</em> ><strong id="ALM-19007__b931875511351"> Services</strong> > <strong id="ALM-19007__b65476200194753">HBase</strong> > <strong id="ALM-19007__b52414893194753">Configurations</strong>, and click <strong id="ALM-19007__b17747930194753">All</strong> <strong id="ALM-19007__b124181451414">Configurations</strong>. In the search box, enter <strong id="ALM-19007__b25513644194753">GC_OPTS</strong> to check the <strong id="ALM-19007__b28296207194753">GC_OPTS</strong> memory parameters of the HMaster role (HBase->HMaster) and the RegionServer role (HBase->RegionServer). Adjust the values of <strong id="ALM-19007__b53339277194753">-Xmx</strong> and <strong id="ALM-19007__b10291445194753">-XX:CMSInitiatingOccupancyFraction</strong> of the GC_OPTS parameter by referring to the following note.</span><p><div class="note" id="ALM-19007__note8237141213115"><img src="public_sys-resources/note_3.0-en-us.png"><span class="notetitle"> </span><div class="notebody"><ol type="a" id="ALM-19007__ol98931136413"><li id="ALM-19007__li12894436414">Suggestions on GC parameter configurations for HMaster<ul id="ALM-19007__ul1575412555118"><li id="ALM-19007__li207541155214">Set <strong id="ALM-19007__b157677941118">-Xms</strong> and <strong id="ALM-19007__b1776710918113">-Xmx</strong> to the same value to prevent the JVM from dynamically adjusting the heap memory size and affecting performance.</li><li id="ALM-19007__li8754175518118">Set <strong id="ALM-19007__b1706142619239">-XX:NewSize</strong> to the same value as <strong id="ALM-19007__b47061026172320">-XX:MaxNewSize</strong>, which is one eighth of <strong id="ALM-19007__b20706626122313">-Xmx</strong>.</li><li 
id="ALM-19007__li875445512117">For large-scale HBase clusters with a large number of regions, increase the values of the <strong id="ALM-19007__b398019572614">GC_OPTS</strong> parameters for HMaster. Specifically, set <strong id="ALM-19007__b17980859260">-Xmx</strong> to 4 GB if the number of regions is less than 100,000. If the number of regions is more than 100,000, set -Xmx to a value greater than or equal to 6 GB. For every additional 35,000 regions, increase the value of <strong id="ALM-19007__b79808582618">-Xmx</strong> by 2 GB. The maximum value of <strong id="ALM-19007__b109808518268">-Xmx</strong> is 32 GB.</li></ul>
</li><li id="ALM-19007__li7966124915114">Suggestions on GC parameter configurations for RegionServer<ul id="ALM-19007__ul8262185416213"><li id="ALM-19007__li11863072314">Set <strong id="ALM-19007__b19627407123">-Xms</strong> and <strong id="ALM-19007__b1262124061218">-Xmx</strong> to the same value to prevent the JVM from dynamically adjusting the heap memory size and affecting performance.</li><li id="ALM-19007__li162621954329">Set <strong id="ALM-19007__b250503141217">-XX:NewSize</strong> to one eighth of <strong id="ALM-19007__b17505203111126">-Xmx</strong>.</li><li id="ALM-19007__li889514285313">Set the memory for RegionServer to be greater than that for HMaster. If sufficient memory is available, increase the heap memory.</li><li id="ALM-19007__li168901016311">Set <strong id="ALM-19007__b11220193171320">-Xmx</strong> based on the machine memory size. Specifically, set <strong id="ALM-19007__b7220173114135">-Xmx</strong> to 32 GB if the machine memory is greater than 200 GB, to 16 GB if the machine memory is greater than 128 GB and less than 200 GB, and to 8 GB if the machine memory is less than 128 GB. When <strong id="ALM-19007__b8631164614134">-Xmx</strong> is set to 32 GB, a RegionServer node supports 2000 regions and 200 hotspot regions.</li><li id="ALM-19007__li1025122479">Set <strong id="ALM-19007__b349205920133">-XX:CMSInitiatingOccupancyFraction</strong> to a value less than or equal to <strong id="ALM-19007__b194922597134">85</strong>. The value is calculated as follows: 100 x (hfile.block.cache.size + hbase.regionserver.global.memstore.size)</li></ul>
</li></ol>
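<p>As a worked example of the occupancy fraction formula in the RegionServer suggestions, assume purely illustrative values of 0.4 for <strong>hfile.block.cache.size</strong> and 0.4 for <strong>hbase.regionserver.global.memstore.size</strong>. These are assumptions for the arithmetic only; check the values actually configured in your cluster.</p>
<pre class="screen">100 x (0.4 + 0.4) = 80</pre>
<p>The result, 80, does not exceed the limit of 85, so under these assumptions <strong>-XX:CMSInitiatingOccupancyFraction</strong> would be set to 80.</p>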
</div></div>
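<p>The following is a minimal sketch of how the suggestions in the note could map onto the memory-related part of <strong>GC_OPTS</strong>. The sizes are illustrative assumptions, not recommended values: an HMaster with <strong>-Xmx</strong> set to 4 GB, and a RegionServer on a host with more than 200 GB of memory. The occupancy fraction of 80 assumes that the block cache and MemStore ratios sum to 0.8. Any other options already present in <strong>GC_OPTS</strong> are omitted here and should be left unchanged.</p>
<pre class="screen">HMaster (illustrative values, NewSize = 4 GB / 8):
-Xms4G -Xmx4G -XX:NewSize=512M -XX:MaxNewSize=512M

RegionServer (illustrative values, NewSize = 32 GB / 8):
-Xms32G -Xmx32G -XX:NewSize=4G -XX:CMSInitiatingOccupancyFraction=80</pre>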
</p></li><li id="ALM-19007__li51628204194753"><span>Check whether the alarm is cleared.</span><p><ul class="subitemlist" id="ALM-19007__ul28806538194753"><li id="ALM-19007__li28300723194753">If yes, no further action is required.</li><li id="ALM-19007__li10657266194753">If no, go to <a href="#ALM-19007__li55997378194753">6</a>.</li></ul>
</p></li></ol>
<p class="tableheading" id="ALM-19007__p57932250194753"><strong id="ALM-19007__b22985570194817">Collect fault information.</strong></p>
<ol start="6" id="ALM-19007__ol1705244194820"><li id="ALM-19007__li55997378194753"><a name="ALM-19007__li55997378194753"></a><a name="li55997378194753"></a><span>On the FusionInsight Manager interface of the active and standby clusters, choose <strong id="ALM-19007__b1933413102614">O&M</strong> > <strong id="ALM-19007__b298516206520">Log </strong>><strong id="ALM-19007__b11985132015516"> Download</strong>.</span></li><li id="ALM-19007__li44879532194753"><span>In the <strong id="ALM-19007__b34214359194753">Service</strong> drop-down list box, select <strong id="ALM-19007__b1631016246710">HBase</strong> in the required cluster.</span></li><li id="ALM-19007__li1145664103113"><span>Click <span><img id="ALM-19007__image1945644173117" src="en-us_image_0269417421.png"></span> in the upper right corner, and set <strong id="ALM-19007__b6456941173117">Start Date</strong> and <strong id="ALM-19007__b11456154113318">End Date</strong> for log collection to 10 minutes before and 10 minutes after the alarm generation time, respectively. Then, click <strong id="ALM-19007__b13456164113319">Download</strong>.</span></li><li id="ALM-19007__li65155516194753"><span>Contact the <span id="ALM-19007__text4614151421417">O&M personnel</span> and send the collected fault logs.</span></li></ol>
</div>
<div class="section" id="ALM-19007__section1529716184534"><h4 class="sectiontitle">Alarm Clearing</h4><p id="ALM-19007__p4677152685316">After the fault is rectified, the system automatically clears this alarm.</p>
</div>
<div class="section" id="ALM-19007__sc893e6bd42e948be8718e5f727a7a70a"><h4 class="sectiontitle">Related Information</h4><p id="ALM-19007__en-us_topic_0070543521_p26571327">None</p>
</div>
</div>
<div>
<div class="familylinks">
<div class="parentlink"><strong>Parent topic:</strong> <a href="mrs_01_1298.html">Alarm Reference (Applicable to MRS 3.x)</a></div>
</div>
</div>