Reviewed-by: Hasko, Vladimir <vladimir.hasko@t-systems.com> Co-authored-by: Yang, Tong <yangtong2@huawei.com> Co-committed-by: Yang, Tong <yangtong2@huawei.com>
<a name="ALM-13009"></a>
<h1 class="topictitle1">ALM-13009 ZooKeeper ZNode Capacity Usage Exceeds the Threshold</h1>
<div id="body1559547426810"><div class="section" id="ALM-13009__section18794533"><h4 class="sectiontitle">Description</h4><p id="ALM-13009__p65268561">The system checks the capacity usage of level-2 ZNodes in the ZooKeeper data directory every hour. This alarm is generated when the capacity usage of a level-2 ZNode exceeds the threshold.</p>
</div>
<div class="section" id="ALM-13009__section34933073"><h4 class="sectiontitle">Attribute</h4>
<div class="tablenoborder"><table cellpadding="4" cellspacing="0" summary="" id="ALM-13009__table52262125" frame="border" border="1" rules="all"><thead align="left"><tr id="ALM-13009__row24697033"><th align="left" class="cellrowborder" valign="top" width="33.33333333333333%" id="mcps1.3.2.2.1.4.1.1"><p id="ALM-13009__p54302662">Alarm ID</p>
</th>
<th align="left" class="cellrowborder" valign="top" width="33.33333333333333%" id="mcps1.3.2.2.1.4.1.2"><p id="ALM-13009__p36439520">Alarm Severity</p>
</th>
<th align="left" class="cellrowborder" valign="top" width="33.33333333333333%" id="mcps1.3.2.2.1.4.1.3"><p id="ALM-13009__p65919998">Auto Clear</p>
</th>
</tr>
</thead>
<tbody><tr id="ALM-13009__row37919625"><td class="cellrowborder" valign="top" width="33.33333333333333%" headers="mcps1.3.2.2.1.4.1.1 "><p id="ALM-13009__p1163219417345">13009</p>
</td>
<td class="cellrowborder" valign="top" width="33.33333333333333%" headers="mcps1.3.2.2.1.4.1.2 "><p id="ALM-13009__p1663217423418">Major</p>
</td>
<td class="cellrowborder" valign="top" width="33.33333333333333%" headers="mcps1.3.2.2.1.4.1.3 "><p id="ALM-13009__p16632104193412">Yes</p>
</td>
</tr>
</tbody>
</table>
</div>
</div>
<div class="section" id="ALM-13009__section45962205"><h4 class="sectiontitle">Parameters</h4>
<div class="tablenoborder"><table cellpadding="4" cellspacing="0" summary="" id="ALM-13009__table51772816" frame="border" border="1" rules="all"><thead align="left"><tr id="ALM-13009__row55869420"><th align="left" class="cellrowborder" valign="top" width="50%" id="mcps1.3.3.2.1.3.1.1"><p id="ALM-13009__p29129184">Name</p>
</th>
<th align="left" class="cellrowborder" valign="top" width="50%" id="mcps1.3.3.2.1.3.1.2"><p id="ALM-13009__p10653667">Meaning</p>
</th>
</tr>
</thead>
<tbody><tr id="ALM-13009__row115206463329"><td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.3.2.1.3.1.1 "><p id="ALM-13009__p77584302119">Source</p>
</td>
<td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.3.2.1.3.1.2 "><p id="ALM-13009__p187931338134115">Specifies the cluster for which the alarm is generated.</p>
</td>
</tr>
<tr id="ALM-13009__row57640736"><td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.3.2.1.3.1.1 "><p id="ALM-13009__p65062640">ServiceName</p>
</td>
<td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.3.2.1.3.1.2 "><p id="ALM-13009__p22422626">Specifies the service for which the alarm is generated.</p>
</td>
</tr>
<tr id="ALM-13009__row477048"><td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.3.2.1.3.1.1 "><p id="ALM-13009__p15708115853117">ServiceDirectory</p>
</td>
<td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.3.2.1.3.1.2 "><p id="ALM-13009__p42904606">Specifies the directory for which the alarm is generated.</p>
</td>
</tr>
<tr id="ALM-13009__row111316194717"><td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.3.2.1.3.1.1 "><p id="ALM-13009__p39186745">RoleName</p>
</td>
<td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.3.2.1.3.1.2 "><p id="ALM-13009__p20009785">Specifies the role for which the alarm is generated.</p>
</td>
</tr>
<tr id="ALM-13009__row50597141"><td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.3.2.1.3.1.1 "><p id="ALM-13009__p4727789">Trigger Condition</p>
</td>
<td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.3.2.1.3.1.2 "><p id="ALM-13009__p47406613">Specifies the threshold for triggering the alarm.</p>
</td>
</tr>
</tbody>
</table>
</div>
</div>
<div class="section" id="ALM-13009__section11006666"><h4 class="sectiontitle">Impact on the System</h4><p id="ALM-13009__p14730421">A large amount of data is written to the ZooKeeper data directory. As a result, ZooKeeper cannot provide services properly.</p>
</div>
<div class="section" id="ALM-13009__section31951138"><h4 class="sectiontitle">Possible Causes</h4><ul id="ALM-13009__ul52313462"><li id="ALM-13009__li1059114">A large volume of data has been written to the ZooKeeper data directory.</li><li id="ALM-13009__li9532031">The threshold is improperly defined.</li></ul>
</div>
<div class="section" id="ALM-13009__section16109195613361"><h4 class="sectiontitle">Procedure</h4><p class="tableheading" id="ALM-13009__p33897081"><strong id="ALM-13009__b146410316362">Check whether a large volume of data has been written to the directory for which the alarm is generated.</strong></p>
<ol id="ALM-13009__ol18001226161846"><li id="ALM-13009__li60203448161840"><span>On FusionInsight Manager, choose <strong id="ALM-13009__b8758184123913">O&amp;M</strong>. In the navigation pane on the left, choose <strong id="ALM-13009__b93941245103918">Alarm</strong> &gt; <strong id="ALM-13009__b15240174820391">Alarms</strong>. Click the drop-down list in the row containing <strong id="ALM-13009__b1849254204016">ALM-13009 ZooKeeper ZNode Capacity Usage Exceeds the Threshold</strong>, and find the ZNode for which the alarm is generated in the <strong id="ALM-13009__b1193010114412">Location</strong> area.</span></li><li id="ALM-13009__li43268947161840"><span>Choose <strong id="ALM-13009__b205191610115110">Cluster</strong> &gt; <strong id="ALM-13009__b181251826135113">Services</strong> &gt; <strong id="ALM-13009__b1654072811517">ZooKeeper</strong>. On the page that is displayed, click the <strong id="ALM-13009__b5200636105219">Resource</strong> tab. In the <strong id="ALM-13009__b96538509589">Used Resources (By Second-Level ZNode)</strong> area, click <strong id="ALM-13009__b12822140205912">By capacity</strong> and check whether a large amount of data is written to the top-level ZNode directory.</span><p><ul class="subitemlist" id="ALM-13009__ul27177282161840"><li id="ALM-13009__li66225583161840">If yes, record the directory to which a large amount of data is written and go to <a href="#ALM-13009__li151971257113310">3</a>.</li><li id="ALM-13009__li62672021161840">If no, go to <a href="#ALM-13009__li1932073512913">5</a>.</li></ul>
</p></li><li id="ALM-13009__li151971257113310"><a name="ALM-13009__li151971257113310"></a><a name="li151971257113310"></a><span>Check whether data in the directory can be deleted.</span><p><div class="notice" id="ALM-13009__note10522164172614"><span class="noticetitle"><img src="public_sys-resources/notice_3.0-en-us.png"> </span><div class="noticebody"><p id="ALM-13009__p11523114172619">Deleting data from ZooKeeper is a high-risk operation. Exercise caution when performing this operation.</p>
</div></div>
<ul id="ALM-13009__ul05959336510"><li id="ALM-13009__li185951533354">If yes, go to <a href="#ALM-13009__li40737202161840">4</a>.</li><li id="ALM-13009__li58226466519">If no, go to <a href="#ALM-13009__li1932073512913">5</a>.</li></ul>
</p></li><li id="ALM-13009__li40737202161840"><a name="ALM-13009__li40737202161840"></a><a name="li40737202161840"></a><span>Log in to the ZooKeeper client and delete unnecessary data from the directory to which a large amount of data is written.</span><p><ol type="a" id="ALM-13009__ol1619518095019"><li id="ALM-13009__li201631459154910">Go to the ZooKeeper client installation directory, for example, <strong id="ALM-13009__b122572427918"><span id="ALM-13009__ph381512063917">/opt/client</span></strong>, and configure environment variables.<p id="ALM-13009__p133491745118"><strong id="ALM-13009__b6111153617813">cd <span id="ALM-13009__ph173011483314">/opt/client</span></strong></p>
<p id="ALM-13009__p979013835116"><strong id="ALM-13009__b1713183612811">source bigdata_env</strong></p>
</li><li id="ALM-13009__li71151545513">Run the following command to authenticate the user (skip this step for a cluster in normal mode):<p id="ALM-13009__p8586550175220"><a name="ALM-13009__li71151545513"></a><a name="li71151545513"></a><strong id="ALM-13009__b193001381489">kinit</strong><strong id="ALM-13009__b1930118381287"> </strong><em id="ALM-13009__i15218121115417">Component service user</em></p>
</li><li id="ALM-13009__li8721530135219">Run the following command to log in to the client tool:<p id="ALM-13009__p15031730262"><a name="ALM-13009__li8721530135219"></a><a name="li8721530135219"></a><strong id="ALM-13009__b198512111718">zkCli.sh -server</strong> <strong id="ALM-13009__b1038310591968">&lt;</strong><em id="ALM-13009__i16396358053">Service IP address of the node where any ZooKeeper instance resides</em><strong id="ALM-13009__b64211231473">&gt;:&lt;</strong><em id="ALM-13009__i1462420472068">Client port</em><strong id="ALM-13009__b8344367715">&gt;</strong></p>
</li><li id="ALM-13009__li111218191777">Run the following command to delete unnecessary data:<p id="ALM-13009__p1491115716712"><a name="ALM-13009__li111218191777"></a><a name="li111218191777"></a><strong id="ALM-13009__b13308112080">delete </strong><em id="ALM-13009__i62191937773">Path of the ZNode to be deleted</em></p>
</li></ol>
</p></li><li id="ALM-13009__li1932073512913"><a name="ALM-13009__li1932073512913"></a><a name="li1932073512913"></a><span>Log in to FusionInsight Manager and choose <strong id="ALM-13009__b720292481">Cluster</strong> &gt; <strong id="ALM-13009__b6517161218820">Services</strong> &gt; <strong id="ALM-13009__b144701613810">ZooKeeper</strong>. On the page that is displayed, click the <strong id="ALM-13009__b147071132882">Configuration</strong> tab and then the <strong id="ALM-13009__b1128815459817">All Configurations</strong> sub-tab, and search for <strong id="ALM-13009__b11591191015916">max.data.size</strong>. The value of <strong id="ALM-13009__b279418301295">max.data.size</strong> is the maximum capacity quota of the ZooKeeper directory, in bytes. Then search for the <strong id="ALM-13009__b10405135813130">GC_OPTS</strong> configuration item and check the value of <strong id="ALM-13009__b795017131418">Xmx</strong>.</span></li><li id="ALM-13009__li132461505465"><span>Compare the values of <strong id="ALM-13009__b811421914149">max.data.size</strong> and <strong id="ALM-13009__b1336314417164">Xmx*0.65</strong>. The threshold is the smaller of the two values multiplied by 80%. To raise the threshold, increase <strong id="ALM-13009__b011917356159">max.data.size</strong> or the <strong id="ALM-13009__b1847315816168">Xmx</strong> value, depending on which one yields the smaller value.</span></li><li id="ALM-13009__li817635715531"><span>Check whether the alarm is cleared.</span><p><ul id="ALM-13009__ul833062035413"><li id="ALM-13009__li41891434105416">If yes, no further action is required.</li><li id="ALM-13009__li73302020145419">If no, go to <a href="#ALM-13009__li57092876161840">8</a>.</li></ul>
</p></li></ol>
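<p>The threshold rule described in the procedure above (the smaller of <strong>max.data.size</strong> and <strong>Xmx*0.65</strong>, multiplied by 80%) can be sketched in Python as follows. This is only an illustration, not the implementation used by FusionInsight Manager. The function names are hypothetical, the sample values are made up, and it assumes that <strong>Xmx</strong> in <strong>GC_OPTS</strong> uses the usual JVM size suffixes (k, m, g) and that usage is compared in bytes.</p>

```python
import re

# Multipliers for the JVM size suffixes accepted by -Xmx (assumption: k/m/g only).
_UNITS = {"k": 1024, "m": 1024 ** 2, "g": 1024 ** 3}


def parse_xmx_bytes(gc_opts: str) -> int:
    """Return the -Xmx heap size in bytes from a JVM options string (hypothetical helper)."""
    match = re.search(r"-Xmx(\d+)([kKmMgG]?)", gc_opts)
    if match is None:
        raise ValueError("no -Xmx setting found in GC_OPTS")
    value = int(match.group(1))
    unit = match.group(2).lower()
    # No suffix means the value is already in bytes.
    return value * _UNITS.get(unit, 1)


def alarm_threshold_bytes(max_data_size: int, gc_opts: str) -> float:
    """Smaller of max.data.size and Xmx*0.65, multiplied by 80%."""
    return min(max_data_size, parse_xmx_bytes(gc_opts) * 0.65) * 0.8


if __name__ == "__main__":
    # Illustrative values only; read the real ones from the ZooKeeper configuration page.
    gc_opts = "-Xms2G -Xmx4G"
    max_data_size = 1024 ** 3  # 1 GiB, in bytes
    threshold = alarm_threshold_bytes(max_data_size, gc_opts)
    print(f"Alarm threshold: {threshold:.0f} bytes")
```

<p>With these sample values, <strong>max.data.size</strong> is the smaller of the two inputs, so increasing <strong>Xmx</strong> alone would not change the result. Checking which input is the limiting one first avoids an unnecessary configuration change.</p>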
<p class="tableheading" id="ALM-13009__p63671191161840"><strong id="ALM-13009__b4527812416195">Collect the fault information.</strong></p>
<ol start="8" id="ALM-13009__ol3717986016198"><li id="ALM-13009__li57092876161840"><a name="ALM-13009__li57092876161840"></a><a name="li57092876161840"></a><span>On FusionInsight Manager, choose <strong id="ALM-13009__b21325820428254">O&amp;M</strong>. In the navigation pane on the left, choose <strong id="ALM-13009__b18663536448254">Log</strong> &gt; <strong id="ALM-13009__b19983638768254">Download</strong>.</span></li><li id="ALM-13009__li51794488161840"><span>Expand the <strong id="ALM-13009__b11242906768254">Service</strong> drop-down list, and select <strong id="ALM-13009__b13396816248254">ZooKeeper</strong> for the target cluster.</span></li><li id="ALM-13009__li63497208161840"><span>Click <span><img id="ALM-13009__image104601319175315" src="en-us_image_0263895683.png"></span> in the upper right corner, and set <strong id="ALM-13009__b17262879568254">Start Date</strong> and <strong id="ALM-13009__b12804361838254">End Date</strong> for log collection to 10 minutes before and 10 minutes after the alarm generation time, respectively. Then, click <strong id="ALM-13009__b961519558254">Download</strong>.</span></li><li id="ALM-13009__li43000242161840"><span>Contact <span id="ALM-13009__text126301214142412">O&amp;M personnel</span> and provide the collected logs.</span></li></ol>
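<p>The log collection window in the steps above can be worked out with a few lines of Python. This is a minimal sketch; the function name and the sample alarm time are hypothetical, and the 10-minute margin is the one stated in the procedure.</p>

```python
from datetime import datetime, timedelta


def log_window(alarm_time: datetime, margin_minutes: int = 10):
    """Return (start, end) covering margin_minutes before and after the alarm time."""
    margin = timedelta(minutes=margin_minutes)
    return alarm_time - margin, alarm_time + margin


if __name__ == "__main__":
    # Illustrative alarm generation time; use the time shown on the Alarms page.
    start, end = log_window(datetime(2024, 1, 1, 12, 0))
    print("Start Date:", start)
    print("End Date:  ", end)
```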
</div>
<div class="section" id="ALM-13009__section169311343318"><h4 class="sectiontitle">Alarm Clearing</h4><p id="ALM-13009__p754913417333">This alarm is automatically cleared after the fault is rectified.</p>
</div>
<div class="section" id="ALM-13009__section37905371"><h4 class="sectiontitle">Related Information</h4><p id="ALM-13009__p88508484307">None</p>
</div>
</div>
<div>
<div class="familylinks">
<div class="parentlink"><strong>Parent topic:</strong> <a href="mrs_01_1298.html">Alarm Reference (Applicable to MRS 3.x)</a></div>
</div>
</div>