MRS UMN Doc 20240802 version

Reviewed-by: Pruthi, Vineet <vineet.pruthi@t-systems.com>
Co-authored-by: Yang, Tong <yangtong2@huawei.com>
Co-committed-by: Yang, Tong <yangtong2@huawei.com>
Yang, Tong 2024-09-28 19:04:58 +00:00 committed by zuul
parent f1bf1560d6
commit 5914b67d13
312 changed files with 18649 additions and 5709 deletions

File diff suppressed because it is too large


@ -2,7 +2,6 @@
<h1 class="topictitle1">ALM-12014 Partition Lost</h1> <h1 class="topictitle1">ALM-12014 Partition Lost</h1>
<div id="body1841371"><div class="section" id="ALM-12014__s305a8061b9134145a1a1e3f83ea9bfc4"><h4 class="sectiontitle">Description</h4><p id="ALM-12014__en-us_topic_0070543526_p19713524">The system checks the partition status every 60 seconds. This alarm is generated when the system detects that a partition to which service directories are mounted is lost (because the device is removed or goes offline, or the partition is deleted). The system checks the partition status periodically.</p> <div id="body1841371"><div class="section" id="ALM-12014__s305a8061b9134145a1a1e3f83ea9bfc4"><h4 class="sectiontitle">Description</h4><p id="ALM-12014__en-us_topic_0070543526_p19713524">The system checks the partition status every 60 seconds. This alarm is generated when the system detects that a partition to which service directories are mounted is lost (because the device is removed or goes offline, or the partition is deleted). The system checks the partition status periodically.</p>
<p id="ALM-12014__en-us_topic_0070543526_p43203995">This alarm must be manually cleared.</p>
</div> </div>
<div class="section" id="ALM-12014__s9888b5efac804e36a1257629159c863d"><h4 class="sectiontitle">Attribute</h4> <div class="section" id="ALM-12014__s9888b5efac804e36a1257629159c863d"><h4 class="sectiontitle">Attribute</h4>
<div class="tablenoborder"><table cellpadding="4" cellspacing="0" summary="" id="ALM-12014__en-us_topic_0070543526_table9862716" frame="border" border="1" rules="all"><thead align="left"><tr id="ALM-12014__en-us_topic_0070543526_row48323826"><th align="left" class="cellrowborder" valign="top" width="33.33333333333333%" id="mcps1.3.2.2.1.4.1.1"><p id="ALM-12014__en-us_topic_0070543526_p21915813">Alarm ID</p> <div class="tablenoborder"><table cellpadding="4" cellspacing="0" summary="" id="ALM-12014__en-us_topic_0070543526_table9862716" frame="border" border="1" rules="all"><thead align="left"><tr id="ALM-12014__en-us_topic_0070543526_row48323826"><th align="left" class="cellrowborder" valign="top" width="33.33333333333333%" id="mcps1.3.2.2.1.4.1.1"><p id="ALM-12014__en-us_topic_0070543526_p21915813">Alarm ID</p>
@ -17,7 +16,7 @@
</td> </td>
<td class="cellrowborder" valign="top" width="33.33333333333333%" headers="mcps1.3.2.2.1.4.1.2 "><p id="ALM-12014__en-us_topic_0070543526_p62417507">Major</p> <td class="cellrowborder" valign="top" width="33.33333333333333%" headers="mcps1.3.2.2.1.4.1.2 "><p id="ALM-12014__en-us_topic_0070543526_p62417507">Major</p>
</td> </td>
<td class="cellrowborder" valign="top" width="33.33333333333333%" headers="mcps1.3.2.2.1.4.1.3 "><p id="ALM-12014__en-us_topic_0070543526_p22653281">No</p> <td class="cellrowborder" valign="top" width="33.33333333333333%" headers="mcps1.3.2.2.1.4.1.3 "><ul id="ALM-12014__ul788136174513"><li id="ALM-12014__li20881467459">Yes: MRS 3.3.0 and later versions</li><li id="ALM-12014__li19881264450">No: Versions earlier than MRS 3.3.0</li></ul>
</td> </td>
</tr> </tr>
</tbody> </tbody>
@ -71,12 +70,14 @@
</div> </div>
<div class="section" id="ALM-12014__sb1a1ee7b7a444d5dbe8388e9c9e8bba9"><h4 class="sectiontitle">Procedure</h4><ol id="ALM-12014__ol43371064173421"><li id="ALM-12014__li30640494173421"><span>On <span id="ALM-12014__text34789336432">MRS</span> Manager, click <strong id="ALM-12014__b18317580173421">O&amp;M &gt; Alarm &gt; Alarms</strong>, and click <span><img id="ALM-12014__image10408151910137" src="en-us_image_0000001532767638.png"></span> in the row where the alarm is located.</span></li><li id="ALM-12014__li51941841173421"><span>Obtain <strong id="ALM-12014__b65960965173421">HostName</strong>, <strong id="ALM-12014__b56777780173421">PartitionName</strong> and <strong id="ALM-12014__b41237977173421">DirName</strong> from <strong id="ALM-12014__b645062473115">Location</strong>.</span></li><li id="ALM-12014__li15983295173421"><span>Check whether the disk of <strong id="ALM-12014__b64823390173421">PartitionName</strong> on <strong id="ALM-12014__b46539606173421">HostName</strong> is inserted to the correct server slot.</span><p><ul class="subitemlist" id="ALM-12014__ul9232462173421"><li id="ALM-12014__li11611727173421">If yes, go to <a href="#ALM-12014__li9631929173421">4</a>.</li><li id="ALM-12014__li1025829173421">If no, go to <a href="#ALM-12014__li18162941173421">5</a>.</li></ul> <div class="section" id="ALM-12014__sb1a1ee7b7a444d5dbe8388e9c9e8bba9"><h4 class="sectiontitle">Procedure</h4><ol id="ALM-12014__ol43371064173421"><li id="ALM-12014__li30640494173421"><span>On <span id="ALM-12014__text34789336432">MRS</span> Manager, click <strong id="ALM-12014__b18317580173421">O&amp;M &gt; Alarm &gt; Alarms</strong>, and click <span><img id="ALM-12014__image10408151910137" src="en-us_image_0000001532767638.png"></span> in the row where the alarm is located.</span></li><li id="ALM-12014__li51941841173421"><span>Obtain <strong id="ALM-12014__b65960965173421">HostName</strong>, <strong id="ALM-12014__b56777780173421">PartitionName</strong> and <strong 
id="ALM-12014__b41237977173421">DirName</strong> from <strong id="ALM-12014__b645062473115">Location</strong>.</span></li><li id="ALM-12014__li15983295173421"><span>Check whether the disk of <strong id="ALM-12014__b64823390173421">PartitionName</strong> on <strong id="ALM-12014__b46539606173421">HostName</strong> is inserted to the correct server slot.</span><p><ul class="subitemlist" id="ALM-12014__ul9232462173421"><li id="ALM-12014__li11611727173421">If yes, go to <a href="#ALM-12014__li9631929173421">4</a>.</li><li id="ALM-12014__li1025829173421">If no, go to <a href="#ALM-12014__li18162941173421">5</a>.</li></ul>
</p></li><li id="ALM-12014__li9631929173421"><a name="ALM-12014__li9631929173421"></a><a name="li9631929173421"></a><span>Contact hardware engineers to remove the faulty disk.</span></li><li id="ALM-12014__li18162941173421"><a name="ALM-12014__li18162941173421"></a><a name="li18162941173421"></a><span>Log in to the <strong id="ALM-12014__b19578501173421">HostName</strong> node where an alarm is reported and check whether there is a line containing <strong id="ALM-12014__b41988789173421">DirName</strong> in the <strong id="ALM-12014__b42354785173421">/etc/fstab</strong> file as user <strong id="ALM-12014__b37365710490">root</strong>. <span id="ALM-12014__text43649449460"></span></span><p><ul class="subitemlist" id="ALM-12014__ul61670428173421"><li id="ALM-12014__li8185528173421">If yes, go to <a href="#ALM-12014__li20338192173421">6</a>.</li><li id="ALM-12014__li59048052173421">If no, go to <a href="#ALM-12014__li48826004173421">7</a>.</li></ul> </p></li><li id="ALM-12014__li9631929173421"><a name="ALM-12014__li9631929173421"></a><a name="li9631929173421"></a><span>Contact hardware engineers to remove the faulty disk.</span></li><li id="ALM-12014__li18162941173421"><a name="ALM-12014__li18162941173421"></a><a name="li18162941173421"></a><span>Log in to the <strong id="ALM-12014__b19578501173421">HostName</strong> node where an alarm is reported and check whether there is a line containing <strong id="ALM-12014__b41988789173421">DirName</strong> in the <strong id="ALM-12014__b42354785173421">/etc/fstab</strong> file as user <strong id="ALM-12014__b37365710490">root</strong>. <span id="ALM-12014__text43649449460"></span></span><p><ul class="subitemlist" id="ALM-12014__ul61670428173421"><li id="ALM-12014__li8185528173421">If yes, go to <a href="#ALM-12014__li20338192173421">6</a>.</li><li id="ALM-12014__li59048052173421">If no, go to <a href="#ALM-12014__li48826004173421">7</a>.</li></ul>
</p></li><li id="ALM-12014__li20338192173421"><a name="ALM-12014__li20338192173421"></a><a name="li20338192173421"></a><span>Run the <strong id="ALM-12014__b29248746173421">vi /etc/fstab</strong> command to edit the file and delete the line containing <strong id="ALM-12014__b61912122173421">DirName</strong>.</span></li><li id="ALM-12014__li48826004173421"><a name="ALM-12014__li48826004173421"></a><a name="li48826004173421"></a><span>Contact hardware engineers to insert a new disk. For details, see the hardware product document of the relevant model. If the faulty disk is in a RAID group, configure the RAID group. For details, see the configuration methods of the relevant RAID controller card.</span></li><li id="ALM-12014__li55753407173421"><span>Wait 20 to 30 minutes (The disk size determines the waiting time), and run the <strong id="ALM-12014__b36780855173421">mount</strong> command to check whether the disk has been mounted to the <strong id="ALM-12014__b62592242173421">DirName</strong> directory.</span><p><ul class="subitemlist" id="ALM-12014__ul28564444173421"><li id="ALM-12014__li26459270173421">If yes, manually clear the alarm. No further operation is required.</li><li id="ALM-12014__li62826150173421">If no, go to <a href="#ALM-12014__li1607193817587">9</a>.</li></ul> </p></li><li id="ALM-12014__li20338192173421"><a name="ALM-12014__li20338192173421"></a><a name="li20338192173421"></a><span>Run the <strong id="ALM-12014__b29248746173421">vi /etc/fstab</strong> command to edit the file and delete the line containing <strong id="ALM-12014__b61912122173421">DirName</strong>.</span></li><li id="ALM-12014__li48826004173421"><a name="ALM-12014__li48826004173421"></a><a name="li48826004173421"></a><span>Contact hardware engineers to insert a new disk. For details, see the hardware product document of the relevant model. If the faulty disk is in a RAID group, configure the RAID group. 
For details, see the configuration methods of the relevant RAID controller card.</span></li><li id="ALM-12014__li55753407173421"><span>Wait 20 to 30 minutes (The disk size determines the waiting time), and run the <strong id="ALM-12014__b36780855173421">mount</strong> command to check whether the disk has been mounted to the <strong id="ALM-12014__b62592242173421">DirName</strong> directory.</span><p><ul class="subitemlist" id="ALM-12014__ul28564444173421"><li id="ALM-12014__li26459270173421">If yes, <span id="ALM-12014__ph5591748152811">go to </span><a href="#ALM-12014__li4349723135320">9</a> for MRS 3.3.0 and later versions. For clusters earlier than MRS 3.3.0, manually clear the alarm. No further action is required<span id="ALM-12014__ph4592048132815">.</span></li><li id="ALM-12014__li62826150173421">If no, go to <a href="#ALM-12014__li1607193817587">10</a>.</li></ul>
</p></li><li class="subitemlist" id="ALM-12014__li4349723135320"><a name="ALM-12014__li4349723135320"></a><a name="li4349723135320"></a><span>Wait about 2 minutes and check whether the alarm is cleared.</span><p><ul class="subitemlist" id="ALM-12014__ul16105939173731"><li id="ALM-12014__li52125820173731">If yes, no further action is required.</li><li id="ALM-12014__li61441872173731">If no, go to <a href="#ALM-12014__li1607193817587">10</a>.</li></ul>
</p></li></ol> </p></li></ol>
<p id="ALM-12014__p0392542185819"><strong id="ALM-12014__b59246063204559">Collect fault information.</strong></p> <p id="ALM-12014__p0392542185819"><strong id="ALM-12014__b59246063204559">Collect fault information.</strong></p>
<ol start="9" id="ALM-12014__ol36071038115815"><li id="ALM-12014__li1607193817587"><a name="ALM-12014__li1607193817587"></a><a name="li1607193817587"></a><span>On the <span id="ALM-12014__text314615114416">MRS</span> Manager, choose <strong id="ALM-12014__b87862548435">O&amp;M</strong> &gt; <strong id="ALM-12014__b11281153164820">Log &gt; Download</strong>.</span></li><li id="ALM-12014__li1560793895812"><span>Select the <strong id="ALM-12014__b486612581809">OmmServer</strong> from the Services drop-down list and click <strong id="ALM-12014__b20607238175815">OK</strong>.</span></li><li id="ALM-12014__li660723815584"><span>Set Start Date for log collection to 10 minutes ahead of the alarm generation time and End Date to 10 minutes behind the alarm generation time and click <strong id="ALM-12014__b15452018112">Download</strong>.</span></li><li id="ALM-12014__li495644512588"><span>Contact the <span id="ALM-12014__text4614151421417">O&amp;M personnel</span> and send the collected log information.</span></li></ol> <ol start="10" id="ALM-12014__ol36071038115815"><li id="ALM-12014__li1607193817587"><a name="ALM-12014__li1607193817587"></a><a name="li1607193817587"></a><span>On the <span id="ALM-12014__text314615114416">MRS</span> Manager, choose <strong id="ALM-12014__b87862548435">O&amp;M</strong> &gt; <strong id="ALM-12014__b11281153164820">Log &gt; Download</strong>.</span></li><li id="ALM-12014__li1560793895812"><span>Select the <strong id="ALM-12014__b486612581809">OmmServer</strong> from the Services drop-down list and click <strong id="ALM-12014__b20607238175815">OK</strong>.</span></li><li id="ALM-12014__li660723815584"><span>Set Start Date for log collection to 10 minutes ahead of the alarm generation time and End Date to 10 minutes behind the alarm generation time and click <strong id="ALM-12014__b15452018112">Download</strong>.</span></li><li id="ALM-12014__li495644512588"><span>Contact the <span id="ALM-12014__text4614151421417">O&amp;M personnel</span> and 
send the collected log information.</span></li></ol>
</div> </div>
<div class="section" id="ALM-12014__section1529716184534"><h4 class="sectiontitle">Alarm Clearing</h4><p id="ALM-12014__p697913319401">After the fault is rectified, the system does not automatically clear this alarm, and you need to manually clear the alarm.</p> <div class="section" id="ALM-12014__section1529716184534"><h4 class="sectiontitle">Alarm Clearing</h4><p id="ALM-12014__p4677152685316">MRS 3.3.0 and later versions: After the fault is rectified, the system automatically clears this alarm.</p>
<p id="ALM-12014__p9804190104613">Versions earlier than MRS 3.3.0: After the fault is rectified, the system does not automatically clear this alarm, and you need to manually clear the alarm.</p>
</div> </div>
<div class="section" id="ALM-12014__s30aa982d9de44f9d918fba0190750058"><h4 class="sectiontitle">Related Information</h4><p id="ALM-12014__en-us_topic_0070543526_p21391728">None</p> <div class="section" id="ALM-12014__s30aa982d9de44f9d918fba0190750058"><h4 class="sectiontitle">Related Information</h4><p id="ALM-12014__en-us_topic_0070543526_p21391728">None</p>
</div> </div>
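The two command-line checks in the ALM-12014 procedure (looking for DirName in /etc/fstab, then confirming that the disk is mounted to DirName) can be sketched in Python as follows. This is a minimal illustration, not part of MRS: the function names and the sample path are hypothetical, the match is on the mount-point field exactly (stricter than "a line containing DirName"), and the mount check reads /proc/mounts-style text rather than parsing the output of the mount command.

```python
from typing import List


def fstab_lines_for(fstab_text: str, dir_name: str) -> List[str]:
    """Return the non-comment fstab lines whose mount point equals dir_name."""
    hits = []
    for line in fstab_text.splitlines():
        stripped = line.strip()
        # Skip blank lines and comments.
        if not stripped or stripped.startswith("#"):
            continue
        fields = stripped.split()
        # In fstab, the second field is the mount point.
        if len(fields) >= 2 and fields[1] == dir_name:
            hits.append(line)
    return hits


def is_mounted(mounts_text: str, dir_name: str) -> bool:
    """Return True if dir_name is a mount point in /proc/mounts-style text."""
    for line in mounts_text.splitlines():
        fields = line.split()
        # In /proc/mounts, the second field is the mount point.
        if len(fields) >= 2 and fields[1] == dir_name:
            return True
    return False
```

On the affected node, the inputs would typically come from open("/etc/fstab").read() and open("/proc/mounts").read(), with dir_name set to the DirName value obtained from the alarm location information.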


@ -1,15 +1,23 @@
<a name="ALM-12033"></a><a name="ALM-12033"></a> <a name="ALM-12033"></a><a name="ALM-12033"></a>
<h1 class="topictitle1">ALM-12033 Slow Disk Fault</h1> <h1 class="topictitle1">ALM-12033 Slow Disk Fault</h1>
<div id="body16648012"><div class="section" id="ALM-12033__section37461388"><h4 class="sectiontitle">Description</h4><ul id="ALM-12033__ul58461341018"><li id="ALM-12033__li9495543441">For HDDs, the alarm is triggered when any of the following conditions is met:<ul id="ALM-12033__ul12610161595313"><li id="ALM-12033__li5610201585311">The system runs the <strong id="ALM-12033__b1417353444611">iostat</strong> command every 3 seconds, and detects that the <strong id="ALM-12033__b15173183464614">svctm</strong> value exceeds 1000 ms for 7 consecutive periods within 30 seconds.</li><li id="ALM-12033__li9610111545314">The system runs the <strong id="ALM-12033__b46619613475">iostat</strong> command every 3 seconds, and detects that more than 50% of I/Os take more than 150 ms within 300s.</li></ul> <div id="body16648012"><div class="section" id="ALM-12033__section37461388"><h4 class="sectiontitle">Description</h4><p id="ALM-12033__p5341271587"><strong id="ALM-12033__b1865172020812">For MRS 3.3.0 and its later versions:</strong></p>
<ul id="ALM-12033__ul1028212014916"><li id="ALM-12033__li52821901296">For HDDs, the alarm is triggered when any of the following conditions is met:<ul id="ALM-12033__ul8282601492"><li id="ALM-12033__li1328210011915">By default, the system collects data every 3 seconds. Within 30 seconds, the svctm value reaches 1000 ms in at least seven collection periods.</li><li id="ALM-12033__li328210014919">By default, the system collects data every 3 seconds. Within 300 seconds, at least 50% of the collected svctm values are 150 ms or higher.</li></ul>
</li><li id="ALM-12033__li182825020913">For SSDs, the alarm is triggered when any of the following conditions is met:<ul id="ALM-12033__ul02822018918"><li id="ALM-12033__li728290199">By default, the system collects data every 3 seconds. Within 30 seconds, the svctm value reaches 1000 ms in at least seven collection periods.</li><li id="ALM-12033__li112829016918">By default, the system collects data every 3 seconds. Within 300 seconds, at least 50% of the collected svctm values are 20 ms or higher.</li></ul>
</li></ul>
<p id="ALM-12033__p172821702097">The collection period is 3 seconds, and the detection period is 30 or 300 seconds. This alarm is automatically cleared when neither of the preceding conditions is met for three consecutive detection periods (30 or 300 seconds).</p>
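The trigger conditions described above for MRS 3.3.0 and later can be sketched as a small check. This is only an illustration under stated assumptions: the function name and argument layout are hypothetical, each list holds the svctm values (in ms) collected every 3 seconds within one detection period, and "reaches" and "no less than" are read as greater than or equal to.

```python
from typing import Sequence

SPIKE_MS = 1000.0        # svctm level for the 30-second condition
SPIKE_COUNT = 7          # collection periods that must reach SPIKE_MS
HDD_SLOW_MS = 150.0      # per-sample level for HDDs in the 300-second condition
SSD_SLOW_MS = 20.0       # per-sample level for SSDs in the 300-second condition


def slow_disk_triggered(svctm_30s: Sequence[float],
                        svctm_300s: Sequence[float],
                        is_ssd: bool) -> bool:
    """Return True if either documented trigger condition is met."""
    # Condition 1: at least seven samples in the 30-second window reach 1000 ms.
    spikes = sum(1 for v in svctm_30s if v >= SPIKE_MS)
    if spikes >= SPIKE_COUNT:
        return True
    # Condition 2: at least 50% of the samples in the 300-second window
    # are at or above the per-media threshold.
    if svctm_300s:
        threshold = SSD_SLOW_MS if is_ssd else HDD_SLOW_MS
        slow = sum(1 for v in svctm_300s if v >= threshold)
        if slow * 2 >= len(svctm_300s):
            return True
    return False
```

With a 3-second collection period, a 30-second window holds about 10 samples and a 300-second window about 100, so the first condition needs 7 of 10 samples and the second needs 50 of 100.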
<p id="ALM-12033__p168428525712"><strong id="ALM-12033__b84321105496">For versions earlier than MRS 3.3.0:</strong></p>
<ul id="ALM-12033__ul58461341018"><li id="ALM-12033__li9495543441">For HDDs, the alarm is triggered when any of the following conditions is met:<ul id="ALM-12033__ul12610161595313"><li id="ALM-12033__li5610201585311">The system runs the <strong id="ALM-12033__b1417353444611">iostat</strong> command every 3 seconds, and detects that the <strong id="ALM-12033__b15173183464614">svctm</strong> value exceeds 1000 ms for 7 consecutive periods within 30 seconds.</li><li id="ALM-12033__li9610111545314">The system runs the <strong id="ALM-12033__b46619613475">iostat</strong> command every 3 seconds, and detects that more than 50% of I/Os take more than 150 ms within 300s.</li></ul>
</li><li id="ALM-12033__li88478345118">For SSDs, the alarm is triggered when any of the following conditions is met:<ul id="ALM-12033__ul1697514491912"><li id="ALM-12033__li20184348616">The system runs the <strong id="ALM-12033__b15668702267">iostat</strong> command every 3 seconds, and detects that the <strong id="ALM-12033__b266920112620">svctm</strong> value exceeds 1000 ms for 10 consecutive periods within 30 seconds.</li><li id="ALM-12033__li818514818112">The system runs the <strong id="ALM-12033__b549820317266">iostat</strong> command every 3 seconds, and detects that more than 60% of I/Os take more than 20 ms within 300 seconds.</li></ul> </li><li id="ALM-12033__li88478345118">For SSDs, the alarm is triggered when any of the following conditions is met:<ul id="ALM-12033__ul1697514491912"><li id="ALM-12033__li20184348616">The system runs the <strong id="ALM-12033__b15668702267">iostat</strong> command every 3 seconds, and detects that the <strong id="ALM-12033__b266920112620">svctm</strong> value exceeds 1000 ms for 10 consecutive periods within 30 seconds.</li><li id="ALM-12033__li818514818112">The system runs the <strong id="ALM-12033__b549820317266">iostat</strong> command every 3 seconds, and detects that more than 60% of I/Os take more than 20 ms within 300 seconds.</li></ul>
</li></ul> </li></ul>
<p id="ALM-12033__p1147865811515">This alarm is automatically cleared when the preceding conditions have not been met for 15 minutes.</p> <p id="ALM-12033__p1147865811515">This alarm is automatically cleared when the preceding conditions have not been met for 15 minutes.</p>
<div class="note" id="ALM-12033__note146121953385"><img src="public_sys-resources/note_3.0-en-us.png"><span class="notetitle"> </span><div class="notebody"><p id="ALM-12033__p10787194912146">The <strong id="ALM-12033__b4851941125114">svctm</strong> value can be obtained as follows:</p> <div class="note" id="ALM-12033__note146121953385"><img src="public_sys-resources/note_3.0-en-us.png"><span class="notetitle"> </span><div class="notebody"><p id="ALM-12033__p10787194912146">The <strong id="ALM-12033__b4851941125114">svctm</strong> value can be obtained as follows:</p>
<ul id="ALM-12033__ul12122227541"><li id="ALM-12033__li1775013885414">MRS 3.1.0:<p id="ALM-12033__p6761124418546"><a name="ALM-12033__li1775013885414"></a><a name="li1775013885414"></a>Run the <strong id="ALM-12033__b6647834165418">iostat -x -t</strong> command in the OS.</p> <ul id="ALM-12033__ul12122227541"><li id="ALM-12033__li1775013885414">MRS 3.1.0:<p id="ALM-12033__p6761124418546"><a name="ALM-12033__li1775013885414"></a><a name="li1775013885414"></a>Run the <strong id="ALM-12033__b6647834165418">iostat -x -t</strong> command in the OS.</p>
<p id="ALM-12033__p29371953145511"><span><img id="ALM-12033__image1950415575516" src="en-us_image_0000001583087321.png"></span></p> <p id="ALM-12033__p29371953145511"><span><img id="ALM-12033__image1950415575516" src="en-us_image_0000001583087321.png"></span></p>
</li><li id="ALM-12033__li023264515117">Versions later than MRS 3.1.0:</li></ul> </li><li id="ALM-12033__li023264515117">Versions later than MRS 3.1.0:<p id="ALM-12033__p332417335118"><a name="ALM-12033__li023264515117"></a><a name="li023264515117"></a>svctm = (tot_ticks_new - tot_ticks_old)/(rd_ios_new + wr_ios_new - rd_ios_old - wr_ios_old)</p>
<p id="ALM-12033__p332417335118">svctm = (tot_ticks_new - tot_ticks_old)/(rd_ios_new + wr_ios_new - rd_ios_old - wr_ios_old)</p> </li><li id="ALM-12033__li1619174674912">Versions earlier than MRS 3.3.0: If <strong id="ALM-12033__b188221852131413">rd_ios_new + wr_ios_new - rd_ios_old - wr_ios_old = 0</strong>, then <strong id="ALM-12033__b1982275211412">svctm = 0</strong>.</li><li id="ALM-12033__li1993324817494">MRS 3.3.0 and its later versions:<p id="ALM-12033__p1891618421461"><a name="ALM-12033__li1993324817494"></a><a name="li1993324817494"></a>When the detection period is 30 seconds, if <strong id="ALM-12033__b131901168243">rd_ios_new + wr_ios_new - rd_ios_old - wr_ios_old = 0</strong>, then <strong id="ALM-12033__b101903632413">svctm = 0</strong>.</p>
<p id="ALM-12033__p2010172134814">When the detection period is 300 seconds and <strong id="ALM-12033__b208045611429">rd_ios_new + wr_ios_new - rd_ios_old - wr_ios_old = 0</strong>, if <strong id="ALM-12033__b148095619424">tot_ticks_new - tot_ticks_old = 0</strong>, then <strong id="ALM-12033__b16803569429">svctm = 0</strong>; otherwise, the value of <strong id="ALM-12033__b780956204218">svctm</strong> is infinite.</p>
</li></ul>
<p id="ALM-12033__p4167121643616">If <strong id="ALM-12033__b15597134712416">rd_ios_new + wr_ios_new - rd_ios_old - wr_ios_old</strong> is <strong id="ALM-12033__b8379165513414">0</strong>, then <strong id="ALM-12033__b7311200253">svctm</strong> is <strong id="ALM-12033__b245516612518">0</strong>.</p> <p id="ALM-12033__p4167121643616">If <strong id="ALM-12033__b15597134712416">rd_ios_new + wr_ios_new - rd_ios_old - wr_ios_old</strong> is <strong id="ALM-12033__b8379165513414">0</strong>, then <strong id="ALM-12033__b7311200253">svctm</strong> is <strong id="ALM-12033__b245516612518">0</strong>.</p>
<p id="ALM-12033__p1268752201517">The parameters can be obtained as follows:</p> <p id="ALM-12033__p1268752201517">The parameters can be obtained as follows:</p>
<p id="ALM-12033__p5648122416463">The system runs the <strong id="ALM-12033__b3375154216449">cat /proc/diskstats</strong> command every 3 seconds to collect data. For example:</p> <p id="ALM-12033__p5648122416463">The system runs the <strong id="ALM-12033__b3375154216449">cat /proc/diskstats</strong> command every 3 seconds to collect data. For example:</p>
@ -32,7 +40,7 @@
</thead> </thead>
<tbody><tr id="ALM-12033__row65888257"><td class="cellrowborder" valign="top" width="33.33333333333333%" headers="mcps1.3.2.2.1.4.1.1 "><p id="ALM-12033__p35348568">12033</p> <tbody><tr id="ALM-12033__row65888257"><td class="cellrowborder" valign="top" width="33.33333333333333%" headers="mcps1.3.2.2.1.4.1.1 "><p id="ALM-12033__p35348568">12033</p>
</td> </td>
<td class="cellrowborder" valign="top" width="33.33333333333333%" headers="mcps1.3.2.2.1.4.1.2 "><p id="ALM-12033__p44661780">Minor</p> <td class="cellrowborder" valign="top" width="33.33333333333333%" headers="mcps1.3.2.2.1.4.1.2 "><ul id="ALM-12033__ul1323211594214"><li id="ALM-12033__li202320595220">Minor: MRS 3.3.0 and its later versions</li><li id="ALM-12033__li152325590214">Major: versions earlier than MRS 3.3.0</li></ul>
</td> </td>
<td class="cellrowborder" valign="top" width="33.33333333333333%" headers="mcps1.3.2.2.1.4.1.3 "><p id="ALM-12033__p60834461">Yes</p> <td class="cellrowborder" valign="top" width="33.33333333333333%" headers="mcps1.3.2.2.1.4.1.3 "><p id="ALM-12033__p60834461">Yes</p>
</td> </td>
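The svctm formula and its zero-denominator rules from the ALM-12033 note can be sketched as follows. This is a hedged illustration rather than the MRS implementation: the type and function names are hypothetical, and the field positions used for /proc/diskstats (4th column for reads completed, 8th for writes completed, 13th for milliseconds spent doing I/O) follow the Linux kernel's documented layout, not anything stated in this topic.

```python
import math
from typing import NamedTuple, Tuple


class DiskSample(NamedTuple):
    rd_ios: int     # reads completed
    wr_ios: int     # writes completed
    tot_ticks: int  # milliseconds spent doing I/O


def parse_diskstats_line(line: str) -> Tuple[str, DiskSample]:
    """Parse one /proc/diskstats line into (device name, sample)."""
    fields = line.split()
    return fields[2], DiskSample(int(fields[3]), int(fields[7]), int(fields[12]))


def svctm(old: DiskSample, new: DiskSample,
          detection_period_s: int = 30,
          mrs_330_or_later: bool = True) -> float:
    """svctm = (tot_ticks_new - tot_ticks_old) /
               (rd_ios_new + wr_ios_new - rd_ios_old - wr_ios_old)."""
    ios = (new.rd_ios + new.wr_ios) - (old.rd_ios + old.wr_ios)
    ticks = new.tot_ticks - old.tot_ticks
    if ios == 0:
        # MRS 3.3.0 and later, 300-second detection period: busy time with
        # no completed I/O is treated as an infinite svctm.
        if mrs_330_or_later and detection_period_s == 300 and ticks != 0:
            return math.inf
        # All other zero-denominator cases yield 0.
        return 0.0
    return ticks / ios
```

For example, if 20 I/Os complete between two samples while tot_ticks grows by 600 ms, the result is 30 ms per I/O.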


@ -0,0 +1,80 @@
<a name="ALM-12091"></a><a name="ALM-12091"></a>
<h1 class="topictitle1">ALM-12091 Abnormal disaster Resources</h1>
<div id="body0000002008297073"><div class="section" id="ALM-12091__section10369415133116"><h4 class="sectiontitle"><span id="ALM-12091__text8925301575">Alarm Description</span></h4><p id="ALM-12091__p50249318">HA checks the disaster resources of Manager every 86 seconds. This alarm is generated when HA detects that the disaster resources are abnormal for 10 consecutive checks.</p>
<p id="ALM-12091__p49590684">This alarm is cleared when HA detects that the disaster resources become normal.</p>
<p id="ALM-12091__p79241142103811">The <strong id="ALM-12091__b6559929206">Resource Type</strong> of disaster is <strong id="ALM-12091__b1956002911016">Single-active</strong>, so a resource exception triggers an active/standby switchover. By the time this alarm is generated, the switchover is complete and new disaster resources have been enabled on the new active Manager, so the alarm is cleared. Its purpose is to notify users of the cause of the active/standby Manager switchover.</p>
</div>
<div class="section" id="ALM-12091__section8323192410322"><h4 class="sectiontitle"><span id="ALM-12091__text38748475555">Alarm Attributes</span></h4>
<div class="tablenoborder"><table cellpadding="4" cellspacing="0" summary="" id="ALM-12091__table1479793583212" frame="border" border="1" rules="all"><thead align="left"><tr id="ALM-12091__row107991735133210"><th align="left" class="cellrowborder" valign="top" width="33.33333333333333%" id="mcps1.3.2.2.1.4.1.1"><p id="ALM-12091__p57710042"><span id="ALM-12091__text17980150175619">Alarm ID</span></p>
</th>
<th align="left" class="cellrowborder" valign="top" width="33.33333333333333%" id="mcps1.3.2.2.1.4.1.2"><p id="ALM-12091__p44001849"><span id="ALM-12091__text199471335614">Alarm Severity</span></p>
</th>
<th align="left" class="cellrowborder" valign="top" width="33.33333333333333%" id="mcps1.3.2.2.1.4.1.3"><p id="ALM-12091__p7380012"><span id="ALM-12091__text152400388563">Auto Cleared</span></p>
</th>
</tr>
</thead>
<tbody><tr id="ALM-12091__row880183517329"><td class="cellrowborder" valign="top" width="33.33333333333333%" headers="mcps1.3.2.2.1.4.1.1 "><p id="ALM-12091__p108014356328">12091</p>
</td>
<td class="cellrowborder" valign="top" width="33.33333333333333%" headers="mcps1.3.2.2.1.4.1.2 "><p id="ALM-12091__p19802163593213">Major</p>
</td>
<td class="cellrowborder" valign="top" width="33.33333333333333%" headers="mcps1.3.2.2.1.4.1.3 "><p id="ALM-12091__p880215356323">Yes</p>
</td>
</tr>
</tbody>
</table>
</div>
</div>
<div class="section" id="ALM-12091__section652875914327"><h4 class="sectiontitle"><span id="ALM-12091__text155061195577">Alarm Parameters</span></h4>
<div class="tablenoborder"><table cellpadding="4" cellspacing="0" summary="" id="ALM-12091__table1090459143316" frame="border" border="1" rules="all"><thead align="left"><tr id="ALM-12091__row190429173313"><th align="left" class="cellrowborder" valign="top" width="50%" id="mcps1.3.3.2.1.3.1.1"><p id="ALM-12091__p21975462"><span id="ALM-12091__text776142495720">Parameter</span></p>
</th>
<th align="left" class="cellrowborder" valign="top" width="50%" id="mcps1.3.3.2.1.3.1.2"><p id="ALM-12091__p35182007"><span id="ALM-12091__text632018391572">Description</span></p>
</th>
</tr>
</thead>
<tbody><tr id="ALM-12091__row1751615582354"><td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.3.2.1.3.1.1 "><p id="ALM-12091__p17935380415">Source</p>
</td>
<td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.3.2.1.3.1.2 "><p id="ALM-12091__p187931338134115">Specifies the cluster or system for which the alarm is generated.</p>
</td>
</tr>
<tr id="ALM-12091__row18907109203311"><td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.3.2.1.3.1.1 "><p id="ALM-12091__p99095916333">ServiceName</p>
</td>
<td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.3.2.1.3.1.2 "><p id="ALM-12091__p4909159173310">Specifies the service for which the alarm is generated.</p>
</td>
</tr>
<tr id="ALM-12091__row4910691332"><td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.3.2.1.3.1.1 "><p id="ALM-12091__p39101953320">RoleName</p>
</td>
<td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.3.2.1.3.1.2 "><p id="ALM-12091__p5911189173310">Specifies the role for which the alarm is generated.</p>
</td>
</tr>
<tr id="ALM-12091__row59118923315"><td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.3.2.1.3.1.1 "><p id="ALM-12091__p0912169123319">HostName</p>
</td>
<td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.3.2.1.3.1.2 "><p id="ALM-12091__p169131916332">Specifies the host for which the alarm is generated.</p>
</td>
</tr>
</tbody>
</table>
</div>
</div>
<div class="section" id="ALM-12091__section2990133614335"><h4 class="sectiontitle"><span id="ALM-12091__text2266192715582">Impact on the System</span></h4><ul id="ALM-12091__ul25260697"><li id="ALM-12091__li26019688">The active/standby Manager switchover occurs.</li><li id="ALM-12091__li32850608">The disaster process restarts repeatedly, which may cause active/standby DR to be unavailable.</li></ul>
</div>
<div class="section" id="ALM-12091__section950130153414"><h4 class="sectiontitle"><span id="ALM-12091__text12656240135813">Possible Causes</span></h4><p id="ALM-12091__p171771431115712">The disaster process is abnormal.</p>
</div>
<div class="section" id="ALM-12091__section1548510327214"><h4 class="sectiontitle"><span id="ALM-12091__text19569135285811">Handling Procedure</span></h4><p class="tableheading" id="ALM-12091__p8324186"><strong id="ALM-12091__b1530064416313">Check whether the disaster process is normal.</strong></p>
<ol id="ALM-12091__ol5558276163811"><li id="ALM-12091__li34357272165726"><span>In the alarm list on FusionInsight Manager, locate the row that contains the alarm, and click <span><img id="ALM-12091__image168221113135319" src="en-us_image_0000002008258961.png"></span> to view the name of the host for which the alarm is generated.</span></li><li id="ALM-12091__li50024484163811"><span>Log in to the host for which the alarm is generated as user <strong id="ALM-12091__b11453743219">root</strong>. <span id="ALM-12091__text65184518511"></span></span></li><li id="ALM-12091__li1581327399"><span>Run the <strong id="ALM-12091__b249615917334">su - omm</strong> command to switch to user <strong id="ALM-12091__b1496159183317">omm</strong>.</span></li><li id="ALM-12091__li17626636132716"><span>Run the <strong id="ALM-12091__b32015537163811">sh ${BIGDATA_HOME}/om-server/OMS/workspace0/ha/module/hacom/script/status_ha.sh</strong> command to check whether the status of the disaster resources managed by the HA is normal. In the single-node system, the disaster resource is in the normal state. In the dual-node system, the disaster resource is in the normal state on the active node and in the stopped state on the standby node.</span><p><ul class="subitemlist" id="ALM-12091__ul66289368274"><li id="ALM-12091__li1062811360271">If yes, go to <a href="#ALM-12091__li6152360163635">7</a>.</li><li id="ALM-12091__li46281436112719">If no, go to <a href="#ALM-12091__li139657016249">5</a>.</li></ul>
</p></li><li id="ALM-12091__li139657016249"><a name="ALM-12091__li139657016249"></a><a name="li139657016249"></a><span>Run the <strong id="ALM-12091__b519675815717">vi ${BIGDATA_LOG_HOME}/disaster/disaster.log</strong> command to check whether the disaster resource log of HA contains the keyword <strong id="ALM-12091__b1919625895719">ERROR</strong>. If yes, analyze the logs to locate the resource exception cause and fix the exception.</span></li><li id="ALM-12091__li14736019164314"><span>Wait 5 minutes and check whether the alarm is automatically cleared.</span><p><ul class="subitemlist" id="ALM-12091__ul473671984320"><li id="ALM-12091__li9736151912432">If yes, no further action is required.</li><li id="ALM-12091__li4736141910439">If no, go to <a href="#ALM-12091__li6152360163635">7</a>.</li></ul>
</p></li></ol>
<p id="ALM-12091__p3652216163758"><strong id="ALM-12091__b83507409354">Collect fault information.</strong></p>
<ol start="7" id="ALM-12091__ol26111342163819"><li id="ALM-12091__li6152360163635"><a name="ALM-12091__li6152360163635"></a><a name="li6152360163635"></a><span>On FusionInsight Manager, choose <strong id="ALM-12091__b5931842173510">O&amp;M</strong>. In the navigation pane on the left, choose <strong id="ALM-12091__b169334283512">Log</strong> &gt; <strong id="ALM-12091__b893154212351">Download</strong>.</span></li><li id="ALM-12091__li55371246163635"><span>Expand the <strong id="ALM-12091__b975881714366">Service</strong> drop-down list, select <strong id="ALM-12091__b1758517163617">Disaster</strong> for the target cluster, and click <strong id="ALM-12091__b8758417163620">OK</strong>.</span></li><li id="ALM-12091__li28579174163635"><span>Click <span><img id="ALM-12091__image69691781225" src="en-us_image_0000002008299541.png"></span> in the upper right corner, and set <strong id="ALM-12091__b17704133417363">Start Date</strong> and <strong id="ALM-12091__b87041334123611">End Date</strong> for log collection to 1 hour ahead of and after the alarm generation time, respectively. Then, click <strong id="ALM-12091__b1870583453615">Download</strong>.</span></li><li id="ALM-12091__li33211732163635"><span>Contact <span id="ALM-12091__text12867404363">O&amp;M personnel</span> and provide the collected logs.</span></li></ol>
</div>
<div class="section" id="ALM-12091__section129720811223"><h4 class="sectiontitle"><span id="ALM-12091__text367020138593">Alarm Clearance</span></h4><p id="ALM-12091__p19973168152211">This alarm is automatically cleared after the fault is rectified.</p>
</div>
<div class="section" id="ALM-12091__section53362350"><h4 class="sectiontitle"><span id="ALM-12091__text1246242445916">Related Information</span></h4><p id="ALM-12091__p7522741"><span id="ALM-12091__text1881919412591">None.</span></p>
</div>
</div>
<div>
<div class="familylinks">
<div class="parentlink"><strong>Parent topic:</strong> <a href="mrs_01_1298.html">Alarm Reference (Applicable to MRS 3.x)</a></div>
</div>
</div>


@ -1,7 +1,13 @@
<a name="ALM-12180"></a><a name="ALM-12180"></a> <a name="ALM-12180"></a><a name="ALM-12180"></a>
<h1 class="topictitle1">ALM-12180 Suspended Disk I/O</h1> <h1 class="topictitle1">ALM-12180 Suspended Disk I/O</h1>
<div id="body0000001353935630"><div class="section" id="ALM-12180__section14673296256"><h4 class="sectiontitle">Description</h4><ul id="ALM-12180__ul2014714514423"><li id="ALM-12180__li1614775174215">For HDDs, the alarm is triggered when any of the following conditions is met:<ul id="ALM-12180__ul17332858184213"><li id="ALM-12180__li3477654124215">The system collects data every 3 seconds, and detects that the <strong id="ALM-12180__b187823011515">svctm</strong> value exceeds 6s for 10 consecutive periods within 30 seconds.</li><li id="ALM-12180__li44771354124218">The system collects data every 3 seconds, and detects that the <strong id="ALM-12180__b1552492416563">avgqu-sz</strong> value is greater than 0, the IOPS or bandwidth is 0, and the <strong id="ALM-12180__b165881215175717">ioutil</strong> value is greater than <strong id="ALM-12180__b39725306574">99%</strong> for 10 consecutive periods within 30 seconds.</li></ul> <div id="body0000001353935630"><div class="section" id="ALM-12180__section14673296256"><h4 class="sectiontitle">Description</h4><p id="ALM-12180__p592920101711"><strong id="ALM-12180__b3954021192120">For MRS 3.3.0 and its later versions:</strong></p>
<ul id="ALM-12180__ul1599064812170"><li id="ALM-12180__li3990748141711">For HDDs, the alarm is triggered when any of the following conditions is met:<ul id="ALM-12180__ul199906487171"><li id="ALM-12180__li199074831718">By default, the system collects data every 3 seconds. The svctm value reaches 6 seconds in at least seven collection periods within 30 seconds.</li><li id="ALM-12180__li2990114810171">By default, the system collects data every 3 seconds. The disk queue depth (<strong id="ALM-12180__b1721433115210">avgqu-sz</strong>) is greater than 0, the IOPS or bandwidth is 0, and <strong id="ALM-12180__b1421533105219">ioutil</strong> is greater than 99% in at least 10 collection periods within 30 seconds.</li><li id="ALM-12180__li799054861713">By default, the system collects data every 3 seconds. At least 50% of the svctm values detected within 300 seconds are no less than 1000 ms.</li></ul>
</li><li id="ALM-12180__li1699054820177">For SSDs, the alarm is triggered when any of the following conditions is met:<ul id="ALM-12180__ul18990948121720"><li id="ALM-12180__li199903486178">By default, the system collects data every 3 seconds. The svctm value reaches 3 seconds in at least seven collection periods within 30 seconds.</li><li id="ALM-12180__li209906483174">By default, the system collects data every 3 seconds. The disk queue depth (<strong id="ALM-12180__b19773135116518">avgqu-sz</strong>) is greater than 0, the IOPS or bandwidth is 0, and <strong id="ALM-12180__b777420514512">ioutil</strong> is greater than 99% in at least 10 collection periods within 30 seconds.</li><li id="ALM-12180__li1899074812179">By default, the system collects data every 3 seconds. At least 50% of the svctm values detected within 300 seconds are no less than 500 ms.</li></ul>
</li></ul>
<p id="ALM-12180__p499024817173">The collection period is 3 seconds, and the detection period is 30 or 300 seconds. This alarm is automatically cleared when none of the preceding conditions is met for three consecutive detection periods (30 or 300 seconds).</p>
<p id="ALM-12180__p1138317920171"><strong id="ALM-12180__b16881736201413">For versions earlier than MRS 3.3.0:</strong></p>
<ul id="ALM-12180__ul2014714514423"><li id="ALM-12180__li1614775174215">For HDDs, the alarm is triggered when any of the following conditions is met:<ul id="ALM-12180__ul17332858184213"><li id="ALM-12180__li3477654124215">The system collects data every 3 seconds, and detects that the <strong id="ALM-12180__b187823011515">svctm</strong> value exceeds 6s for 10 consecutive periods within 30 seconds.</li><li id="ALM-12180__li44771354124218">The system collects data every 3 seconds, and detects that the <strong id="ALM-12180__b1552492416563">avgqu-sz</strong> value is greater than 0, the IOPS or bandwidth is 0, and the <strong id="ALM-12180__b165881215175717">ioutil</strong> value is greater than <strong id="ALM-12180__b39725306574">99%</strong> for 10 consecutive periods within 30 seconds.</li></ul>
</li><li id="ALM-12180__li1447755464219">For SSDs, the alarm is triggered when any of the following conditions is met:<ul id="ALM-12180__ul1255610134312"><li id="ALM-12180__li15477195414210">The system collects data every 3 seconds, and detects that the <strong id="ALM-12180__b1360517433619">svctm</strong> value exceeds 2s for 10 consecutive periods within 30 seconds.</li><li id="ALM-12180__li947865419422">The system collects data every 3 seconds, and detects that the <strong id="ALM-12180__b134091219815">avgqu-sz</strong> value is greater than 0, the IOPS or bandwidth is 0, and the <strong id="ALM-12180__b144071210812">ioutil</strong> value is greater than <strong id="ALM-12180__b1141612088">99%</strong> for 10 consecutive periods within 30 seconds.</li></ul> </li><li id="ALM-12180__li1447755464219">For SSDs, the alarm is triggered when any of the following conditions is met:<ul id="ALM-12180__ul1255610134312"><li id="ALM-12180__li15477195414210">The system collects data every 3 seconds, and detects that the <strong id="ALM-12180__b1360517433619">svctm</strong> value exceeds 2s for 10 consecutive periods within 30 seconds.</li><li id="ALM-12180__li947865419422">The system collects data every 3 seconds, and detects that the <strong id="ALM-12180__b134091219815">avgqu-sz</strong> value is greater than 0, the IOPS or bandwidth is 0, and the <strong id="ALM-12180__b144071210812">ioutil</strong> value is greater than <strong id="ALM-12180__b1141612088">99%</strong> for 10 consecutive periods within 30 seconds.</li></ul>
</li></ul> </li></ul>
<p id="ALM-12180__p4178195414013">This alarm is automatically cleared when the preceding conditions have not been met for 90s.</p> <p id="ALM-12180__p4178195414013">This alarm is automatically cleared when the preceding conditions have not been met for 90s.</p>
@ -15,16 +21,17 @@
</li><li id="ALM-12180__li1775013885414">MRS 3.1.0:<p id="ALM-12180__p6761124418546"><a name="ALM-12180__li1775013885414"></a><a name="li1775013885414"></a>Run the <strong id="ALM-12180__b20959192741112">iostat -x -t</strong> command in the OS.</p> </li><li id="ALM-12180__li1775013885414">MRS 3.1.0:<p id="ALM-12180__p6761124418546"><a name="ALM-12180__li1775013885414"></a><a name="li1775013885414"></a>Run the <strong id="ALM-12180__b20959192741112">iostat -x -t</strong> command in the OS.</p>
<p id="ALM-12180__p29371953145511"><span><img id="ALM-12180__image1950415575516" src="en-us_image_0000001532607690.png"></span></p> <p id="ALM-12180__p29371953145511"><span><img id="ALM-12180__image1950415575516" src="en-us_image_0000001532607690.png"></span></p>
</li><li id="ALM-12180__li023264515117">Calculate <strong id="ALM-12180__b169471639111116">svctm</strong> as follows in versions later than MRS 3.1.0:<p id="ALM-12180__p332417335118">svctm = (tot_ticks_new - tot_ticks_old)/(rd_ios_new + wr_ios_new - rd_ios_old - wr_ios_old)</p> </li><li id="ALM-12180__li023264515117">Calculate <strong id="ALM-12180__b169471639111116">svctm</strong> as follows in versions later than MRS 3.1.0:<p id="ALM-12180__p332417335118">svctm = (tot_ticks_new - tot_ticks_old)/(rd_ios_new + wr_ios_new - rd_ios_old - wr_ios_old)</p>
<p id="ALM-12180__p4167121643616">If <strong id="ALM-12180__b169718121611">rd_ios_new + wr_ios_new - rd_ios_old - wr_ios_old</strong> is <strong id="ALM-12180__b8972101216616">0</strong>, then <strong id="ALM-12180__b8972712666">svctm</strong> is <strong id="ALM-12180__b15972181217617">0</strong>.</p> </li><li id="ALM-12180__li263234505719">Versions earlier than MRS 3.3.0: If <strong id="ALM-12180__b1249415917273">rd_ios_new + wr_ios_new - rd_ios_old - wr_ios_old = 0</strong>, then <strong id="ALM-12180__b5495155962715">svctm = 0</strong>.</li><li id="ALM-12180__li3635194519573">MRS 3.3.0 and its later versions:<p id="ALM-12180__p1818817301575"><a name="ALM-12180__li3635194519573"></a><a name="li3635194519573"></a>When the detection period is 30 seconds, if <strong id="ALM-12180__b1842112818281">rd_ios_new + wr_ios_new - rd_ios_old - wr_ios_old = 0</strong>, then <strong id="ALM-12180__b9421112813282">svctm = 0</strong>.</p>
<p id="ALM-12180__p1268752201517">The parameters can be obtained as follows:</p> <p id="ALM-12180__p59781931171919">When the detection period is 300 seconds and <strong id="ALM-12180__b127418421289">rd_ios_new + wr_ios_new - rd_ios_old - wr_ios_old = 0</strong>, if <strong id="ALM-12180__b1274134212286">tot_ticks_new - tot_ticks_old = 0</strong>, then <strong id="ALM-12180__b0274342112813">svctm = 0</strong>; otherwise, the value of <strong id="ALM-12180__b10274114210289">svctm</strong> is infinite.</p>
<p id="ALM-12180__p5648122416463">The system runs the <strong id="ALM-12180__b3987117155318">cat /proc/diskstats</strong> command every 3 seconds to collect data. For example:</p>
<p id="ALM-12180__p1657515122539"><span><img id="ALM-12180__image1675110291273" src="en-us_image_0000001583087345.png"></span></p>
<p id="ALM-12180__p146243408539">In these two commands:</p>
<p id="ALM-12180__p1264621195310">In the data collected for the first time, the number in the fourth column is the <strong id="ALM-12180__b2974115335316">rd_ios_old</strong> value, the number in the eighth column is the <strong id="ALM-12180__b1197410533532">wr_ios_old</strong> value, and the number in the thirteenth column is the <strong id="ALM-12180__b0974125312533">tot_ticks_old</strong> value.</p>
<p id="ALM-12180__p415119825410">In the data collected for the second time, the number in the fourth column is the <strong id="ALM-12180__b5467171016545">rd_ios_new</strong> value, the number in the eighth column is the <strong id="ALM-12180__b746711016548">wr_ios_new</strong> value, and the number in the thirteenth column is the <strong id="ALM-12180__b446716104542">tot_ticks_new</strong> value.</p>
<p id="ALM-12180__p1328974985416">In this case, the value of <strong id="ALM-12180__b71451317175415">svctm</strong> is as follows:</p>
<p id="ALM-12180__p296819576542">(19571460 - 19569526)/(1101553 + 28747977 - 1101553 - 28744856) = 0.6197</p>
</li></ul> </li></ul>
<p id="ALM-12180__p681018184114">The parameters can be obtained as follows:</p>
<p id="ALM-12180__p1964583419">The system runs the <strong id="ALM-12180__b3987117155318">cat /proc/diskstats</strong> command every 3 seconds to collect data. For example:</p>
<p id="ALM-12180__p178281094114"><span><img id="ALM-12180__image1573148194110" src="en-us_image_0000001583087345.png"></span></p>
<p id="ALM-12180__p57318164113">In the output of these two commands:</p>
<p id="ALM-12180__p173148124116">In the data collected for the first time, the number in the fourth column is the <strong id="ALM-12180__b1273281411">rd_ios_old</strong> value, the number in the eighth column is the <strong id="ALM-12180__b4737884111">wr_ios_old</strong> value, and the number in the thirteenth column is the <strong id="ALM-12180__b9738810415">tot_ticks_old</strong> value.</p>
<p id="ALM-12180__p14731384412">In the data collected for the second time, the number in the fourth column is the <strong id="ALM-12180__b6735854112">rd_ios_new</strong> value, the number in the eighth column is the <strong id="ALM-12180__b4737814116">wr_ios_new</strong> value, and the number in the thirteenth column is the <strong id="ALM-12180__b6731788411">tot_ticks_new</strong> value.</p>
<p id="ALM-12180__p207318834113">In this case, the value of <strong id="ALM-12180__b19731286414">svctm</strong> is as follows:</p>
<p id="ALM-12180__p7734824114">(19571460 - 19569526)/(1101553 + 28747977 - 1101553 - 28744856) = 0.6197</p>
</div></div> </div></div>
</div> </div>
<div class="section" id="ALM-12180__section28308296"><h4 class="sectiontitle">Attribute</h4> <div class="section" id="ALM-12180__section28308296"><h4 class="sectiontitle">Attribute</h4>

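The svctm rule described for ALM-12180 can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the monitoring code that MRS actually runs: the function names `parse_diskstats_line` and `svctm` and the `detection_period` parameter are hypothetical, and the 300-second branch reflects the behavior documented for MRS 3.3.0 and later only. The sample values are the ones used in the worked example in the topic.

```python
def parse_diskstats_line(line):
    """Return (rd_ios, wr_ios, tot_ticks) from one /proc/diskstats line.

    Per the topic text, the 4th, 8th, and 13th columns hold these values.
    """
    fields = line.split()
    return int(fields[3]), int(fields[7]), int(fields[12])


def svctm(old, new, detection_period=30):
    """Compute svctm from two samples of (rd_ios, wr_ios, tot_ticks).

    detection_period is a hypothetical parameter: 30 or 300 (seconds).
    """
    rd_old, wr_old, ticks_old = old
    rd_new, wr_new, ticks_new = new
    ios = rd_new + wr_new - rd_old - wr_old
    ticks = ticks_new - ticks_old
    if ios == 0:
        # No I/O completed between the two samples.
        if detection_period == 300 and ticks != 0:
            # Documented for MRS 3.3.0 and later: svctm is treated as infinite.
            return float("inf")
        return 0.0
    return ticks / ios


# Values from the worked example in the topic.
old_sample = (1101553, 28744856, 19569526)
new_sample = (1101553, 28747977, 19571460)
print(round(svctm(old_sample, new_sample), 4))
```

For versions earlier than MRS 3.3.0, the topic describes only the first rule (a zero I/O count gives svctm = 0), which corresponds to calling `svctm` with the default `detection_period`.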

@ -0,0 +1,83 @@
<a name="ALM-12186"></a><a name="ALM-12186"></a>
<h1 class="topictitle1">ALM-12186 CGroup Task Usage Exceeds the Threshold</h1>
<div id="body0000001971656752"><div class="section" id="ALM-12186__section59446631"><h4 class="sectiontitle"><span id="ALM-12186__text8925301575">Alarm Description</span></h4><p id="ALM-12186__p133011517258">The system checks the CGroup task usage of user <strong id="ALM-12186__b17330105132510">omm</strong> every 5 minutes. This alarm is generated when the CGroup task usage exceeds 90%. This alarm is cleared when the CGroup task usage is less than or equal to 90%.</p>
<p id="ALM-12186__p196911897275">CGroup task usage = Number of used CGroup tasks/Maximum number of CGroup tasks</p>
<p id="ALM-12186__p20655700">You can run the <strong id="ALM-12186__b48417473221">systemctl status user-$(id -u).slice | grep limit | awk -F ' ' '{print $2}'</strong> command as user <strong id="ALM-12186__b10108105192314">omm</strong> to obtain the number of used CGroup tasks of this user and run the <strong id="ALM-12186__b2012312932312">echo $(systemctl status user-$(id -u).slice | grep limit | awk -F ' ' '{print $4}') | sed -e 's/)//g'</strong> command to obtain the maximum number of CGroup tasks allowed for this user.</p>
</div>
<div class="section" id="ALM-12186__section65257632"><h4 class="sectiontitle"><span id="ALM-12186__text38748475555">Alarm Attributes</span></h4>
<div class="tablenoborder"><table cellpadding="4" cellspacing="0" summary="" id="ALM-12186__table62499017" frame="border" border="1" rules="all"><thead align="left"><tr id="ALM-12186__row14656770"><th align="left" class="cellrowborder" valign="top" width="33.33333333333333%" id="mcps1.3.2.2.1.4.1.1"><p id="ALM-12186__p57710042"><span id="ALM-12186__text17980150175619">Alarm ID</span></p>
</th>
<th align="left" class="cellrowborder" valign="top" width="33.33333333333333%" id="mcps1.3.2.2.1.4.1.2"><p id="ALM-12186__p44001849"><span id="ALM-12186__text199471335614">Alarm Severity</span></p>
</th>
<th align="left" class="cellrowborder" valign="top" width="33.33333333333333%" id="mcps1.3.2.2.1.4.1.3"><p id="ALM-12186__p7380012"><span id="ALM-12186__text152400388563">Auto Cleared</span></p>
</th>
</tr>
</thead>
<tbody><tr id="ALM-12186__row37197943"><td class="cellrowborder" valign="top" width="33.33333333333333%" headers="mcps1.3.2.2.1.4.1.1 "><p id="ALM-12186__p1840118642517">12186</p>
</td>
<td class="cellrowborder" valign="top" width="33.33333333333333%" headers="mcps1.3.2.2.1.4.1.2 "><p id="ALM-12186__p1539715612254">Major</p>
</td>
<td class="cellrowborder" valign="top" width="33.33333333333333%" headers="mcps1.3.2.2.1.4.1.3 "><p id="ALM-12186__p739516182515">Yes</p>
</td>
</tr>
</tbody>
</table>
</div>
</div>
<div class="section" id="ALM-12186__section50447784"><h4 class="sectiontitle"><span id="ALM-12186__text155061195577">Alarm Parameters</span></h4>
<div class="tablenoborder"><table cellpadding="4" cellspacing="0" summary="" id="ALM-12186__table7221941" frame="border" border="1" rules="all"><thead align="left"><tr id="ALM-12186__row38703738"><th align="left" class="cellrowborder" valign="top" width="50%" id="mcps1.3.3.2.1.3.1.1"><p id="ALM-12186__p21975462"><span id="ALM-12186__text776142495720">Parameter</span></p>
</th>
<th align="left" class="cellrowborder" valign="top" width="50%" id="mcps1.3.3.2.1.3.1.2"><p id="ALM-12186__p35182007"><span id="ALM-12186__text632018391572">Description</span></p>
</th>
</tr>
</thead>
<tbody><tr id="ALM-12186__row12600839183317"><td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.3.2.1.3.1.1 "><p id="ALM-12186__p14765121319101">Source</p>
</td>
<td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.3.2.1.3.1.2 "><p id="ALM-12186__p1486615431212">Specifies the cluster or system for which the alarm is generated.</p>
</td>
</tr>
<tr id="ALM-12186__row20687350"><td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.3.2.1.3.1.1 "><p id="ALM-12186__p1376621317100">ServiceName</p>
</td>
<td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.3.2.1.3.1.2 "><p id="ALM-12186__p35582507">Specifies the service for which the alarm is generated.</p>
</td>
</tr>
<tr id="ALM-12186__row51807112"><td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.3.2.1.3.1.1 "><p id="ALM-12186__p15766121315107">RoleName</p>
</td>
<td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.3.2.1.3.1.2 "><p id="ALM-12186__p70810">Specifies the role for which the alarm is generated.</p>
</td>
</tr>
<tr id="ALM-12186__row637295"><td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.3.2.1.3.1.1 "><p id="ALM-12186__p2076651312109">HostName</p>
</td>
<td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.3.2.1.3.1.2 "><p id="ALM-12186__p20545297">Specifies the host for which the alarm is generated.</p>
</td>
</tr>
</tbody>
</table>
</div>
</div>
<div class="section" id="ALM-12186__section51376879"><h4 class="sectiontitle"><span id="ALM-12186__text2266192715582">Impact on the System</span></h4><ul id="ALM-12186__ul912918518349"><li id="ALM-12186__li61291353340">Switching to user <strong id="ALM-12186__b998615452283">omm</strong> fails.</li><li id="ALM-12186__li61290573410">New <strong id="ALM-12186__b171581889204">omm</strong> processes cannot be created.</li></ul>
<ul id="ALM-12186__ul356321419141"><li id="ALM-12186__li145631914181416">A faulty service or process cannot be restarted.</li></ul>
</div>
<div class="section" id="ALM-12186__section59738735"><h4 class="sectiontitle"><span id="ALM-12186__text12656240135813">Possible Causes</span></h4><p id="ALM-12186__p63545420285">The CGroup task usage exceeds 90%.</p>
</div>
<div class="section" id="ALM-12186__section74438017253"><h4 class="sectiontitle"><span id="ALM-12186__text19569135285811">Handling Procedure</span></h4><p id="ALM-12186__p6803921141514"><strong id="ALM-12186__b10197143715218">Check whether the maximum number of threads that can be concurrently opened by user omm is properly set.</strong></p>
<ol id="ALM-12186__ol18615415345"><li id="ALM-12186__li4861174163414"><span>Log in to FusionInsight Manager and choose <strong id="ALM-12186__b59771025152414">O&amp;M</strong> &gt; <strong id="ALM-12186__b12977325132418">Alarm</strong> &gt; <strong id="ALM-12186__b29783254243">Alarms</strong>. On the page that is displayed, click <span><img id="ALM-12186__image178619416342" src="en-us_image_0000001971659200.png"></span> in the row containing the alarm, and view the name of the host for which the alarm is generated in <strong id="ALM-12186__b139781825112416">Location</strong>. Click the host name to view its IP address.</span></li><li id="ALM-12186__li38611343349"><span>Log in to the host for which the alarm is generated as user <strong id="ALM-12186__b108614453411">omm</strong>.</span></li><li id="ALM-12186__li78615410344"><span>Run the following command to obtain the maximum number of threads that can be concurrently opened by user <strong id="ALM-12186__b44603483302">omm</strong> and check whether this number is greater than or equal to <strong id="ALM-12186__b1359611271442">60000</strong>:</span><p><p id="ALM-12186__p186144103410"><strong id="ALM-12186__b188611440346">systemctl status user-$(id -u).slice | grep limit</strong></p>
<ul id="ALM-12186__ul68351049153616"><li id="ALM-12186__li683524983611">If yes, go to <a href="#ALM-12186__li18602412348">6</a>.</li><li id="ALM-12186__li168351249113620">If no, go to <a href="#ALM-12186__li9448150105813">4</a>.</li></ul>
</p></li><li id="ALM-12186__li9448150105813"><a name="ALM-12186__li9448150105813"></a><a name="li9448150105813"></a><span>Switch to user <strong id="ALM-12186__b1474565455812">root</strong> and run the following command to change the value for user <strong id="ALM-12186__b1974513542588">omm</strong> to <strong id="ALM-12186__b4971456144717">60000</strong>:</span><p><p id="ALM-12186__p3829509598"><strong id="ALM-12186__b98290075918">systemctl set-property user-2000.slice TasksMax=60000</strong></p>
</p></li><li id="ALM-12186__li23671340113420"><span>Change the value of <strong id="ALM-12186__b5941937204815">UserTasksMax</strong> in the <strong id="ALM-12186__b143671241124814">/etc/systemd/logind.conf</strong> file to <strong id="ALM-12186__b19903204454814">60000</strong>. (If the parameter is commented out, uncomment it.) Save the file, wait 5 minutes, and check whether the alarm is cleared.</span><p><ul id="ALM-12186__ul068811558364"><li id="ALM-12186__li7689125573617">If yes, no further action is required.</li><li id="ALM-12186__li156890551363">If no, go to <a href="#ALM-12186__li18602412348">6</a>.</li></ul>
</p></li></ol>
<p id="ALM-12186__p1215415911338"><strong id="ALM-12186__b594844774514">Collect fault information.</strong></p>
<ol start="6" id="ALM-12186__ol98604413413"><li id="ALM-12186__li18602412348"><a name="ALM-12186__li18602412348"></a><a name="li18602412348"></a><span>On FusionInsight Manager of the cluster, choose <strong id="ALM-12186__b145611142462">O&amp;M</strong>. In the navigation pane on the left, choose <strong id="ALM-12186__b756164154619">Log</strong> &gt; <strong id="ALM-12186__b25611140464">Download</strong>.</span></li><li id="ALM-12186__li128601240346"><span>Expand the <strong id="ALM-12186__b563015444610">Service</strong> drop-down list, select <strong id="ALM-12186__b435816267461">OmmServer</strong> and <strong id="ALM-12186__b17358926194615">NodeAgent</strong> for the target cluster, and click <strong id="ALM-12186__b1963711277476">OK</strong>.</span></li><li id="ALM-12186__li12860194133410"><span>Click <span><img id="ALM-12186__image104601319175315" src="en-us_image_0000001971818972.png"></span> in the upper right corner, and set <strong id="ALM-12186__b1417313467476">Start Date</strong> and <strong id="ALM-12186__b12174184612477">End Date</strong> for log collection to 10 minutes ahead of and after the alarm generation time, respectively. Then, click <strong id="ALM-12186__b16174164617476">Download</strong>.</span></li><li id="ALM-12186__li1886064173413"><span>Contact <span id="ALM-12186__text8470138174810">O&amp;M personnel</span> and provide the collected logs.</span></li></ol>
</div>
<div class="section" id="ALM-12186__section169311343318"><h4 class="sectiontitle"><span id="ALM-12186__text367020138593">Alarm Clearance</span></h4><p id="ALM-12186__p754913417333">This alarm is automatically cleared after the fault is rectified.</p>
</div>
<div class="section" id="ALM-12186__section53362350"><h4 class="sectiontitle"><span id="ALM-12186__text1246242445916">Related Information</span></h4><p id="ALM-12186__p7522741"><span id="ALM-12186__text1881919412591">None.</span></p>
</div>
</div>
<div>
<div class="familylinks">
<div class="parentlink"><strong>Parent topic:</strong> <a href="mrs_01_1298.html">Alarm Reference (Applicable to MRS 3.x)</a></div>
</div>
</div>
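The usage formula for ALM-12186 (number of used CGroup tasks divided by the maximum number of CGroup tasks, with the alarm raised above 90%) can be sketched in Python as follows. This is an illustration only: the function names are hypothetical, and the sample line assumes that the `grep limit` output of `systemctl status user-$(id -u).slice` has the form `Tasks: <used> (limit: <max>)`, which is what the `awk` field positions in the topic imply.

```python
import re


def parse_tasks_line(line):
    """Extract (used, maximum) from a line such as 'Tasks: 123 (limit: 60000)'."""
    match = re.search(r"Tasks:\s*(\d+)\s*\(limit:\s*(\d+)\)", line)
    if match is None:
        raise ValueError("no 'Tasks: N (limit: M)' pattern found")
    return int(match.group(1)), int(match.group(2))


def cgroup_task_usage(used, maximum):
    """CGroup task usage = used CGroup tasks / maximum CGroup tasks."""
    return used / maximum


def alarm_should_fire(used, maximum, threshold=0.90):
    """The alarm is generated only when the usage exceeds the threshold."""
    return cgroup_task_usage(used, maximum) > threshold


# Hypothetical sample line, not captured from a real host.
sample = "    Tasks: 54321 (limit: 60000)"
used, maximum = parse_tasks_line(sample)
print(f"{cgroup_task_usage(used, maximum):.2%}", alarm_should_fire(used, maximum))
```

A usage of exactly 90% does not trigger the alarm in this sketch, which matches the statement that the alarm is cleared when the usage is less than or equal to 90%.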

102
docs/mrs/umn/ALM-12187.html Normal file

File diff suppressed because it is too large Load Diff


@ -0,0 +1,85 @@
<a name="ALM-12188"></a><a name="ALM-12188"></a>
<h1 class="topictitle1">ALM-12188 diskmgt Disk Monitoring Unavailable</h1>
<div id="body0000002008256521"><div class="section" id="ALM-12188__section14673296256"><h4 class="sectiontitle"><span id="ALM-12188__text8925301575">Alarm Description</span></h4><p id="ALM-12188__p14221612114715">NodeAgent checks the status of the diskmgt disk monitoring service every 5 minutes. This alarm is generated when diskmgt disk monitoring is unavailable.</p>
<p id="ALM-12188__p0221191210476">This alarm is cleared when the diskmgt disk monitoring service recovers.</p>
</div>
<div class="section" id="ALM-12188__section28308296"><h4 class="sectiontitle"><span id="ALM-12188__text38748475555">Alarm Attributes</span></h4>
<div class="tablenoborder"><table cellpadding="4" cellspacing="0" summary="" id="ALM-12188__table36969235" frame="border" border="1" rules="all"><thead align="left"><tr id="ALM-12188__row42433012"><th align="left" class="cellrowborder" valign="top" width="33.33333333333333%" id="mcps1.3.2.2.1.4.1.1"><p id="ALM-12188__p57710042"><span id="ALM-12188__text17980150175619">Alarm ID</span></p>
</th>
<th align="left" class="cellrowborder" valign="top" width="33.33333333333333%" id="mcps1.3.2.2.1.4.1.2"><p id="ALM-12188__p44001849"><span id="ALM-12188__text199471335614">Alarm Severity</span></p>
</th>
<th align="left" class="cellrowborder" valign="top" width="33.33333333333333%" id="mcps1.3.2.2.1.4.1.3"><p id="ALM-12188__p7380012"><span id="ALM-12188__text152400388563">Auto Cleared</span></p>
</th>
</tr>
</thead>
<tbody><tr id="ALM-12188__row21396528"><td class="cellrowborder" valign="top" width="33.33333333333333%" headers="mcps1.3.2.2.1.4.1.1 "><p id="ALM-12188__p181161948202917">12188</p>
</td>
<td class="cellrowborder" valign="top" width="33.33333333333333%" headers="mcps1.3.2.2.1.4.1.2 "><p id="ALM-12188__p3114164812914">Minor</p>
</td>
<td class="cellrowborder" valign="top" width="33.33333333333333%" headers="mcps1.3.2.2.1.4.1.3 "><p id="ALM-12188__p111105489292">Yes</p>
</td>
</tr>
</tbody>
</table>
</div>
</div>
<div class="section" id="ALM-12188__section53448080"><h4 class="sectiontitle"><span id="ALM-12188__text155061195577">Alarm Parameters</span></h4>
<div class="tablenoborder"><table cellpadding="4" cellspacing="0" summary="" id="ALM-12188__table33617909" frame="border" border="1" rules="all"><thead align="left"><tr id="ALM-12188__row23730911"><th align="left" class="cellrowborder" valign="top" width="50%" id="mcps1.3.3.2.1.3.1.1"><p id="ALM-12188__p21975462"><span id="ALM-12188__text776142495720">Parameter</span></p>
</th>
<th align="left" class="cellrowborder" valign="top" width="50%" id="mcps1.3.3.2.1.3.1.2"><p id="ALM-12188__p35182007"><span id="ALM-12188__text632018391572">Description</span></p>
</th>
</tr>
</thead>
<tbody><tr id="ALM-12188__row96067296346"><td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.3.2.1.3.1.1 "><p id="ALM-12188__p13424120164815">Source</p>
</td>
<td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.3.2.1.3.1.2 "><p id="ALM-12188__p124259012489">Specifies the cluster or system for which the alarm is generated.</p>
</td>
</tr>
<tr id="ALM-12188__row28589139"><td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.3.2.1.3.1.1 "><p id="ALM-12188__p1742519034818">ServiceName</p>
</td>
<td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.3.2.1.3.1.2 "><p id="ALM-12188__p184251003485">Specifies the service for which the alarm is generated.</p>
</td>
</tr>
<tr id="ALM-12188__row7926750304"><td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.3.2.1.3.1.1 "><p id="ALM-12188__p10378852194810">RoleName</p>
</td>
<td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.3.2.1.3.1.2 "><p id="ALM-12188__p114251064811">Specifies the role for which the alarm is generated.</p>
</td>
</tr>
<tr id="ALM-12188__row1080912111496"><td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.3.2.1.3.1.1 "><p id="ALM-12188__p342513064820">HostName</p>
</td>
<td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.3.2.1.3.1.2 "><p id="ALM-12188__p103531553614">Specifies the host for which the alarm is generated.</p>
</td>
</tr>
</tbody>
</table>
</div>
</div>
<div class="section" id="ALM-12188__section14442155121012"><h4 class="sectiontitle"><span id="ALM-12188__text2266192715582">Impact on the System</span></h4><p id="ALM-12188__p756414444920">When diskmgt disk monitoring is unavailable, the read-only detection of the device partition file system, device partition loss detection, and disk partition scale-out detection cannot be performed.</p>
</div>
<div class="section" id="ALM-12188__section18133852349"><h4 class="sectiontitle"><span id="ALM-12188__text12656240135813">Possible Causes</span></h4><ul id="ALM-12188__ul20753183122111"><li id="ALM-12188__li187531031132115">The diskmgt disk monitoring service does not exist.</li><li id="ALM-12188__li137531431122114">The diskmgt disk monitoring service is not started.</li></ul>
</div>
<div class="section" id="ALM-12188__section1481384019290"><h4 class="sectiontitle"><span id="ALM-12188__text19569135285811">Handling Procedure</span></h4><p id="ALM-12188__p863220328248"><strong id="ALM-12188__b1780312320442">Check whether the diskmgt disk monitoring service exists.</strong></p>
<ol id="ALM-12188__ol53051313141410"><li id="ALM-12188__li12658345203210"><span>Log in to FusionInsight Manager, click <strong id="ALM-12188__b295115361448">O&amp;M</strong>, and choose <strong id="ALM-12188__b129521136154416">Alarm</strong> &gt; <strong id="ALM-12188__b17952736144411">Alarms</strong> to view the alarm details. In the <strong id="ALM-12188__b1095213634418">Location</strong> column, check the name of the host for which the alarm is generated. Click the host name to view its IP address.</span></li><li id="ALM-12188__li639818123229"><span>Log in to the node for which the alarm is generated as user <strong id="ALM-12188__b123417532449">root</strong>.</span></li><li id="ALM-12188__li94191028135118"><span>Run the following command to check whether the core service file exists:</span><p><p id="ALM-12188__p1229375915528"><strong id="ALM-12188__b1691616572228">stat /usr/local/diskmgt/inner/diskmgtd</strong></p>
<p id="ALM-12188__p11951165662314">If the file does not exist, contact <span id="ALM-12188__text138985011164">O&amp;M personnel</span>.</p>
</p></li></ol>
<p id="ALM-12188__p10889644250"><strong id="ALM-12188__b5473634154515">Start the diskmgt disk monitoring service.</strong></p>
<ol start="4" id="ALM-12188__ol12899114192517"><li id="ALM-12188__li98991648257"><span>Run the following command to start the diskmgt disk monitoring service:</span><p><p id="ALM-12188__p188991047258"><strong id="ALM-12188__b387138172714">systemctl restart diskmgt</strong></p>
</p></li><li id="ALM-12188__li88993422519"><span>Run the following command to check whether the diskmgt disk monitoring service is started:</span><p><p id="ALM-12188__p789913410252"><strong id="ALM-12188__b1510673942713">systemctl status diskmgt</strong></p>
<ul id="ALM-12188__ul11610104684116"><li id="ALM-12188__li5610124613415">If information similar to the following is displayed, the service is started successfully. Go to <a href="#ALM-12188__li09010504164">6</a>.<p id="ALM-12188__p653711564416"><span><img id="ALM-12188__image653745664119" src="en-us_image_0000002008258977.png"></span></p>
</li></ul>
<ul id="ALM-12188__ul923895804114"><li id="ALM-12188__li1423995894119">If no such information is displayed, the service failed to start. Contact <span id="ALM-12188__text127613104464">O&amp;M personnel</span>.</li></ul>
</p></li><li id="ALM-12188__li09010504164"><a name="ALM-12188__li09010504164"></a><a name="li09010504164"></a><span>Wait for 5 minutes, click <strong id="ALM-12188__b931583484612">O&amp;M</strong>, and choose <strong id="ALM-12188__b17316234164616">Alarm</strong> &gt; <strong id="ALM-12188__b18316183413466">Alarms</strong> on FusionInsight Manager. Check whether the alarm is cleared.</span><p><ul id="ALM-12188__ul49095010162"><li id="ALM-12188__li1090175001611">If yes, no further action is required.</li><li id="ALM-12188__li890850101611">If no, contact <span id="ALM-12188__text102591238154611">O&amp;M personnel</span>.</li></ul>
</p></li></ol>
</div>
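<p>The decision points in this procedure can be sketched as a small shell helper. This is a minimal illustration, not part of the product: it assumes a systemd-based node, reuses the file path and unit name from the steps above, and the function names <strong>diskmgt_file_exists</strong> and <strong>diskmgt_next_step</strong> are hypothetical. The <strong>[ -e ]</strong> test stands in for the <strong>stat</strong> check.</p>

```shell
#!/bin/sh
# Sketch only: the function names are illustrative, not part of diskmgt.

# Path of the core service file, as given in the handling procedure.
DISKMGT_BIN="${DISKMGT_BIN:-/usr/local/diskmgt/inner/diskmgtd}"

# Succeeds if the core service file exists.
diskmgt_file_exists() {
    [ -e "$1" ]
}

# Prints the next action for a given file path and service state.
# $1: path to the core service file
# $2: service state as printed by "systemctl is-active diskmgt"
diskmgt_next_step() {
    if ! diskmgt_file_exists "$1"; then
        echo "contact-om: core service file is missing"
    elif [ "$2" = "active" ]; then
        echo "wait: service is running, recheck the alarm in 5 minutes"
    else
        echo "contact-om: service is not running"
    fi
}

# Example run on the alarmed node (requires root and systemd):
#   systemctl restart diskmgt
#   diskmgt_next_step "$DISKMGT_BIN" "$(systemctl is-active diskmgt)"
```

<p>Keeping the decision logic separate from the <strong>systemctl</strong> calls lets the helper be exercised without touching the service.</p>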
<div class="section" id="ALM-12188__section7293173912175"><h4 class="sectiontitle"><span id="ALM-12188__text367020138593">Alarm Clearance</span></h4><p id="ALM-12188__p4178195414013">This alarm is automatically cleared after the fault is rectified.</p>
</div>
<div class="section" id="ALM-12188__section53362350"><h4 class="sectiontitle"><span id="ALM-12188__text1246242445916">Related Information</span></h4><p id="ALM-12188__p7522741"><span id="ALM-12188__text1881919412591">None.</span></p>
</div>
</div>
<div>
<div class="familylinks">
<div class="parentlink"><strong>Parent topic:</strong> <a href="mrs_01_1298.html">Alarm Reference (Applicable to MRS 3.x)</a></div>
</div>
</div>

@ -0,0 +1,81 @@
<a name="ALM-14031"></a><a name="ALM-14031"></a>
<h1 class="topictitle1">ALM-14031 DataNode Process Is Abnormal</h1>
<div id="body0000002008297081"><div class="section" id="ALM-14031__section8243740"><h4 class="sectiontitle"><span id="ALM-14031__text8925301575">Alarm Description</span></h4><p id="ALM-14031__p8353691349">The system checks the DataNode process status every 20 seconds. This alarm is generated when the process status is abnormal and does not recover for a long time.</p>
<p id="ALM-14031__p1931134211237">This alarm is cleared when the process status recovers.</p>
</div>
<div class="section" id="ALM-14031__section7084804"><h4 class="sectiontitle"><span id="ALM-14031__text38748475555">Alarm Attributes</span></h4>
<div class="tablenoborder"><table cellpadding="4" cellspacing="0" summary="" id="ALM-14031__table38418539" frame="border" border="1" rules="all"><thead align="left"><tr id="ALM-14031__row53418480"><th align="left" class="cellrowborder" valign="top" width="33.33333333333333%" id="mcps1.3.2.2.1.4.1.1"><p id="ALM-14031__p31929608"><span id="ALM-14031__text981514694317">Alarm ID</span></p>
</th>
<th align="left" class="cellrowborder" valign="top" width="33.33333333333333%" id="mcps1.3.2.2.1.4.1.2"><p id="ALM-14031__p36161432"><span id="ALM-14031__text15260185184313">Alarm Severity</span></p>
</th>
<th align="left" class="cellrowborder" valign="top" width="33.33333333333333%" id="mcps1.3.2.2.1.4.1.3"><p id="ALM-14031__p43394889"><span id="ALM-14031__text27412586431">Auto Cleared</span></p>
</th>
</tr>
</thead>
<tbody><tr id="ALM-14031__row25325122"><td class="cellrowborder" valign="top" width="33.33333333333333%" headers="mcps1.3.2.2.1.4.1.1 "><p id="ALM-14031__p853895314331">14031</p>
</td>
<td class="cellrowborder" valign="top" width="33.33333333333333%" headers="mcps1.3.2.2.1.4.1.2 "><p id="ALM-14031__p115373532334">Major</p>
</td>
<td class="cellrowborder" valign="top" width="33.33333333333333%" headers="mcps1.3.2.2.1.4.1.3 "><p id="ALM-14031__p1553517532330">Yes</p>
</td>
</tr>
</tbody>
</table>
</div>
</div>
<div class="section" id="ALM-14031__section63763242"><h4 class="sectiontitle"><span id="ALM-14031__text155061195577">Alarm Parameters</span></h4>
<div class="tablenoborder"><table cellpadding="4" cellspacing="0" summary="" id="ALM-14031__table3554205" frame="border" border="1" rules="all"><thead align="left"><tr id="ALM-14031__row22865724"><th align="left" class="cellrowborder" valign="top" width="50%" id="mcps1.3.3.2.1.3.1.1"><p id="ALM-14031__p21975462"><span id="ALM-14031__text776142495720">Parameter</span></p>
</th>
<th align="left" class="cellrowborder" valign="top" width="50%" id="mcps1.3.3.2.1.3.1.2"><p id="ALM-14031__p35182007"><span id="ALM-14031__text632018391572">Description</span></p>
</th>
</tr>
</thead>
<tbody><tr id="ALM-14031__row10137556112512"><td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.3.2.1.3.1.1 "><p id="ALM-14031__p859219498522">Source</p>
</td>
<td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.3.2.1.3.1.2 "><p id="ALM-14031__p2059134995215">Specifies the cluster for which the alarm is generated.</p>
</td>
</tr>
<tr id="ALM-14031__row46079102"><td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.3.2.1.3.1.1 "><p id="ALM-14031__p1059010490521">ServiceName</p>
</td>
<td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.3.2.1.3.1.2 "><p id="ALM-14031__p35886492524">Specifies the service for which the alarm is generated.</p>
</td>
</tr>
<tr id="ALM-14031__row63154538"><td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.3.2.1.3.1.1 "><p id="ALM-14031__p12587144965212">RoleName</p>
</td>
<td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.3.2.1.3.1.2 "><p id="ALM-14031__p145851849195219">Specifies the role for which the alarm is generated.</p>
</td>
</tr>
<tr id="ALM-14031__row1089082402316"><td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.3.2.1.3.1.1 "><p id="ALM-14031__p51620924">HostName</p>
</td>
<td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.3.2.1.3.1.2 "><p id="ALM-14031__p34048007">Specifies the host for which the alarm is generated.</p>
</td>
</tr>
</tbody>
</table>
</div>
</div>
<div class="section" id="ALM-14031__section36998271"><h4 class="sectiontitle"><span id="ALM-14031__text2266192715582">Impact on the System</span></h4><p id="ALM-14031__p16253019">If the process status is abnormal, the process cannot provide services properly. As a result, the entire service may become abnormal.</p>
</div>
<div class="section" id="ALM-14031__section64548988"><h4 class="sectiontitle"><span id="ALM-14031__text12656240135813">Possible Causes</span></h4><p id="ALM-14031__p8207814181819">The host responds slowly to I/O requests (disk I/O and network I/O), and some processes are in the D or Z state. The process may also be suspended and in the T state.</p>
</div>
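<p>The D, Z, and T letters are the standard Linux process state codes shown in the STAT column of <strong>ps</strong>. As an illustration only, the following shell sketch maps the first character of a STAT value to its meaning. The function name <strong>describe_state</strong> is hypothetical and not part of MRS.</p>

```shell
# Prints a short description of a process state, based on the first
# character of a ps STAT value (for example "Dl" or "Ssl").
describe_state() {
    case "$1" in
        D*) echo "uninterruptible sleep (usually waiting for I/O)" ;;
        Z*) echo "zombie (terminated but not reaped by its parent)" ;;
        T*) echo "stopped by a job control signal" ;;
        R*) echo "running or runnable" ;;
        S*) echo "interruptible sleep" ;;
        *)  echo "other" ;;
    esac
}

# Example:
#   describe_state Dl
```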
<div class="section" id="ALM-14031__section770654563320"><h4 class="sectiontitle"><span id="ALM-14031__text19569135285811">Handling Procedure</span></h4><p id="ALM-14031__p1243515278455"><strong id="ALM-14031__b1655484819527">Check whether the process is in the D, Z, or T state.</strong></p>
<ol id="ALM-14031__ol8805715143410"><li id="ALM-14031__li1980611196816"><span>Log in to FusionInsight Manager and choose <strong id="ALM-14031__b1530417210108">O&amp;M</strong> &gt; <strong id="ALM-14031__b664215411018">Alarm</strong> &gt; <strong id="ALM-14031__b63760791011">Alarms</strong>. Wait for about 10 minutes and check whether the alarm is automatically cleared.</span><p><ul id="ALM-14031__ul10505203319910"><li id="ALM-14031__li5505533895">If the alarm is not in the list, no further action is required.</li><li id="ALM-14031__li350517336917">If the alarm is in the list, view the alarm details and record the IP address of the host where the alarm is generated. Run the command in <a href="#ALM-14031__li162831544134616">2</a>.</li></ul>
</p></li><li id="ALM-14031__li162831544134616"><a name="ALM-14031__li162831544134616"></a><a name="li162831544134616"></a><span>Log in to the host where the alarm is generated as the <strong id="ALM-14031__b1578713318414">root</strong> user and run the <strong id="ALM-14031__b131521842151211">su - omm</strong> command to switch to the <strong id="ALM-14031__b133931244201216">omm</strong> user.</span></li><li id="ALM-14031__li129386734811"><span>Run the following command to check the process state:</span><p><p id="ALM-14031__p114995439534"><strong id="ALM-14031__b105101533205318">ps ww -eo stat,cmd| grep -w org.apache.hadoop.hdfs.server.datanode.DataNode | grep -v grep | awk '{print$1}'</strong></p>
</p></li><li id="ALM-14031__li0510123385319"><span>Check whether the command output contains any abnormal state (D, Z, or T).</span><p><ul id="ALM-14031__ul161804819579"><li id="ALM-14031__li161818483576">If the output contains any abnormal state, go to <a href="#ALM-14031__li39471558560">5</a>.</li><li id="ALM-14031__li1661854818575">If the output does not contain abnormal states, go to <a href="#ALM-14031__li14805191513412">7</a>.</li></ul>
</p></li><li id="ALM-14031__li39471558560"><a name="ALM-14031__li39471558560"></a><a name="li39471558560"></a><span>Switch to user <strong id="ALM-14031__b94993490139">root</strong> and run the <strong id="ALM-14031__b9500154991318">reboot</strong> command to restart the host for which the alarm is generated. (Restarting a host is risky. Ensure that the service process is normal after the restart.)</span></li><li id="ALM-14031__li7936132616563"><span>Wait 5 minutes and check whether the alarm is cleared.</span><p><ul id="ALM-14031__ul19652752195618"><li id="ALM-14031__li1365317526566">If the alarm is cleared, no further action is required.</li><li id="ALM-14031__li2065375285614">If the alarm fails to be cleared, go to <a href="#ALM-14031__li14805191513412">7</a>.</li></ul>
</p></li></ol>
<p class="tableheading" id="ALM-14031__p3255143214441"><strong id="ALM-14031__b17190233165214">Collect fault information.</strong></p>
<ol start="7" id="ALM-14031__ol480581514342"><li id="ALM-14031__li14805191513412"><a name="ALM-14031__li14805191513412"></a><a name="li14805191513412"></a><span>On FusionInsight Manager, choose <strong id="ALM-14031__b463700064113054">O&amp;M</strong>. In the navigation pane on the left, choose <strong id="ALM-14031__b402136686113054">Log</strong> &gt; <strong id="ALM-14031__b1875241582113054">Download</strong>.</span></li><li id="ALM-14031__li168051615113417"><span>Expand the drop-down list next to the <strong id="ALM-14031__b15369453141411">Service</strong> field. In the <strong id="ALM-14031__b10370353171419">Services</strong> dialog box that is displayed, select <strong id="ALM-14031__b14370153101416">HDFS</strong> for the target cluster.</span></li><li id="ALM-14031__li5805171503414"><span>Click the edit icon in the upper right corner, and set <strong id="ALM-14031__b18636418253">Start Date</strong> and <strong id="ALM-14031__b28631142250">End Date</strong> for log collection to 10 minutes ahead of and after the alarm generation time, respectively. Then, click <strong id="ALM-14031__b17864154132515">Download</strong>.</span></li><li id="ALM-14031__li10805181583414"><span>Contact <span id="ALM-14031__text19191183321513">O&amp;M personnel</span> and provide the collected logs.</span></li></ol>
</div>
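<p>The state check in steps 3 and 4 can be scripted. The sketch below is an assumption-labeled helper, not a product tool: the function name <strong>abnormal_states</strong> is hypothetical, and it only filters the STAT values that the documented <strong>ps</strong> command prints.</p>

```shell
# Reads ps STAT values (one per line) on stdin and prints those whose
# first character is D, Z, or T. Exit status 0 means at least one
# abnormal state was found; exit status 1 means none was found.
abnormal_states() {
    grep -E '^[DZT]'
}

# Example on the alarmed host, as user omm (command taken from step 3):
#   ps ww -eo stat,cmd | grep -w org.apache.hadoop.hdfs.server.datanode.DataNode \
#       | grep -v grep | awk '{print $1}' | abnormal_states
```

<p>An exit status of 0 corresponds to the branch that restarts the host, and 1 to the branch that collects fault information.</p>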
<div class="section" id="ALM-14031__section169311343318"><h4 class="sectiontitle"><span id="ALM-14031__text367020138593">Alarm Clearance</span></h4><p id="ALM-14031__p754913417333">This alarm is automatically cleared after the fault is rectified.</p>
</div>
<div class="section" id="ALM-14031__section53362350"><h4 class="sectiontitle"><span id="ALM-14031__text1246242445916">Related Information</span></h4><p id="ALM-14031__p7522741"><span id="ALM-14031__text1881919412591">None.</span></p>
</div>
</div>
<div>
<div class="familylinks">
<div class="parentlink"><strong>Parent topic:</strong> <a href="mrs_01_1298.html">Alarm Reference (Applicable to MRS 3.x)</a></div>
</div>
</div>

@ -0,0 +1,82 @@
<a name="ALM-14032"></a><a name="ALM-14032"></a>
<h1 class="topictitle1">ALM-14032 JournalNode Process Is Abnormal</h1>
<div id="body0000001971656756"><div class="section" id="ALM-14032__section979815471118"><h4 class="sectiontitle"><span id="ALM-14032__text1079812471120">Alarm Description</span></h4><p id="ALM-14032__p8353691349">The system checks the JournalNode process status every 20 seconds. This alarm is generated when the process status is abnormal and does not recover for a long time.</p>
<p id="ALM-14032__p197982471413">This alarm is cleared when the process status recovers.</p>
</div>
<div class="section" id="ALM-14032__section18798204714110"><h4 class="sectiontitle"><span id="ALM-14032__text2798164712118">Alarm Attributes</span></h4>
<div class="tablenoborder"><table cellpadding="4" cellspacing="0" summary="" id="ALM-14032__table87986471415" frame="border" border="1" rules="all"><thead align="left"><tr id="ALM-14032__row167981047613"><th align="left" class="cellrowborder" valign="top" width="33.33333333333333%" id="mcps1.3.2.2.1.4.1.1"><p id="ALM-14032__p12798647315"><span id="ALM-14032__text10798547517">Alarm ID</span></p>
</th>
<th align="left" class="cellrowborder" valign="top" width="33.33333333333333%" id="mcps1.3.2.2.1.4.1.2"><p id="ALM-14032__p16798124719115"><span id="ALM-14032__text157981347317">Alarm Severity</span></p>
</th>
<th align="left" class="cellrowborder" valign="top" width="33.33333333333333%" id="mcps1.3.2.2.1.4.1.3"><p id="ALM-14032__p17992471410"><span id="ALM-14032__text15799194720117">Auto Cleared</span></p>
</th>
</tr>
</thead>
<tbody><tr id="ALM-14032__row67994478118"><td class="cellrowborder" valign="top" width="33.33333333333333%" headers="mcps1.3.2.2.1.4.1.1 "><p id="ALM-14032__p18799747419">14032</p>
</td>
<td class="cellrowborder" valign="top" width="33.33333333333333%" headers="mcps1.3.2.2.1.4.1.2 "><p id="ALM-14032__p279974710111">Major</p>
</td>
<td class="cellrowborder" valign="top" width="33.33333333333333%" headers="mcps1.3.2.2.1.4.1.3 "><p id="ALM-14032__p107994471713">Yes</p>
</td>
</tr>
</tbody>
</table>
</div>
</div>
<div class="section" id="ALM-14032__section19799184712110"><h4 class="sectiontitle"><span id="ALM-14032__text27993470117">Alarm Parameters</span></h4>
<div class="tablenoborder"><table cellpadding="4" cellspacing="0" summary="" id="ALM-14032__table3799204720116" frame="border" border="1" rules="all"><thead align="left"><tr id="ALM-14032__row1879915471215"><th align="left" class="cellrowborder" valign="top" width="50%" id="mcps1.3.3.2.1.3.1.1"><p id="ALM-14032__p177993479118"><span id="ALM-14032__text207998471417">Parameter</span></p>
</th>
<th align="left" class="cellrowborder" valign="top" width="50%" id="mcps1.3.3.2.1.3.1.2"><p id="ALM-14032__p579954720114"><span id="ALM-14032__text127995473116">Description</span></p>
</th>
</tr>
</thead>
<tbody><tr id="ALM-14032__row1179918471011"><td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.3.2.1.3.1.1 "><p id="ALM-14032__p859219498522">Source</p>
</td>
<td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.3.2.1.3.1.2 "><p id="ALM-14032__p2059134995215">Specifies the cluster for which the alarm is generated.</p>
</td>
</tr>
<tr id="ALM-14032__row1279964711115"><td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.3.2.1.3.1.1 "><p id="ALM-14032__p1059010490521">ServiceName</p>
</td>
<td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.3.2.1.3.1.2 "><p id="ALM-14032__p35886492524">Specifies the service for which the alarm is generated.</p>
</td>
</tr>
<tr id="ALM-14032__row079994716117"><td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.3.2.1.3.1.1 "><p id="ALM-14032__p12587144965212">RoleName</p>
</td>
<td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.3.2.1.3.1.2 "><p id="ALM-14032__p145851849195219">Specifies the role for which the alarm is generated.</p>
</td>
</tr>
<tr id="ALM-14032__row443143222315"><td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.3.2.1.3.1.1 "><p id="ALM-14032__p51620924">HostName</p>
</td>
<td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.3.2.1.3.1.2 "><p id="ALM-14032__p34048007">Specifies the host for which the alarm is generated.</p>
</td>
</tr>
</tbody>
</table>
</div>
</div>
<div class="section" id="ALM-14032__section0799144716115"><h4 class="sectiontitle"><span id="ALM-14032__text479911470117">Impact on the System</span></h4><p id="ALM-14032__p8799247918">If the process status is abnormal, the process cannot provide services properly. As a result, the entire service may become abnormal.</p>
</div>
<div class="section" id="ALM-14032__section1479910471912"><h4 class="sectiontitle"><span id="ALM-14032__text187997470114">Possible Causes</span></h4><p id="ALM-14032__p276313327196">The host responds slowly to I/O requests (disk I/O and network I/O), and some processes are in the D or Z state. The process may also be suspended and in the T state.</p>
</div>
<div class="section" id="ALM-14032__section179924719116"><h4 class="sectiontitle"><span id="ALM-14032__text1799947611">Handling Procedure</span></h4><p id="ALM-14032__p1243515278455"><strong id="ALM-14032__b19561554105317">Check whether the process is in the D, Z, or T state.</strong></p>
<ol id="ALM-14032__ol67999471216"><li id="ALM-14032__li1980611196816"><span>Log in to FusionInsight Manager and choose <strong id="ALM-14032__b633663291610">O&amp;M</strong> &gt; <strong id="ALM-14032__b73361132151618">Alarm</strong> &gt; <strong id="ALM-14032__b133673291615">Alarms</strong>. Wait for about 10 minutes and check whether the alarm is automatically cleared.</span><p><ul id="ALM-14032__ul10505203319910"><li id="ALM-14032__li5505533895">If the alarm is not in the list, no further action is required.</li><li id="ALM-14032__li350517336917">If the alarm is in the list, view the alarm details and record the IP address of the host where the alarm is generated. Run the command in <a href="#ALM-14032__li162831544134616">2</a>.</li></ul>
</p></li><li id="ALM-14032__li162831544134616"><a name="ALM-14032__li162831544134616"></a><a name="li162831544134616"></a><span>Log in to the host where the alarm is generated as the <strong id="ALM-14032__b8152183916183">root</strong> user and run the <strong id="ALM-14032__b1215323991811">su - omm</strong> command to switch to the <strong id="ALM-14032__b11531439181818">omm</strong> user.</span></li><li id="ALM-14032__li129386734811"><span>Run the following command to check the process state:</span><p><p id="ALM-14032__p114995439534"><strong id="ALM-14032__b105101533205318">ps ww -eo stat,cmd| grep -w org.apache.hadoop.hdfs.qjournal.server.JournalNode | grep -v grep | awk '{print$1}'</strong></p>
</p></li><li id="ALM-14032__li3621133502116"><span>Check whether the command output contains any abnormal state (D, Z, or T).</span><p><ul id="ALM-14032__ul161804819579"><li id="ALM-14032__li670603211">If the output contains any abnormal state, go to <a href="#ALM-14032__li39471558560">5</a>.</li><li id="ALM-14032__li47070182111">If the output does not contain abnormal states, go to <a href="#ALM-14032__li17799174711116">7</a>.</li></ul>
</p></li><li id="ALM-14032__li39471558560"><a name="ALM-14032__li39471558560"></a><a name="li39471558560"></a><span>Switch to user <strong id="ALM-14032__b105163881919">root</strong> and run the <strong id="ALM-14032__b6517582194">reboot</strong> command to restart the host for which the alarm is generated. (Restarting a host is risky. Ensure that the service process is normal after the restart.)</span></li><li id="ALM-14032__li7936132616563"><span>Wait 5 minutes and check whether the alarm is cleared.</span><p><ul id="ALM-14032__ul19652752195618"><li id="ALM-14032__li1365317526566">If the alarm is cleared, no further action is required.</li><li id="ALM-14032__li2065375285614">If the alarm fails to be cleared, go to <a href="#ALM-14032__li17799174711116">7</a>.</li></ul>
</p></li></ol>
<p class="tableheading" id="ALM-14032__p2079910471716"><strong id="ALM-14032__b1648416219547">Collect fault information.</strong></p>
<ol start="7" id="ALM-14032__ol37994471410"><li id="ALM-14032__li17799174711116"><a name="ALM-14032__li17799174711116"></a><a name="li17799174711116"></a><span>On FusionInsight Manager, choose <strong id="ALM-14032__b1377244511199">O&amp;M</strong>. In the navigation pane on the left, choose <strong id="ALM-14032__b1677394513195">Log</strong> &gt; <strong id="ALM-14032__b12773144551914">Download</strong>.</span></li><li id="ALM-14032__li177999474110"><span>Expand the drop-down list next to the <strong id="ALM-14032__b1524684816195">Service</strong> field. In the <strong id="ALM-14032__b92474487193">Services</strong> dialog box that is displayed, select <strong id="ALM-14032__b162479489190">HDFS</strong> for the target cluster.</span></li><li id="ALM-14032__li5799147219"><span>Click the edit icon in the upper right corner, and set <strong id="ALM-14032__b17773913202510">Start Date</strong> and <strong id="ALM-14032__b67743138253">End Date</strong> for log collection to 10 minutes ahead of and after the alarm generation time, respectively. Then, click <strong id="ALM-14032__b3774181332510">Download</strong>.</span></li><li id="ALM-14032__li57991247416"><span>Contact <span id="ALM-14032__text9526257151916">O&amp;M personnel</span> and provide the collected logs.</span></li></ol>
</div>
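<p>The log collection window (10 minutes before and after the alarm generation time) can be computed rather than worked out by hand. The following is a minimal sketch that assumes GNU <strong>date</strong> is available; the function name <strong>log_window</strong> and the sample timestamp are hypothetical.</p>

```shell
# Prints the start and end of the log collection window:
# 10 minutes before and 10 minutes after the alarm generation time.
# $1: alarm generation time, for example "2024-08-02 10:30:00"
# Assumes GNU date (the -d option). Times are handled as UTC here so
# that the arithmetic is not affected by daylight saving changes.
log_window() {
    alarm_epoch=$(date -u -d "$1" +%s) || return 1
    date -u -d "@$((alarm_epoch - 600))" '+Start Date: %Y-%m-%d %H:%M:%S'
    date -u -d "@$((alarm_epoch + 600))" '+End Date: %Y-%m-%d %H:%M:%S'
}

# Example:
#   log_window "2024-08-02 10:30:00"
```

<p>The two printed values can be entered as <strong>Start Date</strong> and <strong>End Date</strong> on the log download page.</p>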
<div class="section" id="ALM-14032__section979934710111"><h4 class="sectiontitle"><span id="ALM-14032__text1379918471115">Alarm Clearance</span></h4><p id="ALM-14032__p27991247919">This alarm is automatically cleared after the fault is rectified.</p>
</div>
<div class="section" id="ALM-14032__section879913471915"><h4 class="sectiontitle"><span id="ALM-14032__text16799164711115">Related Information</span></h4><p id="ALM-14032__p1779913479110"><span id="ALM-14032__text879984715119">None.</span></p>
</div>
</div>
<div>
<div class="familylinks">
<div class="parentlink"><strong>Parent topic:</strong> <a href="mrs_01_1298.html">Alarm Reference (Applicable to MRS 3.x)</a></div>
</div>
</div>

@ -0,0 +1,81 @@
<a name="ALM-14033"></a><a name="ALM-14033"></a>
<h1 class="topictitle1">ALM-14033 ZKFC Process Is Abnormal</h1>
<div id="body0000001971816508"><div class="section" id="ALM-14033__section979815471118"><h4 class="sectiontitle"><span id="ALM-14033__text1079812471120">Alarm Description</span></h4><p id="ALM-14033__p8353691349">The system checks the ZKFC process status every 20 seconds. This alarm is generated when the process status is abnormal and does not recover for a long time.</p>
<p id="ALM-14033__p197982471413">This alarm is cleared when the process status recovers.</p>
</div>
<div class="section" id="ALM-14033__section18798204714110"><h4 class="sectiontitle"><span id="ALM-14033__text2798164712118">Alarm Attributes</span></h4>
<div class="tablenoborder"><table cellpadding="4" cellspacing="0" summary="" id="ALM-14033__table87986471415" frame="border" border="1" rules="all"><thead align="left"><tr id="ALM-14033__row167981047613"><th align="left" class="cellrowborder" valign="top" width="33.33333333333333%" id="mcps1.3.2.2.1.4.1.1"><p id="ALM-14033__p12798647315"><span id="ALM-14033__text10798547517">Alarm ID</span></p>
</th>
<th align="left" class="cellrowborder" valign="top" width="33.33333333333333%" id="mcps1.3.2.2.1.4.1.2"><p id="ALM-14033__p16798124719115"><span id="ALM-14033__text157981347317">Alarm Severity</span></p>
</th>
<th align="left" class="cellrowborder" valign="top" width="33.33333333333333%" id="mcps1.3.2.2.1.4.1.3"><p id="ALM-14033__p17992471410"><span id="ALM-14033__text15799194720117">Auto Cleared</span></p>
</th>
</tr>
</thead>
<tbody><tr id="ALM-14033__row67994478118"><td class="cellrowborder" valign="top" width="33.33333333333333%" headers="mcps1.3.2.2.1.4.1.1 "><p id="ALM-14033__p18799747419">14033</p>
</td>
<td class="cellrowborder" valign="top" width="33.33333333333333%" headers="mcps1.3.2.2.1.4.1.2 "><p id="ALM-14033__p279974710111">Major</p>
</td>
<td class="cellrowborder" valign="top" width="33.33333333333333%" headers="mcps1.3.2.2.1.4.1.3 "><p id="ALM-14033__p107994471713">Yes</p>
</td>
</tr>
</tbody>
</table>
</div>
</div>
<div class="section" id="ALM-14033__section19799184712110"><h4 class="sectiontitle"><span id="ALM-14033__text27993470117">Alarm Parameters</span></h4>
<div class="tablenoborder"><table cellpadding="4" cellspacing="0" summary="" id="ALM-14033__table3799204720116" frame="border" border="1" rules="all"><thead align="left"><tr id="ALM-14033__row1879915471215"><th align="left" class="cellrowborder" valign="top" width="50%" id="mcps1.3.3.2.1.3.1.1"><p id="ALM-14033__p177993479118"><span id="ALM-14033__text207998471417">Parameter</span></p>
</th>
<th align="left" class="cellrowborder" valign="top" width="50%" id="mcps1.3.3.2.1.3.1.2"><p id="ALM-14033__p579954720114"><span id="ALM-14033__text127995473116">Description</span></p>
</th>
</tr>
</thead>
<tbody><tr id="ALM-14033__row1179918471011"><td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.3.2.1.3.1.1 "><p id="ALM-14033__p859219498522">Source</p>
</td>
<td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.3.2.1.3.1.2 "><p id="ALM-14033__p2059134995215">Specifies the cluster for which the alarm is generated.</p>
</td>
</tr>
<tr id="ALM-14033__row1279964711115"><td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.3.2.1.3.1.1 "><p id="ALM-14033__p1059010490521">ServiceName</p>
</td>
<td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.3.2.1.3.1.2 "><p id="ALM-14033__p35886492524">Specifies the service for which the alarm is generated.</p>
</td>
</tr>
<tr id="ALM-14033__row079994716117"><td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.3.2.1.3.1.1 "><p id="ALM-14033__p12587144965212">RoleName</p>
</td>
<td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.3.2.1.3.1.2 "><p id="ALM-14033__p145851849195219">Specifies the role for which the alarm is generated.</p>
</td>
</tr>
<tr id="ALM-14033__row149900404239"><td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.3.2.1.3.1.1 "><p id="ALM-14033__p51620924">HostName</p>
</td>
<td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.3.2.1.3.1.2 "><p id="ALM-14033__p34048007">Specifies the host for which the alarm is generated.</p>
</td>
</tr>
</tbody>
</table>
</div>
</div>
<div class="section" id="ALM-14033__section0799144716115"><h4 class="sectiontitle"><span id="ALM-14033__text479911470117">Impact on the System</span></h4><p id="ALM-14033__p8799247918">If the process status is abnormal, the process cannot provide services properly. As a result, the entire service may become abnormal.</p>
</div>
<div class="section" id="ALM-14033__section1479910471912"><h4 class="sectiontitle"><span id="ALM-14033__text187997470114">Possible Causes</span></h4><p id="ALM-14033__p1647015610239">The host responds slowly to I/O requests (disk I/O and network I/O), and some processes are in the D or Z state. The process may also be suspended and in the T state.</p>
</div>
<div class="section" id="ALM-14033__section179924719116"><h4 class="sectiontitle"><span id="ALM-14033__text1799947611">Handling Procedure</span></h4><p id="ALM-14033__p1243515278455"><strong id="ALM-14033__b6239811105419">Check whether the process is in the D, Z, or T state.</strong></p>
<ol id="ALM-14033__ol67999471216"><li id="ALM-14033__li1980611196816"><span>Log in to FusionInsight Manager and choose <strong id="ALM-14033__b149051811214">O&amp;M</strong> &gt; <strong id="ALM-14033__b1290161817211">Alarm</strong> &gt; <strong id="ALM-14033__b189171812117">Alarms</strong>. Wait for about 10 minutes and check whether the alarm is automatically cleared.</span><p><ul id="ALM-14033__ul10505203319910"><li id="ALM-14033__li5505533895">If the alarm is not in the list, no further action is required.</li><li id="ALM-14033__li350517336917">If the alarm is in the list, view the alarm details and record the IP address of the host where the alarm is generated. Run the command in <a href="#ALM-14033__li191311041031">2</a>.</li></ul>
</p></li><li id="ALM-14033__li191311041031"><a name="ALM-14033__li191311041031"></a><a name="li191311041031"></a><span>Log in to the host where the alarm is generated as the <strong id="ALM-14033__b31081042122115">root</strong> user and run the <strong id="ALM-14033__b171086426213">su - omm</strong> command to switch to the <strong id="ALM-14033__b131082423217">omm</strong> user.</span></li><li id="ALM-14033__li129386734811"><span>Run the following command to check whether the process state is abnormal:</span><p><p id="ALM-14033__p114995439534"><strong id="ALM-14033__b105101533205318">ps ww -eo stat,cmd| grep -w org.apache.hadoop.hdfs.tools.DFSZKFailoverController | grep -v grep | awk '{print$1}'</strong></p>
</p></li><li id="ALM-14033__li3621133502116"><span>Check whether the command output contains any abnormal state (D, Z, or T).</span><p><ul id="ALM-14033__ul161804819579"><li id="ALM-14033__li670603211">If the output contains any abnormal state, go to <a href="#ALM-14033__li39471558560">5</a>.</li><li id="ALM-14033__li47070182111">If the output does not contain abnormal states, go to <a href="#ALM-14033__li17799174711116">7</a>.</li></ul>
</p></li><li id="ALM-14033__li39471558560"><a name="ALM-14033__li39471558560"></a><a name="li39471558560"></a><span>Switch to user <strong id="ALM-14033__b937623310221">root</strong> and run the <strong id="ALM-14033__b537733311228">reboot</strong> command to restart the host for which the alarm is generated. (Restarting a host is risky. Ensure that the service process is normal after the restart.)</span></li><li id="ALM-14033__li7936132616563"><span>Wait 5 minutes and check whether the alarm is cleared.</span><p><ul id="ALM-14033__ul19652752195618"><li id="ALM-14033__li1365317526566">If the alarm is cleared, no further action is required.</li><li id="ALM-14033__li2065375285614">If the alarm fails to be cleared, go to <a href="#ALM-14033__li17799174711116">7</a>.</li></ul>
</p></li></ol>
<p class="tableheading" id="ALM-14033__p2079910471716"><strong id="ALM-14033__b14258101712544">Collect fault information.</strong></p>
<ol start="7" id="ALM-14033__ol37994471410"><li id="ALM-14033__li17799174711116"><a name="ALM-14033__li17799174711116"></a><a name="li17799174711116"></a><span>On FusionInsight Manager, choose <strong id="ALM-14033__b12332161919238">O&amp;M</strong>. In the navigation pane on the left, choose <strong id="ALM-14033__b1833361911230">Log</strong> &gt; <strong id="ALM-14033__b1133317198238">Download</strong>.</span></li><li id="ALM-14033__li177999474110"><span>Expand the drop-down list next to the <strong id="ALM-14033__b16661422122318">Service</strong> field. In the <strong id="ALM-14033__b5667182202315">Services</strong> dialog box that is displayed, select <strong id="ALM-14033__b1166812210237">HDFS</strong> for the target cluster.</span></li><li id="ALM-14033__li5799147219"><span>Click the edit icon in the upper right corner, and set <strong id="ALM-14033__b1370492018259">Start Date</strong> and <strong id="ALM-14033__b18704142012518">End Date</strong> for log collection to 10 minutes ahead of and after the alarm generation time, respectively. Then, click <strong id="ALM-14033__b147041420182520">Download</strong>.</span></li><li id="ALM-14033__li57991247416"><span>Contact <span id="ALM-14033__text4716173792311">O&amp;M personnel</span> and provide the collected logs.</span></li></ol>
</div>
<div class="section" id="ALM-14033__section979934710111"><h4 class="sectiontitle"><span id="ALM-14033__text1379918471115">Alarm Clearance</span></h4><p id="ALM-14033__p27991247919">This alarm is automatically cleared after the fault is rectified.</p>
</div>
<div class="section" id="ALM-14033__section879913471915"><h4 class="sectiontitle"><span id="ALM-14033__text16799164711115">Related Information</span></h4><p id="ALM-14033__p1779913479110"><span id="ALM-14033__text879984715119">None.</span></p>
</div>
</div>
<div>
<div class="familylinks">
<div class="parentlink"><strong>Parent topic:</strong> <a href="mrs_01_1298.html">Alarm Reference (Applicable to MRS 3.x)</a></div>
</div>
</div>


@ -0,0 +1,81 @@
<a name="ALM-14034"></a><a name="ALM-14034"></a>
<h1 class="topictitle1">ALM-14034 Router Process Is Abnormal</h1>
<div id="body0000002008256525"><div class="section" id="ALM-14034__section979815471118"><h4 class="sectiontitle"><span id="ALM-14034__text1079812471120">Alarm Description</span></h4><p id="ALM-14034__p8353691349">The status of the Router process is checked every 20 seconds. This alarm is generated when the process status is abnormal and does not recover for a long time.</p>
<p id="ALM-14034__p197982471413">This alarm is cleared when the process status recovers.</p>
</div>
<div class="section" id="ALM-14034__section18798204714110"><h4 class="sectiontitle"><span id="ALM-14034__text2798164712118">Alarm Attributes</span></h4>
<div class="tablenoborder"><table cellpadding="4" cellspacing="0" summary="" id="ALM-14034__table87986471415" frame="border" border="1" rules="all"><thead align="left"><tr id="ALM-14034__row167981047613"><th align="left" class="cellrowborder" valign="top" width="33.33333333333333%" id="mcps1.3.2.2.1.4.1.1"><p id="ALM-14034__p12798647315"><span id="ALM-14034__text10798547517">Alarm ID</span></p>
</th>
<th align="left" class="cellrowborder" valign="top" width="33.33333333333333%" id="mcps1.3.2.2.1.4.1.2"><p id="ALM-14034__p16798124719115"><span id="ALM-14034__text157981347317">Alarm Severity</span></p>
</th>
<th align="left" class="cellrowborder" valign="top" width="33.33333333333333%" id="mcps1.3.2.2.1.4.1.3"><p id="ALM-14034__p17992471410"><span id="ALM-14034__text15799194720117">Auto Cleared</span></p>
</th>
</tr>
</thead>
<tbody><tr id="ALM-14034__row67994478118"><td class="cellrowborder" valign="top" width="33.33333333333333%" headers="mcps1.3.2.2.1.4.1.1 "><p id="ALM-14034__p18799747419">14034</p>
</td>
<td class="cellrowborder" valign="top" width="33.33333333333333%" headers="mcps1.3.2.2.1.4.1.2 "><p id="ALM-14034__p279974710111">Major</p>
</td>
<td class="cellrowborder" valign="top" width="33.33333333333333%" headers="mcps1.3.2.2.1.4.1.3 "><p id="ALM-14034__p107994471713">Yes</p>
</td>
</tr>
</tbody>
</table>
</div>
</div>
<div class="section" id="ALM-14034__section19799184712110"><h4 class="sectiontitle"><span id="ALM-14034__text27993470117">Alarm Parameters</span></h4>
<div class="tablenoborder"><table cellpadding="4" cellspacing="0" summary="" id="ALM-14034__table3799204720116" frame="border" border="1" rules="all"><thead align="left"><tr id="ALM-14034__row1879915471215"><th align="left" class="cellrowborder" valign="top" width="50%" id="mcps1.3.3.2.1.3.1.1"><p id="ALM-14034__p177993479118"><span id="ALM-14034__text207998471417">Parameter</span></p>
</th>
<th align="left" class="cellrowborder" valign="top" width="50%" id="mcps1.3.3.2.1.3.1.2"><p id="ALM-14034__p579954720114"><span id="ALM-14034__text127995473116">Description</span></p>
</th>
</tr>
</thead>
<tbody><tr id="ALM-14034__row1179918471011"><td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.3.2.1.3.1.1 "><p id="ALM-14034__p859219498522">Source</p>
</td>
<td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.3.2.1.3.1.2 "><p id="ALM-14034__p2059134995215">Specifies the cluster for which the alarm is generated.</p>
</td>
</tr>
<tr id="ALM-14034__row1279964711115"><td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.3.2.1.3.1.1 "><p id="ALM-14034__p1059010490521">ServiceName</p>
</td>
<td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.3.2.1.3.1.2 "><p id="ALM-14034__p35886492524">Specifies the service for which the alarm is generated.</p>
</td>
</tr>
<tr id="ALM-14034__row079994716117"><td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.3.2.1.3.1.1 "><p id="ALM-14034__p12587144965212">RoleName</p>
</td>
<td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.3.2.1.3.1.2 "><p id="ALM-14034__p145851849195219">Specifies the role for which the alarm is generated.</p>
</td>
</tr>
<tr id="ALM-14034__row16592124952318"><td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.3.2.1.3.1.1 "><p id="ALM-14034__p51620924">HostName</p>
</td>
<td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.3.2.1.3.1.2 "><p id="ALM-14034__p34048007">Specifies the host for which the alarm is generated.</p>
</td>
</tr>
</tbody>
</table>
</div>
</div>
<div class="section" id="ALM-14034__section0799144716115"><h4 class="sectiontitle"><span id="ALM-14034__text479911470117">Impact on the System</span></h4><p id="ALM-14034__p8799247918">If the process status is abnormal, the process cannot provide services properly. As a result, the entire service may become abnormal.</p>
</div>
<div class="section" id="ALM-14034__section1479910471912"><h4 class="sectiontitle"><span id="ALM-14034__text187997470114">Possible Causes</span></h4><p id="ALM-14034__p1626235122417">The host responds slowly to disk or network I/O requests, so some processes enter the D (uninterruptible sleep) or Z (zombie) state. The process may also have been suspended and entered the T (stopped) state.</p>
</div>
<div class="section" id="ALM-14034__section179924719116"><h4 class="sectiontitle"><span id="ALM-14034__text1799947611">Handling Procedure</span></h4><p id="ALM-14034__p1243515278455"><strong id="ALM-14034__b34831828145411">Check whether the process is in the D, Z, or T state.</strong></p>
<ol id="ALM-14034__ol67999471216"><li id="ALM-14034__li1980611196816"><span>Log in to FusionInsight Manager and choose <strong id="ALM-14034__b1937751732912">O&amp;M</strong> &gt; <strong id="ALM-14034__b1837741713297">Alarm</strong> &gt; <strong id="ALM-14034__b6377121732913">Alarms</strong>. Wait for about 10 minutes and check whether the alarm is automatically cleared.</span><p><ul id="ALM-14034__ul10505203319910"><li id="ALM-14034__li5505533895">If the alarm is not in the list, no further action is required.</li><li id="ALM-14034__li350517336917">If the alarm is in the list, view the alarm details and record the IP address of the host where the alarm is generated. Then go to <a href="#ALM-14034__li16811215432">2</a>.</li></ul>
</p></li><li id="ALM-14034__li16811215432"><a name="ALM-14034__li16811215432"></a><a name="li16811215432"></a><span>Log in to the host where the alarm is generated as the <strong id="ALM-14034__b156544532910">root</strong> user and run the <strong id="ALM-14034__b175661745142920">su - omm</strong> command to switch to the <strong id="ALM-14034__b456614458299">omm</strong> user.</span></li><li id="ALM-14034__li129386734811"><span>Run the following command to check whether the process state is abnormal:</span><p><p id="ALM-14034__p114995439534"><strong id="ALM-14034__b105101533205318">ps ww -eo stat,cmd| grep -w org.apache.hadoop.hdfs.server.federation.router.DFSRouter | grep -v grep | awk '{print$1}'</strong></p>
</p></li><li id="ALM-14034__li3621133502116"><span>Check whether the command output contains any abnormal state (D, Z, or T).</span><p><ul id="ALM-14034__ul161804819579"><li id="ALM-14034__li670603211">If the output contains any abnormal state, go to <a href="#ALM-14034__li39471558560">5</a>.</li><li id="ALM-14034__li47070182111">If the output does not contain abnormal states, go to <a href="#ALM-14034__li17799174711116">7</a>.</li></ul>
</p></li><li id="ALM-14034__li39471558560"><a name="ALM-14034__li39471558560"></a><a name="li39471558560"></a><span>Switch to user <strong id="ALM-14034__b449481414300">root</strong> and run the <strong id="ALM-14034__b149421411305">reboot</strong> command to restart the host for which the alarm is generated. (Restarting a host is risky. Ensure that the service process is normal after the restart.)</span></li><li id="ALM-14034__li7936132616563"><span>Wait 5 minutes and check whether the alarm is cleared.</span><p><ul id="ALM-14034__ul19652752195618"><li id="ALM-14034__li1365317526566">If the alarm is cleared, no further action is required.</li><li id="ALM-14034__li2065375285614">If the alarm fails to be cleared, go to <a href="#ALM-14034__li17799174711116">7</a>.</li></ul>
</p></li></ol>
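<p>The state check in the preceding steps can be scripted. The following is a minimal sketch rather than a product tool: the function names <strong>has_abnormal_state</strong> and <strong>check_router_state</strong> are hypothetical, and the sketch assumes that the first character of each <strong>ps</strong> STAT value is the process state code.</p>
<pre class="screen">
# Hypothetical helper: succeeds (exit status 0) if any state value read from
# standard input starts with D (uninterruptible sleep), Z (zombie), or T (stopped).
has_abnormal_state() {
    grep -q '^[DZT]'
}

# Hypothetical wrapper: runs the ps pipeline from the procedure as user omm
# and prints which step to perform next.
check_router_state() {
    if ps ww -eo stat,cmd | grep -w org.apache.hadoop.hdfs.server.federation.router.DFSRouter | grep -v grep | awk '{print $1}' | has_abnormal_state; then
        echo "abnormal state found: go to step 5"
    else
        echo "no abnormal state found: go to step 7"
    fi
}
</pre>
<p>After the functions are defined in the shell of user <strong>omm</strong> on the host where the alarm is generated, running <strong>check_router_state</strong> prints the next step to perform.</p>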
<p class="tableheading" id="ALM-14034__p2079910471716"><strong id="ALM-14034__b958603455414">Collect fault information.</strong></p>
<ol start="7" id="ALM-14034__ol37994471410"><li id="ALM-14034__li17799174711116"><a name="ALM-14034__li17799174711116"></a><a name="li17799174711116"></a><span>On FusionInsight Manager, choose <strong id="ALM-14034__b1261064693015">O&amp;M</strong>. In the navigation pane on the left, choose <strong id="ALM-14034__b186101846163011">Log</strong> &gt; <strong id="ALM-14034__b761174619309">Download</strong>.</span></li><li id="ALM-14034__li177999474110"><span>Expand the drop-down list next to the <strong id="ALM-14034__b3138154923017">Service</strong> field. In the <strong id="ALM-14034__b1513820496307">Services</strong> dialog box that is displayed, select <strong id="ALM-14034__b313984916304">HDFS</strong> for the target cluster.</span></li><li id="ALM-14034__li5799147219"><span>Click the edit icon in the upper right corner, and set <strong id="ALM-14034__b14685025112516">Start Date</strong> and <strong id="ALM-14034__b96858253253">End Date</strong> for log collection to 10 minutes ahead of and after the alarm generation time, respectively. Then, click <strong id="ALM-14034__b5685225202516">Download</strong>.</span></li><li id="ALM-14034__li57991247416"><span>Contact <span id="ALM-14034__text640375883017">O&amp;M personnel</span> and provide the collected logs.</span></li></ol>
</div>
<div class="section" id="ALM-14034__section979934710111"><h4 class="sectiontitle"><span id="ALM-14034__text1379918471115">Alarm Clearance</span></h4><p id="ALM-14034__p27991247919">This alarm is automatically cleared after the fault is rectified.</p>
</div>
<div class="section" id="ALM-14034__section879913471915"><h4 class="sectiontitle"><span id="ALM-14034__text16799164711115">Related Information</span></h4><p id="ALM-14034__p1779913479110"><span id="ALM-14034__text879984715119">None.</span></p>
</div>
</div>
<div>
<div class="familylinks">
<div class="parentlink"><strong>Parent topic:</strong> <a href="mrs_01_1298.html">Alarm Reference (Applicable to MRS 3.x)</a></div>
</div>
</div>


@ -0,0 +1,81 @@
<a name="ALM-14035"></a><a name="ALM-14035"></a>
<h1 class="topictitle1">ALM-14035 HttpFS Process Is Abnormal</h1>
<div id="body0000002008297085"><div class="section" id="ALM-14035__section979815471118"><h4 class="sectiontitle"><span id="ALM-14035__text1079812471120">Alarm Description</span></h4><p id="ALM-14035__p8353691349">The status of the HttpFS process is checked every 20 seconds. This alarm is generated when the process status is abnormal and does not recover for a long time.</p>
<p id="ALM-14035__p197982471413">This alarm is cleared when the process status recovers.</p>
</div>
<div class="section" id="ALM-14035__section18798204714110"><h4 class="sectiontitle"><span id="ALM-14035__text2798164712118">Alarm Attributes</span></h4>
<div class="tablenoborder"><table cellpadding="4" cellspacing="0" summary="" id="ALM-14035__table87986471415" frame="border" border="1" rules="all"><thead align="left"><tr id="ALM-14035__row167981047613"><th align="left" class="cellrowborder" valign="top" width="33.33333333333333%" id="mcps1.3.2.2.1.4.1.1"><p id="ALM-14035__p12798647315"><span id="ALM-14035__text10798547517">Alarm ID</span></p>
</th>
<th align="left" class="cellrowborder" valign="top" width="33.33333333333333%" id="mcps1.3.2.2.1.4.1.2"><p id="ALM-14035__p16798124719115"><span id="ALM-14035__text157981347317">Alarm Severity</span></p>
</th>
<th align="left" class="cellrowborder" valign="top" width="33.33333333333333%" id="mcps1.3.2.2.1.4.1.3"><p id="ALM-14035__p17992471410"><span id="ALM-14035__text15799194720117">Auto Cleared</span></p>
</th>
</tr>
</thead>
<tbody><tr id="ALM-14035__row67994478118"><td class="cellrowborder" valign="top" width="33.33333333333333%" headers="mcps1.3.2.2.1.4.1.1 "><p id="ALM-14035__p18799747419">14035</p>
</td>
<td class="cellrowborder" valign="top" width="33.33333333333333%" headers="mcps1.3.2.2.1.4.1.2 "><p id="ALM-14035__p279974710111">Major</p>
</td>
<td class="cellrowborder" valign="top" width="33.33333333333333%" headers="mcps1.3.2.2.1.4.1.3 "><p id="ALM-14035__p107994471713">Yes</p>
</td>
</tr>
</tbody>
</table>
</div>
</div>
<div class="section" id="ALM-14035__section19799184712110"><h4 class="sectiontitle"><span id="ALM-14035__text27993470117">Alarm Parameters</span></h4>
<div class="tablenoborder"><table cellpadding="4" cellspacing="0" summary="" id="ALM-14035__table3799204720116" frame="border" border="1" rules="all"><thead align="left"><tr id="ALM-14035__row1879915471215"><th align="left" class="cellrowborder" valign="top" width="50%" id="mcps1.3.3.2.1.3.1.1"><p id="ALM-14035__p177993479118"><span id="ALM-14035__text207998471417">Parameter</span></p>
</th>
<th align="left" class="cellrowborder" valign="top" width="50%" id="mcps1.3.3.2.1.3.1.2"><p id="ALM-14035__p579954720114"><span id="ALM-14035__text127995473116">Description</span></p>
</th>
</tr>
</thead>
<tbody><tr id="ALM-14035__row1179918471011"><td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.3.2.1.3.1.1 "><p id="ALM-14035__p859219498522">Source</p>
</td>
<td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.3.2.1.3.1.2 "><p id="ALM-14035__p2059134995215">Specifies the cluster for which the alarm is generated.</p>
</td>
</tr>
<tr id="ALM-14035__row1279964711115"><td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.3.2.1.3.1.1 "><p id="ALM-14035__p1059010490521">ServiceName</p>
</td>
<td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.3.2.1.3.1.2 "><p id="ALM-14035__p35886492524">Specifies the service for which the alarm is generated.</p>
</td>
</tr>
<tr id="ALM-14035__row079994716117"><td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.3.2.1.3.1.1 "><p id="ALM-14035__p12587144965212">RoleName</p>
</td>
<td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.3.2.1.3.1.2 "><p id="ALM-14035__p145851849195219">Specifies the role for which the alarm is generated.</p>
</td>
</tr>
<tr id="ALM-14035__row1839713564234"><td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.3.2.1.3.1.1 "><p id="ALM-14035__p51620924">HostName</p>
</td>
<td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.3.2.1.3.1.2 "><p id="ALM-14035__p34048007">Specifies the host for which the alarm is generated.</p>
</td>
</tr>
</tbody>
</table>
</div>
</div>
<div class="section" id="ALM-14035__section0799144716115"><h4 class="sectiontitle"><span id="ALM-14035__text479911470117">Impact on the System</span></h4><p id="ALM-14035__p8799247918">If the process status is abnormal, the process cannot provide services properly. As a result, the entire service may become abnormal.</p>
</div>
<div class="section" id="ALM-14035__section1479910471912"><h4 class="sectiontitle"><span id="ALM-14035__text187997470114">Possible Causes</span></h4><p id="ALM-14035__p251412141245">The host responds slowly to disk or network I/O requests, so some processes enter the D (uninterruptible sleep) or Z (zombie) state. The process may also have been suspended and entered the T (stopped) state.</p>
</div>
<div class="section" id="ALM-14035__section179924719116"><h4 class="sectiontitle"><span id="ALM-14035__text1799947611">Handling Procedure</span></h4><p id="ALM-14035__p1243515278455"><strong id="ALM-14035__b1988924517547">Check whether the process is in the D, Z, or T state.</strong></p>
<ol id="ALM-14035__ol67999471216"><li id="ALM-14035__li1980611196816"><span>Log in to FusionInsight Manager and choose <strong id="ALM-14035__b10840950103119">O&amp;M</strong> &gt; <strong id="ALM-14035__b5841115013118">Alarm</strong> &gt; <strong id="ALM-14035__b14841155093119">Alarms</strong>. Wait for about 10 minutes and check whether the alarm is automatically cleared.</span><p><ul id="ALM-14035__ul10505203319910"><li id="ALM-14035__li5505533895">If the alarm is not in the list, no further action is required.</li><li id="ALM-14035__li350517336917">If the alarm is in the list, view the alarm details and record the IP address of the host where the alarm is generated. Then go to <a href="#ALM-14035__li68511247311">2</a>.</li></ul>
</p></li><li id="ALM-14035__li68511247311"><a name="ALM-14035__li68511247311"></a><a name="li68511247311"></a><span>Log in to the host where the alarm is generated as the <strong id="ALM-14035__b379415162324">root</strong> user and run the <strong id="ALM-14035__b07941316193217">su - omm</strong> command to switch to the <strong id="ALM-14035__b4795516173217">omm</strong> user.</span></li><li id="ALM-14035__li129386734811"><span>Run the following command to check whether the process state is abnormal:</span><p><p id="ALM-14035__p114995439534"><strong id="ALM-14035__b105101533205318">ps ww -eo stat,cmd| grep -w org.apache.hadoop.fs.http.server.HttpFSServerWebServer | grep -v grep | awk '{print$1}'</strong></p>
</p></li><li id="ALM-14035__li3621133502116"><span>Check whether the command output contains any abnormal state (D, Z, or T).</span><p><ul id="ALM-14035__ul161804819579"><li id="ALM-14035__li670603211">If the output contains any abnormal state, go to <a href="#ALM-14035__li39471558560">5</a>.</li><li id="ALM-14035__li47070182111">If the output does not contain abnormal states, go to <a href="#ALM-14035__li17799174711116">7</a>.</li></ul>
</p></li><li id="ALM-14035__li39471558560"><a name="ALM-14035__li39471558560"></a><a name="li39471558560"></a><span>Switch to user <strong id="ALM-14035__b5858163753214">root</strong> and run the <strong id="ALM-14035__b4859203716322">reboot</strong> command to restart the host for which the alarm is generated. (Restarting a host is risky. Ensure that the service process is normal after the restart.)</span></li><li id="ALM-14035__li7936132616563"><span>Wait 5 minutes and check whether the alarm is cleared.</span><p><ul id="ALM-14035__ul19652752195618"><li id="ALM-14035__li1365317526566">If the alarm is cleared, no further action is required.</li><li id="ALM-14035__li2065375285614">If the alarm fails to be cleared, go to <a href="#ALM-14035__li17799174711116">7</a>.</li></ul>
</p></li></ol>
<p class="tableheading" id="ALM-14035__p2079910471716"><strong id="ALM-14035__b10284155114545">Collect fault information.</strong></p>
<ol start="7" id="ALM-14035__ol37994471410"><li id="ALM-14035__li17799174711116"><a name="ALM-14035__li17799174711116"></a><a name="li17799174711116"></a><span>On FusionInsight Manager, choose <strong id="ALM-14035__b1761410973312">O&amp;M</strong>. In the navigation pane on the left, choose <strong id="ALM-14035__b1861613933317">Log</strong> &gt; <strong id="ALM-14035__b186171298337">Download</strong>.</span></li><li id="ALM-14035__li177999474110"><span>Expand the drop-down list next to the <strong id="ALM-14035__b1423117123337">Service</strong> field. In the <strong id="ALM-14035__b8232181263315">Services</strong> dialog box that is displayed, select <strong id="ALM-14035__b15232141214334">HDFS</strong> for the target cluster.</span></li><li id="ALM-14035__li5799147219"><span>Click the edit icon in the upper right corner, and set <strong id="ALM-14035__b86785334252">Start Date</strong> and <strong id="ALM-14035__b06791933142513">End Date</strong> for log collection to 10 minutes ahead of and after the alarm generation time, respectively. Then, click <strong id="ALM-14035__b18679833192510">Download</strong>.</span></li><li id="ALM-14035__li57991247416"><span>Contact <span id="ALM-14035__text6536822123311">O&amp;M personnel</span> and provide the collected logs.</span></li></ol>
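<p>The log collection window in the preceding steps spans 10 minutes before and 10 minutes after the alarm generation time. The following is a minimal sketch for computing both values: the function name <strong>log_window</strong> is hypothetical, the sample time is illustrative, and the sketch assumes GNU <strong>date</strong> (for the <strong>-d</strong> option).</p>
<pre class="screen">
# Hypothetical helper: prints the start and end of a log window that spans
# 10 minutes (600 seconds) before and after the given alarm generation time.
# Assumes GNU date and an input such as "2024-08-02 10:00:00".
log_window() {
    # Convert the alarm time to seconds since the epoch; stop if it cannot be parsed.
    alarm_epoch=$(date -d "$1" +%s) || return 1
    date -d "@$((alarm_epoch - 600))" "+Start Date: %Y-%m-%d %H:%M:%S"
    date -d "@$((alarm_epoch + 600))" "+End Date: %Y-%m-%d %H:%M:%S"
}
</pre>
<p>For example, <strong>log_window "2024-08-02 10:00:00"</strong> prints the two values to enter as <strong>Start Date</strong> and <strong>End Date</strong>. The values are shown in the time zone of the shell, which may differ from the time zone shown on FusionInsight Manager.</p>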
</div>
<div class="section" id="ALM-14035__section979934710111"><h4 class="sectiontitle"><span id="ALM-14035__text1379918471115">Alarm Clearance</span></h4><p id="ALM-14035__p27991247919">This alarm is automatically cleared after the fault is rectified.</p>
</div>
<div class="section" id="ALM-14035__section879913471915"><h4 class="sectiontitle"><span id="ALM-14035__text16799164711115">Related Information</span></h4><p id="ALM-14035__p1779913479110"><span id="ALM-14035__text879984715119">None.</span></p>
</div>
</div>
<div>
<div class="familylinks">
<div class="parentlink"><strong>Parent topic:</strong> <a href="mrs_01_1298.html">Alarm Reference (Applicable to MRS 3.x)</a></div>
</div>
</div>


@ -0,0 +1,95 @@
<a name="ALM-19022"></a><a name="ALM-19022"></a>
<h1 class="topictitle1">ALM-19022 HBase Hotspot Detection Is Unavailable</h1>
<div id="body0000002007647317"><div class="section" id="ALM-19022__section42400121"><h4 class="sectiontitle"><span id="ALM-19022__text185357518384">Alarm Description</span></h4><p id="ALM-19022__p1779922">When the MetricController instance is installed for HBase, the alarm module checks the health status of the active HBase MetricController instance every 120 seconds. This alarm is generated when the active HBase MetricController instance does not exist or is unavailable and the hotspot detection function is unavailable.</p>
<p id="ALM-19022__p16019298">This alarm is cleared when the active HBase MetricController instance recovers.</p>
<div class="note" id="ALM-19022__note9955955"><img src="public_sys-resources/note_3.0-en-us.png"><span class="notetitle"> </span><div class="notebody"><p id="ALM-19022__p10779958195019">This alarm applies only to MRS 3.3.0 or later.</p>
</div></div>
</div>
<div class="section" id="ALM-19022__section46056776"><h4 class="sectiontitle"><span id="ALM-19022__text1582805433817">Alarm Attributes</span></h4>
<div class="tablenoborder"><table cellpadding="4" cellspacing="0" summary="" id="ALM-19022__table3909558" frame="border" border="1" rules="all"><thead align="left"><tr id="ALM-19022__row9358345"><th align="left" class="cellrowborder" valign="top" width="33.33333333333333%" id="mcps1.3.2.2.1.4.1.1"><p id="ALM-19022__p19828475"><span id="ALM-19022__text17999570388">Alarm ID</span></p>
</th>
<th align="left" class="cellrowborder" valign="top" width="33.33333333333333%" id="mcps1.3.2.2.1.4.1.2"><p id="ALM-19022__p62602629"><span id="ALM-19022__text318901183916">Alarm Severity</span></p>
</th>
<th align="left" class="cellrowborder" valign="top" width="33.33333333333333%" id="mcps1.3.2.2.1.4.1.3"><p id="ALM-19022__p37648208"><span id="ALM-19022__text1568825511215">Auto Cleared</span></p>
</th>
</tr>
</thead>
<tbody><tr id="ALM-19022__row29606020"><td class="cellrowborder" valign="top" width="33.33333333333333%" headers="mcps1.3.2.2.1.4.1.1 "><p id="ALM-19022__p49277383">19022</p>
</td>
<td class="cellrowborder" valign="top" width="33.33333333333333%" headers="mcps1.3.2.2.1.4.1.2 "><p id="ALM-19022__p13261740360">Major</p>
</td>
<td class="cellrowborder" valign="top" width="33.33333333333333%" headers="mcps1.3.2.2.1.4.1.3 "><p id="ALM-19022__p1825511402618">Yes</p>
</td>
</tr>
</tbody>
</table>
</div>
</div>
<div class="section" id="ALM-19022__section11857806"><h4 class="sectiontitle"><span id="ALM-19022__text5781104153916">Alarm Parameters</span></h4>
<div class="tablenoborder"><table cellpadding="4" cellspacing="0" summary="" id="ALM-19022__table10287189" frame="border" border="1" rules="all"><thead align="left"><tr id="ALM-19022__row45935908"><th align="left" class="cellrowborder" valign="top" width="50%" id="mcps1.3.3.2.1.3.1.1"><p id="ALM-19022__p29821069"><span id="ALM-19022__text171691577390">Parameter</span></p>
</th>
<th align="left" class="cellrowborder" valign="top" width="50%" id="mcps1.3.3.2.1.3.1.2"><p id="ALM-19022__p66696423"><span id="ALM-19022__text1459201019396">Description</span></p>
</th>
</tr>
</thead>
<tbody><tr id="ALM-19022__row18190122316182"><td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.3.2.1.3.1.1 "><p id="ALM-19022__p13858113752316">Source</p>
</td>
<td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.3.2.1.3.1.2 "><p id="ALM-19022__p187931338134115">Specifies the cluster for which the alarm is generated.</p>
</td>
</tr>
<tr id="ALM-19022__row33701210"><td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.3.2.1.3.1.1 "><p id="ALM-19022__p39123317">ServiceName</p>
</td>
<td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.3.2.1.3.1.2 "><p id="ALM-19022__p57042344">Specifies the service for which the alarm is generated.</p>
</td>
</tr>
<tr id="ALM-19022__row43619052"><td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.3.2.1.3.1.1 "><p id="ALM-19022__p37226997">RoleName</p>
</td>
<td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.3.2.1.3.1.2 "><p id="ALM-19022__p32410239">Specifies the role for which the alarm is generated.</p>
</td>
</tr>
<tr id="ALM-19022__row23256701"><td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.3.2.1.3.1.1 "><p id="ALM-19022__p66118565">HostName</p>
</td>
<td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.3.2.1.3.1.2 "><p id="ALM-19022__p48772425">Specifies the host for which the alarm is generated.</p>
</td>
</tr>
</tbody>
</table>
</div>
</div>
<div class="section" id="ALM-19022__section39611396"><h4 class="sectiontitle"><span id="ALM-19022__text8406151319394">Impact on the System</span></h4><p id="ALM-19022__p149952341174">The HBase hotspot detection function is unavailable.</p>
</div>
<div class="section" id="ALM-19022__section20958252"><h4 class="sectiontitle"><span id="ALM-19022__text941851614397">Possible Causes</span></h4><ul id="ALM-19022__ul20817398"><li id="ALM-19022__li1188327193">The ZooKeeper service is abnormal.</li><li id="ALM-19022__li9280164">The HBase service is abnormal.</li><li id="ALM-19022__li16412613">In the current HBase service, the MetricController instance on the same node as the active HMaster instance is not started.</li><li id="ALM-19022__li7418144120197">The network is abnormal.</li></ul>
</div>
<div class="section" id="ALM-19022__section118143257718"><h4 class="sectiontitle"><span id="ALM-19022__text7119112018395">Handling Procedure</span></h4><p class="tableheading" id="ALM-19022__p54353294"><strong id="ALM-19022__b15135086935">Check the ZooKeeper service status.</strong></p>
<ol id="ALM-19022__ol967113713192"><li id="ALM-19022__li116753791911"><span>In the service list on FusionInsight Manager, check whether <strong id="ALM-19022__b700098901114933">Running Status</strong> of ZooKeeper is <strong id="ALM-19022__b2002078094114933">Normal</strong>.</span><p><ul class="subitemlist" id="ALM-19022__ul167113714196"><li id="ALM-19022__li867163716190">If yes, go to <a href="#ALM-19022__li18661164216271">5</a>.</li><li id="ALM-19022__li14673371194">If no, go to <a href="#ALM-19022__li1267193701920">2</a>.</li></ul>
</p></li><li id="ALM-19022__li1267193701920"><a name="ALM-19022__li1267193701920"></a><a name="li1267193701920"></a><span>In the alarm list, check whether <strong id="ALM-19022__b1414187519114933">ALM-13000 ZooKeeper Service Unavailable</strong> exists.</span><p><ul class="subitemlist" id="ALM-19022__ul26783713195"><li id="ALM-19022__li76793711910">If yes, go to <a href="#ALM-19022__li667113714198">3</a>.</li><li id="ALM-19022__li14671437191915">If no, go to <a href="#ALM-19022__li18661164216271">5</a>.</li></ul>
</p></li><li id="ALM-19022__li667113714198"><a name="ALM-19022__li667113714198"></a><a name="li667113714198"></a><span>Rectify the fault by performing the operations provided for <strong id="ALM-19022__b836413547337">ALM-13000 ZooKeeper Service Unavailable</strong>.</span></li><li id="ALM-19022__li367113701911"><span>Wait for several minutes and check whether the alarm <strong id="ALM-19022__b147001165344">HBase Hotspot Detection Is Unavailable</strong> is cleared.</span><p><ul class="subitemlist" id="ALM-19022__ul76793751911"><li id="ALM-19022__li2671837191915">If yes, no further action is required.</li><li id="ALM-19022__li19671237151914">If no, go to <a href="#ALM-19022__li18661164216271">5</a>.</li></ul>
</p></li></ol>
<p id="ALM-19022__p865314531778"><strong id="ALM-19022__b1748012335616">Check the HBase service status.</strong></p>
<ol start="5" id="ALM-19022__ol466218426271"><li id="ALM-19022__li18661164216271"><a name="ALM-19022__li18661164216271"></a><a name="li18661164216271"></a><span>In the service list on FusionInsight Manager, check whether <strong id="ALM-19022__b18974248183410">Running Status</strong> of HBase is <strong id="ALM-19022__b1097474893418">Normal</strong>.</span><p><ul class="subitemlist" id="ALM-19022__ul146611042202714"><li id="ALM-19022__li4661842122717">If yes, go to <a href="#ALM-19022__li61381651152817">9</a>.</li><li id="ALM-19022__li566194262714">If no, go to <a href="#ALM-19022__li18662154292714">6</a>.</li></ul>
</p></li><li id="ALM-19022__li18662154292714"><a name="ALM-19022__li18662154292714"></a><a name="li18662154292714"></a><span>In the alarm list, check whether the alarm ALM-19000 HBase Service Unavailable exists.</span><p><ul class="subitemlist" id="ALM-19022__ul1366214211276"><li id="ALM-19022__li2066144217277">If yes, go to <a href="#ALM-19022__li66625429278">7</a>.</li><li id="ALM-19022__li126627425274">If no, go to <a href="#ALM-19022__li61381651152817">9</a>.</li></ul>
</p></li><li id="ALM-19022__li66625429278"><a name="ALM-19022__li66625429278"></a><a name="li66625429278"></a><span>Rectify the fault by following the steps provided for <strong id="ALM-19022__b6885292357">ALM-19000 HBase Service Unavailable</strong>.</span></li><li id="ALM-19022__li3662542162713"><span>Wait for several minutes and check whether the alarm <strong id="ALM-19022__b9249143393512">HBase Hotspot Detection Is Unavailable</strong> is cleared.</span><p><ul class="subitemlist" id="ALM-19022__ul11662144222718"><li id="ALM-19022__li166628421274">If yes, no further action is required.</li><li class="subitemlist" id="ALM-19022__li1266215424271">If no, go to <a href="#ALM-19022__li61381651152817">9</a>.</li></ul>
</p></li></ol>
<p id="ALM-19022__p868752102714"><strong id="ALM-19022__b1191412289287">Check whether the MetricController instance deployed on the same node as the active HMaster instance is started.</strong></p>
<ol start="9" id="ALM-19022__ol1113913517286"><li id="ALM-19022__li61381651152817"><a name="ALM-19022__li61381651152817"></a><a name="li61381651152817"></a><span>On FusionInsight Manager, choose <strong id="ALM-19022__b1855525619369">Cluster</strong> &gt; <strong id="ALM-19022__b42663582366">Service</strong> &gt; <strong id="ALM-19022__b1921119013715">HBase</strong>, and click <strong id="ALM-19022__b152267183717">Instances</strong> to check whether the <strong id="ALM-19022__b7728614153719">MetricController(Active)</strong> instance exists.</span><p><ul id="ALM-19022__ul18137551182817"><li id="ALM-19022__li213685102813">If yes, go to <a href="#ALM-19022__li182979395366">12</a>.</li><li id="ALM-19022__li1013719517283">If no, go to <a href="#ALM-19022__li12138165182818">10</a>.</li></ul>
</p></li><li id="ALM-19022__li12138165182818"><a name="ALM-19022__li12138165182818"></a><a name="li12138165182818"></a><span>Select the MetricController instance whose management IP address is the same as that of the active HMaster instance, and click <strong id="ALM-19022__b16234161112506">Start Instance</strong>.</span></li><li id="ALM-19022__li2139155152815"><span>After the MetricController instance is started, check whether the alarm <strong id="ALM-19022__b165503410387">HBase Hotspot Detection Is Unavailable</strong> is cleared.</span><p><ul id="ALM-19022__ul41391251132811"><li id="ALM-19022__li613819516284">If yes, no further action is required.</li><li id="ALM-19022__li813913519284">If no, go to <a href="#ALM-19022__li182979395366">12</a>.</li></ul>
</p></li></ol>
<p id="ALM-19022__p69991826393"><strong id="ALM-19022__b34087649221">Check the network connectivity between the started MetricController instances and the active HMaster node.</strong></p>
<ol start="12" id="ALM-19022__ol14298143919367"><li id="ALM-19022__li182979395366"><a name="ALM-19022__li182979395366"></a><a name="li182979395366"></a><span>Log in to the node where the active HMaster instance is deployed and run <strong id="ALM-19022__b02971395367">ping</strong> <em id="ALM-19022__i165003820507">IP address of the node where the standby MetricController instance is deployed</em> to check whether the network connection between the started MetricController instances and the host where the active HMaster instance is deployed is normal.</span><p><ul class="subitemlist" id="ALM-19022__ul329718398364"><li id="ALM-19022__li1297139153613">If yes, go to <a href="#ALM-19022__li107641231103617">15</a>.</li><li class="subitemlist" id="ALM-19022__li1229719397368">If no, go to <a href="#ALM-19022__li929715395365">13</a>.</li></ul>
</p></li><li id="ALM-19022__li929715395365"><a name="ALM-19022__li929715395365"></a><a name="li929715395365"></a><span>Contact the network administrator to restore the network.</span></li><li id="ALM-19022__li6298193923611"><span>After the network recovers, check whether the alarm <strong id="ALM-19022__b4583132544015">HBase Hotspot Detection Is Unavailable</strong> is cleared.</span><p><ul class="subitemlist" id="ALM-19022__ul5298133993617"><li id="ALM-19022__li42981839123610">If yes, no further action is required.</li><li id="ALM-19022__li3298239123613">If no, go to <a href="#ALM-19022__li107641231103617">15</a>.</li></ul>
</p></li></ol>
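<p>The connectivity check in the steps above could be scripted roughly as follows. This is a minimal sketch, not part of the product: the function names are hypothetical, the <strong>-c</strong> option assumes a Linux <strong>ping</strong>, and the IP address in the usage note is a documentation placeholder.</p>

```python
import subprocess


def build_ping_command(ip_address, count=3):
    """Build a Linux ping command that sends a fixed number of probes."""
    return ["ping", "-c", str(count), ip_address]


def is_reachable(ip_address, run=subprocess.run):
    """Return True if ping exits with status 0 (at least one reply received).

    The `run` parameter exists only so the check can be exercised without a
    real network; by default it is subprocess.run.
    """
    result = run(
        build_ping_command(ip_address),
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0
```

<p>For example, calling <strong>is_reachable("192.0.2.10")</strong> on the active HMaster node would report whether the MetricController node at that placeholder address answers. If the result is false, the procedure continues with the network administrator step.</p>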
<p id="ALM-19022__p15601739207"><strong id="ALM-19022__b3606332013">Collect fault information.</strong></p>
<ol start="15" id="ALM-19022__ol167651631113615"><li id="ALM-19022__li107641231103617"><a name="ALM-19022__li107641231103617"></a><a name="li107641231103617"></a><span>On FusionInsight Manager, choose <strong id="ALM-19022__b1985771627114933">O&amp;M</strong>. In the navigation pane on the left, choose <strong id="ALM-19022__b1016475368114933">Log</strong> &gt; <strong id="ALM-19022__b1811065636114933">Download</strong>.</span></li><li id="ALM-19022__li07645310363"><span>Expand the <strong id="ALM-19022__b1683209692114933">Service</strong> drop-down list, and select <strong id="ALM-19022__b1178843058114933">HBase</strong> for the target cluster.</span></li><li id="ALM-19022__li73388391699"><span>In the <strong id="ALM-19022__b109542059414">Host</strong> area, select the host where the HMaster instance is deployed.</span></li><li id="ALM-19022__li976593115360"><span>Click the edit icon in the upper right corner, and set <strong id="ALM-19022__b103081519194118">Start Date</strong> and <strong id="ALM-19022__b130917192419">End Date</strong> for log collection to 10 minutes ahead of and after the alarm generation time, respectively. Then, click <strong id="ALM-19022__b23091319184112">Download</strong>.</span></li><li id="ALM-19022__li77651631163618"><span>Contact <span id="ALM-19022__text12765631133618">O&amp;M personnel</span> and provide the collected logs.</span></li></ol>
</div>
<div class="section" id="ALM-19022__section563505465818"><h4 class="sectiontitle"><span id="ALM-19022__text1761202610393">Alarm Clearance</span></h4><p id="ALM-19022__p715945811718">This alarm is automatically cleared after the fault is rectified.</p>
</div>
<div class="section" id="ALM-19022__section762211012599"><h4 class="sectiontitle"><span id="ALM-19022__text107101829133911">Related Information</span></h4><p id="ALM-19022__p1218816411811"><span id="ALM-19022__text61294221672">None.</span></p>
</div>
</div>
<div>
<div class="familylinks">
<div class="parentlink"><strong>Parent topic:</strong> <a href="mrs_01_1298.html">Alarm Reference (Applicable to MRS 3.x)</a></div>
</div>
</div>

@ -0,0 +1,79 @@
<a name="ALM-19023"></a><a name="ALM-19023"></a>
<h1 class="topictitle1">ALM-19023 Region Traffic Restriction for HBase</h1>
<div id="body0000001314311136"><div class="section" id="ALM-19023__section42400121"><h4 class="sectiontitle"><span id="ALM-19023__text87451034183913">Alarm Description</span></h4><p id="ALM-19023__p1779922">When the MetricController instance is installed for the HBase service, self-healing from hotspotting is automatically enabled. Every 120 seconds, the alarm module checks whether HBase has regions whose request traffic is restricted due to hotspotting. This alarm is generated when such a region is detected.</p>
<p id="ALM-19023__p16019298">This alarm is cleared when the region is no longer a hotspot.</p>
<p id="ALM-19023__p10779958195019">This alarm applies only to MRS 3.3.0 or later.</p>
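<p>The alarm condition described above can be pictured with the following minimal sketch. It is illustrative only: the <strong>RegionStatus</strong> record and both functions are hypothetical names, not part of HBase or FusionInsight Manager, and the real alarm module obtains region state internally.</p>

```python
from dataclasses import dataclass

CHECK_INTERVAL_SECONDS = 120  # check period stated in the alarm description


@dataclass(frozen=True)
class RegionStatus:
    """Hypothetical snapshot of one region, used for illustration only."""
    table: str
    region: str
    traffic_restricted: bool


def restricted_regions(statuses):
    """Return the regions whose request traffic is currently restricted."""
    return [s for s in statuses if s.traffic_restricted]


def alarm_active(statuses):
    """The alarm is raised while at least one region is restricted."""
    return len(restricted_regions(statuses)) > 0
```

<p>Under this model, the alarm state is re-evaluated at every check, so it clears on the first check at which no region is restricted any longer.</p>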
</div>
<div class="section" id="ALM-19023__section46056776"><h4 class="sectiontitle"><span id="ALM-19023__text121753812398">Alarm Attributes</span></h4>
<div class="tablenoborder"><table cellpadding="4" cellspacing="0" summary="" id="ALM-19023__table3909558" frame="border" border="1" rules="all"><thead align="left"><tr id="ALM-19023__row9358345"><th align="left" class="cellrowborder" valign="top" width="33.33333333333333%" id="mcps1.3.2.2.1.4.1.1"><p id="ALM-19023__p19828475"><span id="ALM-19023__text438204423910">Alarm ID</span></p>
</th>
<th align="left" class="cellrowborder" valign="top" width="33.33333333333333%" id="mcps1.3.2.2.1.4.1.2"><p id="ALM-19023__p62602629"><span id="ALM-19023__text269185353918">Alarm Severity</span></p>
</th>
<th align="left" class="cellrowborder" valign="top" width="33.33333333333333%" id="mcps1.3.2.2.1.4.1.3"><p id="ALM-19023__p37648208"><span id="ALM-19023__text1568825511215">Auto Cleared</span></p>
</th>
</tr>
</thead>
<tbody><tr id="ALM-19023__row29606020"><td class="cellrowborder" valign="top" width="33.33333333333333%" headers="mcps1.3.2.2.1.4.1.1 "><p id="ALM-19023__p581094414588">19023</p>
</td>
<td class="cellrowborder" valign="top" width="33.33333333333333%" headers="mcps1.3.2.2.1.4.1.2 "><p id="ALM-19023__p17808204420580">Critical</p>
</td>
<td class="cellrowborder" valign="top" width="33.33333333333333%" headers="mcps1.3.2.2.1.4.1.3 "><p id="ALM-19023__p98051144165816">Yes</p>
</td>
</tr>
</tbody>
</table>
</div>
</div>
<div class="section" id="ALM-19023__section11857806"><h4 class="sectiontitle"><span id="ALM-19023__text58462589391">Alarm Parameters</span></h4>
<div class="tablenoborder"><table cellpadding="4" cellspacing="0" summary="" id="ALM-19023__table10287189" frame="border" border="1" rules="all"><thead align="left"><tr id="ALM-19023__row45935908"><th align="left" class="cellrowborder" valign="top" width="50%" id="mcps1.3.3.2.1.3.1.1"><p id="ALM-19023__p29821069"><span id="ALM-19023__text1196312114401">Parameter</span></p>
</th>
<th align="left" class="cellrowborder" valign="top" width="50%" id="mcps1.3.3.2.1.3.1.2"><p id="ALM-19023__p66696423"><span id="ALM-19023__text138826444013">Description</span></p>
</th>
</tr>
</thead>
<tbody><tr id="ALM-19023__row18190122316182"><td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.3.2.1.3.1.1 "><p id="ALM-19023__p13858113752316">Source</p>
</td>
<td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.3.2.1.3.1.2 "><p id="ALM-19023__p187931338134115">Specifies the cluster for which the alarm is generated.</p>
</td>
</tr>
<tr id="ALM-19023__row33701210"><td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.3.2.1.3.1.1 "><p id="ALM-19023__p39123317">ServiceName</p>
</td>
<td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.3.2.1.3.1.2 "><p id="ALM-19023__p57042344">Specifies the service for which the alarm is generated.</p>
</td>
</tr>
<tr id="ALM-19023__row43619052"><td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.3.2.1.3.1.1 "><p id="ALM-19023__p37226997">RoleName</p>
</td>
<td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.3.2.1.3.1.2 "><p id="ALM-19023__p32410239">Specifies the role for which the alarm is generated.</p>
</td>
</tr>
<tr id="ALM-19023__row23256701"><td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.3.2.1.3.1.1 "><p id="ALM-19023__p66118565">HostName</p>
</td>
<td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.3.2.1.3.1.2 "><p id="ALM-19023__p48772425">Specifies the host for which the alarm is generated.</p>
</td>
</tr>
</tbody>
</table>
</div>
</div>
<div class="section" id="ALM-19023__section39611396"><h4 class="sectiontitle"><span id="ALM-19023__text201436844017">Impact on the System</span></h4><p id="ALM-19023__p1199521583">If the traffic of a hotspot region is restricted, the number of handlers for processing the requests in the region is limited. As a result, services requesting the region may slow down or retry upon failure.</p>
</div>
<div class="section" id="ALM-19023__section20958252"><h4 class="sectiontitle"><span id="ALM-19023__text189626102409">Possible Causes</span></h4><p id="ALM-19023__p1635445605811">Too many requests are directed to a single region when the HBase service is accessed.</p>
</div>
<div class="section" id="ALM-19023__section1147916303585"><h4 class="sectiontitle"><span id="ALM-19023__text8893161374012">Handling Procedure</span></h4><p id="ALM-19023__p1452914817195"><strong id="ALM-19023__b1249473125012">Check whether there are too many requests in a single region of HBase.</strong></p>
<ol id="ALM-19023__ol13645123675812"><li id="ALM-19023__li56451636195813"><span>Log in to FusionInsight Manager, and choose <strong id="ALM-19023__b946811165018">O&amp;M</strong> &gt; <strong id="ALM-19023__b3468111155015">Alarm</strong> &gt; <strong id="ALM-19023__b24681115502">Alarms</strong>.</span></li><li id="ALM-19023__li864533612584"><a name="ALM-19023__li864533612584"></a><a name="li864533612584"></a><span>In <strong id="ALM-19023__b059910382565">Additional Information</strong> of <strong id="ALM-19023__b016916295712">Region Traffic Restriction for HBase</strong>, view the reported table name and region information.</span></li><li id="ALM-19023__li3155164012258"><span>On FusionInsight Manager, choose <strong id="ALM-19023__b79531638105717">Cluster</strong> &gt; <strong id="ALM-19023__b125105407578">Service</strong> &gt; <strong id="ALM-19023__b39691344125715">HBase</strong> and click the hyperlink on the right of HMaster web UI.</span></li><li id="ALM-19023__li143233034717"><span>Click <strong id="ALM-19023__b152025412583">Table Details</strong> and adjust service configurations in the region where the table in <a href="#ALM-19023__li864533612584">2</a> is deployed.</span></li><li id="ALM-19023__li12733112072918"><span>Wait a moment and then check whether the alarm is cleared.</span><p><ul id="ALM-19023__ul815863116392"><li id="ALM-19023__li142853445474">If yes, no further action is required.</li><li id="ALM-19023__li571104311394">If no, go to <a href="#ALM-19023__li16644173610580">6</a>.</li></ul>
</p></li></ol>
<p id="ALM-19023__p15601739207"><strong id="ALM-19023__b158753340553">Collect fault information.</strong></p>
<ol start="6" id="ALM-19023__ol164433617585"><li id="ALM-19023__li16644173610580"><a name="ALM-19023__li16644173610580"></a><a name="li16644173610580"></a><span>On FusionInsight Manager, choose <strong id="ALM-19023__b207011479527">O&amp;M</strong>. In the navigation pane on the left, choose <strong id="ALM-19023__b170114745213">Log</strong> &gt; <strong id="ALM-19023__b17015720524">Download</strong>.</span></li><li id="ALM-19023__li20644736165810"><span>Expand the <strong id="ALM-19023__b8635913155214">Service</strong> drop-down list, and select <strong id="ALM-19023__b146357134525">HBase</strong> for the target cluster.</span></li><li id="ALM-19023__li186448362587"><span>In the <strong id="ALM-19023__b10379201915216">Host</strong> area, select the host where the HMaster instance is deployed.</span></li><li id="ALM-19023__li18644133665817"><span>Click the edit icon in the upper right corner, and set <strong id="ALM-19023__b13562112005217">Start Date</strong> and <strong id="ALM-19023__b35628203522">End Date</strong> for log collection to 10 minutes ahead of and after the alarm generation time, respectively. Then, click <strong id="ALM-19023__b1556252085213">Download</strong>.</span></li><li id="ALM-19023__li064413695811"><span>Contact <span id="ALM-19023__text127091922205217">O&amp;M personnel</span> and provide the collected logs.</span></li></ol>
</div>
<div class="section" id="ALM-19023__section169311343318"><h4 class="sectiontitle"><span id="ALM-19023__text8397201834010">Alarm Clearance</span></h4><p id="ALM-19023__p3969205517187">This alarm is automatically cleared when the region is no longer a hotspot.</p>
</div>
<div class="section" id="ALM-19023__section19896826"><h4 class="sectiontitle"><span id="ALM-19023__text18971162594015">Related Information</span></h4><p id="ALM-19023__p9275082"><span id="ALM-19023__text61294221672">None.</span></p>
</div>
</div>
<div>
<div class="familylinks">
<div class="parentlink"><strong>Parent topic:</strong> <a href="mrs_01_1298.html">Alarm Reference (Applicable to MRS 3.x)</a></div>
</div>
</div>

@ -0,0 +1,94 @@
<a name="ALM-19024"></a><a name="ALM-19024"></a>
<h1 class="topictitle1">ALM-19024 RPC Requests P99 Latency on RegionServer Exceeds the Threshold</h1>
<div id="body0000001971007522"><div class="section" id="ALM-19024__section42400121"><h4 class="sectiontitle"><span id="ALM-19024__text11958175822511">Alarm Description</span></h4><p id="ALM-19024__p42166519292">The system checks the P99 latency of RPC requests on each RegionServer instance of the HBase service every 30 seconds. This alarm is generated when the P99 latency of RPC requests on a RegionServer exceeds the threshold 10 consecutive times.</p>
<p id="ALM-19024__p1231351418316">This alarm is cleared when P99 latency for RPC requests on a RegionServer instance is less than or equal to the threshold.</p>
<p id="ALM-19024__p10779958195019">This alarm applies only to MRS 3.3.0 or later.</p>
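<p>The trigger and clearance rules above amount to a small counter. The following sketch shows one way to model them; the class name and structure are hypothetical, and it assumes that a single sample at or below the threshold both resets the count and clears the alarm, which is how the description reads but is not confirmed in detail by the product.</p>

```python
CHECK_INTERVAL_SECONDS = 30  # sampling period from the alarm description
CONSECUTIVE_LIMIT = 10       # consecutive over-threshold samples that raise the alarm


class P99LatencyAlarm:
    """Illustrative model of the trigger and clearance rules for one RegionServer."""

    def __init__(self, threshold_seconds, limit=CONSECUTIVE_LIMIT):
        self.threshold_seconds = threshold_seconds
        self.limit = limit
        self.consecutive = 0  # over-threshold samples seen in a row
        self.active = False

    def observe(self, p99_seconds):
        """Record one sample and return whether the alarm is active."""
        if p99_seconds > self.threshold_seconds:
            self.consecutive += 1
            if self.consecutive >= self.limit:
                self.active = True
        else:
            # A sample at or below the threshold resets the count and clears the alarm.
            self.consecutive = 0
            self.active = False
        return self.active
```

<p>With this model, nine over-threshold samples in a row leave the alarm inactive, and the tenth raises it.</p>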
</div>
<div class="section" id="ALM-19024__section46056776"><h4 class="sectiontitle"><span id="ALM-19024__text18690329262">Alarm Attributes</span></h4>
<div class="tablenoborder"><table cellpadding="4" cellspacing="0" summary="" id="ALM-19024__table3909558" frame="border" border="1" rules="all"><thead align="left"><tr id="ALM-19024__row9358345"><th align="left" class="cellrowborder" valign="top" width="33.33333333333333%" id="mcps1.3.2.2.1.4.1.1"><p id="ALM-19024__p19828475"><span id="ALM-19024__text14553174118286">Alarm ID</span></p>
</th>
<th align="left" class="cellrowborder" valign="top" width="33.33333333333333%" id="mcps1.3.2.2.1.4.1.2"><p id="ALM-19024__p62602629"><span id="ALM-19024__text10623610112610">Alarm Severity</span></p>
</th>
<th align="left" class="cellrowborder" valign="top" width="33.33333333333333%" id="mcps1.3.2.2.1.4.1.3"><p id="ALM-19024__p37648208"><span id="ALM-19024__text1568825511215">Auto Cleared</span></p>
</th>
</tr>
</thead>
<tbody><tr id="ALM-19024__row29606020"><td class="cellrowborder" valign="top" width="33.33333333333333%" headers="mcps1.3.2.2.1.4.1.1 "><p id="ALM-19024__p581094414588">19024</p>
</td>
<td class="cellrowborder" valign="top" width="33.33333333333333%" headers="mcps1.3.2.2.1.4.1.2 "><ul id="ALM-19024__ul499215152411"><li id="ALM-19024__li49921251182415"><strong id="ALM-19024__b516114381194">Critical</strong>: The default threshold is 10 seconds.</li><li id="ALM-19024__li15992451132411"><strong id="ALM-19024__b730411430915">Major</strong>: The default threshold is 5 seconds.</li></ul>
</td>
<td class="cellrowborder" valign="top" width="33.33333333333333%" headers="mcps1.3.2.2.1.4.1.3 "><p id="ALM-19024__p98051144165816">Yes</p>
</td>
</tr>
</tbody>
</table>
</div>
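<p>The default severity thresholds in the table can be read as a simple mapping from a P99 latency value to a severity. The sketch below is illustrative only: the function name is hypothetical, and it assumes that "exceeds" means strictly greater than and that the higher severity takes precedence when both thresholds are exceeded.</p>

```python
CRITICAL_THRESHOLD_SECONDS = 10.0  # default Critical threshold from the table
MAJOR_THRESHOLD_SECONDS = 5.0      # default Major threshold from the table


def severity_for(p99_seconds,
                 critical=CRITICAL_THRESHOLD_SECONDS,
                 major=MAJOR_THRESHOLD_SECONDS):
    """Return "Critical", "Major", or None for a P99 latency in seconds."""
    if p99_seconds > critical:
        return "Critical"
    if p99_seconds > major:
        return "Major"
    return None
```

<p>A latency of exactly 5 seconds maps to no alarm under this reading, because the alarm is cleared when the latency is less than or equal to the threshold.</p>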
</div>
<div class="section" id="ALM-19024__section11857806"><h4 class="sectiontitle"><span id="ALM-19024__text137861872618">Alarm Parameters</span></h4>
<div class="tablenoborder"><table cellpadding="4" cellspacing="0" summary="" id="ALM-19024__table63098886" frame="border" border="1" rules="all"><thead align="left"><tr id="ALM-19024__row42029922"><th align="left" class="cellrowborder" valign="top" width="50%" id="mcps1.3.3.2.1.3.1.1"><p id="ALM-19024__p48980553"><span id="ALM-19024__text296118200264">Parameter</span></p>
</th>
<th align="left" class="cellrowborder" valign="top" width="50%" id="mcps1.3.3.2.1.3.1.2"><p id="ALM-19024__p8001819"><span id="ALM-19024__text511510248263">Description</span></p>
</th>
</tr>
</thead>
<tbody><tr id="ALM-19024__row88451931718"><td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.3.2.1.3.1.1 "><p id="ALM-19024__p13858113752316">Source</p>
</td>
<td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.3.2.1.3.1.2 "><p id="ALM-19024__p187931338134115">Specifies the cluster for which the alarm is generated.</p>
</td>
</tr>
<tr id="ALM-19024__row44167618"><td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.3.2.1.3.1.1 "><p id="ALM-19024__p39123317">ServiceName</p>
</td>
<td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.3.2.1.3.1.2 "><p id="ALM-19024__p83161014635">Specifies the service for which the alarm is generated.</p>
</td>
</tr>
<tr id="ALM-19024__row1943587"><td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.3.2.1.3.1.1 "><p id="ALM-19024__p37226997">RoleName</p>
</td>
<td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.3.2.1.3.1.2 "><p id="ALM-19024__p18316114535">Specifies the role for which the alarm is generated.</p>
</td>
</tr>
<tr id="ALM-19024__row10765874"><td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.3.2.1.3.1.1 "><p id="ALM-19024__p66118565">HostName</p>
</td>
<td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.3.2.1.3.1.2 "><p id="ALM-19024__p33168145315">Specifies the host for which the alarm is generated.</p>
</td>
</tr>
<tr id="ALM-19024__row498425193419"><td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.3.2.1.3.1.1 "><p id="ALM-19024__p49810256344">Trigger Condition</p>
</td>
<td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.3.2.1.3.1.2 "><p id="ALM-19024__p149812573416">Specifies the threshold for triggering the alarm.</p>
</td>
</tr>
</tbody>
</table>
</div>
</div>
<div class="section" id="ALM-19024__section39611396"><h4 class="sectiontitle"><span id="ALM-19024__text7398142810268">Impact on the System</span></h4><p id="ALM-19024__p1199521583">If the P99 latency of RPC requests exceeds the threshold, the RegionServer cannot provide services with normal performance. If the P99 latency of RPC requests exceeds the threshold on most RegionServers in the cluster, HBase may fail to provide services for external systems.</p>
</div>
<div class="section" id="ALM-19024__section20958252"><h4 class="sectiontitle"><span id="ALM-19024__text202332032152613">Possible Causes</span></h4><ul id="ALM-19024__ul178628366354"><li id="ALM-19024__li14862193663518">RegionServer GC duration is too long.</li><li id="ALM-19024__li1686243619356">The HDFS RPC response is too slow.</li><li id="ALM-19024__li3862143663513">RegionServer request concurrency is too high.</li></ul>
</div>
<div class="section" id="ALM-19024__section1147916303585"><h4 class="sectiontitle"><span id="ALM-19024__text1733835102615">Handling Procedure</span></h4><ol id="ALM-19024__ol6708234101512"><li id="ALM-19024__li187081734191516"><a name="ALM-19024__li187081734191516"></a><a name="li187081734191516"></a><span>Log in to FusionInsight Manager and choose <strong id="ALM-19024__b42297502143">O&amp;M</strong>. In the navigation pane on the left, choose <strong id="ALM-19024__b722945015141">Alarm</strong> &gt; <strong id="ALM-19024__b1222911503147">Alarms</strong>. On the page that is displayed, locate the row containing the alarm whose <strong id="ALM-19024__b122291750191414">Alarm ID</strong> is <strong id="ALM-19024__b122995061413">19024</strong>, and view the service instance and host name in <strong id="ALM-19024__b9229850141415">Location</strong>.</span></li></ol>
<p id="ALM-19024__p18769103663611"><strong id="ALM-19024__b1938714493153">Check the GC duration of RegionServer.</strong></p>
<ol start="2" id="ALM-19024__ol37085342153"><li id="ALM-19024__li97081834111516"><span>In the alarm list on FusionInsight Manager, check whether the "HBase GC Duration Exceeds the Threshold" alarm is generated for the service instance in <a href="#ALM-19024__li187081734191516">1</a>.</span><p><ul id="ALM-19024__ul13708153410158"><li id="ALM-19024__li1170714345158">If yes, go to <a href="#ALM-19024__li167081134161511">3</a>.</li><li id="ALM-19024__li270723411517">If no, go to <a href="#ALM-19024__li2708203154412">5</a>.</li></ul>
</p></li><li id="ALM-19024__li167081134161511"><a name="ALM-19024__li167081134161511"></a><a name="li167081134161511"></a><span>Rectify the fault by following the handling procedure of "ALM-19007 HBase GC Duration Exceeds the Threshold".</span></li><li id="ALM-19024__li18708234121516"><span>Wait several minutes and check whether the alarm is cleared.</span><p><ul id="ALM-19024__ul1170803417150"><li id="ALM-19024__li3708183415151">If yes, no further action is required.</li><li id="ALM-19024__li970893414153">If no, go to <a href="#ALM-19024__li2708203154412">5</a>.</li></ul>
</p></li></ol>
<p id="ALM-19024__p17294242437"><strong id="ALM-19024__b02081554181812">Check HDFS RPC response time.</strong></p>
<ol start="5" id="ALM-19024__ol19709113194416"><li id="ALM-19024__li2708203154412"><a name="ALM-19024__li2708203154412"></a><a name="li2708203154412"></a><span>In the alarm list on FusionInsight Manager, check whether the "Average NameNode RPC Processing Time Exceeds the Threshold" alarm is generated for the HDFS service on which the HBase service depends.</span><p><ul id="ALM-19024__ul1045945610445"><li id="ALM-19024__li1545913562445">If yes, go to <a href="#ALM-19024__li87091331184413">6</a>.</li><li id="ALM-19024__li745905654413">If no, go to <a href="#ALM-19024__li2133184710441">8</a>.</li></ul>
</p></li><li id="ALM-19024__li87091331184413"><a name="ALM-19024__li87091331184413"></a><a name="li87091331184413"></a><span>Rectify the fault by following the handling procedure of "ALM-14021 Average NameNode RPC Processing Time Exceeds the Threshold".</span></li><li id="ALM-19024__li970913314444"><span>Wait several minutes and check whether the alarm is cleared.</span><p><ul id="ALM-19024__ul770681614614"><li id="ALM-19024__li9706816124617">If yes, no further action is required.</li><li id="ALM-19024__li0706416104619">If no, go to <a href="#ALM-19024__li2133184710441">8</a>.</li></ul>
</p></li></ol>
<p id="ALM-19024__p7985133944418"><strong id="ALM-19024__b20818193111">Check the request concurrency of the RegionServer.</strong></p>
<ol start="8" id="ALM-19024__ol31346472448"><li id="ALM-19024__li2133184710441"><a name="ALM-19024__li2133184710441"></a><a name="li2133184710441"></a><span>In the alarm list on FusionInsight Manager, check whether the "Handler Usage of RegionServer Exceeds the Threshold" alarm is generated for the service instance in <a href="#ALM-19024__li187081734191516">1</a>.</span><p><ul id="ALM-19024__ul14691255124610"><li id="ALM-19024__li16469115518463">If yes, go to <a href="#ALM-19024__li1781144374611">9</a>.</li><li id="ALM-19024__li11469755194617">If no, go to <a href="#ALM-19024__li959275915215">11</a>.</li></ul>
</p></li><li id="ALM-19024__li1781144374611"><a name="ALM-19024__li1781144374611"></a><a name="li1781144374611"></a><span>Rectify the fault by following the handling procedure of "ALM-19021 Handler Usage of RegionServer Exceeds the Threshold".</span></li><li id="ALM-19024__li61341947114413"><span>Wait several minutes and check whether the alarm is cleared.</span><p><ul id="ALM-19024__ul197493917470"><li id="ALM-19024__li774103917479">If yes, no further action is required.</li><li id="ALM-19024__li874203913474">If no, go to <a href="#ALM-19024__li959275915215">11</a>.</li></ul>
</p></li></ol>
<p id="ALM-19024__p15601739207"><strong id="ALM-19024__b1595128153218">Collect fault information.</strong></p>
<ol start="11" id="ALM-19024__ol1559215914523"><li id="ALM-19024__li959275915215"><a name="ALM-19024__li959275915215"></a><a name="li959275915215"></a><span>On FusionInsight Manager, choose <strong id="ALM-19024__b11396101324">O&amp;M</strong>. In the navigation pane on the left, choose <strong id="ALM-19024__b139201003211">Log</strong> &gt; <strong id="ALM-19024__b1139191053220">Download</strong>.</span></li><li id="ALM-19024__li1959211592529"><span>Expand the <strong id="ALM-19024__b142056118321">Service</strong> drop-down list, and select <strong id="ALM-19024__b162052117320">HBase</strong> for the target cluster.</span></li><li id="ALM-19024__li19592145975215"><span>Click the edit icon in the upper right corner, and set <strong id="ALM-19024__b153341712103211">Start Date</strong> and <strong id="ALM-19024__b6334171283215">End Date</strong> for log collection to 10 minutes ahead of and after the alarm generation time, respectively. Then, click <strong id="ALM-19024__b833410121328">Download</strong>.</span></li><li id="ALM-19024__li1759215945217"><span>Contact <span id="ALM-19024__text1059295916526">O&amp;M personnel</span> and provide the collected logs.</span></li></ol>
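<p>The log collection window in the step above is the alarm generation time plus and minus 10 minutes. A minimal sketch of that arithmetic follows; the function name and the sample timestamp are hypothetical and serve only as an illustration.</p>

```python
from datetime import datetime, timedelta


def log_collection_window(alarm_time, margin_minutes=10):
    """Return (start, end) covering the margin before and after the alarm time."""
    margin = timedelta(minutes=margin_minutes)
    return alarm_time - margin, alarm_time + margin


# Example with a made-up alarm generation time.
start, end = log_collection_window(datetime(2024, 8, 2, 12, 0, 0))
```

<p>The resulting values are what would be entered as <strong>Start Date</strong> and <strong>End Date</strong> on the log download page.</p>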
</div>
<div class="section" id="ALM-19024__section169311343318"><h4 class="sectiontitle"><span id="ALM-19024__text596254111265">Alarm Clearance</span></h4><p id="ALM-19024__p5635175455816">This alarm is automatically cleared after the fault is rectified.</p>
</div>
<div class="section" id="ALM-19024__section19896826"><h4 class="sectiontitle"><span id="ALM-19024__text7831044102616">Related Information</span></h4><p id="ALM-19024__p9275082"><span id="ALM-19024__text61294221672">None.</span></p>
</div>
</div>
<div>
<div class="familylinks">
<div class="parentlink"><strong>Parent topic:</strong> <a href="mrs_01_1298.html">Alarm Reference (Applicable to MRS 3.x)</a></div>
</div>
</div>

@ -0,0 +1,94 @@
<a name="ALM-19025"></a><a name="ALM-19025"></a>
<h1 class="topictitle1">ALM-19025 Damaged StoreFile in HBase</h1>
<div id="body0000002007527853"><div class="section" id="ALM-19025__section42400121"><h4 class="sectiontitle"><span id="ALM-19025__text14720102111505">Alarm Description</span></h4><p id="ALM-19025__p42166519292">The system checks the <strong id="ALM-19025__b2710259145714">hdfs://hacluster/hbase/autocorrupt</strong> and <strong id="ALM-19025__b426042055118">hdfs://hacluster/hbase/MasterData/autocorrupt</strong> directories on HDFS of each HBase service every 120 seconds. This alarm is generated when there are files in the directories.</p>
<p id="ALM-19025__p1231351418316">This alarm is cleared when the <strong id="ALM-19025__b2036914565214">hdfs://hacluster/hbase/autocorrupt</strong> and <strong id="ALM-19025__b3369125205213">hdfs://hacluster/hbase/MasterData/autocorrupt</strong> directories do not exist or are empty.</p>
<p id="ALM-19025__p10779958195019">This alarm applies only to MRS 3.3.0 or later.</p>
<div class="note" id="ALM-19025__note227923535513"><img src="public_sys-resources/note_3.0-en-us.png"><span class="notetitle"> </span><div class="notebody"><p id="ALM-19025__p13733223134811"><strong id="ALM-19025__b7548014135912">hdfs://hacluster</strong> indicates the name of the file system used by HBase, and <strong id="ALM-19025__b87122095910">/hbase</strong> indicates the root directory of HBase in the file system. To view the actual values, log in to FusionInsight Manager, choose <strong id="ALM-19025__b156701940135912">Cluster</strong> &gt; <strong id="ALM-19025__b167334325910">Services</strong> &gt; <strong id="ALM-19025__b63431447135914">HBase</strong>, click <strong id="ALM-19025__b159671454175915">Configuration</strong>, and search for <strong id="ALM-19025__b8912221606">fs.defaultFS</strong> and <strong id="ALM-19025__b21192209019">hbase.data.rootdir</strong>.</p>
</div></div>
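<p>The two monitored directories and the alarm condition can be sketched as follows. This is an illustration under stated assumptions: both function names are hypothetical, <strong>hbase.data.rootdir</strong> is assumed to hold a path such as <strong>/hbase</strong> rather than a full URI, and the directory listings are supplied by the caller instead of being read from HDFS.</p>

```python
def autocorrupt_dirs(default_fs, hbase_root_dir):
    """Build the two monitored directory URIs from the two configuration values."""
    base = default_fs.rstrip("/") + "/" + hbase_root_dir.strip("/")
    return [base + "/autocorrupt", base + "/MasterData/autocorrupt"]


def storefile_alarm_active(listings):
    """Decide the alarm state from directory listings.

    `listings` maps each directory to a list of file paths, or to None when
    the directory does not exist. The alarm is active if any directory holds
    at least one file; missing or empty directories keep it cleared.
    """
    return any(files for files in listings.values())
```

<p>With the default values from the note, <strong>autocorrupt_dirs("hdfs://hacluster", "/hbase")</strong> yields the two directories named in the alarm description.</p>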
</div>
<div class="section" id="ALM-19025__section46056776"><h4 class="sectiontitle"><span id="ALM-19025__text2972174455013">Alarm Attributes</span></h4>
<div class="tablenoborder"><table cellpadding="4" cellspacing="0" summary="" id="ALM-19025__table3909558" frame="border" border="1" rules="all"><thead align="left"><tr id="ALM-19025__row9358345"><th align="left" class="cellrowborder" valign="top" width="33.33333333333333%" id="mcps1.3.2.2.1.4.1.1"><p id="ALM-19025__p19828475"><span id="ALM-19025__text8675746135320">Alarm ID</span></p>
</th>
<th align="left" class="cellrowborder" valign="top" width="33.33333333333333%" id="mcps1.3.2.2.1.4.1.2"><p id="ALM-19025__p62602629"><span id="ALM-19025__text138115610534">Alarm Severity</span></p>
</th>
<th align="left" class="cellrowborder" valign="top" width="33.33333333333333%" id="mcps1.3.2.2.1.4.1.3"><p id="ALM-19025__p37648208"><span id="ALM-19025__text118371076718">Auto Cleared</span></p>
</th>
</tr>
</thead>
<tbody><tr id="ALM-19025__row29606020"><td class="cellrowborder" valign="top" width="33.33333333333333%" headers="mcps1.3.2.2.1.4.1.1 "><p id="ALM-19025__p581094414588">19025</p>
</td>
<td class="cellrowborder" valign="top" width="33.33333333333333%" headers="mcps1.3.2.2.1.4.1.2 "><p id="ALM-19025__p15650153814177">Major</p>
</td>
<td class="cellrowborder" valign="top" width="33.33333333333333%" headers="mcps1.3.2.2.1.4.1.3 "><p id="ALM-19025__p98051144165816">Yes</p>
</td>
</tr>
</tbody>
</table>
</div>
</div>
<div class="section" id="ALM-19025__section11857806"><h4 class="sectiontitle"><span id="ALM-19025__text10271232513">Alarm Parameters</span></h4>
<div class="tablenoborder"><table cellpadding="4" cellspacing="0" summary="" id="ALM-19025__table63098886" frame="border" border="1" rules="all"><thead align="left"><tr id="ALM-19025__row42029922"><th align="left" class="cellrowborder" valign="top" width="50%" id="mcps1.3.3.2.1.3.1.1"><p id="ALM-19025__p48980553"><span id="ALM-19025__text11637191318549">Parameter</span></p>
</th>
<th align="left" class="cellrowborder" valign="top" width="50%" id="mcps1.3.3.2.1.3.1.2"><p id="ALM-19025__p8001819"><span id="ALM-19025__text7473120145419">Description</span></p>
</th>
</tr>
</thead>
<tbody><tr id="ALM-19025__row88451931718"><td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.3.2.1.3.1.1 "><p id="ALM-19025__p13858113752316">Source</p>
</td>
<td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.3.2.1.3.1.2 "><p id="ALM-19025__p187931338134115">Specifies the cluster for which the alarm is generated.</p>
</td>
</tr>
<tr id="ALM-19025__row44167618"><td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.3.2.1.3.1.1 "><p id="ALM-19025__p39123317">ServiceName</p>
</td>
<td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.3.2.1.3.1.2 "><p id="ALM-19025__p83161014635">Specifies the service for which the alarm is generated.</p>
</td>
</tr>
<tr id="ALM-19025__row1943587"><td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.3.2.1.3.1.1 "><p id="ALM-19025__p37226997">RoleName</p>
</td>
<td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.3.2.1.3.1.2 "><p id="ALM-19025__p18316114535">Specifies the role for which the alarm is generated.</p>
</td>
</tr>
<tr id="ALM-19025__row10765874"><td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.3.2.1.3.1.1 "><p id="ALM-19025__p66118565">HostName</p>
</td>
<td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.3.2.1.3.1.2 "><p id="ALM-19025__p33168145315">Specifies the host for which the alarm is generated.</p>
</td>
</tr>
</tbody>
</table>
</div>
</div>
<div class="section" id="ALM-19025__section39611396"><h4 class="sectiontitle"><span id="ALM-19025__text7831228155115">Impact on the System</span></h4><p id="ALM-19025__p1199521583">There are damaged StoreFile files in HBase, which may cause data loss.</p>
</div>
<div class="section" id="ALM-19025__section20958252"><h4 class="sectiontitle"><span id="ALM-19025__text11410115911514">Possible Causes</span></h4><p id="ALM-19025__p1844714448">The StoreFile files are damaged.</p>
</div>
<div class="section" id="ALM-19025__section14353515104812"><h4 class="sectiontitle"><span id="ALM-19025__text157397117523">Handling Procedure</span></h4><ol id="ALM-19025__ol12758176556"><li id="ALM-19025__li0272417155517"><span>Log in to FusionInsight Manager and choose <strong id="ALM-19025__b095616141645">O&amp;M</strong>. In the navigation pane on the left, choose <strong id="ALM-19025__b179561314545">Alarm</strong> &gt; <strong id="ALM-19025__b179572141418">Alarms</strong>. On the page that is displayed, locate the row containing the alarm whose <strong id="ALM-19025__b995741415411">Alarm ID</strong> is <strong id="ALM-19025__b195713145414">19025</strong>, and view the service in <strong id="ALM-19025__b29579141040">Location</strong>.</span></li><li id="ALM-19025__li182722017165515"><span>Log in to the node where the HDFS and HBase clients are installed as the client installation user and run the following commands:</span><p><p id="ALM-19025__p927291765511"><strong id="ALM-19025__b1773610562317">cd </strong><em id="ALM-19025__i87372564312">Client installation directory</em></p>
<p id="ALM-19025__p1227212173551"><strong id="ALM-19025__b102721017155513">source bigdata_env</strong></p>
<p id="ALM-19025__p1027281735519"><strong id="ALM-19025__b1616016594418">kinit</strong> <em id="ALM-19025__i1523393652">Component service user</em> (If <span id="ALM-19025__ph132721317155510">Kerberos authentication is disabled for the cluster (the cluster is in normal mode)</span>, skip this step.)</p>
</p></li><li id="ALM-19025__li11272201715518"><span>Check the damaged StoreFile file.</span><p><ul id="ALM-19025__ul138228219290"><li id="ALM-19025__li1748012243294">Run the following command to check whether the <strong id="ALM-19025__b12848115495217">/hbase/autocorrupt</strong> directory of HDFS is empty. If it is not, go to <a href="#ALM-19025__li202731117105511">4</a>.<p id="ALM-19025__p1301102363115"><strong id="ALM-19025__b527231785512">hdfs dfs -ls -R</strong><strong id="ALM-19025__b1727212176550"> hdfs://hacluster</strong><strong id="ALM-19025__b1327291735518">/hbase</strong><strong id="ALM-19025__b5272917135519">/autocorrupt</strong></p>
</li><li id="ALM-19025__li67241125123110">Run the following command to check whether the <strong id="ALM-19025__b53591622115317">/hbase/MasterData/autocorrupt</strong> directory of HDFS is empty. If it is not, go to <a href="#ALM-19025__li13270141719556">9</a>.<p id="ALM-19025__p665544263213"><strong id="ALM-19025__b13279145792313">hdfs dfs -ls -R</strong><strong id="ALM-19025__b4279357152311"> hdfs://hacluster</strong><strong id="ALM-19025__b027916572237">/hbase</strong><strong id="ALM-19025__b20279557112314">/MasterData/autocorrupt</strong></p>
</li></ul>
</p></li><li id="ALM-19025__li202731117105511"><a name="ALM-19025__li202731117105511"></a><a name="li202731117105511"></a><span>Run the following command to restore the StoreFile files in the <strong id="ALM-19025__b751317223610">hdfs://hacluster/hbase/autocorrupt</strong> directory:</span><p><p id="ALM-19025__p20273151725519"><strong id="ALM-19025__b17587543969">hdfs debug recoverLease -path hdfs://hacluster/hbase/autocorrupt/</strong><em id="ALM-19025__i517045811616">Name space</em><strong id="ALM-19025__b16474356168">/</strong>Table<strong id="ALM-19025__b28218115711">/</strong>Region<strong id="ALM-19025__b198101281775">/</strong>Column family<strong id="ALM-19025__b79791037716">/</strong><em id="ALM-19025__i144831741275">StoreFile files</em></p>
</p></li><li id="ALM-19025__li162731617125519"><span>Check whether the damaged StoreFile files are restored. If the following information is displayed, the restoration is successful:</span><p><pre class="screen" id="ALM-19025__screen062311418582">recoverLease SUCCEEDED on hdfs://hacluster/hbase/autocorrupt/<em id="ALM-19025__i16961159155816">default/h1/865665fe32db62dadada68b644359809/cf1/95f210f931ad44c99e4028470be7d292</em></pre>
<p id="ALM-19025__p473117514583">If yes, go to <a href="#ALM-19025__li1427441715510">6</a>.</p>
<p id="ALM-19025__p37326585811">If no, go to <a href="#ALM-19025__li13270141719556">9</a>.</p>
</p></li><li id="ALM-19025__li1427441715510"><a name="ALM-19025__li1427441715510"></a><a name="li1427441715510"></a><span>Run the following command to move the files back to the <strong id="ALM-19025__b132901928783">hdfs://hacluster/hbase/data</strong> directory:</span><p><p id="ALM-19025__p62741117175519"><strong id="ALM-19025__b12995035289">hdfs dfs -mv hdfs://hacluster/hbase/autocorrupt/</strong><em id="ALM-19025__i47591337691">Name space</em><strong id="ALM-19025__b1675973717913">/</strong>Table<strong id="ALM-19025__b16759113717913">/</strong>Region<strong id="ALM-19025__b147598371893">/</strong>Column family<strong id="ALM-19025__b1375916379914">/</strong><em id="ALM-19025__i187593372099">StoreFile files</em><strong id="ALM-19025__b147326812911">hdfs://hacluster/hbase/data/</strong><em id="ALM-19025__i19750194210912">Name space</em><strong id="ALM-19025__b6750154211916">/</strong>Table<strong id="ALM-19025__b1575016422917">/</strong>Region<strong id="ALM-19025__b17501942790">/</strong>Column family<strong id="ALM-19025__b075011422916">/</strong><em id="ALM-19025__i1875034214914">StoreFile files</em></p>
</p></li><li id="ALM-19025__li1427451795511"><span>Run the following command on HBase Shell to bring the region online again:</span><p><p id="ALM-19025__p19274201715512"><strong id="ALM-19025__b1027416171551">hbase shell</strong></p>
<p id="ALM-19025__p2027471715554"><strong id="ALM-19025__b145881549181011">unassign'</strong><em id="ALM-19025__i1634025771019">Region</em><strong id="ALM-19025__b2884125281014">'</strong></p>
<p id="ALM-19025__p227451745510"><strong id="ALM-19025__b1850813051116">assign'</strong><em id="ALM-19025__i675135191115">Region</em><strong id="ALM-19025__b139852171119">'</strong></p>
</p></li><li id="ALM-19025__li122754172555"><span>Wait several minutes and check whether the alarm is cleared.</span><p><ul id="ALM-19025__ul15275111795517"><li id="ALM-19025__li1327413175559">If yes, no further action is required.</li><li id="ALM-19025__li182755176556">If no, go to <a href="#ALM-19025__li13270141719556">9</a>.</li></ul>
</p></li></ol>
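<p>The path mapping used in steps 4 and 6 can be sketched as follows. This is a minimal illustration, assuming that a quarantined file keeps the same namespace/table/region/column family layout under <strong>/hbase/autocorrupt</strong> as under <strong>/hbase/data</strong>. The helper name <strong>restore_commands</strong> is hypothetical and only builds the command strings; it does not run them.</p>

```python
from typing import List

# Roots taken from the procedure above; adjust them if fs.defaultFS or the
# HBase root directory differs in your cluster.
AUTOCORRUPT_ROOT = "hdfs://hacluster/hbase/autocorrupt/"
DATA_ROOT = "hdfs://hacluster/hbase/data/"


def restore_commands(corrupt_path: str) -> List[str]:
    """Build the recoverLease and mv commands for one quarantined StoreFile.

    Hypothetical helper: it only formats strings and never calls HDFS.
    """
    if not corrupt_path.startswith(AUTOCORRUPT_ROOT):
        raise ValueError("path is not under " + AUTOCORRUPT_ROOT)
    # Keep the namespace/table/region/column family/file part unchanged.
    relative = corrupt_path[len(AUTOCORRUPT_ROOT):]
    target = DATA_ROOT + relative
    return [
        "hdfs debug recoverLease -path " + corrupt_path,
        "hdfs dfs -mv " + corrupt_path + " " + target,
    ]


if __name__ == "__main__":
    sample = (AUTOCORRUPT_ROOT
              + "default/h1/865665fe32db62dadada68b644359809/cf1/"
              + "95f210f931ad44c99e4028470be7d292")
    for command in restore_commands(sample):
        print(command)
```

<p>Run the second command only after the first one reports <strong>recoverLease SUCCEEDED</strong>, as described in step 5.</p>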
<p id="ALM-19025__p8551171215559"><strong id="ALM-19025__b194921811118">Collect fault information.</strong></p>
<ol start="9" id="ALM-19025__ol15271317145513"><li id="ALM-19025__li13270141719556"><a name="ALM-19025__li13270141719556"></a><a name="li13270141719556"></a><span>On FusionInsight Manager, choose <strong id="ALM-19025__b1597510214111">O&amp;M</strong>. In the navigation pane on the left, choose <strong id="ALM-19025__b1975152118113">Log</strong> &gt; <strong id="ALM-19025__b12975172115113">Download</strong>.</span></li><li id="ALM-19025__li327016177553"><span>Expand the <strong id="ALM-19025__b840925131118">Service</strong> drop-down list, and select <strong id="ALM-19025__b240102519115">HBase</strong> for the target cluster.</span></li><li id="ALM-19025__li14270121705519"><span>Click the edit icon in the upper right corner, and set <strong id="ALM-19025__b11744525101114">Start Date</strong> and <strong id="ALM-19025__b1874416255114">End Date</strong> for log collection to 10 minutes ahead of and after the alarm generation time, respectively. Then, click <strong id="ALM-19025__b774410259112">Download</strong>.</span></li><li id="ALM-19025__li42711517125520"><span>Contact <span id="ALM-19025__text1137338201119">O&amp;M personnel</span> and provide the collected logs.</span></li></ol>
</div>
<div class="section" id="ALM-19025__section169311343318"><h4 class="sectiontitle"><span id="ALM-19025__text19963162185211">Alarm Clearance</span></h4><p id="ALM-19025__p5635175455816">This alarm is automatically cleared after the fault is rectified.</p>
</div>
<div class="section" id="ALM-19025__section19896826"><h4 class="sectiontitle"><span id="ALM-19025__text19872111325310">Related Information</span></h4><p id="ALM-19025__p9275082"><span id="ALM-19025__text57129331713">None.</span></p>
</div>
</div>
<div>
<div class="familylinks">
<div class="parentlink"><strong>Parent topic:</strong> <a href="mrs_01_1298.html">Alarm Reference (Applicable to MRS 3.x)</a></div>
</div>
</div>


@ -0,0 +1,83 @@
<a name="ALM-19026"></a><a name="ALM-19026"></a>
<h1 class="topictitle1">ALM-19026 Damaged WAL Files in HBase</h1>
<div id="body0000001971167294"><div class="section" id="ALM-19026__section42400121"><h4 class="sectiontitle"><span id="ALM-19026__text1917154618563">Alarm Description</span></h4><p id="ALM-19026__p42166519292">The system checks the <strong id="ALM-19026__b19150158121414">hdfs://hacluster/hbase/corrupt</strong> directory on the HDFS of each HBase service every 120 seconds. This alarm is generated when there are WAL files in the <strong id="ALM-19026__b141501558161415">/hbase/corrupt</strong> directory.</p>
<p id="ALM-19026__p1231351418316">This alarm is cleared when the <strong id="ALM-19026__b17645927111513">/hbase/corrupt</strong> directory does not exist or does not contain WAL files.</p>
<p id="ALM-19026__p10779958195019">This alarm applies only to MRS 3.3.0 or later.</p>
<div class="note" id="ALM-19026__note227923535513"><img src="public_sys-resources/note_3.0-en-us.png"><span class="notetitle"> </span><div class="notebody"><p id="ALM-19026__p111138587412"><strong id="ALM-19026__b0626835101515">hdfs://hacluster</strong> indicates the name of the file system used by HBase, and <strong id="ALM-19026__b16263356158">/hbase</strong> indicates the root directory of HBase in the file system. You can log in to FusionInsight Manager, choose <strong id="ALM-19026__b8627143514156">Cluster</strong> &gt; <strong id="ALM-19026__b1562711357157">Services</strong> &gt; <strong id="ALM-19026__b1162753521510">HBase</strong> and click <strong id="ALM-19026__b206271535131517">Configuration</strong>. Search for <strong id="ALM-19026__b762853520153">fs.defaultFS</strong> and <strong id="ALM-19026__b46285355150">hbase.data.rootdir</strong>.</p>
</div></div>
</div>
<div class="section" id="ALM-19026__section46056776"><h4 class="sectiontitle"><span id="ALM-19026__text17511658165619">Alarm Attributes</span></h4>
<div class="tablenoborder"><table cellpadding="4" cellspacing="0" summary="" id="ALM-19026__table3909558" frame="border" border="1" rules="all"><thead align="left"><tr id="ALM-19026__row9358345"><th align="left" class="cellrowborder" valign="top" width="33.33333333333333%" id="mcps1.3.2.2.1.4.1.1"><p id="ALM-19026__p19828475"><span id="ALM-19026__text648196165715">Alarm ID</span></p>
</th>
<th align="left" class="cellrowborder" valign="top" width="33.33333333333333%" id="mcps1.3.2.2.1.4.1.2"><p id="ALM-19026__p62602629"><span id="ALM-19026__text734181815578">Alarm Severity</span></p>
</th>
<th align="left" class="cellrowborder" valign="top" width="33.33333333333333%" id="mcps1.3.2.2.1.4.1.3"><p id="ALM-19026__p37648208"><span id="ALM-19026__text118371076718">Auto Cleared</span></p>
</th>
</tr>
</thead>
<tbody><tr id="ALM-19026__row29606020"><td class="cellrowborder" valign="top" width="33.33333333333333%" headers="mcps1.3.2.2.1.4.1.1 "><p id="ALM-19026__p581094414588">19026</p>
</td>
<td class="cellrowborder" valign="top" width="33.33333333333333%" headers="mcps1.3.2.2.1.4.1.2 "><p id="ALM-19026__p15650153814177">Major</p>
</td>
<td class="cellrowborder" valign="top" width="33.33333333333333%" headers="mcps1.3.2.2.1.4.1.3 "><p id="ALM-19026__p98051144165816">Yes</p>
</td>
</tr>
</tbody>
</table>
</div>
</div>
<div class="section" id="ALM-19026__section11857806"><h4 class="sectiontitle"><span id="ALM-19026__text1960525117576">Alarm Parameters</span></h4>
<div class="tablenoborder"><table cellpadding="4" cellspacing="0" summary="" id="ALM-19026__table63098886" frame="border" border="1" rules="all"><thead align="left"><tr id="ALM-19026__row42029922"><th align="left" class="cellrowborder" valign="top" width="50%" id="mcps1.3.3.2.1.3.1.1"><p id="ALM-19026__p48980553"><span id="ALM-19026__text52825012580">Parameter</span></p>
</th>
<th align="left" class="cellrowborder" valign="top" width="50%" id="mcps1.3.3.2.1.3.1.2"><p id="ALM-19026__p8001819"><span id="ALM-19026__text1966112157582">Description</span></p>
</th>
</tr>
</thead>
<tbody><tr id="ALM-19026__row88451931718"><td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.3.2.1.3.1.1 "><p id="ALM-19026__p13858113752316">Source</p>
</td>
<td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.3.2.1.3.1.2 "><p id="ALM-19026__p187931338134115">Specifies the cluster for which the alarm is generated.</p>
</td>
</tr>
<tr id="ALM-19026__row44167618"><td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.3.2.1.3.1.1 "><p id="ALM-19026__p39123317">ServiceName</p>
</td>
<td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.3.2.1.3.1.2 "><p id="ALM-19026__p83161014635">Specifies the service for which the alarm is generated.</p>
</td>
</tr>
<tr id="ALM-19026__row1943587"><td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.3.2.1.3.1.1 "><p id="ALM-19026__p37226997">RoleName</p>
</td>
<td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.3.2.1.3.1.2 "><p id="ALM-19026__p18316114535">Specifies the role for which the alarm is generated.</p>
</td>
</tr>
<tr id="ALM-19026__row10765874"><td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.3.2.1.3.1.1 "><p id="ALM-19026__p66118565">HostName</p>
</td>
<td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.3.2.1.3.1.2 "><p id="ALM-19026__p33168145315">Specifies the host for which the alarm is generated.</p>
</td>
</tr>
</tbody>
</table>
</div>
</div>
<div class="section" id="ALM-19026__section39611396"><h4 class="sectiontitle"><span id="ALM-19026__text10127133013584">Impact on the System</span></h4><p id="ALM-19026__p1199521583">There are damaged WAL files in HBase, which may cause data loss.</p>
</div>
<div class="section" id="ALM-19026__section20958252"><h4 class="sectiontitle"><span id="ALM-19026__text9496174017583">Possible Causes</span></h4><p id="ALM-19026__p1844714448">The WAL files are damaged.</p>
</div>
<div class="section" id="ALM-19026__section14353515104812"><h4 class="sectiontitle"><span id="ALM-19026__text37162049145815">Handling Procedure</span></h4><ol id="ALM-19026__ol252113230207"><li id="ALM-19026__li85201923102019"><span>Log in to FusionInsight Manager and choose <strong id="ALM-19026__b7730412121613">O&amp;M</strong>. In the navigation pane on the left, choose <strong id="ALM-19026__b17316128160">Alarm</strong> &gt; <strong id="ALM-19026__b7731712201616">Alarms</strong>. On the page that is displayed, locate the row containing the alarm whose <strong id="ALM-19026__b16731191281613">Alarm ID</strong> is <strong id="ALM-19026__b1573151291620">19026</strong>, and view the service in <strong id="ALM-19026__b1873241201613">Location</strong>.</span></li><li id="ALM-19026__li0521723112015"><span>Log in to the node where the HDFS clients are installed as the client installation user and run the following commands:</span><p><p id="ALM-19026__p19521523192012"><strong id="ALM-19026__b2197732101613">cd </strong><em id="ALM-19026__i1119793281615">Client installation directory</em></p>
<p id="ALM-19026__p7521162322017"><strong id="ALM-19026__b75213238207">source bigdata_env</strong></p>
<p id="ALM-19026__p1152112392016"><strong id="ALM-19026__b12782035171611">kinit</strong> <em id="ALM-19026__i187903581619">Component service user</em> (If <span id="ALM-19026__ph167953551616">Kerberos authentication is disabled for the cluster (the cluster is in normal mode)</span>, skip this step.)</p>
</p></li><li id="ALM-19026__li4521122362019"><span>Run the following command to check the damaged WAL files and go to <a href="#ALM-19026__li135201823182014">4</a>:</span><p><p id="ALM-19026__p1452116235200"><strong id="ALM-19026__b252192317202">hdfs dfs -ls </strong><strong id="ALM-19026__b18521122317205">hdfs://hacluster</strong><strong id="ALM-19026__b352132314200">/hbase</strong><strong id="ALM-19026__b652122352014">/corrupt/*%2C*</strong></p>
</p></li></ol>
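<p>The <strong>*%2C*</strong> pattern in step 3 selects entries whose names contain a URL-encoded comma. The sketch below shows the same filter applied to a list of names. It assumes that WAL file names embed <strong>%2C</strong>, which the glob implies but this document does not state explicitly. The sample names and the function name <strong>wal_candidates</strong> are hypothetical.</p>

```python
from fnmatch import fnmatchcase
from typing import Iterable, List

# Same wildcard as in the hdfs dfs -ls command of step 3.
WAL_PATTERN = "*%2C*"


def wal_candidates(names: Iterable[str]) -> List[str]:
    """Return the names that match the WAL glob, preserving input order."""
    return [name for name in names if fnmatchcase(name, WAL_PATTERN)]


if __name__ == "__main__":
    # Hypothetical directory listing, for illustration only.
    listing = [
        "node1%2C16020%2C1700000000000.1700000001234",
        "notes.txt",
    ]
    print(wal_candidates(listing))
```

<p>An empty result corresponds to the clearance condition in the alarm description: the directory contains no WAL files.</p>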
<p id="ALM-19026__p8661181992019"><strong id="ALM-19026__b6869655151611">Collect fault information.</strong></p>
<ol start="4" id="ALM-19026__ol952032316206"><li id="ALM-19026__li135201823182014"><a name="ALM-19026__li135201823182014"></a><a name="li135201823182014"></a><span>On FusionInsight Manager, choose <strong id="ALM-19026__b5452059191614">O&amp;M</strong>. In the navigation pane on the left, choose <strong id="ALM-19026__b1945175971610">Log</strong> &gt; <strong id="ALM-19026__b34645941615">Download</strong>.</span></li><li id="ALM-19026__li13520623172011"><span>Expand the <strong id="ALM-19026__b19634159131613">Service</strong> drop-down list, and select <strong id="ALM-19026__b1963414599164">HBase</strong> for the target cluster.</span></li><li id="ALM-19026__li5520152313207"><span>Click the edit icon in the upper right corner, and set <strong id="ALM-19026__b18361000175">Start Date</strong> and <strong id="ALM-19026__b16836120111719">End Date</strong> for log collection to 10 minutes ahead of and after the alarm generation time, respectively. Then, click <strong id="ALM-19026__b178363018173">Download</strong>.</span></li><li id="ALM-19026__li1652011236208"><span>Contact <span id="ALM-19026__text16951220175">O&amp;M personnel</span> and provide the collected logs.</span></li></ol>
</div>
<div class="section" id="ALM-19026__section169311343318"><h4 class="sectiontitle"><span id="ALM-19026__text38561858145810">Alarm Clearance</span></h4><p id="ALM-19026__p5635175455816">This alarm is automatically cleared after the fault is rectified.</p>
</div>
<div class="section" id="ALM-19026__section19896826"><h4 class="sectiontitle"><span id="ALM-19026__text8686141045919">Related Information</span></h4><p id="ALM-19026__p9275082"><span id="ALM-19026__text57129331713">None.</span></p>
</div>
</div>
<div>
<div class="familylinks">
<div class="parentlink"><strong>Parent topic:</strong> <a href="mrs_01_1298.html">Alarm Reference (Applicable to MRS 3.x)</a></div>
</div>
</div>


@ -0,0 +1,88 @@
<a name="ALM-25007"></a><a name="ALM-25007"></a>
<h1 class="topictitle1">ALM-25007 Number of SlapdServer Connections Exceeds the Threshold</h1>
<div id="body0000001971656760"><div class="section" id="ALM-25007__section6427584"><h4 class="sectiontitle"><span id="ALM-25007__text8925301575">Alarm Description</span></h4><p id="ALM-25007__p42420329101829">The system checks the number of process connections on the SlapdServer node every 30 seconds and compares the actual number with the threshold. This alarm is generated when the number of process connections exceeds the threshold (<strong id="ALM-25007__b8866125433316">1000</strong> by default) for multiple times (<strong id="ALM-25007__b1978152016345">5</strong> by default).</p>
<p id="ALM-25007__p7890154011523">Its <strong id="ALM-25007__b8322134934612">Trigger Count</strong> is configurable. If <strong id="ALM-25007__b101511558165816">Trigger Count</strong> is set to <strong id="ALM-25007__b4897107195911">1</strong>, this alarm is cleared when the number of process connections is less than or equal to the threshold. If <strong id="ALM-25007__b985511871012">Trigger Count</strong> is greater than <strong id="ALM-25007__b985513813104">1</strong>, this alarm is cleared when the number of process connections is less than or equal to 90% of the threshold.</p>
</div>
<div class="section" id="ALM-25007__section57848263"><h4 class="sectiontitle"><span id="ALM-25007__text38748475555">Alarm Attributes</span></h4>
<div class="tablenoborder"><table cellpadding="4" cellspacing="0" summary="" id="ALM-25007__table53988588" frame="border" border="1" rules="all"><thead align="left"><tr id="ALM-25007__row25963404"><th align="left" class="cellrowborder" valign="top" width="33.33333333333333%" id="mcps1.3.2.2.1.4.1.1"><p id="ALM-25007__p57710042"><span id="ALM-25007__text17980150175619">Alarm ID</span></p>
</th>
<th align="left" class="cellrowborder" valign="top" width="33.33333333333333%" id="mcps1.3.2.2.1.4.1.2"><p id="ALM-25007__p44001849"><span id="ALM-25007__text199471335614">Alarm Severity</span></p>
</th>
<th align="left" class="cellrowborder" valign="top" width="33.33333333333333%" id="mcps1.3.2.2.1.4.1.3"><p id="ALM-25007__p7380012"><span id="ALM-25007__text152400388563">Auto Cleared</span></p>
</th>
</tr>
</thead>
<tbody><tr id="ALM-25007__row14760880"><td class="cellrowborder" valign="top" width="33.33333333333333%" headers="mcps1.3.2.2.1.4.1.1 "><p id="ALM-25007__p14887165972519">25007</p>
</td>
<td class="cellrowborder" valign="top" width="33.33333333333333%" headers="mcps1.3.2.2.1.4.1.2 "><p id="ALM-25007__p11886859182510">Major</p>
</td>
<td class="cellrowborder" valign="top" width="33.33333333333333%" headers="mcps1.3.2.2.1.4.1.3 "><p id="ALM-25007__p14881165912515">Yes</p>
</td>
</tr>
</tbody>
</table>
</div>
</div>
<div class="section" id="ALM-25007__section50872323"><h4 class="sectiontitle"><span id="ALM-25007__text155061195577">Alarm Parameters</span></h4>
<div class="tablenoborder"><table cellpadding="4" cellspacing="0" summary="" id="ALM-25007__table22167579" frame="border" border="1" rules="all"><thead align="left"><tr id="ALM-25007__row15017071"><th align="left" class="cellrowborder" valign="top" width="50%" id="mcps1.3.3.2.1.3.1.1"><p id="ALM-25007__p21975462"><span id="ALM-25007__text776142495720">Parameter</span></p>
</th>
<th align="left" class="cellrowborder" valign="top" width="50%" id="mcps1.3.3.2.1.3.1.2"><p id="ALM-25007__p35182007"><span id="ALM-25007__text632018391572">Description</span></p>
</th>
</tr>
</thead>
<tbody><tr id="ALM-25007__row1756114464143"><td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.3.2.1.3.1.1 "><p id="ALM-25007__p3820532611">Source</p>
</td>
<td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.3.2.1.3.1.2 "><p id="ALM-25007__p6810518268">Specifies the cluster for which the alarm is generated.</p>
</td>
</tr>
<tr id="ALM-25007__row34521134"><td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.3.2.1.3.1.1 "><p id="ALM-25007__p171352261">ServiceName</p>
</td>
<td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.3.2.1.3.1.2 "><p id="ALM-25007__p1461254261">Specifies the service for which the alarm is generated.</p>
</td>
</tr>
<tr id="ALM-25007__row6737354"><td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.3.2.1.3.1.1 "><p id="ALM-25007__p15512518268">RoleName</p>
</td>
<td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.3.2.1.3.1.2 "><p id="ALM-25007__p7112518267">Specifies the role for which the alarm is generated.</p>
</td>
</tr>
<tr id="ALM-25007__row678591271316"><td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.3.2.1.3.1.1 "><p id="ALM-25007__p20786112101312">HostName</p>
</td>
<td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.3.2.1.3.1.2 "><p id="ALM-25007__p207861112101315">Specifies the host for which the alarm is generated.</p>
</td>
</tr>
<tr id="ALM-25007__row6761024151317"><td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.3.2.1.3.1.1 "><p id="ALM-25007__p127613244137">Trigger Condition</p>
</td>
<td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.3.2.1.3.1.2 "><p id="ALM-25007__p976132418139">Specifies the threshold for triggering the alarm.</p>
</td>
</tr>
</tbody>
</table>
</div>
</div>
<div class="section" id="ALM-25007__section55197725"><h4 class="sectiontitle"><span id="ALM-25007__text2266192715582">Impact on the System</span></h4><p id="ALM-25007__p47392170">Processes respond slowly or do not work.</p>
</div>
<div class="section" id="ALM-25007__section27017478"><h4 class="sectiontitle"><span id="ALM-25007__text12656240135813">Possible Causes</span></h4><ul id="ALM-25007__ul928305117158"><li id="ALM-25007__li12831051111518">There are too many SlapdServer connections.</li><li id="ALM-25007__li20252875511">The alarm threshold or alarm trigger count is improperly configured.</li></ul>
</div>
<div class="section" id="ALM-25007__section535785120256"><h4 class="sectiontitle"><span id="ALM-25007__text19569135285811">Handling Procedure</span></h4><p id="ALM-25007__p13680121610197"><strong id="ALM-25007__b554744191416">Check whether there are too many SlapdServer process connections.</strong></p>
<ol id="ALM-25007__ol3606133663917"><li id="ALM-25007__li360615369397"><span>Log in to FusionInsight Manager and choose <strong id="ALM-25007__b274121091511">Cluster</strong> &gt; <strong id="ALM-25007__b15878191591510">Services</strong> &gt; <strong id="ALM-25007__b262119189157">LdapServer</strong>.</span></li><li id="ALM-25007__li1360653613918"><span>On the LdapServer dashboard page, observe the SlapdServer process connections and decrease the connections based on service requirements.</span><p><div class="fignone" id="ALM-25007__fig1360663643914"><span class="figcap"><b>Figure 1 </b>SlapdServer process connections</span><br><span><img id="ALM-25007__image86061536153916" src="en-us_image_0000001971659216.png"></span></div>
</p></li><li id="ALM-25007__li36061236183919"><span>Wait about 2 minutes and check whether the alarm is cleared.</span><p><ul id="ALM-25007__ul4606143673918"><li id="ALM-25007__li160693618391">If yes, no further action is required.</li><li id="ALM-25007__li106061436163913">If no, go to <a href="#ALM-25007__li1860517366397">4</a>.</li></ul>
</p></li></ol>
<p id="ALM-25007__p197271833958"><strong id="ALM-25007__b13441112181">Check whether the alarm threshold or alarm trigger count is properly configured.</strong></p>
<ol start="4" id="ALM-25007__ol7606036183911"><li id="ALM-25007__li1860517366397"><a name="ALM-25007__li1860517366397"></a><a name="li1860517366397"></a><span>On FusionInsight Manager, choose <strong id="ALM-25007__b97002030184">O&amp;M</strong> &gt; <strong id="ALM-25007__b37017311816">Alarm</strong> &gt; <strong id="ALM-25007__b1370216312180">Thresholds</strong>, click the name of the desired cluster, choose <strong id="ALM-25007__b177031439188">LdapServer</strong> &gt; <strong id="ALM-25007__b186991181409">Other</strong> &gt; <strong id="ALM-25007__b1270520331810">SlapdServer Service Connections</strong>, and check whether the alarm trigger count and alarm threshold are set properly.</span><p><ul id="ALM-25007__ul15605736123916"><li id="ALM-25007__li10605836173911">If yes, go to <a href="#ALM-25007__li2086435114014">7</a>.</li><li id="ALM-25007__li12605236173920">If no, go to <a href="#ALM-25007__li20605336193916">5</a>.</li></ul>
</p></li><li id="ALM-25007__li20605336193916"><a name="ALM-25007__li20605336193916"></a><a name="li20605336193916"></a><span>Change the trigger count and alarm threshold based on the actual number of process connections, and apply the changes.</span></li><li id="ALM-25007__li760611368392"><span>Wait 2 minutes and check whether the alarm is automatically cleared.</span><p><ul id="ALM-25007__ul76061436193914"><li id="ALM-25007__li8605536113913">If yes, no further action is required.</li><li id="ALM-25007__li1060613362395">If no, go to <a href="#ALM-25007__li2086435114014">7</a>.</li></ul>
</p></li></ol>
<p class="tableheading" id="ALM-25007__p22707215144835"><strong id="ALM-25007__b295134514200">Collect fault information.</strong></p>
<ol start="7" id="ALM-25007__ol78649514010"><li id="ALM-25007__li2086435114014"><a name="ALM-25007__li2086435114014"></a><a name="li2086435114014"></a><span>On FusionInsight Manager, choose <strong id="ALM-25007__b1874116469201">O&amp;M</strong>. In the navigation pane on the left, choose <strong id="ALM-25007__b157416465209">Log</strong> &gt; <strong id="ALM-25007__b974144602013">Download</strong>.</span></li><li id="ALM-25007__li188641959409"><span>Expand the <strong id="ALM-25007__b8504952013">Service</strong> drop-down list, and select <strong id="ALM-25007__b176174902013">LdapServer</strong> for the target cluster.</span></li><li id="ALM-25007__li8864165104018"><span>Specify <strong id="ALM-25007__b81165715209">Hosts</strong> for collecting logs, which is optional. By default, all hosts are selected.</span></li><li id="ALM-25007__li286419594011"><span>Click <span><img id="ALM-25007__image68642513407" src="en-us_image_0000001971818984.png"></span> in the upper right corner, and set <strong id="ALM-25007__b55031022112216">Start Date</strong> and <strong id="ALM-25007__b135056229226">End Date</strong> for log collection to 10 minutes ahead of and after the alarm generation time, respectively. Then, click <strong id="ALM-25007__b15508322182218">Download</strong>.</span></li><li id="ALM-25007__li15864954406"><span>Contact <span id="ALM-25007__text3163192382317">O&amp;M personnel</span> and provide the collected logs.</span></li></ol>
</div>
<div class="section" id="ALM-25007__section169311343318"><h4 class="sectiontitle"><span id="ALM-25007__text367020138593">Alarm Clearance</span></h4><p id="ALM-25007__p754913417333">This alarm is automatically cleared after the fault is rectified.</p>
</div>
<div class="section" id="ALM-25007__section53362350"><h4 class="sectiontitle"><span id="ALM-25007__text1246242445916">Related Information</span></h4><p id="ALM-25007__p7522741"><span id="ALM-25007__text1881919412591">None.</span></p>
</div>
</div>
<div>
<div class="familylinks">
<div class="parentlink"><strong>Parent topic:</strong> <a href="mrs_01_1298.html">Alarm Reference (Applicable to MRS 3.x)</a></div>
</div>
</div>


@ -0,0 +1,90 @@
<a name="ALM-25008"></a><a name="ALM-25008"></a>
<h1 class="topictitle1">ALM-25008 SlapdServer CPU Usage Exceeds the Threshold</h1>
<div id="body0000001971816512"><div class="section" id="ALM-25008__section6427584"><h4 class="sectiontitle"><span id="ALM-25008__text8925301575">Alarm Description</span></h4><p id="ALM-25008__p649614519412">The system checks the CPU usage of the SlapdServer node every 30 seconds and compares the actual usage with the threshold. This alarm is generated when the SlapdServer CPU usage exceeds the threshold for multiple times (<strong id="ALM-25008__b96508419288">5</strong> by default).</p>
<p id="ALM-25008__p7890154011523">Its <strong id="ALM-25008__b1697251618268">Trigger Count</strong> is configurable. If <strong id="ALM-25008__b5972171619264">Trigger Count</strong> is set to <strong id="ALM-25008__b169726167268">1</strong>, this alarm is cleared when the SlapdServer CPU usage is less than or equal to the threshold. If <strong id="ALM-25008__b1853416135271">Trigger Count</strong> is greater than <strong id="ALM-25008__b353411312718">1</strong>, this alarm is cleared when the SlapdServer CPU usage is less than or equal to 90% of the threshold.</p>
</div>
<div class="section" id="ALM-25008__section57848263"><h4 class="sectiontitle"><span id="ALM-25008__text38748475555">Alarm Attributes</span></h4>
<div class="tablenoborder"><table cellpadding="4" cellspacing="0" summary="" id="ALM-25008__table53988588" frame="border" border="1" rules="all"><thead align="left"><tr id="ALM-25008__row25963404"><th align="left" class="cellrowborder" valign="top" width="33.33333333333333%" id="mcps1.3.2.2.1.4.1.1"><p id="ALM-25008__p57710042"><span id="ALM-25008__text17980150175619">Alarm ID</span></p>
</th>
<th align="left" class="cellrowborder" valign="top" width="33.33333333333333%" id="mcps1.3.2.2.1.4.1.2"><p id="ALM-25008__p44001849"><span id="ALM-25008__text199471335614">Alarm Severity</span></p>
</th>
<th align="left" class="cellrowborder" valign="top" width="33.33333333333333%" id="mcps1.3.2.2.1.4.1.3"><p id="ALM-25008__p7380012"><span id="ALM-25008__text152400388563">Auto Cleared</span></p>
</th>
</tr>
</thead>
<tbody><tr id="ALM-25008__row14760880"><td class="cellrowborder" valign="top" width="33.33333333333333%" headers="mcps1.3.2.2.1.4.1.1 "><p id="ALM-25008__p14887165972519">25008</p>
</td>
<td class="cellrowborder" valign="top" width="33.33333333333333%" headers="mcps1.3.2.2.1.4.1.2 "><p id="ALM-25008__p660834585110">Critical (default threshold: 85%)</p>
<p id="ALM-25008__p51431020">Major (default threshold: 75%)</p>
</td>
<td class="cellrowborder" valign="top" width="33.33333333333333%" headers="mcps1.3.2.2.1.4.1.3 "><p id="ALM-25008__p14881165912515">Yes</p>
</td>
</tr>
</tbody>
</table>
</div>
</div>
<div class="section" id="ALM-25008__section50872323"><h4 class="sectiontitle"><span id="ALM-25008__text155061195577">Alarm Parameters</span></h4>
<div class="tablenoborder"><table cellpadding="4" cellspacing="0" summary="" id="ALM-25008__table22167579" frame="border" border="1" rules="all"><thead align="left"><tr id="ALM-25008__row15017071"><th align="left" class="cellrowborder" valign="top" width="50%" id="mcps1.3.3.2.1.3.1.1"><p id="ALM-25008__p21975462"><span id="ALM-25008__text776142495720">Parameter</span></p>
</th>
<th align="left" class="cellrowborder" valign="top" width="50%" id="mcps1.3.3.2.1.3.1.2"><p id="ALM-25008__p35182007"><span id="ALM-25008__text632018391572">Description</span></p>
</th>
</tr>
</thead>
<tbody><tr id="ALM-25008__row1756114464143"><td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.3.2.1.3.1.1 "><p id="ALM-25008__p3820532611">Source</p>
</td>
<td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.3.2.1.3.1.2 "><p id="ALM-25008__p6810518268">Specifies the cluster or system for which the alarm is generated.</p>
</td>
</tr>
<tr id="ALM-25008__row34521134"><td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.3.2.1.3.1.1 "><p id="ALM-25008__p171352261">ServiceName</p>
</td>
<td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.3.2.1.3.1.2 "><p id="ALM-25008__p1461254261">Specifies the service for which the alarm is generated.</p>
</td>
</tr>
<tr id="ALM-25008__row6737354"><td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.3.2.1.3.1.1 "><p id="ALM-25008__p15512518268">RoleName</p>
</td>
<td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.3.2.1.3.1.2 "><p id="ALM-25008__p7112518267">Specifies the role for which the alarm is generated.</p>
</td>
</tr>
<tr id="ALM-25008__row1028801444414"><td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.3.2.1.3.1.1 "><p id="ALM-25008__p9288161412441">HostName</p>
</td>
<td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.3.2.1.3.1.2 "><p id="ALM-25008__p13288214194417">Specifies the host for which the alarm is generated.</p>
</td>
</tr>
<tr id="ALM-25008__row19401162054415"><td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.3.2.1.3.1.1 "><p id="ALM-25008__p15402102044415">Trigger Condition</p>
</td>
<td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.3.2.1.3.1.2 "><p id="ALM-25008__p9402122064414">Specifies the threshold for triggering the alarm.</p>
</td>
</tr>
</tbody>
</table>
</div>
</div>
<div class="section" id="ALM-25008__section55197725"><h4 class="sectiontitle"><span id="ALM-25008__text2266192715582">Impact on the System</span></h4><p id="ALM-25008__p47392170">Processes respond slowly or do not work.</p>
</div>
<div class="section" id="ALM-25008__section27017478"><h4 class="sectiontitle"><span id="ALM-25008__text12656240135813">Possible Causes</span></h4><ul id="ALM-25008__ul460131185210"><li id="ALM-25008__li1373752155210">The alarm threshold or alarm trigger count is improperly configured.</li><li id="ALM-25008__li1760201165215">The CPU configuration cannot meet service requirements, and the CPU usage reaches the upper limit.</li></ul>
</div>
<div class="section" id="ALM-25008__section535785120256"><h4 class="sectiontitle"><span id="ALM-25008__text19569135285811">Handling Procedure</span></h4><p id="ALM-25008__p18319915115316"><strong id="ALM-25008__b1692240193419">Check whether the alarm threshold or alarm trigger count is properly configured.</strong></p>
<ol id="ALM-25008__ol12485153614462"><li id="ALM-25008__li124853366461"><span>Log in to FusionInsight Manager, choose <strong id="ALM-25008__b10959185319342">O&amp;M</strong> &gt; <strong id="ALM-25008__b8960125312349">Alarm</strong> &gt; <strong id="ALM-25008__b896118537342">Thresholds</strong>, click the name of the desired cluster, choose <strong id="ALM-25008__b3962253113412">LdapServer</strong> &gt; <strong id="ALM-25008__b142712140111">Other</strong> &gt; <strong id="ALM-25008__b8963253153419">SlapdServer Service Total CPU Percentage</strong>, and check whether the alarm trigger count and alarm threshold are set properly.</span><p><ul id="ALM-25008__ul17485136134617"><li id="ALM-25008__li1748553654613">If yes, go to <a href="#ALM-25008__li848412361466">4</a>.</li><li id="ALM-25008__li19485153694610">If no, go to <a href="#ALM-25008__li174859361464">2</a>.</li></ul>
</p></li><li id="ALM-25008__li174859361464"><a name="ALM-25008__li174859361464"></a><a name="li174859361464"></a><span>Change the trigger count and alarm threshold based on the actual CPU usage, and apply the changes.</span></li><li id="ALM-25008__li1148563612460"><span>Wait 2 minutes and check whether the alarm is automatically cleared.</span><p><ul id="ALM-25008__ul9485143618462"><li id="ALM-25008__li748513618463">If yes, no further action is required.</li><li id="ALM-25008__li548563615468">If no, go to <a href="#ALM-25008__li848412361466">4</a>.</li></ul>
</p></li></ol>
<p id="ALM-25008__p832011512539"><strong id="ALM-25008__b45360489359">Check whether the CPU usage reaches the upper limit.</strong></p>
<ol start="4" id="ALM-25008__ol5485203614469"><li id="ALM-25008__li848412361466"><a name="ALM-25008__li848412361466"></a><a name="li848412361466"></a><span>On FusionInsight Manager, choose <strong id="ALM-25008__b971913559358">O&amp;M</strong> &gt; <strong id="ALM-25008__b972118552354">Alarm</strong> &gt; <strong id="ALM-25008__b1272255523516">Alarms</strong>. In the right pane, click this alarm and obtain the host name in <strong id="ALM-25008__b972310555352">Location</strong>.</span></li><li id="ALM-25008__li1248417366465"><a name="ALM-25008__li1248417366465"></a><a name="li1248417366465"></a><span>Choose <strong id="ALM-25008__b3925118345">Cluster</strong> &gt; <strong id="ALM-25008__b199251915349">Services</strong> &gt; <strong id="ALM-25008__b69256113343">LdapServer</strong>, click the <strong id="ALM-25008__b8916032173415">Instance</strong> tab, and click the SlapdServer instance corresponding to the host name in <a href="#ALM-25008__li848412361466">4</a>.</span></li><li id="ALM-25008__li133258517208"><a name="ALM-25008__li133258517208"></a><a name="li133258517208"></a><span>On the dashboard of the instance, observe the real-time data of the <strong id="ALM-25008__b159977196486">CPU Usage of a Single SlapdServer Instance</strong> chart for about 5 minutes and check whether the CPU usage exceeds the threshold (<strong id="ALM-25008__b128032916541">75%</strong> by default) multiple times.</span><p><ul id="ALM-25008__ul1846911369207"><li id="ALM-25008__li1246923622018">If yes, go to <a href="#ALM-25008__li14826210161714">7</a>.</li><li id="ALM-25008__li0145124915202">If no, go to <a href="#ALM-25008__li89991152124618">9</a>.</li></ul>
</p></li><li id="ALM-25008__li14826210161714"><a name="ALM-25008__li14826210161714"></a><a name="li14826210161714"></a><span>Check whether the status of other SlapdServer instances is normal. For details, see <a href="#ALM-25008__li1248417366465">5</a> to <a href="#ALM-25008__li133258517208">6</a>.</span><p><ul id="ALM-25008__ul53828202177"><li id="ALM-25008__li1298672511175">If yes, contact the MRS cluster administrator to evaluate whether to expand the capacity of SlapdServer instances. Then, go to <a href="#ALM-25008__li12485203614616">8</a>.</li><li id="ALM-25008__li4382920191715">If no, repair the faulty SlapdServer instance and go to <a href="#ALM-25008__li12485203614616">8</a>.</li></ul>
</p></li><li id="ALM-25008__li12485203614616"><a name="ALM-25008__li12485203614616"></a><a name="li12485203614616"></a><span>Check whether the alarm is cleared.</span><p><ul id="ALM-25008__ul16484153654614"><li id="ALM-25008__li15484163634617">If yes, no further action is required.</li><li id="ALM-25008__li184842368460">If no, go to <a href="#ALM-25008__li89991152124618">9</a>.</li></ul>
</p></li></ol>
<p class="tableheading" id="ALM-25008__p22707215144835"><strong id="ALM-25008__b17539175141012">Collect fault information.</strong></p>
<ol start="9" id="ALM-25008__ol14015319462"><li id="ALM-25008__li89991152124618"><a name="ALM-25008__li89991152124618"></a><a name="li89991152124618"></a><span>On FusionInsight Manager, choose <strong id="ALM-25008__b18375135361015">O&amp;M</strong>. In the navigation pane on the left, choose <strong id="ALM-25008__b1637515331015">Log</strong> &gt; <strong id="ALM-25008__b3375553171015">Download</strong>.</span></li><li id="ALM-25008__li15999175218461"><span>Expand the <strong id="ALM-25008__b2060812549107">Service</strong> drop-down list, and select <strong id="ALM-25008__b176086541100">LdapServer</strong> for the target cluster.</span></li><li id="ALM-25008__li1799955234619"><span>Click <span><img id="ALM-25008__image1299965219461" src="en-us_image_0000002008258989.png"></span> in the upper right corner, and set <strong id="ALM-25008__b9290115818109">Start Date</strong> and <strong id="ALM-25008__b02911158101019">End Date</strong> for log collection to 10 minutes ahead of and after the alarm generation time, respectively. Then, click <strong id="ALM-25008__b4291185881017">Download</strong>.</span></li><li id="ALM-25008__li1602535462"><span>Contact <span id="ALM-25008__text176166613113">O&amp;M personnel</span> and provide the collected logs.</span></li></ol>
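<p>The log collection window described above is simple date arithmetic. The following sketch shows one way to compute the <strong>Start Date</strong> and <strong>End Date</strong> values before entering them on the download page. The function name <strong>log_window</strong> and the sample alarm time are made up for illustration and are not part of FusionInsight Manager.</p>

```python
from datetime import datetime, timedelta


def log_window(alarm_time: datetime, margin_minutes: int = 10) -> tuple[datetime, datetime]:
    """Return (start, end) spanning margin_minutes before and after alarm_time."""
    margin = timedelta(minutes=margin_minutes)
    return alarm_time - margin, alarm_time + margin


# Example alarm generation time, chosen arbitrarily for illustration.
start, end = log_window(datetime(2024, 8, 2, 14, 30))
print(start.isoformat(sep=" "), "to", end.isoformat(sep=" "))
```

<p>For procedures that ask for a wider window, pass a different margin, for example <strong>margin_minutes=60</strong> for one hour on each side.</p>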
</div>
<div class="section" id="ALM-25008__section169311343318"><h4 class="sectiontitle"><span id="ALM-25008__text367020138593">Alarm Clearance</span></h4><p id="ALM-25008__p754913417333">This alarm is automatically cleared after the fault is rectified.</p>
</div>
<div class="section" id="ALM-25008__section53362350"><h4 class="sectiontitle"><span id="ALM-25008__text1246242445916">Related Information</span></h4><p id="ALM-25008__p7522741"><span id="ALM-25008__text1881919412591">None.</span></p>
</div>
</div>
<div>
<div class="familylinks">
<div class="parentlink"><strong>Parent topic:</strong> <a href="mrs_01_1298.html">Alarm Reference (Applicable to MRS 3.x)</a></div>
</div>
</div>

@ -0,0 +1,91 @@
<a name="ALM-29007"></a><a name="ALM-29007"></a>
<h1 class="topictitle1">ALM-29007 Impalad Process Memory Usage Exceeds the Threshold</h1>
<div id="body0000001282768122"><div class="section" id="ALM-29007__section8280367"><h4 class="sectiontitle">Alarm Description</h4><p id="ALM-29007__p1658442017419">The system checks the memory usage of the Impalad process every 30 seconds. This alarm is generated when the system detects that the memory usage exceeds the threshold (80% by default).</p>
<p id="ALM-29007__p2335053105020">This alarm is automatically cleared when the system detects that the memory usage of the process falls below the threshold.</p>
</div>
<div class="section" id="ALM-29007__section7414445"><h4 class="sectiontitle">Alarm Attributes</h4>
<div class="tablenoborder"><table cellpadding="4" cellspacing="0" summary="" id="ALM-29007__table45079949" frame="border" border="1" rules="all"><thead align="left"><tr id="ALM-29007__row5683496"><th align="left" class="cellrowborder" valign="top" width="33.33333333333333%" id="mcps1.3.2.2.1.4.1.1"><p id="ALM-29007__p57710042">Alarm ID</p>
</th>
<th align="left" class="cellrowborder" valign="top" width="33.33333333333333%" id="mcps1.3.2.2.1.4.1.2"><p id="ALM-29007__p44001849">Alarm Severity</p>
</th>
<th align="left" class="cellrowborder" valign="top" width="33.33333333333333%" id="mcps1.3.2.2.1.4.1.3"><p id="ALM-29007__p7380012">Auto Cleared</p>
</th>
</tr>
</thead>
<tbody><tr id="ALM-29007__row60910108"><td class="cellrowborder" valign="top" width="33.33333333333333%" headers="mcps1.3.2.2.1.4.1.1 "><p id="ALM-29007__p16488194717492">29007</p>
</td>
<td class="cellrowborder" valign="top" width="33.33333333333333%" headers="mcps1.3.2.2.1.4.1.2 "><p id="ALM-29007__p588994817496">Minor</p>
</td>
<td class="cellrowborder" valign="top" width="33.33333333333333%" headers="mcps1.3.2.2.1.4.1.3 "><p id="ALM-29007__p34071398">Yes</p>
</td>
</tr>
</tbody>
</table>
</div>
</div>
<div class="section" id="ALM-29007__section66730009"><h4 class="sectiontitle">Alarm Parameters</h4>
<div class="tablenoborder"><table cellpadding="4" cellspacing="0" summary="" id="ALM-29007__table8319831" frame="border" border="1" rules="all"><thead align="left"><tr id="ALM-29007__row40868022"><th align="left" class="cellrowborder" valign="top" width="20%" id="mcps1.3.3.2.1.4.1.1"><p id="ALM-29007__p142444173412">Type</p>
</th>
<th align="left" class="cellrowborder" valign="top" width="20%" id="mcps1.3.3.2.1.4.1.2"><p id="ALM-29007__p21975462">Parameter</p>
</th>
<th align="left" class="cellrowborder" valign="top" width="60%" id="mcps1.3.3.2.1.4.1.3"><p id="ALM-29007__p35182007">Description</p>
</th>
</tr>
</thead>
<tbody><tr id="ALM-29007__row594512751512"><td class="cellrowborder" rowspan="4" valign="top" width="20%" headers="mcps1.3.3.2.1.4.1.1 "><p id="ALM-29007__p193386121348">Location Information</p>
</td>
<td class="cellrowborder" valign="top" width="20%" headers="mcps1.3.3.2.1.4.1.2 "><p id="ALM-29007__p8838358184914">Source</p>
</td>
<td class="cellrowborder" valign="top" width="60%" headers="mcps1.3.3.2.1.4.1.3 "><p id="ALM-29007__p837170125015">Specifies the cluster for which the alarm was generated.</p>
</td>
</tr>
<tr id="ALM-29007__row31170320"><td class="cellrowborder" valign="top" headers="mcps1.3.3.2.1.4.1.1 "><p id="ALM-29007__p39123317">ServiceName</p>
</td>
<td class="cellrowborder" valign="top" headers="mcps1.3.3.2.1.4.1.2 "><p id="ALM-29007__p172628810500">Specifies the service for which the alarm was generated.</p>
</td>
</tr>
<tr id="ALM-29007__row8175713133714"><td class="cellrowborder" valign="top" headers="mcps1.3.3.2.1.4.1.1 "><p id="ALM-29007__p37226997">RoleName</p>
</td>
<td class="cellrowborder" valign="top" headers="mcps1.3.3.2.1.4.1.2 "><p id="ALM-29007__p18175513173712">Specifies the role for which the alarm was generated.</p>
</td>
</tr>
<tr id="ALM-29007__row144886177375"><td class="cellrowborder" valign="top" headers="mcps1.3.3.2.1.4.1.1 "><p id="ALM-29007__p24881417123717">HostName</p>
</td>
<td class="cellrowborder" valign="top" headers="mcps1.3.3.2.1.4.1.2 "><p id="ALM-29007__p13488417133716">Specifies the host for which the alarm was generated.</p>
</td>
</tr>
<tr id="ALM-29007__row1688158103712"><td class="cellrowborder" valign="top" width="20%" headers="mcps1.3.3.2.1.4.1.1 "><p id="ALM-29007__p152418412348">Additional Information</p>
</td>
<td class="cellrowborder" valign="top" width="20%" headers="mcps1.3.3.2.1.4.1.2 "><p id="ALM-29007__p688484371">Trigger Condition</p>
</td>
<td class="cellrowborder" valign="top" width="60%" headers="mcps1.3.3.2.1.4.1.3 "><p id="ALM-29007__p19884817379">Specifies the threshold for triggering the alarm.</p>
</td>
</tr>
</tbody>
</table>
</div>
</div>
<div class="section" id="ALM-29007__section63699172"><h4 class="sectiontitle">Impact on the System</h4><p id="ALM-29007__p485055019508">The memory usage is too high. Some query tasks may fail due to insufficient memory.</p>
</div>
<div class="section" id="ALM-29007__section36421639"><h4 class="sectiontitle">Possible Causes</h4><p id="ALM-29007__p9402509505">The Impalad process is executing a large number of query tasks.</p>
</div>
<div class="section" id="ALM-29007__section61311810131118"><h4 class="sectiontitle">Handling Procedure</h4><ol id="ALM-29007__ol1598183211416"><li id="ALM-29007__li39816321245"><span>On FusionInsight Manager, choose <strong id="ALM-29007__b1159183925518">O&amp;M</strong> &gt; <strong id="ALM-29007__b5160939195510">Alarm</strong> &gt; <strong id="ALM-29007__b116013916555">Thresholds</strong> &gt; <strong id="ALM-29007__b616015395555">Impala</strong> &gt; <strong id="ALM-29007__b6160939195519">CPU and Memory</strong> &gt; <strong id="ALM-29007__b1160153918559">Impalad Process Memory Usage (Impalad)</strong> and check the threshold.</span></li><li id="ALM-29007__li6595161750"><span>If the alarm threshold is smaller than 80%, increase the alarm threshold as required and check whether the alarm is cleared.</span><p><ul id="ALM-29007__ul941175682912"><li id="ALM-29007__li241456102913">If yes, no further action is required.</li><li id="ALM-29007__li1032055153019">If no, go to <a href="#ALM-29007__li54643151153">3</a>.</li></ul>
</p></li><li id="ALM-29007__li54643151153"><a name="ALM-29007__li54643151153"></a><a name="li54643151153"></a><span>If the threshold is 80% or higher, check whether a large number of concurrent query tasks were running when the alarm was generated. A large number of concurrent query tasks causes the memory usage to increase sharply. After the tasks are complete, check whether the alarm is automatically cleared. During this period, some tasks may fail or be canceled due to insufficient memory. In this case, retry them.</span><p><div class="note" id="ALM-29007__note35700516451"><img src="public_sys-resources/note_3.0-en-us.png"><span class="notetitle"> </span><div class="notebody"><p id="ALM-29007__p257015514518">If the memory usage consistently exceeds the threshold, expand the cluster capacity.</p>
</div></div>
<ul id="ALM-29007__ul1769835811449"><li id="ALM-29007__li10698175824412">If yes, no further action is required.</li><li id="ALM-29007__li14698258154413">If no, go to <a href="#ALM-29007__li1698242954313">4</a>.</li></ul>
</p></li></ol>
<p class="tableheading" id="ALM-29007__p39821129144316"><strong id="ALM-29007__b17406162195715">Collect fault information.</strong></p>
<ol start="4" id="ALM-29007__ol189821329134317"><li id="ALM-29007__li1698242954313"><a name="ALM-29007__li1698242954313"></a><a name="li1698242954313"></a><span>On FusionInsight Manager of the active or standby cluster, choose <strong id="ALM-29007__b694017416572">O&amp;M</strong>. In the navigation pane on the left, choose <strong id="ALM-29007__b59401846573">Log</strong> &gt; <strong id="ALM-29007__b109404445710">Download</strong>.</span></li><li id="ALM-29007__li27049781154249"><span>Expand the <strong id="ALM-29007__b149911610572">Service</strong> drop-down list, and select <strong id="ALM-29007__b74991264575">Impala</strong> for the target cluster.</span></li><li id="ALM-29007__li1498212919436"><span>Click the edit icon in the upper right corner, and set <strong id="ALM-29007__b935629185712">Start Date</strong> and <strong id="ALM-29007__b53568916572">End Date</strong> for log collection to 10 minutes ahead of and after the alarm generation time, respectively. Then, click <strong id="ALM-29007__b33564918571">Download</strong>.</span></li><li id="ALM-29007__li56393916154249"><span>Contact <span id="ALM-29007__text16720101425714">O&amp;M personnel</span> and provide the collected logs.</span></li></ol>
</div>
<div class="section" id="ALM-29007__section169311343318"><h4 class="sectiontitle">Alarm Clearance</h4><p id="ALM-29007__p55781648135011">This alarm is automatically cleared after the burst of concurrent query tasks is complete.</p>
</div>
<div class="section" id="ALM-29007__section53362350"><h4 class="sectiontitle">Related Information</h4><p id="ALM-29007__p7522741">None</p>
</div>
</div>
<div>
<div class="familylinks">
<div class="parentlink"><strong>Parent topic:</strong> <a href="mrs_01_1298.html">Alarm Reference (Applicable to MRS 3.x)</a></div>
</div>
</div>

@ -0,0 +1,91 @@
<a name="ALM-29008"></a><a name="ALM-29008"></a>
<h1 class="topictitle1">ALM-29008 Number of ODBC Connections to Impalad Exceeds the Threshold</h1>
<div id="body0000001282448558"><div class="section" id="ALM-29008__section8280367"><h4 class="sectiontitle">Alarm Description</h4><p id="ALM-29008__p1268811311458">The system checks the number of client connections to the Impalad node every 30 seconds. This alarm is generated when the number of client connections exceeds the customized threshold (60 by default).</p>
<p id="ALM-29008__p2335053105020">This alarm is automatically cleared when the number of client connections is less than the threshold.</p>
</div>
<div class="section" id="ALM-29008__section7414445"><h4 class="sectiontitle">Alarm Attributes</h4>
<div class="tablenoborder"><table cellpadding="4" cellspacing="0" summary="" id="ALM-29008__table45079949" frame="border" border="1" rules="all"><thead align="left"><tr id="ALM-29008__row5683496"><th align="left" class="cellrowborder" valign="top" width="33.3033303330333%" id="mcps1.3.2.2.1.4.1.1"><p id="ALM-29008__p57710042">Alarm ID</p>
</th>
<th align="left" class="cellrowborder" valign="top" width="33.36333633363336%" id="mcps1.3.2.2.1.4.1.2"><p id="ALM-29008__p44001849">Alarm Severity</p>
</th>
<th align="left" class="cellrowborder" valign="top" width="33.33333333333333%" id="mcps1.3.2.2.1.4.1.3"><p id="ALM-29008__p7380012">Auto Cleared</p>
</th>
</tr>
</thead>
<tbody><tr id="ALM-29008__row60910108"><td class="cellrowborder" valign="top" width="33.3033303330333%" headers="mcps1.3.2.2.1.4.1.1 "><p id="ALM-29008__p16488194717492">29008</p>
</td>
<td class="cellrowborder" valign="top" width="33.36333633363336%" headers="mcps1.3.2.2.1.4.1.2 "><p id="ALM-29008__p588994817496">Major</p>
</td>
<td class="cellrowborder" valign="top" width="33.33333333333333%" headers="mcps1.3.2.2.1.4.1.3 "><p id="ALM-29008__p34071398">Yes</p>
</td>
</tr>
</tbody>
</table>
</div>
</div>
<div class="section" id="ALM-29008__section66730009"><h4 class="sectiontitle">Alarm Parameters</h4>
<div class="tablenoborder"><table cellpadding="4" cellspacing="0" summary="" id="ALM-29008__table8319831" frame="border" border="1" rules="all"><thead align="left"><tr id="ALM-29008__row40868022"><th align="left" class="cellrowborder" valign="top" width="20%" id="mcps1.3.3.2.1.4.1.1"><p id="ALM-29008__p981243023416">Type</p>
</th>
<th align="left" class="cellrowborder" valign="top" width="20%" id="mcps1.3.3.2.1.4.1.2"><p id="ALM-29008__p21975462">Parameter</p>
</th>
<th align="left" class="cellrowborder" valign="top" width="60%" id="mcps1.3.3.2.1.4.1.3"><p id="ALM-29008__p35182007">Description</p>
</th>
</tr>
</thead>
<tbody><tr id="ALM-29008__row594512751512"><td class="cellrowborder" rowspan="4" valign="top" width="20%" headers="mcps1.3.3.2.1.4.1.1 "><p id="ALM-29008__p660023953412">Location Information</p>
</td>
<td class="cellrowborder" valign="top" width="20%" headers="mcps1.3.3.2.1.4.1.2 "><p id="ALM-29008__p8838358184914">Source</p>
</td>
<td class="cellrowborder" valign="top" width="60%" headers="mcps1.3.3.2.1.4.1.3 "><p id="ALM-29008__p837170125015">Specifies the cluster for which the alarm was generated.</p>
</td>
</tr>
<tr id="ALM-29008__row31170320"><td class="cellrowborder" valign="top" headers="mcps1.3.3.2.1.4.1.1 "><p id="ALM-29008__p39123317">ServiceName</p>
</td>
<td class="cellrowborder" valign="top" headers="mcps1.3.3.2.1.4.1.2 "><p id="ALM-29008__p172628810500">Specifies the service for which the alarm was generated.</p>
</td>
</tr>
<tr id="ALM-29008__row883552454311"><td class="cellrowborder" valign="top" headers="mcps1.3.3.2.1.4.1.1 "><p id="ALM-29008__p1583622417439">RoleName</p>
</td>
<td class="cellrowborder" valign="top" headers="mcps1.3.3.2.1.4.1.2 "><p id="ALM-29008__p783622419433">Specifies the role for which the alarm was generated.</p>
</td>
</tr>
<tr id="ALM-29008__row164912316433"><td class="cellrowborder" valign="top" headers="mcps1.3.3.2.1.4.1.1 "><p id="ALM-29008__p154923174310">HostName</p>
</td>
<td class="cellrowborder" valign="top" headers="mcps1.3.3.2.1.4.1.2 "><p id="ALM-29008__p1349163115432">Specifies the host for which the alarm was generated.</p>
</td>
</tr>
<tr id="ALM-29008__row1305193818439"><td class="cellrowborder" valign="top" width="20%" headers="mcps1.3.3.2.1.4.1.1 "><p id="ALM-29008__p15812530183414">Additional Information</p>
</td>
<td class="cellrowborder" valign="top" width="20%" headers="mcps1.3.3.2.1.4.1.2 "><p id="ALM-29008__p1130583818431">Trigger Condition</p>
</td>
<td class="cellrowborder" valign="top" width="60%" headers="mcps1.3.3.2.1.4.1.3 "><p id="ALM-29008__p33051338164316">Specifies the threshold for triggering the alarm.</p>
</td>
</tr>
</tbody>
</table>
</div>
</div>
<div class="section" id="ALM-29008__section63699172"><h4 class="sectiontitle">Impact on the System</h4><p id="ALM-29008__p485055019508">New client connections may be blocked or even fail.</p>
</div>
<div class="section" id="ALM-29008__section36421639"><h4 class="sectiontitle">Possible Causes</h4><p id="ALM-29008__p9402509505">The number of client connections maintained by the Impalad service is too large or the threshold is too small.</p>
</div>
<div class="section" id="ALM-29008__section20662238181315"><h4 class="sectiontitle">Handling Procedure</h4><ol id="ALM-29008__ol398575283918"><li id="ALM-29008__li1898555216394"><span>On FusionInsight Manager, choose <strong id="ALM-29008__b121256143357">O&amp;M</strong> &gt; <strong id="ALM-29008__b16125121473512">Alarm</strong> &gt; <strong id="ALM-29008__b16125141412354">Thresholds</strong> &gt; <strong id="ALM-29008__b6125101443510">Impala</strong> &gt; <strong id="ALM-29008__b412515147355">Connections</strong> &gt; <strong id="ALM-29008__b12125101410353">Number of ODBC Connections to Impalad Process (Impalad)</strong> and check the threshold.</span></li><li id="ALM-29008__li1232161715409"><span>Check the number of ODBC applications connected to Impalad and stop idle applications. Check whether the alarm is automatically cleared.</span><p><ul id="ALM-29008__ul1437394518402"><li id="ALM-29008__li9373184519406">If yes, no further action is required.</li><li id="ALM-29008__li5327195094015">If no, go to <a href="#ALM-29008__li1507754134111">3</a> to change the number of concurrent connections supported by Impalad.</li></ul>
</p></li><li id="ALM-29008__li1507754134111"><a name="ALM-29008__li1507754134111"></a><a name="li1507754134111"></a><span>On FusionInsight Manager, choose <strong id="ALM-29008__b1618423443519">Cluster</strong> &gt; <strong id="ALM-29008__b17184193412354">Impala</strong> &gt; <strong id="ALM-29008__b21848340351">Configurations</strong> &gt; <strong id="ALM-29008__b1718415344354">All Configurations</strong> &gt; <strong id="ALM-29008__b1918516346351">Impalad</strong> &gt; <strong id="ALM-29008__b7185173413359">Customization</strong>. Add the custom parameter <strong id="ALM-29008__b3185334173516">--fe_service_threads</strong>. The default value of this parameter is <strong id="ALM-29008__b11185934143518">64</strong>. Change the value as required and click <strong id="ALM-29008__b918563414350">Save</strong>.</span></li><li id="ALM-29008__li128051144134613"><span>After the query tasks on all clients are complete, click the <strong id="ALM-29008__b154063993614">Instances</strong> tab. Select all Impalad instances, and restart them.</span><p><div class="note" id="ALM-29008__note220918329390"><img src="public_sys-resources/note_3.0-en-us.png"><span class="notetitle"> </span><div class="notebody"><p id="ALM-29008__en-us_topic_0294848303_p0481144124012">The service will become unavailable when all instances are restarted. If a single instance is restarted, the tasks that are being executed on that instance will fail, but the service will remain available.</p>
</div></div>
</p></li><li id="ALM-29008__li313119456566"><span>After the restart is complete, check whether the alarm is cleared.</span><p><ul id="ALM-29008__ul554711711577"><li id="ALM-29008__li105478719578">If yes, no further action is required.</li><li id="ALM-29008__li165471275576">If no, go to <a href="#ALM-29008__li17918612154249">6</a>.</li></ul>
</p></li></ol>
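<p>When deciding on a new value for <strong>--fe_service_threads</strong>, it can help to compare the observed connection count with both the alarm threshold and the configured thread limit. The sketch below is a hypothetical helper, not an MRS or Impala API: the function name <strong>connection_headroom</strong> and its return keys are invented for illustration, and it assumes that each client connection occupies one front-end service thread. The defaults (60 and 64) are the values stated in this topic.</p>

```python
def connection_headroom(connections: int, alarm_threshold: int = 60,
                        fe_service_threads: int = 64) -> dict:
    """Summarize how close an Impalad instance is to its connection limits."""
    return {
        # The alarm is generated when the count exceeds the threshold.
        "alarm": connections > alarm_threshold,
        # Remaining threads before new connections may be blocked (assumption).
        "free_threads": max(fe_service_threads - connections, 0),
    }
```

<p>For example, 62 connections with the default settings would trigger the alarm while leaving only 2 free threads, which suggests either stopping idle applications or raising the limit.</p>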
<p class="tableheading" id="ALM-29008__p3847019615437"><strong id="ALM-29008__b1184566123718">Collect fault information.</strong></p>
<ol start="6" id="ALM-29008__ol18403783154311"><li id="ALM-29008__li17918612154249"><a name="ALM-29008__li17918612154249"></a><a name="li17918612154249"></a><span>On FusionInsight Manager, choose <strong id="ALM-29008__b92041888374">O&amp;M</strong>. In the navigation pane on the left, choose <strong id="ALM-29008__b10204158143719">Log</strong> &gt; <strong id="ALM-29008__b1520416853712">Download</strong>.</span></li><li id="ALM-29008__li27049781154249"><span>Expand the <strong id="ALM-29008__b145588914370">Service</strong> drop-down list, and select <strong id="ALM-29008__b14558199173717">Impala</strong> for the target cluster.</span></li><li id="ALM-29008__li42121445154249"><span>Click the edit icon in the upper right corner, and set <strong id="ALM-29008__b18417611143713">Start Date</strong> and <strong id="ALM-29008__b1341791133710">End Date</strong> for log collection to 1 hour ahead of and after the alarm generation time, respectively. Then, click <strong id="ALM-29008__b12417111113713">Download</strong>.</span></li><li id="ALM-29008__li56393916154249"><span>Contact <span id="ALM-29008__text876211216374">O&amp;M personnel</span> and provide the collected logs.</span></li></ol>
</div>
<div class="section" id="ALM-29008__section169311343318"><h4 class="sectiontitle">Alarm Clearance</h4><p id="ALM-29008__p55781648135011">This alarm is automatically cleared after the fault is rectified.</p>
</div>
<div class="section" id="ALM-29008__section53362350"><h4 class="sectiontitle">Related Information</h4><p id="ALM-29008__p7522741">None</p>
</div>
</div>
<div>
<div class="familylinks">
<div class="parentlink"><strong>Parent topic:</strong> <a href="mrs_01_1298.html">Alarm Reference (Applicable to MRS 3.x)</a></div>
</div>
</div>

@ -0,0 +1,93 @@
<a name="ALM-29010"></a><a name="ALM-29010"></a>
<h1 class="topictitle1">ALM-29010 Number of Queries Being Submitted by Impalad Exceeds the Threshold</h1>
<div id="body28230097"><div class="section" id="ALM-29010__section8280367"><h4 class="sectiontitle">Alarm Description</h4><p id="ALM-29010__p1268811311458">The system checks the total number of queries being submitted by the Impalad node every 60 seconds. This alarm is generated when the number of queries exceeds the customized threshold (150 by default).</p>
<p id="ALM-29010__p2335053105020">This alarm is automatically cleared when the number of queries is less than the threshold.</p>
</div>
<div class="section" id="ALM-29010__section7414445"><h4 class="sectiontitle">Alarm Attributes</h4>
<div class="tablenoborder"><table cellpadding="4" cellspacing="0" summary="" id="ALM-29010__table45079949" frame="border" border="1" rules="all"><thead align="left"><tr id="ALM-29010__row5683496"><th align="left" class="cellrowborder" valign="top" width="33.3033303330333%" id="mcps1.3.2.2.1.4.1.1"><p id="ALM-29010__p57710042">Alarm ID</p>
</th>
<th align="left" class="cellrowborder" valign="top" width="33.36333633363336%" id="mcps1.3.2.2.1.4.1.2"><p id="ALM-29010__p44001849">Alarm Severity</p>
</th>
<th align="left" class="cellrowborder" valign="top" width="33.33333333333333%" id="mcps1.3.2.2.1.4.1.3"><p id="ALM-29010__p7380012">Auto Cleared</p>
</th>
</tr>
</thead>
<tbody><tr id="ALM-29010__row60910108"><td class="cellrowborder" valign="top" width="33.3033303330333%" headers="mcps1.3.2.2.1.4.1.1 "><p id="ALM-29010__p16488194717492">29010</p>
</td>
<td class="cellrowborder" valign="top" width="33.36333633363336%" headers="mcps1.3.2.2.1.4.1.2 "><p id="ALM-29010__p588994817496">Major</p>
</td>
<td class="cellrowborder" valign="top" width="33.33333333333333%" headers="mcps1.3.2.2.1.4.1.3 "><p id="ALM-29010__p34071398">Yes</p>
</td>
</tr>
</tbody>
</table>
</div>
</div>
<div class="section" id="ALM-29010__section66730009"><h4 class="sectiontitle">Alarm Parameters</h4>
<div class="tablenoborder"><table cellpadding="4" cellspacing="0" summary="" id="ALM-29010__table8319831" frame="border" border="1" rules="all"><thead align="left"><tr id="ALM-29010__row40868022"><th align="left" class="cellrowborder" valign="top" width="20%" id="mcps1.3.3.2.1.4.1.1"><p id="ALM-29010__p19765150143516">Type</p>
</th>
<th align="left" class="cellrowborder" valign="top" width="20%" id="mcps1.3.3.2.1.4.1.2"><p id="ALM-29010__p21975462">Parameter</p>
</th>
<th align="left" class="cellrowborder" valign="top" width="60%" id="mcps1.3.3.2.1.4.1.3"><p id="ALM-29010__p35182007">Description</p>
</th>
</tr>
</thead>
<tbody><tr id="ALM-29010__row594512751512"><td class="cellrowborder" rowspan="4" valign="top" width="20%" headers="mcps1.3.3.2.1.4.1.1 "><p id="ALM-29010__p141661010103514">Location Information</p>
</td>
<td class="cellrowborder" valign="top" width="20%" headers="mcps1.3.3.2.1.4.1.2 "><p id="ALM-29010__p8838358184914">Source</p>
</td>
<td class="cellrowborder" valign="top" width="60%" headers="mcps1.3.3.2.1.4.1.3 "><p id="ALM-29010__p837170125015">Specifies the cluster for which the alarm was generated.</p>
</td>
</tr>
<tr id="ALM-29010__row31170320"><td class="cellrowborder" valign="top" headers="mcps1.3.3.2.1.4.1.1 "><p id="ALM-29010__p39123317">ServiceName</p>
</td>
<td class="cellrowborder" valign="top" headers="mcps1.3.3.2.1.4.1.2 "><p id="ALM-29010__p172628810500">Specifies the service for which the alarm was generated.</p>
</td>
</tr>
<tr id="ALM-29010__row883552454311"><td class="cellrowborder" valign="top" headers="mcps1.3.3.2.1.4.1.1 "><p id="ALM-29010__p1583622417439">RoleName</p>
</td>
<td class="cellrowborder" valign="top" headers="mcps1.3.3.2.1.4.1.2 "><p id="ALM-29010__p783622419433">Specifies the role for which the alarm was generated.</p>
</td>
</tr>
<tr id="ALM-29010__row164912316433"><td class="cellrowborder" valign="top" headers="mcps1.3.3.2.1.4.1.1 "><p id="ALM-29010__p154923174310">HostName</p>
</td>
<td class="cellrowborder" valign="top" headers="mcps1.3.3.2.1.4.1.2 "><p id="ALM-29010__p1349163115432">Specifies the host for which the alarm was generated.</p>
</td>
</tr>
<tr id="ALM-29010__row1305193818439"><td class="cellrowborder" valign="top" width="20%" headers="mcps1.3.3.2.1.4.1.1 "><p id="ALM-29010__p1765904353">Additional Information</p>
</td>
<td class="cellrowborder" valign="top" width="20%" headers="mcps1.3.3.2.1.4.1.2 "><p id="ALM-29010__p1130583818431">Trigger Condition</p>
</td>
<td class="cellrowborder" valign="top" width="60%" headers="mcps1.3.3.2.1.4.1.3 "><p id="ALM-29010__p33051338164316">Specifies the threshold for triggering the alarm.</p>
</td>
</tr>
</tbody>
</table>
</div>
</div>
<div class="section" id="ALM-29010__section63699172"><h4 class="sectiontitle">Impact on the System</h4><p id="ALM-29010__p485055019508">The queries may be blocked or even fail.</p>
</div>
<div class="section" id="ALM-29010__section36421639"><h4 class="sectiontitle">Possible Causes</h4><p id="ALM-29010__p9402509505">The Impalad service is handling a large number of queries, or the threshold is set too low.</p>
</div>
<div class="section" id="ALM-29010__section20662238181315"><h4 class="sectiontitle">Handling Procedure</h4><ol id="ALM-29010__ol398575283918"><li id="ALM-29010__li1898555216394"><span>On FusionInsight Manager, choose <strong id="ALM-29010__b112652457235">O&amp;M</strong> &gt; <strong id="ALM-29010__b1926584552312">Alarm</strong> &gt; <strong id="ALM-29010__b426564522319">Thresholds</strong> &gt; <strong id="ALM-29010__b6265114532310">Impala</strong> &gt; <strong id="ALM-29010__b326554511233">Query Task Sum Statistics</strong> &gt; <strong id="ALM-29010__b4266345162313">Total number of Queries Being Submitted (Impalad)</strong> and check the threshold.</span><p><p id="ALM-29010__p16200204422518"><span><img id="ALM-29010__image17964155802615" src="en-us_image_0000002007649989.png"></span></p>
</p></li><li id="ALM-29010__li1232161715409"><span>Change the threshold.</span><p><p id="ALM-29010__p1428013915914"><span><img id="ALM-29010__image441151014594" src="en-us_image_0000001971169950.png"></span></p>
</p></li><li id="ALM-29010__li1507754134111"><span>Click the <strong id="ALM-29010__b2014612516170">Instances</strong> tab, select all Impalad instances, and restart them.</span><p><div class="note" id="ALM-29010__note220918329390"><img src="public_sys-resources/note_3.0-en-us.png"><span class="notetitle"> </span><div class="notebody"><p id="ALM-29010__en-us_topic_0294848303_p0481144124012">The service is unavailable while all instances are being restarted. If only a single instance is restarted, the tasks being executed on that instance fail, but the service remains available.</p>
</div></div>
<p id="ALM-29010__p13942182613557"><span><img id="ALM-29010__image610162835512" src="en-us_image_0000001971010166.png"></span></p>
</p></li><li id="ALM-29010__li10975203610439"><span>After the restart is complete, check whether the alarm is cleared.</span><p><ul id="ALM-29010__ul5550203425918"><li id="ALM-29010__li15501934145913">If yes, no further action is required.</li><li id="ALM-29010__li55501534135916">If no, go to <a href="#ALM-29010__li17918612154249">5</a>.</li></ul>
</p></li></ol>
<p class="tableheading" id="ALM-29010__p3847019615437"><strong id="ALM-29010__b445483441518">Collect fault information.</strong></p>
<ol start="5" id="ALM-29010__ol18403783154311"><li id="ALM-29010__li17918612154249"><a name="ALM-29010__li17918612154249"></a><a name="li17918612154249"></a><span>On FusionInsight Manager, choose <strong id="ALM-29010__b1363713516156">O&amp;M</strong>. In the navigation pane on the left, choose <strong id="ALM-29010__b14638935181518">Log</strong> &gt; <strong id="ALM-29010__b1463853510152">Download</strong>.</span></li><li id="ALM-29010__li27049781154249"><span>Expand the <strong id="ALM-29010__b173407374153">Service</strong> drop-down list, and select <strong id="ALM-29010__b15340103720156">Impala</strong> for the target cluster.</span></li><li id="ALM-29010__li42121445154249"><span>Click the edit icon in the upper right corner, and set <strong id="ALM-29010__b1313103991517">Start Date</strong> and <strong id="ALM-29010__b1913239131519">End Date</strong> for log collection to 1 hour ahead of and after the alarm generation time, respectively. Then, click <strong id="ALM-29010__b71313971513">Download</strong>.</span></li><li id="ALM-29010__li56393916154249"><span>Contact <span id="ALM-29010__text164301140111513">O&amp;M personnel</span> and provide the collected logs.</span></li></ol>
</div>
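<p>The log collection window in the procedure above (1 hour before and after the alarm generation time) can be computed as follows. This is a minimal sketch: the function name <code>log_collection_window</code> and the sample timestamp are hypothetical, and the resulting values are meant to be entered manually as <strong>Start Date</strong> and <strong>End Date</strong>.</p>

```python
from datetime import datetime, timedelta


def log_collection_window(alarm_time, margin=timedelta(hours=1)):
    """Return (start, end) covering `margin` before and after the alarm time."""
    return alarm_time - margin, alarm_time + margin


# Hypothetical alarm generation time, used for illustration only.
alarm_generated_at = datetime(2024, 8, 2, 14, 30)
start, end = log_collection_window(alarm_generated_at)
print(start.isoformat(), end.isoformat())
# 2024-08-02T13:30:00 2024-08-02T15:30:00
```

<p>Passing a different <code>margin</code>, for example <code>timedelta(minutes=10)</code>, covers procedures that ask for a shorter window.</p>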
<div class="section" id="ALM-29010__section169311343318"><h4 class="sectiontitle">Alarm Clearance</h4><p id="ALM-29010__p55781648135011">This alarm is automatically cleared after the fault is rectified.</p>
</div>
<div class="section" id="ALM-29010__section53362350"><h4 class="sectiontitle">Related Information</h4><p id="ALM-29010__p7522741">None</p>
</div>
</div>
<div>
<div class="familylinks">
<div class="parentlink"><strong>Parent topic:</strong> <a href="mrs_01_1298.html">Alarm Reference (Applicable to MRS 3.x)</a></div>
</div>
</div>


@ -0,0 +1,93 @@
<a name="ALM-29011"></a><a name="ALM-29011"></a>
<h1 class="topictitle1">ALM-29011 Number of Queries Being Executed by Impalad Exceeds the Threshold</h1>
<div id="body8999755"><div class="section" id="ALM-29011__section8280367"><h4 class="sectiontitle">Alarm Description</h4><p id="ALM-29011__p1268811311458">The system checks the total number of queries being executed by the Impalad node every 60 seconds. This alarm is generated when the number of queries exceeds the customized threshold (150 by default).</p>
<p id="ALM-29011__p2335053105020">This alarm is automatically cleared when the number of queries is less than the threshold.</p>
</div>
<div class="section" id="ALM-29011__section7414445"><h4 class="sectiontitle">Alarm Attributes</h4>
<div class="tablenoborder"><table cellpadding="4" cellspacing="0" summary="" id="ALM-29011__table45079949" frame="border" border="1" rules="all"><thead align="left"><tr id="ALM-29011__row5683496"><th align="left" class="cellrowborder" valign="top" width="33.3033303330333%" id="mcps1.3.2.2.1.4.1.1"><p id="ALM-29011__p57710042">Alarm ID</p>
</th>
<th align="left" class="cellrowborder" valign="top" width="33.36333633363336%" id="mcps1.3.2.2.1.4.1.2"><p id="ALM-29011__p44001849">Alarm Severity</p>
</th>
<th align="left" class="cellrowborder" valign="top" width="33.33333333333333%" id="mcps1.3.2.2.1.4.1.3"><p id="ALM-29011__p7380012">Auto Cleared</p>
</th>
</tr>
</thead>
<tbody><tr id="ALM-29011__row60910108"><td class="cellrowborder" valign="top" width="33.3033303330333%" headers="mcps1.3.2.2.1.4.1.1 "><p id="ALM-29011__p16488194717492">29011</p>
</td>
<td class="cellrowborder" valign="top" width="33.36333633363336%" headers="mcps1.3.2.2.1.4.1.2 "><p id="ALM-29011__p588994817496">Major</p>
</td>
<td class="cellrowborder" valign="top" width="33.33333333333333%" headers="mcps1.3.2.2.1.4.1.3 "><p id="ALM-29011__p34071398">Yes</p>
</td>
</tr>
</tbody>
</table>
</div>
</div>
<div class="section" id="ALM-29011__section66730009"><h4 class="sectiontitle">Alarm Parameters</h4>
<div class="tablenoborder"><table cellpadding="4" cellspacing="0" summary="" id="ALM-29011__table8319831" frame="border" border="1" rules="all"><thead align="left"><tr id="ALM-29011__row40868022"><th align="left" class="cellrowborder" valign="top" width="20%" id="mcps1.3.3.2.1.4.1.1"><p id="ALM-29011__p146625284351">Type</p>
</th>
<th align="left" class="cellrowborder" valign="top" width="20%" id="mcps1.3.3.2.1.4.1.2"><p id="ALM-29011__p21975462">Parameter</p>
</th>
<th align="left" class="cellrowborder" valign="top" width="60%" id="mcps1.3.3.2.1.4.1.3"><p id="ALM-29011__p35182007">Description</p>
</th>
</tr>
</thead>
<tbody><tr id="ALM-29011__row594512751512"><td class="cellrowborder" rowspan="4" valign="top" width="20%" headers="mcps1.3.3.2.1.4.1.1 "><p id="ALM-29011__p125689195367">Location Information</p>
</td>
<td class="cellrowborder" valign="top" width="20%" headers="mcps1.3.3.2.1.4.1.2 "><p id="ALM-29011__p8838358184914">Source</p>
</td>
<td class="cellrowborder" valign="top" width="60%" headers="mcps1.3.3.2.1.4.1.3 "><p id="ALM-29011__p837170125015">Specifies the cluster for which the alarm was generated.</p>
</td>
</tr>
<tr id="ALM-29011__row31170320"><td class="cellrowborder" valign="top" headers="mcps1.3.3.2.1.4.1.1 "><p id="ALM-29011__p39123317">ServiceName</p>
</td>
<td class="cellrowborder" valign="top" headers="mcps1.3.3.2.1.4.1.2 "><p id="ALM-29011__p172628810500">Specifies the service for which the alarm was generated.</p>
</td>
</tr>
<tr id="ALM-29011__row883552454311"><td class="cellrowborder" valign="top" headers="mcps1.3.3.2.1.4.1.1 "><p id="ALM-29011__p1583622417439">RoleName</p>
</td>
<td class="cellrowborder" valign="top" headers="mcps1.3.3.2.1.4.1.2 "><p id="ALM-29011__p783622419433">Specifies the role for which the alarm was generated.</p>
</td>
</tr>
<tr id="ALM-29011__row164912316433"><td class="cellrowborder" valign="top" headers="mcps1.3.3.2.1.4.1.1 "><p id="ALM-29011__p154923174310">HostName</p>
</td>
<td class="cellrowborder" valign="top" headers="mcps1.3.3.2.1.4.1.2 "><p id="ALM-29011__p1349163115432">Specifies the host for which the alarm was generated.</p>
</td>
</tr>
<tr id="ALM-29011__row1305193818439"><td class="cellrowborder" valign="top" width="20%" headers="mcps1.3.3.2.1.4.1.1 "><p id="ALM-29011__p19662132817352">Additional Information</p>
</td>
<td class="cellrowborder" valign="top" width="20%" headers="mcps1.3.3.2.1.4.1.2 "><p id="ALM-29011__p1130583818431">Trigger Condition</p>
</td>
<td class="cellrowborder" valign="top" width="60%" headers="mcps1.3.3.2.1.4.1.3 "><p id="ALM-29011__p33051338164316">Specifies the threshold for triggering the alarm.</p>
</td>
</tr>
</tbody>
</table>
</div>
</div>
<div class="section" id="ALM-29011__section63699172"><h4 class="sectiontitle">Impact on the System</h4><p id="ALM-29011__p485055019508">The queries may be blocked or even fail.</p>
</div>
<div class="section" id="ALM-29011__section36421639"><h4 class="sectiontitle">Possible Causes</h4><p id="ALM-29011__p9402509505">The Impalad service is handling a large number of queries, or the threshold is set too low.</p>
</div>
<div class="section" id="ALM-29011__section20662238181315"><h4 class="sectiontitle">Handling Procedure</h4><ol id="ALM-29011__ol398575283918"><li id="ALM-29011__li1898555216394"><span>On FusionInsight Manager, choose <strong id="ALM-29011__b15560629125317">O&amp;M</strong> &gt; <strong id="ALM-29011__b1560112914533">Alarm</strong> &gt; <strong id="ALM-29011__b2560122916538">Thresholds</strong> &gt; <strong id="ALM-29011__b75611929135314">Impala</strong> &gt; <strong id="ALM-29011__b16561829125312">Query Task Sum Statistics</strong> &gt; <strong id="ALM-29011__b85611829165317">Total number of Queries Being Executed (Impalad)</strong> and check the threshold.</span><p><p id="ALM-29011__p9615111114018"><span><img id="ALM-29011__image585410413" src="en-us_image_0000002007530501.png"></span></p>
</p></li><li id="ALM-29011__li1232161715409"><span>Change the threshold.</span><p><p id="ALM-29011__p1428013915914"><span><img id="ALM-29011__image441151014594" src="en-us_image_0000002007649997.png"></span></p>
</p></li><li id="ALM-29011__li1507754134111"><span>Click the <strong id="ALM-29011__b076818269173">Instances</strong> tab, select all Impalad instances, and restart them.</span><p><div class="note" id="ALM-29011__note220918329390"><img src="public_sys-resources/note_3.0-en-us.png"><span class="notetitle"> </span><div class="notebody"><p id="ALM-29011__en-us_topic_0294848303_p0481144124012">The service is unavailable while all instances are being restarted. If only a single instance is restarted, the tasks being executed on that instance fail, but the service remains available.</p>
</div></div>
<p id="ALM-29011__p33211614561"><span><img id="ALM-29011__image731017715569" src="en-us_image_0000001971169958.png"></span></p>
</p></li><li id="ALM-29011__li10975203610439"><span>After the restart is complete, check whether the alarm is cleared.</span><p><ul id="ALM-29011__ul5550203425918"><li id="ALM-29011__li15501934145913">If yes, no further action is required.</li><li id="ALM-29011__li55501534135916">If no, go to <a href="#ALM-29011__li17918612154249">5</a>.</li></ul>
</p></li></ol>
<p class="tableheading" id="ALM-29011__p3847019615437"><strong id="ALM-29011__b1529294803515">Collect fault information.</strong></p>
<ol start="5" id="ALM-29011__ol18403783154311"><li id="ALM-29011__li17918612154249"><a name="ALM-29011__li17918612154249"></a><a name="li17918612154249"></a><span>On FusionInsight Manager, choose <strong id="ALM-29011__b11424175273513">O&amp;M</strong>. In the navigation pane on the left, choose <strong id="ALM-29011__b5425105283513">Log</strong> &gt; <strong id="ALM-29011__b74253525354">Download</strong>.</span></li><li id="ALM-29011__li27049781154249"><span>Expand the <strong id="ALM-29011__b19135754143511">Service</strong> drop-down list, and select <strong id="ALM-29011__b121357544355">Impala</strong> for the target cluster.</span></li><li id="ALM-29011__li42121445154249"><span>Click the edit icon in the upper right corner, and set <strong id="ALM-29011__b1778245513356">Start Date</strong> and <strong id="ALM-29011__b47821755193514">End Date</strong> for log collection to 1 hour ahead of and after the alarm generation time, respectively. Then, click <strong id="ALM-29011__b778217552359">Download</strong>.</span></li><li id="ALM-29011__li56393916154249"><span>Contact <span id="ALM-29011__text4636165793517">O&amp;M personnel</span> and provide the collected logs.</span></li></ol>
</div>
<div class="section" id="ALM-29011__section169311343318"><h4 class="sectiontitle">Alarm Clearance</h4><p id="ALM-29011__p55781648135011">This alarm is automatically cleared after the fault is rectified.</p>
</div>
<div class="section" id="ALM-29011__section53362350"><h4 class="sectiontitle">Related Information</h4><p id="ALM-29011__p7522741">None</p>
</div>
</div>
<div>
<div class="familylinks">
<div class="parentlink"><strong>Parent topic:</strong> <a href="mrs_01_1298.html">Alarm Reference (Applicable to MRS 3.x)</a></div>
</div>
</div>


@ -0,0 +1,93 @@
<a name="ALM-29012"></a><a name="ALM-29012"></a>
<h1 class="topictitle1">ALM-29012 Number of Queries Being Waited by Impalad Exceeds the Threshold</h1>
<div id="body54399553"><div class="section" id="ALM-29012__section8280367"><h4 class="sectiontitle">Alarm Description</h4><p id="ALM-29012__p1268811311458">The system checks the total number of queries waiting on the Impalad node every 60 seconds. This alarm is generated when the number of waiting queries exceeds the customized threshold (150 by default).</p>
<p id="ALM-29012__p2335053105020">This alarm is automatically cleared when the number of queries is less than the threshold.</p>
</div>
<div class="section" id="ALM-29012__section7414445"><h4 class="sectiontitle">Alarm Attributes</h4>
<div class="tablenoborder"><table cellpadding="4" cellspacing="0" summary="" id="ALM-29012__table45079949" frame="border" border="1" rules="all"><thead align="left"><tr id="ALM-29012__row5683496"><th align="left" class="cellrowborder" valign="top" width="33.3033303330333%" id="mcps1.3.2.2.1.4.1.1"><p id="ALM-29012__p57710042">Alarm ID</p>
</th>
<th align="left" class="cellrowborder" valign="top" width="33.36333633363336%" id="mcps1.3.2.2.1.4.1.2"><p id="ALM-29012__p44001849">Alarm Severity</p>
</th>
<th align="left" class="cellrowborder" valign="top" width="33.33333333333333%" id="mcps1.3.2.2.1.4.1.3"><p id="ALM-29012__p7380012">Auto Cleared</p>
</th>
</tr>
</thead>
<tbody><tr id="ALM-29012__row60910108"><td class="cellrowborder" valign="top" width="33.3033303330333%" headers="mcps1.3.2.2.1.4.1.1 "><p id="ALM-29012__p16488194717492">29012</p>
</td>
<td class="cellrowborder" valign="top" width="33.36333633363336%" headers="mcps1.3.2.2.1.4.1.2 "><p id="ALM-29012__p588994817496">Major</p>
</td>
<td class="cellrowborder" valign="top" width="33.33333333333333%" headers="mcps1.3.2.2.1.4.1.3 "><p id="ALM-29012__p34071398">Yes</p>
</td>
</tr>
</tbody>
</table>
</div>
</div>
<div class="section" id="ALM-29012__section66730009"><h4 class="sectiontitle">Alarm Parameters</h4>
<div class="tablenoborder"><table cellpadding="4" cellspacing="0" summary="" id="ALM-29012__table8319831" frame="border" border="1" rules="all"><thead align="left"><tr id="ALM-29012__row40868022"><th align="left" class="cellrowborder" valign="top" width="20%" id="mcps1.3.3.2.1.4.1.1"><p id="ALM-29012__p165801043163619">Type</p>
</th>
<th align="left" class="cellrowborder" valign="top" width="20%" id="mcps1.3.3.2.1.4.1.2"><p id="ALM-29012__p21975462">Parameter</p>
</th>
<th align="left" class="cellrowborder" valign="top" width="60%" id="mcps1.3.3.2.1.4.1.3"><p id="ALM-29012__p35182007">Description</p>
</th>
</tr>
</thead>
<tbody><tr id="ALM-29012__row594512751512"><td class="cellrowborder" rowspan="4" valign="top" width="20%" headers="mcps1.3.3.2.1.4.1.1 "><p id="ALM-29012__p1874575153612">Location Information</p>
</td>
<td class="cellrowborder" valign="top" width="20%" headers="mcps1.3.3.2.1.4.1.2 "><p id="ALM-29012__p8838358184914">Source</p>
</td>
<td class="cellrowborder" valign="top" width="60%" headers="mcps1.3.3.2.1.4.1.3 "><p id="ALM-29012__p837170125015">Specifies the cluster for which the alarm was generated.</p>
</td>
</tr>
<tr id="ALM-29012__row31170320"><td class="cellrowborder" valign="top" headers="mcps1.3.3.2.1.4.1.1 "><p id="ALM-29012__p39123317">ServiceName</p>
</td>
<td class="cellrowborder" valign="top" headers="mcps1.3.3.2.1.4.1.2 "><p id="ALM-29012__p172628810500">Specifies the service for which the alarm was generated.</p>
</td>
</tr>
<tr id="ALM-29012__row883552454311"><td class="cellrowborder" valign="top" headers="mcps1.3.3.2.1.4.1.1 "><p id="ALM-29012__p1583622417439">RoleName</p>
</td>
<td class="cellrowborder" valign="top" headers="mcps1.3.3.2.1.4.1.2 "><p id="ALM-29012__p783622419433">Specifies the role for which the alarm was generated.</p>
</td>
</tr>
<tr id="ALM-29012__row164912316433"><td class="cellrowborder" valign="top" headers="mcps1.3.3.2.1.4.1.1 "><p id="ALM-29012__p154923174310">HostName</p>
</td>
<td class="cellrowborder" valign="top" headers="mcps1.3.3.2.1.4.1.2 "><p id="ALM-29012__p1349163115432">Specifies the host for which the alarm was generated.</p>
</td>
</tr>
<tr id="ALM-29012__row1305193818439"><td class="cellrowborder" valign="top" width="20%" headers="mcps1.3.3.2.1.4.1.1 "><p id="ALM-29012__p175801743123613">Additional Information</p>
</td>
<td class="cellrowborder" valign="top" width="20%" headers="mcps1.3.3.2.1.4.1.2 "><p id="ALM-29012__p1130583818431">Trigger Condition</p>
</td>
<td class="cellrowborder" valign="top" width="60%" headers="mcps1.3.3.2.1.4.1.3 "><p id="ALM-29012__p33051338164316">Specifies the threshold for triggering the alarm.</p>
</td>
</tr>
</tbody>
</table>
</div>
</div>
<div class="section" id="ALM-29012__section63699172"><h4 class="sectiontitle">Impact on the System</h4><p id="ALM-29012__p485055019508">The queries may be blocked or even fail.</p>
</div>
<div class="section" id="ALM-29012__section36421639"><h4 class="sectiontitle">Possible Causes</h4><p id="ALM-29012__p9402509505">The Impalad service is handling a large number of queries, or the threshold is set too low.</p>
</div>
<div class="section" id="ALM-29012__section20662238181315"><h4 class="sectiontitle">Handling Procedure</h4><ol id="ALM-29012__ol398575283918"><li id="ALM-29012__li1898555216394"><span>On FusionInsight Manager, choose <strong id="ALM-29012__b11586614141312">O&amp;M</strong> &gt; <strong id="ALM-29012__b2058771413137">Alarm</strong> &gt; <strong id="ALM-29012__b17587141414139">Thresholds</strong> &gt; <strong id="ALM-29012__b6587101451312">Impala</strong> &gt; <strong id="ALM-29012__b1358771451313">Query Task Sum Statistics</strong> &gt; <strong id="ALM-29012__b10587141441317">Total number of Waiting Queries (Impalad)</strong> and check the threshold.</span><p><p id="ALM-29012__p201983247587"><span><img id="ALM-29012__image762624311422" src="en-us_image_0000001971010174.png"></span></p>
</p></li><li id="ALM-29012__li1232161715409"><span>Change the threshold.</span><p><p id="ALM-29012__p1428013915914"><span><img id="ALM-29012__image441151014594" src="en-us_image_0000002007530505.png"></span></p>
</p></li><li id="ALM-29012__li1507754134111"><span>Click the <strong id="ALM-29012__b17625111414">Instances</strong> tab, select all Impalad instances, and restart them.</span><p><div class="note" id="ALM-29012__note220918329390"><img src="public_sys-resources/note_3.0-en-us.png"><span class="notetitle"> </span><div class="notebody"><p id="ALM-29012__en-us_topic_0294848303_p0481144124012">The service is unavailable while all instances are being restarted. If only a single instance is restarted, the tasks being executed on that instance fail, but the service remains available.</p>
</div></div>
<p id="ALM-29012__p17555202610561"><span><img id="ALM-29012__image4798202712565" src="en-us_image_0000002007650001.png"></span></p>
</p></li></ol><ol start="4" id="ALM-29012__ol3290027701"><li id="ALM-29012__li1929042719015"><span>After the restart is complete, check whether the alarm is cleared.</span><p><ul id="ALM-29012__ul5550203425918"><li id="ALM-29012__li15501934145913">If yes, no further action is required.</li><li id="ALM-29012__li55501534135916">If no, go to <a href="#ALM-29012__li17918612154249">5</a>.</li></ul>
</p></li></ol>
<p class="tableheading" id="ALM-29012__p3847019615437"><strong id="ALM-29012__b19185151143619">Collect fault information.</strong></p>
<ol start="5" id="ALM-29012__ol18403783154311"><li id="ALM-29012__li17918612154249"><a name="ALM-29012__li17918612154249"></a><a name="li17918612154249"></a><span>On FusionInsight Manager, choose <strong id="ALM-29012__b1954918528364">O&amp;M</strong>. In the navigation pane on the left, choose <strong id="ALM-29012__b6549135273615">Log</strong> &gt; <strong id="ALM-29012__b1054912529360">Download</strong>.</span></li><li id="ALM-29012__li27049781154249"><span>Expand the <strong id="ALM-29012__b660005373610">Service</strong> drop-down list, and select <strong id="ALM-29012__b8600753113611">Impala</strong> for the target cluster.</span></li><li id="ALM-29012__li42121445154249"><span>Click the edit icon in the upper right corner, and set <strong id="ALM-29012__b6321195553612">Start Date</strong> and <strong id="ALM-29012__b03211255103610">End Date</strong> for log collection to 1 hour ahead of and after the alarm generation time, respectively. Then, click <strong id="ALM-29012__b17321135519362">Download</strong>.</span></li><li id="ALM-29012__li56393916154249"><span>Contact <span id="ALM-29012__text136257566360">O&amp;M personnel</span> and provide the collected logs.</span></li></ol>
</div>
<div class="section" id="ALM-29012__section169311343318"><h4 class="sectiontitle">Alarm Clearance</h4><p id="ALM-29012__p55781648135011">This alarm is automatically cleared after the fault is rectified.</p>
</div>
<div class="section" id="ALM-29012__section53362350"><h4 class="sectiontitle">Related Information</h4><p id="ALM-29012__p7522741">None</p>
</div>
</div>
<div>
<div class="familylinks">
<div class="parentlink"><strong>Parent topic:</strong> <a href="mrs_01_1298.html">Alarm Reference (Applicable to MRS 3.x)</a></div>
</div>
</div>


@ -0,0 +1,91 @@
<a name="ALM-29013"></a><a name="ALM-29013"></a>
<h1 class="topictitle1">ALM-29013 Impalad FGC Time Exceeds the Threshold</h1>
<div id="body4757982"><div class="section" id="ALM-29013__section35798275"><h4 class="sectiontitle">Alarm Description</h4><p id="ALM-29013__p6121288">The system checks the full garbage collection (FGC) time of the Impalad service every 60 seconds. This alarm is generated when the FGC time exceeds the threshold (12 seconds by default) for five consecutive checks. This alarm is cleared when the FGC time is less than or equal to the threshold.</p>
</div>
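<p>The "five consecutive checks" rule above can be sketched as a streak counter. This is a minimal illustration under stated assumptions, not the product's implementation: the names <code>evaluate_fgc_samples</code>, <code>FGC_THRESHOLD_SECONDS</code>, and <code>CONSECUTIVE_CHECKS</code> are hypothetical, and the sketch assumes that any sample at or below the threshold both clears the alarm and resets the streak.</p>

```python
# Illustrative sketch only: all names here are hypothetical.
FGC_THRESHOLD_SECONDS = 12   # threshold stated in the alarm description
CONSECUTIVE_CHECKS = 5       # consecutive violations needed to generate the alarm


def evaluate_fgc_samples(samples, threshold=FGC_THRESHOLD_SECONDS,
                         required=CONSECUTIVE_CHECKS):
    """Return the alarm state after each 60-second FGC time sample."""
    active = False
    streak = 0
    states = []
    for fgc_seconds in samples:
        if fgc_seconds > threshold:
            streak += 1
            if streak >= required:
                active = True    # generated after `required` violations in a row
        else:
            streak = 0           # assumption: a compliant sample resets the streak
            active = False       # cleared: FGC time is at or below the threshold
        states.append(active)
    return states


print(evaluate_fgc_samples([13, 14, 15, 13, 12, 13, 13, 13, 13, 13, 11]))
# [False, False, False, False, False, False, False, False, False, True, False]
```

<p>In this sample run, the first four violations do not generate the alarm because the fifth sample (12 seconds) is not above the threshold; the alarm is generated only after the next five violations in a row.</p>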
<div class="section" id="ALM-29013__section53749019"><h4 class="sectiontitle">Alarm Attributes</h4>
<div class="tablenoborder"><table cellpadding="4" cellspacing="0" summary="" id="ALM-29013__table26062329" frame="border" border="1" rules="all"><thead align="left"><tr id="ALM-29013__row59129055"><th align="left" class="cellrowborder" valign="top" width="33.33333333333333%" id="mcps1.3.2.2.1.4.1.1"><p id="ALM-29013__p24724130">Alarm ID</p>
</th>
<th align="left" class="cellrowborder" valign="top" width="33.33333333333333%" id="mcps1.3.2.2.1.4.1.2"><p id="ALM-29013__p56497485">Alarm Severity</p>
</th>
<th align="left" class="cellrowborder" valign="top" width="33.33333333333333%" id="mcps1.3.2.2.1.4.1.3"><p id="ALM-29013__p12893577">Auto Cleared</p>
</th>
</tr>
</thead>
<tbody><tr id="ALM-29013__row37746813"><td class="cellrowborder" valign="top" width="33.33333333333333%" headers="mcps1.3.2.2.1.4.1.1 "><p id="ALM-29013__p37593042">29013</p>
</td>
<td class="cellrowborder" valign="top" width="33.33333333333333%" headers="mcps1.3.2.2.1.4.1.2 "><p id="ALM-29013__p25137584">Major</p>
</td>
<td class="cellrowborder" valign="top" width="33.33333333333333%" headers="mcps1.3.2.2.1.4.1.3 "><p id="ALM-29013__p22878398">Yes</p>
</td>
</tr>
</tbody>
</table>
</div>
</div>
<div class="section" id="ALM-29013__section13979128"><h4 class="sectiontitle">Alarm Parameters</h4>
<div class="tablenoborder"><table cellpadding="4" cellspacing="0" summary="" id="ALM-29013__table41210916" frame="border" border="1" rules="all"><thead align="left"><tr id="ALM-29013__row28890097"><th align="left" class="cellrowborder" valign="top" width="20%" id="mcps1.3.3.2.1.4.1.1"><p id="ALM-29013__p832426173714">Type</p>
</th>
<th align="left" class="cellrowborder" valign="top" width="20%" id="mcps1.3.3.2.1.4.1.2"><p id="ALM-29013__p58396496">Parameter</p>
</th>
<th align="left" class="cellrowborder" valign="top" width="60%" id="mcps1.3.3.2.1.4.1.3"><p id="ALM-29013__p32495724">Description</p>
</th>
</tr>
</thead>
<tbody><tr id="ALM-29013__row18994184913243"><td class="cellrowborder" rowspan="4" valign="top" width="20%" headers="mcps1.3.3.2.1.4.1.1 "><p id="ALM-29013__p11296141553715">Location Information</p>
</td>
<td class="cellrowborder" valign="top" width="20%" headers="mcps1.3.3.2.1.4.1.2 "><p id="ALM-29013__p156438591896">Source</p>
</td>
<td class="cellrowborder" valign="top" width="60%" headers="mcps1.3.3.2.1.4.1.3 "><p id="ALM-29013__p187931338134115">Specifies the cluster for which the alarm was generated.</p>
</td>
</tr>
<tr id="ALM-29013__row14907948"><td class="cellrowborder" valign="top" headers="mcps1.3.3.2.1.4.1.1 "><p id="ALM-29013__p65062640">ServiceName</p>
</td>
<td class="cellrowborder" valign="top" headers="mcps1.3.3.2.1.4.1.2 "><p id="ALM-29013__p33433017">Specifies the service for which the alarm was generated.</p>
</td>
</tr>
<tr id="ALM-29013__row32461705"><td class="cellrowborder" valign="top" headers="mcps1.3.3.2.1.4.1.1 "><p id="ALM-29013__p35626567">RoleName</p>
</td>
<td class="cellrowborder" valign="top" headers="mcps1.3.3.2.1.4.1.2 "><p id="ALM-29013__p44825864">Specifies the role for which the alarm was generated.</p>
</td>
</tr>
<tr id="ALM-29013__row779592"><td class="cellrowborder" valign="top" headers="mcps1.3.3.2.1.4.1.1 "><p id="ALM-29013__p51620924">HostName</p>
</td>
<td class="cellrowborder" valign="top" headers="mcps1.3.3.2.1.4.1.2 "><p id="ALM-29013__p14632331">Specifies the host for which the alarm was generated.</p>
</td>
</tr>
<tr id="ALM-29013__row1016518552460"><td class="cellrowborder" valign="top" width="20%" headers="mcps1.3.3.2.1.4.1.1 "><p id="ALM-29013__p1532418603711">Additional Information</p>
</td>
<td class="cellrowborder" valign="top" width="20%" headers="mcps1.3.3.2.1.4.1.2 "><p id="ALM-29013__p57854422">Trigger Condition</p>
</td>
<td class="cellrowborder" valign="top" width="60%" headers="mcps1.3.3.2.1.4.1.3 "><p id="ALM-29013__p55696635">Specifies the threshold for triggering the alarm.</p>
</td>
</tr>
</tbody>
</table>
</div>
</div>
<div class="section" id="ALM-29013__section58703289"><h4 class="sectiontitle">Impact on the System</h4><p id="ALM-29013__p44368143">Data read and write operations are affected.</p>
</div>
<div class="section" id="ALM-29013__section58567561"><h4 class="sectiontitle">Possible Causes</h4><p id="ALM-29013__p37049824">The memory of the node instance is overused or the heap memory is inappropriately allocated, causing frequent garbage collection (GC).</p>
</div>
<div class="section" id="ALM-29013__section272394118192"><h4 class="sectiontitle">Handling Procedure</h4><p class="tableheading" id="ALM-29013__p16723124118197"><strong id="ALM-29013__b6480201345919">Check the GC time.</strong></p>
<ol id="ALM-29013__ol5723104121916"><li id="ALM-29013__li2072324101912"><a name="ALM-29013__li2072324101912"></a><a name="li2072324101912"></a><span>Log in to FusionInsight Manager, choose <strong id="ALM-29013__b72119343592">O&amp;M</strong> &gt; <strong id="ALM-29013__b131013615591">Alarm</strong> &gt; <strong id="ALM-29013__b780743825919">Thresholds</strong> &gt; <strong id="ALM-29013__b1087055153910">Impala</strong> &gt; <strong id="ALM-29013__b24645592391">Process FGCT</strong> &gt; <strong id="ALM-29013__b1165486114010">Process FGCT of Impalad (Impalad)</strong>, and check the threshold (12 seconds by default).</span><p><p id="ALM-29013__p197233419198"><span><img id="ALM-29013__image1972314121916" src="en-us_image_0000001971169962.png"></span></p>
</p></li><li id="ALM-29013__li1872374151910"><span>Choose <strong id="ALM-29013__b3698116151210">O&amp;M</strong> &gt; <strong id="ALM-29013__b069891616129">Alarm</strong> &gt; <strong id="ALM-29013__b169871611125">Alarms</strong>, and check whether the alarm whose <strong id="ALM-29013__b1569881619124">Alarm ID</strong> is <strong id="ALM-29013__b1769811166122">29013</strong> exists in the alarm list.</span><p><ul id="ALM-29013__ul1472454131912"><li id="ALM-29013__li5724841131914">If yes, go to <a href="#ALM-29013__li972417412195">3</a>.</li><li id="ALM-29013__li1272413414194">If no, no further action is required.</li></ul>
</p></li><li id="ALM-29013__li972417412195"><a name="ALM-29013__li972417412195"></a><a name="li972417412195"></a><span>On FusionInsight Manager, choose <strong id="ALM-29013__b15488175314317">Cluster</strong> &gt; <strong id="ALM-29013__b451514568313">Impala</strong>, click the <strong id="ALM-29013__b21928873216">Instances</strong> tab, select the Impalad instance for which the alarm is generated, then click the <strong id="ALM-29013__b237743815482">Chart</strong> tab, locate the <strong id="ALM-29013__b8552103785211">Process FGCT</strong> chart, and check whether the FGC time is greater than the threshold in <a href="#ALM-29013__li2072324101912">1</a>.</span><p><ul id="ALM-29013__ul5724641121911"><li id="ALM-29013__li2072404111919">If yes, go to <a href="#ALM-29013__li16724841191916">4</a>.</li><li id="ALM-29013__li67241741141910">If no, go to <a href="#ALM-29013__li188852383514">5</a>.</li></ul>
</p></li><li id="ALM-29013__li16724841191916"><a name="ALM-29013__li16724841191916"></a><a name="li16724841191916"></a><span>Choose <strong id="ALM-29013__b13242017541">O&amp;M</strong> &gt; <strong id="ALM-29013__b1176214225420">Alarm</strong> &gt; <strong id="ALM-29013__b14831695417">Thresholds</strong> &gt; <strong id="ALM-29013__b13323141594218">Impala</strong> &gt; <strong id="ALM-29013__b133233155424">Process FGCT</strong> &gt; <strong id="ALM-29013__b17323415154213">Process FGCT of Impalad (Impalad)</strong>, and change the threshold to a value greater than the time obtained in <a href="#ALM-29013__li972417412195">3</a>. Then, check whether the alarm is cleared.</span><p><ul id="ALM-29013__ul86162302368"><li id="ALM-29013__li136162030163617">If yes, no further action is required.</li><li id="ALM-29013__li261693263619">If no, go to <a href="#ALM-29013__li188852383514">5</a>.</li></ul>
</p></li></ol>
<p id="ALM-29013__p127211026143416"><strong id="ALM-29013__b21302019470">Collect fault information.</strong></p>
<ol start="5" id="ALM-29013__ol08857316353"><li id="ALM-29013__li188852383514"><a name="ALM-29013__li188852383514"></a><a name="li188852383514"></a><span>On FusionInsight Manager of the active or standby cluster, choose <strong id="ALM-29013__b294413174718">O&amp;M</strong>. In the navigation pane on the left, choose <strong id="ALM-29013__b7945131114718">Log</strong> &gt; <strong id="ALM-29013__b794512110476">Download</strong>.</span></li><li id="ALM-29013__li1588563103511"><span>Expand the <strong id="ALM-29013__b7283639185013">Service</strong> drop-down list, and select <strong id="ALM-29013__b02830391501">Impala</strong> for the target cluster.</span></li><li id="ALM-29013__li08853393515"><span>Click the edit icon in the upper right corner, and set <strong id="ALM-29013__b1322253185115">Start Date</strong> and <strong id="ALM-29013__b1622313315510">End Date</strong> for log collection to 10 minutes ahead of and after the alarm generation time, respectively. Then, click <strong id="ALM-29013__b9223131195116">Download</strong>.</span></li><li id="ALM-29013__li178851635359"><span>Contact <span id="ALM-29013__text1323295855012">O&amp;M personnel</span> and provide the collected logs.</span></li></ol>
</div>
<div class="section" id="ALM-29013__section169311343318"><h4 class="sectiontitle">Alarm Clearance</h4><p id="ALM-29013__p754913417333">This alarm is automatically cleared after the fault is rectified.</p>
</div>
<div class="section" id="ALM-29013__section46352032"><h4 class="sectiontitle">Related Information</h4><p id="ALM-29013__p38036089">None</p>
</div>
</div>
<div>
<div class="familylinks">
<div class="parentlink"><strong>Parent topic:</strong> <a href="mrs_01_1298.html">Alarm Reference (Applicable to MRS 3.x)</a></div>
</div>
</div>


@ -0,0 +1,91 @@
<a name="ALM-29014"></a><a name="ALM-29014"></a>
<h1 class="topictitle1">ALM-29014 Catalog FGC Time Exceeds the Threshold</h1>
<div id="body58101744"><div class="section" id="ALM-29014__section3980122974317"><h4 class="sectiontitle">Alarm Description</h4><p id="ALM-29014__p1598018293434">The system checks the FGC time of the Catalog service every 60 seconds. This alarm is generated when the FGC time exceeds the threshold (12 seconds) in five consecutive checks. This alarm is cleared when the FGC time is less than or equal to the threshold.</p>
</div>
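<p>The trigger and clear rule described above can be sketched as a small state machine. This is only an illustration under stated assumptions: the class and field names are hypothetical and are not part of FusionInsight Manager; only the 12-second default threshold, the 60-second check interval, and the five-consecutive-checks rule come from this topic.</p>

```python
from dataclasses import dataclass


@dataclass
class FgcAlarmState:
    """Tracks one Catalog instance. All names here are hypothetical."""

    threshold_s: float = 12.0  # default threshold stated in this topic
    trigger_count: int = 5     # consecutive breaches needed to raise the alarm
    breaches: int = 0
    active: bool = False

    def observe(self, fgc_time_s: float) -> bool:
        """Feed one 60-second sample and return whether the alarm is active."""
        if fgc_time_s > self.threshold_s:
            self.breaches += 1
            if self.breaches >= self.trigger_count:
                self.active = True
        else:
            # FGC time is less than or equal to the threshold: clear the alarm.
            self.breaches = 0
            self.active = False
        return self.active
```

<p>In this sketch, a single sample at or below the threshold resets the breach counter, so an isolated long FGC does not raise the alarm.</p>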
<div class="section" id="ALM-29014__section19801296431"><h4 class="sectiontitle">Alarm Attributes</h4>
<div class="tablenoborder"><table cellpadding="4" cellspacing="0" summary="" id="ALM-29014__table8980102912435" frame="border" border="1" rules="all"><thead align="left"><tr id="ALM-29014__row159801129114318"><th align="left" class="cellrowborder" valign="top" width="33.33333333333333%" id="mcps1.3.2.2.1.4.1.1"><p id="ALM-29014__p1498062964312">Alarm ID</p>
</th>
<th align="left" class="cellrowborder" valign="top" width="33.33333333333333%" id="mcps1.3.2.2.1.4.1.2"><p id="ALM-29014__p89801029144317">Alarm Severity</p>
</th>
<th align="left" class="cellrowborder" valign="top" width="33.33333333333333%" id="mcps1.3.2.2.1.4.1.3"><p id="ALM-29014__p09807291430">Auto Cleared</p>
</th>
</tr>
</thead>
<tbody><tr id="ALM-29014__row598020294439"><td class="cellrowborder" valign="top" width="33.33333333333333%" headers="mcps1.3.2.2.1.4.1.1 "><p id="ALM-29014__p398012911433">29014</p>
</td>
<td class="cellrowborder" valign="top" width="33.33333333333333%" headers="mcps1.3.2.2.1.4.1.2 "><p id="ALM-29014__p1798032914435">Major</p>
</td>
<td class="cellrowborder" valign="top" width="33.33333333333333%" headers="mcps1.3.2.2.1.4.1.3 "><p id="ALM-29014__p1098052917438">Yes</p>
</td>
</tr>
</tbody>
</table>
</div>
</div>
<div class="section" id="ALM-29014__section1498172914436"><h4 class="sectiontitle">Alarm Parameters</h4>
<div class="tablenoborder"><table cellpadding="4" cellspacing="0" summary="" id="ALM-29014__table1498152994320" frame="border" border="1" rules="all"><thead align="left"><tr id="ALM-29014__row1098142911430"><th align="left" class="cellrowborder" valign="top" width="20%" id="mcps1.3.3.2.1.4.1.1"><p id="ALM-29014__p563318417372">Type</p>
</th>
<th align="left" class="cellrowborder" valign="top" width="20%" id="mcps1.3.3.2.1.4.1.2"><p id="ALM-29014__p9981112918439">Parameter</p>
</th>
<th align="left" class="cellrowborder" valign="top" width="60%" id="mcps1.3.3.2.1.4.1.3"><p id="ALM-29014__p1498117299431">Description</p>
</th>
</tr>
</thead>
<tbody><tr id="ALM-29014__row16981132912430"><td class="cellrowborder" rowspan="4" valign="top" width="20%" headers="mcps1.3.3.2.1.4.1.1 "><p id="ALM-29014__p175841353173714">Location Information</p>
</td>
<td class="cellrowborder" valign="top" width="20%" headers="mcps1.3.3.2.1.4.1.2 "><p id="ALM-29014__p398172954315">Source</p>
</td>
<td class="cellrowborder" valign="top" width="60%" headers="mcps1.3.3.2.1.4.1.3 "><p id="ALM-29014__p1498152910435">Specifies the cluster for which the alarm was generated.</p>
</td>
</tr>
<tr id="ALM-29014__row298132919438"><td class="cellrowborder" valign="top" headers="mcps1.3.3.2.1.4.1.1 "><p id="ALM-29014__p1298114291434">ServiceName</p>
</td>
<td class="cellrowborder" valign="top" headers="mcps1.3.3.2.1.4.1.2 "><p id="ALM-29014__p7981162911438">Specifies the service for which the alarm was generated.</p>
</td>
</tr>
<tr id="ALM-29014__row10981172954311"><td class="cellrowborder" valign="top" headers="mcps1.3.3.2.1.4.1.1 "><p id="ALM-29014__p598111297437">RoleName</p>
</td>
<td class="cellrowborder" valign="top" headers="mcps1.3.3.2.1.4.1.2 "><p id="ALM-29014__p79811629144314">Specifies the role for which the alarm was generated.</p>
</td>
</tr>
<tr id="ALM-29014__row1898182934315"><td class="cellrowborder" valign="top" headers="mcps1.3.3.2.1.4.1.1 "><p id="ALM-29014__p098115293435">HostName</p>
</td>
<td class="cellrowborder" valign="top" headers="mcps1.3.3.2.1.4.1.2 "><p id="ALM-29014__p1298172934317">Specifies the host for which the alarm was generated.</p>
</td>
</tr>
<tr id="ALM-29014__row4981182904315"><td class="cellrowborder" valign="top" width="20%" headers="mcps1.3.3.2.1.4.1.1 "><p id="ALM-29014__p1163314123715">Additional Information</p>
</td>
<td class="cellrowborder" valign="top" width="20%" headers="mcps1.3.3.2.1.4.1.2 "><p id="ALM-29014__p109811529154312">Trigger Condition</p>
</td>
<td class="cellrowborder" valign="top" width="60%" headers="mcps1.3.3.2.1.4.1.3 "><p id="ALM-29014__p8981182916434">Specifies the threshold for triggering the alarm.</p>
</td>
</tr>
</tbody>
</table>
</div>
</div>
<div class="section" id="ALM-29014__section1798112919439"><h4 class="sectiontitle">Impact on the System</h4><p id="ALM-29014__p898142994317">Data reads and writes are affected.</p>
</div>
<div class="section" id="ALM-29014__section29811829134319"><h4 class="sectiontitle">Possible Causes</h4><p id="ALM-29014__p39811229174312">The memory of the node instance is overused or the heap memory is inappropriately allocated, which causes frequent GC.</p>
</div>
<div class="section" id="ALM-29014__section6981172914311"><h4 class="sectiontitle">Handling Procedure</h4><p class="tableheading" id="ALM-29014__p498117295433"><strong id="ALM-29014__b484018292591">Check the GC time.</strong></p>
<ol id="ALM-29014__ol109821829164313"><li id="ALM-29014__li10982129144318"><a name="ALM-29014__li10982129144318"></a><a name="li10982129144318"></a><span>Choose <strong id="ALM-29014__b651173512595">O&amp;M</strong> &gt; <strong id="ALM-29014__b1751535175914">Alarm</strong> &gt; <strong id="ALM-29014__b2051143575918">Thresholds</strong>, click the name of the desired cluster, choose <strong id="ALM-29014__b551123516590">Impala</strong> &gt; <strong id="ALM-29014__b1351183511599">Process FGCT</strong> &gt; <strong id="ALM-29014__b251163515591">Process FGCT of Catalog (Catalog)</strong>, and check the threshold (12s by default).</span><p><p id="ALM-29014__p95139951510"><span><img id="ALM-29014__image74411454134614" src="en-us_image_0000001971010178.png"></span></p>
</p></li><li id="ALM-29014__li198242914319"><span>Log in to FusionInsight Manager, choose <strong id="ALM-29014__b366210414017">O&amp;M</strong> &gt; <strong id="ALM-29014__b12662134407">Alarm</strong> &gt; <strong id="ALM-29014__b1866220415018">Alarms</strong>, and check whether the alarm whose <strong id="ALM-29014__b8662645014">Alarm ID</strong> is <strong id="ALM-29014__b466244604">29014</strong> exists in the alarm list.</span><p><ul id="ALM-29014__ul19814443101610"><li id="ALM-29014__li18141943121617">If yes, go to <a href="#ALM-29014__li244181931712">3</a>.</li><li id="ALM-29014__li237711467169">If no, no further action is required.</li></ul>
</p></li><li id="ALM-29014__li244181931712"><a name="ALM-29014__li244181931712"></a><a name="li244181931712"></a><span>On FusionInsight Manager, choose <strong id="ALM-29014__b15332201608">Cluster</strong> &gt; <strong id="ALM-29014__b93312202014">Impala</strong>, click the <strong id="ALM-29014__b1833020701">Instances</strong> tab, select the Catalog instance for which the alarm is generated, then click the <strong id="ALM-29014__b53315201002">Chart</strong> tab, locate the <strong id="ALM-29014__b20336201408">Process FGCT</strong> chart, and check whether the FGC time is greater than the threshold in <a href="#ALM-29014__li10982129144318">1</a>.</span><p><ul id="ALM-29014__ul122729421179"><li id="ALM-29014__li13272242121710">If yes, go to <a href="#ALM-29014__li12325539141817">4</a>.</li><li id="ALM-29014__li17888154471713">If no, go to <a href="#ALM-29014__li1698242954313">5</a>.</li></ul>
</p></li><li id="ALM-29014__li12325539141817"><a name="ALM-29014__li12325539141817"></a><a name="li12325539141817"></a><span>Choose <strong id="ALM-29014__b14976134810111">O&amp;M</strong> &gt; <strong id="ALM-29014__b199762481215">Alarm</strong> &gt; <strong id="ALM-29014__b1097620483113">Thresholds</strong>, click the name of the desired cluster, choose <strong id="ALM-29014__b49773486110">Impala</strong> &gt; <strong id="ALM-29014__b169774487110">Process FGCT</strong> &gt; <strong id="ALM-29014__b5977848418">Process FGCT of Catalog (Catalog)</strong>, and change the threshold to a value greater than the time obtained in <a href="#ALM-29014__li244181931712">3</a>. Then, check whether the alarm is cleared.</span><p><ul id="ALM-29014__ul510774915176"><li id="ALM-29014__li131082049111717">If yes, no further action is required.</li><li id="ALM-29014__li4391255141719">If no, go to <a href="#ALM-29014__li1698242954313">5</a>.</li></ul>
</p></li></ol>
<p class="tableheading" id="ALM-29014__p39821129144316"><strong id="ALM-29014__b14938391926">Collect fault information.</strong></p>
<ol start="5" id="ALM-29014__ol189821329134317"><li id="ALM-29014__li1698242954313"><a name="ALM-29014__li1698242954313"></a><a name="li1698242954313"></a><span>On FusionInsight Manager of the active or standby cluster, choose <strong id="ALM-29014__b1075415408215">O&amp;M</strong>. In the navigation pane on the left, choose <strong id="ALM-29014__b127541940928">Log</strong> &gt; <strong id="ALM-29014__b47542404216">Download</strong>.</span></li><li id="ALM-29014__li27049781154249"><span>Expand the <strong id="ALM-29014__b3810242725">Service</strong> drop-down list, and select <strong id="ALM-29014__b381020421218">Impala</strong> for the target cluster.</span></li><li id="ALM-29014__li1498212919436"><span>Click the edit icon in the upper right corner, and set <strong id="ALM-29014__b107981112115816">Start Date</strong> and <strong id="ALM-29014__b1879815124585">End Date</strong> for log collection to 10 minutes ahead of and after the alarm generation time, respectively. Then, click <strong id="ALM-29014__b19798121215811">Download</strong>.</span></li><li id="ALM-29014__li56393916154249"><span>Contact <span id="ALM-29014__text1230616520211">O&amp;M personnel</span> and provide the collected logs.</span></li></ol>
</div>
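<p>The log collection window in the procedure above (10 minutes before and after the alarm generation time) can be computed as follows. This is a minimal sketch: the function name and the sample timestamp are hypothetical, and the resulting values still have to be entered manually as <strong>Start Date</strong> and <strong>End Date</strong>.</p>

```python
from datetime import datetime, timedelta


def log_collection_window(alarm_time: datetime, margin_minutes: int = 10):
    """Return (start, end) covering margin_minutes before and after the alarm."""
    margin = timedelta(minutes=margin_minutes)
    return alarm_time - margin, alarm_time + margin


# Hypothetical alarm generation time, used for illustration only.
start, end = log_collection_window(datetime(2024, 8, 2, 14, 30))
print(start.isoformat(), end.isoformat())
```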
<div class="section" id="ALM-29014__section19982122910436"><h4 class="sectiontitle">Alarm Clearance</h4><p id="ALM-29014__p149821529154312">This alarm is automatically cleared after the fault is rectified.</p>
</div>
<div class="section" id="ALM-29014__section1298211296431"><h4 class="sectiontitle">Related Information</h4><p id="ALM-29014__p12982152913438">None</p>
</div>
</div>
<div>
<div class="familylinks">
<div class="parentlink"><strong>Parent topic:</strong> <a href="mrs_01_1298.html">Alarm Reference (Applicable to MRS 3.x)</a></div>
</div>
</div>


@ -0,0 +1,92 @@
<a name="ALM-29015"></a><a name="ALM-29015"></a>
<h1 class="topictitle1">ALM-29015 Catalog Process Memory Usage Exceeds the Threshold</h1>
<div id="body3948078"><div class="section" id="ALM-29015__section3980122974317"><h4 class="sectiontitle">Alarm Description</h4><p id="ALM-29015__p1658442017419">The system checks the memory usage of the Catalog process every 30 seconds. This alarm is generated when the system detects that the memory usage exceeds the default threshold (80%).</p>
<p id="ALM-29015__p2335053105020">This alarm is automatically cleared when the system detects that the memory usage of the process falls below the threshold.</p>
</div>
<div class="section" id="ALM-29015__section19801296431"><h4 class="sectiontitle">Alarm Attributes</h4>
<div class="tablenoborder"><table cellpadding="4" cellspacing="0" summary="" id="ALM-29015__table8980102912435" frame="border" border="1" rules="all"><thead align="left"><tr id="ALM-29015__row159801129114318"><th align="left" class="cellrowborder" valign="top" width="33.33333333333333%" id="mcps1.3.2.2.1.4.1.1"><p id="ALM-29015__p1498062964312">Alarm ID</p>
</th>
<th align="left" class="cellrowborder" valign="top" width="33.33333333333333%" id="mcps1.3.2.2.1.4.1.2"><p id="ALM-29015__p89801029144317">Alarm Severity</p>
</th>
<th align="left" class="cellrowborder" valign="top" width="33.33333333333333%" id="mcps1.3.2.2.1.4.1.3"><p id="ALM-29015__p09807291430">Auto Cleared</p>
</th>
</tr>
</thead>
<tbody><tr id="ALM-29015__row598020294439"><td class="cellrowborder" valign="top" width="33.33333333333333%" headers="mcps1.3.2.2.1.4.1.1 "><p id="ALM-29015__p398012911433">29015</p>
</td>
<td class="cellrowborder" valign="top" width="33.33333333333333%" headers="mcps1.3.2.2.1.4.1.2 "><p id="ALM-29015__p1798032914435">Major</p>
</td>
<td class="cellrowborder" valign="top" width="33.33333333333333%" headers="mcps1.3.2.2.1.4.1.3 "><p id="ALM-29015__p1098052917438">Yes</p>
</td>
</tr>
</tbody>
</table>
</div>
</div>
<div class="section" id="ALM-29015__section1498172914436"><h4 class="sectiontitle">Alarm Parameters</h4>
<div class="tablenoborder"><table cellpadding="4" cellspacing="0" summary="" id="ALM-29015__table1498152994320" frame="border" border="1" rules="all"><thead align="left"><tr id="ALM-29015__row1098142911430"><th align="left" class="cellrowborder" valign="top" width="20%" id="mcps1.3.3.2.1.4.1.1"><p id="ALM-29015__p8801612133810">Type</p>
</th>
<th align="left" class="cellrowborder" valign="top" width="20%" id="mcps1.3.3.2.1.4.1.2"><p id="ALM-29015__p9981112918439">Parameter</p>
</th>
<th align="left" class="cellrowborder" valign="top" width="60%" id="mcps1.3.3.2.1.4.1.3"><p id="ALM-29015__p1498117299431">Description</p>
</th>
</tr>
</thead>
<tbody><tr id="ALM-29015__row16981132912430"><td class="cellrowborder" rowspan="4" valign="top" width="20%" headers="mcps1.3.3.2.1.4.1.1 "><p id="ALM-29015__p2442413819">Location Information</p>
</td>
<td class="cellrowborder" valign="top" width="20%" headers="mcps1.3.3.2.1.4.1.2 "><p id="ALM-29015__p398172954315">Source</p>
</td>
<td class="cellrowborder" valign="top" width="60%" headers="mcps1.3.3.2.1.4.1.3 "><p id="ALM-29015__p1498152910435">Specifies the cluster for which the alarm was generated.</p>
</td>
</tr>
<tr id="ALM-29015__row298132919438"><td class="cellrowborder" valign="top" headers="mcps1.3.3.2.1.4.1.1 "><p id="ALM-29015__p1298114291434">ServiceName</p>
</td>
<td class="cellrowborder" valign="top" headers="mcps1.3.3.2.1.4.1.2 "><p id="ALM-29015__p7981162911438">Specifies the service for which the alarm was generated.</p>
</td>
</tr>
<tr id="ALM-29015__row10981172954311"><td class="cellrowborder" valign="top" headers="mcps1.3.3.2.1.4.1.1 "><p id="ALM-29015__p598111297437">RoleName</p>
</td>
<td class="cellrowborder" valign="top" headers="mcps1.3.3.2.1.4.1.2 "><p id="ALM-29015__p79811629144314">Specifies the role for which the alarm was generated.</p>
</td>
</tr>
<tr id="ALM-29015__row1898182934315"><td class="cellrowborder" valign="top" headers="mcps1.3.3.2.1.4.1.1 "><p id="ALM-29015__p098115293435">HostName</p>
</td>
<td class="cellrowborder" valign="top" headers="mcps1.3.3.2.1.4.1.2 "><p id="ALM-29015__p1298172934317">Specifies the host for which the alarm was generated.</p>
</td>
</tr>
<tr id="ALM-29015__row4981182904315"><td class="cellrowborder" valign="top" width="20%" headers="mcps1.3.3.2.1.4.1.1 "><p id="ALM-29015__p1380181215387">Additional Information</p>
</td>
<td class="cellrowborder" valign="top" width="20%" headers="mcps1.3.3.2.1.4.1.2 "><p id="ALM-29015__p109811529154312">Trigger Condition</p>
</td>
<td class="cellrowborder" valign="top" width="60%" headers="mcps1.3.3.2.1.4.1.3 "><p id="ALM-29015__p8981182916434">Specifies the threshold for triggering the alarm.</p>
</td>
</tr>
</tbody>
</table>
</div>
</div>
<div class="section" id="ALM-29015__section1798112919439"><h4 class="sectiontitle">Impact on the System</h4><p id="ALM-29015__p485055019508">The memory usage is too high. Some query tasks may fail due to insufficient memory.</p>
</div>
<div class="section" id="ALM-29015__section29811829134319"><h4 class="sectiontitle">Possible Causes</h4><p id="ALM-29015__p39811229174312">The memory of the node instance is overused or the memory is inappropriately configured.</p>
</div>
<div class="section" id="ALM-29015__section1156322720376"><h4 class="sectiontitle">Handling Procedure</h4><ol id="ALM-29015__ol1598183211416"><li id="ALM-29015__li39816321245"><span>On FusionInsight Manager, choose <strong id="ALM-29015__b132371340784">O&amp;M</strong> &gt; <strong id="ALM-29015__b192371740183">Alarm</strong> &gt; <strong id="ALM-29015__b1123710401586">Thresholds</strong> &gt; <strong id="ALM-29015__b1323744014819">Impala</strong> &gt; <strong id="ALM-29015__b723712409820">CPU and Memory</strong> &gt; <strong id="ALM-29015__b1023717402819">Catalog Process Memory Usage (Impalad)</strong> and check the threshold.</span></li><li id="ALM-29015__li6595161750"><span>If the alarm threshold is smaller than 80%, increase the alarm threshold as required and check whether the alarm is cleared.</span><p><ul id="ALM-29015__ul941175682912"><li id="ALM-29015__li241456102913">If yes, no further action is required.</li><li id="ALM-29015__li8391144284811">If no, go to <a href="#ALM-29015__li54643151153">3</a>.</li></ul>
</p></li><li id="ALM-29015__li54643151153"><a name="ALM-29015__li54643151153"></a><a name="li54643151153"></a><span>If the threshold is greater than 80%, check whether a large number of concurrent query tasks exist when the alarm is generated. A large number of concurrent query tasks will cause the memory usage to increase sharply. After the tasks are complete, check whether the alarm is automatically cleared. During this period, some tasks may fail to be executed or may be canceled due to insufficient memory. In this case, try again.</span><p><div class="note" id="ALM-29015__note19651145104520"><img src="public_sys-resources/note_3.0-en-us.png"><span class="notetitle"> </span><div class="notebody"><p id="ALM-29015__p1365184564518">If the memory usage always exceeds the threshold, the cluster capacity needs to be expanded.</p>
</div></div>
<ul id="ALM-29015__ul1833142413458"><li id="ALM-29015__li1783316243457">If yes, no further action is required.</li><li id="ALM-29015__li1483314248456">If no, go to <a href="#ALM-29015__li1698242954313">4</a>.</li></ul>
</p></li></ol>
<p class="tableheading" id="ALM-29015__p39821129144316"><strong id="ALM-29015__b1711152219106">Collect fault information.</strong></p>
<ol start="4" id="ALM-29015__ol189821329134317"><li id="ALM-29015__li1698242954313"><a name="ALM-29015__li1698242954313"></a><a name="li1698242954313"></a><span>On FusionInsight Manager of the active or standby cluster, choose <strong id="ALM-29015__b13408162361010">O&amp;M</strong>. In the navigation pane on the left, choose <strong id="ALM-29015__b16408023141010">Log</strong> &gt; <strong id="ALM-29015__b940892381011">Download</strong>.</span></li><li id="ALM-29015__li27049781154249"><span>Expand the <strong id="ALM-29015__b1572314241104">Service</strong> drop-down list, and select <strong id="ALM-29015__b1672372481016">Impala</strong> for the target cluster.</span></li><li id="ALM-29015__li1498212919436"><span>Click the edit icon in the upper right corner, and set <strong id="ALM-29015__b525332618104">Start Date</strong> and <strong id="ALM-29015__b225320269102">End Date</strong> for log collection to 10 minutes ahead of and after the alarm generation time, respectively. Then, click <strong id="ALM-29015__b82532026121016">Download</strong>.</span></li><li id="ALM-29015__li56393916154249"><span>Contact <span id="ALM-29015__text17541182712105">O&amp;M personnel</span> and provide the collected logs.</span></li></ol>
</div>
<div class="section" id="ALM-29015__section19982122910436"><h4 class="sectiontitle">Alarm Clearance</h4><p id="ALM-29015__p149821529154312">This alarm is automatically cleared after the fault is rectified.</p>
</div>
<div class="section" id="ALM-29015__section1298211296431"><h4 class="sectiontitle">Related Information</h4><p id="ALM-29015__p12982152913438">None</p>
</div>
<p id="ALM-29015__p8060118"></p>
</div>
<div>
<div class="familylinks">
<div class="parentlink"><strong>Parent topic:</strong> <a href="mrs_01_1298.html">Alarm Reference (Applicable to MRS 3.x)</a></div>
</div>
</div>


@ -0,0 +1,86 @@
<a name="ALM-29016"></a><a name="ALM-29016"></a>
<h1 class="topictitle1">ALM-29016 Impalad Instance in the Sub-healthy State</h1>
<div id="body37584996"><div class="section" id="ALM-29016__section3980122974317"><h4 class="sectiontitle">Alarm Description</h4><p id="ALM-29016__p1598018293434">In MRS 3.1.5, the system checks every 60 seconds whether the Hive Server2 HTTP port (28000) of Impalad responds to cURL requests. This alarm is generated when a correct response is not returned within 20 seconds in two consecutive checks. This alarm is cleared when a correct response is returned within 20 seconds.</p>
<p id="ALM-29016__p15052028153413">In other MRS versions, the system checks every 60 seconds whether Impalad can execute <strong id="ALM-29016__b345731318319">select 1</strong>. This alarm is generated when the SQL statement is not correctly executed within 20 seconds in two consecutive checks. This alarm is cleared when the SQL statement is correctly executed within 20 seconds.</p>
</div>
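<p>The port check described for MRS 3.1.5 can be approximated with the sketch below. It rests on several assumptions that this topic does not confirm: the probe uses plain HTTP on the root path, any HTTP status below 400 counts as a correct response, and the names <code>http_probe</code> and <code>SubHealthMonitor</code> are hypothetical. A security-enabled cluster will likely require HTTPS and authentication, which are omitted here.</p>

```python
import http.client
import urllib.request


def http_probe(host: str, port: int = 28000, timeout_s: float = 20.0) -> bool:
    """Return True if the port answers an HTTP request within timeout_s.

    Assumption: plain HTTP, root path, and status below 400 means healthy.
    """
    url = f"http://{host}:{port}/"
    try:
        with urllib.request.urlopen(url, timeout=timeout_s) as resp:
            return 400 > resp.status
    except (OSError, http.client.HTTPException):
        # Covers refused connections, timeouts, and HTTP error statuses.
        return False


class SubHealthMonitor:
    """Raises the alarm after consecutive failed checks (hypothetical helper)."""

    def __init__(self, failures_needed: int = 2):
        self.failures_needed = failures_needed
        self.failures = 0

    def record(self, ok: bool) -> bool:
        """Record one 60-second check; return True while the alarm is active."""
        self.failures = 0 if ok else self.failures + 1
        return self.failures >= self.failures_needed
```

<p>A caller would invoke <code>record(http_probe(host))</code> once per check interval. One successful check resets the failure counter, which matches the clearing condition stated above.</p>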
<div class="section" id="ALM-29016__section19801296431"><h4 class="sectiontitle">Alarm Attributes</h4>
<div class="tablenoborder"><table cellpadding="4" cellspacing="0" summary="" id="ALM-29016__table8980102912435" frame="border" border="1" rules="all"><thead align="left"><tr id="ALM-29016__row159801129114318"><th align="left" class="cellrowborder" valign="top" width="33.33333333333333%" id="mcps1.3.2.2.1.4.1.1"><p id="ALM-29016__p1498062964312">Alarm ID</p>
</th>
<th align="left" class="cellrowborder" valign="top" width="33.33333333333333%" id="mcps1.3.2.2.1.4.1.2"><p id="ALM-29016__p89801029144317">Alarm Severity</p>
</th>
<th align="left" class="cellrowborder" valign="top" width="33.33333333333333%" id="mcps1.3.2.2.1.4.1.3"><p id="ALM-29016__p09807291430">Auto Cleared</p>
</th>
</tr>
</thead>
<tbody><tr id="ALM-29016__row598020294439"><td class="cellrowborder" valign="top" width="33.33333333333333%" headers="mcps1.3.2.2.1.4.1.1 "><p id="ALM-29016__p398012911433">29016</p>
</td>
<td class="cellrowborder" valign="top" width="33.33333333333333%" headers="mcps1.3.2.2.1.4.1.2 "><p id="ALM-29016__p1798032914435">Minor</p>
</td>
<td class="cellrowborder" valign="top" width="33.33333333333333%" headers="mcps1.3.2.2.1.4.1.3 "><p id="ALM-29016__p1098052917438">Yes</p>
</td>
</tr>
</tbody>
</table>
</div>
</div>
<div class="section" id="ALM-29016__section1498172914436"><h4 class="sectiontitle">Alarm Parameters</h4>
<div class="tablenoborder"><table cellpadding="4" cellspacing="0" summary="" id="ALM-29016__table1498152994320" frame="border" border="1" rules="all"><thead align="left"><tr id="ALM-29016__row1098142911430"><th align="left" class="cellrowborder" valign="top" width="20%" id="mcps1.3.3.2.1.4.1.1"><p id="ALM-29016__p13622174219387">Type</p>
</th>
<th align="left" class="cellrowborder" valign="top" width="20%" id="mcps1.3.3.2.1.4.1.2"><p id="ALM-29016__p9981112918439">Parameter</p>
</th>
<th align="left" class="cellrowborder" valign="top" width="60%" id="mcps1.3.3.2.1.4.1.3"><p id="ALM-29016__p1498117299431">Description</p>
</th>
</tr>
</thead>
<tbody><tr id="ALM-29016__row16981132912430"><td class="cellrowborder" rowspan="4" valign="top" width="20%" headers="mcps1.3.3.2.1.4.1.1 "><p id="ALM-29016__p8242195393817">Location Information</p>
</td>
<td class="cellrowborder" valign="top" width="20%" headers="mcps1.3.3.2.1.4.1.2 "><p id="ALM-29016__p398172954315">Source</p>
</td>
<td class="cellrowborder" valign="top" width="60%" headers="mcps1.3.3.2.1.4.1.3 "><p id="ALM-29016__p1498152910435">Specifies the cluster for which the alarm was generated.</p>
</td>
</tr>
<tr id="ALM-29016__row298132919438"><td class="cellrowborder" valign="top" headers="mcps1.3.3.2.1.4.1.1 "><p id="ALM-29016__p1298114291434">ServiceName</p>
</td>
<td class="cellrowborder" valign="top" headers="mcps1.3.3.2.1.4.1.2 "><p id="ALM-29016__p7981162911438">Specifies the service for which the alarm was generated.</p>
</td>
</tr>
<tr id="ALM-29016__row10981172954311"><td class="cellrowborder" valign="top" headers="mcps1.3.3.2.1.4.1.1 "><p id="ALM-29016__p598111297437">RoleName</p>
</td>
<td class="cellrowborder" valign="top" headers="mcps1.3.3.2.1.4.1.2 "><p id="ALM-29016__p79811629144314">Specifies the role for which the alarm was generated.</p>
</td>
</tr>
<tr id="ALM-29016__row1898182934315"><td class="cellrowborder" valign="top" headers="mcps1.3.3.2.1.4.1.1 "><p id="ALM-29016__p098115293435">HostName</p>
</td>
<td class="cellrowborder" valign="top" headers="mcps1.3.3.2.1.4.1.2 "><p id="ALM-29016__p1298172934317">Specifies the host for which the alarm was generated.</p>
</td>
</tr>
</tbody>
</table>
</div>
</div>
<div class="section" id="ALM-29016__section1798112919439"><h4 class="sectiontitle">Impact on the System</h4><p id="ALM-29016__p898142994317">Impalad cannot execute SQL statements or SQL statement execution times out, which affects data reads and writes.</p>
</div>
<div class="section" id="ALM-29016__section29811829134319"><h4 class="sectiontitle">Possible Causes</h4><p id="ALM-29016__p39811229174312">The Impalad service maintains too many queries.</p>
</div>
<div class="section" id="ALM-29016__section6981172914311"><h4 class="sectiontitle">Handling Procedure</h4><ol id="ALM-29016__ol109821829164313"><li id="ALM-29016__li10982129144318"><span>Log in to FusionInsight Manager and choose <strong id="ALM-29016__b162912408151">Cluster</strong> &gt; <strong id="ALM-29016__b136931429157">Services</strong> &gt; <strong id="ALM-29016__b5510124691519">Impala</strong> &gt; <strong id="ALM-29016__b48111548171515">Impalad Web UI</strong>. On the displayed page, click any node to go to the web UI.</span></li><li id="ALM-29016__li19622101341418"><span>On the web UI, click <strong id="ALM-29016__b124811016171614">/backends</strong> to view the Impala instance list. Locate the instance for which the alarm is generated and click <strong id="ALM-29016__b169183211619">Web UI</strong>. After the web UI of the subhealthy node is displayed, click <strong id="ALM-29016__b11697151542219">/queries</strong> to check the task execution status and check whether any task is executed slowly.</span><p><ul id="ALM-29016__ul1279633192116"><li id="ALM-29016__li5796435213">If yes, go to <a href="#ALM-29016__li918651451111">3</a>.</li><li id="ALM-29016__li157961332213">If no, go to <a href="#ALM-29016__li668151171315">4</a>.</li></ul>
</p></li><li id="ALM-29016__li918651451111"><a name="ALM-29016__li918651451111"></a><a name="li918651451111"></a><span>After the task is complete, check whether the alarm is cleared.</span><p><ul id="ALM-29016__ul122729421179"><li id="ALM-29016__li13272242121710">If yes, no further action is required.</li><li id="ALM-29016__li17888154471713">If no, go to <a href="#ALM-29016__li668151171315">4</a>.</li></ul>
</p></li><li id="ALM-29016__li668151171315"><a name="ALM-29016__li668151171315"></a><a name="li668151171315"></a><span>On FusionInsight Manager, choose <strong id="ALM-29016__b75927321254">Cluster</strong> &gt; <strong id="ALM-29016__b321002334112">Services</strong> &gt; <strong id="ALM-29016__b125597344517">Impala</strong> &gt; <strong id="ALM-29016__b82444401956">Instances</strong>, select the Impala instance for which the alarm is generated, click <strong id="ALM-29016__b47915574404">More</strong>, and select <strong id="ALM-29016__b1219619624110">Restart Instance</strong>. Then, check whether the alarm is cleared.</span><p><ul id="ALM-29016__ul20421612152818"><li id="ALM-29016__li1942141232810">If yes, no further action is required.</li><li id="ALM-29016__li542131217283">If no, go to <a href="#ALM-29016__li1698242954313">5</a>.<div class="note" id="ALM-29016__note220918329390"><img src="public_sys-resources/note_3.0-en-us.png"><span class="notetitle"> </span><div class="notebody"><p id="ALM-29016__en-us_topic_0294848303_p0481144124012">The service will become unavailable when all instances are restarted. If a single instance is restarted, the tasks being executed on that instance will fail, but the service will remain available.</p>
</div></div>
</li></ul>
</p></li></ol>
<p class="tableheading" id="ALM-29016__p39821129144316"><strong id="ALM-29016__b383702116612">Collect fault information.</strong></p>
<ol start="5" id="ALM-29016__ol189821329134317"><li id="ALM-29016__li1698242954313"><a name="ALM-29016__li1698242954313"></a><a name="li1698242954313"></a><span>On FusionInsight Manager of the active or standby cluster, choose <strong id="ALM-29016__b4251224665">O&amp;M</strong>. In the navigation pane on the left, choose <strong id="ALM-29016__b112513241267">Log</strong> &gt; <strong id="ALM-29016__b6261241861">Download</strong>.</span></li><li id="ALM-29016__li27049781154249"><span>Expand the <strong id="ALM-29016__b638117322617">Service</strong> drop-down list, and select <strong id="ALM-29016__b1738115324611">Impala</strong> for the target cluster.</span></li><li id="ALM-29016__li1498212919436"><span>Click <span><img id="ALM-29016__image2098272984311" src="en-us_image_0000002007530509.png"></span> in the upper right corner, and set <strong id="ALM-29016__b07561599719">Start Date</strong> and <strong id="ALM-29016__b87561697718">End Date</strong> for log collection to 10 minutes ahead of and after the alarm generation time, respectively. Then, click <strong id="ALM-29016__b19756498719">Download</strong>.</span></li><li id="ALM-29016__li56393916154249"><span>Contact <span id="ALM-29016__text3528218674">O&amp;M personnel</span> and provide the collected logs.</span></li></ol>
</div>
<div class="section" id="ALM-29016__section19982122910436"><h4 class="sectiontitle">Alarm Clearance</h4><p id="ALM-29016__p149821529154312">This alarm is automatically cleared after the fault is rectified.</p>
</div>
<div class="section" id="ALM-29016__section1298211296431"><h4 class="sectiontitle">Related Information</h4><p id="ALM-29016__p12982152913438">None</p>
</div>
<p id="ALM-29016__p8060118"></p>
</div>
<div>
<div class="familylinks">
<div class="parentlink"><strong>Parent topic:</strong> <a href="mrs_01_1298.html">Alarm Reference (Applicable to MRS 3.x)</a></div>
</div>
</div>


@ -0,0 +1,98 @@
<a name="ALM-45003"></a><a name="ALM-45003"></a>
<h1 class="topictitle1">ALM-45003 HetuEngine QAS Disk Capacity Is Insufficient</h1>
<div id="body0000001971621298"><p id="ALM-45003__p12261122253615">This section applies to MRS 3.3.0 or later.</p>
<div class="section" id="ALM-45003__section38830555"><h4 class="sectiontitle"><span id="ALM-45003__text19212234396">Alarm Description</span></h4><p id="ALM-45003__p61215157">The system checks the HetuEngine QAS disk usage every 60 seconds and compares the actual disk usage with the threshold. The disk usage has a default threshold. This alarm is generated if the disk usage exceeds the threshold.</p>
<p id="ALM-45003__p14065507">To change the threshold, choose <strong id="ALM-45003__b1490134013289">O&amp;M</strong> &gt; <strong id="ALM-45003__b16743124210283">Alarm</strong> &gt; <strong id="ALM-45003__b124013454285">Thresholds</strong>. In the service list, choose <strong id="ALM-45003__b15641525153414">HetuEngine</strong> &gt; <strong id="ALM-45003__b634772863417">Disk</strong> &gt; <strong id="ALM-45003__b1669241615342">QAS Disk Usage (QAS)</strong>.</p>
<p id="ALM-45003__p59480702">If the <strong id="ALM-45003__b18138184493119">Trigger Count</strong> is <strong id="ALM-45003__b14138114453111">1</strong>, this alarm is cleared when the usage of the HetuEngine QAS disk is less than or equal to the threshold. If the <strong id="ALM-45003__b813804433116">Trigger Count</strong> is greater than <strong id="ALM-45003__b81381644163112">1</strong>, this alarm is cleared when the disk usage is less than or equal to 80% of the threshold.</p>
</div>
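<p>The clearing rule described in the Alarm Description can be sketched in a few lines of Python. This is only an illustration of the rule as stated, not FusionInsight Manager code; the function name <code>alarm_cleared</code> and its parameters are hypothetical.</p>

```python
def alarm_cleared(usage_pct, threshold_pct, trigger_count):
    """Return True if the alarm would clear under the documented rule.

    usage_pct, threshold_pct: disk usage and alarm threshold, in percent.
    trigger_count: the Trigger Count configured for the threshold.
    """
    if trigger_count == 1:
        # Clears as soon as usage is back at or below the threshold.
        return usage_pct <= threshold_pct
    # With a Trigger Count greater than 1, usage must fall to
    # 80% of the threshold or lower before the alarm clears.
    return usage_pct <= threshold_pct * 80 / 100
```

<p>For example, with a threshold of 80% and a Trigger Count of 3, the alarm would clear only after usage drops to 64% or lower.</p>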
<div class="section" id="ALM-45003__section13930683"><h4 class="sectiontitle"><span id="ALM-45003__text1568221154719">Alarm Attributes</span></h4>
<div class="tablenoborder"><table cellpadding="4" cellspacing="0" summary="" id="ALM-45003__table53207527" frame="border" border="1" rules="all"><thead align="left"><tr id="ALM-45003__row5975028"><th align="left" class="cellrowborder" valign="top" width="33.33333333333333%" id="mcps1.3.3.2.1.4.1.1"><p id="ALM-45003__p14215279"><span id="ALM-45003__text7326193625810">Alarm ID</span></p>
</th>
<th align="left" class="cellrowborder" valign="top" width="33.33333333333333%" id="mcps1.3.3.2.1.4.1.2"><p id="ALM-45003__p10586963"><span id="ALM-45003__text88707444586">Alarm Severity</span></p>
</th>
<th align="left" class="cellrowborder" valign="top" width="33.33333333333333%" id="mcps1.3.3.2.1.4.1.3"><p id="ALM-45003__p52237638"><span id="ALM-45003__text20302185317581">Auto Cleared</span></p>
</th>
</tr>
</thead>
<tbody><tr id="ALM-45003__row3390323"><td class="cellrowborder" valign="top" width="33.33333333333333%" headers="mcps1.3.3.2.1.4.1.1 "><p id="ALM-45003__p6180736">45003</p>
</td>
<td class="cellrowborder" valign="top" width="33.33333333333333%" headers="mcps1.3.3.2.1.4.1.2 "><p id="ALM-45003__p30877589">Major</p>
</td>
<td class="cellrowborder" valign="top" width="33.33333333333333%" headers="mcps1.3.3.2.1.4.1.3 "><p id="ALM-45003__p18056795">Yes</p>
</td>
</tr>
</tbody>
</table>
</div>
</div>
<div class="section" id="ALM-45003__section58267284"><h4 class="sectiontitle"><span id="ALM-45003__text156858325471">Alarm Parameters</span></h4>
<div class="tablenoborder"><table cellpadding="4" cellspacing="0" summary="" id="ALM-45003__table53314297" frame="border" border="1" rules="all"><thead align="left"><tr id="ALM-45003__row12062690"><th align="left" class="cellrowborder" valign="top" width="50%" id="mcps1.3.4.2.1.3.1.1"><p id="ALM-45003__p37553849"><span id="ALM-45003__text8840132255920">Parameter</span></p>
</th>
<th align="left" class="cellrowborder" valign="top" width="50%" id="mcps1.3.4.2.1.3.1.2"><p id="ALM-45003__p21962957"><span id="ALM-45003__text55455293594">Description</span></p>
</th>
</tr>
</thead>
<tbody><tr id="ALM-45003__row04245146811"><td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.4.2.1.3.1.1 "><p id="ALM-45003__p17935380415">Source</p>
</td>
<td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.4.2.1.3.1.2 "><p id="ALM-45003__p187931338134115">Specifies the cluster for which the alarm is generated.</p>
</td>
</tr>
<tr id="ALM-45003__row34169093"><td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.4.2.1.3.1.1 "><p id="ALM-45003__p41293795">ServiceName</p>
</td>
<td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.4.2.1.3.1.2 "><p id="ALM-45003__p39817731">Specifies the service for which the alarm is generated.</p>
</td>
</tr>
<tr id="ALM-45003__row22815263"><td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.4.2.1.3.1.1 "><p id="ALM-45003__p23892775">RoleName</p>
</td>
<td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.4.2.1.3.1.2 "><p id="ALM-45003__p38175373">Specifies the role for which the alarm is generated.</p>
</td>
</tr>
<tr id="ALM-45003__row8034041"><td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.4.2.1.3.1.1 "><p id="ALM-45003__p14847206">HostName</p>
</td>
<td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.4.2.1.3.1.2 "><p id="ALM-45003__p30890924">Specifies the host for which the alarm is generated.</p>
</td>
</tr>
<tr id="ALM-45003__row9582866"><td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.4.2.1.3.1.1 "><p id="ALM-45003__p38014704">PartitionName</p>
</td>
<td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.4.2.1.3.1.2 "><p id="ALM-45003__p59292158">Specifies the disk partition for which the alarm is generated.</p>
</td>
</tr>
<tr id="ALM-45003__row63867374"><td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.4.2.1.3.1.1 "><p id="ALM-45003__p5874824">Trigger Condition</p>
</td>
<td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.4.2.1.3.1.2 "><p id="ALM-45003__p6098726">Specifies the threshold for triggering the alarm.</p>
</td>
</tr>
</tbody>
</table>
</div>
</div>
<div class="section" id="ALM-45003__section54643513"><h4 class="sectiontitle"><span id="ALM-45003__text425161010595">Impact on the System</span></h4><p id="ALM-45003__p24234801">If the disk capacity is insufficient, QAS fails to write data, affecting SQL diagnosis and automatic recommendation of materialized views.</p>
</div>
<div class="section" id="ALM-45003__section22029569"><h4 class="sectiontitle"><span id="ALM-45003__text163478464119">Possible Causes</span></h4><ul id="ALM-45003__ul16861840"><li id="ALM-45003__li192121459141719">The alarm threshold is improperly configured.</li><li id="ALM-45003__li17538835">The configuration of the HetuEngine QAS disk cannot meet service requirements. The disk usage reaches the upper limit.</li></ul>
</div>
<div class="section" id="ALM-45003__section64048399"><h4 class="sectiontitle"><span id="ALM-45003__text331117561911">Handling Procedure</span></h4><p id="ALM-45003__p1370412232213"><strong id="ALM-45003__b1611263713220">Check whether the threshold is set properly.</strong></p>
<ol id="ALM-45003__ol16743111171010"><li id="ALM-45003__li158343817111"><span>Log in to FusionInsight Manager and choose <strong id="ALM-45003__b629585963617">O&amp;M</strong> &gt; <strong id="ALM-45003__b021620753716">Alarm</strong> &gt; <strong id="ALM-45003__b1533494375">Thresholds</strong>. In the service list, choose <strong id="ALM-45003__b13588141913375">HetuEngine</strong> &gt; <strong id="ALM-45003__b7743621153718">Disk</strong> &gt; <strong id="ALM-45003__b454913288375">QAS Disk Usage (QAS)</strong>. Check whether the alarm threshold is set properly. The default threshold is 80% of the disk capacity. You can change the threshold as required.</span><p><ul id="ALM-45003__ul1468718191916"><li id="ALM-45003__li156879194119">If the threshold is set properly, go to <a href="#ALM-45003__li1561212104442">4</a>.</li><li id="ALM-45003__li3496112119111">If the threshold is not set properly, go to <a href="#ALM-45003__li1673781151015">2</a>.</li></ul>
</p></li><li id="ALM-45003__li1673781151015"><a name="ALM-45003__li1673781151015"></a><a name="li1673781151015"></a><span>Click <strong id="ALM-45003__b21646515443">Modify</strong> in the <strong id="ALM-45003__b38393714446">Operation</strong> column to modify and save the alarm threshold as required.</span></li><li id="ALM-45003__li18737111110109"><span>Wait 2 minutes and check whether the alarm is cleared.</span><p><ul id="ALM-45003__ul1075412355913"><li id="ALM-45003__li1499044611910">If the alarm is cleared, no further action is required.</li><li id="ALM-45003__li1880155018910">If the alarm is not cleared, go to <a href="#ALM-45003__li1561212104442">4</a>.</li></ul>
</p></li></ol>
<p id="ALM-45003__p7788411192919"><strong id="ALM-45003__b215915391443">Check whether the disk usage reaches the upper limit.</strong></p>
<ol start="4" id="ALM-45003__ol92781418143311"><li id="ALM-45003__li1561212104442"><a name="ALM-45003__li1561212104442"></a><a name="li1561212104442"></a><span>Expand the alarm information, view the information in the <strong id="ALM-45003__b1752601204517">Location</strong> area, and check the role name and host name of the QAS disk where the alarm is generated.</span></li><li id="ALM-45003__li827841815332"><span>Choose <strong id="ALM-45003__b1184275834613">Cluster</strong> &gt; <strong id="ALM-45003__b37818119473">Services</strong> &gt; <strong id="ALM-45003__b1179871944716">HetuEngine</strong> and click <strong id="ALM-45003__b10874109124813">Instance</strong>. On the displayed page, click the QAS role name in the alarm information. On the instance page that is displayed, click <strong id="ALM-45003__b17263193134815">Chart</strong> and check whether the QAS disk usage in the <strong id="ALM-45003__b1446063984812">QAS Disk Usage</strong> chart exceeds the threshold (80% of the disk capacity by default).</span><p><ul id="ALM-45003__ul169016288390"><li id="ALM-45003__li9690828173912">If the disk usage reaches the upper limit, go to <a href="#ALM-45003__li1266819163911">6</a>.</li><li id="ALM-45003__li83115360398">If the disk usage does not reach the upper limit, go to <a href="#ALM-45003__li1573581113104">9</a>.</li></ul>
</p></li><li id="ALM-45003__li1266819163911"><a name="ALM-45003__li1266819163911"></a><a name="li1266819163911"></a><span>Log in to the host of the node where the QAS instance reporting the alarm is located as the <strong id="ALM-45003__b7393153204518">root</strong> user.</span></li><li id="ALM-45003__li6148329204513"><span>Run the following command to go to the QAS data directory and delete temporary files as required:</span><p><p id="ALM-45003__p211612494811"><strong id="ALM-45003__b9986427104815">cd ${BIGDATA_DATA_HOME}/hetuengine/qas</strong></p>
<div class="notice" id="ALM-45003__note53085191806"><span class="noticetitle"><img src="public_sys-resources/notice_3.0-en-us.png"> </span><div class="noticebody"><p id="ALM-45003__p1324112161719">Deleting temporary files affects the latest QAS execution result but does not affect subsequent results.</p>
</div></div>
</p></li><li id="ALM-45003__li710418358470"><span>Wait 2 minutes and check whether the alarm is cleared.</span><p><ul id="ALM-45003__ul10362933114819"><li id="ALM-45003__li1136263316485">If the alarm is cleared, no further action is required.</li><li id="ALM-45003__li920194020481">If the alarm fails to be cleared, go to <a href="#ALM-45003__li1573581113104">9</a>.</li></ul>
</p></li></ol>
<p id="ALM-45003__p585083316916"><strong id="ALM-45003__b1769051635110">Collect fault information.</strong></p>
<ol start="9" id="ALM-45003__ol87351411121012"><li id="ALM-45003__li1573581113104"><a name="ALM-45003__li1573581113104"></a><a name="li1573581113104"></a><span>On FusionInsight Manager, choose <strong id="ALM-45003__b1616717232516">O&amp;M</strong>. In the navigation pane on the left, choose <strong id="ALM-45003__b1316872365110">Log</strong> &gt; <strong id="ALM-45003__b1617092312515">Download</strong>.</span></li><li id="ALM-45003__li1873531161013"><span>Expand the <strong id="ALM-45003__b1918462885112">Service</strong> drop-down list, select <strong id="ALM-45003__b1718514288511">HetuEngine</strong> for the target cluster, and click <strong id="ALM-45003__b141879281514">OK</strong>.</span></li><li id="ALM-45003__li6421753013"><span>Expand the <strong id="ALM-45003__b1152119467515">Hosts</strong> drop-down list. In the <strong id="ALM-45003__b2523114618512">Select Host</strong> dialog box that is displayed, select the hosts to which the role belongs, and click <strong id="ALM-45003__b352411466511">OK</strong>.</span></li><li id="ALM-45003__li137359114102"><span>Click the edit icon in the upper right corner, and set <strong id="ALM-45003__b122381351115113">Start Date</strong> and <strong id="ALM-45003__b12241205185111">End Date</strong> for log collection to 10 minutes ahead of and after the alarm generation time, respectively. Then, click <strong id="ALM-45003__b18242205155114">Download</strong>.</span></li><li id="ALM-45003__li07351311121012"><span>Contact <span id="ALM-45003__text13121185435113">O&amp;M personnel</span> and provide the collected logs.</span></li></ol>
</div>
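<p>As a complement to the chart-based check in the procedure above, the usage of the filesystem that holds the QAS data directory can be read directly on the host. The following Python sketch uses only the standard library. It assumes the directory is <code>${BIGDATA_DATA_HOME}/hetuengine/qas</code> as given in the procedure; the function names are hypothetical, and this document does not state that the <strong>QAS Disk Usage</strong> metric is computed the same way.</p>

```python
import shutil


def disk_usage_percent(path):
    """Percentage of the filesystem containing `path` that is in use."""
    usage = shutil.disk_usage(path)
    if usage.total == 0:
        # Some pseudo-filesystems report a size of zero.
        return 0.0
    return usage.used / usage.total * 100


def exceeds_threshold(path, threshold_pct=80):
    """Compare usage with a threshold (80% is the documented default)."""
    return disk_usage_percent(path) > threshold_pct
```

<p>On the affected host, calling <code>exceeds_threshold</code> with the QAS data directory as <code>path</code> gives a quick indication of whether temporary files still need to be deleted.</p>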
<div class="section" id="ALM-45003__section169311343318"><h4 class="sectiontitle"><span id="ALM-45003__text10274168326">Alarm Clearance</span></h4><p id="ALM-45003__p754913417333">This alarm is automatically cleared after the fault is rectified.</p>
</div>
<div class="section" id="ALM-45003__section39564683"><h4 class="sectiontitle"><span id="ALM-45003__text44238205212">Related Information</span></h4><p id="ALM-45003__p179052433154"><span id="ALM-45003__text1997020518156">None.</span></p>
</div>
</div>
<div>
<div class="familylinks">
<div class="parentlink"><strong>Parent topic:</strong> <a href="mrs_01_1298.html">Alarm Reference (Applicable to MRS 3.x)</a></div>
</div>
</div>


@ -0,0 +1,88 @@
<a name="ALM-45289"></a><a name="ALM-45289"></a>
<h1 class="topictitle1">ALM-45289 PolicySync Heap Memory Usage Exceeds the Threshold</h1>
<div id="body41307931"><div class="note" id="ALM-45289__note14744151615401"><img src="public_sys-resources/note_3.0-en-us.png"><span class="notetitle"> </span><div class="notebody"><p id="ALM-45289__p1431519566">This section applies only to MRS 3.3.0 or later.</p>
</div></div>
<div class="section" id="ALM-45289__section61130422"><h4 class="sectiontitle"><span id="ALM-45289__text14838183534515">Alarm Description</span></h4><p id="ALM-45289__p14504164">The system checks the heap memory usage of the PolicySync service every 60 seconds. This alarm is generated when the heap memory usage of the PolicySync instance exceeds the threshold (95% of the maximum memory) 10 consecutive times. This alarm is cleared when the heap memory usage is less than the threshold.</p>
</div>
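<p>The trigger condition in the Alarm Description (usage above the threshold on 10 consecutive checks) can be illustrated with a short Python sketch. The function name <code>should_raise</code> and the list-of-samples representation are assumptions made for illustration; the actual monitoring implementation is not described in this document.</p>

```python
def should_raise(samples, threshold_pct=95, required=10):
    """True if the most recent `required` samples all exceed the threshold.

    samples: heap usage readings in percent, oldest first, one per check.
    """
    if len(samples) < required:
        return False
    return all(s > threshold_pct for s in samples[-required:])
```

<p>A single reading at or below the threshold within the last 10 checks is enough to prevent the alarm from being raised in this sketch.</p>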
<div class="section" id="ALM-45289__section13302888"><h4 class="sectiontitle"><span id="ALM-45289__text66488119489">Alarm Attributes</span></h4>
<div class="tablenoborder"><table cellpadding="4" cellspacing="0" summary="" id="ALM-45289__table33986641" frame="border" border="1" rules="all"><thead align="left"><tr id="ALM-45289__row13879140"><th align="left" class="cellrowborder" valign="top" width="33.33333333333333%" id="mcps1.3.3.2.1.4.1.1"><p id="ALM-45289__p50468531"><span id="ALM-45289__text1074744511529">Alarm ID</span></p>
</th>
<th align="left" class="cellrowborder" valign="top" width="33.33333333333333%" id="mcps1.3.3.2.1.4.1.2"><p id="ALM-45289__p61419199"><span id="ALM-45289__text529420513457">Alarm Severity</span></p>
</th>
<th align="left" class="cellrowborder" valign="top" width="33.33333333333333%" id="mcps1.3.3.2.1.4.1.3"><p id="ALM-45289__p8899183"><span id="ALM-45289__text139206232502">Auto Cleared</span></p>
</th>
</tr>
</thead>
<tbody><tr id="ALM-45289__row49745195"><td class="cellrowborder" valign="top" width="33.33333333333333%" headers="mcps1.3.3.2.1.4.1.1 "><p id="ALM-45289__p2829020">45289</p>
</td>
<td class="cellrowborder" valign="top" width="33.33333333333333%" headers="mcps1.3.3.2.1.4.1.2 "><p id="ALM-45289__p27824106">Major</p>
</td>
<td class="cellrowborder" valign="top" width="33.33333333333333%" headers="mcps1.3.3.2.1.4.1.3 "><p id="ALM-45289__p39160124">Yes</p>
</td>
</tr>
</tbody>
</table>
</div>
</div>
<div class="section" id="ALM-45289__section52617132"><h4 class="sectiontitle"><span id="ALM-45289__text0580183514489">Alarm Parameters</span></h4>
<div class="tablenoborder"><table cellpadding="4" cellspacing="0" summary="" id="ALM-45289__table17853499" frame="border" border="1" rules="all"><thead align="left"><tr id="ALM-45289__row18143824"><th align="left" class="cellrowborder" valign="top" width="50%" id="mcps1.3.4.2.1.3.1.1"><p id="ALM-45289__p60363621"><span id="ALM-45289__text12210145419505">Parameter</span></p>
</th>
<th align="left" class="cellrowborder" valign="top" width="50%" id="mcps1.3.4.2.1.3.1.2"><p id="ALM-45289__p57615147"><span id="ALM-45289__text1971012173566">Description</span></p>
</th>
</tr>
</thead>
<tbody><tr id="ALM-45289__row13401184712152"><td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.4.2.1.3.1.1 "><p id="ALM-45289__p0124015142017">Source</p>
</td>
<td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.4.2.1.3.1.2 "><p id="ALM-45289__p141241159202">Specifies the cluster for which the alarm is generated.</p>
</td>
</tr>
<tr id="ALM-45289__row36315337"><td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.4.2.1.3.1.1 "><p id="ALM-45289__p4124161572012">ServiceName</p>
</td>
<td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.4.2.1.3.1.2 "><p id="ALM-45289__p28465328">Specifies the service for which the alarm is generated.</p>
</td>
</tr>
<tr id="ALM-45289__row54861362"><td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.4.2.1.3.1.1 "><p id="ALM-45289__p91242015132019">RoleName</p>
</td>
<td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.4.2.1.3.1.2 "><p id="ALM-45289__p40562973">Specifies the role for which the alarm is generated.</p>
</td>
</tr>
<tr id="ALM-45289__row29522441"><td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.4.2.1.3.1.1 "><p id="ALM-45289__p71246159207">HostName</p>
</td>
<td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.4.2.1.3.1.2 "><p id="ALM-45289__p20557292">Specifies the host for which the alarm is generated.</p>
</td>
</tr>
<tr id="ALM-45289__row9721490519"><td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.4.2.1.3.1.1 "><p id="ALM-45289__p9124181519204">Trigger Condition</p>
</td>
<td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.4.2.1.3.1.2 "><p id="ALM-45289__p21241615172015">Specifies the threshold for triggering the alarm.</p>
</td>
</tr>
</tbody>
</table>
</div>
</div>
<div class="section" id="ALM-45289__section3792148"><h4 class="sectiontitle"><span id="ALM-45289__text1127833410585">Impact on the System</span></h4><p id="ALM-45289__p6696265">Heap memory overflow may cause service breakdown.</p>
</div>
<div class="section" id="ALM-45289__section34129336"><h4 class="sectiontitle"><span id="ALM-45289__text10245783115">Possible Causes</span></h4><p id="ALM-45289__p5526629">The heap memory of the PolicySync instance is overused or the heap memory is inappropriately allocated.</p>
</div>
<div class="section" id="ALM-45289__section5600173202020"><h4 class="sectiontitle"><span id="ALM-45289__text35421632154">Handling Procedure</span></h4><ol id="ALM-45289__ol2380770"><li id="ALM-45289__li21426936"><span>Log in to FusionInsight Manager and choose <strong id="ALM-45289__b99641451585">O&amp;M</strong> &gt; <strong id="ALM-45289__b3964144518819">Alarm </strong>&gt; <strong id="ALM-45289__b8965194512815">Alarms </strong>&gt; <strong id="ALM-45289__b17965154510820">ALM-45289 PolicySync Heap Memory Usage Exceeds the Threshold</strong>. Check the location information of the alarm and view the host name of the instance for which the alarm is generated.</span></li><li id="ALM-45289__li58624704"><a name="ALM-45289__li58624704"></a><a name="li58624704"></a><span>On FusionInsight Manager, choose <strong id="ALM-45289__b527115152912">Cluster </strong>&gt; <strong id="ALM-45289__b1427110151196">Services </strong>&gt; <strong id="ALM-45289__b1527113151699">Ranger </strong>&gt; <strong id="ALM-45289__b72715157911">Instance</strong>. Select the role corresponding to the host name of the instance for which the alarm is generated. Click the drop-down list in the upper right corner of the chart area and choose <strong id="ALM-45289__b19271191516913">Customize </strong>&gt; <strong id="ALM-45289__b1627119153914">CPU and Memory</strong> &gt; <strong id="ALM-45289__b1427131511916">PolicySync Heap Memory Usage</strong>. Click <strong id="ALM-45289__b32711158910">OK</strong>.</span></li><li id="ALM-45289__li57860289"><span>Check whether the heap memory used by PolicySync reaches the threshold (95% of the maximum heap memory by default).</span><p><ul id="ALM-45289__ul50980558"><li id="ALM-45289__li56171838">If yes, go to <a href="#ALM-45289__li11521246145513">4</a>.</li><li id="ALM-45289__li53625048">If no, go to <a href="#ALM-45289__li42224042151734">6</a>.</li></ul>
</p></li><li id="ALM-45289__li11521246145513"><a name="ALM-45289__li11521246145513"></a><a name="li11521246145513"></a><span>On FusionInsight Manager, choose <strong id="ALM-45289__b17746201010">Cluster </strong>&gt; <strong id="ALM-45289__b1477156161013">Services </strong>&gt; <strong id="ALM-45289__b678196101019">Ranger </strong>&gt; <strong id="ALM-45289__b15781863109">Instance </strong>&gt; <strong id="ALM-45289__b2078169100">PolicySync</strong>. Click <strong id="ALM-45289__b7785613109">Instance Configuration</strong> and then <strong id="ALM-45289__b97836101013">All Configurations</strong>, and choose <strong id="ALM-45289__b187916618106">PolicySync </strong>&gt; <strong id="ALM-45289__b1479463109">System</strong>. Set <strong id="ALM-45289__b2743558191014">-Xmx</strong> in the <strong id="ALM-45289__b874435821016">GC_OPTS</strong> parameter to a larger value based on site requirements and save the configuration.</span><p><div class="note" id="ALM-45289__note14125215132018"><img src="public_sys-resources/note_3.0-en-us.png"><span class="notetitle"> </span><div class="notebody"><p id="ALM-45289__p2125171592019">If this alarm is generated, the heap memory configured for PolicySync cannot meet the heap memory required by the PolicySync process. You are advised to change the value of <strong id="ALM-45289__b170211331119">-Xmx</strong> in <strong id="ALM-45289__b57033313114">GC_OPTS</strong> to twice that of the heap memory used by PolicySync. You can change the value based on the actual service scenario. Refer to <a href="#ALM-45289__li58624704">2</a> to view the PolicySync heap memory usage.</p>
</div></div>
</p></li><li id="ALM-45289__li35301418"><span>Restart the affected services or instances and check whether the alarm is cleared.</span><p><ul id="ALM-45289__ul49277313"><li id="ALM-45289__li40842634">If yes, no further action is required.</li><li id="ALM-45289__li32039392">If no, go to <a href="#ALM-45289__li42224042151734">6</a>.<div class="notice" id="ALM-45289__note88641929172117"><span class="noticetitle"><img src="public_sys-resources/notice_3.0-en-us.png"> </span><div class="noticebody"><p id="ALM-45289__p13864122962115">When the service is rebooted, it becomes unavailable and can disrupt business operations. When the instance is rebooted, it cannot be used and any tasks running on the current instance node will fail.</p>
</div></div>
</li></ul>
</p></li></ol>
<p id="ALM-45289__p45053948"><strong id="ALM-45289__b35235717112450">Collect fault information.</strong></p>
<ol start="6" id="ALM-45289__ol41031367112456"><li id="ALM-45289__li42224042151734"><a name="ALM-45289__li42224042151734"></a><a name="li42224042151734"></a><span>On FusionInsight Manager, choose <strong id="ALM-45289__b11470051171118">O&amp;M</strong>. In the navigation pane on the left, choose <strong id="ALM-45289__b2047175118119">Log</strong> &gt; <strong id="ALM-45289__b847125151116">Download</strong>.</span></li><li id="ALM-45289__li28093597"><span>Expand the <strong id="ALM-45289__b6537154161118">Service</strong> drop-down list, and select <strong id="ALM-45289__b1053716545112">Ranger</strong> for the target cluster.</span></li><li id="ALM-45289__li51515784"><span>Click the edit icon in the upper right corner, and set <strong id="ALM-45289__b1896295911114">Start Date</strong> and <strong id="ALM-45289__b396385917117">End Date</strong> for log collection to 10 minutes ahead of and after the alarm generation time, respectively. Then, click <strong id="ALM-45289__b4963195919116">Download</strong>.</span></li><li id="ALM-45289__li60988879"><span>Contact <span id="ALM-45289__text157218215128">O&amp;M personnel</span> and provide the collected logs.</span></li></ol>
</div>
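<p>The sizing advice in the note above (set <strong>-Xmx</strong> to twice the heap memory currently used by PolicySync) can be sketched as a small Python helper. The function name and the sample <strong>GC_OPTS</strong> value are hypothetical; the real value must be taken from the PolicySync configuration and adjusted to the actual service scenario.</p>

```python
import re


def with_doubled_xmx(gc_opts, heap_used_mb):
    """Return gc_opts with -Xmx set to twice the heap currently in use.

    gc_opts: the existing GC_OPTS string (the value below is made up).
    heap_used_mb: heap memory used by PolicySync, in MB, read from the chart.
    If gc_opts contains no -Xmx option, it is returned unchanged.
    """
    new_flag = f"-Xmx{2 * heap_used_mb}M"
    return re.sub(r"-Xmx\S+", new_flag, gc_opts)


# Hypothetical example: 900 MB in use, so -Xmx becomes 1800M.
print(with_doubled_xmx("-Xms1G -Xmx1G -XX:+UseG1GC", 900))
```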
<div class="section" id="ALM-45289__section2125181572010"><h4 class="sectiontitle"><span id="ALM-45289__text976142215819">Alarm Clearance</span></h4><p id="ALM-45289__p17125121572018">This alarm is automatically cleared after the fault is rectified.</p>
</div>
<div class="section" id="ALM-45289__section891955662611"><h4 class="sectiontitle"><span id="ALM-45289__text13373191116114">Related Information</span></h4><p id="ALM-45289__p139191756122619"><span id="ALM-45289__text13669101910115">None.</span></p>
</div>
</div>
<div>
<div class="familylinks">
<div class="parentlink"><strong>Parent topic:</strong> <a href="mrs_01_1298.html">Alarm Reference (Applicable to MRS 3.x)</a></div>
</div>
</div>


@ -0,0 +1,90 @@
<a name="ALM-45290"></a><a name="ALM-45290"></a>
<h1 class="topictitle1">ALM-45290 PolicySync Direct Memory Usage Exceeds the Threshold</h1>
<div id="body18594423"><div class="note" id="ALM-45290__note14744151615401"><img src="public_sys-resources/note_3.0-en-us.png"><span class="notetitle"> </span><div class="notebody"><p id="ALM-45290__p1431519566">This section applies only to MRS 3.3.0 or later.</p>
</div></div>
<div class="section" id="ALM-45290__section35136898"><h4 class="sectiontitle"><span id="ALM-45290__text14838183534515">Alarm Description</span></h4><p id="ALM-45290__p19775813">The system checks the direct memory usage of the PolicySync service every 60 seconds. This alarm is generated when the direct memory usage of the PolicySync instance exceeds the threshold (90% of the maximum memory) five consecutive times. This alarm is cleared when the PolicySync direct memory usage is less than or equal to the threshold.</p>
</div>
<div class="section" id="ALM-45290__section47796626"><h4 class="sectiontitle"><span id="ALM-45290__text66488119489">Alarm Attributes</span></h4>
<div class="tablenoborder"><table cellpadding="4" cellspacing="0" summary="" id="ALM-45290__table58337011" frame="border" border="1" rules="all"><thead align="left"><tr id="ALM-45290__row62299817"><th align="left" class="cellrowborder" valign="top" width="33.33333333333333%" id="mcps1.3.3.2.1.4.1.1"><p id="ALM-45290__p13120377"><span id="ALM-45290__text1074744511529">Alarm ID</span></p>
</th>
<th align="left" class="cellrowborder" valign="top" width="33.33333333333333%" id="mcps1.3.3.2.1.4.1.2"><p id="ALM-45290__p56117589"><span id="ALM-45290__text529420513457">Alarm Severity</span></p>
</th>
<th align="left" class="cellrowborder" valign="top" width="33.33333333333333%" id="mcps1.3.3.2.1.4.1.3"><p id="ALM-45290__p49230886"><span id="ALM-45290__text139206232502">Auto Cleared</span></p>
</th>
</tr>
</thead>
<tbody><tr id="ALM-45290__row28278868"><td class="cellrowborder" valign="top" width="33.33333333333333%" headers="mcps1.3.3.2.1.4.1.1 "><p id="ALM-45290__p8886982">45290</p>
</td>
<td class="cellrowborder" valign="top" width="33.33333333333333%" headers="mcps1.3.3.2.1.4.1.2 "><p id="ALM-45290__p48756965">Major</p>
</td>
<td class="cellrowborder" valign="top" width="33.33333333333333%" headers="mcps1.3.3.2.1.4.1.3 "><p id="ALM-45290__p57000088">Yes</p>
</td>
</tr>
</tbody>
</table>
</div>
</div>
<div class="section" id="ALM-45290__section27516457"><h4 class="sectiontitle"><span id="ALM-45290__text0580183514489">Alarm Parameters</span></h4>
<div class="tablenoborder"><table cellpadding="4" cellspacing="0" summary="" id="ALM-45290__table53604424" frame="border" border="1" rules="all"><thead align="left"><tr id="ALM-45290__row30968229"><th align="left" class="cellrowborder" valign="top" width="50%" id="mcps1.3.4.2.1.3.1.1"><p id="ALM-45290__p25398627"><span id="ALM-45290__text12210145419505">Parameter</span></p>
</th>
<th align="left" class="cellrowborder" valign="top" width="50%" id="mcps1.3.4.2.1.3.1.2"><p id="ALM-45290__p44022946"><span id="ALM-45290__text1971012173566">Description</span></p>
</th>
</tr>
</thead>
<tbody><tr id="ALM-45290__row1083384091512"><td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.4.2.1.3.1.1 "><p id="ALM-45290__p13858113752316">Source</p>
</td>
<td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.4.2.1.3.1.2 "><p id="ALM-45290__p187931338134115">Specifies the cluster for which the alarm is generated.</p>
</td>
</tr>
<tr id="ALM-45290__row9088906"><td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.4.2.1.3.1.1 "><p id="ALM-45290__p39123317">ServiceName</p>
</td>
<td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.4.2.1.3.1.2 "><p id="ALM-45290__p39642994">Specifies the service for which the alarm is generated.</p>
</td>
</tr>
<tr id="ALM-45290__row21242631"><td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.4.2.1.3.1.1 "><p id="ALM-45290__p37226997">RoleName</p>
</td>
<td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.4.2.1.3.1.2 "><p id="ALM-45290__p54903620">Specifies the role for which the alarm is generated.</p>
</td>
</tr>
<tr id="ALM-45290__row24370534"><td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.4.2.1.3.1.1 "><p id="ALM-45290__p66118565">HostName</p>
</td>
<td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.4.2.1.3.1.2 "><p id="ALM-45290__p41764972">Specifies the host for which the alarm is generated.</p>
</td>
</tr>
<tr id="ALM-45290__row5383818185117"><td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.4.2.1.3.1.1 "><p id="ALM-45290__p15179191519371">Trigger Condition</p>
</td>
<td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.4.2.1.3.1.2 "><p id="ALM-45290__p1517911153376">Specifies the threshold for triggering the alarm.</p>
</td>
</tr>
</tbody>
</table>
</div>
</div>
<div class="section" id="ALM-45290__section46321527"><h4 class="sectiontitle"><span id="ALM-45290__text1127833410585">Impact on the System</span></h4><p id="ALM-45290__p27494118">Direct memory overflow may cause service breakdown.</p>
</div>
<div class="section" id="ALM-45290__section14240565"><h4 class="sectiontitle"><span id="ALM-45290__text10245783115">Possible Causes</span></h4><p id="ALM-45290__p12431106">The direct memory of the PolicySync process is overused or the direct memory is inappropriately allocated.</p>
</div>
<div class="section" id="ALM-45290__section912459145314"><h4 class="sectiontitle"><span id="ALM-45290__text35421632154">Handling Procedure</span></h4><p id="ALM-45290__p286699"><strong id="ALM-45290__b3633994112659">Check the direct memory usage.</strong></p>
<ol id="ALM-45290__ol2580296"><li id="ALM-45290__li23222664"><span>Log in to FusionInsight Manager and choose <strong id="ALM-45290__b1068562411310">O&amp;M</strong> &gt; <strong id="ALM-45290__b1368612242133">Alarm </strong>&gt; <strong id="ALM-45290__b46864240134">Alarms </strong>&gt; <strong id="ALM-45290__b6686924151310">ALM-45290 PolicySync Direct Memory Usage Exceeds the Threshold</strong>. Check the location information of the alarm and view the host name of the instance for which the alarm is generated.</span></li><li id="ALM-45290__li7677390"><a name="ALM-45290__li7677390"></a><a name="li7677390"></a><span>On FusionInsight Manager, choose <strong id="ALM-45290__b1072614211134">Cluster </strong>&gt; <strong id="ALM-45290__b77271842161316">Services </strong>&gt; <strong id="ALM-45290__b12727942131314">Ranger </strong>&gt; <strong id="ALM-45290__b172764219135">Instance</strong>. Select the role corresponding to the host name of the instance for which the alarm is generated. Click the drop-down list in the upper right corner of the chart area and choose <strong id="ALM-45290__b19728184210138">Customize </strong>&gt; <strong id="ALM-45290__b37282042151315">CPU and Memory</strong> &gt; <strong id="ALM-45290__b2728154211318">PolicySync Direct Memory Usage</strong>. Click <strong id="ALM-45290__b16728442141311">OK</strong>.</span></li><li id="ALM-45290__li1987650"><span>Check whether the direct memory used by PolicySync reaches the threshold (90% of the maximum direct memory by default).</span><p><ul id="ALM-45290__ul17888853"><li id="ALM-45290__li26781952">If yes, go to <a href="#ALM-45290__li10450762161055">4</a>.</li><li id="ALM-45290__li21854507">If no, go to <a href="#ALM-45290__d0e43963">6</a>.</li></ul>
</p></li><li id="ALM-45290__li10450762161055"><a name="ALM-45290__li10450762161055"></a><a name="li10450762161055"></a><span>On FusionInsight Manager, choose <strong id="ALM-45290__b0657119154114">Cluster </strong>&gt; <strong id="ALM-45290__b765718974117">Services </strong>&gt; <strong id="ALM-45290__b76583964115">Ranger </strong>&gt; <strong id="ALM-45290__b76581994118">Instance </strong>&gt; <strong id="ALM-45290__b86581934115">PolicySync</strong>. Click <strong id="ALM-45290__b1465813974113">Instance Configuration</strong> and then <strong id="ALM-45290__b5658189114110">All Configurations</strong>, and choose <strong id="ALM-45290__b116583910419">PolicySync </strong>&gt; <strong id="ALM-45290__b965811917411">System</strong>. Set <strong id="ALM-45290__b2441912174119">-XX:MaxDirectMemorySize</strong> in the <strong id="ALM-45290__b64411312184116">GC_OPTS</strong> parameter to a larger value based on site requirements and save the configuration.</span><p><div class="note" id="ALM-45290__note1572143455414"><img src="public_sys-resources/note_3.0-en-us.png"><span class="notetitle"> </span><div class="notebody"><p id="ALM-45290__p65732349545">If this alarm is generated, the direct memory configured for PolicySync cannot meet the direct memory required by the PolicySync process. You are advised to check the direct memory usage of PolicySync and change the value of <strong id="ALM-45290__b8117183116548">-XX:MaxDirectMemorySize</strong> in <strong id="ALM-45290__b3118203110542">GC_OPTS</strong> to twice the direct memory used by PolicySync. You can change the value based on the actual service scenario. Refer to <a href="#ALM-45290__li7677390">2</a> to view the PolicySync direct memory usage.</p>
</div></div>
</p></li><li id="ALM-45290__li27134973"><span>Restart the affected services or instances and check whether the alarm is cleared.</span><p><ul id="ALM-45290__ul42888166"><li id="ALM-45290__li50449174">If yes, no further action is required.</li><li id="ALM-45290__li51389382">If no, go to <a href="#ALM-45290__d0e43963">6</a>.<div class="notice" id="ALM-45290__note88641929172117"><span class="noticetitle"><img src="public_sys-resources/notice_3.0-en-us.png"> </span><div class="noticebody"><p id="ALM-45290__p13864122962115">Restarting the service makes it unavailable and can disrupt business operations. Restarting an instance makes that instance unavailable, and any tasks running on the current instance node will fail.</p>
</div></div>
</li></ul>
</p></li></ol>
<p id="ALM-45290__p1790386"><strong id="ALM-45290__b46521598112720">Collect fault information.</strong></p>
<ol start="6" id="ALM-45290__ol58212157112724"><li id="ALM-45290__d0e43963"><a name="ALM-45290__d0e43963"></a><a name="d0e43963"></a><span>On FusionInsight Manager, choose <strong id="ALM-45290__b548317219558">O&amp;M</strong>. In the navigation pane on the left, choose <strong id="ALM-45290__b184841521185510">Log</strong> &gt; <strong id="ALM-45290__b204841121185519">Download</strong>.</span></li><li id="ALM-45290__li30123571"><span>Expand the <strong id="ALM-45290__b1910732315552">Service</strong> drop-down list, and select <strong id="ALM-45290__b17107142311559">Ranger</strong> for the target cluster.</span></li><li id="ALM-45290__li2676689"><span>Click the edit icon in the upper right corner, and set <strong id="ALM-45290__b1631172525520">Start Date</strong> and <strong id="ALM-45290__b731525195515">End Date</strong> for log collection to 10 minutes ahead of and after the alarm generation time, respectively. Then, click <strong id="ALM-45290__b153112565515">Download</strong>.</span></li><li id="ALM-45290__li24090206"><span>Contact <span id="ALM-45290__text1890426135510">O&amp;M personnel</span> and provide the collected logs.</span></li></ol>
</div>
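<p>The sizing advice in the handling procedure (set <strong>-XX:MaxDirectMemorySize</strong> to twice the direct memory in use) can be sketched as follows. This is a minimal illustration, not the tool FusionInsight Manager uses: the helper name <strong>suggest_max_direct_memory</strong>, the sample <strong>GC_OPTS</strong> string, and the usage figure in MB are all assumptions made for the example.</p>

```python
import re


def suggest_max_direct_memory(gc_opts: str, used_mb: int) -> str:
    """Return gc_opts with -XX:MaxDirectMemorySize set to twice used_mb (MB).

    Hypothetical helper: it only rewrites a string and changes no configuration.
    """
    new_flag = f"-XX:MaxDirectMemorySize={2 * used_mb}M"
    pattern = r"-XX:MaxDirectMemorySize=\S+"
    if re.search(pattern, gc_opts):
        # Replace the existing setting in place.
        return re.sub(pattern, new_flag, gc_opts)
    # The option is absent, so append it.
    return f"{gc_opts} {new_flag}".strip()


# Assumed GC_OPTS value and an assumed reading of 480 MB from the chart in step 2.
current = "-Xms1G -Xmx1G -XX:MaxDirectMemorySize=512M"
print(suggest_max_direct_memory(current, used_mb=480))
# prints: -Xms1G -Xmx1G -XX:MaxDirectMemorySize=960M
```

<p>The resulting string still has to be entered in <strong>GC_OPTS</strong> on FusionInsight Manager and takes effect only after the restart described in step 5.</p>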
<div class="section" id="ALM-45290__section169311343318"><h4 class="sectiontitle"><span id="ALM-45290__text976142215819">Alarm Clearance</span></h4><p id="ALM-45290__p754913417333">This alarm is automatically cleared after the fault is rectified.</p>
</div>
<div class="section" id="ALM-45290__section891955662611"><h4 class="sectiontitle"><span id="ALM-45290__text13373191116114">Related Information</span></h4><p id="ALM-45290__p139191756122619"><span id="ALM-45290__text13669101910115">None.</span></p>
</div>
<p id="ALM-45290__p8060118"></p>
</div>
<div>
<div class="familylinks">
<div class="parentlink"><strong>Parent topic:</strong> <a href="mrs_01_1298.html">Alarm Reference (Applicable to MRS 3.x)</a></div>
</div>
</div>

@ -0,0 +1,89 @@
<a name="ALM-45291"></a><a name="ALM-45291"></a>
<h1 class="topictitle1">ALM-45291 PolicySync Non-Heap Memory Usage Exceeds the Threshold</h1>
<div id="body12649095"><div class="note" id="ALM-45291__note14744151615401"><img src="public_sys-resources/note_3.0-en-us.png"><span class="notetitle"> </span><div class="notebody"><p id="ALM-45291__p1431519566">This section applies only to MRS 3.3.0 or later.</p>
</div></div>
<div class="section" id="ALM-45291__section26053439"><h4 class="sectiontitle"><span id="ALM-45291__text14838183534515">Alarm Description</span></h4><p id="ALM-45291__p13084344">The system checks the non-heap memory usage of the PolicySync service every 60 seconds. This alarm is generated when the non-heap memory usage of the PolicySync instance exceeds the threshold (90% of the maximum non-heap memory) for five consecutive checks. This alarm is cleared when the non-heap memory usage falls below the threshold.</p>
</div>
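<p>The trigger rule in the alarm description (a periodic check, raised after five consecutive samples above 90%, cleared once usage drops) can be sketched as below. This is an illustrative model only, not the monitoring code of the product: the function name, the sample values, and the choice to clear the alarm when usage is at or below the threshold are assumptions.</p>

```python
def alarm_states(usage_samples, max_value, threshold_pct=90, required=5):
    """Yield the alarm state (True = active) after each periodic sample.

    usage_samples: memory used at each check, in the same unit as max_value.
    """
    consecutive = 0
    active = False
    for used in usage_samples:
        # Integer arithmetic avoids floating-point rounding at the boundary.
        if used * 100 > threshold_pct * max_value:
            consecutive += 1
            if consecutive >= required:
                active = True
        else:
            # Assumption: any sample not above the threshold clears the alarm.
            consecutive = 0
            active = False
        yield active


# Assumed samples in MB against an assumed maximum of 100 MB.
samples = [91, 92, 93, 94, 95, 96, 80, 95]
print(list(alarm_states(samples, max_value=100)))
# prints: [False, False, False, False, True, True, False, False]
```

<p>With one check every 60 seconds, the alarm in this model is raised no earlier than the fifth consecutive sample above the threshold.</p>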
<div class="section" id="ALM-45291__section33154367"><h4 class="sectiontitle"><span id="ALM-45291__text66488119489">Alarm Attributes</span></h4>
<div class="tablenoborder"><table cellpadding="4" cellspacing="0" summary="" id="ALM-45291__table53198919" frame="border" border="1" rules="all"><thead align="left"><tr id="ALM-45291__row62016895"><th align="left" class="cellrowborder" valign="top" width="33.33333333333333%" id="mcps1.3.3.2.1.4.1.1"><p id="ALM-45291__p57312624"><span id="ALM-45291__text1074744511529">Alarm ID</span></p>
</th>
<th align="left" class="cellrowborder" valign="top" width="33.33333333333333%" id="mcps1.3.3.2.1.4.1.2"><p id="ALM-45291__p11811006"><span id="ALM-45291__text529420513457">Alarm Severity</span></p>
</th>
<th align="left" class="cellrowborder" valign="top" width="33.33333333333333%" id="mcps1.3.3.2.1.4.1.3"><p id="ALM-45291__p17167440"><span id="ALM-45291__text139206232502">Auto Cleared</span></p>
</th>
</tr>
</thead>
<tbody><tr id="ALM-45291__row48385423"><td class="cellrowborder" valign="top" width="33.33333333333333%" headers="mcps1.3.3.2.1.4.1.1 "><p id="ALM-45291__p26905186">45291</p>
</td>
<td class="cellrowborder" valign="top" width="33.33333333333333%" headers="mcps1.3.3.2.1.4.1.2 "><p id="ALM-45291__p31836466">Major</p>
</td>
<td class="cellrowborder" valign="top" width="33.33333333333333%" headers="mcps1.3.3.2.1.4.1.3 "><p id="ALM-45291__p28616934">Yes</p>
</td>
</tr>
</tbody>
</table>
</div>
</div>
<div class="section" id="ALM-45291__section29953848"><h4 class="sectiontitle"><span id="ALM-45291__text0580183514489">Alarm Parameters</span></h4>
<div class="tablenoborder"><table cellpadding="4" cellspacing="0" summary="" id="ALM-45291__table36270311" frame="border" border="1" rules="all"><thead align="left"><tr id="ALM-45291__row13312348"><th align="left" class="cellrowborder" valign="top" width="50%" id="mcps1.3.4.2.1.3.1.1"><p id="ALM-45291__p4558436"><span id="ALM-45291__text12210145419505">Parameter</span></p>
</th>
<th align="left" class="cellrowborder" valign="top" width="50%" id="mcps1.3.4.2.1.3.1.2"><p id="ALM-45291__p33689020"><span id="ALM-45291__text1971012173566">Description</span></p>
</th>
</tr>
</thead>
<tbody><tr id="ALM-45291__row1348133541510"><td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.4.2.1.3.1.1 "><p id="ALM-45291__p13858113752316">Source</p>
</td>
<td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.4.2.1.3.1.2 "><p id="ALM-45291__p187931338134115">Specifies the cluster for which the alarm is generated.</p>
</td>
</tr>
<tr id="ALM-45291__row44456076"><td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.4.2.1.3.1.1 "><p id="ALM-45291__p39123317">ServiceName</p>
</td>
<td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.4.2.1.3.1.2 "><p id="ALM-45291__p21195702">Specifies the service for which the alarm is generated.</p>
</td>
</tr>
<tr id="ALM-45291__row56543598"><td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.4.2.1.3.1.1 "><p id="ALM-45291__p37226997">RoleName</p>
</td>
<td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.4.2.1.3.1.2 "><p id="ALM-45291__p4748349">Specifies the role for which the alarm is generated.</p>
</td>
</tr>
<tr id="ALM-45291__row42735142"><td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.4.2.1.3.1.1 "><p id="ALM-45291__p66118565">HostName</p>
</td>
<td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.4.2.1.3.1.2 "><p id="ALM-45291__p4433805">Specifies the host for which the alarm is generated.</p>
</td>
</tr>
<tr id="ALM-45291__row75609255517"><td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.4.2.1.3.1.1 "><p id="ALM-45291__p15179191519371">Trigger Condition</p>
</td>
<td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.4.2.1.3.1.2 "><p id="ALM-45291__p1517911153376">Specifies the threshold for triggering the alarm.</p>
</td>
</tr>
</tbody>
</table>
</div>
</div>
<div class="section" id="ALM-45291__section1149181"><h4 class="sectiontitle"><span id="ALM-45291__text1127833410585">Impact on the System</span></h4><p id="ALM-45291__p17475349">Non-heap memory overflow may cause service breakdown.</p>
</div>
<div class="section" id="ALM-45291__section10342629"><h4 class="sectiontitle"><span id="ALM-45291__text10245783115">Possible Causes</span></h4><p id="ALM-45291__p6217170">The non-heap memory of the PolicySync instance is overused or the non-heap memory is inappropriately allocated.</p>
</div>
<div class="section" id="ALM-45291__section17250140155719"><h4 class="sectiontitle"><span id="ALM-45291__text35421632154">Handling Procedure</span></h4><p id="ALM-45291__p33828780"><strong id="ALM-45291__b2631472111298">Check non-heap memory usage.</strong></p>
<ol id="ALM-45291__ol36023567"><li id="ALM-45291__li55776650"><span>Log in to FusionInsight Manager and choose <strong id="ALM-45291__b14233104216567">O&amp;M</strong> &gt; <strong id="ALM-45291__b12233042165617">Alarm </strong>&gt; <strong id="ALM-45291__b18233442155616">Alarms </strong>&gt; <strong id="ALM-45291__b223420426563">ALM-45291 PolicySync Non-Heap Memory Usage Exceeds the Threshold</strong>. Check the location information of the alarm and view the host name of the instance for which the alarm is generated.</span></li><li id="ALM-45291__li32227806"><span>On FusionInsight Manager, choose <strong id="ALM-45291__b1647393115711">Cluster </strong>&gt; <strong id="ALM-45291__b174735355715">Services </strong>&gt; <strong id="ALM-45291__b134734313574">Ranger </strong>&gt; <strong id="ALM-45291__b64749318575">Instance</strong>. Select the role corresponding to the host name of the instance for which the alarm is generated. Click the drop-down list in the upper right corner of the chart area and choose <strong id="ALM-45291__b3474193105713">Customize </strong>&gt; <strong id="ALM-45291__b114743311572">CPU and Memory</strong> &gt; <strong id="ALM-45291__b124749325717">PolicySync Non-Heap Memory Usage</strong>. Click <strong id="ALM-45291__b0474143205715">OK</strong>.</span></li><li id="ALM-45291__li21614805"><span>Check whether the non-heap memory used by PolicySync reaches the threshold (90% of the maximum non-heap memory by default).</span><p><ul id="ALM-45291__ul60315518"><li id="ALM-45291__li5968756">If yes, go to <a href="#ALM-45291__li29985659161559">4</a>.</li><li id="ALM-45291__li13707216">If no, go to <a href="#ALM-45291__d0e44186">6</a>.</li></ul>
</p></li><li id="ALM-45291__li29985659161559"><a name="ALM-45291__li29985659161559"></a><a name="li29985659161559"></a><span>On FusionInsight Manager, choose <strong id="ALM-45291__b1146112326576">Cluster </strong>&gt; <strong id="ALM-45291__b18462632135717">Services </strong>&gt; <strong id="ALM-45291__b14628328572">Ranger </strong>&gt; <strong id="ALM-45291__b1846218321571">Instance </strong>&gt; <strong id="ALM-45291__b646243265717">PolicySync</strong>. Click <strong id="ALM-45291__b15462113275713">Instance Configuration</strong> and then <strong id="ALM-45291__b13462113216579">All Configurations</strong>, and choose <strong id="ALM-45291__b1346223275712">PolicySync </strong>&gt; <strong id="ALM-45291__b17462153215719">System</strong>. Set <strong id="ALM-45291__b19216181055810">-XX:MaxPermSize</strong> in the <strong id="ALM-45291__b102161103581">GC_OPTS</strong> parameter to a larger value based on site requirements and save the configuration.</span><p><div class="note" id="ALM-45291__note1572143455414"><img src="public_sys-resources/note_3.0-en-us.png"><span class="notetitle"> </span><div class="notebody"><p id="ALM-45291__p65732349545">If this alarm is generated, the non-heap memory size configured for the PolicySync instance cannot meet the non-heap memory required by the PolicySync process. You are advised to change the value of <strong id="ALM-45291__b3470111305819">-XX:MaxPermSize</strong> in <strong id="ALM-45291__b14470121313584">GC_OPTS</strong> to twice the current non-heap memory size or change the value based on site requirements.</p>
</div></div>
</p></li><li id="ALM-45291__li60448593"><span>Restart the affected services or instances and check whether the alarm is cleared.</span><p><ul id="ALM-45291__ul7166428"><li id="ALM-45291__li64497859">If yes, no further action is required.</li><li id="ALM-45291__li43609821">If no, go to <a href="#ALM-45291__d0e44186">6</a>.<div class="notice" id="ALM-45291__note88641929172117"><span class="noticetitle"><img src="public_sys-resources/notice_3.0-en-us.png"> </span><div class="noticebody"><p id="ALM-45291__p13864122962115">Restarting the service makes it unavailable and can disrupt business operations. Restarting an instance makes that instance unavailable, and any tasks running on the current instance node will fail.</p>
</div></div>
</li></ul>
</p></li></ol>
<p id="ALM-45291__p42734651"><strong id="ALM-45291__b661712394581">Collect fault information.</strong></p>
<ol start="6" id="ALM-45291__ol15264135112926"><li id="ALM-45291__d0e44186"><a name="ALM-45291__d0e44186"></a><a name="d0e44186"></a><span>On FusionInsight Manager, choose <strong id="ALM-45291__b711941115810">O&amp;M</strong>. In the navigation pane on the left, choose <strong id="ALM-45291__b1812441115813">Log</strong> &gt; <strong id="ALM-45291__b1812241205812">Download</strong>.</span></li><li id="ALM-45291__li15048140"><span>Expand the <strong id="ALM-45291__b13409142135812">Service</strong> drop-down list, and select <strong id="ALM-45291__b1040934220582">Ranger</strong> for the target cluster.</span></li><li id="ALM-45291__li1215532"><span>Click the edit icon in the upper right corner, and set <strong id="ALM-45291__b1171816430582">Start Date</strong> and <strong id="ALM-45291__b3718124395820">End Date</strong> for log collection to 10 minutes ahead of and after the alarm generation time, respectively. Then, click <strong id="ALM-45291__b371817435586">Download</strong>.</span></li><li id="ALM-45291__li10939791"><span>Contact <span id="ALM-45291__text1594844685812">O&amp;M personnel</span> and provide the collected logs.</span></li></ol>
</div>
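<p>The log collection window in the procedure above (10 minutes before and after the alarm generation time) can be computed as in the following sketch. The function name and the sample alarm time are assumptions made for illustration; the actual values are entered in <strong>Start Date</strong> and <strong>End Date</strong> on FusionInsight Manager.</p>

```python
from datetime import datetime, timedelta


def log_window(alarm_time: datetime, margin_minutes: int = 10):
    """Return (start, end) covering margin_minutes before and after alarm_time."""
    margin = timedelta(minutes=margin_minutes)
    return alarm_time - margin, alarm_time + margin


# Assumed alarm generation time, as read from the alarm list.
start, end = log_window(datetime(2024, 8, 2, 14, 30))
print(start.isoformat(), end.isoformat())
# prints: 2024-08-02T14:20:00 2024-08-02T14:40:00
```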
<div class="section" id="ALM-45291__section169311343318"><h4 class="sectiontitle"><span id="ALM-45291__text976142215819">Alarm Clearance</span></h4><p id="ALM-45291__p754913417333">This alarm is automatically cleared after the fault is rectified.</p>
</div>
<div class="section" id="ALM-45291__section891955662611"><h4 class="sectiontitle"><span id="ALM-45291__text13373191116114">Related Information</span></h4><p id="ALM-45291__p139191756122619"><span id="ALM-45291__text13669101910115">None.</span></p>
</div>
</div>
<div>
<div class="familylinks">
<div class="parentlink"><strong>Parent topic:</strong> <a href="mrs_01_1298.html">Alarm Reference (Applicable to MRS 3.x)</a></div>
</div>
</div>

@ -0,0 +1,91 @@
<a name="ALM-45292"></a><a name="ALM-45292"></a>
<h1 class="topictitle1">ALM-45292 PolicySync GC Duration Exceeds the Threshold</h1>
<div id="body16383168"><div class="note" id="ALM-45292__note14744151615401"><img src="public_sys-resources/note_3.0-en-us.png"><span class="notetitle"> </span><div class="notebody"><p id="ALM-45292__p1431519566">This section applies only to MRS 3.3.0 or later.</p>
</div></div>
<div class="section" id="ALM-45292__section8280367"><h4 class="sectiontitle"><span id="ALM-45292__text14838183534515">Alarm Description</span></h4><p id="ALM-45292__p11327101">The system checks the GC duration of the PolicySync process every 60 seconds. This alarm is generated when the GC duration of the PolicySync process exceeds the threshold for five consecutive checks. This alarm is cleared when the GC duration falls below the threshold.</p>
</div>
<div class="section" id="ALM-45292__section7414445"><h4 class="sectiontitle"><span id="ALM-45292__text66488119489">Alarm Attributes</span></h4>
<div class="tablenoborder"><table cellpadding="4" cellspacing="0" summary="" id="ALM-45292__table45079949" frame="border" border="1" rules="all"><thead align="left"><tr id="ALM-45292__row5683496"><th align="left" class="cellrowborder" valign="top" width="33.33333333333333%" id="mcps1.3.3.2.1.4.1.1"><p id="ALM-45292__p57710042"><span id="ALM-45292__text1074744511529">Alarm ID</span></p>
</th>
<th align="left" class="cellrowborder" valign="top" width="33.33333333333333%" id="mcps1.3.3.2.1.4.1.2"><p id="ALM-45292__p44001849"><span id="ALM-45292__text529420513457">Alarm Severity</span></p>
</th>
<th align="left" class="cellrowborder" valign="top" width="33.33333333333333%" id="mcps1.3.3.2.1.4.1.3"><p id="ALM-45292__p7380012"><span id="ALM-45292__text139206232502">Auto Cleared</span></p>
</th>
</tr>
</thead>
<tbody><tr id="ALM-45292__row60910108"><td class="cellrowborder" valign="top" width="33.33333333333333%" headers="mcps1.3.3.2.1.4.1.1 "><p id="ALM-45292__p34771696">45292</p>
</td>
<td class="cellrowborder" valign="top" width="33.33333333333333%" headers="mcps1.3.3.2.1.4.1.2 "><p id="ALM-45292__p149921348317">Critical (default threshold: 20000 ms)</p>
<p id="ALM-45292__p65043985">Major (default threshold: 12000 ms)</p>
</td>
<td class="cellrowborder" valign="top" width="33.33333333333333%" headers="mcps1.3.3.2.1.4.1.3 "><p id="ALM-45292__p34071398">Yes</p>
</td>
</tr>
</tbody>
</table>
</div>
</div>
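<p>The two default thresholds in the table map a measured GC duration to a severity as sketched below. This is a minimal illustration under the assumption that "exceeds" means strictly greater than; the constant and function names are hypothetical, and the thresholds can be changed on a real cluster.</p>

```python
CRITICAL_MS = 20000  # default Critical threshold from the table
MAJOR_MS = 12000     # default Major threshold from the table


def gc_severity(gc_duration_ms: int):
    """Return the severity for one GC duration sample, or None below both thresholds."""
    if gc_duration_ms > CRITICAL_MS:
        return "Critical"
    if gc_duration_ms > MAJOR_MS:
        return "Major"
    return None


for sample in (8000, 15000, 25000):
    print(sample, gc_severity(sample))
# prints:
# 8000 None
# 15000 Major
# 25000 Critical
```

<p>A single sample above a threshold does not raise the alarm by itself; as stated in the alarm description, the threshold must be exceeded in five consecutive checks.</p>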
<div class="section" id="ALM-45292__section66730009"><h4 class="sectiontitle"><span id="ALM-45292__text0580183514489">Alarm Parameters</span></h4>
<div class="tablenoborder"><table cellpadding="4" cellspacing="0" summary="" id="ALM-45292__table8319831" frame="border" border="1" rules="all"><thead align="left"><tr id="ALM-45292__row40868022"><th align="left" class="cellrowborder" valign="top" width="50%" id="mcps1.3.4.2.1.3.1.1"><p id="ALM-45292__p21975462"><span id="ALM-45292__text12210145419505">Parameter</span></p>
</th>
<th align="left" class="cellrowborder" valign="top" width="50%" id="mcps1.3.4.2.1.3.1.2"><p id="ALM-45292__p35182007"><span id="ALM-45292__text1971012173566">Description</span></p>
</th>
</tr>
</thead>
<tbody><tr id="ALM-45292__row594512751512"><td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.4.2.1.3.1.1 "><p id="ALM-45292__p13858113752316">Source</p>
</td>
<td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.4.2.1.3.1.2 "><p id="ALM-45292__p187931338134115">Specifies the cluster for which the alarm is generated.</p>
</td>
</tr>
<tr id="ALM-45292__row31170320"><td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.4.2.1.3.1.1 "><p id="ALM-45292__p39123317">ServiceName</p>
</td>
<td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.4.2.1.3.1.2 "><p id="ALM-45292__p27766973">Specifies the service for which the alarm is generated.</p>
</td>
</tr>
<tr id="ALM-45292__row48576167"><td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.4.2.1.3.1.1 "><p id="ALM-45292__p37226997">RoleName</p>
</td>
<td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.4.2.1.3.1.2 "><p id="ALM-45292__p8237383">Specifies the role for which the alarm is generated.</p>
</td>
</tr>
<tr id="ALM-45292__row7027591"><td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.4.2.1.3.1.1 "><p id="ALM-45292__p66118565">HostName</p>
</td>
<td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.4.2.1.3.1.2 "><p id="ALM-45292__p4237968">Specifies the host for which the alarm is generated.</p>
</td>
</tr>
<tr id="ALM-45292__row2952111825016"><td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.4.2.1.3.1.1 "><p id="ALM-45292__p15179191519371">Trigger Condition</p>
</td>
<td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.4.2.1.3.1.2 "><p id="ALM-45292__p1517911153376">Specifies the threshold for triggering the alarm.</p>
</td>
</tr>
</tbody>
</table>
</div>
</div>
<div class="section" id="ALM-45292__section63699172"><h4 class="sectiontitle"><span id="ALM-45292__text1127833410585">Impact on the System</span></h4><p id="ALM-45292__p42193686">PolicySync responds slowly.</p>
</div>
<div class="section" id="ALM-45292__section36421639"><h4 class="sectiontitle"><span id="ALM-45292__text10245783115">Possible Causes</span></h4><p id="ALM-45292__p62245396">The heap memory of the PolicySync process is overused or inappropriately allocated, causing frequent GC.</p>
</div>
<div class="section" id="ALM-45292__section1358115205716"><h4 class="sectiontitle"><span id="ALM-45292__text35421632154">Handling Procedure</span></h4><p id="ALM-45292__p8712319"><strong id="ALM-45292__b31998415113132">Check the GC duration.</strong></p>
<ol id="ALM-45292__ol11302008"><li id="ALM-45292__li34609214"><span>Log in to FusionInsight Manager and choose <strong id="ALM-45292__b10970195015910">O&amp;M</strong> &gt; <strong id="ALM-45292__b0970175065910">Alarm</strong> &gt; <strong id="ALM-45292__b1397085025918">Alarms</strong> &gt; <strong id="ALM-45292__b2971175055911">ALM-45292 PolicySync GC Duration Exceeds the Threshold</strong>. Check the location information of the alarm and view the host name of the instance for which the alarm is generated.</span></li><li id="ALM-45292__li43047473"><a name="ALM-45292__li43047473"></a><a name="li43047473"></a><span>On FusionInsight Manager, choose <strong id="ALM-45292__b57081555010">Cluster </strong>&gt; <strong id="ALM-45292__b1570885303">Services </strong>&gt; <strong id="ALM-45292__b17081752001">Ranger </strong>&gt; <strong id="ALM-45292__b570915701">Instance</strong>. Select the role corresponding to the host name of the instance for which the alarm is generated and click the drop-down list in the upper right corner of the chart area. Choose <strong id="ALM-45292__b77091655013">Customize </strong>&gt; <strong id="ALM-45292__b77091451709">GC</strong> &gt; <strong id="ALM-45292__b77091259011">PolicySync GC Duration</strong>. Click <strong id="ALM-45292__b137098512015">OK</strong>.</span></li><li id="ALM-45292__li51882938"><span>Check whether the GC duration of the PolicySync process collected every minute exceeds the threshold (12 seconds by default).</span><p><ul id="ALM-45292__ul64293258"><li id="ALM-45292__li41768410">If yes, go to <a href="#ALM-45292__d0e44388">4</a>.</li><li id="ALM-45292__li27798022">If no, go to <a href="#ALM-45292__d0e44409">6</a>.</li></ul>
</p></li><li id="ALM-45292__d0e44388"><a name="ALM-45292__d0e44388"></a><a name="d0e44388"></a><span>On FusionInsight Manager, choose <strong id="ALM-45292__b129648328011">Cluster </strong>&gt; <strong id="ALM-45292__b1496516321304">Services </strong>&gt; <strong id="ALM-45292__b796573218011">Ranger </strong>&gt; <strong id="ALM-45292__b12965132601">Instance </strong>&gt; <strong id="ALM-45292__b11965932407">PolicySync</strong>. Click <strong id="ALM-45292__b496611328019">Instance Configuration</strong> and then <strong id="ALM-45292__b896613214018">All Configurations</strong>, and choose <strong id="ALM-45292__b1596614328013">PolicySync </strong>&gt; <strong id="ALM-45292__b179661932101">System</strong>. Set <strong id="ALM-45292__b326343518014">-Xmx</strong> in the <strong id="ALM-45292__b526310352015">GC_OPTS</strong> parameter to a larger value based on site requirements and save the configuration.</span><p><div class="note" id="ALM-45292__note1572143455414"><img src="public_sys-resources/note_3.0-en-us.png"><span class="notetitle"> </span><div class="notebody"><p id="ALM-45292__p65732349545">If this alarm is generated, the heap memory configured for PolicySync cannot meet the heap memory required by the PolicySync process. You are advised to change the value of <strong id="ALM-45292__b169156361206">-Xmx</strong> in <strong id="ALM-45292__b4915536904">GC_OPTS</strong> to twice that of the heap memory used by PolicySync. You can change the value based on the actual service scenario. Refer to <a href="#ALM-45292__li43047473">2</a> to view the PolicySync heap memory usage.</p>
</div></div>
</p></li><li id="ALM-45292__li64990567"><span>Restart the affected services or instances and check whether the alarm is cleared.</span><p><ul id="ALM-45292__ul48044197"><li id="ALM-45292__li29744594">If yes, no further action is required.</li><li id="ALM-45292__li66374759">If no, go to <a href="#ALM-45292__d0e44409">6</a>.<div class="notice" id="ALM-45292__note15408019152111"><span class="noticetitle"><img src="public_sys-resources/notice_3.0-en-us.png"> </span><div class="noticebody"><p id="ALM-45292__p3408919142118">Restarting the service makes it unavailable and can disrupt business operations. Restarting an instance makes that instance unavailable, and any tasks running on the current instance node will fail.</p>
</div></div>
</li></ul>
</p></li></ol>
<p id="ALM-45292__p7646422"><strong id="ALM-45292__b39997837113153">Collect fault information.</strong></p>
<ol start="6" id="ALM-45292__ol22406895113157"><li id="ALM-45292__d0e44409"><a name="ALM-45292__d0e44409"></a><a name="d0e44409"></a><span>On FusionInsight Manager, choose <strong id="ALM-45292__b1816055675422">O&amp;M</strong>. In the navigation pane on the left, choose <strong id="ALM-45292__b382798899422">Log</strong> &gt; <strong id="ALM-45292__b1917425309422">Download</strong>.</span></li><li id="ALM-45292__li4206246"><span>Expand the <strong id="ALM-45292__b1623371292422">Service</strong> drop-down list, and select <strong id="ALM-45292__b1382717192422">Ranger</strong> for the target cluster.</span></li><li id="ALM-45292__li37856217"><span>Click the edit icon in the upper right corner, and set <strong id="ALM-45292__b63466517019">Start Date</strong> and <strong id="ALM-45292__b834615514020">End Date</strong> for log collection to 10 minutes ahead of and after the alarm generation time, respectively. Then, click <strong id="ALM-45292__b1034615519017">Download</strong>.</span></li><li id="ALM-45292__li5161635"><span>Contact <span id="ALM-45292__text126301214142412">O&amp;M personnel</span> and provide the collected logs.</span></li></ol>
</div>
<div class="section" id="ALM-45292__section169311343318"><h4 class="sectiontitle"><span id="ALM-45292__text976142215819">Alarm Clearance</span></h4><p id="ALM-45292__p754913417333">This alarm is automatically cleared after the fault is rectified.</p>
</div>
<div class="section" id="ALM-45292__section891955662611"><h4 class="sectiontitle"><span id="ALM-45292__text13373191116114">Related Information</span></h4><p id="ALM-45292__p139191756122619"><span id="ALM-45292__text13669101910115">None.</span></p>
</div>
<p id="ALM-45292__p188911526118"></p>
</div>
<div>
<div class="familylinks">
<div class="parentlink"><strong>Parent topic:</strong> <a href="mrs_01_1298.html">Alarm Reference (Applicable to MRS 3.x)</a></div>
</div>
</div>

@ -0,0 +1,96 @@
<a name="ALM-45435"></a><a name="ALM-45435"></a>
<h1 class="topictitle1">ALM-45435 Inconsistent Metadata of ClickHouse Tables</h1>
<div id="body1545702937431"><div class="note" id="ALM-45435__note14744151615401"><img src="public_sys-resources/note_3.0-en-us.png"><span class="notetitle"> </span><div class="notebody"><p id="ALM-45435__p1431519566">This section applies only to MRS 3.3.0 or later.</p>
</div></div>
<div class="section" id="ALM-45435__section16899242162213"><h4 class="sectiontitle"><span id="ALM-45435__text14838183534515">Alarm Description</span></h4><p id="ALM-45435__p10563369165">This alarm is generated when the metadata in a distributed table or in the local table of the distributed table has been inconsistent for 180 minutes.</p>
<p id="ALM-45435__p88950108215">This alarm is automatically cleared when the metadata in the distributed table or in the local table of the distributed table becomes consistent.</p>
<p id="ALM-45435__p1829242518323">Metadata consistency includes:</p>
<ul id="ALM-45435__ul108112810327"><li id="ALM-45435__li126911312326">Consistent quantity, name, sequence, and type of each column in the table</li><li id="ALM-45435__li174533512325">Consistent partition keys</li><li id="ALM-45435__li195482038103213">Consistent sorting keys</li><li id="ALM-45435__li274414123214">Consistent primary keys</li><li id="ALM-45435__li178122816322">Consistent sampling keys</li></ul>
<div class="note" id="ALM-45435__note1080116443316"><img src="public_sys-resources/note_3.0-en-us.png"><span class="notetitle"> </span><div class="notebody"><p id="ALM-45435__p59347714337">If this alarm exists, table metadata is inconsistent in the ClickHouse cluster to which the current node belongs. The inconsistency may be caused by multiple reasons, not limited to those mentioned in additional information.</p>
</div></div>
</div>
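<p>The consistency criteria listed above (columns, partition keys, sorting keys, primary keys, and sampling keys) amount to a field-by-field comparison of table metadata between nodes. The sketch below illustrates that comparison on plain Python values. It is not the check ClickHouse or FusionInsight Manager performs: the <strong>TableMetadata</strong> class, its field names, and the sample values are assumptions, and how the metadata is collected from each ClickHouseServer node is left out.</p>

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class TableMetadata:
    """Hypothetical snapshot of one table's metadata on one node."""
    columns: tuple       # ((name, type), ...) in declared order
    partition_key: str
    sorting_key: str
    primary_key: str
    sampling_key: str


def inconsistent_fields(reference: TableMetadata, other: TableMetadata):
    """Return the names of the metadata fields that differ between two nodes."""
    return [
        f.name
        for f in fields(reference)
        if getattr(reference, f.name) != getattr(other, f.name)
    ]


# Assumed metadata: the second node missed a column type change.
node_a = TableMetadata(
    columns=(("id", "UInt64"), ("ts", "DateTime"), ("value", "Float64")),
    partition_key="toYYYYMM(ts)",
    sorting_key="id, ts",
    primary_key="id",
    sampling_key="",
)
node_b = TableMetadata(
    columns=(("id", "UInt64"), ("ts", "DateTime"), ("value", "Float32")),
    partition_key="toYYYYMM(ts)",
    sorting_key="id, ts",
    primary_key="id",
    sampling_key="",
)
print(inconsistent_fields(node_a, node_b))
# prints: ['columns']
```

<p>Because the column tuple keeps the declared order, a difference in column quantity, name, sequence, or type shows up as a mismatch in the <strong>columns</strong> field.</p>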
<div class="section" id="ALM-45435__section7625192211"><h4 class="sectiontitle"><span id="ALM-45435__text66488119489">Alarm Attributes</span></h4>
<div class="tablenoborder"><table cellpadding="4" cellspacing="0" summary="" id="ALM-45435__table121116271288" frame="border" border="1" rules="all"><thead align="left"><tr id="ALM-45435__row1427611277820"><th align="left" class="cellrowborder" valign="top" width="33.33333333333333%" id="mcps1.3.3.2.1.4.1.1"><p id="ALM-45435__p927662712816"><span id="ALM-45435__text1074744511529">Alarm ID</span></p>
</th>
<th align="left" class="cellrowborder" valign="top" width="33.33333333333333%" id="mcps1.3.3.2.1.4.1.2"><p id="ALM-45435__p1027614271817"><span id="ALM-45435__text529420513457">Alarm Severity</span></p>
</th>
<th align="left" class="cellrowborder" valign="top" width="33.33333333333333%" id="mcps1.3.3.2.1.4.1.3"><p id="ALM-45435__p82761227186"><span id="ALM-45435__text139206232502">Auto Cleared</span></p>
</th>
</tr>
</thead>
<tbody><tr id="ALM-45435__row92762279818"><td class="cellrowborder" valign="top" width="33.33333333333333%" headers="mcps1.3.3.2.1.4.1.1 "><p id="ALM-45435__p162761427082">45435</p>
</td>
<td class="cellrowborder" valign="top" width="33.33333333333333%" headers="mcps1.3.3.2.1.4.1.2 "><p id="ALM-45435__p227610275811">Minor</p>
</td>
<td class="cellrowborder" valign="top" width="33.33333333333333%" headers="mcps1.3.3.2.1.4.1.3 "><p id="ALM-45435__p7276172714812">Yes</p>
</td>
</tr>
</tbody>
</table>
</div>
</div>
<div class="section" id="ALM-45435__section13677142112315"><h4 class="sectiontitle"><span id="ALM-45435__text0580183514489">Alarm Parameters</span></h4>
<div class="tablenoborder"><table cellpadding="4" cellspacing="0" summary="" id="ALM-45435__table132164271583" frame="border" border="1" rules="all"><thead align="left"><tr id="ALM-45435__row92761127482"><th align="left" class="cellrowborder" valign="top" width="50%" id="mcps1.3.4.2.1.3.1.1"><p id="ALM-45435__p12276527485"><span id="ALM-45435__text12210145419505">Parameter</span></p>
</th>
<th align="left" class="cellrowborder" valign="top" width="50%" id="mcps1.3.4.2.1.3.1.2"><p id="ALM-45435__p72767277812"><span id="ALM-45435__text1971012173566">Description</span></p>
</th>
</tr>
</thead>
<tbody><tr id="ALM-45435__row1427616274819"><td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.4.2.1.3.1.1 "><p id="ALM-45435__p162761627283">Source</p>
</td>
<td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.4.2.1.3.1.2 "><p id="ALM-45435__p1427611271585">Specifies the cluster or system for which the alarm is generated.</p>
</td>
</tr>
<tr id="ALM-45435__row5276827783"><td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.4.2.1.3.1.1 "><p id="ALM-45435__p52764271086">ServiceName</p>
</td>
<td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.4.2.1.3.1.2 "><p id="ALM-45435__p32763271180">Specifies the service for which the alarm is generated.</p>
</td>
</tr>
<tr id="ALM-45435__row2027616271585"><td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.4.2.1.3.1.1 "><p id="ALM-45435__p122762271287">RoleName</p>
</td>
<td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.4.2.1.3.1.2 "><p id="ALM-45435__p2276327885">Specifies the role for which the alarm is generated.</p>
</td>
</tr>
<tr id="ALM-45435__row5276202713819"><td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.4.2.1.3.1.1 "><p id="ALM-45435__p202768273810">Table</p>
</td>
<td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.4.2.1.3.1.2 "><p id="ALM-45435__p1227618271580">Specifies the database name and table name for which the alarm is generated.</p>
</td>
</tr>
</tbody>
</table>
</div>
</div>
<div class="section" id="ALM-45435__section3610161112317"><h4 class="sectiontitle"><span id="ALM-45435__text1127833410585">Impact on the System</span></h4><p id="ALM-45435__p11277127489">Subsequent operations such as INSERT and ALTER on the table may fail.</p>
</div>
<div class="section" id="ALM-45435__section919011910231"><h4 class="sectiontitle"><span id="ALM-45435__text10245783115">Possible Causes</span></h4><p id="ALM-45435__p180614392102">Table metadata modification fails or is not executed on one or more ClickHouseServer nodes.</p>
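<p>To see where the metadata differs, the column definitions reported by each node can be compared. The following is a minimal sketch, not the check that the alarm itself performs: it assumes that the logical cluster is named <strong>default_cluster</strong>, and <em>db_name</em> and <em>table_name</em> are placeholders for the database and table shown in the alarm. A column and type pair whose <strong>node_count</strong> is lower than the number of ClickHouseServer nodes is missing or defined differently on at least one node.</p>
<pre class="screen">-- Sketch only: default_cluster, db_name, and table_name are assumed placeholders.
SELECT
    name,
    type,
    count() AS node_count,        -- number of nodes that report this column with this type
    groupArray(host) AS hosts     -- the nodes that report it
FROM
(
    SELECT FQDN() AS host, name, type
    FROM clusterAllReplicas(default_cluster, system.columns)
    WHERE database = 'db_name' AND table = 'table_name'
)
GROUP BY name, type
ORDER BY node_count ASC, name ASC;</pre>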
</div>
<div class="section" id="ALM-45435__section7242948585"><h4 class="sectiontitle"><span id="ALM-45435__text35421632154">Handling Procedure</span></h4><ol id="ALM-45435__ol14303175512411"><li id="ALM-45435__li2655131716563"><span>Log in to FusionInsight Manager, choose <strong id="ALM-45435__b369212277160">O&amp;M</strong> &gt; <strong id="ALM-45435__b969252751612">Alarm</strong> &gt; <strong id="ALM-45435__b169212751615">Alarms</strong>, and view the role name and the IP address for the hostname in <strong id="ALM-45435__b11805471166">Location</strong>.</span></li><li id="ALM-45435__li2080719511561"><span>Log in to the node where the client is installed as the client installation user and run the following commands:</span><p><p id="ALM-45435__p104947467"><strong id="ALM-45435__b13678437202">cd </strong><em id="ALM-45435__i1767818314203">{Client installation path}</em></p>
<p id="ALM-45435__p1862917101764"><strong id="ALM-45435__b66292100619">source bigdata_env</strong></p>
<ul id="ALM-45435__ul124161821973"><li id="ALM-45435__li8416721677">For a cluster with Kerberos authentication enabled (security mode):<p id="ALM-45435__p141391537163"><a name="ALM-45435__li8416721677"></a><a name="li8416721677"></a><strong id="ALM-45435__b201454351202">kinit</strong> <em id="ALM-45435__i111451235112012">Component service user</em></p>
<p id="ALM-45435__p114906514387"><strong id="ALM-45435__b2939337484">clickhouse client --host </strong><em id="ALM-45435__i1194218334815">IP address of the ClickHouseServer instance that reports the alarm</em><strong id="ALM-45435__b5942203114817"> --port </strong>9440 <strong id="ALM-45435__b894410316486">--secure</strong></p>
</li><li id="ALM-45435__li20301166978">For a cluster with Kerberos authentication disabled (normal mode):<p id="ALM-45435__p162052540415"><a name="ALM-45435__li20301166978"></a><a name="li20301166978"></a><strong id="ALM-45435__b1320516547415">clickhouse client --host </strong><em id="ALM-45435__i891812221246">IP address of the ClickHouseServer instance that reports the alarm</em> <strong id="ALM-45435__b92057541043">--user </strong><em id="ALM-45435__i15903133313245">Username</em><strong id="ALM-45435__b122051654944"> --password</strong><strong id="ALM-45435__b02054545412"> --port </strong>9000</p>
</li></ul>
</p></li><li id="ALM-45435__li1430310555241"><span>Check whether any task is being executed for the table for which the alarm is generated.</span><p><p id="ALM-45435__p197501837115215">Run the following command to check whether any SQL task is being executed:</p>
<p id="ALM-45435__p113183596251"><strong id="ALM-45435__b9389101913518">select * from system.processes where current_database='</strong><em id="ALM-45435__i11391191963516">Database name</em><strong id="ALM-45435__b451202320352">' and query like '%</strong><em id="ALM-45435__i1614113246356">Table name</em><strong id="ALM-45435__b1551212232357">%'</strong></p>
<p id="ALM-45435__p132056399549">Run the following command to check whether a mutation task is being executed:</p>
<p id="ALM-45435__p773064913544"><strong id="ALM-45435__b11882724134015">select * from system.mutations where database=</strong><em id="ALM-45435__i16356122564011">'Database name' </em><strong id="ALM-45435__b1481311293403">and table=</strong><em id="ALM-45435__i17382153064018">'Table name'</em><strong id="ALM-45435__b18814129164019">;</strong></p>
<ul id="ALM-45435__ul2628181411374"><li id="ALM-45435__li1562851413375">If the query result is empty, go to <a href="#ALM-45435__li2088812501189">4</a>.</li><li id="ALM-45435__li3854716173713">If the query result contains error information, rectify the fault accordingly. If the fault cannot be rectified based on the error information, go to <a href="#ALM-45435__li153021955172417">6</a>.</li><li id="ALM-45435__li1129494316">If the query result shows an ongoing task with no error, an SQL or mutation task is still being executed.<p id="ALM-45435__p1856315564316"><a name="ALM-45435__li1129494316"></a><a name="li1129494316"></a>Wait for 5 minutes. If the alarm is cleared, no further action is required. If the alarm persists, go to <a href="#ALM-45435__li2088812501189">4</a>.</p>
</li></ul>
</p></li><li id="ALM-45435__li2088812501189"><a name="ALM-45435__li2088812501189"></a><a name="li2088812501189"></a><span>Based on service requirements, modify the table structure, or delete or create the table, so that the table metadata is consistent on all nodes in the cluster. Wait for 5 minutes and check whether the alarm is cleared.</span><p><ul id="ALM-45435__ul153451427123611"><li id="ALM-45435__li1934592716363">If yes, no further action is required.</li><li id="ALM-45435__li5573143116365">If no, go to <a href="#ALM-45435__li1346013391892">5</a>.</li></ul>
</p></li></ol><ol start="5" id="ALM-45435__ol746023915911"><li id="ALM-45435__li1346013391892"><a name="ALM-45435__li1346013391892"></a><a name="li1346013391892"></a><span>If the table has been deleted, manually clear the alarm and check whether the alarm is reported again.</span><p><ul id="ALM-45435__ul242413151104"><li id="ALM-45435__li11424815181019">If yes, go to <a href="#ALM-45435__li153021955172417">6</a>.</li><li id="ALM-45435__li1613614363108">If no, no further action is required.</li></ul>
</p></li></ol>
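<p>When the mutation check in step 3 returns many rows, it can help to list only the unfinished mutations together with their latest failure reason. The following is a sketch that uses columns of the <strong>system.mutations</strong> table; <em>db_name</em> and <em>table_name</em> are placeholders, not real names.</p>
<pre class="screen">-- Sketch only: db_name and table_name are assumed placeholders.
SELECT
    mutation_id,
    command,
    create_time,
    parts_to_do,            -- parts that still have to be mutated
    latest_fail_time,
    latest_fail_reason      -- empty if no part has failed
FROM system.mutations
WHERE database = 'db_name'
  AND table = 'table_name'
  AND is_done = 0           -- unfinished mutations only
ORDER BY create_time;</pre>
<p>A non-empty <strong>latest_fail_reason</strong> corresponds to the "error information" case in step 3. Rows without a failure reason correspond to the "ongoing task" case.</p>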
<p id="ALM-45435__p7678135319232"><strong id="ALM-45435__b28846433192">Collect fault information.</strong></p>
<ol start="6" id="ALM-45435__ol1030219559241"><li id="ALM-45435__li153021955172417"><a name="ALM-45435__li153021955172417"></a><a name="li153021955172417"></a><span>On FusionInsight Manager, choose <strong id="ALM-45435__b1464912456199">O&amp;M</strong>. In the navigation pane on the left, choose <strong id="ALM-45435__b664984516191">Log</strong> &gt; <strong id="ALM-45435__b16491645201915">Download</strong>.</span></li><li id="ALM-45435__li13302125522416"><span>Expand the <strong id="ALM-45435__b132571850151917">Service</strong> drop-down list, and select <strong id="ALM-45435__b102571050111920">ClickHouse</strong> for the target cluster.</span></li><li id="ALM-45435__li730216554243"><span>Expand the <strong id="ALM-45435__b161181857171910">Hosts</strong> drop-down list. In the <strong id="ALM-45435__b8118757121911">Select Host</strong> dialog box that is displayed, select the abnormal host, and click <strong id="ALM-45435__b12118057191912">OK</strong>.</span></li><li id="ALM-45435__li18302105517248"><span>Click the edit icon in the upper right corner, and set <strong id="ALM-45435__b1495621714010">Start Date</strong> and <strong id="ALM-45435__b17957517907">End Date</strong> for log collection to 1 hour ahead of and after the alarm generation time, respectively. Then, click <strong id="ALM-45435__b0957101714011">Download</strong>.</span></li><li id="ALM-45435__li330245532414"><span>Contact <span id="ALM-45435__text103024559245">O&amp;M personnel</span> and provide the collected logs.</span></li></ol>
</div>
<div class="section" id="ALM-45435__section101419392510"><h4 class="sectiontitle"><span id="ALM-45435__text976142215819">Alarm Clearance</span></h4><p id="ALM-45435__p1014737253">This alarm is automatically cleared after the fault is rectified.</p>
</div>
<div class="section" id="ALM-45435__section891955662611"><h4 class="sectiontitle"><span id="ALM-45435__text13373191116114">Related Information</span></h4><p id="ALM-45435__p139191756122619"><span id="ALM-45435__text13669101910115">None.</span></p>
</div>
</div>
<div>
<div class="familylinks">
<div class="parentlink"><strong>Parent topic:</strong> <a href="mrs_01_1298.html">Alarm Reference (Applicable to MRS 3.x)</a></div>
</div>
</div>


@ -0,0 +1,88 @@
<a name="ALM-45436"></a><a name="ALM-45436"></a>
<h1 class="topictitle1">ALM-45436 Skew ClickHouse Table Data</h1>
<div id="body1559100774179"><div class="note" id="ALM-45436__note14744151615401"><img src="public_sys-resources/note_3.0-en-us.png"><span class="notetitle"> </span><div class="notebody"><p id="ALM-45436__p1431519566">This section applies only to MRS 3.3.0 or later.</p>
</div></div>
<div class="section" id="ALM-45436__section16899242162213"><h4 class="sectiontitle"><span id="ALM-45436__text14838183534515">Alarm Description</span></h4><p id="ALM-45436__p53915205176">This alarm is generated when the data in the local tables of a distributed table is unevenly distributed (skewed) across ClickHouse nodes. This alarm is automatically cleared when the data becomes balanced.</p>
<p id="ALM-45436__p88950108215">Data skew check method:</p>
<ul id="ALM-45436__ul155132611454"><li id="ALM-45436__li105132644516">If <strong id="ALM-45436__b15755350144216">min_table_check_data_bytes</strong> is set to <strong id="ALM-45436__b1277145310427">0</strong>, data skew check is disabled.</li><li id="ALM-45436__li195130644516">If <strong id="ALM-45436__b20275214164417">min_table_check_data_bytes</strong> is greater than <strong id="ALM-45436__b82751914174411">0</strong>, data skew check is enabled.</li></ul>
<p id="ALM-45436__p767510599307">After the data skew check is enabled, no data skew alarm is reported for a table whose data volume is less than the <strong id="ALM-45436__b2238351458">min_table_check_data_bytes</strong> value. If the data volume is greater than the <strong id="ALM-45436__b208403239462">min_table_check_data_bytes</strong> value and the difference in data volume of the same table between nodes exceeds the percentage specified in <strong id="ALM-45436__b10653102811471">min_table_data_varies_rate</strong>, the data is considered skewed and this alarm is reported.</p>
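<p>The comparison described above can be approximated with a query such as the following. It is a sketch and not the alarm's exact formula, which this section does not state. It assumes that the difference is measured as (largest - smallest) / largest of the compressed data size per node and that the logical cluster is named <strong>default_cluster</strong>. <em>db_name</em> and <em>table_name</em> are placeholders. Nodes that hold no active parts of the table return no row and are therefore not counted.</p>
<pre class="screen">-- Sketch only: the formula, default_cluster, db_name, and table_name are assumptions.
SELECT
    max(node_bytes) AS max_bytes,
    min(node_bytes) AS min_bytes,
    round((max(node_bytes) - min(node_bytes)) / max(node_bytes) * 100, 2) AS diff_percent
FROM
(
    -- compressed size of the table's active parts on each node
    SELECT FQDN() AS host, sum(data_compressed_bytes) AS node_bytes
    FROM clusterAllReplicas(default_cluster, system.parts)
    WHERE database = 'db_name' AND table = 'table_name' AND active = 1
    GROUP BY host
);</pre>
<p>Comparing <strong>max_bytes</strong> with the <strong>min_table_check_data_bytes</strong> value and <strong>diff_percent</strong> with the <strong>min_table_data_varies_rate</strong> value gives a rough indication of whether the table is close to the alarm condition.</p>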
</div>
<div class="section" id="ALM-45436__section7625192211"><h4 class="sectiontitle"><span id="ALM-45436__text66488119489">Alarm Attributes</span></h4>
<div class="tablenoborder"><table cellpadding="4" cellspacing="0" summary="" id="ALM-45436__table121116271288" frame="border" border="1" rules="all"><thead align="left"><tr id="ALM-45436__row1427611277820"><th align="left" class="cellrowborder" valign="top" width="33.33333333333333%" id="mcps1.3.3.2.1.4.1.1"><p id="ALM-45436__p927662712816"><span id="ALM-45436__text1074744511529">Alarm ID</span></p>
</th>
<th align="left" class="cellrowborder" valign="top" width="33.33333333333333%" id="mcps1.3.3.2.1.4.1.2"><p id="ALM-45436__p1027614271817"><span id="ALM-45436__text529420513457">Alarm Severity</span></p>
</th>
<th align="left" class="cellrowborder" valign="top" width="33.33333333333333%" id="mcps1.3.3.2.1.4.1.3"><p id="ALM-45436__p82761227186"><span id="ALM-45436__text139206232502">Auto Cleared</span></p>
</th>
</tr>
</thead>
<tbody><tr id="ALM-45436__row92762279818"><td class="cellrowborder" valign="top" width="33.33333333333333%" headers="mcps1.3.3.2.1.4.1.1 "><p id="ALM-45436__p162761427082">45436</p>
</td>
<td class="cellrowborder" valign="top" width="33.33333333333333%" headers="mcps1.3.3.2.1.4.1.2 "><p id="ALM-45436__p227610275811">Minor</p>
</td>
<td class="cellrowborder" valign="top" width="33.33333333333333%" headers="mcps1.3.3.2.1.4.1.3 "><p id="ALM-45436__p7276172714812">Yes</p>
</td>
</tr>
</tbody>
</table>
</div>
</div>
<div class="section" id="ALM-45436__section13677142112315"><h4 class="sectiontitle"><span id="ALM-45436__text0580183514489">Alarm Parameters</span></h4>
<div class="tablenoborder"><table cellpadding="4" cellspacing="0" summary="" id="ALM-45436__table132164271583" frame="border" border="1" rules="all"><thead align="left"><tr id="ALM-45436__row92761127482"><th align="left" class="cellrowborder" valign="top" width="50%" id="mcps1.3.4.2.1.3.1.1"><p id="ALM-45436__p12276527485"><span id="ALM-45436__text12210145419505">Parameter</span></p>
</th>
<th align="left" class="cellrowborder" valign="top" width="50%" id="mcps1.3.4.2.1.3.1.2"><p id="ALM-45436__p72767277812"><span id="ALM-45436__text1971012173566">Description</span></p>
</th>
</tr>
</thead>
<tbody><tr id="ALM-45436__row1427616274819"><td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.4.2.1.3.1.1 "><p id="ALM-45436__p162761627283">Source</p>
</td>
<td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.4.2.1.3.1.2 "><p id="ALM-45436__p1427611271585">Specifies the cluster or system for which the alarm is generated.</p>
</td>
</tr>
<tr id="ALM-45436__row5276827783"><td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.4.2.1.3.1.1 "><p id="ALM-45436__p52764271086">ServiceName</p>
</td>
<td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.4.2.1.3.1.2 "><p id="ALM-45436__p32763271180">Specifies the service for which the alarm is generated.</p>
</td>
</tr>
<tr id="ALM-45436__row2027616271585"><td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.4.2.1.3.1.1 "><p id="ALM-45436__p122762271287">RoleName</p>
</td>
<td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.4.2.1.3.1.2 "><p id="ALM-45436__p2276327885">Specifies the role for which the alarm is generated.</p>
</td>
</tr>
<tr id="ALM-45436__row5276202713819"><td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.4.2.1.3.1.1 "><p id="ALM-45436__p202768273810">Table</p>
</td>
<td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.4.2.1.3.1.2 "><p id="ALM-45436__p1227618271580">Specifies the database name and table name for which the alarm is generated.</p>
</td>
</tr>
</tbody>
</table>
</div>
</div>
<div class="section" id="ALM-45436__section3610161112317"><h4 class="sectiontitle"><span id="ALM-45436__text1127833410585">Impact on the System</span></h4><p id="ALM-45436__p11277127489">SQL execution efficiency may be lowered.</p>
</div>
<div class="section" id="ALM-45436__section919011910231"><h4 class="sectiontitle"><span id="ALM-45436__text10245783115">Possible Causes</span></h4><p id="ALM-45436__p180614392102">The data write policy is improper, causing unbalanced data among nodes.</p>
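<p>One place to start, as an assumption about what the write policy covers here, is the sharding expression of the distributed table. It appears in the <strong>engine_full</strong> column of <strong>system.tables</strong>. The following sketch lists the distributed tables of a database; <em>db_name</em> is a placeholder.</p>
<pre class="screen">-- Sketch only: db_name is an assumed placeholder.
SELECT
    name,
    engine_full     -- includes the cluster, the local table, and the sharding expression (if any)
FROM system.tables
WHERE database = 'db_name'
  AND engine = 'Distributed';</pre>
<p>A sharding expression with few distinct values, or writes that go directly to the local table on a single node, can lead to skew. Whether either applies depends on the service.</p>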
</div>
<div class="section" id="ALM-45436__section7242948585"><h4 class="sectiontitle"><span id="ALM-45436__text35421632154">Handling Procedure</span></h4><ol id="ALM-45436__ol14303175512411"><li id="ALM-45436__li2655131716563"><span>Log in to FusionInsight Manager, choose <strong id="ALM-45436__b627919232222">O&amp;M</strong> &gt; <strong id="ALM-45436__b112799236227">Alarm</strong> &gt; <strong id="ALM-45436__b6280182332219">Alarms</strong>, and view the role name and the IP address for the hostname in <strong id="ALM-45436__b1928122311224">Location</strong>.</span></li><li id="ALM-45436__li2080719511561"><span>Log in to the node where the client is installed as the client installation user and run the following commands:</span><p><p id="ALM-45436__p104947467"><strong id="ALM-45436__b1112442552615">cd </strong><em id="ALM-45436__i1712502592617">{Client installation path}</em></p>
<p id="ALM-45436__p1862917101764"><strong id="ALM-45436__b66292100619">source bigdata_env</strong></p>
<ul id="ALM-45436__ul124161821973"><li id="ALM-45436__li8416721677">Security mode (with Kerberos enabled):<p id="ALM-45436__p141391537163"><a name="ALM-45436__li8416721677"></a><a name="li8416721677"></a><strong id="ALM-45436__b191620199291">kinit</strong> <em id="ALM-45436__i1917119202918">Component service user</em></p>
<p id="ALM-45436__p114906514387"><strong id="ALM-45436__b414520275482">clickhouse client --host </strong><em id="ALM-45436__i15145182718484">IP address of the ClickHouseServer instance that reports the alarm</em><strong id="ALM-45436__b5146527194817"> --port </strong>9440 <strong id="ALM-45436__b614616279487">--secure</strong></p>
</li><li id="ALM-45436__li20301166978">Normal mode (with Kerberos disabled):<p id="ALM-45436__p162052540415"><a name="ALM-45436__li20301166978"></a><a name="li20301166978"></a><strong id="ALM-45436__b1582623682919">clickhouse client --host </strong><em id="ALM-45436__i17826153613290">IP address of the ClickHouseServer instance that reports the alarm</em> <strong id="ALM-45436__b08261136102913">--user </strong><em id="ALM-45436__i1282617363291">Username</em><strong id="ALM-45436__b168271736192920"> --password</strong><strong id="ALM-45436__b1782713611296"> --port </strong>9000</p>
</li></ul>
</p></li><li id="ALM-45436__li153585044119"><span>View data distribution.</span><p><p id="ALM-45436__p486635515211"><strong id="ALM-45436__b156251126185018">select FQDN(), database, table, sum(data_compressed_bytes) from clusterAllReplicas(</strong><em id="ALM-45436__i14602427165019">Name of the logical cluster</em><strong id="ALM-45436__b15625152615014">, system.parts) where database='</strong><em id="ALM-45436__i1960126985">Database name</em><strong id="ALM-45436__b5589530784">' and table='</strong><em id="ALM-45436__i183261931881">Table name</em><strong id="ALM-45436__b85901302811">' and active=1 group by (FQDN(), database, table);</strong></p>
</p></li><li id="ALM-45436__li166372716552"><span>Balance data with a few clicks or migrate data based on service requirements.</span></li><li id="ALM-45436__li48091081190"><span>Check whether the alarm is cleared.</span><p><ul id="ALM-45436__ul11148192517913"><li id="ALM-45436__li1614817252910">If yes, no further action is required.</li><li id="ALM-45436__li1014832513919">If no, go to <a href="#ALM-45436__li153021955172417">6</a>.</li></ul>
</p></li></ol>
<p id="ALM-45436__p7678135319232"><strong id="ALM-45436__b2013915155152">Collect fault information.</strong></p>
<ol start="6" id="ALM-45436__ol1030219559241"><li id="ALM-45436__li153021955172417"><a name="ALM-45436__li153021955172417"></a><a name="li153021955172417"></a><span>On FusionInsight Manager, choose <strong id="ALM-45436__b1168561611518">O&amp;M</strong>. In the navigation pane on the left, choose <strong id="ALM-45436__b86851516141519">Log</strong> &gt; <strong id="ALM-45436__b8686216171513">Download</strong>.</span></li><li id="ALM-45436__li13302125522416"><span>Expand the <strong id="ALM-45436__b20945121711516">Service</strong> drop-down list, and select <strong id="ALM-45436__b694511712159">ClickHouse</strong> for the target cluster.</span></li><li id="ALM-45436__li730216554243"><span>Expand the <strong id="ALM-45436__b101353208156">Hosts</strong> drop-down list. In the <strong id="ALM-45436__b11135920111518">Select Host</strong> dialog box that is displayed, select the abnormal host, and click <strong id="ALM-45436__b8135152071512">OK</strong>.</span></li><li id="ALM-45436__li18302105517248"><span>Click the edit icon in the upper right corner, and set <strong id="ALM-45436__b8863222306">Start Date</strong> and <strong id="ALM-45436__b1586320224012">End Date</strong> for log collection to 1 hour ahead of and after the alarm generation time, respectively. Then, click <strong id="ALM-45436__b286310222017">Download</strong>.</span></li><li id="ALM-45436__li330245532414"><span>Contact <span id="ALM-45436__text1848472871512">O&amp;M personnel</span> and provide the collected logs.</span></li></ol>
</div>
<div class="section" id="ALM-45436__section101419392510"><h4 class="sectiontitle"><span id="ALM-45436__text976142215819">Alarm Clearance</span></h4><p id="ALM-45436__p1014737253">This alarm is automatically cleared after the fault is rectified.</p>
</div>
<div class="section" id="ALM-45436__section891955662611"><h4 class="sectiontitle"><span id="ALM-45436__text13373191116114">Related Information</span></h4><p id="ALM-45436__p139191756122619"><span id="ALM-45436__text13669101910115">None.</span></p>
</div>
</div>
<div>
<div class="familylinks">
<div class="parentlink"><strong>Parent topic:</strong> <a href="mrs_01_1298.html">Alarm Reference (Applicable to MRS 3.x)</a></div>
</div>
</div>


@ -0,0 +1,93 @@
<a name="ALM-45437"></a><a name="ALM-45437"></a>
<h1 class="topictitle1">ALM-45437 Excessive Parts in the ClickHouse Table</h1>
<div id="body1559128174373"><div class="note" id="ALM-45437__note14744151615401"><img src="public_sys-resources/note_3.0-en-us.png"><span class="notetitle"> </span><div class="notebody"><p id="ALM-45437__p1431519566">This section applies only to MRS 3.3.0 or later.</p>
</div></div>
<div class="section" id="ALM-45437__section16899242162213"><h4 class="sectiontitle"><span id="ALM-45437__text14838183534515">Alarm Description</span></h4><p id="ALM-45437__p0813155419145">This alarm is generated when the number of parts in a ClickHouse table exceeds the threshold specified by <strong id="ALM-45437__b18624917151612">part_num_threshold</strong>.</p>
<p id="ALM-45437__p88950108215">This alarm is automatically cleared when the number of parts is less than the <strong id="ALM-45437__b7141184215165">part_num_threshold</strong> value.</p>
</div>
<div class="section" id="ALM-45437__section7625192211"><h4 class="sectiontitle"><span id="ALM-45437__text66488119489">Alarm Attributes</span></h4>
<div class="tablenoborder"><table cellpadding="4" cellspacing="0" summary="" id="ALM-45437__table121116271288" frame="border" border="1" rules="all"><thead align="left"><tr id="ALM-45437__row1427611277820"><th align="left" class="cellrowborder" valign="top" width="33.33333333333333%" id="mcps1.3.3.2.1.4.1.1"><p id="ALM-45437__p927662712816"><span id="ALM-45437__text1074744511529">Alarm ID</span></p>
</th>
<th align="left" class="cellrowborder" valign="top" width="33.33333333333333%" id="mcps1.3.3.2.1.4.1.2"><p id="ALM-45437__p1027614271817"><span id="ALM-45437__text529420513457">Alarm Severity</span></p>
</th>
<th align="left" class="cellrowborder" valign="top" width="33.33333333333333%" id="mcps1.3.3.2.1.4.1.3"><p id="ALM-45437__p82761227186"><span id="ALM-45437__text139206232502">Auto Cleared</span></p>
</th>
</tr>
</thead>
<tbody><tr id="ALM-45437__row92762279818"><td class="cellrowborder" valign="top" width="33.33333333333333%" headers="mcps1.3.3.2.1.4.1.1 "><p id="ALM-45437__p162761427082">45437</p>
</td>
<td class="cellrowborder" valign="top" width="33.33333333333333%" headers="mcps1.3.3.2.1.4.1.2 "><p id="ALM-45437__p227610275811">Minor</p>
</td>
<td class="cellrowborder" valign="top" width="33.33333333333333%" headers="mcps1.3.3.2.1.4.1.3 "><p id="ALM-45437__p7276172714812">Yes</p>
</td>
</tr>
</tbody>
</table>
</div>
</div>
<div class="section" id="ALM-45437__section13677142112315"><h4 class="sectiontitle"><span id="ALM-45437__text0580183514489">Alarm Parameters</span></h4>
<div class="tablenoborder"><table cellpadding="4" cellspacing="0" summary="" id="ALM-45437__table132164271583" frame="border" border="1" rules="all"><thead align="left"><tr id="ALM-45437__row92761127482"><th align="left" class="cellrowborder" valign="top" width="50%" id="mcps1.3.4.2.1.3.1.1"><p id="ALM-45437__p12276527485"><span id="ALM-45437__text12210145419505">Parameter</span></p>
</th>
<th align="left" class="cellrowborder" valign="top" width="50%" id="mcps1.3.4.2.1.3.1.2"><p id="ALM-45437__p72767277812"><span id="ALM-45437__text1971012173566">Description</span></p>
</th>
</tr>
</thead>
<tbody><tr id="ALM-45437__row1427616274819"><td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.4.2.1.3.1.1 "><p id="ALM-45437__p162761627283">Source</p>
</td>
<td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.4.2.1.3.1.2 "><p id="ALM-45437__p1427611271585">Specifies the cluster or system for which the alarm is generated.</p>
</td>
</tr>
<tr id="ALM-45437__row5276827783"><td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.4.2.1.3.1.1 "><p id="ALM-45437__p52764271086">ServiceName</p>
</td>
<td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.4.2.1.3.1.2 "><p id="ALM-45437__p32763271180">Specifies the service for which the alarm is generated.</p>
</td>
</tr>
<tr id="ALM-45437__row2027616271585"><td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.4.2.1.3.1.1 "><p id="ALM-45437__p122762271287">RoleName</p>
</td>
<td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.4.2.1.3.1.2 "><p id="ALM-45437__p2276327885">Specifies the role for which the alarm is generated.</p>
</td>
</tr>
<tr id="ALM-45437__row14108759195213"><td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.4.2.1.3.1.1 "><p id="ALM-45437__p201091559185214">Table</p>
</td>
<td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.4.2.1.3.1.2 "><p id="ALM-45437__p151095598526">Specifies the database name and table name for which the alarm is generated.</p>
</td>
</tr>
<tr id="ALM-45437__row5276202713819"><td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.4.2.1.3.1.1 "><p id="ALM-45437__p202768273810">HostName</p>
</td>
<td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.4.2.1.3.1.2 "><p id="ALM-45437__p1227618271580">Specifies the host for which the alarm is generated.</p>
</td>
</tr>
</tbody>
</table>
</div>
</div>
<div class="section" id="ALM-45437__section3610161112317"><h4 class="sectiontitle"><span id="ALM-45437__text1127833410585">Impact on the System</span></h4><p id="ALM-45437__p11277127489">Service errors may occur.</p>
</div>
<div class="section" id="ALM-45437__section919011910231"><h4 class="sectiontitle"><span id="ALM-45437__text10245783115">Possible Causes</span></h4><p id="ALM-45437__p180614392102">The data distribution in the ClickHouse table is improper, or the background merge task is executed slowly.</p>
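<p>To judge whether background merges are slow, the merges that are currently running for the table can be listed from <strong>system.merges</strong> after connecting to the ClickHouseServer instance that reports the alarm. This is a sketch; <em>db_name</em> and <em>table_name</em> are placeholders.</p>
<pre class="screen">-- Sketch only: db_name and table_name are assumed placeholders.
SELECT
    database,
    table,
    round(elapsed, 1) AS elapsed_seconds,           -- time since the merge started
    round(progress * 100, 1) AS progress_percent,   -- progress is reported as 0 to 1
    num_parts,                                      -- number of source parts being merged
    result_part_name
FROM system.merges
WHERE database = 'db_name' AND table = 'table_name'
ORDER BY elapsed DESC;</pre>
<p>An empty result means that no merge is running for the table on this node at the moment. Merges with a long elapsed time and little progress suggest that merging is not keeping up with writes.</p>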
</div>
<div class="section" id="ALM-45437__section7242948585"><h4 class="sectiontitle"><span id="ALM-45437__text35421632154">Handling Procedure</span></h4><ol id="ALM-45437__ol14303175512411"><li id="ALM-45437__li2655131716563"><span>Log in to FusionInsight Manager, choose <strong id="ALM-45437__b456855471915">O&amp;M</strong> &gt; <strong id="ALM-45437__b6568175491915">Alarm</strong> &gt; <strong id="ALM-45437__b756895419191">Alarms</strong>, and view the role name and the IP address for the hostname in <strong id="ALM-45437__b9569754181913">Location</strong>.</span></li><li id="ALM-45437__li2080719511561"><span>Log in to the node where the client is installed as the client installation user and run the following commands:</span><p><p id="ALM-45437__p104947467"><strong id="ALM-45437__b116405572198">cd </strong><em id="ALM-45437__i11640557171918">{Client installation path}</em></p>
<p id="ALM-45437__p1862917101764"><strong id="ALM-45437__b66292100619">source bigdata_env</strong></p>
<ul id="ALM-45437__ul124161821973"><li id="ALM-45437__li8416721677">Security mode (with Kerberos enabled):<p id="ALM-45437__p141391537163"><a name="ALM-45437__li8416721677"></a><a name="li8416721677"></a><strong id="ALM-45437__b811711162019">kinit</strong> <em id="ALM-45437__i101176102015">Component service user</em></p>
<p id="ALM-45437__p114906514387"><strong id="ALM-45437__b9491951153818">clickhouse client --host </strong><em id="ALM-45437__i7491151133816">IP address of the ClickHouseServer instance that reports the alarm</em><strong id="ALM-45437__b9491165114385"> --port </strong>9440 <strong id="ALM-45437__b749114516388">--secure</strong></p>
</li><li id="ALM-45437__li20301166978">Normal mode (with Kerberos disabled):<p id="ALM-45437__p162052540415"><a name="ALM-45437__li20301166978"></a><a name="li20301166978"></a><strong id="ALM-45437__b4801165152013">clickhouse client --host </strong><em id="ALM-45437__i118013512013">IP address of the ClickHouseServer instance that reports the alarm</em> <strong id="ALM-45437__b188012562014">--user </strong><em id="ALM-45437__i1680116512012">Username</em><strong id="ALM-45437__b1080120519209"> --password</strong><strong id="ALM-45437__b6802145192017"> --port </strong>9000</p>
</li></ul>
</p></li><li id="ALM-45437__li5567053120"><span>Run the following command to manually merge parts:</span><p><p id="ALM-45437__p71411051846"><strong id="ALM-45437__b19241174571219">optimize table </strong><em id="ALM-45437__i13242154515122">Database name</em><strong id="ALM-45437__b13116848141213">.</strong><em id="ALM-45437__i666544841219">Table name</em><strong id="ALM-45437__b19116104810125"> final;</strong></p>
</p></li><li id="ALM-45437__li82777396514"><span>Check whether the number of parts has decreased.</span><p><p id="ALM-45437__p882733813147"><strong id="ALM-45437__b881435741213">select FQDN(), database, table, count(1) from clusterAllReplicas(default_cluster, system.parts) where database='</strong><em id="ALM-45437__i15584135820125">Database name</em><strong id="ALM-45437__b116931424134">' and table='</strong><em id="ALM-45437__i618213318137">Table name</em><strong id="ALM-45437__b56934281314">' and active=1 group by (FQDN(), database, table);</strong></p>
<ol type="a" id="ALM-45437__ol96891624151314"><li id="ALM-45437__li1498124219190">If the number of parts is less than the threshold, wait for 5 minutes and check whether the alarm is cleared.<ul id="ALM-45437__ul19685183821319"><li id="ALM-45437__li1768519384132">If yes, no further action is required.</li><li id="ALM-45437__li6137144741315">If no, go to <a href="#ALM-45437__li153021955172417">5</a>.</li></ul>
</li><li id="ALM-45437__li533692716135">If the number of parts does not decrease, check whether the partition key of the table is set properly. If the table has too many partitions, adjust the service logic to reduce the number of partitions.</li><li id="ALM-45437__li1121362819158">If the command output is empty, the table no longer exists and the alarm is a historical alarm. Manually clear the alarm.</li></ol>
</p></li></ol>
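<p>For the partition key check in step 4, counting the active parts per partition shows whether the parts are spread over many small partitions. This is a sketch to be run on the ClickHouseServer instance that reports the alarm; <em>db_name</em> and <em>table_name</em> are placeholders.</p>
<pre class="screen">-- Sketch only: db_name and table_name are assumed placeholders.
SELECT
    partition,
    count() AS part_count,      -- active parts in this partition
    sum(rows) AS total_rows     -- rows stored in this partition
FROM system.parts
WHERE database = 'db_name' AND table = 'table_name' AND active = 1
GROUP BY partition
ORDER BY part_count DESC
LIMIT 20;</pre>
<p>Many partitions that each hold only a few rows suggest that the partition key is too fine-grained. A few partitions with a very high <strong>part_count</strong> suggest that merges are lagging behind writes.</p>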
<p id="ALM-45437__p7678135319232"><strong id="ALM-45437__b15583223172912">Collect fault information.</strong></p>
<ol start="5" id="ALM-45437__ol1030219559241"><li id="ALM-45437__li153021955172417"><a name="ALM-45437__li153021955172417"></a><a name="li153021955172417"></a><span>On FusionInsight Manager, choose <strong id="ALM-45437__b5577152517297">O&amp;M</strong>. In the navigation pane on the left, choose <strong id="ALM-45437__b657782562915">Log</strong> &gt; <strong id="ALM-45437__b1577122532917">Download</strong>.</span></li><li id="ALM-45437__li13302125522416"><span>Expand the <strong id="ALM-45437__b1526813272291">Service</strong> drop-down list, and select <strong id="ALM-45437__b1526813275294">ClickHouse</strong> for the target cluster.</span></li><li id="ALM-45437__li730216554243"><span>Expand the <strong id="ALM-45437__b17916202815297">Hosts</strong> drop-down list. In the <strong id="ALM-45437__b199162028102919">Select Host</strong> dialog box that is displayed, select the abnormal host, and click <strong id="ALM-45437__b1091762812911">OK</strong>.</span></li><li id="ALM-45437__li18302105517248"><span>Click the edit icon in the upper right corner, and set <strong id="ALM-45437__b270492617011">Start Date</strong> and <strong id="ALM-45437__b167043261509">End Date</strong> for log collection to 1 hour ahead of and after the alarm generation time, respectively. Then, click <strong id="ALM-45437__b1170472619014">Download</strong>.</span></li><li id="ALM-45437__li330245532414"><span>Contact <span id="ALM-45437__text14667113672919">O&amp;M personnel</span> and provide the collected logs.</span></li></ol>
</div>
<div class="section" id="ALM-45437__section101419392510"><h4 class="sectiontitle"><span id="ALM-45437__text976142215819">Alarm Clearance</span></h4><p id="ALM-45437__p1014737253">This alarm is automatically cleared after the fault is rectified.</p>
</div>
<div class="section" id="ALM-45437__section891955662611"><h4 class="sectiontitle"><span id="ALM-45437__text13373191116114">Related Information</span></h4><p id="ALM-45437__p139191756122619"><span id="ALM-45437__text13669101910115">None.</span></p>
</div>
</div>
<div>
<div class="familylinks">
<div class="parentlink"><strong>Parent topic:</strong> <a href="mrs_01_1298.html">Alarm Reference (Applicable to MRS 3.x)</a></div>
</div>
</div>


@ -0,0 +1,83 @@
<a name="ALM-45438"></a><a name="ALM-45438"></a>
<h1 class="topictitle1">ALM-45438 ClickHouse Disk Usage Exceeds 80%</h1>
<div id="body1559128174373"><div class="note" id="ALM-45438__note14744151615401"><img src="public_sys-resources/note_3.0-en-us.png"><span class="notetitle"> </span><div class="notebody"><p id="ALM-45438__p1431519566">This section applies only to MRS 3.3.0 or later.</p>
</div></div>
<div class="section" id="ALM-45438__section16899242162213"><h4 class="sectiontitle"><span id="ALM-45438__text14838183534515">Alarm Description</span></h4><p id="ALM-45438__p182761827181">The system checks the disk capacity of the ClickHouseServer node every minute. This alarm is generated when the usage of the disk where the ClickHouse data directory or metadata directory resides exceeds 80%.</p>
<p id="ALM-45438__p88950108215">This alarm is automatically cleared when the usage of the disk where the ClickHouse data directory or metadata directory resides falls below 80%.</p>
</div>
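<p>The check described above can be approximated by hand on the affected node. The following is a minimal sketch, not the monitoring code that FusionInsight Manager runs: the function names are hypothetical, and the used/total ratio from <strong>shutil.disk_usage</strong> may differ slightly from the figure the alarm module computes (for example, because of reserved blocks).</p>

```python
import shutil

USAGE_THRESHOLD_PERCENT = 80.0  # threshold stated in the alarm description


def disk_usage_percent(path: str) -> float:
    """Return the used share of the file system holding `path`, in percent."""
    usage = shutil.disk_usage(path)
    return usage.used / usage.total * 100.0


def exceeds_threshold(percent: float,
                      threshold: float = USAGE_THRESHOLD_PERCENT) -> bool:
    """The alarm condition is 'exceeds 80%', so the comparison is strict."""
    return percent > threshold


if __name__ == "__main__":
    # "/" is a placeholder; use the path reported in DiskPath instead.
    percent = disk_usage_percent("/")
    print(f"{percent:.1f}% used, above threshold: {exceeds_threshold(percent)}")
```

<p>Passing the path shown in the <strong>DiskPath</strong> alarm parameter gives a quick indication of how far the disk is from the threshold before and after capacity expansion.</p>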
<div class="section" id="ALM-45438__section7625192211"><h4 class="sectiontitle"><span id="ALM-45438__text66488119489">Alarm Attributes</span></h4>
<div class="tablenoborder"><table cellpadding="4" cellspacing="0" summary="" id="ALM-45438__table121116271288" frame="border" border="1" rules="all"><thead align="left"><tr id="ALM-45438__row1427611277820"><th align="left" class="cellrowborder" valign="top" width="33.33333333333333%" id="mcps1.3.3.2.1.4.1.1"><p id="ALM-45438__p927662712816"><span id="ALM-45438__text1074744511529">Alarm ID</span></p>
</th>
<th align="left" class="cellrowborder" valign="top" width="33.33333333333333%" id="mcps1.3.3.2.1.4.1.2"><p id="ALM-45438__p1027614271817"><span id="ALM-45438__text529420513457">Alarm Severity</span></p>
</th>
<th align="left" class="cellrowborder" valign="top" width="33.33333333333333%" id="mcps1.3.3.2.1.4.1.3"><p id="ALM-45438__p82761227186"><span id="ALM-45438__text139206232502">Auto Cleared</span></p>
</th>
</tr>
</thead>
<tbody><tr id="ALM-45438__row92762279818"><td class="cellrowborder" valign="top" width="33.33333333333333%" headers="mcps1.3.3.2.1.4.1.1 "><p id="ALM-45438__p162761427082">45438</p>
</td>
<td class="cellrowborder" valign="top" width="33.33333333333333%" headers="mcps1.3.3.2.1.4.1.2 "><p id="ALM-45438__p227610275811">Major</p>
</td>
<td class="cellrowborder" valign="top" width="33.33333333333333%" headers="mcps1.3.3.2.1.4.1.3 "><p id="ALM-45438__p7276172714812">Yes</p>
</td>
</tr>
</tbody>
</table>
</div>
</div>
<div class="section" id="ALM-45438__section13677142112315"><h4 class="sectiontitle"><span id="ALM-45438__text0580183514489">Alarm Parameters</span></h4>
<div class="tablenoborder"><table cellpadding="4" cellspacing="0" summary="" id="ALM-45438__table132164271583" frame="border" border="1" rules="all"><thead align="left"><tr id="ALM-45438__row92761127482"><th align="left" class="cellrowborder" valign="top" width="50%" id="mcps1.3.4.2.1.3.1.1"><p id="ALM-45438__p12276527485"><span id="ALM-45438__text12210145419505">Parameter</span></p>
</th>
<th align="left" class="cellrowborder" valign="top" width="50%" id="mcps1.3.4.2.1.3.1.2"><p id="ALM-45438__p72767277812"><span id="ALM-45438__text1971012173566">Description</span></p>
</th>
</tr>
</thead>
<tbody><tr id="ALM-45438__row1427616274819"><td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.4.2.1.3.1.1 "><p id="ALM-45438__p162761627283">Source</p>
</td>
<td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.4.2.1.3.1.2 "><p id="ALM-45438__p1427611271585">Specifies the cluster or system for which the alarm is generated.</p>
</td>
</tr>
<tr id="ALM-45438__row5276827783"><td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.4.2.1.3.1.1 "><p id="ALM-45438__p52764271086">ServiceName</p>
</td>
<td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.4.2.1.3.1.2 "><p id="ALM-45438__p32763271180">Specifies the service for which the alarm is generated.</p>
</td>
</tr>
<tr id="ALM-45438__row2027616271585"><td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.4.2.1.3.1.1 "><p id="ALM-45438__p122762271287">RoleName</p>
</td>
<td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.4.2.1.3.1.2 "><p id="ALM-45438__p2276327885">Specifies the role for which the alarm is generated.</p>
</td>
</tr>
<tr id="ALM-45438__row5276202713819"><td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.4.2.1.3.1.1 "><p id="ALM-45438__p202768273810">HostName</p>
</td>
<td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.4.2.1.3.1.2 "><p id="ALM-45438__p1227618271580">Specifies the host for which the alarm is generated.</p>
</td>
</tr>
<tr id="ALM-45438__row37281842173210"><td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.4.2.1.3.1.1 "><p id="ALM-45438__p37286421329">DiskPath</p>
</td>
<td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.4.2.1.3.1.2 "><p id="ALM-45438__p10728642103211">Specifies the path of the disk for which the alarm is generated.</p>
</td>
</tr>
</tbody>
</table>
</div>
</div>
<div class="section" id="ALM-45438__section3610161112317"><h4 class="sectiontitle"><span id="ALM-45438__text1127833410585">Impact on the System</span></h4><p id="ALM-45438__p11277127489">The ClickHouse write operation may fail.</p>
</div>
<div class="section" id="ALM-45438__section919011910231"><h4 class="sectiontitle"><span id="ALM-45438__text10245783115">Possible Causes</span></h4><p id="ALM-45438__p180614392102">The disk capacity of the ClickHouseServer node is too small.</p>
</div>
<div class="section" id="ALM-45438__section7242948585"><h4 class="sectiontitle"><span id="ALM-45438__text35421632154">Handling Procedure</span></h4><ol id="ALM-45438__ol14303175512411"><li id="ALM-45438__li2655131716563"><span>Log in to FusionInsight Manager, choose <strong id="ALM-45438__b11996143910347">O&amp;M</strong> &gt; <strong id="ALM-45438__b499663983415">Alarm</strong> &gt; <strong id="ALM-45438__b799715391341">Alarms</strong>, and view the role name and the IP address for the hostname in <strong id="ALM-45438__b7997133973418">Location</strong>.</span></li><li id="ALM-45438__li2080719511561"><span>Expand the disk capacity of the node for which the alarm is generated.</span></li><li id="ALM-45438__li11709154612419"><span>Go to <a href="#ALM-45438__li153021955172417">4</a> if the expansion fails or the alarm persists after the expansion.</span></li></ol>
<p id="ALM-45438__p7678135319232"><strong id="ALM-45438__b752618012376">Collect fault information.</strong></p>
<ol start="4" id="ALM-45438__ol1030219559241"><li id="ALM-45438__li153021955172417"><a name="ALM-45438__li153021955172417"></a><a name="li153021955172417"></a><span>On FusionInsight Manager, choose <strong id="ALM-45438__b1296818227376">O&amp;M</strong>. In the navigation pane on the left, choose <strong id="ALM-45438__b9968132218371">Log</strong> &gt; <strong id="ALM-45438__b1696912223713">Download</strong>.</span></li><li id="ALM-45438__li13302125522416"><span>Expand the <strong id="ALM-45438__b5244424143717">Service</strong> drop-down list, and select <strong id="ALM-45438__b62441924193713">ClickHouse</strong> for the target cluster.</span></li><li id="ALM-45438__li730216554243"><span>Expand the <strong id="ALM-45438__b111242693714">Hosts</strong> drop-down list. In the <strong id="ALM-45438__b151124264373">Select Host</strong> dialog box that is displayed, select the abnormal host, and click <strong id="ALM-45438__b1411242611378">OK</strong>.</span></li><li id="ALM-45438__li18302105517248"><span>Click the edit icon in the upper right corner, and set <strong id="ALM-45438__b1579411291506">Start Date</strong> and <strong id="ALM-45438__b8794929601">End Date</strong> for log collection to 1 hour before and 1 hour after the alarm generation time, respectively. Then, click <strong id="ALM-45438__b57946291108">Download</strong>.</span></li><li id="ALM-45438__li330245532414"><span>Contact <span id="ALM-45438__text4972173593715">O&amp;M personnel</span> and provide the collected logs.</span></li></ol>
</div>
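<p>The log collection window in the procedure above is a simple offset from the alarm generation time. As an illustration only (the function name and the sample timestamp are hypothetical), the <strong>Start Date</strong> and <strong>End Date</strong> values could be derived as follows:</p>

```python
from datetime import datetime, timedelta


def log_window(alarm_time: datetime, margin: timedelta = timedelta(hours=1)):
    """Return (start, end) covering `margin` before and after the alarm."""
    return alarm_time - margin, alarm_time + margin


if __name__ == "__main__":
    # Hypothetical alarm generation time, for illustration only.
    start, end = log_window(datetime(2024, 8, 2, 14, 30))
    print("Start Date:", start)
    print("End Date:  ", end)
```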
<div class="section" id="ALM-45438__section101419392510"><h4 class="sectiontitle"><span id="ALM-45438__text976142215819">Alarm Clearance</span></h4><p id="ALM-45438__p1014737253">This alarm is automatically cleared after the fault is rectified.</p>
</div>
<div class="section" id="ALM-45438__section891955662611"><h4 class="sectiontitle"><span id="ALM-45438__text13373191116114">Related Information</span></h4><p id="ALM-45438__p139191756122619"><span id="ALM-45438__text13669101910115">None.</span></p>
</div>
</div>
<div>
<div class="familylinks">
<div class="parentlink"><strong>Parent topic:</strong> <a href="mrs_01_1298.html">Alarm Reference (Applicable to MRS 3.x)</a></div>
</div>
</div>


@ -0,0 +1,88 @@
<a name="ALM-45439"></a><a name="ALM-45439"></a>
<h1 class="topictitle1">ALM-45439 ClickHouse Node Enters the Read-Only Mode</h1>
<div id="body0000001239525407"><div class="note" id="ALM-45439__note14744151615401"><img src="public_sys-resources/note_3.0-en-us.png"><span class="notetitle"> </span><div class="notebody"><p id="ALM-45439__p1431519566">This section applies only to MRS 3.3.0 or later.</p>
</div></div>
<div class="section" id="ALM-45439__section16899242162213"><h4 class="sectiontitle"><span id="ALM-45439__text14838183534515">Alarm Description</span></h4><p id="ALM-45439__p182761827181">The system checks the disk capacity of the ClickHouseServer node every minute. This alarm is generated when the system detects that the disk usage exceeds 90% and the ClickHouseServer node enters the read-only mode.</p>
<p id="ALM-45439__p88950108215">This alarm is automatically cleared when the system detects that the disk usage is lower than 90% and the ClickHouseServer node exits the read-only mode.</p>
<div class="note" id="ALM-45439__note1977114313545"><img src="public_sys-resources/note_3.0-en-us.png"><span class="notetitle"> </span><div class="notebody"><p id="ALM-45439__p1437031605616">If the ClickHouseServer node is in read-only mode and you need to log in to the client to clear data, you can manually exit the read-only mode using the following method:</p>
<p id="ALM-45439__p67874311544">Log in to FusionInsight Manager, choose <strong id="ALM-45439__b788613318559">Cluster</strong> &gt; <strong id="ALM-45439__b159425518557">Services</strong> &gt; <strong id="ALM-45439__b4315171015553">ClickHouse</strong> &gt; <strong id="ALM-45439__b2524113185520">Configurations</strong> &gt; <strong id="ALM-45439__b2998191655519">All Configurations</strong>, search for <strong id="ALM-45439__b133155365516">profiles.default.readonly</strong>, and change its value to <strong id="ALM-45439__b78001359155516">0</strong>.</p>
</div></div>
</div>
<div class="section" id="ALM-45439__section7625192211"><h4 class="sectiontitle"><span id="ALM-45439__text66488119489">Alarm Attributes</span></h4>
<div class="tablenoborder"><table cellpadding="4" cellspacing="0" summary="" id="ALM-45439__table121116271288" frame="border" border="1" rules="all"><thead align="left"><tr id="ALM-45439__row1427611277820"><th align="left" class="cellrowborder" valign="top" width="33.33333333333333%" id="mcps1.3.3.2.1.4.1.1"><p id="ALM-45439__p927662712816"><span id="ALM-45439__text1074744511529">Alarm ID</span></p>
</th>
<th align="left" class="cellrowborder" valign="top" width="33.33333333333333%" id="mcps1.3.3.2.1.4.1.2"><p id="ALM-45439__p1027614271817"><span id="ALM-45439__text529420513457">Alarm Severity</span></p>
</th>
<th align="left" class="cellrowborder" valign="top" width="33.33333333333333%" id="mcps1.3.3.2.1.4.1.3"><p id="ALM-45439__p82761227186"><span id="ALM-45439__text139206232502">Auto Cleared</span></p>
</th>
</tr>
</thead>
<tbody><tr id="ALM-45439__row92762279818"><td class="cellrowborder" valign="top" width="33.33333333333333%" headers="mcps1.3.3.2.1.4.1.1 "><p id="ALM-45439__p162761427082">45439</p>
</td>
<td class="cellrowborder" valign="top" width="33.33333333333333%" headers="mcps1.3.3.2.1.4.1.2 "><p id="ALM-45439__p227610275811">Major</p>
</td>
<td class="cellrowborder" valign="top" width="33.33333333333333%" headers="mcps1.3.3.2.1.4.1.3 "><p id="ALM-45439__p7276172714812">Yes</p>
</td>
</tr>
</tbody>
</table>
</div>
</div>
<div class="section" id="ALM-45439__section13677142112315"><h4 class="sectiontitle"><span id="ALM-45439__text0580183514489">Alarm Parameters</span></h4>
<div class="tablenoborder"><table cellpadding="4" cellspacing="0" summary="" id="ALM-45439__table132164271583" frame="border" border="1" rules="all"><thead align="left"><tr id="ALM-45439__row92761127482"><th align="left" class="cellrowborder" valign="top" width="50%" id="mcps1.3.4.2.1.3.1.1"><p id="ALM-45439__p12276527485"><span id="ALM-45439__text12210145419505">Parameter</span></p>
</th>
<th align="left" class="cellrowborder" valign="top" width="50%" id="mcps1.3.4.2.1.3.1.2"><p id="ALM-45439__p72767277812"><span id="ALM-45439__text1971012173566">Description</span></p>
</th>
</tr>
</thead>
<tbody><tr id="ALM-45439__row1427616274819"><td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.4.2.1.3.1.1 "><p id="ALM-45439__p162761627283">Source</p>
</td>
<td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.4.2.1.3.1.2 "><p id="ALM-45439__p1427611271585">Specifies the cluster or system for which the alarm is generated.</p>
</td>
</tr>
<tr id="ALM-45439__row5276827783"><td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.4.2.1.3.1.1 "><p id="ALM-45439__p52764271086">ServiceName</p>
</td>
<td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.4.2.1.3.1.2 "><p id="ALM-45439__p32763271180">Specifies the service for which the alarm is generated.</p>
</td>
</tr>
<tr id="ALM-45439__row2027616271585"><td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.4.2.1.3.1.1 "><p id="ALM-45439__p122762271287">RoleName</p>
</td>
<td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.4.2.1.3.1.2 "><p id="ALM-45439__p2276327885">Specifies the role for which the alarm is generated.</p>
</td>
</tr>
<tr id="ALM-45439__row5276202713819"><td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.4.2.1.3.1.1 "><p id="ALM-45439__p202768273810">HostName</p>
</td>
<td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.4.2.1.3.1.2 "><p id="ALM-45439__p1227618271580">Specifies the host for which the alarm is generated.</p>
</td>
</tr>
<tr id="ALM-45439__row8463102014492"><td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.4.2.1.3.1.1 "><p id="ALM-45439__p44631320194912">DiskPath</p>
</td>
<td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.4.2.1.3.1.2 "><p id="ALM-45439__p34638204492">Specifies the path of the disk for which the alarm is generated.</p>
</td>
</tr>
</tbody>
</table>
</div>
</div>
<div class="section" id="ALM-45439__section3610161112317"><h4 class="sectiontitle"><span id="ALM-45439__text1127833410585">Impact on the System</span></h4><p id="ALM-45439__p11277127489">After the ClickHouseServer node enters the read-only mode, all write, modification, and deletion operations fail.</p>
</div>
<div class="section" id="ALM-45439__section919011910231"><h4 class="sectiontitle"><span id="ALM-45439__text10245783115">Possible Causes</span></h4><p id="ALM-45439__p180614392102">The disk usage of the ClickHouse node exceeds 90%.</p>
</div>
<div class="section" id="ALM-45439__section7242948585"><h4 class="sectiontitle"><span id="ALM-45439__text35421632154">Handling Procedure</span></h4><ol id="ALM-45439__ol14303175512411"><li id="ALM-45439__li2655131716563"><span>Log in to FusionInsight Manager, choose <strong id="ALM-45439__b91463710572">O&amp;M</strong> &gt; <strong id="ALM-45439__b0147677572">Alarm</strong> &gt; <strong id="ALM-45439__b014777195719">Alarms</strong>, and view the role name and the IP address for the hostname in <strong id="ALM-45439__b141479713579">Location</strong>.</span></li><li id="ALM-45439__li12332155152117"><span>Expand the disk capacity of the node for which the alarm is generated.</span></li><li id="ALM-45439__li5489174907"><span>Go to <a href="#ALM-45439__li153021955172417">4</a> if the expansion fails or the alarm persists after the expansion.</span><p><div class="note" id="ALM-45439__note123910115419"><img src="public_sys-resources/note_3.0-en-us.png"><span class="notetitle"> </span><div class="notebody"><p id="ALM-45439__p82401105420">After the capacity expansion, this alarm can be automatically cleared only when <strong id="ALM-45439__b84928973413">profiles.default.readonly</strong> is <strong id="ALM-45439__b119521493413">auto</strong>. If its value has been manually changed, change it back to <strong id="ALM-45439__b212933218341">auto</strong>. If <strong id="ALM-45439__b4670162963510">profiles.default.readonly</strong> needs to be set to <strong id="ALM-45439__b15915038173515">0</strong> or <strong id="ALM-45439__b15931412357">1</strong> based on service requirements, manually clear this alarm.</p>
</div></div>
</p></li></ol>
<p id="ALM-45439__p7678135319232"><strong id="ALM-45439__b1570717211577">Collect fault information.</strong></p>
<ol start="4" id="ALM-45439__ol1030219559241"><li id="ALM-45439__li153021955172417"><a name="ALM-45439__li153021955172417"></a><a name="li153021955172417"></a><span>On FusionInsight Manager, choose <strong id="ALM-45439__b292792295711">O&amp;M</strong>. In the navigation pane on the left, choose <strong id="ALM-45439__b109288225579">Log</strong> &gt; <strong id="ALM-45439__b1928522165720">Download</strong>.</span></li><li id="ALM-45439__li13302125522416"><span>Expand the <strong id="ALM-45439__b688811231572">Service</strong> drop-down list, and select <strong id="ALM-45439__b8888102310572">ClickHouse</strong> for the target cluster.</span></li><li id="ALM-45439__li730216554243"><span>Expand the <strong id="ALM-45439__b10970324115714">Hosts</strong> drop-down list. In the <strong id="ALM-45439__b59705244573">Select Host</strong> dialog box that is displayed, select the abnormal host, and click <strong id="ALM-45439__b997019247577">OK</strong>.</span></li><li id="ALM-45439__li18302105517248"><span>Click the edit icon in the upper right corner, and set <strong id="ALM-45439__b14976143211017">Start Date</strong> and <strong id="ALM-45439__b109761532403">End Date</strong> for log collection to 1 hour before and 1 hour after the alarm generation time, respectively. Then, click <strong id="ALM-45439__b79778325015">Download</strong>.</span></li><li id="ALM-45439__li330245532414"><span>Contact <span id="ALM-45439__text125023195713">O&amp;M personnel</span> and provide the collected logs.</span></li></ol>
</div>
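<p>The interaction between disk usage and the <strong>profiles.default.readonly</strong> setting described in the procedure above can be summarized as a small decision function. This is a sketch of the documented behavior only; the function name and the returned strings are hypothetical and do not correspond to any product API.</p>

```python
READ_ONLY_THRESHOLD_PERCENT = 90.0  # threshold stated in the alarm description


def next_action(usage_percent: float, readonly_setting: str) -> str:
    """Suggest the next step for ALM-45439 from usage and the readonly value."""
    # Clearance requires usage below 90%, so 90% itself still needs expansion.
    if usage_percent >= READ_ONLY_THRESHOLD_PERCENT:
        return "expand the disk"
    # Automatic clearance is documented only for the value "auto".
    if readonly_setting == "auto":
        return "wait for automatic clearance"
    return "set the value back to auto, or clear the alarm manually"


if __name__ == "__main__":
    for usage, setting in [(95.0, "auto"), (70.0, "auto"), (70.0, "0")]:
        print(usage, setting, "->", next_action(usage, setting))
```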
<div class="section" id="ALM-45439__section101419392510"><h4 class="sectiontitle"><span id="ALM-45439__text976142215819">Alarm Clearance</span></h4><p id="ALM-45439__p1014737253">This alarm is automatically cleared after the fault is rectified.</p>
</div>
<div class="section" id="ALM-45439__section891955662611"><h4 class="sectiontitle"><span id="ALM-45439__text13373191116114">Related Information</span></h4><p id="ALM-45439__p139191756122619"><span id="ALM-45439__text13669101910115">None.</span></p>
</div>
</div>
<div>
<div class="familylinks">
<div class="parentlink"><strong>Parent topic:</strong> <a href="mrs_01_1298.html">Alarm Reference (Applicable to MRS 3.x)</a></div>
</div>
</div>

121
docs/mrs/umn/ALM-45440.html Normal file

File diff suppressed because it is too large Load Diff


@ -0,0 +1,93 @@
<a name="ALM-45441"></a><a name="ALM-45441"></a>
<h1 class="topictitle1">ALM-45441 Zookeeper Disconnected</h1>
<div id="body0000001194005538"><div class="note" id="ALM-45441__note12303191265"><img src="public_sys-resources/note_3.0-en-us.png"><span class="notetitle"> </span><div class="notebody"><p id="ALM-45441__p1431519566">This section applies only to MRS 3.3.0 or later.</p>
</div></div>
<div class="section" id="ALM-45441__section4181191543314"><h4 class="sectiontitle"><span id="ALM-45441__text14838183534515">Alarm Description</span></h4><p id="ALM-45441__p363513175232">The system checks the connection between ClickHouse and ZooKeeper every minute. This alarm is generated when the connection check fails three consecutive times, which indicates that the ZooKeeper connection is abnormal.</p>
<p id="ALM-45441__p842285323314">This alarm is automatically cleared when the system detects that the connection is normal.</p>
</div>
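<p>The "three consecutive failures" rule can be pictured as a counter that resets on every successful check. The sketch below is illustrative rather than the actual alarm module: the names are hypothetical, and it assumes that a single successful check is enough to clear the alarm, which is how the clearance condition above reads.</p>

```python
FAILURES_BEFORE_ALARM = 3  # consecutive failed checks stated in the description


def alarm_states(check_results):
    """Yield whether the alarm is active after each periodic check.

    `check_results` is an iterable of booleans: True means the
    ClickHouse-to-ZooKeeper connection check succeeded.
    """
    consecutive_failures = 0
    for connected in check_results:
        consecutive_failures = 0 if connected else consecutive_failures + 1
        yield consecutive_failures >= FAILURES_BEFORE_ALARM


if __name__ == "__main__":
    checks = [False, False, True, False, False, False, True]
    print(list(alarm_states(checks)))
```

<p>With one check per minute, an isolated failed check does not raise the alarm; the connection must stay down for three checks in a row.</p>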
<div class="section" id="ALM-45441__section6432132533414"><h4 class="sectiontitle"><span id="ALM-45441__text66488119489">Alarm Attributes</span></h4>
<div class="tablenoborder"><table cellpadding="4" cellspacing="0" summary="" id="ALM-45441__table15811244124611" frame="border" border="1" rules="all"><thead align="left"><tr id="ALM-45441__row115971544184611"><th align="left" class="cellrowborder" valign="top" width="33.33333333333333%" id="mcps1.3.3.2.1.4.1.1"><p id="ALM-45441__p12597174434618"><span id="ALM-45441__text1074744511529">Alarm ID</span></p>
</th>
<th align="left" class="cellrowborder" valign="top" width="33.33333333333333%" id="mcps1.3.3.2.1.4.1.2"><p id="ALM-45441__p5597114494615"><span id="ALM-45441__text529420513457">Alarm Severity</span></p>
</th>
<th align="left" class="cellrowborder" valign="top" width="33.33333333333333%" id="mcps1.3.3.2.1.4.1.3"><p id="ALM-45441__p1559716445469"><span id="ALM-45441__text139206232502">Auto Cleared</span></p>
</th>
</tr>
</thead>
<tbody><tr id="ALM-45441__row155971644124612"><td class="cellrowborder" valign="top" width="33.33333333333333%" headers="mcps1.3.3.2.1.4.1.1 "><p id="ALM-45441__p65978447466">45441</p>
</td>
<td class="cellrowborder" valign="top" width="33.33333333333333%" headers="mcps1.3.3.2.1.4.1.2 "><p id="ALM-45441__p13598344144611">Critical</p>
</td>
<td class="cellrowborder" valign="top" width="33.33333333333333%" headers="mcps1.3.3.2.1.4.1.3 "><p id="ALM-45441__p175981544194611">Yes</p>
</td>
</tr>
</tbody>
</table>
</div>
</div>
<div class="section" id="ALM-45441__section105471213143515"><h4 class="sectiontitle"><span id="ALM-45441__text0580183514489">Alarm Parameters</span></h4>
<div class="tablenoborder"><table cellpadding="4" cellspacing="0" summary="" id="ALM-45441__table132164271583" frame="border" border="1" rules="all"><thead align="left"><tr id="ALM-45441__row92761127482"><th align="left" class="cellrowborder" valign="top" width="50%" id="mcps1.3.4.2.1.3.1.1"><p id="ALM-45441__p12276527485"><span id="ALM-45441__text12210145419505">Parameter</span></p>
</th>
<th align="left" class="cellrowborder" valign="top" width="50%" id="mcps1.3.4.2.1.3.1.2"><p id="ALM-45441__p72767277812"><span id="ALM-45441__text1971012173566">Description</span></p>
</th>
</tr>
</thead>
<tbody><tr id="ALM-45441__row1427616274819"><td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.4.2.1.3.1.1 "><p id="ALM-45441__p162761627283">Source</p>
</td>
<td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.4.2.1.3.1.2 "><p id="ALM-45441__p1427611271585">Specifies the cluster or system for which the alarm is generated.</p>
</td>
</tr>
<tr id="ALM-45441__row5276827783"><td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.4.2.1.3.1.1 "><p id="ALM-45441__p52764271086">ServiceName</p>
</td>
<td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.4.2.1.3.1.2 "><p id="ALM-45441__p32763271180">Specifies the service for which the alarm is generated.</p>
</td>
</tr>
<tr id="ALM-45441__row2027616271585"><td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.4.2.1.3.1.1 "><p id="ALM-45441__p122762271287">RoleName</p>
</td>
<td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.4.2.1.3.1.2 "><p id="ALM-45441__p2276327885">Specifies the role for which the alarm is generated.</p>
</td>
</tr>
<tr id="ALM-45441__row5276202713819"><td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.4.2.1.3.1.1 "><p id="ALM-45441__p202768273810">HostName</p>
</td>
<td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.4.2.1.3.1.2 "><p id="ALM-45441__p1227618271580">Specifies the host for which the alarm is generated.</p>
</td>
</tr>
</tbody>
</table>
</div>
</div>
<div class="section" id="ALM-45441__section0918121233917"><h4 class="sectiontitle"><span id="ALM-45441__text1127833410585">Impact on the System</span></h4><p id="ALM-45441__p115222903917">If ClickHouse is disconnected from ZooKeeper, the ClickHouse service cannot be used.</p>
</div>
<div class="section" id="ALM-45441__section15920165211392"><h4 class="sectiontitle"><span id="ALM-45441__text10245783115">Possible Causes</span></h4><ul id="ALM-45441__ul99361645184215"><li id="ALM-45441__li1046817283012">The ZooKeeper service is abnormal.</li><li id="ALM-45441__li12936945154211">The ClickHouse service is overloaded.</li></ul>
</div>
<div class="section" id="ALM-45441__section1437654425314"><h4 class="sectiontitle"><span id="ALM-45441__text35421632154">Handling Procedure</span></h4><p id="ALM-45441__p15198192793613"><strong id="ALM-45441__b561045314214">Check whether ZooKeeper is normal.</strong></p>
<ol id="ALM-45441__ol79577513018"><li id="ALM-45441__li1195114518013"><span>On FusionInsight Manager, choose <strong id="ALM-45441__b19366759622">Cluster</strong> &gt; <strong id="ALM-45441__b33669592218">Services</strong> &gt; <strong id="ALM-45441__b536625916215">ZooKeeper</strong> &gt; <strong id="ALM-45441__b836716591922">quorumpeer</strong>.</span></li><li id="ALM-45441__li20952135802"><span>Check whether ZooKeeper instances are normal.</span><p><ul id="ALM-45441__ul4144613813"><li id="ALM-45441__li17144311818">If yes, go to <a href="#ALM-45441__li15319205119354">6</a>.</li><li id="ALM-45441__li214431387">If no, go to <a href="#ALM-45441__li1395215202">3</a>.</li></ul>
</p></li><li id="ALM-45441__li1395215202"><a name="ALM-45441__li1395215202"></a><a name="li1395215202"></a><span>Select instances whose status is not good and choose <strong id="ALM-45441__b172351721665">More</strong> &gt; <strong id="ALM-45441__b1823512211866">Restart Instance</strong>.</span></li><li id="ALM-45441__li99531855012"><span>Check whether the instance status is good after restart.</span><p><ul id="ALM-45441__ul9953054010"><li id="ALM-45441__li1995315703">If yes, go to <a href="#ALM-45441__li6946141915104">5</a>.</li><li id="ALM-45441__li995315511013">If no, go to <a href="#ALM-45441__li6769733151816">10</a>.</li></ul>
</p></li><li id="ALM-45441__li6946141915104"><a name="ALM-45441__li6946141915104"></a><a name="li6946141915104"></a><span>Choose <strong id="ALM-45441__b16111472714">O&amp;M</strong> &gt; <strong id="ALM-45441__b116121579716">Alarm</strong> &gt; <strong id="ALM-45441__b13612107578">Alarms</strong> and check whether the alarm is cleared.</span><p><ul id="ALM-45441__ul17946619191016"><li id="ALM-45441__li14946191918101">If yes, no further action is required.</li><li id="ALM-45441__li260244816319">If no, go to <a href="#ALM-45441__li15319205119354">6</a>.</li></ul>
</p></li></ol>
<p id="ALM-45441__p348310210267"><strong id="ALM-45441__b336616166720">Check whether the ClickHouse service load is heavy.</strong></p>
<ol start="6" id="ALM-45441__ol113197516357"><li id="ALM-45441__li15319205119354"><a name="ALM-45441__li15319205119354"></a><a name="li15319205119354"></a><span>Log in to FusionInsight Manager, choose <strong id="ALM-45441__b5853201316202">O&amp;M</strong> &gt; <strong id="ALM-45441__b1285361352013">Alarm</strong> &gt; <strong id="ALM-45441__b188531913182010">Alarms</strong>, and view the role name and the IP address for the hostname in <strong id="ALM-45441__b48531713112019">Location</strong>.</span></li><li id="ALM-45441__li13812198153717"><span>Log in to the node where the client is installed as the client installation user and run the following commands:</span><p><p id="ALM-45441__p20951185605"><strong id="ALM-45441__b4951351605">cd </strong><em id="ALM-45441__i49516515011">{Client installation path}</em></p>
<p id="ALM-45441__p10951851109"><strong id="ALM-45441__b895175403">source bigdata_env</strong></p>
<ul id="ALM-45441__ul119521151019"><li id="ALM-45441__li12952051017">For a cluster with Kerberos authentication enabled (security mode):<p id="ALM-45441__p199511657011"><a name="ALM-45441__li12952051017"></a><a name="li12952051017"></a><strong id="ALM-45441__b159515511014">kinit</strong> <em id="ALM-45441__i69513518010">Component service user</em></p>
<p id="ALM-45441__p2233164043715"><strong id="ALM-45441__b9928999924216">clickhouse client --host </strong><em id="ALM-45441__i17831994784216">IP address of the ClickHouseServer instance that reports the alarm</em><strong id="ALM-45441__b3090799784216"> --port </strong>9440 <strong id="ALM-45441__b5464036404216">--secure</strong></p>
</li><li id="ALM-45441__li59521456012">For a cluster with Kerberos authentication disabled (normal mode):<p id="ALM-45441__p4952175200"><a name="ALM-45441__li59521456012"></a><a name="li59521456012"></a><strong id="ALM-45441__b797211811917">clickhouse client --host </strong><em>IP address of the ClickHouseServer instance that reports the alarm</em><strong id="ALM-45441__b1896203614918"> --user</strong><em id="ALM-45441__i952216364916"> User name</em><strong id="ALM-45441__b934319531498"> --password --port </strong>9440</p>
</li></ul>
</p></li><li id="ALM-45441__li14242152404310"><span>Run the following statement to check whether data is frequently written to the system table. If yes, wait until the service execution is complete and check whether the alarm is cleared.</span><p><p id="ALM-45441__p89521551509"><strong id="ALM-45441__b4952195308">SELECT query_id, user, FQDN(), elapsed, query FROM system.processes ORDER BY query_id;</strong></p>
<ul id="ALM-45441__ul99521851808"><li id="ALM-45441__li18952658011">If yes, no further action is required.</li><li id="ALM-45441__li19952159015">If no, go to <a href="#ALM-45441__li195348914449">9</a>.</li></ul>
</p></li><li id="ALM-45441__li195348914449"><a name="ALM-45441__li195348914449"></a><a name="li195348914449"></a><span>Check whether a large amount of data is written. If yes, wait until the task is complete and check whether the alarm is cleared.</span><p><ul id="ALM-45441__ul153413918448"><li id="ALM-45441__li25354954411">If yes, no further action is required.</li><li id="ALM-45441__li65357912445">If no, go to <a href="#ALM-45441__li6769733151816">10</a>.</li></ul>
</p></li></ol>
<p id="ALM-45441__p1086712560313"><strong id="ALM-45441__b10676357193114">Collect fault information.</strong></p>
<ol start="10" id="ALM-45441__ol14770133318187"><li id="ALM-45441__li6769733151816"><a name="ALM-45441__li6769733151816"></a><a name="li6769733151816"></a><span>On FusionInsight Manager, choose <strong id="ALM-45441__b19849230121814">O&amp;M</strong>. In the navigation pane on the left, choose <strong id="ALM-45441__b785093001817">Log</strong> &gt; <strong id="ALM-45441__b198501930101814">Download</strong>.</span></li><li id="ALM-45441__li10902033134212"><span>Expand the <strong id="ALM-45441__b3487113271812">Service</strong> drop-down list, and select <strong id="ALM-45441__b84870323183">ClickHouse</strong> for the target cluster.</span></li><li id="ALM-45441__li1848161911347"><span>Expand the <strong id="ALM-45441__b11864103414188">Hosts</strong> drop-down list. In the <strong id="ALM-45441__b10864193401817">Select Host</strong> dialog box that is displayed, select the abnormal host, and click <strong id="ALM-45441__b17864534171819">OK</strong>.</span></li><li id="ALM-45441__li181213284341"><span>Click the edit icon in the upper right corner, and set <strong id="ALM-45441__b15588153631815">Start Date</strong> and <strong id="ALM-45441__b1458893620182">End Date</strong> for log collection to 1 hour before and 1 hour after the alarm generation time, respectively. Then, click <strong id="ALM-45441__b3588183601817">Download</strong>.</span></li><li id="ALM-45441__li1539653315345"><span>Contact <span id="ALM-45441__text5701498183">O&amp;M personnel</span> and provide the collected logs.</span></li></ol>
</div>
<div class="section" id="ALM-45441__section1069512919569"><h4 class="sectiontitle"><span id="ALM-45441__text976142215819">Alarm Clearance</span></h4><p id="ALM-45441__p391831655614">This alarm is automatically cleared after the fault is rectified.</p>
</div>
<div class="section" id="ALM-45441__section891955662611"><h4 class="sectiontitle"><span id="ALM-45441__text13373191116114">Related Information</span></h4><p id="ALM-45441__p139191756122619"><span id="ALM-45441__text13669101910115">None.</span></p>
</div>
</div>
<div>
<div class="familylinks">
<div class="parentlink"><strong>Parent topic:</strong> <a href="mrs_01_1298.html">Alarm Reference (Applicable to MRS 3.x)</a></div>
</div>
</div>


@ -0,0 +1,81 @@
<a name="ALM-45442"></a><a name="ALM-45442"></a>
<h1 class="topictitle1">ALM-45442 Too Many Concurrent SQL Statements</h1>
<div id="body20829273"><div class="note" id="ALM-45442__note12303191265"><img src="public_sys-resources/note_3.0-en-us.png"><span class="notetitle"> </span><div class="notebody"><p id="ALM-45442__p1431519566">This section applies only to MRS 3.3.0 or later.</p>
</div></div>
<div class="section" id="ALM-45442__section4181191543314"><h4 class="sectiontitle"><span id="ALM-45442__text14838183534515">Alarm Description</span></h4><p id="ALM-45442__p107817516304">The alarm module checks the number of concurrent ClickHouse requests every 30 seconds. This alarm is generated when the number of concurrent ClickHouse requests exceeds the concurrency threshold configured on the UI.</p>
<p id="ALM-45442__p5490163382810">This alarm is cleared when the system detects that the actual number of concurrent requests is less than the concurrency threshold.</p>
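<p>The raise and clear conditions above can be sketched as a small state function. This is only an illustration, not the alarm module's implementation: the names <strong>next_alarm_state</strong> and <strong>replay</strong> are hypothetical, and keeping the state unchanged when the count equals the threshold is an assumption read from the wording ("exceeds" to raise, "less than" to clear).</p>

```python
def next_alarm_state(active, concurrent, threshold):
    """Return the alarm state after one periodic check (hypothetical helper)."""
    if concurrent > threshold:
        return True    # raise: the count exceeds the threshold
    if threshold > concurrent:
        return False   # clear: the count is below the threshold
    return active      # assumption: equal to the threshold leaves the state unchanged


def replay(samples, threshold):
    """Apply next_alarm_state to a series of sampled request counts."""
    active = False
    history = []
    for concurrent in samples:
        active = next_alarm_state(active, concurrent, threshold)
        history.append(active)
    return history
```

<p>Under these assumptions, with a threshold of 10, the samples 5, 12, 10, 9, and 10 give the states no alarm, alarm, alarm, no alarm, and no alarm.</p>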
</div>
<div class="section" id="ALM-45442__section6432132533414"><h4 class="sectiontitle"><span id="ALM-45442__text66488119489">Alarm Attributes</span></h4>
<div class="tablenoborder"><table cellpadding="4" cellspacing="0" summary="" id="ALM-45442__table15811244124611" frame="border" border="1" rules="all"><thead align="left"><tr id="ALM-45442__row115971544184611"><th align="left" class="cellrowborder" valign="top" width="33.33333333333333%" id="mcps1.3.3.2.1.4.1.1"><p id="ALM-45442__p12597174434618"><span id="ALM-45442__text1074744511529">Alarm ID</span></p>
</th>
<th align="left" class="cellrowborder" valign="top" width="33.33333333333333%" id="mcps1.3.3.2.1.4.1.2"><p id="ALM-45442__p5597114494615"><span id="ALM-45442__text529420513457">Alarm Severity</span></p>
</th>
<th align="left" class="cellrowborder" valign="top" width="33.33333333333333%" id="mcps1.3.3.2.1.4.1.3"><p id="ALM-45442__p1559716445469"><span id="ALM-45442__text139206232502">Auto Cleared</span></p>
</th>
</tr>
</thead>
<tbody><tr id="ALM-45442__row155971644124612"><td class="cellrowborder" valign="top" width="33.33333333333333%" headers="mcps1.3.3.2.1.4.1.1 "><p id="ALM-45442__p65978447466">45442</p>
</td>
<td class="cellrowborder" valign="top" width="33.33333333333333%" headers="mcps1.3.3.2.1.4.1.2 "><p id="ALM-45442__p13598344144611">Major</p>
</td>
<td class="cellrowborder" valign="top" width="33.33333333333333%" headers="mcps1.3.3.2.1.4.1.3 "><p id="ALM-45442__p175981544194611">Yes</p>
</td>
</tr>
</tbody>
</table>
</div>
</div>
<div class="section" id="ALM-45442__section105471213143515"><h4 class="sectiontitle"><span id="ALM-45442__text0580183514489">Alarm Parameters</span></h4>
<div class="tablenoborder"><table cellpadding="4" cellspacing="0" summary="" id="ALM-45442__table132164271583" frame="border" border="1" rules="all"><thead align="left"><tr id="ALM-45442__row92761127482"><th align="left" class="cellrowborder" valign="top" width="50%" id="mcps1.3.4.2.1.3.1.1"><p id="ALM-45442__p12276527485"><span id="ALM-45442__text12210145419505">Parameter</span></p>
</th>
<th align="left" class="cellrowborder" valign="top" width="50%" id="mcps1.3.4.2.1.3.1.2"><p id="ALM-45442__p72767277812"><span id="ALM-45442__text1971012173566">Description</span></p>
</th>
</tr>
</thead>
<tbody><tr id="ALM-45442__row1427616274819"><td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.4.2.1.3.1.1 "><p id="ALM-45442__p162761627283">Source</p>
</td>
<td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.4.2.1.3.1.2 "><p id="ALM-45442__p1427611271585">Specifies the cluster or system for which the alarm is generated.</p>
</td>
</tr>
<tr id="ALM-45442__row5276827783"><td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.4.2.1.3.1.1 "><p id="ALM-45442__p52764271086">ServiceName</p>
</td>
<td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.4.2.1.3.1.2 "><p id="ALM-45442__p32763271180">Specifies the service for which the alarm is generated.</p>
</td>
</tr>
<tr id="ALM-45442__row2027616271585"><td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.4.2.1.3.1.1 "><p id="ALM-45442__p122762271287">RoleName</p>
</td>
<td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.4.2.1.3.1.2 "><p id="ALM-45442__p2276327885">Specifies the role for which the alarm is generated.</p>
</td>
</tr>
<tr id="ALM-45442__row5276202713819"><td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.4.2.1.3.1.1 "><p id="ALM-45442__p202768273810">HostName</p>
</td>
<td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.4.2.1.3.1.2 "><p id="ALM-45442__p1227618271580">Specifies the host for which the alarm is generated.</p>
</td>
</tr>
</tbody>
</table>
</div>
</div>
<div class="section" id="ALM-45442__section0918121233917"><h4 class="sectiontitle"><span id="ALM-45442__text1127833410585">Impact on the System</span></h4><p id="ALM-45442__p115222903917">If there are too many concurrent SQL statements, a large number of system resources are consumed. As a result, system response becomes slow.</p>
</div>
<div class="section" id="ALM-45442__section15920165211392"><h4 class="sectiontitle"><span id="ALM-45442__text10245783115">Possible Causes</span></h4><p id="ALM-45442__p697817318317">The ClickHouse service is overloaded.</p>
</div>
<div class="section" id="ALM-45442__section1437654425314"><h4 class="sectiontitle"><span id="ALM-45442__text35421632154">Handling Procedure</span></h4><ol id="ALM-45442__ol79577513018"><li id="ALM-45442__li1195114518013"><span>Log in to FusionInsight Manager, choose <strong id="ALM-45442__b099143212118">O&amp;M</strong> &gt; <strong id="ALM-45442__b1799032122118">Alarm</strong> &gt; <strong id="ALM-45442__b119983216219">Alarms</strong>, and view the role name and the IP address for the hostname in <strong id="ALM-45442__b210015326214">Location</strong>.</span></li><li id="ALM-45442__li4765141353315"><span>Choose <strong id="ALM-45442__b189722205201">Cluster</strong> &gt; <strong id="ALM-45442__b4972020152019">ClickHouse</strong> &gt; <strong id="ALM-45442__b11972172017207">Instance</strong> and select an instance based on the alarm information. Choose <strong id="ALM-45442__b4972162015202">Chart</strong> &gt; <strong id="ALM-45442__b1972720122013">Concurrency</strong> to check whether the actual number of concurrent SQL statements is greater than the SQL concurrency threshold.</span><p><ul id="ALM-45442__ul1585035204019"><li id="ALM-45442__li685012519406">If yes, go to <a href="#ALM-45442__li13745587366">3</a>.</li><li id="ALM-45442__li285011519401">If no, go to <a href="#ALM-45442__li6769733151816">5</a>.</li></ul>
</p></li><li id="ALM-45442__li13745587366"><a name="ALM-45442__li13745587366"></a><a name="li13745587366"></a><span>Confirm with the user whether a large number of tasks were being executed during the alarming period.</span><p><ul id="ALM-45442__ul32943140376"><li id="ALM-45442__li19294101493711">If yes, go to <a href="#ALM-45442__li99531855012">4</a>.</li><li id="ALM-45442__li629481463715">If no, go to <a href="#ALM-45442__li6769733151816">5</a>.</li></ul>
</p></li><li id="ALM-45442__li99531855012"><a name="ALM-45442__li99531855012"></a><a name="li99531855012"></a><span>On FusionInsight Manager, choose <strong id="ALM-45442__b896715410213">O&amp;M</strong> and click <strong id="ALM-45442__b16575104122220">Alarm</strong> &gt; <strong id="ALM-45442__b175818617222">Thresholds</strong> in the navigation pane on the left. On the displayed page, click <strong id="ALM-45442__b269553518226">ClickHouse</strong> &gt; <strong id="ALM-45442__b1158820398226">Concurrency</strong> and adjust the threshold, or wait until the task is complete. Check whether the alarm is cleared.</span><p><ul id="ALM-45442__ul9953054010"><li id="ALM-45442__li1995315703">If yes, no further action is required.</li><li id="ALM-45442__li995315511013">If no, go to <a href="#ALM-45442__li6769733151816">5</a>.</li></ul>
</p></li></ol>
<p id="ALM-45442__p348310210267"><strong id="ALM-45442__b4892621132610">Collect fault information.</strong></p>
<ol start="5" id="ALM-45442__ol14770133318187"><li id="ALM-45442__li6769733151816"><a name="ALM-45442__li6769733151816"></a><a name="li6769733151816"></a><span>On FusionInsight Manager, choose <strong id="ALM-45442__b19432253237">O&amp;M</strong>. In the navigation pane on the left, choose <strong id="ALM-45442__b244122511234">Log</strong> &gt; <strong id="ALM-45442__b144416259232">Download</strong>.</span></li><li id="ALM-45442__li10902033134212"><span>Expand the <strong id="ALM-45442__b14925192692316">Service</strong> drop-down list, and select <strong id="ALM-45442__b15925192614233">ClickHouse</strong> for the target cluster.</span></li><li id="ALM-45442__li1848161911347"><span>Expand the <strong id="ALM-45442__b1379183072312">Hosts</strong> drop-down list. In the <strong id="ALM-45442__b19791630182314">Select Host</strong> dialog box that is displayed, select the abnormal host, and click <strong id="ALM-45442__b57911309239">OK</strong>.</span></li><li id="ALM-45442__li181213284341"><span>Click the edit icon in the upper right corner, and set <strong id="ALM-45442__b136810341238">Start Date</strong> and <strong id="ALM-45442__b1036815344233">End Date</strong> for log collection to 1 hour ahead of and after the alarm generation time, respectively. Then, click <strong id="ALM-45442__b1236814342232">Download</strong>.</span></li><li id="ALM-45442__li1539653315345"><span>Contact <span id="ALM-45442__text5977173682313">O&amp;M personnel</span> and provide the collected logs.</span></li></ol>
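<p>The log collection window in the steps above (1 hour before and 1 hour after the alarm generation time) can be computed as follows. This is a minimal sketch: the function name <strong>log_window</strong> and the sample timestamp are hypothetical, and the values still have to be entered manually as <strong>Start Date</strong> and <strong>End Date</strong> on the log download page.</p>

```python
from datetime import datetime, timedelta


def log_window(alarm_time, margin=timedelta(hours=1)):
    """Return (start, end) for log collection around the alarm generation time."""
    return alarm_time - margin, alarm_time + margin


# Hypothetical alarm generation time, used for illustration only.
start, end = log_window(datetime(2024, 8, 2, 10, 30))
print(start.isoformat(), end.isoformat())
```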
</div>
<div class="section" id="ALM-45442__section1069512919569"><h4 class="sectiontitle"><span id="ALM-45442__text976142215819">Alarm Clearance</span></h4><p id="ALM-45442__p391831655614">This alarm is automatically cleared after the fault is rectified.</p>
</div>
<div class="section" id="ALM-45442__section891955662611"><h4 class="sectiontitle"><span id="ALM-45442__text13373191116114">Related Information</span></h4><p id="ALM-45442__p139191756122619"><span id="ALM-45442__text13669101910115">None.</span></p>
</div>
</div>
<div>
<div class="familylinks">
<div class="parentlink"><strong>Parent topic:</strong> <a href="mrs_01_1298.html">Alarm Reference (Applicable to MRS 3.x)</a></div>
</div>
</div>


@ -0,0 +1,91 @@
<a name="ALM-45443"></a><a name="ALM-45443"></a>
<h1 class="topictitle1">ALM-45443 Slow SQL Queries in the Cluster</h1>
<div id="body63357319"><div class="note" id="ALM-45443__note12303191265"><img src="public_sys-resources/note_3.0-en-us.png"><span class="notetitle"> </span><div class="notebody"><p id="ALM-45443__p1431519566">This section applies only to MRS 3.3.0 or later.</p>
</div></div>
<div class="section" id="ALM-45443__section4181191543314"><h4 class="sectiontitle"><span id="ALM-45443__text14838183534515">Alarm Description</span></h4><p id="ALM-45443__p182761827181">The system checks for slow SQL queries in ClickHouse every minute. This alarm is generated when the execution time of a SQL statement is longer than or equal to the slow SQL threshold.</p>
<p id="ALM-45443__p88950108215">This alarm is automatically cleared when the system detects that the execution time of the SQL statement is shorter than the slow SQL threshold.</p>
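<p>As a rough illustration of the condition above, the following sketch filters a list of running queries for those whose elapsed time is longer than or equal to a threshold. It assumes the rows were already fetched, for example from a query on <strong>system.processes</strong>. The names <strong>RunningQuery</strong> and <strong>slow_queries</strong> are hypothetical, and the actual threshold is the slow SQL threshold configured for the cluster, not a value defined here.</p>

```python
from typing import NamedTuple


class RunningQuery(NamedTuple):
    """One running query; fields mirror columns of system.processes."""
    query_id: str
    user: str
    elapsed: float  # seconds the query has been running
    query: str


def slow_queries(rows, threshold_seconds):
    """Return queries running for at least the threshold, slowest first."""
    slow = [row for row in rows if row.elapsed >= threshold_seconds]
    return sorted(slow, key=lambda row: row.elapsed, reverse=True)
```

<p>A query whose elapsed time equals the threshold is included, which matches the "longer than or equal to" wording of the alarm condition.</p>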
</div>
<div class="section" id="ALM-45443__section6432132533414"><h4 class="sectiontitle"><span id="ALM-45443__text66488119489">Alarm Attributes</span></h4>
<div class="tablenoborder"><table cellpadding="4" cellspacing="0" summary="" id="ALM-45443__table15811244124611" frame="border" border="1" rules="all"><thead align="left"><tr id="ALM-45443__row115971544184611"><th align="left" class="cellrowborder" valign="top" width="33.33333333333333%" id="mcps1.3.3.2.1.4.1.1"><p id="ALM-45443__p12597174434618"><span id="ALM-45443__text1074744511529">Alarm ID</span></p>
</th>
<th align="left" class="cellrowborder" valign="top" width="33.33333333333333%" id="mcps1.3.3.2.1.4.1.2"><p id="ALM-45443__p5597114494615"><span id="ALM-45443__text529420513457">Alarm Severity</span></p>
</th>
<th align="left" class="cellrowborder" valign="top" width="33.33333333333333%" id="mcps1.3.3.2.1.4.1.3"><p id="ALM-45443__p1559716445469"><span id="ALM-45443__text139206232502">Auto Cleared</span></p>
</th>
</tr>
</thead>
<tbody><tr id="ALM-45443__row155971644124612"><td class="cellrowborder" valign="top" width="33.33333333333333%" headers="mcps1.3.3.2.1.4.1.1 "><p id="ALM-45443__p65978447466">45443</p>
</td>
<td class="cellrowborder" valign="top" width="33.33333333333333%" headers="mcps1.3.3.2.1.4.1.2 "><p id="ALM-45443__p13598344144611">Major</p>
</td>
<td class="cellrowborder" valign="top" width="33.33333333333333%" headers="mcps1.3.3.2.1.4.1.3 "><p id="ALM-45443__p175981544194611">Yes</p>
</td>
</tr>
</tbody>
</table>
</div>
</div>
<div class="section" id="ALM-45443__section105471213143515"><h4 class="sectiontitle"><span id="ALM-45443__text0580183514489">Alarm Parameters</span></h4>
<div class="tablenoborder"><table cellpadding="4" cellspacing="0" summary="" id="ALM-45443__table132164271583" frame="border" border="1" rules="all"><thead align="left"><tr id="ALM-45443__row92761127482"><th align="left" class="cellrowborder" valign="top" width="50%" id="mcps1.3.4.2.1.3.1.1"><p id="ALM-45443__p12276527485"><span id="ALM-45443__text12210145419505">Parameter</span></p>
</th>
<th align="left" class="cellrowborder" valign="top" width="50%" id="mcps1.3.4.2.1.3.1.2"><p id="ALM-45443__p72767277812"><span id="ALM-45443__text1971012173566">Description</span></p>
</th>
</tr>
</thead>
<tbody><tr id="ALM-45443__row1427616274819"><td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.4.2.1.3.1.1 "><p id="ALM-45443__p162761627283">Source</p>
</td>
<td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.4.2.1.3.1.2 "><p id="ALM-45443__p1427611271585">Specifies the cluster or system for which the alarm is generated.</p>
</td>
</tr>
<tr id="ALM-45443__row5276827783"><td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.4.2.1.3.1.1 "><p id="ALM-45443__p52764271086">ServiceName</p>
</td>
<td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.4.2.1.3.1.2 "><p id="ALM-45443__p32763271180">Specifies the service for which the alarm is generated.</p>
</td>
</tr>
<tr id="ALM-45443__row2027616271585"><td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.4.2.1.3.1.1 "><p id="ALM-45443__p122762271287">RoleName</p>
</td>
<td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.4.2.1.3.1.2 "><p id="ALM-45443__p2276327885">Specifies the role for which the alarm is generated.</p>
</td>
</tr>
<tr id="ALM-45443__row5276202713819"><td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.4.2.1.3.1.1 "><p id="ALM-45443__p202768273810">HostName</p>
</td>
<td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.4.2.1.3.1.2 "><p id="ALM-45443__p1227618271580">Specifies the host for which the alarm is generated.</p>
</td>
</tr>
</tbody>
</table>
</div>
</div>
<div class="section" id="ALM-45443__section0918121233917"><h4 class="sectiontitle"><span id="ALM-45443__text1127833410585">Impact on the System</span></h4><p id="ALM-45443__p115222903917">The performance of the ClickHouse service deteriorates, which slows the response of other services. If there are too many slow SQL statements, the service may be unavailable.</p>
</div>
<div class="section" id="ALM-45443__section15920165211392"><h4 class="sectiontitle"><span id="ALM-45443__text10245783115">Possible Causes</span></h4><ul id="ALM-45443__ul99361645184215"><li id="ALM-45443__li293674514216">The ClickHouse service is overloaded.</li><li id="ALM-45443__li12936945154211">The execution of SQL statements takes a long time.</li></ul>
</div>
<div class="section" id="ALM-45443__section1437654425314"><h4 class="sectiontitle"><span id="ALM-45443__text35421632154">Handling Procedure</span></h4><p id="ALM-45443__p15198192793613"><strong id="ALM-45443__b1078011002714">Check whether the ClickHouse service load is heavy.</strong></p>
<ol id="ALM-45443__ol1031577544"><li id="ALM-45443__li891013335416"><span>Log in to FusionInsight Manager, choose <strong id="ALM-45443__b12255162514213">O&amp;M</strong> &gt; <strong id="ALM-45443__b1025515259214">Alarm</strong> &gt; <strong id="ALM-45443__b625515259212">Alarms</strong>, and view the role name and the IP address for the hostname in <strong id="ALM-45443__b925532512210">Location</strong>.</span></li><li id="ALM-45443__li20952135802"><span>Log in to the node where the client is installed as the client installation user and run the following commands:</span><p><p id="ALM-45443__p20951185605"><strong id="ALM-45443__b4951351605">cd </strong><em id="ALM-45443__i49516515011">{Client installation path}</em></p>
<p id="ALM-45443__p10951851109"><strong id="ALM-45443__b895175403">source bigdata_env</strong></p>
<ul id="ALM-45443__ul119521151019"><li id="ALM-45443__li12952051017">For a cluster with Kerberos authentication enabled (security mode):<p id="ALM-45443__p199511657011"><a name="ALM-45443__li12952051017"></a><a name="li12952051017"></a><strong id="ALM-45443__b159515511014">kinit</strong> <em id="ALM-45443__i69513518010">Component service user</em></p>
<p id="ALM-45443__p20952752006"><strong id="ALM-45443__b26971317152718">clickhouse client --host </strong><em id="ALM-45443__i3697161712279">IP address of the ClickHouseServer instance that reports the alarm</em><strong id="ALM-45443__b18697191718273"> --port </strong><em>Port number</em> <strong id="ALM-45443__b4697151711279">--secure</strong></p>
</li><li id="ALM-45443__li59521456012">For a cluster with Kerberos authentication disabled (normal mode):<p id="ALM-45443__p4952175200"><a name="ALM-45443__li59521456012"></a><a name="li59521456012"></a><strong id="ALM-45443__b736817376278">clickhouse client --host </strong><em id="ALM-45443__i6368193732719">IP address of the ClickHouseServer instance that reports the alarm</em> <strong id="ALM-45443__b1436817375276">--user </strong><em id="ALM-45443__i16369153782713">Username</em><strong id="ALM-45443__b11369137182717"> --password</strong><strong id="ALM-45443__b4369193717274"> --port </strong><em>Port number</em></p>
</li></ul>
</p></li><li id="ALM-45443__li891033318545"><span>Run the following statement to check whether data is frequently written to the system table. If yes, wait until the service execution is complete and check whether the alarm is cleared.</span><p><p id="ALM-45443__p89521551509"><strong id="ALM-45443__b4952195308">SELECT query_id, user, FQDN(), elapsed, query FROM system.processes ORDER BY query_id;</strong></p>
<ul id="ALM-45443__ul99521851808"><li id="ALM-45443__li18952658011">If yes, no further action is required.</li><li id="ALM-45443__li19952159015">If no, go to <a href="#ALM-45443__li1927623020184">4</a>.</li></ul>
</p></li></ol>
<p id="ALM-45443__p965203101"><strong id="ALM-45443__b27182813319">Check whether the SQL statements take a long time.</strong></p>
<ol start="4" id="ALM-45443__ol192761930121810"><li id="ALM-45443__li1927623020184"><a name="ALM-45443__li1927623020184"></a><a name="li1927623020184"></a><span>Check the logical cluster to which the alarm object belongs. Log in to FusionInsight Manager, click <strong id="ALM-45443__b139171459203116">Cluster</strong>, choose <strong id="ALM-45443__b529213243210">Services</strong> &gt; <strong id="ALM-45443__b8765523217">ClickHouse</strong>, and click <strong id="ALM-45443__b4693853214">Logic Cluster</strong>. On the displayed page, choose <strong id="ALM-45443__b1741973593018">Query Management</strong> &gt; <strong id="ALM-45443__b15317161653516">Ongoing Slow Queries</strong>. Check which SQL statements take a long time on the displayed page, confirm with the user to adjust services, optimize slow SQL statements, and check whether the optimization is successful.</span><p><ul id="ALM-45443__ul94391310111413"><li id="ALM-45443__li543915105142">If yes, go to <a href="#ALM-45443__li1043716190409">5</a>.</li><li id="ALM-45443__li14391510141414">If no, go to <a href="#ALM-45443__li6769733151816">6</a>.</li></ul>
</p></li><li id="ALM-45443__li1043716190409"><a name="ALM-45443__li1043716190409"></a><a name="li1043716190409"></a><span>After the SQL statements are complete, check whether the alarm is cleared.</span><p><ul id="ALM-45443__ul1437419154020"><li id="ALM-45443__li11437111916404">If yes, no further action is required.</li><li id="ALM-45443__li1743761904019">If no, go to <a href="#ALM-45443__li6769733151816">6</a>.</li></ul>
</p></li></ol>
<p id="ALM-45443__p348310210267"><strong id="ALM-45443__b4892621132610">Collect fault information.</strong></p>
<ol start="6" id="ALM-45443__ol14770133318187"><li id="ALM-45443__li6769733151816"><a name="ALM-45443__li6769733151816"></a><a name="li6769733151816"></a><span>On FusionInsight Manager, choose <strong id="ALM-45443__b14542949524230">O&amp;M</strong>. In the navigation pane on the left, choose <strong id="ALM-45443__b3266283604230">Log</strong> &gt; <strong id="ALM-45443__b15042013484230">Download</strong>.</span></li><li id="ALM-45443__li10902033134212"><span>Expand the <strong id="ALM-45443__b13861038044230">Service</strong> drop-down list, and select <strong id="ALM-45443__b3274132804230">ClickHouse</strong> for the target cluster.</span></li><li id="ALM-45443__li1848161911347"><span>Expand the <strong id="ALM-45443__b12642628784230">Hosts</strong> drop-down list. In the <strong id="ALM-45443__b20837869254230">Select Host</strong> dialog box that is displayed, select the abnormal host, and click <strong id="ALM-45443__b18862369244230">OK</strong>.</span></li><li id="ALM-45443__li181213284341"><span>Click the edit icon in the upper right corner, and set <strong id="ALM-45443__b14271849014230">Start Date</strong> and <strong id="ALM-45443__b5515233344230">End Date</strong> for log collection to 1 hour ahead of and after the alarm generation time, respectively. Then, click <strong id="ALM-45443__b2987094994230">Download</strong>.</span></li><li id="ALM-45443__li1539653315345"><span>Contact <span id="ALM-45443__text193961533193414">O&amp;M personnel</span> and provide the collected logs.</span></li></ol>
</div>
<div class="section" id="ALM-45443__section1069512919569"><h4 class="sectiontitle"><span id="ALM-45443__text976142215819">Alarm Clearance</span></h4><p id="ALM-45443__p391831655614">This alarm is automatically cleared after the fault is rectified.</p>
</div>
<div class="section" id="ALM-45443__section891955662611"><h4 class="sectiontitle"><span id="ALM-45443__text13373191116114">Related Information</span></h4><p id="ALM-45443__p139191756122619"><span id="ALM-45443__text13669101910115">None.</span></p>
</div>
</div>
<div>
<div class="familylinks">
<div class="parentlink"><strong>Parent topic:</strong> <a href="mrs_01_1298.html">Alarm Reference (Applicable to MRS 3.x)</a></div>
</div>
</div>


@ -0,0 +1,86 @@
<a name="ALM-45444"></a><a name="ALM-45444"></a>
<h1 class="topictitle1">ALM-45444 Abnormal ClickHouse Process</h1>
<div id="body63238839"><div class="note" id="ALM-45444__note8913155652611"><img src="public_sys-resources/note_3.0-en-us.png"><span class="notetitle"> </span><div class="notebody"><p id="ALM-45444__p10913195632614">This section applies only to MRS 3.3.0 or later.</p>
</div></div>
<div class="section" id="ALM-45444__section209151456122611"><h4 class="sectiontitle"><span id="ALM-45444__text14838183534515">Alarm Description</span></h4><p id="ALM-45444__p107817516304">The health check module checks ClickHouse instances every 30 seconds. If the number of consecutive failures exceeds the threshold, an alarm is reported. In this case, the ClickHouse process may stop responding and services cannot be properly executed.</p>
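<p>The consecutive-failure rule above can be sketched as a simple counter. This is an illustration under stated assumptions, not the health check module's code: the function name <strong>alarm_after_checks</strong> is hypothetical, each entry in the input stands for one periodic check result, and a successful check is assumed to reset the counter.</p>

```python
def alarm_after_checks(results, failure_threshold):
    """Return True if consecutive failed checks ever exceed the threshold.

    results: iterable of booleans, True meaning the health check passed.
    """
    consecutive = 0
    for passed in results:
        # Assumption: any successful check resets the failure counter.
        consecutive = 0 if passed else consecutive + 1
        if consecutive > failure_threshold:
            return True
    return False
```

<p>With a threshold of 2, three failed checks in a row report the alarm, whereas two failures followed by a success do not.</p>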
</div>
<div class="section" id="ALM-45444__section191525620261"><h4 class="sectiontitle"><span id="ALM-45444__text66488119489">Alarm Attributes</span></h4>
<div class="tablenoborder"><table cellpadding="4" cellspacing="0" summary="" id="ALM-45444__table591515562262" frame="border" border="1" rules="all"><thead align="left"><tr id="ALM-45444__row5915956102610"><th align="left" class="cellrowborder" valign="top" width="33.33333333333333%" id="mcps1.3.3.2.1.4.1.1"><p id="ALM-45444__p19151656172610"><span id="ALM-45444__text1074744511529">Alarm ID</span></p>
</th>
<th align="left" class="cellrowborder" valign="top" width="33.33333333333333%" id="mcps1.3.3.2.1.4.1.2"><p id="ALM-45444__p2091518563265"><span id="ALM-45444__text529420513457">Alarm Severity</span></p>
</th>
<th align="left" class="cellrowborder" valign="top" width="33.33333333333333%" id="mcps1.3.3.2.1.4.1.3"><p id="ALM-45444__p13915135612268"><span id="ALM-45444__text139206232502">Auto Cleared</span></p>
</th>
</tr>
</thead>
<tbody><tr id="ALM-45444__row1391585642617"><td class="cellrowborder" valign="top" width="33.33333333333333%" headers="mcps1.3.3.2.1.4.1.1 "><p id="ALM-45444__p11915185652619">45444</p>
</td>
<td class="cellrowborder" valign="top" width="33.33333333333333%" headers="mcps1.3.3.2.1.4.1.2 "><p id="ALM-45444__p12915165652616">Critical</p>
</td>
<td class="cellrowborder" valign="top" width="33.33333333333333%" headers="mcps1.3.3.2.1.4.1.3 "><p id="ALM-45444__p1691625612265">Yes</p>
</td>
</tr>
</tbody>
</table>
</div>
</div>
<div class="section" id="ALM-45444__section19916165682616"><h4 class="sectiontitle"><span id="ALM-45444__text0580183514489">Alarm Parameters</span></h4>
<div class="tablenoborder"><table cellpadding="4" cellspacing="0" summary="" id="ALM-45444__table189161756162610" frame="border" border="1" rules="all"><thead align="left"><tr id="ALM-45444__row1891614568262"><th align="left" class="cellrowborder" valign="top" width="50%" id="mcps1.3.4.2.1.3.1.1"><p id="ALM-45444__p09167563266"><span id="ALM-45444__text12210145419505">Parameter</span></p>
</th>
<th align="left" class="cellrowborder" valign="top" width="50%" id="mcps1.3.4.2.1.3.1.2"><p id="ALM-45444__p16916185622616"><span id="ALM-45444__text1971012173566">Description</span></p>
</th>
</tr>
</thead>
<tbody><tr id="ALM-45444__row5916195672615"><td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.4.2.1.3.1.1 "><p id="ALM-45444__p591665612614">Source</p>
</td>
<td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.4.2.1.3.1.2 "><p id="ALM-45444__p189166561264">Specifies the cluster or system for which the alarm is generated.</p>
</td>
</tr>
<tr id="ALM-45444__row13916145652618"><td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.4.2.1.3.1.1 "><p id="ALM-45444__p991619562263">ServiceName</p>
</td>
<td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.4.2.1.3.1.2 "><p id="ALM-45444__p99162564261">Specifies the service for which the alarm is generated.</p>
</td>
</tr>
<tr id="ALM-45444__row391615613263"><td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.4.2.1.3.1.1 "><p id="ALM-45444__p2091675612260">RoleName</p>
</td>
<td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.4.2.1.3.1.2 "><p id="ALM-45444__p14916256182615">Specifies the role for which the alarm is generated.</p>
</td>
</tr>
<tr id="ALM-45444__row16916105602616"><td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.4.2.1.3.1.1 "><p id="ALM-45444__p89171556112620">HostName</p>
</td>
<td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.4.2.1.3.1.2 "><p id="ALM-45444__p1091719564268">Specifies the host for which the alarm is generated.</p>
</td>
</tr>
</tbody>
</table>
</div>
</div>
<div class="section" id="ALM-45444__section149178567269"><h4 class="sectiontitle"><span id="ALM-45444__text1127833410585">Impact on the System</span></h4><p id="ALM-45444__p10917125632616">If the ClickHouse process is abnormal, services cannot run properly.</p>
</div>
<div class="section" id="ALM-45444__section13917856122619"><h4 class="sectiontitle"><span id="ALM-45444__text10245783115">Possible Causes</span></h4><p id="ALM-45444__p697817318317">The ClickHouse process runs improperly.</p>
</div>
<div class="section" id="ALM-45444__section14917145612262"><h4 class="sectiontitle"><span id="ALM-45444__text35421632154">Handling Procedure</span></h4><ol id="ALM-45444__ol159175560267"><li id="ALM-45444__li129171956202619"><span>Log in to FusionInsight Manager, choose <strong id="ALM-45444__b10613153985019">O&amp;M</strong> &gt; <strong id="ALM-45444__b136147391504">Alarm</strong> &gt; <strong id="ALM-45444__b7614123916503">Alarms</strong>, and view the role name and the IP address for the hostname in <strong id="ALM-45444__b1614153915014">Location</strong>.</span></li><li id="ALM-45444__li49171056152620"><span>Log in to the node where the client is installed as the client installation user and run the following commands:</span><p><p id="ALM-45444__p291714564260"><strong id="ALM-45444__b1410534311503">cd </strong><em id="ALM-45444__i2105543105018">{Client installation path}</em></p>
<p id="ALM-45444__p1091718563266"><strong id="ALM-45444__b189171056162620">source bigdata_env</strong></p>
<ul id="ALM-45444__ul2091745610265"><li id="ALM-45444__li1191795642616">For a cluster with Kerberos authentication enabled (security mode):<p id="ALM-45444__p991855611266"><a name="ALM-45444__li1191795642616"></a><a name="li1191795642616"></a><strong id="ALM-45444__b09186566263">kinit</strong> <em id="ALM-45444__i59181356172615">Component service user</em></p>
<p id="ALM-45444__p2918155682611"><strong id="ALM-45444__b20961153444222">clickhouse client --host </strong><em id="ALM-45444__i4851913054222">IP address of the ClickHouseServer instance that reports the alarm</em><strong id="ALM-45444__b14091716574222"> --port </strong>9440 <strong id="ALM-45444__b19430793294222">--secure</strong></p>
</li><li id="ALM-45444__li14918205622613">For a cluster with Kerberos authentication disabled (normal mode):<p id="ALM-45444__p13943125414375"><a name="ALM-45444__li14918205622613"></a><a name="li14918205622613"></a><strong id="ALM-45444__b12287888974222">clickhouse client --host </strong><em id="ALM-45444__i8086596194222">IP address of the ClickHouseServer instance that reports the alarm</em> <strong id="ALM-45444__b6097517624222">--user </strong><em id="ALM-45444__i20592752064222">Username</em><strong id="ALM-45444__b15603673404222"> --password</strong><strong id="ALM-45444__b19596656294222"> --port </strong>9000</p>
</li></ul>
</p></li><li id="ALM-45444__li1891825612618"><span>Run the following statement to check whether the result can be properly returned:</span><p><p id="ALM-45444__p1891845617264"><strong id="ALM-45444__b991815682610">SELECT 1;</strong></p>
<ul id="ALM-45444__ul6918125672612"><li id="ALM-45444__li891825632617">If yes, go to <a href="#ALM-45444__li611216137">4</a>.</li><li id="ALM-45444__li109188563267">If no, go to <a href="#ALM-45444__li179191356102616">5</a>.</li></ul>
</p></li><li id="ALM-45444__li611216137"><a name="ALM-45444__li611216137"></a><a name="li611216137"></a><span>Wait for several minutes and check whether the alarm is cleared.</span><p><ul id="ALM-45444__ul1344192818139"><li id="ALM-45444__li1244113287134">If yes, no further action is required.</li><li id="ALM-45444__li16441182851319">If no, go to <a href="#ALM-45444__li179191356102616">5</a>.</li></ul>
</p></li></ol>
<p id="ALM-45444__p15919856122610"><strong id="ALM-45444__b4892621132610">Collect fault information.</strong></p>
<ol start="5" id="ALM-45444__ol8919105662610"><li id="ALM-45444__li179191356102616"><a name="ALM-45444__li179191356102616"></a><a name="li179191356102616"></a><span>On FusionInsight Manager, choose <strong id="ALM-45444__b1383175624222">O&amp;M</strong>. In the navigation pane on the left, choose <strong id="ALM-45444__b1397912554222">Log</strong> &gt; <strong id="ALM-45444__b10365783414222">Download</strong>.</span></li><li id="ALM-45444__li9919165619265"><span>Expand the <strong id="ALM-45444__b7479613644222">Service</strong> drop-down list, and select <strong id="ALM-45444__b16555479604222">ClickHouse</strong> for the target cluster.</span></li><li id="ALM-45444__li18919155619260"><span>Expand the <strong id="ALM-45444__b21469614804222">Hosts</strong> drop-down list. In the <strong id="ALM-45444__b11690820864222">Select Host</strong> dialog box that is displayed, select the abnormal host, and click <strong id="ALM-45444__b18662919404222">OK</strong>.</span></li><li id="ALM-45444__li6919456142616"><span>Click the edit icon in the upper right corner, and set <strong id="ALM-45444__b19405663844222">Start Date</strong> and <strong id="ALM-45444__b12284045324222">End Date</strong> for log collection to 1 hour ahead of and after the alarm generation time, respectively. Then, click <strong id="ALM-45444__b13847233054222">Download</strong>.</span></li><li id="ALM-45444__li109197563265"><span>Contact <span id="ALM-45444__text209195568266">O&amp;M personnel</span> and provide the collected logs.</span></li></ol>
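<p>The log collection window described above is the alarm generation time minus and plus 1 hour. The following Python sketch shows the arithmetic; the function name <strong>log_window</strong> and the sample timestamp are hypothetical and are used only for illustration.</p>

```python
from datetime import datetime, timedelta
from typing import Tuple


def log_window(alarm_time: datetime,
               margin: timedelta = timedelta(hours=1)) -> Tuple[datetime, datetime]:
    """Return (Start Date, End Date) for log collection around an alarm."""
    return alarm_time - margin, alarm_time + margin


if __name__ == "__main__":
    # Hypothetical alarm generation time, chosen to show the window crossing midnight.
    start, end = log_window(datetime(2024, 8, 2, 0, 30))
    print("Start Date:", start.isoformat(sep=" "))
    print("End Date:  ", end.isoformat(sep=" "))
```

<p>Enter the resulting values as <strong>Start Date</strong> and <strong>End Date</strong> on the log download page.</p>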
</div>
<div class="section" id="ALM-45444__section169191156132611"><h4 class="sectiontitle"><span id="ALM-45444__text976142215819">Alarm Clearance</span></h4><p id="ALM-45444__p7919156132617">This alarm is automatically cleared after the fault is rectified.</p>
</div>
<div class="section" id="ALM-45444__section891955662611"><h4 class="sectiontitle"><span id="ALM-45444__text13373191116114">Related Information</span></h4><p id="ALM-45444__p139191756122619"><span id="ALM-45444__text13669101910115">None.</span></p>
</div>
</div>
<div>
<div class="familylinks">
<div class="parentlink"><strong>Parent topic:</strong> <a href="mrs_01_1298.html">Alarm Reference (Applicable to MRS 3.x)</a></div>
</div>
</div>

View File


@ -0,0 +1,84 @@
<a name="ALM-45639"></a><a name="ALM-45639"></a>
<h1 class="topictitle1">ALM-45639 Checkpointing of a Flink Job Times Out</h1>
<div id="body0000001505088401"><div class="section" id="ALM-45639__section35136898"><h4 class="sectiontitle">Description</h4><p id="ALM-45639__p1969103114813">The system checks the checkpointing duration of Flink jobs every 30 seconds. This alarm is generated if the checkpointing duration of a Flink job exceeds the threshold (600 seconds by default). This alarm is cleared when the checkpointing duration of the job is less than or equal to the threshold.</p>
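<p>The generation and clearance rule can be pictured as a comparison that is repeated at every check. The following Python sketch is a simplified model of that rule and is not the actual monitoring implementation. The function name <strong>alarm_transitions</strong> and the sample values are hypothetical; only the 30-second interval and the 600-second default threshold come from this topic.</p>

```python
from typing import Iterable, List, Tuple

CHECK_INTERVAL_S = 30       # the system checks every 30 seconds
DEFAULT_THRESHOLD_S = 600   # default threshold stated in this topic


def alarm_transitions(durations_s: Iterable[float],
                      threshold_s: float = DEFAULT_THRESHOLD_S) -> List[Tuple[int, str]]:
    """Return (seconds since first check, event) for every alarm state change.

    durations_s holds one observed checkpointing time per periodic check.
    """
    active = False
    events: List[Tuple[int, str]] = []
    for index, duration in enumerate(durations_s):
        exceeded = duration > threshold_s   # strictly longer than the threshold
        if exceeded != active:
            active = exceeded
            events.append((index * CHECK_INTERVAL_S,
                           "generated" if active else "cleared"))
    return events


if __name__ == "__main__":
    # Hypothetical observations: the third check is the first to exceed 600 seconds.
    print(alarm_transitions([120, 600, 601, 900, 600]))
```

<p>A value equal to the threshold does not trigger the alarm, and the alarm is cleared at the first check where the value is no longer above the threshold.</p>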
</div>
<div class="section" id="ALM-45639__section47796626"><h4 class="sectiontitle">Attribute</h4>
<div class="tablenoborder"><table cellpadding="4" cellspacing="0" summary="" id="ALM-45639__table58337011" frame="border" border="1" rules="all"><thead align="left"><tr id="ALM-45639__row62299817"><th align="left" class="cellrowborder" valign="top" width="33.33333333333333%" id="mcps1.3.2.2.1.4.1.1"><p id="ALM-45639__p13120377">Alarm ID</p>
</th>
<th align="left" class="cellrowborder" valign="top" width="33.33333333333333%" id="mcps1.3.2.2.1.4.1.2"><p id="ALM-45639__p56117589">Alarm Severity</p>
</th>
<th align="left" class="cellrowborder" valign="top" width="33.33333333333333%" id="mcps1.3.2.2.1.4.1.3"><p id="ALM-45639__p49230886">Auto Clear</p>
</th>
</tr>
</thead>
<tbody><tr id="ALM-45639__row28278868"><td class="cellrowborder" valign="top" width="33.33333333333333%" headers="mcps1.3.2.2.1.4.1.1 "><p id="ALM-45639__p8886982">45639</p>
</td>
<td class="cellrowborder" valign="top" width="33.33333333333333%" headers="mcps1.3.2.2.1.4.1.2 "><p id="ALM-45639__p48756965">Minor</p>
</td>
<td class="cellrowborder" valign="top" width="33.33333333333333%" headers="mcps1.3.2.2.1.4.1.3 "><p id="ALM-45639__p57000088">Yes</p>
</td>
</tr>
</tbody>
</table>
</div>
</div>
<div class="section" id="ALM-45639__section27516457"><h4 class="sectiontitle">Parameters</h4>
<div class="tablenoborder"><table cellpadding="4" cellspacing="0" summary="" id="ALM-45639__table53604424" frame="border" border="1" rules="all"><thead align="left"><tr id="ALM-45639__row30968229"><th align="left" class="cellrowborder" valign="top" width="50%" id="mcps1.3.3.2.1.3.1.1"><p id="ALM-45639__p25398627">Name</p>
</th>
<th align="left" class="cellrowborder" valign="top" width="50%" id="mcps1.3.3.2.1.3.1.2"><p id="ALM-45639__p44022946">Meaning</p>
</th>
</tr>
</thead>
<tbody><tr id="ALM-45639__row1083384091512"><td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.3.2.1.3.1.1 "><p id="ALM-45639__p13858113752316">Source</p>
</td>
<td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.3.2.1.3.1.2 "><p id="ALM-45639__p187931338134115">Specifies the cluster for which the alarm is generated.</p>
</td>
</tr>
<tr id="ALM-45639__row9088906"><td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.3.2.1.3.1.1 "><p id="ALM-45639__p39123317">ServiceName</p>
</td>
<td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.3.2.1.3.1.2 "><p id="ALM-45639__p39642994">Specifies the service for which the alarm is generated.</p>
</td>
</tr>
<tr id="ALM-45639__row21242631"><td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.3.2.1.3.1.1 "><p id="ALM-45639__p37226997">JobName</p>
</td>
<td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.3.2.1.3.1.2 "><p id="ALM-45639__p54903620">Specifies the job for which the alarm is generated.</p>
</td>
</tr>
<tr id="ALM-45639__row1364191112361"><td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.3.2.1.3.1.1 "><p id="ALM-45639__p1664121110361">UserName</p>
</td>
<td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.3.2.1.3.1.2 "><p id="ALM-45639__p0641191103614">Specifies the username for which the alarm is generated.</p>
</td>
</tr>
</tbody>
</table>
</div>
</div>
<div class="section" id="ALM-45639__section46321527"><h4 class="sectiontitle">Impact on the System</h4><p id="ALM-45639__p27494118">This alarm has no impact on the system.</p>
</div>
<div class="section" id="ALM-45639__section14240565"><h4 class="sectiontitle">Possible Causes</h4><p id="ALM-45639__p145934411369">The job may be in the sub-healthy state. The possible causes are as follows:</p>
<ul id="ALM-45639__ul7459744123619"><li id="ALM-45639__li12459144123615">The memory for the TaskManager of the job is insufficient.</li><li id="ALM-45639__li445974419363">The state memory is too large, making checkpointing time-consuming.</li></ul>
</div>
<div class="section" id="ALM-45639__section02555126619"><h4 class="sectiontitle">Procedure</h4><ol id="ALM-45639__en-us_topic_0000001150940918_ol89051015141915"><li id="ALM-45639__en-us_topic_0000001150940918_li14903715101912"><span>Log in to Manager as a user who has the FlinkServer management permission.</span></li><li id="ALM-45639__li16359165619266"><span>Choose <strong id="ALM-45639__b1165743382718">O&amp;M</strong> &gt; <strong id="ALM-45639__b165811331270">Alarm</strong> &gt; <strong id="ALM-45639__b3658103311276">Alarms</strong> &gt; <strong id="ALM-45639__b1965843315272">ALM-45639 Checkpointing of a Flink Job Times Out</strong>, view <strong id="ALM-45639__b965863392714">Location</strong>, and obtain the name of the job for which the alarm is generated.</span></li><li id="ALM-45639__en-us_topic_0000001150940918_li5903101518199"><span>Choose <strong id="ALM-45639__b17108371388">Cluster</strong> &gt; <strong id="ALM-45639__b710137487">Services</strong> &gt; <strong id="ALM-45639__b7102037984">Yarn</strong> and click the link next to <strong id="ALM-45639__b101010371984">ResourceManager WebUI</strong> to go to the native Yarn page.</span></li><li id="ALM-45639__en-us_topic_0000001150940918_li6904191511920"><span>Locate the failed job based on its name displayed in <strong id="ALM-45639__b14875524275">Location</strong>, search for and record the application ID of the job, and check whether the job logs are available on the Yarn page.</span><p><div class="fignone" id="ALM-45639__en-us_topic_0000001150940918_fig1390461517192"><span class="figcap"><b>Figure 1 </b>Application ID of a job</span><br><span><img id="ALM-45639__image18481135525018" src="en-us_image_0000001532448466.png"></span></div>
<ul id="ALM-45639__ul181982072158"><li id="ALM-45639__li1819847161518">If yes, go to <a href="#ALM-45639__en-us_topic_0000001150940918_li109051158191">5</a>.</li><li id="ALM-45639__li111981574152">If no, go to <a href="#ALM-45639__en-us_topic_0000001150940918_li5902415141913">7</a>.</li></ul>
</p></li><li id="ALM-45639__en-us_topic_0000001150940918_li109051158191"><a name="ALM-45639__en-us_topic_0000001150940918_li109051158191"></a><a name="en-us_topic_0000001150940918_li109051158191"></a><span>Click the application ID of the failed job to go to the job page.</span><p><ol type="a" id="ALM-45639__en-us_topic_0000001150940918_ol18905161513191"><li id="ALM-45639__en-us_topic_0000001150940918_li090431510192">Click <strong id="ALM-45639__b149851036173320">Logs</strong> in the <strong id="ALM-45639__b19985163610339">Logs</strong> column to view JobManager logs.<div class="fignone" id="ALM-45639__en-us_topic_0000001150940918_fig0904115131915"><span class="figcap"><b>Figure 2 </b>Clicking Logs</span><br><span><img id="ALM-45639__en-us_topic_0000001150940918_image290471501913" src="en-us_image_0000001583127589.png"></span></div>
</li><li id="ALM-45639__en-us_topic_0000001150940918_li232434015269">Click the ID in the <strong id="ALM-45639__b75651512193414">Attempt ID</strong> column and click <strong id="ALM-45639__b12566201216343">Logs</strong> in the <strong id="ALM-45639__b3566191233417">Logs</strong> column to view TaskManager logs.<div class="fignone" id="ALM-45639__en-us_topic_0000001150940918_fig16904101571920"><span class="figcap"><b>Figure 3 </b>Clicking the ID in the Attempt ID column</span><br><span><img id="ALM-45639__en-us_topic_0000001150940918_image1890411511199" src="en-us_image_0000001582927845.png"></span></div>
<div class="fignone" id="ALM-45639__en-us_topic_0000001150940918_fig67971748144610"><span class="figcap"><b>Figure 4 </b>Clicking Logs</span><br><span><img id="ALM-45639__en-us_topic_0000001150940918_image1620681118112" src="en-us_image_0000001583087613.png"></span></div>
<div class="note" id="ALM-45639__en-us_topic_0000001150940918_note126111528152718"><img src="public_sys-resources/note_3.0-en-us.png"><span class="notetitle"> </span><div class="notebody"><p id="ALM-45639__en-us_topic_0000001150940918_p14611162814277">You can also log in to Manager as a user who has the FlinkServer management permission. Choose <strong id="ALM-45639__b1840312315436">Cluster</strong> &gt; <strong id="ALM-45639__b54042039435">Services</strong> &gt; <strong id="ALM-45639__b74056315430">Flink</strong>, and click the link next to <strong id="ALM-45639__b134050311437">Flink WebUI</strong>. On the displayed Flink web UI, click <strong id="ALM-45639__b144062318432">Job Management</strong>, click <strong id="ALM-45639__b34072311437">More</strong> in the <strong id="ALM-45639__b44086314432">Operation</strong> column, and select <strong id="ALM-45639__b64075315432">Job Monitoring</strong> to view TaskManager logs.</p>
</div></div>
</li></ol>
</p></li><li id="ALM-45639__en-us_topic_0000001150940918_li1836201562019"><span>View the logs of the failed job to rectify the fault, or contact the <span id="ALM-45639__text1820979637">O&amp;M personnel</span> and send the collected fault logs. No further action is required.</span></li></ol>
<p id="ALM-45639__en-us_topic_0000001150940918_p17355115921820"><strong id="ALM-45639__b117221775412">If logs are unavailable on the Yarn page, download logs from HDFS.</strong></p>
<ol start="7" id="ALM-45639__en-us_topic_0000001150940918_ol16902715151916"><li id="ALM-45639__en-us_topic_0000001150940918_li5902415141913"><a name="ALM-45639__en-us_topic_0000001150940918_li5902415141913"></a><a name="en-us_topic_0000001150940918_li5902415141913"></a><span>On Manager, choose <strong id="ALM-45639__b12011016545">Cluster</strong> &gt; <strong id="ALM-45639__b182016165411">Services</strong> &gt; <strong id="ALM-45639__b182014161949">HDFS</strong>, click the link next to <strong id="ALM-45639__b720141616414">NameNode WebUI</strong> to go to the HDFS page, choose <strong id="ALM-45639__b820141618420">Utilities</strong> &gt; <strong id="ALM-45639__b182012161749">Browse the file system</strong>, and download logs in the <strong id="ALM-45639__b192022016747">/tmp/logs/</strong><em id="ALM-45639__i102021916645">Username</em><strong id="ALM-45639__b1020218162411">/logs/</strong><em id="ALM-45639__i142031416449">Application ID of the failed job</em> directory.</span></li><li id="ALM-45639__en-us_topic_0000001150940918_li17902141519196"><span>View the logs of the failed job to rectify the fault, or contact the <span id="ALM-45639__text0672235746">O&amp;M personnel</span> and send the collected fault logs.</span></li></ol>
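<p>The log directory in the preceding step is built from the username and the application ID recorded earlier. The following Python sketch assembles that path and checks that the application ID has the usual Yarn form <strong>application_</strong><em>timestamp</em><strong>_</strong><em>sequence</em>. The function names, the sample username, and the sample application ID are hypothetical. Downloading with <strong>hdfs dfs -get</strong> from a client node is shown as an assumed alternative to the NameNode web UI and is not described in this topic.</p>

```python
import posixpath
import re
from typing import List

# Usual Yarn application ID form: application_<cluster timestamp>_<sequence number>
APP_ID_RE = re.compile(r"application_\d+_\d+")


def aggregated_log_dir(username: str, application_id: str) -> str:
    """Return /tmp/logs/<Username>/logs/<Application ID of the failed job>."""
    if not username or "/" in username:
        raise ValueError("invalid username: %r" % username)
    if not APP_ID_RE.fullmatch(application_id):
        raise ValueError("unexpected application ID: %r" % application_id)
    return posixpath.join("/tmp/logs", username, "logs", application_id)


def download_command(username: str, application_id: str, local_dir: str) -> List[str]:
    """Assumed alternative to the web UI: copy the directory with the HDFS client."""
    return ["hdfs", "dfs", "-get", aggregated_log_dir(username, application_id), local_dir]


if __name__ == "__main__":
    # Placeholder values for illustration only.
    print(aggregated_log_dir("flinkuser", "application_1722556800000_0007"))
```

<p>In a cluster with Kerberos authentication enabled, the HDFS client command requires a valid ticket for a user who can read the directory.</p>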
</div>
<div class="section" id="ALM-45639__section169311343318"><h4 class="sectiontitle">Alarm Clearing</h4><p id="ALM-45639__p754913417333">This alarm is cleared when the checkpointing duration of a Flink job is less than or equal to the threshold.</p>
</div>
<div class="section" id="ALM-45639__section53362350"><h4 class="sectiontitle">Related Information</h4><p id="ALM-45639__p7522741">None</p>
</div>
</div>
<div>
<div class="familylinks">
<div class="parentlink"><strong>Parent topic:</strong> <a href="mrs_01_1298.html">Alarm Reference (Applicable to MRS 3.x)</a></div>
</div>
</div>

104
docs/mrs/umn/ALM-45642.html Normal file

File diff suppressed because it is too large Load Diff

128
docs/mrs/umn/ALM-45643.html Normal file

File diff suppressed because it is too large Load Diff

129
docs/mrs/umn/ALM-45644.html Normal file

File diff suppressed because it is too large Load Diff

114
docs/mrs/umn/ALM-45645.html Normal file

File diff suppressed because it is too large Load Diff

114
docs/mrs/umn/ALM-45646.html Normal file

File diff suppressed because it is too large Load Diff

Some files were not shown because too many files have changed in this diff