doc-exports/docs/mrs/umn/ALM-18015.html
Yang, Tong 3b1f73dece MRS UMN 2.0.38.SP20 version
Reviewed-by: Hasko, Vladimir <vladimir.hasko@t-systems.com>
Co-authored-by: Yang, Tong <yangtong2@huawei.com>
Co-committed-by: Yang, Tong <yangtong2@huawei.com>
2022-12-13 12:03:34 +00:00


<a name="ALM-18015"></a><a name="ALM-18015"></a>
<h1 class="topictitle1">ALM-18015 JobHistoryServer Direct Memory Usage Exceeds the Threshold</h1>
<div id="body57904994"><div class="section" id="ALM-18015__s71f3ed0b709043b7b67594deb5012367"><h4 class="sectiontitle">Description</h4><p id="ALM-18015__en-us_topic_0070543512_p2142393">The system checks the direct memory usage of the MapReduce service every 30 seconds. This alarm is generated when the direct memory usage of a JobHistoryServer instance exceeds the threshold (90% of the maximum direct memory).</p>
<p id="ALM-18015__en-us_topic_0070543512_p19281542">The alarm is cleared when the direct memory usage falls below the threshold.</p>
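<p>The threshold comparison described above can be sketched as follows. This is only an illustration of the arithmetic, not the monitoring code used by the system; the function name <strong>direct_memory_alarm</strong> and the sample sizes are assumptions made for this example.</p>

```python
def direct_memory_alarm(used_bytes: int, max_bytes: int, threshold: float = 0.90) -> bool:
    """Return True when direct memory usage exceeds the threshold ratio."""
    if not max_bytes > 0:
        raise ValueError("max_bytes must be positive")
    return used_bytes / max_bytes > threshold


# Hypothetical samples against a 1 GiB direct memory limit.
MIB = 1024 * 1024
print(direct_memory_alarm(950 * MIB, 1024 * MIB))  # True: about 92.8% used
print(direct_memory_alarm(512 * MIB, 1024 * MIB))  # False: 50% used
```

<p>In this sketch, a usage of exactly 90% does not raise the alarm, because the description says the usage must exceed the threshold.</p>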
</div>
<div class="section" id="ALM-18015__s30a7a58b750946dcb79260caaa91e0fe"><h4 class="sectiontitle">Attribute</h4>
<div class="tablenoborder"><table cellpadding="4" cellspacing="0" summary="" id="ALM-18015__en-us_topic_0070543512_table18301082" frame="border" border="1" rules="all"><thead align="left"><tr id="ALM-18015__en-us_topic_0070543512_row59012183"><th align="left" class="cellrowborder" valign="top" width="33.33333333333333%" id="mcps1.3.2.2.1.4.1.1"><p id="ALM-18015__en-us_topic_0070543512_p15257500">Alarm ID</p>
</th>
<th align="left" class="cellrowborder" valign="top" width="33.33333333333333%" id="mcps1.3.2.2.1.4.1.2"><p id="ALM-18015__en-us_topic_0070543512_p27898002">Alarm Severity</p>
</th>
<th align="left" class="cellrowborder" valign="top" width="33.33333333333333%" id="mcps1.3.2.2.1.4.1.3"><p id="ALM-18015__en-us_topic_0070543512_p45145696">Automatically Cleared</p>
</th>
</tr>
</thead>
<tbody><tr id="ALM-18015__en-us_topic_0070543512_row32922799"><td class="cellrowborder" valign="top" width="33.33333333333333%" headers="mcps1.3.2.2.1.4.1.1 "><p id="ALM-18015__en-us_topic_0070543512_p49501095">18015</p>
</td>
<td class="cellrowborder" valign="top" width="33.33333333333333%" headers="mcps1.3.2.2.1.4.1.2 "><p id="ALM-18015__en-us_topic_0070543512_p50165743">Major</p>
</td>
<td class="cellrowborder" valign="top" width="33.33333333333333%" headers="mcps1.3.2.2.1.4.1.3 "><p id="ALM-18015__en-us_topic_0070543512_p36893359">Yes</p>
</td>
</tr>
</tbody>
</table>
</div>
</div>
<div class="section" id="ALM-18015__s4256e66f5c404f7d822cb00d73527f02"><h4 class="sectiontitle">Parameters</h4>
<div class="tablenoborder"><table cellpadding="4" cellspacing="0" summary="" id="ALM-18015__en-us_topic_0070543512_table35572112" frame="border" border="1" rules="all"><thead align="left"><tr id="ALM-18015__en-us_topic_0070543512_row57337714"><th align="left" class="cellrowborder" valign="top" width="50%" id="mcps1.3.3.2.1.3.1.1"><p id="ALM-18015__en-us_topic_0070543512_p13843256">Name</p>
</th>
<th align="left" class="cellrowborder" valign="top" width="50%" id="mcps1.3.3.2.1.3.1.2"><p id="ALM-18015__en-us_topic_0070543512_p47561922">Meaning</p>
</th>
</tr>
</thead>
<tbody><tr id="ALM-18015__row17576183422117"><td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.3.2.1.3.1.1 "><p id="ALM-18015__p192431315431">Source</p>
</td>
<td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.3.2.1.3.1.2 "><p id="ALM-18015__p692551319435">Specifies the cluster for which the alarm is generated.</p>
</td>
</tr>
<tr id="ALM-18015__en-us_topic_0070543512_row27310456"><td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.3.2.1.3.1.1 "><p id="ALM-18015__en-us_topic_0070543512_p64663364">ServiceName</p>
</td>
<td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.3.2.1.3.1.2 "><p id="ALM-18015__en-us_topic_0070543512_p3241099">Specifies the service for which the alarm is generated.</p>
</td>
</tr>
<tr id="ALM-18015__en-us_topic_0070543512_row29169897"><td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.3.2.1.3.1.1 "><p id="ALM-18015__en-us_topic_0070543512_p13951479">RoleName</p>
</td>
<td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.3.2.1.3.1.2 "><p id="ALM-18015__en-us_topic_0070543512_p56327989">Specifies the role for which the alarm is generated.</p>
</td>
</tr>
<tr id="ALM-18015__en-us_topic_0070543512_row37189853"><td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.3.2.1.3.1.1 "><p id="ALM-18015__en-us_topic_0070543512_p59588135">HostName</p>
</td>
<td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.3.2.1.3.1.2 "><p id="ALM-18015__en-us_topic_0070543512_p61909621">Specifies the host for which the alarm is generated.</p>
</td>
</tr>
<tr id="ALM-18015__en-us_topic_0070543512_row20315681"><td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.3.2.1.3.1.1 "><p id="ALM-18015__en-us_topic_0070543512_p34957489">Trigger Condition</p>
</td>
<td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.3.2.1.3.1.2 "><p id="ALM-18015__en-us_topic_0070543512_p12984378">Specifies the threshold triggering the alarm. If the current indicator value exceeds this threshold, the alarm is generated.</p>
</td>
</tr>
</tbody>
</table>
</div>
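<p>For scripts that process exported alarm records, the parameters in the table above could be modelled as a small record type. This is a sketch only: the class name, the field names, and all sample values are assumptions made for this example and are not defined by FusionInsight Manager.</p>

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AlarmLocation:
    """Location parameters reported with the alarm (field names are illustrative)."""
    source: str             # cluster for which the alarm is generated
    service_name: str       # service for which the alarm is generated
    role_name: str          # role for which the alarm is generated
    host_name: str          # host for which the alarm is generated
    trigger_condition: str  # threshold that triggers the alarm

    def summary(self) -> str:
        """Format the parameters as a single log-friendly line."""
        return (
            f"{self.service_name}/{self.role_name} on {self.host_name} "
            f"({self.source}): {self.trigger_condition}"
        )


# All values below are hypothetical placeholders.
location = AlarmLocation(
    source="example-cluster",
    service_name="MapReduce",
    role_name="JobHistoryServer",
    host_name="example-host",
    trigger_condition="direct memory usage exceeds 90%",
)
print(location.summary())
```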
</div>
<div class="section" id="ALM-18015__s91c10e29dcc44cd58de178b40bab26e7"><h4 class="sectiontitle">Impact on the System</h4><p id="ALM-18015__en-us_topic_0070543512_p45101667">If the available direct memory of the MapReduce service is insufficient, a memory overflow occurs and the service breaks down.</p>
</div>
<div class="section" id="ALM-18015__s7c6e0b84cbc24fdeb07b3a99e4473b8f"><h4 class="sectiontitle">Possible Causes</h4><p id="ALM-18015__en-us_topic_0070543512_p29356372">The direct memory of the JobHistoryServer instance is overused or inappropriately allocated.</p>
</div>
<div class="section" id="ALM-18015__sc2bcda49280f439ba1f24fddf1a64e83"><h4 class="sectiontitle">Procedure</h4><p class="tableheading" id="ALM-18015__en-us_topic_0070543512_p29055926"><strong id="ALM-18015__b5839118619155">Check the direct memory usage.</strong></p>
<ol id="ALM-18015__ol49978311191511"><li id="ALM-18015__li1273083191459"><span>On the FusionInsight Manager portal, choose <strong id="ALM-18015__b53118090191459">O&amp;M &gt; Alarm<strong id="ALM-18015__b27872374104950"> &gt; Alarms</strong></strong> &gt; <strong id="ALM-18015__b8300762191459">ALM-18015 JobHistoryServer Direct Memory Usage Exceeds the Threshold</strong> &gt; <strong id="ALM-18015__b7597994191459">Location</strong><strong id="ALM-18015__b1261056133210"> </strong>to view the IP address of the instance for which the alarm is generated.</span></li><li id="ALM-18015__li62884407191459"><span>On the FusionInsight Manager portal, choose <strong id="ALM-18015__b1543418140145">Cluster &gt; </strong><em id="ALM-18015__i1643718140142">Name of the desired cluster</em><strong id="ALM-18015__b843591412143"> &gt; Services</strong> &gt; <strong id="ALM-18015__b36010893191459">MapReduce</strong> &gt; <strong id="ALM-18015__b55662583191459">Instance</strong> &gt; <strong id="ALM-18015__b31201207191459">JobHistoryServer (IP address for which the alarm is generated)</strong>. Click the drop-down menu in the upper right corner of <strong id="ALM-18015__b3273144141318">Chart</strong>, and choose <strong id="ALM-18015__b7246166191312">Customize</strong> &gt; <strong id="ALM-18015__b44269858191459">Memory Usage Status of JobHistoryServer</strong> to check the direct memory usage.</span></li><li id="ALM-18015__li10699945191459"><span>Check whether the direct memory used by JobHistoryServer reaches the threshold (by default, 90% of the maximum direct memory specified for JobHistoryServer).</span><p><ul class="subitemlist" id="ALM-18015__ul60841206191459"><li id="ALM-18015__li60472230191459">If yes, go to <a href="#ALM-18015__li7519563191459">4</a>.</li><li id="ALM-18015__li66412457191459">If no, go to <a href="#ALM-18015__li59831061191459">9</a>.</li></ul>
</p></li><li id="ALM-18015__li7519563191459"><a name="ALM-18015__li7519563191459"></a><a name="li7519563191459"></a><span>On the FusionInsight Manager portal, choose <strong id="ALM-18015__b1089014231142">Cluster &gt; </strong><em id="ALM-18015__i168939238148">Name of the desired cluster</em><strong id="ALM-18015__b14891162316146"> &gt; Services</strong> &gt; <strong id="ALM-18015__b61389197191459">MapReduce</strong> &gt; <strong id="ALM-18015__b15631864191459">Configurations</strong> &gt; <strong id="ALM-18015__b6469054191459">All</strong> <strong id="ALM-18015__b131797462331">Configurations</strong> &gt; <strong id="ALM-18015__b58221486191459">JobHistoryServer</strong> &gt; <strong id="ALM-18015__b54231331191459">System</strong> to check whether "-XX:MaxDirectMemorySize" exists in the <strong id="ALM-18015__b1736520175112">GC_OPTS</strong> parameter.</span><p><ul class="subitemlist" id="ALM-18015__ul9416192824610"><li id="ALM-18015__li17416228104615">If yes, go to <a href="#ALM-18015__li16830456145416">5</a>.</li><li id="ALM-18015__li5416102874611">If no, go to <a href="#ALM-18015__li195912241558">7</a>.</li></ul>
</p></li><li id="ALM-18015__li16830456145416"><a name="ALM-18015__li16830456145416"></a><a name="li16830456145416"></a><span>In the <strong id="ALM-18015__b385913105554">GC_OPTS</strong> parameter, delete the "-XX:MaxDirectMemorySize" option.</span></li><li id="ALM-18015__li567211191459"><span>Save the configuration and restart the JobHistoryServer instance.</span></li><li id="ALM-18015__li195912241558"><a name="ALM-18015__li195912241558"></a><a name="li195912241558"></a><span>Check whether the <strong id="ALM-18015__b1652612315215">ALM-18009 Heap Memory Usage of JobHistoryServer Exceeds the Threshold</strong> alarm exists.</span><p><ul class="subitemlist" id="ALM-18015__ul683411514521"><li id="ALM-18015__li4834156523">If yes, handle the alarm by referring to <strong id="ALM-18015__b8735121442115">ALM-18009 Heap Memory Usage of JobHistoryServer Exceeds the Threshold</strong>.</li><li id="ALM-18015__li7834754527">If no, go to <a href="#ALM-18015__li53290472191459">8</a>.</li></ul>
</p></li><li id="ALM-18015__li53290472191459"><a name="ALM-18015__li53290472191459"></a><a name="li53290472191459"></a><span>Check whether the alarm is cleared.</span><p><ul class="subitemlist" id="ALM-18015__ul30484068191459"><li id="ALM-18015__li5104899191459">If yes, no further action is required.</li><li id="ALM-18015__li10843659191459">If no, go to <a href="#ALM-18015__li59831061191459">9</a>.</li></ul>
</p></li></ol>
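<p>When the <strong>GC_OPTS</strong> value is reviewed outside the portal, for example in an exported configuration, the check and the deletion in steps 4 and 5 can be sketched as below. This is an illustration under assumptions: the helper names and the sample <strong>GC_OPTS</strong> string are hypothetical, and the actual change should still be made and saved through FusionInsight Manager as described above.</p>

```python
import re

# Matches the JVM option together with its value, for example
# -XX:MaxDirectMemorySize=512M.
MAX_DIRECT_FLAG = re.compile(r"-XX:MaxDirectMemorySize=\S+")


def has_max_direct_memory_flag(gc_opts: str) -> bool:
    """Return True if GC_OPTS contains the -XX:MaxDirectMemorySize option."""
    return MAX_DIRECT_FLAG.search(gc_opts) is not None


def remove_max_direct_memory_flag(gc_opts: str) -> str:
    """Return GC_OPTS without the option, with whitespace normalized."""
    without_flag = MAX_DIRECT_FLAG.sub("", gc_opts)
    return " ".join(without_flag.split())


# Hypothetical GC_OPTS value, for illustration only.
sample = "-Xms1G -Xmx1G -XX:MaxDirectMemorySize=512M -XX:+UseG1GC"
print(has_max_direct_memory_flag(sample))     # True
print(remove_max_direct_memory_flag(sample))  # -Xms1G -Xmx1G -XX:+UseG1GC
```

<p>The sketch removes the option and its value together, so that no dangling size value is left in the parameter.</p>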
<p class="tableheading" id="ALM-18015__p5921163191459"><strong id="ALM-18015__b11370124191521">Collect fault information.</strong></p>
<ol start="9" id="ALM-18015__ol28589065191526"><li id="ALM-18015__li59831061191459"><a name="ALM-18015__li59831061191459"></a><a name="li59831061191459"></a><span>On the FusionInsight Manager portal, choose <strong id="ALM-18015__b131651627342">O&amp;M</strong> &gt; <strong id="ALM-18015__b21560976191459">Log &gt; Download</strong>.</span></li><li id="ALM-18015__li31851746191459"><span>For <strong id="ALM-18015__b14477807191459">Service</strong>, select <strong id="ALM-18015__b1608645191459">JobHistoryServer</strong> in the required cluster.</span></li><li id="ALM-18015__li1145664103113"><span>Click <span><img id="ALM-18015__image1945644173117" src="en-us_image_0269417402.png"></span> in the upper right corner, and set <strong id="ALM-18015__b6456941173117">Start Date</strong> and <strong id="ALM-18015__b11456154113318">End Date</strong> for log collection to 10 minutes before and 10 minutes after the alarm generation time, respectively. Then, click <strong id="ALM-18015__b13456164113319">Download</strong>.</span></li><li id="ALM-18015__li52451676191459"><span>Contact the <span id="ALM-18015__text4614151421417">O&amp;M personnel</span> and send the collected logs.</span></li></ol>
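<p>The log collection window in the steps above is simple date arithmetic. The following sketch shows one way to compute the <strong>Start Date</strong> and <strong>End Date</strong> values from an alarm generation time; the function name <strong>log_window</strong> and the sample timestamp are assumptions made for this example.</p>

```python
from datetime import datetime, timedelta


def log_window(alarm_time: datetime, margin_minutes: int = 10):
    """Return (start, end) covering margin_minutes before and after the alarm."""
    margin = timedelta(minutes=margin_minutes)
    return alarm_time - margin, alarm_time + margin


# Hypothetical alarm generation time, for illustration only.
alarm_time = datetime(2022, 12, 13, 12, 3, 34)
start, end = log_window(alarm_time)
print(start.isoformat())  # 2022-12-13T11:53:34
print(end.isoformat())    # 2022-12-13T12:13:34
```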
</div>
<div class="section" id="ALM-18015__section1529716184534"><h4 class="sectiontitle">Alarm Clearing</h4><p id="ALM-18015__p4677152685316">After the fault is rectified, the system automatically clears this alarm.</p>
</div>
<div class="section" id="ALM-18015__en-us_topic_0070543512_section797855"><h4 class="sectiontitle">Related Information</h4><p id="ALM-18015__en-us_topic_0070543512_p64271818">None</p>
</div>
</div>
<div>
<div class="familylinks">
<div class="parentlink"><strong>Parent topic:</strong> <a href="mrs_01_1298.html">Alarm Reference (Applicable to MRS 3.x)</a></div>
</div>
</div>