<a name="ALM-18014"></a><a name="ALM-18014"></a>
<h1 class="topictitle1">ALM-18014 NodeManager Direct Memory Usage Exceeds the Threshold</h1>
<div id="body59526895"><div class="section" id="ALM-18014__s5fdd42a918f14516aefa13170b06da2c"><h4 class="sectiontitle">Description</h4><p id="ALM-18014__en-us_topic_0070543511_p11754296">The system checks the direct memory usage of the Yarn service every 30 seconds. This alarm is generated when the direct memory usage of a NodeManager instance exceeds the threshold (90% of the maximum direct memory).</p>
<p id="ALM-18014__en-us_topic_0070543511_p38679804">The alarm is cleared when the direct memory usage is less than the threshold.</p>
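<p>The raise and clear conditions described above can be sketched as follows. This is a minimal illustration, assuming usage is measured as used direct memory divided by the maximum direct memory; the function and constant names are hypothetical and do not reflect how FusionInsight Manager implements the check.</p>

```python
# Hypothetical constants mirroring the documented defaults: the system checks
# usage every 30 seconds, and the threshold is 90% of the maximum direct memory.
CHECK_INTERVAL_SECONDS = 30
DEFAULT_THRESHOLD = 0.90


def direct_memory_usage(used_bytes: int, max_direct_bytes: int) -> float:
    """Return used direct memory as a fraction of the maximum direct memory."""
    if max_direct_bytes <= 0:
        raise ValueError("max_direct_bytes must be positive")
    return used_bytes / max_direct_bytes


def evaluate_alarm(used_bytes: int, max_direct_bytes: int,
                   alarm_active: bool,
                   threshold: float = DEFAULT_THRESHOLD) -> bool:
    """Return whether the alarm should be active after one check.

    Raised when usage exceeds the threshold, cleared when usage falls below it;
    at exactly the threshold the previous state is kept (an assumption).
    """
    usage = direct_memory_usage(used_bytes, max_direct_bytes)
    if usage > threshold:
        return True
    if usage < threshold:
        return False
    return alarm_active


# Example: feed successive samples (used bytes out of 1000) through the check.
state = False
for used in (700, 950, 920, 850):
    state = evaluate_alarm(used, 1000, state)
    print(used, "->", "raised" if state else "cleared")
```

<p>Calling <code>evaluate_alarm</code> once per check and passing the previous result back in as <code>alarm_active</code> models the documented behavior. The text does not say what happens when usage is exactly at the threshold, so the sketch simply keeps the previous state in that case.</p>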
</div>
<div class="section" id="ALM-18014__sd2972802626b4b4d9a36d52932ed1fb3"><h4 class="sectiontitle">Attribute</h4>
<div class="tablenoborder"><table cellpadding="4" cellspacing="0" summary="" id="ALM-18014__en-us_topic_0070543511_table46056421" frame="border" border="1" rules="all"><thead align="left"><tr id="ALM-18014__en-us_topic_0070543511_row57610085"><th align="left" class="cellrowborder" valign="top" width="33.33333333333333%" id="mcps1.3.2.2.1.4.1.1"><p id="ALM-18014__en-us_topic_0070543511_p35905280">Alarm ID</p>
</th>
<th align="left" class="cellrowborder" valign="top" width="33.33333333333333%" id="mcps1.3.2.2.1.4.1.2"><p id="ALM-18014__en-us_topic_0070543511_p22646572">Alarm Severity</p>
</th>
<th align="left" class="cellrowborder" valign="top" width="33.33333333333333%" id="mcps1.3.2.2.1.4.1.3"><p id="ALM-18014__en-us_topic_0070543511_p22433040">Automatically Cleared</p>
</th>
</tr>
</thead>
<tbody><tr id="ALM-18014__en-us_topic_0070543511_row5136981"><td class="cellrowborder" valign="top" width="33.33333333333333%" headers="mcps1.3.2.2.1.4.1.1 "><p id="ALM-18014__en-us_topic_0070543511_p13442339">18014</p>
</td>
<td class="cellrowborder" valign="top" width="33.33333333333333%" headers="mcps1.3.2.2.1.4.1.2 "><p id="ALM-18014__en-us_topic_0070543511_p15087649">Major</p>
</td>
<td class="cellrowborder" valign="top" width="33.33333333333333%" headers="mcps1.3.2.2.1.4.1.3 "><p id="ALM-18014__en-us_topic_0070543511_p14140023">Yes</p>
</td>
</tr>
</tbody>
</table>
</div>
</div>
<div class="section" id="ALM-18014__sb7e1d6963caa4a089f680c46978868c3"><h4 class="sectiontitle">Parameters</h4>
<div class="tablenoborder"><table cellpadding="4" cellspacing="0" summary="" id="ALM-18014__en-us_topic_0070543511_table4491214" frame="border" border="1" rules="all"><thead align="left"><tr id="ALM-18014__en-us_topic_0070543511_row54597003"><th align="left" class="cellrowborder" valign="top" width="50%" id="mcps1.3.3.2.1.3.1.1"><p id="ALM-18014__en-us_topic_0070543511_p60281111">Name</p>
</th>
<th align="left" class="cellrowborder" valign="top" width="50%" id="mcps1.3.3.2.1.3.1.2"><p id="ALM-18014__en-us_topic_0070543511_p50931783">Meaning</p>
</th>
</tr>
</thead>
<tbody><tr id="ALM-18014__row93832043102112"><td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.3.2.1.3.1.1 "><p id="ALM-18014__p192431315431">Source</p>
</td>
<td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.3.2.1.3.1.2 "><p id="ALM-18014__p692551319435">Specifies the cluster for which the alarm is generated.</p>
</td>
</tr>
<tr id="ALM-18014__en-us_topic_0070543511_row31833731"><td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.3.2.1.3.1.1 "><p id="ALM-18014__en-us_topic_0070543511_p28395444">ServiceName</p>
</td>
<td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.3.2.1.3.1.2 "><p id="ALM-18014__en-us_topic_0070543511_p18329666">Specifies the service for which the alarm is generated.</p>
</td>
</tr>
<tr id="ALM-18014__en-us_topic_0070543511_row30749271"><td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.3.2.1.3.1.1 "><p id="ALM-18014__en-us_topic_0070543511_p7663035">RoleName</p>
</td>
<td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.3.2.1.3.1.2 "><p id="ALM-18014__en-us_topic_0070543511_p16726062">Specifies the role for which the alarm is generated.</p>
</td>
</tr>
<tr id="ALM-18014__en-us_topic_0070543511_row16316838"><td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.3.2.1.3.1.1 "><p id="ALM-18014__en-us_topic_0070543511_p46595510">HostName</p>
</td>
<td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.3.2.1.3.1.2 "><p id="ALM-18014__en-us_topic_0070543511_p16139946">Specifies the host for which the alarm is generated.</p>
</td>
</tr>
<tr id="ALM-18014__en-us_topic_0070543511_row11041789"><td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.3.2.1.3.1.1 "><p id="ALM-18014__en-us_topic_0070543511_p21969731">Trigger Condition</p>
</td>
<td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.3.2.1.3.1.2 "><p id="ALM-18014__en-us_topic_0070543511_p34717793">Specifies the threshold that triggers the alarm. The alarm is generated when the current indicator value exceeds this threshold.</p>
</td>
</tr>
</tbody>
</table>
</div>
</div>
<div class="section" id="ALM-18014__s860d6b43ec1b41e19987f556ec51ff80"><h4 class="sectiontitle">Impact on the System</h4><p id="ALM-18014__en-us_topic_0070543511_p60677888">If the direct memory available to the Yarn service is insufficient, a memory overflow occurs and the service becomes unavailable.</p>
</div>
<div class="section" id="ALM-18014__s739b2db0ff024015a3a8ab0e4c9e0de2"><h4 class="sectiontitle">Possible Causes</h4><p id="ALM-18014__en-us_topic_0070543511_p15961928">The NodeManager instance uses too much direct memory, or its maximum direct memory is inappropriately configured.</p>
</div>
<div class="section" id="ALM-18014__s625994712ead43bd81cf97e7e13f5d00"><h4 class="sectiontitle">Procedure</h4><p class="tableheading" id="ALM-18014__en-us_topic_0070543511_p17847776"><strong id="ALM-18014__b5686950019135">Check the direct memory usage.</strong></p>
<ol id="ALM-18014__ol40632174191331"><li id="ALM-18014__li15633359191259"><span>On the FusionInsight Manager portal, choose <strong id="ALM-18014__b155371130203016">O&M</strong> > <strong id="ALM-18014__b20918169191259">Alarm<strong id="ALM-18014__b27872374104950"> > Alarms</strong></strong> > <strong id="ALM-18014__b54045796191259">ALM-18014 NodeManager Direct Memory Usage Exceeds the Threshold</strong> > <strong id="ALM-18014__b16650120191259">Location</strong> to view the IP address of the instance for which the alarm is generated.</span></li><li id="ALM-18014__li35249531191259"><span>On the FusionInsight Manager portal, choose <strong id="ALM-18014__b209351249144514">Cluster > </strong><em id="ALM-18014__i69371449104520">Name of the desired cluster</em><strong id="ALM-18014__b16935134918450"> > Services</strong> > <strong id="ALM-18014__b58342542191259">Yarn</strong> > <strong id="ALM-18014__b55320833191259">Instance</strong> > <strong id="ALM-18014__b28125450191259">NodeManager (IP address for which the alarm is generated)</strong>. Click the drop-down menu in the upper right corner of <strong id="ALM-18014__b3273144141318">Chart</strong> and choose <strong id="ALM-18014__b7246166191312">Customize</strong> > <strong id="ALM-18014__b1967291015463">Resource</strong> > <strong id="ALM-18014__b187181656134513">Percentage of</strong> <strong id="ALM-18014__b136401716144520">Used</strong> <strong id="ALM-18014__b63568938191259">Memory of the NodeManager</strong> to check the direct memory usage.</span></li><li id="ALM-18014__li9496994191259"><span>Check whether the direct memory used by NodeManager has reached the threshold (by default, 90% of the maximum direct memory specified for NodeManager).</span><p><ul class="subitemlist" id="ALM-18014__ul1055221191259"><li id="ALM-18014__li36639757191259">If yes, go to <a href="#ALM-18014__li787981191259">4</a>.</li><li id="ALM-18014__li15030327191259">If no, go to <a href="#ALM-18014__li34398621191259">9</a>.</li></ul>
</p></li><li id="ALM-18014__li787981191259"><a name="ALM-18014__li787981191259"></a><a name="li787981191259"></a><span>On the FusionInsight Manager portal, choose <strong id="ALM-18014__b125597551453">Cluster > </strong><em id="ALM-18014__i19561185511457">Name of the desired cluster</em><strong id="ALM-18014__b65606558458"> > Services</strong> > <strong id="ALM-18014__b31059062191259">Yarn</strong> > <strong id="ALM-18014__b11096102191259">Configurations</strong> > <strong id="ALM-18014__b32756057191259">All</strong> <strong id="ALM-18014__b10178152293112">Configurations</strong> > <strong id="ALM-18014__b26369065191259">NodeManager</strong> > <strong id="ALM-18014__b35994998191259">System</strong> to check whether "-XX:MaxDirectMemorySize" exists in the <strong id="ALM-18014__b1736520175112">GC_OPTS</strong> parameter.</span><p><ul class="subitemlist" id="ALM-18014__ul15997525134816"><li id="ALM-18014__li299716257489">If yes, go to <a href="#ALM-18014__li66301833195114">5</a>.</li><li id="ALM-18014__li169971025174810">If no, go to <a href="#ALM-18014__li735165905117">7</a>.</li></ul>
</p></li><li id="ALM-18014__li66301833195114"><a name="ALM-18014__li66301833195114"></a><a name="li66301833195114"></a><span>In the <strong id="ALM-18014__b67786402513">GC_OPTS</strong> parameter, delete "-XX:MaxDirectMemorySize". When this option is not set, the JVM by default limits direct memory to approximately the maximum heap size (-Xmx).</span></li><li id="ALM-18014__li7091832191259"><span>Save the configuration and restart the NodeManager instance.</span></li><li id="ALM-18014__li735165905117"><a name="ALM-18014__li735165905117"></a><a name="li735165905117"></a><span>Check whether the <strong id="ALM-18014__b593713587192">ALM-18018 NodeManager Heap Memory Usage Exceeds the Threshold</strong> alarm exists.</span><p><ul class="subitemlist" id="ALM-18014__ul683411514521"><li id="ALM-18014__li4834156523">If yes, handle the alarm by referring to <strong id="ALM-18014__b1712361962016">ALM-18018 NodeManager Heap Memory Usage Exceeds the Threshold</strong>.</li><li id="ALM-18014__li7834754527">If no, go to <a href="#ALM-18014__li56845771191259">8</a>.</li></ul>
</p></li><li id="ALM-18014__li56845771191259"><a name="ALM-18014__li56845771191259"></a><a name="li56845771191259"></a><span>Check whether the alarm is cleared.</span><p><ul class="subitemlist" id="ALM-18014__ul23071420191259"><li id="ALM-18014__li63826494191259">If yes, no further action is required.</li><li id="ALM-18014__li2563491191259">If no, go to <a href="#ALM-18014__li34398621191259">9</a>.</li></ul>
</p></li></ol>
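<p>The checks in steps 3 to 5 can be approximated offline on a copy of the <strong>GC_OPTS</strong> value. This is a minimal sketch under stated assumptions: GC_OPTS is treated as a whitespace-separated list of JVM options, the sample value and all function names are hypothetical, and the fallback to -Xmx reflects the usual HotSpot default rather than anything FusionInsight Manager reports. The actual change is still made on the Manager configuration page as described above.</p>

```python
import re
from typing import Optional

# Hypothetical GC_OPTS value used for illustration only.
SAMPLE_GC_OPTS = "-Xms2G -Xmx4G -XX:MaxDirectMemorySize=512M -XX:+UseG1GC"

_UNITS = {"": 1, "k": 1024, "m": 1024 ** 2, "g": 1024 ** 3, "t": 1024 ** 4}
_SIZE_RE = re.compile(r"^(\d+)([kKmMgGtT]?)$")
_FLAG = "-XX:MaxDirectMemorySize="


def parse_jvm_size(value: str) -> int:
    """Convert a JVM size such as '512M' or '4g' to bytes."""
    match = _SIZE_RE.match(value)
    if match is None:
        raise ValueError(f"unrecognized JVM size: {value!r}")
    number, unit = match.groups()
    return int(number) * _UNITS[unit.lower()]


def _last_size(gc_opts: str, prefix: str) -> Optional[int]:
    """Return the size of the last option starting with prefix, or None.

    The last occurrence is used because HotSpot applies the last value
    when an option is repeated.
    """
    size = None
    for option in gc_opts.split():
        if option.startswith(prefix):
            size = parse_jvm_size(option[len(prefix):])
    return size


def find_max_direct_memory(gc_opts: str) -> Optional[int]:
    """Step 4: return -XX:MaxDirectMemorySize in bytes, or None if it is absent."""
    return _last_size(gc_opts, _FLAG)


def remove_max_direct_memory(gc_opts: str) -> str:
    """Step 5: return GC_OPTS with every -XX:MaxDirectMemorySize option removed."""
    return " ".join(opt for opt in gc_opts.split() if not opt.startswith(_FLAG))


def effective_max_direct_memory(gc_opts: str) -> Optional[int]:
    """Approximate the maximum direct memory that the 90% threshold refers to.

    Without -XX:MaxDirectMemorySize, HotSpot defaults to roughly the maximum
    heap size, so this falls back to -Xmx (None if neither option is set).
    """
    explicit = find_max_direct_memory(gc_opts)
    return explicit if explicit is not None else _last_size(gc_opts, "-Xmx")


cleaned = remove_max_direct_memory(SAMPLE_GC_OPTS)
print(cleaned)
print(effective_max_direct_memory(SAMPLE_GC_OPTS), effective_max_direct_memory(cleaned))
```

<p>The fallback to -Xmx is why step 7 then looks at the heap memory alarm: once the explicit option is removed, the direct memory limit follows the heap size.</p>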
<p class="tableheading" id="ALM-18014__p6316196191259"><strong id="ALM-18014__b2435655191343">Collect fault information.</strong></p>
<ol start="9" id="ALM-18014__ol7634807191346"><li id="ALM-18014__li34398621191259"><a name="ALM-18014__li34398621191259"></a><a name="li34398621191259"></a><span>On the FusionInsight Manager portal, choose <strong id="ALM-18014__b172411035133113">O&M</strong> > <strong id="ALM-18014__b41104771191259">Log > Download</strong>.</span></li><li id="ALM-18014__li2247272191259"><span>From <strong id="ALM-18014__b34824932191259">Service</strong>, select <strong id="ALM-18014__b41152139191259">NodeManager</strong> of the required cluster.</span></li><li id="ALM-18014__li1145664103113"><span>Click <span><img id="ALM-18014__image1945644173117" src="en-us_image_0269417401.png"></span> in the upper right corner, and set <strong id="ALM-18014__b6456941173117">Start Date</strong> for log collection to 10 minutes before the alarm generation time and <strong id="ALM-18014__b11456154113318">End Date</strong> to 10 minutes after it. Then, click <strong id="ALM-18014__b13456164113319">Download</strong>.</span></li><li id="ALM-18014__li23401966191259"><span>Contact the <span id="ALM-18014__text4614151421417">O&M personnel</span> and send them the collected logs.</span></li></ol>
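<p>The log collection window in step 11 is the alarm generation time minus and plus 10 minutes. The helper below is a hypothetical sketch of that arithmetic using Python's standard datetime module; it only computes the values to enter in <strong>Start Date</strong> and <strong>End Date</strong> and does not call any Manager API. The alarm time in the example is made up.</p>

```python
from datetime import datetime, timedelta
from typing import Tuple

# Margin taken from the procedure: 10 minutes on each side of the alarm time.
LOG_MARGIN = timedelta(minutes=10)


def log_collection_window(alarm_time: datetime,
                          margin: timedelta = LOG_MARGIN) -> Tuple[datetime, datetime]:
    """Return the (Start Date, End Date) pair for log collection."""
    return alarm_time - margin, alarm_time + margin


# Hypothetical alarm generated at 14:03 on 2024-05-01.
start, end = log_collection_window(datetime(2024, 5, 1, 14, 3))
print(start.strftime("%Y-%m-%d %H:%M"), "to", end.strftime("%Y-%m-%d %H:%M"))
```

<p>Using timedelta arithmetic handles windows that cross midnight or a date boundary without special cases.</p>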
</div>
<div class="section" id="ALM-18014__section1529716184534"><h4 class="sectiontitle">Alarm Clearing</h4><p id="ALM-18014__p4677152685316">After the fault is rectified, the system automatically clears this alarm.</p>
</div>
<div class="section" id="ALM-18014__s97470c6442954a54bf42fdce7b2f5682"><h4 class="sectiontitle">Related Information</h4><p id="ALM-18014__en-us_topic_0070543511_p64696659">None</p>
</div>
</div>
<div>
<div class="familylinks">
<div class="parentlink"><strong>Parent topic:</strong> <a href="mrs_01_1298.html">Alarm Reference (Applicable to MRS 3.x)</a></div>
</div>
</div>