Reviewed-by: Hasko, Vladimir <vladimir.hasko@t-systems.com> Co-authored-by: Yang, Tong <yangtong2@huawei.com> Co-committed-by: Yang, Tong <yangtong2@huawei.com>
<a name="ALM-18023"></a><a name="ALM-18023"></a>
<h1 class="topictitle1">ALM-18023 Number of Pending Yarn Tasks Exceeds the Threshold</h1>
<div id="body1594113689592"><div class="section" id="ALM-18023__section31658481"><h4 class="sectiontitle">Description</h4><p id="ALM-18023__p3566950121720">The alarm module checks the number of pending applications in the Yarn root queue every 60 seconds. The alarm is generated when the number exceeds the threshold (60 by default).</p>
</div>
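<p>The check described above can be sketched as follows. This is a minimal illustration, not the alarm module's actual implementation: it assumes the pending count is read from the ResourceManager REST endpoint /ws/v1/cluster/metrics (field clusterMetrics.appsPending), which reports a cluster-wide figure used here as a stand-in for the root queue count. The function names and the sample payload are hypothetical.</p>

```python
CHECK_INTERVAL_SECONDS = 60  # polling period stated in the description
PENDING_THRESHOLD = 60       # alarm threshold stated in the description


def pending_from_metrics(payload: dict) -> int:
    """Extract appsPending from a /ws/v1/cluster/metrics response body."""
    return int(payload["clusterMetrics"]["appsPending"])


def should_raise_alarm(pending: int, threshold: int = PENDING_THRESHOLD) -> bool:
    """The alarm condition holds only when the count strictly exceeds the threshold."""
    return pending > threshold


# Hypothetical sample response, trimmed to the single field used here.
sample = {"clusterMetrics": {"appsPending": 61}}
print(should_raise_alarm(pending_from_metrics(sample)))
```

<p>With exactly 60 pending applications the condition is not met, because the description says the number must exceed the threshold.</p>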
<div class="section" id="ALM-18023__section16490876"><h4 class="sectiontitle">Attribute</h4>
<div class="tablenoborder"><table cellpadding="4" cellspacing="0" summary="" id="ALM-18023__table7825795184" frame="border" border="1" rules="all"><thead align="left"><tr id="ALM-18023__row10829199161819"><th align="left" class="cellrowborder" valign="top" width="33.33333333333333%" id="mcps1.3.2.2.1.4.1.1"><p id="ALM-18023__p7830149181817">Alarm ID</p>
</th>
<th align="left" class="cellrowborder" valign="top" width="33.33333333333333%" id="mcps1.3.2.2.1.4.1.2"><p id="ALM-18023__p4832169171818">Alarm Severity</p>
</th>
<th align="left" class="cellrowborder" valign="top" width="33.33333333333333%" id="mcps1.3.2.2.1.4.1.3"><p id="ALM-18023__p7834295185">Auto Clear</p>
</th>
</tr>
</thead>
<tbody><tr id="ALM-18023__row11834698184"><td class="cellrowborder" valign="top" width="33.33333333333333%" headers="mcps1.3.2.2.1.4.1.1 "><p id="ALM-18023__p138359915188">18023</p>
</td>
<td class="cellrowborder" valign="top" width="33.33333333333333%" headers="mcps1.3.2.2.1.4.1.2 "><p id="ALM-18023__p108361599186">Major</p>
</td>
<td class="cellrowborder" valign="top" width="33.33333333333333%" headers="mcps1.3.2.2.1.4.1.3 "><p id="ALM-18023__p1083810991819">Yes</p>
</td>
</tr>
</tbody>
</table>
</div>
</div>
<div class="section" id="ALM-18023__section14200159"><h4 class="sectiontitle">Parameters</h4>
<div class="tablenoborder"><table cellpadding="4" cellspacing="0" summary="" id="ALM-18023__table15448152818187" frame="border" border="1" rules="all"><thead align="left"><tr id="ALM-18023__row2451192861813"><th align="left" class="cellrowborder" valign="top" width="50%" id="mcps1.3.3.2.1.3.1.1"><p id="ALM-18023__p445318287184">Name</p>
</th>
<th align="left" class="cellrowborder" valign="top" width="50%" id="mcps1.3.3.2.1.3.1.2"><p id="ALM-18023__p14455152871817">Meaning</p>
</th>
</tr>
</thead>
<tbody><tr id="ALM-18023__row077613291817"><td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.3.2.1.3.1.1 "><p id="ALM-18023__p17935380415">Source</p>
</td>
<td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.3.2.1.3.1.2 "><p id="ALM-18023__p187931338134115">Specifies the cluster for which the alarm is generated.</p>
</td>
</tr>
<tr id="ALM-18023__row8457102815185"><td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.3.2.1.3.1.1 "><p id="ALM-18023__p17459122801816">QueueName</p>
</td>
<td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.3.2.1.3.1.2 "><p id="ALM-18023__p8460192821819">Identifies the queue for which the alarm is generated.</p>
</td>
</tr>
<tr id="ALM-18023__row846182891817"><td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.3.2.1.3.1.1 "><p id="ALM-18023__p1546213282187">QueueMetric</p>
</td>
<td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.3.2.1.3.1.2 "><p id="ALM-18023__p8462142814188">Identifies the queue indicator for which the alarm is generated.</p>
</td>
</tr>
</tbody>
</table>
</div>
</div>
<div class="section" id="ALM-18023__section60692571"><h4 class="sectiontitle">Impact on the System</h4><ul id="ALM-18023__ul8914113131715"><li id="ALM-18023__li591413314174">Applications take a long time to finish.</li><li id="ALM-18023__li1242610612171">Newly submitted applications remain pending and cannot run.</li></ul>
</div>
<div class="section" id="ALM-18023__section9362234"><h4 class="sectiontitle">Possible Causes</h4><ul id="ALM-18023__ul29000801"><li id="ALM-18023__li2292055">NodeManager node resources are insufficient.</li><li id="ALM-18023__li13582173317229">The maximum resource capacity of the queue and the maximum AM resource percentage are too small.</li><li id="ALM-18023__li945917112045">The monitoring threshold is too low.</li></ul>
</div>
<div class="section" id="ALM-18023__section18537579256"><h4 class="sectiontitle">Procedure</h4><p class="tableheading" id="ALM-18023__p6470045115846"><strong id="ALM-18023__b116622471356">Check NodeManager resources.</strong></p>
<ol id="ALM-18023__ol1868517116470"><li id="ALM-18023__li4684191104718"><span>On FusionInsight Manager, choose <strong id="ALM-18023__b42473451324">Cluster</strong> &gt; <em id="ALM-18023__i3144185143215">Name of the desired cluster</em> &gt; <strong id="ALM-18023__b206156534322">Services</strong> &gt; <strong id="ALM-18023__b156781855203218">Yarn</strong> &gt; <strong id="ALM-18023__b187115583323">ResourceManager (Active)</strong> to access the ResourceManager web UI.</span></li><li id="ALM-18023__li46851012470"><span>Click <strong id="ALM-18023__b161941917121410">Scheduler</strong> and check in <strong id="ALM-18023__b21941817201411">Application Queues</strong> whether the root queue resources are used up.</span><p><ul id="ALM-18023__ul187511556173018"><li id="ALM-18023__li675185663018">If yes, go to <a href="#ALM-18023__li1894618168247">3</a>.</li><li id="ALM-18023__li114533583111">If no, go to <a href="#ALM-18023__li156321342274">4</a>.</li></ul>
</p></li><li id="ALM-18023__li1894618168247"><a name="ALM-18023__li1894618168247"></a><a name="li1894618168247"></a><span>Expand the capacity of the NodeManager instance of the Yarn service. After the capacity expansion, check whether the alarm is cleared.</span><p><ul id="ALM-18023__ul2024294142412"><li id="ALM-18023__li172421049244">If yes, no further action is required.</li><li id="ALM-18023__li1424317422412">If no, go to <a href="#ALM-18023__li15314143611285">6</a>.</li></ul>
</p></li></ol>
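<p>The root queue check in the steps above is done on the web UI. As a rough scripted equivalent, the sketch below assumes the cluster uses the Capacity Scheduler, whose /ws/v1/cluster/scheduler response reports the root queue usage as a percentage in scheduler.schedulerInfo.usedCapacity. Other schedulers return a different layout, so treat the field path, the function names, and the 100 percent cut-off as assumptions.</p>

```python
def root_queue_used_up(payload: dict, full_at: float = 100.0) -> bool:
    """True when the root queue's usedCapacity (a percentage) reaches full_at."""
    info = payload["scheduler"]["schedulerInfo"]
    return float(info["usedCapacity"]) >= full_at


def next_step(payload: dict) -> int:
    """Map the check result to the step number given in the procedure."""
    return 3 if root_queue_used_up(payload) else 4


# Hypothetical response bodies, trimmed to the fields used here.
busy = {"scheduler": {"schedulerInfo": {"queueName": "root", "usedCapacity": 100.0}}}
idle = {"scheduler": {"schedulerInfo": {"queueName": "root", "usedCapacity": 42.5}}}
print(next_step(busy), next_step(idle))
```

<p>A fully used root queue leads to step 3 (expand NodeManager capacity). Otherwise the procedure continues with the per-queue check in step 4.</p>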
<p id="ALM-18023__p8592842272"><strong id="ALM-18023__b689651816148">Check the maximum queue resource capacity and the maximum AM resource percentage.</strong></p>
<ol start="4" id="ALM-18023__ol15633154152713"><li id="ALM-18023__li156321342274"><a name="ALM-18023__li156321342274"></a><a name="li156321342274"></a><span>Check whether the resources of the queue corresponding to the pending task are used up.</span><p><ul id="ALM-18023__ul116320432715"><li id="ALM-18023__li19632114122710">If yes, go to <a href="#ALM-18023__li1663218419278">5</a>.</li><li id="ALM-18023__li1663212411273">If no, go to <a href="#ALM-18023__li15314143611285">6</a>.</li></ul>
</p></li><li id="ALM-18023__li1663218419278"><a name="ALM-18023__li1663218419278"></a><a name="li1663218419278"></a><span>On FusionInsight Manager, choose <strong id="ALM-18023__b255824318154">Tenant Resources</strong> &gt; <strong id="ALM-18023__b655913431152">Dynamic Resource Plan</strong> and add resources as required. Check whether the alarm is cleared.</span><p><ul id="ALM-18023__ul106325419271"><li id="ALM-18023__li0632204152712">If yes, no further action is required.</li><li id="ALM-18023__li66321941273">If no, go to <a href="#ALM-18023__li15314143611285">6</a>.</li></ul>
</p></li></ol>
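<p>To see why a small maximum AM resource percentage leaves applications pending, the following simplified model may help. It assumes that only memory is limiting, that every ApplicationMaster requests the same amount, and that the AM share is the queue's maximum memory multiplied by the percentage. The real scheduler also considers vcores and per-user limits, and all names and figures here are hypothetical.</p>

```python
def am_memory_limit_mb(queue_memory_mb: int, max_am_percent: int) -> int:
    """Memory (MB) that ApplicationMasters may occupy in the queue.

    max_am_percent is a whole-number percentage, for example 10 for 10%.
    """
    return queue_memory_mb * max_am_percent // 100


def max_concurrent_apps(queue_memory_mb: int, max_am_percent: int, am_memory_mb: int) -> int:
    """Number of applications whose AM fits within the AM memory limit."""
    return am_memory_limit_mb(queue_memory_mb, max_am_percent) // am_memory_mb


# Example figures: a 100 GB queue, a 10% AM share, and 2 GB per AM.
print(max_concurrent_apps(102400, 10, 2048))
```

<p>In this model, any application submitted beyond that count stays pending until a running one finishes, even if the queue still has free memory for containers. Raising the queue capacity or the AM percentage in step 5 raises that ceiling.</p>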
<p id="ALM-18023__p1529393618286"><strong id="ALM-18023__b11177653161416">Adjust the monitoring thresholds.</strong></p>
<ol start="6" id="ALM-18023__ol10314143615285"><li id="ALM-18023__li15314143611285"><a name="ALM-18023__li15314143611285"></a><a name="li15314143611285"></a><span>On FusionInsight Manager, choose <strong id="ALM-18023__b131667668515741">O&amp;M</strong> &gt; <strong id="ALM-18023__b187703917115741">Alarm</strong> &gt; <strong id="ALM-18023__b97239161815741">Thresholds</strong> &gt; <em id="ALM-18023__i26841108815741">Name of the desired cluster</em> &gt; <strong id="ALM-18023__b37318807815741">Yarn</strong> &gt; <strong id="ALM-18023__b193724522615741">Applications</strong> &gt; <strong id="ALM-18023__b171535899515741">Pending Applications</strong>, and increase the thresholds as required.</span></li><li id="ALM-18023__li163141936132814"><span>Wait 5 minutes, then check whether the alarm is cleared.</span><p><ul id="ALM-18023__ul1314036132819"><li id="ALM-18023__li53146360282">If yes, no further action is required.</li><li id="ALM-18023__li731463652817">If no, go to <a href="#ALM-18023__li76841314475">8</a>.</li></ul>
</p></li></ol>
<p class="tableheading" id="ALM-18023__p5357154554619"><strong id="ALM-18023__b491319201739">Collect the fault information.</strong></p>
<ol start="8" id="ALM-18023__ol176841310478"><li id="ALM-18023__li76841314475"><a name="ALM-18023__li76841314475"></a><a name="li76841314475"></a><span>On FusionInsight Manager, choose <strong id="ALM-18023__b13761012201619">O&amp;M</strong>. In the navigation pane on the left, choose <strong id="ALM-18023__b43877127165">Log</strong> &gt; <strong id="ALM-18023__b5389131281613">Download</strong>.</span></li><li id="ALM-18023__li45621121134714"><span>Expand the <strong id="ALM-18023__b954611843915">Service</strong> drop-down list, and select <strong id="ALM-18023__b195478814396">Yarn</strong> for the target cluster.</span></li><li id="ALM-18023__li195647218474"><span>Click <span><img id="ALM-18023__image104601319175315" src="en-us_image_0263895802.png"></span> in the upper right corner, and set <strong id="ALM-18023__b11270181318395">Start Date</strong> and <strong id="ALM-18023__b19271161393913">End Date</strong> for log collection to 10 minutes before and 10 minutes after the alarm generation time, respectively. Then, click <strong id="ALM-18023__b1927181323910">Download</strong>.</span></li><li id="ALM-18023__li556542113476"><span>Contact <span id="ALM-18023__text980213834320">O&amp;M personnel</span> and provide the collected logs.</span></li></ol>
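<p>The log collection window from the steps above is simple date arithmetic. The sketch below computes the Start Date and End Date values from an alarm generation time. The function name and the sample timestamp are hypothetical, and the values still have to be entered on FusionInsight Manager by hand.</p>

```python
from datetime import datetime, timedelta


def log_collection_window(alarm_time: datetime, margin_minutes: int = 10):
    """Return (start, end): margin_minutes before and after the alarm time."""
    margin = timedelta(minutes=margin_minutes)
    return alarm_time - margin, alarm_time + margin


# Example: an alarm generated at 12:00 on 1 January 2024 (made-up timestamp).
start, end = log_collection_window(datetime(2024, 1, 1, 12, 0))
print(start.isoformat(), end.isoformat())
```

<p>For that example, the window runs from 11:50 to 12:10 on the same day.</p>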
</div>
<div class="section" id="ALM-18023__section169311343318"><h4 class="sectiontitle">Alarm Clearing</h4><p id="ALM-18023__p754913417333">This alarm is automatically cleared after the fault is rectified.</p>
</div>
<div class="section" id="ALM-18023__section20143465"><h4 class="sectiontitle">Related Information</h4><p id="ALM-18023__p32409199">None</p>
</div>
</div>
<div>
<div class="familylinks">
<div class="parentlink"><strong>Parent topic:</strong> <a href="mrs_01_1298.html">Alarm Reference (Applicable to MRS 3.x)</a></div>
</div>
</div>