doc-exports/docs/mrs/umn/ALM-18025.html
Yang, Tong 3b1f73dece MRS UMN 2.0.38.SP20 version
Reviewed-by: Hasko, Vladimir <vladimir.hasko@t-systems.com>
Co-authored-by: Yang, Tong <yangtong2@huawei.com>
Co-committed-by: Yang, Tong <yangtong2@huawei.com>
2022-12-13 12:03:34 +00:00


<a name="ALM-18025"></a>
<h1 class="topictitle1">ALM-18025 Number of Terminated Yarn Tasks Exceeds the Threshold</h1>
<div id="body1594259005512"><div class="section" id="ALM-18025__section23124172435"><h4 class="sectiontitle">Description</h4><p id="ALM-18025__p14509121716435">The alarm module checks the number of terminated applications in the Yarn root queue every 60 seconds. The alarm is generated when the number exceeds 50 in three consecutive checks.</p>
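<p>The check described above can be sketched as a counter of consecutive over-threshold samples. This is a minimal illustration, not the alarm module's actual code: the class <code>ConsecutiveThresholdAlarm</code> and the helper <code>replay</code> are hypothetical names, the defaults of 50 and 3 come from the description, and clearing the alarm on the first sample that no longer exceeds the threshold is an assumption.</p>

```python
class ConsecutiveThresholdAlarm:
    """Raise an alarm after N consecutive samples exceed a threshold."""

    def __init__(self, threshold=50, required_hits=3):
        self.threshold = threshold
        self.required_hits = required_hits
        self.hits = 0          # consecutive samples above the threshold so far
        self.active = False    # whether the alarm is currently raised

    def observe(self, terminated_apps):
        """Feed one periodic sample; return True while the alarm is active."""
        if terminated_apps > self.threshold:
            self.hits += 1
            if self.hits >= self.required_hits:
                self.active = True
        else:
            # Assumption: one sample at or below the threshold resets the
            # counter and clears the alarm.
            self.hits = 0
            self.active = False
        return self.active


def replay(samples, threshold=50, required_hits=3):
    """Return the alarm state after each sample in a list of readings."""
    alarm = ConsecutiveThresholdAlarm(threshold, required_hits)
    return [alarm.observe(s) for s in samples]
```

<p>For example, <code>replay([60, 70, 40, 51, 52, 53])</code> reports the alarm only on the last sample, because the reading of 40 interrupts the first run. A reading of exactly 50 does not count, since the number must exceed the threshold.</p>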
</div>
<div class="section" id="ALM-18025__section23131117184318"><h4 class="sectiontitle">Attribute</h4>
<div class="tablenoborder"><table cellpadding="4" cellspacing="0" summary="" id="ALM-18025__table431431734313" frame="border" border="1" rules="all"><thead align="left"><tr id="ALM-18025__row16509717104312"><th align="left" class="cellrowborder" valign="top" width="33.33333333333333%" id="mcps1.3.2.2.1.4.1.1"><p id="ALM-18025__p75102017194316">Alarm ID</p>
</th>
<th align="left" class="cellrowborder" valign="top" width="33.33333333333333%" id="mcps1.3.2.2.1.4.1.2"><p id="ALM-18025__p18510181744313">Alarm Severity</p>
</th>
<th align="left" class="cellrowborder" valign="top" width="33.33333333333333%" id="mcps1.3.2.2.1.4.1.3"><p id="ALM-18025__p20510181717432">Auto Clear</p>
</th>
</tr>
</thead>
<tbody><tr id="ALM-18025__row165103177433"><td class="cellrowborder" valign="top" width="33.33333333333333%" headers="mcps1.3.2.2.1.4.1.1 "><p id="ALM-18025__p15510121712439">18025</p>
</td>
<td class="cellrowborder" valign="top" width="33.33333333333333%" headers="mcps1.3.2.2.1.4.1.2 "><p id="ALM-18025__p16510317194317">Major</p>
</td>
<td class="cellrowborder" valign="top" width="33.33333333333333%" headers="mcps1.3.2.2.1.4.1.3 "><p id="ALM-18025__p1751014176434">Yes</p>
</td>
</tr>
</tbody>
</table>
</div>
</div>
<div class="section" id="ALM-18025__section10321917184310"><h4 class="sectiontitle">Parameters</h4>
<div class="tablenoborder"><table cellpadding="4" cellspacing="0" summary="" id="ALM-18025__table20322191713436" frame="border" border="1" rules="all"><thead align="left"><tr id="ALM-18025__row135105174431"><th align="left" class="cellrowborder" valign="top" width="50%" id="mcps1.3.3.2.1.3.1.1"><p id="ALM-18025__p3510317194312">Name</p>
</th>
<th align="left" class="cellrowborder" valign="top" width="50%" id="mcps1.3.3.2.1.3.1.2"><p id="ALM-18025__p1951011174439">Meaning</p>
</th>
</tr>
</thead>
<tbody><tr id="ALM-18025__row15510171711439"><td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.3.2.1.3.1.1 "><p id="ALM-18025__p65105174439">Cluster Name</p>
</td>
<td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.3.2.1.3.1.2 "><p id="ALM-18025__p10510217134315">Specifies the cluster for which the alarm is generated.</p>
</td>
</tr>
<tr id="ALM-18025__row451016179439"><td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.3.2.1.3.1.1 "><p id="ALM-18025__p115109171439">Service Name</p>
</td>
<td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.3.2.1.3.1.2 "><p id="ALM-18025__p12510101734314">Specifies the service for which the alarm is generated.</p>
</td>
</tr>
<tr id="ALM-18025__row2510121716435"><td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.3.2.1.3.1.1 "><p id="ALM-18025__p051031713434">Role Name</p>
</td>
<td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.3.2.1.3.1.2 "><p id="ALM-18025__p19510111710438">Specifies the role for which the alarm is generated.</p>
</td>
</tr>
<tr id="ALM-18025__row15421151713552"><td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.3.2.1.3.1.1 "><p id="ALM-18025__p642111745516">HostName</p>
</td>
<td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.3.2.1.3.1.2 "><p id="ALM-18025__p119560371492">Specifies the host for which the alarm is generated.</p>
</td>
</tr>
<tr id="ALM-18025__row5473622145519"><td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.3.2.1.3.1.1 "><p id="ALM-18025__p353873093910">Trigger Condition</p>
</td>
<td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.3.2.1.3.1.2 "><p id="ALM-18025__p164731722165516">Specifies the threshold triggering the alarm. If the current indicator value exceeds this threshold, the alarm is generated.</p>
</td>
</tr>
</tbody>
</table>
</div>
</div>
<div class="section" id="ALM-18025__section1032901714432"><h4 class="sectiontitle">Impact on the System</h4><p id="ALM-18025__p15101317164311">A large number of application tasks are forcibly terminated.</p>
</div>
<div class="section" id="ALM-18025__section5329151794317"><h4 class="sectiontitle">Possible Causes</h4><ul id="ALM-18025__ul4510141754319"><li id="ALM-18025__li8510417144310">A user forcibly terminates a large number of tasks.</li><li id="ALM-18025__li125109173439">The system terminates tasks because of an error.</li></ul>
</div>
<div class="section" id="ALM-18025__section173331917134319"><h4 class="sectiontitle">Procedure</h4><p id="ALM-18025__p13510317204313"><strong id="ALM-18025__b10510171784315">Check the alarm details.</strong></p>
<ol id="ALM-18025__ol1488384754320"><li id="ALM-18025__li588044744311"><span>On the FusionInsight Manager portal, choose <strong id="ALM-18025__b1656491934412">O&amp;M &gt; Alarm &gt; Alarms</strong> to go to the alarm page.</span></li><li id="ALM-18025__li888044784310"><span>View <strong id="ALM-18025__b1088064719436">Additional Information</strong> in the alarm details to check whether the alarm threshold is too low.</span><p><ul id="ALM-18025__ul7880174713432"><li id="ALM-18025__li588017472431">If yes, go to <a href="#ALM-18025__li20880184714436">3</a>.</li><li id="ALM-18025__li4880184711439">If no, go to <a href="#ALM-18025__li2088019471430">4</a>.</li></ul>
</p></li><li id="ALM-18025__li20880184714436"><a name="ALM-18025__li20880184714436"></a><a name="li20880184714436"></a><span>Choose <strong id="ALM-18025__b178801047114318">O&amp;M</strong> &gt; <strong id="ALM-18025__b188018474437">Alarm</strong> &gt; <strong id="ALM-18025__b168801547144312">Thresholds</strong> &gt; <em id="ALM-18025__i14880184744316">Name of the desired cluster</em> &gt; <strong id="ALM-18025__b1688044715433">Yarn</strong> &gt; <strong id="ALM-18025__b888017478437">Other</strong> &gt; <strong id="ALM-18025__b1288014719439">Terminated Applications of root queue</strong> to modify the threshold. Go to <a href="#ALM-18025__li4883154713439">6</a>.</span></li><li id="ALM-18025__li2088019471430"><a name="ALM-18025__li2088019471430"></a><a name="li2088019471430"></a><span>Choose <strong id="ALM-18025__b10880114784318">Cluster</strong> &gt; <em id="ALM-18025__i1088010478436">Name of the desired cluster</em> &gt; <strong id="ALM-18025__b1788074764315">Services</strong> &gt; <strong id="ALM-18025__b19880184744319">Yarn</strong> &gt; <strong id="ALM-18025__b98801747124317">ResourceManager(Active)</strong> to access the ResourceManager web UI.</span></li><li id="ALM-18025__li288119475431"><span>Click <strong id="ALM-18025__b1288024754317">KILLED</strong> in <strong id="ALM-18025__b8880047144313">Applications</strong> and click the task at the top of the list. View the description of <strong id="ALM-18025__b20881447134318">Diagnostics</strong> and rectify the fault based on the task termination details (for example, the task is terminated by a user).</span></li><li id="ALM-18025__li4883154713439"><a name="ALM-18025__li4883154713439"></a><a name="li4883154713439"></a><span>Wait for 3 minutes and check whether the alarm is cleared.</span><p><ul id="ALM-18025__ul138831247124310"><li id="ALM-18025__li088394716437">If yes, no further action is required.</li><li id="ALM-18025__li38831647124314">If no, go to <a href="#ALM-18025__li4879124718434">7</a>.</li></ul>
</p></li></ol>
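<p>As a complement to inspecting killed applications one by one in the web UI, the same information can be read from the ResourceManager REST API of Apache Hadoop Yarn (<code>/ws/v1/cluster/apps?states=KILLED</code>). The following is a minimal sketch under stated assumptions, not part of the documented procedure: the function names are hypothetical, the example address is a placeholder, and the request assumes the endpoint is reachable without authentication, which does not hold for a Kerberos-secured cluster.</p>

```python
import json
from collections import Counter
from urllib.request import urlopen


def fetch_killed_apps(rm_url):
    """Fetch applications in the KILLED state from the ResourceManager REST API.

    Assumption: rm_url (for example "http://rm-host:8088", a placeholder) is
    reachable without authentication; a secured cluster needs SPNEGO instead.
    """
    with urlopen(f"{rm_url}/ws/v1/cluster/apps?states=KILLED") as resp:
        return json.load(resp)


def summarize_killed(payload, top=5):
    """Count killed applications per user and per diagnostics message."""
    # The API returns {"apps": null} when no application matches.
    apps = (payload.get("apps") or {}).get("app", [])
    by_user = Counter(a.get("user", "unknown") for a in apps)
    by_reason = Counter(
        (a.get("diagnostics") or "no diagnostics").strip() for a in apps
    )
    return {
        "total": len(apps),
        "by_user": by_user.most_common(top),
        "by_reason": by_reason.most_common(top),
    }
```

<p>Grouping by user and by diagnostics text helps distinguish the two possible causes: many entries from a single user point to manual termination, while a repeated system-generated message points to an error.</p>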
<p id="ALM-18025__p5745194384319"><strong id="ALM-18025__b16511121714315">Collect the fault information.</strong></p>
<ol start="7" id="ALM-18025__ol28802475435"><li id="ALM-18025__li4879124718434"><a name="ALM-18025__li4879124718434"></a><a name="li4879124718434"></a><span>On FusionInsight Manager, choose <strong id="ALM-18025__b957571113459">O&amp;M &gt; Log &gt; Download</strong>.</span></li><li id="ALM-18025__li88807475436"><span>Expand the <strong id="ALM-18025__b1487954717437">Service</strong> drop-down list, and select <strong id="ALM-18025__b98791547154320">Yarn</strong> for the target cluster.</span></li><li id="ALM-18025__li13880647194313"><span>Click <span><img id="ALM-18025__image78802047124311" src="en-us_image_0269417413.png"></span> in the upper right corner, and set <strong id="ALM-18025__b13880164714317">Start Date</strong> and <strong id="ALM-18025__b0880194711435">End Date</strong> for log collection to 10 minutes before and 10 minutes after the alarm generation time, respectively. Then, click <strong id="ALM-18025__b1880174712432">Download</strong>.</span></li><li id="ALM-18025__li1388094794314"><span>Contact the <span id="ALM-18025__text4614151421417">O&amp;M personnel</span> and send the collected logs.</span></li></ol>
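<p>The log collection window in the steps above is simple date arithmetic. A minimal sketch, assuming the alarm generation time is available as a Python <code>datetime</code>; the function name <code>log_window</code> is hypothetical.</p>

```python
from datetime import datetime, timedelta


def log_window(alarm_time, margin_minutes=10):
    """Return (start, end) covering margin_minutes on each side of the alarm."""
    margin = timedelta(minutes=margin_minutes)
    return alarm_time - margin, alarm_time + margin
```

<p>For an alarm generated at 12:00:00, <code>log_window(datetime(2024, 1, 1, 12, 0))</code> gives 11:50:00 as the start and 12:10:00 as the end of the collection period.</p>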
</div>
<div class="section" id="ALM-18025__section0345517164314"><h4 class="sectiontitle">Alarm Clearing</h4><p id="ALM-18025__p155111017104318">After the fault is rectified, the system automatically clears this alarm.</p>
</div>
<div class="section" id="ALM-18025__section173462017184320"><h4 class="sectiontitle">Related Information</h4><p id="ALM-18025__p1351113174431">None</p>
</div>
</div>
<div>
<div class="familylinks">
<div class="parentlink"><strong>Parent topic:</strong> <a href="mrs_01_1298.html">Alarm Reference (Applicable to MRS 3.x)</a></div>
</div>
</div>