Reviewed-by: Hasko, Vladimir <vladimir.hasko@t-systems.com> Co-authored-by: Yang, Tong <yangtong2@huawei.com> Co-committed-by: Yang, Tong <yangtong2@huawei.com>
<a name="ALM-17003"></a><a name="ALM-17003"></a>
|
|
|
|
<h1 class="topictitle1">ALM-17003 Oozie Service Unavailable</h1>
|
|
<div id="body55202859"><div class="section" id="ALM-17003__se7b6e3e86a194ea2a7e9711be53b033e"><h4 class="sectiontitle">Description</h4><p id="ALM-17003__en-us_topic_0070543676_p66327603">The system checks the Oozie service status every 5 seconds. This alarm is generated when Oozie or a component on which Oozie depends cannot provide services properly.</p>
|
|
<p id="ALM-17003__en-us_topic_0070543676_p60077523">This alarm is automatically cleared when the Oozie service recovers.</p>
|
|
</div>
|
|
<div class="section" id="ALM-17003__s85b924d1df614d23aedb2f2354223cf6"><h4 class="sectiontitle">Attribute</h4>
|
|
<div class="tablenoborder"><table cellpadding="4" cellspacing="0" summary="" id="ALM-17003__en-us_topic_0070543676_table34441226" frame="border" border="1" rules="all"><thead align="left"><tr id="ALM-17003__en-us_topic_0070543676_row6710359"><th align="left" class="cellrowborder" valign="top" width="33.33333333333333%" id="mcps1.3.2.2.1.4.1.1"><p id="ALM-17003__en-us_topic_0070543676_p6668207">Alarm ID</p>
|
|
</th>
|
|
<th align="left" class="cellrowborder" valign="top" width="33.33333333333333%" id="mcps1.3.2.2.1.4.1.2"><p id="ALM-17003__en-us_topic_0070543676_p3253908">Alarm Severity</p>
|
|
</th>
|
|
<th align="left" class="cellrowborder" valign="top" width="33.33333333333333%" id="mcps1.3.2.2.1.4.1.3"><p id="ALM-17003__en-us_topic_0070543676_p62239981">Automatically Cleared</p>
|
|
</th>
|
|
</tr>
|
|
</thead>
|
|
<tbody><tr id="ALM-17003__en-us_topic_0070543676_row8273699"><td class="cellrowborder" valign="top" width="33.33333333333333%" headers="mcps1.3.2.2.1.4.1.1 "><p id="ALM-17003__en-us_topic_0070543676_p66189853">17003</p>
|
|
</td>
|
|
<td class="cellrowborder" valign="top" width="33.33333333333333%" headers="mcps1.3.2.2.1.4.1.2 "><p id="ALM-17003__en-us_topic_0070543676_p59777859">Critical</p>
|
|
</td>
|
|
<td class="cellrowborder" valign="top" width="33.33333333333333%" headers="mcps1.3.2.2.1.4.1.3 "><p id="ALM-17003__en-us_topic_0070543676_p10168447">Yes</p>
|
|
</td>
|
|
</tr>
|
|
</tbody>
|
|
</table>
|
|
</div>
|
|
</div>
|
|
<div class="section" id="ALM-17003__sd304ff8fd2c74a87acb20279b50107f1"><h4 class="sectiontitle">Parameters</h4>
|
|
<div class="tablenoborder"><table cellpadding="4" cellspacing="0" summary="" id="ALM-17003__en-us_topic_0070543676_table18337846" frame="border" border="1" rules="all"><thead align="left"><tr id="ALM-17003__en-us_topic_0070543676_row29992216"><th align="left" class="cellrowborder" valign="top" width="50%" id="mcps1.3.3.2.1.3.1.1"><p id="ALM-17003__en-us_topic_0070543676_p13450398">Name</p>
|
|
</th>
|
|
<th align="left" class="cellrowborder" valign="top" width="50%" id="mcps1.3.3.2.1.3.1.2"><p id="ALM-17003__en-us_topic_0070543676_p15740453">Meaning</p>
|
|
</th>
|
|
</tr>
|
|
</thead>
|
|
<tbody><tr id="ALM-17003__row606202248"><td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.3.2.1.3.1.1 "><p id="ALM-17003__p192431315431">Source</p>
|
|
</td>
|
|
<td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.3.2.1.3.1.2 "><p id="ALM-17003__p692551319435">Specifies the cluster for which the alarm is generated.</p>
|
|
</td>
|
|
</tr>
|
|
<tr id="ALM-17003__en-us_topic_0070543676_row67017172"><td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.3.2.1.3.1.1 "><p id="ALM-17003__en-us_topic_0070543676_p59681876">ServiceName</p>
|
|
</td>
|
|
<td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.3.2.1.3.1.2 "><p id="ALM-17003__en-us_topic_0070543676_p2393805">Specifies the service for which the alarm is generated.</p>
|
|
</td>
|
|
</tr>
|
|
<tr id="ALM-17003__en-us_topic_0070543676_row21544246"><td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.3.2.1.3.1.1 "><p id="ALM-17003__en-us_topic_0070543676_p253462">RoleName</p>
|
|
</td>
|
|
<td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.3.2.1.3.1.2 "><p id="ALM-17003__en-us_topic_0070543676_p20530434">Specifies the role for which the alarm is generated.</p>
|
|
</td>
|
|
</tr>
|
|
<tr id="ALM-17003__en-us_topic_0070543676_row50556185"><td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.3.2.1.3.1.1 "><p id="ALM-17003__en-us_topic_0070543676_p1410307">HostName</p>
|
|
</td>
|
|
<td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.3.2.1.3.1.2 "><p id="ALM-17003__en-us_topic_0070543676_p47126062">Specifies the host for which the alarm is generated.</p>
|
|
</td>
|
|
</tr>
|
|
</tbody>
|
|
</table>
|
|
</div>
|
|
</div>
|
|
<div class="section" id="ALM-17003__s13f5065bfb1e47578f4e7698d897162f"><h4 class="sectiontitle">Impact on the System</h4><p id="ALM-17003__en-us_topic_0070543676_p59114690">Oozie cannot be used to submit jobs.</p>
|
|
</div>
|
|
<div class="section" id="ALM-17003__s81b9ab20cd0f4240bcc27c672801d2bf"><h4 class="sectiontitle">Possible Causes</h4><ul id="ALM-17003__en-us_topic_0070543676_ul23560549"><li id="ALM-17003__en-us_topic_0070543676_li10718357">The DBService service is abnormal or the data of Oozie stored in DBService is damaged.</li><li id="ALM-17003__en-us_topic_0070543676_li29356354">The HDFS service is abnormal or the data of Oozie stored in HDFS is damaged.</li><li id="ALM-17003__en-us_topic_0070543676_li62880597">The Yarn service is abnormal.</li><li id="ALM-17003__en-us_topic_0070543676_li29054463">The Nodeagent process is abnormal.</li></ul>
|
|
</div>
|
|
<div class="section" id="ALM-17003__scf05d4c5628c45f288024b7193c3fd35"><h4 class="sectiontitle">Procedure</h4><p class="tableheading" id="ALM-17003__en-us_topic_0070543676_p4601284"><strong id="ALM-17003__b805121917829">Query the Oozie service health status code.</strong></p>
|
|
<ol id="ALM-17003__ol869733017847"><li id="ALM-17003__li5330620717821"><span>On the FusionInsight Manager portal, choose <strong id="ALM-17003__b15888103493917">Cluster</strong> > <em id="ALM-17003__i58021019193412">Name of the desired cluster</em> > <strong id="ALM-17003__b2180138193918">Services</strong> > <strong id="ALM-17003__b185661740123917">Oozie</strong>. Click <strong id="ALM-17003__b42301607558">oozie</strong> (any one is OK) in <strong id="ALM-17003__b17554164819544">oozie WebUI</strong> to go to the Oozie WebUI.</span><p><div class="note" id="ALM-17003__note840916461457"><img src="public_sys-resources/note_3.0-en-us.png"><span class="notetitle"> </span><div class="notebody"><p id="ALM-17003__en-us_topic_0193189480_p91833832915">By default, the <strong id="ALM-17003__en-us_topic_0193189480_b4780151814294">admin</strong> user does not have the permissions to manage other components. If the page cannot be opened or the displayed content is incomplete when you access the native UI of a component due to insufficient permissions, you can manually create a user with the permissions to manage that component.</p>
|
|
</div></div>
|
|
</p></li><li id="ALM-17003__li4341815317821"><span>Append <strong id="ALM-17003__b999382217821">/servicehealth</strong> to the URL in the browser address box and access the page again. The value of <strong id="ALM-17003__b2283553917821">statusCode</strong> is the current Oozie service health status code.</span><p><p class="litext" id="ALM-17003__p3773937117821">For example, visit <strong id="ALM-17003__b78754742514">https://10.10.0.117:2</strong><strong id="ALM-17003__b1287114712257">0026/Oozie/oozie/130/oozie/servicehealth</strong>. The result is as follows:</p>
|
|
<pre class="screen" id="ALM-17003__screen411002417821">{"beans":[{"name":"serviceStatus","statusCode":0}]}</pre>
|
|
<p class="litext" id="ALM-17003__p6447656217821">If the health status code cannot be displayed or the browser does not respond, the service may be unavailable due to an Oozie process fault. See <a href="#ALM-17003__li3460735817821">13</a> to rectify the fault.</p>
|
|
</p></li><li id="ALM-17003__li841234417821"><span>Perform operations based on the status code. For details, see <a href="#ALM-17003__table1418843217821">Table 1</a>.</span><p>
|
|
<div class="tablenoborder"><a name="ALM-17003__table1418843217821"></a><a name="table1418843217821"></a><table cellpadding="4" cellspacing="0" summary="" id="ALM-17003__table1418843217821" frame="border" border="1" rules="all"><caption><b>Table 1 </b>Oozie service health status code</caption><thead align="left"><tr id="ALM-17003__row6357541217821"><th align="left" class="cellrowborder" valign="top" width="25%" id="mcps1.3.6.3.3.2.1.2.5.1.1"><p id="ALM-17003__p4063711917821">Status Code</p>
|
|
</th>
|
|
<th align="left" class="cellrowborder" valign="top" width="25%" id="mcps1.3.6.3.3.2.1.2.5.1.2"><p id="ALM-17003__p327233917821">Description</p>
|
|
</th>
|
|
<th align="left" class="cellrowborder" valign="top" width="25%" id="mcps1.3.6.3.3.2.1.2.5.1.3"><p id="ALM-17003__p6373293617821">Error Cause</p>
|
|
</th>
|
|
<th align="left" class="cellrowborder" valign="top" width="25%" id="mcps1.3.6.3.3.2.1.2.5.1.4"><p id="ALM-17003__p6209421417821">Solution</p>
|
|
</th>
|
|
</tr>
|
|
</thead>
|
|
<tbody><tr id="ALM-17003__row4075492117821"><td class="cellrowborder" valign="top" width="25%" headers="mcps1.3.6.3.3.2.1.2.5.1.1 "><p id="ALM-17003__p4933471417821">0</p>
|
|
</td>
|
|
<td class="cellrowborder" valign="top" width="25%" headers="mcps1.3.6.3.3.2.1.2.5.1.2 "><p id="ALM-17003__p3668889717821">The service is running properly.</p>
|
|
</td>
|
|
<td class="cellrowborder" valign="top" width="25%" headers="mcps1.3.6.3.3.2.1.2.5.1.3 "><p id="ALM-17003__p1901067217821">None</p>
|
|
</td>
|
|
<td class="cellrowborder" valign="top" width="25%" headers="mcps1.3.6.3.3.2.1.2.5.1.4 "><p id="ALM-17003__p6346948817821">None</p>
|
|
</td>
|
|
</tr>
|
|
<tr id="ALM-17003__row3073281017821"><td class="cellrowborder" valign="top" width="25%" headers="mcps1.3.6.3.3.2.1.2.5.1.1 "><p id="ALM-17003__p3124997017821">18002</p>
|
|
</td>
|
|
<td class="cellrowborder" valign="top" width="25%" headers="mcps1.3.6.3.3.2.1.2.5.1.2 "><p id="ALM-17003__p4821960217821">The DBService service is abnormal.</p>
|
|
</td>
|
|
<td class="cellrowborder" valign="top" width="25%" headers="mcps1.3.6.3.3.2.1.2.5.1.3 "><p id="ALM-17003__p1347366217821">Oozie fails to connect to DBService or the data stored in DBService is damaged.</p>
|
|
</td>
|
|
<td class="cellrowborder" valign="top" width="25%" headers="mcps1.3.6.3.3.2.1.2.5.1.4 "><p id="ALM-17003__p2440604717821">See <a href="#ALM-17003__li5899993317821">4</a>.</p>
|
|
</td>
|
|
</tr>
|
|
<tr id="ALM-17003__row4980361617821"><td class="cellrowborder" valign="top" width="25%" headers="mcps1.3.6.3.3.2.1.2.5.1.1 "><p id="ALM-17003__p815983917821">18003</p>
|
|
</td>
|
|
<td class="cellrowborder" valign="top" width="25%" headers="mcps1.3.6.3.3.2.1.2.5.1.2 "><p id="ALM-17003__p5696724617821">The HDFS service is abnormal.</p>
|
|
</td>
|
|
<td class="cellrowborder" valign="top" width="25%" headers="mcps1.3.6.3.3.2.1.2.5.1.3 "><p id="ALM-17003__p5094420917821">Oozie fails to connect to HDFS or the data stored in HDFS is damaged.</p>
|
|
</td>
|
|
<td class="cellrowborder" valign="top" width="25%" headers="mcps1.3.6.3.3.2.1.2.5.1.4 "><p id="ALM-17003__p2712700317821">See <a href="#ALM-17003__li6587172717821">7</a>.</p>
|
|
</td>
|
|
</tr>
|
|
<tr id="ALM-17003__row729189117821"><td class="cellrowborder" valign="top" width="25%" headers="mcps1.3.6.3.3.2.1.2.5.1.1 "><p id="ALM-17003__p4557936617821">18005</p>
|
|
</td>
|
|
<td class="cellrowborder" valign="top" width="25%" headers="mcps1.3.6.3.3.2.1.2.5.1.2 "><p id="ALM-17003__p94113117821">The MapReduce service is abnormal.</p>
|
|
</td>
|
|
<td class="cellrowborder" valign="top" width="25%" headers="mcps1.3.6.3.3.2.1.2.5.1.3 "><p id="ALM-17003__p912276417821">The Yarn service is abnormal.</p>
|
|
</td>
|
|
<td class="cellrowborder" valign="top" width="25%" headers="mcps1.3.6.3.3.2.1.2.5.1.4 "><p id="ALM-17003__p671805917821">See <a href="#ALM-17003__li6500500117821">11</a>.</p>
|
|
</td>
|
|
</tr>
|
|
</tbody>
|
|
</table>
|
|
</div>
|
|
</p></li></ol>
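<p>The mapping in Table 1 can be sketched in a few lines of Python. This is only an illustration, not part of the product: the function name <strong>next_step_for</strong> is hypothetical, and sending unlisted codes to step 18 is an assumption. The JSON layout follows the sample response shown above.</p>

```python
import json

# Next procedure step for each status code, copied from Table 1.
# None means the service is running properly and no action is needed.
NEXT_STEP = {0: None, 18002: 4, 18003: 7, 18005: 11}


def next_step_for(response_text):
    """Return the procedure step to go to for a /servicehealth response body."""
    try:
        beans = json.loads(response_text)["beans"]
        code = beans[0]["statusCode"]
    except (ValueError, KeyError, IndexError, TypeError):
        # No readable status code: check the Oozie process (step 13).
        return 13
    # Assumption: codes that Table 1 does not list are escalated (step 18).
    return NEXT_STEP.get(code, 18)


print(next_step_for('{"beans":[{"name":"serviceStatus","statusCode":0}]}'))
```

<p>Passing the response body copied from the browser is enough; the sketch does not contact the cluster itself.</p>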
|
|
<p class="tableheading" id="ALM-17003__p6058702817821"><strong id="ALM-17003__b109416711798">Check the DBService service.</strong></p>
|
|
<ol start="4" id="ALM-17003__ol646238717928"><li id="ALM-17003__li5899993317821"><a name="ALM-17003__li5899993317821"></a><a name="li5899993317821"></a><span>On the FusionInsight Manager portal, choose <strong id="ALM-17003__b13676135941912">Cluster </strong>> <em id="ALM-17003__i10401730123411">Name of the desired cluster</em> > <strong id="ALM-17003__b1746418715161">Services</strong>, and check whether the DBService service is running properly.</span><p><ul class="subitemlist" id="ALM-17003__ul655554817821"><li id="ALM-17003__li2569263117821">If yes, go to <a href="#ALM-17003__li2491190317821">6</a>.</li><li id="ALM-17003__li72839417821">If no, go to <a href="#ALM-17003__li6459530417821">5</a>.</li></ul>
|
|
</p></li><li id="ALM-17003__li6459530417821"><a name="ALM-17003__li6459530417821"></a><a name="li6459530417821"></a><span>Resolve the problem of DBService based on the alarm help and check whether the Oozie alarm is cleared.</span><p><ul class="subitemlist" id="ALM-17003__ul1463379617821"><li id="ALM-17003__li6123735017821">If yes, no further action is required.</li><li id="ALM-17003__li6127830017821">If no, go to <a href="#ALM-17003__li3980393617821">18</a>.</li></ul>
|
|
</p></li><li id="ALM-17003__li2491190317821"><a name="ALM-17003__li2491190317821"></a><a name="li2491190317821"></a><span>Log in to the Oozie database to check whether the data is complete.</span><p><ol class="subitemlist" type="a" id="ALM-17003__ol4836081517821"><li id="ALM-17003__li6483710317821">Log in to the active DBService node as user <strong id="ALM-17003__b4448682417821">root</strong>. <span id="ALM-17003__text10308168102719"></span><p id="ALM-17003__p3420164901912">On the FusionInsight Manager page, choose <strong id="ALM-17003__b1067117103204">Cluster </strong>> <em id="ALM-17003__i166711910172012">Name of the desired cluster</em> > <strong id="ALM-17003__b20671210172019">Services</strong> > <strong id="ALM-17003__b2902713142018">DBService > Instance</strong> to view the IP address of the active DBService node.</p>
|
|
</li><li id="ALM-17003__li3190990817821">Run the following command to log in to the Oozie database:<p class="litext" id="ALM-17003__p1731401117821"><a name="ALM-17003__li3190990817821"></a><a name="li3190990817821"></a><strong id="ALM-17003__b4666302117821">su - omm</strong></p>
|
|
<p class="litext" id="ALM-17003__p6025768417821"><strong id="ALM-17003__b1333571210552">source ${BIGDATA_HOME}/FusionInsight_BASE_</strong><strong id="ALM-17003__b39166465105520"><span id="ALM-17003__text63480348105520">8.1.0.1</span></strong><strong id="ALM-17003__b5291254510552">/install/FusionInsight-dbservice-2.7.0/.dbservice_profile</strong></p>
|
|
<p class="litext" id="ALM-17003__p1100208517821"><strong id="ALM-17003__b544824917821">gsql -U </strong><em id="ALM-17003__i4903424117821">Username </em><strong id="ALM-17003__b3865499117821">-W </strong><em id="ALM-17003__i1235060317821">Oozie database password</em><strong id="ALM-17003__b4404656617821"> -p 20051 -d </strong><em id="ALM-17003__i6087477717821">Database name</em></p>
|
|
</li><li id="ALM-17003__li4265612617821">After the login is successful, enter <strong id="ALM-17003__b1875371717821">\d</strong> to check whether there are 15 data tables.<p id="ALM-17003__p3456573117821">The Oozie service has 15 data tables by default. If these data tables are deleted or the table structure is modified, the Oozie service may be unavailable. Contact the <span id="ALM-17003__text4614151421417">O&M personnel</span> to back up the data and perform restoration.</p>
|
|
</li></ol>
|
|
</p></li></ol>
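<p>Counting the tables by eye is error-prone, so the check in step 6 can be sketched as a small Python helper that works on text copied from the <strong>\d</strong> output. The output layout is an assumption here: the sketch expects a psql-style listing with pipe-separated columns in the order Schema, Name, Type, Owner, which may differ in your gsql version. The function names and the sample rows are hypothetical.</p>

```python
EXPECTED_TABLES = 15  # default number of Oozie data tables stated in step 6


def count_tables(listing):
    """Count rows whose Type column is 'table' in a pipe-separated listing."""
    count = 0
    for line in listing.splitlines():
        columns = [part.strip() for part in line.split("|")]
        # Assumed column order: Schema | Name | Type | Owner
        if len(columns) >= 3 and columns[2] == "table":
            count += 1
    return count


def tables_complete(listing):
    """Return True when the listing contains the expected number of tables."""
    return count_tables(listing) == EXPECTED_TABLES


# Made-up sample listing (placeholder table names, not the real Oozie schema).
sample = "\n".join(
    [" Schema | Name | Type | Owner", "--------+------+------+-------"]
    + [" public | t{} | table | omm".format(i) for i in range(15)]
    + [" public | s1 | sequence | omm"]
)
print(tables_complete(sample))
```

<p>Only rows of type <strong>table</strong> are counted, so sequences or views in the listing do not distort the result.</p>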
|
|
<p class="tableheading" id="ALM-17003__p3259415117821"><strong id="ALM-17003__b2365007017950">Check the HDFS service.</strong></p>
|
|
<ol start="7" id="ALM-17003__ol1621028217104"><li id="ALM-17003__li6587172717821"><a name="ALM-17003__li6587172717821"></a><a name="li6587172717821"></a><span>On the FusionInsight Manager portal, choose <strong id="ALM-17003__b19905111510165">Cluster</strong> > <em id="ALM-17003__i22181139368">Name of the desired cluster</em> > <strong id="ALM-17003__b149061215111610">Services</strong>, and check whether the HDFS service is running properly.</span><p><ul class="subitemlist" id="ALM-17003__ul3714524217821"><li id="ALM-17003__li4138412217821">If yes, go to <a href="#ALM-17003__li940532017821">9</a>.</li><li id="ALM-17003__li6377957217821">If no, go to <a href="#ALM-17003__li2988812617821">8</a>.</li></ul>
|
|
</p></li><li id="ALM-17003__li2988812617821"><a name="ALM-17003__li2988812617821"></a><a name="li2988812617821"></a><span>Resolve the problem of HDFS based on the alarm help and check whether the Oozie alarm is cleared.</span><p><ul class="subitemlist" id="ALM-17003__ul332090217821"><li id="ALM-17003__li5597463617821">If yes, no further action is required.</li><li id="ALM-17003__li3765169117821">If no, go to <a href="#ALM-17003__li3980393617821">18</a>.</li></ul>
|
|
</p></li><li id="ALM-17003__li940532017821"><a name="ALM-17003__li940532017821"></a><a name="li940532017821"></a><span>Log in to HDFS to check whether the Oozie file directory structure is complete.</span><p><ol class="subitemlist" type="a" id="ALM-17003__ol3199183217821"><li id="ALM-17003__li501915017821">Download and install an HDFS client.</li><li id="ALM-17003__li3338080917821">Log in to the client node as user <strong id="ALM-17003__b4517235217821">root</strong> and run the following commands to check whether <strong id="ALM-17003__b389798617821">/user/oozie/share</strong> exists. <span id="ALM-17003__text2026851212914"></span><p id="ALM-17003__p3508187717821">If the cluster is in security mode, perform security authentication first.</p>
|
|
<p id="ALM-17003__p2305980317821"><strong id="ALM-17003__b4730144317821">kinit admin</strong></p>
|
|
<p id="ALM-17003__p5590476117821"><strong id="ALM-17003__b621164017821">hdfs dfs -ls /user/oozie/share</strong></p>
|
|
</li></ol>
|
|
<ul id="ALM-17003__ul3832773717821"><li id="ALM-17003__li4120163217821">If yes, go to <a href="#ALM-17003__li3980393617821">18</a>.</li><li id="ALM-17003__li4899788017821">If no, go to <a href="#ALM-17003__li367846717821">10</a>.</li></ul>
|
|
</p></li><li id="ALM-17003__li367846717821"><a name="ALM-17003__li367846717821"></a><a name="li367846717821"></a><span>In the Oozie client installation directory, manually upload the share directory to <strong id="ALM-17003__b1753902117821">/user/oozie</strong> in HDFS, and check whether the alarm is cleared.</span><p><ul class="subitemlist" id="ALM-17003__ul4892717817821"><li id="ALM-17003__li2363346517821">If yes, no further action is required.</li><li id="ALM-17003__li3526251417821">If no, go to <a href="#ALM-17003__li3980393617821">18</a>.</li></ul>
|
|
</p></li></ol>
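<p>The existence check in step 9 can also be scripted. The following Python sketch uses <strong>hdfs dfs -test -e</strong> instead of the <strong>hdfs dfs -ls</strong> command shown above, because the exit code of <strong>-test -e</strong> (0 when the path exists) is easier to evaluate. It assumes that the HDFS client environment is already loaded and, in security mode, that <strong>kinit</strong> has been run. The function names are hypothetical.</p>

```python
import subprocess

SHARE_DIR = "/user/oozie/share"


def share_dir_exists(run=subprocess.run):
    """Return True if /user/oozie/share exists in HDFS."""
    # `hdfs dfs -test -e PATH` exits with 0 when PATH exists.
    result = run(["hdfs", "dfs", "-test", "-e", SHARE_DIR])
    return result.returncode == 0


def next_step(exists):
    """Map the result of step 9 to the next procedure step."""
    # Directory present: go to step 18. Directory missing: go to step 10.
    return 18 if exists else 10
```

<p>The <strong>run</strong> parameter exists only so that the command runner can be replaced when trying the sketch on a machine without an HDFS client.</p>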
|
|
<p class="tableheading" id="ALM-17003__p3769142017821"><strong id="ALM-17003__b26941179171022">Check the Yarn and MapReduce services.</strong></p>
|
|
<ol start="11" id="ALM-17003__ol49961161113011"><li id="ALM-17003__li6500500117821"><a name="ALM-17003__li6500500117821"></a><a name="li6500500117821"></a><span>On the FusionInsight Manager portal, choose <strong id="ALM-17003__b1135631511365">Cluster > </strong><em id="ALM-17003__i16359915203619">Name of the desired cluster</em> > <strong id="ALM-17003__b173871342181616">Services</strong>, and check whether the Yarn and MapReduce services are running properly.</span><p><ul class="subitemlist" id="ALM-17003__ul722277717821"><li id="ALM-17003__li6435709017821">If yes, go to <a href="#ALM-17003__li3980393617821">18</a>.</li><li id="ALM-17003__li4554177317821">If no, go to <a href="#ALM-17003__li2196836817821">12</a>.</li></ul>
|
|
</p></li><li id="ALM-17003__li2196836817821"><a name="ALM-17003__li2196836817821"></a><a name="li2196836817821"></a><span>Resolve the problem of Yarn and MapReduce based on the alarm help and check whether the Oozie alarm is cleared.</span><p><ul class="subitemlist" id="ALM-17003__ul2098382617821"><li id="ALM-17003__li4817410117821">If yes, no further action is required.</li><li id="ALM-17003__li978807617821">If no, go to <a href="#ALM-17003__li3980393617821">18</a>.</li></ul>
|
|
</p></li></ol>
|
|
<p class="tableheading" id="ALM-17003__p5463671217821"><strong id="ALM-17003__b43016917113020">Check the Oozie process.</strong></p>
|
|
<ol start="13" id="ALM-17003__ol34866439113026"><li id="ALM-17003__li3460735817821"><a name="ALM-17003__li3460735817821"></a><a name="li3460735817821"></a><span>Log in to each node of Oozie as user <strong id="ALM-17003__b6349758517821">root</strong>. <span id="ALM-17003__text445092182920"></span></span></li><li id="ALM-17003__li2876472117821"><span>Run the <strong id="ALM-17003__b4303076617821">ps -ef | grep oozie</strong> command to check whether the Oozie process exists.</span><p><ul class="subitemlist" id="ALM-17003__ul4793532217821"><li id="ALM-17003__li6294004017821">If yes, go to <a href="#ALM-17003__li1524116517821">15</a>.</li><li id="ALM-17003__li6497847017821">If no, go to <a href="#ALM-17003__li3980393617821">18</a>.</li></ul>
|
|
</p></li><li id="ALM-17003__li1524116517821"><a name="ALM-17003__li1524116517821"></a><a name="li1524116517821"></a><span>Collect fault information in <strong id="ALM-17003__b5755589817821">prestartDetail.log</strong>, <strong id="ALM-17003__b4824104017821">oozie.log</strong>, and <strong id="ALM-17003__b3151617917821">catalina.out</strong> in the Oozie log directory <strong id="ALM-17003__b1521015617821">/var/log/Bigdata/oozie</strong>. If the alarm is not caused by manual misoperation, go to <a href="#ALM-17003__li3722887217821">16</a>.</span></li></ol>
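<p>Note that <strong>ps -ef | grep oozie</strong> usually lists the <strong>grep</strong> command itself, which can be mistaken for a running Oozie process. A minimal Python sketch of filtering such output is shown below. It assumes the common eight-column <strong>ps -ef</strong> layout (UID, PID, PPID, C, STIME, TTY, TIME, CMD); the function name and the sample lines, including the path in them, are made up for illustration.</p>

```python
def matching_pids(ps_output, keyword):
    """Return PIDs from `ps -ef` output whose command mentions keyword."""
    pids = []
    for line in ps_output.splitlines()[1:]:  # skip the header row
        fields = line.split(None, 7)  # UID PID PPID C STIME TTY TIME CMD
        if len(fields) != 8:
            continue
        command = fields[7]
        # Skip the grep process that the pipeline itself starts.
        if keyword in command and not command.startswith("grep "):
            pids.append(int(fields[1]))
    return pids


# Made-up sample output for illustration only.
sample = """UID        PID  PPID  C STIME TTY          TIME CMD
omm       4321     1  2 10:02 ?        00:01:12 java -Dcatalina.base=/opt/oozie Bootstrap start
root      9876  9000  0 10:30 pts/0    00:00:00 grep oozie
"""
print(matching_pids(sample, "oozie"))
```

<p>An empty list means that no matching process was found on the node. The same filter applies to the Nodeagent check when the keyword is <strong>nodeagent</strong>.</p>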
|
|
<p class="tableheading" id="ALM-17003__p2406308417821"><strong id="ALM-17003__b717122017118">Check the Nodeagent process.</strong></p>
|
|
<ol start="16" id="ALM-17003__ol41541432171124"><li id="ALM-17003__li3722887217821"><a name="ALM-17003__li3722887217821"></a><a name="li3722887217821"></a><span>Log in to each node of Oozie as user <strong id="ALM-17003__b295276017821">root</strong>. Run the <strong id="ALM-17003__b2657484717821">ps -ef | grep nodeagent</strong> command to check whether the Nodeagent process exists.</span><p><ul class="subitemlist" id="ALM-17003__ul1159308117821"><li id="ALM-17003__li507898517821">If yes, go to <a href="#ALM-17003__li2866055917821">17</a>.</li><li id="ALM-17003__li874466017821">If no, go to <a href="#ALM-17003__li3980393617821">18</a>.</li></ul>
|
|
</p></li><li id="ALM-17003__li2866055917821"><a name="ALM-17003__li2866055917821"></a><a name="li2866055917821"></a><span>Run the <strong id="ALM-17003__b6662439217821">kill -9 </strong><em id="ALM-17003__i6274861917821">Process ID of nodeagent</em> command, wait 10 minutes, and check whether the alarm is cleared.</span><p><ul class="subitemlist" id="ALM-17003__ul4792374917821"><li id="ALM-17003__li2786666717821">If yes, no further action is required.</li><li id="ALM-17003__li4260756317821">If no, go to <a href="#ALM-17003__li3980393617821">18</a>.</li></ul>
|
|
</p></li><li id="ALM-17003__li3980393617821"><a name="ALM-17003__li3980393617821"></a><a name="li3980393617821"></a><span>Contact the <span id="ALM-17003__text10854858112113">O&M personnel</span> and send the collected logs.</span></li></ol>
|
|
</div>
|
|
<div class="section" id="ALM-17003__section1529716184534"><h4 class="sectiontitle">Alarm Clearing</h4><p id="ALM-17003__p4677152685316">After the fault is rectified, the system automatically clears this alarm.</p>
|
|
</div>
|
|
<div class="section" id="ALM-17003__s1c5d055ea5c74980a70253d5b8630699"><h4 class="sectiontitle">Related Information</h4><p id="ALM-17003__en-us_topic_0070543676_p62381196">None</p>
|
|
</div>
|
|
</div>
|
|
<div>
|
|
<div class="familylinks">
|
|
<div class="parentlink"><strong>Parent topic:</strong> <a href="mrs_01_1298.html">Alarm Reference (Applicable to MRS 3.x)</a></div>
|
|
</div>
|
|
</div>
|
|
|