Yang, Tong 6182f91ba8 MRS component operation guide_normal 2.0.38.SP20 version
Reviewed-by: Hasko, Vladimir <vladimir.hasko@t-systems.com>
Co-authored-by: Yang, Tong <yangtong2@huawei.com>
Co-committed-by: Yang, Tong <yangtong2@huawei.com>
2022-12-09 14:55:21 +00:00

<a name="mrs_01_1709"></a>
<h1 class="topictitle1">NameNode Fails to Be Restarted Due to EditLog Discontinuity</h1>
<div id="body1597735021617"><div class="section" id="mrs_01_1709__s3bd3ea1bdb6d481eba572bcfcb5b5bb7"><h4 class="sectiontitle">Question</h4><p id="mrs_01_1709__a962e8929c71c4a32a17b2878c161e41e">If a JournalNode server is powered off, its data directory disk is full, or the network is abnormal, the EditLog sequence numbers on that JournalNode may become inconsecutive. In this case, the NameNode may fail to restart.</p>
</div>
<div class="section" id="mrs_01_1709__s2e5a938eb53d403aaae03bc49dfdeb38"><h4 class="sectiontitle">Symptom</h4><p id="mrs_01_1709__p1338843943916">The NameNode fails to be restarted. The following error information is reported in the NameNode run logs:</p>
<p id="mrs_01_1709__p8388133916396"><span><img id="mrs_01_1709__image7264116133516" src="en-us_image_0000001349289573.png"></span></p>
</div>
<div class="section" id="mrs_01_1709__se41ab46cba2b41ffbfa1a61ca49bf38b"><h4 class="sectiontitle">Solution</h4><ol id="mrs_01_1709__ol14587174955118"><li id="mrs_01_1709__li115871749135119">Identify the NameNode that was active before the restart and go to its data directory. The directory, for example <strong id="mrs_01_1709__b26015016764027">/srv/BigData/namenode/current</strong>, is specified by the configuration item <strong id="mrs_01_1709__b158562506264027">dfs.namenode.name.dir</strong>. Obtain the sequence number of the latest FsImage file, as shown in the following figure:<p id="mrs_01_1709__p1158714915516"><span><img id="mrs_01_1709__image5587194935115" src="en-us_image_0000001348770293.png"></span></p>
</li><li id="mrs_01_1709__li45872494513">Check the data directory of each JournalNode. The directory, for example <strong id="mrs_01_1709__b130070628464027">/srv/BigData/journalnode/hacluster/current</strong>, is specified by the configuration item <strong id="mrs_01_1709__b159614931764027">dfs.journalnode.edits.dir</strong>. Starting from the sequence number obtained in step 1, check whether the edits files are consecutive, that is, whether the first sequence number of each edits file immediately follows the last sequence number of the previous edits file. In the following figure, edits_0000000000013259231-0000000000013259237 and edits_0000000000013259239-0000000000013259246 are not consecutive because sequence number 13259238 is missing.<p id="mrs_01_1709__p158714955112"><span><img id="mrs_01_1709__image9587124913511" src="en-us_image_0000001295930432.png"></span></p>
</li><li id="mrs_01_1709__li14587154985114">If the edits files are not consecutive, check whether edits files covering the missing sequence numbers exist in the data directories of the other JournalNodes or of the NameNode. If they do, copy the consecutive segment that fills the gap to the affected JournalNode.</li><li id="mrs_01_1709__li1958817497519">Repeat step 3 until all inconsecutive edits files are restored.</li><li id="mrs_01_1709__li20588144925112">Restart the NameNode and check whether the restart is successful. If the fault persists, contact technical support.</li></ol>
</div>
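<div class="section"><p>The checks in steps 1 and 2 can also be scripted. The following Python sketch is illustrative only and is not part of the product. It assumes that FsImage files are named fsimage_N and finalized edits files are named edits_FIRST-LAST, as in the examples above, and that the edits needed after an FsImage start at its sequence number plus one. The function names <strong>latest_fsimage_txid</strong>, <strong>missing_ranges</strong>, and <strong>check</strong> are hypothetical helpers introduced for this example.</p>

```python
import os
import re

# File-name patterns assumed from the examples in this topic.
# The "$" anchor excludes companion files such as fsimage_N.md5,
# and the digits-only groups exclude edits_inprogress_N files.
FSIMAGE_RE = re.compile(r"^fsimage_(\d+)$")
EDITS_RE = re.compile(r"^edits_(\d+)-(\d+)$")


def latest_fsimage_txid(names):
    """Return the highest sequence number among fsimage_N names, or None."""
    txids = [int(m.group(1)) for m in map(FSIMAGE_RE.match, names) if m]
    return max(txids) if txids else None


def missing_ranges(names, start_txid):
    """Return (first_missing, last_missing) pairs for gaps between
    finalized edits files, beginning right after start_txid."""
    segments = sorted(
        (int(m.group(1)), int(m.group(2)))
        for m in map(EDITS_RE.match, names) if m
    )
    gaps = []
    expected = start_txid + 1  # assumption: first sequence number still needed
    for first, last in segments:
        if expected > last:
            continue  # segment is entirely covered by the FsImage
        if first > expected:
            gaps.append((expected, first - 1))
        expected = max(expected, last + 1)
    return gaps


def check(namenode_dir, journalnode_dir):
    """List both directories and report gaps in the JournalNode edits files."""
    start = latest_fsimage_txid(os.listdir(namenode_dir))
    if start is None:
        raise FileNotFoundError("no fsimage_N file in " + namenode_dir)
    return missing_ranges(os.listdir(journalnode_dir), start)
```

<p>For example, calling <strong>check</strong> with the NameNode and JournalNode data directories from steps 1 and 2 returns a list of missing sequence number ranges. An empty list means that the finalized edits files are consecutive from the latest FsImage onward. The sketch only reads file names and does not copy or modify any files.</p></div>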
</div>
<div>
<div class="familylinks">
<div class="parentlink"><strong>Parent topic:</strong> <a href="mrs_01_1690.html">FAQ</a></div>
</div>
</div>