Yang, Tong 6182f91ba8 MRS component operation guide_normal 2.0.38.SP20 version
Reviewed-by: Hasko, Vladimir <vladimir.hasko@t-systems.com>
Co-authored-by: Yang, Tong <yangtong2@huawei.com>
Co-committed-by: Yang, Tong <yangtong2@huawei.com>
2022-12-09 14:55:21 +00:00

<a name="mrs_01_1706"></a><a name="mrs_01_1706"></a>
<h1 class="topictitle1">Why Are There Two Standby NameNodes After the Active NameNode Is Restarted?</h1>
<div id="body1597735020372"><div class="section" id="mrs_01_1706__sf0597189331a41efbb030be1e23a08ae"><h4 class="sectiontitle">Question</h4><p id="mrs_01_1706__a3bd025bd788d419e81ad7b388b688c9c">Why are there two standby NameNodes after the active NameNode is restarted?</p>
<p id="mrs_01_1706__aee738fb892814a499fbd1665595fa9c8">When this problem occurs, check the ZooKeeper and ZKFC (ZooKeeper Failover Controller) logs. They show that the ZooKeeper server and its client (ZKFC) record different session IDs for the same connection: the ZooKeeper server logs session ID <strong id="mrs_01_1706__b8404103343117">0x164cb2b3e4b36ae4</strong>, whereas ZKFC logs session ID <strong id="mrs_01_1706__b17932439123113">0x144cb2b3e4b36ae4</strong>. This mismatch indicates that data interaction between the ZooKeeper server and ZKFC has failed.</p>
<p id="mrs_01_1706__a220a79e8a2544003a818e7901f9d83b2">The ZooKeeper log contains the following entries:</p>
<pre class="screen" id="mrs_01_1706__sf3592e3ab65c4c33b60a1e53f6ba993b">2015-04-15 21:24:54,257 | INFO | CommitProcessor:22 | Established session 0x164cb2b3e4b36ae4 with negotiated timeout 45000 for client /192.168.0.117:44586 | org.apache.zookeeper.server.ZooKeeperServer.finishSessionInit(ZooKeeperServer.java:623)
2015-04-15 21:24:54,261 | INFO | NIOServerCxn.Factory:192-168-0-114/192.168.0.114:2181 | Successfully authenticated client: authenticationID=hdfs/hadoop@<em id="mrs_01_1706__i1120962517100">&lt;System domain name&gt;</em>; authorizationID=hdfs/hadoop@<em id="mrs_01_1706__i18992104512437">&lt;System domain name&gt;</em>. | org.apache.zookeeper.server.auth.SaslServerCallbackHandler.handleAuthorizeCallback(SaslServerCallbackHandler.java:118)
2015-04-15 21:24:54,261 | INFO | NIOServerCxn.Factory:192-168-0-114/192.168.0.114:2181 | Setting authorizedID: hdfs/hadoop@<em id="mrs_01_1706__i1418865813432">&lt;System domain name&gt;</em> | org.apache.zookeeper.server.auth.SaslServerCallbackHandler.handleAuthorizeCallback(SaslServerCallbackHandler.java:134)
2015-04-15 21:24:54,261 | INFO | NIOServerCxn.Factory:192-168-0-114/192.168.0.114:2181 | adding SASL authorization for authorizationID: hdfs/hadoop@<em id="mrs_01_1706__i1218687441">&lt;System domain name&gt;</em> | org.apache.zookeeper.server.ZooKeeperServer.processSasl(ZooKeeperServer.java:1009)
2015-04-15 21:24:54,262 | INFO | ProcessThread(sid:22 cport:-1): | Got user-level KeeperException when processing <strong id="mrs_01_1706__a26a23c1cc62a42bab847d852c1f40f88">sessionid:0x164cb2b3e4b36ae4</strong> type:create cxid:0x3 zxid:0x20009fafc txntype:-1 reqpath:n/a Error Path:/hadoop-ha/hacluster/ActiveStandbyElectorLock Error:KeeperErrorCode = NodeExists for /hadoop-ha/hacluster/ActiveStandbyElectorLock | org.apache.zookeeper.server.PrepRequestProcessor.pRequest(PrepRequestProcessor.java:648)</pre>
<p id="mrs_01_1706__a9b55631b1a3b409fad980871af937250">The ZKFC log contains the following entries:</p>
<pre class="screen" id="mrs_01_1706__s7f88de17a54e4fc1a7a093f4d6bff603">2015-04-15 21:24:54,237 | INFO | main-SendThread(192-168-0-114:2181) | Socket connection established to 192-168-0-114/192.168.0.114:2181, initiating session | org.apache.zookeeper.ClientCnxn$SendThread.primeConnection(ClientCnxn.java:854)
2015-04-15 21:24:54,257 | INFO | main-SendThread(192-168-0-114:2181) | Session establishment complete on server 192-168-0-114/192.168.0.114:2181, <strong id="mrs_01_1706__ab4772012f9a347288e70df0408c2885e">sessionid = 0x144cb2b3e4b36ae4</strong> , negotiated timeout = 45000 | org.apache.zookeeper.ClientCnxn$SendThread.onConnected(ClientCnxn.java:1259)
2015-04-15 21:24:54,260 | INFO | main-EventThread | EventThread shut down | org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:512)
2015-04-15 21:24:54,262 | INFO | main-EventThread | Session connected. | org.apache.hadoop.ha.ActiveStandbyElector.processWatchEvent(ActiveStandbyElector.java:547)
2015-04-15 21:24:54,264 | INFO | main-EventThread | Successfully authenticated to ZooKeeper using SASL. | org.apache.hadoop.ha.ActiveStandbyElector.processWatchEvent(ActiveStandbyElector.java:573)</pre>
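<p>The comparison described above can also be done with a short script. The following is a minimal sketch, not a supported tool: the function name <strong>session_ids</strong> and the regular expression are assumptions made for illustration, and the two sample strings are shortened copies of the log entries shown above.</p>

```python
import re

# Matches "session 0x...", "sessionid:0x..." and "sessionid = 0x...",
# the three spellings that appear in the log excerpts above.
SESSION_RE = re.compile(r"session(?:id)?\s*[:=]?\s*(0x[0-9a-fA-F]+)")


def session_ids(log_text):
    """Return the set of session IDs mentioned in a log excerpt (hypothetical helper)."""
    return set(SESSION_RE.findall(log_text))


# Shortened copies of the entries from the ZooKeeper and ZKFC logs.
zookeeper_log = (
    "Established session 0x164cb2b3e4b36ae4 with negotiated timeout 45000 "
    "for client /192.168.0.117:44586"
)
zkfc_log = (
    "Session establishment complete on server 192-168-0-114/192.168.0.114:2181, "
    "sessionid = 0x144cb2b3e4b36ae4 , negotiated timeout = 45000"
)

server_ids = session_ids(zookeeper_log)
client_ids = session_ids(zkfc_log)

# True when the server and the client share no session ID at all.
mismatch = server_ids.isdisjoint(client_ids)
print("server:", sorted(server_ids))
print("client:", sorted(client_ids))
print("mismatch:", mismatch)
```

<p>In practice, you would read the two log files instead of using inline strings and restrict the comparison to entries with the same timestamp and client address.</p>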
</div>
<div class="section" id="mrs_01_1706__sa6a84f4f6d3f4094970ae90722c19fcb"><h4 class="sectiontitle">Answer</h4><ul id="mrs_01_1706__uf5ff95d3f75b498293ce1d2063f450b3"><li id="mrs_01_1706__l18aff339ceb141bba26f97f39ce687b2">Cause Analysis<p id="mrs_01_1706__ade17620fc29a4054a5ec2de0a2ef896e"><a name="mrs_01_1706__l18aff339ceb141bba26f97f39ce687b2"></a><a name="l18aff339ceb141bba26f97f39ce687b2"></a>After the active NameNode restarts, the ephemeral node <strong id="mrs_01_1706__b161116914222847">/hadoop-ha/hacluster/ActiveStandbyElectorLock</strong> that it created on ZooKeeper is deleted. When the standby NameNode is notified that the <strong id="mrs_01_1706__b131930505022847">/hadoop-ha/hacluster/ActiveStandbyElectorLock</strong> node has been deleted, it creates the <strong id="mrs_01_1706__b15763490922847">/hadoop-ha/hacluster/ActiveStandbyElectorLock</strong> node on ZooKeeper in order to become the active NameNode. However, when the standby NameNode connects to ZooKeeper through its client ZKFC, the session ID held by ZKFC differs from the one recorded by ZooKeeper because of network issues, CPU overload, or cluster overload. As a result, the watcher of the standby NameNode does not detect that the ephemeral node has been created, so the standby NameNode is not promoted to active. When the original active NameNode finishes restarting, it detects that the <strong id="mrs_01_1706__b74246944422847">/hadoop-ha/hacluster/ActiveStandbyElectorLock</strong> node already exists and becomes a standby NameNode. Therefore, both NameNodes end up in the standby state.</p>
</li></ul>
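<p>The sequence in the cause analysis can be replayed with a toy model. The sketch below is a deliberate simplification and does not use the ZooKeeper or Hadoop APIs: the <strong>LockNode</strong> class, the <strong>elect</strong> function, and the session IDs <strong>0xaaa</strong> and <strong>0xbbb</strong> are hypothetical names introduced only for illustration.</p>

```python
class LockNode:
    """Toy stand-in for the ephemeral ActiveStandbyElectorLock node (not the ZooKeeper API)."""

    def __init__(self):
        self.owner_session = None

    def create(self, session_id):
        """Create the node for a session; return False if it already exists."""
        if self.owner_session is not None:
            return False
        self.owner_session = session_id
        return True

    def delete(self):
        """Remove the node, as happens when its owner's session ends."""
        self.owner_session = None


def elect(lock, server_session, client_session):
    """Return the state a NameNode ends up in after trying to take the lock.

    server_session is the ID recorded on the server side; client_session is
    the ID that ZKFC believes it holds. A mismatch models the failed interaction.
    """
    created = lock.create(server_session)
    # In this model, ZKFC promotes its NameNode only if the node was created
    # and is owned by the session ID that ZKFC itself knows about.
    if created and lock.owner_session == client_session:
        return "active"
    return "standby"


lock = LockNode()
lock.create("0xaaa")  # hypothetical ID: the original active NameNode holds the lock
lock.delete()         # the active NameNode restarts, so its ephemeral node is deleted

# Former standby: the node is created, but the two session IDs disagree.
former_standby = elect(
    lock,
    server_session="0x164cb2b3e4b36ae4",
    client_session="0x144cb2b3e4b36ae4",
)
# Restarted NameNode: the node already exists, so it cannot take the lock.
restarted = elect(lock, server_session="0xbbb", client_session="0xbbb")

print(former_standby, restarted)
```

<p>With consistent session IDs for the former standby, the same model promotes it to active, which is the expected failover behavior.</p>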
<ul id="mrs_01_1706__u4e53d2688e0842c6b7528693c96c90db"><li id="mrs_01_1706__l9a8570ed789b4616b713c4148c43b216">Solution<p id="mrs_01_1706__ab082c9143ca840428ecba7520922dfb5"><a name="mrs_01_1706__l9a8570ed789b4616b713c4148c43b216"></a><a name="l9a8570ed789b4616b713c4148c43b216"></a>You are advised to restart both ZKFC instances of HDFS on FusionInsight Manager.</p>
</li></ul>
</div>
</div>
<div>
<div class="familylinks">
<div class="parentlink"><strong>Parent topic:</strong> <a href="mrs_01_1690.html">FAQ</a></div>
</div>
</div>