Reviewed-by: Hasko, Vladimir <vladimir.hasko@t-systems.com> Co-authored-by: Yang, Tong <yangtong2@huawei.com> Co-committed-by: Yang, Tong <yangtong2@huawei.com>
77 lines
8.2 KiB
HTML
<a name="mrs_01_2108"></a><a name="mrs_01_2108"></a>
<h1 class="topictitle1">Why Do ZooKeeper Servers Fail to Start After Many znodes Are Created?</h1>
<div id="body1595905686225"><div class="section" id="mrs_01_2108__s29697258ba1d4e448b0dd531146172d6"><h4 class="sectiontitle">Question</h4><p id="mrs_01_2108__ae9d73d94d8b3419a83ced5b6553af7a5">After a large number of znodes are created, the ZooKeeper servers in the cluster become faulty. They cannot recover automatically, and restarting them fails.</p>
<p id="mrs_01_2108__ae64b3c88e012489f949cc0998938c27d">Logs of the followers:</p>
</div>
<pre class="screen" id="mrs_01_2108__s14e5592ed5d744cea0785b48dee6550b">2016-06-23 08:00:18,763 | WARN | QuorumPeer[myid=26](plain=/10.16.9.138:2181)(secure=disabled) | Exception when following the leader | org.apache.zookeeper.server.quorum.Follower.followLeader(Follower.java:93)
java.net.SocketTimeoutException: Read timed out
at java.net.SocketInputStream.socketRead0(Native Method)
at java.net.SocketInputStream.socketRead(SocketInputStream.java:116)
at java.net.SocketInputStream.read(SocketInputStream.java:170)
at java.net.SocketInputStream.read(SocketInputStream.java:141)
at java.io.BufferedInputStream.fill(BufferedInputStream.java:246)
at java.io.BufferedInputStream.read(BufferedInputStream.java:265)
at java.io.DataInputStream.readInt(DataInputStream.java:387)
at org.apache.jute.BinaryInputArchive.readInt(BinaryInputArchive.java:63)
at org.apache.zookeeper.server.quorum.QuorumPacket.deserialize(QuorumPacket.java:83)
at org.apache.jute.BinaryInputArchive.readRecord(BinaryInputArchive.java:99)
at org.apache.zookeeper.server.quorum.Learner.readPacket(Learner.java:156)
at org.apache.zookeeper.server.quorum.Learner.registerWithLeader(Learner.java:276)
at org.apache.zookeeper.server.quorum.Follower.followLeader(Follower.java:75)
at org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:1094)</pre>
<pre class="screen" id="mrs_01_2108__sec1a2f9bc92a4d28a6cff4c933adb79a">2016-06-23 08:00:18,764 | INFO | QuorumPeer[myid=26](plain=/10.16.9.138:2181)(secure=disabled) | shutdown called | org.apache.zookeeper.server.quorum.Follower.shutdown(Follower.java:198)
java.lang.Exception: shutdown Follower
at org.apache.zookeeper.server.quorum.Follower.shutdown(Follower.java:198)
at org.apache.zookeeper.server.quorum.QuorumPeer.stopFollower(QuorumPeer.java:1141)
at org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:1098)</pre>
<p id="mrs_01_2108__aa592663894d0402e90ca9c5b86dc356f">Logs of the leader:</p>
<pre class="screen" id="mrs_01_2108__sb97709de30d643aa92bc36eaa5749b66">2016-06-23 07:30:57,481 | WARN | QuorumPeer[myid=25](plain=/10.16.9.136:2181)(secure=disabled) | Unexpected exception | org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:1108)
java.lang.InterruptedException: Timeout while waiting for epoch to be acked by quorum
at org.apache.zookeeper.server.quorum.Leader.waitForEpochAck(Leader.java:1221)
at org.apache.zookeeper.server.quorum.Leader.lead(Leader.java:487)
at org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:1105)</pre>
<pre class="screen" id="mrs_01_2108__sbffdb5bebb104a58ba98949dfdd56242">2016-06-23 07:30:57,482 | INFO | QuorumPeer[myid=25](plain=/10.16.9.136:2181)(secure=disabled) | Shutdown called | org.apache.zookeeper.server.quorum.Leader.shutdown(Leader.java:623)
java.lang.Exception: shutdown Leader! reason: Forcing shutdown
at org.apache.zookeeper.server.quorum.Leader.shutdown(Leader.java:623)
at org.apache.zookeeper.server.quorum.QuorumPeer.stopLeader(QuorumPeer.java:1149)
at org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:1110)</pre>
<div class="section" id="mrs_01_2108__s9baaaacbe1ce4b91834d7fd99f6f3d4c"><h4 class="sectiontitle">Answer</h4><p id="mrs_01_2108__a516fcc2fcb9b4a3b9f6bca784444f3b2">After a large number of znodes are created, a large volume of data must be synchronized between the followers and the leader when the servers start. If the synchronization does not complete within the time limits defined by <b>initLimit</b> and <b>syncLimit</b>, the followers time out while registering with the leader (<b>SocketTimeoutException: Read timed out</b> in the follower logs), the leader times out waiting for a quorum of followers to acknowledge the new epoch (<b>Timeout while waiting for epoch to be acked by quorum</b> in the leader logs), and all ZooKeeper servers fail to start.</p>
<p id="mrs_01_2108__a10796c6b99674320b7e7a58ee1502475">To recover the ZooKeeper servers, go to the <strong id="mrs_01_2108__b796135295017">All Configurations</strong> page of the ZooKeeper service by referring to <a href="mrs_01_2125.html">Modifying Cluster Service Configuration Parameters</a>, and increase the values of <span class="parmname" id="mrs_01_2108__pda716e22c91f45028acf7cd3a6c49a6f"><b>syncLimit</b></span> and <span class="parmname" id="mrs_01_2108__p0b1814cec07845b8aae8f8011782484f"><b>initLimit</b></span> in the ZooKeeper configuration file <span class="filepath" id="mrs_01_2108__f1a943d07a05443b9ac63015c0cdec04a"><b>zoo.cfg</b></span> until the ZooKeeper servers start successfully.</p>
<div class="tablenoborder"><table cellpadding="4" cellspacing="0" summary="" id="mrs_01_2108__td08917816a554d809d3d9ca0e61fa277" frame="border" border="1" rules="all"><caption><b>Table 1 </b>Parameters</caption><thead align="left"><tr id="mrs_01_2108__r76d8e385e97c46bcb09e0b4c2d75027d"><th align="left" class="cellrowborder" valign="top" width="19.12%" id="mcps1.3.7.4.2.4.1.1"><p id="mrs_01_2108__acc54b3bb4c2f4709b03470735fb69ab1">Parameter</p>
</th>
<th align="left" class="cellrowborder" valign="top" width="65.91%" id="mcps1.3.7.4.2.4.1.2"><p id="mrs_01_2108__a2739f360c4884a70b02dec736587c4a4">Description</p>
</th>
<th align="left" class="cellrowborder" valign="top" width="14.97%" id="mcps1.3.7.4.2.4.1.3"><p id="mrs_01_2108__ac423d5a63da1469383c2a730a46e36bd">Default Value</p>
</th>
</tr>
</thead>
<tbody><tr id="mrs_01_2108__ra7688eee994344459b90d3db51134694"><td class="cellrowborder" valign="top" width="19.12%" headers="mcps1.3.7.4.2.4.1.1 "><p id="mrs_01_2108__a061a71dd3dd34733aa4148fa94b7738f">syncLimit</p>
</td>
<td class="cellrowborder" valign="top" width="65.91%" headers="mcps1.3.7.4.2.4.1.2 "><p id="mrs_01_2108__a55cc9de4015245cb88b3c52f48ced905">Maximum time (unit: tick) allowed for communication between a follower and the leader after the follower has completed its initial synchronization. If the follower and the leader do not exchange messages within this time, for example because the follower has fallen too far behind the leader, the connection between them is dropped.</p>
</td>
<td class="cellrowborder" valign="top" width="14.97%" headers="mcps1.3.7.4.2.4.1.3 "><p id="mrs_01_2108__ae091b2e8c91d4a1facffe3d821625954">15</p>
</td>
</tr>
<tr id="mrs_01_2108__r9a0d5b8c46c542f3a65d884908d9fbe1"><td class="cellrowborder" valign="top" width="19.12%" headers="mcps1.3.7.4.2.4.1.1 "><p id="mrs_01_2108__a01a87c013403446792c23d1cbe328f61">initLimit</p>
</td>
<td class="cellrowborder" valign="top" width="65.91%" headers="mcps1.3.7.4.2.4.1.2 "><p id="mrs_01_2108__af13929294085467cb3031af0bf47438e">Maximum time (unit: tick) within which a follower must connect to the leader and complete its initial data synchronization. Increase this value when ZooKeeper stores a large amount of data.</p>
</td>
<td class="cellrowborder" valign="top" width="14.97%" headers="mcps1.3.7.4.2.4.1.3 "><p id="mrs_01_2108__ae5bf0817cdf9421fb360264d7f14ec2b">15</p>
</td>
</tr>
</tbody>
</table>
</div>
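<p>Because both limits are counted in ticks, the actual time budget depends on the tick duration (<b>tickTime</b>). The following Python sketch is illustrative only and is not part of MRS or ZooKeeper: it parses a hypothetical <b>zoo.cfg</b> excerpt, in which the 15-tick limits are the defaults from Table 1 and <b>tickTime=2000</b> is an assumed value matching the 2000 ms example later in this topic, and converts each limit to milliseconds. The helper names <b>parse_zoo_cfg</b> and <b>time_budgets_ms</b> are hypothetical.</p>

```python
# Illustrative sketch only: converts tick-based ZooKeeper limits into wall-clock time.
# Assumption: the zoo.cfg excerpt below is hypothetical; 15 ticks are the defaults in
# Table 1, and tickTime=2000 (ms) is an assumed value.

SAMPLE_ZOO_CFG = """
# Hypothetical zoo.cfg excerpt
tickTime=2000
initLimit=15
syncLimit=15
"""


def parse_zoo_cfg(text: str) -> dict:
    """Parse simple key=value lines, skipping blank lines and # comments."""
    settings = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition("=")
        if sep:  # ignore lines that contain no '='
            settings[key.strip()] = value.strip()
    return settings


def time_budgets_ms(settings: dict) -> dict:
    """Multiply each limit (in ticks) by tickTime (in ms)."""
    tick_ms = int(settings["tickTime"])
    return {
        # Time a follower has to connect to the leader and finish its initial sync.
        "initLimit_ms": int(settings["initLimit"]) * tick_ms,
        # Time allowed between follower and leader messages once the follower is synced.
        "syncLimit_ms": int(settings["syncLimit"]) * tick_ms,
    }


if __name__ == "__main__":
    budgets = time_budgets_ms(parse_zoo_cfg(SAMPLE_ZOO_CFG))
    for name, ms in budgets.items():
        print(f"{name}: {ms} ms = {ms / 1000:g} s")
```

<p>With these assumed values, each limit allows 30 s (15 x 2000 ms). A follower that needs longer than that to receive and load the leader's data times out, which matches the situation shown in the logs above.</p>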
<p id="mrs_01_2108__adec73f5d688b4510a2900c568f173f4b">If the ZooKeeper servers still do not recover after <span class="parmname" id="mrs_01_2108__pc74e934e12a1440199b033bd624ab663"><b>initLimit</b></span> and <span class="parmname" id="mrs_01_2108__pa134119071594aa8bb31099ae758293f"><b>syncLimit</b></span> are set to <span class="parmvalue" id="mrs_01_2108__p65acdc1772cf4648a1701e0ecc1d7a0e"><b>300</b></span> ticks, check whether another application is killing the ZooKeeper process. For reference, with a value of <span class="parmvalue" id="mrs_01_2108__p97e39734e4d54ba9a6324502131926b4"><b>300</b></span> and a tick duration (<b>tickTime</b>) of 2000 ms, the maximum synchronization duration is 600 s (300 x 2000 ms).</p>
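<p>The arithmetic in the paragraph above can also be run in reverse to estimate how many ticks a given synchronization window needs. This Python sketch uses hypothetical helpers that are not an MRS or ZooKeeper tool: the required duration is an estimate you supply yourself (for example, how long a follower took to synchronize), and the 450 s figure in the usage lines is an invented example.</p>

```python
# Hypothetical helpers for tick arithmetic; they do not read any MRS or ZooKeeper state.
import math


def max_sync_duration_s(limit_ticks: int, tick_ms: int) -> float:
    """Longest duration, in seconds, that a tick-based limit allows."""
    return limit_ticks * tick_ms / 1000


def ticks_needed(required_s: float, tick_ms: int) -> int:
    """Smallest tick count whose duration covers required_s seconds."""
    return math.ceil(required_s * 1000 / tick_ms)


if __name__ == "__main__":
    print(max_sync_duration_s(300, 2000))  # 600.0, the example in this topic
    print(max_sync_duration_s(15, 2000))   # 30.0, the Table 1 default at a 2000 ms tick
    print(ticks_needed(450, 2000))         # 225, for a hypothetical 450 s synchronization
```

<p>Rounding up with <b>math.ceil</b> ensures that the resulting limit is never shorter than the estimated synchronization time.</p>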
<p id="mrs_01_2108__a09b8359b02cf466fa446448d1d928527">If a very large amount of data has been created in ZooKeeper, synchronizing the data between the followers and the leader and writing it to disk can take a long time, so ZooKeeper may need to run for a long time before it becomes available. Ensure that no monitoring application kills the ZooKeeper process during this period.</p>
</div>
</div>
<div>
<div class="familylinks">
<div class="parentlink"><strong>Parent topic:</strong> <a href="mrs_01_2107.html">Common Issues About ZooKeeper</a></div>
</div>
</div>