Yang, Tong 6182f91ba8 MRS component operation guide_normal 2.0.38.SP20 version
Reviewed-by: Hasko, Vladimir <vladimir.hasko@t-systems.com>
Co-authored-by: Yang, Tong <yangtong2@huawei.com>
Co-committed-by: Yang, Tong <yangtong2@huawei.com>
2022-12-09 14:55:21 +00:00


<a name="mrs_01_1655"></a><a name="mrs_01_1655"></a>
<h1 class="topictitle1">Why Are Messages Containing FileNotFoundException and no lease Frequently Displayed in the HMaster Logs During the WAL Splitting Process?</h1>
<div id="body1596003895088"><div class="section" id="mrs_01_1655__s0eac6650fac740c4867316cdffdaaad1"><h4 class="sectiontitle">Question</h4><p id="mrs_01_1655__aaa2c6d54135f4f11919f267c7a4e2aca">Why are messages containing FileNotFoundException and no lease frequently displayed in the HMaster logs during the WAL splitting process? A typical log entry is as follows:</p>
<pre class="screen" id="mrs_01_1655__s00c7b4376458444fa4d066d435dd0712">2017-06-10 09:50:27,586 | ERROR | split-log-closeStream-2 | Couldn't close log at hdfs://hacluster/hbase/data/default/largeT1/2b48346d087275fe751fc049334fda93/recovered.edits/0000000000000000000.temp | org.apache.hadoop.hbase.wal.WALSplitter$LogRecoveredEditsOutputSink$2.call(WALSplitter.java:1330)
java.io.FileNotFoundException: No lease on /hbase/data/default/largeT1/2b48346d087275fe751fc049334fda93/recovered.edits/0000000000000000000.temp (inode 1092653): File does not exist. [Lease. Holder: DFSClient_NONMAPREDUCE_1202985678_1, pendingcreates: 1936]
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkLease(FSNamesystem.java:3432)
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.analyzeFileState(FSNamesystem.java:3223)
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getNewBlockTargets(FSNamesystem.java:3057)
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:3011)
    at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.addBlock(NameNodeRpcServer.java:842)
    at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.addBlock(ClientNamenodeProtocolServerSideTranslatorPB.java:526)
    at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
    at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:616)
    at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:973)
    at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2260)
    at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2256)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:422)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1769)
    at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2254)
    at sun.reflect.GeneratedConstructorAccessor40.newInstance(Unknown Source)
    at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
    at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
    at org.apache.hadoop.ipc.RemoteException.instantiateException(RemoteException.java:106)
    at org.apache.hadoop.ipc.RemoteException.unwrapRemoteException(RemoteException.java:73)
    at org.apache.hadoop.hdfs.DataStreamer.locateFollowingBlock(DataStreamer.java:1842)
    at org.apache.hadoop.hdfs.DataStreamer.nextBlockOutputStream(DataStreamer.java:1639)
    at org.apache.hadoop.hdfs.DataStreamer.run(DataStreamer.java:665)</pre>
</div>
<div class="section" id="mrs_01_1655__s96dd62f408954e2b80a1c41bedfca029"><h4 class="sectiontitle">Answer</h4><p id="mrs_01_1655__a2146f74a3b2543c7ab7cb5d4d7eddd95">During the WAL splitting process, the timeout period of a WAL splitting task is specified by the <span class="parmname" id="mrs_01_1655__parmname158630236055043"><b>hbase.splitlog.manager.timeout</b></span> parameter. If a task fails to complete within the timeout period, it is submitted again, so multiple WAL splitting tasks may be submitted within a given period. If the <strong id="mrs_01_1655__b14014540555043">temp</strong> file is deleted when one of these tasks completes, the other tasks can no longer find the file and report FileNotFoundException. To avoid this problem, adjust the timeout period as follows:</p>
<p id="mrs_01_1655__af33f13037e824319864d436055bdb852">The default value of <span class="parmname" id="mrs_01_1655__parmname154030586055043"><b>hbase.splitlog.manager.timeout</b></span> is 600,000 ms. This default is based on a cluster specification in which each RegionServer has 2,000 to 3,000 regions. When the cluster is running properly (HBase is normal and HDFS is not handling a large number of read and write operations), you are advised to adjust this parameter based on the cluster specifications. If the actual specification (the actual average number of regions on each RegionServer) is greater than the default specification (the default average number of regions on each RegionServer, that is, 2,000), set the parameter to (actual specification/default specification) x default value. For example, if each RegionServer has 4,000 regions on average, the value is (4,000/2,000) x 600,000 ms = 1,200,000 ms.</p>
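<p>The scaling rule above can be sketched as a small calculation. This is a minimal illustration only: the function name <b>suggested_splitlog_timeout_ms</b> is hypothetical, and keeping the result at or above the default value is an assumption based on the text describing only clusters that exceed the default specification.</p>

```python
# Values taken from the text above.
DEFAULT_TIMEOUT_MS = 600_000     # default hbase.splitlog.manager.timeout
DEFAULT_REGIONS_PER_RS = 2_000   # default average number of regions per RegionServer


def suggested_splitlog_timeout_ms(actual_regions_per_rs):
    """Return (actual specification / default specification) x default value.

    Assumption: the result is never lowered below the default, because the
    text only describes increasing the timeout for larger clusters.
    """
    scaled = DEFAULT_TIMEOUT_MS * actual_regions_per_rs / DEFAULT_REGIONS_PER_RS
    return max(DEFAULT_TIMEOUT_MS, round(scaled))


# Example: a cluster averaging 4,000 regions per RegionServer.
print(suggested_splitlog_timeout_ms(4_000))  # prints 1200000
```

<p>The printed value is in milliseconds, the same unit as the parameter itself.</p>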
<p id="mrs_01_1655__ab2ffcdb6b89445b1ace73cf7005a3389">Set the <strong id="mrs_01_1655__b70275479355043">splitlog</strong> parameter in the <span class="filepath" id="mrs_01_1655__filepath109380410355043"><b>hbase-site.xml</b></span> file on the server. <a href="#mrs_01_1655__td061a2527dd94860b0b6d9989d7fd9ee">Table 1</a> describes the parameter.</p>
<div class="tablenoborder"><a name="mrs_01_1655__td061a2527dd94860b0b6d9989d7fd9ee"></a><a name="td061a2527dd94860b0b6d9989d7fd9ee"></a><table cellpadding="4" cellspacing="0" summary="" id="mrs_01_1655__td061a2527dd94860b0b6d9989d7fd9ee" frame="border" border="1" rules="all"><caption><b>Table 1 </b>Description of the <strong id="mrs_01_1655__b82326037355043">splitlog</strong> parameter</caption><thead align="left"><tr id="mrs_01_1655__rb73a1c19817848cc8d637a2cad0e2df8"><th align="left" class="cellrowborder" valign="top" width="18.86%" id="mcps1.3.2.5.2.4.1.1"><p id="mrs_01_1655__ad1021f74e68044aab92d5711aa80b143"><strong id="mrs_01_1655__aacdf4422109140f5a0c4b781fdbe8eb0">Parameter</strong></p>
</th>
<th align="left" class="cellrowborder" valign="top" width="57.21000000000001%" id="mcps1.3.2.5.2.4.1.2"><p id="mrs_01_1655__a87557042b611415e8d1e4b233bb595c0"><strong id="mrs_01_1655__a74b792dbcbf841db80a5efcc575694eb">Description</strong></p>
</th>
<th align="left" class="cellrowborder" valign="top" width="23.93%" id="mcps1.3.2.5.2.4.1.3"><p id="mrs_01_1655__a8d32a4a805984a35b0aff71ed22639d7"><strong id="mrs_01_1655__abd65f601d52b4035849c9e92de255e7b">Default Value</strong></p>
</th>
</tr>
</thead>
<tbody><tr id="mrs_01_1655__r55a86f803f21406f823979aa6d30c948"><td class="cellrowborder" valign="top" width="18.86%" headers="mcps1.3.2.5.2.4.1.1 "><p id="mrs_01_1655__ae8da6648cab74696a1aeec64204d5170">hbase.splitlog.manager.timeout</p>
</td>
<td class="cellrowborder" valign="top" width="57.21000000000001%" headers="mcps1.3.2.5.2.4.1.2 "><p id="mrs_01_1655__p102861459171011">Timeout period, in milliseconds, for the distributed SplitLog management program to receive a response from a worker.</p>
</td>
<td class="cellrowborder" valign="top" width="23.93%" headers="mcps1.3.2.5.2.4.1.3 "><p id="mrs_01_1655__a41e43df7f2874bb7b30115257858d96a">600000</p>
</td>
</tr>
</tbody>
</table>
</div>
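<p>As a rough sketch of what the resulting entry looks like, the following snippet renders the parameter in the standard Hadoop <b>property</b>/<b>name</b>/<b>value</b> layout used by <b>hbase-site.xml</b>. The helper name <b>build_property</b> is hypothetical, and the value 1200000 is only an illustrative choice (the default 600,000 ms scaled for an assumed average of 4,000 regions per RegionServer). Use the value that matches your own cluster.</p>

```python
import xml.etree.ElementTree as ET


def build_property(name, value):
    """Build one Hadoop-style configuration entry: property/name/value."""
    prop = ET.Element("property")
    ET.SubElement(prop, "name").text = name
    ET.SubElement(prop, "value").text = str(value)
    return prop


# Illustrative value only: 600,000 ms x (4,000 / 2,000).
entry = build_property("hbase.splitlog.manager.timeout", 1200000)

# Serialize the entry as text so it can be compared with hbase-site.xml.
print(ET.tostring(entry, encoding="unicode"))
```

<p>The snippet only prints the entry. It does not modify any configuration file, and how the change is applied and activated on the server depends on your deployment.</p>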
</div>
</div>
<div>
<div class="familylinks">
<div class="parentlink"><strong>Parent topic:</strong> <a href="mrs_01_1638.html">Common Issues About HBase</a></div>
</div>
</div>