Yang, Tong 6182f91ba8 MRS component operation guide_normal 2.0.38.SP20 version
Reviewed-by: Hasko, Vladimir <vladimir.hasko@t-systems.com>
Co-authored-by: Yang, Tong <yangtong2@huawei.com>
Co-committed-by: Yang, Tong <yangtong2@huawei.com>
2022-12-09 14:55:21 +00:00


<a name="mrs_01_1645"></a><a name="mrs_01_1645"></a>
<h1 class="topictitle1">Why Does HMaster Exit Due to Timeout When Waiting for the Namespace Table to Go Online?</h1>
<div id="body1596003894349"><div class="section" id="mrs_01_1645__s22112062b24b484ab4585a2d0122844d"><h4 class="sectiontitle">Question</h4><p id="mrs_01_1645__af358e378eb5b489f8e113ddb285746e1">Why does HMaster exit due to timeout when waiting for the namespace table to go online?</p>
</div>
<div class="section" id="mrs_01_1645__s37da9f0c51ea4f3c95156c39e55f1132"><h4 class="sectiontitle">Answer</h4><p id="mrs_01_1645__afa7c7d521e8f40249af7c57b142c4a01">During the HMaster active/standby switchover or startup, HMaster performs WAL splitting and region recovery for the RegionServer that failed or was stopped previously.</p>
<p id="mrs_01_1645__aff4de7b25eb648ee9b85e3a83131db0c">Multiple threads are running in the background to monitor the HMaster startup process.</p>
<ul id="mrs_01_1645__u4a1bc1fdcf4c474b9f342172d47bc338"><li id="mrs_01_1645__l5aa1146bc1dd402c87a4387b97171ccd">TableNamespaceManager<p id="mrs_01_1645__a3c14287e8aca46c98abdac86e73bebe1"><a name="mrs_01_1645__l5aa1146bc1dd402c87a4387b97171ccd"></a><a name="l5aa1146bc1dd402c87a4387b97171ccd"></a>This is a helper class that manages the assignment of the namespace table and monitors the table's regions during HMaster active/standby switchover or startup. If the namespace table does not come online within the specified time (<strong id="mrs_01_1645__b67311633131110">hbase.master.namespace.init.timeout</strong>, which is 3,600,000 ms by default), the thread terminates HMaster abnormally.</p>
</li></ul>
<ul id="mrs_01_1645__u4a3c0d99babc4b88a5a4fb254be4bf4b"><li id="mrs_01_1645__l174d8012e3a844758665c805316359a7">InitializationMonitor<p id="mrs_01_1645__ae6cceefd66ee44a98411fd1e7a82f01f"><a name="mrs_01_1645__l174d8012e3a844758665c805316359a7"></a><a name="l174d8012e3a844758665c805316359a7"></a>This is a thread class that monitors the initialization of the active HMaster. If the active HMaster fails to finish initializing within the specified time (<strong id="mrs_01_1645__b16402017155518">hbase.master.initializationmonitor.timeout</strong>, which is 3,600,000 ms by default) and <strong id="mrs_01_1645__b12599142611554">hbase.master.initializationmonitor.haltontimeout</strong> is enabled, the thread terminates HMaster abnormally. The default value of this switch is <strong id="mrs_01_1645__b545243490105148">false</strong>.</p>
</li></ul>
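<p>As a hedged illustration of the InitializationMonitor settings described above, the corresponding entries in <strong>hbase-site.xml</strong> might look as follows. The timeout is the default stated in this section, written without thousands separators. This sketch assumes that, as in open-source HBase, the monitor terminates HMaster on timeout only when the halt switch is set to <strong>true</strong>; verify this behavior for your cluster version.</p>
<pre class="screen">&lt;!-- Sketch only: InitializationMonitor settings with the defaults described in this section --&gt;
&lt;property&gt;
  &lt;name&gt;hbase.master.initializationmonitor.timeout&lt;/name&gt;
  &lt;!-- 3,600,000 ms (1 hour), written as a plain integer --&gt;
  &lt;value&gt;3600000&lt;/value&gt;
&lt;/property&gt;
&lt;property&gt;
  &lt;name&gt;hbase.master.initializationmonitor.haltontimeout&lt;/name&gt;
  &lt;!-- Assumption: HMaster is terminated on timeout only if this is true --&gt;
  &lt;value&gt;false&lt;/value&gt;
&lt;/property&gt;</pre>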
<p id="mrs_01_1645__a10c9e53a73984e49ab8582e3488882f2">During the HMaster active/standby switchover or startup, if <strong id="mrs_01_1645__b1786112421105148">WAL (HLog)</strong> files exist, HMaster starts the WAL splitting task. After the WAL splitting task is complete, HMaster starts the table region assignment task.</p>
<p id="mrs_01_1645__a09df7ba10d7d4d4089e4b5b5d579981a">HMaster uses ZooKeeper to coordinate log splitting tasks among the available RegionServers and to track task progress. If the active HMaster exits during log splitting, the new active HMaster attempts to resubmit the unfinished tasks, and the RegionServers restart log splitting from the beginning.</p>
<p id="mrs_01_1645__ad4fb418847354b8a9eb7a1fa1e95334a">HMaster initialization may be delayed for the following reasons:</p>
<ul id="mrs_01_1645__ud383f46b9b2848b9a72cb357dd698400"><li id="mrs_01_1645__l72316bd5377a477daaac28b498a5da93">Network faults occur intermittently.</li><li id="mrs_01_1645__l5a1f7573d17349f59d311e96427eb993">Disks run into bottlenecks.</li><li id="mrs_01_1645__l04fe959da8b7443d9901ab7e93d5ae00">The log splitting task is overloaded, and RegionServer runs slowly.</li><li id="mrs_01_1645__l8a8d463bcc754d649411046a699fdae7">RegionServer (region opening) responds slowly.</li></ul>
<p id="mrs_01_1645__ab3ab3e2461314747b6d414dcda70e958">In the preceding scenarios, you are advised to adjust the following configuration parameters so that HMaster can complete the recovery tasks before a timeout expires. Otherwise, HMaster exits, which further delays the entire recovery process.</p>
<ul id="mrs_01_1645__u58968dd714f24518b21a30f3ec8f3be0"><li id="mrs_01_1645__l1ce0f1c1a10e4f12a9fc6feb8fc0f779">Increase the timeout period for waiting for the namespace table to go online so that HMaster has enough time to coordinate the splitting tasks of the RegionServer workers and avoid repeating tasks.<p id="mrs_01_1645__acc9bb42438854082998012eb5c922d3d"><a name="mrs_01_1645__l1ce0f1c1a10e4f12a9fc6feb8fc0f779"></a><a name="l1ce0f1c1a10e4f12a9fc6feb8fc0f779"></a><span class="parmname" id="mrs_01_1645__paca571fd5a3844d9b960dc2247d287cc"><b>hbase.master.namespace.init.timeout</b></span> (default value: 3,600,000 ms)</p>
</li></ul>
<ul id="mrs_01_1645__ufb54491893674f85a49310d24581b1e9"><li id="mrs_01_1645__lcafcc3ff04ad4aafb18541e8fdf8b472">Increase the number of splitting tasks that each RegionServer worker runs concurrently so that splitting tasks are processed in parallel (this requires more CPU cores on the RegionServers). Add the following parameter to <em id="mrs_01_1645__i7556144718582">Client installation path</em> <strong id="mrs_01_1645__b1951185495810">/HBase/hbase/conf/hbase-site.xml</strong>:<p id="mrs_01_1645__aad00a001c2654d279600dbb0006c7dab"><span class="parmname" id="mrs_01_1645__p7cb84a6c823e4687899eeb06e2cd2c5d"><b>hbase.regionserver.wal.max.splitters</b></span> (default value: 2)</p>
</li></ul>
<ul id="mrs_01_1645__u105d3452e4704870a2e2ab9d1a78d2b7"><li id="mrs_01_1645__lee3f3ef9db6248cf9f9d1d9b05049a62">If the overall recovery process needs more time, increase the timeout period of the initialization monitoring thread.<p id="mrs_01_1645__a6acb411d5ad74ca4a4e44f6d4d04166d"><a name="mrs_01_1645__lee3f3ef9db6248cf9f9d1d9b05049a62"></a><a name="lee3f3ef9db6248cf9f9d1d9b05049a62"></a><strong id="mrs_01_1645__b14288196195917">hbase.master.initializationmonitor.timeout</strong> (default value: 3,600,000 ms)</p>
</li></ul>
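<p>The three adjustments above can be sketched as a single <strong>hbase-site.xml</strong> fragment. The values below are illustrative assumptions only (double the default timeouts and four concurrent splitters), not recommendations. Choose values based on the recovery time observed in your cluster and the CPU cores available on the RegionServers.</p>
<pre class="screen">&lt;!-- Sketch only: example values are assumptions, not recommended settings --&gt;
&lt;property&gt;
  &lt;name&gt;hbase.master.namespace.init.timeout&lt;/name&gt;
  &lt;!-- Example: 2 hours in ms (default in this section: 3600000) --&gt;
  &lt;value&gt;7200000&lt;/value&gt;
&lt;/property&gt;
&lt;property&gt;
  &lt;name&gt;hbase.regionserver.wal.max.splitters&lt;/name&gt;
  &lt;!-- Example: 4 concurrent splitters (default in this section: 2) --&gt;
  &lt;value&gt;4&lt;/value&gt;
&lt;/property&gt;
&lt;property&gt;
  &lt;name&gt;hbase.master.initializationmonitor.timeout&lt;/name&gt;
  &lt;!-- Example: 2 hours in ms (default in this section: 3600000) --&gt;
  &lt;value&gt;7200000&lt;/value&gt;
&lt;/property&gt;</pre>
<p>Timeout values are written in milliseconds as plain integers, without thousands separators.</p>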
</div>
</div>
<div>
<div class="familylinks">
<div class="parentlink"><strong>Parent topic:</strong> <a href="mrs_01_1638.html">Common Issues About HBase</a></div>
</div>
</div>