Why does HMaster exit due to timeout when waiting for the namespace table to go online?
During the HMaster active/standby switchover or startup, HMaster performs WAL splitting and region recovery for the RegionServer that failed or was stopped previously.
Multiple threads are running in the background to monitor the HMaster startup process.
This is a help class, which is used to manage the allocation of namespace tables and monitoring table regions during HMaster active/standby switchover or startup. If the namespace table is not online within the specified time (hbase.master.namespace.init.timeout, which is 3,600,000 ms by default), the thread terminates HMaster abnormally.
This is an initialization thread monitoring class of the primary HMaster, which is used to monitor the initialization of the primary HMaster. If a thread fails to be initialized within the specified time (hbase.master.initializationmonitor.timeout, which is 3,600,000 ms by default), the thread terminates HMaster abnormally. If hbase.master.initializationmonitor.haltontimeout is started, the default value is false.
During the HMaster active/standby switchover or startup, if the WAL hlog file exists, the WAL splitting task is initialized. If the WAL hlog splitting task is complete, it initializes the table region allocation task.
HMaster uses ZooKeeper to coordinate log splitting tasks and valid RegionServers and track task development. If the primary HMaster exits during the log splitting task, the new primary HMaster attempts to resend the unfinished task, and RegionServer starts the log splitting task from the beginning.
The initialization of the HMaster is delayed due to the following reasons:
In the preceding scenarios, you are advised to add the following configuration parameters to enable HMaster to complete the restoration task earlier. Otherwise, the Master will exit, causing a longer delay of the entire restoration process.
hbase.master.namespace.init.timeout (default value: 3,600,000 ms)
hbase.regionserver.wal.max.splitters (default value: 2)