The alarm module checks the HBase DR data synchronization status every 30 seconds. When disaster recovery (DR) data fails to be synchronized to a standby cluster, the alarm is triggered.
When DR data synchronization succeeds, the alarm is cleared.
If the multi-instance function is enabled in the cluster and multiple HBase service instances are installed, you need to determine the HBase service instance where the alarm is generated based on the value of ServiceName in Location. For example, if the HBase1 service is unavailable, ServiceName=HBase1 is displayed in Location, and the operation object in the procedure needs to be changed from HBase to HBase1.
Alarm ID |
Alarm Severity |
Automatically Cleared |
---|---|---|
19006 |
Critical |
Yes |
Name |
Meaning |
---|---|
Source |
Specifies the cluster for which the alarm is generated. |
ServiceName |
Specifies the service for which the alarm is generated. |
RoleName |
Specifies the role for which the alarm is generated. |
HostName |
Specifies the host for which the alarm is generated. |
Trigger Condition |
Specifies the threshold triggering the alarm. If the current indicator value exceeds this threshold, the alarm is generated. |
HBase data in a cluster fails to be synchronized to the standby cluster, causing data inconsistency between active and standby clusters.
Observe whether the system automatically clears the alarm.
Check the HBase service status of the standby cluster.
If the cluster uses a security mode, perform security authentication first and then access the hbase shell interface as user hbase.
cd /opt/client
source ./bigdata_env
kinit hbaseuser
The DR synchronization status of a node is as follows.
10-10-10-153: SOURCE: PeerID=abc, SizeOfLogQueue=0, ShippedBatches=2, ShippedOps=2, ShippedBytes=320, LogReadInBytes=1636, LogEditsRead=5, LogEditsFiltered=3, SizeOfLogToReplicate=0, TimeForLogToReplicate=0, ShippedHFiles=0, SizeOfHFileRefsQueue=0, AgeOfLastShippedOp=0, TimeStampsOfLastShippedOp=Mon Jul 18 09:53:28 CST 2016, Replication Lag=0, FailedReplicationAttempts=0 SOURCE: PeerID=abc1, SizeOfLogQueue=0, ShippedBatches=1, ShippedOps=1, ShippedBytes=160, LogReadInBytes=1636, LogEditsRead=5, LogEditsFiltered=3, SizeOfLogToReplicate=0, TimeForLogToReplicate=0, ShippedHFiles=0, SizeOfHFileRefsQueue=0, AgeOfLastShippedOp=16788, TimeStampsOfLastShippedOp=Sat Jul 16 13:19:00 CST 2016, Replication Lag=16788, FailedReplicationAttempts=5
In the preceding step, data on the faulty node 10-10-10-153 fails to be synchronized to a standby cluster whose PeerID is abc1.
PEER_ID CLUSTER_KEY STATE TABLE_CFS abc1 10.10.10.110,10.10.10.119,10.10.10.133:2181:/hbase2 ENABLED abc 10.10.10.110,10.10.10.119,10.10.10.133:2181:/hbase ENABLED
In the preceding information, /hbase2 indicates that data is synchronized to the HBase2 instance of the standby cluster.
Check network connections between RegionServers on active and standby clusters.
Collect fault information.
After the fault is rectified, the system automatically clears this alarm.
None