The system checks the heap memory usage of each HDFS DataNode every 30 seconds and compares it with a threshold, which has a default value. This alarm is generated when the DataNode heap memory usage exceeds the threshold.
You can change the threshold in O&M > Alarm > Thresholds > Name of the desired cluster > HDFS.
If Trigger Count is 1, this alarm is cleared when the DataNode heap memory usage falls to or below the threshold. If Trigger Count is greater than 1, the alarm is cleared when the usage falls to or below 90% of the threshold.
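The raise and clear rules above can be sketched as a small state machine. This is an illustrative model, not the product's implementation. The class name `HeapUsageAlarm` is hypothetical, and usage is assumed to be a percentage. Trigger Count is assumed to mean the number of consecutive 30-second checks that must exceed the threshold before the alarm is raised; the text above only states how Trigger Count affects clearing.

```python
from dataclasses import dataclass, field


@dataclass
class HeapUsageAlarm:
    """Hypothetical model of the raise/clear rules for alarm 14008.

    Assumption: trigger_count is the number of consecutive checks (one every
    30 seconds) whose usage must exceed the threshold before the alarm is raised.
    """

    threshold: float        # heap memory usage threshold, in percent
    trigger_count: int = 1  # assumed meaning: consecutive over-threshold checks
    active: bool = False    # True while the alarm is raised
    _over: int = field(default=0, init=False, repr=False)  # current over-threshold streak

    def clear_level(self) -> float:
        """Usage at or below which an active alarm is cleared."""
        if self.trigger_count == 1:
            return self.threshold
        return 0.9 * self.threshold  # 90% of the threshold when Trigger Count > 1

    def check(self, usage: float) -> bool:
        """Process one periodic sample and return whether the alarm is active."""
        if usage > self.threshold:
            self._over += 1
            if self._over >= self.trigger_count:
                self.active = True
        else:
            self._over = 0  # streak broken; an active alarm may now clear
            if self.active and usage <= self.clear_level():
                self.active = False
        return self.active


alarm = HeapUsageAlarm(threshold=95.0, trigger_count=3)
print([alarm.check(u) for u in [96, 97, 98, 94, 85]])
# [False, False, True, True, False]
```

With Trigger Count greater than 1, the 90% clear level adds hysteresis. Usage that hovers just below the threshold (94 in the run above) keeps the alarm active instead of repeatedly clearing and re-raising it.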
Alarm ID | Alarm Severity | Automatically Cleared |
---|---|---|
14008 | Major | Yes |
Name | Meaning |
---|---|
Source | Specifies the cluster for which the alarm is generated. |
ServiceName | Specifies the service for which the alarm is generated. |
RoleName | Specifies the role for which the alarm is generated. |
HostName | Specifies the host for which the alarm is generated. |
Trigger condition | Specifies the threshold triggering the alarm. If the current indicator value exceeds this threshold, the alarm is generated. |
If the DataNode heap memory usage is too high, HDFS data read/write performance is affected.
The heap memory allocated to the HDFS DataNode is insufficient.
Delete unnecessary files.
If the cluster runs in security mode, perform security authentication.
Run the kinit hdfs command and enter the password as prompted. Obtain the password from the administrator.
Check the DataNode JVM memory usage and configuration.
By default, the admin user does not have permissions to manage other components. Insufficient permissions may prevent a component's native UI from opening or cause it to display incomplete content. In that case, manually create a user that has permissions to manage that component.
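Hadoop daemons, including the DataNode, also serve their heap figures through a JMX JSON servlet at `/jmx`. The following is a minimal sketch for reading them outside the native UI.

It rests on several assumptions:

* The helper names `heap_usage_percent` and `fetch_memory_bean` are hypothetical.
* Host and port are placeholders for your DataNode's HTTP address.
* Usage is computed as used divided by maximum heap. The alarm may measure it differently.
* The fetch helper assumes plain HTTP without Kerberos (SPNEGO) authentication, which a security-mode cluster usually requires.

```python
import json
from urllib.request import urlopen

MEMORY_BEAN = "java.lang:type=Memory"


def heap_usage_percent(jmx_json: str) -> float:
    """Return heap usage (used / max) as a percentage from a /jmx response body."""
    beans = json.loads(jmx_json)["beans"]
    memory = next(b for b in beans if b.get("name") == MEMORY_BEAN)
    heap = memory["HeapMemoryUsage"]
    if heap["max"] <= 0:
        # The JVM reports max = -1 when no maximum heap size is defined.
        raise ValueError("maximum heap size is undefined")
    return 100.0 * heap["used"] / heap["max"]


def fetch_memory_bean(host: str, port: int) -> str:
    """Fetch only the Memory MXBean from a DataNode's JMX servlet.

    Assumption: plain HTTP with no SPNEGO authentication; host and port are placeholders.
    """
    url = f"http://{host}:{port}/jmx?qry={MEMORY_BEAN}"
    with urlopen(url, timeout=10) as resp:
        return resp.read().decode("utf-8")
```

The parsing is kept separate from the network call, so a saved `/jmx` response can be checked offline. You can then compare the result with the threshold configured for this alarm.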
Adjust the configuration in the system.
The mapping between the average number of blocks of a DataNode instance and the DataNode memory is as follows:
Collect fault information.
After the fault is rectified, the system automatically clears this alarm.
None