When the NameNode is overloaded (its CPU usage reaches 100%), it becomes unresponsive. HDFS clients that are already connected to the overloaded NameNode fail to run properly, whereas HDFS clients that connect afterwards are switched to a backup NameNode and run properly.
This problem occurs when the default configuration (described in Table 1) is in use: the keep-alive mechanism is enabled for the RPC connection between the HDFS client and the NameNode. With keep-alive enabled, the HDFS client keeps waiting for the response from the server and the connection never times out, so the HDFS client becomes unresponsive.
Perform the following operations on the unresponsive HDFS client:
Procedure:
Configure the following parameters in the core-site.xml file on the client.
| Parameter | Description | Default Value |
| --- | --- | --- |
| ipc.client.ping | If ipc.client.ping is set to true, the HDFS client waits for the response from the server and periodically sends ping messages so that the connection is not dropped because of a TCP timeout. If ipc.client.ping is set to false, the HDFS client uses the value of ipc.ping.interval as the timeout period; if no response is received within that period, the request times out. To prevent HDFS from becoming unresponsive when the NameNode is overloaded for a long time, you are advised to set this parameter to false. | true |
| ipc.ping.interval | If ipc.client.ping is true, ipc.ping.interval is the interval between ping messages. If ipc.client.ping is false, ipc.ping.interval is the connection timeout period. To prevent HDFS from becoming unresponsive when the NameNode is overloaded for a long time, you are advised to set this parameter to a large value, for example 900000 (unit: ms), so that requests do not time out while the server is merely busy. | 60000 |
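
As an illustration, the recommended settings could be written in the client's core-site.xml roughly as follows. This is a minimal sketch: the values false and 900000 are the ones suggested in the table, and the `<configuration>` element is assumed to be the existing root element of the file, so in practice only the two `<property>` entries would be added or updated.

```xml
<configuration>
  <!-- Disable the RPC keep-alive ping so that a stalled request can time out -->
  <property>
    <name>ipc.client.ping</name>
    <value>false</value>
  </property>
  <!-- With ipc.client.ping=false this value acts as the timeout period, in ms -->
  <property>
    <name>ipc.ping.interval</name>
    <value>900000</value>
  </property>
</configuration>
```

With these values, a request to an overloaded NameNode would be expected to time out after about 15 minutes instead of waiting indefinitely.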