Optimizing HDFS NameNode RPC QoS

Scenarios

Some deployed Hadoop clusters become faulty because the NameNode is overloaded and stops responding.

This problem stems from the original design of Hadoop. The NameNode is a single component that coordinates all HDFS operations in its namespace, including obtaining data block locations, listing directories, and creating files. The NameNode receives HDFS operations as RPC calls and places them in a FIFO call queue, where they wait for handler threads to process them. Requests in the FIFO call queue are served in first-in, first-out order, so users who issue more I/O operations receive more service time than users who issue fewer. As a result, the FIFO queue is unfair and delays the requests of lighter users.

Figure 1 NameNode request processing based on the FIFO call queue

The unfairness and delays described above can be reduced by replacing the FIFO queue with a different type of queue called FairCallQueue. FairCallQueue assigns incoming RPC calls to multiple queues based on the caller's call volume. The scheduling module tracks recent calls and assigns a higher priority to users with fewer calls.
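The idea can be illustrated with a small Python model. This is only a sketch and not the Hadoop implementation: the class name FairCallQueueSketch, the threshold values, and the queue weights are assumptions chosen for illustration, and the model does not decay call counts over time as a real scheduler would.

```python
from collections import Counter, deque


class FairCallQueueSketch:
    """Toy model: callers with a smaller share of recent calls get higher priority."""

    def __init__(self, thresholds=(0.125, 0.25, 0.5), weights=(8, 4, 2, 1)):
        # thresholds: share of recent calls at which a caller drops one priority level
        # weights: calls taken from each queue per scheduling round (illustrative values)
        self.thresholds = thresholds
        self.weights = weights
        self.queues = [deque() for _ in weights]  # index 0 is the highest priority
        self.recent_calls = Counter()

    def priority_for(self, user):
        """Return the queue index for a user based on the user's share of recent calls."""
        total = sum(self.recent_calls.values())
        share = self.recent_calls[user] / total if total else 0.0
        # Each threshold reached pushes the caller one level down.
        return sum(share >= t for t in self.thresholds)

    def put(self, user, call):
        """Record the call and place it in the queue that matches the caller's priority."""
        self.recent_calls[user] += 1
        level = self.priority_for(user)
        self.queues[level].append((user, call))
        return level

    def drain(self):
        """Yield queued calls using weighted round-robin across the priority queues."""
        while any(self.queues):
            for queue, weight in zip(self.queues, self.weights):
                for _ in range(weight):
                    if not queue:
                        break
                    yield queue.popleft()


# A heavy user submits 20 calls before a light user submits one.
fcq = FairCallQueueSketch()
for i in range(20):
    fcq.put("heavy", f"list-{i}")
fcq.put("light", "create-0")

served = list(fcq.drain())
print(served[0])  # the light user's call is dequeued first
```

With a single FIFO queue, the light user's call would wait behind all 20 calls from the heavy user. In this model it lands in the highest-priority queue and is served first, while the heavy user's calls are still processed from the lowest-priority queue.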

Figure 2 NameNode request processing based on FairCallQueue

Configuration Description

  • <port> indicates the RPC port configured on the NameNode.
  • The backoff function based on the response time takes effect only when ipc.<port>.backoff.enable is set to true.
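The following Python sketch shows how <port> is substituted into the property names and how the response-time backoff depends on ipc.<port>.backoff.enable. The helper fair_call_queue_properties is hypothetical, port 8020 is an assumed example value, and the callqueue.impl and decay-scheduler.backoff.responsetime.enable property names are taken from upstream Apache Hadoop rather than from this section, so verify them against your cluster's documentation.

```python
def fair_call_queue_properties(port, backoff=True, response_time_backoff=True):
    """Build illustrative ipc.<port>.* settings for a given NameNode RPC port."""
    if response_time_backoff and not backoff:
        # Response-time backoff has no effect unless ipc.<port>.backoff.enable is true.
        raise ValueError("response-time backoff requires backoff to be enabled")

    prefix = f"ipc.{port}"
    return {
        # Property names below other than backoff.enable are assumed from upstream Hadoop.
        f"{prefix}.callqueue.impl": "org.apache.hadoop.ipc.FairCallQueue",
        f"{prefix}.backoff.enable": str(backoff).lower(),
        f"{prefix}.decay-scheduler.backoff.responsetime.enable": str(response_time_backoff).lower(),
    }


# Example with an assumed NameNode RPC port of 8020.
for name, value in fair_call_queue_properties(8020).items():
    print(f"{name} = {value}")
```

Rejecting the inconsistent combination up front makes the dependency explicit instead of producing a configuration in which the response-time backoff is silently ignored.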