This section applies to MRS 3.x or later.
Many production Hadoop clusters have failed because the NameNode became overloaded and unresponsive.
This problem stems from the original design of Hadoop. The NameNode is a single, independent component that coordinates all HDFS operations in its namespace, such as obtaining data block locations, listing directories, and creating files. The NameNode receives these operations as RPC calls and places them in a FIFO call queue, from which handler threads process them in first-in, first-out order. As a result, users who issue more I/O operations receive more service time than users who issue fewer: a single heavy user can fill the queue and delay everyone else's requests. The FIFO queue is therefore unfair and causes latency for light users.
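The unfairness can be made concrete with a small simulation. The following Python sketch models a single FIFO call queue that serves calls strictly in arrival order, regardless of who sent them. It is an illustration only, not NameNode code; the caller names and operation names are hypothetical.

```python
from collections import deque


def fifo_wait_positions(calls):
    """Return, for each caller, how many calls were served before that caller's first call.

    `calls` is the arrival order of (caller, operation) tuples. A single FIFO
    queue serves them strictly in that order, no matter which caller sent them.
    """
    queue = deque(calls)
    served = 0
    first_served_at = {}
    while queue:
        caller, _op = queue.popleft()
        first_served_at.setdefault(caller, served)
        served += 1
    return first_served_at


# A heavy user floods the queue with 1,000 calls just before a light user sends one.
arrivals = [("heavy", "getBlockLocations")] * 1000 + [("light", "mkdirs")]
print(fifo_wait_positions(arrivals))  # {'heavy': 0, 'light': 1000}
```

The light user's single call waits behind all 1,000 calls from the heavy user, which is exactly the delay the FairCallQueue is meant to avoid.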
The unfairness and delay described above can be reduced by replacing the FIFO queue with a different type of queue called FairCallQueue. FairCallQueue distributes incoming RPC calls across multiple sub-queues according to how many calls each caller has made. The scheduling module tracks recent calls and assigns a higher priority to users who have made fewer calls.
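As an illustration of the multi-queue idea, the following Python sketch places each call into a sub-queue chosen by a priority function and drains the sub-queues with a simplified weighted round-robin, so callers demoted to a low-priority sub-queue cannot starve the others. It is a sketch under stated assumptions, not Hadoop's FairCallQueue implementation: the class name `FairCallQueueSketch`, the priority function, and the weights (8, 4, 2, 1) are all assumptions chosen for illustration.

```python
from collections import deque


class FairCallQueueSketch:
    """Illustrative multi-level call queue; level 0 is the highest priority.

    Calls go into the sub-queue chosen by `priority_of`, and `take()` drains the
    sub-queues with a simplified weighted round-robin. The default weights
    (8, 4, 2, 1) are an assumption for this sketch.
    """

    def __init__(self, priority_of, weights=(8, 4, 2, 1)):
        self.priority_of = priority_of          # caller -> level (0 = highest)
        self.weights = list(weights)
        self.queues = [deque() for _ in self.weights]
        self.level = 0                          # sub-queue currently being served
        self.served_in_level = 0                # calls taken from it in this turn

    def put(self, caller, op):
        self.queues[self.priority_of(caller)].append((caller, op))

    def take(self):
        """Return the next (caller, op) call, or None if every sub-queue is empty."""
        if not any(self.queues):
            return None
        while True:
            q = self.queues[self.level]
            if q and self.served_in_level < self.weights[self.level]:
                self.served_in_level += 1
                return q.popleft()
            # This level is empty or has used its share for this round: move on.
            self.level = (self.level + 1) % len(self.queues)
            self.served_in_level = 0


# Hypothetical priorities: the heavy caller has been demoted to the lowest level.
fcq = FairCallQueueSketch(lambda caller: 3 if caller == "heavy" else 0)
for _ in range(1000):
    fcq.put("heavy", "getBlockLocations")
fcq.put("light", "mkdirs")
print(fcq.take())  # ('light', 'mkdirs')
```

Even though the light user's call arrived last, it is served first because it sits in the highest-priority sub-queue. Weighting, rather than strict priority, still lets the low-priority sub-queues make progress.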
Once FairCallQueue is configured, the scheduling module decides which sub-queue each incoming call is assigned to. The current scheduling module is DecayRpcScheduler, which only tracks the number of calls made by each caller and periodically decays (reduces) these counts.
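The decay-based scheduling idea can be sketched in Python as follows: each call increments the caller's count, a periodic decay scales all counts down, and a caller's priority level is derived from its share of recent calls. This is a toy model, not DecayRpcScheduler's source. The class name is hypothetical, and the parameters (four levels, share thresholds of 12.5%, 25%, and 50%, decay factor 0.5) are assumptions loosely based on Hadoop's documented defaults.

```python
class DecaySchedulerSketch:
    """Toy decay-based priority scheduler; level 0 is the highest priority.

    Assumed parameters (loosely modeled on Hadoop's documented defaults):
    four levels, call-share thresholds of 12.5%, 25%, and 50%, decay factor 0.5.
    """

    def __init__(self, thresholds=(0.125, 0.25, 0.5), decay_factor=0.5):
        self.thresholds = thresholds      # ascending call-share boundaries between levels
        self.decay_factor = decay_factor
        self.counts = {}                  # caller -> recent (decayed) call count

    def record_call(self, caller):
        self.counts[caller] = self.counts.get(caller, 0) + 1

    def decay(self):
        """Run periodically: scale every count down and forget callers that reach zero."""
        decayed = {c: int(n * self.decay_factor) for c, n in self.counts.items()}
        self.counts = {c: n for c, n in decayed.items() if n > 0}

    def priority(self, caller):
        """Map the caller's share of recent calls to a level; heavier callers get a larger number."""
        total = sum(self.counts.values())
        if total == 0:
            return 0
        share = self.counts.get(caller, 0) / total
        # The level equals the number of thresholds the caller's share meets or exceeds.
        return sum(1 for boundary in self.thresholds if share >= boundary)


sched = DecaySchedulerSketch()
for _ in range(90):
    sched.record_call("heavy")
for _ in range(10):
    sched.record_call("light")
print(sched.priority("heavy"), sched.priority("light"))  # 3 0
```

Because counts decay over time, a caller that stops sending heavy traffic gradually returns to a higher priority instead of being penalized forever.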
To change the following parameters, go to the All Configurations page of HDFS by referring to Modifying Cluster Service Configuration Parameters and enter the parameter name in the search box.
Parameter | Description | Default Value |
---|---|---|
ipc.<port>.callqueue.impl | Specifies the implementation class of the call queue. Set this parameter to org.apache.hadoop.ipc.FairCallQueue to enable the QoS feature. | java.util.concurrent.LinkedBlockingQueue |
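On MRS, this value is set on the All Configurations page rather than by editing files. The following Python sketch only illustrates how the `<port>` placeholder expands into a concrete parameter name, and what an equivalent Hadoop-style `<property>` entry looks like. The function name is hypothetical, and port 8020 is an example only; substitute the NameNode RPC port of your own cluster.

```python
import xml.etree.ElementTree as ET


def callqueue_property(port):
    """Build a Hadoop-style <property> element that enables FairCallQueue on one RPC port."""
    prop = ET.Element("property")
    ET.SubElement(prop, "name").text = f"ipc.{port}.callqueue.impl"
    ET.SubElement(prop, "value").text = "org.apache.hadoop.ipc.FairCallQueue"
    return prop


# 8020 is only an example; use the NameNode RPC port of your own cluster.
print(ET.tostring(callqueue_property(8020), encoding="unicode"))
```

The `<port>` part of the name scopes the setting to a single RPC server port, so only the NameNode RPC port that needs QoS is affected.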
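Backoff is one of the FairCallQueue functions. It requires the client to retry an operation (such as creating, deleting, or opening a file) after a period of time. When backoff occurs, the RPC server throws a RetriableException. The FairCallQueue performs backoff in either of the following cases: the call queue is full, that is, too many client calls are waiting in it; or response-time-based backoff is enabled and the average response time of a queue exceeds its threshold (specified by ipc.<port>.decay-scheduler.backoff.responsetime.thresholds).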
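On the client side, backoff means catching the retriable exception and retrying after an increasing delay. The following Python sketch shows generic exponential backoff with jitter. It is not the Hadoop client (Hadoop handles this through its own retry policies and RetryInvocationHandler); `RetriableError` is a hypothetical stand-in for RetriableException, and the retry count and delays are assumed values.

```python
import random
import time


class RetriableError(Exception):
    """Hypothetical stand-in for the RetriableException thrown by the RPC server on backoff."""


def call_with_backoff(rpc, max_retries=5, base_delay=0.1, max_delay=10.0, sleep=time.sleep):
    """Invoke rpc(); on RetriableError, wait with exponential backoff and retry.

    Delays are in seconds. After max_retries failed retries, the last error is re-raised.
    """
    for attempt in range(max_retries + 1):
        try:
            return rpc()
        except RetriableError:
            if attempt == max_retries:
                raise
            # Double the delay on each attempt, cap it at max_delay, and add random jitter.
            delay = min(max_delay, base_delay * (2 ** attempt))
            sleep(delay * random.uniform(0.5, 1.0))
```

Jitter spreads out retries from many clients that were backed off at the same moment, so they do not hit the NameNode again in lockstep. Passing `sleep` as a parameter keeps the function easy to test without real waiting.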
Parameter | Description | Default Value |
---|---|---|
ipc.<port>.backoff.enable | Specifies whether to enable backoff. Without backoff, when an application sends a large number of calls, its RPC requests stay blocked as long as the operating system's connection limit has not been reached. With backoff, when the RPC server or NameNode is heavily loaded, explicit exceptions can be thrown back to the client based on certain policies. The client recognizes these exceptions and performs exponential backoff, which is another implementation of the RetryInvocationHandler class. | false |
ipc.<port>.decay-scheduler.backoff.responsetime.enable | Specifies whether to enable backoff based on the average response time of the queues. | false |
ipc.<port>.decay-scheduler.backoff.responsetime.thresholds | Specifies the response time threshold for each queue. The number of thresholds must match the number of priority levels (the value of ipc.<port>.faircallqueue.priority-levels). Unit: ms | 10000,20000,30000,40000 |
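The threshold parameter can be illustrated with a short Python sketch that parses the comma-separated value, checks that it has one entry per priority level, and decides whether a caller should back off. The decision rule here (back off if any queue at the caller's priority level or higher is slower than its threshold) reflects our understanding of Hadoop's response-time backoff and should be treated as an assumption; the function names and the response-time figures are hypothetical.

```python
def parse_thresholds(value, priority_levels):
    """Parse a comma-separated list of millisecond thresholds; require one per priority level."""
    thresholds = [int(v) for v in value.split(",")]
    if len(thresholds) != priority_levels:
        raise ValueError(f"expected {priority_levels} thresholds, got {len(thresholds)}")
    return thresholds


def should_back_off(caller_level, avg_response_ms, thresholds):
    """Back off if any queue at the caller's priority or higher exceeds its threshold.

    Level 0 is the highest priority, so "the caller's priority or higher"
    means levels 0..caller_level.
    """
    return any(avg_response_ms[i] > thresholds[i] for i in range(caller_level + 1))


thresholds = parse_thresholds("10000,20000,30000,40000", priority_levels=4)
avg = [2000, 25000, 5000, 1000]  # hypothetical average response time per level, in ms
print(should_back_off(0, avg, thresholds))  # False: level 0 is within its 10000 ms threshold
print(should_back_off(2, avg, thresholds))  # True: level 1 exceeds its 20000 ms threshold
```

Under this rule, lower-priority (heavier) callers are pushed back first when the queues serving higher-priority callers slow down, which protects the latency of light users.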