Why does a new application fail if a NodeManager has been in unhealthy status for 10 minutes?
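When nodeSelectPolicy is set to SEQUENCE and the first NodeManager that connected to the ResourceManager is unavailable, the ResourceManager keeps trying to assign tasks to that same NodeManager for the period specified by yarn.nm.liveness-monitor.expiry-interval-ms, which defaults to 600000 ms (10 minutes) in Apache Hadoop. A new application submitted during that period can therefore fail.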
To avoid this problem, set both of the following parameters:
yarn.resourcemanager.am-scheduling.node-blacklisting-enabled = true;
yarn.resourcemanager.am-scheduling.node-blacklisting-disable-threshold = 0.5.
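
As a rough sketch, the two settings above could be written as a standard Hadoop configuration fragment like the one below. This assumes the parameters are applied through the ResourceManager's yarn-site.xml; on a managed cluster they may instead be set through the cluster management console, and the ResourceManager will likely need a restart for them to take effect. The comments describe the intended behavior as understood from the parameter names and are not authoritative.

```xml
<!-- Assumed location: yarn-site.xml on the ResourceManager host. -->
<configuration>
  <!-- Let the ResourceManager blacklist nodes on which
       ApplicationMaster attempts have failed. -->
  <property>
    <name>yarn.resourcemanager.am-scheduling.node-blacklisting-enabled</name>
    <value>true</value>
  </property>
  <!-- Assumption: blacklisting is turned off again once more than this
       fraction of the cluster's nodes (here 50%) has been blacklisted. -->
  <property>
    <name>yarn.resourcemanager.am-scheduling.node-blacklisting-disable-threshold</name>
    <value>0.5</value>
  </property>
</configuration>
```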