By default, the NameNode randomly selects a DataNode for each block it writes. If DataNodes in a cluster have different total disk capacities (some nodes large, others small), the nodes with less capacity fill up first. To resolve this problem, change the default block placement policy used for writes to DataNodes to the available space block policy. This policy increases the probability that data blocks are written to nodes with more available disk space, which helps keep node usage balanced when DataNode disk capacities differ.
The default replica storage policy of the NameNode is as follows:
If there are more replicas, randomly store them on other DataNodes.
The replica selection mechanism (org.apache.hadoop.hdfs.server.blockmanagement.AvailableSpaceBlockPlacementPolicy) is as follows:
The total disk capacities of the DataNodes in the cluster must not deviate from one another by more than 100%.
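The way the available space policy shifts writes toward nodes with more free space can be illustrated with a small simulation. This is a simplified sketch, not the Hadoop implementation. It assumes the policy draws two random candidate nodes, treats them as equal when their free-space percentages are within a small tolerance, and otherwise picks the roomier one with a fixed probability. The function names, node names, the 0.6 preference, and the 5-point tolerance are illustrative assumptions. In Apache Hadoop the placement class is typically selected through the `dfs.block.replicator.classname` property in hdfs-site.xml.

```python
import random


def choose_node(nodes, preference=0.6, tolerance=5.0, rng=random):
    """Pick one node from `nodes`, a dict of node name -> percent of disk free.

    Assumed behavior: compare two random candidates and favor the one with
    more free space with probability `preference`.
    """
    a, b = rng.sample(sorted(nodes), 2)  # two distinct random candidates
    if abs(nodes[a] - nodes[b]) < tolerance:
        return a  # free space is roughly equal: keep the first random pick
    roomier, fuller = (a, b) if nodes[a] > nodes[b] else (b, a)
    return roomier if rng.random() < preference else fuller


def simulate(nodes, writes=10000, seed=0):
    """Count how many of `writes` block placements land on each node."""
    rng = random.Random(seed)  # seeded so the run is repeatable
    counts = {name: 0 for name in nodes}
    for _ in range(writes):
        counts[choose_node(nodes, rng=rng)] += 1
    return counts


# Hypothetical cluster: one node with 20% free space, one with 80% free.
nodes = {"dn-small": 20.0, "dn-large": 80.0}
counts = simulate(nodes)
print(counts)
```

With the assumed 0.6 preference, roughly 60% of writes go to the node with more free space. The policy biases placement rather than forcing it, so the smaller node still receives a share of the blocks.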