Yang, Tong 3f5759eed2 MRS comp-lts 2.0.38.SP20 version
Reviewed-by: Hasko, Vladimir <vladimir.hasko@t-systems.com>
Co-authored-by: Yang, Tong <yangtong2@huawei.com>
Co-committed-by: Yang, Tong <yangtong2@huawei.com>
2023-01-19 17:08:45 +00:00

3.0 KiB

Why Does an Error Occur During DataNode Capacity Calculation When Multiple data.dir Are Configured in a Partition?

Question

DataNode capacity count incorrect if several data.dir configured in one disk partition.

Answer

Currently calculation will be done based on the disk like df command in linux. Ideally user should not configure multiple directories for same disk which will be huge impact on performance where all data will go to one disk.

Hence it is always better to configure like below:

For example:

if the machine is having disks like following:

host-4:~ # df -h
Filesystem      Size    Used    Avail    Use%   Mounted on
/dev/sda1       352G   11G      324G    4%      /
udev              190G    252K   190G    1%      /dev
tmpfs             190G   72K      190G    1%     /dev/shm
/dev/sdb1       2.7T    74G      2.5T      3%    /data1
/dev/sdc1       2.7T    75G      2.5T      3%    /data2
/dev/sdd1       2.7T    73G      2.5T      3%    /data

Suggested way of configuration:

<property>
<name>dfs.datanode.data.dir</name>
<value>/data1/datadir/,/data2/datadir,/data3/datadir</value>
</property>

Following is not recommended:

<property>
<name>dfs.datanode.data.dir</name>
<value>/data1/datadir1/,/data2/datadir1,/data3/datadir1,/data1/datadir2/data1/datadir3,/data2/datadir2,/data2/datadir3,/data3/datadir2,/data3/datadir3</value>
</property>