Reviewed-by: Hasko, Vladimir <vladimir.hasko@t-systems.com> Co-authored-by: Yang, Tong <yangtong2@huawei.com> Co-committed-by: Yang, Tong <yangtong2@huawei.com>
<a name="mrs_01_1676"></a><a name="mrs_01_1676"></a>
|
|
|
|
<h1 class="topictitle1">Configuring HDFS NodeLabel</h1>
|
|
<div id="body1595904095702"><div class="section" id="mrs_01_1676__s9a54debc4c2942c19903752c157ec5d0"><h4 class="sectiontitle">Scenario</h4><p id="mrs_01_1676__a221e73e8e04142638864d7b725f7e7a8">Use this function when HDFS file data blocks must be stored on specific nodes based on data characteristics. You can set a label expression on an HDFS directory or file and assign one or more labels to each DataNode so that file data blocks are stored on the specified DataNodes.</p>
|
|
<p id="mrs_01_1676__a1b7157abc7b24ba08e60fedc5c330d00">When the label-based data block placement policy is used to select DataNodes for a file, the label expression first determines the range of candidate DataNodes. Suitable nodes are then selected from that range.</p>
|
|
<div class="note" id="mrs_01_1676__note10291110184210"><img src="public_sys-resources/note_3.0-en-us.png"><span class="notetitle"> </span><div class="notebody"><p id="mrs_01_1676__p0919719172411">This section applies to MRS 3.<em id="mrs_01_1676__i157479726252323">x</em> or later.</p>
|
|
<p id="mrs_01_1676__p292021911240">After cross-AZ HA is enabled for a single cluster, the HDFS NodeLabel function cannot be configured.</p>
|
|
</div></div>
|
|
<ul id="mrs_01_1676__ud147556408ff49adb7392b695f0fbf2b"><li id="mrs_01_1676__lb9f33c2680534802af21469ec7838aeb">Scenario 1: DataNode partitioning<p id="mrs_01_1676__a8fbef87c3c8540aab9c9486881edff39"><a name="mrs_01_1676__lb9f33c2680534802af21469ec7838aeb"></a><a name="lb9f33c2680534802af21469ec7838aeb"></a>Scenario description:</p>
|
|
<p id="mrs_01_1676__a59aace3f3f8b4a0fb929c23b403549af">When the data of different applications must be stored on different nodes so that it can be managed separately, label expressions can be used to isolate services by storing the data of each service on its designated nodes.</p>
|
|
<p id="mrs_01_1676__a36ba7992fe574cd48ebffcdefa77a389">By configuring the NodeLabel feature, you can perform the following operations:</p>
|
|
<ul id="mrs_01_1676__u6c5e796a000a4a89932288ccbf7874d7"><li id="mrs_01_1676__l3c65294d6e924614a9e119744d5df72f">Store data in <strong id="mrs_01_1676__b135415235752323">/Hbase</strong> on DN1, DN2, DN3, and DN4.</li><li id="mrs_01_1676__l58c6831ceb504a0db925f460783d0711">Store data in <strong id="mrs_01_1676__b85565772652323">/Spark</strong> on DN5, DN6, DN7, and DN8.</li></ul>
|
|
<div class="fignone" id="mrs_01_1676__f29094c7c7de94c108e1f8ddea541eab7"><a name="mrs_01_1676__f29094c7c7de94c108e1f8ddea541eab7"></a><a name="f29094c7c7de94c108e1f8ddea541eab7"></a><span class="figcap"><b>Figure 1 </b>DataNode partitioning scenario</span><br><span><img id="mrs_01_1676__image146210219261" src="en-us_image_0000001295930800.png"></span></div>
|
|
<div class="note" id="mrs_01_1676__ndee557d4238e4c30844a80bea021cc10"><img src="public_sys-resources/note_3.0-en-us.png"><span class="notetitle"> </span><div class="notebody"><ul id="mrs_01_1676__u3da7ea774e0645999e33220d44ec9130"><li id="mrs_01_1676__l340086f9abbd48078303ea4d758089ff">Run the <b><span class="cmdname" id="mrs_01_1676__cmdname15772194120151">hdfs nodelabel -setLabelExpression -expression 'LabelA[fallback=NONE]' -path /Hbase</span></b> command to set an expression for the <strong id="mrs_01_1676__b4654638164914">/Hbase</strong> directory. As shown in <a href="#mrs_01_1676__f29094c7c7de94c108e1f8ddea541eab7">Figure 1</a>, the data block replicas of files in the <span class="filepath" id="mrs_01_1676__fb5b0eb2f78b145d6947ac97a9d14ca7a"><b>/Hbase</b></span> directory are placed on nodes labeled with <strong id="mrs_01_1676__b116249375952323">LabelA</strong>, that is, DN1, DN2, DN3, and DN4. Similarly, run the <b><span class="cmdname" id="mrs_01_1676__cmdname12573182481611">hdfs nodelabel -setLabelExpression -expression 'LabelB[fallback=NONE]' -path /Spark</span></b> command to set an expression for the /Spark directory. Data block replicas of files in the <span class="filepath" id="mrs_01_1676__f90528509fc544633b368b14583607996"><b>/Spark</b></span> directory can be placed only on nodes labeled with <strong id="mrs_01_1676__b102025484852323">LabelB</strong>, that is, DN5, DN6, DN7, and DN8.</li><li id="mrs_01_1676__le9f01a0a87024315b95ecaf95e59b12c">For details about how to set labels for a DataNode, see <a href="#mrs_01_1676__s7752fba8102e4f20ae2c86f564e2114c">Configuration Description</a>.</li><li id="mrs_01_1676__l0c7740c1a43845bbb55d1df30fa91718">If a cluster has multiple racks, it is recommended that each label include DataNodes from each of these racks to ensure reliable data block placement.</li></ul>
|
|
</div></div>
|
|
</li></ul>
|
|
<ul id="mrs_01_1676__ue37fc79670bf4e4a8b6771c36b737d41"><li id="mrs_01_1676__lac5b1000cbb04748bc0612789ffedbf5">Scenario 2: Specifying replica location when there are multiple racks<p id="mrs_01_1676__ac35740942c0645c5a41cb6472c4e4bee"><a name="mrs_01_1676__lac5b1000cbb04748bc0612789ffedbf5"></a><a name="lac5b1000cbb04748bc0612789ffedbf5"></a>Scenario description:</p>
|
|
<p id="mrs_01_1676__af346b4cf5aa94e9b8cbe531c80b16ae2">In a heterogeneous cluster, customers need to allocate certain highly available nodes to store important commercial data. Label expressions can be used to specify replica locations so that replicas are placed on highly reliable nodes.</p>
|
|
<p id="mrs_01_1676__a8ca864bd05b74706994c64eb267698a7">Data blocks in the <span class="filepath" id="mrs_01_1676__fdb510c646cbc4c75bf15cc66cb4be16a"><b>/data</b></span> directory have three replicas by default. In this case, at least one replica is stored on a node of RACK1 or RACK2 (nodes of RACK1 and RACK2 are highly reliable), and the other two are stored separately on the nodes of RACK3 and RACK4.</p>
|
|
<div class="fignone" id="mrs_01_1676__fc11a32eb02734547a8059541295c9ee2"><span class="figcap"><b>Figure 2 </b>Scenario example</span><br><span><img id="mrs_01_1676__image6881139162619" src="en-us_image_0000001349170365.png"></span></div>
|
|
<div class="note" id="mrs_01_1676__nb48a7c782b184067bf8c11ef72fb4fb8"><img src="public_sys-resources/note_3.0-en-us.png"><span class="notetitle"> </span><div class="notebody"><p id="mrs_01_1676__a46bea0b762f141379b76c568d77429e1">Run the <b><span class="cmdname" id="mrs_01_1676__cmdname1121812301209">hdfs nodelabel -setLabelExpression -expression 'LabelA||LabelB[fallback=NONE],LabelC,LabelD' -path /data</span></b> command to set an expression for the <span class="filepath" id="mrs_01_1676__fd0093a12c6744f03873924c934030761"><b>/data</b></span> directory.</p>
|
|
<p id="mrs_01_1676__ac665cc8654d5492eab419a303c544f05">When data is written to the <span class="filepath" id="mrs_01_1676__f0bdfae0bf69341d1afd20a5572c14c48"><b>/data</b></span> directory, at least one data block replica is stored on a node labeled with LabelA or LabelB, and the other two data block replicas are stored separately on nodes labeled with LabelC and LabelD.</p>
|
|
</div></div>
|
|
</li></ul>
|
|
</div>
|
|
<div class="section" id="mrs_01_1676__s7752fba8102e4f20ae2c86f564e2114c"><a name="mrs_01_1676__s7752fba8102e4f20ae2c86f564e2114c"></a><a name="s7752fba8102e4f20ae2c86f564e2114c"></a><h4 class="sectiontitle">Configuration Description</h4><ul id="mrs_01_1676__udd722331ec2b476ab687fce909d8c325"><li id="mrs_01_1676__l44b2818f83314628835f41265293cdf2">DataNode label configuration<p id="mrs_01_1676__p39519895015"><a name="mrs_01_1676__l44b2818f83314628835f41265293cdf2"></a><a name="l44b2818f83314628835f41265293cdf2"></a>Go to the <strong id="mrs_01_1676__b1114363020264">All Configurations</strong> page of HDFS and enter a parameter name in the search box by referring to <a href="mrs_01_2125.html">Modifying Cluster Service Configuration Parameters</a>.</p>
|
|
|
|
<div class="tablenoborder"><table cellpadding="4" cellspacing="0" summary="" id="mrs_01_1676__t188e805e50714a76920d721b9c98b37e" frame="border" border="1" rules="all"><caption><b>Table 1 </b>Parameter description</caption><thead align="left"><tr id="mrs_01_1676__r87fac8bc957b41bbb77b231f36cf4db4"><th align="left" class="cellrowborder" valign="top" width="16.85%" id="mcps1.3.2.2.1.2.2.4.1.1"><p id="mrs_01_1676__a89675551370f432a82bc75ec25863977"><strong id="mrs_01_1676__a4306cddd469949669023d7de696a4a3f">Parameter</strong></p>
|
|
</th>
|
|
<th align="left" class="cellrowborder" valign="top" width="64.97%" id="mcps1.3.2.2.1.2.2.4.1.2"><p id="mrs_01_1676__aa03cc02dd6f24d92ab9472af749f557f"><strong id="mrs_01_1676__aa47bc2e6838b48d4abe6f910b0a3d178">Description</strong></p>
|
|
</th>
|
|
<th align="left" class="cellrowborder" valign="top" width="18.18%" id="mcps1.3.2.2.1.2.2.4.1.3"><p id="mrs_01_1676__ac20655d9916a4c3cb806455d653918a6"><strong id="mrs_01_1676__a3b67a00310ce46f4842c6f650e75eb4d">Default Value</strong></p>
|
|
</th>
|
|
</tr>
|
|
</thead>
|
|
<tbody><tr id="mrs_01_1676__row49502581444"><td class="cellrowborder" valign="top" width="16.85%" headers="mcps1.3.2.2.1.2.2.4.1.1 "><p id="mrs_01_1676__p19511658048">dfs.block.replicator.classname</p>
|
|
</td>
|
|
<td class="cellrowborder" valign="top" width="64.97%" headers="mcps1.3.2.2.1.2.2.4.1.2 "><p id="mrs_01_1676__p195265816414">Specifies the policy that HDFS uses to select DataNodes for block placement.</p>
|
|
<p id="mrs_01_1676__p8398145419616">To enable the NodeLabel function, set this parameter to <strong id="mrs_01_1676__b182685562352323">org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicyWithNodeLabel</strong>.</p>
|
|
</td>
|
|
<td class="cellrowborder" valign="top" width="18.18%" headers="mcps1.3.2.2.1.2.2.4.1.3 "><p id="mrs_01_1676__p29526583410">org.apache.hadoop.hdfs.server.blockmanagement.AvailableSpaceBlockPlacementPolicy</p>
|
|
</td>
|
|
</tr>
|
|
<tr id="mrs_01_1676__r26172c4ef099430ea526ad701bfb4a0c"><td class="cellrowborder" valign="top" width="16.85%" headers="mcps1.3.2.2.1.2.2.4.1.1 "><p id="mrs_01_1676__a1e214c7537104192b25c8b54e3027097">host2tags</p>
|
|
</td>
|
|
<td class="cellrowborder" valign="top" width="64.97%" headers="mcps1.3.2.2.1.2.2.4.1.2 "><p id="mrs_01_1676__ad9eb998e35ff42dfb8811ed804e94391">Used to configure a mapping between a DataNode host and a label.</p>
|
|
<p id="mrs_01_1676__a935a932bd6fe4fee92900d184aeecab7">The host can be specified using an IP address range expression (for example, <strong id="mrs_01_1676__b48204243352323">192.168.1.[1-128]</strong> or <strong id="mrs_01_1676__b171086826452323">192.168.[2-3].[1-128]</strong>) or a host name regular expression that starts and ends with a slash (/) (for example, <strong id="mrs_01_1676__b194179894352323">/datanode-[123]/</strong> or <strong id="mrs_01_1676__b16108496952323">/datanode-\d{2}/</strong>). A label name cannot contain the following characters: = / \. <strong id="mrs_01_1676__b159162526252323">Note</strong>: The IP address must be a service IP address.</p>
|
|
</td>
|
|
<td class="cellrowborder" valign="top" width="18.18%" headers="mcps1.3.2.2.1.2.2.4.1.3 "><p id="mrs_01_1676__af09f10aa91f14eceba7d0acf867e9589">-</p>
|
|
</td>
|
|
</tr>
|
|
</tbody>
|
|
</table>
|
|
</div>
|
|
<div class="note" id="mrs_01_1676__n14c06283e4414ce38cb08727ef58f945"><img src="public_sys-resources/note_3.0-en-us.png"><span class="notetitle"> </span><div class="notebody"><ul id="mrs_01_1676__u6b8170389f1d4712b2c2b3da8f5825b7"><li id="mrs_01_1676__la55f779cceb3481382a4174b38a64e6c">The <strong id="mrs_01_1676__b962613272512">host2tags</strong> configuration item is described as follows:<p id="mrs_01_1676__ac7946fd81f8b4044a41810cd75ccdd41">Assume that a cluster has 20 DataNodes, dn-1 to dn-20, whose IP addresses range from 10.1.120.1 to 10.1.120.20. The value of <strong id="mrs_01_1676__b67112861252323">host2tags</strong> can be expressed in either of the following ways:</p>
|
|
<p id="mrs_01_1676__afe108892b9314c1996be025265eebcb8"><strong id="mrs_01_1676__aa07c47a5ae524471878977c0a1268b08">Regular expression of the host name</strong></p>
|
|
<p id="mrs_01_1676__a57987788de274f85807722b96840513a"><span class="parmname" id="mrs_01_1676__pb397f6f3febf4405a8a7287a778b4d28"><b>/dn-\d/ = label-1</b></span> indicates that the labels corresponding to dn-1 to dn-9 are <span id="mrs_01_1676__p940194862b304fea92ed64816b401274">label</span>-1, that is, dn-1 = <span id="mrs_01_1676__p56303ef95f4b45528a37c98683e44d5f">label</span>-1, dn-2 = <span id="mrs_01_1676__pd01977d5b6b24beb8d81aaf7d85b5a62">label</span>-1, ..., dn-9 = <span id="mrs_01_1676__p67c20273bc7f4b22991ae1b4c4cc5193">label</span>-1.</p>
|
|
<p id="mrs_01_1676__a8e32a2064a8e4da7a6417e3db73cf4c7"><span class="parmname" id="mrs_01_1676__p247b8f218ed042ddafc06f8075cc2e7b"><b>/dn-((1[0-9]$)|(20$))/ = label-2</b></span> indicates that the labels corresponding to dn-10 to dn-20 are <span id="mrs_01_1676__p5e08b63661e645d2b5b55e575edd130f">label</span>-2, that is, dn-10 = <span id="mrs_01_1676__p0931e2420efc4bf4ae21781c1f6b5a92">label</span>-2, dn-11 = <span id="mrs_01_1676__p0b3477cd6480495f869cc0a3368484ce">label</span>-2, ..., dn-20 = <span id="mrs_01_1676__p856f62cea9924690b1e3c4b88cc447cb">label</span>-2.</p>
|
|
<p id="mrs_01_1676__a2b204f2b956445b9b28a69c901443a28"><strong id="mrs_01_1676__a48850c6d4d7449e8adba516932ef8d53">IP address range expression</strong></p>
|
|
<p id="mrs_01_1676__a08022ddc56ca4be6b1eded64a20c2029"><span class="parmname" id="mrs_01_1676__p9f9e1ba55cf44e9dbaaf0195e07d87de"><b>10.1.120.[1-9] = label-1</b></span> indicates that the labels corresponding to 10.1.120.1 to 10.1.120.9 are <span id="mrs_01_1676__p1a059258962c42be83aea7e6b53af1a0">label</span>-1, that is, 10.1.120.1 = <span id="mrs_01_1676__pca8ff23026e7403e8cffe131582b664a">label</span>-1, 10.1.120.2 = <span id="mrs_01_1676__pa1617daa0a204be3a53fdaa05d44a10b">label</span>-1, ..., and 10.1.120.9 = <span id="mrs_01_1676__pc05cee4b839e49be97ed8f4f145dbda6">label</span>-1.</p>
|
|
<p id="mrs_01_1676__afb7e30420ab447078add98204f7b2f2a"><span class="parmname" id="mrs_01_1676__p7e91d71252d7467f97b68fc247f7cdf5"><b>10.1.120.[10-20] = label-2</b></span> indicates that the labels corresponding to 10.1.120.10 to 10.1.120.20 are <span id="mrs_01_1676__p22475e722c88498980da2d24974eb6fe">label</span>-2, that is, 10.1.120.10 = <span id="mrs_01_1676__p11f160bf0f6146d18584c32ff7ce8ba9">label</span>-2, 10.1.120.11 = <span id="mrs_01_1676__pe57d1d9cc31140568b319ad2cdcf90fe">label</span>-2, ..., and 10.1.120.20 = <span id="mrs_01_1676__pa96c760e9bd4424e91a176c1eb6e5556">label</span>-2.</p>
|
|
</li></ul>
|
|
<ul id="mrs_01_1676__u7327151c48fb4622a64ae12732e30944"><li id="mrs_01_1676__l5537cb2c4f8840b3b2bf3956ddddc321">Label-based data block placement policies are applicable to capacity expansion and reduction scenarios.<p id="mrs_01_1676__abf085c27cf7f46c3adaffc4c767fafbc"><a name="mrs_01_1676__l5537cb2c4f8840b3b2bf3956ddddc321"></a><a name="l5537cb2c4f8840b3b2bf3956ddddc321"></a>A newly added DataNode will be assigned a label if the IP address of the DataNode is within the IP address range in the <strong id="mrs_01_1676__b210573413652323">host2tags</strong> configuration item or the host name of the DataNode matches the host name regular expression in the <strong id="mrs_01_1676__b202115144852323">host2tags</strong> configuration item.</p>
|
|
<p id="mrs_01_1676__a1190927120f94fc497d804feed27b41e">For example, the value of <span class="parmname" id="mrs_01_1676__p12d9241bfa7d473b8263dbfd7da50a74"><b>host2tags</b></span> is <strong id="mrs_01_1676__b16781647142919">10.1.120.[1-9] = label-1</strong>, but the current cluster has only three DataNodes: 10.1.120.1 to 10.1.120.3. If DataNode 10.1.120.4 is added for capacity expansion, it is labeled with label-1. If DataNode 10.1.120.3 is deleted or taken out of service, no data blocks are allocated to that node.</p>
|
|
</li></ul>
|
|
</div></div>
|
|
</li></ul>
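<p>The matching behavior of <b>host2tags</b> described in the note above can be illustrated with a minimal Python sketch. It is not the HDFS implementation: the function names <b>expand_ip_expression</b> and <b>labels_for</b> are hypothetical, and the sketch assumes that a regular expression key must match the whole host name, which is consistent with <b>/dn-\d/</b> covering only dn-1 to dn-9 in the example.</p>

```python
import re
from itertools import product


def expand_ip_expression(expr):
    """Expand an expression such as '10.1.120.[1-9]' into a set of IP strings."""
    octets = []
    for part in expr.strip().split("."):
        bounds = re.fullmatch(r"\[(\d+)-(\d+)\]", part)
        if bounds:
            low, high = int(bounds.group(1)), int(bounds.group(2))
            octets.append([str(n) for n in range(low, high + 1)])
        else:
            octets.append([part])
    return {".".join(combo) for combo in product(*octets)}


def labels_for(host_name, ip, rules):
    """Collect the labels of every rule whose key matches the host name or IP."""
    labels = set()
    for key, value in rules:
        key = key.strip()
        if len(key) > 1 and key.startswith("/") and key.endswith("/"):
            # Assumption: a regular expression key must match the whole host name.
            matched = re.fullmatch(key[1:-1], host_name) is not None
        else:
            matched = ip in expand_ip_expression(key)
        if matched:
            labels.update(label.strip() for label in value.split(","))
    return labels


# The two equivalent rule sets from the note above, as (key, value) pairs.
REGEX_RULES = [(r"/dn-\d/", "label-1"), (r"/dn-((1[0-9]$)|(20$))/", "label-2")]
IP_RULES = [("10.1.120.[1-9]", "label-1"), ("10.1.120.[10-20]", "label-2")]

for n in (1, 9, 10, 20):
    host, ip = f"dn-{n}", f"10.1.120.{n}"
    print(host, labels_for(host, ip, REGEX_RULES), labels_for(host, ip, IP_RULES))
```

<p>Under these assumptions, both rule sets assign the same label to each of the 20 DataNodes, and a host outside the configured ranges receives no label.</p>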
|
|
<ul id="mrs_01_1676__u56248a3d9da14678b9276b966876f231"><li id="mrs_01_1676__lb2b2395b405a444aadb70a4d645d21b6">Set label expressions for directories or files.<ul id="mrs_01_1676__u7310eef4e3b0496db941e02a5344bdfe"><li id="mrs_01_1676__led739ed6b7724c8984dea746e44f00c6">On the HDFS parameter configuration page, set <span class="parmname" id="mrs_01_1676__pf2a315fec58b4c6eb33f31e8e1e506ad"><b>path2expression</b></span> to map HDFS directories to labels. The configuration succeeds even if the specified HDFS directory does not exist. When a directory with that name is later created manually, it inherits the configured label mapping within 30 minutes. If a labeled directory is deleted and a new directory with the same name is created, the new directory inherits the mapping within 30 minutes.</li><li id="mrs_01_1676__l6cefcc3923d14e4d8c5814693358e864">To set label expressions using commands, see the <strong id="mrs_01_1676__a8f3ccbe838104e9ab89562a6be9ebbd7"><strong id="mrs_01_1676__b12427191611192">hdfs nodelabel</strong> -setLabelExpression</strong> command.</li><li id="mrs_01_1676__l4739887a6d604fb9b25c4964f9db9ec1">To set label expressions using the Java API, call the <strong id="mrs_01_1676__b86089920120">setLabelExpression(String src, String labelExpression)</strong> method on an instantiated NodeLabelFileSystem object. <i><span class="varname" id="mrs_01_1676__vfcc9cadc13c74633bdd31b671a956064">src</span></i> indicates a directory or file path on HDFS, and <span class="parmname" id="mrs_01_1676__p977fa35379d243ea86ce9814b3d1363b"><b>labelExpression</b></span> indicates the label expression.</li></ul>
|
|
</li></ul>
|
|
<ul id="mrs_01_1676__ue1e6c227ece342ce90f61dff3c94a009"><li id="mrs_01_1676__l8e36ed63c12c4c50b16a50bb7b34507e">After NodeLabel is enabled, you can run the <strong id="mrs_01_1676__b1620411217109">hdfs nodelabel -listNodeLabels</strong> command to view the label information of each DataNode.</li></ul>
|
|
</div>
|
|
<div class="section" id="mrs_01_1676__sfb2d69c75cde4c6eb6e2c3a0fb8e403a"><h4 class="sectiontitle">Block Replica Location Selection</h4><p id="mrs_01_1676__a59296d7d2d1d44f1bf9f611fc7d4165b">NodeLabel supports different placement policies for replicas. The expression <span class="parmname" id="mrs_01_1676__p6247f38acef54cb1821aa849127432e4"><b>label-1,label-2,label-3</b></span> indicates that three replicas are placed on DataNodes labeled with label-1, label-2, and label-3, respectively. Different replica policies are separated by commas (,).</p>
|
|
<p id="mrs_01_1676__ae72403f8dc91497f89d15cf44d0a4cd2">To place two replicas on DataNodes with label-1, set the expression as follows: <span class="parmname" id="mrs_01_1676__pab88a65fc5a14134af462e180a4a90b2"><b>label-1[replica=2],label-2,label-3</b></span>. In this case, if the default number of replicas is 3, two nodes with label-1 and one node with label-2 are selected. If the default number of replicas is 4, two nodes with label-1, one node with label-2, and one node with label-3 are selected. Replicas are assigned to the replica policies from left to right, and each policy receives the number of replicas it specifies. If the number of replicas exceeds the total specified by the expression, the extra replicas follow the last policy. For example, if the default number of replicas is 5, the extra replica is placed on a node labeled with label-3.</p>
|
|
<p id="mrs_01_1676__aecf4aac64e5b4f248342842bab7d8892">When the ACL function is enabled and the user does not have permission to access a label used in the expression, DataNodes with that label are not selected for replicas.</p>
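<p>The left-to-right assignment of replicas to replica policies can be sketched in Python as follows. This is an illustration of the rule as described in this section, not HDFS code: <b>split_policies</b> and <b>replica_counts</b> are hypothetical names, a policy without <b>replica=</b> is assumed to request one replica, and ACL checks and fallback handling are not modeled.</p>

```python
import re


def split_policies(expression):
    """Split an expression at commas that are outside [...] attribute blocks."""
    parts, depth, current = [], 0, ""
    for ch in expression:
        if ch == "[":
            depth += 1
        elif ch == "]":
            depth -= 1
        if ch == "," and depth == 0:
            parts.append(current.strip())
            current = ""
        else:
            current += ch
    parts.append(current.strip())
    return parts


def replica_counts(expression, total_replicas):
    """Return (label expression, replica count) pairs, filled left to right."""
    plan = []
    for part in split_policies(expression):
        found = re.search(r"replica=(\d+)", part)
        wanted = int(found.group(1)) if found else 1  # assumed default of 1
        plan.append([part.split("[", 1)[0].strip(), wanted])

    remaining = total_replicas
    for entry in plan:
        entry[1] = min(entry[1], remaining)
        remaining -= entry[1]
    if remaining > 0:
        plan[-1][1] += remaining  # extra replicas follow the last policy
    return [(label, count) for label, count in plan if count > 0]


for total in (3, 4, 5):
    print(total, replica_counts("label-1[replica=2],label-2,label-3", total))
```

<p>For 3, 4, and 5 replicas, the sketch reproduces the distributions described above: two replicas for label-1 in every case, then one for label-2, then one or two for label-3.</p>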
|
|
</div>
|
|
<div class="section" id="mrs_01_1676__se2ecff28a93849ee85d52faf0582caec"><h4 class="sectiontitle">Deletion of Redundant Block Replicas</h4><p id="mrs_01_1676__a2f3a8f23ed904a1fbd63929aef016868">If the number of block replicas exceeds the value of <span class="parmname" id="mrs_01_1676__pcf9ad68ad28a461faca3d0f3760397b4"><b>dfs.replication</b></span> (number of file replicas specified by the user), HDFS deletes the redundant block replicas to conserve cluster resources.</p>
|
|
<p id="mrs_01_1676__a2de4f093594e41aa91d317435c0ab04c">The deletion rules are as follows:</p>
|
|
<ul id="mrs_01_1676__u339bbe7571144f098d180a25a09bd36f"><li id="mrs_01_1676__lab9ab034269849ba964ec190dc95b206">Preferentially delete replicas that do not meet any expression.<p id="mrs_01_1676__ad62e046136724c18aa3d61875327ade3"><a name="mrs_01_1676__lab9ab034269849ba964ec190dc95b206"></a><a name="lab9ab034269849ba964ec190dc95b206"></a>For example: The default number of file replicas is <strong id="mrs_01_1676__b95105355952323">3</strong>.</p>
|
|
<p id="mrs_01_1676__a59f66e8d467a4de09febcbe4a139803f">The label expression of <strong id="mrs_01_1676__b182944214452323">/test</strong> is <strong id="mrs_01_1676__b40060138252323">LA[replica=1],LB[replica=1],LC[replica=1]</strong>.</p>
|
|
<p id="mrs_01_1676__aa96e605b1e4a430dbf9ca4f5ef457dd3">The file replicas of <strong id="mrs_01_1676__b138443594852323">/test</strong> are distributed on four nodes (D1 to D4), corresponding to labels (LA to LD).</p>
|
|
<pre class="screen" id="mrs_01_1676__sf2ba778759d14bd59c44a162f0ebe7aa">D1:LA
D2:LB
D3:LC
D4:LD</pre>
|
|
<p id="mrs_01_1676__a49c17a4dd07942d187402160e6fb5056">Then, block replicas on node D4 will be deleted.</p>
|
|
</li><li id="mrs_01_1676__le9ba42bda00141f2b55403cdddcfb0a9">If all replicas meet the expressions, delete the redundant replicas which are beyond the number specified by the expression.<p id="mrs_01_1676__a61dbbb137a3545e086674c26e7e9b079"><a name="mrs_01_1676__le9ba42bda00141f2b55403cdddcfb0a9"></a><a name="le9ba42bda00141f2b55403cdddcfb0a9"></a>For example: The default number of file replicas is <strong id="mrs_01_1676__b42525814652323">3</strong>.</p>
|
|
<p id="mrs_01_1676__afad69e43a82f4c9294c1de2a6acf3e45">The label expression of <strong id="mrs_01_1676__b82110174052323">/test</strong> is <strong id="mrs_01_1676__b206089614552323">LA[replica=1],LB[replica=1],LC[replica=1]</strong>.</p>
|
|
<p id="mrs_01_1676__aa6172ff0d81543afa9ed7165c32c5964">The file replicas of <strong id="mrs_01_1676__b117931411552323">/test</strong> are distributed on the following four nodes, corresponding to the following labels.</p>
|
|
<pre class="screen" id="mrs_01_1676__s78a10c9903a74cc582c7009b69b696e4">D1:LA
D2:LA
D3:LB
D4:LC</pre>
|
|
<p id="mrs_01_1676__a0f7b725c34be4484b454869d16f85686">Then, block replicas on node D1 or D2 will be deleted.</p>
|
|
</li><li id="mrs_01_1676__l14ed41069b8c4667a2635ba084aa01fe">If the file owner or the file owner's group cannot access a label, preferentially delete the replica from the DataNode mapped to that label.</li></ul>
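<p>The first two deletion rules can be sketched in Python as follows. The sketch only illustrates the two examples in this section under simplifying assumptions: each DataNode carries a single label, each replica policy is a plain label with a replica count, and the ACL-based rule is not modeled. The function name <b>deletion_candidates</b> is hypothetical.</p>

```python
def deletion_candidates(policies, replica_labels):
    """Return the nodes whose replica may be deleted first.

    policies: list of (label, replica count) pairs from the label expression.
    replica_labels: mapping of node name to the single label of that node.
    """
    wanted = dict(policies)

    # Rule 1: replicas on nodes that match no policy are deleted first.
    unmatched = [node for node, label in replica_labels.items() if label not in wanted]
    if unmatched:
        return sorted(unmatched)

    # Rule 2: otherwise, pick replicas of labels that hold more than requested.
    holders = {}
    for node, label in replica_labels.items():
        holders.setdefault(label, []).append(node)
    surplus = []
    for label, nodes in holders.items():
        if len(nodes) > wanted[label]:
            surplus.extend(nodes)
    return sorted(surplus)


POLICIES = [("LA", 1), ("LB", 1), ("LC", 1)]  # LA[replica=1],LB[replica=1],LC[replica=1]

print(deletion_candidates(POLICIES, {"D1": "LA", "D2": "LB", "D3": "LC", "D4": "LD"}))
print(deletion_candidates(POLICIES, {"D1": "LA", "D2": "LA", "D3": "LB", "D4": "LC"}))
```

<p>In the first case only D4 is a candidate. In the second case D1 and D2 are both candidates, and one of them loses its replica.</p>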
|
|
</div>
|
|
<div class="section" id="mrs_01_1676__sd2fc43fb8e164318b29fab773548fcbf"><h4 class="sectiontitle">Example of Label-based Block Placement Policy</h4><p id="mrs_01_1676__a2157b03c58c64d50b355341c8956d5a8">Assume that a cluster has six DataNodes, dn-1, dn-2, dn-3, dn-4, dn-5, and dn-6, and that the corresponding IP address range is 10.1.120.[1-6]. Six directories and one file must be configured with label expressions. The default number of block replicas is <strong id="mrs_01_1676__b120490373852323">3</strong>.</p>
|
|
<ul id="mrs_01_1676__ubfcf019abd994ab39d8580d251bc4529"><li id="mrs_01_1676__ld63dab23ee5f47528894630b1af3006e">The following shows three ways to express the DataNode labels in the <span class="filepath" id="mrs_01_1676__f395de590c6394f049e9ae3209cabd656"><b>host2labels</b></span> file. The three expressions are equivalent.<ul id="mrs_01_1676__u917789efd34f43cabd52394abc3efc31"><li id="mrs_01_1676__l322ae81a439c4094995a4aad30c76114">Regular expression of the host name<pre class="screen" id="mrs_01_1676__s35a05dc8f2804ad3b24a203e2e57d872">/dn-[1456]/ = label-1,label-2
/dn-[26]/ = label-1,label-3
/dn-[3456]/ = label-1,label-4
/dn-5/ = label-5</pre>
|
|
</li><li id="mrs_01_1676__l6fb91104bedc4b32bdd5655db82f9602">IP address range expression<pre class="screen" id="mrs_01_1676__sf477760e5d044da7bb87db16a627a4c2">10.1.120.[1-6] = label-1
10.1.120.1 = label-2
10.1.120.2 = label-3
10.1.120.[3-6] = label-4
10.1.120.[4-6] = label-2
10.1.120.5 = label-5
10.1.120.6 = label-3</pre>
|
|
</li><li id="mrs_01_1676__lc2dfb456472f4651b2725aec72184c48">Common host name expression<pre class="screen" id="mrs_01_1676__sa7108c8e9fa6451eabc590cbaacd8073">/dn-1/ = label-1, label-2
/dn-2/ = label-1, label-3
/dn-3/ = label-1, label-4
/dn-4/ = label-1, label-2, label-4
/dn-5/ = label-1, label-2, label-4, label-5
/dn-6/ = label-1, label-2, label-3, label-4</pre>
|
|
</li></ul>
|
|
</li></ul>
|
|
<ul id="mrs_01_1676__uee027530b85848fdbe5f2ab7cf5c857f"><li id="mrs_01_1676__lc775441bc7314e7dbfa5c3af0f094952">The label expressions of the directories are set as follows:<pre class="screen" id="mrs_01_1676__s9cd8247cb0e94c35ac5bebd5cabe436f">/dir1 = label-1
/dir2 = label-1 && label-3
/dir3 = label-2 || label-4[replica=2]
/dir4 = (label-2 || label-3) && label-4
/dir5 = !label-1
/sdir2.txt = label-1 && label-3[replica=3,fallback=NONE]
/dir6 = label-4[replica=2],label-2</pre>
|
|
<div class="note" id="mrs_01_1676__ncf77820b177d4de5aabd63625f30b24d"><img src="public_sys-resources/note_3.0-en-us.png"><span class="notetitle"> </span><div class="notebody"><p id="mrs_01_1676__p1692516413263">For details about the label expression configuration, see the <strong id="mrs_01_1676__aea4b2952e3f542cab5b101ebecbf6ee7">hdfs nodelabel -setLabelExpression</strong> command.</p>
|
|
</div></div>
|
|
<p id="mrs_01_1676__aa2f19c535d9a4202b6886d1b3a3cc87d">The file data block storage locations are as follows:</p>
|
|
<ul id="mrs_01_1676__u9d1ac4c8d799424ea7e0c8cadff0b380"><li id="mrs_01_1676__la73edc440c534bd491efaf6a46222ab0">Data blocks of files in the <span class="filepath" id="mrs_01_1676__f74d725bc67aa4f08a423e857e9caed85"><b>/dir1</b></span> directory can be stored on any of the following nodes: dn-1, dn-2, dn-3, dn-4, dn-5, and dn-6.</li><li id="mrs_01_1676__la2dbac41e0ed421ba6231fbbcb554f93">Data blocks of files in the <span class="filepath" id="mrs_01_1676__fd188a9880dc14d3cb4f84e9a21aac3cc"><b>/dir2</b></span> directory can be stored on the dn-2 and dn-6 nodes. The default number of block replicas is <strong id="mrs_01_1676__b72650957852323">3</strong>, but the expression matches only two DataNodes, so the third replica is stored on one of the remaining nodes in the cluster.</li><li id="mrs_01_1676__l2e1fe62d6b3047909fb1ec6b51cb8985">Data blocks of files in the <span class="filepath" id="mrs_01_1676__f93669b24bf2842949f6d6182839af887"><b>/dir3</b></span> directory can be stored on any three of the following nodes: dn-1, dn-3, dn-4, dn-5, and dn-6.</li><li id="mrs_01_1676__l07dfb04615164edc9e480022e26b28e7">Data blocks of files in the <span class="filepath" id="mrs_01_1676__f6bcfa9eef7af4beea41f275dbfe9c56c"><b>/dir4</b></span> directory can be stored on the dn-4, dn-5, and dn-6 nodes.</li><li id="mrs_01_1676__l4edf11172329408bae6806ebdecf3caf">The expression of the <span class="filepath" id="mrs_01_1676__f073206475ed949a0ac430781da72bbdd"><b>/dir5</b></span> directory does not match any DataNode, so data blocks of files in this directory are stored on any three nodes in the cluster, which is the same as the default block selection policy.</li><li id="mrs_01_1676__leb21c1a5419949c49a2acf93bd95b206">For the data blocks of the <span class="filepath" id="mrs_01_1676__f7b0da468b64945eead38ae86a93f13e7"><b>/sdir2.txt</b></span> file, two replicas are stored on the dn-2 and dn-6 nodes. The remaining replica is not stored on any node because <strong id="mrs_01_1676__b12715604152323">fallback=NONE</strong> is set.</li><li id="mrs_01_1676__l76f7c402535c43aa84fb3cb4c7af7654">Data blocks of files in the <span class="filepath" id="mrs_01_1676__fd391efe267ff4200827de2c993e1f3da"><b>/dir6</b></span> directory are stored on two nodes with label-4, selected from dn-3, dn-4, dn-5, and dn-6, and on another node with label-2. If the specified number of file replicas in the <span class="filepath" id="mrs_01_1676__f72197cbcf824468da79167eba6e48a84"><b>/dir6</b></span> directory is more than 3, the extra replicas are stored on nodes with label-2.</li></ul>
|
|
</li></ul>
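<p>The candidate DataNodes listed above can be reproduced with a small Python evaluator for single-policy label expressions. It is a sketch for checking the example, not the HDFS parser: the names <b>tokenize</b>, <b>matches</b>, and <b>nodes_for</b> are hypothetical, attributes in square brackets are ignored because they do not change which nodes match, and <b>&&</b> is assumed to bind more tightly than <b>||</b>. Comma-separated policies such as the one for /dir6 are not handled.</p>

```python
import re

# Labels of each DataNode, taken from the common host name expression above.
NODE_LABELS = {
    "dn-1": {"label-1", "label-2"},
    "dn-2": {"label-1", "label-3"},
    "dn-3": {"label-1", "label-4"},
    "dn-4": {"label-1", "label-2", "label-4"},
    "dn-5": {"label-1", "label-2", "label-4", "label-5"},
    "dn-6": {"label-1", "label-2", "label-3", "label-4"},
}

TOKEN = re.compile(r"\s*(&&|\|\||!|\(|\)|[A-Za-z0-9_-]+)")


def tokenize(expression):
    # Drop attribute blocks such as [replica=2] or [replica=3,fallback=NONE].
    text = re.sub(r"\[[^\]]*\]", "", expression).strip()
    tokens, pos = [], 0
    while pos < len(text):
        found = TOKEN.match(text, pos)
        if found is None:
            raise ValueError(f"unexpected character at position {pos}: {text!r}")
        tokens.append(found.group(1))
        pos = found.end()
    return tokens


def matches(expression, labels):
    """Evaluate the expression against the label set of one DataNode."""
    tokens = tokenize(expression)
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def take():
        nonlocal pos
        token = tokens[pos]
        pos += 1
        return token

    def parse_or():
        value = parse_and()
        while peek() == "||":
            take()
            right = parse_and()
            value = value or right
        return value

    def parse_and():
        value = parse_not()
        while peek() == "&&":
            take()
            right = parse_not()
            value = value and right
        return value

    def parse_not():
        if peek() == "!":
            take()
            return not parse_not()
        if peek() == "(":
            take()
            value = parse_or()
            if take() != ")":
                raise ValueError("missing closing parenthesis")
            return value
        return take() in labels

    result = parse_or()
    if pos != len(tokens):
        raise ValueError("trailing tokens in expression")
    return result


def nodes_for(expression):
    return sorted(n for n, labels in NODE_LABELS.items() if matches(expression, labels))


for path, expr in [("/dir2", "label-1 && label-3"),
                   ("/dir4", "(label-2 || label-3) && label-4"),
                   ("/dir5", "!label-1")]:
    print(path, nodes_for(expr))
```

<p>An empty result, as for /dir5, corresponds to the case where no DataNode matches and the default block selection policy applies.</p>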
|
|
</div>
|
|
<div class="section" id="mrs_01_1676__se981e8a1a8614c5b956dd6b4d1520955"><h4 class="sectiontitle">Restrictions</h4><p id="mrs_01_1676__p153041618172612">In configuration files, <span class="parmname" id="mrs_01_1676__pd8487cb5dfdb40a98cced23d9a8324f2"><b>key</b></span> and <span class="parmname" id="mrs_01_1676__pa1e583e2d40f45ca8c1e43f6a5c28ecf"><b>value</b></span> are separated by an equal sign (=), a colon (:), or whitespace. Therefore, the host name used as the <span class="parmname" id="mrs_01_1676__p543d878bf1414e809dc56ba190572154"><b>key</b></span> cannot contain these characters, because they may be interpreted as separators.</p>
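<p>The effect of this restriction can be seen in a short Python sketch. It assumes that the configuration file is parsed in the style of Java properties files, where the key ends at the first equal sign, colon, or whitespace. The function <b>split_key_value</b> is hypothetical and only illustrates that assumption.</p>

```python
import re


def split_key_value(line):
    """Split a line at the first '=', ':' or whitespace after the key."""
    found = re.match(r"\s*([^=:\s]+)\s*[=:]?\s*(.*)", line)
    if found is None:
        raise ValueError("no key found in line")
    return found.group(1), found.group(2).rstrip()


print(split_key_value("10.1.120.[1-9] = label-1"))
print(split_key_value("dn:1 = label-1"))  # the colon cuts the key short
```

<p>In the second call the key is truncated at the colon, so the rule would no longer refer to the intended host.</p>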
|
|
</div>
|
|
</div>
|
|
<div>
|
|
<div class="familylinks">
|
|
<div class="parentlink"><strong>Parent topic:</strong> <a href="mrs_01_0790.html">Using HDFS</a></div>
|
|
</div>
|
|
</div>
|
|
|