
<a name="mrs_01_1416"></a>

<h1 class="topictitle1">CarbonData Data Migration</h1>
<div id="body1595920210166"><div class="section" id="mrs_01_1416__sd016475a013e4017a1cd8fa815052c79"><h4 class="sectiontitle">Scenario</h4><p id="mrs_01_1416__acd71bdd1a8024c16a13d82c6e9b9bddd">To quickly migrate CarbonData data from one cluster to another, use the CarbonData backup and restoration commands. With this method, the data does not need to be imported again in the target cluster, which reduces the migration time.</p>
</div>
<div class="section" id="mrs_01_1416__sad2227036f784f4b86dd3bf1ade02c31"><h4 class="sectiontitle">Prerequisites</h4><p id="mrs_01_1416__p1574534153714">The Spark2x client has been installed in both clusters, for example, in the <strong id="mrs_01_1416__b645342832012">/opt/client</strong> directory. The source cluster is cluster A, and the target cluster is cluster B.</p>
</div>
<div class="section" id="mrs_01_1416__sa74003b14482484db8a4b23fe312c2e6"><h4 class="sectiontitle">Procedure</h4><ol id="mrs_01_1416__ob48c9d3a8cb34b4c931240af1e930349"><li id="mrs_01_1416__laa4d0f3b90154536a901cc53d9d38209"><span>Log in to the client node of cluster A as the client installation user.</span></li><li id="mrs_01_1416__la0a212dd77b845be89530e70a4a53418"><span>Run the following commands to configure the environment variables:</span><p><p id="mrs_01_1416__af49ec266fb67453eaffbce9e9fdb6f0c"><strong id="mrs_01_1416__af1814c1a96124dcbaedc330d88f54f27">source /opt/client/bigdata_env</strong></p>
<p id="mrs_01_1416__ab573e11a3b944389a445dac4e2aeceee"><strong id="mrs_01_1416__a6d8c574e519e4d64a0064b703e10ade1">source /opt/client/Spark2x/component_env</strong></p>
</p></li><li id="mrs_01_1416__lc048829d498a4604827caf8a12039036"><span>If the cluster is in security mode, run the following command to authenticate the user. In normal mode, skip user authentication.</span><p><p id="mrs_01_1416__af339daf88d724297b2f60b2934625bd1"><strong id="mrs_01_1416__adb1086ace6c145539319c127f502b483">kinit </strong><em id="mrs_01_1416__afb1a4caa31874224b566e0c35efda50a">carbondatauser</em></p>
<p id="mrs_01_1416__a6457666748a048c3bd40137e5a37ef97"><em id="mrs_01_1416__i18131152343718">carbondatauser</em> is the user of the original data in cluster A, that is, a user who has read and write permissions on the tables.</p>
<div class="note" id="mrs_01_1416__note159935396372"><img src="public_sys-resources/note_3.0-en-us.png"><span class="notetitle"> </span><div class="notebody"><p id="mrs_01_1416__p14993539113719">You must add the user to the <strong id="mrs_01_1416__b13396313131318">hadoop</strong> (primary group) and <strong id="mrs_01_1416__b0307202911615">hive</strong> groups, and associate it with the <strong id="mrs_01_1416__b739601361318">System_administrator</strong> role.</p>
</div></div>
</p></li><li id="mrs_01_1416__leef25bd89bfd401bb258969d6f21041b"><span>Run the following commands to connect to the database and check the HDFS directory where the table data is stored:</span><p><p id="mrs_01_1416__aebab4cda4c6d438f8d4e11ae202fb22e"><strong id="mrs_01_1416__b870414692215">spark-beeline</strong></p>
<p id="mrs_01_1416__af2b9c9eb15e44eb0a8313d6454b7a075"><strong id="mrs_01_1416__a7a220d8ff39244199c3af06555db80a1">desc formatted<strong id="mrs_01_1416__a015bf47cb2a440ccb1c1be77f866afa8"> </strong></strong><em id="mrs_01_1416__a3842bb492d614d5cb9e1cc70930452ab">Name of the table containing the original data</em><strong id="mrs_01_1416__a34b69a758ab74395b67e9718a1ddeb69">;</strong></p>
<p id="mrs_01_1416__a3d0248a2c75a40e280142244a1d9e72e"><span class="parmname" id="mrs_01_1416__pbbdb9bd8bc4e497eaa2471cf040d456d"><b>Location</b></span> in the displayed information indicates the directory where the data file resides.</p>
</p></li><li id="mrs_01_1416__le50accb58ab14602a3e5696a40acfa86"><span>Log in to the client node of cluster B as the client installation user and configure the environment variables:</span><p><p id="mrs_01_1416__en-us_topic_0095127612_p12791562001"><strong id="mrs_01_1416__en-us_topic_0095127612_b115124122001">source /opt/client/bigdata_env</strong></p>
<p id="mrs_01_1416__en-us_topic_0095127612_p365028442001"><strong id="mrs_01_1416__en-us_topic_0095127612_b600901422001">source /opt/client/Spark2x/component_env</strong></p>
</p></li><li id="mrs_01_1416__en-us_topic_0095127612_li39403722001"><span>If the cluster is in security mode, run the following command to authenticate the user. In normal mode, skip user authentication.</span><p><p id="mrs_01_1416__en-us_topic_0095127612_p354633492001"><strong id="mrs_01_1416__en-us_topic_0095127612_b507346922001">kinit </strong><em id="mrs_01_1416__a9faa52ee3aeb49dca39e304da6c82854">carbondatauser2</em></p>
<p id="mrs_01_1416__en-us_topic_0095127612_p86068492001"><em id="mrs_01_1416__i9304153053718">carbondatauser2</em> is the user who uploads the data in cluster B.</p>
<div class="note" id="mrs_01_1416__note936214215381"><img src="public_sys-resources/note_3.0-en-us.png"><span class="notetitle"> </span><div class="notebody"><p id="mrs_01_1416__p83631425384">You must add the user to the <strong id="mrs_01_1416__b688613741716">hadoop</strong> (primary group) and <strong id="mrs_01_1416__b988653711715">hive</strong> groups, and associate it with the <strong id="mrs_01_1416__b16887337191718">System_administrator</strong> role.</p>
</div></div>
</p></li><li id="mrs_01_1416__ld16d6a27d65f4e2c9181394e7ffde032"><span>Run the <strong id="mrs_01_1416__b20400763794853">spark-beeline</strong> command to connect to the database.</span></li><li id="mrs_01_1416__ld70b46b251b549858924c92d85c4d48b"><span>Does a database with the same name as the database of the original data exist in cluster B?</span><p><ul id="mrs_01_1416__u8b533238dbb943c5af1a330956557f98"><li id="mrs_01_1416__l2a69abf907e44cdda5bd25c75037eaa1">If yes, go to <a href="#mrs_01_1416__lb95e9d29c6fc469a8375f190f4136467">9</a>.</li><li id="mrs_01_1416__l64571313012d45c9801c86d506b1bafe">If no, run the <strong id="mrs_01_1416__b9105752172015">create database</strong> <em id="mrs_01_1416__i1140151152115">Database name</em> command to create a database with the same name as the database of the original data, and then go to <a href="#mrs_01_1416__lb95e9d29c6fc469a8375f190f4136467">9</a>.</li></ul>
</p></li><li id="mrs_01_1416__lb95e9d29c6fc469a8375f190f4136467"><a name="mrs_01_1416__lb95e9d29c6fc469a8375f190f4136467"></a><a name="lb95e9d29c6fc469a8375f190f4136467"></a><span>Copy the original data from the HDFS directory in cluster A to an HDFS directory in cluster B.</span><p><p id="mrs_01_1416__ad8181de316f94f2b86915aa4b1d48812">When uploading data to cluster B, ensure that the upload directory contains subdirectories with the same names as the database and table in the original directory, and that the upload user has write permission on the upload directory. After the upload, the user has read and write permissions on the data.</p>
<p id="mrs_01_1416__a9e12cb663fd048de9f21f3d2b186e6b3">For example, if the original data is stored in <span class="filepath" id="mrs_01_1416__filepath10302153411218"><b>/user/carbondatauser/warehouse/db1/tb1</b></span>, the data can be stored in <span class="filepath" id="mrs_01_1416__filepath1930883492112"><b>/user/carbondatauser2/warehouse/db1/tb1</b></span> in the new cluster.</p>
<ol type="a" id="mrs_01_1416__ol1472865333919"><li id="mrs_01_1416__li87285532397">Run the following command to download the original data to the <strong id="mrs_01_1416__b1485165582113">/opt/backup</strong> directory on the client node of cluster A:<p id="mrs_01_1416__p6231255153215"><strong id="mrs_01_1416__b084217201348">hdfs dfs -get</strong><strong id="mrs_01_1416__b4584162153410"> /user/carbondatauser/warehouse/db1/tb1</strong><strong id="mrs_01_1416__b6842112018347"> /opt/</strong><strong id="mrs_01_1416__b16899133833513">backup</strong></p>
</li><li id="mrs_01_1416__li1470611054012">Run the following command to copy the original data from cluster A to the <strong id="mrs_01_1416__b1961161113229">/opt/backup</strong> directory on the client node of cluster B:<p id="mrs_01_1416__p147816404352"><strong id="mrs_01_1416__b651641312313">scp -r /opt/backup root@</strong><em id="mrs_01_1416__i756650113817">IP address of the client node of cluster B</em>:<strong id="mrs_01_1416__b1215712194382">/opt/</strong><strong id="mrs_01_1416__b51579195386">backup</strong></p>
</li><li id="mrs_01_1416__li8360830144119">Run the following command to upload the data copied to cluster B to HDFS:<p id="mrs_01_1416__p38771783444"><a name="mrs_01_1416__li8360830144119"></a><a name="li8360830144119"></a><strong id="mrs_01_1416__b2087711817446">hdfs dfs -put</strong><strong id="mrs_01_1416__b128771087449"> /opt/</strong><strong id="mrs_01_1416__b1087712811448">backup</strong> <strong id="mrs_01_1416__b78778824419">/user/carbondatauser2/warehouse/db1/tb1</strong></p>
</li></ol>
</p></li><li id="mrs_01_1416__laf7ce95fc3cc4ab2a96640541690ed30"><a name="mrs_01_1416__laf7ce95fc3cc4ab2a96640541690ed30"></a><a name="laf7ce95fc3cc4ab2a96640541690ed30"></a><span>On the client node of cluster B, run the following command to create the Hive metadata for the table that contains the original data:</span><p><p id="mrs_01_1416__a961fd13e6d41491085c538ec0e269f10"><strong id="mrs_01_1416__b1173711915542">REFRESH TABLE<em id="mrs_01_1416__i673618198544"> </em></strong><em id="mrs_01_1416__i475610197547">$dbName.$tbName</em><strong id="mrs_01_1416__b18737819195420"><em id="mrs_01_1416__i773701913548">;</em></strong></p>
<p id="mrs_01_1416__ac13c929095fc4ad1aded1cd9afff5e80"><em id="mrs_01_1416__i155089392610125">$dbName</em> indicates the database name, and <em id="mrs_01_1416__i6444916110125">$tbName</em> indicates the table name.</p>
</p></li><li id="mrs_01_1416__li12999187252"><span>If the original table has an index table, repeat steps <a href="#mrs_01_1416__lb95e9d29c6fc469a8375f190f4136467">9</a> and <a href="#mrs_01_1416__laf7ce95fc3cc4ab2a96640541690ed30">10</a> for the index table to migrate its directory from cluster A to cluster B.</span></li><li id="mrs_01_1416__ld68042a78e074d1e8c115533b1cdc285"><span>Run the following command to register the index table on the CarbonData main table (skip this step if the original table has no index table):</span><p><p id="mrs_01_1416__afce681d7130942b1ae3ca563ff96d80e"><strong id="mrs_01_1416__en-us_topic_0095127612_b651619822035">REGISTER INDEX TABLE </strong><em id="mrs_01_1416__en-us_topic_0095127612_i445229222035">$tableName</em> ON <em id="mrs_01_1416__a79b03b51394a474a99cc291ed6e571e8">$maintable</em>;</p>
<p id="mrs_01_1416__a2fe8a31a566945ada8ed9eca9f3ed3de"><em id="mrs_01_1416__i20572470510341">$tableName</em> indicates the index table name, and <em id="mrs_01_1416__i1486716401757">$maintable</em> indicates the main table name.</p>
</p></li></ol>
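<p>The copy in step 9 can be sketched as a small shell script that only prints the commands so they can be reviewed before they are run. This is a minimal sketch, not an MRS tool: the <strong>target_dir</strong> and <strong>print_plan</strong> function names, the cluster B address 192.0.2.10 (a documentation placeholder), and the assumption that the target path keeps the <em>user</em>/warehouse/<em>database</em>/<em>table</em> layout of the example paths are all illustrative. It uses <strong>scp -r</strong> because the staging directory is a directory, and it assumes that <strong>/opt/backup</strong> does not exist beforehand on either client node; if it does, <strong>hdfs dfs -get</strong> and <strong>scp -r</strong> typically place the table directory inside it as a subdirectory.</p>

```shell
#!/usr/bin/env bash
# Print the step 9 commands for one table (sketch; nothing is executed).
# All paths, users, and the host address are illustrative assumptions.
set -euo pipefail

# Hypothetical helper: map /user/<user>/warehouse/<db>/<table> (or the full
# hdfs:// Location) to the same <db>/<table> under the target user's warehouse.
target_dir() {
  local src="${1%/}" dst_user="$2"
  local table db
  table="$(basename "$src")"
  db="$(basename "$(dirname "$src")")"
  echo "/user/${dst_user}/warehouse/${db}/${table}"
}

# Hypothetical helper: print the a/b/c sub-steps of step 9.
print_plan() {
  local src="$1" dst_user="$2" host_b="$3" backup="$4"
  echo "# cluster A client node"
  echo "hdfs dfs -get ${src} ${backup}"
  echo "scp -r ${backup} root@${host_b}:${backup}"
  echo "# cluster B client node"
  echo "hdfs dfs -put ${backup} $(target_dir "$src" "$dst_user")"
}

print_plan /user/carbondatauser/warehouse/db1/tb1 carbondatauser2 192.0.2.10 /opt/backup
```

<p>Run the printed <strong>hdfs dfs -get</strong> and <strong>scp</strong> lines on the cluster A client node and the <strong>hdfs dfs -put</strong> line on the cluster B client node, each after configuring the environment variables and authenticating in that cluster.</p>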
</div>
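<p>The SQL statements for steps 8, 10, 11, and 12 can likewise be generated per table and pasted into <strong>spark-beeline</strong> in cluster B. This is a hedged sketch: the <strong>migration_sql</strong> function, the use of <strong>CREATE DATABASE IF NOT EXISTS</strong> to cover both branches of step 8, the <strong>USE</strong> statement, and the index table name <strong>tb1_idx</strong> are assumptions for illustration; the <strong>REFRESH TABLE</strong> and <strong>REGISTER INDEX TABLE</strong> forms follow the commands in this section.</p>

```shell
#!/usr/bin/env bash
# Print the spark-beeline statements for one migrated table (sketch only).
# Arguments: database, main table, then zero or more index table names.
set -euo pipefail

migration_sql() {
  local db="$1" table="$2"
  shift 2
  # Step 8: create the database only if it does not exist yet.
  echo "CREATE DATABASE IF NOT EXISTS ${db};"
  echo "USE ${db};"
  # Step 10: create the Hive metadata for the main table.
  echo "REFRESH TABLE ${db}.${table};"
  local index
  # Step 11: repeat step 10 for each copied index table directory.
  for index in "$@"; do
    echo "REFRESH TABLE ${db}.${index};"
  done
  # Step 12: register each index table on the main table.
  for index in "$@"; do
    echo "REGISTER INDEX TABLE ${index} ON ${table};"
  done
}

# Example: main table db1.tb1 with one hypothetical index table tb1_idx.
migration_sql db1 tb1 tb1_idx
```

<p>Calling the function without index table names prints only the database and main table statements, which matches skipping steps 11 and 12.</p>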
</div>