doc-exports/docs/dws/dev/dws_04_0212.html
Lu, Huayi a24ca60074 DWS DEVELOPER 811 version
Reviewed-by: Hasko, Vladimir <vladimir.hasko@t-systems.com>
Co-authored-by: Lu, Huayi <luhuayi@huawei.com>
Co-committed-by: Lu, Huayi <luhuayi@huawei.com>
2023-01-19 13:37:49 +00:00


<a name="EN-US_TOPIC_0000001146360931"></a>
<h1 class="topictitle1">Preparing Data in an MRS Cluster</h1>
<div id="body8662426"><p id="EN-US_TOPIC_0000001146360931__en-us_topic_0000001082830951_en-us_topic_0109259515_en-us_topic_0101477888_p20390204373512">Before importing data from MRS to a <span id="EN-US_TOPIC_0000001146360931__en-us_topic_0000001082830951_ph1692512422530">GaussDB(DWS)</span> cluster, you must have:</p>
<ol id="EN-US_TOPIC_0000001146360931__en-us_topic_0000001082830951_en-us_topic_0109259515_en-us_topic_0101477888_ol382575493513"><li id="EN-US_TOPIC_0000001146360931__en-us_topic_0000001082830951_en-us_topic_0109259515_en-us_topic_0101477888_li482575483516">Created an MRS cluster.</li><li id="EN-US_TOPIC_0000001146360931__en-us_topic_0000001082830951_en-us_topic_0109259515_en-us_topic_0101477888_li6825754183517">Created a Hive or Spark ORC table in the MRS cluster and stored the table data in the HDFS path corresponding to the table.</li></ol>
<p id="EN-US_TOPIC_0000001146360931__en-us_topic_0000001082830951_en-us_topic_0109259515_en-us_topic_0101477888_p48195853145921">If you have completed the preparations, skip this section.</p>
<p id="EN-US_TOPIC_0000001146360931__en-us_topic_0000001082830951_en-us_topic_0109259515_en-us_topic_0101477888_p166216918494">This tutorial creates a Hive ORC table in the MRS cluster as an example of the preparation work. The process and SQL syntax for creating a Spark ORC table in the MRS cluster are similar to those for Hive.</p>
<div class="section" id="EN-US_TOPIC_0000001146360931__en-us_topic_0000001082830951_en-us_topic_0109259515_en-us_topic_0101477888_section55166005141018"><a name="EN-US_TOPIC_0000001146360931__en-us_topic_0000001082830951_en-us_topic_0109259515_en-us_topic_0101477888_section55166005141018"></a><a name="en-us_topic_0000001082830951_en-us_topic_0109259515_en-us_topic_0101477888_section55166005141018"></a><h4 class="sectiontitle">Data File</h4><p id="EN-US_TOPIC_0000001146360931__en-us_topic_0000001082830951_en-us_topic_0109259515_en-us_topic_0101477888_p7523163718476">The sample data of the <strong id="EN-US_TOPIC_0000001146360931__en-us_topic_0000001082830951_b84235270617837">product_info.txt</strong> data file is as follows:</p>
<pre class="screen" id="EN-US_TOPIC_0000001146360931__en-us_topic_0000001082830951_en-us_topic_0109259515_en-us_topic_0101477888_screen1650215324201">100,XHDK-A-1293-#fJ3,2017-09-01,A,2017 Autumn New Shirt Women,red,M,328,2017-09-04,715,good
205,KDKE-B-9947-#kL5,2017-09-01,A,2017 Autumn New Knitwear Women,pink,L,584,2017-09-05,406,very good!
300,JODL-X-1937-#pV7,2017-09-01,A,2017 autumn new T-shirt men,red,XL,1245,2017-09-03,502,Bad.
310,QQPX-R-3956-#aD8,2017-09-02,B,2017 autumn new jacket women,red,L,411,2017-09-05,436,It's really super nice
150,ABEF-C-1820-#mC6,2017-09-03,B,2017 Autumn New Jeans Women,blue,M,1223,2017-09-06,1200,The seller's packaging is exquisite
200,BCQP-E-2365-#qE4,2017-09-04,B,2017 autumn new casual pants men,black,L,997,2017-09-10,301,The clothes are of good quality.
250,EABE-D-1476-#oB1,2017-09-10,A,2017 autumn new dress women,black,S,841,2017-09-15,299,Follow the store for a long time.
108,CDXK-F-1527-#pL2,2017-09-11,A,2017 autumn new dress women,red,M,85,2017-09-14,22,It's really amazing to buy
450,MMCE-H-4728-#nP9,2017-09-11,A,2017 autumn new jacket women,white,M,114,2017-09-14,22,Open the package and the clothes have no odor
260,OCDA-G-2817-#bD3,2017-09-12,B,2017 autumn new woolen coat women,red,L,2004,2017-09-15,826,Very favorite clothes
980,ZKDS-J-5490-#cW4,2017-09-13,B,2017 Autumn New Women's Cotton Clothing,red,M,112,2017-09-16,219,The clothes are small
98,FKQB-I-2564-#dA5,2017-09-15,B,2017 autumn new shoes men,green,M,4345,2017-09-18,5473,The clothes are thick and it's better this winter.
150,DMQY-K-6579-#eS6,2017-09-21,A,2017 autumn new underwear men,yellow,37,2840,2017-09-25,5831,This price is very cost effective
200,GKLW-l-2897-#wQ7,2017-09-22,A,2017 Autumn New Jeans Men,blue,39,5879,2017-09-25,7200,The clothes are very comfortable to wear
300,HWEC-L-2531-#xP8,2017-09-23,A,2017 autumn new shoes women,brown,M,403,2017-09-26,607,good
100,IQPD-M-3214-#yQ1,2017-09-24,B,2017 Autumn New Wide Leg Pants Women,black,M,3045,2017-09-27,5021,very good.
350,LPEC-N-4572-#zX2,2017-09-25,B,2017 Autumn New Underwear Women,red,M,239,2017-09-28,407,The seller's service is very good
110,NQAB-O-3768-#sM3,2017-09-26,B,2017 autumn new underwear women,red,S,6089,2017-09-29,7021,The color is very good
210,HWNB-P-7879-#tN4,2017-09-27,B,2017 autumn new underwear women,red,L,3201,2017-09-30,4059,I like it very much and the quality is good.
230,JKHU-Q-8865-#uO5,2017-09-29,C,2017 Autumn New Clothes with Chiffon Shirt,black,M,2056,2017-10-02,3842,very good</pre>
<p id="EN-US_TOPIC_0000001146360931__en-us_topic_0000001082830951_en-us_topic_0109259515_en-us_topic_0101477888_p4707194216174"></p>
</div>
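<p>Each line of the data file holds 11 comma-separated fields. As a minimal sketch (not part of the product tooling), the following Python snippet splits one sample line and converts each field, assuming the column names and order of the <strong>product_info</strong> table defined later in this topic. The <strong>parse_line</strong> helper and the <strong>COLUMNS</strong> list are hypothetical names introduced only for illustration.</p>

```python
from datetime import date

# Column names and order are assumed to match the product_info table
# defined later in this topic; the converters are illustrative choices.
COLUMNS = [
    ("product_price", int),
    ("product_id", str),
    ("product_time", date.fromisoformat),
    ("product_level", str),
    ("product_name", str),
    ("product_type1", str),
    ("product_type2", str),
    ("product_monthly_sales_cnt", int),
    ("product_comment_time", date.fromisoformat),
    ("product_comment_num", int),
    ("product_comment_content", str),
]


def parse_line(line):
    """Split one comma-delimited record and convert each field."""
    fields = line.rstrip("\n").split(",")
    if len(fields) != len(COLUMNS):
        raise ValueError(
            f"expected {len(COLUMNS)} fields, got {len(fields)}"
        )
    return {
        name: convert(value)
        for (name, convert), value in zip(COLUMNS, fields)
    }


# First record of the sample data shown above.
sample = (
    "100,XHDK-A-1293-#fJ3,2017-09-01,A,"
    "2017 Autumn New Shirt Women,red,M,328,2017-09-04,715,good"
)
record = parse_line(sample)
print(record["product_id"], record["product_time"])
```

<p>Running a check like this over the whole file before uploading it can reveal a line with a missing field or a malformed date while it is still easy to fix. Note that a plain split on commas only works because none of the sample comments contain a comma.</p>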
<div class="section" id="EN-US_TOPIC_0000001146360931__en-us_topic_0000001082830951_en-us_topic_0109259515_en-us_topic_0101477888_section17710193111494"><h4 class="sectiontitle">Creating a Hive ORC Table in an MRS Cluster</h4><ol id="EN-US_TOPIC_0000001146360931__en-us_topic_0000001082830951_en-us_topic_0109259515_en-us_topic_0101477888_ol596585324916"><li id="EN-US_TOPIC_0000001146360931__en-us_topic_0000001082830951_en-us_topic_0109259515_en-us_topic_0101477888_li610292175116">Create an MRS cluster.<p id="EN-US_TOPIC_0000001146360931__en-us_topic_0000001082830951_p327965243618"><a name="EN-US_TOPIC_0000001146360931__en-us_topic_0000001082830951_en-us_topic_0109259515_en-us_topic_0101477888_li610292175116"></a><a name="en-us_topic_0000001082830951_en-us_topic_0109259515_en-us_topic_0101477888_li610292175116"></a><span id="EN-US_TOPIC_0000001146360931__en-us_topic_0000001082830951_ph4697659163611">For details, see "Creating a Cluster &gt; Custom Creation of a Cluster" in the <em id="EN-US_TOPIC_0000001146360931__i255911441503">MapReduce Service User Guide</em>.</span></p>
</li><li id="EN-US_TOPIC_0000001146360931__li1053201072811">Download the client.<ol type="a" id="EN-US_TOPIC_0000001146360931__ol1839312372814"><li id="EN-US_TOPIC_0000001146360931__li12393152319283">Go back to the MRS cluster page. Click the cluster name. On the <strong id="EN-US_TOPIC_0000001146360931__b3377218104111">Dashboard</strong> tab page of the cluster details page, click <strong id="EN-US_TOPIC_0000001146360931__b6378121810410">Access Manager</strong>. If a message is displayed indicating that EIP needs to be bound, bind an EIP first.</li><li id="EN-US_TOPIC_0000001146360931__li144140315297">Enter the username <strong id="EN-US_TOPIC_0000001146360931__b16192155344112">admin</strong> and its password for logging in to MRS Manager. The password is the one you entered when creating the MRS cluster.</li><li id="EN-US_TOPIC_0000001146360931__li51752683016">Choose <strong id="EN-US_TOPIC_0000001146360931__b172909512427">Services</strong> &gt; <strong id="EN-US_TOPIC_0000001146360931__b92901659425">Download Client</strong>. Set <strong id="EN-US_TOPIC_0000001146360931__b1729105134217">Client Type</strong> to <strong id="EN-US_TOPIC_0000001146360931__b1529125164215">Only configuration files</strong> and set <strong id="EN-US_TOPIC_0000001146360931__b172921594217">Download To</strong> to <strong id="EN-US_TOPIC_0000001146360931__b5292105104215">Server</strong>. Click <strong id="EN-US_TOPIC_0000001146360931__b1091513314450">OK</strong>.<p id="EN-US_TOPIC_0000001146360931__p1597118183118"><span><img id="EN-US_TOPIC_0000001146360931__image15954201814314" src="figure/en-us_image_0000001217970670.png"></span></p>
</li></ol>
</li><li id="EN-US_TOPIC_0000001146360931__en-us_topic_0000001082830951_en-us_topic_0109259515_en-us_topic_0101477888_li14725131112614"><a name="EN-US_TOPIC_0000001146360931__en-us_topic_0000001082830951_en-us_topic_0109259515_en-us_topic_0101477888_li14725131112614"></a><a name="en-us_topic_0000001082830951_en-us_topic_0109259515_en-us_topic_0101477888_li14725131112614"></a>Log in to the Hive client of the MRS cluster.<ol type="a" id="EN-US_TOPIC_0000001146360931__en-us_topic_0000001082830951_en-us_topic_0109259515_en-us_topic_0101477888_ol5884554142019"><li id="EN-US_TOPIC_0000001146360931__en-us_topic_0000001082830951_en-us_topic_0109259515_en-us_topic_0101477888_li9214205214208">Log in to a Master node.<p id="EN-US_TOPIC_0000001146360931__en-us_topic_0000001082830951_p1638611202389"><a name="EN-US_TOPIC_0000001146360931__en-us_topic_0000001082830951_en-us_topic_0109259515_en-us_topic_0101477888_li9214205214208"></a><a name="en-us_topic_0000001082830951_en-us_topic_0109259515_en-us_topic_0101477888_li9214205214208"></a><span id="EN-US_TOPIC_0000001146360931__en-us_topic_0000001082830951_ph88081243386">For details, see "Remote Login Guide &gt; Logging In to a Master Node" in the <em id="EN-US_TOPIC_0000001146360931__i6961144217415">MapReduce Service User Guide</em>.</span></p>
</li><li id="EN-US_TOPIC_0000001146360931__en-us_topic_0000001082830951_en-us_topic_0109259515_en-us_topic_0101477888_li141551359102716">Run the following command to switch the user:<pre class="screen" id="EN-US_TOPIC_0000001146360931__en-us_topic_0000001082830951_en-us_topic_0109259515_en-us_topic_0101477888_screen1829592632812">sudo su - omm</pre>
</li><li id="EN-US_TOPIC_0000001146360931__en-us_topic_0000001082830951_en-us_topic_0109259515_en-us_topic_0101477888_li5310293617727">Run the following command to go to the client directory:<pre class="screen" id="EN-US_TOPIC_0000001146360931__en-us_topic_0000001082830951_en-us_topic_0109259515_en-us_topic_0101477888_screen5555184313282">cd /opt/client</pre>
</li><li id="EN-US_TOPIC_0000001146360931__en-us_topic_0000001082830951_en-us_topic_0109259515_en-us_topic_0101477888_li6464792217731">Run the following command to configure the environment variables:<pre class="screen" id="EN-US_TOPIC_0000001146360931__en-us_topic_0000001082830951_en-us_topic_0109259515_en-us_topic_0101477888_screen1755017813295">source bigdata_env</pre>
</li><li id="EN-US_TOPIC_0000001146360931__en-us_topic_0000001082830951_li1388912510181">If Kerberos authentication is enabled for the current cluster, run the following command to authenticate the current user. The user must have permission to create Hive tables: configure a role with the required permissions and bind the role to the user. For details, see "Creating a Role" in the <em id="EN-US_TOPIC_0000001146360931__i1977410229112">MapReduce Service User Guide</em>. If Kerberos authentication is disabled for the current cluster, skip this step.<pre class="screen" id="EN-US_TOPIC_0000001146360931__en-us_topic_0000001082830951_screen10522753236"><strong id="EN-US_TOPIC_0000001146360931__en-us_topic_0000001082830951_b46997547217">kinit</strong> <em id="EN-US_TOPIC_0000001146360931__en-us_topic_0000001082830951_i615612311">MRS cluster user</em></pre>
<p id="EN-US_TOPIC_0000001146360931__en-us_topic_0000001082830951_p188507194203">Example: <strong id="EN-US_TOPIC_0000001146360931__en-us_topic_0000001082830951_b172871881232">kinit hiveuser</strong></p>
</li><li id="EN-US_TOPIC_0000001146360931__en-us_topic_0000001082830951_en-us_topic_0109259515_en-us_topic_0101477888_li393313508298">Run the following command to start the Hive client:<pre class="screen" id="EN-US_TOPIC_0000001146360931__en-us_topic_0000001082830951_en-us_topic_0109259515_en-us_topic_0101477888_screen1033628113010">beeline</pre>
</li></ol>
</li><li id="EN-US_TOPIC_0000001146360931__en-us_topic_0000001082830951_en-us_topic_0109259515_en-us_topic_0101477888_li2965165364910">Create a database named demo in Hive.<p id="EN-US_TOPIC_0000001146360931__en-us_topic_0000001082830951_en-us_topic_0109259515_en-us_topic_0101477888_p772915035915"><a name="EN-US_TOPIC_0000001146360931__en-us_topic_0000001082830951_en-us_topic_0109259515_en-us_topic_0101477888_li2965165364910"></a><a name="en-us_topic_0000001082830951_en-us_topic_0109259515_en-us_topic_0101477888_li2965165364910"></a>Run the following command to create the demo database:</p>
<pre class="screen" id="EN-US_TOPIC_0000001146360931__en-us_topic_0000001082830951_en-us_topic_0109259515_en-us_topic_0101477888_screen19556195525916">CREATE DATABASE <em id="EN-US_TOPIC_0000001146360931__en-us_topic_0000001082830951_en-us_topic_0109259515_en-us_topic_0101477888_i61291258181811">demo</em>;</pre>
</li><li id="EN-US_TOPIC_0000001146360931__en-us_topic_0000001082830951_en-us_topic_0109259515_en-us_topic_0101477888_li3918155310532">Create table <strong id="EN-US_TOPIC_0000001146360931__en-us_topic_0000001082830951_b84235270610625">product_info</strong> of the <strong id="EN-US_TOPIC_0000001146360931__en-us_topic_0000001082830951_b84235270610636">Hive TEXTFILE</strong> type in the database demo and import the <a href="#EN-US_TOPIC_0000001146360931__en-us_topic_0000001082830951_en-us_topic_0109259515_en-us_topic_0101477888_section55166005141018">Data File</a> (<strong id="EN-US_TOPIC_0000001146360931__en-us_topic_0000001082830951_b84235270610711">product_info.txt</strong>) to the HDFS path corresponding to the table.<p id="EN-US_TOPIC_0000001146360931__en-us_topic_0000001082830951_en-us_topic_0109259515_en-us_topic_0101477888_p207721924191516">Run the following command to switch to the database demo:</p>
<pre class="screen" id="EN-US_TOPIC_0000001146360931__en-us_topic_0000001082830951_en-us_topic_0109259515_en-us_topic_0101477888_screen160482212169">USE <em id="EN-US_TOPIC_0000001146360931__en-us_topic_0000001082830951_en-us_topic_0109259515_en-us_topic_0101477888_i1845335410180">demo</em>;</pre>
<p id="EN-US_TOPIC_0000001146360931__en-us_topic_0000001082830951_en-us_topic_0109259515_en-us_topic_0101477888_p15840911125717">Run the following command to create table <strong id="EN-US_TOPIC_0000001146360931__en-us_topic_0000001082830951_b8423527061091">product_info</strong> and define the table fields based on data in the <a href="#EN-US_TOPIC_0000001146360931__en-us_topic_0000001082830951_en-us_topic_0109259515_en-us_topic_0101477888_section55166005141018">Data File</a>.</p>
<pre class="screen" id="EN-US_TOPIC_0000001146360931__en-us_topic_0000001082830951_en-us_topic_0109259515_en-us_topic_0101477888_screen5420145345710">DROP TABLE IF EXISTS <em id="EN-US_TOPIC_0000001146360931__en-us_topic_0000001082830951_en-us_topic_0109259515_en-us_topic_0101477888_i13919174183">product_info</em>;
CREATE TABLE <em id="EN-US_TOPIC_0000001146360931__en-us_topic_0000001082830951_en-us_topic_0109259515_en-us_topic_0101477888_i18856171901817">product_info </em>
(
product_price int ,
product_id char(30) ,
product_time date ,
product_level char(10) ,
product_name varchar(200) ,
product_type1 varchar(20) ,
product_type2 char(10) ,
product_monthly_sales_cnt int ,
product_comment_time date ,
product_comment_num int ,
product_comment_content varchar(200)
)
row format delimited fields terminated by ','
stored as TEXTFILE;</pre>
<p id="EN-US_TOPIC_0000001146360931__en-us_topic_0000001082830951_p2119181543920"><span id="EN-US_TOPIC_0000001146360931__en-us_topic_0000001082830951_ph15936220123916">For details about how to import data to an MRS cluster, see "Cluster Operation Guide &gt; Managing Active Clusters &gt; Managing Data Files" in the <em id="EN-US_TOPIC_0000001146360931__i168716518588">MapReduce Service User Guide</em>.</span></p>
</li><li id="EN-US_TOPIC_0000001146360931__en-us_topic_0000001082830951_en-us_topic_0109259515_en-us_topic_0101477888_li89651653174915">Create a Hive ORC table named <strong id="EN-US_TOPIC_0000001146360931__en-us_topic_0000001082830951_b842352706101330">product_info_orc</strong> in the database demo.<p id="EN-US_TOPIC_0000001146360931__en-us_topic_0000001082830951_en-us_topic_0109259515_en-us_topic_0101477888_p17914350195812">Run the following command to create the Hive ORC table <strong id="EN-US_TOPIC_0000001146360931__en-us_topic_0000001082830951_b842352706101611">product_info_orc</strong>. The table fields are the same as those of the <strong id="EN-US_TOPIC_0000001146360931__en-us_topic_0000001082830951_b842352706101618">product_info</strong> table created in the previous step.</p>
<pre class="screen" id="EN-US_TOPIC_0000001146360931__en-us_topic_0000001082830951_en-us_topic_0109259515_en-us_topic_0101477888_screen1312214199597">DROP TABLE IF EXISTS <em id="EN-US_TOPIC_0000001146360931__en-us_topic_0000001082830951_en-us_topic_0109259515_en-us_topic_0101477888_i166964514165">product_info_orc</em>;
CREATE TABLE <em id="EN-US_TOPIC_0000001146360931__en-us_topic_0000001082830951_en-us_topic_0109259515_en-us_topic_0101477888_i8525134819161">product_info_orc</em>
(
product_price int ,
product_id char(30) ,
product_time date ,
product_level char(10) ,
product_name varchar(200) ,
product_type1 varchar(20) ,
product_type2 char(10) ,
product_monthly_sales_cnt int ,
product_comment_time date ,
product_comment_num int ,
product_comment_content varchar(200)
)
row format delimited fields terminated by ','
stored as orc;</pre>
</li><li id="EN-US_TOPIC_0000001146360931__en-us_topic_0000001082830951_en-us_topic_0109259515_en-us_topic_0101477888_li139651532491">Insert the data from the <strong id="EN-US_TOPIC_0000001146360931__en-us_topic_0000001082830951_b842352706101724">product_info</strong> table into the Hive ORC table <strong id="EN-US_TOPIC_0000001146360931__en-us_topic_0000001082830951_b842352706101739">product_info_orc</strong>.<pre class="screen" id="EN-US_TOPIC_0000001146360931__en-us_topic_0000001082830951_en-us_topic_0109259515_en-us_topic_0101477888_screen19890124905512">insert into <em id="EN-US_TOPIC_0000001146360931__en-us_topic_0000001082830951_en-us_topic_0109259515_en-us_topic_0101477888_i3563318172">product_info_orc </em>select * from <em id="EN-US_TOPIC_0000001146360931__en-us_topic_0000001082830951_en-us_topic_0109259515_en-us_topic_0101477888_i7417158101718">product_info</em>;</pre>
<p id="EN-US_TOPIC_0000001146360931__en-us_topic_0000001082830951_en-us_topic_0109259515_en-us_topic_0101477888_p1085414111575">Query table <strong id="EN-US_TOPIC_0000001146360931__en-us_topic_0000001082830951_b84235270610182">product_info_orc</strong>.</p>
<pre class="screen" id="EN-US_TOPIC_0000001146360931__en-us_topic_0000001082830951_en-us_topic_0109259515_en-us_topic_0101477888_screen735414118573">select * from <em id="EN-US_TOPIC_0000001146360931__en-us_topic_0000001082830951_en-us_topic_0109259515_en-us_topic_0101477888_i46265441713">product_info_orc</em>;</pre>
<p id="EN-US_TOPIC_0000001146360931__en-us_topic_0000001082830951_en-us_topic_0109259515_en-us_topic_0101477888_p18872064582">If the query returns the records shown in <a href="#EN-US_TOPIC_0000001146360931__en-us_topic_0000001082830951_en-us_topic_0109259515_en-us_topic_0101477888_section55166005141018">Data File</a>, the data has been successfully inserted into the ORC table.</p>
</li></ol>
</div>
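<p>The tables above declare fixed maximum lengths for their char and varchar columns. Hive is generally understood to truncate string values that exceed a declared length rather than reject them, so it may be worth checking the source file before loading it. The following Python sketch does that under stated assumptions: the lengths are copied from the CREATE TABLE statements in this topic, fields are split on plain commas, and <strong>find_overlong_fields</strong> and <strong>MAX_LENGTHS</strong> are hypothetical names introduced only for illustration.</p>

```python
# Maximum lengths of the char/varchar columns, copied from the CREATE TABLE
# statements in this topic and keyed by zero-based field position.
MAX_LENGTHS = {
    1: ("product_id", 30),
    3: ("product_level", 10),
    4: ("product_name", 200),
    5: ("product_type1", 20),
    6: ("product_type2", 10),
    10: ("product_comment_content", 200),
}


def find_overlong_fields(lines):
    """Return (line_number, column_name, length) for each oversized value."""
    problems = []
    for line_number, line in enumerate(lines, start=1):
        fields = line.rstrip("\n").split(",")
        for index, (name, limit) in MAX_LENGTHS.items():
            # Skip positions missing from a short line instead of failing.
            if index < len(fields) and len(fields[index]) > limit:
                problems.append((line_number, name, len(fields[index])))
    return problems


# Two records taken from the sample data file.
sample_lines = [
    "100,XHDK-A-1293-#fJ3,2017-09-01,A,2017 Autumn New Shirt Women,"
    "red,M,328,2017-09-04,715,good",
    "205,KDKE-B-9947-#kL5,2017-09-01,A,2017 Autumn New Knitwear Women,"
    "pink,L,584,2017-09-05,406,very good!",
]
problems = find_overlong_fields(sample_lines)
print(problems)  # an empty list means every value fits its column
```

<p>To check a real file, pass an open file object instead of the list, for example <strong>find_overlong_fields(open("product_info.txt"))</strong>.</p>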
</div>
<div>
<div class="familylinks">
<div class="parentlink"><strong>Parent topic:</strong> <a href="dws_04_0210.html">Importing Data from MRS to a Cluster</a></div>
</div>
</div>