<a name="mrs_01_0386"></a><a name="mrs_01_0386"></a>
<h1 class="topictitle1">Using CarbonData from Scratch</h1>
<div id="body1589421630479"><p id="mrs_01_0386__a33de4bd77f4c4304810f92439ff56aa3">This section is for MRS 3.x or earlier. For MRS 3.x or later, see <a href="mrs_01_1400.html">Using CarbonData (for MRS 3.x or Later)</a>.</p>
<p id="mrs_01_0386__acc107eedf35543e4b9a6d8349341e6be">This section describes the procedure of using Spark CarbonData. All tasks are based on the Spark-beeline environment. The tasks include:</p>
<ol id="mrs_01_0386__o4d78abb0c52d4710aeb4a3c863d08eae"><li id="mrs_01_0386__le9fd8c6ec03142a9af086f15813f62b6">Connecting to Spark<p id="mrs_01_0386__ae00174deba224b908a3333fed893d2b3"><a name="mrs_01_0386__le9fd8c6ec03142a9af086f15813f62b6"></a><a name="le9fd8c6ec03142a9af086f15813f62b6"></a>Before performing any operation on CarbonData, users must connect CarbonData to Spark.</p>
</li><li id="mrs_01_0386__l4f64f24edfa04ea1ac386a6d705f8faa">Creating a CarbonData table<p id="mrs_01_0386__abacaf61756c8449cb398ed926c2c4a87"><a name="mrs_01_0386__l4f64f24edfa04ea1ac386a6d705f8faa"></a><a name="l4f64f24edfa04ea1ac386a6d705f8faa"></a>After connecting to Spark, users must create a CarbonData table to load and query data.</p>
</li><li id="mrs_01_0386__laec7087abe9f4ae985cfaaf4bd8dd89a">Loading data to the CarbonData table<p id="mrs_01_0386__a5f4027177b3a4a498d697083ed7c6514"><a name="mrs_01_0386__laec7087abe9f4ae985cfaaf4bd8dd89a"></a><a name="laec7087abe9f4ae985cfaaf4bd8dd89a"></a>Users load data from CSV files in HDFS to the CarbonData table.</p>
</li><li id="mrs_01_0386__lab22d151c4484dedbc2ef4ba13eb6f16">Querying data from the CarbonData table<p id="mrs_01_0386__a30af77a928734c6ab24ec7d99ef8b238"><a name="mrs_01_0386__lab22d151c4484dedbc2ef4ba13eb6f16"></a><a name="lab22d151c4484dedbc2ef4ba13eb6f16"></a>After data is loaded to the CarbonData table, users can run query commands such as <strong id="mrs_01_0386__en-us_topic_0056189602_b84235270694748">groupby</strong> and <strong id="mrs_01_0386__en-us_topic_0056189602_b84235270694753">where</strong>.</p>
</li></ol>
<div class="section" id="mrs_01_0386__sf10b35613afe4c6d8c13a2e169220750"><h4 class="sectiontitle">Prerequisites</h4><p id="mrs_01_0386__a0f3ac61926304f079fb8eb3186d124a1">A client has been installed. For details, see <a href="mrs_01_2126.html">Using an MRS Client</a>.</p>
</div>
<div class="section" id="mrs_01_0386__s868bc997d49740f7a96d0f6915b2dac5"><h4 class="sectiontitle">Procedure</h4><ol id="mrs_01_0386__o7f2f7e0d23264930b387db8f13d579a5"><li id="mrs_01_0386__lc568d6217a8648e08a31d162927125a6"><span>Connect CarbonData to Spark.</span><p><ol type="a" id="mrs_01_0386__o48f1a1ad5f39495cbc88231572f348fb"><li id="mrs_01_0386__l23f9e94fe8ca457ab0630fe25cdb804c">Prepare a client based on service requirements and use user <strong id="mrs_01_0386__b02711349598">root</strong> to log in to the node where the client is installed.<p id="mrs_01_0386__en-us_topic_0056189602_p274796142857">For example, if you have updated the client on the Master2 node, log in to the Master2 node to use the client. For details, see <a href="mrs_01_2126.html">Using an MRS Client</a>.</p>
</li><li id="mrs_01_0386__l13b18a2611fd490bae9793218e5cf3c9">Run the following commands to switch the user and configure environment variables:<p id="mrs_01_0386__a34f2de25f9704ff4b06403909f5ce7d9"><a name="mrs_01_0386__l13b18a2611fd490bae9793218e5cf3c9"></a><a name="l13b18a2611fd490bae9793218e5cf3c9"></a><strong id="mrs_01_0386__en-us_topic_0056189602_b560210819742">sudo su - omm</strong></p>
<p id="mrs_01_0386__a05fdc54744f24beea84a8ea0edbefd81"><strong id="mrs_01_0386__a5e73a3b702d14fef935ca549a4ce7ec3">source /opt/client/bigdata_env</strong></p>
</li><li id="mrs_01_0386__lab6db5d6e2554b539e15e442a8dd5ef1">For clusters with Kerberos authentication enabled, run the following command to authenticate the user. For clusters with Kerberos authentication disabled, skip this step.<p id="mrs_01_0386__ae87d15574e94431193a045c8878940a8"><a name="mrs_01_0386__lab6db5d6e2554b539e15e442a8dd5ef1"></a><a name="lab6db5d6e2554b539e15e442a8dd5ef1"></a><strong id="mrs_01_0386__en-us_topic_0056189602_b4538954119504">kinit </strong><em id="mrs_01_0386__en-us_topic_0056189602_i6508981389504"><strong id="mrs_01_0386__b3457422569504">Spark username</strong></em></p>
<div class="note" id="mrs_01_0386__note3768142717515"><img src="public_sys-resources/note_3.0-en-us.png"><span class="notetitle"> </span><div class="notebody"><p id="mrs_01_0386__p118910426518">The user needs to be added to user groups <strong id="mrs_01_0386__b142865614116">hadoop</strong> (primary group) and <strong id="mrs_01_0386__b91331411114">hive</strong>.</p>
</div></div>
</li><li id="mrs_01_0386__lbee08bfe9d1149b8a61d429d3adfa357">Run the following command to connect to the Spark environment.<p id="mrs_01_0386__ac08da93eb3bb4c6188ecf67fe7fb430d"><a name="mrs_01_0386__lbee08bfe9d1149b8a61d429d3adfa357"></a><a name="lbee08bfe9d1149b8a61d429d3adfa357"></a><strong id="mrs_01_0386__accc53e8d635f414f8c8fff61a3586352">spark-beeline</strong></p>
</li></ol>
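<p>For reference, the connection steps above can be combined into the following terminal session. This is only a sketch: the Spark username is a placeholder, and the <strong>kinit</strong> step applies only to clusters with Kerberos authentication enabled.</p>
<pre class="screen"># Switch to user omm and load the client environment variables
sudo su - omm
source /opt/client/bigdata_env
# Kerberos clusters only: replace sparkuser with the actual Spark username
kinit sparkuser
# Start the interactive Spark-beeline session
spark-beeline</pre>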
</p></li><li id="mrs_01_0386__l2f75dd5cc666435db76ab7b5b75925fd"><span>Create a CarbonData table.</span><p><p id="mrs_01_0386__a7f262a9202d449398b6cde82163825f4">Run the following command to create a CarbonData table, which is used to load and query data.</p>
<p id="mrs_01_0386__a58f8c307f050446cbf2b53cc23832772"><strong id="mrs_01_0386__a31ff985343804628ba3ec6dfd9306ba3">CREATE TABLE x1 (imei string, deviceInformationId int, mac string, productdate timestamp, updatetime timestamp, gamePointId double, contractNumber double)</strong></p>
<p id="mrs_01_0386__a58b8ad42b31c40b0bbad0a7c29209dcc"><strong id="mrs_01_0386__a77bd38f732784b038d890af2d09e7c1e">STORED BY 'org.apache.carbondata.format'</strong></p>
<p id="mrs_01_0386__adeebbfc655ce40fe9d02ed0d527f3d67"><strong id="mrs_01_0386__a64975f134d8c4cc2ba83ef4a7066f921">TBLPROPERTIES ('DICTIONARY_EXCLUDE'='mac','DICTIONARY_INCLUDE'='deviceInformationId');</strong></p>
<p id="mrs_01_0386__a52ec19543b5740578784674b6157f463">The command output is as follows:</p>
<pre class="screen" id="mrs_01_0386__sce51c9727d664e49a452d2d39f074e3f">+---------+--+
|
|
| result |
|
|
+---------+--+
|
|
+---------+--+
|
|
No rows selected (1.551 seconds)</pre>
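<p>You can, for example, verify that the table exists by listing the tables in Spark-beeline (a standard Spark SQL statement):</p>
<pre class="screen">-- Optional check: the newly created table x1 should appear in the output
show tables;</pre>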
</p></li><li id="mrs_01_0386__lc90cf170eea44a3fbf8e83ee4f4b024e"><span>Load data from CSV files to the CarbonData table.</span><p><p id="mrs_01_0386__a7c1c7d315b964813a5dfc1a00a2af0a3">Run the command to load data from CSV files based on the required parameters. Only CSV files are supported. The CSV column name and sequence configured in the <strong id="mrs_01_0386__b1022903620182">LOAD</strong> command must be consistent with those in the CarbonData table. The data formats and number of data columns in the CSV files must also be the same as those in the CarbonData table.</p>
<p id="mrs_01_0386__p96617401666">The CSV files must be stored on HDFS. You can upload the files to OBS and import them from OBS to HDFS on the <strong id="mrs_01_0386__b13931712111616">Files</strong> page of the MRS console.</p>
<p id="mrs_01_0386__a781b58eea7564efc9fe85cbcb4f2fd0d">If Kerberos authentication is enabled, prepare the CSV files in the work environment and import them to HDFS using open-source HDFS commands. In addition, assign the Spark user with the read and execute permissions of the files on HDFS by referring to <a href="mrs_01_1406.html#mrs_01_1406__li122143593123">5</a>.</p>
<p id="mrs_01_0386__a06dd21c0e8c74500b018f2deea277545">For example, the <span class="filepath" id="mrs_01_0386__en-us_topic_0056189602_filepath16066606331052"><b>data.csv</b></span> file is saved in the <span class="filepath" id="mrs_01_0386__en-us_topic_0056189602_filepath106458084210519"><b>tmp</b></span> directory of HDFS with the following contents:</p>
<pre class="screen" id="mrs_01_0386__sa7f9f03ecfeb47c2957fe45e1cd04823">x123,111,dd,2017-04-20 08:51:27,2017-04-20 07:56:51,2222,33333</pre>
<p id="mrs_01_0386__acf5a9361dac94d909bef476c3abf31cd">The command for loading data from that file is as follows:</p>
<p id="mrs_01_0386__a20f0767838164aa2ba33dd85ab630cd5"><strong id="mrs_01_0386__aaf27e3df75c948198ab025cede2b8095">LOAD DATA inpath 'hdfs://hacluster/tmp/data.csv' into table x1 options('DELIMITER'=',','QUOTECHAR'='"','FILEHEADER'='imei, deviceinformationid,mac,productdate,updatetime,gamepointid,contractnumber');</strong></p>
<p id="mrs_01_0386__a4635c934c2c941ca98bacb67f5b17e74">The command output is as follows:</p>
<pre class="screen" id="mrs_01_0386__se59261ef27df46a482edff020f8cb450">+---------+--+
|
|
| Result |
|
|
+---------+--+
|
|
+---------+--+
|
|
No rows selected (3.039 seconds)</pre>
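<p>You can, for example, confirm that the load succeeded by listing the segments of the table; each successful <strong>LOAD</strong> operation creates one segment.</p>
<pre class="screen">-- Optional check: list the segments created for table x1
SHOW SEGMENTS FOR TABLE x1;</pre>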
</p></li><li id="mrs_01_0386__l2afe94d3203c49d89a0dd0284df08172"><span>Query data from the CarbonData.</span><p><ul id="mrs_01_0386__ud6fe383ae07744a5a68410580035e677"><li id="mrs_01_0386__l60c05114c29346dea641b5dff837090e"><strong id="mrs_01_0386__ae42eccec63cb4bcf8d0747edf5dac44a">Obtaining the number of records</strong><p id="mrs_01_0386__afde706fa90fb478f9e9ec9d420d8b0a3">Run the following command to obtain the number of records in the CarbonData table:</p>
<p id="mrs_01_0386__afac9eb51194f408c907e594368fc4434"><strong id="mrs_01_0386__a26fbcf410cb7434681e041b2e1650888">select count(*) from x1;</strong></p>
</li><li id="mrs_01_0386__l8a218cab94184557bd7b219adfca90cd"><strong id="mrs_01_0386__a421c074621484bd2a62fd869252e6c6c">Querying with the groupby condition</strong><p id="mrs_01_0386__a89bfaf4cb6dd4877ab7d6764c24e4329">Run the following command to obtain the <span class="parmname" id="mrs_01_0386__p1795e555f4c24ea38cc02bffef40cb3f"><b>deviceinformationid</b></span> records without repetition in the CarbonData table:</p>
<p id="mrs_01_0386__a853225a69d6d4577bb006ab70ab89fd0"><strong id="mrs_01_0386__ab9d7166068d74ae28d0dc852384a6f73">select deviceinformationid,count (distinct deviceinformationid) from x1 group by deviceinformationid;</strong></p>
</li><li id="mrs_01_0386__l2aa894c4e57f4f41a887e847f67ce249"><strong id="mrs_01_0386__a3ce997d26b9b4f8fb621351985843ebd">Querying with the where condition</strong><p id="mrs_01_0386__a1f41fe3553184d76b3d7e3d62a3e5458">Run the following command to obtain specific <strong id="mrs_01_0386__afe47a0e06ad646d0b82043e02435196e">deviceinformationid</strong> records:</p>
<p id="mrs_01_0386__a3a51900a7ecb47dc98763abaa104e253"><strong id="mrs_01_0386__a25077add75ab4a82bb72f2b64abf6330">select * from x1 where deviceinformationid='111';</strong></p>
</li></ul>
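<p>The conditions above can also be combined. For example, the following query (a sketch based on the table created earlier) aggregates <strong>gamePointId</strong> for a specific <strong>deviceinformationid</strong>:</p>
<pre class="screen">-- Combine a where filter with a group by aggregation on table x1
select mac, sum(gamePointId) from x1 where deviceinformationid='111' group by mac;</pre>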
<div class="note" id="mrs_01_0386__ncbc01999c3d7476cb81a85b69eeb4db2"><img src="public_sys-resources/note_3.0-en-us.png"><span class="notetitle"> </span><div class="notebody"><p id="mrs_01_0386__aad19aa6c73034070a2ebf1daa6cbfe8e">If the query result has non-English characters, the columns in the query result may not be aligned. This is because characters of different languages occupy different widths.</p>
</div></div>
</p></li><li id="mrs_01_0386__la0aaeab8449e4763967003e9a0205d94"><span>Run the following command to exit the Spark environment.</span><p><p id="mrs_01_0386__en-us_topic_0056189602_p950577019754"><strong id="mrs_01_0386__en-us_topic_0056189602_b584565421983">!quit</strong></p>
</p></li></ol>
</div>
</div>
<div>
<div class="familylinks">
<div class="parentlink"><strong>Parent topic:</strong> <a href="mrs_01_0385.html">Using CarbonData (for Versions Earlier Than MRS 3.x)</a></div>
</div>
</div>