Reviewed-by: Hasko, Vladimir <vladimir.hasko@t-systems.com> Co-authored-by: Yang, Tong <yangtong2@huawei.com> Co-committed-by: Yang, Tong <yangtong2@huawei.com>
<a name="mrs_01_0760"></a><a name="mrs_01_0760"></a>
<h1 class="topictitle1">Accessing Alluxio Using a Data Application</h1>
<div id="body1589421646581"><p id="mrs_01_0760__p1120316191571">The port number used for accessing the Alluxio file system is 19998, and the access address is <strong id="mrs_01_0760__b68431111641">alluxio://</strong><em id="mrs_01_0760__i12290114140"><Master node IP address of Alluxio></em><strong id="mrs_01_0760__b19687117242">:19998/</strong><em id="mrs_01_0760__i62074203415"><PATH></em>. This section uses examples to describe how to access the Alluxio file system using data applications (Spark, Hive, Hadoop MapReduce, and Presto).</p>
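<p>The address format above can be illustrated with a minimal shell sketch. The function name <strong>alluxio_uri</strong>, the <strong>ALLUXIO_PORT</strong> variable, and the IP address 192.168.0.10 are hypothetical and used only for illustration; they are not part of the Alluxio client.</p>

```shell
#!/bin/sh
# Build an Alluxio URI of the form alluxio://<master>:<port>/<path>.
# ALLUXIO_PORT (a name made up for this sketch) defaults to 19998,
# the port stated in this section.
alluxio_uri() {
  master="$1"     # master node address, for example 192.168.0.10 (hypothetical)
  path="${2#/}"   # drop one leading slash so the URI has exactly one
  printf 'alluxio://%s:%s/%s\n' "$master" "${ALLUXIO_PORT:-19998}" "$path"
}

alluxio_uri 192.168.0.10 /input
```

<p>The resulting string is the kind of address that the examples in the following sections pass to Spark, Hive, MapReduce, and Presto.</p>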
<div class="section" id="mrs_01_0760__section450673117125"><h4 class="sectiontitle">Using Alluxio as the Input and Output of a Spark Application</h4><ol id="mrs_01_0760__ol573635916120"><li id="mrs_01_0760__li67368594122"><span>Log in to the Master node in a cluster as user <strong id="mrs_01_0760__b106766598823423">root</strong> using the password set during cluster creation.</span></li><li id="mrs_01_0760__li18666184218138"><span>Run the following command to configure environment variables:</span><p><p id="mrs_01_0760__p1413125131316"><strong id="mrs_01_0760__b5834917148">source /opt/client/bigdata_env</strong></p>
</p></li><li id="mrs_01_0760__li9632101013154"><span>If Kerberos authentication is enabled for the current cluster, run the following command to authenticate the user. If Kerberos authentication is disabled for the current cluster, skip this step:</span><p><p id="mrs_01_0760__p3903911193255"><strong id="mrs_01_0760__b78912088023423">kinit</strong> <em id="mrs_01_0760__i87571140923423">MRS cluster user</em></p>
<p id="mrs_01_0760__p23308555145027">Example: <strong id="mrs_01_0760__b21788660145616">kinit admin</strong></p>
</p></li><li id="mrs_01_0760__li1772118813143"><span>Prepare an input file and copy local data to the Alluxio file system.</span><p><p id="mrs_01_0760__p164611751171516">For example, prepare the input file <strong id="mrs_01_0760__b73986414523423">test_input.txt</strong> in the local <strong id="mrs_01_0760__b163438549823423">/home</strong> directory, and run the following command to save the <strong id="mrs_01_0760__b179137304623423">test_input.txt</strong> file to Alluxio:</p>
<p id="mrs_01_0760__p10914154613156"><strong id="mrs_01_0760__b16681789161">alluxio fs copyFromLocal /home/test_input.txt /input</strong></p>
</p></li><li id="mrs_01_0760__li1957191214161"><span>Run the following command to start <strong id="mrs_01_0760__b24555722523423">spark-shell</strong>:</span><p><p id="mrs_01_0760__p57609233163"><strong id="mrs_01_0760__b19834163014164">spark-shell</strong></p>
</p></li><li id="mrs_01_0760__li6273163481614"><span>Run the following commands in spark-shell:</span><p><p id="mrs_01_0760__p17111751202120"><strong id="mrs_01_0760__b1221735511219">val s = sc.textFile("alluxio://<<em id="mrs_01_0760__i9997162084512">Name of the Alluxio node</em>>:19998/input")</strong></p>
<p id="mrs_01_0760__p81111514217"><strong id="mrs_01_0760__b13224135552112">val double = s.map(line => line + line)</strong></p>
<p id="mrs_01_0760__p61115111214"><strong id="mrs_01_0760__b2022905572118">double.saveAsTextFile("alluxio://<<em id="mrs_01_0760__i16724173819484">Name of the Alluxio node</em>>:19998/output")</strong></p>
<div class="note" id="mrs_01_0760__note33301467249"><img src="public_sys-resources/note_3.0-en-us.png"><span class="notetitle"> </span><div class="notebody"><p id="mrs_01_0760__p4330186162411">Replace <strong id="mrs_01_0760__b17451122664919"><<em id="mrs_01_0760__i147291122154920">Name of the Alluxio node</em></strong><strong id="mrs_01_0760__b1045142614913">>:19998</strong> with the node names and port numbers of all nodes where the AlluxioMaster instance is deployed. Separate the entries with commas (,), for example, <strong id="mrs_01_0760__b93891733155218">node-ana-coremspb.mrs-m0va.com:19998,node-master2kiww.mrs-m0va.com:19998,node-master1cqwv.mrs-m0va.com:19998</strong>.</p>
</div></div>
</p></li><li id="mrs_01_0760__li1677164893916"><span>Press <strong id="mrs_01_0760__b12917752152110">Ctrl+C</strong> to exit spark-shell.</span></li><li id="mrs_01_0760__li7193050151711"><span>Run the <strong id="mrs_01_0760__b204523556423423">alluxio fs ls /</strong> command to check whether the root directory of Alluxio contains the output directory <strong id="mrs_01_0760__b124767892823423">/output</strong>, which holds the input file content with each line doubled.</span></li></ol>
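<p>When several AlluxioMaster nodes are deployed, the address in the spark-shell statements is a comma-separated list of node name and port pairs. The following shell sketch shows one way to assemble that list and print the three statements from this procedure. The function names, the <strong>ALLUXIO_PORT</strong> variable, and the host names <strong>node-master1</strong> and <strong>node-master2</strong> are hypothetical; replace the host names with the AlluxioMaster nodes of the cluster.</p>

```shell
#!/bin/sh
# Join AlluxioMaster host names into "host1:19998,host2:19998,...".
# ALLUXIO_PORT is a name made up for this sketch; it defaults to 19998.
alluxio_authority() {
  out=""
  for host in "$@"; do
    out="${out:+$out,}$host:${ALLUXIO_PORT:-19998}"
  done
  printf '%s\n' "$out"
}

# Print the three spark-shell statements from this procedure
# for the given address list.
spark_statements() {
  printf 'val s = sc.textFile("alluxio://%s/input")\n' "$1"
  printf 'val double = s.map(line => line + line)\n'
  printf 'double.saveAsTextFile("alluxio://%s/output")\n' "$1"
}

# Hypothetical host names, for illustration only.
spark_statements "$(alluxio_authority node-master1 node-master2)"
```

<p>The printed statements can then be pasted at the spark-shell prompt.</p>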
</div>
<div class="section" id="mrs_01_0760__section7925727121811"><h4 class="sectiontitle">Creating a Hive Table on Alluxio</h4><ol id="mrs_01_0760__ol12316550181811"><li id="mrs_01_0760__li1131685081810"><span>Log in to the Master node in a cluster as user <strong id="mrs_01_0760__b3191799523423">root</strong> using the password set during cluster creation.</span></li><li id="mrs_01_0760__li16561835101916"><span>Run the following command to configure environment variables:</span><p><p id="mrs_01_0760__p85616356191"><strong id="mrs_01_0760__b5567353196">source /opt/client/bigdata_env</strong></p>
</p></li><li id="mrs_01_0760__li3561435121920"><span>If Kerberos authentication is enabled for the current cluster, run the following command to authenticate the user. If Kerberos authentication is disabled for the current cluster, skip this step:</span><p><p id="mrs_01_0760__p056113516198"><strong id="mrs_01_0760__b54668998423423">kinit</strong> <em id="mrs_01_0760__i82446430123423">MRS cluster user</em></p>
<p id="mrs_01_0760__p1566358192">Example: <strong id="mrs_01_0760__b105617356196">kinit admin</strong></p>
</p></li><li id="mrs_01_0760__li7861163741920"><span>Prepare an input file. For example, prepare the <strong id="mrs_01_0760__b105807558823423">hive_load.txt</strong> input file in the local <strong id="mrs_01_0760__b30186718823423">/home</strong> directory. The file content is as follows:</span><p><pre class="screen" id="mrs_01_0760__screen68216581194">1, Alice, company A
2, Bob, company B</pre>
</p></li><li id="mrs_01_0760__li14882410207"><span>Run the following command to import the <strong id="mrs_01_0760__b31328748823423">hive_load.txt</strong> file to Alluxio:</span><p><p id="mrs_01_0760__p8443135875214"><strong id="mrs_01_0760__b48741844132013">alluxio fs copyFromLocal /home/hive_load.txt /hive_input</strong></p>
</p></li><li id="mrs_01_0760__li78549476204"><span>Run the following command to start Hive Beeline:</span><p><p id="mrs_01_0760__p5872162820373"><strong id="mrs_01_0760__b16206904218">beeline</strong></p>
</p></li><li id="mrs_01_0760__li1125919546213"><span>Run the following commands in beeline to create a table based on the input file in Alluxio:</span><p><p id="mrs_01_0760__p157912039062"><strong id="mrs_01_0760__b1251165510615">CREATE TABLE u_user(id INT, name STRING, company STRING) ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' STORED AS TEXTFILE;</strong></p>
<p id="mrs_01_0760__p497134214611"><strong id="mrs_01_0760__b1681104012216">LOAD DATA INPATH 'alluxio://<<em id="mrs_01_0760__i17596163315544">Name of the Alluxio node</em>>:19998/hive_input' INTO TABLE u_user;</strong></p>
<div class="note" id="mrs_01_0760__note43811935142315"><img src="public_sys-resources/note_3.0-en-us.png"><span class="notetitle"> </span><div class="notebody"><p id="mrs_01_0760__p2187944112312">Replace <strong id="mrs_01_0760__b1814174314544"><<em id="mrs_01_0760__i18814114355418">Name of the Alluxio node</em></strong><strong id="mrs_01_0760__b981413431545">>:19998</strong> with the node names and port numbers of all nodes where the AlluxioMaster instance is deployed. Separate the entries with commas (,), for example, <strong id="mrs_01_0760__b8814114319549">node-ana-coremspb.mrs-m0va.com:19998,node-master2kiww.mrs-m0va.com:19998,node-master1cqwv.mrs-m0va.com:19998</strong>.</p>
</div></div>
</p></li><li id="mrs_01_0760__li14206645112219"><span>Run the following command to view the created table:</span><p><p id="mrs_01_0760__p19675184652215"><strong id="mrs_01_0760__b752293810228">select * from u_user;</strong></p>
</p></li></ol>
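<p>The file preparation and import steps of this procedure can be scripted. The following shell sketch writes the two sample rows to a file and copies it to Alluxio only if the <strong>alluxio</strong> command is available. The function names <strong>write_hive_input</strong> and <strong>upload_hive_input</strong> are hypothetical helpers introduced for illustration.</p>

```shell
#!/bin/sh
# Write the two sample rows from this procedure to the given file.
write_hive_input() {
  printf '%s\n' '1, Alice, company A' '2, Bob, company B' > "$1"
}

# Copy the file to /hive_input in Alluxio, but only when the
# alluxio CLI is on the PATH (that is, after sourcing bigdata_env).
upload_hive_input() {
  if command -v alluxio >/dev/null 2>&1; then
    alluxio fs copyFromLocal "$1" /hive_input
  else
    echo "alluxio CLI not found; skipped copying $1" >&2
  fi
}
```

<p>For example, calling <strong>write_hive_input /home/hive_load.txt</strong> followed by <strong>upload_hive_input /home/hive_load.txt</strong> covers the two steps that precede starting Beeline.</p>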
</div>
<div class="section" id="mrs_01_0760__section12420127102313"><h4 class="sectiontitle">Running Hadoop Wordcount in Alluxio</h4><ol id="mrs_01_0760__ol1903182720235"><li id="mrs_01_0760__li135810312234"><span>Log in to the Master node in a cluster as user <strong id="mrs_01_0760__b133718962723423">root</strong> using the password set during cluster creation.</span></li><li id="mrs_01_0760__li958193119233"><span>Run the following command to configure environment variables:</span><p><p id="mrs_01_0760__p1558183111235"><strong id="mrs_01_0760__b12581031112312">source /opt/client/bigdata_env</strong></p>
</p></li><li id="mrs_01_0760__li1858203117232"><span>If Kerberos authentication is enabled for the current cluster, run the following command to authenticate the user. If Kerberos authentication is disabled for the current cluster, skip this step:</span><p><p id="mrs_01_0760__p2587316232"><strong id="mrs_01_0760__b32269527523423">kinit</strong> <em id="mrs_01_0760__i35797258123423">MRS cluster user</em></p>
<p id="mrs_01_0760__p105817315238">Example: <strong id="mrs_01_0760__b12581431162312">kinit admin</strong></p>
</p></li><li id="mrs_01_0760__li45873192319"><span>Prepare an input file and copy local data to the Alluxio file system.</span><p><p id="mrs_01_0760__p215085822315">For example, prepare the input file <strong id="mrs_01_0760__b202147255423423">test_input.txt</strong> in the local <strong id="mrs_01_0760__b18118017423423">/home</strong> directory, and run the following command to save the <strong id="mrs_01_0760__b105808322923423">test_input.txt</strong> file to Alluxio:</p>
<p id="mrs_01_0760__p15479144145914"><strong id="mrs_01_0760__b13293172110244">alluxio fs copyFromLocal /home/test_input.txt /input</strong></p>
</p></li><li id="mrs_01_0760__li48193248242"><span>Run the following command to execute the wordcount job:</span><p><p id="mrs_01_0760__p18192020625"><strong id="mrs_01_0760__b1243421132512">yarn jar /opt/share/hadoop-mapreduce-examples-<<em id="mrs_01_0760__i11453111516569">Hadoop version</em>>-mrs-<<em id="mrs_01_0760__i1878013331566">MRS cluster version</em>>/hadoop-mapreduce-examples-<<em id="mrs_01_0760__i137063516568">Hadoop version</em>>-mrs-<<em id="mrs_01_0760__i1187218582565">MRS cluster version</em>>.jar wordcount alluxio://<<em id="mrs_01_0760__i4802418115719">Name of the Alluxio node</em>>:19998/input alluxio://<<em id="mrs_01_0760__i31232256571">Name of the Alluxio node</em>>:19998/output</strong></p>
<div class="note" id="mrs_01_0760__note367011261222"><img src="public_sys-resources/note_3.0-en-us.png"><span class="notetitle"> </span><div class="notebody"><ul id="mrs_01_0760__ul14753631202211"><li id="mrs_01_0760__li855224519222">Replace <strong id="mrs_01_0760__b6740111213594"><</strong><strong id="mrs_01_0760__b674011217591"><em id="mrs_01_0760__i139851936587">Hadoop version</em></strong><strong id="mrs_01_0760__b117401112135912">></strong> with the actual Hadoop version.</li><li id="mrs_01_0760__li828435832219">Replace <strong id="mrs_01_0760__b19940201915914"><</strong><strong id="mrs_01_0760__b4940419125912"><em id="mrs_01_0760__i796785115914">MRS cluster version</em></strong><strong id="mrs_01_0760__b1940171925916">></strong> with the major version of MRS. For example, for an MRS 1.9.2 cluster, use mrs-1.9.0.</li><li id="mrs_01_0760__li187531331182212">Replace <strong id="mrs_01_0760__b1637135345913"><<em id="mrs_01_0760__i2063715319597">Name of the Alluxio node</em></strong><strong id="mrs_01_0760__b1963725319595">>:19998</strong> with the node names and port numbers of all nodes where the AlluxioMaster instance is deployed. Separate the entries with commas (,), for example, <strong id="mrs_01_0760__b263735395910">node-ana-coremspb.mrs-m0va.com:19998,node-master2kiww.mrs-m0va.com:19998,node-master1cqwv.mrs-m0va.com:19998</strong>.</li></ul>
</div></div>
</p></li><li id="mrs_01_0760__li14902171392512"><span>Run the <strong id="mrs_01_0760__b50972070823423">alluxio fs ls /</strong> command to check whether the output directory <strong id="mrs_01_0760__b154594931423423">/output</strong> containing the wordcount result exists in the root directory of Alluxio.</span></li></ol>
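<p>The JAR path in the wordcount command repeats the two version strings twice, which makes it easy to mistype. The following shell sketch assembles the path from the versions. The function names are hypothetical, the Hadoop version 2.8.3 is only a placeholder, and the rule that the last version component is replaced by 0 is an assumption generalized from the single MRS 1.9.2 example in the note above; check the actual directory name under <strong>/opt/share</strong> before using the result.</p>

```shell
#!/bin/sh
# Map a full MRS version to the value used in the JAR name,
# for example 1.9.2 to 1.9.0.
# Assumption: the last component is always replaced by 0.
mrs_major_version() {
  printf '%s.0\n' "${1%.*}"
}

# Build the path of the MapReduce examples JAR.
#   $1: Hadoop version   $2: MRS major version
examples_jar() {
  name="hadoop-mapreduce-examples-$1-mrs-$2"
  printf '/opt/share/%s/%s.jar\n' "$name" "$name"
}

# Placeholder Hadoop version 2.8.3, for illustration only.
examples_jar 2.8.3 "$(mrs_major_version 1.9.2)"
```

<p>The printed path can be passed to <strong>yarn jar</strong> in place of the path with placeholders in the wordcount command.</p>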
</div>
<div class="section" id="mrs_01_0760__section494714103266"><h4 class="sectiontitle">Using Presto to Query Tables in Alluxio</h4><ol id="mrs_01_0760__ol6666122710262"><li id="mrs_01_0760__li3666152772618"><span>Log in to the Master node in a cluster as user <strong id="mrs_01_0760__b42124768623423">root</strong> using the password set during cluster creation.</span></li><li id="mrs_01_0760__li1366692782619"><span>Run the following command to configure environment variables:</span><p><p id="mrs_01_0760__p106661527162610"><strong id="mrs_01_0760__b136661127152617">source /opt/client/bigdata_env</strong></p>
</p></li><li id="mrs_01_0760__li11666182712263"><span>If Kerberos authentication is enabled for the current cluster, run the following command to authenticate the user. If Kerberos authentication is disabled for the current cluster, skip this step:</span><p><p id="mrs_01_0760__p11666112712618"><strong id="mrs_01_0760__b63137139723423">kinit</strong> <em id="mrs_01_0760__i49797120323423">MRS cluster user</em></p>
<p id="mrs_01_0760__p96661227142616">Example: <strong id="mrs_01_0760__b1666192719264">kinit admin</strong></p>
</p></li><li id="mrs_01_0760__li16661527172618"><span>Run the following commands to start Hive Beeline and create a table on Alluxio:</span><p><p id="mrs_01_0760__p10761937174013"><strong id="mrs_01_0760__b17419132213277">beeline</strong></p>
<p id="mrs_01_0760__p937819407400"><strong id="mrs_01_0760__b74301228275">CREATE TABLE u_user (id int, name string, company string) ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' LOCATION 'alluxio://<<em id="mrs_01_0760__i91725562013">Name of the Alluxio node</em>>:19998/u_user';</strong></p>
<p id="mrs_01_0760__p2354127192713"><strong id="mrs_01_0760__b644012214274">insert into u_user values(1,'Alice','Company A'),(2, 'Bob', 'Company B');</strong></p>
<div class="note" id="mrs_01_0760__note790712170237"><img src="public_sys-resources/note_3.0-en-us.png"><span class="notetitle"> </span><div class="notebody"><p id="mrs_01_0760__p1993712510235">Replace <strong id="mrs_01_0760__b130373715218"><<em id="mrs_01_0760__i6303163714211">Name of the Alluxio node</em></strong><strong id="mrs_01_0760__b530313371226">>:19998</strong> with the node names and port numbers of all nodes where the AlluxioMaster instance is deployed. Separate the entries with commas (,), for example, <strong id="mrs_01_0760__b1304163717214">node-ana-coremspb.mrs-m0va.com:19998,node-master2kiww.mrs-m0va.com:19998,node-master1cqwv.mrs-m0va.com:19998</strong>.</p>
</div></div>
</p></li><li id="mrs_01_0760__li1489191518288"><span>Start the Presto client. For details, see steps <a href="mrs_01_0434.html#mrs_01_0434__li9368161132311">2</a> to <a href="mrs_01_0434.html#mrs_01_0434__li15202527183812">8</a> in <a href="mrs_01_0434.html">Using a Client to Execute Query Statements</a>.</span></li><li id="mrs_01_0760__li449804943211"><span>On the Presto client, run the <strong id="mrs_01_0760__b1572305123519">select * from hive.default.u_user;</strong> statement to query the table created in Alluxio:</span><p><div class="fignone" id="mrs_01_0760__fig129013018357"><span class="figcap"><b>Figure 1 </b>Using Presto to query the table created in Alluxio</span><br><span><img id="mrs_01_0760__image14557131914417" src="en-us_image_0000001349170061.png"></span></div>
</p></li></ol>
</div>
</div>
<div>
<div class="familylinks">
<div class="parentlink"><strong>Parent topic:</strong> <a href="mrs_01_0756.html">Using Alluxio</a></div>
</div>
</div>