Yang, Tong 6182f91ba8 MRS component operation guide_normal 2.0.38.SP20 version
Reviewed-by: Hasko, Vladimir <vladimir.hasko@t-systems.com>
Co-authored-by: Yang, Tong <yangtong2@huawei.com>
Co-committed-by: Yang, Tong <yangtong2@huawei.com>
2022-12-09 14:55:21 +00:00

<a name="mrs_01_1929"></a><a name="mrs_01_1929"></a>
<h1 class="topictitle1">Getting Started</h1>
<div id="body1595920205285"><p id="mrs_01_1929__a6e676d7951ba4e9e95db18a605c53664">This section describes how to use Spark2x to submit Spark applications, including Spark Core and Spark SQL applications. Spark Core is the kernel module of Spark. It executes tasks and is used to write Spark applications. Spark SQL is the module that executes SQL statements.</p>
<div class="section" id="mrs_01_1929__s3f3243f155584adea798b1be45b9e386"><h4 class="sectiontitle">Scenario Description</h4><p id="mrs_01_1929__af9124550b57d4389bee89da9987886db">Develop a Spark application to perform the following operations on logs about netizens' dwell time for online shopping on a weekend.</p>
<ul id="mrs_01_1929__udba8bb2c79744ce3aecb327e8dc39ca9"><li id="mrs_01_1929__l8cd8fb6e3df74c3f9e747200465b3f8c">Collect statistics on female netizens who spend more than 2 hours on online shopping on the weekend.</li><li id="mrs_01_1929__lf3367d7979de4740abda4d057da37869">In each log file, the first column records the name, the second column records the gender, and the third column records the dwell duration in minutes. The columns are separated by commas (,).</li></ul>
<p id="mrs_01_1929__a81dccf98f1804cbdbbe5bd5d1b1633af"><strong id="mrs_01_1929__b23263533011281">log1.txt</strong>: logs collected on Saturday</p>
<pre class="screen" id="mrs_01_1929__s655d928a134a47af95285fbc7e2dd4e0">LiuYang,female,20
YuanJing,male,10
GuoYijun,male,5
CaiXuyu,female,50
Liyuan,male,20
FangBo,female,50
LiuYang,female,20
YuanJing,male,10
GuoYijun,male,50
CaiXuyu,female,50
FangBo,female,60</pre>
<p id="mrs_01_1929__a9e1580a3a62a4a349e270a771d41ff3d"><strong id="mrs_01_1929__b68315760511281">log2.txt</strong>: logs collected on Sunday</p>
<pre class="screen" id="mrs_01_1929__s6cf74abb192b4a80aa908eac65c4d0c1">LiuYang,female,20
YuanJing,male,10
CaiXuyu,female,50
FangBo,female,50
GuoYijun,male,5
CaiXuyu,female,50
Liyuan,male,20
CaiXuyu,female,50
FangBo,female,50
LiuYang,female,20
YuanJing,male,10
FangBo,female,50
GuoYijun,male,50
CaiXuyu,female,50
FangBo,female,60</pre>
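<p>As a quick local cross-check of what the application is expected to produce, the sample logs can be aggregated with standard shell tools before anything is submitted to the cluster. The following is only a sketch and is not part of the sample project. It assumes that dwell times are summed per person across both files and that "more than 2 hours" means a total above 120 minutes. The function name <strong>female_dwell_over</strong> is hypothetical, and the file names in the usage comment are the local copies created in the procedure of this section.</p>
<pre class="screen"># Sketch: print "name,total_minutes" for every female user whose summed
# dwell time across the given log files exceeds the limit (in minutes).
female_dwell_over() {
    limit="$1"
    shift
    awk -F, -v limit="$limit" '
        $2 == "female" { total[$1] += $3 }   # add up minutes per female name
        END {
            for (name in total)
                if (total[name] > limit)
                    print name "," total[name]
        }
    ' "$@" | sort
}

# Usage: female_dwell_over 120 input_data1.txt input_data2.txt
</pre>
<p>With the two sample logs above, this should list CaiXuyu (300 minutes) and FangBo (320 minutes). LiuYang totals 80 minutes and is filtered out.</p>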
</div>
<div class="section" id="mrs_01_1929__s896a88005fdf49d3bd7d0e30e7609222"><h4 class="sectiontitle">Prerequisites</h4><ul id="mrs_01_1929__u8238a509ce324ac6bca5f74035928e23"><li id="mrs_01_1929__l4f8e01e9ce8f4c8dbf9190302d97f57d">On Manager, you have created a user and granted the HDFS, Yarn, Kafka, and Hive permissions to the user.</li><li id="mrs_01_1929__l289b916b9ceb4873b9d19a3b836bcaa9">You have installed and configured tools such as IntelliJ IDEA and JDK based on the development language.</li><li id="mrs_01_1929__l2362814fb4a94cfa98525cf9d6df7ac3">You have installed the Spark2x client and configured the client network connection.</li><li id="mrs_01_1929__lf629b14cfd4341ae8eeadd7e5335d503">For Spark SQL programs, you have started Spark SQL or Beeline on the client to enter SQL statements.</li></ul>
</div>
<div class="section" id="mrs_01_1929__sc5f99a209f004e958e5e2eaacd0f0ab8"><h4 class="sectiontitle">Procedure</h4><ol id="mrs_01_1929__o629d474c9d5142e8b1432ed6e721c9cd"><li id="mrs_01_1929__l77a871bc1c094dceaad5aad62c026a7e"><a name="mrs_01_1929__l77a871bc1c094dceaad5aad62c026a7e"></a><a name="l77a871bc1c094dceaad5aad62c026a7e"></a><span>Obtain the sample project and import it into IDEA. Import the JAR packages that the sample project depends on. Use IDEA to configure and generate the JAR package.</span></li><li id="mrs_01_1929__lb7fcda0824644fa39186196187e1f0b9"><span>Prepare the data required by the sample project.</span><p><div class="p" id="mrs_01_1929__aea9125f048414a4e874a4bf11d05ca7e">Save the log files from the scenario description to HDFS.<ol type="a" id="mrs_01_1929__o0bfe1121300b4b6ba949697e4155b678"><li id="mrs_01_1929__lc8e2ff515cd04ded9326569a61755e32">Create two text files (<strong id="mrs_01_1929__b76858671811281">input_data1.txt</strong> and <strong id="mrs_01_1929__b53126341411281">input_data2.txt</strong>) on the local host and copy the content of <strong id="mrs_01_1929__b122801836711281">log1.txt</strong> and <strong id="mrs_01_1929__b206993422311281">log2.txt</strong> to <strong id="mrs_01_1929__b73621030211281">input_data1.txt</strong> and <strong id="mrs_01_1929__b174448241111281">input_data2.txt</strong>, respectively.</li><li id="mrs_01_1929__l6e2b671e1384403da9edbe9fddac4872"><a name="mrs_01_1929__l6e2b671e1384403da9edbe9fddac4872"></a><a name="l6e2b671e1384403da9edbe9fddac4872"></a>Create the <strong id="mrs_01_1929__b813347311281">/tmp/input</strong> directory in HDFS, and upload <strong id="mrs_01_1929__b158014944311281">input_data1.txt</strong> and <strong id="mrs_01_1929__b149381990711281">input_data2.txt</strong> to the <strong id="mrs_01_1929__b9596624011281">/tmp/input</strong> directory.</li></ol>
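<p>The upload commands are not listed in this section. A typical sequence using the standard HDFS shell is sketched below. It assumes that the client environment variables are already loaded, that you have authenticated (for example, with <strong>kinit</strong>) as a user with HDFS permissions, and that both files are in the current local directory.</p>
<pre class="screen"># Create the target directory in HDFS (-p: no error if it already exists)
hdfs dfs -mkdir -p /tmp/input
# Upload both local files to the directory
hdfs dfs -put input_data1.txt input_data2.txt /tmp/input
# List the directory to confirm that both files arrived
hdfs dfs -ls /tmp/input</pre>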
</div>
</p></li><li id="mrs_01_1929__l96bf79434b2946cebb71d7ecf337af33"><span>Upload the generated JAR package to a directory in the Spark2x running environment (the Spark2x client), for example, <span class="filepath" id="mrs_01_1929__f081cb81ca3f0446f9fb2e845efe2007f"><b>/opt/female</b></span>.</span></li><li id="mrs_01_1929__lf3302495e40a4d5dac621ffd778efc5e"><span>Go to the client directory, configure the environment variables, and log in to the system. If multiple Spark2x instances are installed, or both Spark and Spark2x instances are installed, and you use the client to connect to a specific instance, run the following commands to load the environment variables of that instance.</span><p><p id="mrs_01_1929__a6b46c6126fde420dac710fe838fd6f94"><i><b><span class="cmdname" style="font-family:Arial" id="mrs_01_1929__cmdname3607151193014">source bigdata_env</span></b></i></p>
<p id="mrs_01_1929__p7260190175116"><b><span class="cmdname" id="mrs_01_1929__cmdname917417102820">source Spark2x/component_env</span></b></p>
<p id="mrs_01_1929__a5ff6850adc834198baf031fd3a9c74d1"><strong id="mrs_01_1929__b17729121152415">kinit &lt;</strong><em id="mrs_01_1929__i37302182414">service user for authentication</em><strong id="mrs_01_1929__b273017120249">&gt;</strong></p>
</p></li><li id="mrs_01_1929__l89b647b0710b42f9963e106c5825313d"><span>Run the following script in the <strong id="mrs_01_1929__b187681005911281">bin</strong> directory to submit the Spark application:</span><p><p id="mrs_01_1929__a2a583150840b49efbe07198ced5c22f2"><i><b><span class="cmdname" style="font-family:Arial" id="mrs_01_1929__cmdname027042918305">spark-submit --class</span></b></i> <i><span class="varname" id="mrs_01_1929__v5189db801f3d4f95a3fe151dbda4f70b">com.xxxx.bigdata.spark.examples.FemaleInfoCollection</span></i> <i><b><span class="cmdname" style="font-family:Arial" id="mrs_01_1929__cmdname1527092915309">--master yarn-client</span></b></i> <i><span class="varname" id="mrs_01_1929__v33a359a5cdb548c8bff0750a1475b29b">/opt/female/FemaleInfoCollection.jar &lt;inputPath&gt;</span></i></p>
<div class="note" id="mrs_01_1929__note133928426112"><img src="public_sys-resources/note_3.0-en-us.png"><span class="notetitle"> </span><div class="notebody"><ul id="mrs_01_1929__ul1026784512136"><li id="mrs_01_1929__li9267124520137"><strong id="mrs_01_1929__b178345519581">FemaleInfoCollection.jar</strong> is the JAR package generated in <a href="#mrs_01_1929__l77a871bc1c094dceaad5aad62c026a7e">1</a>.</li><li id="mrs_01_1929__li11268154531312"><strong id="mrs_01_1929__b617901605910">&lt;inputPath&gt;</strong> is the directory created in <a href="#mrs_01_1929__l6e2b671e1384403da9edbe9fddac4872">2.b</a>.</li></ul>
</div></div>
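<p>For instance, with the JAR package uploaded to <b>/opt/female</b> and the input data in <b>/tmp/input</b>, as in the earlier steps, the command could look like the following sketch. The class name keeps the <b>com.xxxx</b> placeholder from the command above, so replace it with the package name used in your sample project. In Spark 2.x, <b>--master yarn --deploy-mode client</b> is the non-deprecated way to request the same mode as <b>--master yarn-client</b>.</p>
<pre class="screen"># Submit the sample application in YARN client mode, reading input from /tmp/input
spark-submit --class com.xxxx.bigdata.spark.examples.FemaleInfoCollection --master yarn-client /opt/female/FemaleInfoCollection.jar /tmp/input</pre>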
</p></li><li id="mrs_01_1929__l57b093d8a7bf4d5a8a67ad089a0cdf1d"><span>(Optional) Run the <strong id="mrs_01_1929__b177091202311281">spark-sql</strong> or <strong id="mrs_01_1929__b42303654111281">spark-beeline</strong> script in the <strong id="mrs_01_1929__b182449022511281">bin</strong> directory, and then enter SQL statements directly to perform operations such as queries.</span><p><p id="mrs_01_1929__a9df43a9600f645c184d39c69531b735e">For example, create a table, insert a row, and then query the table.</p>
<pre class="screen" id="mrs_01_1929__sbaab35ab543745cf9eb544fccf63bd15">spark-sql&gt; CREATE TABLE TEST(NAME STRING, AGE INT);
Time taken: 0.348 seconds
spark-sql&gt; INSERT INTO TEST VALUES('Jack', 20);
Time taken: 1.13 seconds
spark-sql&gt; SELECT * FROM TEST;
Jack 20
Time taken: 0.18 seconds, Fetched 1 row(s)</pre>
</p></li><li id="mrs_01_1929__ld86303b2b69d4ec3bfbdfd2b6317a90d"><span>View the running result of the Spark application.</span><p><ul id="mrs_01_1929__u6252f5bbd63843ffa6ae00a0dea12021"><li id="mrs_01_1929__la783f93744f744eda5a204ea60a76cef">View the running result data in a specified file.<p id="mrs_01_1929__afac2f708781f4a8394ed26bc618268a7"><a name="mrs_01_1929__la783f93744f744eda5a204ea60a76cef"></a><a name="la783f93744f744eda5a204ea60a76cef"></a>The storage path and format of the result data are specified by the Spark application.</p>
</li><li id="mrs_01_1929__l6bddd238c440495e902ed2d5ba76340a">Check the running status on the web page.<ol type="a" id="mrs_01_1929__o72e870619b8349559a26a436e59d747a"><li id="mrs_01_1929__lae6f74a3590d4e33a6884886623f0abe">Log in to Manager. Select <strong id="mrs_01_1929__b123306341511281">Spark2x</strong> from the <strong id="mrs_01_1929__b188326740811281">Service</strong> drop-down list.</li></ol><ol type="a" start="2" id="mrs_01_1929__oa85e6c8cab4c40a7834831af85d74db3"><li id="mrs_01_1929__l08bc0abc862f41b5888e01883cb47b6f">Go to the Spark2x overview page and click an instance in the Spark web UI, for example, <strong id="mrs_01_1929__b102630116011281">JobHistory2x(host2)</strong>.</li><li id="mrs_01_1929__l4261f8e7632446a69d6e2f81c60a27f6">The History Server UI is displayed.<p id="mrs_01_1929__a629ed9416c6e405d8ab1670217411a3a"><a name="mrs_01_1929__l4261f8e7632446a69d6e2f81c60a27f6"></a><a name="l4261f8e7632446a69d6e2f81c60a27f6"></a>The History Server UI shows the status of completed and incomplete Spark applications.</p>
<div class="fignone" id="mrs_01_1929__fig35497358166"><span class="figcap"><b>Figure 1 </b>History Server UI</span><br><span><img id="mrs_01_1929__image17621929191610" src="en-us_image_0000001439299573.png"></span></div>
</li><li id="mrs_01_1929__l6ca5aead42e14bbc8354d709c1b35114">Click an application ID on this page to go to the Spark UI of the application.<p id="mrs_01_1929__ac7da9f55d77940019569eb867d7f648e"><a name="mrs_01_1929__l6ca5aead42e14bbc8354d709c1b35114"></a><a name="l6ca5aead42e14bbc8354d709c1b35114"></a>The Spark UI displays the status of running applications.</p>
<div class="fignone" id="mrs_01_1929__fig7825164831912"><span class="figcap"><b>Figure 2 </b>Spark UI</span><br><span><img id="mrs_01_1929__image55054001913" src="en-us_image_0000001438291713.png"></span></div>
</li></ol>
</li><li id="mrs_01_1929__l6ba3c0fbf4934bec8fb6f1ec094b7c86">View Spark logs to learn how the application ran.<p id="mrs_01_1929__ad2660b2588a74495b3ba7ffacaf6c4d7"><a name="mrs_01_1929__l6ba3c0fbf4934bec8fb6f1ec094b7c86"></a><a name="l6ba3c0fbf4934bec8fb6f1ec094b7c86"></a>Check the logs described in <a href="mrs_01_1971.html">Spark2x Logs</a> to understand the running status of the application, and adjust the application based on the log information.</p>
</li></ul>
</p></li></ol>
</div>
</div>
<div>
<div class="familylinks">
<div class="parentlink"><strong>Parent topic:</strong> <a href="mrs_01_1928.html">Basic Operation</a></div>
</div>
</div>