Yang, Tong 6182f91ba8 MRS component operation guide_normal 2.0.38.SP20 version
Reviewed-by: Hasko, Vladimir <vladimir.hasko@t-systems.com>
Co-authored-by: Yang, Tong <yangtong2@huawei.com>
Co-committed-by: Yang, Tong <yangtong2@huawei.com>
2022-12-09 14:55:21 +00:00


<a name="mrs_01_1934"></a>
<h1 class="topictitle1">Spark on HBase V2 Overview and Basic Applications</h1>
<div id="body1595920205566"><div class="section" id="mrs_01_1934__scf5e5ef1867443e38f6539643d7069e0"><h4 class="sectiontitle">Scenario</h4><p id="mrs_01_1934__a18ba939289fd42af8928b03e4f5e8cda">Spark on HBase V2 lets you query HBase tables from Spark SQL and write data to HBase tables by using the Beeline tool. You can also use HBase APIs to create tables, read data from them, and insert data into them.</p>
</div>
<div class="section" id="mrs_01_1934__s53d6716d486a44c4abd1c6a267f34b13"><h4 class="sectiontitle">Procedure</h4><ol id="mrs_01_1934__ol16886161321913"><li id="mrs_01_1934__ldcd0883f80dc41aba86350e66b032275"><span>Log in to Manager and choose <span id="mrs_01_1934__text178413512113"><strong id="mrs_01_1934__b19789511118">Cluster</strong> &gt; <em id="mrs_01_1934__i5838518111">Name of the desired cluster</em></span> &gt; <strong id="mrs_01_1934__b584115119111">Cluster Properties</strong> to check whether the cluster is in security mode.</span><p><ul id="mrs_01_1934__ul1852954205415"><li id="mrs_01_1934__li135295411547">If yes, go to <a href="#mrs_01_1934__ld2cb5a391f5e4625bbe0de71686c6bb6">2</a>.</li><li id="mrs_01_1934__li9199164713541">If no, go to <a href="#mrs_01_1934__lca2e4c1523a4428f89fd147c1041a231">5</a>.</li></ul>
</p></li></ol><ol start="2" id="mrs_01_1934__o39a9a466425c47eeaf0ce77118bc43df"><li id="mrs_01_1934__ld2cb5a391f5e4625bbe0de71686c6bb6"><a name="mrs_01_1934__ld2cb5a391f5e4625bbe0de71686c6bb6"></a><a name="ld2cb5a391f5e4625bbe0de71686c6bb6"></a><span>Choose <span id="mrs_01_1934__text438756161114"><span id="mrs_01_1934__text11371056191111"><strong id="mrs_01_1934__b932056161115">Cluster &gt; </strong><em id="mrs_01_1934__i12373569114">Name of the desired cluster</em></span><strong id="mrs_01_1934__b133795611111"> &gt; </strong></span><strong id="mrs_01_1934__b6384566113">Service &gt; Spark2x &gt; Configuration &gt; All Configurations &gt; JDBCServer2x &gt; Default</strong>, and modify the following parameter.</span><p>
<div class="tablenoborder"><table cellpadding="4" cellspacing="0" summary="" id="mrs_01_1934__tc4be9151493b48d9b2a9c8aedf15fb18" frame="border" border="1" rules="all"><caption><b>Table 1 </b>Parameter list 1</caption><thead align="left"><tr id="mrs_01_1934__r1fd48f89d62d4320a02bd42c7c48dede"><th align="left" class="cellrowborder" valign="top" width="46.300000000000004%" id="mcps1.3.2.3.1.2.1.2.4.1.1"><p id="mrs_01_1934__aaef6ce21329140e7bfd9c546ea442a11">Parameter</p>
</th>
<th align="left" class="cellrowborder" valign="top" width="27.51%" id="mcps1.3.2.3.1.2.1.2.4.1.2"><p id="mrs_01_1934__ae6061e1b9c6d4297b0e699b752f4bf32">Default Value</p>
</th>
<th align="left" class="cellrowborder" valign="top" width="26.19%" id="mcps1.3.2.3.1.2.1.2.4.1.3"><p id="mrs_01_1934__a0408a63cc8b9484986ccd39cb18f3b5f">Changed To</p>
</th>
</tr>
</thead>
<tbody><tr id="mrs_01_1934__r6d4557c76aa442c28feafccb15a5eb96"><td class="cellrowborder" valign="top" width="46.300000000000004%" headers="mcps1.3.2.3.1.2.1.2.4.1.1 "><p id="mrs_01_1934__acfdfdf5c65eb4779a1a89dd6eaa2dc7d">spark.yarn.security.credentials.hbase.enabled</p>
</td>
<td class="cellrowborder" valign="top" width="27.51%" headers="mcps1.3.2.3.1.2.1.2.4.1.2 "><p id="mrs_01_1934__a69b7ebc4d8e042b6bd61320cf4e6c0aa">false</p>
</td>
<td class="cellrowborder" valign="top" width="26.19%" headers="mcps1.3.2.3.1.2.1.2.4.1.3 "><p id="mrs_01_1934__a1a057919b97a470b9c8e2cf1effda8e2">true</p>
</td>
</tr>
</tbody>
</table>
</div>
<div class="note" id="mrs_01_1934__n1f4db076482b4157af5680c958347ccc"><img src="public_sys-resources/note_3.0-en-us.png"><span class="notetitle"> </span><div class="notebody"><p id="mrs_01_1934__ad1396cf7c97b4e4cb73ae92e44cac968">To ensure that Spark2x can keep accessing HBase over long periods, do not modify the following parameters of the HBase and HDFS services:</p>
<ul id="mrs_01_1934__udc7b508836704e58a2de53df873fd0eb"><li id="mrs_01_1934__l0d5ced6dedf74384bf5d65c6405d7efb">dfs.namenode.delegation.token.renew-interval</li><li id="mrs_01_1934__ld845dfd3142d4641877778546050cccb">dfs.namenode.delegation.token.max-lifetime</li><li id="mrs_01_1934__l6d00052e5a684a3a9e54d0572a5fb5aa">hbase.auth.key.update.interval</li><li id="mrs_01_1934__l909afc22bdab43b5bc6ca1ff973d5ede">hbase.auth.token.max.lifetime (The value is fixed to <strong id="mrs_01_1934__b1817344552112851">604800000</strong> ms, that is, 7 days.)</li></ul>
<p id="mrs_01_1934__p1324878162754">If service requirements make it necessary to change these parameters, ensure that the value of the HDFS parameter <strong id="mrs_01_1934__b2073706457112851">dfs.namenode.delegation.token.renew-interval</strong> is not greater than the values of the HBase parameters <strong id="mrs_01_1934__b500439343112851">hbase.auth.key.update.interval</strong> and <strong id="mrs_01_1934__b206221437112851">hbase.auth.token.max.lifetime</strong>, or the value of the HDFS parameter <strong id="mrs_01_1934__b496041460112851">dfs.namenode.delegation.token.max-lifetime</strong>.</p>
</div></div>
</p></li><li id="mrs_01_1934__l1e1d3dadd17f45d6889cb9adc58028f9"><span>Choose <span class="menucascade" id="mrs_01_1934__menucascade979017161217"><b><span class="uicontrol" id="mrs_01_1934__uicontrol57848151210">SparkResource2x</span></b> &gt; <b><span class="uicontrol" id="mrs_01_1934__uicontrol12789116121">Default</span></b></span> and modify the following parameters.</span><p>
<div class="tablenoborder"><table cellpadding="4" cellspacing="0" summary="" id="mrs_01_1934__t9625925c743c48719cda76a39c644f04" frame="border" border="1" rules="all"><caption><b>Table 2 </b>Parameter list 2</caption><thead align="left"><tr id="mrs_01_1934__r6cd0ad608b8c41118608941cdd577b7b"><th align="left" class="cellrowborder" valign="top" width="46.12%" id="mcps1.3.2.3.2.2.1.2.4.1.1"><p id="mrs_01_1934__a4fd1fdd6b1dd49b2891706aa024a4854">Parameter</p>
</th>
<th align="left" class="cellrowborder" valign="top" width="27.689999999999998%" id="mcps1.3.2.3.2.2.1.2.4.1.2"><p id="mrs_01_1934__a5c74b135417d4a29a0c8786f99276ac3">Default Value</p>
</th>
<th align="left" class="cellrowborder" valign="top" width="26.19%" id="mcps1.3.2.3.2.2.1.2.4.1.3"><p id="mrs_01_1934__aa3d6c1200ff2493abb5c8624d5d75a51">Changed To</p>
</th>
</tr>
</thead>
<tbody><tr id="mrs_01_1934__rac523e3c4a0c48b182f2d13eac7b2499"><td class="cellrowborder" valign="top" width="46.12%" headers="mcps1.3.2.3.2.2.1.2.4.1.1 "><p id="mrs_01_1934__a271e6ddee7b04bad938fa19875a15cbc">spark.yarn.security.credentials.hbase.enabled</p>
</td>
<td class="cellrowborder" valign="top" width="27.689999999999998%" headers="mcps1.3.2.3.2.2.1.2.4.1.2 "><p id="mrs_01_1934__a09bce325188746c897c8200bfcfe7f77">false</p>
</td>
<td class="cellrowborder" valign="top" width="26.19%" headers="mcps1.3.2.3.2.2.1.2.4.1.3 "><p id="mrs_01_1934__a840a959ec49c45eab76ede32fcdeb6bb">true</p>
</td>
</tr>
</tbody>
</table>
</div>
</p></li><li id="mrs_01_1934__l402c177e2fcc40e4aefeee5764de9fa9"><span>Restart the Spark2x service for the configuration to take effect.</span><p><div class="note" id="mrs_01_1934__ndeccba30d5974dfe9bfc3c02ecca13be"><img src="public_sys-resources/note_3.0-en-us.png"><span class="notetitle"> </span><div class="notebody"><p id="mrs_01_1934__a7a19438b44484944a238747fb28d66c0">If you need to use the Spark on HBase function on the Spark2x client, download and install the Spark2x client again.</p>
</div></div>
</p></li><li id="mrs_01_1934__lca2e4c1523a4428f89fd147c1041a231"><a name="mrs_01_1934__lca2e4c1523a4428f89fd147c1041a231"></a><a name="lca2e4c1523a4428f89fd147c1041a231"></a><span>On the Spark2x client, use a spark-sql or spark-beeline connection to query tables created by Hive on HBase. You can create an HBase table by running SQL commands, or create an external table that is associated with an existing HBase table. The following steps use the HBase table <strong id="mrs_01_1934__b1186721711112851">table2</strong> as an example.</span><p><ol type="a" id="mrs_01_1934__oceaff45b15a3413db4fd1e8bb3e3c515"><li id="mrs_01_1934__l9f23464dea9849498ac4f504a3c1f4ff">Run the following commands to create a table using the spark-beeline tool:<p id="mrs_01_1934__p132119157144"><a name="mrs_01_1934__l9f23464dea9849498ac4f504a3c1f4ff"></a><a name="l9f23464dea9849498ac4f504a3c1f4ff"></a><strong id="mrs_01_1934__b29612641410">create table </strong><em id="mrs_01_1934__i12181510142">hbaseTable1</em></p>
<p id="mrs_01_1934__p1211815121412"><strong id="mrs_01_1934__b51001722111412">(</strong><em id="mrs_01_1934__i188419397341">id string,</em> <em id="mrs_01_1934__i158861739143418">name string,</em> <em id="mrs_01_1934__i138889396342">age int</em><strong id="mrs_01_1934__b1511392211149">)</strong></p>
<p id="mrs_01_1934__p120191516141"><strong id="mrs_01_1934__b7114152221418">using org.apache.spark.sql.hbase.HBaseSource</strong><strong id="mrs_01_1934__b52546375491">V2</strong></p>
<p id="mrs_01_1934__p420171518148"><strong id="mrs_01_1934__b711652213147">options(</strong></p>
<p id="mrs_01_1934__p1120191561419"><strong id="mrs_01_1934__b2407113218144">hbaseTableName </strong><em id="mrs_01_1934__i996815326148">"table2"</em><strong id="mrs_01_1934__b1540763231412">,</strong></p>
<p id="mrs_01_1934__p16202157143"><strong id="mrs_01_1934__b1042544953516">keyCols </strong><em id="mrs_01_1934__i1492454943510">"</em><em id="mrs_01_1934__i154281549123514">id</em><em id="mrs_01_1934__i7924124923515">"</em><strong id="mrs_01_1934__b15425134911357">,</strong></p>
<p id="mrs_01_1934__p142041591415"><strong id="mrs_01_1934__b81211622121413">colsMapping "</strong><em id="mrs_01_1934__i13285193711356">name=</em><em id="mrs_01_1934__i94501536183519">cf1.cq1</em><em id="mrs_01_1934__i16285437163518">,</em><em id="mrs_01_1934__i202881137123513">age=</em><em id="mrs_01_1934__i154562367350">cf1.cq2</em><strong id="mrs_01_1934__b064912654118">");</strong></p>
<div class="note" id="mrs_01_1934__note177541450171811"><img src="public_sys-resources/note_3.0-en-us.png"><span class="notetitle"> </span><div class="notebody"><ul id="mrs_01_1934__ul17655287463"><li id="mrs_01_1934__li136551187466"><strong id="mrs_01_1934__b1839274815112851">hbaseTable1</strong>: name of the created Spark table</li><li id="mrs_01_1934__li965618124615"><strong id="mrs_01_1934__b609991621112851">id string, name string</strong>, <strong id="mrs_01_1934__b1318848822112851">age int</strong>: field names and field types of the Spark table</li><li id="mrs_01_1934__li17656138114610"><strong id="mrs_01_1934__b1842976931112851">table2</strong>: name of the HBase table</li><li id="mrs_01_1934__li5656789465"><em id="mrs_01_1934__i630222286112851">id</em>: row key column name of the HBase table</li><li id="mrs_01_1934__li1165720854612"><em id="mrs_01_1934__i1876466365112851">name=cf1.cq1</em>, <em id="mrs_01_1934__i713791237112851">age=cf1.cq2</em>: mapping between columns in the Spark table and columns in the HBase table. The <strong id="mrs_01_1934__b1746960273112851">name</strong> column of the Spark table maps to the <strong id="mrs_01_1934__b1555557655112851">cq1</strong> column in the <strong id="mrs_01_1934__b1831261683112851">cf1</strong> column family of the HBase table, and the <strong id="mrs_01_1934__b70906935112851">age</strong> column of the Spark table maps to the <strong id="mrs_01_1934__b1871630294112851">cq2</strong> column in the <strong id="mrs_01_1934__b204626021112851">cf1</strong> column family of the HBase table.</li></ul>
</div></div>
</li><li id="mrs_01_1934__l42ff86dc110648738bf5abbe63c1b14b">Run the following command to import data to the HBase table using a CSV file:<p id="mrs_01_1934__p19354836141311"><a name="mrs_01_1934__l42ff86dc110648738bf5abbe63c1b14b"></a><a name="l42ff86dc110648738bf5abbe63c1b14b"></a><strong id="mrs_01_1934__b1736814061317">hbase org.apache.hadoop.hbase.mapreduce.ImportTsv </strong><strong id="mrs_01_1934__b1169810911367">-Dimporttsv.separator="</strong><strong id="mrs_01_1934__b529822713813">,</strong><strong id="mrs_01_1934__b26991398362">" -</strong><strong id="mrs_01_1934__b195092843614">Dimporttsv.columns=HBASE_ROW_KEY,</strong><em id="mrs_01_1934__i159541228143619">cf1:cq1,cf1:cq2,cf1:cq3,cf1:cq4,cf1:cq5</em><strong id="mrs_01_1934__b17464113319369"> </strong><em id="mrs_01_1934__i112373417361">table2 /hperson</em></p>
<p id="mrs_01_1934__p749019919586">In this command, <strong id="mrs_01_1934__b384558963112851">table2</strong> is the name of the HBase table, and <strong id="mrs_01_1934__b1645000586112851">/hperson</strong> is the path where the CSV file is stored.</p>
</li><li id="mrs_01_1934__la9d0aa37ddc544cb9018aed8baf035f2">Run the following command to query data in spark-sql or spark-beeline. <em id="mrs_01_1934__i2134589995112851">hbaseTable1</em> indicates the corresponding Spark table name.<p id="mrs_01_1934__p487815444223"><strong id="mrs_01_1934__b83911916233">select * from </strong><em id="mrs_01_1934__i1296220972315">hbaseTable</em><em id="mrs_01_1934__i182321422357">1;</em></p>
</li></ol>
</p></li></ol>
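<p>Before running the ImportTsv command in step 5.b, it can help to confirm that every CSV row has one field per entry in <strong>-Dimporttsv.columns</strong>. The following is a minimal Python sketch of such a pre-import check, not part of the product. The function name <strong>find_bad_rows</strong> and the sample rows are hypothetical, and the sketch assumes that no field contains the separator character. How ImportTsv itself treats rows of a different width is not covered here.</p>

```python
# Column specification copied from the -Dimporttsv.columns option in step 5.b.
IMPORTTSV_COLUMNS = "HBASE_ROW_KEY,cf1:cq1,cf1:cq2,cf1:cq3,cf1:cq4,cf1:cq5"


def find_bad_rows(csv_text, columns=IMPORTTSV_COLUMNS, separator=","):
    """Return (line_number, field_count) for each row whose width differs from the spec.

    Assumption: fields never contain the separator, so a plain split is enough.
    Blank lines are skipped.
    """
    expected = len(columns.split(","))
    bad = []
    for line_no, line in enumerate(csv_text.splitlines(), start=1):
        if not line.strip():
            continue
        count = len(line.split(separator))
        if count != expected:
            bad.append((line_no, count))
    return bad


# Hypothetical sample data: a row key followed by five cf1 qualifiers.
# The third row is deliberately too short.
SAMPLE = "1,Tom,23,a,b,c\n2,Ann,31,d,e,f\n3,Bob,40\n"

if __name__ == "__main__":
    for line_no, count in find_bad_rows(SAMPLE):
        print(f"line {line_no}: {count} fields, expected 6")
```

<p>To check a real file, read its contents into a string and pass it to <strong>find_bad_rows</strong>. An empty result means every row matches the column list.</p>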
</div>
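<p>The constraint in the note under step 2 (the HDFS token renew interval must not exceed the other three token parameters) can be checked mechanically if you do have to change these values. The following is a minimal Python sketch under stated assumptions. The parameter names come from the note, and only the value of <strong>hbase.auth.token.max.lifetime</strong> (604800000 ms) is stated in this guide. The other values in <strong>EXAMPLE_CONFIG</strong> and the function name <strong>violated_bounds</strong> are illustrative assumptions.</p>

```python
# All values are in milliseconds.
# Only hbase.auth.token.max.lifetime (604800000) is stated in this guide;
# the remaining values are illustrative assumptions, not verified defaults.
EXAMPLE_CONFIG = {
    "dfs.namenode.delegation.token.renew-interval": 86400000,
    "dfs.namenode.delegation.token.max-lifetime": 604800000,
    "hbase.auth.key.update.interval": 86400000,
    "hbase.auth.token.max.lifetime": 604800000,
}

RENEW_INTERVAL = "dfs.namenode.delegation.token.renew-interval"

# Parameters that the renew interval must not exceed, per the note in step 2.
UPPER_BOUNDS = (
    "hbase.auth.key.update.interval",
    "hbase.auth.token.max.lifetime",
    "dfs.namenode.delegation.token.max-lifetime",
)


def violated_bounds(config):
    """Return the names of parameters that are smaller than the renew interval."""
    renew = config[RENEW_INTERVAL]
    return [name for name in UPPER_BOUNDS if config[name] < renew]


if __name__ == "__main__":
    problems = violated_bounds(EXAMPLE_CONFIG)
    if problems:
        print("renew-interval exceeds:", ", ".join(problems))
    else:
        print("token parameters are consistent")
```

<p>An empty list means the renew interval is not greater than any of the three bounds. Equal values are accepted, because the note requires only "not greater than".</p>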
</div>
<div>
<div class="familylinks">
<div class="parentlink"><strong>Parent topic:</strong> <a href="mrs_01_1928.html">Basic Operation</a></div>
</div>
</div>