<a name="mrs_01_2062"></a>
<h1 class="topictitle1">How Does Spark2x Access External Cluster Components?</h1>
<div id="body1595920225826"><div class="section" id="mrs_01_2062__section28541140131913"><h4 class="sectiontitle">Question</h4><p id="mrs_01_2062__p1897514221714">There are two clusters, cluster 1 and cluster 2. How do I use Spark2x in cluster 1 to access HDFS, Hive, HBase, and Kafka components in cluster 2?</p>
</div>
<div class="section" id="mrs_01_2062__section1725115551193"><h4 class="sectiontitle">Answer</h4><ol id="mrs_01_2062__ol16382126207"><li id="mrs_01_2062__li1638272152018">Components in two clusters can access each other. However, there are the following restrictions:<ul id="mrs_01_2062__ul87901331122514"><li id="mrs_01_2062__li979073182515">Only one Hive MetaStore can be accessed. Specifically, Hive MetaStore in cluster 1 and Hive MetaStore in cluster 2 cannot be accessed at the same time.</li><li id="mrs_01_2062__li1279123132516">User systems in different clusters are not synchronized. When users access components in another cluster, user permission is determined by the user configuration of the peer cluster. For example, if user A of cluster 1 does not have the permissions to access the HBase meta table in cluster 1 but user A of cluster 2 can access the HBase meta table in cluster 2, user A of cluster 1 can access the HBase meta table in cluster 2.</li><li id="mrs_01_2062__li147911331162510">To enable components in a security cluster to communicate with each other across Manager, you need to configure mutual trust.</li></ul>
</li><li id="mrs_01_2062__li038217215201">The following describes how to access Hive, HBase, and Kafka components in cluster 2 as user A.<div class="note" id="mrs_01_2062__note67917184279"><img src="public_sys-resources/note_3.0-en-us.png"><span class="notetitle"> </span><div class="notebody"><p id="mrs_01_2062__p579211842713">The following operations are based on the scenario where a user uses the FusionInsight client to submit the Spark2x application. If the user uses the configuration file directory, the user needs to modify the corresponding file in the configuration directory of the application and upload the configuration file to the executor.</p>
<p id="mrs_01_2062__p179533713518">When the HDFS and HBase clients access the server, <strong id="mrs_01_2062__b18786102384414">hostname</strong> is used to configure the server address. Therefore, the hosts configuration of all nodes to be accessed must be saved in the <strong id="mrs_01_2062__b9925163010446">/etc/hosts</strong> file on the client. You can add the host of the peer cluster node to the<strong id="mrs_01_2062__b3593837164411"> /etc/hosts</strong> file of the client node in advance.</p>
</div></div>
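<p>For example, the entries added to <strong>/etc/hosts</strong> might look as follows; the IP addresses and hostnames here are placeholders and must be replaced with the actual values of the cluster 2 nodes:</p>
<pre class="screen">192.168.0.1 node-master1-cluster2
192.168.0.2 node-core1-cluster2</pre>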
<ul id="mrs_01_2062__ul1930374619318"><li id="mrs_01_2062__li530314611313">Access Hive metastore: Replace the<strong id="mrs_01_2062__b920019399442"> hive-site.xml</strong> file in the <span class="filepath" id="mrs_01_2062__filepath14303146183115"><b>conf</b></span> directory of the Spark2x client in cluster 1 with the <strong id="mrs_01_2062__b3209639164414">hive-site.xml</strong> file in the <span class="filepath" id="mrs_01_2062__filepath10303184683111"><b>conf</b></span> directory of the Spark2x client in cluster 2.<p id="mrs_01_2062__p61451156141016">After the preceding operations are performed, you can use Spark SQL to access Hive MetaStore. To access Hive table data, you need to perform the operations in <a href="#mrs_01_2062__li13277182417246">• Access HDFS of two clusters at the same time:</a> and set <strong id="mrs_01_2062__b10729165014412">nameservice</strong> of the peer cluster to <strong id="mrs_01_2062__b19730250174417">LOCATION</strong>.</p>
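<p>For example, assuming the clients of cluster 1 and cluster 2 are installed in <strong>/opt/client1</strong> and <strong>/opt/client2</strong> on the same node (these paths are placeholders; use the actual client installation directories), the replacement can be done as follows:</p>
<pre class="screen">cp /opt/client2/Spark2x/spark/conf/hive-site.xml /opt/client1/Spark2x/spark/conf/hive-site.xml</pre>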
</li><li id="mrs_01_2062__li20303446173111">Access HBase of the peer cluster.<ol type="a" id="mrs_01_2062__ol9715599203"><li id="mrs_01_2062__li438382132016">Configure the IP addresses and host names of all ZooKeeper nodes and HBase nodes in cluster 2 in the <strong id="mrs_01_2062__b4265438184515">/etc/hosts </strong>file on the client node of cluster 1.</li><li id="mrs_01_2062__li1838317213202">Replace the<strong id="mrs_01_2062__b61884514453"> hbase-site.xml</strong> file in the <span class="filepath" id="mrs_01_2062__filepath15303114612315"><b>conf</b></span> directory of the Spark2x client in cluster 1 with the <strong id="mrs_01_2062__b819745164511">hbase-site.xml</strong> file in the <span class="filepath" id="mrs_01_2062__filepath123039460311"><b>conf</b></span> directory of the Spark2x client in cluster 2.</li></ol>
</li><li id="mrs_01_2062__li730354613120">Access Kafka: Set the address of the Kafka Broker to be accessed to the Kafka Broker address in cluster 2.</li><li id="mrs_01_2062__li13277182417246"><a name="mrs_01_2062__li13277182417246"></a><a name="li13277182417246"></a>Access HDFS of two clusters at the same time:<ul id="mrs_01_2062__ul2054015424248"><li id="mrs_01_2062__li12231539142413">Two tokens with the same NameService cannot be obtained at the same time. Therefore, the NameServices of the HDFS in two clusters must be different. For example, one is <strong id="mrs_01_2062__b158573254619">hacluster</strong>, and the other is <strong id="mrs_01_2062__b488993710466">test</strong>.<ol type="a" id="mrs_01_2062__ol192151415174212"><li id="mrs_01_2062__li17215141510428">Obtain the following configurations from the <strong id="mrs_01_2062__b154477924710">hdfs-site.xml</strong> file of cluster2 and add them to the <strong id="mrs_01_2062__b166951813154712">hdfs-site.xml</strong> file in the <strong id="mrs_01_2062__b18308191615472">conf</strong> directory of the Spark2x client in cluster1:<p id="mrs_01_2062__p17755133319293"><strong id="mrs_01_2062__b98781739114711">dfs.nameservices.mappings</strong>, <strong id="mrs_01_2062__b155194364711">dfs.nameservices</strong>, <strong id="mrs_01_2062__b107324465475">dfs.namenode.rpc-address.test.*</strong>, <strong id="mrs_01_2062__b1189034916471">dfs.ha.namenodes.test</strong>, and <strong id="mrs_01_2062__b14166185316471">dfs.client.failover.proxy.provider.test</strong></p>
<p id="mrs_01_2062__p792111306346">The following is an example:</p>
<pre class="screen" id="mrs_01_2062__screen886613314207"><property>
|
|
<name>dfs.nameservices.mappings</name>
|
|
<value>[{"name":"hacluster","roleInstances":["14","15"]},{"name":"test","roleInstances":["16","17"]}]</value>
|
|
</property>
|
|
<property>
|
|
<name>dfs.nameservices</name>
|
|
<value>hacluster,test</value>
|
|
</property>
|
|
<property>
|
|
<name>dfs.namenode.rpc-address.test.16</name>
|
|
<value>192.168.0.1:8020</value>
|
|
</property>
|
|
<property>
|
|
<name>dfs.namenode.rpc-address.test.17</name>
|
|
<value>192.168.0.2:8020</value>
|
|
</property>
|
|
<property>
|
|
<name>dfs.ha.namenodes.test</name>
|
|
<value>16,17</value>
|
|
</property>
|
|
<property>
|
|
<name>dfs.client.failover.proxy.provider.test</name>
|
|
<value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
|
|
</property></pre>
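<p>After the configuration is added, access to the peer nameservice can be verified from the client node of cluster 1, assuming the client environment variables have been sourced and a user with the required permissions has been authenticated:</p>
<pre class="screen">hdfs dfs -ls hdfs://test/</pre>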
</li></ol><ol type="a" start="2" id="mrs_01_2062__ol1215181504216"><li id="mrs_01_2062__li24041549102119">Modify <strong id="mrs_01_2062__b75816387206">spark.yarn.extra.hadoopFileSystems = hdfs://test and spark.hadoop.hdfs.externalToken.enable = true</strong> in the <strong id="mrs_01_2062__b15326045132012">spark-defaults.conf </strong>configuration file under the <strong id="mrs_01_2062__b15562144817201">conf</strong> directory on the Spark client of cluster 1.<pre class="screen" id="mrs_01_2062__screen1318014199251">spark.yarn.extra.hadoopFileSystems = hdfs://test
|
|
spark.hadoop.hdfs.externalToken.enable = true</pre>
</li></ol><ol type="a" start="3" id="mrs_01_2062__ol18644315446"><li id="mrs_01_2062__li1945532005518">In the application submission command, add the <strong id="mrs_01_2062__b927713119212">--keytab</strong> and <strong id="mrs_01_2062__b1046319313213">--principal</strong> parameters and set them to the user who submits the task in cluster1.</li><li id="mrs_01_2062__li1664163164412">Use the Spark client of cluster1 to submit the application. Then, the two HDFS services can be accessed at the same time.</li></ol>
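<p>A minimal sketch of such a submission command; the keytab path, principal, class name, and JAR name below are placeholders and must be replaced with the actual values:</p>
<pre class="screen">spark-submit --master yarn --deploy-mode client \
--keytab /opt/client1/user.keytab \
--principal sparkuser \
--class com.example.SparkApp SparkApp.jar</pre>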
</li></ul>
</li><li id="mrs_01_2062__li1344714364414">Access HBase of two clusters at the same time:<ol type="a" id="mrs_01_2062__ol15182513436"><li id="mrs_01_2062__li1721451519583">Modify <strong id="mrs_01_2062__b109193203212">spark.hadoop.hbase.externalToken.enable = true</strong> in the <strong id="mrs_01_2062__b1659012289215">spark-defaults.conf</strong> configuration file under the <strong id="mrs_01_2062__b793219317210">conf</strong> directory on the Spark client of cluster 1.<pre class="screen" id="mrs_01_2062__screen11233016165820">spark.hadoop.hbase.externalToken.enable = true</pre>
</li><li id="mrs_01_2062__li131845204315">When accessing HBase, you need to use the configuration file of the corresponding cluster to create a <strong id="mrs_01_2062__b2032033994815">Configuration</strong> object for creating a <strong id="mrs_01_2062__b12325639124812">Connection</strong> object.</li><li id="mrs_01_2062__li1221195744319">In an MRS cluster, tokens of multiple HBase services can be obtained at the same time to solve the problem that the executor cannot access HBase. The method is as follows:<p id="mrs_01_2062__p245442162119"><a name="mrs_01_2062__li1221195744319"></a><a name="li1221195744319"></a>Assume that you need to access HBase of the current cluster and HBase of cluster2. Save the <strong id="mrs_01_2062__b15464814104915">hbase-site.xml</strong> file of cluster2 in a compressed package named <strong id="mrs_01_2062__b1387192274918">external_hbase_conf***</strong>, and use <strong id="mrs_01_2062__b185591253499">--archives</strong> to specify the compressed package when submitting the command.</p>
</li></ol>
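<p>A minimal sketch in Scala of the <strong>Configuration</strong> and <strong>Connection</strong> handling described above; it assumes the peer cluster's <strong>hbase-site.xml</strong> has been distributed to the executors (for example, with <strong>--archives</strong>), and the relative path used below is a placeholder for wherever the archive is unpacked:</p>
<pre class="screen">import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.Path
import org.apache.hadoop.hbase.HBaseConfiguration
import org.apache.hadoop.hbase.client.{Connection, ConnectionFactory}

// HBase of the current cluster: the hbase-site.xml on the client
// classpath is picked up automatically.
val localConf: Configuration = HBaseConfiguration.create()
val localConn: Connection = ConnectionFactory.createConnection(localConf)

// HBase of cluster 2: load the peer cluster's hbase-site.xml
// explicitly (placeholder path) and build a separate Connection.
val remoteConf: Configuration = HBaseConfiguration.create()
remoteConf.addResource(new Path("external_hbase_conf/hbase-site.xml"))
val remoteConn: Connection = ConnectionFactory.createConnection(remoteConf)</pre>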
</li></ul>
</li></ol>
</div>
</div>
<div>
<div class="familylinks">
<div class="parentlink"><strong>Parent topic:</strong> <a href="mrs_01_2002.html">Common Issues About Spark2x</a></div>
</div>
</div>