There are two clusters, cluster 1 and cluster 2. How do I use Spark2x in cluster 1 to access HDFS, Hive, HBase, and Kafka components in cluster 2?
The following operations apply to the scenario where the FusionInsight client is used to submit a Spark2x application. If a configuration file directory is used instead, modify the corresponding files in the application's configuration directory and upload the configuration files to the executors.
When the HDFS and HBase clients access the server, the server address is configured by hostname. Therefore, the host entries of all nodes to be accessed must exist in the /etc/hosts file on the client. Add the hosts of the peer cluster nodes to the /etc/hosts file of the client node in advance.
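For example, using the peer cluster NameNode IP addresses from the configuration shown later in this section, the entries could look as follows (the hostnames here are hypothetical placeholders; use the actual hostnames of the cluster 2 nodes):

192.168.0.1 node-master1-cluster2
192.168.0.2 node-master2-cluster2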
After the preceding operations are performed, you can use Spark SQL to access the Hive MetaStore. To access Hive table data, you also need to perform the operations described in "Access HDFS of two clusters at the same time" and set the LOCATION of the table to the nameservice of the peer cluster.
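For instance, a table whose data resides in the peer cluster could be created as follows (a minimal sketch: the database, table, columns, storage format, and path are hypothetical; test is the nameservice of the peer cluster configured below):

CREATE TABLE test_db.test_table (id INT, name STRING)
STORED AS ORC
LOCATION 'hdfs://test/user/hive/warehouse/test_table';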
To access HDFS of both clusters at the same time, configure the nameservice of the peer cluster in the hdfs-site.xml file used by the client through the following parameters (here, test is the nameservice of the peer cluster): dfs.nameservices.mappings, dfs.nameservices, dfs.namenode.rpc-address.test.*, dfs.ha.namenodes.test, and dfs.client.failover.proxy.provider.test
The following is an example:
<property>
  <name>dfs.nameservices.mappings</name>
  <value>[{"name":"hacluster","roleInstances":["14","15"]},{"name":"test","roleInstances":["16","17"]}]</value>
</property>
<property>
  <name>dfs.nameservices</name>
  <value>hacluster,test</value>
</property>
<property>
  <name>dfs.namenode.rpc-address.test.16</name>
  <value>192.168.0.1:8020</value>
</property>
<property>
  <name>dfs.namenode.rpc-address.test.17</name>
  <value>192.168.0.2:8020</value>
</property>
<property>
  <name>dfs.ha.namenodes.test</name>
  <value>16,17</value>
</property>
<property>
  <name>dfs.client.failover.proxy.provider.test</name>
  <value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
</property>
In addition, set the following parameters for Spark (typically in the spark-defaults.conf file on the client):

spark.yarn.extra.hadoopFileSystems = hdfs://test
spark.hadoop.hdfs.externalToken.enable = true
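Alternatively, the same settings can be passed at submission time. A minimal sketch, in which the class and JAR names are hypothetical:

spark-submit --master yarn --deploy-mode client \
  --conf spark.yarn.extra.hadoopFileSystems=hdfs://test \
  --conf spark.hadoop.hdfs.externalToken.enable=true \
  --class com.example.TwoHdfsDemo ./spark-demo.jar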
To access HBase of both clusters, set the following parameter:

spark.hadoop.hbase.externalToken.enable = true

Assume that you need to access HBase of the current cluster and HBase of cluster 2. Save the hbase-site.xml file of cluster 2 in a compressed package named external_hbase_conf***, and use --archives to specify the compressed package when submitting the application, as in the sketch below.
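A minimal submission sketch (the class and JAR names are hypothetical, and external_hbase_conf*** stands for the actual package name):

spark-submit --master yarn --deploy-mode client \
  --conf spark.hadoop.hbase.externalToken.enable=true \
  --archives ./external_hbase_conf*** \
  --class com.example.TwoHBaseDemo ./spark-demo.jar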