SparkSQL may need to be associated with other components. For example, Spark on HBase requires HBase permissions. The following describes how to associate SparkSQL with HBase.
After the permissions are assigned, you can access HBase tables from SparkSQL by using SQL-like statements. The following procedure uses assigning a user the permissions to query HBase tables as an example.
Set spark.yarn.security.credentials.hbase.enabled to true.
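This parameter instructs Spark to obtain an HBase delegation token when an application is submitted, so that executors can authenticate to HBase. A minimal sketch of the setting, assuming the client installation path /opt/client used elsewhere in this section:
# /opt/client/Spark2x/spark/conf/spark-defaults.conf
# Let Spark fetch an HBase token for submitted applications.
spark.yarn.security.credentials.hbase.enabled true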
In the Configure Resource Permission table, choose Name of the desired cluster > HBase > HBase Scope > global, select the create permission for the default namespace, and click OK.
In this example, the created table is saved in the default database of Hive, for which the CREATE permission is granted by default. To save the table to a Hive database other than default, perform the following operations:
In the Configure Resource Permission table, choose Name of the desired cluster > Hive > Hive Read Write Privileges, select CREATE for the desired database, and click OK.
In the Configure Resource Permission table, choose Name of the desired cluster > Yarn > Scheduling Queue > root, select the Submit permission for the default queue, and click OK.
Log in to the node where the Spark2x client is installed and run the following commands to connect to the JDBC server. In a security cluster, authenticate as the user with kinit first if you have not already done so. Replace <zkNodeN_IP>:<zkNodeN_Port> and <system domain name> with the values of your cluster.
source /opt/client/bigdata_env
source /opt/client/Spark2x/component_env
/opt/client/Spark2x/spark/bin/beeline -u "jdbc:hive2://<zkNode1_IP>:<zkNode1_Port>,<zkNode2_IP>:<zkNode2_Port>,<zkNode3_IP>:<zkNode3_Port>/;serviceDiscoveryMode=zooKeeper;zooKeeperNamespace=sparkthriftserver2x;user.principal=spark2x/hadoop.<system domain name>@<system domain name>;saslQop=auth-conf;auth=KERBEROS;principal=spark2x/hadoop.<system domain name>@<system domain name>;"
create table hbaseTable (id string, name string, age int) using org.apache.spark.sql.hbase.HBaseSource options (hbaseTableName "table1", keyCols "id", colsMapping = ", name=cf1.cq1, age=cf1.cq2");
In this statement, hbaseTableName specifies the backing HBase table, keyCols maps the id column to the HBase row key, and colsMapping maps name and age to qualifiers cq1 and cq2 in column family cf1.
The created SparkSQL table and the HBase table are stored in the Hive database default and the HBase namespace default, respectively.
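To give the later query something to return, you can write a test row through the new table. A minimal sketch, assuming the HBaseSource supports INSERT and that the user also holds write permission on the HBase table table1:
insert into hbaseTable values ("1", "test_name", 20);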
Log in as the user to whom the query permission was assigned, and connect to the JDBC server in the same way:
source /opt/client/bigdata_env
source /opt/client/Spark2x/component_env
/opt/client/Spark2x/spark/bin/beeline -u "jdbc:hive2://<zkNode1_IP>:<zkNode1_Port>,<zkNode2_IP>:<zkNode2_Port>,<zkNode3_IP>:<zkNode3_Port>/;serviceDiscoveryMode=zooKeeper;zooKeeperNamespace=sparkthriftserver2x;user.principal=spark2x/hadoop.<system domain name>@<system domain name>;saslQop=auth-conf;auth=KERBEROS;principal=spark2x/hadoop.<system domain name>@<system domain name>;"
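After connecting, the user granted the query permission can read the table with a standard SELECT statement, for example:
select * from hbaseTable;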