Scala Example Code

Development Description

CloudTable HBase and MRS HBase can both be connected to DLI as data sources.

Accessing a Data Source Using a SQL API

  1. Insert data.
    sparkSession.sql("insert into test_hbase values('12345','abc','guiyang',false,null,3,23,2.3,2.34)")
    
  2. Query data.
    sparkSession.sql("select * from test_hbase").show()
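The INSERT statement above lists its values by position, in the column order of test_hbase. When those values come from Scala code rather than being typed by hand, a small helper can render them as SQL literals. The sketch below is a hypothetical illustration: `SqlInsertBuilder`, `toSqlLiteral`, and `insertStatement` are illustrative names, not part of DLI or Spark. It also assumes Spark SQL's default string-literal handling, in which a backslash escapes the following character.

```scala
// Hypothetical helper (not a DLI or Spark API): renders Scala values as
// Spark SQL literals so a positional INSERT statement can be assembled.
object SqlInsertBuilder {
  // Assumption: Spark SQL's default parser treats a backslash as an escape
  // character inside string literals, so backslashes and single quotes are
  // escaped with a backslash. null becomes the SQL null keyword.
  def toSqlLiteral(value: Any): String = value match {
    case null      => "null"
    case s: String => "'" + s.replace("\\", "\\\\").replace("'", "\\'") + "'"
    case other     => other.toString // booleans and numbers print as-is
  }

  // Values must be supplied in the table's column order.
  def insertStatement(table: String, values: Seq[Any]): String =
    values.map(toSqlLiteral).mkString(s"insert into $table values(", ",", ")")
}
```

For example, `SqlInsertBuilder.insertStatement("test_hbase", Seq("12345", "abc", "guiyang", false, null, 3, 23, 2.3, 2.34))` yields the same statement as step 1. Building SQL text this way is only reasonable for trusted values; the DataFrame API described next avoids string assembly altogether.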
    

Accessing a Data Source Using a DataFrame API

  1. Construct a schema.
    // Imports used by this and the following steps
    import org.apache.spark.rdd.RDD
    import org.apache.spark.sql.Row
    import org.apache.spark.sql.types._

    // Schema of the test_hbase table; StructField columns are nullable by default
    val attrId = new StructField("id", StringType)
    val location = new StructField("location", StringType)
    val city = new StructField("city", StringType)
    val booleanf = new StructField("booleanf", BooleanType)
    val shortf = new StructField("shortf", ShortType)
    val intf = new StructField("intf", IntegerType)
    val longf = new StructField("longf", LongType)
    val floatf = new StructField("floatf", FloatType)
    val doublef = new StructField("doublef", DoubleType)
    val attrs = Array(attrId, location, city, booleanf, shortf, intf, longf, floatf, doublef)
    
  2. Construct data based on the schema type.
    // Values follow the schema order; longf needs a Long (23L) and floatf a Float (2.3f)
    val mutableRow: Seq[Any] = Seq("12345", "abc", "city1", false, null, 3, 23L, 2.3f, 2.34)
    val rddData: RDD[Row] = sparkSession.sparkContext.parallelize(Array(Row.fromSeq(mutableRow)), 1)
    
  3. Import data to HBase.
    // insertInto matches columns by position, so attrs must follow the table's column order
    sparkSession.createDataFrame(rddData, new StructType(attrs)).write.insertInto("test_hbase")
    
  4. Read data from HBase.
    import scala.collection.mutable

    val map = new mutable.HashMap[String, String]()
    map("TableName") = "table_DupRowkey1"
    map("RowKey") = "id:5,location:6,city:7"
    // Each entry maps a Spark column to an HBase column as column:family.qualifier
    map("Cols") = "booleanf:CF1.booleanf,shortf:CF1.shortf,intf:CF1.intf,longf:CF1.longf,floatf:CF1.floatf,doublef:CF1.doublef"
    // A plain string literal cannot span lines, so the ZooKeeper addresses are concatenated
    map("ZKHost") = "cloudtable-cf82-zk3-pa6HnHpf.cloudtable.com:2181," +
                    "cloudtable-cf82-zk2-weBkIrjI.cloudtable.com:2181," +
                    "cloudtable-cf82-zk1-WY09px9l.cloudtable.com:2181"
    sparkSession.read.schema(new StructType(attrs)).format("hbase").options(map.toMap).load().show()
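Steps 2 and 3 depend on two properties of the write path: insertInto resolves columns by position rather than by name, and Spark checks that each value in a Row has the JVM class it expects for the field's data type (for example, a LongType field needs a Long, not an Int). The following pure-Scala sketch is a hypothetical pre-flight check, not part of DLI or Spark: `RowPreflight`, `align`, and `mismatches` are illustrative names, and the class table assumes the Scala value types that the Spark SQL data types reference lists for each type, boxed to their java.lang classes as they are inside a `Seq[Any]`.

```scala
// Hypothetical pre-flight helper (not a DLI or Spark API) for the DataFrame
// write path: aligns name-keyed values to schema order and reports values
// whose JVM class differs from the one Spark expects for the field.
object RowPreflight {
  // Schema order of test_hbase, paired with the boxed class assumed for
  // StringType, BooleanType, ShortType, IntegerType, LongType, FloatType
  // and DoubleType values.
  val schema: Seq[(String, Class[_])] = Seq(
    "id"       -> classOf[String],
    "location" -> classOf[String],
    "city"     -> classOf[String],
    "booleanf" -> classOf[java.lang.Boolean],
    "shortf"   -> classOf[java.lang.Short],
    "intf"     -> classOf[java.lang.Integer],
    "longf"    -> classOf[java.lang.Long],
    "floatf"   -> classOf[java.lang.Float],
    "doublef"  -> classOf[java.lang.Double]
  )

  // Reorders values keyed by column name into schema order, because
  // insertInto only looks at positions. Fails if a column has no value.
  def align(values: Map[String, Any]): Seq[Any] = {
    val missing = schema.map(_._1).filterNot(values.contains)
    require(missing.isEmpty, s"no value for column(s): ${missing.mkString(", ")}")
    schema.map { case (name, _) => values(name) }
  }

  // Returns the names of fields whose non-null value has the wrong class.
  def mismatches(row: Seq[Any]): Seq[String] = {
    require(row.length == schema.length, "row length does not match the schema")
    schema.zip(row).collect {
      case ((name, cls), value) if value != null && !cls.isInstance(value.asInstanceOf[AnyRef]) => name
    }
  }
}
```

Run against `Seq("12345", "abc", "city1", false, null, 3, 23, 2.3, 2.34)`, `mismatches` reports `longf` and `floatf`, which is why step 2 writes those values as `23L` and `2.3f`.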
    

Submitting a Spark Job

  1. Generate a JAR package from the code and upload the package to DLI.

  2. (Optional) If the Spark job accesses an MRS cluster with Kerberos authentication enabled, add the krb5.conf and user.keytab files to the job's other dependency files when creating the job. Skip this step if Kerberos authentication is not enabled for the cluster.
  3. In the Spark job editor, select the corresponding dependency module and execute the Spark job.

    • If the Spark version is 2.3.2 (to be taken offline soon) or 2.4.5, set Module to sys.datasource.hbase when you submit the job.
    • If the Spark version is 3.1.1, you do not need to select a module. Instead, set the following Spark parameters (--conf):

      spark.driver.extraClassPath=/usr/share/extension/dli/spark-jar/datasource/hbase/*

      spark.executor.extraClassPath=/usr/share/extension/dli/spark-jar/datasource/hbase/*
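The version rule above can be captured in a small lookup, for example when job submission is scripted. This is a hypothetical sketch: `HBaseJobDependencies`, `settings`, and `confArgs` are illustrative names rather than a DLI API, and only the module name, Spark versions, and classpath values come from this page. `confArgs` simply renders the parameters in the `--conf key=value` form mentioned above.

```scala
// Hypothetical helper (not a DLI API) encoding the dependency rule above:
// Spark 2.3.2/2.4.5 jobs select a module, Spark 3.1.1 jobs set classpath confs.
object HBaseJobDependencies {
  private val hbaseJars = "/usr/share/extension/dli/spark-jar/datasource/hbase/*"

  // Left = module to select, Right = Spark parameters to set.
  def settings(sparkVersion: String): Either[String, Map[String, String]] =
    sparkVersion match {
      case "2.3.2" | "2.4.5" => Left("sys.datasource.hbase")
      case "3.1.1" => Right(Map(
        "spark.driver.extraClassPath"   -> hbaseJars,
        "spark.executor.extraClassPath" -> hbaseJars))
      case other => throw new IllegalArgumentException(s"Spark $other is not covered by this guide")
    }

  // Renders parameters as "--conf", "key=value" pairs, sorted by key.
  def confArgs(confs: Map[String, String]): Seq[String] =
    confs.toSeq.sortBy(_._1).flatMap { case (key, value) => Seq("--conf", s"$key=$value") }
}
```

Returning an `Either` keeps the two mutually exclusive cases explicit: a caller either selects a module or sets parameters, never both.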

Complete Example Code