Using Spark SQL Jobs to Analyze OBS Data

DLI lets you work with data stored in OBS: you can create OBS tables on DLI to access and process the data in your OBS buckets.

This section describes how to create an OBS table on DLI, import data into it, and insert and query table data.

Prerequisites

Preparations

Creating a Database on DLI

  1. Log in to the DLI management console and click SQL Editor. On the displayed page, set Engine to spark and Queue to the SQL queue you created.
  2. Enter the following statement in the SQL editing window to create the testdb database.
    create database testdb;

All subsequent operations in this section are performed in the testdb database.
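A minimal sketch of one way to make sure statements run against testdb, assuming the SQL editor accepts the standard Spark SQL USE statement and keeps the selected database for the statements that follow. If you are unsure whether that holds, qualifying every table name as testdb.table_name avoids the question entirely.

```sql
-- Make testdb the current database so unqualified table names resolve to it.
USE testdb;

-- Check which database is currently active.
SELECT current_database();
```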

DataSource and Hive Syntax for Creating an OBS Table on DLI

DataSource syntax and Hive syntax differ mainly in the storage formats they support, how partition columns are declared, and the maximum number of partitions per table. Table 1 summarizes the key differences when creating OBS tables with each syntax.

Table 1 Syntax differences

| Syntax | Supported Data Formats | Partition Column Declaration | Maximum Partitions per Table |
|---|---|---|---|
| DataSource | ORC, PARQUET, JSON, CSV, and AVRO | Specify the partition column both in the column list of the CREATE TABLE statement and in the PARTITIONED BY clause. For details, see Creating a Single-Partition OBS Table Using DataSource Syntax. | 7,000 |
| Hive | TEXTFILE, AVRO, ORC, SEQUENCEFILE, RCFILE, and PARQUET | Do not include the partition column in the column list of the CREATE TABLE statement; specify its name and data type only in the PARTITIONED BY clause. For details, see Creating an OBS Table Using Hive Syntax. | 100,000 |
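To make the partitioning difference concrete, the sketch below declares the same partitioned table both ways. The table names, the columns (name, score, classNo), and the path obs://your-bucket/... are hypothetical placeholders, not values from this guide; replace them with your own bucket and schema.

```sql
-- DataSource syntax: the partition column (classNo) is listed in the column list
-- AND named again, without a type, in PARTITIONED BY.
CREATE TABLE IF NOT EXISTS testdb.student_csv_part (
  name    STRING,
  score   DOUBLE,
  classNo INT
)
USING csv
OPTIONS (path 'obs://your-bucket/student_csv_part/')
PARTITIONED BY (classNo);

-- Hive syntax: the partition column is NOT in the column list;
-- its name and data type are declared only in PARTITIONED BY.
CREATE TABLE IF NOT EXISTS testdb.student_text_part (
  name  STRING,
  score DOUBLE
)
PARTITIONED BY (classNo INT)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
STORED AS TEXTFILE
LOCATION 'obs://your-bucket/student_text_part/';
```

In both cases, each partition's data is typically stored in its own subdirectory, such as classNo=1, beneath the table path.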

Creating an OBS Table Using DataSource Syntax

The following describes how to create an OBS table for CSV files. Tables for other file formats are created in a similar way.
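A minimal sketch of a DataSource table over CSV files, followed by an insert and a query. It assumes the OBS bucket already exists and that your queue can access it; the table name testdb.student_csv, its columns, and the path obs://your-bucket/student_csv/ are hypothetical. Spark CSV options such as header or delimiter can be added to OPTIONS if your files need them.

```sql
-- Hypothetical DataSource table whose data lives as CSV files under an OBS path.
CREATE TABLE IF NOT EXISTS testdb.student_csv (
  id    INT,
  name  STRING,
  score DOUBLE
)
USING csv
OPTIONS (path 'obs://your-bucket/student_csv/');

-- Insert two rows; they are written as CSV files under the OBS path above.
INSERT INTO testdb.student_csv VALUES (1, 'Alice', 90.5), (2, 'Bob', 85.0);

-- Query the table; CSV files already present under the path are read as well.
SELECT * FROM testdb.student_csv;
```

Because the table only points at an OBS path, CSV files you upload to that path with a matching column layout become queryable without a separate load step.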

Creating an OBS Table Using Hive Syntax

The following describes how to create an OBS table for TEXTFILE files. Tables for other file formats are created in a similar way.
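A minimal sketch of a Hive-syntax table stored as comma-delimited text files in OBS, again followed by an insert and a query. The table name testdb.student_text, its columns, the comma delimiter, and the path obs://your-bucket/student_text/ are illustrative assumptions; adjust the delimiter to match your existing files.

```sql
-- Hypothetical Hive-syntax table over comma-delimited text files in OBS.
CREATE TABLE IF NOT EXISTS testdb.student_text (
  id    INT,
  name  STRING,
  score DOUBLE
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
STORED AS TEXTFILE
LOCATION 'obs://your-bucket/student_text/';

-- Insert two rows as delimited text under the LOCATION path.
INSERT INTO testdb.student_text VALUES (1, 'Alice', 90.5), (2, 'Bob', 85.0);

-- Query with a filter.
SELECT name, score FROM testdb.student_text WHERE score > 88;
```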

FAQs