DLI allows you to use data stored on OBS. You can create OBS tables on DLI to access and process data in your OBS bucket.
This section describes how to create an OBS table on DLI, import data to the table, and insert and query table data.
Creating a Database on DLI
create database testdb;
Perform all subsequent operations in this section in the testdb database.
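If the statement above may be run more than once, a guarded variant avoids an "already exists" error. This is a minimal sketch that assumes DLI accepts the standard Spark SQL IF NOT EXISTS clause for databases; it is not taken from the original procedure.

```sql
-- Assumption: IF NOT EXISTS is accepted here as in standard Spark SQL.
-- Creates testdb only when it does not already exist.
create database if not exists testdb;
```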
The main differences between DataSource syntax and Hive syntax are the range of table data storage formats supported and the number of partitions supported. For the key differences in creating OBS tables using these two syntaxes, refer to Table 1.
Syntax | Supported Data Formats | Partitioning | Number of Partitions
---|---|---|---
DataSource | ORC, PARQUET, JSON, CSV, and AVRO | You need to specify the partitioning column in both CREATE TABLE and PARTITIONED BY statements. For details, see Creating a Single-Partition OBS Table Using DataSource Syntax. | A maximum of 7,000 partitions can be created in a single table.
Hive | TEXTFILE, AVRO, ORC, SEQUENCEFILE, RCFILE, and PARQUET | Do not specify the partitioning column in the CREATE TABLE statement. Specify the column name and data type in the PARTITIONED BY statement. For details, see Creating an OBS Table Using Hive Syntax. | A maximum of 100,000 partitions can be created in a single table.
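The partitioning difference summarized in Table 1 can be sketched side by side. This is only an illustration: the table names demo_ds_part and demo_hive_part and the OBS paths under obs://dli-test-021/demo/ are hypothetical and are not used elsewhere in this section.

```sql
-- DataSource syntax: classNo appears in the column list AND in PARTITIONED BY (name only).
CREATE TABLE demo_ds_part (name STRING, score DOUBLE, classNo INT)
  USING csv
  OPTIONS (path "obs://dli-test-021/demo/ds")
  PARTITIONED BY (classNo);

-- Hive syntax: classNo appears ONLY in PARTITIONED BY, together with its data type.
CREATE TABLE demo_hive_part (name STRING, score DOUBLE)
  PARTITIONED BY (classNo INT)
  STORED AS TEXTFILE
  LOCATION 'obs://dli-test-021/demo/hive'
  ROW FORMAT DELIMITED FIELDS TERMINATED BY ',';
```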
The following describes how to create an OBS table for CSV files. The methods of creating OBS tables for other file formats are similar.
Jordon,88,23
Kim,87,25
Henry,76,26
CREATE TABLE testcsvdatasource (name STRING, score DOUBLE, classNo INT) USING csv OPTIONS (path "obs://dli-test-021/test.csv");
If you create an OBS table whose path points to a specific file, you cannot insert data into the table using DLI. The table data stays synchronized with the content of the OBS file.
select * from testcsvdatasource;
Jordon,88,23
Kim,87,25
Henry,76,26
Aarn,98,20
select * from testcsvdatasource;
CREATE TABLE testcsvdata2source (name STRING, score DOUBLE, classNo INT) USING csv OPTIONS (path "obs://dli-test-021/data");
insert into testcsvdata2source VALUES('Aarn','98','20');
select * from testcsvdata2source;
Jordon,88,23
Kim,87,25
Henry,76,26
CREATE TABLE testcsvdata3source (name STRING, score DOUBLE, classNo INT) USING csv OPTIONS (path "obs://dli-test-021/data2");
insert into testcsvdata3source VALUES('Aarn','98','20');
select * from testcsvdata3source;
CREATE TABLE testcsvdata4source (name STRING, score DOUBLE, classNo INT) USING csv OPTIONS (path "obs://dli-test-021/data3") PARTITIONED BY (classNo);
Jordon,88,25
Kim,87,25
Henry,76,25
ALTER TABLE testcsvdata4source ADD PARTITION (classNo = 25) LOCATION 'obs://dli-test-021/data3/classNo=25';
select * from testcsvdata4source where classNo = 25;
insert into testcsvdata4source VALUES('Aarn','98','25');
insert into testcsvdata4source VALUES('Adam','68','24');
When you query a partitioned table, the WHERE clause must include at least one partition column. Otherwise, the query fails and "DLI.0005: There should be at least one partition pruning predicate on partitioned table" is reported.
select * from testcsvdata4source where classNo = 25;
select * from testcsvdata4source where classNo = 24;
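The partition pruning rule described in the note above can be illustrated with the testcsvdata4source table. The filter on score is added only for illustration and is not part of the original procedure.

```sql
-- Expected to be rejected with DLI.0005: the WHERE clause filters only on score,
-- which is not a partition column.
select * from testcsvdata4source where score > 80;

-- Accepted: classNo is the partition column, so the query has a pruning predicate.
select * from testcsvdata4source where classNo = 25 and score > 80;
```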
CREATE TABLE testcsvdata5source (name STRING, score DOUBLE, classNo INT, dt varchar(16)) USING csv OPTIONS (path "obs://dli-test-021/data4") PARTITIONED BY (classNo,dt);
insert into testcsvdata5source VALUES('Aarn','98','25','2021-07-27');
insert into testcsvdata5source VALUES('Adam','68','25','2021-07-28');
select * from testcsvdata5source where classNo = 25;
select * from testcsvdata5source where dt like '2021-07%';
Jordon,88,24,2021-07-29
Kim,87,24,2021-07-29
Henry,76,24,2021-07-29
ALTER TABLE testcsvdata5source ADD PARTITION (classNo = 24,dt='2021-07-29') LOCATION 'obs://dli-test-021/data4/classNo=24/dt=2021-07-29';
select * from testcsvdata5source where classNo = 24;
select * from testcsvdata5source where dt like '2021-07%';
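To check which partitions the table currently knows about, for example after the ALTER TABLE ... ADD PARTITION statement above, the partitions can be listed. This is a minimal sketch that assumes DLI accepts the standard Spark SQL SHOW PARTITIONS statement for OBS tables.

```sql
-- Assumption: SHOW PARTITIONS behaves here as in standard Spark SQL.
-- Lists every partition registered for the table, one classNo/dt combination per row.
show partitions testcsvdata5source;
```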
The following describes how to create an OBS table for TEXTFILE files. The methods of creating OBS tables for other file formats are similar.
Jordon,88,23
Kim,87,25
Henry,76,26
CREATE TABLE hiveobstable (name STRING, score DOUBLE, classNo INT) STORED AS TEXTFILE LOCATION 'obs://dli-test-021/data5' ROW FORMAT DELIMITED FIELDS TERMINATED BY ',';
ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' indicates that the fields in each record are separated by commas (,).
select * from hiveobstable;
insert into hiveobstable VALUES('Aarn','98','25');
insert into hiveobstable VALUES('Adam','68','25');
select * from hiveobstable;
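The ROW FORMAT DELIMITED FIELDS TERMINATED BY clause used for hiveobstable can name a different separator when the source files are not comma-separated. The following sketch assumes tab-separated files; the table name hiveobstable_tab and the path obs://dli-test-021/data5_tab are hypothetical.

```sql
-- Hypothetical table over tab-separated text files.
-- '\t' tells the reader to split each record into fields at tab characters.
CREATE TABLE hiveobstable_tab (name STRING, score DOUBLE, classNo INT)
  STORED AS TEXTFILE
  LOCATION 'obs://dli-test-021/data5_tab'
  ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t';
```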
Creating an OBS Table Containing Data of Multiple Formats
Jordon,88-22,23:21
Kim,87-22,25:22
Henry,76-22,26:23
CREATE TABLE hiveobstable2 (name STRING, hobbies ARRAY<string>, address map<string,string>) STORED AS TEXTFILE LOCATION 'obs://dli-test-021/data6' ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' COLLECTION ITEMS TERMINATED BY '-' MAP KEYS TERMINATED BY ':';
select * from hiveobstable2;
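Beyond selecting whole rows, individual elements of the ARRAY and MAP columns of hiveobstable2 can be addressed directly. This is a minimal sketch that assumes the standard Spark SQL element access syntax and the built-in functions size and map_keys are available on DLI; the column aliases are illustrative.

```sql
select
  name,
  hobbies[0]        as first_hobby,   -- first element of the ARRAY column (index starts at 0)
  size(hobbies)     as hobby_count,   -- number of elements in the array
  map_keys(address) as address_keys   -- all keys of the MAP column, returned as an array
from hiveobstable2;
```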
CREATE TABLE IF NOT EXISTS hiveobstable3(name STRING, score DOUBLE) PARTITIONED BY (classNo INT) STORED AS TEXTFILE LOCATION 'obs://dli-test-021/data7' ROW FORMAT DELIMITED FIELDS TERMINATED BY ',';
Specify the partition key, with its data type, only in the PARTITIONED BY clause. Do not also list it in the column list that follows the table name. The following is an incorrect example:
CREATE TABLE IF NOT EXISTS hiveobstable3(name STRING, score DOUBLE, classNo INT) PARTITIONED BY (classNo) STORED AS TEXTFILE LOCATION 'obs://dli-test-021/data7';
insert into hiveobstable3 VALUES('Aarn','98','25');
insert into hiveobstable3 VALUES('Adam','68','25');
select * from hiveobstable3 where classNo = 25;
Jordon,88,24
Kim,87,24
Henry,76,24
ALTER TABLE hiveobstable3 ADD PARTITION (classNo = 24) LOCATION 'obs://dli-test-021/data7/classNo=24';
select * from hiveobstable3 where classNo = 24;
DLI.0005: There should be at least one partition pruning predicate on partitioned table `xxxx`.`xxxx`.;
Cause: The WHERE clause of the query on the partitioned table does not reference a partition key.
Solution: Ensure that the WHERE clause contains a condition on at least one partition key.
CREATE TABLE testcsvdatasource (name string, id int) USING csv OPTIONS (path "obs://dli-test-021/data/test.csv");
Cause: Data cannot be inserted if the path in the table creation statement points to a specific file. In the preceding statement, the path points to the OBS file obs://dli-test-021/data/test.csv.
CREATE TABLE testcsvdatasource (name string, id int) USING csv OPTIONS (path "obs://dli-test-021/data");
CREATE TABLE IF NOT EXISTS testtable(name STRING, score DOUBLE, classNo INT) PARTITIONED BY (classNo) STORED AS TEXTFILE LOCATION 'obs://dli-test-021/data7';
Cause: The partition key is specified in the column list following the table name. In Hive syntax, specify the partition key and its data type only in the PARTITIONED BY clause.
CREATE TABLE IF NOT EXISTS testtable(name STRING, score DOUBLE) PARTITIONED BY (classNo INT) STORED AS TEXTFILE LOCATION 'obs://dli-test-021/data7';