Creating an OBS Table Using the DataSource Syntax

Function

Create an OBS table using the DataSource syntax.

The DataSource syntax and the Hive syntax differ mainly in the supported data formats and the number of supported partitions. For details, see the syntax and precautions.

Usage

Precautions

Syntax

CREATE TABLE [IF NOT EXISTS] [db_name.]table_name 
  [(col_name1 col_type1 [COMMENT col_comment1], ...)]
  USING file_format 
  [OPTIONS (path 'obs_path', key1=val1, key2=val2, ...)] 
  [PARTITIONED BY (col_name1, col_name2, ...)]
  [COMMENT table_comment]
  [AS select_statement];
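
As a minimal sketch of how these clauses combine, the following statement creates a partitioned Parquet table. The database name demo_db, the table and column names, and the OBS path are placeholders chosen for illustration; they are assumptions, not values defined by this document.

```sql
-- Hypothetical names: demo_db, student, and the OBS path are placeholders.
CREATE TABLE IF NOT EXISTS demo_db.student (
  id        INT    COMMENT 'student ID',
  name      STRING COMMENT 'student name',
  facultyNo INT    COMMENT 'faculty number',
  classNo   INT    COMMENT 'class number'
)
USING parquet
OPTIONS (path 'obs://bucketName/student')
-- Partition columns are listed by name only; their types come from the column list.
PARTITIONED BY (facultyNo, classNo)
COMMENT 'student table stored in OBS';
```

The partition columns here are declared in the column list and then referenced by name in PARTITIONED BY, which matches the syntax shown above.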

Keyword

Parameter

Table 1 Parameter description

| Parameter | Description |
| --- | --- |
| db_name | Database name. The value can contain letters, numbers, and underscores (_), but cannot contain only numbers or start with a number or underscore (_). |
| table_name | Name of the table to be created in the database. The value can contain letters, numbers, and underscores (_), but cannot contain only numbers or start with a number or underscore (_). The matching rule is ^(?!_)(?![0-9]+$)[A-Za-z0-9_$]*$. Special characters must be enclosed in single quotation marks (''). |
| col_name | Column names with data types, separated by commas (,). A column name can contain letters, digits, and underscores (_). It cannot contain only digits and must contain at least one letter. |
| col_type | Data type of a column field |
| col_comment | Column field description |
| file_format | Input format of the table. The value can be orc, parquet, json, csv, or avro. |
| path | OBS storage path where data files are stored. Format: obs://bucketName/tblPath, where bucketName is the bucket name and tblPath is the directory name. You do not need to specify the file name following the directory. For details about attribute names and values during table creation, see Table 2. For details about the table attribute names and values when file_format is set to csv, see Table 2 and Table 3. |
| table_comment | Description of the table |
| select_statement | The CREATE TABLE AS statement inserts the SELECT query result of the source table, or a data record, into a new table in an OBS bucket. |
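
The AS select_statement clause can be sketched as follows. This is an illustration only: student is assumed to be an existing table, and student_copy and the OBS path are hypothetical names. The column list is left out here so that the schema of the new table comes from the query.

```sql
-- Hypothetical names: student_copy and the OBS path are placeholders;
-- student is assumed to be an existing table with id, name, and classNo columns.
CREATE TABLE IF NOT EXISTS student_copy
USING parquet
OPTIONS (path 'obs://bucketName/student_copy')
AS SELECT id, name FROM student WHERE classNo = 1;
```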

Table 2 OPTIONS parameter description

| Parameter | Description | Default Value |
| --- | --- | --- |
| path | Specified table storage location. Currently, only OBS is supported. | - |
| multiLevelDirEnable | Whether to recursively query data in nested subdirectories. When this parameter is set to true, all files in the table path, including files in subdirectories, are read when the table is queried. | false |
| dataDelegated | Whether to clear data in the path when deleting a table or partition | false |
| compression | Specified compression format. Generally, you need to set this parameter to zstd for parquet files. | - |
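
The OPTIONS parameters in Table 2 might be combined as in the sketch below. The table name sales_log, its columns, and the OBS path are hypothetical, and the chosen values are examples rather than recommendations.

```sql
-- Hypothetical names: sales_log and the OBS path are placeholders.
CREATE TABLE IF NOT EXISTS sales_log (
  order_id STRING,
  amount   DOUBLE
)
USING parquet
OPTIONS (
  path 'obs://bucketName/sales_log',
  multiLevelDirEnable = true,  -- also read files in nested subdirectories
  dataDelegated = true,        -- clear data in the path when the table or a partition is deleted
  compression = 'zstd'         -- compression format suggested above for parquet files
);
```

Because dataDelegated = true removes the underlying data together with the table or partition, leaving it at the default (false) is the more conservative choice when the OBS path is shared with other jobs.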

When the file format is set to CSV, you can set the following OPTIONS parameters:

Table 3 OPTIONS parameter description of the CSV data format

| Parameter | Description | Default Value |
| --- | --- | --- |
| delimiter | Data separator | Comma (,) |
| quote | Quotation character | Double quotation mark (") |
| escape | Escape character | Backslash (\) |
| multiLine | Whether the column data contains carriage return characters or escape characters. The value true indicates yes and the value false indicates no. | false |
| dateFormat | Date format of the date field in a CSV file | yyyy-MM-dd |
| timestampFormat | Timestamp format of the timestamp field in a CSV file | yyyy-MM-dd HH:mm:ss |
| mode | Mode for parsing CSV files. The options are as follows. PERMISSIVE: permissive mode; if an incorrect field is encountered, the line is set to null. DROPMALFORMED: if an incorrect field is encountered, the entire line is discarded. FAILFAST: error mode; if an error occurs, it is reported automatically. | PERMISSIVE |
| header | Whether the CSV file contains a header row. The value true indicates that a header row is included, and the value false indicates that it is not. | false |
| nullValue | Character that represents the null value. For example, nullValue="\\N" indicates that \N represents the null value. | - |
| comment | Character that indicates the beginning of a comment. For example, comment='#' indicates that a line starting with # is a comment. | - |
| compression | Data compression format. Currently, gzip, bzip2, and deflate are supported. If you do not want to compress data, enter none. | none |
| encoding | Data encoding format. Available values are utf-8, gb2312, and gbk. If this parameter is left empty, utf-8 is used. | utf-8 |
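
A CSV table that sets several of the parameters in Table 3 could look like the sketch below. The table name events_csv, its columns, and the OBS path are hypothetical; the option values are illustrative assumptions, with nullValue following the "\\N" example given in the table.

```sql
-- Hypothetical names: events_csv and the OBS path are placeholders.
CREATE TABLE IF NOT EXISTS events_csv (
  event_id   INT,
  event_name STRING,
  event_date DATE,
  created_at TIMESTAMP
)
USING csv
OPTIONS (
  path 'obs://bucketName/events_csv',
  header = true,                            -- the files contain a header row
  delimiter = '|',                          -- fields are separated by a vertical bar instead of a comma
  nullValue = '\\N',                        -- \N in the data represents the null value
  dateFormat = 'yyyy-MM-dd',                -- format of event_date values
  timestampFormat = 'yyyy-MM-dd HH:mm:ss',  -- format of created_at values
  mode = 'DROPMALFORMED',                   -- discard lines that contain an incorrect field
  encoding = 'utf-8'
);
```

Parameters that are not listed keep the defaults from Table 3, so in practice only the values that differ from the defaults need to appear in OPTIONS.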

Example