The LOAD DATA function can be used to import data in CSV, Parquet, ORC, JSON, and Avro formats. The data is converted into the Parquet data format for storage.
LOAD DATA INPATH 'folder_path' INTO TABLE [db_name.]table_name OPTIONS(property_name=property_value, ...);
Parameter | Description |
---|---|
folder_path | OBS path of the file or folder used for storing the raw data. |
db_name | Database name. If this parameter is not specified, the current database is used. |
table_name | Name of the DLI table to which data is to be imported. |
The following configuration options can be used during data import:
The configuration item is OPTIONS('DATA_TYPE'='CSV').
When importing a CSV file or a JSON file, you can select one of the following modes:
To set the mode, add the MODE property to the OPTIONS parameter, for example OPTIONS('MODE'='PERMISSIVE').
The configuration item is OPTIONS('DELIMITER'=',').
For CSV data, the following delimiters are supported:
The configuration item is OPTIONS('QUOTECHAR'='"').
The configuration item is OPTIONS('COMMENTCHAR'='#').
The configuration item is OPTIONS('HEADER'='true').
OPTIONS('FILEHEADER'='column1,column2')
When ESCAPECHAR appears in the CSV data, the value that contains it must be enclosed in double quotation marks (" "), for example, "a\b".
The configuration item is OPTIONS('MAXCOLUMNS'='400').
Name of the Optional Parameter | Default Value | Maximum Value |
---|---|---|
MAXCOLUMNS | 2000 | 20000 |
Once the MAXCOLUMNS option is set, the import consumes executor memory, so the import may fail if executor memory is insufficient.
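As a minimal sketch, a wide CSV file could be imported with MAXCOLUMNS raised above its default. The path obs://dli/wide_data.csv and the table wide_tb are hypothetical, and 3000 is only an illustrative value; any value must stay within the maximum of 20000.

```sql
-- Hypothetical path and table name (not from this document).
-- 3000 is an illustrative value between the default (2000) and the maximum (20000).
LOAD DATA INPATH 'obs://dli/wide_data.csv' INTO TABLE wide_tb
OPTIONS('DATA_TYPE'='CSV', 'MAXCOLUMNS'='3000');
```

Because a larger value raises the demand on executor memory, it is probably safest to set it only as high as the widest file requires.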
OPTIONS('DATEFORMAT'='dateFormat')
Character | Date or Time Element | Example |
---|---|---|
G | Era designator | AD |
y | Year | 1996; 96 |
M | Month | July; Jul; 07 |
w | Number of the week in a year | 27 (the twenty-seventh week of the year) |
W | Number of the week in a month | 2 (the second week of the month) |
D | Number of the day in a year | 189 (the 189th day of the year) |
d | Number of the day in a month | 10 (the tenth day of the month) |
u | Number of the day in a week | 1 (Monday), ..., 7 (Sunday) |
a | am/pm flag | pm (12:00-24:00) |
H | Hour (0-23) | 2 |
h | Hour (1-12) | 12 |
m | Number of minutes | 30 |
s | Number of seconds | 55 |
S | Number of milliseconds | 978 |
z | Time zone | Pacific Standard Time; PST; GMT-08:00 |
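To illustrate how the pattern characters combine, the sketch below assumes a CSV file whose date column holds values such as 10-07-1996 (day, month, year). The path obs://dli/dates.csv and the table date_tb are hypothetical names chosen for the example.

```sql
-- Hypothetical path and table name (not from this document).
-- Pattern: dd = day in month, MM = month, yyyy = year, separated by hyphens.
LOAD DATA INPATH 'obs://dli/dates.csv' INTO TABLE date_tb
OPTIONS('DATA_TYPE'='CSV', 'DATEFORMAT'='dd-MM-yyyy');
```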
OPTIONS('TIMESTAMPFORMAT'='timestampFormat')
OPTIONS('MODE'='permissive')
OPTIONS('BADRECORDSPATH'='obs://bucket/path')
It is recommended that you use this option together with the DROPMALFORMED mode. Records that are converted successfully are imported into the target table, and records that fail conversion are stored in the specified error record directory.
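The recommended combination might look like the following sketch. It reuses the placeholder path, table name, and error record directory that appear elsewhere in this section; none of them refer to real resources.

```sql
-- Placeholder source path, table, and error record directory.
-- DROPMALFORMED is combined with BADRECORDSPATH as recommended above:
-- convertible records go to table t, failed records to obs://bucket/path.
LOAD DATA INPATH 'obs://dli/data.csv' INTO TABLE t
OPTIONS('DATA_TYPE'='CSV', 'MODE'='DROPMALFORMED', 'BADRECORDSPATH'='obs://bucket/path');
```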
Before importing data, you must create a table. For details, see Creating an OBS Table or Creating a DLI Table.
LOAD DATA INPATH 'obs://dli/data.csv' INTO TABLE t OPTIONS('DELIMITER'=',', 'QUOTECHAR'='"', 'COMMENTCHAR'='#', 'HEADER'='false');
LOAD DATA INPATH 'obs://dli/alltype.json' INTO TABLE jsontb OPTIONS('DATA_TYPE'='json', 'DATEFORMAT'='yyyy/MM/dd', 'TIMESTAMPFORMAT'='yyyy/MM/dd HH:mm:ss');
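A further sketch shows HEADER and FILEHEADER used together. It assumes that FILEHEADER supplies the column names for a source file that has no header row. That reading is an assumption, because the option is listed in this section without a description. The path, table name, and column names are the placeholders already used in this section.

```sql
-- Assumption: FILEHEADER provides column names when the file has no header row.
-- column1 and column2 are the placeholder names from the option listing.
LOAD DATA INPATH 'obs://dli/data.csv' INTO TABLE t
OPTIONS('DELIMITER'=',', 'HEADER'='false', 'FILEHEADER'='column1,column2');
```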