CREATE FOREIGN TABLE creates an HDFS or OBS foreign table in the current database to access structured data stored on HDFS or OBS, or to export data in ORC format to HDFS or OBS.
The hybrid data warehouse (standalone) does not support importing or exporting data through OBS or HDFS foreign tables.
Create an HDFS foreign table.
```
CREATE FOREIGN TABLE [ IF NOT EXISTS ] table_name
( [ { column_name type_name
      [ { [CONSTRAINT constraint_name] NULL |
          [CONSTRAINT constraint_name] NOT NULL |
          column_constraint [...] } ] |
      table_constraint [, ...] } [, ...] ] )
SERVER server_name
OPTIONS ( { option_name 'value' } [, ...] )
[ { WRITE ONLY | READ ONLY } ]
DISTRIBUTE BY { ROUNDROBIN | REPLICATION }
[ PARTITION BY ( column_name ) [ AUTOMAPPED ] ] ;
```
where column_constraint is:
```
[CONSTRAINT constraint_name]
{PRIMARY KEY | UNIQUE}
[NOT ENFORCED [ENABLE QUERY OPTIMIZATION | DISABLE QUERY OPTIMIZATION] | ENFORCED]
```
where table_constraint is:
```
[CONSTRAINT constraint_name]
{PRIMARY KEY | UNIQUE} (column_name)
[NOT ENFORCED [ENABLE QUERY OPTIMIZATION | DISABLE QUERY OPTIMIZATION] | ENFORCED]
```
IF NOT EXISTS: Does not throw an error if a table with the same name already exists; a notice is issued instead.
table_name: Specifies the name of the foreign table to be created.
Value range: a string. It must comply with the naming convention.
column_name: Specifies the name of a column in the foreign table. Columns are separated by commas (,).
Value range: a string. It must comply with the naming convention.
type_name: Specifies the data type of the column.
For ORC tables, see the data types supported by ORC tables.
TXT tables support the same data types as row-store tables.
CONSTRAINT constraint_name: Specifies the name of a constraint for the foreign table.
NULL / NOT NULL: Specifies whether the column allows NULL values.
When you create a foreign table, the database cannot verify whether the data in HDFS is NULL or NOT NULL; consistency of the data is guaranteed by the user. You must decide whether to declare the column NULL or NOT NULL, because the optimizer uses this information to generate better plans.
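For instance, a minimal sketch (table, server, and path names are illustrative; it assumes an HDFS server such as hdfs_server already exists) of declaring a column NOT NULL so the optimizer can rely on it:

```
-- Illustrative only: the user guarantees that c_id in the HDFS data is
-- never NULL, so the column is declared NOT NULL for better plans.
CREATE FOREIGN TABLE ft_customer
(
    c_id   INT4 NOT NULL,
    c_name TEXT
)
SERVER hdfs_server
OPTIONS (FORMAT 'orc', FOLDERNAME '/user/hive/warehouse/demo.db/customer/')
DISTRIBUTE BY ROUNDROBIN;
```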
server_name: Specifies the name of the server used by the foreign table. You can customize the server name.
Value range: a string indicating an existing server. It must comply with the naming convention.
header: Specifies whether the data file contains a table header. header is available only for CSV files.
If header is on, the first row of the data file is identified as the header and ignored during import. If header is off, the first row is identified as a data row.
Value range: true, on, false, and off. The default value is false or off.
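A minimal sketch of a CSV import table that skips a header row (the server obs_server, bucket, and columns are illustrative):

```
-- Illustrative only: header 'on' makes the first row of each CSV file
-- be treated as a header and skipped during import.
CREATE FOREIGN TABLE ft_csv_header
(
    id   INT4,
    name TEXT
)
SERVER obs_server
OPTIONS
(
    FORMAT 'csv',
    LOCATION 'obs://demo-bucket/data/',
    header 'on'
)
READ ONLY
DISTRIBUTE BY ROUNDROBIN;
```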
quote: Specifies the quotation mark for the CSV format. The default value is a double quotation mark (").
The quote value cannot be the same as the delimiter or null value.
The quote value must be a single-byte character.
Invisible characters such as 0x07, 0x08, and 0x1b are recommended as quote values.
escape: Specifies the escape character for the CSV format. The value must be a single-byte character.
The default value is a double quotation mark ("). If the value is the same as the quote value, it is replaced with \0.
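A minimal sketch of setting quote and escape for a CSV source (server and bucket names are illustrative; the escape-string literal E'\\', which denotes a single backslash, is assumed to be an acceptable way to pass the option value):

```
-- Illustrative only: fields are wrapped in double quotation marks and
-- a backslash is used as the escape character instead of the default.
CREATE FOREIGN TABLE ft_csv_quoted
(
    id  INT4,
    txt TEXT
)
SERVER obs_server
OPTIONS
(
    FORMAT 'csv',
    LOCATION 'obs://demo-bucket/quoted/',
    quote '"',
    escape E'\\'
)
READ ONLY
DISTRIBUTE BY ROUNDROBIN;
```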
location: Specifies the file path on OBS. This parameter applies only to OBS foreign tables. Data sources in multiple buckets are separated by vertical bars (|), for example, LOCATION 'obs://bucket1/folder/ | obs://bucket2/'. The database scans all objects in the specified folders.
When accessing a DLI multi-version table, you do not need to specify the location or foldername parameter.
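A minimal sketch of a LOCATION spanning two buckets, mirroring the example above (server, bucket, and column names are illustrative):

```
-- Illustrative only: data is read both from folder/ in bucket1 and
-- from the root of bucket2; the two sources are separated by '|'.
CREATE FOREIGN TABLE ft_two_buckets
(
    id INT4
)
SERVER obs_server
OPTIONS
(
    FORMAT 'text',
    LOCATION 'obs://bucket1/folder/ | obs://bucket2/'
)
READ ONLY
DISTRIBUTE BY ROUNDROBIN;
```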
delimiter: Specifies the column delimiter of the data. If it is not set, the default delimiter is used. The default delimiter of the TEXT format is a tab.
Value range: a multi-character delimiter whose length is less than or equal to 10 bytes.
eol: Specifies the newline character style of the imported data file.
Value range: multi-character newline strings within 10 bytes. Common newline characters include \r (0x0D), \n (0x0A), and \r\n (0x0D0A). Special characters such as $ and # can also be used as newline characters.
null: Specifies the string that represents a null value in the data file. The default value is \N for the TEXT format.
noescaping: Specifies whether to escape the backslash (\) and its following characters in the TEXT format.
noescaping is available only for TEXT source data files.
Value range: true, on, false, and off. The default value is false or off.
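A minimal sketch combining the delimiter, null, and noescaping options for a TEXT source (server and path are illustrative; E'\\N' denotes the two-character string \N):

```
-- Illustrative only: columns are separated by '|', the string \N marks
-- NULL, and backslashes in the data are loaded literally.
CREATE FOREIGN TABLE ft_text_opts
(
    id   INT4,
    name TEXT
)
SERVER hdfs_server
OPTIONS
(
    FORMAT 'text',
    FOLDERNAME '/user/hive/warehouse/demo.db/text_data/',
    delimiter '|',
    null E'\\N',
    noescaping 'true'
)
READ ONLY
DISTRIBUTE BY ROUNDROBIN;
```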
fill_missing_fields: Specifies whether to report an error when the last column in a row of the source file is missing during data loading.
Value range: true, on, false, and off. The default value is false or off.
If this parameter is set to false or off and the last column is missing, an error such as the following is reported: missing data for column "tt"
ignore_extra_data: Specifies whether to ignore extra columns when a row in the source file contains more columns than the foreign table. This parameter is available only during data import.
Value range: true, on, false, and off. The default value is false or off.
If this parameter is set to false or off and a row contains extra columns, an error such as the following is reported: extra data after last expected column
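A minimal sketch of an error-tolerant import table using both options (server and path are illustrative):

```
-- Illustrative only: rows with a missing last column do not abort the
-- load, and extra trailing columns are silently ignored.
CREATE FOREIGN TABLE ft_tolerant
(
    id   INT4,
    name TEXT
)
SERVER hdfs_server
OPTIONS
(
    FORMAT 'csv',
    FOLDERNAME '/user/hive/warehouse/demo.db/csv_data/',
    fill_missing_fields 'true',
    ignore_extra_data 'true'
)
READ ONLY
DISTRIBUTE BY ROUNDROBIN;
```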
date_format: Specifies the DATE format for data import. This parameter is available only for READ ONLY foreign tables.
Value range: any valid DATE format. For details, see Date and Time Processing Functions and Operators.
date_format is available only for TEXT and CSV source data files.
time_format: Specifies the TIME format for data import. This parameter is available only for READ ONLY foreign tables.
Value range: any valid TIME format. Time zones cannot be used. For details, see Date and Time Processing Functions and Operators.
time_format is available only for TEXT and CSV source data files.
timestamp_format: Specifies the TIMESTAMP format for data import. This parameter is available only for READ ONLY foreign tables.
Value range: any valid TIMESTAMP format. Time zones are not supported. For details, see Date and Time Processing Functions and Operators.
timestamp_format is available only for TEXT and CSV source data files.
smalldatetime_format: Specifies the SMALLDATETIME format for data import. This parameter is available only for READ ONLY foreign tables.
Value range: any valid SMALLDATETIME format. For details, see Date and Time Processing Functions and Operators.
smalldatetime_format is available only for TEXT and CSV source data files.
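A minimal sketch of specifying explicit import formats for date and timestamp columns (server, bucket, and format strings are illustrative):

```
-- Illustrative only: source dates look like 2020-12-31 and source
-- timestamps look like 2020-12-31 23:59:59.
CREATE FOREIGN TABLE ft_dates
(
    d  DATE,
    ts TIMESTAMP
)
SERVER obs_server
OPTIONS
(
    FORMAT 'csv',
    LOCATION 'obs://demo-bucket/dates/',
    date_format 'yyyy-mm-dd',
    timestamp_format 'yyyy-mm-dd hh24:mi:ss'
)
READ ONLY
DISTRIBUTE BY ROUNDROBIN;
```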
dataencoding: Specifies the encoding of the data in the exported table when it differs from the database encoding, for example, when the database encoding is Latin-1 but the data in the exported table is UTF-8. This parameter is optional; if it is not specified, the database encoding is used by default. This parameter is valid only for write-only HDFS foreign tables.
Value range: data encodings supported by the database.
The dataencoding parameter is valid only for ORC-formatted write-only HDFS foreign tables.
filesize: Specifies the file size of a write-only foreign table. This parameter is optional; if it is not specified, the file size in the distributed file system configuration is used by default. This parameter is available only for write-only foreign tables.
Value range: an integer ranging from 1 to 1024.
The filesize parameter is valid only for ORC-formatted write-only HDFS foreign tables.
compression: Specifies the compression mode of ORC files. This parameter is optional and available only for write-only foreign tables.
Value range: zlib, snappy, and lz4. The default value is snappy.
version: Specifies the ORC version number. This parameter is optional and available only for write-only foreign tables.
Value range: only 0.12 is supported. The default value is 0.12.
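A minimal sketch of a write-only HDFS foreign table combining the ORC export options above (server, path, and option values are illustrative):

```
-- Illustrative only: exported ORC files are written in UTF-8, capped by
-- the filesize setting, compressed with zlib, and use ORC version 0.12.
CREATE FOREIGN TABLE ft_orc_export
(
    id   INT4,
    name TEXT
)
SERVER hdfs_server
OPTIONS
(
    FORMAT 'orc',
    FOLDERNAME '/user/hive/warehouse/demo.db/export/',
    dataencoding 'utf8',
    filesize '32',
    compression 'zlib',
    version '0.12'
)
WRITE ONLY;
```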
dli_project_id: Specifies the project ID corresponding to DLI. You can obtain the project ID from the management console. This parameter is available only when the server type is DLI. This feature is supported only in 8.1.1 and later.
dli_database_name: Specifies the name of the database where the DLI multi-version table to be accessed is located. This parameter is available only when the server type is DLI. This feature is supported only in 8.1.1 and later.
dli_table_name: Specifies the name of the DLI multi-version table to be accessed. This parameter is available only when the server type is DLI. This feature is supported only in 8.1.1 and later.
checkencoding: Specifies whether to check the character encoding.
For imported data, the error-tolerance rules for invalid characters differ between the TEXT and ORC formats.
The following table lists the parameters supported by OBS and HDFS foreign tables for each file format (√ = supported, × = not supported):

| Parameter | OBS TEXT (READ ONLY) | OBS CSV (READ ONLY) | OBS ORC (READ ONLY) | OBS ORC (WRITE ONLY) | OBS CARBONDATA (READ ONLY) | HDFS TEXT (READ ONLY) | HDFS CSV (READ ONLY) | HDFS ORC (READ ONLY) | HDFS ORC (WRITE ONLY) | HDFS PARQUET (READ ONLY) |
|---|---|---|---|---|---|---|---|---|---|---|
| location | √ | √ | √ | × | × | × | × | × | × | × |
| format | √ | √ | √ | √ | √ | √ | √ | √ | √ | √ |
| header | × | √ | × | × | × | × | √ | × | × | × |
| delimiter | √ | √ | × | × | × | √ | √ | × | × | × |
| quote | × | √ | × | × | × | × | √ | × | × | × |
| escape | × | √ | × | × | × | × | √ | × | × | × |
| null | √ | √ | × | × | × | √ | √ | × | × | × |
| noescaping | √ | × | × | × | × | √ | × | × | × | × |
| encoding | √ | √ | √ | √ | √ | √ | √ | √ | √ | √ |
| fill_missing_fields | √ | √ | × | × | × | √ | √ | × | × | × |
| ignore_extra_data | √ | √ | × | × | × | √ | √ | × | × | × |
| date_format | √ | √ | × | × | × | √ | √ | × | × | × |
| time_format | √ | √ | × | × | × | √ | √ | × | × | × |
| timestamp_format | √ | √ | × | × | × | √ | √ | × | × | × |
| smalldatetime_format | √ | √ | × | × | × | √ | √ | × | × | × |
| chunksize | √ | √ | × | × | × | √ | √ | × | × | × |
| filenames | × | × | × | × | √ | √ | √ | √ | × | √ |
| foldername | √ | √ | √ | √ | √ | √ | √ | √ | √ | √ |
| dataencoding | × | × | × | × | × | × | × | × | √ | × |
| filesize | × | × | × | × | × | × | × | × | √ | × |
| compression | × | × | × | √ | × | × | × | × | √ | × |
| version | × | × | × | √ | × | × | × | × | √ | × |
| checkencoding | √ | √ | √ | × | √ | √ | √ | √ | √ | √ |
| totalrows | √ | √ | √ | × | × | × | × | × | × | × |
WRITE ONLY creates a write-only HDFS/OBS foreign table.
READ ONLY creates a read-only HDFS/OBS foreign table.
If the foreign table type is not specified, a read-only foreign table is created by default.
Specifies ROUNDROBIN as the distribution mode for the HDFS/OBS foreign table.
Specifies REPLICATION as the distribution mode for the HDFS/OBS foreign table.
column_name specifies the partition column. AUTOMAPPED indicates that the partition columns specified in the HDFS partitioned foreign table are automatically mapped to the partition directory information in HDFS. The prerequisite is that the order of the partition columns specified in the foreign table matches the order of the partition directories. This function applies only to read-only foreign tables.
Specifies the name of an informational constraint of the foreign table.
Value range: a string. It must comply with the naming convention.
The primary key constraint specifies that one or more columns of a table must contain unique (non-duplicate) and non-null values. Only one primary key can be specified for a table.
Specifies that a group of one or more columns of a table must contain unique values. For the purpose of a unique constraint, NULL is not considered equal.
Specifies the constraint to be an informational constraint. This constraint is guaranteed by the user instead of the database.
The default value is ENFORCED. ENFORCED is a reserved parameter and is currently not supported.
Specifies the informational constraint on column_name.
Value range: a string. It must comply with the naming convention, and the value of column_name must exist.
Optimizes an execution plan using an informational constraint.
Disables the optimization of an execution plan using an informational constraint.
In GaussDB(DWS), the use of data constraints depends on the user. If the data source can be guaranteed to strictly comply with certain constraints, queries on that data can be accelerated. Foreign tables do not support indexes, so informational constraints are used to optimize query plans.
The restrictions on creating informational constraints for a foreign table are as follows:
Example 1: In HDFS, import the TPC-H benchmark test tables part and region using Hive. The path of the part table is /user/hive/warehouse/partition.db/part_4, and that of the region table is /user/hive/warehouse/mppdb.db/region_orc11_64stripe/.
```
CREATE SERVER hdfs_server FOREIGN DATA WRAPPER HDFS_FDW OPTIONS
(
    address '10.10.0.100:25000,10.10.0.101:25000',
    hdfscfgpath '/opt/hadoop_client/HDFS/hadoop/etc/hadoop',
    type 'HDFS'
);
```
The IP addresses and port numbers of the HDFS NameNodes are specified in OPTIONS. 10.10.0.100:25000,10.10.0.101:25000 indicates the IP addresses and port numbers of the primary and standby HDFS NameNodes, which is the recommended format; the two groups of values are separated by a comma (,). For example, in '10.10.0.100:25000', the IP address is 10.10.0.100 and the port number is 25000.
```
CREATE FOREIGN TABLE ft_region
(
    R_REGIONKEY INT4,
    R_NAME TEXT,
    R_COMMENT TEXT
)
SERVER hdfs_server
OPTIONS
(
    FORMAT 'orc',
    encoding 'utf8',
    FOLDERNAME '/user/hive/warehouse/mppdb.db/region_orc11_64stripe/'
)
DISTRIBUTE BY roundrobin;
```
```
CREATE FOREIGN TABLE ft_part
(
    p_partkey int,
    p_name text,
    p_mfgr text,
    p_brand text,
    p_type text,
    p_size int,
    p_container text,
    p_retailprice float8,
    p_comment text
)
SERVER hdfs_server
OPTIONS
(
    FORMAT 'orc',
    encoding 'utf8',
    FOLDERNAME '/user/hive/warehouse/partition.db/part_4'
)
DISTRIBUTE BY roundrobin
PARTITION BY (p_mfgr) AUTOMAPPED;
```
GaussDB(DWS) allows you to specify files using the keyword filenames or foldername; the latter is recommended. The keyword DISTRIBUTE specifies the storage distribution mode of the table.
```
SELECT * FROM pg_foreign_table WHERE ftrelid='ft_region'::regclass;
 ftrelid | ftserver | ftwriteonly |                                   ftoptions
---------+----------+-------------+------------------------------------------------------------------------------
   16510 |    16509 | f           | {format=orc,foldername=/user/hive/warehouse/mppdb.db/region_orc11_64stripe/}
(1 row)

SELECT * FROM pg_foreign_table WHERE ftrelid='ft_part'::regclass;
 ftrelid | ftserver | ftwriteonly |                            ftoptions
---------+----------+-------------+------------------------------------------------------------------
   16513 |    16509 | f           | {format=orc,foldername=/user/hive/warehouse/partition.db/part_4}
(1 row)
```
Example 2: Export data from the TPC-H benchmark test table region to the /user/hive/warehouse/mppdb.db/regin_orc/ directory of the HDFS file system through an HDFS write-only foreign table.
```
CREATE FOREIGN TABLE ft_wo_region
(
    R_REGIONKEY INT4,
    R_NAME TEXT,
    R_COMMENT TEXT
)
SERVER hdfs_server
OPTIONS
(
    FORMAT 'orc',
    encoding 'utf8',
    FOLDERNAME '/user/hive/warehouse/mppdb.db/regin_orc/'
)
WRITE ONLY;
```
```
INSERT INTO ft_wo_region SELECT * FROM region;
```
Example 3: Perform operations on an HDFS foreign table that includes informational constraints.
```
CREATE FOREIGN TABLE ft_region
(
    R_REGIONKEY int,
    R_NAME TEXT,
    R_COMMENT TEXT,
    primary key (R_REGIONKEY) not enforced
)
SERVER hdfs_server
OPTIONS
(
    format 'orc',
    encoding 'utf8',
    foldername '/user/hive/warehouse/mppdb.db/region_orc11_64stripe'
)
DISTRIBUTE BY roundrobin;
```
```
SELECT relname,relhasindex FROM pg_class WHERE oid='ft_region'::regclass;
  relname  | relhasindex
-----------+-------------
 ft_region | f
(1 row)

SELECT conname, contype, consoft, conopt, conindid, conkey FROM pg_constraint WHERE conname='region_pkey';
   conname   | contype | consoft | conopt | conindid | conkey
-------------+---------+---------+--------+----------+--------
 region_pkey | p       | t       | t      |        0 | {1}
(1 row)
```
```
ALTER FOREIGN TABLE ft_region DROP CONSTRAINT region_pkey RESTRICT;

SELECT conname, contype, consoft, conindid, conkey FROM pg_constraint WHERE conname='region_pkey';
 conname | contype | consoft | conindid | conkey
---------+---------+---------+----------+--------
(0 rows)
```
```
ALTER FOREIGN TABLE ft_region ADD CONSTRAINT constr_unique UNIQUE(R_REGIONKEY) NOT ENFORCED;
```
```
ALTER FOREIGN TABLE ft_region DROP CONSTRAINT constr_unique RESTRICT;

SELECT conname, contype, consoft, conindid, conkey FROM pg_constraint WHERE conname='constr_unique';
 conname | contype | consoft | conindid | conkey
---------+---------+---------+----------+--------
(0 rows)
```
```
ALTER FOREIGN TABLE ft_region ADD CONSTRAINT constr_unique UNIQUE(R_REGIONKEY) NOT ENFORCED disable query optimization;

SELECT relname,relhasindex FROM pg_class WHERE oid='ft_region'::regclass;
  relname  | relhasindex
-----------+-------------
 ft_region | f
(1 row)
```
```
ALTER FOREIGN TABLE ft_region DROP CONSTRAINT constr_unique CASCADE;
```
Example 4: Read data stored on OBS using a foreign table.
```
CREATE SERVER obs_server FOREIGN DATA WRAPPER DFS_FDW OPTIONS
(
    ADDRESS 'obs.xxx.xxx.com',
    ACCESS_KEY 'xxxxxxxxx',
    SECRET_ACCESS_KEY 'yyyyyyyyyyyyy',
    TYPE 'OBS'
);
```
```
CREATE FOREIGN TABLE customer_address
(
    ca_address_sk    integer        not null,
    ca_address_id    char(16)       not null,
    ca_street_number char(10),
    ca_street_name   varchar(60),
    ca_street_type   char(15),
    ca_suite_number  char(10),
    ca_city          varchar(60),
    ca_county        varchar(30),
    ca_state         char(2),
    ca_zip           char(10),
    ca_country       varchar(20),
    ca_gmt_offset    decimal(36,33),
    ca_location_type char(20)
)
SERVER obs_server
OPTIONS
(
    FOLDERNAME '/user/hive/warehouse/mppdb.db/region_orc11_64stripe1/',
    FORMAT 'ORC',
    ENCODING 'utf8',
    TOTALROWS '20'
)
DISTRIBUTE BY roundrobin;
```
```
SELECT COUNT(*) FROM customer_address;
 count
-------
    20
(1 row)
```
Example 5: Read a DLI multi-version table using a foreign table. Multi-version foreign tables are supported only in 8.1.1 and later.
```
CREATE SERVER dli_server FOREIGN DATA WRAPPER DFS_FDW OPTIONS
(
    ADDRESS 'obs.xxx.xxx.com',
    ACCESS_KEY 'xxxxxxxxx',
    SECRET_ACCESS_KEY 'yyyyyyyyyyyyy',
    TYPE 'DLI',
    DLI_ADDRESS 'dli.xxx.xxx.com',
    DLI_ACCESS_KEY 'xxxxxxxxx',
    DLI_SECRET_ACCESS_KEY 'yyyyyyyyyyyyy'
);
```
```
CREATE FOREIGN TABLE customer_address
(
    ca_address_sk    integer        not null,
    ca_address_id    char(16)       not null,
    ca_street_number char(10),
    ca_street_name   varchar(60),
    ca_street_type   char(15),
    ca_suite_number  char(10),
    ca_city          varchar(60),
    ca_county        varchar(30),
    ca_state         char(2),
    ca_zip           char(10),
    ca_country       varchar(20),
    ca_gmt_offset    decimal(36,33),
    ca_location_type char(20)
)
SERVER dli_server
OPTIONS
(
    FORMAT 'ORC',
    ENCODING 'utf8',
    DLI_PROJECT_ID 'xxxxxxxxxxxxxxx',
    DLI_DATABASE_NAME 'database123',
    DLI_TABLE_NAME 'table456'
)
DISTRIBUTE BY roundrobin;
```
```
SELECT COUNT(*) FROM customer_address;
 count
-------
    20
(1 row)
```