CREATE FOREIGN TABLE (SQL on OBS or Hadoop)

Function

CREATE FOREIGN TABLE creates an HDFS or OBS foreign table in the current database to access or export structured data stored on HDFS or OBS. You can also export data in ORC format to HDFS or OBS.

The hybrid data warehouse (standalone) does not support importing or exporting data through OBS or HDFS foreign tables.

Precautions

Syntax

Create an HDFS foreign table.

CREATE FOREIGN TABLE [ IF NOT EXISTS ] table_name 
( [ { column_name type_name 
    [ { [CONSTRAINT constraint_name] NULL |
      [CONSTRAINT constraint_name] NOT NULL |
      column_constraint [...] } ] |
      table_constraint [, ...] } [, ...] ] ) 
    SERVER server_name 
    OPTIONS ( { option_name 'value' } [, ...] ) 
    [ { WRITE ONLY | READ ONLY } ]
    DISTRIBUTE BY { ROUNDROBIN | REPLICATION }
    [ PARTITION BY ( column_name ) [ AUTOMAPPED ] ] ;
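
For orientation, the following is a minimal, hedged instance of this syntax: a read-only ORC foreign table. The server name hdfs_server and the directory /user/hive/warehouse/demo.db/demo_table/ are placeholders and must be replaced with an existing server and data path in your environment.

-- Sketch only: a read-only HDFS foreign table over an assumed ORC directory.
CREATE FOREIGN TABLE IF NOT EXISTS ft_demo
(
    id   INT4,
    name TEXT
)
SERVER hdfs_server                 -- placeholder: an existing HDFS server
OPTIONS
(
    FORMAT 'orc',                  -- file format of the source data
    FOLDERNAME '/user/hive/warehouse/demo.db/demo_table/'   -- placeholder path
)
READ ONLY
DISTRIBUTE BY ROUNDROBIN;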

Parameter Description

Informational Constraint

In GaussDB(DWS), it is up to users to ensure that data complies with the declared constraints. If the data source strictly complies with certain constraints, queries on data with such constraints can be accelerated. Foreign tables do not support indexes; informational constraints are used only to optimize query plans.

The restrictions on creating informational constraints for a foreign table are as follows:

Example 1

In HDFS, use Hive to import the TPC-H benchmark test tables part and region. The path of the part table is /user/hive/warehouse/partition.db/part_4, and that of the region table is /user/hive/warehouse/mppdb.db/region_orc11_64stripe/.

  1. Create hdfs_server, with HDFS_FDW or DFS_FDW as the foreign data wrapper.
    CREATE SERVER hdfs_server FOREIGN DATA WRAPPER HDFS_FDW OPTIONS (address '10.10.0.100:25000,10.10.0.101:25000', hdfscfgpath '/opt/hadoop_client/HDFS/hadoop/etc/hadoop', type 'HDFS');
    

    The IP addresses and port numbers of the HDFS NameNodes are specified in OPTIONS. '10.10.0.100:25000,10.10.0.101:25000' specifies the primary and standby HDFS NameNodes, which is the recommended format; the two address:port pairs are separated by a comma (,). Take '10.10.0.100:25000' as an example: the IP address is 10.10.0.100 and the port number is 25000.

  2. Create an HDFS foreign table. The HDFS server associated with the table is hdfs_server, the file format of the ft_region table on the HDFS server is 'orc', and the file directory in the HDFS file system is '/user/hive/warehouse/mppdb.db/region_orc11_64stripe/'. (A sketch of this definition is shown at the end of this example.)
  3. View the created server and foreign table.
    SELECT * FROM pg_foreign_table WHERE ftrelid='ft_region'::regclass;
     ftrelid | ftserver | ftwriteonly |                                  ftoptions
    ---------+----------+-------------+------------------------------------------------------------------------------
       16510 |    16509 | f           | {format=orc,foldername=/user/hive/warehouse/mppdb.db/region_orc11_64stripe/}
    (1 row)
    
    select * from pg_foreign_table where ftrelid='ft_part'::regclass;
     ftrelid | ftserver | ftwriteonly |                            ftoptions
    ---------+----------+-------------+------------------------------------------------------------------
       16513 |    16509 | f           | {format=orc,foldername=/user/hive/warehouse/partition.db/part_4}
    (1 row)
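
    As referenced in step 2, the following is a minimal sketch of the ft_region definition. The column names and types follow the TPC-H region layout used elsewhere on this page, and the server, format, and folder name match the ftoptions output above; treat it as a reconstruction, not the exact original statement.

    -- Sketch only: reconstructed from the ftoptions output and the TPC-H region layout.
    CREATE FOREIGN TABLE ft_region
    (
        R_REGIONKEY INT4,
        R_NAME      TEXT,
        R_COMMENT   TEXT
    )
    SERVER
        hdfs_server
    OPTIONS
    (
        FORMAT 'orc',
        FOLDERNAME '/user/hive/warehouse/mppdb.db/region_orc11_64stripe/'
    )
    DISTRIBUTE BY ROUNDROBIN;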
    

Example 2

Export data from the TPC-H benchmark test table region to the /user/hive/warehouse/mppdb.db/regin_orc/ directory of the HDFS file system through an HDFS write-only foreign table.

  1. Create an HDFS server. The corresponding foreign data wrapper is HDFS_FDW or DFS_FDW, the same as in Example 1.
  2. Create a write-only HDFS foreign table.
    CREATE FOREIGN TABLE ft_wo_region
    (
        R_REGIONKEY INT4,
        R_NAME TEXT,
        R_COMMENT TEXT
    )
    SERVER
        hdfs_server
    OPTIONS
    (
        FORMAT 'orc',
        encoding 'utf8',
        FOLDERNAME '/user/hive/warehouse/mppdb.db/regin_orc/'
    )
    WRITE ONLY;
    
  3. Write data to the HDFS file system through the write-only foreign table.
    INSERT INTO ft_wo_region SELECT * FROM region;
    

Example 3

Perform operations on an HDFS foreign table that includes informational constraints.
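
The original statement is not reproduced in full here. The following is a minimal sketch, assuming the GaussDB(DWS) informational-constraint syntax (NOT ENFORCED) and reusing hdfs_server and the region layout from Example 1; the table name ft_region_ic is hypothetical.

-- Sketch only: an HDFS foreign table with an informational (NOT ENFORCED) primary key.
-- The constraint is not validated against the data; it is used only by the optimizer.
CREATE FOREIGN TABLE ft_region_ic
(
    R_REGIONKEY INT4,
    R_NAME      TEXT,
    R_COMMENT   TEXT,
    PRIMARY KEY (R_REGIONKEY) NOT ENFORCED
)
SERVER hdfs_server
OPTIONS
(
    FORMAT 'orc',
    FOLDERNAME '/user/hive/warehouse/mppdb.db/region_orc11_64stripe/'
)
DISTRIBUTE BY ROUNDROBIN;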

Example 4

Read data stored in OBS using a foreign table.

  1. Create obs_server, with DFS_FDW as the foreign data wrapper.
    CREATE SERVER obs_server FOREIGN DATA WRAPPER DFS_FDW OPTIONS ( 
      ADDRESS 'obs.xxx.xxx.com', 
      ACCESS_KEY 'xxxxxxxxx', 
      SECRET_ACCESS_KEY 'yyyyyyyyyyyyy', 
      TYPE 'OBS'
    );
    
    • ADDRESS is the endpoint of OBS. Replace it with the actual endpoint. You can find the domain name by searching for the value of regionCode in the region_map file.
    • ACCESS_KEY and SECRET_ACCESS_KEY are access keys for the cloud account system. Replace the values as needed.
    • TYPE indicates the server type. Retain the value OBS.
  2. Create an OBS foreign table named customer_address, which does not contain partition columns and is associated with an OBS server named obs_server. Files on obs_server are in ORC format and stored in /user/hive/warehouse/mppdb.db/region_orc11_64stripe1/.
    CREATE FOREIGN TABLE customer_address
    (
        ca_address_sk             integer               not null,
        ca_address_id             char(16)              not null,
        ca_street_number          char(10)                      ,   
        ca_street_name            varchar(60)                   ,   
        ca_street_type            char(15)                      ,   
        ca_suite_number           char(10)                      ,   
        ca_city                   varchar(60)                   ,   
        ca_county                 varchar(30)                   ,   
        ca_state                  char(2)                       ,   
        ca_zip                    char(10)                      ,   
        ca_country                varchar(20)                   ,   
        ca_gmt_offset             decimal(36,33)                  ,   
        ca_location_type          char(20)    
    ) 
    SERVER obs_server OPTIONS (
        FOLDERNAME '/user/hive/warehouse/mppdb.db/region_orc11_64stripe1/',
        FORMAT 'ORC',
        ENCODING 'utf8',
        TOTALROWS  '20'
    )
    DISTRIBUTE BY roundrobin;
    
  3. Query data stored in OBS using a foreign table.
    SELECT COUNT(*) FROM customer_address;
     count 
    -------
        20
    (1 row)
    

Example 5

Use a foreign table to read data from a DLI multi-version table. Only DLI 8.1.1 and later versions support the multi-version foreign table example.

  1. Create dli_server, with DFS_FDW as the foreign data wrapper.
    CREATE SERVER dli_server FOREIGN DATA WRAPPER DFS_FDW OPTIONS ( 
      ADDRESS 'obs.xxx.xxx.com', 
      ACCESS_KEY 'xxxxxxxxx', 
      SECRET_ACCESS_KEY 'yyyyyyyyyyyyy', 
      TYPE 'DLI',
      DLI_ADDRESS 'dli.xxx.xxx.com',
      DLI_ACCESS_KEY 'xxxxxxxxx',
      DLI_SECRET_ACCESS_KEY 'yyyyyyyyyyyyy'
    );
    
    • ADDRESS is the endpoint of OBS, and DLI_ADDRESS is the endpoint of DLI. Replace them with the actual endpoints.
    • ACCESS_KEY and SECRET_ACCESS_KEY are access keys for the cloud account system to access OBS. Use the actual value.
    • DLI_ACCESS_KEY and DLI_SECRET_ACCESS_KEY are access keys for the cloud account system to access DLI. Use the actual value.
    • TYPE indicates the server type. Retain the value DLI.
  2. Create the OBS foreign table customer_address for accessing DLI. The table does not contain partition columns, and the DLI server associated with the table is dli_server. In this example, dli_project_id is xxxxxxxxxxxxxxx, dli_database_name is database123, and dli_table_name is table456. Set their values based on site requirements.
    CREATE FOREIGN TABLE customer_address
    (
        ca_address_sk             integer               not null,
        ca_address_id             char(16)              not null,
        ca_street_number          char(10)                      ,   
        ca_street_name            varchar(60)                   ,   
        ca_street_type            char(15)                      ,   
        ca_suite_number           char(10)                      ,   
        ca_city                   varchar(60)                   ,   
        ca_county                 varchar(30)                   ,   
        ca_state                  char(2)                       ,   
        ca_zip                    char(10)                      ,   
        ca_country                varchar(20)                   ,   
        ca_gmt_offset             decimal(36,33)                  ,   
        ca_location_type          char(20)    
    ) 
    SERVER dli_server OPTIONS (
        FORMAT 'ORC',
        ENCODING 'utf8',
        DLI_PROJECT_ID 'xxxxxxxxxxxxxxx',
        DLI_DATABASE_NAME 'database123',
        DLI_TABLE_NAME 'table456'
    )
    DISTRIBUTE BY roundrobin;
    
  3. Query data in a DLI multi-version table using a foreign table.
    SELECT COUNT(*) FROM customer_address;
     count 
    -------
        20
    (1 row)
    

Helpful Links

ALTER FOREIGN TABLE (for HDFS or OBS), DROP FOREIGN TABLE