CREATE FOREIGN TABLE creates a GDS foreign table.
CREATE FOREIGN TABLE creates a GDS foreign table in the current database for concurrent data import and export. The GDS foreign table can be read-only or write-only, used for concurrent data import and export, respectively. The OBS foreign table is read-only by default.
1 2 3 4 5 6 7 8 9 | CREATE FOREIGN TABLE [ IF NOT EXISTS ] table_name ( [ { column_name type_name POSITION(offset,length) | LIKE source_table } [, ...] ] ) SERVER gsmpp_server OPTIONS ( { option_name ' value ' } [, ...] ) [ { WRITE ONLY | READ ONLY }] [ WITH error_table_name | LOG INTO error_table_name] [REMOTE LOG 'name'] [PER NODE REJECT LIMIT 'value'] [ TO { GROUP groupname | NODE ( nodename [, ... ] ) } ]; |
CREATE FOREIGN TABLE provides multiple parameters, which are classified as follows:
Does not throw an error if a table with the same name already exists. A notice is issued in this case.
Specifies the name of the foreign table to be created.
Value range: a string. It must comply with the naming convention.
Specifies the name of a column in the foreign table.
Value range: a string. It must comply with the naming convention.
Specifies the data type of the column.
Defining the location of each column in the data file in fixed length mode.
offset is the start of the column in the source file, and length is the length of the column.
Value range: offset must be greater than 0 bytes, and its unit is byte.
The length of each record must be less than 1 GB. By default, columns not in the file are replaced with null.
Specifies the server name of the foreign table. For the GDS foreign table, its server is created by initial database, which is gsmpp_server.
Specifies all types of parameters of foreign table data.
Specifies the data source location of the foreign table, which can be expressed through URLs. Separate URLs with vertical bars (|).
Currently, GDS can automatically create a directory defined by a foreign table during data export. For example, when the foreign table location defines that gsfs:// 192.168.0.91:5000/2019/09 executes an export task, if the 2019/09 subdirectory in the GDS data directory does not exist, the subdirectory is automatically created. You do not need to manually create the directory specified in the foreign table.
For example: gsfs://192.168.0.90:5000/* or file:///data/data.txt or gsfs:// 192.168.0.90:5000/* | gsfs:// 192.168.0.91:5000/*.
Specifies the format of the data source file in a foreign table.
Value range: CSV, TEXT. The default value is TEXT.
Specifies whether a data file contains a table header. header is available only for CSV and FIXED files.
When data is imported, if header is on, the first row of the data file will be identified as title row and ignored. If header is off, the first row is identified as data.
When data is exported, if header is on, fileheader must be specified. fileheader is used to specify the export header file format. If header is off, the exported file does not include a title row.
Value range: true, on, false, and off. The default value is false or off.
Specifies a file that defines the content in the header for exported data. The file contains one row of data description of each column.
For example, to add a header in a file containing product information, define the file as follows:
The information of products.\n
Specifies the name prefix of the exported data file exported using GDS from a write-only foreign table.
If file_type is set to pipe, the pipe file dbName_schemaName_foreignTableName.pipe is generated.
If both out_filename_prefix and location specify a pipe name, the pipe name specified in location is used.
"con","aux","nul","prn","com0","com1","com2","com3","com4","com5","com6","com7","com8","com9","lpt0","lpt1","lpt2","lpt3","lpt4","lpt5","lpt6","lpt7","lpt8","lpt9"
Specifies the column delimiter of data, and uses the default delimiter if it is not set. The default delimiter of TEXT is a tab and that of CSV is a comma (,). No delimiter is used in FIXED format.
Valid value:
The value of delimiter can be a multi-character delimiter whose length is less than or equal to 10 bytes.
Specifies which characters in a CSV source data file will be identified as quotation marks. The default value is a double quotation mark (").
Specifies which characters in a CSV source data file are escape characters. Escape characters can only be single-byte characters.
Default value: the same as the value of QUOTE
Valid value:
Specifies in TEXT format, whether to escape the backslash (\) and its following characters.
noescaping is available only for the TEXT format.
Value range: true, on, false, and off. The default value is false or off.
Specifies the encoding of a data file, that is, the encoding used to parse, check, and generate a data file. Its default value is the default client_encoding value of the current database.
Before you import foreign tables, it is recommended that you set client_encoding to the file encoding format, or a format matching the character set of the file. Otherwise, unnecessary parsing and check errors may occur, leading to import errors, rollback, or even invalid data import. Before you import foreign tables, you are also advised to specify this parameter, because the export result using the default character set may not be what you expected.
If this parameter is not specified when you create a foreign table, a warning message will be displayed on the client.
Currently, GDS cannot parse or write in a file using multiple encoding formats during foreign table import or export.
Specifies whether to generate an error message when the last column in a row in the source file is lost during data import.
Value range: true, on, false, and off. The default value is false or off.
missing data for column "tt"
Specifies whether to ignore excessive columns when the number of data source files exceeds the number of foreign table columns. This parameter is available during data import.
Value range: true, on, false, and off. The default value is false or off.
extra data after last expected column
If the newline character at the end of the row is lost, setting the parameter to true will ignore data in the next row.
Specifies the maximum number of data format errors allowed during a data import task. If the number of errors does not reach the maximum number, the data import task can still be executed.
You are advised to replace this syntax with PER NODE REJECT LIMIT 'value'.
Examples of data format errors include the following: a column is lost, an extra column exists, a data type is incorrect, and encoding is incorrect. Once a non-data format error occurs, the whole data import process is stopped.
Value range: a positive integer or unlimited
The default value is 0, indicating that error information is returned immediately.
Enclose positive integer values with single quotation marks ('').
Specifies the data import policy during a specific data import process. GaussDB(DWS) supports only the Normal mode.
Valid value:
Specifies the newline character style of the imported or exported data file.
Value range: multi-character newline characters within 10 bytes. Common newline characters include \r (0x0D), \n (0x0A), and \r\n (0x0D0A). Special newline characters include $ and #.
This parameter is generally used with the compatible_illegal_chars parameter. If a data file contains a truncated Chinese character, the truncated character and a delimiter will be encoded into another Chinese character due to inconsistent encoding between the foreign table and the database. As a result, the delimiter is masked and an error will be reported, indicating that there are missing fields.
This parameter is used to avoid encoding a truncated character and a delimiter into another character.
Value range: true, on, false, and off. The default value is false or off.
This parameter is disabled by default. It is recommended that you disable this parameter, because encoding a truncated character and a delimiter into another character is rarely required. If the parameter is enabled, the scenario may be incorrectly identified and thereby causing incorrect information imported to the table.
Specifies the type of the file to be imported or exported.
Value options: normal, pipe. normal is the default value.
This parameter specifies whether the GDS process automatically creates a named pipe.
Value options: true, on, false, and off. The default value is true/on.
Specifies the length of fixed format data. The unit is byte. This syntax is available only for READ ONLY foreign tables.
Value range: Less than 1 GB, and greater than or equal to the total length specified by POSITION (The total length is the sum of offset and length in the last column of the table definition.)
Specifies how the columns of the types BYTEAOID, CHAROID, NAMEOID, TEXTOID, BPCHAROID, VARCHAROID, NVARCHAR2OID, and CSTRINGOID are aligned during fixed-length export.
Value range: align_left, align_right
Default value: align_right
The bytea data type must be in hexadecimal format (for example, \XXXX) or octal format (for example, \XXX\XXX\XXX). The data to be imported must be left-aligned (that is, the column data starts with either of the two formats instead of spaces). Therefore, if the exported file needs to be imported using a GDS foreign table and the file data length is less than that specified by the foreign table formatter, the exported file must be left aligned. Otherwise, an error is reported during the import.
Imports data of the DATE type. This syntax is available only for READ ONLY foreign tables.
Value range: any valid DATE value. For details, see Date and Time Processing Functions and Operators.
If ORACLE is specified as the compatible database, the DATE format is TIMESTAMP. For details, see timestamp_format below.
Imports data of the TIME type. This syntax is available only for READ ONLY foreign tables.
Value range: any valid TIME value. Time zones cannot be used. For details, see Date and Time Processing Functions and Operators.
Imports data of the TIMESTAMP type. This syntax is available only for READ ONLY foreign tables.
Value range: any valid TIMESTAMP value. Time zones are not supported. For details, see Date and Time Processing Functions and Operators.
Imports data of the SMALLDATETIME type. This syntax is available only for READ ONLY foreign tables.
Value range: any valid SMALLDATETIME value. For details, see Date and Time Processing Functions and Operators.
Enables or disables fault tolerance on invalid characters during data import. This syntax is available only for READ ONLY foreign tables.
Value range: true, on, false, and off. The default value is false or off.
The rule of error tolerance when you import invalid characters is as follows:
(1) \0 is converted to a space.
(2) Other invalid characters are converted to question marks.
(3) If compatible_illegal_chars is set to true or on, invalid characters are tolerated. If NULL, DELIMITER, QUOTE, and ESCAPE are set to a spaces or question marks. Errors like "illegal chars conversion may confuse COPY escape 0x20" will be displayed to prompt user to modify parameter values that cause confusion, preventing import errors.
Specifies whether a foreign table is read-only. This parameter is available only for data import.
Specifies whether a foreign table is write-only. This parameter is available only for data import.
Specifies the table where data format errors generated during parallel data import are recorded. You can query the error information table after data is imported to obtain error details. This parameter is available only after reject_limit is set.
To be compatible with PostgreSQL open source interfaces, you are advised to replace this syntax with LOG INTO.
Value range: a string. It must comply with the naming convention.
Specifies the table where data format errors generated during parallel data import are recorded. You can query the error information table after data is imported to obtain error details.
This parameter is available only after PER NODE REJECT LIMIT is set.
Value range: a string. It must comply with the naming convention.
Concurrently imports data in parallel through GDS foreign tables, to improve single-file import performance. This parameter is only used for data import.
The parameter format is file_sequence'total number of shards-current shard'. Example:
file_sequence '3-1' indicates that the imported file is logically split into three shards and the data currently imported by the foreign table is the data on the first shard.
file_sequence '3-2' indicates that the imported file is logically split into three shards and the data currently imported by the foreign table is the data on the second shard.
file_sequence '3-3' indicates that the imported file is logically split into three shards and the data currently imported by the foreign table is the data on the third shard.
This parameter has the following constraints:
When data is imported in parallel in CSV format, some shards fail to be imported in the following scenario because the CSV rules conflict with the GDS splitting logic:
Scenario: A CSV file contains a newline character that is not escaped, the newline character is contained in the character specified by quote, and the data of this line is in the first row of the logical shard.
For example, if you import the big.csv file in parallel, the following information is displayed:
--id, username, address 10001,"customer1 name","Rose District" 10002,"customer2 name"," 23 Road Rose District NewCity" 10003,"customer3 name","NewCity"
After the file is split into two shards, the content of the first shard is as follows:
10001,"customer1 name","Rose District" 10002,"customer2 name"," 23
The content of the second shard is as follows:
Road Rose District NewCity" 10003,"customer3 name","NewCity"
The newline character after 23 Road Rose in the first line of the second shard is contained between double quotation marks. As a result, GDS cannot determine whether the newline character is a newline character in the field or a separator in the line. Therefore, two data records on the first shard are successfully imported, but the second shard fails to be imported.
The data format error information is saved as files in GDS. name is the prefix of the error data file.
This parameter specifies the allowed number of data format errors on each DN during data import. If the number of errors exceeds the specified value on any DN, data import fails, an error is reported, and the system exits data import.
This syntax specifies the error tolerance of a single node.
Examples of data format errors include the following: a column is lost, an extra column exists, a data type is incorrect, and encoding is incorrect. When a non-data format error occurs, the whole data import process stops.
Value range: integer, unlimited. The default value is 0, indicating that error information is returned immediately.
Currently, TO GROUP cannot be used. TO NODE is used for internal scale-out tools.
Create a foreign tablecustomer_ft to import data from GDS server 10.10.123.234 in TEXT format.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 | CREATE FOREIGN TABLE customer_ft ( c_customer_sk integer , c_customer_id char(16) , c_current_cdemo_sk integer , c_current_hdemo_sk integer , c_current_addr_sk integer , c_first_shipto_date_sk integer , c_first_sales_date_sk integer , c_salutation char(10) , c_first_name char(20) , c_last_name char(30) , c_preferred_cust_flag char(1) , c_birth_day integer , c_birth_month integer , c_birth_year integer , c_birth_country varchar(20) , c_login char(13) , c_email_address char(50) , c_last_review_date char(10) ) SERVER gsmpp_server OPTIONS ( location 'gsfs://10.10.123.234:5000/customer1*.dat', FORMAT 'TEXT' , DELIMITER '|', encoding 'utf8', mode 'Normal') READ ONLY; |
Create a foreign table to import data from GDS servers 192.168.0.90 and 192.168.0.91 in TEXT format. Record errors that occur during data import in foreign_HR_staffS_ft. A maximum of two data format errors are allowed during the data import.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 | CREATE FOREIGN TABLE foreign_HR_staffS_ft ( staff_ID NUMBER(6) , FIRST_NAME VARCHAR2(20), LAST_NAME VARCHAR2(25), EMAIL VARCHAR2(25), PHONE_NUMBER VARCHAR2(20), HIRE_DATE DATE, employment_ID VARCHAR2(10), SALARY NUMBER(8,2), COMMISSION_PCT NUMBER(2,2), MANAGER_ID NUMBER(6), section_ID NUMBER(4) ) SERVER gsmpp_server OPTIONS (location 'gsfs://192.168.0.90:5000/* | gsfs://192.168.0.91:5000/*', format 'TEXT', delimiter E'\x08', null '',reject_limit '2') WITH err_HR_staffS_ft; |