The data servers and the cluster reside on the same intranet. The IP addresses are 192.168.0.90 and 192.168.0.91. Source data files are in CSV format.
1 2 3 4 5 6 | CREATE TABLE tpcds.reasons ( r_reason_sk integer not null, r_reason_id char(16) not null, r_reason_desc char(100) ); |
mkdir -p /input_data
groupadd gdsgrp useradd -g gdsgrp gds_user
chown -R gds_user:gdsgrp /input_data
The GDS installation path is /opt/bin/dws/gds. Source data files are stored in /input_data/. The IP addresses of the data servers are 192.168.0.90 and 192.168.0.91. The GDS listening port is 5000. GDS runs in daemon mode.
/opt/bin/dws/gds/bin/gds -d /input_data -p 192.168.0.90:5000 -H 10.10.0.1/24 -D
Start GDS on the data server whose IP address is 192.168.0.91.
/opt/bin/dws/gds/bin/gds -d /input_data -p 192.168.0.91:5000 -H 10.10.0.1/24 -D
Data export mode settings are as follows:
Information about the data format is configured based on data format parameters specified during data export. The parameter configurations are as follows:
Configure import error tolerance parameters as follows:
Based on the above settings, the foreign table is created using the following statement:
1 2 3 4 5 6 7 | CREATE FOREIGN TABLE tpcds.foreign_tpcds_reasons ( r_reason_sk integer not null, r_reason_id char(16) not null, r_reason_desc char(100) ) SERVER gsmpp_server OPTIONS (location 'gsfs://192.168.0.90:5000/* | gsfs://192.168.0.91:5000/*', format 'CSV',mode 'Normal', encoding 'utf8', delimiter E'\x08', quote E'\x1b', null '', fill_missing_fields 'false') LOG INTO err_tpcds_reasons PER NODE REJECT LIMIT 'unlimited'; |
1 | INSERT INTO tpcds.reasons SELECT * FROM tpcds.foreign_tpcds_reasons; |
1 | SELECT * FROM err_tpcds_reasons; |
ps -ef|grep gds gds_user 128954 1 0 15:03 ? 00:00:00 gds -d /input_data -p 192.168.0.90:5000 -D gds_user 129003 118723 0 15:04 pts/0 00:00:00 grep gds kill -9 128954
The data servers and the cluster reside on the same intranet. The server IP address is 192.168.0.90. Source data files are in CSV format. Data will be imported to two tables using multiple threads in Normal mode.
1 2 3 4 5 6 | CREATE TABLE tpcds.reasons1 ( r_reason_sk integer not null, r_reason_id char(16) not null, r_reason_desc char(100) ) ; |
1 2 3 4 5 6 | CREATE TABLE tpcds.reasons2 ( r_reason_sk integer not null, r_reason_id char(16) not null, r_reason_desc char(100) ) ; |
mkdir -p /input_data
groupadd gdsgrp useradd -g gdsgrp gds_user
chown -R gds_user:gdsgrp /input_data
/gds/gds -d /input_data -p 192.168.0.90:5000 -H 10.10.0.1/24 -D -t 2 -r
The foreign table tpcds.foreign_tpcds_reasons1 is used as an example to describe how to configure parameters in a foreign table.
Data export mode settings are as follows:
Information about the data format is configured based on data format parameters specified during data export. The parameter configurations are as follows:
Configure import error tolerance parameters as follows:
Based on the preceding settings, the foreign table tpcds.foreign_tpcds_reasons1 is created using the following statement:
1 2 3 4 5 6 | CREATE FOREIGN TABLE tpcds.foreign_tpcds_reasons1 ( r_reason_sk integer not null, r_reason_id char(16) not null, r_reason_desc char(100) ) SERVER gsmpp_server OPTIONS (location 'gsfs://192.168.0.90:5000/import1/*', format 'CSV',mode 'Normal', encoding 'utf8', delimiter E'\x08', quote E'\x1b', null '',fill_missing_fields 'on')LOG INTO err_tpcds_reasons1 PER NODE REJECT LIMIT 'unlimited'; |
Based on the preceding settings, the foreign table tpcds.foreign_tpcds_reasons2 is created using the following statement:
1 2 3 4 5 6 | CREATE FOREIGN TABLE tpcds.foreign_tpcds_reasons2 ( r_reason_sk integer not null, r_reason_id char(16) not null, r_reason_desc char(100) ) SERVER gsmpp_server OPTIONS (location 'gsfs://192.168.0.90:5000/import2/*', format 'CSV',mode 'Normal', encoding 'utf8', delimiter E'\x08', quote E'\x1b', null '',fill_missing_fields 'on')LOG INTO err_tpcds_reasons2 PER NODE REJECT LIMIT 'unlimited'; |
1 | INSERT INTO tpcds.reasons1 SELECT * FROM tpcds.foreign_tpcds_reasons1; |
1 | INSERT INTO tpcds.reasons2 SELECT * FROM tpcds.foreign_tpcds_reasons2; |
1 2 | SELECT * FROM err_tpcds_reasons1; SELECT * FROM err_tpcds_reasons2; |
ps -ef|grep gds gds_user 128954 1 0 15:03 ? 00:00:00 gds -d /input_data -p 192.168.0.90:5000 -D -t 2 -r gds_user 129003 118723 0 15:04 pts/0 00:00:00 grep gds kill -9 128954
gds -d /***/gds_data/ -D -p 192.168.0.1:7789 -l /***/gds_log/aa.log -H 0/0 -t 10 -D
If you need to set the timeout interval of a pipe, use the --pipe-timeout parameter.
CREATE TABLE test_pipe_1( id integer not null, sex text not null, name text );
CREATE FOREIGN TABLE foreign_test_pipe_tr( like test_pipe ) SERVER gsmpp_server OPTIONS (LOCATION 'gsfs://192.168.0.1:7789/foreign_test_pipe.pipe', FORMAT 'text', DELIMITER ',', NULL '', EOL '0x0a' ,file_type 'pipe',auto_create_pipe 'false');
INSERT INTO test_pipe_1 select * from foreign_test_pipe_tr;
cd /***/gds_data/
mkfifo foreign_test_pipe.pipe;
A pipe will be automatically cleared after an operation is complete. To perform another operation, create a pipe again.
cat postgres_public_foreign_test_pipe_tw.txt > foreign_test_pipe.pipe
gzip -d < out.gz > foreign_test_pipe.pipe
hdfs dfs -cat - /user/hive/***/test_pipe.txt > foreign_test_pipe.pipe
INSERT INTO test_pipe_1 select * from foreign_test_pipe_tr; INSERT 0 4 SELECT * FROM test_pipe_1; id | sex | name ----+-----+---------------- 3 | 2 | 11111111111111 1 | 2 | 11111111111111 2 | 2 | 11111111111111 4 | 2 | 11111111111111 (4 rows)
GDS also supports importing data through multi-process pipes. That is, one foreign table corresponds to multiple GDSs.
The following takes importing a local file as an example.
gds -d /***/gds_data/ -D -p 192.168.0.1:7789 -l /***/gds_log/aa.log -H 0/0 -t 10 -D gds -d /***/gds_data_1/ -D -p 192.168.0.1:7790 -l /***/gds_log_1/aa.log -H 0/0 -t 10 -D
If you need to set the timeout interval of a pipe, use the --pipe-timeout parameter.
CREATE TABLE test_pipe( id integer not null, sex text not null, name text );
CREATE FOREIGN TABLE foreign_test_pipe_tr( like test_pipe ) SERVER gsmpp_server OPTIONS (LOCATION 'gsfs://192.168.0.1:7789/foreign_test_pipe.pipe|gsfs://192.168.0.1:7790/foreign_test_pipe.pipe', FORMAT 'text', DELIMITER ',', NULL '', EOL '0x0a' , file_type 'pipe', auto_create_pipe 'false');
INSERT INTO test_pipe_1 select * from foreign_test_pipe_tr;
cd /***/gds_data/ cd /***/gds_data_1/
mkfifo foreign_test_pipe.pipe;
cat postgres_public_foreign_test_pipe_tw.txt > foreign_test_pipe.pipe
INSERT INTO test_pipe_1 select * from foreign_test_pipe_tr; INSERT 0 4 SELECT * FROM test_pipe_1; id | sex | name ----+-----+---------------- 3 | 2 | 11111111111111 1 | 2 | 11111111111111 2 | 2 | 11111111111111 4 | 2 | 11111111111111 (4 rows)
gds -d /***/gds_data/ -D -p GDS_IP:GDS_PORT -l /***/gds_log/aa.log -H 0/0 -t 10 -D
If you need to set the timeout interval of a pipe, use the --pipe-timeout parameter.
CREATE TABLE test_pipe( id integer not null, sex text not null, name text ); INSERT INTO test_pipe values(1,2,'11111111111111'); INSERT INTO test_pipe values(2,2,'11111111111111'); INSERT INTO test_pipe values(3,2,'11111111111111'); INSERT INTO test_pipe values(4,2,'11111111111111'); INSERT INTO test_pipe values(5,2,'11111111111111');
CREATE FOREIGN TABLE foreign_test_pipe( id integer not null, age text not null, name text ) SERVER gsmpp_server OPTIONS (LOCATION 'gsfs://GDS_IP:GDS_PORT/', FORMAT 'text', DELIMITER ',', NULL '', EOL '0x0a' ,file_type 'pipe') WRITE ONLY;
INSERT INTO foreign_test_pipe SELECT * FROM test_pipe;
CREATE TABLE test_pipe (id integer not null, sex text not null, name text);
CREATE FOREIGN TABLE foreign_test_pipe(like test_pipe) SERVER gsmpp_server OPTIONS (LOCATION 'gsfs://GDS_IP:GDS_PORT/', FORMAT 'text', DELIMITER ',', NULL '', EOL '0x0a' , file_type 'pipe', auto_create_pipe 'false');
INSERT INTO test_pipe SELECT * FROM foreign_test_pipe;
SELECT * FROM test_pipe; id | sex | name ----+-----+---------------- 3 | 2 | 11111111111111 6 | 2 | 11111111111111 7 | 2 | 11111111111111 1 | 2 | 11111111111111 2 | 2 | 11111111111111 4 | 2 | 11111111111111 5 | 2 | 11111111111111 8 | 2 | 11111111111111 9 | 2 | 11111111111111 (9 rows)
By default, the pipeline file exported from or imported to GDS is named in the format of Database name_Schema name_Foreign table name .pipe. Therefore, the database name and schema name of the target cluster must be the same as those of the source cluster. If the database or schema is inconsistent, you can specify the same pipe file in the URL of the location.
For example:
CREATE FOREIGN TABLE foreign_test_pipe(id integer not null, age text not null, name text) SERVER gsmpp_server OPTIONS (LOCATION 'gsfs://GDS_IP:GDS_PORT/foreign_test_pipe.pipe', FORMAT 'text', DELIMITER ',', NULL '', EOL '0x0a' ,file_type 'pipe') WRITE ONLY;
CREATE FOREIGN TABLE foreign_test_pipe(like test_pipe) SERVER gsmpp_server OPTIONS (LOCATION 'gsfs://GDS_IP:GDS_PORT/foreign_test_pipe.pipe', FORMAT 'text', DELIMITER ',', NULL '', EOL '0x0a' ,file_type 'pipe',auto_create_pipe 'false');