The data server and the cluster reside on the same intranet, the IP address of the data server is 192.168.0.90, and data source files are in CSV format. In this scenario, data is exported in parallel in Remote mode.
To export data in parallel in Remote mode, perform the following operations:
mkdir -p /output_data
groupadd gdsgrp useradd -g gdsgrp gds_user
chown -R gds_user:gdsgrp /output_data
/opt/bin/dws/gds/bin/gds -d /output_data -p 192.168.0.90:5000 -H 10.10.0.1/24 -D
Data export mode settings are as follows:
Data format parameter settings are as follows:
Based on the above settings, the foreign table is created using the following statement:
1 2 3 4 5 6 | CREATE FOREIGN TABLE foreign_tpcds_reasons ( r_reason_sk integer not null, r_reason_id char(16) not null, r_reason_desc char(100) ) SERVER gsmpp_server OPTIONS (LOCATION 'gsfs://192.168.0.90:5000/', FORMAT 'CSV',ENCODING 'utf8',DELIMITER E'\x08', QUOTE E'\x1b', NULL '') WRITE ONLY; |
1 | INSERT INTO foreign_tpcds_reasons SELECT * FROM tpcds.reason; |
ps -ef|grep gds gds_user 128954 1 0 15:03 ? 00:00:00 gds -d /output_data -p 192.168.0.90:5000 -D gds_user 129003 118723 0 15:04 pts/0 00:00:00 grep gds kill -9 128954
The data server and the cluster reside on the same intranet, the IP address of the data server is 192.168.0.90, and data source files are in CSV format. In this scenario, data is concurrently exported to two target tables using multiple threads in Remote mode.
To concurrently export data using multiple threads in Remote mode, perform the following operations:
mkdir -p /output_data groupadd gdsgrp useradd -g gdsgrp gds_user
chown -R gds_user:gdsgrp /output_data
/opt/bin/dws/gds/bin/gds -d /output_data -p 192.168.0.90:5000 -H 10.10.0.1/24 -D -t 2
Based on the preceding settings, the foreign table foreign_tpcds_reasons1 is created using the following statement:
1 2 3 4 5 6 | CREATE FOREIGN TABLE foreign_tpcds_reasons1 ( r_reason_sk integer not null, r_reason_id char(16) not null, r_reason_desc char(100) ) SERVER gsmpp_server OPTIONS (LOCATION 'gsfs://192.168.0.90:5000/', FORMAT 'CSV',ENCODING 'utf8', DELIMITER E'\x08', QUOTE E'\x1b', NULL '') WRITE ONLY; |
Based on the preceding settings, the foreign table foreign_tpcds_reasons2 is created using the following statement:
1 2 3 4 5 6 | CREATE FOREIGN TABLE foreign_tpcds_reasons2 ( r_reason_sk integer not null, r_reason_id char(16) not null, r_reason_desc char(100) ) SERVER gsmpp_server OPTIONS (LOCATION 'gsfs://192.168.0.90:5000/', FORMAT 'CSV', DELIMITER E'\x08', QUOTE E'\x1b', NULL '') WRITE ONLY; |
1 | INSERT INTO foreign_tpcds_reasons1 SELECT * FROM tpcds.reason; |
1 | INSERT INTO foreign_tpcds_reasons2 SELECT * FROM tpcds.reason; |
ps -ef|grep gds gds_user 128954 1 0 15:03 ? 00:00:00 gds -d /output_data -p 192.168.0.90:5000 -D -t 2 gds_user 129003 118723 0 15:04 pts/0 00:00:00 grep gds kill -9 128954
gds -d /***/gds_data/ -D -p 192.168.0.1:7789 -l /***/gds_log/aa.log -H 0/0 -t 10 -D
If you need to set the timeout interval of a pipe, use the --pipe-timeout parameter.
CREATE TABLE test_pipe( id integer not null, sex text not null, name text ) ; INSERT INTO test_pipe values(1,2,'11111111111111'); INSERT INTO test_pipe values(2,2,'11111111111111'); INSERT INTO test_pipe values(3,2,'11111111111111'); INSERT INTO test_pipe values(4,2,'11111111111111');
CREATE FOREIGN TABLE foreign_test_pipe_tw( id integer not null, age text not null, name text ) SERVER gsmpp_server OPTIONS (LOCATION 'gsfs://192.168.0.1:7789/', FORMAT 'text', DELIMITER ',', NULL '', EOL '0x0a' ,file_type 'pipe', auto_create_pipe 'false') WRITE ONLY;
INSERT INTO foreign_test_pipe_tw select * from test_pipe;
cd /***/gds_data/
mkfifo postgres_public_foreign_test_pipe_tw.pipe
A pipe will be automatically cleared after an operation is complete. To perform another operation, create a pipe again.
cat postgres_public_foreign_test_pipe_tw.pipe > postgres_public_foreign_test_pipe_tw.txt
gzip -9 -c < postgres_public_foreign_test_pipe_tw.pipe > out.gz
cat postgres_public_foreign_test_pipe_tw.pipe | hdfs dfs -put - /user/hive/***/test_pipe.txt
cat postgres_public_foreign_test_pipe_tw.txt 3,2,11111111111111 1,2,11111111111111 2,2,11111111111111 4,2,11111111111111
vim out.gz 3,2,11111111111111 1,2,11111111111111 2,2,11111111111111 4,2,11111111111111
hdfs dfs -cat /user/hive/***/test_pipe.txt 3,2,11111111111111 1,2,11111111111111 2,2,11111111111111 4,2,11111111111111
GDS also supports importing and exporting data through multi-process pipes. That is, one foreign table corresponds to multiple GDSs.
The following takes exporting a local file as an example.
gds -d /***/gds_data/ -D -p 192.168.0.1:7789 -l /***/gds_log/aa.log -H 0/0 -t 10 -D gds -d /***/gds_data_1/ -D -p 192.168.0.1:7790 -l /***/gds_log/aa.log -H 0/0 -t 10 -D
If you need to set the timeout interval of a pipe, use the --pipe-timeout parameter.
CREATE TABLE test_pipe (id integer not null, sex text not null, name text);
INSERT INTO test_pipe values(1,2,'11111111111111'); INSERT INTO test_pipe values(2,2,'11111111111111'); INSERT INTO test_pipe values(3,2,'11111111111111'); INSERT INTO test_pipe values(4,2,'11111111111111');
CREATE FOREIGN TABLE foreign_test_pipe_tw( id integer not null, age text not null, name text ) SERVER gsmpp_server OPTIONS (LOCATION 'gsfs://192.168.0.1:7789/|gsfs://192.168.0.1:7790/', FORMAT 'text', DELIMITER ',', NULL '', EOL '0x0a' ,file_type 'pipe', auto_create_pipe 'false') WRITE ONLY;
INSERT INTO foreign_test_pipe_tw select * from test_pipe;
cd /***/gds_data/ cd /***/gds_data_1/
mkfifo postgres_public_foreign_test_pipe_tw.pipe
cat postgres_public_foreign_test_pipe_tw.pipe > postgres_public_foreign_test_pipe_tw.txt
cat /***/gds_data/postgres_public_foreign_test_pipe_tw.txt 3,2,11111111111111
cat /***/gds_data_1/postgres_public_foreign_test_pipe_tw.txt 1,2,11111111111111 2,2,11111111111111 4,2,11111111111111