GaussDB(DWS) uses GDS to allocate the source data for parallel data import. Deploy GDS on the data server.
If a large volume of data is stored on multiple data servers, install, configure, and start GDS on each server. Then, data on all the servers can be imported in parallel. The procedure for installing, configuring, and starting GDS is the same on each data server. This section describes how to perform this procedure on one data server.
Use the latest version of GDS. After the database is upgraded, download the latest version of GaussDB(DWS) GDS as described in the procedure below. When an import or export starts, GaussDB(DWS) checks whether the GDS version matches the database version. If the versions do not match, an error message is displayed and the import or export is terminated.
To obtain the version number of GDS, run the following command in the GDS decompression directory:
gds -V
To view the database version, run the following SQL statement after connecting to the database:
SELECT version();
As user root, create a directory on the data server for storing the GDS package:
mkdir -p /opt/bin/dws
Upload the GDS package to the directory created in the previous step. This section uses the SUSE Linux package dws_client_8.1.x_suse_x64.zip as an example.
Go to the directory and decompress the package:
cd /opt/bin/dws
unzip dws_client_8.1.x_suse_x64.zip
Create the user that will run GDS and the user group it belongs to:
groupadd gdsgrp
useradd -g gdsgrp gds_user
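Change the owner of the GDS directory and of the source data directory (/input_data in this example) to the GDS user:
chown -R gds_user:gdsgrp /opt/bin/dws/gds
chown -R gds_user:gdsgrp /input_data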
Switch to user gds_user:
su - gds_user
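The preparation steps above can be collected into a dry-run helper that prints the commands for review before you paste them into a root shell. This is a minimal sketch: install_commands is a hypothetical helper, its defaults are simply the example values used in this section, and it executes nothing.

```python
import shlex
from typing import List


def install_commands(package: str,
                     install_dir: str = "/opt/bin/dws",
                     data_dir: str = "/input_data",
                     user: str = "gds_user",
                     group: str = "gdsgrp") -> List[str]:
    """Return the root-shell commands from the steps above, in order.

    The commands are meant to be run in one interactive shell session
    (the cd must persist for unzip). Nothing is executed here.
    """
    q = shlex.quote  # quote every value so paths with spaces stay intact
    return [
        f"mkdir -p {q(install_dir)}",
        # The GDS package must already be uploaded to install_dir at this point.
        f"cd {q(install_dir)}",
        f"unzip {q(package)}",
        f"groupadd {q(group)}",
        f"useradd -g {q(group)} {q(user)}",
        f"chown -R {q(user)}:{q(group)} {q(install_dir + '/gds')}",
        f"chown -R {q(user)}:{q(group)} {q(data_dir)}",
        f"su - {q(user)}",
    ]


if __name__ == "__main__":
    for cmd in install_commands("dws_client_8.1.x_suse_x64.zip"):
        print(cmd)
```

Printing instead of executing keeps the privileged steps visible and reviewable, which matters because groupadd, useradd, and chown change system state.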
If the cluster version is 8.0.x or earlier, skip the next step (sourcing gds_env) and go directly to starting GDS.
If the cluster version is 8.1.x, go to the next step.
Go to the GDS bin directory and load the environment script that GDS depends on (8.1.x only):
cd /opt/bin/dws/gds/bin
source gds_env
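The version rule above (skip gds_env for 8.0.x or earlier, source it for 8.1.x) can be written as a small decision function, for example inside a deployment script. This is a sketch; needs_gds_env is a hypothetical name, and because the text covers only 8.0.x-or-earlier and 8.1.x, any other version is rejected rather than guessed.

```python
def needs_gds_env(cluster_version: str) -> bool:
    """Return True if `source gds_env` must run before GDS is started.

    Encodes only the rule stated in this section:
      8.0.x or earlier -> skip gds_env
      8.1.x            -> source gds_env
    Versions outside that rule raise ValueError instead of being guessed.
    """
    parts = cluster_version.split(".")
    try:
        major, minor = int(parts[0]), int(parts[1])
    except (IndexError, ValueError):
        raise ValueError(f"unrecognized version string: {cluster_version!r}") from None
    if (major, minor) == (8, 1):
        return True
    if (major, minor) <= (8, 0):
        return False
    raise ValueError(f"version {cluster_version} is not covered by this rule")


if __name__ == "__main__":
    for version in ("8.0.2", "8.1.3"):
        print(version, "source gds_env" if needs_gds_env(version) else "skip gds_env")
```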
GDS is portable ("green") software: it needs no installation and can be started as soon as it is decompressed. There are two ways to start GDS: run the gds command with startup parameters, or write the startup parameters into the gds.conf configuration file and run gds_ctl.py to start GDS.
To start GDS with the gds command without SSL encryption, use the following syntax:
gds -d dir -p ip:port -H address_string -l log_file -D -t worker_num
Example:
/opt/bin/dws/gds/bin/gds -d /input_data/ -p 192.168.0.90:5000 -H 10.10.0.1/24 -l /opt/bin/dws/gds/gds_log.txt -D -t 2
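To start GDS with SSL encryption, also specify --enable-ssl and --ssl-dir, where Cert_file is the path of the SSL certificates (a directory, /opt/bin/ in the example below):
gds -d dir -p ip:port -H address_string -l log_file -D -t worker_num --enable-ssl --ssl-dir Cert_file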
Example:
/opt/bin/dws/gds/bin/gds -d /input_data/ -p 192.168.0.90:5000 -H 10.10.0.1/24 -l /opt/bin/dws/gds/gds_log.txt -D --enable-ssl --ssl-dir /opt/bin/
Replace the placeholders (dir, ip:port, address_string, log_file, worker_num, and Cert_file) with actual values.
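When GDS is started from a script, it can help to assemble and sanity-check the command line first. The sketch below uses only the options shown above (-d, -p, -H, -l, -D, -t, --enable-ssl, --ssl-dir). build_gds_argv is hypothetical; applying the 1024 to 65535 port range from the gds.conf port attribute to -p, and requiring at least one worker, are assumptions.

```python
import ipaddress
import shlex
from typing import List, Optional


def build_gds_argv(data_dir: str, listen: str, allowed_hosts: str,
                   log_file: str, workers: int = 1,
                   ssl_dir: Optional[str] = None,
                   gds_path: str = "/opt/bin/dws/gds/bin/gds") -> List[str]:
    """Assemble a gds command line using only the options shown above.

    listen must be in ip:port form. These checks are a convenience;
    gds itself still validates its parameters at startup.
    """
    ip, _, port = listen.rpartition(":")
    ipaddress.ip_address(ip)  # raises ValueError for an invalid IP address
    if not port.isdecimal() or not 1024 <= int(port) <= 65535:
        raise ValueError(f"port {port!r} is outside 1024-65535")
    if "/" not in allowed_hosts:
        raise ValueError("-H expects CIDR notation, for example 10.10.0.1/24")
    ipaddress.ip_network(allowed_hosts, strict=False)  # host bits may be set
    if not 1 <= workers <= 32:  # -t upper limit is 32
        raise ValueError("worker_num must be between 1 and 32")
    argv = [gds_path, "-d", data_dir, "-p", listen, "-H", allowed_hosts,
            "-l", log_file, "-D", "-t", str(workers)]
    if ssl_dir is not None:
        argv += ["--enable-ssl", "--ssl-dir", ssl_dir]
    return argv


if __name__ == "__main__":
    argv = build_gds_argv("/input_data/", "192.168.0.90:5000", "10.10.0.1/24",
                          "/opt/bin/dws/gds/gds_log.txt", workers=2)
    print(shlex.join(argv))  # pass argv to subprocess.run() to start GDS
```

Run as a script, this prints the same command as the non-SSL example above. Returning an argument list rather than a single string lets subprocess.run() start GDS without going through a shell.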
The number of threads GDS actually uses depends on the number of concurrent import transactions. Configuring multiple threads (-t) before starting GDS does not accelerate the import of a single transaction. By default, each INSERT statement is one import transaction.
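A simplified way to reason about the note above is that busy worker threads are bounded by both the configured thread count and the number of concurrent import transactions. The function below is a mental model built on that assumption, not a description of GDS's internal scheduler; effective_workers is a hypothetical name.

```python
def effective_workers(configured: int, concurrent_transactions: int) -> int:
    """Upper bound on GDS worker threads that can be busy at the same time.

    Simplified model (an assumption, not GDS's actual scheduler): one import
    transaction is handled by at most one worker thread, so workers beyond
    the number of concurrent transactions have nothing to do.
    """
    if configured < 1 or concurrent_transactions < 0:
        raise ValueError("configured must be >= 1 and transactions >= 0")
    return min(configured, concurrent_transactions)


if __name__ == "__main__":
    # A single INSERT is one transaction, so -t 8 still gives one busy worker.
    print(effective_workers(8, 1))
    # Four sessions importing at the same time can keep four workers busy.
    print(effective_workers(8, 4))
```

Under this model, raising -t only helps when several import transactions run concurrently.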
To start GDS with gds_ctl.py instead, first edit the gds.conf configuration file:
vim /opt/bin/dws/gds/config/gds.conf
Example:
The following gds.conf configuration file defines a GDS instance named gds1:
<?xml version="1.0"?>
<config>
    <gds name="gds1" ip="192.168.0.90" port="5000" data_dir="/input_data/" err_dir="/err" data_seg="100MB" err_seg="100MB" log_file="/log/gds_log.txt" host="10.10.0.1/24" daemon="true" recursive="true" parallel="32"></gds>
</config>
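If gds.conf is produced by automation rather than edited by hand, Python's standard xml.etree.ElementTree module can render it. This is a minimal sketch: make_gds_conf is hypothetical, it emits a single gds element like the example above, and all values are passed through as strings.

```python
import xml.etree.ElementTree as ET
from typing import Dict


def make_gds_conf(attrs: Dict[str, str]) -> str:
    """Render a gds.conf document containing a single <gds> element.

    All values are strings, so booleans are written as "true"/"false",
    exactly as in the example above.
    """
    root = ET.Element("config")
    ET.SubElement(root, "gds", attrs)
    ET.indent(root)  # pretty-print; ET.indent needs Python 3.9 or later
    return '<?xml version="1.0"?>\n' + ET.tostring(root, encoding="unicode") + "\n"


if __name__ == "__main__":
    print(make_gds_conf({
        "name": "gds1", "ip": "192.168.0.90", "port": "5000",
        "data_dir": "/input_data/", "err_dir": "/err",
        "data_seg": "100MB", "err_seg": "100MB",
        "log_file": "/log/gds_log.txt", "host": "10.10.0.1/24",
        "daemon": "true", "recursive": "true", "parallel": "32",
    }))
```

Write the output to /opt/bin/dws/gds/config/gds.conf. Generating the file with an XML library instead of string concatenation guarantees correct quoting of attribute values.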
The attributes used in gds.conf are described in the attribute table at the end of this section.
Then run the following command in the GDS bin directory to start GDS:
python3 gds_ctl.py start
Example:
cd /opt/bin/dws/gds/bin
python3 gds_ctl.py start
Start GDS gds1 [OK]
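To check the result of gds_ctl.py from a script, you can parse its status lines. This sketch assumes every status line has the form of the example output above, "Start GDS <name> [OK]", and that it is written to stdout; the exact wording of failure lines is not documented here, so any bracketed token other than OK is treated as a failure. parse_start_output is a hypothetical helper.

```python
import re
import subprocess
from typing import Dict

# Assumption: status lines look like "Start GDS <name> [<status>]",
# as in the example output above; only "[OK]" counts as success.
STATUS_LINE = re.compile(r"^Start GDS (\S+) \[([^\]]+)\]$")


def parse_start_output(output: str) -> Dict[str, bool]:
    """Map each GDS name reported by gds_ctl.py to True if it shows [OK]."""
    result = {}
    for line in output.splitlines():
        match = STATUS_LINE.match(line.strip())
        if match:
            result[match.group(1)] = match.group(2) == "OK"
    return result


if __name__ == "__main__":
    proc = subprocess.run(["python3", "gds_ctl.py", "start"],
                          cwd="/opt/bin/dws/gds/bin",
                          capture_output=True, text=True)
    statuses = parse_start_output(proc.stdout)
    failed = [name for name, ok in statuses.items() if not ok]
    print("statuses:", statuses, "failed:", failed)
```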
The gds command supports the following options:
gds [options]:
-d dir Set data directory.
-p port Set GDS listening port.
ip:port Set GDS listening ip address and port.
-l log_file Set log file.
-H secure_ip_range
Set secure IP checklist in CIDR notation. Required for GDS to start.
-e dir Set error log directory.
-E size Set size of per error log segment.(0 < size < 1TB)
-S size Set size of data segment.(1MB < size < 100TB)
-t worker_num Set number of worker thread in multi-thread mode, the upper limit is 32. If without setting, the default value is 1.
-s status_file Enable GDS status report.
-D Run the GDS as a daemon process.
-r Read the working directory recursively.
-h Display usage.
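The -E and -S options take sizes with strict bounds (0 < size < 1TB and 1MB < size < 100TB). The sketch below checks a size string against those bounds before it is passed to gds. It assumes sizes are written as an integer plus a unit suffix such as 100MB, as in the data_seg and err_seg attributes of the gds.conf example, and it assumes binary multiples (1 MB = 1024 x 1024 bytes); the exact suffixes gds accepts are not listed here. All function names are hypothetical.

```python
import re

# Assumptions: integer + unit suffix (e.g. "100MB"), binary multiples.
UNITS = {"KB": 1 << 10, "MB": 1 << 20, "GB": 1 << 30, "TB": 1 << 40}
SIZE_PATTERN = re.compile(r"^(\d+)\s*(KB|MB|GB|TB)$", re.IGNORECASE)


def parse_size(text: str) -> int:
    """Convert a size string such as "100MB" to a number of bytes."""
    match = SIZE_PATTERN.match(text.strip())
    if not match:
        raise ValueError(f"unrecognized size: {text!r}")
    return int(match.group(1)) * UNITS[match.group(2).upper()]


def check_error_segment(text: str) -> int:
    """-E: 0 < size < 1TB (strict bounds, as in the usage text)."""
    size = parse_size(text)
    if not 0 < size < UNITS["TB"]:
        raise ValueError(f"-E {text} must be greater than 0 and less than 1TB")
    return size


def check_data_segment(text: str) -> int:
    """-S: 1MB < size < 100TB (strict bounds, as in the usage text)."""
    size = parse_size(text)
    if not UNITS["MB"] < size < 100 * UNITS["TB"]:
        raise ValueError(f"-S {text} must be greater than 1MB and less than 100TB")
    return size


if __name__ == "__main__":
    print(check_data_segment("100MB"), check_error_segment("100MB"))
```

Because both bounds are strict, the boundary values themselves (1MB for -S, 1TB for -E) are rejected.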
Attribute | Description | Value Range |
---|---|---|
name | Identifier | - |
ip | Listening IP address | The IP address must be valid. Default value: 127.0.0.1 |
port | Listening port | Value range: 1024 to 65535 (integer). Default value: 8098 |
data_dir | Data file directory | - |
err_dir | Error log file directory | Default value: data file directory |
log_file | Log file path | - |
host | Host IP addresses allowed to connect to GDS. The value must be in CIDR format. This parameter is available only on Linux. | - |
recursive | Whether data file directories are read recursively | Value range: true or false. Default value: false |
daemon | Whether GDS runs in daemon mode | Value range: true or false. Default value: false |
parallel | Number of concurrent data import threads | Value range: 0 to 32 (integer). Default value: 1 |
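Before running python3 gds_ctl.py start, a quick check of gds.conf against the value ranges in the table above can catch typos early. This is a sketch: validate_gds_conf is hypothetical, it checks only the ranges listed in the table, and it skips absent attributes because the table gives defaults for them. Requiring a "/" in host is how this sketch enforces CIDR format, since Python's ipaddress module would otherwise accept a bare address.

```python
import ipaddress
import xml.etree.ElementTree as ET
from typing import List

BOOLEAN_VALUES = {"true", "false"}


def validate_gds_conf(xml_text: str) -> List[str]:
    """Check each <gds> element against the value ranges in the table above.

    Returns a list of problems (empty if none). Absent attributes are
    skipped because the table gives defaults for them.
    """
    problems = []
    for gds in ET.fromstring(xml_text).iter("gds"):
        name = gds.get("name", "<unnamed>")
        attrs = gds.attrib
        if "ip" in attrs:
            try:
                ipaddress.ip_address(attrs["ip"])
            except ValueError:
                problems.append(f"{name}: ip {attrs['ip']!r} is not a valid IP address")
        port = attrs.get("port")
        if port is not None and not (port.isdecimal() and 1024 <= int(port) <= 65535):
            problems.append(f"{name}: port must be an integer from 1024 to 65535")
        parallel = attrs.get("parallel")
        if parallel is not None and not (parallel.isdecimal() and int(parallel) <= 32):
            problems.append(f"{name}: parallel must be an integer from 0 to 32")
        for flag in ("recursive", "daemon"):
            if flag in attrs and attrs[flag] not in BOOLEAN_VALUES:
                problems.append(f"{name}: {flag} must be true or false")
        host = attrs.get("host")
        if host is not None:
            try:
                if "/" not in host:  # insist on an explicit prefix length
                    raise ValueError(host)
                ipaddress.ip_network(host, strict=False)
            except ValueError:
                problems.append(f"{name}: host must be in CIDR notation")
    return problems


if __name__ == "__main__":
    with open("/opt/bin/dws/gds/config/gds.conf", encoding="utf-8") as conf:
        for problem in validate_gds_conf(conf.read()):
            print(problem)
```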