Reviewed-by: Hasko, Vladimir <vladimir.hasko@t-systems.com> Co-authored-by: Lu, Huayi <luhuayi@huawei.com> Co-committed-by: Lu, Huayi <luhuayi@huawei.com>
Exporting Data In Parallel Using GDS
In high-concurrency scenarios, you can use GDS to export data from a database to a common file system.
The current GDS version can also export data from a database to a pipe file. This helps in the following cases:
- If the local disk space of the GDS user is insufficient:
  - Data exported by GDS can be compressed through the pipe so that it occupies less disk space.
  - Exported data can be transferred through the pipe to an HDFS server for storage.
- If you need to cleanse data before exporting it:
  - You can write programs as needed that read streaming data from the pipe in real time.
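The two pipe use cases above (compressing exported data and cleansing it in flight) can be sketched with a small consumer program. This is a minimal illustration, not part of GDS: the function names, the pipe path, and the cleansing rule (drop blank rows, strip trailing whitespace) are all hypothetical, and it assumes GDS writes newline-terminated rows to a named pipe that the program can open for reading.

```python
import gzip
from typing import Iterable, Iterator


def cleanse(lines: Iterable[bytes]) -> Iterator[bytes]:
    """Hypothetical cleansing rule: drop blank rows, strip trailing whitespace."""
    for line in lines:
        stripped = line.rstrip()
        if stripped:
            yield stripped + b"\n"


def drain_pipe_to_gzip(pipe_path: str, out_path: str) -> int:
    """Read rows from pipe_path as they arrive and store them gzip-compressed.

    Returns the number of rows written to out_path.
    """
    rows = 0
    # Opening a named pipe for reading blocks until a writer opens it
    # (assumed here to be the GDS export process).
    with open(pipe_path, "rb") as src, gzip.open(out_path, "wb") as dst:
        for row in cleanse(src):
            dst.write(row)
            rows += 1
    return rows
```

A call such as `drain_pipe_to_gzip("/path/to/export.pipe", "/path/to/export.csv.gz")` (both paths are placeholders) would be started before the export so that rows are compressed as they stream through, without an uncompressed copy ever landing on the local disk.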
Overview
- The coordinator node (CN) only plans data export tasks and delivers them to the data nodes (DNs). This frees the CN to process other tasks.
- As a result, the computing capabilities and bandwidth of all DNs are fully used to export data.
Related Concepts
- Data file: A TEXT, CSV, or FIXED file that stores data exported from the GaussDB(DWS) database.
- Foreign table: A table that stores information about a data file, such as its location, format, and encoding.
- GDS: A data service tool. To export data, deploy it on the server where data files are stored.
- Table: Tables in the database, including row-store tables and column-store tables. Data in the data files is exported from these tables.
- Remote mode: Service data in a cluster is exported to hosts outside the cluster.
Exporting a Schema
Data can be exported from GaussDB(DWS) in Remote mode.
- Remote mode: Service data in a cluster is exported to hosts outside the cluster.
  - In this mode, multiple GDS instances export data concurrently. One GDS instance can export data for only one cluster at a time.
  - The data export rate of a GDS instance that resides on the same intranet as the cluster nodes is limited by the network bandwidth. A 10GE configuration is recommended.
  - Data files in TEXT or CSV format are supported. The size of data in a single row must be less than 1 GB.
Data Export Process
| Process | Description | Subtask |
|---|---|---|
| Plan data export. | Prepare data to be exported and plan the export path for the mode to be selected. For details, see Planning Data Export. | - |
| Start GDS. | If the Remote mode is selected, install, configure, and start GDS on data servers. For details, see Installing, Configuring, and Starting GDS. | - |
| Create a foreign table. | Create a foreign table to help GDS specify information about a data file. The foreign table stores information such as the location, format, encoding, and inter-data delimiter of a data file. For details, see Creating a GDS Foreign Table. | - |
| Export data. | After the foreign table is created, run the INSERT statement to efficiently export data to data files. For details, see Exporting Data. | - |
| Stop GDS. | Stop GDS after data is exported. For details, see Stopping GDS. | - |
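
The "Create a foreign table" and "Export data" steps can be sketched as a pair of SQL statements assembled by a small helper. This is only an illustration under stated assumptions: the helper names, table names, host, and port are hypothetical, and the `gsmpp_server` server name, the `gsfs://` location scheme, and the `WRITE ONLY` clause are recalled from typical GDS export examples rather than taken from this page, so verify them against Creating a GDS Foreign Table before use.

```python
from typing import Sequence, Tuple


def foreign_table_ddl(
    name: str,
    columns: Sequence[Tuple[str, str]],
    gds_host: str,
    gds_port: int,
    fmt: str = "CSV",
    encoding: str = "utf8",
    delimiter: str = ",",
) -> str:
    """Build a CREATE FOREIGN TABLE statement for a GDS export (assumed syntax).

    Identifiers and option values are inserted as-is, so pass trusted input only.
    """
    cols = ",\n    ".join(f"{col} {typ}" for col, typ in columns)
    return (
        f"CREATE FOREIGN TABLE {name} (\n    {cols}\n)\n"
        "SERVER gsmpp_server\n"  # assumption: server name used for GDS foreign tables
        "OPTIONS (\n"
        f"    LOCATION 'gsfs://{gds_host}:{gds_port}/',\n"  # where GDS listens
        f"    FORMAT '{fmt}',\n"
        f"    ENCODING '{encoding}',\n"
        f"    DELIMITER '{delimiter}'\n"
        ") WRITE ONLY;"  # assumption: marks the foreign table as an export target
    )


def export_statement(foreign_table: str, source_table: str) -> str:
    """Build the INSERT statement that exports source_table through the foreign table."""
    return f"INSERT INTO {foreign_table} SELECT * FROM {source_table};"


if __name__ == "__main__":
    # Hypothetical table, host, and port, for illustration only.
    print(foreign_table_ddl("foreign_orders", [("id", "integer"), ("note", "text")],
                            "192.0.2.10", 5000))
    print(export_statement("foreign_orders", "orders"))
```

The foreign table carries the location, format, encoding, and delimiter described in the table above, while the INSERT statement is what actually triggers the parallel export through GDS.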