Parallel OBS Data Export

Overview

GaussDB(DWS) lets you export data in parallel through OBS foreign tables, which specify the export mode and the format of the exported data. Multiple DNs export data from GaussDB(DWS) to the OBS server in parallel, improving overall export performance.
  • The CN only plans data export tasks and delivers them to the DNs, leaving the CN free to process external requests.
  • The computing capability and bandwidth of all the DNs are fully leveraged to export data.
  • You can export data concurrently using multiple OBS services, but each export task must specify a different bucket and object path, and neither can be null.
  • The OBS server connects to the GaussDB(DWS) cluster nodes, so the export rate is affected by the network bandwidth between them.
  • The TEXT and CSV data file formats are supported. The data in a single row must be smaller than 1 GB.
  • The ORC format is supported only in version 8.1.0 or later.
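The path and format constraints listed above can be checked before any export task is submitted. The following Python sketch is a hypothetical pre-flight check, not part of GaussDB(DWS): `ExportTask`, `validate_tasks`, and `parse_version` are illustrative names, and the sketch assumes that "different paths" means no two tasks share the same bucket plus object path (ignoring leading and trailing slashes). The 1 GB row limit is not checked here.

```python
from dataclasses import dataclass
from typing import List, Tuple

# Formats from the overview; ORC additionally requires version 8.1.0 or later.
SUPPORTED_FORMATS = {"TEXT", "CSV", "ORC"}
ORC_MIN_VERSION = (8, 1, 0)


@dataclass
class ExportTask:
    """Hypothetical description of one export task (not a GaussDB(DWS) API)."""
    bucket: str
    object_path: str
    data_format: str


def parse_version(version: str) -> Tuple[int, ...]:
    """Turn '8.1' or '8.1.0' into (8, 1, 0) so versions compare correctly."""
    parts = [int(part) for part in version.split(".")]
    parts += [0] * (3 - len(parts))  # pad short versions such as '8.1'
    return tuple(parts)


def validate_tasks(tasks: List[ExportTask], cluster_version: str) -> List[str]:
    """Return the problems found; an empty list means no rule was violated."""
    problems = []
    seen = {}  # (bucket, normalized path) -> index of the first task using it
    version = parse_version(cluster_version)
    for i, task in enumerate(tasks):
        # Rule: bucket and object path must not be null or empty.
        if not task.bucket or not task.object_path:
            problems.append(f"task {i}: bucket and object path must not be empty")
            continue
        # Rule: concurrent tasks must not share a bucket and object path.
        location = (task.bucket, task.object_path.strip("/"))
        if location in seen:
            problems.append(f"task {i}: same location as task {seen[location]}")
        else:
            seen[location] = i
        # Rule: TEXT and CSV are supported; ORC only on 8.1.0 or later.
        fmt = task.data_format.upper()
        if fmt not in SUPPORTED_FORMATS:
            problems.append(f"task {i}: unsupported format {task.data_format}")
        elif fmt == "ORC" and version < ORC_MIN_VERSION:
            problems.append(f"task {i}: ORC requires cluster version 8.1.0 or later")
    return problems


if __name__ == "__main__":
    tasks = [
        ExportTask("mybucket", "export/t1/", "CSV"),   # hypothetical bucket/paths
        ExportTask("mybucket", "export/t1", "TEXT"),   # clashes with the first task
        ExportTask("mybucket", "export/t2/", "ORC"),
    ]
    for problem in validate_tasks(tasks, "8.0.0"):
        print(problem)
```

Returning a list of problems rather than raising on the first one lets every misconfigured task be reported in a single pass.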

Related Concepts

Principles

The following describes the principles of exporting data from a cluster to OBS by using a distributed hash table or a replication table.

Naming Rules of Exported Files

Rules for naming the files exported from GaussDB(DWS) to OBS are as follows:

Data Export Process

Figure 2 Concurrent data export
Table 1 Process description

Procedure: Plan data export.
Description: Create an OBS bucket and a folder in the bucket to serve as the directory for storing the exported data files. For details, see Planning Data Export.
Subtask: -

Procedure: Create an OBS foreign table.
Description: Create a foreign table that specifies the data files to be exported to OBS. The foreign table stores information such as the destination location, format, encoding, and delimiter of the exported data files. For details, see Creating an OBS Foreign Table.
Subtask: -

Procedure: Export data.
Description: After the foreign table is created, run an INSERT statement to export data to data files on OBS. For details, see Exporting Data.
Subtask: -
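The foreign-table and export steps can be sketched as a Python helper that assembles the two SQL statements. This is a minimal sketch under stated assumptions: the `gsmpp_server` server name, the `location`, `format`, `delimiter`, `encoding`, `ACCESS_KEY`, and `SECRET_ACCESS_KEY` option names, and the `WRITE ONLY` clause are modeled on the TEXT/CSV export examples in the GaussDB(DWS) documentation and should be verified against Creating an OBS Foreign Table for your cluster version. ORC export is not covered. `build_export_sql`, `quote_literal`, and all table, column, and bucket names are hypothetical. Identifiers are inserted verbatim, so they must come from trusted input.

```python
from typing import Dict, List, Tuple


def quote_literal(value: str) -> str:
    """Quote a value as an SQL string literal, doubling embedded single quotes."""
    return "'" + value.replace("'", "''") + "'"


def build_export_sql(
    foreign_table: str,
    source_table: str,
    columns: List[Tuple[str, str]],
    options: Dict[str, str],
) -> Tuple[str, str]:
    """Return (CREATE FOREIGN TABLE statement, INSERT statement).

    ASSUMPTION: server name, option names and WRITE ONLY follow the
    TEXT/CSV export examples; verify them for your cluster version.
    """
    column_defs = ",\n".join(f"    {name} {sql_type}" for name, sql_type in columns)
    option_defs = ",\n".join(
        f"    {key} {quote_literal(value)}" for key, value in options.items()
    )
    ddl = (
        f"CREATE FOREIGN TABLE {foreign_table}\n(\n{column_defs}\n)\n"
        f"SERVER gsmpp_server\nOPTIONS (\n{option_defs}\n)\nWRITE ONLY;"
    )
    # The export itself: INSERT ... SELECT writes the source rows
    # through the write-only foreign table to the OBS location.
    dml = f"INSERT INTO {foreign_table} SELECT * FROM {source_table};"
    return ddl, dml


if __name__ == "__main__":
    ddl, dml = build_export_sql(
        foreign_table="product_info_output_ext",  # hypothetical name
        source_table="product_info_output",       # hypothetical name
        columns=[("c_bigint", "bigint"), ("c_char", "char(30)")],
        options={
            "location": "obs://mybucket/output_data/",  # hypothetical bucket/folder
            "format": "CSV",
            "delimiter": ",",
            "encoding": "utf8",
            "ACCESS_KEY": "access_key_value_to_be_replaced",
            "SECRET_ACCESS_KEY": "secret_access_key_value_to_be_replaced",
        },
    )
    print(ddl)
    print(dml)
```

The DDL is run once to define where and how the files are written; the INSERT statement can then be rerun for each export. Credentials are shown as placeholders only and should come from a secure source rather than being hard-coded.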