For example, data exported in MySQL compatibility mode can be imported only in MySQL compatibility mode, and data imported in that mode can be exported only in that mode.
Generally, objects are managed as files. However, OBS has no file system concepts such as files or folders. To make data easier to manage, OBS lets users simulate folders by adding a slash (/) to the object name, for example, tpcds1000/stock.csv. In this name, tpcds1000 is treated as the folder name and stock.csv as the file name. The object key (object name) is still tpcds1000/stock.csv, and the content of the object is the content of the stock.csv file.
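The folder simulation described above can be illustrated with a minimal Python sketch. The helper name split_object_key is hypothetical and not part of any OBS SDK; it only shows that the "folder" is a prefix of the key, while the key itself stays unchanged.

```python
def split_object_key(key: str) -> tuple[str, str]:
    """Split an object key into a simulated folder and a file name.

    The text before the last slash is treated as the folder;
    a key without a slash has an empty folder.
    """
    folder, _, name = key.rpartition("/")
    return folder, name


key = "tpcds1000/stock.csv"
folder, name = split_object_key(key)
# The key stored in OBS is still the full string, slash included.
print(f"key={key!r} folder={folder!r} file={name!r}")
```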
The following describes how data is exported from a cluster to OBS, using a distributed hash table and a replication table as examples.
A distributed hash table stores data in hash mode. Figure 1 uses table T2 as an example to show how data is exported to OBS.
When table data is stored, the value of the hash column col2 in table T2 is hashed to generate a hash value. Each tuple is then distributed to the corresponding DN for storage according to the mapping between DNs and hash values.
When data is exported to OBS, DNs that store the exported data of T2 directly export their data files to OBS. Original data on multiple nodes will be exported in parallel.
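The hash distribution described above can be sketched in Python as follows. This is an illustration only: zlib.crc32 and the modulo mapping are stand-ins, because the text does not specify the hash function or the hash-to-DN mapping that GaussDB(DWS) actually uses, and the function names are hypothetical.

```python
import zlib
from collections import defaultdict


def assign_dn(hash_col_value: str, dn_count: int) -> int:
    """Map a hash-column value to a DN index.

    crc32 and the modulo are assumptions standing in for the real
    hash function and DN mapping, which the text does not describe.
    """
    return zlib.crc32(hash_col_value.encode("utf-8")) % dn_count


def distribute(rows, hash_col, dn_count):
    """Group rows by the DN that would store them."""
    per_dn = defaultdict(list)
    for row in rows:
        per_dn[assign_dn(str(row[hash_col]), dn_count)].append(row)
    return dict(per_dn)


rows = [{"col1": i, "col2": f"k{i % 5}"} for i in range(20)]
placed = distribute(rows, "col2", dn_count=3)
# On export, each DN writes only its own rows, so the DNs can work in parallel.
for dn, dn_rows in sorted(placed.items()):
    print(f"DN {dn} exports {len(dn_rows)} rows")
```

Rows with the same col2 value always land on the same DN, which is why each DN can export its own data files without coordinating with the others.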
A replication table stores a complete copy of the table data on each GaussDB(DWS) node. When exporting data to OBS, GaussDB(DWS) randomly selects one DN to perform the export.
Rules for naming the files exported from GaussDB(DWS) to OBS are as follows:
For example, the data of table t1 on datanode3 will be exported as t1_datanode3_segment.0, t1_datanode3_segment.1, and so on.
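As a small sketch, the naming pattern visible in this example can be written as a Python helper. The pattern table_node_segment.index is inferred only from the example names above, and segment_file_name is a hypothetical helper, not a GaussDB(DWS) function.

```python
def segment_file_name(table: str, node: str, index: int) -> str:
    """Build an exported file name in the form seen in the example.

    The pattern is inferred from names such as t1_datanode3_segment.0.
    """
    return f"{table}_{node}_segment.{index}"


for i in range(2):
    print(segment_file_name("t1", "datanode3", i))
```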
You are advised to export data from different clusters or databases to different OBS buckets or different paths of the same OBS bucket.
A segment already holds 100 tuples (1023 MB) when datanode3 exports data from t1 to OBS. If a 5 MB tuple is inserted into the segment, the data size becomes 1028 MB. In this case, file t1_datanode3_segment.0 (1023 MB) is generated and stored on OBS, and the new tuple is stored on OBS as file t1_datanode3_segment.1.
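The rollover behavior in this example can be sketched in Python. The 1024 MB limit is an assumption inferred from the numbers in the examples (1023 MB fits, 1028 MB does not), and pack_segments is a hypothetical helper that models only the sizes, not the actual export.

```python
SEGMENT_LIMIT_MB = 1024  # assumed limit, inferred from the example figures


def pack_segments(tuple_sizes_mb, limit_mb=SEGMENT_LIMIT_MB):
    """Return the size of each segment file after packing tuples in order.

    A tuple that would push the current segment past the limit
    closes that segment and starts a new one.
    """
    segments = []
    current = 0
    for size in tuple_sizes_mb:
        if current > 0 and current + size > limit_mb:
            segments.append(current)
            current = 0
        current += size
    if current > 0:
        segments.append(current)
    return segments


# The first 100 tuples are modeled as one 1023 MB entry, followed by the 5 MB tuple.
print(pack_segments([1023, 5]))
```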
For example, a cluster has DataNode1, DataNode2, DataNode3, DataNode4, DataNode5, and DataNode6, which store 1.5 GB, 0.7 GB, 0.6 GB, 0.8 GB, 0.4 GB, and 0.5 GB of data, respectively. Seven OBS segment files will be generated during data export: DataNode1 generates two segment files, which store 1 GB and 0.5 GB of data, and each of the other five DNs generates one.
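The file count in this example can be checked with a short Python calculation. It assumes a 1 GB limit per segment file, which is inferred from the 1 GB plus 0.5 GB split described for DataNode1; the names below are illustrative.

```python
import math

SEGMENT_LIMIT_GB = 1.0  # assumed limit, inferred from the DataNode1 split


def segment_count(data_gb: float, limit_gb: float = SEGMENT_LIMIT_GB) -> int:
    """Number of segment files one DN produces for the given data size."""
    return math.ceil(data_gb / limit_gb)


node_sizes_gb = {
    "DataNode1": 1.5,
    "DataNode2": 0.7,
    "DataNode3": 0.6,
    "DataNode4": 0.8,
    "DataNode5": 0.4,
    "DataNode6": 0.5,
}

total_files = sum(segment_count(size) for size in node_sizes_gb.values())
print(total_files)  # 2 files for DataNode1 plus 1 for each of the other five
```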
Procedure | Description | Subtask |
---|---|---|
Plan data export. | Create an OBS bucket and a folder in the OBS bucket as the directory for storing exported data files. For details, see Planning Data Export. | - |
Create an OBS foreign table. | Create a foreign table to help OBS specify information about data files to be exported. The foreign table stores information, such as the destination location, format, encoding, and data delimiter of a source data file. For details, see Creating an OBS Foreign Table. | - |
Export data. | After the foreign table is created, run the INSERT statement to efficiently export data to data files. For details, see Exporting Data. | - |