Uploading Data to OBS

Scenarios

Before importing data from OBS to a cluster, prepare the source data files and upload them to OBS. If the data files are already stored on OBS, you only need to complete steps 2 and 3 in Uploading Data to OBS.

Preparing Data Files

Prepare source data files to be uploaded to OBS. GaussDB(DWS) supports only source data files in CSV, TXT, ORC, or CarbonData format.

If user data cannot be saved in CSV format, store the data as any text file.

As described in How Data Is Imported, when a source data file contains a large volume of data, split it evenly into multiple files before storing it on OBS. Import performance is better when the number of files is an integer multiple of the number of DNs.
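The even split described above can be sketched in Python as follows. This is a minimal illustration, not a GaussDB(DWS) or OBS tool: the function name split_evenly and the files_per_dn parameter are hypothetical. It assumes a UTF-8 text file with no header row and no line breaks inside quoted fields, small enough to read into memory.

```python
from __future__ import annotations

from pathlib import Path


def split_evenly(src: Path, out_dir: Path, dn_count: int, files_per_dn: int = 1) -> list[Path]:
    """Split src by lines into dn_count * files_per_dn files of near-equal size.

    Output files are named <stem>.0, <stem>.1, ... (hypothetical naming scheme).
    """
    if dn_count < 1 or files_per_dn < 1:
        raise ValueError("dn_count and files_per_dn must be positive integers")

    # newline="" keeps the original line endings untouched.
    with open(src, "r", encoding="utf-8", newline="") as f:
        lines = f.readlines()

    # The file count is an integer multiple of the DN count by construction.
    n_files = dn_count * files_per_dn
    base, extra = divmod(len(lines), n_files)

    out_dir.mkdir(parents=True, exist_ok=True)
    parts: list[Path] = []
    start = 0
    for i in range(n_files):
        # The first `extra` files receive one additional line each.
        size = base + (1 if i < extra else 0)
        part = out_dir / f"{src.stem}.{i}"
        with open(part, "w", encoding="utf-8", newline="") as f:
            f.writelines(lines[start:start + size])
        parts.append(part)
        start += size
    return parts
```

For a cluster with three DNs, calling split_evenly(Path("product_info.csv"), Path("out"), dn_count=3) would write out/product_info.0, out/product_info.1, and out/product_info.2. The DN count itself must be taken from your cluster specifications.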

Assume that you have stored the following three CSV files in OBS:

Uploading Data to OBS

  1. Upload data to OBS.

    Store the source data files to be imported in the OBS bucket in advance.

    1. Log in to the OBS management console.

      Click Service List and choose Object Storage Service to open the OBS management console.

    2. Create a bucket.

      For details about how to create an OBS bucket, see "OBS Console Operation Guide > Managing Buckets > Creating a Bucket" in the Object Storage Service User Guide.

      For example, create two buckets named mybucket and mybucket02.

    3. Create a folder.

      For details, see "OBS Console Operation Guide > Managing Objects > Creating a Folder" in the Object Storage Service User Guide.

      For example:

      • Create a folder named input_data in the mybucket OBS bucket.
      • Create a folder named input_data in the mybucket02 OBS bucket.
    4. Upload the files.

      For details, see "OBS Console Operation Guide > Managing Objects > Uploading a File" in the Object Storage Service User Guide.

      For example:

      • Upload the following data files to the input_data folder in the mybucket OBS bucket:
        product_info.0
        product_info.1
        
      • Upload the following data file to the input_data folder in the mybucket02 OBS bucket:
        product_info.2
        

  2. Obtain the OBS path for storing source data files.

    After the source data files are uploaded to an OBS bucket, a globally unique access path is generated. The OBS path of the source data files is the value of the location parameter used for creating a foreign table.

    The OBS path in the location parameter uses the following format: obs://bucket_name/file_path/

    For example, the OBS paths are as follows:

    obs://mybucket/input_data/product_info.0
    obs://mybucket/input_data/product_info.1
    obs://mybucket02/input_data/product_info.2
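
The path format above can be illustrated with a short Python sketch that assembles the three example paths from the bucket and folder layout used in step 1. The helper obs_path and the layout dictionary are hypothetical names used only for illustration; the sketch builds strings and does not call OBS.

```python
def obs_path(bucket_name: str, file_path: str) -> str:
    """Build an OBS path in the form obs://bucket_name/file_path."""
    if not bucket_name or "/" in bucket_name:
        raise ValueError("bucket_name must be non-empty and must not contain '/'")
    # Drop a leading slash so the result never contains 'bucket//path'.
    return f"obs://{bucket_name}/{file_path.lstrip('/')}"


# Bucket -> object paths, mirroring the example upload in step 1.
layout = {
    "mybucket": ["input_data/product_info.0", "input_data/product_info.1"],
    "mybucket02": ["input_data/product_info.2"],
}

locations = [obs_path(bucket, path) for bucket, paths in layout.items() for path in paths]

for location in locations:
    print(location)
```

Each resulting string is a value that could be supplied in the location parameter when the foreign table is created.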
    

  3. Grant read permission on the OBS buckets to the user who will import data.

    To import data from OBS to a cluster, the user must have read permission on the OBS buckets where the source data files are located. You can configure the ACL of each OBS bucket to grant read permission to a specific user.

    For details, see "OBS Console Operation Guide > Permission Control > Configuring a Bucket ACL" in the Object Storage Service User Guide.