Developing a Hive SQL Job

This section introduces how to develop Hive SQL scripts on DataArts Factory.

Scenario Description

As a one-stop big data development platform, DataArts Factory supports the development of jobs for multiple big data tools. Hive is a data warehouse tool that runs on Hadoop. It maps structured data files to database tables and provides an SQL-like query capability that converts SQL statements into MapReduce tasks.
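To illustrate how Hive maps files to tables, the sketch below defines a table over a directory of delimited files and queries it; the table, column, and path names are illustrative placeholders, not part of this example's environment:

```sql
-- Hypothetical example: map a directory of CSV files in HDFS to a Hive table.
CREATE EXTERNAL TABLE IF NOT EXISTS sales_raw (
  order_id   BIGINT,
  product    STRING,
  amount     DECIMAL(10,2),
  order_date STRING
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
STORED AS TEXTFILE
LOCATION '/data/sales_raw';

-- Queries against the table are compiled into MapReduce tasks.
SELECT product, SUM(amount) AS total_amount
FROM sales_raw
GROUP BY product;
```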

Preparations

Creating a Hive Data Connection

Before developing a Hive SQL script, you must create a data connection to MRS Hive on the Manage Data Connections page of Management Center. In this example, the data connection is named hive1009.

Description of key parameters:

Developing a Hive SQL Script

Choose Development > Develop Script and create a Hive SQL script named hive_sql. Then enter SQL statements in the editor that meet your business requirements.

Figure 1 Developing a script
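For illustration, the hive_sql script might contain a daily aggregation such as the following sketch; the database, tables, and the ${yesterday} parameter are hypothetical placeholders, not values defined elsewhere in this example:

```sql
-- Hypothetical content of the hive_sql script: aggregate yesterday's orders
-- into a partitioned summary table.
USE demo_db;

INSERT OVERWRITE TABLE sales_daily PARTITION (dt = '${yesterday}')
SELECT product,
       SUM(amount) AS total_amount,
       COUNT(*)    AS order_count
FROM sales_raw
WHERE order_date = '${yesterday}'
GROUP BY product;
```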

Notes:

Developing a Hive SQL Job

After the Hive SQL script is developed, build a periodically scheduled job for the script so that it runs automatically at regular intervals.

  1. Create an empty DataArts Factory job named job_hive_sql.

    Figure 2 Creating a job named job_hive_sql

  2. Go to the job development page, drag the MRS Hive SQL node to the canvas, and click the node to configure node properties.

    Figure 3 Configuring properties for an MRS Hive SQL node

    Description of key properties:

    • SQL Script: The hive_sql script developed in Developing a Hive SQL Script.
    • Data Connection: Defaults to the data connection configured in the hive_sql script. The value can be changed.
    • Database: Defaults to the database configured in the hive_sql script. The value can be changed.
    • Node Name: Defaults to the name of the hive_sql script. The value can be changed.

  3. After configuring the job, click the test button to run a test execution.
  4. If the job runs successfully, click a blank area on the canvas and configure the job scheduling policy in the scheduling configuration pane on the right.

    Figure 4 Configuring the scheduling mode

    Note:

    In this example, the job is scheduled to run at 02:00 every day from January 1 to January 25, 2021.

  5. Click Submit and Execute. The job will then run automatically every day as scheduled.