Overview

DataArts Factory is a one-stop collaborative development platform for big data that provides fully managed scheduling. It manages multiple big data services, which makes big data easier to work with and helps you build big data processing centers.

DataArts Factory was formerly named Data Lake Factory (DLF). In this document, both Data Lake Factory and DLF refer to DataArts Factory.

Introduction to DataArts Factory

DataArts Factory enables a variety of operations such as data management, script development, job development, job scheduling, and monitoring, facilitating data analysis and processing.

Figure 1 DataArts Factory architecture

Main Functions

Table 1 Main functions of DataArts Factory

Function

Description

Data management

  • Manages multiple data warehouses, such as GaussDB(DWS), DLI, and MRS Hive.
  • Manages data tables using the GUI or data definition language (DDL).

Script development

  • Provides an online script editor in which multiple users can collaboratively develop and debug SQL, Python, and Shell scripts.
  • Allows use of variables and functions.

Job development

  • Provides a graphical designer that allows you to quickly build a data processing workflow by dragging and dropping nodes.
  • Provides multiple preset task types, such as data integration, SQL, and Shell, and runs tasks according to the dependencies between them to complete data analysis and processing.
  • Supports job import and export.
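
The idea of completing a workflow by following the dependencies between tasks can be sketched in Python. This is only an illustration of dependency ordering in general, not DataArts Factory code or its API; the task names and the `execution_order` helper are hypothetical.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical workflow: each task maps to the set of tasks it depends on.
workflow = {
    "integrate_data": set(),          # no upstream tasks
    "run_sql": {"integrate_data"},    # runs after data integration
    "run_shell": {"run_sql"},         # runs after the SQL task
}

def execution_order(deps):
    """Return task names in an order that respects every dependency."""
    return list(TopologicalSorter(deps).static_order())

print(execution_order(workflow))
```

A task appears in the result only after all of the tasks it depends on, which is the property a scheduler needs before it can start a node.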

Resource management

Supports unified management of the file, JAR, and archive resources used during script and job development.

Job scheduling

Schedules jobs to run once or periodically, and supports jobs that are triggered by events.
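
The difference between the scheduling modes (a single run, repeated runs, and runs started by events) can be sketched in Python as follows. This is a minimal model under stated assumptions, not the DataArts Factory scheduler or its configuration format; the `Schedule` class, the mode labels, and `planned_runs` are hypothetical names.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List, Optional

@dataclass
class Schedule:
    mode: str                              # "once", "periodic", or "event" (hypothetical labels)
    start: Optional[datetime] = None       # first run time, unused for "event"
    interval: Optional[timedelta] = None   # gap between runs, used only for "periodic"

def planned_runs(schedule: Schedule, until: datetime) -> List[datetime]:
    """List the run times known in advance, up to and including `until`."""
    if schedule.mode == "once":
        return [schedule.start] if schedule.start <= until else []
    if schedule.mode == "periodic":
        runs, t = [], schedule.start
        while t <= until:
            runs.append(t)
            t += schedule.interval
        return runs
    if schedule.mode == "event":
        # Run times depend on external events, so none are known in advance.
        return []
    raise ValueError(f"unknown mode: {schedule.mode}")
```

For example, a periodic schedule starting at midnight with a 6-hour interval yields five planned runs within the following 24 hours, while an event-triggered schedule yields none until an event arrives.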

Monitoring

  • You can run, suspend, restore, or terminate a job.
  • You can view the operation details of each job and each node in the job.
  • You can use various methods to receive notifications when a job or task error occurs.

Objects in DataArts Factory