Managing Loader Jobs
Scenario
You can create, view, edit, and delete jobs on the Loader page.
This section applies to versions earlier than MRS 3.x.
Prerequisites
You have accessed the Loader page. For details, see Loader Page.
Creating a Job
- On the Loader page, click New job.
- In Connection, set parameters.
- In Name, enter a job name.
- In From link and To link, select links accordingly.
The job obtains data from the source specified by From link and saves it to the destination specified by To link.
- In From, configure the job of the source link.
For details, see Source Link Configurations of Loader Jobs.
- In To, configure the job of the destination link.
For details, see Destination Link Configurations of Loader Jobs.
- Check whether a database link is selected in To link.
Database links include:
- generic-jdbc-connector
- hbase-connector
- hive-connector
If you set To link to a database link, you need to configure a mapping between service data and a field in the database table.
- In Field Mapping, enter a field mapping. Then proceed to the Task Config step.
Field Mapping specifies a mapping between each column of user data and a field in the database table.
Table 1 Field Mapping properties

| Parameter | Description |
| --- | --- |
| Column Num | Field sequence of service data |
| Sample | First row of sample values of service data |
| Column Family | When To link is hbase-connector, you can select a column family for storing data. |
| Destination Field | Field for storing data |
| Type | Type of the field selected by the user |
| Row Key | When To link is hbase-connector, you need to select Destination Field as a row key. |
- In Task Config, set job running parameters.
Table 2 Loader job running properties

| Parameter | Description |
| --- | --- |
| Extractors | Number of Map tasks |
| Loaders | Number of Reduce tasks. This parameter is displayed only when the destination field is HBase or Hive. |
| Max. Error Records in a Single Shard | Error record threshold. If the number of error records of a single Map task exceeds the threshold, the task automatically stops and the obtained data is not returned. NOTE: For MySQL and MPPDB of generic-jdbc-connector, data is read and written in batches by default, and at most one error is recorded for each batch of data. |
| Dirty Data Directory | Directory for saving dirty data. If you leave this parameter blank, dirty data is not saved. |
- Click Save.
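The checks described in the steps above can be sketched in a few lines of Python. This is only an illustration of the rules stated in this section (which To link types need a field mapping, the row key requirement for hbase-connector, and the error record threshold). The names `FieldMapping`, `needs_field_mapping`, `validate_job`, and `should_stop_map_task` are hypothetical and are not part of any Loader API.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence

# Link types that this section lists as database links.
DATABASE_LINKS = {"generic-jdbc-connector", "hbase-connector", "hive-connector"}


@dataclass
class FieldMapping:
    """One row of Table 1 (hypothetical in-memory form, not a Loader type)."""
    column_num: int
    destination_field: str
    field_type: str
    column_family: Optional[str] = None  # used only with hbase-connector
    row_key: bool = False                # used only with hbase-connector


def needs_field_mapping(to_link_type: str) -> bool:
    """Return True if To link is a database link and so needs Field Mapping."""
    return to_link_type in DATABASE_LINKS


def validate_job(to_link_type: str, mappings: Sequence[FieldMapping]) -> List[str]:
    """Return a list of problems found; an empty list means none were found."""
    problems: List[str] = []
    if not needs_field_mapping(to_link_type):
        return problems  # no field mapping is needed for non-database links
    if not mappings:
        problems.append("Field Mapping is required when To link is a database link")
        return problems
    if to_link_type == "hbase-connector" and not any(m.row_key for m in mappings):
        problems.append("hbase-connector needs a Destination Field selected as Row Key")
    return problems


def should_stop_map_task(error_records: int, max_error_records: int) -> bool:
    """A Map task stops once its error records exceed the threshold."""
    return error_records > max_error_records
```

For example, `validate_job("hbase-connector", [FieldMapping(1, "id", "VARCHAR")])` reports the missing row key, while the same call with `row_key=True` on the mapping returns an empty list.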
Viewing a Job
- Access the Loader page. The Loader job management page is displayed by default.
- If Kerberos authentication is enabled for the cluster, only jobs created by the current user are displayed by default. Other users' jobs are not displayed.
- If Kerberos authentication is disabled for the cluster, all Loader jobs of the cluster are displayed.
- In Sqoop Jobs, enter a job name to filter the job.
- Click Refresh to obtain the latest job status.
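The visibility rule above can be summarized as a small Python sketch. It is a model of the behavior described in this section, not Loader code. The `Job` class and `visible_jobs` function are hypothetical, and treating the name filter as a substring match is an assumption.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class Job:
    """Hypothetical minimal job record: a name and the user who created it."""
    name: str
    owner: str


def visible_jobs(
    jobs: Iterable[Job],
    current_user: str,
    kerberos_enabled: bool,
    name_filter: str = "",
) -> List[Job]:
    """Return the jobs that the job management page would list.

    With Kerberos enabled, only the current user's jobs are shown.
    With Kerberos disabled, all jobs of the cluster are shown.
    The name filter is modeled as a substring match (an assumption).
    """
    shown = [j for j in jobs if not kerberos_enabled or j.owner == current_user]
    if name_filter:
        shown = [j for j in shown if name_filter in j.name]
    return shown
```

With two jobs owned by different users, a call with `kerberos_enabled=True` returns only the current user's job, and a call with `kerberos_enabled=False` returns both.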
Editing a Job
- Access the Loader page. The Loader job management page is displayed by default.
- Click the job name to go to the edit page.
- Modify the job configuration parameters based on service requirements.
- Click Save.