Using Loader from Scratch
You can use Loader to import data from an SFTP server to HDFS.
This section applies to MRS clusters earlier than 3.x.
Prerequisites
- You have prepared service data.
- You have created an analysis cluster.
Procedure
Access the Loader page.
Access the cluster details page.
- For versions earlier than MRS 1.9.2, log in to MRS Manager and choose Services.
- For MRS 1.9.2 or later, click the cluster name on the MRS console and choose Components.
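The version-dependent choice above can be sketched as a small helper. This is only an illustration: the function names and the returned path strings are hypothetical, and the sketch assumes MRS versions are plain dotted numbers such as `1.9.2`.

```python
from typing import Tuple


def parse_version(version: str) -> Tuple[int, ...]:
    """Turn a dotted version string such as '1.9.2' into a comparable tuple."""
    return tuple(int(part) for part in version.split("."))


def cluster_details_path(mrs_version: str) -> str:
    """Return where to find the service list for a given MRS version.

    Hypothetical helper: the strings only mirror the two options in this section.
    """
    parsed = parse_version(mrs_version)
    if parsed >= (3,):
        # This section applies only to MRS clusters earlier than 3.x.
        raise ValueError("this procedure applies to MRS versions earlier than 3.x")
    if parsed < (1, 9, 2):
        return "MRS Manager > Services"
    return "MRS console > cluster name > Components"


print(cluster_details_path("1.8.10"))
print(cluster_details_path("1.9.2"))
```

Comparing tuples of integers rather than raw strings avoids ordering mistakes such as treating `1.10.0` as earlier than `1.9.2`.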
Choose Hue. Under Hue Summary, in Hue Web UI, click Hue (Active). The Hue web UI is displayed.
Choose Data Browsers > Sqoop.
The job management tab page is displayed by default on the Loader page.
On the Loader page, click Manage links.
Click New link and create sftp-connector. For details, see File Server Link <mrs_01_0402__s73ada4f9d7e94890a00a2c7a90856ba6>.
Click New link, enter the link name, select hdfs-connector, and create hdfs-connector.
On the Loader page, click Manage jobs.
Click New Job.
In Connection, set parameters.
- In Name, enter a job name.
- Select the source link (sftp-connector) created in step 3 <mrs_01_1084__li48883218306> and the target link (hdfs-connector) created in step 4 <mrs_01_1084__li14723052103216>.
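As a way to picture the Connection settings, the following sketch models the two links and the job's name, source link, and target link as plain data, then checks that they match the SFTP-to-HDFS direction described here. The class names, field names, and example values are hypothetical and are not part of Loader.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Link:
    name: str       # link name entered when creating the link
    connector: str  # connector type, for example "sftp-connector"


@dataclass(frozen=True)
class JobConnection:
    name: str     # job name entered in Name
    source: Link  # link selected as the source
    target: Link  # link selected as the target


def check_sftp_to_hdfs(conn: JobConnection) -> List[str]:
    """Return a list of problems; an empty list means the settings fit this walkthrough."""
    problems = []
    if not conn.name.strip():
        problems.append("job name is empty")
    if conn.source.connector != "sftp-connector":
        problems.append("source link should use sftp-connector")
    if conn.target.connector != "hdfs-connector":
        problems.append("target link should use hdfs-connector")
    return problems


# Example values are placeholders, not names required by Loader.
job = JobConnection(
    name="sftp_to_hdfs_demo",
    source=Link("sftp_link_demo", "sftp-connector"),
    target=Link("hdfs_link_demo", "hdfs-connector"),
)
print(check_sftp_to_hdfs(job))
```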
In From, configure the job of the source link. For details, see ftp-connector or sftp-connector <mrs_01_0404__s033d5edc10164032b9ea23d01081beae>.
In To, configure the job of the target link. For details, see hdfs-connector <mrs_01_0405__s0e7a49c2520c498aa9e3d9fa84325e2e>.
In Task Config, set job running parameters.
Table 1 Loader job running properties

| Parameter | Description |
| --- | --- |
| Extractors | Number of Map tasks |
| Loaders | Number of Reduce tasks. This parameter is displayed only when the destination field is HBase or Hive. |
| Max. Error Records in a Single Shard | Error record threshold. If the number of error records of a single Map task exceeds the threshold, the task automatically stops and the obtained data is not returned. Note: Data is read and written in batches for MySQL and MPPDB of generic-jdbc-connector by default. Errors are recorded once at most for each batch of data. |
| Dirty Data Directory | Directory for saving dirty data. If you leave this parameter blank, dirty data will not be saved. |

Click Save.
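The effect of Max. Error Records in a Single Shard and Dirty Data Directory can be pictured with a small simulation of one Map task. This is a sketch of the rule as described in Table 1, not Loader's actual code: the function name, its parameters, and the example directory are hypothetical, batch-level error counting is not modelled, and keeping already collected dirty records when the task stops is an assumption.

```python
from typing import Callable, Iterable, List, Optional, Tuple


def run_map_task(
    records: Iterable[str],
    is_valid: Callable[[str], bool],
    max_error_records: int,
    dirty_data_dir: Optional[str] = None,
) -> Tuple[bool, List[str], List[str]]:
    """Simulate one Map task and return (succeeded, output, saved_dirty_records)."""
    output: List[str] = []
    dirty: List[str] = []
    errors = 0
    for record in records:
        if is_valid(record):
            output.append(record)
            continue
        errors += 1
        if dirty_data_dir:
            # A blank directory means dirty data is not saved.
            dirty.append(record)
        if errors > max_error_records:
            # Threshold exceeded: the task stops and obtained data is not returned.
            return False, [], dirty
    return True, output, dirty


# One bad record with a threshold of 1: the task still succeeds.
print(run_map_task(["1", "x", "2"], str.isdigit, 1))
# Two bad records exceed the threshold: the task stops.
print(run_map_task(["1", "x", "y", "2"], str.isdigit, 1, "/tmp/dirty"))
```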