This section describes how to submit a DistCp job using the Oozie client.
Download the latest client before you begin.
If the installed client is an earlier version, download and install the client again.
source /opt/client/bigdata_env
For example, run the following command to authenticate as user oozieuser:
kinit oozieuser
cd /opt/client/Oozie/oozie-client-*/examples/apps/distcp/
Table 1 lists the files in this directory that require attention.
vi job.properties
Perform the following modifications:
Change the value of userName to the name of the human-machine user who submits the job, for example, userName=oozieuser.
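The same change can be scripted instead of made in vi. The following is a minimal sketch, assuming job.properties contains a line that begins with userName=. It works on a temporary sample file whose contents are illustrative only (not the shipped job.properties), so it can be tried safely before props_file is pointed at the real file.

```shell
# Sketch: set userName in a properties file non-interactively.
# Assumption: the file has a line that begins with "userName=".
# The sample file below is illustrative only, not the shipped job.properties.
work_dir=$(mktemp -d)
props_file="$work_dir/job.properties"
printf '%s\n' 'queueName=default' 'userName=admin' > "$props_file"

submit_user="oozieuser"   # human-machine user who submits the job

# Keep a backup, then rewrite only the userName line.
cp "$props_file" "$props_file.bak"
sed "s/^userName=.*/userName=$submit_user/" "$props_file.bak" > "$props_file"

# Show the resulting line for review.
grep '^userName=' "$props_file"
```

Writing the sed output to the file from a backup copy, rather than editing in place, keeps the original available if the substitution needs to be undone.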
cp workflow.xml workflow.xml.bak
vi workflow.xml
Modify the following content:
<workflow-app xmlns="uri:oozie:workflow:1.0" name="distcp-wf">
    <start to="distcp-node"/>
    <action name="distcp-node">
        <distcp xmlns="uri:oozie:distcp-action:1.0">
            <resource-manager>${resourceManager}</resource-manager>
            <name-node>${nameNode}</name-node>
            <prepare>
                <delete path="hdfs://target_ip:target_port/user/${userName}/${examplesRoot}/output-data/${outputDir}"/>
            </prepare>
            <configuration>
                <property>
                    <name>mapred.job.queue.name</name>
                    <value>${queueName}</value>
                </property>
                <property>
                    <name>oozie.launcher.mapreduce.job.hdfs-servers</name>
                    <value>hdfs://source_ip:source_port,hdfs://target_ip:target_port</value>
                </property>
            </configuration>
            <arg>${nameNode}/user/${userName}/${examplesRoot}/input-data/text/data.txt</arg>
            <arg>hdfs://target_ip:target_port/user/${userName}/${examplesRoot}/output-data/${outputDir}/data.txt</arg>
        </distcp>
        <ok to="end"/>
        <error to="fail"/>
    </action>
    <kill name="fail">
        <message>DistCP failed, error message[${wf:errorMessage(wf:lastErrorNode())}]</message>
    </kill>
    <end name="end"/>
</workflow-app>
target_ip:target_port is the address of the active HDFS NameNode in the target cluster, that is, the other trusted cluster, for example, 10.10.10.233:25000.
source_ip:source_port is the address of the active HDFS NameNode in the source cluster, for example, 10.10.10.223:25000.
Change both IP addresses and port numbers to match your environment.
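Because target_ip:target_port appears in several places in workflow.xml, a scripted substitution can help avoid missing one. The following is a minimal sketch that uses the example addresses from the text. It operates on a temporary stand-in file containing only the hdfs-servers value line, so wf_file is an assumption to be replaced with the path of the real workflow.xml.

```shell
# Sketch: fill in the NameNode placeholders in a copy of workflow.xml.
# The addresses are the example values from the text; replace them with your own.
source_nn="10.10.10.223:25000"
target_nn="10.10.10.233:25000"

work_dir=$(mktemp -d)
wf_file="$work_dir/workflow.xml"
# Stand-in for the workflow.xml line that holds both placeholders.
printf '%s\n' '<value>hdfs://source_ip:source_port,hdfs://target_ip:target_port</value>' > "$wf_file"

# Keep a backup, then replace every occurrence of each placeholder.
cp "$wf_file" "$wf_file.bak"
sed -e "s/source_ip:source_port/$source_nn/g" \
    -e "s/target_ip:target_port/$target_nn/g" \
    "$wf_file.bak" > "$wf_file"

# Show the result for review.
cat "$wf_file"
```

The g flag replaces all occurrences on each line, which covers the prepare path, the hdfs-servers property, and the output argument when the script is run against the full file.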
oozie job -oozie https://Host name of the Oozie role:21003/oozie/ -config job.properties -run
-oozie: URL of the Oozie server that executes the job
-config: Workflow property file
-run: Runs the workflow
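To make the host placeholder in the command explicit, the command can be assembled from a variable and printed for review before it is run. This is a minimal sketch; oozie_host is a hypothetical value and must be replaced with the host name of your Oozie role. The script only prints the command and does not contact the Oozie server.

```shell
# Sketch: build the submission command from a host variable and print it.
# oozie_host is a hypothetical placeholder; use the host name of your Oozie role.
oozie_host="oozie-host.example.com"
oozie_url="https://${oozie_host}:21003/oozie/"

submit_cmd="oozie job -oozie ${oozie_url} -config job.properties -run"

# Print the command for review; nothing is submitted here.
echo "$submit_cmd"
```

After checking the printed command, run it from the directory that contains job.properties.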
Log in to the Oozie web UI at https://IP address of the Oozie role:21003/oozie as user oozieuser.
On the Oozie web UI, find the submitted workflow by its job ID in the table on the page to view its details.