Configuring the Log Archiving and Clearing Mechanism

Scenario

Job and task logs are generated during execution of a MapReduce application.

The MapReduce job logs and task logs are stored on HDFS when the log aggregation function is enabled. In a cluster that runs a large number of computing tasks, if no mechanism is configured to periodically archive and delete log files, the log files occupy a large amount of HDFS storage space and increase the cluster load.

Log archiving is implemented by Hadoop Archives. The number of concurrent archiving tasks (Map tasks) started by Hadoop Archives depends on the total size of the log files to be archived. The formula is as follows: Number of concurrent archiving tasks = Total size of log files to be archived/Size of each archive file.
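The formula above can be illustrated with a short worked example. The figures below (100 GiB of aggregated logs, 512 MiB per archive file) are illustrative assumptions, not values taken from this document:

```python
import math

def concurrent_archive_tasks(total_log_bytes, archive_file_bytes):
    """Number of concurrent archiving (Map) tasks started by Hadoop Archives:
    total size of the log files to be archived divided by the archive file
    size, rounded up so that all logs are covered."""
    return math.ceil(total_log_bytes / archive_file_bytes)

GiB = 1024 ** 3
MiB = 1024 ** 2

# Assumed scenario: 100 GiB of logs, 512 MiB per archive file
tasks = concurrent_archive_tasks(100 * GiB, 512 * MiB)
print(tasks)  # 200
```

With these assumed numbers, archiving would start 200 concurrent Map tasks, which shows why a large log backlog translates directly into archiving load on the cluster.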

Configuration

Go to the All Configurations page of the MapReduce service. For details, see Modifying Cluster Service Configuration Parameters.

Enter a parameter name in the search box, change its value, and save the configuration. On the Dashboard tab page of the MapReduce service, choose More > Synchronize Configuration. After the synchronization is complete, restart the MapReduce service.
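As an illustration of the kind of parameters involved, the periodic clearing side can be controlled with standard open-source Hadoop properties such as the following. The parameter names are from stock Hadoop (mapred-site.xml and yarn-site.xml); the exact names exposed on your cluster's configuration pages may differ, and the values shown are examples only:

```xml
<!-- mapred-site.xml: periodically clean up aged job history logs -->
<property>
  <name>mapreduce.jobhistory.cleaner.enable</name>
  <value>true</value>
</property>
<property>
  <!-- How often the cleaner runs: 1 day, in milliseconds -->
  <name>mapreduce.jobhistory.cleaner.interval-ms</name>
  <value>86400000</value>
</property>
<property>
  <!-- Job history files older than 7 days are deleted -->
  <name>mapreduce.jobhistory.max-age-ms</name>
  <value>604800000</value>
</property>

<!-- yarn-site.xml: retain aggregated task logs on HDFS for 7 days -->
<property>
  <name>yarn.log-aggregation.retain-seconds</name>
  <value>604800</value>
</property>
```

Shorter retention periods reduce HDFS space usage but leave less history available for troubleshooting, so choose values that match how long your operators typically need to inspect completed jobs.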