Hadoop Logging

Logging Overview

Log4J Log Levels

  • Find the log4j log level for a daemon service using http://<namenode>:50070/logLevel or http://<resource manager>:8088/logLevel
  • You can temporarily change the log level using the above links. Log levels reset to those in log4j.properties after a restart
  • You can also get and temporarily set the log level using the shell command
$ hadoop daemonlog -getlevel <resource manager>:8088 <daemon class name as it appears in the jps -l output below>
  • Find the daemon class names using the jps command
$ sudo jps -l
5826 org.apache.hadoop.yarn.server.nodemanager.NodeManager
5401 org.apache.hadoop.hdfs.server.namenode.NameNode
5537 org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode
6115 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager
7454 org.apache.spark.deploy.history.HistoryServer
5725 org.apache.hadoop.mapreduce.v2.hs.JobHistoryServer
5310 org.apache.hadoop.hdfs.qjournal.server.JournalNode
5203 org.apache.hadoop.hdfs.server.datanode.DataNode
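For example, to check and temporarily raise the ResourceManager's log level with daemonlog (the hostname is a placeholder for your cluster; the change lasts until the daemon restarts):
$ hadoop daemonlog -getlevel <resource manager>:8088 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager
$ hadoop daemonlog -setlevel <resource manager>:8088 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager DEBUG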

HDFS Audit Logging

  • Enable HDFS audit logging by setting the audit logger environment variable in hadoop-env.sh and configuring the corresponding log4j appender in log4j.properties
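A typical setup, assuming the stock RFAAUDIT rolling-file appender that ships in Hadoop's default log4j.properties and a hadoop-env.sh that passes hdfs.audit.logger through HADOOP_NAMENODE_OPTS (file paths and appender names may differ in your distribution). In hadoop-env.sh:
$ export HDFS_AUDIT_LOGGER=INFO,RFAAUDIT
And in log4j.properties:
log4j.logger.org.apache.hadoop.hdfs.server.namenode.FSNamesystem.audit=${hdfs.audit.logger}
log4j.appender.RFAAUDIT=org.apache.log4j.RollingFileAppender
log4j.appender.RFAAUDIT.File=${hadoop.log.dir}/hdfs-audit.log
log4j.appender.RFAAUDIT.layout=org.apache.log4j.PatternLayout
log4j.appender.RFAAUDIT.layout.ConversionPattern=%d{ISO8601} %p %c{2}: %m%n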

Job History

  • Job history is maintained for all completed jobs by the job history service
  • Job history is stored in HDFS at the location specified by mapreduce.jobhistory.done-dir [mapred-site.xml]
  • Job history records are retained for one week by default, after which the system deletes them
  • Job history includes job, task, and attempt details, stored in JSON format
  • View job history information at http://<job history server>:19888
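Completed jobs can also be listed from the command line, or queried over the JobHistory server's REST API (hostname is a placeholder for your cluster):
$ mapred job -list all
$ curl http://<job history server>:19888/ws/v1/history/mapreduce/jobs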

MapReduce Task Logs

  • MapReduce task logs are produced by log4j and stored on the worker nodes' local disks by default
  • MapReduce task logs can be aggregated to an HDFS location by enabling the yarn.log-aggregation-enable [yarn-site.xml] property
  • Aggregated task logs are retained for 3 hours, or as configured in yarn.log-aggregation.retain-seconds [yarn-site.xml]
  • The log level for individual tasks can be altered using the following properties
    • mapreduce.map.log.level
    • mapreduce.reduce.log.level
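With aggregation enabled, the logs of a finished application can be fetched with the yarn CLI, and the task log level can be overridden per job on the command line (the application ID, jar, class, and path arguments below are placeholders; the -D overrides assume the job's driver uses ToolRunner/GenericOptionsParser):
$ yarn logs -applicationId <application id>
$ hadoop jar <job jar> <main class> -Dmapreduce.map.log.level=DEBUG -Dmapreduce.reduce.log.level=DEBUG <input> <output>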