Hadoop Pseudo Distributed Mode

A Hadoop cluster can be emulated with the "pseudo-distributed mode":

  • all Hadoop daemons run, and applications feel like they are being executed on a real cluster
  • good for testing Hadoop MapReduce jobs before running them on a fully distributed cluster

Setting Up Locally


  • install Hadoop from the binary distribution, e.g. into ~/soft/hadoop-2.6.0/
  • point HADOOP_CONF_DIR to a directory with the configuration, e.g. ~/conf/hadoop-local

You need to export the following environment variables:

export HADOOP_HOME=~/soft/hadoop-2.6.0
export HADOOP_CONF_DIR=~/conf/hadoop-local
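It also helps to put the Hadoop scripts on PATH so commands like hdfs and start-dfs.sh resolve; a minimal sketch, assuming the example install location from above:

```shell
# Put Hadoop's command-line tools and daemon scripts on PATH
# (assumes the example install location ~/soft/hadoop-2.6.0 used above)
export HADOOP_HOME=~/soft/hadoop-2.6.0
export PATH="$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin"
```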


Also, if Java is not on your PATH, you need to create hadoop-env.sh in HADOOP_CONF_DIR and add (adjusting the paths):

export JAVA_HOME=/home/user/soft/jdk1.8.0_60/
export HADOOP_CONF_DIR=${HADOOP_CONF_DIR:-"/etc/hadoop"}


Hadoop in "pseudo-distributed mode" should have properties similar to these (the minimal single-node values from the standard setup):

cat core-site.xml
<?xml version="1.0"?>
<configuration>
  <property><name>fs.defaultFS</name><value>hdfs://localhost:8020</value></property>
</configuration>

cat hdfs-site.xml
<?xml version="1.0"?>
<configuration>
  <property><name>dfs.replication</name><value>1</value></property>
</configuration>

cat mapred-site.xml
<?xml version="1.0"?>
<configuration>
  <property><name>mapreduce.framework.name</name><value>yarn</value></property>
</configuration>

cat yarn-site.xml
<?xml version="1.0"?>
<configuration>
  <property><name>yarn.nodemanager.aux-services</name><value>mapreduce_shuffle</value></property>
</configuration>

File System

  • Once the configuration is set, format the filesystem
  • hdfs namenode -format
  • if hadoop.tmp.dir is not specified, it'll use /tmp/hadoop-${user.name}, which is cleaned after each reboot
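To survive reboots, hadoop.tmp.dir can be pointed at a persistent directory instead; a minimal sketch for core-site.xml (the path is just an example):

```xml
<property>
  <!-- keep HDFS data out of /tmp so it survives reboots; example path -->
  <name>hadoop.tmp.dir</name>
  <value>/home/user/hadoop-tmp</value>
</property>
```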

Setting SSH Access

  • the Hadoop startup scripts use ssh to start the daemons on the cluster machines
  • it's the same for pseudo-distributed mode, except that the master and all the workers are located on the same machine
  • they still communicate over ssh, so make sure you can do ssh localhost
  • if not, check that the ssh service and ssh-agent are running
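If ssh localhost prompts for a password, set up key-based access; a minimal sketch following the usual single-node setup, using the default key paths:

```shell
# Create a passwordless key if one doesn't exist yet, and authorize it
# for logins to localhost (default ~/.ssh paths)
mkdir -p ~/.ssh && chmod 0700 ~/.ssh
[ -f ~/.ssh/id_rsa ] || ssh-keygen -t rsa -P '' -f ~/.ssh/id_rsa
cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
chmod 0600 ~/.ssh/authorized_keys
```

After this, ssh localhost should log in without asking for a password.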

Starting Daemons

To start the daemons, use

start-dfs.sh
start-yarn.sh
mr-jobhistory-daemon.sh start historyserver

make sure the namenode has started:

telnet localhost 8020

If the namenode doesn't start, do the following [1]:

  • delete all contents of the Hadoop temporary directory (hadoop.tmp.dir): rm -Rf tmp_dir
  • format the namenode: hadoop namenode -format
  • start the namenode again: start-dfs.sh

Starting Datanodes

  • hadoop-daemon.sh start datanode
  • to check if it works:
hadoop fs -put somefile /home/username/
hadoop fs -ls /home/username/ 


  • if the datanode doesn't start, see [2]
  • if yarn resourcemanager doesn't start

Monitoring Jobs

yarn application -list
yarn application -kill application_1445857836386_0002