Hadoop 2 Setup on 64-bit Ubuntu 12.04 – Part 1

NOTE: This post deals only with a minimal single-node cluster setup. Later posts will deal with various issues related to resource allocation on a multi-node cluster.

While setting up Hadoop 2.2.0 on Ubuntu 12.04.3 LTS 64-bit (a VM on Hyper-V), I had to consult multiple resources and overcome some roadblocks. The procedure that worked for me is shared here in three posts:

  1. This post describes software setup and configuration.
  2. Part 2 describes starting up processes and running an example.
  3. Part 3 describes building native libraries for the 64-bit system, which gives a noticeable performance boost: the downloaded distribution ships 32-bit native binaries, and the pure-Java fallback libraries can't match native performance.


Before performing the setup steps below, I had to ensure that SSH and JDK6 were installed.

sudo apt-get install ssh
sudo apt-get install openjdk-6-jdk
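
A quick way to confirm both are installed before continuing (the exact version output will vary by system):

    ssh -V
    java -version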

Create Hadoop User

It’s recommended that all Hadoop-related work be performed while logged in as a user designated for this purpose. I named the user hadoop.

  1. Change to the root user (alternatively, prefix each command in steps 2-5 below with sudo).
    sudo -s

    Provide password when prompted.

  2. Create user.
    useradd -d /home/hadoop -m hadoop
  3. Set password for hadoop.
    passwd hadoop

    Provide the desired password for hadoop.

  4. Add hadoop to the sudo group so that it can run privileged commands.
    usermod -a -G sudo hadoop
  5. Set bash as the default shell.
    usermod -s /bin/bash hadoop
  6. Connect as hadoop for the remaining steps.
    su hadoop

Configure ssh for hadoop

  1. Generate key and add to authorized keys.
    ssh-keygen -t dsa -P '' -f ~/.ssh/id_dsa
    cat ~/.ssh/id_dsa.pub >> ~/.ssh/authorized_keys
  2. Set permissions and ownership on key files/folders.
    sudo chmod go-w $HOME $HOME/.ssh
    sudo chmod 600 $HOME/.ssh/authorized_keys
    sudo chown `whoami` $HOME/.ssh/authorized_keys

Test SSH Setup

  1. Connect.
    ssh localhost

    Answer yes if prompted to trust the host key. An SSH session should open as the hadoop user without a password prompt, since key-based login is now configured.

  2. Close the test session.
    exit

Configure Environment

  1. Create .bash_profile in the hadoop user's home directory with the following contents (one shell-based way to create the file is sketched after this list). Note that HADOOP_PREFIX specifies where Hadoop lives; this can be a different path depending upon the desired setup.
    export HADOOP_PREFIX="/home/hadoop/product/hadoop-2.2.0"
    export PATH=$PATH:$HADOOP_PREFIX/bin
    export PATH=$PATH:$HADOOP_PREFIX/sbin
  2. Source .bash_profile to make the environment variables effective.
    source ~/.bash_profile
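
For readers new to Linux, one way (among several) to create this file from the shell is to append the lines with echo; the single quotes keep $PATH from being expanded at write time rather than at login:

    # Append each export line to ~/.bash_profile (creates the file if absent)
    echo 'export HADOOP_PREFIX="/home/hadoop/product/hadoop-2.2.0"' >> ~/.bash_profile
    echo 'export PATH=$PATH:$HADOOP_PREFIX/bin' >> ~/.bash_profile
    echo 'export PATH=$PATH:$HADOOP_PREFIX/sbin' >> ~/.bash_profile

    # After sourcing the file (step 2), this should print the Hadoop path
    echo $HADOOP_PREFIX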

Configure Hadoop

  1. Download the distribution from one of the Apache mirrors.
  2. Extract the distribution tarball to the $HADOOP_PREFIX path as configured above.
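    For example, a sketch assuming the tarball was downloaded to the hadoop user's home directory (adjust the paths to match your download location):

    mkdir -p /home/hadoop/product
    tar -xzf ~/hadoop-2.2.0.tar.gz -C /home/hadoop/product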
  3. Edit $HADOOP_PREFIX/etc/hadoop/core-site.xml to have the following contents. Note that another port may be specified.
    <configuration>
      <property>
        <name>fs.default.name</name>
        <value>hdfs://localhost:9000</value>
        <description>The name of the default file system.  Either the
          literal string "local" or a host:port for HDFS.</description>
      </property>
    </configuration>
  4. Edit $HADOOP_PREFIX/etc/hadoop/hdfs-site.xml to have the following contents. Note that other paths may be used in the configuration below; however, if a different path is used, it must be used consistently in all the steps throughout the setup.
    <configuration>
      <property>
        <name>dfs.replication</name>
        <value>1</value>
        <description>Default block replication; 1 is appropriate for a
            single-node cluster.</description>
      </property>
      <property>
        <name>dfs.namenode.name.dir</name>
        <value>file:/home/hadoop/workspace/hadoop_space/hadoop2/dfs/name</value>
        <description>Determines where on the local filesystem the DFS name node
            should store the name table.  If this is a comma-delimited list
            of directories then the name table is replicated in all of the
            directories, for redundancy.</description>
      </property>
      <property>
        <name>dfs.datanode.data.dir</name>
        <value>file:/home/hadoop/workspace/hadoop_space/hadoop2/dfs/data</value>
        <description>Determines where on the local filesystem a DFS data node
            should store its blocks.  If this is a comma-delimited
            list of directories, then data will be stored in all named
            directories, typically on different devices.
            Directories that do not exist are ignored.</description>
      </property>
    </configuration>
  5. Create the workspace paths used in the configuration earlier. This is where HDFS lives.
    mkdir -p /home/hadoop/workspace/hadoop_space/hadoop2/dfs/name
    mkdir -p /home/hadoop/workspace/hadoop_space/hadoop2/dfs/data
  6. If mapred-site.xml doesn’t exist, copy it from the template.
    cp etc/hadoop/mapred-site.xml.template etc/hadoop/mapred-site.xml

    Edit mapred-site.xml to have the following contents. The essential setting runs MapReduce on YARN; the directory properties must match the paths created in the next step.

    <configuration>
      <property>
        <name>mapreduce.framework.name</name>
        <value>yarn</value>
      </property>
      <property>
        <name>mapreduce.jobtracker.system.dir</name>
        <value>file:/home/hadoop/workspace/hadoop_space/hadoop2/mapred/system</value>
      </property>
      <property>
        <name>mapreduce.cluster.local.dir</name>
        <value>file:/home/hadoop/workspace/hadoop_space/hadoop2/mapred/local</value>
      </property>
    </configuration>

  7. Create the mapreduce paths configured earlier.
    mkdir -p /home/hadoop/workspace/hadoop_space/hadoop2/mapred/system
    mkdir -p /home/hadoop/workspace/hadoop_space/hadoop2/mapred/local
  8. Edit yarn-site.xml to have the following contents. Note that mapreduce.shuffle from previous versions needs to be mapreduce_shuffle now.
    <configuration>
    <!-- Site specific YARN configuration properties -->
      <property>
        <name>yarn.nodemanager.aux-services</name>
        <value>mapreduce_shuffle</value>
      </property>
      <property>
        <name>yarn.nodemanager.aux-services.mapreduce_shuffle.class</name>
        <value>org.apache.hadoop.mapred.ShuffleHandler</value>
      </property>
    </configuration>
  9. Edit hadoop-env.sh to set the JAVA_HOME correctly. Use the correct path for the system.
    # The java implementation to use.
    #export JAVA_HOME=${JAVA_HOME}
    export JAVA_HOME=/usr/lib/jvm/java-6-openjdk-amd64
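
With the configuration files in place, a quick sanity check (assuming .bash_profile has been sourced so that $HADOOP_PREFIX/bin is on the PATH) is to print the version:

    hadoop version

It should report Hadoop 2.2.0; a "command not found" error means the PATH entries in .bash_profile need another look.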

Prepare HDFS

  1. Format HDFS to prepare it for first use.
    hdfs namenode -format

    Review the output for successful completion. It should include a line reporting that the storage directory has been successfully formatted.

In part 2, we start up HDFS and YARN, and run an example.


7 Responses to Hadoop 2 Setup on 64-bit Ubuntu 12.04 – Part 1

  1. Pingback: Hadoop 2 Setup on 64-bit Ubuntu 12.04 – Part 2 | Data Heads

  2. fenix says:

    lol… there are many features you don’t have any clue about… setting up a cluster is much easier than you have written here… and dude, check your env variables and classpaths, you’ll definitely get errors while starting your cluster…

  3. dataheads says:

    This post sets up a single-node cluster with minimal configuration. These are the precise steps that I have used, and they do work. I plan to write a separate post covering the details of multi-node cluster setup and the decisions about memory and other resources.

    Thanks for sharing your opinion though.

  4. Haifa says:

    I am a beginner to Linux, so how can I create .bash_profile in step 1 of Configure Environment?

  5. Hello everybody. This is one of the easiest-to-follow tutorials I have found: very neat and precise. I too have set up a multi-node Hadoop cluster inside Oracle Solaris 11.1 using zones. You can have a look at http://hashprompt.blogspot.in/2014/05/multi-node-hadoop-cluster-on-oracle.html

  6. sach says:

    Thanks so much for the aux-services setting.
