Hadoop 2 Setup on 64-bit Ubuntu 12.04 – Part 1

NOTE: This post deals only with a minimal single-node cluster setup. Other posts will deal with issues related to resource allocation on a multi-node cluster.

While setting up Hadoop 2.2.0 on Ubuntu 12.04.3 LTS 64-bit (VM on Hyper-V), I had to refer to multiple resources and had to overcome some roadblocks. The procedure that worked for me is shared here in three posts:

  1. This post describes software setup and configuration.
  2. Part 2 describes starting up processes and running an example.
  3. Part 3 describes building native libraries for the 64-bit system, which gives a noticeable performance boost. The downloaded distribution ships 32-bit native binaries, so on a 64-bit system Hadoop falls back to its pure-Java implementations, which can’t match native performance.

Prerequisites

Before performing the setup steps below, I had to ensure that SSH and JDK6 were installed.

sudo apt-get install ssh
sudo apt-get install openjdk-6-jdk
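
As an optional sanity check (my addition, not part of the original steps), the following confirms that the JDK and the SSH server are in place:

java -version          # should report a 1.6.x OpenJDK runtime
service ssh status     # the OpenSSH server should be reported as running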

Create Hadoop User

It’s recommended that all Hadoop-related work be done while logged in as a user designated for this purpose. I named the user hadoop.

  1. Change to the root user (alternatively, prefix each command in steps 2-5 below with sudo).
    sudo -s

    Provide password when prompted.

  2. Create user.
    useradd -d /home/hadoop -m hadoop
  3. Set password for hadoop.
    passwd hadoop

    Provide the desired password for hadoop.

  4. Add hadoop to the sudo group so it can run commands with sudo.
    usermod -a -G sudo hadoop
  5. Set bash as the default shell for hadoop.
    usermod -s /bin/bash hadoop
  6. Switch to the hadoop user for the remaining steps.
    su hadoop
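
    Optionally, verify the account (this check is my addition): id should list the sudo group, and getent should show /bin/bash as the login shell.

    id hadoop              # the groups list should include sudo
    getent passwd hadoop   # the last field should be /bin/bash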

Configure ssh for hadoop

  1. Generate key and add to authorized keys.
    ssh-keygen -t dsa -P '' -f ~/.ssh/id_dsa
    cat ~/.ssh/id_dsa.pub >> ~/.ssh/authorized_keys
  2. Set permissions and ownership on key files/folders.
    sudo chmod go-w $HOME $HOME/.ssh
    sudo chmod 600 $HOME/.ssh/authorized_keys
    sudo chown `whoami` $HOME/.ssh/authorized_keys

Test SSH Setup

  1. Connect.
    ssh localhost

    Say yes if prompted to trust the host key. The connection should open without asking for the hadoop password, since the key generated above is in authorized_keys; if a password prompt appears, revisit the previous section.

  2. Close test session.
    exit

Configure Environment

  1. Create ~/.bash_profile (in the hadoop user’s home directory) with the following contents. Note that HADOOP_PREFIX specifies where Hadoop lives; this can be a different path depending upon the desired setup.
    export HADOOP_PREFIX="/home/hadoop/product/hadoop-2.2.0"
    export PATH=$PATH:$HADOOP_PREFIX/bin
    export PATH=$PATH:$HADOOP_PREFIX/sbin
    export HADOOP_MAPRED_HOME=${HADOOP_PREFIX}
    export HADOOP_COMMON_HOME=${HADOOP_PREFIX}
    export HADOOP_HDFS_HOME=${HADOOP_PREFIX}
    export YARN_HOME=${HADOOP_PREFIX}
  2. Source .bash_profile to make the environment variables effective.
    source .bash_profile
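
    As a quick check (my addition), confirm the variables took effect; the hadoop command itself will only resolve after the distribution is extracted in the next section.

    echo $HADOOP_PREFIX                       # should print /home/hadoop/product/hadoop-2.2.0
    echo $PATH | tr ':' '\n' | grep hadoop    # the bin and sbin entries should appear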

Configure Hadoop

  1. Download the distribution from one of the Apache mirrors.
  2. Extract the tar.gz distribution to the $HADOOP_PREFIX path configured above (a sample download-and-extract sketch appears after this list).
  3. Edit $HADOOP_PREFIX/etc/hadoop/core-site.xml to have the following contents. Note that another port may be specified. (Hadoop 2 prefers the name fs.defaultFS, but the older fs.default.name used below is still honored.)
    <configuration>
      <property>
        <name>fs.default.name</name>
        <value>hdfs://localhost:8020</value>
        <description>The name of the default file system.  Either the
          literal string "local" or a host:port for HDFS.
        </description>
        <final>true</final>
      </property>
    </configuration>
  4. Edit $HADOOP_PREFIX/etc/hadoop/hdfs-site.xml to have the following contents. Note that other paths may be used in the configuration below. However, if a different path is used, it must be used consistently in all the steps throughout the setup.
    <configuration>
      <property>
        <name>dfs.namenode.name.dir</name>
        <value>file:/home/hadoop/workspace/hadoop_space/hadoop2/dfs/name</value>
        <description>Determines where on the local filesystem the DFS name node
            should store the name table.  If this is a comma-delimited list
            of directories then the name table is replicated in all of the
            directories, for redundancy.
        </description>
        <final>true</final>
      </property>
      <property>
        <name>dfs.datanode.data.dir</name>
        <value>file:/home/hadoop/workspace/hadoop_space/hadoop2/dfs/data</value>
        <description>Determines where on the local filesystem a DFS data node
            should store its blocks.  If this is a comma-delimited
            list of directories, then data will be stored in all named
            directories, typically on different devices.
            Directories that do not exist are ignored.
        </description>
        <final>true</final>
      </property>
      <property>
        <name>dfs.replication</name>
        <value>1</value>
      </property>
      <property>
        <name>dfs.permissions</name>
        <value>false</value>
      </property>
    </configuration>
  5. Create the workspace paths used in the configuration above. This is where HDFS will store its data.
    mkdir -p /home/hadoop/workspace/hadoop_space/hadoop2/dfs/name
    mkdir -p /home/hadoop/workspace/hadoop_space/hadoop2/dfs/data
  6. If $HADOOP_PREFIX/etc/hadoop/mapred-site.xml doesn’t exist, copy it from the template (run from $HADOOP_PREFIX).
    cp etc/hadoop/mapred-site.xml.template etc/hadoop/mapred-site.xml

    Edit mapred-site.xml to have the following contents.

    <configuration>
      <property>
        <name>mapreduce.framework.name</name>
        <value>yarn</value>
      </property>
      <property>
        <name>mapred.system.dir</name>
        <value>file:/home/hadoop/workspace/hadoop_space/hadoop2/mapred/system</value>
        <final>true</final>
      </property>
      <property>
        <name>mapred.local.dir</name>
        <value>file:/home/hadoop/workspace/hadoop_space/hadoop2/mapred/local</value>
        <final>true</final>
      </property>
    </configuration>
  7. Create the mapreduce paths configured earlier.
    mkdir -p /home/hadoop/workspace/hadoop_space/hadoop2/mapred/system
    mkdir -p /home/hadoop/workspace/hadoop_space/hadoop2/mapred/local
  8. Edit $HADOOP_PREFIX/etc/hadoop/yarn-site.xml to have the following contents. Note that the mapreduce.shuffle value from previous versions must now be mapreduce_shuffle.
    <configuration>
    <!-- Site specific YARN configuration properties -->
      <property>
        <name>yarn.nodemanager.aux-services</name>
        <value>mapreduce_shuffle</value>
      </property>
      <property>
        <name>yarn.nodemanager.aux-services.mapreduce_shuffle.class</name>
        <value>org.apache.hadoop.mapred.ShuffleHandler</value>
      </property>
    </configuration>
  9. Edit $HADOOP_PREFIX/etc/hadoop/hadoop-env.sh to set JAVA_HOME correctly. Use the correct path for the system (a small sketch for locating it follows this list).
    # The java implementation to use.
    #export JAVA_HOME=${JAVA_HOME}
    export JAVA_HOME=/usr/lib/jvm/java-6-openjdk-amd64
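
Two small sketches related to steps 2 and 9 above, for convenience only; they are my assumptions rather than part of the original procedure. The archive.apache.org URL is assumed to still host the 2.2.0 tarball (substitute a current mirror if not), and the readlink pipeline merely derives the JDK path that step 9 hard-codes.

    # Step 2 sketch: download and extract the distribution into the parent of $HADOOP_PREFIX.
    mkdir -p /home/hadoop/product
    cd /home/hadoop/product
    wget https://archive.apache.org/dist/hadoop/common/hadoop-2.2.0/hadoop-2.2.0.tar.gz
    tar -xzf hadoop-2.2.0.tar.gz    # produces hadoop-2.2.0/, matching HADOOP_PREFIX

    # Step 9 sketch: derive the JDK directory to use as JAVA_HOME.
    readlink -f /usr/bin/java | sed 's:/jre/bin/java::; s:/bin/java::'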

Prepare HDFS

  1. Format HDFS to prepare it for first use.
    hdfs namenode -format

    Review the output for successful completion.
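
    As an optional check (my addition), the name directory configured in hdfs-site.xml should now contain a current/ subdirectory:

    ls /home/hadoop/workspace/hadoop_space/hadoop2/dfs/name/current
    # expect files such as VERSION and fsimage_* if the format succeeded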

In part 2, we start up HDFS and YARN, and run an example.


7 Responses to Hadoop 2 Setup on 64-bit Ubuntu 12.04 – Part 1

  1. Pingback: Hadoop 2 Setup on 64-bit Ubuntu 12.04 – Part 2 | Data Heads

  2. fenix says:

    lol… there are many features you don’t have any clue about… setting up a cluster is much easier than you have written here… and dude, check your env variables and classpaths, you’ll definitely get errors while starting your cluster…

  3. dataheads says:

    Fenix,

    This post sets up a single-node cluster with minimal configuration. These are the precise steps that I have used, and they do work. I plan to write a separate post covering the details of multi-node cluster setup, including decisions about memory and other resources.

    Thanks for sharing your opinion though.

  4. Haifa says:

    Hello,

    I am a beginner to Linux, so how can I create .bash_profile in step 1 of configuring the environment?

    thanks,
    HA

  5. Hello everybody. This is one of the “easiest to follow” tutorials I have found. Very neat and precise. I too have set up a multi-node hadoop cluster inside Oracle Solaris 11.1 using zones. You can have a look at http://hashprompt.blogspot.in/2014/05/multi-node-hadoop-cluster-on-oracle.html

  6. sach says:

    Thanks so much for aux_service setting.
