Now is the time when we start to see the fruits of our labor in building the ODROID XU4 low-cost cluster. We will install Hadoop and configure it to serve an NFS mount that can be mounted on a client computer (e.g., your laptop), letting you interact with the HDFS file system as if it were another drive on your machine. This feature will greatly ease the use of our cluster, as it minimizes the need for a user to log into the cluster to use it. An NFS mount is not the only piece needed to realize this client-usage vision, but it is an important one.
Before we install Hadoop, let’s discuss what we are trying to accomplish by installing it. Hadoop has three components: the Hadoop Distributed File System (HDFS), YARN, and Map-Reduce. For our purposes, we are most interested in HDFS, but we will play around with the other two. What HDFS will do for us is turn the MicroSD cards installed on every node into a single “virtual drive” that files can be written to for use by the cluster’s analytics applications. HDFS will (if appropriate) break a large file up into blocks and then distribute those blocks around the cluster. The benefit of this is that when doing distributed computing operations on the file, the work will be split up, with each node responsible for processing the blocks it holds.
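As a back-of-the-envelope illustration of the block splitting, the number of blocks HDFS creates for a file is simply the file size divided by the block size, rounded up. The 128 MiB block size below is the Hadoop 2.x default, and the 500 MiB file is hypothetical; your configuration may use different values:

```shell
# Ceiling division: number of HDFS blocks for a given file size.
# 128 MiB is the Hadoop 2.x default block size; adjust to match your config.
block_size=$((128 * 1024 * 1024))
file_size=$((500 * 1024 * 1024))   # a hypothetical 500 MiB input file
num_blocks=$(( (file_size + block_size - 1) / block_size ))
echo "$num_blocks"   # prints 4: blocks spread across the cluster's DataNodes
```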
Installing Java
Hadoop is written in Java, so the first thing we need to do is install Java on all nodes in the cluster. Run the following commands on each node in the cluster to install Oracle’s latest version of Java 8. Note that there will be some interactive steps in the process requiring you to accept Oracle’s license. Also, install the useful utility rsync.
sudo add-apt-repository ppa:webupd8team/java
sudo apt-get update
sudo apt-get install oracle-java8-installer
sudo apt-get install rsync
Getting the Hadoop Software
Hadoop can be downloaded from the Apache Hadoop site. However, the version you will download there has some issues with running on our ODROID XU4 cluster. Specifically, it contains native libraries built for the x86 processor which were built to make certain operations run faster than they would in Hadoop’s main implementation language of Java. The x86 build of Hadoop will work in that Hadoop will automatically fall back to the slower Java implementation. But, we want our installation to be as fast as possible, so Hadoop needs to be built for the XU4’s ARM processor with hard float capabilities.
I am not going to document the Hadoop build process here. There is a good article describing how to build Hadoop on a Raspberry Pi here. The process for the XU4 is pretty much identical. Build Hadoop on the master node. We will rsync it to the rest of the cluster after everything is configured.
If you want to skip the build process, you can download my pre-built package from my GitHub repository and place the file into the /opt directory on the master node. Alternatively, you can download the Hadoop 2.7.2 build directly to the master node:

cd /opt
sudo wget http://diybigdata.net/downloads/hadoop/hadoop-2.7.2.armhf.tar.gz
Preparing the Cluster for Hadoop
Our first step in installing Hadoop is to set up the hduser account on all our nodes and enable passwordless logins from the master node to all slaves. While logged in as user odroid on the master node:
parallel-ssh -i -h ~odroid/cluster/all.txt -l root "addgroup hadoop"
Then you will need to do the following logged into each node one by one, as parallel-ssh doesn’t handle the interactive nature of creating a user. Note that you will be asked to create a password for hduser. I simply used the same one that the odroid account uses.
sudo adduser --ingroup hadoop hduser
sudo adduser hduser sudo
Then, back on the master node, give hduser super user rights and distribute the master node’s SSH keys:
parallel-ssh -i -h ~odroid/cluster/all.txt -l root "usermod -aG sudo hduser"
su hduser
cd
ssh-keygen -t rsa -P ""
cat $HOME/.ssh/id_rsa.pub >> $HOME/.ssh/authorized_keys
ssh hduser@localhost
exit
ssh hduser@master
exit
ssh-copy-id hduser@slave1
ssh hduser@slave1
exit
ssh-copy-id hduser@slave2
ssh hduser@slave2
exit
ssh-copy-id hduser@slave3
ssh hduser@slave3
exit
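The repetitive ssh-copy-id/log-in/exit sequence for the slaves can equivalently be written as a loop (using the same slave1 through slave3 hostnames):

```shell
# Copy the hduser public key to each slave, then log in once so the
# host key is accepted and passwordless access is verified.
for host in slave1 slave2 slave3; do
    ssh-copy-id "hduser@${host}"
    ssh "hduser@${host}" exit
done
```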
Finally, let’s create the Hadoop data directory in our /data mount. Under the odroid user, issue:
parallel-ssh -i -h ~odroid/cluster/all.txt -l root "mkdir -p /data/hdfs/tmp"
parallel-ssh -i -h ~odroid/cluster/all.txt -l root "chown -R hduser:hadoop /data/hdfs"
Installing and Configuring Hadoop
Unpack the Hadoop package in the /opt directory. I also like creating a symlink to the install from the /usr/local directory. On the master node:
cd /opt
sudo tar xzf hadoop-2.7.2.armhf.tar.gz
sudo chown -R hduser:hadoop hadoop-2.7.2
cd /usr/local
sudo ln -s /opt/hadoop-2.7.2 hadoop
The next step is to configure the Hadoop installation on the master node. This requires editing several configuration files found in the /usr/local/hadoop/etc/hadoop directory. I’ve posted the contents of these files to the GitHub repository for the ODROID XU4 cluster project.
The first configuration file is hadoop-env.sh:
vi etc/hadoop/hadoop-env.sh
Adjust the following two lines to match what is shown:
export JAVA_HOME=$(readlink -f /usr/bin/java | sed "s:bin/java::")
export HADOOP_HEAPSIZE=384
These changes tell Hadoop where it can find the Java libraries and set the default heap size. Since our devices have 2 GB of RAM and we want to save as much RAM as possible for an Apache Spark install later, we are limiting Hadoop’s heap to 384 MB.
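To see how the JAVA_HOME expression works, you can run the pipeline by hand with an illustrative path (the actual path on your nodes depends on where the Oracle installer placed Java):

```shell
# readlink -f resolves /usr/bin/java through its symlink chain to the real
# binary; sed then strips the trailing "bin/java" to leave the Java home.
java_path="/usr/lib/jvm/java-8-oracle/jre/bin/java"   # illustrative resolved path
echo "$java_path" | sed "s:bin/java::"
# prints: /usr/lib/jvm/java-8-oracle/jre/
```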
Verify that Hadoop’s environment is (minimally) set up:
bin/hadoop
The next configuration file, core-site.xml, sets up the core variables used across all of Hadoop’s components:
vi etc/hadoop/core-site.xml
Add the following values between the <configuration></configuration> tags.
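The authoritative values are in the project’s GitHub repository and are not reproduced here. As a sketch, a minimal core-site.xml for this cluster would at least point Hadoop at the master’s NameNode and at the /data/hdfs/tmp directory created earlier (port 9000 is a common convention, not taken from the original post):

```xml
<property>
  <name>fs.defaultFS</name>
  <value>hdfs://master:9000</value> <!-- the NameNode runs on the master node -->
</property>
<property>
  <name>hadoop.tmp.dir</name>
  <value>/data/hdfs/tmp</value> <!-- directory created on every node above -->
</property>
```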
To configure the HDFS component, edit hdfs-site.xml:
vi etc/hadoop/hdfs-site.xml
Add the following values between the <configuration></configuration> tags.
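Again, the exact values are in the repository. As an illustrative sketch, the kind of property that belongs here is the block replication factor (the value of 3 is an assumption for a four-node cluster, not taken from the original post):

```xml
<property>
  <name>dfs.replication</name>
  <value>3</value> <!-- illustrative: copies of each block kept across the nodes -->
</property>
```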
To configure the Map-Reduce component, edit mapred-site.xml:
cp etc/hadoop/mapred-site.xml.template etc/hadoop/mapred-site.xml
vi etc/hadoop/mapred-site.xml
Add the following values between the <configuration></configuration> tags. Note that many of these configuration items control the memory allocated during a map-reduce job.
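The repository holds the real values. A sketch of typical entries for memory-constrained 2 GB nodes might look like this (the property names are standard Hadoop 2.x; the megabyte values are illustrative assumptions):

```xml
<property>
  <name>mapreduce.framework.name</name>
  <value>yarn</value> <!-- run map-reduce jobs on YARN -->
</property>
<property>
  <name>mapreduce.map.memory.mb</name>
  <value>256</value> <!-- illustrative: small map containers for 2 GB nodes -->
</property>
<property>
  <name>mapreduce.reduce.memory.mb</name>
  <value>256</value> <!-- illustrative: small reduce containers -->
</property>
```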
To configure the resource manager YARN, edit yarn-site.xml:
vi etc/hadoop/yarn-site.xml
Add the following values between the <configuration></configuration> tags. Note that many of these configuration items control the memory allocated during a map-reduce job.
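As with the other files, the committed values live in the repository. A sketch of the kind of properties involved (the aux-services entry is the standard Hadoop 2.x requirement for map-reduce on YARN; the memory figure is an illustrative assumption for a 2 GB node):

```xml
<property>
  <name>yarn.nodemanager.aux-services</name>
  <value>mapreduce_shuffle</value> <!-- required for map-reduce on YARN -->
</property>
<property>
  <name>yarn.nodemanager.resource.memory-mb</name>
  <value>1024</value> <!-- illustrative: RAM YARN may allocate per 2 GB node -->
</property>
```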
Create and edit the masters and slaves files:
vi etc/hadoop/masters
Set the masters file contents to:
master
Now open the slaves file for editing:
vi etc/hadoop/slaves
Set the slaves file contents to:
master
slave1
slave2
slave3
Installing Configured Hadoop to All Slaves
Go back to the odroid account and use rsync to push the configured Hadoop installation to each slave.
exit
parallel-ssh -i -h ~/cluster/slaves.txt -l root "mkdir -p /opt/hadoop-2.7.2/"
sudo rsync -avxP /opt/hadoop-2.7.2/ root@slave1:/opt/hadoop-2.7.2/
sudo rsync -avxP /opt/hadoop-2.7.2/ root@slave2:/opt/hadoop-2.7.2/
sudo rsync -avxP /opt/hadoop-2.7.2/ root@slave3:/opt/hadoop-2.7.2/
parallel-ssh -i -h ~/cluster/slaves.txt -l root "chown -R hduser:hadoop /opt/hadoop-2.7.2/"
parallel-ssh -i -h ~/cluster/slaves.txt -l root "ln -s /opt/hadoop-2.7.2 /usr/local/hadoop"
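A quick way to confirm the copy and the symlink worked everywhere is to ask each slave for its Hadoop version (this assumes the same slaves.txt host list used above; every node should report the same version string as the master):

```shell
# Run "hadoop version" through the symlinked install path on every slave.
parallel-ssh -i -h ~/cluster/slaves.txt -l hduser "/usr/local/hadoop/bin/hadoop version"
```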
Install the utilities needed to interact with the HDFS NFS mount:
sudo apt-get install nfs-common
Finally, update the master node’s .bashrc file for the hduser user to add Hadoop to the PATH:
su hduser
vi ~/.bashrc
And add these lines at the end:
export PATH=$PATH:/usr/local/hadoop/sbin:/usr/local/hadoop/bin
Starting and Stopping HDFS
Before you start HDFS for the first time, you need to format the HDFS NameNode. Ensure that you are logged into the master node as hduser for these commands.
hdfs namenode -format
Then to start HDFS and the NFS server:
start-dfs.sh
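Once start-dfs.sh returns, you can sanity-check which Java daemons came up with jps (the daemon names below are the standard Hadoop 2.x ones; the exact list depends on your configuration):

```shell
# On the master you would expect to see something like:
#   NameNode, SecondaryNameNode, DataNode (the master is also a slave here)
# On each slave: DataNode only.
jps
```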
The Hadoop convention is that your files live in a directory under /user named after your account. Use this command to create it, substituting your own username:
hdfs dfs -mkdir -p /user/michael
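To verify the new directory and see the round trip in action, you can copy a local file into HDFS and list it back (the username and file name here are placeholders):

```shell
# Create a small local file, copy it into the HDFS home directory, list it.
echo "hello hdfs" > /tmp/hello.txt
hdfs dfs -put /tmp/hello.txt /user/michael/
hdfs dfs -ls /user/michael
```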
To stop HDFS:
stop-dfs.sh
That’s it. You now have Hadoop installed and configured on your ODROID XU4 cluster. In the next post, we will explore how to connect to HDFS and what we can do with Hadoop.