HDFS – DIY Big Data

Adding a New Node to the ODROID XU4 Cluster

August 20, 2016August 28, 2016 michael

I recently acquired another ODROID XU4 device (and a MicroSD card for bulk storage) to add it to my XU4 cluster. This new node brings my node count to five. Adding a new node to the cluster is relatively straight forward, but there are a lot of details. In the modern datacenter, this operation would be accomplished through a package manager, which would build the new node according to an image. However, I haven’t set up package management on my cluster so I will need to sit up the new node manually. In the original cluster configuration, I used a 5-port ethernet switch for the internal network. Given that there was an open port, no additional hardware beyond the new node is needed. However, if I were to add a sixth node (or beyond), I would need to update my ethernet switch to something such as an 8-port switch. I will also note that using the 40 mm PCB spacers I originally ordered makes Read More …

Connecting to HDFS from an computer external to the cluster

August 13, 2016August 13, 2016 michael

Since I have set up my ODROID XU4 cluster to work with Spark from a Jupyter web notebook, one of the little annoyances I have had is how inefficient it was to transferring data into the HDFS file system on the cluster. It involved downloading data to my Mac laptop via a web download, transferring that data to the master node, then running hdfs dfs -put to place the data onto the file system. While I did set up my cluster to create an NFS mount, it can be very slow when transferring large files to HDFS. So, my solution to this was to install HDFS on my Mac laptop, and configure it to use the ODROID XU4 cluster. One of the requirements for an external client (e.g., your Mac laptop) to interact with HDFS on a cluster is that the external client must be able to connect to every node in the cluster. Since the master node of the cluster was Read More …

Running the Word Count Job with Hadoop

July 2, 2016July 16, 2016 michael

Now that we have Hadoop up and running on the ODROID XU4 cluster, let’s take it for a spin. Every technical platform has a “Hello World” type project, and for Hadoop and other map-reduce platforms, it is Word Count. In fact, the Apache Hadoop MapReduce Tutorial uses Word Count to demonstrate how to write a Hadoop job. We are going to use that tutorial’s code. However, at the time of this writing the WordCount example on the Apache Hadoop site has a bug in it. I’ve corrected that bug and posted the updated code to my Github repository for the ODROID XU4 cluster project. Log into the master node as hduser. We need to update the environment variables so that you can easily build Hadoop jobs. Do this by editing the .bashrc file and add the following at the end: export JAVA_HOME=$(readlink -f /usr/bin/java | sed “s:bin/java::”).. export PATH=$PATH:/usr/local/hadoop/sbin:/usr/local/hadoop/bin:${JAVA_HOME}/bin export HADOOP_CLASSPATH=${JAVA_HOME}/lib/tools.jar Then grab the corrected Word Count job code from the Read More …

One byte at a time …

Tag: HDFS

Adding a New Node to the ODROID XU4 Cluster

Connecting to HDFS from an computer external to the cluster

Running the Word Count Job with Hadoop