I used to work at a company named Quantcast, which is known for processing petabyte-scale data sets. When operating at that scale, any inefficiency in your data infrastructure gets multiplied many times over. Early on, Quantcast found that the inefficiencies in Hadoop literally cost them by requiring more hardware to obtain certain levels of performance. So Quantcast set out to write a highly efficient big data stack. At the bottom of that stack was Quantcast’s distributed file system called QFS. Quantcast open-sourced QFS, and performed several studies to show the efficiency gains QFS had over HDFS. Given the first hand experience I have had with QFS performance, I thought it would interesting to get it running on the ODROID XU4 cluster in combination with Spark to see how much QFS would improve processing speeds in our set up. Spoiler alert: there was no time improvements, but QFS required less memory than HDFS, allowing me to assign more RAM to Spark.
While QFS can be a more powerful file system, it is also a bit more raw in terms of supporting tools and installation ease as compared to our previous experience with installing HDFS. In such, there are more details to this installation to get the file system running. I did create a tool that will enable you to launch QFS on the cluster with the same ease of launching the HDFS file system, but more on that later.
Note that these instructions could be applied to the cluster as it exists after the last blog post, or on a new cluster that has just been set up with no installed software. Ultimately you should choose if you are going to persist with HDFS or QFS as your file system on the cluster.
The first thing to do is to install all of QFS itself. While Quantcast does distribute some pre-built binaries, none of them are built for ARM system. Compiling QFS is rather straightforward, but time consuming. I have made a build for ARMv71 of QFS’s last stable release.
To directly download my distributions of QFS from my server, do the following as hduser on the master node as
cd /opt sudo wget http://diybigdata.net/downloads/qfs/qfs-ubuntu-16.04.3-1.2.1-armv7l.tgz sudo tar xvzf qfs-ubuntu-16.04.3-1.2.1-armv7l.tgz sudo chown -R hduser:hadoop qfs-ubuntu-16.04.3-1.2.1-armv7l sudo ln -s /opt/qfs-ubuntu-16.04.3-1.2.1-armv7l /usr/local/qfs
Now we have to make some directories that do not come with the standard distribution of QFS, then sync the software out to the slaves. We sync the software before creating the configuration files since we have to have different configuration files between the master and slave nodes.
mkdir /usr/local/qfs/conf mkdir /usr/local/qfs/sbin parallel-ssh -i -h ~odroid/cluster/all.txt -l root "mkdir -p /data/qfs/logs" parallel-ssh -i -h ~odroid/cluster/all.txt -l root "mkdir -p /data/qfs/checkpoint" parallel-ssh -i -h ~odroid/cluster/slaves.txt -l hduser "mkdir /data/qfs/chunk" parallel-ssh -i -h ~odroid/cluster/all.txt -l root "chown -R hduser:hadoop /data/qfs" rsync -avz /opt/qfs-ubuntu-16.04.3-1.2.1-armv7l/ root@slave1:/opt/qfs-ubuntu-16.04.3-1.2.1-armv7l rsync -avz /opt/qfs-ubuntu-16.04.3-1.2.1-armv7l/ root@slave2:/opt/qfs-ubuntu-16.04.3-1.2.1-armv7l rsync -avz /opt/qfs-ubuntu-16.04.3-1.2.1-armv7l/ root@slave3:/opt/qfs-ubuntu-16.04.3-1.2.1-armv7l parallel-ssh -i -h ~odroid/cluster/slaves.txt -l root "chown -R hduser:hadoop /opt/qfs-ubuntu-16.04.3-1.2.1-armv7l" parallel-ssh -i -h ~odroid/cluster/slaves.txt -l root "ln -s /opt/qfs-ubuntu-16.04.3-1.2.1-armv7l /usr/local/qfs" parallel-ssh -i -h ~/cluster/all.txt -l root "apt-get update" parallel-ssh -i -h ~/cluster/all.txt -l root "apt-get install libboost-regex-dev -y"
Now install configuration and tools from the odroid-xu4-cluster Github repository. Note that the configuration found in this repository assumes that the XU4 cluster is configured as previously described, including checking out the git repository.
cd ~/odroid-xu4-cluster/ git pull cp qfs/sbin/* /usr/local/qfs/sbin/ cp qfs/configuration/* /usr/local/qfs/conf/ parallel-scp -h ~odroid/cluster/slaves.txt -l hduser qfs/configuration/Chunkserver.prp /usr/local/qfs/conf/
Let’s review the various configuration files. The set of configurations that went on the master node controls the set up of Meta Server, which is the equivalent to a Name Node in HDFS. First there is
Metaserver.prp configuration file:
metaServer.clientPort = 20000 metaServer.chunkServerPort = 30000 metaServer.logDir = /data/qfs/logs metaServer.cpDir = /data/qfs/checkpoint metaServer.recoveryInterval = 30 metaServer.clusterKey = qfs-odroid-xu4 metaServer.rackPrefixes = 10.10.10.2 1 10.10.10.3 2 10.10.10.4 3 metaServer.msgLogWriter.logLevel = INFO chunkServer.msgLogWriter.logLevel = NOTICE metaServer.rootDirMode = 0777 metaServer.rootDirGroup = 1001 metaServer.rootDirUser = 1001
The last two lines are of particular interest as they set the UID and GID of the user that writes to file system. QFS has the option of enforcing permissions through security schemes like Kerberos. However we are not going to use that level of authentication to the file system, but QFS still needs to know a user that files are to be written under. Here I configured things to use the hduser and hadoop group on the master server. You should check whether your installation has the same UID and GID for those users as mine did. To check:
id -u hduser id -g hduser
If your UID and GID are different, then update the configuration file accordingly. Another configuration file on master node is the wedUI.cfg file which controls the monitoring UI for QFS:
[webserver] webServer.metaserverHost = 10.10.10.1 webServer.metaserverPort = 20000 webServer.port = 20050 webServer.docRoot = /usr/local/qfs/webui/files/ webServer.host = 0.0.0.0 webserver.allmachinesfn = /dev/null
Finally, there is the
qfs-env.sh file which is used to set key environment variables to be used when launching QFS using the
#!/bin/bash # # QFS specific environment variables # export QFS_LOGS_DIR=/data/qfs/logs
On each of the slave nodes we installed the configuration file for the Chunk Servers,
Chunkserver.prp, which hold the blocks of data that gets saved to QFS. It’s configuration file contains:
chunkServer.metaServer.hostname = 10.10.10.1 chunkServer.metaServer.port = 30000 chunkServer.clientPort = 20000 chunkServer.chunkDir = /data/qfs/chunk chunkServer.clusterKey = qfs-odroid-xu4 chunkServer.stdout = /dev/null chunkServer.stderr = /dev/null chunkServer.ioBufferPool.partitionBufferCount = 65536 chunkServer.msgLogWriter.logLevel = INFO chunkServer.diskQueue.threadCount = 2
You will notice that both the
Chunkserver.prp files contains a variable named
chunkServer.clusterKey. This variable’s value is simply a string that internally identifies the instance of QFS the configuration file pertains to. On very large clusters, it is not uncommon to run multiple instances of a distributed file system. The typical pattern is to have a read-only instance that contains the “production data”, and then a read-write instance where jobs can do work. QFS was designed to permit many different instances of file systems to run on the same cluster and enables it through having each process know which file system instance it belongs to, as identified by the
chunkServer.clusterKey value. Though we are running only one instance of QFS, we still need to set the cluster key so that the chunk servers will connect properly to the meta server.
We now have QFS fully installed. We need to initialize the Meta Server (note that we only ever do this once):
/usr/local/qfs/bin/metaserver -c /usr/local/qfs/conf/Metaserver.prp
To launch QFS, use the launch script installed from the git repository:
Putting Data onto QFS
Let’s put the concatenated blog dat that we constructed previously onto our QFS instance. Note that these are fairly verbose commands.
/usr/local/qfs/bin/tools/qfsshell -s master -p 20000 -q -- mkdir /user/michael/data /usr/local/qfs/bin/tools/cptoqfs -s master -p 20000 -d ~/blogs_all.xml -k /user/michael/data/blogs_all.xml -r 2
More documentation on other QFS commands available and how to configure things to be less verbose can be found at the QFS wiki.
If you already installed Spark in accordance to my last post on installing Spark with Hadoop, do not skip this section as you will need to install a different version of Spark again. The reason for this is that QFS provides bindings that allows clients to interact with it as if it were HDFS. These instructions are written assuming that you do not have Spark already installed to work with Hadoop. If you wish to keep both use cases for Spark available, you will need to install spark into a different directory and have a different symlink in
/usr/local. I leave it as an exercise to the reader to update these instructions as appropriate to accomplish that.
The first step is to download and install a version of Spark that was compiled to work with the ARM71 processor:
cd /opt sudo wget http://diybigdata.net/downloads/spark/spark-2.2.0-bin-hadoop2.7-arm71-double-alignment.tgz sudo tar xvzf spark-2.2.0-bin-hadoop2.7-arm71-double-alignment.tgz sudo chown -R hduser:hadoop spark-2.2.0-bin-armhf sudo ln -s /opt/spark-2.2.0-bin-armhf /usr/local/spark
Now we need to configure Spark to run in standalone cluster mode:
cd /usr/local/spark/conf cp spark-env.sh.template spark-env.sh vi spark-env.sh
Add the following lines somewhere in the
LD_LIBRARY_PATH=/usr/local/qfs/lib SPARK_DIST_CLASSPATH=/usr/local/qfs/lib/hadoop-2.7.2-qfs-1.2.1.jar:/usr/local/qfs/lib/qfs-access-1.2.1 SPARK_HOME=/usr/local/spark/ PYSPARK_PYTHON=python3 PYTHONHASHSEED=12121969 HADOOP_CONF_DIR=/usr/local/spark/conf/ SPARK_LOCAL_DIRS=/data/spark SPARK_WORKER_MEMORY=1300M SPARK_WORKER_CORES=2
Next we need to set up the defaults configuration values that Spark uses.
cp spark-defaults.conf.template spark-defaults.conf vi spark-defaults.conf
Add the following lines to the file:
spark.master spark://master:7077 spark.serializer org.apache.spark.serializer.KryoSerializer spark.executor.memory 1300M spark.driver.memory 1000M spark.io.compression.codec lz4 spark.driver.cores 2 spark.executor.cores 1 # These settings help prevent Spark from thinking the ODROID XU4 node # is failing when a thread gets slowed down etither by the governor or # getting on a slower core. # # spark.akka.timeout is only needed pre-Spark 2.0 #spark.akka.timeout 200 spark.network.timeout 800 spark.worker.timeout 800 # This setting is to tell the class loaders in Spark that they # only need to load the QFS access libraries once spark.sql.hive.metastore.sharedPrefixes com.quantcast.qfs
We need to tell spark where the slave nodes are.
cp slaves.template slaves vi slaves
Add the following lines to the slaves file:
slave1 slave2 slave3
You will note that in the
spark-defaults.conf file we set the
HADOOP_CONF_DIR to this Spark installation’s configuration directory. The reason for that is we cannot use our original Hadoop configuration here as it points to HDFS. Instead, we need to tell Spark to use QFS. To do that, create a
core-site.xml file in this Spark installation’s configuration directory:
And set the contents to:
<?xml version="1.0"?> <?xml-stylesheet type="text/xsl" href="configuration.xsl"?> <!-- Setting for QFS--> <configuration> <property> <name>fs.qfs.impl</name> <value>com.quantcast.qfs.hadoop.QuantcastFileSystem</value> </property> <property> <name>fs.defaultFS</name> <value>qfs://master:20000</value> </property> <property> <name>fs.qfs.metaServerHost</name> <value>master</value> </property> <property> <name>fs.qfs.metaServerPort</name> <value>20000</value> </property> </configuration>
The final step is to copy the Spark software and configuration to each of the slaves. As the odroid user on the master node:
rsync -avxz /opt/spark-2.2.0-bin-armhf/ root@slave1:/opt/spark-2.2.0-bin-armhf rsync -avxz /opt/spark-2.2.0-bin-armhf/ root@slave2:/opt/spark-2.2.0-bin-armhf rsync -avxz /opt/spark-2.2.0-bin-armhf/ root@slave3:/opt/spark-2.2.0-bin-armhf parallel-ssh -i -h ~odroid/cluster/slaves.txt -l root "chown -R hduser:hadoop /opt/spark-2.2.0-bin-armhf" parallel-ssh -i -h ~odroid/cluster/slaves.txt -l root "ln -s /opt/spark-2.2.0-bin-armhf /usr/local/spark"
We now have an installation of Spark set up to work with QFS.
You may skip this section of you previously installed Jupyter base on the previous post.
Installing Juypyter Notebooks is relatively simple. As
hduser on the master node:
sudo mkdir /data/jupyter sudo chown -R hduser:hadoop /data/jupyter/ mkdir ~/notebooks sudo apt-get install python3-pip sudo pip3 install jupyter
Running Word Count
Before running the word count program on Spark with the QFS data store, we need to launch spark than Jupyter (assuming that QFS and Spark are already running). Note that you should replace the IP address in the
--ip parameter with the external IP address of your cluster.
/usr/local/spark/sbin/start-all.sh XDG_RUNTIME_DIR="/data/jupyter" PYSPARK_DRIVER_PYTHON=jupyter PYSPARK_DRIVER_PYTHON_OPTS="notebook --no-browser --ip 192.168.1.50 --port=7777 --notebook-dir=/home/hduser/notebooks" /usr/local/spark/bin/pyspark --master spark://master:7077
Note the use of the Spark installation set up for QFS. Once it is running, point your computer’s browser to
http://<cluster IP address>:7777/. You will see the Jupyter interface. Create a new notebook and enter the word count program using the QFS data source into the first cell and press shift-enter:
from operator import add blog_data = sc.textFile('qfs://master:20000/user/michael/data/blogs_all.xml').cache() counts = blog_data.flatMap(lambda x: x.split(' ')).map(lambda x: (x, 1)).reduceByKey(add) counts.take(100)
On my system, the cluster took 2.3 minutes to complete word count versus 2.8 minutes with HDFS as we previously set it up. While this is certainly a speed up, the real reason QFS was faster in this case is because the original run with HDFS was using a block size of 16 MB in HDFS. This means there were more part files for HDFS to read and Spark to process, which created about a half of minute of additional overhead. I changed the HDFS block size to 64 MB, which is what QFS is hard coded to use as a block size (you can only change QFS’s block size if you change the code and recompile), and the Spark on HDFS time for this problem sped up to 2.3 minutes. As I have previously pointed out, reducing the number of blocks processed has significant impact in map/reduce technologies like Spark and Hadoop.
It is also worth noting that the small ODROID XU4 cluster doesn’t really leverage QFS’s full potential. The is because QFS is designed to address the inefficiencies that happen at very large scale use. For example, if we had at least chunk server 9 nodes, QFS could use Reed-Solomon encoding of data which makes more efficient the data replication and recovery process.
Shutting Things Down
To shut down Jupyter, Spark, and QFS, first close and halt any open notebooks in Jupyter, then press control-C twice in the terminal that launched Jupyter. Now run the shutdown scripts for Spark and QFS: