Quantcast File System 1.2 for ARM71

I have been using the Quantcast File System (QFS) as my primary distributed file system on my ODROID XU4 cluster.  Due to QFS’s low memory footprint. it works well with Spark, allowing me to assign as much of the ODROID XU4’s limited 2 GB RAM footprint to the Spark executor running on a node. Recently, QFS 1.2 was released. This version brings many features and updates, many not relevant to my ODROID cluster use case. However, the most notable updates relevant to the ODROID XU4 cluster include: Correct Spark’s ability to create a hive megastore on a new QFS instance (QFS-332) Improved error reporting in the QFS/HDFS shim HDFS shim for the Hadoop 2.7.2 API, which the latest versions of Spark use. In this post, I will update the ODROID XU4 cluster to use QFS 1.2.0. I have also updated my original post on using QFS with Spark, which you should use if you are starting from scratch. Install  QFS 1.2 cd /opt Read More …

Using the Quantcast File System with Spark on the ODROID XU4 Cluster

I used to work at a company named Quantcast, which is known for processing petabyte-scale data sets. When operating at that scale, any inefficiency in your data infrastructure gets multiplied many times over. Early on, Quantcast found that the inefficiencies in Hadoop literally cost them by requiring more hardware to obtain certain levels of performance. So Quantcast set out to write a highly efficient big data stack. At the bottom of that stack was Quantcast’s distributed file system called QFS. Quantcast open-sourced QFS, and performed several studies to show the efficiency gains QFS had over HDFS. Given the first hand experience I have had with QFS performance, I thought it would interesting to get it running on the ODROID XU4 cluster in combination with Spark to see how much QFS would improve processing speeds in our set up. Spoiler alert: there was no time improvements, but QFS required less memory than HDFS, allowing me to assign more RAM to Spark. Installing Read More …