This project walks you through all the steps necessary to create a distributed compute cluster for personal productivity and for exploring big data and machine learning technologies such as Hadoop, Spark, and Elasticsearch. The cluster leverages the very economical EGLOBAL S200 mini computer and provides a total of 48 threads and 256 GB of RAM across four nodes, making it well suited to running moderately sized Apache Spark jobs against terabyte-scale data sets with reasonable productivity.
Building the Cluster
Below are the steps to obtain, set up, and configure a Personal Compute Cluster.
- Design and Hardware Selection
- Hardware Construction
- Operating System Installation and Configuration
- Setting up a Docker Swarm (sketched below)
- Creating a GlusterFS Volume (sketched below)
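
As a rough illustration of the swarm step, the sketch below drives the setup through the Docker SDK for Python rather than the `docker` CLI the guide itself uses. The manager address is a placeholder, not a value from the guide, and each function is meant to run on the node named in its docstring.

```python
"""A minimal sketch of the swarm-creation step using the Docker SDK for
Python (docker-py). The address below is a placeholder for your own
manager node, not a value from this guide."""
import docker

MANAGER_ADDR = "192.168.1.10"  # hypothetical LAN address of the manager node


def init_manager() -> str:
    """Run on the manager node: create the swarm and return the worker join token."""
    client = docker.from_env()
    client.swarm.init(advertise_addr=MANAGER_ADDR)
    return client.swarm.attrs["JoinTokens"]["Worker"]


def join_worker(worker_token: str) -> None:
    """Run on each worker node, using the token returned by init_manager()."""
    client = docker.from_env()
    client.swarm.join(
        remote_addrs=[f"{MANAGER_ADDR}:2377"],
        join_token=worker_token,
    )


def list_nodes() -> None:
    """Run on the manager node: confirm all four nodes are part of the swarm."""
    client = docker.from_env()
    for node in client.nodes.list():
        hostname = node.attrs["Description"]["Hostname"]
        state = node.attrs["Status"]["State"]
        print(f"{hostname}: {state}")
```

The join step runs once on each of the three remaining nodes; `list_nodes()` on the manager should then report all four nodes as ready.

Similarly, a hedged sketch of the GlusterFS step: it simply wraps the standard `gluster` CLI commands (peer probe, volume create, volume start) in Python. The hostnames, brick path, and volume name are placeholders, and the commands must run as root on one of the cluster nodes.

```python
"""A minimal sketch of creating a replicated GlusterFS volume by driving the
gluster CLI with subprocess. All names and paths here are hypothetical."""
import subprocess

NODES = ["node1", "node2", "node3", "node4"]  # hypothetical hostnames
BRICK_PATH = "/data/gluster/brick1"           # hypothetical brick directory
VOLUME = "swarm-vol"                          # hypothetical volume name


def run(cmd: list[str]) -> None:
    """Echo a command, run it, and raise if it fails."""
    print("+", " ".join(cmd))
    subprocess.run(cmd, check=True)


# Form the trusted storage pool by probing the other nodes from the first one.
for peer in NODES[1:]:
    run(["gluster", "peer", "probe", peer])

# Create a replicated volume with one brick per node, then start it.
bricks = [f"{node}:{BRICK_PATH}" for node in NODES]
run(["gluster", "volume", "create", VOLUME, "replica", str(len(NODES))] + bricks)
run(["gluster", "volume", "start", VOLUME])
```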
Installing Data Analysis Software
- Deploying Spark with GlusterFS as a Stack on a Docker Swarm
- Deploying Spark with the Quantcast File System (QFS) on a Docker Swarm (a usage sketch follows this list)
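
Once either stack is up, any node can submit work to it. The sketch below is a hypothetical smoke test rather than part of the guide: the Spark master URL, the mounted data path, and the executor sizing are assumptions based on the four-node, 48-thread, 256 GB configuration described above.

```python
"""A minimal PySpark sketch for exercising the deployed cluster. The master
URL, data path, and resource settings are placeholders, not values taken
from this guide."""
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .master("spark://swarm-manager:7077")    # hypothetical Spark master URL
    .appName("cluster-smoke-test")
    .config("spark.executor.memory", "48g")  # leaves headroom on a 64 GB node
    .config("spark.executor.cores", "12")    # one executor per 12-thread node
    .getOrCreate()
)

# Read a Parquet data set from the shared volume (assumed mounted at /data on
# every node) and run a simple aggregation to confirm the work is distributed.
df = spark.read.parquet("/data/events.parquet")
df.groupBy("event_type").count().show()

spark.stop()
```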
Cluster Performance
More to come …