This project walks you through all the steps necessary to create a distributed compute cluster for personal productivity and for exploring big data and machine learning technologies such as Hadoop, Spark, and Elasticsearch. The cluster leverages the highly economical EGLOBAL S200 mini computer, providing a total of 48 threads and 256 GB of RAM across four nodes, which makes it well suited to running moderately sized Apache Spark jobs that analyze data sets on the order of a terabyte with reasonable turnaround times.
Building the Cluster
Below are the steps to obtain, set up, and configure a Personal Compute Cluster.
- Design and Hardware Selection
- Hardware Construction
- Operating System Installation and Configuration
- Setting up a Docker Swarm
- Creating a GlusterFS Volume
- Installing Data Analysis Software
More to come …