Personal Compute Cluster – 2019 Edition

This project walks you through all the steps necessary to create a distributed compute cluster for personal productivity and for exploring big data and machine learning technologies such as Hadoop, Spark, and Elasticsearch. The cluster is built from the very economical EGLOBAL S200 mini computer and provides a total of 48 threads and 256 GB of RAM across four nodes, making it well suited to running moderately sized Apache Spark jobs against data sets up to roughly a terabyte in size.
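
To give a concrete sense of what a job on this cluster looks like, here is a minimal PySpark sketch. The master hostname, executor settings, and data path are illustrative assumptions chosen only to match the cluster's stated totals (12 threads and 64 GB of RAM per node); they are not values prescribed by this guide.

```python
from pyspark.sql import SparkSession

# Minimal sketch of a Spark job sized for this cluster. All hostnames,
# paths, and resource values below are assumptions for illustration;
# adjust them to match your own deployment.
spark = (
    SparkSession.builder
    .appName("cluster-smoke-test")
    .master("spark://node1:7077")            # hypothetical master hostname
    .config("spark.executor.cores", "6")     # half of each node's 12 threads
    .config("spark.executor.memory", "48g")  # leaves OS headroom out of 64 GB per node
    .getOrCreate()
)

# Count the rows of a (hypothetical) Parquet data set stored on the cluster.
df = spark.read.parquet("/data/example.parquet")
print(f"row count: {df.count()}")

spark.stop()
```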

Building the Cluster

Below are the steps to obtain, set up, and configure a Personal Compute Cluster.

Installing Data Analysis Software

Cluster Performance

More to come …