Our first task in building any cluster is to design how it will be set up, most notably how the nodes will interact with each other. The cluster we will be building will have four nodes: one master node and three slaves. The nodes will be connected to one another through the ethernet switch. However, we want the node-to-node communication to be on its own network. This keeps node-to-node traffic from competing with traffic on the external network, which is important for distributed computation, and it also makes the cluster behave more like a single device to the external network. The benefit of doing this is that we can add nodes to the cluster, or remove them, without a client ever knowing. This approach does present some challenges with how an external client (e.g., your laptop) will interact with the data analysis software, such as Hadoop, but we will deal with that later. My goal is really to create a “data analysis appliance”, so requiring any client to understand the cluster’s network topology is contrary to that goal.
To keep the cluster network isolated from the external network, a router will need to be introduced. This could easily have been accomplished by purchasing a router rather than a switch; however, there is a significant cost difference between the two. The good news is that the master node can perform all the functions of a router if it has an additional ethernet port. For this reason, we will purchase a USB ethernet dongle and attach it to the master node.
In order for the master node to perform the duties of a router, we will need to configure it both to act as a DHCP server for the cluster network and to provide NAT between the external network and the cluster network. The DHCP server will assign IP addresses to each of the slave nodes, and the NAT service allows the slave nodes to reach out to the external network if needed. This overall cluster networking design is illustrated below.
It is a fairly simple design, but it is important to think these things through before building any cluster.
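To make the addressing side of this design concrete, here is a minimal Python sketch of the plan described above. The subnet `10.10.10.0/24`, the master's cluster-side address `10.10.10.1`, and the helper names `dhcp_pool` and `needs_nat` are all assumptions made up for illustration; the actual values are chosen when the master node is configured. The sketch only models which addresses a DHCP server could hand to the three slaves and which packets would have to pass through NAT. It does not configure anything.

```python
import ipaddress

# Assumed values for illustration only; the design above does not fix them.
CLUSTER_NET = ipaddress.ip_network("10.10.10.0/24")  # hypothetical cluster subnet
MASTER_ADDR = ipaddress.ip_address("10.10.10.1")     # hypothetical master address
SLAVE_COUNT = 3                                      # three slaves, as in the design


def dhcp_pool(network, reserved, count):
    """Return the first `count` usable host addresses that are not reserved."""
    pool = []
    for host in network.hosts():
        if host in reserved:
            continue
        pool.append(host)
        if len(pool) == count:
            break
    return pool


def needs_nat(source, destination, network=CLUSTER_NET):
    """True when a packet goes from inside the cluster network to outside it."""
    src = ipaddress.ip_address(source)
    dst = ipaddress.ip_address(destination)
    return src in network and dst not in network


# Addresses the master's DHCP server could lease to the slave nodes.
slave_addrs = dhcp_pool(CLUSTER_NET, {MASTER_ADDR}, SLAVE_COUNT)

if __name__ == "__main__":
    print("master (cluster side):", MASTER_ADDR)
    for i, addr in enumerate(slave_addrs, start=1):
        print(f"slave {i}:", addr)
    # Slave to the outside world: must be translated by the master.
    print("slave -> external needs NAT:", needs_nat("10.10.10.2", "8.8.8.8"))
    # Slave to slave: stays on the switch and never touches NAT.
    print("slave -> slave needs NAT:", needs_nat("10.10.10.2", "10.10.10.3"))
```

The point of the sketch is the split it makes visible: traffic between slaves never leaves the cluster subnet, while anything bound for an outside address has to go through the master node.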