Improving Linux Kernel Network Configuration for Spark on High Performance Networks

My Personal Compute Cluster recently had a failure in which one of my nodes disassociated from the cluster and the 2.5 Gbps high-speed Ethernet link that I had set up through a USB dongle became unresponsive. Investigating the problem, I saw in the system log on that node that the kernel thought it was receiving a SYN flood through the 2.5 Gbps Ethernet link. In other words, the kernel turned off that network link because it thought it was under a DDoS attack. Clearly there was no real DDoS attack, since my cluster is on its own network. I researched what would cause this and learned that the standard Linux kernel networking configuration is tuned for 1 Gbps Ethernet links. The intense data transfer between my Spark nodes over the 2.5 Gbps Ethernet links filled the kernel's network queue. To fix the problem, I had to increase the size of the queue. To make the needed improvements to how the …
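As a rough illustration of "increasing the size of the queue", the Python sketch below reads two queue-related kernel parameters from `/proc/sys` and scales them by the ratio of the link speeds (2.5 Gbps / 1 Gbps). It is a sketch under assumptions, not the exact fix: treating `net.core.netdev_max_backlog` (the maximum number of received packets queued while the kernel catches up) and `net.ipv4.tcp_max_syn_backlog` (the maximum number of connection requests that have not yet completed the TCP handshake) as the relevant knobs, and scaling them proportionally to link speed, are my assumptions for illustration.

```python
from __future__ import annotations

from pathlib import Path

# Assumed speed ratio: the new 2.5 Gbps links versus the 1 Gbps links the
# default kernel settings are said to be tuned for.
SPEED_RATIO = 2.5 / 1.0

# Queue-related sysctl keys ASSUMED (not confirmed by the post) to matter here.
KEYS = ("net.core.netdev_max_backlog", "net.ipv4.tcp_max_syn_backlog")


def proc_path(key: str) -> Path:
    """Map a dotted sysctl key to its file under /proc/sys."""
    return Path("/proc/sys") / key.replace(".", "/")


def read_current(key: str) -> int | None:
    """Return the running kernel's value for `key`, or None if unreadable."""
    try:
        return int(proc_path(key).read_text().split()[0])
    except (OSError, ValueError, IndexError):
        return None


def scaled_settings(current: dict[str, int],
                    ratio: float = SPEED_RATIO) -> dict[str, int]:
    """Scale each queue limit by the link-speed ratio (a heuristic only)."""
    return {key: round(value * ratio) for key, value in current.items()}


def sysctl_fragment(settings: dict[str, int]) -> str:
    """Render settings as `key = value` lines, the format sysctl.d files use."""
    return "".join(f"{k} = {v}\n" for k, v in sorted(settings.items()))


if __name__ == "__main__":
    current = {}
    for key in KEYS:
        value = read_current(key)
        if value is not None:
            current[key] = value
    print(sysctl_fragment(scaled_settings(current)), end="")
```

Saving the output to a file such as `/etc/sysctl.d/99-spark-network.conf` (a hypothetical name) and running `sudo sysctl --system` would load it at runtime and on every boot. Any value it proposes should be checked against the kernel's own networking documentation before use.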

Configuring DHCP and NAT in ODROID XU4 Cluster

UPDATE – I have rebuilt this cluster to use Ubuntu 16.04. You can find updated instructions for Ubuntu 16.04 here. As was discussed in the network design post, we will set up the master node as a router to manage network traffic in and out of the cluster. Before starting, ensure that all of the slave nodes are powered down, that your home network is still connected directly to the open port on the cluster's Ethernet switch, that you have collected each node's MAC address, and that the master node is powered up and you are logged into it via SSH. The first step is to explicitly set up the network interfaces for both the eth0 and eth1 devices on the master node. Note that by default the Ubuntu system installed on the master node treats eth0 as requesting a DHCP lease on the network it is attached to. This is why it got an IP …
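The MAC addresses collected above are what let the master hand each slave a predictable address. A small generator for the DHCP reservations can save some typing; this Python sketch assumes the ISC DHCP server (`dhcpd`) `host` block syntax, an internal subnet of `10.10.10.0/24` with the master at `10.10.10.1` on eth1, and eth0 as the external interface. None of those specifics come from this excerpt. It also builds the usual iptables masquerading rules for the NAT half of the setup.

```python
from __future__ import annotations

import ipaddress
import re

# Six colon-separated lowercase hex pairs, e.g. "00:1e:06:aa:bb:01".
MAC_RE = re.compile(r"[0-9a-f]{2}(?::[0-9a-f]{2}){5}")


def dhcpd_hosts(macs: dict[str, str], subnet: str = "10.10.10.0/24") -> str:
    """Render ISC dhcpd `host` blocks giving each slave a fixed address.

    ASSUMPTION: the master's eth1 owns the first usable address in `subnet`,
    so slaves are numbered from the second usable address onward, in the
    order they appear in `macs`.
    """
    hosts = iter(ipaddress.IPv4Network(subnet).hosts())
    next(hosts)  # reserved for the master's internal interface
    blocks = []
    for name, mac in macs.items():
        mac = mac.lower()
        if not MAC_RE.fullmatch(mac):
            raise ValueError(f"bad MAC address for {name}: {mac!r}")
        blocks.append(
            f"host {name} {{\n"
            f"  hardware ethernet {mac};\n"
            f"  fixed-address {next(hosts)};\n"
            "}\n"
        )
    return "".join(blocks)


def nat_rules(external: str = "eth0", internal: str = "eth1") -> list[str]:
    """iptables commands that masquerade cluster traffic leaving via `external`."""
    return [
        f"iptables -t nat -A POSTROUTING -o {external} -j MASQUERADE",
        f"iptables -A FORWARD -i {external} -o {internal}"
        " -m state --state RELATED,ESTABLISHED -j ACCEPT",
        f"iptables -A FORWARD -i {internal} -o {external} -j ACCEPT",
    ]


if __name__ == "__main__":
    # Placeholder MAC addresses; substitute the ones collected from each slave.
    slaves = {
        "slave1": "00:1e:06:00:00:01",
        "slave2": "00:1e:06:00:00:02",
        "slave3": "00:1e:06:00:00:03",
    }
    print(dhcpd_hosts(slaves), end="")
    print("\n".join(nat_rules()))
```

For the NAT rules to actually forward packets, IP forwarding also has to be enabled on the master (`net.ipv4.ip_forward = 1`). The hostnames and MAC addresses in the demo are placeholders.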

Configuring the ODROID XU4 Operating System

UPDATE – This post was originally written for HardKernel's distribution of Ubuntu 15.10, but has since been changed to use the ODROID server image for Ubuntu 14.04 LTS, which is available from HardKernel here. The motivation for this change was to use an officially supported HardKernel distribution built specifically for headless server applications. Ubuntu 14.04 LTS may be an older distribution, but it works for our purposes. UPDATE 2 – I have since rebuilt this cluster to use Ubuntu 16.04. You can find updated instructions here.

Temporary Networking Setup

When setting up the nodes initially, you will need to SSH into them to configure their settings. However, if we go straight to our network design, we will not be able to connect to any node because the master node is not yet set up as a router. So we will need to connect each node directly to the external network (e.g., your home network). Since the bill of materials called …
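While each node sits temporarily on the home network, it can help to confirm that its SSH port answers before trying to log in. This is a minimal standard-library Python sketch; port 22 assumes the default SSH daemon configuration, and the node addresses are whatever your home router hands out (any address used below is hypothetical).

```python
import socket


def ssh_reachable(host: str, port: int = 22, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port opens within `timeout` seconds.

    This only shows that something is listening on the port; it does not
    authenticate or confirm that the listener is actually sshd.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, unreachable, or timed out
        return False
```

For example, `ssh_reachable("192.168.1.101")` (a hypothetical home-network address) returns `False` until the node has booted and its SSH server is listening.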

Network Design for the Low Cost Cluster

Our first task in building any cluster is to design how it will be set up, most notably how the nodes will interact with each other. The cluster we will be building will have 4 nodes: one master node and three slaves. The nodes will be connected to each other through the Ethernet switch. However, we want the node-to-node communication to be on its own network. This maximizes the throughput of node-to-node communication, which is important for distributed computation, and also makes the cluster behave more like a single device to the external network. The benefit of this is that we can add or remove nodes without a client ever knowing. However, this approach does present some challenges for how an external client (e.g., your laptop) will interact with the data analysis software, such as Hadoop, but we will deal with that later. My goal is really to create a “data analysis appliance”, so requiring any client …
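The address plan behind this design can be sketched in a few lines of Python. The internal subnet `10.10.10.0/24` and the `master`/`slaveN` hostnames are assumptions for illustration. The point the sketch shows is that the slaves live only on the private network, so growing the cluster changes nothing an external client can see; that client only ever talks to the master's external interface.

```python
from __future__ import annotations

import ipaddress
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    name: str
    internal_ip: ipaddress.IPv4Address


def plan_cluster(n_slaves: int, subnet: str = "10.10.10.0/24") -> list[Node]:
    """Lay out the private cluster network: master first, then the slaves.

    The subnet and hostnames are hypothetical defaults, not the post's values.
    """
    net = ipaddress.IPv4Network(subnet)
    if not net.is_private:
        raise ValueError(f"{subnet} is not a private address range")
    if n_slaves + 1 > net.num_addresses - 2:
        raise ValueError(f"{subnet} is too small for {n_slaves + 1} nodes")
    hosts = iter(net.hosts())
    nodes = [Node("master", next(hosts))]
    nodes += [Node(f"slave{i}", next(hosts)) for i in range(1, n_slaves + 1)]
    return nodes


if __name__ == "__main__":
    for node in plan_cluster(3):
        print(f"{node.name:8} {node.internal_ip}")
```

Calling `plan_cluster(4)` returns the same first four entries as `plan_cluster(3)` plus one more slave, which is the add-a-node-without-the-client-knowing property in miniature.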