Improving Linux Kernel Network Configuration for Spark on High Performance Networks

My Personal Compute Cluster recent had a failure where only of my nodes disassociated from the cluster and the 2.5 Gbps high speed ethernet link that I had set up through a USB dongle became unresponsive. Investigating the problem, I saw in the system log on that node that the kernel thought it was getting a SYN flood through the 2.5 Gbps ethernet link. Basically, the kernel turned off that networking link because it thought it was getting a DDoS attack. Clearly there wasn’t a true DDoS attack happening since my cluster is on its own network. I researched what would cause this and learned that the standard Linux kernel networking configuration is tuned for 1 Gbps ethernet links. Basically, the intense data transfer between my Spark nodes over the 2.5 Gbps ethernet links filled the kernel’s network queue. To fix the problem, I had to increase the size of the queue. To make the needed improvements to how the Read More …