Upgrading the Compute Cluster to 2.5G Ethernet

I recently updated my Personal Compute Cluster to use a faster ethernet interconnect for the cluster network. After putting together a PySpark Benchmark, I was intrigued to see how much faster networking within the cluster would help.

Note: All product links below are affiliate links.

Upgrading the Cluster Networking Hardware

To upgrade the networking for the EGLOBAL S200 computers that I use within my cluster, my only real option was to use USB ethernet dongles. This is because the S200 computer has no PCIe expansion slots, but it does have a USB 3 bus. That still leaves a few options. There are no 10 Gbps ethernet dongles for USB 3, but there are several 5 Gbps and 2.5 Gbps ethernet dongles. These are part of the more recent NBASE-T ethernet standard, which allows faster-than-1-Gbps ethernet over Cat 5e and Cat 6 cabling.

The first option I investigated was the StarTech USB 3 to 5 Gbps Ethernet Adapter. I will jump to the punch line on this one: I found that using this dongle on the S200 computer was unstable. I was never able to determine the exact reason why, but I suspect the 5 Gbps traffic from the dongle saturated the computer’s USB 3.0 bus, which is itself rated for only 5 Gbps of data transfer. The failures I experienced were intermittent, but they happened under heavy networking load.

So next I looked into 2.5 Gbps ethernet dongles. The good news is that there are many options available at 2.5 Gbps. Ultimately I selected the very well-priced CableCreation USB 3.0 to 2.5 Gigabit Ethernet Adapter. At less than $30 per dongle, this was the most economical option for a 2.5 Gbps USB ethernet adapter, and it worked as expected with zero problems.

The centerpiece of the networking upgrade is the network switch. The original 1 Gbps switch I used in my cluster will not work at the higher speeds that the 2.5 Gbps ethernet adapters enable. I researched the switch options at length. There simply aren’t that many NBASE-T-enabled switches available, and fewer still that are priced for the consumer. Ultimately, I identified these options:

  • NETGEAR 5-Port 10G Multi-Gigabit Ethernet Unmanaged Switch (XS505M) – This switch has four RJ45 ethernet ports that can handle all NBASE-T speeds up to 10 Gbps. The fifth port is an SFP+ port ideal for uplinking to other switches. This option was the most economical for enabling future use as a full 10 G switch.
  • NETGEAR 10-Port Multi-Gigabit/10G Smart Managed Pro Switch (MS510TX) – This switch is the cheapest of the presented options. It has four 1 G ports, two 1/2.5 G ports, two 1/2.5/5 G ports, and one 10 G port shared between an RJ45 and an SFP+ connector. This means it has four ports that can operate at 2.5 Gbps, which would work with my four-node cluster.
  • TRENDnet 8-Port 10G EdgeSmart Switch – This switch has eight 1/2.5/5/10 Gbps ports and is a managed switch, making it the best “future proof” switch I could find. It is certainly not the cheapest switch on this list, but I found it to be a good option in case I want to add more nodes to my cluster or use it as part of other 10 G networking I might need. Since it is a managed switch, I could theoretically split it into functionally separate switches if I need some 10 G ports for something else in the future.

There are other choices, but I found these to be the most intriguing for the indicated reasons. I ultimately went with the TRENDnet 8-port switch listed above because I wanted a switch that would last me a while, but I think any of the listed switches would have worked well with the 2.5 G ethernet adapters I selected.

The final hardware purchase was ethernet cables. While 2.5 Gbps ethernet should work over Cat 5e cabling, I went ahead and bought Cat 7 cabling because it was inexpensive anyway.

Once I had all the hardware, I needed to transition my cluster’s networking to the new 2.5 Gbps ethernet adapters. I won’t go over the details of each step, as I basically wiped each node’s OS partition and reinstalled the OS as described in my original build-out posts. However, this time the master node’s WAN port is the built-in ethernet and the LAN port on all the nodes is the USB 2.5 Gbps adapter.

One thing I did differently that made setting up the OS much easier was to create a more human-readable device name for the USB ethernet adapter on each node; otherwise you would have to use a cryptic sequence of semi-random text to identify the USB ethernet adapter on each machine. To give the USB ethernet adapter the easier-to-use name usbeth0, create and edit the file /etc/udev/rules.d/70-persistent-net.rules, adding the following line:

SUBSYSTEM=="net", ACTION=="add", ATTR{address}=="01:23:45:67:89:ab", NAME="usbeth0"

Note that the value following the ATTR{address}== key is the MAC address of the USB ethernet adapter attached to the computer. To find the adapter’s MAC address, issue the ip link command while the USB adapter is attached to the computer. Once the file is edited as indicated, restarting the node allows the new name for the ethernet adapter to take effect.
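
For reference, this is roughly what finding the MAC address looks like on a node. The exact interface name will vary; with predictable interface naming, a USB adapter usually shows up with a name like enx0123456789ab before it gets renamed:

# list every interface; the "link/ether" line under the USB adapter's
# entry is the MAC address to use in the udev rule above
ip link show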

When the USB ethernet adapter is named in this manner, the /etc/netplan/01-netcfg.yaml file on the master node is much easier to read:

# This file describes the network interfaces available on your system
# For more information, see netplan(5).
network:
  version: 2
  renderer: networkd
  ethernets:
    enp2s0:
      dhcp4: yes
    usbeth0:
      addresses: [10.1.1.1/24]
      dhcp4: false
      nameservers:
        addresses: [8.8.8.8, 8.8.4.4]
        search: []
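
The worker nodes get a similar but simpler netplan file, with only the usbeth0 interface defined. The sketch below is illustrative rather than a copy of my actual configuration; it assumes the first worker takes 10.1.1.2 and routes through the master node at 10.1.1.1:

# illustrative worker-node netplan configuration
network:
  version: 2
  renderer: networkd
  ethernets:
    usbeth0:
      addresses: [10.1.1.2/24]
      dhcp4: false
      gateway4: 10.1.1.1
      nameservers:
        addresses: [8.8.8.8]
        search: []

After the netplan configuration is applied, a quick ethtool usbeth0 | grep Speed on each node should report 2500Mb/s, confirming the link negotiated at the new speed.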

Update – Since writing this post, I have learned that certain Linux kernel settings need to be updated so that Linux can manage the faster networking speeds that come with the 2.5G ethernet network. More information can be found here.
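
I won’t duplicate that content here, but the changes are along the lines of raising the kernel’s socket buffer limits so TCP can keep a 2.5 Gbps link busy. The values below are purely illustrative, not the exact settings from the linked write-up:

# /etc/sysctl.d/99-cluster-network.conf (illustrative values)
# maximum socket receive and send buffer sizes
net.core.rmem_max = 16777216
net.core.wmem_max = 16777216
# TCP autotuning ranges: min, default, max (bytes)
net.ipv4.tcp_rmem = 4096 87380 16777216
net.ipv4.tcp_wmem = 4096 65536 16777216

Settings placed in a file like this take effect after running sudo sysctl --system or rebooting.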

Performance Comparison

Once I upgraded the networking, I set out to determine how much it impacted the speed of my Apache Spark calculations. In my last post, I created a simple PySpark benchmarking job. I reran the benchmarking job using the same test data that I generated for the prior post, and anything shuffle-related was noticeably faster. However, during the process of doing these tests, I noticed that the way files were being stored on QFS wasn’t what I expected.

Long story short, QFS was saving the test files with Reed-Solomon encoding rather than doing straight replication. Reed-Solomon encoding is a space-saving and data-integrity feature of QFS, but it comes at the expense of some speed. Furthermore, you only really get the data-integrity benefit if you have at least nine chunk servers storing the encoded stripes of data, and my cluster has only four nodes. I would still benefit from the space savings, but I wanted to focus on speed. The reason straight replication is preferred for speed is that the data file is stored in its entirety on a given node, and Spark can then take advantage of the data’s node locality by scheduling tasks on the same node where the data is located, thus not requiring network transfers to work on that data file. Reed-Solomon encoding breaks a file up into nine stripes (six data and three parity) and distributes those stripes across the cluster, so there is no such thing as true data locality with Reed-Solomon encoded files.

It turns out that the reason QFS wasn’t storing the generated test data with straight replication was a bug in the HDFS shim that allows Spark to communicate with QFS. I filed a bug, and the QFS team was able to fix it. Re-running all of the tests with the test data generated onto QFS with 2x replication yielded the best speed improvements.
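
With the fixed shim, the replication behavior can be controlled from the Spark side through the Hadoop configuration. The sketch below shows the general idea in PySpark; the fs.qfs.createParams setting and the qfs:// path reflect my understanding of the QFS Hadoop shim, so treat them as assumptions and check the QFS documentation for your version:

from pyspark.sql import SparkSession

# Build a Spark session that asks the QFS Hadoop shim to create files
# with straight 2x replication ("2") instead of its Reed-Solomon
# default. The configuration key is an assumption based on the QFS
# Hadoop shim documentation; verify it against your QFS version.
spark = (
    SparkSession.builder
    .appName("qfs-2x-replication-sketch")
    .config("spark.hadoop.fs.qfs.createParams", "2")
    .getOrCreate()
)

# Data written through this session should now be stored on QFS as
# whole, replicated chunks, which is what lets Spark schedule tasks
# with data locality. The metaserver host, port, and path below are
# hypothetical.
spark.range(1000).write.mode("overwrite").parquet("qfs://master:20000/tmp/replication_test")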

Altogether, the performance I got on my Personal Compute Cluster across the different configurations was:

Benchmark Test | Original Time (Seconds) | New Time (Seconds) | Time With 2x Replication Fix (Seconds)
Shuffle – Group By | 547.5 | 495.6 | 345.6
Shuffle – Repartition | 916.3 | 915.6 | 783.8
Shuffle – Inner Join | 1416.1 | 1238.8 | 1065.5
Shuffle – Broadcast Inner Join | 1211.0 | 1091.8 | 910.9
CPU – SHA-512 | 921.2 | 906.4 | 886.1
CPU – Calculate Pi – Python UDF | 916.2 | 917.2 | 921.4
CPU – Calculate Pi – Dataframe Functions | 10.8 | 10.9 | 10.8

From these results, I would say that improving the data read/write speed by using straight replication had much more impact on the speed of the benchmark tests than improving the networking did, though the networking improvement had a noticeable impact of its own. The respective improvements make sense: reading from a local disk will always be faster than reading from the network.

The takeaway I get from these experiments is that data locality is still very important, even in the age of much faster networking. As a future test, I will try various file systems I could use in the cluster to see which ones result in the best performance. For now, I am happy with my updated hardware configuration and the fix to QFS.
