I haven’t posted an update to my data analysis projects in a while. Partly because my day job has been a bit busy lately, and partly because what time I do have for my recreational coding has been taken up by a problem I was experiencing with Apache Spark. I started having stability problems on my ODROID XU4 cluster. I didn’t fully understand the cause at first, thinking for the longest time it was my own code. In the end, it proved to be a bug in Spark, or more specifically, an incompatibility between Spark’s memory management and the 32-bit ARMv7 platform of my ODROID XU4 cluster.
The issue has to do with how some CPUs operate on `double` floating point values. These CPUs, including the 32-bit ARMv7 CPU found in the ODROID XU4, require that when the CPU operates on a `double` floating point value, the 8 bytes of memory containing the value be aligned to an 8-byte boundary. What that means is that the memory address of the first byte of the `double` must be divisible by 8. Why would the CPU require that? Well, there are many good resources on the internet covering the memory alignment requirements of CPUs, so I will not try to replicate them here.
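To make the divisible-by-8 rule concrete, here is a minimal sketch (my own illustration, not Spark code) of the check that determines whether an address satisfies it:

```java
public class AlignmentCheck {
    // An address is 8-byte aligned when its low three bits are zero,
    // which is the same as being divisible by 8.
    static boolean isDoubleAligned(long address) {
        return (address & 0x7L) == 0;
    }

    public static void main(String[] args) {
        System.out.println(isDoubleAligned(4096)); // true:  4096 = 8 * 512
        System.out.println(isDoubleAligned(4100)); // false: 4100 leaves remainder 4
    }
}
```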
But this begs the question of why the issue is relevant to Spark at all, especially since Spark is based on the JVM, which guarantees that objects are aligned to appropriate memory boundaries. It turns out that there are several scenarios in which Spark streams information to a byte stream, such as writing results to a Parquet file. A low-level operation in making this happen is converting objects and values to a byte stream, and conversely converting a byte stream back into objects and values. These low-level conversions are handled by the `Platform` class in Spark. Consider the function in this class that converts bytes from a byte stream into a `double`:
```java
public static double getDouble(Object object, long offset) {
  return _UNSAFE.getDouble(object, offset);
}
```
The `_UNSAFE` variable is an instance of the `sun.misc.Unsafe` class. You can easily find documentation on this class around the internet, but the gist of its purpose is to allow a programmer to work around the JVM’s memory management and interact with memory directly. If you know what you are doing, directly manipulating values in memory can be very powerful and efficient. However, it does mean that you, the programmer, have to manage any platform- or CPU-specific rules for directly manipulating memory, such as byte alignment rules.
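As an aside, `sun.misc.Unsafe` has no public constructor, so code like Spark’s `Platform` class typically obtains the singleton instance through reflection. A minimal sketch of that common idiom (`theUnsafe` is the private field the JDK uses to hold the instance):

```java
import java.lang.reflect.Field;
import sun.misc.Unsafe;

public class UnsafeHolder {
    static final Unsafe _UNSAFE;

    static {
        Unsafe unsafe;
        try {
            // The JDK keeps its singleton in a private static field.
            Field field = Unsafe.class.getDeclaredField("theUnsafe");
            field.setAccessible(true);
            unsafe = (Unsafe) field.get(null);
        } catch (ReflectiveOperationException e) {
            throw new ExceptionInInitializerError(e);
        }
        _UNSAFE = unsafe;
    }
}
```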
In the above function, what is effectively happening under the hood is that a memory location, indicated by the combination of the `object` and `offset` parameters, is directly cast to a `double` type. Because the ARMv7 CPU in the ODROID XU4 can’t operate on `double` values that are not aligned, this generates a low-level exception whenever the memory address is not aligned to an 8-byte boundary. This issue was the root of my problem. I didn’t encounter it right away because my initial experiments with Spark on the ODROID XU4 cluster did not manipulate `double` values.
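To see the failure mode in isolation, here is a self-contained sketch (my own demo, not Spark code). It writes the bit pattern of a `double` to a deliberately misaligned address using single-byte stores, which have no alignment requirement, then reads it back with `getDouble()`:

```java
import java.lang.reflect.Field;
import sun.misc.Unsafe;

public class UnalignedDoubleDemo {
    public static void main(String[] args) throws Exception {
        Field field = Unsafe.class.getDeclaredField("theUnsafe");
        field.setAccessible(true);
        Unsafe unsafe = (Unsafe) field.get(null);

        long base = unsafe.allocateMemory(16); // allocations are suitably aligned
        long unaligned = base + 1;             // deliberately not divisible by 8

        // Store the 8 bytes of a double one at a time, in little-endian
        // order to match x86 and ARMv7 Linux.
        long bits = Double.doubleToRawLongBits(42.0);
        for (int i = 0; i < 8; i++) {
            unsafe.putByte(unaligned + i, (byte) (bits >>> (8 * i)));
        }

        // On x86 this prints 42.0; on the 32-bit ARMv7 CPU of the
        // ODROID XU4 this unaligned load is the kind of access that can
        // bring down the JVM at the native level.
        System.out.println(unsafe.getDouble(unaligned));

        unsafe.freeMemory(base);
    }
}
```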
I didn’t understand at first that the problem was a byte alignment issue, so I filed a bug with the Spark project (SPARK-18819). After some discussion and further investigation, I narrowed the problem down to the above `getDouble()` function and created a pull request against the Spark project. However, the code I created would introduce some slowness to Spark in a performance-critical location, even on platforms that don’t require `double` alignment, most notably x86 CPUs. This is because, to ensure the cast to `double` occurs against an aligned memory address, I first copy the byte stream data to a memory location that is known to be 8-byte aligned.
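For illustration, here is a sketch in the spirit of that fix (my own approximation, not the exact patch): it assembles the 8 bytes into a `long` with single-byte reads, which are legal at any address, then reinterprets the bits as a `double`. The per-call byte-wise work is exactly the kind of overhead that made this unattractive on x86:

```java
import java.lang.reflect.Field;
import sun.misc.Unsafe;

public class AlignedDoubleRead {
    private static final Unsafe _UNSAFE;

    static {
        try {
            Field field = Unsafe.class.getDeclaredField("theUnsafe");
            field.setAccessible(true);
            _UNSAFE = (Unsafe) field.get(null);
        } catch (ReflectiveOperationException e) {
            throw new ExceptionInInitializerError(e);
        }
    }

    // Assumes a little-endian platform, as on x86 and ARMv7 Linux.
    public static double getDoubleAligned(Object object, long offset) {
        long bits = 0L;
        for (int i = 0; i < 8; i++) {
            // Single-byte reads carry no alignment requirement, so no
            // fault can occur; the cost is eight reads per double.
            bits |= (_UNSAFE.getByte(object, offset + i) & 0xFFL) << (8 * i);
        }
        return Double.longBitsToDouble(bits);
    }
}
```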
Since Spark isn’t really intended to run at scale on platforms like the ODROID XU4, it was determined that the best path forward was to create and publish a patch that people can apply to the Spark source code to build their own distribution of Spark capable of running on platforms like the ODROID XU4. For those who are interested, the patch is available here:
If you don’t want to go through the process of creating a distribution build of Spark yourself, I have created a build of Spark v2.2.0 with the patch applied. This distribution is available here:
For those of you who follow this blog, I have updated my Spark installation posts to use the above distribution with the `double` alignment patch applied.