I had quite a bit of fun with trying to achieve bit-identical floating point behavior across platforms for Croquet. Turns out our claim about "bit identical" behavior is not so bit-identical when it comes to floating point after all. Reading up on IEEE754 and some of the more interesting fpu options like extended internal fpu accuracy or fused multiply-add (all of which are "consistent" with IEEE754 although delivering different results) I have found a wonderful little library (http://www.netlib.org/fdlibm/) which deals very well with these issues. Using it, I have been able to get truly bit-identical results between x87 and PPC and I think it likely that other architectures (Sparc, Arm, Mips) can be made to work with this library as well.
Which raises an interesting question: Should we change the VM so that by default we *always* use that library? It would mean replacing our boiler-plate #include<math.h> by an #include<fdlibm.h> and having to link everything (both the VM as well as external plugins if needed) with the library, the cost of which is somewhere in the 60kbytes range. Together with a few platform specific tweaks (like setting the default accuracy on x87 to 53 bits mantissa, or disabling fused-madd on PPC) it *should* make floating point behave perfectly bit-identically across platforms.
A word about the runtime efficiency: The overhead seems to be fairly acceptable for the most common uses. Using the primitives, I have been unable to measure noticable differences between the current primitives and the fdlibm primitives *except* for the implementation of sqrt(x) which appears to be significantly faster in hardware. However, both on x87 and PPC it turns out that the results are bit-identical both across the architectures as well as in comparison to fdlibm. And that in turn means we can use the hardware sqrt directly. Note that none of the other implementations give identical results; in particular when it comes to boundary or extreme cases (like huge exponents) the results can *vastly* differ - up to the point that x87 will tell you that sin(1.0e100) is NaN which obviously ridiculous. fdlibm computes this to -0.38 and I have no idea if that's to be considered "correct" but at least we get consistent results across the various platforms.
What do you think? It is worth using fdlibm generally instead of -lm?
Cheers, - Andreas
PS. If somebody has access to Sparc/Mips/Arm I would be interested to see how they behave.