Cross-platform floating point behavior - Vm-dev

19 Nov 2005


      Hi Folks,
I had quite a bit of fun with trying to achieve bit-identical floating 
point behavior across platforms for Croquet. Turns out our claim about 
"bit identical" behavior is not so bit-identical when it comes to 
floating point after all. Reading up on IEEE754 and some of the more 
interesting fpu options like extended internal fpu accuracy or fused 
multiply-add (all of which are "consistent" with IEEE754 although 
delivering different results) I have found a wonderful little library 
(http://www.netlib.org/fdlibm/) which deals very well with these issues. 
Using it, I have been able to get truly bit-identical results between 
x87 and PPC and I think it likely that other architectures (Sparc, Arm, 
Mips) can be made to work with this library as well.
Which raises an interesting question: Should we change the VM so that by 
default we *always* use that library? It would mean replacing our 
boiler-plate #include<math.h> by an #include<fdlibm.h> and having to 
link everything (both the VM as well as external plugins if needed) with 
the library, the cost of which is somewhere in the 60kbytes range. 
Together with a few platform specific tweaks (like setting the default 
accuracy on x87 to 53 bits mantissa, or disabling fused-madd on PPC) it 
*should* make floating point behave perfectly bit-identically across 
platforms.
A word about the runtime efficiency: The overhead seems to be fairly 
acceptable for the most common uses. Using the primitives, I have been 
unable to measure noticable differences between the current primitives 
and the fdlibm primitives *except* for the implementation of sqrt(x) 
which appears to be significantly faster in hardware. However, both on 
x87 and PPC it turns out that the results are bit-identical both across 
the architectures as well as in comparison to fdlibm. And that in turn 
means we can use the hardware sqrt directly. Note that none of the other 
implementations give identical results; in particular when it comes to 
boundary or extreme cases (like huge exponents) the results can *vastly* 
differ - up to the point that x87 will tell you that sin(1.0e100) is NaN 
which obviously ridiculous. fdlibm computes this to -0.38 and I have no 
idea if that's to be considered "correct" but at least we get consistent 
results across the various platforms.
What do you think? It is worth using fdlibm generally instead of -lm?
Cheers,
   - Andreas
PS. If somebody has access to Sparc/Mips/Arm I would be interested to 
see how they behave.