Is floating point math deterministic?

I like my couch, and spend time sitting on it pretty much every evening. But I’m scared of what lies beneath it. An ungodly mix of food crumbs, cat hair, decaying skin cells... <shudder> I try not to think about it, and avoid looking under there!

Floating point math is like that for most game developers. We all use it, and most of the time it does what we want, but we don't spend too much time exploring its darker nooks and crannies.

Y'all know that floating point numbers are only an approximate representation, right? This can cause trouble when calculations produce inexact results, but the errors are usually small, so problems are fortunately rare.

A more dangerous but less widely understood issue is that floating point precision changes as you move around the world. The further from the origin, the less accuracy you have. This leads to pernicious bugs where your physics engine works perfectly in a small test level, but then the collision goes wrong only at the most distant corner of your largest level. If your game world is large enough to run into such problems, the solution is to use fixed point integer coordinates for position values.

But what about the common case where your game does not have any precision problems, and works perfectly despite the warts of floating point math? We're ok then, right?

Not so fast...

I compile my game in Debug configuration, and run it on an x86 Windows PC, using the 2.0 CLR, with a debugger attached. It works perfectly in spite of floating point rounding errors.

Then I compile the same game in Release configuration, and run it on Xbox 360, which has a PowerPC CPU core, using the .NET Compact Framework CLR, without a debugger. Again it works perfectly, but not exactly the same as before! My floating point computations will round in different ways, and thus produce slightly different results.

Floating point rounding can be different from one CPU architecture to another, but changes in compiler and optimization settings have a bigger impact. Common differences include:

Modern processors have vector math units that support advanced instructions such as dot product and fused multiply-add. To take advantage of such hardware, an optimizer might replace the computation ((a * b) + c) with a single instruction: mad(a, b, c). The result is basically the same, but unlikely to be rounded the same way. If we changed our optimizer settings to prevent the use of these instructions, or ran on a machine that did not have the necessary vector math unit, we would get a slightly different result.
Not every CPU supports denormalized floating point values. If a calculation produces an extremely tiny result, some processors will round it down to zero, while others preserve it.
Given a piece of code such as:

        foreach (foo in list)
        {
            foo.X = foo.Y + Bar + Baz;
        }

a smart optimizer might rearrange this to move the common subexpression outside the loop:

        float tmp = Bar + Baz;

        foreach (foo in list)
        {
            foo.X = foo.Y + tmp;
        }

This optimization has changed the evaluation order. What used to be ((foo.Y + Bar) + Baz) is now (foo.Y + (Bar + Baz)). Addition is associative, so the two versions are in theory identical, but they reach their goal via a different intermediate result, which will not round the same way. The optimized code will produce a slightly different result to the original.

The x87 FPU used by Intel can be switched between 32, 64, and 80 bit internal precision. Changing mode is expensive, so most programs just leave it set to 64 bit internal precision regardless of whether they are dealing with floats or doubles. This means computations are done at a higher than necessary precision, but hey, getting more precision than we expected shouldn't be a problem, right? Unfortunately, this architecture makes it hard to get consistent results when you change the optimizer. For instance the expression x = ((a * b) + c) might naively be compiled into this assembly pseudocode:

        float tmp;  // 32 bit precision temporary variable

        push a;     // converts 32 to 64 bit
        push b;     // converts 32 to 64 bit
        multiply;   // 64 bit computation
        pop tmp;    // converts result to 32 bits

        push tmp;   // converts 32 to 64 bit
        push c;     // converts 32 to 64 bit
        add;        // 64 bit computation
        pop x;      // converts result to 32 bits

Even though the multiply and add instructions are using 64 bit internal precision, the results are immediately converted back to 32 bit format, so this does not affect the result. But it is horribly inefficient to write out a value and then immediately load it back into the FPU! A smart optimizer would collapse this to:

        push a;     // converts 32 to 64 bit
        push b;     // converts 32 to 64 bit
        multiply;   // 64 bit computation
        push c;     // converts 32 to 64 bit
        add;        // 64 bit computation
        pop x;      // converts result to 32 bits

Now the temporary result (after the multiply but before the add) is stored in an FPU register with full 64 precision, so the end result will not be rounded the same way. The result will be different from both the earlier non-optimized x87 code and also from any other CPU that does not have such a crazy internal architecture.

Ok, so any time we change the compiler or optimizer, we'll get slightly different results. Many games don't care. But remember those butterflies that like to create hurricanes? They ain't got nothing on floating point rounding differences as a cause of chaos!

If you store replays as controller inputs, they cannot be played back on machines with different CPU architectures, compilers, or optimization settings. In MotoGP, this meant we could not share saved replays between Xbox and PC. It also meant that if we saved a replay from a debug build of the game, it would not work in release builds, or vice versa. This is not always a problem (we never shipped debug builds, after all), but if we ever released a patch, we had to build it using the exact same compiler as the original game. If the compiler had been updated since our original release, and we built a patch using the newer compiler, this could change things around enough that replays saved by the original game would no longer play back correctly.
If your game does networking by sending controller inputs as opposed to actual object positions (a popular approach for genres such as RTS that have many objects and can tolerate high latencies), this will not work across different platforms or builds of the game.
Beware when writing unit tests. For instance, Ito wrote a very thorough set of tests for the XNA Framework math library. Once they all passed on his local machine, he checked them in. Half of them immediately failed on the nightly build server. When we debugged the tests to see what was going wrong, we still got failures, but in different places to the previous test run! We eventually figured out that Ito had been using a debug configuration, but the server was building release. When we attached the debugger to investigate the problem, that switched the JIT into debugging mode, which produced different results yet again. To make the tests work consistently, we had to include a tolerance threshold in the result comparison, so they would pass even if the rounding errors were slightly different. And that was even before we tried running them on Xbox...

This is madness! Why don't we make all hardware work the same?

Well, we could, if we didn't care about performance. We could say "hey Mr. Hardware Guy, forget about your crazy fused multiply-add instructions and just give us a basic IEEE implementation", and "hey Compiler Dude, please don't bother trying to optimize our code". That way our programs would run consistently slowly everywhere :-)

This issue is particularly scary for .NET developers. It's all very well to say "don't change your compiler or optimizer settings", but what does that mean when your code is compiled on the fly by the CLR? I can't find much official documentation in this area, so what follows is just educated guesswork:

Within a single run of a .NET program, repeated calculations should always produce identical results. This means it is possible to store controller based replays and play them back from memory. Note: this assumes the system will never revisit and re-optimize methods to improve performance hotspots. To the best of my knowledge the CLR does not do that today, but some other platforms do, so who knows if this will always be the case?
On Windows, .NET programs cannot save controller based replays to disk, then play them back at some later date or on a different machine. An intervening service pack might change the CLR in such a way that the JIT produces different code, or if you move the replay to a different computer, the CLR might adapt depending on the specific CPU variant it is running on.
On Xbox, you can save controller based replays to disk, then play them back in a different run of your program. This is robust because the Xbox Game Studio runtimes are side-by-side, so your game will always use the same version of the Compact Framework that you developed it with.
Be aware that you are dealing with two optimizers, not just one! The C# compiler optimizer is controlled by whether you build your program as Debug or Release, but the JIT optimizer is controlled by whether or not you run with a debugger attached. So you may not get the same results when you launch using F5 versus Ctrl+F5 from Visual Studio.

Blog index - Back to my homepage

Is floating point math deterministic?

Originally posted to Shawn Hargreaves Blog on MSDN, Wednesday, March 25, 2009