I like my couch, and spend time sitting on it pretty much every evening. But I’m scared of what lies beneath it. An ungodly mix of food crumbs, cat hair, decaying skin cells... <shudder> I try not to think about it, and avoid looking under there!
Floating point math is like that for most game developers. We all use it, and most of the time it does what we want, but we don't spend too much time exploring its darker nooks and crannies.
Y'all know that floating point numbers are only an approximate representation, right? This can cause trouble when calculations produce inexact results, but the errors are usually small, so problems are fortunately rare.
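To see the kind of inexactness I mean, here is a toy C# snippet (nothing to do with any particular game): 0.1 has no exact binary representation, so adding it to itself ten times does not land exactly on 1.

    float tenth = 0.1f;   // stored as the nearest representable float, not exactly 0.1
    float sum = 0f;

    for (int i = 0; i < 10; i++)
        sum += tenth;

    Console.WriteLine(sum == 1f);   // False with strict 32 bit math: sum comes out around 1.0000001
                                    // (the exact value can even vary by runtime, which is where
                                    // this article is heading)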
A more dangerous but less widely understood issue is that floating point precision changes as you move around the world. The further you get from the origin, the less accuracy you have. This leads to pernicious bugs where your physics engine works perfectly in a small test level, but collision detection goes wrong in the most distant corner of your largest level. If your game world is large enough to run into such problems, the solution is to use fixed point integer coordinates for position values.
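A quick way to see the falloff for yourself, with arbitrarily chosen numbers: near the origin a float can resolve a tiny offset, but ten million units away the same offset vanishes entirely, because the gap between adjacent floats out there is larger than the offset.

    float nearOrigin = 1f;
    float farAway = 10000000f;   // ten million units from the origin

    // Near the origin, a 0.001 step (a millimetre, if your units are metres) survives...
    Console.WriteLine(nearOrigin + 0.001f == nearOrigin);   // False: the step is representable

    // ...but far away, adjacent floats are a whole unit apart,
    // so the addition rounds straight back to the original value.
    Console.WriteLine(farAway + 0.001f == farAway);         // True: the step is lost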
But what about the common case where your game does not have any precision problems, and works perfectly despite the warts of floating point math? We're ok then, right?
Not so fast...
I compile my game in Debug configuration, and run it on an x86 Windows PC, using the 2.0 CLR, with a debugger attached. It works perfectly in spite of floating point rounding errors.
Then I compile the same game in Release configuration, and run it on Xbox 360, which has a PowerPC CPU core, using the .NET Compact Framework CLR, without a debugger. Again it works perfectly, but not exactly the same as before! My floating point computations will round in different ways, and thus produce slightly different results.
Floating point rounding can be different from one CPU architecture to another, but changes in compiler and optimization settings have a bigger impact. Common differences include:
The optimizer may change the order in which expressions are evaluated. When compiling code like this:

    foreach (var foo in list)
    {
        foo.X = foo.Y + Bar + Baz;
    }
a smart optimizer might rearrange this to move the common subexpression outside the loop:
    float tmp = Bar + Baz;

    foreach (var foo in list)
    {
        foo.X = foo.Y + tmp;
    }
This optimization has changed the evaluation order. What used to be ((foo.Y + Bar) + Baz) is now (foo.Y + (Bar + Baz)). Addition of real numbers is associative, so the two versions are in theory identical, but floating point addition is not: the rearranged code reaches its goal via a different intermediate result, which will not round the same way. The optimized code will produce a slightly different result to the original.
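If it seems surprising that a mathematically legal rearrangement can change the answer, here is a deliberately extreme illustration; the values are contrived to make the effect obvious rather than taken from the loop above.

    float big = 1e20f;

    float left  = (big + -big) + 1f;   // (0) + 1 == 1
    float right = big + (-big + 1f);   // -big + 1f rounds back to -big, so the total is 0

    Console.WriteLine(left);    // 1
    Console.WriteLine(right);   // 0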
Another common difference comes from the x87 FPU found in x86 processors, which performs its computations in registers that are wider than the 32 bit float format, converting values as they are loaded and stored. Compile an expression like float x = a * b + c; with no optimizations, and you get something along these lines:

    float tmp;   // 32 bit precision temporary variable

    push a;      // converts 32 to 64 bit
    push b;      // converts 32 to 64 bit
    multiply;    // 64 bit computation
    pop tmp;     // converts result to 32 bits
    push tmp;    // converts 32 to 64 bit
    push c;      // converts 32 to 64 bit
    add;         // 64 bit computation
    pop x;       // converts result to 32 bits
Even though the multiply and add instructions are using 64 bit internal precision, the results are immediately converted back to 32 bit format, so this does not affect the result. But it is horribly inefficient to write out a value and then immediately load it back into the FPU! A smart optimizer would collapse this to:
    push a;      // converts 32 to 64 bit
    push b;      // converts 32 to 64 bit
    multiply;    // 64 bit computation
    push c;      // converts 32 to 64 bit
    add;         // 64 bit computation
    pop x;       // converts result to 32 bits
Now the temporary result (after the multiply but before the add) is stored in an FPU register with full 64 bit precision, so the end result will not be rounded the same way. The result will be different both from the earlier non-optimized x87 code and from any other CPU that does not have such a crazy internal architecture.
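You can mimic the two behaviours in C# by simulating the register widths by hand. This is only an analogy for the x87 situation, not real FPU code; 16777216 is 2^24, the point where a 32 bit float can no longer represent every integer.

    float big = 16777216f;   // 2^24
    float one = 1f;

    // Intermediate rounded back to 32 bits after every step, like the
    // unoptimized sequence above: each +1 falls below float precision and is lost.
    float narrow = (big + one) + one;

    // Intermediate kept in a wider 64 bit format, like the optimized
    // sequence that leaves the value sitting in an FPU register.
    float wide = (float)(((double)big + one) + one);

    Console.WriteLine(wide - narrow);   // 2 on runtimes that really round each step to 32 bits;
                                        // 0 if the JIT keeps the intermediate wide, which is
                                        // exactly the inconsistency being described here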
Ok, so any time we change the compiler or optimizer, we'll get slightly different results. Many games don't care. But remember those butterflies that like to create hurricanes? They ain't got nothing on floating point rounding differences as a cause of chaos!
This is madness! Why don't we make all hardware work the same?
Well, we could, if we didn't care about performance. We could say "hey Mr. Hardware Guy, forget about your crazy fused multiply-add instructions and just give us a basic IEEE implementation", and "hey Compiler Dude, please don't bother trying to optimize our code". That way our programs would run consistently slowly everywhere :-)
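If you want to see the fused multiply-add difference for yourself, newer .NET runtimes expose it directly (MathF.FusedMultiplyAdd, .NET Core 3.0 onwards, if I remember the API correctly). The inputs below are contrived so that the single rounding shows up in the final bit.

    float a = 1f + 1f / 4096f;       // 1 + 2^-12
    float c = -(1f + 1f / 2048f);    // -(1 + 2^-11)

    // Two roundings: a * a rounds away the tiny 2^-24 term, then the add gives 0.
    float separate = a * a + c;

    // One rounding: the exact product feeds straight into the add, so 2^-24 survives.
    float fused = MathF.FusedMultiplyAdd(a, a, c);

    Console.WriteLine(separate);   // 0 with plain 32 bit multiply and add; a runtime that widens
                                   // the intermediate may match the fused result instead,
                                   // which is once again the whole point
    Console.WriteLine(fused);      // ~5.96E-08, i.e. 2^-24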
This issue is particularly scary for .NET developers. It's all very well to say "don't change your compiler or optimizer settings", but what does that mean when your code is compiled on the fly by the CLR? I can't find much official documentation in this area, so what follows is just educated guesswork: