SimpleMath, created by my colleague Chuck Walbourn, is a header file that wraps the DirectXMath SIMD vector/matrix math API with an easier-to-use C++ interface. It provides the following types, with similar names, methods, and operator overloads to the XNA Game Studio math API:
DirectXMath provides highly optimized vector and matrix math functions, which take advantage of SSE SIMD intrinsics when compiled for x86/x64, or the ARM NEON instruction set when compiled for an ARM platform such as Windows RT or Windows Phone. The downside of being designed for efficient SIMD usage is that DirectXMath can be somewhat complicated to work with. Developers must be aware of correct type usage (understanding the difference between SIMD register types such as XMVECTOR vs. memory storage types such as XMFLOAT4), must take care to maintain correct alignment for SIMD heap allocations, and must carefully structure their code to avoid accessing individual components from a SIMD register. This complexity is necessary for optimal SIMD performance, but sometimes you just want to get stuff working without so much hassle!
Enter SimpleMath...
These types derive from the equivalent DirectXMath memory storage types (for instance Vector3 is derived from XMFLOAT3), so they can be stored in arbitrary locations without worrying about SIMD alignment, and individual components can be accessed without bothering to call SIMD accessor functions. But unlike XMFLOAT3, the Vector3 type defines a rich set of methods and overloaded operators, so it can be directly manipulated without having to first load its value into an XMVECTOR. Vector3 also defines an operator for automatic conversion to XMVECTOR, so it can be passed directly to methods that were written to use the lower level DirectXMath types.
If that sounds horribly confusing, the short version is that the SimpleMath types pretty much Just Work™ the way you would expect them to.
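To make that layering concrete, here is a stripped-down sketch of the pattern in portable C++. It deliberately does not use DirectXMath: `Float3` stands in for XMFLOAT3, `Reg` for XMVECTOR, and `Load`/`Store` for XMLoadFloat3/XMStoreFloat3. All of these names, plus `SimpleVector3`, `SumXYZ` and `DemoSimpleVector3`, are hypothetical and far simpler than the real SimpleMath.h; the point is only the shape of the design (derive from the storage type, add operators, add an implicit conversion to the register type).

```cpp
#include <cassert>

// Hypothetical stand-in for a storage type such as XMFLOAT3:
// three plain floats, no alignment requirement.
struct Float3 { float x, y, z; };

// Hypothetical stand-in for a SIMD register type such as XMVECTOR.
struct alignas(16) Reg { float v[4]; };

// Hypothetical stand-ins for XMLoadFloat3 / XMStoreFloat3.
inline Reg Load(const Float3& f) { return Reg{{f.x, f.y, f.z, 0.0f}}; }
inline void Store(Float3& f, const Reg& r) { f.x = r.v[0]; f.y = r.v[1]; f.z = r.v[2]; }

inline Reg Add(const Reg& a, const Reg& b) {
    return Reg{{a.v[0] + b.v[0], a.v[1] + b.v[1], a.v[2] + b.v[2], a.v[3] + b.v[3]}};
}

// A function written against the low-level register type,
// like an existing DirectXMath-based helper.
inline float SumXYZ(Reg r) { return r.v[0] + r.v[1] + r.v[2]; }

// SimpleMath-style wrapper: derives from the storage type, so components
// stay directly accessible, and hides the load/compute/store round trip.
struct SimpleVector3 : Float3 {
    SimpleVector3(float x_, float y_, float z_) : Float3{x_, y_, z_} {}

    // Implicit conversion, so it can be passed wherever a Reg is expected.
    operator Reg() const { return Load(*this); }

    // Load both operands, add in "registers", store the result back.
    SimpleVector3& operator+=(const SimpleVector3& other) {
        Store(*this, Add(Load(*this), Load(other)));
        return *this;
    }
};

// Usage: user code never mentions Load or Store.
inline void DemoSimpleVector3() {
    SimpleVector3 pos(1.0f, 2.0f, 3.0f);
    SimpleVector3 vel(0.5f, 0.5f, 0.5f);
    pos += vel;                   // operator hides the conversions
    assert(pos.y == 2.5f);        // direct component access
    assert(SumXYZ(pos) == 7.5f);  // implicit conversion to Reg
}
```

Because every operator does its own load and store, the convenience is paid for in extra memory traffic, which is exactly the catch discussed next.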
By now you must be wondering, where is the catch? And of course there is one. SimpleMath hides the complexities of SIMD programming by automatically converting back and forth between memory and SIMD register types, which tends to generate additional load and store instructions. This can add significant overhead compared to the lower level DirectXMath approach, where SIMD loads and stores are under explicit control of the programmer.
You should use SimpleMath if you are:
You should go straight to the underlying DirectXMath API if you:
This need not be a global either/or decision. The SimpleMath types know how to convert themselves to and from the corresponding DirectXMath types, so it is easy to mix and match. You can use SimpleMath for the parts of your program where readability and development time matter most, then drop down to DirectXMath for performance hotspots where runtime efficiency is more important.
Here is a simple object movement calculation, implemented using DirectXMath. Note the skullduggery to make sure the PlayerCat instance will always be 16-byte aligned (and I didn't even include the implementation of the AlignedNew helper here!)
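```cpp
#include <DirectXMath.h>

using namespace DirectX;

// 16-byte alignment is required for the aligned XMFLOAT3A loads and stores.
__declspec(align(16)) class PlayerCat : public AlignedNew<PlayerCat>
{
public:
    void Update()
    {
        const float cFriction = 0.99f;

        // Load each vector into a SIMD register exactly once.
        XMVECTOR pos = XMLoadFloat3A(&mPosition);
        XMVECTOR vel = XMLoadFloat3A(&mVelocity);

        // Compute in registers, then store the results back to memory.
        XMStoreFloat3A(&mPosition, pos + vel);
        XMStoreFloat3A(&mVelocity, vel * cFriction);
    }

private:
    XMFLOAT3A mPosition;
    XMFLOAT3A mVelocity;
};
```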
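The AlignedNew helper is not shown in the original, so the following is only a guess at what such a helper could look like, not the author's implementation. It assumes C++17 and is a CRTP base class whose class-specific `operator new`/`delete` forward to the standard aligned overloads of `::operator new`/`::operator delete` (the ones taking `std::align_val_t`). It uses standard `alignas(16)` in place of the MSVC-specific `__declspec(align(16))`; `AlignedThing`, `IsAligned16` and `DemoAlignedNew` are hypothetical names for the demonstration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <new>

// Hypothetical AlignedNew-style helper: heap allocations of TDerived
// honour alignof(TDerived) via the C++17 aligned allocation functions.
template <typename TDerived>
struct AlignedNew {
    static void* operator new(std::size_t size) {
        return ::operator new(size, static_cast<std::align_val_t>(alignof(TDerived)));
    }
    static void operator delete(void* ptr) noexcept {
        ::operator delete(ptr, static_cast<std::align_val_t>(alignof(TDerived)));
    }
    static void* operator new[](std::size_t size) {
        return ::operator new[](size, static_cast<std::align_val_t>(alignof(TDerived)));
    }
    static void operator delete[](void* ptr) noexcept {
        ::operator delete[](ptr, static_cast<std::align_val_t>(alignof(TDerived)));
    }
};

// Hypothetical 16-byte aligned class, analogous to PlayerCat above.
struct alignas(16) AlignedThing : AlignedNew<AlignedThing> {
    float position[4];
    float velocity[4];
};

inline bool IsAligned16(const void* p) {
    return reinterpret_cast<std::uintptr_t>(p) % 16 == 0;
}

// Usage: both single objects and arrays come back 16-byte aligned.
inline void DemoAlignedNew() {
    AlignedThing* one = new AlignedThing();
    assert(IsAligned16(one));
    delete one;

    AlignedThing* many = new AlignedThing[3];
    assert(IsAligned16(many));
    delete[] many;
}
```

With a C++17 compiler, a plain `new` of a type whose `alignas` requirement exceeds the default new alignment already calls the aligned allocation functions, so a helper like this mainly matters for older toolchains.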
Using SimpleMath, the same math is, well, a little more simple :-)
```cpp
#include "SimpleMath.h"

using namespace DirectX::SimpleMath;

class PlayerCat
{
public:
    void Update()
    {
        const float cFriction = 0.99f;

        mPosition += mVelocity;
        mVelocity *= cFriction;
    }

private:
    Vector3 mPosition;
    Vector3 mVelocity;
};
```
Here is the x86 SSE code generated for the DirectXMath version of the Update method:
```asm
movaps xmm2,xmmword ptr [ecx+10h]
movaps xmm1,xmmword ptr [ecx]
andps  xmm2,xmmword ptr [?g_XMMask3@DirectX@@3UXMVECTORI32@1@B]
andps  xmm1,xmmword ptr [?g_XMMask3@DirectX@@3UXMVECTORI32@1@B]
movaps xmm0,xmmword ptr [__xmm@3f7d70a43f7d70a43f7d70a43f7d70a4]
addps  xmm1,xmm2
mulps  xmm0,xmm2
movq   mmword ptr [ecx],xmm1
shufps xmm1,xmm1,0AAh
movss  dword ptr [ecx+8],xmm1
movq   mmword ptr [ecx+10h],xmm0
shufps xmm0,xmm0,0AAh
movss  dword ptr [ecx+18h],xmm0
ret
```
The SimpleMath version generates slightly more than twice as many machine instructions:
```asm
movss    xmm2,dword ptr [ecx]
movss    xmm0,dword ptr [ecx+4]
movss    xmm1,dword ptr [ecx+0Ch]
unpcklps xmm2,xmm0
movss    xmm0,dword ptr [ecx+8]
movlhps  xmm2,xmm0
movss    xmm0,dword ptr [ecx+10h]
unpcklps xmm1,xmm0
movss    xmm0,dword ptr [ecx+14h]
movlhps  xmm1,xmm0
addps    xmm2,xmm1
movss    dword ptr [ecx],xmm2
movaps   xmm0,xmm2
shufps   xmm0,xmm2,55h
movss    dword ptr [ecx+4],xmm0
shufps   xmm2,xmm2,0AAh
movss    dword ptr [ecx+8],xmm2
movss    xmm1,dword ptr [ecx+0Ch]
movss    xmm0,dword ptr [ecx+10h]
unpcklps xmm1,xmm0
movss    xmm0,dword ptr [ecx+14h]
movlhps  xmm1,xmm0
mulps    xmm1,xmmword ptr [__xmm@3f7d70a43f7d70a43f7d70a43f7d70a4]
movaps   xmm0,xmm1
movss    dword ptr [ecx+0Ch],xmm1
shufps   xmm0,xmm1,55h
shufps   xmm1,xmm1,0AAh
movss    dword ptr [ecx+10h],xmm0
movss    dword ptr [ecx+14h],xmm1
ret
```
Most of this difference is because I was able to use aligned loads and stores in the DirectXMath version, while the SimpleMath code must do extra work to handle memory locations that might not be properly aligned. Also note how the SimpleMath version loads the mVelocity value from memory into SIMD registers twice, while the extra control offered by DirectXMath allowed me to do this just once.
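The redundant load of mVelocity can be reproduced in a toy model. This sketch is not SimpleMath or DirectXMath: `Float3`, `Reg`, `Load`, `Store`, `Vec3`, `Cat` and the global `g_counters` are hypothetical, instrumented stand-ins that simply count how often each update style moves data between memory and "registers". It models only the load/store structure of the two Update methods, not alignment or the real SSE instruction count.

```cpp
#include <cassert>

// Hypothetical stand-ins for a storage type and a register type.
struct Float3 { float x, y, z; };
struct Reg { float v[4]; };

// Global counters recording memory <-> register conversions.
struct Counters { int loads = 0; int stores = 0; };
Counters g_counters;

inline Reg Load(const Float3& f) {
    ++g_counters.loads;
    return Reg{{f.x, f.y, f.z, 0.0f}};
}
inline void Store(Float3& f, const Reg& r) {
    ++g_counters.stores;
    f.x = r.v[0]; f.y = r.v[1]; f.z = r.v[2];
}
inline Reg Add(const Reg& a, const Reg& b) {
    return Reg{{a.v[0] + b.v[0], a.v[1] + b.v[1], a.v[2] + b.v[2], a.v[3] + b.v[3]}};
}
inline Reg Scale(const Reg& a, float s) {
    return Reg{{a.v[0] * s, a.v[1] * s, a.v[2] * s, a.v[3] * s}};
}

// Wrapper style: every operator performs its own loads and store.
struct Vec3 : Float3 {
    Vec3(float x_, float y_, float z_) : Float3{x_, y_, z_} {}
    Vec3& operator+=(const Vec3& o) { Store(*this, Add(Load(*this), Load(o))); return *this; }
    Vec3& operator*=(float s) { Store(*this, Scale(Load(*this), s)); return *this; }
};

struct Cat { Vec3 position; Vec3 velocity; };

// SimpleMath-style update: velocity is loaded twice.
inline void UpdateWrapped(Cat& c, float friction) {
    c.position += c.velocity;  // 2 loads, 1 store
    c.velocity *= friction;    // 1 load (velocity again), 1 store
}

// DirectXMath-style update: each member is loaded once and kept in a register.
inline void UpdateExplicit(Cat& c, float friction) {
    Reg pos = Load(c.position);
    Reg vel = Load(c.velocity);
    Store(c.position, Add(pos, vel));
    Store(c.velocity, Scale(vel, friction));
}

// Usage: same results, different amount of memory traffic.
inline void DemoLoadCounts() {
    Cat wrapped{Vec3(1.0f, 2.0f, 3.0f), Vec3(0.5f, 0.5f, 0.5f)};
    Cat explicitCat{Vec3(1.0f, 2.0f, 3.0f), Vec3(0.5f, 0.5f, 0.5f)};

    g_counters = Counters{};
    UpdateWrapped(wrapped, 0.5f);
    assert(g_counters.loads == 3 && g_counters.stores == 2);

    g_counters = Counters{};
    UpdateExplicit(explicitCat, 0.5f);
    assert(g_counters.loads == 2 && g_counters.stores == 2);

    assert(wrapped.position.x == explicitCat.position.x);
    assert(wrapped.velocity.x == explicitCat.velocity.x);
}
```

The wrapper cannot know that the velocity it just loaded for `+=` is still valid for `*=`, because each operator is a separate call; only code that holds the register value itself, as the explicit version does, can reuse it.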
But hey, sometimes performance isn't the most important goal. If you care more about optimizing for developer efficiency, SimpleMath could be for you.