Compressed GPU data formats

Consider the flow of data through a typical game:

1. Customer downloads game from Marketplace
2. Game data is stored in the filesystem
3. Game loads the data
4. Data is stored in memory
5. CPU or GPU fetches data from memory
6. CPU or GPU does awesome cool stuff with the data

Making the data smaller has benefits at every stage:

- Faster to download
- Less storage space needed to install the game
- Faster load times
- Takes up less memory
- Fewer cache misses mean less time wasted waiting for memory fetches, so the game runs faster

So we should always compress everything as much as possible, right?

But every ointment has a fly…

Before we can use compressed data, we must decompress it. The question is, does this decompression work take more or less time than we saved by using compression in the first place? We can choose to decompress our data at any of the stages I listed above. Stages that come before the decompression operation will benefit from the compressed format, while later stages will not. Games often combine multiple forms of compression that are decompressed at different times:

Decompress between stages 1 and 2:
- Windows Phone .xap and Xbox Indie .ccgame packages are downloaded from Marketplace in compressed format, but decompressed when installed onto the target device
Decompress during stage 3:
- XNA .xnb content compression applies to data stored in the filesystem, which is decompressed when loaded into memory
- JPEG and PNG image files are compressed in the filesystem, and decompressed during loading
Decompress between stages 5 and 6:
- Polygon collision detection routines can often benefit from choosing a smaller data format, for instance only storing triangle vertices without bothering to cache the normal, plane, or bounding box. Even though they must recalculate this missing data every time they check a triangle, this extra computation is often cheaper than the memory bandwidth to fetch precomputed values!
- Textures can use DXT compression
- Vertex data can use smaller packed vector formats
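The collision detection trade-off in the list above can be sketched in a few lines. This is only an illustration, not code from any particular engine: the 22-float "precomputed" layout and the helper names are assumptions made up for the example, and counter-clockwise winding is assumed.

```python
import math
import struct

# Hypothetical layouts, both built from 32-bit floats:
#   compact:     3 vertices                                 =  9 floats
#   precomputed: 3 vertices + normal + plane + bounding box = 22 floats
COMPACT_BYTES = struct.calcsize("<9f")
PRECOMPUTED_BYTES = struct.calcsize("<22f")


def subtract(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def triangle_normal(v0, v1, v2):
    """Recompute the unit normal from the vertices alone.

    Assumes counter-clockwise winding and a non-degenerate triangle
    (a zero-area triangle would divide by zero here).
    """
    n = cross(subtract(v1, v0), subtract(v2, v0))
    length = math.sqrt(n[0] * n[0] + n[1] * n[1] + n[2] * n[2])
    return (n[0] / length, n[1] / length, n[2] / length)


normal = triangle_normal((0, 0, 0), (1, 0, 0), (0, 1, 0))
print(COMPACT_BYTES, PRECOMPUTED_BYTES, normal)
```

With these assumed layouts the compact triangle is 36 bytes against 88 for the precomputed one, at the cost of one cross product and one square root per check. Whether that is a net win depends on how memory-bound the routine is, so it is worth measuring on the target hardware.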

The last two items, DXT textures and packed vertex formats, are interesting because they decompress inside the GPU, after the data is fetched from memory. This allows the compression to benefit every stage of the data flow. Unlike JPEG compression, DXT textures remain compressed after loading, so they save memory and bandwidth as well as filesystem storage space. And because the GPU is a highly asynchronous parallel processor, decompressing DXT textures or packed vertex data is effectively free! Of course the hardware cannot actually decompress data in zero time, but it has dedicated unpacking units which run in parallel with other GPU tasks, and which are fast enough to never be the performance bottleneck. So in practice, taking advantage of these compression features will never cost you extra time to decompress the data, and can save a lot of time if your rendering is bottlenecked by memory bandwidth.
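To put rough numbers on those savings, here is a small sketch with two parts: size arithmetic for DXT textures, and a software model of a packed normal. The block sizes are the standard ones (8 bytes per 4x4 block for DXT1, 16 bytes for DXT3 and DXT5). Everything else is an assumption for illustration: the function names are made up, only a single mip level is counted, and the signed 8-bit encoding is one plausible packed format, not necessarily the exact bit layout any given GPU uses.

```python
import struct

# Bytes per 4x4 pixel block for each DXT variant.
BLOCK_BYTES = {"DXT1": 8, "DXT3": 16, "DXT5": 16}


def rgba8_bytes(width, height):
    """Size of one uncompressed 32-bit-per-pixel mip level."""
    return width * height * 4


def dxt_bytes(width, height, fmt):
    """Size of one DXT mip level; dimensions round up to whole 4x4 blocks."""
    blocks_wide = (width + 3) // 4
    blocks_high = (height + 3) // 4
    return blocks_wide * blocks_high * BLOCK_BYTES[fmt]


def pack_normal(x, y, z):
    """Pack a unit vector into 4 bytes: three signed 8-bit values plus one pad byte."""
    def to_snorm8(v):
        clamped = max(-1.0, min(1.0, v))
        return int(round(clamped * 127.0))
    return struct.pack("<bbbx", to_snorm8(x), to_snorm8(y), to_snorm8(z))


def unpack_normal(data):
    """Expand the packed bytes back to floats (a software stand-in for the GPU's unpacking)."""
    x, y, z = struct.unpack("<bbbx", data)
    return (x / 127.0, y / 127.0, z / 127.0)


print(rgba8_bytes(1024, 1024),
      dxt_bytes(1024, 1024, "DXT1"),
      dxt_bytes(1024, 1024, "DXT5"))

packed = pack_normal(0.6, -0.8, 0.0)
print(len(struct.pack("<3f", 0.6, -0.8, 0.0)), len(packed), unpack_normal(packed))
```

For a 1024x1024 texture this gives 4,194,304 bytes uncompressed, 524,288 bytes as DXT1 (8:1) and 1,048,576 bytes as DXT5 (4:1). The normal shrinks from 12 bytes to 4, and with this encoding each component comes back within about 0.004 of the original value.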

I set out to write this article specifically about packed vertex data formats, but just setting the scene has made it long enough already, so the details will have to wait for my next post.


Originally posted to Shawn Hargreaves Blog on MSDN, Friday, November 19, 2010