Understanding GPU performance

Originally posted to Shawn Hargreaves Blog on MSDN, Friday, March 14, 2008

When C# code runs too slowly, you can profile it to see where the time is being spent.

Graphics programmers are not so lucky. If you work for a big commercial studio and have access to an Xbox devkit, the Xbox version of PIX gives nearly as much information about GPU performance as profilers can tell you about the CPU, but for the rest of us the GPU remains a mysterious black box which cannot be so easily measured.

I like to think of GPU performance investigation as a Sherlock Holmes mystery:

We must gather clues, form a hypothesis, and then confront the villain to make them confess.

We do have one advantage over the original Mr. Holmes, and that is the ability to rewind time and run our program again with minor changes. This is an incredibly powerful tool. Let's say the evidence leads us to suspect Performance was killed by Colonel Mustard, in the library, with the candlestick. To confirm this theory, we can comment out our candlestick rendering code, and run the program again. Is Performance still dead? If she lives, we know the murder weapon was the candlestick, so we should probably investigate how many polygons it is built from, and how expensive a shader it is using.

To be successful with this kind of investigation, two things are required:

  1. You need an accurate way of measuring whether Performance is alive, dead, or merely in a coma. See here for how to display the framerate, and the end of this article for how to set up a special profiling configuration of your project, which disables fixed timestep and turns off vsync. You normally want those on, but should temporarily turn them off any time you want to check Performance's heartbeat.

  2. You need a good mental model of how GPU hardware works. What made me suspect the candlestick? How did I know it wasn't the lead piping? Without some understanding to guide our suspicions, we could waste a lot of time randomly investigating one thing after another. GPU performance is highly nonlinear, so you will often find that removing an entire model makes no difference at all, but then making a seemingly minor change to a different model can double your framerate. To the uninitiated, these results can seem pretty random. Stay tuned...
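
As a rough sketch of the measurement setup described in item 1, something like the following could work. It assumes a standard XNA Game subclass; the class name ProfilingGame and the PROFILE compilation symbol are made up for illustration (you would define the symbol yourself in a separate build configuration), and showing the framerate in the window title is just the simplest display that needs no font content, not necessarily how you would want to do it in a finished game.

```csharp
using System;
using Microsoft.Xna.Framework;

// Hypothetical example class: measures frames per second, and when built with
// the (assumed) PROFILE symbol defined, lets the game run as fast as it can.
public class ProfilingGame : Game
{
    readonly GraphicsDeviceManager graphics;

    int frameCounter;                       // frames drawn during the current second
    int frameRate;                          // frames drawn during the previous second
    TimeSpan elapsedTime = TimeSpan.Zero;   // time accumulated toward the next update

    public ProfilingGame()
    {
        graphics = new GraphicsDeviceManager(this);

#if PROFILE
        // Call Update and Draw as often as possible instead of at a fixed rate.
        IsFixedTimeStep = false;

        // Do not wait for the monitor refresh before presenting each frame.
        graphics.SynchronizeWithVerticalRetrace = false;
#endif
    }

    protected override void Update(GameTime gameTime)
    {
        elapsedTime += gameTime.ElapsedGameTime;

        // Once per second, publish the frame count and start counting again.
        if (elapsedTime > TimeSpan.FromSeconds(1))
        {
            elapsedTime -= TimeSpan.FromSeconds(1);
            frameRate = frameCounter;
            frameCounter = 0;

            Window.Title = "FPS: " + frameRate;
        }

        base.Update(gameTime);
    }

    protected override void Draw(GameTime gameTime)
    {
        // Count frames in Draw rather than Update, since drawing is what we measure.
        frameCounter++;

        base.Draw(gameTime);
    }
}
```

With the flags left at their defaults, the number you see will simply sit at the refresh rate, which says nothing about how much headroom the GPU has. With both turned off, a change that helps or hurts shows up directly as a different framerate.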