GPU profiling in Visual Studio 2013 Update 2

Originally posted to Shawn Hargreaves Blog on MSDN, Saturday, April 5, 2014

The graphics debugging tool formerly known as PIX has been integrated into Visual Studio for a while now, and gets better in every release.  But unlike Xbox PIX, the Windows incarnation of this technology has until now been only for debugging and not profiling.  It provided lots of information about what happened, but none at all about how long things took.

For Windows Phone 8.1, my team (hi Adrian!) added the ability to measure and analyze GPU performance.  I’m particularly proud of the fact that, thanks to our efforts to make the Windows and Phone graphics stacks as similar as possible, we were able to build this new feature focusing mostly on Phone, yet the resulting code works exactly the same on full Windows.  Visual Studio is even able to reuse a single version of our GPU performance analysis DLL across both Windows 8.1 and Phone 8.1.

Rong’s talk at the Build conference shows this in action, and you can download Visual Studio 2013 Update 2 RC to try it out for yourself.

 

Here’s how it works.  I opened the default D3D project template, which gives me an oh-so-exciting spinning cube plus a framerate counter in the bottom right corner:

 

image

 

To use the graphics diagnostics feature, open the Debug menu, click Graphics, and then Start Diagnostics:

 

image

 

This will run the app with D3D tracing enabled.  Press the Print Screen (PrtSc) key one or more times to capture the frames you want to analyze.  When you quit the app, Visual Studio will open its graphics debugger.  This will look familiar if you have used PIX before, but the UI is considerably improved in this release, plus it now supports Phone as well as Windows:

 

image

 

So far so good, but where is this new profiling feature?   Select the Frame Analysis tab, and click where it says Click here:

 

image

 

Our new analysis engine will whir and click for a while (the more complicated your rendering, the longer this will take).  When everything has been measured it shows a report describing the GPU performance of every draw call in the frame:

 

image

 

This simple app only contains two draw calls.  Event #117 (DrawIndexed) is the cube, while #137 (DrawIndexedInstanced) is the framerate counter.  There would obviously be a lot more data if you analyzed something more complicated, in which case the ID3DUserDefinedAnnotation API can be used to organize and label different sections of your rendering.
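
As a rough sketch of how that labeling could look, the helper below pairs each BeginEvent with an EndEvent by using a scoped object.  ID3DUserDefinedAnnotation (declared in d3d11_1.h, and obtained from the device context via QueryInterface) really does expose BeginEvent and EndEvent, but the ScopedGpuEvent, RecordingAnnotation and RenderFrame names are hypothetical, and the recorder is only a stand-in so the example runs without a D3D device:

```cpp
#include <cassert>
#include <string>
#include <vector>

// RAII helper: calls BeginEvent on construction and EndEvent on destruction,
// so the two calls stay balanced even if the rendering code returns early.
// `Annotation` is any type with BeginEvent(const wchar_t*) and EndEvent().
// On Windows that would be ID3DUserDefinedAnnotation, e.g. (assumption):
//   context->QueryInterface(IID_PPV_ARGS(&annotation));
template <typename Annotation>
class ScopedGpuEvent {
public:
    ScopedGpuEvent(Annotation* annotation, const wchar_t* name)
        : annotation_(annotation) {
        assert(name != nullptr);
        if (annotation_) annotation_->BeginEvent(name);
    }
    ~ScopedGpuEvent() {
        if (annotation_) annotation_->EndEvent();
    }
    ScopedGpuEvent(const ScopedGpuEvent&) = delete;
    ScopedGpuEvent& operator=(const ScopedGpuEvent&) = delete;

private:
    Annotation* annotation_;
};

// Hypothetical stand-in that just records the calls it receives.
struct RecordingAnnotation {
    std::vector<std::wstring> log;
    int BeginEvent(const wchar_t* name) {
        log.push_back(std::wstring(L"begin ") + name);
        return 0;
    }
    int EndEvent() {
        log.push_back(L"end");
        return 0;
    }
};

// Hypothetical frame: the cube and the framerate text each get a label,
// nested inside one label for the whole frame.
template <typename Annotation>
void RenderFrame(Annotation* annotation) {
    ScopedGpuEvent<Annotation> frame(annotation, L"Frame");
    {
        ScopedGpuEvent<Annotation> cube(annotation, L"Cube");
        // context->DrawIndexed(...) would go here.
    }
    {
        ScopedGpuEvent<Annotation> text(annotation, L"Framerate text");
        // context->DrawIndexedInstanced(...) would go here.
    }
}
```

The scoped form is just one design choice; calling BeginEvent and EndEvent by hand works too, as long as every begin has a matching end.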

The blue bars near the top (labeled Time) show how long each draw call took for the GPU to execute.  Clearly our cube is much more expensive than the framerate text (although both are ridiculously quick – this template isn’t exactly stressing my GPU :-).  The column titled Baseline shows the numeric duration of each draw, and the other columns show a series of experiments where we changed various things about the rendering and measured how much difference each one made to the GPU.  For instance, this data tells us that:

  1. Shrinking the output viewport to 1x1 reduced GPU time to just 2% of the original.  This means we are heavily fill rate limited, so a possible optimization would be to reduce the backbuffer resolution.
  2. Turning on 2x or 4x MSAA slowed things down, but only by ~10%, so it is worth considering whether we can afford that slight perf hit in exchange for the quality improvement.
  3. Reducing the backbuffer from 32 to 16 bit format gave only a small improvement.
  4. Automatically adding mipmaps to all the textures, or shrinking all the textures to half size, did not significantly affect performance, so we know this app is not bottlenecked by texture fetch bandwidth.
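
The reasoning in that list can be sketched in a few lines of code.  This is only an illustration of how one might read the numbers, not how the tool itself works: the DrawExperiments struct, the GuessBottleneck function and the 50% cutoff are all hypothetical.

```cpp
#include <cassert>
#include <string>

// Timings for one draw call, in milliseconds (hypothetical layout).
struct DrawExperiments {
    double baselineMs;          // unmodified rendering
    double viewport1x1Ms;       // output viewport shrunk to 1x1
    double halfSizeTexturesMs;  // all textures shrunk to half size
};

// Experiment time expressed as a percentage of the baseline time.
double PercentOfBaseline(double baselineMs, double experimentMs) {
    assert(baselineMs > 0.0);
    return 100.0 * experimentMs / baselineMs;
}

// Very rough classification: if an experiment cuts the time to less than
// half of the baseline, blame whatever that experiment removed.
std::string GuessBottleneck(const DrawExperiments& d) {
    const double kLargeWinPercent = 50.0;  // hypothetical cutoff
    if (PercentOfBaseline(d.baselineMs, d.viewport1x1Ms) < kLargeWinPercent) {
        return "fill rate";
    }
    if (PercentOfBaseline(d.baselineMs, d.halfSizeTexturesMs) < kLargeWinPercent) {
        return "texture bandwidth";
    }
    return "something else";
}
```

With numbers shaped like the cube's (1x1 viewport at 2% of baseline, half size textures barely changing anything), this reports a fill rate bottleneck.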

There are a couple of different forms of color highlighting going on in this report:

  1. The background of the first draw call is light red to show it was one of the more expensive draws in the frame, and therefore the part worth concentrating on.
  2. The most statistically significant differences produced by the various rendering experiments are highlighted in green (for improvements) or red (for changes that hurt performance).  Numbers that are not highlighted indicate that, although we did measure a change of performance, this may just be random measurement noise rather than a truly significant change.
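
To give a feel for what "statistically significant" can mean here, the sketch below compares repeated timings of a baseline and an experiment using Welch's t statistic.  This is an assumption chosen for illustration, not necessarily the test Visual Studio uses, and the function names and the threshold of 3 are hypothetical.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

struct Stats {
    double mean;
    double variance;  // sample variance (n - 1 denominator)
    std::size_t n;
};

// Mean and sample variance of repeated timing measurements.
Stats Summarize(const std::vector<double>& samples) {
    assert(samples.size() >= 2);
    double sum = 0.0;
    for (double s : samples) sum += s;
    const double mean = sum / static_cast<double>(samples.size());
    double squares = 0.0;
    for (double s : samples) squares += (s - mean) * (s - mean);
    return {mean, squares / static_cast<double>(samples.size() - 1),
            samples.size()};
}

// True when the difference between the two sets of timings is large
// compared with their measurement noise (|Welch's t| above the threshold).
bool LooksSignificant(const std::vector<double>& baseline,
                      const std::vector<double>& experiment,
                      double threshold = 3.0) {  // hypothetical cutoff
    const Stats a = Summarize(baseline);
    const Stats b = Summarize(experiment);
    const double standardError =
        std::sqrt(a.variance / static_cast<double>(a.n) +
                  b.variance / static_cast<double>(b.n));
    if (standardError == 0.0) {
        // No noise at all: any difference in the means is real.
        return a.mean != b.mean;
    }
    return std::fabs(a.mean - b.mean) / standardError > threshold;
}
```

A result that halves the draw time with tightly clustered samples passes this check, while two sets of samples that merely jitter around the same mean do not.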

Move the mouse over any of these numbers to view a hover tip showing more data about that particular measurement.

 

“Sounds great!  So what types of device can I use this stuff on?”

  1. The debugging part of this tool works on all Windows 8.1 and Phone 8.1 devices.
  2. Performance analysis requires the graphics driver to support timestamp queries, which was not part of Windows Phone 8.  This will work on Windows, and on newer 8.1 phones once those are available, but it will not work on existing phones (even when they are upgraded to the 8.1 OS, their older drivers will be missing the necessary query ability).
  3. New 8.1 phones will also report GPU counter values directly from the driver, which gives much richer information about what is going on inside the GPU.
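
For anyone curious what a timestamp query actually gives you, here is a small sketch of the arithmetic involved.  In D3D11, a D3D11_QUERY_TIMESTAMP query returns a 64-bit tick count, and a surrounding D3D11_QUERY_TIMESTAMP_DISJOINT query reports the tick frequency plus a flag saying whether the counter was disturbed.  The struct and function below are hypothetical stand-ins that mirror that data so the example runs without a device; they are not the code inside Visual Studio.

```cpp
#include <cassert>
#include <cstdint>

// Mirrors the two fields of D3D11_QUERY_DATA_TIMESTAMP_DISJOINT.
struct TimestampDisjointData {
    std::uint64_t frequency;  // GPU ticks per second
    bool disjoint;            // true if the timestamps cannot be trusted
};

// Converts a pair of GPU timestamps into milliseconds.
// Returns false (leaving *outMs untouched) when the measurement is
// unreliable and should be thrown away and repeated.
bool GpuTimeMilliseconds(std::uint64_t beginTicks,
                         std::uint64_t endTicks,
                         const TimestampDisjointData& disjointData,
                         double* outMs) {
    assert(outMs != nullptr);
    if (disjointData.disjoint || disjointData.frequency == 0 ||
        endTicks < beginTicks) {
        return false;
    }
    *outMs = static_cast<double>(endTicks - beginTicks) * 1000.0 /
             static_cast<double>(disjointData.frequency);
    return true;
}
```

Checking the disjoint flag matters because things like power state changes can alter the counter mid-measurement, in which case the tick difference is meaningless.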