Normally, the CPU and GPU run in parallel, so the time taken by each frame is max(CPU time, GPU time).
If your code causes a pipeline stall, however, the processors must take turns, each sitting idle while the other one runs. Yikes! Now frame time = CPU time + GPU time. In other words, a program that stalls can be both CPU and GPU bound at the same time.
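To put numbers on the two formulas above, here is a minimal Python sketch of the arithmetic. The 10 ms CPU and 12 ms GPU costs are made-up example values, and the helper names are hypothetical:

```python
def frame_time_ms(cpu_ms: float, gpu_ms: float, stalled: bool) -> float:
    """Time per frame under a simple two-processor pipeline model."""
    if stalled:
        # Each processor waits for the other, so their costs add up.
        return cpu_ms + gpu_ms
    # The CPU prepares the next frame while the GPU draws this one,
    # so whichever processor is slower sets the pace.
    return max(cpu_ms, gpu_ms)


def fps(frame_ms: float) -> float:
    return 1000.0 / frame_ms


parallel = frame_time_ms(10.0, 12.0, stalled=False)
stalled = frame_time_ms(10.0, 12.0, stalled=True)
print(f"parallel: {parallel} ms ({fps(parallel):.1f} fps)")  # parallel: 12.0 ms (83.3 fps)
print(f"stalled:  {stalled} ms ({fps(stalled):.1f} fps)")    # stalled:  22.0 ms (45.5 fps)
```

Even though neither processor got any slower, the stall costs almost half the framerate in this example.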
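The easiest way to cause a stall is to draw some graphics into a rendertarget, then call GetData on the rendertarget texture to read the results back to the CPU. Think about what happens if you do this: the drawing commands are probably still queued up waiting for the GPU, so GetData has to block until the GPU has processed all of them. Meanwhile the CPU cannot submit any new work, so as soon as the GPU catches up it runs out of commands and sits idle until the CPU has finished with the data and starts sending the next batch.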
One of the great successes of the Direct3D API is how it hides the asynchronous nature of GPU hardware. Many graphics programmers are writing parallel code without even realizing it! But as soon as you try to read data back from the GPU to the CPU, all this parallelism is lost (which is one reason it is hard to accelerate things like physics or AI on the GPU).
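A toy timeline model makes the lost parallelism visible. This Python sketch is not how any real driver works: it assumes an unbounded command queue (real drivers limit how many frames can be buffered), assumes the GPU starts a frame only once the CPU has submitted all of it, and treats a GetData-style readback as waiting for that whole frame. The function name and timings are hypothetical:

```python
def average_frame_ms(frames: int, cpu_ms: float, gpu_ms: float,
                     readback_each_frame: bool) -> float:
    """Toy model of a CPU filling a command queue that the GPU drains in order."""
    cpu_clock = 0.0  # time at which the CPU finishes submitting the current frame
    gpu_clock = 0.0  # time at which the GPU finishes everything queued so far
    for _ in range(frames):
        cpu_clock += cpu_ms                    # CPU builds and submits this frame's commands
        gpu_start = max(gpu_clock, cpu_clock)  # GPU needs free hardware AND the commands
        gpu_clock = gpu_start + gpu_ms
        if readback_each_frame:
            # Readback: the CPU blocks until the GPU has drawn the data it wants,
            # so it cannot start preparing the next frame in the meantime.
            cpu_clock = gpu_clock
    return max(cpu_clock, gpu_clock) / frames


print(average_frame_ms(100, 10.0, 12.0, readback_each_frame=False))  # 12.1
print(average_frame_ms(100, 10.0, 12.0, readback_each_frame=True))   # 22.0
```

Without the readback, the average converges on max(CPU time, GPU time); the extra 0.1 ms is the first frame's CPU head start spread over 100 frames. With a readback every frame, each frame pays CPU time + GPU time.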
A similar problem occurs with occlusion queries. To avoid a stall, the query does not wait for the GPU: it returns immediately, with its IsComplete property set to false. The query completes at whatever later time George gets around to processing the relevant drawing instructions. Games must cope with this data not being available straight away. For instance, our Lens Flare sample falls back on occlusion data from the previous frame if the latest result is not yet available.
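The fallback idea can be sketched without any graphics API. In this Python model, SimulatedOcclusionQuery is a hypothetical stand-in whose result shows up a fixed number of frames late (its is_complete and pixel_count fields loosely mirror the IsComplete idea above), and FlareVisibility is a hypothetical helper that keeps using the most recent completed result rather than blocking. This illustrates the technique only; it is not the Lens Flare sample's actual code, and the pixel counts are invented inputs:

```python
class SimulatedOcclusionQuery:
    """Hypothetical stand-in for a GPU occlusion query whose result arrives frames later."""

    def __init__(self, latency_frames: int):
        self.latency_frames = latency_frames
        self.is_complete = False
        self.pixel_count = 0
        self._pending = None
        self._frames_waited = 0

    def issue(self, visible_pixels: int) -> None:
        # Issuing the query returns at once; the "GPU" counts pixels later.
        self._pending = visible_pixels
        self._frames_waited = 0
        self.is_complete = False

    def tick(self) -> None:
        # Called once per frame to model the GPU catching up with its queue.
        if self._pending is not None and not self.is_complete:
            self._frames_waited += 1
            if self._frames_waited >= self.latency_frames:
                self.pixel_count = self._pending
                self.is_complete = True


class FlareVisibility:
    """Reuses the most recent completed result instead of waiting for the GPU."""

    def __init__(self, query: SimulatedOcclusionQuery):
        self.query = query
        self.last_known = 0          # assume hidden until the first result arrives
        self.query_in_flight = False

    def update(self, visible_pixels: int) -> int:
        if self.query_in_flight and self.query.is_complete:
            self.last_known = self.query.pixel_count
            self.query_in_flight = False
        if not self.query_in_flight:
            self.query.issue(visible_pixels)  # only start a new query once the old one is done
            self.query_in_flight = True
        return self.last_known               # never blocks, but may be a frame or more stale


def run(visible_per_frame, latency_frames):
    query = SimulatedOcclusionQuery(latency_frames)
    flare = FlareVisibility(query)
    used = []
    for pixels in visible_per_frame:
        used.append(flare.update(pixels))
        query.tick()
    return used


print(run([100, 50, 0], latency_frames=1))  # [0, 100, 50]
```

Each frame draws with slightly out-of-date visibility, which is usually invisible for something like a lens flare, and in exchange the CPU never waits on the GPU.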
There is one situation where you can cause pipeline stalls purely by writing data to the GPU, rather than reading back from it. Can anyone figure out what that is?