SetDataOptions: NoOverwrite versus Discard

Originally posted to Shawn Hargreaves Blog on MSDN, Wednesday, July 7, 2010

This seemingly innocuous code:

    device.SetVertexBuffer(vb);
    device.Draw(...);
    device.SetVertexBuffer(null);

    vb.SetData(...);

is the sort of thing that makes graphics driver writers wake in the night, heart pounding, drenched in sweat. Verily 'tis the stuff of nightmares.

The problem is that the GPU runs asynchronously, some way behind the CPU. When the CPU reaches the SetData call and wants to change the contents of the vertex buffer (or index buffer, or texture...) the GPU hasn't yet got around to processing the earlier draw call, so it still needs the previous contents of that buffer. What on earth is a poor driver to do?

  1. It could stall the pipeline, blocking the CPU until the GPU has finished using the resource. But stalling is not great for performance, and not possible at all on Xbox when using a resolution that requires predicated tiling. Tiling replays the same GPU command buffer multiple times (once per EDRAM tile), so it is nonsensical to expect the GPU to catch up with the CPU while the CPU is still in the process of generating that command buffer.

  2. It could just ignore the problem, letting both CPU and GPU continue ahead regardless. Everything 'works', but by the time the GPU reaches the Draw call, the vertex buffer will now contain the wrong data, so it will end up using vertices that didn't even exist back when the Draw call was issued. Probably not what the programmer had in mind!

  3. It could give up, throw its hands in the air, throw an exception, or crash. Easy for the driver writer, but not so much for the game programmer who wants to call SetData but is not allowed because of GPU implementation details they know nothing about and have no control over. 

  4. It could perform resource renaming, a.k.a. Deep Magic, which works like so:
    a. Internally allocate a new resource, the same size and format as the original
    b. If only setting part of the resource, copy the contents of the original over to the new one
    c. Perform the SetData operation on the new resource
    d. Swap the two resources, so any time the CPU tries to refer to the original, it actually gets the new copy instead
    e. Keep the original resource around until the GPU has finished with it, then free it

     This allows rendering to continue in parallel, without any stalls, no matter when SetData is called. If done right, the caller need never even be aware that this renaming took place.
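
To make the renaming idea concrete, here is a toy model in plain C#. Treat it as a sketch only: RenamedBuffer, the fence counters, and RetireUpTo are names made up for illustration, and a real driver does this with GPU memory and hardware fences rather than byte arrays.

```csharp
using System;
using System.Collections.Generic;

// Toy model of resource renaming. Nothing here is XNA or driver API:
// RenamedBuffer, the fence values, and RetireUpTo are hypothetical.
class RenamedBuffer
{
    byte[] current;               // the storage the CPU currently sees
    long lastUsedByDraw = -1;     // fence of the latest queued draw that reads 'current'
    readonly Queue<KeyValuePair<long, byte[]>> retired =
        new Queue<KeyValuePair<long, byte[]>>();

    public RenamedBuffer(int size) { current = new byte[size]; }

    public int RetiredCount { get { return retired.Count; } }

    // Records that a queued draw with this fence value reads the buffer,
    // and returns the storage the GPU will read when it gets there.
    public byte[] Draw(long fence)
    {
        lastUsedByDraw = fence;
        return current;
    }

    // gpuFence is the last fence value the GPU has completed.
    public void SetData(int offset, byte[] data, bool discard, long gpuFence)
    {
        if (lastUsedByDraw > gpuFence)
        {
            // The GPU still needs the old contents: allocate a same-sized copy.
            byte[] renamed = new byte[current.Length];

            // Preserve the existing contents unless the caller said Discard.
            if (!discard)
                Array.Copy(current, renamed, current.Length);

            // Keep the original alive until the GPU has finished with it,
            // then make the CPU refer to the new copy from now on.
            retired.Enqueue(new KeyValuePair<long, byte[]>(lastUsedByDraw, current));
            current = renamed;
            lastUsedByDraw = -1;
        }

        // Perform the actual write (in place if the GPU was not using the buffer).
        Array.Copy(data, 0, current, offset, data.Length);
    }

    // Frees retired copies whose last draw the GPU has completed.
    public void RetireUpTo(long gpuFence)
    {
        while (retired.Count > 0 && retired.Peek().Key <= gpuFence)
            retired.Dequeue();
    }
}
```

In this model a pending draw keeps reading the array it was handed, while later SetData calls land in the renamed copy. Passing discard = true skips the Array.Copy, which is the saving the Discard hint is meant to unlock.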

So which approach does XNA choose? As of version 4.0:

Prior to version 4.0, things worked the same on Windows, but our Xbox implementation was less awesome:

I'm very happy that we finally found time to implement resource renaming on Xbox, so you can SetData any time you like, regardless of whether predicated tiling is in use, and SetDataOptions.Discard works the same as on other platforms.

 

When to use SetDataOptions.Discard

The Discard flag is a hint to the driver that you no longer care about any of the data in the resource, so it does not need to bother preserving the existing contents. This can make resource renaming more efficient, because it allows the driver to skip copying the original contents into the new resource (step 4.b above).

You should specify the Discard flag any time you are calling SetData on a dynamic buffer, and are planning on entirely replacing the contents of that buffer. Even if the current SetData call only changes part of the buffer, if you no longer need the data in the rest of the buffer, this is your chance to let the driver know that. It will love you for giving such a useful hint!

Note that Discard only means "the driver is allowed to throw away the current contents of the buffer if it finds that to be useful". The driver does not HAVE to discard the buffer contents if it does not wish to do so! If the buffer is not currently in use by the GPU, the driver will ignore the Discard hint.

 

When to use SetDataOptions.NoOverwrite

The NoOverwrite flag is a hint to the driver that you are not going to change any part of the resource which the GPU might still be using. This is not enforced, but if you do change data while the GPU is using it, you will get incorrect rendering (typically flickering, but almost anything might happen depending on timing).

You can use the NoOverwrite flag when combining multiple independent pieces of data into a single larger buffer. Say you SetData one piece of data into one part of the buffer, draw using it, and are now about to SetData a different piece of data into a different part of the buffer. NoOverwrite is your chance to tell the driver that, even though you are changing a resource which is still in use by the GPU, you happen to know that the region you are changing is not the region the GPU is reading, so there is no need for it to stall or rename the resource.
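
Here is a small plain C# model of why that promise matters. It is a sketch under loud assumptions: the "GPU" is just a queue of deferred reads over a shared array, an in-place write stands in for SetData with NoOverwrite, and NoOverwriteDemo is a made-up name rather than anything from XNA.

```csharp
using System;
using System.Collections.Generic;

// Toy model, not XNA: queued lambdas play the part of draw calls the GPU
// has not executed yet, and Write() is an in-place (NoOverwrite style) update.
static class NoOverwriteDemo
{
    // Returns what each queued draw actually read once the "GPU" caught up.
    public static List<int[]> Run(bool reuseRegionInUse)
    {
        int[] buffer = new int[8];
        var gpuQueue = new Queue<Func<int[]>>();

        // Piece A goes into elements 0..3, then a draw over that region is queued.
        Write(buffer, 0, new[] { 1, 1, 1, 1 });
        gpuQueue.Enqueue(() => Read(buffer, 0, 4));

        // Piece B is written in place while the draw of A is still pending.
        // A disjoint region (4..7) is safe; reusing 0..3 corrupts the draw of A.
        int offsetB = reuseRegionInUse ? 0 : 4;
        Write(buffer, offsetB, new[] { 2, 2, 2, 2 });
        gpuQueue.Enqueue(() => Read(buffer, offsetB, 4));

        // Only now does the "GPU" get around to executing the draws.
        var results = new List<int[]>();
        while (gpuQueue.Count > 0)
            results.Add(gpuQueue.Dequeue()());
        return results;
    }

    static void Write(int[] buffer, int offset, int[] data)
    {
        Array.Copy(data, 0, buffer, offset, data.Length);
    }

    static int[] Read(int[] buffer, int offset, int count)
    {
        int[] copy = new int[count];
        Array.Copy(buffer, offset, copy, 0, count);
        return copy;
    }
}
```

With Run(false) the first draw reads the 1s and the second reads the 2s, as intended. With Run(true) the first draw reads 2s, because its data was replaced before it executed. On real hardware the outcome depends on timing, which is why the symptom tends to be flickering rather than a consistent error.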

Used wisely, the NoOverwrite hint can provide dramatic speed gains. But used incorrectly, it can produce incorrect rendering results. Check out our Particle 3D sample for an example of using it correctly (see the giant comment near the top of the ParticleSystem class), or the way I implemented skidmarks in MotoGP.

 

When to combine them both

A common pattern for games that need to generate lots of dynamic geometry is to use a single dynamic buffer as a circular queue. New geometry is appended to the buffer with NoOverwrite, then drawn, then more geometry is appended again using NoOverwrite, and drawn, rinse, lather, repeat. When you reach the end of the buffer, the position is reset back to the start, and this wrapping SetData switches to Discard mode, which signals the driver to perform a rename and give us a fresh copy of the buffer. This scheme allows any amount of geometry to be efficiently drawn using a single relatively small buffer. The driver will internally allocate however many renamed copies are necessary to avoid stalling.

In code:

    // Initialize.
    const int BufferSize = xxxx;
    DynamicVertexBuffer vb = new DynamicVertexBuffer(device, typeof(VertexType), BufferSize, BufferUsage.None);
    int currentBufferPosition = 0;

    // Add new geometry to the buffer, and return the offset for drawing these vertices.
    int AddVerticesToDynamicBuffer(VertexType[] vertices)
    {
        // Append to the existing buffer.
        int position = currentBufferPosition;
        SetDataOptions hint = SetDataOptions.NoOverwrite;

        // If we reached the end, wrap back to the beginning and Discard the existing buffer contents.
        if (position + vertices.Length > BufferSize)
        {
            position = 0;
            hint = SetDataOptions.Discard;
        }

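        // Write the new data into the buffer. The stride comes from the buffer's vertex declaration,
        // because sizeof() on a custom vertex struct only compiles in an unsafe context.
        int stride = vb.VertexDeclaration.VertexStride;
        vb.SetData(position * stride, vertices, 0, vertices.Length, stride, hint);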
        currentBufferPosition = position + vertices.Length;
        return position;
    }

Internally, SpriteBatch does pretty much exactly this.
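
If you want to see the wrap behaviour without a GraphicsDevice, the position and hint bookkeeping from AddVerticesToDynamicBuffer can be pulled out on its own. This is a sketch with assumptions: DynamicBufferCursor and Reserve are hypothetical names, the Hint enum is a stand-in for SetDataOptions so the code compiles by itself, and the guard against oversized batches is my addition rather than part of the code above.

```csharp
using System;

// Stand-in for XNA's SetDataOptions so the sketch compiles on its own.
enum Hint { NoOverwrite, Discard }

// Hypothetical helper: the position/hint bookkeeping from
// AddVerticesToDynamicBuffer, without the actual SetData call.
class DynamicBufferCursor
{
    readonly int bufferSize;
    int currentPosition;

    public DynamicBufferCursor(int bufferSize) { this.bufferSize = bufferSize; }

    // Reserves 'count' vertices and reports where to write them and with which hint.
    public int Reserve(int count, out Hint hint)
    {
        // Assumption: a batch larger than the whole buffer is a caller error.
        if (count > bufferSize)
            throw new ArgumentOutOfRangeException("count");

        // Append after whatever was written last.
        int position = currentPosition;
        hint = Hint.NoOverwrite;

        // Not enough room left: wrap to the start and ask for a fresh copy.
        if (position + count > bufferSize)
        {
            position = 0;
            hint = Hint.Discard;
        }

        currentPosition = position + count;
        return position;
    }
}
```

With a 10 vertex buffer and three batches of 4 vertices, the first two land at offsets 0 and 4 with NoOverwrite, and the third wraps to offset 0 with Discard because 8 + 4 does not fit.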
