XML and the Content Pipeline

I decided to write a few posts about the role played by XML in the XNA Framework Content Pipeline, because this isn't well documented and people seem to find it confusing.

The first thing to get straight is the distinction between the things that are fundamental parts of the pipeline architecture and those that are just specific implementations for one particular type of data. Let's start with a recap of the basic pipeline architecture:

1. You have a file containing game data, which can be in any format you like
2. The ContentImporter reads this file from disk, returning a managed object
   - It might return one of our standard Microsoft.Xna.Framework.Content.Pipeline.Graphics types, but could also load any custom type of your own
3. The ContentProcessor converts the managed object into a different format
   - Sometimes it returns the same type, but massages the contents of the data (for instance adding mipmaps to a texture)
   - Other times it may return an entirely different type (for instance converting a FontDescription into a SpriteFontContent)
   - The processor may also be a no-op
4. The ContentTypeWriter writes the processor output object into a binary .xnb file
5. The .xnb file is deployed to Xbox
6. Your game calls ContentManager.Load
7. The ContentTypeReader loads the .xnb data into memory
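As a rough sketch of stages 2 and 3, here is a minimal importer and processor pair. The class names, the ".notes" extension, and the display names are all invented for this example; only the ContentImporter and ContentProcessor base classes and their attributes come from the framework.

```csharp
using System.IO;
using Microsoft.Xna.Framework.Content.Pipeline;

// Stage 2: read the source file from disk and return a managed object.
// The ".notes" extension is hypothetical, made up for this sketch.
[ContentImporter(".notes", DisplayName = "Notes Importer (example)",
                 DefaultProcessor = "UppercaseNotesProcessor")]
public class NotesImporter : ContentImporter<string>
{
    public override string Import(string filename, ContentImporterContext context)
    {
        return File.ReadAllText(filename);
    }
}

// Stage 3: same type in and out, but the contents are massaged.
[ContentProcessor(DisplayName = "Uppercase Notes Processor (example)")]
public class UppercaseNotesProcessor : ContentProcessor<string, string>
{
    public override string Process(string input, ContentProcessorContext context)
    {
        return input.ToUpperInvariant();
    }
}

// Stage 6, in the game project (asset name is hypothetical):
//     string notes = Content.Load<string>("readme");
```

Nothing in this pair touches XML: the importer is free to read whatever format it likes.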

Note that there is no mention of XML in any of these steps. So at a fundamental level, XML is not part of the underlying Content Pipeline architecture.

XML enters the picture in two places:

In stage 1 of the steps described above, your file might well happen to be in XML format. If that is the case, you would want the importer in stage 2 to read XML data. There are many ways this can be achieved:

- You could use our built-in XmlImporter, which is a trivial wrapper around the IntermediateSerializer class
- Or you could write a custom importer using any of the following:
  - The standard .NET XmlSerializer
  - Or XmlDocument
  - Or XPath
  - Or XmlReader
  - Or the serializer formerly known as Indigo
  - Or the WPF XAML serializer
  - Or any of the various third-party XML solutions

Notice a trend here? .NET offers a lot of different ways to read XML data!
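To make the custom importer route concrete, here is a minimal sketch that uses the standard .NET XmlSerializer. The LevelData type, the ".level" extension, and the importer name are assumptions invented for illustration, not anything that ships with the framework.

```csharp
using System.IO;
using System.Xml.Serialization;
using Microsoft.Xna.Framework.Content.Pipeline;

// Hypothetical data type, invented for this example.
public class LevelData
{
    public string Name;
    public int Width;
    public int Height;
}

// The ".level" extension and display name are made up for this sketch.
[ContentImporter(".level", DisplayName = "Level Importer (XmlSerializer example)")]
public class LevelImporter : ContentImporter<LevelData>
{
    public override LevelData Import(string filename, ContentImporterContext context)
    {
        // Plain .NET XmlSerializer: the IntermediateSerializer is not involved.
        XmlSerializer serializer = new XmlSerializer(typeof(LevelData));

        using (FileStream stream = File.OpenRead(filename))
        {
            return (LevelData)serializer.Deserialize(stream);
        }
    }
}
```

Any of the other XML APIs in the list could be swapped in here, since the pipeline only cares about the object that Import returns.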

Why, given all these options, did we bother to create our own IntermediateSerializer? Weren't there enough different serializers already?

Because of the second place where the pipeline uses XML.

After the importer runs, but before the processor, we have an optional stage 2.5, where we write the data that was just loaded to an XML file in the obj directory. We do this for two reasons:

- When you are debugging a problem with your data, it can be useful to examine it in a human readable XML format. This makes it easy to see exactly what has been read by the importer, and what is going into the processor.
- For performance. Because we have cached the data in this XML file, if a later part of the build requests that same data again, we can just deserialize it rather than having to re-import the original file from scratch. This was originally designed to speed up the case where you change your processor code, requiring the processor to run again, but the importer and original file have not changed. We never had time to implement that level of smarts in the pipeline (I still hope we'll get around to it someday!) but this cache file is part of our planning to eventually make that possible.

Caching is optional: importers can turn it off by setting an attribute. We disable it for our texture importer (textures are big, fast to import, and not very interesting to debug, so caching them would be a waste of time) but we do cache the outputs from our X and FBX importers.
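For illustration, here is roughly what that looks like on a custom importer, assuming the attribute setting in question is the CacheImportedData property of ContentImporterAttribute. The importer itself, its ".blob" extension, and its display name are invented for this example.

```csharp
using System.IO;
using Microsoft.Xna.Framework.Content.Pipeline;

// Hypothetical importer for large binary files that are cheap to re-read,
// so writing an intermediate XML cache would be wasted effort.
// The value is spelled out explicitly rather than relying on the default.
[ContentImporter(".blob", DisplayName = "Raw Bytes Importer (example)",
                 CacheImportedData = false)]
public class RawBytesImporter : ContentImporter<byte[]>
{
    public override byte[] Import(string filename, ContentImporterContext context)
    {
        return File.ReadAllBytes(filename);
    }
}
```

Setting the same property to true on an importer whose output is slow to produce or worth inspecting would opt it in to the cache file instead.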

We originally designed our IntermediateSerializer for the purpose of managing these cache files (I will explain why XmlSerializer was unsuitable in my next post). Once we had a serializer of our very own, we decided it would be useful to expose this as a public API so people could use it for other things as well. For instance:

- When you are debugging a complex processor, it can be useful to manually call IntermediateSerializer.Serialize at interesting points, dumping out copies of your data for later analysis.
- Since this serializer can efficiently transfer model data between a human readable XML format and the pipeline object model, perhaps this might be useful for people writing tools such as level editors? For instance they could use a technique like this sample to import models from X and FBX formats, then do all their editing directly on the NodeContent data, using the IntermediateSerializer to load and save it.
- Once we had the IntermediateSerializer, we found ourselves wanting to use it to import XML files into the pipeline. For instance it was trivial for us to load .spritefont files by calling into this existing serializer code. We decided it would be useful if we wrapped it up to create the generic XmlImporter, so people could easily use it to load their own XML data.
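The debugging use mentioned above might look something like the following sketch. The processor name and the output filename are hypothetical; the sketch simply derives from the stock ModelProcessor and dumps its input before handing off to the base class.

```csharp
using System.IO;
using System.Xml;
using Microsoft.Xna.Framework.Content.Pipeline;
using Microsoft.Xna.Framework.Content.Pipeline.Graphics;
using Microsoft.Xna.Framework.Content.Pipeline.Processors;
using Microsoft.Xna.Framework.Content.Pipeline.Serialization.Intermediate;

// Hypothetical processor: behaves like ModelProcessor, but first writes
// its input to an XML file so it can be inspected later.
[ContentProcessor(DisplayName = "Model Processor With Debug Dump (example)")]
public class DumpingModelProcessor : ModelProcessor
{
    public override ModelContent Process(NodeContent input, ContentProcessorContext context)
    {
        // "debug_dump.xml" is an arbitrary name chosen for this sketch.
        string path = Path.Combine(context.IntermediateDirectory, "debug_dump.xml");
        DumpToXml(input, path);

        return base.Process(input, context);
    }

    static void DumpToXml<T>(T value, string filename)
    {
        XmlWriterSettings settings = new XmlWriterSettings();
        settings.Indent = true;

        using (XmlWriter writer = XmlWriter.Create(filename, settings))
        {
            // The last argument is the path used to relocate external
            // references; null leaves them as they are.
            IntermediateSerializer.Serialize(writer, value, null);
        }
    }
}
```

The same DumpToXml helper could be called at several points inside a more complex processor to compare the data before and after each step.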

The important thing to take away from all this is that there is nothing special or magic about our XmlImporter. This happens to be the default importer which we select when you add an XML file to your content project, but if you don't like how the IntermediateSerializer works, or want to load XML data in some other way, you can write your own importer using any of the other XML choices provided by .NET.

Also, you should note that using the XmlImporter only affects the importer stage of the pipeline. Once the data has been imported, it is just a regular managed object like any other. XML is not involved in the processor, ContentTypeWriter, .xnb, or ContentTypeReader stages.

My next couple of posts will talk about why we decided to create this new serializer, and go into more detail about how it works.

Originally posted to Shawn Hargreaves Blog on MSDN, Friday, May 30, 2008