MotoGP Online Postmortem:

Making an Xbox Live launch title in 7 weeks


By Shawn Hargreaves



The original Xbox version of MotoGP, developed by Climax and published by THQ, was released in May 2002. This included a system link mode supporting up to 16 players, which was hugely popular with both visiting journalists and people at Microsoft, but unfortunately not accessible to the majority of gamers due to the practical problems of getting enough people, Xboxes, and TVs into the same room at the same time. This was a great shame because racing games are always more fun in multiplayer mode, and ours was no exception.

The original Xbox version of MotoGP

Microsoft first approached us about the possibility of adding support for their upcoming online service in May, but it wasn't immediately obvious how this could be done. We all agreed it would be great if we could have something out in time for the Live launch, but this left us very little time to add the Live support, and people who had bought the first game would be justifiably unhappy if we released a new version just a few months later!

After much discussion, we decided the best option would be to produce an online-only demo version of the game, which would be bundled with the Live starter kits. This would offer three tracks and a limited selection of bikes free to everyone subscribed to Live, but would also be able to read save files from the original game, so if the player had previously bought this and unlocked more features, they would be able to use the full set of 11 tracks and import their customised bikes into the Live version. We hoped this would encourage more people to go out and buy the full game, at the same time as making the Live support a free upgrade to anyone who already had the original title. Designing our new version as an online-only game hugely simplified the implementation, as it allowed the entire team to concentrate purely on Live support for the duration of the project.

Unfortunately, negotiations between Microsoft, THQ, and Climax took so long that we didn't get the go-ahead to start work until early July, by which point we only had 7 weeks left to complete the project. As if that wasn't enough of a challenge, George, the programmer who wrote our original networking code, chose this summer to be away getting married, so I had to take over responsibility for the network subsystem, starting from a position of total ignorance.

I will never forget the moment when, six weeks into the project, with our matchmaking system finally working well enough that you could create and join sessions, and a Microsoft programmer due to show up the next day to help us add voice support, I finally got around to opening up a UDP socket ready to start talking to other machines, only to realise that I had absolutely no idea how to do this! I was tempted to email Xbox developer support and admit I didn't have a clue what I was doing, but a quick Google search turned up a beginner's guide to Winsock, which was enough to see me through.

Amazingly enough, we did manage to get the game finished on time, and even better, it has been widely praised as having the best Live implementation of any of the launch titles. I'm immensely grateful to everyone at Microsoft for their help and most importantly trust during this project, because we had so little to show right up to the very end: not even a proper schedule, because we couldn't spare the time to write down anything more formal than "I'm pretty sure that if we don't sleep much and drink enough coffee we might perhaps be able to maybe get it done". It was an immense relief when, just a week short of the deadline, George returned from his honeymoon and merged his position interpolation code in with my packet sending layer, so we finally had bikes moving around the track in roughly the right places. Thankfully this worked on the first attempt, because it would have been extremely embarrassing if we'd ended up without any working races at all!

Seven weeks' experience as a network programmer isn't really enough for me to be able to claim more than beginner status, but for what it is worth, here are the top 10 things I learned during the course of this project...




1: Supporting Xbox Live is mostly a user interface problem, and not about networking at all.

Microsoft have already wrapped a lot of the more awkward network protocols into simple API functions, so the biggest challenge left for you is to call these routines in the right order, and to present information to the user in a convenient way.


Xbox Live requires some standard user interfaces, such as this friends list, to be available both from the main menu system and via the in-game pause menu

Most game interfaces are simple, with the bulk of development time being spent on attractive presentation rather than a solid technical underpinning. This is not good enough for a fluid Live implementation, which requires a large number of UI screens, some non-trivial flows of control from one screen to another, and far more options than would be normal for a single player game. A Live UI is constrained not only by what works in the context of your game, but also by Technical Certification Requirements (TCRs) for standard Live features such as the friends list and invite system.

At the very least I think a Live-capable UI layer should provide:

  • Multiple layers, so you can bring up error messages, confirmation prompts, progress indicators, etc, over the top of any other screens.

  • Reusable menu screens and alert boxes, where you provide a message string, a couple of options for the user to choose between, and a callback function that will be executed once the user has finished interacting with the screen. Live support involves a lot of obscure, rarely used error messages and confirmation prompts, so it needs to be easy to add these.

  • UI functions such as muting players, sending feedback, and changing your voice mask must be accessible from the ingame pause menu as well as from the main menu system. Ideally you should be able to bring up any UI screen during the game, because it adds a lot if gamers can access their friends list, respond to invitations, check the scoreboards, etc, without having to quit out of the game first.

  • Easy reordering of UI screens. In a typical single player game your menu structure might be as simple as:

        choose game mode
        choose vehicle
        choose level
        play game
    

    but when entering a Live session, this sequence is different depending on whether you are creating a new session, doing an Optimatch search, or looking for a Quickmatch, and in the latter case, there are still more differences depending on whether the matchmaking subsystem found a suitable session for you to join or decided to create a new one. This can quickly get out of hand unless you have a nice way to bring up a single screen in many different contexts, without the screen itself having to know where it came from or where to go next.

    For example, our scoreboards were initially accessed from the top level menu, but once we started playing the game we realised it would be better if they could be reached from inside the Live lobby as well. Although our scoreboards UI is a complex chain of three different screens, it only needed a single line of code to call this up from a new menu, telling the scoreboards system where to return when the user backed out of it, or if there was a network error. If this change had not been so easy to make, we would have had to settle for an inferior scoreboards implementation.
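
One way to get screens that do not know where they came from is to let the caller supply the "where next" decision as callbacks when it activates a screen. The following is a minimal sketch of that idea, not the MotoGP code: `ScreenStack`, `Screen`, and the hook names are all hypothetical.

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical sketch: every name here is invented for illustration.
struct Screen {
    std::string name;
};

class ScreenStack {
public:
    using Hook = std::function<void()>;

    // Push a screen on top of whatever is showing. The caller, not the
    // screen, decides what happens when the user finishes or backs out.
    void activate(const Screen& screen, Hook on_done, Hook on_back) {
        mLayers.push_back({screen.name, std::move(on_done), std::move(on_back)});
    }

    // The top screen reports that the user completed it.
    void done() { pop_and_call(true); }

    // The top screen reports that the user backed out, or that an error occurred.
    void back() { pop_and_call(false); }

    const std::string& top() const {
        assert(!mLayers.empty());
        return mLayers.back().name;
    }

    bool empty() const { return mLayers.empty(); }

private:
    struct Layer {
        std::string name;
        Hook on_done;
        Hook on_back;
    };

    // Remove the top layer first, then run its hook, so the hook is free
    // to activate further screens.
    void pop_and_call(bool completed) {
        assert(!mLayers.empty());
        Layer layer = std::move(mLayers.back());
        mLayers.pop_back();
        Hook& hook = completed ? layer.on_done : layer.on_back;
        if (hook)
            hook();
    }

    std::vector<Layer> mLayers;
};
```

With a structure like this, reaching an existing screen from a new menu amounts to one more `activate` call with a different pair of hooks, and because screens are layered, an alert box or progress indicator can be pushed over anything that is already showing.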




    2: If your code is flawed, it is better to rewrite it than to work around the problems.

    This was one of the main conclusions reached in a code postmortem at the end of the first MotoGP project. When we started on the Live version, our initial analysis of the work involved suggested the UI requirements discussed above, but at the time our UI system provided none of that, and had already proven awkward to work with even for the simpler situations encountered in a single player game.

    Rewriting your entire codebase in two months might sound like a tall order, but that is exactly what we did. Starting with such basic elements as our text rendering system, we rewrote every single UI screen, every speedometer, lap counter, and position indicator overlay, and every bit of game mode logic.

    A month into the project I was worried that we had bitten off far too much, and were never going to get it all working in time. But after six weeks it was working well enough that we were able to start moving screens around and experimenting with different layouts, taking advantage of a flexibility we never had before. After seven weeks, we shipped what one gamer called "the best example of what an Xbox Live game can be". Today, we have a solid codebase that is a pleasure to work with as we finish off MotoGP version 2.

    It made for a stressful couple of months, but I'm convinced we would never have managed such a good Live implementation if we had stuck with our old UI system, and might well have failed altogether. I can now confidently state that rewriting huge swathes of code takes a lot less time than you might expect, and the benefits can be enormous.




    3: Networks are asynchronous: deal with it.

    When any operation may take an indefinite amount of time to complete, you have to either build a state machine to track the current status of every task, or fire up a separate thread for each potentially blocking operation.

    I don't much like threads, mostly because they can be a nightmare to debug, but also because of the wasted resources of separate stack memory for each one. So I decided to build a state machine.

    MotoGP uses a list of NetTask objects, each of which encapsulates a single online operation. These are pumped once per frame (60 times a second) to do whatever network processing they may require, and each task provides a status that can be checked by the UI system. If all the tasks have an OK status, the UI carries on as normal. If any task has a BUSY status, the UI displays an animating progress indicator, and ignores user input until the task has completed. This allows the network code to be written in a simple way, completely ignoring UI issues, for instance:

        void LiveFriends::get_friends(int user)
        {
            if (XOnlineFriendsEnumerate(user, NULL, &mEnumHandle) == S_OK) {
                mStatus = NetStatus::BUSY;
                mUser = user;
            }
            else {
                mStatus = NetStatus::ERROR_MESSAGE;
                mStatus.mMessage = Msg::NetErr_Friends;
            }
        }
    
    
        void LiveFriends::update()
        {
            if (!mEnumHandle)
                return;
    
            switch (XOnlineTaskContinue(mEnumHandle)) {
    
                case XONLINETASK_S_RUNNING:
                    // still downloading: leave the BUSY flag set
                    break;
    
                case XONLINETASK_S_RESULTS_AVAIL:
                    // got a new set of data: copy it and clear the BUSY flag
                    mNumFriends = XOnlineFriendsGetLatest(mUser, MAX_FRIENDS, mFriends);
                    mStatus.clear();
                    break;
    
                default:
                    // oh dear!
                    XOnlineTaskClose(mEnumHandle);
                    mEnumHandle = NULL;
    
                    mStatus = NetStatus::ERROR_MESSAGE;
                    mStatus.mMessage = Msg::NetErr_Friends;
                    break;
            }
        }
    

    For its part, the UI code for entering the friends screen looks like:

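        // forward declaration, so the hook can be named before it is defined
        static void go_into_friends_hook();
    
        void go_into_friends_screen()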
        {
            gLiveFriends.get_friends(current_user);
    
            activate_screen(gNetBusyScreen, go_into_friends_hook);
        }
    
    
        static void go_into_friends_hook()
        {
            activate_screen(gFriendsScreen);
        }
    

    In the normal case, the NetBusyScreen will display an animating "network busy" graphic as long as the gLiveFriends network task is in a BUSY state. When the network state changes to OK, the NetBusyScreen detects this and calls the provided hook function, which brings up the friends screen to display the information that has just been downloaded.

    If the friends enumeration fails, the network task sets an ERROR_MESSAGE status, in which case the UI layer displays that message to the user and dismisses the NetBusyScreen, so the hook function will never be called and the friends screen will never be activated.

    If a really bad error happens, one of the network tasks might set a status like SESSION_LOST, DASHBOARD_TROUBLESHOOT, or CABLE_UNPLUGGED, in which case the UI layer will take some more drastic action such as quitting right back out to the top level of the menu system, or rebooting to the Xbox Dashboard network troubleshooter.

    This design worked extremely well on MotoGP, keeping the UI and network systems simple, separate, and as linear as possible. It got to the point where I sometimes forgot we weren't multithreading, and started worrying about possible race conditions if the network layer changed state in the middle of a piece of UI code! (only to feel extremely stupid when I remembered we were just pumping each of these tasks one after another: lack of sleep is my only excuse :-)

    I'm sure there are many other nice ways of structuring such things, but my main point is that you need to choose a good one at the start of your Live project. Asynchronicity isn't going to go away, but a good design can make it easy to live with.
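
The task-pumping scheme described in this section can be sketched in a few lines. This is a minimal illustration under stated assumptions rather than the MotoGP implementation: the `Status` enum, the `FakeDownload` stand-in, and `pump_tasks` are hypothetical, and the enum is simply ordered from least to most severe so that the worst status can be found by comparison.

```cpp
#include <cassert>
#include <vector>

// Hypothetical sketch of the task-pumping idea. The enumerators are
// listed in order of increasing severity.
enum class Status { OK, BUSY, ERROR_MESSAGE, SESSION_LOST };

class NetTask {
public:
    virtual ~NetTask() = default;
    virtual void update() = 0;  // called once per frame, must never block
    Status status() const { return mStatus; }

protected:
    Status mStatus = Status::OK;
};

// Stand-in for a real online operation: stays BUSY for a fixed number
// of frames, then reports OK.
class FakeDownload : public NetTask {
public:
    explicit FakeDownload(int frames) : mFramesLeft(frames) {
        assert(frames >= 0);
        mStatus = frames > 0 ? Status::BUSY : Status::OK;
    }

    void update() override {
        if (mFramesLeft > 0 && --mFramesLeft == 0)
            mStatus = Status::OK;
    }

private:
    int mFramesLeft;
};

// Pump every task once, then report the most severe status so the UI can
// decide whether to carry on, show a progress indicator, or bail out.
Status pump_tasks(const std::vector<NetTask*>& tasks) {
    Status worst = Status::OK;
    for (NetTask* task : tasks) {
        task->update();
        if (task->status() > worst)
            worst = task->status();
    }
    return worst;
}
```

Because every task is updated in turn on the same thread, the UI only ever observes the combined status between frames, which is why no locking is needed.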

    Our system link code in the first MotoGP game blocked while establishing TCP connections, and the very last couple of bugs that held up our submission were caused by too many people trying to join a session at the same time, and crashing the network stack because it was unable to respond to them. Blocking is evil: don't ever do it!




    Various filters can be used to disguise your voice

    4: Everything else is easy.

    Once you have dealt with the UI and async issues, everything is pretty much plain sailing from then on.

    The Live API is good, well documented, and surprisingly reliable.

    Complex subsystems like the friends system and matchmaking have already been mostly designed for you.

    The overhead is small: we dropped from 20 bikes in our single player game to 16 in the Live version, which was more than enough to recover the memory and CPU taken up by Live. We could have managed 19 bikes if it wasn't for bandwidth issues.

    Voice support, which I was initially worried about, turned out to use up a negligible amount of bandwidth (down to as little as 450 bytes a second), and it was extraordinarily easy to implement because one of the example programs contained a reusable module that took care of all the hard bits. That code has since been merged into the XDK to form the high level voice API.

    Kudos to Microsoft on a job well done.




    5: Xbox Live is broadband-only, and a broadband connection is always active.

    In other words, if someone has a Live account, there is never any reason not to sign them onto the Live service as soon as they boot up your game. Don't hide the Live signon away inside the multiplayer menu, because then you won't be able to access Live features while they are in the single player game. Online scoreboards are if anything more relevant to single player than multiplayer modes, because when someone is trying to set a new record, they generally don't want to be distracted by other human players getting in the way. Most importantly, the Live experience works a lot better if your friends list is constantly accessible, so you can check who else is online and receive invitations to join other sessions regardless of whether you are currently playing online or in a single player mode.

    MotoGP Online had no single player game modes, so it was obvious and inevitable that the player should always be signed in to Live. It is not so easy to provide this level of online integration in a game that also has to work for players without network access, but it can be done. MotoGP 2, due out in May 2003, combines our Live implementation with an improved single player game, and I expect it to become the reference example of how to integrate the two.




    6: When you create an online community, be sure to include police.

    If you create a situation that allows even the smallest amount of social interaction, some people will get a kick out of spoiling this for others. The reasons why are probably best left to the psychologists, but as developers it is our job to prevent our games being ruined by antisocial behaviour.

    With features such as muting, voice masking, private sessions, and sending feedback on other players, it is clear that Microsoft have already put a great deal of thought into these problems, but a glance through any Live related web forum will still turn up a huge number of posts complaining about the behaviour of other gamers.

    On MotoGP, one of the biggest problems turned out to be people racing backwards around the track, trying to crash into the other riders. This started out as a way to fill the time before the start of the next race if a player gave up because they were too far behind the race leaders, but once it became clear how much it annoyed the more serious players, people started joining sessions purely for the purpose of racing backwards in them. It got to the point where every single race would have two or three backward riders, and people even started forming roadblocks of several bikes lined up across the track.

    You could try to prevent antisocial behaviour by building strict rules into the game, but that only works if you are able to predict what kind of rules will be needed. We certainly never anticipated the backward rider problem until we saw people doing this after the game was released! Also, some people may genuinely want to be able to race backwards, for instance to set up what they call "demolition derby" game sessions.

    I think it is better to let the host of each individual session choose whatever rules of behaviour they want to enforce. This way, players who find a like-minded host can enjoy their preferred style of gameplay, while people who disagree are free to go elsewhere or create new sessions of their own. I believe a good online game should give the host as much power as possible to police what happens in their game.

    The most important aspect of giving control to the host is that they must be able to kick other players out of the session. We got this half-right in MotoGP Online, in that the host can kick out players, but there is nothing stopping them from rejoining as soon as the current race finishes. This gets particularly bad if someone is using the Quickmatch search mode, which is likely to keep automatically putting them back into the same session even if they are not deliberately trying to annoy the host! In the upcoming MotoGP 2 we have added an option to ban players for the duration of the session.
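
A session-long ban can be as simple as a set that the host consults on every join request. The sketch below is hypothetical (the article does not show how MotoGP 2 does this), and it keys on the gamertag string only for readability; a real implementation would presumably use whatever unique player identifier the platform provides.

```cpp
#include <cassert>
#include <set>
#include <string>

// Hypothetical sketch: a host-side ban list that lives as long as the session.
class SessionBans {
public:
    // Remember the player, so later join attempts can be refused.
    void ban(const std::string& gamertag) {
        assert(!gamertag.empty());
        mBanned.insert(gamertag);
    }

    // Checked by the host whenever someone asks to join, whether they came
    // back deliberately or were sent back by an automatic search.
    bool may_join(const std::string& gamertag) const {
        return mBanned.count(gamertag) == 0;
    }

private:
    std::set<std::string> mBanned;  // discarded when the session ends
};
```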

    I think it is also useful to give the host as many options as possible for controlling the rules of the game itself. For instance in MotoGP we included an option to disable bike-to-bike collisions, which many people turned on as a response to the backward rider problem, even though we had never intended it to be used for that.

    If the host is going to enforce good behaviour, it is crucial that they know who to enforce it against. In other words they need to be able to match in-game behaviours up with the responsible gamertag, so they can tell who to reprimand and perhaps kick out of the session. In MotoGP we display gamertags above each bike for a few seconds when they first come onscreen, and show the names of whoever is talking over the voice communicator in the corner of the screen. In my experience, people are noticeably more polite in games that feature a continual display of who is talking, I suspect because this makes the communication feel less anonymous.




    7: Online scoreboards can be both a blessing and a curse.

    MotoGP uses the online scoreboards to store record lap times for every track, and also gives each player an overall ranking taken from the sum of their best times across all the tracks. This feature barely made it into the game, having been at the top of our "drop this if we run out of time" list the entire way through the project, but within days of implementing it we knew we were onto a winner. The persistent lap times added so much excitement to our competition within the team, and even more so to a rankings war that quickly developed between our lead designer and various people at Microsoft, that this feature turned out to be if anything even more addictive than the regular online races!

    Online scoreboards are great, but only as long as all the times on them are accurate!

    A racing session lasts for at most a few hours, but lap records last forever. The scoreboards interface makes it easy to compare your times against those of your friends, so the first thing you do after signing on is usually to check if anyone has recently overtaken you. If they have, it is hard to rest until you've shaved another few seconds off your own time to get back in front.

    The more that people care about their scoreboard ranking, the more they will be upset if that ranking is inaccurate for any reason, and the more time they will be willing to spend looking for ways to cheat the system. We were lucky that a racing game has such a simple and obviously fair skill measurement as a lap time, where other genres have to rely on more opaque heuristics that risk players figuring out how to cheat the algorithm. But we were not so lucky in other ways...

    During the Live public beta test, we noticed the MotoGP scoreboards were filling up with obviously impossible lap times, including a couple of 0.01 second entries! This turned out to be caused by extreme cases of network lag, which could accelerate bikes to what should have been impossible velocities. Our network code is generally quite robust against poor connections, but when you are hooked up to a persistent scoreboard, it only needs a single bug to happen once. That error will then be preserved forever, and sorted to the top of the list where everyone can see it.

    Fortunately for us, Xbox Live provides an autoupdate mechanism. Retail disks had already been manufactured by the time we fixed the bug, but that was no problem. We prepared an update in time for the retail Live launch, reset the scoreboards to remove the glitch times, and all was well. As far as I'm aware, MotoGP was the first ever console game to be updated via an online download, and the process is so transparent that many gamers never even realised it had happened.

    Sadly, a few months after the Live launch, more impossible times started turning up on top of the scoreboards. This time the problems were caused by holes in the collision mesh, which were allowing people to punch through fences in order to bypass sections of the track. There were only a handful of places where this was possible, but during several hundred man-years of online racing, people managed to find and exploit even the most obscure weaknesses. No QA department has the resources to match that level of scrutiny.

    During the development of MotoGP 2, we have made it a top priority to check and double-check every element of the game that can contribute to scoreboard records, and to include as many redundant safeguards as we could think of. This is important for online games in a way that it never was before.
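
The article does not say which safeguards were added, so the following is purely a hypothetical illustration of what redundant checks on a lap record might look like. It combines two independent tests aimed at the two failures described above: a per-track floor on the lap time (against lag glitches) and an in-order checkpoint list (against shortcuts through the scenery). All names and thresholds are invented.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical safeguards: none of these names or thresholds come from MotoGP.
struct LapRecord {
    double seconds;                // measured lap time
    std::vector<int> checkpoints;  // checkpoint indices in the order they were crossed
};

// Accept a lap for the scoreboard only if it is no faster than a per-track
// floor and checkpoints 0..checkpoint_count-1 were each crossed once, in order.
bool lap_is_plausible(const LapRecord& lap, double fastest_possible_seconds,
                      int checkpoint_count) {
    assert(fastest_possible_seconds > 0.0 && checkpoint_count > 0);
    if (lap.seconds < fastest_possible_seconds)
        return false;
    if (lap.checkpoints.size() != static_cast<std::size_t>(checkpoint_count))
        return false;
    for (int i = 0; i < checkpoint_count; ++i) {
        if (lap.checkpoints[static_cast<std::size_t>(i)] != i)
            return false;
    }
    return true;
}
```

Either test alone would have caught only one of the two problems, which is the argument for keeping both even though they overlap.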




    8: Packet headers are your worst enemy.

    (warning: non-programmers may want to skip ahead to section 10)

    This is where I show up my lack of experience as a network programmer, because I'm sure it will be common knowledge to anyone familiar with such things! At the beginning of this project I understood in an intellectual sense that packet headers could be expensive, but it wasn't until I started working out some actual numbers that I realised how much of an issue this could be.

    A standard UDP header is 28 bytes (8 bytes of UDP plus the 20 byte IP header), and a TCP header is 40 bytes (20 bytes of TCP plus the same IP header). On Xbox they are larger, due to the overhead of the packet encryption and NAT traversal, requiring 44 bytes for a UDP header or 56 for TCP.

    This may not seem like too much at first glance, but consider the case of 16 players sitting in a lobby talking to each other. MotoGP uses a 40 millisecond granularity on the voice compression, which outputs an 18 byte chunk of speech data. 18 bytes times 25 packets a second times 15 listeners gives a total data rate of 53 kbps, but if you add in the UDP header sizes, this goes up to 182 kbps. That is an overhead of 243%, and makes the difference between meeting or missing our target of working over a 64 kbps connection.

    Obviously, you don't actually need to send voice data every 40 milliseconds. The tradeoff is that the longer you buffer it up, the more latency you introduce, but also the less space you waste on packet headers. In MotoGP we send voice packets 4 times a second, which gives a 73 kbps data rate, 38% packet overhead. Still slightly over our 64 kbps target, but not by so much that I can't get away with ignoring the math and pretending it is ok :-)
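
The arithmetic behind these figures can be reproduced with a short function. This is a sketch under one assumption that is not stated in the text: the quoted numbers work out if "kbps" is taken to mean 1024 bits per second. The function name and parameters are hypothetical.

```cpp
#include <cassert>

// Upstream data rate, in kbps, for one talker sending voice to every listener.
// Assumption: 1 kbps = 1024 bits per second, which matches the quoted figures.
double voice_kbps(double packets_per_second, int header_bytes, int listeners) {
    assert(packets_per_second > 0 && header_bytes >= 0 && listeners > 0);
    const double chunk_bytes = 18.0;        // one 40 ms chunk of compressed speech
    const double chunks_per_second = 25.0;  // 1000 ms / 40 ms
    // Sending less often means each packet carries more chunks of speech.
    const double payload_per_packet =
        chunk_bytes * chunks_per_second / packets_per_second;
    const double bytes_per_second =
        (payload_per_packet + header_bytes) * packets_per_second * listeners;
    return bytes_per_second * 8.0 / 1024.0;
}
```

Calling `voice_kbps(25, 0, 15)` gives the payload-only rate of roughly 53, `voice_kbps(25, 44, 15)` gives roughly 182 once the 44 byte header is added to every packet, and `voice_kbps(4, 44, 15)` gives roughly 73 for four packets a second.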

    With so much of the total bandwidth being wasted on packet headers, it is crucially important to minimise this in every way you can. Never send two packets where one will do. If you have a bit of data that you'd ideally like to send this frame, consider whether you might be able to hold it back for a while and then merge it in with some other information that is due to be sent in the near future.
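
Holding data back and merging it can be sketched as a small batching buffer. This is a hypothetical illustration, not the MotoGP packet layer: `PacketBatcher` and its one-byte length prefix are assumptions made for the example.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical sketch: collects small messages and emits them as one datagram,
// so the per-packet header is paid once rather than once per message.
class PacketBatcher {
public:
    explicit PacketBatcher(std::size_t max_datagram_bytes)
        : mMaxBytes(max_datagram_bytes) {}

    // Queue a message. Returns false if it does not fit in the current
    // datagram, in which case the caller should flush() and try again.
    bool add(const std::vector<std::uint8_t>& message) {
        assert(message.size() <= 255);  // the length prefix is a single byte
        if (mBuffer.size() + 1 + message.size() > mMaxBytes)
            return false;
        mBuffer.push_back(static_cast<std::uint8_t>(message.size()));
        mBuffer.insert(mBuffer.end(), message.begin(), message.end());
        return true;
    }

    // Hand back everything queued so far as a single datagram payload,
    // leaving the batcher empty.
    std::vector<std::uint8_t> flush() {
        std::vector<std::uint8_t> out;
        out.swap(mBuffer);
        return out;
    }

private:
    std::size_t mMaxBytes;
    std::vector<std::uint8_t> mBuffer;
};
```

The cost of the scheme is one extra byte per message, which is small next to the 44 bytes saved for every packet that no longer has to be sent.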

    Don't mix TCP and UDP. If you send some data by one and some by the other, you won't be able to merge packets, so you'll end up paying twice the header overhead. Even if you have to increase the payload size in order to do this, you can reduce the overall bandwidth usage by combining everything into a single protocol. Since TCP is unsuitable for game data and voice packets, in practice this means you should use UDP (or the Xbox VDP equivalent) for everything.

    Once you have all your packets running over the same protocol, make sure they all use the same port, too. That minimises resource usage, and due to an optimisation in the Xbox network stack implementation, if you put everything on port 1000 you can save 4 bytes per packet, too.




    9: TCP is not a reliable protocol.

    Our system link code in the original MotoGP game used UDP for game traffic, and TCP for important things like joining sessions, starting races, and communicating the results. Every machine had a TCP link to every other, and we assumed that these would always stay up and always be reliable.

    Bzzzt, wrong!

    Networks are far less predictable than that. Even playing over the LAN here in our office, we saw obscure problems that could only be explained by TCP connections occasionally being dropped. This was rare enough (and impossible to reproduce) that we managed to pretend it didn't happen, but working on the assumption that the Internet was sure to be worse than our LAN, we decided the Live version had to be robust enough to deal with any number of dropped connections.

    First off, we changed our reliable data system from a peer to peer layout to client server. This meant that although clients were still sending UDP game traffic directly to every other client, they only needed to maintain a single TCP connection to the host. This eliminated a whole category of netsplit conditions where machines could end up with inconsistent views of the session. If they could communicate with the host, they were now guaranteed to have a correct version of all reliable data, or if they were unable to reach the host, they would be dropped out of the session.

    The next task was to deal with dropped TCP connections. These are rather more common on Xbox than on PC, due to the fixed-memory nature of a console platform. The TCP protocol guarantees reliable and in-order delivery, and to achieve this it must buffer up copies of all packets until it gets confirmation that they have been successfully received. A PC with virtual memory can queue up unlimited amounts of pending data, but on Xbox the amount of memory used by the network stack is fixed at startup, by default to 48k. If you try to send a TCP packet while this buffer is still full of previous packets awaiting acknowledgement, the network stack has no choice but to drop the connection.

    The Xbox TCP implementation can be made more robust by increasing the buffer size, but that still wouldn't be enough to deal with really bad spikes in the number of dropped packets, and in any case I had more important things to do with my RAM. So I wrote some code to automatically reopen any connections that might get dropped. There were a couple of fiddly race conditions involved with getting reconnect messages from machines you hadn't yet noticed were gone, but I got it working after a few hours and a few hundred lines of code.

    The problem is, if a TCP connection can be dropped and then reopened, it is no longer reliable because you will have lost any data sent while it was down. To deal with this I had to attach a version number to each piece of data, so I could detect anything that had gone missing and know to resend it. Without intending to, I had ended up building my own reliable protocol over the top of TCP, at which point it occurred to me, why not strip out the TCP entirely and send the whole thing by UDP? So that's what I did. Stripping out layers of listens and connects made the code a lot simpler and easier to understand, and the end result was also far more bandwidth efficient for the reasons discussed in the previous section.

    I was initially nervous about replacing TCP with my own homebrew protocol, because TCP uses a lot of clever techniques that have been perfected by experts over many years, while I was completely new to the field. What I came to understand, though, was that TCP solves the wrong problem. It provides all-or-nothing, do-or-die delivery of a continual stream of data, which is great for an ftp client or web browser, but less than ideal for an online game where you really just want to synchronise the contents of a fixed set of data structures. It doesn't matter if some packets get dropped, as long as you can sync up the data later on when conditions improve. When the host is sending out multiple evolving versions of the race results structure, updated as each player crosses the finish line, it is actually counterproductive for TCP to try to deliver all of these in order. As soon as a more recent version of the structure is available, it is irrelevant what happens to the previous ones, as long as all clients are guaranteed to get this one important latest version.

    The amount of reliable data needed in a game is minutely small compared to the total bandwidth usage, and it is surprisingly easy to implement an efficient reliable protocol by piggybacking a few version numbers over the top of an existing flow of UDP packets. This protocol needs to synchronise the contents of a fixed number of state structures, rather than guaranteeing a stream of data. Even if you have information that seems like it needs to be sent as a one-off message, it is easy to reword this into a persistent state that can be embedded in a structure. For instance the host deciding to start the race can be implemented by incrementing a "race number" integer in the game state structure. As soon as clients notice this has changed, that is their signal to go from the lobby into the race, or if they still haven't finished the previous race, they know to abandon it and load the new one.
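
The version-number idea can be sketched for a single state structure as follows. This is a minimal illustration with hypothetical names (`GameState`, `HostSide`, `ClientSide`), not the protocol MotoGP shipped, and it ignores details such as version counter wraparound and multiple clients.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical sketch of synchronising one state structure with version
// numbers carried on top of ordinary, unreliable game packets.
struct GameState {
    std::uint32_t version = 0;      // bumped on every change
    std::uint32_t race_number = 0;  // incrementing this means "start the next race"
};

struct HostSide {
    GameState state;
    std::uint32_t acked_version = 0;  // newest version the client has confirmed

    void start_next_race() {
        ++state.race_number;
        ++state.version;
    }

    // Consulted when building each outgoing packet: the state is attached
    // only while the client is known to be behind.
    bool needs_resend() const { return acked_version < state.version; }

    void on_ack(std::uint32_t version) {
        assert(version <= state.version);  // cannot confirm what was never sent
        if (version > acked_version)
            acked_version = version;
    }
};

struct ClientSide {
    GameState state;

    // Returns the version to acknowledge in the next packet sent to the host.
    // Stale or duplicate copies are ignored, so reordering is harmless.
    std::uint32_t on_state(const GameState& incoming) {
        if (incoming.version > state.version)
            state = incoming;
        return state.version;
    }
};
```

If a packet carrying the state is lost, nothing special happens: `needs_resend()` stays true, so the latest version simply rides along with the next packet until an acknowledgement arrives, and intermediate versions that were never delivered are never missed.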




    10: You will need a lot of devkits!

    This is extremely obvious, but we only worked it out halfway through the project. To make an online game you will need at least two devkits per programmer, and three for testing anything but the simplest network scenarios. MotoGP supports up to 16 players, and to test this we had to round up a set of 16 kits, which meant interrupting the entire code and art teams until the test was complete.




    Project timeline:

    2002:
    • April 19 - final Xbox build of the original MotoGP game
    • May 17 - first spoke to Microsoft about the possibility of doing a Live version
    • May 20 - original Xbox game arrives in the stores
    • June 13 - finished the PC version
    • July 5 - started working on the Xbox Live version
    • August 23 - MotoGP Online is complete, and goes into testing
    • September 19 - final build sent off to Microsoft
    • October 9 - MotoGP goes out as part of the Xbox Live public beta
    • November 4 - we prepare an autoupdate to fix scoreboard glitch problems
    • November 15 - Xbox Live retail launch in the US
    Summer 2003 - MotoGP 2 combines a refined Xbox Live implementation with improvements to the single player game.