I recently bought a new house, and along the way our agent recommended we get a sewer inspection. Those things are so cool! The guy arrived with a roll of tubing attached to a gizmo that looked like a prop from City of Lost Children, unwound this into the sewer, and got a realtime video feed back to his laptop.
The bad news is he found a cave-in just past the south wall (but don't worry, it won't cost too much to fix). In fact the inspector suspected this even before we saw it on the video. He had a feeling something was wrong because he noticed a depression in the lawn.
But here's the thing: even though he was able to diagnose the problem by applying a mixture of experience and intuition, we still sent down the camera to confirm his guess before shelling out the $$$ to buy the house. Guesses are no substitute for hard data, especially when large sums of money are involved!
Software optimization works the same way. Intuition can be a useful starting point, but if you really care to understand what's going on, there's nothing quite like looking inside the sewer and seeing for yourself. And yet, I've lost count of how many times I've seen variants on the following conversation:
Programmer: Can someone tell me how to speed up my Frobnicate method?
Expert: How do you know Frobnicate is the bottleneck? Have you profiled this program?
Programmer: Well no, but I'm pretty sure; I mean it's got to be, right?
Expert: Do you even have a profiler?
Programmer: Well, not exactly...
Before you can hope to speed up a program, you have to know what part of the code is making it slow. The problem with guessing is that if you guess wrong, you can waste a lot of time trying to speed up something that wasn't slow in the first place.
People are often tempted to add timing instructions to their code, using Stopwatch to record how long various sections take to run. There is indeed a time and place for this technique (I will blog more about it later), but it is a blunt instrument and prone to error: the timing calls add overhead of their own, it is easy to bracket the wrong sections, and a single measurement tells you nothing about how the cost varies from frame to frame.
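To make the technique concrete, here is a minimal sketch of the Stopwatch approach. The Frobnicate name comes from the dialogue above; the workload inside it is just a placeholder:

```csharp
using System;
using System.Diagnostics;

class TimingExample
{
    static void Main()
    {
        // Warm up once so JIT compilation doesn't skew the measurement.
        Frobnicate();

        var sw = Stopwatch.StartNew();
        Frobnicate();
        sw.Stop();

        // Elapsed.TotalMilliseconds gives finer resolution than
        // ElapsedMilliseconds, which truncates to whole milliseconds.
        Console.WriteLine($"Frobnicate took {sw.Elapsed.TotalMilliseconds:F3} ms");
    }

    static void Frobnicate()
    {
        // Placeholder for the work being measured.
        double total = 0;
        for (int i = 1; i < 1000000; i++)
            total += Math.Sqrt(i);
    }
}
```

Even this careful version shows why the technique is blunt: run it a few times and the numbers will jitter, and every section you want to measure needs its own pair of bracketing calls.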
A better solution is to use a profiling tool. Profilers are to game programmers what spirit levels are to carpenters. Sure, you can do good work without one, but at some point you get fed up with eyeballing shelf after shelf ("do you mind checking if this looks straight while I hold it in place? Ok, down a bit on the left, sorry, I meant the other way, ok, that looks good; wait, no, it seems wonky now I'm standing over here...") and you realize it is worth a trip to Home Depot to spend $15 on the proper tool.
Convinced yet?
Ok, now the bad news: there is no single tool that will tell you everything in one place. To truly understand performance, you must combine data from several tools. Here are the techniques I use most often, starting with the most important:
Make sure your profiler isn't being tricked by graphics rendering costs. Graphics is another "play now, pay later" feature which can lead to surprising profiler results. You need to understand how the CPU and GPU run asynchronously from each other, then work out whether you are CPU or GPU bound. If GPU bound, narrow down what part of the GPU is your bottleneck and try some performance experiments. Also watch out for pipeline stalls.
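As one concrete example of the kind of performance experiment I mean: if you suspect you are bound on pixel processing, shrink the back buffer and see whether your frame rate improves. A hedged XNA-style fragment (assuming a GraphicsDeviceManager field named graphics, as in the standard XNA game template; the resolution values are arbitrary):

```csharp
// Halve the resolution. If the frame rate jumps, the GPU was
// spending its time on pixel work (fill rate or pixel shader cost);
// if nothing changes, the bottleneck lies elsewhere: CPU, vertex
// processing, or a pipeline stall.
graphics.PreferredBackBufferWidth = 640;
graphics.PreferredBackBufferHeight = 360;
graphics.ApplyChanges();
```

The point of such experiments is not to ship the change, but to narrow down which stage of the pipeline is limiting you before spending effort optimizing it.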