Evils of Optimization

Performance. To many game developers, the most important goal is ensuring that their game reaches the highest level of performance possible, in the hope that a sufficiently efficient engine will either let the designers push the boundaries of the state of the art or at least let the game run on the widest variety of hardware possible. Most of those same developers are doing their own cause more harm than good.

The first problem with most optimization work done by a great many developers these days is that the optimizations they implement are flat-out wrong. At an abstract level, any developer worth their salt is aware of basic algorithmic efficiency and is quite capable of choosing the best algorithm for a given task. When it comes to lower-level optimizations, however, developers are often badly misguided, in large part because many valid optimization techniques of days gone by are no longer valid on modern hardware. These old techniques remain popular not just with the grizzled professionals who were actually programming in that distant age, but with today's students as well.

As an example, a long, long time ago, computer processors were slow and computer memory access was roughly as fast as those processors. In those ancient, bygone days, performing basic mathematical calculations like finding a sine or a square root was not at all an option for real-time pseudo-3D graphics. It was quite common to pre-calculate tables of values, such as a table of the sine of each of the 360 whole degrees. Accessing these tables was much faster than calculating a sine on the fly, and there was no serious loss of precision, since older machines avoided floating point anyway. I've seen students -- students who hadn't even programmed before a year ago -- try that same "optimization" in a new game project. The problem is that modern CPU speeds have far outclassed modern memory access speeds. A modern CPU can calculate a sine in fewer cycles than it would take to pull a pre-calculated value out of RAM... and we're talking about calculations on high-precision floating-point values, not just integers. We can hope that the table stays lodged firmly in the L1 cache, but why waste space there on a table we don't need? Using a pre-calculated table not only limits the available precision but will actually slow the application down.
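For concreteness, here is a minimal sketch of the old trick side by side with the modern approach. The names (`g_sine_table`, `sine_lookup`) are illustrative, not from any real codebase; the point is that the table version restricts us to whole degrees and a cache-resident table, while the direct call has neither limitation.

```cpp
#include <cmath>

// The old trick: a table of sine values for each whole degree,
// precomputed once at startup. (Names here are illustrative.)
static double g_sine_table[360];

void build_sine_table() {
    const double pi = 3.14159265358979323846;
    for (int deg = 0; deg < 360; ++deg)
        g_sine_table[deg] = std::sin(deg * pi / 180.0);
}

// Table lookup: fast on 1980s hardware, but limited to whole degrees
// and dependent on the table staying in cache.
double sine_lookup(int degrees) {
    return g_sine_table[((degrees % 360) + 360) % 360];
}

// The modern approach: just call std::sin. On current CPUs this is
// typically competitive with (or faster than) a load that misses
// cache, works for any angle, and keeps full precision.
double sine_direct(double radians) {
    return std::sin(radians);
}
```

The table version is also simply more code to write, initialize, and maintain.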

A second example is the floating-point arithmetic just mentioned. I've seen students quote two-decade-old wisdom about using fixed-point math and avoiding floating point at all costs. Modern CPUs not only have on-die floating-point units, and not only have floating-point SIMD units, but some actually have faster and more powerful floating-point units than integer units. Once again, the old optimization techniques do more harm than good on modern architectures.
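For readers who have never seen it, this is roughly what the old fixed-point technique looks like: a sketch of 16.16 fixed-point arithmetic (the helper names `to_fixed` and `fmul` are mine, shown only to illustrate the bookkeeping the technique demands), not a recommendation.

```cpp
#include <cstdint>

// Classic 16.16 fixed-point: 16 integer bits, 16 fractional bits.
// This was the workhorse on machines without an FPU.
using fixed = std::int32_t;
constexpr int FRAC_BITS = 16;

constexpr fixed  to_fixed(double v)  { return static_cast<fixed>(v * (1 << FRAC_BITS)); }
constexpr double to_double(fixed v)  { return static_cast<double>(v) / (1 << FRAC_BITS); }

// Fixed-point multiply: widen to 64 bits so the intermediate product
// doesn't overflow, then shift the extra fractional bits back off.
fixed fmul(fixed a, fixed b) {
    return static_cast<fixed>((static_cast<std::int64_t>(a) * b) >> FRAC_BITS);
}
```

Every multiply needs a widen-and-shift, every constant needs converting, and range and precision are both capped, all to avoid hardware that today is as fast as, or faster than, the integer path.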

Chances are, anything a game developer thinks he knows about optimization is wrong. Anything he knows that is actually right is going to become wrong in a few years. The only people who really understand modern CPUs and architectures are the hardware engineers designing them and the people who live and breathe instruction sets: the compiler developers. We need to rely on the compiler and let it do its job. Chances are, it's doing a better job of optimizing our code than we ever could. Compilers pull off all kinds of tricks that most developers don't even realize are possible or necessary.

That's not to say that we can trust compilers to always do the right thing, nor that we should expect a compiler to fix badly written code, nor that compilers can't use a hint here and there. These are all exceptional cases, though, not the norm. If we as developers think we have found one of those exceptional cases, we should actually check and make sure, using proper profiling and performance monitoring tools. Once we've found and supposedly fixed a real performance problem, we should always rerun our benchmarks and make sure we're getting a real improvement. Most importantly, if the optimization made the code more obtuse but did not yield a meaningful performance improvement, we must be willing to revert it -- throwing away all that work, or at least burying it in an SCM branch or history -- and leave the code in a simpler, easier to maintain state.
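A real profiler is the right tool for that check, but even a crude timing harness beats guessing. Here is a minimal sketch (the name `time_it` and the iteration count are illustrative, not from any tool): run the before and after versions and only keep the "optimization" if the numbers actually improve.

```cpp
#include <chrono>

// A minimal timing harness: run a candidate function many times and
// return the elapsed wall-clock time in seconds. A real profiler is
// better, but even this will catch an "optimization" that made
// nothing faster.
template <typename Fn>
double time_it(Fn fn, int iterations) {
    auto start = std::chrono::steady_clock::now();
    for (int i = 0; i < iterations; ++i)
        fn();
    auto end = std::chrono::steady_clock::now();
    return std::chrono::duration<double>(end - start).count();
}
```

Usage would be something like `time_it(baseline, 100000)` versus `time_it(optimized, 100000)`, run several times on representative data; if the difference is in the noise, revert.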

There is a deeper problem with the performance-oriented mentality of many game developers. Focusing on performance results in a lack of focus on anything else... and just about everything else is more important than performance. Stability and correctness, for example. All developers are aware of the dangers of premature optimization, and yet all too many game developers -- especially students -- fall prey to it just the same. We've all heard stories of developers who optimized some critical piece of a project early in development, only to find that the component's behavior needs to change at a later date, that the change is nearly impossible to make because of how convoluted and unreadable the optimized code is, and that the project ends up delayed or canceled as a result.

Performance obsession is harmful even without such an extreme disaster, though. The simple fact is that every minute spent optimizing some piece of code that doesn't really, truly, honestly need it is a minute not spent making the game actually better. Even as an engine developer, there are more useful ways to spend development time than optimizing code that's already performing well enough: more designer-friendly debugging and editing tools, easier and more flexible content pipelines, more stable physics, bug fixes, useful new features, or even just documentation. We as developers sometimes fall into the habit of thinking that "faster" is what our designers and users want, but in reality the users want "funner," and that requires giving the designers what they want, which is "easier and more powerful." There are times when optimization really is what the designer needs -- maybe he wants to fit more enemies into a particular scene than the engine can currently handle on mainstream hardware -- but more often than not he's going to want more powerful AI debugging tools, in-game editor support, improved and easier scripting and actor logic facilities, or better content pipeline tools to improve his content turn-around time.

Too often I've seen both professionals and students spend quite a lot of time worrying about how to squeeze the absolute best performance out of some piece of code that isn't even part of the hot path of the game engine. Is a C string more efficient than a C++ std::string? Maybe. Does it matter for the piece of code in question? Maybe. Does it matter enough to be worth giving up the safety and ease of using a std::string? Overwhelming odds are that no, it does not. We have to train ourselves not to think about such things. They're not useful thoughts, especially not in the early stages of a project. Picking an O(1) algorithm over an O(N^2) algorithm is a useful way to spend our planning time early in a project. Avoiding branches because we heard that CPU branch mis-prediction slaughters performance, however, is a total waste of time that only results in uglier, harder to understand, and harder to extend code -- the last thing we want early in a project, when we know we're almost certainly going to revisit, reread, and modify much of what we write later on.
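The string question makes the trade-off concrete. Below is a sketch of the same operation both ways (the helper names `join_c` and `join_cpp` are hypothetical): the C version forces the caller to size a buffer correctly, and getting that wrong is a buffer overrun; the std::string version has no such failure mode and is almost certainly fast enough anywhere outside a hot path.

```cpp
#include <cstdio>
#include <string>

// Raw C strings: the caller supplies and sizes the buffer, and an
// undersized buffer silently truncates (or, with strcpy/strcat,
// overruns). Hypothetical helper for illustration.
void join_c(char* out, std::size_t out_size, const char* a, const char* b) {
    std::snprintf(out, out_size, "%s%s", a, b);
}

// std::string: no manual sizing, no overrun risk, automatic memory
// management. The "cost" is an allocation we usually don't care about.
std::string join_cpp(const std::string& a, const std::string& b) {
    return a + b;
}
```

Whether the allocation in the second version matters is exactly the kind of question a profiler should answer, and only if that code ever shows up in one.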

A perfect personal anecdote from just today: one of my teammates was curious whether using a std::vector in a particular hot path would be too inefficient compared to a static array. The first problem with his question is that two other options weren't even considered as potential solutions, illustrating that many developers aren't aware of all the optimization techniques available, much less how to pick between them. A third option would be a dynamically-sized array allocated with the C++ new[] operator or C's malloc(), which works in this case because the size of the data is not expected to change after allocation. However, all that really does over the STL vector is shave off the vector's capacity bookkeeping, a small gain that is not yet worth it, especially given that we may well need to resize the array later in development. A fourth option, and the one actually likely to provide real performance improvements (today, at least), is to allocate the extra storage when allocating the C++ object itself using an overridden operator new, improving data locality without wasting extra cache space the way a statically-sized array would.
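That fourth option is worth sketching, since it's the one developers are least likely to have seen. This is a minimal, assumption-laden illustration (the type `Particles`, its `create`/`destroy` helpers, and the `float` element type are all invented for the example): the object and its element storage are carved out of a single allocation, so the header and its data sit together in memory.

```cpp
#include <cstddef>
#include <new>

// Sketch of over-allocation: one block holds the object plus its
// elements, so the header and the data share locality.
struct Particles {
    std::size_t count;

    // The elements live immediately after the object in memory.
    float* data() { return reinterpret_cast<float*>(this + 1); }

    // Allocate one block sized for the object plus `n` floats, then
    // construct the object in place with placement new.
    static Particles* create(std::size_t n) {
        void* mem = ::operator new(sizeof(Particles) + n * sizeof(float));
        Particles* p = new (mem) Particles;
        p->count = n;
        return p;
    }

    // Tear down in the reverse order: destroy, then free the block.
    static void destroy(Particles* p) {
        p->~Particles();
        ::operator delete(p);
    }
};
```

The cost is real, too: the object can no longer be created with a plain `new`, copied, or resized without reallocating the whole block, which is exactly why this belongs at the end of development, behind a profiler, and not at the start.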

The second problem with the question is that time was spent thinking about these issues instead of just getting the code written and running. Time spent worrying about how best to pre-optimize a key game component that doesn't yet exist -- and which other developers need to have written before they can work on their own components -- is time that is quite literally wasted. In this particular case it was only a matter of about 15 minutes, but in many cases it turns into hours, days, or weeks spent trying to optimize code that doesn't even exist yet and causing delays for the rest of the project team.

Optimization is evil. It tempts us all. It's not a forbidden fruit in the garden of creation. It's a fruit that developers are actually told to take, and yet it brings ruin just the same. None of us know jack about optimization, and what little we do know is totally wrong. The best thing we can do as game developers is to ignore the temptation to make code fast and focus instead on making our code better.