Direct3D 11 provides some handy debug APIs that make tracking down problems much easier. Unfortunately, while the API reference is rather complete and quite useful, I haven't been able to find any good tutorials on actually using them in real code. The following are a few tips on using these interfaces.
I used to find myself needing to do a quick refresher on the different matrix notations and usage patterns in computer graphics every time I sat down to do any 3D math. I'm admittedly not a graphics expert and it took me a long while to beat this information into my head. There are a lot of articles online on the topic, but I found most of them to be incomplete or poorly written. In the interest of saving others some headaches, I decided to write up my own explanation. If you get confused by the different notations for matrices, the "right-handed" vs "left-handed" coordinate systems, pre- vs post-multiplication, or the differences between row-major and column-major matrices, read on.
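As a quick taste of why these distinctions get confusing, here is a minimal sketch in C++. It is only an illustration under stated assumptions: the type aliases and function names (`Mat4`, `Vec4`, `at_row_major`, `at_col_major`, `mul_matrix_vector`, `mul_vector_matrix`, and the two `translation_for_*` helpers) are made up for this example and do not come from any particular math library. It shows that storage order (row-major vs column-major) and multiplication order (matrix times column vector vs row vector times matrix) are separate choices, and that a translation matrix built for one vector convention is the transpose of the one built for the other.

```cpp
#include <array>
#include <cstddef>

// Hypothetical types for illustration: 16 floats in a flat array.
using Mat4 = std::array<float, 16>;
using Vec4 = std::array<float, 4>;

// Row-major storage: the four elements of a row sit next to each other.
inline float at_row_major(const Mat4& m, std::size_t row, std::size_t col) {
    return m[row * 4 + col];
}

// Column-major storage: the four elements of a column sit next to each other.
inline float at_col_major(const Mat4& m, std::size_t row, std::size_t col) {
    return m[col * 4 + row];
}

// Column-vector convention: result = M * v (m is read as row-major here).
inline Vec4 mul_matrix_vector(const Mat4& m, const Vec4& v) {
    Vec4 r{};
    for (std::size_t row = 0; row < 4; ++row)
        for (std::size_t col = 0; col < 4; ++col)
            r[row] += at_row_major(m, row, col) * v[col];
    return r;
}

// Row-vector convention: result = v * M (m is read as row-major here).
inline Vec4 mul_vector_matrix(const Vec4& v, const Mat4& m) {
    Vec4 r{};
    for (std::size_t col = 0; col < 4; ++col)
        for (std::size_t row = 0; row < 4; ++row)
            r[col] += v[row] * at_row_major(m, row, col);
    return r;
}

// For column vectors the translation lives in the last column.
inline Mat4 translation_for_column_vectors(float x, float y, float z) {
    return Mat4{1.0f, 0.0f, 0.0f, x,
                0.0f, 1.0f, 0.0f, y,
                0.0f, 0.0f, 1.0f, z,
                0.0f, 0.0f, 0.0f, 1.0f};
}

// For row vectors the translation lives in the last row (the transpose).
inline Mat4 translation_for_row_vectors(float x, float y, float z) {
    return Mat4{1.0f, 0.0f, 0.0f, 0.0f,
                0.0f, 1.0f, 0.0f, 0.0f,
                0.0f, 0.0f, 1.0f, 0.0f,
                x,    y,    z,    1.0f};
}
```

Both conventions move a point by the same amount as long as the matrix and the multiplication order agree. Note also that the row-vector matrix stored row-major occupies memory exactly like the column-vector matrix stored column-major, which is a large part of why the two pairs of terms get mixed up so often.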
There are some articles around the Net these days detailing wishlists for OpenGL 4.2. Two of the more informed ones can be found here and here. Between those two, the most interesting and common requests seem to center on DSA (direct state access) and better bind-by-semantic behavior for shaders. I'm going to take a somewhat more radical position and propose an improved version of both that goes a bit beyond what the current proposals detail but is still well within the realm of possibility.
Just a few years ago, we had it easy. Optimizing a game engine was all about low-level code. Machine optimizations. Instruction counts. Black magic mathematics. As technology rolled on, however, things got more complicated. CPUs got faster while memory stayed slow, invalidating many optimization techniques that relied on tables and pre-computed values, while new optimizations involving memory access patterns became critical to high-performance applications. Deep CPU pipelines meant that branching into different code paths optimized for highly specific cases could be slower than a single generic code path that had no branches at all. The raw throughput of the modern CPU simply made many optimizations unnecessary, while the increasing latency in pathological cases turned yesterday's unnecessary optimizations into today's top-priority hot spots. Then multi-core CPUs fell into consumers' hands. At first it was the dual-core CPUs, many of which were simply two single-core CPUs bundled onto a single die. Then the quad-cores, and the hexa-cores, and now we've even got duodeci-cores. We're in a world where much if not most of our code could be rewritten as highly specialized math kernels and executed at blindingly fast speeds on a GPU, and yet most programmers still aren't sure how to make the most of the dual-core CPUs that are half a decade old by now.