There are several articles around the Net these days detailing wishlists for OpenGL 4.2. Two of the more informed ones can be found here and here. Between those two, the most interesting and most common requests center on DSA (direct state access) and better bind-by-semantic behavior for shaders. I'm going to be a bit more radical and propose improved versions of both that go beyond what the current proposals detail but are still well within the realm of possibility.
Just a few years ago, we had it easy. Optimizing a game engine was all about low-level code. Machine optimizations. Instruction counts. Black magic mathematics. As technology rolled on, however, things got more complicated. CPUs got faster while memory stayed slow, invalidating many optimization techniques that relied on lookup tables and pre-computed values, while new optimizations involving memory access patterns became critical to high-performance applications. Deep CPU pipelines meant that branching into different code paths optimized for highly specific cases could be slower than a single generic code path with no branches at all. The raw throughput of the modern CPU simply made many optimizations unnecessary, while the growing latency of pathological cases such as cache misses and branch mispredictions turned yesterday's unnecessary optimizations into today's top-priority hot spots.
Then multi-core CPUs landed in consumers' hands. At first it was the dual-core CPUs, many of which were simply two single-core CPUs bundled onto a single die. Then the quad-cores, and the hexa-cores, and now we've even got duodeci-cores. We're in a world where much, if not most, of our code could be rewritten as highly specialized math kernels and executed at blindingly fast speeds on a GPU, and yet most programmers still aren't sure how to make the most of the dual-core CPUs that are half a decade old by now.