A colleague recently hooked up the free-as-in-beer memory profiler MTuner by Milos Tosic to our engine. Even without a full integration of the SDK, we started getting high-quality, useful information within a couple of hours.
MTuner's site explains its individual features in great detail, so I'll only summarize a few here. Its main feature is that it tracks every allocation along with the call stack that made it, allowing developers to pinpoint where allocations are being made. It also has some excellent visualizers and a call graph tree for finding out where memory is going or which pieces of code are making the most allocations.
It works automatically for apps that use the default system allocators, and it also provides an SDK for hooking into custom allocators. The SDK also allows tagging regions of code, which feeds into the debugger, e.g. to group all Physics allocations or to insert events like Level Loaded into the memory event stream. We haven't really looked at that part yet, but it seems cool.
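To make the hooking and tagging idea a bit more concrete, here is a minimal sketch of what a custom allocator hook with scoped tags and event markers could look like. To be clear, this is not the MTuner SDK: `AllocTracker`, `ScopedTag`, `engineAlloc`, and `engineFree` are hypothetical names I made up for illustration, and a real integration would forward these calls to whatever entry points the SDK actually exposes.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <map>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical tracker, NOT the MTuner SDK. It only illustrates the idea of
// attributing allocations to tagged regions and recording marker events.
class AllocTracker {
public:
    // Push/pop the tag that subsequent allocations are attributed to.
    void pushTag(const std::string& tag) { tags_.push_back(tag); }
    void popTag() { if (!tags_.empty()) tags_.pop_back(); }

    // Called by the custom allocator after a successful allocation.
    void onAlloc(void* ptr, std::size_t size) {
        const std::string tag = tags_.empty() ? "Untagged" : tags_.back();
        live_[ptr] = Record{size, tag};
        bytesByTag_[tag] += size;
    }

    // Called by the custom allocator before the memory is released.
    void onFree(void* ptr) {
        auto it = live_.find(ptr);
        if (it == live_.end()) return;  // not tracked by us
        bytesByTag_[it->second.tag] -= it->second.size;
        live_.erase(it);
    }

    // Bytes currently live under a given tag.
    std::size_t liveBytes(const std::string& tag) const {
        auto it = bytesByTag_.find(tag);
        return it == bytesByTag_.end() ? 0 : it->second;
    }

    // Record a named event (e.g. "Level Loaded") in the event stream.
    void markEvent(const std::string& name) { events_.push_back(name); }
    const std::vector<std::string>& events() const { return events_; }

private:
    struct Record { std::size_t size; std::string tag; };
    std::unordered_map<void*, Record> live_;
    std::map<std::string, std::size_t> bytesByTag_;
    std::vector<std::string> tags_;
    std::vector<std::string> events_;
};

// RAII helper: allocations made while it is alive get the given tag.
class ScopedTag {
public:
    ScopedTag(AllocTracker& tracker, const std::string& tag) : tracker_(tracker) {
        tracker_.pushTag(tag);
    }
    ~ScopedTag() { tracker_.popTag(); }
    ScopedTag(const ScopedTag&) = delete;
    ScopedTag& operator=(const ScopedTag&) = delete;

private:
    AllocTracker& tracker_;
};

// Stand-ins for an engine's custom allocator entry points.
void* engineAlloc(AllocTracker& tracker, std::size_t size) {
    assert(size > 0 && "this sketch does not handle zero-sized allocations");
    void* ptr = std::malloc(size);
    if (ptr) tracker.onAlloc(ptr, size);
    return ptr;
}

void engineFree(AllocTracker& tracker, void* ptr) {
    if (!ptr) return;
    tracker.onFree(ptr);
    std::free(ptr);
}
```

With something like this in place, wrapping the physics update in `ScopedTag physics(tracker, "Physics");` would attribute every `engineAlloc` call inside that scope to Physics, and calling `tracker.markEvent("Level Loaded")` after a load would drop a marker into the event list. The sketch is single-threaded and skips call stack capture entirely, both of which a real profiler has to deal with.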
Note that the tool itself only runs on 64-bit Windows hosts. The memory capture side claims to support PS3 and PS4 as well; I'd imagine it could support Linux or OS X too, though no such support is claimed and I haven't tried it myself.