Sean's Coding Journal Ramblings of Sean Middleditch, Game Developer and Student

13 Jan 2012

C++ Metadata – Part I, Singletons and Lookup

Class metadata systems allow C++ applications to have a sizable fraction of the runtime reflection and introspection available in other high level languages, such as C#, Python, or Java. While C++ does not offer any true metadata system itself (excluding the nearly useless typeinfo/RTTI system, which is barely enough to handle dynamic_cast<>'s needs), it's certainly possible to build a system that gets the job done and maintains sufficient levels of performance and ease of use. Given what a huge topic this is, I'm going to break this article up into a series. Today I'm going to talk about the singleton patterns most useful for a metadata system, as well as go over some introductory topics and alternatives.

Note: I consider this article to be highly out-of-date. It is a good first cut at a metadata system, but a much more complete and modern version of this article needs to be written.

Why Metadata

A question I've been asked several times is, "why would anyone even want a system like this?" Right after that is usually, "won't this be really slow, or waste a lot of memory, or just be more costly than a less flexible solution?" I often equate these questions to the old, "aren't virtual functions too slow for games?" and the somewhat old, "aren't components and data-driven design too inefficient to make up for their added flexibility?"

The Why of metadata is simply this: metadata allows for runtime data-driven systems in a game engine to work with a minimum of extra maintenance or bugs. Metadata systems allow for the creation of simple factories with almost no extra code. They make serialization of game objects trivial. They allow for property editing in a game editor window or debug toolbar to be very simple. Metadata allows for enhanced debugging output. Metadata helps with documenting objects. Metadata can even be used for script binding.

This all stems from the basic notion of what metadata is in a programming language. Metadata is simply the addition of information to the base type system. As a very simple example, take the name of a class. In C++, at runtime, there is normally no way to ask a question like, "what is the name of the class instance referenced by my Foo* variable?" (Yes, C++'s RTTI provides a name via typeid and std::type_info::name(), but that string is implementation-defined and usually not the name you'd expect or want.) If it weren't for dynamic polymorphism, the answer might well just be "Foo", but it's just not that simple for us. When we start using templates and metaprogramming, we likewise will not know the real type of an object by simply looking at the declaration for said object in the code.

With a proper metadata system, we have the ability to get an object that represents a class (or even a primitive type). We can have something similar to metaprogramming type traits except with even more information and which is available at runtime. We'll be able to write code like the following:

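#include <cstddef>
#include <istream>
#include <string>

// very simplistic deserialization of game object components
// (IComponent, g_ComponentAllocator, Property, createInstance() and getProperty()
// are not defined in this article)
IComponent* deserialize_component(std::istream& input)
{
  std::string name;

  if (!(input >> name))
    return NULL; // nothing to read

  const Metadata* meta = MetaManager::get(name.c_str());
  if (meta == NULL)
    return NULL; // unknown component type

  IComponent* component = meta->createInstance(g_ComponentAllocator);

  // test the stream after each read, so a failed read is never used as a name
  while (input >> name)
  {
    const Property* prop = meta->getProperty(name);
    if (prop == NULL)
      break; // unknown property: stop rather than misread its value

    prop->deserialize(component, input);
  }

  return component;
}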

As per the example, by far the biggest use I'm getting out of metadata in my current project is the data-driven runtime composition of game objects, which allows us to highly customize game entities in very intricate ways with just a simple text file. We're also using it for our configuration system, some great debugging macros, and our editor is starting to make use of it for property editing as well.

Touching briefly on the questions above about performance, it's worth noting that the system I'll be describing is highly efficient in terms of both CPU time and memory. It's possible to get even more efficiency at the cost of some API cleanliness. I'm going to describe the easier to use approach, but I'll note the areas where things could be changed to maximize efficiency. In my projects I lean towards a middle of the road approach, which has an API slightly more cumbersome than what I describe here, but which has a bit more efficiency.

Metadata Class

The first thing we'll need is an actual class for our metadata objects. (Yes, by the end of this, our metadata system will be able to describe itself.) Nothing particularly complicated is needed, but we're going to keep things extra simple for now. Our metadata system will have just enough information to print out the type and size of an object. Further articles in the series will expand its capabilities.

#include <cstddef>

class Metadata
{
public:
  Metadata(const char* name, std::size_t size) : m_Name(name), m_Size(size) {}

  const char* name() const { return m_Name; }  // name of the described type
  std::size_t size() const { return m_Size; }  // sizeof() the described type

private:
  const char* m_Name;
  std::size_t m_Size;
};

Metadata Registry

One of the core features of a metadata system is the ability to find the metadata for a class by name without needing an instance of the class first. This is useful for factory systems, for example.

The idea then is that there should be another global system that manages all Metadata objects. There are a lot of ways to make this work, ranging from the simple but inefficient to the highly complex but (practically) zero-cost. I'm going to go over the simple approach.

At the most basic level, I just need to make a class to manage Metadata objects, and then have Metadata objects register themselves with that class. That class is going to only use static methods and members (and hence could just be a few C-style functions and globals, if you wish) in order to avoid order-of-initialization problems.

#include <cstddef> // NULL
#include <map>
#include <string>

class MetaManager
{
public:
  // add a new Metadata instance to the manager
  static void registerMeta(const Metadata* meta)
  {
    MetaMap& metas = getMetas();
    metas[meta->name()] = meta;
  }

  // find an instance of a Metadata object by name
  static const Metadata* get(const char* name)
  {
    const MetaMap& metas = getMetas();
    MetaMap::const_iterator meta = metas.find(name);
    return meta == metas.end() ? NULL : meta->second;
  }

private:
  typedef std::map<std::string, const Metadata*> MetaMap;

  // safe and easy singleton for our std::map of Metadata objects
  static MetaMap& getMetas()
  {
    static MetaMap metas;
    return metas;
  }
};

There's not much more that would ever be needed than this. However, it's not all that efficient to use std::string or std::map for this system. Especially for memory-constrained environments, it can be quite worthwhile to avoid these.

Avoiding the use of std::string is the easiest. There's no reason to make a copy of the Metadata's name rather than just using the C-style character pointer. It will be necessary to specify a custom comparator for the std::map instance, but that should hopefully be a trivial exercise.
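A minimal sketch of such a comparator follows. The `CStrLess` name is hypothetical, and the sketch assumes every key points at storage (such as a string literal) that outlives the map, which holds when Metadata names are passed in as literals.

```cpp
#include <cstring>
#include <map>

class Metadata; // only pointers are stored, so a declaration is enough

// Orders keys by string contents instead of by pointer address.
struct CStrLess
{
  bool operator()(const char* lhs, const char* rhs) const
  {
    return std::strcmp(lhs, rhs) < 0;
  }
};

// The keys point at the names each Metadata already holds, so nothing is copied.
typedef std::map<const char*, const Metadata*, CStrLess> MetaMap;
```

Without the comparator, std::map would compare the raw pointer values, so looking up a name stored in a different buffer would fail even when the characters match.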

Avoiding the std::map altogether is a bit trickier. We want to keep efficient lookup of Metadata objects (no worse than O(log N)), which means we're going to want to keep them in a well-behaved tree. The approach I've used is to write my own intrusive tree where the Metadata objects are also the nodes themselves; that is, the Metadata class has the child pointers and such necessary to implement a binary tree (or a red-black tree, and so on). The consequence is a lot of extra code since std::map can't just be reused, but the metadata system then requires absolutely zero runtime memory allocations, the memory cost of metadata is as good as or better than std::map, and the runtime complexity is the same.

If compile-time string hashing were feasible in Visual C++, it would also be an option to use a fixed-size hash table with our choice of "perfect" hash, but sadly that won't be an option until at least Visual Studio 11 SP1 or so (whenever Microsoft finally gives us the constexpr support that GCC and Clang already have).
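For illustration, here is a sketch of compile-time string hashing using C++11 `constexpr`. The choice of 32-bit FNV-1a, the `hashName` and `metaBucket` names, and the 64-bucket table size are all assumptions rather than anything prescribed here. FNV-1a is not a perfect hash, so a real fixed-size table would still need to detect or resolve collisions among the registered names.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical 32-bit FNV-1a hash of a NUL-terminated name. It is written as a
// single recursive return statement so that it is a valid C++11 constexpr function.
constexpr std::uint32_t hashName(const char* s, std::uint32_t h = 2166136261u)
{
  return *s == '\0'
    ? h
    : hashName(s + 1, static_cast<std::uint32_t>((h ^ static_cast<unsigned char>(*s)) * 16777619u));
}

// Assumed table size; a power of two keeps the modulo cheap.
constexpr std::size_t kMetaBuckets = 64;

// Bucket index for a name in a fixed-size table.
constexpr std::size_t metaBucket(const char* name)
{
  return hashName(name) % kMetaBuckets;
}

// Both checks are evaluated entirely by the compiler.
static_assert(hashName("a") == 0xe40c292cu, "FNV-1a reference value");
static_assert(metaBucket("a") == 44, "bucket index is known at compile time");
```

The same functions also work at runtime on names read from a file, so lookups hash the incoming string once and index straight into the table.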

If you do implement your own tree, I recommend doing it as a template mixin, as it will be useful later on for attaching properties to metadata, which has the same set of problems (and which is just another std::map member in the Metadata class if you go the simpler route).
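A minimal sketch of the intrusive approach as a template mixin, assuming only that the node type has a `name()` accessor. `IntrusiveTreeNode`, `insert`, and `find` are hypothetical names, and to keep it short the tree is an unbalanced binary search tree; a production version would add red-black (or similar) balancing to guarantee O(log N) lookup.

```cpp
#include <cstring>

// Hypothetical intrusive BST mixin: each node owns its child links, so building
// the tree never allocates. Unbalanced for brevity.
template <typename Node>
class IntrusiveTreeNode
{
public:
  IntrusiveTreeNode() : m_Left(nullptr), m_Right(nullptr) {}

  // Link `node` into the tree rooted at `root`, ordered by Node::name().
  static void insert(Node*& root, Node* node)
  {
    Node** slot = &root;
    while (*slot != nullptr)
      slot = std::strcmp(node->name(), (*slot)->name()) < 0 ? &(*slot)->m_Left : &(*slot)->m_Right;
    *slot = node;
  }

  // Return the node whose name() equals `name`, or nullptr if there is none.
  static const Node* find(const Node* root, const char* name)
  {
    while (root != nullptr)
    {
      int cmp = std::strcmp(name, root->name());
      if (cmp == 0)
        return root;
      root = cmp < 0 ? root->m_Left : root->m_Right;
    }
    return nullptr;
  }

private:
  Node* m_Left;
  Node* m_Right;
};

// Metadata becomes its own tree node simply by inheriting the mixin.
class Metadata : public IntrusiveTreeNode<Metadata>
{
public:
  explicit Metadata(const char* name) : m_Name(name) {}
  const char* name() const { return m_Name; }

private:
  const char* m_Name;
};
```

Because the child links live in the mixin, the same template could later be inherited by a property class to build per-class property trees.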

To finish off the registration, the constructor for Metadata should call MetaManager::registerMeta(this), to register itself with the manager.
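Putting those pieces together, a sketch of self-registration might look like the following. MetaManager is declared before Metadata so the constructor can call it, and its method bodies are defined afterwards, where Metadata is a complete type. This ordering and the `g_MetaInt` global are assumptions of the sketch.

```cpp
#include <cstddef>
#include <map>
#include <string>

class Metadata;

// Declared before Metadata so that Metadata's constructor can call registerMeta().
class MetaManager
{
public:
  static void registerMeta(const Metadata* meta);
  static const Metadata* get(const char* name);

private:
  typedef std::map<std::string, const Metadata*> MetaMap;

  static MetaMap& getMetas()
  {
    static MetaMap metas; // constructed on first use, even from a global constructor
    return metas;
  }
};

class Metadata
{
public:
  Metadata(const char* name, std::size_t size) : m_Name(name), m_Size(size)
  {
    MetaManager::registerMeta(this); // every Metadata object registers itself
  }

  const char* name() const { return m_Name; }
  std::size_t size() const { return m_Size; }

private:
  const char* m_Name;
  std::size_t m_Size;
};

// Defined here, where Metadata is a complete type.
inline void MetaManager::registerMeta(const Metadata* meta)
{
  getMetas()[meta->name()] = meta;
}

inline const Metadata* MetaManager::get(const char* name)
{
  const MetaMap& metas = getMetas();
  MetaMap::const_iterator it = metas.find(name);
  return it == metas.end() ? NULL : it->second;
}

// A global Metadata registers itself during static initialization, before main() runs.
Metadata g_MetaInt("int", sizeof(int));
```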

Name-Based Singletons

Now that we have a Metadata class, the question becomes, "how do I create an instance of the Metadata object for each class?" There are a lot of ways to do this; I've used them all and seen most of them used in other engines, and each has advantages and disadvantages. Let's start with the simpler methods and build up to the one I feel is best.

The first approach is to simply make a global variable for each class. For example, if I had a class named GameObject, then I might have a global variable named g_MetaGameObject which is an instance of Metadata. That simple. A couple macros make it easy to declare, define, and reference these globals.

#define DECLARE_META(metatype) extern Metadata g_Meta ## metatype;
#define DEFINE_META(metatype) Metadata g_Meta ## metatype(#metatype, sizeof(metatype));
#define META_TYPE(metatype) (&g_Meta ## metatype)
#define META(object) ((object)->getMetadata())

Pretty darned simple. Except for that last macro. That one is a problem, and is one of the bigger reasons why I don't recommend this approach. It turns out that to actually get the metadata for a particular object, that object is going to have to have a method like getMetadata(). Required and mandatory. There is no need for a common base class, just that every class using metadata has a method named getMetadata() which returns a pointer to the Metadata for the class.

class Foo
{
public:
  const Metadata* getMetadata() const { return META_TYPE(Foo); }
};

This can be simplified ever so slightly with the addition of another macro. While this method does not need to be virtual for classes that won't be derived from, it will need to be virtual for classes that have children so that the META macro works as expected when called on a base pointer. I'll keep it simple and just show the version of the macro for dynamic classes.

#define DYNAMIC_META(metatype) public: virtual const Metadata* getMetadata() const { return META_TYPE(metatype); }

class Foo {
  DYNAMIC_META(Foo);
};
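For reference, here is a compact, self-contained sketch of the name-based approach with a polymorphic pair of classes. `Base` and `Derived` are hypothetical; the macros follow the ones above, spelled out with `sizeof` and with `getMetadata()` declared `virtual` so that META() dispatches on the dynamic type.

```cpp
#include <cstddef>

// Minimal Metadata class (registration with a manager omitted for brevity)
class Metadata
{
public:
  Metadata(const char* name, std::size_t size) : m_Name(name), m_Size(size) {}
  const char* name() const { return m_Name; }
  std::size_t size() const { return m_Size; }
private:
  const char* m_Name;
  std::size_t m_Size;
};

#define DECLARE_META(metatype) extern Metadata g_Meta ## metatype;
#define DEFINE_META(metatype) Metadata g_Meta ## metatype(#metatype, sizeof(metatype));
#define META_TYPE(metatype) (&g_Meta ## metatype)
#define META(object) ((object)->getMetadata())
#define DYNAMIC_META(metatype) public: virtual const Metadata* getMetadata() const { return META_TYPE(metatype); }

// "header" part: declare the globals before the classes refer to them
DECLARE_META(Base)
DECLARE_META(Derived)

class Base
{
  DYNAMIC_META(Base)
public:
  virtual ~Base() {}
};

class Derived : public Base
{
  DYNAMIC_META(Derived)
};

// "source file" part: exactly one definition of each global
DEFINE_META(Base)
DEFINE_META(Derived)
```

Calling META() on a `Base*` that points at a `Derived` returns `&g_MetaDerived`, but note that every participating class had to be edited to gain the method.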

There are three big problems with this approach. First, there is the requirement for the getMetadata() method. While the other approaches require that for dynamically polymorphic objects (there's only one workaround there, but it has a cost), it should not be necessary to modify or extend a class just to participate in metadata. In particular, it's impossible to add metadata to a class that cannot be modified, such as a third party library, as well as to primitive types.

The second big problem is that this approach does not work inside templates, as the compile-time metadata lookup requires knowledge of the type's name (in order to construct the name of the appropriate metadata global) which the template does not have.

The third big problem is that types with namespaces or template parameters cannot have metadata, as there's no way to construct a valid identifier from their names. Passing a class identifier like MyNamespace::MyClass to any of the above macros will produce invalid code, and the same happens when passing an instantiation of a class template such as MyTemplate.

Template-Based Singletons

The second and final approach I'll cover is to use templates to define and find metadata. The idea here is that a templated type can have static methods and even static members. By the same rules that govern all other static members and static local variables, each instantiation's statically allocated object will exist once and only once in the program (the One Definition Rule guarantees this).

It's thus possible to create a new, simple templated type and a new set of macros for creating metadata. This template removes several of the disadvantages of the previous approach. First, it allows us to look up the metadata for any type based on the compiler's knowledge of the type rather than the type's name, so it works in templates. Second, as there is no need to construct a valid global identifier for each class, it trivially supports classes in namespaces or with template parameters; it can be used without needing to extend a class, and it even works with primitive types like int or float. Finally, a bit of template metaprogramming we'll go over later will make the requirement of a getMetadata() method on objects go away in every case where dynamic polymorphism isn't an issue.

template <typename MetaType>
class MetaSingleton
{
public:
  static const Metadata* get() { return &s_Meta; }

private:
  static Metadata s_Meta;
};

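// defines an explicit specialization of the static member, which standard C++ marks with template <>
#define DEFINE_META(metatype) template <> Metadata MetaSingleton<metatype>::s_Meta(#metatype, sizeof(metatype));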
#define META_TYPE(metatype) (MetaSingleton<metatype>::get())
#define META(object) (MetaSingleton<decltype(object)>::get())

That last macro probably requires some explanation. If we have an object, there is a new C++11 feature called decltype that allows us to get the type of that object. This may sound redundant (if we have an object, we must have declared its type somewhere, after all, so clearly we already know what it is), but it's far more convenient to just pass in the object's name to a macro than to duplicate the type, especially for some of those long and ugly templated names.

There is one downside to this implementation of the META macro: it does not work with dynamic polymorphism at all. That is, if we have a pointer to MyBase but the instance it references is actually MyDerived, the Metadata instance returned by META will be that of MyBase. That's no good. We'll fix this later on in the article.

You may have noticed that I implemented the singleton Metadata in MetaSingleton differently than I did in MetaManager. There are two reasons for that. The first is efficiency. Static local variables, like the one in the MetaManager::getMetas() method, are usually implemented internally with a hidden global boolean for each static local, which is checked every time the function is called to see whether it has been set to true yet. Globals of primitive types like bool are always zero-initialized before any code runs, so that hidden boolean is guaranteed to be false the first time getMetas() is called, even if it's called from some random constructor of a global object. On that first call, because the boolean is false, the object is initialized and then the boolean is set to true, guaranteeing that the static local object is initialized only once and only when first used. However, that boolean is checked every time the method is called. While metadata objects are not generally accessed in any performance-sensitive part of a game, I did want to illustrate for the non-believers that raw bare-metal efficiency is possible even with a complex metadata system.
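To make the hidden boolean concrete, here is a hand-written approximation of what a function-local static boils down to. It is a simplified model, not the exact code any particular compiler emits: it is not thread-safe (C++11 and later require compilers to make this initialization thread-safe) and it never destroys the map. `getMetasByHand` and the `std::map<std::string, int>` stand-in are hypothetical.

```cpp
#include <map>
#include <new>
#include <string>

typedef std::map<std::string, int> MetaMap; // stand-in value type for illustration

// Zero-initialized before any code runs: false and null respectively.
static bool s_MetasInitialized;
static MetaMap* s_Metas;
// Raw, suitably aligned storage for the lazily constructed map.
alignas(MetaMap) static unsigned char s_MetasStorage[sizeof(MetaMap)];

MetaMap& getMetasByHand()
{
  if (!s_MetasInitialized) // this check runs on every call
  {
    // Construct on first use; keeping the returned pointer avoids casting the raw storage.
    s_Metas = new (s_MetasStorage) MetaMap();
    s_MetasInitialized = true;
  }
  return *s_Metas;
}
```

The static-member approach in MetaSingleton has no such check: the object is constructed once during static initialization and every later access is just an address.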

The second reason to use the static member is that it allows (requires, in fact) the initialization of that member to be explicit and in a .cpp file. Right now we need a macro during initialization in order to set the class name in the Metadata object. We're also going to want that explicit initialization for a later article when I show how to add properties to objects and do other advanced runtime initialization.

Robust Compile-Time Lookup via Partial Template Specialization

We've got a major problem in our template approach right now. Whenever we access MetaSingleton<>, we're passing in some possibly qualified type name. Because of how templates work, MetaSingleton<T>, MetaSingleton<const T>, and MetaSingleton<T&> are all distinct types, and hence each will have its own static Metadata object. We really don't want that.

There are two approaches to fixing this. The first is to make use of special metaprogramming templates like std::remove_const<>, std::remove_reference<>, and so on. They're pretty easy to use, but they're difficult to get just right in a case as complex as this one. If we have some qualified type like "MyType const* const&" then we're going to have to nest those std::remove_* uses pretty deeply to make sure it gets simplified down to the unqualified "MyType". Thankfully, partial template specialization is much easier to use and solves all our problems here.
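As an illustration of that nesting, here is a sketch of the std::remove_* route using only standard `<type_traits>` templates; `StripMetaType` is a hypothetical name. It peels exactly one level of pointer, so `MyType**` would still come out as `MyType*`.

```cpp
#include <type_traits>

struct MyType {};

// Hypothetical trait: strip a reference, then top-level cv, then one pointer,
// then the pointee's cv. Every extra layer of qualification needs another nested step.
template <typename T>
struct StripMetaType
{
  typedef typename std::remove_cv<
    typename std::remove_pointer<
      typename std::remove_cv<
        typename std::remove_reference<T>::type
      >::type
    >::type
  >::type type;
};

static_assert(std::is_same<StripMetaType<MyType const* const&>::type, MyType>::value,
              "reference, const pointer and const pointee are all removed");
```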

Partial template specialization allows us to make alternative versions of MetaSingleton that just reference the unqualified version. The end result is that no matter what we pass in as the template parameter, only the unqualified version's static Metadata instance will ever be referenced. This isn't hard to do, it just requires some mostly redundant typing.

template <typename MetaType>
class MetaSingleton<const MetaType> : public MetaSingleton<MetaType> {};

template <typename MetaType>
class MetaSingleton<MetaType&> : public MetaSingleton<MetaType> {};

template <typename MetaType>
class MetaSingleton<const MetaType&> : public MetaSingleton<MetaType> {};

template <typename MetaType>
class MetaSingleton<MetaType&&> : public MetaSingleton<MetaType> {};

template <typename MetaType>
class MetaSingleton<MetaType*> : public MetaSingleton<MetaType> {};

template <typename MetaType>
class MetaSingleton<const MetaType*> : public MetaSingleton<MetaType> {};

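Strictly speaking, the const reference and const pointer specializations are a convenience rather than a requirement: a type like const MyType& also matches the MetaType& specialization (with MetaType deduced as const MyType), which forwards to the const specialization and from there to MyType. The explicit versions simply resolve these common cases in a single step.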
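A self-contained sketch that checks the specializations end to end. `MyType` is a placeholder, and the `DEFINE_META` macro here carries the `template <>` prefix that standard C++ uses when defining an explicit specialization of a static data member.

```cpp
#include <cstddef>

class Metadata
{
public:
  Metadata(const char* name, std::size_t size) : m_Name(name), m_Size(size) {}
  const char* name() const { return m_Name; }
  std::size_t size() const { return m_Size; }
private:
  const char* m_Name;
  std::size_t m_Size;
};

template <typename MetaType>
class MetaSingleton
{
public:
  static const Metadata* get() { return &s_Meta; }
private:
  static Metadata s_Meta;
};

// Each qualified form inherits get() from the unqualified type's singleton.
template <typename MetaType> class MetaSingleton<const MetaType> : public MetaSingleton<MetaType> {};
template <typename MetaType> class MetaSingleton<MetaType&> : public MetaSingleton<MetaType> {};
template <typename MetaType> class MetaSingleton<const MetaType&> : public MetaSingleton<MetaType> {};
template <typename MetaType> class MetaSingleton<MetaType&&> : public MetaSingleton<MetaType> {};
template <typename MetaType> class MetaSingleton<MetaType*> : public MetaSingleton<MetaType> {};
template <typename MetaType> class MetaSingleton<const MetaType*> : public MetaSingleton<MetaType> {};

#define DEFINE_META(metatype) template <> Metadata MetaSingleton<metatype>::s_Meta(#metatype, sizeof(metatype));
#define META_TYPE(metatype) (MetaSingleton<metatype>::get())

struct MyType { int value; };
DEFINE_META(MyType)
```

Every qualified spelling ends up at the one `MetaSingleton<MyType>::s_Meta` object, so pointer comparisons between Metadata are safe.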

Also, as a quiz for the reader: for completeness' sake, is a specialization for "volatile" necessary? What about "static" and "register"? (Hint: what is the difference in C++ between a type qualifier and a storage class, and which keywords fall into those categories, and why?)

Supporting Polymorphic Classes

Our META macro still does not support dynamic polymorphism. We need some way to get the Metadata instance for the type of actual object we're interested in, independent of whether or not we're accessing it by a pointer or reference to a base type.

This unfortunately goes back to needing a virtual getMetadata() method on our objects. Fortunately, with the template approach, this method is only needed when dynamic polymorphism is even an issue. Also, the template approach allows the use of a template mixin for creating the virtual method, which may be considered easier or prettier than the macro we used before.

template <typename MetaType, typename BaseType>
struct MetaMixin : public BaseType
{
  virtual const Metadata* getMetadata() const { return MetaSingleton<MetaType>::get(); }
};

Using this is just a simple application of CRTP (the Curiously Recurring Template Pattern).

class DerivedType : public MetaMixin<DerivedType, BaseType>
{
  /// ...
};

It is debatable whether this is prettier. I'm not particularly fond of needing to make the "real" base type a template parameter of the mixin template, but there's no way around it. By comparison, the macro variant is:

#define DYNAMIC_META(metatype) \
  public: virtual const Metadata* getMetadata() const \
  { return MetaSingleton<metatype>::get(); }


class DerivedType : public BaseType
{
  DYNAMIC_META(DerivedType);

  /// ...
};

Either way works perfectly fine (and should generate identical class layouts and runtime code), so use whatever you feel looks best or is easier to understand. If you make use of C++'s builtin RTTI, the template mixin approach may bloat your RTTI tables a bit, so that may be a deciding factor to push you towards the macro approach.
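Here is a self-contained sketch showing both styles side by side; `Shape`, `Circle`, and `Square` are hypothetical. It also adds a hypothetical `DECLARE_META` helper: the inline `getMetadata()` bodies use `MetaSingleton<T>` before `DEFINE_META` appears, and standard C++ expects an explicit specialization to be declared before such a use in a translation unit. In a multi-file project those declarations would naturally live in the header.

```cpp
#include <cstddef>

class Metadata
{
public:
  Metadata(const char* name, std::size_t size) : m_Name(name), m_Size(size) {}
  const char* name() const { return m_Name; }
  std::size_t size() const { return m_Size; }
private:
  const char* m_Name;
  std::size_t m_Size;
};

template <typename MetaType>
class MetaSingleton
{
public:
  static const Metadata* get() { return &s_Meta; }
private:
  static Metadata s_Meta;
};

// DECLARE_META (hypothetical) declares the specialization; DEFINE_META defines it.
#define DECLARE_META(metatype) template <> Metadata MetaSingleton<metatype>::s_Meta;
#define DEFINE_META(metatype) template <> Metadata MetaSingleton<metatype>::s_Meta(#metatype, sizeof(metatype));

// Macro variant: inject the virtual method directly into the class body.
#define DYNAMIC_META(metatype) \
  public: virtual const Metadata* getMetadata() const \
  { return MetaSingleton<metatype>::get(); }

// Mixin variant (CRTP): the "real" base class is the second template parameter.
template <typename MetaType, typename BaseType>
struct MetaMixin : public BaseType
{
  virtual const Metadata* getMetadata() const { return MetaSingleton<MetaType>::get(); }
};

class Shape;
class Circle;
class Square;
DECLARE_META(Shape)
DECLARE_META(Circle)
DECLARE_META(Square)

class Shape
{
  DYNAMIC_META(Shape)
public:
  virtual ~Shape() {}
};

class Circle : public MetaMixin<Circle, Shape> {};   // mixin style
class Square : public Shape { DYNAMIC_META(Square) }; // macro style

DEFINE_META(Shape)
DEFINE_META(Circle)
DEFINE_META(Square)
```

Both derived classes override the same virtual function, so callers holding a `Shape*` cannot tell which style was used.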

Simply having this method where it's needed is only part of the solution. We can now safely call this method where it needs to be called, but wouldn't it be nice if we could just keep using that META macro, and have it automatically call getMetadata() if it exists or use the decltype trick otherwise?

Template Metaprogramming for Metadata Lookup

Template metaprogramming offers us the ability to selectively call the getMetadata() method. It's an ugly, horrific, mind-bending trick, but a working one nonetheless. Another article in the series is going to make heavy use of metaprogramming to extend the capabilities of the Metadata object without requiring excessive manual intervention, so we're going to have to cover this topic no matter what. Time to rip off the bandage. In fact, let's just look at the code.

#include <type_traits>

template <typename MetaType>
struct MetaIsDynamic
{
private:
  struct no_return {};
  template <typename U> static char check(decltype(static_cast<U*>(0)->getMetadata())*);
  template <typename U> static no_return check(...);

public:
  static const bool value = !std::is_same<no_return, decltype(check<MetaType>(0))>::value;
};

Yeah, that's pretty ugly. It's not even particularly obvious. There's an even uglier way to do this, and there's a lot of ways that one would think should work but won't (with Visual Studio, at least).

I'm not going to explain metaprogramming's core tricks here. If you're unfamiliar with template metaprogramming in general, I recommend the book Modern C++ Design. The topic is big enough that I simply can't do it justice here, at least not without giving it an entire large article all to itself.

Let's look back to MetaIsDynamic. We have a boolean result, defined as:

static const bool value = !std::is_same<no_return, decltype(check<MetaType>(0))>::value;

The tricky part is all the mess with std::is_same, no_return, decltype, and check<MetaType>(0). Internally, the trick works by comparing types. The expression check<MetaType>(0) is a call to one of two overloads with different return types: if MetaType does NOT have a getMetadata() method, then the return type of the chosen check<>() function is the struct no_return defined in MetaIsDynamic, and if MetaType DOES have a getMetadata() method, then the return type is something other than struct no_return. I'm checking the return type by using decltype and the std::is_same template. The whole line can be read as "boolean 'value' is false if no_return is the same type as the return type of 'check<MetaType>(0)', and true otherwise."

The big question then is how the return type of check<>() is chosen. This is done using the template rule called Substitution Failure Is Not An Error (SFINAE): if substituting the template arguments into a function template's declaration produces an invalid type, that template is silently dropped from the set of overload candidates instead of causing a compile error. We can thus arrange for there to be two templated versions of a function that are non-ambiguous (ambiguity would still be an error, SFINAE or no) and where one will be invalid on some condition we specify (such as whether the templated type has a specific method or not).

To use that trick, the MetaIsDynamic struct has two versions of the check<>() function.

template <typename U> static char check(decltype(static_cast<U*>(0)->getMetadata())*);
template <typename U> static no_return check(...);

Let's look at the first version of check<>(). It returns a char (our "true" value) and takes... some kind of mess. The parameter type is definitely a beast, but it's not too bad at all once it's broken down into its component parts. The parameter is a pointer, as can be seen by the asterisk at the end. It's a pointer to the decltype of an expression, which means that it's a pointer to whatever type that expression evaluates to. That expression is the result of calling getMetadata() through a pointer cast, so the decltype evaluates to whatever the return type of getMetadata() is. The pointer cast converts 0 (a.k.a. NULL) to U*, a pointer to the template type U, which is just a cheap way of constructing a "random" pointer to call a method on, because we need an expression that calls the method and that requires an object. Since decltype only looks at the type of an expression and never actually evaluates it, the pointer does not need to point to a valid object, so casting NULL to the desired pointer type is sufficient. Thus the whole parameter can be read as "a pointer to the return type of U::getMetadata()".

One might ask "but what if type U doesn't have a getMetadata() method?" Well, that's the exact question that the compiler has to answer when it evaluates this declaration of check<>(). Because that expression is used to construct the parameter type, and since the parameter type must be a real valid type or else there is a substitution failure, this is where SFINAE kicks in. If typename U (which is always the same type as MetaType) does not have a getMetadata() method then this declaration of function check<>() is simply thrown away and ignored by the SFINAE rules, with no errors. Which is exactly what we want.

The second version of check<>() is simply the fallback. Its parameter is an ellipsis because a C-style variadic parameter is the worst possible match in C++ overload resolution: it still accepts the 0 argument, but it ensures that the first version of check<>() is always unambiguously chosen if it exists.

The one fault of the code as presented is that it does not in any way guarantee that getMetadata() on the type actually returns a pointer to Metadata. While it's possible with a bit more effort to check that as well, I'd argue that there is little need; just don't do anything silly like adding methods called "getMetadata" that don't get Metadata and there won't be a problem.
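To see the trait in action, and how the stricter check mentioned above might look, here is a sketch. `MetaIsDynamicStrict`, `Plain`, `Dynamic`, and `Impostor` are hypothetical; the strict variant additionally requires the return type of getMetadata() to be convertible to `const Metadata*`, using `std::is_convertible`.

```cpp
#include <type_traits>

class Metadata {}; // stand-in; only pointers to it are used here

template <typename MetaType>
struct MetaIsDynamic
{
private:
  struct no_return {};
  template <typename U> static char check(decltype(static_cast<U*>(0)->getMetadata())*);
  template <typename U> static no_return check(...);

public:
  static const bool value = !std::is_same<no_return, decltype(check<MetaType>(0))>::value;
};

// Hypothetical stricter variant: SFINAE on the return type, which is
// std::true_type only if getMetadata() returns something convertible to const Metadata*.
template <typename MetaType>
struct MetaIsDynamicStrict
{
private:
  template <typename U>
  static typename std::is_convertible<decltype(static_cast<U*>(0)->getMetadata()), const Metadata*>::type check(int);
  template <typename U> static std::false_type check(...);

public:
  static const bool value = decltype(check<MetaType>(0))::value;
};

struct Plain {};
struct Dynamic { virtual const Metadata* getMetadata() const { return nullptr; } };
struct Impostor { int getMetadata() const { return 42; } }; // same name, wrong return type
```

The loose trait reports `Impostor` as dynamic because it only checks that the method exists; the strict one rejects it.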

Actually using this new MetaIsDynamic metaprogramming struct takes just a bit more work. A new set of template specializations is needed, then it's all done.

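template <typename MetaType>
struct MetaLookup
{
  // MetaIsDynamic forms a U*, which is ill-formed for reference types, so strip
  // references and cv-qualifiers before asking whether the type is dynamic
  typedef typename std::remove_cv<typename std::remove_reference<MetaType>::type>::type Bare;

  // the type has a getMetadata() method: ask the object itself (virtual dispatch)
  template <typename U>
  static typename std::enable_if<MetaIsDynamic<U>::value, const Metadata*>::type resolve(const U& obj)
  {
    return obj.getMetadata();
  }

  // no getMetadata() method: use the compile-time singleton
  template <typename U>
  static typename std::enable_if<!MetaIsDynamic<U>::value, const Metadata*>::type resolve(const U&)
  {
    return MetaSingleton<U>::get();
  }

  static const Metadata* get(const Bare& obj) { return resolve<Bare>(obj); }
};

// pointers: look up the metadata of the pointed-to object (obj must not be NULL)
template <typename MetaType>
struct MetaLookup<MetaType*>
{
  static const Metadata* get(const MetaType* obj) { return MetaLookup<MetaType>::get(*obj); }
};

template <typename MetaType>
struct MetaLookup<const MetaType*> : public MetaLookup<MetaType*> {};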

There's a tad bit more metaprogramming in there via the use of std::enable_if. That is another neat little trick relying on SFINAE, which allows for a version of a function to exist based on some boolean constant. A boolean constant like the one produced by our MetaIsDynamic. One version of the resolve<>() function will invoke the getMetadata() method and the other just uses MetaSingleton directly.

Now we just need a new version of the META macro to make it all nice and easy to use.

#define META(obj) (MetaLookup<decltype(obj)>::get((obj)))

Note again that decltype does not actually evaluate its argument, so the repeated use of the 'obj' macro parameter does not run afoul of the usual double-evaluation problems of preprocessor macros.
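Finally, here is a self-contained sketch of the whole lookup chain, assuming a reduced set of MetaSingleton specializations and a hypothetical `DECLARE_META` helper that declares the explicit specializations before the classes use them. It strips references and cv-qualifiers before consulting MetaIsDynamic, since MetaIsDynamic forms `U*`, which is ill-formed for reference types; without that step, an object reached through a `const Base&` would silently fall back to static lookup. `Base` and `Derived` are placeholders.

```cpp
#include <cstddef>
#include <type_traits>

class Metadata
{
public:
  Metadata(const char* name, std::size_t size) : m_Name(name), m_Size(size) {}
  const char* name() const { return m_Name; }
  std::size_t size() const { return m_Size; }
private:
  const char* m_Name;
  std::size_t m_Size;
};

template <typename MetaType>
class MetaSingleton
{
public:
  static const Metadata* get() { return &s_Meta; }
private:
  static Metadata s_Meta;
};
// reduced set of forwarding specializations
template <typename MetaType> class MetaSingleton<const MetaType> : public MetaSingleton<MetaType> {};
template <typename MetaType> class MetaSingleton<MetaType&> : public MetaSingleton<MetaType> {};
template <typename MetaType> class MetaSingleton<MetaType*> : public MetaSingleton<MetaType> {};

#define DECLARE_META(metatype) template <> Metadata MetaSingleton<metatype>::s_Meta;
#define DEFINE_META(metatype) template <> Metadata MetaSingleton<metatype>::s_Meta(#metatype, sizeof(metatype));

// true if MetaType has a getMetadata() method
template <typename MetaType>
struct MetaIsDynamic
{
private:
  struct no_return {};
  template <typename U> static char check(decltype(static_cast<U*>(0)->getMetadata())*);
  template <typename U> static no_return check(...);
public:
  static const bool value = !std::is_same<no_return, decltype(check<MetaType>(0))>::value;
};

template <typename MetaType>
struct MetaLookup
{
  typedef typename std::remove_cv<typename std::remove_reference<MetaType>::type>::type Bare;

  template <typename U> // dynamic type: ask the object (virtual call)
  static typename std::enable_if<MetaIsDynamic<U>::value, const Metadata*>::type resolve(const U& obj)
  { return obj.getMetadata(); }

  template <typename U> // static type: use the compile-time singleton
  static typename std::enable_if<!MetaIsDynamic<U>::value, const Metadata*>::type resolve(const U&)
  { return MetaSingleton<U>::get(); }

  static const Metadata* get(const Bare& obj) { return resolve<Bare>(obj); }
};

template <typename MetaType>
struct MetaLookup<MetaType*> // pointers: inspect the pointee (must not be null)
{
  static const Metadata* get(const MetaType* obj) { return MetaLookup<MetaType>::get(*obj); }
};

#define META(obj) (MetaLookup<decltype(obj)>::get((obj)))

template <typename MetaType, typename BaseType>
struct MetaMixin : public BaseType
{
  virtual const Metadata* getMetadata() const { return MetaSingleton<MetaType>::get(); }
};

class Base;
class Derived;
DECLARE_META(Base)
DECLARE_META(Derived)

class Base
{
public:
  virtual ~Base() {}
  virtual const Metadata* getMetadata() const { return MetaSingleton<Base>::get(); }
};

class Derived : public MetaMixin<Derived, Base> {};

DEFINE_META(Base)
DEFINE_META(Derived)
DEFINE_META(int)
```

With this in place, META() gives the dynamic type for pointers and references to polymorphic classes and the static type for everything else, including primitives.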

It's worth noting that when compiled with optimizations on, all of this metadata lookup code will compile away to either a trivial address lookup of a global object or (for classes that have a virtual getMetadata() method) a single virtual call to a function that simply returns the address of a global object. Stepping through the code in debug builds can be a little scary, though, as it may seem like there are hobajillion recursive calls (especially when those lookup templates get hit), but don't be fooled by how dumb a compiler is with all of its optimizations turned off.

Usage Examples

Keeping in mind that the Metadata class so far doesn't do much, here are a few examples of how the system can be used. There were a lot of innards and implementation details, but at the end of the day it's nice to see what all that work can do.

#include <cstddef>
#include <iostream>
#include <string>

class MyClass { /* ... */ };

DEFINE_META(MyClass);
DEFINE_META(int);
DEFINE_META(float);
DEFINE_META(std::string);

void print_size_of(const char* class_name)
{
  const Metadata* meta = MetaManager::get(class_name);
  if (meta != NULL)
    std::cout << "Size of " << meta->name() << " is " << meta->size() << std::endl;
  else
    std::cout << "Size of " << class_name << " is unknown" << std::endl;
}

template <typename Type>
void print_name_of(const Type& object)
{
  const Metadata* meta = META(object);
  std::cout << "Object is a " << meta->name() << std::endl;
}

int main()
{
  MyClass foo;
  int bar;
  float baz;

  print_size_of("MyClass"); // Size of MyClass is 1
  print_size_of("std::string"); // Size of std::string is 24
  print_size_of("bool"); // Size of bool is unknown
  print_name_of(foo); // Object is a MyClass
  print_name_of(bar); // Object is a int
  print_name_of(baz); // Object is a float
}

Note you may get different sizes output depending on your compiler and standard library vendor.

The Future

In following articles in the series, I'll go over making the Metadata class more useful. In particular, we'll be adding factory methods including cloning, simple inheritance checks (think dynamic_cast<> that doesn't require instances of objects), additional class attributes like alignment or an is_abstract flag, and named/typed class properties with full getter/setter support. I may even go over method binding for script language binding generation if I have the time (it's a big topic).

Comments (1) Trackbacks (2)
  1. it maybe cool if this metadata system support containers such as vectors, list and so on.
    because in many projects, serialize && de-serialize containers is necessary and important.

