OGRE uses strings a lot (maybe a little too much?). The specific uses I'm talking about here, and want to optimize away, are the names it uses, both externally and sometimes internally, as handles to create, look up and access things like scene nodes, animation states, etc.
Obviously, using strings here has a lot of benefits: they are convenient to work with, easy to generate uniquely and then remember, and easy to read and grok in debugging sessions, logs and the like. But they have downsides as well: they are slow to compare and look up (relative to integers), they need more memory (an empty std::string or std::wstring object alone is ~24 bytes or more, depending on the platform and STL implementation), and, except for short strings that fit in a small-string buffer, they need at least one extra memory allocation to store the actual characters.
The particular use I'm talking about here (strings as handles) is inherently restricted in ways that make it amenable to some transparent optimizations. These strings are almost never "processed" like ordinary strings (one doesn't usually match and replace a regular expression in them, at least not in game runtime code), and they can be treated as immutable for most purposes. In short, we create these strings, use them in look-ups (in BSTs or hash tables) and generally treat them as keys, but almost never manipulate them as text.
There is an optimization pattern called "flyweight" that can be applied here (note that I'm just referencing a known pattern to give some credibility to my argument; I'm not one of those programmers obsessed with OO and patterns!). What I'm proposing is something like this:
- We define a very lightweight StringHandle class that contains very little data: only a single 32-bit handle, index or hash value, which I'll describe shortly. This class has all the behaviors of a string; it has all the proper operators and member functions. But the actual characters are stored elsewhere.
- The character data for the strings is stored in a StringTable class, a singleton created very early by Ogre::Root. The StringTable stores the strings themselves, along with other bookkeeping (e.g. a reference count), and the index (or handle, or hash value) inside each StringHandle refers to one of its entries.
- Each string is stored only once in the StringTable, and the index value of duplicates will obviously be the same. This means that to compare (for equality) two strings via their StringHandle objects, one only needs to compare the index values (32-bit integers) inside the StringHandles.
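- Internally, the StringTable can use a hash table to store the strings, in which case the value inside each StringHandle can simply be the string's hash value (as long as collisions are detected when a string is first added, since two different strings with the same hash would otherwise compare equal). It can also use a BST, a digital trie or much more complex data structures for better performance, better memory usage, etc.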
- Because StringHandles are small, storing them and passing them around inside various objects in OGRE itself and in user programs will be faster and more memory-efficient than duplicating the strings.
- Because a StringHandle behaves essentially the same as a string, user code won't be affected much, if at all. With proper cast and assignment operators, even user code that uses plain old std::string or char* can work unaffected (with only a recompile).
- Use cases like logging or reading from/writing to a stream will read/write StringHandles as strings of characters. This keeps file formats unchanged and log/script files readable, without the need to serialize and deserialize the string table.
- The only downside I can think of (aside from the work required to implement this!) is that debugging OGRE code will be a little harder, because the string values used as names won't be directly readable in the debugger. However, some debuggers (e.g. Microsoft Visual Studio) support custom display rules per data type, so we can write a very simple extension that looks up each StringHandle in the StringTable and displays it as a character string.
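To make the list above a bit more concrete, here is a minimal C++17 sketch of the two classes. It's only an illustration under some assumptions of mine: the handle is a dense index into the table rather than a hash (so equal strings always get equal ids with no collision handling needed), the singleton is a function-local static instead of something Ogre::Root creates, there is no reference counting (interned strings live until shutdown), and nothing is thread-safe. None of these names are existing OGRE API.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <deque>
#include <functional>
#include <ostream>
#include <string>
#include <string_view>
#include <unordered_map>

// Hypothetical sketch, not an existing OGRE API.
// Each distinct string is stored once; a handle is its index in the table.
class StringTable {
public:
    // Function-local static; in the proposal Ogre::Root would create it instead.
    static StringTable& instance() {
        static StringTable table;
        return table;
    }

    // Returns the id of s, adding s to the table the first time it is seen.
    std::uint32_t intern(std::string_view s) {
        auto it = mIds.find(s);
        if (it != mIds.end()) return it->second;
        const auto id = static_cast<std::uint32_t>(mStrings.size());
        mStrings.emplace_back(s);
        // Keys are views into mStrings; deque::emplace_back never moves existing elements.
        mIds.emplace(std::string_view(mStrings.back()), id);
        return id;
    }

    const std::string& lookup(std::uint32_t id) const {
        assert(id < mStrings.size());
        return mStrings[id];
    }

private:
    StringTable() { intern(""); }  // id 0 is always the empty string
    std::deque<std::string> mStrings;                          // id -> characters
    std::unordered_map<std::string_view, std::uint32_t> mIds;  // characters -> id
};

class StringHandle {
public:
    StringHandle() : mId(0) {}
    // Non-explicit on purpose, so const char* and std::string convert implicitly.
    StringHandle(const char* s) : mId(StringTable::instance().intern(s)) {}
    StringHandle(const std::string& s) : mId(StringTable::instance().intern(s)) {}

    const std::string& str() const { return StringTable::instance().lookup(mId); }
    const char* c_str() const { return str().c_str(); }
    std::uint32_t id() const { return mId; }
    // Lets existing code that expects a std::string keep compiling.
    operator const std::string&() const { return str(); }

    // Equal strings share one table entry, so equality is a single integer compare.
    friend bool operator==(StringHandle a, StringHandle b) { return a.mId == b.mId; }
    friend bool operator!=(StringHandle a, StringHandle b) { return a.mId != b.mId; }
    // Orders by id (insertion order), not alphabetically; fine for std::map keys.
    friend bool operator<(StringHandle a, StringHandle b) { return a.mId < b.mId; }
    // Streams (logs, scripts, files) see the characters, not the id.
    friend std::ostream& operator<<(std::ostream& os, StringHandle h) { return os << h.str(); }

private:
    std::uint32_t mId;
};

// Allows StringHandle as a key in std::unordered_map / std::unordered_set.
namespace std {
template <>
struct hash<StringHandle> {
    size_t operator()(StringHandle h) const noexcept { return hash<uint32_t>{}(h.id()); }
};
}  // namespace std
```

With this, `StringHandle("SceneNode_Player") == StringHandle(std::string("SceneNode_") + "Player")` costs one integer comparison, a StringHandle is only 4 bytes, and it still converts implicitly to `const std::string&` for code that wants a string. Using a dense index rather than the raw hash trades compile-time hashing for never having to deal with collisions.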
Of course, this is not a new or even uncommon technique; many (most?) game engines use this internally and some even expose their users to it. Again, I must emphasize that the impact of implementing this in OGRE for the user code will be minimal; users can take advantage of this in their own code (they just use Ogre::StringHandle instead of Ogre::String for some of their data types) but if they don't, because of the nice conversions and casts implemented for StringHandle, they don't notice that anything has changed; they just don't receive some of the performance benefits.
Another nice thing about this strategy is that OGRE's code base can be migrated to it gradually. We start by implementing the StringTable singleton and the StringHandle class, and then slowly convert each OGRE subsystem and module to use StringHandle instead of String where appropriate. Even after the migration, reverting the whole system to its current behavior only requires typedef'ing StringHandle to String; everything should be back to normal then (if care is taken in implementing StringHandle).
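A rough sketch of what that switch could look like, assuming a hypothetical `OGRE_USE_STRING_HANDLES` build option and a hypothetical `OgreStringHandle.h` header (neither exists in OGRE today). `Ogre::String` here stands in for OGRE's own typedef of `std::string`. Code that sticks to operations both types support compiles unchanged either way:

```cpp
#include <ostream>
#include <string>

namespace Ogre {
typedef std::string String;  // stands in for OGRE's own String typedef
}

// Hypothetical build option; flip to 1 once a real StringHandle class exists.
#define OGRE_USE_STRING_HANDLES 0

#if OGRE_USE_STRING_HANDLES
#include "OgreStringHandle.h"  // hypothetical header; not opened while the switch is 0
#else
namespace Ogre {
typedef String StringHandle;  // reverts to plain strings everywhere
}
#endif

// Uses only operations both types share (construction from const char*,
// operator== and stream output), so it compiles in either configuration.
bool isSameName(const Ogre::StringHandle& a, const Ogre::StringHandle& b) {
    return a == b;
}

void logName(std::ostream& os, const Ogre::StringHandle& name) {
    os << "node: " << name << '\n';
}
```

The discipline this implies is the "care" mentioned above: as long as StringHandle's public surface stays a subset of std::string's, the typedef fallback keeps working.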
Over time, the internal representation and handling of strings inside StringTable can be optimized, and better (and more complex) data structures used, to make this scheme more efficient and therefore more beneficial. The interface will remain the same, and few (if any) further OGRE-wide code changes will be needed.
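As an example of swapping the internals, here is a sketch of the "hash value as handle" variant from the list, using 32-bit FNV-1a (my choice of hash, not something OGRE prescribes). `HashedStringTable` is a hypothetical name. Since the handle *is* the hash, a collision can't be resolved by handing out another id, so this version refuses the second string outright:

```cpp
#include <cstdint>
#include <stdexcept>
#include <string>
#include <string_view>
#include <unordered_map>

// 32-bit FNV-1a: a simple, well-known non-cryptographic string hash.
constexpr std::uint32_t fnv1a32(std::string_view s) {
    std::uint32_t h = 2166136261u;  // FNV offset basis
    for (unsigned char c : s) {
        h ^= c;
        h *= 16777619u;  // FNV prime; unsigned arithmetic wraps modulo 2^32
    }
    return h;
}
static_assert(fnv1a32("") == 2166136261u, "empty string hashes to the offset basis");

// Hypothetical alternative backend: the handle is the string's hash value.
class HashedStringTable {
public:
    std::uint32_t intern(std::string_view s) {
        const std::uint32_t h = fnv1a32(s);
        auto it = mStrings.find(h);
        if (it == mStrings.end()) {
            mStrings.emplace(h, std::string(s));
        } else if (std::string_view(it->second) != s) {
            // Two different strings share a hash; equality-by-handle would be wrong.
            throw std::runtime_error("string hash collision: '" + it->second +
                                     "' vs '" + std::string(s) + "'");
        }
        return h;
    }

    // Throws std::out_of_range for a handle that was never interned.
    const std::string& lookup(std::uint32_t h) const { return mStrings.at(h); }

private:
    std::unordered_map<std::uint32_t, std::string> mStrings;  // hash -> characters
};
```

One attraction of this variant is that `fnv1a32` is `constexpr`, so handles for literal names could in principle be computed at compile time; the cost is that a collision surfaces as an error (ideally at load time) instead of being handled transparently.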
Now, I have a question for the OGRE community: is this worth doing at all?! Do you see any problems/downsides that I have missed? Note that what I'm suggesting is purely hypothetical. I have not done any work in this area on OGRE (although I have implemented and used this scheme successfully in my own game engines and other projects, both hobby and commercial).
I'd greatly appreciate any thoughts, advice, help, ridicule, etc.