Optimizing Away the Use of Strings as Handles

Discussion area about developing or extending OGRE, adding plugins for it or building applications on it. No newbie questions please, use the Help forum for that.
User avatar
yzt
Halfling
Posts: 98
Joined: Wed Dec 28, 2005 7:43 am
x 2
Contact:

Optimizing Away the Use of Strings as Handles

Post by yzt »

(I apologize if this has already been discussed elsewhere. I would be happy if someone would point me in the direction of the previous threads.)

OGRE uses strings a lot (maybe a little too much?) The specific uses I am talking about here and want to optimize away are the names it uses both externally and sometimes internally as handles to create, look up and access stuff (like scene nodes, animation states, etc.)

Obviously, the use of strings here has a lot of benefits: they are convenient to work with, easy to generate uniquely and then remember, easy to read and grok in debugging sessions and logs and the like. But they have downsides as well: they are slow to compare and look up (relative to integers,) they need more memory (the empty std::string or std::wstring object is ~24 bytes or more, depending on the platform and STL implementation) and they (in most implementations) need at least one extra memory allocation to store the actual characters themselves.

The particular use I'm talking about here (use of strings as handles) is inherently restricted in certain ways which make them amenable to certain transparent optimizations. These strings are almost never "processed" like usual strings (one doesn't usually try to match and replace a regular expression in these strings; at least not in a game runtime code) and they can be viewed as immutable for most purposes. In short, we create these strings, use them in look-ups (in BSTs or hash tables) and generally treat them as keys but almost never mess with them like strings.

There is an optimization pattern called "flyweight" that can be applied here (note that I'm just referencing a known pattern to give some credibility to my argument; I'm not one of those programmers obsessed with OO and patterns!) What I'm proposing is something like this:
  • We define a very lightweight StringHandle class that contains very little data (only a single 32-bit handle or index or hash value which I'll describe shortly.) This class has all the behaviors of a string; it has all the proper operators and member functions. But the actual characters are stored elsewhere.
  • The character data for the strings are stored in a StringTable class, which is a singleton, created very early by Ogre::Root. This StringTable stores the strings themselves, along with other information (e.g. a reference count,) that the index (or handle or hash value) inside each StringHandle actually refers to.
  • Each string is stored only once in the StringTable, and the index value of duplicates will obviously be the same. This means that to compare (for equality) two strings via their StringHandle objects, one only needs to compare the index values (32-bit integers) inside the StringHandles.
  • Internally, the StringTable can use a hash table to store strings and then the value inside StringHandle objects will be hash values for the strings. It can use a BST, a digital trie or much more complex data structures for better performance, better memory usage, etc.
  • Because StringHandles are small, storing them and passing them around inside various objects in OGRE itself and in user programs will be faster and more memory-efficient than duplicating the strings.
  • Because a StringHandle behaves essentially the same as a string, user code won't be affected much (or none at all.) With proper cast and assignment operators, even user code that uses plane old std::string or char* can work unaffected (with only a recompile.)
  • Use cases like logging or reading from/writing to a stream will read/write StringHandles as strings of characters. This of course helps keep file formats the same and log/script files readable, without need for serializing and deserializing the string table.
  • The only downside I can think of (aside from the work required to implement this!) is that debugging OGRE code in a debugger will be a little harder because string values used for names of stuff won't be as readily decipherable in the debugger. However, some debuggers (e.g. Microsoft Visual Studio) have support for special handling based on data type which we can employ and write very simple extensions to look up StringHandles from the StringTable and display them as character strings even in the debugger.
In this scheme, creating/destroying a string will become slightly slower. Depending on the underlying data structure used in the StringTable, this can be a few operations more than creating a normal string (in case of a simple hash table,) or a little more. Most string operations (e.g. determining the length, converting it to a char*, etc.) will become slower by a factor of O(1), because of the indirection and the work involved in looking them up and inserting them in the StringTable. This again, depends on the underlying data structure and can be as much as a factor of O(log N) where N is the total number of string in StringTable. However, I think that the far more common use cases for this type of strings are storing them as handles and using them as keys to look up the objects they name (scene nodes, animation states, etc.) which become far more efficient under this scheme.

Of course, this is not a new or even uncommon technique; many (most?) game engines use this internally and some even expose their users to it. Again, I must emphasize that the impact of implementing this in OGRE for the user code will be minimal; users can take advantage of this in their own code (they just use Ogre::StringHandle instead of Ogre::String for some of their data types) but if they don't, because of the nice conversions and casts implemented for StringHandle, they don't notice that anything has changed; they just don't receive some of the performance benefits.

Another nice thing about adopting this strategy is rather easy migration of OGRE's code base to adopt it. We start by implementing the StringTable singleton and the StringHandle class, and then slowly convert each OGRE subsystem and module to use StringHandle instead of String where appropriate. Even after this strategy is implemented, the only thing needed to revert the whole system's behavior to the way it is now is typedefing StringHandle to String; all should be back to normal then (if care is taken in implementing StringHandle.)

Over time, the internal representation and handling of strings inside StringTable can be optimized and better (and more complex) data structures used to make this scheme more efficient and therefore more beneficial. The interface will remain the same and no further OGRE-wide code changes will be needed (or very little.)

Now, I have a question for the OGRE community: is this worth doing at all?! Do you see any problems/downsides that I have missed? Note that what I'm suggesting is purely hypothetical. I have not done any work in this area on OGRE (although I have implemented and used this scheme successfully in my game engines and other projects, both hobbies and commercial projects.)

I'd greatly appreciate any thoughts, advice, help, ridicule, etc. :D
Do not meddle in the affairs of Wizards, for they are subtle and quick to anger.
My useless rants
The company I work for (Dead Mage)
The first game we made (Garshasp: The Monster Slayer)
TheSHEEEP
OGRE Retired Team Member
OGRE Retired Team Member
Posts: 972
Joined: Mon Jun 02, 2008 6:52 pm
Location: Berlin
x 65

Re: Optimizing Away the Use of Strings as Handles

Post by TheSHEEEP »

Though I did not look that much into the internals as you probablydid, I pretty much agree with you that Ogre's heavy use of strings can be or already is considered problematic. Which is only logical, I guess. After some time spent on optimization, you just need to see a string and think "Okay, how to get rid of it?" :lol:
So, if we had unlimited resources I'd certainly say "Go for it!", but unless we find someone who wants to tackle exactly that,.... I guess you see the problem. ;)

Before optimizing anything we have to know the actual bottlenecks.

As good as this idea sounds (and IMHO it should have been used from the beginning, but that is an easy thing to say in hindsight), it will not be implemented by the team if the improvement would not be worth it or until someone just burns for the idea :)
Maybe you could experiment with the sources, implement such a StringHandle in Ogre and compare the speed. Or do some profiling to see how much of the time actually goes away for string comparisons. That would be absolutely awesome and helpful, and afaik has not been done yet.
My site! - Have a look :)
Also on Twitter - extra fluffy
User avatar
Wolfmanfx
OGRE Team Member
OGRE Team Member
Posts: 1525
Joined: Fri Feb 03, 2006 10:37 pm
Location: Austria - Leoben
x 99
Contact:

Re: Optimizing Away the Use of Strings as Handles

Post by Wolfmanfx »

I would vote quasi-compile-time-string-hashing so we never store strings internally for comparison or lookup.
TheSHEEEP
OGRE Retired Team Member
OGRE Retired Team Member
Posts: 972
Joined: Mon Jun 02, 2008 6:52 pm
Location: Berlin
x 65

Re: Optimizing Away the Use of Strings as Handles

Post by TheSHEEEP »

We still need to to store all the strings somewhere, though. Sometimes, you just need to get those names for various stuff (logging, etc.).
So, this StringHash from the article could be used, but not as the lone solution for everything that has to do with strings.
Maybe combine that with the string table approach.
My site! - Have a look :)
Also on Twitter - extra fluffy
User avatar
Klaim
Old One
Posts: 2565
Joined: Sun Sep 11, 2005 1:04 am
Location: Paris, France
x 56
Contact:

Re: Optimizing Away the Use of Strings as Handles

Post by Klaim »

I'm not a very experienced Ogre dev but I think it's a good idea. I don't like the naming though.

There were plans to limit the string usage in Ogre at some point but not very drastic (see the 1.7 roadmap in the wiki). Also, I'm using similar techniques with my game entities, which all have an id object which is a 128bit id (UUID in fact), then I have names associated to it somewhere else, as most of the time I don't need the names.


I agree with TheSHEEEP that the strings still need to exists for logging purpose at least. Maybe keeping them could be optional though (and enabled by default). It would allow people not wanting to have them in memory (because of some constraints) to use Ogre without worrying about that.

For checking the bottleneck, it would require that someone with a full application with "classic" usage would check if it's one of the important performance bottleneck related to Ogre, right?
TheSHEEEP
OGRE Retired Team Member
OGRE Retired Team Member
Posts: 972
Joined: Mon Jun 02, 2008 6:52 pm
Location: Berlin
x 65

Re: Optimizing Away the Use of Strings as Handles

Post by TheSHEEEP »

Klaim wrote:For checking the bottleneck, it would require that someone with a full application with "classic" usage would check if it's one of the important performance bottleneck related to Ogre, right?
Well, "full application" is relative ;)
I think a test case with just using a lot of nodes and meshes and animations would be good.
Also, changing some of that during runtime to get a good amount of string comparisons like in a "full application". So, basically, we should have some kind of variable stress test. Probably as a plugin or sample.
In short: A good enough test case application should suffice.

There is that instancing sample with animations, maybe that would work? But then again, it is instancing and therefore already optimized in a way, so probably not representative enough.
My site! - Have a look :)
Also on Twitter - extra fluffy
User avatar
syedhs
Silver Sponsor
Silver Sponsor
Posts: 2703
Joined: Mon Aug 29, 2005 3:24 pm
Location: Kuala Lumpur, Malaysia
x 51

Re: Optimizing Away the Use of Strings as Handles

Post by syedhs »

First, we have to establish the benefits of having this kind of change - and as for me, it is going to improve performance? Because string is such a convenient, we can refer to certain objects using name (string) for an example, so what are we gaining by having this change? Or any other benefits..
A willow deeply scarred, somebody's broken heart
And a washed-out dream
They follow the pattern of the wind, ya' see
Cause they got no place to be
That's why I'm starting with me
User avatar
yzt
Halfling
Posts: 98
Joined: Wed Dec 28, 2005 7:43 am
x 2
Contact:

Re: Optimizing Away the Use of Strings as Handles

Post by yzt »

Thanks for the replies and the enthusiasm!

Actually, I have no hard data on the (lack of) performance of strings. Years ago (when 1.4 was the in the trunk) I did some tests on a "real" application I was developing and found that 5-7 percent of my time (the whole application time) was being spent in string manipulation! However, this would have included OGRE and the application's own stuff (but the application wouldn't have contributed a significant portion of this.) But, after seeing that number, I immediately changed my ways and haven't used strings in working with OGRE ever since! Unfortunately, that code belongs to my previous employer and I don't have access to it now to do more measurements and report. So you have to treat this 5-7 percent as anecdotal at best. As it stands, I don't any nontrivial body of code that uses string names (the naive way) in any meaningful way, but I might try and write a test case.

@TheSHEEP:
I don't know if you have seen Jason Gregory's great book "Game Engine Architecture"? He uses OGRE (to some extent) in that book and also in some university courses he taught and I remember that his most prominent criticism of OGRE was this overuse of strings. I remember because although I shared that view, I was affronted nonetheless, as he was bashing my favorite 3D rendering engine! :D

Anyways, I don't expect any member of the core team to cut away from their own projects and work on my pet peeve! I was just discussing a hypothetical. I might get a chance to work on this, but I have no definite plans right now. I might have some time in the near future though.

The way I see it is that it might not be hugely beneficial to implement this scheme, but it will definitely have some benefits (some memory savings, laying the groundwork for more complex and more efficient solutions in the future, the separation of various string usages that just makes sense, etc.) and no significant drawbacks (because it doesn't affect user code.) So if I (or someone else) actually do this, it won't be a bad thing.

@Wolfmanfx:
I have seen those and have even implemented some variations of those ideas (using macros and template tricks and even tools that replace the strings with their hash values before code compile) but as was pointed out here we do need to store strings somewhere at runtime. Also, in larger applications, there are very few name strings that are hard coded and available at compile time. Most come from level description files or resource files or gameplay scripts or the like.

Ah, and there are features in C++11 that will make implementing this idea a breathe (i.e. user-defined literals and constexprs,) but AFAIK they are not supported in any mainstream compilers (not GCC 4.7 or MSVC11.) I've been waiting for that for a loooong time!

@Klaim:
I would think that using 128-bit GUIDs (or UUIDs or any kind of IDs) might not provide that much of a benefit either space-wise or speed-wise. I would imagine that not much of the names people use for stuff in OGRE are more than 15 characters long (which will make them less than 128 bits.) Of course, there is the overhead of the std::string object itself (24 bytes or more) and the extra allocation(s) it performs to get room for the actual data. But I can see your point of course. 128 bit IDs will have practically no chance of collisions.

It would be easy to customize the size of the index (handle, ID, whatever) stored.

But, what I'm suggesting here is not abolishing strings altogether. I'm just suggesting an under-the-hood, transparent-to-the-user reorganization of some internal OGRE data and that's it. I'm not trying to replace strings with UUIDs. That might look a bit daunting for the newcomers to OGRE. I'm proposing a way to let the rookies work with OGRE in an intuitive and convenient way, and yet implement an optimization that appeals to the pros and the more experienced (and opens the door for their own tampering and improvements.)

Even if we don't actually implement any of the ideas we are discussing here, the act of separating the type name of strings used for names and handles from everything else will be beneficial on its own. It lets people customize that type as they like, whether it is using UUIDs instead of strings, implementing quasi compile-time string hashing or a string table.

@syedhs:
Obviously you are right in demanding the benefits to be established, but again, I have to stress that I myself like the convenience of strings and what I'm suggesting will not change the user code in any way. Of course, after this there will be a different set of recommendations for using strings as names and passing them and receiving them, but that's just if you want better performance. If you continue to do what you do today, your code will probably not have any worse performance, and maybe even be a little bit better.

On the other hand, I'm not pretending that this is going to be a free optimization. Interfaces need to be changed from accepting a const Ogre::String & to accepting a simple StringHandle (or whatever the name becomes.) People need to be educated about this new scheme of treating strings (although I doubt that it would be any problem at all for any experienced programmer; and the less experienced can just ignore all this.) Even that change in all the function prototypes can happen gradually.

So, while I'm not suggesting that we just rush into this headlong (who am I to be suggesting that?!) I'm also saying that this scheme won't have any drawbacks either, because it can be disabled easily with a compiler command-line macro, because the transition to it would be very gradual and very incremental, and because it would be transparent in most cases.
Do not meddle in the affairs of Wizards, for they are subtle and quick to anger.
My useless rants
The company I work for (Dead Mage)
The first game we made (Garshasp: The Monster Slayer)
User avatar
saejox
Goblin
Posts: 260
Joined: Tue Oct 25, 2011 1:07 am
x 36

Re: Optimizing Away the Use of Strings as Handles

Post by saejox »

Every time i create a node with no name, i cringe.
54+ bytes of useless data. (32 wstring + "Unnamed_XXX")

i never used MovableObject::getName or Node::getName once. Ogre::Any is enough for all my needs.
Nimet - Advanced Ogre3D Mesh/dotScene Viewer
asPEEK - Remote Angelscript debugger with html interface
ogreHTML - HTML5 user interfaces in Ogre
User avatar
yzt
Halfling
Posts: 98
Joined: Wed Dec 28, 2005 7:43 am
x 2
Contact:

Re: Optimizing Away the Use of Strings as Handles

Post by yzt »

saejox wrote:Every time i create a node with no name, i cringe.
Me too, brother. Me too... (and you forgot the memory allocation overhead of 8-16 bytes.)

By the way, the scheme I'm outlined above will lend itself to alternative string implementations more easily. Since you know your strings will be immutable (and mutating them will need some work anyways to put the new string in its correct place in the table) you may even store them as char pointers (or wchar_t pointers, if that's what you really want for your scene node names! Or even UTF-8 characters.) Goodbye std::wstring overhead!

Another optimization just occurred to me. Since it is quite common to not specify a name for things in OGRE (specially scene nodes) in which case OGRE will automagically generates one for you, we can have a special StringHandle value that indicates that this object has no name and it needs to be auto-generated. Every time you need the name (which is unlikely since you have specifically said you didn't care about the name) it will be auto-generated using the proper algorithm (sensitive to object type) and be given to you but not stored. This means that you'll only waste 4 bytes instead of tens of bytes per object; and there will be no superfluous memory allocation for a small block of heap that will probably also live for a long time, which make them a prime candidate for causing fragmentation.

Anyways, whether or not that this particular optimization is doable and worthwhile, this kind of thing will be easier to do, once we separate the type of strings we use for names from other strings.
Do not meddle in the affairs of Wizards, for they are subtle and quick to anger.
My useless rants
The company I work for (Dead Mage)
The first game we made (Garshasp: The Monster Slayer)
User avatar
Kojack
OGRE Moderator
OGRE Moderator
Posts: 7157
Joined: Sun Jan 25, 2004 7:35 am
Location: Brisbane, Australia
x 534

Re: Optimizing Away the Use of Strings as Handles

Post by Kojack »

One of the original goals of Ogre 2.0 back when sinbad was first talking about it was to remove the need for strings. That was partly achieved since then by hiding the need for strings (back then you had to explicitly give a name to everything, there were no autogenerated names for nodes or entities), but the strings are still there inside.

I vaguely remember somebody years ago extended the ogre string class to use hash based comparisons (Ogre used to use a custom class that inherited from std::string. Now it just typedefs std::string). They got a performance boost for some things.
User avatar
yzt
Halfling
Posts: 98
Joined: Wed Dec 28, 2005 7:43 am
x 2
Contact:

Re: Optimizing Away the Use of Strings as Handles

Post by yzt »

Kojack wrote:I vaguely remember somebody years ago extended the ogre string class to use hash based comparisons (Ogre used to use a custom class that inherited from std::string. Now it just typedefs std::string). They got a performance boost for some things.
Note that I don't propose this scheme for all strings. Some stuff like paths and logged strings and whatnot are just strings. I'm specifically talking about strings that are used to name stuff (these can be thought of as immutable handles/keys.)

In any case, it would be a good idea to have two typedefs for strings: and Ogre::String that is just a std::wstring (or std::string) and maybe an Ogre::NameString that for now can just be a std::wstring (or std::string) too (let's ignore the issue of allocators for now.) The OGRE internals and interfaces then need to be updated so that they use Ogre::NameString when they actually mean a name/handle/key. A very simple optimization then would be to just use std::string as Ogre::NameString, as I imagine almost nobody would need to use non-ASCII names for the name of their scene nodes and stuff. Obviously this will be subject to a new build config variable so people that for some reason do need international names for their internal names can get them.
Do not meddle in the affairs of Wizards, for they are subtle and quick to anger.
My useless rants
The company I work for (Dead Mage)
The first game we made (Garshasp: The Monster Slayer)
User avatar
Klaim
Old One
Posts: 2565
Joined: Sun Sep 11, 2005 1:04 am
Location: Paris, France
x 56
Contact:

Re: Optimizing Away the Use of Strings as Handles

Post by Klaim »

yzt wrote: @Klaim:
I would think that using 128-bit GUIDs (or UUIDs or any kind of IDs) might not provide that much of a benefit either space-wise or speed-wise. I would imagine that not much of the names people use for stuff in OGRE are more than 15 characters long (which will make them less than 128 bits.) Of course, there is the overhead of the std::string object itself (24 bytes or more) and the extra allocation(s) it performs to get room for the actual data. But I can see your point of course. 128 bit IDs will have practically no chance of collisions.

It would be easy to customize the size of the index (handle, ID, whatever) stored.
Sorry I was not clear: I wasn't suggesting to you to use UUID. I need UUIDs for this specific project, so I use them as Id values, associated with strings for naming of a lot of these objects. It was just an anecdotal commentary, sorry.
But, what I'm suggesting here is not abolishing strings altogether. I'm just suggesting an under-the-hood, transparent-to-the-user reorganization of some internal OGRE data and that's it. I'm not trying to replace strings with UUIDs. That might look a bit daunting for the newcomers to OGRE. I'm proposing a way to let the rookies work with OGRE in an intuitive and convenient way, and yet implement an optimization that appeals to the pros and the more experienced (and opens the door for their own tampering and improvements.)

Even if we don't actually implement any of the ideas we are discussing here, the act of separating the type name of strings used for names and handles from everything else will be beneficial on its own. It lets people customize that type as they like, whether it is using UUIDs instead of strings, implementing quasi compile-time string hashing or a string table.
As I said I didn't suggest to use UUIDs (I need them for specific reasons), but hashes or simple (atomic) iteration of integers should do the trick I guess? The important thing is to have it encapsulated in a type to identify objects which then is associated with the string as you already said. I'm not even sure it's necessary to have that type be usable as a string directly.
User avatar
yzt
Halfling
Posts: 98
Joined: Wed Dec 28, 2005 7:43 am
x 2
Contact:

Re: Optimizing Away the Use of Strings as Handles

Post by yzt »

Klaim wrote:The important thing is to have it encapsulated in a type to identify objects which then is associated with the string as you already said. I'm not even sure it's necessary to have that type be usable as a string directly.
Agreed.

The "handle" type being usable as a string just has the advantage that it eases the migration. This change is not going to be implemented over a course of a few days in OGRE itself, so it might be beneficial to make the change incrementally and little-by-little. And the handle type being transparently convertible to/from string lets the implementer and the team and the users avoid a lot of compile errors and hassle.

Also, there are a lot of user code out there that relies on the interfaces to many OGRE functions be able to accept strings and to give back strings. I'm guessing that breaking all that code is not a good thing! :D
Do not meddle in the affairs of Wizards, for they are subtle and quick to anger.
My useless rants
The company I work for (Dead Mage)
The first game we made (Garshasp: The Monster Slayer)
User avatar
Klaim
Old One
Posts: 2565
Joined: Sun Sep 11, 2005 1:04 am
Location: Paris, France
x 56
Contact:

Re: Optimizing Away the Use of Strings as Handles

Post by Klaim »

yzt wrote: Also, there are a lot of user code out there that relies on the interfaces to many OGRE functions be able to accept strings and to give back strings. I'm guessing that breaking all that code is not a good thing! :D
I agree! In the same time I'm thinking:

1. if you provide a PR for a future interface-breaking version (2.0 for example) then it's ok I guess?
2. if the handling type have an (implicit) operator string, then std::string s = ogre_object->getName(); will still work as expected, right?
3. if the handling type accept a string as an implicit constructor, pushing string in functions will still work, right?
4. the only case I see would break easily is if the user uses auto : auto s = ogre_object->getName(); would not return a string. It's easy to fix though, like ogre_object->getName().str() or something like that, but it is interface breaking.
5. The cases where it would break but I doubt it's really often used is ogre_object->getName().size() for example. Maybe I'm wrong, hard to say indeed!

I'm not suggesting a particular way of doing smooth transition (you do the proposition, not me :) ), what I want to point is that the handling type shouldn't totally reflect string interface itself, because it might make types not-that-obvious on the user side. What do you think about just providing the implicit constructor and string operator? (and a function to get the string explicitly too).
User avatar
yzt
Halfling
Posts: 98
Joined: Wed Dec 28, 2005 7:43 am
x 2
Contact:

Re: Optimizing Away the Use of Strings as Handles

Post by yzt »

Klaim wrote:I'm not suggesting a particular way of doing smooth transition (you do the proposition, not me :) ), what I want to point is that the handling type shouldn't totally reflect string interface itself, because it might make types not-that-obvious on the user side. What do you think about just providing the implicit constructor and string operator? (and a function to get the string explicitly too).
(The ones you count are basically all I can think of too. All seems correct and fine to me.)

You see, there's a fine line here that you are touching upon as well. I'm not sure either that providing full drop-in compatibility (all the conversions, all the implicit constructors, all the methods, all the operators, etc.) with std::string is a good thing in general. After all, our StringHandle class (or whatever its name would be) is decidedly not a std::string. So, it's good to get compile errors that tell you your presumptions (about x->getName() returning a std::string) are wrong. That's why C++ is strongly typed language and that's a Good Thing. In the long run, I don't even think having just a cast-to-string operator and an implicit constructor would be a good idea. Experience has taught me to avoid these kinds of implicit conversions.

But we have a long way till 2.0 (maybe 2 years?) and this should not take that long to implement. The strategy I suggest for the meantime is this: (this is of course still hypothetical.)
We implement this, with the StringHandle class having a full std::string interface (including the iterators and whatnot.) But, we add a build setting (a macro) that controls the whether that interface is in or completely compiled out. If the OGRE team decide so, they can declare that the string-as-name interfaces are being deprecated, but to ease the transition, there a such-and-such build setting that can enable or disable drop-in compatibility with std::string. The ideal thing would be that over time, people test their code and migrate is over to the new interfaces at their own leisure and nobody's code would be broken by an overnight change. They will at least have plenty of warning!


So, after all this, is anybody up for doing a test to measure string manipulation cost? Or at least suggest a test that would be meaningful? (we had some suggestions at the beginning.) What are some modules that rely on name strings heavily?

I'm guessing having a large scene graph and doing a lot of look-up and manipulation using node names could be one test (and not unrealistic either, as game code could store scene node names instead of pointers... I don't know of any code that does, but they could!) Also maybe bone manipulation in skeletal animation? I don't know...
Do not meddle in the affairs of Wizards, for they are subtle and quick to anger.
My useless rants
The company I work for (Dead Mage)
The first game we made (Garshasp: The Monster Slayer)
Knotanalt
Halfling
Posts: 94
Joined: Sun Jul 01, 2012 2:58 pm
x 2

Re: Optimizing Away the Use of Strings as Handles

Post by Knotanalt »

yzt wrote: Now, I have a question for the OGRE community: is this worth doing at all?! Do you see any problems/downsides that I have missed? Note that what I'm suggesting is purely hypothetical. I have not done any work in this area on OGRE (although I have implemented and used this scheme successfully in my game engines and other projects, both hobbies and commercial projects.)
It's just a big question mark how much it's worth it, if at all.

I don't see how it would really help to be honest. The only worry I have about the name lookup is the speed, but that could be done by hash if it's not already, and you don't necessarily have to look up by name anyway if you track it yourself. I can't see that strings are going to use enough much memory to care at all, at least not on a PC.

There's also drawbacks, yes. It's a tradeoff, and maybe not a good one. The tradeoff is that you have less memory at the cost of more actual allocations of memory, that are not clumped together with their handles. Fragmentation. Which is usually the bad call to make for 3D apps.

But unless you use a crazy amount of strings I doubt it's going to matter even a little, whichever you choose. It would matter a lot more for a business app than a typical 3D app.

Plus there's the debugging issue you mentioned, why my string class doesn't do anything fancy like that.
User avatar
stealth977
Gnoll
Posts: 638
Joined: Mon Dec 15, 2008 6:14 pm
Location: Istanbul, Turkey
x 42

Re: Optimizing Away the Use of Strings as Handles

Post by stealth977 »

Just my 50 cents,

Using StringIDs instead of strings will speedup:

1 - Lookups
2 - Object creation especially when you dont want to/no need to specify custom names
3 - Will make overall processing much faster especially when you can define some StringIDs as constants:

Code: Select all

const StringID HandBone = STRINGID("Hand Bone"); // Will never be re-calculated each time you need to access it

functionX(...)
{
   bone * exbone = getBone( HandBone );
   ....
}

functionY(...)
{
   bone * exbone = getBone( HandBone );
   ....
}
4 - Some experimental in-accurate tests being said to have %5 improvements when StringIDs are used, it may be right or wrong, lower or higher, but sure would have much better performance than directly using strings
5 - Performance improvement will get much better and better as OGRE code gets faster like in 2.0 proposal, it may be %5 when using already slow 1.8 pipeline, but may get you %10 improvement when OGRE gets much faster like proposed for 2.0...

All that being said, there is a serious drawback for StringIDs, no matter which hashing method you use, hashes will collide at some point which has 2 possible solutions:
1 - Use a secondary hash (which std:: probably uses) for collisions, but it makes everything slow. It is the way to go for General Purpose implementations like in STD...
2 - Add collision checking and reporting only for debug mode, so user has the chance to use "Hand_Bone" instead of "Hand Bone" if "Hand Bone" collides with something else...

Of course a good hashing method would determine how likely a collision will occur, for 32bit StringIDs, it may be anywhere between once in 5000 strings to once in 100000 different strings...

Finally, IF 2.0 proposal I read is going to be implemented (and I am so looking forward to see it implemented) i would suggest implementing StringIDs there, right from the beginning, since it would be much much easier, less problematic and you wouldnt need to worry about backward compatibility or implicit constructors/converters...
Ismail TARIM
Ogitor - Ogre Scene Editor
WWW:http://www.ogitor.org
Repository: https://bitbucket.org/ogitor
User avatar
syedhs
Silver Sponsor
Silver Sponsor
Posts: 2703
Joined: Mon Aug 29, 2005 3:24 pm
Location: Kuala Lumpur, Malaysia
x 51

Re: Optimizing Away the Use of Strings as Handles

Post by syedhs »

For me, performance improvement only matters when we can improve the efficiency the code in the tight loop (eg Root::renderOneFrame, SceneManager::_setPass etc). If the code inside those functions are using strings to do certain lookup for an example, then the effort is worth it - and only if the code cannot be changed to not using string lookup.
A willow deeply scarred, somebody's broken heart
And a washed-out dream
They follow the pattern of the wind, ya' see
Cause they got no place to be
That's why I'm starting with me
User avatar
yzt
Halfling
Posts: 98
Joined: Wed Dec 28, 2005 7:43 am
x 2
Contact:

Re: Optimizing Away the Use of Strings as Handles

Post by yzt »

syedhs wrote:For me, performance improvement only matters when we can improve the efficiency the code in the tight loop (eg Root::renderOneFrame, SceneManager::_setPass etc). If the code inside those functions are using strings to do certain lookup for an example, then the effort is worth it - and only if the code cannot be changed to not using string lookup.
It's not exactly that easy a decision for library, now is it?! Of course, with expert users, they will look at the library source code and they will measure what takes how long, etc. But what about every other user? What about those expert users when they are not careful?

In general, a middleware (specially a game middleware) cannot know exactly what users will do in their inner loops. You and I may know not to animate the position of 1000 meshes in the frame listener callbacks, using strings to look them up, but not every user will know that, and not every user that knows will avoid doing it! Those cases only lead to bad reputation for OGRE.

Besides, if a particular optimization does not clutter up the interfaces and doesn't introduce any significant new "gotchas" and pitfalls, what's the harm in doing it?!

@stealth977:
That's quite a good way to use string IDs. Also, with certain C++11 features (including user-defined literals and constexprs) you can even remove the slight inconvenience of having to define your string IDs as constants like you have shown. AFAIK some of these features are implemented in latest MSVC compiler CTP and most do exist in GCC 4.7/4.8 (I don't know about Clang.)
Do not meddle in the affairs of Wizards, for they are subtle and quick to anger.
My useless rants
The company I work for (Dead Mage)
The first game we made (Garshasp: The Monster Slayer)
User avatar
dark_sylinc
OGRE Team Member
OGRE Team Member
Posts: 5296
Joined: Sat Jul 21, 2007 4:55 pm
Location: Buenos Aires, Argentina
x 1278
Contact:

Re: Optimizing Away the Use of Strings as Handles

Post by dark_sylinc »

In the wiki page I already set that for 2.0 uniqueness (i.e. SceneNode, Entity, etc) should be by ID (i.e. something as simple as an incrementing counter), not by name. Also names should be represented by rather than variable length strings (link can't be linkified in this forum: http://home.comcast.net/~tom_forsyth/blog.wiki.html#[[A%20sprintf%20that%20isn%27t%20as%20ugly]] ). Ogre abuses the use of std::string for trivial stuff a lot.

Here's a nice read about someone else's experience with strings, fixed length strings, and hashed strings.
noorus
Halfling
Posts: 75
Joined: Wed Apr 20, 2011 9:55 pm
Location: Helsinki, Finland
x 3

Re: Optimizing Away the Use of Strings as Handles

Post by noorus »

If the internals of Ogre were modified to not do object lookups by names, we could get rid of all the autogenerated names. Just have an internal ID, and only generate a name based on that if getName() is called. That way we'd have both the performance of numbers and the convenience of strings.
Creator of Nice Input Library, for your advanced input needs.
Image
Tiffer
Kobold
Posts: 36
Joined: Thu Oct 04, 2012 12:00 pm
x 6

Re: Optimizing Away the Use of Strings as Handles

Post by Tiffer »

Hi guys. I'm also tormented by Ogre's usage of strings. I have some free time, I could contribute if there's willingness. My plan would be to massively replace string with "unsigned int". Doesn't sound that complicated. I've done similar kinds of refactoring in other libraries before. For backwards compatibility, we could maintain overloads for object creation with the string, and those strings could be hashed to a unsigned int.

What I need to know is, when doing this kind of thing, what branch should I use? If I do something I want to use it right away, and I'm using 1.8.1. But I don't suppose you'd like to see such a radical change in 1.8, which is a problem, cause it kills the motivation to do it!

I guess I could migrate to 1.9? Would such a change be acceptable for 1.9 now that there's a RC1 underway?
Tiffer
Kobold
Posts: 36
Joined: Thu Oct 04, 2012 12:00 pm
x 6

Re: Optimizing Away the Use of Strings as Handles

Post by Tiffer »

Wow, there are 24 classes in Ogre with a string member "mName". I would have though they would all inherit from OgreNamed or something like that which would have made the conversion easier. Well that's a bummer.

Anyway, the big ones are OgreNode and OgreMovableObject, so I'm gonna look into those. Which one do you guys think causes a bigger overhead?

Here's my plan for now. Replacing string as a type for the identifier member with this:

Code: Select all

namespace Ogre {
  class Identifier {

   private:
    uint uid;
    String *name; //an optional convenience, it isn't used for comparison,
                          //it's important that it's a pointer so that there's no overhead
                          //if the user doesn't want to use it

   public:

    Identifier(); //autogenerates uid, name remains NULL
    Identifier(uint uid); //assigns uid, name remains NULL
    Identifier(const String& name); //autogenerates uid and assigns optional name
    Identifier(uint uid, const String& name);

    bool operator== (const Identifier& identifier) const
    {
       return identifier.uid == uid;
    }

    bool hasName();
    const String& getName();
    //void setName(const String& name); //not sure if you should be able to change it
  };
}
Ogre would prefer to autogenerate id's in the ranges:
0000000016-0000FFFF16
7FFF000016-7FFFFFFF16

Which are reserved memory locations in Windows (dunno about other OS), that way the user can use pointers as ids without clashes. This is just a small convenience that isn't fail proof because you could run out of those id's, if the user wants to use pointers with objects of a certain class he should just make sure to name every object of that class so that there are not clashes.
User avatar
Klaim
Old One
Posts: 2565
Joined: Sun Sep 11, 2005 1:04 am
Location: Paris, France
x 56
Contact:

Re: Optimizing Away the Use of Strings as Handles

Post by Klaim »

You will need hashing functions or add an operator to convert implicitely to the uint because names are mostly used in (hash)maps.
Or at least a < operator.

[mumbling to myself - side comment - not a suggestion at all]
I have a similar id type in my game. It is a bit specific, but basically it hold a pointer that might or not be pointing to the object, and a 128bit identifier (because I need uuid, the boost one -that is very simple). It's a bit different setup but it allows me to add a -> operator that allow, if the pointer is set, to use the id as a pointer to the object. If it's not a valid pointer (I have specific way to know if the pointed object is or not the one that the id value represent, so if the pointer is not null, it's valid), if it's not a valid pointer then the id use have to retrieve the object through a find function.
[/mumbling to myself - side comment - not a suggestion at all]

May I also minor suggestions:
- add a to_string() free function (like the new to_string() standard functions) that convert the id value to a String, either via it's member, or by searching in the name "database" or something;
- change the name to something shorter as it will be used in a lot of interfaces;


It's not clear to me how you intend to store the strings exactly? Have a Singleton id->name table?
Post Reply