Optimizing Away the Use of Strings as Handles

Discussion area about developing or extending OGRE, adding plugins for it or building applications on it. No newbie questions please, use the Help forum for that.
User avatar
masterfalcon
OGRE Retired Team Member
OGRE Retired Team Member
Posts: 4270
Joined: Sun Feb 25, 2007 4:56 am
Location: Bloomington, MN
x 126

Re: Optimizing Away the Use of Strings as Handles

Post by masterfalcon »

Alright. The 2.0 branch is up to date. Go for it.
Tiffer
Kobold
Posts: 36
Joined: Thu Oct 04, 2012 12:00 pm
x 6

Re: Optimizing Away the Use of Strings as Handles

Post by Tiffer »

I'm now using my own fork of Ogre with unsigned int's instead of strings. The performance gain is just too big not to, which is a PITA because I will miss on Ogre updates. What I do to adapt plugins that require string handle functionality is to keep a map of int to string equivalences.

The hash idea makes me uneasy, isn't there a potential for collisions? The idea that every once in a while the game may crash mysteriously for a user doesn't sit well with me. Or can you prove mathematically that there won't be any collisions inside a string length range? If so calculating that range and telling the library user would be extremely important.

Now that I think about it, it seems that strings would have to be too small to guarantee no collisions (4 chars?). I think we could get it to 16 chars though if we limit them to alphabet/digit range (A-Za-z0-9) and tailor made the hash algorithm (haven't checked it so dunno if thats possible).

My original idea for this would be to just bite the incompatibility bullet and move to a permanent solution based on uids. In my opinion we should decouple Handlers from Identifiers by introducing tags or labels, whatever you wanna call it.

Code: Select all

Node*node = parent->createChildNode() //object is assigned a unique unsigned integer identifier from a internal counter.
//node = createChildNode(32) //or you can assign one if you know what you are doing (not recommended)
uint objectID = node->getUID(); //here you get the autoassigned id in case you need it for later

node->addTag("object"); //you can also add as many tags as you need
node->addTag("destructible");

node->setName("chair132"); //names are like tag, not used as handlers and optional, but they are guaranteed to be unique

sceneManager->getByUID(objectID); //you now could retrieve it by uid
sceneManager->getByTag("object"); //or retrieve many of them with this
sceneManager->getByName("chair132"); //or only one by name

User avatar
vitefalcon
Orc
Posts: 438
Joined: Tue Sep 18, 2007 5:28 pm
Location: Seattle, USA
x 13

Re: Optimizing Away the Use of Strings as Handles

Post by vitefalcon »

Tiffer wrote:I'm now using my own fork of Ogre with unsigned int's instead of strings. The performance gain is just too big not to, which is a PITA because I will miss on Ogre updates. What I do to adapt plugins that require string handle functionality is to keep a map of int to string equivalences.

The hash idea makes me uneasy, isn't there a potential for collisions? The idea that every once in a while the game may crash mysteriously for a user doesn't sit well with me. Or can you prove mathematically that there won't be any collisions inside a string length range? If so calculating that range and telling the library user would be extremely important.

Now that I think about it, it seems that strings would have to be too small to guarantee no collisions (4 chars?). I think we could get it to 16 chars though if we limit them to alphabet/digit range (A-Za-z0-9) and tailor made the hash algorithm (haven't checked it so dunno if thats possible).
With a hash string, there are chances of collisions. I don't have strict numbers here to give you an estimate for the probability of collisions, but so far, with around 45k resource files that my game client uses, I don't get a clash yet. Maybe I'm just too lucky.
Tiffer wrote:My original idea for this would be to just bite the incompatibility bullet and move to a permanent solution based on uids. In my opinion we should decouple Handlers from Identifiers by introducing tags or labels, whatever you wanna call it.
I guess you mean 'handles from identifiers'. If so, I sort of agree to that, provided that it doesn't break the functionality (not compatibility). For example, people will still need the functionality of searching for resources by wild-characters and for this to happen, you'll need a list of strings for each resource. One way to manage this would be to have a ID-String repository you had originally mentioned, but then comes the issue of storing those ID-to-String storage in groups, because resources have a concept of resource-groups. So then you'll need to come up with a solution where you have mappings of ID-to-String, String-to-Group and then there would be other base functionality that would still be missing by that implementation. I haven't thought of one yet, but people with more experience can probably give examples.

The ultimate goals for this change would be:
  • Use integer identifiers for resources so that when it comes to picking an exact resource, it would be done in the most efficient manner.
  • Keep around the name (path) of the resource in such a way that it can be extracted as a list by group, or get the group from the resource name.
The hash string implementation works for me. Doesn't mean it's the ultimate solution. Probably we'd have to come up with a better one. I'd however implement it with HashString in my fork and let it for review by others. That way, you'd have something implemented to work on and compare to your implementation and take the best of both worlds or the one (or none) based on what everyone thinks. In the current situation, with just theories flying around, there are no hard measurable metrics that the devs can use to support either implementations.
Image
Tiffer
Kobold
Posts: 36
Joined: Thu Oct 04, 2012 12:00 pm
x 6

Re: Optimizing Away the Use of Strings as Handles

Post by Tiffer »

Well, I wouldn't use a library that rolls the dice like that, tbh. Programs should only crash for circumstances beyond your control such as hardware failure or resource starvation. I can see it as a nice facility for users, though, but the problem is that plugins will start using it and you'll be forced to use it too.

However, as long as it's optional for the end-user and plugin developers are strongly discouraged from using it, I'd be totally cool with this if it helps people transition out of strings in the core. Things done in compile time are always helpful.
User avatar
vitefalcon
Orc
Posts: 438
Joined: Tue Sep 18, 2007 5:28 pm
Location: Seattle, USA
x 13

Re: Optimizing Away the Use of Strings as Handles

Post by vitefalcon »

Having mulled over it over night, I believe Tiffer has a better solution and hash-string may not be what Ogre3D needs. Let me give the reasons behind it: Hash-strings are good to be used in cases where the asset names are known because the main advantage of hash-strings is that it creates resourced ID's at compile time. Therefore hash-strings will not give compile-time penalties for Ogre3D, but it will create compile-time penalties for the end-user. This might negatively impact Ogre's reputation. In that aspect it's better suited for an Asset Pipeline, but not for Ogre, which is a generic 3D graphics engine.

I am however thinking over using Tiffer's idea of ID+String mapping. I'll first create a brief design document on the changes before implementing anything so that everyone is clear of what is being done.

@Tiffer: can you do the same with what you have already implemented? I will be using the Wiki for documenting the design. We could just start one with '[Proposal] Project <Working Title>' in wiki, discuss about that in a separate thread and updating that wiki till we're satisfied with the design changes being proposed. After the decisions and changes are known making changes to Ogre would be clear. After which changes will be made to a forked repository. We could use mine.

For now, due to lack of sleep and no creative thinking, I'm going to start the wiki page with the title [Proposal] Project String Busters.
Last edited by vitefalcon on Wed Jan 30, 2013 11:30 pm, edited 1 time in total.
Image
Knotanalt
Halfling
Posts: 94
Joined: Sun Jul 01, 2012 2:58 pm
x 2

Re: Optimizing Away the Use of Strings as Handles

Post by Knotanalt »

There's no real problem that can't be easily solved, not like hashing has not been done before but the whole idea of a hash is a bit silly to be honest.

Simply cache the lookup value of the material, as I said before.

So you have a material name and a pointer.

When you look it up you do a check.

Resource* getResources()
{
If(ref==0)
{
ref = oldWayYouLookup();
}
return ref;
}

now use ref normally.

Hash is an extra chance for cache miss (maybe even more than one!), plus other extra potential bugs to deal with. A comparison to zero for all intents and purposes has no cost at all.

So for any node you only ever do the comparison once.
User avatar
vitefalcon
Orc
Posts: 438
Joined: Tue Sep 18, 2007 5:28 pm
Location: Seattle, USA
x 13

Re: Optimizing Away the Use of Strings as Handles

Post by vitefalcon »

Maybe you didn't read what I last posted. I was saying hashing is not appropriate for Ogre3D. Also your idea of caching doesn't make sense to me. Can you please explain what kind of scenario are you talking about? Currently, all loaded resources are placed into an std::map<Ogre::String, ResourcePtr>. How does caching help? What are you trying to solve with caching? How does getResources() work? What resource is it returning without an handle being specified?
Image
Knotanalt
Halfling
Posts: 94
Joined: Sun Jul 01, 2012 2:58 pm
x 2

Re: Optimizing Away the Use of Strings as Handles

Post by Knotanalt »

vitefalcon wrote:Maybe you didn't read what I last posted. I was saying hashing is not appropriate for Ogre3D. Also your idea of caching doesn't make sense to me. Can you please explain what kind of scenario are you talking about? Currently, all loaded resources are placed into an std::map<Ogre::String, ResourcePtr>. How does caching help? What are you trying to solve with caching? How does getResources() work? What resource is it returning without an handle being specified?
That's just pseudocode as I don't know where the exact lines you were trying to optimize are.

I know you said it's innapropriate now but the point I am making is you seem to be overcomplicating things a whole lot. If I understand where the original performance issue is there's no reason for hash string or for hashing strings or any of this, it's just way ovekill.

But you told me that it's resource lookup that the string optimization is saving "5%" of the framerate.

So I assume that nodes have a material name and it looks up from there using string comparison.

Looking in the string itself jumps memory location. Possible cache miss.

std::map<Ogre::String, ResourcePtr>

Plus several more here, and apparently a string comparison. Does it do string compare then binary search? That's a lot of hoops to jump through over and over and over.

All while you are in your "critical loop".

So if that's where the performance issue is coming in this shortcircuits it completely. If not then why don't you post the code bit that this fix optimizes and I can just post another simple, exact fix that's clearer.
Owen
Google Summer of Code Student
Google Summer of Code Student
Posts: 91
Joined: Mon May 01, 2006 11:36 am
x 21

Re: Optimizing Away the Use of Strings as Handles

Post by Owen »

My general suggestion is
  • For explicitly named things, have a string interning table so that only one copy of the name need be stored
  • For things with no assigned name, use the pointer as the "name"
If there is a desire to support translating identifiers back to their names (I'd certainly expect there is), then the alignemnt of the underlying type can be relied on for this

A simple option for the identifier class would be e.g.

Code: Select all

static_assert(alignof(String) > 1, "String must be a aligned to at least 2 byte boundaries")
class Identifier {
    uintptr_t mHandle;
public:
    Identifier(const String& str) : mHandle(reinterpret_cast<uintptr_t>(StringTable::intern(str)) {}

    // Explicit because it would steal string literals...
    template<typename T>
    explicit Identifier(const T* p) : mHandle(reinterpret_cast<uintptr_t>(p) | 1) {
        static_assert(alignof(T) > 1, "T must be aligned to at least 2 byte boundaries");
    }

    String toString() const
    {
        if(T & 1) {
            // Perhaps return "<Unnamed 0xADDRESS>" for convinience?
        } else {
            StringTable::lookup(reinterpret_cast<const String*>(mHandle));
        }
    }

    // Relational operators/etc as normal
};
Object creation functions would then probably want some "explicit" way of identifying the presence of a string or not - perhaps something along the lines of boost::optional<String>? Or perhaps the "null" (mHandle == 0) identifier could be declared invalid and setName/constructors, when receiving an Ogre::Identifier in that form, would automatically assign the ID based upon the object's address.

The one remaining question is names when serialized in files - There, again, perhaps store something special when an auto generated identifier is encountered? Perhaps the empty string, which would avoid changing the format?

The nice thing about this is that it mostly just slots right in with the work Tiffer has already done, without significantly altering the APIs.
User avatar
vitefalcon
Orc
Posts: 438
Joined: Tue Sep 18, 2007 5:28 pm
Location: Seattle, USA
x 13

Re: Optimizing Away the Use of Strings as Handles

Post by vitefalcon »

Knotanalt wrote: That's just pseudocode as I don't know where the exact lines you were trying to optimize are.

I know you said it's innapropriate now but the point I am making is you seem to be overcomplicating things a whole lot. If I understand where the original performance issue is there's no reason for hash string or for hashing strings or any of this, it's just way ovekill.
So you're trying to help without knowing the problem domain.
Knotanalt wrote:But you told me that it's resource lookup that the string optimization is saving "5%" of the framerate.
Sorry, whom are you referring to here? I didn't give any metric my friend. I don't even have a scenario of having to look up resource strings every frame to make that claim.
Knotanalt wrote:So I assume that nodes have a material name and it looks up from there using string comparison.
Please don't assume if you want to help. Please do your homework.
Knotanalt wrote:Looking in the string itself jumps memory location. Possible cache miss.

std::map<Ogre::String, ResourcePtr>

Plus several more here, and apparently a string comparison. Does it do string compare then binary search? That's a lot of hoops to jump through over and over and over.

All while you are in your "critical loop".

So if that's where the performance issue is coming in this shortcircuits it completely. If not then why don't you post the code bit that this fix optimizes and I can just post another simple, exact fix that's clearer.
I skipped this part because I get the feeling its not going to help, since you haven't really looked into the problem.

Just to give everyone a good heads up, resource managers already uses std::hash<Ogre::String, Ogre::ResourcePtr> instead of std::map<Ogre::String, Ogre::ResourcePtr>. So if anything, hashing hasn't caused anyone any issues so far. Are we all incredibly lucky? That being said, I still don't think hash strings will do any good to Ogre, but only to the end-user or to his/her asset pipeline. In short, I'm starting to wonder if any optimization is required at all.
Image
User avatar
vitefalcon
Orc
Posts: 438
Joined: Tue Sep 18, 2007 5:28 pm
Location: Seattle, USA
x 13

Re: Optimizing Away the Use of Strings as Handles

Post by vitefalcon »

Owen wrote:Object creation functions would then probably want some "explicit" way of identifying the presence of a string or not - perhaps something along the lines of boost::optional<String>? Or perhaps the "null" (mHandle == 0) identifier could be declared invalid and setName/constructors, when receiving an Ogre::Identifier in that form, would automatically assign the ID based upon the object's address.
So you're adding one more layer on top of the problem. The main thing that I was originally proposing to do was eliminate the need of strings in compiled code, in which case there are no string comparisons. Only number comparisons. What you're suggesting is making each string to have an idempotent existence like in C#. To check if a string exists in a map, you'll have to O(n log n) search, which was exactly what I was thinking of eliminating. Since you haven't looked at the code, but I did in detail yesterday, seems like all string-to-resource look ups (internally) has an O(1) complexity due to using std::hash as the data-structure to store the mapping, instead of an std::map and it has been like that always. There is the look up of the resource group which is in a std::map, but I doubt that would give anyone a significant performance gain for resource look up.

Adding an extra layer like you suggested, I believe, serves little purpose. Unless every program removes the requirement of having a string, there's nothing really to optimize.
Image
bstone
OGRE Expert User
OGRE Expert User
Posts: 1920
Joined: Sun Feb 19, 2012 9:24 pm
Location: Russia
x 201

Re: Optimizing Away the Use of Strings as Handles

Post by bstone »

vitefalcon wrote:resource managers already uses std::hash<Ogre::String, Ogre::ResourcePtr> instead of std::map<Ogre::String, Ogre::ResourcePtr>. So if anything, hashing hasn't caused anyone any issues so far. Are we all incredibly lucky? That being said, I still don't think hash strings will do any good to Ogre, but only to the end-user or to his/her asset pipeline. In short, I'm starting to wonder if any optimization is required at all.
Hash containers still store the real (unhashed) values and perform full comparison among all colliding hashes. That's why something like std::hash_map<> would never cause any troubles compared to discarding the real values and using only their hashes.
vitefalcon wrote:So you're adding one more layer on top of the problem.
I would disagree. That could be a good intermediate step before dropping the strings out completely and it would boost performance for applications that wouldn't use string identifiers right away as been demonstrated already by Tiffer earlier in this thread.
Owen
Google Summer of Code Student
Google Summer of Code Student
Posts: 91
Joined: Mon May 01, 2006 11:36 am
x 21

Re: Optimizing Away the Use of Strings as Handles

Post by Owen »

vitefalcon wrote:
Owen wrote:Object creation functions would then probably want some "explicit" way of identifying the presence of a string or not - perhaps something along the lines of boost::optional<String>? Or perhaps the "null" (mHandle == 0) identifier could be declared invalid and setName/constructors, when receiving an Ogre::Identifier in that form, would automatically assign the ID based upon the object's address.
So you're adding one more layer on top of the problem. The main thing that I was originally proposing to do was eliminate the need of strings in compiled code, in which case there are no string comparisons. Only number comparisons. What you're suggesting is making each string to have an idempotent existence like in C#. To check if a string exists in a map, you'll have to O(n log n) search, which was exactly what I was thinking of eliminating. Since you haven't looked at the code, but I did in detail yesterday, seems like all string-to-resource look ups (internally) has an O(1) complexity due to using std::hash as the data-structure to store the mapping, instead of an std::map and it has been like that always. There is the look up of the resource group which is in a std::map, but I doubt that would give anyone a significant performance gain for resource look up.

Adding an extra layer like you suggested, I believe, serves little purpose. Unless every program removes the requirement of having a string, there's nothing really to optimize.
The first gain is, on average, 2*sizeof(size_t) for every named object or container entry, plus a reduction in the amount of memory allocation/copying happening.

The way I see it, we can divide the cases in half:
  • Objects which absolutely need identifiers, such as resources.
  • Objects for which identifiers are "nice to have", such as scene nodes, bones, etc
For the first case, I think we are pretty much stuck with string identifiers, as people are unlikely to wish to refer to their, say, "sheet metal" resource as 0xfeedface throughout their material scripts, scenes, meshes, etc.

For the latter case, I'd argue that either the identifiers should be strings or not exist at all. Integer IDs are, fundamentally, not much more useful than a convention e.g. "the second bone is the neck".

Fundamentally, you're never going to get rid of all string comparisons unless you get rid of string based or derived identifiers entirely... at which point, you may as well just make the identifiers the application's job entirely and get rid of them from the Ogre classes.

(Doing the latter for scene nodes would appear to require dropping things like SceneManager::setWorldGeometry, however)
Knotanalt
Halfling
Posts: 94
Joined: Sun Jul 01, 2012 2:58 pm
x 2

Re: Optimizing Away the Use of Strings as Handles

Post by Knotanalt »

vitefalcon wrote:
Knotanalt wrote: That's just pseudocode as I don't know where the exact lines you were trying to optimize are.

I know you said it's innapropriate now but the point I am making is you seem to be overcomplicating things a whole lot. If I understand where the original performance issue is there's no reason for hash string or for hashing strings or any of this, it's just way ovekill.
So you're trying to help without knowing the problem domain.
Knotanalt wrote:But you told me that it's resource lookup that the string optimization is saving "5%" of the framerate.
Sorry, whom are you referring to here? I didn't give any metric my friend. I don't even have a scenario of having to look up resource strings every frame to make that claim.
Knotanalt wrote:So I assume that nodes have a material name and it looks up from there using string comparison.
Please don't assume if you want to help. Please do your homework.
Knotanalt wrote:Looking in the string itself jumps memory location. Possible cache miss.

std::map<Ogre::String, ResourcePtr>

Plus several more here, and apparently a string comparison. Does it do string compare then binary search? That's a lot of hoops to jump through over and over and over.

All while you are in your "critical loop".

So if that's where the performance issue is coming in this shortcircuits it completely. If not then why don't you post the code bit that this fix optimizes and I can just post another simple, exact fix that's clearer.
I skipped this part because I get the feeling its not going to help, since you haven't really looked into the problem.

Just to give everyone a good heads up, resource managers already uses std::hash<Ogre::String, Ogre::ResourcePtr> instead of std::map<Ogre::String, Ogre::ResourcePtr>. So if anything, hashing hasn't caused anyone any issues so far. Are we all incredibly lucky? That being said, I still don't think hash strings will do any good to Ogre, but only to the end-user or to his/her asset pipeline. In short, I'm starting to wonder if any optimization is required at all.
There's no reason to be such a jerk, seriously.

Why don't you just post the code location like I asked instead of being insulting.
So you're trying to help without knowing the problem domain.
Now you are making the big assumption here. Obviously you are not good with english because you have made some pretty dumb responses to what I have said.

vitefalcon wrote:It is not for scenegraph traversing. Every resource is currently identified by a string. If you have N resources you would have to do N string comparisons where each string comparison takes another M comparisons, where M is the length of the string. Dont forget the number of bytes it has to fetch from memory for each string.

With a HashString, to the programmer that looks and feels like a string but is represented as a number instead. So the number of comparisons is stil N, however we wil get a huge performance gain with the comparison.
Now did you post this or not when I asked where the 5% performance increase of using hashstring comes from??!?!

If so then you owe me an apology. If it's not what the performance gain is from from WTF did you post that in response to my question? Try to act like a regular person from now on.

I was nice before and qualified by saying things like "if I misunderstood" out of politeness but obviously you misunderstood, that was clear then. You shouldn't be a tenth so arrogant with people when you barely speak the language, let alone jump on someone for trying to be polite by assuming they are admitting to idiocy or something.
Last edited by Knotanalt on Tue Jan 29, 2013 12:04 am, edited 2 times in total.
Knotanalt
Halfling
Posts: 94
Joined: Sun Jul 01, 2012 2:58 pm
x 2

Re: Optimizing Away the Use of Strings as Handles

Post by Knotanalt »

And if that's not where the performance savings comes in can someone ELSE point it out to me? I don't have Ogre set up for profiling and don't have lots of materials set up to text performance on a "real" project yet anyway or I would, but I am interested to see what's going on.
User avatar
vitefalcon
Orc
Posts: 438
Joined: Tue Sep 18, 2007 5:28 pm
Location: Seattle, USA
x 13

Re: Optimizing Away the Use of Strings as Handles

Post by vitefalcon »

Knotanalt wrote:There's no reason to be such a jerk, seriously.

Why don't you just post the code location like I asked instead of being insulting.
So who's being the jerk here? I seriously wasn't trying to jerk you off (no pun intended). You were making wild claims, from my point, without understanding the problem. Otherwise, why're you asking show me what to optimize?
Knotanalt wrote:Now you are making the big assumption here. Obviously you are not good with english because you have made some pretty dumb responses to what I have said.
Does my English not being good mean I'm wrong? If so we can do nit-picking and flame-wars just with your 'english' being spelt with the first alphabet being a small-case, when a noun should spelt with a capitalized letter. English is my third language. Considering that I'd say I'm better than most who have it as their second language.

Knotanalt wrote:
vitefalcon wrote:It is not for scenegraph traversing. Every resource is currently identified by a string. If you have N resources you would have to do N string comparisons where each string comparison takes another M comparisons, where M is the length of the string. Dont forget the number of bytes it has to fetch from memory for each string.

With a HashString, to the programmer that looks and feels like a string but is represented as a number instead. So the number of comparisons is stil N, however we wil get a huge performance gain with the comparison.
Now did you post this or not when I asked where the 5% performance increase of using hashstring comes from??!?!
Yes, I did. But I never gave the 5% gain in framerate claim you originally made. Didn't you? Now let's look at what I was talking there. Considering:
  • we're going to search for a material name in MaterialManager
  • given
    • we have 69 materials in Examples.material file alone
    • each material names have atleast 9 characters (since all start with "Examples/")
The maximum number of comparisons is going to be ln(69)*9 = 58.xx comparisons.
Now with hash-strings, since the strings itself gets compiled into a number by means of TMP, the maximum number of comparisons becomes ln(69). Now that's a 89% reduction in search, right? ((1-1/9) * 100). Is that a huge-performance gain or what? Now I'm being modest with the numbers here. You, as well as all, know that the examples material names are much longer than 9 characters. So the performance gain should be more than 90%.

NOTE: By performance gain, I mean, the number of comparisons avoided.

P.S.: I did mention later that this would be pointless in Ogre3D because Ogre3D itself is doesn't know what materials the user uses and hence cannot compile those values into numbers and hence hash-strings are only useful to be used for end-users of Ogre3D by creating them on their own.
Knotanalt wrote:If so then you owe me an apology. If it's not what the performance gain is from from WTF did you post that in response to my question? Try to act like a regular person from now on.
I owe you no apology because you were the one making false numbers and claims against me. If anything, you owe me one. But it doesn't matter. Mate, you should try to act like a regular person and understand that English doesn't measure your knowledge. If you took the response so badly, I didn't mean it. All I was trying to point out was that you were trying to solve something whose problem you didn't know/understand.
Knotanalt wrote:I was nice before and qualified by saying things like "if I misunderstood" out of politeness but obviously you misunderstood, that was clear then. You shouldn't be a tenth so arrogant with people when you barely speak the language, let alone jump on someone for trying to be polite by assuming they are admitting to idiocy or something.
Did you? Did you really say "if I misunderstood" in that or the previous post, where you talked about caching? Stop making false claims again! Please go back and read your own posts. I re-read it and I cannot find such a sentence. You did say 'If I understand...' and that's what I said you didn't understand. All I can see is that you were offended because I didn't explain the whole thing to you and spoon-feed you the code location. You had the whole 3 pages of this post to understand that. Did you take the time to read it? There isn't one particular code location to solve. That's the whole point. It's a big change! If you want to help don't be ignorant saying I don't have the development environment setup so show me the code. Be pragmatic, realistic and just do it instead of being lazy.
Last edited by vitefalcon on Tue Jan 29, 2013 3:40 am, edited 1 time in total.
Image
User avatar
vitefalcon
Orc
Posts: 438
Joined: Tue Sep 18, 2007 5:28 pm
Location: Seattle, USA
x 13

Re: Optimizing Away the Use of Strings as Handles

Post by vitefalcon »

bstone wrote:Hash containers still store the real (unhashed) values and perform full comparison among all colliding hashes. That's why something like std::hash_map<> would never cause any troubles compared to discarding the real values and using only their hashes.
Agreed. I was under the assumption it acted more like a hash-table. Sorry, I didn't do my due diligence before your post on std::unordered_map.
bstone wrote:
vitefalcon wrote:So you're adding one more layer on top of the problem.
I would disagree. That could be a good intermediate step before dropping the strings out completely and it would boost performance for applications that wouldn't use string identifiers right away as been demonstrated already by Tiffer earlier in this thread.
I agree with Tiffer's suggestion. The only thing I said to that was I wasn't sure what sort of current use-cases it would break or not work with. What I wasn't sure of was about the Identifier class that serves to make string have idempotent instances. Wouldn't you still implement the mapping from string to Identifier using std::map or std::unordered_map? If so, you're still doing a O(n log n) search using std::map and O(n)'ish from std::unordered_map for when the key is a string. And since currently all of those map data-structures are std::unordered_map<String, ResourcePtr>, you're just adding one more layer like this: std::unordered_map<String, Identifier> [O(n)'ish] -> std::unordered_map<Identifier, ResourcePtr> [O(1)'ish], instead of std::unordered_map<Identifier, ResourcePtr> [O(n)'ish]. Let's say the user has to store the Identifier instance instead of name, I know that ResourcePtr is a bulkier data-structure than Identifier, but why not keep the copy of a ResourcePtr itself? Maybe something didn't click with me about that data-structure. I'll revisit it and try to learn when I get home.

EDIT: (I haven't reached home) I think I get it. Correct me if I'm wrong. So you'd implement your resource loading like this?

Code: Select all

Identifier myMaterial("Examples/MyMaterial");
myEntity->setMaterialId(myMaterial); // I know there isn't one like this already. It's a possible function call, right?
So basically, setMaterialId() -> foreach sub-entity -> subEntity->setMaterialId() -> materialManager->getById() -> O(1) complexity to get the material.

EDIT2: No I don't, creating the Identifier still would have O(n) complexity :|
Image
User avatar
masterfalcon
OGRE Retired Team Member
OGRE Retired Team Member
Posts: 4270
Joined: Sun Feb 25, 2007 4:56 am
Location: Bloomington, MN
x 126

Re: Optimizing Away the Use of Strings as Handles

Post by masterfalcon »

I have an idea, let me know what you think. Maybe I'm oversimplifying things.

We don't need any kind of map or hashing. Let's go back to the core of the issue, strings are used internally for identifying various resources. The allocation and comparison overhead is too much and using strings is unnecessary anyway. So we need to use some sort of integer identifier. Let me suggest a size_t(I'll get to why in a moment), that ought to be large enough on most platforms for now.

Let's give the ResourceManager class, and therefore its subclasses, a StringVector of names for resources of the type that it manages. Resource types that contain other lists can do the same. For example, Technique has a list of Pass names, etc. But what about the unique ID? We already have an implied ID as the index in the StringVector and it's scoped to that particular resource type. Resources would know their ID but not their name but that could be queried through the manager as it is now.

To assign ID's we can find the first NULL object in the vector(not empty string) the index of which becomes the ID for that new resource. That way things can be deleted and their spaces(and ID's) reused without resizing the vector and keep it as packed as possible.

Maybe I haven't thought it through enough, maybe this won't work at all. Let me know.
bstone
OGRE Expert User
OGRE Expert User
Posts: 1920
Joined: Sun Feb 19, 2012 9:24 pm
Location: Russia
x 201

Re: Optimizing Away the Use of Strings as Handles

Post by bstone »

vitefalcon wrote:So who's being the jerk here? I seriously wasn't trying to jerk you off (no pun intended). You were making wild claims, from my point, without understanding the problem. Otherwise, why're you asking show me what to optimize?
vitefalcon, don't pay too much attention to that guy. I've seen the exact same pattern in another thread - posting stuff without any thought process behind it, then getting angry when pointed to that fact, then claiming how much more he knows, then never proving that.
vitefalcon wrote:I agree with Tiffer's suggestion. The only thing I said to that was I wasn't sure what sort of current use-cases it would break or not work with. What I wasn't sure of was about the Identifier class that serves to make string have idempotent instances. Wouldn't you still implement the mapping from string to Identifier using std::map or std::unordered_map?
No, you wouldn't have to map the strings to identifiers with the proper implementation. The code that Owen posted gives one idea (though I wouldn't use alignment to store the identifier type flag - that saves some memory but it isn't critical and complicates things, a regular member variable could be used instead). His identifier can wrap a pointer to string representation, or a pointer to the resource itself. It will fail if constructed from a temporary string with a heap allocated rep therefore we shouldn't really store a pointer to the internal presentation of the string but rather a copy of the string itself. All resource managers would now use HashMap< Identifier, ResourcePtr > instead of HashMap< String, ResourcePtr >. The Identifier class would provide a hash function to deal with both types (e.g. by using one bit in the bucket selector range to distinguish between strings and pointers to reduce collisions), implementing operator<() would also be easy so you could use Identifier in an ordered map or set if needed. That will basically make the switch mostly a search & replace campaign.

A slightly more advanced idea would be updating related Ogre's internals to only look inside the map if the passed Identifier wraps a string. If it wraps a pointer then extract it and use right away. There are some type safety concerns with this method though.

This might not be exactly what you're thinking about so I will repeat this: the idea is that you mainly preserve the string based API but let the user use pointer based identifiers for the sake of a slight performance boost.
Last edited by bstone on Tue Jan 29, 2013 6:56 pm, edited 1 time in total.
bstone
OGRE Expert User
OGRE Expert User
Posts: 1920
Joined: Sun Feb 19, 2012 9:24 pm
Location: Russia
x 201

Re: Optimizing Away the Use of Strings as Handles

Post by bstone »

masterfalcon wrote:Let's give the ResourceManager class, and therefore its subclasses, a StringVector of names for resources of the type that it manages. Resource types that contain other lists can do the same. For example, Technique has a list of Pass names, etc. But what about the unique ID? We already have an implied ID as the index in the StringVector and it's scoped to that particular resource type. Resources would know their ID but not their name but that could be queried through the manager as it is now.
That's certainly a step in the right direction but it kills the string based API. If we're fine with that (I'm absolutely for it) then we don't need any integral identifiers as well. All that is needed is to rework Ogre to use pointers everywhere internally without having to look them up in maps first. Remove strings from the API - leave only methods accepting pointers to the objects in question. That's it - no more string comparisons inside Ogre. The entities and resources would store their names so you could look them up while debugging or log them. If the user needs a string based access he can build and maintain a map itself and look it up in his code before passing the respective pointer down to the API.
User avatar
Zonder
Ogre Magi
Posts: 1173
Joined: Mon Aug 04, 2008 7:51 pm
Location: Manchester - England
x 76

Re: Optimizing Away the Use of Strings as Handles

Post by Zonder »

bstone wrote:
masterfalcon wrote:Let's give the ResourceManager class, and therefore its subclasses, a StringVector of names for resources of the type that it manages. Resource types that contain other lists can do the same. For example, Technique has a list of Pass names, etc. But what about the unique ID? We already have an implied ID as the index in the StringVector and it's scoped to that particular resource type. Resources would know their ID but not their name but that could be queried through the manager as it is now.
That's certainly a step in the right direction but it kills the string based API. If we're fine with that (I'm absolutely for it) then we don't need any integral identifiers as well. All that is needed is to rework Ogre to use pointers everywhere internally without having to look them up in maps first. Remove strings from the API - leave only methods accepting pointers to the objects in question. That's it - no more string comparisons inside Ogre. The entities and resources would store their names so you could look them up while debugging or log them. If the user needs a string based access he can build and maintain a map itself and look it up in his code before passing the respective pointer down to the API.
This is exactly how I see it as well.

Only thing I can think of is objects been deleted and places holding a pointer.
There are 10 types of people in the world: Those who understand binary, and those who don't...
User avatar
masterfalcon
OGRE Retired Team Member
OGRE Retired Team Member
Posts: 4270
Joined: Sun Feb 25, 2007 4:56 am
Location: Bloomington, MN
x 126

Re: Optimizing Away the Use of Strings as Handles

Post by masterfalcon »

bstone wrote:
masterfalcon wrote:Let's give the ResourceManager class, and therefore its subclasses, a StringVector of names for resources of the type that it manages. Resource types that contain other lists can do the same. For example, Technique has a list of Pass names, etc. But what about the unique ID? We already have an implied ID as the index in the StringVector and it's scoped to that particular resource type. Resources would know their ID but not their name but that could be queried through the manager as it is now.
That's certainly a step in the right direction but it kills the string based API. If we're fine with that (I'm absolutely for it) then we don't need any integral identifiers as well. All that is needed is to rework Ogre to use pointers everywhere internally without having to look them up in maps first. Remove strings from the API - leave only methods accepting pointers to the objects in question. That's it - no more string comparisons inside Ogre. The entities and resources would store their names so you could look them up while debugging or log them. If the user needs a string based access he can build and maintain a map itself and look it up in his code before passing the respective pointer down to the API.
In what I'm envisioning, the string based API would still be intact for the end user, that's what the StringVectors are for. But we'll just never(hopefully) use it internally. That way we can have a decent compromise between increased performance and ease of use. Perhaps some introspection would be nice for debugging though.
Zonder wrote:Only thing I can think of is objects been deleted and places holding a pointer.
When a resource is deleted we'll just need to make sure the spot in the vector gets cleared out as well.
bstone
OGRE Expert User
OGRE Expert User
Posts: 1920
Joined: Sun Feb 19, 2012 9:24 pm
Location: Russia
x 201

Re: Optimizing Away the Use of Strings as Handles

Post by bstone »

Zonder wrote:Only thing I can think of is objects been deleted and places holding a pointer.
Ogre's use of smart pointers with reference counting mitigates that to a good degree. On the other hand, if that becomes a huge problem for the user then it might be the right time to switch to Unity and code with C# or Javascript. I believe that relying on Ogre to throw an exception if you pass it a name of an object that doesn't exist anymore is not a good idea in general. You would have to avoid that anyway so it's not that much of a difference.
User avatar
Zonder
Ogre Magi
Posts: 1173
Joined: Mon Aug 04, 2008 7:51 pm
Location: Manchester - England
x 76

Re: Optimizing Away the Use of Strings as Handles

Post by Zonder »

bstone wrote:
Zonder wrote:Only thing I can think of is objects been deleted and places holding a pointer.
Ogre's use of smart pointers with reference counting mitigates that to a good degree.
Yeah exactly it shouldn't be too bad.
There are 10 types of people in the world: Those who understand binary, and those who don't...
User avatar
vitefalcon
Orc
Posts: 438
Joined: Tue Sep 18, 2007 5:28 pm
Location: Seattle, USA
x 13

Re: Optimizing Away the Use of Strings as Handles

Post by vitefalcon »

masterfalcon wrote:Let's give the ResourceManager class, and therefore its subclasses, a StringVector of names for resources of the type that it manages. Resource types that contain other lists can do the same. For example, Technique has a list of Pass names, etc. But what about the unique ID? We already have an implied ID as the index in the StringVector and it's scoped to that particular resource type. Resources would know their ID but not their name but that could be queried through the manager as it is now.
I guess the resources, as it is now, already know their ID. Checking out the Resource.h, you'll see a private member called mHandle, which is of type ResourceHandle, which is a typedef of unsigned long long as per the documentation. I haven't dug deep enough to state this with absolute certainty, but it seems that Ogre currently assigns a Resource ID, but it's not being used efficiently.
Image