Performance issues with component system and Node [Help!]

Discussion area about developing with Ogre2 branches (2.1, 2.2 and beyond)
al2950
OGRE Expert User
OGRE Expert User
Posts: 1202
Joined: Thu Dec 11, 2008 7:56 pm
Location: Bristol, UK
x 76

Performance issues with component system and Node [Help!]

Post by al2950 » Thu Jul 06, 2017 11:00 am

Hi

I have been trying to optimize my component system and create a sensible memory model that works well for my general case. It turns out the greatest bottleneck is updating Ogre::SceneNode transform data every frame. Currently I call setPosition and setOrientation for every SceneNode every frame, and it has a fairly dramatic cost. The comments on those functions say not to call them much because of the AoS -> SoA conversion, so what do other people do about synchronising their 'game engine' with Ogre? Maybe only update nodes that have changed? Or just live with it? Or something else clever I have not thought about!

Any comments or experience on this would be appreciated!
0 x

hyyou
Gremlin
Posts: 166
Joined: Wed Feb 03, 2016 2:24 am
x 6
Contact:

Re: Performance issues with component system and Node [Help!

Post by hyyou » Thu Jul 06, 2017 12:59 pm

Yes, I encountered the same issue.
I faintly remember that the cost was significant. (It was several months ago, so I am not sure.)

It may be because :-
1. Ogre has to recheck whether an object is in a viewport (something like an AABB test, maybe)
2. Ogre has to re-sort by depth.

Did you profile it? How many SceneNodes do you have?
0 x

al2950
OGRE Expert User
OGRE Expert User
Posts: 1202
Joined: Thu Dec 11, 2008 7:56 pm
Location: Bristol, UK
x 76

Re: Performance issues with component system and Node [Help!

Post by al2950 » Thu Jul 06, 2017 1:32 pm

Hi!

Thanks for the reply, I am testing with 100,000 scene nodes, but our scenes are more like 10,000. However I wanted to make the hotspots very obvious.

The hotspot code is not doing any frustum culling or anything complicated, it is simply:

Code: Select all

	proxySceneNode->setPosition(component->Position);
	proxySceneNode->setOrientation(component->Orientation);
And that code can take as long as the complicated functions, like updateSceneGraph! Admittedly the components do not use the same memory model as the scene nodes, so when iterating over them I lose cache coherency on one or the other, depending on which I use to drive the iteration. So I am just wondering what others do when synchronising their engine objects with Ogre.
0 x

hyyou
Gremlin
Posts: 166
Joined: Wed Feb 03, 2016 2:24 am
x 6
Contact:

Re: Performance issues with component system and Node [Help!

Post by hyyou » Thu Jul 06, 2017 1:51 pm

Just some random thought ....

1. An object pool for your "component" may help. It boosted the whole game's performance by 10% for me.

2. As a result of 1, I decided to adopt a custom allocator to allocate 99% of objects.
I have to refactor everything extensively from the bottom up, but I believe it is the way to go if performance is a major concern. (It is not finished yet, so I can't verify whether it is worth it.)

3. "Ogre::SceneNode::setPosition()" might (?) do frustum culling internally, I am not sure.

4. If there are a lot of objects that can be particles (with no game logic/collision), you may want to do it on the GPU. I have tested it in my ancient OpenGL program (not Ogre), and it is really fast. Something like this :-

Code: Select all

texture_x+=texture_v;
In an early version of my game library, with more than 4K objects I would start to get less than 60 fps. (PC)
Thus, IMHO, 10K entities is quite a lot.
0 x

dark_sylinc
OGRE Team Member
OGRE Team Member
Posts: 4116
Joined: Sat Jul 21, 2007 4:55 pm
Location: Buenos Aires, Argentina
x 243
Contact:

Re: Performance issues with component system and Node [Help!

Post by dark_sylinc » Thu Jul 06, 2017 2:52 pm

Things you can do to improve performance in this situation:
  • Group your own systems by static and dynamic. It's rare (but not impossible) to have 10k moving objects. If lots of them are buildings or other sort of static objects, they need to be set differently (i.e. not every frame)
  • Try sorting your iteration of SceneNodes by pointer to node->_getTransform().mPosition (and .mIndex since there are up to 4 objects using the same mPosition pointer in SIMD builds) in hopes of improving cache coherency (don't sort every frame!!! just... keep it sorted)
  • If the above doesn't work, try sorting by the pointers you read (if they're somehow cache coherent)
  • Update in parallel. setPosition, setOrientation & setScale are threadable. One quick way is to use UserScalableTask for this. Often the problem is that a single CPU isn't fast enough to burn the full bandwidth its RAM sticks provide, but several CPUs can.
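A rough sketch of how the parallel bullet could be prepared, assuming the update list is already sorted by transform address and starts on a block boundary. `makeSlices` is a hypothetical helper, not an Ogre function. It only computes index ranges; handing each range to UserScalableTask or any other task system is left out because that interface is not shown in this thread.

```cpp
#include <algorithm>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical helper (not part of Ogre).
// Splits [0, count) into at most numWorkers contiguous [begin, end) ranges.
// Every boundary is a multiple of blockSize, so nodes that share one SoA block
// (up to 4 per mPosition pointer in SIMD builds) always go to the same worker.
inline std::vector<std::pair<std::size_t, std::size_t>>
makeSlices( std::size_t count, std::size_t numWorkers, std::size_t blockSize = 4 )
{
    std::vector<std::pair<std::size_t, std::size_t>> slices;
    if( count == 0 || numWorkers == 0 || blockSize == 0 )
        return slices;

    // Round up twice: nodes -> blocks, then blocks -> blocks per worker.
    const std::size_t numBlocks = ( count + blockSize - 1 ) / blockSize;
    const std::size_t blocksPerWorker = ( numBlocks + numWorkers - 1 ) / numWorkers;
    const std::size_t step = blocksPerWorker * blockSize;

    for( std::size_t begin = 0; begin < count; begin += step )
        slices.emplace_back( begin, std::min( count, begin + step ) );
    return slices;
}
```

Each worker would then run the usual setPosition/setOrientation loop over its own range. Keeping the ranges contiguous and block-aligned is meant to stop two threads from writing into the same cache line.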
0 x

al2950
OGRE Expert User
OGRE Expert User
Posts: 1202
Joined: Thu Dec 11, 2008 7:56 pm
Location: Bristol, UK
x 76

Re: Performance issues with component system and Node [Help!

Post by al2950 » Thu Jul 06, 2017 4:04 pm

Thanks for the suggestions.
dark_sylinc wrote:Try sorting your iteration of SceneNodes by pointer to node->_getTransform().mPosition (and .mIndex since there are up to 4 objects using the same mPosition pointer in SIMD builds) in hopes of improving cache coherency (don't sort every frame!!! just... keep it sorted)
Yeah I thought about this. At the moment I iterate through my components in a cache coherent way, so if I sorted by SceneNode then my Components would no longer be cache coherent!

The other issue, which I may start another thread about, is what people do about world position in a component system. Having world position can complicate memory layouts drastically, so I was going to have a separate World Transform component which users only add if they need it, the idea being that, unlike a rendering system, you don't need to know the world transform of every node in your scene... Again any thoughts/experience on this would be helpful!
0 x

Hotshot5000
OGRE Contributor
OGRE Contributor
Posts: 155
Joined: Thu Oct 14, 2010 12:30 pm
x 2

Re: Performance issues with component system and Node [Help!

Post by Hotshot5000 » Sun Jul 09, 2017 2:31 pm

From looking at the code it says to not call setPosition() or setOrientation() too often, but for translate() or rotate() it doesn't say the same thing. I checked the implementation and it doesn't look like translation and rotation should be faster than setPosition/setOrientation. Am I correct in this or am I missing something? Is translate() somehow faster than setPosition()?
0 x

dark_sylinc
OGRE Team Member
OGRE Team Member
Posts: 4116
Joined: Sat Jul 21, 2007 4:55 pm
Location: Buenos Aires, Argentina
x 243
Contact:

Re: Performance issues with component system and Node [Help!

Post by dark_sylinc » Sun Jul 09, 2017 3:24 pm

Hi,

No. Translate is more expensive than setPosition.

Whereas all that setPosition does is:

Code: Select all

mTransform.mPosition->setFromVector3( position, mTransform.mIndex );
Translate does that and more:

Code: Select all

mTransform.mPosition->getAsVector3( position, mTransform.mIndex );
switch(relativeTo) { /* ... translation math ... */}
mTransform.mPosition->setFromVector3( position, mTransform.mIndex );
As for the comment in setPosition that says:
Don't call this function too often, as we need to convert to SoA
This is aimed at not calling setPosition more than once per frame. Years ago people would want to build a physics engine on top of Ogre, i.e. Ogre::Node containing the transformations of the object, and therefore call node->getPosition() and node->setPosition() too often. Or if not a physics engine, then they would still use it for their logic code (gameplay, AI, etc).
This would result in calling setPosition() multiple times per object per frame, which would be very expensive.

What al2950 is doing is intended usage.

I had more time to think about this issue and I believe keeping your updates sorted by node->_getTransform().mPosition + mIndex should improve performance (maybe even dramatically), because every time setPosition is called, even if it's just for writing, the CPU will load the entire cache line, write the value and eventually flush the line to RAM. Since each 64-byte cache line holds on average 5.33 node positions (64 / 12 bytes), updating at random would heavily trash the cache (loading lots of lines into the cache, evicting them only to load them again a little while afterwards, etc).
This is also even more important if updating using multiple threads (as I suggested), as updating at random would cause false cache sharing.
al2950 wrote:Yeah I thought about this. At the moment I iterate through my components in a cache coherent way, so if I sorted by SceneNode then my Components would no longer be cache coherent!
Ogre is much more sensitive to cache coherency because of its SoA arrangement, so you'll likely win more by sorting by Ogre than sorting by Component (unless your Component also uses a SoA arrangement, in that case you're screwed).
The ideal obviously would be to make both your Component and Ogre match in cache coherency, which should be possible but that really depends on the other component.
0 x

N0vember
Gremlin
Posts: 196
Joined: Tue Jan 27, 2009 12:27 am

Re: Performance issues with component system and Node [Help!

Post by N0vember » Mon Jul 10, 2017 10:51 pm

This whole thread makes me wonder once again about the "weird" spot in which Ogre is situated. Halfway between a rendering engine and a game engine.
So you really have duplicated transforms for all your entities, and updates every frame to sync the ogre nodes with your engine objects, plus all the overhead that comes with these updates.
Seems like we all have the same pattern. I didn't have performance concerns yet because I didn't go overboard yet with the amount of moving entities.

(Thinking out loud here) What you would like ideally is to not even have the scene hierarchy in Ogre, but in your engine.
In a submission based engine like bgfx you would just compute these transforms in whatever way works best for you and send them to Ogre every frame.

What more is Ogre doing for free for us in this case, apart from updating the transforms of the scene graph ? Frustum culling ?
I think summarizing this and making a list of what comes for free with the Ogre scene graph would help people decide which way to go and how to optimize their use case.
0 x

dark_sylinc
OGRE Team Member
OGRE Team Member
Posts: 4116
Joined: Sat Jul 21, 2007 4:55 pm
Location: Buenos Aires, Argentina
x 243
Contact:

Re: Performance issues with component system and Node [Help!

Post by dark_sylinc » Mon Jul 10, 2017 11:24 pm

N0vember wrote:This whole thread makes me wonder once again about the "weird" spot in which Ogre is situated. Halfway between a rendering engine and a game engine.
So you really have duplicated transforms for all your entities, and updates every frame to sync the ogre nodes with your engine objects, plus all the overhead that comes with these updates.
Well, if I were making my own game engine from scratch, I would still follow this pattern. What's in logic != what you see.
Many games do it.
Duplicating transforms is required to make separation of graphics from logic/physics work, make interpolation work (thus dynamic framerate for graphics with fixed framerate for physics), make multithreading work, make quake-style multiplayer work (pos & rotation prediction).
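To illustrate the interpolation point, here is a minimal sketch of how a render-side position could be derived from two fixed-step logic positions. All names (`LogicTransform`, `lerp`, `renderPosition`, `accumulator`) are made up for this example and are not Ogre API; orientation would need a quaternion nlerp/slerp, which is omitted.

```cpp
#include <cassert>

struct Vec3 { float x, y, z; };

// Linear interpolation: alpha = 0 returns a, alpha = 1 returns b.
inline Vec3 lerp( const Vec3 &a, const Vec3 &b, float alpha )
{
    return Vec3{ a.x + ( b.x - a.x ) * alpha,
                 a.y + ( b.y - a.y ) * alpha,
                 a.z + ( b.z - a.z ) * alpha };
}

// Logic-side state: the two most recent fixed-step physics positions.
struct LogicTransform
{
    Vec3 previous;
    Vec3 current;
};

// 'accumulator' is the time left over after the last whole physics tick,
// so accumulator / fixedTimestep is how far we are into the next tick.
inline Vec3 renderPosition( const LogicTransform &t, float accumulator, float fixedTimestep )
{
    assert( fixedTimestep > 0.0f );
    const float alpha = accumulator / fixedTimestep;
    return lerp( t.previous, t.current, alpha );
}
```

The value returned by `renderPosition` is what would be passed to the scene node once per rendered frame, which is only possible because the logic transform and the graphics transform are stored separately.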

There are cases where you want an animation to affect the world (therefore it would probably be run by the physics engine, or gameplay-specific code); and cases where an animation is purely a cosmetic effect (in which case it would be run by Ogre).

What I would change is that Movables would hold a transform, instead of keeping it in SceneNodes, and the SceneNodes would be optional for when hierarchical animations are wanted for cosmetic effects (and these SceneNodes, after processing their animations, would update the Movable's transform).

Although that would be quite a groundbreaking change (which could improve performance in several places) it really wouldn't improve this particular scenario; as you would be copying to a flat array of MovableObjects instead of a flat array of SceneNodes.
The problem here is the cache friendliness of the operation.
What more is Ogre doing for free for us in this case, apart from updating the transforms of the scene graph ? Frustum culling ?
I think summarizing this and making a list of what comes for free with the Ogre scene graph would help people decide which way to go and how to optimize their use case.
Updating the hierarchy transforms for animation (which is cheap), frustum culling, and... preparing the transform to know how to actually orientate and place the triangles on screen.
It can also be used for SceneQueries, and in the future probably occlusion culling.
Nodes used for lighting are also used for preparing the lighting information (Forward and Forward+)

And last but not least, I would like to hear al2950's numbers after sorting the Nodes, that should dramatically increase the number of nodes he can update per frame (and like zxz mentioned, 10k dynamic objects updated every frame at 60hz is already hard to achieve for many other engines).
0 x

al2950
OGRE Expert User
OGRE Expert User
Posts: 1202
Joined: Thu Dec 11, 2008 7:56 pm
Location: Bristol, UK
x 76

Re: Performance issues with component system and Node [Help!

Post by al2950 » Wed Jul 12, 2017 10:58 pm

dark_sylinc wrote: Duplicating transforms is required to make separation of graphics from logic/physics work, make interpolation work (thus dynamic framerate for graphics with fixed framerate for physics), make multithreading work, make quake-style multiplayer work (pos & rotation prediction).
There are many reasons to keep your logic transforms separate from your graphics transforms, but this reason is often ignored but extremely important. Interpolating your graphics transforms can have a fairly substantial visual impact.
dark_sylinc wrote:And last but not least, I would like to hear al2950's numbers after sorting the Nodes, that should dramatically increase the number of nodes he can update per frame (and like zxz mentioned, 10k dynamic objects updated every frame at 60hz is already hard to achieve for many other engines).
:D Sadly this is somewhere between very difficult and impossible :( . To iterate over scene node objects in a cache friendly way would require using the node memory managers and the transform structs directly. However the issue is that there is no sensible way that I can see to link an 'al2950 scene component' to a node memory manager entry. I.e. in my engine I can easily create a lookup table between component ID and SceneNode ID, but there is no way to store a component ID against a node memory manager entry. I could store it in the userdata field in SceneNode but that's not going to help. Happy to try any suggestions!

As you said, updating 10,000 moving objects every frame is not only difficult but unrealistic for most scenes, so I will most likely optimise it by making what I can static nodes, and for the rest create a system where only components that have been moved will be synchronised. It will trash the cache, but will probably be faster in most use cases.
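A minimal sketch of the "only synchronise what moved" idea, assuming components are addressed by a dense integer ID. `TransformSync` and `FakeNode` are hypothetical names; `FakeNode` stands in for an Ogre::SceneNode so the example compiles on its own.

```cpp
#include <cstddef>
#include <vector>

struct Vec3 { float x, y, z; };

// Hypothetical stand-in for Ogre::SceneNode; counts writes for illustration.
struct FakeNode
{
    Vec3 position{ 0.0f, 0.0f, 0.0f };
    int writes = 0;
    void setPosition( const Vec3 &p ) { position = p; ++writes; }
};

class TransformSync
{
public:
    explicit TransformSync( std::size_t count ) :
        mPositions( count, Vec3{ 0.0f, 0.0f, 0.0f } ),
        mNodes( count ),
        mIsDirty( count, 0 )
    {
    }

    // Called by game logic. Records the ID the first time it moves this frame.
    void move( std::size_t id, const Vec3 &p )
    {
        mPositions[id] = p;
        if( !mIsDirty[id] )
        {
            mIsDirty[id] = 1;
            mDirtyIds.push_back( id );
        }
    }

    // Called once per frame. Writes only the moved nodes, returns how many.
    std::size_t flush()
    {
        const std::size_t written = mDirtyIds.size();
        for( std::size_t id : mDirtyIds )
        {
            mNodes[id].setPosition( mPositions[id] );
            mIsDirty[id] = 0;
        }
        mDirtyIds.clear();
        return written;
    }

    const FakeNode &node( std::size_t id ) const { return mNodes[id]; }

private:
    std::vector<Vec3> mPositions;
    std::vector<FakeNode> mNodes;
    std::vector<char> mIsDirty;          // one flag per component
    std::vector<std::size_t> mDirtyIds;  // IDs moved since the last flush
};
```

The dirty list is visited in the order things moved, so the writes are scattered; sorting `mDirtyIds` before flushing is one possible refinement if the IDs follow the node memory order.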
0 x

dark_sylinc
OGRE Team Member
OGRE Team Member
Posts: 4116
Joined: Sat Jul 21, 2007 4:55 pm
Location: Buenos Aires, Argentina
x 243
Contact:

Re: Performance issues with component system and Node [Help!

Post by dark_sylinc » Wed Jul 12, 2017 11:13 pm

I think you're thinking this too hard.

Assuming your code is something like this:

Code: Select all

struct MyGrandEntity
{
    PhysicsEntity *physicsEntity;
    SceneNode *sceneNode;
    MovableObject *movableObject;
    /* ... */
};

// typedef, so that MyGrandEntityVec names the container type.
typedef std::vector<MyGrandEntity> MyGrandEntityVec;
MyGrandEntityVec entities;

// Once per frame: copy each physics transform into its scene node.
for( MyGrandEntity &entity : entities )
{
    entity.sceneNode->setPosition( entity.physicsEntity->getPosition() );
    entity.sceneNode->setOrientation( entity.physicsEntity->getOrientation() );
}
Then you can sort your entities very easily like this:

Code: Select all

// Orders entities by the address of their transform slot (block pointer plus
// index inside the block), so the update loop walks Ogre's memory in order.
bool OrderMyGrandEntityByCache( const MyGrandEntity &_l, const MyGrandEntity &_r )
{
	const Transform &transfA = _l.sceneNode->_getTransform();
	const Transform &transfB = _r.sceneNode->_getTransform();
	uint8 *ptrA = (uint8*)(transfA.mPosition) + transfA.mIndex;
	uint8 *ptrB = (uint8*)(transfB.mPosition) + transfB.mIndex;
	
	return ptrA < ptrB;
}

std::sort( entities.begin(), entities.end(), OrderMyGrandEntityByCache );
Just don't do this every frame, but rather every now and then after creating/deleting N objects (especially when deleting).

For quickly testing the difference, just create 10.000 objects, benchmark it; then sort those 10.000 entries with this snippet, and benchmark again. Then see what happens at 100k
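A small timing harness along those lines, as a sketch only. It writes into a plain array of positions rather than real scene nodes, so it shows the shape of the measurement, not Ogre's actual numbers. `makeOrder` and `timeUpdateMs` are hypothetical helper names.

```cpp
#include <algorithm>
#include <chrono>
#include <cstddef>
#include <numeric>
#include <random>
#include <vector>

struct Vec3 { float x, y, z; };

// Returns the indices 0..count-1, ascending or shuffled with a fixed seed.
inline std::vector<std::size_t> makeOrder( std::size_t count, bool shuffled )
{
    std::vector<std::size_t> order( count );
    std::iota( order.begin(), order.end(), std::size_t( 0 ) );
    if( shuffled )
    {
        std::mt19937 rng( 12345u );
        std::shuffle( order.begin(), order.end(), rng );
    }
    return order;
}

// Writes one position per entry of 'order' and returns elapsed milliseconds.
// In a real test the loop body would be the setPosition/setOrientation calls.
inline double timeUpdateMs( const std::vector<std::size_t> &order,
                            std::vector<Vec3> &positions )
{
    const auto start = std::chrono::steady_clock::now();
    for( std::size_t idx : order )
        positions[idx] = Vec3{ 1.0f, 2.0f, 3.0f };
    const auto end = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::milli>( end - start ).count();
}
```

Running `timeUpdateMs` with `makeOrder( n, true )` and then `makeOrder( n, false )` for n = 10,000 and n = 100,000, and averaging over many frames, gives the before/after comparison described above.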
0 x

hyyou
Gremlin
Posts: 166
Joined: Wed Feb 03, 2016 2:24 am
x 6
Contact:

Re: Performance issues with component system and Node [Help!

Post by hyyou » Sat Jul 15, 2017 3:16 am

I guess I understand dark_sylinc's post correctly. If not, I will delete text in this post later.(sorry)

I believe al2950's is currently like :-

Code: Select all

foreach( entity in entities ..... request it to be sorted by physicsComponent address )
{
    entity->sceneNode->setPosition( entity->physicsComponent ->getPosition() );
    entity->sceneNode->setOrientation( entity->physicsComponent ->getOrientation() );
}
It is not Ogre cache-friendly, but al2950's engine cache-friendly.

If it would be changed into :-

Code: Select all

foreach( entity in entities ..... sorted by ogre's transformation address )
{
    entity->sceneNode->setPosition( entity->physicsComponent ->getPosition() );
    entity->sceneNode->setOrientation( entity->physicsComponent ->getOrientation() );
}
It will be Ogre cache-friendly but not al2950's engine cache-friendly.

The performance might be better, might be worse, might be indifferent.
0 x

al2950
OGRE Expert User
OGRE Expert User
Posts: 1202
Joined: Thu Dec 11, 2008 7:56 pm
Location: Bristol, UK
x 76

Re: Performance issues with component system and Node [Help!

Post by al2950 » Sun Jul 16, 2017 9:26 am

dark_sylinc wrote:I think you're thinking this too hard.
:oops: 'Heavy sigh' you are correct, I am embarrassed to say I never thought of using the pointer address to sort for cache locality, simple but brilliant :D

I am currently away, but I will try and get something tested next week.
0 x

al2950
OGRE Expert User
OGRE Expert User
Posts: 1202
Joined: Thu Dec 11, 2008 7:56 pm
Location: Bristol, UK
x 76

Re: Performance issues with component system and Node [Help!

Post by al2950 » Sun Jul 16, 2017 9:30 am

hyyou wrote:I guess I understand dark_sylinc's post correctly. If not, I will delete text in this post later.(sorry)

I believe al2950's is currently like :-

Code: Select all

foreach( entity in entities ..... request it to be sorted by physicsComponent address )
{
    entity->sceneNode->setPosition( entity->physicsComponent ->getPosition() );
    entity->sceneNode->setOrientation( entity->physicsComponent ->getOrientation() );
}
It is not Ogre cache-friendly, but al2950's engine cache-friendly.

If it would be changed into :-

Code: Select all

foreach( entity in entities ..... sorted by ogre's transformation address )
{
    entity->sceneNode->setPosition( entity->physicsComponent ->getPosition() );
    entity->sceneNode->setOrientation( entity->physicsComponent ->getOrientation() );
}
It will be Ogre cache-friendly but not al2950's engine cache-friendly.

The performance might be better, might be worse, might be indifferent.
Perhaps, but Ogre's memory model is more cache sensitive than mine, to quote dark_sylinc
dark_sylinc wrote:Ogre is much more sensitive to cache coherency because of its SoA arrangement, so you'll likely win more by sorting by Ogre than sorting by Component (unless your Component also uses a SoA arrangement, in that case you're screwed).
The ideal obviously would be to make both your Component and Ogre match in cache coherency, which should be possible but that really depends on the other component.
0 x

hyyou
Gremlin
Posts: 166
Joined: Wed Feb 03, 2016 2:24 am
x 6
Contact:

Re: Performance issues with component system and Node [Help!

Post by hyyou » Tue Jul 18, 2017 5:16 am

Thanks a lot, al2950. That enlightens me.

I still wonder though.... Ogre must be many times more cache-sensitive than a game engine for such an optimization to yield a significant gain.

Case 1: game engine use SoA
I use pool allocator for component, I believe it has the same effect of SoA. (SoA stackoverflow link)

For me, there are 2 most-used indirection pattern inside game-logic (not-ogre-related). The first one is :-

Code: Select all

Entity<-->ComponentPhysic  
Entity<-->ComponentGraphic
Entity<-->ComponentGamelogic1,2,3,...
The above is moderately hard to optimize, e.g. the address of "Entity" must be (roughly) consistent with "Component".

The second one is hopping like a map. It is practically impossible for me to optimize e.g. :-

Code: Select all

map<Entity,Entity>   //e.g. Access Entity ID 5 <--> Entity ID 8433 (cache miss)  
This memory jumping happen more often than a thin bridge between GraphicComponent<-->Ogre's object.

Case 2: game engine use random new/delete
IMHO, The overall cache miss in game-logic itself would be high and overshadow Ogre's cache miss (when we set transformation).

Conclusion
Thus, in both cases, I believe even after optimizing by sorting by Ogre's address, the overall performance of a game may not increase much i.e. not worth or quite impossible to optimize.

(It might be just because I am not skillful.)
0 x

paroj
OGRE Team Member
OGRE Team Member
Posts: 877
Joined: Sun Mar 30, 2014 2:51 pm
x 176
Contact:

Re: Performance issues with component system and Node [Help!

Post by paroj » Tue Jul 18, 2017 11:04 am

Ideally your engine should lay out nodes hierarchically in a BFS fashion, matching OgreNodeMemoryManager. Then updating the state would be just a matter of a memcpy, which is as fast as it gets when you require double buffering.
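A sketch of what that could look like if the engine-side buffer really had the same layout as the render-side one. `PositionBlock` (four nodes per block, X then Y then Z) and `syncByCopy` are assumptions made for this example; Ogre's real internal layout and how to reach its buffers are not shown in this thread.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <vector>

// Assumed SoA block: the positions of 4 nodes stored as XXXX YYYY ZZZZ.
struct PositionBlock
{
    float x[4];
    float y[4];
    float z[4];
};
static_assert( sizeof( PositionBlock ) == 48, "block is expected to be tightly packed" );

// When both sides use the same block layout and node order, the whole
// per-frame sync collapses into one bulk copy.
inline void syncByCopy( const std::vector<PositionBlock> &engineSide,
                        std::vector<PositionBlock> &renderSide )
{
    assert( engineSide.size() == renderSide.size() );
    if( !engineSide.empty() )
    {
        std::memcpy( renderSide.data(), engineSide.data(),
                     engineSide.size() * sizeof( PositionBlock ) );
    }
}
```

The hard part is not the copy but keeping both sides in the same order as nodes are created and destroyed.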
0 x

dark_sylinc
OGRE Team Member
OGRE Team Member
Posts: 4116
Joined: Sat Jul 21, 2007 4:55 pm
Location: Buenos Aires, Argentina
x 243
Contact:

Re: Performance issues with component system and Node [Help!

Post by dark_sylinc » Wed Jul 19, 2017 1:17 am

Hi!
hyyou wrote:Thank a lot, al2950. That enlightens me.

I still wonder though.... Ogre must be much more sensitive many times compared to a game engine to make such optimization yield significant gain.

Case 1: game engine use SoA
I use pool allocator for component, I believe it has the same effect of SoA. (SoA stackoverflow link)
No. A pool allocator does not have the same effect as "SoA". The problem is not that we store our data in a flat contiguous chunk of memory (that is actually good). The problem is that our data is laid out like this:

Code: Select all

XXXX YYYY ZZZZ XXXX YYYY ZZZZ XXXX
If our nodes are called A, B, C, D, E, F, G, H, I, J, K, L their data would correspond like this:
ABCD ABCD ABCD EFGH EFGH EFGH IJKL IJKL
That means that to access node A's position, we need to access addresses 0x0000, 0x0010 and 0x0020.
To access D's position, we need addresses 0x000C, 0x001C and 0x002C.
To access E's position, we need addresses 0x0030, 0x0040 and 0x0050.

You may have noticed I used colours. The colours indicate cache lines accessed (assuming cache lines of 64 bytes which is the most common).

As you may have guessed by now, accessing A will load cache line 0; while accessing E requires loading cache lines 0 and 1.
This isn't too hurtful. But if you have 10.000s of nodes and they're horribly scrambled, you'll be constantly loading lots of cache lines just to access one node, and then discarding those lines. And by the time you need a node that was in the same line, that line is likely no longer in the cache.

At basic level AoS arrangements have the same issue. Accessing node #8880 and then accessing node #2 are obviously in different cache lines.

But let's compare an AoS arrangement for these same nodes that fit in the same 2 cache lines:

Code: Select all

xyz xyz xyz xyz xyz xyz xyz xyz xyz xyz xy
AAA BBB CCC DDD EEE FFF GGG HHH III JJJ KK
What's the difference?
  1. In SoA to access E, F, G, H, I, J, K, & L we need to load two cache lines (E, F, G, H's data is in the 1st & 2nd cache lines, I, J, K & L's is in the 2nd and 3rd cache lines).
  2. In AoS to access F & K we need to load two cache lines (F is in the 1st & 2nd cache line, K's in the 2nd & 3rd lines).
This isn't a problem if your accesses are relatively sequential. But if they're too chaotic, SoA is way more likely to trash two cache lines, while with AoS most of the time you will be trashing one cache line and rarely two.

That's why Ogre is much more susceptible.
This is for position & scale. Quaternions don't have this issue because they're 16 bytes each, and 4 quaternions fit exactly in 1 cache line, therefore SoA & AoS in Quaternions is exactly the same in terms of cache behavior.
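The line counts above can be checked with a few lines of arithmetic. This is a sketch under stated assumptions: 4-byte floats, 64-byte cache lines, 4 nodes per SoA block, and both arrays starting on a cache line boundary. The function names are made up for the example.

```cpp
#include <cstddef>
#include <set>

constexpr std::size_t kCacheLine = 64;  // bytes per cache line (assumed)
constexpr std::size_t kFloat = 4;       // bytes per float

// Distinct cache lines holding node i's x, y and z in the SoA layout
// XXXX YYYY ZZZZ (48 bytes per block of 4 nodes).
inline std::size_t linesTouchedSoA( std::size_t node )
{
    const std::size_t blockBase = ( node / 4 ) * 3 * 4 * kFloat;
    const std::size_t lane = node % 4;
    std::set<std::size_t> lines;
    for( std::size_t axis = 0; axis < 3; ++axis )
        lines.insert( ( blockBase + axis * 4 * kFloat + lane * kFloat ) / kCacheLine );
    return lines.size();
}

// Distinct cache lines holding node i's xyz in the AoS layout (12 bytes each).
inline std::size_t linesTouchedAoS( std::size_t node )
{
    const std::size_t first = node * 3 * kFloat;
    const std::size_t last = first + 3 * kFloat - 1;
    return last / kCacheLine - first / kCacheLine + 1;
}
```

Under these assumptions, 8 of the first 16 nodes (E through L) span two cache lines in SoA, against 2 of 16 (F and K) in AoS, which matches the comparison above.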
hyyou wrote: For me, there are 2 most-used indirection pattern inside game-logic (not-ogre-related). The first one is :-

Code: Select all

Entity<-->ComponentPhysic  
Entity<-->ComponentGraphic
Entity<-->ComponentGamelogic1,2,3,...
The above is moderately hard to optimize, e.g. the address of "Entity" must be (roughly) consistent with "Component".
The second one is hopping like a map. It is practically impossible for me to optimize e.g. :-

Code: Select all

map<Entity,Entity>   //e.g. Access Entity ID 5 <--> Entity ID 8433 (cache miss)  
This memory jumping happen more often than a thin bridge between GraphicComponent<-->Ogre's object.
Oh boy. Jumping like a map is never good, no matter what engine.

This Entity / Component model is like Unity's, and that's why Unity's soooo painfully slow when compared to other engines. Although this model could in theory be optimized for cache friendliness, such optimizations would have to make trade-offs or certain assumptions, such as assuming most objects are created at loading time and rarely destroyed instead of being created & destroyed dynamically at any time; or assuming that a Physics component (which could be disabled, but must be allocated) must have a graphics component and vice versa, or some other restriction.
These optimizations place restrictions or trade-offs which have little or no place in a generic engine that could be used in any way the user wants, creating and destroying objects at almost any time, with a random permutation of components.

While great for flexibility, customization, friendliness, and adaptability to many situations, all of this is paid for in speed. It's slow.

If you're skillful and have the time to optimize some of your engine's layout to be in harmony, then great. In most cases you should be able to sort your updates by Ogre's memory location, and let the other components trash. It will be a win.
But if you can't simply sort updates like that, you're in a hurry, don't have the time, don't feel like you understand this well enough, or a refactor would be far too time consuming, then what I can suggest is to disable OGRE_SIMD_* in CMake.
You will lose performance from losing SSE2/NEON in scene hierarchy updates, AABB updates & frustum culling.
However you will force Ogre to use AoS instead of SoA, and if your case is too severe, that alone may yield higher overall performance benefit than what you lose from removing SIMD.

Cheers
Matias
0 x

hyyou
Gremlin
Posts: 166
Joined: Wed Feb 03, 2016 2:24 am
x 6
Contact:

Re: Performance issues with component system and Node [Help!

Post by hyyou » Wed Jul 19, 2017 5:06 am

Thanks a lot, dark_sylinc!! That is the most delicious post on the internet I have read this year.

After reading it several times, I plan to reallocate entities/components (move them to a more suitable address) to match the access pattern: (1) after the first frame in which they are created, and (2) once in a while.
(adapted from dark_sylinc's suggestion to sort by transform pointer)
  • For the reallocation, I will use this information to determine the address :-
    → 1. the map entity<-->entity(ies) - only the important ones
    → 2. entity<-->component
    → 3. some custom guide e.g. address of ogre's transform!
  • As a result, I can't cache pointers to entities/components anymore, but I can refactor to use int IDs instead.
    I have to remember to re-setUserData() for external libraries (Ogre and Bullet for me).
  • This may be one of the possible restrictions you mentioned. Luckily, it doesn't seem too hard (for now). :D
Sorry for going a little off-topic ... too excited
0 x
