Ogre 2.1 node updates are painful to work with

Shimayama · Post by **Shimayama** » Thu Feb 22, 2018 1:34 pm

We are currently porting a huge code base from Ogre 1.x to 2.1. Things are going pretty well, except for the change in how Node's derived transform and updateFromParent works in 2.1

In a large code base like ours, with a lot of different plugins operating on scene nodes, it is very difficult to make sure that the !mCachedTransformOutOfDate assertion is never triggered by a Node::_getDerivedXXX(). I understand the reason for the changes in node updates, but I think usability has been sacrificed too much, and in the end it is very likely to lead to worse performance because the user will be tempted to use the _getDerivedXXXUpdated() over _getDerivedXXX() too often.

Below is a summary for some of the problems and potential fixes. I might have misunderstand some things though, so feel free to correct me.

Some problems:

1.

Code: Select all

node->_setDerivedPosition(Ogre::Vector3(10.0f, 0.0f, 0.0f));
...
Ogre::Vector3 derivedPos = node->_getDerivedPosition(); // assert( !mCachedTransformOutOfDate );

This one is really unexpected from the users point-of-view..

2.

Code: Select all

parentNode->_setDerivedPosition(Ogre::Vector3(10.0f, 0.0f, 0.0f));
childNode = parentNode->getChild(0);
...
childNode->_setDerivedPosition(Ogre::Vector3(20.0f, 0.0f, 0.0f)) // assert( !mCachedTransformOutOfDate );

This one is difficult to guard against if different systems are controlling updates on nodes at different levels in the scene graph (e.g. 'parentNode' and 'childNode' in this example). This makes it really tempting to overuse the _getDerivedPositionUpdated..

3. Node::_updateFromParent() seems to potentially be much more expensive for a single node than in 1.x, since it updates transforms for multiple nodes (the SoA) and for all parents (and their SoA), and it does not skip the update if already up-to-date (checking mCachedTransformOutOfDate, or mNeedParentUpdate in Ogre 1.x).

4. Node::updateFromParentImpl() is only triggered when the user calls any of the _getDerivedXXXUpdated(), while Node::updateAllTransforms() is triggered by UpdateSceneGraph(). Then why is it necessary for Node::updateFromParentImpl() to update the SoA? Couldn't it just update its own transforms?

5. There is no function Node::_getCachedDerivedXXX(). If you are running RELEASE build and are not concerned about the last update, you can use _getDerivedXXX(), but in DEBUG you might get the mCachedTransformOutOfDate assertion, which makes it tempting to branch on the DEBUG code to use _getDerivedXXXUpdated() instead...

Suggestions of improvements:

* It should be quicker to update the derived transform of a single node.
* It should be possible to explicitly get the cached transform (using same code in RELEASE and DEBUG).
* These changes must not have performance impact on UpdateSceneGraph().

1. Change updateFromParentImpl() to only update its own transform, mTransform.mDerivedTransform[mTransform.mIndex].

2. Define mCachedTransformOutOfDate also in RELEASE build, but without assertions. Change _updateFromParent() to call updateFromParentImpl only if mCachedTransformOutOfDate is true.

3. Add _getCachedDerivedXXX() (similar to what is done in OgreCamera.h).

What do you think of these suggestion? Will they mess things up? I am thinking of implementing something like this in our own Ogre branch, but it would be best if it was integrated in the Ogre main branch.

al2950 · Post by **al2950** » Thu Feb 22, 2018 7:10 pm

We ran into similar issues/pain. I did open a topic a while back but can't find it. To be honest we just implemented our getDerivedXXXNoAssert(const Ogre::SceneNode&) helper function

Lax · Post by **Lax** » Thu Feb 22, 2018 11:14 pm

I had also this assert issues, even I was sure, everything should be in cache, as soon as I moved something bam, another assert. So I deactivated the asserts via macro.

Regards
Lax

Post by **dark_sylinc** » Sat Feb 24, 2018 11:41 pm

Thanks for the useful feedback!

I just pushed a commit to address some of the issues you're describing.

Shimayama wrote: ↑Thu Feb 22, 2018 1:34 pm 1.
Code: Select all
node->_setDerivedPosition(Ogre::Vector3(10.0f, 0.0f, 0.0f));
...
Ogre::Vector3 derivedPos = node->_getDerivedPosition(); // assert( !mCachedTransformOutOfDate );
This one is really unexpected from the users point-of-view..[/code]

Indeed. This is a stupid and non-intuitive behavior. The commit addresses this.
Please note that the children of these nodes are still out of date, but at least calling node->_setDerived* followed by node->_getDerived* no longer asserts.

This one is difficult to guard against if different systems are controlling updates on nodes at different levels in the scene graph (e.g. 'parentNode' and 'childNode' in this example). This makes it really tempting to overuse the _getDerivedPositionUpdated..

3. Node::_updateFromParent() seems to potentially be much more expensive for a single node than in 1.x, since it updates transforms for multiple nodes (the SoA) and for all parents (and their SoA), and it does not skip the update if already up-to-date (checking mCachedTransformOutOfDate, or mNeedParentUpdate in Ogre 1.x).

Hi. There's two different points:

Updating the SoA as opposed to updating just this one node is not "much more expensive". In fact it may be cheaper. The problem is that the memory layout is inherently in SoA form, so when performing the update we batch-load the transforms using SSE/NEON (i.e. using movaps) and update 4 nodes at a time even if it's just for 1 node. It's like grabbing a lot of chips from a potato bag with your hand because it's easier & faster than surgically picking the biggest chip with your fingers.
The only problematic issue is that parents' transform does need an AoS -> SoA conversion. But if we were to focus on just this one node, we would have to be crossing memory boundaries all the time resulting in more instructions. Hence it's not clear which method would win. If this is a heavy problem for you (and it appears it is from what you're describing) you're likely going to get better performance by disabling OGRE_SIMD_SSE2 from CMake, as that improves your cache locality and the SoA structures become AoS.
A macro to have mCachedTransformOutOfDate not just for debugging but also for actual logic (i.e. if parent is already up to date, then stop updating parents) sounds like an interesting idea. However this is a double edge sword because it also means every time you change data (e.g. call setPosition/Quaternion/scale/setDerived*) we have to broadcast the flag to our children (and our children's children). The patch updates the behavior to only do this if the just-introduced debug level is set to HIGH (the default for debug builds), but if you want to rely on this behavior, then the macro that enables mCachedTransformOutOfDate to be used not just for debugging then would also have to always force the broadcast.

4. Node::updateFromParentImpl() is only triggered when the user calls any of the _getDerivedXXXUpdated(), while Node::updateAllTransforms() is triggered by UpdateSceneGraph(). Then why is it necessary for Node::updateFromParentImpl() to update the SoA? Couldn't it just update its own transforms?

This should be clear with the previous explanation, but in case it isn't:
1. If mCachedTransformOutOfDate does not exist (e.g. Release builds) or we do not broadcast this dirty flag (i.e. debug level < high)
then we cannot guarantee all of our parents are up to date. Even if our parent is up to date, our parent's parent may not be, and this implies that our parent wasn't actually up to date.

2. If mCachedTransformOutOfDate exists and the dirty flag is broadcasted to children (i.e. debug level >= high), then you're correct. Once we find a parent that says mCachedTransformOutOfDate == false, we can stop.

5. There is no function Node::_getCachedDerivedXXX(). If you are running RELEASE build and are not concerned about the last update, you can use _getDerivedXXX(), but in DEBUG you might get the mCachedTransformOutOfDate assertion, which makes it tempting to branch on the DEBUG code to use _getDerivedXXXUpdated() instead...

That is... an interesting idea. Though I have goosebumps at the idea of having to provide support to a user who is abusing this function and asks why his rendering is not working.
Try to see if this patch alleviates your problem. If it's still too problematic I can reconsider it, as long as there's a separate macro that can reenable them. Alternatively your debug build could set the debug level to LOW? That will shut up the asserts. When I think about it, after this patch, even setting the debug level to medium may do the trick.

Cheers
Matias

Shimayama · Post by **Shimayama** » Sun Feb 25, 2018 2:59 pm

Thanks for the quick responses and the patch. I just had a quick look, it might solve some of our issues, but I haven't tested it yet. Two notes:

1. It might be desirable to disable this particular assert while having the other high-level asserts enabled. But this should be OK for now for us since it eliminates the need for _GetCachedDerivedXXX().

2. a) What if prior to a call to _setDerivedPosition, mCachedTransformOutOfDate was already true because of a out-of-date derived orientation, wouldn't it be incorrect to set it to false? Maybe more correct to not modify mCachedTransformOutOfDate at all, or we could make it a bitmask where each bit refers to a dirty transform type (i.e. dirty position = 0x01, dirty orientation = 0x02, etc)?
b) Shouldn't it be setTrans instead of makeTrans?

Code: Select all

 void Node::_setDerivedPosition( const Vector3& pos )

     {

         //find where the node would end up in parent's local space

         if(mParent)

+        {

             setPosition( mParent->convertWorldToLocalPosition( pos ) );

+            mTransform.mDerivedPosition->setFromVector3( pos, mTransform.mIndex );

+            mTransform.mDerivedTransform[mTransform.mIndex].makeTrans( pos );

+#if OGRE_DEBUG_MODE >= OGRE_DEBUG_MEDIUM

+            mCachedTransformOutOfDate = false;

+#endif

+        }

     }

Hi. There's two different points:

Updating the SoA as opposed to updating just this one node is not "much more expensive". In fact it may be cheaper. The problem is that the memory layout is inherently in SoA form, so when performing the update we batch-load the transforms using SSE/NEON (i.e. using movaps) and update 4 nodes at a time even if it's just for 1 node. It's like grabbing a lot of chips from a potato bag with your hand because it's easier & faster than surgically picking the biggest chip with your fingers.
The only problematic issue is that parents' transform does need an AoS -> SoA conversion. But if we were to focus on just this one node, we would have to be crossing memory boundaries all the time resulting in more instructions. Hence it's not clear which method would win. If this is a heavy problem for you (and it appears it is from what you're describing) you're likely going to get better performance by disabling OGRE_SIMD_SSE2 from CMake, as that improves your cache locality and the SoA structures become AoS.

Yes. "much more expensive" included the fact that it triggers update for all ancestors without checking whether it is necessary. I was thinking also that it could be optimized slightly more by removing the for( size_t j=0; j<ARRAY_PACKED_REALS; ++j ) to only do the SoA->AoS for the parent of the node in question, and only updating a subset of the SoA data? (but still work on the same data)

A macro to have mCachedTransformOutOfDate not just for debugging but also for actual logic (i.e. if parent is already up to date, then stop updating parents) sounds like an interesting idea. However this is a double edge sword because it also means every time you change data (e.g. call setPosition/Quaternion/scale/setDerived*) we have to broadcast the flag to our children (and our children's children). The patch updates the behavior to only do this if the just-introduced debug level is set to HIGH (the default for debug builds), but if you want to rely on this behavior, then the macro that enables mCachedTransformOutOfDate to be used not just for debugging then would also have to always force the broadcast.

I've been trying to think of any alternative to this.. but it's tricky.. What if it was not broadcasted, but instead fetched when required? Say that each node has an array with pointers to its ancestors' mCachedTransformOutOfDate booleans, which it checks in a quick for loop each time a _updateFromParent is triggered? Of course it might be bad for some applications, and because of memory layout.. But still maybe faster than the broadcasting?

I think this could be an OK short-term compromise: Option to have mCachedTransformOutOfDate but without the broadcasting in RELEASE. Have a public function GetCachedTransformOutOfDate(bool checkAncestors), that returns mCachedTransformOutOfDate, logical OR'ed with all anchestors' if checkAncestors is true. This function will only be used by users that really need it at some points, and won't have any big impact on the rest of the system (except that mCachedTransformOutOfDate has to be set, but without broadcasting). Then the user at least has the possibility to do its own update-or-not logics.

A greater challenge though than the GetDerivedXXX() calls are the SetDerivedXXX(). For example, some user code might once-in-a-while call SetDerivedXXX(), without knowing that an ancestor is dirty, and then the position will become incorrect for a long time (the assert is there for a reason). (as in example 2 in the original post, say that an ancestor is a boat node controlled by SetDerivedPosition by one system, and another is a cannon node, in the boat's subtree, controlled by another system that once-in-a-while uses _SetDerivedOrientation) What is the best way of refactoring this? I guess systems controlling deeper nodes would have to be updated first, and then continue upwards. I was hoping to be able to port to Ogre 2.1 without having to change so much in the layout of the client code, while trying to have some sort of guarantee that client code logics yields same Ogre behaviour as before the update (in terms of positions etc).

Anyway, I'll let you know when I've tried the patch

Thanks for the help.

Shimayama · Post by **Shimayama** » Sun Feb 25, 2018 3:04 pm

By the way. Just found a bug with auto tracking nodes that are relevant:

If there is a Node with auto tracking enabled, and the node has any children, Ogre's UpdateSceneGraph() will produce some out-of-date transforms on the child nodes when updating the auto tracking, which will lead to an assert during the rendering.

I think this could be solved by moving the auto tracking updates up to the beginning of the UpdateSceneGraph() function (and using target->_getDerivedPositionUpdated()?

hyyou · Post by **hyyou** » Mon Feb 26, 2018 3:16 am

This is just a beginner opinion. Sorry if it is a little off-topic.

IMHO, a node manager that is wrong (counter intuitive) in a few aspects is OK.

I try to use Ogre as "write-only" because it should be nothing more than a very good Graphics Engine.
Thus, I ended up implementing a custom node system (integrate with Bullet Physics e.g. btcompoundshape).
In my case : If a user moves a child node, its parent will move too . The relative transformation is fixed by default.
Unlike Ogre that caches absolution transformation, I cache relative transformation of compound-root-body vs child.
Although mine has some level of WTF, it is very handy because I need a very custom Physics-like behavior for node.

xrgo · Post by **xrgo** » Mon Feb 26, 2018 4:59 pm

hyyou wrote: ↑Mon Feb 26, 2018 3:16 am I try to use Ogre as "write-only" because it should be nothing more than a very good Graphics Engine.
Thus, I ended up implementing a custom node system (integrate with Bullet Physics e.g. btcompoundshape).

I do that too =), I handle everything in the logic thread, and then communicate those "write-only" things to the graphics thread

Shimayama · Post by **Shimayama** » Thu Mar 01, 2018 5:17 pm

Sounds good. I've been thinking about doing the same thing for a while. It would give more flexibility and probably greatly simplify the Ogre upgrade job. But unfortunately I don't have the possibility to do it right now

hyyou wrote: ↑Mon Feb 26, 2018 3:16 am I do that too =), I handle everything in the logic thread, and then communicate those "write-only" things to the graphics thread

You then just have a flat Ogre scene graph I suppose?

hyyou · Post by **hyyou** » Sat Mar 03, 2018 2:04 pm

Shimayama wrote: ↑Thu Mar 01, 2018 5:17 pm You then just have a flat Ogre scene graph I suppose?

Yes. The most advantage is that : the coupling between my code and Ogre also decreases.

Another reason of mine is that Ogre's node had a glitch (about 1-2 year ago).
I faintly remember that total-node-derived scale of light should always =1, otherwises light's AABB will cull with wrong scale.
The glitch might be fixed already.

Shimayama · Post by **Shimayama** » Thu Mar 08, 2018 7:13 pm

dark_sylinc, did you have any thoughts about this?

2. a) What if prior to a call to _setDerivedPosition, mCachedTransformOutOfDate was already true because of a out-of-date derived orientation, wouldn't it be incorrect to set it to false?

b) Shouldn't it be setTrans instead of makeTrans?

I might be totally wrong. But just want to make sure :)

Ogre Forums

Ogre 2.1 node updates are painful to work with

Ogre 2.1 node updates are painful to work with

Re: Ogre 2.1 node updates are painful to work with

Re: Ogre 2.1 node updates are painful to work with

Re: Ogre 2.1 node updates are painful to work with

Re: Ogre 2.1 node updates are painful to work with

Re: Ogre 2.1 node updates are painful to work with

Re: Ogre 2.1 node updates are painful to work with

Re: Ogre 2.1 node updates are painful to work with

Re: Ogre 2.1 node updates are painful to work with

Re: Ogre 2.1 node updates are painful to work with

Re: Ogre 2.1 node updates are painful to work with