[2.2] ManualObject update is VERY slow

Anything and everything that's related to OGRE or the wider graphics field that doesn't fit into the other forums.
Post Reply
PiotrPL
Gnoblar
Posts: 4
Joined: Wed Sep 08, 2021 8:05 pm

[2.2] ManualObject update is VERY slow

Post by PiotrPL »

Hi :)

This is my first post.

I'm experimenting with Ogre for some time and I'm having issues. This is one of them.

I'm using Ogre 2.2.5 (SDK downloaded from Ogre site) with Direct3D 11.

I'm creating multiple ManualObjects and their SceneNodes on startup.
Each ManualObject is filled with begin() / end(),
and is later updated with beginUpdate() / end().
Full construction at startup for each ManualObject / SceneNode takes about 0.02 ms.

Then I update 1 ManualObject per second on average.

And this update takes about 4 ms!!!
When average FPS is 250 (4 ms per frame).
So ManualObject update doubles frame time.
Very noticeable single frame spikes on chart
(I write image with frame times on exit).

Each ManualObject has very few vertices / triangles (less than 10).
Increasing vertex / triangle counts by calling function that fills vertices / indices
in loop (100 or even 1000 times) does not increase update time.

At the same time upload of 1 MB texture each frame (synchronously) takes less than 0.2 ms.
When I disable texture update ManualObject update time does not change.

I noticed by tweaking compositors and rendering settings that ManualObject update time
changes almost linearly with frame time: the lower the frame time, the lower the ManualObject update time.

SDK sample for ManualObject runs at about 3000 FPS (0.3 ms per frame).
But it does not have anything besides ManualObject.
If ManualObject update time somehow depends on frame time
then such low frame time is possible in isolated demo.

I compared beginUpdate() / end() with full reconstruction (begin() / end()).
It looks that beginUpdate() is slightly slower than begin().
I also measured execution time for different ManualObject method calls.
With begin() / end() most time is spent in end().
But with beginUpdate() / end() most time is spent in beginUpdate().
Filling buffers takes less than 0.01 ms.

I'm not sure what could be the reason of such bad performance.
ManualObject API is quite simple and my code is logically similar to ManualObject sample.
Esp. same function for buffers filling (position(), index()) is called on startup and for update.
User avatar
dark_sylinc
OGRE Team Member
OGRE Team Member
Posts: 5296
Joined: Sat Jul 21, 2007 4:55 pm
Location: Buenos Aires, Argentina
x 1278
Contact:

Re: [2.2] ManualObject update is VERY slow

Post by dark_sylinc »

Hi!

ManualObjects are build for user friendliness, not for speed.

If performance is an issue and the MO won't change, convert the ManualObject to a mesh after it's done being constructed.

If the ManualObject will change often (e.g. every frame) consider using something more direct like shown in DynamicRenderable example

Cheers
PiotrPL
Gnoblar
Posts: 4
Joined: Wed Sep 08, 2021 8:05 pm

Re: [2.2] ManualObject update is VERY slow

Post by PiotrPL »

Thank you for answer :)

I'm aware that Manual Objects are not the fastest way to update geometry.

But startup creation times and ManualObject sample that updates ManualObject with 900 vertices per frame
and still has whole frame time at 0,3 ms show it can be quite fast (as GPU buffers update is batched).

4 ms for couple vertices (and relation with frame time) does not sound right.
There might be some interactions with other parts.

I checked DynamicGeometry sample and it has almost same FPS as ManualObject sample,
but geometry is less complex.

I will try to rewrite it using this approach and I will report if it improves performance.
PiotrPL
Gnoblar
Posts: 4
Joined: Wed Sep 08, 2021 8:05 pm

Re: [2.2] ManualObject update is VERY slow

Post by PiotrPL »

I'm back... with report.

I read DynamicGeometryGameState.cpp/h
and written code for VertexBuffer modification.
I kept ManualObject version for comparison.

Updating VertexBufferPacked with map(), memory write and unmap() took about 0.002 ms.
2000x faster! Very nice improvement.

Then I commented ManualObject update, created Item with dynamic mesh
and attached Item instead of ManualObject to SceneNode.

And suddenly same code (vertex buffer update) was taking 4 ms!!!
Just like ManualObject update...
What's going on???

But there was second issue.
Dynamic meshes disappeared when they were on screen edge.

First make it right, then make it fast.

I rendered AABB from Item::getWorldAabb() but nothing was rendered.
Then Item::getWorldAabbUpdated() but still nothing.
getWorldAabb() and fixed half size? AABB was visible.
So AABB had zero half size...

I checked sample again.
Mesh::_setBounds() was called once when Mesh was created.
I called it on each update but not when Mesh was created.

I checked OgreItem.cpp.
_initialise() method copies Mesh AABB and does not touch it again.

So I called Mesh::_setBounds() during Mesh construction.
And then dynamic meshes were visible when they should be.

But if I can/must set DYNAMIC mesh AABB only during construction,
then AABB will be too large (rendering overhead) or too small (meshes will disappear).
Not good.

I checked where mObjectData (with AABB info) modified in Item::_initialise() is located.
It's protected field in MovableObject. Fortunately there is _getObjectData() method that returns non-const reference.

I modified local fields of ObjectData and world fields were updated properly.
World AABBs were matching meshes.

This method of updating AABB after mesh is created should be included in Dynamic Geometry sample.

Before I move to performance, I would like to point two problems with Mesh::_setBounds() method.

First problem is hidden temporal coupling with _setBoundingSphereRadius().
You will find many articles about temporal coupling, eg.:
https://blog.ploeh.dk/2011/05/24/Design ... alCoupling
_setBoundingSphereRadius() must be called after _setBounds()
because _setBounds() sets AABB and radius.
https://github.com/OGRECave/ogre-next/b ... 2.cpp#L275
Temporal coupling is hidden in this case because it can not be inferred from method names.
And documentation does not mention it. You need to check source code.
If temporal coupling is necessary (eg. for performance) it should be obvious:
begin() / end()
map() / unmap()
open() / close()
Also documentation states that call to _setBoundingSphereRadius() is required,
but it not as call to _setBounds() is sufficient.
Maybe _setBoundingSphereRadius() could be removed along with field,
also because Item::_initialise() method ignores mesh bounding radius
and calls Aaabb::getRadius().
https://github.com/OGRECave/ogre-next/b ... m.cpp#L116

Second problem is boolean parameter (used in many places in Ogre).
Also you will find many articles about it, eg.:
https://understandlegacycode.com/blog/w ... parameters
void _setBounds(const Aabb& bounds, bool pad = true);
Boolean parameters should be used only when their meaning is obvious:

Code: Select all

setEnabled(true);
setEnabled(false);
setVisible(true);
setVisible(false);
They might be OK in private methods, too.
Nothing beats

Code: Select all

render(false);
but Ogre::MemoryDataStream with two booleans in row in constructors is really close.
Quick! What's the difference between those lines?

Code: Select all

Ogre::MemoryDataStream(buffer, bufferSize, true, false) stream;
Ogre::MemoryDataStream(buffer, bufferSize, false, true) stream;
Usually you convert boolean parameters to enums
and sometimes two booleans can be converted to single enum.
But sometimes you can have separate functions:

Code: Select all

_setBounds(const Aabb& bounds)
_setBoundsWithPadding(const Aabb& bounds)
Now the performance.

I measured separately call to map(), memory write and call to unmap().
All time (about 4 ms!) was spent in map().
Memory write was 0.000 ms.
unmap() was about 0.001 ms (sometimes 0.01 ms).

So maybe there is some problem when Item with dynamic mesh is rendered?
I commented line that attached Item to SceneNode.
No difference.

I tried upload() instead of map() + memory write + unmap().
No difference.
But I checked source and it calls map() + memory write + unamp().
So it's not a surprise.
Maybe there could be native implementation for upload()?
OpenGL 2.0 has glBufferSubData(), not sure about Direct3D 11.
map() is much more complex as it needs to expose some internal memory.

Then I tried different flags for vertex buffer:
sample uses BT_DYNAMIC_PERSISTENT (and UO_UNMAP_ALL for unmap()).
No difference.

Then I tried to change thread count for SceneManager.
No difference.

I went back to moment when ManualObject and VertexBuffer were both updated
and vertex buffer update took 0.002 ms.
Vertex buffer was updated AFTER ManualObject.
I reverted order. First Vertex buffer update, then ManualObject.
And suddenly...
ManualObject update was taking 0.002 ms, and vertex buffer update was taking 4 ms!
This means that:
1. there is a problem with first call to map() in frame.
2. ManualObject update is as fast as direct vertex buffer update.

Now back again to visible dynamic mesh.

I tweaked compositor - temp cubemap and shadow map for point light shadow.
750 FPS and suddenly map() was 0.005 ms : 0.02 ms.
I added second compositor so I could switch between them at run time.
I switched them, updated dynamic mesh and it was consistent and repeatable:
400 FPS -> map() was 2 ms
750 FPS -> map() was 0.005 ms : 0.02 ms

Then I checked OpenGL.
It had same problem but vertex buffer performance was somewhat better.
300 FPS -> map() was 1 ms
600 FPS -> map() was stable 0.001 ms

Then I checked Ogre 2.2.1.
Same behavior as 2.2.5.

Back to 2.2.5.

So far I was using Ogre mainly in windowed mode.
I tried fullscreen mode.
And suddenly in Direct3D 11 with 400 FPS map() was taking same time as with 750 FPS in windowed mode:
0.005 ms : 0.02 ms.
With OpenGL map() was still taking 1 ms in fullscreen mode.

Then I tweaked compositors to check again OpenGL in windowed mode.
550 FPS -> map() was 0.5 ms
600 FPS -> map() was 0.001 ms

And finally I tweaked compositors to check Direct3D in much lower FPS: about 100 FPS.
windowed mode -> map() takes 8 ms
fullscreen mode -> map() takes 0.04 ms

Conclusion:
There is performance issue in BufferPacked::map() method
when overall FPS (changed via compositor changes) is below SOME value.
Fullscreen mode fixes this issue for Direct3D but not for OpenGL.
User avatar
dark_sylinc
OGRE Team Member
OGRE Team Member
Posts: 5296
Joined: Sat Jul 21, 2007 4:55 pm
Location: Buenos Aires, Argentina
x 1278
Contact:

Re: [2.2] ManualObject update is VERY slow

Post by dark_sylinc »

API issues: yeah you're looking at 20 years of code. Still we can at least document the gotchas you found so hopefully alleviate some of that burden to the user. And hopefully eventually fix those API flaws

Performance: ok you've got my attention. Are you able to repro these issues in one of the samples so I can take a look? It sounds like map() is stalling for some reason, which it shouldn't. Either there's user error or an Ogre bug. I want to take a look
There is performance issue in BufferPacked::map() method
when overall FPS (changed via compositor changes) is below SOME value.
Fullscreen mode fixes this issue for Direct3D but not for OpenGL.
That suspiciously sounds like power management kicking in. If your app isn't too demanding the driver/OS may decide its not worth for the CPU and/or GPU to go into full power, but sometimes it does; causing the inconsistencies you see
PiotrPL
Gnoblar
Posts: 4
Joined: Wed Sep 08, 2021 8:05 pm

Re: [2.2] ManualObject update is VERY slow

Post by PiotrPL »

OK, I will try to reproduce issue inside Dynamic Geometry sample :)

Wouldn't behavior be exact opposite if it's related with power management?
If FPS is low then all power should be unleashed and map() should be super fast.
OTOH if FPS is high then you can surely wait a bit for map(), can't you?
Also map() time scales with FPS and at certain point is suddenly like 100x faster.
And still that 8 ms for 100 FPS... that's very long.
User avatar
dark_sylinc
OGRE Team Member
OGRE Team Member
Posts: 5296
Joined: Sat Jul 21, 2007 4:55 pm
Location: Buenos Aires, Argentina
x 1278
Contact:

Re: [2.2] ManualObject update is VERY slow

Post by dark_sylinc »

PiotrPL wrote: Mon Sep 20, 2021 8:42 pm Wouldn't behavior be exact opposite if it's related with power management?
If FPS is low then all power should be unleashed and map() should be super fast.
OTOH if FPS is high then you can surely wait a bit for map(), can't you?
No, because you have two possibilities:
  1. The OS thinks the process is taking acceptable time at low power. Your FPS will now be lower but the OS thinks this is fine
  2. The process is doing something intensive, power should be boosted, fps goes up
Thus this gets you in an weird scenario where you've got low fps, then you start doing more work, and fps goes up instead of down (because the OS predicts at the current power profile it will take too much time, and boosts the clocks)

I'm not saying this is what is happening, but power saving can give you really wtf readings when profiling
Post Reply