I'm back... with a report.
I read DynamicGeometryGameState.cpp/h
and wrote code for VertexBuffer modification.
I kept the ManualObject version for comparison.
Updating VertexBufferPacked with map(), memory write and unmap() took about 0.002 ms.
2000x faster than the ManualObject update (about 4 ms)! Very nice improvement.
Then I commented out the ManualObject update, created an Item with a dynamic mesh
and attached the Item instead of the ManualObject to the SceneNode.
And suddenly the same code (vertex buffer update) was taking 4 ms!!!
Just like ManualObject update...
What's going on???
But there was a second issue.
Dynamic meshes disappeared when they were at the screen edge.
First make it right, then make it fast.
I rendered the AABB from Item::getWorldAabb() but nothing was rendered.
Then Item::getWorldAabbUpdated() but still nothing.
getWorldAabb() with a fixed half size? The AABB was visible.
So the AABB had zero half size...
I checked the sample again.
There, Mesh::_setBounds() is called once, when the Mesh is created.
I was calling it on each update but not when the Mesh was created.
I checked OgreItem.cpp.
The _initialise() method copies the Mesh AABB and does not touch it again.
So I called Mesh::_setBounds() during Mesh construction.
And then dynamic meshes were visible when they should be.
But if I can/must set a DYNAMIC mesh's AABB only during construction,
then the AABB will be either too large (rendering overhead) or too small (meshes will disappear).
Not good.
I checked where mObjectData (which holds the AABB info and is modified in Item::_initialise()) is located.
It's a protected field in MovableObject. Fortunately there is a _getObjectData() method that returns a non-const reference.
I modified the local fields of ObjectData and the world fields were updated properly.
World AABBs were matching the meshes.
This way of updating the AABB after the mesh is created should be included in the Dynamic Geometry sample.
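To keep the bounds tight on every update, the local box has to be recomputed from the vertices that were just written. Below is a minimal sketch of that step only. Vec3, SimpleAabb and computeAabb() are hypothetical stand-ins for illustration (not Ogre types), laid out as center + half size. Writing the result into ObjectData is left out, since that part depends on Ogre internals.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>

// Hypothetical stand-ins for illustration only (not Ogre types).
struct Vec3 { float x, y, z; };
struct SimpleAabb { Vec3 center; Vec3 halfSize; };

// Computes a tight center + half-size box around 'count' vertex positions.
inline SimpleAabb computeAabb(const Vec3* positions, std::size_t count)
{
    assert(positions != nullptr && count > 0);

    Vec3 lo = positions[0];
    Vec3 hi = positions[0];
    for (std::size_t i = 1; i < count; ++i)
    {
        const Vec3& p = positions[i];
        lo.x = std::min(lo.x, p.x); hi.x = std::max(hi.x, p.x);
        lo.y = std::min(lo.y, p.y); hi.y = std::max(hi.y, p.y);
        lo.z = std::min(lo.z, p.z); hi.z = std::max(hi.z, p.z);
    }

    SimpleAabb box;
    box.center   = { (lo.x + hi.x) * 0.5f, (lo.y + hi.y) * 0.5f, (lo.z + hi.z) * 0.5f };
    box.halfSize = { (hi.x - lo.x) * 0.5f, (hi.y - lo.y) * 0.5f, (hi.z - lo.z) * 0.5f };
    return box;
}
```

The loop can run over the same positions that are written into the mapped vertex buffer, so it adds one extra pass over data that is already in CPU memory.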
Before I move on to performance, I would like to point out two problems with the Mesh::_setBounds() method.
The first problem is hidden temporal coupling with _setBoundingSphereRadius().
You will find many articles about temporal coupling, e.g.:
https://blog.ploeh.dk/2011/05/24/Design ... alCoupling
_setBoundingSphereRadius() must be called after _setBounds()
because _setBounds() sets both the AABB and the radius, so it overwrites a radius set earlier.
https://github.com/OGRECave/ogre-next/b ... 2.cpp#L275
Temporal coupling is hidden in this case because it cannot be inferred from the method names.
And the documentation does not mention it. You need to check the source code.
If temporal coupling is necessary (e.g. for performance) it should be obvious:
begin() / end()
map() / unmap()
open() / close()
Also, the documentation states that a call to _setBoundingSphereRadius() is required,
but it is not, as a call to _setBounds() is sufficient.
Maybe _setBoundingSphereRadius() could be removed along with its field,
also because the Item::_initialise() method ignores the mesh bounding radius
and calls Aabb::getRadius().
https://github.com/OGRECave/ogre-next/b ... m.cpp#L116
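One way to remove this kind of hidden ordering is to make it impossible to set the radius separately from the box. The sketch below is only an illustration of that idea: MeshBounds, Box and Vec3 are hypothetical names, not Ogre classes, and deriving the radius from the half-size length is an assumption made for the example.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical types for illustration only (not Ogre classes).
struct Vec3 { float x, y, z; };
struct Box { Vec3 center; Vec3 halfSize; };

class MeshBounds
{
public:
    // Sets the box and derives the radius in the same call.
    void setBounds(const Box& b)
    {
        mBox = b;
        mRadius = std::sqrt(b.halfSize.x * b.halfSize.x +
                            b.halfSize.y * b.halfSize.y +
                            b.halfSize.z * b.halfSize.z);
    }

    // Sets the box and an explicit radius together. There is no separate
    // radius setter, so nothing can be called in the wrong order.
    void setBounds(const Box& b, float radius)
    {
        assert(radius >= 0.0f);
        mBox = b;
        mRadius = radius;
    }

    const Box& bounds() const { return mBox; }
    float radius() const { return mRadius; }

private:
    Box mBox{};
    float mRadius = 0.0f;
};
```

With both values passed in one call, the caller no longer has to know that one setter silently overwrites the result of the other.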
The second problem is the boolean parameter (used in many places in Ogre).
You will find many articles about this as well, e.g.:
https://understandlegacycode.com/blog/w ... parameters
void _setBounds(const Aabb& bounds, bool pad = true);
Boolean parameters should be used only when their meaning is obvious:
setEnabled(true);
setEnabled(false);
setVisible(true);
setVisible(false);
They might be OK in private methods, too.
Nothing beats
but Ogre::MemoryDataStream, with two booleans in a row in its constructors, comes really close.
Quick! What's the difference between those lines?
Ogre::MemoryDataStream stream(buffer, bufferSize, true, false);
Ogre::MemoryDataStream stream(buffer, bufferSize, false, true);
Usually you convert boolean parameters to enums
and sometimes two booleans can be converted to single enum.
But sometimes you can have separate functions:
void _setBounds(const Aabb& bounds);
void _setBoundsWithPadding(const Aabb& bounds);
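The enum variant can be sketched as follows. This is only an illustration: Box, BoundsPadding and applyPadding() are hypothetical names, and the padding factor is an arbitrary value, not Ogre's actual padding rule.

```cpp
// Hypothetical types for illustration only (not Ogre classes).
struct Box { float center[3]; float halfSize[3]; };

// The call site now says what it wants instead of passing true/false.
enum class BoundsPadding { None, Padded };

inline Box applyPadding(const Box& bounds, BoundsPadding padding)
{
    // Illustrative value only; not Ogre's actual padding rule.
    const float padFactor = 0.25f;

    Box result = bounds;
    if (padding == BoundsPadding::Padded)
    {
        for (float& h : result.halfSize)
            h *= 1.0f + padFactor;
    }
    return result;
}
```

A call such as applyPadding(box, BoundsPadding::Padded) can be read without opening the header, which is the whole point of replacing the boolean.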
Now the performance.
I separately measured the call to map(), the memory write and the call to unmap().
All the time (about 4 ms!) was spent in map().
The memory write was 0.000 ms.
unmap() was about 0.001 ms (sometimes 0.01 ms).
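For anyone who wants to repeat the measurement, the three steps can be timed separately with something like the sketch below. measureMs(), UpdateTimings and measureUpdate() are hypothetical helpers written for this post; the three callables are placeholders for the real map(), memory write and unmap() calls, which are not shown here.

```cpp
#include <chrono>

// Runs 'fn' once and returns the elapsed wall time in milliseconds.
template <typename Fn>
double measureMs(Fn&& fn)
{
    const auto start = std::chrono::steady_clock::now();
    fn();
    const auto end = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::milli>(end - start).count();
}

// Per-step timings of one buffer update (hypothetical helper type).
struct UpdateTimings
{
    double mapMs;
    double writeMs;
    double unmapMs;
};

// Times the three steps in order: map, write, unmap.
template <typename MapFn, typename WriteFn, typename UnmapFn>
UpdateTimings measureUpdate(MapFn&& mapFn, WriteFn&& writeFn, UnmapFn&& unmapFn)
{
    UpdateTimings t;
    t.mapMs   = measureMs(mapFn);
    t.writeMs = measureMs(writeFn);
    t.unmapMs = measureMs(unmapFn);
    return t;
}
```

These are CPU-side wall times only, so a long map() may just mean the call waited on something else (driver, GPU, swap chain) rather than doing 4 ms of work itself.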
So maybe there is some problem when the Item with the dynamic mesh is rendered?
I commented out the line that attached the Item to the SceneNode.
No difference.
I tried upload() instead of map() + memory write + unmap().
No difference.
But I checked the source and it calls map() + memory write + unmap().
So it's not a surprise.
Maybe there could be a native implementation for upload()?
OpenGL 2.0 has glBufferSubData(), not sure about Direct3D 11.
map() is much more complex as it needs to expose some internal memory.
Then I tried different flags for the vertex buffer
(the sample uses BT_DYNAMIC_PERSISTENT, and UO_UNMAP_ALL for unmap()).
No difference.
Then I tried changing the thread count for the SceneManager.
No difference.
I went back to the point where ManualObject and VertexBuffer were both updated
and the vertex buffer update took 0.002 ms.
The vertex buffer was updated AFTER the ManualObject.
I reversed the order: first the vertex buffer update, then the ManualObject.
And suddenly...
The ManualObject update was taking 0.002 ms, and the vertex buffer update was taking 4 ms!
This means that:
1. there is a problem with the first call to map() in a frame.
2. the ManualObject update is as fast as a direct vertex buffer update.
Now back to the visible dynamic mesh.
I tweaked the compositor: temp cubemap and shadow map for point light shadow.
750 FPS and suddenly map() was 0.005 to 0.02 ms.
I added a second compositor so I could switch between them at run time.
I switched them, updated the dynamic mesh and it was consistent and repeatable:
400 FPS -> map() was 2 ms
750 FPS -> map() was 0.005 to 0.02 ms
Then I checked OpenGL.
It had the same problem but vertex buffer performance was somewhat better.
300 FPS -> map() was 1 ms
600 FPS -> map() was stable 0.001 ms
Then I checked Ogre 2.2.1.
Same behavior as 2.2.5.
Back to 2.2.5.
So far I had been using Ogre mainly in windowed mode.
I tried fullscreen mode.
And suddenly in Direct3D 11 at 400 FPS map() was taking the same time as at 750 FPS in windowed mode:
0.005 to 0.02 ms.
With OpenGL map() was still taking 1 ms in fullscreen mode.
Then I tweaked the compositors to check OpenGL in windowed mode again.
550 FPS -> map() was 0.5 ms
600 FPS -> map() was 0.001 ms
And finally I tweaked the compositors to check Direct3D at a much lower FPS: about 100 FPS.
windowed mode -> map() took 8 ms
fullscreen mode -> map() took 0.04 ms
Conclusion:
There is a performance issue in the BufferPacked::map() method
when overall FPS (changed by altering the compositor) is below SOME threshold.
Fullscreen mode fixes this issue for Direct3D but not for OpenGL.