[2.1]Ram usage with custom Vaos

Post by Hilarius86 »

VS2017, Win10x64, Ogre-next (2.1)

If I generate a VertexBuffer with BT_DYNAMIC_DEFAULT, the size gets multiplied by 3.

Code: Select all

void D3D11VaoManager::allocateVbo(...)
if( bufferType >= BT_DYNAMIC_DEFAULT )
{
  bufferType  = BT_DYNAMIC_DEFAULT; //Persistent mapping not supported in D3D11.
  sizeBytes   *= mDynamicBufferMultiplier;
}
Why does that happen? I ported from 1.11 and my RAM usage seems to have tripled. I kept the workflow of generating a mesh, submesh and vertex buffer, locking/mapping the buffer and filling it with data followed by unmapping mostly unchanged, apart from using the new API. If I switch to a different buffer type below dynamic, I would have to use upload instead of mapping and restructure the code. Just wanting to make sure that's reasonable before moving forward.

Next thing is destruction of manual objects that are not loaded from disk. The RAM usage is quite high and is not reduced by the same magnitude when destroying objects. I followed the trail and saw that my VBOs get deallocated after a few frames (destroyDelayedBuffers) and are repurposed as free buffers. I don't mind the performance hit of reallocating, but it seems unintuitive to me that the RAM usage stays high even after deallocation of the VBOs.
For some more background: the program does not create new objects while rendering; it halts rendering to change the scene and only resumes after finishing. There is no logic loop; rendering only starts with mouse input.

Re: [2.1]Ram usage with custom Vaos

Post by dark_sylinc »

Hilarius86 wrote: Thu Oct 14, 2021 4:18 pm VS2017, Win10x64, Ogre-next (2.1)

If I generate a VertexBuffer with BT_DYNAMIC_DEFAULT, the size gets multiplied by 3.

Code: Select all

void D3D11VaoManager::allocateVbo(...)
if( bufferType >= BT_DYNAMIC_DEFAULT )
{
  bufferType  = BT_DYNAMIC_DEFAULT; //Persistent mapping not supported in D3D11.
  sizeBytes   *= mDynamicBufferMultiplier;
}
Why does that happen? I ported from 1.11 and my RAM usage seems to have tripled. I kept the workflow of generating a mesh, submesh and vertex buffer, locking/mapping the buffer and filling it with data followed by unmapping mostly unchanged, apart from using the new API.
BT_DYNAMIC_DEFAULT is optimized for data that is rewritten from scratch every frame. It's rare to have a lot of this type of data except for particle FX.

We triple the buffer size (or double it, if you set "VaoManager::mDynamicBufferMultiplier" in miscSettings to 2 for double buffering) because Ogre manually synchronizes the data to prevent arbitrary stalls; Ogre 1.x let the driver handle that synchronization.
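
To make that concrete, this is roughly the per-frame pattern BT_DYNAMIC_DEFAULT is designed for (a sketch only; vaoManager, vertexElements and numVertices are placeholder names, not taken from your code). The multiplier exists so that while the GPU still reads frame N, the CPU can already write frame N+1 into a different region:

Code: Select all

// Sketch: a dynamic vertex buffer that is rewritten from scratch every frame.
// With mDynamicBufferMultiplier == 3 the VaoManager reserves 3x the requested
// size, so each frame maps a different region and never waits for the GPU.
Ogre::VertexBufferPacked *dynBuffer =
    vaoManager->createVertexBuffer( vertexElements, numVertices,
                                    Ogre::BT_DYNAMIC_DEFAULT,
                                    0, false /*keepAsShadow*/ );

// Every frame, before rendering:
float *dst = reinterpret_cast<float*>(
    dynBuffer->map( 0, dynBuffer->getNumElements() ) );
// ... write ALL the vertices for this frame into dst ...
dynBuffer->unmap( Ogre::UO_KEEP_PERSISTENT );
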
Hilarius86 wrote: Thu Oct 14, 2021 4:18 pm If I switch to a different buffer type below dynamic, I would have to use upload instead of mapping and restructure the code. Just wanting to make sure that's reasonable before moving forward.
If your code was structured to map the buffer, then I recommend you simply use a malloc'ed memory ptr:

Code: Select all

// Old code
void *data = buffer->map( ... );
// write to data
buffer->unmap();

// New code
void *data = OGRE_MALLOC_SIMD( ..., MEMCATEGORY_GEOMETRY );
// write to data
buffer->upload( data, 0, numElements );
OGRE_FREE_SIMD( data, MEMCATEGORY_GEOMETRY ); // Or leave the pointer around for reuse later
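
For example, the full path for a static buffer could look like this (a sketch under assumptions: vaoManager, vertexElements, numVertices and bytesPerVertex are placeholder names; the relevant calls are VaoManager::createVertexBuffer and BufferPacked::upload):

Code: Select all

// Sketch: build the vertex data on the CPU, then hand it to a BT_DEFAULT buffer.
const size_t sizeBytes = numVertices * bytesPerVertex;
float *cpuData = reinterpret_cast<float*>(
    OGRE_MALLOC_SIMD( sizeBytes, Ogre::MEMCATEGORY_GEOMETRY ) );
// ... fill cpuData exactly as you used to fill the mapped pointer ...

Ogre::VertexBufferPacked *vertexBuffer =
    vaoManager->createVertexBuffer( vertexElements, numVertices,
                                    Ogre::BT_DEFAULT,
                                    cpuData, false /*keepAsShadow*/ );

// Occasional later updates go through upload() instead of map()/unmap():
vertexBuffer->upload( cpuData, 0, numVertices );

OGRE_FREE_SIMD( cpuData, Ogre::MEMCATEGORY_GEOMETRY ); // or keep it for reuse
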
Hilarius86 wrote: Thu Oct 14, 2021 4:18 pm Next thing is destruction of manual objects that are not loaded from disk. The RAM usage is quite high and is not reduced by the same magnitude when destroying objects. I followed the trail and saw that my VBOs get deallocated after a few frames (destroyDelayedBuffers) and are repurposed as free buffers. I don't mind the performance hit of reallocating, but it seems unintuitive to me that the RAM usage stays high even after deallocation of the VBOs.
For some more background: the program does not create new objects while rendering; it halts rendering to change the scene and only resumes after finishing. There is no logic loop; rendering only starts with mouse input.
Memory is not released back to the OS immediately.
We have VaoManager::cleanupEmptyPools to release memory immediately, but it can be slow as it must perform a defragmentation step.

You can also fine tune "VaoManager::VERTEX_IMMUTABLE", "VaoManager::VERTEX_DEFAULT" et al. (see c_vboTypes in OgreD3D11VaoManager.cpp) to control how much memory the pools allocate by default. You can get a good estimate of how much you'll need by watching the VaoManager::getMemoryStats output.
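
As a minimal sketch of where such a call would go (assuming root is your Ogre::Root and the scene edit happens outside of rendering; getMemoryStats also exists to log pool usage, but its exact signature varies between versions, so check OgreVaoManager.h):

Code: Select all

// Sketch: after a bulk destroy of manual meshes/buffers, ask the VaoManager
// to defragment and release its now-empty pools (Ogre 2.2+, can be slow).
Ogre::VaoManager *vaoManager = root->getRenderSystem()->getVaoManager();

// ... destroy your manual meshes / vertex buffers here ...

vaoManager->cleanupEmptyPools();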

I can't remember how much of this was implemented in Ogre 2.1, as memory management improved tremendously in 2.2. We have the following samples for fine tuning and monitoring memory:
  1. Samples/2.0/Tutorials/Tutorial_Memory
  2. Samples/2.0/Tests/MemoryCleanup
  3. Samples/2.0/Tests/TextureResidency (Ogre 2.2+)
Last but not least, I assume most of your issues are coming from geometry? Ogre 2.1 has poor texture memory management because it loads all textures as soon as their materials are parsed; this is usually the main cause of memory consumption in Ogre 2.1-based applications, and it was fixed in Ogre 2.2.
For debugging texture memory issues, see the manual section.

Cheers

Re: [2.1]Ram usage with custom Vaos

Post by Hilarius86 »

Your tip about the minimally invasive local buffer sounds very good, as I can test whether that works as expected before looking into a proper refactor.

Sadly, VaoManager::cleanupEmptyPools was only added in 2.2, but it will certainly give me some hints to investigate further.
We are currently discussing whether this moment and our current problems warrant moving on from 2.1 to a more recent version. As you answered DOS in a different post, we are also facing the problems with initialising lots of resources and the resulting texture load times.
The question is: what is the impact of migrating to 2.2, and what is the impact and timeline for a 2.3 release?

Re: [2.1]Ram usage with custom Vaos

Post by dark_sylinc »

If you're using textures mostly through scripts, porting to 2.2 is straightforward.

If you're using textures directly in C++ you need to do a bit more work, but it is still relatively easy (see the manual; make sure to read that section, as it describes the changes).
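
To give a feel for the scale of the change (a sketch only, not the full porting guide; the texture name, flags and the root variable are placeholders), a 2.1-style TextureManager lookup becomes a TextureGpuManager call plus an explicit residency transition in 2.2:

Code: Select all

// Sketch of the Ogre 2.2+ path for loading a texture from C++.
Ogre::TextureGpuManager *textureMgr =
    root->getRenderSystem()->getTextureGpuManager();

Ogre::TextureGpu *texture = textureMgr->createOrRetrieveTexture(
    "MyDiffuse.png",                                  // placeholder name
    Ogre::GpuPageOutStrategy::Discard,
    Ogre::TextureFlags::AutomaticBatching,
    Ogre::TextureTypes::Type2D,
    Ogre::ResourceGroupManager::AUTODETECT_RESOURCE_GROUP_NAME );

// Unlike 2.1, textures start out of GPU memory; request residency explicitly.
texture->scheduleTransitionTo( Ogre::GpuResidency::Resident );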

If you are doing advanced compositor scripts you may want to take advantage of the new load/store semantics (and use RenderDoc to check you're using them correctly), but that is not mandatory.

If you're using textures directly in C++ a lot and you also rely on RenderTarget listeners, you'll have to do more work and port those listeners to equivalent CompositorWorkspace listeners.

All in all it's not a big amount of work, but how much depends on your codebase. If you need help sorting out how to port, feel free to ask.

Moving from 2.2 to 2.3 will be straightforward, as the main change is Vulkan, and we have the changes documented in a GitHub ticket. I am planning to release 2.3 soon (the naming poll has started!).

Re: [2.1]Ram usage with custom Vaos

Post by Hilarius86 »

Thanks for the hints. We have a few spots where we generate/modify textures via C++. From looking over the manual it looks like most classes have direct replacements, so we'll just start with a drop-in substitution, build it and go from there (if we decide to move forward at the moment).
Also nice to hear that you are making good progress on Vulkan.

Re: [2.1]Ram usage with custom Vaos

Post by dark_sylinc »

Yes! Btw, I forgot to mention that the old texture code is in the "Deprecated" folder.

If you include that path, your old includes will work (e.g. #include "OgreTexture.h"). Although your program will not link, this is extremely useful when replacing in steps, as the cpp file will compile correctly while you're replacing the old methods: e.g. replace #include "OgreTexture.h" with #include "OgreTextureGpu.h", fix the errors, then repeat with OgreTextureManager.h -> OgreTextureGpuManager.h, then remove HardwarePixelBuffer, etc.

Re: [2.1]Ram usage with custom Vaos

Post by DimA »

It would be great to have an additional buffer type with fast CPU/GPU read access, optimized for platforms with unified memory, especially iOS.
Many tasks require CPU access to geometry data (geometry LOD generation, precise picking, collision detection, various geometry-analysis algorithms). For example, in our app Live Home 3D we analyze mesh geometry to automatically detect large planes for planar reflections.
Of course you can have shadow buffers or even shadow meshes (we use them currently) for these purposes, but that is a waste of memory which is unacceptable on mobile platforms like iOS. Indeed, iOS devices, even the most modern ones, have a very limited memory model because they do not support system virtual memory; an iOS app may only get up to 60% of the total physical device RAM!
Thanks to unified memory, Metal supports MTLStorageModeShared buffers. For static buffers that you write once and read many times, it could be better to use MTLStorageModeShared buffers.
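
For context, the shadow-buffer workaround described above boils down to this (a sketch; vaoManager, vertexElements, numVertices and cpuData are placeholders, and getShadowCopy should be double-checked against your Ogre version's BufferPacked header):

Code: Select all

// Sketch of the current workaround: keepAsShadow = true keeps a CPU copy
// alongside the GPU buffer so geometry can be read back at any time,
// at the cost of storing the data twice.
// Note: with keepAsShadow = true, Ogre takes ownership of cpuData
// (it must have been allocated with OGRE_MALLOC_SIMD).
Ogre::VertexBufferPacked *vertexBuffer =
    vaoManager->createVertexBuffer( vertexElements, numVertices,
                                    Ogre::BT_IMMUTABLE,
                                    cpuData, true /*keepAsShadow*/ );

const void *readableCopy = vertexBuffer->getShadowCopy();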

Re: [2.1]Ram usage with custom Vaos

Post by dark_sylinc »

Hi!

I opened ticket 238 to avoid derailing this conversation.

I can't currently implement this (no time or interest), but as described in the ticket it should be perfectly possible to support it; it may actually be easy if we're lucky. If you so desire, you can attempt to tackle the issue.