[2.2] improving performance

Discussion area about developing with Ogre-Next (2.1, 2.2 and beyond)


cc9cii
Greenskin
Posts: 103
Joined: Tue Sep 18, 2018 4:53 am
x 20

Re: [2.2] improving performance

Post by cc9cii »

dark_sylinc wrote: Wed Jul 22, 2020 5:19 pm Is this Release or Debug performance?
cc9cii wrote: Wed Jul 22, 2020 8:57 am I noticed that there is a profiler class - would it be useful to get an insight into why I'm getting such poor performance? If so how do I go about using that?
Build Ogre with the CMake setting OGRE_PROFILING_PROVIDER set to either 'remotery' or 'offline'

and add:

Code:

#if OGRE_PROFILING
    // Only compiled in when Ogre was built with a profiling provider.
    Ogre::Profiler::getSingleton().setEnabled( true );
#    if OGRE_PROFILING == OGRE_PROFILING_INTERNAL
    Ogre::Profiler::getSingleton().endProfile( "" );
#    endif
#    if OGRE_PROFILING == OGRE_PROFILING_INTERNAL_OFFLINE
    // 'offline' provider: set where the per-frame and accumulated dumps are written on shutdown.
    Ogre::Profiler::getSingleton().getOfflineProfiler().setDumpPathsOnShutdown(
        mWriteAccessFolder + "ProfilePerFrame",
        mWriteAccessFolder + "ProfileAccum" );
#    endif
#endif
The 'offline' one will generate exhaustive CSV files (if Ogre is compiled with OGRE_PROFILING_EXHAUSTIVE it gets even more exhaustive).

Remotery is realtime, and you can watch it by opening index.html from ogre-next-deps/src/Remotery/vis in your browser
Hi,

I've been doing some profiling using chrome://tracing, as per "Chrome Tracing as Profiler Frontend", using hrydgard/minitrace. In my case, both Ogre 1.10 and Ogre 2.2 are taking 80%+ of the wall clock time, i.e. there is little to gain from optimising my app logic, other than how it uses Ogre to set up rendering. Since D3D9 can only run on a single thread, I guess I have to use D3D11 (i.e. Ogre 2.2).
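
For anyone unfamiliar with this kind of instrumentation, here is a minimal sketch of the idea in plain C++. It is not minitrace's API: TraceRecorder and ScopedTrace are hypothetical names made up for illustration. It records "complete" events (ph set to "X", with ts and dur in microseconds) and serialises them as a JSON array in the Trace Event format that chrome://tracing loads. It assumes a single thread and event names that need no JSON escaping.

```cpp
#include <chrono>
#include <cstddef>
#include <cstdint>
#include <sstream>
#include <string>
#include <vector>

// One "complete" event (ph == "X") in the Chrome Trace Event format.
struct TraceEvent
{
    std::string name;
    std::int64_t startUs;     // microseconds since the recorder was created
    std::int64_t durationUs;  // how long the scope took, in microseconds
};

// Hypothetical helper: collects events in memory and serialises them to JSON.
class TraceRecorder
{
public:
    TraceRecorder() : mOrigin( std::chrono::steady_clock::now() ) {}

    // Microseconds elapsed since this recorder was constructed.
    std::int64_t nowUs() const
    {
        return std::chrono::duration_cast<std::chrono::microseconds>(
                   std::chrono::steady_clock::now() - mOrigin )
            .count();
    }

    void add( const std::string &name, std::int64_t startUs, std::int64_t durationUs )
    {
        mEvents.push_back( TraceEvent{ name, startUs, durationUs } );
    }

    // JSON array of events. Names are written verbatim (assumed not to need escaping);
    // pid and tid are fixed at 1 because this sketch is single-threaded.
    std::string toJson() const
    {
        std::ostringstream out;
        out << "[";
        for( std::size_t i = 0; i < mEvents.size(); ++i )
        {
            const TraceEvent &e = mEvents[i];
            out << ( i ? "," : "" ) << "{\"name\":\"" << e.name
                << "\",\"ph\":\"X\",\"pid\":1,\"tid\":1,\"ts\":" << e.startUs
                << ",\"dur\":" << e.durationUs << "}";
        }
        out << "]";
        return out.str();
    }

private:
    std::chrono::steady_clock::time_point mOrigin;
    std::vector<TraceEvent> mEvents;
};

// RAII scope: records one event covering its own lifetime.
class ScopedTrace
{
public:
    ScopedTrace( TraceRecorder &recorder, const char *name ) :
        mRecorder( recorder ), mName( name ), mStartUs( recorder.nowUs() )
    {
    }
    ~ScopedTrace() { mRecorder.add( mName, mStartUs, mRecorder.nowUs() - mStartUs ); }

private:
    TraceRecorder &mRecorder;
    std::string mName;
    std::int64_t mStartUs;
};
```

Typical use would be to put a `ScopedTrace` at the top of each function or block of interest, write `toJson()` to a file at exit, and load that file in chrome://tracing.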

A zoomed-in image is attached below. The green sections are my app, the rest is Ogre 2.2

Image

So I tried to add profiling to Ogre. I've tried both offline and remotery. Offline profiling works, but it is difficult to visualise what is going on. I built some rudimentary Excel formulas to display stack growth, etc, but still not very satisfactory.

Remotery looks promising, but how are you meant to interpret the results? It seems to present the same results as the CSV, except real-time and prettier looking.

EDIT: deleted linker error issues (now resolved) and added Remotery pic

Image

This one is Sample_PbsMaterials - looks similar other than the number of scene?


Image
dark_sylinc
OGRE Team Member
Posts: 5446
Joined: Sat Jul 21, 2007 4:55 pm
Location: Buenos Aires, Argentina
x 1348

Re: [2.2] improving performance

Post by dark_sylinc »

Something is seriously wrong in your computer or build (no offense!)

According to the captures you uploaded, the PbsMaterials sample is running at only 30 fps or less; however, according to PassMark, that GPU is twice as fast as my old Radeon HD 7770 and even faster than my main Radeon RX 560, and the latter gets me 500-700 fps in PbsMaterials.

There is a lot of unaccounted time in "Command Execution" which implies something in the driver or GPU. Maybe for some reason the GPU is permanently stuck in low power mode.

And the fact that Ogre demos don't run in RenderDoc is a huge red flag.

PS: Unrelated to this: You may get better performance with:

Code:

pbs->setOptimizationStrategy( ConstBufferPool::LowerGpuOverhead );
unlit->setOptimizationStrategy( ConstBufferPool::LowerGpuOverhead );
However you have a massive (unknown) problem for which you're getting very low performance.
Normally with something as big as this I'd recommend formatting the drive and reinstalling Windows.

Perhaps installing the latest drivers fixes it. Otherwise start enumerating all the installed apps in your system and start uninstalling the most suspicious ones. Also look for weird stuff at boot in msconfig.exe

Open the DirectX Control Panel (look for dxcpl.exe; it must be somewhere on your system. If your system is too old, perhaps you have it as directx.cpl) and make sure apps are not being forced to run under the debug layer.

Check your power profile isn't set to maximize battery. And if you've got Optimus, check the app isn't being redirected to the Intel GPU (though not even Intel GPUs get such low performance of <30fps in PbsMaterials)
cc9cii
Greenskin
Posts: 103
Joined: Tue Sep 18, 2018 4:53 am
x 20

Re: [2.2] improving performance

Post by cc9cii »

dark_sylinc wrote: Thu Jul 30, 2020 4:30 pm There is a lot of unaccounted time in "Command Execution" which implies something in the driver or GPU. Maybe for some reason the GPU is permanently stuck in low power mode.

And the fact that Ogre demos don't run in RenderDoc is a huge red flag.
Hi,

I mentioned it in an earlier post - perhaps you missed it - I have to manually set the GPU to be Nvidia for each application else they usually default to Intel iGpu. Having said that:
  1. I do not wish to limit the application to using Nvidia dGpu only. So I have to find a way to increase performance for iGpu.
  2. Using the same iGpu, Ogre 1.10/D3D9 runs faster. In fact, it runs faster with the Nvidia dGpu as well (which is why I'm trying to figure out what is holding back the Ogre 2.2/D3D11 build). Both are running at the same 1600x900 resolution.
Maybe Intel's D3D11 driver is just plain bad? Even if so, it still doesn't explain why D3D9 on a single thread runs faster (~120 FPS) than two-thread D3D11 (~100 FPS) on the same dGpu using Nvidia's drivers (but see the question below regarding v1 renderables).
Perhaps installing the latest drivers fixes it. Otherwise start enumerating all the installed apps in your system and start uninstalling the most suspicious ones. Also look for weird stuff at boot in msconfig.exe
I've updated the drivers already (I tried the latest drivers from both Lenovo as well as Nvidia direct). I didn't see any differences in performance. I've disabled most things for auto startup.
Check your power profile isn't set to maximize battery. And if you've got Optimus, check the app isn't being redirected to the Intel GPU (though not even Intel GPUs get such low performance of <30fps in PbsMaterials)
Sorry, that was misleading due to the screen resolution - it was set at 4k, which slowed it down. If I set it to 1080p, for example, the demo runs faster. But not crazy fast like you mentioned. With Nvidia dGpu @1080p it is getting around 220 - 250 FPS (with iGpu around 90 - 100 FPS). The power setting is at "better performance" - putting it at maximum "best performance" does not change the FPS at all.
PS: Unrelated to this: You may get better performance with:

Code:

pbs->setOptimizationStrategy( ConstBufferPool::LowerGpuOverhead );
unlit->setOptimizationStrategy( ConstBufferPool::LowerGpuOverhead );
This does increase performance somewhat, but it is most noticeable when using the Nvidia dGpu (say from ~80 FPS to close to 100 FPS). But what I'm now noticing is that the "Main Ogre Thread" spends a lot of time in "V1 Renderable update". Does this indicate that if I convert all the v1::Mesh to v2 Mesh I might see an improvement in performance?
al2950
OGRE Expert User
Posts: 1227
Joined: Thu Dec 11, 2008 7:56 pm
Location: Bristol, UK
x 157

Re: [2.2] improving performance

Post by al2950 »

cc9cii wrote: Fri Jul 31, 2020 4:02 am Maybe Intel's D3D11 driver is just plain bad? Even if so it still doesn't explain why D3D9 on a single thread runs faster (~120 FPS) than 2 thread D3D11 (~100 FPS) on the same dGpu using Nvidia's drivers. (but - see below question regarding v1 renderables)
TBH you are comparing apples with pears. No one really knows what's happening in the drivers, and Ogre 2.2 uses some quite advanced, hardware-specific stuff with Dx11 that is not available in Dx9. So if there is a problem, you have no idea how it might manifest itself. Also, I had an Optimus laptop and it worked fine for everything except graphics dev work... it drove me nuts! At this point I would HIGHLY recommend you test your app on another platform, preferably a desktop, just so you can rule out some of the problems.
cc9cii wrote: Fri Jul 31, 2020 4:02 am But what I'm now noticing is that the "Main Ogre Thread" spends a lot of time in "V1 Renderable update". Does this indicate that if I convert all the v1::Mesh to v2 Mesh I might see an improvement in performance?
In short, yes, but the size of the perf gains really depends on a number of factors. Also, I cannot remember if MyGUI is using V1 Renderable as well. (I really need to sort MyGUI out for Ogre 2.2)