[2.2] improving performance
-
- Greenskin
- Posts: 103
- Joined: Tue Sep 18, 2018 4:53 am
- x 20
[2.2] improving performance
Hi,
Currently I'm seeing about 50% of the performance compared to the one based on Ogre 1.10 / D3D9 (i.e. it is actually slower with Ogre 2.2 / D3D11).
Are there any checklists or frequent newbie mistakes that I should watch out for? One of the key reasons for the porting to Ogre 2.x was the potential increase in performance and I would like to figure out what is going on.
I should note also that 2.2 is a little slower than 2.1 as well.
Many thanks as always,
EDIT: running MSVC profiler I can see that there's hardly any CPU usage (and the notable use is when generating the tangent vectors). The vast majority of the GPU time is spent on "DrawIndexedInstanced". Not sure what that means, but must be unique for D3D11 since running the same scenario with Ogre 1.10/D3D9 most of the GPU time is spent on "GPU Work". It's interesting that D3D9 shows more even thread usage than D3D11.
-
- OGRE Team Member
- Posts: 5446
- Joined: Sat Jul 21, 2007 4:55 pm
- Location: Buenos Aires, Argentina
- x 1348
Re: [2.2] improving performance
cc9cii wrote: ↑Tue Jul 21, 2020 9:17 am EDIT: running MSVC profiler I can see that there's hardly any CPU usage (and the notable use is when generating the tangent vectors). The vast majority of the GPU time is spent on "DrawIndexedInstanced". Not sure what that means, but must be unique for D3D11 since running the same scenario with Ogre 1.10/D3D9 most of the GPU time is spent on "GPU Work". It's interesting that D3D9 shows more even thread usage than D3D11.
Ahhh I understand what's happening now.
You have a GPU bottleneck.
Ogre 2.1+ shaders are PBS (Physically Based Shading), which are much more expensive than 1.x's which mimic Fixed Function Pipeline.
PBS gives better lighting quality, but the assets you're using were made for an old pipeline.
What I can suggest is that you turn all materials to PbsBrdf::BlinnPhongLegacyMath or BlinnPhongFullLegacy which is the fastest mode we have and looks closer to what 1.x's used to look like and regain some of that lost performance.
Or you can stick to the current BRDF and pair with an artist to "modernize" (aka HD remaster/remake) the assets.
Edit: If you're using Debug, please note D3D11 Debug has very expensive validation layers going on.
-
- Greenskin
- Posts: 103
- Joined: Tue Sep 18, 2018 4:53 am
- x 20
Re: [2.2] improving performance
Hi,
I tried adding the setting below when creating the materials (I'm only using Pbs at the moment):
Code: Select all
//pbsDatablock->setBrdf(Ogre::PbsBrdf::PbsBrdf::BlinnPhongLegacyMath);
pbsDatablock->setBrdf(Ogre::PbsBrdf::PbsBrdf::BlinnPhongFullLegacy);
In both cases the performance is better, but still *significantly* slower than Ogre 1.10/D3D9. Also, the lighting is *super* bright, especially the "full legacy" one.
I noticed that there is a profiler class - would it be useful for getting an insight into why I'm getting such poor performance? If so, how do I go about using it?
I would also like to examine batch counts, etc, but the getBatchCount() and getTriangleCount() methods have disappeared. Is there another way of getting this info?
-
- Ogre Magi
- Posts: 1172
- Joined: Mon Aug 04, 2008 7:51 pm
- Location: Manchester - England
- x 76
Re: [2.2] improving performance
See here viewtopic.php?f=25&t=83155 specifically viewtopic.php?p=548153#p548153
-
- OGRE Team Member
- Posts: 5446
- Joined: Sat Jul 21, 2007 4:55 pm
- Location: Buenos Aires, Argentina
- x 1348
Re: [2.2] improving performance
Is this Release or Debug performance?
Build Ogre with the CMake setting OGRE_PROFILING_PROVIDER set to either 'remotery' or 'offline', and add:
Code: Select all
#if OGRE_PROFILING
    Ogre::Profiler::getSingleton().setEnabled( true );
#    if OGRE_PROFILING == OGRE_PROFILING_INTERNAL
    Ogre::Profiler::getSingleton().endProfile( "" );
#    endif
#    if OGRE_PROFILING == OGRE_PROFILING_INTERNAL_OFFLINE
    Ogre::Profiler::getSingleton().getOfflineProfiler().setDumpPathsOnShutdown(
        mWriteAccessFolder + "ProfilePerFrame",
        mWriteAccessFolder + "ProfileAccum" );
#    endif
#endif
The 'offline' one will generate exhaustive CSV files (if Ogre is compiled with OGRE_PROFILING_EXHAUSTIVE it gets even more exhaustive).
Remotery is realtime, and you can watch it by opening index.html from ogre-next-deps/src/Remotery/vis in your browser.
As Zonder said, the batch and triangle counts were moved to RenderSystem::getMetrics()
-
- Greenskin
- Posts: 103
- Joined: Tue Sep 18, 2018 4:53 am
- x 20
Re: [2.2] improving performance
Hi,
RelWithDebInfo - close enough to Release for measuring performance, and I'm using the same setting for both 2.2.3 and 1.10.11.
Zonder wrote: ↑Wed Jul 22, 2020 2:40 pm See here viewtopic.php?f=25&t=83155 specifically viewtopic.php?p=548153#p548153
Thank you for this. I've enabled it and I can see the draw count varying between 130 and 400 depending on the complexity of the scene (this does not change whether using BlinnPhongLegacyMath or default Pbs), but the batch count is zero!
Is there some different batching setup for Ogre 2.2? The application uses v1::StaticGeometry wherever possible, and with Ogre 1.10 / D3D9 the batch count reported was always 100+ (unless the batch count means something different now?).
Last edited by cc9cii on Wed Jul 22, 2020 11:29 pm, edited 1 time in total.
-
- OGRE Team Member
- Posts: 5446
- Joined: Sat Jul 21, 2007 4:55 pm
- Location: Buenos Aires, Argentina
- x 1348
Re: [2.2] improving performance
What are the performance numbers like? e.g. one vs the other?
What's your Hardware?
-
- Greenskin
- Posts: 103
- Joined: Tue Sep 18, 2018 4:53 am
- x 20
Re: [2.2] improving performance
With Ogre 2.2.3 / D3D11 getting something like 30 - 80 FPS (depends on the scene and whether using legacy lighting).
With Ogre 1.10.11 / D3D9 getting around 70 - 160 (same scene as above).
Hardware is a laptop running Nvidia Quadro P1000 (also have 6 cores so CPU is not really an issue)
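FPS ranges like the ones above are often easier to compare as frame times, because a slowdown costs a fixed number of milliseconds rather than a fixed number of FPS. A minimal sketch of that conversion; the helper names `frameTimeMs` and `addedCostMs` are made up for illustration and are not part of Ogre:

```cpp
#include <cassert>

// Hypothetical helper (not part of Ogre): converts frames per second
// into the time spent on a single frame, in milliseconds.
double frameTimeMs( double fps )
{
    assert( fps > 0.0 && "FPS must be positive" );
    return 1000.0 / fps;
}

// Extra milliseconds spent per frame when dropping from fpsBefore to fpsAfter.
double addedCostMs( double fpsBefore, double fpsAfter )
{
    return frameTimeMs( fpsAfter ) - frameTimeMs( fpsBefore );
}
```

Assuming the upper ends of both ranges describe the same scene, going from 160 FPS to 80 FPS means each frame takes 12.5 ms instead of 6.25 ms, i.e. 6.25 ms of extra work per frame.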
-
- OGRE Team Member
- Posts: 5446
- Joined: Sat Jul 21, 2007 4:55 pm
- Location: Buenos Aires, Argentina
- x 1348
Re: [2.2] improving performance
Oh, that's a major perf slowdown then.
I wonder if it has to do with the morph / SW skeleton, since that path is not well tested; and it could be causing stalls (the CPU waiting for the GPU).
-
- Greenskin
- Posts: 103
- Joined: Tue Sep 18, 2018 4:53 am
- x 20
Re: [2.2] improving performance
Just tested with all poses commented out and the performance is not much better - maybe 3-5 FPS gain?
Just wondering why the batch count is 0. If it is being reported correctly it could explain the loss of performance.
-
- OGRE Team Member
- Posts: 5446
- Joined: Sat Jul 21, 2007 4:55 pm
- Location: Buenos Aires, Argentina
- x 1348
Re: [2.2] improving performance
Batch count is never filled. It makes no sense in Ogre 2.2 except when using very legacy systems.
The closest to Ogre 1.x's batch count is 2.2's mDrawCount.
-
- Greenskin
- Posts: 103
- Joined: Tue Sep 18, 2018 4:53 am
- x 20
Re: [2.2] improving performance
Ok, I think you are right - the reported numbers are very similar.
Looking at the MSVC's profile outputs I think the issue is in the GPU. The CPU is hardly doing anything.
What tools are available to see the details of what is actually happening inside the GPU? At the moment all I get is "DrawIndexedInstanced".
EDIT: just to add some context
I've invested over a month into porting to Ogre 2.2 now. I would like to get something out of this investment. At the moment it's all gone backwards (only a few things are working and slow) but I hope I can overcome the issues.
EDIT2: is face count roughly the same as the old triangle count? I'm asking because the face count is a lot smaller.
-
- OGRE Team Member
- Posts: 5446
- Joined: Sat Jul 21, 2007 4:55 pm
- Location: Buenos Aires, Argentina
- x 1348
Re: [2.2] improving performance
Sometimes GPU-Z may reveal something (e.g. if a particular GPU sensor is at 100%)
Could you upload a RenderDoc capture from an angle that is performing poorly? The RenderDoc capture is not useful for profiling but it may still tell us something if there is something obviously wrong that could be eating all the performance.
This is definitely strange. Normally you get 4x the performance of Ogre 1.x; and based on the videos you showed about the game it should be rendering at least at 500 fps or more unless it has a ridiculous amount of vertices.
Yes, the face count should be the same as the old triangle count.
-
- Greenskin
- Posts: 103
- Joined: Tue Sep 18, 2018 4:53 am
- x 20
Re: [2.2] improving performance
I have never used RenderDoc so I will need to download/install and figure out how to use it first
I've attached a couple of screenshots (the Ogre 2.2 one is done using <Alt><PrtSc> because screen capture with the new texture code is not yet working).
Ogre 1.10.11 / D3D9: triangles 187481, batches 683
Ogre 2.2.3 / D3D11: face count 31756, draw count 515 (this one is with default Pbs lights)
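One way to sanity-check the two sets of counters is to compare the average number of primitives per draw call. The sketch below only does that division; the helper name is hypothetical, and it assumes the 1.10 "batches" figure and the 2.2 "draw count" are roughly comparable, as suggested earlier in the thread:

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper: average number of primitives submitted per draw call.
double primitivesPerDraw( std::size_t primitiveCount, std::size_t drawCount )
{
    assert( drawCount > 0 && "need at least one draw call" );
    return static_cast<double>( primitiveCount ) / static_cast<double>( drawCount );
}
```

With the figures above this gives roughly 274 triangles per batch for Ogre 1.10.11 (187481 / 683) against roughly 62 faces per draw for Ogre 2.2.3 (31756 / 515). The totals differ far more than the draw counts do, which may mean the two counters are not covering the same geometry in this scene.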
-
- OGRE Expert User
- Posts: 1148
- Joined: Sat Jul 06, 2013 10:59 pm
- Location: Chile
- x 169
Re: [2.2] improving performance
Just to keep hopes high: when I ported from 1.9 to 2.1 I got a huge performance boost, from ~30 to ~80 fps in the VR app I was working on at the time, plus way better graphics
(of course 1.9 is different from 1.10 and there might be many other factors (maybe I was using 1.9 wrong, I was very much a noob at the time), but still)
when I ported from 2.1 to 2.2 I didn't notice any difference
but when I changed from OGL3+ to D3D11 I also got a little increase, from something like ~80 to ~85
I got the rest of the performance I needed when the VR optimizations were implemented, now it's not an issue
in pancake mode (non VR) I get 400++ fps in a simple scene like this one (simple.. yet MSAA x4, shadowmaps PCF 6x6, pbs with many extra customizations, etc):
is there any logic calculation in your graphics thread?
edit: I have a Nvidia 1070
Saludos!
-
- Greenskin
- Posts: 103
- Joined: Tue Sep 18, 2018 4:53 am
- x 20
Re: [2.2] improving performance
In that scene shown in the attached pics all we have is mostly static geometry, and since it is an interior with the same walls, floor, etc, not many materials. There are about 6 skeletons that are moved by CPU (skinning is done auto-magically by Pbs shader it seems, since I commented out the vertex morph code).
And really, 180k triangles is nothing. External scenes will regularly get 1M+ and in some areas much more.
I do hope I get similar results as you, but it is very slow going.
Cheers,
-
- Greenskin
- Posts: 103
- Joined: Tue Sep 18, 2018 4:53 am
- x 20
Re: [2.2] improving performance
I can't seem to run RenderDoc on my laptop. As soon as it is executed, and even after closing, the D3D11 device is locked(? just guessing) and Ogre will crash each time while in D3D11Device::ReleaseAll() until the laptop is restarted. Tried attaching to a running process and that won't capture anything for some reason.
Stack trace in case it is useful:
Code: Select all
d3d11.dll!NDXGI::CDevice::EscapeCB() Unknown Non-user code. Symbols loaded.
igdml64.dll!00007ffa27c77d11() Unknown Non-user code. Cannot find or open the PDB file.
igdml64.dll!00007ffa27c6fd31() Unknown Non-user code. Cannot find or open the PDB file.
igd10iumd64.dll!00007ffa893d7eba() Unknown Non-user code. Cannot find or open the PDB file.
d3d11.dll!NDXGI::CDevice::DestroyDriverInstance() Unknown Non-user code. Symbols loaded.
d3d11.dll!CContext::LUCBeginLayerDestruction() Unknown Non-user code. Symbols loaded.
d3d11.dll!CD3D11LayeredChild<struct ID3D11DeviceChild,class NDXGI::CDevice,64>::LUCBeginLayerDestruction(void) Unknown Non-user code. Symbols loaded.
d3d11.dll!CUseCountedObject<NOutermost::CDeviceChild>::`scalar deleting destructor'() Unknown Non-user code. Symbols loaded.
d3d11.dll!CUseCountedObject<class NOutermost::CDeviceChild>::UCDestroy(void) Unknown Non-user code. Symbols loaded.
d3d11.dll!CUseCountedObject<class NOutermost::CDeviceChild>::UCReleaseUse(void) Unknown Non-user code. Symbols loaded.
d3d11.dll!CDevice::LLOBeginLayerDestruction() Unknown Non-user code. Symbols loaded.
d3d11.dll!NDXGI::CDevice::LLOBeginLayerDestruction() Unknown Non-user code. Symbols loaded.
d3d11.dll!NOutermost::CDevice::LLOBeginLayerDestruction(void) Unknown Non-user code. Symbols loaded.
d3d11.dll!TComObject<NOutermost::CDevice>::~TComObject<NOutermost::CDevice>() Unknown Non-user code. Symbols loaded.
d3d11.dll!TComObject<NOutermost::CDevice>::`scalar deleting destructor'() Unknown Non-user code. Symbols loaded.
d3d11.dll!TComObject<class NOutermost::CDevice>::Release(void) Unknown Non-user code. Symbols loaded.
> [Inline Frame] RenderSystem_Direct3D11.dll!Ogre::ComPtr<ID3D11Device1>::InternalRelease() Line 108 C++ Symbols loaded.
[Inline Frame] RenderSystem_Direct3D11.dll!Ogre::ComPtr<ID3D11Device1>::Reset() Line 233 C++ Symbols loaded.
RenderSystem_Direct3D11.dll!Ogre::D3D11Device::ReleaseAll() Line 73 C++ Symbols loaded.
RenderSystem_Direct3D11.dll!Ogre::D3D11RenderSystem::createDevice(const std::string & windowTitle) Line 1624 C++ Symbols loaded.
RenderSystem_Direct3D11.dll!Ogre::D3D11RenderSystem::_initialise(bool autoCreateWindow, const std::string & windowTitle) Line 762 C++ Symbols loaded.
OgreMain.dll!Ogre::Root::initialise(bool autoCreateWindow, const std::string & windowTitle, const std::string & customCapabilitiesConfig) Line 788 C++ Symbols loaded.
openmw.exe!OEngine::Render::OgreRenderer::createWindow(const std::string & title, const OEngine::Render::WindowSettings & settings) Line 129 C++ Symbols loaded.
Last edited by cc9cii on Sat Jul 25, 2020 9:26 am, edited 1 time in total.
-
- Gnoll
- Posts: 659
- Joined: Mon Aug 06, 2007 12:53 pm
- Location: Saarland, Germany
- x 63
Re: [2.2] improving performance
Hi cc9cii,
Could you paste your Ogre initialization code, especially how you create the scene manager (how many threads)?
Best Regards
Lax
-
- Greenskin
- Posts: 103
- Joined: Tue Sep 18, 2018 4:53 am
- x 20
Re: [2.2] improving performance
Only 1 thread (mainly because I wanted to compare with Ogre 1.10):
Code: Select all
// Register MyGUI's compositor pass provider unless one is already set
mPassProvider.reset(new MyGUI::OgreCompositorPassProvider());
Ogre::CompositorManager2* compositorManager = Ogre::Root::getSingleton().getCompositorManager2();
if (!compositorManager->getCompositorPassProvider())
    compositorManager->setCompositorPassProvider(mPassProvider.get());
// Scene manager created with a single worker thread
mScene = mRoot->createSceneManager(Ogre::ST_GENERIC, 1, "OpenMW");
// Camera, moved from its default parent to a new child node of the root
mCamera = mScene->createCamera("cam");
mCamera->detachFromParent();
Ogre::SceneNode* rootNode = mScene->getRootSceneNode();
Ogre::SceneNode* childNode = rootNode->createChildSceneNode();
childNode->attachObject(mCamera);
mCamera->setNearClipDistance(0.5f);
mCamera->setFarClipDistance( 10000.0f );
mCamera->setAutoAspectRatio( true );
// Full white ambient light on both hemispheres
mScene->setAmbientLight(Ogre::ColourValue::White, Ogre::ColourValue::White, Ogre::Vector3::UNIT_Y);
// One directional light
Ogre::SceneNode* lightNode = mScene->getRootSceneNode()->createChildSceneNode();
Ogre::Light* light = mScene->createLight();
light->setName("OpenMW");
lightNode->attachObject(light);
light->setType(Ogre::Light::LT_DIRECTIONAL);
Ogre::Vector3 vec(-0.3f, -0.3f, -0.3f);
vec.normalise();
light->setDirection(vec);
On another note, I got Nvidia's NSight to run (Ogre keeps crashing with RenderDoc). Oddly enough, sometimes it runs faster with NSight! But when it is running poorly, there are a lot of map/unmap commands, and during that time there are no draw commands - I'm guessing these must be loading textures, but why so many? Anyway, I need to figure out how to understand what the profiler is telling me.
EDIT: Tried increasing the thread count to 3 but no improvement in performance.
Still not sure why it runs faster under NSight.
EDIT2: Maybe my material cache is not working correctly and each and every material is treated as different? Maybe that's why there are so many texture loads? But that doesn't make sense since I'm using createOrRetrieveTexture() and pbsDatablock->setTexture()
-
- OGRE Expert User
- Posts: 1227
- Joined: Thu Dec 11, 2008 7:56 pm
- Location: Bristol, UK
- x 157
Re: [2.2] improving performance
Sorry to hear you are having issues, must be irritating, but like many people here I can promise you your efforts will be rewarded.
Firstly if you have issues with a tool like renderdoc, try and re-produce it with an Ogre sample, as it will help us re-produce it. Or any issue for that matter. I just tried the latest renderdoc with the latest Ogre (2.2.4), and all seems to work fine.
Secondly there might be an issue with MYGUI, so I would be tempted to disable that for the time being.
Thirdly .... You can get massive performance gains with Ogre 2.2+, but there are some gotchas. You mentioned batching, but Ogre 2.1+ actually automatically batches your draw calls together, so when you load a mesh or a texture it tries to put it into a buffer with other textures or meshes. However, for example, if all your textures are different sizes or formats, etc, it will end up creating a new texture array for each texture which could cause issues, and may even cause you to run out of VRAM.
So Ogre 2.2+ can be extremely fast, but there are some important gotchas that you need to be aware of.. Hopefully we can get to the bottom of yours quickly
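The point about texture sizes and formats can be illustrated with a toy model. The sketch below is not Ogre's actual pooling logic; `TextureDesc` and `groupIntoArrays` are hypothetical names, and the model assumes that only textures with identical width, height and format can share an array:

```cpp
#include <cstddef>
#include <map>
#include <string>
#include <tuple>
#include <vector>

// Simplified, hypothetical description of a texture. A real engine tracks
// more properties than these three.
struct TextureDesc
{
    unsigned width;
    unsigned height;
    std::string format;  // e.g. "BC1", "BC3", "RGBA8"
};

using ArrayKey = std::tuple<unsigned, unsigned, std::string>;

// Groups textures by (width, height, format) and returns how many textures
// fall into each group. In this toy model every group needs its own array.
std::map<ArrayKey, std::size_t> groupIntoArrays( const std::vector<TextureDesc> &textures )
{
    std::map<ArrayKey, std::size_t> groups;
    for( const TextureDesc &tex : textures )
        ++groups[std::make_tuple( tex.width, tex.height, tex.format )];
    return groups;
}
```

In this model, three 512x512 BC1 textures end up in one array, while three textures that each differ in size or format end up in three separate arrays, which is the situation described above.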
-
- Greenskin
- Posts: 103
- Joined: Tue Sep 18, 2018 4:53 am
- x 20
Re: [2.2] improving performance
On my machine, running RenderDoc once will make *all* Ogre sample projects crash until the laptop is restarted. I suspect drivers, but don't know for sure. I'm also on Ogre 2.2.3 if that makes any difference.
I am hopeful since launching my app using NSight makes it go faster - so the potential is there, but obviously it is not being set up properly. Strangely, exiting NSight leaves the app running, and it is still running faster...
-
- OGRE Team Member
- Posts: 5446
- Joined: Sat Jul 21, 2007 4:55 pm
- Location: Buenos Aires, Argentina
- x 1348
Re: [2.2] improving performance
In my experience when RenderDoc crashes on all D3D11 apps, it's because you have some weird 3rd party app (either that came with your laptop to "enhance" the experience, or something you installed afterwards) which forcefully hooks into all D3D11 apps, causing performance and stability problems.
Apps known to do this are MSI Afterburner, Mumble (only when running), Nomachine (only when running), Plays.tv
Old drivers can also be a cause.
-
- Gnoll
- Posts: 659
- Joined: Mon Aug 06, 2007 12:53 pm
- Location: Saarland, Germany
- x 63
Re: [2.2] improving performance
cc9cii wrote: Only 1 thread (mainly because I wanted to compare with Ogre 1.10):
Isn't this one of the issues? You are trying to compare with 1.10, but Ogre 2.x works totally differently... So I would test with as many threads as processors. Maybe that is your bottleneck? But it's just an assumption.
-
- Greenskin
- Posts: 103
- Joined: Tue Sep 18, 2018 4:53 am
- x 20
Re: [2.2] improving performance
Lax wrote: ↑Fri Jul 24, 2020 5:56 pm Isn't this one of the issues? You are trying to compare with 1.10, but Ogre 2.x works totally differently... So I would test with as many threads as processors. Maybe that is your bottleneck? But it's just an assumption.
Hi, I did try with more threads but unfortunately I didn't see any improvement. In any case, performance with Ogre 2.2 shouldn't go down by so much, no?
dark_sylinc wrote: ↑Fri Jul 24, 2020 2:20 pm In my experience when RenderDoc crashes on all D3D11 apps
Just to be clear, it is Ogre that is crashing, not any other apps.
EDIT: @dark_sylinc, I can see around 200 map/unmap calls (which I assume to be texture loading) each and every frame - is that normal? When I look at frame profile of the Pbs demo, I can only see a few map/unmap.
EDIT2: Some more info after examining the frames with NSight - more than half the frame time is spent on loading textures and drawing a few HUD overlay items (health bars, local map, etc). There must be something wrong with the way I'm doing these overlay elements. (but that doesn't explain why it runs faster when launched with NSight)
-
- Greenskin
- Posts: 103
- Joined: Tue Sep 18, 2018 4:53 am
- x 20
Re: [2.2] improving performance
Sorry about so many messages in this thread.
First some good news - depending on what is happening in the scene, approx 25% to 50%+ of the frame time is spent on drawing MyGUI widgets, which are used for HUD elements. That code needs a rewrite or at least quite a bit of optimisation. If anyone has an efficient way of doing HUD elements with Ogre 2.2, please share some hints on how to go about it.
Next, the not so good news. Even with HUD elements disabled, the performance is still poor (no gain from removing the HUD is seen, either). But if the app is launched with NSight, the performance improves and the additional gain from disabling the HUD elements can be seen. So something NSight does while launching the app provides the extra performance, but I don't see what it could be. If there are things I can check, please feel free to suggest anything. EDIT: This one is resolved - apparently Nvidia Optimus on my laptop was choosing the Intel integrated graphics, whereas NSight forced the dGPU to be used. Similarly, if I force both RenderDoc and the target application (e.g. Ogre's samples) to use the Nvidia GPU I no longer get the crash. ("resolved" is being kind - Ogre 2.2 / D3D11 running on a dGPU is still getting fewer FPS than Ogre 1.10 / D3D9 running on integrated graphics, even with the MyGUI HUD stuff disabled... but like everyone mentioned I have a lot of tuning to do to extract the full potential of 2.2, so enough complaining and time to get things done! Onwards and upwards as they say.)
EDIT: comparing the frame between 2.1 and 2.2, I've noticed that in 2.1 two shaders are active at one time (I don't know if this is the right way to describe it, I'll attach some pics to illustrate) but in 2.2 only one at a time - is there some setting I have to do differently in 2.2? pls ignore the nonsense
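For the Optimus issue described above, a commonly used workaround on Windows is to export two well-known global variables from the executable, which hint to NVIDIA and AMD drivers that the discrete GPU should be preferred. This is only a sketch and not something confirmed as used in this thread; the `APP_EXPORT` macro is a made-up name, and whether the hint is honored depends on the driver and its profile settings:

```cpp
// These symbols must be compiled into the executable itself (not into a DLL)
// to have any effect.
#ifdef _WIN32
#    define APP_EXPORT __declspec( dllexport )
#else
#    define APP_EXPORT  // the hints are only meaningful on Windows
#endif

extern "C"
{
    // Asks NVIDIA Optimus drivers to prefer the discrete GPU.
    APP_EXPORT unsigned long NvOptimusEnablement = 0x00000001;
    // Equivalent hint for AMD switchable graphics.
    APP_EXPORT int AmdPowerXpressRequestHighPerformance = 1;
}
```

The same effect can usually be had without code by assigning the application to the NVIDIA GPU in the driver's control panel, which matches what launching through NSight ended up doing here.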