[2.2] improving performance

Discussion area about developing with Ogre2 branches (2.1, 2.2 and beyond)
cc9cii
Halfling
Posts: 91
Joined: Tue Sep 18, 2018 4:53 am
x 19

[2.2] improving performance

Post by cc9cii »

Hi,

Currently I'm seeing about 50% of the performance of the version based on Ogre 1.10 / D3D9 (i.e. it is actually slower with Ogre 2.2 / D3D11).

Are there any checklists or frequent newbie mistakes that I should watch out for? One of the key reasons for the porting to Ogre 2.x was the potential increase in performance and I would like to figure out what is going on.

I should note also that 2.2 is a little slower than 2.1 as well.

Many thanks as always,

EDIT: running MSVC profiler I can see that there's hardly any CPU usage (and the notable use is when generating the tangent vectors). The vast majority of the GPU time is spent on "DrawIndexedInstanced". Not sure what that means, but must be unique for D3D11 since running the same scenario with Ogre 1.10/D3D9 most of the GPU time is spent on "GPU Work". It's interesting that D3D9 shows more even thread usage than D3D11.

dark_sylinc
OGRE Team Member
Posts: 4501
Joined: Sat Jul 21, 2007 4:55 pm
Location: Buenos Aires, Argentina
x 936

Re: [2.2] improving performance

Post by dark_sylinc »

cc9cii wrote:
Tue Jul 21, 2020 9:17 am
EDIT: running MSVC profiler I can see that there's hardly any CPU usage (and the notable use is when generating the tangent vectors). The vast majority of the GPU time is spent on "DrawIndexedInstanced". Not sure what that means, but must be unique for D3D11 since running the same scenario with Ogre 1.10/D3D9 most of the GPU time is spent on "GPU Work". It's interesting that D3D9 shows more even thread usage than D3D11.
Ahhh I understand what's happening now.

You have a GPU bottleneck.

Ogre 2.1+ shaders are PBS (Physically Based Shading), which is much more expensive than 1.x's shaders, which mimic the Fixed Function Pipeline.
PBS gives better lighting quality, but the assets you're using were made for an old pipeline.

What I can suggest is that you turn all materials to PbsBrdf::BlinnPhongLegacyMath or BlinnPhongFullLegacy which is the fastest mode we have and looks closer to what 1.x's used to look like and regain some of that lost performance.

Or you can stick to the current BRDF and pair with an artist to "modernize" (aka HD remaster/remake) the assets.

Edit: If you're using Debug, please note D3D11 Debug has very expensive validation layers going on.

cc9cii
Halfling
Posts: 91
Joined: Tue Sep 18, 2018 4:53 am
x 19

Re: [2.2] improving performance

Post by cc9cii »

Hi,

I tried adding the setting as per below when I'm creating the materials (I'm only using Pbs at the moment):

Code:

//pbsDatablock->setBrdf(Ogre::PbsBrdf::PbsBrdf::BlinnPhongLegacyMath);
pbsDatablock->setBrdf(Ogre::PbsBrdf::PbsBrdf::BlinnPhongFullLegacy);
In both cases the performance is better, but still *significantly* slower than Ogre 1.10/D3D9. Also, the lighting is *super* bright, especially the "full legacy" one.

I noticed that there is a profiler class - would it be useful to get an insight into why I'm getting such poor performance? If so how do I go about using that?

I would also like to examine batch counts, etc, but getBatchCount() and getTriangleCount() methods have disappeared. Is there another way of getting these info?

Zonder
Ogre Magi
Posts: 1148
Joined: Mon Aug 04, 2008 7:51 pm
Location: Manchester - England
x 64

Re: [2.2] improving performance

Post by Zonder »

cc9cii wrote:
Wed Jul 22, 2020 8:57 am
I would also like to examine batch counts, etc, but getBatchCount() and getTriangleCount() methods have disappeared. Is there another way of getting these info?
See here viewtopic.php?f=25&t=83155 specifically viewtopic.php?p=548153#p548153
There are 10 types of people in the world: Those who understand binary, and those who don't...

dark_sylinc
OGRE Team Member
Posts: 4501
Joined: Sat Jul 21, 2007 4:55 pm
Location: Buenos Aires, Argentina
x 936

Re: [2.2] improving performance

Post by dark_sylinc »

Is this Release or Debug performance?
cc9cii wrote:
Wed Jul 22, 2020 8:57 am
I noticed that there is a profiler class - would it be useful to get an insight into why I'm getting such poor performance? If so how do I go about using that?
Build Ogre with the CMake option OGRE_PROFILING_PROVIDER set to either 'remotery' or 'offline'

and add:

Code:

#if OGRE_PROFILING
        Ogre::Profiler::getSingleton().setEnabled( true );
    #if OGRE_PROFILING == OGRE_PROFILING_INTERNAL
        Ogre::Profiler::getSingleton().endProfile( "" );
    #endif
    #if OGRE_PROFILING == OGRE_PROFILING_INTERNAL_OFFLINE
        Ogre::Profiler::getSingleton().getOfflineProfiler().setDumpPathsOnShutdown(
                    mWriteAccessFolder + "ProfilePerFrame",
                    mWriteAccessFolder + "ProfileAccum" );
    #endif
#endif
The 'offline' one will generate exhaustive CSV files (if Ogre is compiled with OGRE_PROFILING_EXHAUSTIVE it gets even more exhaustive).

Remotery is realtime, and you can watch it by opening index.html from ogre-next-deps/src/Remotery/vis in your browser
cc9cii wrote:
Wed Jul 22, 2020 8:57 am
I would also like to examine batch counts, etc, but getBatchCount() and getTriangleCount() methods have disappeared. Is there another way of getting these info?
As Zonder said, they were moved to RenderSystem::getMetrics()

cc9cii
Halfling
Posts: 91
Joined: Tue Sep 18, 2018 4:53 am
x 19

Re: [2.2] improving performance

Post by cc9cii »

Hi,
dark_sylinc wrote:
Wed Jul 22, 2020 5:19 pm
Is this Release or Debug performance?
RelWithDebInfo - close enough to Release for measuring performance and I'm using the same setting for both 2.2.3 and 1.10.11.
Zonder wrote:
Wed Jul 22, 2020 2:40 pm
cc9cii wrote:
Wed Jul 22, 2020 8:57 am
I would also like to examine batch counts, etc, but getBatchCount() and getTriangleCount() methods have disappeared. Is there another way of getting these info?
See here viewtopic.php?f=25&t=83155 specifically viewtopic.php?p=548153#p548153
Thank you for this. I've enabled it and I can see the draw count varying between 130 - 400 depending on the complexity of the scene (this does not change if using BlinnPhongLegacyMath or default Pbs) but the batch count is zero!

Is there some different batching setup for Ogre 2.2? The application uses v1::StaticGeometry wherever possible and with Ogre 1.10 / D3D9 batch count reported was always 100+. (unless the batch count means something different now?)
Last edited by cc9cii on Wed Jul 22, 2020 11:29 pm, edited 1 time in total.

dark_sylinc
OGRE Team Member
Posts: 4501
Joined: Sat Jul 21, 2007 4:55 pm
Location: Buenos Aires, Argentina
x 936

Re: [2.2] improving performance

Post by dark_sylinc »

What are the performance numbers like, e.g. one vs. the other?
What's your hardware?

cc9cii
Halfling
Posts: 91
Joined: Tue Sep 18, 2018 4:53 am
x 19

Re: [2.2] improving performance

Post by cc9cii »

With Ogre 2.2.3 / D3D11 getting something like 30 - 80 FPS (depends on the scene and whether using legacy lighting).

With Ogre 1.10.11 / D3D9 getting around 70 - 160 (same scene as above).

Hardware is a laptop running Nvidia Quadro P1000 (also have 6 cores so CPU is not really an issue)

dark_sylinc
OGRE Team Member
Posts: 4501
Joined: Sat Jul 21, 2007 4:55 pm
Location: Buenos Aires, Argentina
x 936

Re: [2.2] improving performance

Post by dark_sylinc »

Oh, that's a major perf slowdown then.

I wonder if it has to do with the morph / SW skeleton, since that path is not well tested; and it could be causing stalls (the CPU waiting for the GPU).

cc9cii
Halfling
Posts: 91
Joined: Tue Sep 18, 2018 4:53 am
x 19

Re: [2.2] improving performance

Post by cc9cii »

Just tested with all poses commented out and the performance is not much better - maybe 3-5 FPS gain?

Just wondering why the batch count is 0. If it is being reported correctly it could explain the loss of performance.

dark_sylinc
OGRE Team Member
Posts: 4501
Joined: Sat Jul 21, 2007 4:55 pm
Location: Buenos Aires, Argentina
x 936

Re: [2.2] improving performance

Post by dark_sylinc »

Batch count is never filled. It makes no sense in Ogre 2.2 except when using very legacy systems.

The closest to Ogre 1.x's batch count is 2.2's mDrawCount.

cc9cii
Halfling
Posts: 91
Joined: Tue Sep 18, 2018 4:53 am
x 19

Re: [2.2] improving performance

Post by cc9cii »

Ok, I think you are right - the reported numbers are very similar.

Looking at the MSVC's profile outputs I think the issue is in the GPU. The CPU is hardly doing anything.

What tools are available to see the details of what is actually happening inside the GPU? At the moment all I get is "DrawIndexedInstanced".

EDIT: just to add some context

I've invested over a month into porting to Ogre 2.2 now. I would like to get something out of this investment. At the moment it's all gone backwards (only a few things are working and slow) but I hope I can overcome the issues.

EDIT2: is face count roughly the same as the old triangle count? I'm asking because the face count is a lot smaller.

dark_sylinc
OGRE Team Member
Posts: 4501
Joined: Sat Jul 21, 2007 4:55 pm
Location: Buenos Aires, Argentina
x 936

Re: [2.2] improving performance

Post by dark_sylinc »

Sometimes GPU-Z may reveal something (e.g. if a particular GPU sensor is at 100%)
cc9cii wrote:
Thu Jul 23, 2020 12:37 am
I've invested over a month into porting to Ogre 2.2 now. I would like to get something out of this investment. At the moment it's all gone backwards (only a few things are working and slow) but I hope I can overcome the issues.
This is definitely strange. Normally you get 4x the performance of Ogre 1.x; and based on the videos you showed about the game it should be rendering at least at 500 fps or more unless it has a ridiculous amount of vertices.
cc9cii wrote:
Thu Jul 23, 2020 12:37 am
EDIT2: is face count roughly the same as the old triangle count? I'm asking because the face count is a lot smaller.
Yes, it should be the same.

Could you upload a RenderDoc capture from an angle that is performing poorly? The RenderDoc capture is not useful for profiling but it may still tell us something if there is something obviously wrong that could be eating all the performance.

cc9cii
Halfling
Posts: 91
Joined: Tue Sep 18, 2018 4:53 am
x 19

Re: [2.2] improving performance

Post by cc9cii »

I have never used RenderDoc so I will need to download/install and figure out how to use it first :-)

I've attached a couple of screenshots (the Ogre 2.2 one is done using <Alt><PrtSc> because screen capture with the new texture code is not yet working).

Ogre 1.10.11 / D3D9: triangles 187481, batches 683

Image

Ogre 2.2.3 / D3D11: face count 31756, draw count 515 (this one is with default Pbs lights)

Image

xrgo
OGRE Expert User
Posts: 1147
Joined: Sat Jul 06, 2013 10:59 pm
Location: Chile
x 166

Re: [2.2] improving performance

Post by xrgo »

Just to keep hopes high: when I ported from 1.9 to 2.1 I got a huge performance boost, from ~30 to ~80 fps in the VR app I was working on at the time, plus way better graphics
(of course 1.9 is different from 1.10 and there might be many other factors (maybe I was using 1.9 wrong, I was very much a noob at the time), but still)
when I ported from 2.1 to 2.2 I didn't notice any difference
but when I changed from OGL3+ to D3D11 I also got a small increase, from something like ~80 to ~85
I got the rest of the performance I needed when the VR optimizations were implemented, now it's not an issue

in pancake mode (non VR) I get 400++ fps in a simple scene like this one (simple.. yet MSAA x4, shadowmaps PCF 6x6, pbs with many extra customizations, etc):
Image

is there any logic calculation in your graphics thread?

edit: I have a Nvidia 1070

Saludos!

cc9cii
Halfling
Posts: 91
Joined: Tue Sep 18, 2018 4:53 am
x 19

Re: [2.2] improving performance

Post by cc9cii »

xrgo wrote:
Thu Jul 23, 2020 1:49 pm
is there any logic calculation in your graphics thread?
In that scene shown in the attached pics all we have is mostly static geometry, and since it is an interior with the same walls, floor, etc, not many materials. There are about 6 skeletons that are moved by CPU (skinning is done auto-magically by Pbs shader it seems, since I commented out the vertex morph code).

And really, 180k triangles is nothing. External scenes will regularly get 1M+ and in some areas much more.
xrgo wrote:
Just to keep hopes high: when I ported from 1.9 to 2.1 I got a huge performance boost, from ~30 to ~80 fps in the VR app I was working on at the time, plus way better graphics
(of course 1.9 is different from 1.10 and there might be many other factors (maybe I was using 1.9 wrong, I was very much a noob at the time), but still)
when I ported from 2.1 to 2.2 I didn't notice any difference
but when I changed from OGL3+ to D3D11 I also got a small increase, from something like ~80 to ~85
I got the rest of the performance I needed when the VR optimizations were implemented, now it's not an issue
I do hope I get similar results as you, but it is very slow going.

Cheers,

cc9cii
Halfling
Posts: 91
Joined: Tue Sep 18, 2018 4:53 am
x 19

Re: [2.2] improving performance

Post by cc9cii »

I can't seem to run RenderDoc on my laptop. As soon as it is executed, and even after closing, the D3D11 device is locked(? just guessing) and Ogre will crash each time while in D3D11Device::ReleaseAll() until the laptop is restarted. Tried attaching to a running process and that won't capture anything for some reason.

Stack trace in case it is useful:

Code:

 	d3d11.dll!NDXGI::CDevice::EscapeCB()	Unknown	Non-user code. Symbols loaded.
 	igdml64.dll!00007ffa27c77d11()	Unknown	Non-user code. Cannot find or open the PDB file.
	igdml64.dll!00007ffa27c6fd31()	Unknown	Non-user code. Cannot find or open the PDB file.
 	igd10iumd64.dll!00007ffa893d7eba()	Unknown	Non-user code. Cannot find or open the PDB file.
 	d3d11.dll!NDXGI::CDevice::DestroyDriverInstance()	Unknown	Non-user code. Symbols loaded.
 	d3d11.dll!CContext::LUCBeginLayerDestruction()	Unknown	Non-user code. Symbols loaded.
 	d3d11.dll!CD3D11LayeredChild<struct ID3D11DeviceChild,class NDXGI::CDevice,64>::LUCBeginLayerDestruction(void)	Unknown	Non-user code. Symbols loaded.
 	d3d11.dll!CUseCountedObject<NOutermost::CDeviceChild>::`scalar deleting destructor'()	Unknown	Non-user code. Symbols loaded.
 	d3d11.dll!CUseCountedObject<class NOutermost::CDeviceChild>::UCDestroy(void)	Unknown	Non-user code. Symbols loaded.
 	d3d11.dll!CUseCountedObject<class NOutermost::CDeviceChild>::UCReleaseUse(void)	Unknown	Non-user code. Symbols loaded.
 	d3d11.dll!CDevice::LLOBeginLayerDestruction()	Unknown	Non-user code. Symbols loaded.
 	d3d11.dll!NDXGI::CDevice::LLOBeginLayerDestruction()	Unknown	Non-user code. Symbols loaded.
 	d3d11.dll!NOutermost::CDevice::LLOBeginLayerDestruction(void)	Unknown	Non-user code. Symbols loaded.
 	d3d11.dll!TComObject<NOutermost::CDevice>::~TComObject<NOutermost::CDevice>()	Unknown	Non-user code. Symbols loaded.
 	d3d11.dll!TComObject<NOutermost::CDevice>::`scalar deleting destructor'()	Unknown	Non-user code. Symbols loaded.
 	d3d11.dll!TComObject<class NOutermost::CDevice>::Release(void)	Unknown	Non-user code. Symbols loaded.
>	[Inline Frame] RenderSystem_Direct3D11.dll!Ogre::ComPtr<ID3D11Device1>::InternalRelease() Line 108	C++	Symbols loaded.
 	[Inline Frame] RenderSystem_Direct3D11.dll!Ogre::ComPtr<ID3D11Device1>::Reset() Line 233	C++	Symbols loaded.
 	RenderSystem_Direct3D11.dll!Ogre::D3D11Device::ReleaseAll() Line 73	C++	Symbols loaded.
 	RenderSystem_Direct3D11.dll!Ogre::D3D11RenderSystem::createDevice(const std::string & windowTitle) Line 1624	C++	Symbols loaded.
 	RenderSystem_Direct3D11.dll!Ogre::D3D11RenderSystem::_initialise(bool autoCreateWindow, const std::string & windowTitle) Line 762	C++	Symbols loaded.
 	OgreMain.dll!Ogre::Root::initialise(bool autoCreateWindow, const std::string & windowTitle, const std::string & customCapabilitiesConfig) Line 788	C++	Symbols loaded.
 	openmw.exe!OEngine::Render::OgreRenderer::createWindow(const std::string & title, const OEngine::Render::WindowSettings & settings) Line 129	C++	Symbols loaded.

Last edited by cc9cii on Sat Jul 25, 2020 9:26 am, edited 1 time in total.

Lax
Orc
Posts: 476
Joined: Mon Aug 06, 2007 12:53 pm
Location: Saarland, Germany
x 30

Re: [2.2] improving performance

Post by Lax »

Hi cc9cii,

Could you paste your Ogre initialization code, especially how you create the scene manager (how many threads).

Best Regards
Lax
Image
http://www.lukas-kalinowski.com/Homepage/?page_id=1631
Please support Second Earth Technic Base built of Lego bricks for Lego ideas: https://ideas.lego.com/projects/81b9bd1 ... b97b79be62
Image

cc9cii
Halfling
Posts: 91
Joined: Tue Sep 18, 2018 4:53 am
x 19

Re: [2.2] improving performance

Post by cc9cii »

Only 1 thread (mainly because I wanted to compare with Ogre 1.10):

Code:

    mPassProvider.reset(new MyGUI::OgreCompositorPassProvider());

    Ogre::CompositorManager2* compositorManager = Ogre::Root::getSingleton().getCompositorManager2();

    if (!compositorManager->getCompositorPassProvider())
        compositorManager->setCompositorPassProvider(mPassProvider.get());

    mScene = mRoot->createSceneManager(Ogre::ST_GENERIC, 1, "OpenMW");

    mCamera = mScene->createCamera("cam");
    mCamera->detachFromParent();
    Ogre::SceneNode* rootNode = mScene->getRootSceneNode();
    Ogre::SceneNode* childNode = rootNode->createChildSceneNode();
    childNode->attachObject(mCamera);

    mCamera->setNearClipDistance(0.5f);
    mCamera->setFarClipDistance( 10000.0f );
    mCamera->setAutoAspectRatio( true );

    mScene->setAmbientLight(Ogre::ColourValue::White, Ogre::ColourValue::White, Ogre::Vector3::UNIT_Y);
    Ogre::SceneNode* lightNode = mScene->getRootSceneNode()->createChildSceneNode();
    Ogre::Light* light = mScene->createLight();
    light->setName("OpenMW");
    lightNode->attachObject(light);
    light->setType(Ogre::Light::LT_DIRECTIONAL);
    Ogre::Vector3 vec(-0.3f, -0.3f, -0.3f);
    vec.normalise();
    light->setDirection(vec);
On another note, I got Nvidia's NSight to run (Ogre keeps crashing with RenderDoc). Oddly enough, sometimes it runs faster with NSight! But when it is running poorly, there are a lot of map/unmap commands and during that time there are no draw commands - I'm guessing these must be loading textures, but why so many? Anyway, I need to figure out how to interpret what the profiler is telling me.

EDIT: Tried increasing the thread count to 3 but no improvement in performance.

Still not sure why it runs faster under NSight.

EDIT2: Maybe my material cache is not working correctly and each and every material is treated as different? Maybe that's why there are so many texture loads? But that doesn't make sense since I'm using createOrRetrieveTexture() and pbsDatablock->setTexture()

al2950
OGRE Expert User
Posts: 1221
Joined: Thu Dec 11, 2008 7:56 pm
Location: Bristol, UK
x 154

Re: [2.2] improving performance

Post by al2950 »

Sorry to hear you are having issues, must be irritating, but like many people here I can promise you your efforts will be rewarded.

Firstly, if you have issues with a tool like RenderDoc, try to reproduce them with an Ogre sample, as that will help us reproduce them too. The same goes for any other issue. I just tried the latest RenderDoc with the latest Ogre (2.2.4), and all seems to work fine.

Secondly, there might be an issue with MyGUI, so I would be tempted to disable that for the time being.

Thirdly .... You can get massive performance gains with Ogre 2.2+, but there are some gotchas. You mentioned batching, but Ogre 2.1+ actually automatically batches your draw calls together, so when you load a mesh or a texture it tries to put it into a buffer with other textures or meshes. However, for example, if all your textures are different sizes or formats, etc, it will end up creating a new texture array for each texture which could cause issues, and may even cause you to run out of VRAM.

So Ogre 2.2+ can be extremely fast, but there are some important gotchas that you need to be aware of.. Hopefully we can get to the bottom of yours quickly :)

cc9cii
Halfling
Posts: 91
Joined: Tue Sep 18, 2018 4:53 am
x 19

Re: [2.2] improving performance

Post by cc9cii »

On my machine, running RenderDoc once will make *all* Ogre sample projects crash until the laptop is restarted. I suspect drivers, but don't know for sure. I'm also on Ogre 2.2.3 if that makes any difference.

I am hopeful since launching my app using NSight makes it go faster - so the potential is there, but obviously it is not being set up properly. Strangely, exiting NSight leaves the app running, and it is still running faster...

dark_sylinc
OGRE Team Member
Posts: 4501
Joined: Sat Jul 21, 2007 4:55 pm
Location: Buenos Aires, Argentina
x 936

Re: [2.2] improving performance

Post by dark_sylinc »

In my experience when RenderDoc crashes on all D3D11 apps, it's because you have some weird 3rd party app (either that came with your laptop to "enhance" the experience, or something you installed afterwards) which forcefully hooks into all D3D11 apps, causing performance and stability problems.

Apps known to do this are MSI Afterburner, Mumble (only when running), Nomachine (only when running), Plays.tv

Old drivers can also be a cause.

Lax
Orc
Posts: 476
Joined: Mon Aug 06, 2007 12:53 pm
Location: Saarland, Germany
x 30

Re: [2.2] improving performance

Post by Lax »

cc9cii wrote:
Only 1 thread (mainly because I wanted to compare with Ogre 1.10):
Isn't this one of the issues? You are trying to compare with 1.10, but Ogre 2.x works totally differently... So I would test with as many threads as processors. Maybe that is your bottleneck? But it's just an assumption.

cc9cii
Halfling
Posts: 91
Joined: Tue Sep 18, 2018 4:53 am
x 19

Re: [2.2] improving performance

Post by cc9cii »

Lax wrote:
Fri Jul 24, 2020 5:56 pm
Only 1 thread (mainly because I wanted to compare with Ogre 1.10):
Isn't this one of the issues? You are trying to compare with 1.10, but Ogre 2.x works totally differently... So I would test with as many threads as processors. Maybe that is your bottleneck? But it's just an assumption.
Hi, I did try with more threads but unfortunately I didn't see any improvement. In any case, performance with Ogre 2.2 shouldn't go down by so much, no?
dark_sylinc wrote:
Fri Jul 24, 2020 2:20 pm
In my experience when RenderDoc crashes on all D3D11 apps
Just to be clear, it is Ogre that is crashing not any other apps.

EDIT: @dark_sylinc, I can see around 200 map/unmap calls (which I assume to be texture loading) each and every frame - is that normal? When I look at frame profile of the Pbs demo, I can only see a few map/unmap.

EDIT2: Some more info after examining the frames with NSight - more than half the frame time is spent on loading textures and drawing a few HUD overlay items (health bars, local map, etc). There must be something wrong with the way I'm doing these overlay elements. (but that doesn't explain why it runs faster when launched with NSight)

cc9cii
Halfling
Posts: 91
Joined: Tue Sep 18, 2018 4:53 am
x 19

Re: [2.2] improving performance

Post by cc9cii »

Sorry about so many messages in this thread.

First some good news - depending on what is happening in the scene, approx 25% to 50%+ of the frame time is spent on drawing MyGUI widgets, which are used for HUD elements. That code needs a rewrite or at least quite a bit of optimisation. If anyone has an efficient way of doing HUD elements with Ogre 2.2, please share some hints on how to go about it.

Next, not so good news. Even with HUD elements disabled, the performance is still poor (no gain from removing HUD is seen, either). But if the app is launched with NSight, the performance improves and the additional performance from HUD elements being disabled can be seen. So, something NSight is doing while launching the app is providing the extra performance but I don't see what it could be. If there are things I can check, please feel free to suggest anything.

EDIT: This one is resolved - apparently Nvidia Optimus on my laptop was choosing Intel integrated graphics whereas NSight forced the dGPU to be used. Similarly, if I force both RenderDoc and the target application (e.g. Ogre's samples) to use the NVidia GPU I no longer get the crash. ("resolved" is being kind - Ogre 2.2 / D3D11 running on a dGPU is still getting fewer FPS than Ogre 1.10 / D3D9 running on integrated graphics, even with the MyGUI HUD stuff disabled... but like everyone mentioned I have a lot of tuning to do to extract the full potential of 2.2, so enough with complaining and time to get things done! Onwards and upwards as they say.)

EDIT: comparing the frame between 2.1 and 2.2, I've noticed that in 2.1 two shaders are active at one time (I don't know if this is the right way to describe it, I'll attach some pics to illustrate) but in 2.2 only one at a time - is there some setting I have to do differently in 2.2? pls ignore the nonsense
