Profiling restored and integrated Remotery to Ogre 2.1

Discussion area about developing with Ogre-Next (2.1, 2.2 and beyond)


Post Reply
User avatar
dark_sylinc
OGRE Team Member
OGRE Team Member
Posts: 5296
Joined: Sat Jul 21, 2007 4:55 pm
Location: Buenos Aires, Argentina
x 1278
Contact:

Profiling restored and integrated Remotery to Ogre 2.1

Post by dark_sylinc »

If you have been following the commits or my twitter feed, you may have seen that I've been working on restoring profiling.

Note: Due to complex CMake configuration woes, if you have issues while compiling or linking, it is recommended that you perform a clean CMake build if you want to try profiling

For starters, the old Ogre profiling works again. Just set CMake's OGRE_PROFILING_PROVIDER to "internal" (no quotes) and it will just work.

There aren't many changes to the original profiling wiki article. Most notably CompositorWorkspace::setAmalgamatedProfiling was added which collapses all passes into a few elements, if you are only interested in total time spent in each component, rather than time spent in each space.

INTERNAL without CompositorWorkspace::setAmalgamatedProfiling
Internal.jpg
INTERNAL with CompositorWorkspace::setAmalgamatedProfiling
Internal_Amalgamated.jpg
Important: Internal has a hardcoded limit of 100 samples on screen, and it will crash if there's more. Increasing OverlayProfileSessionListener::mMaxDisplayProfiles can workaround that problem.


Remotery
This is where it gets juicy. If you update OgreDeps, you'll notice Remotery was added to it. Run CMake again, compile it and run the install script, as usual.

Now run CMake on Ogre and set OGRE_PROFILING_PROVIDER to "remotery" (no quotes).
An oh boy--- prepare for a lot of detailed information:
Screenshot_2017-10-01_22-28-08.png
Screenshot_2017-10-01_22-29-01.png
HOT!!! I get:
  • What Ogre is doing with each render
  • How long each pass is taking
  • The GPU side of things as well, including its scheduling!
You'll find the browser tool in OgreDeps/src/Remotery/vis/index.html

Btw you can name passes! They're automatically called "CLEAR 152" "SCENE 155" but those are automatically generated names. You can use "profiler_id WhateverNameIWant" in compositor scripts or from C++ code.


How to interpret the data
There is no one way to interpret it. And sometimes you may be missing information (a "hole") because there's likely a missing pair of OgreProfileBegin() and OgreProfileEnd(). Sometimes it's not easy or obvious, but you have to make sense of the data by adding more markers, and making blind educated guesses and try your hypthesis to see if they pan out.
But here's a concrete example:

Forward3D sample has a large GPU bottleneck, which makes it easy. If our theory that the GPU bottleneck is correct then most of the following things should happen:
  • The GPU would lag behind the CPU, up to 3 frames (maximum allowed by Ogre)
  • Ogre, CPU side, waits in non-sensical stuff like at the beginning of the pass (e.g. close around when the first clear happens) or inside swap buffers. Things like clear or swap buffers take very little time CPU side because it's only issueing one or two commands, so it wouldn't make sense to take long inside of it unless we're waiting for the GPU to catch up
And here's the picture:
Bottleneck.png
Notice that frame 2589 spends like 40% of its CPU time inside SwapBuffers!
Also notice that while the CPU is processing frame 2591, the GPU just began starting frame 2589. The GPU is lagging behind by 2-3 frames. This sample is definitely GPU bottlenecked.

This is a picture of a sample that can perfectly reach 60 fps with VSync enabled. Noticed the GPU starts processing the frame immediately after Ogre finished with it:
60fpsVSync.png
Two things we can learn from this picture:
The GPU spends half of the frame "inside swapbuffers". That tells us it's idling showing the picture on screen with nothing to do.
CPU side Ogre spends like 70% or more of its time inside clear. That's a loooot of time for a single command. It seems the driver decides to stall us to wait for the GPU inside the clear. Additionally, it would appear this is double buffer and not triple buffer, because the OpenGL driver is never letting us get ahead by more than 1 frame.

In summary, Remotery is extremely powerful, and it can give you a lot of insight of what your Ogre app is doing, and how its spending its frame time, which will help you diagnose performance issues.

I'm still adding markers so we get more info. And who knows, maybe we get to add per-Renderable markers and find out how long it takes to render each object?

Known issues:
  • Remotery sometimes crashes your application, specially at shutdown. Restarting the browser sometimes helps. It's slightly annoying, but not too much. Doesn't look high priority.
  • If your app is running too fast (e.g. >400 fps) it will generate many samples and your browser's javascript may have issues catching up. Throttling your application can help. Or you can close it; since the browser will catch up.
123iamking
Gremlin
Posts: 152
Joined: Sat Aug 12, 2017 4:16 pm
x 4

Re: Profiling restored and integrated Remotery to Ogre 2.1

Post by 123iamking »

Thank you dark_sylinc, you are awesome! :D
al2950
OGRE Expert User
OGRE Expert User
Posts: 1227
Joined: Thu Dec 11, 2008 7:56 pm
Location: Bristol, UK
x 157

Re: Profiling restored and integrated Remotery to Ogre 2.1

Post by al2950 »

So awesome :D

However there are a bunch of compile problems (Yes I did do a completely fresh cmake build), which can be covered over with unity builds. I think the majority are caused by render system plugins not including OgreProfiler.h.

Also after I got it to compiled and run it with dx11 and it works great for the PBS sample, but when using it with screen space reflections sample I get this somewhat unhelpful output!
remotery.jpg
User avatar
dark_sylinc
OGRE Team Member
OGRE Team Member
Posts: 5296
Joined: Sat Jul 21, 2007 4:55 pm
Location: Buenos Aires, Argentina
x 1278
Contact:

Re: Profiling restored and integrated Remotery to Ogre 2.1

Post by dark_sylinc »

al2950 wrote:Also after I got it to compiled and run it with dx11 and it works great for the PBS sample, but when using it with screen space reflections sample I get this somewhat unhelpful output!
I'll check if it happens very often, but this problem is just Remotery not being able to catch up (or a bug). It's displaying the name hashes instead of the actual names. Is the sample running very fast for you? Sometimes it just fixes up by restarting browser and/or the app.
I don't see a D3D11/GL timeline either, may be you could minimize the sample so Remotery can catch up?

Alternatively try looking in StringTable_Constructor and increase the amount of storage available for names:

Code: Select all

New_1(Buffer, table->text, 8 * 1024);
New_1(rmtHashTable, table->text_map, 1 * 1024);
al2950
OGRE Expert User
OGRE Expert User
Posts: 1227
Joined: Thu Dec 11, 2008 7:56 pm
Location: Bristol, UK
x 157

Re: Profiling restored and integrated Remotery to Ogre 2.1

Post by al2950 »

The timelines are completely useless. The GPU one is missing the and CPU one is empty, or at least I spent a long time searching for something in it to no avail! And the table with the hashes seems to be just garbage. :(

I was testing stuff on my laptop and using the intel card so was only achieving 50fps. I tried on the NVidia card but still had the same issue. So I went through every sample, and all apart from 3 work perfectly (even the simple ones running at 400fps). The 3 that don't work are;
- Screen space reflections
- HDR
- HDR SMAA

Any idea what the connection of those 3 might be!??

PS. All run using D3d11.
User avatar
dark_sylinc
OGRE Team Member
OGRE Team Member
Posts: 5296
Joined: Sat Jul 21, 2007 4:55 pm
Location: Buenos Aires, Argentina
x 1278
Contact:

Re: Profiling restored and integrated Remotery to Ogre 2.1

Post by dark_sylinc »

Yeah the connection is that all of them have large compositor chains, this they must be generating many samples per second.

Remotery has a few buffer size settings. Though I'm convinced there's a subtle bug or race condition in Remotery. We'll eventually nail it
al2950
OGRE Expert User
OGRE Expert User
Posts: 1227
Joined: Thu Dec 11, 2008 7:56 pm
Location: Bristol, UK
x 157

Re: Profiling restored and integrated Remotery to Ogre 2.1

Post by al2950 »

Ah ok, I will dig a little deeper :)
xrgo
OGRE Expert User
OGRE Expert User
Posts: 1148
Joined: Sat Jul 06, 2013 10:59 pm
Location: Chile
x 168

Re: Profiling restored and integrated Remotery to Ogre 2.1

Post by xrgo »

Thank you so much for this!! now I can try to achieve more performance performance performance!!!

But I have a little problem, I renamed my passes (nice feature btw) and...
the compositor nodes that I generate via code like the shadowNode and myGui... I can see the names I setted, But my main rendering node that is defined in a script does not appear.
Image

its defined like this

Code: Select all

compositor_node MainEngineRenderingNode
{
	in 0 renderwindow
	in 1 TerraShadowTexture
	texture renderAux target_width target_height PF_FLOAT16_RGB

	target renderAux
	{

		pass clear
		{
			colour_value 0 0 0 1
			profiler_id MAIN_CLEAR_PASS
		}

		pass render_scene
		{
			expose TerraShadowTexture
			overlays	off
			shadows		MainEngineShadowNode
			profiler_id MAIN_SCENE_PASS
		}
	}

	target renderwindow
	{
		pass render_quad
		{
			material	FinalProcessingMat
			input 0	renderAux
			profiler_id MAIN_FINAL_PROCESSING_QUAD_PASS
		}
		pass render_scene
		{
			lod_update_list off
			profiler_id MAIN_OVERLAY_SCENE_PASS

			//Render Overlays
			overlays	on
			rq_first	254
			rq_last		255
		}
	}
}
btw shadows are very expensive, and why there is a "Forward3d light collect" in the shadow rendering?

edit: also, you can now move the remotery trello card from todo to done! =D
User avatar
dark_sylinc
OGRE Team Member
OGRE Team Member
Posts: 5296
Joined: Sat Jul 21, 2007 4:55 pm
Location: Buenos Aires, Argentina
x 1278
Contact:

Re: Profiling restored and integrated Remotery to Ogre 2.1

Post by dark_sylinc »

xrgo wrote:But I have a little problem, I renamed my passes (nice feature btw) and...
the compositor nodes that I generate via code like the shadowNode and myGui... I can see the names I setted, But my main rendering node that is defined in a script does not appear.
That's probably a bug. Now that I think of it, I don't think I actually tested it.
xrgo wrote:btw shadows are very expensive
Yeah, they usually are. Btw your last shadow map is much more expensive than the rest GPU wise. CPU-wise shadowmap3 took 45% more than shadowmap2, but GPU wise it took 122% more! Either vertex count or shadow map resolution is to blame of the difference.
Maybe you could tune MovableObject::setRenderingDistance? (this reminds me Ogre needs to add a setShadowRenderingDistance). I'm sure that last shadow map has many tiny objects that are being rendered without really contributing to shadows in a meaningful way.
and why there is a "Forward3d light collect" in the shadow rendering?
If I read your screenshot correctly, there is only one "Forward3d light collect" and it's in the same level as your main rendering pass. Scene passes look like this:
  • Main Scene pass
    • ShadowMap pass
      • Frustum Culling
      • Rendering
  • Forward3d light collect (this one's the main pass)
  • Frustum Culling (this one's the main pass)
  • Rendering (this one's the main pass)
Btw remember to click on the frames of the same number for a fair comparison. The picture you took is using frame #47377 from GPU, and frame #47358 from CPU.

Cheers
xrgo
OGRE Expert User
OGRE Expert User
Posts: 1148
Joined: Sat Jul 06, 2013 10:59 pm
Location: Chile
x 168

Re: Profiling restored and integrated Remotery to Ogre 2.1

Post by xrgo »

dark_sylinc wrote:That's probably a bug. Now that I think of it, I don't think I actually tested it.
I bet the fix is coming! xD
dark_sylinc wrote:Either vertex count or shadow map resolution is to blame of the difference.
Maybe you could tune MovableObject::setRenderingDistance? (this reminds me Ogre needs to add a setShadowRenderingDistance). I'm sure that last shadow map has many tiny objects that are being rendered without really contributing to shadows in a meaningful way.
mmm I tried reducing the res to 32px and its the same, then I tried with different objects and noticed that the mere existence of an object in the map causes the time to rise, if I leave just a simple cube I get the problem... I'll keep looking
dark_sylinc wrote:If I read your screenshot correctly, there is only one "Forward3d light collect" and it's in the same level as your main rendering pass
oohh I see now
dark_sylinc wrote:Btw remember to click on the frames of the same number for a fair comparison. The picture you took is using frame #47377 from GPU, and frame #47358 from CPU.
thanks, didn't noticed that

Thank you so much for everything!! you are the best!
al2950
OGRE Expert User
OGRE Expert User
Posts: 1227
Joined: Thu Dec 11, 2008 7:56 pm
Location: Bristol, UK
x 157

Re: Profiling restored and integrated Remotery to Ogre 2.1

Post by al2950 »

Could we remove "Profiler.h" from "Ogre.h", it breaks a lot of 3rd party plugins because they can't find remotery??
aymar
Greenskin
Posts: 145
Joined: Fri Jun 12, 2015 6:53 pm
Location: Florianopolis, Brazil
x 17

Re: Profiling restored and integrated Remotery to Ogre 2.1

Post by aymar »

For me, after deleting the cache, the CMake detects Remotery in both:
Remotery_LIBRARIES_DGB
Remotery_LIBRARIES_REL

But it does not make available OGRE_PROFILING_PROVIDER.

I'm using the latest ogredeps, freshly compiled and the last v2.1 branch.

Am I doing something wrong?
User avatar
dark_sylinc
OGRE Team Member
OGRE Team Member
Posts: 5296
Joined: Sat Jul 21, 2007 4:55 pm
Location: Buenos Aires, Argentina
x 1278
Contact:

Re: Profiling restored and integrated Remotery to Ogre 2.1

Post by dark_sylinc »

You need to tick "Advanced" to see it
User avatar
Kaylx
Greenskin
Posts: 123
Joined: Sun May 22, 2011 10:45 pm
Location: Scotland
x 7

Re: Profiling restored and integrated Remotery to Ogre 2.1

Post by Kaylx »

@dark_sylinc

On a random note...
Why has the name of 2.1 went from Tindalos to 'B' ?
See https://bitbucket.org/sinbad/ogre/commi ... d11f2830e5.
Just curious. :)
al2950
OGRE Expert User
OGRE Expert User
Posts: 1227
Joined: Thu Dec 11, 2008 7:56 pm
Location: Bristol, UK
x 157

Re: Profiling restored and integrated Remotery to Ogre 2.1

Post by al2950 »

Kaylx wrote:@dark_sylinc

On a random note...
Why has the name of 2.1 went from Tindalos to 'B' ?
See https://bitbucket.org/sinbad/ogre/commi ... d11f2830e5.
Just curious. :)
Yes that caught me out as well, as my CMAKE started to fail!

I have fixed the issues that I had above and created a pull request here; (Issue was caused by ProfilerBegin, but no ProfilerEnd)
https://bitbucket.org/sinbad/ogre/pull- ... fixes/diff

Also this Pull Request is of interest (shameful nudge/bump :wink: )
https://bitbucket.org/sinbad/ogre/pull- ... fixes/diff
User avatar
dark_sylinc
OGRE Team Member
OGRE Team Member
Posts: 5296
Joined: Sat Jul 21, 2007 4:55 pm
Location: Buenos Aires, Argentina
x 1278
Contact:

Re: Profiling restored and integrated Remotery to Ogre 2.1

Post by dark_sylinc »

Kaylx wrote:@dark_sylinc

On a random note...
Why has the name of 2.1 went from Tindalos to 'B' ?
See https://bitbucket.org/sinbad/ogre/commi ... d11f2830e5.
Just curious. :)
It's was an old pending business. Everything was being named Tindalos because it wasn't being updated.
See http://www.ogre3d.org/forums/viewtopic.php?f=25&t=88600

While the exact name hasn't been settled (reason is in the posts), it's codename is B.
al2950 wrote: I have fixed the issues that I had above and created a pull request here; (Issue was caused by ProfilerBegin, but no ProfilerEnd)
https://bitbucket.org/sinbad/ogre/pull- ... fixes/diff
Integrated.
al2950 wrote:Also this Pull Request is of interest (shameful nudge/bump :wink: )
https://bitbucket.org/sinbad/ogre/pull- ... fixes/diff
I have yet to check it out. It's a lot of changes and I certainly don't want to push the offset code to the shaders when it's not needed (i.e. penalize users who are writing to the whole screen). Could you change that to be conditional based on a new property? (that is set if viewport.x or viewport.y != 0)
al2950
OGRE Expert User
OGRE Expert User
Posts: 1227
Joined: Thu Dec 11, 2008 7:56 pm
Location: Bristol, UK
x 157

Re: Profiling restored and integrated Remotery to Ogre 2.1

Post by al2950 »

ok thanks :). I will continue conversation re forwardPlus fixes on the pull request its self.
aymar
Greenskin
Posts: 145
Joined: Fri Jun 12, 2015 6:53 pm
Location: Florianopolis, Brazil
x 17

Re: Profiling restored and integrated Remotery to Ogre 2.1

Post by aymar »

Is this:

Code: Select all

Ogre::Profiler::getSingleton().setEnabled(true);
OgreProfileBegin("Ogre Main Loop");
OgreProfileEnd("Ogre Main Loop");
Necessary for remotery to work?
xrgo
OGRE Expert User
OGRE Expert User
Posts: 1148
Joined: Sat Jul 06, 2013 10:59 pm
Location: Chile
x 168

Re: Profiling restored and integrated Remotery to Ogre 2.1

Post by xrgo »

this:

Code: Select all

    OGRE_NEW Ogre::Profiler();
    Ogre::Profiler::getSingleton().setEnabled(true);
al2950
OGRE Expert User
OGRE Expert User
Posts: 1227
Joined: Thu Dec 11, 2008 7:56 pm
Location: Bristol, UK
x 157

Re: Profiling restored and integrated Remotery to Ogre 2.1

Post by al2950 »

Any chance anyone with access could update the remotery version in OgreDeps? Would be useful to have these recent added functions;
  • Display time in milliseconds.
  • aggregate call counts
  • time in self
Post Reply