Note: the CMake configuration changes are complex. If you want to try profiling and run into compile or link errors, perform a clean CMake build.
For starters, the old Ogre profiling works again. Just set CMake's OGRE_PROFILING_PROVIDER to "internal" (no quotes) and it will just work.
Not much has changed since the original profiling wiki article. The most notable addition is CompositorWorkspace::setAmalgamatedProfiling, which collapses all passes into a few elements. Use it if you only care about the total time spent in each component, rather than the time spent in each pass.
[Screenshot: INTERNAL without CompositorWorkspace::setAmalgamatedProfiling]
[Screenshot: INTERNAL with CompositorWorkspace::setAmalgamatedProfiling]
Important: the internal profiler has a hardcoded limit of 100 samples on screen, and it will crash if there are more. Increasing OverlayProfileSessionListener::mMaxDisplayProfiles works around that problem.
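The following standalone sketch shows what "collapsing passes" means in practice. Per-pass timings keyed by names like "SCENE 155" are summed into one total per pass type. This is not Ogre's implementation. PassSample and amalgamate() are hypothetical names. Grouping by the text before the first space is an assumption based on the auto-generated pass names.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// One profiling sample per compositor pass, e.g. {"SCENE 155", 2.1}.
// Hypothetical type, for illustration only.
struct PassSample
{
    std::string name;   // auto-generated "<TYPE> <number>" or a custom name
    double      timeMs; // time spent in that pass
};

// Collapse per-pass samples into one total per pass type by keeping only
// the text before the first space ("SCENE 155" -> "SCENE"). Names without
// a space are kept whole, since substr(0, npos) returns the full string.
std::map<std::string, double> amalgamate( const std::vector<PassSample> &samples )
{
    std::map<std::string, double> totals;
    for( const PassSample &s : samples )
    {
        assert( s.timeMs >= 0.0 ); // timings should never be negative
        const std::string key = s.name.substr( 0, s.name.find( ' ' ) );
        totals[key] += s.timeMs;
    }
    return totals;
}
```

In the engine itself you would simply call setAmalgamatedProfiling on your workspace. The sketch only illustrates why the amalgamated view has far fewer entries.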
Remotery
This is where it gets juicy. If you update OgreDeps, you'll notice Remotery was added to it. Run CMake again, compile it and run the install script, as usual.
Now run CMake on Ogre and set OGRE_PROFILING_PROVIDER to "remotery" (no quotes).
And oh boy, prepare for a lot of detailed information. HOT!!! You get:
- What Ogre is doing with each render
- How long each pass is taking
- The GPU side of things as well, including its scheduling!
Btw, you can name passes! By default they get auto-generated names such as "CLEAR 152" or "SCENE 155". To pick your own, use "profiler_id WhateverNameIWant" in compositor scripts, or set it from C++ code.
How to interpret the data
There is no single way to interpret it. Sometimes information will be missing (a "hole"), most likely because a pair of OgreProfileBegin() and OgreProfileEnd() calls is missing. It isn't always easy or obvious, but you have to make sense of the data by adding more markers, making educated guesses, and testing your hypotheses to see if they pan out.
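A "hole" is time inside a parent marker that none of its child markers account for. The standalone sketch below computes that uncovered time. It also shows the usual C++ way to avoid unpaired Begin/End calls: a scope guard that ends the marker in its destructor, so an early return cannot leave it open. This is not Ogre's profiler. Marker, unaccountedMs() and ScopedMarker are hypothetical names.

```cpp
#include <cassert>
#include <chrono>
#include <string>
#include <utility>
#include <vector>

// A recorded marker: a name plus begin/end timestamps in milliseconds.
struct Marker
{
    std::string name;
    double      beginMs;
    double      endMs;
    double duration() const { return endMs - beginMs; }
};

// Time inside 'parent' not covered by its children (children are assumed
// to be nested inside the parent and not to overlap each other).
// A large result is a "hole": work no Begin/End pair is measuring yet.
double unaccountedMs( const Marker &parent, const std::vector<Marker> &children )
{
    double covered = 0.0;
    for( const Marker &c : children )
    {
        assert( c.beginMs >= parent.beginMs && c.endMs <= parent.endMs );
        covered += c.duration();
    }
    return parent.duration() - covered;
}

// RAII guard: begins in the constructor and always ends in the destructor.
class ScopedMarker
{
public:
    ScopedMarker( std::vector<Marker> &out, std::string name ) :
        mOut( out ), mName( std::move( name ) ), mBeginMs( nowMs() )
    {
    }
    ~ScopedMarker() { mOut.push_back( Marker{ mName, mBeginMs, nowMs() } ); }

    ScopedMarker( const ScopedMarker & ) = delete;
    ScopedMarker &operator=( const ScopedMarker & ) = delete;

private:
    static double nowMs()
    {
        using namespace std::chrono;
        return duration<double, std::milli>( steady_clock::now().time_since_epoch() ).count();
    }

    std::vector<Marker> &mOut;
    std::string          mName;
    double               mBeginMs;
};
```

For example, a 10 ms parent whose only children are a 1 ms clear and a 6 ms scene pass has 3 ms unaccounted for. That is where new markers should go next.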
But here's a concrete example:
The Forward3D sample has a large GPU bottleneck, which makes it an easy case. If our theory of a GPU bottleneck is correct, most of the following should happen:
- The GPU lags behind the CPU by up to 3 frames (the maximum Ogre allows).
- On the CPU side, Ogre waits in places where waiting makes no sense, such as the beginning of the pass (around when the first clear happens) or inside swap buffers. Clear and swap buffers take very little CPU time because they only issue one or two commands, so spending a long time inside them only makes sense if we're waiting for the GPU to catch up.
Also notice that while the CPU is processing frame 2591, the GPU has only just started frame 2589. The GPU is lagging behind by 2-3 frames, so this sample is definitely GPU bottlenecked.
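The reasoning above can be written down as a rough rule of thumb. This sketch assumes you have read the frame indices and the CPU time spent inside clear and swap buffers off the profiler timeline by hand. FrameStats, gpuLagFrames() and looksGpuBound() are hypothetical. The 50% threshold is an arbitrary assumption, not something Ogre or Remotery defines.

```cpp
#include <cassert>
#include <cstdint>

// Numbers read off the profiler timeline for one frame (hypothetical type).
struct FrameStats
{
    std::uint64_t cpuFrame;         // frame the CPU is currently building
    std::uint64_t gpuFrame;         // frame the GPU is currently executing
    double        cpuFrameMs;       // total CPU time for the frame
    double        cpuWaitInCheapMs; // CPU time inside clear + swap buffers
};

// How many frames the GPU trails the CPU (e.g. 2591 vs 2589 -> 2).
std::uint64_t gpuLagFrames( const FrameStats &f )
{
    assert( f.cpuFrame >= f.gpuFrame );
    return f.cpuFrame - f.gpuFrame;
}

// Both symptoms at once: the GPU trails by 2+ frames AND the CPU spends a
// large share of its frame inside calls that should be nearly free.
// The default 0.5 threshold is an assumption; tune it for your app.
bool looksGpuBound( const FrameStats &f, double waitShareThreshold = 0.5 )
{
    assert( f.cpuFrameMs > 0.0 );
    const double waitShare = f.cpuWaitInCheapMs / f.cpuFrameMs;
    return gpuLagFrames( f ) >= 2 && waitShare >= waitShareThreshold;
}
```

A lag of exactly one frame is normal for a double-buffered, VSync-limited app. That is why the sketch only flags a lag of two or more.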
This is a picture of a sample that can comfortably reach 60 fps with VSync enabled. Notice that the GPU starts processing the frame immediately after Ogre has finished with it. Two things we can learn from this picture:
- The GPU spends half of the frame "inside swapbuffers". That tells us it's idle, showing the picture on screen with nothing else to do.
- On the CPU side, Ogre spends 70% or more of its time inside clear. That's a loooot of time for a single command. It seems the driver stalls us inside the clear to wait for the GPU. Additionally, this appears to be double buffering rather than triple buffering, because the OpenGL driver never lets us get more than 1 frame ahead.
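Here are numbers for the 60 fps picture. The frame budget is 1000 / 60 ≈ 16.7 ms. So 70% of it blocked inside clear is roughly 11.7 ms of CPU waiting, and half a frame in swap buffers is roughly 8.3 ms of GPU idle time. The helpers below just do that arithmetic. maxFramesAhead() encodes the simplified "N buffers, at most N - 1 frames ahead" model behind the double-buffering argument. That model is an approximation, not a guarantee about any particular driver.

```cpp
#include <cassert>

// Frame budget in milliseconds for a given refresh rate (60 Hz -> ~16.67 ms).
double frameBudgetMs( double refreshHz )
{
    assert( refreshHz > 0.0 );
    return 1000.0 / refreshHz;
}

// Milliseconds represented by a fraction of the frame, e.g. the ~70% the
// CPU spends blocked inside clear, or the ~50% the GPU sits in swap buffers.
double shareOfFrameMs( double refreshHz, double fraction )
{
    assert( fraction >= 0.0 && fraction <= 1.0 );
    return frameBudgetMs( refreshHz ) * fraction;
}

// Simplified model: with N swap-chain buffers the CPU can get at most N - 1
// frames ahead of what is on screen (double buffering -> 1, triple -> 2).
int maxFramesAhead( int swapChainBuffers )
{
    assert( swapChainBuffers >= 2 );
    return swapChainBuffers - 1;
}
```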
In summary, Remotery is extremely powerful. It can give you a lot of insight into what your Ogre app is doing and how it's spending its frame time, which will help you diagnose performance issues.
I'm still adding markers so we get more info. And who knows, maybe we'll get to add per-Renderable markers and find out how long it takes to render each object?
Known issues:
- Remotery sometimes crashes your application, especially at shutdown. Restarting the browser sometimes helps. It's slightly annoying, but not too much, and it doesn't look high priority.
- If your app is running too fast (e.g. >400 fps), it will generate so many samples that your browser's JavaScript may struggle to keep up. Throttling your application can help. Alternatively, close the application, and the browser will then catch up.
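Throttling can be as simple as sleeping until the next frame deadline. This is a minimal sketch that assumes you drive your own render loop. FrameLimiter is a hypothetical helper, not part of Ogre or Remotery, and the cap you pick is up to you.

```cpp
#include <cassert>
#include <chrono>
#include <thread>

// Caps the frame rate by sleeping until each frame's deadline.
class FrameLimiter
{
    using Clock = std::chrono::steady_clock;

public:
    explicit FrameLimiter( double maxFps ) :
        mPeriod( periodFor( maxFps ) ), mNext( Clock::now() + mPeriod )
    {
    }

    // Blocks until this frame's deadline, then schedules the next one.
    void waitForNextFrame()
    {
        std::this_thread::sleep_until( mNext );
        mNext += mPeriod;
        // If we fell far behind (e.g. a long stall), resync instead of
        // rendering a burst of back-to-back frames to catch up.
        const Clock::time_point now = Clock::now();
        if( mNext < now )
            mNext = now + mPeriod;
    }

private:
    static Clock::duration periodFor( double maxFps )
    {
        assert( maxFps > 0.0 );
        return std::chrono::duration_cast<Clock::duration>(
            std::chrono::duration<double>( 1.0 / maxFps ) );
    }

    Clock::duration   mPeriod; // time between frames
    Clock::time_point mNext;   // deadline for the next frame
};
```

Usage: create it once, e.g. `FrameLimiter limiter( 120.0 );`, and call `limiter.waitForNextFrame()` after each `Root::renderOneFrame()` call in your loop. Sleeping until an absolute deadline avoids the drift you would get from sleeping a fixed amount after each frame.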