Short answer: Yes.
There are several concepts mixed here:
- Ogre 1.x updated SceneNodes and AABBs on every pass. Ogre 2.x only does this once. This is done.
- Ogre 1.x would perform culling against all entities from all render queues every pass. Ogre 2.x skips the entities that are not in the render queues that the particular pass renders. This is done.
- Ogre 1.x would try to traverse most of the data from all render queues (i.e. traversing the scene graph), even if those render queues are not included in the pass. Ogre 2.x only processes the RQs included by the pass. This is done.
- Currently, Ogre 2.x performs culling every pass, as you suspect, even if nothing changed between passes (e.g. two consecutive passes rendering the same render queue ID ranges, like a Z-prepass followed by the regular pass). Ouch!
It should be relatively trivial to support skipping redundant culling, since the data holding this info is usually cleared and reset in _cullPhase01 and used in _renderPhase02; so in theory we should just call _renderPhase02 and avoid _cullPhase01 (give or take a couple of bits of code that may need changing to make this work flawlessly). In fact, this has been in my plans.
The tricky part is ensuring that two consecutive passes are, in culling terms, equivalent; or exposing manual control to the compositor writer (just like we do with LOD lists). Or both (auto-check by default whether the passes are equivalent, with manual control able to override the check).
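To make the "auto-check, with manual override" idea concrete, here is a rough sketch of what such a check could look like. Everything in it is hypothetical: PassCullInfo, CullReuse, cullingEquivalent and canSkipCulling are made-up names for illustration, not Ogre types or functions, and the sketch assumes nothing in the scene moved between the two passes.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical, simplified summary of what culling depends on for one pass.
// These are NOT Ogre's actual types; they only illustrate the idea.
struct PassCullInfo
{
    const void   *camera;         // camera whose frustum is used for culling
    std::uint8_t  firstRq;        // first render queue ID rendered by the pass
    std::uint8_t  lastRq;         // last render queue ID rendered by the pass
    std::uint32_t visibilityMask; // visibility mask applied by the pass
};

// Hypothetical per-pass setting exposed to the compositor writer.
enum class CullReuse
{
    Auto,       // compare against the previous pass
    ForceCull,  // always run culling again
    ForceReuse  // always reuse the previous culling results
};

// Two passes are treated as equivalent, in culling terms, when they would
// test the same objects against the same frustum.
inline bool cullingEquivalent( const PassCullInfo &a, const PassCullInfo &b )
{
    return a.camera == b.camera && a.firstRq == b.firstRq &&
           a.lastRq == b.lastRq && a.visibilityMask == b.visibilityMask;
}

// Returns true when the culling step could be skipped for 'current'.
inline bool canSkipCulling( const PassCullInfo *previous,
                            const PassCullInfo &current, CullReuse mode )
{
    if( previous == nullptr )
        return false; // first pass: there is nothing to reuse
    if( mode == CullReuse::ForceCull )
        return false;
    if( mode == CullReuse::ForceReuse )
        return true;
    return cullingEquivalent( *previous, current );
}
```

With something like this, a Z-prepass followed by a regular pass over the same render queue range and camera would report that culling can be skipped, while a pass over a different range would not unless the compositor writer forces it.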
However I'm delaying this for two reasons:
1. Profiling shows that, while culling is relevant enough to still show up in the profiler, it is one of the most efficient routines. It shows up last. Its ALU to bandwidth ratio is perfect, and it's the least troublesome (in other words, it went from highest to lowest priority in my TODO list).
2. In 2.0 Final there is a CommandBuffer. Profiling shows that preparing the render queue and sorting it is far more expensive than culling. It is theoretically possible to reuse a command buffer between passes. When we reuse it, performance is blazing fast (CPU is basically idle: taskmgr shows 1% CPU usage, no vsync). No culling, no render queue processing. Nothing. Just a playback of API calls using memory that is already on the GPU. Just save the buffer of commands, and execute it again. The thing is that unless you want to render exactly the same frame with the same shaders and same parameters, reusing the CommandBuffer in a useful way needs a little bit of thinking, tweaking and ironing out the details (e.g. keeping the draw commands intact, but changing the commands involving the shaders, which is usually what is needed for a Z pre-pass, multi-pass lighting, etc).
I'm hoping that once 2.0 Final goes public, people will find interesting ways to play with the Cmd Buffer.
Performance gains we could achieve by playing with the CommandBuffer far exceed any work that could be avoided by just skipping Culling.
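As an illustration of "keep the draw commands intact, but change the commands involving the shaders", here is a toy record-and-replay sketch. It is not Ogre's CommandBuffer: ToyCommandBuffer, Command and replay are invented for this example, and the "API calls" are just strings written to a log so the behaviour is easy to see.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Toy stand-in for a command buffer. NOT Ogre's CommandBuffer class.
enum class CmdType { SetShader, Draw };

struct Command
{
    CmdType       type;
    std::uint32_t arg; // shader ID for SetShader, mesh ID for Draw
};

struct ToyCommandBuffer
{
    std::vector<Command> commands;

    // Recording happens once, after culling and render queue sorting.
    void setShader( std::uint32_t shaderId )
    {
        commands.push_back( { CmdType::SetShader, shaderId } );
    }
    void draw( std::uint32_t meshId )
    {
        commands.push_back( { CmdType::Draw, meshId } );
    }
};

// Plays the recorded commands back. Draw commands are replayed untouched;
// shader commands go through 'remapShader', so a later pass can substitute
// its own shaders without culling or sorting again.
template <typename RemapShader>
std::vector<std::string> replay( const ToyCommandBuffer &buffer,
                                 RemapShader remapShader )
{
    std::vector<std::string> log; // stands in for the real API calls
    for( const Command &cmd : buffer.commands )
    {
        if( cmd.type == CmdType::SetShader )
            log.push_back( "shader " + std::to_string( remapShader( cmd.arg ) ) );
        else
            log.push_back( "draw " + std::to_string( cmd.arg ) );
    }
    return log;
}
```

Replaying the same buffer twice, once with an identity remap and once with a remap that picks (hypothetical) depth-only variants, gives two passes that share every draw command and differ only in the shader commands.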
Even if we can't reuse the CommandBuffer, we could still look into the next best thing: reusing the sorted RenderQueue list, which is far easier and is built right after frustum culling. Building the RenderQueue list shows up in the profiler as the #1 hotspot (note: we're GPU bound); so if there's something to gain, we would have to look there (*).
(*) Note that since 2.0 Final is finally GPU bound (YES!!!), my priorities will refocus on features like LOD, advanced occlusion culling, compute shaders, etc.
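For the "next best thing" mentioned above (reusing the sorted RenderQueue list between passes), a minimal sketch could look like the following. CachedRenderQueue and QueuedRenderable are hypothetical names, not Ogre classes, and the single 64-bit sort key is an assumption made to keep the example short.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical entry of a render queue: a sort key plus the object to draw.
struct QueuedRenderable
{
    std::uint64_t sortKey;      // e.g. shader/material/depth packed together
    std::uint32_t renderableId;
};

// Hypothetical cache that sorts the visible list once and hands the same
// sorted list to every following pass until the visible set changes.
class CachedRenderQueue
{
public:
    // Called after frustum culling produced a new visible set.
    void setVisible( std::vector<QueuedRenderable> visible )
    {
        mItems = std::move( visible );
        mSorted = false;
    }

    // Returns the sorted list, sorting only if it has not been sorted yet.
    const std::vector<QueuedRenderable> &sorted()
    {
        if( !mSorted )
        {
            std::stable_sort( mItems.begin(), mItems.end(),
                              []( const QueuedRenderable &a,
                                  const QueuedRenderable &b )
                              { return a.sortKey < b.sortKey; } );
            mSorted = true;
            ++mSortCount;
        }
        return mItems;
    }

    // How many times the list was actually sorted (for profiling/testing).
    unsigned sortCount() const { return mSortCount; }

private:
    std::vector<QueuedRenderable> mItems;
    bool                          mSorted = false;
    unsigned                      mSortCount = 0;
};
```

Two passes asking for sorted() after a single setVisible() call would pay for the sort only once; whether the same sort order is actually valid for both passes is the part that still needs checking in a real implementation.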