Hi, sorry one more question!
Currently in the CTP, culling is performed for each pass (at least I think it is!). I believe there were plans to give the culling stage greater control and intelligence, so that you could reuse the cull data of a previous pass and get better performance. Is this still planned for Ogre 2.0 Final, or perhaps later? Or is the culling code now so fast that it's not as important?
[SOLVED][Ogre 2.0] Re-use of culling data?
-
- OGRE Expert User
- Posts: 1227
- Joined: Thu Dec 11, 2008 7:56 pm
- Location: Bristol, UK
- x 157
Last edited by al2950 on Mon Feb 02, 2015 1:29 pm, edited 1 time in total.
-
- OGRE Team Member
- Posts: 5476
- Joined: Sat Jul 21, 2007 4:55 pm
- Location: Buenos Aires, Argentina
- x 1358
Re: [Ogre 2.0] Re-use of culling data?
Short answer: Yes.
Long answer:
There are several concepts mixed here:
- Ogre 1.x updated SceneNodes and AABBs on every pass. Ogre 2.x only does this once. This is done.
- Ogre 1.x would perform culling against all entities from all render queues every pass. Ogre 2.x skips the entities that are not in the render queues that the particular pass renders. This is done.
- Ogre 1.x would try to traverse most of the data from all render queues (i.e. traversing the scene graph), even if those render queues are not included in the pass. Ogre 2.x only processes the RQs included by the pass. This is done.
- Currently, Ogre 2.x performs culling every pass, as you suspect, even if nothing changed between passes (e.g. two consecutive passes rendering the same render queue ID ranges, like a Z-prepass followed by the regular pass). Ouch!
The tricky part is either ensuring that two consecutive passes are equivalent in culling terms, or exposing manual control to the compositor writer (just like we do with LOD lists). Or both: auto-check by default whether the passes are equivalent, with a manual override.
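To make the "auto-check plus manual override" idea concrete, here is a minimal sketch of how a cull result could be cached between passes. It is not Ogre code: `CullKey`, `ReuseMode` and `CullCache` are hypothetical names, and the fields chosen for the key (camera, render queue range, visibility mask) are an assumption about what makes two passes equivalent in culling terms.

```cpp
#include <cstdint>
#include <functional>
#include <vector>

// Hypothetical: everything assumed to influence a pass's culling result.
struct CullKey {
    std::uint32_t cameraId;        // camera whose frustum is used
    std::uint64_t cameraVersion;   // bumped when the camera moves or its projection changes
    std::uint8_t  firstRq;         // first render queue ID rendered by the pass
    std::uint8_t  lastRq;          // one past the last render queue ID
    std::uint32_t visibilityMask;  // visibility flags applied by the pass

    bool operator==(const CullKey& o) const {
        return cameraId == o.cameraId && cameraVersion == o.cameraVersion &&
               firstRq == o.firstRq && lastRq == o.lastRq &&
               visibilityMask == o.visibilityMask;
    }
};

// Hypothetical manual control exposed to the compositor writer.
enum class ReuseMode { Auto, ForceReuse, ForceCull };

class CullCache {
public:
    using CullFn = std::function<std::vector<int>(const CullKey&)>;

    // Returns the visible object IDs for a pass, culling only when needed.
    const std::vector<int>& visibleFor(const CullKey& key, ReuseMode mode,
                                       const CullFn& cull) {
        const bool equivalent = mHasResult && key == mKey;
        const bool reuse =
            mHasResult && (mode == ReuseMode::ForceReuse ||
                           (mode == ReuseMode::Auto && equivalent));
        if (!reuse) {
            mVisible = cull(key);  // the expensive part we would like to skip
            mKey = key;
            mHasResult = true;
        }
        return mVisible;
    }

    // Drop the cached result, e.g. once per frame after the scene is updated.
    void invalidate() { mHasResult = false; }

private:
    CullKey mKey{};
    std::vector<int> mVisible;
    bool mHasResult = false;
};
```

With this shape, a Z-prepass followed by the regular pass over the same camera and render queue range would hit the cache in `Auto` mode, while `ForceReuse` and `ForceCull` let the compositor writer override the check.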
However I'm delaying this for two reasons:
1. Profiling shows that, while culling is relevant enough to still show up in the profiler, it is one of the most efficient routines. It shows up last. Its ALU-to-bandwidth ratio is perfect, and it's the least troublesome (in other words, it went from highest to lowest priority in my TODO list).
2. In 2.0 Final there is a CommandBuffer. Profiling shows that preparing the render queue and sorting it is far more expensive than culling. It is theoretically possible to reuse a command buffer between passes. When we reuse it, performance is blazing fast (the CPU is basically idle: taskmgr shows 1% CPU usage, no vsync). No culling, no render queue processing. Nothing. Just a playback of API calls using memory that is already on the GPU. Just save the buffer of commands, and execute it again. The thing is that unless you want to render exactly the same frame with the same shaders and the same parameters, reusing the CommandBuffer in a useful way needs a little bit of thinking, tweaking and ironing out the details (e.g. keep the draw commands intact, but change the commands involving the shaders, which is usually what is needed for a Z pre-pass, multi-pass lighting, etc).
I'm hoping that once 2.0 Final goes public, people will find interesting ways to play with the Cmd Buffer.
Performance gains we could achieve by playing with the CommandBuffer far exceed any work that could be avoided by just skipping Culling.
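As a rough illustration of "keep the draw commands intact, but change the commands involving the shaders", the sketch below records a list of commands once and replays it with a shader remapping function. Everything here is hypothetical and far simpler than a real command buffer: `Command`, `FakeDevice` and `RecordedCommands` are made-up names, and the "device" only logs which shader was bound at each draw.

```cpp
#include <cstdint>
#include <functional>
#include <vector>

enum class CmdType { SetShader, Draw };

// Hypothetical command encoding, just enough for the illustration.
struct Command {
    CmdType type;
    std::uint32_t a;  // SetShader: shader ID; Draw: first vertex
    std::uint32_t b;  // SetShader: unused;    Draw: vertex count
};

// Stand-in for the graphics API: records the shader bound at each draw call.
struct FakeDevice {
    std::uint32_t boundShader = 0;
    std::vector<std::uint32_t> drawLog;

    void bindShader(std::uint32_t id) { boundShader = id; }
    void draw(std::uint32_t /*first*/, std::uint32_t /*count*/) {
        drawLog.push_back(boundShader);
    }
};

class RecordedCommands {
public:
    using RemapFn = std::function<std::uint32_t(std::uint32_t)>;

    void record(Command c) { mCommands.push_back(c); }

    // Plays the saved commands back. Draw commands are issued untouched;
    // shader commands go through `remap`, so a different pass can swap shaders.
    void replay(FakeDevice& dev, const RemapFn& remap) const {
        for (const Command& c : mCommands) {
            if (c.type == CmdType::SetShader)
                dev.bindShader(remap(c.a));
            else
                dev.draw(c.a, c.b);
        }
    }

private:
    std::vector<Command> mCommands;
};
```

Replaying with an identity `remap` reproduces the original pass; replaying with a `remap` that always returns a depth-only shader ID approximates a Z pre-pass over the same draws, with no culling or render queue work in between.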
Even if we can't reuse the CommandBuffer, we could still research the next best thing: reusing the sorted RenderQueue list, which is far easier and happens right after frustum culling. Building the RenderQueue list shows up on the profiler as the #1 hotspot (note: we're GPU bound), so if there's something to gain, we would have to look there (*).
(*) Note that since 2.0 Final is finally GPU bound (YES!!!), my priorities will refocus on features like LOD, advanced occlusion culling, compute shaders, etc.
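For the "next best thing" mentioned above, reusing the sorted RenderQueue list, a minimal sketch could look like the following. It assumes a hypothetical generation counter that changes whenever the culling result changes, and it is only valid if the passes sharing the list sort their items the same way. `QueuedItem` and `SortedQueueCache` are illustrative names, not Ogre classes.

```cpp
#include <algorithm>
#include <cstdint>
#include <functional>
#include <vector>

struct QueuedItem {
    std::uint64_t sortKey;       // hypothetical packed key (material, depth, ...)
    std::uint32_t renderableId;  // which object to draw
};

class SortedQueueCache {
public:
    using KeyFn = std::function<std::uint64_t(std::uint32_t)>;

    // Returns the sorted list for the given culling result, rebuilding it
    // only when `cullGeneration` differs from the one it was built for.
    const std::vector<QueuedItem>& get(const std::vector<std::uint32_t>& visible,
                                       std::uint64_t cullGeneration,
                                       const KeyFn& keyOf) {
        if (!mValid || cullGeneration != mGeneration) {
            mItems.clear();
            mItems.reserve(visible.size());
            for (std::uint32_t id : visible)
                mItems.push_back(QueuedItem{keyOf(id), id});
            // stable_sort keeps the culling order for items with equal keys.
            std::stable_sort(mItems.begin(), mItems.end(),
                             [](const QueuedItem& a, const QueuedItem& b) {
                                 return a.sortKey < b.sortKey;
                             });
            mGeneration = cullGeneration;
            mValid = true;
            ++mBuildCount;
        }
        return mItems;
    }

    int buildCount() const { return mBuildCount; }

private:
    std::vector<QueuedItem> mItems;
    std::uint64_t mGeneration = 0;
    bool mValid = false;
    int mBuildCount = 0;
};
```

A second pass that asks for the list with the same generation gets the already sorted items back, skipping both the list building and the sort.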
-
- OGRE Expert User
- Posts: 1227
- Joined: Thu Dec 11, 2008 7:56 pm
- Location: Bristol, UK
- x 157
Re: [Ogre 2.0] Re-use of culling data?
Thank you for the detailed explanation.