[ogre 13] early-z questions

loath
Platinum Sponsor
Posts: 294
Joined: Tue Jan 17, 2012 5:18 am
x 67

Post by loath »

i have some additional early-z questions for ogre 13 / direct3d 11. starting a new thread rather than hijacking paul's thread (viewtopic.php?t=96658).

goal: i'd like to have early-z and have access to the depth texture for my transparent shaders (ocean surface, particles, etc) and post processing (ssao, etc). my shaders and materials are all dynamically generated so i can build non-lit early-z vertex and fragment shaders.

how do i make this happen? i believe the following approaches are available but i'd like to get confirmation:

option 1: use a compositor that renders the scene using a material-scheme that overrides the lit shaders with the simple non-lit early-z shaders.

  • a. how do you determine the renderable requirements for a material in handleSchemeNotFound ()? for example, does this renderable/material require instancing support or not? i could embed the requirements in the material name. is there a better approach?

  • b. how do you ensure the depth buffer is filled before the compositor or transparent passes? using a later render queue?

  • c. how do i reference the depth texture in a material? use the name declared in the compositor script? is this an alias to the depth buffer or a RTT texture created by the compositor script?

  • d. does this prevent me from using anti aliasing or is that only when using multiple render targets (MRTs)?

  • e. a sub-option is to use the same lit shaders, with a potential performance impact, per paroj in the previous thread.

option 2: create a manual render-to-texture and reference this texture via material-scheme (aka rpgplayerrobin's sample?). shaders then report depth via colour output of the non-lit shaders.

  • a. how do i ensure the RTT early-z scene is rendered before my transparent materials and compositors are rendered? render queues?

  • b. i assume there is no way to access the depth buffer itself (read-only) instead of creating a separate RTT?

option 3: add an additional early-z / ambient pass (pass 0) to all my materials. pass 0 would reference the non-lit vertex and fragment shaders.

  • a. this will provide early-z but (probably) not access to a depth texture in later passes of the same material? or is it possible to attach the depth written in pass 0 to passes 1+ for a material? or can i use render queues / render queue priority to render the early-z first to an RTT and then refer to this texture in later render queues?

  • b. render queue sub question: does each render queue completely finish all passes before going to the next level in the render queue?

option 4: any other approaches???

paroj
OGRE Team Member
Posts: 2141
Joined: Sun Mar 30, 2014 2:51 pm
x 1151

Re: [ogre 13] early-z questions

Post by paroj »

first let's clear up the terms: there are depth-textures (aka read-back depth) and there is early-z. You can have both independently as they serve different purposes.

  • early-z is an overdraw reduction technique that saves you fragment shader executions. If all your objects are sorted front-to-back, the gains here are negligible. You can do early-z without reading back depth: just render your scene twice.
  • depth-textures on the other hand allow you to use the depth-buffer as a texture. This is different to writing depth to RGBA, as you are re-using the same memory here, which is more efficient. Depth-textures are already used for shadowmapping, but are useful for other things as well. I think you are mainly aiming at those.
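To make the overdraw point concrete, here is a small self-contained C++ sketch. It is not OGRE code: the function name `litInvocations` and the one-pixel model are assumptions made purely for illustration. It counts how often the expensive lit fragment shader would run for a single pixel covered by several opaque surfaces, with and without a depth-only pre-pass, assuming the depth test rejects fragments before they are shaded.

```cpp
#include <cassert>
#include <limits>
#include <vector>

// Toy model of a single pixel covered by several opaque surfaces.
// 'depths' lists the surfaces in submission order (smaller = closer).
// Returns how many times the expensive lit fragment shader runs.
// Hypothetical helper for illustration only, not part of OGRE.
int litInvocations(const std::vector<float>& depths, bool depthPrePass)
{
    assert(!depths.empty());
    float zbuf = std::numeric_limits<float>::infinity();

    if (depthPrePass) {
        // Pass 1: depth only. The cheap shader leaves the nearest depth behind.
        for (float d : depths)
            if (d < zbuf) zbuf = d;
    }

    int shaded = 0;
    for (float d : depths) {
        // After a pre-pass the lit pass tests with LESS_EQUAL, otherwise LESS.
        const bool visible = depthPrePass ? (d <= zbuf) : (d < zbuf);
        if (visible) {
            ++shaded;   // the lit shader runs for this surface
            zbuf = d;
        }
    }
    return shaded;
}
```

With four surfaces submitted back to front, `litInvocations({0.9f, 0.7f, 0.5f, 0.3f}, false)` shades the pixel 4 times, and once with the pre-pass enabled. Submitted front to back it shades once either way, which is the case where the pre-pass buys nothing.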

Option 1

To get a depth texture, you can use a compositor like this (untested):

Code:

compositor DepthTexture
{
   technique
   {
        texture scene target_width target_height PF_BYTE_RGBA PF_DEPTH16
        target scene
        {
            pass clear { }
            pass render_scene
            {
                first_render_queue 0
                last_render_queue 10
            }
        }

        // render some queued objects on top, clipping against previous depth
        target scene
        {
            pass render_scene
            {
                // queue ranges are inclusive, so start after the previous pass's last queue
                first_render_queue 11
                last_render_queue 20
            }
        }

        // Copy final output to screen
        target_output
        {
            input none
            pass render_quad
            {
                material CopyMaterial
                input 0 scene 0
            }
        }
   }
}

Note that no additional materials are needed, as we are using the depth-buffer that "normal" rendering generates anyway.

Inside a material, you can access the depth-texture like this:

Code:

material ReadDepthMaterial
{
    technique
    {
        pass
        {
            texture_unit
            {
                content_type compositor DepthTexture scene 1
            }
        }
    }
}
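A value sampled from such a depth texture is non-linear, so a transparent shader (soft particles, ocean depth fade) would typically convert it back to a view-space distance first. The sketch below shows that arithmetic in C++ so it can be checked on the CPU. It assumes a standard, non-reversed perspective projection that maps the near plane to 0 and the far plane to 1 (the Direct3D convention). The function name and parameters are hypothetical, and the same expression would have to be ported to the shader language in use.

```cpp
#include <cassert>

// Converts a depth-buffer value d in [0, 1] back to view-space distance.
// Assumes a standard perspective projection with d = 0 at the near plane
// and d = 1 at the far plane (no reversed-Z, no infinite far plane).
// Hypothetical helper for illustration only, not an OGRE function.
double linearizeDepth(double d, double nearClip, double farClip)
{
    assert(nearClip > 0.0 && farClip > nearClip);
    assert(d >= 0.0 && d <= 1.0);
    return (nearClip * farClip) / (farClip - d * (farClip - nearClip));
}
```

For near = 1 and far = 100, d = 0 gives 1, d = 1 gives 100, and d = 0.5 gives roughly 1.98, which shows how much of the stored range is spent close to the camera.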

if you were only after early-z, you could do

Code:

compositor EarlyZ
{
   technique
   {
        target_output
        {
            pass clear { }
            pass render_scene
            {
                material_scheme DummyWhite
            }
        }

        target_output
        {
            pass render_scene
            {
                material_scheme ActualShaders
            }
        }
   }
}

that one will not actually work, as compositors only allow one target_output per technique, but you get the idea.

For early-z we actually need a separate, simple material_scheme, as this is the whole point of early-z. To determine whether your material needs instancing, you can use the same logic you already do. After all the "DummyWhite" scheme just uses a different fragment shader - the vertex shader must be the same as the "ActualShaders" scheme.
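Since the depth-only scheme keeps the vertex stage of the real shader and only swaps the fragment stage, one possible way to pick the matching early-z shader for dynamically generated materials is to key it on the vertex-relevant features alone. The sketch below is plain C++ with hypothetical feature flags (none of these names come from OGRE). Inside an actual `handleSchemeNotFound` implementation the resulting key would be used to look up or generate the technique.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical feature flags of a generated shader (not OGRE identifiers).
enum ShaderFeature : std::uint32_t {
    FEATURE_INSTANCING     = 1u << 0,  // affects vertex positions
    FEATURE_SKINNING       = 1u << 1,  // affects vertex positions
    FEATURE_NORMAL_MAPPING = 1u << 2,  // affects shading only
    FEATURE_SHADOWS        = 1u << 3,  // affects shading only
    FEATURE_ALL            = (1u << 4) - 1
};

// Features that must be kept so the depth-only pass rasterizes exactly
// the same geometry as the lit pass.
constexpr std::uint32_t kVertexFeatures = FEATURE_INSTANCING | FEATURE_SKINNING;

// Returns the feature set for the depth-only variant of a lit shader.
std::uint32_t depthOnlyFeatures(std::uint32_t litFeatures)
{
    assert((litFeatures & ~static_cast<std::uint32_t>(FEATURE_ALL)) == 0);
    return litFeatures & kVertexFeatures;
}
```

Two lit materials that differ only in normal mapping or shadows then share one depth-only shader, while an instanced and a non-instanced material do not.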

Option 2

Of course you can write the code that the Compositor above would generate by hand.

Option 3

This will not work. For both early-z and depth-textures the whole scene needs to be rendered before the second pass. However, the material passes are applied when traversing the renderables.

Post by loath »

this is... pretty badass at how far compositors have come. thanks so much!

Post by paroj »

loath wrote:

c. is the performance in this approach better than using compositors (aka option 1)? i mention this after reading dark_sylinc's 2.0 proposal slides, see page 47

The example he gave was:

Code:

    target_output
    {
        pass render_scene
        {
            first_render_queue 10
            last_render_queue 90
        }

        pass render_quad { }

        pass render_scene
        {
            first_render_queue 91
            last_render_queue 95
        }
    }

However, after staring at the code for 20 minutes, I think this was never an issue. The RenderTarget is only updated once, and both render_scene passes affect the per-target render-queue mask. The render_quad pass is sorted in accordingly.

What is a problem is my example from above:

Code:

        target scene
        {
            pass clear { }
            pass render_scene
            {
                first_render_queue 0
                last_render_queue 10
            }
        }

        target scene
        {
            pass render_scene
            {
                // queue ranges are inclusive, so start after the previous pass's last queue
                first_render_queue 11
                last_render_queue 20
            }
        }

This will update the RenderTarget twice and hence do the culling again.
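The difference between the two layouts can be modelled in a few lines of C++. This is a toy model with hypothetical types, not OGRE's compositor code. It assumes, as described above, that every `target` block triggers one render-target update (and with it one culling run), and that `first_render_queue` and `last_render_queue` are both inclusive.

```cpp
#include <array>
#include <cassert>
#include <utility>
#include <vector>

// Hypothetical stand-ins for compositor concepts; not OGRE classes.
using QueueRange  = std::pair<int, int>;      // first..last, both inclusive
using TargetBlock = std::vector<QueueRange>;  // render_scene passes of one target block

constexpr int kQueueCount = 128;              // arbitrary upper bound for this sketch

struct FrameStats {
    int cullingRuns = 0;                           // one per target update
    std::array<int, kQueueCount> drawsPerQueue{};  // times each queue id was rendered
};

FrameStats simulate(const std::vector<TargetBlock>& targets)
{
    FrameStats stats;
    for (const TargetBlock& target : targets) {
        ++stats.cullingRuns;

        // All render_scene passes of one target share a single queue mask.
        std::array<bool, kQueueCount> mask{};
        for (const QueueRange& r : target) {
            assert(0 <= r.first && r.first <= r.second && r.second < kQueueCount);
            for (int q = r.first; q <= r.second; ++q) mask[q] = true;
        }
        for (int q = 0; q < kQueueCount; ++q)
            if (mask[q]) ++stats.drawsPerQueue[q];
    }
    return stats;
}
```

In this model `simulate({{{10, 90}, {91, 95}}})` reports one culling run, while `simulate({{{0, 10}}, {{11, 20}}})` reports two. Two target blocks that share a boundary id, such as 0..10 followed by 10..20, would additionally render queue 10 twice.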

Post by loath »

of course, ha ha. i'm assuming this needs a design-level change to fix? (and to the culling code, so not specific to compositors)

rpgplayerrobin
Orc Shaman
Posts: 719
Joined: Wed Mar 18, 2009 3:03 am
x 399

Post by rpgplayerrobin »

loath wrote:

option 2: create a manual render-to-texture and reference this texture via material-scheme (aka rpgplayerrobin's sample?). shaders then report depth via colour output of the non-lit shaders.

I am also wondering how much performance my version actually saves on the GPU.
With the automatic PF_DEPTH16 approach, the shaders will still have to go through ALL calculations in the fragment shader, right?
That would mean that an advanced material with normal mapping, offset mapping, shadowing, lights, etc. would have to do all those calculations even if those calculations and texture fetches were completely unnecessary. Maybe this is fixed when using RTSS though.

Or do the shaders somehow know to skip all shader instructions except for the actual depth write? In that case it would be like magic, but I don't think that is the case with normal shaders.

I took this approach so that I would have a minimal impact on the GPU.
My philosophy is that every little optimization helps (even though this optimization might not be that important, not sure), which is why I can still get over 300 FPS in my game with high graphical settings.

When my project is upgraded, I can try this automatic approach instead, compare the performance, and post an update here.

Post by paroj »

loath wrote: Fri May 13, 2022 12:15 am

of course, ha ha. i'm assuming this needs a design-level change to fix? (and to the culling code, so not specific to compositors)

actually, no. There is already SceneManager::_renderVisibleObjects, which will skip culling and just process the renderqueue. However skipping culling is only valid if neither camera nor viewport changed.

The possible gain is skipping _findVisibleObjects:
[image not included]
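The validity condition above (culling may be skipped only if neither camera nor viewport changed) can be written down as a small cache. This is a hedged, self-contained C++ sketch with made-up types (`ViewKey`, `VisibilityCache`). It does not call OGRE, and it ignores scene changes between the two renders, which a real implementation would also have to consider.

```cpp
#include <cassert>
#include <functional>
#include <vector>

// Hypothetical description of which view produced the visible list.
struct ViewKey {
    int cameraRevision;   // bumped whenever the camera moves or changes
    int viewportWidth;
    int viewportHeight;
    bool operator==(const ViewKey& o) const {
        return cameraRevision == o.cameraRevision &&
               viewportWidth == o.viewportWidth &&
               viewportHeight == o.viewportHeight;
    }
};

// Reuses the result of an expensive culling step while the view is unchanged.
class VisibilityCache {
public:
    const std::vector<int>& visible(const ViewKey& key,
                                    const std::function<std::vector<int>()>& cull)
    {
        assert(cull);
        if (!valid_ || !(key == key_)) {
            list_ = cull();      // stands in for the culling step
            key_ = key;
            valid_ = true;
            ++cullRuns_;
        }
        return list_;            // stands in for rendering from the filled queues
    }
    int cullRuns() const { return cullRuns_; }

private:
    std::vector<int> list_;
    ViewKey key_{};
    bool valid_ = false;
    int cullRuns_ = 0;
};
```

Calling `visible` twice with the same `ViewKey` runs the culling callback once. Changing the camera revision or the viewport size forces a second run.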

Post by loath »

rpgplayerrobin wrote: Fri May 13, 2022 12:26 pm

option 2: create a manual render-to-texture and reference this texture via material-scheme (aka rpgplayerrobin's sample?). shaders then report depth via colour output of the non-lit shaders.

I am also wondering how much performance my version actually saves on the GPU.
With the automatic PF_DEPTH16 approach, the shaders will still have to go through ALL calculations in the fragment shader, right?
That would mean that an advanced material with normal mapping, offset mapping, shadowing, lights, etc. would have to do all those calculations even if those calculations and texture fetches were completely unnecessary. Maybe this is fixed when using RTSS though.

Or do the shaders somehow know to skip all shader instructions except for the actual depth write? In that case it would be like magic, but I don't think that is the case with normal shaders.

I took this approach so that I would have a minimal impact on the GPU.
My philosophy is that every little optimization helps (even though this optimization might not be that important, not sure), which is why I can still get over 300 FPS in my game with high graphical settings.

When my project is upgraded, I can try this automatic approach instead, compare the performance, and post an update here.

here is how i interpreted paroj's response: (please correct me if i'm wrong)

  1. overdraw is more of an issue with transparent objects. therefore draw the opaque objects first (ex. queues 10 - 90) with lit shaders. the depth buffer will be filled and we can use it for step 2 (in addition to its normal uses in step 1).
  2. draw the transparent objects in later stages (ex. queues 91 - 95) which may take advantage of the depth buffer from step 1 potentially reducing overdraw but also useful for post processing techniques.

vs a "pure" early-z approach which requires rendering the opaque objects twice: (once with minimal shaders and once with lit shaders)

  1. draw the opaque objects with minimal shaders just to build the depth buffer.
  2. draw the opaque objects again with lit shaders and hope for a performance gain from the depth buffer built in step 1.
  3. draw the transparent objects against that depth buffer, which can also be used for post processing techniques.
Last edited by loath on Fri May 13, 2022 4:34 pm, edited 2 times in total.

Post by loath »

paroj wrote:

skipping culling is only valid if neither camera nor viewport changed

are you saying IF the camera/viewport is the same as the previous frame then we can potentially skip? my impression from dark_sylinc's proposal was to cull in groups based on the render queues? (difficult for me to understand from a powerpoint slide although they are excellent).

also seems like optimizations to renderVisibleObjects () could increase performance since 37ms is a huge part of a 53ms frame time. am i understanding this correctly?

Post by paroj »

loath wrote:

here is how i interpreted paroj's response: (please correct me if i'm wrong)

yes, I was mainly aiming at this use-case.

However, if the fragment shader does not contain "discard", the drivers usually figure out that they don't need to execute it if only a depth buffer is bound.

loath wrote:

are you saying IF the camera/viewport is the same as the previous frame then we can potentially skip? my impression from dark_sylinc's proposal was to cull in groups based on the render queues?

yes, after _findVisibleObjects all render queues can be used read-only to only render some of them, see:
https://github.com/OGRECave/ogre/blob/f ... .cpp#L1501

loath wrote:

also seems like optimizations to renderVisibleObjects () could increase performance

yes, that's why I would focus on that instead of culling optimizations. Note that this is rendering 19600 nodes without instancing. With instancing, the ratios may shift in favor of optimizing _findVisibleObjects.

Post by loath »

ok tracking. learning a ton here, thanks.