Hardware PCF shadows


Re: Hardware PCF shadows

Post by sparkprime »

dark_sylinc wrote:Because of some HW restrictions, among other things, this may not be 100% true, but you have deterministic control over it (which means you can anticipate when it will be shared and when it won't at production time), and besides, the main advantage is that you're guaranteed that RTTs with different depth buffer pool IDs will never share the same depth buffer (unless RSC_RTT_SEPARATE_DEPTHBUFFER isn't present, which happens in arcane OpenGL implementations).
So this is a conservative thing -- you can force things to be separate (for correctness) but you cannot force things to be shared (for performance)?
Now to your point: The purpose of DepthBuffer is to encapsulate API-dependant buffers. In D3D9 it's overloaded by D3D9DepthBuffer, in OGL by GLDepthBuffer.
OK, that sounds good. Are these created using depth formats? I'm not sure which of these to support, or what the GL equivalents are (from the page you linked a few posts ago):

D3DFMT_D16
D3DFMT_D24X8

These two seem to be supported on modern cards and are only useful when doing a PCF shadow fetch from the shader; we could expose them as PF_SHADOW and PF_SHADOW24, for example.

DF16
DF24
INTZ
RAWZ

Here it's messier. We should definitely support INTZ, as we want to be forward-looking and it is supported across a wide range of cards. For older hardware the others are needed. Perhaps we could abstract them into a common 'supported everywhere' format, but it seems the limitations are too different between them for this to be practical. Something to think about.
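
For reference, requesting INTZ through the raw D3D9 API is basically just a FOURCC trick (a rough sketch outside of any Ogre abstraction; the format is a vendor FOURCC, not part of the D3DFMT enum):

Code: Select all

// Sketch: create an INTZ depth texture that can later be bound to a sampler.
#include <d3d9.h>

const D3DFORMAT FMT_INTZ = (D3DFORMAT)MAKEFOURCC('I', 'N', 'T', 'Z');

IDirect3DTexture9* createIntzTexture( IDirect3DDevice9 *device, UINT width, UINT height )
{
    IDirect3DTexture9 *tex = NULL;
    // Created with D3DUSAGE_DEPTHSTENCIL; level 0's surface is then set as the
    // depth-stencil surface, while the texture itself can be bound for sampling.
    HRESULT hr = device->CreateTexture( width, height, 1, D3DUSAGE_DEPTHSTENCIL,
                                        FMT_INTZ, D3DPOOL_DEFAULT, &tex, NULL );
    return SUCCEEDED( hr ) ? tex : NULL;
}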

You mentioned
In Ogre, the pixel format PF_DEPTH is reserved for this use, albeit not used.
Not sure whether to remove this or use it for INTZ?

Not sure what to do for GL, I'll ask some folks on IRC and see what they say.
I like the 2nd option most as it's clean and easy.
Agreed, there should be an RTT with depth buffer but no colour buffer. Hopefully this is possible without making a mess of the RenderTarget API.

This also mirrors the ability to write only depth in a particular pass, which I think is nice.
  • Shadow's depth buffer pool ID can be set in SceneManager::setShadowTextureConfig()
In addition to this, presumably I will have to modify the ShadowTextureConfig to allow specifying 'no colour please', probably with a boolean member variable.
Sticking to the pool ID rules would be really cool.
Yes I think it is rare to be creating so many of these things that numbering would be a problem.

Thanks for your reply, it's exactly the input I was looking for :)

Re: Hardware PCF shadows

Post by sparkprime »

From

http://www.opengl.org/wiki/Image_Format#Depth_formats
Depth formats

These image formats store depth information. There are two kinds of depth formats: normalized integer and floating-point. The normalized integer versions work similar to normalized integers for color formats; they map the integer range onto the depth values [0, 1]. The floating-point version can store any 32-bit floating-point value.

What makes the 32-bit float depth texture particularly interesting is that, as a depth texture format, it can be used with the so-called "shadow" texture lookup functions. Color formats cannot be used with these texture functions.

The available formats are: GL_DEPTH_COMPONENT16, GL_DEPTH_COMPONENT24, GL_DEPTH_COMPONENT32 and GL_DEPTH_COMPONENT32F.

Re: Hardware PCF shadows

Post by dark_sylinc »

sparkprime wrote:So this is a conservative thing -- you can force things to be separate (for correctness) but you cannot force things to be shared (for performance)?
Yup, that's correct. Nice way of putting it
OK, that sounds good. Are these created using depth formats? I'm not sure which of these to support, or what the GL equivalents are (from the page you linked a few posts ago):

D3DFMT_D16
D3DFMT_D24X8
Yes, but they're chosen by the RenderSystem. And the specific format is stored in D3D9DepthBuffer & GLDepthBuffer. More on this later
These two seem to be supported on modern cards and are only useful when doing a PCF shadow fetch from the shader; we could expose them as PF_SHADOW and PF_SHADOW24, for example.
I would prefer to use "PF_DEPTH" only, because you can't be picky with what you request. Keep reading. BTW it's also useful when doing all kinds of depth-based compositing.
In Ogre, the pixel format PF_DEPTH is reserved for this use, albeit not used.
Not sure whether to remove this or use it for INTZ?
Use it for INTZ. "INTZ" is a hack that just exposes Dx10 capabilities to the Dx9 API; therefore you get (almost?) the same stuff you would get by using the D3D11 or GL rendersystems. The user doesn't need to know we're hacking. He just needs to know when it's present, and that it works consistently.
Agreed, there should be an RTT with depth buffer but no colour buffer. Hopefully this is possible without making a mess of the RenderTarget API.

This also mirrors the ability to write only depth in a particular pass, which I think is nice.
Just so you know, depth-only passes can be performed using a material pass with colour_write off.
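
In C++ that's just a couple of calls on the pass (a sketch, assuming an already-loaded MaterialPtr named material):

Code: Select all

// Sketch: a pass that writes depth but no colour (script equivalent: colour_write off).
Ogre::Pass *pass = material->getTechnique( 0 )->getPass( 0 );
pass->setColourWriteEnabled( false ); // no colour output
pass->setDepthWriteEnabled( true );   // still fills the depth buffer
pass->setDepthCheckEnabled( true );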

Well, now to the depth format issue:
D3D9:
Ogre chooses it in D3D9RenderSystem::_createDepthBufferFor(), which delegates to _getDepthStencilFormatFor()
Choosing a depth format for the RT can be picky. Get it wrong and it will fail in some GPUs or multi monitor setups.
This has historic reasons. In the past, many cards only supported a 16-bit Z-buffer, but both 16-bit & 32-bit backbuffers. For performance reasons, usually 16 for colour & 16 for depth was often chosen.
NVIDIA deviated from this trend, and its Vanta & TNT cards introduced the following rule: 32-bit colour goes with 32-bit depth buffers, and 16 with 16 (in other words, they must use the same bit depth).
Other vendors started to mimic them and deviate with their own rules, causing a mess. Stencil buffers were added to DirectX 7 (or was it d3d 6?) and then it became worse: Some GPUs allowed 15-bit depth 1-bit stencil, while others didn't support stencil in 16-bit depth buffers and your only choice was 24-bit depth 8-bit stencil.
What's curious is that these ridiculous rules still exist (well, they aren't ridiculous, but they're a major PITA). These rules are common to both APIs, as these are hardware restrictions.
To mitigate the problem, IDirect3D9::CheckDeviceFormat & CheckDepthStencilMatch were created in D3D8.

So, basically, you can't assume you'll get what you want. Especially if you intend to use 16-bit depth formats.
Ogre plays it safe by first trying to query for 24-bit depth 8-bit stencil, then 24-bit depth no stencil, then 16-bit depth, in that order. Modern cards all have D24S8, which is the safest assumption these days, but if you stick to other formats, prepare for customer support ("the game doesn't run on my machine!").
So Ogre always creates an 8-bit stencil, even if you won't be using stencil at all. This isn't perfect either: I once read an old ATI paper that said their X series (X600, X1200, X1800, etc.) wouldn't benefit from fast Z clear if you've requested an S8 format but don't set the stencil clear flag, even if you never use stencil. I want to believe their current drivers now track whether you've used stencil at all and smartly add the stencil clear flag otherwise. ATI's X series are crap anyway (real crap).
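
The idea boils down to something like this against the raw D3D9 interface (just a sketch of the fallback order, not the actual _getDepthStencilFormatFor() code):

Code: Select all

// Sketch: try D24S8, then D24X8, then D16; return the first format the
// adapter accepts and that matches the given render target format.
D3DFORMAT chooseDepthFormat( IDirect3D9 *d3d9, UINT adapter, D3DDEVTYPE devType,
                             D3DFORMAT adapterFmt, D3DFORMAT renderTargetFmt )
{
    const D3DFORMAT candidates[] = { D3DFMT_D24S8, D3DFMT_D24X8, D3DFMT_D16 };
    for( size_t i = 0; i < sizeof(candidates) / sizeof(candidates[0]); ++i )
    {
        // Is the depth format supported at all with this adapter format?
        if( FAILED( d3d9->CheckDeviceFormat( adapter, devType, adapterFmt,
                        D3DUSAGE_DEPTHSTENCIL, D3DRTYPE_SURFACE, candidates[i] ) ) )
            continue;
        // Does it pair with the colour format of the render target?
        if( SUCCEEDED( d3d9->CheckDepthStencilMatch( adapter, devType, adapterFmt,
                        renderTargetFmt, candidates[i] ) ) )
            return candidates[i];
    }
    return D3DFMT_UNKNOWN; // nothing usable found
}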


GL
OGL has 3 ways to render offscreen (ordered from bad to good):
  • Copy
  • Pbuffers
  • FBO (Frame Buffer Object)
Each technique is handled by a manager: GLCopyingRTTManager, GLPBRTTManager, GLFBOManager
Also each technique uses its own render texture overload: GLCopyingRenderTexture, GLPBRenderTexture, GLFBORenderTexture

Copy
This is more like a trick. It's not really 'offscreen'. You just render to a portion of the backbuffer and then copy VRAM to a texture. This is old school and how they handled cool effects in the 90's. Needless to say, an RTT can't be bigger than the screen resolution. The depth buffer is always shared by everybody (as there's no notion of a separate depth buffer). Very old technique.
Switch "RTT: Preferred Mode: Copy" in the config dialog to see it.
When in this mode, a dummy GLDepthBuffer is created.

PBuffer
Old method. It was supposed to be the way OpenGL would handle RTTs, but it was a complete fiasco. Platform dependent (Windows/Linux/Mac), and too slow as it required very expensive state changes. I can't say much about it because I haven't really experimented with it. Its support was removed from most OGL drivers. Ogre falls back to "Copy" if you've selected pbuffers and they aren't supported.
I can't recall if GLDepthBuffer works here. But I never had the chance to test it (and no one has).

FBO
The way it is handled in the present.
GLFBOManager is in charge of choosing a matching depth format for the given RTT; see GLFBOManager::_tryFormat. IIRC Ogre probes everything with all formats and saves the results into a list at startup. This is because OGL has an idiotic way of doing things: to see if something is supported, you have to attempt to use it and check for errors. At runtime this means lots of state changes 'just for querying', which could make it prohibitively slow. (Unrelated: I pity GL driver developers, as they need to ensure the driver doesn't crash when you're querying for support for a combination of state changes they didn't anticipate.)
Note: D3D Rendersystem caches the query results for each format as they're being requested. This is possible because it's much simpler to query.

What's important about OGL FBOs is that they allow for the possibility of a GPU having depth buffers and stencil buffers physically separate in memory (no such device exists to my knowledge, but they might appear in the handheld field using GL ES); hence, in OpenGL you have to request a depth object and a stencil object separately. But you also have to be aware that most (all) GPUs may require those two objects to be bound together at the same time (since in practice, stencil and depth live in the same memory area).
This is properly taken care of in GLDepthBuffer. Also see static const GLenum stencilFormats[] & static const GLenum depthFormats[] declared in OgreGLFBORenderTexture.cpp

GLRenderSystem::_createDepthBufferFor takes care of choosing the right format by delegating to GLFBOManager::getBestDepthStencil (which uses the precomputed table generated at startup)
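
The probing itself is basically "attach a candidate renderbuffer and ask for completeness"; a simplified sketch using the EXT_framebuffer_object entry points (assuming they're loaded, e.g. via GLEW, and far less thorough than what GLFBOManager really does):

Code: Select all

// Sketch: does 'depthFormat' (e.g. GL_DEPTH_COMPONENT24) work as the depth
// attachment of an FBO on this driver? The only way to know is to try it.
bool probeDepthFormat( GLenum depthFormat, GLsizei width, GLsizei height )
{
    GLuint fbo = 0, rb = 0;
    glGenFramebuffersEXT( 1, &fbo );
    glBindFramebufferEXT( GL_FRAMEBUFFER_EXT, fbo );

    glGenRenderbuffersEXT( 1, &rb );
    glBindRenderbufferEXT( GL_RENDERBUFFER_EXT, rb );
    glRenderbufferStorageEXT( GL_RENDERBUFFER_EXT, depthFormat, width, height );
    glFramebufferRenderbufferEXT( GL_FRAMEBUFFER_EXT, GL_DEPTH_ATTACHMENT_EXT,
                                  GL_RENDERBUFFER_EXT, rb );

    // No colour attachment, so tell GL we won't draw/read colour, otherwise
    // the FBO is reported incomplete on most drivers.
    glDrawBuffer( GL_NONE );
    glReadBuffer( GL_NONE );

    bool ok = glCheckFramebufferStatusEXT( GL_FRAMEBUFFER_EXT ) ==
              GL_FRAMEBUFFER_COMPLETE_EXT;

    glBindFramebufferEXT( GL_FRAMEBUFFER_EXT, 0 );
    glDeleteRenderbuffersEXT( 1, &rb );
    glDeleteFramebuffersEXT( 1, &fbo );
    return ok;
}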


So, bottom line: choosing the right depth format can be very tough. Imagine what would happen if you were allowed to be picky with a custom format. That's why I'm inclined towards the "PF_DEPTH" enumeration. Which format you actually get depends on too many variables, including how the main render window was created.

Re: Hardware PCF shadows

Post by sparkprime »

copy and pbuffer are of no concern then -- this stuff will simply not be available (and will not be used by default)

There has to be a way to at least attempt to specify the following when making the depth buffers

1) bit depth
2) floating point or integer
3) shadow fetch or depth fetch

I think (3) may only be relevant on NVIDIA cards?

I'll have a skim of the classes you mentioned anyway

Re: Hardware PCF shadows

Post by sparkprime »

Also, what happens if you do a depth read of a combined depth/stencil texture? Do you just get the depth part, do you get depth/stencil in the R/G channels, or do you get all the bits in a single value?

Re: Hardware PCF shadows

Post by dark_sylinc »

sparkprime wrote: There has to be a way to at least attempt to specify the following when making the depth buffers
1) bit depth
No.
In D3D10 you're forced to 32 bit
OGL has greater flexibility, but I'm not sure how well it works.
sparkprime wrote: 2) floating point or integer
To my knowledge, no.
sparkprime wrote: 3) shadow fetch or depth fetch
I'm convinced the D24S8_SHADOWMAP flag is some kind of mistake. Google doesn't give anything meaningful, and that value is declared nowhere. We don't even know what value it is supposed to be.
I'm sure if you use tex2Dproj with bilinear filtering on a depth buffer bound as a texture it will work.

Re: Hardware PCF shadows

Post by sparkprime »

OK, we can proceed with a single ogre-level format that is mapped down to the best thing we can find, and then see what does and doesn't work. There is probably little harm in adding more of them later if that doesn't work.

floating point depth textures can be off the menu for now (I didn't actually need them anyway).

PF_DEPTH can be a normalised unsigned integer depth texture of either 16 or 24 bits depending on support

depth vs shadow lookups are (hopefully) decided by what kind of code you write in the shader

Re: Hardware PCF shadows

Post by sparkprime »

I can see where this tex2Dproj confusion is coming from now. In the 3 languages, these are the available functions:

HLSL:
float4 tex2D(sampler2D, float2) // depth fetch
float4 tex2Dbias(sampler2D, float4) // (z unused)
float4 tex2Dlod(sampler2D, float4) // (z unused)
float4 tex2Dproj(sampler2D, float4) // (z unused)

It seems there is no support for shadow or depth fetch in the language. However, there is an obvious hole in that the z component for the bias/lod/proj variants is unused, so it can be used for either depth or shadow fetches. But there is no way to choose which kind of fetch you want. This must be why there are two different formats in D3D.


CG:
float4 tex2D(sampler2D, float2) // depth fetch
float4 tex2D(sampler2D, float3) // shadow fetch
float4 tex2Dproj(sampler2D, float3) // depth fetch with projective divide
float4 tex2Dproj(sampler2D, float4) // shadow fetch with projective divide

How beautiful. Nothing to say here.

GLSL:
vec4 texture2D(sampler2D, vec2, float)
vec4 texture2DProj(sampler2D, vec3, float)
vec4 texture2DProj(sampler2D, vec4, float)
vec4 shadow2DProj(sampler2DShadow, vec4, float)

The extra float is the mipmap bias. The first 3 clearly have no z parameter so can only be used for depth fetches. The last one incorporates both projection and a shadow test. There is no way to do a shadow test without projection. However, obviously by passing '1' as the w value you can use a tex2Dproj to get a tex2D function so this omission is not a big deal.

I'd say that with GLSL and CG there is no problem with using a single depth format for all depth textures. However, for HLSL the intention needs to be explicit in the depth texture format.

Re: Hardware PCF shadows

Post by sparkprime »

dark_sylinc wrote: In D3D10 you're forced to 32 bit
I don't see where in there it says you're forced to use 32 bit. I see this:

descDepth.Format = pDeviceSettings->d3d10.AutoDepthStencilFormat;

but that could be anything

Re: Hardware PCF shadows

Post by sparkprime »

d3d9 formats:

Format Value Description
D3DFMT_D16_LOCKABLE 70 16-bit z-buffer bit depth.
D3DFMT_D32 71 32-bit z-buffer bit depth.
D3DFMT_D15S1 73 16-bit z-buffer bit depth where 15 bits are reserved for the depth channel and 1 bit is reserved for the stencil channel.
D3DFMT_D24S8 75 32-bit z-buffer bit depth using 24 bits for the depth channel and 8 bits for the stencil channel.
D3DFMT_D24X8 77 32-bit z-buffer bit depth using 24 bits for the depth channel.
D3DFMT_D24X4S4 79 32-bit z-buffer bit depth using 24 bits for the depth channel and 4 bits for the stencil channel.
D3DFMT_D32F_LOCKABLE 82 A lockable format where the depth value is represented as a standard IEEE floating-point number.
D3DFMT_D24FS8 83 A non-lockable format that contains 24 bits of depth (in a 24-bit floating point format - 20e4) and 8 bits of stencil.
D3DFMT_D32_LOCKABLE 84 A lockable 32-bit depth buffer.
D3DFMT_S8_LOCKABLE 84 A lockable 8-bit stencil buffer.
D3DFMT_D16 80 16-bit z-buffer bit depth.

Re: Hardware PCF shadows

Post by sparkprime »

Lots of information for NVIDIA in sections 6.2 - 6.4 of http://developer.download.nvidia.com/GP ... de_G80.pdf

In particular, for shadow fetch the depth format is D3DFMT_D24X8 (I presume most of the other standard formats will work too).

For doing a depth fetch it is INTZ, or RAWZ for pre-GF8 hardware; but RAWZ has different semantics, so this would have to be made known to the client programmer so they can toggle in some extra code in their shader (to filter out the stencil part).

Re: Hardware PCF shadows

Post by dark_sylinc »

sparkprime wrote:I don't see where in there it says you're forced to use 32 bit. I see this:

descDepth.Format = pDeviceSettings->d3d10.AutoDepthStencilFormat;

but that could be anything
I was wrong.
I'll quote what the link says:
To create a depth-stencil buffer that can be used as both a depth-stencil resource and a shader resource a few changes need to be made to sample code in the Create a Depth-Stencil Resource section.
The depth-stencil resource must have a typeless format such as DXGI_FORMAT_R32_TYPELESS.

descDepth.Format = DXGI_FORMAT_R32_TYPELESS;
However, you can use a DXGI_FORMAT_R16_TYPELESS format. So you can control the bit depth, but not whether you get an int or a float.

It just hit me. I believe the HW PCF filtering happens when you force a depth buffer (after all, it's just an IDirect3DSurface9 pointer) as a texture. INTZ instead just gives you D3D10 functionality and may not produce the HW PCF comparison. In this case it would be necessary to create the PF_SHADOW flag as you suggested.
Also using PF_DEPTH, PF_DEPTH_16 & PF_DEPTH_32 could be possible. Note I don't know whether, by using INTZ, you can choose the bit depth. I'm too tired right now.

Well, the only way is to build a tiny D3D app and try it. At this point it's too much speculation.

Edit: In D3D11, filtered PCF can be achieved not through hacks, but rather by altering the sampler state & using the SampleCmp instruction.

Re: Hardware PCF shadows

Post by sparkprime »

OK so do we want the ability to make depth buffers like a resource, with a name, and a DepthBufferManager or some such?

Since they are so similar to textures, is it worth extending Texture/TextureManager to also allow the representation of depth buffers? This would make it obvious how to bind them to texture units in material scripts.

Otherwise the only way to 'get at' a depth buffer would be via an RTT, and where the RTT is hidden or dynamically changing (shadow buffer, compositor), some more high level mechanism specialised for whatever is doing the hiding.

I presume there is no reason for images on disk to ever be represented by depth buffers. It seems the key thing here is that a depth buffer is the result of rendering -- it already exists as a necessary evil when doing rendering, so we may as well use it for as much as possible, but we would not start making them for other reasons.

With that in mind, perhaps it makes no sense to make them true resources, and it would be better to simply refer to them more abstractly as an element of an RTT, much like shadow textures and compositor targets are referenced from texture units by toggling a flag, and it assumed that the system will connect them to whatever buffer is filling that role at the time.

Re: Hardware PCF shadows

Post by dark_sylinc »

I think you're overthinking that part.

They should be normal RTTs, created like any other RTT (TextureManager::createManual, with PF_DEPTH and TU_RENDERTARGET).
PF_DEPTH alone is enough for the HardwarePixelBuffer contained inside the created Texture class to create a RenderTexture pointer that contains a null colour buffer and to attach a newly created (*) depth buffer.
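
In sketch form it would look like this (PF_DEPTH used this way is exactly what's being proposed, it doesn't work in current Ogre; an existing Camera *camera is assumed):

Code: Select all

// Hypothetical: create a depth-only RTT just like any colour RTT.
Ogre::TexturePtr depthTex = Ogre::TextureManager::getSingleton().createManual(
    "MyDepthRTT", Ogre::ResourceGroupManager::DEFAULT_RESOURCE_GROUP_NAME,
    Ogre::TEX_TYPE_2D, 1024, 1024,
    0,                      // no mipmaps
    Ogre::PF_DEPTH,         // the proposed depth pixel format
    Ogre::TU_RENDERTARGET );

Ogre::RenderTexture *rtt = depthTex->getBuffer()->getRenderTarget();
rtt->addViewport( camera ); // render into it like any other RTT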

After that, referencing the Texture is dead easy, just like any other texture. Materials can contain a predefined name so that they're referenced automatically (see Fresnel sample). They could be created through compositors and be referenced via binding_type compositor; etc. There are many more ways.

(*) At first I thought "ok, we need some way to indicate we want to use an existing depth buffer". But we can't really, since depth buffers that will be used as textures may require specific parameters at creation time (like "typeless" or "INTZ" in D3D11 & D3D9 respectively).

Re: Hardware PCF shadows

Post by sparkprime »

So then they would be actual textures; the only difference would be that they have an unusual pixel format. Since they would be render targets they'd have the pointer to the RenderTarget, and this would be the colourless thing.

What about having a rendertarget with both a custom depth buffer and colour? I think there are some cases where you don't want the 2x performance of depth-buffer-only rendering but you also don't want to use a colour buffer in an MRT for storing the depth. In fact deferred shading seems an obvious candidate (assuming no depth-only pre-pass to save texture fetches).

Re: Hardware PCF shadows

Post by sparkprime »

By the way are you able to talk on IRC or something like that? It seems to be only us 2 with an opinion :)

Re: Hardware PCF shadows

Post by dark_sylinc »

Mmm, good point. Maybe I was looking at it the wrong way, by focusing on the PF_DEPTH pixel format.

I've come up with an interesting alternative:
Instead of using PF_DEPTH (we would remove this pixel format), add a boolean value to RenderTarget which indicates we want a depth buffer that will be used as a texture:

Code: Select all

renderTarget->setDepthBufferTexture( true );
When the render system tries to find a matching depth buffer for that RTT, those depth buffers with the same pool ID that weren't tagged as 'DepthTextures' won't be compatible. If none is found, a new DepthBuffer is created and attached to that RT.

DepthBuffers tagged as 'DepthTextures' will create their own internal Texture interface for accessing the depth buffer as a Texture.
In other words, the code would look like this:

Code: Select all

//...Create renderTarget code here...
renderTarget->setDepthBufferTexture( true );
renderSystem->setDepthBufferFor( renderTarget ); //Force attaching a depth buffer now (or we could wait after a frame is rendered).
if( !renderTarget->getDepthBuffer() )
{
    //Error. Probably using D3D9 and no INTZ support.
}
else
{
    //Returns null if this isn't a depth texture
    TexturePtr depthTexture = renderTarget->getDepthBuffer()->getAsTexture();
}
TexturePtr would use an internal pixel format (i.e. PF_DEPTH_INTERNAL) to flag that it isn't a normal texture and that accessing its contents is API specific; TexturePtr would need a reference to its creator, a DepthBuffer. The Texture was created when the DepthBuffer was created, whose constructor was passed the argument "depthTexture = true".

When the user wants the 2x speed you mentioned, he would create a RenderTarget with PF_NULL (this PF doesn't exist yet, perhaps we could use PF_UNKNOWN?) and then set setDepthBufferTexture to true. The depth buffer is accessed as a texture in the same way.

Of course, two RTs with PF_NULL may actually end up using the same depth buffer. In order to avoid this, those two RTs would need to use different pool IDs. A debug check could be performed to warn the user when this happens.

Additionally, a Compositor syntax must be devised to enable using the depth buffer attached to an ordinary render target as a texture for a RenderQuad pass. This syntax wouldn't be necessary for PF_NULL rendertargets, but it would be needed for the MRT/deferred shading case.

Bits of code that would need modification:
  • Create PF_DEPTH_INTERNAL & PF_NULL. Remove unused PF_DEPTH
  • Texture needs to hold a pointer to DepthBuffer
  • Add RenderTarget::setDepthBufferTexture
  • Slightly modify DepthBuffer::isCompatible() to reject matches where the RenderTarget needs a depth texture and the DepthBuffer isn't one (see the sketch after this list)
  • Modify DepthBuffer constructor to use a new argument to indicate this is a depthTexture, and it will create a Texture with PF_DEPTH_INTERNAL when true.
  • Debug check when two PF_NULL RenderTargets end up using the same depth buffer.
  • Create a sample to show how to do all this.
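
The isCompatible() change might look roughly like this (a sketch; mDepthTexture and getDepthBufferTexture() are the hypothetical flag from this proposal, the rest mimics the existing checks):

Code: Select all

// Sketch of the proposed change; mDepthTexture and getDepthBufferTexture()
// don't exist yet, they're the 'depth texture' flag discussed above.
bool DepthBuffer::isCompatible( RenderTarget *renderTarget ) const
{
    // Usual checks (size, FSAA) kept as they are today.
    if( this->getWidth() < renderTarget->getWidth() ||
        this->getHeight() < renderTarget->getHeight() ||
        this->getFsaa() != renderTarget->getFSAA() )
        return false;

    // New: depth-texture buffers only pair with RTs that asked for one.
    if( mDepthTexture != renderTarget->getDepthBufferTexture() )
        return false;

    return true;
}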
I like this approach: it doesn't require much modification, it's flexible, and it seems to cover all cases.

Any thoughts?

Re: Hardware PCF shadows

Post by sparkprime »

dark_sylinc wrote:add a boolean value to RenderTarget which indicates we want a depth buffer that will be used as a texture:

Code: Select all

renderTarget->setDepthBufferTexture( true );
So this chooses depth fetch instead of shadow fetch. It doesn't allow choosing format more precisely but we can probably punt that for now. There is no issue of matching the resolution since that can't be controlled.

How does one do a shadow fetch? Leave this to false? Will the texture binding mechanism still work? Do we need another boolean for that maybe?
When the render system tries to find a matching depth buffer for that RTT, those depth buffers with the same pool ID that weren't tagged as 'DepthTextures' won't be compatible. If none is found; a new DepthBuffer is created and attached to that RT.
So the user won't have to make them, the pool will grow automatically?
DepthBuffers tagged as 'DepthTextures' will create their own internal Texture interface for access the depth buffer as a Texture.
When do DepthBuffers get destroyed, who will clean up the Texture implementation, and is it possible that a DepthBuffer can outlive a DepthTexture?
TexturePtr would use an internal pixel format (i.e. PF_DEPTH_INTERNAL) to flag that it isn't a normal texture and that accessing its contents is API specific; TexturePtr would need a reference to its creator, a DepthBuffer. The Texture was created when the DepthBuffer was created, whose constructor was passed the argument "depthTexture = true".
So we don't actually have a DepthTexture class, just add some fields to the Texture class to make some instances special?

When the user wants the 2x speed you mentioned, he would create a RenderTarget with PF_NULL (this PF doesn't exist yet, perhaps we could use PF_UNKNOWN?) and then set setDepthBufferTexture to true. The depth buffer is accessed as a texture in the same way.
Of course, two RTs with PF_NULL may actually end up using the same depth buffer. In order to avoid this, those two RTs would need to use different pool IDs. A debug check could be performed to warn the user when this happens.
Maybe there should just be an 'I want to be unique, damnit' flag, as well as the IDs. There could be a special ID for that?
Additionally, a Compositor syntax must be devised to enable using the depth buffer attached to an ordinary render target as a texture for a RenderQuad pass. This syntax wouldn't be necessary for PF_NULL rendertargets, but it would be needed for the MRT/deferred shading case.
That, and the material interface.

Also we have to figure out how this is used for shadows.
• Texture needs to hold a pointer to DepthBuffer
Other way round, no?
• Modify DepthBuffer constructor to use a new argument to indicate this is a depthTexture, and it will create a Texture with PF_DEPTH_INTERNAL when true.
We might actually create that every time, for use for shadow fetches? Or maybe just have 2 booleans in the constructor.

Re: Hardware PCF shadows

Post by dark_sylinc »

sparkprime wrote: So this chooses depth fetch instead of shadow fetch. It doesn't allow choosing format more precisely but we can probably punt that for now. There is no issue of matching the resolution since that can't be controlled.

How does one do a shadow fetch? Leave this to false? Will the texture binding mechanism still work? Do we need another boolean for that maybe?
That's what I was thinking. When it is left to false, it could end up using either a depth texture or a depth buffer. In other words, undefined.
However, if experience shows INTZ doesn't do HW PCF, we would need to make it non-boolean and hold 3 values: undefined, depth buffer only, depth texture only. Or just false = depth buffer only, true = depth texture only.

As for choosing the format more precisely, this is still possible: a pixel format of "PF_X8R8G8B8" means we need a 32-bit depth buffer. This means, though, that we should have PF_NULL16 & PF_NULL32. I think the 'null' is misleading now. Maybe go back to PF_DEPTH16 & PF_DEPTH32, with a comment & documentation explaining that the colour is null.

Note you can't control if you get an integer depth buffer or float one. But this isn't possible at API level anyway.
sparkprime wrote: So the user won't have to make them, the pool will grow automatically?
Yes.
sparkprime wrote: When do DepthBuffers get destroyed, who will clean up the Texture implementation, and is it possible that a DepthBuffer can outlive a DepthTexture?
I need to sort that out in detail. Definitely Ogre should be responsible for cleaning up the Texture automatically.
Textures should be destroyed when DepthBuffers are. However, they would only actually be destroyed if no one is referencing their textures; otherwise the only effect _cleanupDepthBuffers() would have is that, in D3D, when the device is lost the internal D3D resources must be recreated, but the pointers wouldn't be left dangling out there (unless we're on exit).
Basically, the same behavior you get with manual depth buffers, except this time they might be removed if we are certain they aren't being referenced as textures anywhere (we can know that since TexturePtr is reference-counted).
sparkprime wrote: So we don't actually have a DepthTexture class, just add some fields to the Texture class to make some instances special?
Yup. But it's not a bad idea if the check (format == PF_DEPTH_INTERNAL) ends up being too much overhead for the original Texture class.
Basically DepthTexture (or tweaked Texture) would wrap around DepthBuffer. You're probably right, it makes more sense to use a derived class.

Actually we would need to modify D3D9Texture/GLTexture, or create D3D9DepthTexture & GLDepthTexture. "DepthTexture" alone is pointless, as the implementation is very tightly related to the API.
sparkprime wrote:
Of course, two RTs with PF_NULL may actually end up using the same depth buffer. In order to avoid this, those two RTs would need to use different pool IDs. A debug check could be performed to warn the user when this happens.
Maybe there should just be an 'I want to be unique, damnit' flag, as well as the IDs. There could be a special ID for that?
Not sure, may be...
I think it would be easier to have an ID counter which defaults to a high value, and is then incremented each time a PF_NULL RT is created. Finally use that ID as the pool ID.

To avoid users accidentally assigning that ID, we add a value called POOL_MAX_VALUE to the PoolId enumeration. Automatically generated values will be assigned beyond POOL_MAX_VALUE, and the PoolId comments/documentation will explain that higher values are possible but reserved for internal usage, and that using them is undefined.
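
Something like this (purely illustrative; the enum values are made up, and POOL_MAX_VALUE plus the auto-assignment are the proposal itself, not existing Ogre API):

Code: Select all

// Sketch: reserve every pool ID above POOL_MAX_VALUE for internally
// generated, guaranteed-unique pools (one per PF_NULL render target).
enum PoolId
{
    POOL_NO_DEPTH  = 0,
    POOL_DEFAULT   = 1,
    POOL_MAX_VALUE = 0xFFF0    // user-assignable IDs stay at or below this
};

static Ogre::uint16 sNextAutoPoolId = POOL_MAX_VALUE + 1;

// Called whenever a PF_NULL render target is created without an explicit pool:
// each one gets a unique pool, hence its own depth buffer.
Ogre::uint16 assignUniquePoolId()
{
    return sNextAutoPoolId++;
}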
sparkprime wrote:
Additionally, a Compositor syntax must be devised to enable using the depth buffer attached to an ordinary render target as a texture for a RenderQuad pass. This syntax wouldn't be necessary for PF_NULL rendertargets, but it would be needed for the MRT/deferred shading case.
That, and the material interface.

Also we have to figure out how this is used for shadows.
True about the material interface. The manual says the "texture" keyword allows many parameters. Maybe using "depth_texture" will make the material use the depth buffer attached to the Render Target. If the texture isn't a render target, this boolean is ignored.

The shadow thingy would go as follows:
D3D11 & GL
Setup as PF_DEPTH32/PF_DEPTH16, use a SampleCmp (or equivalent in GL) for the receiver

D3D9
If INTZ allows HW PCF, it's exactly the same as D3D11, although with bilinear filtering instead of SampleCmp.
If INTZ and HW PCF don't play together, recognize PF_DEPTH32/PF_DEPTH16 as a special case and create a DepthBuffer that contains a DepthTexture, but with a regular D3D9 depth buffer.

We'll have to wait to see how it actually works, but it shouldn't be complex compared to everything else.
• Texture needs to hold a pointer to DepthBuffer
Other way round, no?
Both ways actually. The DepthBuffer created the texture, it controls it and holds the pointer to it.
The DepthTexture is just a wrapper around a DepthBuffer, so it needs a pointer to its creator.
• Modify DepthBuffer constructor to use a new argument to indicate this is a depthTexture, and it will create a Texture with PF_DEPTH_INTERNAL when true.
We might actually create that every time, for use for shadow fetches? Or maybe just have 2 booleans in the constructor.
I don't understand the question.

Re: Hardware PCF shadows

Post by sparkprime »

dark_sylinc wrote:That's what I was thinking. When it is left to false, it could end up using either a depth texture or a depth buffer. In other words, undefined.
However, if experience shows INTZ doesn't do HW PCF, we would need to make it non-boolean and hold 3 values: undefined, depth buffer only, depth texture only. Or just false = depth buffer only, true = depth texture only.
I'm pretty certain INTZ won't do a shadow fetch since there is no way to choose which way you want to fetch in the shader, in HLSL. Not sure how this works out when CG is compiled down to D3D ASM, since CG does have 2 versions of the function (one to retrieve the value (depth fetch) and one to do the comparison (shadow fetch)).
Maybe go back to PF_DEPTH16 & PF_DEPTH32, with a comment & documentation explaining that the colour is null.
PF_DEPTHONLY16 :)
Note you can't control if you get an integer depth buffer or float one. But this isn't possible at API level anyway.
GL has specific formats, so we're just waiting on D3D. It can always be supported later just by adding more formats, though.
sparkprime wrote: When do DepthBuffers get destroyed, who will clean up the Texture implementation, and is it possible that a DepthBuffer can outlive a DepthTexture?
I need to sort that out in detail. Definitely Ogre should be responsible for cleaning up the Texture automatically.
Textures should be destroyed when DepthBuffers are. However, they would only actually be destroyed if no one is referencing their textures; otherwise the only effect _cleanupDepthBuffers() would have is that, in D3D, when the device is lost the internal D3D resources must be recreated, but the pointers wouldn't be left dangling out there (unless we're on exit).
Basically, the same behavior you get with manual depth buffers, except this time they might be removed if we are certain they aren't being referenced as textures anywhere (we can know that since TexturePtr is reference-counted).
There's no way to avoid the dangling pointer then, there can always be a TexturePtr in user code, or from a material texture unit or something. We have to either keep everything alive until the texture is gone (i.e. reference from depth buffer to texture is weak) or mark the texture and then check this at rendering time to see if the depth buffer still exists, and throw an exception otherwise (or maybe just not bind it or whatever).
True about the material interface. The manual says the "texture" keyword allows many parameters. Maybe using "depth_texture" will make the material use the depth buffer attached to the Render Target. If the texture isn't a render target, this boolean is ignored.
I thought in a material the texture is set to the texture name, not the render target name? IIRC they are not the same.

Regardless, I think that a material that is set up to bind an RTT as a texture unit should be changeable to use its depth buffer instead, by making a tiny change.

How about adding 3 more things to the content type? I.e. [named, shadow, compositor, named_depth, shadow_depth, compositor_depth]

However what happens if someone uses named_depth and it's pointing to a texture that is not part of an RTT?
The shadow thingy would go as follows:
D3D11 & GL
Setup as PF_DEPTH32/PF_DEPTH16, use a SampleCmp (or equivalent in GL) for the receiver
The key thing about D3D11/GL is that they have separate instructions for doing depth / shadow fetches, right? GL has shadow2D and texture2D, and shadow2D has the extra z param. Can you reiterate the D3D11 situation regarding this?

Once we have the shadow fetch working, in all cases PCF should be obtainable just by changing the filtering through the usual mechanisms so we don't have to worry about PCF, just basic shadow fetches.
I'm more interested in what has to be changed in the scenemanager API and implementation. With the PF_DEPTHONLY thing I guess it falls out quite nicely with the ShadowConfig object. Then as long as the extension to the material system (content type or whatever) works uniformly for compositors and shadow textures, all should be well.
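
At the API level I'd expect the receiver side to come down to something like this (a sketch only; CONTENT_SHADOW already exists, receiverPass is assumed to be the receiver material's pass):

Code: Select all

// Sketch: the shadow texture unit on a receiver pass. On D3D9/NVIDIA it's the
// bilinear filtering on the bound depth texture that turns the shadow fetch
// into hardware PCF.
Ogre::TextureUnitState *shadowUnit = receiverPass->createTextureUnitState();
shadowUnit->setContentType( Ogre::TextureUnitState::CONTENT_SHADOW );
shadowUnit->setTextureFiltering( Ogre::FO_LINEAR, Ogre::FO_LINEAR, Ogre::FO_NONE );
shadowUnit->setTextureAddressingMode( Ogre::TextureUnitState::TAM_BORDER );
shadowUnit->setTextureBorderColour( Ogre::ColourValue::White );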

D3D9
If INTZ and HW PCF don't play together, recognize PF_DEPTH32/PF_DEPTH16 as a special case and create a DepthBuffer that contains a DepthTexture, but with a regular D3D9 depth buffer.
• Modify DepthBuffer constructor to use a new argument to indicate this is a depthTexture, and it will create a Texture with PF_DEPTH_INTERNAL when true.
Or maybe just have 2 booleans in the constructor.
I don't understand the question.
This extra boolean (also discussed earlier) would be the thing that tells it to use the special format that causes the tex2D functions to do shadow fetches instead of depth fetches. It should only be needed on D3D9. A 3-value enum is also good.

Re: Hardware PCF shadows

Post by sparkprime »

As for demos, I guess a good thing to do is modify the shadows demo to use depth-only shadows and hardware PCF where available.

To test the compositor stuff, maybe a modification of the deferred shading demo, but that could make the code very complex. Assuming it works, could we perhaps modify the demo to use only depth fetches and not attempt to put the depth in a colour buffer?

The explicit RTT support probably doesn't need a demo, as I'm guessing shadows and compositors cover 95% of use cases.

Re: Hardware PCF shadows

Post by dark_sylinc »

sparkprime wrote: PF_DEPTHONLY16 :)
Nice name :)
sparkprime wrote:
Note you can't control if you get an integer depth buffer or float one. But this isn't possible at API level anyway.
GL has specific formats so just waiting for D3D. It can always be supported later just by adding more formats though.
GL has always been lagging behind D3D since 2003. The way I see it, it's gonna be the other way around.
Like you said anyway, we can add more formats and problem solved.
sparkprime wrote:There's no way to avoid the dangling pointer then, there can always be a TexturePtr in user code, or from a material texture unit or something. We have to either keep everything alive until the texture is gone (i.e. reference from depth buffer to texture is weak)
Maybe I wasn't clear. That's what I meant.
A device reset in D3D wouldn't delete the DepthBuffer ptrs if the TexturePtr is being used somewhere:

Code: Select all

if( texturePtrWeOwn.useCount() > 1 )
    return; //someone still references the texture, so don't delete this DepthBuffer yet
Note the device reset implies recreating the Direct3D9 surface, which means its contents in GPU memory are lost and become garbage until the next render. We can't do anything about that. Users should check that no device reset was made, like with any RT.
sparkprime wrote:I thought in a material the texture is set to the texture name, not the render target name? IIRC they are not the same.
RenderTarget derives from Texture. All RenderTargets are Textures, not all textures are RenderTarget. Using the name of the RenderTarget works when you call TextureUnitState::setTextureName. It even works putting the name in the material file, as long as the material is loaded after the RenderTarget is created. Fresnel Demo does this (the render target names are "refraction" and "reflection").
sparkprime wrote: How about adding 3 more things to the content type? I.e. [named, shadow, compositor, named_depth, shadow_depth, compositor_depth]
shadow_depth is unnecessary, since we can infer this automatically at runtime when the shadow textures are PF_DEPTHONLY16/32. Furthermore, it saves the developer from having to modify all his materials when he wants to switch from shadow to shadow_depth or vice versa, just for a simple comparison to see what works better for him.

I was thinking more of:

Code: Select all

texture MyRenderTargetName use_depth
This would work for both compositor and named, and ignored in shadow.
sparkprime wrote:However what happens if someone uses named_depth and it's pointing to a texture that is not part of an RTT?
We could ignore it, or raise an exception (invalid params) except on shadow content type. I believe we can't raise an exception at parsing time, but rather at loading time.
sparkprime wrote:The key thing about D3D11/GL is that they have separate instructions for doing depth / shadow fetches, right? GL has shadow2D and texture2D, and shadow2D has the extra z param. Can you reiterate the D3D11 situation regarding this?
D3D9 has tex2Dproj, which takes 4 parameters, with the z param unused. The NVIDIA driver automatically recognizes this when bilinear filtering is on and a depth buffer is bound; then the z param is used for the comparison. Cg has tex2Dproj with 3 parameters, which ends up translating to HLSL's tex2Dproj with 4 params.

GL has shadow2D, no question there.

D3D11 has myTexture.SampleCmpLevelZero, and needs its sampler state (if I'm not mistaken) to be set to D3D11_FILTER_COMPARISON_MIN_MAG_LINEAR_MIP_POINT.
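
The C++ side of that is just a comparison sampler, roughly (a raw D3D11 sketch, assuming an ID3D11Device *device):

Code: Select all

// Sketch: a comparison sampler for SampleCmp / SampleCmpLevelZero.
D3D11_SAMPLER_DESC desc = {};
desc.Filter         = D3D11_FILTER_COMPARISON_MIN_MAG_LINEAR_MIP_POINT;
desc.AddressU       = D3D11_TEXTURE_ADDRESS_BORDER;
desc.AddressV       = D3D11_TEXTURE_ADDRESS_BORDER;
desc.AddressW       = D3D11_TEXTURE_ADDRESS_BORDER;
desc.ComparisonFunc = D3D11_COMPARISON_LESS_EQUAL;
desc.BorderColor[0] = desc.BorderColor[1] =
desc.BorderColor[2] = desc.BorderColor[3] = 1.0f; // 'fully lit' outside the map

ID3D11SamplerState *shadowSampler = NULL;
device->CreateSamplerState( &desc, &shadowSampler );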

Re: Hardware PCF shadows

Post by sparkprime »

Note the device reset implies recreating the Direct3D9 surface, which means its contents in GPU memory are lost and become garbage until the next render. We can't do anything about that. Users should check that no device reset was made, like with any RT.
I don't understand how this will be presented at the Ogre level but if there is already a precedent for it with RenderTarget then that should obviously be followed. It should not be explained or documented in terms of D3D behaviour as Texture is an OgreMain level concept.

I believe, if a user changes the shadow config and continues to render with the wrong shadow textures (e.g. referred to them specifically by name instead of using the mechanism that causes it to be automatically updated in this event) then he just gets the 'old' contents of this texture indefinitely but otherwise everything still 'works'. I think something similar should be true here. Destroying the DepthBuffer should just mean it is no longer updated, because the 'write' interface to the texture is gone.
RenderTarget derives from Texture.
RenderTexture derives from RenderTarget,
RenderTarget is a root :)
There is no subtype relationship between Texture<-->RenderTarget or Texture<-->RenderTexture
Using the name of the RenderTarget works when you call TextureUnitState::setTextureName. It even works putting the name in the material file, as long as the material is loaded after the RenderTarget is created. Fresnel Demo does this (the render target names are "refraction" and "reflection").
The fresnel demo creates textures called refraction and reflection that have TU_RENDERTARGET, I don't think the rendertarget name is ever set or used. I guess it may not even have any use except for logging / debugging purposes.
sparkprime wrote: How about adding 3 more things to the content type? I.e. [named, shadow, compositor, named_depth, shadow_depth, compositor_depth]
shadow_depth is unnecessary, since we can infer this automatically at runtime when the shadow textures are PF_DEPTHONLY16/32. Furthermore, it saves the developer from having to modify all his materials when he wants to switch from shadow to shadow_depth or vice versa, just for a simple comparison to see what works better for him.
You misunderstand what I mean by shadow_depth -- it would be the same as shadow except it would select the depth buffer of the implicit shadow RTT. The compositor variant is the same except it refers to the implicit compositor RTT, and the named_depth is for explicit user-created RTTs. All 3 are definitely needed, and nothing here is selecting what kind of fetches are done in the shader -- as you say, in the most general case that is controlled by the format when the texture is first created.
I was thinking more of:

Code: Select all

texture MyRenderTargetName use_depth
This would work for both compositor and named, and ignored in shadow.
That is effectively the same, if you incorporate what I just said above. The only difference is that the 6 values are expressed with one 3-value type and one 2-value type ([named, compositor, shadow], [true, false]).

I can't think of any objective reason to choose between 6 values or 3*2 values though. I don't think it matters as long as we are using the same reasoning behind the need for it. :)
sparkprime wrote:However what happens if someone uses named_depth and it's pointing to a texture that is not part of an RTT?
We could ignore it, or raise an exception (invalid params) except on shadow content type. I believe we can't raise an exception at parsing time, but rather at loading time.
I suppose this is analogous to specifying a compositor or shadow texture that does not exist (too high an index, or wrong name). Then, there is an exception at render time I believe. We just need to add more logic to that check. Alternatively, just bind colour instead, but I suspect an exception would be more helpful, and there may not be any colour buffer anyway.
sparkprime wrote:The key thing about D3D11/GL is that they have separate instructions for doing depth / shadow fetches, right? GL has shadow2D and texture2D, and shadow2D has the extra z param. Can you reiterate the D3D11 situation regarding this?
D3D9 has tex2Dproj...
We already discussed D3D9; I'm asking what the tex2D situation is in the HLSL of D3D11 -- they can't be doing the same thing as D3D9 or they would also need a variation on the depth format to distinguish between shadow / depth fetches. They must have something like shadow2D().
D3D11 has myTexture.SampleCmpLevelZero, and needs its sampler state (if I'm not mistaken) to be set to D3D11_FILTER_COMPARISON_MIN_MAG_LINEAR_MIP_POINT
This is mipmaps and filtering? That setup stuff will just be abstracted by ogre, right? It should even be already implemented and we don't have to change anything. I'm worried about what user shaders will look like.


I feel we are really tying this down now, I should be able to start implementing soon :)

Re: Hardware PCF shadows

Post by sparkprime »

I have a dual-boot Windows XP and Linux machine with an 8000-series NVIDIA chip, so I can't test D3D11 or any ATI implementations.

It occurs to me, the easiest way to get *something* working, as a first step, is to start with GL, implement the depth buffer / texture connection and allow the material to bind the depth buffer of an RTT (at the API level, don't need to touch scripts at first). Then shadow2D and texture2D ought to start working with the depth buffer values, since the default format ought to be OK. PCF ought to work straight out of the box in GL too, by just changing the filtering.

It seems the pixelformat stuff is only required to support depth-only targets and the hacks for D3D9, so that can be the next 2 steps.

The other stuff (script syntax, tests, etc) should not be challenging.

Re: Hardware PCF shadows

Post by sparkprime »

OK I've been looking more closely at the code.

Let's assume that there is a render texture that has been created, and it naturally has a depth buffer automatically created by the pool.

Some TextureUnitState references this texture by name, and has a boolean set that means to bind to the depth rather than the colour buffer.

So we're not doing anything interesting at this stage, just binding the depth buffer in a pass so it can be read from a shader. I am tackling this first because it's a necessary part of any use of depth buffer fetches, and also it's the part I understand least.

One crucial bit seems to be e.g. in _setPass, and renderSingleObject, where this happens:

RenderSystem::_setTextureUnitSettings(...)

We have to somehow tell it to bind not the colour buffer but the depth buffer. I can think of a couple of ways of structuring this (without knowing exactly how it happens in the various backends and RTT strategies).

* We could give a different TexturePtr to the RenderSystem, one that encapsulates the depth instead of the colour. This would be linked from the existing TexturePtr. We would chase the pointer in _setPass if the TUS flag was set (see the sketch after this list).
* We could have a method that sets a boolean in the rendersystem that would cause it to bind the depth instead of the colour of the existing TexturePtr. We would set this boolean if the TUS flag was set.
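
A sketch of the first option (getBindDepthBuffer() and getAsTexture() are names from the proposal above, not existing API; tus and unit are assumed to be in scope):

Code: Select all

// Hypothetical sketch of option 1, e.g. inside _setPass / _setTextureUnitSettings:
Ogre::TexturePtr tex = tus->_getTexturePtr();   // the TUS's normal (colour) texture
if( tus->getBindDepthBuffer() && !tex.isNull() &&
    (tex->getUsage() & Ogre::TU_RENDERTARGET) )
{
    Ogre::RenderTarget *rt = tex->getBuffer()->getRenderTarget();
    if( rt->getDepthBuffer() )
    {
        // Proposed API: the Texture wrapper created by the DepthBuffer.
        Ogre::TexturePtr depthTex = rt->getDepthBuffer()->getAsTexture();
        if( !depthTex.isNull() )
            tex = depthTex;                     // bind depth instead of colour
    }
}
mDestRenderSystem->_setTexture( unit, true, tex );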

thoughts?