sparkprime wrote:So this is a conservative thing -- you can force things to be separate (for correctness) but you cannot force things to be shared (for performance)?
Yup, that's correct. Nice way of putting it
OK that sounds good. Are these created using depth formats? I'm not sure which of these to support, or what the GL equivalents are (from the page you linked a few posts ago):
D3DFMT_D16
D3DFMT_D24X8
Yes, but they're chosen by the RenderSystem. And the specific format is stored in D3D9DepthBuffer & GLDepthBuffer. More on this later
These two seem to be supported on modern cards and are only useful when doing a PCF shadow fetch from the shader; we could expose them as PF_SHADOW and PF_SHADOW24, for example.
I would prefer to use "PF_DEPTH" only, because you can't be picky with what you request. Keep reading. BTW it's also useful when doing all kinds of depth-based compositing.
In Ogre, the pixel format PF_DEPTH is reserved for this use, albeit not used.
Not sure whether to remove this or use it for INTZ?
Use it for INTZ. "INTZ" is a hack that just exposes Dx10 capabilities to the Dx9 API; therefore you get (almost?) the same stuff you would get by using the D3D11 or GL rendersystems. The user doesn't need to know we're hacking. He just needs to know when it's present, and that it works consistently.
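For what it's worth, the app-side part of the INTZ hack looks roughly like this (a sketch with my own helper names, not the actual Ogre integration):

Code:
#include <d3d9.h>

// INTZ is a FOURCC "format" that Dx10-class GPUs expose through their Dx9 drivers.
const D3DFORMAT D3DFMT_INTZ = (D3DFORMAT)MAKEFOURCC('I', 'N', 'T', 'Z');

// Detection: can we create a depth-stencil *texture* with this format?
bool supportsINTZ(IDirect3D9* d3d, D3DFORMAT adapterFmt)
{
    return SUCCEEDED(d3d->CheckDeviceFormat(D3DADAPTER_DEFAULT, D3DDEVTYPE_HAL,
                                            adapterFmt, D3DUSAGE_DEPTHSTENCIL,
                                            D3DRTYPE_TEXTURE, D3DFMT_INTZ));
}

// Creation: a regular texture whose surface is bound as the depth-stencil while
// rendering, and which can later be sampled directly in the shader.
IDirect3DTexture9* createINTZTexture(IDirect3DDevice9* dev, UINT w, UINT h)
{
    IDirect3DTexture9* tex = 0;
    dev->CreateTexture(w, h, 1, D3DUSAGE_DEPTHSTENCIL, D3DFMT_INTZ,
                       D3DPOOL_DEFAULT, &tex, 0);
    return tex;
}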
Agreed, there should be an RTT with depth buffer but no colour buffer. Hopefully this is possible without making a mess of the RenderTarget API.
This also mirrors the ability to write only depth in a particular pass, which I think is nice.
Just so you know, passes that write only depth can already be done using a material pass with colour_write off.
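If you prefer doing it from C++ rather than a material script, it's just a flag on the pass. Sketch only ("DepthOnlyMat" and the function name are made up, and I'm assuming Ogre is already initialised):

Code:
#include <OgreMaterialManager.h>
#include <OgreResourceGroupManager.h>
#include <OgreTechnique.h>
#include <OgrePass.h>

void makeDepthOnlyMaterial()
{
    Ogre::MaterialPtr mat = Ogre::MaterialManager::getSingleton().create(
        "DepthOnlyMat",
        Ogre::ResourceGroupManager::DEFAULT_RESOURCE_GROUP_NAME);
    Ogre::Pass* pass = mat->getTechnique(0)->getPass(0);
    pass->setColourWriteEnabled(false); // no colour output...
    pass->setDepthWriteEnabled(true);   // ...but depth is still written
}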
Well, now to the depth format issue:
D3D9:
Ogre chooses it in D3D9RenderSystem::_createDepthBufferFor(), which delegates to _getDepthStencilFormatFor()
Choosing a depth format for the RT can be tricky. Get it wrong and it will fail on some GPUs or multi-monitor setups.
This has historical reasons. In the past, many cards only supported a 16-bit Z-buffer, but both 16-bit & 32-bit backbuffers. For performance reasons, 16-bit colour with 16-bit depth was often chosen.
NVIDIA deviated from this trend, and its Vanta & TNT cards introduced the following rule: 32-bit colour goes with 32-bit depth buffers, and 16 with 16 (in other words, they must use the same bit depth).
Other vendors started to mimic them and deviate with their own rules, causing a mess. Stencil buffers were added in DirectX 7 (or was it D3D 6?) and then it became worse: some GPUs allowed 15-bit depth + 1-bit stencil, while others didn't support stencil in 16-bit depth buffers at all, and your only choice was 24-bit depth + 8-bit stencil.
What's curious is that these ridiculous rules still exist (well, they aren't ridiculous, but they're a major PITA). These rules are common to both APIs, as they are hardware restrictions.
To mitigate the problem, CheckDeviceFormat & CheckDepthStencilMatch were introduced back in D3D8 (and are still there in IDirect3D9).
So, basically, you can't assume you'll get what you want. Especially if you intend to use 16-bit depth formats.
Ogre plays it safe by first trying to query for 24-bit depth with 8-bit stencil, then 24-bit depth with no stencil, then 16-bit depth. In that order. Modern cards all have D24S8, which is the safest assumption these days, but if you stick to other formats, prepare for customer support ("the game doesn't run on my machine!")
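A hedged sketch of that fallback (not the real _getDepthStencilFormatFor, which also caches results and deals with multisampling; names here are my own):

Code:
#include <d3d9.h>

// Try D24S8 -> D24X8 -> D16, keeping the first format the adapter accepts both
// as a depth-stencil surface and as a match for the given render target format.
D3DFORMAT pickDepthFormat(IDirect3D9* d3d, D3DFORMAT adapterFmt,
                          D3DFORMAT renderTargetFmt)
{
    const D3DFORMAT candidates[] = { D3DFMT_D24S8, D3DFMT_D24X8, D3DFMT_D16 };
    for (size_t i = 0; i < sizeof(candidates) / sizeof(candidates[0]); ++i)
    {
        if (FAILED(d3d->CheckDeviceFormat(D3DADAPTER_DEFAULT, D3DDEVTYPE_HAL,
                                          adapterFmt, D3DUSAGE_DEPTHSTENCIL,
                                          D3DRTYPE_SURFACE, candidates[i])))
            continue;
        if (SUCCEEDED(d3d->CheckDepthStencilMatch(D3DADAPTER_DEFAULT,
                                                  D3DDEVTYPE_HAL, adapterFmt,
                                                  renderTargetFmt, candidates[i])))
            return candidates[i];
    }
    return D3DFMT_UNKNOWN; // nothing usable; the caller has to cope
}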
So, Ogre always creates an 8-bit stencil, even if you won't be using stencil at all. This isn't perfect either: I once read an old ATI paper saying their X series (X600, X1200, X1800, etc.) wouldn't benefit from fast Z clear if you requested an S8 format but didn't set the stencil clear flag, even if you never used the stencil. I want to believe their current drivers now track whether you've used stencil at all and smartly add the stencil clear flag when needed. ATI's X series are crap anyway (real crap)
GL
OGL has 3 ways to render offscreen (ordered from bad to good):
- Copy
- Pbuffers
- FBO (Frame Buffer Object)
Each technique is handled by a manager: GLCopyingRTTManager, GLPBRTTManager, GLFBOManager
Also, each technique uses its own RenderTexture subclass: GLCopyingRenderTexture, GLPBRenderTexture, GLFBORenderTexture
Copy
This is more like a trick. It's not really 'offscreen': you just render to a portion of the backbuffer and then copy that region into a texture. This is old school, and how cool effects were done in the '90s. Needless to say, an RTT can't be bigger than the screen resolution. The depth buffer is always shared by everybody (as there's no notion of a separate depth buffer). Very old technique.
Switch "RTT: Preferred Mode: Copy" in the config dialog to see it.
When in this mode, a dummy GLDepthBuffer is created.
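In GL terms, the whole 'copy' approach boils down to something like this (sketch only; Ogre's GLCopyingRenderTexture is obviously more careful about formats and viewports, and the function name is mine):

Code:
#include <GL/gl.h>

// Render the RTT content into the lower-left corner of the backbuffer first,
// then copy that region into the given texture. 'tex' must already be
// allocated (e.g. via glTexImage2D) with at least width x height texels.
void copyBackbufferToTexture(GLuint tex, int width, int height)
{
    glBindTexture(GL_TEXTURE_2D, tex);
    glCopyTexSubImage2D(GL_TEXTURE_2D, 0, // target, mip level
                        0, 0,             // destination offset inside the texture
                        0, 0,             // source position in the backbuffer
                        width, height);   // region to copy
}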
PBuffer
Old method. It was supposed to be the way OpenGL would handle RTTs, but it was a complete fiasco: platform dependent (Windows/Linux/Mac), and too slow as it required very expensive state changes. I can't say much about it because I haven't really experimented with it. Its support was removed from most OGL drivers. Ogre falls back to "Copy" if you've selected pbuffers and they aren't supported.
I can't recall if GLDepthBuffer works here; I never had the chance to test it (and neither has anyone else).
FBO
This is how it's handled nowadays.
GLFBOManager is in charge of choosing a matching depth format for the given RTT; see GLFBOManager::_tryFormat. IIRC Ogre probes all format combinations at startup and saves the results into a list. This is because OGL has an idiotic way of doing things: to see if something is supported, you have to attempt to use it and check for errors. At runtime this would mean lots of state changes 'just for querying', which could make it prohibitively slow. (Unrelated: I pity GL driver developers, as they need to ensure the driver doesn't crash when you query for support of a combination of states they didn't anticipate.)
Note: the D3D rendersystem caches the query results for each format as they're requested. This is possible because D3D is much simpler to query.
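To give an idea, one of those probes looks roughly like this (sketch with my own naming; the real GLFBOManager::_tryFormat also attaches a colour renderbuffer and walks the packed/stencil combinations; assumes a GL context and glewInit() are already done):

Code:
#include <GL/glew.h>

// Probe whether 'depthFormat' can actually back an FBO depth attachment:
// build a tiny dummy FBO, attach a renderbuffer of that format, and ask GL
// whether the result is "framebuffer complete".
bool tryDepthFormat(GLenum depthFormat)
{
    GLuint fbo = 0, rb = 0;
    glGenFramebuffersEXT(1, &fbo);
    glBindFramebufferEXT(GL_FRAMEBUFFER_EXT, fbo);

    glGenRenderbuffersEXT(1, &rb);
    glBindRenderbufferEXT(GL_RENDERBUFFER_EXT, rb);
    glRenderbufferStorageEXT(GL_RENDERBUFFER_EXT, depthFormat, 32, 32);
    glFramebufferRenderbufferEXT(GL_FRAMEBUFFER_EXT, GL_DEPTH_ATTACHMENT_EXT,
                                 GL_RENDERBUFFER_EXT, rb);

    // Depth-only FBO: tell GL we won't draw to or read from any colour buffer.
    glDrawBuffer(GL_NONE);
    glReadBuffer(GL_NONE);

    const bool ok = glCheckFramebufferStatusEXT(GL_FRAMEBUFFER_EXT) ==
                    GL_FRAMEBUFFER_COMPLETE_EXT;

    glBindFramebufferEXT(GL_FRAMEBUFFER_EXT, 0);
    glDeleteRenderbuffersEXT(1, &rb);
    glDeleteFramebuffersEXT(1, &fbo);
    return ok;
}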
What's important about OGL FBOs is that they allow for the possibility of a GPU keeping depth buffers and stencil buffers physically separate in memory (no such device exists to my knowledge, but they might appear in the handheld field using GL ES). Hence, in OpenGL you have to request a depth object and a stencil object separately. But you also have to be aware that most (all) GPUs require those two objects to be bound together at the same time (since in practice, stencil and depth live in the same memory area).
This is properly taken care of in GLDepthBuffer. Also see static const GLenum stencilFormats[] & static const GLenum depthFormats[] declared in OgreGLFBORenderTexture.cpp
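Concretely, with GL_EXT_packed_depth_stencil that means one renderbuffer bound to both attachment points, which is more or less what GLDepthBuffer ends up doing (sketch only, not the actual Ogre code; the function name is mine):

Code:
#include <GL/glew.h>

// One packed D24S8 renderbuffer serving as both the depth and the stencil
// attachment of the currently bound FBO (requires GL_EXT_packed_depth_stencil
// and an initialised context, e.g. after glewInit()).
GLuint attachPackedDepthStencil(int width, int height)
{
    GLuint rb = 0;
    glGenRenderbuffersEXT(1, &rb);
    glBindRenderbufferEXT(GL_RENDERBUFFER_EXT, rb);
    glRenderbufferStorageEXT(GL_RENDERBUFFER_EXT, GL_DEPTH24_STENCIL8_EXT,
                             width, height);

    // Same object, two attachment points: depth & stencil share the same memory.
    glFramebufferRenderbufferEXT(GL_FRAMEBUFFER_EXT, GL_DEPTH_ATTACHMENT_EXT,
                                 GL_RENDERBUFFER_EXT, rb);
    glFramebufferRenderbufferEXT(GL_FRAMEBUFFER_EXT, GL_STENCIL_ATTACHMENT_EXT,
                                 GL_RENDERBUFFER_EXT, rb);
    return rb;
}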
GLRenderSystem::_createDepthBufferFor takes care of choosing the right format by delegating to GLFBOManager::getBestDepthStencil (which uses the precomputed table generated at startup)
So, bottom line: choosing the right depth format can be very tough. Imagine what would happen if you were allowed to be picky with a custom format. That's why I'm inclined towards the "PF_DEPTH" enumeration. Which format you actually get depends on too many variables, including how the main render window was created.