
Hardware PCF

Posted: Mon Jul 13, 2009 2:24 pm
by sparkprime
Is there a way to get access to the hardware PCF operations on modern graphics cards and GL implementations? I can do PCF manually in the shader, but this is very slow (even simple scenes get pushed below 75FPS). It seems hardware PCF is a must for any reasonable shadowmap system.

Obviously it is available in GLSL but doesn't work if you try and use it because the shadow textures are not set up from user space. I had a grep of the OGRE source code to see whether the calls were made anywhere and it drew a blank. Maybe I have the wrong idea about how this should be done.

Is there a story for hardware PCF in Ogre GL? What about Direct3D?

Re: Hardware PCF

Posted: Mon Jul 13, 2009 2:37 pm
by sparkprime
Hmm, is this an Nvidia-only feature?

There is standard support for shadow maps and depth shadow tests in GLSL but it seems that without hardware PCF they aren't much use.

Re: Hardware PCF

Posted: Mon Jul 13, 2009 5:55 pm
by sinbad
This whole area is a mess.

Hardware depth shadow testing is only supported in GL. On D3D, it only works through a hack on Nvidia only.

Hardware PCF exists on Nvidia, but requires renderstate hacks to enable on D3D and custom extensions on GL. On ATI there's Fetch4 instead, on the X1600 and up, but it's generally demoed on D3D9 using an R32F texture (this was around the time ATI's demos stopped being on GL and went D3D-only), and you have to create the texture with a funky D3DFORMAT (MAKEFOURCC('D','F','2','4')), which is a total PITA. You also have to hack some sampler state to enable it - you piggy-back on D3DSAMP_MIPMAPLODBIAS, setting it to MAKEFOURCC('G','E','T','4'), which you then have to disable again to revert to normal mode. I don't think there's even a setting for it on GL.
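For anyone landing here, the FOURCC values involved are just four characters packed into a DWORD. A minimal sketch (my own, plain C++ without the D3D9 headers; the SetSamplerState call is shown only as a comment, and the 'GET1' disable code is how I remember ATI's Fetch4 docs, so treat it as an assumption):

```cpp
#include <cstdint>

// Same byte packing as the Win32 MAKEFOURCC macro.
constexpr std::uint32_t makeFourCC(char a, char b, char c, char d) {
    return  static_cast<std::uint32_t>(static_cast<unsigned char>(a))
         | (static_cast<std::uint32_t>(static_cast<unsigned char>(b)) << 8)
         | (static_cast<std::uint32_t>(static_cast<unsigned char>(c)) << 16)
         | (static_cast<std::uint32_t>(static_cast<unsigned char>(d)) << 24);
}

// ATI's depth texture format used in the Fetch4-era demos.
constexpr std::uint32_t FOURCC_DF24 = makeFourCC('D', 'F', '2', '4');
// Written into the D3DSAMP_MIPMAPLODBIAS sampler state to enable Fetch4...
constexpr std::uint32_t FOURCC_GET4 = makeFourCC('G', 'E', 'T', '4');
// ...and (if memory serves) to disable it again:
constexpr std::uint32_t FOURCC_GET1 = makeFourCC('G', 'E', 'T', '1');

// Against a real device it would look something like:
// device->SetSamplerState(unit, D3DSAMP_MIPMAPLODBIAS, FOURCC_GET4);
```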

In short, it's a total and utter bloody train-wreck of a feature area, completely non-standard. I guess we could add hacks to allow this stuff to be used on a per-vendor, per-API basis, but so far I've avoided doing it because it just makes me feel dirty. Patches welcome if you want to try ;)

[edit]In Dx10.1 there's a standard 'Gather4' function which finally standardises this. Which might make things easier in 5 years :?

Re: Hardware PCF

Posted: Wed Jul 15, 2009 3:12 am
by sparkprime
That sounds like quite a lot of effort. What do you normally do? I remember you posted a video or screenshots of speed tree running in OGRE, and the shadows were quite nice there.

Which also reminds me, is there some premade way of doing cascaded shadow maps, or some other way of avoiding the crawly nature of LiSPSM?

Re: Hardware PCF

Posted: Wed Jul 15, 2009 5:35 pm
by sinbad
I usually use float textures and manually apply PCF. As it happens it's nice to be able to scale the filter anyway if you want full control over the blurring effect (e.g. PCSS), although if a 4-sample filter is free then it wouldn't hurt as an extra.

Parallel-Split Shadow Maps are available out of the box in PSSMShadowCameraSetup; it uses LiSPSM for each split too, and in my experience looks great for directional light shadows in particular.

Re: Hardware PCF

Posted: Mon Jul 20, 2009 7:40 pm
by sparkprime
Thanks, I somehow managed to miss the PSSM support in OGRE. I found time this weekend to set it up, and it does reduce the crawling considerably for a given total texture memory overhead. The rest of the crawling is probably only avoidable by using soft shadows, right? I spent a long time tuning it with various hand-chosen split points and optimal adjust factors.

One thing I kept running into was the shadow acne problem, because I couldn't find a bias that worked in both large scenes and small scenes. I managed to work around this by storing depth values outside of the [0,1] range, i.e. from [-inf,inf], which allowed me to choose a bias in world units (1cm). So instead of using the z from the projection, I computed the dot of the world-space light direction with the world-space vertex, and stored this quantity in the shadow map. Dunno if this is standard or not. I did have a problem with un-drawn areas of the shadow map being set to 1, which in the [0,1] system is fine, but in [-inf,inf] caused a problem. Is there a way to specify the blanking value for shadow maps in OGRE?
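For concreteness, here's a CPU-side sketch of the scheme just described (the names and the 1cm figure are illustrative, not from any OGRE sample): depth is the projection of the world-space position onto the light direction, so the bias can be expressed directly in world units:

```cpp
#include <cassert>

struct Vec3 { float x, y, z; };

static float dot(const Vec3& a, const Vec3& b) {
    return a.x * b.x + a.y * b.y + a.z * b.z;
}

// Caster pass: store the distance along the (normalised) light
// direction, in world units, instead of the projection-space z.
float casterDepth(const Vec3& lightDir, const Vec3& worldPos) {
    return dot(lightDir, worldPos);
}

// Receiver pass: compare against the stored depth with a bias chosen
// in world units (1cm here, assuming one unit is a metre).
bool inShadow(float storedDepth, const Vec3& lightDir,
              const Vec3& worldPos, float biasWorld = 0.01f) {
    return casterDepth(lightDir, worldPos) > storedDepth + biasWorld;
}
```

The same bias works regardless of scene size, which is the whole point of leaving the [0,1] range behind.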

It seems PCF is becoming less popular because of its performance impact, especially compared to VSM with summed-area tables, which seems to look better and perform better than depth shadow mapping with PCF. So if that's the case there's probably not much point hacking in support for hardware depth tests / PCF.

Re: Hardware PCF

Posted: Tue Jul 21, 2009 6:13 pm
by sinbad
Sounds like you're trying some of the things I discussed on my blog a while back: ... e-gotchas/

I usually use a linear depth, scaled to a fixed range. In theory you could use the shadow caster depth range information the SceneManager collects, but it's more awkward because only a subset of light information is available to be bound to params during the caster phase in 1.6. In 1.7 it's possible to bind light-specific parameters during the caster phase; in that case I used the light attenuation range to scale the depth for spotlights, and a fixed range for directional lights. This seems to work pretty well. I use backface casting to reduce biasing issues, plus world-space additional biasing with slope-scaling - using world space is required to do consistent fading and PCSS too.
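As a sketch of what that linear scaling buys you (my own illustration, not code from any OGRE sample): because the depth stays linear, a world-unit bias maps to a constant offset in shadow-map space:

```cpp
#include <cassert>

// Caster pass: write light-space depth scaled into [0,1] by a fixed
// range - the attenuation range for a spotlight, or an arbitrary
// scene-wide range for a directional light.
float linearCasterDepth(float lightSpaceDepth, float depthRange) {
    return lightSpaceDepth / depthRange;
}

// Receiver pass: rescale the fragment's light-space depth identically,
// so a bias picked in world units becomes a constant offset after the
// divide - unlike z/w, where the same offset means different amounts
// of world-space depth at different distances.
bool shadowed(float storedDepth, float receiverDepth,
              float depthRange, float biasWorld) {
    float bias = biasWorld / depthRange;
    return linearCasterDepth(receiverDepth, depthRange) > storedDepth + bias;
}
```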

VSM always had nasty edge cases for me and I never found it to be reliable enough, without having to step up to complicated compensation algorithms that required SM3/4.

Re: Hardware PCF

Posted: Tue Jul 21, 2009 7:28 pm
by Praetor
Have you tried clamping your VSMs? Apparently a simple clamping operation cuts off a large portion of the artifacts that can happen. Of course you can't eliminate them, and for certain scenes they may still be unacceptable.

As for your blanking problem, you can access the viewport used for the shadow textures, which will let you set the clearing behaviour manually. Use the getShadowTexture function on SceneManager, then use the getBuffer function on the Texture to get the HardwarePixelBufferSharedPtr, which lets you call getRenderTarget, which returns the RenderTexture. I tend not to like scaling my depths back to the [0,1] range. I used to, but found the same issue you have: setting biasing values was tedious. Using regular world units made much more sense and was easier to control.

Re: Hardware PCF

Posted: Tue Jul 21, 2009 11:22 pm
by sinbad
Scaling linearly but still to a [0..1] range works fine though. You only get biasing problems if you scale non-linearly by dividing by w.

Re: Hardware PCF

Posted: Thu Jul 23, 2009 6:34 pm
by sparkprime
I seem to have some problem on win32 nvidia GL -- I think the float32 ends up clamped to [0,1], which means the shadows only work in a very narrow band across the world :) Maybe it isn't a float32 buffer but some sort of integer buffer...

Anyway, I'll try scaling it linearly down to [0,1] by assuming a max world size of 10k or something. Dividing by w is the real problem, as you say :)

Praetor: thanks for the tip. If I do another U-turn and go back to world-space depth values then that's what I'll use to blank the buffer appropriately.

Re: Hardware PCF

Posted: Sun Aug 02, 2009 11:33 pm
by sparkprime
Actually my problem on win32 GL is that the light position auto param of my directional light was making it into the shadow caster shader as xyzw=(0,0,0,1), which meant that everything in the depth texture was 0 except the undrawn areas, which were 1. I tried using the scene manager's 'shadow caster material', and also setting a shadow caster material in each material, but the problem was the same. The auto param is bound fine in the non-shadow-caster material.

I'm using quite an old version of OGRE (svn v1-6 r8825), does anyone recall this bug existing and having been fixed? Before I upgrade OGRE to trunk I'm going to transcribe my shaders to HLSL (which I need to do anyway) and see if the problem also exists with D3D9.

Re: Hardware PCF

Posted: Mon Aug 03, 2009 12:45 am
by nullsquared
sinbad wrote:In 1.7 it's possible to bind light-specific parameters during the caster phase, and in that case I used the light attenuation range to scale the depth for spotlights
On 1.6, just set the camera's far clip distance to the light's attenuation range, then use far_clip_distance to normalize into [0,1] - in the receiving shaders, this is equal to light_attenuation_range. This also has the added benefit of clipping anything too far from the light ;)

Re: Hardware PCF

Posted: Mon Aug 03, 2009 12:46 am
by nullsquared
sparkprime wrote:Actually my problem on win32 GL is that the light position auto param of my directional light is making it into the shadow caster shader as xyzw=(0,0,0,1)
Like sinbad said, light auto params are not set properly during the shadow caster rendering.

Re: Hardware PCF

Posted: Mon Aug 03, 2009 6:21 am
by sparkprime
Ah yes he did, buried in the blog post mentioned a few weeks ago.

With spot lights it makes sense to use a small frustum (shouldn't this be happening anyway?), and then divide by this distance instead of w.

With a directional light I don't see any solution, because the light isn't fixed - it tends to move as the camera moves. So anything that goes into this depth buffer is not going to make any sense if it's based on the projection z value itself. That's why I wanted the direction of the light, so I could assume the light was at (0,0,0) and go from there.

It may be time to upgrade to svn trunk...

edit: actually you can use the z from the projection, because only the difference between caster/receiver is important, so the movements of the shadow camera cancel out.

Why do people want to squash down to [0,1] ? Are people using integer shadow maps?

Re: Hardware PCF

Posted: Mon Aug 03, 2009 9:40 am
by sparkprime
Actually you can't just use the z value from the projected vertex position; it's not comparable to the depth value in the texture unless both have been divided by their respective w values. I would have thought the projection matrix for the depth pass should be identical to the one bound to the texture_worldviewproj_matrix auto param, but apparently not.

edit: oops, no - I just had a problem somewhere else. It does work fine. (final answer)

edit2: although if the point is to avoid bias headaches, this doesn't work, because the amount of bias needed depends on the orientation of the camera. The only benefit over the z/w approach is that it uses a bit more of the floating-point space, afaict.

Re: Hardware PCF

Posted: Wed Aug 05, 2009 11:37 am
by sinbad
sparkprime wrote:Why do people want to squash down to [0,1] ? Are people using integer shadow maps?
Mostly because it makes it easier to clear the buffer (to 1). You can't clear to the far plane of the camera across all rendersystems; Dx9 doesn't let you clear to anything outside the [0,1] range. Dx10 and GL do, though.

Personally I always avoid z/w techniques because the result is non-linear - sure that gives you more precision up close, but it makes bias less predictable and calculating derived depth values like depth fading & PCSS a total nightmare.
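The non-linearity is easy to see numerically. A sketch (my own illustration, using the standard D3D-style projection depth):

```cpp
#include <cassert>

// Post-projection z/w for a standard perspective projection with near
// plane n and far plane f: 0 at z = n, 1 at z = f, but non-linear in
// between, so most of the [0,1] range is spent close to the near plane.
float zOverW(float z, float n, float f) {
    return (f / (f - n)) * (1.0f - n / z);
}

// The linear alternative: view-space depth scaled by a fixed range.
float linearDepth(float z, float range) {
    return z / range;
}
```

With n = 1 and f = 1000, the view-space midpoint z = 500 comes out at z/w ≈ 0.999, while the linear version gives 0.5 - which is why a single bias value in z/w space behaves so differently at different distances.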

Edit: For what it's worth, and to prove I'm not just making this up ;) here's my current results with 3xPSSM directional depth shadows using linear depth scaled to a fairly arbitrary 1:100000:

Re: Hardware PCF

Posted: Sat Aug 08, 2009 7:23 pm
by sparkprime
Cool, this is exactly the sort of thing I'm trying to make work.

How are you handling the PCF? I can see what look like dithering artifacts but it's hard to tell as there are also JPEG artifacts in the image...

Do you vary PCF in terms of distance from the camera? What about occluder-receiver distance?

How do you set up PSSM, in terms of the cut-off distances and optimal_adjust_factor and so on?

Is it 3 x 1024 float32 textures?

Re: Hardware PCF

Posted: Mon Aug 10, 2009 4:47 pm
by sinbad
It's actually quite simple regular-grid PCF. I didn't do any of the PCSS (occluder-receiver distance spread), distance fade, or per-pixel rotated Poisson disk, all of which I've done in other projects; they all add overhead obviously, and in this case I felt it looked good enough just like this.

Rather than explain, here's the code I wrote to do it. Note that I'm not using any biasing in the receiver, because I'm using backface casters, and I want the (fixed) biasing to be a caster parameter; the reason for that is that I have some dynamically aligned quad casters (the leaves) which I want to give a much more significant bias than everything else to avoid shadow variances as the quads rotate.

Code:

// Simple PCF 
// Number of samples in one dimension (square for total samples)
// (these defines were lost in the forum paste; values are illustrative)
#define NUM_SHADOW_SAMPLES_1D 2.0
#define SHADOW_SAMPLES (NUM_SHADOW_SAMPLES_1D*NUM_SHADOW_SAMPLES_1D)
#define SHADOW_FILTER_SCALE 1

float4 offsetSample(float4 uv, float2 offset, float invMapSize)
{
	return float4(uv.xy + offset * invMapSize * uv.w, uv.z, uv.w);
}

float calcDepthShadow(sampler2D shadowMap, float4 uv, float invShadowMapSize)
{
	float shadow = 0.0;
	float offset = (NUM_SHADOW_SAMPLES_1D/2 - 0.5) * SHADOW_FILTER_SCALE;
	for (float y = -offset; y <= offset; y += SHADOW_FILTER_SCALE)
		for (float x = -offset; x <= offset; x += SHADOW_FILTER_SCALE)
			shadow += tex2Dproj(shadowMap, offsetSample(uv, float2(x, y), invShadowMapSize)).x > uv.z ? 1.0 : 0.0;

	shadow /= SHADOW_SAMPLES;

	return shadow;
}

float calcPSSMDepthShadow(sampler2D shadowMap0, sampler2D shadowMap1, sampler2D shadowMap2, 
						   float4 lsPos0, float4 lsPos1, float4 lsPos2,
						   float invShadowmapSize0, float invShadowmapSize1, float invShadowmapSize2,
						   float4 pssmSplitPoints, float camDepth)
{
	if (camDepth <= pssmSplitPoints.y)
		return calcDepthShadow(shadowMap0, lsPos0, invShadowmapSize0);
	else if (camDepth <= pssmSplitPoints.z)
		return calcDepthShadow(shadowMap1, lsPos1, invShadowmapSize1);
	else
		return calcDepthShadow(shadowMap2, lsPos2, invShadowmapSize2);
}

PSSM setup is 3 split points, distances calculated using calculateSplitPoints with a shadow far distance of 3000 (frustum is 15000 deep), optimal adjusts are 2, 1, 1 respectively.
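For reference, the usual split computation blends a logarithmic and a uniform distribution. Here's a sketch of the scheme from the PSSM paper (OGRE's calculateSplitPoints does something along these lines, but this is not its exact code, and the lambda default is my own guess):

```cpp
#include <cmath>
#include <vector>

// Standard PSSM split scheme: blend a logarithmic and a uniform split
// distribution with a weight lambda in [0,1]. lambda = 1 is purely
// logarithmic (better perspective error), lambda = 0 purely uniform.
std::vector<float> splitPoints(int numSplits, float nearD, float farD,
                               float lambda = 0.95f) {
    std::vector<float> splits(numSplits + 1);
    for (int i = 0; i <= numSplits; ++i) {
        float f = static_cast<float>(i) / numSplits;
        float logSplit = nearD * std::pow(farD / nearD, f);
        float uniSplit = nearD + (farD - nearD) * f;
        splits[i] = lambda * logSplit + (1.0f - lambda) * uniSplit;
    }
    return splits;   // splits[0] == near, splits[numSplits] == far
}
```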

Shadow maps are indeed 3 x 1024x1024, float32. I haven't experimented with packing at a smaller precision than that, it might be possible since it works fine with a pretty hastily chosen depth range.


Re: Hardware PCF

Posted: Sun Sep 27, 2009 7:06 am
by sparkprime
That's very similar to what I've been doing, too. I haven't managed to get shadows that both look good and extend far into the distance, however. I kill my shadows at 200m.

I can do soft shadows with a 2x2 grid PCF tap and 2x2 dithering. So, 4x4 effective PCF but 2x2 taps per framebuffer pixel. I would like to do 'bilinear PCF' as it's sometimes called, which cuts out some of the shadow jaggies, but as we discussed a while ago in the thread, using the hardware support is a bit of a pain to arrange, and without hardware support, it quadruples the number of texture fetches. I don't notice a big change in quality anyway so it's turned off at the moment. My hardware is pretty lightweight -- GF8400M on a laptop.


I'm not using a Poisson disk; I have tried it and preferred a basic dithered look instead. I have a hunch that this will look better after FSAA and is a lot simpler to implement.

Basically I'm quite happy so far... But I want variable penumbras.

You say that you have in the past implemented PCSS? Did you do it in conjunction with LiSPSM in OGRE?

Re: Hardware PCF

Posted: Sun Sep 27, 2009 5:40 pm
by sinbad
Yeah, in conjunction with some other people - we implemented PCSS on a commercial project with configurable-tap PCF, sometimes going to really high tap counts to get the quality (the focus was on quality rather than frame rate, but we could drop to lower taps for faster updates). This was especially the case with PCSS, since as you extend the coverage of the kernel you need more taps to avoid banding. We tried a Poisson disk, but it was only useful if you could guarantee that the diffuse texturing was relatively high-frequency and so hid the irregular dithering. On flat-shaded areas it didn't really cut it quality-wise.

And yes, we used it with LiSPSM, but only on spotlights because that's what the project called for.

In my more frame-rate sensitive apps I tend to use a 3x3 tap fixed filter and that works pretty well with LiSPSM & PSSM. I'm not using PCSS in that case. I want to experiment with Exponential Shadow Maps & PCSS some time.

Re: Hardware PCF

Posted: Mon Sep 28, 2009 4:05 pm
by sparkprime
I didn't know LiSPSM made sense for spotlights, but I suppose it does help.

When doing PCSS with LiSPSM, how do you scale the tap?

Suppose I want the tap to have radius 0 at 0m, increasing by 0.25m for every 1m from the caster, up to a maximum 0.5m tap. This is fine, I can do this. I can store the actual distance in the shadow map, perhaps with a scaling factor to make it fit in [0,1] as we have previously discussed. But these quantities are in world space, so how do I get them into shadow texture space in order to offset the UV?

I've tried calculating axes in the plane perpendicular to the light, in object space, and transforming them by the light transform, but this didn't seem to work at all. The penumbra size changed wildly depending on the camera angle.

Am I thinking about this right?

Re: Hardware PCF

Posted: Mon Sep 28, 2009 5:35 pm
by sinbad
I just checked actually, and although we did have LiSPSM in there at one point, we ended up with just FocusedShadowCameraSetup. I can't remember why that was - it was a couple of years ago - but it might have been to stabilise the shadow results better.

In the case of this app though, artists are generally setting up still 'shots' so a bit of discontinuity between frames was not a major concern, since they'd tweak things per shot to look best. So it's probably less representative for what you're doing. Also, the number of PCF samples was so high that small changes generally got blurred out - the fewer samples you're using the more obvious it will be.

We never tried to make the PCSS widening factors operate in real-world units. We simply made everything artist-configurable and everything was a factor on top of that, PCSS and depth-based opacity fading. Because it didn't have to be automatic it made everything somewhat simpler, and in fact a fairly small range worked well for most shots. Maybe you'd need to use the shadow camera's projection matrix to determine the coverage area of a point in the shadow texture?
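A sketch of that last suggestion (my own construction, assuming an orthographic shadow camera for a directional light): the projection matrix's x scale m00 = 2/viewWidth tells you how a world-space radius maps into the texture's [0,1] UV range:

```cpp
#include <cassert>

// NDC spans [-1,1] (a width of 2) while the shadow texture's UV range
// spans [0,1] (half that span), so a world-space kernel radius maps to
// a UV-space radius as:
float worldRadiusToUV(float worldRadius, float m00 /* = 2 / viewWidth */) {
    return worldRadius * m00 * 0.5f;
}
```

e.g. a 0.5-unit penumbra with a 128-unit-wide shadow frustum gives a UV radius of 0.5 * (2/128) * 0.5 ≈ 0.004, about 4 texels on a 1024² map. A perspective (spot) shadow camera would additionally need the divide by w, so the same world radius covers more texels close to the light.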

Re: Hardware PCF

Posted: Mon Sep 28, 2009 5:48 pm
by sparkprime
Yeah, that is what prompted the other thread.

The changes aren't small, they're *huge*. They break the shadows basically. It must be due to a logical error, not due to approximations and such. Using the shadow camera projection matrix instead of the shadow camera worldviewproj matrix makes sense though, because then offsets in x and y are already aligned correctly on the right plane. Recreating the plane from world space was quite complicated and could have been where logic errors crept in. But I need to hack at ogre to expose the shadow matrix so trying this out will take some time.