
A cheaper PCF shader?

Posted: Mon Jun 22, 2015 2:22 am
by areay
Hi guys,

I've been playing around a bit with PCF for sampling shadow maps and I've got some pseudo-science to share with you.

Most of the setups I've seen on this forum and in the Ogre samples start out a lot like this, by fetching a set of depth samples:

Code: Select all

	// offset0..offset3 are placeholders for four distinct tap offsets
	float4 depths = float4(
		tex2D(shadowMap, shadowMapPos.xy + offset0).r,
		tex2D(shadowMap, shadowMapPos.xy + offset1).r,
		tex2D(shadowMap, shadowMapPos.xy + offset2).r,
		tex2D(shadowMap, shadowMapPos.xy + offset3).r);
Then the comparisons start, and the examples I've seen all look something like this:

Code: Select all

	float final = 0.0f;	// must start at zero before accumulating
	final += (depths.x > fragmentDepth) ? 1.0f : 0.0f;
	final += (depths.y > fragmentDepth) ? 1.0f : 0.0f;
	final += (depths.z > fragmentDepth) ? 1.0f : 0.0f;
	final += (depths.w > fragmentDepth) ? 1.0f : 0.0f;
	final *= 0.25f;
	return final;
But if you do the comparison like this instead (a 4-float compare followed by a dot product):

Code: Select all

    // The vector compare yields 1 for each tap that is in light, 0 otherwise
    float4 inlight = (depths > fragmentDepth);
    float final = dot(inlight, float4(0.25, 0.25, 0.25, 0.25));
    return final;
Then you get a cheaper shader! I'm basing this on the instruction count of the compiled shader.

Using a single-pass 2-split PSSM shader taking 8 taps per sample, I get the following compilation results:

Standard method: 184 instructions, 4 R-regs using arbfp1; 112 instructions, 4 R-regs, 0 H-regs using FP40
4-float compare + dot: 106 instructions, 5 R-regs using arbfp1; 98 instructions, 5 R-regs, 2 H-regs using FP40
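
Both forms should produce the same shadow factor; only the way it is computed differs. As a rough CPU-side sanity check (a minimal Python sketch, not shader code; the function names and sample depths are made up for illustration), the two comparisons can be written side by side:

```python
def pcf_ternary(depths, fragment_depth):
    """Per-tap compare and accumulate, then average (the 'standard' form)."""
    total = 0.0
    for d in depths:
        total += 1.0 if d > fragment_depth else 0.0
    return total * 0.25


def pcf_dot(depths, fragment_depth):
    """Vector compare to a 0/1 mask, then dot with (0.25, 0.25, 0.25, 0.25)."""
    in_light = [float(d > fragment_depth) for d in depths]
    weights = [0.25, 0.25, 0.25, 0.25]
    return sum(m * w for m, w in zip(in_light, weights))


# Made-up shadow-map samples: two taps are farther than the fragment (lit),
# two are closer (occluding).
depths = [0.30, 0.55, 0.70, 0.45]
fragment_depth = 0.50

print(pcf_ternary(depths, fragment_depth))  # 0.5
print(pcf_dot(depths, fragment_depth))      # 0.5
```

This only shows that the results match; it says nothing about which form is faster on a given GPU.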

I'm no expert. Perhaps the cost of the extra registers offsets the instruction saving, or perhaps the instructions that are being executed are more expensive.

I know that in CPU-land different instructions have different clock cycle costs. Does this also apply in GPU-land?

Here's where I found this little optimization: ... apping.pdf. There are some other interesting things in that paper about something called silhouette maps, and I'm wondering if anyone has tried to get them working with Ogre. Here's another silhouette map paper.

Re: A cheaper PCF shader?

Posted: Fri Jul 17, 2015 11:30 pm
by dark_sylinc
Both approaches will boil down to pretty much the same thing on modern hardware because it's superscalar, not vector. The HLSL or Cg asm output is not a relevant metric for micro-optimizations like this one. Older hardware such as the AMD Radeon HD 2000 through 5000 series may benefit from it, though.

You should look at GPUPerfStudio's shader analyzer or PowerVR's tool to check out the actual ISA code (sadly, the other vendors don't provide tools to see what the actual shader looks like).

If you're looking for a better (higher quality) PCF-filtered shader, you may want to look at the output produced by Ogre 2.1's shaders when not using depth maps.

The GLSL output is in the form of:

Code: Select all

//The 0.00196 is a magic number that prevents floating point
//precision problems ("1000" becomes "999.999" causing fW to
//be 0.999 instead of 0, hence ugly pixel-sized dot artifacts
//appear at the edge of the shadow).
fW = fract( uv * + 0.00196 );

vec4 c;
c.w = texture(shadowMap, uv ).r;
c.z = texture(shadowMap, uv + vec2( invShadowMapSize.x, 0.0 ) ).r;
c.x = texture(shadowMap, uv + vec2( 0.0, invShadowMapSize.y ) ).r;
c.y = texture(shadowMap, uv + vec2( invShadowMapSize.x, invShadowMapSize.y ) ).r;

c = step( fDepth, c );

retVal += mix(
			mix( c.w, c.z, fW.x ),
			mix( c.x, c.y, fW.x ),
			fW.y );
"mix" is lerp in HLSL/Cg.

The differences can be seen in this post.
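
For anyone who wants to poke at the weighting on the CPU, here is a minimal Python sketch of the same idea: four depth compares blended with bilinear weights. It is only an approximation of the GLSL above under stated assumptions: the multiplier in the `fract` line is not visible in the snippet, so the sketch assumes it is the shadow map resolution in texels; it also assumes point sampling with clamp-to-edge addressing, and `pcf_bilinear` and its parameters are hypothetical names.

```python
import math


def step(edge, x):
    """GLSL step(): 0.0 if x < edge, else 1.0."""
    return 0.0 if x < edge else 1.0


def mix(a, b, t):
    """GLSL mix(), i.e. lerp in HLSL/Cg."""
    return a * (1.0 - t) + b * t


def fract(x):
    """GLSL fract(): fractional part of x."""
    return x - math.floor(x)


def pcf_bilinear(shadow_map, u, v, frag_depth, bias=0.00196):
    """shadow_map is a list of rows of depths; u and v are in [0, 1)."""
    height = len(shadow_map)
    width = len(shadow_map[0])

    # ASSUMPTION: texel-space coordinate = uv * shadow map size.
    tx = u * width
    ty = v * height

    # Weights come from the biased coordinate, as in the shader comment.
    fwx = fract(tx + bias)
    fwy = fract(ty + bias)

    # ASSUMPTION: point sampling with clamp-to-edge addressing.
    x0 = min(int(math.floor(tx)), width - 1)
    y0 = min(int(math.floor(ty)), height - 1)
    x1 = min(x0 + 1, width - 1)
    y1 = min(y0 + 1, height - 1)

    # Compare first, then filter the 0/1 results (never the raw depths).
    c_w = step(frag_depth, shadow_map[y0][x0])
    c_z = step(frag_depth, shadow_map[y0][x1])
    c_x = step(frag_depth, shadow_map[y1][x0])
    c_y = step(frag_depth, shadow_map[y1][x1])

    return mix(mix(c_w, c_z, fwx), mix(c_x, c_y, fwx), fwy)


# Made-up 2x2 shadow map: one occluder texel in the top-left corner.
shadow_map = [[0.2, 0.8],
              [0.8, 0.8]]
print(pcf_bilinear(shadow_map, 0.25, 0.25, 0.5))

# Why the small bias helps: a coordinate that should be exactly 1000
# but arrives as 999.9999 gives a weight near 1 instead of near 0.
print(fract(999.9999), fract(999.9999 + 0.00196))
```

The important design point is the order of operations: the depth comparison runs per tap and only the binary results are blended, which is what makes the shadow edge soft without averaging depths.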