I've been playing around a bit with PCF for sampling shadow maps and I've got some pseudo-science to share with you.
Most of the setups I've seen on this forum and in the Ogre Samples start a lot like this, by getting a set of depth samples...
Code:
float4 depths = float4(
tex2D(shadowMap, shadowMapPos.xy + someOffset).r,
tex2D(shadowMap, shadowMapPos.xy + someOffset).r,
tex2D(shadowMap, shadowMapPos.xy + someOffset).r,
tex2D(shadowMap, shadowMapPos.xy + someOffset).r);
Code:
float final = 0.0f; // start from zero before accumulating the taps
final += (depths.x > fragmentDepth) ? 1.0f : 0.0f;
final += (depths.y > fragmentDepth) ? 1.0f : 0.0f;
final += (depths.z > fragmentDepth) ? 1.0f : 0.0f;
final += (depths.w > fragmentDepth) ? 1.0f : 0.0f;
final *= 0.25f;
return final;
Code:
float4 inlight = ( depths > fragmentDepth);
final = dot(inlight, float4(.25, .25, .25, .25));
return final;
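For anyone who wants to try it, here is a rough sketch of the vector compare + dot version as a single Cg/HLSL function. The function name pcf4, the texelSize parameter and the 2x2 offset pattern are placeholders I made up for illustration (they are not from the Ogre samples or the paper), so adjust them to your own setup.

```hlsl
// Sketch only: 4-tap PCF using one vector compare and one dot product.
// pcf4, texelSize and the 2x2 offset pattern are illustrative assumptions.
float pcf4(sampler2D shadowMap, float2 uv, float fragmentDepth, float2 texelSize)
{
    // Four taps on a 2x2 grid around the projected shadow map position.
    float4 depths = float4(
        tex2D(shadowMap, uv + float2(-0.5, -0.5) * texelSize).r,
        tex2D(shadowMap, uv + float2( 0.5, -0.5) * texelSize).r,
        tex2D(shadowMap, uv + float2(-0.5,  0.5) * texelSize).r,
        tex2D(shadowMap, uv + float2( 0.5,  0.5) * texelSize).r);

    // One vector compare instead of four scalar ones:
    // each component becomes 1.0 where the tap is lit, 0.0 where it is occluded.
    float4 inlight = (depths > fragmentDepth);

    // Average the four results with a single dot product.
    return dot(inlight, float4(0.25, 0.25, 0.25, 0.25));
}
```

With 8 taps the same idea should work with two float4 groups and a weight of 0.125 per component, adding the two dot products together.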
Using a single-pass 2-split PSSM shader taking 8 taps per sample, I get the following compilation results:
Standard method: 184 instructions, 4 R-regs using arbfp1; 112 instructions, 4 R-regs, 0 H-regs using FP40.
float4 compare + dot: 106 instructions, 5 R-regs using arbfp1; 98 instructions, 5 R-regs, 2 H-regs using FP40.
I'm no expert, so perhaps the cost of the extra registers offsets the instruction savings, or perhaps the instructions that are being executed are more expensive.
I know that in CPU-land different instructions have differing clock cycle costs. Does this also apply in GPU-land?
Here's where I found this little optimization:
http://amd-dev.wpengine.netdna-cdn.com/ ... apping.pdf
There are some other interesting things in that paper about something called silhouette maps, and I'm wondering if anyone has tried to get that working with Ogre. Here's another silhouette map paper:
https://graphics.stanford.edu/papers/silmap/silmap.pdf
