### HDRLib, shader performance and Gauss separation

Posted:

**Wed Nov 25, 2009 5:36 pm**

This is part question, part comment, and part request for comments. First, the question:

Is there a rule of thumb for the relative cost of doing additional (fragment) texture fetches within a single pass versus splitting the work across multiple passes?

Now the comments that gave rise to this question (I'm an absolute novice at shader optimization, so please excuse and correct me if I say something wrong):

i) The Gaussian blur in HDRLib uses a reduced 5x5 2D Gauss kernel with 13 texture fetches, which corresponds to a square rotated by 45 degrees with side length 2*sqrt(2). Using separation (as the Ogre HDR demo does) one would need only 10 = 5 + 5 fetches across two passes and would not need to reduce the kernel.
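To make the separation argument concrete, here's a small numpy sketch (my own illustration, not taken from HDRLib or the Ogre demo) checking that the full 2D Gaussian kernel factors exactly into the outer product of two 1D kernels, so two 1D passes reproduce the 2D blur:

```python
import numpy as np

sigma = 1.0

def gauss1d(n, sigma):
    # n-tap 1D Gaussian kernel, normalized to sum to 1
    x = np.arange(n) - (n - 1) / 2.0
    w = np.exp(-x**2 / (2.0 * sigma**2))
    return w / w.sum()

g = gauss1d(5, sigma)

# Full 5x5 kernel built directly from the 2D Gaussian (25 fetches in one pass)
x = np.arange(5) - 2.0
xx, yy = np.meshgrid(x, x)
k2d = np.exp(-(xx**2 + yy**2) / (2.0 * sigma**2))
k2d /= k2d.sum()

# The 2D kernel is exactly the outer product of the 1D kernel with itself,
# so a horizontal pass followed by a vertical pass (5 + 5 = 10 fetches per
# pixel) gives the same result as the single-pass 2D version.
assert np.allclose(k2d, np.outer(g, g))
```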

ii) Somewhere I read that linear texture lookups (not anisotropic or the like) come at no cost compared to nearest-neighbor fetches, and my own tests seem to confirm this. A 1D 5-pixel Gauss can then be accomplished with just 3 texture lookups (without error) by adjusting the sample positions. More generally, n texture lookups can be replaced by ceil(n/2) lookups. The Ogre demo would benefit from that.
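The trick in (ii) can be verified numerically. Below is a sketch of my own, with a 1D "texture" as a numpy array and a hand-rolled `linear_fetch` standing in for the hardware sampler: two adjacent taps with weights w1, w2 collapse into one linear fetch at the weighted-average offset with weight w1 + w2, because the sampler interpolates linearly between the two texels.

```python
import numpy as np

def gauss1d(n, sigma):
    # n-tap 1D Gaussian kernel, normalized to sum to 1
    x = np.arange(n) - (n - 1) / 2.0
    w = np.exp(-x**2 / (2.0 * sigma**2))
    return w / w.sum()

def linear_fetch(tex, x):
    # stand-in for the hardware linear sampler, 1D case
    i = int(np.floor(x))
    f = x - i
    return (1.0 - f) * tex[i] + f * tex[i + 1]

def combine(t1, w1, t2, w2):
    # one linear fetch at the weighted-average offset returns exactly
    # w1*texel(t1) + w2*texel(t2) for adjacent texels t1, t2
    return (w1 * t1 + w2 * t2) / (w1 + w2), w1 + w2

w = gauss1d(5, 1.0)                         # taps at texel offsets -2..+2
taps3 = [combine(-2.0, w[0], -1.0, w[1]),   # left pair  -> 1 fetch
         (0.0, w[2]),                       # center tap
         combine(+1.0, w[3], +2.0, w[4])]   # right pair -> 1 fetch

rng = np.random.default_rng(0)
tex = rng.random(32)
p = 10                                      # texel being blurred

five = sum(wk * tex[p + t] for t, wk in zip(range(-2, 3), w))
three = sum(wk * linear_fetch(tex, p + off) for off, wk in taps3)
assert np.isclose(five, three)              # 3 fetches match the 5-tap result
```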

iii) This point I'm pretty sure about: doing multiple passes of a 2D Gauss is certainly bad. Firstly, 1D separated passes reduce the fetches from n^2 to 2n, and secondly, two successive Gausses of standard deviation s result in a Gauss of standard deviation sqrt(2)*s, not 2*s (more generally sqrt(s1^2 + s2^2)). Or is there a cleverer way to extend the deviation using multiple passes?
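The sqrt(s1^2 + s2^2) claim is easy to check numerically with discretized kernels; a small sketch of my own (nothing HDRLib-specific), measuring the standard deviation of the kernel that results from blurring twice:

```python
import numpy as np

def gauss1d(sigma, radius):
    # discretized, normalized 1D Gaussian with taps at integer offsets
    x = np.arange(-radius, radius + 1)
    w = np.exp(-x**2 / (2.0 * sigma**2))
    return w / w.sum()

s = 2.0
g = gauss1d(s, 12)
gg = np.convolve(g, g)          # two successive blurs = one convolved kernel

# Measure the standard deviation of the combined kernel
x = np.arange(len(gg)) - (len(gg) - 1) / 2.0
sigma_eff = np.sqrt((gg * x**2).sum())

# Variances add: the result has sigma = sqrt(2)*s (about 2.83), not 2*s = 4
assert abs(sigma_eff - np.sqrt(2.0) * s) < 0.05
```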
