HDRLib, shader performance and Gauss separation

Posted: Wed Nov 25, 2009 5:36 pm
by froh
This is part question, comment and request for comment. First the question:

Is there a rule of thumb concerning the relative cost between a single (fragment) texture fetch inside a single pass and doing multiple passes?

Now the comment(s) giving rise to this question (I'm an absolute novice concerning shader optimization, so please excuse and correct me if I say something wrong):

i) The Gaussian blur in HDRLib uses a reduced 5x5 2D Gauss kernel with 13 texture fetches, which corresponds to a diamond (a square rotated by 45 degrees) with side length 2*sqrt(2). Using separation (as the Ogre HDR demo does), one would need 5+5=10 fetches in two passes and would not need to reduce the square.
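To make the fetch counts in i) concrete, here is a minimal CPU-side sketch in Python. It assumes binomial weights (1, 4, 6, 4, 1)/16 as a stand-in for the Gauss weights (HDRLib's actual weights are not quoted in this thread) and reads the 13-tap reduced kernel as the diamond |dx| + |dy| <= 2. All function names are hypothetical.

```python
# Assumption: binomial weights stand in for the real Gauss weights.
KERNEL_1D = [w / 16 for w in (1, 4, 6, 4, 1)]


def sample(img, x, y):
    """Nearest-neighbour fetch with clamp-to-edge addressing."""
    h, w = len(img), len(img[0])
    return img[min(max(y, 0), h - 1)][min(max(x, 0), w - 1)]


def blur_2d(img, k):
    """Single pass with the full 2D kernel: len(k)**2 fetches per pixel."""
    n, r = len(k), len(k) // 2
    return [[sum(k[j] * k[i] * sample(img, x + i - r, y + j - r)
                 for j in range(n) for i in range(n))
             for x in range(len(img[0]))]
            for y in range(len(img))]


def blur_1d(img, k, dx, dy):
    """One separated pass along (dx, dy): len(k) fetches per pixel."""
    r = len(k) // 2
    return [[sum(k[i] * sample(img, x + (i - r) * dx, y + (i - r) * dy)
                 for i in range(len(k)))
             for x in range(len(img[0]))]
            for y in range(len(img))]


def blur_separated(img, k):
    """Horizontal pass followed by vertical pass: 2 * len(k) fetches per pixel."""
    return blur_1d(blur_1d(img, k, 1, 0), k, 0, 1)


# Assumed shape of the reduced kernel: the diamond inside the 5x5 square.
DIAMOND_TAPS = [(dx, dy) for dy in range(-2, 3) for dx in range(-2, 3)
                if abs(dx) + abs(dy) <= 2]
# Share of the full kernel's weight that the diamond taps carry.
DIAMOND_WEIGHT = sum(KERNEL_1D[dx + 2] * KERNEL_1D[dy + 2]
                     for dx, dy in DIAMOND_TAPS)

# Small test image; the two variants should agree up to rounding.
img = [[float((3 * x + 5 * y) % 7) for x in range(6)] for y in range(6)]
full = blur_2d(img, KERNEL_1D)
sep = blur_separated(img, KERNEL_1D)
max_diff = max(abs(a - b) for ra, rb in zip(full, sep) for a, b in zip(ra, rb))
print(len(DIAMOND_TAPS), DIAMOND_WEIGHT, max_diff)
```

With these stand-in weights the diamond taps carry 220/256 of the full kernel's weight, so the reduced kernel has to be renormalised, while the two separated passes reproduce the full 25-tap result with 10 fetches.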

ii) I read somewhere that a linear texture lookup (not anisotropic or the like) comes at no extra cost compared to a nearest-neighbor fetch, and my own tests seem to confirm this. In that case, a 1D 5-pixel Gauss can be done with just 3 texture lookups (without errors) by adjusting the sample positions. More generally, n texture lookups can be replaced by ceil(n/2) lookups. The Ogre demo would benefit from that.
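The trick in ii) can be checked on the CPU with a small Python sketch. It again assumes binomial weights (1, 4, 6, 4, 1)/16 for illustration, emulates the hardware's linear filter in 1D, and uses hypothetical function names. Two neighbouring taps with weights w1, w2 at offsets o1, o2 collapse into one linear fetch with weight w1 + w2 at offset (o1*w1 + o2*w2)/(w1 + w2).

```python
import math


def texel(tex, i):
    """Nearest-neighbour fetch with clamp-to-edge addressing."""
    return tex[min(max(i, 0), len(tex) - 1)]


def sample_linear(tex, x):
    """Emulated linear filter; texel centres sit at integer coordinates."""
    i = math.floor(x)
    f = x - i
    return (1.0 - f) * texel(tex, i) + f * texel(tex, i + 1)


def merge_taps(o1, w1, o2, w2):
    """Collapse two adjacent taps into one linear fetch: (weight, offset)."""
    w = w1 + w2
    return w, (o1 * w1 + o2 * w2) / w


def blur5_nearest(tex, x):
    """Reference: five nearest-neighbour fetches."""
    weights = (1, 4, 6, 4, 1)
    return sum(w * texel(tex, x + o) for w, o in zip(weights, range(-2, 3))) / 16


def blur5_linear(tex, x):
    """Same result with three fetches: centre plus one merged tap per side."""
    w_side, offset = merge_taps(1, 4, 2, 1)  # weight 5 at offset 1.2
    return (6 * texel(tex, x)
            + w_side * (sample_linear(tex, x + offset)
                        + sample_linear(tex, x - offset))) / 16


tex = [0.0, 1.0, 4.0, 9.0, 2.0, 7.0, 3.0, 8.0]
max_diff = max(abs(blur5_nearest(tex, x) - blur5_linear(tex, x))
               for x in range(len(tex)))
print(max_diff)
```

In floating point the two versions agree up to rounding. On real hardware the interpolation weight usually has limited sub-texel precision, so "without errors" should be read as "without errors beyond that precision".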

iii) This point I'm pretty sure about: doing multiple passes of a 2D Gauss is certainly bad. Firstly, doing separated 1D passes reduces the fetches from n^2 to 2n. Secondly, applying two successive Gauss blurs of standard deviation s results in a Gauss with standard deviation sqrt(2)*s, not 2*s (more generally sqrt(s1^2+s2^2)). Or is there a cleverer way to extend the deviation using multiple passes?
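The sqrt(s1^2+s2^2) rule in iii) can be verified numerically. The Python sketch below samples two 1D Gauss kernels, convolves them, and measures the standard deviation of the result. The kernel radius of 5 standard deviations is an arbitrary choice for the illustration, and the function names are hypothetical.

```python
import math


def gaussian(sigma, radius):
    """Sampled, normalised 1D Gauss kernel with 2 * radius + 1 taps."""
    w = [math.exp(-0.5 * (i / sigma) ** 2) for i in range(-radius, radius + 1)]
    total = sum(w)
    return [v / total for v in w]


def convolve(a, b):
    """Full discrete convolution; equals applying the two blurs in sequence."""
    out = [0.0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            out[i + j] += ai * bj
    return out


def std_dev(kernel):
    """Standard deviation of a normalised kernel, in pixels."""
    centre = (len(kernel) - 1) / 2
    mean = sum((i - centre) * v for i, v in enumerate(kernel))
    var = sum((i - centre - mean) ** 2 * v for i, v in enumerate(kernel))
    return math.sqrt(var)


s = 2.0
g = gaussian(s, 10)
twice = convolve(g, g)
print(std_dev(g), std_dev(twice), math.sqrt(2.0) * s)
```

Read the other way round, the same rule says that reaching 2*s after a first pass with s needs a second pass with sqrt(3)*s, since sqrt(s^2 + 3*s^2) = 2*s.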

Re: HDRLib, shader performance and Gauss separation

Posted: Thu Nov 26, 2009 11:13 am
by tuan kuranes
No rule of thumb...
Smaller textures, fewer instructions, etc. Use tools such as NVPerfHUD, PIX and GPU PerfStudio to get precise profiling; that's the way.

i) yes, contributions surely welcome

ii) yes, contributions surely welcome (are you sure Ogre doesn't do this, even in the "cheap" bloom technique? I didn't check.)

iii) search for "summed area table". If you read the "bilateral filter" and "edge-aware blur" papers, you'll be up to date with the latest research on that. (You can read many depth shadowing papers too; lots of them are close to the topic of blurring.)
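For readers who have not met the term, a summed area table stores at each position the sum of all pixels above and to the left, so the mean over any axis-aligned rectangle costs four lookups regardless of its size. A minimal Python sketch follows; the names are hypothetical, and it shows a plain box filter, not a Gauss.

```python
def summed_area_table(img):
    """sat[y][x] holds the sum of img over rows < y and columns < x."""
    h, w = len(img), len(img[0])
    sat = [[0.0] * (w + 1) for _ in range(h + 1)]
    for y in range(h):
        row_sum = 0.0
        for x in range(w):
            row_sum += img[y][x]
            sat[y + 1][x + 1] = sat[y][x + 1] + row_sum
    return sat


def box_mean(sat, x0, y0, x1, y1):
    """Mean over the inclusive rectangle [x0, x1] x [y0, y1]: four lookups."""
    total = (sat[y1 + 1][x1 + 1] - sat[y0][x1 + 1]
             - sat[y1 + 1][x0] + sat[y0][x0])
    return total / ((x1 - x0 + 1) * (y1 - y0 + 1))


img = [[1.0, 2.0, 3.0],
       [4.0, 5.0, 6.0],
       [7.0, 8.0, 9.0]]
sat = summed_area_table(img)
print(box_mean(sat, 0, 0, 2, 2), box_mean(sat, 1, 1, 2, 2))
```

The cost moves from the filter pass to building the table, which has to be redone whenever the source image changes.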

Re: HDRLib, shader performance and Gauss separation

Posted: Thu Nov 26, 2009 1:50 pm
by froh
Thanks for the reply, in particular the information source hints.

I had hoped for some empirical data like "for resolution xy, a texture fetch takes approx. xy fraction of a pass overhead on hardware with XY shader units". Of course, this would apply to an "almost purely texture lookup" shader. A rule of thumb for the break-even point between code and texture lookups would be interesting as well, but is probably even harder to give (e.g. in the light of special optimisations in particular hardware types).

I just hoped to spare myself a good deal of the "implement everything you can think of and see what's fastest" approach, at least at the algorithm design level. Does "no rule of thumb" really mean "no pattern visible" or does it just mean "nobody tried to find one"?

Concerning the Ogre examples: to be honest, I did not check all of them. The "nearest neighbor" approach just appeared as a common pattern in the examples I did check, which are not from trunk but from a 1.6 checkout that is a few months old. But I promise to check as soon as I find the time, and to contribute if my initial impression holds.

Re: HDRLib, shader performance and Gauss separation

Posted: Thu Nov 26, 2009 2:09 pm
by tuan kuranes
froh wrote: "... see what's fastest" approach
Depends on the needs, really.
The 2-pass, bilinear variant might be the fastest, but it's very limited (and prone to "box filter" artifacts).
Just reading papers (not implementing everything) might give you ideas that apply to your type of scene/images/colors.
froh wrote: Does "no rule of thumb" really mean "no pattern visible" or does it just mean "nobody tried to find one"?
Each GPU generation/model has its own way of dealing with pixel shaders, so it's really hard to come up with something there apart from the usual directions (texture size, texture lookups, instruction count, etc.).
Here's an example (with profiler use) of what I mean.

Re: HDRLib, shader performance and Gauss separation

Posted: Fri Nov 27, 2009 2:55 pm
by froh
Thx again for pointing me at search terms.
An answer to my initial question (certainly not universal, but exactly the kind of information I was after) may be found at ... EG2005.pdf, Table 2.

On the other hand, thanks also for adjusting my priorities; I was really asking the second question before the first. The argument I had read for taking the reduced Gauss kernel in HDRLib was just that it saves texture fetches, but it should really read: if we must cope with few texture fetches, we must not take a rectangular region, because of the artifacts introduced by multiplying with the cut-off box filter, and we must not make the Gauss pixel radius too small, because of the artifacts from the convolved downsampling box filter.

Concerning "limited", I would not worry: the Gauss was just an example for me to understand general concepts. What I really failed to think through was the kernel sizes we are talking about. They are much too small for scaling arguments to dominate the messy details of hardware specifics (like those discussed in your last reference).

Summed area tables are certainly a very interesting approach to overcome this problem, but unfortunately probably still too slow for my needs. Nonetheless, lots of brain food you pointed me at...