Page 1 of 1

normalizing vector without sqaure root?

Posted: Fri Apr 23, 2010 6:27 am
by JaJDoo
as i understand it, normalizing requires square root, which limit its usefulness in per-frame uses.

normalizing can also be done with trig function, which can use trig tables for speed.

ie (i doubt you need this example, but still) :
x: 13, y:7
angle : ATan ( y / x ) = ~28
new y : sin(28) = ~0.46
new x : cos(28) = ~0.88
or calculate vector length ( x/cos(a) or y /sin(a) )

understandingly its significantly faster when using trig tables, and can provide a per-frame tool when accuracy is not critical..

Re: normalizing vector without sqaure root?

Posted: Fri Apr 23, 2010 8:34 pm
by SigfriedMcWild
Using lookup tables on modern hardware is a bad idea. Memory is slow, very very slow.

By all means try it out with a profiler, but I expect calculating the square root on the cpu is going to be faster than a lookup table

Re: normalizing vector without sqaure root?

Posted: Sat Apr 24, 2010 7:06 am
by xavier
Pretty much every modern compiler is going to use the VPU (SSE/AVX on CPU) for all math operations, and "rsq" (reciprocal square root) is a common operation supported directly in SSE.

When accuracy is not critical, I'd take the few clocks of latency for a RSQRTSS instruction over the untold number of clocks to access a random memory location. In general, most re-normalizations in 3D graphics are not long-lived anyway. As always, the solution that is right for you is the one that works best in your particular situation -- have you profiled both approaches to see which one performs better for you?

Re: normalizing vector without sqaure root?

Posted: Sat Apr 24, 2010 7:21 am
by xavier
FWIW:

- The accuracy of RSQRTSS according to the IA instruction set reference matches what I've seen documented elsewhere (in the GCC bugs, for example), namely you get 12 bits of accuracy (the relative error is < 1.5 * 2 ^ -12))

As you can see, for the following code

Code: Select all

int _tmain(int argc, _TCHAR* argv[])
{
	float f[3];

	srand(12345);
	f[0] = (float)rand() / (float)RAND_MAX * 3;
	f[1] = (float)rand() / (float)RAND_MAX * 3;
	f[2] = (float)rand() / (float)RAND_MAX * 3;

	float r[3];
	r[0] = 1 / sqrt(f[0]);
	r[1] = 1 / sqrt(f[1]);
	r[2] = 1 / sqrt(f[2]);

	printf("r: (%f,%f,%f)\n", r[0], r[1], r[2]);

	return 0;
}
the Intel compiler produces the following abridged assembly

Code: Select all

;;; 
;;; 	float r[3];
;;; 	r[0] = 1 / sqrt(f[0]);

        movss     xmm0, DWORD PTR [128+esp]                     ;18.2
$LN13:

;;; 	r[1] = 1 / sqrt(f[1]);

        movss     xmm3, DWORD PTR [132+esp]                     ;19.2
$LN15:
        cvtsi2ss  xmm6, eax                                     ;15.2
        divss     xmm6, DWORD PTR [_2il0floatpacket.20]         ;15.2
        mulss     xmm6, DWORD PTR [_2il0floatpacket.21]         ;15.2
$LN17:
        rsqrtss   xmm1, xmm0                                    ;18.2
        mulss     xmm0, xmm1                                    ;18.2
        mulss     xmm0, xmm1                                    ;18.2
        subss     xmm0, DWORD PTR [_2il0floatpacket.21]         ;18.2
$LN19:

;;; 	r[2] = 1 / sqrt(f[2]);
;;; 
;;; 	printf("r: (%f,%f,%f)\n", r[0], r[1], r[2]);

        mov       DWORD PTR [esp], OFFSET FLAT: ??_C@_0P@A@r?3?5?$CI?$CFf?0?$CFf?0?$CFf?$CJ?6?$AA@ ;22.2
$LN21:
        mulss     xmm1, xmm0                                    ;18.2
        mulss     xmm1, DWORD PTR [_2il0floatpacket.22]         ;18.2
$LN23:
        cvtss2sd  xmm2, xmm1                                    ;22.2
        movsd     QWORD PTR [4+esp], xmm2                       ;22.2
$LN25:
        rsqrtss   xmm4, xmm3                                    ;19.2
        mulss     xmm3, xmm4                                    ;19.2
        mulss     xmm3, xmm4                                    ;19.2
        subss     xmm3, DWORD PTR [_2il0floatpacket.21]         ;19.2
$LN27:
        rsqrtss   xmm7, xmm6                                    ;20.2
        mulss     xmm6, xmm7                                    ;20.2
So in short, I wouldn't worry about it. You already said this is for cases where accuracy is less important, and for cases where it's very important (HPC for example) there's always Newton-Raphson (slower but more accurate).

Re: normalizing vector without sqaure root?

Posted: Sat Apr 24, 2010 8:31 am
by JaJDoo
well, i did a small experiment with gravity that required a square root each frame.
when i had one object fps was fine.
adding even one more made a significant slowdown in fps when second object was visible on screen.
i dropped the gravity idea for a few reasons; hard to plot, somewhat unpredictable at times, and the orbit itself doesn't 'stay' (even if locked into endless orbit, the ellipse will rotate around the focal point..) - reality makes it hard :P

i decided to go on a "fake" gravity on elliptical course.
so square root isn't the devil they say it is ?

Re: normalizing vector without sqaure root?

Posted: Sat Apr 24, 2010 9:22 am
by mkultra333
If you are normalizing you can multiply by the inverse square root, so then there's the fast inverse square root function made famous by Quake 3 Arena.

http://en.wikipedia.org/wiki/Fast_inverse_square_root

Code: Select all

float InvSqrt(float x)
{
	union {
		float f;
		int i;
	} tmp;
	tmp.f = x;
	tmp.i = 0x5f3759df - (tmp.i >> 1);
	float y = tmp.f;
	return y * (1.5f - 0.5f * x * y * y);
}
It's a sped up version of the Newton-Raphson method mentioned above that makes use of some weird float-to-integer demonology. I mention it for historical interest, since this page, http://assemblyrequired.crashworks.org/ ... uare-root/, shows that this is still significantly slower and less accurate than RSQRTSS... though it still does impressively well compared to other methods, if you can live with the approximation.

Re: normalizing vector without sqaure root?

Posted: Sat Apr 24, 2010 1:48 pm
by JaJDoo
just as a suffix, there is a very weird thing going on -

the slowdown i talked about? well its only occurs when i have exactly 3 objects on screen. i a few just to see if there is farther slowdown, but the reverse happened! 4,5,6,7,8.... it just stabilizes. cutting down back to 3, the slow down returns!
odd...

Re: normalizing vector without sqaure root?

Posted: Sat Apr 24, 2010 6:51 pm
by SigfriedMcWild
I'd guess it isn't the square root then. Use a profiler to figure out where your application is wasting time. Just guessing won't get you anywhere.

Re: normalizing vector without sqaure root?

Posted: Fri May 07, 2010 11:09 pm
by Jabberwocky
JaJDoo wrote:the slowdown i talked about? well its only occurs when i have exactly 3 objects on screen.
The number 3 was removed from the OgreCore as an optimization - the team found that having less than 10 digits can increase performance dramatically. Unfortunately, any reference to the number 3 will cause serious instability and performance problems in the core ogre library, so try to avoid it if at all possible.

Re: normalizing vector without sqaure root?

Posted: Sat May 08, 2010 3:48 am
by xavier
Jabberwocky wrote: The number 3
5, sir.

Re: normalizing vector without sqaure root?

Posted: Sat May 08, 2010 4:59 am
by Jabberwocky
I know this sounds crazy, but maybe we could get rid of all numbers except for 0 and 1.
... nah, that'd never work.

Re: normalizing vector without sqaure root?

Posted: Sat May 08, 2010 5:17 am
by jacmoe
That really made my day! =)