Page **1** of **1**

### normalizing vector without sqaure root?

Posted: **Fri Apr 23, 2010 6:27 am**

by **JaJDoo**

as i understand it, normalizing requires square root, which limit its usefulness in per-frame uses.

normalizing can also be done with trig function, which can use trig tables for speed.

ie (i doubt you need this example, but still) :

x: 13, y:7

angle : ATan ( y / x ) = ~28

new y : sin(28) = ~0.46

new x : cos(28) = ~0.88

or calculate vector length ( x/cos(a) or y /sin(a) )

understandingly its significantly faster when using trig tables, and can provide a per-frame tool when accuracy is not critical..

### Re: normalizing vector without sqaure root?

Posted: **Fri Apr 23, 2010 8:34 pm**

by **SigfriedMcWild**

Using lookup tables on modern hardware is a bad idea. Memory is slow, very very slow.

By all means try it out with a profiler, but I expect calculating the square root on the cpu is going to be faster than a lookup table

### Re: normalizing vector without sqaure root?

Posted: **Sat Apr 24, 2010 7:06 am**

by **xavier**

Pretty much every modern compiler is going to use the VPU (SSE/AVX on CPU) for all math operations, and "rsq" (reciprocal square root) is a common operation supported directly in SSE.

When accuracy is not critical, I'd take the few clocks of latency for a RSQRTSS instruction over the untold number of clocks to access a random memory location. In general, most re-normalizations in 3D graphics are not long-lived anyway. As always, the solution that is right for you is the one that works best in your particular situation -- have you profiled both approaches to see which one performs better for you?

### Re: normalizing vector without sqaure root?

Posted: **Sat Apr 24, 2010 7:21 am**

by **xavier**

FWIW:

- The accuracy of RSQRTSS according to the IA instruction set reference matches what I've seen documented elsewhere (in the GCC bugs, for example), namely you get 12 bits of accuracy (the relative error is < 1.5 * 2 ^ -12))

As you can see, for the following code

Code: Select all

```
int _tmain(int argc, _TCHAR* argv[])
{
float f[3];
srand(12345);
f[0] = (float)rand() / (float)RAND_MAX * 3;
f[1] = (float)rand() / (float)RAND_MAX * 3;
f[2] = (float)rand() / (float)RAND_MAX * 3;
float r[3];
r[0] = 1 / sqrt(f[0]);
r[1] = 1 / sqrt(f[1]);
r[2] = 1 / sqrt(f[2]);
printf("r: (%f,%f,%f)\n", r[0], r[1], r[2]);
return 0;
}
```

the Intel compiler produces the following abridged assembly

Code: Select all

```
;;;
;;; float r[3];
;;; r[0] = 1 / sqrt(f[0]);
movss xmm0, DWORD PTR [128+esp] ;18.2
$LN13:
;;; r[1] = 1 / sqrt(f[1]);
movss xmm3, DWORD PTR [132+esp] ;19.2
$LN15:
cvtsi2ss xmm6, eax ;15.2
divss xmm6, DWORD PTR [_2il0floatpacket.20] ;15.2
mulss xmm6, DWORD PTR [_2il0floatpacket.21] ;15.2
$LN17:
rsqrtss xmm1, xmm0 ;18.2
mulss xmm0, xmm1 ;18.2
mulss xmm0, xmm1 ;18.2
subss xmm0, DWORD PTR [_2il0floatpacket.21] ;18.2
$LN19:
;;; r[2] = 1 / sqrt(f[2]);
;;;
;;; printf("r: (%f,%f,%f)\n", r[0], r[1], r[2]);
mov DWORD PTR [esp], OFFSET FLAT: ??_C@_0P@A@r?3?5?$CI?$CFf?0?$CFf?0?$CFf?$CJ?6?$AA@ ;22.2
$LN21:
mulss xmm1, xmm0 ;18.2
mulss xmm1, DWORD PTR [_2il0floatpacket.22] ;18.2
$LN23:
cvtss2sd xmm2, xmm1 ;22.2
movsd QWORD PTR [4+esp], xmm2 ;22.2
$LN25:
rsqrtss xmm4, xmm3 ;19.2
mulss xmm3, xmm4 ;19.2
mulss xmm3, xmm4 ;19.2
subss xmm3, DWORD PTR [_2il0floatpacket.21] ;19.2
$LN27:
rsqrtss xmm7, xmm6 ;20.2
mulss xmm6, xmm7 ;20.2
```

So in short, I wouldn't worry about it. You already said this is for cases where accuracy is less important, and for cases where it's very important (HPC for example) there's always Newton-Raphson (slower but more accurate).

### Re: normalizing vector without sqaure root?

Posted: **Sat Apr 24, 2010 8:31 am**

by **JaJDoo**

well, i did a small experiment with gravity that required a square root each frame.

when i had one object fps was fine.

adding even one more made a significant slowdown in fps when second object was visible on screen.

i dropped the gravity idea for a few reasons; hard to plot, somewhat unpredictable at times, and the orbit itself doesn't 'stay' (even if locked into endless orbit, the ellipse will rotate around the focal point..) - reality makes it hard

i decided to go on a "fake" gravity on elliptical course.

so square root isn't the devil they say it is ?

### Re: normalizing vector without sqaure root?

Posted: **Sat Apr 24, 2010 9:22 am**

by **mkultra333**

If you are normalizing you can multiply by the inverse square root, so then there's the fast inverse square root function made famous by Quake 3 Arena.

http://en.wikipedia.org/wiki/Fast_inverse_square_root
Code: Select all

```
float InvSqrt(float x)
{
union {
float f;
int i;
} tmp;
tmp.f = x;
tmp.i = 0x5f3759df - (tmp.i >> 1);
float y = tmp.f;
return y * (1.5f - 0.5f * x * y * y);
}
```

It's a sped up version of the Newton-Raphson method mentioned above that makes use of some weird float-to-integer demonology. I mention it for historical interest, since this page,

http://assemblyrequired.crashworks.org/ ... uare-root/, shows that this is still significantly slower and less accurate than RSQRTSS... though it still does impressively well compared to other methods, if you can live with the approximation.

### Re: normalizing vector without sqaure root?

Posted: **Sat Apr 24, 2010 1:48 pm**

by **JaJDoo**

just as a suffix, there is a very weird thing going on -

the slowdown i talked about? well its only occurs when i have exactly 3 objects on screen. i a few just to see if there is farther slowdown, but the reverse happened! 4,5,6,7,8.... it just stabilizes. cutting down back to 3, the slow down returns!

odd...

### Re: normalizing vector without sqaure root?

Posted: **Sat Apr 24, 2010 6:51 pm**

by **SigfriedMcWild**

I'd guess it isn't the square root then. Use a profiler to figure out where your application is wasting time. Just guessing won't get you anywhere.

### Re: normalizing vector without sqaure root?

Posted: **Fri May 07, 2010 11:09 pm**

by **Jabberwocky**

JaJDoo wrote:the slowdown i talked about? well its only occurs when i have exactly 3 objects on screen.

The number 3 was removed from the OgreCore as an optimization - the team found that having less than 10 digits can increase performance dramatically. Unfortunately, any reference to the number 3 will cause serious instability and performance problems in the core ogre library, so try to avoid it if at all possible.

### Re: normalizing vector without sqaure root?

Posted: **Sat May 08, 2010 3:48 am**

by **xavier**

Jabberwocky wrote:
The number 3

5, sir.

### Re: normalizing vector without sqaure root?

Posted: **Sat May 08, 2010 4:59 am**

by **Jabberwocky**

I know this sounds crazy, but maybe we could get rid of all numbers except for 0 and 1.

... nah, that'd never work.

### Re: normalizing vector without sqaure root?

Posted: **Sat May 08, 2010 5:17 am**

by **jacmoe**

That really made my day! =)