NaNs coming out of ArrayQuaternion::FromOrthoDet1RotationMatrix

Discussion area about developing with Ogre-Next (2.1, 2.2 and beyond)


Post Reply
claytongarrett
Gnoblar
Posts: 12
Joined: Mon Sep 18, 2017 12:47 am

NaNs coming out of ArrayQuaternion::FromOrthoDet1RotationMatrix

Post by claytongarrett »

I have a mesh with a skeleton, and a TagPoint attached to a bone in that skeleton. Periodically, when updateAllTransformsBoneToTag is called, I wind up with NaNs in the mDerivedOrientation of the TagPoint. I've tracked this down to this line in ArrayQuaternion::FromOrthoDet1RotationMatrix called from within ArrayMatrixAf4x3::decomposition:

Code: Select all

mChunkBase[1] = vmulq_f32( MathlibNEON::Sqrt( tmp ), MathlibNEON::HALF );
Specifically, very small numbers go into MathlibNEON::Sqrt, and become NaN when they come out of vrsqrteq_f32( f ) because of a divide by zero.

I noticed in one other place whe vrsqrteq_f32 is used, its followed it up with a line labeled "//Nuke this NaN" -- maybe something like that's missing from MathlibNEON::Sqrt?

Thanks!
User avatar
dark_sylinc
OGRE Team Member
OGRE Team Member
Posts: 5299
Joined: Sat Jul 21, 2007 4:55 pm
Location: Buenos Aires, Argentina
x 1279
Contact:

Re: NaNs coming out of ArrayQuaternion::FromOrthoDet1RotationMatrix

Post by dark_sylinc »

I see.

So, just to confirm, this bug does not happen when Ogre is compiled with -DOGRE_SIMD_NEON=0 ?

David (masterfalcon) is the one who wrote the NEON port. SSE2 supports _mm_sqrt_ps directly. It appears there is no NEON equivalent so David had to use x * vrsqrteq_f32 (which calculates x / sqrt( x )) and applies two Netwon-Raphson iterations to improve accuracy, getting it closer to a real sqrt.

On SSE2, I am the one who wrote InvSqrt4 and InvSqrtNonZero4 using newton raphson iterations, and the former indeed accounts for 0 inputs. It seems David grabbed some code from ARM on how to implement sqrt using vrsqrteq that didn't account for that.

Indeed, vrsqrteq_f32 does not work for x = 0. And indeed the solution is to detect and nuke the NaN. We're not the first ones to fall for that. It seems they use vtstq_u32( x, x ) which stands for x & x ? 0000 : 1111 as an equivalent for vceqq_f32(v, vdupq_n_f32(0.0f) ) which is basically x == 0 ? 0000 : 1111, claiming it's faster.

What I do not know is if it's safe to nuke the NaN at fStep0 or if it should be done at the end (I don't know if vrsqrtsq_f32( 0, 0 ) results in valid output, or if it goes NaN or returns garbage) Stopping the NaN as soon as possible should lead to better performance (since NaN tend to have the habit of slowing down calculations) but if it's not possible, the NaN would have to be nuked at the end.

I'm saying all this because I don't have the time to perform the experiments, but you should be able to; since you'd be literally adding one line of code:

Test this code:

Code: Select all

static inline ArrayReal Sqrt( ArrayReal f )
{
	//Netwon-Raphson, 2 iterations.
	ArrayReal fStep0 = vrsqrteq_f32( f );
	fStep0 = vreinterpretq_f32_u32( vandq_u32( vtstq_u32( f, f ), vreinterpretq_u32_f32( fStep0 ) ) ); //Nuke NaN when f == 0
	// step fStep0 = 1 / sqrt(x)
	const ArrayReal fStepParm0  = vmulq_f32( f, fStep0 );
	const ArrayReal fStepResult0= vrsqrtsq_f32( fStepParm0, fStep0 );
	// step fStep1 = 1 / sqrt(x)
	const ArrayReal fStep1      = vmulq_f32( fStep0, fStepResult0 );
	const ArrayReal fStepParm1  = vmulq_f32( f, fStep1 );
	const ArrayReal fStepResult1= vrsqrtsq_f32( fStepParm1, fStep1 );
	// take the res. fStep2 = 1 / sqrt(x)
	const ArrayReal fStep2      = vmulq_f32( fStep1, fStepResult1 );
	// mul by x to get sqrt, not rsqrt
	return vmulq_f32( f, fStep2 );
}
Otherwise test this code:

Code: Select all

static inline ArrayReal Sqrt( ArrayReal f )
{
	//Netwon-Raphson, 2 iterations.
	const ArrayReal fStep0 = vrsqrteq_f32( f );
	// step fStep0 = 1 / sqrt(x)
	const ArrayReal fStepParm0  = vmulq_f32( f, fStep0 );
	const ArrayReal fStepResult0= vrsqrtsq_f32( fStepParm0, fStep0 );
	// step fStep1 = 1 / sqrt(x)
	const ArrayReal fStep1      = vmulq_f32( fStep0, fStepResult0 );
	const ArrayReal fStepParm1  = vmulq_f32( f, fStep1 );
	const ArrayReal fStepResult1= vrsqrtsq_f32( fStepParm1, fStep1 );
	// take the res. fStep2 = 1 / sqrt(x)
	const ArrayReal fStep2      = vmulq_f32( fStep1, fStepResult1 );
	fStep2 = vreinterpretq_f32_u32( vandq_u32( vtstq_u32( f, f ), vreinterpretq_u32_f32( fStep2 ) ) ); //Nuke NaN when f == 0
	// mul by x to get sqrt, not rsqrt
	return vmulq_f32( f, fStep2 );
}
Test it by doing MathlibeNEON::Sqrt( 0 ) rather than waiting for the whole thing skeleton to generate a NaN.

My only worry however, is if this routine will still generate NaNs when x is very close to 0, but not exactly 0 (the routine would have to be modified so that when vcneqq_f32( fStep2, fStep2 ) returns false i.e. fStep2 is a NaN, and f >= 0, then we should nuke the NaN, otherwise leave the NaN, as f < 0 should generate NaNs).

Update: According to David, I am the one who wrote that routine. Gosh I remember the commit, but I don't remember that.
User avatar
dark_sylinc
OGRE Team Member
OGRE Team Member
Posts: 5299
Joined: Sat Jul 21, 2007 4:55 pm
Location: Buenos Aires, Argentina
x 1279
Contact:

Re: NaNs coming out of ArrayQuaternion::FromOrthoDet1RotationMatrix

Post by dark_sylinc »

David just tested the first snippet and he told me it works.
Please let me know if the first snippet fixes your problem! If so, I'll push the change to the repo.

Thanks
claytongarrett
Gnoblar
Posts: 12
Joined: Mon Sep 18, 2017 12:47 am

Re: NaNs coming out of ArrayQuaternion::FromOrthoDet1RotationMatrix

Post by claytongarrett »

Awesome! Yes, that solved my issue. It's working as expected now. Thank you very much for the quick turnaround.
AtTheWall
Gnoblar
Posts: 8
Joined: Fri Apr 07, 2017 2:53 pm
x 1

Re: NaNs coming out of ArrayQuaternion::FromOrthoDet1RotationMatrix

Post by AtTheWall »

Hello again!

We've run into this issue again, this time with InvSqrt4 producing NaNs for small numbers. It specifically comes up when we have something calculating its world-space position (eg particle systems with the local space option turned off).
Post Reply