Remove AABB Asserts for Min <= Max

nullsquared
Old One
Posts: 3245
Joined: Tue Apr 24, 2007 8:23 pm
Location: NY, NY, USA

Post by nullsquared » Sun Apr 13, 2008 7:14 pm

Referring to this:
http://www.ogre3d.org/phpBB2/viewtopic. ... light=aabb

The AABB assert seems to cause only trouble and to be of very little (read: no) help. Some functions (for me, mostly _updateBounds()) seem to randomly generate NAN, INF or other invalid values, and the AABB check crashes my program in debug. Running it through the debugger works, but it is insanely slow, and running it in release works, too. There is no visual/performance/logic/anything difference between when it runs and when it crashes with this AABB assert.

It occurs very randomly, and it does not seem to be related to memory corruption or anything of the sort: I've run it millions of times in release mode and under the debugger, and it always runs correctly. (And why would debug mode have memory corruption, and release not?) Eugen seems to confirm the same issue.

So, my request? If needed, just clamp min to max. Or max to min, it doesn't make that big of a difference. Don't use asserts so liberally for such small, negligible issues - this makes my debug build unusable 50% of the time, since I don't like running it under the debugger and waiting half an hour (:roll:) for the program just to start up.

I understand if you think I'm trying to treat the symptoms and not the problem, but the problem seems nonexistent. I am not mixing DLLs, and I tried both MSVC and Code::Blocks with MinGW, on HEAD, 1.4.7, 1.4.6, 1.4.5 and 1.4.4. Even though I hate to say this, I tend to write memory-safe code, so I doubt I'm just randomly corrupting memory somewhere: I use the STL whenever possible and applicable, and pretty much never manage new/delete objects on my own via random pointers. And even then, I'd expect a liberal crash, not a liberal NAN/INF AABB assert.

EDIT: Building my application in debug mode using release-mode Ogre runs just as does release mode. No AABB assert, and the program has correct logic, memory consumption is as predicted, and everything seems to be working fine. Exits fine, as well. The only problem here is that now I have to rely on Ogre's exceptions for the most part (though it's still a good thing I get to run my app itself in debug mode).

sinbad
OGRE Retired Team Member
Posts: 19261
Joined: Sun Oct 06, 2002 11:19 pm
Location: Guernsey, Channel Islands

Post by sinbad » Mon Apr 14, 2008 3:44 pm

Actually the AABB asserts are there for a reason. In Eihort the AABB updates are now specialised and will not function correctly if min > max; the asserts are there to tell people specifically what the problem is, rather than just silently failing in unusual ways.

nullsquared
Old One
Posts: 3245
Joined: Tue Apr 24, 2007 8:23 pm
Location: NY, NY, USA

Post by nullsquared » Mon Apr 14, 2008 8:36 pm

sinbad wrote:Actually the AABB asserts are there for a reason. In Eihort the AABB updates are now specialised and will not function correctly if min > max, the asserts are there to tell people specifically what the problem is rather than just silently failing in unusual ways.
:|

Not sure what to say here, other than to reiterate... Removing the asserts has not caused any problems whatsoever: no culling differences, no logic differences, no collision differences, nothing. I understand what you're saying, but my bounding box asserts seem to be very random and unusual. Like I said, it seems transformAffine and _updateBounds are somehow generating NAN or INF values. This seems to be mostly camera-related: my issues involve cameras, and Eugen's involve shadow cameras.

I just don't think asserts are the right thing here. Sure, for user-made AABBs it sounds OK (tell people the AABB is wrong), but crashing the whole application due to some odd internal numerical error is annoying. I think simply logging the condition as an "AABB error: blah" and defaulting min and max to fail-safe values (or, like I mentioned, just clamping one to the other) is a better idea. Because, once more, there are 0 errors/problems when I remove that assert.

Game_Ender
Ogre Magi
Posts: 1269
Joined: Wed May 25, 2005 2:31 am
Location: Rockville, MD, USA

Post by Game_Ender » Mon Apr 14, 2008 10:21 pm

Isn't the issue that NaNs and INFs popping up mean there is something inherently wrong with what you are doing? (After all, the asserts have been there for years.) You don't want the values to propagate and rear their ugly heads somewhere else, where you least suspect it.

Kojack
OGRE Moderator
Posts: 7144
Joined: Sun Jan 25, 2004 7:35 am
Location: Brisbane, Australia

Post by Kojack » Mon Apr 14, 2008 10:31 pm

If it was just a case of making setExtents sort the x,y,z components so they are always min<max, I wouldn't see a problem (give 2 opposite corners, and a correct aabb is generated). That would give the same result as if you took an empty aabb and called merge() on both corners.
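A minimal sketch of the sorting idea above. The `Vec3` and `Box` types are hypothetical stand-ins for illustration, not Ogre's `Vector3` or `AxisAlignedBox`; the sketch only shows that sorting each component of two opposite corners gives the same box as merging both corners into an empty one.

```cpp
#include <algorithm>
#include <cassert>

// Hypothetical stand-ins for illustration; these are NOT Ogre's classes.
struct Vec3 { float x, y, z; };

struct Box {
    Vec3 min{0, 0, 0};
    Vec3 max{0, 0, 0};
    bool empty = true;

    // Grow the box so that it contains point p.
    void merge(const Vec3& p) {
        if (empty) { min = max = p; empty = false; return; }
        min = {std::min(min.x, p.x), std::min(min.y, p.y), std::min(min.z, p.z)};
        max = {std::max(max.x, p.x), std::max(max.y, p.y), std::max(max.z, p.z)};
    }
};

// Build a box from any two opposite corners by sorting each component.
inline Box boxFromCorners(const Vec3& a, const Vec3& b) {
    Box box;
    box.min = {std::min(a.x, b.x), std::min(a.y, b.y), std::min(a.z, b.z)};
    box.max = {std::max(a.x, b.x), std::max(a.y, b.y), std::max(a.z, b.z)};
    box.empty = false;
    // Holds for ordinary numbers; a NaN component would still trip this check.
    assert(box.min.x <= box.max.x && box.min.y <= box.max.y && box.min.z <= box.max.z);
    return box;
}
```

Calling `boxFromCorners({5, -1, 2}, {1, 3, 0})` yields min (1, -1, 0) and max (5, 3, 2), the same extents as two `merge()` calls on an empty `Box`.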

But NAN's and IND's are a whole different story. They spread through data, corrupting stuff as they go. Any operation between a number and a NAN gives a NAN. Just because they aren't making your program crash now doesn't mean they are harmless. Unless you specifically want them and have code to isolate them, they should never be allowed in a program.
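To make the spreading concrete, here is a small sketch in standard C++. The function names are made up for illustration. It shows a NaN surviving a chain of arithmetic, and a `min <= max` style comparison coming out false as soon as either side is NaN, which is presumably why a bad float ends up tripping an extents check.

```cpp
#include <cassert>
#include <cmath>
#include <limits>

// The kind of ordering check an extents assert performs.
// Any comparison involving NaN evaluates to false.
inline bool extentsOrdered(float minV, float maxV) {
    return minV <= maxV;
}

// A few ordinary arithmetic steps; a NaN input survives all of them.
inline float pipeline(float v) {
    float scaled = v * 2.0f;
    float shifted = scaled + 10.0f;
    return shifted - 3.0f;
}

// Small self-check that can be called from main().
inline void demoNanPropagation() {
    const float qnan = std::numeric_limits<float>::quiet_NaN();
    assert(std::isnan(pipeline(qnan)));    // NaN in, NaN out
    assert(!extentsOrdered(qnan, 1.0f));   // min <= max is false ...
    assert(!extentsOrdered(0.0f, qnan));   // ... whichever side holds the NaN
    assert(pipeline(1.0f) == 9.0f);        // ordinary values behave normally
}
```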

The most common cause of IND (this post talks about INF, but the other one you linked to says the bad data was IND, they are very different) is things like doing an acos or asin of a number outside of the range -1.0 to 1.0. I've seen it happen when people normalised 2 vectors, did a dot product, then an acos to get the angle. Even though the 2 vectors were normalised, floating point errors caused the dot product to be something like 1.0001, which is enough to make the acos fail and give an IND (which then spread).
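One common guard for the acos case is to clamp the dot product before taking the angle. The sketch below uses a hypothetical `Vec3` and helper names, not Ogre's API, and assumes the inputs are meant to be unit length.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>

// Hypothetical vector type for illustration; not Ogre's Vector3.
struct Vec3 { float x, y, z; };

inline float dot(const Vec3& a, const Vec3& b) {
    return a.x * b.x + a.y * b.y + a.z * b.z;
}

// Angle in radians between two (nominally) unit-length vectors.
// The dot product is clamped to [-1, 1] so that rounding error
// cannot push acos outside its domain.
inline float angleBetweenUnit(const Vec3& a, const Vec3& b) {
    float d = dot(a, b);
    // Clamping would silently turn a NaN into 1.0, so reject it first.
    assert(!std::isnan(d) && "inputs must not contain NaN");
    d = std::max(-1.0f, std::min(1.0f, d));
    return std::acos(d);
}
```

With a slightly over-length vector such as (1.00005, 0, 0) the raw dot product is just above 1.0; unclamped, `std::acos` would return NaN, while the clamped version returns 0.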

Basically, if NAN, IND or INF are being generated, there's something wrong (mathematically) which should be fixed (or at least understood).

btmorex
Gremlin
Posts: 156
Joined: Thu May 17, 2007 10:56 pm

Post by btmorex » Mon Apr 14, 2008 11:54 pm

nullsquared wrote:but crashing the whole application due to some odd internal numerical error is annoying.
I think that's the whole point of the assert. Crash at the first sign that something is *really* wrong instead of letting subtle bugs propagate until the effects of those bugs are quite far from the source. I can only see three causes of a problem like that:

1.) Bug in ogre
2.) Bug in your app
3.) Faulty hardware

I wouldn't want to cover up any of those causes by just removing the assert.

nullsquared
Old One
Posts: 3245
Joined: Tue Apr 24, 2007 8:23 pm
Location: NY, NY, USA

Post by nullsquared » Tue Apr 15, 2008 12:32 am

Kojack wrote: The most common cause of IND (this post talks about INF, but the other one you linked to says the bad data was IND, they are very different) is things like doing an acos or asin of a number outside of the range -1.0 to 1.0. I've seen it happen when people normalised 2 vectors, did a dot product, then an acos to get the angle. Even though the 2 vectors were normalised, floating point errors caused the dot product to be something like 1.0001, which is enough to make the acos fail and give an IND (which then spread).
*pretends /me knew what INF/IND/NAN were before* Yeah, I was getting IND, not NAN or INF. I just sort of "grouped" them together :oops:

Well, I'm honestly stumped. I have no idea what's causing the faulty values. I think the first time I got this error was around here: http://www.ogre3d.org/phpBB2/viewtopic. ... t+checking
After this, I've gotten it very randomly - sometimes I get it, others I don't.

Pretty much my only "info" is eugen's post:
eugen wrote:
i also get set extents debug assertions with the 1.6.7 version...i had those with 1.6.4 version also
here the error is related to the frustum attached to a shadow texture when rendering additive soft pcf shadows...i get it all the time in the same place and i have some weird nan values for bb corners for the root node (i cant reproduce it right now to give little more details but ill be doing in tomorow)

here i dont think there is any corrupting going on anyway, just an unusual case...
(from the other thread, linked in the OP)

I don't think I have any corrupting going on, either. The project really isn't "all that big", and there's few places where I can corrupt the memory - though, as far as I can code, I tend to handle them nicely (mostly just some new and delete on GameObjects, all handled with managers/factories). Everything else is just Ogre, for the most part.

As for hardware... it's not failing. I bench the gfx. card pretty much every month, and run Prime95 + memtest86+ (all while logging temperatures, other than memtest86+) every other month, just to check my hardware. And a more "normal" check, I play stuff like HL2:EP2/DM/Portal/F.E.A.R. every day, no problems.

Maybe it's some deep Ogre math bug, or something? I pretty much scrapped my whole MSVC "debugging suite" because I'm a GCC-type-of-guy (:lol:), but AFAICR, the IND value was generated after Ogre::AxisAlignedBox::transformAffine, which was being called from SceneNode::_updateBounds(). I believe that the scene node had a camera attached.

...Ideas? Not sure what to say here. If someone else is getting such AABB oddness, please do report.
Last edited by nullsquared on Tue Apr 15, 2008 12:38 am, edited 1 time in total.

Murphy
Greenskin
Posts: 102
Joined: Tue May 10, 2005 11:42 pm
Location: SF, California

Post by Murphy » Tue Apr 15, 2008 12:33 am

I am with nullsquared on this one. This happens to me all the time. Also note that this is only in debug mode, people. It isn't a bug in our code if it works 100% of the time in release and maybe 50-75% of the time in debug.

I don't know what it is but for me I traced it back to something related to the camera...

I will try to provide some more details when I can get back to my home computer where I can actually debug.

nullsquared
Old One
Posts: 3245
Joined: Tue Apr 24, 2007 8:23 pm
Location: NY, NY, USA

Post by nullsquared » Tue Apr 15, 2008 12:40 am

Murphy wrote:I am with nullsquared on this one. This happens to me all the time. Also note that this is only in debug mode people. It isn't a bug in our code if it works 100% of the time in release and maybe 50-75% of the time in debug.
Yeah, this is what I'm trying to reinforce: it doesn't look like a bug in our code, since it runs 100% as expected in Release (and in Debug when linked to Release Ogre, which is what I'm doing now, since Debug Ogre is practically unusable for me).
Murphy wrote:I don't know what it is but for me I traced it back to something related to the camera...
Hm. Yes, seems like a camera issue - Eugen with shadow frustums, me with random cameras, you with cameras... :|
Murphy wrote:I will try to provide some more details when I can get back to my home computer where I can actually debug.
We'd all be grateful :)

I'm starting to think it's some deeply buried bug, somewhere within the depths of Ogre cameras - or maybe somewhere else, we still need to find it :lol:.

eugen
OGRE Expert User
Posts: 1422
Joined: Sat May 22, 2004 5:28 am
Location: Bucharest

Post by eugen » Tue Apr 15, 2008 1:25 am

I've made some time to test it a bit further tonight. I don't have any enlightening information, only a few more details.

Camera::_renderScene –> the vp parameter in the function header: viewport.mCamera->frustum.mProjMatrix has all NaN values

SceneManager::_renderScene – findVisibleObjects
Frustum::calcProjectionParameters – mProjMatrix has all NaN values, and everything from here on is screwed

It seems like an initialization problem to me, since after 4-5 steps the error goes away and everything works OK afterwards. (These steps might correspond to the number of camera projectors for the shadows; the test scene had 5 spot lights casting shadows of the same type.)

Here are some screenshots with the stack trace and debug information:

[Image: three screenshots of the stack trace and debug information]

Kojack
OGRE Moderator
Posts: 7144
Joined: Sun Jan 25, 2004 7:35 am
Location: Brisbane, Australia

Post by Kojack » Tue Apr 15, 2008 11:05 am

Try adding:

Code: Select all

_controlfp_s(NULL, _EM_UNDERFLOW + _EM_OVERFLOW + _EM_ZERODIVIDE + _EM_INEXACT, _MCW_EM);
near the top of your program, and include float.h.

That should make visual studio break when a bad float operation happens. It might jump back in the stack one level too far in the debugger, but it should show where the first bad value is appearing.
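For anyone not on Visual Studio, a related but different technique is to poll the floating point status flags with standard `<cfenv>` instead of trapping. This is only a sketch with made-up function names. It does not break into the debugger the way the `_controlfp_s` line is meant to; it just reports whether an invalid operation happened during a call.

```cpp
#include <cassert>
#include <cfenv>
#include <cmath>

// Runs acos on x and reports whether the FE_INVALID status flag was raised.
// 'volatile' keeps the compiler from evaluating the call at compile time.
inline bool acosRaisesInvalid(double x) {
    volatile double input = x;
    std::feclearexcept(FE_ALL_EXCEPT);          // start from a clean slate
    volatile double result = std::acos(input);  // out-of-range input gives NaN
    (void)result;
    return std::fetestexcept(FE_INVALID) != 0;  // was an invalid operation flagged?
}

// Small self-check that can be called from main().
inline void demoInvalidFlag() {
    assert(acosRaisesInvalid(1.0001));  // outside [-1, 1]: flagged
    assert(!acosRaisesInvalid(0.5));    // inside the domain: not flagged
}
```

The same clear-then-test pattern can be wrapped around a suspect block of math to narrow down where the first bad value appears.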

You'll also have to disable __OGRE_HAVE_SSE in ogreplatforminformation.h and rebuild everything otherwise the sse code will trigger a break during software vertex skinning.

I've been taking a look at the camera, but I don't know that code very well, and it's rather hard to debug a problem I've never seen and can't reproduce.

Spanky
Halfling
Posts: 80
Joined: Mon Oct 07, 2002 2:45 am
Location: Ontario Canada

Post by Spanky » Tue Apr 15, 2008 4:48 pm

Don't you have to clear the mask for the exception to trigger? I tried it in a test program (never seen this before) and I had to clear the mask to get the exception to be raised.

Check the 'Hexadecimal Values' section just before midway down
http://msdn2.microsoft.com/en-us/librar ... S.80).aspx

Awesome tip though :) Should make life a lot easier for tracking those suckers down.

Shawn

sinbad
OGRE Retired Team Member
Posts: 19261
Joined: Sun Oct 06, 2002 11:19 pm
Location: Guernsey, Channel Islands

Post by sinbad » Tue Apr 15, 2008 7:28 pm

@eugen: the problem you highlighted there must be with your own code. Notice how you're breaking in the part that says 'if (mCustomProjMatrix)' - that can ONLY happen if you've called Frustum::setCustomProjectionMatrix, and at that point, Ogre isn't responsible for the contents of the projection matrix anymore, because *you've* provided it ;) Therefore mProjMatrix has garbage in it because that's what you passed to it.

bibiteinfo
Gremlin
Posts: 197
Joined: Wed Apr 12, 2006 2:48 pm
Location: Montreal, Canada

Post by bibiteinfo » Tue Apr 15, 2008 9:20 pm

Keep the assert, it saved my life many times!

eugen
OGRE Expert User
Posts: 1422
Joined: Sat May 22, 2004 5:28 am
Location: Bucharest

Post by eugen » Tue Apr 15, 2008 9:41 pm

The camera for which the error is actually raised is the camera created by Ogre to be used for a shadow texture... the name is a default one, as in "Ogre/ShadowTexture44cam". I'm not really doing anything with it (or at least that's what it seems to me).
I'll check it out more.

eugen
OGRE Expert User
Posts: 1422
Joined: Sat May 22, 2004 5:28 am
Location: Bucharest

Post by eugen » Tue Apr 15, 2008 9:42 pm

bibiteinfo wrote:Keep the assert, it saved my life many times!
I'm also against removing the assert, since here it certainly catches an error we couldn't have caught otherwise.

nullsquared
Old One
Posts: 3245
Joined: Tue Apr 24, 2007 8:23 pm
Location: NY, NY, USA

Post by nullsquared » Tue Apr 15, 2008 9:54 pm

eugen wrote:
bibiteinfo wrote:Keep the assert, it saved my life many times!
im also against removing the assert since here it certainly catches an error we couldnt have otherwise
Yeah, we need to find this issue and fix it, not cover it up. Sorry about the "removal request", I'm not really looking so much for the AABB assert to be removed as much as fix whatever bug we're all not catching ;)

sinbad
OGRE Retired Team Member
Posts: 19261
Joined: Sun Oct 06, 2002 11:19 pm
Location: Guernsey, Channel Islands

Post by sinbad » Tue Apr 15, 2008 10:07 pm

eugen wrote:The camera for which the error is actually raised is the camera created by ogre to be used for a shadow texture...the name is a default one as in "Ogre/ShadowTexture44cam", im not realy doing anything for this (or at least thats what it seems to me)
ill check it out more
Ok, then the problem is with the shadow camera setup being used, perhaps an edge case in whatever projection system it's using.

Psyk
Gnoblar
Posts: 22
Joined: Mon Oct 29, 2007 7:24 pm

Post by Psyk » Thu Apr 17, 2008 2:11 pm

Perhaps adding a note to the error message generated from this error would be useful. I'm new to Ogre (and C++ programming in general) and that error confused the hell out of me until I figured out that it was caused by a NaN/Ind/Inf. I think it would be useful for new users to add a little note saying that's probably what the problem is.

Definitely don't remove the error. I hate to think what other bugs my code would have caused if this hadn't caught them.

hoekstra_peter
Gnoblar
Posts: 1
Joined: Tue Apr 22, 2008 7:50 pm

Post by hoekstra_peter » Tue Apr 22, 2008 8:14 pm

Hi,

I got the same assert in my application. I turned on exception trapping for floating point operations. It turned out that I computed an arcsine of 1.000000348 because of a badly normalized vector. This resulted in a nan which ended up in a transformation used by Ogre. This error was difficult to reproduce, because only at very specific orientations of objects in the scene the arcsine failed.
So in my case it turned out to be a bug in my application code.

Kojack
OGRE Moderator
Posts: 7144
Joined: Sun Jan 25, 2004 7:35 am
Location: Brisbane, Australia

Post by Kojack » Tue Apr 22, 2008 11:59 pm

That would be painful to track down. If you logged the value, it would probably round off the end digits and just print 1.0000 or something, which would look fine.
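A quick sketch of that rounding effect with standard C++ streams. The helper names are made up, and a `double` is used for simplicity even though Ogre's `Real` is often a `float`.

```cpp
#include <cassert>
#include <iomanip>
#include <limits>
#include <sstream>
#include <string>

// Formats v with the stream's default precision (6 significant digits).
inline std::string formatDefault(double v) {
    std::ostringstream out;
    out << v;
    return out.str();
}

// Formats v with enough significant digits to round-trip a double.
inline std::string formatFull(double v) {
    std::ostringstream out;
    out << std::setprecision(std::numeric_limits<double>::max_digits10) << v;
    return out.str();
}

// Small self-check that can be called from main().
inline void demoLoggingPrecision() {
    const double v = 1.000000348;
    assert(formatDefault(v) == "1");        // the excess over 1.0 is rounded away
    assert(std::stod(formatFull(v)) == v);  // full precision recovers the value
}
```

So a log line written with default formatting would show a harmless looking 1 for a value that is still outside the domain of asin and acos.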

Psyk
Gnoblar
Posts: 22
Joined: Mon Oct 29, 2007 7:24 pm

Post by Psyk » Tue May 13, 2008 1:36 pm

Would it be possible to get the error message to report the name of the object that caused the error? Ultimately it's most probably caused by our own code, but finding out exactly what caused it can be a total nightmare. If it's a simple matter, this would greatly help debugging.

cdleonard
Goblin
Posts: 266
Joined: Thu May 31, 2007 9:45 am

Post by cdleonard » Tue May 13, 2008 2:22 pm

It would be nice and quite easy to catch NaN's higher up the stack.

For instance Ogre::Node::setPosition and friends could all test that their parameters don't contain strange floats. Those checks could be skipped from _functions used internally by ogre every frame in order to avoid slowing down too much. But debug mode is slow anyway.
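A rough sketch of what such a check could look like. The `Vec3` and `Node` types below are hypothetical stand-ins rather than Ogre's real classes, and the assert message is just an example.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical stand-ins for illustration; not Ogre's Vector3 or Node.
struct Vec3 { float x, y, z; };

// True when every component is an ordinary number (not NaN, +INF or -INF).
inline bool isFinite(const Vec3& v) {
    return std::isfinite(v.x) && std::isfinite(v.y) && std::isfinite(v.z);
}

class Node {
public:
    // Reject bad floats at the public entry point, where the call stack
    // still points at the code that produced them.
    void setPosition(const Vec3& p) {
        assert(isFinite(p) && "Node::setPosition: position contains NaN/INF");
        mPosition = p;
    }

    const Vec3& getPosition() const { return mPosition; }

private:
    Vec3 mPosition{0.0f, 0.0f, 0.0f};
};
```

Because `assert` compiles away when `NDEBUG` is defined, a check like this costs nothing in a typical release build.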

Frenetic
Bugbear
Posts: 806
Joined: Fri Feb 03, 2006 7:08 am

Post by Frenetic » Tue May 13, 2008 5:40 pm

cdleonard wrote:It would be nice and quite easy to catch NaN's higher up the stack.

For instance Ogre::Node::setPosition and friends could all test that their parameters don't contain strange floats.
Intriguing. This sounds like a job for assert() and std::numeric_limits!

Kojack's suggestion sounds pretty good too, but it's not cross-platform and requires a recompile of Ogre.

eugen
OGRE Expert User
Posts: 1422
Joined: Sat May 22, 2004 5:28 am
Location: Bucharest

Post by eugen » Fri May 23, 2008 1:37 am

One more update about this issue.
I encountered it in another situation, and the error seems to come from Matrix4::transformAffine(vector) where vector is 0, 0, 0. The result is a vector of IND values.
Now, the matrix on which transformAffine is called is screwed, having some good values and also some NaN values. It comes from

Code: Select all

mWorldAABB.transformAffine(_getParentNodeFullTransform());
in

Code: Select all

const AxisAlignedBox& MovableObject::getWorldBoundingBox(bool derive) const

this is the method's code:

Code: Select all

        if (derive)
        {
            mWorldAABB = this->getBoundingBox();
            mWorldAABB.transformAffine(_getParentNodeFullTransform());
        }
and the bounding box method returns good values, so _getParentNodeFullTransform() is simply returning a matrix with bad values.

I just checked some more, and it seems the Vector3 default constructor doesn't explicitly initialize x, y and z. The scale value of a node has NaN values after creation, although it is initialized in the Node constructor with the unit scale vector.
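For illustration only, here is the difference between a default constructor that leaves its members alone and one that sets them. All types are hypothetical, not Ogre's. Whether uninitialised members are the actual cause here is not established in this thread.

```cpp
#include <cassert>

// Hypothetical types for illustration; not Ogre's Vector3 or Node.

// Default constructor leaves the members uninitialised. Reading them before
// assignment is undefined behaviour, and the leftover bits may happen to
// decode as NaN.
struct RawVec3 {
    float x, y, z;
    RawVec3() {}
    RawVec3(float x_, float y_, float z_) : x(x_), y(y_), z(z_) {}
};

// Default constructor gives every member a known value.
struct SafeVec3 {
    float x, y, z;
    SafeVec3() : x(0.0f), y(0.0f), z(0.0f) {}
    SafeVec3(float x_, float y_, float z_) : x(x_), y(y_), z(z_) {}
};

// A node-like holder that sets its scale explicitly in its own constructor.
struct NodeLike {
    SafeVec3 scale;
    NodeLike() : scale(1.0f, 1.0f, 1.0f) {}
};

// Small self-check that can be called from main().
inline void demoInitialisation() {
    SafeVec3 v;                // well-defined: all zeros
    assert(v.x == 0.0f && v.y == 0.0f && v.z == 0.0f);
    NodeLike n;                // well-defined: unit scale
    assert(n.scale.x == 1.0f && n.scale.y == 1.0f && n.scale.z == 1.0f);
    RawVec3 r(1.0f, 2.0f, 3.0f);  // safe only because all members are assigned
    assert(r.y == 2.0f);
}
```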