Strange Ogre <-> Newton interference

Marc
Gremlin
Posts: 182
Joined: Tue Jan 25, 2005 7:56 am
Location: Germany

Strange Ogre <-> Newton interference

Post by Marc »

Hi !

I noticed a very strange interference when using Ogre and Newton Game Dynamics together in one application. It seems as if the call to
Root::initialise() changes the result of Newton's calculations.

I first noticed it when I tried to develop a client-server application where only the clients use Ogre for graphical output.
When I started the simulation from the same initial state and applied
the same forces at the same times, using fixed timesteps, the
simulation outcome was identical on all clients, but different from the server's.

I figured out that the simulations on the server and the clients stay in sync if I do not initialize the graphical output on the client side, or
if I initialize Ogre only up to (but not including) the call to Root::initialise().

I then investigated further, and now I've got a single app that
initializes a Newton scene, advances it a few steps and saves the final state to a file. If I initialize Ogre at the beginning of this app, I get a different file than without initialization. I can toggle the
outcome of the simulation just by commenting this one
Root::initialise() call in and out. Pretty strange, huh?
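
A minimal sketch of the kind of test described above: run a fixed-timestep simulation from a fixed initial state and reduce the final state to its raw bit pattern, so two runs can be compared for exact equality. The `Body` struct and the `step`, `simulate` and `fingerprint` functions are hypothetical stand-ins for illustration and have nothing to do with Newton's actual API.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Hypothetical one-dimensional body with unit mass (not a Newton type).
struct Body {
    float pos;
    float vel;
};

// One fixed timestep of semi-implicit Euler integration.
void step(Body& b, float force, float dt) {
    b.vel += force * dt;  // unit mass assumed, so acceleration == force
    b.pos += b.vel * dt;
}

// Same initial state, same forces at the same steps, fixed dt.
Body simulate(int steps) {
    Body b{0.0f, 0.0f};
    const float dt = 1.0f / 60.0f;
    for (int i = 0; i < steps; ++i) {
        const float force = (i < 30) ? 9.81f : 0.0f;
        step(b, force, dt);
    }
    return b;
}

// Pack the raw bits of the final state into one integer. Comparing bits
// instead of using a tolerance catches even one-ulp differences.
std::uint64_t fingerprint(const Body& b) {
    std::uint32_t p = 0, v = 0;
    std::memcpy(&p, &b.pos, sizeof p);
    std::memcpy(&v, &b.vel, sizeof v);
    return (static_cast<std::uint64_t>(p) << 32) | v;
}
```

Writing `fingerprint(simulate(n))` to a file on each machine and comparing the files is then enough to detect any divergence, however small.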

I've already posted this problem on the Newton forum (http://physicsengine.com/forum/viewtopic.php?t=1009), but since Julio is away for a few days, no one seems to have a clue there.

I have no real explanation for this effect. Any ideas? Maybe Ogre's memory manager? Maybe Ogre switches some flags of the floating-point unit when initializing itself, producing slightly different results in Newton's calculations afterwards? Maybe Newton has a problem somewhere? Maybe I missed something?

Any suggestion is welcome. :)
Marc
Gremlin
Posts: 182
Joined: Tue Jan 25, 2005 7:56 am
Location: Germany

Re: Strange Ogre <-> Newton interference

Post by Marc »

Ha! I got it!

I remembered that I once read about the initialization of DirectX. It lets you specify whether it should switch the floating-point unit (FPU) into a faster mode or not. I looked into the Ogre source, and the FPU does get switched.

Doing

#include <float.h>
:
// 655391 == 0xA001F: 24-bit precision, round-to-nearest, all FPU exceptions masked
_control87(655391, 0xffffffff);

in my application that doesn't open an Ogre window sets the FPU to the state it is in after Ogre has been initialized. Et voilà, everything stays in sync; problem solved!
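
For readers wondering what the magic number 655391 means, here is a small sketch that decodes it. The constants are the values MSVC's `<float.h>` defines for 32-bit x86, reproduced by hand so the snippet compiles on any platform; the namespace and function names are made up for this illustration.

```cpp
#include <cassert>
#include <cstdint>

// Hand-copied from MSVC's <float.h> (x86); names shortened for this sketch.
namespace msvc_x86 {
constexpr std::uint32_t EM_ALL_IEEE = 0x0000001F;  // invalid, zero-divide, overflow, underflow, inexact
constexpr std::uint32_t EM_DENORMAL = 0x00080000;  // denormal-operand exception mask
constexpr std::uint32_t MCW_RC      = 0x00000300;  // rounding-control field
constexpr std::uint32_t RC_NEAR     = 0x00000000;  // round to nearest
constexpr std::uint32_t MCW_PC      = 0x00030000;  // precision-control field
constexpr std::uint32_t PC_24       = 0x00020000;  // 24-bit mantissa (single precision)
}  // namespace msvc_x86

constexpr std::uint32_t kPostedValue = 655391;  // == 0xA001F, the value from the post

// Extract individual fields from a control value.
constexpr std::uint32_t precision_field(std::uint32_t v) { return v & msvc_x86::MCW_PC; }
constexpr std::uint32_t rounding_field(std::uint32_t v) { return v & msvc_x86::MCW_RC; }

static_assert(kPostedValue == (msvc_x86::EM_ALL_IEEE | msvc_x86::EM_DENORMAL |
                               msvc_x86::PC_24 | msvc_x86::RC_NEAR),
              "655391 is: all exceptions masked, 24-bit precision, round to nearest");
```

So the only field that differs from a typical default is the precision control. With MSVC on 32-bit x86, `_controlfp(_PC_24, _MCW_PC)` would change just that field and leave the rest untouched; Microsoft documents the precision mask as unsupported on x64.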

By the way, when using OpenGL instead of DirectX, the problem doesn't exist, because initializing OpenGL does not switch the FPU state. I didn't try that earlier ...

Perhaps this is important information for others trying to develop some Newton-Ogre/DirectX "dedicated server"-client thingy. ;)

A comment on platform independence: I'm not sure whether one can change the FPU state for a process on Linux that easily. If it is not possible, then the Linux FPU state will dictate the FPU state that has to be used in an app that has to work distributed across multiple platforms. If anyone knows how to control the FPU state on Linux, I'd like to hear it ;)
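
On the Linux question, a hedged sketch: glibc on x86 ships `<fpu_control.h>` with the `_FPU_GETCW` and `_FPU_SETCW` macros, which read and write the x87 control word. The helper names and constants below are invented for this example, and on other platforms the functions simply report that they did nothing. Two caveats: on x86-64, compilers normally do `float` and `double` arithmetic in SSE registers, which the x87 precision field does not affect (hence the `long double` in the demo), and glibc's own header notes that libm expects extended precision.

```cpp
#include <cassert>

#if defined(__linux__) && (defined(__i386__) || defined(__x86_64__))
#include <fpu_control.h>
#define HAS_X87_CONTROL_WORD 1
#else
#define HAS_X87_CONTROL_WORD 0
#endif

// Hardware encoding of the x87 precision-control field (bits 8-9).
constexpr int kX87PrecisionMask = 0x300;
constexpr int kX87Single        = 0x000;  // 24-bit mantissa
constexpr int kX87Double        = 0x200;  // 53-bit mantissa
constexpr int kX87Extended      = 0x300;  // 64-bit mantissa

// Returns the current precision field, or -1 where this sketch cannot read it.
int x87_precision_field() {
#if HAS_X87_CONTROL_WORD
    fpu_control_t cw;
    _FPU_GETCW(cw);
    return cw & kX87PrecisionMask;
#else
    return -1;
#endif
}

// Sets the precision field; returns false on platforms this sketch does not cover.
bool set_x87_precision_field(int mode) {
#if HAS_X87_CONTROL_WORD
    fpu_control_t cw;
    _FPU_GETCW(cw);
    cw = static_cast<fpu_control_t>((cw & ~kX87PrecisionMask) | (mode & kX87PrecisionMask));
    _FPU_SETCW(cw);
    return true;
#else
    (void)mode;
    return false;
#endif
}

// In 24-bit mode the x87 unit rounds 1 + 2^-30 back to exactly 1.
// volatile keeps the compiler from folding the addition at compile time.
bool tiny_addend_is_lost() {
    volatile long double one = 1.0L;
    volatile long double tiny = 1.0L / (1 << 30);
    volatile long double sum = one + tiny;
    return sum == 1.0L;
}
```

Calling `set_x87_precision_field(kX87Single)` at startup would be the rough Linux counterpart of the `_control87` call above, as far as the precision field goes.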
sinbad
OGRE Retired Team Member
Posts: 19269
Joined: Sun Oct 06, 2002 11:19 pm
Location: Guernsey, Channel Islands
x 67

Post by sinbad »

It's more that DirectX doesn't flip back and forth between double and single precision, which is supposedly faster. I guess this could be exposed as a configuration option to cope with these cases.

[edit]Yeah, this should definitely be an option, I'll add it[/edit]
:wumpus:
OGRE Retired Team Member
Posts: 3067
Joined: Tue Feb 10, 2004 12:53 pm
Location: The Netherlands
x 1

Post by :wumpus: »

Has anyone ever done any real benchmarks on this? I don't think it makes much (if any) difference on modern processors whether the FPU uses single or double precision internally.
This might just be another MS legacy thing like LP for "long pointer" :D

I can confirm this for programs that don't use DirectX, as one of my float-heavy scientific simulations had a switch between double and single precision. It became barely any faster with single precision (less than 0.05%) on my XP3000. This was under Linux, so it's certainly possible to switch your FPU mode there. (I forgot how, though.)
sinbad
OGRE Retired Team Member
Posts: 19269
Joined: Sun Oct 06, 2002 11:19 pm
Location: Guernsey, Channel Islands
x 67

Post by sinbad »

I've added a config option in Dx9 called 'Floating-point mode'. It defaults to 'Fastest', i.e. the status quo, but you can change it to 'Consistent' so that D3D will restore the FPU mode after it's done. I have no idea what the overhead was/is; the D3D docs just say 'Setting the flag will reduce Direct3D performance.', which could be anything from tiny to app-destroying. :) But at least you'll have the option.

This is backed up along with the last two days' changes I have swimming around on my machine at the moment; I should get some quality time at the keyboard this evening to get them all in.
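
For illustration, this is roughly how the option might look in an ogre.cfg once it is available. The option name and its 'Fastest' / 'Consistent' values come from the post above; the section layout and the render system name are assumptions based on the usual ogre.cfg format and may differ between Ogre versions.

```ini
Render System=Direct3D9 Rendering Subsystem

[Direct3D9 Rendering Subsystem]
Floating-point mode=Consistent
```

The same value should also be selectable from the Ogre config dialog, since that dialog lists whatever options the render system exposes.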
Kojack
OGRE Moderator
Posts: 7157
Joined: Sun Jan 25, 2004 7:35 am
Location: Brisbane, Australia
x 538

Post by Kojack »

It's the same reason that ODE compiled for doubles doesn't work that well with DirectX if you don't control the FPU manually or tell DirectX to restore the mode.

The dx9 sdk says:
Forces Direct3D to not change the floating-point unit (FPU) control word, running the pipeline using the precision of the calling thread. Without this flag, Direct3D defaults to setting the FPU to single-precision round-to-nearest mode. Using this flag with the FPU in double-precision mode will reduce Direct3D performance.
Other websites say it, and I'm 99% sure I read in earlier DX SDKs (somewhere from 6 to 8), that the D3DCREATE_FPU_PRESERVE flag caused every D3D function to store and restore the FPU state. For performance, it's probably better to do it once manually around the D3D renderer rather than having D3D do it every time any of its functions is called. But the DX9 SDK makes it sound like it works differently now.


If you do neither, every double you use in any part of your process (that doesn't set the FPU precision for itself), including in other libraries like ODE, will only be computed with single precision within the FPU.
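
The "do it once manually around the renderer" idea can be sketched with a small RAII guard built on the standard `<cfenv>` functions. This is only an illustration: `ScopedFpEnv` and the stand-in renderer function are hypothetical names, and whether `std::fenv_t` also captures the x87 precision-control bits is implementation-specific, so on Windows the `_control87` family mentioned earlier in the thread may be the more direct tool. The demo uses the rounding mode, which the standard does cover, to show the save-and-restore pattern.

```cpp
#include <cassert>
#include <cfenv>

// Saves the floating-point environment on construction and restores it
// on destruction, so the wrapped code cannot leak FPU state changes.
class ScopedFpEnv {
public:
    ScopedFpEnv() { std::fegetenv(&saved_); }
    ~ScopedFpEnv() { std::fesetenv(&saved_); }
    ScopedFpEnv(const ScopedFpEnv&) = delete;
    ScopedFpEnv& operator=(const ScopedFpEnv&) = delete;

private:
    std::fenv_t saved_;
};

// Hypothetical stand-in for a renderer call that changes FPU state.
void render_frame_that_changes_fpu() {
    std::fesetround(FE_TOWARDZERO);
}

// Runs the "renderer" inside a guard and reports the rounding mode afterwards.
int rounding_mode_after_guarded_render() {
    {
        ScopedFpEnv guard;
        render_frame_that_changes_fpu();
    }  // guard restores the saved environment here
    return std::fegetround();
}
```

Wrapping one guard around the whole per-frame render call means one save and one restore per frame, instead of one pair per API call.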