Future of Math in Ogre

Discussion area about developing with Ogre-Next (2.1, 2.2 and beyond)


User avatar
xavier
OGRE Retired Moderator
Posts: 9481
Joined: Fri Feb 18, 2005 2:03 am
Location: Dublin, CA, US
x 22

Re: Future of Math in Ogre

Post by xavier »

blunted2night wrote:You seem quite knowledgeable about the current state of the art.
Hehe, until this job I didn't care one way or another about the subject...lots of back and forth with the compiler guys taught me much more than I ever wanted to know about vectorizing compilers too. ;)
xavier wrote:Given that optimizing customer code is what I do on a daily basis for Intel
This is off topic, but are you affiliated with the LLVM project in any way?
No, but the person who is doing the LLVM vectorizer used to work for Intel on the OpenCL compiler. I've written systems that use LLVM and I follow the llvm-dev list, but I am not a direct contributor.
Do you need help? What have you tried?


Angels can fly because they take themselves lightly.
User avatar
Kojack
OGRE Moderator
Posts: 7157
Joined: Sun Jan 25, 2004 7:35 am
Location: Brisbane, Australia
x 535

Re: Future of Math in Ogre

Post by Kojack »

Optimising maths code for a Xeon Phi is going to be an interesting experience. 60 cores, each with a 512 bit vector unit and 4 hardware threads...
User avatar
xavier
OGRE Retired Moderator
Posts: 9481
Joined: Fri Feb 18, 2005 2:03 am
Location: Dublin, CA, US
x 22

Re: Future of Math in Ogre

Post by xavier »

Kojack wrote:Optimising maths code for a Xeon Phi is going to be an interesting experience. 60 cores, each with a 512 bit vector unit and 4 hardware threads...
It's actually not a whole lot different than optimizing for Xeon -- the same caveats and best practices apply, and indeed, there's not much point in running a workload on a Phi until you have it fully utilizing all of the threads and vector lanes on a normal CPU (Xeon or otherwise). Unless, of course, one likes their single-threaded scalar code to run at about 1/3 the clock speed and leave 15/16 of the core's compute power on the table. ;)

Dr. Dobb's (yes, they are still around ;)) has a bunch of good articles on Xeon Phi programming, such as http://www.drdobbs.com/parallel/cuda-vs ... /240144545. The Intel MIC software dev site is another good place to learn more: http://software.intel.com/mic-developer

But yes, it's not just a matter of recompiling for a new architecture -- there is a high premium on getting all types of parallelism out of code and data for best performance on Xeon Phi.
Do you need help? What have you tried?


Angels can fly because they take themselves lightly.
User avatar
xavier
OGRE Retired Moderator
Posts: 9481
Joined: Fri Feb 18, 2005 2:03 am
Location: Dublin, CA, US
x 22

Re: Future of Math in Ogre

Post by xavier »

So, after all that...

http://glm.g-truc.net/api-0.9.4/index.html

I haven't gone through it all to see if it's specific to OpenGL's data layout, but it seems to be what a lot of folks are asking for. And the license is rather compatible as well...
Do you need help? What have you tried?


Angels can fly because they take themselves lightly.
User avatar
Kojack
OGRE Moderator
Posts: 7157
Joined: Sun Jan 25, 2004 7:35 am
Location: Brisbane, Australia
x 535

Re: Future of Math in Ogre

Post by Kojack »

We talked a little about GLM back on page 3, plus in the roadmap thread earlier where I was skeptical of it due to:
- adds 223 heavily templated files (almost as many headers/inline files as OgreMain itself)
- American spelling conventions
- will slow down compile time due to template use (the GLM docs say this)
- not designed for performance (the GLM docs say this. They say that people should implement their own maths for performance critical code)
But it does look pretty interesting.
User avatar
xavier
OGRE Retired Moderator
Posts: 9481
Joined: Fri Feb 18, 2005 2:03 am
Location: Dublin, CA, US
x 22

Re: Future of Math in Ogre

Post by xavier »

Kojack wrote:We talked a little about GLM back on page 3
Sorry, then phpbb's search sucks more than I thought possible. ;) I didn't feel up to trawling through 5 pages of discussion, though. :)
Do you need help? What have you tried?


Angels can fly because they take themselves lightly.
User avatar
Zonder
Ogre Magi
Posts: 1172
Joined: Mon Aug 04, 2008 7:51 pm
Location: Manchester - England
x 76

Re: Future of Math in Ogre

Post by Zonder »

xavier wrote:
Sorry, then phpbb's search sucks more than I thought possible. ;)
Might need its index rebuilding
There are 10 types of people in the world: Those who understand binary, and those who don't...
User avatar
Kojack
OGRE Moderator
Posts: 7157
Joined: Sun Jan 25, 2004 7:35 am
Location: Brisbane, Australia
x 535

Re: Future of Math in Ogre

Post by Kojack »

GLM is three letters, and phpbb search usually can't handle fewer than 4 (which sucks here because graphics programming is filled with TLAs).
:)
User avatar
Klaim
Old One
Posts: 2565
Joined: Sun Sep 11, 2005 1:04 am
Location: Paris, France
x 56

Re: Future of Math in Ogre

Post by Klaim »

I started using this library in my code for the non-Ogre-related part. So far no problems; it's easy to use, but I don't do much heavy mathematical work there, mostly positioning stuff.
I think the real test of whether the syntax is good will come when I have to work on rotation interpolation with their quaternions (I don't have up and down in my game so I think I might have some fun).
PhilipLB
Google Summer of Code Student
Posts: 550
Joined: Thu Jun 04, 2009 5:07 pm
Location: Berlin
x 108

Re: Future of Math in Ogre

Post by PhilipLB »

A nice article about platform-independent SIMD math: http://www.gamedev.net/page/resources/_ ... cache=true
Google Summer of Code 2012 Student
Topic: "Volume Rendering with LOD aimed at terrain"
Project links: Project thread, WIKI page, Code fork for the project
Mentor: Mattan Furst


Volume GFX, accepting donations.
jpho
Gnoblar
Posts: 10
Joined: Fri Oct 19, 2012 5:22 pm

Re: Future of Math in Ogre

Post by jpho »

Did anyone ever evaluate Eigen? It's a great math library for all kinds of vector and matrix stuff: heavily optimized and uses SIMD, and it comes with lots of testing and an interface that is intuitive and obvious. It's open source with a permissive license and already used in lots of different projects.

http://eigen.tuxfamily.org/index.php?title=Main_Page
User avatar
Kojack
OGRE Moderator
Posts: 7157
Joined: Sun Jan 25, 2004 7:35 am
Location: Brisbane, Australia
x 535

Re: Future of Math in Ogre

Post by Kojack »

jpho wrote:Did anyone ever evaluate Eigen?
The first two pages of this thread are about eigen. :)
User avatar
dark_sylinc
OGRE Team Member
Posts: 5476
Joined: Sat Jul 21, 2007 4:55 pm
Location: Buenos Aires, Argentina
x 1358

Re: Future of Math in Ogre

Post by dark_sylinc »

I just found out about this thread and have been reading the posts.

As you all may know, the situation has changed dramatically.

Ogre 2.0 now uses our own SIMD library and an SoA arrangement. We can't reuse other libs because our SoA memory arrangement is, to my knowledge, different from any other approach seen in the industry (XXXXYYYYZZZZ instead of XYZ_XYZ_XYZ_ or three separate streams of X, Y & Z).
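To illustrate the difference (a rough sketch of the idea only, not Ogre's actual ArrayVector3 code), compare the usual array-of-structures layout with the SoA-in-blocks-of-4 arrangement described above:

Code:

// Array of Structures: the classic XYZXYZXYZ... layout.
struct Vector3AoS { float x, y, z; };
Vector3AoS positionsAoS[4];   // memory: x0 y0 z0 x1 y1 z1 x2 y2 z2 x3 y3 z3

// Structure of Arrays in blocks of 4: the XXXXYYYYZZZZ arrangement.
struct Vector3SoA4
{
    float x[4];               // x0 x1 x2 x3 -> one aligned 128-bit load
    float y[4];               // y0 y1 y2 y3
    float z[4];               // z0 z1 z2 z3
};
Vector3SoA4 positionsSoA;     // four vectors processed per SIMD instruction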

I too have been wondering whether it would be worth SIMDifying Vector3 & Quaternion; and the more I analyze it, the more it looks like a waste of time:
What used to be our major hotspots has already been fixed (using our ArrayMath library). The remaining usage of Vector3, Quaternion & Matrix4 is too scattered for SIMD to bring any improvement.

Furthermore, SIMD math libraries need a certain usage pattern/philosophy. Often it's faster to do myVector += SimdVector3::UNIT_Y * 3.0f; than myVector.y += 3.0f. But Ogre wasn't designed with this in mind, and has direct member access all over its code base.

For example, the following bit of code is excruciatingly slow (I'm assuming SSE2):

Code:

myVector = a + b;
myVector.y += 3.0f;
myVector = myVector * c;
What's the problem? xyzw (w is unused) is loaded into an xmm register using movaps. Same with 'a' & 'b'. This translates to a few movaps, then an addps instruction.
When myVector.y is accessed, we need to extract the Y component. If we're extremely smart (with some help from the compiler), this may get translated to a shufps instruction (at the cost of extra register pressure). Otherwise, the compiler will translate the code to:

Code:

movaps [tmpMemory], xmm0   // spill the whole 128-bit vector to memory
movss xmm0, [tmpMemory+4]  // reload just the 32-bit y component
Reading memory that you've just written relies on "store to load forwarding". Basically, the CPU knows this memory transfer hasn't actually happened yet (let's remember CPUs are pipelined), so it looks in its store pipe and takes the value from there, rather than getting it from the cache or main RAM. It's a huge performance optimization... that works when you store and load the same amount of memory.
In this case, we're storing with a 128-bit memory move and reading back with a 32-bit load. As Fabian Giesen said in his blog, the latest Intel architectures are able to handle this situation gracefully. So let's assume there is no performance hit (even though this is barely true).

However, then we do the addition. And right after that, we perform a multiplication in SIMD form again (myVector * c); thus the assembly will look like this:

Code:

addss xmm0, 3              //.y += 3
movss [tmpMemory+4], xmm0  // store the 32-bit y back to memory
movaps xmm0, [tmpMemory]   // 128-bit reload of what was just partially written
mulps xmm0, xmm2           //assuming xmm2 contains c.
There's no chance store to load forwarding will work in this scenario. We're loading a 128-bit value right after storing a 32-bit one. The pipeline will stall. Any performance benefit you hoped to gain from using SIMD goes down the drain.
This is the reason you'll see some SIMD math libraries make their __m128 variables protected rather than publicly accessible. So whenever you want to access x, y or z, you have to call getScalar().x() or similar. And if you think that looks ugly, it is ugly.
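As a rough sketch (made-up wrapper names, assuming SSE2; not the actual Ogre classes), here is the same snippet written in the "stay in SIMD registers" style, where y is never touched as an individual float:

Code:

#include <xmmintrin.h>

// Hypothetical SSE wrapper; illustrative only.
struct SimdVector3
{
    __m128 m;   // x, y, z (and an unused w) kept packed; no public float members

    // Equivalent of SimdVector3::UNIT_Y from the example above.
    static SimdVector3 unitY() { return SimdVector3{ _mm_set_ps( 0.0f, 0.0f, 1.0f, 0.0f ) }; }
};

inline SimdVector3 operator+( SimdVector3 a, SimdVector3 b ) { return SimdVector3{ _mm_add_ps( a.m, b.m ) }; }
inline SimdVector3 operator*( SimdVector3 a, SimdVector3 b ) { return SimdVector3{ _mm_mul_ps( a.m, b.m ) }; }
inline SimdVector3 operator*( SimdVector3 a, float s )       { return SimdVector3{ _mm_mul_ps( a.m, _mm_set1_ps( s ) ) }; }

// myVector = a + b; myVector.y += 3.0f; myVector = myVector * c;
// expressed without ever extracting a component:
SimdVector3 computeExample( SimdVector3 a, SimdVector3 b, SimdVector3 c )
{
    SimdVector3 v = a + b;
    v = v + SimdVector3::unitY() * 3.0f;  // adds 3 to y only, staying in xmm registers
    return v * c;
}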

So, when refactoring Vector3/Quaternion to use SIMD, one has to keep stuff like this in mind (we would need to refactor its usage all over the Ogre code base too). And IMHO it's not worth it.
Still, it may be nice to have another SIMD implementation so that it can start deprecating the old one. And for Vector3, I still made a couple of modifications so that it uses maxss & minss instructions whenever possible.
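For illustration, a minimal sketch (my own, assuming SSE intrinsics are available; not the actual Vector3 code) of a scalar min written so the compiler emits minss instead of a compare-and-branch:

Code:

#include <xmmintrin.h>

// Branch-free scalar min; compiles down to a single minss on SSE targets.
inline float branchlessMin( float a, float b )
{
    return _mm_cvtss_f32( _mm_min_ss( _mm_set_ss( a ), _mm_set_ss( b ) ) );
}

// makeFloor-style usage: v.x = branchlessMin( v.x, other.x ); etc.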

There's one big exception I think that may be worth the trouble: Matrix4.
The RenderQueue (or the AutoParams class) performs matrix concatenations very often to send data to the vertex & pixel shaders (world-view matrix, world-view-proj matrix, etc.). The more Entities you have, the more concatenations. Do you use shadows? Then even more concatenations.
Matrices rarely have their individual components accessed directly (and when they do, it's often unavoidable), so they are a prime candidate for SIMDification (I made up that word).
Furthermore, Matrix4 concatenations almost never involve the same matrix on both sides (i.e. mat = mat * mat), which is the reason I put RESTRICT_ALIAS in ArrayMatrix4's concatenation code (plus an assert to check this never happens in debug builds).
If you've ever written matrix concatenation code in assembly, you'll know there's a lot to gain from knowing the pointers don't alias (otherwise you're forced to copy the entire matrix into a temporary memory region).
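As a rough sketch of the idea (illustrative signatures only; ArrayMatrix4's real code is SoA and SSE-based, and the macro is assumed here to expand to the compiler's __restrict), the no-aliasing promise looks like this:

Code:

#include <cassert>

#define RESTRICT_ALIAS __restrict   // assumption for this sketch

// 4x4 row-major concatenation; a, b and out must all point to different matrices.
void concatenate( const float * RESTRICT_ALIAS a,
                  const float * RESTRICT_ALIAS b,
                  float * RESTRICT_ALIAS out )
{
    assert( a != b && a != out && b != out );   // the "never against itself" guarantee
    for( int i = 0; i < 4; ++i )
        for( int j = 0; j < 4; ++j )
            out[i * 4 + j] = a[i * 4 + 0] * b[0 * 4 + j] +
                             a[i * 4 + 1] * b[1 * 4 + j] +
                             a[i * 4 + 2] * b[2 * 4 + j] +
                             a[i * 4 + 3] * b[3 * 4 + j];
}

With the restrict promise, the compiler can keep rows of both operands in registers instead of reloading them after every store to out.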

I think making Matrix4 SIMD & RESTRICT_ALIAS is worth a shot. Furthermore, we can then investigate using 4 movntps instructions to move matrices, which is very fast (I already do that to move the matrices from the Instancing implementations to GPU buffers and noticed a saving of a few milliseconds; though I may need to revisit this later, as D3D9 does not guarantee the memory is 16-byte aligned, but D3D11 does, and GL 2.1 does too if GL_ARB_map_buffer_alignment is present).
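For reference, a minimal sketch (my own, assuming both pointers are 16-byte aligned, as D3D11 guarantees) of moving a 4x4 matrix with four non-temporal stores; _mm_stream_ps is the intrinsic behind movntps:

Code:

#include <xmmintrin.h>

// Copies one 4x4 float matrix into a write-combined/GPU-mapped buffer using
// non-temporal stores, bypassing the cache. Both pointers must be 16-byte aligned.
inline void streamMatrix4( const float *src, float *dst )
{
    _mm_stream_ps( dst +  0, _mm_load_ps( src +  0 ) );
    _mm_stream_ps( dst +  4, _mm_load_ps( src +  4 ) );
    _mm_stream_ps( dst +  8, _mm_load_ps( src +  8 ) );
    _mm_stream_ps( dst + 12, _mm_load_ps( src + 12 ) );
}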

While I was talking from an x86/x64 perspective, this is still very important on other platforms (e.g. ARM, PPC), since they usually don't even have store to load forwarding; they just stall.
Owen
Google Summer of Code Student
Posts: 91
Joined: Mon May 01, 2006 11:36 am
x 21

Re: Future of Math in Ogre

Post by Owen »

Surely aliasing should be irrelevant for the operator* case, being as a temporary is implied?

Now, operator *=, on the other hand...
User avatar
dark_sylinc
OGRE Team Member
Posts: 5476
Joined: Sat Jul 21, 2007 4:55 pm
Location: Buenos Aires, Argentina
x 1358

Re: Future of Math in Ogre

Post by dark_sylinc »

Owen wrote:Surely aliasing should be irrelevant for the operator* case, being as a temporary is implied?

Now, operator *=, on the other hand...
Yes. However, take a closer look at how we override operator*:

The compiler can't know whether there is aliasing here:

Code:

inline Matrix4 operator * ( const Matrix4 &m2 ) const
{
    return concatenate( m2 );
}
Which is like saying "this *= m2; return *this;".
The aliasing assumption inside concatenate is unnecessary & redundant.

And yes, we could gain some extra performance by changing many "*" operators in AutoParamDataSource to "*=" because of that implied temporary (I don't know whether the compiler's optimizer can remove that temporary; it's quite tricky).
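To make the difference concrete, a hedged sketch (not the real Matrix4 code, assuming a row-major float m[4][4] member) of an in-place operator*= that only needs a 4-float scratch row instead of the full temporary matrix that operator* implies, valid as long as the operand is not *this:

Code:

Matrix4& Matrix4::operator*=( const Matrix4 &m2 )
{
    assert( this != &m2 );             // no self-concatenation, per the RESTRICT_ALIAS assumption
    for( int i = 0; i < 4; ++i )
    {
        float row[4];                  // scratch row; the full-matrix temporary is avoided
        for( int j = 0; j < 4; ++j )
            row[j] = m[i][0] * m2.m[0][j] + m[i][1] * m2.m[1][j] +
                     m[i][2] * m2.m[2][j] + m[i][3] * m2.m[3][j];
        for( int j = 0; j < 4; ++j )
            m[i][j] = row[j];
    }
    return *this;
}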
User avatar
Klaim
Old One
Posts: 2565
Joined: Sun Sep 11, 2005 1:04 am
Location: Paris, France
x 56

Re: Future of Math in Ogre

Post by Klaim »

SORRY I EDITED THIS MESSAGE INSTEAD OF ADDING ANOTHER ONE, I DON'T SEE ANY WAY TO GET BACK THE HISTORY D:

//Mod: Sorry, other than going through the database backups, I don't think there is.
Last edited by Klaim on Tue Nov 12, 2013 5:22 pm, edited 2 times in total.
User avatar
lunkhound
Gremlin
Posts: 169
Joined: Sun Apr 29, 2012 1:03 am
Location: Santa Monica, California
x 19

Re: Future of Math in Ogre

Post by lunkhound »

Klaim wrote: 1. GLM's types being designed to match GLSL makes it very annoying or unclear what the purpose of each data object is. For example, vectors have x, y, z, w, a, r, g, b members and some others, and there is no simple way to make a compile-time distinction between colors and vectors.
That sounds really bad. Actually one thing I wish the Ogre math had is distinct types for "Point3" and "Vector3". I've found it very useful with other math libraries. You can set up sensible operators so that you can't add two Point3's together, but you can add a Point3 and a Vector3 to get another Point3. And the difference of two Point3s is a Vector3.
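A minimal sketch of that idea (type and operator names are mine, not taken from any particular library):

Code:

struct Vector3 { float x, y, z; };   // a displacement / direction
struct Point3  { float x, y, z; };   // a position in space

// Point + Vector -> Point (translate a position)
inline Point3 operator+( const Point3 &p, const Vector3 &v )
{ return Point3{ p.x + v.x, p.y + v.y, p.z + v.z }; }

// Point - Point -> Vector (the displacement between two positions)
inline Vector3 operator-( const Point3 &a, const Point3 &b )
{ return Vector3{ a.x - b.x, a.y - b.y, a.z - b.z }; }

// Point + Point is deliberately not defined, so adding two positions
// is a compile-time error instead of a silent bug.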
User avatar
Klaim
Old One
Posts: 2565
Joined: Sun Sep 11, 2005 1:04 am
Location: Paris, France
x 56

Re: Future of Math in Ogre

Post by Klaim »

lunkhound wrote: That sounds really bad. Actually one thing I wish the Ogre math had is distinct types for "Point3" and "Vector3". I've found it very useful with other math libraries. You can set up sensible operators so that you can't add two Point3's together, but you can add a Point3 and a Vector3 to get another Point3. And the difference of two Point3s is a Vector3.
Nice! I haven't tried a library with this in practice, but I think transforming a Point using Vectors is basically like in Ogre when we transform a Vector using Matrices or Quaternions. Thanks for pointing this out; it might be interesting to try in some of my work.
User avatar
Klaim
Old One
Posts: 2565
Joined: Sun Sep 11, 2005 1:04 am
Location: Paris, France
x 56

Re: Future of Math in Ogre

Post by Klaim »

PS: if someone can revert the edit history of messages, please fix my previous message...
Right now I'm considering extracting Ogre's maths library for my specific purpose, but I'm not sure exactly how I will proceed yet.
So far I have tried several approaches:

1. extract the code I need only, remove everything else, reorganize to match my code's guidelines
2. extract all the math-related or math-inter-dependent code, put it in a separate library, set it up to match my code's guidelines
3. extract all the math-related or math-inter-dependent code, only rename the namespace

All three fail because of strong inter-dependencies (with the allocators, for example), that is, if you want to keep all the algorithms used in Ogre.
There are also a lot of Ogre-specific macros, which make the extraction harder.
The aggressive inlining doesn't help with separating out some types like the angles, but without a module system in the language, well, I guess there is no other way...

So now I'm considering just using Ogre as a dependency on the server side too. Or going back to using GLM (I did the experiments in an experimental branch).