Future of Math in Ogre
-
Kojack
- OGRE Moderator

- Posts: 7157
- Joined: Sun Jan 25, 2004 7:35 am
- Location: Brisbane, Australia
- x 538
Re: Future of Math in Ogre
Even including Eigen/Core isn't enough.
That gives you matrix and vector math, but if you want to find the inverse of a matrix you need to include Eigen/LU (which includes 11 more headers).
For matrix identity and quaternions you need Eigen/Geometry (includes another 18 headers).
I found that out because I've just ported Ogre over to using Eigen's 4x4 Matrix instead of Ogre's. It's compiling now.
Matrix4 is a good one to start with because its internal data was protected (no direct member access to fix). I replaced the internal float array with an Eigen matrix, then changed all the Ogre Matrix4 code to use Eigen methods instead. From the outside the interface is still Ogre's. It's not the best way to go, but it's quicker than rewriting huge chunks of Ogre. Plus I mainly want to test compile time when all this Eigen stuff is included.
Matrix4 isn't used a lot in Ogre, so I doubt it will affect framerate much.
Assuming it even runs.
(Yay, ogremain built ok, now for the rest of the projects...)
Edit: Everything compiles fine, but it asserts when run because of unaligned matrices (currently the D3D9RenderSystem constructor is hitting the Eigen assert, possibly because of the view matrix member that class contains). Using the Eigen macro to align classes that contain Eigen types isn't working.
-
lunkhound
- Gremlin
- Posts: 169
- Joined: Sun Apr 29, 2012 1:03 am
- Location: Santa Monica, California
- x 19
Re: Future of Math in Ogre
Kojack wrote:I found that out because I've just ported Ogre over to using Eigen's 4x4 Matrix instead of Ogre's. It's compiling now.
(Yay, ogremain built ok, now for the rest of the projects...)
Very cool!
Eigen is interesting because it is the only library mentioned so far (I think?) that supports SSE, AltiVec and NEON. What's also nice is that all of the SIMD-specific code (for the most part) is separated out into a single header file for each type of SIMD called "PacketMath.h", under Eigen/src/Core/arch/*/ (replace * with SSE, AltiVec, or NEON). If nothing else, these headers are an interesting resource; they have some useful-looking comments about the various SIMD architectures.
-
Kojack
- OGRE Moderator

- Posts: 7157
- Joined: Sun Jan 25, 2004 7:35 am
- Location: Brisbane, Australia
- x 538
Re: Future of Math in Ogre
Damn alignment.
It seems it's not just the class that contains an Eigen type that needs a special macro (which replaces all the new/delete methods), but every class that contains a class that contains an Eigen type needs it too. I've just had it assert again because OverlayElements contain a Matrix4.
I might be able to fix most of them by changing the Ogre allocators (Ogre and Eigen are conflicting on allocators).
The annoying thing is that since I need to edit a header to fix each assert, each one requires a 20-minute compile to see if it worked. Then another class throws an assert, so I need to paste one line of code and do another 20-minute compile...
I'm going to do something less painful for a while.
-
masterfalcon
- OGRE Retired Team Member

- Posts: 4270
- Joined: Sun Feb 25, 2007 4:56 am
- Location: Bloomington, MN
- x 126
Re: Future of Math in Ogre
I've been making some decent progress porting over to bullet. I almost have OgreMain compiling. It seems that our math library includes many more functions than theirs, so there would be some work in adding them and then optimizing. It also appears that they use column-major rather than row-major matrices (I may be reading this wrong, the screen is starting to get blurry).
-
lunkhound
- Gremlin
- Posts: 169
- Joined: Sun Apr 29, 2012 1:03 am
- Location: Santa Monica, California
- x 19
Re: Future of Math in Ogre
masterfalcon wrote:I've been making some decent progress porting over to bullet. Almost have OgreMain compiling. It seems that our math library includes many more functions than theirs. So there would be some work in adding them and then optimizing. It also appears that they use column major rather than row major matrices (I may be reading this wrong, the screen is starting to get blurry).
Nice!
Which of the bullet maths are you porting to? Is it the one in src/linearMath or the one in src/vectormath?
-
masterfalcon
- OGRE Retired Team Member

- Posts: 4270
- Joined: Sun Feb 25, 2007 4:56 am
- Location: Bloomington, MN
- x 126
Re: Future of Math in Ogre
Vectormath
-
lunkhound
- Gremlin
- Posts: 169
- Joined: Sun Apr 29, 2012 1:03 am
- Location: Santa Monica, California
- x 19
Re: Future of Math in Ogre
masterfalcon wrote:Vectormath
Yay! I like the API on that one. That one is from Sony, and it's very much like the one used by them internally. And it looks like someone is planning to add NEON support to it as well, though I didn't see any actual NEON intrinsics in the code under the NEON folder.
-
boyamer
- Orc
- Posts: 459
- Joined: Sat Jan 24, 2009 11:16 am
- Location: Italy
- x 6
Re: Future of Math in Ogre
Just to make sure you don't miss them: DirectXMath and XnaMath together have support for SSE, NEON, and Xbox 360 math.
In the alimer engine we've integrated SIMD support based on them.
-
Kojack
- OGRE Moderator

- Posts: 7157
- Joined: Sun Jan 25, 2004 7:35 am
- Location: Brisbane, Australia
- x 538
Re: Future of Math in Ogre
I just found on Google that Gamekit (the merging of Ogre, Bullet and Blender, started by Bullet's Erwin) already has Ogre 1.9 with DirectXMath.
http://code.google.com/p/gamekit/source ... r=1213
They are using it for software skinning and stuff.
How cross-platform is DirectXMath though? It runs on ARM/NEON, but they say that's for Windows RT. Does it work with GCC?
-
boyamer
- Orc
- Posts: 459
- Joined: Sat Jan 24, 2009 11:16 am
- Location: Italy
- x 6
Re: Future of Math in Ogre
Well, I contacted the DirectX team and they say that the library works well on Windows platforms, the latest Windows Phone 8, and Xbox 360. I think with some tweaking it would be possible to integrate it into Ogre; we would probably need to remove the usage of SAL annotations.
-
Wolfmanfx
- OGRE Team Member

- Posts: 1525
- Joined: Fri Feb 03, 2006 10:37 pm
- Location: Austria - Leoben
- x 100
Re: Future of Math in Ogre
The DX math code is sponsored by Microsoft - they added it with the WP8 patch, so the gamekit guys did not add that code themselves.
-
lunkhound
- Gremlin
- Posts: 169
- Joined: Sun Apr 29, 2012 1:03 am
- Location: Santa Monica, California
- x 19
Re: Future of Math in Ogre
From what I can see it's not open source.
XNAMath.h -- SIMD C++ Math library for Windows and Xbox 360
Copyright (c) Microsoft Corp. All rights reserved.
-
Klaim
- Old One
- Posts: 2565
- Joined: Sun Sep 11, 2005 1:04 am
- Location: Paris, France
- x 56
Re: Future of Math in Ogre
[very anecdotal, or relevant only in a far future]
Just wanted to point this out: http://www.ogre3d.org/forums/viewtopic.php?f=1&t=75966
[/very anecdotal, or relevant only in a far future]
-
xavier
- OGRE Retired Moderator

- Posts: 9481
- Joined: Fri Feb 18, 2005 2:03 am
- Location: Dublin, CA, US
- x 22
Re: Future of Math in Ogre
What is so difficult about just pulling the Ogre math classes out into a separate library? They currently "just work" and obviously the licensing is compatible...
And no I have not read the whole topic(s), but it seems from the first and last pages that this one stalled on the topic of "eigen".
Furthermore, what's the point? If an application or engine is using Ogre and it is known it will always use Ogre, then just use the math classes that Ogre provides. If there is a desire to abstract any details away from the application level, then there are any number of programming techniques to prevent the Ogre:: namespace from percolating up the dependency stack. Yes, they take more effort on the part of the application architect and programmers, but anything worth doing is worth doing right...
FWIW this topic ("a one-true-math-library to be used by all") comes up regularly on many other game-development mailing lists and message boards and nothing comes of it because middleware by necessity must provide its own low-level code to be useful. In any game engine out there (including Unreal, etc.), the "problem" of every piece of middleware having their own math library is handled by defining "opaque" interfaces and handling the translations behind the scenes. In reality, there is little performance penalty from this solution as inner-loop code (in other words, hotspots) always runs with the library's native, optimized, low-level math classes. If you are smart (or lucky), your application-level math data structures will line up with those used by one or more pieces of middleware and you can just cast back and forth rather than copy when crossing interface boundaries.
Again, I do not profess to know what the problem is here for which solutions are being presented; even so, I'm not sure that it is (a) worth spending this many pages on, or (b) solvable in any widely-accepted practical fashion.
-
Klaim
- Old One
- Posts: 2565
- Joined: Sun Sep 11, 2005 1:04 am
- Location: Paris, France
- x 56
Re: Future of Math in Ogre
xavier wrote:What is so difficult about just pulling the Ogre math classes out into a separate library? They currently "just work" and obviously the licensing is compatible...
As pointed out in the first posts of this thread, there has been another discussion where "pulling the Ogre math classes out into a separate library" was tried and has (so far) failed: http://www.ogre3d.org/forums/viewtopic.php?f=4&t=73101
xavier wrote:Furthermore, what's the point? If an application or engine is using Ogre and it is known it will always use Ogre, then just use the math classes that Ogre provides.
In my current project, sure, I use Ogre math constructs in most graphics-related code. But there is an isolated library (dll/so) that needs to use math constructs but should never link with Ogre (for specific design reasons). There is no reason for it to link with Ogre anyway, but the data it provides will still be graphically interpreted at some point, so there will be tons of conversions from custom math constructs to Ogre math constructs.
Obviously, if there were an easy way to move Ogre code around, that would be cool, but apparently it's not as easy as it sounds.
-
Kojack
- OGRE Moderator

- Posts: 7157
- Joined: Sun Jan 25, 2004 7:35 am
- Location: Brisbane, Australia
- x 538
Re: Future of Math in Ogre
If we reduce dependencies (as I mentioned in the other thread, just compiling OgreQuaternion.cpp pulls in over 2000 headers; a lot are the same due to recursive dependencies, but they are still there) and change the math code to be header-only (like Eigen), then that should fix most of the problems. No extra library (code will be inlined, no export problems), and it is easy to use in other projects.
At least as a first step. That gives other projects something easy to use with ogre compatibility, without drastically changing anything here. Perhaps there's some performance critical stuff in ogre that could use a small subset of eigen locally, such as for software skinning or something. We should really be profiling for that instead of just guessing though.
-
xavier
- OGRE Retired Moderator

- Posts: 9481
- Joined: Fri Feb 18, 2005 2:03 am
- Location: Dublin, CA, US
- x 22
Re: Future of Math in Ogre
Kojack wrote:as I mentioned in the other thread, just compiling ogrequaternion.cpp pulls in over 2000 headers.
On this specific topic:
Most of the dependencies pulled in are due to inverted dependency problems -- lower-level code like math classes should *always* be on the bottom and never depend on anything else but themselves. So for example, the whole serialization thing (which is probably for XML and/or script use?) should be farther up the dependency chain and the bits that need math serialization should depend on that instead. I saw from Kojack's post that Quaternion depends on Camera -- this is just poor design a long time ago, but this sort of thing should be straightforward to fix if Frustum is considered an aggregate mathematical construct (it's just N planes, correct?). I haven't looked at the rest but I suspect the rest are likely of the same sort.
Most of any refactoring work in any large software project revolves around fixing dependency issues like these. Agile methods suggest constant refactoring as a core feature, and they do it for a reason.
Most of the problems I've run into with Ogre code in the past are due to these sorts of dependency problems that were never fixed (yes, I am looking at you, Entity bone assignment initialization). If these problems are fixed (and this is not a fast process; expect it to take several months to a couple of years to get it right), then based on my experience you'll find that adding new (or currently requested) features later becomes almost startlingly easy. In addition to being able to extract the math code into its own library, of course (to get back on topic).
-
Mikachu
- Gnoll
- Posts: 603
- Joined: Thu Jul 28, 2005 4:11 pm
- Location: Nice, France
- x 35
Re: Future of Math in Ogre
xavier wrote:Kojack wrote:as I mentioned in the other thread, just compiling ogrequaternion.cpp pulls in over 2000 headers.
On this specific topic: Most of the dependencies pulled in are due to inverted dependency problems...
Still on this specific topic: actually, OgreQuaternion.cpp only depends on a few math headers; the massive header pulling only comes from precompiled headers (see OgreStableHeaders.h).
Unfortunately, precompiled headers are rather monolithic, so no refactoring could change that (apart from putting the maths in its own lib).
OgreProcedural - Procedural Geometry for Ogre3D
-
xavier
- OGRE Retired Moderator

- Posts: 9481
- Joined: Fri Feb 18, 2005 2:03 am
- Location: Dublin, CA, US
- x 22
Re: Future of Math in Ogre
Mikachu wrote:Actually, OgreQuaternion.cpp only depends on a few math headers, the massive header pulling only comes from precompiled headers (see OgreStableHeaders.h).
Any compiler will give you the header tree for a source file being compiled; I assume this is where Kojack got the 2000+ number. This is independent of the use of PCH -- if you include header X, which includes header Y, and so on, PCH or not, there is a dependency chain, so this is still a physical dependency problem that needs to be disentangled.
In fact, if the dependency problem in Ogre is cleaned up and the refactoring done properly, there is little reason to use PCH, since PCH is often used only when header bloat becomes a compile-time (in seconds/minutes) issue.
-
blunted2night
- Gnoblar
- Posts: 3
- Joined: Tue Mar 18, 2008 5:34 am
Re: Future of Math in Ogre
Forgive me if I am way off base, but is it unreasonable to expect users to have a "latest generation" compiler if they want a high-performance "latest generation" experience from their code? I say this because Visual C++ 2010 and above will, with the right compiler settings, automatically use SSE instructions where appropriate.
As an example, I created a simple vector class, and a sample to use it, then did an optimized build:
Then I started it up in the debugger and looked at the disassembly.
This code was generated by VS2012.1, but I first discovered this pattern produced SSE code when I wrote a more comprehensive library in VS2010.
I haven't checked in a while, but a while back there was a lot of buzz about auto-vectorization in the llvm forums, and I believe recent versions of GCC also support something like this.
Code: Select all
#include <iostream>
template <typename type_t, size_t Rank>
struct vector
{
typedef vector <type_t, Rank> vector_t;
type_t Data [Rank];
};
template <typename type_t, size_t Rank, typename operation_t>
vector <type_t, Rank> binary_operation (vector <type_t, Rank> const & l, vector <type_t, Rank> const & r, operation_t Operation)
{
vector <type_t, Rank> v;
for (size_t i = 0; i < Rank; ++i)
v.Data [i] = Operation (l.Data [i], r.Data [i]);
return v;
}
template <typename type_t> type_t add (type_t l, type_t r) { return l+r; }
template <typename type_t> type_t sub (type_t l, type_t r) { return l-r; }
template <typename type_t> type_t mul (type_t l, type_t r) { return l*r; }
template <typename type_t> type_t div (type_t l, type_t r) { return l/r; }
template <typename type_t> type_t mod (type_t l, type_t r) { return l%r; }
template <typename type_t, size_t Rank> vector <type_t, Rank> operator + (vector <type_t, Rank> const & l, vector <type_t, Rank> const & r) { return binary_operation (l, r, add <type_t>); }
template <typename type_t, size_t Rank> vector <type_t, Rank> operator - (vector <type_t, Rank> const & l, vector <type_t, Rank> const & r) { return binary_operation (l, r, sub <type_t>); }
template <typename type_t, size_t Rank> vector <type_t, Rank> operator * (vector <type_t, Rank> const & l, vector <type_t, Rank> const & r) { return binary_operation (l, r, mul <type_t>); }
template <typename type_t, size_t Rank> vector <type_t, Rank> operator / (vector <type_t, Rank> const & l, vector <type_t, Rank> const & r) { return binary_operation (l, r, div <type_t>); }
template <typename type_t, size_t Rank> vector <type_t, Rank> operator % (vector <type_t, Rank> const & l, vector <type_t, Rank> const & r) { return binary_operation (l, r, mod <type_t>); }
int main(int argc, char * argv[])
{
typedef vector <float, 4> float4;
auto input = [] () -> float4 {
float4 v;
for (int i = 0; i < 4; ++i)
std::cin >> v.Data [i];
return v;
};
auto output = [] (float4 v) {
for (int i = 0; i < 4; ++i)
std::cout << v.Data [i];
};
auto a = input ();
auto b = input ();
auto c = input ();
auto d = input ();
auto r = (a+b)/c+d;
output (r);
return 0;
}
Code: Select all
int main(int argc, char * argv[])
{
00BE1270 push ebp
00BE1271 mov ebp,esp
00BE1273 and esp,0FFFFFFF0h
00BE1276 sub esp,68h
00BE1279 mov eax,dword ptr ds:[00BE4018h]
00BE127E xor eax,esp
00BE1280 mov dword ptr [esp+64h],eax
00BE1284 push esi
00BE1285 push edi
typedef vector <float, 4> float4;
auto input = [] () -> float4 {
float4 v;
for (int i = 0; i < 4; ++i)
std::cin >> v.Data [i];
return v;
};
auto output = [] (float4 v) {
for (int i = 0; i < 4; ++i)
std::cout << v.Data [i];
};
auto a = input ();
00BE1286 lea esi,[esp+60h]
00BE128A mov edi,4
00BE128F nop
00BE1290 mov ecx,dword ptr ds:[0BE303Ch]
00BE1296 push esi
00BE1297 call dword ptr ds:[0BE3028h]
00BE129D add esi,4
typedef vector <float, 4> float4;
auto input = [] () -> float4 {
float4 v;
for (int i = 0; i < 4; ++i)
std::cin >> v.Data [i];
return v;
};
auto output = [] (float4 v) {
for (int i = 0; i < 4; ++i)
std::cout << v.Data [i];
};
auto a = input ();
00BE12A0 dec edi
00BE12A1 jne main+20h (0BE1290h)
auto b = input ();
00BE12A3 lea esi,[esp+50h]
00BE12A7 mov edi,4
00BE12AC lea esp,[esp]
00BE12B0 mov ecx,dword ptr ds:[0BE303Ch]
00BE12B6 push esi
00BE12B7 call dword ptr ds:[0BE3028h]
00BE12BD add esi,4
00BE12C0 dec edi
00BE12C1 jne main+40h (0BE12B0h)
auto c = input ();
00BE12C3 lea esi,[esp+30h]
00BE12C7 mov edi,4
00BE12CC lea esp,[esp]
00BE12D0 mov ecx,dword ptr ds:[0BE303Ch]
00BE12D6 push esi
00BE12D7 call dword ptr ds:[0BE3028h]
00BE12DD add esi,4
00BE12E0 dec edi
00BE12E1 jne main+60h (0BE12D0h)
auto d = input ();
00BE12E3 lea esi,[esp+40h]
00BE12E7 mov edi,4
00BE12EC lea esp,[esp]
00BE12F0 mov ecx,dword ptr ds:[0BE303Ch]
00BE12F6 push esi
00BE12F7 call dword ptr ds:[0BE3028h]
00BE12FD add esi,4
00BE1300 dec edi
00BE1301 jne main+80h (0BE12F0h)
auto r = (a+b)/c+d;
00BE1303 vmovups xmm0,xmmword ptr [esp+5Ch]
00BE1309 vaddps xmm0,xmm0,xmmword ptr [esp+4Ch]
00BE130F vdivps xmm0,xmm0,xmmword ptr [esp+2Ch]
00BE1315 vaddps xmm0,xmm0,xmmword ptr [esp+3Ch]
00BE131B vmovaps xmmword ptr [esp+10h],xmm0
output (r);
00BE1321 vmovdqa xmm0,xmmword ptr [esp+10h]
00BE1327 vmovdqa xmmword ptr [esp+10h],xmm0
00BE132D xor esi,esi
00BE132F nop
00BE1330 vmovss xmm0,dword ptr [esp+esi*4+10h]
00BE1336 push ecx
00BE1337 mov ecx,dword ptr ds:[0BE3038h]
00BE133D vmovss dword ptr [esp],xmm0
00BE1342 call dword ptr ds:[0BE3024h]
00BE1348 inc esi
00BE1349 cmp esi,4
00BE134C jl main+0C0h (0BE1330h)
return 0;
}
00BE134E mov ecx,dword ptr [esp+6Ch]
00BE1352 pop edi
00BE1353 pop esi
00BE1354 xor ecx,esp
00BE1356 xor eax,eax
00BE1358 call __security_check_cookie (0BE17ADh)
00BE135D mov esp,ebp
00BE135F pop ebp
00BE1360 ret
-
masterfalcon
- OGRE Retired Team Member

- Posts: 4270
- Joined: Sun Feb 25, 2007 4:56 am
- Location: Bloomington, MN
- x 126
Re: Future of Math in Ogre
The problem is that you can't rely on the quality of the code generated by the compiler. Or the consistency between compilers. So either using straight assembly or intrinsics is the best way to go.
-
xavier
- OGRE Retired Moderator

- Posts: 9481
- Joined: Fri Feb 18, 2005 2:03 am
- Location: Dublin, CA, US
- x 22
Re: Future of Math in Ogre
masterfalcon wrote:...is the best way to go.
...to ensure optimal code generation for a given architecture.
@blunted2night:
Auto-vectorizing compilers (which is to say, pretty much all of the big three for sure) do the best they can, but they have to adhere to conservative guidelines about what can and cannot be vectorized. And often seemingly unrelated code changes will stop code from vectorizing that had previously done so, so relying on auto-vectorization is a hit-and-miss proposition. Furthermore, compilers that support language pragmas to direct vectorization of blocks or loops, can be made to generate less-efficient code by misuse or improper application of these pragmas.
Finally, there are things/tricks one can do with intrinsics, that can't be done with auto-vectorization. For instance, one of the first things I ever did with Xeon Phi (formerly Knights Corner, formerly Larrabee) back in the Larrabee days was to implement a 4-weight skinning algorithm in 10 instructions. The compiler would have no hope at all of generating the equivalent assembly from the reference version, that it could from the optimized/intrinsics version, as I was taking advantage of knowledge about data layout and semantic meaning of various parts of the vector and matrix arrays that the compiler can't possibly have.
-
blunted2night
- Gnoblar
- Posts: 3
- Joined: Tue Mar 18, 2008 5:34 am
Re: Future of Math in Ogre
xavier wrote:Auto-vectorizing compilers (which is to say, pretty much all of the big three for sure) do the best they can, but they have to adhere to conservative guidelines about what can and cannot be vectorized.
I don't know much of the details, but it seems to me that with a little bit of discipline, and a library that pushed you in the right direction, the compiler could produce much more efficient code than an average programmer who doesn't have the time to work out all the interactions that would affect efficient use of SIMD intrinsics. As a bonus, a library like that would automatically benefit from new instruction sets and compiler optimization techniques.
xavier wrote:Finally, there are things/tricks one can do with intrinsics, that can't be done with auto-vectorization.
These tricks sound outside the realm of a generic math library, which would probably impede these types of ultimate optimizations.
-
xavier
- OGRE Retired Moderator

- Posts: 9481
- Joined: Fri Feb 18, 2005 2:03 am
- Location: Dublin, CA, US
- x 22
Re: Future of Math in Ogre
blunted2night wrote:I don't know much of the details, but it seems to me that with a little bit of discipline, and a library that pushed you in the right direction, the compiler could produce much more efficient code than an average programmer who doesn't have the time to work out all the interactions that would affect efficient use of SIMD intrinsics.
Given that optimizing customer code is what I do on a daily basis for Intel, including altering code to make it more vector-friendly and spending countless hours with the compiler engineers trying to sort out why a particular bit of code won't vectorize, I can assure you it's not as simple in practice as it sounds.
Believe me, with the introduction of Intel® Xeon Phi™, maximal vectorization with minimal effort is a hot topic right now.
blunted2night wrote:These tricks sound outside the realm of a generic math library which would probably impede these types of ultimate optimizations.
There is only so much useful optimization you can do strictly within the bounds of the math code itself. Most of the optimization is done within code that uses the math library. Witness the software skinning code in Ogre, as a prime example. Additionally, code that might vectorize fine in a math-library unit test could possibly stop doing so once inlined in calling code. For longer math-specific routines (such as matrix inversion) you can stick it all behind an API call and prevent inlining, but I think you are talking more about the "add two vectors" sort of thing.
Auto-vectorization also requires specific data alignment restrictions, and you can't assume that calling code will provide you with aligned data. This alone kills vectorization without further consideration (even if the code itself ought to vectorize in all cases). Ditto issues with aliasing. One cannot simply put pragmas in a math routine that assert that all data it works on is both aligned and non-aliased. So the "add two vectors" sort of thing won't vectorize anyway, and I am not aware of many internal-to-math-libary routines that work on long enough arrays that loop vectorization with peel-and-remainder prologue/epilogue would be useful.
In general, what happens is that the compiler either cannot (for correctness reasons) or will not (for cost-model reasons) vectorize a given piece of code, and will simply generate the scalar version.
-
blunted2night
- Gnoblar
- Posts: 3
- Joined: Tue Mar 18, 2008 5:34 am
Re: Future of Math in Ogre
xavier wrote:Given that optimizing customer code is what I do on a daily basis for Intel
You seem quite knowledgeable about the current state of the art. This is off topic, but are you affiliated with the LLVM project in any way?
