Future of Math in Ogre

Discussion area about developing with Ogre-Next (2.1, 2.2 and beyond)


User avatar
Klaim
Old One
Posts: 2565
Joined: Sun Sep 11, 2005 1:04 am
Location: Paris, France
x 56

Future of Math in Ogre

Post by Klaim »

Note: Split from this original discussion: http://www.ogre3d.org/forums/viewtopic.php?f=25&t=70522

Just a quick question related to pages 2 and 3 of this discussion:
Has the Ogre team reached any decision about whether (and how far) to separate the math code from OgreMain, or is it status quo for now?

I don't see any work related to this in the repository logs but I might have missed something.
IronNerd
Gnoblar
Posts: 8
Joined: Mon Apr 23, 2012 11:35 am

Re: The roadmap to 1.9 and 2.0

Post by IronNerd »

I'm also interested in the status of the math library. I have some experience with implementing SIMD math libraries. If a decision has not been reached as to how it should be done, I submit the following suggestions for the first iteration redo of the math library.

Isolation
The math library should not need to be its own component, but it should be easily extractable. It should contain no references to the rest of the core and be isolated in its own folder. The entry point can be a single header file that re-defines the names currently used in terms of the implemented types and the value of Ogre::Real. Ideally, nothing but the includes in the rest of the core would need to change.

Simply Templated Types
By this, I mean that there should be one template parameter for each type that defines how it is stored. Simple templates like this have no serious impact on compile time and allow for the addition of specializations that use SIMD instruction sets. This should include matrices (i.e., there would be separate implementations for a 4x4 matrix and a 3x3 matrix). Matrices with templated sizes are very flexible and relatively easy to write in the general case, but become a serious implementation burden when using SIMD instruction sets.

Things like expression templates allow for increased execution speed, but will have too much of an impact on compile time to be worthwhile for this project. If the added performance is ever needed, they should be enabled and disabled by a #define (that would presumably only be used for final builds). Enabling/disabling/implementing advanced optimizations like this can be transparent to the rest of the project and so can be decided later when the rest of the primary features are complete.

Private Members
The data members should be private to allow for implementations in which they are not trivially extractable (such as SIMD implementations).

SIMD Abstraction Layer
For the purposes of this project's needs, most (if not all) implementation details of SIMD instructions can be hidden behind a low to no cost abstraction layer. This will encapsulate the decisions as to which instructions can be used to one section of the codebase. This comes in handy when particular instruction sets should be avoided (common in x86 applications) and when implementing support for new platforms (such as NEON support).

Misaligned Memory
For general ease of use, the initial implementation of the SIMD functionality should allow for misaligned memory. This helps keep the changes to the math library mostly isolated from the rest of the code base. This means that less than optimal performance will be achieved at first, but the changes can occur without being a general burden to the rest of the project. Performance would still be better than the naive implementations.

There are two main ways to achieve this. One involves sizing the data structure to allow for padding before the actual data. The other involves using misaligned load and store instructions, which are available for at least SSE and NEON. I would personally prefer the latter approach. If the extra performance is eventually desired, either approach can be reversed fairly easily.

No Interoperable Types
By this, I mean that a float-templated vector would not be able to be added to a double-templated vector without an explicit conversion. This makes the implementation of SIMD versions far simpler and makes the costs of conversion known to the user. Additionally, this functionality is not in the current library and, I assume, is not depended on anywhere in Ogre.



Anyway, that's all that comes to mind at the moment. If somebody is already on the task of redoing the math library, I'd be happy to work with them and submit a few patches. If not, I'd be willing to take a closer look at exactly what would be necessary for the overhaul and start making some incremental changes to that end. Sadly, I do not have enough time to commit to implementing a complete and robust replacement alone.
bstone
OGRE Expert User
Posts: 1920
Joined: Sun Feb 19, 2012 9:24 pm
Location: Russia
x 201

Re: The roadmap to 1.9 and 2.0

Post by bstone »

I wouldn't be surprised if applying SIMD to the math library had no noticeable effect. After all, a pair of 4x4 matrices fits entirely in the cache. It's not like Ogre multiplies 1000x1000 matrices anywhere in the process. The true power of SIMD comes when it's applied to large arrays. So until Ogre switches to SoA for its scene data organization/management, there's nowhere to apply that power effectively. And if/once that happens, only then will it be clear how the related math code should actually look.
User avatar
Klaim
Old One
Posts: 2565
Joined: Sun Sep 11, 2005 1:04 am
Location: Paris, France
x 56

Re: The roadmap to 1.9 and 2.0

Post by Klaim »

IronNerd wrote:I'm also interested in the status of the math library. I have some experience with implementing SIMD math libraries. If a decision has not been reached as to how it should be done, I submit the following suggestions for the first iteration redo of the math library.
See http://www.ogre3d.org/forums/viewtopic.php?f=4&t=73101 for some discussion and realizations about this.
IronNerd
Gnoblar
Posts: 8
Joined: Mon Apr 23, 2012 11:35 am

Re: The roadmap to 1.9 and 2.0

Post by IronNerd »

bstone wrote:The true power of SIMD comes when it's applied to large arrays. So till Ogre hasn't switched to SOA for its scene data organization/management there's no where to apply that power effectively.
Although this is really where SIMD shines, it is becoming increasingly important to use SIMD for smaller operations such as 4x4 matrix multiplication. For example, it used to be that you would only get about a 1.9x speed boost for a 4x4 multiplication using SSE, but that was when MULSS and FMUL had the same latency [5]. Now, you get something closer to a 4x speed increase [4]. Because performance-intensive applications use SSE more than the x87 floating-point stack, it has become a significantly higher priority for optimization. This is exemplified in the instruction latency and throughput values for some of the newer architectures (where multiplication is ~1/3 faster and sqrt has ~1/4 the latency) [3]. In addition, the newer instruction sets (AVX, etc.) allow for an additional ~1.77x speedup over the old SSE implementations [2]. This gives us a maximum of over 7x speedup. Any section of the code with remotely reasonable cache performance should be able to take advantage of this. This is also only taking into account matrix multiplication; other operations like quaternion SLERP get approximately an 8x (SSE) to 17x (AVX) speedup [0, 1].

[0] - http://software.intel.com/sites/default ... 293747.pdf
[1] - http://software.intel.com/en-us/article ... tion-slerp
[2] - http://software.intel.com/en-us/article ... l-matrices
[3] - http://www.intel.com/content/www/us/en/ ... anual.html
[4] - http://fhtr.blogspot.com/2010/02/4x4-fl ... using.html
[5] - http://download.intel.com/design/Pentiu ... 504501.pdf

Note: References are from some of my previous research. Hopefully they're useful to someone.
bstone wrote:And if/once that happens, only then it will be clear how the related math code should actually look.
It's hard to foresee anything much more complicated being required of the basic math facilities than "one by many" versions of the functions. Once you start going into "many by many" or more complicated versions, the performance benefits of the functions start dropping off anyway. If we start the overhaul before the SoA work is complete, it will form a good base off of which to build whatever is needed.
Klaim wrote:See viewtopic.php?f=4&t=73101 for some discussion and realizations about this.
Thanks! I didn't see this, but it looks like they have started coming to some similar conclusions as to where the math library belongs and how isolated it should be. Would it be bad form to link that thread back to this one?
User avatar
Jabberwocky
OGRE Moderator
Posts: 2819
Joined: Mon Mar 05, 2007 11:17 pm
Location: Canada
x 218

Re: The roadmap to 1.9 and 2.0

Post by Jabberwocky »

IronNerd wrote:Would it be bad form to link that thread back to this one?
No problem at all.
IronNerd
Gnoblar
Posts: 8
Joined: Mon Apr 23, 2012 11:35 am

Re: The roadmap to 1.9 and 2.0

Post by IronNerd »

I've just whipped up a quick bit of code to demo some of my core ideas for an update of the math library. The first file defines a class "Vec4" which implements the functionality of the current "Vector4" class (minus the functions relying on "Vector3") in terms of a set of inlined functions defined in "vec4_impl_fpu.h". This is a minimal set of functions required to implement the interface defined by "Vec4". These functions could then be swapped out at compile time with different versions based on the available hardware of the target and the build settings. This minimal set of functions with compile-time swapping allows for easy incremental implementation of SIMD versions as needed. The (39?) functions in "Vec4" are implemented by a total of 12 functions in the Vec4_impl namespace, most of which translate to single SSE/NEON/VMX instructions. This provides a reasonable abstraction layer for implementing math for new parallel architectures. Finally, the Vec4 template class is specialized into a Vector4 class that would keep compatibility with existing code. Some initial tweaks might be necessary before actually implementing another architecture for it, but hopefully it gets the ideas across effectively.

I'm considering writing a patch to put the math system into this form. What are everybody's thoughts? What does the actual dev team think about this?

Code: Select all

#ifndef __vec4_h__
#define __vec4_h__
#include <iostream>
#include "vec4_impl_fpu.h"

template<typename T>
class Vec4 {
public:
	// constructors
	inline Vec4() {}
	inline Vec4(T x, T y, T z, T w) {
		Vec4_impl::set(data, x, y, z, w);
	}
	inline explicit Vec4(T * const dat) {
		Vec4_impl::set(data, dat[0], dat[1], dat[2], dat[3]);
	}
	inline explicit Vec4(const T dat[4]) {
		Vec4_impl::set(data, dat[0], dat[1], dat[2], dat[3]);
	}
	inline Vec4(const Vec4 &o) { // copy constructor must not be explicit
		Vec4_impl::copy(data,o.data);
	}
	inline explicit Vec4(const T scalar) {
		Vec4_impl::set(data, scalar);
	}
	// exchange contents
	inline void swap(Vec4 &o) {
		Vec4_impl::swap(data,o.data);
	}
	// indexing operations: will be slow on SIMD
	inline T operator[](const unsigned int i) const {
		return data[i];
	}
	inline T &operator[](const unsigned int i) {
		return data[i];
	}
	inline T* ptr() {
		return data;
	}
	inline const T* ptr() const {
		return data;
	}
	// Assigns the value of another vector
	inline Vec4& operator=(const Vec4& o) {
		Vec4_impl::copy(data,o.data);
		return *this;
	}
	inline Vec4& operator=(const T o) {
		Vec4_impl::set(data, o);
		return *this;
	}
	// sign
	inline const Vec4& operator+() const {
		return *this;
	}
	inline Vec4 operator-() const {
		Vec4 tmp;
		Vec4_impl::neg(tmp.data,data);
		return tmp;
	}
	// equality
	inline bool operator==(const Vec4& o) const {
		return Vec4_impl::equals(data, o.data);
	}
	inline bool operator!=(const Vec4& o) const {
		return !Vec4_impl::equals(data, o.data);
	}
	inline bool isNan() const {
		return Vec4_impl::isNan(data);
	}
	// addition
	inline Vec4 operator+(const Vec4 &o) const {
		Vec4 tmp;
		Vec4_impl::add(tmp.data,data,o.data);
		return tmp;
	}
	inline Vec4 operator+(const T scalar) const {
		Vec4 tmp, tmp2;
		Vec4_impl::set(tmp2.data,scalar);
		Vec4_impl::add(tmp.data,data,tmp2.data);
		return tmp;
	}
	friend inline Vec4 operator+(const T scalar, const Vec4 &o) {
		Vec4 tmp, tmp2;
		Vec4_impl::set(tmp2.data,scalar);
		Vec4_impl::add(tmp.data,o.data,tmp2.data);
		return tmp;
	}
	inline Vec4 &operator+=(const Vec4 &o) {
		Vec4_impl::add(data,data,o.data);
		return *this;
	}
	inline Vec4 &operator+=(const T scalar) {
		Vec4 tmp;
		Vec4_impl::set(tmp.data,scalar);
		Vec4_impl::add(data,data,tmp.data);
		return *this;
	}
	// subtraction
	inline Vec4 operator-(const Vec4 &o) const {
		Vec4 tmp;
		Vec4_impl::sub(tmp.data,data,o.data);
		return tmp;
	}
	inline Vec4 operator-(const T scalar) const {
		Vec4 tmp, tmp2;
		Vec4_impl::set(tmp2.data,scalar);
		Vec4_impl::sub(tmp.data,data,tmp2.data);
		return tmp;
	}
	inline friend Vec4 operator-(const T scalar, const Vec4 &o) {
		Vec4 tmp, tmp2;
		Vec4_impl::set(tmp2.data,scalar);
		Vec4_impl::sub(tmp.data,o.data,tmp2.data);
		return tmp;
	}
	inline Vec4 &operator-=(const Vec4 &o) {
		Vec4_impl::sub(data,data,o.data);
		return *this;
	}
	inline Vec4 &operator-=(const T scalar) {
		Vec4 tmp;
		Vec4_impl::set(tmp.data,scalar);
		Vec4_impl::sub(data,data,tmp.data);
		return *this;
	}
	// multiplication
	inline Vec4 operator*(const Vec4 &o) const {
		Vec4 tmp;
		Vec4_impl::mul(tmp.data,data,o.data);
		return tmp;
	}
	inline Vec4 operator*(const T scalar) const {
		Vec4 tmp, tmp2;
		Vec4_impl::set(tmp2.data,scalar);
		Vec4_impl::mul(tmp.data,data,tmp2.data);
		return tmp;
	}
	inline friend Vec4 operator*(const T scalar, const Vec4 &o) {
		Vec4 tmp, tmp2;
		Vec4_impl::set(tmp2.data,scalar);
		Vec4_impl::mul(tmp.data,o.data,tmp2.data);
		return tmp;
	}
	inline Vec4 &operator*=(const Vec4 &o) {
		Vec4_impl::mul(data,data,o.data);
		return *this;
	}
	inline Vec4 &operator*=(const T scalar) {
		Vec4 tmp;
		Vec4_impl::set(tmp.data,scalar);
		Vec4_impl::mul(data,data,tmp.data);
		return *this;
	}
	// division
	inline Vec4 operator/(const Vec4 &o) const {
		Vec4 tmp;
		Vec4_impl::div(tmp.data,data,o.data);
		return tmp;
	}
	inline Vec4 operator/(const T scalar) const {
		Vec4 tmp, tmp2;
		Vec4_impl::set(tmp2.data,scalar);
		Vec4_impl::div(tmp.data,data,tmp2.data);
		return tmp;
	}
	inline friend Vec4 operator/(const T scalar, const Vec4 &o) {
		Vec4 tmp, tmp2;
		Vec4_impl::set(tmp2.data,scalar);
		Vec4_impl::div(tmp.data,o.data,tmp2.data);
		return tmp;
	}
	inline Vec4 &operator/=(const Vec4 &o) {
		Vec4_impl::div(data,data,o.data);
		return *this;
	}
	inline Vec4 &operator/=(const T scalar) {
		Vec4 tmp;
		Vec4_impl::set(tmp.data,scalar);
		Vec4_impl::div(data,data,tmp.data);
		return *this;
	}
	// output
	friend inline std::ostream& operator<<(std::ostream& out, const Vec4 &o) {
		out << "Vec4(" << o.data[0] << ", " << o.data[1] << ", " << o.data[2] << ", " << o.data[3] << ")";
		return out;
	}
protected:
	T data[4];
};

typedef float REAL;
typedef Vec4<REAL> Vector4;

#endif

Code: Select all

#ifndef __vec4_impl__
#define __vec4_impl__

namespace Vec4_impl {
	// data movement
	template<class T>
	inline void set(T* data, T x, T y, T z, T w) {
		data[0] = x;
		data[1] = y;
		data[2] = z;
		data[3] = w;
	}
	template<class T>
	inline void set(T* data, T scalar) {
		data[0] = scalar;
		data[1] = scalar;
		data[2] = scalar;
		data[3] = scalar;
	}
	template<class T>
	inline void copy(T* data1, T* const data2) {
		data1[0] = data2[0];
		data1[1] = data2[1];
		data1[2] = data2[2];
		data1[3] = data2[3];
	}
	template<class T>
	inline void swap(T* data1, T* data2) {
		T tmp[4];
		copy(tmp, data1);
		copy(data1, data2);
		copy(data2, tmp);
	}
	
	// equality
	template<class T>
	inline bool equals(T* const data1, T* const data2) {
		return data1[0]==data2[0] && data1[1]==data2[1] &&
			data1[2]==data2[2] && data1[3]==data2[3];
	}
	template<class T>
	inline bool isNan(T* const data) {
		return data[0]!=data[0] || data[1]!=data[1] ||
			data[2]!=data[2] || data[3]!=data[3];
	}

	// arithmetic
	template<class T>
	inline void add(T* dest, T* const data1, T* const data2) {
		dest[0] = data1[0]+data2[0];
		dest[1] = data1[1]+data2[1];
		dest[2] = data1[2]+data2[2];
		dest[3] = data1[3]+data2[3];
	}
	template<class T>
	inline void sub(T* dest, T* const data1, T* const data2) {
		dest[0] = data1[0]-data2[0];
		dest[1] = data1[1]-data2[1];
		dest[2] = data1[2]-data2[2];
		dest[3] = data1[3]-data2[3];
	}
	template<class T>
	inline void mul(T* dest, T* const data1, T* const data2) {
		dest[0] = data1[0]*data2[0];
		dest[1] = data1[1]*data2[1];
		dest[2] = data1[2]*data2[2];
		dest[3] = data1[3]*data2[3];
	}
	template<class T>
	inline void div(T* dest, T* const data1, T* const data2) {
		dest[0] = data1[0]/data2[0];
		dest[1] = data1[1]/data2[1];
		dest[2] = data1[2]/data2[2];
		dest[3] = data1[3]/data2[3];
	}

	// other
	template<class T>
	inline T dot(T* const data1, T* const data2) {
		return data1[0]*data2[0]+data1[1]*data2[1]+data1[2]*data2[2]+data1[3]*data2[3];
	}
	template<class T>
	inline void neg(T* dest, T* const data) {
		dest[0] = -data[0];
		dest[1] = -data[1];
		dest[2] = -data[2];
		dest[3] = -data[3];
	}
}

#endif
bstone
OGRE Expert User
Posts: 1920
Joined: Sun Feb 19, 2012 9:24 pm
Location: Russia
x 201

Re: The roadmap to 1.9 and 2.0

Post by bstone »

It would make more sense to specialize your Vector4 with Ogre::Real rather than defining it to float first. Otherwise that would break builds with OGRE_DOUBLE_PRECISION=1.

If you used that in your own product, would you stuff multiple binaries for each specific supported hardware in your distributable?
IronNerd
Gnoblar
Posts: 8
Joined: Mon Apr 23, 2012 11:35 am

Re: The roadmap to 1.9 and 2.0

Post by IronNerd »

I just defined it to float so that you could include it in a test project and it would compile. As I said, this is purely a demonstrative piece of code. I would use the standard Ogre::Real if submitting a patch.

There are three methods for swapping the functions out at runtime that come to mind.
1) Use CPUID to detect which features are available every time the function is called and run the appropriate code.
This is the easiest but also the most inefficient way. Given how small these functions are, we would see a performance drop.
2) Use function pointers to define the location of every function. On the first run, use CPUID to detect which version we should use, then set the function pointer to the appropriate value.
I'm still relatively un-fond of this because it causes poor cache performance. You might be able to get a performance boost here, but if you can guarantee that all of your users have SSE3, the overhead of checking between the SSE3 version and the VEC version will, on average, be greater than the benefit over just always using the SSE3 version. This is again because so many of our math functions are small.
3) At the beginning of the program, mark the sections of code that the math functions exist in as writable. This will require a section of buffer code, especially for inlined functions. At runtime, use CPUID to detect which version of the code to use, then patch the code in memory so that the checking code never runs again.
I did this once before. Never again. It is a huge pain (especially for cross-compiler + cross-platform), is highly unstable, and often gets your antivirus very angry at you. It does, however, provide the best performance that you can get short of compiling separate builds for each supported instruction set.

After the code moves to a data-oriented architecture, there may be some places where it is advantageous to use #2, but that's in the future. Unless you happen to know of a better way, I would not recommend embedding multiple versions for now.
IronNerd
Gnoblar
Posts: 8
Joined: Mon Apr 23, 2012 11:35 am

Re: The roadmap to 1.9 and 2.0

Post by IronNerd »

bstone wrote:If you used that in your own product, would you stuff multiple binaries for each specific supported hardware in your distributable?
I just realized what you actually meant :P
No, I'd just assume that everybody has at least SSE3 and call it a day. If someone's computer is more than 7-8 years old, they probably can't handle the rest of my game anyway.
User avatar
vitefalcon
Orc
Posts: 438
Joined: Tue Sep 18, 2007 5:28 pm
Location: Seattle, USA
x 13

Re: The roadmap to 1.9 and 2.0

Post by vitefalcon »

Is there a reason why we wouldn't be using GLM apart from 'it introduces one more dependency'?

AFAIK, it's a header-only library and has already resolved the SIMD issues. It's an MIT-licensed CMake project and could easily be set up as a sub-module of Ogre (by importing it into Mercurial and doing regular updates for each new version release or on request from users). This way we'd have a proven math library for Ogre, one particularly aimed at games.

Ogre3D is meant to be a rendering engine, not a math library provider. So let the 'math' guys provide the math and let Ogre3D do what it does best: being a great 3D graphics library.
User avatar
Kojack
OGRE Moderator
Posts: 7157
Joined: Sun Jan 25, 2004 7:35 am
Location: Brisbane, Australia
x 535

Re: The roadmap to 1.9 and 2.0

Post by Kojack »

Is there a reason why we wouldn't be using GLM apart from 'it introduces one more dependency'?
A few possible reasons:
- adds 223 heavily templated files (almost as many headers/inline files as OgreMain itself)
- American spelling conventions
- will slow down compile times due to template use (the GLM docs say this)
- not designed for performance (the GLM docs say this; they say people should implement their own maths for performance-critical code)

It does look interesting though.
User avatar
saejox
Goblin
Posts: 260
Joined: Tue Oct 25, 2011 1:07 am
x 36

Re: The roadmap to 1.9 and 2.0

Post by saejox »

I like Bullet's linear math headers.

It uses both SSE and NEON, is cache friendly, and compiles fast.

It doesn't have all of Ogre's structures and methods, but it's well coded imo.
http://code.google.com/p/bullet/source/ ... tVector3.h
Nimet - Advanced Ogre3D Mesh/dotScene Viewer
asPEEK - Remote Angelscript debugger with html interface
ogreHTML - HTML5 user interfaces in Ogre
User avatar
lunkhound
Gremlin
Posts: 169
Joined: Sun Apr 29, 2012 1:03 am
Location: Santa Monica, California
x 19

Re: The roadmap to 1.9 and 2.0

Post by lunkhound »

IronNerd wrote:I've just whipped up a quick bit of code to demo some of my core ideas for an update of the math library. The first file defines a class "Vec4" which implements the functionality of the current "Vector4" class (minus the functions relying on "Vector3") in terms of a set of inlined functions defined in "vec4_impl_fpu.h". This is a minimal set of functions required to implement the interface defined by "Vec4". These functions could then be swapped out at compile time with different versions based on the available hardware of the target and the build settings. This minimal set of functions with compile-time swapping allows for easy incremental implementation of SIMD versions as needed. The (39?) functions in "Vec4" are implemented by a total of 12 functions in the Vec4_impl namespace, most of which translate to single SSE/NEON/VMX instructions. This provides a reasonable abstraction layer for implementing math for new parallel architectures. Finally, the Vec4 template class is specialized into a Vector4 class that would keep compatibility with existing code. Some initial tweaks might be necessary before actually implementing another architecture for it, but hopefully it gets the ideas across effectively.

I'm considering writing a patch to put the math system into this form. What are everybody's thoughts? What does the actual dev team think about this?
I like the SIMD abstraction layer idea. I wrote my own SIMD math lib based on a SIMD abstraction layer and I feel that it worked out pretty well. I ended up with 35 functions (about 250 lines of code for SSE) in my SIMD abstraction layer supporting a math lib with Vec3, Vec4, Quat, and Matrix4x4 types. I kept expanding the abstraction layer anytime I would find an optimization opportunity. Compared with OGRE's current math I'm seeing pretty decent performance improvements in my synthetic benchmarks. For the case of multiplying two 4x4 matrices, I get about a 4x speedup (MSVC 2010 on a Nehalem quad core). For most cases the speedup is more modest, depending on how much shuffling of components is needed.
The nice thing about the SIMD abstraction layer is that it keeps the vector types clean looking, and free from stuff like "#if USE_SSE ..." "#elif USE_NEON ...". It also keeps all of the SSE and other platform specific code all together in one place rather than scattered around.

I can think of a few ideas for dealing with alignment:
  • a compile-option to enable a default alignment of all memory allocations (default to whatever the SIMD implementation needs, 16 bytes usually), if this option isn't enabled, then math doesn't use SIMD (no alignment headaches, but no SIMD performance either)
  • a compile-option to help track down alignment problems (i.e. generate an exception in the constructor and assignment operator of every SIMD-based type if the alignment is off) (obviously you don't want this on by default for performance reasons except maybe in a debug build)
I'm not a fan of using templates in the math lib. I don't see that it buys anything, and it just clutters up the code with a lot of ugly "<T>" type stuff. It also tends to clutter up the compiler error messages. The only reason I can see for templatizing the math would be a need to use double-precision and single-precision types at the same time, and I haven't heard of anyone asking for that.

Also there are platforms such as the PS3 that suffer performance-wise when SIMD and scalar FPU operations are mixed. The units are completely separate and don't talk to each other, so transfers between them have to go through memory! OUCH!
To work around this problem, the strategy is to create a SIMD-scalar type that acts like a float and interoperates with floats but is implemented as a SIMD type. Anyplace where a vector class would accept or return a scalar, it does so with the SIMDScalar type, thereby allowing intermediate values to stay in SIMD registers and never touch the scalar unit. This strategy also works on SSE. I think this should be incorporated into any math redesign. I posted about this a while back (page 2 of this thread).
User avatar
vitefalcon
Orc
Posts: 438
Joined: Tue Sep 18, 2007 5:28 pm
Location: Seattle, USA
x 13

Re: The roadmap to 1.9 and 2.0

Post by vitefalcon »

Bringing back the topic of using an external math library to avoid reinventing the wheel: how about the Eigen math library, now licensed under MPL 2.0 (Mozilla Public License 2.0)?

According to them:
Eigen is a C++ template library for linear algebra: matrices, vectors, numerical solvers, and related algorithms.
  • Eigen is versatile.
    • It supports all matrix sizes, from small fixed-size matrices to arbitrarily large dense matrices, and even sparse matrices.
    • It supports all standard numeric types, including std::complex, integers, and is easily extensible to custom numeric types.
    • It supports various matrix decompositions and geometry features.
    • Its ecosystem of unsupported modules (http://eigen.tuxfamily.org/dox-devel/unsupported/group__Unsupported__modules.html) provides many specialized features such as non-linear optimization, matrix functions, a polynomial solver, FFT, and much more.
  • Eigen is fast.
    • Expression templates intelligently remove temporaries and enable lazy evaluation, when that is appropriate.
    • Explicit vectorization is performed for SSE 2/3/4, ARM NEON, and AltiVec instruction sets, with graceful fallback to non-vectorized code.
    • Fixed-size matrices are fully optimized: dynamic memory allocation is avoided, and the loops are unrolled when that makes sense.
    • For large matrices, special attention is paid to cache-friendliness.
  • Eigen is reliable.
    • Algorithms are carefully selected for reliability. Reliability trade-offs are clearly documented and extremely safe decompositions are available.
    • Eigen is thoroughly tested through its own test suite (over 500 executables), the standard BLAS test suite, and parts of the LAPACK test suite.
  • Eigen is elegant.
    • The API is extremely clean and expressive while feeling natural to C++ programmers, thanks to expression templates.
    • Implementing an algorithm on top of Eigen feels like just copying pseudocode.
  • Eigen has good compiler support as we run our test suite against many compilers to guarantee reliability and work around any compiler bugs. Eigen also is standard C++98 and maintains very reasonable compilation times.
User avatar
Kojack
OGRE Moderator
Posts: 7157
Joined: Sun Jan 25, 2004 7:35 am
Location: Brisbane, Australia
x 535

Re: The roadmap to 1.9 and 2.0

Post by Kojack »

Eigen has some downsides though. Its Matrix4 and quaternion classes can't be stored in other classes without adding a special constructor to ensure memory alignment, and they can't go into a std::vector without custom allocators. There can be aliasing issues: if A is a matrix, then A = A.transpose() will fail, because Eigen avoids temporaries by doing the transpose during the assignment instead of during the transpose() call, so it overwrites the memory it's trying to read.
The biggest potential problem is debug performance. Eigen is a templated expression-tree-building library that eliminates temporaries and will give you somewhat faster code. But from previous experience with libs like this, debug builds suffer massively. I haven't tested Eigen's performance myself, but just reading the docs suggests that debug will be slow. Have a read of the nine-page topic in their manual about what happens internally when you add two vectors: http://eigen.tuxfamily.org/dox/TopicIns ... ample.html
User avatar
lunkhound
Gremlin
Posts: 169
Joined: Sun Apr 29, 2012 1:03 am
Location: Santa Monica, California
x 19

Re: The roadmap to 1.9 and 2.0

Post by lunkhound »

While it would be nice to get SSE, NEON, and AltiVec support from a lib like Eigen, I have concerns about using it for OGRE.
  • Yet another dependency for OGRE.
  • Complicates OGRE's licensing status. I tried to understand this: http://www.mozilla.org/MPL/2.0/FAQ.html but about halfway through my eyes glazed over...
  • Eigen is overkill for what OGRE needs in a math lib. It seems to be primarily aimed at solving large/sparse matrices. It would clutter up the codebase with a lot of unused stuff.
  • Templates. Slower compile times, indecipherable error messages, and unnecessary (for OGRE math).
The alignment issues are annoying, but ANY math library replacement that uses SIMD is going to have the same problems. And I don't see any reason to replace what we've already got unless we get SIMD performance out of it.
User avatar
vitefalcon
Orc
Posts: 438
Joined: Tue Sep 18, 2007 5:28 pm
Location: Seattle, USA
x 13

Re: The roadmap to 1.9 and 2.0

Post by vitefalcon »

Just to comment on the reasons you've given:
lunkhound wrote:[*]Yet another dependency for OGRE.
I had mentioned this before in my previous post, this is a proposal ignoring the fact that it needs one more dependency.
lunkhound wrote:[*]Complicates OGRE's licensing status. I tried to understand this: http://www.mozilla.org/MPL/2.0/FAQ.html but about halfway through my eyes glazed over...
Understandably, you're not a lawyer. Neither am I. But saying 'no' because you haven't done enough research is not a reason. Here's what I found about mixing licenses, starting from the same page you tried to understand:
MPL 2.0 FAQ wrote:Q5: I want to use software which is available under the MPL. What do I have to do?
Nothing. Like all other free and open source software, software available under the MPL is available for anyone (including individuals and companies) to use for any purpose. The MPL only creates obligations for you if you want to distribute the software outside your organization.
This brings us to two questions: what is the 'software' in question, and what are the obligations if you want to distribute it outside your organization? The 'software' in our context is Eigen. As for the second question: we're not going to distribute Eigen outside the organization (the Ogre3D community). But what if we included it in our source? Wouldn't that be distributing the 'MPL software' with 'our' software (Ogre3D)? Yes; it could be part of the SDK when distributed. And so we come to the question of how one defines MPL-covered software.
MPL 2.0: Section 1.4 wrote:1.4. “Covered Software”
means Source Code Form to which the initial Contributor has attached the notice in Exhibit A, the Executable Form of such Source Code Form, and Modifications of such Source Code Form, in each case including portions thereof.
MPL 2.0: Exhibit A wrote:Exhibit A - Source Code Form License Notice
This Source Code Form is subject to the terms of the Mozilla Public License, v. 2.0. If a copy of the MPL was not distributed with this file, You can obtain one at http://mozilla.org/MPL/2.0/.
Basically, any file that attaches 'Exhibit A'. That means the header files, since Eigen is a header-only library. This brings us to the most critical question: can MPL 2.0 be mixed with MIT?
MPL 2.0: Section 3.3 wrote:3.3. Distribution of a Larger Work
You may create and distribute a Larger Work under terms of Your choice, provided that You also comply with the requirements of this License for the Covered Software. If the Larger Work is a combination of Covered Software with a work governed by one or more Secondary Licenses, and the Covered Software is not Incompatible With Secondary Licenses, this License permits You to additionally distribute such Covered Software under the terms of such Secondary License(s), so that the recipient of the Larger Work may, at their option, further distribute the Covered Software under the terms of either this License or such Secondary License(s).
tl;dr: You can mix any license with MPL 2.0 licensed software. When mixed, the distributor can choose to ship the MPL 2.0 covered code either under the license of the larger work (MIT) or under MPL 2.0.
lunkhound wrote:Eigen is overkill for what OGRE needs in a math lib. It seems to be primarily aimed at solving large/sparse matrices. It would clutter up the codebase with a lot of unused stuff.
The whole idea is not to reinvent the wheel. It wouldn't clutter the codebase if it were used as a library and typedef'd to appropriate names.
lunkhound wrote:Templates. Slower compile times, indecipherable error messages, and unnecessary (for OGRE math).
I seriously doubt that template error messages are 'indecipherable'. Confusing, yes. But once you learn where to look when an error shows up, it is pretty straightforward to 'decipher' what's wrong: start from the error line that points at the '.h' or '.cpp' file of yours that uses the templated class.
lunkhound wrote:The alignment issues are annoying, but ANY math library replacement that uses SIMD is going to have the same problems. And I don't see any reason to replace what we've already got unless we get SIMD performance out of it.
If we want to use SIMD features, there's no other workaround I'm aware of. If there is, I would be delighted to know. I would still push for the use of SIMD operations, though, because almost all processors support them and we only stand to gain performance, especially for skinned animation (correct me if I'm wrong).
User avatar
vitefalcon
Orc
Posts: 438
Joined: Tue Sep 18, 2007 5:28 pm
Location: Seattle, USA
x 13

Re: The roadmap to 1.9 and 2.0

Post by vitefalcon »

Kojack wrote:Eigen has some downsides though. Its matrix4 and quaternion classes can't be stored in other classes without adding a special constructor to ensure memory alignment, and they can't go into a std::vector without custom allocators.
Although I don't have an answer to this off-hand, I would imagine there are simple workarounds. I'll have to research this.
Kojack wrote:There can be aliasing issues: if A is a matrix then A = A.transpose() will fail, because it avoids temporaries by doing the transpose during the assignment instead of during transpose(), so it overwrites the memory it's trying to read.
Considering that it's a well-tested piece of software, I believe a doubt like this should be demonstrated before claiming it actually causes an issue. Maybe writing some test code that shows it breaking would help?
Kojack wrote:The biggest potential problem is debug performance. Eigen is a templated expression tree building library that eliminates temporaries and will give you a bit faster code. But from previous experience with libs like this, debug builds will massively suffer. I haven't tested eigen's performance myself, but just reading the docs suggests that debug will be slow. Have a read of the 9 page topic in their manual about what happens internally when you add two vectors: http://eigen.tuxfamily.org/dox/TopicIns ... ample.html
Topics explaining what goes on under the hood would naturally be long for something like Eigen's math library. Consider how long Ogre's documentation would be if we explained the inner workings of our code: it would be huge, and anyone reading it would get a sense of the complexity behind a simple operation. But should they avoid Ogre just because it does some 'cool' stuff behind the scenes that makes it awesome? That's generally the argument Irrlicht users make against Ogre anyway. My suggestion would be to create an Ogre math module built on Eigen and compare the resulting DLLs as a starting point for measuring how big the debug libs become. Then we would have a quantitative measure to compare. I can volunteer to do this ONLY IF you would consider taking a third-party math library into Ogre. If the answer is a strict 'NO', then there's no point in me coming up with quantitative measures of the size and debuggability of Eigen.
User avatar
Kojack
OGRE Moderator
OGRE Moderator
Posts: 7157
Joined: Sun Jan 25, 2004 7:35 am
Location: Brisbane, Australia
x 535

Re: The roadmap to 1.9 and 2.0

Post by Kojack »

vitefalcon wrote:Considering that it's a well-tested piece of software, I believe a doubt like this should be demonstrated before claiming it actually causes an issue. Maybe writing some test code that shows it breaking would help?
Well, it's the Eigen official docs that say explicitly that A = A.transpose() will fail due to aliasing issues. :)
If you want test code to show it breaking, take a look at: http://eigen.tuxfamily.org/dox/TopicAliasing.html
The second example is a = a.transpose() failing.

vitefalcon wrote:My suggestion would be to create an Ogre math module built on Eigen and compare the resulting DLLs as a starting point for measuring how big the debug libs become.
It's not the size that matters, it's the performance. Expression templates to remove temporaries will make debug builds run slower. How much depends on the library itself.
Of course, testing to see what hit it really has is the best thing to do. I've never used Eigen myself; I've just used similarly written libs.
TheSHEEEP
OGRE Retired Team Member
OGRE Retired Team Member
Posts: 972
Joined: Mon Jun 02, 2008 6:52 pm
Location: Berlin
x 65

Re: The roadmap to 1.9 and 2.0

Post by TheSHEEEP »

I am utterly sorry for what follows now, but there is no possible way for my head to pass up this unique opportunity. So please accept my up front apology for this spam...
Kojack wrote:It's not the size that matters, it's the performance.


To contribute at least something of value, I absolutely agree with Kojack. First we'd have to know if the math we use is a problem at all (is it?) and then we'd need to actually compare with the alternatives.
My site! - Have a look :)
Also on Twitter - extra fluffy
User avatar
Kojack
OGRE Moderator
OGRE Moderator
Posts: 7157
Joined: Sun Jan 25, 2004 7:35 am
Location: Brisbane, Australia
x 535

Re: The roadmap to 1.9 and 2.0

Post by Kojack »

I'm 99% sure Eigen would give us a performance boost.

My concern is that the price we pay for that is losing the ability to do debug runs of ogre at ok framerates. Debug is already slow enough as it is.
I was in a similar situation 10 years ago, choosing between an ogre style vector class or a temp removing expression template system like eigen. In the end the small but definite gain in release mode wasn't worth the horrible debug performance.

But before considering moving ogre to it we need to test what we gain and what we lose.

I'm not against eigen itself, I'm just wary of the possible downsides.

Another fun little downside (from the Eigen FAQ page):
MSVC 2010 sometimes crashes when the "enable browse" compiler option (/FR) is activated.
As a VC2010 user, that sounds a bit annoying.
bstone
OGRE Expert User
OGRE Expert User
Posts: 1920
Joined: Sun Feb 19, 2012 9:24 pm
Location: Russia
x 201

Re: The roadmap to 1.9 and 2.0

Post by bstone »

I'm in a sort of sci-fi mood today, hence my suggestion: a swappable math implementation would give release builds the extra performance and still save our day in the debug trenches. But I'm not even going to go through the headache of thinking about whether that's at all feasible. :wink:
User avatar
Klaim
Old One
Posts: 2565
Joined: Sun Sep 11, 2005 1:04 am
Location: Paris, France
x 56

Re: The roadmap to 1.9 and 2.0

Post by Klaim »

Shouldn't this discussion be split into its own dedicated thread?

Also, would there be any interest in making the Ogre types Eigen types in non-debug builds only? I suspect having different code for different builds would be a bad idea, but maybe some people would like the extra performance.
bstone
OGRE Expert User
OGRE Expert User
Posts: 1920
Joined: Sun Feb 19, 2012 9:24 pm
Location: Russia
x 201

Re: The roadmap to 1.9 and 2.0

Post by bstone »

Technically you always have different code in debug and release builds. But I see what you mean. It all boils down to how much you want the extra gain and how good your QA process is. Let's be honest: debug builds of Ogre (and of pretty much anything performance-critical, for that matter) will not let you test the product in full; you still need to test release builds or you're in trouble.