improvement with OgreAtomicWrappers.h

Discussion area about developing or extending OGRE, adding plugins for it or building applications on it. No newbie questions please, use the Help forum for that.
oiram
Halfling
Posts: 87
Joined: Sun Dec 13, 2009 5:06 am
x 6

improvement with OgreAtomicWrappers.h

Post by oiram »

1. Auto traits (cas, increment, decrement) with sizeof atomic (2/4/8) in compiling time.
2. Use increment and decrement API If possible, and use cas do same thing with 64 bits value.
3. I think we do not need operator "+=" and "-=" with it.
4. "InterlockedCompareExchange64" cannot be located with kernal32.dll on XP SP3, so I use "_InterlockedCompareExchange64" to replace it.
5. I think use c++ 0x11 std::atomic if possible and use boost::atomic for others is better.

Code: Select all

// atomic.cpp : Defines the entry point for the console application.
//

#ifndef WIN32_LEAN_AND_MEAN
#  define WIN32_LEAN_AND_MEAN
#endif
#if !defined(NOMINMAX) && defined(_MSC_VER)
#	define NOMINMAX // required to stop windows.h messing up std::min
#endif
/*
#ifndef _WIN32_WINNT
#define _WIN32_WINNT 0x0502
#endif
 
#ifndef WINVER
#define WINVER 0x0502
#endif
*/

#include <windows.h>
#include <intrin.h>
#include <tchar.h>

template<class T> class AtomicScalar
{
public:
    AtomicScalar (const T &initial)
        : mField(initial)
    {
	}

    AtomicScalar (const AtomicScalar<T> &cousin)
        : mField(cousin.mField)
    {
	}

    AtomicScalar () 
    {
	}

    void operator = (const AtomicScalar<T> &cousin)
    {
        mField = cousin.mField;
    }

    T get (void) const
    {
        return mField;
    }

    void set (const T &v)
    {
        mField = v;
    }

	// cas
	template <int = sizeof(T)>
	struct cas_traits
	{
		bool operator () (T volatile &x, const T &old, const T &nu) { throw "exception!"; }
	};

	template <>
	struct cas_traits<2>
	{
		__forceinline bool operator () (T volatile &x, const T &old, const T &nu)
		{
			return _InterlockedCompareExchange16((SHORT*)&x, static_cast<SHORT>(nu), static_cast<SHORT>(old)) == static_cast<SHORT>(old);
		}
	};

	template <>
	struct cas_traits<4>
	{
		__forceinline bool operator () (T volatile &x, const T &old, const T &nu)
		{
			return _InterlockedCompareExchange((LONG*)&x, static_cast<LONG>(nu), static_cast<LONG>(old)) == static_cast<LONG>(old);
		}
	};

	template <>
	struct cas_traits<8>
	{
		__forceinline bool operator () (T volatile &x, const T &old, const T &nu)
		{
			return _InterlockedCompareExchange64((LONGLONG*)&x, static_cast<LONGLONG>(nu), static_cast<LONGLONG>(old)) == static_cast<LONGLONG>(old);
		}
	};

	bool cas (const T &old, const T &nu)
	{
		return cas_traits<>()(mField, old, nu);
	}


	// increment
	template <int = sizeof(T)>
	struct increment_traits
	{
		bool operator () (T volatile &x) { throw "exception!"; }
	};

	template <>
	struct increment_traits<2>
	{
		__forceinline T operator () (T volatile &x)
		{
			return _InterlockedIncrement16((SHORT*)&x);
		}
	};

	template <>
	struct increment_traits<4>
	{
		__forceinline T operator () (T volatile &x)
		{
			return InterlockedIncrement((LONG*)&x);
		}
	};

	template <>
	struct increment_traits<8>
	{
		__forceinline T operator () (T volatile &x)
		{
			return _InterlockedCompareExchange64((LONGLONG*)&x, static_cast<LONGLONG>(x + 1), static_cast<LONGLONG>(x));
		}
	};


	// decrement
	template <int = sizeof(T)>
	struct decrement_traits
	{
		bool operator () (T volatile &x) { throw "exception!"; }
	};

	template <>
	struct decrement_traits<2>
	{
		__forceinline T operator () (T volatile &x)
		{
			return _InterlockedDecrement16((SHORT*)&x);
		}
	};

	template <>
	struct decrement_traits<4>
	{
		__forceinline T operator () (T volatile &x)
		{
			return InterlockedDecrement((LONG*)&x);
		}
	};

	template <>
	struct decrement_traits<8>
	{
		__forceinline T operator () (T volatile &x)
		{
			return _InterlockedCompareExchange64((LONGLONG*)&x, static_cast<LONGLONG>(x - 1), static_cast<LONGLONG>(x));
		}
	};


	T operator++ (void)
    {
		return increment_traits<>()(mField);
    }
            
    T operator-- (void)
    {
		return decrement_traits<>()(mField);
    }

    T operator++ (int)
    {
		return increment_traits<>()(mField) - 1;
    }
            
    T operator-- (int)
    {
		return decrement_traits<>()(mField) + 1;
    }

	T operator+=(const T &add)
	{
		//The function InterlockedExchangeAdd is not available for 64 and 16 bit version
		//We will use the cas operation instead. 
		T newVal;
		do {
			//Create a value of the current field plus the added value
			newVal = mField + add;
			//Replace the current field value with the new value. Ensure that the value 
			//of the field hasn't changed in the mean time by comparing it to the new value
			//minus the added value. 
		} while (!cas(newVal - add, newVal)); //repeat until successful
		return newVal;
	}

	T operator-=(const T &sub)
	{
		//The function InterlockedExchangeAdd is not available for 64 and 16 bit version
		//We will use the cas operation instead. 
		T newVal;
		do {
			//Create a value of the current field plus the added value
			newVal = mField - sub;
			//Replace the current field value with the new value. Ensure that the value 
			//of the field hasn't changed in the mean time by comparing it to the new value
			//minus the added value. 
		} while (!cas(newVal + sub, newVal)); //repeat until successful
		return newVal;
	}

    volatile T mField;

};



int _tmain(int argc, _TCHAR* argv[])
{
	typedef unsigned char ResourceHandle8;
	typedef unsigned short ResourceHandle16;
	typedef unsigned long int ResourceHandle32;
	typedef unsigned long long int ResourceHandle64;
	AtomicScalar<ResourceHandle64> mField(0);

	++mField;
	--mField;

	mField++;
	mField--;
	
	mField += 2;
	mField -= 2;

	return 0;
}
What do you think? :wink:
User avatar
masterfalcon
OGRE Retired Team Member
OGRE Retired Team Member
Posts: 4270
Joined: Sun Feb 25, 2007 4:56 am
Location: Bloomington, MN
x 126

Re: improvement with OgreAtomicWrappers.h

Post by masterfalcon »

Could you explain the reasons for each of those? Except 5, we're not ready to require C++11 yet and we can't have boost as a required dependency.
User avatar
vitefalcon
Orc
Posts: 438
Joined: Tue Sep 18, 2007 5:28 pm
Location: Seattle, USA
x 13

Re: improvement with OgreAtomicWrappers.h

Post by vitefalcon »

masterfalcon wrote:Could you explain the reasons for each of those? Except 5, we're not ready to require C++11 yet and we can't have boost as a required dependency.
I agree with masterfalcon here. We should neither enforce boost on Ogre nor are we ready for C++11. It would be great to add your rationale behind each of your points. But to comment on the other points for which I do have comments:
oiram wrote:1. Auto traits (cas, increment, decrement) with sizeof atomic (2/4/8) in compiling time.
Sounds reasonable and something I would like Ogre3D to have, but comes with considerable amount of work.
oiram wrote:2. Use increment and decrement API If possible, and use cas do same thing with 64 bits value.
I don't know why we wouldn't be using increment and decrement API (I haven't looked into it yet). Why should we use cas for 64-bit instead of increment/decrement?
oiram wrote:"InterlockedCompareExchange64" cannot be located with kernal32.dll on XP SP3, so I use "_InterlockedCompareExchange64" to replace it.
This is a known issue and, I believe, has been recently fixed by masterfalcon.

EDIT: The 'traits' code looks good, but needs to be extended to be cross-platform.
Image
oiram
Halfling
Posts: 87
Joined: Sun Dec 13, 2009 5:06 am
x 6

Re: improvement with OgreAtomicWrappers.h

Post by oiram »

masterfalcon wrote:Could you explain the reasons for each of those? Except 5, we're not ready to require C++11 yet and we can't have boost as a required dependency.
1. Auto traits (cas, increment, decrement) with sizeof atomic (2/4/8) in compiling time.
OgreAtomicWrappers was designed to a template class for any type name use, it guide me to use traits class template to replace run-time "if/else" unnecessary with compile-time traits no overhead. API will only be used to link.
2. Use increment and decrement API If possible, and use cas do same thing with 64 bits value.
Using increment and decrement API are faster then cas perhaps, if not these API no longer necessary. And using traits to decide which API will be used with compile-time. API will only be used to link.
3. I think we do not need operator "+=" and "-=" with it.
Base on my knowledge, these two operators should almost never be used with OGRE.
4. "InterlockedCompareExchange64" cannot be located with kernal32.dll on XP SP3, so I use "_InterlockedCompareExchange64" to replace it.
It's a known issue, and David Rogers fixed other problem with InterlockedIncrement16/InterlockedDecrement16 at yesterday.
5. I think use c++ 0x11 std::atomic if possible and use boost::atomic for others is better.
I have no objection, this means that we need to do more.
Last edited by oiram on Mon Apr 01, 2013 7:59 pm, edited 1 time in total.
oiram
Halfling
Posts: 87
Joined: Sun Dec 13, 2009 5:06 am
x 6

Re: improvement with OgreAtomicWrappers.h

Post by oiram »

vitefalcon wrote: I don't know why we wouldn't be using increment and decrement API (I haven't looked into it yet). Why should we use cas for 64-bit instead of increment/decrement?
Into winbase.h, InterlockedDecrement64 is a forceline function which is based on InterlockedCompareExchange64.
@winbase.h, 2413 line.

Code: Select all

FORCEINLINE
LONGLONG
InterlockedDecrement64 (
    __inout LONGLONG volatile *Addend
    )
{
    LONGLONG Old;

    do {
        Old = *Addend;
    } while (InterlockedCompareExchange64(Addend,
                                          Old - 1,
                                          Old) != Old);

    return Old - 1;
}
vitefalcon wrote: The 'traits' code looks good, but needs to be extended to be cross-platform.
I've not test these code except vc2010, but it's a standard traits class template.
User avatar
sparkprime
Ogre Magi
Posts: 1137
Joined: Mon May 07, 2007 3:43 am
Location: Ossining, New York
x 13

Re: improvement with OgreAtomicWrappers.h

Post by sparkprime »

The 'runtime' if/else should be optimised by the compiler:

1) template expansion
2) constant propagation
3) dead code elimination

Do you have any performance numbers or any evidence that this is not happening?
User avatar
dark_sylinc
OGRE Team Member
OGRE Team Member
Posts: 5505
Joined: Sat Jul 21, 2007 4:55 pm
Location: Buenos Aires, Argentina
x 1371

Re: improvement with OgreAtomicWrappers.h

Post by dark_sylinc »

So, I was asked to review this thread.

Summary:
I'm really against operator overloading for atomics. Multithreading is a highly difficult topic even for experienced programmers. Something like "++mCount" looks very innocent and prone to abusing (i.e. being treated like a normal variable by programmers who'll come after us). Multithreaded related variables (specially when used for synchronization) should yell "this is MT related stuff and not part of the actual algorithm!" Something like "AtomicIncrement( mCount )" tells a lot more to the programmer, and puts a heed warning for the careless ones.

I do appreciate that someone takes the time to write cross-platform Atomic functions though, since we'll need it. We can't just write "InterlockedIncrement" and then it doesn't compile in Linux. I'm saying that functions should be preferred over operator overloading (in this case). And I'm usually a fan of op. overloading.

mField being volatile is dangerous (can heavily block performance optimization of unrelated code around "++mCount") and nearly useless. The input to the Interlocked functions is what should be volatile, not the member variable. Again, since I prefer functions over classes in this case, the mField variable should go away.

mField is not enforced any particular alignment. In x86, the variable has to be aligned for Interlocked functions to truly be atomic.
I use the following snippet for MSVC:

Code: Select all

#define ALIGN(alignment, decl) __declspec(align(alignment)) decl
ALIGN( 4, long mField );
The '+=' and '-=' operators (add & sub) using live spin locks are an abomination when targetting x86 systems. We have the instruction "lock xadd" for that (there's InterlockedExchangeAdd in Windows). You can use a template specialization for the 32-bit case (although, multithreading = hard = I'd like to avoid templates). If we're targetting x64, "lock xadd" can be used too for 64-bit numbers.

The "abomination" part is an example of what I want to avoid:
"mCount += 5" -> This looks innocent and fast.
"AtomicAdd( mCount, 5 )" -> This looks like it may be nasty on some architectures (what happens inside the atomic should be architecture dependent, whether we use lock xadd, live locks, or a full blown mutex). Better not use it frequently.

Cheers
Dark Sylinc
User avatar
dark_sylinc
OGRE Team Member
OGRE Team Member
Posts: 5505
Joined: Sat Jul 21, 2007 4:55 pm
Location: Buenos Aires, Argentina
x 1371

Re: improvement with OgreAtomicWrappers.h

Post by dark_sylinc »

Ahhh, I see OgreAtomicWrappers.h is a preexisting file. I thought it was something you were planning to add. And look! it is being used in the Resource system.

That whole thing (Resource System) is in my seek-and-destroy todo list. But I (particularly) won't be touching that bit of code for a long, long time.

I agree we're not ready for std::atomic and boost is ok only if optional.
2. Use increment and decrement API If possible, and use cas do same thing with 64 bits value.
Since I suggested the same thing, I agree :)
3. I think we do not need operator "+=" and "-=" with it.
I called those an abomination, so I guess that means I agree :lol:
oiram
Halfling
Posts: 87
Joined: Sun Dec 13, 2009 5:06 am
x 6

Re: improvement with OgreAtomicWrappers.h

Post by oiram »

sparkprime wrote:The 'runtime' if/else should be optimised by the compiler:

1) template expansion
2) constant propagation
3) dead code elimination

Do you have any performance numbers or any evidence that this is not happening?
For example,

Code: Select all

typedef unsigned long long int ResourceHandle;
AtomicScalar<ResourceHandle> mNextHandle;
Using template specialization, only the 64-bit version of the code is generated with just 1 API is called and linked(simple & stupid). However, using if/else everywhere will make the code difficult to maintain and unnecessary performance overhead(I don't know whether the compiler is the ability to do this optimization in this case).
oiram
Halfling
Posts: 87
Joined: Sun Dec 13, 2009 5:06 am
x 6

Re: improvement with OgreAtomicWrappers.h

Post by oiram »

Just five minutes before I found one thing. TBB wrote a cross-platform atomic already, it's using template specialization as same as mine.
I suggest we use TBB::atomic and boost::atomic to avoid reinvent the wheel, provide an option to choose like thread(or automatic choose with thread).

I'm implementing a background loading terrain like wow. When the background thread to complete the loading of resources, and must be completed by the main thread creation. However, significant delay in the process, and finally I found ResourceManager::getNextHandle is the bottleneck(OGRE_LOCK_AUTO_MUTEX). And, thanks for vitefalcon's work.
User avatar
sparkprime
Ogre Magi
Posts: 1137
Joined: Mon May 07, 2007 3:43 am
Location: Ossining, New York
x 13

Re: improvement with OgreAtomicWrappers.h

Post by sparkprime »

I believe that I wrote OgreAtomic.h btw :)
User avatar
sparkprime
Ogre Magi
Posts: 1137
Joined: Mon May 07, 2007 3:43 am
Location: Ossining, New York
x 13

Re: improvement with OgreAtomicWrappers.h

Post by sparkprime »

OK I had a quick read of the code, I definitely remember writing it.

There are two classes in there -- AtomicObject<T> which is for when you want a full lock in all cases. The mField is volatile but this looks like a copy paste error.


Then there are 3 implementations of AtomicScalar<T>, which is intended to be used on datatypes that are <= sigatomic_t and be very fast.

There is an implementation for gcc that uses gcc intrinsics, one for msvc that uses winapi calls, and a fallback implementation that uses locks.

In all cases, the get and set routines will not work unless mField is volatile. This will prevent some compiler optimisations, yes, but these are the compiler optimisations that result in incorrect behaviour :)
User avatar
sparkprime
Ogre Magi
Posts: 1137
Joined: Mon May 07, 2007 3:43 am
Location: Ossining, New York
x 13

Re: improvement with OgreAtomicWrappers.h

Post by sparkprime »

Addressing dark_cyclic's points

1) alignment of mField -- probably right, I do not know. It was only tested on 32 bit systems, and not with 16 bit atomic types. There are many issues with alignment here -- not just word tearing but false sharing too. I suspect they also manifest on gcc, in the sense that the gcc intrinsics won't be as fast as they should be.
2) function as opposed to class API -- you convinced me this is a good idea -- we can do it by just changing += to .add() or whatever of course. (I suspect part of my goal here was to have a drop in replacement for a volatile int field in the Resource object, it was done in a rush you see.)
3) the code already uses the add / increment functions mentioned, i'm not sure what this is referring to. Not much in this forum thread makes much sense to be honest.
oiram
Halfling
Posts: 87
Joined: Sun Dec 13, 2009 5:06 am
x 6

Re: improvement with OgreAtomicWrappers.h

Post by oiram »

sparkprime wrote:I believe that I wrote OgreAtomic.h btw :)
Oops... sorry about that :oops: