Proposal: OGRE_THREAD_SUPPORT == 3

sinbad
OGRE Retired Team Member
Posts: 19269
Joined: Sun Oct 06, 2002 11:19 pm
Location: Guernsey, Channel Islands
x 66

Proposal: OGRE_THREAD_SUPPORT == 3

Post by sinbad »

My experience of multicore programming with Ogre has improved over the last couple of years, and I realised a while ago that my original approach, OGRE_THREAD_SUPPORT==1 where resource management is fully threaded, was hopelessly naive. It required too many locks, and also required the rendersystem to be multi-threaded. OGRE_THREAD_SUPPORT==2 improved on that by not requiring a threaded rendersystem and just doing disk I/O in the background, but the whole process is still driven within the Resource class, which means the locks are still in place on Resource and, by association, on SharedPtr and a bunch of other classes too.

More recently I've been trying out alternative approaches in client projects, where I've built OGRE with OGRE_THREAD_SUPPORT disabled, and instead I've done my own, very much more specific threading and only interacted with Ogre in a single thread. I've worked very much in a data-driven model such that data is ring-fenced and passed between threads with minimal locks - data is generally not visible within more than one thread at a time (unlike the Resource approach, where even with OGRE_THREAD_SUPPORT==2 the resource instance & state is shared between the threads). This works very well, requires fewer locks, is easier to debug and is faster.

Recently, I've sneaked this approach into the Ogre core too. The new terrain component actually uses this approach - even though it's currently operating within OGRE_THREAD_SUPPORT==2, it doesn't do any data sharing between threads and instead simply has a hand-over of data via WorkQueue. I even created software, copyable versions of VertexDeclaration et al so that I could build them in another thread separately from the rendersystem. Therefore, if WorkQueue was still operating in a threaded manner, and I resolved a couple of issues with resource path lookups, I could build Ogre with OGRE_THREAD_SUPPORT disabled and still get threading on Terrain.

So, here's what I propose:
  1. Add a new OGRE_THREAD_SUPPORT option, 3
  2. This results in OGRE_MUTEX, OGRE_LOCK_MUTEX etc all becoming no-ops just like in OGRE_THREAD_SUPPORT=0 - so SharedPtr, Resource etc are all not thread safe
  3. WorkQueue changed to use different macros, e.g. OGRE_WQ_MUTEX, which are the only ones enabled when OGRE_THREAD_SUPPORT=3
  4. Resource prepare() when OGRE_THREAD_SUPPORT=3 is 'deferred' rather than 'threaded'. That is, we make all changes to Resource in the main render thread, and simply push a request on to WorkQueue to read / optionally pre-process the data in a thread. This thread cannot access any of Ogre in a threadsafe manner, it can only use software structures.
  5. All data exchanged between the threads must be totally encapsulated in the Request and Response, so that it is self-contained and ownership can be passed cleanly between threads.
  6. The main challenge here is the resource path & archive system - we need to make these lock-free (locking them will start to bleed into other areas and we'll end up with too much again). One option would be to have archive instances & resource paths copied to each thread, with changes queued up / logged and picked up when each thread needs them.
  7. So from a resource point of view, this is a lot like OGRE_THREAD_SUPPORT==2, except that Resource isn't threadsafe (actually, nothing is!), because it's only accessed in the main thread.
  8. We make this the default when Boost / POCO / TBB are detected
  9. Over time we expand this data-driven, lock-free model for future threading within Ogre
This will effectively eliminate any additional overheads when using threaded behaviour in OGRE, since none of the regular classes will be thread safe. Note that even though SharedPtr may be used in (CPU-side) data structures used by the background thread, because the 'handover' of data only occurs in one place - the WorkQueue Request/Response system - there is no risk of parallel access and therefore it can be lock-free (provided the user doesn't allow the pointer to point to a shared object, of course! This is key: data has to be solely owned, and the SharedPtr is just for type / structure convenience, as with VertexData). So long as the data that is used by the background process is entirely encapsulated, and the contents only visible to one thread at a time, we get background processing with no locks except for the queueing system (and for that I may investigate a lock-free queue too as an extension).
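A minimal sketch of how points 2 and 3 could look at the macro level, assuming a mode 3 build. The SKETCH_ names, NullMutex, ResourceLike and QueueLike are invented for illustration and are not Ogre's actual macro definitions; only the idea (general locks compile to no-ops, queue locks stay real) comes from the proposal.

```cpp
#include <cassert>
#include <mutex>
#include <type_traits>

// Hypothetical build switch mirroring the proposed OGRE_THREAD_SUPPORT values.
#ifndef SKETCH_THREAD_SUPPORT
#define SKETCH_THREAD_SUPPORT 3
#endif

// Empty stand-in used wherever locking is compiled out.
struct NullMutex {
    void lock() {}
    void unlock() {}
};

#if SKETCH_THREAD_SUPPORT == 1 || SKETCH_THREAD_SUPPORT == 2
using GeneralMutex = std::mutex;   // modes 1 and 2: regular classes really lock
#else
using GeneralMutex = NullMutex;    // modes 0 and 3: regular locks are no-ops
#endif

#if SKETCH_THREAD_SUPPORT != 0
using QueueMutex = std::mutex;     // any threaded mode: the queue really locks
#else
using QueueMutex = NullMutex;
#endif

// General-purpose macros (assumed shape) and their queue-only counterparts.
#define SKETCH_MUTEX(name)         mutable GeneralMutex name
#define SKETCH_LOCK_MUTEX(name)    std::lock_guard<GeneralMutex> name##Guard(name)
#define SKETCH_WQ_MUTEX(name)      mutable QueueMutex name
#define SKETCH_WQ_LOCK_MUTEX(name) std::lock_guard<QueueMutex> name##Guard(name)

// A "regular" class: its lock vanishes in mode 3, so it is main-thread only.
class ResourceLike {
public:
    void touch() { SKETCH_LOCK_MUTEX(mMutex); ++mTouches; }
    int touches() const { return mTouches; }
private:
    SKETCH_MUTEX(mMutex);
    int mTouches = 0;
};

// The queue: the only class that still takes a real lock in mode 3.
class QueueLike {
public:
    void push(int v) { SKETCH_WQ_LOCK_MUTEX(mMutex); mLast = v; }
    int last() const { SKETCH_WQ_LOCK_MUTEX(mMutex); return mLast; }
private:
    SKETCH_WQ_MUTEX(mMutex);
    int mLast = 0;
};

// Only compiles with the default mode 3 setting used in this sketch.
int demoMode3() {
    static_assert(std::is_same<GeneralMutex, NullMutex>::value,
                  "regular classes do not lock in mode 3");
    static_assert(std::is_same<QueueMutex, std::mutex>::value,
                  "the queue still locks in mode 3");
    ResourceLike resource;
    resource.touch();
    QueueLike queue;
    queue.push(resource.touches());
    assert(queue.last() == 1);
    return queue.last();
}
```

With this shape, switching the build mode changes which classes pay for locking without touching the classes themselves.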

Thoughts? As I say, I've used this approach in client projects with OGRE_THREAD_SUPPORT=0 (basically implementing my own version of WorkQueue and container structures externally, and not using the resource paths to locate files) and it's worked really well; I'd like to standardise it. It's faster and simpler at the same time.
tuan kuranes
OGRE Retired Moderator
Posts: 2653
Joined: Wed Sep 24, 2003 8:07 am
Location: Haute Garonne, France
x 4

Re: Proposal: OGRE_THREAD_SUPPORT == 3

Post by tuan kuranes »

Same here: I've always used OGRE_THREAD_SUPPORT=0 and done my own threading of the data for resource loading.
I would definitely go for only the "3" thread model and erase the others for the sake of maintenance and code cleanliness.
sinbad
OGRE Retired Team Member
Posts: 19269
Joined: Sun Oct 06, 2002 11:19 pm
Location: Guernsey, Channel Islands
x 66

Re: Proposal: OGRE_THREAD_SUPPORT == 3

Post by sinbad »

tuan kuranes wrote:I would definitely go for only the "3" thread model and erase the others for the sake of maintenance and code cleanliness.
I'd really like to do that but I'm concerned about the impact it might have on people's existing code. It wouldn't be so bad for those using OGRE_THREAD_SUPPORT==2, but it would totally break anyone using OGRE_THREAD_SUPPORT==1 and relying on Resource's thread safety.

Anyone have an opinion on this?
stealth977
Gnoll
Posts: 638
Joined: Mon Dec 15, 2008 6:14 pm
Location: Istanbul, Turkey
x 42

Re: Proposal: OGRE_THREAD_SUPPORT == 3

Post by stealth977 »

I agree with tuan: cleaning up the core and supplying only THREADING_ON(3) or THREADING_OFF would simplify a lot of stuff.

Also, as you know better than I do, CORES are increasing rapidly and so is MEMORY bandwidth (through either more channels or higher frequency). The LOCK BASED threading model was popular when machines had just begun threading, and it was useful for preventing collisions between a few threads (can you guess the locking overhead of 8 hardware threads?).

Although locking will probably still be popular in the future too (if hardware threads continue to increase, at some point there will surely be inter-thread dependencies; I don't expect 8+ threads to each copy all the data and work on it independently, so there may be data shared between a few threads for a period), the increase in available hardware threads and memory throughput strongly favors independent lockless algorithms.

Also, I personally think that threading involving predefined locks creates a closed-circuit threading model which is usually not much help when integrating with other subsystems. OGRE using a closed-circuit, lock-based threading model is most of the time a pain to integrate with other lockless parallel thread schedulers...

Of course, there can be work depending on THREADING_SUPPORT(1), but:
1 - It may be because the developer had no other feasible CHOICE (other threading type enabled in OGRE)
2 - They may be better utilized with THREADING_SUPPORT(3)
3 - They may be easily converted?

(Well, actually, as a coincidence, I was thinking of using a multi-threaded (lockless, task-scheduled) API for the next-generation Ogitor API (Component-Based) and my biggest concern was how to integrate OGRE into it :) )

That's my 2 cents on the subject,
Ismail TARIM
Ogitor - Ogre Scene Editor
WWW:http://www.ogitor.org
Repository: https://bitbucket.org/ogitor
Zeal
Ogre Magi
Posts: 1260
Joined: Mon Aug 07, 2006 6:16 am
Location: Colorado Springs, CO USA

Re: Proposal: OGRE_THREAD_SUPPORT == 3

Post by Zeal »

I have been doing this for about a year now, and it's been working out great. I have Ogre on its own thread, running a render loop as tight as you can get...

while(OGRE=="win")
{
	// Read messages (sent from the main thread) from a lockless double-buffered queue.
	// Messages don't do expensive computations, they just update the scene state (scene nodes, etc.)
	readMessages();
	renderOneFrame();
}
No locks, no fancy pants threading on your main thread, just write your main loop in a serial fashion (since messages in the queue are guaranteed to execute FIFO). There is some slight memory overhead, and you introduce up to one frame of latency, but you can theoretically shave 16ms off your main thread.
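A rough sketch of this kind of render loop, assuming exactly one producer (the main thread) and one consumer (the render thread). Note the swap: instead of a double-buffered queue it uses a bounded single-producer/single-consumer ring buffer, which is another common lockless structure. SpscQueue, SceneState and runDemo are hypothetical names, and the frame counter stands in for renderOneFrame().

```cpp
#include <array>
#include <atomic>
#include <cassert>
#include <cstddef>
#include <functional>
#include <thread>
#include <utility>

// Bounded single-producer / single-consumer ring buffer (holds Capacity - 1 items).
template <typename T, std::size_t Capacity>
class SpscQueue {
public:
    // Producer thread only. Returns false when the ring is full.
    bool tryPush(T value) {
        const std::size_t head = mHead.load(std::memory_order_relaxed);
        const std::size_t next = (head + 1) % Capacity;
        if (next == mTail.load(std::memory_order_acquire)) return false;
        mSlots[head] = std::move(value);
        mHead.store(next, std::memory_order_release);   // publish the slot
        return true;
    }
    // Consumer thread only. Returns false when the ring is empty.
    bool tryPop(T& out) {
        const std::size_t tail = mTail.load(std::memory_order_relaxed);
        if (tail == mHead.load(std::memory_order_acquire)) return false;
        out = std::move(mSlots[tail]);
        mTail.store((tail + 1) % Capacity, std::memory_order_release);
        return true;
    }
private:
    std::array<T, Capacity> mSlots{};
    std::atomic<std::size_t> mHead{0};
    std::atomic<std::size_t> mTail{0};
};

struct SceneState { int nodeX = 0; int framesRendered = 0; };
using Message = std::function<void(SceneState&)>;

int runDemo() {
    SpscQueue<Message, 64> queue;
    std::atomic<bool> done{false};
    SceneState scene;                       // touched only by the render thread

    std::thread renderThread([&] {
        Message msg;
        for (;;) {
            // Read the stop flag first so the final drain sees every message.
            const bool finishing = done.load(std::memory_order_acquire);
            while (queue.tryPop(msg)) msg(scene);   // cheap state updates, FIFO
            ++scene.framesRendered;                 // stand-in for renderOneFrame()
            if (finishing) break;
        }
    });

    // Main thread: plain serial code that just posts messages.
    for (int i = 1; i <= 1000; ++i) {
        while (!queue.tryPush([i](SceneState& s) { s.nodeX = i; }))
            std::this_thread::yield();              // ring full, let the renderer catch up
    }
    done.store(true, std::memory_order_release);
    renderThread.join();
    assert(scene.framesRendered > 0);
    return scene.nodeX;                             // value set by the last message
}
```

Because messages are applied in FIFO order, the scene ends up reflecting the last message posted, which is the property the serial main loop relies on.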
sinbad wrote:I'd really like to do that but I'm concerned about the impact it might have on people's existing code. It wouldn't be so bad for those using OGRE_THREAD_SUPPORT==2, but it would totally break anyone using OGRE_THREAD_SUPPORT==1 and relying on Resource's thread safety.
Bah take a vote if you have to, but I would be willing to bet nobody uses that mode anyway. THREADED or NOT_THREADED should be your only two options.
joew
Greenskin
Posts: 113
Joined: Fri Nov 03, 2006 6:03 pm

Re: Proposal: OGRE_THREAD_SUPPORT == 3

Post by joew »

I definitely agree with adding THREAD_SUPPORT == 3. Looking not only at the project now but also at planned changes for the future, this type of architecture actually seems needed.
DanielSefton
Ogre Magi
Posts: 1235
Joined: Fri Oct 26, 2007 12:36 am
Location: Mountain View, CA
x 10

Re: Proposal: OGRE_THREAD_SUPPORT == 3

Post by DanielSefton »

Zeal wrote:but I would be willing to bet nobody uses that mode anyway.
I do. I use OGRE_THREAD_SUPPORT == 1 for background loading.

So sinbad, you're saying I should disable threading in Ogre and handle it externally?

What are my options if/when THREAD_SUPPORT == 3 is available?
Praetor
OGRE Retired Team Member
Posts: 3335
Joined: Tue Jun 21, 2005 8:26 pm
Location: Rochester, New York, US
x 3

Re: Proposal: OGRE_THREAD_SUPPORT == 3

Post by Praetor »

I agree with sinbad 100%. This is how Ogre should handle all internal threading. Now, if you also want to totally isolate Ogre from other systems you can set up even more threading on your own. This is really the way to go. The only thing to work out is what to do about support and compatibility. Honestly, this is what major releases are for. I think we should drop the unsatisfying threading schemes and focus on this one. Will it break some code? Definitely, but that has to be expected between major releases and this really helps to keep our maintainability.
Game Development, Engine Development, Porting
http://www.darkwindmedia.com
sinbad
OGRE Retired Team Member
Posts: 19269
Joined: Sun Oct 06, 2002 11:19 pm
Location: Guernsey, Channel Islands
x 66

Re: Proposal: OGRE_THREAD_SUPPORT == 3

Post by sinbad »

DanielSefton wrote:
Zeal wrote:but I would be willing to bet nobody uses that mode anyway.
I do. I use OGRE_THREAD_SUPPORT == 1 for background loading.

So sinbad, you're saying I should disable threading in Ogre and handle it externally?
In 1.6 yeah; the argument is that this way you have far more control over how much locking is going on. If you partition your data properly there's no need for any generalised thread safety like we've tried to make available in Resource in the past, which means less locking. You bite off chunks of data instead that are only ever accessed by one thread at a time, and you pass them around in a single-ownership fashion. It means that you don't use most Ogre classes in your background threads, just the primitives and other self-contained data classes.
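A minimal sketch of that single-ownership handover, assuming a small blocking channel as the only synchronised object. Request, Response, Channel and processInBackground are hypothetical names used for illustration; they are not Ogre's WorkQueue types.

```cpp
#include <cassert>
#include <condition_variable>
#include <cstdint>
#include <deque>
#include <memory>
#include <mutex>
#include <numeric>
#include <thread>
#include <utility>
#include <vector>

// Self-contained payloads: plain CPU-side data, nothing shared with the sender.
struct Request  { std::vector<std::uint8_t> rawBytes; };
struct Response { std::uint64_t checksum = 0; };

// The only place a lock exists: a channel that transfers ownership.
template <typename T>
class Channel {
public:
    void send(std::unique_ptr<T> item) {
        {
            std::lock_guard<std::mutex> lock(mMutex);
            mItems.push_back(std::move(item));
        }
        mReady.notify_one();
    }
    std::unique_ptr<T> receive() {
        std::unique_lock<std::mutex> lock(mMutex);
        mReady.wait(lock, [this] { return !mItems.empty(); });
        std::unique_ptr<T> item = std::move(mItems.front());
        mItems.pop_front();
        return item;
    }
private:
    std::mutex mMutex;
    std::condition_variable mReady;
    std::deque<std::unique_ptr<T>> mItems;
};

std::uint64_t processInBackground(std::vector<std::uint8_t> bytes) {
    Channel<Request> requests;
    Channel<Response> responses;

    std::thread worker([&] {
        // The worker now owns the request outright; no lock guards its contents.
        std::unique_ptr<Request> req = requests.receive();
        std::unique_ptr<Response> res(new Response);
        res->checksum = std::accumulate(req->rawBytes.begin(), req->rawBytes.end(),
                                        std::uint64_t{0});
        responses.send(std::move(res));
    });

    std::unique_ptr<Request> req(new Request);
    req->rawBytes = std::move(bytes);
    requests.send(std::move(req));
    assert(req == nullptr);     // the sending thread can no longer reach the data

    std::unique_ptr<Response> res = responses.receive();
    worker.join();
    return res->checksum;
}
```

Using std::unique_ptr makes the rule checkable: after the send, the sending thread holds a null pointer, so the chunk is only ever visible to one thread at a time.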
DanielSefton wrote:What are my options if/when THREAD_SUPPORT == 3 is available?
WorkQueue is already available for performing generalised background processing, and already encourages you to partition background data processing from accessing any GPU data. That will be extended to saying that you should not use any shared data or shared Ogre classes in these background threads too.

If you're just loading resources in the background, you won't really have to think about it that much, because the process will look mostly the same from outside; it'll just be implemented differently. But for custom processes, like I do with Terrain, there are hooks for performing background tasks and you'd have to respect the new rules. My terrain component almost does that already, so it's a decent example of what to do: data passed between threads in the Request / Response is completely isolated and therefore safe to process in threads without locking. I just need to make the opening of resource streams possible in other threads without locking (i.e. duplication of resource paths and archive instances per thread).
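One way the per-thread duplication of resource paths could work, sketched under the assumption that the main thread keeps an append-only change log and each worker replays the entries it has not seen yet. PathRegistry, WorkerPaths and PathChange are hypothetical names; the real archive and resource group classes are not modelled here.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <string>
#include <vector>

struct PathChange {
    enum Kind { Add, Remove } kind;
    std::string path;
};

// Main-thread side: every change is appended to a log, never edited in place.
class PathRegistry {
public:
    void add(const std::string& p)    { mLog.push_back({PathChange::Add, p}); }
    void remove(const std::string& p) { mLog.push_back({PathChange::Remove, p}); }
    // Copy out the changes after 'version'; the copy can travel inside a request.
    std::vector<PathChange> changesSince(std::size_t version) const {
        assert(version <= mLog.size());
        return std::vector<PathChange>(
            mLog.begin() + static_cast<std::ptrdiff_t>(version), mLog.end());
    }
private:
    std::vector<PathChange> mLog;
};

// Worker side: a private copy of the path set that no other thread touches.
class WorkerPaths {
public:
    void apply(const std::vector<PathChange>& changes) {
        for (const PathChange& c : changes) {
            if (c.kind == PathChange::Add) mPaths.insert(c.path);
            else mPaths.erase(c.path);
        }
        mVersion += changes.size();
    }
    bool contains(const std::string& p) const { return mPaths.count(p) != 0; }
    std::size_t version() const { return mVersion; }
private:
    std::set<std::string> mPaths;
    std::size_t mVersion = 0;
};

bool demoPathSync() {
    PathRegistry registry;      // would live on the main thread
    WorkerPaths worker;         // would live on one background thread
    registry.add("media/terrain");
    registry.add("media/models");
    worker.apply(registry.changesSince(worker.version()));
    registry.remove("media/models");
    // Until the next sync the worker keeps its old, private view.
    const bool staleViewKept = worker.contains("media/models");
    worker.apply(registry.changesSince(worker.version()));
    return staleViewKept && worker.contains("media/terrain")
           && !worker.contains("media/models");
}
```

The demo runs on one thread for brevity; in a threaded build the vector returned by changesSince would be handed over inside a request, so the worker never reads the registry directly.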

I'm going to leave this thread open a while and see what people think about dropping the other 2 modes and just going with this, but ideally I'd like to tackle this sooner rather than later. I'm busy until after Qt DevDays anyway but when I get back I think I'll put this high on my priority list. It would be nice to do it for 1.7 and have a clean threading slate.
sinbad
OGRE Retired Team Member
Posts: 19269
Joined: Sun Oct 06, 2002 11:19 pm
Location: Guernsey, Channel Islands
x 66

Re: Proposal: OGRE_THREAD_SUPPORT == 3

Post by sinbad »

One of the major things I should mention is that this new mode will prevent parsing of scripts in the background. Potentially we could build the AST in the background but no Ogre instances could be created from the results except in the main thread.

Also the resource listeners and manual loaders will have to change a bit.
stealth977
Gnoll
Posts: 638
Joined: Mon Dec 15, 2008 6:14 pm
Location: Istanbul, Turkey
x 42

Re: Proposal: OGRE_THREAD_SUPPORT == 3

Post by stealth977 »

Praetor wrote:The only thing to work out is what to do about support and compatibility. Honestly, this is what major releases are for. I think we should drop the unsatisfying threading schemes and focus on this one. Will it break some code? Definitely, but that has to be expected between major releases and this really helps to keep our maintainability.
sinbad wrote:One of the major things I should mention is that this new mode will prevent parsing of scripts in the background. Potentially we could build the AST in the background but no Ogre instances could be created from the results except in the main thread.
Also the resource listeners and manual loaders will have to change a bit.
If things are going to change, and change for good and possibly towards concurrent processing (or at least towards being concurrent-processing friendly), I guess making the changes now in 1.7 would be a great opportunity...
Ismail TARIM
Ogitor - Ogre Scene Editor
WWW:http://www.ogitor.org
Repository: https://bitbucket.org/ogitor
Praetor
OGRE Retired Team Member
Posts: 3335
Joined: Tue Jun 21, 2005 8:26 pm
Location: Rochester, New York, US
x 3

Re: Proposal: OGRE_THREAD_SUPPORT == 3

Post by Praetor »

The script processing could still be done in the background by creating special cache objects which then get synced later. So, the AST is a fairly compact structure which mimics well the objects it gets turned into, but it still takes quite a bit of processing. If we wanted to do more loading in the background then new cache structures can be created and filled, then quickly translated into their final form in the main thread.
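A small sketch of that split, assuming a toy "name=value" script format. ParsedEntry plays the role of the cache structure and MainThreadRegistry stands in for whatever manager must only be touched from the main thread; both are hypothetical and much simpler than Ogre's script compiler and AST.

```cpp
#include <cassert>
#include <cstddef>
#include <future>
#include <map>
#include <sstream>
#include <string>
#include <vector>

// Plain cache record produced off the main thread. No engine types inside.
struct ParsedEntry {
    std::string name;
    float value;
};

// Background-safe: works only on its own copy of the text and local data.
std::vector<ParsedEntry> parseScript(std::string text) {
    std::vector<ParsedEntry> out;
    std::istringstream lines(text);
    std::string line;
    while (std::getline(lines, line)) {
        const std::string::size_type eq = line.find('=');
        if (eq == std::string::npos) continue;          // skip malformed lines
        out.push_back({line.substr(0, eq), std::stof(line.substr(eq + 1))});
    }
    return out;
}

// Stand-in for a manager that may only be used from the main thread.
class MainThreadRegistry {
public:
    void create(const ParsedEntry& e) { mObjects[e.name] = e.value; }
    std::size_t size() const { return mObjects.size(); }
    float get(const std::string& name) const { return mObjects.at(name); }
private:
    std::map<std::string, float> mObjects;
};

std::size_t loadScript(const std::string& text, MainThreadRegistry& registry) {
    // Expensive part: parsing runs on another thread.
    std::future<std::vector<ParsedEntry>> pending =
        std::async(std::launch::async, parseScript, text);
    // Cheap part: the calling thread turns cache records into real objects.
    for (const ParsedEntry& e : pending.get()) registry.create(e);
    assert(registry.size() > 0 || text.empty() || text.find('=') == std::string::npos);
    return registry.size();
}
```

Here the caller blocks on get() straight away to keep the example short; a real loader would presumably poll or be notified, and do the translation step when the result arrives.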
Game Development, Engine Development, Porting
http://www.darkwindmedia.com
tp
Halfling
Posts: 40
Joined: Sat Dec 09, 2006 9:06 am

Re: Proposal: OGRE_THREAD_SUPPORT == 3

Post by tp »

I'm going to try and chip in from an "average" Ogre user's perspective. I personally am decently formally schooled in these matters, but my primary interests are not in the latest fads on getting that 1% more FPS. I generally want to build stuff that runs on older computers as well, and concentrate on getting a good end result (and I've found Ogre as a whole a very good tool for someone like that).

In approaching this threading issue, I think there are two problems.

1. Different needs, different tools

Users use the different threading possibilities for at least two different purposes. Some use them to improve the user experience of their applications, which most of the time means loading data in the background so that the user interface remains responsive. No FPS calculations, thread utilization profiling or locking system concerns affect these people. They just want to make sure their games run smoothly. Others use them to make sure their architecture is up to speed and that they can make game design decisions based on factors internal to the engine, theirs or Ogre's.

Due to point two (which I'll get to shortly), I am mostly in the former camp myself. The concerns here are that it'll get harder to do simple stuff and you might not even be able to do it anymore.
sinbad wrote:One of the major things I should mention is that this new mode will prevent parsing of scripts in the background.
My belief is that lots of people in the "thread to make the game work better" group (i.e. the first group I mentioned) are, at best, very concerned, and at worst, losing their faith, when they read something like this. It's really a pity, because these people might be interested...

2. Uhh.. weighted score or median?

If I had to, I would give Ogre as a whole a score of about 90% perhaps (very good in my book). Contrary to many other open source projects, the docs would get a 90% as well. The documentation and information about how Ogre works with and without threading, what you can do about it and which parts of the library are thread safe: 5%. I mean, the best source of information is the patch notes where the different OGRE_THREAD_SUPPORT modes were published.

Throwing some concrete information out there would really help in evaluating what these kinds of changes mean to me and my project. One of the first questions I myself would love to have answered is what exactly would I need to do to get my level data from a resource group loaded in a background thread if the proposed mode 3 is implemented. How would that change from how it is now? Can I load a mesh that is not yet used by any entity from a resource safely outside the rendering thread if OGRE_THREAD_SUPPORT is 1? What about if it is 3? Or 0? I have to go into the code to get any answers. (And I have, to some extent, I'm just trying to make a point)

A nice start would be some documentation detailing what you need to do to get the results given by using OGRE_THREAD_SUPPORT == 0 to work with a separate loader thread, and others mentioned in this thread that you can do already.


I understand the aspirations of the potential new system, and I think there are times when breaking changes are good and necessary even in an established project. It's likely, though, that many people are put off by this simply because they don't have the reference to compare it to.
xavier
OGRE Retired Moderator
Posts: 9481
Joined: Fri Feb 18, 2005 2:03 am
Location: Dublin, CA, US
x 22

Re: Proposal: OGRE_THREAD_SUPPORT == 3

Post by xavier »

tp wrote: Users use the different threading possibilities for at least two different purposes. Some use them to improve the user experience of their applications, which most of the time means loading data in the background so that the user interface remains responsive. No FPS calculations, thread utilization profiling or locking system concerns affect these people. They just want to make sure their games run smoothly. Others use them to make sure their architecture is up to speed and that they can make game design decisions based on factors internal to the engine, theirs or Ogre's.
There is nothing a tasking approach implies that runs counter to these goals. Internally, the same load requests that came through a statically-scheduled thread can still come through off a task; you (Ogre devs) could still lock the same resource structures that are locked now, but my guess is that a better approach that meets the same goals, and does so within the context of a single task-based system, will likely be designed.

In other words, just because 1 & 2 might go away as explicit defines in a config file, doesn't mean that what they do is end-of-lifed; it will just be done a different way internally.
Do you need help? What have you tried?

Angels can fly because they take themselves lightly.
CABAListic
OGRE Retired Team Member
Posts: 2903
Joined: Thu Jan 18, 2007 2:48 pm
x 58

Re: Proposal: OGRE_THREAD_SUPPORT == 3

Post by CABAListic »

Ogre's threading options need simplification, imho, so I'm all for removing mode 1 and 2 in favour of 3. It's going to be a maintenance nightmare to support all 3.
It might, however, be a good idea to plan some time to get the new mode stable, so I'd vote for postponing it to 1.8.
Souvarine
Halfling
Posts: 79
Joined: Mon Apr 28, 2008 12:01 am
Location: France
x 5

Re: Proposal: OGRE_THREAD_SUPPORT == 3

Post by Souvarine »

So, if I use Ogre with OGRE_THREAD_SUPPORT=3, will I be able to create my own requests and push them into the WorkQueue (to process some AI, for example)? Or is the WorkQueue for Ogre internal usage only? Or maybe it is dedicated to resource loading only?
Puzzle Platform Race
Open source action reflection platform game
xavier
OGRE Retired Moderator
Posts: 9481
Joined: Fri Feb 18, 2005 2:03 am
Location: Dublin, CA, US
x 22

Re: Proposal: OGRE_THREAD_SUPPORT == 3

Post by xavier »

Souvarine wrote:So, if I use Ogre with OGRE_THREAD_SUPPORT=3, will I be able to create my own requests and push them into the WorkQueue (to process some AI, for example)? Or is the WorkQueue for Ogre internal usage only? Or maybe it is dedicated to resource loading only?
Access to it is obtained through Root -- it's for anyone's use. For anything larger than a trivial 3D app, however, you probably wouldn't want to use Ogre as your task scheduler -- instead, you'd probably have Ogre use the one that the rest of your application is already using.
Do you need help? What have you tried?

Angels can fly because they take themselves lightly.
_tommo_
Gnoll
Posts: 677
Joined: Tue Sep 19, 2006 6:09 pm
x 5

Re: Proposal: OGRE_THREAD_SUPPORT == 3

Post by _tommo_ »

I'll throw my random 2 cents in:

For me Ogre tends to be a bit "all-wrapping" and "framework-like"... I mean, even with the enormous effort spent on plug-in systems, it is often difficult to partition it from the outside, as OGRE_THREAD_SUPPORT 0, 1 and 2 showed well enough.

To me, Ogre SHOULD NOT thread itself in any way; not 1, 2 or 3.

1) It pushes on the final developer an ever-increasing level of complexity (workqueue, requesthandler, TBB... "wtf are those");

2) It forces you to understand and wrap yet another threading system that is useless from the final app's standpoint, and which often enough has already been implemented in the project; I could just as well already be using TBB, CnC, something I made for Xbox360, etc... this already happens with Memory Allocators, where PhysX uses my allocator just by calling a runtime method, while Ogre reimplements a copy of everything and requires a recompile.

3) As task issuing is internally managed, different work scheduling has to be forced through recompilation from SVN, tweaking and configuring... what if I think that the Scene Graph is good enough when single-threaded, even in OTS3? What if I want to thread ONLY animations in OTS0?

4) As said before, changing settings requires modifications to the source, which in stable team settings is quite a let-down for me... it turns a standard setup&forget SDK into another piece of code to maintain.

All those points are anyway just facets of one big problem: You (the graphics lib, one of many) are forcing on Me (the user) yet another threading scheme, offering me a mere 2 options: system threading or yet-to-be-mainstream job swarming.
Those choices could push me down one of two dangerous routes: locking the whole of Ogre to a thread and forgetting it there (which, as extensively explained, is the Evil), or going the Vector3 way and contaminating my whole code with Ogre-inherited threading classes, wrapping my app around Ogre rather than the opposite (what? I push AI to a graphics library's thread scheduler? That sounds really bad).

So, what am I proposing here?
I'm proposing that Ogre should only PASSIVELY provide threading:
it should expose itself, and more importantly its internal tasks, in a way that is easy to pick up and integrate into my own centralised thread scheduler, but without imposing any particular threading scheme on the user;
I don't push requests to Ogre's workQueues to run, rather I pull Ogre's requests and run them as I want, where I want, on the thread I want... as long as I respect their dependencies.
Of course, You can build any default work scheme on top of this, but at the same time (should I have too much time on my hands :wink: ) I could create a work scheme where Ogre does nothing without passing its tasks to my scheduler.
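A minimal sketch of such a pull model, assuming the library only describes its tasks and their dependencies and never starts a thread itself. PassiveLibrary, Task and the task names are all hypothetical; nothing like this interface is claimed to exist in Ogre.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>
#include <utility>
#include <vector>

// A unit of work the library wants done, plus the tasks it must wait for.
struct Task {
    std::string name;
    std::vector<std::size_t> dependsOn;   // ids of tasks that must finish first
    std::function<void()> run;
};

// Hypothetical "passive" library: it hands out work but owns no threads.
class PassiveLibrary {
public:
    std::size_t addTask(Task t) {
        mTasks.push_back(std::move(t));
        mState.push_back(Pending);
        return mTasks.size() - 1;
    }
    // Hand out every pending task whose dependencies have all completed.
    std::vector<std::size_t> pullReady() {
        std::vector<std::size_t> ready;
        for (std::size_t i = 0; i < mTasks.size(); ++i) {
            if (mState[i] != Pending) continue;
            bool ok = true;
            for (std::size_t d : mTasks[i].dependsOn) ok = ok && (mState[d] == Done);
            if (ok) { mState[i] = Issued; ready.push_back(i); }
        }
        return ready;
    }
    void execute(std::size_t id)  { mTasks[id].run(); }   // the app decides when and where
    void markDone(std::size_t id) { mState[id] = Done; }
private:
    enum State { Pending, Issued, Done };
    std::vector<Task> mTasks;
    std::vector<State> mState;
};

std::string runWithAppScheduler() {
    PassiveLibrary lib;
    std::string order;
    const std::size_t cull = lib.addTask({"cull", {}, [&order] { order += "C"; }});
    const std::size_t anim = lib.addTask({"animate", {}, [&order] { order += "A"; }});
    lib.addTask({"render", {cull, anim}, [&order] { order += "R"; }});

    // This serial loop stands in for the application's own scheduler.
    for (;;) {
        const std::vector<std::size_t> ready = lib.pullReady();
        if (ready.empty()) break;
        for (std::size_t id : ready) { lib.execute(id); lib.markDone(id); }
    }
    assert(order.size() == 3);
    return order;
}
```

The loop here is deliberately serial; the point is only that the application, not the library, decides which thread each ready task runs on, and a default threaded scheme could be layered on the same interface.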

Anyway, because of problem 1) I'm very far from understanding the intended workflow for Ogre threading, so you could just as well have been talking about this without my noticing :roll:
OverMindGames Blog
IndieVault.it: Il nuovo portale italiano su Game Dev & Indie Games
steven
Gnoll
Posts: 657
Joined: Mon Feb 28, 2005 1:53 pm
Location: Australia - Canberra (ex - Switzerland - Geneva)

Re: Proposal: OGRE_THREAD_SUPPORT == 3

Post by steven »

tp wrote:Throwing some concrete information out there would really help in evaluating what these kinds of changes mean to me and my project. One of the first questions I myself would love to have answered is what exactly would I need to do to get my level data from a resource group loaded in a background thread if the proposed mode 3 is implemented. How would that change from how it is now? Can I load a mesh that is not yet used by any entity from a resource safely outside the rendering thread if OGRE_THREAD_SUPPORT is 1? What about if it is 3? Or 0? I have to go into the code to get any answers. (And I have, to some extent, I'm just trying to make a point)
CABAListic wrote:Ogre's threading options need simplification, imho, so I'm all for removing mode 1 and 2 in favour of 3. It's going to be a maintenance nightmare to support all 3.
_tommo_ wrote:1)It pushes on the final developer an ever-increasing level of complexity (workqueue, requesthandler, TBB... "wtf are those");
I couldn't agree more with those three comments.

For me Ogre is just a library that I use, hence I don't want Ogre to create threads I don't have control over.
If all libraries do the same (physics, AI, network, etc.) you end up with dozens of competing threads - and usually you can't set the CPU affinity of those.

Hence I strongly hope that OGRE_THREAD_SUPPORT (==3) will come with an Ogre sample using an external workqueue / tasks / threads scheduler (or whatever name you want to call it). I think you will agree that a sample that loads terrain & assets in the background and perhaps uses threaded animations would help nearly everyone wanting thread support.


_tommo_ wrote:2)Forces to understand and wrap yet another useless (from the final app standpoint) threading system, which often enough has already been implemented in the project; i could as well be already using TBB, CnC, something made by me for Xbox360, etc... this already happens with Memory Allocators... where PhysX uses my allocator just calling a runtime method, while Ogre reimplements a copy of everything and requires a recompile.
Same idea: a sample using an external memory allocator would be nice :)
sparkprime
Ogre Magi
Posts: 1137
Joined: Mon May 07, 2007 3:43 am
Location: Ossining, New York
x 13

Re: Proposal: OGRE_THREAD_SUPPORT == 3

Post by sparkprime »

Certainly I think having lots of objects that are 'shared' and protected with locks is a bad idea -- you often find yourself hopping from object to object in an OO framework and avoiding deadlocks becomes very hard.

Interestingly, there is not a huge amount of parallelism in OGRE. In fact, when I did the OGRE_THREAD_SUPPORT==2 stuff, from what I could tell the only 'parallelism' was the blocked IO thread. By chance, the decompressing of ZIP files occurred behind the IO call in question so that was run in parallel, but that is just the exception to the rule. So the extra threading existed only for the purposes of implementing non-blocking IO. It would be possible to tear out the threading and replace it with system-specific non-blocking IO calls, and I recall this was discussed at the time.

So, for not much parallelism, we paid a high cost in terms of linear locking overhead (this is not the same as the multicore contention problems someone mentioned earlier). In other words, we were only using 1 core but were doing L2 cache misses all the time due to the locking code. We were also risking deadlock due to trying to retrofit thread-safety by just giving a lock to every object we ended up touching.

Nowadays I gather there is more going on -- there is some number crunching going on as part of specific algorithms, and this crunching is being farmed out to a number of threads using a work queue to load balance and maximise throughput.

These 2 uses of threads (non-blocking IO and work queue) are completely different. Work queues are often perceived to be much more efficient as they are used to solve embarrassingly parallel problems where there are very few synchronisation constraints. The set of problems that fall into this space is quite small (although growing) but one can arrange it so that the majority of FLOPS are spent doing such algorithms and thus the system as a whole utilises the cores quite well. There is no requirement to avoid locks here -- it's just a simple ratio of time spent synchronising over time spent doing useful work. The more data is contended, the less useful work you can do. Duplicating data to reduce contention therefore makes complete sense, but it's a general technique and somewhat orthogonal to the question of the threading model used.

Anyway, there should definitely be a work queue -- but the user should be able to trap the jobs and execute them in their own work queue. Time consuming things like decompressing zip and compiling big shaders should be done with this. It load balances the whole system nicely. Getting external libraries to play ball might be difficult though. Many calls that are currently synchronous would need to become asynchronous.

There should definitely be non-blocking IO, whether implemented with threads or with system support I don't think it really matters. It should preferably be contained, anyway. Having a separate ResourceBackgroundQueue is probably not the right way to go anymore. A call to load() in the main rendering thread should do all the magic asynchronously.

I'm not sure whether the message passing style you propose is the magic bullet, however. It's good for separating memory. But the overheads of copying can often drown out the overheads of locks. Passing of ownership between threads cuts down on the copying, but it's easy to get race conditions if you leave a dangling pointer in the wrong thread. You can still get deadlocks too, if the message protocol gets too complicated. Just as with locks, you don't get deadlocks at first, but as the system grows and the interdependencies become more complicated, they start cropping up. I think it's essentially no different from locks in terms of programmability, which would explain why the take-up has been so small despite the hype.

I have a background loading thread that loads the closest resources to the camera first. One submits a request to the background loader from the rendering thread. Since objects move, their distance needs to be recalculated and updated in the background loader. With a message-based system you'd have to send update messages and process them. All I did is have a volatile float field that I updated every frame. Trivially simple and the fastest solution possible. I think you just have to mix/match things as appropriate rather than defining a single model for all.
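
A sketch of that shared-field approach, under some assumptions: `LoadRequest`, `updateDistance` and `closer` are hypothetical names, and `std::atomic<float>` stands in for the `volatile float` because standard C++ does not guarantee that `volatile` accesses are atomic across threads. With relaxed ordering the atomic typically compiles down to plain loads and stores on common hardware, so the cost should be comparable.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical request record shared between the render thread and the loader.
struct LoadRequest {
    // Written by the render thread every frame, read by the loader thread.
    std::atomic<float> distanceToCamera{0.0f};
};

// Render thread: publish the latest distance. Relaxed ordering is enough
// because the value is only a scheduling hint and guards no other data.
inline void updateDistance(LoadRequest& req, float distance) {
    assert(distance >= 0.0f);  // distances are non-negative in this sketch
    req.distanceToCamera.store(distance, std::memory_order_relaxed);
}

// Loader thread: of two pending requests, pick the one closer to the camera.
inline LoadRequest& closer(LoadRequest& a, LoadRequest& b) {
    const float da = a.distanceToCamera.load(std::memory_order_relaxed);
    const float db = b.distanceToCamera.load(std::memory_order_relaxed);
    return (da <= db) ? a : b;
}
```

The loader may act on a distance that is one frame stale, which is harmless when the value only influences load order.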

We could easily reduce the overheads of locking in OGRE anyway though, by switching to a lock-free implementation of SharedPtr. This is only hard because of all the subclasses of SharedPtr and is a lot easier than what you propose. I think you'd only get an improvement of a few FPS though, as the locks are not that big a deal anymore. It seems you're more interested in refactoring and cleanup than squeezing out the last few drops of performance.
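
For illustration only, the core of a lock-free reference count might look like the sketch below. `CountedPtr` is a hypothetical stand-in, not Ogre's SharedPtr, and it ignores the subclass problem mentioned above. It assumes C++11 atomics are available.

```cpp
#include <atomic>
#include <cassert>
#include <utility>

// Hypothetical reference-counted pointer whose count is updated atomically
// instead of under a mutex.
template <typename T>
class CountedPtr {
public:
    explicit CountedPtr(T* p) : mPtr(p), mCount(new std::atomic<unsigned>(1)) {}

    // Copying only needs an atomic increment; no lock is taken.
    CountedPtr(const CountedPtr& other) : mPtr(other.mPtr), mCount(other.mCount) {
        mCount->fetch_add(1, std::memory_order_relaxed);
    }

    // Copy-and-swap: the by-value parameter holds the new reference, and its
    // destructor releases the old one.
    CountedPtr& operator=(CountedPtr other) {
        std::swap(mPtr, other.mPtr);
        std::swap(mCount, other.mCount);
        return *this;
    }

    ~CountedPtr() {
        // acq_rel makes all earlier writes to the object visible to the
        // thread that ends up deleting it.
        if (mCount->fetch_sub(1, std::memory_order_acq_rel) == 1) {
            delete mPtr;
            delete mCount;
        }
    }

    T& operator*() const {
        assert(mPtr != nullptr);
        return *mPtr;
    }

    unsigned useCount() const { return mCount->load(std::memory_order_relaxed); }

private:
    T* mPtr;
    std::atomic<unsigned>* mCount;
};
```

As with `std::shared_ptr`, this makes the count safe to update from several threads, but it does not make it safe for two threads to modify the same `CountedPtr` instance at once.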
Praetor
OGRE Retired Team Member
Posts: 3335
Joined: Tue Jun 21, 2005 8:26 pm
Location: Rochester, New York, US
x 3

Re: Proposal: OGRE_THREAD_SUPPORT == 3

Post by Praetor »

The implementation of Task-based scheduling does not preclude you from having total control. Here are a few concerns hopefully addressed:

1) You can swap in your own WorkQueue for the default one, which means that if you already do task-based threading in your system you can have Ogre use your own.

2) About Ogre "forcing" anything on you. We try to have 2 levels: a default system in place that handles a good amount of cases for people and then a level which lets the developer take more control. After this new threading system is in place we want it to handle the average case for most people. However, nothing prevents you from turning off threading completely. In that case you can now take over and thread or not thread as you please. The key will be that even without threading enabled, Ogre needs to have enough elements that could be threaded so that developers can take advantage of them.

This is really what we've discussed several times before in relation to Ogre 2.0. We wanted to make sure that any large areas of Ogre that could be helped by threading were made to support it. What we're discussing now is that this new threading mode will provide some default threading implementations to take advantage of it.

I am one of those that currently uses OGRE_THREAD_SUPPORT = 0 and I do all my own threading.
sinbad
OGRE Retired Team Member
Posts: 19269
Joined: Sun Oct 06, 2002 11:19 pm
Location: Guernsey, Channel Islands
x 66

Re: Proposal: OGRE_THREAD_SUPPORT == 3

Post by sinbad »

Praetor summed this up pretty well. To respond individually to some of the points:
sparkprime wrote:So, for not much parallelism, we paid a high cost in terms of linear locking overhead
Bingo. Basically, I'm proposing a change because the existing approaches require that if you enable threading in OGRE at all, too many classes need to become thread safe, which means a lot of locking regardless of how much parallel work you actually do, because the API is trying to make itself generally threadsafe in certain areas (mostly Resource and the classes it uses).

Not only is this inefficient, it makes the code complex and difficult to maintain, and it's very hard to describe accurately to the user what they can and can't do thread-safely, which is where this comes from:
tp wrote:The documentation and information about how Ogre works with and without threading, what you can do about it and which parts of the library are thread safe: 5%. I mean, the best source of information are the patch notes where the different OGRE_THREAD_SUPPORT modes were published.
I completely agree. And this is one of the things I want to fix with this new mode, because not only is it simpler to implement, it's also a lot easier to explain and put a box around for users to grasp.
tp wrote:(Script parsing): My belief is that lots of people in the "thread to make the game work better" group (i.e. the first group I mentioned) are, at best, very concerned, and at worst, losing their faith, when they read something like this. It's really a pity, because these people might be interested...
As Praetor said, it's still possible to do the script parsing in the background; it's just that building the Materials etc can't be done there. I raised it because it's one of the trickier areas and highlights the difference between the two approaches - ie the new one is entirely data-driven and not about trying to make the entire API thread safe.

I've implemented several apps now where I've built Ogre with OGRE_THREAD_SUPPORT=0 and threaded I/O and other processing outside of that, in order to give the user an uninterrupted experience. The lack of locking in Ogre has also meant I've been able to get better FPS in those apps than if I'd used OGRE_THREAD_SUPPORT=1/2. I raised this because I'd like other people to be able to access that kind of flexibility more easily. You shouldn't assume that the change will somehow remove the ability to prevent hiccups when loading - it will continue to provide that ability, just slightly differently, with better overall performance if anything.
steven wrote:For me Ogre is just a library that I use hence I don't want that ogre creates threads I don't have control over.
If all libraries do the same (physics, Ai, network, etc) you end up with dozens of competing threads - and usually you can't set the cpu affinity of those.
We will allow that, and in fact we always have. There has always been the option to tell OGRE not to start its own threads and for you to drive the background processes yourself; that won't change.
steven wrote:Hence I strongly hope that OGRE_THREAD_SUPPORT (==3) will come with an Ogre sample using an external workqueue / tasks / threads scheduler (or any name you want to call it). I think you will agree that a sample that load in the background terrain & assets and perhaps use threaded animations would help nearly everyone wanting thread support.
Check trunk. The current Terrain system already uses externally registered handlers for background tasks. It does use the standard WorkQueue and threads, and we still have our own structures for Requests / Responses, but that's because we need them if we're going to request any background processing ourselves. It's still perfectly possible to externalise the handling of those requests/responses and the starting of threads, but we still need a hub to hang our own tasks off. It would be pointless for me to engineer a situation where I needed to create another external thread system in an Ogre sample, but we will always allow it (and we do already - WorkQueue is entirely, trivially replaceable with your own implementation).
_tommo_ wrote:I'm proposing that Ogre should only PASSIVELY provide threading:
it should provide itself and more importantly its internal tasks in a way that is easy to pick up and integrate in my own centralised thread scheduler, but without imposing any particular threading scheme on the user;
i don't push request to Ogre's workQueues to run, rather i pull Ogre's requests and run them as i want, where i want, on the thread i want... as long that i respect their dependencies.
Of course, You can build any default work scheme on top of this, but at the same time (should i have too much time in my hands :wink: ) i could create a work scheme where ogre does nothing without passing its tasks to my scheduler.
This would be fine except that lots of users don't want it this way. I'm quite happy to implement my own threading outside of Ogre, as are you, and as is steven. But you shouldn't assume for a second that's a universal opinion - in fact far more users just want to say 'load things in the background please' and forget about it. By implementing your option exclusively, I'd be saying to people "You can have parallelism in Ogre, but only if you know how to implement the guts of the process in your own app". That's not acceptable to me, or to a large number of our users.

Instead, I want to allow both modes. By default, the 'easy' mode where Ogre drives the WorkQueue for you will be used. I know you don't like that, but large numbers of other people do. For those that want to drive the tasks externally, you'll be able to do that too - either by plugging in your own WorkQueue implementation which simply delegates to your own task queues, or by using the existing queue but telling OGRE not to start any threads, so the items on it remain unprocessed until your threads deal with them.
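
The second option (a queue that starts no threads and is drained by the application) could be sketched as follows. `PassiveWorkQueue`, `addRequest` and `processPending` are hypothetical names for illustration and are not the Ogre::WorkQueue interface; the real headers are the place to look for the actual API.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <mutex>
#include <queue>
#include <utility>

// Hypothetical queue, NOT Ogre::WorkQueue: it never starts a thread itself.
class PassiveWorkQueue {
public:
    using Task = std::function<void()>;

    // Called by the library to enqueue background work.
    void addRequest(Task task) {
        assert(task != nullptr);  // empty tasks are rejected in this sketch
        std::lock_guard<std::mutex> lock(mMutex);
        mTasks.push(std::move(task));
    }

    // Called by the application from whichever thread it chooses.
    // Runs at most maxTasks tasks and returns how many actually ran.
    std::size_t processPending(std::size_t maxTasks) {
        std::size_t ran = 0;
        while (ran < maxTasks) {
            Task task;
            {
                std::lock_guard<std::mutex> lock(mMutex);
                if (mTasks.empty()) {
                    break;
                }
                task = std::move(mTasks.front());
                mTasks.pop();
            }
            task();  // run outside the lock so a task may enqueue more work
            ++ran;
        }
        return ran;
    }

private:
    std::mutex mMutex;
    std::queue<Task> mTasks;
};
```

An application would call `processPending` from its own scheduler or worker threads, at whatever rate and priority it chooses; until it does, queued items simply stay unprocessed.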

At the end of the day, there needs to be a default threading implementation because our samples already use it, and in future pushing more tasks across cores is going to be more rather than less common, even in the core. I definitely want to keep the user in control of that if they want to be, using their own thread pools and prioritisation strategies, but that doesn't for a second mean that there shouldn't be a default that more modest users can just use out of the box, and that we can use in our samples. The two are not mutually exclusive in the slightest.
sparkprime wrote:I have a background loading thread that loads the closest resources to the camera first. One submits a request to the background loader from the rendering thread. Since objects move, their distance needs to be recalculated and updated in the background loader. With a message-based system you'd have to send update messages and process them. All I did is have a volatile float field that I updated every frame. Trivially simple and the fastest solution possible. I think you just have to mix/match things as appropriate rather than defining a single model for all.
And if you choose, you could still share memory across your threads if you know that special case is safe. What I'm proposing here is that the general rule is that no general API call is threadsafe anymore, and that we thread loading (and other tasks like terrain calculations, as I do already) without requiring that. At a stroke this removes all of the locking that was mostly unnecessary but had to be there just in case because the API was generally threadsafe in the resource areas. You can of course still special case some things if you know they're safe, but that's very different to assuming general thread safety. When I've used this approach in external applications I've special-cased shared memory sometimes too, and that's fine, but because the core was never threadsafe I didn't incur all the safety checks that reduced my performance unnecessarily.

So to sum up, I'm proposing this new mode because:
  • It's more efficient when threading is enabled
  • It's easier to explain to users
  • It's easier to extend to more tasks without propagating an unlimited number of locks throughout the API
  • It's easier for users to integrate into their own thread systems; because it's all going through one place (WorkQueue) and the base principles are easier to explain
  • It's easier to maintain / less error prone (only if we get rid of the other modes)
sinbad
OGRE Retired Team Member
Posts: 19269
Joined: Sun Oct 06, 2002 11:19 pm
Location: Guernsey, Channel Islands
x 66

Re: Proposal: OGRE_THREAD_SUPPORT == 3

Post by sinbad »

Regarding when to include it (1.7 or 1.8), one of the reasons I raised it is that my new Terrain component is currently a bit hobbled by having to operate under OGRE_THREAD_SUPPORT=1/2 when in fact it never needs the locking behaviour that comes with those modes. The terrain component loads data, processes LOD and calculates normals and lightmaps in the background without ever needing anything else to be threadsafe (except WorkQueue and the file opening, the latter of which I mentioned as one area that would need looking at). Obviously this was influenced by my experience with externally threaded apps operating with OGRE_THREAD_SUPPORT=0. Unfortunately, because I had to shoehorn it into the existing codebase, it currently relies on OGRE_THREAD_SUPPORT=1/2. I've been using 2 because it's the lesser of 2 evils, but it still has a cost.

I suggest starting a new branch in which to implement the new approach, and reviewing how stable it is at the point we want to go live with 1.7. Then we can defer it to 1.8 if it's not ready. I'll wait until the GSoC branches are merged in though to avoid having lots of merge complexity.
tp
Halfling
Posts: 40
Joined: Sat Dec 09, 2006 9:06 am

Re: Proposal: OGRE_THREAD_SUPPORT == 3

Post by tp »

tuan kuranes wrote:Same here, always used OGRE_THREAD_SUPPORT=0 and threading my own data to resource loading.
Praetor wrote:I am one of those that currently uses OGRE_THREAD_SUPPORT = 0 and I do all my own threading.
Is there any chance of getting a brief step-by-step rundown of how to work with the current Ogre to get this done for resource background loading? That might be a simple enough case to understand, and might give more insight into where things are and what to expect.
xavier
OGRE Retired Moderator
Posts: 9481
Joined: Fri Feb 18, 2005 2:03 am
Location: Dublin, CA, US
x 22

Re: Proposal: OGRE_THREAD_SUPPORT == 3

Post by xavier »

Regarding things Ogre might consider for future direction:


At GameFest 2008, Emergent presented a way to enhance parallel rendering on serial devices (DX9, etc).

http://www.microsoft.com/downloads/deta ... laylang=en

Nice thing is, they open-sourced this "command buffer" implementation (link is at the end of the slides).