I am including this link from the last thread as it seems important:

dark_sylinc wrote:
OK!! I added the tasks needed for Ogre 2.x
They're in the WIKI PAGE.

From page 17 of the slides in the post: http://www.ogre3d.org/forums/viewtopic. ... 50#p478276

dark_sylinc wrote:
A thing about tasks (aka. “Short-lived threads”):

Different people have differing definitions of Tasks/Blocks/WorkUnits/microthreading/whatever, which can make this discussion difficult. Many task-scheduler-based threading solutions launch very few threads and run multiple blocks/WorkUnits/whatever in each thread before its destruction. The active research in this area makes it hard to settle on definitions, and having few good definitions makes discussion difficult.
In my experience, on modern systems much of the cost of creating a thread comes from it being a system call that at some point needs to allocate memory. A system that spawns hundreds of threads on modern commodity hardware is too broken to be considered for any performance-intensive application, but only because it creates a large number of needless objects. I think there are some situations in which it may be faster to create/destroy threads than to use other synchronization primitives, but in general I agree that fewer creations of short-lived objects on the heap is a good thing.
From page 14 of the slides in the post: http://www.ogre3d.org/forums/viewtopic. ... 50#p478276

dark_sylinc wrote:
There is no guarantee parents are updated in the same thread as their children.
Doing so would be more trouble than it's worth. (But if real world code proves it isn't
hard to guarantee this order → no syncs would be needed until the end)

How much work is in each of those boxes (on page 14)? If there is a significant amount of work, then it may be possible to put each of them into a Task/WorkUnit in a system that respects data dependencies and does not create unneeded threads. If it is just a few calculations, I may have an idea. Is it possible to easily group the nodes in some way so that they can be worked on and the dependency is respected because the grouping implies thread safety? (Please forgive my ignorance of Ogre.) For example, if the nodes are in a hierarchy where each node has one parent and N children, there is an implied guarantee. This would let us logically split the hierarchy near its base into X groups. We could keep each child with its parents (and grandparents, and so on up to the split), and each group would be worked on by a single thread. This requires no synchronization until the work ends, and naturally there will be no race conditions.
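To make the grouping idea concrete, here is a minimal sketch. Everything in it is an assumption for illustration: `Node`, `updateSubtree` and `updateHierarchy` are hypothetical names and not Ogre's API, a single float stands in for a transform, and the split is made directly below the root. Threads are created per call only to keep the example short.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <thread>
#include <vector>

// Hypothetical stand-in for a scene node; this is NOT Ogre's Node class.
struct Node {
    float localOffset = 0.0f;   // stand-in for a local transform
    float derivedOffset = 0.0f; // stand-in for the derived (world) transform
    std::vector<std::unique_ptr<Node>> children;

    Node* addChild(float offset) {
        children.push_back(std::make_unique<Node>());
        children.back()->localOffset = offset;
        return children.back().get();
    }
};

// Update a node from its parent's derived value, then recurse into its children.
// Everything below `node` is written only by the calling thread.
void updateSubtree(Node& node, float parentDerived) {
    node.derivedOffset = parentDerived + node.localOffset;
    for (auto& child : node.children)
        updateSubtree(*child, node.derivedOffset);
}

// Update the root on the calling thread, then give each worker a disjoint set
// of the root's child subtrees. No locks are used; join() is the only sync point.
void updateHierarchy(Node& root, std::size_t threadCount) {
    assert(threadCount > 0);
    root.derivedOffset = root.localOffset;

    std::vector<std::thread> workers;
    for (std::size_t t = 0; t < threadCount; ++t) {
        workers.emplace_back([&root, t, threadCount] {
            // Worker t owns subtrees t, t + threadCount, t + 2 * threadCount, ...
            for (std::size_t i = t; i < root.children.size(); i += threadCount)
                updateSubtree(*root.children[i], root.derivedOffset);
        });
    }
    for (auto& worker : workers)
        worker.join();
}
```

Because a parent and all of its descendants stay in the same group, each parent is always updated before its children by the same thread. The obvious weakness is balance: if one subtree holds most of the nodes, the other threads finish early and sit idle until the join.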
Towards the end of your slides you mention that entities with animations must be handled in a different fashion. With these complications and the culling issues you mentioned in your previous response, it seems that some things to be rendered follow very different code paths. If that is the case, then a more nuanced view of threading than we have discussed might be required. Is there a document that describes these processes in more detail than your slides, so I could make more intelligent and specific observations? Or should I be looking at the source code, and helping to populate the wiki page with my findings?
From: http://www.ogre3d.org/forums/viewtopic. ... 50#p478279

Xavyiy wrote:
What I am trying to say? That if UpdateAllTransforms() & UpdateAllAnimation() are the most relevant candidates for parallelizing, I really suggest doing it in a simple, clear and efficient way: the barrier system. Easy to debug, easy to modify and, at the end, more efficient than using a task-based lib. Less overhead. Granularity could be a problem, but nothing stop us from doing some real-time profiling and adapt the range of objects each thread updates. Also, a great thing here is that the time needed to update each node is constant (it'll not be the same for animations, but it's not going to be a huge difference), so that helps a lot! (In a task-based system each task may have very different execution times, you can mix little tasks with big tasks, so granularity/profiling becomes very important to avoid wasting of CPU resources).

Real-time profiling like you describe seems to imply that a change in the amount of work or the number of threads requires fine tuning. If that is the case, then the ability to perform this tuning must be exposed. Can you think of a way to adjust the location of the thread barrier with a heuristic of some kind? Also, a WorkUnit-based system that respects data dependencies would presumably only have periods of waiting once all the work was complete. This compensates for cases when one thread takes longer than expected just before the thread barrier.
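As a rough illustration of the WorkUnit point, here is a minimal sketch in which threads claim fixed-size chunks from a shared counter instead of being handed one fixed range each. The function name `processInChunks` and its parameters are hypothetical, this is not code from Ogre or from any task library, and it assumes the items are independent of each other. Threads are created per call only to keep the example short.

```cpp
#include <algorithm>
#include <atomic>
#include <cassert>
#include <cstddef>
#include <functional>
#include <thread>
#include <vector>

// Run work(i) for every i in [0, itemCount). Threads claim chunks of
// `chunkSize` items from a shared counter, so a slow chunk delays only
// the thread that took it. The join() loop acts as the barrier.
void processInChunks(std::size_t itemCount, std::size_t chunkSize,
                     std::size_t threadCount,
                     const std::function<void(std::size_t)>& work) {
    assert(chunkSize > 0 && threadCount > 0);
    std::atomic<std::size_t> next{0};

    auto worker = [&] {
        for (;;) {
            // Atomically claim the next unprocessed chunk.
            const std::size_t begin = next.fetch_add(chunkSize);
            if (begin >= itemCount)
                return; // nothing left to claim
            const std::size_t end = std::min(begin + chunkSize, itemCount);
            for (std::size_t i = begin; i < end; ++i)
                work(i);
        }
    };

    std::vector<std::thread> threads;
    for (std::size_t t = 0; t < threadCount; ++t)
        threads.emplace_back(worker);
    for (auto& thread : threads)
        thread.join();
}
```

Here `chunkSize` is the granularity knob: smaller chunks even out the finish times at the cost of more atomic operations, larger chunks do the opposite. Whether this beats fixed ranges with a barrier for constant-cost node updates is something only profiling could answer.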
***Shameless plug*** In a post in the previous thread I linked to a library I am working on. I think it is close to an ideal fit for most game-related tasks (maybe Ogre): http://www.ogre3d.org/forums/viewtopic. ... 25#p477907
Edit - Fixed quotation, and again