Is Ogre ready for multicore CPUs?

What it says on the tin: a place to discuss proposed new features.
uzik
Goblin
Posts: 202
Joined: Sun Feb 25, 2007 1:45 am
Location: USA
x 6

Is Ogre ready for multicore CPUs?

Post by uzik »

"The writing is on the wall" as they say. Intel's announced plans include
even more parallelization.
---A dream doesn't become reality through magic; it takes sweat, determination and hard work.
Chaster
OGRE Expert User
Posts: 557
Joined: Wed May 05, 2004 3:19 pm
Location: Portland, OR, USA

Post by Chaster »

Look at the discussion on Tindalos in the Developer section of the forum. Multi-threading will be addressed much better in Ogre 2.0.
xavier
OGRE Retired Moderator
Posts: 9481
Joined: Fri Feb 18, 2005 2:03 am
Location: Dublin, CA, US
x 22

Post by xavier »

In general, concurrent design on consoles tends to leave the renderer running on a single core by itself. This design would also translate well to multicore CPUs. Other parts of a game engine benefit from running concurrently, but at the end of the frame, all of the rendering data is still serialized to a single resource. While it might sound good to have transform calculations running in separate threads, what you find in general is that the overhead of keeping all of the data synchronized equals or exceeds the benefit over simply running those calculations in a straight line.
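
As a rough illustration of that end-of-frame serialisation point (a minimal sketch, not Ogre code - all of the names here are made up):

Code:

// Worker threads update transforms in parallel, but everything still
// funnels through one lock before the single render thread can use it.
#include <mutex>
#include <vector>

struct Transform { float m[16]; };

std::mutex             gStagingLock;
std::vector<Transform> gStaging;   // written by workers, consumed by the renderer

void updateTransforms(const std::vector<Transform>& local)   // worker thread
{
    // ... expensive per-object transform maths on this worker's objects ...
    std::lock_guard<std::mutex> lock(gStagingLock);           // the synchronisation cost
    gStaging.insert(gStaging.end(), local.begin(), local.end());
}

void renderFrame()                                            // single render thread
{
    std::vector<Transform> frameData;
    {
        std::lock_guard<std::mutex> lock(gStagingLock);
        frameData.swap(gStaging);   // the single serialisation point per frame
    }
    // ... submit frameData to the (single-threaded) render API ...
}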

Things like resource loading in parallel are already addressed in Eihort, and stuff like animations, well, it depends on how much is done on the CPU and how much is done on the GPU. The trend is to offload as much as possible onto the GPU because it is inherently parallel in design, and GPU power is growing at a much faster rate than CPU power. For CPU skinning (i.e. for physics), that data is already handled in a separate thread, whether you run your physics in a separate thread explicitly or use a library such as Novodex (PhysX), which does it for you behind the scenes.

So I would say that Ogre is as ready to run in a concurrent environment as anything else at this point.
Do you need help? What have you tried?

Angels can fly because they take themselves lightly.
Falagard
OGRE Retired Moderator
Posts: 2060
Joined: Thu Feb 26, 2004 12:11 am
Location: Toronto, Canada
x 3

Post by Falagard »

I'd say the only two areas I can see Ogre benefiting from running parts of it on a separate core (in addition to background resource loading) are complex animation blending and particle systems.

When I say animation blending, I'm talking about blending several animations together using state machines, special blend transitions, etc., similar to this thread http://www.ogre3d.org/phpBB2/viewtopic.php?t=30461

Those types of things need to be handled by the CPU and not the GPU, although obviously the final bone matrices can be passed to the GPU for rendering hardware skinned animations. The funny thing is that this doesn't need to be done by Ogre at all - since it's game specific and the blending itself could be calculated on a separate thread and then the results used to update the animation state in Ogre.
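
As a rough sketch of that split - blend weights computed on a worker thread, results applied through Ogre's AnimationState on the main thread (the blend-state type, the state machine evaluation and the animation names are made up; the AnimationState calls are the Ogre 1.x ones):

Code:

#include <OgreEntity.h>
#include <OgreAnimationState.h>
#include <mutex>

struct BlendResult { Ogre::Real walkWeight; Ogre::Real runWeight; Ogre::Real time; };

std::mutex  gBlendLock;
BlendResult gLatestBlend;   // produced by the worker, consumed by the main thread

void evaluateBlend()        // runs on a worker thread, once per frame
{
    BlendResult r;
    // ... evaluate the game-specific state machine / blend transitions here ...
    r.walkWeight = 0.3f;    // placeholder values
    r.runWeight  = 0.7f;
    r.time       = 0.0f;
    std::lock_guard<std::mutex> lock(gBlendLock);
    gLatestBlend = r;
}

void applyBlend(Ogre::Entity* ent)   // called on the main/render thread
{
    BlendResult r;
    {
        std::lock_guard<std::mutex> lock(gBlendLock);
        r = gLatestBlend;
    }
    ent->getAnimationState("Walk")->setWeight(r.walkWeight);
    ent->getAnimationState("Run")->setWeight(r.runWeight);
    ent->getAnimationState("Run")->setTimePosition(r.time);
}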

Really complex particle systems could be calculated in a separate thread and then just rendered by Ogre. I'm not sure about the overhead of synching the particles to the main thread for rendering though, especially since if you have so many particles that you need to calculate them on a separate core, then that's a lot of data to pass between threads.
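
One way to keep that synching overhead down is double-buffering, so that only an index changes hands each frame instead of the particle data itself (a sketch with made-up types; it assumes the simulation and render steps alternate once per frame, otherwise a third buffer or a lock is needed):

Code:

#include <atomic>
#include <vector>

struct Particle { float pos[3]; float size; };

std::vector<Particle> gBuffers[2];
std::atomic<int>      gReadable(0);   // index of the buffer the render thread may read

void simulateParticles()              // worker thread, once per frame
{
    int writeIdx = 1 - gReadable.load();
    std::vector<Particle>& buf = gBuffers[writeIdx];
    // ... integrate velocities, spawn and kill particles into buf ...
    gReadable.store(writeIdx);        // publish: cheap compared to copying the particles
}

void renderParticles()                // render thread, once per frame
{
    const std::vector<Particle>& buf = gBuffers[gReadable.load()];
    // ... fill the hardware vertex buffer from buf ...
}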
uzik
Goblin
Posts: 202
Joined: Sun Feb 25, 2007 1:45 am
Location: USA
x 6

Post by uzik »

Thanks Chaster, great discussion on Tindalos.

Hey xavier, I thought that might be the case but I'm still too new to Ogre to be sure.

If the Tindalos branch moves transformations to a separate entity, then the physics PPU might be doing movement and animation.

The only thing that comes to mind that would be an easy and natural spinoff for another core is AI. You can put another core to work considering the next move in your game. You don't need high bandwidth or much coordination. It generally doesn't need access to huge data resources, since too much input bogs your AI down in details. The data it feeds back is pretty small too, just a representation of a decision.
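
A sketch of that small hand-off (the snapshot, decision and mailbox types here are all hypothetical):

Code:

#include <mutex>

struct Snapshot { float playerPos[3]; int visibleEnemies; };   // small input
struct Decision { int action; float targetPos[3]; };           // small output

template <typename T>
class MailBox   // tiny single-slot hand-off between two threads
{
public:
    void put(const T& v)
    {
        std::lock_guard<std::mutex> lock(m);
        value = v;
        fresh = true;
    }
    bool take(T& out)
    {
        std::lock_guard<std::mutex> lock(m);
        if (!fresh) return false;
        out = value;
        fresh = false;
        return true;
    }
private:
    std::mutex m;
    T value{};
    bool fresh = false;
};

MailBox<Snapshot> gToAI;     // game thread -> AI core
MailBox<Decision> gFromAI;   // AI core -> game thread

void aiThreadStep()          // run repeatedly on its own core
{
    Snapshot s;
    if (gToAI.take(s))
    {
        Decision d = Decision();
        // ... plan the next move from s ...
        gFromAI.put(d);
    }
}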

Oh, and taking a page from the IBM mainframe days, we could have one core per hardware I/O device. I almost forgot about the sound card. I could finally implement sound that dynamically changes tempo and style based on the action in the game.
---A dream doesn't become reality through magic; it takes sweat, determination and hard work.
uzik
Goblin
Posts: 202
Joined: Sun Feb 25, 2007 1:45 am
Location: USA
x 6

Post by uzik »

Falagard wrote: Really complex particle systems could be calculated in a separate thread and then just rendered by Ogre. I'm not sure about the overhead of synching the particles to the main thread for rendering though, especially since if you have so many particles that you need to calculate them on a separate core, then that's a lot of data to pass between threads.
Particles and fluids are among the advertised selling points of physics processing unit (PPU) hardware.
---A dream doesn't become reality through magic; it takes sweat, determination and hard work.
Falagard
OGRE Retired Moderator
Posts: 2060
Joined: Thu Feb 26, 2004 12:11 am
Location: Toronto, Canada
x 3

Post by Falagard »

I'm not sold on PPUs, so don't really care about the hardware until it becomes mainstream (if it does).
uzik wrote: It generally doesn't need access to huge data resources since too much input bogs your AI down in details. The data it feeds back is pretty small too, just a representation of a decision.
We're talking about Ogre here, not other game specific systems. I'd think about putting physics, AI, networking, and any other relevant game systems on separate threads.

Ogre is specific to rendering - you asked if Ogre is ready for multiple cores, and the only areas that affect Ogre that I can think of are particle systems and animation blending.
syedhs
Silver Sponsor
Posts: 2703
Joined: Mon Aug 29, 2005 3:24 pm
Location: Kuala Lumpur, Malaysia
x 51

Post by syedhs »

Eventually, when computers with quad cores are almost everywhere, it would be very nice to have Ogre's processing spread across one, two, or more cores. Depending on the application, graphics rendering may constitute from 50% to 100% of CPU usage. Don't ask me where I got those two numbers :lol: but for those applications where graphics rendering is 100% of CPU usage, spreading it evenly across 4 cores is a must.

Maybe I am asking too much, but imagine an API that looks something like this:

Code:

// all of the code below assumes a 4-core CPU
// 100% utilization on all 4 cores
mRoot->utilizeAllCores();
// utilize only 3 of the 4 cores
mRoot->utilizeCore(1, 100);
mRoot->utilizeCore(2, 100);
mRoot->utilizeCore(3, 100);
mRoot->utilizeCore(4, 0);
I haven't read much about multicore programming, but I've heard that it is possible to explicitly specify which core a thread runs on. As always, correct me if I am wrong.
xavier
OGRE Retired Moderator
Posts: 9481
Joined: Fri Feb 18, 2005 2:03 am
Location: Dublin, CA, US
x 22

Post by xavier »

syedhs wrote: but for those applications where graphics rendering is 100% of CPU usage, spreading it evenly across 4 cores is a must.
What exactly do you think in Ogre can be parallelized across 4 cores?

I understand that when you first learn to use a hammer, everything looks like a nail, but some applications simply are not well suited to particular designs. Rendering will almost always be one application that can only be parallelized so far before diminishing returns set in.
Do you need help? What have you tried?

Angels can fly because they take themselves lightly.
xavier
OGRE Retired Moderator
Posts: 9481
Joined: Fri Feb 18, 2005 2:03 am
Location: Dublin, CA, US
x 22

Post by xavier »

syedhs wrote: I haven't read much about multicore programming, but I've heard that it is possible to explicitly specify which core a thread runs on. As always, correct me if I am wrong.
You can set affinity to a particular core, but you can't force a thread to run only on one core -- the OS has control over that. It's a different matter on a console like PS3 -- there you tell the OS to give you a core resource (but not which one, only one of the available ones) and you can run your thread on that core.
Do you need help? What have you tried?

Angels can fly because they take themselves lightly.
Falagard
OGRE Retired Moderator
Posts: 2060
Joined: Thu Feb 26, 2004 12:11 am
Location: Toronto, Canada
x 3

Post by Falagard »

syedhs

As stated by Xavier and me, there aren't a lot of things in Ogre that make sense to pull out onto other cores. They're better used for game specific code like physics, AI, networking, etc.

At the same time, Tindalos will make it easier to do specific scene management tasks (such as visibility determination, culling, etc.) on other threads, because of its abstraction of the scene graph and the fact that the transformation matrices used to render Renderables are going to be handled differently, with values only being pushed into them.
sinbad
OGRE Retired Team Member
Posts: 19269
Joined: Sun Oct 06, 2002 11:19 pm
Location: Guernsey, Channel Islands
x 66

Post by sinbad »

Yes, Tindalos is mostly about allowing you to thread your own code that uses Ogre, rather than necessarily Ogre itself. Bear in mind that having more threads than physical cores is a drain on performance, not a gain, unless it's to avoid some other blocking bottleneck like I/O (hence the first thing we supported was background loading), in which case the win compensates for the context switches, cache coherency issues and synchronisation overheads.

There are a finite number of threads you can usefully run in parallel doing real work until ultra-multicore becomes far more mainstream - right now dual core is worth supporting (background loading especially, perhaps AI / physics too), and quad core will be worth it soon. All the talk of 16-core CPUs being the norm is mostly marketing fluff at the moment - but the architectural changes required to scale more smoothly for the future are being considered.
Kojack
OGRE Moderator
Posts: 7157
Joined: Sun Jan 25, 2004 7:35 am
Location: Brisbane, Australia
x 535

Post by Kojack »

xavier wrote: You can set affinity to a particular core, but you can't force a thread to run only on one core
Umm, isn't setting the affinity to a core forcing the thread to only run on that core?
SetThreadAffinityMask() can restrict a thread to only running on a specific core.
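
For reference, pinning the calling thread with that call looks like this (Win32; error handling mostly omitted):

Code:

#include <windows.h>

// Restrict the calling thread to a single core.
// SetThreadAffinityMask returns the previous mask, or 0 on failure.
void pinCurrentThreadToCore(int core)
{
    DWORD_PTR mask = DWORD_PTR(1) << core;
    DWORD_PTR previous = SetThreadAffinityMask(GetCurrentThread(), mask);
    if (previous == 0)
    {
        // The call failed, e.g. the mask is not a subset of the
        // process affinity mask.
    }
}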
ahmedali
Gnome
Posts: 302
Joined: Fri Feb 20, 2004 8:52 pm
Location: Lahore, Pakistan

Post by ahmedali »

For me, the ideal way of multithreading in the future would be for the programmer to just code his engine in the regular way, perhaps following some specific patterns, and have the compiler (and hence the machine) automatically take advantage of multiple cores.

So currently I'm not rushing towards the multicore buzz and am waiting for the future :D.
sinbad
OGRE Retired Team Member
Posts: 19269
Joined: Sun Oct 06, 2002 11:19 pm
Location: Guernsey, Channel Islands
x 66

Post by sinbad »

OpenMP basically lets you do that, provided you use data partitioning. It's not particularly good at functional partitioning though.
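
A minimal example of the kind of data partitioning that means - the loop body is unchanged and OpenMP splits the iterations across the available cores (compile with /openmp or -fopenmp):

Code:

#include <vector>

struct Particle { float pos[3]; float vel[3]; };

void integrate(std::vector<Particle>& particles, float dt)
{
    // Each core gets a slice of the particle array; no manual thread management.
    #pragma omp parallel for
    for (int i = 0; i < (int)particles.size(); ++i)
    {
        particles[i].pos[0] += particles[i].vel[0] * dt;
        particles[i].pos[1] += particles[i].vel[1] * dt;
        particles[i].pos[2] += particles[i].vel[2] * dt;
    }
}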
xavier
OGRE Retired Moderator
Posts: 9481
Joined: Fri Feb 18, 2005 2:03 am
Location: Dublin, CA, US
x 22

Post by xavier »

Kojack wrote:
You can set affinity to a particular core, but you can't force a thread to run only on one core
Umm, isn't setting the affinity to a core forcing the thread to only run on that core?
SetThreadAffinityMask() can restrict a thread to only running on a specific core.
Not the way I understood it -- setting affinity would set preference, but not lock it to a core, hence the usage of the word "affinity". I could be wrong however.
Do you need help? What have you tried?

Angels can fly because they take themselves lightly.
Praetor
OGRE Retired Team Member
Posts: 3335
Joined: Tue Jun 21, 2005 8:26 pm
Location: Rochester, New York, US
x 3

Post by Praetor »

I think xavier is closer. Like most things, setting affinity is a request or suggestion to the operating system. In most cases it'll probably give you exactly what you ask for, but isn't set in stone. The operating system still does what it does in general.

Concurrent design is an incredibly non-trivial task. It takes a long time to design a program with concurrency that has any depth to it at all. Therefore, it is highly unlikely an automated mechanism (aka a compiler) will be able to match a hand-designed concurrent system. People just don't truly understand that design paradigm, and they are going to need to learn quickly. Even without true concurrency most programs are rather sloppy.

With regards to Ogre, the only way the core rendering code will benefit from multiple threads will be when there are multiple GPUs present. In the end, Ogre's rendering core must stream everything to a single resource. Having more threads than the number of GPUs will cause design headaches for no gains (and probably some losses). Even in systems with multiple GPUs you still send data as if to one, and the dual-GPU system itself does the parallelization.
syedhs
Silver Sponsor
Posts: 2703
Joined: Mon Aug 29, 2005 3:24 pm
Location: Kuala Lumpur, Malaysia
x 51

Post by syedhs »

Okay here is an excerpt from MSDN:
A thread affinity mask is a bit vector in which each bit represents the processors that a thread is allowed to run on.

A thread affinity mask must be a proper subset of the process affinity mask for the containing process of a thread. A thread is only allowed to run on the processors its process is allowed to run on.
That is a direct copy-paste from the SetThreadAffinityMask API documentation - the last sentence suggests that a thread can be made to run only on specific processors.
syedhs
Silver Sponsor
Posts: 2703
Joined: Mon Aug 29, 2005 3:24 pm
Location: Kuala Lumpur, Malaysia
x 51

Post by syedhs »

Okay, this article may be a bit old (Nov 06), but it illustrates the benefit of 'getting ready' for multiple cores now. It sounds difficult and may need a lot of resources, but the gain is there.

http://techreport.com/etc/2006q4/source ... dex.x?pg=1

Highlight:
Game engines must perform numerous tasks before even issuing draw calls, including building world and object lists, performing graphical simulations, updating animations, and computing shadows. These tasks are all CPU-bound, and must be calculated for every "view", be it the player camera, surface reflections, or in-game security camera monitors. With hybrid threading, Valve is able to construct world and object lists for multiple views in parallel. Graphics simulations can be overlapped, and things like shadows and bone transformations for all characters in all views can be processed across multiple cores. Multiple draw threads can even be executed in parallel, and Valve has rewritten the graphics library that sits between its engine and the DirectX API to take advantage of multiple cores.
Kerion
Goblin
Posts: 235
Joined: Wed Feb 05, 2003 5:49 am

Post by Kerion »

The problem, again, is that Source is a game engine, not a rendering engine. Yes, it has a very healthy rendering component which uses a huge time slice, but if you read that article, Valve is splitting game tasks across CPUs. Their use of hybrid threading to split up processing of the space partitioning scheme is interesting, but again, game specific. I suppose future versions of OGRE could be written to allow certain branches of the scene graph to be processed on a different core or thread, but serious tests would have to be performed to see if it gained anything.
syedhs
Silver Sponsor
Posts: 2703
Joined: Mon Aug 29, 2005 3:24 pm
Location: Kuala Lumpur, Malaysia
x 51

Post by syedhs »

Kerion wrote: The problem, again, is that Source is a game engine, not a rendering engine.
I suppose this is also specific to a game engine? I requote:
Graphics simulations can be overlapped, and things like shadows and bone transformations for all characters in all views can be processed across multiple cores
I believe that if we keep repeating the mantra 'Ogre is not a game engine' and therefore disregard any development for multi-core, it will not be too long before we find out there is a lot to catch up on.
Last edited by syedhs on Thu Apr 12, 2007 8:33 pm, edited 1 time in total.
Falagard
OGRE Retired Moderator
Posts: 2060
Joined: Thu Feb 26, 2004 12:11 am
Location: Toronto, Canada
x 3

Post by Falagard »

Game engines must perform numerous tasks before even issuing draw calls, including building world and object lists, performing graphical simulations, updating animations, and computing shadows. These tasks are all CPU-bound, and must be calculated for every "view", be it the player camera, surface reflections, or in-game security camera monitors. With hybrid threading, Valve is able to construct world and object lists for multiple views in parallel. Graphics simulations can be overlapped, and things like shadows and bone transformations for all characters in all views can be processed across multiple cores. Multiple draw threads can even be executed in parallel, and Valve has rewritten the graphics library that sits between its engine and the DirectX API to take advantage of multiple cores.
Yeah, I've read that article. Let's briefly dissect it.
including building world and object lists, performing graphical simulations, updating animations, and computing shadows.
"building world and object lists" = visibility determination, which can apparently be done on a separate thread in Tinadalos (if I understand correctly, by your own code and not built into Ogre).

"performing graphical simulations" - I'm going to assume this means physics simulation. Not related to Ogre. It could also mean particle systems, which could potentially be an area that is run on a seperate thread, and is perhaps something Ogre should look into in the future.

"updating animations" - as I said, this is one of the areas that could benefit from being done on a separate thread, but since Ogre already has the simple idea of animation state, it's possible to do it on a separate thread already, just that it's up to the game to handle it, not Ogre. Perhaps in the future if Ogre has a more advanced animation blending system built in.

"computing shadows" - huh? What types of shadows are they using where they need to be computed?

"Multiple draw threads can even be executed in parallel" - surprises me that they'd take this approach and that it's even possible in DirectX.
Kerion
Goblin
Posts: 235
Joined: Wed Feb 05, 2003 5:49 am

Post by Kerion »

I think the "computing shadows" comment is about their VRAD tool. Notice how it was used as an example a lot. They are talking about computing radiosity for levels, which to someone who doesn't do a lot of graphics programming could be simplified to "computing shadows".

As far as drawing goes, again, I think that's where they were talking about doing spatial partition updates in separate threads.

For instance, let's say your scene is composed of four main branches off of the root. In theory, those branches and all their sub-nodes are self-contained. You could, in theory, process each branch's updates in its own thread concurrently and not have any drawing issues. This is where they state that all threads can read the scene graph at the same time, but that writing to the scene graph is locked to a single thread (probably via a mutex). The drawing itself will still have to be done in a serial manner because, as you stated, I am pretty sure that's how DirectX works. You can't make multiple Draw calls on a DirectX device from multiple threads and expect good results. They stated very clearly that they have a layer in front of DirectX to manage this. I assume it waits for each concurrent branch update to finish, then pushes the results to DirectX. It would require very fine balancing of your scene graph though.
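
A sketch of that shape (Branch, update() and draw() are made-up placeholders):

Code:

#include <thread>
#include <vector>

struct Branch
{
    void update(float dt) { /* animate / transform this sub-tree only */ }
    void draw() const     { /* issue the (serial) draw calls */ }
};

void frame(std::vector<Branch>& branches, float dt)
{
    std::vector<std::thread> workers;
    for (size_t i = 0; i < branches.size(); ++i)      // one thread per top-level branch
        workers.push_back(std::thread(&Branch::update, &branches[i], dt));
    for (size_t i = 0; i < workers.size(); ++i)
        workers[i].join();                            // wait: the synchronisation point

    for (size_t i = 0; i < branches.size(); ++i)      // drawing stays serial,
        branches[i].draw();                           // as DirectX expects
}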
Praetor
OGRE Retired Team Member
Posts: 3335
Joined: Tue Jun 21, 2005 8:26 pm
Location: Rochester, New York, US
x 3

Post by Praetor »

The visibility testing could definitely be threaded, but the onus would be on the scene manager to do that in parallel. I suppose Ogre could provide some tools to help. Shadows... either the radiosity thing or who knows. All the shadows I toy with are computed exclusively on the GPU in pixel shaders (which technically is parallel processing).
Kerion
Goblin
Posts: 235
Joined: Wed Feb 05, 2003 5:49 am

Post by Kerion »

Praetor wrote: The visibility testing could definitely be threaded, but the onus would be on the scene manager to do that in parallel. I suppose Ogre could provide some tools to help. Shadows... either the radiosity thing or who knows. All the shadows I toy with are computed exclusively on the GPU in pixel shaders (which technically is parallel processing).
Right, but those are real-time shadows. VRAD isn't doing real-time shadows, it's pre-baking the level's radiosity. I really think the whole "computing shadows" thing is a reference to the way that Source still uses pre-baked shadow maps to do a lot of its non-dynamic level lighting.