Is Ogre ready for multicore CPUs?
-
- Goblin
- Posts: 202
- Joined: Sun Feb 25, 2007 1:45 am
- Location: USA
- x 6
Is Ogre ready for multicore CPUs?
"The writing is on the wall" as they say. Intel's announced plans include
even more parallelization.
---A dream doesn't become reality through magic; it takes sweat, determination and hard work.
-
- OGRE Retired Moderator
- Posts: 9481
- Joined: Fri Feb 18, 2005 2:03 am
- Location: Dublin, CA, US
- x 22
In general, concurrent design on consoles tends to leave the renderer running on a single core by itself. This design would also translate well to multicore CPUs. Other parts of a game engine benefit from running concurrently, but at the end of the frame, all of the rendering data is still serialized to a single resource. While it might sound good to have transform calculations running in separate threads, what you find in general is that the overhead of keeping all of the data synchronized equals or exceeds the benefit over simply running those calculations in a straight line.
Things like resource loading in parallel are already addressed in Eihort, and stuff like animations, well, it depends on how much is done on the CPU and how much is done on the GPU. The trend is to offload as much as possible onto a GPU because it is inherently parallel in design and the GPU power is growing at a much faster rate than CPU power. For CPU skinning (i.e. for physics) that data is already handled in a separate thread, whether you run your physics in a separate thread explicitly, or use a library such as Novodex (PhysX) which does it like that for you behind the scenes.
So I would say that Ogre is as ready to run in a concurrent environment as anything else at this point.
-
- OGRE Retired Moderator
- Posts: 2060
- Joined: Thu Feb 26, 2004 12:11 am
- Location: Toronto, Canada
- x 3
I'd say the only two areas I can see Ogre benefiting from running parts of it on a separate core (in addition to background resource loading) are complex animation blending and particle systems.
When I say animation blending, I'm talking about blending several animations together using state machines, special blend transitions, etc., similar to this thread: http://www.ogre3d.org/phpBB2/viewtopic.php?t=30461
Those types of things need to be handled by the CPU and not the GPU, although obviously the final bone matrices can be passed to the GPU for rendering hardware skinned animations. The funny thing is that this doesn't need to be done by Ogre at all - since it's game specific and the blending itself could be calculated on a separate thread and then the results used to update the animation state in Ogre.
Really complex particle systems could be calculated in a separate thread and then just rendered by Ogre. I'm not sure about the overhead of synching the particles to the main thread for rendering though, especially since if you have so many particles that you need to calculate them on a separate core, then that's a lot of data to pass between threads.
-
- Goblin
- Posts: 202
- Joined: Sun Feb 25, 2007 1:45 am
- Location: USA
- x 6
Thanks Chaster, great discussion on Tindalos.
Hey xavier, I thought that might be the case, but I'm still too new to Ogre to be sure.
If the Tindalos branch moves transformations to a separate entity, then the physics PPU might be doing movement and animation.
The only thing that comes to mind that would be
an easy and natural spinoff for another core is AI. You can put
another core to work considering the next move in your game.
You don't need high bandwidth or much coordination.
It generally doesn't need access to huge data resources since too much
input bogs your AI down in details. The data it feeds back
is pretty small too, just a representation of a decision.
Oh, and taking a page from IBM mainframe days we could have
one core per hardware I/O device. I almost forgot about the
sound card. I could finally implement sound that dynamically
changes tempo and style based on the action in the game.
---A dream doesn't become reality through magic; it takes sweat, determination and hard work.
-
- Goblin
- Posts: 202
- Joined: Sun Feb 25, 2007 1:45 am
- Location: USA
- x 6
Falagard wrote: Really complex particle systems could be calculated in a separate thread and then just rendered by Ogre. I'm not sure about the overhead of synching the particles to the main thread for rendering though, especially since if you have so many particles that you need to calculate them on a separate core, then that's a lot of data to pass between threads.
Particles and fluids are among the advertised selling points for physics processing unit (PPU) hardware.
---A dream doesn't become reality through magic; it takes sweat, determination and hard work.
-
- OGRE Retired Moderator
- Posts: 2060
- Joined: Thu Feb 26, 2004 12:11 am
- Location: Toronto, Canada
- x 3
I'm not sold on PPUs, so I don't really care about the hardware until it becomes mainstream (if it does).
Ogre is specific to rendering - you asked if Ogre is ready for multiple cores, and the only areas that affect Ogre that I can think of are particle systems and animation blending.
It generally doesn't need access to huge data resources since too much input bogs your AI down in details. The data it feeds back is pretty small too, just a representation of a decision.
We're talking about Ogre here, not other game-specific systems. I'd think about putting physics, AI, networking, and any other relevant game systems on separate threads.
-
- Silver Sponsor
- Posts: 2703
- Joined: Mon Aug 29, 2005 3:24 pm
- Location: Kuala Lumpur, Malaysia
- x 51
Eventually, when computers with quad cores are almost everywhere, it would be very nice to have Ogre's processing spread across one core, two cores or more. Depending on the application, graphics rendering may constitute from 50% to 100% of CPU usage. Don't ask me where I got those two numbers, but for those applications where graphics rendering is 100%, having it spread evenly across 4 cores is a must.
I don't know much about programming for multiple cores, but I heard that it is doable to explicitly specify which core to use. As always, correct me if I am wrong.
Maybe I am asking too much, but if there is an API which spells something like this:-
Code: Select all
// all the code below is for a 4-core CPU
// 100% utilization on all 4 cores
mRoot->utilizeAllCores();
// utilize 3 cores
mRoot->utilizeCore(1, 100);
mRoot->utilizeCore(2, 100);
mRoot->utilizeCore(3, 100);
mRoot->utilizeCore(4, 0);
-
- OGRE Retired Moderator
- Posts: 9481
- Joined: Fri Feb 18, 2005 2:03 am
- Location: Dublin, CA, US
- x 22
syedhs wrote: but to those applications with 100% of graphics rendering, having them spread evenly to 4 cores is a must.
What exactly do you think in Ogre can be parallelized across 4 cores?
I understand that when you first learn to use a hammer, everything looks like a nail, but some applications simply are not well suited to particular designs. Rendering will almost always be one application that can only be parallelized so far before diminishing returns set in.
-
- OGRE Retired Moderator
- Posts: 9481
- Joined: Fri Feb 18, 2005 2:03 am
- Location: Dublin, CA, US
- x 22
syedhs wrote: I dont read much into these cores programming, but I heard that it is doable to explicitly specify which core to use. As always, correct me if I am wrong.
You can set affinity to a particular core, but you can't force a thread to run only on one core -- the OS has control over that. It's a different matter on a console like the PS3 -- there you tell the OS to give you a core resource (not which one, only one of the available ones) and you can run your thread on that core.
-
- OGRE Retired Moderator
- Posts: 2060
- Joined: Thu Feb 26, 2004 12:11 am
- Location: Toronto, Canada
- x 3
syedhs
As stated by Xavier and me, there aren't a lot of things in Ogre that make sense to pull out onto other cores. They're better used for game specific code like physics, AI, networking, etc.
At the same time, Tindalos will make it easier to do specific scene management tasks (such as visibility determination, culling, etc.) on other threads, because of its abstraction of the scene graph and the fact that the transformation matrices used to render Renderables are going to be handled differently, by only pushing values into them.
-
- OGRE Retired Team Member
- Posts: 19269
- Joined: Sun Oct 06, 2002 11:19 pm
- Location: Guernsey, Channel Islands
- x 66
Yes, Tindalos is mostly about allowing you to thread your own code that uses Ogre, rather than necessarily Ogre itself. Bear in mind that having more threads than physical cores is a drain on performance, not a gain, unless it's to avoid some other blocking bottleneck like I/O (hence the first thing we supported was background loading) which therefore compensates for context switches, cache coherency issues and synchronisation overheads.
There are a finite number of threads you can usefully run in parallel doing real work until ultra-multicore becomes far more mainstream - right now dual core is worth supporting (background loading especially, perhaps AI / physics too), and quad core will be worth it soon. All the talk of 16-core CPUs being the norm is mostly marketing fluff at the moment - but the architectural changes required to scale more smoothly for the future are being considered.
-
- OGRE Moderator
- Posts: 7157
- Joined: Sun Jan 25, 2004 7:35 am
- Location: Brisbane, Australia
- x 535
-
- Gnome
- Posts: 302
- Joined: Fri Feb 20, 2004 8:52 pm
- Location: Lahore, Pakistan
For me, the ideal way of multithreading in the future would be that the programmer just codes his engine in the regular way, perhaps following some specific patterns, and the compiler, and hence the machine, automatically takes advantage of multiple cores.
So currently I'm not chasing the multicore buzz, and am waiting for the future.
-
- OGRE Retired Team Member
- Posts: 19269
- Joined: Sun Oct 06, 2002 11:19 pm
- Location: Guernsey, Channel Islands
- x 66
-
- OGRE Retired Moderator
- Posts: 9481
- Joined: Fri Feb 18, 2005 2:03 am
- Location: Dublin, CA, US
- x 22
xavier wrote: You can set affinity to a particular core, but you can't force a thread to run only on one core
Kojack wrote: Umm, isn't setting the affinity to a core forcing the thread to only run on that core? SetThreadAffinityMask() can restrict a thread to only running on a specific core.
Not the way I understood it -- setting affinity would set a preference, but not lock the thread to a core, hence the use of the word "affinity". I could be wrong, however.
-
- OGRE Retired Team Member
- Posts: 3335
- Joined: Tue Jun 21, 2005 8:26 pm
- Location: Rochester, New York, US
- x 3
I think xavier is closer. Like most things, setting affinity is a request or suggestion to the operating system. In most cases it'll probably give you exactly what you ask for, but it isn't set in stone. The operating system still does what it does in general.
Concurrent design is an incredibly non-trivial task. It takes a long time to design a program with concurrency that has any depth to it at all. Therefore, it is highly unlikely an automated mechanism (aka a compiler) will be able to match a hand-designed concurrent system. People just don't truly understand that design paradigm, and they are going to need to learn quickly. Even without true concurrency most programs are rather sloppy.
With regards to Ogre, the only way the core rendering code will benefit from multiple threads will be when there are multiple GPUs present. In the end, Ogre's rendering core must stream everything to a single resource. Having more threads than the number of GPUs will cause design headaches for no gains (and probably some losses). Even in systems with multiple GPUs you still send data as if to one, and the dual-GPU system itself does the parallelization.
-
- Silver Sponsor
- Posts: 2703
- Joined: Mon Aug 29, 2005 3:24 pm
- Location: Kuala Lumpur, Malaysia
- x 51
Okay here is an excerpt from MSDN:
A thread affinity mask is a bit vector in which each bit represents the processors that a thread is allowed to run on. A thread affinity mask must be a proper subset of the process affinity mask for the containing process of a thread. A thread is only allowed to run on the processors its process is allowed to run on.
That is a direct copy-paste from the SetThreadAffinityMask API documentation - the last sentence suggests that a thread can be made to run only on specific processors.
-
- Silver Sponsor
- Posts: 2703
- Joined: Mon Aug 29, 2005 3:24 pm
- Location: Kuala Lumpur, Malaysia
- x 51
Okay, this article may be a bit old (Nov 06), but it illustrates the benefit of 'getting ready' for multiple cores now. It sounds difficult and may need a lot of resources, but the gain is there.
http://techreport.com/etc/2006q4/source ... dex.x?pg=1
Highlight:
Game engines must perform numerous tasks before even issuing draw calls, including building world and object lists, performing graphical simulations, updating animations, and computing shadows. These tasks are all CPU-bound, and must be calculated for every "view", be it the player camera, surface reflections, or in-game security camera monitors. With hybrid threading, Valve is able to construct world and object lists for multiple views in parallel. Graphics simulations can be overlapped, and things like shadows and bone transformations for all characters in all views can be processed across multiple cores. Multiple draw threads can even be executed in parallel, and Valve has rewritten the graphics library that sits between its engine and the DirectX API to take advantage of multiple cores.
-
- Goblin
- Posts: 235
- Joined: Wed Feb 05, 2003 5:49 am
The problem again is that Source is a game engine, not a rendering engine. Yes, it has a very healthy rendering component which uses a huge time slice, but if you read that article, Valve is splitting game tasks across CPUs. Their use of hybrid threading to split up processing of the space partitioning scheme is interesting, but again, game specific. I suppose future versions of OGRE could be written to allow certain branches of the scene graph to be processed on a different core or thread, but serious tests would have to be performed to see if it gained anything.
-
- Silver Sponsor
- Posts: 2703
- Joined: Mon Aug 29, 2005 3:24 pm
- Location: Kuala Lumpur, Malaysia
- x 51
Kerion wrote: The problem again, is that Source is a game engine, not a rendering engine.
I suppose this is also specific to the game engine? I requote:
Graphics simulations can be overlapped, and things like shadows and bone transformations for all characters in all views can be processed across multiple cores
I believe that if we keep repeating the mantra 'Ogre is not a game engine' and therefore disregard any multi-core development, it will not be long before we find there is a lot to catch up on.
Last edited by syedhs on Thu Apr 12, 2007 8:33 pm, edited 1 time in total.
-
- OGRE Retired Moderator
- Posts: 2060
- Joined: Thu Feb 26, 2004 12:11 am
- Location: Toronto, Canada
- x 3
Yeah, I've read that article. Let's briefly dissect it.
Game engines must perform numerous tasks before even issuing draw calls, including building world and object lists, performing graphical simulations, updating animations, and computing shadows. These tasks are all CPU-bound, and must be calculated for every "view", be it the player camera, surface reflections, or in-game security camera monitors. With hybrid threading, Valve is able to construct world and object lists for multiple views in parallel. Graphics simulations can be overlapped, and things like shadows and bone transformations for all characters in all views can be processed across multiple cores. Multiple draw threads can even be executed in parallel, and Valve has rewritten the graphics library that sits between its engine and the DirectX API to take advantage of multiple cores.
"building world and object lists" = visibility determination, which can apparently be done on a separate thread in Tindalos (if I understand correctly, by your own code and not built into Ogre).
"performing graphical simulations" - I'm going to assume this means physics simulation. Not related to Ogre. It could also mean particle systems, which could potentially be an area that is run on a seperate thread, and is perhaps something Ogre should look into in the future.
"updating animations" - as I said, this is one of the areas that could benefit from being done on a separate thread, but since Ogre already has the simple idea of animation state, it's possible to do it on a separate thread already, just that it's up to the game to handle it, not Ogre. Perhaps in the future if Ogre has a more advanced animation blending system built in.
"computing shadows" - huh? What types of shadows are they using where they need to be computed?
"Multiple draw threads can even be executed in parallel" - surprises me that they'd take this approach and that it's even possible in DirectX.
-
- Goblin
- Posts: 235
- Joined: Wed Feb 05, 2003 5:49 am
I think the "computing shadows" comment is about their VRAD tool. Notice how it was used as an example a lot. They are talking about computing radiosity for levels, which to someone who doesn't do a lot of graphics programming could be simplified to "computing shadows".
As far as drawing, again, I think that's where they were talking about doing spatial partition updating in separate threads.
For instance, let's say your scene is composed of four main branches off of root. In theory, those branches and all their sub-nodes are self-contained. You could, in theory, process each branch's updates in its own thread concurrently, and not have any drawing issues. This is where they state that all threads can read the scene graph at the same time, but that writing to the scene graph is locked to a single thread (probably via a mutex). The drawing itself will still have to be done in a serial manner, because, as you stated, I am pretty sure that's how DirectX works. You can't be calling multiple Draw calls on a DirectX device in multiple threads and expect good results. They stated very clearly they have a layer in front of DirectX to manage this. I assume it waits for each concurrent branch update to finish, then pushes the results to DirectX. It would require very fine balancing of your scene graph, though.
-
- OGRE Retired Team Member
- Posts: 3335
- Joined: Tue Jun 21, 2005 8:26 pm
- Location: Rochester, New York, US
- x 3
The visibility testing could definitely be threaded. But, that would be the onus of the scenemanager to do that in parallel. I suppose Ogre could provide some tools to help. Shadows... either the radiosity thing or who knows. All the shadows I toy with are computed exclusively on the GPU in pixel shaders (which technically is parallel processing).
-
- Goblin
- Posts: 235
- Joined: Wed Feb 05, 2003 5:49 am
Praetor wrote: The visibility testing could definitely be threaded. But, that would be the onus of the scenemanager to do that in parallel. I suppose Ogre could provide some tools to help. Shadows... either the radiosity thing or who knows. All the shadows I toy with are computed exclusively on the GPU in pixel shaders (which technically is parallel processing).
Right, but those are real-time shadows. VRAD isn't doing real-time shadows; it's pre-baking the level's radiosity. I really think the whole "computing shadows" thing is a reference to the way Source still uses pre-baked shadow maps for a lot of its non-dynamic level lighting.