al2950 wrote: ↑Tue Nov 14, 2017 11:38 am
Currently I am wasting quite a few CPU cycles waiting for the GPU to finish rendering. This is compounded by Ogre's rendering pipeline still being single threaded. So many cores are laying dormant. I should think this is a problem faced by many Ogre devs.
Analyze your bottleneck first. If you're idle because the CPU is waiting for the GPU, adding more cores won't fix anything.
If you're idle because one CPU core is busy with Ogre processing, then adding more cores can alleviate it.
However, the render-split model still works just fine in that case.
al2950 wrote: ↑Tue Nov 14, 2017 11:38 am
- You have to duplicate all data, which is not easy, especially with Ogre Singletons everywhere! I have ended up rolling my own scene nodes implementation and shortly my own skeleton, animation, particle....etc. In fact if I am being honest, I am getting to the point of questioning what I am using in Ogre.... Answer = Compositor & HLMS
No matter the threading paradigm, you'll never be saved from data duplication. It's a threading trade-off:
- If data is read-only, no need to duplicate.
- If data is write access, you can lock access. Doesn't scale well if contention is high.
- If data is write access, you can duplicate. Each thread gets its own copy it can work with.
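As an illustration of the "duplicate" option, here's a minimal C++ sketch (all names are hypothetical) of double-buffered state: each thread works on its own copy, and the buffers are swapped at the sync point instead of locking on every access:

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical per-object state shared between logic and rendering.
struct Transform { float x, y, z; };

// Double-buffered duplication: the logic thread writes into one copy
// while the render thread reads the other; at the sync point the two
// buffers are swapped instead of taking a lock on every access.
class DoubleBuffered
{
    std::vector<Transform> mWrite; // owned by the logic thread
    std::vector<Transform> mRead;  // owned by the render thread
public:
    explicit DoubleBuffered( size_t numObjects ) :
        mWrite( numObjects ), mRead( numObjects ) {}

    Transform &writeSlot( size_t idx ) { return mWrite[idx]; }
    const Transform &readSlot( size_t idx ) const { return mRead[idx]; }

    // Called once per frame at the sync point, while both threads are
    // parked; afterwards each thread goes back to its own copy.
    void swapAtSyncPoint() { std::swap( mWrite, mRead ); }
};
```

The memory cost is one extra copy of the duplicated state, but contention drops to a single swap per frame.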
I don't get why you're duplicating everything and seemingly using Ogre directly, or Ogre-like concepts, in your logic.
Back when I was writing Distant Souls, Ogre was fully contained to the rendering thread, except for skeletons which also had a copy simulated on the logic thread for important characters that needed deterministic attachments (e.g. weapons).
There were no nodes, just two flat arrays of objects in the scene (one for objects that required constant sync, like characters and attacks; another for rare sync, like trees, rocks and rendering-only entities). The logic thread copied each object's position & orientation from Havok into a container, and every frame the rendering engine copied that data to the Ogre node.
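That flat-array sync could look something like the following minimal sketch; `SceneNodeStub` and `syncToNodes` are hypothetical stand-ins (no real Ogre or Havok types are used):

```cpp
#include <vector>

struct Vec3 { float x, y, z; };

// Stand-in for Ogre::SceneNode; only the part the sync touches.
struct SceneNodeStub
{
    Vec3 position{};
    void setPosition( const Vec3 &p ) { position = p; }
};

// One entry per object in the flat array. The logic thread writes
// logicPosition (copied out of the physics engine); the render thread
// owns the node.
struct SyncedObject
{
    Vec3 logicPosition{};
    SceneNodeStub *node = nullptr;
};

// Render-thread side of the sync point: walk the flat array and copy
// each stored transform into its scene node.
void syncToNodes( std::vector<SyncedObject> &objects )
{
    for( SyncedObject &obj : objects )
        obj.node->setPosition( obj.logicPosition );
}
```

Because the array is flat and the copy is a dumb loop, the render thread never needs to know how the logic thread produced those positions.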
For animations that existed on both logic and rendering, logic would periodically send messages basically saying "I am here" (running animations A, B, C at time X with weights Y), and rendering just ensured it wouldn't run too far ahead (basically rendering acted on its own, but with a few constraints to prevent it from deviating too much).
Particle FXs were fired from the logic thread, which passed messages for the render thread to spawn, play and stop them.
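That kind of one-way message passing might be sketched like this, assuming a hypothetical `FxQueue` that the render thread drains once per frame (a real engine would guard it with a mutex or use a lock-free queue):

```cpp
#include <cstddef>
#include <deque>

// Hypothetical particle FX command sent from logic to rendering.
struct FxMessage
{
    enum Type { Spawn, Play, Stop } type;
    int fxId; // identifies which effect the command targets
};

class FxQueue
{
    std::deque<FxMessage> mMessages; // in a real engine: mutex-guarded
                                     // or a lock-free queue
public:
    // Logic thread: fire-and-forget; never waits on the render thread.
    void push( FxMessage msg ) { mMessages.push_back( msg ); }

    // Render thread: process everything queued since last frame and
    // return how many messages were handled.
    template <typename Func>
    size_t drain( Func &&handler )
    {
        const size_t count = mMessages.size();
        for( const FxMessage &msg : mMessages )
            handler( msg );
        mMessages.clear();
        return count;
    }
};
```

The key property is that messages flow one way: logic never reads anything back, so it never stalls on rendering.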
al2950 wrote: ↑Tue Nov 14, 2017 11:38 am
- So you need to create a persistent copy and update it at a specific sync point. This has a number of issues. Firstly, with scene nodes for example, you need to store and sync create, delete, attach and detach events, not just state such as position. Secondly, the sync point implies some form of locking, which will have a performance impact.
Those are all rather trivial to maintain (create, attach, detach) as the logic has no knowledge of that. That's something the render thread takes care of.
Sync points aren't a problem either, because you don't need to lock to sync except in some rare cases (usually when the logic thread needs to know something from the render thread and it can't be delayed until the next frame). If you couldn't pass something to/from the render thread this frame, you just do it on the next frame.
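The "delay it to the next frame" idea can be illustrated with a hypothetical polling query: the logic thread never blocks, and if the answer isn't ready yet it simply asks again next frame.

```cpp
#include <optional>

// Hypothetical deferred query: if the logic thread needs data back
// from the render thread, the result is buffered and only becomes
// available at a later sync point, instead of blocking either thread.
class DeferredQuery
{
    std::optional<int> mResult; // filled in by the render thread
public:
    // Logic thread: poll instead of blocking; an empty optional means
    // "not ready yet, ask again next frame".
    std::optional<int> tryGetResult()
    {
        std::optional<int> out = mResult;
        mResult.reset();
        return out;
    }

    // Render thread: publish the answer when it reaches its sync point.
    void publish( int value ) { mResult = value; }
};
```

This mirrors how buffered GPU readbacks work: the asker tolerates a frame of latency in exchange for never stalling.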
Logic doesn't require Graphics to be initialized at all, as it normally shouldn't read data back from Graphics. This means an Entity could spawn and stay invisible for a frame or two if a sync interval is missed (though rarely with more than one frame of latency).
In other words, maintaining a render split between Logic and Graphics is very similar to the CPU/GPU relationship: usually we issue commands CPU -> GPU, and when we need something back we either buffer the result and query later whether it's ready, or we just stall (block the main thread). The difference is that GPU -> CPU readbacks usually involve 2 or 3 frames of latency, whereas with threading the turnaround tends to be much faster (it really depends on how long the rendering thread takes to reach the sync point).