Over the last few days of development I have increasingly come to the conclusion that Ogre, the way it currently works, is always CPU bound.
The manual doesn't give very deep hints on how to improve things.
There is also the number of worker threads, but I don't have the feeling that the engine actually uses them to increase performance. At least nothing shows up in Remotery when I build it with std::thread support.
I do know that there are more threads, though, because I see messages like this one:
DefaultWorkQueue('Root')::WorkerFunc - thread 8 starting.
I discovered my lack of frame-time headroom while experimenting with a second rendered scene. I used this compositor for that purpose:
abstract target rt_renderwindow
{
    //Render opaque stuff
    pass render_scene
    {
        profiling_id rt_renderwindow
        load
        {
            all clear
            clear_colour 0.2 0.4 0.6 1
        }
        store
        {
            depth dont_care
            stencil dont_care
        }
        overlays on
        shadows ShadowMapDebuggingShadowNode
    }
}

abstract target main_stereo_render
{
    //Eye render
    pass render_scene
    {
        profiling_id main_stereo_render
        load
        {
            all clear
            clear_colour 0.2 0.4 0.6 1
        }
        store
        {
            depth dont_care
            stencil dont_care
        }
        //0x01234567
        identifier 19088743
        overlays on
        cull_camera VrCullCamera
        shadows ShadowMapDebuggingShadowNode
        instanced_stereo true
        viewport 0 0.0 0.0 0.5 1.0
        viewport 1 0.5 0.0 0.5 1.0
    }
}

compositor_node OpenVRNodeNoRDM
{
    in 0 stereo_output
    target stereo_output : main_stereo_render {}
}

compositor_node OpenVRMirrorWindowNode
{
    in 0 rt_renderwindow
    target rt_renderwindow : rt_renderwindow {}
}

workspace OpenVRWorkspaceNoRDM
{
    connect_output OpenVRNodeNoRDM 0
}

workspace OpenVRMirrorWindowWorkspace
{
    connect_output OpenVRMirrorWindowNode 0
    //connect_output OpenVRNodeNoRDM 0
}
But actually doing this eats up ~2ms of CPU time.
As the developer of the application there are some things I can do. I can move my logic to a different thread to give the rendering thread more room to breathe. But that is not my problem, as the GameState of my application only takes 0.5ms. Bullet physics takes much more time, but I've already moved that to a separate thread.
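For readers who haven't done this kind of split before, here is a minimal sketch of the idea in plain C++. It is not Ogre or Bullet code and not what my application actually does: `SimSnapshot`, `SimulationThread` and the fake "physics step" are made-up names for illustration. The simulation runs on its own thread and publishes a small snapshot; the render thread only pays for a short lock and a struct copy per frame.

```cpp
#include <atomic>
#include <chrono>
#include <mutex>
#include <thread>

// Hypothetical state handed from the simulation thread to the render thread.
struct SimSnapshot
{
    unsigned long long step = 0;  // number of completed simulation steps
    float positionX = 0.0f;       // stand-in for real object transforms
};

// Runs a simple loop on its own thread and publishes the latest snapshot.
class SimulationThread
{
public:
    SimulationThread() : mRunning( true ), mThread( &SimulationThread::run, this ) {}

    ~SimulationThread()
    {
        mRunning = false;  // ask the loop to stop ...
        mThread.join();    // ... and wait until it has finished
    }

    // Called by the render thread once per frame; only copies a small struct.
    SimSnapshot latest()
    {
        std::lock_guard<std::mutex> lock( mMutex );
        return mLatest;
    }

private:
    void run()
    {
        SimSnapshot local;
        while( mRunning )
        {
            // Placeholder for the expensive work (e.g. a physics step).
            local.step += 1;
            local.positionX += 0.01f;
            {
                // Hold the lock only while publishing the result.
                std::lock_guard<std::mutex> lock( mMutex );
                mLatest = local;
            }
            std::this_thread::sleep_for( std::chrono::milliseconds( 1 ) );
        }
    }

    std::atomic<bool> mRunning;
    std::mutex mMutex;
    SimSnapshot mLatest;
    std::thread mThread;  // declared last so the other members exist before run() starts
};
```

In the render loop one would call `latest()` once per frame and apply the result to the scene nodes. Note that this only takes application logic off the render thread; it does nothing about the time spent inside the engine's own render calls, which is the part I'm asking about.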
My questions now are:
- Is it possible to render two scenes in parallel for this specific case?
- Why does rendering take that much time and why is the CPU locked while doing so? And how can I shorten that time?
4.718ms is OK most of the time, considering we have about 11ms per frame on a 90 Hz headset. When I deactivate the non-VR mirror and just use the standard compositor, it's not that bad and playable most of the time. But it is a little dangerous, considering the effects if I can't fit all rendering into those 11ms.
I'm still looking for mistakes in my own program. Maybe I'm doing something wrong. The performance is just not what I'd expect it to be.
I also dug a little deeper and compared the VR performance of Ogre 2.3 with Beat Saber (Unity) and Source 2 applications to check if this was a bug in OpenVR in general.
To make the comparison fairer I decided to take my own application out of the picture and just use Tutorial_OpenVR for the comparison.
Both SteamVR Home and Half-Life: Alyx have their mirror window disabled (only the Mixed Reality mirror). The GPU is screaming in pain in both applications but the CPU is pretty chill.
Beat Saber has an active mirror window (with a different FOV, so the scene is rendered again just for the mirror) and the CPU frame time looks great. The GPU is pumping, but well, there is gameplay going on.
Tutorial_OpenVR (with the original compositor) has a few spheres on screen in a nearly empty scene and the CPU time is already at 3ms.
I'm really confused about this. If I compare Tutorial_OpenVR with the AreaLights example, the latter pumps out the frames:
But maybe this comes from the fact that the scene must be rendered twice for VR. The 1.4ms of AreaLights times 2 is 2.8ms, which would roughly match the 3ms if I calculate it that way.
Long story short: I currently have the feeling that Ogre is held back by using only one core, at least on my build.
That is why I created this thread. There weren't really any existing ones about rendering performance and CPU load.