[2.3] Ways to improve performance? (VR oriented) Topic is solved

Discussion area about developing with Ogre-Next (2.1, 2.2 and beyond)


Slamy
Gnoblar
Posts: 22
Joined: Sat Mar 27, 2021 10:49 pm
Location: Bochum, Germany
x 10

[2.3] Ways to improve performance? (VR oriented)

Post by Slamy »

It's kinda scary how many threads I've opened in the last weeks. But hey, I think this is an important topic and I haven't found an answer for this one yet. :oops:

Over the last few days of development I have increasingly come to the conclusion that the way Ogre currently renders is always CPU bound.
The manual doesn't give very deep hints on how to improve things.
There is also the number of worker threads, but I don't have the feeling that the engine actually uses them to increase performance. At least in Remotery they are not visible when I build with std::thread support.
But I know that there are more threads, as I see messages like these:

Code:

DefaultWorkQueue('Root')::WorkerFunc - thread 8 starting.

Let's check out my Remotery profile:
Image

I discovered my lack of frame-time headroom while experimenting with a second rendered scene. I used this compositor for that purpose:

Code:


abstract target rt_renderwindow
{
	//Render opaque stuff
	pass render_scene
	{
		profiling_id rt_renderwindow
		
		load
		{
			all				clear
			clear_colour	0.2 0.4 0.6 1
		}
		store
		{
			depth	dont_care
			stencil	dont_care
		}

		overlays	on

		shadows		ShadowMapDebuggingShadowNode
	}
	

}

abstract target main_stereo_render
{
	//Eye render
	pass render_scene
	{
		profiling_id main_stereo_render
		
		load
		{
			all				clear
			clear_colour	0.2 0.4 0.6 1
		}
		store
		{
			depth	dont_care
			stencil	dont_care
		}

		//0x01234567
		identifier 19088743

		overlays	on

		cull_camera VrCullCamera

		shadows		ShadowMapDebuggingShadowNode

		instanced_stereo true
		viewport 0 0.0 0.0 0.5 1.0
		viewport 1 0.5 0.0 0.5 1.0
		
	}

}

compositor_node OpenVRNodeNoRDM
{
	in 0 stereo_output

	target stereo_output : main_stereo_render {}
}

compositor_node OpenVRMirrorWindowNode
{
	in 0 rt_renderwindow

	target rt_renderwindow : rt_renderwindow {}

}

workspace OpenVRWorkspaceNoRDM
{
	connect_output OpenVRNodeNoRDM 0
}

workspace OpenVRMirrorWindowWorkspace
{
	connect_output OpenVRMirrorWindowNode 0
	//connect_output OpenVRNodeNoRDM 0
}

I came up with this because I wanted to be able to show the world on the monitor without any VR properties. Current games do this a lot, to allow recording game footage or sharing the gameplay with a person in the same room.
But actually doing this eats up ~2ms of CPU time.

As a developer of the application I can do some things. I can put my logic in a different thread to give the rendering thread more space to breathe. But this is not my problem as the GameState of my application just uses 0.5ms. Bullet physics takes much more time but I've already put that in a separate thread.

My questions are:
- Is it possible to render two scenes in parallel for this specific case?
- Why does rendering take that much time and why is the CPU locked while doing so? And how can I shorten that time?

4.718ms is OK most of the time, considering we have about 11ms per frame on a 90 Hz headset. When I deactivate the non-VR mirror and just use the standard compositor, it's not that bad and is playable most of the time. But it is a bit dangerous, considering what happens if I can't fit all rendering into those 11ms.

I'm still looking for mistakes in my own program. Maybe I'm doing something wrong. The performance is just not what I'd expect it to be.
I also dug a little deeper and compared the VR performance of Ogre 2.3 with Beat Saber (Unity) and Source 2 applications to check if this was a bug in OpenVR in general.

Image

Image

Image

Image

To make the comparison fairer, I decided to remove my own application from the list and just use Tutorial_OpenVR for comparison.
Both SteamVR Home and Half Life: Alyx have their mirror window disabled (only the Mixed Reality mirror). The GPU is screaming in pain in both applications but the CPU is pretty chill.
Beat Saber has an active mirror window (with a different FOV, so the scene is rendered again just for the mirror) and the CPU frametime looks great. The GPU is pumping, but well, there is gameplay going on.
Tutorial_OpenVR (with the original compositor) has a few spheres on screen in a nearly empty scene and the CPU time is already at 3ms.

I'm really confused about this. Compared with Tutorial_OpenVR, the AreaLights example pumps out the frames:

Image

But maybe this comes from the fact that the scene must be rendered twice for VR. The 1.4ms of AreaLights times 2 comes to 2.8ms, which is roughly the 3ms I see, if I calculate it that way.

Long story short: I currently have the feeling that Ogre is held back by using only 1 core, at least on my build.
That is why I created this thread. I couldn't really find any existing ones about rendering performance and CPU load.
dark_sylinc
OGRE Team Member
Posts: 5296
Joined: Sat Jul 21, 2007 4:55 pm
Location: Buenos Aires, Argentina
x 1278

Re: [2.3] Ways to improve performance? (VR oriented)

Post by dark_sylinc »

CPU readings are going to be misleading because SteamVR purposely burns CPU cycles waiting for the next VBlank so that the work is submitted right after the previous VBlank. Valve calls it "Running Start VSync".

This masks GPU-bound problems as CPU-bound ones: when the GPU isn't fast enough, the CPU burns extra cycles waiting to get to the next VBlank.

This is showing up in your graph, where the Root::frameEnded listener is taking a monumental amount of time (which is where OpenVRCompositorListener::updateHmdTrackingPose is likely getting called). That's why we advise experimenting with VrWaitingMode; it can make a massive difference.

As for improving performance, you can:
  • Lower the resolution
  • Lower the number of shadow maps
  • Lower the shadow map resolution
  • Just disable shadow maps
  • Use Radial Density Mask (RDM)
  • Use Hidden Area Mesh (HAM, i.e. mHiddenAreaMeshVr in our code)
  • Use cheaper BRDFs. e.g. PbsBrdf::BlinnPhongFullLegacy is the cheapest, PbsBrdf::BlinnPhong is a good compromise that looks quite nice like Default while being cheaper
  • Disable Forward+
  • Use baking (see Samples/2.0/Tutorials/Tutorial_TextureBaking)
  • Avoid v1 overlays. They may stall the engine. Use Colibri (before you migrate anything, just disable v1 overlays and see if it makes any difference!).
And obviously make sure you're in Release. Debug builds have a significant performance overhead.
Slamy
Gnoblar
Posts: 22
Joined: Sat Mar 27, 2021 10:49 pm
Location: Bochum, Germany
x 10

Re: [2.3] Ways to improve performance? (VR oriented)

Post by Slamy »

First of all, thank you for your fast response.
I've made some progress and would like to share my changes and the impact of each change.
They are not VR relevant at all and could help with any application.

1. I started with my current software, which has an average of 310 fps on my machine. I use the null compositor listener, as it allows a higher framerate and shows raw performance. I use the standard resolution of 3704 x 2056 for the null compositor, as the OpenVR sample does as well. The mirror window is done like in the sample, with an instanced stereo view. There is no additional render scene with another camera.
Hidden area mesh and RDM are not active at the moment. The number of lights is decreased compared to the sample: only ambient light and a directional sun with one shadow map (at least I hope so). I use Forward3D as I wanted to have a point light without shadow casting, which is not rendered without Forward3D (a bug?).

2. I've removed the 2D overlay and all text rendering. The avg FPS is instead printed on the console every 3 frames.
The framerate is increased to 340 fps. I assume this is what you've meant with v1 overlays?

3. Readded hidden mesh from the open vr example.
No impact on performance. About 343 fps on average.

Failed 4. Readded RDM. Reverted to the original compositor, which has the RDM node intact. (Yeah, I had removed it.)
It's a disaster, as the framerate drops to 300 fps. So the impact on performance is not a positive one.

4. Removal of an Ogre::v1::BillboardChain which I've used to simulate a laser effect. I needed a "line ray" as some sort of 3D cursor.
By doing so I've increased the fps from 343 to 390. Cool. But the billboard chain is very useful. :oops:

5. Removal of a single Ogre::ManualObject.
On every frame I executed clear() followed by begin(..., Ogre::OT_TRIANGLE_LIST) for what I'm using as a "light ray parabola thing" to show a teleport position. This must be done on every frame. But I guess I did something wrong here, as this had an enormous impact. The framerate rose from 390 fps to 670 fps. :shock: :shock: :shock:

6. Removal of some game state update calculations which I should definitely put in a separate thread.
I'm now at 774 fps.

It didn't even take an hour and I have more than doubled the framerate. I don't yet know how it will look in VR, but I assume that this has improved a lot! So I didn't try everything you said, but if it stays like this I'm confident that I have enough frame-time headroom to tinker around.
Losing the BillboardChain is not that big of a disaster. I'll just make an unlit transparent cylinder of light. But I need to check why the ManualObject is so expensive. I do remember that you've provided an example for dynamic meshes, so I will look into that.
dark_sylinc
OGRE Team Member
Posts: 5296
Joined: Sat Jul 21, 2007 4:55 pm
Location: Buenos Aires, Argentina
x 1278

Re: [2.3] Ways to improve performance? (VR oriented)

Post by dark_sylinc »

Hi!

Thanks for sharing your experience!
2. I've removed the 2D overlay and all text rendering. The avg FPS is instead printed on the console every 3 frames.
The framerate is increased to 340 fps. I assume this is what you've meant with v1 overlays?
Yup. Further investigation would be needed to see if that was just merely the cost of drawing things on screen, or the v1 infrastructure really was slowing you down.

Btw start measuring your times in milliseconds instead of FPS.

Going from 310 fps to 340 fps is 3.23ms vs 2.94, that's a respectable 0.28ms improvement for just a couple letters on screen (that's 2.52% of your 11.11ms budget).
However, going from 40 fps to 70 fps is still a 30 fps difference, but in ms that's 25ms vs 14.29ms, a whopping 10.71ms.
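
A minimal sketch of that conversion, in case it helps anyone redoing the numbers. The function names are made up for illustration and are not part of Ogre; the 90 Hz refresh rate is the headset figure mentioned earlier in the thread.

```cpp
#include <cassert>

// Frame time in milliseconds for a given frame rate.
double frameTimeMs(double fps)
{
    assert(fps > 0.0);
    return 1000.0 / fps;
}

// Milliseconds saved per frame when the frame rate rises from fpsBefore to fpsAfter.
double savedMs(double fpsBefore, double fpsAfter)
{
    return frameTimeMs(fpsBefore) - frameTimeMs(fpsAfter);
}

// Share of the per-frame budget at the given refresh rate, in percent.
double budgetSharePercent(double ms, double refreshHz)
{
    return 100.0 * ms / frameTimeMs(refreshHz);
}
```

With these, savedMs(310, 340) comes out at roughly 0.28ms and savedMs(40, 70) at roughly 10.71ms, even though both are a 30 fps difference.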
3. Readded hidden mesh from the open vr example.
No impact on performance. About 343 fps on average.

Failed 4. Readded RDM. Reverted to the original compositor, which has the RDM node intact. (Yeah, I had removed it.)
It's a disaster, as the framerate drops to 300 fps. So the impact on performance is not a positive one.
OK you were genuinely CPU bound (that is more obvious with the last thing you did, which doubled perf)
4. Removal of an Ogre::v1::BillboardChain which I've used to simulate a laser effect. I needed a "line ray" as some sort of 3D cursor.
By doing so I've increased the fps from 343 to 390. Cool. But the billboard chain is very useful.
Agreed on usefulness. Unfortunately they weren't ported to a v2 equivalent, mostly because we didn't have time and because the v1 versions were good enough for most people. VR happens to demand extreme performance and every drop counts when targeting 90 Hz.

Please note the performance may not scale linearly, e.g. 1 laser gives you a 0.35ms impact but it may turn out that 100 lasers barely increase the cost (especially if all those lasers end up sharing the same billboard renderer). That's something you'll have to measure.
5. Removal of a single Ogre::ManualObject.
On every frame I executed clear() followed by begin(..., Ogre::OT_TRIANGLE_LIST) for what I'm using as a "light ray parabola thing" to show a teleport position. This must be done on every frame. But I guess I did something wrong here, as this had an enormous impact. The framerate rose from 390 fps to 670 fps.
Ok that ManualObject was clearly forcing a stall (or doing something very, very inefficiently). If you hit a stall you lose double/triple buffering and that halves performance. It's very bad, and even more in VR.

Ogre::ManualObject was a community contribution and had some rough edges but there was no alternative and people liked the user friendly interface so it was accepted. But sadly it has problems like these.
IIRC ManualObjects have an option to convert to a Mesh, after which you destroy the ManualObject. That may solve the problem while still being able to use MOs (unless you need to construct MOs dynamically, because every time you create one it is going to cause a stall, which looks like a hitch in your framerate).

I think we should document this behavior of MOs in the code.

Cheers!
Matias
Slamy
Gnoblar
Posts: 22
Joined: Sat Mar 27, 2021 10:49 pm
Location: Bochum, Germany
x 10

Re: [2.3] Ways to improve performance? (VR oriented)

Post by Slamy »

Howdy!
dark_sylinc wrote: Wed May 12, 2021 6:43 pm Btw start measuring your times in milliseconds instead of FPS.
Of course you are right. Sorry about that. FPS doesn't scale linearly with frame time, which makes it difficult to compare the actual cost of each change.
dark_sylinc wrote: Wed May 12, 2021 6:43 pm Going from 310 fps to 340 fps is 3.23ms vs 2.94, that's a respectable 0.28ms improvement for just a couple letters on screen (that's 2.52% of your 11.11ms budget).
We must take this with a grain of salt. I've discovered a measurement error on my side. It turns out I like to listen to music on YouTube in Firefox while doing hobby programming. That lowers performance slightly and introduces a small error.

I've repeated all my measurements and made sure that only my IDE was running during that time. Luckily I had every change as a separate commit for reproducibility.

Code:

Improvement [ms] | Avg frame time [ms] | Avg FPS | Action
                 |                2.99 |     335 | starting point
            0.09 |                2.90 |     344 | removed overlay
            0.11 |                2.79 |     360 | readded hidden mesh
            0.35 |                2.44 |    ~410 | removed rightHand_shootLaserChain
            0.96 |                1.48 |     678 | removed manual object with triangle list
            0.22 |                1.26 |     793 | removed much too complex gamestate calcs
           -0.03 |                1.29 |     773 | readded manual object with triangle list for teleport parabola, but this time with beginUpdate and without clear
            0.00 |                1.29 |     778 | reactivated rightHand_shoot laser, but this time with a manual object and a custom billboard implementation
           -1.27 |                2.56 |     391 | reactivated mirror window render with additional camera; removed hidden area mesh temporarily as it's ugly

The manual object was the biggest performance killer of them all. The v1 billboard chain comes in second.
Yesterday I removed all clear() calls for manual objects and replaced them by properly reserving enough space.
The length of the "teleporting parabola" changes according to the angle and whether it hits geometry. It is also attached to the hands, so it must be recalculated every frame.
But I'm now using a fixed number of triangles for it and move the unused ones really far away so they aren't shown. I currently don't have a better solution for this.
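
The fixed-size approach described above can be sketched roughly like this. It is only an illustration with plain structs instead of Ogre types: Vec3, Segment, kMaxSegments, kParked and buildParabola are hypothetical names, and the capacity and the parking position are arbitrary assumptions. The idea is that the number of segments never changes from frame to frame, so the buffer can be updated in place.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

struct Vec3 { float x, y, z; };
struct Segment { Vec3 from, to; };

constexpr std::size_t kMaxSegments = 32;              // hypothetical fixed capacity
constexpr Vec3 kParked = { 0.0f, -1.0e6f, 0.0f };     // hypothetical far-away position

// Always returns kMaxSegments segments. The first usedSegments follow the
// ballistic arc; the rest are parked far away instead of being removed.
std::array<Segment, kMaxSegments> buildParabola(Vec3 start, Vec3 velocity, float gravity,
                                                float timeStep, std::size_t usedSegments)
{
    assert(usedSegments <= kMaxSegments);

    // Position on the arc after i time steps.
    auto pointAt = [=](std::size_t i) -> Vec3 {
        const float t = timeStep * static_cast<float>(i);
        return Vec3{ start.x + velocity.x * t,
                     start.y + velocity.y * t - 0.5f * gravity * t * t,
                     start.z + velocity.z * t };
    };

    std::array<Segment, kMaxSegments> segments{};
    for (std::size_t i = 0; i < kMaxSegments; ++i)
    {
        if (i < usedSegments)
            segments[i] = { pointAt(i), pointAt(i + 1) };
        else
            segments[i] = { kParked, kParked };
    }
    return segments;
}
```

Each Segment would then be turned into one quad between beginUpdate() and end(); parked segments end up far away from the camera and are not visible.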

And if we look at the numbers now, the manual objects don't have any impact at all any more. The measurement noise is bigger than the actual cost.
So Ogre should warn about frequent usage of clear(). I see potential for an "Ogre, please tell me why I'm stupid" profiling mode which collects data about engine misuse.
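
To judge whether a difference like the -0.03ms above is real or just noise, one crude option is to compare it against the spread of the measured frame times. This is only a sketch under stated assumptions: FrameStats, computeStats and isMeasurable are hypothetical helpers, and "difference larger than the combined standard deviation" is a rough rule of thumb, not a proper statistical test.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

struct FrameStats
{
    double meanMs;    // average frame time
    double stdDevMs;  // sample standard deviation of the frame times
};

// Mean and sample standard deviation of a list of frame times in milliseconds.
FrameStats computeStats(const std::vector<double>& frameTimesMs)
{
    assert(frameTimesMs.size() >= 2);

    double sum = 0.0;
    for (double t : frameTimesMs)
        sum += t;
    const double mean = sum / frameTimesMs.size();

    double squares = 0.0;
    for (double t : frameTimesMs)
        squares += (t - mean) * (t - mean);

    return { mean, std::sqrt(squares / (frameTimesMs.size() - 1)) };
}

// True when the two means differ by more than the combined spread of both runs.
bool isMeasurable(const FrameStats& before, const FrameStats& after)
{
    const double noise = std::sqrt(before.stdDevMs * before.stdDevMs +
                                   after.stdDevMs * after.stdDevMs);
    return std::fabs(before.meanMs - after.meanMs) > noise;
}
```

Feeding in the per-frame times of two runs gives a quick yes/no on whether a change is worth discussing at all.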

The last measurement, of course, brings me back to where I started. An additional render seems to be a very bad idea, at least the way I did it, with a compositor that has an additional render_scene.
Usually I would say that instanced rendering could fix this by adding yet another instance: left eye, right eye, camera.
But I'm not the expert here, and an efficient mirror window is not the highest priority even for me.
Even Valve didn't get it right with Half Life: Alyx. (No joke my VR fellows. If you have frame drops just deactivate the mirror window. It might get better.)
I also had the idea of just using the left eye and showing that in the mirror window, as I don't really require a completely different camera position. It's just that the view is squashed horizontally in the mirror, so I assume a transformation is required to correct that.
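
For the squashed left-eye view, one possible transformation is to crop the left half of the stereo texture to the aspect ratio of the mirror window instead of stretching it. A rough sketch, assuming a side-by-side stereo texture where the left eye occupies u in [0, 0.5]; UvRect and leftEyeCrop are hypothetical names, and the asymmetric lens projection is ignored:

```cpp
#include <cassert>

// Texture coordinates of the region to show, in the full stereo texture.
struct UvRect { float u0, v0, u1, v1; };

// Largest centred region of the left-eye half that has the window's aspect ratio.
UvRect leftEyeCrop(float eyeWidthPx, float eyeHeightPx,
                   float windowWidthPx, float windowHeightPx)
{
    assert(eyeWidthPx > 0.0f && eyeHeightPx > 0.0f);
    assert(windowWidthPx > 0.0f && windowHeightPx > 0.0f);

    const float eyeAspect = eyeWidthPx / eyeHeightPx;
    const float windowAspect = windowWidthPx / windowHeightPx;

    float uSpan = 0.5f;  // full width of the left half
    float vSpan = 1.0f;  // full height
    if (windowAspect > eyeAspect)
        vSpan = eyeAspect / windowAspect;         // window is wider: crop top and bottom
    else
        uSpan = 0.5f * windowAspect / eyeAspect;  // window is narrower: crop left and right

    const float uCentre = 0.25f;  // centre of the left half
    const float vCentre = 0.5f;
    return { uCentre - 0.5f * uSpan, vCentre - 0.5f * vSpan,
             uCentre + 0.5f * uSpan, vCentre + 0.5f * vSpan };
}
```

For the 3704 x 2056 stereo target (1852 x 2056 per eye) and, say, a 1280 x 720 window, this keeps the full width of the left eye and roughly the middle half of its height.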

But concerning multiple cameras: I think that, especially for split-screen applications, multiple rendering threads that can run in parallel might be a nice feature.

Laser Renderer with ManualObject

One last thing I want to share is the replacement I use for the BillboardChain. It's not very clean, but it does the job:

Code:

static Ogre::ManualObject* shootRayQuads{nullptr};
static int shootRayCreateQuad_index = -1;

void generateShootRay(Ogre::Vector3 camPos, Ogre::SceneNode* laser_node)
{
	// Without the manual object or the laser node there is nothing to draw.
	if (!shootRayQuads || !laser_node)
		return;

	if (shootRayCreateQuad_index == -1)
	{
		// First execution: create the section once.
		shootRayQuads->begin("FancyTransparentMaterial", Ogre::OT_TRIANGLE_LIST);
	}
	else
	{
		// Later frames: update section 0 in place instead of calling clear().
		shootRayQuads->beginUpdate(0);
	}

	shootRayCreateQuad_index = 0;

	// Laser origin and direction (-Z of the node) in world space.
	const Ogre::Vector3 firstPos       = laser_node->_getDerivedPosition();
	const Ogre::Vector3 laserDirection = laser_node->_getDerivedOrientation() * Ogre::Vector3(0, 0, -1);
	const float laserWidth = 0.04f;

	// Offset perpendicular to both the laser and the view direction, so the quads face the camera.
	Ogre::Vector3 viewDirection = camPos - firstPos;
	Ogre::Vector3 perpenDicular = laserDirection.crossProduct(viewDirection).normalisedCopy() * laserWidth;

	Ogre::ColourValue colors[4];
	colors[0] = Ogre::ColourValue(1, 1, 1, 0);
	colors[1] = Ogre::ColourValue(1, 1, 1, 1);
	colors[2] = Ogre::ColourValue(1, 1, 1, 1);
	colors[3] = Ogre::ColourValue(1, 1, 1, 0);

	Ogre::Vector3 positions[4];
	positions[0] = firstPos;
	positions[1] = positions[0] + laserDirection;
	positions[2] = positions[1] + laserDirection;
	positions[3] = positions[2] + laserDirection;

	for (int i = 0; i < 3; i++)
	{
		shootRayQuads->position(positions[i] - perpenDicular);
		shootRayQuads->colour(colors[i]);
		shootRayQuads->position(positions[i] + perpenDicular);
		shootRayQuads->colour(colors[i]);

		shootRayQuads->position(positions[i + 1] - perpenDicular);
		shootRayQuads->colour(colors[i + 1]);
		shootRayQuads->position(positions[i + 1] + perpenDicular);
		shootRayQuads->colour(colors[i + 1]);

		shootRayQuads->index(shootRayCreateQuad_index + 0);
		shootRayQuads->index(shootRayCreateQuad_index + 1);
		shootRayQuads->index(shootRayCreateQuad_index + 2);

		shootRayQuads->index(shootRayCreateQuad_index + 1);
		shootRayQuads->index(shootRayCreateQuad_index + 3);
		shootRayQuads->index(shootRayCreateQuad_index + 2);

		shootRayCreateQuad_index += 4;
	}

	shootRayQuads->end();
}

This code will give you this:
Image

While not beautiful, it does the basic job of a necessary pointing mechanism at nearly zero cost compared to a single Ogre::v1::BillboardChain.
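
The core of the code above is the sideways offset that makes the quads face the camera. Stripped of Ogre types it boils down to the following sketch; Vec3, cross, length and ribbonOffset are stand-ins written for this example, and returning a zero offset when the camera looks straight along the ray is an assumption about how to handle that degenerate case:

```cpp
#include <cassert>
#include <cmath>

struct Vec3 { float x, y, z; };

Vec3 operator-(Vec3 a, Vec3 b) { return Vec3{ a.x - b.x, a.y - b.y, a.z - b.z }; }

Vec3 cross(Vec3 a, Vec3 b)
{
    return Vec3{ a.y * b.z - a.z * b.y,
                 a.z * b.x - a.x * b.z,
                 a.x * b.y - a.y * b.x };
}

float length(Vec3 v) { return std::sqrt(v.x * v.x + v.y * v.y + v.z * v.z); }

// Offset to add to and subtract from each point on the ray so that the
// resulting ribbon is perpendicular to the ray and turned towards the camera.
Vec3 ribbonOffset(Vec3 rayStart, Vec3 rayDirection, Vec3 cameraPos, float halfWidth)
{
    assert(halfWidth >= 0.0f);

    const Vec3 toCamera = cameraPos - rayStart;
    const Vec3 side = cross(rayDirection, toCamera);
    const float len = length(side);
    if (len < 1.0e-6f)
        return Vec3{ 0.0f, 0.0f, 0.0f };  // camera lies on the ray: no usable side vector

    const float scale = halfWidth / len;
    return Vec3{ side.x * scale, side.y * scale, side.z * scale };
}
```

Because the offset depends on the camera position, it has to be recomputed whenever the camera or the hand moves, which in VR means every frame.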

Kind regards,
Slamy
TaaTT4
OGRE Contributor
Posts: 267
Joined: Wed Apr 23, 2014 3:49 pm
Location: Bologna, Italy
x 75

Re: [2.3] Ways to improve performance? (VR oriented)

Post by TaaTT4 »

Another thing you could try is to share the cull data among the two eyes. It basically consists in doing the cull pass just once, with a camera that is "behind" the eye cameras, in a way that cull data are then valid for both eyes. Check here for a better explanation: viewtopic.php?p=530019#p530019.
Extra tip: since the differences are barely noticeable, I also use this cull camera as the camera for the shadow caster pass. That way, I can compute the shadow maps just once and, again, share them among the two eyes.
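
How far "behind" the eyes such a cull camera has to sit can be estimated with a bit of trigonometry. The sketch below assumes both eyes use the same symmetric horizontal field of view and that the cull camera uses that field of view too; real headsets have asymmetric frusta, so treat the result as a starting point for experiments. cullCameraPullBack is a hypothetical helper, not an Ogre function:

```cpp
#include <cassert>
#include <cmath>

// Distance to move a centred cull camera backwards along the view axis so that
// its frustum encloses both eye frusta. The eyes sit ipdMetres apart and every
// frustum has the same horizontal half angle (assumption: symmetric frusta).
double cullCameraPullBack(double ipdMetres, double halfFovRadians)
{
    assert(ipdMetres >= 0.0);
    assert(halfFovRadians > 0.0 && halfFovRadians < 1.5707963267948966);

    // The outer edge of each eye frustum is shifted sideways by ipd / 2.
    // Moving back by d widens the centred frustum by d * tan(halfFov) at the
    // eye plane, so d * tan(halfFov) must be at least ipd / 2.
    return (ipdMetres * 0.5) / std::tan(halfFovRadians);
}
```

With an IPD of 0.064 m and a 90 degree horizontal field of view (45 degree half angle), the camera would move back by about 0.032 m.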

If you're CPU bound and don't need/use LODs, disable them (lod_update_list false) in render_scene passes to spare some CPU cycles. Same considerations are (probably!) still valid for Forward+ (enable_forwardplus false).

Senior programmer at 505 Games; former senior engine programmer at Sandbox Games
Worked on: Racecraft Esport, Racecraft Coin-Op, Victory: The Age of Racing

dark_sylinc
OGRE Team Member
Posts: 5296
Joined: Sat Jul 21, 2007 4:55 pm
Location: Buenos Aires, Argentina
x 1278

Re: [2.3] Ways to improve performance? (VR oriented)

Post by dark_sylinc »

Another thing you could try is to share the cull data among the two eyes. It basically consists in doing the cull pass just once, with a camera that is "behind" the eye cameras, in a way that cull data are then valid for both eyes
IIRC instanced stereo should already be doing that for you (or if it's not, then it's still doing culling only once with the left eye)