Problem: performance with many low-poly models on scene; how to find bottleneck?

soft_fur_dragon
Kobold
Posts: 36
Joined: Sat Aug 07, 2021 12:00 pm
x 5

Post by soft_fur_dragon »

Ogre Version: 1.12.12
Operating System: Windows 10
Render System: GL3
GPU: NVIDIA GeForce RTX 2070
CPU: Intel(R) Core(TM) i7-8750H CPU @ 2.20GHz
RAM: 32Gb

Shadows: No
PostProcess: Simple image mirroring

I have about 17000 entities in this scene. You can see the quality of the models, and you can also see how "much" of the resources it uses, yet I still get only 26 fps. Why?
P.S. With shadows enabled it drops to 10 fps :cry:

P.P.S. I guess this may be happening because Ogre doesn't use instancing by default. But I'm not sure, and I would like to get exact data about what takes time inside renderOneFrame() and how much.

I took some measurements and found:
120800 nanoseconds (about 0.12 ms) - everything else inside my world tick
48932100 nanoseconds (about 48.9 ms) - ogre_app_->getRoot()->renderOneFrame();
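
As a rough illustration of what these two measurements mean, here is a small sketch in plain C++ (no Ogre calls). The function and constant names are made up for this example; only the two nanosecond figures come from the post.

```cpp
#include <cstdint>

// Convert a per-frame duration in nanoseconds to milliseconds.
constexpr double nanosToMillis(std::int64_t nanos) {
    return static_cast<double>(nanos) / 1.0e6;
}

// Frames per second that a given per-frame duration allows.
constexpr double fpsFromFrameNanos(std::int64_t nanos) {
    return 1.0e9 / static_cast<double>(nanos);
}

// Numbers from the post: time inside renderOneFrame() and in the rest of the tick.
constexpr std::int64_t kRenderNanos = 48932100;
constexpr std::int64_t kOtherNanos  = 120800;

// Share of the whole tick that is spent inside renderOneFrame().
constexpr double renderShare() {
    return static_cast<double>(kRenderNanos) /
           static_cast<double>(kRenderNanos + kOtherNanos);
}
```

With these inputs, renderOneFrame() alone takes about 48.9 ms, which caps that frame at roughly 20 fps, and it accounts for more than 99% of the tick. So the game logic is not the bottleneck; whatever is slow is inside the render call.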

Image

This is a grass model I have:
Image

And this is an example of single terrain chunk:
Image
Every layer along the vertical axis is a separate entity. I need that for gameplay reasons.
I have 49 of them

I would also like to hear about your experience with big scenes, because this is only the hills. I also have a forest which I haven't even tried to open in Ogre yet :cry:

Upd: Let's say we have 16700 grass entities (the total minus terrain). That gives 601 200 triangles from the grass. Part of it is occluded; we see about a quarter of it, i.e. about 150 000 triangles. How much is that?
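
The arithmetic in this update can be checked with a short sketch. The two input figures are from the post; the helper names are hypothetical, and the "quarter visible" fraction is the poster's own estimate, not a measured value.

```cpp
#include <cstdint>

// Figures taken from the post.
constexpr std::int64_t kGrassEntities  = 16700;
constexpr std::int64_t kGrassTriangles = 601200;

// Triangles per grass model implied by those two numbers.
constexpr std::int64_t trianglesPerGrassModel() {
    return kGrassTriangles / kGrassEntities;
}

// Triangles left if only the given fraction of the grass is visible.
constexpr std::int64_t visibleTriangles(double visibleFraction) {
    return static_cast<std::int64_t>(kGrassTriangles * visibleFraction);
}
```

That works out to 36 triangles per grass model and about 150 300 visible triangles, so the per-model cost is tiny and the interesting number is the entity count rather than the triangle count.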

Upd: found that, so I guess 150 000 shouldn't be a problem for my GPU at all

Post by soft_fur_dragon »

That was a debug build. In a release build I get 130 fps with a fully functional shader (including shadows) in a maximized window.

But why? I still don't understand why the fps is so low with such low resource consumption, and why there is such a big difference between the two builds.

Plus, the question of how to detect the bottleneck remains open.
paroj
OGRE Team Member
Posts: 1572
Joined: Sun Mar 30, 2014 2:51 pm
x 749

Post by paroj »

each entity results in one draw call by default (+1 for the shadow map).

see:
- https://ogrecave.github.io/ogre/api/lat ... _geom.html
- https://ogrecave.github.io/ogre/api/lat ... filer.html
rpgplayerrobin
Goblin
Posts: 244
Joined: Wed Mar 18, 2009 3:03 am
x 109

Post by rpgplayerrobin »

Yeah, each entity is one draw call (one glDrawElements call in OpenGL, for example).
Nvidia talked about batches in an article many years ago, explaining that at 10 000 batches the CPU was 100% busy just pushing data to the GPU (this was around 2010 or something though).
But the same holds today: batches are poison for performance.

In my game, I use a ManualObject for many things to batch them together, including grass (well, an altered version of it with a huge performance boost, posted here, CManual_Object): viewtopic.php?f=2&t=95784
Grass in my game then becomes 1 batch and barely costs any performance at all, even with my own per-frame dynamic swaying (the grass bends when characters walk through it and from spell effects).

I also use InstanceManager (https://ogrecave.github.io/ogre-next/ap ... ncing.html) with HW Basic for more complex meshes. With it I can display thousands of meshes without the FPS falling, since they are combined into one single batch. Note that you must write a manager around this, since each InstanceManager can only have one single material applied to it (since it is impossible to have multiple materials in one batch).



About checking the bottleneck:
I use a profiler I wrote myself for my game; it is only compiled in when I want it, via #ifdef.
That way it costs no performance when I don't use it.
I just place blocks like this around every function I know can take time:
profiler->Start("Render");
app->m_root->renderOneFrame();
profiler->Stop();
I also use it in all of my own code, for every manager I use. That way I know exactly what costs performance in my game and I can then fix it.

Then it saves the time taken from start to stop under each name and outputs it in a GUI in real time. That way I can see exactly what is taking more time than the rest.
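
A profiler of the kind described here could be sketched as follows. This is not the poster's actual profiler and not OgreProfiler; the class name `SimpleProfiler`, the macro names and the `ENABLE_PROFILING` switch are all assumptions made for illustration. It handles one open section at a time and does not support nesting.

```cpp
#include <chrono>
#include <map>
#include <string>

// Hypothetical minimal profiler: accumulates elapsed time per label.
class SimpleProfiler {
public:
    using Clock = std::chrono::steady_clock;

    // Begin timing a named section.
    void start(const std::string& label) {
        current_ = label;
        begin_ = Clock::now();
    }

    // End the section opened by the last start() and add its duration.
    void stop() {
        const auto elapsed = Clock::now() - begin_;
        Entry& e = entries_[current_];
        e.total += std::chrono::duration_cast<std::chrono::nanoseconds>(elapsed);
        ++e.calls;
    }

    // Number of start/stop pairs recorded for a label (0 if never used).
    int calls(const std::string& label) const {
        const auto it = entries_.find(label);
        return it == entries_.end() ? 0 : it->second.calls;
    }

    // Total nanoseconds recorded for a label (0 if never used).
    long long totalNanos(const std::string& label) const {
        const auto it = entries_.find(label);
        return it == entries_.end() ? 0 : it->second.total.count();
    }

private:
    struct Entry {
        std::chrono::nanoseconds total{0};
        int calls = 0;
    };
    std::map<std::string, Entry> entries_;
    std::string current_;
    Clock::time_point begin_;
};

// Compile the calls away unless ENABLE_PROFILING is defined (hypothetical names).
#ifdef ENABLE_PROFILING
#define PROFILE_START(profiler, label) (profiler).start(label)
#define PROFILE_STOP(profiler) (profiler).stop()
#else
#define PROFILE_START(profiler, label) ((void)0)
#define PROFILE_STOP(profiler) ((void)0)
#endif
```

In use, the render call would be wrapped as `PROFILE_START(profiler, "Render");`, then `renderOneFrame()`, then `PROFILE_STOP(profiler);`, and the accumulated totals would be read once per frame for display.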
Of course, this does not completely solve the problem of finding bottlenecks when it comes to rendering.
The way I see it, just output on screen the number of batches and triangles that were rendered last frame. That way you KNOW if there are too many batches/triangles in the scene, which causes the FPS to drop (I usually have at most 300 batches in my game at a time, but it has a top-down view).
You can "profile" a shader by moving the camera close to it so that it covers the entire screen. This is useful for spotting slow shaders. For example, I made a particle shader that was doing 3D noise, and when I went close to it the FPS went down to around 20. Then you know that shader is extremely slow and you should not use it.
Here is the code to get/output the number of batches/triangles:

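Code:

// Batches (draw calls) and triangles rendered last frame in this viewport
std::string batches = std::to_string(app->m_Camera->getViewport()->_getNumRenderedBatches());
std::string faces = std::to_string(app->m_Camera->getViewport()->_getNumRenderedFaces());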

Post by paroj »

rpgplayerrobin wrote: Mon Nov 01, 2021 1:07 pm
About checking the bottleneck:
I use a profiler I wrote myself for my game; it is only compiled in when I want it, via #ifdef.
That way it costs no performance when I don't use it.
I just place blocks like this around every function I know can take time:
profiler->Start("Render");
app->m_root->renderOneFrame();
profiler->Stop();
I also use it in all of my own code, for every manager I use. That way I know exactly what costs performance in my game and I can then fix it.

note that this is almost exactly what OgreProfiler would do:
https://ogrecave.github.io/ogre/api/lat ... filer.html

the advantage of using it is that it integrates with Remotery:
https://github.com/Celtoys/Remotery

Post by soft_fur_dragon »

1. I had heard about OgreProfiler, but at first glance I thought that it just measures the time between two points in code. As I understand it now, it works like a real profiler and measures the execution time of every function inside.

2. Do 2 different submeshes with the same material on one entity count as two draw calls? Because if so, I literally double the count with my grass. I just have two similar meshes there with normals facing in opposite directions (in Blender you can't have two triangles between the same points, so I decided to just create two meshes and not think about it).
P.S. When I was working in pure OpenGL earlier, I had instancing (180 instances per call for grass :D ). Maybe the only thing that was good there :lol:

Maybe you can tell me how much (in percent, just in your opinion, from what I provided in the first post) Ogre's instancing and other improvements might boost my performance? I just want to understand whether it is worth focusing on right now or whether I should delay it. As I said, I have more complex locations. I have good fps, but only in the release build.
I just don't want to spend 1-2 days setting up a feature that will give me +5% fps when I have more important tasks right now :D

Post by paroj »

take a look at the "New Instancing" Sample, which has a "No Instancing" setting. With 10k entities and shadow mapping it's a difference between 17 fps and 170 fps ("Hardware basic").

Each sub-entity is a separate draw-call.
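
To put rough numbers on this for the scene in the first post, the draw-call count can be estimated as below. This is a back-of-the-envelope sketch, not how Ogre counts batches internally: it ignores culling, assumes one shadow-map pass, and borrows the batch size of 180 from the poster's earlier OpenGL remark. The function names are hypothetical.

```cpp
#include <cstdint>

// Rough draw-call count without instancing: one call per sub-entity,
// repeated for every pass over the scene (main pass plus shadow-map passes).
constexpr std::int64_t drawCallsWithoutInstancing(std::int64_t entities,
                                                  std::int64_t subEntitiesPerEntity,
                                                  std::int64_t shadowPasses) {
    return entities * subEntitiesPerEntity * (1 + shadowPasses);
}

// Rough count with instancing: instances are grouped into batches of a fixed
// size, so the entity count is replaced by the number of batches (rounded up).
constexpr std::int64_t drawCallsWithInstancing(std::int64_t entities,
                                               std::int64_t subEntitiesPerEntity,
                                               std::int64_t shadowPasses,
                                               std::int64_t instancesPerBatch) {
    return ((entities + instancesPerBatch - 1) / instancesPerBatch) *
           subEntitiesPerEntity * (1 + shadowPasses);
}
```

For 16700 grass entities with 2 sub-entities each and one shadow pass, this gives 66800 draw calls without instancing and 372 with batches of 180, which is consistent with the order-of-magnitude difference seen in the sample.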

Post by soft_fur_dragon »

Ok, it sounds more than reasonable to implement instancing then.

I already rewrote my mesh component code to handle meshes that want to be instanced. What is unclear to me is what I should do with my shaders, because right now they do not accept any arrayed data. Does Ogre somehow modify and recompile them to accept arrays in place of specific instance-dependent uniform parameters?

For instance ( :lol: ), this is my basic vertex shader:

Code:

#version 330

#include "utils_vs.glsl"

uniform mat4 model;
uniform mat4 view;
uniform mat4 proj;
uniform float time;
uniform mat4 shadow_vp_0;
uniform vec4 texel_offsets; // SHADOW_PASS
uniform vec4 wind_data;     // WIND / (box_height, scale, -, -)

...
wind_data is a custom parameter. This one and model both depend on the instance, while the others do not, but they are not arrays here. How does Ogre handle that?

Since in the .program file you define each parameter and its value type, it seems more than possible for Ogre to find them in the shader and modify them, but I'm not sure that it actually does that :D

Upd: never mind, I see. I have already found the examples and am studying them :D

Post by soft_fur_dragon »

I just tried the samples. 100 fps with 3 mil triangles and shadows enabled - that is oof :shock:
I definitely have room to expand once I finish with it
Image

Post by rpgplayerrobin »

The differences between HW basic and a normal non-instanced shader are shown below (from my game):

Normal shader material params:

Code:

param_named_auto worldMatrix world_matrix
param_named_auto texViewProj texture_viewproj_matrix
param_named_auto modelViewProj worldviewproj_matrix
param_named_auto texturemat texture_matrix 0

Normal shader vertex shader:

Code:

VS_OUTPUT main_vs( float4 position : POSITION,
				   out float4 oPosition : POSITION,
				   float3 normal : NORMAL,
				   float2 iUV : TEXCOORD0,
				   float3 iTangent : TANGENT )
{
	VS_OUTPUT Out;

	float4 worldPos = float4(mul(worldMatrix, position).xyz, 1);
	float3 worldNorm = normalize(mul((float3x3)worldMatrix, normal));
	float3 worldTangent = normalize(mul((float3x3)worldMatrix, iTangent));

	oPosition = mul(modelViewProj, position);
	Out.oVertexPos = worldPos;
	Out.oUV = iUV;
	Out.oTangent = worldTangent;
	Out.oNormal = worldNorm;
	Out.oBitangent = cross( Out.oTangent, Out.oNormal );

	Out.oUV = mul(texturemat,float4(Out.oUV,0,1)).xy;

	Out.oShadowUV = mul(texViewProj, Out.oVertexPos);

	return Out;
}


HW basic shader material params:

Code:

param_named_auto texViewProj texture_viewproj_matrix
param_named_auto viewProjMatrix viewproj_matrix
param_named_auto texturemat texture_matrix 0

HW basic shader vertex shader:

Code:

VS_OUTPUT main_vs( float4 position : POSITION,
				   out float4 oPosition : POSITION,
				   float3 normal : NORMAL,
				   float2 iUV : TEXCOORD0,
				   float4 mat14 : TEXCOORD1,
				   float4 mat24 : TEXCOORD2,
				   float4 mat34 : TEXCOORD3,
				   float3 iTangent : TANGENT )
{
	VS_OUTPUT Out;

	float3x4 calculatedWorldMatrix;
	calculatedWorldMatrix[0] = mat14;
	calculatedWorldMatrix[1] = mat24;
	calculatedWorldMatrix[2] = mat34;

	float4 worldPos = float4(mul(calculatedWorldMatrix, position).xyz, 1);
	float3 worldNorm = normalize(mul((float3x3)calculatedWorldMatrix, normal));
	float3 worldTangent = normalize(mul((float3x3)calculatedWorldMatrix, iTangent));

	oPosition = mul(viewProjMatrix, worldPos);
	Out.oVertexPos = worldPos;
	Out.oUV = iUV;
	Out.oTangent = worldTangent;
	Out.oNormal = worldNorm;
	Out.oBitangent = cross( Out.oTangent, Out.oNormal );

	Out.oUV = mul(texturemat,float4(Out.oUV,0,1)).xy;

	Out.oShadowUV = mul(texViewProj, Out.oVertexPos);

	return Out;
}

The fragment shaders are exactly the same.
So as you can see, you only need a different vertex shader for it to work.
I have it working using a checkbox in the options menu (but it only updates when you rejoin the game; then it swaps the shaders for the real HW basic ones).
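
For readers unfamiliar with the three-row layout, the sketch below mirrors on the CPU what the HW basic vertex shader does with mat14, mat24 and mat34. It is plain C++ with hypothetical type and function names, not Ogre code. The point it illustrates: an affine world matrix needs only three float4 rows per instance, because the fourth row is always (0, 0, 0, 1).

```cpp
#include <array>

using Vec4 = std::array<float, 4>;
using Vec3 = std::array<float, 3>;

// A 3x4 world matrix stored as three rows, the way the HW basic shader
// receives it in TEXCOORD1..TEXCOORD3.
struct WorldMatrix3x4 {
    Vec4 row0, row1, row2;
};

// Dot product of a matrix row with a homogeneous position.
inline float dot4(const Vec4& a, const Vec4& b) {
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2] + a[3] * b[3];
}

// CPU equivalent of mul(calculatedWorldMatrix, position).xyz in the shader.
inline Vec3 transformPoint(const WorldMatrix3x4& m, const Vec4& position) {
    return {dot4(m.row0, position), dot4(m.row1, position), dot4(m.row2, position)};
}

// Hypothetical helper that builds a translation-only instance matrix.
inline WorldMatrix3x4 makeTranslation(float x, float y, float z) {
    return {Vec4{1.0f, 0.0f, 0.0f, x}, Vec4{0.0f, 1.0f, 0.0f, y}, Vec4{0.0f, 0.0f, 1.0f, z}};
}
```

Each instance supplies its own three rows through the vertex stream, which is why the world_matrix and worldviewproj_matrix auto parameters disappear from the HW basic material and only viewproj_matrix remains.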

Post by soft_fur_dragon »

Yes, I already found that, but thank you anyway :D

Right now I'm stuck on one problem: I don't understand how to receive custom parameters inside instanced shaders.

Also, I've noticed that VTF in the sample on GL3+ works better than HW, even on static geometry. I had 143 fps while HW still had about 100.
I started development on GL3 because I had some compilation/launch problems; I don't remember the details any more. Now that I compile the sources myself I can launch it (though I don't see my instanced entities on GL3+ yet). How does Ogre utilize GL3+ features? How much benefit will I get? I'm not targeting old platforms, so I'm OK with the fact that OpenGL 3+ is not as widely supported.

Post by soft_fur_dragon »

Also, as an optimization, I guess I could stop rendering shadows every frame. I'm making a game with a Minecraft-like world, i.e. it is almost static and changes rarely, so I can render shadows only when tiles change and render only the entities every frame.

But I'm not sure whether I can render shadows via a function call. I found prepareShadowTextures(), but it's about deferred rendering, as far as I can see. So I'm going to implement it then.
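
The "render shadows only when tiles change" idea boils down to a dirty flag. The sketch below covers only that bookkeeping; the class and method names are hypothetical, and how to make Ogre skip or trigger the shadow texture update is deliberately left out because the thread does not settle it.

```cpp
// Hypothetical dirty-flag helper: tracks whether the static world changed
// since the shadow map was last rendered.
class ShadowRefreshTracker {
public:
    // Call whenever a tile is added, removed, or modified.
    void markWorldChanged() { dirty_ = true; }

    // Call once per frame; returns true only when the shadow map should be
    // re-rendered, and clears the flag in that case.
    bool consumeRefreshRequest() {
        const bool refresh = dirty_;
        dirty_ = false;
        return refresh;
    }

private:
    bool dirty_ = true;  // start dirty so the first frame renders shadows
};
```

Each frame the game loop would call consumeRefreshRequest() and only re-render the shadow map when it returns true, reusing the previous shadow texture otherwise. Moving lights or animated shadow casters would also need to mark the world as changed.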
nuke
Halfling
Posts: 66
Joined: Wed Oct 01, 2014 1:16 am
Location: Crimea
x 5

Post by nuke »

Hi!
I can't help you solve the problem :( , but I will tell you one thing.

soft_fur_dragon wrote:
I started development on GL3 because I had some compilation/launch problems; I don't remember the details any more. Now that I compile the sources myself I can launch it (though I don't see my instanced entities on GL3+ yet). How does Ogre utilize GL3+ features? How much benefit will I get? I'm not targeting old platforms, so I'm OK with the fact that OpenGL 3+ is not as widely supported.

@sercero gave me very good and useful advice: on Windows it is better to use DirectX (9 or 11), not OpenGL.
If your application is at an early stage of development, it is worth considering a migration to DirectX right now.

Post by soft_fur_dragon »

Hi!

Actually, you may be right. I only use OpenGL because I had an idea to develop a cross-platform game a few years ago. Since then the project has changed a few times, but using OGL just became a habit. I hadn't even thought about it. I don't think Linux support is that relevant for me now, so I'll take your advice into consideration :D

Upd: yes, for performance reasons you're definitely right. I just tested it on the deferred shading demo (the instancing demo crashes for me :( ) with VSync disabled and got 580 FPS on DirectX and 170 FPS on GL3+ O.O

Post by soft_fur_dragon »

Ok, the current status is as follows.

I managed to move completely to DirectX. I loaded the same scene (plains with flowers), but without shadows. I'm going to implement deferred rendering, so it's pointless for me to implement forward shadows. Without shadows, I get about 400 fps. Even if we assume that the shadow pass halves the FPS (which will never happen, since the shadow pass is always cheaper than the main render), we get about 200 fps, against 70-80 on OpenGL. Sounds like an absolute win.