[2.1] Expectations/Advice on rendering 1 mil items

gabbsson
Halfling
Posts: 65
Joined: Wed Aug 08, 2018 9:03 am
x 13

[2.1] Expectations/Advice on rendering 1 mil items

Post by gabbsson »

OpenGL3+
Linux (Ubuntu)

Hello!

I'm working on replacing an old OpenGL rendering engine with Ogre, and recently got the question of how it performs when rendering 1 million simple meshes. I don't really want to use the word particles, but that's basically what's requested; it's to be used in fluid dynamics visualization.
Essentially each mesh is a triangle (or some other simple mesh) and they will all be identical, but each should have its own SceneNode (transform).

The current OpenGL implementation is said to be able to handle 1 mil objects. Unfortunately I have no idea what hardware it runs on, nor what other "workarounds" it uses. Regardless, I need to be able to handle the same volume (or at least get as close as possible). I'll poke around to see if I can find out more, since I'm guessing it plays a huge role in comparing performance.

I've tried incrementally to see how much Ogre can handle, and it already gets really laggy for me at around 90k items (2-4 fps).
By items I mean individual Ogre::Item objects, each with its own Ogre::SceneNode and a single quad mesh.
Each item uses the same mesh, so I should be benefiting from instancing? Is there a way to verify that?

I have created my own implementations for Unlit and PBS, adding geometry shaders for each.
While testing I've defaulted to using Unlit since I assume it performs better.
Just to do an initial test I swapped my GLSL folder for the sample folder (so just the "basic" shaders) and there was no significant difference.
Face culling is active, but even when faces are culled the fps is really low.
Does that point toward a CPU bottleneck?

Hardware is a big question mark for me right now. My computer is an HP laptop with an Intel i7-6600U, no dedicated graphics.
I don't know what our users will have access to, so hardware is more or less not a parameter right now.
But I would love to know what to "expect" so that I can report back that X is feasible/not feasible with Y hardware.
Is there a ballpark number of how many times better things work with a dedicated graphics card?

Old forum posts often bring up "Which scene manager are you using?", but if I recall correctly there is only one scene manager in 2.1?
I've also tried to find information about occlusion culling, but the most I've found is that it is a work in progress?

I'm really just looking for anything that can boost my range of how many individual items I can render.

Thanks for reading! Cheers
al2950
OGRE Expert User
Posts: 1227
Joined: Thu Dec 11, 2008 7:56 pm
Location: Bristol, UK
x 157

Re: [2.1] Expectations/Advice on rendering 1 mil items

Post by al2950 »

Hi

I don't envy you; this is a difficult problem with a lot of unknowns! FYI, I did a quick test with Ogre and rendered 1 million spheres using an unlit material, and I ensured shadows were disabled, i.e. a very simple compositor:

Ogre: V2-2-wip (3e5fe6cb3fcc57d08054fa60d827707463c04474)
Mesh: Sphere1000.mesh
CPU: i7 4770k (4 cores)
GPU: AMD Vega 64
OS: Windows
Renderer: Dx11
Res: 800x600
FPS: 10fps

FYI, all spheres are on the same plane, so there is no, or at least little, benefit from GPU z-culling.

Not too bad, considering. Having a very quick look at Remotery:
OneMillionObjectsOgre.jpg
gabbsson wrote: Fri Apr 05, 2019 9:07 am Does that point toward a CPU bottleneck?
Although the GPU seems to be struggling a little, it's the CPU that's holding it back. Key factors are:
1) Updating the scenegraph (28ms)
2) Frustum culling (8ms)
3) Command preparation

1 and 2 are actually highly optimised, and with a better CPU would go much faster. 3 might be a sticking point, as I believe it does not scale very well. This is a question for dark_sylinc.

If you solve the CPU issues (and this is assuming the CPU does not need to do any additional transform on every node every frame), there is still a fairly large GPU hit. Without looking into it in more detail it's hard to say what could be done, but given it's a single mesh, I am sure it could be improved. It always can! But then again you don't know what hardware you have to use...

Having said all this, I would not think about actually trying to do it this way. You mentioned particles and fluid dynamics. Now, Ogre's particle implementation by today's tech standards is really only useful for the most basic effects. It sounds like what you need is a flexible GPU particle system, like the one I started to think about in this post. The bad news is I have not been able to complete this work. The good news is that I am currently working away from home during the week, stuck in a crappy hotel, and have actual free time in the evenings! As a result I started back on this last week :D. So you never know, I might be able to finish it this time!
gabbsson wrote: Fri Apr 05, 2019 9:07 am Each item uses the same mesh so I should be benefiting from Instancing? Is there a way to verify that?
Yes it should. You could verify it by looking at renderdoc.
gabbsson wrote: Fri Apr 05, 2019 9:07 am Old forum posts often bring up "Which scene manager are you using?", if I recall correctly there is only one scene manager in 2.1?
I've also tried to find information about occlusion culling, but the most I've found is that it is a work in progress?
Yes, there is only one. I am not sure CPU occlusion culling will help you here as you are CPU bound; it's more for things like cities where a lot of objects are hidden behind buildings.
gabbsson wrote: Fri Apr 05, 2019 9:07 am I've tried to incrementally see how much Ogre can handle and it gets really laggy for me already around 90k items (2-4 fps).
I get 25fps with 100,000 spheres all in view, 100fps with ~50,000 spheres in view.
dark_sylinc
OGRE Team Member
Posts: 5296
Joined: Sat Jul 21, 2007 4:55 pm
Location: Buenos Aires, Argentina
x 1278

Re: [2.1] Expectations/Advice on rendering 1 mil items

Post by dark_sylinc »

Although Ogre has been optimized to operate on a very large number of Items, what you’re trying to do (render 1 million Items, 1 triangle each) is going to hit a different kind of bottleneck.

First, I suspect you don’t need the flexibility that an Item gives (individual materials, individual mesh, multiple submeshes, hierarchical transform animation), thus a specialized solution (i.e. usually written for particles) would be much faster.

Second, each triangle would be submitted as a separate draw, even if we use OpenGL’s multidraw API. Even though there may only be one draw call thanks to glMultiDraw, internally the GPU must assign each triangle (3 vertices) to a single wavefront (a single wavefront is between 32 and 64 threads depending on the GPU vendor and model), unless the driver is able to merge these draws into the same wavefront (which is rare), and that is going to limit your performance severely (in your extreme case it could be as much as 10x-20x).

A dedicated particle solution would submit most particles as a single or a few draws, thus fully utilizing all threads in a wavefront.

A dedicated solution (whether CPU or via GPU compute) would work via particles, and possibly cluster these particles spatially (in order to frustum cull many of them at once instead of culling one by one), and probably generate the vertex geometry in the vertex shader using SV_VertexID/gl_VertexID tricks.
gabbsson wrote: Fri Apr 05, 2019 9:07 am I have created my own implementations for Unlit and PBS, adding geometry shaders for each.
Adding geometry shaders is going to destroy your performance.

So long story short: No, using Items for visualizing these "particles" when it's in the millions is probably a bad idea.
gabbsson
Halfling
Posts: 65
Joined: Wed Aug 08, 2018 9:03 am
x 13

Re: [2.1] Expectations/Advice on rendering 1 mil items

Post by gabbsson »

Thank you for the replies! I've read them as thoroughly as I can.

al2950 wrote: Fri Apr 05, 2019 4:13 pm I don't envy you; this is a difficult problem with a lot of unknowns! FYI, I did a quick test with Ogre and rendered 1 million spheres using an unlit material, and I ensured shadows were disabled, i.e. a very simple compositor:
Your specs are far superior to what I'm running on so that should play a big part.
I'm curious however about the compositor. I just create mine in the code using createBasicWorkspaceDef.
Does that start with shadows by default? I haven't noticed any so I never bothered looking into it.

al2950 wrote: Fri Apr 05, 2019 4:13 pm Yes, there is only one. I am not sure CPU occlusion culling will help you here as you are CPU bound; it's more for things like cities where a lot of objects are hidden behind buildings.
You're right I realize now I should have been asking about z-culling rather than occlusion culling.
Anything I should know about z-culling or does that just do its job by default?

al2950 wrote: Fri Apr 05, 2019 4:13 pm I get 25fps with 100,000 spheres all in view, 100fps with ~50,000 spheres in view.
This makes me wonder if I am doing something wrong which makes things scale differently.
You go from 100 -> 25 -> 10 fps when rendering 50k -> 100k -> 1 mil spheres?
Then again this might be explained by hardware.

dark_sylinc wrote: Fri Apr 05, 2019 8:46 pm First, I suspect you don’t need the flexibility that an Item gives (individual materials, individual mesh, multiple submeshes, hierarchical transform animation), thus a specialized solution (i.e. usually written for particles) would be much faster.
I agree 100%. Honestly I probably don't even need this for my normal applications either.
The models are always fairly simple and will never really use any materials.
Since I'm working on replacing an old OpenGL rendering implementation there are quite a few features in Ogre that I won't use.

So is there something "lower level" that I can use instead which will run faster? Or would I have to create that myself too?
I still want to be able to set the transform of them individually, but the rest can be really simple down to vertices with indices or something.
If I understand correctly, I would still need to inherit from MovableObject and then try to implement my own simpler version of an Item?
This would be a "quick fix" instead of creating a dedicated particle system.
Would the benefits of skipping a bunch of the flexibility increase performance enough?

dark_sylinc wrote: Fri Apr 05, 2019 8:46 pm A dedicated particle solution would submit most particles as a single or a few draws, thus fully utilizing all threads in a wavefront.

A dedicated solution (whether CPU or via GPU compute) would work via particles, and possibly cluster these particles spatially (in order to frustum cull many of them at once instead of culling one by one), and probably generate the vertex geometry in the vertex shader using SV_VertexID/gl_VertexID tricks.
So this sounds like the only really efficient solution. I'm fairly new to all of this, but I'll look into it and give it a shot.
But just to make sure, I should be able to render (as an example) a triangle or a sphere mesh using a GPU particle system?
Most of the examples I see just use "sprites" or other flat effects since they mostly want effects that aren't easily made by the usual method.


Edit: Forgot this part
dark_sylinc wrote: Fri Apr 05, 2019 8:46 pm Adding geometry shaders is going to destroy your performance.
Good to know! Is it enough to just have the geometry shader be a pass-through, or does its existence cause performance drops?
It's starting to sound like I need a third HLMS implementation... or is there an efficient way to skip using the geometry shader?
dark_sylinc
OGRE Team Member
Posts: 5296
Joined: Sat Jul 21, 2007 4:55 pm
Location: Buenos Aires, Argentina
x 1278

Re: [2.1] Expectations/Advice on rendering 1 mil items

Post by dark_sylinc »

I'll try to reply in more detail to the rest of the questions; for now I'll just answer this one quickly:
gabbsson wrote: Tue Apr 09, 2019 9:35 am Edit: Forgot this part
dark_sylinc wrote: Fri Apr 05, 2019 8:46 pm Adding geometry shaders is going to destroy your performance.
Good to know! Is it enough to just have the geometry shader be a pass-through, or does its existence cause performance drops?
It's starting to sound like I need a third HLMS implementation... or is there an efficient way to skip using the geometry shader?
Pass-through GS still causes a significant slowdown (there is an exception with a specific NV extension, but it's tricky).
If you only need a GS for a few objects, then define @set( hlms_disable_stage, 1 ) when you don't need the GS.

eg. inside GeometryShader_gs.glsl:

Code:

@property( requirements_not_met )
    @set( hlms_disable_stage, 1 )
@end
This tells the Hlms to not use the stage (in this case, the Geometry Shader) even if there is a GeometryShader_gs.glsl file.
gabbsson
Halfling
Posts: 65
Joined: Wed Aug 08, 2018 9:03 am
x 13

Re: [2.1] Expectations/Advice on rendering 1 mil items

Post by gabbsson »

dark_sylinc wrote: Tue Apr 09, 2019 9:22 pm I'll try to reply in more detail to the rest of the questions, I'll just answer this one quickly:
gabbsson wrote: Tue Apr 09, 2019 9:35 am Edit: Forgot this part
dark_sylinc wrote: Fri Apr 05, 2019 8:46 pm Adding geometry shaders is going to destroy your performance.
Good to know! Is it enough to just have the geometry shader be a pass-through, or does its existence cause performance drops?
It's starting to sound like I need a third HLMS implementation... or is there an efficient way to skip using the geometry shader?
Pass-through GS still causes a significant slowdown (there is an exception with a specific NV extension, but it's tricky).
If you only need a GS for a few objects, then define @set( hlms_disable_stage, 1 ) when you don't need the GS.

eg. inside GeometryShader_gs.glsl:

Code:

@property( requirements_not_met )
    @set( hlms_disable_stage, 1 )
@end
This tells the Hlms to not use the stage (in this case, the Geometry Shader) even if there is a GeometryShader_gs.glsl file.
Great, I'll make sure to add this so that geometry shaders are only used when necessary.

Looking forward to the other answers (when you have time). Any example or tips are appreciated!
Thanks!
gabbsson
Halfling
Posts: 65
Joined: Wed Aug 08, 2018 9:03 am
x 13

Re: [2.1] Expectations/Advice on rendering 1 mil items

Post by gabbsson »

dark_sylinc wrote: Fri Apr 05, 2019 8:46 pm A dedicated particle solution would submit most particles as a single or a few draws, thus fully utilizing all threads in a wavefront.

A dedicated solution (whether CPU or via GPU compute) would work via particles, and possibly cluster these particles spatially (in order to frustum cull many of them at once instead of culling one by one), and probably generate the vertex geometry in the vertex shader using SV_VertexID/gl_VertexID tricks.
After reading up more on graphics particle systems and reading this, my understanding is that I want to send the position information of all particles to the graphics card in one go and have it draw them all there, instead of generating triangles for each and drawing them separately.

Currently what I have to work with is simply a list of all the particle positions (and a size/color which is the same for all particles).
Since the simulation is done beforehand the actual movements will not be done on the GPU. Moving the actual simulation to the GPU may be a future update however. Regardless I believe it is desired to have the simulation and the rendering be two different phases.

So what I've puzzled together is that I can embed all the positional data into a texture (and probably just send size and color as uniforms?). With RGBA each texel should be able to hold a 4-byte float value if I'm not mistaken, thus requiring 3 texels for a 3D position. Next I would unpack the texture (as I've seen in the Ogre shaders) in a specialized shader and generate the geometry there. This part I'm not sure of. Would this be a so-called compute shader? I don't see how a vertex shader could handle generating all the geometry, unless there is something along the lines of emit as in geometry shaders. I'm guessing this would also require a new Hlms implementation? Or is there a way to not use Hlms at all if it is not required? Is there a ComputeShader sample?
Then updating the texture on the CPU side should move the particles on the GPU correct?

I checked the Samples to see if I could learn anything and I found that the UavBuffer sample at least uses a compute shader.
I can't find where my samples are built to, nor the SampleBrowser, so I can't run it. But at least I learned there is a default ComputeHlms.
I'll look into the samples more to try and understand it.

Not sure how culling comes into this, but I don't think it's needed here. Since this is not a game, anything in the scene will most likely be in the camera view, akin to a CAD program.

From what I've read a particle system traditionally would handle the movement on the GPU entirely but that's not quite what I'm after.

Let me know if I've misunderstood something or if I'm on the right track!
dark_sylinc
OGRE Team Member
Posts: 5296
Joined: Sat Jul 21, 2007 4:55 pm
Location: Buenos Aires, Argentina
x 1278

Re: [2.1] Expectations/Advice on rendering 1 mil items

Post by dark_sylinc »

You're correct on everything.

Now, regarding your doubts:
I don't see how a vertex shader could handle generating all the geometry unless there is something along the lines of emit as in geometry shaders
If all your geometry is the same basic primitive (or a few sets of basic primitives), then it's super easy.
For example, particle FXs are quads. Each quad is 2 triangles: 6 vertices without an index buffer, 4 vertices and 6 indices with an index buffer.
Thus you can send a Vao with a primitive count of "num_quads * 6", with a static pre-filled index buffer and no vertex buffer, and in the vertex shader perform:

Code:

uint particleIdx = vertexID / 4;
uint vertexInQuad = vertexID % 4;

float3 position;
position.x = (vertexInQuad % 2) ? 1.0 : -1.0;
position.y = (vertexInQuad & 2) ? -1.0 : 1.0;
position.z = 0.0;
For example, a cube drawn as a triangle strip can be done with 14 vertices, generated in the vertex shader using only the vertex ID (no UVs or normals):

Code:

uint b = 1u << i;
x = (0x287au & b) != 0u;
y = (0x02afu & b) != 0u;
z = (0x31e3u & b) != 0u;
Our Terra system in the Terrain sample also uses vertex shader tricks to generate the XZ position of the terrain without a vertex buffer, and the height (Y) is sampled from a heightmap texture.

Like I said, basic primitives are "easy".
You can try some more advanced stuff like Merge Instancing, aka dynamic vertex pulling.

Or you can go full compute shader, and produce output that a vertex shader can later consume and "pass through" as is.
Look for anything that ends with "_cs" in our samples.

Aside from the compute sample, we use:
  • A mipmapping filter: GaussianBlurBase_cs and MipmapsGaussianBlur_cs
  • A gaussian filter for ESM shadow maps (it's in log space): GaussianBlurLogFilterBase_cs and EsmGaussianBlurLogFilter_cs
  • A CS to perform a raw copy: CopyColourAndDepth_cs
And if you checkout the v2-2-vct branch (WARNING: It's a work in progress) you will see more compute shaders under Samples/Media/VCT and Samples/Media/Compute/Tools
Beware there are a few limitations being fixed (even though we likely won't let you bind a UAV buffer produced by a compute shader directly as a vertex buffer for a vertex shader, you can still read from that buffer using programmable vertex pulling and pass the data through to the pixel shader).
al2950
OGRE Expert User
Posts: 1227
Joined: Thu Dec 11, 2008 7:56 pm
Location: Bristol, UK
x 157

Re: [2.1] Expectations/Advice on rendering 1 mil items

Post by al2950 »

Hi!

I have been implementing Wicked Engine's particle system as described here:
https://wickedengine.net/2017/11/07/gpu ... imulation/

Now, I know you are not bothered about updating particles on the GPU, but the part about drawing particles is certainly worth reading. It is basically the initial solution dark_sylinc outlined above. I have started implementing the draw stuff, but it requires Ogre changes that I am still playing with. However, as you are not using compute shaders, and so already have most info on the CPU, you don't need to use indirect buffers, which is my sticking point.

Again, as dark_sylinc said, the terrain sample is probably what you want. Look at Terra::optimizeCellsAndAdd and the whole TerrainCell class; these classes add an Ogre renderable (you will want one instance of TerrainCell for all your particles), and take note of 'vao->setPrimitiveRange(' in TerrainCell: this is where you would set number of particles * number of primitives per particle. I am not sure if you will need your own HLMS; without looking, you should just be able to add some customisations to HlmsUnlit. You will need to add the SV_VertexID semantic to custom_vs_attributes, and then do the vertex expansion stuff in custom_vs_preExecution... I think!!

Good luck!

If you can wait I may have something that may be of use to you in the not too distant future.
al2950
OGRE Expert User
Posts: 1227
Joined: Thu Dec 11, 2008 7:56 pm
Location: Bristol, UK
x 157

Re: [2.1] Expectations/Advice on rendering 1 mil items

Post by al2950 »

OK... as it sits you would need a custom HLMS... but I believe this is an unnecessary limitation in Ogre. I will create a separate post about it...
gabbsson
Halfling
Posts: 65
Joined: Wed Aug 08, 2018 9:03 am
x 13

Re: [2.1] Expectations/Advice on rendering 1 mil items

Post by gabbsson »

dark_sylinc wrote: Wed Apr 24, 2019 4:22 pm If all your geometry is the same basic primitive (or a few sets of basic primitives), then it's super easy.
For example, particle FXs are quads. Each quad is 2 triangles: 6 vertices without an index buffer, 4 vertices and 6 indices with an index buffer.
Thus you can send a Vao with a primitive count of "num_quads * 6", with a static pre-filled index buffer and no vertex buffer, and in the vertex shader perform:

Code:

uint particleIdx = vertexID / 4;
uint vertexInQuad = vertexID % 4;

float3 position;
position.x = (vertexInQuad % 2) ? 1.0 : -1.0;
position.y = (vertexInQuad & 2) ? -1.0 : 1.0;
position.z = 0.0;
Alright, I think I understand this part. The vertex shader isn't generating "extra" vertices, it's just filling in the "num_quads * 6" "empty" ones, since there is no vertex buffer?
Once I've created the positions here I would want to offset them based on the particle position, I presume.
The whole texture thing is still confusing; I've tried to understand sending "custom" data to shaders using Ogre before and still can't quite grasp it. I've seen plenty of examples in other threads. I think I'll get back to this later in the post.
dark_sylinc wrote: Wed Apr 24, 2019 4:22 pm Like I said, basic primitives are "easy".
You can try some more advanced stuff like Merge Instancing, aka dynamic vertex pulling.

Or you can go full compute shader, and produce output that a vertex shader can later consume and "pass through" as is.
Pretty sure I'm just gonna start out as simple as possible and just do the above. One quad per particle and no compute shader.
al2950 wrote: Wed Apr 24, 2019 5:56 pm I have been implementing Wicked Engine's particle system as described here:
https://wickedengine.net/2017/11/07/gpu ... imulation/

Now I know you are not bothered about updating particles on the GPU, but the drawing particles part is certainly worth reading.
Thanks, I think I've stumbled upon that blog post before. I'll give it another read even if it is DirectX and I'll only be using OpenGL. It does seem like it's basically what dark_sylinc wrote.
al2950 wrote: Wed Apr 24, 2019 5:56 pm Again, as dark_sylinc said, the terrain sample is probably what you want. Look at Terra::optimizeCellsAndAdd and the whole TerrainCell class; these classes add an Ogre renderable (you will want one instance of TerrainCell for all your particles), and take note of 'vao->setPrimitiveRange(' in TerrainCell: this is where you would set number of particles * number of primitives per particle. I am not sure if you will need your own HLMS; without looking, you should just be able to add some customisations to HlmsUnlit. You will need to add the SV_VertexID semantic to custom_vs_attributes, and then do the vertex expansion stuff in custom_vs_preExecution... I think!!

Good luck!

If you can wait I may have something that may be of use to you in the not too distant future.
I'll chug along in the meantime! If you get done soon that'll just be an added bonus! Thanks :)

I've tried to understand the sample but I'm not entirely clear on one part, which is still the whole "sending all the position data to the GPU" as I mentioned earlier.

The TerrainCell uses an uploadToGPU (called from the custom Hlms) to inject some data, but that uses the ConstBuffer, and if I recall I don't want to send large amounts of data through that?

I've tried to sort what I've understood so far in my head and made a list:
  • Create some ParticleRenderable which inherits Renderable. This holds the vao (created with an empty vertex buffers vector?) of all geometry for the whole system.
    I use setPrimitiveRange to match the amount of vertices I will eventually need.
  • The custom renderable also holds some type of texture which is updated with the particle positions.
    I'm starting to think that using the word texture here is misleading me. It will be a sampler2D on the shader... but it can be "anything" I want on the CPU side; I'll just be assigning stuff to a pointer eventually either way, correct?
    Similar to how the TerrainCell assigns 16 values which end up in a sampler2D (if I read the Terra vertex shader correctly).
    EDIT: I got this wrong, it maps to the TerrainInstanceDecl.
  • A custom Hlms writes the texture to the GPU (I'm guessing through the currentMappedTexBuffer? Or do I create my own buffer?).
    I've attempted this before in a version of my own UnlitHlms and I couldn't figure it out at all (tried to mess with the worldViewProj).
    I don't see the link between assigning something to the pointer and where it ends up in the shader for the texBuffer. I kinda get it for the constBuffer (I hope).
  • Create a shader which does the stuff dark_sylinc wrote and reads the texture for positions in the world.
  • Create a custom MovableObject which holds the custom renderable, to add it to the scene.
Voila: 1 draw call, tons of movable quads.

Did I miss anything?

Thanks again for the answers, I understand way more than I did before and I believe I should be able to figure this out eventually!
al2950
OGRE Expert User
Posts: 1227
Joined: Thu Dec 11, 2008 7:56 pm
Location: Bristol, UK
x 157

Re: [2.1] Expectations/Advice on rendering 1 mil items

Post by al2950 »

Ok, so I got distracted today, and I was interested to see if I could do this without creating a completely new HLMS type, using tips like this one from dark_sylinc.

Short story: I couldn't, but only because I could not undef the 'POSITION' semantic, which would cause Ogre to fall over, as we are using triangle strips, not triangle lists...

But it does work, and I get around 150fps for 1 million objects (cubes in this case). It is DirectX only and the code is very poorly written, but if you want to take a look I have uploaded it here
gabbsson
Halfling
Posts: 65
Joined: Wed Aug 08, 2018 9:03 am
x 13

Re: [2.1] Expectations/Advice on rendering 1 mil items

Post by gabbsson »

al2950 wrote: Sun Apr 28, 2019 1:39 pm Ok, so I got distracted today, and I was interested to see if I could do this without creating a completely new HLMS type, using tips like this one from dark_sylinc.

Short story: I couldn't, but only because I could not undef the 'POSITION' semantic, which would cause Ogre to fall over, as we are using triangle strips, not triangle lists...

But it does work, and I get around 150fps for 1 million objects (cubes in this case). It is DirectX only and the code is very poorly written, but if you want to take a look I have uploaded it here
This is perfect! I think I finally understand and will be trying this out. I've seen the comments/warnings you've added and I'll look over them later.
For now I just want to replicate what you've done in my code and for GLSL. I'm fine with creating a custom Hlms.

I have run into one question, however. It seems a change was made in HlmsUnlit from 2.1 to 2.2: mTexUnitSlotStart does not exist in Unlit in 2.1 but does in 2.2. Any ideas on what I can do in 2.1? I would prefer not to use a WIP branch.
HlmsPbs has it in 2.1, so perhaps I'll inherit that instead, but I'm still curious (and I suspect it will be less performant than Unlit?).

EDIT:

Also curious what's happening here:

Code:

virtual void calculateHashForPreCreate(Ogre::Renderable *renderable, Ogre::PiecesMap *inOutPieces)
{
	HlmsUnlit::calculateHashForPreCreate(renderable, inOutPieces);

	const Ogre::Renderable::CustomParameterMap &paramMap = renderable->getCustomParameters();
	Ogre::Renderable::CustomParameterMap::const_iterator itor = paramMap.find(1234);
	if (itor != paramMap.end())
	{
		setProperty("MakeItRain", 1);
		setProperty("ParticleBufferSlot", mParticleBufSlot);
	}
}
I don't see where "ParticleBufferSlot" is being used in the shader? Why do you set that property?
al2950
OGRE Expert User
Posts: 1227
Joined: Thu Dec 11, 2008 7:56 pm
Location: Bristol, UK
x 157

Re: [2.1] Expectations/Advice on rendering 1 mil items

Post by al2950 »

gabbsson wrote: Mon Apr 29, 2019 12:06 pm This is perfect! I think I finally understand and will be trying this out. I've seen the comments/warnings you've added and I'll see over them later.
For now I just want to replicate what you've done in my code and for GLSL. I'm fine with creating a custom Hlms.
If it was this comment:

Code:

// VERY IMPORTANT. I am only using a single buffer here, but if you have a buffer
// bound to a shader that is updated every frame, you are going to want several buffers, as the GPU may buffer frames.
// There are plenty of examples of this being done properly in the HLMS code and also the Forward3D code for the grid buffers.
I believe I was talking complete rubbish. If you are using a dynamic buffer and map/unmap, Ogre deals with this for you (underneath it creates a buffer 3x the requested size and deals with the offsets internally).
gabbsson wrote: Mon Apr 29, 2019 12:06 pm I have run into one question however. It seems a change was made in HlmsUnlit from 2.1 to 2.2, the mTexUnitSlotStart does not exist in Unlit in 2.1 but does in 2.2. Any ideas on what I can do in 2.1? I would prefer to not use a WIP branch.
I believe it was added just to make it a little easier to customize. You don't have to do that; you could just pick an arbitrary fixed number for your texture/buffer slot, e.g. 14... I think I ended up hardcoding it in the shader anyway and not using the property ('ParticleBufferSlot') I set, which might have confused you. FYI, I would consider 2.2 very stable now, but I understand you not wanting to jump whilst it's still WIP.
gabbsson wrote: Mon Apr 29, 2019 12:06 pm HlmsPbs has it in 2.1 so perhaps I'll inherit that instead but I'm still curious (and I suspect it will be less performant than Unlit?).
I would stick with Unlit for now as it's simpler to debug and understand. When you have Unlit working you could move to PBS. NB: PBS is much heavier on the pixel shader (for obvious-ish reasons), and so will most likely have a negative impact in your scenario.
gabbsson
Halfling
Posts: 65
Joined: Wed Aug 08, 2018 9:03 am
x 13

Re: [2.1] Expectations/Advice on rendering 1 mil items

Post by gabbsson »

al2950 wrote: Mon Apr 29, 2019 1:01 pm If it was this comment:

Code: Select all

// VERY IMPORTANT. I am only using a single buffer here, but if you have a buffer
// bound to a shader that is updated every frame, you are going to want several buffers, as the GPU may buffer frames.
// There are plenty of examples of this being done properly in the HLMS code and also the Forward3D code for the grid buffers.
I believe I was talking complete rubbish. If you are using a dynamic buffer with map/unmap, Ogre deals with this for you (underneath, it creates a buffer 3× the requested size and handles the offsets internally).

I believe it was added just to make it a little easier to customize. You don't have to do that; you could just pick an arbitrary fixed number for your texture/buffer slot, e.g. 14. I think I ended up hardcoding it in the shader anyway and not using the property ('ParticleBufferSlot') I set, which might have confused you. FYI, I would consider 2.2 very stable now, but I understand you not wanting to jump whilst it's still WIP.

I would stick with Unlit for now as it's simpler to debug and understand. When you have Unlit working you could move to PBS. NB: PBS is much heavier on the pixel shader (for obvious-ish reasons), and so will most likely have a negative impact in your scenario.
Alright, thanks for the update!
I've run into a different problem stemming from trying to inherit PBS (something about FORCEINLINE on fillBuffersFor), so I'll happily move back to Unlit and just set an arbitrary number.
That's something I'm trying to figure out: what that number is in GLSL. In your shader you use t2 as far as I can tell; what does that mean?
I'm guessing it's something similar to layout in GLSL?

Overall you've helped a ton! Thanks again :)
al2950
OGRE Expert User
Posts: 1227
Joined: Thu Dec 11, 2008 7:56 pm
Location: Bristol, UK
x 157

Re: [2.1] Expectations/Advice on rendering 1 mil items

Post by al2950 »

I used t2, yes. However, if you don't have access to mTexUnitSlotStart, Ogre may assign textures to slot t2, so if you are going to hardcode it, just use the maximum allowed slot, which on 2.1 is limited to 16 I believe, so you would use t15. On 2.2 it has been upped to 128.
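For reference, hardcoding the slot in HLSL looks like this. The struct and buffer names here are illustrative, not Ogre's; only the register syntax and the t15 slot come from the advice above:

```hlsl
// HLSL: explicitly bind the particle buffer to slot t15, the last of the
// 16 slots available in 2.1, so it cannot collide with slots Ogre assigns.
// 'ParticleData' and 'particleBuffer' are made-up names for this sketch.
struct ParticleData
{
    float4 positionAndSize;
};

StructuredBuffer<ParticleData> particleBuffer : register(t15);
```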
gabbsson
Halfling
Posts: 65
Joined: Wed Aug 08, 2018 9:03 am
x 13

Re: [2.1] Expectations/Advice on rendering 1 mil items

Post by gabbsson »

al2950 wrote: Mon Apr 29, 2019 2:59 pm I used t2, yes. However, if you don't have access to mTexUnitSlotStart, Ogre may assign textures to slot t2, so if you are going to hardcode it, just use the maximum allowed slot, which on 2.1 is limited to 16 I believe, so you would use t15. On 2.2 it has been upped to 128.
Alright! I've only written GLSL and I don't recognize the use of "t{i}" at all, so I'm not sure how that works in GLSL; I'm guessing it's an HLSL thing.
Either way, I should be able to figure it out somehow.
al2950
OGRE Expert User
Posts: 1227
Joined: Thu Dec 11, 2008 7:56 pm
Location: Bristol, UK
x 157

Re: [2.1] Expectations/Advice on rendering 1 mil items

Post by al2950 »

Yes, HLSL embeds the texture slots into the shader, but GLSL (and Metal) do not. For tips on how to sort that bit, take a look at HlmsUnlit::createShaderCacheEntry, which you will have to override to add your own texture slot declaration.
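As a sketch of the GLSL side: with GLSL 4.2+ you can pin the binding point in the shader itself, which is the closest analogue to HLSL's register(t15); on older GLSL the sampler is bound from the C++ side instead (which is the kind of setup an overridden createShaderCacheEntry would do). The sampler name here is illustrative:

```glsl
// GLSL 4.2+: explicit binding point, roughly equivalent to HLSL register(t15).
// 'particleBuffer' is a made-up name for this sketch, not an Ogre identifier.
layout(binding = 15) uniform samplerBuffer particleBuffer;
```

Without the explicit layout qualifier, the binding has to be assigned through the shader parameters on the C++ side after the program is compiled.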
gabbsson
Halfling
Posts: 65
Joined: Wed Aug 08, 2018 9:03 am
x 13

Re: [2.1] Expectations/Advice on rendering 1 mil items

Post by gabbsson »

al2950 wrote: Mon Apr 29, 2019 4:30 pm Yes, HLSL embeds the texture slots into the shader, but GLSL (and Metal) do not. For tips on how to sort that bit, take a look at HlmsUnlit::createShaderCacheEntry, which you will have to override to add your own texture slot declaration.
Alright, I'll give that a read. I'm getting closer!

Having a hard time creating a third Hlms. My first two simply replace the regular Unlit and PBS.
It seems HlmsManager->getHlms(Ogre::HLMS_USER0)->getDefaultDatablock() still becomes Unlit despite changing the mType of the Hlms.

Since I now have two implementations that inherit Unlit, I'm guessing there's something I've missed.
What defines which Hlms a "default" datablock is connected to?

EDIT: I'm dumb, forgot to add my new movable object to a node. Should be solved.
Post Reply