[2.1] Performance Questions =)

Discussion area about developing with Ogre-Next (2.1, 2.2 and beyond)


xrgo
OGRE Expert User
Posts: 1148
Joined: Sat Jul 06, 2013 10:59 pm
Location: Chile
x 168

[2.1] Performance Questions =)

Post by xrgo »

Hello! I have some questions about how to get the best performance out of Ogre 2.1 (which, by the way, is awesome, but the more performance the better).

1. Does sharing the same blend and/or macro blocks between datablocks help?

2. Does "use as few materials as possible to reduce batch count" still apply, even if only a little?

3. Imagine I have a tank and I want to control the turret. Which is better performance-wise: tank and turret as separate objects, or a single object with the parts controlled by bones?

4. I already have code that manually creates v1 meshes, which I then import to v2. Does this carry any later performance hit compared to creating v2 meshes directly?

Many thanks in advance!!
dark_sylinc
OGRE Team Member
Posts: 5296
Joined: Sat Jul 21, 2007 4:55 pm
Location: Buenos Aires, Argentina
x 1278

Re: [2.1] Performance Questions =)

Post by dark_sylinc »

xrgo wrote: 1. Does sharing the same blend and/or macro blocks between datablocks help?
Ogre works inherently by sharing these blocks.

When you call, e.g., HlmsDatablock::setBlendblock( &myBlendblock ); the datablock internally calls HlmsManager::getBlendblock to get the real pointer with API-filled constructs (i.e. its mRsData variable is not null).
The first time HlmsManager::getBlendblock is called with a given set of parameters, it returns a new pointer. The second time it is called with the same parameters, it returns the same pointer (and an internal reference count is increased); thus sharing is unavoidable.

When the HlmsDatablock is assigned a different blendblock, the old blendblock will have its reference count decreased (see the comments in getMacroblock and destroyMacroblock). When the ref. count reaches zero, it will be destroyed.
All the blocks (Macro-, blend-, sampler-blocks) behave the same.
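
To make the sharing behaviour concrete, here is a minimal standalone sketch of a reference-counted block cache. It is not Ogre's code: BlendParams, BlockCache, acquire and release are hypothetical names made up for illustration, and the real HlmsManager may differ in its details.

```cpp
#include <cstddef>
#include <map>
#include <tuple>

// Hypothetical stand-in for a blendblock's parameters (not Ogre's struct).
struct BlendParams {
    int srcFactor;
    int dstFactor;
    bool operator<(const BlendParams& o) const {
        return std::tie(srcFactor, dstFactor) < std::tie(o.srcFactor, o.dstFactor);
    }
};

// Hypothetical cache: identical parameters map to one shared, ref-counted entry.
class BlockCache {
public:
    // Returns the shared entry for these parameters, creating it on first use.
    const BlendParams* acquire(const BlendParams& params) {
        auto it = mEntries.find(params);
        if (it == mEntries.end())
            it = mEntries.emplace(params, 0u).first;
        ++it->second;      // one more user of this block
        return &it->first; // std::map keys keep a stable address
    }

    // Drops one reference; the entry is destroyed when the count reaches zero.
    void release(const BlendParams* block) {
        auto it = mEntries.find(*block);
        if (it == mEntries.end())
            return;
        if (--it->second == 0)
            mEntries.erase(it);
    }

    std::size_t liveBlocks() const { return mEntries.size(); }

private:
    std::map<BlendParams, unsigned> mEntries; // parameters -> reference count
};
```

Two acquire calls with equal parameters hand back the same pointer, and the entry only goes away after both have been released, which mirrors the behaviour described above.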
xrgo wrote: 2. Does "use as few materials as possible to reduce batch count" still apply, even if only a little?
If you have a "material explosion" (i.e. you use 10,000 different materials) then yes.
Otherwise no, at least for the most part.

API-wise, the PBS shader batches 273 materials into the same buffer, and Unlit batches 1024 per buffer. If a draw references material #0 and the next draw uses material #200, no API overhead is incurred.
But if a draw references material #0 and the next one uses material #500, we need to bind a different constant buffer and break the MDI (multi-draw indirect) calls, and thus incur higher API overhead. We sort our draws to avoid this kind of switching as much as possible.

One thing to note is that GPU-wise, in the shader, you could see a higher cache miss ratio if you end up jumping between materials. However, since each material buffer is 64 KB and we usually have lots of vertices and pixels to process per material (1 material = 240 bytes), I highly doubt GPU-side cache misses are ever going to be a problem. But I could be wrong.
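
The numbers above fit together: 64 KB divided by 240 bytes rounds down to 273 materials per buffer. The sketch below just spells out that arithmetic. The helper names are hypothetical, and it assumes material slots are numbered contiguously from 0 and fill buffers in order, which is a simplification for illustration rather than a description of Ogre's internals.

```cpp
#include <cstddef>

// Figures quoted in the post: 64 KB per material buffer, 240 bytes per PBS material.
constexpr std::size_t kBufferBytes = 64 * 1024;
constexpr std::size_t kPbsMaterialBytes = 240;
constexpr std::size_t kPbsMaterialsPerBuffer = kBufferBytes / kPbsMaterialBytes;
static_assert(kPbsMaterialsPerBuffer == 273, "65536 / 240 rounds down to 273");

// Hypothetical helper: which buffer a material slot falls into
// (assumes slots are assigned contiguously starting at 0).
constexpr std::size_t bufferIndex(std::size_t materialSlot) {
    return materialSlot / kPbsMaterialsPerBuffer;
}

// Hypothetical helper: true when two consecutive draws can keep the same buffer bound.
constexpr bool sharesBuffer(std::size_t slotA, std::size_t slotB) {
    return bufferIndex(slotA) == bufferIndex(slotB);
}
```

Under that assumption, sharesBuffer(0, 200) is true (both fall into buffer 0), while sharesBuffer(0, 500) is false (slot 500 falls into buffer 1), matching the #0 / #200 / #500 example.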

It should be noted that textures play a much bigger role here. We don't use bindless; we use texture arrays to allow more batching. If material #0 uses texture A and material #200 uses texture B, but A and B happen to live in different texture arrays, we have to split the MDI call in order to bind a different texture array (and thus incur higher API overhead).
The details are in section "8.9 The Hlms Texture Manager" of the manual. Long story short: keep your textures at standardized resolutions (i.e. lots of 1024x1024 / 2048x2048 / 512x512 textures) and use as few formats as possible (i.e. RGBA8888 for GUI, BC1 for the majority of textures, BC5 for normal maps, BC3 for textures that need alpha blending) so that they can be put in the same array. Be sure to write an Art Guideline document so that your artists use homogeneous settings and you don't end up with a nightmare when it's too late.
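
A quick way to see how many arrays a texture set would need is to group it by resolution and format, since all slices of a texture array must share both. The sketch below does only that; TextureInfo, ArrayKey and groupIntoArrays are hypothetical names, formats are plain string labels, and it ignores any per-array slice limit the real texture manager may apply.

```cpp
#include <cstddef>
#include <map>
#include <string>
#include <tuple>
#include <vector>

// Hypothetical texture description; the format is a plain string label here.
struct TextureInfo {
    std::string name;
    unsigned width;
    unsigned height;
    std::string format; // e.g. "BC1", "BC5"
};

// Textures can only share an array when width, height and format all match.
using ArrayKey = std::tuple<unsigned, unsigned, std::string>;

// Groups texture names by the array they would be eligible to share.
std::map<ArrayKey, std::vector<std::string>>
groupIntoArrays(const std::vector<TextureInfo>& textures) {
    std::map<ArrayKey, std::vector<std::string>> arrays;
    for (const TextureInfo& t : textures)
        arrays[ArrayKey(t.width, t.height, t.format)].push_back(t.name);
    return arrays;
}
```

The fewer distinct keys the result contains, the fewer texture-array switches the renderer can be forced into. Running something like this over an asset folder is one cheap way to check that an art guideline is actually being followed.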

In 1.x, materials were super expensive and you were encouraged to keep the count as low as possible, while in 2.1 they're quite cheap. Furthermore, as long as the textures used live in the same texture array and the materials live in the same buffer, we can instance them together, which is something 1.x could never do.
xrgo wrote: 3. Imagine I have a tank and I want to control the turret. Which is better performance-wise: tank and turret as separate objects, or a single object with the parts controlled by bones?
Most likely keeping them as separate objects is better (which is the opposite of what the answer would be for 1.x).
Experimentation is encouraged, though.
xrgo wrote: 4. I already have code that manually creates v1 meshes, which I then import to v2. Does this carry any later performance hit compared to creating v2 meshes directly?
At loading time? Sure: you're loading them twice, consuming more bandwidth, making CPU->GPU->CPU->GPU roundtrips*, and using CPU time to perform the compression (if you allow conversion to 16-bit half or QTangent generation). Also, if you don't unload the v1 meshes afterwards, they will occupy up to twice as much GPU memory, since there will be two copies.
During rendering? Nope. Once it is imported, it is the same as if it had been loaded from disk. There's no performance hit.

(*) The first CPU->GPU transfer happens when loading the v1 mesh from disk, the GPU->CPU transfer gets this data back, and the last CPU->GPU transfer uploads the final v2 mesh.
xrgo

Re: [2.1] Performance Questions =)

Post by xrgo »

thank you very much!!!!!!!! very useful information!!
MadWatch
Halfling
Posts: 64
Joined: Sat Jul 06, 2013 11:25 am
x 4

Re: [2.1] Performance Questions =)

Post by MadWatch »

Very useful information indeed. I hope you won't mind some more questions.
dark_sylinc wrote: Be sure to write an Art Guideline document so that your artists use homogeneous settings and don't end up with a nightmare when it's too late.
This is pretty much the nightmare I'm living right now. My artist made a ton of textures and they all have different sizes. In his defense, these are not exactly textures but sprites, so it's pretty hard to make them all the same size. What technique could be used to still batch them all together?

Would it be hard to implement bindless textures in Ogre (as a plugin, maybe)? Would that be efficient?

Is there a way to put textures of different sizes into the same array without wasting too much memory (sparse textures, maybe)?

Any other solution I didn't think of?

Regards,
Nicolas
dark_sylinc

Re: [2.1] Performance Questions =)

Post by dark_sylinc »

MadWatch wrote: This is pretty much the nightmare I'm living right now. My artist made a ton of textures and they all have different sizes. In his defense, these are not exactly textures but sprites, so it's pretty hard to make them all the same size. What technique could be used to still batch them all together?
I suggest you try UV atlas packing tools. I don't know how different the sizes are; however, if the difference isn't huge, leaving some blank space is sometimes a good trade-off. The textures should already have been in power-of-2 sizes; otherwise something was done wrong.
MadWatch wrote: Would it be hard to implement bindless textures in Ogre (as a plugin, maybe)? Would that be efficient?
Efficient? Yes.
I wanted to use bindless; the problem was that only two vendors support it (NVIDIA & AMD, and only on their latest cards), and it's not supported in all APIs, which makes it a PITA to support consistently.

But first, you should check whether you're actually bottlenecked by API overhead caused by texture switching. There's no point in worrying about it if it's not your main bottleneck or in the top 5.
N0vember
Gremlin
Posts: 196
Joined: Tue Jan 27, 2009 12:27 am
x 24

Re: [2.1] Performance Questions =)

Post by N0vember »

MadWatch wrote: Is there a way to put textures of different sizes into the same array without wasting too much memory (sparse textures, maybe)?
My take on this problem is to pack everything into the texture at runtime, because I think packing sprites into one texture is an implementation detail, and designers shouldn't have to care about it or do it themselves.
It's quite easy to do: you just need to get a rectangle packing algorithm from somewhere (just google GuillotineBinPack, SkylineBinPack or stb_rect_pack).
Then it's just a matter of adding your rects to the rectangle packer and, when you're finished packing them, doing the texture work (blit the sprites to the atlas texture one by one, using the coordinates the rectangle packer calculated for you).

This way you don't care about powers of two for the individual sprites; you just need the final texture to be a power of two.
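
As a rough illustration of the packing step, here is a very simple "shelf" packer. It is a deliberately simpler stand-in for GuillotineBinPack, SkylineBinPack or stb_rect_pack (which pack more tightly), and the Rect and ShelfPacker names are hypothetical. It only computes positions; blitting the sprites into the atlas is left out.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

struct Rect { unsigned x, y, w, h; };

// Simple shelf packer: fills rows ("shelves") left to right and opens a new
// shelf below the current one when a sprite no longer fits horizontally.
class ShelfPacker {
public:
    ShelfPacker(unsigned atlasWidth, unsigned atlasHeight)
        : mWidth(atlasWidth), mHeight(atlasHeight) {}

    // Tries to place a w x h sprite; on success writes its position to 'out'.
    bool insert(unsigned w, unsigned h, Rect& out) {
        if (w == 0 || h == 0 || w > mWidth)
            return false;
        if (mCursorX + w > mWidth) { // current shelf is full: start a new one
            mShelfY += mShelfHeight;
            mCursorX = 0;
            mShelfHeight = 0;
        }
        if (mShelfY + h > mHeight)   // atlas is out of vertical space
            return false;
        out = Rect{mCursorX, mShelfY, w, h};
        mCursorX += w;
        mShelfHeight = std::max(mShelfHeight, h); // shelf is as tall as its tallest sprite
        return true;
    }

private:
    unsigned mWidth, mHeight;
    unsigned mCursorX = 0, mShelfY = 0, mShelfHeight = 0;
};
```

Sorting the sprites by height (tallest first) before inserting them usually reduces the wasted space per shelf. Each returned Rect gives the destination for the blit and, divided by the atlas size, the sprite's UV rectangle.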
MadWatch

Re: [2.1] Performance Questions =)

Post by MadWatch »

dark_sylinc wrote: I don't know how different the sizes are; however, if the difference isn't huge, leaving some blank space is sometimes a good trade-off. The textures should already have been in power-of-2 sizes; otherwise something was done wrong.
The size difference is huge, too huge to leave blank space around the smaller textures. They aren't power-of-2 sizes either. Why is this a problem? AFAIK, power-of-2 sizes are only important for mipmapping, and I'm not using it since it isn't needed for sprites. Am I mistaken?
N0vember wrote: My take on this problem is to pack everything into the texture at runtime, because I think packing sprites into one texture is an implementation detail, and designers shouldn't have to care about it or do it themselves.
It's quite easy to do: you just need to get a rectangle packing algorithm from somewhere (just google GuillotineBinPack, SkylineBinPack or stb_rect_pack).
Then it's just a matter of adding your rects to the rectangle packer and, when you're finished packing them, doing the texture work (blit the sprites to the atlas texture one by one, using the coordinates the rectangle packer calculated for you).
I thought of doing this. It could be the easiest and most portable solution. Thank you very much for naming the packing algorithms; I will take a look at them. The only thing that worries me about this solution is bleeding. I don't think there is a way to completely avoid it with a non-array texture atlas, is there?

Anyway, many thanks to both of you for your advice on my problem.