Question about 2.2

Discussion area about developing with Ogre-Next (2.1, 2.2 and beyond)


dark_sylinc
OGRE Team Member
Posts: 5296
Joined: Sat Jul 21, 2007 4:55 pm
Location: Buenos Aires, Argentina

Question about 2.2

Post by dark_sylinc »

Someone left me this question via PM. Since it's useful, I'll answer it for everybody:
I have been eyeing 2.2 a little bit after the porting manual was added.

Do you think it's ready enough for some preliminary porting for testing (we are on OpenGL only)?

If I understand correctly, I will need the 2.2 branch to be able to make custom resolve passes in OpenGL, for example to implement the reversible tonemapping you've mentioned (we need better HDR AA).
For preliminary porting, yes. You could even use OGRE_VERSION_MINOR to switch between the 2.1 and 2.2 code paths.
I wouldn't recommend it for production, because right now what we need is lots of testing. For example, I don't think what happens when you unload a texture is well tested.

As for anything related to MSAA, Ogre 2.2 is superior in every way to 2.1: 2.1 just treated MSAA like a magic black box, while 2.2 gives you a lot of explicit control over it.
When it comes to texture streaming and those things, does 2.2 interoperate well with the HLMS texture arrays? For example, will there be any wins in texture usage or loading performance?
Yes. Ogre 2.2 got rid of anything that used the "old" Textures. That includes the HlmsTextureManager.
The new TextureGpuManager, which replaces the old TextureManager, also replaces HlmsTextureManager. I just updated the documentation (in Docs/src/manual/Ogre2.2.Changes.md, build it with doxygen) to explain this in detail.

As for loading performance: the code has been rewritten and there were many inefficiencies in the old one, so the new code is generally expected to perform better. But the big gain comes from background streaming. The whole point of 2.2's textures is that texture loads do not block main rendering (unless you explicitly want that). The Progress Report from December 2017 has a video showing it.

The main problem right now regarding background streaming performance is that it will very likely cause a shader recompile (and that is bad). This is because the TextureGpuManager does not know the metadata in advance (resolution, pixel format, etc.), and without it, it just presents a 4x4 dummy texture while the real texture is uploaded in the background.
This can cause a shader recompile: suppose you have 2 textures, both being loaded. The produced shader may end up looking like any of these:

Code: Select all

//Shader variant A
uniform sampler2DArray textures[2];
textures[0] //contains 4x4 dummy for diffuse
textures[1] //contains 1024x1024 loaded for normal map

Code: Select all

//Shader variant A
uniform sampler2DArray textures[2];
textures[0] //contains 1024x1024 loaded for diffuse
textures[1] //contains 4x4 dummy for normal map

Code: Select all

//Shader variant B
uniform sampler2DArray textures[1];
textures[0] //contains 4x4 dummy for both diffuse & normal maps

Code: Select all

//Shader variant B
uniform sampler2DArray textures[1];
textures[0] //contains 1024x1024 loaded for both diffuse & normal maps

Code: Select all

//Shader variant A
uniform sampler2DArray textures[2];
textures[0] //contains 1024x1024 loaded for diffuse (it's in pool M)
textures[1] //contains 1024x1024 loaded for normal (it's in pool N)
There are two variants, A & B. However, as there are more textures, the number of variants grows and things get worse; you could even have multiple variants switched in and out as textures get loaded (that's very stuttery).

The solution to that will be a texture metadata cache saved to disk, which can be loaded by subsequent runs. If we know the metadata in advance, this problem won't happen, because we already know which pool and slot to reserve for the batched textures.
If the metadata becomes out of date (e.g. you replaced the texture with a higher-resolution one), it's not a big problem: when the texture finishes loading, the TextureGpuManager will see that the resolution didn't match and relocate it to another pool. However, this can cause a shader recompile.
Unlike the shader cache, generating the metadata cache offline is easy (just run a command line tool that recursively searches for textures in the given folder, loads their headers, and saves their data). The cache also doesn't need to be updated when the texture contents change, only when the resolution or pixel format does.

Because that metadata cache code hasn't yet been written, beware of this issue.

So, TL;DR: I encourage you to try it out and report back any problems. We need testing. Thanks!
Just... don't devote a disproportionate amount of resources on the assumption that it's ready to be deployed to your end users.

Cheers

Re: Question about 2.2

Post by dark_sylinc »

I shall point out that even if you force serialization to prevent background streaming (e.g. you need all your frames rendered perfectly), you still benefit from it.

For example if your code looks like the following:

Code: Select all

for( int i=0; i<1000; ++i )
{
     item = sceneManager->createItem( ... );
     sceneNode->attachObject( item );
}

textureGpuManager->waitForStreamingCompletion();
In 2.1 the following would happen:
  1. At startup, load ALL of the textures (even ones that are never used but still referenced by an Hlms material)
Then, in every iteration:
  1. Search for the mesh. If it's not loaded, load it
  2. Create the item
  3. Assign the material, evaluating the geometry
  4. Repeat step 1
But in 2.2, the following would happen:
  1. Search for the mesh. If it's not loaded, load it
  2. Create the item
  3. Assign the material, evaluating the geometry
  4. The material will schedule to the background thread the textures needed to load
  5. Repeat step 1
The key point here is that on the second iteration onwards, it may be loading new meshes, creating items, assigning materials and evaluating geometry while, at the same time, another thread is still loading the textures from the previous iteration.

So by the time you reach textureGpuManager->waitForStreamingCompletion();, some work has been parallelized that in 2.1 never was (additionally, some textures may never have been loaded at all because they're not required, thus boosting perceived performance).
Your mileage may vary: if you're opening a very big mesh and a very big texture at the same time, your threads will compete for IO. But if that's a problem for you, you'll need much more careful and explicit resource loading to maximize disk throughput.

So even if you're planning on calling waitForStreamingCompletion because you need it, you may still benefit from background streaming.
crancran
Greenskin
Posts: 138
Joined: Wed May 05, 2010 3:36 pm

Re: Question about 2.2

Post by crancran »

Is it possible to skip the rendering of an object using the dummy texture and instead delay the render until the needed texture has been streamed?

Re: Question about 2.2

Post by dark_sylinc »

TBH I never considered that. You could do it by hand by listening for texture changes (or polling every frame) and calling setVisible accordingly. But there's no single out-of-the-box function that does it in one line.

I suppose I could quickly add that feature by having the vertex shader collapse all vertices while not all textures are ready. It wouldn't save performance (we'd still be sending the meshes to the GPU) but it would prevent the item from showing up on screen.

I'd have to think more about it. What are the use cases? Why do you want it? What do you expect? Just for a few items, or a lot of them? Thanks

Re: Question about 2.2

Post by crancran »

The use case is that as the player moves through an open world, I want to be judicious about what gets loaded, based not only on the user's configured view distance (e.g. the user can see 1000 game units in all directions) but also on an object's distance to the player and its LoD.

For example, there are lots of smaller foliage objects that we only wish to load if the object is within 250 game units of the player, which is obviously less than the configured view distance. Rather than render this foliage with a dummy texture, I think it's more aesthetically pleasing to simply wait for the real textures to be streamed in and then fade the object in afterward. We'd also be able to load these into the scene preemptively without the visual artifacts of a dummy-textured object (e.g. load into the scene at 300 units to be ready for visibility at 250).

As for the quantity, it would vary with the game environment. I'm mostly considering it for smaller objects with low enough LoDs that, if the player were flying across the terrain looking off into the distance, only objects with much higher LoDs would be visible. Various foliage objects, rocks, fences, tables, perhaps even smaller buildings might not be visible until the player moves closer, and I'd rather those objects avoid the texture-pop artifact.

One reason this can often be problematic is when a player moves from one open-world zone into another. Typically a zone is designed with a specific texture palette and art kit that differs from the zone you're currently in. While zones are usually blended at the edges to a degree, we use each zone to tell a different story and drive a different feel, look, and impression. I'd imagine this is where we'd see the artifact pop the most.

If there are other ways to solve this without anything special, let me know. But the idea is to make the game experience as immersive as possible for the player, and avoiding texture pop helps :).

Re: Question about 2.2

Post by crancran »

dark_sylinc wrote: Sat Feb 17, 2018 8:27 pm There are two variants, A & B. However as there are more textures, the number of variants grow and things get worse; and you could even have multiple variants be switched as textures get loaded (that's very stuttery).
With a workable port to 2.2, this is painfully obvious for us.

A world map consists of a large number (up to 4096) of patches. Each patch consists of 256 sections, where each section supports a certain number of diffuse textures, blend masks, and static shadows. Just attempting to render a single terrain patch, I'm seeing 2.2 generate several hundred (~400) shaders, which is painfully slow. By comparison, 2.1 would typically generate somewhere in the neighborhood of 12-24.

I have tried both with and without OGRE_FORCE_TEXTURE_STREAMING_ON_MAIN_THREAD and the outcome is the same.

What I'm not sure of is whether the way I build the datablocks themselves compounds this problem. For each section, I build a datablock with the appropriate diffuse, blend, and shadow texture data and associate it with a section renderable. Would combining those sections have any impact on shader generation? I'm sure it might from a VBO perspective, but I'm not sure with respect to the shaders themselves.
dark_sylinc wrote: Sat Feb 17, 2018 8:27 pm The solution to that will be a texture cache that will save the metadata and store it to disk; which can be loaded by subsequent runs.
What information will this texture cache ultimately store?

Re: Question about 2.2

Post by dark_sylinc »

crancran wrote: Thu Mar 22, 2018 11:49 pm I have tried both with and without OGRE_FORCE_TEXTURE_STREAMING_ON_MAIN_THREAD and the outcome is all the same.
To completely fix this stutter, enable OGRE_FORCE_TEXTURE_STREAMING_ON_MAIN_THREAD and set mEntriesToProcessPerIteration to a very high value (like std::numeric_limits<size_t>::max()). Otherwise, with the default (3) it will process about 3 textures per frame, which mimics background streaming behaviour.
What I am not sure is if how I am building the data blocks themselves is compounding this problem. For each section, I build a data block with the appropriate diffuse, blend, and shadow texture data and associate that with a section renderable object
When you call HlmsPbsDatablock::setTexture, we ask the TextureGpuManager to start loading the texture from file. Meanwhile we schedule ourselves to prepare a shader. But we won't do that until preparePassHash is called (i.e. inside a pass during render), in case you call setTexture() with multiple textures, or in case the textures are already loaded by the time we reach preparePassHash.
So, short story: it shouldn't, unless you're somehow triggering OGRE_HLMS_TEXTURE_BASE_CLASS::updateDescriptorSets too early.
dark_sylinc wrote: Sat Feb 17, 2018 8:27 pm The solution to that will be a texture cache that will save the metadata and store it to disk; which can be loaded by subsequent runs.
What information will this texture cache ultimately store?
Resolution (width, height, depth, number of arrays), type (2D, 2D Array, cubemap, etc), number of mipmaps, and pixel format.

Re: Question about 2.2

Post by crancran »

When porting the changes for Unlit to my HLMS implementation, I noticed this particular line:

Code: Select all

setProperty( UnlitProperty::DiffuseMap, datablock->mTexturesDescSet != 0 );
This seems to always set the "diffuse_map" property to 1, which seems incorrect to me, particularly if you're not doing automatic batching and you're specifying a number of diffuse textures in the material definition. So I altered my implementation to do this instead:

Code: Select all

if( datablock->mTexturesDescSet )
    setProperty( UnlitProperty::DiffuseMap, datablock->mTexturesDescSet->mTextures.size() );
This worked quite well for non-batched situations; however, when I enabled the TextureFlags::AutomaticBatching flag when loading my textures, I noticed the value set for "diffuse_map" didn't make much sense anymore either. In particular, when I specified 4 diffuse and 3 blend textures, the shader output showed "diffuse_map 5", whereas I had properties set for "diffuse_map6_idx" and "diffuse_map7_idx". I had expected "diffuse_map 7" to be set.

Am I misinterpreting the meaning and usage of this property?

Re: Question about 2.2

Post by dark_sylinc »

Please note that datablock->mTexturesDescSet->mTextures.size() can be deceiving:
  • You may have textures 0, 1 and 5 bound.
  • datablock->mTexturesDescSet->mTextures.size() may return 3; however, you need to go up to diffuse_map5 (with diffuse_map2 through diffuse_map4 unused).
  • datablock->mTexturesDescSet->mTextures.size() may return fewer than 3 if automatic batching is involved (because e.g. textures 0 and 1 are internally just slices of a texture array at mTextures[0]). It may return anywhere between [1; 3] depending on how the textures were batched together.
Perhaps that's what's going on? I'm not 100% sure I understand the problem you're getting into.

Re: Question about 2.2

Post by crancran »

dark_sylinc wrote: Fri Mar 23, 2018 8:35 pm Perhaps that's what's going on? I'm not 100% sure I understand the problem you're getting into.
It just appears to me that the blend-mode logic for SamplerOrigin and SamplerUV doesn't work properly on 2.2, based on what I see. In my case, diffuse_map = 5, yet there are 7 textures, several of which were batched together as you described. The issue is that the shader will only ever sample up to 5. The textures identified as diffuse_map6 and diffuse_map7 are ignored, right?

Re: Question about 2.2

Post by dark_sylinc »

Ow crap. I see the problem now. We perform

Code: Select all

@foreach( diffuse_map, n, 1 )
That is definitely not going to work.

Re: Question about 2.2

Post by dark_sylinc »

Fixed. Please try again and let me know if that works.

Thanks for the report!
Cheers

Re: Question about 2.2

Post by crancran »

dark_sylinc wrote: Fri Mar 23, 2018 9:26 pm Fixed. Please try again and let me know if that works.
That did, thanks!

Now I'm trying to optimize the mesh upload and render process, and I was curious whether you had any suggestions. In the picture below, I've extracted a single terrain tile and highlighted the cell boundaries by color based on the following conditions:

[Image: terrain tile with cell boundaries highlighted by color]

1. Cyan = 4 diffuse textures, 3 blend textures.
2. Dark Blue = 3 diffuse textures, 2 blend textures.
3. Green = 2 diffuse textures, 1 blend texture.
4. Red = 1 diffuse texture, no blend (none in this tile)

Right now I take a brute-force approach: I build each cell separately into its own Ogre::Item with its own vertex/index buffers and material. I suspect this isn't great for performance given that the vertex count of a single Ogre::Item is relatively small (~144). Secondly, given the texture atlas functionality, I also wonder whether I'm hurting myself rather than taking full advantage of the new features. All the while, this leads to lots of generated shaders and quite a bit of upfront compilation.

Is there a way I could optimize my build approach and take better advantage of Ogre 2.1/2.2 features?

Re: Question about 2.2

Post by crancran »

After doing some more digging, I think there is an optimization in HLMS which we should consider.

When AutomaticBatching is disabled, I assumed textureMaps[] basically represents an array of my texture bindings, from 0 through the number of textures I have bound. Therefore, with my terrain blend configuration, I expected only 4 shader pairs to be generated. Prior to Hlms, I had only 4 different pixel shaders and bound the appropriate one to the material based on the number of layers.

In my Hlms implementation and terrain code, I bind my textures in the same order, lowest layer to highest. So where there are 4 diffuse and 3 blend textures at play, I set the textures as follows:

1. Layer 1 Diffuse (texUnit = 0)
2. Layer 2 Diffuse (texUnit = 1)
3. Layer 2 Blend Mask (texUnit = 2)
4. Layer 3 Diffuse (texUnit = 3)
5. Layer 3 Blend Mask (texUnit = 4)
6. Layer 4 Diffuse (texUnit = 5)
7. Layer 4 Blend Mask (texUnit = 6)

What confuses me is that when I look at the shaders generated for 2 TerrainCells that both bind 4 diffuse and 3 blend textures, multiple shaders are generated despite the fact that they use the same shader configuration. After inspecting the shader output further, the only differences I find are the values bound to diffuse_mapN_idx. They're always in the range 0 to 6, but they're never consistent.

I haven't yet looked at the details inside OGRE_HLMS_TEXTURE_BASE_CLASS and how the texture descriptors determine the idx value, but I definitely think the HLMS could benefit from forcing diffuse_mapN_idx to be deterministic, at least when not using AutomaticBatching.

Is there any way to improve this?

Re: Question about 2.2

Post by crancran »

Not sure what impact this may have on AutomaticBatching (if any), but it seems that if I override the following in my datablock implementation:

Code: Select all

        /// Expects caller to call flushRenderables if we return true.
        virtual bool bakeTextures( bool hasSeparateSamplers );
        
        /// This function has O( log N ) complexity, but O(1) if the texture was not set.
        uint8 getIndexToDescriptorTexture( uint8 texType );
and change the usage of std::lower_bound to std::find, shader generation improves significantly.

Re: Question about 2.2

Post by dark_sylinc »

You figured it out before I could reply :) .

OGRE_HLMS_TEXTURE_BASE_CLASS::bakeTextures sorts textures so that multiple texture setups (e.g. textures A, B, C vs. textures C, A, B) can be reused (additionally, it ensures environment maps go last, for... assumption reasons in HlmsPbs).

DescriptorSet reuse leads to better sorting, and ultimately better performance. But when you put it like that, it obviously needs some rethinking, because spawning multiple shaders is way worse.

TL;DR: The sort done by OrderTextureByPoolThenName is not strictly necessary except for the cubemaps which must go last.

Edit: Yep, the current method is flawed, even for automatically batched textures. Better sorting should consider the "place" of the textures in the slots (e.g. is this a diffuse texture? a detail map?) to avoid this kind of issue.

Re: Question about 2.2

Post by dark_sylinc »

Fixed. Those unnecessary cloned shaders should be gone (ping me if they're not).

Thanks for reporting this issue. It would've been a PITA if it went undetected for a long time.

Cheers