Texture Streaming Refactor proposal (w/ SLIDES)
-
- OGRE Expert User
- Posts: 1227
- Joined: Thu Dec 11, 2008 7:56 pm
- Location: Bristol, UK
- x 157
Re: Texture Streaming Refactor proposal (w/ SLIDES)
This is looking very promising.
I am not so worried about streaming as I can load everything in on start up, but everything else is very high on my list of dreams!
I would also agree that we should merge the PSO branch & Metal branch into 2.1 and do an official release, and then start on a 2.2. I am literally delivering a system as I type this using Ogre 2.1, and although I know it is very stable, convincing my QA department was a struggle!
-
- OGRE Team Member
- Posts: 5434
- Joined: Sat Jul 21, 2007 4:55 pm
- Location: Buenos Aires, Argentina
- x 1341
Re: Texture Streaming Refactor proposal (w/ SLIDES)
I agree with merging PSO into 2.1.
My main blocker is that we're not providing any sort of utility to cache PSOs for those who need to write low-level shaders (mostly for porting third-party GUI solutions).
It isn't hard to write one; it's just time-consuming. It would do something similar to what Hlms does: build an HlmsPso from the needed parameters and look it up in a map to see if it has already been created. If so, return that PSO; otherwise, create one.
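The lookup described above is the classic get-or-create pattern. Below is a minimal, self-contained C++ sketch of it, under stated assumptions: `PsoKey`, `FakePso` and `PsoLookup` are hypothetical stand-ins (not Ogre API) for the parameters that go into an HlmsPso and for the object the render system would create. In Ogre the creation step would go through the render system rather than `new`.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <memory>
#include <tuple>
#include <utility>

// Hypothetical stand-in for the parameters that make up a PSO. In Ogre these would
// be macroblock/blendblock pointers, vertex layout, shaders, render target info, etc.
struct PsoKey
{
    const void *macroblock;
    const void *blendblock;
    int         vertexLayoutId;
    int         shaderId;

    bool operator<( const PsoKey &o ) const
    {
        // std::less gives a well-defined order even for unrelated pointers
        std::less<const void *> ptrLess;
        if( macroblock != o.macroblock )
            return ptrLess( macroblock, o.macroblock );
        if( blendblock != o.blendblock )
            return ptrLess( blendblock, o.blendblock );
        return std::tie( vertexLayoutId, shaderId ) < std::tie( o.vertexLayoutId, o.shaderId );
    }
};

// Hypothetical stand-in for a created PSO (in Ogre, an HlmsPso after creation).
struct FakePso
{
    PsoKey key;
};

class PsoLookup
{
    std::map<PsoKey, std::unique_ptr<FakePso>> mPsos;
    int mCreatedCount = 0;

public:
    // Returns the PSO matching 'key', creating and caching it on first use.
    FakePso *getOrCreate( const PsoKey &key )
    {
        auto itor = mPsos.find( key );
        if( itor != mPsos.end() )
            return itor->second.get();  // already created: reuse it

        // Not found: create it once (the expensive part in a real renderer)
        std::unique_ptr<FakePso> pso( new FakePso{ key } );
        FakePso *raw = pso.get();
        mPsos.emplace( key, std::move( pso ) );
        ++mCreatedCount;
        return raw;
    }

    int createdCount() const { return mCreatedCount; }
};
```

Asking twice for the same key returns the same object, so the expensive creation only happens once per unique combination of state.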
-
- OGRE Expert User
- Posts: 1227
- Joined: Thu Dec 11, 2008 7:56 pm
- Location: Bristol, UK
- x 157
Re: Texture Streaming Refactor proposal (w/ SLIDES)
dark_sylinc wrote: It isn't hard to write one, it's just time consuming. It would do something similar to what Hlms does. Basically build an HlmsPso from the needed parameters, find in a map if it's already been created; if so, return that PSO. Otherwise create one.
Sounds perfect for a community contribution! I haven't got round to looking at PSOs in Ogre yet, so I'm not really sure how they function, but I'm sure I'll look at them in the near future. Please share any other details you can, so someone here can help implement it and give you more time elsewhere.
-
- OGRE Team Member
- Posts: 5434
- Joined: Sat Jul 21, 2007 4:55 pm
- Location: Buenos Aires, Argentina
- x 1341
Re: Texture Streaming Refactor proposal (w/ SLIDES)
It's perfect for a community contribution indeed.
PSOs basically contain everything condensed into one big object: Macroblock & Blendblock information, stencil test params, vertex input layout, even render target information (including MRT count, MSAA settings, formats of each MRT, depth buffer format).
From an engine design point of view, this means PSOs should be created on load. From an API perspective, PSOs enable a lot of driver-side optimizations: the driver can see everything that can and will be used (and how each piece relates to the others), perform heavy optimizations, and encapsulate the optimized state into the PSO.
However, for immediate-mode style rendering (very common in most GUIs out there), this paradigm sucks (unless the GUI has been designed with PSOs in mind).
Ogre (and therefore the Hlms) separates the PSO into two segments: PSO data that rarely changes (such as RTT formats, depth buffer format, stencil test settings), and PSO data that may change often: macroblock, blendblock, vertex layout, and shaders.
In Ogre we have HlmsPso & HlmsPassPso for that (HlmsPassPso is part of HlmsPso).
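The split between rarely-changing pass state and frequently-changing per-object state can be mirrored in a cache: resolve the pass-level part once per pass, then only look up the per-object part per draw. The C++ below is a hypothetical sketch of that idea; `PassState`, `ObjectState` and `TwoLevelPsoCache` are illustrative names, not Ogre's actual HlmsPassPso/HlmsPso definitions, and an int stands in for a created PSO handle.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <tuple>

// Rarely changes (roughly what HlmsPassPso groups): RT formats, MSAA, stencil.
struct PassState
{
    std::string colourFormat;
    std::string depthFormat;
    int         msaaSamples;
    bool        stencilEnabled;

    bool operator<( const PassState &o ) const
    {
        return std::tie( colourFormat, depthFormat, msaaSamples, stencilEnabled ) <
               std::tie( o.colourFormat, o.depthFormat, o.msaaSamples, o.stencilEnabled );
    }
};

// May change per draw: raster/blend state, vertex layout, shaders (ids for brevity).
struct ObjectState
{
    int macroblockId;
    int blendblockId;
    int vertexLayoutId;
    int shaderId;

    bool operator<( const ObjectState &o ) const
    {
        return std::tie( macroblockId, blendblockId, vertexLayoutId, shaderId ) <
               std::tie( o.macroblockId, o.blendblockId, o.vertexLayoutId, o.shaderId );
    }
};

class TwoLevelPsoCache
{
    // Outer map keyed by pass state, inner map keyed by per-object state.
    std::map<PassState, std::map<ObjectState, int>> mCache;
    std::map<ObjectState, int> *mCurrentPass = nullptr;  // stable: map nodes don't move
    int mNextHandle = 0;

public:
    // Called once per pass, so the pass-level lookup is not paid per draw.
    void beginPass( const PassState &pass ) { mCurrentPass = &mCache[pass]; }

    // Called per draw: only the per-object state is looked up.
    int getPso( const ObjectState &obj )
    {
        assert( mCurrentPass && "beginPass() must be called first" );
        auto itor = mCurrentPass->find( obj );
        if( itor != mCurrentPass->end() )
            return itor->second;
        const int handle = mNextHandle++;  // "create" a new PSO
        mCurrentPass->emplace( obj, handle );
        return handle;
    }

    int totalCreated() const { return mNextHandle; }
};
```

Note that the same object state under a different MSAA setting yields a different PSO, since the PSO bakes in render target information.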
My idea is that a cache class should work something like this:
```cpp
PsoCache cache; // This is persistent, not a local variable

void render()
{
    cache.clear();
    // Pass-level state: set once, rarely changes
    cache.setRenderTarget( renderTarget );
    cache.setStencilSettings( stencilParams );

    foreach( object_to_render ) // pseudocode: iterate over everything to draw
    {
        // Per-object state: only marks the cache dirty if something actually changed
        cache.setVertexFormat( vertexElements, operationType, enablePrimitiveRestart );
        cache.setMacroblock( macroblock );
        cache.setBlendblock( blendblock );
        cache.setShaders( ... );
        HlmsPso *pso = cache.getPso();
        renderSystem->_setPipelineStateObject( pso );
    }
}
```
getPso() would check if it's dirty; if it's not, it just returns the same PSO as before. If it is, it looks for an already created HlmsPso in its cache. If there isn't one, it creates a new one (by calling _hlmsPipelineStateObjectCreated; when destroying the entire cache, don't forget to call _hlmsPipelineStateObjectDestroyed).
Naturally, the dev using the cache should optimize as much as possible (e.g. if the vertex format is always the same, call cache.setVertexFormat outside the loop).
Also, if the user provides macroblocks & blendblocks by pointer (created from HlmsManager), checking whether they're different is just a pointer compare, e.g. if( oldBlendblock != newBlendblock ) mDirty = true;
Overall it's simple, and it would greatly simplify porting GUI tools (Gorilla, CEGUI, etc.) to Ogre 2.1-pso.
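The dirty-flag behaviour of getPso() and the pointer-compare trick described above can be sketched as follows, assuming blocks are shared, immutable objects handed out by a manager, so equal pointers mean equal state. `Macroblock`, `Blendblock`, `FakePso` and `DirtyPsoCache` are simplified hypothetical stand-ins, not Ogre's classes, and the create/destroy notifications Ogre would need are reduced to a counter.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <utility>

struct Macroblock {};  // hypothetical: in Ogre these would come from HlmsManager
struct Blendblock {};

struct FakePso
{
    const Macroblock *macroblock;
    const Blendblock *blendblock;
};

class DirtyPsoCache
{
    using Key = std::pair<const Macroblock *, const Blendblock *>;

    // Orders keys by pointer value; std::less is well-defined for unrelated pointers
    struct KeyLess
    {
        bool operator()( const Key &a, const Key &b ) const
        {
            std::less<const void *> lt;
            if( a.first != b.first )
                return lt( a.first, b.first );
            return lt( a.second, b.second );
        }
    };

    std::map<Key, FakePso, KeyLess> mPsos;  // every PSO created so far
    const Macroblock *mMacroblock = nullptr;
    const Blendblock *mBlendblock = nullptr;
    FakePso *mLastPso = nullptr;
    bool mDirty = true;
    int mCreatedCount = 0;

public:
    void setMacroblock( const Macroblock *mb )
    {
        if( mMacroblock != mb ) { mMacroblock = mb; mDirty = true; }  // pointer compare only
    }
    void setBlendblock( const Blendblock *bb )
    {
        if( mBlendblock != bb ) { mBlendblock = bb; mDirty = true; }
    }

    FakePso *getPso()
    {
        if( !mDirty )
            return mLastPso;  // nothing changed: same PSO as before

        const Key key( mMacroblock, mBlendblock );
        auto itor = mPsos.find( key );
        if( itor == mPsos.end() )
        {
            // Not cached yet: create it (Ogre would notify the render system here)
            itor = mPsos.emplace( key, FakePso{ mMacroblock, mBlendblock } ).first;
            ++mCreatedCount;
        }
        mLastPso = &itor->second;
        mDirty = false;
        return mLastPso;
    }

    int createdCount() const { return mCreatedCount; }
};
```

Switching back to a previously used macroblock finds the old PSO in the map instead of creating a new one, and re-setting the same pointer doesn't even trigger a lookup.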
-
- OGRE Expert User
- Posts: 1227
- Joined: Thu Dec 11, 2008 7:56 pm
- Location: Bristol, UK
- x 157
Re: Texture Streaming Refactor proposal (w/ SLIDES)
I follow 75% of what you have explained, but I think I should tackle and update one of the GUIs to get a better idea, maybe MyGUI. As I currently understand it, though, this PSOCache is not required to get the GUIs to work, but they would greatly benefit from it?
-
- OGRE Team Member
- Posts: 5434
- Joined: Sat Jul 21, 2007 4:55 pm
- Location: Buenos Aires, Argentina
- x 1341
Re: Texture Streaming Refactor proposal (w/ SLIDES)
Managing PSOs is required. What's optional would be using a PsoCache implementation we'll provide to make this management easy.
I had started writing a PSO cache (turns out writing these requirements down helped a lot), but I have to re-tune it for better performance, then commit and push.
-
- OGRE Expert User
- Posts: 1227
- Joined: Thu Dec 11, 2008 7:56 pm
- Location: Bristol, UK
- x 157
Re: Texture Streaming Refactor proposal (w/ SLIDES)
dark_sylinc wrote: Managing PSOs is required. What's optional would be using a PsoCache implementation we'll provide to make this management easy.
I had started writing a PSO cache (turns out writing these requirements down helped a lot), but I have to re-tune it for better performance, then commit and push.
Fair enough, I'll wait for it and see if it matches what I had in mind!
-
- Gnoblar
- Posts: 15
- Joined: Sat May 21, 2016 5:07 pm
Re: Texture Streaming Refactor proposal (w/ SLIDES)
Slides look great, any info on when this refactor is going to be started?
-
- Gnoblar
- Posts: 15
- Joined: Sat May 21, 2016 5:07 pm
Re: Texture Streaming Refactor proposal (w/ SLIDES)
Oops, seems like it has already started: 22 hours ago, according to Bitbucket.
-
- Silver Sponsor
- Posts: 1141
- Joined: Tue Jul 06, 2004 5:57 am
- x 151
Re: Texture Streaming Refactor proposal (w/ SLIDES)
It's a 2.2 branch. Does that mean a release candidate for 2.1 is on its way? In that case, shouldn't the license text "Copyright (c) 2000-2014 Torus Knot Software Ltd" be updated (or a supplemental 2015-2017 copyright be added)?
Gui generator tool https://github.com/spookyboo/Magus ==> Windows binaries https://github.com/spookyboo/Magus_bin
HLMS editor https://github.com/spookyboo/HLMSEditor ==> Windows setup https://github.com/spookyboo/HLMSEditor ... e?raw=true
-
- OGRE Expert User
- Posts: 1148
- Joined: Sat Jul 06, 2013 10:59 pm
- Location: Chile
- x 169
Re: Texture Streaming Refactor proposal (w/ SLIDES)
Hello! I see a lot of movement in the branch; I hope it's going great! You are an effing god, Matias!
I have a couple of requests regarding this new system:
1) Would it be possible to easily load into VRAM only up to n mips? That way I could have an option to use lower-quality versions of the textures that actually use less VRAM.
2) I would also like an easy way to load a texture from a specific (relative or absolute) path, without using the resource manager.
Thanks!
-
- OGRE Team Member
- Posts: 5434
- Joined: Sat Jul 21, 2007 4:55 pm
- Location: Buenos Aires, Argentina
- x 1341
Re: Texture Streaming Refactor proposal (w/ SLIDES)
xrgo wrote: 1) Would be possible to load easily in to vram just up to n mips? So I can have an option to use lower quality version of the textures that will actually use less vram.
I can't comment on this because there are a lot of factors involved that make it complex and hard. Something like this is within the goals, but right now we're far from that.
The short story is that if you have a material with three 2048x2048 textures and they're all in the same texture array, that's all well and good. But if you change only one of those textures from 2048x2048 to 1024x1024, it's going to be in a different texture array and generate a new shader. And new shader = hiccup while compiling it.
If you downsize all 3 textures and they end up in the same array again, then this can proceed without hiccups. But to be hiccup-free we have to guarantee that:
- The texture you want is downsized
- The other textures used by that material are also downsized
- The downsized textures are put in the same arrays (to be able to use the same shader).
- If one of these textures is also used by another material (which uses other textures), it may cause a domino effect
- Ensuring that all of these conditions are met has its own overhead, which could outweigh just recompiling the shader.
That's the short version. There are more details at play.
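A toy model of the texture-array problem above, under two simplifying assumptions: all textures with the same resolution and format fit in one array, and the shader only cares about which of the material's slots share an array (not the resolutions themselves). `TexDesc`, `arrayKey` and `shaderLayoutFor` are hypothetical names for illustration; it also ignores the cross-material domino effect.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <tuple>
#include <vector>

struct TexDesc
{
    unsigned    width;
    unsigned    height;
    std::string format;
};

// Assumption: textures share an array only if resolution and format match.
using ArrayKey = std::tuple<unsigned, unsigned, std::string>;

ArrayKey arrayKey( const TexDesc &t ) { return ArrayKey( t.width, t.height, t.format ); }

// Slot i gets the index of its array, in order of first use. Two materials with the
// same layout can use the same shader; a different layout means a new shader.
std::vector<int> shaderLayoutFor( const std::vector<TexDesc> &materialTextures )
{
    std::vector<ArrayKey> seen;
    std::vector<int>      layout;
    for( const TexDesc &t : materialTextures )
    {
        const ArrayKey key = arrayKey( t );
        std::size_t idx = 0;
        while( idx < seen.size() && seen[idx] != key )
            ++idx;
        if( idx == seen.size() )
            seen.push_back( key );  // first texture in a new array
        layout.push_back( static_cast<int>( idx ) );
    }
    return layout;
}
```

Under this model, downsizing all three textures keeps the layout {0, 0, 0} (same shader), while downsizing only the middle one gives {0, 1, 0} (new shader, hence the hiccup).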
xrgo wrote: 2) And I would like an easy way to load a texture from a specific (relative or absolute) path, no using the resource manager.
Thanks!
Yes, absolutely.
-
- OGRE Expert User
- Posts: 1148
- Joined: Sat Jul 06, 2013 10:59 pm
- Location: Chile
- x 169
Re: Texture Streaming Refactor proposal (w/ SLIDES)
Thank you so much!
1) But what if I set that at the moment I load the texture? In other words, for the shader it was never 2048, so no hiccup. I actually do this right now with something like this: http://www.ogre3d.org/forums/viewtopic. ... 31#p518003 and you actually commented:
dark_sylinc wrote: The idea is quite clever btw. I was thinking of something similar, but I like yours better.
=D
2) Fantastique!!!
-
- OGRE Team Member
- Posts: 5434
- Joined: Sat Jul 21, 2007 4:55 pm
- Location: Buenos Aires, Argentina
- x 1341
Re: Texture Streaming Refactor proposal (w/ SLIDES)
xrgo wrote: 1) but what if I set that at the moment that I load the texture, in other words, for the shader was never 2048, so no hiccup.
If this data is known beforehand, like you're suggesting, then there should be no problem at all.
-
- OGRE Expert User
- Posts: 1148
- Joined: Sat Jul 06, 2013 10:59 pm
- Location: Chile
- x 169
Re: Texture Streaming Refactor proposal (w/ SLIDES)
dark_sylinc wrote: If this data is known beforehand like you're suggesting, then there should be no problem at all.
Yes! That would be very useful. I actually set a quality setting to "Low", and then every texture is loaded with a non-zero minLod and just stays there.
Thank you!
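Loading every texture with a non-zero minimum LOD from the start, as described above, amounts to skipping the top mips at load time, so the final resolution and memory cost are known before anything is created. The arithmetic is sketched below, assuming an uncompressed format with a fixed number of bytes per pixel and a full mip chain; `MipInfo`, `skipTopMips` and `mipChainBytes` are hypothetical helpers, not Ogre API.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

struct MipInfo
{
    std::uint32_t width;
    std::uint32_t height;
    std::uint32_t numMips;
};

// Drops the 'skip' largest mips; each skipped level halves the resolution (min 1).
MipInfo skipTopMips( MipInfo tex, std::uint32_t skip )
{
    assert( tex.numMips >= 1u );
    skip = std::min<std::uint32_t>( skip, tex.numMips - 1u );  // keep at least one mip
    for( std::uint32_t i = 0; i < skip; ++i )
    {
        tex.width  = std::max<std::uint32_t>( tex.width / 2u, 1u );
        tex.height = std::max<std::uint32_t>( tex.height / 2u, 1u );
    }
    tex.numMips -= skip;
    return tex;
}

// Total bytes of the mip chain for an uncompressed texture.
std::uint64_t mipChainBytes( const MipInfo &tex, std::uint32_t bytesPerPixel )
{
    std::uint64_t total = 0;
    std::uint32_t w = tex.width, h = tex.height;
    for( std::uint32_t i = 0; i < tex.numMips; ++i )
    {
        total += std::uint64_t( w ) * h * bytesPerPixel;
        w = std::max<std::uint32_t>( w / 2u, 1u );
        h = std::max<std::uint32_t>( h / 2u, 1u );
    }
    return total;
}
```

For a 2048x2048 RGBA8 texture with 12 mips, skipping one mip drops the chain from 22,369,620 bytes to 5,592,404 bytes, roughly a quarter of the VRAM.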
-
- OGRE Team Member
- Posts: 5434
- Joined: Sat Jul 21, 2007 4:55 pm
- Location: Buenos Aires, Argentina
- x 1341
Re: Texture Streaming Refactor proposal (w/ SLIDES)
After a long time, it's finally done: Yesterday I pushed a set of several commits that introduced the Texture Metadata Cache.
I opted to use JSON instead of a binary format because the metadata cache file on disk is easier to inspect that way. Not to mention the metadata can also be used to manipulate the new feature of texture pool IDs, which can be very important for some engines that take advantage of it.
Pool IDs are basically a way to ensure that textures with the same pool ID get grouped together (as long as they have the same format & resolution), or rather... a way to prevent completely different textures from accidentally being grouped together.
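The grouping rule described above can be sketched as a key: two textures may share an array only if pool ID, format and resolution all match. `TextureInfo`, `PoolKey` and `groupTextures` are hypothetical names for illustration, not Ogre's API.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>
#include <tuple>
#include <vector>

struct TextureInfo
{
    std::string   name;
    std::uint32_t poolId;  // hypothetical field
    std::uint32_t width;
    std::uint32_t height;
    std::string   format;
};

using PoolKey = std::tuple<std::uint32_t, std::uint32_t, std::uint32_t, std::string>;

// Buckets the textures that are allowed to end up in the same texture array.
std::map<PoolKey, std::vector<std::string>> groupTextures( const std::vector<TextureInfo> &textures )
{
    std::map<PoolKey, std::vector<std::string>> groups;
    for( const TextureInfo &t : textures )
        groups[PoolKey( t.poolId, t.width, t.height, t.format )].push_back( t.name );
    return groups;
}
```

Without the pool ID in the key, a UI atlas could land in the same array as unrelated scene textures purely because they happen to share format and resolution.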
The metadata cache still needs testing, but I can already notice fewer FPS hitches in OpenGL when the textures finish streaming and appear on screen. I have yet to test D3D11, but I'd expect D3D11 to benefit much more from the metadata cache.
The code also handles the case where the metadata is out of date (or just intentionally lied...). If the cache is out of date, loading times will be higher because we have to redo a few things related to that texture from scratch. To keep thread safety, the cache-missed texture needs to go back to the main thread and then back again to the worker thread.
While optimizing this corner case could be possible, it would only complicate the code and design. We have to work under the assumption that the cache will be correct 99% of the time, because it's rare to modify the width/height/pixel format/texture type of a texture, even during development. And when that happens, the performance hit is definitely acceptable (it's a small 'hitch').
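The out-of-date handling can be pictured as: trust the cached metadata to set things up early, then compare it against what the file actually contains once it has been read; on a mismatch, redo the setup and refresh the cache entry. Below is a hypothetical, single-threaded C++ sketch; `TexMetadata`, `LoadPath` and `MetadataCache` are illustrative names, and both the JSON file and the round-trip to the main thread are left out.

```cpp
#include <cassert>
#include <map>
#include <string>

struct TexMetadata
{
    unsigned    width;
    unsigned    height;
    std::string pixelFormat;
    std::string textureType;

    bool operator==( const TexMetadata &o ) const
    {
        return width == o.width && height == o.height &&
               pixelFormat == o.pixelFormat && textureType == o.textureType;
    }
};

enum class LoadPath { CacheHit, CacheMiss, NotInCache };

class MetadataCache
{
    std::map<std::string, TexMetadata> mEntries;

public:
    // E.g. populated from the cache file on startup.
    void set( const std::string &name, const TexMetadata &md ) { mEntries[name] = md; }

    // 'actual' is what the image header says once the file has been read.
    LoadPath loadWithCache( const std::string &name, const TexMetadata &actual )
    {
        auto itor = mEntries.find( name );
        if( itor == mEntries.end() )
        {
            mEntries.emplace( name, actual );  // first time: remember it for next run
            return LoadPath::NotInCache;
        }
        if( itor->second == actual )
            return LoadPath::CacheHit;  // early setup from the cache was correct

        // Cache was stale (or lied): redo the setup from scratch and fix the entry.
        itor->second = actual;
        return LoadPath::CacheMiss;
    }
};
```

Only the CacheMiss path pays the extra cost, which matches the assumption that the cache is right almost all of the time.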
-
- Goblin
- Posts: 296
- Joined: Mon May 09, 2016 8:21 am
- x 35
Re: Texture Streaming Refactor proposal (w/ SLIDES)
dark_sylinc wrote (Fri Sep 28, 2018 7:51 pm): After a long time, it's finally done: Yesterday I pushed a set of several commits that introduced the Texture Metadata Cache.
By "it's finally done", do you mean "the texture refactor is finally done"?
So when will you remove the "WIP" suffix from the branch name?
-
- OGRE Team Member
- Posts: 5434
- Joined: Sat Jul 21, 2007 4:55 pm
- Location: Buenos Aires, Argentina
- x 1341
Re: Texture Streaming Refactor proposal (w/ SLIDES)
I've actually been thinking about that.
Before dropping the WIP label, the following is needed:
- Test TextureGpu::scheduleTransitionTo (3x, one for each GpuPageOutStrategy option):
  - Resident -> OnSystemRam
  - Resident -> OnStorage
  - OnSystemRam -> OnStorage
  - OnSystemRam -> Resident
- Implement going Resident with AlwaysKeepSystemRamCopy. Right now TextureGpu demands that the sysram copy be provided with _transitionTo when going Resident; otherwise exceptions/asserts are triggered. Now that I've had time to think about it, this makes little sense. The pointer must be provided before or during the call to TextureGpu::notifyDataIsReady; there is no need to require it while going Resident. Back when I started, I had the notion that a TextureGpu being Resident meant it was ready to display, which is not the same thing. Hence it asks for a memory pointer when using AlwaysKeepSystemRamCopy. This is wrong.
- Better error handling. Right now, if there is an exception in the worker thread, the thread terminates abruptly, textures stop streaming, and the main thread will likely deadlock or livelock.
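For the error-handling item, one standard C++ technique (a suggestion, not necessarily what Ogre will do) is to catch everything in the worker, stash it in a `std::exception_ptr`, and rethrow it on the main thread, so streaming fails loudly instead of the main thread waiting forever. A minimal sketch; `runWorker` is a hypothetical name, and a real streamer would poll a queue rather than join immediately.

```cpp
#include <cassert>
#include <exception>
#include <functional>
#include <stdexcept>
#include <string>
#include <thread>

// Runs 'job' on a worker thread. Any exception is captured instead of escaping
// the thread (which would call std::terminate) and is rethrown on the caller.
void runWorker( const std::function<void()> &job )
{
    std::exception_ptr error;
    std::thread worker( [&]() {
        try
        {
            job();
        }
        catch( ... )
        {
            error = std::current_exception();  // hand the error to the main thread
        }
    } );
    worker.join();  // join() also makes the write to 'error' visible here

    if( error )
        std::rethrow_exception( error );
}
```

The main thread then sees the original exception type and message, and can decide whether to abort, skip the texture, or fall back to a placeholder.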
-
- Gnome
- Posts: 388
- Joined: Sat Jun 23, 2007 5:16 pm
- x 99
Re: Texture Streaming Refactor proposal (w/ SLIDES)
Hmm, does this topic need to be sticky? I mean, this feature is already done, right? It's been over 4 years with no update. And streaming was already in the Ogre news, IIRC.