Set custom HlmsMacroblock depending on compositor pass

Discussion area about developing with Ogre-Next (2.1, 2.2 and beyond)


bishopnator
Gnome
Posts: 327
Joined: Thu Apr 26, 2007 11:43 am
Location: Slovakia / Switzerland
x 14

Set custom HlmsMacroblock depending on compositor pass

Post by bishopnator »

Hi, is it possible to override the macroblock setting depending on the compositor pass? Consider something like this:

Code: Select all

compositor_node MyNode
{
	in 0 rt_renderwindow

	target rt_renderwindow
	{
		pass render_scene
		{
			..... // some additional settings like load/store/etc.

			visibility_mask		0x00000001
			rq_first		50
			rq_last			51

			identifier		100
			profiling_id		"1st pass"
		}
		pass render_scene
		{
			..... // some additional settings like load/store/etc.

			visibility_mask		0x00000002
			rq_first		50
			rq_last			51

			identifier		101
			profiling_id		"2nd pass"
		}
	}
}

I need to render the same scene, filtered by visibility_mask and render queue IDs, and set a specific macroblock depending on the pass (identified by its profiling_id or identifier). More specifically, in "1st pass" I need to turn off depth writes and depth checks, and in "2nd pass" I need a normal render with an active depth buffer. At the moment I have duplicated all my datablocks in the Hlms just to assign 2 different macroblocks, and duplicated all Items in my scene nodes: one Item with a datablock without depth tests and visibility 1, and another Item with a datablock with depth tests and visibility 2.

I would like to remove those duplicates and keep the scene as simple as possible.

Edit: I briefly considered overriding Hlms::createShaderCacheEntry in my custom Hlms implementation and, before calling the parent method, modifying the datablock assigned to the input queuedRenderable. Through it I suppose it is possible to access the SceneManager and then sceneManager->getCurrentCompositorPass()->getDefinition()->mIdentifier, and reassign the macroblock based on this identifier, but that sounds horribly hacky.

Re: Set custom HlmsMacroblock depending on compositor pass

Post by bishopnator »

Looking more and more at the implementation of Ogre::HlmsDatablock, I am getting a strange feeling. Why does a datablock hold just 2 macroblocks? It looks like a hard-coded need for one feature, in this case "caster". If I need to render the same scene in N compositor scene passes, and for each scene pass I need to use the same macroblock for all materials, I need to create N datablocks for each identical setting, only to be able to store my specific macroblock there. This leads to huge overhead, and an identical datablock will be stored N times in the buffer. I understand that Ogre tries to keep the scene as static as possible with regard to attribute setup for rendering, which is how it gains rendering performance.

Let's consider a very simple example without any fancy rendering technique. I have 2 windows and both render a completely identical scene. In one window, all HLMS datablocks use the default macroblock, and in the other window I want all objects rendered with PolygonMode::PM_WIREFRAME. Is it possible to achieve this without duplicating the scene graph and datablocks?

Re: Set custom HlmsMacroblock depending on compositor pass

Post by bishopnator »

I found out that there is a CompositorWorkspaceListener with which I can overwrite the macroblocks in all my datablocks in its passPreExecute. This seems to work. I pre-created my macroblocks and store them in member variables to speed up setting them in the datablocks (using the overload which accepts a pointer to HlmsMacroblock). Is this the right approach?
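A minimal sketch of the listener idea described above. It deliberately does not use the OgreNext headers: Macroblock, Datablock and PassMacroblockSwitcher are hypothetical stand-ins, and the pass identifier 100 is taken from the compositor script in the first post. It only illustrates the pattern of pre-creating the macroblocks once and swapping pointers right before each pass executes.

```cpp
#include <cassert>
#include <cstdint>
#include <utility>
#include <vector>

// Stand-in types for illustration only; these are NOT the OgreNext classes.
struct Macroblock
{
    bool depthCheck = true;
    bool depthWrite = true;
};

struct Datablock
{
    const Macroblock *macroblock = nullptr;
    void setMacroblock( const Macroblock *mb ) { macroblock = mb; }
};

// Identifier of the pass that must render without depth (assumed value).
constexpr uint32_t kNoDepthPassId = 100u;

// Mimics a workspace listener whose passPreExecute hook runs before each pass.
class PassMacroblockSwitcher
{
public:
    explicit PassMacroblockSwitcher( std::vector<Datablock *> datablocks ) :
        mDatablocks( std::move( datablocks ) )
    {
        // Pre-create the override once so the per-pass work is a pointer swap.
        mNoDepth.depthCheck = false;
        mNoDepth.depthWrite = false;
    }

    // Datablocks keep pointers into this object, so forbid copying it.
    PassMacroblockSwitcher( const PassMacroblockSwitcher & ) = delete;
    PassMacroblockSwitcher &operator=( const PassMacroblockSwitcher & ) = delete;

    void passPreExecute( uint32_t passIdentifier )
    {
        const Macroblock *wanted =
            ( passIdentifier == kNoDepthPassId ) ? &mNoDepth : &mDefault;
        for( Datablock *datablock : mDatablocks )
            datablock->setMacroblock( wanted );
    }

private:
    std::vector<Datablock *> mDatablocks;
    Macroblock mDefault;
    Macroblock mNoDepth;
};
```

Note that the cost of this pattern grows with the number of datablocks touched per pass, which is the overhead the later replies try to avoid.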

jwwalker
Goblin
Posts: 267
Joined: Thu Aug 12, 2021 10:06 pm
Location: San Diego, CA, USA
x 19

Re: Set custom HlmsMacroblock depending on compositor pass

Post by jwwalker »

I hope you get a good answer, because I have a similar problem with wanting to vary blend blocks according to the pass.

dark_sylinc
OGRE Team Member
Posts: 5476
Joined: Sat Jul 21, 2007 4:55 pm
Location: Buenos Aires, Argentina
x 1358

Re: Set custom HlmsMacroblock depending on compositor pass

Post by dark_sylinc »

bishopnator wrote: Wed Mar 12, 2025 10:26 pm

Looking more and more at the implementation of Ogre::HlmsDatablock, I am getting a strange feeling. Why does a datablock hold just 2 macroblocks?

Because the macroblocks become part of the PSO (a huge monolithic block) and having many macroblocks translates to having many PSOs, which translates to many Renderable::mHlmsHash values.

OgreNext was designed to handle many thousands (possibly hundreds of thousands) of Items in a scene, so that kind of flexibility would go against that goal.

Nonetheless, onto your actual problem:

Indeed your problem is a common one. "I want to disable depth writes for all objects at once". This is typical for depth pre-pass. But the key here is "all objects". That changes things a lot. Because that can be done efficiently now.

OgreNext supports this via Hlms::applyStrongMacroblockRules. It creates a hidden macroblock with customized settings that override the original macroblock. The word "strong" here alludes to the Hlms keeping a strong reference, owned by the PSO (typically, the PSO does not own a macroblock. It has a weak ref. If a macroblock is destroyed, then all PSOs that used this macroblock need to be destroyed. But "strong macroblocks" are macroblocks that were created by the PSO).

The current supported flags are:

Code: Select all

enum StrongMacroblockBits
{
	// clang-format off
	ForceDisableDepthWrites     = 1u << 0u,
	InvertVertexWinding         = 1u << 1u,
	NoDepthBuffer               = 1u << 2u,
	ForceDepthClamp             = 1u << 3u,
	ForceCullNone               = 1u << 4u,
	// clang-format on
};

They should be self explanatory, but in case there's a doubt you can check what Hlms::applyStrongMacroblockRules does.
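As a rough illustration of how such flags could turn one macroblock into an overridden copy, here is a self-contained sketch. The flag names and values are copied from the enum above, but the Macroblock struct, the CullMode enum, applyBits and needsStrongMacroblock are hypothetical stand-ins, and the effect assigned to each flag is my reading of its name, not the actual body of Hlms::applyStrongMacroblockRules.

```cpp
#include <cassert>
#include <cstdint>

// Flag values copied from the enum quoted above.
enum StrongMacroblockBits : uint8_t
{
    ForceDisableDepthWrites = 1u << 0u,
    InvertVertexWinding     = 1u << 1u,
    NoDepthBuffer           = 1u << 2u,
    ForceDepthClamp         = 1u << 3u,
    ForceCullNone           = 1u << 4u,
};

// Simplified stand-ins; the real HlmsMacroblock has more members.
enum class CullMode { None, Clockwise, Anticlockwise };

struct Macroblock
{
    bool     depthCheck = true;
    bool     depthWrite = true;
    bool     depthClamp = false;
    CullMode cullMode = CullMode::Clockwise;

    bool operator==( const Macroblock &o ) const
    {
        return depthCheck == o.depthCheck && depthWrite == o.depthWrite &&
               depthClamp == o.depthClamp && cullMode == o.cullMode;
    }
};

// Returns a modified copy; the input macroblock is left untouched.
// The meaning given to each flag here is an assumption based on its name.
Macroblock applyBits( Macroblock mb, uint8_t bits )
{
    if( bits & ForceDisableDepthWrites )
        mb.depthWrite = false;
    if( bits & NoDepthBuffer )
    {
        mb.depthCheck = false;
        mb.depthWrite = false;
    }
    if( bits & ForceDepthClamp )
        mb.depthClamp = true;
    if( bits & InvertVertexWinding )
    {
        if( mb.cullMode == CullMode::Clockwise )
            mb.cullMode = CullMode::Anticlockwise;
        else if( mb.cullMode == CullMode::Anticlockwise )
            mb.cullMode = CullMode::Clockwise;
    }
    if( bits & ForceCullNone )
        mb.cullMode = CullMode::None;
    return mb;
}

// A new (strong) macroblock is only needed if the flags change something.
bool needsStrongMacroblock( const Macroblock &original, uint8_t bits )
{
    return !( applyBits( original, bits ) == original );
}
```

The key property is that the same combination of bits always produces the same result for the same input macroblock, which is what makes the result safe to cache.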

Looking at the C++ code, if you fully override Hlms::preparePassHash you can set the flags yourself:

Code: Select all

PassCache passCache;
passCache.passPso = getPassPsoForScene( sceneManager, bForceCullNone );
passCache.passPso.strongMacroblockBits = whatever;
passCache.properties = mT[kNoTid].setProperties;

However, if you're relying on preparePassHashBase, you have no way to override those bits (also see Hlms::getPassPsoForScene). PRs are accepted to change that :D

If there is an override that we currently don't provide and you think would be useful, a PR to add a flag is also welcomed. I'm not sure about adding full flexibility though (e.g. relying on a listener to change the macroblocks).

Re: Set custom HlmsMacroblock depending on compositor pass

Post by bishopnator »

I think the current support seems sufficient for now, but there are surely some ways to improve it. In my implementation I rely on Hlms::preparePassHashBase, and I think that in this case just making getPassPsoForScene virtual would add more flexibility. I don't see a reason why it cannot be virtual. From a design point of view, however, I don't like it when a virtual method in the base class calls another virtual method: an implementation can override the top-level virtual method completely, and then the other virtual method won't be called at all. Let's ignore this detail; I still think it would be great to make getPassPsoForScene virtual.

A. The next problem is with mPassCache: the passCache and its passPso must know about the possible update of the macroblock (at the moment indicated by strongMacroblockBits), so that the value in the PassCacheVec is properly recognized or a new value is added to the container. This is actually the only place where it is necessary to distinguish between different kinds of macroblock overrides.

B. Later, applyStrongMacroblockRules is called, where the update is actually done. As the outcome of the update, only 2 states are needed: the macroblock was updated (strong ref) or not (weak ref). The actual changes are not relevant anymore; an update is indicated by any non-zero value of strongMacroblockBits.

C. At the end, the strong-ref macroblocks are released in clearShaderCache.

Possible improvements:
Regarding A: Remove HlmsPassPso::strongMacroblockBits completely. The corresponding additional properties will be set in getPassPsoForScene. They won't be read or used; they exist only so that the mPassCache search sees a difference and adds a new value to the container when there are overrides. Making getPassPsoForScene virtual allows classes derived from Hlms to set more properties that indicate possible overrides of the macroblock/blendblock.

Regarding B: Hlms will get 2 new virtual methods, updatePsoMacroblock and updatePsoBlendblock. applyStrongMacroblockRules (to be renamed) will also accept a SceneManager pointer (it can be passed from RenderQueue through Hlms::getMaterial at all call sites, since RenderQueue has access to it via its member RenderQueue::mSceneManager). The method will copy the macroblock locally and call updatePsoMacroblock, and likewise for the blendblock. The base-class Hlms::updatePsoMacroblock can check mRenderSystem and sceneManager and make exactly the same updates as it makes now. Additionally, derived classes can update the macroblock/blendblock on their own (according to the current compositor pass, state, render system, etc.). After the virtual calls, applyStrongMacroblockRules will compare whether the macroblock/blendblock is still the same or whether new instances must be acquired (and strong refs made). HlmsPassPso will have a new member uint8 strongBlockRefs (or just a renamed strongMacroblockBits, but with a different meaning). At the moment only 2 bits will be used (strongMacroblockRef = 1 and strongBlendblockRef = 2).

Regarding C: The functionality will remain in clearShaderCache, but it will check both the macroblock and the blendblock using the bits in HlmsPassPso::strongBlockRefs.
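The proposal in point B can be sketched as follows. This is a standalone mock, not OgreNext code: HlmsSketch, MyHlms, Pso and Macroblock are hypothetical stand-ins, and the method names updatePsoMacroblock and strongBlockRefs are the ones suggested in this post, not existing API. The pass identifier is passed in directly here, whereas the post proposes handing over a SceneManager pointer.

```cpp
#include <cassert>
#include <cstdint>

// Simplified stand-ins; the real OgreNext types carry many more fields.
struct Macroblock
{
    bool depthCheck = true;
    bool depthWrite = true;

    bool operator==( const Macroblock &o ) const
    {
        return depthCheck == o.depthCheck && depthWrite == o.depthWrite;
    }
};

// Bit values as proposed in the post.
enum StrongBlockRefs : uint8_t
{
    StrongMacroblockRef = 1u,
    StrongBlendblockRef = 2u,  // not exercised in this sketch
};

struct Pso
{
    Macroblock macroblock;
    uint8_t    strongBlockRefs = 0u;
};

class HlmsSketch
{
public:
    virtual ~HlmsSketch() = default;

    // Non-virtual driver: copy, let the hook edit the copy, then compare.
    void applyStrongBlockRules( Pso &pso, uint32_t passIdentifier )
    {
        Macroblock candidate = pso.macroblock;
        updatePsoMacroblock( candidate, passIdentifier );
        if( !( candidate == pso.macroblock ) )
        {
            // Something changed, so the PSO now owns its own macroblock.
            pso.macroblock = candidate;
            pso.strongBlockRefs |= StrongMacroblockRef;
        }
    }

protected:
    // Base implementation changes nothing; derived classes may override.
    virtual void updatePsoMacroblock( Macroblock &, uint32_t ) {}
};

class MyHlms : public HlmsSketch
{
protected:
    void updatePsoMacroblock( Macroblock &mb, uint32_t passIdentifier ) override
    {
        if( passIdentifier == 100u )  // "1st pass" in the compositor script
        {
            mb.depthCheck = false;
            mb.depthWrite = false;
        }
    }
};
```

Only the "changed or not" outcome is stored on the PSO; what the change was is no longer recorded at this point.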

It seems that comparing 2 macroblocks or 2 blendblocks is a very fast operation, as there are only a few members to compare. Also, I think applyStrongMacroblockRules is not called often: only on first initialization of a material, when a new cache entry is created.

What do you think? Should I create a PR with the changes suggested above? Do you see any obvious flaws there?

note: I hope that the RenderSystem and SceneManager state is the same when getPassPsoForScene is called as when applyStrongMacroblockRules is called, because with the above changes both will perform the same checks.

Re: Set custom HlmsMacroblock depending on compositor pass

Post by dark_sylinc »

"There are 2 hard problems in computer science: cache invalidation, naming things, and off-by-1 errors."

And with that, you may guess where I'm going with this :lol:

Do you see any obvious flaws there?

The biggest concern I have is with this:

B. Later, applyStrongMacroblockRules is called, where the update is actually done. As the outcome of the update, only 2 states are needed: the macroblock was updated (strong ref) or not (weak ref). The actual changes are not relevant anymore; an update is indicated by any non-zero value of strongMacroblockBits.

The problem is that whatever applyStrongMacroblockRules does, must be consistent throughout the lifetime of your application (or more strictly, at least until clearShaderCache is called). Actually, it needs to be consistent even through multiple runs (more on that later).

In step A. you create an identifier for the PassCache that tells your applyStrongMacroblockRules replacement what to do with the macroblock. But this identifier must always instruct applyStrongMacroblockRules to do the same thing.

If for example, pso.pass.strongMacroblockBits = 123 means "remove depth writes", then pso.pass.strongMacroblockBits = 123 must always mean "remove depth writes".
If the user messes this up, it won't work as intended.

The current implementation guarantees that the same pso.pass.strongMacroblockBits combination always does the same thing.
That includes what goes into the HlmsDiskCache cache (which is a file on disk). And HlmsDiskCache needs a way to query the user's implementation in case the meaning of strongMacroblockBits has changed (to invalidate the cache).

When I say "if the user messes this up, it won't work as intended"; more specifically once we open up to the user implementing what applyStrongMacroblockRules is supposed to do; then the next thing that will happen is having multiple modules layering on top of each other the customizations they want to apply. And when that happens, they should be in harmony with each other, and the user turning off a component must not change what applyStrongMacroblockRules() does.

It's not that your idea is flawed, but it needs to be extra careful in not messing up caches, and that is hard.
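To make the caching hazard concrete, here is a toy model. ToyPsoCache, Interpreter, interpretToday and interpretTomorrow are all hypothetical names, and the cache is a plain in-memory map rather than the real pass cache or HlmsDiskCache. It only shows that a cache keyed by the flag value alone cannot notice when the code interpreting that value has changed.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <map>
#include <string>

// Maps a flag value to a description of the state it produces. This stands
// in for whatever a user-supplied applyStrongMacroblockRules would do.
using Interpreter = std::function<std::string( uint32_t )>;

// Toy cache keyed only by the flag value. It has no way of knowing whether
// the interpreter used to fill an entry is still the current one.
class ToyPsoCache
{
public:
    const std::string &get( uint32_t bits, const Interpreter &interpret )
    {
        auto it = mEntries.find( bits );
        if( it == mEntries.end() )
            it = mEntries.emplace( bits, interpret( bits ) ).first;
        return it->second;
    }

    void clear() { mEntries.clear(); }

private:
    std::map<uint32_t, std::string> mEntries;
};

// Two versions of the "same" rule set that disagree on what bit 5 means.
std::string interpretToday( uint32_t bits )
{
    return ( bits & ( 1u << 5u ) ) ? "no depth writes" : "unchanged";
}

std::string interpretTomorrow( uint32_t bits )
{
    return ( bits & ( 1u << 5u ) ) ? "wireframe" : "unchanged";
}
```

Filling the cache with interpretToday and then querying the same key with interpretTomorrow still returns "no depth writes"; only clearing the cache produces "wireframe". A disk cache makes this worse because the stale entry survives across runs.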

Re: Set custom HlmsMacroblock depending on compositor pass

Post by dark_sylinc »

On second thought, what if, instead of using HlmsPassPso::strongMacroblockBits as a bitmask, we turn it into a boolean, bool HlmsPassPso::wantsStrongMacroblockBits? Implementations can set this to true.

When set to true, Hlms calls applyStrongMacroblockRules() and listeners rely on getProperty() calls to see how to replace the macroblocks, e.g.

Code: Select all

void MyHlmsListener::preparePassHash(...)
{
	hlms->setProperty( "disable_depth_checks", 1 );
}

// NOTE: Alternatively just set a special property in preparePassHash, like
// hlms->setProperty( "INTERNAL WANTS STRONG MACROBLOCKS", 1 );
bool MyHlmsListener::wantsStrongMacroblocks() const
{
	if(conditions)
		return true;
	return false;
}

bool MyHlmsListener::applyStrongMacroblockRules(...)
{
	bool bNeedsStrongMacroblock = false;
	const bool bDisableDepthChecks = hlms->getProperty( "disable_depth_checks" ) != 0;
	if( bDisableDepthChecks && pso.macroblock->mDepthCheck )
		bNeedsStrongMacroblock = true;

	// TODO: This code does not allow overlaying multiple listeners with their own applyStrongMacroblockRules().

	if( bNeedsStrongMacroblock )
	{
		HlmsMacroblock prepassMacroblock = *pso.macroblock;
		if( bDisableDepthChecks && pso.macroblock->mDepthCheck )
			prepassMacroblock.mDepthCheck = false;

		// mHlmsManager->getMacroblock may be called from different Hlms implementations
		ScopedLock lock( msGlobalMutex );
		pso.macroblock = mHlmsManager->getMacroblock( prepassMacroblock );
	}

	return bNeedsStrongMacroblock;
}

By relying on setProperty() and getProperty(), you get caching for free. This is slower, but it only gets called when creating the entry.

Implementations could optimize. Instead of this:

Code: Select all

	bool bNeedsStrongMacroblock = false;
	const bool bDisableDepthChecks = hlms->getProperty( "disable_depth_checks" ) != 0;
	if( bDisableDepthChecks && pso.macroblock->mDepthCheck )
		bNeedsStrongMacroblock = true;

You can write this:

Code: Select all

	bool bNeedsStrongMacroblock = false;
	const uint32 mask = hlms->getProperty( "MY MASK" );
	if( ( mask & DISABLE_DEPTH_CHECKS ) && pso.macroblock->mDepthCheck )
		bNeedsStrongMacroblock = true;

So that now one getProperty() can hold 32 different bits.
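A self-contained sketch of the packed-mask variant. PropertyStore is a hypothetical stand-in for the Hlms property storage (string keys mapped to 32-bit integers), and the flag names in MyOverrideBits are made up for the example; only the idea of storing several flags in the single "MY MASK" property comes from the post.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>

// Hypothetical flags; up to 32 of them fit in one property value.
enum MyOverrideBits : int32_t
{
    DISABLE_DEPTH_CHECKS = 1 << 0,
    DISABLE_DEPTH_WRITES = 1 << 1,
};

// Minimal stand-in for a property store: name -> integer value.
class PropertyStore
{
public:
    void setProperty( const std::string &key, int32_t value ) { mProps[key] = value; }

    // Unset properties read as 0, so "no flags" is the default.
    int32_t getProperty( const std::string &key ) const
    {
        auto it = mProps.find( key );
        return it == mProps.end() ? 0 : it->second;
    }

private:
    std::map<std::string, int32_t> mProps;
};

// Mirrors the check in the snippet above: an override is only needed when
// the flag is set AND the macroblock currently has depth checks enabled.
bool needsStrongMacroblock( const PropertyStore &props, bool depthCheckEnabled )
{
    const int32_t mask = props.getProperty( "MY MASK" );
    // Explicit parentheses document the intended grouping of & and &&.
    return ( mask & DISABLE_DEPTH_CHECKS ) != 0 && depthCheckEnabled;
}
```

A single lookup replaces one lookup per flag, at the price of the bit assignments becoming part of what a cached entry silently depends on.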

If a cache on disk contains an entry with a property "disable_depth_checks" and now the implementation ignores it or behaves differently, then that entry becomes stale.

An advantage of this design is that you no longer need to override getPassPsoForScene(), because you simply want a listener to set wantsStrongMacroblockBits to true.

Re: Set custom HlmsMacroblock depending on compositor pass

Post by bishopnator »

Yes, but it seems that this approach has the same pros and cons. While setting the properties, the implementation must properly set its own mask (as a property) and set the bool flag. Later, in applyStrongMacroblockRules, the mask must be properly interpreted. You pointed out that in this case there is a problem with stored cached values if the implementation changes. I think that as soon as we give the user a way to set flags for a macroblock override and later let the user interpret those flags, this problem always exists. The user must be aware of this and delete the cache on their own. It is not possible to detect such changes from the Ogre core implementation.

I think that it is the right approach. It must just be clearly stated that such problems can occur, and that in this case the user should delete the cache manually before the app is started (which may happen multiple times, but only during the development phase).

note: I keep thinking about adding the option to also override the blendblock. Is that a good idea, or should I focus only on macroblocks? At the moment I don't need to override any blendblock, but if I am going to extend functionality in Ogre, why not cover blendblocks as well?

Re: Set custom HlmsMacroblock depending on compositor pass

Post by dark_sylinc »

bishopnator wrote: Sat Mar 15, 2025 8:44 pm

I think that it is the right approach. It must just be clearly stated that such problems can occur, and that in this case the user should delete the cache manually before the app is started (which may happen multiple times, but only during the development phase).

Agreed.

bishopnator wrote: Sat Mar 15, 2025 8:44 pm

I think that as soon as we give the user a way to set flags for a macroblock override and later let the user interpret those flags, this problem always exists.

You're correct but the problem is mitigated.

If we have passPso.strongMacroblockBits like we do now; strongMacroblockBits = (1u << 5u) could mean "remove depth writes" today, but could mean "switch polygon mode" tomorrow.

The problem is aggravated if there are multiple components and both want to use bit 5 for a different purpose.

However setProperty( "MySystem remove depth writes", 1 ) is always obvious.
If the user implements it as setProperty( "MySystem", 1 << 5 ) then yes, this problem can arise if the user changes what bit 5 means. We still have to warn the user through documentation.

BUT, if there are multiple components, they don't trample on each other because they will do their own thing, e.g. setProperty( "AnotherSystem remove culling", 1 )
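The point about components not colliding can be sketched like this. Properties, mySystemPreparePass, anotherSystemPreparePass and makeCacheKey are hypothetical names, and building a string key from all properties is only a stand-in for however the pass cache actually compares entries. The property names are the ones used in this post.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>

using Properties = std::map<std::string, int32_t>;

// Each component writes only keys under its own prefix, so neither can
// accidentally redefine what the other one's setting means.
void mySystemPreparePass( Properties &props )
{
    props["MySystem remove depth writes"] = 1;
}

void anotherSystemPreparePass( Properties &props )
{
    props["AnotherSystem remove culling"] = 1;
}

// Hypothetical cache key: a canonical string built from every property.
// std::map iterates in sorted key order, so the result is deterministic.
std::string makeCacheKey( const Properties &props )
{
    std::string key;
    for( const auto &kv : props )
        key += kv.first + "=" + std::to_string( kv.second ) + ";";
    return key;
}
```

Enabling or disabling one component changes the key, so the two configurations land in different cache entries, while the meaning of each individual property stays readable from its name.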

bishopnator wrote: Sat Mar 15, 2025 8:44 pm

note: I keep thinking about adding the option to also override the blendblock. Is that a good idea, or should I focus only on macroblocks? At the moment I don't need to override any blendblock, but if I am going to extend functionality in Ogre, why not cover blendblocks as well?

It looks like we can do it for blendblocks too. jwwalker expressed interest in it.

Re: Set custom HlmsMacroblock depending on compositor pass

Post by bishopnator »

I created PR https://github.com/OGRECave/ogre-next/pull/511. I started from the 3.0 branch as I would like to see it there :-) However, there was immediately a Linux CI error, and I have no idea what it means.