I think the current support seems sufficient for now, but there are certainly ways to improve it. My implementation relies on Hlms::preparePassHashBase. In this case I think simply making getPassPsoForScene virtual would make implementations more flexible, and I don't see a reason why it cannot be virtual. From a design standpoint, however, I don't like it when a virtual method in a base class calls another virtual method: an implementation can override the top-level virtual method completely, and then the other virtual method is never called at all. Ignoring that detail, I still think it would be great to make getPassPsoForScene virtual.
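To illustrate the design concern about a virtual method calling another virtual method, here is a minimal sketch. All names in it (BaseSketch, outerStep, innerHook, HookOnly, ReplacesOuter) are hypothetical and are not part of Ogre; it only shows the general C++ behaviour, not how Hlms is actually structured.

```cpp
#include <string>
#include <vector>

// Hypothetical base class: a virtual "outer" method that calls a virtual hook.
struct BaseSketch
{
    std::vector<std::string> calls;  // records which methods actually ran

    virtual ~BaseSketch() = default;

    virtual void outerStep()
    {
        calls.push_back( "Base::outerStep" );
        innerHook();  // only reached if outerStep itself is not replaced
    }

    virtual void innerHook() { calls.push_back( "Base::innerHook" ); }
};

// Overrides only the hook: the base outerStep still drives the call.
struct HookOnly : BaseSketch
{
    void innerHook() override { calls.push_back( "HookOnly::innerHook" ); }
};

// Overrides the outer method without chaining to the base: the hook is
// silently skipped, even though this class also overrides it.
struct ReplacesOuter : BaseSketch
{
    void outerStep() override { calls.push_back( "ReplacesOuter::outerStep" ); }
    void innerHook() override { calls.push_back( "ReplacesOuter::innerHook" ); }
};
```

Calling outerStep() on a HookOnly object records two calls, while on a ReplacesOuter object it records only one, which is the situation described above.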
A. The next problem is with mPassCache: the pass cache and its passPso must know about a possible update of the macroblock (currently indicated by strongMacroblockBits), so that an existing value in the PassCacheVec is recognized correctly or a new value is added to the container. This is actually the only place where it is necessary to distinguish between the different kinds of macroblock overrides.
B. Later, applyStrongMacroblockRules is called, which is where the update is actually done. As the outcome of the update, only 2 states are needed: the macroblock was updated (strong ref) or not (weak ref). The actual changes are no longer relevant; an update is indicated by any non-zero value of strongMacroblockBits.
C. At the end, the strong ref macroblocks are released in clearShaderCache.
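Point A can be sketched as follows. This is a simplified illustration under my own assumptions, not the real Ogre code: PassPsoSketch, its sampleCount member and findOrAddPass are hypothetical stand-ins, and only strongMacroblockBits is a name taken from the discussion above. The idea is that the override bits take part in the equality check, so two passes that differ only in their overrides get separate cache entries.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical, heavily simplified stand-in for a pass PSO.
struct PassPsoSketch
{
    uint32_t sampleCount = 1u;          // placeholder for the "ordinary" pass state
    uint8_t strongMacroblockBits = 0u;  // which macroblock overrides apply

    bool operator==( const PassPsoSketch &other ) const
    {
        // The override bits are part of the identity of the cache entry.
        return sampleCount == other.sampleCount &&
               strongMacroblockBits == other.strongMacroblockBits;
    }
};

// Returns the index of the matching entry, adding a new one if none matches.
std::size_t findOrAddPass( std::vector<PassPsoSketch> &cache, const PassPsoSketch &pso )
{
    for( std::size_t i = 0; i < cache.size(); ++i )
    {
        if( cache[i] == pso )
            return i;
    }
    cache.push_back( pso );
    return cache.size() - 1u;
}
```

A linear search is used here only to keep the sketch short; it says nothing about how the real container is searched.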
Possible improvements:
Regarding A: Remove HlmsPassPso::strongMacroblockBits completely; the corresponding additional properties will be set in getPassPsoForScene instead. They won't be read or used. They exist only to keep the mPassCache search "happy", so that a new value is added to the container when there are overrides. Making getPassPsoForScene virtual allows classes derived from Hlms to set further properties that indicate possible overrides of the macroblock/blendblock.
Regarding B: Hlms will get 2 new virtual methods, updatePsoMacroblock and updatePsoBlendblock. applyStrongMacroblockRules (to be renamed) will also accept a SceneManager pointer. It can be passed from RenderQueue through Hlms::getMaterial at all call sites, since RenderQueue has access to it (member RenderQueue::mSceneManager). The method will copy the macroblock locally and call updatePsoMacroblock, and do the same for the blendblock. The base-class Hlms::updatePsoMacroblock can check mRenderSystem and the sceneManager and make exactly the same updates as are made now. Additionally, this allows derived classes to update the macroblock/blendblock on their own (according to the current compositor pass, state, render system etc.). After the virtual calls, applyStrongMacroblockRules will compare whether the macroblock/blendblock is still the same or whether new instances must be obtained (and strong refs made). HlmsPassPso will get a new member, uint8 strongBlockRefs (or just a renamed strongMacroblockBits with a different meaning). For now only 2 bits will be used (strongMacroblockRef = 1 and strongBlendblockRef = 2).
Regarding C: The functionality will remain in clearShaderCache, but it will check for the blendblock as well as the macroblock, using the bits in HlmsPassPso::strongBlockRefs.
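The proposal for B could look roughly like the sketch below. It is only an illustration under stated assumptions: MacroblockSketch, BlendblockSketch, SceneStateSketch, HlmsSketch, CustomHlmsSketch and applyStrongBlockRules are hypothetical, simplified stand-ins and not the real Ogre classes or signatures. Only updatePsoMacroblock, updatePsoBlendblock and the two bit values come from the text above.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical, simplified stand-ins; the real blocks have more members.
struct MacroblockSketch
{
    bool depthWrite = true;
    bool depthClamp = false;
    bool operator==( const MacroblockSketch &o ) const
    {
        return depthWrite == o.depthWrite && depthClamp == o.depthClamp;
    }
};

struct BlendblockSketch
{
    bool alphaToCoverage = false;
    bool operator==( const BlendblockSketch &o ) const
    {
        return alphaToCoverage == o.alphaToCoverage;
    }
};

// Stand-in for whatever state SceneManager / RenderSystem would provide.
struct SceneStateSketch
{
    bool forceDepthClamp = false;
};

enum StrongBlockRefs : uint8_t
{
    strongMacroblockRef = 1u,
    strongBlendblockRef = 2u
};

class HlmsSketch
{
public:
    virtual ~HlmsSketch() = default;

    // Non-virtual driver: copy the blocks, let the virtual hooks modify the
    // copies, then compare. Returns the bits for the blocks that changed.
    uint8_t applyStrongBlockRules( MacroblockSketch &macro, BlendblockSketch &blend,
                                   const SceneStateSketch *scene ) const
    {
        assert( scene && "a scene state is required" );
        MacroblockSketch newMacro = macro;
        BlendblockSketch newBlend = blend;
        updatePsoMacroblock( newMacro, *scene );
        updatePsoBlendblock( newBlend, *scene );

        uint8_t refs = 0u;
        if( !( newMacro == macro ) )
        {
            macro = newMacro;  // real code would obtain a new instance here
            refs |= strongMacroblockRef;
        }
        if( !( newBlend == blend ) )
        {
            blend = newBlend;
            refs |= strongBlendblockRef;
        }
        return refs;
    }

protected:
    // Base behaviour: an invented example rule driven by the scene state.
    virtual void updatePsoMacroblock( MacroblockSketch &macro,
                                      const SceneStateSketch &scene ) const
    {
        if( scene.forceDepthClamp )
            macro.depthClamp = true;
    }

    // Base behaviour: leave the blendblock untouched.
    virtual void updatePsoBlendblock( BlendblockSketch &, const SceneStateSketch & ) const {}
};

// A derived class adds its own blendblock override.
class CustomHlmsSketch : public HlmsSketch
{
protected:
    void updatePsoBlendblock( BlendblockSketch &blend,
                              const SceneStateSketch & ) const override
    {
        blend.alphaToCoverage = true;
    }
};
```

Keeping the driver non-virtual and making only the two hooks virtual also avoids the "virtual calls virtual" concern: a derived class cannot accidentally skip the comparison step. For C, clearShaderCache would then only need to test the returned bits to decide which blocks to release.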
It seems that comparing 2 macroblocks or 2 blendblocks is a very fast operation, since there are only a few members to compare. I also think that applyStrongMacroblockRules is not called very often: only on the first initialization of a material, when a new cache entry is created.
What do you think? Should I create a PR with the changes suggested above? Do you see any obvious flaws?
Note: I hope that the RenderSystem and SceneManager state is the same when getPassPsoForScene is called as when applyStrongMacroblockRules is called, because with the above changes both will perform the same checks.