Hi!
There is a lot of duplicated code between HlmsUnlit and HlmsPbs, and even between Hlms and HlmsUnlit.
The fundamental issue is that the requirements keep shifting and changing.
And in many cases sharing makes things worse: Unlit is supposed to be lightweight (as in simple), aside from not supporting lights.
Once the code begins to share more, you'll hit the issue that the shared code starts to pull in heavyweight code that is only meant for PBS.
If the advanced features of PBS are needed, one option is to use PBS's emissive element with no diffuse, no specular and no ambient; this results in the same look but with all the other options still available.
From the implementation side, it looks to me like HlmsUnlit came first and was then somehow extracted into Hlms. At least HlmsUnlit::preparePassHash contains handling that is also done by Hlms::preparePassHashBase, such as the handling of items in mPassCache (including the assertion that there are too many items).
It actually happened the other way around: HlmsPbs came first and then Unlit was a stripped down version.
As for the assert, if you mean this one:
Code: Select all
assert( mPassCache.size() <= (size_t)HlmsBits::PassMask &&
"Too many passes combinations, we'll overflow the bits assigned in the hash!" );
This is because the Hlms final hash is a 32-bit value, and we encode 8 bits for passes and 21 bits for renderables. The goal is to have O(1) lookups without having to implement or use a hash table.
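To illustrate the idea, here is a hypothetical sketch (not the actual Ogre code): the bit widths of 8 pass bits and 21 renderable bits come from the paragraph above; the remaining 3 bits for the Hlms type, the field order, and the names are my assumption.

Code: Select all

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical layout of the 32-bit final hash:
// [ 3 type bits | 8 pass bits | 21 renderable bits ]
// Widths match the post; the exact order is an illustrative assumption.
constexpr uint32_t kRenderableBits = 21u;
constexpr uint32_t kPassBits = 8u;

constexpr uint32_t makeHash( uint32_t type, uint32_t passIdx, uint32_t renderableIdx )
{
    // Mirrors the assert in preparePassHashBase: the pass index must fit
    // in its assigned bits, otherwise the hash overflows.
    assert( passIdx < ( 1u << kPassBits ) &&
            "Too many passes combinations, we'll overflow the bits assigned in the hash!" );
    assert( renderableIdx < ( 1u << kRenderableBits ) );
    return ( type << ( kPassBits + kRenderableBits ) ) |
           ( passIdx << kRenderableBits ) | renderableIdx;
}

constexpr uint32_t passIdxOf( uint32_t hash )
{
    // Extracting the pass index back is a shift and a mask: O(1), no hash table.
    return ( hash >> kRenderableBits ) & ( ( 1u << kPassBits ) - 1u );
}
```

The indices can then be used directly as array offsets into the caches, which is where the O(1) lookup comes from.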
There are some (probably only a few) identical properties defined in both HlmsUnlit and HlmsPbs - why aren't they part of HlmsBaseProp? Like "exponential_shadow_maps". I didn't compare them all, but when I tried to use the shader piece DoShadowCastPS, my CustomHlms generated no code due to the missing exponential_shadow_maps property.
Unlit was originally written to have no shadows. But the issue arose that it was useful for Unlit to cast shadows (but not receive them).
Exponential shadows introduced the problem that the caster must be aware of this setting, thus Unlit had to clone certain (but not all) functionality from PBS.
Exponential shadows are, at their origin, a PBS feature. We could rethink the issue and probably refactor it into its own component that can be plugged into HlmsPbs and HlmsUnlit (and any other Hlms implementation). But there is not enough interest in exponential shadows to begin with to allocate resources for such a refactor.
I noticed from debugging that filling pass buffers in the preparePassHash implementation is also not a very good idea. In my example code I registered HlmsPbs, HlmsUnlit and my CustomHlms; when the items in the scene use only CustomHlms, HlmsPbs and HlmsUnlit still create and fill pass buffers, even though their fillBuffersFor is never called. I think a better approach is to only accumulate the data in a member variable (as is done with HlmsUnlit::mPreparedPass), but not fill the GPU buffers yet. The buffers can then be filled in fillBuffersFor when an Hlms type change is detected - that is where the pass buffers are bound. It is of course also better to guard the filling of the pass buffers to ensure they are filled only once.
It is certainly a minor(*) problem that could be fixed, but it is not easy to fix:
- Several of the operations in preparePassHash can NOT be done in fillBuffersFor: Vulkan & Metal require buffer copy operations to not happen while a "render pass" is open. Once RenderSystem::executeRenderPassDescriptorDelayedActions is called, those operations can no longer be issued until RenderSystem::endRenderPassDescriptor is called.
- For the data that can be uploaded in fillBuffersFor, copying the data to a CPU buffer and then uploading on demand is pointless, because you'd be doubling bandwidth usage for no reason (read data from various sources, write into a CPU buffer, then read from that CPU buffer and write into the GPU buffer).
- You could split pass data generation from the upload tasks that can be done in fillBuffersFor. But you'd quickly end up with a lot of duplicated work, like evaluating the size of mapSize.
(*) It's "minor" because a normal rendering pipeline usually has fewer than 10 passes. At a maximum of < 64kb each (typically around 1kb) and 3 Hlms implementations, that's around 10 x 64 x 3 ≈ 1.88MB of data per frame at most, or 112MB/s at 60Hz.
Typically around 1.75MB/s if we assume 1kb instead of 64kb.
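As a sanity check, the arithmetic behind those figures (the constants come from the paragraph above; the helper names are just for illustration):

Code: Select all

```cpp
#include <cassert>

// Per-frame pass data: passes x Hlms implementations x per-pass buffer size.
constexpr double kbPerFrame( int passes, int hlmsImpls, int kbPerPass )
{
    return double( passes * hlmsImpls * kbPerPass );
}

// Sustained bandwidth at a given refresh rate, in MB/s (1 MB = 1024 KB).
constexpr double mbPerSecond( double kbFrame, int hz )
{
    return kbFrame * hz / 1024.0;
}

// Worst case: 10 passes, 3 Hlms, 64kb each -> 1920 KB/frame ~ 1.88 MB/frame,
// which is 112.5 MB/s at 60Hz. Typical case with 1kb buffers: 30 KB/frame,
// roughly 1.76 MB/s.
```

Either way it is small compared to the bandwidth a GPU moves per frame, which is why the redundant uploads qualify as "minor".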
What you're proposing is to evaluate, in fillBuffersFor, whether we have already uploaded the pass data. That function gets called per object, per pass, so this is almost certainly a net loss. Even with a branch misprediction rate of just 1%, at 100k objects that'd be the same as calling preparePassHash 1000 times per pass per frame.
What's counterintuitive about modern computer performance is that it's sometimes (but not always) more efficient to do something 10 times and discard it because it was never needed after all, than to check whether we ever need to do it... but perform that check 100,000 times.
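The back-of-the-envelope version of that trade-off, using the numbers above (the helper is hypothetical, purely for illustration):

Code: Select all

```cpp
#include <cassert>

// 100,000 per-object checks at a 1% misprediction rate cost roughly as much
// as 1,000 "real" executions per pass per frame: the guard itself can
// outweigh the redundant preparePassHash work it was meant to avoid.
constexpr int wastedEquivalent( int objectCount, double mispredictRate )
{
    return int( objectCount * mispredictRate );
}
```

Compare that against roughly 10 redundant preparePassHash calls per frame in the current scheme, and the per-object guard loses by two orders of magnitude.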
So far the only solution I've found to this problem is to have the CompositorPass call only the preparePassHash implementations it actually needs.
CompositorPassScene does not support this (I'm worried that people could disable it and then accidentally rely on it, which could cause more unnecessary support tickets), but custom passes can do it. For example, Colibri only calls HlmsColibri::preparePassHash (HlmsColibri derives from HlmsUnlit) in the compositor pass used exclusively to draw the GUI. This happens inside m_colibriManager->prepareRenderCommands.
And another note about shadow casters - I think the support should be verified by the Hlms itself: if an Item has its cast-shadows flag set but the Hlms doesn't support shadow casters, the Item shouldn't be rendered into the shadow maps.
Hlms implementations have no say on what gets rendered (at least with the default RenderQueue implementation).
They just say how to render it.
The best they can do if they want something gone is to replace the vertex shader with a dummy one, or fill the WVP matrix with zeroes so that it creates degenerate triangles and nothing gets drawn on screen. However, if your goal is to save performance, the GPU still has to process it.
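A minimal standalone sketch (not Ogre code) of why a zeroed WVP matrix draws nothing: every vertex collapses onto the same point, so the triangle's screen-space area is zero and the rasterizer discards it - yet each vertex still had to be transformed, which is why no performance is saved.

Code: Select all

```cpp
#include <array>
#include <cassert>

using Vec4 = std::array<float, 4>;
using Mat4 = std::array<std::array<float, 4>, 4>;

// Standard 4x4 matrix * column vector multiply, as a vertex shader would do.
Vec4 transform( const Mat4 &m, const Vec4 &v )
{
    Vec4 out{ 0.f, 0.f, 0.f, 0.f };
    for( int r = 0; r < 4; ++r )
        for( int c = 0; c < 4; ++c )
            out[r] += m[r][c] * v[c];
    return out;
}

// Twice the signed screen-space area of triangle (a, b, c), via the 2D cross
// product. Zero area means the rasterizer produces no fragments.
float twiceArea( const Vec4 &a, const Vec4 &b, const Vec4 &c )
{
    return ( b[0] - a[0] ) * ( c[1] - a[1] ) -
           ( c[0] - a[0] ) * ( b[1] - a[1] );
}
```

With an all-zero matrix every transform returns the origin, so twiceArea is zero for any input triangle: degenerate geometry, but fully processed vertices.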
Either that, or assert/log when they encounter a shadow-casting object that shouldn't be casting shadows, to inform the user that the object is incorrectly set up.
The analysis/assertion can happen either early in calculateHashForPreCaster (which is called once when a datablock is assigned to a renderable, but will miss the case where the setting is enabled later) or late in fillBuffersFor (which is called every time the object is rendered).
Other ways to avoid this issue are to use Render Queue IDs or visibility masks in the shadow node passes.
"What" gets rendered and in which order is the Compositor's job.
Cheers