OK, I remembered a bit more about that part:
The Hlms is given two buffers by HlmsBufferManager: the const buffer (where worldMaterialIdx lives), which can't be bigger than 64kb, and a large tex buffer (worldMatBuf), which can hold a lot more data. worldMatBuf is used for matrix data, but it is meant to hold anything generic that is potentially big or of arbitrary length.
You could have more buffers, but you would have to manage them the way HlmsBufferManager does for you.
Search the source for the following keywords to see how it works:
- worldMatBuf
- mStartMappedTexBuffer
- mCurrentMappedTexBuffer
- mCurrentTexBufferSize
- HlmsBufferManager::mapNextTexBuffer
- HlmsBufferManager::unmapTexBuffer
Rather than creating a new buffer, you could store your data in worldMatBuf. That's what it is for.
For non-animated objects, we assumed a very simple scheme: worldMatBuf stores only two world matrices per draw (one during a shadow casting pass), so each object's matrices can be addressed with:
```glsl
UNPACK_MAT3x4( worldMatBuf, drawId << 1u );
```
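As a minimal sketch of that assumption (hypothetical helper names, not Ogre API), and assuming every matrix occupies 16 floats, i.e. 4 float4s, in worldMatBuf, the location of a non-animated object's first matrix follows from drawId alone:

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helpers; not part of Ogre. Assumes each matrix occupies
// 16 floats (4 float4s) in worldMatBuf, as in the simplified scheme above.
constexpr uint32_t kFloat4sPerMatrix = 4u;

// Matrix index the shader derives from drawId alone:
// two matrices per draw normally, one during a shadow caster pass.
inline uint32_t assumedMatrixIndex( uint32_t drawId, bool casterPass )
{
    return casterPass ? drawId : ( drawId << 1u );
}

// Same location expressed in float4 units (the unit bufferFetch reads in).
inline uint32_t assumedFloat4Offset( uint32_t drawId, bool casterPass )
{
    return assumedMatrixIndex( drawId, casterPass ) * kFloat4sPerMatrix;
}
```

For example, drawId 3 in a normal pass starts at matrix 6, float4 24. Nothing per-draw has to be stored to find it, which is what makes this path cheap.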
For skeletal objects, we drop that assumption and add an indirection. Because a submesh may be affected by 3 bones (3 matrices), 9 bones (9 matrices), or any other number of bones, the addressing becomes variable. Thus we need the indirection:
```glsl
//matStart: offset (in float4s) stored in the upper 23 bits of worldMaterialIdx
uint matStart = instance.worldMaterialIdx[drawId].x >> 9u;
vec4 worldMat[3];
//_idx: offset of the bone being read, in float4s, relative to matStart
worldMat[0] = bufferFetch( worldMatBuf, int(matStart + _idx + 0u) );
worldMat[1] = bufferFetch( worldMatBuf, int(matStart + _idx + 1u) );
worldMat[2] = bufferFetch( worldMatBuf, int(matStart + _idx + 2u) );
```
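To see what the indirection reads, it can be emulated on the CPU. In this sketch worldMatBuf is a std::vector of float4 texels, bufferFetch is an indexed read, matStart is decoded from the packed worldMaterialIdx value, and idx plays the role of _idx (the bone's offset from matStart, in float4s). All of these types and helpers are hypothetical stand-ins:

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical CPU model of the tex buffer: one element per float4 texel.
using Float4 = std::array<float, 4>;
using TexBuffer = std::vector<Float4>;

// Stand-in for the shader's bufferFetch(): read one float4 texel.
inline Float4 bufferFetch( const TexBuffer &buf, int idx )
{
    assert( idx >= 0 && static_cast<size_t>( idx ) < buf.size() );
    return buf[static_cast<size_t>( idx )];
}

// Reads the three rows of a 3x4 matrix through the indirection:
// matStart is decoded from worldMaterialIdx (upper bits), idx is the
// bone's offset from matStart in float4 units.
inline std::array<Float4, 3> fetchBoneRows( const TexBuffer &worldMatBuf,
                                            uint32_t worldMaterialIdxX,
                                            uint32_t idx )
{
    const uint32_t matStart = worldMaterialIdxX >> 9u;
    std::array<Float4, 3> rows;
    for( uint32_t r = 0; r < 3u; ++r )
        rows[r] = bufferFetch( worldMatBuf, static_cast<int>( matStart + idx + r ) );
    return rows;
}
```

For a packed value of (10 << 9) | 5 and idx = 3, the three rows come from texels 13, 14 and 15, wherever the object's data happens to start.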
Quick note: matStart is computed on the C++ side, when we do:
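```cpp
//uint worldMaterialIdx[]
//Distance (in floats) from the start of the tex buffer to this object's data
size_t distToWorldMatStart = mCurrentMappedTexBuffer - mStartMappedTexBuffer;
distToWorldMatStart >>= 2; //floats -> float4s (the unit the shader fetches in)
//Upper 23 bits: matStart; lower 9 bits: the material slot
*currentMappedConstBuffer = (distToWorldMatStart << 9 ) |
                            (datablock->getAssignedSlot() & 0x1FF);
```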
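That is, we take the distance (in floats) between the start of the tex buffer and the current write position, convert it to float4 units, and send it to the shader packed in the upper bits of worldMaterialIdx, next to the material slot.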
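A round trip of this packing, as a sketch with hypothetical helper names; the 23-bit limit on the offset follows from shifting a 32-bit value left by 9:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical helpers mirroring the packing shown above:
// upper 23 bits = offset into worldMatBuf in float4 units,
// lower 9 bits  = the datablock's assigned slot (0..511).
inline uint32_t packWorldMaterialIdx( size_t floatsFromTexStart, uint32_t materialSlot )
{
    const size_t float4Offset = floatsFromTexStart >> 2; // floats -> float4s
    assert( float4Offset < ( 1u << 23 ) );               // must fit in 23 bits
    return static_cast<uint32_t>( float4Offset << 9 ) | ( materialSlot & 0x1FFu );
}

// What the shader does: instance.worldMaterialIdx[drawId].x >> 9u
inline uint32_t unpackMatStart( uint32_t packed ) { return packed >> 9u; }
inline uint32_t unpackMaterialSlot( uint32_t packed ) { return packed & 0x1FFu; }
```

An object whose data starts 144 floats into the tex buffer gets matStart = 36.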
End of quick note
This leaves us with a problem: if I render a SubItem with skeletal animation (e.g. 9 bones) and then render a non-animated object, the assumption breaks for the second object. The first object (the skeletally animated one) will render fine, but the next one will not, because:
```glsl
drawId = 1; //Second element, remember drawId starts at 0.
UNPACK_MAT3x4( worldMatBuf, drawId << 1u );
```
will give us the 3rd and 4th bones of the previously rendered item, not our matrices.
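The collision can be checked in a few lines, again as a sketch under the same simplified model (non-caster pass, every matrix taking 16 floats, and the animated object's matrices written from slot 0); the helper is hypothetical:

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical model, non-caster pass, every matrix taking one 16-float slot.
// An animated object wrote `boneCount` matrices starting at matrix slot 0.
// Does the next object's assumed location (drawId << 1) land inside them?
inline bool assumedSlotOverlapsBones( uint32_t drawId, uint32_t boneCount )
{
    const uint32_t firstAssumedMatrix = drawId << 1u;
    return firstAssumedMatrix < boneCount;
}
```

With 9 bones, drawIds 1 through 4 all land inside the bone data (drawId 4 starts at matrix 8); drawId 5 (matrix 10) is the first that doesn't, which is why the padding described next rounds 9 matrices up to 10.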
To solve that, HlmsPbs does three things:
1. We wrote 9 matrices to mCurrentMappedTexBuffer, but during non-shadow-casting passes, non-animated objects assume there are two matrices per draw. In other words, we need to add some padding and pretend we wrote 10 matrices instead of 9. We do that here:
```cpp
//If the next entity will not be skeletally animated, we'll need
//currentMappedTexBuffer to be 16/32-byte aligned.
//Non-skeletally animated objects are far more common than skeletal ones,
//so we do this here instead of doing it before rendering the non-skeletal ones.
size_t currentConstOffset = (size_t)(currentMappedTexBuffer - mStartMappedTexBuffer);
currentConstOffset = alignToNextMultiple( currentConstOffset, 16 + 16 * !casterPass );
currentConstOffset = std::min( currentConstOffset, mCurrentTexBufferSize );
currentMappedTexBuffer = mStartMappedTexBuffer + currentConstOffset;
```
In other words, each animated object corrects the pointers so the assumptions hold for the next draw.
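Using the numbers from the example (9 matrices of 16 floats each, i.e. 144 floats), the padding step can be sketched as follows. alignUp and padAfterAnimatedObject are hypothetical stand-ins, and sizes are assumed to be in floats, matching the std::min against the float offset above:

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>

// Round `offset` up to the next multiple of `alignment` (hypothetical
// stand-in for the alignToNextMultiple call above).
inline size_t alignUp( size_t offset, size_t alignment )
{
    assert( alignment > 0u );
    return ( ( offset + alignment - 1u ) / alignment ) * alignment;
}

// Padding step run after writing an animated object's data.
// Offsets are in floats; a non-animated draw uses 32 floats in normal
// passes (two 16-float matrices) and 16 floats in shadow caster passes.
inline size_t padAfterAnimatedObject( size_t floatsWritten, bool casterPass,
                                      size_t texBufferSizeInFloats )
{
    const size_t aligned = alignUp( floatsWritten, 16u + 16u * !casterPass );
    // Never point past the end of the buffer.
    return std::min( aligned, texBufferSizeInFloats );
}
```

With 144 floats written in a normal pass the result is 160 floats: 10 matrices' worth of space, i.e. the 9 + 1 matrices from step 1. In a caster pass 144 is already a multiple of 16, so nothing is added.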
2. We need to cheat on the draw ID. We only drew one object, with 9 matrices + 1 matrix of padding, but it takes the same space as 5 non-animated objects. We recalculate the draw ID via pointer arithmetic: we take the current location in the texture buffer (currentMappedTexBuffer - mStartMappedTexBuffer), convert it into a const buffer offset (currentConstOffset), and add it to the start of the const buffer (currentConstOffset + mStartMappedConstBuffer). We do that here:
```cpp
if( !hasSkeletonAnimation )
{
    //We need to correct currentMappedConstBuffer to point to the right texture buffer's
    //offset, which may not be in sync if the previous draw had skeletal animation.
    const size_t currentConstOffset = (currentMappedTexBuffer - mStartMappedTexBuffer) >>
                                        (2 + !casterPass);
    currentMappedConstBuffer = currentConstOffset + mStartMappedConstBuffer;
    //... (rest of the non-skeletal path)
}
```
We can't use mCurrentMappedConstBuffer directly, because it still acts as if this is the second object being rendered. So we have to calculate currentMappedConstBuffer by adding an offset to mStartMappedConstBuffer.
This means that:
- worldMaterialIdx[0] is used by the animated object
- worldMaterialIdx[1] through worldMaterialIdx[4] are wasted
- worldMaterialIdx[5] will be used by the second, non-animated object
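Step 2's arithmetic in isolation, as a sketch: it assumes each worldMaterialIdx entry is a uint4 (4 uints per draw, which is what the >> 2 in step 3 implies) and reuses the simplified 16-floats-per-matrix model; the function names are hypothetical:

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical sketch of step 2: derive the const-buffer offset (in uints)
// from the tex-buffer offset (in floats). Each draw uses 4 uints of const
// buffer (one uint4 worldMaterialIdx entry) and 32 floats of tex buffer
// (16 in caster passes), hence the shift by 3 (or 2).
inline size_t constOffsetFromTexOffset( size_t texOffsetInFloats, bool casterPass )
{
    return texOffsetInFloats >> ( 2u + !casterPass );
}

// Index into worldMaterialIdx[] that the next non-animated draw will use.
inline size_t drawSlotFromTexOffset( size_t texOffsetInFloats, bool casterPass )
{
    const size_t constOffset = constOffsetFromTexOffset( texOffsetInFloats, casterPass );
    assert( constOffset % 4u == 0u ); // holds once the tex offset is padded
    return constOffset / 4u;
}
```

drawSlotFromTexOffset( 160, false ) gives 5: the animated object occupies entry 0, entries 1 to 4 are skipped, and the non-animated object lands on entry 5.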
NOTE: If the second object were also skeletally animated instead of non-animated, we wouldn't need to cheat. Because we're already using an indirection, drawId = 1 can safely be used: we can use mCurrentMappedConstBuffer directly, and the tex buffer will be indexed through matStart = instance.worldMaterialIdx[drawId].x >> 9u.
This is the "extra layer of indirection vs. assumption" trade-off.
Basically, rendering many skeletally animated objects in succession (regardless of their number of bones, which can be heterogeneous) is fast, and rendering many non-animated objects in succession is also fast. But interleaving non-animated with animated objects leaves waste (RenderQueue sorting should batch them together instead of interleaving them as much as possible).
3. We need to tell the RenderQueue that drawId had a discontinuity, and that is why it's so important that HlmsPbs::fillBuffersFor ends with:
```cpp
return ((mCurrentMappedConstBuffer - mStartMappedConstBuffer) >> 2) - 1;
```
Since mCurrentMappedConstBuffer has already been offset by now, we return 5 (the new drawId) instead of 1.
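A sketch of that return expression, working in uint offsets rather than pointers (hypothetical helper; 4 uints per worldMaterialIdx entry, as the >> 2 implies):

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical sketch of the value returned at the end of fillBuffersFor:
// the const-buffer position has already advanced past the entry just
// written (4 uints per draw), so >> 2 counts entries and - 1 selects ours.
inline size_t returnedDrawId( size_t constOffsetAfterWriteInUints )
{
    assert( constOffsetAfterWriteInUints >= 4u );
    return ( constOffsetAfterWriteInUints >> 2 ) - 1u;
}
```

Without the skeletal object, the second draw leaves the position at 8 uints and returns 1; after it, the position sits at 24 uints and the function returns 5. That jump is the discontinuity the RenderQueue needs to know about.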
Armed with this knowledge, you should now be able to figure out that you can put as many poses as you want inside worldMatBuf. It is a large general-purpose buffer, so you could even store bone matrices and pose data together in it (as long as you keep track of where each thing is). Keep in mind that:
- You use mCurrentMappedTexBuffer - mStartMappedTexBuffer to calculate the offsets you send to your shader.
- You need to add some padding at the end in case the next draw call doesn't use any fancy indirection, so that its assumptions still hold. The next draw call will take care of generating a discontinuity in drawId.
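As a rough sketch of those two rules, here is a CPU-side model of appending custom data such as pose weights. std::vector stands in for the mapped tex buffer, writePos stands in for mCurrentMappedTexBuffer - mStartMappedTexBuffer, and TexWriter, append and padForNextDraw are hypothetical names, not Ogre API:

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

struct TexWriter
{
    std::vector<float> texBuffer; // stand-in for the mapped tex buffer
    size_t writePos = 0;          // floats written so far

    explicit TexWriter( size_t capacityInFloats ) : texBuffer( capacityInFloats, 0.0f ) {}

    // Writes `data` at a float4 boundary and returns its start in float4
    // units (the value the shader expects). Caller must check capacity.
    uint32_t append( const std::vector<float> &data )
    {
        writePos = ( writePos + 3u ) & ~size_t( 3u ); // round up to a float4
        assert( writePos + data.size() <= texBuffer.size() );
        const uint32_t startInFloat4s = static_cast<uint32_t>( writePos >> 2 );
        std::copy( data.begin(), data.end(),
                   texBuffer.begin() + static_cast<std::ptrdiff_t>( writePos ) );
        writePos += data.size();
        return startInFloat4s;
    }

    // Padding so the next non-animated draw's assumptions hold
    // (32 floats per draw normally, 16 in shadow caster passes).
    void padForNextDraw( bool casterPass )
    {
        const size_t alignment = casterPass ? 16u : 32u;
        writePos = std::min( ( writePos + alignment - 1u ) / alignment * alignment,
                             texBuffer.size() );
    }
};
```

Appending 144 floats of bone data and then 3 pose weights returns float4 offsets 0 and 36; padding afterwards moves the write position from 147 to 160 floats, so the next non-animated draw's assumptions hold.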
The devil is in the details:
- It is very easy to screw up offset calculations. E.g. mCurrentMappedTexBuffer - mStartMappedTexBuffer is an offset in floats: it is neither in bytes nor in matrices. The shader expects an offset in float4s (hahahahahahahahaha BBAAAHAHAHAHAHAHA <evil laugh>). Sometimes you'll confuse const and tex buffer addresses because their variables have similar names.
- Some bugs only happen when you render in a certain order, because you broke an assumption or failed to restore offsets properly; otherwise they stay hidden.
- You need to check whether there's enough space left to write your data. Look for "exceedsConstBuffer" and "exceedsTexBuffer" to see how we handle it.
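Two of those details can be captured in tiny helpers, shown here as a hypothetical sketch: explicit unit conversions for the float-based pointer difference, and a space check. The check conservatively reserves room for the worst-case padding; the code in step 1 instead clamps the padded pointer to the buffer size with std::min, presumably leaving the next draw to notice that the buffer is full.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical unit helpers; the tex-buffer pointer difference is in floats.
inline size_t floatsToBytes( size_t floats ) { return floats * sizeof( float ); }

inline size_t floatsToFloat4s( size_t floats )
{
    assert( floats % 4u == 0u ); // a misaligned offset is already a bug
    return floats >> 2;
}

// Hypothetical space check: room for the data plus worst-case padding
// for the next non-animated draw (31 floats normally, 15 in caster passes).
inline bool fitsInTexBuffer( size_t writePosInFloats, size_t floatsNeeded,
                             size_t capacityInFloats, bool casterPass )
{
    const size_t worstCasePadding = casterPass ? 15u : 31u;
    return writePosInFloats + floatsNeeded + worstCasePadding <= capacityInFloats;
}
```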
Adding pose animations + skeletal animations could simply be treated like skeletal animations with an extra "pose count" that says how many pose values come after (or before) the bone matrices.
And using pose animations without skeletal animations could simply be treated like pose + skeletal animations with just 2 "bone" matrices.
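One possible layout for such an Item, sketched under stated assumptions: offsets are in float4s relative to matStart, bone matrices come first and pose weights follow, packed four per float4, each matrix takes 16 floats as in the simplified model above, and a pose-only Item gets 2 "bone" matrices. IndirectLayout and makeLayout are hypothetical, and where the pose count itself is sent to the shader is left open.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical layout of an "indirection" Item inside worldMatBuf,
// in float4 units relative to matStart.
struct IndirectLayout
{
    uint32_t poseWeightsStart; // float4 offset of the first pose weight
    uint32_t totalFloat4s;     // space to reserve before padding
};

inline IndirectLayout makeLayout( uint32_t boneCount, uint32_t poseCount )
{
    const uint32_t float4sPerMatrix = 4u; // 16 floats per matrix (assumed)
    // No skeleton: treat the regular 2 world matrices as "bones".
    const uint32_t matrixCount = boneCount > 0u ? boneCount : 2u;
    IndirectLayout layout;
    layout.poseWeightsStart = matrixCount * float4sPerMatrix;
    layout.totalFloat4s = layout.poseWeightsStart + ( poseCount + 3u ) / 4u;
    return layout;
}
```

A 9-bone Item with 5 poses puts its weights at float4 36 and needs 38 float4s before padding; a pose-only Item with 3 poses starts its weights at float4 8.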
We would have to stop thinking in terms of skeletal animation vs non-skeletal Items, and start thinking in terms of "Items that use indirection" (use skeletal and/or pose data) vs "Items that use assumption" (have no animation).
Adding a new buffer is another option, but you would have to manage its lifetime the way HlmsBufferManager does for you: ensure it's bound when poses are used, keep track of its offsets (so you can send them to the shader), and create a new buffer when you run out of space. The main advantage is that this buffer doesn't have to break the assumptions for non-animated objects, since it lives parallel to them.