[2.1] How to modify InstanceDecl in HLMS PBS?

Post by **xissburg** » Tue Oct 31, 2017 11:54 pm

I would like to add more data to instances and to do so I need to modify the InstanceBuffer declaration in Structs_piece_vs_piece_ps.glsl (and others) to something like this:

Code: Select all

@piece( InstanceDecl )
//Uniforms that change per Item/Entity
struct InstanceItem
{
    uvec4 worldMaterialIdx;
    uvec4 poseWeight;
};

layout_constbuffer(binding = 2) uniform InstanceBuffer
{
    InstanceItem item[2048];
} instance;
@end

(I am implementing morph animation in Ogre 2.1 and I think this is the right place to put the weights)
The problem is that Ogre::HlmsPbs, and possibly other classes, have the size and/or offsets of this InstanceBuffer hardcoded. For example, at OgreHlmsPbs.cpp:1739, in HlmsPbs::fillBuffersFor(), it adds 4 to the const buffer size to check whether it would exceed the current buffer's size; those 4 are the four 32-bit values of each worldMaterialIdx item.

I have looked around the code and changed some of these hardcoded numbers (to 8), which I think are tied to the size of each element in InstanceBuffer, but there's still something missing and it's very difficult to find.

I have 2 objects in my scene, and with these changes I get a strange effect: if both objects are visible they both have the same transform; if one is culled by the frustum, the other moves to its correct transform. So it seems I am messing up the texBuffer, which contains the world and worldView matrices...

Post by **xissburg** » Wed Nov 01, 2017 12:19 am

It looks like this section was the problem..?

Code: Select all

//We need to correct currentMappedConstBuffer to point to the right texture buffer's
//offset, which may not be in sync if the previous draw had skeletal animation.
const size_t currentConstOffset = (currentMappedTexBuffer - mStartMappedTexBuffer) >>
                                                (2 + !casterPass);
currentMappedConstBuffer =  currentConstOffset + mStartMappedConstBuffer;

It works after commenting this out. So what is this? Why do you have to "correct currentMappedConstBuffer to point to the right texture buffer's offset"? It reads as if the texBuffer and the constBuffer were pointers into the same buffer, which I think they aren't... It's not clear.

Post by **dark_sylinc** » Wed Nov 01, 2017 12:46 am

That buffer isn't meant to be customizable; its size is fixed for serious performance reasons.

Right now only worldMaterialIdx.w is left unused; it could store the offset at which to look up the poses in a different buffer. Look at how animation matrices work: the start index of the world matrices that affect the bones is stored in the upper 23 bits, and the shader then fetches the actual world matrix from a different buffer:

Code: Select all

uint matStart = instance.worldMaterialIdx[drawId].x >> 9u;
vec4 worldMat[3];
worldMat[0] = bufferFetch( worldMatBuf, int(matStart + _idx + 0u) );
worldMat[1] = bufferFetch( worldMatBuf, int(matStart + _idx + 1u) );
worldMat[2] = bufferFetch( worldMatBuf, int(matStart + _idx + 2u) );

I suggest you do something similar with the pose weights:

Code: Select all

uint poseStart = instance.worldMaterialIdx[drawId].w;
poseData = bufferFetch( poseBuffer, int(poseStart + 0u) );
...

If you plan on supporting a max of 4 pose weights per object, you could alternatively encode all 4 into a 32-bit value using four 8-bit values (0...255):

Code: Select all

vec4 poseWeights = unpackUnorm4x8( instance.worldMaterialIdx[drawId].w );
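
On the C++ side, the four weights would have to be packed in the layout that unpackUnorm4x8 decodes: the first component in the least significant byte, each weight scaled from [0, 1] to 0...255. A minimal sketch, assuming a helper named packPoseWeights (hypothetical, not an Ogre function) whose result would be written to the .w slot of worldMaterialIdx:

```cpp
#include <algorithm>
#include <cmath>
#include <cstdint>

// Hypothetical helper (not part of Ogre): packs four weights in [0, 1] into one
// 32-bit value using the layout GLSL's unpackUnorm4x8 decodes
// (w0 -> bits 0-7, w1 -> bits 8-15, w2 -> bits 16-23, w3 -> bits 24-31).
inline std::uint32_t packPoseWeights( float w0, float w1, float w2, float w3 )
{
    const float weights[4] = { w0, w1, w2, w3 };
    std::uint32_t packed = 0u;
    for( unsigned i = 0u; i < 4u; ++i )
    {
        // Clamp to [0, 1], then quantize to the nearest of 256 levels.
        const float clamped = std::min( std::max( weights[i], 0.0f ), 1.0f );
        const std::uint32_t quantized =
            static_cast<std::uint32_t>( std::lround( clamped * 255.0f ) );
        packed |= quantized << ( 8u * i );
    }
    return packed;
}
```

Each weight then has a resolution of 1/255, which is the price of fitting all four into a single component.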

Post by **xissburg** » Wed Nov 01, 2017 1:26 am

Thanks for the quick reply. I'll try a different approach.

Post by **xissburg** » Wed Nov 01, 2017 1:32 am

Actually, worldMaterialIdx.w is used when OGRE_BUILD_COMPONENT_PLANAR_REFLECTIONS is defined:

Code: Select all

#ifdef OGRE_BUILD_COMPONENT_PLANAR_REFLECTIONS
        *( currentMappedConstBuffer+3u ) = queuedRenderable.renderable->mCustomParameter & 0x7F;
#endif

But apparently it doesn't use all bits?

Post by **dark_sylinc** » Wed Nov 01, 2017 2:14 am

Ouch, I didn't realize that. But yeah, like you said, it's only using 7 bits, and only if planar reflections are compiled in.
You could use the remaining 25 bits when compiled w/ planar reflections, and 32 bits if compiled without.

Things could be improved: worldMaterialIdx.y (the shadow bias) currently uses the full 32 bits when it could probably use 16 (or maybe even fewer), so the planar reflection data could be moved into that component.

Just use the full 32 bits of the .w component (i.e. don't compile with planar refl. support), and when you get things working we'll worry about reordering the bits to get everything squashed together.
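
If planar reflections do stay compiled in, sharing the .w component could look like the sketch below. The bit layout (reflection index in the low 7 bits, custom payload in the upper 25) and all names are assumptions for illustration; only the 0x7F mask comes from the snippet quoted earlier in the thread.

```cpp
#include <cstdint>

// Assumed layout of worldMaterialIdx.w (illustrative, not Ogre's):
//   bits 0-6  : planar reflection index (the existing 0x7F mask)
//   bits 7-31 : 25 bits of custom payload, e.g. an offset into another buffer
constexpr std::uint32_t kReflectionBits = 7u;
constexpr std::uint32_t kReflectionMask = 0x7Fu;
constexpr std::uint32_t kPayloadMask = ( 1u << 25u ) - 1u;

constexpr std::uint32_t packW( std::uint32_t reflectionIdx, std::uint32_t payload )
{
    return ( ( payload & kPayloadMask ) << kReflectionBits ) |
           ( reflectionIdx & kReflectionMask );
}

constexpr std::uint32_t unpackReflectionIdx( std::uint32_t w )
{
    return w & kReflectionMask;
}

constexpr std::uint32_t unpackPayload( std::uint32_t w )
{
    return w >> kReflectionBits;
}
```

Any shader code that currently reads .w as the reflection index would presumably need the same 0x7F mask once the upper bits carry other data.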

Post by **SolarPortal** » Tue Nov 07, 2017 10:36 am

@dark_sylinc, We also need to modify the instanceDecl for some items to contain the radius of the mesh on the X/Z and also to store the Y position of the Pivot point. I have been meaning to ask for a while but was waiting until someone else also needed it.
The method you mentioned here sounds really useful...

dark_sylinc wrote:
I suggest you do something similar with the pose weights:

Code: Select all

uint poseStart = instance.worldMaterialIdx[drawId].w;
poseData = bufferFetch( poseBuffer, int(poseStart + 0u) );

Currently we have been using the Z and W components of worldMaterialIdx and packed all the bits together, but that means we can't use planar reflections, which we would really like to use. We have no need of the fine light mask...

Code: Select all

			
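// Storing Y position (raw float bits written into worldMaterialIdx.z)...
Ogre::Vector3 position = queuedRenderable.movableObject->getParentSceneNode()->_getDerivedPosition();
*reinterpret_cast<float * RESTRICT_ALIAS>(currentMappedConstBuffer + 2) = (float)position.y;

// Storing radius of mesh (X and Z extents of the world AABB, 16 bits each, in worldMaterialIdx.w)...
// Cast to uint32 before shifting so the shift is not done on a promoted signed int.
Ogre::Vector3 size = queuedRenderable.movableObject->getWorldAabb().getSize();
*(currentMappedConstBuffer + 3u) = Ogre::uint32(Ogre::uint16(size.x)) |
                                   (Ogre::uint32(Ogre::uint16(size.z)) << 16u);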

Could you provide more explanation as to how to store the uint start (e.g. poseStart) in C++, or even an example of creating a new buffer, storing the index, and unpacking it in the shader to retrieve the vec3, vec4, etc. for editing in the shader, so that we can use planar reflections?

We obviously need to add something like this for GL (DX11 doesn't need it):
c++

Code: Select all

vsParams->setNamedConstant( "itemRadiusBuf", 0 );

Then declare it in the shader as a binding:
glsl

Code: Select all

/*layout(binding = 0) */uniform samplerBuffer itemRadiusBuf;

but how to use this and set the accessors for it like you have:
glsl

Code: Select all

		
uint itemRadStart = instance.worldMaterialIdx[drawId].z >> 9u;
// bufferFetch returns a vec4, so take .xyz to fill a vec3
vec3 itemRadius = bufferFetch( itemRadiusBuf, int( itemRadStart + 0u ) ).xyz;

but how do we map the id and unmap the buffer in the shader?

Thanks for the help

Post by **dark_sylinc** » Tue Nov 07, 2017 8:22 pm

OK; I remembered a bit more about that part:

The Hlms is given two buffers from HlmsBufferManager: the const buffer (where worldMaterialIdx lives), which can't be bigger than 64 KB, and a large tex buffer (worldMatBuf), which can hold a lot more data. The tex buffer is used for matrix data, but it is meant to hold anything generic and potentially big or of arbitrary length.

You could have more buffers, but you would have to manage them in a way similar to what HlmsBufferManager does for you.
Search the source files for the following keywords to check out how it works:

worldMatBuf
mStartMappedTexBuffer
mCurrentMappedTexBuffer
mCurrentTexBufferSize
HlmsBufferManager::mapNextTexBuffer
HlmsBufferManager::unmapTexBuffer

Rather than creating a new buffer, you could store your data in worldMatBuf. That's what it is for.

For non-animated objects, we assumed a very simple scheme: worldMatBuf only stores two world matrices per object (one during the shadow caster pass), and it can be addressed by doing:

Code: Select all

UNPACK_MAT3x4( worldMatBuf, drawId << 1u );

For skeletal objects, we get rid of that assumption and add an indirection. Because a submesh may be affected by 3 bones (3 matrices) or 9 bones (9 matrices), or whatever number of bones, now the addressing becomes variable. Thus we need the indirection:

Code: Select all

uint matStart = instance.worldMaterialIdx[drawId].x >> 9u;
vec4 worldMat[3];
worldMat[0] = bufferFetch( worldMatBuf, int(matStart + _idx + 0u) );
worldMat[1] = bufferFetch( worldMatBuf, int(matStart + _idx + 1u) );
worldMat[2] = bufferFetch( worldMatBuf, int(matStart + _idx + 2u) );

Quick Note: matStart is calculated when we do, from C++:

Code: Select all

//uint worldMaterialIdx[]
size_t distToWorldMatStart = mCurrentMappedTexBuffer - mStartMappedTexBuffer;
distToWorldMatStart >>= 2;
*currentMappedConstBuffer = (distToWorldMatStart << 9 ) |
        (datablock->getAssignedSlot() & 0x1FF);

That is, we calculate the difference between the start address of the tex buffer and the current write position, and send it to the shader.
End of quick note
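
The packing in that note, and the shader-side `>> 9u` that undoes it, can be mirrored in plain C++. This is only a sketch of the bit layout described above; the function names are made up for illustration and are not Ogre functions.

```cpp
#include <cstddef>
#include <cstdint>

// texOffsetInFloats plays the role of mCurrentMappedTexBuffer - mStartMappedTexBuffer.
// The shader addresses the buffer in float4 units, hence the >> 2.
// The low 9 bits hold the datablock's material slot (0x1FF mask).
inline std::uint32_t packWorldMaterialIdxX( std::size_t texOffsetInFloats,
                                            std::uint32_t materialSlot )
{
    const std::size_t offsetInFloat4 = texOffsetInFloats >> 2u;
    return static_cast<std::uint32_t>( offsetInFloat4 << 9u ) | ( materialSlot & 0x1FFu );
}

// What the shader recovers with "instance.worldMaterialIdx[drawId].x >> 9u".
inline std::uint32_t unpackMatStart( std::uint32_t x )
{
    return x >> 9u;
}

// The remaining low 9 bits.
inline std::uint32_t unpackMaterialSlot( std::uint32_t x )
{
    return x & 0x1FFu;
}
```

A pose offset stored in another component could follow the same pattern: compute the offset in floats on the CPU, convert it to float4 units, and let the shader use it as the start index for bufferFetch.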

This leaves us with a problem: If I render a SubItem with skeletal animation (i.e. 9 bones) and then I render a non-animated object, the assumption breaks for the second object. The first object (the skeletally animated one) will be rendered fine, but the next one will not, because:

Code: Select all

drawId = 1; //Second element, remember drawId starts at 0.
UNPACK_MAT3x4( worldMatBuf, drawId << 1u );

This will give us the 3rd and 4th bone matrices of the previous item we rendered, and not our own matrices.

To solve that, HlmsPbs does three things:

1. We wrote 9 matrices to mCurrentMappedTexBuffer, but during non-shadow-casting passes, non-animated objects assume there are two matrices per draw. In other words, we need to add some padding and pretend we wrote 10 matrices instead of 9. We do that here:

Code: Select all

//If the next entity will not be skeletally animated, we'll need
//currentMappedTexBuffer to be 16/32-byte aligned.
//Non-skeletally animated objects are far more common than skeletal ones,
//so we do this here instead of doing it before rendering the non-skeletal ones.
size_t currentConstOffset = (size_t)(currentMappedTexBuffer - mStartMappedTexBuffer);
currentConstOffset = alignToNextMultiple( currentConstOffset, 16 + 16 * !casterPass );
currentConstOffset = std::min( currentConstOffset, mCurrentTexBufferSize );
currentMappedTexBuffer = mStartMappedTexBuffer + currentConstOffset;

In other words, each animated object corrects the pointers so the assumptions hold for the next draw.
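
The padding step can be tried in isolation. The sketch below assumes that every quantity is counted in floats (the pointers are float pointers) and, like the simplified example in this post, treats every matrix as 16 floats. alignToNextMultiple here is a stand-in written for this example, and padTexOffset is a hypothetical name.

```cpp
#include <algorithm>
#include <cstddef>

// Stand-in for the helper of the same name used in the snippet above:
// rounds offset up to the next multiple of alignment.
constexpr std::size_t alignToNextMultiple( std::size_t offset, std::size_t alignment )
{
    return ( ( offset + alignment - 1u ) / alignment ) * alignment;
}

// Hypothetical wrapper: pads the tex buffer write position so that the next
// non-animated draw finds its matrices where it assumes them to be.
// Caster pass: one matrix per draw (16 floats); otherwise two (32 floats).
inline std::size_t padTexOffset( std::size_t texOffsetInFloats, bool casterPass,
                                 std::size_t texBufferSizeInFloats )
{
    const std::size_t alignment = casterPass ? 16u : 32u;
    return std::min( alignToNextMultiple( texOffsetInFloats, alignment ),
                     texBufferSizeInFloats );
}
```

With 9 matrices written (144 floats under the assumption above), padTexOffset( 144, false, size ) yields 160, i.e. the "10 matrices" mentioned in step 1.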

2. We need to cheat on draw id. We only draw an object with 9 matrices + 1 matrix of padding. But this is the same as drawing 5 non-animated objects. We recalculate the draw ID by doing pointer arithmetic based on the current location of texture buffer (currentMappedTexBuffer - mStartMappedTexBuffer) and add it to the start const buffer (currentConstOffset + mStartMappedConstBuffer). We do that here:

Code: Select all

if( !hasSkeletonAnimation )
{
    //We need to correct currentMappedConstBuffer to point to the right texture buffer's
    //offset, which may not be in sync if the previous draw had skeletal animation.
    const size_t currentConstOffset = (currentMappedTexBuffer - mStartMappedTexBuffer) >>
                                            (2 + !casterPass);
    currentMappedConstBuffer =  currentConstOffset + mStartMappedConstBuffer;
    ...
}

We can't use mCurrentMappedConstBuffer directly, because it still acts as if this is the second object being rendered. So we have to calculate currentMappedConstBuffer by adding an offset to mStartMappedConstBuffer.

This means that:

worldMaterialIdx[0] is used by the animated object
worldMaterialIdx[1] through worldMaterialIdx[4] are wasted
worldMaterialIdx[5] will be used by the second, non-animated object

NOTE: If the second object to draw were also skeletally animated rather than non-animated, we wouldn't have to cheat on this. Because we're already using indirection, drawId = 1 can safely be used; we can use mCurrentMappedConstBuffer directly, and the tex buffer will be indexed via texelFetch( worldMatBuf, matStart = instance.worldMaterialIdx[drawId].x >> 9u ).
This is an "extra layer of indirection vs. assumption" trade-off.
Basically, rendering many skeletally animated objects in succession (regardless of their number of bones, which can be heterogeneous) is fast, and rendering many non-animated objects in succession is also fast. But interleaving non-animated with animated objects leaves waste (RenderQueue sorting should ensure we batch them together instead of interleaving, as much as possible).

3. We need to tell the RenderQueue that drawId had a discontinuity, and that is why it's so important that HlmsPbs::fillBuffersFor ends with:

Code: Select all

return ((mCurrentMappedConstBuffer - mStartMappedConstBuffer) >> 2) - 1;

Since mCurrentMappedConstBuffer has been offset by now, we will return 5 (the new drawId) instead of 1.

Armed with this knowledge, you should now be able to figure out that you can put as many poses as you want inside worldMatBuf. It is a general-purpose large buffer, so you could even put bone matrices and pose data together in it (as long as you're organized about where each thing is), and that:

You use mCurrentMappedTexBuffer - mStartMappedTexBuffer to calculate the offsets to send to your shader.
You need to add some padding at the end in case the next draw call doesn't have any fancy indirection, so that its assumptions still hold. The next draw call will take care of generating a discontinuity in drawId.

The devil is in the details:

It is very easy to screw up offset calculation. E.g. mCurrentMappedTexBuffer - mStartMappedTexBuffer is an offset in floats. It is in neither bytes nor matrices. The shader expects an offset in float4 (hahahahahahahahaha BBAAAHAHAHAHAHAHA <evil laugh>). Sometimes you'll confuse const and tex buffer addresses because their variables have similar names.
Some bugs will only happen when you render in a certain order, because you broke an assumption or failed to restore offsets properly; otherwise they stay hidden.
You need to check if there's enough space left to write your data. Look for "exceedsConstBuffer" and "exceedsTexBuffer" and how we handle them.
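
To keep the units apart, it can help to give each conversion its own named function. The sketch below is an illustration only: the names are invented, and exceedsBuffer merely imitates the kind of test that "exceedsTexBuffer" suggests, without claiming to match how HlmsPbs computes it.

```cpp
#include <cstddef>

static_assert( sizeof( float ) == 4u, "this sketch assumes 4-byte floats" );

// Pointer differences between float pointers are offsets in floats.
// The shader indexes the buffer in float4 units.
constexpr std::size_t floatsToFloat4( std::size_t offsetInFloats )
{
    return offsetInFloats >> 2u;
}

// Only needed when talking to APIs that want byte offsets.
constexpr std::size_t floatsToBytes( std::size_t offsetInFloats )
{
    return offsetInFloats * sizeof( float );
}

// Hypothetical space check: true if writing neededInFloats more floats would
// run past the end of the currently mapped region.
constexpr bool exceedsBuffer( std::size_t usedInFloats, std::size_t neededInFloats,
                              std::size_t capacityInFloats )
{
    return usedInFloats + neededInFloats > capacityInFloats;
}
```

For example, an offset of 48 floats is 12 in float4 units and 192 in bytes; sending either of the other two values to the shader is exactly the kind of bug described above.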

Adding pose animations + skeletal animations could simply be treated like skeletal animations with an extra "pose count" that says how many pose values come after the bone matrices (or before them).
And using pose animations without skeletal animations could simply be treated like pose + skeletal animations with just 2 "bone" matrices.

We would have to stop thinking in terms of skeletal animation vs non-skeletal Items, and start thinking in terms of "Items that use indirection" (use skeletal and/or pose data) vs "Items that use assumption" (have no animation).
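
One possible layout for such an "Item that uses indirection" is sketched below, with a std::vector<float> standing in for the mapped tex buffer. The struct, the function, and the choice of placing the pose weights after the bone matrices are all assumptions made for illustration.

```cpp
#include <cstddef>
#include <vector>

// Offsets the shader would need for one item, in float4 units.
struct IndirectBlock
{
    std::size_t matStartInFloat4;  // where the bone matrices begin
    std::size_t poseStartInFloat4; // where the pose weights begin
    std::size_t poseCount;         // how many weights follow
};

// Appends flattened bone matrices followed by pose weights, padding the
// weights to a whole float4. Expects texData.size() and boneMatrices.size()
// to be multiples of 4.
inline IndirectBlock appendIndirectBlock( std::vector<float> &texData,
                                          const std::vector<float> &boneMatrices,
                                          const std::vector<float> &poseWeights )
{
    IndirectBlock block;
    block.matStartInFloat4 = texData.size() >> 2u;
    texData.insert( texData.end(), boneMatrices.begin(), boneMatrices.end() );

    block.poseStartInFloat4 = texData.size() >> 2u;
    block.poseCount = poseWeights.size();
    texData.insert( texData.end(), poseWeights.begin(), poseWeights.end() );

    // Round the size up to the next multiple of 4 floats, zero-filling the gap.
    texData.resize( ( texData.size() + 3u ) & ~std::size_t( 3u ), 0.0f );
    return block;
}
```

The padding that keeps the next non-animated draw's assumptions valid is not part of this sketch and would still have to follow the block.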

Adding a new buffer could be another way, but you would have to manage its lifetime in a way similar to what HlmsBufferManager does for you: ensure it's bound when poses are used, keep track of its offsets (so you can send them to the shader), and create a new buffer when you run out of space. The main advantage is that this buffer doesn't have to break assumptions for non-animated objects, since it's parallel to them.
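
The bookkeeping for such a parallel buffer might be sketched as follows, using CPU-side vectors in place of real GPU buffers. PoseBufferPool is a hypothetical class, not something Ogre provides, and binding the right buffer before each draw is left out. It assumes a single write never exceeds the capacity.

```cpp
#include <cstddef>
#include <vector>

// Hands out float4-aligned slots and starts a new buffer when the current
// one cannot hold the next write.
class PoseBufferPool
{
public:
    struct Slot
    {
        std::size_t bufferIdx;      // which buffer must be bound for this item
        std::size_t offsetInFloat4; // start index to send to the shader
    };

    explicit PoseBufferPool( std::size_t capacityInFloats ) :
        mCapacity( capacityInFloats ),
        mBuffers( 1u )
    {
    }

    Slot write( const std::vector<float> &weights )
    {
        // Pad every write to a whole number of float4s.
        const std::size_t padded = ( weights.size() + 3u ) & ~std::size_t( 3u );
        if( mBuffers.back().size() + padded > mCapacity )
            mBuffers.emplace_back(); // out of space: create a new buffer

        std::vector<float> &buf = mBuffers.back();
        const Slot slot{ mBuffers.size() - 1u, buf.size() >> 2u };
        buf.insert( buf.end(), weights.begin(), weights.end() );
        buf.resize( slot.offsetInFloat4 * 4u + padded, 0.0f );
        return slot;
    }

    std::size_t bufferCount() const { return mBuffers.size(); }

private:
    std::size_t mCapacity;
    std::vector<std::vector<float> > mBuffers;
};
```

Because the slots live in their own buffer, nothing here touches the tex buffer cursor, so non-animated items keep their `drawId << 1u` addressing without any padding.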

Post by **SolarPortal** » Tue Nov 07, 2017 9:51 pm

That's some good information there! Many thanks for the in depth explanation

I should be able to tailor it to our needs for the radius and pivot position for non-animated items once I've read that a few more times.

Ogre Forums

[2.1] How to modify InstanceDecl in HLMS PBS? Topic is solved
