[3.0] binding Matrix in custom hlms

psysu
Halfling


Post by psysu

Hi, I'm trying to implement a custom Unlit Hlms. I inherited from the existing Ogre::HlmsUnlit type to create mine.

I reused the code from Ogre::HlmsUnlit for the preparePassHash and fillBuffersFor functions.

For my problem, I want to bind the worldView matrix to the GPU program. Where should I do that, and how?

I tried doing it in the fillBuffersFor function, but I think I'm making some kind of alignment mistake. If someone can explain how Ogre binds the worldViewProj matrix to the GPU program, it'll be really helpful.

Thanks

dark_sylinc
OGRE Team Member

Re: [3.0] binding Matrix in custom hlms

Post by dark_sylinc

See the Ogre 2.1+ FAQ for resources on implementing custom Hlms.

The data sent every frame for every object can be split into multiple parts:

Every frame

Code:

// mat4 worldViewProj
Matrix4 tmp =
    mPreparedPass.viewProjMatrix[mUsingInstancedStereo ? 4u : useIdentityProjection] * worldMat;
memcpy( currentMappedTexBuffer, &tmp, sizeof( Matrix4 ) );
// currentMappedTexBuffer is a float*, so this skips past the 16 floats just written
currentMappedTexBuffer += 16;

That writes 16 floats (the 4x4 matrix) into currentMappedTexBuffer and advances the pointer.

You can see that HlmsPbs sends both worldViewProj and worldView, so it does this twice.
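To make the "does this twice" part concrete, here is a minimal, self-contained sketch of writing worldViewProj and then worldView for one object. Mat4, multiply and writeObjectMatrices are hypothetical stand-ins (not Ogre API) for Ogre::Matrix4 and the relevant part of fillBuffersFor; in the real code the view, viewProj and world matrices would come from the pass data and the renderable.

```cpp
#include <cstring>

// Hypothetical stand-in for Ogre::Matrix4: 16 floats, row-major (m[row * 4 + col]).
struct Mat4
{
    float m[16];
};

// Standard 4x4 matrix product: returns a * b.
Mat4 multiply( const Mat4 &a, const Mat4 &b )
{
    Mat4 r{};
    for( int row = 0; row < 4; ++row )
    {
        for( int col = 0; col < 4; ++col )
        {
            float sum = 0.0f;
            for( int k = 0; k < 4; ++k )
                sum += a.m[row * 4 + k] * b.m[k * 4 + col];
            r.m[row * 4 + col] = sum;
        }
    }
    return r;
}

// Writes worldViewProj followed by worldView (32 floats in total) and advances
// the cursor past both: the "memcpy, then += 16" pattern done twice.
void writeObjectMatrices( float *&cursor, const Mat4 &viewProj, const Mat4 &view,
                          const Mat4 &world )
{
    const Mat4 worldViewProj = multiply( viewProj, world );
    std::memcpy( cursor, worldViewProj.m, sizeof( Mat4 ) );
    cursor += 16;

    const Mat4 worldView = multiply( view, world );
    std::memcpy( cursor, worldView.m, sizeof( Mat4 ) );
    cursor += 16;
}
```

Each object now consumes 32 floats of the buffer instead of 16.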

But earlier you can spot this code:

Code:

const size_t minimumTexBufferSize = 16;
bool exceedsTexBuffer = static_cast<size_t>( currentMappedTexBuffer - mStartMappedTexBuffer ) +
                            minimumTexBufferSize >=
                        mCurrentTexBufferSize;

This anticipates how much data we are about to write. If the buffer doesn't have enough room left, a new one is created (or a discarded one is recycled) and mapped.

In your case, since you want to send two matrices instead of one, you want minimumTexBufferSize = 32 instead of 16.
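The size check can be sketched as a standalone function, assuming (as the pointer difference in the snippet implies) that all sizes are counted in floats. The names below are hypothetical and only mirror the variables in the snippet.

```cpp
#include <cstddef>

// Mirrors the exceedsTexBuffer expression. All quantities are in floats.
bool exceedsTexBuffer( std::size_t floatsWritten, std::size_t minimumTexBufferSize,
                       std::size_t currentTexBufferSize )
{
    return floatsWritten + minimumTexBufferSize >= currentTexBufferSize;
}

// Floats written per object: 16 for worldViewProj alone, 32 once worldView is added.
constexpr std::size_t kFloatsPerMatrix = 16u;
constexpr std::size_t kMinimumTexBufferSize = 2u * kFloatsPerMatrix;
```

With the value left at 16, the check only guarantees room for one matrix, so the second matrix of an object could end up past the end of the mapped region.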

Binding

This is handled by rebindTexBuffer( commandBuffer ) or mapNextTexBuffer( ... ).

The binding slot is hardcoded to slot R0 (t0 in HLSL, SSBO #0 in Vulkan and modern GL, TBO #0 in very old OpenGL HW).

Shader Code

See the declarations:

Code:

ReadOnlyBufferF( 0, float4, worldMatBuf ); // GLSL
ReadOnlyBuffer( 0, float4, worldMatBuf ); // HLSL
device const float4 *worldMatBuf [[buffer(TEX_SLOT_START+0)]] // Metal

The ReadOnlyBufferF macro deals with declaring the variable according to what the HW supports. Ideally we want to use read-only SSBOs, but when those aren't supported, we fall back to TBOs. Additionally, the UNPACK_MAT4 macro uses different code for reading SSBOs and TBOs because the shader syntax is different.

The data is unpacked with a macro:

Code:

float4x4 worldViewProj = UNPACK_MAT4( worldMatBuf, finalDrawId );

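This is conceptually the C++ equivalent of float4x4 worldViewProj = worldMatBuf[finalDrawId], as if worldMatBuf were an array of 4x4 matrices (each matrix occupies four consecutive float4 elements).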
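For the two-matrix case the indexing changes too: Unlit assumes 16 floats (four float4 elements) per object, so once each object writes 32 floats, object N's matrices sit at matrix indices N * 2 and N * 2 + 1. The sketch below mirrors that on the CPU. float4, float4x4, unpackMat4, worldViewProjFor and worldViewFor are hypothetical; unpackMat4 only models what UNPACK_MAT4 reads, not its per-API implementation, and the sketch assumes worldViewProj is written first and worldView second.

```cpp
#include <cstddef>

// Hypothetical CPU-side mirror of the shader data: worldMatBuf is an array of float4.
struct float4
{
    float x, y, z, w;
};

struct float4x4
{
    float4 row[4];
};

// Models UNPACK_MAT4( matrixBuf, matIdx ): matrix matIdx starts at element matIdx * 4.
float4x4 unpackMat4( const float4 *matrixBuf, std::size_t matIdx )
{
    float4x4 result;
    for( std::size_t i = 0; i < 4; ++i )
        result.row[i] = matrixBuf[matIdx * 4 + i];
    return result;
}

// With two matrices per object, object drawId's worldViewProj is matrix drawId * 2...
float4x4 worldViewProjFor( const float4 *worldMatBuf, std::size_t drawId )
{
    return unpackMat4( worldMatBuf, drawId * 2u );
}

// ...and its worldView is the matrix right after it.
float4x4 worldViewFor( const float4 *worldMatBuf, std::size_t drawId )
{
    return unpackMat4( worldMatBuf, drawId * 2u + 1u );
}
```

In the shader this would amount to something like UNPACK_MAT4( worldMatBuf, finalDrawId * 2u ) and UNPACK_MAT4( worldMatBuf, finalDrawId * 2u + 1u ). Keeping the plain finalDrawId index after doubling the per-object data would read the wrong matrices, which may be the alignment problem described in the question.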

Note that Unlit doesn't support skeletal animation, while PBS does. Skeletal animation complicates things because indexing worldMatBuf[idx] becomes much more intricate: basically, idx needs to be sent in another buffer so that the shader ends up doing worldMatBuf[perInstanceOffsets[finalDrawId]]. Conceptually it's simple, but on the C++ side it becomes ridiculously complex. Unlit is simple because we can assume every object consumes exactly 16 floats, while with skeletons each object can consume an arbitrary number of floats.
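The per-instance offset indirection used with skeletons can be illustrated with a hypothetical CPU-side model (PerObjectLayout, appendObject and firstFloatOf are not Ogre names). For simplicity it counts offsets in floats rather than in the shader's float4 elements.

```cpp
#include <cstddef>
#include <vector>

// Objects write a variable number of floats, so an object's start can't be derived
// from finalDrawId alone; it's looked up in a second array instead.
struct PerObjectLayout
{
    std::vector<float> worldMatBuf;              // all objects' data, back to back
    std::vector<std::size_t> perInstanceOffsets; // start (in floats) of each object
};

// Records where the object's data starts, then appends its floats.
void appendObject( PerObjectLayout &layout, const std::vector<float> &objectFloats )
{
    layout.perInstanceOffsets.push_back( layout.worldMatBuf.size() );
    layout.worldMatBuf.insert( layout.worldMatBuf.end(), objectFloats.begin(),
                               objectFloats.end() );
}

// Equivalent of worldMatBuf[perInstanceOffsets[finalDrawId]].
float firstFloatOf( const PerObjectLayout &layout, std::size_t finalDrawId )
{
    return layout.worldMatBuf[layout.perInstanceOffsets[finalDrawId]];
}
```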

Misc

In HlmsUnlit::createShaderCacheEntry, OpenGL needs to assign the slot # to the variable, which is named worldMatBuf.

In Vulkan, HlmsUnlit::setupRootLayout needs to tell the RootLayout that DescBindingTypes::ReadOnlyBuffer has at least 1 slot.