Matching Hlms PBS struct in GLSL/HLSL to C++

Post by dark_sylinc »

This question came in via PM and may be relevant to people in general, so I'll answer it here. The question was:
I'm trying to understand how this bit of code:

Code:

struct Material
{
   /* kD is already divided by PI to make it energy conserving.
     (formula is finalDiffuse = NdotL * surfaceDiffuse / PI)
   */
   vec4 kD; //kD.w is alpha_test_threshold
   vec4 kS; //kS.w is roughness
   //Fresnel coefficient, may be per colour component (vec3) or scalar (float)
   //F0.w is transparency
   vec4 F0;
   vec4 normalWeights;
   vec4 cDetailWeights;
   vec4 detailOffsetScaleD[4];
   vec4 detailOffsetScaleN[4];

   uvec4 indices0_3;
   //uintBitsToFloat( indices4_7.w ) contains mNormalMapWeight.
   uvec4 indices4_7;
};
relates to the C++ code.
I'm trying to adapt it to my needs and I would like to reorganize, add and remove stuff. But I don't understand how this part relates to a C++ counterpart.
The only thing I managed to understand is that the needed space is defined here:

Code:

    const size_t HlmsPbsDatablock::MaterialSizeInGpu          = 52 * 4 + NUM_PBSM_TEXTURE_TYPES * 2 + 4;
    const size_t HlmsPbsDatablock::MaterialSizeInGpuAligned   = alignToNextMultiple(
                                                                    HlmsPbsDatablock::MaterialSizeInGpu,
                                                                    4 * 4 );
52*4... where does that come from?

If my new struct would be like this:

Code:

struct Material
{
   vec4 param1;
   vec4 param2;
   uvec4 param3;
};
How do I define MaterialSizeInGpu and how do I send the data? Thank you.
Ok, this is much easier than you think. The data is literally getting memcpy'd from C++'s HlmsPbsDatablock. If you look at its definition:

Code:

class _OgreHlmsPbsExport HlmsPbsDatablock
{
	//...
	float   mkDr, mkDg, mkDb;                   //kD
	float   _padding0;
	float   mkSr, mkSg, mkSb;                   //kS
	float   mRoughness;
	float   mFresnelR, mFresnelG, mFresnelB;    //F0
	float   mTransparencyValue;
	float   mDetailNormalWeight[4];
	float   mDetailWeight[4];
	Vector4 mDetailsOffsetScale[8];
	uint16  mTexIndices[NUM_PBSM_TEXTURE_TYPES];
	float   mNormalMapWeight;
	//...
};
This is exactly the C++ counterpart! It just looks different because of the bloody GLSL std140 rules, which are really broken.
Although to be honest, these rules are so complicated and confusing that even driver implementations get them wrong, which is why I prefer using vec4 as much as possible rather than e.g. mixing a vec3 with a float, because one driver may pack them together while another adds padding between them. The shader will then be broken on that vendor until it fixes its driver bug. No, thanks.
But it's essentially the same. The data gets memcpy'd to GPU in HlmsPbsDatablock::uploadToConstBuffer. You can see we do a couple of adjustments that are mere optimizations (e.g. we fill 'padding' with the alpha threshold, since it lives in our base class and hence is not contiguous for the memcpy; and when transparency is enabled we multiply the fresnel and kD by mTransparencyValue to avoid doing it in the shader, etc).
You control what you send to the shader via uploadToConstBuffer. Whenever this needs to be called (i.e. a parameter changed), you need to call scheduleConstBufferUpdate() so that an upload is scheduled for when the time is appropriate. Multiple calls to scheduleConstBufferUpdate will be merged into one uploadToConstBuffer call.
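To illustrate the "multiple calls get merged" part, here is a toy sketch of the pattern. It is not Ogre's implementation: ToyManager, ToyDatablock, scheduleForUpdate, flush and getUploadCount are made-up names, and only the two method names scheduleConstBufferUpdate and uploadToConstBuffer are borrowed from the real API to show the roles they play.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Toy stand-ins for illustration only; these are NOT Ogre classes.
class ToyDatablock;

class ToyManager
{
public:
    // Remember the datablock until the next flush.
    void scheduleForUpdate( ToyDatablock *datablock ) { mDirty.push_back( datablock ); }
    // Upload every pending datablock exactly once, then forget them.
    void flush();

private:
    std::vector<ToyDatablock *> mDirty;
};

class ToyDatablock
{
public:
    explicit ToyDatablock( ToyManager *manager ) :
        mManager( manager ), mScheduled( false ), mRoughness( 1.0f ), mUploadCount( 0u )
    {
        assert( manager && "a datablock needs a manager to schedule uploads" );
    }

    // Any setter that changes GPU-visible data schedules an upload.
    void setRoughness( float roughness )
    {
        mRoughness = roughness;
        scheduleConstBufferUpdate();
    }

    // Repeated calls before the next flush are merged into a single upload.
    void scheduleConstBufferUpdate()
    {
        if( !mScheduled )
        {
            mScheduled = true;
            mManager->scheduleForUpdate( this );
        }
    }

    // This is where the real code would write the material into GPU memory.
    void uploadToConstBuffer()
    {
        ++mUploadCount;
        mScheduled = false;
    }

    std::size_t getUploadCount() const { return mUploadCount; }
    float getRoughness() const { return mRoughness; }

private:
    ToyManager *mManager;
    bool mScheduled;
    float mRoughness;
    std::size_t mUploadCount;
};

inline void ToyManager::flush()
{
    for( std::size_t i = 0u; i < mDirty.size(); ++i )
        mDirty[i]->uploadToConstBuffer();
    mDirty.clear();
}
```

Changing several parameters in a row and then calling flush() once results in a single upload for that datablock, which is the behaviour described above.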

As for the value of MaterialSizeInGpu:
We have 13 vec4s (from kD through detailOffsetScaleN[3]). Multiplied by 4 gives 52 (52 floats). Multiplied by 4 again (the sizeof float) to get the size in bytes.
Then we add the size of the texture indices (NUM_PBSM_TEXTURE_TYPES uint16 values, 2 bytes each) and finally the last float (mNormalMapWeight, 4 bytes). As simple as that.
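The arithmetic can be written out as a small standalone sketch, together with the three-vec4 struct from the question. Two things here are assumptions on my part: alignToNextMultiple is re-implemented locally as a stand-in for the Ogre helper, and NUM_PBSM_TEXTURE_TYPES is taken to be 14, which is what the two uvec4s in the shader struct imply (32 bytes minus the 4-byte mNormalMapWeight leaves 28 bytes of uint16). Check the enum in your Ogre version.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Local stand-in for the alignToNextMultiple helper used in the definition
// quoted above: rounds offset up to the next multiple of alignment.
inline std::size_t alignToNextMultiple( std::size_t offset, std::size_t alignment )
{
    assert( alignment != 0u );
    return ( ( offset + alignment - 1u ) / alignment ) * alignment;
}

// ASSUMPTION: 14 texture types, inferred from the shader struct (see text).
const std::size_t kNumPbsmTextureTypes = 14u;

const std::size_t kVec4Size = 4u * sizeof( float );  // 16 bytes

// Stock PBS struct: kD, kS, F0, normalWeights, cDetailWeights (5 vec4s)
// + detailOffsetScaleD[4] + detailOffsetScaleN[4] = 13 vec4s = 52 floats,
// then the uint16 texture indices and the trailing mNormalMapWeight float.
const std::size_t kPbsSizeInGpu = 13u * kVec4Size +
                                  kNumPbsmTextureTypes * sizeof( std::uint16_t ) +
                                  sizeof( float );
const std::size_t kPbsSizeInGpuAligned = alignToNextMultiple( kPbsSizeInGpu, kVec4Size );

// Struct from the question: param1, param2 (vec4) and param3 (uvec4).
// Three 16-byte members, no texture indices, no trailing float.
const std::size_t kCustomSizeInGpu = 3u * kVec4Size;
const std::size_t kCustomSizeInGpuAligned = alignToNextMultiple( kCustomSizeInGpu, kVec4Size );

static_assert( sizeof( float ) == 4u, "the arithmetic assumes 4-byte floats" );
static_assert( kPbsSizeInGpu == 52u * 4u + 14u * 2u + 4u, "must match the quoted formula" );
static_assert( kCustomSizeInGpu == 48u, "three vec4-sized members" );
```

So for the struct in the question, MaterialSizeInGpu would be 48 bytes, and since that is already a multiple of 16 the aligned size is the same.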
Note that you need to account for any padding between those variables according to the std140 rules. That's why we work only with vec4s; it makes things easier (two vec3 should take the same space as two vec4; two vec2 together can take the same space as one vec4 or two vec4 depending on what came earlier, so it can be tricky; and it gets worse when drivers get it wrong too).
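For the cases mentioned above, a rough sketch of how the std140 offsets work out on paper. It only covers non-array float, vec2, vec3 and vec4 members (alignments 4, 8, 16, 16 and sizes 4, 8, 12, 16) and rounds the total up to 16 bytes as for a struct used as an array element. The names Std140Type, std140Rule, alignUp and std140Offsets are made up for this example, and it says nothing about what a buggy driver will actually do.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper types for this sketch; not part of Ogre or OpenGL.
enum Std140Type { kFloat, kVec2, kVec3, kVec4 };

struct Std140Rule
{
    std::size_t alignment;  // base alignment in bytes
    std::size_t size;       // bytes actually occupied by the member
};

inline Std140Rule std140Rule( Std140Type type )
{
    Std140Rule rule = { 16u, 16u };  // vec4
    switch( type )
    {
    case kFloat: rule.alignment = 4u;  rule.size = 4u;  break;
    case kVec2:  rule.alignment = 8u;  rule.size = 8u;  break;
    case kVec3:  rule.alignment = 16u; rule.size = 12u; break;  // aligned like a vec4
    case kVec4:  break;
    }
    return rule;
}

inline std::size_t alignUp( std::size_t offset, std::size_t alignment )
{
    assert( alignment != 0u );
    return ( ( offset + alignment - 1u ) / alignment ) * alignment;
}

// Returns the byte offset of each member, in declaration order, and writes
// the struct size (rounded up to a multiple of 16) to outStructSize.
inline std::vector<std::size_t> std140Offsets( const std::vector<Std140Type> &members,
                                               std::size_t &outStructSize )
{
    std::vector<std::size_t> offsets;
    std::size_t cursor = 0u;
    for( std::size_t i = 0u; i < members.size(); ++i )
    {
        const Std140Rule rule = std140Rule( members[i] );
        cursor = alignUp( cursor, rule.alignment );  // insert padding if needed
        offsets.push_back( cursor );
        cursor += rule.size;
    }
    outStructSize = alignUp( cursor, 16u );
    return offsets;
}
```

With these rules, two vec3 land at offsets 0 and 16 (32 bytes, like two vec4), two vec2 on their own fit in 16 bytes, but a float followed by two vec2 spills into 32 bytes. By the spec a float right after a vec3 shares its 16-byte slot at offset 12, which is exactly the kind of case where I wouldn't trust every driver.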
The difference between MaterialSizeInGpu & MaterialSizeInGpuAligned is this: MaterialSizeInGpu is how many bytes get memcpy'd from C++ to GPU (copying no more than that avoids reading out of bounds in system memory), while MaterialSizeInGpuAligned also includes the final padding, i.e. it is the size each structure occupies in GPU memory.
e.g.:

Code:

for( all_materials_that_changed )
{
   memcpy( dataInGPU, datablock->dataInCPU, MaterialSizeInGpu );
   dataInGPU += MaterialSizeInGpuAligned;
   ++datablock;
}
Note that you don't need to memcpy. If you want to write whatever random stuff to the memory in GPU in uploadToConstBuffer, that's completely up to you. You can do whatever you want, I just thought personally that memcpy is the quickest since the data in CPU almost mirrors the data in GPU.
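A compilable version of that loop, as a minimal sketch. CpuMaterial, packMaterials and the k-prefixed constants are hypothetical names, and a plain byte vector stands in for the mapped const buffer. The struct deliberately ends in a lone float so the copied size (36 bytes) and the aligned stride (48 bytes) differ.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <vector>

// Hypothetical CPU-side material, laid out to mirror the GPU struct.
struct CpuMaterial
{
    float param1[4];
    float param2[4];
    float weight;  // lone trailing float, so copied size != aligned stride
};

const std::size_t kMaterialSizeInGpu = 9u * sizeof( float );  // bytes copied: 36
const std::size_t kMaterialSizeInGpuAligned =
    ( ( kMaterialSizeInGpu + 15u ) / 16u ) * 16u;             // stride in GPU: 48

static_assert( sizeof( CpuMaterial ) >= 9u * sizeof( float ),
               "never copy more bytes than the CPU struct holds" );

// Packs the materials back to back using the aligned stride. The returned
// byte vector stands in for the mapped const buffer in this sketch.
inline std::vector<unsigned char> packMaterials( const std::vector<CpuMaterial> &materials )
{
    std::vector<unsigned char> gpuData( materials.size() * kMaterialSizeInGpuAligned, 0u );
    unsigned char *dataInGPU = gpuData.data();
    for( std::size_t i = 0u; i < materials.size(); ++i )
    {
        assert( dataInGPU + kMaterialSizeInGpu <= gpuData.data() + gpuData.size() );
        // Copy only the bytes that exist on the CPU side...
        std::memcpy( dataInGPU, &materials[i], kMaterialSizeInGpu );
        // ...but advance by the padded size the shader expects.
        dataInGPU += kMaterialSizeInGpuAligned;
    }
    return gpuData;
}
```

The second material starts at byte 48, not 36; the 12 bytes in between are the final padding.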
Hope this clarifies everything.

Re: Matching Hlms PBS struct in GLSL/HLSL to C++

Post by xrgo »

Thank you very much! I was guessing it was something simple, that's why I asked via PM =)