HlmsBufferManager binds const buffers to vs and ps

Discussion area about developing with Ogre-Next (2.1, 2.2 and beyond)


User avatar
dark_sylinc
OGRE Team Member
OGRE Team Member
Posts: 5434
Joined: Sat Jul 21, 2007 4:55 pm
Location: Buenos Aires, Argentina
x 1342

Re: HlmsBufferManager binds const buffers to vs and ps

Post by dark_sylinc »

Please note that certain changes you were proposing would (seemingly) be mostly transparent, but the changes you are proposing to HlmsDatablock would break existing Hlms custom implementations for no reason other than "refactor".

This makes upgrading harder.

uploading of data from the datablocks is not nice by giving direct access to raw mapped memory - I would prefer the interface like I created in HlmsBufferPool with writeData, insertValue and insertValues - then the execution is safer as at least in debug it is possible to assert when buffer overflows (writing more data than specified by number of bytes per slot).

I've looked at the interface and appears to be ok. What I am wondering is... is it really needed?

All datablocks are meant to be fixed in GPU size (not dynamic); which makes the problem quite easy to debug either in a C++ debugger or RenderDoc.

uploadToExtraBuffer is literally a memcpy. HlmsUnlitDatablock::uploadToConstBuffer is 3 memcpys. HlmsPbsDatablock::uploadToConstBuffer is 2 mempcys.

Please note dstPtr is write-combined memory, thus for better performance the whole region should be uploaded.

User avatar
bishopnator
Goblin
Posts: 299
Joined: Thu Apr 26, 2007 11:43 am
Location: Slovakia / Switzerland
x 11

Re: HlmsBufferManager binds const buffers to vs and ps

Post by bishopnator »

If anybody starts learning new interface/implementation, it possible to do a lot of wrong calls and hence it is I think better to get as much as possible feedback in form of logging/assertions/exceptions to detect the problem. Providing raw pointer access is here I think unnecessary source of possible problems. If at the end the app is running, then of course nothing like that is needed. The problems occurs mostly during learning and trying to use the interface and implement new classes.

I am also more alert if I see in implementation like now HlmsPbs and HlmsUnlit some code (a lot?) which will be part also of completely new hlms implementations - then I am questioning such code why is it there, why is it duplicated, etc.? And trying to figure out the solution how to reuse such code - even better if base class force the implementation. Such constraints are not obvious and one has to deeply study the existing code.

In Ogre 1.x it was possible to almost instantly start with new material and write shaders for it. In Ogre 2.x it is significantly more difficult - I checked also HlmsLowLevel, but as it is not recommended to use for bigger scenes, I rejected the idea to even try to use it. I am aware that probably for games and apps close to games, the HlmsPbs and HlmsUnlit would be sufficient, but I have in mind an app closer to CAD applications (as I am working on CAD at the moment). In the current implementation, we use a lot of geometry shaders and work a lot with lines (instead of triangles) and in this case the HlmsPbs is not suitable. The HlmsUnlit probably yes, but it still contains some parts which are not very useful for injecting a geometry shader there (e.g. due to names like VStoPSblock or how it is called).

I am trying to write a base class for such custom hlms implementation where I can start from very beginning without taking care of any code in HlmsPbs and HlmsUnlit.

I would be very satisfied if any of the code which I wrote would land in the Ogre codebase, but it is not necessary and I am not pushing it :-) I am however very thankful for all your notes and time which you invest in Ogre (and also specifically in the communication with me). Let check how far we can bring the code (and probably the new tutorial/sample) with this custom hlms.

User avatar
dark_sylinc
OGRE Team Member
OGRE Team Member
Posts: 5434
Joined: Sat Jul 21, 2007 4:55 pm
Location: Buenos Aires, Argentina
x 1342

Re: HlmsBufferManager binds const buffers to vs and ps

Post by dark_sylinc »

If anybody starts learning new interface/implementation, it possible to do a lot of wrong calls and hence it is I think better to get as much as possible feedback in form of logging/assertions/exceptions to detect the problem. Providing raw pointer access is here I think unnecessary source of possible problems. If at the end the app is running, then of course nothing like that is needed. The problems occurs mostly during learning and trying to use the interface and implement new classes.

That is true. I just feel the solution is right there at our fingertips; in a way that users can opt in for safety and hand holding; while opt out users can just get the raw ptr.

For example (thinking out loud) if your interface has raw() to opt out; that may be a suitable compromise.

User avatar
bishopnator
Goblin
Posts: 299
Joined: Thu Apr 26, 2007 11:43 am
Location: Slovakia / Switzerland
x 11

Re: HlmsBufferManager binds const buffers to vs and ps

Post by bishopnator »

Did you check the HlmsBufferPool::insertValue and insertValues member method? Those 2 methods I added exactly to cover the "raw" or "formatted" access. It is just necessary to specify size in bytes by which internally the pointer is adjusted for the next write, but the pointer is returned back. I wanted to cover the functionality of writing e.g. 12 floats for a matrix, but adjusting the internal pointer by 16 as it is done in case of writing 3x4 matrix.

Btw. I found also problem why in shaders the 4x3 matrix multiplication didn't yield expected results - the input position was declared as float3 instead of float4 - after fixing it, now the multiplication works also with 4x4 as well as with 4x3 matrix.

User avatar
dark_sylinc
OGRE Team Member
OGRE Team Member
Posts: 5434
Joined: Sat Jul 21, 2007 4:55 pm
Location: Buenos Aires, Argentina
x 1342

Re: HlmsBufferManager binds const buffers to vs and ps

Post by dark_sylinc »

Today I woke up realizing what was bothering me about your insertValues interface.

Don't get me wrong, the interface appears to still be useful to, e.g. fill instance data (which is more chaotic and random-looking even if it's not); but this is in relation to the uploadToConstBuffer functionality.

The reason your interface was bothering me is that it philosophically goes in the opposite direction the industry (and OgreNext) is moving: which is to transparently share code like CUDA does.

The ultimate goal is for the code to work like this:

Code: Select all

// MyShareFile.h
struct MyMaterial
{
   float4 diffuse;
   float4 specular;
};

// MyDatablock.h
class MyDatablock : public Ogre::MyDatablock
{
    MyMaterial myMaterial;
public:
    void uploadToConstBuffer( char *dstPtr, uint8 dirtyFlags ) override
    {
        memcpy( dstPtr, &this->myMaterial, sizeof( myMaterial ) );
        // or alternatively:
        *(MyMaterial*)dstPtr = this->myMaterial;
    }
};

Not only uploadToConstBuffer becomes just one memcpy; but more importantly the file MyShareFile.h can be included by C++ and shaders transparently.

Of course what goes into MyShareFile.h can only be a limited subset because it must be an intersection of what all the shader languages (GLSL + HLSL + Metal) and C++ can compile without complaining.

Not all code needs to behave that way, but it should be possible to do this effortlessly.

The issues preventing this kind of transparent sharing right now are:

  1. C++ header lives in the C++ project, shader file lives in the data folder. We need an easy way for either C++ build system to add include folders to point to where data folders are, or for include headers in C++ to be inserted into shaders as generated strings. I suspect the former is easier than the latter given that C++ does not have good reflection capabilities; but it may be easy to solve with external tooling (e.g. a tool that reads the header and converts it into an array like bintoheader.py does). Then shaders would simply do @insertpiece( MyMaterialFile.h ) and C++ would need to set mPieceFiles[shaderStage]["MyMaterialFile.h"]).

  2. We don't have float3 / float4 datatypes, though we have something basic going on. Please note that we don't have to support math operations (e.g. c = a + b) or swizzling (e.g. a.xyzz). We simply need to be able to easily convert to/from Vector3 / Vector4 back and forth.

    • And perhaps some tricks like being able to pack/unpack two Vector2 into one float4. Or one Vector3 + float, etc.

  3. uint8 / uint16 datatypes are missing in shaders. There are extensions for this for DX12 and Vulkan, but they're laughably broken or half-baked. Only Metal has good support. Nonetheless some macros may be able to workaround the issue.

  4. Alignment rules don't match. I was hoping shader languages to address this, but 8 years later it remains unaddressed. For example float3 a[16] takes 16-bytes per entry. But float3 a; float b; will consume 16 bytes. Metal addressed this particular issue by having simd::float3 (16-bytes, 4 bytes gets wasted) and simd::packed_float3 (12 bytes).

    • The difference is that Apple included support for packed_float3 inside shaders (not just C++) and e.g. packed_float3 a[16] will work as you expect (12 bytes per entry), at the cost of GPU performance.

  5. OpenGL alignment has driver issues. This was a much bigger problem 8 years ago. It is now less of a problem because OpenGL is becoming less relevant. The problem is that you can write float3 foo; float foo; and in some drivers it will consume 16 bytes; in others it will consume 20 bytes. This is why you'll see many of our shaders workaround this via float4 foo_bar; and then macros: #define foo foo.xyz and #define bar bar.w

Right now what we are doing is that the C++ & shader parts closely resemble each other but not exactly, nor is it shared.

The best (and latest) example is the struct AtmoSettingsGpu in C++ is identical to its shader counterpart. In fact the data is memcpy'ed as is into a ConstBufferPacked via ConstBufferPacked::upload (because AtmosphereNpr::mHlmsBuffer is of type BT_DEFAULT instead of BT_DYNAMIC_*).

However we still have "packedParams1" & co. to deal with the alignment issues I mentioned, which are handled in AtmosphereNpr::setPackedParams in C++, and handled by macros in shader. Ideally that madness should not be needed and could be automated.

Note that we still do some underhanded tricks like storing the camera position into skyLightAbsorption.w, sunAbsorption.w and cameraDisplacement.w. It's the never-ending struggle of readability vs optimization. However just because I do it, it doesn't mean that you have to do it in your customized Hlms implementations.

A shame really, because this sort of packing should happen behind the scenes (Rust actually does this: Unless explicitly told, Rust will pack struct members in the order it feels like).

Note that certain compromises are possible. e.g. If pre-build tools like bintoheader.py or the like are used (or a python script that copy pastes a C++ header into the shader data path every time it is launched) it would be possible to stuff like the following:

Code: Select all

// Original.h (for C++ consumption)
struct AtmoSettingsGpu
{
    packed_float3 mieAbsorption;
    float finalMultiplier;
    uint16 myU16;
    uint16 myU16_2;
};

// Generated.h (for shader consumption, autogenerated "somehow")
struct AtmoSettingsGpu
{
    float4 mieAbsorption_finalMultiplier;
    uint myU16_myU16_2;
};

struct AtmoSettingsGpuUnpacked
{
    float3 mieAbsorption;
    float finalMultiplier;
    uint myU16;
    uint myU16_2;
};

#define EXTRACT_AtmoSettingsGpu( x ) float3 mieAbsorption = x.mieAbsorption_finalMultiplier.xyz; float finalMultiplier = x.mieAbsorption_finalMultiplier.w; uint myU16 = x.myU16_myU16_2 & 0xFFFFu; uint myU16_2 = (x.myU16_myU16_2 >> 16u) & 0xFFFFu;

#define FILL_AtmoSettingsGpuUnpacked( out, x ) out.mieAbsorption = x.mieAbsorption_finalMultiplier.xyz; out.finalMultiplier = x.mieAbsorption_finalMultiplier.w; out.myU16 = x.myU16_myU16_2 & 0xFFFFu; out.myU16_2 = (x.myU16_myU16_2 >> 16u) & 0xFFFFu;

// Actual shader

#include "Generated.h"

@property( syntax != metal )
	CONST_BUFFER( AtmoSettingsBuf, @value(atmosky_npr) )
	{
		AtmoSettingsGpu atmoSettings;
	};
@end

void main()
{
    EXTRACT_AtmoSettingsGpu( atmoSettings );
    // now we can use mieAbsorption et. al.
    
// Or alternatively: AtmoSettingsGpuUnpacked atmoSettingsU; FILL_AtmoSettingsGpuUnpacked( atmoSettingsU, atmoSettings ); }

The differences between EXTRACT_AtmoSettingsGpu & FILL_AtmoSettingsGpuUnpacked is that FILL_AtmoSettingsGpuUnpacked will feel more familiar and easier to use; but some shader compilers have trouble optimizing variables inside struct, thus EXTRACT_AtmoSettingsGpu may still be useful as it unpacks everything into local variables.

I'm leaving out some details like interleaving unsigned and signed integers. Not every operation needs to be supported anyway, but at least a subset. Ultimately it's not just shader languages, but also we're trying to share data between fundamentally different HW (that's why those obnoxious alignment rules exist after all).

These solutions have downsides. For example adding a precompilation step always introduces friction to the user:

  1. It may be hard to integrate to whatever build system they're using (CMake, Visual Studio, XCode, Android Studio, whatever).

  2. It may cause cryptic build errors because the tool system is incorrectly setup.

  3. It can increase build times (e.g. Python is not exactly a fast language; and non-python languages are hard to port to other systems, although I am getting fond of "lite" Rust as a replacement for Python), which increases demand to just run the precompilation step on demand (i.e. failing to run it may cause shaders & C++ to go out of sync).

  4. Overall entry barrier increases.

Well, this is long enough. The point I'm trying to make is that, if you find it feasible or reasonable, you should try to keep this in mind and if possible, focus your efforts into getting us closer to this goal.

Cheers

User avatar
Crystal Hammer
Gnome
Posts: 388
Joined: Sat Jun 23, 2007 5:16 pm
x 99

Re: HlmsBufferManager binds const buffers to vs and ps

Post by Crystal Hammer »

Interesting topic. Feels like I've already read 2 chapters of a book.
I'm not sure about half of it. But I can say 2 things.
Firstly from my side, it'd be trouble to merge now if there were suddenly many changes in datablocks or in shaders, in many places. I still didn't upgrade to 4.0 from 3.0 because of that 1 parameter :)

Plus if it was up to me (obviously it's not), I'd focus not on reworking internals (unless this gives notably more performace), but on new samples (like e.g. grass) and demos (e.g. combining PBS+HDR+SSAO+Refraction into one sample). I think that'd be helpful for some, also doing an engine basing on OgreNext. Still, I'm only seeing fixes in git lately so yeah no point wishing for any new stuff, even particles are still not merged.

Secondly, from my point of view I really don't mind such code:

Code: Select all

#define p_borderLimit			midf_c(  atmo.packedParams2.w )
#define p_skyColour			midf3_c( atmo.packedParams3.xyz )
#define p_densityDiffusion		midf_c(  atmo.packedParams3.w )
// clang-format on

const float3 cameraPos = float3( atmo.skyLightAbsorption.w, atmo.sunAbsorption.w,
								 atmo.cameraDisplacement.w );

If you say that GPUs still need that then it's fine. And I don't even see much point in adding more macros or other stuff to hide it and allow other names etc.

I have been using atmo for a long time and adding my own variables to it, mostly fog. It is great because I can easily now add new params to it and its cpp parts, and then use new atmo.params in (all) shaders.
I think this could be even possible by only using ConstBuffer? Is there a sample for this? If not, will there be, or is it easy already?
I don't even have Atmosphere (rendered), I only use it for all those parameters for shaders.

User avatar
bishopnator
Goblin
Posts: 299
Joined: Thu Apr 26, 2007 11:43 am
Location: Slovakia / Switzerland
x 11

Re: HlmsBufferManager binds const buffers to vs and ps

Post by bishopnator »

I tried to added the definitions of the structs through *.h files or *.hlsl files:
e.g. PassData.hlsl

Code: Select all

struct PassData
{
	float4x4 viewProj;
	float4 depthRange;
};

And then include it in shader:

Code: Select all

#include "PassData.hlsl"

// pass buffer
CONST_BUFFER_STRUCT_BEGIN(PassBuffer, 0)
{
	PassData passData;
}
CONST_BUFFER_STRUCT_END(passBuf);

(here it would be possible to use PassData directly instead of PassBuffer)

The problem is in HLSLIncludeHandler that the file is not found (I placed it next to VertexShader_vs.hlsl), so further tweaks would be needed. Somehow I don't understand the point, how the insertValue or insertValues would limit such usage. I can imagine, that if the includes in hlsl/glsl are properly resolved, it is possible to add include path for the c++ projects also into media folder (e.g. Media/Hlms/CustomHlms/HLSL) and then include the file with struct also in c++ code.
e.g. with following code snippet:

Code: Select all

#include <OgreShaderPrimitives.h>
namespace Ogre
{
    #include "PassData.h"
    // .. possible other headers used also by shaders
}

@Crystal Hammer I don't quite agree that the Ogre development should focus only on new features/samples/tutorials without spending much time on internals. The internal interface I think plays huge role about motivating new people to use Ogre. I will repeat myself here :-) In Ogre 1.x it is possible to start coding shaders immediately, in Ogre 2.x it is not so easy. I am trying to put a new base class together which will make this state easier.

User avatar
bishopnator
Goblin
Posts: 299
Joined: Thu Apr 26, 2007 11:43 am
Location: Slovakia / Switzerland
x 11

Re: HlmsBufferManager binds const buffers to vs and ps

Post by bishopnator »

I updated the code so the structure for Passbuffer is separated in its own file which is included by shader and c++. Everything seems to work as before. Is it way as you expected? For me it is something which is not possible to force through the base class and I am still giving more focus on the base class (HlmsExt) as I want to use it for my other custom HLMS implementation (not the one showing in the git repo).

User avatar
bishopnator
Goblin
Posts: 299
Joined: Thu Apr 26, 2007 11:43 am
Location: Slovakia / Switzerland
x 11

Re: HlmsBufferManager binds const buffers to vs and ps

Post by bishopnator »

I realized, that I forgot to push the changes. In the meantime I added code sharing between hlsl and c++ also for the instance data, material data and pass data - all 3 buffers used by the shaders now share the type between c++ and hlsl.

User avatar
Crystal Hammer
Gnome
Posts: 388
Joined: Sat Jun 23, 2007 5:16 pm
x 99

Re: HlmsBufferManager binds const buffers to vs and ps

Post by Crystal Hammer »

bishopnator wrote: Fri Jun 28, 2024 3:23 pm

motivating new people to use Ogre. I will repeat myself here :-) In Ogre 1.x it is possible to start coding shaders immediately, in Ogre 2.x it is not so easy. I am trying to put a new base class together which will make this state easier.

Right. Very good point, I agree.
I feel like I know something in shaders already. But that is only until I hit some wall and feel like I still don't know anything.
And to me it seems like knowing internals of HLMS PBS shaders is just mandatory now to modify them further.

User avatar
bishopnator
Goblin
Posts: 299
Joined: Thu Apr 26, 2007 11:43 am
Location: Slovakia / Switzerland
x 11

Re: HlmsBufferManager binds const buffers to vs and ps

Post by bishopnator »

I am running out of ideas how to fix shadow mapping in the CustomHlms. I thought that it must be related to the following code snipped from the HLSL:

Code: Select all

  float shadowConstantBias = -uintBitsToFloat( worldMaterialIdx[inVs_drawId].y ) * passBuf.depthRange.y;

  //We can't make the depth buffer linear without Z out in the fragment shader;
  //however we can use a cheap approximation ("pseudo linear depth")
  //see http://www.yosoygames.com.ar/wp/2014/01/linear-depth-buffer-my-ass/
  outVs_Position.z = outVs_Position.z + shadowConstantBias;

But even if I remove the update of the z position from the unlit hlms shaders, it renders the shadows properly on the ground plane. I copied the code to CustomHlms including setting of all the values, but the result is not as desired. Actually the shadows from one light are correctly mapped and from the other 2 incorrectly (the shadows are moving when camera orientation is updated).

Is there something special which must HLMS fulfills in order to render the objects properly in the shadow caster pass? In the implementation of HlmsUnlit and HlmsPbs I don't see anything special there - except this shadowConstantBias and update of .z value.

Does Ogre use some predefined pixel shader for the shadow casters? It seems that the pixel shaders are disabled in both unlit nad pbs implementations in the case of shadow casters. I am probably missing some crucial very important information here.

User avatar
dark_sylinc
OGRE Team Member
OGRE Team Member
Posts: 5434
Joined: Sat Jul 21, 2007 4:55 pm
Location: Buenos Aires, Argentina
x 1342

Re: HlmsBufferManager binds const buffers to vs and ps

Post by dark_sylinc »

Getting shadow maps to work correctly is like a rite of passage for graphics programmers.

But the first thing you need to do is check the output in RenderDoc matches the one with ground truth.
You can also do source-level debugging of vertex and pixel shaders in RenderDoc, which will let you understand better what's going wrong.

If you don't know how to use RenderDoc see this post and this video.

User avatar
bishopnator
Goblin
Posts: 299
Joined: Thu Apr 26, 2007 11:43 am
Location: Slovakia / Switzerland
x 11

Re: HlmsBufferManager binds const buffers to vs and ps

Post by bishopnator »

Super, I found problem with RenderDoc. Thanks man, I will update the source code and commit the updates.
Image

User avatar
bishopnator
Goblin
Posts: 299
Joined: Thu Apr 26, 2007 11:43 am
Location: Slovakia / Switzerland
x 11

Re: HlmsBufferManager binds const buffers to vs and ps

Post by bishopnator »

I pushed the updated implementation. There are still GLSL shaders missing - I will implement them as simple as possible to have a good overview of the code there. MY goal is also at least for this CustomHlms tutorial (if it will remain as tutorial) avoid those CrossPlatformSettings_piece_all.hlsl - I think for those simple shaders it is not necessary to use them.

User avatar
bishopnator
Goblin
Posts: 299
Joined: Thu Apr 26, 2007 11:43 am
Location: Slovakia / Switzerland
x 11

Re: HlmsBufferManager binds const buffers to vs and ps

Post by bishopnator »

I added also GLSL shaders - there were problems that GLSL doesn't support include directives without ARB_shading_language_include, but the extension according to its description is not really a support for including files - the user must define the content of the files before compiling the shader source. I added rather support to Ogre::HlmsExt to upload all .h files as pieces within calculateHashForPreCreate and calculateHashForPreCaster. In HLSL shaders I kept include directive, but in GLSL I used then insertpiece mechanism (which is possible to use also in HLSL if one would like to keep the shaders as similar as possible).

I think I am done with the base class and that example. If you will see any further improvements / required changes which you think are the blockers to integrate the code in ogre-next, let me know.

User avatar
dark_sylinc
OGRE Team Member
OGRE Team Member
Posts: 5434
Joined: Sat Jul 21, 2007 4:55 pm
Location: Buenos Aires, Argentina
x 1342

Re: HlmsBufferManager binds const buffers to vs and ps

Post by dark_sylinc »

Hi!

When it comes to GLSL & includes you have two alternatives OgreNext gives you:

  1. Call HighLevelGpuProgram::setEnableIncludeHeader( true ); / enable_include_header true. It is off by default because we must parse it looking for #include. The technique is very crude: We find and replace for instances of #include "Filename" and replace it with the contents of that file. The only issue is that it will mess with lines when reporting compilation warnings/errors. However the fully parsed file will be dumped to Ogre.log if it failed to compile.
  2. Use @piece and @insertpiece from Hlms.

Cheers
Matias

User avatar
bishopnator
Goblin
Posts: 299
Joined: Thu Apr 26, 2007 11:43 am
Location: Slovakia / Switzerland
x 11

Re: HlmsBufferManager binds const buffers to vs and ps

Post by bishopnator »

Good to know about the option with enabling includes. I did it with @insertpiece and the pieces are injected from c++ code. The Ogre::HlmsExt searches for the *.h files from the specified folders given in the constructor and every header file is then defined also as piece for the template. This way the support for Ogre::HlmsExt is automatic and it is not forced to be used. It is up to the developer to use the functionality and it is possible to use the functionality the same way for all shader templates (in the code in my git however I kept using #include in HLSL and @insertpiece in GLSL so both options are shown in the example).