Today I woke up realizing what was bothering me about your insertValues interface.
Don't get me wrong, the interface appears to still be useful to, e.g. fill instance data (which is more chaotic and random-looking even if it's not); but this is in relation to the uploadToConstBuffer functionality.
The reason your interface was bothering me is that it philosophically goes in the opposite direction to where the industry (and OgreNext) is moving, which is to transparently share code between CPU and GPU like CUDA does.
The ultimate goal is for the code to work like this:
// MyShareFile.h
struct MyMaterial
{
float4 diffuse;
float4 specular;
};
// MyDatablock.h
class MyDatablock : public Ogre::MyDatablock
{
MyMaterial myMaterial;
public:
void uploadToConstBuffer( char *dstPtr, uint8 dirtyFlags ) override
{
memcpy( dstPtr, &this->myMaterial, sizeof( myMaterial ) );
// or alternatively:
*(MyMaterial*)dstPtr = this->myMaterial;
}
};
Not only does uploadToConstBuffer become just one memcpy; more importantly, the file MyShareFile.h can be included by C++ and shaders transparently.
Of course what goes into MyShareFile.h can only be a limited subset, because it must be the intersection of what all the shader languages (GLSL + HLSL + Metal) and C++ can compile without complaining.
Not all code needs to behave that way, but it should be possible to do this effortlessly.
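To make the idea a bit more concrete, here is a minimal sketch of how the C++ side of such a shared file could be checked at compile time. The float4 struct and the uploadMaterial helper are hypothetical stand-ins invented for illustration (they are not existing OgreNext types or functions), and shaders are assumed to already provide float4 natively or through macros:

```cpp
#include <cstddef>
#include <cstring>
#include <type_traits>

// Hypothetical C++-only stand-in for the shader 'float4' type.
// Shaders are assumed to already have float4, so only C++ defines it.
#ifdef __cplusplus
struct float4
{
    float x, y, z, w;
};
#endif

// --- Shared part: what both C++ and the shaders would compile ---
struct MyMaterial
{
    float4 diffuse;
    float4 specular;
};
// --- End of shared part ---

// A single memcpy is only valid if the struct is trivially copyable and
// its C++ layout matches what the shader expects (two 16-byte vectors).
static_assert( std::is_trivially_copyable<MyMaterial>::value, "must be memcpy-able" );
static_assert( sizeof( MyMaterial ) == 32, "unexpected padding in MyMaterial" );
static_assert( offsetof( MyMaterial, specular ) == 16, "unexpected member offset" );

// Hypothetical free function doing what uploadToConstBuffer would do.
inline void uploadMaterial( char *dstPtr, const MyMaterial &material )
{
    std::memcpy( dstPtr, &material, sizeof( MyMaterial ) );
}
```

The static_asserts are the interesting part: if somebody adds a member that changes the layout, the C++ build breaks instead of the shader silently reading garbage.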
The issues preventing this kind of transparent sharing right now are:
- The C++ header lives in the C++ project, while the shader file lives in the data folder. We need an easy way either for the C++ build system to add include folders pointing to where the data folders are, or for C++ headers to be inserted into shaders as generated strings. I suspect the former is easier than the latter given that C++ does not have good reflection capabilities; but the latter may be easy to solve with external tooling (e.g. a tool that reads the header and converts it into an array like bintoheader.py does). Then shaders would simply do @insertpiece( MyMaterialFile.h ) and C++ would need to set mPieceFiles[shaderStage]["MyMaterialFile.h"].
- We don't have float3 / float4 datatypes in C++, though we have something basic going on. Please note that we don't have to support math operations (e.g. c = a + b) or swizzling (e.g. a.xyzz). We simply need to be able to easily convert to and from Vector3 / Vector4.
- uint8 / uint16 datatypes are missing in shaders. There are extensions for this in DX12 and Vulkan, but they're laughably broken or half-baked. Only Metal has good support. Nonetheless, some macros may be able to work around the issue.
- Alignment rules don't match. I was hoping shader languages would address this, but 8 years later it remains unaddressed. For example float3 a[16] takes 16 bytes per entry, but float3 a; float b; will consume 16 bytes in total. Metal addressed this particular issue by having simd::float3 (16 bytes, 4 bytes get wasted) and simd::packed_float3 (12 bytes).
- OpenGL alignment has driver issues. This was a much bigger problem 8 years ago. It is now less of a problem because OpenGL is becoming less relevant. The problem is that you can write float3 foo; float bar; and in some drivers it will consume 16 bytes; in others it will consume 20 bytes. This is why you'll see many of our shaders work around this via float4 foo_bar; and then macros: #define foo foo_bar.xyz and #define bar foo_bar.w
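The alignment points in the list above can be sketched on the C++ side as follows. All the types here (packed_float3, float3, FooBar, and a Vector3 stand-in that only mimics the shape of Ogre's Vector3) are hypothetical and defined locally for illustration; the sizes asserted are those of the C++ structs, which is what a shared header would need to keep in sync with the shader:

```cpp
#include <cstddef>

// Hypothetical mirror of a tightly packed 3-float vector: 12 bytes.
struct packed_float3
{
    float x, y, z;
};

// Hypothetical mirror of a 16-byte aligned 3-float vector: the last
// 4 bytes are padding, so an array of these takes 16 bytes per entry.
struct alignas( 16 ) float3
{
    float x, y, z;
};

static_assert( sizeof( packed_float3 ) == 12, "packed_float3 must be 12 bytes" );
static_assert( sizeof( float3 ) == 16, "float3 must be 16 bytes" );
static_assert( sizeof( float3[16] ) == 16 * 16, "16 bytes per array entry" );

// 'float3 foo; float bar;' stored as one 16-byte block, which is what
// the 'float4 foo_bar' workaround gives the shader side.
struct FooBar
{
    packed_float3 foo;
    float bar;
};

static_assert( sizeof( FooBar ) == 16, "FooBar must fit exactly one float4" );
static_assert( offsetof( FooBar, bar ) == 12, "bar must live in the .w slot" );

// Stand-in for a math vector class; conversions only, no math or swizzles.
struct Vector3
{
    float x, y, z;
};

inline packed_float3 toPackedFloat3( const Vector3 &v )
{
    return packed_float3{ v.x, v.y, v.z };
}

inline Vector3 toVector3( const packed_float3 &v )
{
    return Vector3{ v.x, v.y, v.z };
}

inline FooBar packFooBar( const Vector3 &foo, float bar )
{
    return FooBar{ toPackedFloat3( foo ), bar };
}
```

Note how little is needed: two conversion functions and a handful of static_asserts, with no arithmetic operators at all.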
Right now what we are doing is that the C++ & shader parts closely resemble each other but not exactly, nor is it shared.
The best (and latest) example is AtmoSettingsGpu: the struct in C++ is identical to its shader counterpart. In fact the data is memcpy'ed as is into a ConstBufferPacked via ConstBufferPacked::upload (because AtmosphereNpr::mHlmsBuffer is of type BT_DEFAULT instead of BT_DYNAMIC_*).
However we still have "packedParams1" & co. to deal with the alignment issues I mentioned, which are handled in AtmosphereNpr::setPackedParams in C++, and handled by macros in the shader. Ideally that madness should not be needed and could be automated.
Note that we still do some underhanded tricks like storing the camera position into skyLightAbsorption.w, sunAbsorption.w and cameraDisplacement.w. It's the never-ending struggle of readability vs optimization. However just because I do it, it doesn't mean that you have to do it in your customized Hlms implementations.
A shame really, because this sort of packing should happen behind the scenes (Rust actually does this: unless explicitly told otherwise, Rust will lay out struct members in whatever order it feels like).
Note that certain compromises are possible. E.g. if pre-build tools like bintoheader.py or the like are used (or a Python script that copy-pastes a C++ header into the shader data path every time it is launched), it would be possible to do stuff like the following:
// Original.h (for C++ consumption)
struct AtmoSettingsGpu
{
packed_float3 mieAbsorption;
float finalMultiplier;
uint16 myU16;
uint16 myU16_2;
};
// Generated.h (for shader consumption, autogenerated "somehow")
struct AtmoSettingsGpu
{
float4 mieAbsorption_finalMultiplier;
uint myU16_myU16_2;
};
struct AtmoSettingsGpuUnpacked
{
float3 mieAbsorption;
float finalMultiplier;
uint myU16;
uint myU16_2;
};
#define EXTRACT_AtmoSettingsGpu( x ) float3 mieAbsorption = x.mieAbsorption_finalMultiplier.xyz; float finalMultiplier = x.mieAbsorption_finalMultiplier.w; uint myU16 = x.myU16_myU16_2 & 0xFFFFu; uint myU16_2 = (x.myU16_myU16_2 >> 16u) & 0xFFFFu;
#define FILL_AtmoSettingsGpuUnpacked( out, x ) out.mieAbsorption = x.mieAbsorption_finalMultiplier.xyz; out.finalMultiplier = x.mieAbsorption_finalMultiplier.w; out.myU16 = x.myU16_myU16_2 & 0xFFFFu; out.myU16_2 = (x.myU16_myU16_2 >> 16u) & 0xFFFFu;
// Actual shader
#include "Generated.h"
@property( syntax != metal )
CONST_BUFFER( AtmoSettingsBuf, @value(atmosky_npr) )
{
AtmoSettingsGpu atmoSettings;
};
@end
void main()
{
EXTRACT_AtmoSettingsGpu( atmoSettings );
// now we can use mieAbsorption et al.
// Or alternatively:
AtmoSettingsGpuUnpacked atmoSettingsU;
FILL_AtmoSettingsGpuUnpacked( atmoSettingsU, atmoSettings );
}
The difference between EXTRACT_AtmoSettingsGpu & FILL_AtmoSettingsGpuUnpacked is that FILL_AtmoSettingsGpuUnpacked will feel more familiar and easier to use; but some shader compilers have trouble optimizing variables inside a struct, thus EXTRACT_AtmoSettingsGpu may still be useful as it unpacks everything into local variables.
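For the macros above to decode correctly, the C++ side has to place myU16 in the low 16 bits and myU16_2 in the high 16 bits of the packed uint. A small C++ sketch, with hypothetical helper names, makes that bit layout explicit instead of relying on struct member order:

```cpp
#include <cstdint>

// Hypothetical helper: packs two 16-bit values the way the EXTRACT_ and
// FILL_ macros expect (first member in the low bits, second in the high).
inline std::uint32_t packU16Pair( std::uint16_t low, std::uint16_t high )
{
    return static_cast<std::uint32_t>( low ) |
           ( static_cast<std::uint32_t>( high ) << 16u );
}

// C++ equivalents of the shader-side unpacking expressions.
inline std::uint32_t unpackLowU16( std::uint32_t packed )
{
    return packed & 0xFFFFu;
}

inline std::uint32_t unpackHighU16( std::uint32_t packed )
{
    return ( packed >> 16u ) & 0xFFFFu;
}
```

On a little-endian CPU, memcpy'ing a struct with two consecutive uint16 members produces the same bits as packU16Pair, so the plain memcpy upload would keep working there; the explicit shifts just remove the dependency on endianness.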
I'm leaving out some details like interleaving unsigned and signed integers. Not every operation needs to be supported anyway, but at least a subset. Ultimately it's not just shader languages, but also we're trying to share data between fundamentally different HW (that's why those obnoxious alignment rules exist after all).
These solutions have downsides. For example adding a precompilation step always introduces friction to the user:
- It may be hard to integrate into whatever build system they're using (CMake, Visual Studio, Xcode, Android Studio, whatever).
- It may cause cryptic build errors because the tooling is set up incorrectly.
- It can increase build times (e.g. Python is not exactly a fast language; and non-Python languages are hard to port to other systems, although I am getting fond of "lite" Rust as a replacement for Python), which increases demand to just run the precompilation step on demand (i.e. failing to run it may cause shaders & C++ to go out of sync).
- Overall entry barrier increases.
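To give a rough idea of how small the precompilation step itself could be, here is a sketch in C++ of a function that wraps the text of a shared header into a C++ string constant, which could then be registered as a piece file. The function name and the output format are made up for illustration; this is not what bintoheader.py emits:

```cpp
#include <string>

// Hypothetical generator: wraps the text of a shared header into a C++
// snippet that declares a string constant called 'symbolName'.
inline std::string headerToStringLiteral( const std::string &symbolName,
                                          const std::string &headerText )
{
    std::string out = "static const char " + symbolName + "[] =\n\"";
    for( char c : headerText )
    {
        switch( c )
        {
        case '\\':
            out += "\\\\";  // escape backslashes
            break;
        case '"':
            out += "\\\"";  // escape double quotes
            break;
        case '\n':
            // Emit an escaped newline, close this literal and open a new
            // one on the next line (adjacent literals get concatenated).
            out += "\\n\"\n\"";
            break;
        case '\r':
            break;  // drop carriage returns from Windows line endings
        default:
            out += c;
            break;
        }
    }
    out += "\";\n";
    return out;
}
```

The friction listed above still applies: something has to call this before the C++ build runs, and forgetting to do so is exactly how shaders and C++ go out of sync.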
Well, this is long enough. The point I'm trying to make is that, if you find it feasible or reasonable, you should try to keep this in mind and, if possible, focus your efforts on getting us closer to this goal.
Cheers