Maybe I'm missing something, because this seems just awful.
Apparently, when a mesh with bone weights is loaded, Mesh::_compileBoneAssignments() is called. Among other things it does, it calculates the maximum number of blend weights for any vertex in the mesh, which is clamped to OGRE_MAX_BLEND_WEIGHTS.
After calculating this number, the number is used to pack the blend weights together as tightly as possible. Then it throws away the number!
What this means is that you can't just have a vertex shader that works for 4 blend weights and use that for everything. That won't work because if you have a mesh formatted with only 2 blend weights per vertex and you feed that to a 4 blend weight vertex shader, it'll see the 2 blend weights from the current vertex plus 2 unwanted blend weights from the next vertex!
So meshes get formatted one of 4 different ways as far as blend weights go, and the user is just supposed to know which format will be chosen, and assign the correct vertex shader to each one on a per-mesh basis? Horrible!
That means more permutations of shaders that need to be maintained--as if we don't have enough shader permutations to manage.
It means additional bookkeeping for users to keep the correct vertex shader permutations assigned to the correct meshes.
It means a fragile system that can break due to a seemingly innocuous change to a mesh--anything that changes the number of blend weights.
Why?
Why not just format with OGRE_MAX_BLEND_WEIGHTS weights for each vertex? Padding with zeros of course. If memory is a concern, why not use bytes for the weights instead of floats?
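To make the padding suggestion concrete, here is a minimal sketch of what "always format with the maximum number of weights" would mean for one vertex. It is not Ogre code: `kMaxBlendWeights` is an assumed stand-in for OGRE_MAX_BLEND_WEIGHTS, and `padWeights` is a hypothetical helper.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <vector>

// Assumed stand-in for OGRE_MAX_BLEND_WEIGHTS; 4 is used here for illustration.
constexpr std::size_t kMaxBlendWeights = 4;

// Hypothetical helper: copies one vertex's weights into a fixed-width record
// and leaves every unused slot at zero.
std::array<float, kMaxBlendWeights> padWeights(const std::vector<float>& weights) {
    assert(weights.size() <= kMaxBlendWeights);
    std::array<float, kMaxBlendWeights> padded{};  // value-initialised to 0.0f
    for (std::size_t i = 0; i < weights.size(); ++i) {
        padded[i] = weights[i];
    }
    return padded;
}
```

With this layout every mesh would have the same weight stride, so a single 4-weight shader would read well-defined zeros for the unused slots, at the cost of the extra memory.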
Mesh and the formatting of blend weights...
- lunkhound
- Gremlin
- Posts: 169
- Joined: Sun Apr 29, 2012 1:03 am
- Location: Santa Monica, California
- x 17
- c6burns
- Beholder
- Posts: 1511
- Joined: Fri Feb 22, 2013 4:44 am
- Location: Deep behind enemy lines
- x 134
Re: Mesh and the formatting of blend weights...
For some reason this just doesn't bug me at all, but maybe it's just me. Changing the number of blend weights on a model is a big deal (at least to me). It's not something your artists should be doing without considering the effects on the engine. I remember CE3 not being able to handle anything but exactly 4. I guess if the number of weights is being calculated, it would be convenient to have the result saved somewhere so you can pass it as a uniform, or use it as a preprocessor var. As for using bytes instead of floats, the reason not to would of course be precision.
- lunkhound
Re: Mesh and the formatting of blend weights...
I would think that 8 bits would be adequate precision for blend weights. If not, 16 bits would surely be enough. And with 16 bits it would still be less memory on average compared to packed floats.
OK here's an idea: we always send a ubyte4 for the blend indices -- what if we reserve index 0xff to indicate invalid blend weights. That way a shader could use dynamic branching to skip the blend weights it shouldn't be looking at.
That way it's up to the shader -- you can either have a less-than-optimal shader that works with all meshes, or you can keep doing things as they are now with no performance impact.
Either way, it would be nice to save off that blend-weight-stride value on the mesh so that the vertex-shader assignment could be automated in some way. Also it would be nice to be able to specify whether to use bytes, shorts or floats for the blend weights.
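Here is a rough CPU-side model of the 0xff idea, just to show the control flow a "works with every mesh" shader would have. Everything in it is an assumption for illustration: bones are reduced to 1D offsets, and `kInvalidBone` and `skin1D` are hypothetical names, not anything in Ogre.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Proposed sentinel: a blend index of 0xff marks an unused weight slot.
constexpr std::uint8_t kInvalidBone = 0xff;

// Hypothetical 1D "skinning": each bone is just an offset added to x.
// Slots whose index is the sentinel are skipped, so whatever sits in
// their weight slot is never read.
float skin1D(float x,
             const std::array<std::uint8_t, 4>& indices,
             const std::array<float, 4>& weights,
             const std::vector<float>& boneOffsets) {
    float result = 0.0f;
    for (std::size_t i = 0; i < indices.size(); ++i) {
        if (indices[i] == kInvalidBone) {
            continue;  // the dynamic branch: this slot carries no valid weight
        }
        assert(indices[i] < boneOffsets.size());
        result += weights[i] * (x + boneOffsets[indices[i]]);
    }
    return result;
}
```

A shader compiled for a fixed weight count would simply never look at the sentinel slots, so the existing per-count permutations would behave as they do today.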
- c6burns
Re: Mesh and the formatting of blend weights...
Dynamic branching as a requirement for hardware skinning doesn't seem like the best solution to me. I get that you are trying to avoid having different materials/programs for models with different numbers of weights, but I don't think that's the right way to go for all users of Ogre. Why not store the return value of Mesh::_rationaliseBoneAssignments in a member variable and use it to determine the correct program for the material? You could use the preprocessor to generate programs for 2, 3 and 4 weights in advance and therefore have no branching at all.
16 bit is probably enough precision, but then aren't you pushing an extra divide operation into the vertex program to translate from integer back to float? Maybe I'm misunderstanding something there.
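A minimal sketch of the "pick the program from the stored weight count" idea, assuming the count has been saved somewhere. The naming scheme and the `NUM_WEIGHTS` define are made up for illustration; neither function is part of Ogre.

```cpp
#include <cassert>
#include <string>

// Hypothetical: the define a build step would pass to the shader
// preprocessor when generating one program per weight count.
std::string skinningDefines(int weightsPerVertex) {
    assert(weightsPerVertex >= 1 && weightsPerVertex <= 4);
    return "NUM_WEIGHTS=" + std::to_string(weightsPerVertex);
}

// Hypothetical: maps a mesh's weight count to the name of the
// pre-generated program, so no per-mesh bookkeeping is done by hand.
std::string skinningProgramName(const std::string& base, int weightsPerVertex) {
    assert(weightsPerVertex >= 1 && weightsPerVertex <= 4);
    return base + "_" + std::to_string(weightsPerVertex) + "wpv";
}
```

For example, a mesh with 3 weights per vertex and a base name of "Skin" would resolve to "Skin_3wpv", compiled with "NUM_WEIGHTS=3".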
- lunkhound
Re: Mesh and the formatting of blend weights...
No, dynamic branching would not be required. If the unused indices were set to 0xff, the existing way of doing it would still work as usual. If you are doing all the shader permutations, that would still work -- no dynamic branching. If you assign the wrong version shader, it would still blow up -- same as now. The difference is that it would be possible to write a single shader that handles all cases, at the cost of some dynamic branching.
We agree that Mesh should save the blend-weight-stride (the return value of _rationaliseBoneAssignments) in a member variable to aid with automatically selecting the correct vertex program. Seems like a no-brainer.
For 16-bit (or 8-bit) you do need to convert it in the shader, but it isn't a division, just multiplication by 1/65535 (or whatever). I would think the memory and memory bandwidth savings would be worth it. The scale factor could be passed in a uniform so that the same shader could be used for 8-bit or 16-bit.
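The conversion is small enough to sketch. This is an illustration only: `packWeight16` and `unpackWeight` are hypothetical helpers, the first standing in for what an exporter would do, the second for the multiply in the vertex program with the scale supplied as a uniform.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

// Hypothetical packing step: maps a weight in [0, 1] to a 16-bit integer.
std::uint16_t packWeight16(float w) {
    assert(w >= 0.0f && w <= 1.0f);
    return static_cast<std::uint16_t>(std::lround(w * 65535.0f));
}

// Hypothetical shader-side step: one multiply by a scale factor.
// Pass 1/65535 for 16-bit data or 1/255 for 8-bit data, so the same
// code serves both formats.
float unpackWeight(std::uint32_t raw, float scale) {
    return static_cast<float>(raw) * scale;
}
```

The round trip loses at most about half a quantisation step, i.e. roughly 1/131070 for 16-bit weights and 1/510 for 8-bit weights.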
- dark_sylinc
- OGRE Team Member
- Posts: 4212
- Joined: Sat Jul 21, 2007 4:55 pm
- Location: Buenos Aires, Argentina
- x 802
Re: Mesh and the formatting of blend weights...
Hi.
There are four factors at play here:
1. Blend indices are stored as VET_UBYTE4. Because 4 bytes is the minimum word size for GPUs, having 1 index or 4 indices uses the same amount of vertex space. So neither the value of OGRE_MAX_BLEND_WEIGHTS nor the actual number of blends per vertex affects anything in this case.
This limits the max. bone count to 256 though (which is really high anyway) as the indices are 8-bit.
2. The weights are stored as floats. Having 4 weights occupies more space than 1. So this attribute is the one affected the most. But... (go to point 3)
3. At least on D3D9 (and IIRC OpenGL), if you declare "float4 BlendWght : BLENDWEIGHT;" when the vertex format only contains one blend weight, the other 3 weights will be zeroed. So a shader prepared to deal with 4 weights per vertex can handle a mesh with 1, 2, or 3 weights per vertex just fine. However, feeding a mesh with 4 weights per vertex to a shader prepared for 2 weights per vertex will glitch.
I don't know about D3D11 though. I know the API is much stricter about the vertex layout matching the shader, so my guess is this trick won't work. But D3D11 is usually paired with RTSS (1.x) and HLMS (2.x), which handle the issue automatically, so it's not a big problem.
4. In practice, even though you can assign a 4 weights per vertex (wpv) shader to a 2 wpv model, or a 2 wpv shader to a 1 wpv model, and they will work fine, you don't want to do that for performance reasons.
On a skeletally animated model, most of the vertex shader time is spent skinning, and you're literally doubling, tripling or even quadrupling its work on weights that are 0. The performance impact becomes noticeable.
lunkhound wrote: "So meshes get formatted one of 4 different ways as far as blend weights go, and the user is just supposed to know which format will be chosen, and assign the correct vertex shader to each one on a per-mesh basis? Horrible!
That means more permutations of shaders that need to be maintained--as if we don't have enough shader permutations to manage.
It means additional bookkeeping for users to keep the correct vertex shader permutations assigned to the correct meshes.
It means a fragile system that can break due to a seemingly innocuous change to a mesh--anything that changes the number of blend weights."
You forgot that you need to match the caster's vertex shader as well!
This is why I started the Hlms. The amount of permutations explodes and also assumes the artist can assign the right material with the right shader for every model.
The permutations are literally O(2^N) complexity: W/ skeleton, w/out skeleton. With 1/2/3/4 wpv. With shadows, without shadows. With normal mapping (normals and tangents need to be skinned too with 1/2/3/4 wpv), w/out normal mapping. With Instancing, w/out instancing.
Considering you want to support all of those combinations, that's 2^7 = 128 different shaders.
It's just too much to handle by hand. What's worse is that you will probably never use all those 128 permutations, but during production you don't know yet which ones will be used.
lunkhound wrote: "Why not just format with OGRE_MAX_BLEND_WEIGHTS weights for each vertex? Padding with zeros of course. If memory is a concern, why not use bytes for the weights instead of floats?"
I don't think 8 bits have enough accuracy to store the weights; however, storing the weights as unsigned shorts is something I'm strongly considering for 2.0. It's a bit tricky because with 1 wpv you'd still want to use 1 float: 1 short is 2 bytes, and the minimum the GPU requires is 4 bytes, so you would waste 2 bytes if you chose 16-bit for 1 wpv models.
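The size trade-off can be worked out with a few lines of arithmetic. This is only a sketch under the assumption stated in the post, that the attribute is rounded up to a multiple of 4 bytes; `weightAttributeBytes` is a hypothetical helper.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper: bytes one vertex spends on blend weights,
// assuming the attribute is rounded up to a 4-byte boundary.
std::size_t weightAttributeBytes(std::size_t weightsPerVertex,
                                 std::size_t bytesPerWeight) {
    assert(weightsPerVertex >= 1 && weightsPerVertex <= 4);
    const std::size_t raw = weightsPerVertex * bytesPerWeight;
    return (raw + 3) / 4 * 4;  // round up to the next multiple of 4
}
```

Under that assumption, 16-bit weights cost 4, 4, 8 and 8 bytes for 1 to 4 wpv, against 4, 8, 12 and 16 bytes for floats, so shorts save nothing at 1 wpv and half the space at 2 and 4 wpv.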
PS. To know how many weights per vertex a submesh uses (aka the return value from _rationaliseBoneAssignments), look at its vertex format: find the element with the VES_BLEND_WEIGHTS semantic and get the weights-per-vertex count with VertexElement::getTypeCount.
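The lookup described in the PS boils down to scanning the vertex format for the blend-weight element and reading its component count. The sketch below models that with simplified stand-in types; `Semantic`, `ElementType`, `Element`, `typeCount` and `weightsPerVertex` are all hypothetical and are not the real Ogre classes, which would be reached through the mesh's vertex declaration instead.

```cpp
#include <cassert>
#include <vector>

// Simplified stand-ins for illustration only, NOT Ogre's own types.
enum class Semantic { Position, BlendWeights, BlendIndices, Normal };
enum class ElementType { Float1 = 1, Float2 = 2, Float3 = 3, Float4 = 4 };

struct Element {
    Semantic semantic;
    ElementType type;
};

// Component count of an element type (1 for Float1, up to 4 for Float4).
int typeCount(ElementType type) {
    const int n = static_cast<int>(type);
    assert(n >= 1 && n <= 4);
    return n;
}

// Returns the weights per vertex, or 0 if the format has no blend weights.
int weightsPerVertex(const std::vector<Element>& declaration) {
    for (const Element& e : declaration) {
        if (e.semantic == Semantic::BlendWeights) {
            return typeCount(e.type);
        }
    }
    return 0;
}
```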
- lunkhound
Re: Mesh and the formatting of blend weights...
Hi dark_sylinc,
Thanks for chiming in. But I have to disagree with you on point #3. A shader expecting 4 weights per vertex doesn't work when passed fewer weights. I was wrong in my first post in this thread when I claimed that:
"it'll see the 2 blend weights from the current vertex plus 2 unwanted blend weights from the next vertex"
That's not what happens; actually the float4 received by the shader gets padded out with zeros -- except for the last component, which gets a one!
I guess it works the same as the POSITION semantic. In memory it is stored as 3-vectors, but when it gets to the shader it is a float4 with a 1.0 in the w field.
See this 9+ year old post about it. And it seems to be true for D3D11 as well. This explains why my 3-weights shader was working fine (as long as I set OGRE_MAX_BLEND_WEIGHTS to 3) but my 4-weights shader was glitching.
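A small model of the padding behaviour described above, assuming the missing components default to (0, 0, 0, 1) as observed in this thread. `fetchAsFloat4` is a hypothetical stand-in for the hardware's attribute fetch, not an API call.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Hypothetical model of the attribute fetch: an N-component attribute is
// expanded to a float4 whose missing components default to (0, 0, 0, 1).
std::array<float, 4> fetchAsFloat4(const float* data, std::size_t count) {
    assert(count >= 1 && count <= 4);
    std::array<float, 4> v = {0.0f, 0.0f, 0.0f, 1.0f};
    for (std::size_t i = 0; i < count; ++i) {
        v[i] = data[i];
    }
    return v;
}
```

Under this model a 3-weight mesh read as float4 arrives with a fourth weight of 1, so the weights a 4-weight shader sees sum to 2 instead of 1, which would explain the glitch. A shader that declares only a float3 never reads that last component and is unaffected.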
Also if you've only got 1 weight per vertex, you don't really need to store any weights at all -- they'd all be 1.
Thanks for the tip about digging out the wpv from the vertex format. I think I'll use that.
- lunkhound
Re: Mesh and the formatting of blend weights...
I made a patch so that the user can choose which format to store blend weights in (float, short, or byte).
https://bitbucket.org/sinbad/ogre/pull- ... ights/diff