Because it's not?
You're omitting a lot of details in the implementation.
When you do:
finalVertex = viewProj * (worldSkinning[idx] * inputVertex);
somethingElseOut = viewProj * somethingElseIn;
Ogre needs to: concatenate N world matrices (world * localSkinning[i]) for each of the M objects and send them. Per pass, it sends the viewProj matrix.
When you do:
// worldViewProj = viewProj * world, concatenated once per object on the CPU
finalVertex = worldViewProj * (localSkinning[idx] * inputVertex);
somethingElseOut = viewProj * somethingElseIn;
Ogre needs to: send N local matrices for each of the M objects, plus concatenate one extra world matrix against viewProj (worldViewProj = viewProj * world) and send it for each of the M objects. Per pass, it still sends the viewProj matrix.
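To make the CPU-side difference concrete, here is a minimal sketch of both paths using plain Python lists as 4x4 matrices. The helper names and the stand-in matrices (a uniform scale for viewProj, translations for the bones) are made up for illustration and are not Ogre code. It only shows that both paths produce the same final vertex while doing a different number of concatenations per object.

```python
# Hypothetical helpers; nested lists stand in for 4x4 matrices
# (column-vector convention, so transforms apply right to left).

def mat_mul(a, b):
    """Return the 4x4 matrix product a * b."""
    return [[sum(a[r][k] * b[k][c] for k in range(4)) for c in range(4)]
            for r in range(4)]

def mat_vec(m, v):
    """Return the product of a 4x4 matrix and a 4-component vector."""
    return [sum(m[r][k] * v[k] for k in range(4)) for r in range(4)]

def translation(x, y, z):
    return [[1, 0, 0, x], [0, 1, 0, y], [0, 0, 1, z], [0, 0, 0, 1]]

def scale(s):
    return [[s, 0, 0, 0], [0, s, 0, 0], [0, 0, s, 0], [0, 0, 0, 1]]

view_proj = scale(2)                # stand-in, not a real projection
world = translation(10, 0, 0)       # the object's world matrix
local_skinning = [translation(0, 1, 0), translation(0, 0, 3)]  # N = 2 bones

# World-space path: N matrix concatenations per object on the CPU.
world_skinning = [mat_mul(world, bone) for bone in local_skinning]
cpu_concats_world_path = len(local_skinning)

# Local-space path: a single concatenation per object on the CPU.
world_view_proj = mat_mul(view_proj, world)
cpu_concats_local_path = 1

input_vertex = [1, 2, 3, 1]
idx = 1
final_world_path = mat_vec(view_proj, mat_vec(world_skinning[idx], input_vertex))
final_local_path = mat_vec(world_view_proj, mat_vec(local_skinning[idx], input_vertex))

print(final_world_path == final_local_path)  # both paths agree
print(cpu_concats_world_path, cpu_concats_local_path)
```

With 2 bones the saving is 2 concatenations versus 1; with a typical skeleton the gap grows linearly with the bone count.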
On the GPU side, having one extra matrix consumes more SGPRs (scalar registers), which increases register pressure.
On the CPU side, in theory we send 64 more bytes per object than in the first approach (one extra 4x4 float matrix), but we save lots of matrix concatenations by not having to do world * localSkinning[i] on the CPU. In practice, however, the user will need this data anyway (e.g. for bone attachments), so the concatenations cannot be avoided.
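The 64-byte figure can be checked with a quick back-of-the-envelope calculation. This sketch assumes full 4x4 matrices of 32-bit floats, which is what 64 bytes implies; an engine may pack its matrices differently, and the function names and the bone count of 60 are purely illustrative.

```python
# Assumption: each matrix is 4x4 32-bit floats = 16 * 4 = 64 bytes.
MATRIX_BYTES = 4 * 4 * 4

def upload_bytes_world_space(num_bones):
    """Per-object upload when sending pre-concatenated world-space bones."""
    return num_bones * MATRIX_BYTES

def upload_bytes_local_space(num_bones):
    """Per-object upload when sending local bones plus one extra matrix."""
    return num_bones * MATRIX_BYTES + MATRIX_BYTES

num_bones = 60  # illustrative bone count
extra = upload_bytes_local_space(num_bones) - upload_bytes_world_space(num_bones)
print(upload_bytes_world_space(num_bones), upload_bytes_local_space(num_bones), extra)
```

The extra cost is constant per object regardless of bone count, which is why it only matters if you are heavily bandwidth bound.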
If the skeletal system is specialized for a particular purpose (e.g. crowds) where you can get away with not doing world * localSkinning[i] on the CPU, then local space is likely to be a win (unless you're heavily bandwidth bound or heavily limited by SGPR pressure, the latter being extremely rare). Bonus points if you don't need viewProj * somethingElseIn, which means you can skip sending viewProj.
But again, that only works in specialized cases, and a generic engine often can't assume that.
This discussion also ignores that the vertex shader often needs to know the final position in world space, hence you actually need:
worldSpaceVertex = worldSkinning[idx] * inputVertex;
// Do something with worldSpaceVertex
finalVertex = viewProj * worldSpaceVertex;
somethingElseOut = viewProj * somethingElseIn;
vs
worldSpaceVertex = worldMatrix * (localSkinning[idx] * inputVertex);
// Do something with worldSpaceVertex
finalVertex = viewProj * worldSpaceVertex;
somethingElseOut = viewProj * somethingElseIn;
In this case, the local-space version of the vertex shader needs to do one extra matrix multiplication per vertex, for every object, which is definitely more expensive.
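As a rough illustration of that per-vertex cost, the sketch below emulates both shader variants in Python and counts matrix-vector multiplications. It is a simplification under stated assumptions: one bone per vertex (no weight blending), translation-only stand-in matrices, and hypothetical function names. The shared somethingElseOut line is left out because it costs the same in both variants.

```python
def mat_mul(a, b):
    """Return the 4x4 matrix product a * b."""
    return [[sum(a[r][k] * b[k][c] for k in range(4)) for c in range(4)]
            for r in range(4)]

def mat_vec(m, v, counter):
    """Multiply a 4x4 matrix by a vector and count the operation."""
    counter[0] += 1
    return [sum(m[r][k] * v[k] for k in range(4)) for r in range(4)]

def translation(x, y, z):
    return [[1, 0, 0, x], [0, 1, 0, y], [0, 0, 1, z], [0, 0, 0, 1]]

def vs_world_variant(view_proj, world_skinning, idx, vertex, counter):
    # worldSpaceVertex = worldSkinning[idx] * inputVertex
    world_space_vertex = mat_vec(world_skinning[idx], vertex, counter)
    return world_space_vertex, mat_vec(view_proj, world_space_vertex, counter)

def vs_local_variant(view_proj, world, local_skinning, idx, vertex, counter):
    # worldSpaceVertex = worldMatrix * (localSkinning[idx] * inputVertex)
    local_vertex = mat_vec(local_skinning[idx], vertex, counter)
    world_space_vertex = mat_vec(world, local_vertex, counter)
    return world_space_vertex, mat_vec(view_proj, world_space_vertex, counter)

view_proj = translation(0, 0, -5)   # stand-in, not a real projection
world = translation(10, 0, 0)
local_skinning = [translation(0, 1, 0)]
world_skinning = [mat_mul(world, bone) for bone in local_skinning]
vertices = [[0, 0, 0, 1], [1, 0, 0, 1], [0, 2, 0, 1]]

world_count, local_count = [0], [0]
results_world = [vs_world_variant(view_proj, world_skinning, 0, v, world_count)
                 for v in vertices]
results_local = [vs_local_variant(view_proj, world, local_skinning, 0, v, local_count)
                 for v in vertices]

print(results_world == results_local)    # same positions either way
print(world_count[0], local_count[0])    # multiplications per variant
```

For three vertices the world-space variant performs 6 multiplications and the local-space variant 9, i.e. one extra per vertex, and that extra cost scales with the vertex count of every skinned object.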