dark_sylinc wrote: ↑Fri Apr 08, 2022 4:10 pm
More than a decade doing graphics and I never knew about this.
GRAAAAAAAAAAHHHHHH.
I used to know this many years ago, but had managed to forget it until I rediscovered it now. There used to be a built-in uniform called gl_NormalMatrix that contained transpose(inverse(gl_ModelViewMatrix)), which you had to use instead of the model-view matrix for transforming normals.
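To make the gl_NormalMatrix point concrete, here is a minimal pure-Python sketch (illustrative only, not Ogre or GLSL code) showing why transforming a normal by the model matrix itself breaks under non-uniform scale, while the inverse-transpose keeps the normal perpendicular to the surface:

```python
# Why normals need transpose(inverse(M)) rather than M itself.
# Tiny 3x3 helpers; purely illustrative, not Ogre code.

def mat_vec(m, v):
    return [sum(m[r][c] * v[c] for c in range(3)) for r in range(3)]

def transpose(m):
    return [[m[c][r] for c in range(3)] for r in range(3)]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

# Non-uniform scale: x is squashed by half.
M = [[0.5, 0, 0], [0, 1, 0], [0, 0, 1]]
# Inverse of a diagonal matrix is the element-wise reciprocal.
M_inv_T = transpose([[2.0, 0, 0], [0, 1, 0], [0, 0, 1]])

# A surface tangent t and normal n, perpendicular to each other.
t = [1, 1, 0]
n = [1, -1, 0]
assert dot(t, n) == 0

# Transforming the normal by M itself loses perpendicularity...
t2 = mat_vec(M, t)
wrong = mat_vec(M, n)
assert dot(t2, wrong) != 0  # -0.75, no longer perpendicular

# ...while the inverse-transpose preserves it.
right = mat_vec(M_inv_T, n)
assert abs(dot(t2, right)) < 1e-9
```

For a pure rotation the two matrices coincide (the inverse of an orthogonal matrix is its transpose), which is why the problem only shows up once scaling enters the picture.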
As for some basic scaling discussion, I see a few different use cases (for regular meshes shown in the scene), each of which might have different requirements and solutions.

In our case right now, the reason we need to scale some objects is that some meshes are designed at a centimeter scale, while others use a meter scale. We currently apply a (uniform) scaling factor to the various scene nodes to cancel that difference. This use case could be solved by requiring more uniform input data, or some kind of load-time unit conversion. In practice, that turns out to be a bit difficult logistically, and requiring complete content uniformity also comes with a cost (just not at runtime), while the scene-node scaling is very convenient.

There is also the use case where a single mesh is presented with various scalings to add some variety to a set of objects. This requires proper scaling support, and sometimes even non-uniform scaling.

Finally, there is a third case where completely dynamic scaling is used for whatever reason, where no amount of pre-processing would do any good. I don't know how common that is.
dark_sylinc wrote: ↑Fri Apr 08, 2022 4:10 pm
Calculating & sending another (3x3) matrix per object would be expensive.
Do you have any estimation of the cost of providing the extra matrix/matrices? I can't easily judge how expensive it would be relative to all the existing work being done. I suppose that having to compute at least one inverse for each object is not great (plus the cost of the extra uniform data). In the current normal offset bias calculation, the normal is required both in view space and world space, requiring yet another matrix. Maybe there is a way to avoid this by moving calculations to different vector spaces.
dark_sylinc wrote: ↑Fri Apr 08, 2022 4:10 pm
However I can't help like we're missing something. Like, undoing that operation shouldn't require 9 floats (well, I guess we could send a quaternion and multiply the normals against that, in order to get proper rotations without scale, but still it wouldn't be cheap as it would kick the alignments out of whack)
The problem with this alternative is that a simple rotation can't describe the transformation caused by non-uniform scaling. The inverse-transpose ends up applying the inverse of the original scaling operation (as well as rotations etc).
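To illustrate why the quaternion shortcut fails: for M = R·S (rotation times scale), the correct normal transform is (M⁻¹)ᵀ = R·S⁻¹, so the inverse scale is baked in. A rotation-only transform simply produces a different direction. A small pure-Python sketch (illustrative values, not Ogre code):

```python
# Rotation-only vs. inverse-transpose normal transform under
# non-uniform scale. Purely illustrative numbers.
import math

def normalize(v):
    l = math.sqrt(sum(x * x for x in v))
    return [x / l for x in v]

# R: 90-degree rotation about z, i.e. (x, y, z) -> (-y, x, z).
def rot(v):
    return [-v[1], v[0], v[2]]

# S = diag(2, 1, 1), so S^-1 = diag(0.5, 1, 1).
n = [1, 1, 0]
s_inv_n = [0.5 * n[0], 1.0 * n[1], 1.0 * n[2]]

correct = normalize(rot(s_inv_n))   # (M^-1)^T * n = R * S^-1 * n
rotation_only = normalize(rot(n))   # quaternion-style: R * n

# The directions disagree, so no rotation alone can reproduce the
# inverse-transpose when the scale is non-uniform.
assert any(abs(a - b) > 1e-6 for a, b in zip(correct, rotation_only))
```

With a uniform scale, S⁻¹ only changes the length, normalization absorbs it, and the two results would coincide, which is exactly why the quaternion idea works in that restricted case.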
dark_sylinc wrote: ↑Fri Apr 08, 2022 4:10 pm
Edit: Oh we don't have quaternion data readily available because the world matrix is calculated out of compound operations based on inheritance.
Edit 2: An easy alternative would be to calculate the inverse-transpose in the vertex shader. This is expensive, but due to how the Hlms works, we can toggle it per object or per material (i.e. as an exception).
Edit 3: For skeletal animation it would be costly to do, at any point (CPU or GPU)
What is your opinion on the simplified case of always handling uniform scaling correctly? I feel like this question might benefit from being answered separately, while the non-uniform case might need more discussion, and possibly a more complicated solution. The solution here is as simple as adding a couple of normalization calls.
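For the uniform case the math is forgiving: M·n = s·(R·n), so the scale only stretches the normal, and a normalize() afterwards recovers the exact rotated direction. A quick pure-Python check (illustrative, not shader code):

```python
# Uniform scale only stretches normals; normalization fixes it exactly.
import math

def normalize(v):
    l = math.sqrt(sum(x * x for x in v))
    return [x / l for x in v]

s = 3.0                      # uniform scale factor
n = normalize([1, 2, 2])     # unit normal

scaled = [s * x for x in n]
# Example rotation: 90 degrees about z, (x, y, z) -> (-y, x, z).
rotated = [-scaled[1], scaled[0], scaled[2]]

# Normalizing recovers the result of the rotation alone.
expected = [-n[1], n[0], n[2]]
assert all(abs(a - b) < 1e-9 for a, b in zip(normalize(rotated), expected))
```

So "a couple of normalization calls" really is the whole fix for that case, with no extra uniforms or matrix inversions needed.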
When it comes to non-uniform scaling, I am not sure which variant strikes the best balance between correctness, flexibility and performance.
A compilation option is the least flexible variant, and feels quite heavy-handed. First, even in scenes where non-uniform scaling is present, I would guess that most objects are not scaled non-uniformly, so such a coarse option would pessimize all the other objects. Second, it would prevent the same Ogre build from being used across projects, where some need correct non-uniform scaling behavior and others prioritize performance, doubling the number of required build variants. I can't say that I'm very enthusiastic about this solution.
A more fine-grained approach would be preferable when it comes to ease of use and flexibility.
There seem to be two basic ways of solving the non-uniform case: doing the calculation on the CPU (per object) or on the GPU (per vertex). Both variants would benefit from an optimization where the cost is only borne by objects that require it, which means choosing different shaders depending on whether non-uniform scaling is present in the transformation. In the CPU case, the difference is whether extra uniforms (containing the inverse-transpose matrix) are provided to the shader or not, while in the GPU case it is whether the shader performs the inverse-transpose per vertex or not. However, there is also a cost to determining whether a non-uniform scale is present in the final transformation matrix (which is affected by the whole chain of parent nodes). I guess it could be handled by a user-provided flag to avoid that cost, but that also feels rather ugly. How feasible do you think it is to make a dynamic choice per object, deciding whether non-uniform scaling is required or not?
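One possible way to do that per-object detection (a hypothetical sketch, not an existing Ogre API): for a world matrix of the form M = R·S, the columns of the upper 3x3 block have lengths equal to the scale factors, so comparing their squared lengths reveals non-uniform scale without decomposing the matrix:

```python
# Hypothetical non-uniform-scale detector for a 3x3 world matrix.
# Assumes the matrix is rotation * scale (columns carry the scale);
# a row check would be needed for the scale-after-rotation order.
def has_nonuniform_scale(m, eps=1e-6):
    # Squared length of each basis column.
    lens = [sum(m[r][c] ** 2 for r in range(3)) for c in range(3)]
    return max(lens) - min(lens) > eps

uniform = [[2, 0, 0], [0, 2, 0], [0, 0, 2]]   # uniform scale by 2
squash  = [[2, 0, 0], [0, 1, 0], [0, 0, 1]]   # non-uniform: x stretched

assert not has_nonuniform_scale(uniform)
assert has_nonuniform_scale(squash)
```

That is only a handful of multiplies and adds per object, though whether even that is acceptable every frame for every node is exactly the cost question above.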
I can't easily judge the relative costs between these various solutions, and none of them feels all that great.
TLDR (only questions):
Do you have any estimation of the cost of providing the extra matrix/matrices?
What is your opinion on the simplified case of always handling uniform scaling correctly? (while solving the non-uniform case separately)
How feasible do you think it is to make a dynamic choice per object, deciding if non-uniform scaling is required or not?