Inefficient PSSM frustum calculation?

Discussion area about developing with Ogre-Next (2.1, 2.2 and beyond)


Crashy
Google Summer of Code Student
Posts: 1005
Joined: Wed Jan 08, 2003 9:15 pm
Location: Lyon, France
x 49

Post by Crashy »

Hi,

I've recently spotted an abnormally high polygon count in a simple scene, and after some debugging I've seen that, in most cases, objects may be rendered multiple times across the splits when using PSSM.

As usual, note that I'm using Ogre 2.0, but FocusedShadowCameraSetup::getShadowCamera is almost identical in 2.1.


Let me show you with a few pictures, using debug colours to visualise the splits.

Test case scenario 1:
  • Light direction is [0,-1,0]
  • Camera orientation is identity
The view point:
Image
The shadow map:
Image


Here everything is ok. Of course, due to padding, some objects are rendered twice, but nothing unexpected.

Test case scenario 2:
  • Light direction is [0,-1,0]
  • Camera has been rotated
The view point:
Image
The shadow map:
Image

As you can see, most objects are rendered multiple times, which is far from optimal. I've checked how the camera position and ortho size are calculated, and everything is mathematically correct.

I don't think it's a bug; it's just that the way this function computes the projection matrix cannot deal with oblique projection, because in the end it simply creates a basic orthographic projection matrix.
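To make "a basic orthographic projection matrix" concrete, here is a minimal sketch assuming OpenGL-style conventions (column vectors, camera looking down -Z, depth mapped to [-1, 1]); Ogre's own projection code and depth range may differ, and makeOrtho/transformPoint are illustrative names, not Ogre API. The matrix can only scale and translate an axis-aligned box, so it has no way to describe an oblique volume: every object inside the box, wasted corners included, ends up in the shadow map.

```cpp
#include <array>
#include <cassert>

// Row-major 4x4 matrix, applied to column vectors (M * v).
using Mat4 = std::array<std::array<float, 4>, 4>;

// OpenGL-style orthographic projection: maps the light-space box
// [l, r] x [b, t] x [-n, -f] (camera looks down -Z) into the [-1, 1] cube.
Mat4 makeOrtho( float l, float r, float b, float t, float n, float f )
{
    assert( r != l && t != b && f != n );
    Mat4 m{};  // zero-initialised
    m[0][0] = 2.0f / ( r - l );
    m[1][1] = 2.0f / ( t - b );
    m[2][2] = -2.0f / ( f - n );
    m[0][3] = -( r + l ) / ( r - l );
    m[1][3] = -( t + b ) / ( t - b );
    m[2][3] = -( f + n ) / ( f - n );
    m[3][3] = 1.0f;
    return m;
}

// Transforms the point (x, y, z, 1) and returns the resulting x, y, z.
std::array<float, 3> transformPoint( const Mat4 &m, float x, float y, float z )
{
    std::array<float, 3> out{};
    for( int i = 0; i < 3; ++i )
        out[i] = m[i][0] * x + m[i][1] * y + m[i][2] * z + m[i][3];
    return out;
}
```

Only scale and translation terms are non-zero, which is why an axis-aligned shadow camera must enlarge its box to enclose a rotated split.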

The 1.x FocusedShadowCameraSetup implementation does this differently: it creates a custom projection matrix, which avoids this kind of result.

I wonder if there is a way to improve this (maybe by computing a different orientation for the shadow camera so that it is somewhat "aligned" with the view camera?), or if I'll need to try porting the old code to Ogre 2.

Thanks.
Follow la Moustache on Twitter or on Facebook
Image
dark_sylinc
OGRE Team Member
Posts: 5296
Joined: Sat Jul 21, 2007 4:55 pm
Location: Buenos Aires, Argentina
x 1278

Post by dark_sylinc »

Hi!

As you suspect, this is not a bug. PSSM is not a performance optimization, but rather a quality improvement.

That being said, there are things that could be added to Ogre to improve the situation.

This post explains how PSSM works.
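For readers without the linked post: a common way to place PSSM split distances is the "practical split scheme" from the original PSSM paper, which blends logarithmic and uniform splits with a weight lambda. The sketch below is in the spirit of Ogre's PSSMShadowCameraSetup::calculateSplitPoints, but it is not Ogre's code; computeSplitPoints is a hypothetical helper. It also hints at where duplicates come from: an object whose bounds straddle a split distance has to be drawn into both neighbouring shadow maps.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Practical split scheme: blends logarithmic and uniform split distances.
// lambda = 1 -> purely logarithmic, lambda = 0 -> purely uniform.
// Returns splitCount + 1 distances, from nearDist to farDist.
std::vector<float> computeSplitPoints( int splitCount, float nearDist, float farDist, float lambda )
{
    assert( splitCount > 0 && nearDist > 0.0f && farDist > nearDist );
    std::vector<float> splits( static_cast<std::size_t>( splitCount ) + 1u );
    splits[0] = nearDist;
    for( int i = 1; i < splitCount; ++i )
    {
        const float fraction = static_cast<float>( i ) / static_cast<float>( splitCount );
        const float logSplit = nearDist * std::pow( farDist / nearDist, fraction );
        const float uniSplit = nearDist + ( farDist - nearDist ) * fraction;
        splits[i] = lambda * logSplit + ( 1.0f - lambda ) * uniSplit;
    }
    splits[splitCount] = farDist;
    return splits;
}
```

A lambda close to 1 packs most of the shadow map resolution near the camera, while lambda = 0 gives equal-depth slices.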

Ogre 1.x used LiSPSM on top of PSSM. I got rid of it in Ogre 2.0 because:
  1. Our implementation was glitchy. It was showing errors the original demo just did not exhibit.
  2. LiSPSM has terrible quality degradation in its edge cases, which happen to be the same edge cases as PSSM's (i.e. it didn't improve PSSM where it mattered the most).
But as you said, there are things that can be done to improve the performance of the shadow maps. One is very simple, but I never gave it much thought:

1. Roll the shadow map camera. The shadow map camera's orientation is set in CompositorShadowNode::_update:

```cpp
if( light->getType() != Light::LT_POINT )
    texCamera->setOrientation( light->getParentNode()->_getDerivedOrientation() );
else
    texCamera->setOrientation( Quaternion::IDENTITY );
```

However, for directional lights there is no reason to keep the roll of the light's rotation; changing it can improve things. The frustum of the camera, seen from the light's perspective, should look approximately like this:


Red is the first split, blue is the second split (I'm simplifying to two splits, but the same applies to any number of splits).

It is no wonder the last split contains the whole thing: the square must be enlarged enough to enclose the split, and that includes a lot of wasted space.
This could be improved if we simply rolled the camera:


and then we simply scale it:
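The effect of rolling can be checked with a bit of 2D math. Assume the split's corners have already been projected onto the light's XY plane (the trapezoid in the test uses made-up numbers). The ortho window an axis-aligned shadow camera needs is the bounding box of those points, and rolling the camera is equivalent to rotating the points before taking that bounding box. rotate and rolledBoundsArea are hypothetical helpers, not Ogre API.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <limits>
#include <vector>

struct Vec2 { float x, y; };

// Rotates p counter-clockwise around the origin by 'angle' radians.
Vec2 rotate( const Vec2 &p, float angle )
{
    const float c = std::cos( angle ), s = std::sin( angle );
    return { c * p.x - s * p.y, s * p.x + c * p.y };
}

// Area of the axis-aligned box enclosing the points after applying a roll,
// i.e. the ortho window an axis-aligned shadow camera would need.
float rolledBoundsArea( const std::vector<Vec2> &pts, float roll )
{
    assert( !pts.empty() );
    const float inf = std::numeric_limits<float>::infinity();
    float minX = inf, minY = inf, maxX = -inf, maxY = -inf;
    for( const Vec2 &p : pts )
    {
        const Vec2 q = rotate( p, roll );
        minX = std::min( minX, q.x );
        maxX = std::max( maxX, q.x );
        minY = std::min( minY, q.y );
        maxY = std::max( maxY, q.y );
    }
    return ( maxX - minX ) * ( maxY - minY );
}
```

If the view camera is yawed by some angle, rolling by the opposite angle brings the bounding box back to the aligned case; that smaller box is the "tighter bounds" that would be passed to setOrthoWindow.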


I haven't given much thought to how to figure out the required roll for directional lights, but if you're willing to try, do some math and ultimately your code would boil down to:

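```cpp
// rollQuat: a rotation around the shadow camera's local Z axis, e.g.
// Quaternion( rollAngle, Vector3::UNIT_Z ), with rollAngle found by "some math".
shadowCamera->setOrientation( shadowCamera->getOrientation() * rollQuat );
// Placeholders: a tighter ortho window width/height computed after rolling.
shadowCamera->setOrthoWindow( tighterWidth, tighterHeight );
```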
This solution requires no projection matrix fiddling and improves quality; the disadvantage is that shadows may swim/flicker as you rotate the camera (because the shadow map camera keeps changing its roll instead of staying stationary).

2. Alternatively, you could use enableCustomNearClipPlane. It was designed for planar reflections, but it should work here as well.
Going back to the original example, there is a custom clipping plane (in yellow), and objects behind that plane should not be rendered:


Use enableCustomNearClipPlane for that yellow plane, and if everything works correctly, objects behind it will not be rendered and performance should go up.
Basically, this is your idea of "making the projection oblique".
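Conceptually, what the yellow plane buys is a rejection test like the one below: an object whose bounding sphere lies entirely behind the plane cannot contribute to that split's shadow map, so it can be skipped. This is a CPU-side sphere-versus-plane sketch with hypothetical types; it is not how enableCustomNearClipPlane is implemented (that works by modifying the projection).

```cpp
#include <cassert>

struct Vec3 { float x, y, z; };

float dot( const Vec3 &a, const Vec3 &b ) { return a.x * b.x + a.y * b.y + a.z * b.z; }

// Plane stored as dot( normal, p ) + d = 0, with a normalised normal;
// the positive side is "in front".
struct Plane { Vec3 normal; float d; };

// True if the bounding sphere lies entirely behind the plane, meaning the
// object could be skipped by the caster pass without changing the result.
bool isFullyBehind( const Plane &plane, const Vec3 &center, float radius )
{
    assert( radius >= 0.0f );
    const float signedDist = dot( plane.normal, center ) + plane.d;
    return signedDist < -radius;
}
```

Spheres that straddle the plane are kept, so objects touching the boundary still cast shadows.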

Advantages: it should be stable regardless of camera rotation (unlike the previous method).
Disadvantages: quality does not improve, as the wasted space is still there, and a custom near clip plane sacrifices depth buffer precision, so there could be more shadow acne and the like.

My time is limited, so I haven't had the chance to fight with either of these techniques. Anyone is welcome to try them and tell us the results!
Crashy

Post by Crashy »

Thanks !
I saw your post yesterday but the pictures weren't visible. Now they are :), and they describe what I understood from looking at the code.

The custom near plane may be the easiest option; however, it seems it's only used when mProjType == PT_PERSPECTIVE. As I'm dealing with a directional light, the camera is orthographic, so it won't work.

I don't mind if shadows flicker when rotating the camera; it's acceptable in my case (I'm more interested in performance).
Basically, the aim is to roll the shadow camera so that the main camera's near/far clip planes are aligned with the shadow camera's local Y axis.
If I do this before computing the intersection between the shadow frustum and the main frustum, everything will be as tight as possible.

I'll try to figure out how to compute this.

PS: Don't be sorry, I don't have as much time as I'd like either. I'm glad to help if I can.
Crashy

Post by Crashy »

Alright, for now I did something really naive: rolling the light node in CompositorShadowNode. I didn't roll the shadow texture camera because everything is computed in light space in FocusedShadowCameraSetup.

```cpp
if( light->getType() != Light::LT_POINT )
{
    if( light->getType() == Light::LT_DIRECTIONAL )
    {
        // Camera Z axis expressed in the light's space.
        Vector3 camDirLS = light->getParentNode()->_getDerivedOrientation().Inverse() *
                           camera->getDerivedOrientation().zAxis();
        camDirLS.z = 0; // erase z component (project onto the light's XY plane)
        camDirLS.normalise();

        // Note: angleBetween() returns an unsigned angle in [0, pi].
        Radian angle = Vector3( 0, 1, 0 ).angleBetween( camDirLS );

        // roll lightNode
        light->getParentNode()->roll( angle );
        light->getParentSceneNode()->_getDerivedOrientationUpdated(); // dirty, to force the transform to be up-to-date
    }

    texCamera->setOrientation( light->getParentNode()->_getDerivedOrientation() *
                               Quaternion( Radian( Math::PI ), Vector3::UNIT_Y ) );
}
```

In optimal cases, where the light points mostly downwards, this is far better:
Image

Image

However, if the light is almost parallel to the view direction, things are better, but not perfect. This time the shadow camera is too far from the split, and many things are rendered twice. For this picture, I used zPadding == 0 to be sure it wasn't a padding-induced result.
In such a case, there is no easy way to tighten the frustums.

Image
Image

I made a superb drawing about this situation
Image
dark_sylinc

Post by dark_sylinc »

Interesting! After writing my post, a few ideas popped into my mind. I was thinking of having the X axis of the light match the X axis of the camera.
You're aligning the Y axis of the light with the Z axis of the camera, which is pretty much the same thing.

A couple of remarks:
  1. The direction of the camera is -camera->getDerivedOrientation().zAxis() (watch out, it's negative). Perhaps this can improve your results. Or maybe not.
  2. I suspect there is an "optimal axis" to align. For example, when the camera direction is parallel to the light, your camDirLS = (0, 0, 0). In such a case you should switch to a different axis as reference, such as camera->getDerivedOrientation().xAxis() or .yAxis() (both positive). Perhaps this isn't just a fallback for when camDirLS = (0, 0, 0); the other axis may give better results once the camera and light are "parallel enough".
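Regarding the second remark and the "do some math" part: one way to get a signed roll angle is to project the camera direction onto the light's XY plane and use atan2, falling back to another camera axis when that projection is too short. Ogre's Vector3::angleBetween returns an unsigned angle (it is computed with acos), so on its own it cannot tell a clockwise roll from a counter-clockwise one. The sketch below is a hypothetical helper; the 0.1 threshold for "parallel enough" is an arbitrary assumption, and the inputs are expected already in light space (camera direction taken as -zAxis(), per the first remark).

```cpp
#include <cassert>
#include <cmath>

struct Vec3 { float x, y, z; };

// Signed angle (radians, counter-clockwise about the light's Z axis) that
// rotates the light-space +Y axis onto the XY projection of camDirLS.
// camDirLS and fallbackLS (e.g. the camera's up axis) must already be in
// light space and normalised.
float computeRollAngle( const Vec3 &camDirLS, const Vec3 &fallbackLS, float minLength = 0.1f )
{
    float x = camDirLS.x, y = camDirLS.y;
    // Camera looking (almost) along the light: the projection degenerates
    // towards (0, 0), so switch to the fallback axis.
    if( std::sqrt( x * x + y * y ) < minLength )
    {
        x = fallbackLS.x;
        y = fallbackLS.y;
    }
    assert( x != 0.0f || y != 0.0f );
    // atan2 keeps the sign of the rotation, unlike an acos-based angle.
    return std::atan2( -x, y );
}
```

In Ogre terms, the result could feed something like Quaternion( Radian( angle ), Vector3::UNIT_Z ); whether +angle or -angle is needed depends on whether you rotate the light frame or the points, so it is worth verifying against the debug colours.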
Also note that for the worst case scenario (both camera and light are parallel), there is not much you can do:


In this case, split 0 is rendered by the highlighted light frustum (the box). It must render both spheres A and B.

When we go to split 1, both spheres A and B must be rendered (because sphere A is touching the bounds of the split).


When we go to split 2, only sphere B must be rendered.


Neither an oblique projection nor rolling can help you here. However, when the camera and light are at 45° to each other, the problem gets exacerbated and an oblique projection CAN help:


I am only focusing on split 1 here because it proves my point: only spheres A and B must be rendered; however, due to how big the frustum is, this new sphere I introduced (let's call it sphere C) will be rendered even though it doesn't need to be.
The only way to fix that in this case would be to use an oblique projection (if that's even possible?).

By matching the roll, though, you're eliminating a lot of waste (not shown in the graphs). But not all cases will be fixed; the ones I'm mentioning here won't be.

Cheers
Crashy

Post by Crashy »

Thanks for your pics, they're awesome.
At first I didn't understand why in case 1 the shadow frustum was so large, but indeed, if we want sphere B's shadow to be cast in the first split, we need to do it this way.
dark_sylinc wrote: The direction of the camera is -camera->getDerivedOrientation().zAxis() (watch out, it's negative). Perhaps this can improve your results. Or maybe not.
Just tried it; nothing spectacular, the rotation is just negated.
dark_sylinc wrote: I suspect there is an "optimal axis" to align. For example, when the camera direction is parallel to the light, your camDirLS = (0, 0, 0). In such a case you should switch to a different axis as reference, such as camera->getDerivedOrientation().xAxis() or .yAxis() (both positive). Perhaps this isn't just a fallback for when camDirLS = (0, 0, 0); the other axis may give better results once the camera and light are "parallel enough".
Yes, I was thinking about finding a fallback axis for when the light and camera are almost parallel. However, even without it, there is no visible glitch as is.
dark_sylinc

Post by dark_sylinc »

Crashy wrote: Fri Aug 10, 2018 9:45 pm At first I didn't understand why in case 1 the shadow frustum was so large, but indeed, if we want sphere B's shadow to be cast in first split, we need to do it this way.
Actually, the frustums for splits 1 & 2 are as large as the one shown for split 0; it's just that I made a mistake at first where they were longer than they should be, and I was too lazy to modify all 3 pictures, so I only modified the one for split 0.
Crashy

Post by Crashy »

Apart from that, there are still some bad case scenarios, like the one in this picture.
Image

Right now I'm wondering whether I should add some kind of per-object max shadow distance.
I'll have a lot of tiny but high-polygon-count objects (that I can't afford to render multiple times; I think even instancing won't help there) along with a few larger ones.
I can deal with little objects not casting shadows when they're out of the first split.
dark_sylinc

Post by dark_sylinc »

Crashy wrote: Sat Aug 11, 2018 4:36 pm Apart from that, there are still some bad case scenarios, like the one in this pic.
Image
Yeah, that could only be fixed if it is somehow possible to use enableCustomNearClipPlane with ortho views.
Crashy wrote: Sat Aug 11, 2018 4:36 pm I can deal with little objects not casting shadows when they're out of the first split.
That's how many games address the problem: some objects do not cast shadows unless they're in the closest splits (or, even within a split, their shadow just pops into existence when you get close enough to the object).
It's on my TODO list.

We already have ObjectData::mUpperDistance (controlled via MovableObject::setRenderingDistance); however, that is a global value (i.e. it affects both regular rendering and shadow mapping, so it's not just the shadow that disappears but also the object itself), and we would need a separate one for shadow mapping, i.e. mUpperDistance[2], where mUpperDistance[0] is used for regular rendering and mUpperDistance[1] for shadow mapping passes.

I wanted to spend some time thinking about whether there's a workaround to save memory (i.e. rather than consuming 4 more bytes for a float per MovableObject, use a 1-byte multiplier applied to mUpperDistance, or something like that), but maybe I'm overthinking it (i.e. the benefits should far outweigh the cost). After all, saving memory like that would mean adding more ALU and bandwidth costs (because now we have to read the original 4-byte value plus the new 1-byte value, convert the 1-byte value to a float, mask it in non-caster passes, and multiply the values together).
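For illustration, the 1-byte alternative could look like the hypothetical encoding below: store a multiplier in [0, 255] and derive the shadow distance from mUpperDistance. Neither function exists in Ogre. It shows both costs mentioned above: the decode needs a conversion, a pass-dependent select and a multiply per object, and the quantisation loses precision (one step is 1/255 of the upper distance).

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical compact encoding: shadowDistance ~= upperDistance * ( multiplier / 255 ).
std::uint8_t encodeShadowMultiplier( float upperDistance, float shadowDistance )
{
    assert( upperDistance > 0.0f && shadowDistance >= 0.0f && shadowDistance <= upperDistance );
    const float ratio = shadowDistance / upperDistance;
    return static_cast<std::uint8_t>( ratio * 255.0f + 0.5f );  // round to nearest
}

// The extra per-object work: convert the byte to float, use it only in
// caster passes (a mask in SIMD code), and multiply.
float effectiveUpperDistance( float upperDistance, std::uint8_t multiplier, bool isShadowCasterPass )
{
    const float scale = isShadowCasterPass ? static_cast<float>( multiplier ) / 255.0f : 1.0f;
    return upperDistance * scale;
}
```

With an upper distance of 1000, a shadow distance of 250 encodes to 64, which decodes to roughly 251: close, but not exact.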

Edit: I will accept PRs adding this functionality. It should be trivial to add.
Edit 2: I can also accept a PR for the "roll" solution to improve shadows. Just two remarks: 1. the behavior should be toggleable at the ShadowNode definition level (i.e. "use the old behavior"), and 2. in your current implementation you're modifying the light's scene node to adjust the roll, which is OK for testing, but in real code you can't write to a SceneNode that was created and is controlled by the user.
Crashy

Post by Crashy »

dark_sylinc wrote: We already have ObjectData::mUpperDistance (controlled via MovableObject::setRenderingDistance); however, that is a global value (i.e. it affects both regular rendering and shadow mapping, so it's not just the shadow that disappears but also the object itself), and we would need a separate one for shadow mapping, i.e. mUpperDistance[2], where mUpperDistance[0] is used for regular rendering and mUpperDistance[1] for shadow mapping passes.
Are there any benefits to changing mUpperDistance into an array rather than just adding a new pointer?
I never dug that much into the packed data stuff, but it doesn't seem that complicated.
dark_sylinc wrote: I will accept PRs adding this functionality. It should be trivial to add.
OK.
dark_sylinc wrote: Edit 2: I can also accept a PR for the "roll" solution to improve shadows. Just two remarks: 1. the behavior should be toggleable at the ShadowNode definition level (i.e. "use the old behavior"), and 2. in your current implementation you're modifying the light's scene node to adjust the roll, which is OK for testing, but in real code you can't write to a SceneNode that was created and is controlled by the user.
Right. I may need to change FocusedShadowCameraSetup to use the texCam orientation instead of the light orientation if we don't want to modify the SceneNode.

The thing is that I'm working with 2.0; I don't know if it will be easy for you to merge the PR into 2.1.

I could also do it for 2.1, depending on the time I have.
dark_sylinc

Post by dark_sylinc »

Crashy wrote: Sat Aug 11, 2018 5:15 pm Are there any benefits to changing mUpperDistance into an array rather than just adding a new pointer?
I never dug that much into the packed data stuff, but it doesn't seem that complicated.
Doing it as:

```cpp
Real        * RESTRICT_ALIAS    mUpperDistance[2];
```

or doing it as:

```cpp
Real        * RESTRICT_ALIAS    mUpperDistance;
Real        * RESTRICT_ALIAS    mUpperDistanceShadows;
```

is the same thing, with the sole exception that with the former you can do this in MovableObject::cullFrustum:

```cpp
//Outside the loop
const bool isShadowMappingCasterPass = (sceneVisibilityFlags & ~LAYER_SHADOW_CASTER ) == 0; //or != 0? I forgot
//Inside the loop:
ArrayReal * RESTRICT_ALIAS upperDistance = reinterpret_cast<ArrayReal*RESTRICT_ALIAS>(objData.mUpperDistance[isShadowMappingCasterPass]);
```

Instead of using a branch:

```cpp
if( !isShadowMappingCasterPass )
    upperDistance = reinterpret_cast<ArrayReal*RESTRICT_ALIAS>(objData.mUpperDistance);
else
    upperDistance = reinterpret_cast<ArrayReal*RESTRICT_ALIAS>(objData.mUpperDistanceShadows);
```
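A self-contained, scalar sketch of the same idea (no SIMD, hypothetical types, not Ogre's cullFrustum): with one distance array per pass type, the pass flag simply picks the array, and the per-object loop stays branch-free. A short shadow distance makes an object drop out of caster passes while it is still rendered normally.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical layout: one distance array per pass type.
struct ObjectDistances
{
    std::vector<float> upperDistance[2];  // [0] regular rendering, [1] shadow caster passes
};

// Marks each object visible if it is within its upper distance for this pass.
std::vector<bool> cullByDistance( const ObjectDistances &data, const std::vector<float> &distToCamera,
                                  bool isShadowCasterPass )
{
    // bool converts to 0 or 1, selecting the array without a branch in the loop.
    const std::vector<float> &upper = data.upperDistance[isShadowCasterPass];
    assert( upper.size() == distToCamera.size() );
    std::vector<bool> visible( upper.size() );
    for( std::size_t i = 0; i < upper.size(); ++i )
        visible[i] = distToCamera[i] <= upper[i];
    return visible;
}
```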
Crashy wrote: Sat Aug 11, 2018 5:15 pm The thing is that I'm working with 2.0, I don't know if it will be easy for you to merge the PR in 2.1.
It should be very easy, all the stuff you're touching has barely changed in 2.1. CompositorShadowNode did change a bit more than the rest, but the parts you're modifying are still there.
Crashy

Post by Crashy »

Alright, makes perfect sense.
Crashy

Post by Crashy »

Hi,

I've got something working for the "shadow max visible distance", but before committing and opening a PR, I just want to be sure this is the way you wanted it.

I have doubts about one specific change:

I added a new enum value, ShadowUpperDistance, to ObjectDataArrayMemoryManager::MemoryTypes, and in ObjectDataArrayMemoryManager::ElementsMemSize I've got something like this:

```cpp
...
1 * sizeof( Ogre::Real ),       //ArrayMemoryManager::UpperDistance
1 * sizeof( Ogre::Real ),       //ArrayMemoryManager::ShadowUpperDistance
...
```
but I wonder if, instead of doing it that way, it wouldn't be better to keep only the UpperDistance enum and change the element size:

```cpp
...
2 * sizeof( Ogre::Real ),       //ArrayMemoryManager::UpperDistance
...
```
dark_sylinc

Post by dark_sylinc »

By using 1 value (the latter):
  1. Each value is 8 bytes apart instead of 4, which is less cache friendly (because the other value must be skipped while iterating) and consumes more bandwidth.
By using 2 values (the former):
  1. Each value is 4 bytes apart (more cache friendly), and it consumes less bandwidth.
  2. ObjectData needs one more pointer (4 more bytes per Item on a 32-bit OS, 8 more bytes per Item on a 64-bit OS), making it fatter.
I'm inclined to go with the former (two values) due to the cache friendliness and lower bandwidth, despite consuming more RAM.
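A back-of-the-envelope model of the bandwidth argument, assuming 64-byte cache lines and line-aligned arrays, and ignoring the SIMD packing Ogre's ArrayMemoryManager actually uses: a pass that reads only one of the two distances pulls in twice as many cache lines when the values are interleaved. cacheLinesTouched is a hypothetical helper.

```cpp
#include <cassert>
#include <cstddef>

// Approximate number of cache lines a pass must fetch when it reads one
// float per object and consecutive values are strideBytes apart.
std::size_t cacheLinesTouched( std::size_t objectCount, std::size_t strideBytes, std::size_t lineBytes = 64 )
{
    assert( strideBytes > 0 && lineBytes > 0 );
    const std::size_t spanBytes = objectCount * strideBytes;
    return ( spanBytes + lineBytes - 1 ) / lineBytes;  // round up
}

// One enum with 2 * sizeof( Real ): values of one kind are 8 bytes apart.
constexpr std::size_t kInterleavedStride = 2 * sizeof( float );
// Two enums, one tightly packed array each: values are 4 bytes apart.
constexpr std::size_t kSeparateStride = sizeof( float );
```

For 1024 objects that is 128 lines interleaved versus 64 with separate arrays; the price of the separate layout is the extra pointer per ObjectData mentioned above.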
Crashy

Post by Crashy »

Just a small update on this topic, as the distance-related PR has been accepted:

I've spotted cases where the orientation changes of the shadow camera give weird results, and as I don't have much time right now to look for a fix, I don't know when I'm going to submit this PR :)
TaaTT4
OGRE Contributor
Posts: 267
Joined: Wed Apr 23, 2014 3:49 pm
Location: Bologna, Italy
x 75

Post by TaaTT4 »

Hi @Crashy,

Are you aiming to implement what Unity calls shadow pancaking (or something similar)?

Senior programmer at 505 Games; former senior engine programmer at Sandbox Games
Worked on: Racecraft Esport, Racecraft Coin-Op, Victory: The Age of Racing

dark_sylinc

Post by dark_sylinc »

Shadow pancaking is about tightening the depth range (i.e. the Z component) to fight shadow acne. I guess we could support the same by shrinking the distance between vMin.z and vMax.z (either by tweaking one of them, or both).
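As commonly described (for example in Unity's documentation), pancaking fits the depth range to what the receivers need and then flattens casters that fall in front of the near plane onto it, instead of clipping them; that is what makes a tighter vMin.z/vMax.z safe. A minimal sketch, assuming depth increases away from the light; in a real renderer the clamp is done in the caster vertex shader or via depth clamping, and pancakedDepth is a hypothetical name.

```cpp
#include <algorithm>
#include <cassert>

// Maps a light-space depth into [0, 1] for the depth range [nearZ, farZ],
// "pancaking" casters in front of nearZ onto the near plane (depth 0)
// so they still write to the shadow map instead of being clipped.
float pancakedDepth( float depth, float nearZ, float farZ )
{
    assert( farZ > nearZ );
    const float normalised = ( depth - nearZ ) / ( farZ - nearZ );
    return std::max( normalised, 0.0f );
}
```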

However, Crashy here is talking about doing something similar but for the XY components, to increase effective resolution and fight aliasing artifacts. Technically it is the same thing in the XY plane, i.e. reducing the distance between vMin.xy and vMax.xy.

However, changes in the XY plane tend to be much more noticeable than changes in Z if you get them wrong. Perhaps it might be worth trying artist-defined biases in the XY plane to see whether they look good enough for certain cases, but personally I doubt that will work well.