FSAA and HDR

TaaTT4 · Post by **TaaTT4** » Wed Oct 12, 2016 4:32 pm

Hi,

I'm writing this post because it's not really clear to me how to make MSAA work in an HDR workflow.
Until now, I've only used post-processing antialiasing such as SMAA and FXAA.
Since I'm now adding VR support, I need to replace those two techniques with MSAA.
In addition, SMAA can be combined with MSAA (to obtain what is called SMAA S2x), so I can benefit even in a non-VR context.

Before starting with questions, let me explain how my render chain is made.

At the beginning, there is a local float16 render target texture in which I render all the scene objects.

Code: Select all

compositor_node Common/Generator
{
	texture render_target target_width target_height PF_FLOAT16_RGBA depth_texture depth_format PF_D32_FLOAT_X24_S8_UINT depth_pool 1
	
	out 0 render_target
}

The following things happen in that render target:

Clear
Render opaque objects
Draw skybox (as full screen quad)
Render transparent objects

Then, I have a "splitter" node in which the render target texture above is cloned (to be precise, only the depth buffer is cloned).

Code: Select all

compositor_node Common/Splitter
{
	in 0 render_target
	
	texture render_target_clone target_width target_height PF_FLOAT16_RGBA depth_texture depth_format PF_D32_FLOAT_X24_S8_UINT depth_pool 2
	
	texture render_target_depth target_width target_height PF_D32_FLOAT_X24_S8_UINT depth_pool 1
	texture render_target_clone_depth target_width target_height PF_D32_FLOAT_X24_S8_UINT depth_pool 2
	
	target render_target_clone
	{
		pass depth_copy
		{
			alias_on_copy_failure true
			
			in render_target_depth
			out render_target_clone_depth
		}
	}
	
	out 0 render_target
	out 1 render_target_clone
}

These two render target textures are then ping-ponged between the post-processing nodes.
This is a very standard way to proceed: the render target which acts as input for one post-processing node becomes the output of the subsequent one (and vice versa for the other render target).
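The ping-pong scheme described above can be sketched on the CPU as follows. This is only an illustration of the swapping logic, not Ogre compositor code; `Target`, `Pass` and `runPingPong` are hypothetical names introduced for the example.

```cpp
#include <functional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for a render target; it only records what was drawn into it.
struct Target
{
    std::string name;
    std::vector<std::string> history;
};

// A post-processing pass reads from one target and writes to the other.
using Pass = std::function<void( const Target &input, Target &output )>;

// Runs the passes in order, swapping the roles of the two targets after each one.
// Returns a pointer to the target that holds the final image.
inline Target *runPingPong( Target &a, Target &b, const std::vector<Pass> &passes )
{
    Target *input = &a;
    Target *output = &b;
    for( const Pass &pass : passes )
    {
        pass( *input, *output );
        std::swap( input, output );  // the output of this pass feeds the next one
    }
    return input;  // after the last swap, 'input' is the most recently written target
}
```

With an odd number of passes the final image ends up in the second target, with an even number in the first, so the node that blits to the window has to be wired to whichever target was written last.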

The post processing effects I have are:

Bloom
Eye adaptation
Tone mapping
Color grading
SMAA
FXAA

At the very end, there is a node which blits the last post processing node output into the render window buffer.

Code: Select all

compositor_node Common/Compositor
{
	in 0 render_target
	in 1 render_target_window
	
	target render_target_window
	{
		pass render_quad
		{
			material Common/Quad
			
			input 0 render_target
		}
	}
}

Reading the forum here and there, what I've understood so far is:

I have to mark the texture in Common/Generator as explicit_resolve.
Somewhere in the post-processing chain, I have to put a resolve pass (after the color grading, I guess).
This both boosts performance (it avoids resolving at every pass) and prevents unexpected jaggies (see here).

And now the doubts:

I've read about default and custom resolve pass, what's the difference?
The former is directly handled by the GPU driver while the latter is a sort of custom shader, is it correct?
The post-processing nodes which come before the resolve pass have to deal with a multisampled texture (Texture2DMS in HLSL); what does that mean?
Do I have to apply the effect on every sample of the Texture2DMS texture?
Can you point me to an example that shows it?
Is it feasible to have a single implementation of the post-processing effects which works with every level of MSAA (1x, 2x, 4x and so on)?
After the render target texture has been resolved, I shouldn't need to use multisampled textures anymore.
Does it make sense to enable MSAA just for the texture in Common/Generator and disable it for render window?

I've seen some recent commits about MSAA, but they're only in the 2.1-pso branch.
Is MSAA bugged in the 2.1 branch?
Has the time to switch branches finally come (scary!)?

Post by **dark_sylinc** » Wed Oct 12, 2016 5:03 pm

You figured out everything, and you're right on all counts.

A couple things before we start:

I have not tested explicit resolves thoroughly. Could be buggy/broken (long term plan is to support them. MSAA makes a huge difference).
MSAA for RenderTextures in D3D11 was broken, and was fixed in the PSO branch. (btw the PSO branch will "soon" be merged into main 2.1). GL was working fine btw.

Ok, now to the answers:

TaaTT4 wrote:Reading the forum here and there, what I've understood so far is:

I have to mark the texture in Common/Generator as explicit_resolve.

Somewhere, in the post processing chain, I have to put a resolve pass (after the color grading I guess).
This both boosts performance (it avoids resolving at every pass) and prevents unexpected jaggies (see here).

Correct.

TaaTT4 wrote: And now the doubts:
I've read about default and custom resolve pass, what's the difference?
The former is directly handled by the GPU driver while the latter is a sort of custom shader, is it correct?

That's correct. You'll likely want to use the default pass (which lets the GPU driver do it). But if for some reason you need to do it by hand (e.g. to apply special weights to each subsample, or to extract extra information from the subsamples and store it in a UAV), we let you do that. (IIRC custom resolve passes are broken right now.)

TaaTT4 wrote: The post-processing nodes which come before the resolve pass have to deal with a multisampled texture (Texture2DMS in HLSL); what does that mean?

Before the texture is resolved, your shaders will have to declare textures like this in HLSL (only to sample the MSAA texture):

Code: Select all

Texture2DMS<float4> myRtt;

If you declare it as Texture2D myRtt; Direct3D will scream at you with a runtime error in debug mode, and in release you'll just get garbage or a GPU reset.

TaaTT4 wrote: Is it feasible to have a single implementation of the post-processing effects which works with every level of MSAA (1x, 2x, 4x and so on)?

This is the annoying part, which is why I haven't focused on explicit resolves yet. You will have to provide a different shader for each case: non-MSAA, MSAA 2x, MSAA 4x and MSAA 8x.
Right now you have to set up the material for the right MSAA combination.
My idea is to make an HLMS implementation similar to HLMS_LOW_LEVEL (it may even replace it) that works like the Compute Hlms, which has been very useful so far.
The Compute Hlms parses the shader code via Hlms and lets you regenerate the shader code according to the textures currently bound. For example:

Code: Select all

@property( texture0_msaa )
Texture2DMS<float4> myRtt;
@end
@property( !texture0_msaa )
Texture2D myRtt;
@end

@foreach( n, texture0_msaa_samples )
    @n //Iterate and sample through each msaa subsample.
@end

This works beautifully in Compute Shaders, and I want to bring this to rendering shaders as well.
Without a feature like this, setting up explicit resolves is a nightmare.
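To illustrate the idea of regenerating the shader for whatever sample count is bound, here is a minimal C++ sketch that emits HLSL text for a box resolve. It is not Hlms and does not use any Ogre API; `makeResolveSource` and the emitted `loadAverage` function are hypothetical names, and the string generation merely stands in for what the `@property` and `@foreach` blocks above would do.

```cpp
#include <sstream>
#include <string>

// Builds HLSL source that averages 'samples' subsamples of one pixel.
// samples <= 1 emits the non-MSAA variant. All emitted names are illustrative.
inline std::string makeResolveSource( int samples )
{
    std::ostringstream src;
    if( samples > 1 )
        src << "Texture2DMS<float4> myRtt;\n";
    else
        src << "Texture2D<float4> myRtt;\n";

    src << "float4 loadAverage( int2 pixel )\n{\n";
    if( samples > 1 )
    {
        src << "    float4 sum = float4( 0, 0, 0, 0 );\n";
        // Unrolled equivalent of iterating over every MSAA subsample.
        for( int i = 0; i < samples; ++i )
            src << "    sum += myRtt.Load( pixel, " << i << " );\n";
        src << "    return sum / " << samples << ".0;\n";
    }
    else
    {
        // Non-MSAA textures take the mip level as the third coordinate.
        src << "    return myRtt.Load( int3( pixel, 0 ) );\n";
    }
    src << "}\n";
    return src.str();
}
```

Calling `makeResolveSource( 4 )` yields a variant with four `Load` calls, while `makeResolveSource( 1 )` yields the plain `Texture2D` version, so one source template covers every MSAA level.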

TaaTT4 wrote: After the render target texture has been resolved, I shouldn't need to use multisampled textures anymore.

Exactly. Once it's resolved, it's just like any other regular texture.

TaaTT4 wrote:Does it make sense to enable MSAA just for the texture in Common/Generator and disable it for render window?

Yes, it makes perfect sense.

al2950 · Post by **al2950** » Thu Oct 13, 2016 9:07 am

I am currently using FXAA, mainly because I was under the impression we did not have enough control over resolves yet in 2.1. More to the point, I did not think explicit resolve passes had been implemented; I certainly cannot find them in the code. Am I looking in the wrong place!?

zxz · Post by **zxz** » Tue Jun 20, 2017 2:39 pm

The time has come for me to look into these issues as well. However, I am slightly confused regarding the underlying issues here.

Antialiasing works for me in the regular HDR sample (Sample_Hdr), without any extra work.

However, when I run my own application with the same compositor nodes as the sample, I get no antialiasing whatsoever (tonemapping and bloom work nicely though). My guess is that the floating point RTT does not get created with multisampling, while it somehow does in Sample_Hdr.

Questions:
1. What steps must be taken to ensure that AA works with the HDR compositing? (The HDR sample seems to do nothing special, yet AA works)
2. Since Sample_Hdr works with FSAA out of the box, what's the point of doing it any other way? Performance?

I am running OpenGL on Linux.

zxz · Post by **zxz** » Tue Jun 20, 2017 3:27 pm

Eh. Checking in to partially answer my own post.

It turns out that the floating point RTT did not get created with multisampling. This was due to the "finalTarget"'s mFSAA happening to be 0, even though the window was created with FSAA. This is most likely a self-inflicted bug, since we have patched the Ogre window creation in order to render to a Qt widget with a pre-existing context.

I was a bit quick on the draw there, but I suppose one question remains:

What do I have to gain from doing explicit resolves instead of out of the box FSAA?

Post by **dark_sylinc** » Tue Jun 20, 2017 5:13 pm

I'm glad you were able to fix your problem on your own.

As for your specific question:
MSAA can be thought of as a two-step process:

Render to MSAA surface
Resolve MSAA surface (resolve is a fancy way of saying weighted average)

The resolve would ideally happen as the very last step, after everything else. However, this is slow, so it normally happens a lot sooner. With explicit resolves you control when it happens.

The postprocess most profoundly affected by this is tonemapping, which must happen before resolving. Otherwise sharp edges may still show aliasing despite MSAA being used.
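A small numeric sketch shows why the order matters. It assumes a plain box resolve and a simple Reinhard operator on a single luminance value; both are stand-ins chosen for the example, not what Ogre or the driver necessarily uses.

```cpp
#include <vector>

// Simple Reinhard operator on one luminance value; stands in for any tonemapper.
inline float tonemap( float x ) { return x / ( 1.0f + x ); }

// Box resolve: plain average of all subsamples of one pixel (needs at least one).
inline float resolve( const std::vector<float> &samples )
{
    float sum = 0.0f;
    for( float s : samples )
        sum += s;
    return sum / static_cast<float>( samples.size() );
}

// Early resolve: average in HDR first, tonemap the result afterwards.
inline float resolveThenTonemap( const std::vector<float> &samples )
{
    return tonemap( resolve( samples ) );
}

// Explicit resolve placed after tonemapping: tonemap each subsample, then average.
inline float tonemapThenResolve( const std::vector<float> &samples )
{
    std::vector<float> mapped;
    mapped.reserve( samples.size() );
    for( float s : samples )
        mapped.push_back( tonemap( s ) );
    return resolve( mapped );
}
```

For an edge pixel half covered by luminance 100 and half by 0, `resolveThenTonemap` gives about 0.98, almost the same as a fully bright neighbour (about 0.99), so the edge still looks like a hard step. `tonemapThenResolve` gives about 0.495, the intermediate value one expects from a half-covered pixel.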

However, this thread is a bit old. Since then, I've modified the HDR sample to make use of explicit resolves when they're available (IIRC it works only on D3D11 at the moment, not yet on GL & Metal; that issue should be fixed in 2.2).
Note that the sample chooses between HdrWorkspace & HdrWorkspaceMsaa at runtime.

For more info look at https://mynameismjp.wordpress.com/2012/ ... -overview/

zxz · Post by **zxz** » Fri May 25, 2018 11:58 am

Hello again!

What is actually missing in GL3Plus in 2.1 in order for explicit resolves to work? Are there some fundamental missing pieces, or just unfixed bugs?

There is some sort of code mentioning explicit resolves in GL3Plus, but created textures seem to be forced to not use explicit resolves depending on the rendersystem's advertised capabilities.

As you mentioned, this is supposed to work in 2.2, but porting to 2.2 requires a bit too much work for us to tackle at the moment. It would be cool if there's a way to get this working in 2.1, even if it's just some ugly hack that can be applied locally.

We would really like to use this for reversible tonemapping which you wrote about in a previous progress report (as well as your link above).

Thanks!

Post by **dark_sylinc** » Fri May 25, 2018 2:52 pm

Hi!

There are two parts to this: an easy one and a not-so-easy one (but not impossibly hard).

The easy part has already been taken care of, as getGLID in RenderSystems/GL3Plus/include/OgreGL3PlusTexture.h already checks whether MSAA is dirty and returns the raw MSAA surface if explicit resolves are set.
However, because we never set RSC_EXPLICIT_FSAA_RESOLVE in GL3PlusRenderSystem::createRenderSystemCapabilities, Ogre always forces explicit resolves to off.

The not-so-easy part is that GL3+ shares MSAA surfaces between textures. That means if texture A and texture B are MSAA, they will share the same surface (as long as the textures have the same resolution, format & MSAA count). This is incompatible with explicit resolves because if you render to texture A, then to texture B, then try to access the MSAA contents of A, you will see what B rendered.
This isn't a problem if you render to A, then immediately after try to see what A rendered. If that's your case then you can just set RSC_EXPLICIT_FSAA_RESOLVE and it should work.
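The sharing hazard can be modelled with a tiny reference-counted pool. This is a hypothetical simplification for illustration only; `MsaaSurface`, `SharedSurfacePool` and `Texture` are made-up types and do not mirror the actual GL3PlusFBOManager code.

```cpp
#include <map>
#include <memory>
#include <tuple>

// Hypothetical stand-in for a GPU MSAA surface; 'lastWriter' records who rendered last.
struct MsaaSurface
{
    int lastWriter = -1;
};

// One surface is shared by every texture with the same width, height, format and sample count.
using SurfaceKey = std::tuple<int, int, int, int>;

class SharedSurfacePool
{
public:
    std::shared_ptr<MsaaSurface> request( int width, int height, int format, int samples )
    {
        std::shared_ptr<MsaaSurface> &slot = mSurfaces[SurfaceKey( width, height, format, samples )];
        if( !slot )
            slot = std::make_shared<MsaaSurface>();
        return slot;  // reference counted and handed to every matching texture
    }

private:
    std::map<SurfaceKey, std::shared_ptr<MsaaSurface>> mSurfaces;
};

struct Texture
{
    int id;
    std::shared_ptr<MsaaSurface> msaa;

    void render() { msaa->lastWriter = id; }
    // True only if the MSAA contents still belong to this texture.
    bool msaaContentsAreMine() const { return msaa->lastWriter == id; }
};
```

Rendering to A and reading A's subsamples right away is safe in this model; rendering to B in between overwrites the shared surface, which is exactly the case that breaks explicit resolves.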

The code in GL3PlusFBOManager::requestRenderBuffer is where MSAA surfaces are created and shared (note that they're reference counted). If you fix that so that each texture gets its own private MSAA surface (at least when explicit resolves were asked for), then it should work for all cases.

Additionally, MSAA in GL3+ had other problems (e.g. off the top of my head, MRT + MSAA did not work as intended) that blew up in my face when I tried to fix this, which is why I didn't put more effort into it.

D3D11 & Metal always get their own private MSAA surface, so explicit resolves work there.

zxz · Post by **zxz** » Fri May 25, 2018 8:24 pm

I think the sharing is acceptable in my use case. I just need to do the manual resolve after rendering, then use the resolved texture for further postprocessing.

Another issue seems to be when the resolving pass is run and wants to bind the texture for reading. The texture binding code calls GL3PlusTexture::getGLID to get the actual GL texture ID to bind. The getGLID function always returns the regular (non-multisampled) texture ID. When the texture is set to be multisampled, the getGLID runs

Code: Select all

renderTarget->getCustomAttribute( "GL_MULTISAMPLEFBOID", &retVal );

but never uses that value and returns mTextureID. Even so, the FBO ID isn't what's needed for binding the texture for reading, so it doesn't really help. Because of that, Ogre tries to bind a non-multisampled texture to a multisampled bind target.

I looked further into that FBO used for multisampled operations, and it internally uses a GL renderbuffer as the color attachment. From what I can read, renderbuffers aren't suited for read operations either. So there seems to be no multisampled texture that can be bound for read operations anywhere.

I find all this code quite convoluted and most likely unnecessarily complex for what it's doing. I understand the desire to throw much of it out.

Post by **dark_sylinc** » Fri May 25, 2018 8:52 pm

zxz wrote: ↑Fri May 25, 2018 8:24 pm Another issue seems to be when the resolving pass is run and wants to bind the texture for reading. The texture binding code calls GL3PlusTexture::getGLID to get the actual GL texture ID to bind. The getGLID function always returns the regular (non-multisampled) texture ID. When the texture is set to be multisampled, the getGLID runs
Code: Select all
renderTarget->getCustomAttribute( "GL_MULTISAMPLEFBOID", &retVal );
but never uses that value and returns mTextureID. Even so, the FBO ID isn't what's needed for binding the texture for reading, so it doesn't really help. Because of that, Ogre tries to bind a non-multisampled texture to a multisampled bind target.

Oh. That's a bug.

zxz wrote: ↑Fri May 25, 2018 8:24 pm I looked further into that FBO used for multisampled operations, and it internally uses a GL renderbuffer as the color attachment. From what I can read, renderbuffers aren't suited for read operations either. So there seems to be no multisampled texture that can be bound for read operations anywhere.

Ah, now I remember what blew up in my face.

Yes, the GL creation code for MSAA surfaces that are only meant to be automatically resolved (link to FBO binding) is different from the code for MSAA surfaces that can be used as a texture (link to FBO binding).

The difference is that the driver can make more optimizations (compression, invalidation and cache flushing) if it knows the surface won't be used as a texture. However, GL makes this excruciatingly painful by having two completely different code paths instead of just one with a boolean flag like D3D11 & Metal do.

At the time I didn't know both code paths well enough to replace the current renderbuffer path with the rendertexture path, but you have the advantage that you can use 2.2's code as a guide. In 2.2's lingo, when texture->isTexture() returns false, the texture is guaranteed not to be sampled as a texture (even if it's not MSAA; e.g. UAV textures and RenderWindows can meet this criterion), and if hasMsaaExplicitResolves is false, we can guarantee the MSAA surface won't be used as a texture.

zxz wrote: ↑Fri May 25, 2018 8:24 pmI find all this code quite convoluted and most likely unnecessarily complex for what it's doing. I understand the desire to throw much of it out.

Pretty much.

zxz · Post by **zxz** » Sat May 26, 2018 1:40 am

Hello!

I got it to work with only a rather small amount of extra code. An ugly hack of course.

1. Advertise RSC_EXPLICIT_FSAA_RESOLVE in GL3PlusRenderSystem.
2. Create a multisampled texture instead of a renderbuffer in GL3PlusRenderBuffer when multisampling is enabled and bind this to the FBO when asked to.
3. Add getter to GL3PlusRenderBuffer to get the multisample texture ID from (2), as well as to GL3PlusFrameBufferObject in order to reach the texture ID.
4. Add a new attribute in GL3PlusFBORenderTexture::getCustomAttribute, "GL_MULTISAMPLETEXID" which gives the texture ID of the multisample texture (uses accessor function from 3).
5. Adjust GL3PlusTexture::getGLID to query "GL_MULTISAMPLETEXID", and to return this instead of unconditionally returning mTextureID.

Those few changes made explicit resolves work for me.
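The selection logic behind step 5 might look roughly like the sketch below. It uses a hypothetical mock struct rather than the real GL3PlusTexture class, and the field names and the exact conditions are assumptions based on the description in this thread, not the actual patch.

```cpp
#include <cstdint>

// Hypothetical mock of the state involved; not the real GL3PlusTexture class.
struct MockTexture
{
    std::uint32_t resolvedTextureId;  // regular (non-multisampled) GL texture
    std::uint32_t msaaTextureId;      // multisample texture attached to the FBO, 0 if none
    bool explicitResolve;             // texture was declared with explicit_resolve
    bool msaaDirty;                   // rendered to since the last resolve
};

// Returns the GL texture name to bind for reading.
// Sets 'isMultisample' so the caller can choose GL_TEXTURE_2D_MULTISAMPLE as bind target.
inline std::uint32_t selectTextureToBind( const MockTexture &tex, bool &isMultisample )
{
    isMultisample = tex.explicitResolve && tex.msaaDirty && tex.msaaTextureId != 0;
    return isMultisample ? tex.msaaTextureId : tex.resolvedTextureId;
}
```

The point of the sketch is only that the ID and the bind target have to be chosen together; binding the resolved texture ID to a multisampled target is the mismatch described earlier in the thread.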

To get the reversible tonemapping resolver to work, it was necessary to also fix a few things in Samples/Media/2.0/scripts/materials/HDR/GLSL/Resolve_4xFP32_HDR_Box_ps.glsl, since there is no way it can build in the current state. Some arguments were swapped, and it was missing a uniform declaration for the luminance texture sampler.

TaaTT4 · Post by **TaaTT4** » Tue Jul 17, 2018 6:10 pm

Hey guys,

I'm resuming this quite old thread both to ask for help and to share some knowledge. After a major overhaul of my engine, fun has finally come! It is time to tackle the post processing chain. The goal is to add some fancy effects to an HDR rendering pipeline. I'll try to use this thread as a work in progress log.

First of all, I semi-cite myself to summarize the context:

TaaTT4 wrote: ↑Wed Oct 12, 2016 4:32 pm Let me explain how my render chain is made.

At the beginning, a float16 render target is created.
Code: Select all
compositor_node Common/RenderTarget
{
	texture render_target target_width target_height PF_FLOAT16_RGB depth_texture depth_pool 1

	out 0 render_target
}
Then, in that render target, the following things happen:

The render target is cleared

Opaque items are rendered

The skybox is rendered (as a full screen quad)

Transparent items are rendered

After that, there is another workspace node that takes care of drawing the overlays on top of the render target used so far.

Finally, at the very end, the last workspace node blits the render target into the render window buffer.
Code: Select all
compositor_node Common/Present
{
	in 0 render_target
	in 1 render_window

	target render_window
	{
		pass render_quad
		{
			material Ogre/Copy/4xFP32

			input 0 render_target
		}
	}
}

First on my to-do list is enabling MSAA. Unfortunately, in an HDR pipeline it is not enough to set an FSAA level greater than 0 in the render system config options. These are the additional steps I have to take:

In the Common/RenderTarget workspace node, mark the render target as explicitly resolved (with explicit_resolve).
Between the rendering of transparent items and overlays, add an explicit resolve pass.

The resolve pass is needed to transform an MSAA surface into a non-MSAA surface (in HLSL terms: from Texture2DMS to Texture2D). Because of HDR, I have to tonemap every MSAA sample (with a lightweight operator) before merging them together. Look at OGRE's Sample_Hdr for code, and here and there for the reasons behind it.
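A CPU-side sketch of such a resolve is shown below. It assumes one common lightweight, invertible operator, c / (1 + max(c)), and a plain box filter; the actual Sample_Hdr shader may use a different operator or weights, so treat the functions here as illustrative names only.

```cpp
#include <algorithm>
#include <array>
#include <vector>

using Rgb = std::array<float, 3>;

inline float maxComponent( const Rgb &c ) { return std::max( c[0], std::max( c[1], c[2] ) ); }

// Lightweight, invertible operator: c / (1 + max(c)). Maps [0, inf) into [0, 1).
inline Rgb tonemap( const Rgb &c )
{
    const float k = 1.0f / ( 1.0f + maxComponent( c ) );
    return { c[0] * k, c[1] * k, c[2] * k };
}

// Exact inverse of tonemap() for inputs whose largest component is below 1.
inline Rgb inverseTonemap( const Rgb &c )
{
    const float k = 1.0f / ( 1.0f - maxComponent( c ) );
    return { c[0] * k, c[1] * k, c[2] * k };
}

// Tonemap every subsample, box-filter the results, then return to HDR.
// Expects at least one subsample with non-negative components.
inline Rgb resolveHdr( const std::vector<Rgb> &subsamples )
{
    Rgb sum = { 0.0f, 0.0f, 0.0f };
    for( const Rgb &s : subsamples )
    {
        const Rgb t = tonemap( s );
        for( int i = 0; i < 3; ++i )
            sum[i] += t[i];
    }
    const float inv = 1.0f / static_cast<float>( subsamples.size() );
    return inverseTonemap( { sum[0] * inv, sum[1] * inv, sum[2] * inv } );
}
```

Because the operator is inverted after averaging, the resolved texture is still HDR and the rest of the chain (bloom, eye adaptation, the real tonemapper) can run on it unchanged. Pixels whose subsamples all agree come back untouched, while a pixel split between a very bright and a dark surface resolves to a value well below the plain HDR average.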

That's all as far as the colour buffer is concerned, but what about the depth buffer? Even if it's not strictly required at the moment, many post-processing effects will need it. How can it be resolved? I've read that blending the depth samples together is not a good idea. What other options do I have?

Post by **dark_sylinc** » Tue Jul 17, 2018 10:03 pm

TaaTT4 wrote: ↑Tue Jul 17, 2018 6:10 pm That's all as far as the colour buffer is concerned, but what about the depth buffer? Even if it's not strictly required at the moment, many post-processing effects will need it. How can it be resolved? I've read that blending the depth samples together is not a good idea. What other options do I have?

When it comes to downsampling an MSAA depth buffer to a regular one you have several choices:

Pick one subsample. You can pick an arbitrary one (e.g. subsample 0). If you know the sample locations (see MSAA Sample Pattern Detection; we plan on providing this out of the box for 2.2...) or you use D3D11's standard locations (I think 2.1 does not use them, so you would have to modify the RenderSystem), you could try picking the one closest to the center.
Max of all subsamples, i.e. value = max( value, subsample[ i ] )
Min of all subsamples, i.e. value = min( value, subsample[ i ] )
Average. Blegh. This rarely works well.
Pick the mode. When doing 4xMSAA and above, if three out of four subsamples are close to each other (based on some arbitrary threshold, standard deviation, etc.), pick among those three. If two out of four subsamples are close together and the other two are far apart, pick between those two. Rather than a picking method, this is really a way of filtering out the outliers.

All of these are approximations, downsampling a depth buffer is never perfect and each approach has different drawbacks depending on what you're trying to do.
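The options listed above can be sketched on the CPU as follows. The function names are hypothetical, and the "mode" variant is just one simple way to implement the outlier filtering described; a real version would live in a pixel shader reading the depth subsamples.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <vector>

// Each function reduces the depth subsamples of one pixel to a single value.
// All of them expect a non-empty vector.

inline float depthPickFirst( const std::vector<float> &s ) { return s[0]; }

inline float depthMax( const std::vector<float> &s ) { return *std::max_element( s.begin(), s.end() ); }

inline float depthMin( const std::vector<float> &s ) { return *std::min_element( s.begin(), s.end() ); }

inline float depthAverage( const std::vector<float> &s )
{
    float sum = 0.0f;
    for( float d : s )
        sum += d;
    return sum / static_cast<float>( s.size() );
}

// "Mode" variant: returns the subsample that has the most subsamples within
// 'threshold' of it (the first such subsample on ties), which discards outliers.
inline float depthMode( const std::vector<float> &s, float threshold )
{
    std::size_t bestCount = 0;
    float best = s[0];
    for( float candidate : s )
    {
        std::size_t count = 0;
        for( float other : s )
            if( std::fabs( other - candidate ) <= threshold )
                ++count;
        if( count > bestCount )
        {
            bestCount = count;
            best = candidate;
        }
    }
    return best;
}
```

For subsamples 0.90, 0.20, 0.21 and 0.22, the average is about 0.38, a depth that belongs to neither surface, which illustrates why averaging rarely works. The mode variant with a threshold of 0.05 returns 0.20, ignoring the single outlier.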

TaaTT4 · Post by **TaaTT4** » Wed Aug 08, 2018 7:37 pm

dark_sylinc wrote: ↑Tue Jul 17, 2018 10:03 pm When it comes to downsampling an MSAA depth buffer to a regular one you have several choices:
Pick one subsample. You can pick an arbitrary one (e.g. subsample 0). If you know the sample locations (see MSAA Sample Pattern Detection; we plan on providing this out of the box for 2.2...) or you use D3D11's standard locations (I think 2.1 does not use them, so you would have to modify the RenderSystem), you could try picking the one closest to the center.

Max all subsamples. i.e. value = max( value, subsample[ i ] )

Min all subsamples. i.e. value = min( value, subsample[ i ] )

Average. Blegh. This rarely works well

Pick the mode. When doing 4xMSAA and above, if three out of four subsamples are close to each other (based on some arbitrary threshold, standard deviation, etc.), pick among those three. If two out of four subsamples are close together and the other two are far apart, pick between those two. Rather than a picking method, this is really a way of filtering out the outliers.
All of these are approximations, downsampling a depth buffer is never perfect and each approach has different drawbacks depending on what you're trying to do.

I guess I'll start by taking subsample 0. In OGRE 2.1 (D3D11), it's the sample closest to the pixel center. From that point on, the depth buffer will be used only in some post-processing effects, so I don't believe I'll need super-accurate values.
What about the shader that resolves the depth buffer? Is there an OGRE sample/reference I can look at?

Post by **dark_sylinc** » Wed Aug 08, 2018 8:04 pm

TaaTT4 wrote: ↑Wed Aug 08, 2018 7:37 pm How about the shader to do the depth buffer resolution? Is there any OGRE sample/reference I can look for?

If I understood your question correctly, check out the material Ogre/Depth/DownscaleMax, which is defined in Samples/Media/2.0/scripts/materials/Common/DepthUtils.material

TaaTT4 · Post by **TaaTT4** » Thu Aug 09, 2018 4:46 pm

dark_sylinc wrote: ↑Wed Aug 08, 2018 8:04 pm If I understood what you ask correctly, checkout material Ogre/Depth/DownscaleMax which is defined in Samples/Media/2.0/scripts/materials/Common/DepthUtils.material

That's exactly what I was looking for, thanks!

I studied the HLSL documentation on the topic a bit and, if I've understood correctly, using the SV_Depth semantic means that I'm writing only to the depth buffer. What if I also need to resolve the stencil buffer? I suppose I have to do a second pass using the SV_StencilRef semantic, am I right? Is it possible to resolve both the depth and stencil buffers in a single pass?
