[3.0][VK] Rendering to texture in custom compositor pass

zxz
Gremlin
Posts: 184
Joined: Sat Apr 16, 2016 9:25 pm
x 19

Post by zxz »

I have a custom compositor pass that creates a bunch of internal intermediate textures programmatically. This worked flawlessly with the GL3Plus rendersystem, but fails with Vulkan.

The intermediate textures are created like this:

auto texture = getTextureGpuManager()->createOrRetrieveTexture(
    name, Ogre::GpuPageOutStrategy::Discard,
    Ogre::TextureFlags::RenderToTexture, Ogre::TextureTypes::Type2D );
texture->setResolution( w, h );
texture->setPixelFormat( Ogre::PFG_RGBA16_FLOAT );
[...]

and a RenderPassDescriptor is created for each one in order to draw into it.

However, rendering into these triggers the following exception when using Vulkan:

OGRE EXCEPTION(1:InvalidStateException): Texture Bloom_1:58 is not in ResourceLayout::Texture nor RenderTargetReadOnly. Did you forget to expose it to compositor? Currently rendering to target: Bloom_2:59 in VulkanRenderSystem::checkTextureLayout at ogre-next-master/RenderSystems/Vulkan/src/OgreVulkanRenderSystem.cpp (line 1563)

I am not sure what needs to be done to live up to these requirements. I've found the mExposedTextures member of CompositorPassDef, but it doesn't seem applicable to my situation where the textures are internal, and created by the pass implementation, and thus unknown at the point of definition.

How is this supposed to be done?

dark_sylinc
OGRE Team Member
Posts: 5433
Joined: Sat Jul 21, 2007 4:55 pm
Location: Buenos Aires, Argentina
x 1341

Post by dark_sylinc »

Hi,

What does the compositor pass look like?

Where is that texture used? (i.e. by what? a Material? a render_quad pass? how are you assigning the texture to the material?)

Post by zxz »

The textures are the actual targets of the custom pass, as intermediate textures used for computing the final result. The textures and their associated RenderPassDescriptors are passed to beginRenderPassDescriptor to render to them. It's rather similar to multiple quad-passes baked into one, with a few different shaders used (bloom downsampling, upsampling, filtering).

Post by dark_sylinc »

Ah! Custom passes. Should've started there!

OK I'll split this post in two.

Theoretical background

GPUs are incredibly parallel machines, with many thousands of threads running in parallel to process either vertices or pixels. Or both.

Writable data has different restrictions vs e.g. data that is read-only. Not just in terms of access, but also in terms of how caches are synchronized or flushed (i.e. read-only data never needs to care about cache invalidation).

Additionally there are other technical things to consider that are unrelated to parallelism. For example, Depth Buffers have many acceleration structures such as min-max, Hi-Z, Early-Z and Z compression.

On some GPUs, these structures need to be decompressed/destroyed when you need to sample from the depth buffer as a texture.

Therefore a texture needs to transition from "I'm a RenderTarget (with R/W access)" to "I'm a texture now (that is read only and could be sampled with trilinear filtering)".

In some cases GPUs have to issue a "stop the world" (barriers) because we can't read from a texture while a previous pass is still writing to it.
However, if the 2nd pass doesn't use the texture in the vertex shader, the vertex shader can start processing to get a head start (i.e. maximize parallelism as much as possible).

What happens exactly is very GPU-specific. Some barriers are required by certain HW; other HW will ignore them because they're not necessary there.

Practical Terms in OgreNext

If you write to a target and then read from it, all in the same custom pass, you need to issue barriers.

OgreNext in general has been coded around one pass doing exactly that, just one pass; what you're describing sounds like something that should be done at Node level (that's why nodes exist... they take an input, transform it via multiple passes like a black box, and provide an output for other nodes to consume).

But you can still do everything in one pass if you wish (in fact CompositorPassMipmap does multiple passes all in just one pass).

Passes normally analyze and issue barriers in CompositorPass::analyzeBarriers. However since you're doing multiple passes inside a CompositorPass, that is not enough.

Your code should look something like this between each pass:

renderSystem->endRenderPassDescriptor();
mResourceTransitions.clear(); // resolveTransition() will add entries into mResourceTransitions
uint32 stagesThatWillUseYourTexture = (1u << GPT_VERTEX_PROGRAM)|(1u << GPT_FRAGMENT_PROGRAM);
resolveTransition( textureToSample, ResourceLayout::Texture,
                   ResourceAccess::Read, stagesThatWillUseYourTexture );
resolveTransition( textureYouWillWriteTo, ResourceLayout::RenderTarget,
                   ResourceAccess::ReadWrite, 0u );
renderSystem->executeResourceTransition( mResourceTransitions );

renderSystem->beginRenderPassDescriptor( ... );
renderSystem->executeRenderPassDescriptorDelayedActions();
// start rendering commands

I believe this code should be self-explanatory enough: resolveTransition() is the main call you need (which accumulates into mResourceTransitions) and executeResourceTransition( mResourceTransitions ) issues the actual barrier.

resolveTransition() asks you what you will be using the texture for: Will it be a texture? A render target? For mipmap generation? A UAV? Will you just read from it, write only, or read-write? What stages will be accessing it? (0 if it doesn't apply, e.g. RenderTargets don't really belong to any stage, since it's the ROP that accesses them, not the pixel shader).
RenderTargets are usually read+write because of the potential for alpha blending. Also, depth buffers are read+write by nature when used as render targets.

You should be able to figure it out with this information. In 99% of cases, this is simple and straightforward, but in a few cases it may get yucky if you need to issue a TextureGpu::copyTo or a TextureGpu::_autogenerateMipmaps call (because they involve the copy encoder which automatically changes layout outside of the barrier solver, unless explicitly instructed not to. Figuring out the right initial and final layouts after you're done ends up more a thing of trial and error, honestly).
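To make the accumulate-then-execute pattern above concrete, here is a toy model in plain C++. It is only a sketch under stated assumptions: ToyBarrierSolver, Layout and Transition are hypothetical stand-ins, not OgreNext's classes, and the toy only tracks layouts (a real solver also has to consider access flags and shader stages). It shows that each call appends a transition only when the layout actually changes, and that a single batch can carry transitions for several textures.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>
#include <vector>

// Hypothetical stand-ins for illustration only; these are NOT OgreNext types.
enum class Layout { Undefined, Texture, RenderTarget };

struct Transition
{
    std::string texture;
    Layout oldLayout;
    Layout newLayout;
    uint32_t stageMask;  // bitmask of shader stages that will read the texture
};

class ToyBarrierSolver
{
    std::map<std::string, Layout> mCurrentLayout;

public:
    // Appends an entry to outTransitions only if the texture is not already
    // in the requested layout. Textures never seen before start as Undefined.
    void resolveTransition( std::vector<Transition> &outTransitions,
                            const std::string &texture, Layout newLayout,
                            uint32_t stageMask )
    {
        assert( newLayout != Layout::Undefined );
        Layout &current = mCurrentLayout[texture];  // value-initialised to Undefined
        if( current != newLayout )
        {
            outTransitions.push_back( { texture, current, newLayout, stageMask } );
            current = newLayout;
        }
    }

    // Returns the layout the toy solver currently tracks for this texture.
    Layout layoutOf( const std::string &texture ) const
    {
        std::map<std::string, Layout>::const_iterator it = mCurrentLayout.find( texture );
        return it == mCurrentLayout.end() ? Layout::Undefined : it->second;
    }
};
```

In this toy, moving Bloom_1 to Texture and Bloom_2 to RenderTarget in one batch yields two entries, while asking again for a layout a texture is already in yields none.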

Post by dark_sylinc »

I forgot one particular case I should mention just in case: it may happen that we raise an exception because you essentially end up doing this:

mResourceTransitions.clear(); // resolveTransition() will add entries into mResourceTransitions
resolveTransition( texture, ResourceLayout::Texture, ResourceAccess::Read, stages );
resolveTransition( texture, ResourceLayout::RenderTarget, ResourceAccess::ReadWrite, 0u );
renderSystem->executeResourceTransition( mResourceTransitions );

That is, you transition the same texture to both Texture and RenderTarget in the same batch. This is invalid: a resource can't be in two different layouts at the same time.

If you need to sample from a texture while rendering the result of that same operation, you will need to use a ping-pong intermediate texture. OpenGL will happily and silently let you do it, but you may notice that it doesn't work on every machine, or that there are flickering artifacts (more intense on some HW, less intense on others).

D3D11 will complain though (with different error messages, but essentially due to the same problem).
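As a rough illustration of the ping-pong idea, the sketch below alternates between two textures so that no pass ever samples the texture it renders to. It is plain C++ with no OgreNext calls; PingPongStep and buildPingPongSchedule are hypothetical names, and the indices 0 and 1 simply stand for the two intermediate textures.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// One pass: which of the two textures is sampled and which is rendered to.
struct PingPongStep
{
    std::size_t source;  // index of the texture being sampled
    std::size_t target;  // index of the texture being rendered to
};

// Builds the (source, target) pairs for numPasses consecutive passes over
// two textures, swapping roles after every pass.
inline std::vector<PingPongStep> buildPingPongSchedule( std::size_t numPasses )
{
    std::vector<PingPongStep> steps;
    std::size_t source = 0u;
    std::size_t target = 1u;
    for( std::size_t i = 0u; i < numPasses; ++i )
    {
        // The invariant that keeps the transitions valid: never read and
        // write the same texture within one pass.
        assert( source != target );
        steps.push_back( { source, target } );
        std::swap( source, target );
    }
    return steps;
}
```

Between each pair of steps you would still issue the barriers described earlier: the previous target becomes a Texture and the previous source becomes a RenderTarget.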

Post by dark_sylinc »

Food for thought

Once you figure this out and get it working, you might notice some of your passes can be reordered to avoid barriers.

e.g. if you do:

  1. Write A

  2. Barrier (A -> from RenderTarget to Texture)

  3. Read A, Write to B

  4. Write C

  5. Barrier (C -> from RenderTarget to Texture)

  6. Read C, Write to D

And you can change it to:

  1. Write A

  2. Write C

  3. Barrier (A -> from RenderTarget to Texture, C -> from RenderTarget to Texture)

  4. Read A, Write to B

  5. Read C, Write to D

You will notice you end up with the same results, but just one barrier instead of two.
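The two orderings above can be compared with a small counting sketch. This is a toy under simplifying assumptions, not how OgreNext schedules barriers: Pass and countBarriers are hypothetical names, every texture is written only once, and the policy is greedy (when a pass samples a texture that is still in render-target layout, one barrier transitions every pending render target at once).

```cpp
#include <cassert>
#include <set>
#include <string>
#include <vector>

// A pass samples zero or more textures and renders to exactly one.
struct Pass
{
    std::vector<std::string> reads;  // textures sampled by this pass
    std::string writes;              // texture rendered to by this pass
};

// Counts barriers under the greedy toy policy described in the lead-in.
inline int countBarriers( const std::vector<Pass> &passes )
{
    std::set<std::string> inRenderTargetLayout;  // written, not yet transitioned
    int numBarriers = 0;
    for( const Pass &pass : passes )
    {
        assert( !pass.writes.empty() );
        bool needsBarrier = false;
        for( const std::string &tex : pass.reads )
        {
            if( inRenderTargetLayout.count( tex ) != 0u )
                needsBarrier = true;
        }
        if( needsBarrier )
        {
            ++numBarriers;
            // One barrier moves all pending render targets to Texture layout.
            inRenderTargetLayout.clear();
        }
        inRenderTargetLayout.insert( pass.writes );
    }
    return numBarriers;
}
```

Feeding it "write A, read A/write B, write C, read C/write D" gives 2 barriers, while "write A, write C, read A/write B, read C/write D" gives 1.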

Post by zxz »

Thanks a lot for the knowledge dump! Much appreciated. I hope it will be useful for more people. It belongs in some kind of documentation, if it isn't already there.

With your explanation, I've managed to restore the previous functionality without too much difficulty.

I understand that these barriers limit the parallelism on the GPU, as the operations have to be done sequentially. However, that seems impossible to avoid when it comes to something like a downsampling/upsampling pyramid for bloom. I assume that OpenGL performed this kind of barrier automatically under the hood, and that there is no further pessimization added when doing them manually like this.

When it comes to your recommendation to use nodes for this kind of thing, I must say that I find it very cumbersome to define them in code, and very limiting to do it in scripts. For example, the downsampling needs a different number of textures depending on the viewport size. For that reason, a custom pass that encompasses the whole bloom operation seemed like quite a simplification, and a logical high-level building block.
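To illustrate why the texture count depends on the viewport size, here is a minimal sketch of how such a chain could be sized. It is not my actual pass: Extent, buildDownsampleChain, and the minSize/maxLevels parameters are made up for the example, and halving each level is just one common choice.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

struct Extent
{
    uint32_t width;
    uint32_t height;
};

// Halves the resolution repeatedly and returns the size of every intermediate
// texture. Stops when the smaller side would drop below minSize, or when
// maxLevels textures have been produced.
inline std::vector<Extent> buildDownsampleChain( uint32_t width, uint32_t height,
                                                 uint32_t minSize, std::size_t maxLevels )
{
    assert( width > 0u && height > 0u );
    std::vector<Extent> chain;
    while( chain.size() < maxLevels )
    {
        width = std::max<uint32_t>( width / 2u, 1u );
        height = std::max<uint32_t>( height / 2u, 1u );
        if( std::min( width, height ) < minSize )
            break;
        chain.push_back( { width, height } );
    }
    return chain;
}
```

With minSize = 16 and maxLevels = 8, a 1920x1080 viewport produces 6 textures (960x540 down to 30x16), while 1280x720 produces 5, which is awkward to express in a static script.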

Post by dark_sylinc »

zxz wrote: Sun Mar 26, 2023 10:57 pm

Thanks a lot for the knowledge dump! Much appreciated. I hope it will be useful for more people. It belongs in some kind of documentation, if it isn't already there.

Agreed. Added a ticket to track this task.

zxz wrote: Sun Mar 26, 2023 10:57 pm

With your explanation, I've managed to restore the previous functionality without too much difficulty.

Great!

zxz wrote: Sun Mar 26, 2023 10:57 pm

I understand that these barriers limit the parallelism on the GPU, as the operations have to be done sequentially. However, that seems impossible to avoid when it comes to something like a downsampling/upsampling pyramid for bloom. I assume that OpenGL performed this kind of barrier automatically under the hood, and that there is no further pessimization added when doing them manually like this.

Correct on all counts. Except that when it comes to Compute Shaders, barriers in OpenGL must also be issued manually (i.e. Compute Shader B depends on the results of CS A, or the Pixel Shader C depends on the results of CS B).

Regarding the downsampling/upsampling pyramid: except on mobile*, a Compute Shader can do this much more efficiently using parallel reduction.

*"Except on mobile" because many mobile GPUs have horrible Compute Shader performance.

FidelityFX Single-Pass Downsampler can perform a reduction in a single pass of up to a 4096x4096 texture. Its documentation explains why/how it works.

zxz wrote: Sun Mar 26, 2023 10:57 pm

When it comes to your recommendation to use nodes for this kind of thing, I must say that I find it very cumbersome to define them in code, and very limiting to do it in scripts. For example, the downsampling needs a different number of textures depending on the viewport size. For that reason, a custom pass that encompasses the whole bloom operation seemed like quite a simplification, and a logical high-level building block.

I agree, won't argue there.
I usually define Compositor scripts and then edit them via C++, instead of generating everything from scratch via C++.
It's like gluing together nodes/passes that were defined in scripts.