Ah! Custom passes. Should've started there!
OK I'll split this post in two.
Theoretical background
GPUs are incredibly parallel machines, with many thousands of threads running in parallel to process either vertices or pixels. Or both.
Writable data has different restrictions than data that is read-only, not just in terms of access, but also in terms of how caches are synchronized or flushed (i.e. read-only data never needs to worry about cache invalidation).
Additionally there are other technical things to consider that are unrelated to parallelism; for example depth buffers have many acceleration structures such as min-max, Hi-Z, Early-Z and Z compression.
On some GPUs, these structures need to be decompressed/destroyed when you need to sample the depth buffer as a texture.
Therefore a texture needs to transition from "I'm a RenderTarget (with R/W access)" to "I'm a texture now (that is read-only and can be sampled with trilinear filtering)".
In some cases GPUs have to issue a "stop the world" event (a barrier) because we can't read from a texture while a previous pass is still writing to it.
However, if the 2nd pass doesn't use the texture in the vertex shader, the vertex shader can start processing early to get a head start (i.e. maximize parallelism as much as possible).
What exactly happens is very GPU-specific: some barriers are required by certain HW, while other HW will ignore them because they're not necessary.
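To make that concrete, this is roughly what such a transition looks like at the raw Vulkan level (a hand-written sketch purely for illustration; cmdBuffer and myTexture are made-up handles, and OgreNext generates the equivalent barrier for you). Note how the destination stage mask only includes the fragment shader, which is what lets the next pass's vertex work start early:
Code:
#include <vulkan/vulkan.h>

// Sketch: transition a colour texture from "render target being written to"
// to "texture sampled by the fragment shader".
VkImageMemoryBarrier barrier = {};
barrier.sType               = VK_STRUCTURE_TYPE_IMAGE_MEMORY_BARRIER;
barrier.srcAccessMask       = VK_ACCESS_COLOR_ATTACHMENT_WRITE_BIT;
barrier.dstAccessMask       = VK_ACCESS_SHADER_READ_BIT;
barrier.oldLayout           = VK_IMAGE_LAYOUT_COLOR_ATTACHMENT_OPTIMAL;
barrier.newLayout           = VK_IMAGE_LAYOUT_SHADER_READ_ONLY_OPTIMAL;
barrier.srcQueueFamilyIndex = VK_QUEUE_FAMILY_IGNORED;
barrier.dstQueueFamilyIndex = VK_QUEUE_FAMILY_IGNORED;
barrier.image               = myTexture; // hypothetical VkImage handle
barrier.subresourceRange.aspectMask = VK_IMAGE_ASPECT_COLOR_BIT;
barrier.subresourceRange.levelCount = 1u;
barrier.subresourceRange.layerCount = 1u;

// Only the fragment shader waits on the colour writes; vertex shaders of the
// next pass are free to start right away.
vkCmdPipelineBarrier( cmdBuffer, // hypothetical VkCommandBuffer
                      VK_PIPELINE_STAGE_COLOR_ATTACHMENT_OUTPUT_BIT, // producer
                      VK_PIPELINE_STAGE_FRAGMENT_SHADER_BIT,         // consumer
                      0u, 0u, NULL, 0u, NULL, 1u, &barrier );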
Practical Terms in OgreNext
If you write to a target and then read from it, all in the same custom pass, you need to issue barriers.
OgreNext in general has been coded around one pass doing exactly that, just one pass; and what you're describing sounds like it should be done at the Node level (that's why nodes exist... they take an input, transform it via multiple passes like a black box, and provide an output for other nodes to consume).
But you can still do everything in one pass if you wish to (in fact CompositorPassMipmap performs multiple internal passes all inside just one CompositorPass).
Passes normally analyze and issue barriers in CompositorPass::analyzeBarriers. However, since you're doing multiple passes inside a single CompositorPass, that is not enough.
Your code should look something like this between each pass:
Code:
renderSystem->endRenderPassDescriptor();

mResourceTransitions.clear(); // resolveTransition() will add entries into mResourceTransitions

// Declare how each texture is about to be used.
uint32 stagesThatWillUseYourTexture = ( 1u << GPT_VERTEX_PROGRAM ) | ( 1u << GPT_FRAGMENT_PROGRAM );
resolveTransition( textureToSample, ResourceLayout::Texture,
                   ResourceAccess::Read, stagesThatWillUseYourTexture );
resolveTransition( textureYouWillWriteTo, ResourceLayout::RenderTarget,
                   ResourceAccess::ReadWrite, 0u );

// Issue the actual barrier(s) and begin the next pass.
renderSystem->executeResourceTransition( mResourceTransitions );

renderSystem->beginRenderPassDescriptor( ... );
renderSystem->executeRenderPassDescriptorDelayedActions();

// start rendering commands
I believe this code should be self-explanatory enough: resolveTransition() is the main call you need (it accumulates into mResourceTransitions) and executeResourceTransition( mResourceTransitions ) issues the actual barrier.
resolveTransition() asks you what you will be using the texture for: Will it be a texture? A render target? For mipmap generation? A UAV? Will you just read from it, write only, or read-write? Which stages will be accessing it? (0 if it doesn't apply; e.g. RenderTargets don't really belong to any stage, since it's the ROPs that access them, not the pixel shader.)
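For example, if that same texture were instead going to be written by a compute shader as a UAV, the call would look something like this (a sketch; textureUsedAsUav is a made-up variable, and I'm assuming ResourceLayout::Uav and GPT_COMPUTE_PROGRAM here, so double-check the enum names against your OgreNext version):
Code:
mResourceTransitions.clear();
// Layout = UAV, write-only access, and the compute stage is the one touching it.
resolveTransition( textureUsedAsUav, ResourceLayout::Uav,
                   ResourceAccess::Write, 1u << GPT_COMPUTE_PROGRAM );
renderSystem->executeResourceTransition( mResourceTransitions );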
RenderTargets are usually read+write because of the potential for alpha blending. Depth buffers are also read+write by nature when used as render targets.
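So a colour target plus its attached depth buffer would typically be declared like this (again a sketch with made-up variable names; I'm assuming the depth buffer also takes ResourceLayout::RenderTarget when bound for writing):
Code:
// Both are read+write: colour because of potential alpha blending,
// depth because of depth test + depth write.
resolveTransition( colourTarget, ResourceLayout::RenderTarget,
                   ResourceAccess::ReadWrite, 0u );
resolveTransition( attachedDepthBuffer, ResourceLayout::RenderTarget,
                   ResourceAccess::ReadWrite, 0u );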
You should be able to figure it out with this information. In 99% of cases this is simple and straightforward, but in a few cases it may get yucky if you need to issue a TextureGpu::copyTo or a TextureGpu::_autogenerateMipmaps call, because those involve the copy encoder, which automatically changes layouts outside of the barrier solver unless explicitly instructed not to. Figuring out the right initial and final layouts after you're done ends up being more a matter of trial and error, honestly.
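For the record, that copy path looks roughly like this (a sketch; srcTexture and dstTexture are made-up variables, and the exact transition-mode arguments of copyTo vary between OgreNext versions, so treat it as a starting point):
Code:
// Copy mip 0 of srcTexture into dstTexture. The copy encoder transitions the
// layouts of both textures on its own, outside of the barrier solver.
srcTexture->copyTo( dstTexture, dstTexture->getEmptyBox( 0u ), 0u,
                    srcTexture->getEmptyBox( 0u ), 0u );

// Close the copy encoder before going back to resolveTransition()-based barriers,
// otherwise the solver's idea of the current layout may not match reality.
renderSystem->endCopyEncoder();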