[2.3] Vulkan Progress

Discussion area about developing with Ogre-Next (2.1, 2.2 and beyond)


dark_sylinc
OGRE Team Member
OGRE Team Member
Posts: 5296
Joined: Sat Jul 21, 2007 4:55 pm
Location: Buenos Aires, Argentina
x 1278
Contact:

Re: [2.3] Vulkan Progress

Post by dark_sylinc »

So I've been wondering for a while why performance was always half that of the other APIs (it was very suspicious). I took a deep dive into profiling today and found out we were stalling every frame!

Now that's a big facepalm.

I think I remember putting that there until we fixed more bugs.

Performance is now on par with the other APIs.

Post by dark_sylinc »

Hotshot5000 wrote: Wed Aug 05, 2020 9:46 pm
dark_sylinc wrote: Wed Aug 05, 2020 8:34 pm Btw how are you measuring GPU consumption?
I don't. That 1GB was of system ram reported by Visual Studio's Diagnostic Tools. I just looked at the Process Memory graph.
I'm not sure if these are related, but I noticed in Windows (via taskmgr) that I was consuming 2GB of VRAM in PbsMaterials, which was completely off. I also noticed that reducing mDefaultPoolSize[CPU_INACCESSIBLE] lowered the consumed memory exponentially.

Two simple bugs were causing a major memory waste.
Hotshot5000
OGRE Contributor
OGRE Contributor
Posts: 226
Joined: Thu Oct 14, 2010 12:30 pm
x 56

Post by Hotshot5000 »

I am not sure I understand what TextureFlags::MsaaExplicitResolve means.

From my (very limited) understanding of multisampling in Vulkan it seems that when multisampling is enabled the VulkanTextureGpu should also create a mMsaaFramebufferName image and imageview. So the colour and depth textures will also be mirrored with Msaa textures.

With these new textures the VulkanRenderPassDescriptor should then create new VkAttachmentDescription and VkAttachmentReference and the reference should be bound to subpass.pResolveAttachments.

Then in the pipeline multisampling.rasterizationSamples = msaaSamples; but this is already done.

What would the non explicit resolve scenario look like?

Should I add the explicit resolve to the VulkanTextureGpuManager::createTextureGpuWindow() and VulkanTextureGpuManager::createWindowDepthBuffer() ?

Post by dark_sylinc »

Let's forget about RenderWindows, because if you look at them, some implementations will confuse you.

Look at regular RenderTextures:

Without Explicit Resolves:
  • Both mMsaaFramebufferName and mFinalTextureName get populated.
  • mMsaaFramebufferName is MSAA. Created without the ShaderRead flag.
  • mFinalTextureName is not MSAA
  • Rendering happens to mMsaaFramebufferName
  • Once rendering is over, mMsaaFramebufferName gets resolved into mFinalTextureName
  • Using it as a Texture for sampling, you will always read from mFinalTextureName. Never from mMsaaFramebufferName
This is the case where you don't care about MSAA internals. You just want antialiasing and display to the screen.

With Explicit Resolves:
  • mMsaaFramebufferName is not used
  • mFinalTextureName is MSAA. Created with the ShaderRead flag.
  • Rendering happens to mFinalTextureName (same as if MSAA were disabled)
  • Using it as a Texture will read from mFinalTextureName, which is an MSAA surface
  • To resolve it, the RenderPassDescriptor::mResolveTexture must be explicitly set with another TextureGpu specifying where to resolve (see CubemapRendererNodeMsaa in Tutorial_DynamicCubemap.compositor)
This is the case where, for some reason (most likely postprocessing, HDR, etc), you need access to MSAA directly, and you know a thing or two about how MSAA works (or you just copy/paste our HDR+MSAA custom resolve shaders).
Or your texture is not Type2D (e.g. a cubemap, 3D, 2DArray, etc)
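
To make the two lists above easier to compare, here is a minimal sketch of the rules as plain C++. It is not Ogre code: RenderTextureSketch and the three helper functions are hypothetical names made up for illustration, and only the member names mMsaaFramebufferName and mFinalTextureName come from the explanation above.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-in for a render texture; this is NOT Ogre's TextureGpu.
struct RenderTextureSketch
{
    bool msaaEnabled;
    bool explicitResolve;  // models the idea behind TextureFlags::MsaaExplicitResolve
};

// Which internal surface receives the draw calls.
inline std::string renderTarget( const RenderTextureSketch &t )
{
    // Only the implicit-resolve MSAA case renders into the separate MSAA surface.
    if( t.msaaEnabled && !t.explicitResolve )
        return "mMsaaFramebufferName";
    return "mFinalTextureName";
}

// Sampling always reads mFinalTextureName; this says whether that surface is MSAA.
inline bool sampledSurfaceIsMsaa( const RenderTextureSketch &t )
{
    return t.msaaEnabled && t.explicitResolve;
}

// True when resolving is the user's job, i.e. a resolve target has to be
// set explicitly on the render pass instead of happening automatically.
inline bool resolveIsUsersJob( const RenderTextureSketch &t )
{
    return t.msaaEnabled && t.explicitResolve;
}
```

With msaaEnabled = true and explicitResolve = false, rendering goes to mMsaaFramebufferName and shaders read a resolved, non-MSAA mFinalTextureName. Flipping explicitResolve to true makes mFinalTextureName the MSAA surface for both rendering and sampling.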

Enter RenderWindows:
Render Windows are a special breed because every API is different:
  • OpenGL allows the window surface to be MSAA. Resolving happens internally in the driver or the OS. Hence here it behaves like explicit resolves, so we set the MsaaExplicitResolveFlag and only mFinalTextureName is used (which is an MSAA swapchain)
  • D3D11 is the same as OpenGL, except when using DXGI_SWAP_EFFECT_FLIP_SEQUENTIAL or _FLIP_DISCARD (i.e. UWP apps). In the UWP case, the resolve must happen in the application. Therefore we do not set MsaaExplicitResolveFlag and mMsaaFramebufferName was created by us, while mFinalTextureName contains the swapchain the OS handed us
  • Metal is like UWP apps. The application is expected to do the resolve. The swapchain cannot be MSAA. Therefore we do not set MsaaExplicitResolveFlag and mMsaaFramebufferName was created by us, while mFinalTextureName contains the swapchain the OS handed us
I believe Vulkan is like UWP and Metal unless I'm mistaken. I didn't research too much into it.
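
The per-API behaviour of RenderWindows described above can be summarised as a small lookup. This is only a sketch: the WindowApi enum and both functions are hypothetical, and the Vulkan entry encodes the "I believe" from the post, so treat it as an unverified assumption.

```cpp
#include <cassert>

// Hypothetical enum; D3D11 is split in two because the flip-model swap effects behave differently.
enum class WindowApi
{
    OpenGL,
    D3D11Legacy,     // swap effects other than FLIP_SEQUENTIAL / FLIP_DISCARD
    D3D11FlipModel,  // DXGI_SWAP_EFFECT_FLIP_SEQUENTIAL or _FLIP_DISCARD (e.g. UWP)
    Metal,
    Vulkan
};

// True when the window surface itself is MSAA and the driver/OS resolves it,
// which is the case where the explicit resolve flag is set on the window texture.
inline bool windowSetsExplicitResolveFlag( WindowApi api )
{
    switch( api )
    {
    case WindowApi::OpenGL:
    case WindowApi::D3D11Legacy:
        return true;
    case WindowApi::D3D11FlipModel:
    case WindowApi::Metal:
        return false;
    case WindowApi::Vulkan:
        return false;  // ASSUMPTION: "like UWP and Metal", not researched in the post
    }
    return false;
}

// True when the application creates its own MSAA surface (mMsaaFramebufferName)
// and resolves it into the swapchain image held in mFinalTextureName.
inline bool appCreatesMsaaSurface( WindowApi api )
{
    return !windowSetsExplicitResolveFlag( api );
}
```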

I hope this clarifies everything.

This post also talks further about implicit and explicit resolves, and MSAA.

Post by dark_sylinc »

Compute Shaders are now working!
[Attached image: Compute.png]
We're reaching the end of the line now. I need to port the remaining shaders, fix the bugs that arise, and implement details like MSAA, window config (which hopefully Hotshot5000 will have finished by the time I'm done), resizing the window, detecting which extensions are unsupported, fixing the other RenderSystems after the barrier changes, etc.

Btw Hotshot5000, if MSAA is too much for you I can later take care of it; just leave the FSAA setting with dummy options. For me it should be easy.

Post by dark_sylinc »

Added Documentation with examples to RootLayout in case someone had no idea what was going on.

Post by Hotshot5000 »

I've made a commit with what I have so far. MSAA is still not working; I get:

Code:

Ogre: ERROR: [Validation] Code 0 :  [ VUID-VkSubpassDescription-layout-02519 ] Object: VK_NULL_HANDLE (Type = 0) | vkCreateRenderPass(): subpass 0 already uses attachment 0 with a different image layout (VK_IMAGE_LAYOUT_COLOR_ATTACHMENT_OPTIMAL vs VK_IMAGE_LAYOUT_UNDEFINED). The Vulkan spec states: If any attachment is used by more than one VkAttachmentReference member, then each use must use the same layout (https://www.khronos.org/registry/vulkan/specs/1.1-extensions/html/vkspec.html#VUID-VkSubpassDescription-layout-02519)
Ogre: ERROR: [Validation] Code 0 :  [ VUID-VkAttachmentReference-layout-00857 ] Object: VK_NULL_HANDLE (Type = 0) | Layout for color attachment reference 1 in subpass 0 is VK_IMAGE_LAYOUT_UNDEFINED but should be COLOR_ATTACHMENT_OPTIMAL or GENERAL. The Vulkan spec states: If attachment is not VK_ATTACHMENT_UNUSED, layout must not be VK_IMAGE_LAYOUT_UNDEFINED, VK_IMAGE_LAYOUT_PREINITIALIZED, VK_IMAGE_LAYOUT_PRESENT_SRC_KHR, VK_IMAGE_LAYOUT_DEPTH_ATTACHMENT_OPTIMAL_KHR, VK_IMAGE_LAYOUT_DEPTH_READ_ONLY_OPTIMAL_KHR, VK_IMAGE_LAYOUT_STENCIL_ATTACHMENT_OPTIMAL_KHR, or VK_IMAGE_LAYOUT_STENCIL_READ_ONLY_OPTIMAL_KHR (https://www.khronos.org/registry/vulkan/specs/1.1-extensions/html/vkspec.html#VUID-VkAttachmentReference-layout-00857)
in VulkanCache::getRenderPass()

I'll fix this issue around the beginning of next week (hopefully by Tuesday), and after that I'll take a small vacation for a few days, maybe one or two weeks. I'll still check this thread from time to time, so if there are any questions leave them here and I'll answer them.

It's summertime here and I need a break. I haven't been out except to the corner shop in the last 3-4 months because of the Covid-19 situation. All I have been doing is just work at the job + work on my game + work on Ogre::Vulkan :)

Post by dark_sylinc »

It took me the whole day and I'm exhausted, but LocalCubemapsManualProbes is finally working! Error free!
[Attached image: LocalCubemapsManual.jpg]
I need to fix a couple issues that I spotted that have little to do with Vulkan (Valgrind is reporting uninitialized memory reads in our HlmsPassPso caches).

After that I'll look into integrating your MSAA + window config changes before my code keeps diverging (I can already foresee some merge conflicts from today's changes...)

Post by Hotshot5000 »

Small update from me here. The swapchain image looks like it contains the anti-aliased image but for some reason the screen flickers and I get:

Code:

Ogre: ERROR: [Validation] Code 0 :  [ UNASSIGNED-CoreValidation-DrawState-InvalidImageLayout ] Object: 0x211ffa95a50 (Type = 6) | Submitted command buffer expects VkImage 0x41ab840000000021[RenderWindow] (subresource: aspectMask 0x1 array layer 0, mip level 0) to be in layout VK_IMAGE_LAYOUT_COLOR_ATTACHMENT_OPTIMAL--instead, current layout is VK_IMAGE_LAYOUT_PRESENT_SRC_KHR.
Ogre: ERROR: [Validation] Code 0 :  [ UNASSIGNED-CoreValidation-DrawState-InvalidImageLayout ] Object: 0x211ffa95a50 (Type = 6) | Submitted command buffer expects VkImage 0x5261630000000026[] (subresource: aspectMask 0x1 array layer 0, mip level 0) to be in layout VK_IMAGE_LAYOUT_COLOR_ATTACHMENT_OPTIMAL--instead, current layout is VK_IMAGE_LAYOUT_PRESENT_SRC_KHR.
Ogre: ERROR: [Validation] Code 0 :  [ VUID-VkPresentInfoKHR-pImageIndices-01296 ] Object: 0x211ec78fae0 (Type = 4) | Images passed to present must be in layout VK_IMAGE_LAYOUT_PRESENT_SRC_KHR or VK_IMAGE_LAYOUT_SHARED_PRESENT_KHR but is in VK_IMAGE_LAYOUT_COLOR_ATTACHMENT_OPTIMAL. The Vulkan spec states: Each element of pImageIndices must be the index of a presentable image acquired from the swapchain specified by the corresponding element of the pSwapchains array, and the presented image subresource must be in the VK_IMAGE_LAYOUT_PRESENT_SRC_KHR layout at the time the operation is executed on a VkDevice (https://github.com/KhronosGroup/Vulkan-Docs/search?q=VUID-VkPresentInfoKHR-pImageIndices-01296)
I'll check with renderdoc tomorrow to see the transitions and where it goes wrong.

Post by dark_sylinc »

OK, I just tried your changes with FSAA. You'll never be able to fix it, because CompositorPass::analyzeBarriers never considers the resolveTexture, and there is no ResourceLayout to signify a resolve destination.

Fixing that right now.
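
For readers unfamiliar with the problem, here is a rough sketch of what "considering the resolve texture" means during barrier analysis. Everything in it is hypothetical (LayoutSketch, TextureSketch, PassSketch, analyzeBarriersSketch); it does not reflect how CompositorPass::analyzeBarriers is actually implemented, and the ResolveDest value is only an assumed name for the missing layout.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical layouts; ResolveDest stands for the kind of layout the post says was missing.
enum class LayoutSketch { Undefined, RenderTarget, ResolveDest, PresentReady };

struct TextureSketch
{
    std::string  name;
    LayoutSketch layout;  // layout the texture is currently in
};

struct PassSketch
{
    TextureSketch *colour;   // surface being rendered to
    TextureSketch *resolve;  // resolve destination, may be nullptr
};

struct TransitionSketch
{
    std::string  texture;
    LayoutSketch from;
    LayoutSketch to;
};

// Records a transition if the texture is not already in the wanted layout.
inline void requireLayout( TextureSketch *tex, LayoutSketch wanted,
                           std::vector<TransitionSketch> &out )
{
    if( tex && tex->layout != wanted )
    {
        out.push_back( { tex->name, tex->layout, wanted } );
        tex->layout = wanted;
    }
}

// Collects every transition the pass needs before it can execute.
inline std::vector<TransitionSketch> analyzeBarriersSketch( PassSketch &pass )
{
    std::vector<TransitionSketch> out;
    requireLayout( pass.colour, LayoutSketch::RenderTarget, out );
    // If this line is left out, the resolve destination (e.g. the swapchain image)
    // never gets a barrier, which would match the layout errors reported earlier.
    requireLayout( pass.resolve, LayoutSketch::ResolveDest, out );
    return out;
}
```

In this sketch, a pass that renders into an MSAA surface and resolves into a window that is still in PresentReady produces two transitions, the second one moving the window into ResolveDest.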

Post by dark_sylinc »

MSAA is working. Thanks for dealing with the details!

I only tested RenderWindow's MSAA. Hopefully regular render textures are also working. Update: Regular RenderTextures are also tested.

Also... I can finally select resolutions!

Post by Hotshot5000 »

dark_sylinc wrote: Tue Aug 18, 2020 11:39 pm OK I just tried your changes with FSAA, you'll never be able to fix it because CompositorPass::analyzeBarriers never considers the resolveTexture, and there is no ResourceLayout to signify Resolve destination

Fixing that right now.
At least I know I was on the right path :). I realized the issue was that there was no barrier for the resolve texture, and I ended the day looking at analyzeBarriers(), but I was too tired to figure things out.

I need to spend some time looking into the Compositor component to get more familiar with it. I think it was added in v2.1, but I never got the time to properly look into it. I'll take a bit of time off (a week, maybe two) and then I will learn more about Ogre.

Post by dark_sylinc »

Refractions, SMAA and SSAO samples are working now.

By now most samples are working in Vulkan. I didn't think we would get this far so fast.

We have yet to port some Compute Shaders (e.g. Voxel Cone Tracing), fix the other RenderSystems (due to breaking changes from Vulkan), and fix window resizing.
happyOgreRust
Gnoblar
Posts: 6
Joined: Mon Aug 10, 2020 3:31 am

Post by happyOgreRust »

In the vulkan branch, OgreVulkanWin32Window.cpp line 501 calls setMsaaBackbuffer(), but this function does not exist. It can't be compiled.

Post by dark_sylinc »

happyOgreRust wrote: Sat Aug 22, 2020 4:24 am The vulkan branch in OgreVulkanWin32Window.cpp . line 501, setMsaaBackbuffer() not exist this funciton. Can't compiled.
I pushed a fix for that; but I didn't try compiling on Windows again yet; so I don't know if there are other bugs. Thanks for reporting.

Post by dark_sylinc »

We're getting closer and closer to trying Ogre on Android, and as we approach it I need to tell you something @Hotshot5000:

I noticed that on Android almost all GPUs report only 65536 elements in maxTexelBufferElements.

This means TexBufferPacked's maximum size can only be between 64KB (R8_UNORM) and 1MB (RGBA_FLOAT32). This is in stark contrast to PC, where the minimum size is 128MB.

1MB is not enough for most of our advanced features (e.g. Forward+). In fact HlmsPbs by default reserves 4MB for world matrices (see setTextureBufferDefaultSize).
We could use your emulated TexBufferPacked which used a 2D texture behind the scenes that you wrote for GLES2 (and which was ported to GL3+ to get it working on macOS)
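
The size limits above follow from multiplying the element count by the bytes per texel. A quick sketch of the arithmetic (the function and constant names are made up for illustration; the figures are the ones quoted in this post):

```cpp
#include <cassert>
#include <cstdint>

// Largest texel buffer in bytes for a given element limit and texel size.
constexpr std::uint64_t maxTexBufferBytes( std::uint64_t maxTexelBufferElements,
                                           std::uint64_t bytesPerTexel )
{
    return maxTexelBufferElements * bytesPerTexel;
}

// True when a buffer of the requested size fits within the limit.
constexpr bool fitsInTexBuffer( std::uint64_t requestedBytes,
                                std::uint64_t maxTexelBufferElements,
                                std::uint64_t bytesPerTexel )
{
    return requestedBytes <= maxTexBufferBytes( maxTexelBufferElements, bytesPerTexel );
}

constexpr std::uint64_t kAndroidElements = 65536u;  // limit reported by most Android GPUs
constexpr std::uint64_t kBytesR8Unorm = 1u;         // 1 channel x 8 bits
constexpr std::uint64_t kBytesRgbaFloat32 = 16u;    // 4 channels x 32 bits
constexpr std::uint64_t kDefaultWorldMatricesBytes = 4u * 1024u * 1024u;  // HlmsPbs default

static_assert( maxTexBufferBytes( kAndroidElements, kBytesR8Unorm ) == 64u * 1024u,
               "R8_UNORM tops out at 64KB" );
static_assert( maxTexBufferBytes( kAndroidElements, kBytesRgbaFloat32 ) == 1024u * 1024u,
               "RGBA_FLOAT32 tops out at 1MB" );
static_assert( !fitsInTexBuffer( kDefaultWorldMatricesBytes, kAndroidElements, kBytesRgbaFloat32 ),
               "the 4MB default does not fit even with the widest format" );
```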

However there is a much better way. On Android, the max range for SSBOs is at least 128MB. This is bizarre because SSBOs are considered a much more modern and advanced feature (but it turns out it's just a regular memory fetch with no format conversion, which is why it is better supported on Android); in fact it's not available in OpenGL 3.x (it needs 4.x).

Something I meant to do a long time ago is to replace most of our uses of TexBufferPacked with a new buffer: ReadOnlyBufferPacked, or RawReadBufferPacked (not sure on the name yet).

The reason is the following:
  • On D3D11, read-only SSBOs actually map to texture buffer registers, and HLSL shaders must use either Buffer<float>, StructuredBuffer<float>, or ByteAddressBuffer. See Perftest. In short: C++ behaves like TexBufferPacked with a few extra flags during buffer creation, and the relevant portions are in HLSL code
  • On GL 4.x HW; ReadOnlyBufferPacked would behave exactly as UavBufferPacked from C++. Shader code would use SSBOs with readonly keyword
  • On GL 3.x HW; ReadOnlyBufferPacked would behave exactly as TexBufferPacked. Shader code would use samplerBuffer. In other words behave exactly as it does now.
  • On Vulkan, ReadOnlyBufferPacked would behave like GL 4.x
  • On Metal it doesn't matter. TexBufferPacked & UavBufferPacked are just read-only device memory pointers; and not even the shader syntax changes
Thus, we'd create ReadOnlyBufferPacked rather than switching to UavBufferPacked or sticking with TexBufferPacked because:
  • On Vulkan we shouldn't use your TexBuffer emulation layer because it'd be inefficient (SSBOs are much better)
  • On D3D11 UavBufferPacked are too expensive (they force lots of memory barriers between draws)
  • On GL we'd lose GL 3.x support if we use UavBufferPacked
ReadOnlyBufferPacked would allow us to dynamically switch between the most efficient / best supported method depending on HW.
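
As a rough illustration of that switching, the per-API plan listed above could be expressed like this. The enums and chooseReadOnlyBacking are hypothetical names for this sketch and are not the actual ReadOnlyBufferPacked implementation.

```cpp
#include <cassert>

enum class ApiSketch { D3D11, GL, Vulkan, Metal };

// How a read-only buffer would be backed on a given API (hypothetical categories).
enum class ReadOnlyBacking
{
    TexBufferLike,  // behaves like TexBufferPacked from C++
    SsboReadOnly,   // behaves like UavBufferPacked from C++, declared readonly in the shader
    DevicePointer   // plain read-only device memory pointer
};

// glSupportsSsbo is only consulted for GL: true on GL 4.x HW, false on GL 3.x HW.
inline ReadOnlyBacking chooseReadOnlyBacking( ApiSketch api, bool glSupportsSsbo )
{
    switch( api )
    {
    case ApiSketch::D3D11:
        // Read-only SSBOs map to texture buffer registers; avoids UAV barrier costs.
        return ReadOnlyBacking::TexBufferLike;
    case ApiSketch::GL:
        // Keeps GL 3.x working by falling back to samplerBuffer.
        return glSupportsSsbo ? ReadOnlyBacking::SsboReadOnly : ReadOnlyBacking::TexBufferLike;
    case ApiSketch::Vulkan:
        // Same as GL 4.x; sidesteps the small maxTexelBufferElements on Android.
        return ReadOnlyBacking::SsboReadOnly;
    case ApiSketch::Metal:
        return ReadOnlyBacking::DevicePointer;
    }
    return ReadOnlyBacking::TexBufferLike;
}
```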

I'm giving you a heads up in case you beat me to getting Ogre running on Android and run into this problem (which is more of an annoying inconvenience really)

Cheers
Matias

Post by dark_sylinc »

Btw I had forgotten about this: shaderc is not a different shader compiler from Google.

It takes glslang and integrates it with other stuff for SPIRV generation (like optimizers, validators); and wraps it into compatible interfaces (e.g. C interface instead of C++).
Therefore its HLSL support is the same as glslang's, because it IS glslang. It's not an alternative / competing implementation (unlike dxc from MS which is a different compiler supporting hlsl -> spirv)

Post by dark_sylinc »

Oooff!!!

It took me the entire weekend and some, but ReadOnlyBufferPacked is finally here!

Now Android readiness should be maximum. I need to fix a couple more issues on my end, top priority:
  • MemoryResidency shows some errors
  • Leaks on shutdown
  • Port Terra
But there's no impediment to start an Android port.

The main question is how we want to approach it. There are a hundred ways to write a C++ app for Android:
  • Write some thin Java code, then delegate to C++ as much as possible
  • Pure NDK
I'm actually more of a fan of mixing Java, because IMO it has less friction as the tooling is polished (unlike the C++ aspects). And managing the Java-side via pure NDK calls is really verbose.

It also becomes possible to load different .so depending on platform support (e.g. NEON vs non-NEON versions).

I also want to see if D3D11, GL and Metal are working as intended, because I want to merge into master in the coming weeks. It doesn't seem to make sense anymore to keep the Vulkan stuff in a separate branch.
rujialiu
Goblin
Posts: 296
Joined: Mon May 09, 2016 8:21 am
x 35

Post by rujialiu »

dark_sylinc wrote: Tue Aug 25, 2020 4:11 am Therefore its HLSL support is the same as glslang's, because it IS glslang. It's not an alternative / competing implementation (unlike dxc from MS which is a different compiler supporting hlsl -> spirv)
Yeah, I'm aware of this. So what I tried basically means it looks like glslang's HLSL is good enough for compiling our current HLMS shaders :)

Post by rujialiu »

That's fantastic!!!
dark_sylinc wrote: Tue Sep 01, 2020 3:00 am I also want to see if D3D11, GL and Metal are working as intended; because I want to merge into master around these weeks. It doesn't seem to make sense anymore to keep the Vulkan stuff in a separate branch.
I can test D3D11/Vulkan in our project before merging v2-2-vulkan into master, if you want

Post by dark_sylinc »

rujialiu wrote: Wed Sep 02, 2020 1:17 pm I can test D3D11/Vulkan in our project before merging v2-2-vulkan into master, if you want
Thanks. First I'll run our unit testing suite which should catch any obvious regression on D3D11.

As for testing Vulkan in your project, targeting Vulkan sometimes needs some extra work, so I doubt it will work out of the box.

That extra work is:
  • Our HLSL support is not well tested at all. Probably we should use DXC instead of glslang. And select Vulkan 1.2 (right now it's hardcoded to 1.0.2)
  • Otherwise use GLSL shaders, which your project isn't using
  • In very few cases, setting up / tweaking RootLayouts is needed, and there is little documentation about it right now

Post by dark_sylinc »

I fixed multiple problems in D3D11. It's working now.

However I won't be merging to master yet, because the unit tests caught minor differences which I have yet to research.

Slight differences in lighting in SOME of the PCC samples, and the Decals sample's lighting is different (as if one accounts for normal map while the other doesn't). I have yet to check whether these are regressions (or improvements) caused by the Vulkan branch, or if my baseline set is just too old.

In other news, I'm starting to port to Android! :D :D

Post by dark_sylinc »

Running a clear screen from Android via Vulkan for the first time ever.


Post by dark_sylinc »

IT LIVES!!!



Regarding the framerate: This is a full debug build with validations enabled.
It's also old HW (Adreno 505) at 1080p

Note to self: D32_SFLOAT is broken in older Adreno. Can't be used. Shadow maps are using PFG_D24_UNORM_S8_UINT instead.

It's a bit hacked together so I'll have to make the proper fixes later. It is getting late and I've been at this the whole weekend. I'm tired.
Slicky
Bronze Sponsor
Bronze Sponsor
Posts: 614
Joined: Mon Apr 14, 2003 11:48 pm
Location: Was LA now France
x 25

Post by Slicky »

Nicely done. I've played with Ogre 1 on Android; now Ogre 2 has it too.