[2.3] Vulkan Progress

Discussion area about developing with Ogre2 branches (2.1, 2.2 and beyond)
User avatar
dark_sylinc
OGRE Team Member
OGRE Team Member
Posts: 4569
Joined: Sat Jul 21, 2007 4:55 pm
Location: Buenos Aires, Argentina
x 961
Contact:

Re: [2.3] Vulkan Progress

Post by dark_sylinc »

The only thing that comes to my mind then is that we're acquiring more than one swapchain per frame, thus we run out of them and have to wait

Edit: Or alternatively, we're waiting for 2 swapchains instead of one (i.e. the one we're supposed to wait for + a foreign one. At the very least mGpuWaitSemaphForCurrCmdBuff.size should always be 0 or 1, same for mGpuSignalSemaphForCurrCmdBuff

Edit 2: Try setting your CPU freq scaling to "Performance" instead of powersave.

Edit 3: Try doubling (or 3x) the amount of swapchain images we request in VulkanWindow::createSwapchain (change minImageCount) to see if it makes any difference.

dermont
Bugbear
Posts: 808
Joined: Thu Dec 09, 2004 2:51 am
x 42

Re: [2.3] Vulkan Progress

Post by dermont »

dark_sylinc wrote:
Fri Oct 16, 2020 4:47 am
The only thing that comes to my mind then is that we're acquiring more than one swapchain per frame, thus we run out of them and have to wait

Edit: Or alternatively, we're waiting for 2 swapchains instead of one (i.e. the one we're supposed to wait for + a foreign one. At the very least mGpuWaitSemaphForCurrCmdBuff.size should always be 0 or 1, same for mGpuSignalSemaphForCurrCmdBuff

Edit 2: Try setting your CPU freq scaling to "Performance" instead of powersave.

Edit 3: Try doubling (or 3x) the amount of swapchain images we request in VulkanWindow::createSwapchain (change minImageCount) to see if it makes any difference.
Thanks for the suggestions, already tried the power settings in NVidia(Auto/Adaptive & max performance). Tried tripling the amount of swapchange images, it didn't make any difference.

I have some old OIS examples that I updated and built against master (with overlay enabled) and they did not show the low frame rate as per the sdl demos.

So I copied the Pbs code into one of examples and in debug the sample ran 500 fps. I'll try and produce a simpler sdl/xcb demo to see if I can trace what is happening.

Edit: I tried NVIDIA Nsight but it didn't show anything other than we already know.
https://ibb.co/fp5X4RF
https://ibb.co/0XnZykh

User avatar
dark_sylinc
OGRE Team Member
OGRE Team Member
Posts: 4569
Joined: Sat Jul 21, 2007 4:55 pm
Location: Buenos Aires, Argentina
x 961
Contact:

Re: [2.3] Vulkan Progress

Post by dark_sylinc »

dermont wrote:
Fri Oct 16, 2020 9:03 am
Edit: I tried NVIDIA Nsight but it didn't show anything other than we already know.
https://ibb.co/fp5X4RF
https://ibb.co/0XnZykh
This may be a bit too advanced, NSight has a tracing option which is later analyzed in its own visualizer; where you can see a timeline of VBlanks presentations and commands issued. This timeline graph could clarify what's going wrong.

dermont
Bugbear
Posts: 808
Joined: Thu Dec 09, 2004 2:51 am
x 42

Re: [2.3] Vulkan Progress

Post by dermont »

dark_sylinc wrote:
Fri Oct 16, 2020 4:02 pm
..
This may be a bit too advanced, NSight has a tracing option which is later analyzed in its own visualizer; where you can see a timeline of VBlanks presentations and commands issued. This timeline graph could clarify what's going wrong.
Thanks I had already enabled the Vulkan trace in the NSight gui but wasn't getting a complete output of commands:

https://developer.nvidia.com/blog/nvidi ... ers-cloud/

This is the memory usage for the PbsMaterial demo from nvidia-smi:

Vulkan

Code: Select all

+-----------------------------------------------------------------------------+
| NVIDIA-SMI 455.28       Driver Version: 455.28       CUDA Version: 11.1     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  GeForce GTX 650...  Off  | 00000000:01:00.0 N/A |                  N/A |
| 42%   58C    P0    N/A /  N/A |    461MiB /  1996MiB |     N/A      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+
OpenGL

Code: Select all

+-----------------------------------------------------------------------------+
| NVIDIA-SMI 455.28       Driver Version: 455.28       CUDA Version: 11.1     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  GeForce GTX 650...  Off  | 00000000:01:00.0 N/A |                  N/A |
| 51%   63C    P0    N/A /  N/A |    614MiB /  1996MiB |     N/A      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+
Attached is screen shot of a typical frame from NSight for the PbsMaterial demo run from the master branch (debug) and one the one I build with OIS comparison. Also I included a summary of the chronology of events for the sdl demo, using cut and paste from the tooltips, so sorry in advance for any omissions.

I know you are busy and don't want to keep using up your time.

Edit:
Zip file too large to upload.

Code: Select all

===========================
WorkLoad
===========================


*************************************
Command buffer [Begin,End]
*************************************
Begins: 6.28677s
Ends: 6.29293s (+6.157 ms)
Thread [7084] Sample_PbsMater


---------------------
Sampling point
---------------------
Call stack at 6.2879s:
libOgreMain_d.so.2.3.0!Ogre::ConvexBody::clip(...)
libOgreMain_d.so.2.3.0!Ogre::FocusedShadowCameraSetup::getShadowCamera(...) const
libOgreMain_d.so.2.3.0!Ogre::PSSMShadowCameraSetup::getShadowCamera(...) const
libOgreMain_d.so.2.3.0!Ogre::CompositorShadowNode::_update(...)
libOgreMain_d.so.2.3.0!Ogre::CompositorPassScene::execute(...)
libOgreMain_d.so.2.3.0!Ogre::CompositorNode::_update(...)
libOgreMain_d.so.2.3.0!Ogre::CompositorWorkspace::_update()
[Max depth]![Max depth]


---------------------
Sampling point
---------------------

Call stack at 6.28838s:
libOgreMain_d.so.2.3.0!Ogre::OrderPropertyByIdString(...)
libOgreMain_d.so.2.3.0!bool __gnu_cxx::__ops::_Iter_comp_val<...>::operator()<...>(...)
libOgreMain_d.so.2.3.0!__gnu_cxx::__normal_iterator<...> std::__lower_bound<...>(...)
libOgreMain_d.so.2.3.0!__gnu_cxx::__normal_iterator<...> std::lower_bound<...>(...)
libOgreMain_d.so.2.3.0!Ogre::Hlms::setProperty(...)
libOgreHlmsPbs_d.so.2.3.0!Ogre::HlmsPbs::preparePassHash(...)
libOgreMain_d.so.2.3.0!Ogre::RenderQueue::renderPassPrepare(...)
libOgreMain_d.so.2.3.0!Ogre::SceneManager::_cullPhase01(...)
[Max depth]![Max depth]

Call to: vkCmdBindPipeline
■ Vulkan API calls
Begins: 6.28889s
Ends: 6.2889s (+7.017 μs)

Call to: vkWaitForFences
■ Vulkan API calls
Begins: 6.28891s
Ends: 6.28893s (+16.345 μs)

Call to: vkWaitForFences
■ Vulkan API calls
Begins: 6.28893s
Ends: 6.28894s (+5.600 μs)

Call to: vkCmdBindDescriptorSets
■ Vulkan API calls
Begins: 6.28899s
Ends: 6.289s (+7.240 μs)

Call to: vkCmdBindPipeline
■ Vulkan API calls
Begins: 6.28901s
Ends: 6.28902s (+4.069 μs)

---------------------
Sampling point
---------------------

Call stack at 6.28933s:
libOgreMain_d.so.2.3.0!Ogre::CbIndirectBuffer::CbIndirectBuffer(...)
libOgreMain_d.so.2.3.0!Ogre::RenderQueue::renderGL3(...)
libOgreMain_d.so.2.3.0!Ogre::RenderQueue::render(...)
libOgreMain_d.so.2.3.0!Ogre::SceneManager::_renderPhase02(...)
libOgreMain_d.so.2.3.0!Ogre::Camera::_renderScenePhase02(...)
libOgreMain_d.so.2.3.0!Ogre::Viewport::_updateRenderPhase02(...)
libOgreMain_d.so.2.3.0!Ogre::CompositorPassScene::execute(...)
[Max depth]![Max depth]

Call to: vkCmdBindPipeline
■ Vulkan API calls
Begins: 6.28952s
Ends: 6.28952s (+6.244 μs)

Call to: vkCmdBindDescriptorSets
■ Vulkan API calls
Begins: 6.28956s
Ends: 6.28957s (+5.473 μs)


---------------------
Sampling point
---------------------

Call stack at 6.28999s:
libOgreMain_d.so.2.3.0!Ogre::FastArray<...>::size() const
libOgreMain_d.so.2.3.0!Ogre::RenderQueue::render(...)
libOgreMain_d.so.2.3.0!Ogre::SceneManager::_renderPhase02(...)
libOgreMain_d.so.2.3.0!Ogre::Camera::_renderScenePhase02(...)
libOgreMain_d.so.2.3.0!Ogre::Viewport::_updateRenderPhase02(...)
libOgreMain_d.so.2.3.0!Ogre::CompositorPassScene::execute(...)
[Max depth]![Max depth]

Call to: vkCmdBindPipeline
■ Vulkan API calls
Begins: 6.2901s
Ends: 6.2901s (+5.753 μs)

Call to: vkCmdBindDescriptorSets
■ Vulkan API calls
Begins: 6.29014s
Ends: 6.29015s (+5.246 μs)

Call to: vkCmdBindPipeline
■ Vulkan API calls
Begins: 6.29016s
Ends: 6.29016s (+3.220 μs)

Call to: vkCmdBindPipeline
■ Vulkan API calls
Begins: 6.29073s
Ends: 6.29073s (+7.340 μs)

Call to: vkCmdBindDescriptorSets
■ Vulkan API calls
Begins: 6.29078s
Ends: 6.29078s (+6.052 μs)

Call to: vkCmdBindDescriptorSets
■ Vulkan API calls
Begins: 6.29078s
Ends: 6.29078s (+6.052 μs)


---------------------
Sampling point
---------------------

Call stack at 6.29082s:
libOgreMain_d.so.2.3.0!Ogre::HlmsProperty* std::__copy_move_backward<...>::__copy_move_b<...>(...)
libOgreMain_d.so.2.3.0!Ogre::HlmsProperty* std::__copy_move_backward_a<...>(...)
libOgreMain_d.so.2.3.0!Ogre::HlmsProperty* std::__copy_move_backward_a2<...>(...)
libOgreMain_d.so.2.3.0!Ogre::HlmsProperty* std::move_backward<...>(...)
libOgreMain_d.so.2.3.0!void std::vector<...>::_M_insert_aux<...>(...)
libOgreMain_d.so.2.3.0!std::vector<...>::insert(...)
libOgreMain_d.so.2.3.0!Ogre::Hlms::setProperty(...)
libOgreMain_d.so.2.3.0!Ogre::Hlms::preparePassHashBase(...)
libOgreHlmsPbs_d.so.2.3.0!Ogre::HlmsPbs::preparePassHash(...)
libOgreMain_d.so.2.3.0!Ogre::RenderQueue::renderPassPrepare(...)
libOgreMain_d.so.2.3.0!Ogre::SceneManager::_cullPhase01(...)
[Max depth]![Max depth]


Call to: vkCmdBindPipeline
■ Vulkan API calls
Begins: 6.29141s
Ends: 6.29141s (+6.928 μs)
Call to: vkCmdBindDescriptorSets
■ Vulkan API calls
Begins: 6.29146s
Ends: 6.29146s (+6.094 μs)

Call to: vkCmdBindPipeline
■ Vulkan API calls
Begins: 6.29148s
Ends: 6.29148s (+3.107 μs)

---------------------
Sampling point
---------------------

Call stack at 6.29163s:
libOgreMain_d.so.2.3.0!bool std::operator==<...>(...)
libOgreMain_d.so.2.3.0!Ogre::Hlms::PassCache::operator==(...) const
libOgreMain_d.so.2.3.0!bool __gnu_cxx::__ops::_Iter_equals_val<...>::operator()<...>(...)
libOgreMain_d.so.2.3.0!__gnu_cxx::__normal_iterator<...> std::__find_if<...>(...)
libOgreMain_d.so.2.3.0!__gnu_cxx::__normal_iterator<...> std::__find_if<...>(...)
libOgreMain_d.so.2.3.0!__gnu_cxx::__normal_iterator<...> std::find<...>(...)
libOgreMain_d.so.2.3.0!Ogre::Hlms::preparePassHashBase(...)
libOgreMain_d.so.2.3.0!Ogre::Hlms::preparePassHash(...)
libOgreMain_d.so.2.3.0!Ogre::HlmsLowLevel::preparePassHash(...)
libOgreMain_d.so.2.3.0!Ogre::RenderQueue::renderPassPrepare(...)
libOgreMain_d.so.2.3.0!Ogre::SceneManager::_cullPhase01(...)
libOgreMain_d.so.2.3.0!Ogre::Camera::_cullScenePhase01(...)
libOgreMain_d.so.2.3.0!Ogre::Viewport::_updateCullPhase01(...)
[Max depth]![Max depth]


---------------------
Sampling point
---------------------

Call stack at 6.29234s:
libVkLayer_khronos_validation.so!DispatchResetDescriptorPool(...)
libVkLayer_khronos_validation.so!vulkan_layer_chassis::ResetDescriptorPool(...)
RenderSystem_Vulkan_d.so.2.3.0!Ogre::VulkanDescriptorPool::reset(...)
RenderSystem_Vulkan_d.so.2.3.0!Ogre::VulkanDescriptorPool::allocate(...)
RenderSystem_Vulkan_d.so.2.3.0!Ogre::VulkanRootLayout::bind(...)
[Max depth]![Max depth]

Call to: vkCmdBindPipeline
■ Vulkan API calls
Begins: 6.29237s
Ends: 6.29237s (+7.136 μs)

Call to: vkWaitForFences
■ Vulkan API calls
Begins: 6.29239s
Ends: 6.2924s (+14.535 μs)


Call to: vkWaitForFences
■ Vulkan API calls
Begins: 6.29247s
Ends: 6.29248s (+7.445 μs)

Call to: vkCmdBindDescriptorSets
■ Vulkan API calls
Begins: 6.29252s
Ends: 6.29252s (+7.392 μs)


Call to: vkCmdBindPipeline
■ Vulkan API calls
Begins: 6.29255s
Ends: 6.29255s (+4.491 μs)

Call to: vkWaitForFences
■ Vulkan API calls
Begins: 6.29257s
Ends: 6.29257s (+4.279 μs)


Call to: vkWaitForFences
■ Vulkan API calls
Begins: 6.29274s
Ends: 6.29274s (+5.643 μs)

Call to: vkCmdBindDescriptorSets
■ Vulkan API calls
Begins: 6.29278s
Ends: 6.29278s (+5.361 μs)


Call to: vkQueueSubmit
■ Vulkan API calls
Begins: 6.29294s
Ends: 6.29301s (+70.314 μs)
Correlation ID: 72

Call to: vkQueuePresentKHR
■ Vulkan API calls
Begins: 6.29302s
Ends: 6.29308s (+56.566 μs)

*************************************
End Command Buffer
*************************************

Call to: vkWaitForFences
■ Vulkan API calls
Begins: 6.2931s
Ends: 6.32207s (+28.970 ms)
https://ibb.co/ZGwTMHf
https://ibb.co/jhXQ75t

dermont
Bugbear
Posts: 808
Joined: Thu Dec 09, 2004 2:51 am
x 42

Re: [2.3] Vulkan Progress

Post by dermont »

dermont wrote:
Fri Oct 16, 2020 9:03 am
...
I have some old OIS examples that I updated and built against master (with overlay enabled) and they did not show the low frame rate as per the sdl demos.

So I copied the Pbs code into one of examples and in debug the sample ran 500 fps. I'll try and produce a simpler sdl/xcb demo to see if I can trace what is happening.
...
There was a bug in the example I was using, the correct compositor wasn't being picked up. Disabling shadows in PbsMaterials.compositor i.e removing ( "shadows PbsMaterialsShadowNode") and the Ogre pbsMaterial sample frame rate jumps to 500.

Shadows enabled

Code: Select all

Ave Time 35.03ms , min 34.47ms,  max 35.79,  FPS 29.

Command                   Begins    Ends           Time           FPS   %Frame
GPU(average)                                         +35.02  ms
CPU                             20.2198s  20.255s    +35.137 ms    28   100 
vkWaitForFences          20.2198s  20.2494s          +29.564 ms          84 
CommandBuffer           20.2549s  20.2549s           +5.447  ms          15
vkQueueSubmit           20.2549s  20.2549s           +45.276 μs
vkQueuePresentKHR    20.2549s  20.255s               +37.739 μs
Shadows disabled(frame rate jumps all over the place)

Code: Select all

Ave Time 2.13ms , min 1.36ms,  max 2.9,  FPS 496.

Command                   Begins    Ends       Time             FPS   %Frame
GPU(Average)                                      +2.5  ms     469
CPU                          18.8325s  18.8353s   +2.752 ms    363     100 
vkWaitForFences           18.8325s  18.8344s      +1.815 ms             66 
CommandBuffer            18.8344s  18.8352s       +881.576 μs           32
vkQueueSubmit             18.8353s  18.8353s      +10.595 μs
vkQueuePresentKHR      18.8353s  18.8353s         +20.460 μs
I have tried 3x the amount of swapchain images, tried increasing mDynamicBufferMultiplier, at best I get 5% increase in FPS.

Post Reply