The only thing that comes to my mind then is that we're acquiring more than one swapchain per frame, thus we run out of them and have to wait
Edit: Or alternatively, we're waiting for 2 swapchains instead of one (i.e. the one we're supposed to wait for + a foreign one. At the very least mGpuWaitSemaphForCurrCmdBuff.size should always be 0 or 1, same for mGpuSignalSemaphForCurrCmdBuff
Edit 2: Try setting your CPU freq scaling to "Performance" instead of powersave.
Edit 3: Try doubling (or 3x) the amount of swapchain images we request in VulkanWindow::createSwapchain (change minImageCount) to see if it makes any difference.
[2.3] Vulkan Progress
- dark_sylinc
- OGRE Team Member
- Posts: 4654
- Joined: Sat Jul 21, 2007 4:55 pm
- Location: Buenos Aires, Argentina
- x 988
- Contact:
-
- Bugbear
- Posts: 808
- Joined: Thu Dec 09, 2004 2:51 am
- x 42
Re: [2.3] Vulkan Progress
Thanks for the suggestions, already tried the power settings in NVidia(Auto/Adaptive & max performance). Tried tripling the amount of swapchange images, it didn't make any difference.dark_sylinc wrote: ↑Fri Oct 16, 2020 4:47 amThe only thing that comes to my mind then is that we're acquiring more than one swapchain per frame, thus we run out of them and have to wait
Edit: Or alternatively, we're waiting for 2 swapchains instead of one (i.e. the one we're supposed to wait for + a foreign one. At the very least mGpuWaitSemaphForCurrCmdBuff.size should always be 0 or 1, same for mGpuSignalSemaphForCurrCmdBuff
Edit 2: Try setting your CPU freq scaling to "Performance" instead of powersave.
Edit 3: Try doubling (or 3x) the amount of swapchain images we request in VulkanWindow::createSwapchain (change minImageCount) to see if it makes any difference.
I have some old OIS examples that I updated and built against master (with overlay enabled) and they did not show the low frame rate as per the sdl demos.
So I copied the Pbs code into one of examples and in debug the sample ran 500 fps. I'll try and produce a simpler sdl/xcb demo to see if I can trace what is happening.
Edit: I tried NVIDIA Nsight but it didn't show anything other than we already know.
https://ibb.co/fp5X4RF
https://ibb.co/0XnZykh
- dark_sylinc
- OGRE Team Member
- Posts: 4654
- Joined: Sat Jul 21, 2007 4:55 pm
- Location: Buenos Aires, Argentina
- x 988
- Contact:
Re: [2.3] Vulkan Progress
This may be a bit too advanced, NSight has a tracing option which is later analyzed in its own visualizer; where you can see a timeline of VBlanks presentations and commands issued. This timeline graph could clarify what's going wrong.dermont wrote: ↑Fri Oct 16, 2020 9:03 amEdit: I tried NVIDIA Nsight but it didn't show anything other than we already know.
https://ibb.co/fp5X4RF
https://ibb.co/0XnZykh
-
- Bugbear
- Posts: 808
- Joined: Thu Dec 09, 2004 2:51 am
- x 42
Re: [2.3] Vulkan Progress
Thanks I had already enabled the Vulkan trace in the NSight gui but wasn't getting a complete output of commands:dark_sylinc wrote: ↑Fri Oct 16, 2020 4:02 pm..
This may be a bit too advanced, NSight has a tracing option which is later analyzed in its own visualizer; where you can see a timeline of VBlanks presentations and commands issued. This timeline graph could clarify what's going wrong.
https://developer.nvidia.com/blog/nvidi ... ers-cloud/
This is the memory usage for the PbsMaterial demo from nvidia-smi:
Vulkan
Code: Select all
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 455.28 Driver Version: 455.28 CUDA Version: 11.1 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|===============================+======================+======================|
| 0 GeForce GTX 650... Off | 00000000:01:00.0 N/A | N/A |
| 42% 58C P0 N/A / N/A | 461MiB / 1996MiB | N/A Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=============================================================================|
| No running processes found |
+-----------------------------------------------------------------------------+
Code: Select all
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 455.28 Driver Version: 455.28 CUDA Version: 11.1 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|===============================+======================+======================|
| 0 GeForce GTX 650... Off | 00000000:01:00.0 N/A | N/A |
| 51% 63C P0 N/A / N/A | 614MiB / 1996MiB | N/A Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=============================================================================|
| No running processes found |
+-----------------------------------------------------------------------------+
I know you are busy and don't want to keep using up your time.
Edit:
Zip file too large to upload.
Code: Select all
===========================
WorkLoad
===========================
*************************************
Command buffer [Begin,End]
*************************************
Begins: 6.28677s
Ends: 6.29293s (+6.157 ms)
Thread [7084] Sample_PbsMater
---------------------
Sampling point
---------------------
Call stack at 6.2879s:
libOgreMain_d.so.2.3.0!Ogre::ConvexBody::clip(...)
libOgreMain_d.so.2.3.0!Ogre::FocusedShadowCameraSetup::getShadowCamera(...) const
libOgreMain_d.so.2.3.0!Ogre::PSSMShadowCameraSetup::getShadowCamera(...) const
libOgreMain_d.so.2.3.0!Ogre::CompositorShadowNode::_update(...)
libOgreMain_d.so.2.3.0!Ogre::CompositorPassScene::execute(...)
libOgreMain_d.so.2.3.0!Ogre::CompositorNode::_update(...)
libOgreMain_d.so.2.3.0!Ogre::CompositorWorkspace::_update()
[Max depth]![Max depth]
---------------------
Sampling point
---------------------
Call stack at 6.28838s:
libOgreMain_d.so.2.3.0!Ogre::OrderPropertyByIdString(...)
libOgreMain_d.so.2.3.0!bool __gnu_cxx::__ops::_Iter_comp_val<...>::operator()<...>(...)
libOgreMain_d.so.2.3.0!__gnu_cxx::__normal_iterator<...> std::__lower_bound<...>(...)
libOgreMain_d.so.2.3.0!__gnu_cxx::__normal_iterator<...> std::lower_bound<...>(...)
libOgreMain_d.so.2.3.0!Ogre::Hlms::setProperty(...)
libOgreHlmsPbs_d.so.2.3.0!Ogre::HlmsPbs::preparePassHash(...)
libOgreMain_d.so.2.3.0!Ogre::RenderQueue::renderPassPrepare(...)
libOgreMain_d.so.2.3.0!Ogre::SceneManager::_cullPhase01(...)
[Max depth]![Max depth]
Call to: vkCmdBindPipeline
■ Vulkan API calls
Begins: 6.28889s
Ends: 6.2889s (+7.017 μs)
Call to: vkWaitForFences
■ Vulkan API calls
Begins: 6.28891s
Ends: 6.28893s (+16.345 μs)
Call to: vkWaitForFences
■ Vulkan API calls
Begins: 6.28893s
Ends: 6.28894s (+5.600 μs)
Call to: vkCmdBindDescriptorSets
■ Vulkan API calls
Begins: 6.28899s
Ends: 6.289s (+7.240 μs)
Call to: vkCmdBindPipeline
■ Vulkan API calls
Begins: 6.28901s
Ends: 6.28902s (+4.069 μs)
---------------------
Sampling point
---------------------
Call stack at 6.28933s:
libOgreMain_d.so.2.3.0!Ogre::CbIndirectBuffer::CbIndirectBuffer(...)
libOgreMain_d.so.2.3.0!Ogre::RenderQueue::renderGL3(...)
libOgreMain_d.so.2.3.0!Ogre::RenderQueue::render(...)
libOgreMain_d.so.2.3.0!Ogre::SceneManager::_renderPhase02(...)
libOgreMain_d.so.2.3.0!Ogre::Camera::_renderScenePhase02(...)
libOgreMain_d.so.2.3.0!Ogre::Viewport::_updateRenderPhase02(...)
libOgreMain_d.so.2.3.0!Ogre::CompositorPassScene::execute(...)
[Max depth]![Max depth]
Call to: vkCmdBindPipeline
■ Vulkan API calls
Begins: 6.28952s
Ends: 6.28952s (+6.244 μs)
Call to: vkCmdBindDescriptorSets
■ Vulkan API calls
Begins: 6.28956s
Ends: 6.28957s (+5.473 μs)
---------------------
Sampling point
---------------------
Call stack at 6.28999s:
libOgreMain_d.so.2.3.0!Ogre::FastArray<...>::size() const
libOgreMain_d.so.2.3.0!Ogre::RenderQueue::render(...)
libOgreMain_d.so.2.3.0!Ogre::SceneManager::_renderPhase02(...)
libOgreMain_d.so.2.3.0!Ogre::Camera::_renderScenePhase02(...)
libOgreMain_d.so.2.3.0!Ogre::Viewport::_updateRenderPhase02(...)
libOgreMain_d.so.2.3.0!Ogre::CompositorPassScene::execute(...)
[Max depth]![Max depth]
Call to: vkCmdBindPipeline
■ Vulkan API calls
Begins: 6.2901s
Ends: 6.2901s (+5.753 μs)
Call to: vkCmdBindDescriptorSets
■ Vulkan API calls
Begins: 6.29014s
Ends: 6.29015s (+5.246 μs)
Call to: vkCmdBindPipeline
■ Vulkan API calls
Begins: 6.29016s
Ends: 6.29016s (+3.220 μs)
Call to: vkCmdBindPipeline
■ Vulkan API calls
Begins: 6.29073s
Ends: 6.29073s (+7.340 μs)
Call to: vkCmdBindDescriptorSets
■ Vulkan API calls
Begins: 6.29078s
Ends: 6.29078s (+6.052 μs)
Call to: vkCmdBindDescriptorSets
■ Vulkan API calls
Begins: 6.29078s
Ends: 6.29078s (+6.052 μs)
---------------------
Sampling point
---------------------
Call stack at 6.29082s:
libOgreMain_d.so.2.3.0!Ogre::HlmsProperty* std::__copy_move_backward<...>::__copy_move_b<...>(...)
libOgreMain_d.so.2.3.0!Ogre::HlmsProperty* std::__copy_move_backward_a<...>(...)
libOgreMain_d.so.2.3.0!Ogre::HlmsProperty* std::__copy_move_backward_a2<...>(...)
libOgreMain_d.so.2.3.0!Ogre::HlmsProperty* std::move_backward<...>(...)
libOgreMain_d.so.2.3.0!void std::vector<...>::_M_insert_aux<...>(...)
libOgreMain_d.so.2.3.0!std::vector<...>::insert(...)
libOgreMain_d.so.2.3.0!Ogre::Hlms::setProperty(...)
libOgreMain_d.so.2.3.0!Ogre::Hlms::preparePassHashBase(...)
libOgreHlmsPbs_d.so.2.3.0!Ogre::HlmsPbs::preparePassHash(...)
libOgreMain_d.so.2.3.0!Ogre::RenderQueue::renderPassPrepare(...)
libOgreMain_d.so.2.3.0!Ogre::SceneManager::_cullPhase01(...)
[Max depth]![Max depth]
Call to: vkCmdBindPipeline
■ Vulkan API calls
Begins: 6.29141s
Ends: 6.29141s (+6.928 μs)
Call to: vkCmdBindDescriptorSets
■ Vulkan API calls
Begins: 6.29146s
Ends: 6.29146s (+6.094 μs)
Call to: vkCmdBindPipeline
■ Vulkan API calls
Begins: 6.29148s
Ends: 6.29148s (+3.107 μs)
---------------------
Sampling point
---------------------
Call stack at 6.29163s:
libOgreMain_d.so.2.3.0!bool std::operator==<...>(...)
libOgreMain_d.so.2.3.0!Ogre::Hlms::PassCache::operator==(...) const
libOgreMain_d.so.2.3.0!bool __gnu_cxx::__ops::_Iter_equals_val<...>::operator()<...>(...)
libOgreMain_d.so.2.3.0!__gnu_cxx::__normal_iterator<...> std::__find_if<...>(...)
libOgreMain_d.so.2.3.0!__gnu_cxx::__normal_iterator<...> std::__find_if<...>(...)
libOgreMain_d.so.2.3.0!__gnu_cxx::__normal_iterator<...> std::find<...>(...)
libOgreMain_d.so.2.3.0!Ogre::Hlms::preparePassHashBase(...)
libOgreMain_d.so.2.3.0!Ogre::Hlms::preparePassHash(...)
libOgreMain_d.so.2.3.0!Ogre::HlmsLowLevel::preparePassHash(...)
libOgreMain_d.so.2.3.0!Ogre::RenderQueue::renderPassPrepare(...)
libOgreMain_d.so.2.3.0!Ogre::SceneManager::_cullPhase01(...)
libOgreMain_d.so.2.3.0!Ogre::Camera::_cullScenePhase01(...)
libOgreMain_d.so.2.3.0!Ogre::Viewport::_updateCullPhase01(...)
[Max depth]![Max depth]
---------------------
Sampling point
---------------------
Call stack at 6.29234s:
libVkLayer_khronos_validation.so!DispatchResetDescriptorPool(...)
libVkLayer_khronos_validation.so!vulkan_layer_chassis::ResetDescriptorPool(...)
RenderSystem_Vulkan_d.so.2.3.0!Ogre::VulkanDescriptorPool::reset(...)
RenderSystem_Vulkan_d.so.2.3.0!Ogre::VulkanDescriptorPool::allocate(...)
RenderSystem_Vulkan_d.so.2.3.0!Ogre::VulkanRootLayout::bind(...)
[Max depth]![Max depth]
Call to: vkCmdBindPipeline
■ Vulkan API calls
Begins: 6.29237s
Ends: 6.29237s (+7.136 μs)
Call to: vkWaitForFences
■ Vulkan API calls
Begins: 6.29239s
Ends: 6.2924s (+14.535 μs)
Call to: vkWaitForFences
■ Vulkan API calls
Begins: 6.29247s
Ends: 6.29248s (+7.445 μs)
Call to: vkCmdBindDescriptorSets
■ Vulkan API calls
Begins: 6.29252s
Ends: 6.29252s (+7.392 μs)
Call to: vkCmdBindPipeline
■ Vulkan API calls
Begins: 6.29255s
Ends: 6.29255s (+4.491 μs)
Call to: vkWaitForFences
■ Vulkan API calls
Begins: 6.29257s
Ends: 6.29257s (+4.279 μs)
Call to: vkWaitForFences
■ Vulkan API calls
Begins: 6.29274s
Ends: 6.29274s (+5.643 μs)
Call to: vkCmdBindDescriptorSets
■ Vulkan API calls
Begins: 6.29278s
Ends: 6.29278s (+5.361 μs)
Call to: vkQueueSubmit
■ Vulkan API calls
Begins: 6.29294s
Ends: 6.29301s (+70.314 μs)
Correlation ID: 72
Call to: vkQueuePresentKHR
■ Vulkan API calls
Begins: 6.29302s
Ends: 6.29308s (+56.566 μs)
*************************************
End Command Buffer
*************************************
Call to: vkWaitForFences
■ Vulkan API calls
Begins: 6.2931s
Ends: 6.32207s (+28.970 ms)
https://ibb.co/jhXQ75t
-
- Bugbear
- Posts: 808
- Joined: Thu Dec 09, 2004 2:51 am
- x 42
Re: [2.3] Vulkan Progress
There was a bug in the example I was using, the correct compositor wasn't being picked up. Disabling shadows in PbsMaterials.compositor i.e removing ( "shadows PbsMaterialsShadowNode") and the Ogre pbsMaterial sample frame rate jumps to 500.dermont wrote: ↑Fri Oct 16, 2020 9:03 am...
I have some old OIS examples that I updated and built against master (with overlay enabled) and they did not show the low frame rate as per the sdl demos.
So I copied the Pbs code into one of examples and in debug the sample ran 500 fps. I'll try and produce a simpler sdl/xcb demo to see if I can trace what is happening.
...
Shadows enabled
Code: Select all
Ave Time 35.03ms , min 34.47ms, max 35.79, FPS 29.
Command Begins Ends Time FPS %Frame
GPU(average) +35.02 ms
CPU 20.2198s 20.255s +35.137 ms 28 100
vkWaitForFences 20.2198s 20.2494s +29.564 ms 84
CommandBuffer 20.2549s 20.2549s +5.447 ms 15
vkQueueSubmit 20.2549s 20.2549s +45.276 μs
vkQueuePresentKHR 20.2549s 20.255s +37.739 μs
Code: Select all
Ave Time 2.13ms , min 1.36ms, max 2.9, FPS 496.
Command Begins Ends Time FPS %Frame
GPU(Average) +2.5 ms 469
CPU 18.8325s 18.8353s +2.752 ms 363 100
vkWaitForFences 18.8325s 18.8344s +1.815 ms 66
CommandBuffer 18.8344s 18.8352s +881.576 μs 32
vkQueueSubmit 18.8353s 18.8353s +10.595 μs
vkQueuePresentKHR 18.8353s 18.8353s +20.460 μs