[2.3] Vulkan Progress
-
- OGRE Team Member
- Posts: 5446
- Joined: Sat Jul 21, 2007 4:55 pm
- Location: Buenos Aires, Argentina
- x 1348
Re: [2.3] Vulkan Progress
The only thing that comes to my mind then is that we're acquiring more than one swapchain per frame, thus we run out of them and have to wait
Edit: Or alternatively, we're waiting for 2 swapchains instead of one (i.e. the one we're supposed to wait for + a foreign one. At the very least mGpuWaitSemaphForCurrCmdBuff.size should always be 0 or 1, same for mGpuSignalSemaphForCurrCmdBuff
Edit 2: Try setting your CPU freq scaling to "Performance" instead of powersave.
Edit 3: Try doubling (or 3x) the amount of swapchain images we request in VulkanWindow::createSwapchain (change minImageCount) to see if it makes any difference.
Edit: Or alternatively, we're waiting for 2 swapchains instead of one (i.e. the one we're supposed to wait for + a foreign one. At the very least mGpuWaitSemaphForCurrCmdBuff.size should always be 0 or 1, same for mGpuSignalSemaphForCurrCmdBuff
Edit 2: Try setting your CPU freq scaling to "Performance" instead of powersave.
Edit 3: Try doubling (or 3x) the amount of swapchain images we request in VulkanWindow::createSwapchain (change minImageCount) to see if it makes any difference.
-
- Bugbear
- Posts: 812
- Joined: Thu Dec 09, 2004 2:51 am
- x 42
Re: [2.3] Vulkan Progress
Thanks for the suggestions, already tried the power settings in NVidia(Auto/Adaptive & max performance). Tried tripling the amount of swapchange images, it didn't make any difference.dark_sylinc wrote: ↑Fri Oct 16, 2020 4:47 am The only thing that comes to my mind then is that we're acquiring more than one swapchain per frame, thus we run out of them and have to wait
Edit: Or alternatively, we're waiting for 2 swapchains instead of one (i.e. the one we're supposed to wait for + a foreign one. At the very least mGpuWaitSemaphForCurrCmdBuff.size should always be 0 or 1, same for mGpuSignalSemaphForCurrCmdBuff
Edit 2: Try setting your CPU freq scaling to "Performance" instead of powersave.
Edit 3: Try doubling (or 3x) the amount of swapchain images we request in VulkanWindow::createSwapchain (change minImageCount) to see if it makes any difference.
I have some old OIS examples that I updated and built against master (with overlay enabled) and they did not show the low frame rate as per the sdl demos.
So I copied the Pbs code into one of examples and in debug the sample ran 500 fps. I'll try and produce a simpler sdl/xcb demo to see if I can trace what is happening.
Edit: I tried NVIDIA Nsight but it didn't show anything other than we already know.
https://ibb.co/fp5X4RF
https://ibb.co/0XnZykh
-
- OGRE Team Member
- Posts: 5446
- Joined: Sat Jul 21, 2007 4:55 pm
- Location: Buenos Aires, Argentina
- x 1348
Re: [2.3] Vulkan Progress
This may be a bit too advanced, NSight has a tracing option which is later analyzed in its own visualizer; where you can see a timeline of VBlanks presentations and commands issued. This timeline graph could clarify what's going wrong.dermont wrote: ↑Fri Oct 16, 2020 9:03 am Edit: I tried NVIDIA Nsight but it didn't show anything other than we already know.
https://ibb.co/fp5X4RF
https://ibb.co/0XnZykh
-
- Bugbear
- Posts: 812
- Joined: Thu Dec 09, 2004 2:51 am
- x 42
Re: [2.3] Vulkan Progress
Thanks I had already enabled the Vulkan trace in the NSight gui but wasn't getting a complete output of commands:dark_sylinc wrote: ↑Fri Oct 16, 2020 4:02 pm ..
This may be a bit too advanced, NSight has a tracing option which is later analyzed in its own visualizer; where you can see a timeline of VBlanks presentations and commands issued. This timeline graph could clarify what's going wrong.
https://developer.nvidia.com/blog/nvidi ... ers-cloud/
This is the memory usage for the PbsMaterial demo from nvidia-smi:
Vulkan
Code: Select all
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 455.28 Driver Version: 455.28 CUDA Version: 11.1 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|===============================+======================+======================|
| 0 GeForce GTX 650... Off | 00000000:01:00.0 N/A | N/A |
| 42% 58C P0 N/A / N/A | 461MiB / 1996MiB | N/A Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=============================================================================|
| No running processes found |
+-----------------------------------------------------------------------------+
Code: Select all
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 455.28 Driver Version: 455.28 CUDA Version: 11.1 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|===============================+======================+======================|
| 0 GeForce GTX 650... Off | 00000000:01:00.0 N/A | N/A |
| 51% 63C P0 N/A / N/A | 614MiB / 1996MiB | N/A Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=============================================================================|
| No running processes found |
+-----------------------------------------------------------------------------+
I know you are busy and don't want to keep using up your time.
Edit:
Zip file too large to upload.
Code: Select all
===========================
WorkLoad
===========================
*************************************
Command buffer [Begin,End]
*************************************
Begins: 6.28677s
Ends: 6.29293s (+6.157 ms)
Thread [7084] Sample_PbsMater
---------------------
Sampling point
---------------------
Call stack at 6.2879s:
libOgreMain_d.so.2.3.0!Ogre::ConvexBody::clip(...)
libOgreMain_d.so.2.3.0!Ogre::FocusedShadowCameraSetup::getShadowCamera(...) const
libOgreMain_d.so.2.3.0!Ogre::PSSMShadowCameraSetup::getShadowCamera(...) const
libOgreMain_d.so.2.3.0!Ogre::CompositorShadowNode::_update(...)
libOgreMain_d.so.2.3.0!Ogre::CompositorPassScene::execute(...)
libOgreMain_d.so.2.3.0!Ogre::CompositorNode::_update(...)
libOgreMain_d.so.2.3.0!Ogre::CompositorWorkspace::_update()
[Max depth]![Max depth]
---------------------
Sampling point
---------------------
Call stack at 6.28838s:
libOgreMain_d.so.2.3.0!Ogre::OrderPropertyByIdString(...)
libOgreMain_d.so.2.3.0!bool __gnu_cxx::__ops::_Iter_comp_val<...>::operator()<...>(...)
libOgreMain_d.so.2.3.0!__gnu_cxx::__normal_iterator<...> std::__lower_bound<...>(...)
libOgreMain_d.so.2.3.0!__gnu_cxx::__normal_iterator<...> std::lower_bound<...>(...)
libOgreMain_d.so.2.3.0!Ogre::Hlms::setProperty(...)
libOgreHlmsPbs_d.so.2.3.0!Ogre::HlmsPbs::preparePassHash(...)
libOgreMain_d.so.2.3.0!Ogre::RenderQueue::renderPassPrepare(...)
libOgreMain_d.so.2.3.0!Ogre::SceneManager::_cullPhase01(...)
[Max depth]![Max depth]
Call to: vkCmdBindPipeline
■ Vulkan API calls
Begins: 6.28889s
Ends: 6.2889s (+7.017 μs)
Call to: vkWaitForFences
■ Vulkan API calls
Begins: 6.28891s
Ends: 6.28893s (+16.345 μs)
Call to: vkWaitForFences
■ Vulkan API calls
Begins: 6.28893s
Ends: 6.28894s (+5.600 μs)
Call to: vkCmdBindDescriptorSets
■ Vulkan API calls
Begins: 6.28899s
Ends: 6.289s (+7.240 μs)
Call to: vkCmdBindPipeline
■ Vulkan API calls
Begins: 6.28901s
Ends: 6.28902s (+4.069 μs)
---------------------
Sampling point
---------------------
Call stack at 6.28933s:
libOgreMain_d.so.2.3.0!Ogre::CbIndirectBuffer::CbIndirectBuffer(...)
libOgreMain_d.so.2.3.0!Ogre::RenderQueue::renderGL3(...)
libOgreMain_d.so.2.3.0!Ogre::RenderQueue::render(...)
libOgreMain_d.so.2.3.0!Ogre::SceneManager::_renderPhase02(...)
libOgreMain_d.so.2.3.0!Ogre::Camera::_renderScenePhase02(...)
libOgreMain_d.so.2.3.0!Ogre::Viewport::_updateRenderPhase02(...)
libOgreMain_d.so.2.3.0!Ogre::CompositorPassScene::execute(...)
[Max depth]![Max depth]
Call to: vkCmdBindPipeline
■ Vulkan API calls
Begins: 6.28952s
Ends: 6.28952s (+6.244 μs)
Call to: vkCmdBindDescriptorSets
■ Vulkan API calls
Begins: 6.28956s
Ends: 6.28957s (+5.473 μs)
---------------------
Sampling point
---------------------
Call stack at 6.28999s:
libOgreMain_d.so.2.3.0!Ogre::FastArray<...>::size() const
libOgreMain_d.so.2.3.0!Ogre::RenderQueue::render(...)
libOgreMain_d.so.2.3.0!Ogre::SceneManager::_renderPhase02(...)
libOgreMain_d.so.2.3.0!Ogre::Camera::_renderScenePhase02(...)
libOgreMain_d.so.2.3.0!Ogre::Viewport::_updateRenderPhase02(...)
libOgreMain_d.so.2.3.0!Ogre::CompositorPassScene::execute(...)
[Max depth]![Max depth]
Call to: vkCmdBindPipeline
■ Vulkan API calls
Begins: 6.2901s
Ends: 6.2901s (+5.753 μs)
Call to: vkCmdBindDescriptorSets
■ Vulkan API calls
Begins: 6.29014s
Ends: 6.29015s (+5.246 μs)
Call to: vkCmdBindPipeline
■ Vulkan API calls
Begins: 6.29016s
Ends: 6.29016s (+3.220 μs)
Call to: vkCmdBindPipeline
■ Vulkan API calls
Begins: 6.29073s
Ends: 6.29073s (+7.340 μs)
Call to: vkCmdBindDescriptorSets
■ Vulkan API calls
Begins: 6.29078s
Ends: 6.29078s (+6.052 μs)
Call to: vkCmdBindDescriptorSets
■ Vulkan API calls
Begins: 6.29078s
Ends: 6.29078s (+6.052 μs)
---------------------
Sampling point
---------------------
Call stack at 6.29082s:
libOgreMain_d.so.2.3.0!Ogre::HlmsProperty* std::__copy_move_backward<...>::__copy_move_b<...>(...)
libOgreMain_d.so.2.3.0!Ogre::HlmsProperty* std::__copy_move_backward_a<...>(...)
libOgreMain_d.so.2.3.0!Ogre::HlmsProperty* std::__copy_move_backward_a2<...>(...)
libOgreMain_d.so.2.3.0!Ogre::HlmsProperty* std::move_backward<...>(...)
libOgreMain_d.so.2.3.0!void std::vector<...>::_M_insert_aux<...>(...)
libOgreMain_d.so.2.3.0!std::vector<...>::insert(...)
libOgreMain_d.so.2.3.0!Ogre::Hlms::setProperty(...)
libOgreMain_d.so.2.3.0!Ogre::Hlms::preparePassHashBase(...)
libOgreHlmsPbs_d.so.2.3.0!Ogre::HlmsPbs::preparePassHash(...)
libOgreMain_d.so.2.3.0!Ogre::RenderQueue::renderPassPrepare(...)
libOgreMain_d.so.2.3.0!Ogre::SceneManager::_cullPhase01(...)
[Max depth]![Max depth]
Call to: vkCmdBindPipeline
■ Vulkan API calls
Begins: 6.29141s
Ends: 6.29141s (+6.928 μs)
Call to: vkCmdBindDescriptorSets
■ Vulkan API calls
Begins: 6.29146s
Ends: 6.29146s (+6.094 μs)
Call to: vkCmdBindPipeline
■ Vulkan API calls
Begins: 6.29148s
Ends: 6.29148s (+3.107 μs)
---------------------
Sampling point
---------------------
Call stack at 6.29163s:
libOgreMain_d.so.2.3.0!bool std::operator==<...>(...)
libOgreMain_d.so.2.3.0!Ogre::Hlms::PassCache::operator==(...) const
libOgreMain_d.so.2.3.0!bool __gnu_cxx::__ops::_Iter_equals_val<...>::operator()<...>(...)
libOgreMain_d.so.2.3.0!__gnu_cxx::__normal_iterator<...> std::__find_if<...>(...)
libOgreMain_d.so.2.3.0!__gnu_cxx::__normal_iterator<...> std::__find_if<...>(...)
libOgreMain_d.so.2.3.0!__gnu_cxx::__normal_iterator<...> std::find<...>(...)
libOgreMain_d.so.2.3.0!Ogre::Hlms::preparePassHashBase(...)
libOgreMain_d.so.2.3.0!Ogre::Hlms::preparePassHash(...)
libOgreMain_d.so.2.3.0!Ogre::HlmsLowLevel::preparePassHash(...)
libOgreMain_d.so.2.3.0!Ogre::RenderQueue::renderPassPrepare(...)
libOgreMain_d.so.2.3.0!Ogre::SceneManager::_cullPhase01(...)
libOgreMain_d.so.2.3.0!Ogre::Camera::_cullScenePhase01(...)
libOgreMain_d.so.2.3.0!Ogre::Viewport::_updateCullPhase01(...)
[Max depth]![Max depth]
---------------------
Sampling point
---------------------
Call stack at 6.29234s:
libVkLayer_khronos_validation.so!DispatchResetDescriptorPool(...)
libVkLayer_khronos_validation.so!vulkan_layer_chassis::ResetDescriptorPool(...)
RenderSystem_Vulkan_d.so.2.3.0!Ogre::VulkanDescriptorPool::reset(...)
RenderSystem_Vulkan_d.so.2.3.0!Ogre::VulkanDescriptorPool::allocate(...)
RenderSystem_Vulkan_d.so.2.3.0!Ogre::VulkanRootLayout::bind(...)
[Max depth]![Max depth]
Call to: vkCmdBindPipeline
■ Vulkan API calls
Begins: 6.29237s
Ends: 6.29237s (+7.136 μs)
Call to: vkWaitForFences
■ Vulkan API calls
Begins: 6.29239s
Ends: 6.2924s (+14.535 μs)
Call to: vkWaitForFences
■ Vulkan API calls
Begins: 6.29247s
Ends: 6.29248s (+7.445 μs)
Call to: vkCmdBindDescriptorSets
■ Vulkan API calls
Begins: 6.29252s
Ends: 6.29252s (+7.392 μs)
Call to: vkCmdBindPipeline
■ Vulkan API calls
Begins: 6.29255s
Ends: 6.29255s (+4.491 μs)
Call to: vkWaitForFences
■ Vulkan API calls
Begins: 6.29257s
Ends: 6.29257s (+4.279 μs)
Call to: vkWaitForFences
■ Vulkan API calls
Begins: 6.29274s
Ends: 6.29274s (+5.643 μs)
Call to: vkCmdBindDescriptorSets
■ Vulkan API calls
Begins: 6.29278s
Ends: 6.29278s (+5.361 μs)
Call to: vkQueueSubmit
■ Vulkan API calls
Begins: 6.29294s
Ends: 6.29301s (+70.314 μs)
Correlation ID: 72
Call to: vkQueuePresentKHR
■ Vulkan API calls
Begins: 6.29302s
Ends: 6.29308s (+56.566 μs)
*************************************
End Command Buffer
*************************************
Call to: vkWaitForFences
■ Vulkan API calls
Begins: 6.2931s
Ends: 6.32207s (+28.970 ms)
https://ibb.co/jhXQ75t
-
- Bugbear
- Posts: 812
- Joined: Thu Dec 09, 2004 2:51 am
- x 42
Re: [2.3] Vulkan Progress
There was a bug in the example I was using, the correct compositor wasn't being picked up. Disabling shadows in PbsMaterials.compositor i.e removing ( "shadows PbsMaterialsShadowNode") and the Ogre pbsMaterial sample frame rate jumps to 500.dermont wrote: ↑Fri Oct 16, 2020 9:03 am ...
I have some old OIS examples that I updated and built against master (with overlay enabled) and they did not show the low frame rate as per the sdl demos.
So I copied the Pbs code into one of examples and in debug the sample ran 500 fps. I'll try and produce a simpler sdl/xcb demo to see if I can trace what is happening.
...
Shadows enabled
Code: Select all
Ave Time 35.03ms , min 34.47ms, max 35.79, FPS 29.
Command Begins Ends Time FPS %Frame
GPU(average) +35.02 ms
CPU 20.2198s 20.255s +35.137 ms 28 100
vkWaitForFences 20.2198s 20.2494s +29.564 ms 84
CommandBuffer 20.2549s 20.2549s +5.447 ms 15
vkQueueSubmit 20.2549s 20.2549s +45.276 μs
vkQueuePresentKHR 20.2549s 20.255s +37.739 μs
Code: Select all
Ave Time 2.13ms , min 1.36ms, max 2.9, FPS 496.
Command Begins Ends Time FPS %Frame
GPU(Average) +2.5 ms 469
CPU 18.8325s 18.8353s +2.752 ms 363 100
vkWaitForFences 18.8325s 18.8344s +1.815 ms 66
CommandBuffer 18.8344s 18.8352s +881.576 μs 32
vkQueueSubmit 18.8353s 18.8353s +10.595 μs
vkQueuePresentKHR 18.8353s 18.8353s +20.460 μs