CPU bottleneck

Discussion area about developing with Ogre-Next (2.1, 2.2 and beyond)


TaaTT4
OGRE Contributor
Posts: 267
Joined: Wed Apr 23, 2014 3:49 pm
Location: Bologna, Italy

CPU bottleneck

Post by TaaTT4 »

Hi,

I'm tuning and optimizing my game for VR to keep a stable frame rate of 90 FPS as long as possible.

At the moment, my game is CPU bound.
I can clearly see it using the "Minimum geometry" debug feature of the NVIDIA Nsight profiler (see here for more info).

I've already taken care of every CPU intensive task of OGRE I'm aware of:
  • Minimize render_scene passes to reduce frustum culling impact
  • Disable skeletal animations
  • Disable LOD calculations
To make the CPU measurements as reliable as possible, I've (temporarily) disabled every CPU-consuming subsystem I have, like the physics engine, the UI platform and so on.
Anyway, the CPU impact of those subsystems is very low.

I believe the bottleneck is related to scene objects (count of them, complexity of them, instancing, graphics API overhead, something else?!) since the FPS drops just in some camera poses.
In a VR context a high, stable frame rate is more important than visual fidelity.
Not loading "heavy" scene objects is not a big deal, every VR game does it, but I'd like to understand what those objects are.

In the Visual Studio CPU profiler I don't see any suspicious function that overloads the CPU, and I don't really know if NVIDIA Nsight/RenderDoc can be helpful with this kind of issue.
How can I proceed?

Senior programmer at 505 Games; former senior engine programmer at Sandbox Games
Worked on: Racecraft Esport, Racecraft Coin-Op, Victory: The Age of Racing

xrgo
OGRE Expert User
Posts: 1148
Joined: Sat Jul 06, 2013 10:59 pm
Location: Chile

Re: CPU bottleneck

Post by xrgo »

I think I am in the same situation, I noticed when using a nvidia 970, as I mentioned here http://www.ogre3d.org/forums/viewtopic.php?f=2&t=84162
but now with a 1070 the GPU usage is even lower (but I did get an FPS increase). I haven't profiled anything yet since I've been really busy with non-engine related stuff ( sniff :( ) for a while. I just wanted to bump this for TaaTT4; maybe later it will help me too :P

Cheers!

GlowingPotato
Goblin
Posts: 211
Joined: Wed May 08, 2013 2:58 pm

Re: CPU bottleneck

Post by GlowingPotato »

Hi,

I have a GTX 1070 and a monster CPU bottleneck with a 6700k CPU.
But... I'm not much of a valid data point since we still need to migrate to v2 meshes and Items.

Have you guys profiled the CPU already ? What is the most time consuming task for you?

Re: CPU bottleneck

Post by TaaTT4 »

GlowingPotato wrote: Have you guys profiled the CPU already ? What is the most time consuming task for you?
Skeletal animations and LOD calculations.
BTW, as I said before, the CPU bottleneck is still present even with both of them disabled.

All my meshes are in V2 format.

Forgot to say my specs:
  • i7-6700K @ 4.00 GHz (4 cores/8 threads)
  • NVIDIA GeForce GTX 1070


Re: CPU bottleneck

Post by GlowingPotato »

TaaTT4 wrote:BTW, as I said before, the CPU bottleneck is still present even with both of them disabled.
we don't use LOD... but we do have a LOT (really, a lot) of skeletal animations... more than 200 skeleton files.

Re: CPU bottleneck

Post by TaaTT4 »

GlowingPotato wrote: we don't use LOD... but we do have a LOT (really, a lot) of skeletal animations... more than 200 skeleton files.
Be sure to put lod_update_list false somewhere in your render_scene passes, otherwise some CPU cycles are still wasted on LOD work (I'm sorry, I don't remember the details since some months have passed).
I have about 1000 animated spectators at max quality setting :D


Re: CPU bottleneck

Post by GlowingPotato »

Thanks for the tip! :D

dark_sylinc
OGRE Team Member
Posts: 5296
Joined: Sat Jul 21, 2007 4:55 pm
Location: Buenos Aires, Argentina

Re: CPU bottleneck

Post by dark_sylinc »

Do you guys have screenshots from CodeXL / VTune / Visual Studio Profiler to show? (showing the callstack ordered with hotspots first).

So far Ogre 2.1 has been severely GPU bottlenecked. I'm interested in knowing where you guys are having trouble.
I know TaaTT4 has >30.000 Items with around 4 shadow map splits; + second pass for VR. That's a lot of frustum culling and command preparation. Plus, you're hitting the 60hz mark but now you want the 90hz mark :roll: :lol:

I have a few ideas for improving performance by spreading workload command preparation across cores (basically thread Hlms::fillBuffersFor & RenderQueue::render). But I need to know it's really the bottleneck as it's quite a lot of work (not hard, just... time consuming and tiring) and I don't wanna work myself to death on the wrong spot.

Other than that, optimizations would either:
  • Revolve around you, merging Items into bigger Items to reduce CPU workload.
  • Revolve around Ogre adding VR-only optimizations. The "bigger frustum that encompasses both eyes" algorithm, and command list reuse (i.e. prepare commands for the first eye, replay these commands for the second eye but with a different set of TexBufferPacked to send different worldViewProj matrices). This could achieve massive CPU gains, but it's only useful for VR.
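
To make the first option concrete, here is a minimal, engine-agnostic sketch of what "merging Items" means at the data level: the vertex buffers are concatenated and the indices of each appended part are re-based. `SimpleMesh`, `appendMesh` and `mergeMeshes` are hypothetical names for illustration, not Ogre API, and the sketch assumes all parts are already in the same (world) space and share one material.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical, engine-agnostic mesh: positions only, 32-bit triangle-list indices.
struct Vec3 { float x, y, z; };

struct SimpleMesh {
    std::vector<Vec3> vertices;
    std::vector<std::uint32_t> indices;
};

// Append `src` to `dst` so that both can be submitted as a single draw.
void appendMesh(SimpleMesh &dst, const SimpleMesh &src) {
    const std::uint32_t base = static_cast<std::uint32_t>(dst.vertices.size());
    dst.vertices.insert(dst.vertices.end(), src.vertices.begin(), src.vertices.end());
    dst.indices.reserve(dst.indices.size() + src.indices.size());
    for (std::uint32_t idx : src.indices) {
        assert(idx < src.vertices.size());  // every index must point at a vertex of `src`
        dst.indices.push_back(base + idx);  // re-base onto the merged vertex buffer
    }
}

// Merge many small parts into one mesh: one draw instead of parts.size() draws.
SimpleMesh mergeMeshes(const std::vector<SimpleMesh> &parts) {
    SimpleMesh merged;
    for (const SimpleMesh &part : parts)
        appendMesh(merged, part);
    return merged;
}
```

The trade-off is that the merged mesh is culled as a whole, so its bounds grow with every part that is appended.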

Re: CPU bottleneck

Post by xrgo »

dark_sylinc wrote: Revolve around Ogre adding VR-only optimizations. The "bigger frustum that encompasses both eyes" algorithm, and command list reuse (i.e. prepare commands for the first eye, replay these commands for the second eye but with a different set of TexBufferPacked to send different worldViewProj matrices). This could achieve massive CPU gains, but it's only useful for VR.
This would be amazing! the second one would be the one the other engines call "single pass stereo rendering" ??

Re: CPU bottleneck

Post by TaaTT4 »

dark_sylinc wrote: Do you guys have screenshots from CodeXL / VTune / Visual Studio Profiler to show? (showing the callstack ordered with hotspots first).
I'll send it to you tomorrow (is a PM OK?) along with the related RenderDoc capture.
I usually use Visual Studio Profiler, but I can switch to VTune if you prefer.
dark_sylinc wrote: I know TaaTT4 has >30.000 Items with around 4 shadow map splits; + second pass for VR. That's a lot of frustum culling and command preparation.
To reduce the frustum culling workload, I'll create a simple track to use as a case study:
  • 3 PSSM splits instead of 4
  • No dynamic reflections on the car
  • No spectators on the grandstands and around the track
  • No vegetation (grass and trees)
In addition, all my subsystems that consume CPU cycles (like the physics engine, the UI platform and so on) will be disabled.
dark_sylinc wrote: I have a few ideas for improving performance by spreading workload command preparation across cores (basically thread Hlms::fillBuffersFor & RenderQueue::render).
I guess command preparation could be the culprit of this issue.
dark_sylinc wrote: The "bigger frustum that encompasses both eyes" algorithm, and command list reuse (i.e. prepare commands for the first eye, replay these commands for the second eye but with a different set of TexBufferPacked to send different worldViewProj matrices). This could achieve massive CPU gains, but it's only useful for VR.
I've already managed to share frustum culling data between different render_scene passes and to use a "huge frustum that includes both eyes".
Some changes to OGRE and a bit of trigonometry did the magic.
Command preparation and how it works is a complete black box to me (I confess, I haven't studied it...).
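
For readers wondering what the "bit of trigonometry" for a both-eyes frustum can look like, here is one possible sketch. It is not the code used in this thread: it assumes both eyes have identical, symmetric frustums with parallel view directions, which is a simplification (HMD projections are usually asymmetric). Under those assumptions, a culling camera placed midway between the eyes and pulled back by (ipd / 2) / tan(halfFovX) with the same field of view encloses both eye frustums. `CombinedFrustum` and `computeCombinedFrustum` are hypothetical names.

```cpp
#include <cassert>
#include <cmath>

// Parameters of a single culling camera that encloses both eye frustums.
struct CombinedFrustum {
    double pullBack;  // how far to move the camera back from the midpoint between the eyes
    double nearDist;  // near distance that keeps the near plane at the same world depth
    double farDist;   // far distance that keeps the far plane at the same world depth
};

// ipd: distance between the eyes; halfFovX: half of the horizontal FOV, in radians.
CombinedFrustum computeCombinedFrustum(double ipd, double halfFovX,
                                       double nearDist, double farDist) {
    assert(ipd >= 0.0);
    assert(halfFovX > 0.0 && halfFovX < 1.5707963267948966);  // strictly between 0 and 90 degrees
    // The left edge of the left eye's frustum passes through the left eye.
    // A frustum with the same angle whose apex lies `pullBack` behind the
    // midpoint passes through that same point, so it contains both frustums.
    const double pullBack = (0.5 * ipd) / std::tan(halfFovX);
    return { pullBack, nearDist + pullBack, farDist + pullBack };
}
```

The combined frustum is slightly larger than the union of the two eye frustums, so a few objects that neither eye can see will pass culling; that is the price for culling once instead of twice.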

Anyway, the CPU bottleneck is not related to a VR context only.
Even in a standard rendering workflow I'm still CPU bound and I have a very variable frame rate (from 180 FPS down to about 95 FPS) for no apparent reason.


Re: CPU bottleneck

Post by dark_sylinc »

xrgo wrote:This would be amazing! the second one would be the one the other engines call "single pass stereo rendering" ??
No. Single pass stereo rendering involves somehow rendering twice in the same command. This "somehow" is either achieved via instancing (i.e. double instancing count to draw twice) and geometry shaders (to draw to different RTTs and/or viewports) or some extension that produces the same effect (VPAndRTArrayIndexFromAnyShaderFeedingRasterizer) usually at the cost of some GPU power (which is why it's not always a win).

Re: CPU bottleneck

Post by TaaTT4 »

TaaTT4 wrote: I send it to you tomorrow (PM is OK?) with the relative RenderDoc snapshot too.
I'm sorry, but today I've been super busy with a deadline which has suddenly become very strict.
I hope to send you the data on Monday.


Re: CPU bottleneck

Post by xrgo »

So I ported my engine to windows, now I can use all the super cool tools for profiling that I don't know how to use yet :P
but since I just integrated openvr, I started with the times profiler that comes with it... here are some results (with stereo rendering (big res for each eye), no msaa, no multisampling, mirroring left eye):

here is when looking at the sky, almost no object except the skydome:
Image
notice that GPU load is 55%, and GPU time is very low, but CPU time is very high... and whenever there is a spike on CPU and time goes over 11ms GPU time also spikes

Now looking at a warehouse that has many racks inside, many separated objects (most with static nodes)
Image
notice that GPU load is 30%, but GPU time is very high. I am guessing it's for the same reason as above: if CPU time is over 11ms, GPU time will spike with it, and in this case that's almost all the time... still, there's plenty of GPU available to use =D

I remember that a long time ago I tested merging all the racks inside the warehouse into a few objects and I did get a bit more FPS. I was on Linux so I didn't run any profiler; it was just a quick test. But I thought that might be related to the SceneNode count, so I created many, many SceneNodes and it didn't make a difference, so maybe it's the frustum culling or the commands being sent to the GPU.

I'll see if I can get more information

Saludos!!

Re: CPU bottleneck

Post by GlowingPotato »

Hi,

We dug a little a while ago and found that shadows are consuming a lot of CPU time. Something we can't fix.
This is the thread http://www.ogre3d.org/forums/viewtopic.php?f=25&t=88975

Re: CPU bottleneck

Post by dark_sylinc »

CPU side, shadow mapping is not inherently more time consuming than any regular pass (the only extra operation I can think of is a pass that calculates the merged AABB of the entire scene; which should be happening once per frame).

However, if you have, for example, 4 PSSM splits, then Ogre will render 5 passes (the 4 shadow maps first, then the regular render pass). Excluding what happens inside updateSceneGraph (which happens once per frame, regardless of the number of passes), shadow mapping will naturally occupy more time, because 4/5ths of the processing time (culling and command generation) will be spent on shadow maps, while 1/5th goes to the regular pass.

(note this does not explain the high variance in the frame timing of your post, only why shadow mapping would appear to consume most of the CPU time)
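
The 4/5ths figure above is plain pass counting. A tiny sketch, under the simplifying assumption that every pass costs the same amount of culling and command generation (which real scenes will not match exactly); `shadowPassShare` is a hypothetical helper name:

```cpp
#include <cassert>

// Fraction of the per-pass CPU work (culling + command generation) that lands in
// shadow-map passes, assuming every pass costs the same.
double shadowPassShare(unsigned shadowPasses, unsigned regularPasses) {
    const unsigned total = shadowPasses + regularPasses;
    assert(total > 0);
    return static_cast<double>(shadowPasses) / static_cast<double>(total);
}
```

With 4 PSSM splits and one regular pass this gives 4/5. With a second regular pass for the other eye in VR, and the shadow maps rendered only once, it would drop to 4/6, so a profiler would still attribute most of the per-pass time to shadows even though no individual shadow pass is unusually expensive.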

Re: CPU bottleneck

Post by TaaTT4 »

So, here is my analysis, with some evidence.

To reduce CPU/GPU workload, I've created a quite simple scenario with the following conditions:
  • No CPU-consuming subsystem is running (physics engine, game logic, UI platform and so on)
  • No vegetation (grass blades and tree billboards)
  • No spectators on the grandstands and around the track
  • No shadow mapping at all
  • No dynamic reflections
  • No post-processing effects (apart from tone mapping)
The scene is composed of the following objects:
  • Cameras: 71
  • Items: 7211 (166 of them are dynamic)
  • Lights: 139
  • Nodes: 7964 (397 of them are dynamic)
As you can see in the following image, the FPS stay at about 220.
Image

Using the "Minimum geometry" debug feature of the NVIDIA Nsight profiler, which only draws the first triangle of each draw call (see here for more info), the FPS is the same as before.
This is a clear symptom of a CPU bottleneck.
Image

So I've done a run with the Visual Studio Profiler (I'm sorry, but it seems VTune doesn't work on my PC, it gets stuck during OGRE initialization) and these are the results:
Image
Image
From what I understood by inspecting the OGRE source code, most of the time is spent in command buffer preparation and execution which, unlike culling, isn't parallelized.
Am I right?

Matias, let me know if you need any other info (like a RenderDoc dump or some other profiler data).
I can easily reproduce this scenario.


Re: CPU bottleneck

Post by dark_sylinc »

As I explained in another post, Minimum Geometry may not always be indicative of a CPU bottleneck.

Your profiler is indeed telling you that the most expensive portion, Ogre-side, is command generation; but it's also telling you that the most significant portion is spent inside NVIDIA's driver, which could be either command processing or waiting for the GPU.

Do you have that handy Nsight API Call Summary? It will shed a lot of light. Either it's actually GPU bottlenecked (and you have very few vertices per Mesh) or CPU bottlenecked (and all your guesses are correct)

A RenderDoc dump may also be useful to discard anything huge.

Re: CPU bottleneck

Post by TaaTT4 »

I'm not at work today, so I'll send you the dump on Monday.
dark_sylinc wrote: Do you have that handy Nsight API Call Summary? It will shed a lot of light. Either it's actually GPU bottlenecked (and you have very few vertices per Mesh) or CPU bottlenecked (and all your guesses are correct)
Do you also want a reproducible NSight dump?


Re: CPU bottleneck

Post by TaaTT4 »

Hi Matias,

I've sent you a PM with a link to download both the NVIDIA Nsight and RenderDoc dumps.
Let me know if you need something more.


Re: CPU bottleneck

Post by dark_sylinc »

I just managed to download your captures.

Btw Nsight API Call Summary. I need those numbers.

Re: CPU bottleneck

Post by dark_sylinc »

Well, I've run the RenderDoc capture and I noticed two things:
  • You have A LOT of low-vertex count objects. Many objects are <100 triangles. In fact a lot of them are < 40 tris. You must be starving your GPU from work. Batching them together would help your GPU not just your CPU. I wouldn't be surprised if you're actually GPU bound.
  • A LOT of these objects don't seem to be contributing to your scene in any meaningful way. They're so distant it doesn't even occupy one pixel. Make use of MovableObject::setRenderingDistance. When the distance to camera becomes > renderingDistance, Ogre won't include the object for rendering (this is evaluated during frustum culling). Small insignificant objects should disappear sooner than the big ones.
While you can apply both solutions, eventually they will clash, since rendering distance takes the AABB into account and batching things together will enlarge the AABB (making it coarser when deciding what should be excluded from rendering).

Considering setRenderingDistance is a quick solution ala "just try it"; I would start there.
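
To illustrate the idea, and why it eventually clashes with batching, here is a small engine-agnostic sketch of a distance test. It is not Ogre's actual culling code: `CullObject` and `passesRenderingDistance` are hypothetical names, and measuring from the camera to the nearest point of the bounding sphere is an assumption made for this example.

```cpp
#include <cmath>

struct Vec3 { float x, y, z; };

// Hypothetical per-object culling data (not Ogre's ObjectData layout).
struct CullObject {
    Vec3 center;              // world-space centre of the bounds
    float radius;             // world-space bounding radius
    float renderingDistance;  // beyond this distance the object is skipped
};

// True if the object is close enough to be considered for rendering.
// Assumption: distance is measured to the nearest point of the bounding sphere.
bool passesRenderingDistance(const CullObject &obj, const Vec3 &cameraPos) {
    const float dx = obj.center.x - cameraPos.x;
    const float dy = obj.center.y - cameraPos.y;
    const float dz = obj.center.z - cameraPos.z;
    const float distToCenter = std::sqrt(dx * dx + dy * dy + dz * dz);
    return distToCenter - obj.radius <= obj.renderingDistance;
}
```

A single seat with a 0.5 m radius and a 150 m rendering distance is dropped when it is 200 m away. If that seat is batched into a grandstand section with a 60 m radius, the whole batch still passes the same test at 200 m, which is the clash between the two solutions described above.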

Re: CPU bottleneck

Post by TaaTT4 »

dark_sylinc wrote: You have A LOT of low-vertex count objects. Many objects are <100 triangles. In fact a lot of them are < 40 tris. You must be starving your GPU from work. Batching them together would help your GPU not just your CPU. I wouldn't be surprised if you're actually GPU bound.
A LOT of these objects don't seem to be contributing to your scene in any meaningful way. They're so distant it doesn't even occupy one pixel. Make use of MovableObject::setRenderingDistance. When the distance to camera becomes > renderingDistance, Ogre won't include the object for rendering (this is evaluated during frustum culling). Small insignificant objects should disappear sooner than the big ones.
Setting the render distance of objects has been on my to-do list almost since the beginning, but I've always postponed it while waiting to have all kinds of objects in the scene.
I guess now the time has come.

What I'm aiming to do is to mark every part of a complex object as "required" vs "optional".
For example, in a grandstand the roof and the concrete structure will be marked as "required", while the seats, the fences, the poles, the stairs and so on will be marked as "optional".
The "optional" meshes will have a lower render distance than "required" meshes.
In this way, I'll be able to cut out small insignificant objects very soon and to preserve object shapes at a distant position avoiding popping.

As soon as this system is ready, I'll send you another set of dumps.

BTW, I guess all those tiny objects you've seen are transparent items (fences especially) which haven't been batched together to avoid this AABB transparency issue.
dark_sylinc wrote: Btw Nsight API Call Summary. I need those numbers.
I've sent you a PM with a link to download it.


Re: CPU bottleneck

Post by dark_sylinc »

TaaTT4 wrote:Set objects render distance is something which has been in my todo list since almost the beginning, but I've always postponed it waiting to have all kind of objects in scene.
I guess now the time has come.
For what it's worth, the PixelCount LOD strategy (ScreenRatioPixelCountLodStrategy::lodUpdateImpl) shows how to calculate the area of the screen a sphere will occupy at a given distance. It needs 3 things:
  • The projection matrix (dependency on FOV settings)
  • The object's radius (mWorldRadius, which accounts for final scale)
  • Distance to camera (taken from cameraPos - mWorldAabb->mCenter in our code)
You could use that math to calculate a distance value at which you will know mathematically that the object will be < 1 pixel; and use that as a minimum visibility distance for each object (and boost it for irrelevant objects that can disappear sooner; TBH we should be the ones providing this facility for you instead of telling you to implement it :lol: ).

Please note that it's resolution dependent. At 1920x1080 the area threshold for being smaller than a quarter of a pixel is area = 1 / (1920 * 4), which is 0.0001302083, but at 4K resolution it's 1 / (3840 * 4). You'll probably want to use doubles for your math. Or maybe you won't care about resolution.
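
As a rough illustration of that kind of math, here is a simplified sketch. It is not the formula used by ScreenRatioPixelCountLodStrategy: it works with the projected diameter in pixels rather than the screen-area ratio, and it assumes a symmetric perspective projection described only by its vertical FOV and the viewport height. Both function names are hypothetical.

```cpp
#include <cassert>
#include <cmath>

// Approximate on-screen diameter, in pixels, of a sphere of the given radius at
// `distance` from the camera. At that distance the view covers
// 2 * distance * tan(fovY / 2) world units vertically, mapped onto viewportHeight pixels.
double projectedDiameterPixels(double radius, double distance,
                               double fovYRadians, double viewportHeight) {
    assert(radius >= 0.0 && distance > 0.0 && viewportHeight > 0.0);
    return (radius * viewportHeight) / (distance * std::tan(0.5 * fovYRadians));
}

// Distance beyond which the sphere's projected diameter falls below `maxPixels`.
// Obtained by solving projectedDiameterPixels(...) == maxPixels for the distance.
double distanceForPixelSize(double radius, double fovYRadians,
                            double viewportHeight, double maxPixels) {
    assert(radius >= 0.0 && viewportHeight > 0.0 && maxPixels > 0.0);
    return (radius * viewportHeight) / (maxPixels * std::tan(0.5 * fovYRadians));
}
```

The result could be fed to setRenderingDistance as a per-object cut-off. Raising `maxPixels` acts like the "boost" mentioned above: a threshold of 4 pixels drops the object at a quarter of the distance a 1 pixel threshold would. Because `viewportHeight` is an input, the resolution dependence is explicit.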
TaaTT4 wrote: What I'm aiming to do is to mark every part of a complex object as "required" vs "optional".
For example, in a granstand the roof and the concrete structure will be marked as "required", while the seats, the fences, the poles, the stairs and so on will be marked as "optional".
The "optional" meshes will have a lower render distance than "required" meshes.
If the math pans out, you won't need the "required" category (since it's 100% reasonable that everything that is <1 pixel should be left out). What you want is a booster variable (so they get dropped well before they become <1 pixel, but it's something manageable) to use for seats, stairs, etc., or an override (i.e. specify the value directly: drop the object once it's 150 meters away).
TaaTT4 wrote: BTW, I guess all those tiny objects you've seen are transparent items (fences especially) which haven't been batched together to avoid this AABB transparency issue.
Short bed sheet problem, eh? (do we cover our feet or our head? can't cover both at the same time).
For fences that are always exterior (i.e. you're always inside the fence) you can batch together the fence parts and then put them in a lower RenderQueue, since it's guaranteed they should be rendered before the trees. It will only look incorrect if you look at the scene from the other side of the fence (i.e. from the outside):
Image
In this case you can batch together the skin-coloured fence and put it in a lower RQ ID so it gets rendered first (which always works as long as you're inside the fence). You could batch the whole fence in one draw, or in a few (for better culling; it really depends on how many vertices make up that fence. If the whole fence is <1000 vertices just batch the whole thing).
TaaTT4 wrote:
dark_sylinc wrote: Btw Nsight API Call Summary. I need those numbers.
I've sent you a PM with a link where to download it.
1. I guess I should've told you by now I can't run NSIGHT (no NV card, died like a year ago, still haven't bought a replacement...)
2. I don't want my numbers. I want your numbers. Perhaps your machine says CPU bottlenecked while mine says GPU bottlenecked (or vice versa). I need to know your numbers. Generating the call summary myself is pointless.

Cheers

Re: CPU bottleneck

Post by dark_sylinc »

I've analyzed the call graphs you sent me.

Nsight says the GPU spends 5.22ms (max 191 FPS) and the CPU spends 1.33ms (max 746 FPS). This is consistent with the screenshots you posted (at roughly 220 FPS), but it indicates you're GPU bound.
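
The conversion behind those numbers is just the reciprocal of the frame time. A small sketch of that arithmetic, with hypothetical helper names; it assumes the CPU and GPU work overlap, so whichever side takes longer per frame sets the frame rate:

```cpp
#include <cassert>

// Upper bound on frames per second if one frame takes `frameTimeMs` milliseconds.
double maxFps(double frameTimeMs) {
    assert(frameTimeMs > 0.0);
    return 1000.0 / frameTimeMs;
}

// Time budget per frame, in milliseconds, needed to hold a target frame rate.
double frameBudgetMs(double targetFps) {
    assert(targetFps > 0.0);
    return 1000.0 / targetFps;
}

enum class Bound { Cpu, Gpu };

// Assumption: CPU and GPU run in parallel, so the slower side is the limit.
Bound limitingSide(double cpuMs, double gpuMs) {
    return gpuMs >= cpuMs ? Bound::Gpu : Bound::Cpu;
}
```

With the figures above, 1000 / 5.22 is roughly 191 FPS for the GPU while the CPU could go several times faster, so the GPU is the limit in this capture. The same arithmetic gives the VR target: 90 FPS leaves about 11.1 ms per frame, which is the 11ms threshold visible in the OpenVR timings posted earlier in the thread.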

Your problem is too few vertices per draw (reduce the draw count by either batching draws together or just removing some) and tiny triangles (when triangles occupy less than 4 pixels you're under-utilizing your hardware, because GPUs operate on at least 4 pixels at a time). You could have a bottleneck in the rasterizer (the rasterizer discards a lot of triangles because they are < 1 pixel, thus leaving the shader cores idle).

The good news is that properly using draw distance should fix the majority of those problems (because you'll be rendering fewer draws and avoiding small triangles).

Cheers
Matias

Re: CPU bottleneck

Post by TaaTT4 »

dark_sylinc wrote:
TaaTT4 wrote: Set objects render distance is something which has been in my todo list since almost the beginning, but I've always postponed it waiting to have all kind of objects in scene.
I guess now the time has come.
For what is worth, PixelCount LOD strategy (ScreenRatioPixelCountLodStrategy::lodUpdateImpl) shows how to calculate the area of the screen a sphere will occupy at a given distance. It needs 3 things:
  • The projection matrix (dependency on FOV settings)
  • The object's radius (mWorldRadius, which accounts for final scale)
  • Distance to camera (taken from cameraPos - mWorldAabb->mCenter in our code)
You could use that math to calculate a distance value at which you will know mathematically that the object will be < 1 pixel; and use that as a minimum visibility distance for each object (and boost it for irrelevant objects that can disappear sooner; TBH we should be the ones providing this facility for you instead of telling you to implement it :lol: ).

Please note that it's resolution dependent. At 1920x1080 the area threshold is being smaller than a quarter of a pixel, that is area = 1 / (1920 * 4) which is 0.0001302083 but at 4K resolution it's 1/(3840 * 4). You'll probably want to use doubles for your math. Or maybe you won't care about resolution.
TaaTT4 wrote: What I'm aiming to do is to mark every part of a complex object as "required" vs "optional".
For example, in a granstand the roof and the concrete structure will be marked as "required", while the seats, the fences, the poles, the stairs and so on will be marked as "optional".
The "optional" meshes will have a lower render distance than "required" meshes.
If the math pans out, you won't need the "required" category (since everything that is <1 pixel is 100% reasonable it should be left out). What you want is a booster variable (so they get dropped way before the become <1 pixel, but it's something manageable) to use for seats, stairs, etc. or an override (i.e. specify the value directly: drop the object after it's 150 meters away)
Thanks a lot!
Looking at what happens behind the pixel count strategy is a very nice hint.
I'll implement my render distance auto-adaptation system upon that math.
I also believe that, this way, it will be easier for artists to visually tune the scene.
dark_sylinc wrote:
TaaTT4 wrote: BTW, I guess all those tiny objects you've seen are transparent items (fences especially) which haven't been batched together to avoid this AABB transparency issue.
Short bed sheet problem, eh? (do we cover our feet or our head? can't cover both at the same time).
For fences that are always exterior (i.e. you're always inside the fence) you can batch together the fence parts and then put it in a lower RenderQueue since it's guaranteed they should be rendered before trees. It will only look incorrect if you look the scene from the other side of the fence (i.e. from the outside):
Image
In this case you can batch together the skin-coloured fence and put it in a lower RQ ID so it gets rendered first (which works always as long as you're inside the fence). You could batch the whole fence in one draw, or in several few (for better culling, it really depends on how many vertices make up that fence. If the whole fence is <1000 vertices just batch the whole thing).
I wish things were that simple...
The tracks are more similar to the following image:
Image
And I have no guarantee that the camera will always be on a particular side of a fence/tree (think about TV cameras).
Lower RQ IDs work very well, but just for "splatted" transparent objects (like tarmac stripes, grass blades and so on).
dark_sylinc wrote: I've analyzed the call graphs you sent me.

NSight says the GPU spends 5.22ms (max 191 FPS) and the CPU spends 1.33ms (max 746 FPS). This is consistent with the screeshots you posted (at roughly 220 fps), but indicating you're GPU bound.

Your problem is too few vertices per draw (reduce the draw count by either batching draws together or just removing some) and tiny triangles (when triangles occupy less than 4 pixels you're under-utilizing your hardware; because GPUs operate on at least 4 pixels at a time). You could be having a bottleneck in the rasterizer (the rasterizer removes away a lot of triangles because the are < 1 pixel; thus leaving the shader cores idle).

The good news is that properly using draw distance should fix the majority of those problems (because you'll be rendering less draws and avoiding small triangles)
Thanks very much for the help!
I'll make another dump to check the differences as soon as I've fixed the render distances.
