void IndexData::optimiseVertexCacheTriList(void)

Discussion area about developing or extending OGRE, adding plugins for it or building applications on it. No newbie questions please, use the Help forum for that.
User avatar
dennis
Gremlin
Posts: 157
Joined: Mon Nov 11, 2002 4:21 pm
x 3

void IndexData::optimiseVertexCacheTriList(void)

Post by dennis »

Amazing that an Ogre developer only fairly recently rediscovered my VertexCacheTriListOptimizer. Which then got made generally accessible through the MeshUpgrader and which I then noticed in the news section which then got me wondering: "Hey, didn't I make something similar almost 20 years ago?". Which I then looked up in the source code and then realized it was my code after all.

The news article mentions version Ogre 1.6 as the version when the optimizer was implemented. I can't recall when that version was released but the optimizer was probably implemented even earlier. See this post from 2004.

Also that same news article references a paper from September 2006 by Tom Forsyth, which mentions some code, but not mine. I was inspired by the NvTriStrip stuff NVidia was promoting at the time and Microsoft's ID3DXMesh::OptimizeInplace, but I created the algorithm myself. It is a stoopit simple algorithm, but better than no optimization at all. My hope was that some other genius :wink: would come up with a better way. But almost 20 years later they just exposed mine as-is.

Simply amazing!

paroj
OGRE Team Member
OGRE Team Member
Posts: 2108
Joined: Sun Mar 30, 2014 2:51 pm
x 1132

Re: void IndexData::optimiseVertexCacheTriList(void)

Post by paroj »

I checked again and indeed it already appeared in Ogre 1.2.

After finding it, my main motivation was to make it more discoverable as it was not used anywhere within Ogre.

I also took a look at https://github.com/microsoft/DirectXMes ... irectXMesh, the open source ID3DXMesh and it seems to implement the Tom Forsyth method (LRU) and TVC: https://hhoppe.com/tvc.pdf

does TVC resemble what you did there? If so, I could add a proper reference to the code.

User avatar
dennis
Gremlin
Posts: 157
Joined: Mon Nov 11, 2002 4:21 pm
x 3

Re: void IndexData::optimiseVertexCacheTriList(void)

Post by dennis »

paroj wrote:

I checked again and indeed it already appeared in Ogre 1.2.

It was late 2004, does that line up with Ogre 1.2? My best guesstimate is pre 1.0.

paroj wrote:

After finding it, my main motivation was to make it more discoverable as it was not used anywhere within Ogre.

It was part of my Lwo2Mesh converter first and was later added as a patch for IndexData. I don't even know if it is still used there. (Can you check?) It can and should be used as much as possible where meshes are written to disk. AFAICT it should only increase triangle coherency. When reading meshes from disk for rendering the hardware used becomes more important. (cache size, cache type etc.) Using a snake-type movement would optimize cache usage for FIFO and the optimal left<->right movement is dependent on cache size. I posted an image of that, but that got lost in a forum move over the years. (Maybe it still exists and only the reference to it needs to be updated.)

paroj wrote:

I also took a look at https://github.com/microsoft/DirectXMes ... irectXMesh, the open source ID3DXMesh and it seems to implement the Tom Forsyth method (LRU) and TVC: https://hhoppe.com/tvc.pdf

does TVC resemble what you did there? If so, I could add a proper reference to the code.

Thanks for the reads! And no, neither papers or code they reference resemble what I created. My algorithm simply reorders on adjacency, ANY adjacency. To illustrate this, the one workhorse line is:

Code: Select all

if (triangles[ti].sharesEdge(triangles[j]))

And then a matching triangle is eventually and conceptually moved next to this one. TADA!

Looking at the Microsoft code I'm rather proud of the Triangle helper class I created. All this multiplication and modulating by threes does not help readability at all and probably can't be optimized by the compiler as well. And to use it all it took was one simple static_cast. (Except for IT_16BIT buffers of course.) Shame on you Microsoft for not doing better.

So what is available now is the simplest optimization algorithm I could come up with and so we should all see it as something with an enormous potential for improvement ... which has not happened the last 20 years because of obscurity? :idea:

One simple example could be to templatize Triangle (for use with uint16) so the conversion steps can be skipped.

paroj
OGRE Team Member
OGRE Team Member
Posts: 2108
Joined: Sun Mar 30, 2014 2:51 pm
x 1132

Re: void IndexData::optimiseVertexCacheTriList(void)

Post by paroj »

I checked the attic repo and optimiseVertexCacheTriList only appeared on the v1-2 branch:
https://github.com/OGRECave/ogre-attic/ ... exData.cpp

did not see it being used in Lwo2Mesh even in that version though. Maybe it uses the original implementation instead of that function..

Note that @Kojack tried to add Tom Forsyth method to Ogre and did some comparision with your implementation here:
viewtopic.php?p=325224#p325224

Generally, Tom Forsyth method seems to have the widest adoption as it is used by meshopt:
https://github.com/zeux/meshoptimizer/b ... imizer.cpp

If you wonder why I came back to this in the first place, it is because of Mesh shaders which make this kind of stuff explicit:
https://gpuopen.com/learn/mesh_shaders/ ... sh_shader/

User avatar
dennis
Gremlin
Posts: 157
Joined: Mon Nov 11, 2002 4:21 pm
x 3

Re: void IndexData::optimiseVertexCacheTriList(void)

Post by dennis »

paroj wrote: Thu Jul 18, 2024 12:06 pm

I checked the attic repo and optimiseVertexCacheTriList only appeared on the v1-2 branch:
https://github.com/OGRECave/ogre-attic/ ... exData.cpp

did not see it being used in Lwo2Mesh even in that version though. Maybe it uses the original implementation instead of that function..

Note that @Kojack tried to add Tom Forsyth method to Ogre and did some comparision with your implementation here:
viewtopic.php?p=325224#p325224

Kojack wrote: Fri Jan 23, 2009 9:15 am

Some performances for comparison.
Model: Standford Bunny (35947 vertices and 69451 triangles) with normals, stored in standard mesh file and loaded as an entity. One light source.
Computer: i7 940, geforce 7800gtx, vista64

No optimisation: 410fps
Ogre's optimiseVertexCacheTriList(): 704fps, 11 seconds to process
My version of Tom's optimiser: 958fps, less than 1 second to process

(previous rates were higher because model was unlit)

Not too bad. Not only is this technique giving better results than the built in one, but it's also near instant instead of taking 11 seconds (or several minutes like my last version).

Now that I have it working faster (and I haven't chucked it in my vtune demo yet), I'm going to restructure it a bit better, and put some easier support in for getting to the buffers (currently it needs to be given an IndexBuffer pointer, I'd rather give it an entity and let it find the index buffers for all sub meshes).

I tried using OgreTootle, but I can't get it to run (even if I build it myself). might be a vista problem, I'll try it on xp later.

Ok, these numbers were to be expected. It adds weight to my suggestion that all meshes written to disk should be optimized. If the format is an indexBuffer pointing to vertices of triangles of course. (Is that still the predominantly used format?) Maybe rewrite as an opt-out instead and do profiling runs compared to the original to make sure the quality of the mesh is not deteriorated.

Did Kojacks code make it into Ogre somewhere? The better fps was to be expected, the amazing difference in processing speed was not. Note that all the Triangle member functions are supposed to be inlined and that the difference between a debug and a release build is big. I don't understand what light sources have to with the optimization of the indexBuffer itself, but of course they have an impact on the fps.

paroj wrote:

Generally, Tom Forsyth method seems to have the widest adoption as it is used by meshopt:
https://github.com/zeux/meshoptimizer/b ... imizer.cpp

If you wonder why I came back to this in the first place, it is because of Mesh shaders which make this kind of stuff explicit:
https://gpuopen.com/learn/mesh_shaders/ ... sh_shader/

?

Again, nice article and I have seen that patchwork bunny before somewhere, or something similar. But are you implying that vertex cache optimization has become even more important? BTW have GPU caches switched strategy from FIFO to LRU or something else? My last somewhat in-depth knowledge of graphics cards stems from the GeForce2MX era.

paroj
OGRE Team Member
OGRE Team Member
Posts: 2108
Joined: Sun Mar 30, 2014 2:51 pm
x 1132

Re: void IndexData::optimiseVertexCacheTriList(void)

Post by paroj »

dennis wrote: Thu Jul 18, 2024 1:27 pm

Ok, these numbers were to be expected. It adds weight to my suggestion that all meshes written to disk should be optimized. If the format is an indexBuffer pointing to vertices of triangles of course. (Is that still the predominantly used format?) Maybe rewrite as an opt-out instead and do profiling runs compared to the original to make sure the quality of the mesh is not deteriorated.

yeah, it makes sense to switch to opt-out. However, the extra processing time is too high currently to do so.

dennis wrote: Thu Jul 18, 2024 1:27 pm

Did Kojacks code make it into Ogre somewhere? The better fps was to be expected, the amazing difference in processing speed was not. Note that all the Triangle member functions are supposed to be inlined and that the difference between a debug and a release build is big.

something might went wrong in the evaluation. However, generally your algorithm is O(N*N), while the other one is O(N), so the numbers kind of add up.
I did not see Kojacks code anywhere, unfortunately.

paroj wrote:

Again, nice article and I have seen that patchwork bunny before somewhere, or something similar. But are you implying that vertex cache optimization has become even more important?

with mesh shaders, there is no input assembly any more and hence no explicit index buffers and caches. Hence you must reorder your data to form the meshlets which essentially is the same task as vertex cache optimization. So yeah, it has become even more important :)