Most people probably don't think about it, but graphics cards have a very limited cache for transformed vertices. The size varies between hardware, but it's small, probably somewhere around 32 entries. Using an indexed triangle list can give good performance gains because triangles share vertices through the indices instead of each triangle having 3 unique ones. Even so, with only around 32 cache slots available, many vertices will still be recalculated (have their vertex shader run) more than once.
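To make the "recalculated more than once" cost concrete, here is a minimal sketch that counts vertex shader invocations for an index buffer under an assumed FIFO cache model (real hardware varies). `simulateFifoAcmr` is a hypothetical helper name. It reports the average cache miss ratio (ACMR), the number of transformed vertices per triangle: 3.0 is the worst case, and large closed meshes bottom out around 0.5 because they have roughly half as many vertices as triangles.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <deque>
#include <vector>

// Simulates a FIFO post-transform cache with `cacheSize` entries (an assumed
// model, not any specific GPU) and returns transformed vertices per triangle.
double simulateFifoAcmr(const std::vector<std::uint32_t>& indices, std::size_t cacheSize)
{
    assert(indices.size() % 3 == 0 && "expected a triangle list");
    std::deque<std::uint32_t> cache;
    std::size_t misses = 0;
    for (std::uint32_t index : indices) {
        if (std::find(cache.begin(), cache.end(), index) == cache.end()) {
            ++misses;                  // the vertex shader runs for this vertex
            cache.push_back(index);
            if (cache.size() > cacheSize)
                cache.pop_front();     // FIFO: the oldest entry is evicted
        }
        // In a FIFO cache a hit does not change the entry's position.
    }
    const std::size_t triangles = indices.size() / 3;
    return triangles ? static_cast<double>(misses) / triangles : 0.0;
}
```

For example, two triangles sharing an edge, `{0, 1, 2,  2, 1, 3}`, cost 4 transforms for 2 triangles (an ACMR of 2.0) with any cache of at least 3 entries.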
Tom Forsyth's paper covers a technique for quickly reordering the index buffer of a triangle list to make it more cache friendly, without needing to know the cache size of your target hardware.
The paper is a bit vague in places, so my version is slower than it should be: it currently takes about 2 minutes to brute-force process a high-poly mesh, when it should take maybe a second or less with the proper shortcuts implemented. I also got the cache direction backwards when calculating a vertex's score. Even so, the results are already interesting.
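For reference, here is a sketch of the per-vertex score function as I recall it from Forsyth's paper. The constant values are from memory and should be treated as tunable assumptions, and `vertexScore` is a hypothetical name. Cache position 0 means most recently used, which is the direction that is easy to get backwards: higher positions are older entries and score lower.

```cpp
#include <cassert>
#include <cmath>

// Tuning constants (recalled values; treat as assumptions).
constexpr int   kMaxCacheSize      = 32;
constexpr float kCacheDecayPower   = 1.5f;
constexpr float kLastTriScore      = 0.75f;
constexpr float kValenceBoostScale = 2.0f;
constexpr float kValenceBoostPower = 0.5f;

// cachePosition: 0 = most recently used, -1 = not in the simulated cache.
// remainingTris: triangles using this vertex that have not been emitted yet.
float vertexScore(int cachePosition, int remainingTris)
{
    assert(cachePosition < kMaxCacheSize);
    if (remainingTris == 0)
        return -1.0f;                 // no triangles left to draw with this vertex

    float score = 0.0f;
    if (cachePosition >= 0) {
        if (cachePosition < 3) {
            // Vertices of the triangle just emitted share one fixed score, so
            // the result doesn't depend on their order within that triangle.
            score = kLastTriScore;
        } else {
            // Linear falloff from the front of the cache, then shaped by a power.
            const float scale = 1.0f / (kMaxCacheSize - 3);
            score = 1.0f - (cachePosition - 3) * scale;
            score = std::pow(score, kCacheDecayPower);
        }
    }
    // Boost vertices with few remaining triangles so stragglers get finished
    // off instead of being left behind.
    score += kValenceBoostScale *
             std::pow(static_cast<float>(remainingTris), -kValenceBoostPower);
    return score;
}
```

A triangle's score is then the sum of its three vertex scores, and the greedy pass emits the highest-scoring triangle next.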
So far I've tested it with the Stanford Bunny model, which has 35947 vertices and 69451 faces.
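Those counts also put a floor on what any reordering can achieve. Every vertex has to be transformed at least once, so, assuming every vertex is referenced by some face, the average number of vertex transforms per triangle can't drop below:

$$\frac{V}{T} = \frac{35947}{69451} \approx 0.518$$

compared with 3 per triangle if no vertex were ever reused.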
Loading the bunny's vertex and index data into a SimpleRenderable in Ogre and rendering it (nothing else), I get 785 fps (measured with Fraps on my GeForce 7800 GTX).
Loading the data the same way, but running the index buffer through my optimiser first, the same mesh renders at 1250 fps.
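Frame rates don't compare linearly, so the difference is clearer as time per frame:

$$\frac{1000\ \text{ms}}{785} \approx 1.274\ \text{ms} \qquad \frac{1000\ \text{ms}}{1250} = 0.800\ \text{ms}$$

That is about 0.47 ms saved per frame, roughly 37% less time per frame, or a speedup of about 1.59×.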
This is just fixed function; I'd expect meshes with heavy vertex shaders to gain even more.
Of course, it's less suited to small meshes: it won't hurt, but the gains won't be as noticeable.
The code needs a ton of work, especially on speed. I'm doing an O(n^2) version because Tom's O(n) version was a little confusing, but I think I understand it now. Still, I think it has potential.
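As far as I understand the linear-time approach, the key data structure is a per-vertex list of the triangles that use each vertex: after a triangle is emitted, only the vertices in the simulated cache (and the triangles touching them) need rescoring, instead of every triangle being rescanned. Below is a minimal sketch of building that adjacency in one flat array in O(V + T), assuming a plain 32-bit index buffer with no degenerate triangles. `VertexTriAdjacency` and `buildAdjacency` are hypothetical names.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Vertex -> triangle adjacency in one flat array (CSR layout):
// the triangles using vertex v are triList[offset[v] .. offset[v] + count[v]).
struct VertexTriAdjacency {
    std::vector<std::uint32_t> offset;   // start of each vertex's run in triList
    std::vector<std::uint32_t> count;    // number of triangles using each vertex
    std::vector<std::uint32_t> triList;  // triangle indices, grouped by vertex
};

VertexTriAdjacency buildAdjacency(const std::vector<std::uint32_t>& indices,
                                  std::size_t vertexCount)
{
    VertexTriAdjacency adj;
    adj.offset.assign(vertexCount, 0);
    adj.count.assign(vertexCount, 0);
    adj.triList.resize(indices.size());

    // Pass 1: count how many triangles reference each vertex.
    for (std::uint32_t v : indices)
        ++adj.count[v];

    // Prefix sum turns the counts into starting offsets.
    std::uint32_t running = 0;
    for (std::size_t v = 0; v < vertexCount; ++v) {
        adj.offset[v] = running;
        running += adj.count[v];
    }

    // Pass 2: scatter each triangle index into its vertices' runs.
    std::vector<std::uint32_t> fill(vertexCount, 0);
    for (std::size_t i = 0; i < indices.size(); ++i) {
        const std::uint32_t v = indices[i];
        adj.triList[adj.offset[v] + fill[v]++] = static_cast<std::uint32_t>(i / 3);
    }
    return adj;
}
```

One possible design during optimisation: when a triangle is emitted, swap it to the end of each of its vertices' runs and decrement `count[v]`, so `count[v]` doubles as the vertex's remaining-triangle count without any reallocation.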
Plus, this is only stage 1 of the optimisation process; the full version has 2 more stages.
There are 4 possible places for this to go when (if) I finish it:
- standalone code: stick it in your program to optimise whatever you want
- as an option in OgreXMLConverter
- as an option in MeshMagick
- added to Ogre's MeshManager or some other suitable place (only if I REALLY clean it up and can show some consistent benefits).
The advantage of having it in Ogre itself rather than just in the tools is that you could call it on a ManualObject, StaticGeometry or any other mesh constructed at runtime.
Damn, I've got to go out. When I get home I think I'll implement another Tom Forsyth paper I've been staring at for ages.
:)
Down with teapots! Long live the Stanford Bunny!