SOAR-based terrain engine

Anything and everything that's related to OGRE or the wider graphics field that doesn't fit into the other forums.
David
Gnoblar
Posts: 8
Joined: Wed Oct 01, 2003 8:38 pm

SOAR-based terrain engine

Post by David » Wed Dec 31, 2003 11:41 pm

Just to let the community know: I had originally planned to work on the PagingLandscape plugin, but have decided against it, since I need a terrain engine with slightly different characteristics, primarily a very large viewing distance and the ability to fly over the terrain with little or no loss of quality or stuttering. So I'm working on yet another scene manager, based on the SOAR terrain engine. You may be more familiar with the Ranger Mk2 demo, which adds procedural detail to the SOAR algorithm... I intend to do the same, but in a simpler (and more disk-efficient) way.

The biggest problem with the SOAR engine is its large precalculation step, which generates very large files (on the order of 1.3 GB for a 16k x 16k heightmap). I intend to reduce this disk space requirement considerably by adding a small amount of run-time code. First, I plan to keep track of the current X and Z coordinates as I traverse the terrain DAG, and to store the height as a 16-bit integer that is converted to a float on the fly; together these remove 10 bytes from the original 20-byte vertex struct. Second, I plan to store the object space error as a 16-bit integer as well (the error is the difference between the actual height and the average of 2 vertices, so error * 2 is always an integer), removing 2 more bytes. Finally, I plan to keep a table of the maximum bounding sphere sizes per level and use that for all vertices in a given level, instead of storing an individual bounding sphere size in each vertex struct, removing 4 more bytes. This brings the per-vertex storage requirement from 20 bytes down to 4.
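
To make the layout concrete, here is a sketch of the before/after structs (my names, not the actual SOAR source; the 16-bit fields assume the height range and the doubled error fit in 16 bits):

// Original SOAR vertex layout, 20 bytes:
struct VertexFull
{
    float x, y, z;   // 12 bytes: x/z will be tracked during DAG traversal instead
    float error;     //  4 bytes: object space error
    float radius;    //  4 bytes: bounding sphere radius, moved to a per-level table
};

// Stripped layout, 4 bytes:
struct VertexPacked
{
    unsigned short height;  // 16-bit height, converted to a float on the fly
    unsigned short error2;  // error * 2, which is always an integer
};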

I also plan to implement procedural detail somewhat like that seen in the Ranger Mk2 demo, but with no huge precalculation step. I will simply store a detail texture that is consulted once the algorithm reaches the full resolution of the primary heightmap. The horizontal resolution of this texture must equal the resolution of the primary heightmap / 2^k, where k is an integer. The detail texture just stores the difference from the average of the two vertices that the new vertex is splitting (which are a level higher, so they will already exist), and is repeated over the entire landscape. I'll probably implement some sort of height-based scaling so mountains can appear more rocky while lowlands appear smoother. Finally, bump maps will be added to give even more perceived detail, once I figure out how to do multitexturing over a dynamically created Renderable.

Anyone who is familiar with the SOAR terrain engine, feel free to comment on this approach. Between the added procedural detail and the stripped-down vertex struct, I think I'll remove the primary disadvantage of the SOAR engine (disk space) without decreasing its incredible speed very much. If anyone sees any huge flaws in my logic, please point them out.

David

sinbad
OGRE Retired Team Member
Posts: 19261
Joined: Sun Oct 06, 2002 11:19 pm
Location: Guernsey, Channel Islands

Post by sinbad » Thu Jan 01, 2004 12:39 am

Hi David, that sounds intriguing. I looked at SOAR a while back and was similarly a bit horrified by the precalculation step. I also thought there was quite a lot of popping going on, which suggested a lot of splitting / merging was happening, but I haven't studied the algorithm properly enough to know whether that's really the case.

BTW, I suggest you look at CVS or the upcoming 0.13.0; there are a lot more material features in the new version, which will help you achieve the multipass approach you're looking for.

Keep us updated; visually the Ranger demo is very nice, so it would be interesting to see a version of the algorithm in Ogre. :)

David
Gnoblar
Posts: 8
Joined: Wed Oct 01, 2003 8:38 pm

Post by David » Thu Jan 01, 2004 5:36 pm

Well, no splitting or merging actually happens in SOAR... the S stands for Stateless: a new mesh is created by top-down refinement every frame. The rationale is this: algorithms that rely on frame-to-frame coherence really only win when the camera isn't moving. As soon as the camera moves, several triangles need to be both merged and split, and each of those merges and splits forces further merges and splits to prevent T-junctions from forming. Propagating those changes outward to prevent T-junctions takes so much time that you might as well just refine from the top down to start with.

The general form of a SOAR mesh is as follows: take your entire square map and stick an X in the middle of it. Now you have 5 vertices forming four right isosceles triangles. To refine the map further, draw a line from each triangle's right angle that bisects its hypotenuse. The new vertex that this operation introduces is considered "active" (and therefore exists in the final mesh) if the error stored in the precalculation step, translated into screen space each frame, is greater than a given threshold value. (Drawings really help here :( ) The process continues recursively (each bisected triangle forms 2 more right isosceles triangles) until inactive vertices are found (vertices whose screen space error lies below the threshold). During the recursion, the bounding sphere of each vertex (also precalculated) is checked against the view frustum, and the vertex is marked inactive if its bounding sphere lies totally outside the frustum. The bounding sphere encompasses all of the vertex's children, so this will not cause us to miss triangles in lower levels that should be included.
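
For anyone following along, here is a rough sketch of that refinement recursion (hypothetical names, not the actual SOAR source, and the strip-generation details are omitted):

#include <vector>

struct Vec3 { float x, y, z; };

// Hypothetical helpers standing in for the real engine's versions:
bool  sphereOutsideFrustum(const Vec3& centre, float radius);
float screenSpaceError(const Vec3& v);  // precalculated error, projected each frame
Vec3  splitVertex(const Vec3& a, const Vec3& b);  // midpoint of the hypotenuse

extern float levelRadius[32];  // per-level bounding sphere radii
extern float threshold;        // screen space error threshold, in pixels
extern int   maxLevel;         // full resolution of the heightmap

// A triangle is its apex (the right-angle vertex) plus the two ends of its hypotenuse.
void refine(const Vec3& apex, const Vec3& a, const Vec3& b,
            int level, std::vector<Vec3>& mesh)
{
    Vec3 mid = splitVertex(a, b);

    bool active = level < maxLevel
               && !sphereOutsideFrustum(mid, levelRadius[level])
               && screenSpaceError(mid) > threshold;

    if (active)  // bisect the hypotenuse: two smaller right isosceles triangles
    {
        refine(mid, apex, a, level + 1, mesh);
        refine(mid, b, apex, level + 1, mesh);
    }
    else
    {
        mesh.push_back(apex);  // coarse enough (or culled): emit and stop
    }
}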

The precalculation step is basically responsible for 2 things: object space error and bounding spheres. Bounding spheres are easy if you view the vertices as a DAG: the first level holds the center vertex, and its children are the 4 vertices in the center of each side of the map. Every vertex has 4 children (except vertices around the edges of the map), and every vertex has 2 parents (the right-angle vertices of the triangles on either side of the hypotenuse the vertex bisects). The vertices at the deepest level of the DAG have no children, so their bounding spheres have a radius of 0. From there up, a vertex's bounding sphere encompasses the bounding spheres of all its children (i.e. parent bounding sphere radius = max over children of (distance to child + child bounding sphere radius)). As far as I can tell, in the two-dimensional case (i.e. bounding cylinders), all vertices on a given level of the DAG have the same bounding radius, so I may store this information at the top of the file to save per-vertex space... at the cost of not being able to cull as many vertices against the frustum. The problem is, frustum vs. cylinder intersections are harder than frustum vs. sphere intersections (except when the camera is not allowed to roll from side to side), so our probably already CPU-bound algorithm only gets more CPU intensive. But adding a 32-bit floating point bounding sphere radius to every vertex doubles the size of the file. I'll probably end up making it a compilation option for both the precalculation program and the scene manager, so people can decide between speed and disk space.
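
The bottom-up radius pass itself is tiny; something along these lines, reusing the Vec3 from the sketch above (a real implementation would work level by level rather than recurse, since every vertex has two parents and would otherwise be visited twice):

#include <algorithm>

struct Node { Vec3 pos; float radius; Node* child[4]; int numChildren; };

float distance(const Vec3& a, const Vec3& b);  // hypothetical helper

// Leaves get radius 0; a parent's sphere must enclose every child's sphere.
float computeRadius(Node& v)
{
    float r = 0.0f;
    for (int i = 0; i < v.numChildren; ++i)
        r = std::max(r, distance(v.pos, v.child[i]->pos) + computeRadius(*v.child[i]));
    return v.radius = r;
}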

The other precalculated value is the object space error. To visualize it, pick any triangle and bisect its hypotenuse. The difference between the interpolated height at that point (the average of the two corner vertices) and the real height is the object space error. But there's a catch. To prevent T-junctions from forming, this condition must be met: for each active vertex, both of its parents must also be active; inductively, all of its ancestors must be active too. To force this property to hold without introducing extra calculations at run time, we simply enforce the rule that every vertex must have greater or equal error than its children. At run time, we project the object space error at the nearest edge of the bounding sphere, do a little trig, and get the screen space error. Since a child's bounding sphere is inside its parent's, and a child's object space error is guaranteed to be less than or equal to its parent's, the child's screen space error will always be less than its parents', so if the child is active, its parents must be as well.
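
The run-time projection boils down to one divide once the constants are folded together. One common form of it (the exact SOAR formula differs in its details; the names here are mine):

#include <cmath>

// Project an object space error to pixels, measured at the nearest point of the
// vertex's bounding sphere. Nested spheres plus monotonic errors are what make
// the parent's projected error always at least the child's.
float projectErrorToScreen(float objectError, float distToVertex, float sphereRadius,
                           float viewportWidth, float horizontalFov)
{
    float lambda = viewportWidth / (2.0f * std::tan(horizontalFov * 0.5f));
    return lambda * objectError / (distToVertex - sphereRadius);
}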

Now that you know a little more about the algorithms involved, let me explain how popping may or may not show up in a given SOAR simulation. It's all determined by that error threshold value. Think of the threshold as the number of pixels a vertex is allowed to stray from its actual position on screen before it pops to that position. This means that if you set the error threshold to 1.0 or less, a pop happens within the space of a single pixel, so it's virtually undetectable. Even at a threshold of 2.0, you can only see pops if you're looking for them. At 5.0 and above, the pops become pretty noticeable, but you're rendering a small fraction of the triangles you had to render at 1.0 or 2.0. You're also paging the memory-mapped map file in and out of memory less, putting less load on the CPU, using less GPU RAM, and generally speeding everything up. So you can change this value on the fly to help keep the FPS constant, freeing time and memory for other processes, like rendering those 200 people that just showed up for the guild raid :) .
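
A toy sketch of that on-the-fly adjustment, assuming a target frame time (the numbers are just illustrative):

#include <algorithm>

// Raise the threshold when frames run long, lower it when there's headroom.
void adaptThreshold(float& threshold, float frameMs, float targetMs)
{
    threshold *= frameMs / targetMs;                        // proportional nudge
    threshold = std::max(1.0f, std::min(threshold, 8.0f));  // ~invisible .. coarse
}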

I've looked at the new multi-pass material features, and I'm not sure they will do quite what I want. Basically, I want to allow one texture per vertex, and blend between textures where two adjacent vertices differ. I want to allow for up to 256 textures, of which maybe 10 or 20 will be used on 98% of the vertices (things like grass, sand, snow, rock, etc.); the other 240 or so will be used only sparingly (things like gravel path, rock path, asphalt road, etc.). And I want to stick a bump map on top of that for vertices within a certain distance. I don't want to do 256 passes when a vertex will at most be blending between 9 different textures. The first idea I can come up with is to split the renderables up so that each renderable only has a single texture... the problem with that is building the triangle strip over an arbitrary surface. I know that's not a very difficult problem, but SOAR has a method for building the triangle strip while traversing the DAG, so any extra processing is best avoided.

The other idea I had was to put all the textures into one big texture and use UVs to pick the right ones. But I'm not sure how I'd blend the edges together with this technique; all the edge areas would have strange-looking textures after the first pass... I'd probably still end up traversing the terrain and doing one renderable per texture, but only on the edges.

I guess the best solution is to use a fragment program and do the blending myself, but I'm pretty sure my GeForce 4 Ti 4200 won't let me do that... and since it wouldn't work on older hardware either, I'd need a fallback method anyway.

If anyone else has any ideas on how to get this portion working, let me know.

sinbad
OGRE Retired Team Member
Posts: 19261
Joined: Sun Oct 06, 2002 11:19 pm
Location: Guernsey, Channel Islands

Post by sinbad » Thu Jan 01, 2004 7:13 pm

Does it really rebuild the entire mesh every frame? That sounds rather wasteful and not at all GPU friendly (GPUs like static geometry). Does it actually spread the work over a number of frames, or is it genuinely every frame?

As for the texturing, you will have to break up your renderables into groups based on the texture they need, and multipass render with different textures (and yes, you'll need to use different materials for this rather than multiple passes within one material). You can optimise this by doing a base pass with the most common texture in the landscape (this could be predetermined, or chosen on a per-region or even per-frame basis), then doing subsequent passes with subsets of geometry using an alpha blend and the other texture (this can be done with a 2-texture-unit card). Full details are in Charles Bloom's paper: http://www.cbloom.com/3d/techdocs/splatting.txt
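
Schematically, the pass structure looks like this (pseudo-engine calls purely for illustration, not the OGRE API; Bloom's paper has the real blend setup):

// Base pass: everything with the region's most common texture, then one
// alpha-blended pass per extra texture over just the geometry that needs it.
void renderRegion(Region& region, Renderer& r)
{
    r.setTexture(region.baseTexture);
    r.draw(region.allTriangles);

    r.setBlend(SRC_ALPHA, ONE_MINUS_SRC_ALPHA);
    r.setDepthFunc(EQUAL);  // reuse the depth laid down by the base pass
    for (size_t i = 0; i < region.groups.size(); ++i)  // per-texture subsets
    {
        r.setTextures(region.groups[i].coverageAlpha, region.groups[i].texture);
        r.draw(region.groups[i].triangles);
    }
}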

David
Gnoblar
Posts: 8
Joined: Wed Oct 01, 2003 8:38 pm

Post by David » Sat Jan 03, 2004 11:23 am

Sorry this reply took so long.... I needed to do some extra research :)

There are two approaches to rebuilding the mesh. The first, and best looking, is rebuilding the mesh every frame. The second is rebuilding the mesh in a separate thread and swapping it in every time the camera moves n units from the position the last mesh was built for; popping artifacts increase greatly with that approach.

ROAM is the other algorithm; it uses frame-to-frame coherence to take advantage of GPU features. I still have a good bit of research to do before I understand it as well as I do SOAR, but here is my current understanding. It uses two priority queues to determine which triangles need to be split and merged. This means that every frame, every triangle must be visited to update its priority (based on the screen space error). So you have a tree of all the triangles in memory which is traversed twice: once to update the priorities, and a second time (after merges and splits) to send the leaf triangles to the GPU (as VBO indices). You can probably imagine that this tree takes a good bit of system memory.

The point is, you can't keep full-resolution heightmap data on the GPU, even if you only keep visible data. For instance, my application needs a view distance of 2 km. To allow for turning around without massive paging of data, we would need to keep a 4 km x 4 km area in graphics memory; at 1 m resolution, that's 16M vertices at 12 bytes each, or 192 MB of graphics memory. That won't work. So you have to keep just the data you need in graphics memory, and it's going to take a lot of system memory (and time) to figure out exactly what that is. SOAR assumes that that time is so large that you might as well just send a new VBO every frame and forget all the complexities of maintaining state.

I'm thinking a hybrid approach might actually be in order. I have 2 ideas. The first: make the memory-mapped file that SOAR relies on a read-write mapping, and store the VBO indices in the file (let the OS page it out to disk and back if memory requirements demand; usually what's on the screen will be in memory unless you're moving fast). This would slow down paging the data in, but would allow lots of cool stuff, including dynamic terrain modification and VBO usage. I just don't know how big a hit copying pages back out to disk before paging in new data will be.

The other way to store the extra information would be a separate in-memory data structure that holds the VBO indices and dynamic height changes (and probably a few more bits and pieces). This isn't as good an option as it seems: not only does it make the view distance dependent on the RAM you have to spare (most terrain algorithms do), it also makes dynamic height modification harder, since you have to recalculate the error values for all the modified vertices (a quick recursive function, but bad enough that you don't want to do it more than you have to, especially over big areas). So perhaps we should write the dynamic height changes to disk when they happen, but keep the VBO indices in memory. This might work, but it means closing the file mapping, opening the file for writing, writing the changes, closing it, and re-opening it as a read-only memory-mapped file. Or perhaps we make the file mapping read-write and write dynamic terrain adjustments out that way. Are the OSes smart enough not to write a page back to disk if it hasn't changed?
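
On that last question: with a shared read-write mapping, the OS tracks dirty pages itself and only writes back pages that were actually modified, so an unchanged page costs nothing on eviction. A POSIX sketch of the read-write mapping (Windows would use CreateFileMapping / MapViewOfFile instead):

#include <sys/mman.h>
#include <fcntl.h>
#include <unistd.h>

void* mapTerrainFile(const char* path, size_t bytes)
{
    int fd = open(path, O_RDWR);
    if (fd < 0) return 0;
    // MAP_SHARED: writes land in the page cache and reach the file; clean pages
    // are simply dropped when memory is needed, dirty ones are written back.
    void* base = mmap(0, bytes, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    close(fd);  // the mapping holds its own reference to the file
    return base == MAP_FAILED ? 0 : base;
}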

Either way, I'd set up a configurable vertex budget, which would determine the size of the VBO. Each frame I'd do a quick depth-first search for active vertices whose projected error has dropped below the threshold (and which therefore shouldn't be active any more). These vertices would be made inactive by setting their VBO index to a special non-existent value (probably 0xffffffff) and putting their old VBO index in a free queue. In the same search, I'd recurse down to the first inactive vertices, and any that need to be activated go into an activate queue. After merging all the newly inactive vertices, I'd start working through the activate queue: activate each vertex by taking a VBO index from the free queue, updating that VBO entry with the new vertex's position, setting the VBO index pointer in the file (or memory data structure), and appending any children that need activating to the end of the activate queue. Continue until no more children need to be split, the free queue is empty, or (possibly) the allotted time for terrain processing has passed.
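
A sketch of that bookkeeping (my names, with 0xffffffff as the "not resident" marker described above):

#include <queue>

const unsigned NO_SLOT = 0xffffffff;  // vertex not currently in the VBO

struct TerrainVertex { unsigned vboIndex; /* height, error, ... */ };

struct VertexBudget
{
    std::queue<unsigned> freeSlots;         // filled up to the budget at startup
    std::queue<TerrainVertex*> toActivate;  // children queued during the search

    void deactivate(TerrainVertex& v)  // vertex merged away this frame
    {
        freeSlots.push(v.vboIndex);
        v.vboIndex = NO_SLOT;
    }

    bool activate(TerrainVertex& v)  // returns false once the budget is spent
    {
        if (freeSlots.empty()) return false;
        v.vboIndex = freeSlots.front();
        freeSlots.pop();
        // ...the caller then writes v's position into the VBO at slot v.vboIndex
        // and appends any children that also need activating to toActivate.
        return true;
    }
};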

So far, I'm thinking that read-write file mappings used only for the dynamic heights and such, plus an extra in-memory data structure to hold the VBO indices, is going to be the best option, assuming the OSes can tell when you change a memory-mapped file and don't page out data that hasn't been modified.

sinbad
OGRE Retired Team Member
Posts: 19261
Joined: Sun Oct 06, 2002 11:19 pm
Location: Guernsey, Channel Islands

Post by sinbad » Sat Jan 03, 2004 1:37 pm

I've never been a big fan of ROAM; it's just too CPU intensive. Having done some research of my own, I think CPU-based tessellation algorithms, including SOAR, have only a limited shelf life. I think most new terrain engines will be using displacement mapping.

Full, filtered displacement mapping isn't available in the current crop of cards (hopefully it will be in 2004), but a very good alternative is available in vs_1_1 and above. Basically you use a single set of LOD'ed meshes, all flat (so you only need 2D coords), which you share among all the terrain patches; then you only need an array of displacements per patch (that's 1 float per vertex). You actually pass 2 sets of each so you can lerp between both LODs, but you can pack this information into float4s, and texture information can be generated in the vertex shader. Then, per vertex, you lerp between the 2 LODs based on the depth of that vertex. This gives per-vertex, real-time continuous level of detail with none of the CPU overhead of ROAM or SOAR, and requires absolutely no changes to the vertex buffers. What's more, because the lerp is based on the depth of each vertex, you never get the T-junction problems you get with the chunk-based LOD algorithms that are often the alternative to per-triangle algorithms, because vertices on a boundary are lerped at the same rate (they are at the same depth).
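
The per-vertex lerp is simple enough to sketch on the CPU (in the real thing it lives in the vertex program; the names and the morph window are mine):

// Each vertex carries its height in the finer LOD (yFine) and the coarser LOD
// (yCoarse). The blend factor depends only on the vertex's depth, so two patches
// sharing an edge compute identical factors for the shared vertices and no
// cracks or T-junctions can appear.
float morphedHeight(float yFine, float yCoarse, float depth,
                    float morphStart, float morphEnd)
{
    float t = (depth - morphStart) / (morphEnd - morphStart);
    t = t < 0.0f ? 0.0f : (t > 1.0f ? 1.0f : t);  // clamp to [0, 1]
    return yFine + (yCoarse - yFine) * t;         // fine up close, coarse far off
}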

I'm really tempted to have a go at one of these myself, they look great. Take a look at http://users.belgacom.net/xvox/ as an example.

David
Gnoblar
Posts: 8
Joined: Wed Oct 01, 2003 8:38 pm

Post by David » Sun Jan 04, 2004 12:02 am

Man, that one page is about all I can find on the subject. If I understand it correctly, you need 4 bytes per vertex for each LOD stored on the card. So the 1k x 1k patch that the demo you linked repeats over and over uses somewhere between 6 and 8 megs of graphics memory, depending on how many LODs it keeps on the card: 1M vertices, times 4 bytes per vertex, times 1.5 - 2.0 for the lower LODs (the upper bound of 2.0 if you go all the way down to 1 vertex for the whole 1k x 1k patch, the lower bound of 1.5 if you only use one extra half-sized LOD).

Now let's scale that up to the view range I'd like to support. With a view range of 2 km and a horizontal resolution of 1 m, we need a bare minimum of a 4k x 4k patch in memory (actually a bit more: the view frustum corners stick out farther than that when you're not looking straight down the x or z axis). So let's store a 5k x 5k patch. That's 25M vertices, times 4 bytes, times 1.5 - 2.0, which is 150 - 200 megs(!) of graphics memory. And I'd really like to get 0.5 m horizontal resolution...

So storing all the mips for all patches won't work... maybe storing only the mip levels you need for the current frame would, but that means paging data in and out of graphics memory every few frames (depending on camera speed), which would probably be a huge hit when flying over the terrain. Besides, moving stuff in and out of graphics memory is what we were trying to avoid in the first place.

So I think that repeating the same 1k x 1k block over and over again trivializes the biggest problem with that particular algorithm: its massive graphics memory requirements. Unless I'm missing something, real-world data sets just wouldn't fit on the card without serious modification of the algorithm. As always, feel free to enlighten me if I'm way off the mark (or if I just suck at math somewhere :) ).

Back to the SOAR/ROAM hybrid idea, I have a quick question: if you're only going to change a small portion (say 5%) of the vertices in a vertex buffer, but they're scattered all around the buffer, is it better to use a shadow buffer and HBL_DISCARD, or to use HBL_NO_OVERWRITE and update the vertices directly on the card? This would happen first thing each frame, before any other render operations, so HBL_NO_OVERWRITE would still give me access to the whole buffer. If HBL_DISCARD and shadow buffers are the preferred method, there's little point in worrying about frame-to-frame coherence in the first place... but I'm assuming HBL_NO_OVERWRITE and spot updates will be faster.
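
For reference, the spot-update option would look something like this (the lock flags are real OGRE API; the surrounding code is just a sketch):

#include <cstring>

// Overwrite a single vertex in place without disturbing anything the GPU may
// still be reading; the NO_OVERWRITE contract is that we promise not to touch
// data already submitted this frame.
void spotUpdate(Ogre::HardwareVertexBufferSharedPtr vbuf, size_t vertexSize,
                size_t index, const void* newData)
{
    void* dst = vbuf->lock(index * vertexSize, vertexSize,
                           Ogre::HardwareBuffer::HBL_NO_OVERWRITE);
    std::memcpy(dst, newData, vertexSize);
    vbuf->unlock();
}

In practice you'd probably batch the changed vertices into contiguous runs and lock each run once, rather than locking per vertex.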

Let me also say that I agree completely that ROAM is too CPU intensive. But I do think giving the CPU a decent amount of work will let me support a larger range of video cards... especially since my target genre is MMORPGs, where most of the CPU work is done on the server anyway, and where there's less of a drive to have the latest and greatest hardware than there is among FPS players.

Thanks for the conversation; I definitely want to have a good survey of what's out there before I settle down and write up an implementation.

David

sinbad
OGRE Retired Team Member
Posts: 19261
Joined: Sun Oct 06, 2002 11:19 pm
Location: Guernsey, Channel Islands

Post by sinbad » Sun Jan 04, 2004 2:57 am

You're right, the displacement map approach does have high memory requirements, but I also think there's an opportunity to improve it. The approach on that web site generates more data than a standard mipmap chain, because he only reduces the area resolution by a factor of 2, rather than 4, at every mip level (your calculation actually assumes a factor of 4, so the memory required to do it his way is even worse!). However, it's interesting to see that you can get very good frame rates on fairly recent cards with a very high detail level, which means most of the LOD levels he's using are wasted because they're all compressed towards the far plane. I also thought his tessellation was unnecessarily high; I would have stretched the mesh data over a larger area, which would reduce the number of patches required. It depends on the level of detail you think you need, though.

For my purposes, i.e. to give a level of detail similar to the current terrain renderers, which means the size of the patches (and screen-space tris) would be about double what they are in that demo, I think you could get good visual results by reducing the detail only twice, i.e. 1024 -> 512 -> 256, with a 3x3 rolling set of patches. According to my calculations, that's 5.5 MB per patch, or just under 50 MB for a 9-patch set, which I can live with given I was targeting recent cards (it might just squeeze into 64 MB with textures). I would set up the paging of this data in a separate thread if I wanted a larger area than that, but I think that already equates to a pretty large area; if you want 1 m tessellation it probably won't do for you, though. I'm thinking more like 3 m tessellation, so that's enough data for 9 km.

As for the HBL_NO_OVERWRITE vs HBL_DISCARD, this is a tough one - I think the only way to know is to benchmark it.

Interesting!

Antiarc
Greenskin
Posts: 120
Joined: Thu Jan 23, 2003 8:40 am

Post by Antiarc » Sun Jan 04, 2004 4:58 am

Oh, yay! I've been wanting to look into working on SOAR; it fits my needs much more closely than PLS as well. I wish you luck in this, and I hope to be able to contribute.

Once I have my bezier terrain generation stuff done, I'll adapt it to your engine. It should translate easily enough, and should allow for the expression of massive terrain in little space - ideal for SOAR. I've done some calculations and come up with numbers indicating I can store 500 x 500 miles in about 300 MB, uncompressed. (By comparison, SWG planets are about 15 km x 15 km.) Those numbers will be adjusted in practice, obviously, but I think bezier data + SOAR is definitely the way to go for massive terrains.

I wish you luck in this, and will be following it closely. Hopefully, I can help you out, as well. This is something I've been meaning to do myself for some time, but haven't gotten around to it. Do keep us updated!

David
Gnoblar
Posts: 8
Joined: Wed Oct 01, 2003 8:38 pm

Post by David » Mon Jan 05, 2004 4:58 pm

Antiarc: I'm not sure how you plan to get Bezier data to work with SOAR, given that SOAR needs to precalculate error and bounding sphere values for every vertex. Those calculations might fit into run time if they run in a separate thread and only cover a small area; they'll have to if we want dynamic height modification. But if you want to load an entire patch of Bezier data and recalculate all of its error values on the fly, that's going to really overload the CPU.

By contrast, if I can get this displacement map approach to work, Bezier data would be wonderful. The vertex array could be converted from Bezier data into a heightmap in another thread (since that conversion is a lot faster than the deeply recursive error calculations), and then shot to the card when it's needed (and there's room). If this strategy works, we'll have a good balance between CPU and GPU usage, with a fraction of the memory/disk requirements of the more complex algorithms I've been talking about in this thread. Not to mention dynamic height adjustment with little or no overhead: you just need an "errata" quad-tree that can be traversed to find the changed heights after the heightmap has been formed, and you reload a patch whenever a visible patch changes dynamically... Hmm, better yet: initialize the heightmap to some special invalid value, write in the heights that have changed, and then you only have to calculate heights from Bezier data for the vertices still marked invalid.
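
That last idea in a sketch (ErrataTree and BezierPatch are hypothetical stand-ins for whatever the real classes end up being):

#include <vector>

const unsigned short INVALID_H = 0xffff;  // sentinel: height not yet filled in

struct ErrataTree
{
    // writes only the cells that have changed dynamically
    void writeChangedHeights(int pageX, int pageZ, int size,
                             std::vector<unsigned short>& out) const;
};
struct BezierPatch { unsigned short sampleHeight(int x, int z) const; };

void rebuildPage(std::vector<unsigned short>& heights,  // size*size, row major
                 const ErrataTree& errata, const BezierPatch& patch,
                 int pageX, int pageZ, int size)
{
    heights.assign(size * size, INVALID_H);

    // 1. The errata quad-tree overwrites just the changed cells.
    errata.writeChangedHeights(pageX, pageZ, size, heights);

    // 2. Whatever is still invalid comes straight from the Bezier data.
    for (int z = 0; z < size; ++z)
        for (int x = 0; x < size; ++x)
            if (heights[z * size + x] == INVALID_H)
                heights[z * size + x] = patch.sampleHeight(pageX + x, pageZ + z);
}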

Antiarc
Greenskin
Posts: 120
Joined: Thu Jan 23, 2003 8:40 am

Post by Antiarc » Mon Jan 05, 2004 5:28 pm

That's pretty much how I have my bezier data working in the PLS at the moment: it generates a heightmap on the fly, and then just acts like the regular image loader. You wouldn't want to recalculate the errors and bounding spheres on every run of the program, of course, but you could use bezier data to distribute terrain, then calculate and cache the results on the client machine. I'm looking at a distributed system with massive terrain, and have been developing the bezier terrain expression format as a means of distributing terrain quickly and easily. A client could receive an area from the server in a very small bezier format, expand it into a heightmap, perform the necessary calculations, and then cache the result. I'm working on the premise that terrain can change permanently in the system, so I'll need a way to distribute changes. Optimizations would have to be made for decent speed with a SOAR-based engine, but I think it could work. Rather than having to distribute the massive amounts of precalculated data SOAR needs, you just distribute the bezier data, and the engine expands and caches it.

I'm fairly unfamiliar with the concept of a displacement map, so I'll leave that one alone.

I'm still reading up on the various fast terrain methods, but I've been most impressed with SOAR so far, really. I'd love to see what happens with this.

David
Gnoblar
Posts: 8
Joined: Wed Oct 01, 2003 8:38 pm

Post by David » Mon Jan 05, 2004 8:00 pm

So you're basically thinking of using Bezier data to send new landscape data across the network, then rebuilding the SOAR files at patch time. That approach might work. The problem is, it sounds like you want truly massive terrain, on the order of 500 km on a side. If you go for 1000 vertices per kilometer, you're talking about 250 billion vertices for the entire landscape; at SOAR's bare minimum of 4 bytes per vertex, the file would be 1 terabyte. Even at 2 meter horizontal resolution, that's 62.5 billion vertices, creating a 250 GB file. I'm really starting to think that, even with file size optimizations, SOAR and massive terrains don't go together. You can store a square (16k * horizontal resolution) units on a side per GB of terrain file; that's a square of roughly 16-32 km per side per gigabyte.

To me, it's looking more and more like the only way to get truly massive terrains is to use a highly compressed format like your Bezier data format, and create pages dynamically as they're needed. Last time I talked to Spoke, he was thinking about moving the VBOs up from the Tile class to the PageData class, making one big VBO per page. Think of the displacement map approach as doing exactly that, except moving most of the LOD operations into vertex programs rather than doing splits and merges on the CPU.

David
Gnoblar
Posts: 8
Joined: Wed Oct 01, 2003 8:38 pm

Post by David » Tue Jan 06, 2004 6:15 pm

Sinbad: I've now had a chance to digest the source code of the displacement mapping demo you linked, and I think I've come up with a way to do even better...

Store the X and Z base positions in separate VBOs, with only one VBO, at the most detailed LOD. (It has never made sense to me why the lowest LOD number means the highest detail.)

Store the Y coordinates in VBOs, one per page, holding only the full-detail heightmap.

Make the size of a page 129x129.

Now 16-bit indices will be enough (plus, the smaller page size means loading can be spread over more frames... less stutter).

You have 3 different vertex buffers, and one set of index buffers can index them all: put each LOD in a different index buffer, and reuse those index buffers for all the pages (the X and Z buffers as well as each Y page).

Now you only have to store the highest level of detail for each heightmap, and use a different RenderOperation for each page to associate the same index buffers with different heightmaps.

If you store the heightmap ordered by the LOD each vertex is introduced in (i.e. the vertices that appear in the coarsest LOD first, then the vertices added by the next LOD, and so on down to the finest LOD at full resolution), then for any page you know will stay beyond the range of the finest LOD you can load just the first half of its vertex buffer into graphics memory, and reload the page at full detail when you get close. You wouldn't want to reload pages every time the camera moved from one page to the next, so you'd only reload pages when they enter or leave the "inner circle" of pages around the camera's page (sized by how far away the first LOD increase is), but even that halves the memory requirements for every page outside the inner circle. Of course, you'd keep a few more full-detail pages than strictly necessary, so there's room to preload pages before the extra data is needed.
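
In other words the file layout doubles as the paging policy. A sketch of the bookkeeping (hypothetical numbers for a 129x129 page with 14 LODs, 0 being the finest):

#include <cstddef>

// Vertices are stored coarsest-LOD-first, so a page only ever uploads a prefix
// of its buffer: everything introduced at or before the finest LOD it will be
// rendered with.
struct PageLayout
{
    // vertsDownToLod[k] = vertices introduced at LODs 13 (coarsest) through k,
    // recorded when the page file is built.
    size_t vertsDownToLod[14];

    size_t bytesToUpload(int finestLodNeeded) const
    {
        return vertsDownToLod[finestLodNeeded] * 2 * sizeof(float);  // y1 + y2
    }
};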

Let me know what you think about this approach.

The only piece of the puzzle I still need is figuring out how to set up multiple VBOs for one rendering pass; it seems a RenderOperation only supports one vertex buffer and one index buffer. Then I need to figure out how to set up the Cg program to receive these vertex streams, and I'll be good to start working on an implementation. Any info on these would be appreciated.

sinbad
OGRE Retired Team Member
Posts: 19261
Joined: Sun Oct 06, 2002 11:19 pm
Location: Guernsey, Channel Islands

Post by sinbad » Wed Jan 07, 2004 12:18 am

David wrote: The only piece of the puzzle I still need is figuring out how to set up multiple VBOs for one rendering pass; it seems a RenderOperation only supports one vertex buffer and one index buffer.
No no, vertexData has a vertexBufferBinding member, and that is a list of bindings. You can pull vertex components from any number of buffers, so long as the components at index 'i' in every buffer all refer to the same vertex. However, if you meant 'stitching' VBOs together, i.e. pulling more vertices from other buffers (rather than other components of the same vertices), no, that can't be done, because of the way GPUs work: they stream data in one vertex at a time, from potentially many streams, but each component must come from a single contiguous stream.
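
Along the lines of the following (these are the real calls as far as I remember them, but double-check against your OGRE version):

using namespace Ogre;

// Two streams: the x/z buffer shared by every page, plus this page's heights.
void bindTerrainStreams(VertexData* vdata,
                        HardwareVertexBufferSharedPtr sharedXZ,
                        HardwareVertexBufferSharedPtr pageHeights)
{
    VertexDeclaration* decl = vdata->vertexDeclaration;
    decl->addElement(0, 0, VET_FLOAT2, VES_POSITION);               // source 0: x, z
    decl->addElement(1, 0, VET_FLOAT1, VES_TEXTURE_COORDINATES, 0); // source 1: height

    VertexBufferBinding* bind = vdata->vertexBufferBinding;
    bind->setBinding(0, sharedXZ);
    bind->setBinding(1, pageHeights);
}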

To be honest I haven't been able to completely wrap my head around what you're suggesting here, probably because I'm suffering from a lack of sleep :? Since you're using the same index data for all pages, does this mean your lower LOD pages are physically larger? Or did you mean use the index buffer space, rather than the index buffer values?

David
Gnoblar
Posts: 8
Joined: Wed Oct 01, 2003 8:38 pm

Post by David » Wed Jan 07, 2004 3:22 am

Ok, I get it now.... I think I can make my idea work with that.

The way I understand it, here are the streamed values you need in the vertex program:

x2, y2, and z2 of the vertex.
x1, y1, and z1 where the vertex morphs from.
LOD level that the vertex is introduced in.

Now, to me, it looks like he stores x1/x2, z1/z2, and the LOD in one buffer that is re-used for all the pages, and stores y1 and y2 in another buffer per page. But he also stores every LOD above 0. Really, each vertex is introduced into the mesh at exactly one level of detail, so you only need to store each vertex once; then you can use a subset of the buffer by only indexing the portions you need. So you have one 129x129 buffer for x1/2, z1/2, and LOD, another 129x129 buffer per page for y1/2, and 14 index buffers of varying lengths (one for each possible LOD).

Then you just make sure you pick an index buffer whose level of detail is the same as or one lower than the vertices in a given patch. Vertices nearing the LOD line will be moving towards their new positions, vertices closer will already be there (clamped at 1.0), and vertices past it will sit in the positions that make them degenerate triangles (clamped at 0.0). So you follow his lead and traverse the quad tree until there is only one LOD of difference between the minimum and maximum LOD, and shoot that patch out to the card. And that can be optimized by storing the indexed triangle fans that make up each square of the mesh in Morton order; then it's just a quick calculation to figure out the starting index and the number of indices. Something like:

// odd levels use fans of 4 triangles (6 indices), even levels fans of 2 (4 indices)
indicesPerFan = (level % 2 ? 6 : 4);
// x, z = the quad's corner on the finest-level grid
start = mortonInterleave(x, z) * indicesPerFan;
length = (1 << (2 * (7 - level))) * indicesPerFan;   // 4^(7 - level) fans

Well, something close to that anyway.
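
Fleshed out a bit (my code, not the demo's):

// Interleave the bits of x and z to get a fan's position in Morton order.
unsigned mortonInterleave(unsigned x, unsigned z)
{
    unsigned result = 0;
    for (int bit = 0; bit < 16; ++bit)
        result |= ((x >> bit) & 1u) << (2 * bit)
                | ((z >> bit) & 1u) << (2 * bit + 1);
    return result;
}

// Index buffer range for the quad with corner (x, z), measured on the
// finest-level grid, at the given quadtree level of a 129x129 page.
void fanRange(int level, unsigned x, unsigned z, unsigned& start, unsigned& length)
{
    unsigned indicesPerFan = (level % 2) ? 6 : 4;        // 4-tri vs. 2-tri fans
    start  = mortonInterleave(x, z) * indicesPerFan;
    length = (1u << (2 * (7 - level))) * indicesPerFan;  // 4^(7 - level) fans
}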

And since I'm using the same index data for all the pages: no, lower LOD pages aren't physically larger (I couldn't think of a way to make them mesh up right in space); they're just smaller in memory. I.e. pages which are missing LOD 0 have half the memory footprint of pages that go all the way down to LOD 0.

Hope that clears things up a bit. (It fleshed some things out in my head anyway :) ).

I'll start playing around with some code and see if I can make it work... wish me luck. Now that I see that VertexBufferBinding is a list of bindings, I think I have everything I need to get started on this.

jason86
Silver Sponsor
Posts: 22
Joined: Sat Dec 17, 2005 6:40 am
Location: Austin, Tx

Post by jason86 » Sun Dec 25, 2005 4:26 am

sinbad wrote: No no, vertexData has a vertexBufferBinding member, and that is a list of bindings. You can pull vertex components from any number of buffers, so long as the components at index 'i' in every buffer all refer to the same vertex.
I dug this up while searching for an answer to a problem I'm working on. This is what I'm trying to accomplish, but what I don't understand is how it works (combining components from multiple buffers per vertex, that is). I can't seem to find info on using multiple input buffers for one vertexData.

Thanks sinbad.

jason86
Silver Sponsor
Posts: 22
Joined: Sat Dec 17, 2005 6:40 am
Location: Austin, Tx

Post by jason86 » Sun Dec 25, 2005 12:48 pm

I figured it out after playing with it all day, and learned a lot about shaders too, hehe. Man, every time I sit down to play with Ogre I get more addicted!

Thanks for the great API, guys.
