[Offtopic] Some thoughts on a networking model.

Antiarc
Greenskin
Posts: 120
Joined: Thu Jan 23, 2003 8:40 am

[Offtopic] Some thoughts on a networking model.

Post by Antiarc »

I'm working on a networking solution I've tentatively termed a "quadtree network". The basic idea is to produce a scalable network for a series of nodes that have geographical or pseudo-geographical locations by concentrating the system's available bandwidth where it is needed - a self-adapting network, if you will. The root node is a trusted content distribution node; recursive quadrants are formed beneath it, and content propagates down through them. As state changes occur within the network, they propagate up through the tree and then spill down into other branches as resources become available. Since nodes will be changing pseudo-geographical location frequently (and thus changing their location in the quadtree frequently), it is important that state changes propagate up through the tree quickly, so that nodes can learn about those changes as they come into proximity with their effects.
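To make the subdivision concrete, here's a rough Python sketch of the recursive-quadrant idea (illustrative only - the names, the split threshold, and the node.pos attribute are placeholders, not real project code):

Code:

class Region:
    MAX_NODES = 8  # capacity before a region subdivides into quadrants

    def __init__(self, x, y, size):
        self.x, self.y, self.size = x, y, size  # lower-left corner + edge length
        self.nodes = []                         # nodes whose position falls here
        self.children = None                    # the four sub-quadrants, once split

    def insert(self, node):
        # node.pos is an (x, y) pseudo-geographic position (assumed attribute)
        if self.children is not None:
            return self._child_for(node.pos).insert(node)
        self.nodes.append(node)
        if len(self.nodes) > Region.MAX_NODES:
            self._split()
        return self

    def _split(self):
        # Densely populated regions tessellate further, as described above.
        half = self.size / 2
        self.children = [Region(self.x + dx * half, self.y + dy * half, half)
                         for dy in (0, 1) for dx in (0, 1)]
        for n in self.nodes:
            self._child_for(n.pos).insert(n)
        self.nodes = []

    def _child_for(self, pos):
        half = self.size / 2
        index = (1 if pos[0] >= self.x + half else 0) \
              + (2 if pos[1] >= self.y + half else 0)
        return self.children[index]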

I'm designing for a network size potentially in the tens of thousands of nodes. The entire point of the structure is to take as much load as possible off the root node and distribute it across the other nodes in the network.

My inspiration is two-fold: first, quadtrees as used for terrain tessellation and rendering, as a model for node distribution (that is, more heavily populated areas of the network are tessellated further and further, forming small sub-pools of nodes); and second, the BitTorrent/swarming P2P model of exploiting unused resources within the network to decrease the load on individual points, at the cost of increased total bandwidth usage. I plan to use the swarming aspect of the network for authentication of state - that is, if more nodes than not agree on a state change, it becomes authoritative. Rather than taking the "trusted/not trusted" model, I want to see if a "trusted by consensus" model can work. This increases the total resource usage per state change, but drastically decreases the resources needed per node, down to tiny, self-contained network levels. If my design is right, I will essentially be forming a hierarchy of four-to-twelve-node networks (the extra nodes for redundancy/failover), which can hopefully operate as self-contained entities with very low latencies.

I've put together a partial visualization of a small network employing this model:

[image: visualization of a small quadtree network]

Each node serves a maximum of eight child nodes. Once those eight slots are filled, subsequent connection requests are passed down to the children, according to which quadrant of the parent's authority radius each child serves.
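Connection handling would then look something like this (a sketch; child_for_quadrant, center, and pos are made-up helpers standing in for the real bookkeeping):

Code:

MAX_CHILDREN = 8

def handle_connect(parent, newcomer):
    # Accept directly while slots remain; otherwise delegate to the child
    # serving the newcomer's quadrant of the parent's authority radius.
    if len(parent.children) < MAX_CHILDREN:
        parent.children.append(newcomer)
        newcomer.parent = parent
        return parent
    delegate = parent.child_for_quadrant(quadrant_of(parent, newcomer.pos))
    return handle_connect(delegate, newcomer)

def quadrant_of(parent, pos):
    # 0..3: which quarter of the parent's authority square the position is in
    cx, cy = parent.center
    return (1 if pos[0] >= cx else 0) + (2 if pos[1] >= cy else 0)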

Low-bandwidth nodes will migrate down the tree towards leaf positions. This means they have a higher latency to the root (the number of hops increases), but a low personal resource responsibility, as they aren't serving anyone. Higher-bandwidth nodes will move up the tree towards the root, as they can handle larger child trees.

Each node has two parents (for redundancy of connections) and a maximum of seven siblings. Siblings authenticate state change requests for each other, with conflicts being pushed up the tree for resolution. A conflict that keeps moving up the tree will eventually reach the root, which is authoritative.
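A toy version of the "trusted by consensus" check with escalation (sketch only; sib.check stands in for whatever validation the siblings actually run):

Code:

def validate(change, node):
    # Siblings vote; a strict majority makes the change authoritative here.
    votes = [sib.check(change) for sib in node.siblings]  # check() is hypothetical
    if sum(votes) > len(votes) / 2:
        return "accepted"              # more nodes than not agree
    if node.parent is None:
        return "rejected"              # the root's word is final
    return validate(change, node.parent)  # push the conflict up the tree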

Each node stores its ancestor chain, so that in the event of multiple disconnects (i.e. both parents and grandparents disconnecting simultaneously), it can query up the chain until it finds a live node that can adopt it.
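In sketch form (is_alive and adopt are hypothetical calls):

Code:

def reconnect(node):
    # ancestor_chain is ordered nearest-first (parents, grandparents, ...)
    for ancestor in node.ancestor_chain:
        if ancestor.is_alive():       # hypothetical liveness probe
            ancestor.adopt(node)      # hypothetical re-parenting call
            return ancestor
    raise ConnectionError("no live ancestor; rejoin via the root")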

State changes are verified locally and then sent to the parent node. The parent batches state changes and propagates them to its own parent, which stores and propagates them in turn, all the way up to the root. When a node moves to a different parent, the new parent either already knows about the node's state (because it is a sibling of the previous parent - remember, pool-jumping is likely to be proximity-based, since node pools are based on quadtree division of virtual locations) and can propagate it, or it sends a request up the tree until it finds a node that knows about it, and then propagates the state back down to the child.
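Batching could be as simple as this sketch (the flush window is an arbitrary placeholder, and receive_batch is a hypothetical parent call):

Code:

import time

class StateRelay:
    def __init__(self, parent, flush_interval=0.25):
        self.parent = parent
        self.pending = []
        self.flush_interval = flush_interval   # batch window in seconds
        self.last_flush = time.monotonic()

    def submit(self, change):
        # Changes are assumed already verified locally before reaching here.
        self.pending.append(change)
        if time.monotonic() - self.last_flush >= self.flush_interval:
            self.flush()

    def flush(self):
        if self.pending and self.parent is not None:
            self.parent.receive_batch(self.pending)  # parent stores + re-batches
        self.pending = []
        self.last_flush = time.monotonic()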

Content is also distributed in a hierarchical swarming manner. A parent obtains a resource and notifies its children, which set up a mini-torrent to obtain it; they then propagate it to their own children in the same manner. This should make resource publishing transparent and relatively easy on the publisher.
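As a toy model of the mini-torrent round-trip (real chunk negotiation would be far more involved):

Code:

def swarm_download(parent_chunks, children):
    # Each child tracks which chunk indices it holds.
    have = {c: {} for c in children}
    # Round 1: the parent uploads each chunk to just one child,
    # so the parent's upload cost is roughly one copy of the resource.
    for i, chunk in enumerate(parent_chunks):
        have[children[i % len(children)]][i] = chunk
    # Round 2: the children trade amongst themselves (the "mini-torrent").
    for child in children:
        for i, chunk in list(have[child].items()):
            for other in children:
                have[other].setdefault(i, chunk)
    # Everyone reassembles the full resource from its chunk map.
    return {c: [m[i] for i in sorted(m)] for c, m in have.items()}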

Ideas? What pitfalls do you see? I'm aware of several, but I'd like to get some other programmers' views on the idea.

iq
Greenskin
Posts: 125
Joined: Mon Oct 20, 2003 8:25 pm

Post by iq »

I'm curious - what's the purpose of this network? Is it interactive (i.e. an MMOG), 'dynamic' (i.e. a giant chat room or chat city; similar to the previous, but less latency-dependent), or more a bulk transmission scheme with integrity checks by peers (where latency and temporary disconnects are not critical)?

Antiarc
Greenskin
Posts: 120
Joined: Thu Jan 23, 2003 8:40 am

Post by Antiarc »

The intended usage is actually large-scale game networks (MMO, whatever) with high global latency, but low local latency. That is, you should have low latency with the people you're interacting with, though latency with the rest of the world can be extremely high without consequence.

joshcryer
Gnome
Posts: 351
Joined: Wed Oct 13, 2004 8:22 am

Post by joshcryer »

It's a really good concept. :)

It may break down when populations soar; the problem then becomes bandwidth, since every client must be accounted for and a lot of nodes must pass along those client locations, to the point of oversaturation and a lot of nasty latency. What is your answer to that?

This helps lower bandwidth costs (which are getting so cheap these days it's almost a non-issue), but it doesn't exactly solve latency issues imo, because those generally occur when large masses are in a given area.

What happens when 10k people are in a room? With a central server, we just have to send 10k location+direction packets, each client receiving one packet apiece. With nodes, low-end clients send one packet each (for their own location), while high-end clients have to send several times as many (one for their own location, plus seven more for the nodes beneath them).
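Quick back-of-envelope on what the relaying costs (the 50ms per hop is a made-up figure, not a measurement):

Code:

import math

clients = 10_000
fanout = 8                     # max children per node, per the design
levels = math.ceil(math.log(clients, fanout))  # tree depth for 10k nodes

leaf_uploads = 1               # just its own location per tick
interior_uploads = 1 + 7       # its own location + one relay per child served
per_hop_ms = 50                # assumed node-to-node latency

print(levels)                  # -> 5 levels
print(levels * per_hop_ms)     # -> 250 ms worst case for an update to reach the root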

Have you done simulations? What kinds of packet sizes are we talking about here? I'm quite interested in this idea. Perhaps an open source proof of concept is in order. :)

Snugglebear
Gnoblar
Posts: 22
Joined: Wed Oct 01, 2003 7:38 am
Location: California

Post by Snugglebear »

The pseudo-geographic lines could be easily tweaked, redrawn, or even replaced with other partitioning schemes should they prove unworkable in a situation. Having 10,000 people in a single room can be prevented by forcefully splitting that room into smaller segments. It would still come with extreme latency, but it would get the job done without resource consumption spiraling out of control. The system could be tweaked for such special cases anyway.

However, there are a number of things that bother me. There is a root in this system, and Valve's Steam has proven the folly of that topology for large networks, even if the root is a cluster or other logical entity composed of multiple physical machines. Strictly peer-to-peer systems have been made to work (there was some research with action games on the MBONE network in 2001-2002), but even with the low latencies and high bandwidth MBONE provided, peer models just couldn't keep a synced gamestate. And that leads to the big issue: there are realistic limits on bandwidth and processing power that need to be addressed. A friend of mine from undergrad tried a very similar architecture in 1999-2000 and failed, mostly because there weren't enough clock cycles or bits per second to make it happen. Networks like Gnutella or Freenet also have interesting topologies, but even on modern machines they suck up far too many resources under fair amounts of traffic (granted, Freenet is in Java and not optimized).

iq
Greenskin
Posts: 125
Joined: Mon Oct 20, 2003 8:25 pm

Post by iq »

I'm far from a network expert, but well - I have my doubts about this system for a game.

1.) I understand you want to group nodes according to in-game locations, not real-world locations. This can build up enormous lag unless node-to-node speed (latency, not bandwidth) is carefully taken into account - just imagine a parent-child chain hopping back and forth across the globe.

2.) The goal is to reduce server load by offloading validation to insecure sub-nodes. This is IMHO a sure recipe for disaster. In order to reduce central server load, state changes have to become authoritative before the root is reached (i.e. after a finite number of validation checks) - now imagine a cluster of 'bad' clients. By aggregating in a small area, chances are good that the group is able to authenticate its own transactions, at least temporarily. (A node that connects with both of its upstreams to bad nodes effectively has the rules dictated for the whole tree below it.)

3.) Keeping track of state changes - the idea that moving only small distances in the tree should reconnect you with nodes of similar knowledge is valid, but the whole thing falls apart at busy locations: a heavily fragmented world segment makes it easy to traverse many tree nodes rapidly. Each client would have to report its state to each new parent, which would then have to authenticate the information, so a rapidly changing game environment will likely create a lot of authentication traffic.
Imagine a medieval battlefield - you pick up a sword, run past a gaggle of other players (quickly traversing the network nodes), and attack another player with it, but the new sub-network has no knowledge of whether your possession of the sword is valid. What do you do now?
Additionally, travel shortcuts (i.e. teleportation, portals) or node failures can throw you into a completely different universe (as far as game state is concerned), and that's where the large-world lag comes into play - resyncing can now take a prohibitively long time.

4.) A couple of potentially problematic scenarios:
What happens to outstanding authentication/state change requests if intermediate nodes change connectivity or drop out completely?
What happens if intermediate nodes leave the area - i.e. how are children handed over?
Who determines node traversal - i.e. how do you prevent temporary flooding/overload of nodes?

Just thinking about how you'd keep track of the mix of validated and unvalidated state changes for yourself and your sub-nodes, and the possible implications for game mechanics, makes my head hurt ...

And finally, to locally authenticate state changes, nodes have to know a lot of details about other players. E.g. A sells B an item - to confirm that transaction, other nodes have to know whether A really has that item and whether B has enough money to pay. This actually increases network load and opens the door for a number of exploits ...

joshcryer
Gnome
Posts: 351
Joined: Wed Oct 13, 2004 8:22 am

Post by joshcryer »

Yeah, I have my doubts that even now we'd have enough bandwidth at the user level. I recall playing Quake recently with a buddy of mine, and between the two of us he had a latency of 100 ms. Add 8 more people and the whole system comes to a grinding halt. Too bad multicasting isn't on every router out there; it would simplify this greatly. All clients would have a list of hosts and send one multicast packet to them all - problem solved.
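i.e. something like this, if multicast were actually routed everywhere (the group address and payload are just examples):

Code:

import socket

MCAST_GROUP, MCAST_PORT = "224.1.1.1", 5007   # example multicast group

# One UDP send reaches every subscribed client, instead of N unicast sends.
sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM, socket.IPPROTO_UDP)
sock.setsockopt(socket.IPPROTO_IP, socket.IP_MULTICAST_TTL, 2)  # hop limit
sock.sendto(b"player position update", (MCAST_GROUP, MCAST_PORT))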

But if I recall, BitTorrent is actually quite resource-friendly. It shouldn't be too processor-intensive to just send packets; the problem is preserving game state so that hacked clients can't get away with naughty things.

edit: iq, we posted at almost the same time. Regarding your last paragraph, I was thinking that trusted servers could take care of transactions. "A sells B an item" is just one event, one tiny packet, nothing in the grand scheme of things. "A moves from point X to point Y", however, is a lot of little packets, each one detailing A's speed, direction, etc.

This probably breaks for action games, though, where real time fighting is taking place.

iq
Greenskin
Posts: 125
Joined: Mon Oct 20, 2003 8:25 pm

Post by iq »

joshcryer wrote: edit: iq, we posted at almost the same time. Regarding your last paragraph, I was thinking that trusted servers could take care of transactions. "A sells B an item" is just one event, one tiny packet, nothing in the grand scheme of things. "A moves from point X to point Y", however, is a lot of little packets, each one detailing A's speed, direction, etc.
Ahh, I thought the "trusted by consensus" was meant more globally. It is true that such transactions are only a small part of the network traffic. But if movements are validated by peers, lag gets an even more important role (the populated-area problem), and you still have to transmit additional data that can affect movement speed - 'level' of mount, overburdening, etc. - for which the point still applies.

The major problem in all remotely competitive games is that even short-lived manipulations can thoroughly destroy the fun. Even if fake actions can be detected later on, you cannot roll back game state indefinitely - if at all. And how cheats affect the acceptance of online play has already been seen numerous times - nobody will play a game that is no fun.

zarthrag
Greenskin
Posts: 128
Joined: Sat Jul 24, 2004 9:07 am
Location: Tulsa, Oklahoma

Post by zarthrag »

As long as the "local" cluster is small, I don't think there will be a bandwidth problem, provided the application is properly coded.

However, the problem is usually securing such a network properly. Each node needs to be uniquely identifiable and *verifiable*. In my university days, some friends and I attempted to build a watered-down P2P network that could support an MMO (written in C#). We ended up using a central key authority to track each node's id (every game installation needed a unique id).

This id is combined with the player's username/password to generate a public/private key pair used to sign data packets. Systems can then request/verify the public key via the authoritative server. As a further precaution, the key sets are *temporary* and must be renewed by the keyserver periodically. It worked quite well as long as the CA was running fine; issues arise when you try to decentralize the CA. Just thought I'd share that.
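A rough Python equivalent of the flow (our code was C#; the lifetime and names here are illustrative, and it assumes the third-party 'cryptography' package for Ed25519 signatures):

Code:

import time
from cryptography.exceptions import InvalidSignature
from cryptography.hazmat.primitives.asymmetric.ed25519 import Ed25519PrivateKey

KEY_LIFETIME = 3600  # seconds before the keyserver must reissue (arbitrary)

def issue_keys():
    # The keyserver would bind this pair to the node id + player credentials.
    private_key = Ed25519PrivateKey.generate()
    return private_key, private_key.public_key(), time.time() + KEY_LIFETIME

def sign_packet(private_key, payload):
    return private_key.sign(payload)

def verify_packet(public_key, expires_at, payload, signature):
    if time.time() > expires_at:
        return False  # expired key sets are rejected until renewed
    try:
        public_key.verify(signature, payload)  # raises if the signature is bad
        return True
    except InvalidSignature:
        return False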

A problem we did defeat is the "Braveheart" effect (an event in the game that draws many nodes to a small area). We planned to build special "dynamic server" nodes that could convert an entire area from P2P to a client-server architecture. As the effect worsens, more dedicated nodes could be added to the area.

Those same nodes would normally perform "spot checks" on packets to detect cheating. It was quite ornate, as the protocol supported a packet "history": the last xx packets could be examined in P2P areas to detect problems. (Only necessary if all nodes in the area were using the same cheat together.)
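The history was basically a ring buffer; in Python sketch form (the size and the check predicate are illustrative):

Code:

from collections import deque

HISTORY_LEN = 64                      # the "last xx packets"

history = deque(maxlen=HISTORY_LEN)   # old packets fall off automatically

def record(packet):
    history.append(packet)

def spot_check(is_plausible):
    # Re-run a validation predicate over recent traffic on demand,
    # returning whichever packets fail it.
    return [p for p in history if not is_plausible(p)]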

Our design was great, but all users needed broadband for it to truly shine. Additionally, we were experimenting with IPv6, with extensive use of multicasting to broadcast key-related packets and to address densely packed areas.

There were three packet routing types: direct, multicast (nodes in a set area), and broadcast (the entire network).

This all died around the time we started to add too much: IPv4 support, with routing of multicast packets via the nearest IPv6 node; a star topology that determines network placement by bandwidth/latency. It basically turned into a triple-layered protocol that would have taken years to develop and test. Crazy stuff.
