1
By Jonathan Drake

2
- The Gnutella protocol is simply not scalable.
- This is due to the flooding approach it currently utilizes.
- As the number of nodes increases, this approach causes an exponential increase in bandwidth usage.
- A DHT is one concept that has been considered, but it performs poorly on multiple-keyword searches.

3
- A DHT indexes specific keywords, so it allows queries that search for specific needles in the haystack.
- The reality is that most searches in P2P systems use multiple keywords.
- Users tend to look for general results, where multiple files could satisfy their needs.
- A DHT would be great for finding a specific file or a single entry among thousands, but that is not the common case.

4

5
- Random walking is one solution, but unfortunately it takes a lot of hops and does not guarantee that it will find all the results the user wants.
- Random walking selects a random peer node to query, and it may miss results the user wants unless it runs for long periods of time, making it no better than flooding (see the sketch below).
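A minimal sketch of this unbiased walk, assuming hypothetical Node objects with a neighbors list and a matches() lookup (these names are illustrative, not from the protocol):

```python
import random

# Unbiased random-walk querying (hypothetical Node objects; not the actual
# Gnutella implementation). Each hop forwards the query to one neighbor
# chosen at random until the TTL runs out, so rare files are easily missed.
def random_walk(start_node, keywords, ttl=64):
    node, hops = start_node, 0
    while hops < ttl:
        if node.matches(keywords):            # local index lookup
            return node, hops                 # a single hit; other copies are missed
        if not node.neighbors:
            break
        node = random.choice(node.neighbors)  # uniform random choice, may revisit nodes
        hops += 1
    return None, hops                         # gave up after ttl hops
```

Because every hop is chosen uniformly at random, revisits and dead ends are common, which is why the walk needs a large TTL to find anything rare.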

6
- Supernodes work better, but they still take considerable resources and bandwidth because flooding (broadcasting) is still taking place within the supernode mesh.
- This can cause failures among supernodes, and it still does not scale well when you consider that a file may exist only on a regular edge node.

7
- Sure!
- Well, the idea is that random walking costs less than flooding, but it still only chooses a random node to forward the query to.
- Nodes are not identical; some have more resources than others, so why not take advantage of this?
- That is what GIA proposes: when forwarding a query, it should go to the node that is least overloaded and has the most available bandwidth.

8
- Dynamic topology adaptation: choose neighbors that have high capacity, so we pass off queries to nodes that can handle them.
- Active flow control: when a node gets overloaded, it allocates fewer tokens for queries so that it is not overwhelmed.
- One-hop replication: keep an index of the files on all neighbors to help speed up querying.
- Search protocol: direct queries to the neighbor with the highest capacity (the pieces are tied together in the sketch below).
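One way to picture the per-node state these four components need; the class and field names are assumptions made for illustration, not taken from the paper:

```python
from dataclasses import dataclass, field

# Hypothetical per-node state for the four GIA components (names are
# illustrative assumptions, not from the paper).
@dataclass
class GiaNode:
    capacity: float                                    # advertised processing capacity
    neighbors: list = field(default_factory=list)      # managed by topology adaptation
    tokens_for: dict = field(default_factory=dict)     # flow control: tokens we may spend
                                                       # sending queries to each neighbor
    one_hop_index: dict = field(default_factory=dict)  # one-hop replication of neighbors' files
    # The search protocol (a biased random walk) operates over this state.
```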

9
- Topology adaptation in GIA chooses neighbors based on their overall capacity and current number of neighbors.
- When a node gets a connection request from another node, it accepts it only if it has spare capacity.
- If it does not, it still favors the new node and drops an existing neighbor: from the subset of neighbors with lower capacity, it drops the one that has the most neighbors. This is based on the idea that the node being dropped has the least to lose (sketched below).
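A rough sketch of that accept/drop decision, assuming peer records carry a capacity and a neighbor count (simplified relative to the paper's full adaptation rule):

```python
# GIA-style neighbor acceptance (hypothetical Peer records with .capacity
# and .num_neighbors; a simplification of the paper's adaptation rule).
def accept_neighbor(my_neighbors, have_spare_capacity, candidate):
    """Return (accept?, neighbor_to_drop_or_None)."""
    if have_spare_capacity:
        return True, None                  # room available: just accept

    # Only consider dropping existing neighbors with lower capacity
    # than the candidate.
    weaker = [n for n in my_neighbors if n.capacity < candidate.capacity]
    if not weaker:
        return False, None                 # candidate brings nothing new: reject

    # Drop the weaker neighbor that has the most neighbors of its own --
    # it has the least to lose from being disconnected.
    victim = max(weaker, key=lambda n: n.num_neighbors)
    return True, victim
```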

10
- Tokens are assigned to neighbors based on their capacity (rather than uniformly).
- These tokens are used to issue queries to other nodes.
- Allocation can start out uniform, but as some nodes leave their tokens unused, tokens can be redistributed to other nodes until the allocation reflects a weighting toward capacity (see the sketch below).
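A toy version of capacity-proportional token allocation; the function and the numbers are assumptions for illustration (the real scheme hands out tokens continuously, more like a per-neighbor token bucket):

```python
# Capacity-proportional token allocation (illustrative sketch only).
def allocate_tokens(neighbor_capacities, tokens_per_period):
    total = sum(neighbor_capacities.values())
    return {
        peer: max(1, round(tokens_per_period * cap / total))
        for peer, cap in neighbor_capacities.items()
    }

# Example: a node able to process 100 queries per period splits its tokens
# in proportion to each neighbor's advertised capacity.
print(allocate_tokens({"A": 10, "B": 100, "C": 1000}, tokens_per_period=100))
# -> roughly {'A': 1, 'B': 9, 'C': 90}
```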

11
- Replicate the contents of your neighbors in an index so that when a query arrives, you can respond with their file matches as well.
- When a neighbor leaves, the node removes that neighbor's information from the index (a minimal index sketch follows).
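A minimal sketch of such a one-hop index, keyed by keyword; the class and its methods are illustrative assumptions, not the paper's data structure:

```python
from collections import defaultdict

# One-hop replication index: keyword -> {(neighbor_id, filename)} for all
# direct neighbors, so queries can be answered on their behalf.
class OneHopIndex:
    def __init__(self):
        self.index = defaultdict(set)

    def add_neighbor(self, neighbor_id, files):
        # files: {filename: [keywords]} advertised by the joining neighbor.
        for filename, keywords in files.items():
            for kw in keywords:
                self.index[kw].add((neighbor_id, filename))

    def remove_neighbor(self, neighbor_id):
        # Called when a neighbor disconnects: drop everything it contributed.
        for entries in self.index.values():
            entries.difference_update({e for e in entries if e[0] == neighbor_id})

    def lookup(self, keyword):
        return self.index.get(keyword, set())
```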

12
- Searching is essentially a biased random walk (sketched below).
- Each node sends the query to the neighbor with the highest capacity for which it has tokens (otherwise the query is queued for later).
- Bookkeeping is done with GUIDs to make sure we do not follow redundant paths.
- A TTL is used to end the query if it is taking too long.
- MAX_RESPONSES is the total number of responses that should be retrieved before results are sent back.
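Pulling the pieces together, a single-threaded sketch of the biased walk; it assumes the hypothetical GiaNode fields above plus a local_matches() lookup, and it simply stops where real GIA would queue the query:

```python
# Biased random walk sketch (assumes the hypothetical GiaNode fields above
# plus a local_matches() method; real GIA is asynchronous and queues queries
# when no tokens are available instead of stopping).
def gia_search(start, keywords, ttl=32, max_responses=10):
    responses, node, hops = [], start, 0
    visited = {start}                                  # GUID-style bookkeeping: no revisits
    while hops < ttl and len(responses) < max_responses:
        responses.extend(node.local_matches(keywords))  # one-hop index covers neighbors too
        candidates = [n for n in node.neighbors
                      if n not in visited and node.tokens_for.get(n, 0) > 0]
        if not candidates:
            break                                      # nowhere usable to forward
        nxt = max(candidates, key=lambda n: n.capacity)  # bias toward high capacity
        node.tokens_for[nxt] -= 1                      # spend one flow-control token
        visited.add(nxt)
        node, hops = nxt, hops + 1
    return responses[:max_responses]
```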

13
- You want a 90% success rate.
- You can see that the collapse point (CP) is just over 10.
- More replication makes things easier.

14
- A higher CP is preferred, as are lower hop counts.
- As the replication rate increases, the CP increases and the hop count decreases.
- GIA wins!

15
- The authors thought of that and did some comparisons.

16
- Yes, but GIA scales to multiple responses with no issues.
- They even found a proportional relationship between MAX_RESPONSES and the replication factor!

17
- You can achieve even capacities by allowing nodes to replicate the files themselves, not just index the checksum and location (one-hop replication).
- I’m sure the RIAA and MPAA love this idea…

18
- Satisfaction levels are used to help decide when to keep looking for higher-capacity neighbors (a worked example follows).
- I = T × K^-(1-S)
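As described in the GIA paper, I is the interval between adaptation attempts, T the maximum interval, K an aggressiveness constant, and S the satisfaction level in [0, 1]; the default values below are assumptions for illustration:

```python
# Adaptation interval I = T * K^-(1-S); the defaults here are assumed
# example values, not necessarily the paper's parameters.
def adaptation_interval(satisfaction, max_interval=10.0, aggressiveness=256):
    return max_interval * aggressiveness ** -(1.0 - satisfaction)

# A fully satisfied node (S = 1) waits the full interval before looking for
# better neighbors; an unsatisfied node (S near 0) retries almost immediately.
for s in (0.0, 0.5, 1.0):
    print(s, round(adaptation_interval(s), 3))   # -> 0.039, 0.625, 10.0
```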

19

20
- Yes, that's true. If a node loses a result, the fallback is that the node that issued the request stops receiving keep-alive messages from other nodes, which signals it to reissue the request.
- For cases involving topology adaptation, a node will not accept new queries after changing neighbors, but it will still forward the ones it already accepted along the old path.

21
- Then ask me a question!
- Seriously, any questions?

