1
P2P Architecture Case Study: Gnutella Network
Matei Rîpeanu, The University of Chicago
I am … and I'm going to talk about the Gnutella network, more specifically about the macroscopic characteristics of this large-scale, distributed system. The Gnutella network is one of the many P2P systems that appeared recently and allow users to exchange files. It is special because it is completely decentralized: (at least until recently) all nodes performed exactly the same tasks and made decisions based only on local information.
2
Why analyze Gnutella network?
Unprecedented scale: up to 100k nodes, 100TB data, 10M files today
Self-organizing network
Staggering growth: more than 50 times during first half of 2001
Open architecture, simple and flexible protocol
Interesting mix of social and technical issues
3
Overview
Gnutella protocol
Tools for exploring the network
Network growth
Structural graph analysis
  Is Gnutella a power-law network?
Generated (overhead) network traffic
  Traffic estimates
  Overlay network topology mapping
I'm going to briefly present the protocol and the tools developed to explore the network. We used those tools to track the network over a seven-month period, November 2000 to May 2001. We analyzed the data gathered and tried to explain network growth, performed structural analysis on the network topology graph, discovered growth invariants, and analyzed Gnutella's similarities with other large-scale systems. Finally we analyzed generated traffic and the match between …
4
Gnutella protocol overview
P2P file sharing application on top of an overlay network
Nodes maintain open TCP connections
Messages are broadcast (flooded) or back-propagated

Protocol:
                 Broadcast (flooding)   Back-propagated   Node to node
Membership       PING                   PONG
Query            QUERY                  QUERY HIT
File download                                             GET, PUSH
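The protocol summary above can be captured as a small lookup structure. This is only an illustrative sketch: the names `Routing`, `MESSAGES`, and `messages_by_routing` are invented for this example and do not come from any Gnutella implementation.

```python
from enum import Enum


class Routing(Enum):
    FLOODED = "broadcast (flooded)"
    BACK_PROPAGATED = "back-propagated"
    NODE_TO_NODE = "node to node"


# Message name -> (purpose, how the message travels), as listed on the slide.
MESSAGES = {
    "PING": ("membership", Routing.FLOODED),
    "PONG": ("membership", Routing.BACK_PROPAGATED),
    "QUERY": ("query", Routing.FLOODED),
    "QUERY HIT": ("query", Routing.BACK_PROPAGATED),
    "GET": ("file download", Routing.NODE_TO_NODE),
    "PUSH": ("file download", Routing.NODE_TO_NODE),
}


def messages_by_routing(routing):
    """Return the sorted names of all messages that travel the given way."""
    return sorted(name for name, (_, how) in MESSAGES.items() if how is routing)
```

Grouping by routing makes the later traffic argument easy to see: the flooded messages are exactly PING and QUERY.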
5
Gnutella search mechanism
Steps:
1. Node 2 initiates search for file A
[Figure: overlay network with nodes 1 to 7 and file A]
6
Gnutella search mechanism
Steps:
1. Node 2 initiates search for file A
2. Sends message to all neighbors
[Figure: overlay network with nodes 1 to 7; query messages for A in flight]
7
Gnutella search mechanism
Steps:
1. Node 2 initiates search for file A
2. Sends message to all neighbors
3. Neighbors forward message
[Figure: overlay network with nodes 1 to 7; query messages for A in flight]
8
Gnutella search mechanism
Steps:
1. Node 2 initiates search for file A
2. Sends message to all neighbors
3. Neighbors forward message
4. Nodes that have file A initiate a reply message
[Figure: overlay network with nodes 1 to 7; reply message A:5]
9
Gnutella search mechanism
Steps:
1. Node 2 initiates search for file A
2. Sends message to all neighbors
3. Neighbors forward message
4. Nodes that have file A initiate a reply message
5. Query reply message is back-propagated
[Figure: overlay network with nodes 1 to 7; reply messages A:7 and A:5]
10
Gnutella search mechanism
Steps:
1. Node 2 initiates search for file A
2. Sends message to all neighbors
3. Neighbors forward message
4. Nodes that have file A initiate a reply message
5. Query reply message is back-propagated
[Figure: overlay network with nodes 1 to 7; reply messages A:7 and A:5 next to node 2]
11
Gnutella search mechanism
Steps:
1. Node 2 initiates search for file A
2. Sends message to all neighbors
3. Neighbors forward message
4. Nodes that have file A initiate a reply message
5. Query reply message is back-propagated
6. File download
[Figure: overlay network with nodes 1 to 7; download of A]
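The search steps above can be sketched as a small simulation. This is a simplified model, not the actual Gnutella implementation: the wiring in `EDGES` is hypothetical (the figure's links are not recoverable from the transcript), nodes 5 and 7 are assumed to hold file A because the reply labels read A:5 and A:7, and duplicate suppression is modeled by letting each node accept the query only from the first neighbor that delivers it.

```python
from collections import deque

# Hypothetical wiring of the seven nodes; the real figure's links are unknown.
EDGES = [(1, 2), (1, 7), (2, 3), (2, 4), (3, 5), (4, 6), (5, 6)]


def build_graph(edges):
    """Turn an undirected edge list into an adjacency map."""
    graph = {}
    for a, b in edges:
        graph.setdefault(a, []).append(b)
        graph.setdefault(b, []).append(a)
    return graph


def flood_query(graph, holders, source, ttl):
    """Flood a query from `source`; return {holder: path the reply takes back}."""
    came_from = {source: None}  # first neighbor each node heard the query from
    hits = {}
    frontier = deque([(source, ttl)])
    while frontier:
        node, remaining = frontier.popleft()
        if node in holders and node != source:
            # The reply is back-propagated along the reverse query path.
            path, hop = [], node
            while hop is not None:
                path.append(hop)
                hop = came_from[hop]
            hits[node] = path
        if remaining == 0:
            continue  # TTL exhausted: do not forward any further
        for neighbor in graph[node]:
            if neighbor not in came_from:  # drop duplicate copies of the query
                came_from[neighbor] = node
                frontier.append((neighbor, remaining - 1))
    return hits


hits = flood_query(build_graph(EDGES), holders={5, 7}, source=2, ttl=7)
```

With this assumed wiring, each reply retraces the hops its query took, after which node 2 would contact a holder directly for the download.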
12
Tools for network exploration
Eavesdropper - insert modified nodes into the network to eavesdrop on traffic.
Crawler - connects to all active nodes and uses the membership protocol to discover graph topology. Client-server approach.
Graph analysis tools - high-volume offline computations.
13
Network growth
High user interest
  Users tolerate high latency, low quality results
Better resources
  DSL and cable modem nodes grew from 24% to 41% over first 6 months. Today >50%.
Although the protocol looks almost too simple, and although the Gnutella network's failure to scale has been predicted time and again, the network managed to grow 100x in about a year (50x during the 6-month period we ran our crawler). … Graph explanations … This growth deserves some explanations:
Open architecture / open-source environment
  Competing implementations
  Lower overhead network traffic, improved resource utilization, better structure
14
Growth invariants (1): avg. node connectivity
3.4 links per node on average
With the data gathered over these 6 months we performed some structural analysis on the topology graph. A first interesting growth invariant was that the average number of links per node stayed constant. In the graph, each point is one network snapshot: the X axis shows the size of the network and the Y axis the total number of links.
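A minimal sketch of how this invariant could be computed from one crawl. It assumes "links per node" means total links divided by total nodes; that reading is an assumption, chosen because it matches the network of 50k nodes and 170k links quoted in the traffic estimate (170k / 50k = 3.4). The function name is illustrative.

```python
def links_per_node(edges):
    """Total undirected links divided by total nodes, for one crawl snapshot."""
    links = {frozenset(edge) for edge in edges}  # count (a, b) and (b, a) once
    nodes = {node for edge in edges for node in edge}
    return len(links) / len(nodes)


# Sanity check against the figures quoted in the talk.
ratio_from_talk = 170_000 / 50_000
```

Plotting total links against network size, as the slide does, then shows the invariant as points falling on a straight line with this ratio as its slope.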
15
Growth invariants (2): network diameter
Node-to-node distance maintains similar distribution
Average node-to-node distance grew 25% while the network grew 50 times over 6 months
A more interesting invariant is related to the distribution and average values of node-to-node shortest paths for all the topology graphs we obtained. In the figure, each line represents a graph … The darker ones represent earlier network measurements, while the lighter ones represent later network measurements. As you can see, the distributions remain pretty stable: the curves have the same shape and only shift a bit to the right over time. This shift is reflected in a 25% increase in the average node-to-node shortest path, all while the network grew 50 times. Note that this is better than a random graph would do!
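The distance distribution described here can be computed with one breadth-first search per node. The sketch below is a generic all-pairs BFS on an adjacency map, not the authors' offline analysis tooling; the function names are illustrative.

```python
from collections import Counter, deque


def distance_histogram(graph):
    """Count ordered node pairs by shortest-path length (hops)."""
    histogram = Counter()
    for source in graph:
        dist = {source: 0}
        queue = deque([source])
        while queue:
            node = queue.popleft()
            for neighbor in graph[node]:
                if neighbor not in dist:
                    dist[neighbor] = dist[node] + 1
                    queue.append(neighbor)
        for target, hops in dist.items():
            if target != source:
                histogram[hops] += 1
    return histogram


def average_distance(histogram):
    """Mean shortest-path length over all reachable ordered pairs."""
    pairs = sum(histogram.values())
    return sum(hops * count for hops, count in histogram.items()) / pairs


# Tiny example: a three-node path 1 - 2 - 3.
path_graph = {1: [2], 2: [1, 3], 3: [2]}
example_histogram = distance_histogram(path_graph)
```

Running this on each crawl snapshot and overlaying the normalized histograms would give the family of curves the slide describes. The cost is one BFS per node, which is why such analyses are done offline.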
16
Is Gnutella a power-law network?
Power-law networks: the number of links per node follows a power-law distribution
Examples: the Internet, in/out links to/from HTML pages, citation network, US power grid, social networks.
Implications: High tolerance to random node failure but low reliability when facing an 'intelligent' adversary
[Figure: November 2000]
An interesting analysis is prompted by the question of whether the Gnutella network is a power-law network.
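One rough way to check whether a degree distribution looks like a power law is to fit a straight line to the degree counts on log-log axes. The sketch below uses ordinary least squares for that fit. This is an assumption made for illustration, not necessarily the method used in the talk, and it is known to be a crude estimator on noisy data.

```python
import math
from collections import Counter


def fit_power_law_exponent(degrees):
    """Estimate k in count(d) ~ d^(-k) by least squares on log-log axes."""
    counts = Counter(d for d in degrees if d > 0)
    xs = [math.log(d) for d in counts]
    ys = [math.log(c) for c in counts.values()]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sum(
        (x - mean_x) ** 2 for x in xs
    )
    return -slope


# Synthetic degree list whose counts follow 1000 * d^(-2) exactly.
synthetic = [1] * 1000 + [2] * 250 + [5] * 40 + [10] * 10
estimated_k = fit_power_law_exponent(synthetic)
```

On the synthetic data the fit recovers an exponent of 2. On a real crawl, points that bend away from the fitted line are the signal that the network is not a pure power-law network.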
17
Is Gnutella a power-law network?
Later, larger networks display a bimodal distribution
Implications:
  High tolerance to random node failures preserved
  Increased reliability when facing an attack.
[Figure: May 2001]
18
Overview
Gnutella protocol
Network growth
Structural graph analysis
Generated network traffic:
  Traffic estimates
  Does the Gnutella overlay network topology match the underlying resources?
19
Traffic analysis
6-8 kbps per link over all connections
Traffic structure changed over time
20
Total generated traffic
1Gbps (or 330TB/month)!
Compare to 15,000TB/month in US Internet backbone (Dec. 2000)
Note that this estimate excludes actual file transfers
Q: Does it matter?
Reasoning:
  QUERY and PING messages are flooded. They form more than 90% of generated traffic
  predominant TTL=7
  >95% of nodes are less than 7 hops away
  measured traffic at each link about 6 kbps
  network with 50k nodes and 170k links
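The headline figure follows from the last two lines of the reasoning. The arithmetic below reproduces it under stated assumptions that are not on the slide: 1 kbps = 1,000 bits per second, a 30-day month, and decimal terabytes (10^12 bytes).

```python
LINKS = 170_000            # links in the network, from the slide
PER_LINK_BPS = 6_000       # about 6 kbps measured on each link, from the slide
BACKBONE_TB_MONTH = 15_000 # US Internet backbone, Dec. 2000, from the slide

total_bps = LINKS * PER_LINK_BPS  # aggregate overhead traffic in bits/s


def monthly_terabytes(bits_per_second, days=30):
    """Convert a sustained bit rate into decimal terabytes per month."""
    seconds = days * 24 * 60 * 60
    return bits_per_second / 8 * seconds / 1e12


gnutella_tb_month = monthly_terabytes(total_bps)
share_of_backbone = gnutella_tb_month / BACKBONE_TB_MONTH
```

This gives about 1.02 Gbps and about 330 TB per month, or a little over 2% of the quoted backbone volume, before counting any file transfers.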
21
Topology mismatch
The overlay network topology doesn't match the underlying Internet infrastructure topology!
40% of all nodes are in the 10 largest Autonomous Systems (AS)
Only 2-4% of all TCP connections link nodes within the same AS
Largely 'random wiring'
Entropy experiment gives similar results
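A sketch of how the intra-AS share of connections could be measured, given a crawl and a node-to-AS mapping. Both inputs here are made-up toy data (the AS numbers are from the private-use range), and the function name is illustrative. How the talk actually mapped nodes to ASes is not stated.

```python
def intra_as_fraction(links, as_of):
    """Fraction of overlay links whose two endpoints sit in the same AS."""
    same_as = sum(1 for a, b in links if as_of[a] == as_of[b])
    return same_as / len(links)


# Toy data: four nodes in two hypothetical ASes, four overlay links.
toy_as_of = {"n1": 64512, "n2": 64512, "n3": 64513, "n4": 64513}
toy_links = [("n1", "n2"), ("n1", "n3"), ("n2", "n4"), ("n1", "n4")]
toy_fraction = intra_as_fraction(toy_links, toy_as_of)
```

A low value, such as the 2-4% reported on the slide, means almost every overlay hop crosses an AS boundary even though many nodes share a handful of large ASes.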
22
Conclusions
Gnutella: self-organizing, large-scale, P2P application based on overlay network. It works!
Growth hindered by the volume of generated traffic and inefficient resource use.
Discovered growth invariants specific to large-scale systems that:
  Help predict resource usage
  Give hints for better search and resource organization techniques.
Some solutions to help the network scale:
  Organize the overlay network to match the underlying infrastructure topology.
  Investigate methods for reducing traffic (query routing/filtering, better information organization).
  Exploit locality in user interest: small-world network (talk about our project at Chicago)
  Exploit caches
all while maintaining the self-organizing characteristics
23
Thank you! Questions?
24
What's next?
Organize the overlay network to match the underlying infrastructure topology.
Investigate methods for reducing traffic (query routing/filtering, better information organization).
Is the Gnutella network a small-world network? What are the implications?
I THINK THIS CAN BE DROPPED!
25
Statistical laws of large-scale systems
Zipf's law: the size of the r-th largest occurrence of the event is inversely proportional to its rank: $y \sim r^{-b}$, with $b$ close to unity.
Power-law distributions: the probability distribution of event X is $P[X = x] = x^{-k}$
Pareto distribution: the cumulative probability distribution is $P[X > x] = x^{-(k-1)}$
Zipf, Pareto and power-law distributions are basically different ways to express the same phenomenon
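A short derivation sketch of why the three forms describe the same phenomenon. It ignores normalizing constants and assumes $k > 1$, a continuous approximation, and a population of $N$ events. These assumptions belong to this sketch and are not stated on the slide.

```latex
\begin{align*}
P[X = x] &\propto x^{-k} \\
P[X > x] &= \int_x^{\infty} P[X = t]\,dt \;\propto\; x^{-(k-1)} \\
r(y) &\approx N \, P[X > y] \;\propto\; y^{-(k-1)}
  && \text{(rank of an event of size } y\text{)} \\
y &\propto r^{-1/(k-1)}
  \quad\Longrightarrow\quad b = \frac{1}{k-1}
\end{align*}
```

Under these assumptions, a Zipf exponent $b$ close to unity corresponds to a power-law exponent $k$ close to 2.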
26
[Figure: graph diagram with nodes labeled A to H]
27
Overview
Gnutella protocol
Network growth
Statistical properties of large-scale systems
  Power-law distributions.
  Power-law networks.
Generated (overhead) network traffic.
28
Power-law distributions
Probability distribution of event X is $P[X = x] = x^{-k}$
Present all over WWW and Internet space: the number of HTML pages within a site, visits to a site, links to a page, cache document popularity, etc.
29
Power-law distributions in Gnutella
Number of shared files per node
Query popularity follows a power-law distribution [Kas01]
Implications:
  Caching is an effective solution to reduce traffic and query latency
  New search and node organizing mechanisms!
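To see why skewed query popularity makes caching attractive, the sketch below computes the share of queries an ideal cache could answer if it held only the most popular ones. It assumes query popularity follows Zipf's law with exponent b = 1 over a fixed set of distinct queries. Both are assumptions made for illustration, and the function name is made up.

```python
def zipf_cache_hit_fraction(cache_size, distinct_queries, b=1.0):
    """Share of requests answered by caching the `cache_size` top queries."""
    # Popularity weight of the query at each rank, per Zipf's law.
    weights = [rank ** -b for rank in range(1, distinct_queries + 1)]
    return sum(weights[:cache_size]) / sum(weights)


# Cache the 100 most popular out of 10,000 distinct queries.
hit_fraction = zipf_cache_hit_fraction(100, 10_000)
```

Under these assumptions, caching just 1% of the distinct queries answers roughly half of all requests, which is the intuition behind the slide's first implication.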