Download presentation
Presentation is loading. Please wait.
Published byVictor Gabriel Laranjeira Modified over 5 years ago
1
A look at Peer-to-Peer File Sharing with Gnutella
Prof. Ellis Horowitz November 25, 2002 Copyright 2002 Ellis Horowitz
2
Copyright 2002 Ellis Horowitz
Outline P2P file sharing clients Gnutella protocol Gnutella network properties Gnutella protocol issues topolgy mismatch scalability free riding query types anonymity security Conclusions Copyright 2002 Ellis Horowitz
3
Copyright 2002 Ellis Horowitz
Peer-to-Peer File Sharing is all about the trading of copyrighted music and videos without paying anything to the authors query music category banner ad Kazaa Native Windows Application 3 million users online Copyright 2002 Ellis Horowitz sharing 4 PetaBytes of data
4
Kazaa Survives By Legal Manuvering
March 2001, Kazaa is founded by two Dutchmen, Niklas Zennstrom and Janus Friis in a company called Computer Empowerment The software is based upon their FastTrack P2P Stack, a proprietary algorithm for peer-to-peer communication Kazaa licenses FastTrack to Morpheus and Grokster Oct MPAA and RIAA sue Kazaa, Morpheus and Grokster Nov. 2001, Consumer Empowerment is sued in the Netherlands by the Dutch music publishing body, Buma/Stemra. The court orders KaZaA to take steps to prevent its users from violating copyrights or else pay a heavy fine. Jan. 2002, Zennstrom&Friis sell Kazaa software and website to Sharman Networks, based in Vanuatu, an island in the Pacific, but operating out of Australia Feb. 2002, Kazaa cuts off Morpheus clients from FastTrack April 2002, Sharman Networks agrees to let Brilliant Digital bundle their own stealth P2P application called AltNet within KaZaA. This network would be remotely switched on, allowing KaZaA users to trade Brilliant Digital content throughout FastTrack Copyright 2002 Ellis Horowitz
5
Morpheus File Sharing Software
shopping, web browser a Java application Search Power searches over multiple categories, metadata banner ad behind a firewall Morpheus adopts the Jtella version of Gnutella Copyright 2002 Ellis Horowitz
6
There are many Gnutella Clients
See Copyright 2002 Ellis Horowitz
7
Copyright 2002 Ellis Horowitz
Gnutella History Originally conceived of by Justin Frankel, 21 year old founder of Nullsoft March 2000, Nullsoft posts Gnutella to the web A day later AOL removes Gnutella at the behest of Time Warner The Gnutella protocol version 0.4 and version 0.6 there are multiple open source implementations at including: Jtella Gnucleus Software released under the Lesser Gnu Public License (LGPL) the Gnutella protocol has been widely analyzed Copyright 2002 Ellis Horowitz
8
Gnutella Protocol Messages
Broadcast Messages Ping: initiating message (“I’m here”) Query: search pattern and TTL (time-to-live) Back-Propagated Messages Pong: reply to a ping, contains information about the peer Query response: contains information about the computer that has the needed file Node-to-Node Messages GET: return the requested file PUSH: push the file to me Copyright 2002 Ellis Horowitz
9
Gnutella search mechanism
Steps: Node 2 initiates search for file A 7 1 A 4 2 6 3 5 Copyright 2002 Ellis Horowitz
10
Gnutella Search Mechanism
Steps: Node 2 initiates search for file A Sends message to all neighbors 7 1 4 2 A 6 3 A 5 Copyright 2002 Ellis Horowitz
11
Gnutella Search Mechanism
Steps: Node 2 initiates search for file A Sends message to all neighbors Neighbors forward message 7 1 4 2 A 6 3 A 5 Copyright 2002 Ellis Horowitz
12
Gnutella Search Mechanism
Steps: Node 2 initiates search for file A Sends message to all neighbors Neighbors forward message Nodes that have file A initiate a reply message 7 1 4 2 A 6 3 A:5 5 A Copyright 2002 Ellis Horowitz
13
Gnutella Search Mechanism
Steps: Node 2 initiates search for file A Sends message to all neighbors Neighbors forward message Nodes that have file A initiate a reply message Query reply message is back-propagated 7 1 4 2 A:7 A:5 6 3 A 5 A Copyright 2002 Ellis Horowitz
14
Gnutella Search Mechanism
Steps: Node 2 initiates search for file A Sends message to all neighbors Neighbors forward message Nodes that have file A initiate a reply message Query reply message is back-propagated 7 1 4 A:7 2 A:5 6 3 5 Copyright 2002 Ellis Horowitz
15
Gnutella Search Mechanism
Steps: Node 2 initiates search for file A Sends message to all neighbors Neighbors forward message Nodes that have file A initiate a reply message Query reply message is back-propagated File download Note: file transfer between clients behind firewalls is not possible; if only one client, X, is behind a firewall, Y can request that X push the file to Y download A 7 1 4 2 6 3 5 Copyright 2002 Ellis Horowitz
16
Copyright 2002 Ellis Horowitz
Other Gnutella Issues GUID: Short for Global Unique Identifier, a randomized string that is used to uniquely identify a host or message on the Gnutella Network. This prevents duplicate messages from being sent on the network. GWebCache: a distributed system for helping servents connect to the Gnutella network, thus solving the "bootstrapping" problem. Servents query any of several hundred GWebCache servers to find the addresses of other servents. GWebCache servers are typically web servers running a special module. Host Catcher: Pong responses allow servents to keep track of active gnutella hosts On most servents, the default port for Gnutella is 6346 Copyright 2002 Ellis Horowitz
17
Network growth statistics
Growth Factors DSL and cable modem nodes grew substantially Multiple client implementations became available Although the protocol looks almost too simple, and although the failure to scale of the gnutella network has been predicted timje and again, the network managed to grow 100x in about a year (50x during the 6 month period we ran our crawler). … Graph explainations … This growth deserves some explanations: There was significant growth in the Gnutella network in 2001 5,000 nodes on February 2001, 10,000 nodes on March 19, 2001 20,000 nodes on May 12, 2001 40,000 nodes on May 29, 2001 Statistics due to Matei Ripeanu, see PAPERS/gnutella-rc.pdf Copyright 2002 Ellis Horowitz
18
Limewire Count of Gnutella Hosts in 2002
Green graph represents unique hosts Copyright 2002 Ellis Horowitz
19
Growth invariants (1): avg. node connectivity
3.4 links per node on average With the data gathered over this 6 months we performed some structural analysis on the topology graph. A first interesting growth invariant was that the average number of links per node stayed constant. For the graph – each point is a network – on X axis the size of the network and on Y axis the total number of links. graph due to Matei Ripeanu Copyright 2002 Ellis Horowitz
20
Growth invariants (2): network diameter
Node-to-node distance maintains similar distribution Average node-to-node distance grew 25% while the network grew 50 times over 6 months A more interesting invariant is related to the distribution and average values of node-to-node shortest paths for all the topology graphs we’ve obtained. In the figure each line relresents a graph … The darker ones represent earlier network measurements while the lighter one represent later network measurements. As you can see the distributions remain pretty stable … curves have the same shape … they only shift a bit right over time. And this shift is reflected in a 25% increase in average node to node shortest path all whlie the network grew 50%. Note that this is better than a random graph would do! Copyright 2002 Ellis Horowitz graph due to Matei Ripeanu
21
Is Gnutella a power-law network?
Power-law networks: the number of links per node follows a power-law distribution Examples: the Internet, in/out links to/from HTML pages, citation network, US power grid November 2000 An interesting analysis is generated by the question on whether GN is a power-law network? graph due to Matei Ripean Implications: High tolerance to random node failure but low reliability when facing of an ‘intelligent’ adversary Copyright 2002 Ellis Horowitz
22
Total Generated Traffic
Ripeanu has determined that Gnutella traffic totals 1Gbps (or 330TB/month)! Compare to 15,000TB/month in US Internet backbone (Dec. 2000) this estimate excludes actual file transfers Reasoning: QUERY and PING messages are flooded. They form more than 90% of generated traffic predominant TTL=7 >95% of nodes are less than 7 hops away measured traffic at each link about 6kbs network with 50k nodes and 170k links Statistics due to Matei Ripeanu Copyright 2002 Ellis Horowitz
23
Mapping between Gnutella Network and Internet Infrastructure
H Perfect Mapping Copyright 2002 Ellis Horowitz
24
Mismatch between Gnutella Network and Internet Infrastructure
Inefficient mapping Link D-E needs to support six times higher traffic. Copyright 2002 Ellis Horowitz
25
Copyright 2002 Ellis Horowitz
Topology mismatch The overlay network topology doesn’t match the underlying Internet infrastructure topology! 40% of all nodes are in the 10 largest Autonomous Systems (AS) Only 2-4% of all TCP connections link nodes within the same AS Largely ‘random wiring’ Most Gnutella generated traffic crosses AS border, making the traffic more expensive May cause ISPs to change their pricing scheme Copyright 2002 Ellis Horowitz
26
Copyright 2002 Ellis Horowitz
Scalability Whenever a node receives a message, (ping/query) it sends copies out to all of its other connections. existing mechanisms to reduce traffic: TTL counter Cache information about messages they received, so that they don't forward duplicated messages. Copyright 2002 Ellis Horowitz
27
Free Riding on Gnutella
70% of Gnutella users share no files 90% of users answer no queries Those who have files to share may limit number of connections or upload speed, resulting in a high download failure rate. If only a few individuals contribute to the public good, these few peers effectively act as centralized servers. see Adar and Huberman at Copyright 2002 Ellis Horowitz
28
Free Riding on Gnutella
More than 25% of Gnutella clients share no files; 75% share 100 files or less Conclusion: Gnutella has a high percentage of free riders * Statistics due to S. Gribble Copyright 2002 Ellis Horowitz
29
Copyright 2002 Ellis Horowitz
Anonymity Gnutella provides for anonymity by masking the identity of the peer that generated a query. However, IP addresses are revealed at various points in its operation: HITS packets includes the URL for each file, revealing the IP addresses Clients claim that they have no control, but.. they support bootstrapping they may control message flow they may control metadata searches they may control program updates Copyright 2002 Ellis Horowitz
30
Copyright 2002 Ellis Horowitz
Query Expressiveness Format of query not standardized No standard format or matching semantics for the QUERY string. Its interpretation is completely determined by each node that receives it. String literal vs. regular expression Directory name, filename, or file contents Malicious users may even return files unrelated to the query Copyright 2002 Ellis Horowitz
31
Copyright 2002 Ellis Horowitz
Gnutella Queries "The popularity of Gnutella queries and its implications on scalability" Kunwadee Sripanidkulchai, see Examining over 5 million queries Copyright 2002 Ellis Horowitz
32
Copyright 2002 Ellis Horowitz
Security Recently there have been P2P viruses and worms constructed the Benjamin virus uses Kazaa to spread itself, see Kazaa now includes virus checking software that is applied before upload/after download There have been several Gnutella worms: Gnutella.worm, VBS/GWV.a, VBS_GNUTELWORM, VBS.Gnut.A, VBS/Gnu A Gnutella worm spreads by making a copy of itself in the Gnutella program directory, then making that directory available for sharing files on the Gnutella network. Copyright 2002 Ellis Horowitz
33
Copyright 2002 Ellis Horowitz
Conclusions Gnutella is a self-organizing, large-scale, P2P application that produces an overlay network on top of the Internet; it appears to work Growth is hindered by the volume of generated traffic and inefficient resource use since there is no central authority the open source community must commit to making any changes Suggested changes have been made by Peer-to-Peer Architecture Case Study: Gnutella Network, by Matei Ripeanu Improving Gnutella Protocol: Protocol Analysis and Research Proposals by Igor Ivkovic Some solutions to help the network scale: Organize the overlay network to match the underlying infrastructure topology. Investigate methods for reducing traffic (query routing/filtering, better information organization). Exploit locality in user interest small world network (vorbit despre proiectul nostru de la Chicago) Exploit caches all while maintaining the self-organizing characteristics Copyright 2002 Ellis Horowitz
34
Copyright 2002 Ellis Horowitz
Legal Questions Do US courts have jurisdiction over P2P companies? Do P2P companies really contribute to copyright infringement, cite: Sony BetaMax case? Do P2P companies affect file sharing? If Kazaa, Grokster and Morpheus are stopped, will that stop file sharing or copyright infringement? Copyright 2002 Ellis Horowitz
35
Copyright 2002 Ellis Horowitz
Some References [1] Eytan Adar and Bernardo A. Huberman, Free Riding on Gnutella [2] Igor Ivkovic, Improving Gnutella Protocol: Protocol Analysis And Research Proposals [3] Jordan Ritter, Why Gnutella Can't Scale. No, Really. [4] Matei Ripeanu, Peer-to-Peer Architecture Case Study: Gnutella network. [5] The Gnutella Protocol Specification v0.4 Copyright 2002 Ellis Horowitz
Similar presentations
© 2025 SlidePlayer.com. Inc.
All rights reserved.