Published by Carmella Terry, modified over 9 years ago
Introduction
- Widespread unstructured P2P network
  - Currently between 200,000 and 300,000 hosts
- Ideal as a research test bed
  - The large-scale network demonstrates the need for scalable P2P protocols
- A Gnutella client maintains 4-10 TCP connections to other peers
- UDP is used for some signaling traffic, and an "ultra-peer" status was created to gain some of the benefits of server-based networks
Introduction (Cont.)
- "Ultra-peer" status is self-assigned by powerful peers and provides some extra functionality compared to ordinary nodes
- Many freely available Gnutella clients exist; some of the most popular are:
  - Limewire
  - Bearshare
  - Morpheus
  - Shareaza: it has the fastest-growing number of users, has a very pleasant GUI, and also connects to eDonkey and BitTorrent
Its Main Features
- This protocol underlies much of the current file-sharing activity on the Internet.
- It is based on TCP/IP and HTTP.
- A file sharing network (fsn) is a set of machines that exchange files using Gnutella.
- To connect to a Gnutella network, you need the IP address of a single machine that is already part of the network.
Gnutella
- Peer-to-peer indexing and searching service
- Peer-to-peer, point-to-point file downloading using HTTP
- A Gnutella node needs a server (or a set of servers) to "start up"
  - gnutellahosts.com provides a service with reliable initial connection points
  - But it introduces a new single point of failure!
Gnutella vs. Napster
- Like Napster: distributed file storage and transmission
- Added the ability to distribute file discovery
  - Ask your direct peers who else they know
  - Query those machines directly
Concepts of Unstructured Services
- There are many interesting ideas being explored:
  - Breaking shared files into many parts, both to increase bandwidth (parallel I/O) and to increase the security of content, as no one site can access files without cooperation from its peers. This type of technology makes censorship very hard.
  - MojoNation has a load-balancing and scheduling algorithm in the form of micropayments that reward those who contribute most to the community of peers.
  - Gnutella, which is a family of related products, is usually described as a P2P search engine, as its interface is closer to that of a search engine than to a Web file system.
Characteristics
- Gnutella is a distributed system for file sharing
  - It provides means for network discovery
  - It provides means for file searching and sharing
- It defines a network at the application level
- It employs the concept of peer-to-peer
  - All hosts are equal (symmetry)
  - There is no central point
  - Search is anonymous, but IP addresses are revealed when downloading
Connection
- Once you establish a connection to the first servent, you announce your presence.
- The first servent passes that message on to all the servents it is connected to, and so on.
- These servents all reply with data about themselves:
  - how many files they are sharing
  - how many kilobytes the files take up
- This already adds up to a lot of traffic!
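The per-servent data described above (files shared, kilobytes shared) travels back in PONG replies. The sketch below encodes and decodes such a reply. It assumes the 14-byte PONG payload layout from the Gnutella v0.4 specification: port, IPv4 address, number of files and number of kilobytes, with the integers little-endian and the address in network byte order. The names `encode_pong` and `decode_pong` are hypothetical.

```python
import socket
import struct

# Assumed v0.4 PONG payload: port (uint16, little-endian), IPv4 address
# (4 raw bytes, network order), files shared (uint32 LE), kilobytes (uint32 LE).
PONG_FORMAT = "<H4sII"
PONG_SIZE = struct.calcsize(PONG_FORMAT)  # 14 bytes, no padding with "<"


def encode_pong(port, ip, files, kilobytes):
    """Hypothetical helper: build the payload a servent sends about itself."""
    return struct.pack(PONG_FORMAT, port, socket.inet_aton(ip), files, kilobytes)


def decode_pong(payload):
    """Hypothetical helper: parse a PONG payload back into its four fields."""
    if len(payload) != PONG_SIZE:
        raise ValueError(f"expected {PONG_SIZE} bytes, got {len(payload)}")
    port, raw_ip, files, kilobytes = struct.unpack(PONG_FORMAT, payload)
    return {"port": port, "ip": socket.inet_ntoa(raw_ip),
            "files": files, "kilobytes": kilobytes}
```

Every servent reached by the announcement sends back one of these replies. The number of replies therefore grows with the size of the reachable network, which is the source of the traffic noted on the slide.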
Gnutella File Sharing Model
- Users register files with network neighbors
- Search across the network to find files to copy
- Does not require a centralized broker (unlike Napster)
[Figure: peers Bob, Carol, Ted and Alice. The query "Where is Final Fantasy 4?" is passed between peers, the reply "Carol has Final Fantasy 4" comes back, and the file is copied from Carol.]
Decentralized File-sharing Model
- Peers have the same capability and responsibility
- Communication between peers is symmetric
- There is no central directory server
- The index of shared-file metadata is stored locally, distributed among all peers
- Examples: Gnutella, FreeServe, MojoNation
Resource Discovery
Decentralized (Cont.)
- Every user acts as a client, a server, or both (a "servent")
- A user connects to the framework and becomes a member of the community, allowing others to connect through him/her
- Users speak directly to other users, with no intermediary or central authority
- No one entity controls the information that passes through the community
Advantages and Disadvantages
- Advantages:
  - Inherent scalability
  - Avoidance of the "single point of litigation" problem
  - Fault tolerance
- Disadvantages:
  - Slow information discovery
  - More query traffic on the network
Unstructured Decentralized Services
- There are some 200 Napster clones available in this area (http://www.ultimateresourcesite.com/napster/main.htm)
- Currently the most popular is Imesh [http://www.imesh.com], which has some 2 million users and can share any type of file
- Some of the best-known file sharing systems are:
  - MojoNation [http://www.mojonation.net]
  - Freenet [http://freenet.sourceforge.net/]
  - Gnutella [http://gnutella.wego.com/]
- These three are not server-based like Napster. Instead, they support waves of software agents, expressing resource availability and interest, that propagate among informal, dynamic networks of peers
DFS Variations

| | FTP | NFS | Web | Napster | Gnutella | Freenet |
|---|---|---|---|---|---|---|
| Creator | - | - | - | Shawn Fanning | Justin Frankel, Tom Pepper (Nullsoft, AOL) | Ian Clarke |
| Purpose | Remote file sharing | Local file sharing | Remote file sharing (portal) | File-sharing community (portal) | Decentralized file-sharing community | Decentralized anonymous file sharing |
| Moderated? | Yes | Yes | Yes | Yes | No | No |
| Access control? | Yes | Yes | No | No | No | No |
| Search | - | - | - | Server-based | P2P | P2P |
| File transfer | Client/server | Client/server | Client/server | P2P | P2P | P2P |
| File transfer protocol | ftp | nfs | http, caching | Proprietary | http | Proprietary, encrypted, caching |

DFS: Distributed File Sharing
P2P File Sharing Benefits
- Cost sharing
- Resource aggregation
- Improved scalability/reliability
- Anonymity/privacy
- Dynamism
Management/Placement Challenges
- Per-node state
- Bandwidth usage
- Search time
- Fault tolerance/resiliency
Gnutella in Detail
- Shares any type of file (not just music)
- Decentralized search, unlike Napster
  - You ask your neighbors for files of interest
  - Neighbors ask their neighbors, and so on
  - A TTL field quenches messages after a number of hops
  - Users with matching files reply to you
(Figure from http://computer.howstuffworks.com/file-sharing.htm)
The Gnutella Protocol (v0.4)
- PING: notify a peer of your existence
- PONG: reply to a PING request
- QUERY: find a file in the network
- RESPONSE (QUERYHIT): give the location of a file
- PUSHREQUEST: request a server behind a firewall to push a file out to a client
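All five message types share a common header. The sketch below builds and parses that header. It assumes the v0.4 layout: a 16-byte descriptor ID, a 1-byte payload descriptor, 1-byte TTL and hops fields, and a 4-byte little-endian payload length, for 23 bytes in all. The `Header` class, the `new_ping` helper and the default TTL of 7 are illustrative choices that do not come from the slides.

```python
import os
import struct
from dataclasses import dataclass

# Payload descriptor codes (v0.4 values), labeled with the slide's names.
DESCRIPTORS = {0x00: "PING", 0x01: "PONG", 0x40: "PUSHREQUEST",
               0x80: "QUERY", 0x81: "RESPONSE (QUERYHIT)"}

# Descriptor ID, payload descriptor, TTL, hops, payload length (LE uint32).
HEADER_FORMAT = "<16sBBBI"
HEADER_SIZE = struct.calcsize(HEADER_FORMAT)  # 23 bytes


@dataclass
class Header:
    descriptor_id: bytes
    kind: int
    ttl: int
    hops: int
    payload_length: int

    def pack(self) -> bytes:
        return struct.pack(HEADER_FORMAT, self.descriptor_id, self.kind,
                           self.ttl, self.hops, self.payload_length)

    @classmethod
    def unpack(cls, data: bytes) -> "Header":
        return cls(*struct.unpack(HEADER_FORMAT, data[:HEADER_SIZE]))


def new_ping(ttl: int = 7) -> Header:
    """Hypothetical helper: a PING has no payload; the random ID lets peers spot repeats."""
    return Header(os.urandom(16), 0x00, ttl, 0, 0)
```

In this model, a servent forwarding a message would decrement TTL and increment hops. The random descriptor ID is what lets peers recognize a message they have already seen.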
Joining the Gnutella Network
- The new node connects to a well-known "anchor" node.
- It then sends a PING message to discover other nodes.
- PONG messages are sent in reply from hosts offering new connections with the new node.
- Direct connections are then made to the newly discovered nodes.
[Figure: a new node joining a Gnutella network through PING and PONG messages]
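The join procedure can be simulated on a toy overlay. This minimal sketch assumes the network is an in-memory adjacency map of sets and that a PING's TTL counts hops from the new node. `join`, `ttl` and `max_connections` are hypothetical names, and real clients also rely on host caches and connection limits.

```python
from collections import deque


def join(network, new, anchor, ttl=2, max_connections=4):
    """Hypothetical join: connect to the anchor, PING, then connect to PONGers."""
    network.setdefault(new, set())
    network[new].add(anchor)
    network[anchor].add(new)

    # The PING spreads outward from the anchor (hop 1 is new -> anchor).
    # Every newly reached host answers with a PONG; a TTL-limited
    # breadth-first walk models which hosts hear the PING.
    pongs = []
    seen = {new, anchor}
    frontier = deque([(anchor, ttl - 1)])
    while frontier:
        node, remaining = frontier.popleft()
        if remaining <= 0:
            continue
        for peer in sorted(network[node]):
            if peer not in seen:
                seen.add(peer)
                pongs.append(peer)
                frontier.append((peer, remaining - 1))

    # Open direct connections to some of the discovered hosts;
    # one connection slot is already used by the anchor.
    chosen = pongs[:max_connections - 1]
    for peer in chosen:
        network[new].add(peer)
        network[peer].add(new)
    return [anchor] + chosen
```

With a TTL of 2, only the anchor's direct neighbors hear the PING. Raising the TTL reaches more hosts but costs more PING/PONG traffic.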
Properties of the Flooding
- Searching by flooding:
  - If you don't have the file you want, query 7 of your partners.
  - If they don't have it, they contact 7 of their partners, up to a maximum hop count of 10.
- Requests are flooded, but there is no tree structure.
- No looping, but packets may be received twice.
- Note: play the Gnutella animation at http://www.limewire.com/index.jsp/p2p
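The note that packets may be received twice is why each query carries a unique ID. The sketch below shows TTL-limited flooding with duplicate suppression on a toy graph. It assumes that a node drops any query ID it has already processed and never sends a query back to the neighbor it came from. `flood_query` and its return values are hypothetical.

```python
from collections import deque


def flood_query(network, source, has_file, ttl=10):
    """Flood one query with a TTL, dropping repeated copies of its ID.

    Returns (nodes_with_hits, messages_sent, duplicates_dropped).
    """
    seen = {source}          # nodes that already processed this query ID
    hits = []
    messages = duplicates = 0
    frontier = deque([(source, None, ttl)])
    while frontier:
        node, came_from, remaining = frontier.popleft()
        if remaining == 0:
            continue         # TTL exhausted: stop forwarding
        for peer in network[node]:
            if peer == came_from:
                continue     # never send the query back where it came from
            messages += 1
            if peer in seen:
                duplicates += 1   # same ID arriving again: drop it
                continue
            seen.add(peer)
            if has_file(peer):
                hits.append(peer)
            frontier.append((peer, node, remaining - 1))
    return hits, messages, duplicates
```

In a small triangle S-A-B, A and B each forward the query to the other. Both copies are counted as duplicates and dropped, so the query is received twice but does not loop. In this sketch, nodes holding the file keep forwarding the query.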
Query Flooding
- Gnutella: no hierarchy
- Use a bootstrap node to learn about others: join message
- Send the query to neighbors
- Neighbors forward the query to all attached neighbors (flooding)
- If a queried peer has the object, it sends a message back to the querying peer
[Figure: join and query messages between peers]
More on Query Flooding
- Pros:
  - Peers have similar responsibilities: no group leaders
  - Highly decentralized
  - No peer maintains directory info
- Cons:
  - Excessive query traffic
  - Limited query radius: a search may miss content even when it is present in the network
  - A bootstrap node is still required
  - The overlay network must be maintained
About the Flooding
- Nothing stops a servent from flooding its network region with messages.
- Costs:
  - maintaining the network
  - searching for files
Breadth-First Search (BFS)
[Figure: BFS query propagation. Legend: forward query, processed query, source, found result, forward response]
Pros and Cons
- Benefits:
  - Peers speak directly, with no central authority
  - Nobody owns the Gnutella network, and nobody can shut it down
  - No central point of failure
  - Limited per-node state
  - Isolated node failure can be worked around quickly and automatically
  - Free loading
  - Scalability
- Drawbacks:
  - Searches are less effective and can be slow
  - Bandwidth intensive
- The Gnutella network is evolving to include "controlled decentralization" (Limewire, BearShare, ToadNode)
Searching for a File
- A node broadcasts its QUERY to all its peers, who in turn broadcast it to their peers.
- Nodes route QUERYHITs, which contain file location details, back along the QUERY path to the sender.
- To download a file, a direct connection is made using the host details in the QUERYHIT messages.
[Figure: QUERY and HIT messages in a Gnutella network]
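QUERYHITs are not flooded. Instead, they retrace the path the QUERY took. The sketch below shows that reverse-path routing. It assumes each node records the neighbor from which it first received a given query ID and ignores later copies. `search` and its return format are hypothetical.

```python
from collections import deque


def search(network, source, has_file, ttl=7):
    """Flood a QUERY, then route each QUERYHIT back along the reverse path.

    Returns {hit_node: [hit_node, ..., source]}.
    """
    # Each node remembers which neighbor it first heard this query ID from.
    came_from = {source: None}
    frontier = deque([(source, ttl)])
    while frontier:
        node, remaining = frontier.popleft()
        if remaining == 0:
            continue
        for peer in network[node]:
            if peer not in came_from:      # later copies are ignored
                came_from[peer] = node
                frontier.append((peer, remaining - 1))

    routes = {}
    for node in came_from:
        if node != source and has_file(node):
            # The QUERYHIT follows the recorded back-pointers to the sender.
            path = [node]
            while came_from[path[-1]] is not None:
                path.append(came_from[path[-1]])
            routes[node] = path
    return routes
```

The requester would then open a direct connection to the first host in a returned path to download the file.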
The Cooperation Spectrum
Free Riding
- File sharing networks rely on users sharing data
- Two types of free riding:
  - Downloading but not sharing any data
  - Not sharing any interesting data
- On Gnutella:
  - 15% of users contribute 94% of content
  - 63% of users never responded to a query (they didn't have "interesting" data)
Data from E. Adar and B.A. Huberman (2000), "Free Riding on Gnutella"
Example: GNUTELLA
Summary of Gnutella's Features
- Decentralized
  - No single point of failure
  - Not as susceptible to denial of service
  - Cannot ensure correct results
- Flooding queries
  - Search is now distributed, but still not scalable
Initial Problems and Fixes
- Freeloading: WWW sites offering search/retrieval from the Gnutella network without providing file sharing or query routing
  - Fix: block file-serving to browser-based, non-file-sharing users
- Prematurely terminated downloads:
  - Software bugs
  - Long download times over modems
  - Modem users run the Gnutella peer only briefly (a Napster problem too!), or any user becomes overloaded
  - Fix: a peer can reply "I have it, but I am busy. Try again later."
Initial Problems and Fixes 2
- 2000: the average size of the reachable network was only 400-800 hosts. Why so small?
  - Modem users: not enough bandwidth to provide search routing capabilities, so they became routing black holes
- Fix: create a peer hierarchy based on capabilities
  - Previously all peers were identical, and most modem users were black holes
  - Connection preferencing:
    - favors routing to well-connected peers
    - favors replies to clients that themselves serve a large number of files, to prevent freeloading
  - The Limewire gateway functions as a Napster-like central server on behalf of other peers for searching purposes
Gnutella Enhancements
- Pings/Pongs can consume up to 50% of bandwidth
- Solutions:
  - Pong limiting
  - Pong caching
  - Ping multiplexing
- http://www.limewire.com/index.jsp/pingpong
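Pong caching is only named on the slide. As a hedged illustration of the idea, a servent can answer incoming PINGs from PONGs it received recently instead of re-flooding every PING. The freshness window, the limit on returned entries and the `PongCache` class are assumptions of this sketch, not LimeWire's actual design.

```python
import time


class PongCache:
    """Hypothetical pong cache: answer PINGs from recently received PONGs."""

    def __init__(self, max_age=3.0, clock=time.monotonic):
        self.max_age = max_age   # assumed freshness window, in seconds
        self.clock = clock       # injectable so the cache can be tested
        self._pongs = {}         # host -> (time received, pong payload)

    def add(self, host, pong):
        """Remember the latest PONG seen from a host."""
        self._pongs[host] = (self.clock(), pong)

    def answer_ping(self, limit=10):
        """Return up to `limit` fresh PONGs, newest first, or None if none are fresh."""
        now = self.clock()
        # Age out stale entries so departed hosts are not advertised.
        self._pongs = {h: (t, p) for h, (t, p) in self._pongs.items()
                       if now - t <= self.max_age}
        if not self._pongs:
            return None          # nothing fresh: fall back to forwarding the PING
        newest = sorted(self._pongs.values(), key=lambda e: e[0], reverse=True)
        return [p for _, p in newest[:limit]]
```

The trade-off is freshness against traffic: a longer window saves more PING/PONG bandwidth but may advertise hosts that have already left.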
Gnutella Enhancements 2
- Cache query responses (results)
- Evolving protocol
  - Gnutella Developer Forum
  - UltraPeers
  - Alternative query routing algorithms
Can Heterogeneity Make Gnutella Scale?
- Ideas:
  - Replace query flooding with multiple random walks
  - Proactive replication: #replicas proportional to sqrt(request rate)
- Result: two orders of magnitude improvement in query time, per-node load and message traffic
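The square-root rule for proactive replication can be made concrete with a small allocation function. This sketch assumes a fixed total replica budget split across items in proportion to the square root of each item's request rate. `sqrt_replication` is a hypothetical name, and the counts are left fractional.

```python
import math


def sqrt_replication(request_rates, total_replicas):
    """Split a replica budget so each item's share is proportional to sqrt(rate).

    Returns fractional replica counts that sum to total_replicas.
    """
    roots = [math.sqrt(rate) for rate in request_rates]
    scale = total_replicas / sum(roots)
    return [scale * root for root in roots]
```

With request rates 1, 4 and 16 and a budget of 70, this yields 10, 20 and 40 replicas. Allocating in direct proportion to the rates would instead give the most popular item about 53.3 replicas (70 × 16 / 21). The square-root rule shifts replicas toward less popular items.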
Can Heterogeneity Make Gnutella Scale? 2
- Gnutella assumption: all peers are equal
  - Not true! There is heterogeneity among P2P peers (dial-up users vs. college users)
- Evolve the topology to match node capacities
- Use random walks over this topology
Can Heterogeneity Make Gnutella Scale? 3
- Solution outline:
  - c_i: capacity of node i; in[j,i]: messages from j to i; out[i,j]: messages from i to j
  - Initialize in[i,j] = out[i,j] = 0 and OutMax[i,j] = c_i/d_i, where d_i is the number of neighbors of node i
  - Update these counts according to the messages received/sent
  - Check whether the node is overloaded
    - If so, redirect a high-input neighbor to a neighbor with high OutMax (spare capacity); intuitively, take yourself out of the loop
    - If no such neighbor can be found, ask the neighbor to throttle back
- Result: average query length drops from 70 to 2-9 hops, depending on topology
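One step of the solution outline can be sketched as a single overload check at node i. This is a simplified reading of the slide. It assumes that OutMax[i,k] stays at c_i/d_i and that the spare capacity toward neighbor k is OutMax[i,k] - out[i,k]. `plan_redirect` and its return tuples are hypothetical.

```python
def plan_redirect(capacity, incoming, outgoing):
    """Hypothetical overload check for a single node i.

    capacity: c_i, the number of messages node i can handle per time unit
    incoming: {j: in[j,i]}, messages received from each neighbor j
    outgoing: {k: out[i,k]}, messages sent to each neighbor k
    Returns None if i is not overloaded, ("redirect", j, k) to ask the
    high-input neighbor j to connect to k instead, or ("throttle", j).
    """
    if sum(incoming.values()) <= capacity:
        return None
    # Assumption: capacity is split evenly, OutMax[i,k] = c_i / d_i.
    out_max = capacity / len(incoming)
    heavy = max(incoming, key=incoming.get)
    # Assumption: spare capacity toward k is OutMax[i,k] - out[i,k].
    spare = {k: out_max - outgoing.get(k, 0) for k in incoming if k != heavy}
    if spare:
        target = max(spare, key=spare.get)
        if spare[target] > 0:
            # Take yourself out of the loop: j talks to k directly.
            return ("redirect", heavy, target)
    # No neighbor has spare capacity: ask j to send less.
    return ("throttle", heavy)
```

A real node would run this check periodically as its in/out counters are updated, rather than once.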
Measurement Results
Who is sharing what? (August 2000)

| The top | Share (files) | As percent of whole |
|---|---|---|
| 333 hosts (1%) | 1,142,645 | 37% |
| 1,667 hosts (5%) | 2,182,087 | 70% |
| 3,334 hosts (10%) | 2,692,082 | 87% |
| 5,000 hosts (15%) | 2,928,905 | 94% |
| 6,667 hosts (20%) | 3,037,232 | 98% |
| 8,333 hosts (25%) | 3,082,572 | 99% |
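The table ranks hosts by the number of files they share and reports the cumulative share of the top slice. This minimal sketch shows that computation. The per-host counts used in the test are made up, since the underlying August 2000 data is not reproduced here, and `top_share` is a hypothetical name.

```python
def top_share(files_per_host, fraction):
    """Return (files shared by the top `fraction` of hosts, percent of all files)."""
    ranked = sorted(files_per_host, reverse=True)
    k = round(len(ranked) * fraction)   # nearest whole number of hosts
    share = sum(ranked[:k])
    total = sum(ranked)
    percent = 100 * share / total if total else 0.0
    return share, percent
```

Running it over increasing fractions (0.01, 0.05, 0.10 and so on) produces the rows of a table like the one above.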
Problems With Gnutella
- Protocol scalability
  - The message broadcast technique imposes limitations on the network size: $\text{packets per message} = \sum_{i=0}^{TTL} \text{noPeers}^{i}$
  - In November 2000 the dial-up bandwidth barrier was reached
- Overlay network efficiency
  - Random selection of peers results in inefficient use of the underlying network
  - Redundant traffic is generated on the Internet
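Evaluating the sum shows how quickly the bound grows with the TTL. This sketch computes it directly. Plugging in 7 peers and a TTL of 10 (the flooding example from earlier in the deck) is our own choice of illustration. The sum is an upper bound: in a real overlay, the count is capped by the number of reachable peers and reduced by duplicate suppression. `packets_per_message` is a hypothetical name.

```python
def packets_per_message(no_peers, ttl):
    """Upper bound from the slide: sum of no_peers**i for i = 0..TTL."""
    return sum(no_peers ** i for i in range(ttl + 1))
```

`packets_per_message(7, 10)` evaluates to 329,554,457, the same as the geometric-series closed form (7^11 - 1)/6. That is far more than the few hundred thousand hosts in the network, which is why flooding with a large TTL does not scale.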
Heterogeneous Connection Qualities in Gnutella
- 35% have an upstream bottleneck bandwidth of at least 100 Kbps
- Only 8% have at least 10 Mbps bandwidth
- 22% have bandwidth of 100 Kbps or less
Number of Shared Files
Why Look at Gnutella
- Widespread unstructured P2P network
  - Currently between 200,000 and 300,000 hosts
  - 2006: still heavily in use by about 2 million users
- Gnutella clients (among others): LimeWire, Morpheus, BearShare, OpenCola, Shareaza
  - Shareaza has the fastest-growing number of users; it has a very pleasant GUI and also connects to eDonkey and BitTorrent
- Ideal as a research test bed
  - The large-scale network demonstrates the need for scalable P2P protocols
Limewire: Improvement on Gnutella
- Creation of a peer hierarchy based on capabilities
  - Previously all peers were identical, and most modem users were black holes
- Connection preferencing:
  - favors routing to well-connected peers
  - favors replies to clients that themselves serve a large number of files, to prevent freeloading
- The Limewire gateway functions as a Napster-like central server on behalf of other peers for searching purposes
Limewire
- The Limewire P2P file sharing program connects to the Gnutella P2P network
- Limewire client software is widely recognized for its clean user interface, which does not contain adware
- Sometimes billed as the "fastest file sharing program"
- Limewire claims to offer relatively good search and download performance
- Free Limewire downloads are available for the Windows, Linux and Macintosh operating systems
- Paid Limewire Pro clients also exist
BearShare
- The BearShare P2P file sharing program is a popular free client for the Gnutella P2P network
- Both free and paid versions of the BearShare file sharing program exist
Shareaza
- Shareaza is an up-and-coming P2P file sharing program
- This client offers an extremely powerful search engine capable of connecting to multiple popular P2P networks, including eDonkey, BitTorrent and Gnutella
- Shareaza file sharing software includes intelligence for detecting fake and/or corrupted files
- The free Shareaza download also contains no ads or spyware
- As the installed base of Shareaza users grows, expect Shareaza to become an even better P2P file sharing program
Anonymous?
- The person you are getting the file from knows who you are
  - That's not anonymous.
- Other protocols exist where the owner of the files doesn't know the requester
- Peer-to-peer anonymity exists
Summary
- Peer-to-peer networking: applications connect to peer applications
- Focus: a decentralized method of searching for files
- Each application instance serves to:
  - store selected files
  - route queries (file searches) from and to its neighboring peers
  - respond to queries (serve the file) if the file is stored locally
- Gnutella history:
  - 3/14/00: released by AOL, almost immediately withdrawn
  - Too late: 23K users on Gnutella at 8 am this morning
  - Many iterations to fix the poor initial design (the poor design turned many people off)
- What we care about:
  - How much traffic does one query generate?
  - How many hosts can it support at once?
  - What is the latency associated with querying?
  - Is there a bottleneck?