Published by Marcia Willis. Modified over 9 years ago.
1
Peer to Peer (1)
2
References
This discussion is based on the papers and websites in the P2P File Sharing section.
Acknowledgements: Many of the figures are from other presentations, especially from the original authors.
3
Client-Server Model
Let’s look at the client-server model:
Servers are centrally maintained and administered.
A client has fewer computing resources than a server.
This is the way the web works.
There is no interaction between clients.
4
Client-Server Model
Disadvantages of the client-server model:
  o Reliability
    - The network depends on a possibly highly loaded server to function properly.
    - The server needs to be replicated to some extent to provide better reliability.
  o Scalability
    - More users imply more demand for computing power, storage space, and bandwidth.
5
Peer-to-Peer Model
All nodes have the same functional capabilities and responsibilities.
There is no reliance on central services or resources.
A node acts as both a “server” and a client.
Considered more scalable than the client-server model.
6
Peer-to-Peer Model
Peer-to-peer systems provide access to information resources located on computers throughout the network.
Algorithms for the placement and subsequent retrieval of information are a key aspect of the design of P2P systems.
7
The Promise of P2P Computing
High capacity through parallelism:
  o Many disks
  o Many network connections
  o Many CPUs
Reliability:
  o Many replicas
  o Geographic distribution
Automatic configuration
Useful in public and proprietary settings
8
Why P2P?
P2P can potentially eliminate the single-source bottleneck.
P2P can be used to distribute data and control, and to load-balance requests across the Net.
P2P potentially eliminates the risk of a single point of failure.
P2P infrastructure allows direct access and shared space, which can enable remote maintenance.
9
Brief History
Generation 1 of P2P systems
  o Napster music exchange
Generation 2
  o Freenet, Gnutella, Kazaa, BitTorrent
Generation 3
  o Characterized by the emergence of middleware layers for the application-independent management of distributed resources on a global scale
    - Pastry, Tapestry, Chord, Kademlia
10
Environment Characteristics for Peer-to-Peer Systems
Unreliable environments
Peers connect and disconnect, for reasons ranging from network failures to changes in participation
Random failures, e.g. power outages, cable and DSL failures, hackers
Personal machines are much more vulnerable than servers
11
Evaluating Peer-to-Peer Systems
A node’s database:
  o What does a node need to save in order to operate properly and efficiently?
Success rate: if the file is in the network, what are the chances that a search will find it?
Lookup cost:
  o Time
  o Communication (bandwidth usage)
Join/departure cost
Fault tolerance: resilience to faults
Security, including resilience to denial-of-service attacks
12
Issues in File Sharing Services
Publish: how to insert a new file into the network
Lookup: how to find a specific file
Retrieval: how to get a copy of a file
13
P2P File Sharing Software
Allows a user to open up a directory in their file system:
  o Anyone can retrieve a file from the directory
  o Like a Web server
Allows the user to copy files from other users’ open directories:
  o Like a Web client
Allows users to search nodes for content based on keyword matches:
  o Like Google
14
Napster: How Did It Work
Application-level, client-server protocol over point-to-point TCP
Centralized directory server
Steps:
  o Connect to the Napster server.
  o Give the server keywords to search the full list with.
  o Select the “best” of the correct answers.
    - One approach is to select based on the response time of pings.
    - The shortest response time is chosen.
15
Napster: How Did It Work
Step 1: The file list and IP address are uploaded to the napster.com centralized directory.
16
Napster: How Did It Work
Step 2: The user requests a search at the napster.com centralized directory, which returns the query results.
17
Napster: How Did It Work
Step 3: The user pings hosts that apparently have the data, looking for the best transfer rate.
18
Napster: How Did It Work
Step 4: The user chooses a server (a peer that holds the file) and retrieves the file from it.
Napster’s centralized server farm had a difficult time keeping up with traffic.
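The four steps above can be sketched in a few lines of Python. This is a minimal illustration and not Napster's actual protocol: the `DirectoryServer` class, its method names, and the `choose_peer` helper are hypothetical, ping times are supplied by the caller rather than measured, and keyword matching is reduced to a case-insensitive substring test.

```python
class DirectoryServer:
    """Hypothetical stand-in for the centralized directory (not Napster's real API)."""

    def __init__(self):
        self.files_by_peer = {}  # peer address -> list of shared file names

    def register(self, peer, file_names):
        # Step 1: a peer uploads its file list and address.
        self.files_by_peer[peer] = list(file_names)

    def search(self, keyword):
        # Step 2: the server answers a keyword query from its own index.
        kw = keyword.lower()
        return [
            (peer, name)
            for peer, names in self.files_by_peer.items()
            for name in names
            if kw in name.lower()
        ]


def choose_peer(hits, ping_ms):
    # Step 3: prefer the peer with the shortest ping response time.
    return min(hits, key=lambda hit: ping_ms[hit[0]])


server = DirectoryServer()
server.register("10.0.0.1", ["homer_song.mp3"])
server.register("10.0.0.2", ["Homer_Song.mp3", "other.mp3"])
hits = server.search("homer")
# Step 4 would fetch the file directly from the chosen peer, bypassing the server.
best = choose_peer(hits, {"10.0.0.1": 80.0, "10.0.0.2": 25.0})
```

Only the lookup touches the central server; the transfer itself is peer to peer. Every search still goes through one index, which is why that index becomes the bottleneck.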
19
Napster
The indexes were centralized, but users supplied the files, which were stored on and accessed from their personal computers.
Napster became very popular for music exchange.
20
Napster
History:
  o 5/99: Shawn Fanning (freshman, Northeastern U.) founds the Napster online music service
  o 12/99: first lawsuit
  o 3/00: 25% of UWisc traffic is Napster
  o 2/01: US Circuit Court of Appeals: Napster knew its users were violating copyright laws
  o 7/01: number of simultaneous online users: Napster 160K, Gnutella 40K, Morpheus (KaZaA) 300K
21
Napster
Judge orders Napster to pull the plug in July ’01.
Other file sharing apps take over!
[Figure: traffic in bits per second (0 to 8M) for gnutella, napster, and fastrack (KaZaA)]
22
Napster’s Downfall
Napster’s developers argued they were not liable for infringement of the copyrights.
  o Why? They were not participating in the copying process, which was performed entirely between users’ machines.
This argument was not accepted by the courts.
  o Why? The index servers were deemed an essential part of the process.
Since the index servers were located at well-known addresses, their operators were unable to remain anonymous.
  o Makes for an easy lawsuit target
23
Napster’s Downfall
A more fully distributed file sharing service spreads the responsibility across all of the users.
  o Makes the pursuit of legal remedies difficult
24
Napster: Discussion
Locates files quickly.
Vulnerable to censorship and technical failure.
Popular data become less accessible because of the load of the requests on a central server.
People started to look for more distributed solutions to file sharing as a result of Napster’s failure.
25
Gnutella
Napster’s legal problems motivated Gnutella, which does not use centralized indexes.
The focus is on a decentralized method of searching for files:
  o The central directory server is no longer the bottleneck
  o More difficult to “pull the plug”
Each application instance serves to:
  o Store selected files
  o Route queries from and to its neighboring peers
  o Respond to queries if the file is stored locally
  o Serve files
26
Gnutella
Gnutella history:
  o 3/14/00: released by AOL, almost immediately withdrawn
  o Became open source
  o Many iterations to fix the poor initial design (which turned many people off)
Issues:
  o How much traffic does one query generate?
  o How many hosts can it support at once?
  o What is the latency associated with querying?
  o Is there a bottleneck?
27
Gnutella: Searching
Searching by flooding: a Query packet might ask, “Do you have any content that matches the string ‘Homer’?”
  o If a node does not have the requested file, then 7 of its neighbors (the default set by Gnutella) are queried.
  o If the neighbors do not have it, they contact 7 of their neighbors.
  o Maximum hop count: 10 (this is called the time-to-live, TTL)
  o Reverse path forwarding for responses (not files)
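To make the cost of flooding concrete, here is a small Python sketch of the search described above. It is an illustration under stated assumptions, not the Gnutella wire protocol: `flood_query` and `worst_case_messages` are hypothetical names, the overlay is a plain dictionary of neighbor lists, a node that holds the file answers without forwarding (as on the slide), and every transmission is counted, including ones to peers that have already seen the query.

```python
from collections import deque


def flood_query(neighbors, origin, holders, ttl=10, fanout=7):
    """Flood a query from origin; return (peers that answered, messages sent)."""
    seen = {origin}
    hits = []
    messages = 0
    queue = deque([(origin, ttl)])
    while queue:
        node, hops_left = queue.popleft()
        if node != origin and node in holders:
            hits.append(node)  # a QueryHit would travel back along the reverse path
            continue
        if hops_left == 0:
            continue  # TTL exhausted: the query dies here
        for peer in neighbors.get(node, [])[:fanout]:
            messages += 1
            if peer not in seen:  # receivers drop queries they have already seen
                seen.add(peer)
                queue.append((peer, hops_left - 1))
    return hits, messages


def worst_case_messages(fanout=7, ttl=10):
    """Upper bound when every node forwards to `fanout` fresh peers at each hop."""
    return sum(fanout ** hop for hop in range(1, ttl + 1))


overlay = {
    "A": ["B", "C"],
    "B": ["A", "D"],
    "C": ["A", "D"],
    "D": ["B", "C", "E"],
    "E": ["D"],
}
found, sent = flood_query(overlay, "A", holders={"E"})
```

With the defaults of 7 neighbors and a TTL of 10, the bound from `worst_case_messages` is 7 + 7^2 + ... + 7^10 = 329,554,456 messages for a single query, which is the scalability concern raised in the discussion of Gnutella.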
28
Gnutella: Searching
Downloading:
Peers respond with a “QueryHit” (contains contact info).
File transfers use a direct connection, using the HTTP protocol’s GET method.
When there is a firewall, a “Push” packet is used, which reroutes via the Push path.
29
Gnutella: Searching
31
Gnutella: Discovering Peers
A peer has to know at least one other peer to send requests to.
Addresses of some peers have been published on a website.
When a peer enters the network, it contacts a designated peer and receives a list of other peers that have recently entered the network.
32
Gnutella: Discussion
Robust: The failure of a peer is not a failure of Gnutella.
Performance: Flooding leads to poor performance.
Free riders: Those who get data but do not share data.
The model of Gnutella just presented was found not to be workable. This led to models in which some peer nodes hold indexes.
33
KaZaA: The Service
More than 3 million peers online, sharing over 3,000 terabytes of content
More popular than Napster ever was
MP3s and entire albums, videos, games
Optional parallel downloading of files
Automatically switches to a new download server when the current server becomes unavailable
Provides estimated download times
34
KaZaA: The Service
A user can configure the maximum number of simultaneous uploads and the maximum number of simultaneous downloads.
Queue management at server and client:
  o Frequent uploaders can get priority in the server queue
Keyword search:
  o User can configure “up to x” responses to keywords
Responses to keyword queries come in waves; the search stops when x responses are found.
35
KaZaA: The Technology
Proprietary
Control data is encrypted
Everything is carried in HTTP request and response messages
36
KaZaA: Architecture
Each peer is either a supernode (SN) or is assigned to a supernode.
  o 56 min avg connection
  o Each SN has about 100-150 children
  o Roughly 30,000 SNs
Each supernode has TCP connections with 30-50 supernodes.
  o 23 min avg connection
37
KaZaA: Architecture
Nodes that have more connection bandwidth and are more available are designated as supernodes.
Each supernode acts as a mini-Napster hub, tracking the content and IP addresses of its descendants.
A supernode tracks only the content of its children.
Considered a cross between Napster and Gnutella.
38
KaZaA: Finding Supernodes
A list of potential supernodes is included within the software download.
A new peer goes through the list until it finds an operational supernode.
  o Connects and obtains a more up-to-date list, with 200 entries
  o Nodes in the list are “close” to the ordinary node (ON)
  o The node then pings 5 nodes on the list and connects with one of them
If its supernode goes down, the node obtains an updated list and chooses a new supernode.
39
KaZaA Queries
A node first sends its query to its supernode.
  o The supernode responds with matches
  o If x matches are found, done
Otherwise, the supernode forwards the query to a subset of supernodes.
  o If a total of x matches is found, done
Otherwise, the query is forwarded further.
  o Probably by the original supernode rather than recursively
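The query procedure above can be sketched as follows. KaZaA's protocol is proprietary, so this is only a guess at the shape of the logic: `kazaa_query`, `local_matches`, the dictionary layout, and the `max_waves` limit are all hypothetical, and the original supernode drives each wave itself, following the slide's "probably by original supernode rather than recursively".

```python
def local_matches(index, keyword):
    """Search one supernode's index (child address -> file names) for a keyword."""
    kw = keyword.lower()
    return [
        (child, name)
        for child, names in index.items()
        for name in names
        if kw in name.lower()
    ]


def kazaa_query(keyword, home, indexes, links, x, max_waves=3):
    """Ask the home supernode first, then widen in waves until x matches are found."""
    matches = local_matches(indexes[home], keyword)
    asked = {home}
    wave = [home]
    for _ in range(max_waves):
        if len(matches) >= x:
            break  # enough responses: stop forwarding
        # Next wave: supernodes linked to the previous wave and not yet asked.
        candidates = [sn for prev in wave for sn in links.get(prev, []) if sn not in asked]
        wave = list(dict.fromkeys(candidates))  # drop duplicates, keep order
        if not wave:
            break  # no more supernodes to ask
        asked.update(wave)
        for sn in wave:
            matches.extend(local_matches(indexes[sn], keyword))
    return matches[:x]


indexes = {
    "SN1": {"a": ["homer.mp3"]},
    "SN2": {"b": ["homer live.mp3"], "c": ["x.mp3"]},
    "SN3": {"d": ["homer2.mp3"]},
}
links = {"SN1": ["SN2"], "SN2": ["SN1", "SN3"], "SN3": ["SN2"]}
results = kazaa_query("homer", "SN1", indexes, links, x=2)
```

Because each supernode indexes only its own children, a query costs one message per supernode asked rather than one per peer, which is the main saving over Gnutella-style flooding.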
40
Freenet: History
Final-year (undergraduate) project at Edinburgh University, Scotland, 1999.
Around the time of Napster.
One of the goals was to provide users with anonymity and to circumvent censorship.
Other goals include file sharing, high availability and reliability (through decentralization), efficiency, and scalability.
41
Freenet: Key Management
Files are stored according to an associated key.
  o Cluster information about similar keys
Keys are used to locate a document anywhere.
Keys are used to form a URI.
Two similar keys don’t mean the subjects of the files are similar.
42
Freenet: Routing Tables
Each node maintains a common stack with entries (id, next_hop, file):
  o id: file identifier
  o next_hop: another node that stores the file id
  o file: the file identified by id, if it is stored on the local node
Forwarding of a query for file id:
  o If file id is stored locally, then stop
    - Forward the data back to the upstream requestor
    - Requestor adds the file to its cache and adds an entry to its routing table
  o If not, search for the “closest” id in the stack and forward the message to the corresponding next_hop
  o If the data is not found, failure is reported back
    - Requestor then tries the next closest match in its routing table
43
Freenet: Messages
Random 64-bit ID used for loop detection
    - Each node maintains the list of query IDs that have traversed it, which helps to avoid looping messages
TTL
    - The TTL is decremented each time the query message is forwarded
44
Freenet: Query
API: file = query(id);
Upon receiving a query for document id:
  o Check whether the queried file is stored locally
    - If yes, return it
    - If not, forward the query message
Notes:
  o Each query is associated with a TTL that is decremented each time the query message is forwarded; to obscure the distance to the originator:
    - The TTL can be initialized to a random value within some bounds
    - When TTL=1, the query is forwarded with a finite probability
  o Each node maintains state for all outstanding queries that have traversed it, which helps to avoid cycles
  o When the file is returned, it is cached along the reverse path
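The query rules above (closest-key forwarding, backtracking on failure, loop detection by query ID, TTL, and caching on the reverse path) fit in a short Python sketch. It is not Freenet's implementation: the `Node` class is hypothetical, keys are small integers, "closest" is taken to mean smallest absolute numeric difference, and the probabilistic forwarding at TTL=1 is left out to keep the example deterministic.

```python
import random


class Node:
    """Hypothetical Freenet-style node with a local store and a routing table."""

    def __init__(self, name):
        self.name = name
        self.store = {}    # id -> file held locally
        self.routes = {}   # id -> next-hop Node believed to be close to that id
        self.seen = set()  # query IDs that have already traversed this node

    def query(self, key, ttl, query_id):
        if query_id in self.seen or ttl <= 0:
            return None  # loop detected or TTL exhausted: report failure upstream
        self.seen.add(query_id)
        if key in self.store:
            return self.store[key]
        # Try next hops in order of key closeness; fall back to the next closest on failure.
        for known in sorted(self.routes, key=lambda k: abs(k - key)):
            hop = self.routes[known]
            data = hop.query(key, ttl - 1, query_id)
            if data is not None:
                self.store[key] = data  # cache the file on the reverse path
                self.routes[key] = hop  # remember which neighbor supplied it
                return data
        return None


def new_query_id():
    return random.getrandbits(64)  # random 64-bit ID, as on the Messages slide


n1, n2, n3, n4 = (Node(name) for name in ("n1", "n2", "n3", "n4"))
n1.routes = {12: n2, 5: n3}
n2.routes = {9: n3, 14: n4}
n3.routes = {3: n1}  # leads back to n1: caught by loop detection
n4.store = {10: "f10"}
result = n1.query(10, ttl=5, query_id=new_query_id())
```

In this run n2 first tries n3 (key 9 is closest to 10), gets a failure because n3 can only route back to n1, and then succeeds through n4. Afterwards both n2 and n1 hold a cached copy, so a repeat of the same query is answered locally.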
45
Freenet: Routing Example
Copies of the requested file may be cached along the request path for scalability and robustness.
[Figure: routing tables (id, next_hop, file) at nodes n1 to n5, with the numbered steps of query(10)]
46
Freenet: Insert
API: insert(id, file);
Two steps:
  o Search for the file to be inserted
    - If found, report a collision
    - If the number of nodes is exhausted, report clear
  o If not found, insert the file
47
Freenet: Insert
Searching: like query
Insertion:
  o Follow the forward path; insert the file at all nodes along the path
  o A node probabilistically replaces the originator with itself, to obscure the true originator
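A compact way to see the two insert steps, and the originator replacement, is the sketch below. It makes strong simplifying assumptions that are not in the slides: the path that the insert message follows is given as a fixed list (in Freenet it would be chosen by routing and bounded by the TTL), nodes are plain dictionaries, and `swap_probability` is a hypothetical parameter.

```python
import random


def insert(path, key, data, swap_probability=0.25, rng=random):
    """Insert data under key along path; return "collision" or "inserted"."""
    # Step 1: search along the path; an existing copy means a collision.
    for node in path:
        if key in node["store"]:
            return "collision"
    # Step 2: the search came back clear, so store the file at every node on the path.
    claimed_origin = path[0]["name"]
    for node in path:
        # Each node records the file and who it believes the originator is.
        node["store"][key] = (data, claimed_origin)
        # Before forwarding, a node may name itself as the originator.
        if rng.random() < swap_probability:
            claimed_origin = node["name"]
    return "inserted"


def make_path(*names):
    return [{"name": name, "store": {}} for name in names]


path = make_path("n1", "n2", "n3")
status = insert(path, 10, "f10", swap_probability=1.0)
```

With `swap_probability=1.0` every node rewrites the originator, so n3 records n2 as the source even though n1 inserted the file. With a probability below 1, a downstream node cannot tell whether the named originator is the true one.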
48
Freenet: Insert Example
Assume the query returned failure along the “gray” path; insert f10.
[Figure: routing tables (id, next_hop, file) at nodes n1 to n5 before insert(10, f10)]
49
Freenet: Insert Example
[Figure: routing tables after n1 stores the entry (10, n1, f10); the insert(10, f10) message carries orig=n1]
50
Freenet: Insert Example
n2 replaces the originator (n1) with itself.
[Figure: routing tables at nodes n1 to n5 as insert(10, f10) is forwarded with orig=n2]
51
Freenet: Insert Example
n2 replaces the originator (n1) with itself.
[Figure: routing tables at nodes n1 to n5 after insert(10, f10) has propagated along the path]
52
Insert Conflict
If the same GUID already exists, reject the insert.
Previous file propagation prevents attempts to supplant a file already in the network.
53
Freenet: Ensuring Anonymity
Nodes can lie randomly about requests and claim to be the origin or the destination of a request.
TTL values are fuzzy.
It is difficult to trace a document back to its original node.
Similarly, it is difficult to discover which node inserted a given document.
54
Freenet: Discussion
What’s wrong with Freenet?
  o Not well tested in the wild for scalability and resilience. Insertion flooding is one way to take out the network.
  o Anonymity guarantees are not that strong: “Most non-trivial attacks would probably be successful in identifying someone making requests on Freenet.”
  o No search mechanism: a standard search would allow attacks to take out specific content holders.
  o Suffers from problems of establishing the initial network connection.
  o Does not always guarantee that a file is found, even if the file is in the network.
55
Freenet: Discussion
Primary points:
  o Prevention of censorship and protection of privacy are an important and active field of research.
  o Freenet is a (successful?) implementation of a system that resists information censorship.
  o Freenet is an ongoing project that still has plenty of flaws.
  o There may be a tradeoff between network efficiency on one side and anonymity and robustness on the other.
56
BitTorrent
Problems:
  o Traditional client/server sharing
    - Performance deteriorates rapidly as the number of clients increases
  o Free-riding in P2P networks
    - Free riders only download, without contributing to the network
BitTorrent: started in 2002 by B. Cohen
Focus: efficient fetching, not searching
57
BitTorrent
[Figure: a client, a url that serves file.torrent, and a tracker (tracks peers)]
1. GET file.torrent
2. GET
3. list of peers
4.
file.torrent info: length, name, hash, url of tracker
58
BitTorrent: Pieces
The file is broken into pieces.
  o A piece is typically 256 KBytes
  o Upload pieces while downloading pieces
Tit-for-tat:
  o BitTorrent uploads to at most four peers
  o Among the uploaders, upload to the four that are sending data to you at the highest rates
  o Sometimes it allows for free loaders
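The tit-for-tat rule above amounts to a ranking of peers by the rate at which they send data to us. The sketch below is a simplification rather than the BitTorrent client's choking algorithm: `choose_unchoked` and its parameters are hypothetical, rates are given as a dictionary instead of being measured over time, and the "sometimes it allows for free loaders" case is modelled by optionally giving the last slot to a random peer outside the top four.

```python
import random


def choose_unchoked(rates, slots=4, optimistic=False, rng=random):
    """Pick the peers to upload to, given peer -> rate at which that peer sends to us."""
    ranked = sorted(rates, key=rates.get, reverse=True)  # fastest senders first
    chosen = ranked[:slots]
    rest = ranked[slots:]
    if optimistic and rest:
        # Occasionally give one slot to a peer that has not earned it yet,
        # so newcomers with nothing to trade can get started.
        chosen[-1] = rng.choice(rest)
    return chosen


rates = {"a": 50, "b": 10, "c": 80, "d": 0, "e": 30, "f": 20}
unchoked = choose_unchoked(rates)
```

Here peers `b` and `d` are left out because they send the least to us. Rewarding peers that upload is what discourages the free riding seen in Gnutella.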
59
Summary
P2P is a major portion of Internet traffic.
  o It has exceeded web traffic
There are different approaches to structuring P2P applications.
There is a good deal of concern about future scalability and traffic, which is leading to a lot of research from academia and industry.
60
Interesting Notes
P2P traffic significantly outweighs web traffic.
P2P traffic continues to grow.
The focus of P2P file sharing is NOT MP3s.
  o Many free software projects use BitTorrent to distribute software
BitTorrent is the dominant P2P application in use.
  o Microsoft is jealous: they are developing their own version, called Avalanche