winter 2008 P2P1 Peer-to-Peer Networks: Unstructured and Structured What is a peer-to-peer network? Unstructured Peer-to-Peer Networks –Napster –Gnutella –KaZaA –BitTorrent Distributed Hash Tables (DHT) and Structured Networks –Chord –Pros and Cons Readings: do required and optional readings if interested
winter 2008 P2P2 Peer-to-Peer Networks: How Did it Start? A killer application: Napster –Free music over the Internet Key idea: share the content, storage, and bandwidth of individual (home) users
winter 2008 P2P3 Model Each user stores a subset of files Each user has access to (can download) files from all users in the system
winter 2008 P2P4 Main Challenge Find where a particular file is stored (figure: peers A–F; which peer stores file E?)
winter 2008 P2P5 Other Challenges Scale: up to hundreds of thousands or millions of machines Dynamicity: machines can come and go at any time
winter 2008 P2P6 Peer-to-Peer Networks: Napster Napster history: the rise –January 1999: Napster version 1.0 –May 1999: company founded –September 1999: first lawsuits –2000: 80 million users Napster history: the fall –Mid 2001: out of business due to lawsuits –Mid 2001: dozens of P2P alternatives that were harder to touch, though these have gradually been constrained –2003: growth of pay services like iTunes Napster history: the resurrection –2003: Napster reconstituted as a pay service –2006: still lots of file sharing going on (photo: Shawn Fanning, Northeastern freshman)
winter 2008 P2P7 Napster Technology: Directory Service User installing the software –Download the client program –Register name, password, local directory, etc. Client contacts Napster (via TCP) –Provides a list of music files it will share –… and Napster’s central server updates the directory Client searches on a title or performer –Napster identifies online clients with the file –… and provides IP addresses Client requests the file from the chosen supplier –Supplier transmits the file to the client –Both client and supplier report status to Napster
winter 2008 P2P8 Napster Assume a centralized index system that maps files (songs) to machines that are alive How to find a file (song) –Query the index system, which returns a machine that stores the required file Ideally this is the closest/least-loaded machine –ftp the file Advantages: –Simplicity, easy to implement sophisticated search engines on top of the index system Disadvantages: –Robustness, scalability (?)
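The centralized-index idea is easy to mock up. Below is a minimal sketch (hypothetical class and method names, not Napster's proprietary protocol) of a directory server that maps song titles to the peers advertising them and answers a lookup with one of those peers; the actual transfer would then happen peer-to-peer.

```python
# Toy centralized directory in the spirit of Napster (hypothetical API, not the real protocol).
from collections import defaultdict
import random

class CentralIndex:
    def __init__(self):
        # song title -> set of peer addresses that claim to store it
        self.index = defaultdict(set)

    def register(self, peer, songs):
        """Peer announces the files it is willing to share."""
        for song in songs:
            self.index[song].add(peer)

    def unregister(self, peer):
        """Remove a departed peer from every entry."""
        for peers in self.index.values():
            peers.discard(peer)

    def lookup(self, song):
        """Return some peer holding the song (ideally closest/least loaded; here random)."""
        peers = self.index.get(song)
        return random.choice(sorted(peers)) if peers else None

# Example: m5 registers song E; a client asks the index for E and then fetches it directly from m5.
index = CentralIndex()
index.register("m5", ["E"])
print(index.lookup("E"))  # -> "m5"
```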
winter 2008 P2P9 Napster: Example (figure: a central index maps files A–F to machines m1–m6; a client asks the index for E, is told m5, and downloads E directly from m5)
winter 2008 P2P10 Napster Technology: Properties Server’s directory continually updated –Always know what music is currently available –Point of vulnerability for legal action Peer-to-peer file transfer –No load on the server –Plausible deniability for legal action (but not enough) Proprietary protocol –Login, search, upload, download, and status operations –No security: cleartext passwords and other vulnerabilities Bandwidth issues –Suppliers ranked by apparent bandwidth & response time
winter 2008 P2P11 Napster: Limitations of Central Directory Single point of failure Performance bottleneck Copyright infringement So, later P2P systems were more distributed File transfer is decentralized, but locating content is highly centralized
winter 2008 P2P12 Peer-to-Peer Networks: Gnutella Gnutella history –2000: J. Frankel & T. Pepper released Gnutella –Soon after: many other clients (e.g., Morpheus, Limewire, Bearshare) –2001: protocol enhancements, e.g., “ultrapeers” Query flooding –Join: contact a few nodes to become neighbors –Publish: no need! –Search: ask neighbors, who ask their neighbors –Fetch: get file directly from another node
winter 2008 P2P13 Gnutella Distribute file location Idea: flood the request How to find a file: –Send request to all neighbors –Neighbors recursively multicast the request –Eventually a machine that has the file receives the request, and it sends back the answer Advantages: –Totally decentralized, highly robust Disadvantages: –Not scalable; the entire network can be swamped with requests (to alleviate this problem, each request has a TTL)
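A small simulation of TTL-limited query flooding makes the trade-off concrete. This is a toy in-memory model (invented Node class, not the Gnutella wire format): every node forwards an unseen query to all of its neighbors until the TTL runs out, so both coverage and traffic grow with the TTL.

```python
# Toy simulation of Gnutella-style query flooding with a TTL (not the actual wire protocol).
class Node:
    def __init__(self, name, files):
        self.name = name
        self.files = set(files)
        self.neighbors = []          # overlay edges (TCP connections in real Gnutella)
        self.seen = set()            # message ids already handled, to avoid forwarding loops

    def query(self, msg_id, filename, ttl, results):
        if msg_id in self.seen or ttl == 0:
            return
        self.seen.add(msg_id)
        if filename in self.files:
            results.append(self.name)          # in Gnutella this would be a QueryHit on the reverse path
        for n in self.neighbors:               # flood the request to all neighbors
            n.query(msg_id, filename, ttl - 1, results)

# Build a small ad-hoc topology: a - b - c, a - d
a, b, c, d = Node("a", []), Node("b", []), Node("c", ["xyz"]), Node("d", [])
a.neighbors, b.neighbors, c.neighbors, d.neighbors = [b, d], [a, c], [b], [a]

hits = []
a.query(msg_id=1, filename="xyz", ttl=3, results=hits)
print(hits)  # ['c'] -- found within 3 hops; with ttl=1 the query would die before reaching c
```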
winter 2008 P2P14 Gnutella Ad-hoc topology Queries are flooded for a bounded number of hops No guarantees on recall (figure: a query for “xyz” floods the overlay until it reaches a node holding xyz)
winter 2008 P2P15 Gnutella: Query Flooding Fully distributed –No central server Public domain protocol Many Gnutella clients implementing the protocol Overlay network: graph Edge between peer X and Y if there’s a TCP connection All active peers and edges form the overlay network A given peer will typically be connected to fewer than 10 overlay neighbors
winter 2008 P2P16 Gnutella: Protocol Query message sent over existing TCP connections Peers forward the Query message QueryHit sent over the reverse path File transfer: HTTP Scalability: limited-scope flooding
winter 2008 P2P17 Gnutella: Peer Joining Joining peer X must find some other peer in the Gnutella network: use a list of candidate peers X sequentially attempts to open TCP connections with peers on the list until a connection is set up with some peer Y X sends a Ping message to Y; Y forwards the Ping message. All peers receiving the Ping message respond with a Pong message X receives many Pong messages. It can then set up additional TCP connections
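A toy sketch of this Ping/Pong discovery, under the same simplifications as the flooding example above (in-process objects standing in for TCP connections): the Ping is flooded with a TTL, every peer that sees it contributes a Pong, and the joining peer X learns additional addresses from the Pongs.

```python
# Toy sketch of Gnutella-style neighbor discovery: a joining peer pings one known peer,
# the Ping is forwarded with a TTL, and every peer that sees it answers with a Pong.
class Peer:
    def __init__(self, addr):
        self.addr = addr
        self.neighbors = []
        self.seen_pings = set()

    def ping(self, ping_id, ttl, pongs):
        if ping_id in self.seen_pings or ttl == 0:
            return
        self.seen_pings.add(ping_id)
        pongs.append(self.addr)               # the Pong travels back toward the new peer
        for n in self.neighbors:              # forward the Ping like any flooded message
            n.ping(ping_id, ttl - 1, pongs)

# New peer X only knows Y; after the ping it learns about Y, Z and W and can open more connections.
y, z, w = Peer("Y"), Peer("Z"), Peer("W")
y.neighbors, z.neighbors, w.neighbors = [z], [y, w], [z]
pongs = []
y.ping(ping_id=42, ttl=3, pongs=pongs)
print(pongs)  # ['Y', 'Z', 'W']
```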
winter 2008 P2P18 Gnutella: Pros and Cons Advantages –Fully decentralized –Search cost distributed –Processing per node permits powerful search semantics Disadvantages –Search scope may be quite large –Search time may be quite long –High overhead and nodes come and go often
winter 2008 P2P19 Peer-to-Peer Networks: KaZaA KaZaA history –2001: created by Dutch company (Kazaa BV) –Single network called FastTrack used by other clients as well –Eventually the protocol changed so other clients could no longer talk to it Smart query flooding –Join: on start, the client contacts a super-node (and may later become one) –Publish: client sends list of files to its super-node –Search: send query to super-node, and the super-nodes flood queries among themselves –Fetch: get file directly from peer(s); can fetch from multiple peers at once
winter 2008 P2P20 KaZaA: Exploiting Heterogeneity Each peer is either a group leader or assigned to a group leader –TCP connection between peer and its group leader –TCP connections between some pairs of group leaders Group leader tracks the content in all its children
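One way to picture the two-tier structure is the toy model below (hypothetical GroupLeader class, not the FastTrack protocol): ordinary peers publish their file lists to their group leader, leaders index only their own children, and a search is flooded among leaders rather than among all peers.

```python
# Toy model of KaZaA-style two-tier search: ordinary peers report their files to a group
# leader; leaders index their children and flood queries only among themselves.
class GroupLeader:
    def __init__(self):
        self.children = {}        # child peer -> set of files (reported at join time)
        self.peer_leaders = []    # TCP connections to some other group leaders

    def publish(self, child, files):
        self.children[child] = set(files)

    def local_hits(self, filename):
        return [c for c, files in self.children.items() if filename in files]

    def search(self, filename, seen=None):
        seen = seen if seen is not None else set()
        if id(self) in seen:
            return []
        seen.add(id(self))
        hits = self.local_hits(filename)
        for leader in self.peer_leaders:      # the query is flooded among super-nodes only
            hits += leader.search(filename, seen)
        return hits

l1, l2 = GroupLeader(), GroupLeader()
l1.peer_leaders, l2.peer_leaders = [l2], [l1]
l1.publish("peer-a", ["song.mp3"])
l2.publish("peer-b", ["song.mp3", "other.mp3"])
print(l1.search("song.mp3"))  # ['peer-a', 'peer-b']
```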
winter 2008 P2P21 KaZaA: Motivation for Super-Nodes Query consolidation –Many connected nodes may have only a few files –Propagating query to a sub-node may take more time than for the super-node to answer itself Stability –Super-node selection favors nodes with high up-time –How long you’ve been on is a good predictor of how long you’ll be around in the future
winter 2008 P2P22 Peer-to-Peer Networks: BitTorrent BitTorrent history and motivation –2002: B. Cohen debuted BitTorrent –Key motivation: popular content Popularity exhibits temporal locality (Flash Crowds) E.g., Slashdot effect, CNN Web site on 9/11, release of a new movie or game –Focused on efficient fetching, not searching Distribute same file to many peers Single publisher, many downloaders –Preventing free-loading
winter 2008 P2P23 BitTorrent: Simultaneous Downloading Divide large file into many pieces –Replicate different pieces on different peers –A peer with a complete piece can trade with other peers –Peer can (hopefully) assemble the entire file Allows simultaneous downloading –Retrieving different parts of the file from different peers at the same time
winter 2008 P2P24 BitTorrent Components Seed –Peer with the entire file –Fragmented into pieces Leecher –Peer with an incomplete copy of the file Torrent file –Passive component –Stores summaries (hashes) of the pieces to allow peers to verify their integrity Tracker –Allows peers to find each other –Returns a list of random peers
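The "summaries of the pieces" are per-piece hashes, which is what lets a leecher accept or reject each downloaded piece independently. A minimal sketch, assuming SHA-1 digests and a 256 KiB piece size (real .torrent metadata is bencoded; this toy just keeps a Python list of digests):

```python
# Sketch of per-piece integrity checking: the metadata keeps one SHA-1 digest per fixed-size
# piece; a downloader verifies each piece before accepting it.
import hashlib

PIECE_SIZE = 256 * 1024  # 256 KiB pieces, a common choice

def make_piece_hashes(data: bytes) -> list:
    """Split the file into pieces and record each piece's SHA-1 digest."""
    return [hashlib.sha1(data[i:i + PIECE_SIZE]).digest()
            for i in range(0, len(data), PIECE_SIZE)]

def verify_piece(piece_index: int, piece: bytes, piece_hashes: list) -> bool:
    """A leecher checks a downloaded piece against the digest in the metadata."""
    return hashlib.sha1(piece).digest() == piece_hashes[piece_index]

data = b"x" * (3 * PIECE_SIZE + 100)          # a file that splits into 4 pieces
hashes = make_piece_hashes(data)
print(len(hashes), verify_piece(0, data[:PIECE_SIZE], hashes))   # 4 True
print(verify_piece(1, b"corrupted", hashes))                     # False
```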
winter 2008 P2P25–P2P31 BitTorrent: Overall Architecture (figure sequence: a new downloader fetches the .torrent file from a web page/web server; it sends a get-announce to the tracker; the tracker responds with a peer list; the downloader handshakes with the listed peers, both seeds and leeches; pieces are then exchanged directly among the peers)
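A toy tracker that matches this description (in-process calls rather than the real HTTP announce/bencoding exchange): it only remembers which peers have announced for a torrent and hands each new peer a random subset of the others.

```python
# Toy tracker matching the slides' description: it helps peers find each other by returning
# a random subset of known peers for a torrent (real trackers speak HTTP + bencoding).
import random
from collections import defaultdict

class Tracker:
    def __init__(self, max_peers=50):
        self.swarms = defaultdict(set)   # torrent id (info hash) -> set of peer addresses
        self.max_peers = max_peers

    def announce(self, info_hash, peer_addr):
        """Register the announcing peer and hand back a random peer list for the swarm."""
        others = list(self.swarms[info_hash] - {peer_addr})
        self.swarms[info_hash].add(peer_addr)
        random.shuffle(others)
        return others[:self.max_peers]

tracker = Tracker()
tracker.announce("abc123", "seed:6881")          # the seed joins first and gets an empty list
print(tracker.announce("abc123", "leech:6882"))  # ['seed:6881'] -- the leech learns about the seed
```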
winter 2008 P2P32 Free-Riding Problem in P2P Networks Vast majority of users are free-riders –Most share no files and answer no queries –Others limit # of connections or upload speed A few “peers” essentially act as servers –A few individuals contributing to the public good –Making them hubs that basically act as servers BitTorrent prevents free-riding –Allow the fastest peers to download from you –Occasionally let some free-loaders download
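The "let the fastest peers download from you" rule is easy to sketch. The snippet below (hypothetical function, simplified from BitTorrent's actual choking algorithm) reciprocates with the peers currently uploading to us the fastest and adds one randomly chosen optimistic slot so that newcomers, and occasionally free-riders, still receive data.

```python
# Sketch of BitTorrent-style unchoking: upload to the peers currently giving us the best
# download rates, plus one "optimistic" slot chosen at random.
import random

def choose_unchoked(download_rate: dict, regular_slots: int = 3):
    """download_rate maps peer -> bytes/sec we are currently receiving from that peer."""
    by_rate = sorted(download_rate, key=download_rate.get, reverse=True)
    unchoked = by_rate[:regular_slots]                 # reciprocate with the fastest peers
    others = [p for p in download_rate if p not in unchoked]
    if others:
        unchoked.append(random.choice(others))         # optimistic unchoke
    return unchoked

rates = {"p1": 900, "p2": 500, "p3": 80, "p4": 0, "p5": 0}  # p4/p5 upload nothing to us
print(choose_unchoked(rates))  # e.g. ['p1', 'p2', 'p3', 'p4'] -- free-riders only get the lucky slot
```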
winter 2008 P2P33 Distributed Hash Tables (DHTs) Abstraction: a distributed hash-table data structure –insert(id, item); –item = query(id); (or lookup(id);) –Note: item can be anything: a data object, document, file, pointer to a file… Proposals –CAN, Chord, Kademlia, Pastry, Tapestry, etc
winter 2008 P2P34 DHT Design Goals Make sure that an item (file), identified by its key, is always found Scales to hundreds of thousands of nodes Handles rapid arrival and failure of nodes
winter 2008 P2P35 Distributed Hash Tables (DHTs) Hash table interface: put(key, item), get(key) O(log n) hops Guarantees on recall Structured networks (figure: put(K1, I1) stores item I1 under key K1 at the responsible node; get(K1) returns I1)
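A single-process toy model of this interface may help (hypothetical ToyDHT class): keys and nodes are hashed into the same m-bit circular id space, and put/get are served by the first node whose id follows the key's id. A real DHT reaches that node in O(log n) routing hops; here the lookup is just a local search.

```python
# Toy, single-process model of the DHT interface: put(key, item) / get(key).
import hashlib
from bisect import bisect_left

M = 16                                 # id space 0 .. 2^16 - 1

def ident(name: str) -> int:
    """Hash a node name or key into the m-bit circular id space."""
    return int.from_bytes(hashlib.sha1(name.encode()).digest(), "big") % (2 ** M)

class ToyDHT:
    def __init__(self, node_names):
        self.node_ids = sorted(ident(n) for n in node_names)
        self.store = {nid: {} for nid in self.node_ids}   # per-node local storage

    def successor(self, key_id: int) -> int:
        """First node id at or after key_id, wrapping around the circle."""
        i = bisect_left(self.node_ids, key_id)
        return self.node_ids[i % len(self.node_ids)]

    def put(self, key, item):
        self.store[self.successor(ident(key))][key] = item

    def get(self, key):
        return self.store[self.successor(ident(key))].get(key)

dht = ToyDHT(["node-a", "node-b", "node-c", "node-d"])
dht.put("K1", "I1")
print(dht.get("K1"))   # 'I1', served by whichever node owns K1's id
```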
winter 2008 P2P36 Chord Associate to each node and item a unique id in a uni-dimensional space 0..2^m − 1 Key design decision –Decouple correctness from efficiency Properties –Routing table size O(log(N)), where N is the total number of nodes –Guarantees that a file is found in O(log(N)) steps
winter 2008 P2P37 Identifier to Node Mapping Example (ring with nodes 4, 8, 15, 20, 32, 35, 44, 58) Node 8 maps [5,8] Node 15 maps [9,15] Node 20 maps [16,20] … Node 4 maps [59,4] Each node maintains a pointer to its successor
winter 2008 P2P38 Lookup Each node maintains its successor Route packet (ID, data) to the node responsible for ID using successor pointers (figure: on the ring 4, 8, 15, 20, 32, 35, 44, 58, lookup(37) is routed along successor pointers and resolves to node 44)
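A sketch of this successor-only routing on the slide's example ring (toy classes, ids 0..63): it is correct but takes O(N) hops in the worst case, which is exactly why finger tables are added later.

```python
# Sketch of Chord lookup using successor pointers only (correct but O(N) hops; fingers,
# shown later, bring this down to O(log N)). Node ids are from the slides' example ring.
def in_range(x, a, b):
    """True if x lies in the circular interval (a, b]."""
    return (a < x <= b) if a < b else (x > a or x <= b)

class ChordNode:
    def __init__(self, node_id):
        self.id = node_id
        self.successor = None   # set when the ring is built

    def lookup(self, key_id, hops=0):
        # A node's successor is responsible for the ids in (self.id, successor.id].
        if in_range(key_id, self.id, self.successor.id):
            return self.successor.id, hops + 1
        return self.successor.lookup(key_id, hops + 1)   # keep walking the ring

ids = [4, 8, 15, 20, 32, 35, 44, 58]
nodes = {i: ChordNode(i) for i in ids}
for a, b in zip(ids, ids[1:] + ids[:1]):
    nodes[a].successor = nodes[b]

print(nodes[4].lookup(37))   # (44, ...) -- node 44 is responsible for id 37, as in the slide
```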
winter 2008 P2P39 Joining Operation Each node A periodically sends a stabilize() message to its successor B Upon receiving a stabilize() message, node B –returns its predecessor B’=pred(B) to A by sending a notify(B’) message Upon receiving notify(B’) from B, –if B’ is between A and B, A updates its successor to B’ –otherwise, A does nothing
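The toy code below follows the slides' formulation of stabilize()/notify() (a simplification of the Chord paper's pseudocode, with invented class names): A asks its successor for the successor's predecessor, adopts it if it falls between them, and then notifies its (possibly new) successor about itself. The example state matches the walkthrough that follows.

```python
# Toy version of the stabilization step as the slides describe it.
def between(x, a, b):
    """True if x is strictly between a and b on the identifier circle."""
    if x is None:
        return False
    return (a < x < b) if a < b else (x > a or x < b)

class Node:
    def __init__(self, node_id):
        self.id = node_id
        self.successor = self
        self.predecessor = None

    def stabilize(self):
        cand = self.successor.predecessor           # the "notify(B')" reply in the slides
        if between(cand.id if cand else None, self.id, self.successor.id):
            self.successor = cand                   # B' is a closer successor than B
        self.successor.notify(self)

    def notify(self, possible_pred):
        if self.predecessor is None or between(possible_pred.id, self.predecessor.id, self.id):
            self.predecessor = possible_pred

# Node 50 has just joined with successor 58; two stabilize rounds splice it between 44 and 58.
n44, n50, n58 = Node(44), Node(50), Node(58)
n44.successor, n44.predecessor = n58, Node(35)
n58.predecessor = n44
n50.successor = n58
n50.stabilize(); n44.stabilize()
print(n44.successor.id, n50.predecessor.id, n58.predecessor.id)   # 50 44 50
```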
winter 2008 P2P40 Joining Operation Node with id=50 joins the ring Node 50 needs to know at least one node already in the system –Assume the known node is 15 (figure: ring 4, 8, 15, 20, 32, 35, 44, 58; before the join, node 58 has succ=4, pred=44; node 44 has succ=58, pred=35; node 50 starts with succ=nil, pred=nil)
winter 2008 P2P41 Joining Operation Node 50: send join(50) to node 15 Node 44: returns node 58 Node 50 updates its successor to 58
winter 2008 P2P42 Joining Operation Node 50: send stabilize() to node 58 Node 58: –update predecessor to 50 –send notify() back
winter 2008 P2P43 Joining Operation (cont’d) Node 44 sends a stabilize message to its successor, node 58 Node 58 replies with a notify message Node 44 updates its successor to 50
winter 2008 P2P44 Joining Operation (cont’d) Node 44 sends a stabilize message to its new successor, node 50 Node 50 sets its predecessor to node 44
winter 2008 P2P45 Joining Operation (cont’d) This completes the joining operation! (final state: node 44 has succ=50; node 50 has succ=58 and pred=44; node 58 has pred=50)
winter 2008 P2P46 Achieving Efficiency: Finger Tables Say m=7 (ids 0..127) The i-th finger-table entry at the peer with id n is the first peer with id >= (n + 2^i) mod 2^m Finger table at node 80, for a ring with nodes 20, 32, 45, 80, 96, 112: ft[0..4] = 96 (targets 80+2^0 … 80+2^4), ft[5] = 112 (target 80+2^5 = 112), ft[6] = 20 (target (80 + 2^6) mod 2^7 = 16)
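The slide's table can be reproduced directly. A short sketch for m = 7 and the example ring (toy helper functions, assuming the node set shown on the slide):

```python
# Finger-table construction for the slide's example: m = 7 (ids 0..127),
# ring nodes {20, 32, 45, 80, 96, 112}. Entry i at node n points to the first node whose
# id is >= (n + 2^i) mod 2^m, walking clockwise around the circle.
M = 7
RING = sorted([20, 32, 45, 80, 96, 112])

def successor(ident):
    """First ring node at or after `ident`, wrapping around the id circle."""
    for node in RING:
        if node >= ident:
            return node
    return RING[0]

def finger_table(n):
    return [successor((n + 2 ** i) % 2 ** M) for i in range(M)]

print(finger_table(80))
# [96, 96, 96, 96, 96, 112, 20]  -- matches the table on the slide:
# ft[0..4] = 96, ft[5] = 112, ft[6] = (80 + 2^6) mod 2^7 = 16 -> node 20
```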
winter 2008 P2P47 Achieving Robustness To improve robustness, each node maintains its k (k > 1) immediate successors instead of only one successor In the notify() message, node A can send its k−1 successors to its predecessor B Upon receiving the notify() message, B can update its successor list by concatenating the successor list received from A with A itself
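A minimal sketch of how such a successor list could be refreshed, following the slide's description (hypothetical function; real Chord folds this into stabilization):

```python
# Successor-list idea from the slide: node B keeps k successors instead of one, and refreshes
# its list from its immediate successor A followed by the first k-1 entries of A's own list.
K = 3

def refreshed_successor_list(succ_id, succs_successor_list, k=K):
    """New list for B, given its successor A (succ_id) and A's successor list."""
    return [succ_id] + succs_successor_list[: k - 1]

# Node 44's successor is 50, and 50 reports its own successor list [58, 4, 8]:
print(refreshed_successor_list(50, [58, 4, 8]))   # [50, 58, 4] -- 44 can survive 50 failing
```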
winter 2008 P2P48 Chord Optimizations Reduce latency –Choose the finger that reduces the expected time to reach the destination –Choose the closest node from the range [N+2^(i−1), N+2^i) as successor Accommodate heterogeneous systems –Multiple virtual nodes per physical node
winter 2008 P2P49 DHT Conclusions Distributed Hash Tables are a key component of scalable and robust overlay networks Chord: O(log n) state, O(log n) distance Both can achieve stretch < 2 Simplicity is key Services built on top of distributed hash tables –persistent storage (OpenDHT, Oceanstore) –p2p file storage, i3 (chord) –multicast (CAN, Tapestry)