Presentation on theme: "Peer-to-Peer Applications"— Presentation transcript:

1 Peer-to-Peer Applications
Course on Computer Communication and Networks, CTH/GU. The slides are an adaptation of slides by the authors of the course's main textbook, by the authors listed in the bibliography section, and by Jeff Pang. P2P

2 P2P file sharing
Example: Alice runs a P2P client application on her notebook computer. She intermittently connects to the Internet, getting a new IP address for each connection. She asks for "Hey Jude"; the application displays other peers that have a copy. Alice chooses one of the peers, Bob. The file is copied from Bob's PC to Alice's notebook over HTTP. While Alice downloads, other users may be uploading from Alice. Alice's peer is both a Web client and a transient Web server. All peers are servers = highly scalable! P2P

3 Intro
Quickly grown in popularity: dozens or hundreds of file-sharing applications; many million people worldwide use P2P networks; audio/video transfer now dominates traffic on the Internet. But what is P2P? Searching or location? Computers "peering"? Taking advantage of resources at the edges of the network: end-host resources have increased dramatically, and broadband connectivity is now common. P2P

4 The Lookup Problem
1000s of nodes; the set of nodes may change. A publisher holds Key="title", Value=MP3 data; a client elsewhere on the Internet issues Lookup("title"). How does the query reach the data? (Figure: nodes N1–N6, publisher, client.) P2P

5 The Lookup Problem (2)
Common primitives: Join: how do I begin participating? Publish: how do I advertise my file? Search: how do I find a file? Fetch: how do I retrieve a file? P2P

6 Overview
Centralized database: Napster. Query flooding: Gnutella. Hierarchical query flooding: KaZaA. Swarming: BitTorrent. Unstructured overlay routing: Freenet. Structured overlay routing: distributed hash tables. Other. P2P

7 P2P: centralized directory
Original "Napster" design (1999, S. Fanning): (1) when a peer connects, it informs the central directory server of its IP address and content; (2) Alice queries the directory server for "Hey Jude"; (3) Alice requests the file directly from Bob. Problems? Single point of failure, performance bottleneck, copyright infringement. (Figure: directory server, peers Alice and Bob, steps 1–3.) P2P

8 Napster: Publish
A peer announces "I have X, Y, and Z!" by sending insert(X, 123.2.21.23), ... to the central server. P2P

9 Napster: Search
The client asks the server "Where is file A?"; the server replies search(A) --> 123.2.0.18; the client then fetches the file directly from 123.2.0.18. P2P
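To make the centralized-directory idea concrete, here is a minimal Python sketch (not Napster's actual protocol; class and method names are illustrative): peers publish (filename, address) pairs to one server, and a search is a single dictionary lookup, which is why the search scope is O(1) while the server carries O(N) state.

from collections import defaultdict

class CentralDirectory:
    """Toy Napster-style index: one server maps file names to peer addresses."""
    def __init__(self):
        self.index = defaultdict(set)          # filename -> {peer addresses}

    def publish(self, peer_addr, filenames):
        # Called when a peer connects: it reports its address and content.
        for name in filenames:
            self.index[name].add(peer_addr)

    def search(self, filename):
        # O(1) lookup; the fetch itself happens peer-to-peer over HTTP.
        return set(self.index.get(filename, ()))

    def unpublish(self, peer_addr):
        # The server must track and clean up per-peer state: O(N) bookkeeping.
        for peers in self.index.values():
            peers.discard(peer_addr)

# Usage: Bob publishes, Alice searches, then fetches directly from Bob.
directory = CentralDirectory()
directory.publish("bob.example:8080", ["hey_jude.mp3", "let_it_be.mp3"])
print(directory.search("hey_jude.mp3"))        # {'bob.example:8080'}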

10 Napster: Discussion
Pros: simple; search scope is O(1); controllable (pro or con?). Cons: server maintains O(N) state; server does all processing; single point of failure. P2P

11 Overview
Centralized database: Napster. Query flooding: Gnutella. Hierarchical query flooding: KaZaA. Swarming: BitTorrent. Unstructured overlay routing: Freenet. Structured overlay routing: distributed hash tables. Other. P2P

12 Gnutella: History In 2000, J. Frankel and T. Pepper from Nullsoft released Gnutella. Soon there were many other clients: Bearshare, Morpheus, LimeWire, etc. In 2001 came many protocol enhancements, including "ultrapeers". P2P

13 Gnutella: Overview
Query flooding. Join: on startup, the client contacts a few other nodes; these become its "neighbors". Publish: no need. Search: ask neighbors, who ask their neighbors, and so on; when/if found, reply to the sender. Fetch: get the file directly from the peer. A minimal sketch of the flooding step follows. P2P
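A minimal sketch of the flooding search just described, over a toy in-memory overlay (node and method names are illustrative, not the Gnutella wire format): each node forwards the query to its neighbors until the TTL expires, drops duplicates, and hits propagate back to the caller.

class GnutellaNode:
    """Toy overlay node for TTL-limited query flooding."""
    def __init__(self, name, files):
        self.name = name
        self.files = set(files)
        self.neighbors = []          # other GnutellaNode objects
        self.seen = set()            # query ids already forwarded

    def query(self, qid, filename, ttl):
        """Return the names of nodes holding filename, reachable within ttl hops."""
        if qid in self.seen or ttl < 0:
            return []                # drop duplicates and expired queries
        self.seen.add(qid)
        hits = [self.name] if filename in self.files else []
        for n in self.neighbors:     # flood to all neighbors
            hits += n.query(qid, filename, ttl - 1)
        return hits

# Usage: a - b - c in a line; a floods a query with TTL 2 and finds c's file.
a, b, c = GnutellaNode("a", []), GnutellaNode("b", []), GnutellaNode("c", ["songA"])
a.neighbors, b.neighbors, c.neighbors = [b], [a, c], [b]
print(a.query(qid=1, filename="songA", ttl=2))   # ['c']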

14 Gnutella: Search A query "Where is file A?" floods through the neighbors; the peer that has file A sends a Reply ("I have file A") back along the reverse path. P2P

15 Gnutella: protocol
Query messages are sent over existing TCP connections; peers forward Query messages; a QueryHit is sent back over the reverse path; the file transfer itself uses HTTP. Scalability: limited-scope flooding. P2P

16 Gnutella: More on peer joining
A joining peer X must find some other peer already in the Gnutella network (e.g., from gnutellahosts.com): it uses a list of candidate peers. X sequentially attempts to open a TCP connection with peers on the list until a connection is set up with some peer Y. X sends a Ping message to Y; Y forwards (floods) the Ping message. All peers receiving the Ping respond with a Pong message. X receives many Pong messages and can then (later on, if needed) set up additional TCP connections. P2P

17 Query flooding: Gnutella
Fully distributed, no central server; public-domain protocol with many Gnutella clients implementing it. Overlay network: there is an edge between peers X and Y if there is a TCP connection between them; all active peers and edges form the overlay network. An edge is not a physical link. A given peer will typically be connected to fewer than 10 overlay neighbors. What is routing in P2P networks? P2P

18 Gnutella: Discussion
Pros: fully decentralized; search cost is distributed. Cons: search scope is O(N); search time is O(???); nodes leave often, so the network is unstable. Improvements: limiting the depth of search; random walks instead of flooding. P2P

19 Aside: Search Time? P2P

20 Aside: All Peers Equal? 56kbps Modem 10Mbps LAN 1.5Mbps DSL P2P

21 Overview
Centralized database: Napster. Query flooding: Gnutella. Hierarchical query flooding: KaZaA. Swarming: BitTorrent. Unstructured overlay routing: Freenet. Structured overlay routing: distributed hash tables. Other. P2P

22 KaZaA: History In 2001, KaZaA was created by the Dutch company Kazaa BV.
It used a single network called FastTrack, also used by other clients: Morpheus, giFT, etc. Eventually the protocol was changed so other clients could no longer talk to it. A popular file-sharing network with >10 million users (the number varies). P2P

23 KaZaA: Overview
"Smart" query flooding. Join: on startup, the client contacts a "supernode" (it may at some point become one itself). Publish: send the list of files to the supernode. Search: send the query to the supernode; supernodes flood the query among themselves. Fetch: get the file directly from peer(s); it can be fetched simultaneously from multiple peers. A two-tier sketch follows. P2P
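A hedged sketch of the two-tier structure described above (names are illustrative): ordinary peers publish their file lists to a supernode, and queries are flooded only among supernodes, shrinking the flooded portion of the network.

class SuperNode:
    """Toy KaZaA-style supernode: indexes its child peers, floods other supernodes."""
    def __init__(self, name):
        self.name = name
        self.index = {}              # filename -> set of child peer addresses
        self.peers = []              # other SuperNode objects

    def publish(self, child_addr, filenames):
        for f in filenames:
            self.index.setdefault(f, set()).add(child_addr)

    def search(self, filename, ttl=2):
        hits = set(self.index.get(filename, set()))
        if ttl > 0:                  # flood only among supernodes
            for sn in self.peers:
                hits |= sn.search(filename, ttl - 1)
        return hits

# Usage: two supernodes, each indexing its own children.
sn1, sn2 = SuperNode("sn1"), SuperNode("sn2")
sn1.peers, sn2.peers = [sn2], [sn1]
sn2.publish("peer42.example", ["songB"])
print(sn1.search("songB"))           # {'peer42.example'}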

24 KaZaA: Network Design “Super Nodes” P2P

25 KaZaA: File Insert
A peer tells its supernode "I have X!"; the supernode records insert(X, 123.2.21.23), ... in its index. P2P

26 KaZaA: File Search
The client asks its supernode "Where is file A?"; the query is flooded among supernodes, and replies such as search(A) --> 123.2.0.18, 123.2.22.50 are returned. P2P

27 KaZaA: Discussion
Pros: tries to take node heterogeneity into account: bandwidth, host computational resources, host availability (?); rumored to take network locality into account. Cons: mechanisms are easy to circumvent; still no real guarantees on search scope or search time. This P2P architecture is also used by Skype. P2P

28 Overview
Centralized database: Napster. Query flooding: Gnutella. Hierarchical query flooding: KaZaA. Swarming: BitTorrent. Unstructured overlay routing: Freenet. Structured overlay routing: distributed hash tables. Other. P2P

29 BitTorrent: History In 2002, B. Cohen debuted BitTorrent.
Key motivation: popularity exhibits temporal locality (flash crowds), e.g., the Slashdot effect, CNN on 9/11, a new movie/game release. Focused on efficient fetching, not searching: distribute the same file to all peers; single publisher, multiple downloaders. Has some "real" publishers: Blizzard Entertainment uses it to distribute games. P2P

30 BitTorrent: Overview
Swarming. Join: contact a centralized "tracker" server, get a list of peers. Publish: run a tracker server. Search: out-of-band, e.g., use Google to find a tracker for the file you want. Fetch: download chunks of the file from your peers; upload chunks you have to them. A toy tracker sketch follows. P2P
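The join step can be illustrated with a toy in-memory tracker; the real tracker protocol is HTTP with bencoded responses, so this stand-in only captures the logic of "register, then receive a random subset of the swarm as neighbors".

import random

class Tracker:
    """Toy stand-in for a BitTorrent tracker (the real protocol is HTTP + bencoding)."""
    def __init__(self):
        self.swarm = {}              # info_hash -> set of peer addresses

    def announce(self, info_hash, peer_addr, want=4):
        """Register a peer and hand back a random subset of the swarm as neighbors."""
        peers = self.swarm.setdefault(info_hash, set())
        others = list(peers - {peer_addr})
        peers.add(peer_addr)
        return random.sample(others, min(want, len(others)))

# Usage: three peers join the same torrent.
tracker = Tracker()
for addr in ["10.0.0.1:6881", "10.0.0.2:6881", "10.0.0.3:6881"]:
    neighbors = tracker.announce("torrent-xyz", addr)
    print(addr, "->", neighbors)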

31 File distribution: BitTorrent
P2P file distribution. Tracker: tracks the peers participating in the torrent. Torrent: the group of peers exchanging chunks of a file. A joining peer obtains a list of peers from the tracker and starts trading chunks with them. 2: Application Layer

32 BitTorrent (1)
The file is divided into 256KB chunks. A peer joining the torrent has no chunks, but will accumulate them over time. It registers with the tracker to get a list of peers and connects to a subset of them ("neighbors"). While downloading, a peer uploads chunks to other peers. Peers may come and go. Once a peer has the entire file, it may (selfishly) leave or (altruistically) remain. 2: Application Layer

33 BitTorrent (2)
Pulling chunks: at any given time, different peers have different subsets of the file's chunks; periodically, a peer (Alice) asks each neighbor for the list of chunks they have and sends requests for her missing chunks, rarest first. Sending chunks (tit-for-tat): Alice sends chunks to the four neighbors currently sending her chunks at the highest rate, re-evaluating the top 4 every 10 secs; every 30 secs she randomly selects another peer and starts sending it chunks ("optimistic unchoke"); the newly chosen peer may join the top 4. A sketch of both heuristics follows. 2: Application Layer
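A hedged sketch of the two heuristics on this slide, rarest-first chunk selection and tit-for-tat unchoking with a periodic optimistic unchoke; the data structures and peer names are made up for illustration.

import random
from collections import Counter

def rarest_first(my_chunks, neighbor_chunks):
    """Pick the missing chunk held by the fewest neighbors (rarest first)."""
    counts = Counter(c for chunks in neighbor_chunks.values() for c in chunks)
    candidates = [c for c in counts if c not in my_chunks]
    return min(candidates, key=lambda c: counts[c]) if candidates else None

def choose_unchoked(upload_rates, k=4):
    """Tit-for-tat: unchoke the k peers uploading to us fastest, plus one random
    'optimistic unchoke' so newcomers get a chance to prove themselves."""
    top = sorted(upload_rates, key=upload_rates.get, reverse=True)[:k]
    others = [p for p in upload_rates if p not in top]
    return set(top) | ({random.choice(others)} if others else set())

# Usage with made-up neighbors and rates.
neighbor_chunks = {"p1": {0, 1, 2}, "p2": {1, 2}, "p3": {2, 3}}
print(rarest_first(my_chunks={2}, neighbor_chunks=neighbor_chunks))   # 0 (chunks 0 and 3 tied as rarest)
print(choose_unchoked({"p1": 50, "p2": 80, "p3": 10, "p4": 5, "p5": 1, "p6": 0}))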

34 BitTorrent: Tit-for-tat
(1) Alice “optimistically unchokes” Bob (2) Alice becomes one of Bob’s top-four providers; Bob reciprocates (3) Bob becomes one of Alice’s top-four providers With higher upload rate, can find better trading partners & get file faster! 2: Application Layer

35 BitTorrent: Publish/Join
Tracker P2P

36 BitTorrent: Fetch P2P

37 BitTorrent: Sharing Strategy
"Tit-for-tat" sharing strategy: "I'll share with you if you share with me." Be optimistic: occasionally let freeloaders download, otherwise no one would ever start! This also lets you discover better peers to download from when they reciprocate. Approximates Pareto efficiency (game theory: "no change can make anyone better off without making others worse off"). P2P

38 BitTorrent: Summary
Pros: works reasonably well in practice; gives peers an incentive to share resources and avoids freeloaders. Cons: a central tracker server is needed to bootstrap the swarm. P2P

39 BTW: File Distribution: Server-Client vs P2P
Question: how much time does it take to distribute a file of size F from one server to N peers? Notation: u_s = server upload bandwidth; u_i = peer i upload bandwidth; d_i = peer i download bandwidth; the network core is assumed to have abundant bandwidth. 2: Application Layer

40 BTW: File distribution time: server-client
The server sequentially sends N copies, taking NF/u_s time; client i takes F/d_i time to download. So d_cs = max{ NF/u_s , F/min_i d_i }. The time to distribute F to N clients with the client/server approach thus increases linearly in N (for large N). 2: Application Layer

41 BTW: File distribution time: P2P
The server must send one copy, taking F/u_s time; client i takes F/d_i time to download; NF bits must be downloaded in aggregate, and the fastest possible aggregate upload rate is u_s + Σ_i u_i. So d_P2P = max{ F/u_s , F/min_i d_i , NF/(u_s + Σ_i u_i) }. 2: Application Layer

42 Server-client vs. P2P: example
Client upload rate = u, F/u = 1 hour, u_s = 10u, d_min ≥ u_s. The sketch below plugs these numbers into the two formulas. 2: Application Layer
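Plugging the slide's parameters into the two formulas above (F/u = 1 hour, u_s = 10u, equal client upload rates u, and d_min ≥ u_s so downloads are never the bottleneck) gives the following comparison; the numbers follow directly from the formulas, no additional data is assumed.

# Distribution time vs. number of peers N, in hours, for the slide's parameters:
# F/u = 1 hour, u_s = 10*u, d_min >= u_s (so the download side is never the bottleneck).
def d_client_server(n, f_over_u=1.0, server_speedup=10):
    return max(n * f_over_u / server_speedup, f_over_u / server_speedup)

def d_p2p(n, f_over_u=1.0, server_speedup=10):
    return max(f_over_u / server_speedup,                    # F/u_s
               f_over_u / server_speedup,                    # F/d_min <= F/u_s here
               n * f_over_u / (server_speedup + n))          # N*F/(u_s + N*u)

for n in [1, 10, 100, 1000]:
    print(f"N={n:5d}  client-server: {d_client_server(n):7.2f} h   P2P: {d_p2p(n):5.2f} h")

Client-server time grows linearly (100 h for N = 1000), while the P2P time stays below 1 hour because the peers' aggregate upload capacity grows with N.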

43 Overview
Centralized database: Napster. Query flooding: Gnutella. Hierarchical query flooding: KaZaA. Swarming: BitTorrent. Unstructured overlay routing: Freenet. Structured overlay routing: distributed hash tables. Other. P2P

44 Freenet: History In 1999, I. Clarke started the Freenet project.
Basic idea: employ shortest-path-like routing on the overlay network to publish and locate files. Additional goals: provide anonymity and security; make censorship difficult. P2P

45 Freenet: Overview
Routed queries. Join: on startup, the client contacts a few other nodes it knows about and gets a unique node id. Publish: route the file contents toward the file id; the file is stored at the node with the id closest to the file id. Search: route a query for the file id toward the closest node id (DFS + caching in Freenet vs. BFS + no caching in Gnutella). Fetch: when the query reaches a node containing the file id, that node returns the file to the sender. P2P

46 Freenet: Routing Tables
Each routing-table entry has the form (id, next_hop, file): id is a file identifier (e.g., a hash of the file); next_hop is another node that stores that file id; file is the file identified by id, if stored on the local node. Forwarding of a query for a file id: if the file id is stored locally, stop and forward the data back to the upstream requestor; if not, search for the "closest" id in the table and forward the message to the corresponding next_hop; if the data is not found, a failure is reported back, and the requestor then tries the next-closest match in its routing table. A sketch of this forwarding logic follows. P2P
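A hedged sketch of that forwarding logic (node and field names are illustrative): forward toward the numerically closest known id, report failure upstream if nothing is found, and cache the file along the reverse path, which is what makes the search DFS-like.

class FreenetNode:
    """Toy node routing queries toward the numerically closest known file id."""
    def __init__(self, name):
        self.name = name
        self.table = {}                     # file_id -> (next_hop node, file or None)

    def store(self, file_id, data):
        self.table[file_id] = (None, data)  # file held locally

    def query(self, file_id, visited=None):
        visited = visited or set()
        visited.add(self.name)
        if file_id in self.table and self.table[file_id][1] is not None:
            return self.table[file_id][1]   # found locally
        # try next hops in order of closeness to the requested id
        for fid in sorted(self.table, key=lambda f: abs(f - file_id)):
            nxt = self.table[fid][0]
            if nxt is not None and nxt.name not in visited:
                data = nxt.query(file_id, visited)
                if data is not None:
                    self.table[file_id] = (nxt, data)   # cache on the reverse path
                    return data
        return None                          # failure reported upstream

# Usage: n1 -> n2 -> n3, file id 9 stored at n3; n1's query is routed and cached.
n1, n2, n3 = FreenetNode("n1"), FreenetNode("n2"), FreenetNode("n3")
n3.store(9, "file-9 contents")
n2.table[10] = (n3, None)                    # n2 knows n3 holds ids near 10
n1.table[12] = (n2, None)                    # n1 knows n2 holds ids near 12
print(n1.query(9))                           # 'file-9 contents', now cached at n1 and n2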

47 Freenet: Routing example
A query(10) is routed hop by hop toward the node whose table entry is numerically closest to id 10, caching the result on the way back; contrast DFS + caching in Freenet with BFS + no caching in Gnutella. (Figure: routing-table snippets (id, next_hop, file) at nodes n1–n6.) P2P

48 Freenet: Routing Properties
"Close" file ids tend to be stored on the same node. Why? Publications of similar file ids route toward the same place. The network tends to be a "small world": a small number of nodes have a large number of neighbors (i.e., ~ "six degrees of separation"). Consequence: most queries only traverse a small number of hops to find the file (cf. also rule). P2P

49 Freenet: Discussion
Pros: intelligent routing makes queries relatively short; the search scope is small (only nodes along the search path are involved), with no flooding. Cons: still no provable guarantees! +/- anonymity: anonymity properties may give you "plausible deniability", but they also make the system hard to measure and debug. P2P

50 Overview
Centralized database: Napster. Query flooding: Gnutella. Hierarchical query flooding: KaZaA. Swarming: BitTorrent. Unstructured overlay routing: Freenet. Structured overlay routing: distributed hash tables. Other. P2P

51 DHT: History In , academic researchers said "we want to play too!" Motivation: frustrated by the popularity of all these "half-baked" P2P apps :) We can do better! (so we said): guaranteed lookup success for files in the system, provable bounds on search time, provable scalability to millions of nodes. A hot topic in networking ever since. P2P

52 Motivation How do we find data in a distributed file-sharing system? A publisher holds Key="LetItBe", Value=MP3 data; a client issues Lookup("LetItBe"). Lookup is a key problem. (Figure: nodes N1–N5 connected over the Internet.) P2P

53 Centralized Solution
A central server (Napster): the publisher inserts (Key="LetItBe", Value=MP3 data) into the DB; the client sends Lookup("LetItBe") to the server. Requires O(M) state at the server and is a single point of failure. (Figure: nodes N1–N5 around a central DB.) P2P

54 Distributed Solution (1)
Flooding (Gnutella, Morpheus, etc.): the client's Lookup("LetItBe") is flooded through the overlay until it reaches the publisher of Key="LetItBe". Worst case: O(N) messages per lookup. (Figure: nodes N1–N5.) P2P

55 Distributed Solution (2)
Routed messages (Freenet, Tapestry, Chord, CAN, etc.): the client's Lookup("LetItBe") is routed toward the node responsible for the key. Only exact matches are supported. (Figure: nodes N1–N5.) P2P

56 Routing Challenges Define a useful key-nearness metric.
Keep the hop count small. Keep the routing tables at the "right size". Stay robust despite rapid changes in membership. P2P

57 DHT: Overview Abstraction: a distributed "hash table" (DHT) data structure: put(id, item); item = get(id). Implementation: the nodes in the system form a distributed data structure, which can be a ring, tree, hypercube, skip list, butterfly network, ... P2P

58 DHT: Overview (2)
Structured overlay routing. Join: on startup, contact a "bootstrap" node and integrate yourself into the distributed data structure; get a node id. Publish: route the publication for a file id toward a close node id along the data structure. Search: route a query for the file id toward a close node id; the data structure guarantees that the query will meet the publication. Fetch: two options: the publication contains the actual file, so you fetch it from where the query stops; or the publication says "I have file X", so the query tells you which node has X, and you use IP routing to get X from that node. P2P

59 Chord Overview Provides a peer-to-peer hash lookup service:
Lookup(key) → IP address. How does Chord locate a node? How does Chord maintain routing tables? How does Chord cope with changes in membership? P2P

60 Chord IDs
m-bit identifier space for both keys and nodes. Key identifier = SHA-1(key), e.g., Key="LetItBe" → ID=60. Node identifier = SHA-1(IP address), e.g., a node's IP address → ID=123. Both are uniformly distributed. How to map key IDs to node IDs? P2P

61 Consistent Hashing [Karger 97]
A key is stored at its successor: the node with the next-higher ID on the circular 7-bit ID space. (Figure: nodes N32, N90, N123 and keys K5, K20, K60, K101 on the ring; Key="LetItBe" hashes to K60, which is stored at N90.) A sketch of the successor rule follows. P2P
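A minimal sketch of the successor rule, assuming a small 7-bit id space as in the figure (real Chord uses the full 160-bit SHA-1 output): hash keys and node addresses into the same circular space and store each key at the first node whose id is equal to or follows it.

import hashlib
from bisect import bisect_left

M = 7                                         # 7-bit circular id space, as in the figure
def chord_id(s):
    """Map a key or a node address into the id space via SHA-1 (truncated to M bits)."""
    return int(hashlib.sha1(s.encode()).hexdigest(), 16) % (2 ** M)

def successor(key_id, node_ids):
    """A key is stored at its successor: the first node id >= the key id, wrapping around."""
    ids = sorted(node_ids)
    return ids[bisect_left(ids, key_id) % len(ids)]

# Usage with three hypothetical node addresses.
nodes = {chord_id(a): a for a in ["10.0.0.1", "10.0.0.2", "10.0.0.3"]}
k = chord_id("LetItBe")
home = successor(k, nodes)
print(f"key 'LetItBe' -> id {k}, stored at node id {home} ({nodes[home]})")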

62 Consistent Hashing
If every node knows of every other node: this requires global information; routing tables are large, O(N); but lookups are fast, O(1). (Figure: N10 asks "Where is 'LetItBe'?", Hash("LetItBe") = K60, and the answer "N90 has K60" comes back directly.) P2P

63 Chord: Basic Lookup
If every node knows only its successor in the ring, the query "Where is 'LetItBe'?" (Hash("LetItBe") = K60) is forwarded around the ring until it reaches N90, which holds K60. Correct, but requires O(N) time. P2P

64 "Finger Tables"
Every node knows m other nodes in the ring, at exponentially increasing distances. (Figure: fingers of N80 pointing toward N96, N112, N16, ...) P2P

65 "Finger Tables" Finger i points to the successor of n + 2^i.
(Figure: finger pointers from N80 on a ring containing N96, N112, N120, N16.) P2P

66 Lookups are Faster
Lookups take O(log N) hops and resemble binary search. (Figure: a Lookup(K19) traced across the ring, with nodes including N60 and N80.) A sketch of the finger-based lookup follows. P2P
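A toy simulation of the finger-based lookup (node ids and helper names are illustrative): from the current node, jump to the farthest finger that does not overshoot the key, which in expectation halves the remaining distance on each hop.

M = 7                                         # id space 0 .. 2^M - 1

def in_interval(x, a, b):
    """True if x lies in the half-open ring interval (a, b]."""
    return (a < x <= b) if a < b else (x > a or x <= b)

def finger_table(n, node_ids):
    """Finger i of node n points to successor(n + 2^i), for i = 0 .. M-1."""
    ids = sorted(node_ids)
    def succ(k):
        k = k % 2 ** M
        return next((x for x in ids if x >= k), ids[0])
    return [succ(n + 2 ** i) for i in range(M)]

def lookup(start, key, fingers):
    """Follow fingers greedily; return (node storing key, hop count)."""
    n, hops = start, 0
    while not in_interval(key, n, fingers[n][0]):          # finger 0 = successor(n)
        # jump to the farthest finger that does not pass the key
        n = next(f for f in reversed(fingers[n])
                 if f != n and in_interval(f, n, key))
        hops += 1
    return fingers[n][0], hops

# Usage on a small ring of node ids.
node_ids = [5, 20, 36, 40, 60, 80, 99, 112]
fingers = {n: finger_table(n, node_ids) for n in node_ids}
print(lookup(start=80, key=19, fingers=fingers))           # (20, 2): key 19 lives at node 20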

67 Chord properties Efficient: O(log N) messages per lookup, where N is the total number of servers.
Scalable: O(log N) state per node. Robust: survives massive changes in membership. Proofs are in the paper / tech report, assuming no malicious participants. P2P

68 DHT: Chord Summary Routing table size? Routing time?
Routing table size: log N fingers. Routing time: each hop is expected to halve the distance to the desired id, so we expect O(log N) hops (a one-line version of the argument follows). P2P
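The halving argument written out, assuming N nodes spread roughly uniformly over a 2^m-point id space:

% Each hop at least halves the remaining clockwise distance to the target id,
% so after h hops at most 2^m / 2^h of the ring remains to be covered. The
% expected spacing between consecutive nodes is 2^m / N, so the lookup reaches
% the node responsible for the key once
\[
  \frac{2^{m}}{2^{h}} \;\le\; \frac{2^{m}}{N}
  \quad\Longleftrightarrow\quad
  h \;\ge\; \log_2 N ,
\]
% giving the expected O(\log N) hop count.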

69 DHT: Discussion
Pros: guaranteed lookup; O(log N) per-node state and search scope. Cons: no one uses them? (only one file-sharing app); supporting non-exact-match search is hard. P2P

70 Overview
Centralized database: Napster. Query flooding: Gnutella. Hierarchical query flooding: KaZaA. Swarming: BitTorrent. Unstructured overlay routing: Freenet. Structured overlay routing: distributed hash tables. Other. P2P

71 Other P2P networks Content-Addressable Network (CAN): topological routing over a k-dimensional grid. Tapestry, Pastry, DKS: ring organization like Chord, with a different measure of key closeness and a k-ary rather than binary search paradigm. Viceroy: butterfly-type overlay network. .... for sharing computational resources... Skype P2P

72 P2P Case study: Skype
Inherently P2P: pairs of users communicate. Proprietary application-layer protocol (inferred via reverse engineering). Hierarchical overlay with supernodes (SNs); an index maps usernames to IP addresses and is distributed over the SNs. (Figure: Skype clients (SC), supernodes (SN), Skype login server.) 2: Application Layer

73 Peers as relays Problem: both Alice and Bob are behind "NATs".
A NAT prevents an outside peer from initiating a call to a peer inside it. Solution: using Alice's and Bob's SNs, a relay is chosen; each peer initiates a session with the relay; the peers can now communicate through their NATs via the relay. 2: Application Layer

74 P2P: Summary
Many different styles of organization (data, routing), with pros and cons of each. Lessons learned: single points of failure are bad; flooding messages to everyone is bad; the underlying network topology is important; not all nodes are equal; incentives are needed to discourage freeloading; privacy and security are important; structure can provide bounds and guarantees. Observe: this is application-layer networking (sublayers...). P2P

75 Bibliography
Ion Stoica, Robert Morris, David Karger, Frans Kaashoek, Hari Balakrishnan. Chord: A Scalable Peer-To-Peer Lookup Service for Internet Applications. Proceedings of ACM SIGCOMM, 2001.
Sylvia Ratnasamy, Paul Francis, Mark Handley, Richard Karp, Scott Shenker. A Scalable Content-Addressable Network. Proceedings of ACM SIGCOMM, 2001.
M. A. Jovanovic, F. S. Annexstein, and K. A. Berman. Scalability Issues in Large Peer-to-Peer Networks - A Case Study of Gnutella. University of Cincinnati, Laboratory for Networks and Applied Graph Theory, 2001.
Frank Dabek, Emma Brunskill, M. Frans Kaashoek, David Karger, Robert Morris, Ion Stoica, Hari Balakrishnan. Building Peer-to-Peer Systems With Chord, a Distributed Lookup Service. Proceedings of the 8th Workshop on Hot Topics in Operating Systems (HotOS-VIII), 2001.
Ian Clarke, Oskar Sandberg, Brandon Wiley, and Theodore W. Hong. Freenet: A Distributed Anonymous Information Storage and Retrieval System. Designing Privacy Enhancing Technologies: International Workshop on Design Issues in Anonymity and Unobservability, LNCS, Springer Verlag, 2001.
P2P

76 Some extra notes P2P

77 Chord: Joining the Ring
Three step process: Initialize all fingers of new node Update fingers of existing nodes Transfer keys from successor to new node Less aggressive mechanism (lazy finger update): Initialize only the finger to successor node Periodically verify immediate successor, predecessor Periodically refresh finger table entries P2P

78 Joining the Ring - Step 1 Initialize the new node's finger table.
Locate any node p in the ring; ask node p to look up the fingers of the new node N36; return the results to the new node. (Figure: N36 joins a ring with N5, N20, N40, N60, N80, N99; step 1: Lookup(37, 38, 40, ..., 100, 164).) P2P

79 Joining the Ring - Step 2 Update the fingers of existing nodes.
The new node calls an update function on existing nodes; existing nodes can recursively update the fingers of other nodes. (Figure: ring with N5, N20, N36, N40, N60, N80, N99.) P2P

80 Joining the Ring - Step 3 Transfer keys from the successor node to the new node: only keys in the new node's range are transferred. (Figure: N36 joins next to N40, which stores K30 and K38; the keys in N36's range are copied from N40 to N36.) A sketch follows. P2P
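A small sketch of the key hand-off in step 3, using ids like those in the figure (which keys actually move depends on the new node's range): the new node takes from its successor exactly the keys in (predecessor, new_id].

def in_range(k, a, b):
    """True if key k lies in the ring interval (a, b]."""
    return (a < k <= b) if a < b else (k > a or k <= b)

def transfer_keys(successor_keys, predecessor_id, new_id):
    """Step 3 of joining: the new node takes from its successor exactly the
    keys in (predecessor_id, new_id]; the rest stay where they are."""
    moved = {k: v for k, v in successor_keys.items()
             if in_range(k, predecessor_id, new_id)}
    kept = {k: v for k, v in successor_keys.items() if k not in moved}
    return moved, kept

# N36 joins between N20 and N40; N40 currently stores K30 and K38.
moved, kept = transfer_keys({30: "data30", 38: "data38"},
                            predecessor_id=20, new_id=36)
print(moved)   # {30: 'data30'}  -> now stored at N36
print(kept)    # {38: 'data38'}  -> stays at N40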

81 Chord: Handling Failures
Use successor list Each node knows r immediate successors After failure, will know first live successor Correct successors guarantee correct lookups Guarantee is with some probability Can choose r to make probability of lookup failure arbitrarily small P2P

82 Chord: Evaluation Overview
Quick lookup in large systems Low variation in lookup costs Robust despite massive failure Experiments confirm theoretical results P2P

