Presentation transcript: Computer networks, Lecture 11: Peer to Peer (Prof. Younghee Lee)

Slide 1: Computer networks, Lecture 11: Peer to Peer (Prof. Younghee Lee)

Slide 2: Peer-to-Peer?
- Centralized server
- Distributed server
  - Client-server paradigm
    - Flat: RPC
    - Hierarchical: DNS, mount
  - Peer-to-peer paradigm
    - Each node is both a web client and a transient web server: the easy part
- How does a peer determine which peers have the desired content?
  - Finding connected peers that hold copies of the desired object: the difficult part
    - A dynamic member list makes this harder
  - Pure: Gnutella, Chord
  - Hybrid: Napster, Groove
- Other challenges
  - Scalability: up to hundreds of thousands or millions of machines
  - Dynamicity: machines can come and go at any time

Slide 3: Peer-to-Peer?
- Networks are hard to change, especially the Internet!
- Overlay network => end nodes => easy to change
- What if overlay end nodes act as network nodes?
  - Overlay multicast
  - VoIP
  - File sharing
- The Internet itself was an overlay on the telephone network
- Future Internet
  - "Naming": the key design issue today
  - Querying and data independence: the key to tomorrow?
    - Decouple the application-level API from data organization

Slide 4: Peer-to-Peer?
- Share the resources of individual peers
  - CPU, disk, bandwidth, information, ...
- Communication and collaboration
  - Magi, Groove, Skype
- File sharing
  - Napster, Gnutella, Kazaa, Freenet, Overnet
- P2P applications built over emerging overlays
  - PlanetLab

Slide 5: Peer-to-Peer?
- Distributed computing
  - SETI@Home
    - A scientific experiment that harnesses the power of hundreds of thousands of Internet-connected computers in the Search for Extra-Terrestrial Intelligence
    - The server assigns work units:
      - Computers send machine information to the server
      - The server assigns a task
      - Computers send back results
  - Folding@Home
    - A distributed computing project that studies protein folding, misfolding, aggregation, and related diseases

Slide 6: Overlay Networks
- Virtual edge: a TCP connection, or simply a pointer to an IP address
- Overlay maintenance: ping? messaging? a new edge when a neighbor goes down? ...
- Layered view (bottom to top): TCP/IP; P2P/overlay middleware; P2P applications (DNS, CDN, ALM, ...)

Slide 7: P2P file sharing
- Napster
  - Centralized, sophisticated search
  - Client-server search
  - Point-to-point file transfer
- Gnutella
  - Open source; flooding, TTL, unreachable nodes
- FastTrack (KaZaA)
  - Heterogeneous peers
- Freenet
  - Anonymity, caching, replication

Slide 8: Centralized directory: Napster
- Napster: the first commercial company, for MP3 distribution
- Large-scale server (server farm)
- How to find a file (a minimal index sketch follows below):
  - On startup, a client contacts the central server and reports its list of files
  - Query the index system, which returns a machine that stores the required file
    - Ideally this is the closest/least-loaded machine
  - Fetch the file directly from that peer
- Centralized index
  - Lawsuits
  - Denial of service
- Copyright issues
  - Direct infringement: download/upload
  - Indirect infringement: an individual accountable for the actions of others; contributory
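To make the centralized-directory idea concrete, here is a minimal sketch (not Napster's actual protocol) of an index that maps file names to the peers advertising them; the class name, method names, and addresses are illustrative assumptions.

```python
# Minimal sketch of a Napster-style centralized index (illustrative only).
# Peers register the files they hold; clients query the index and then
# fetch the file directly from one of the returned peers.

class CentralIndex:
    def __init__(self):
        self.index = {}            # file name -> set of peer addresses

    def register(self, peer, files):
        """Called by a peer on startup: report the list of files it stores."""
        for name in files:
            self.index.setdefault(name, set()).add(peer)

    def unregister(self, peer):
        """Remove a disconnected peer from every entry (keeps the database current)."""
        for peers in self.index.values():
            peers.discard(peer)

    def query(self, name):
        """Return the peers that claim to hold the file, or an empty set."""
        return self.index.get(name, set())


if __name__ == "__main__":
    idx = CentralIndex()
    idx.register("10.0.0.5:6699", ["song.mp3", "other.mp3"])
    idx.register("10.0.0.7:6699", ["song.mp3"])
    print(idx.query("song.mp3"))   # the client then transfers the file peer-to-peer
```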

Slide 9: Centralized Lookup (Napster)
[Figure: a publisher informs and updates the directory with SetLoc("title", N4); a client issues Lookup("title") to the DB and is pointed at the node holding key = "title", value = MP3 data]
- Simple, but O(N) state at the server
- Single point of failure
- Performance bottleneck
- Copyright infringement
- To keep its database current, the directory server can determine when peers become disconnected:
  - Send messages to the peers periodically, or
  - Keep a permanent TCP connection with each connected peer

Slide 10: Decentralized directory: Flooding (1)
- Gnutella
  - Distributed file location
  - Idea: flood the request (a minimal flooding sketch follows below)
  - How to find a file:
    - Send the request to all neighbors
    - Neighbors recursively forward the request
    - A machine that has the file receives the request and sends back the answer
    - Transfers are done with HTTP between peers
  - Advantages:
    - Totally decentralized, highly robust
  - Disadvantages:
    - Not scalable; the entire network can be swamped with requests (to alleviate this, each request carries a TTL, giving a limited-scope query)
    - Worst case O(N) messages per lookup
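A minimal sketch of Gnutella-style query flooding with a TTL and duplicate suppression. The Node class, neighbor lists, and message IDs are illustrative assumptions, not the real Gnutella wire format.

```python
# Sketch of flooding with a TTL and loop detection (illustrative only;
# real Gnutella uses descriptor headers over TCP connections).
import uuid

class Node:
    def __init__(self, name, files):
        self.name = name
        self.files = set(files)
        self.neighbors = []        # other Node objects
        self.seen = set()          # message IDs already handled (loop detection)

    def query(self, filename, ttl=4):
        """Start a flood search for filename; returns names of nodes holding it."""
        return self._flood(filename, ttl, msg_id=uuid.uuid4().hex)

    def _flood(self, filename, ttl, msg_id):
        if msg_id in self.seen or ttl == 0:
            return []                          # drop duplicates and expired queries
        self.seen.add(msg_id)
        hits = [self.name] if filename in self.files else []
        for nbr in self.neighbors:             # forward the request to every neighbor
            hits += nbr._flood(filename, ttl - 1, msg_id)
        return hits                            # answers travel back toward the origin
```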

Slide 11: Decentralized directory: Flooding (2)
- FastTrack (aka KaZaA)
  - Modifies the Gnutella protocol into a two-level hierarchy
    - A hybrid of Gnutella and Napster
  - Group leader: super node
    - Nodes that have better connections to the Internet
    - Act as temporary directory servers for the other nodes in their group
    - Maintain a database mapping names of content to the IP addresses of group members
    - Not a dedicated server; an ordinary peer
  - Bootstrapping node
    - A peer that wants to join the network contacts this node
    - This node may designate the joining peer as a new group leader or direct it to an existing one
  - Standard nodes: ordinary nodes
    - Connect to super nodes and report their list of files
    - Allows slower nodes to participate
  - Broadcast (Gnutella-style) search across group leader peers; query flooding
  - Drawbacks
    - Fairly complex protocol to construct and maintain the overlay network
    - Group leaders have more responsibility; not truly decentralized
    - Still not purely serverless (the bootstrapping node is an "always up" server)
[Figure: overlay peers and group leader peers, with neighboring relationships in the overlay network]
- KaZaA metadata
  - File name
  - File size
  - Content hash
  - File descriptors: used for keyword matches during queries

Slide 12: Gossip protocols
- Epidemic algorithms
- Originally targeted at database replication
  - Rumor mongering
    - Propagate a newly received update to k random neighbors (a sketch follows below)
- Extended to routing
  - Rumor mongering of queries instead of flooding
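A minimal sketch of rumor mongering: each node that learns an update forwards it to k randomly chosen neighbors. The network representation, fan-out k, and example topology are illustrative assumptions.

```python
# Sketch of rumor mongering (gossip): propagate a newly received update to
# k random neighbors instead of flooding every neighbor. Illustrative only.
import random

def gossip(node, update, network, k=3, seen=None):
    """network maps node -> list of neighbors; seen tracks nodes that have the update."""
    if seen is None:
        seen = set()
    if node in seen:
        return seen                      # this node already holds the update
    seen.add(node)                       # "infect" the node with the update
    neighbors = network.get(node, [])
    for nbr in random.sample(neighbors, min(k, len(neighbors))):
        gossip(nbr, update, network, k, seen)
    return seen

if __name__ == "__main__":
    # A small random network; most nodes learn the update within a few rounds.
    net = {i: random.sample([j for j in range(20) if j != i], 5) for i in range(20)}
    print(sorted(gossip(0, "row-42-changed", net)))
```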

Slide 13: Hierarchical Networks
- IP
  - Hierarchical routing
- DNS
  - Hierarchical name space
    - Clients plus a hierarchy of servers
- Pros and cons of hierarchical data management
  - Works well for things aligned with the hierarchy
    - Physical locality
  - Inflexible: no data independence
- A Layered Naming Architecture for the Internet [Balak+ 04]
  - Three levels of name resolution, to support mobility, multihoming, middleboxes (NAT, firewall), ...
    - From user-level descriptors to service identifiers (SIDs)
    - From SID to endpoint identifier (EID)
    - From EID to IP address: DNS
  - Flat names for SIDs and EIDs
  - Scalable resolution for a flat architecture? => DHT

Slide 14: Commercial products
- JXTA
  - Java/XML framework for P2P applications
  - Name resolution and routing are done with floods and superpeers
- MS WinXP P2P networking
  - An unstructured overlay, flooded publication and caching
  - "Does not yet support distributed searches"
- Security support
  - Authentication via signatures (assumes a trusted authority)
  - Encryption of traffic
- Groove
  - Platform for a P2P "experience"; MS collaboration tools
    - Microsoft Office 2007
  - Client-server name resolution, backup services, etc.

Slide 15: Routed lookups: Freenet, Chord, CAN
[Figure: a publisher stores key = "title", value = MP3 data at one node; a client's Lookup("title") is routed hop by hop through intermediate nodes (N1 ... N9) to the node holding the key]

Slide 16: Routing: Freenet
- Additional goals beyond file location:
  - Provide publisher anonymity and security
  - Resistance to attacks: a third party should not be able to deny access to a particular file (data item, object), even if it compromises a large fraction of machines
- Architecture:
  - Each file is identified by a unique identifier
  - Each machine stores a set of files and maintains a "routing table" to route individual requests
- Files are stored according to their associated key (the unique identifier)
  - Core idea: try to cluster information about similar keys
- Messages
  - A random 64-bit ID is used for loop detection

Slide 17: Routing: Freenet Routing Tables
- Each node maintains a common stack with entries of the form (id, next_hop, file):
  - id: file identifier
  - next_hop: another node that stores the file id
  - file: the file identified by id, if it is stored on the local node
- Forwarding of a query for file id (a sketch follows below):
  - If file id is stored locally, stop
    - Forward the data back to the upstream requestor
    - The requestor adds the file to its cache and adds an entry to its routing table
  - If not, search for the "closest" id in the stack and forward the message to the corresponding next_hop
  - If the data is not found, a failure is reported back
    - The requestor then tries the next closest match in its routing table
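A minimal sketch of this forwarding rule: answer locally if possible, otherwise route toward the routing-table entry whose key is closest to the requested id, caching the result on the reverse path. The integer keys, numeric distance metric, and omission of backtracking to the next-closest match are simplifying assumptions, not Freenet's actual implementation.

```python
# Sketch of Freenet-style forwarding: serve locally if possible, otherwise
# forward toward the routing-table entry with the closest key, and cache
# the returned file on the reverse path. Illustrative simplification.

class Node:
    def __init__(self):
        self.files = {}            # id -> file contents stored locally
        self.routing_table = {}    # id -> neighbor Node believed to store it

def handle_query(node, file_id, hops_left=10):
    if hops_left == 0:
        return None                              # give up (failure reported back)
    if file_id in node.files:                    # stored locally: stop and reply
        return node.files[file_id]
    if not node.routing_table:                   # nothing to route toward
        return None
    # Pick the entry whose id is closest to the requested id.
    closest_id = min(node.routing_table, key=lambda k: abs(k - file_id))
    next_hop = node.routing_table[closest_id]
    data = handle_query(next_hop, file_id, hops_left - 1)
    if data is not None:
        node.files[file_id] = data               # cache on the reverse path
        node.routing_table[file_id] = next_hop   # learn a route for this id
    return data
```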

Slide 18: Query
- API: file = query(id)
- Notes:
  - Any node forwarding a reply may change the source of the reply (to itself or to any other node)
    - Helps anonymity
  - Each query carries a TTL that is decremented each time the query message is forwarded
    - To obscure the distance to the originator:
      - The TTL can be initialized to a random value within some bounds
      - When TTL = 1, the query is still forwarded with a finite probability
  - Depth counter
    - The opposite of the TTL: incremented with each hop
    - Initialized to a small random value
  - Each node maintains state for all outstanding queries that have traversed it, which helps avoid cycles
  - When the file is returned, it is cached along the reverse path

Slide 19: Routing: Freenet Example
[Figure: a query(10) is routed hop by hop (steps 1, 2, 3, 4, 4', 5) through nodes n1 ... n6 using each node's (id, next_hop, file) entries until the node holding file f10 is reached; the figure does not show file caching on the reverse path]

Slide 20: Insert
- API: insert(id, file)
- Two steps:
  - Search for the file to be inserted
  - If it is not found, insert the file
- Searching:
  - Like a query, but nodes maintain state after a collision is detected and the reply is sent back to the originator
- Insertion:
  - Follow the forward path; insert the file at all nodes along the path
  - A node probabilistically replaces the originator with itself, obscuring the true originator

Slide 21: Cache Management
- LRU (Least Recently Used) cache of files (a sketch follows below)
  - Files are not guaranteed to live forever
    - Files "fade away" as fewer requests are made for them
- File contents can be encrypted with the original text name as the key (id)
  - Cache owners know neither the original name nor the contents, so they cannot be held responsible
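A minimal sketch of a bounded LRU file cache of the kind described above, using Python's OrderedDict; the capacity and naming are illustrative assumptions.

```python
# Sketch of a bounded LRU cache: recently requested files stay, rarely
# requested ones "fade away" when capacity is exceeded. Illustrative only.
from collections import OrderedDict

class LRUFileCache:
    def __init__(self, capacity=1000):
        self.capacity = capacity
        self.store = OrderedDict()           # file id -> contents, oldest first

    def get(self, file_id):
        if file_id not in self.store:
            return None
        self.store.move_to_end(file_id)      # mark as most recently used
        return self.store[file_id]

    def put(self, file_id, contents):
        self.store[file_id] = contents
        self.store.move_to_end(file_id)
        if len(self.store) > self.capacity:
            self.store.popitem(last=False)   # evict the least recently used file
```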

Slide 22: Freenet Summary
- Advantages
  - Provides publisher anonymity
  - Totally decentralized architecture, hence robust and scalable
  - Resistant against malicious file deletion
- Disadvantages
  - Does not always guarantee that a file is found, even if the file is in the network

Slide 23: Routing: Structured Approaches
- Goal: make sure that an identified item (file) is always found in a reasonable number of steps
- Abstraction: a distributed hash table (DHT) data structure
  - insert(id, item)
  - item = query(id)
  - Note: the item can be anything: a data object, document, file, pointer to a file, ...
- Proposals
  - CAN (ICIR/Berkeley)
  - Chord (MIT/Berkeley)
  - Pastry (Rice)
  - Tapestry (Berkeley)

Slide 24: High-level idea: Indirection
- Indirection in space
  - Logical, content-based IDs
    - Routing to those IDs
    - "Content addressable" network
  - Tolerant of nodes joining and leaving the network
- Indirection in time
  - A scheme to temporally decouple send and receive
  - Soft state
    - The "publisher" requests a TTL on storage
- Distributed hash table
  - Directed search

Slide 25: Distributed Hash Table (DHT)
- Hash table
  - A data structure that maps "keys" to "values"
- DHT
  - A hash table, but spread across the Internet
- Interface (a toy sketch follows below)
  - insert(key, value)
  - lookup(key)
- Every DHT node supports a single operation:
  - Given a key as input, route messages toward the node holding that key
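A toy sketch of the interface only: a hash of the key selects which of a fixed set of nodes stores the value. Real DHTs route toward the responsible node hop by hop and cope with membership changes; the naive placement and node names here are illustrative assumptions.

```python
# Toy sketch of the DHT interface: a hash of the key selects the node that
# stores the value. Real DHTs route toward this node instead of computing it
# from a global view, and they handle nodes joining and leaving.
import hashlib

class ToyDHT:
    def __init__(self, node_names):
        self.nodes = {name: {} for name in node_names}    # node -> local hash table

    def _responsible_node(self, key):
        h = int(hashlib.sha1(key.encode()).hexdigest(), 16)
        names = sorted(self.nodes)
        return names[h % len(names)]                       # naive placement by hash

    def insert(self, key, value):
        self.nodes[self._responsible_node(key)][key] = value

    def lookup(self, key):
        return self.nodes[self._responsible_node(key)].get(key)

if __name__ == "__main__":
    dht = ToyDHT(["n1", "n2", "n3", "n4"])
    dht.insert("title", "MP3 data ...")
    print(dht.lookup("title"))
```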

Slide 26: Distributed Hash Table (DHT) [figure only]

Slide 27: DHT in action: put()
- Operation: take the key as input; route messages to the node holding the key
[Figure: insert(K1, V1) is routed through the overlay to the node responsible for K1, where the pair (K1, V1) is stored]

Slide 28: DHT in action: get()
- Operation: take the key as input; route messages to the node holding the key
[Figure: retrieve(K1) is routed to the node holding K1, which returns the stored value]

Slide 29: Routing: Chord
- Associate to each node and item a unique id in a one-dimensional space
- Goals
  - Scale to hundreds of thousands of nodes
  - Handle rapid arrival and failure of nodes
- Properties
  - Routing table size O(log N), where N is the total number of nodes
  - Guarantees that a file is found in O(log N) steps

Slide 30: Aside: Consistent Hashing [Karger 97]
- A key is stored at its successor: the node with the next-higher ID
- This is designed to let nodes enter and leave the network with minimal disruption (a sketch follows below)
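A minimal sketch of consistent hashing on an identifier ring: each key is stored at its successor, the first node ID at or after the key, wrapping around the ring. The ring size M and the use of SHA-1 are illustrative assumptions.

```python
# Sketch of consistent hashing: node and key IDs live on a ring of size 2^M,
# and a key is stored at its successor (the first node ID >= key, wrapping).
import bisect
import hashlib

M = 16                                 # identifier space 0 .. 2^16 - 1 (illustrative)

def ident(name):
    return int(hashlib.sha1(name.encode()).hexdigest(), 16) % (2 ** M)

def successor(node_ids, key_id):
    """Return the node ID responsible for key_id on the sorted ring node_ids."""
    i = bisect.bisect_left(node_ids, key_id)
    return node_ids[i % len(node_ids)]           # wrap around past the largest ID

if __name__ == "__main__":
    nodes = sorted(ident(n) for n in ["n1", "n2", "n3", "n4"])
    key = ident("title")
    print(f"key {key} is stored at node {successor(nodes, key)}")
    # When a node joins or leaves, only the keys between it and its neighbor move.
```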

Slide 31: Routing: Chord Basic Lookup [figure only]

Slide 32: Routing: Finger table, faster lookups [figure only]

Slide 33: Routing: join operation [figure only]

Slide 34: Routing: join operation
[Figure: the ring before and after node 6 joins]

Slide 35: Routing: Chord Summary
- Assume the identifier space is 0 .. 2^m - 1
- Each node maintains
  - A finger table (a construction sketch follows below)
    - Entry i in the finger table of node n is the first node that succeeds or equals n + 2^i
  - Its predecessor node
- An item identified by id is stored on the successor node of id
- Pastry
  - Similar to Chord
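A minimal sketch of building a finger table from a known set of node IDs using the successor rule above; real Chord discovers and maintains these entries through its protocol rather than computing them from a global view of the ring.

```python
# Sketch of a Chord finger table: entry i of node n points to the first node
# whose ID succeeds or equals (n + 2^i) mod 2^m. Computed here from a global
# view of the node IDs for illustration; real Chord learns them incrementally.
import bisect

def finger_table(n, node_ids, m):
    ring = sorted(node_ids)
    table = []
    for i in range(m):
        start = (n + 2 ** i) % (2 ** m)
        j = bisect.bisect_left(ring, start)
        table.append(ring[j % len(ring)])        # successor of start on the ring
    return table

if __name__ == "__main__":
    m = 3                                        # identifier space 0..7, as in the example slides
    nodes = [0, 1, 3, 6]
    # Finger targets for node 1 are 2, 3, and 5; print their successors on the ring.
    print(finger_table(1, nodes, m))
```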

Slide 36: Routing: Chord Example
- Assume an identifier space 0..7 (eight identifiers, m = 3)
- Node n1 (id 1) joins; all entries in its finger table are initialized to itself
[Figure: the ring 0..7; node 1's table has targets id + 2^i for i = 0, 1, 2 (that is, 2, 3, and 5), all pointing to node 1]

Slide 37: Routing: Chord Example
- Node n2 (id 3) joins
[Figure: the ring 0..7 with the finger/successor tables of both nodes after the join]

Slide 38: Routing: Chord Example
- Nodes n3 (id 0) and n4 (id 6) join
[Figure: the ring 0..7 with the finger/successor tables of all four nodes]

Slide 39: Routing: Chord Examples
- Nodes: n1 (id 1), n2 (id 3), n3 (id 0), n4 (id 6)
- Items: f1 (id 7), f2 (id 2)
[Figure: each item is stored at the successor node of its id; finger/successor tables as on the previous slide]

Slide 40: Routing: Query
- Upon receiving a query for item id, a node:
  - Checks whether it stores the item locally
  - If not, forwards the query to the largest node in its successor table that does not exceed id
[Figure: query(7) is forwarded around the ring according to this rule until it reaches the node responsible for id 7]

Slide 41: CAN
- Associate to each node and item a unique id in a d-dimensional space (a key-to-point sketch follows below)
  - A virtual Cartesian coordinate space
- The entire space is partitioned amongst all the nodes
  - Every node "owns" a zone in the overall space
- Abstraction
  - Can store data at "points" in the space
  - Can route from one "point" to another
  - Point = the node that owns the enclosing zone
- Properties
  - Routing table size O(d)
  - Guarantees that a file is found in at most d*n^(1/d) steps, where n is the total number of nodes
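A minimal sketch of CAN's core mapping in two dimensions: a key hashes to a point, and the node whose zone contains that point stores the item. The hash-to-coordinate construction, the rectangle representation of zones, and the example partition are illustrative assumptions.

```python
# Sketch of CAN's core idea for d = 2: a key hashes to a point (x, y), and the
# node whose zone contains that point stores the item. Illustrative only.
import hashlib

SPACE = 8                                    # an 8 x 8 coordinate space, as in the examples

def key_to_point(key):
    h = hashlib.sha1(key.encode()).digest()
    return (h[0] % SPACE, h[1] % SPACE)      # split the hash into two coordinates

def owner(point, zones):
    """zones: node -> (x_lo, x_hi, y_lo, y_hi), half-open rectangles covering the space."""
    x, y = point
    for node, (x_lo, x_hi, y_lo, y_hi) in zones.items():
        if x_lo <= x < x_hi and y_lo <= y < y_hi:
            return node
    return None

if __name__ == "__main__":
    zones = {"n1": (0, 4, 0, 8), "n2": (4, 8, 0, 4), "n3": (4, 8, 4, 8)}
    print(owner(key_to_point("f1"), zones))  # the node that would store item f1
```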

Slide 42: CAN, e.g., a Two-Dimensional Space
- The space is divided between the nodes
- Together, all nodes cover the entire space
- Each node covers either a square or a rectangle with aspect ratio 1:2 or 2:1
- Example:
  - Assume a space of size 8 x 8
  - Node n1 (1, 2) is the first node to join, so it covers the entire space
[Figure: the 8 x 8 grid owned entirely by n1]

Slide 43: CAN, e.g., a Two-Dimensional Space
- Node n2 (4, 2) joins; the space is divided between n1 and n2
[Figure: the 8 x 8 grid split between n1 and n2]

Slide 44: CAN, e.g., a Two-Dimensional Space
- Node n3 (3, 5) joins; the zone containing its point is split between n1 and n3
[Figure: the 8 x 8 grid now divided among n1, n2, and n3]

Slide 45: CAN, e.g., a Two-Dimensional Space
- Nodes n4 (5, 5) and n5 (6, 6) join
[Figure: the 8 x 8 grid divided among n1 through n5]

Slide 46: CAN, e.g., a Two-Dimensional Space
- Nodes: n1 (1, 2); n2 (4, 2); n3 (3, 5); n4 (5, 5); n5 (6, 6)
- Items: f1 (2, 3); f2 (5, 1); f3 (2, 1); f4 (7, 5)
[Figure: the nodes' zones with the items placed at their coordinates]

Slide 47: CAN, e.g., a Two-Dimensional Space
- Each item is stored by the node that owns the zone containing the item's point in the space
[Figure: each item placed inside the zone of the node that stores it]

Slide 48: CAN: Query Example
- Each node knows its neighbors in the d-dimensional space
- Forward the query to the neighbor that is closest to the query id (a greedy-forwarding sketch follows below)
- Example: assume n1 queries f4
[Figure: the query travels greedily across zones from n1 toward f4's coordinates]
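A minimal sketch of CAN's greedy forwarding: each hop moves to whichever neighbor is closest (here, by Euclidean distance between zone centres) to the target point. Representing nodes by zone centres and supplying a precomputed neighbor map are illustrative simplifications.

```python
# Sketch of CAN greedy routing: repeatedly forward to the neighbor whose zone
# centre is closest to the target point, stopping when no neighbor is closer.
import math

def dist(p, q):
    return math.hypot(p[0] - q[0], p[1] - q[1])

def route(start, target, centres, neighbors, max_hops=32):
    """centres: node -> zone centre point; neighbors: node -> list of adjacent nodes."""
    node = start
    for _ in range(max_hops):
        best = min(neighbors[node], key=lambda nbr: dist(centres[nbr], target),
                   default=node)
        if dist(centres[best], target) >= dist(centres[node], target):
            return node                       # no neighbor is closer: deliver here
        node = best
    return node
```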

Slide 49: CAN: Query Example
- Each node knows its neighbors in the d-dimensional space
- Forward the query to the neighbor that is closest to the query id
- Example: assume n1 queries f4
- CAN can route around some failures
[Figure: an alternative path around a failed zone]

Slide 50: Node Failure Recovery
- Simple failures
  - Know your neighbor's neighbors
  - When a node fails, one of its neighbors takes over its zone
- More complex failure modes
  - Simultaneous failure of multiple adjacent nodes
  - Scoped flooding to discover neighbors
  - Hopefully a rare event

Slide 51: Routing: Concerns and Optimizations
- Each hop in a routing-based P2P network can be expensive
  - No correlation between overlay neighbors and their physical location
  - A query can repeatedly jump from Europe to North America, even though both the initiator and the node that stores the item are in Europe!
  - Solutions: Tapestry takes care of this implicitly; CAN and Chord maintain multiple copies for each entry in their routing tables and choose the closest in terms of network distance
- CAN/Chord optimizations
  - Weight neighbor nodes by RTT
    - When routing, choose the neighbor that is closer to the destination and has the lowest RTT from me
    - Reduces path latency
  - Multiple physical nodes per virtual node
    - Reduces path length (fewer virtual nodes)
    - Reduces path latency (can choose the physical node of a virtual node with the lowest RTT)
    - Improves fault tolerance (only one node per zone needs to survive to allow routing through the zone)
- What type of lookups?
  - Only exact match!

Slide 52: BitTorrent
- A P2P file sharing system
  - Load sharing through file splitting
  - Uses the bandwidth of peers instead of a server
- Successfully used:
  - Used to distribute Red Hat 9 ISOs (about 80 TB)
- Setup (a sketch of the segment hashing follows below)
  - A "seed" node has the file
  - The file is split into fixed-size segments (typically 256 KB)
  - A hash is calculated for each segment
  - A "tracker" node is associated with the file
  - A ".torrent" meta-file is built for the file; it identifies the address of the tracker node
  - The .torrent file is passed around the web
- The .torrent file info includes: length, name, segment hashes, and the URL of the tracker
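A minimal sketch of the setup step: split a file into fixed-size segments, hash each one, and collect the metadata a .torrent-like file carries. Real .torrent files are bencoded dictionaries; the plain Python dictionary and function name below are illustrative simplifications.

```python
# Sketch of BitTorrent-style setup: split the file into fixed-size segments,
# hash each segment, and record the metadata that a .torrent file carries.
# (Real .torrent files are bencoded; this plain dict is an illustration.)
import hashlib
import os

SEGMENT_SIZE = 256 * 1024          # 256 KB segments, as on the slide

def make_metainfo(path, tracker_url):
    hashes = []
    with open(path, "rb") as f:
        while True:
            segment = f.read(SEGMENT_SIZE)
            if not segment:
                break
            hashes.append(hashlib.sha1(segment).hexdigest())
    return {
        "name": os.path.basename(path),
        "length": os.path.getsize(path),
        "segment_hashes": hashes,   # lets downloaders verify each segment
        "tracker": tracker_url,     # where clients ask for a list of peers
    }
```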

Slide 53: BitTorrent
- Download
  - A client contacts the tracker identified in the .torrent file (using HTTP)
  - The tracker sends the client a (random) list of peers that have or are downloading the file
  - The client contacts the peers on the list to see which segments of the file they have
  - The client requests segments from peers (via TCP)
  - The client uses the hashes from the .torrent to confirm that each segment is legitimate
  - The client reports to the other peers on the list that it has the segment
  - Other peers start to contact the client to get that segment (while the client is fetching other segments)

Slide 54: Conclusions
- Distributed hash tables are a key component of scalable and robust overlay networks
- CAN: O(d) state, O(d*n^(1/d)) routing distance
- Chord: O(log n) state, O(log n) routing distance
- Both can achieve stretch < 2
- Simplicity is key
- Services built on top of distributed hash tables:
  - P2P file storage, i3 (Chord)
  - Multicast (CAN, Tapestry)
  - Persistent storage (OceanStore, using Tapestry)

