UCDavis, ecs251 Spring 2007 05/03/2007P2P1 Operating System Models ecs251 Spring 2007: Operating System Models #3: Peer-to-Peer Systems Dr. S. Felix Wu Computer Science Department University of California, Davis http://www.cs.ucdavis.edu/~wu/ sfelixwu@gmail.com
UCDavis, ecs251 Spring 2007 05/03/2007P2P2 The role of the service provider… l Centralized management of services –DNS, Google, www.cnn.com, Blockbuster, SBC/Sprint/AT&T, cable service, Grid computing, AFS, bank transactions… l Information, computing, & network resources owned by one or very few administrative domains. –Some with an SLA (Service Level Agreement)
UCDavis, ecs251 Spring 2007 05/03/2007P2P3 Interacting with the “SP” l Service providers own the information and the interactions –Some enhance/establish those interactions
UCDavis, ecs251 Spring 2007 05/03/2007P2P4 Let’s compare … l Google l Blockbuster l CNN l MLB/NBA l LinkedIn l eBay l Skype l BitTorrent l Blog l YouTube l BotNet l Cyber-Paparazzi
UCDavis, ecs251 Spring 2007 05/03/2007P2P5 Toward P2P l More participation of the end nodes (or their users) –More decentralized Computing/Network resources available –End-user controllability and interactions –Security/robustness concerns
UCDavis, ecs251 Spring 2007 05/03/2007P2P6 Service Providers in P2P l We might not like SPs, but we still cannot avoid them entirely. –Who is going to lay the fiber and deploy the switches? –Can we avoid DNS? –How can we stop “Cyber-Bullying” and similar abuses? –Copyright enforcement? –Will the Internet become a junkyard?
UCDavis, ecs251 Spring 2007 05/03/2007P2P7 We will discuss… l P2P system examples –Unstructured, structured, incentive l Architectural analysis and issues l Future P2P applications and why?
UCDavis, ecs251 Spring 2007 05/03/2007P2P8 Challenge to you… l Define a new P2P-related application, service, or architecture. l Justify why it is practical, useful and will scale well. –Example: sharing cooking recipes, experiences & recommendations about restaurants and hotels
UCDavis, ecs251 Spring 2007 05/03/2007P2P9 Napster l P2P File sharing l “Unstructured”
UCDavis, ecs251 Spring 2007 05/03/2007P2P10Napster
UCDavis, ecs251 Spring 2007 05/03/2007P2P11 Napster l Advantages? l Disadvantages?
UCDavis, ecs251 Spring 2007 05/03/2007P2P12
UCDavis, ecs251 Spring 2007 05/03/2007P2P13
UCDavis, ecs251 Spring 2007 05/03/2007P2P14 l Originally conceived by Justin Frankel, the 21-year-old founder of Nullsoft l March 2000: Nullsoft posts Gnutella to the web l A day later, AOL removes Gnutella at the behest of Time Warner l The Gnutella protocol version 0.4 http://www9.limewire.com/developer/gnutella_protocol_0.4.pdf and version 0.6 http://rfc-gnutella.sourceforge.net/Proposals/Ultrapeer/Ultrapeers.htm l There are multiple open-source implementations at http://sourceforge.net/, including: –Jtella –Gnucleus l Software released under the GNU Lesser General Public License (LGPL) l The Gnutella protocol has been widely analyzed
UCDavis, ecs251 Spring 2007 05/03/2007P2P15 Gnutella Protocol Messages l Broadcast Messages –Ping: initiating message (“I’m here”) –Query: search pattern and TTL (time-to-live) l Back-Propagated Messages –Pong: reply to a ping, contains information about the peer –Query response: contains information about the computer that has the needed file l Node-to-Node Messages –GET: return the requested file –PUSH: push the file to me
UCDavis, ecs251 Spring 2007 05/03/2007P2P16 (diagram: seven-node overlay; file A is held by nodes 5 and 7) Steps: Node 2 initiates search for file A
UCDavis, ecs251 Spring 2007 05/03/2007P2P17 (diagram: the query for A leaves node 2 on each of its links) Steps: Node 2 initiates search for file A Sends message to all neighbors
UCDavis, ecs251 Spring 2007 05/03/2007P2P18 (diagram: the query for A propagates across the overlay) Steps: Node 2 initiates search for file A Sends message to all neighbors Neighbors forward message
UCDavis, ecs251 Spring 2007 05/03/2007P2P19 (diagram: nodes 5 and 7 generate replies A:5 and A:7) Steps: Node 2 initiates search for file A Sends message to all neighbors Neighbors forward message Nodes that have file A initiate a reply message
UCDavis, ecs251 Spring 2007 05/03/2007P2P20 (diagram: the replies A:5 and A:7 travel back along the query path) Steps: Node 2 initiates search for file A Sends message to all neighbors Neighbors forward message Nodes that have file A initiate a reply message Query reply message is back-propagated
UCDavis, ecs251 Spring 2007 05/03/2007P2P21 (diagram: the replies A:5 and A:7 reach node 2) Steps: Node 2 initiates search for file A Sends message to all neighbors Neighbors forward message Nodes that have file A initiate a reply message Query reply message is back-propagated
UCDavis, ecs251 Spring 2007 05/03/2007P2P22 (diagram: node 2 downloads file A directly from a responding node) Steps: Node 2 initiates search for file A Sends message to all neighbors Neighbors forward message Nodes that have file A initiate a reply message Query reply message is back-propagated File download Note: file transfer between clients behind firewalls is not possible; if only one client, X, is behind a firewall, Y can request that X push the file to Y. Key mechanisms: Limited Scope Flooding & Reverse Path Forwarding
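To make those two mechanisms concrete, here is a minimal, illustrative sketch of limited-scope flooding with reverse-path forwarding of replies (this is not the Gnutella implementation; the Node class, its field names, and the tiny seven-node topology are invented for the example):

```python
import uuid

class Node:
    def __init__(self, node_id, files=None):
        self.id = node_id
        self.neighbors = []            # overlay links to other Node objects
        self.files = set(files or [])
        self.seen = {}                 # query GUID -> neighbor the query arrived from

    def query(self, key, ttl=7):
        """Originate a search: flood a query with a fresh GUID and a limited TTL."""
        guid = uuid.uuid4().hex
        self.seen[guid] = None         # None marks this node as the originator
        hits = []
        for n in self.neighbors:
            n.on_query(guid, key, ttl - 1, came_from=self, hits=hits)
        return hits                    # (node id, key) pairs back-propagated to the source

    def on_query(self, guid, key, ttl, came_from, hits):
        if guid in self.seen:          # duplicate suppression via the GUID
            return
        self.seen[guid] = came_from    # remember the reverse path
        if key in self.files:
            self.on_hit(guid, (self.id, key), hits)
        if ttl > 0:                    # limited-scope flooding
            for n in self.neighbors:
                if n is not came_from:
                    n.on_query(guid, key, ttl - 1, came_from=self, hits=hits)

    def on_hit(self, guid, hit, hits):
        prev = self.seen.get(guid)
        if prev is None:               # we originated the query
            hits.append(hit)
        else:                          # reverse-path forwarding of the reply
            prev.on_hit(guid, hit, hits)

# The example above: node 2 searches for file "A", held by nodes 5 and 7.
nodes = {i: Node(i) for i in range(1, 8)}
nodes[5].files.add("A"); nodes[7].files.add("A")
for a, b in [(1, 2), (2, 3), (2, 4), (3, 5), (4, 6), (4, 7)]:
    nodes[a].neighbors.append(nodes[b]); nodes[b].neighbors.append(nodes[a])
print(nodes[2].query("A"))             # [(5, 'A'), (7, 'A')]
```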
UCDavis, ecs251 Spring 2007 05/03/2007P2P23 Gnutella l Advantages? l Disadvantages?
UCDavis, ecs251 Spring 2007 05/03/2007P2P24 l GUID: short for Globally Unique Identifier, a randomized string that is used to uniquely identify a host or message on the Gnutella network. This prevents duplicate messages from being sent on the network. l GWebCache: a distributed system for helping servants connect to the Gnutella network, thus solving the "bootstrapping" problem. Servants query any of several hundred GWebCache servers to find the addresses of other servants. GWebCache servers are typically web servers running a special module. l Host Catcher: Pong responses allow servants to keep track of active Gnutella hosts. l On most servants, the default port for Gnutella is 6346
UCDavis, ecs251 Spring 2007 05/03/2007P2P25
UCDavis, ecs251 Spring 2007 05/03/2007P2P26 “Limited Scope Flooding” Ripeanu reported that Gnutella traffic totals 1 Gbps (or 330 TB/month). –Compare to 15,000 TB/month in the US Internet backbone (December 2000) –This estimate excludes actual file transfers Reasoning: –QUERY and PING messages are flooded; they form more than 90% of generated traffic –Predominant TTL = 7; >95% of nodes are less than 7 hops away –Measured traffic at each link: about 6 kbps –Network with 50k nodes and 170k links
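As a quick sanity check, the per-link measurement and the network size quoted above roughly reproduce the aggregate figure (a back-of-the-envelope calculation, not taken from Ripeanu's paper):

```python
links = 170_000            # measured overlay links
per_link_bps = 6_000       # ~6 kbps of query/ping traffic per link
total_bps = links * per_link_bps
print(total_bps / 1e9)     # ~1.02 Gbps, consistent with the ~1 Gbps (~330 TB/month) estimate
```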
UCDavis, ecs251 Spring 2007 05/03/2007P2P27 Perfect Mapping (diagram: overlay links between nodes A-H that follow the underlying physical topology)
UCDavis, ecs251 Spring 2007 05/03/2007P2P28 (diagram: the same nodes A-H with overlay links that ignore the underlying topology) l Inefficient mapping l Link D-E needs to support six times higher traffic.
UCDavis, ecs251 Spring 2007 05/03/2007P2P29 Topology mismatch l The overlay network topology doesn’t match the underlying Internet infrastructure topology! –40% of all nodes are in the 10 largest Autonomous Systems (AS) –Only 2-4% of all TCP connections link nodes within the same AS –Largely ‘random wiring’ l Most Gnutella-generated traffic crosses AS borders, making the traffic more expensive l May cause ISPs to change their pricing scheme
UCDavis, ecs251 Spring 2007 05/03/2007P2P30 Scalability l Whenever a node receives a message (ping/query), it sends copies out to all of its other connections. l Existing mechanisms to reduce traffic: –TTL counter –Nodes cache information about messages they have received, so that they don't forward duplicated messages.
UCDavis, ecs251 Spring 2007 05/03/2007P2P31 l 70% of Gnutella users share no files l 90% of users answer no queries l Those who have files to share may limit number of connections or upload speed, resulting in a high download failure rate. l If only a few individuals contribute to the public good, these few peers effectively act as centralized servers.
UCDavis, ecs251 Spring 2007 05/03/2007P2P32 Anonymity l Gnutella provides for anonymity by masking the identity of the peer that generated a query. l However, IP addresses are revealed at various points in its operation: HITS packets include the URL for each file, revealing the IP addresses.
UCDavis, ecs251 Spring 2007 05/03/2007P2P33 Query Expressiveness l Format of query not standardized l No standard format or matching semantics for the QUERY string. Its interpretation is completely determined by each node that receives it. l String literal vs. regular expression l Directory name, filename, or file contents l Malicious users may even return files unrelated to the query
UCDavis, ecs251 Spring 2007 05/03/2007P2P34 Superpeers l Cooperative, long-lived peers, typically with significant resources, that handle a very high volume of query-resolution traffic.
UCDavis, ecs251 Spring 2007 05/03/2007P2P35
UCDavis, ecs251 Spring 2007 05/03/2007P2P36 Gnutella is a self-organizing, large-scale P2P application that produces an overlay network on top of the Internet; it appears to work. Growth is hindered by the volume of generated traffic and by inefficient resource use. Since there is no central authority, the open-source community must commit to making any changes. Suggested changes have been made in: –Peer-to-Peer Architecture Case Study: Gnutella Network, by Matei Ripeanu –Improving Gnutella Protocol: Protocol Analysis and Research Proposals, by Igor Ivkovic
UCDavis, ecs251 Spring 2007 05/03/2007P2P37 Freenet l Essentially the same as Gnutella: –Limited-scope flooding –Reverse-path forwarding l Difference: –Data objects (i.e., files) are also delivered via “reverse-path forwarding”
UCDavis, ecs251 Spring 2007 05/03/2007P2P38 P2P Issues l Scalability & Load Balancing l Anonymity l Fairness, Incentives & Trust l Security and Robustness l Efficiency l Mobility
UCDavis, ecs251 Spring 2007 05/03/2007P2P39 Incentive-driven Fairness l P2P means we all should contribute.. –Hopefully fair, but the majority is selfish… l “Incentive for people to contribute…”
UCDavis, ecs251 Spring 2007 05/03/2007P2P40 Bittorrent: “Tit for Tat” l Equivalent Retaliation (Game theory) –A peer will “initially” cooperate, then respond in kind to an opponent's previous action. If the opponent previously was cooperative, the agent is cooperative. If not, the agent is not.
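A minimal sketch of the tit-for-tat strategy in its game-theoretic form (purely illustrative; the strategy functions and the round-playing loop are invented, not BitTorrent code):

```python
def tit_for_tat(opponent_history):
    """Cooperate on the first round, then mirror the opponent's previous move."""
    return "C" if not opponent_history else opponent_history[-1]

def always_defect(opponent_history):
    return "D"

def play(strategy_a, strategy_b, rounds=5):
    """Play two strategies against each other; each history holds the opponent's past moves."""
    hist_a, hist_b = [], []
    moves_a, moves_b = [], []
    for _ in range(rounds):
        a, b = strategy_a(hist_a), strategy_b(hist_b)
        moves_a.append(a); moves_b.append(b)
        hist_a.append(b); hist_b.append(a)
    return moves_a, moves_b

# Tit-for-tat cooperates once, then retaliates against a persistent defector.
print(play(tit_for_tat, always_defect))   # (['C', 'D', 'D', 'D', 'D'], ['D', 'D', 'D', 'D', 'D'])
```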
UCDavis, ecs251 Spring 2007 05/03/2007P2P41 Bittorrent l Fairness of download and upload between a pair of peers l Every 10 seconds, estimate the download bandwidth from the other peer –Based on this estimate, decide whether or not to continue uploading to the other peer
UCDavis, ecs251 Spring 2007 05/03/2007P2P42 Client & its Peers l Client –Download rate (from the peers) l Peers –Upload rate (to the client)
UCDavis, ecs251 Spring 2007 05/03/2007P2P43 BT Choking by Client l By default, every peer is “choked” –Stop “uploading” to them, but the TCP connection is still there. l Select four peers to “unchoke” –Best “upload rates” and “interested” –Upload to the unchoked peers and monitor the download rate from all the peers –“Re-choke” every 30 seconds l Optimistic Unchoking –Randomly select a choked peer to unchoke
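The choking policy above can be sketched roughly as follows (an illustrative simplification with invented data structures, not the real client code; the four unchoke slots and the periodic re-evaluation come from the slides):

```python
import random

def choke_round(peers, download_rate, interested, optimistic=None, slots=4):
    """Decide which peers to unchoke for the next interval.

    peers:         all connected peer ids (choked by default)
    download_rate: peer id -> estimated download rate from that peer
    interested:    ids of peers that currently want pieces from us
    """
    # Regular unchoke: the `slots` interested peers that give us the best download rates.
    candidates = sorted((p for p in peers if p in interested),
                        key=lambda p: download_rate.get(p, 0), reverse=True)
    unchoked = set(candidates[:slots])

    # Optimistic unchoke: one randomly chosen choked peer, so that newcomers with
    # no rate history still get a chance to prove themselves.
    still_choked = [p for p in peers if p not in unchoked]
    if optimistic is None and still_choked:
        optimistic = random.choice(still_choked)
    if optimistic is not None:
        unchoked.add(optimistic)
    return unchoked

rates = {"p1": 50, "p2": 10, "p3": 80, "p4": 5, "p5": 70, "p6": 30}
print(choke_round(list(rates), rates, interested={"p1", "p2", "p3", "p5", "p6"}))
```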
UCDavis, ecs251 Spring 2007 05/03/2007P2P44 “Interested” l A request for a piece (or its sub-pieces)
UCDavis, ecs251 Spring 2007 05/03/2007P2P45 Becoming “seed” l Use “upload” rate to the peers to decide which peers to unchoke.
UCDavis, ecs251 Spring 2007 05/03/2007P2P46 Bittorrent Wiki
UCDavis, ecs251 Spring 2007 05/03/2007P2P47 BT Peer Selection l From the “Tracker” –We receive a partial list of all active peers for the same file –We can get another 50 from the tracker if we want
UCDavis, ecs251 Spring 2007 05/03/2007P2P48 Piece Selection l Piece (64 KB~1 MB), sub-piece (16 KB) –Piece size: a trade-off between performance and the size of the torrent file itself –A client might request different sub-pieces of the same piece from different peers. l Strict Priority –The remaining sub-pieces of an already-started piece are requested before any new piece l Rarest First –Exception: “random first” –Get the stuff out of the Seed(s) as soon as possible.
UCDavis, ecs251 Spring 2007 05/03/2007P2P49 Rarest First l Exchanging bitmaps with 20+ peers –Initial messages –“have” messages l Array of buckets –The i-th bucket contains “pieces” with i known instances –Within the same bucket, the client will randomly select one piece.
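A small sketch of that bucket-based selection, assuming only the peer bitmaps described above (a hypothetical helper, not the real client's data structures):

```python
import random
from collections import defaultdict

def rarest_first(peer_bitmaps, have):
    """Pick the next piece: fewest known instances wins, ties broken at random.

    peer_bitmaps: peer id -> set of piece indices that peer advertises
                  (from its initial bitmap plus later "have" messages)
    have:         piece indices this client already has
    """
    counts = defaultdict(int)              # piece index -> number of known copies
    for pieces in peer_bitmaps.values():
        for p in pieces:
            counts[p] += 1

    buckets = defaultdict(list)            # i -> wanted pieces with exactly i known instances
    for piece, c in counts.items():
        if piece not in have:
            buckets[c].append(piece)

    if not buckets:
        return None
    return random.choice(buckets[min(buckets)])

bitmaps = {"p1": {0, 1, 2}, "p2": {1, 2, 3}, "p3": {2, 3}}
print(rarest_first(bitmaps, have={2}))     # piece 0 (only one known copy) is chosen
```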
UCDavis, ecs251 Spring 2007 05/03/2007P2P50 Random-First l The rarest pieces are, by definition, held by only one or very few peers. l The client would have to get all of their sub-pieces from that one or those very few peers. l For the first 4~5 pieces, pick pieces at random instead, so the client quickly has a few complete pieces to upload.
UCDavis, ecs251 Spring 2007 05/03/2007P2P51 BitTorrent l Connect to the Tracker l Connect to 20+ peers l Random-first or Rarest-first l Monitoring the download rate from the peers (or upload rate to the client) l Unchoke and Optimistic Unchoke
UCDavis, ecs251 Spring 2007 05/03/2007P2P52 Bittorrent l Advantages l Disadvantages
UCDavis, ecs251 Spring 2007 05/03/2007P2P53 Trackerless Bittorrent l Every BT peer is a tracker! l But, how would they share and exchange information regarding other peers? l Similar to Napster’s index server or DNS
UCDavis, ecs251 Spring 2007 05/03/2007P2P54 Pure P2P l Every peer is a tracker l Every peer is a DNS server l Every peer is a Napster Index server l How can this be done? –We try to remove/reduce the role of “special servers”!
UCDavis, ecs251 Spring 2007 05/03/2007P2P55 Peer l What are the requirements of a peer?
UCDavis, ecs251 Spring 2007 05/03/2007P2P56 Structured Peering l Peer identity and routability
UCDavis, ecs251 Spring 2007 05/03/2007P2P57 Structured Peering l Peer identity and routability l Key/content assignment –Which identity owns what? (Google Search?)
UCDavis, ecs251 Spring 2007 05/03/2007P2P58 Structured Peering l Peer identity and routability l Key/content assignment –Which identity owns what? –Napster: centralized index service –Skype/Kazaa: login server & super peers –DNS: hierarchical DNS servers l Two problems: (1) How to connect to the “ring”? (2) How to prevent failures/changes?
UCDavis, ecs251 Spring 2007 05/03/2007P2P59 DHT l Distributed hash tables (DHTs) –decentralized lookup service of a hash table –(name, value) pairs stored in the DHT –any peer can efficiently retrieve the value associated with a given name –the mapping from names to values is distributed among peers
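The (name, value) interface described above can be summarized in a tiny sketch (a hypothetical, centralized stand-in used only to show the API; in a real DHT the table is partitioned across all peers):

```python
import hashlib

class ToyDHT:
    """Illustrates the put/get interface of a DHT; the table here lives in one
    process, whereas a real DHT spreads it over the participating peers."""
    def __init__(self, m=16):
        self.m = m
        self.table = {}

    def _key(self, name):
        # names are hashed into the m-bit identifier space
        return int(hashlib.sha1(name.encode()).hexdigest(), 16) % (2 ** self.m)

    def put(self, name, value):
        self.table.setdefault(self._key(name), []).append(value)

    def get(self, name):
        return self.table.get(self._key(name), [])

dht = ToyDHT()
dht.put("some-song.mp3", "198.51.100.7")   # "this peer has the file" (made-up address)
print(dht.get("some-song.mp3"))            # any peer can later look it up by name
```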
UCDavis, ecs251 Spring 2007 05/03/2007P2P60 HT as a search table (diagram: a hash table indexed by key) l Information/content is distributed, and we need to know where: –Where is this piece of music? –What is the location of this type of content? –What is the current IP address of this Skype user?
UCDavis, ecs251 Spring 2007 05/03/2007P2P61 DHT as a search table (diagram: the index-key table, now to be distributed across peers; which peer owns which entry?)
UCDavis, ecs251 Spring 2007 05/03/2007P2P62 DHT as a search table (diagram: index-key table distributed across peers, continued)
UCDavis, ecs251 Spring 2007 05/03/2007P2P63 DHT as a search table (diagram: index-key table distributed across peers, continued)
UCDavis, ecs251 Spring 2007 05/03/2007P2P64 DHT l Scalable l Peer arrivals, departures, and failures l Unstructured versus structured
UCDavis, ecs251 Spring 2007 05/03/2007P2P65 DHT (Name, Value) l How to utilize DHT to avoid Trackers in Bittorrent?
UCDavis, ecs251 Spring 2007 05/03/2007P2P66 DHT-based Tracker l Whoever owns this hash entry is the tracker for the corresponding key! l Example: the key for the “FreeBSD 5.4 CD images” torrent is published on the class web site; the value stored under it is the seed’s IP address. l Operations: PUT & GET
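Continuing the slide's example, a trackerless swarm could use exactly this put/get pattern; a hedged sketch with a plain dict standing in for the distributed table (the key string and all addresses are made up):

```python
dht = {}                                      # stand-in for the DHT's distributed table

def dht_put(key, value):
    dht.setdefault(key, []).append(value)

def dht_get(key):
    return list(dht.get(key, []))

torrent_key = "FreeBSD 5.4 CD images"         # published, e.g., on the class web site

# The seed announces itself: PUT(key, seed's address).
dht_put(torrent_key, ("203.0.113.9", 6881))

# A downloader GETs the same key to learn who already has the file, then announces
# itself so that later peers can find it as well; every peer acts as the tracker.
print(dht_get(torrent_key))                   # [('203.0.113.9', 6881)]
dht_put(torrent_key, ("192.0.2.44", 6881))
```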
UCDavis, ecs251 Spring 2007 05/03/2007P2P67 Chord l Consistent Hashing l A Simple Key Lookup Algorithm l Scalable Key Lookup Algorithm l Node Joins and Stabilization l Node Failures
UCDavis, ecs251 Spring 2007 05/03/2007P2P68 Chord l Given a key (data item), it maps the key onto a peer. l Uses consistent hashing to assign keys to peers. l Solves problem of locating key in a collection of distributed peers. l Maintains routing information as peers join and leave the system
UCDavis, ecs251 Spring 2007 05/03/2007P2P69 Issues l Load balance: distributed hash function, spreading keys evenly over peers l Decentralization: Chord is fully distributed, no node is more important than another, which improves robustness l Scalability: logarithmic growth of lookup costs with the number of peers in the network, so even very large systems are feasible l Availability: Chord automatically adjusts its internal tables to ensure that the peer responsible for a key can always be found
UCDavis, ecs251 Spring 2007 05/03/2007P2P70 Example Application l The highest layer provides a file-like interface to the user, including user-friendly naming and authentication l This file system maps operations to lower-level block operations l Block storage uses Chord to identify the node responsible for storing a block and then talks to the block storage server on that node (diagram: a client stack of File System / Block Store / Chord and server stacks of Block Store / Chord)
UCDavis, ecs251 Spring 2007 05/03/2007P2P71 Consistent Hashing l A consistent hash function assigns each peer and key an m-bit identifier. l SHA-1 is used as the base hash function. l A peer’s identifier is defined by hashing the peer’s IP address. l A key identifier is produced by hashing the key (Chord doesn’t define this; it depends on the application). –ID(peer) = hash(IP, Port) –ID(key) = hash(key)
UCDavis, ecs251 Spring 2007 05/03/2007P2P72 Consistent Hashing l In an m-bit identifier space, there are 2^m identifiers. l Identifiers are ordered on an identifier circle modulo 2^m. l The identifier ring is called the Chord ring. l Key k is assigned to the first peer whose identifier is equal to or follows (the identifier of) k in the identifier space. l This peer is the successor peer of key k, denoted by successor(k).
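A compact sketch of the assignment rule just described, hashing names into a deliberately small identifier space (illustrative only; the node addresses and key name are made up, and real Chord would use the full SHA-1 output):

```python
import hashlib

M = 8                                         # toy identifier space: 2^M identifiers

def chord_id(text, m=M):
    return int(hashlib.sha1(text.encode()).hexdigest(), 16) % (2 ** m)

def successor(ident, node_ids):
    """First node whose identifier is equal to or follows `ident` on the circle."""
    ring = sorted(node_ids)
    for n in ring:
        if n >= ident:
            return n
    return ring[0]                            # wrap around past 2^M - 1

nodes = [chord_id(f"10.0.0.{i}:6346") for i in range(1, 6)]   # ID(peer) = hash(IP, Port)
key = chord_id("freebsd-5.4.iso")                             # ID(key)  = hash(key)
print(sorted(nodes), key, successor(key, nodes))
```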
UCDavis, ecs251 Spring 2007 05/03/2007P2P73 Consistent Hashing - Successor Peers (diagram: a 3-bit identifier circle with nodes 0, 1, and 3 and keys 1, 2, and 6) successor(1) = 1, successor(2) = 3, successor(6) = 0
UCDavis, ecs251 Spring 2007 05/03/2007P2P74 l For m = 6, # of identifiers is 64. l The following Chord ring has 10 nodes and stores 5 keys. l The successor of key 10 is node 14.
UCDavis, ecs251 Spring 2007 05/03/2007P2P75 Consistent Hashing – Join and Departure l When a node n joins the network, certain keys previously assigned to n’s successor now become assigned to n. l When node n leaves the network, all of its assigned keys are reassigned to n’s successor.
UCDavis, ecs251 Spring 2007 05/03/2007P2P76 Node Join (diagram: a node joins the identifier circle and takes over keys from its successor; keys 1, 2, 7, and 5 are shown)
UCDavis, ecs251 Spring 2007 05/03/2007P2P77 Node Departure (diagram: a node leaves the identifier circle and its keys are reassigned to its successor; keys 1, 2, 6, and 7 are shown)
UCDavis, ecs251 Spring 2007 05/03/2007P2P78 Technical Issues l ???
UCDavis, ecs251 Spring 2007 05/03/2007P2P79 Consistent Hashing l When node 26 joins the network:
UCDavis, ecs251 Spring 2007 05/03/2007P2P80 A Simple Key Lookup l A very small amount of routing information suffices to implement consistent hashing in a distributed environment l If each node knows only how to contact its current successor node on the identifier circle, all nodes can be visited in linear order. l Queries for a given identifier can be passed around the circle via these successor pointers until they encounter the node that contains the key.
UCDavis, ecs251 Spring 2007 05/03/2007P2P81 A Simple Key Lookup l Pseudo code for finding the successor:
// ask node n to find the successor of id
n.find_successor(id)
  if (id ∈ (n, successor])
    return successor;
  else // forward the query around the circle
    return successor.find_successor(id);
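The subtle part of this pseudocode is the interval test, which has to wrap around the ring; one way it could be implemented (a hypothetical helper, not from the Chord paper):

```python
def in_half_open(x, a, b, m=8):
    """True iff x lies in the circular interval (a, b] on a ring of 2^m identifiers."""
    x, a, b = x % 2**m, a % 2**m, b % 2**m
    if a < b:
        return a < x <= b
    return x > a or x <= b            # the interval wraps past zero, e.g. (250, 3]

# On an 8-bit ring: 2 falls inside (250, 3], 100 does not.
print(in_half_open(2, 250, 3), in_half_open(100, 250, 3))   # True False
```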
UCDavis, ecs251 Spring 2007 05/03/2007P2P82 A Simple Key Lookup l The path taken by a query from node 8 for key 54:
UCDavis, ecs251 Spring 2007 05/03/2007P2P83 Successor l Each active node MUST know the IP address of its successor! –N8 has to know that the next node on the ring is N14. l Departure: if N14 leaves, N8’s successor becomes N21. l But what about a failure or crash?
UCDavis, ecs251 Spring 2007 05/03/2007P2P84 Robustness l Keep the successors for the next R hops –N8 => N14, N21, N32, N38 (R=4) –Periodically ping along the path to check liveness, and also to discover “new members” that may have joined in between
UCDavis, ecs251 Spring 2007 05/03/2007P2P85 Is that good enough?
UCDavis, ecs251 Spring 2007 05/03/2007P2P86 Complexity of the search l Time/messages: O(N) –N: # of nodes on the ring l Space: O(1) –We only need to remember R IP addresses l Stabilization depends on the “period”.
UCDavis, ecs251 Spring 2007 05/03/2007P2P87 Scalable Key Location l To accelerate lookups, Chord maintains additional routing information. l This additional information is not essential for correctness, which is achieved as long as each node knows its correct successor.
UCDavis, ecs251 Spring 2007 05/03/2007P2P88 Scalable Key Location – Finger Tables l Each node n’ maintains a routing table with up to m entries (which is in fact the number of bits in identifiers), called the finger table. l The i-th entry in the table at node n contains the identity of the first node s that succeeds n by at least 2^(i-1) on the identifier circle. l s = successor(n + 2^(i-1)). l s is called the i-th finger of node n, denoted by n.finger(i)
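Putting that definition directly into code on a toy ring (a sketch; the 1-based index convention and the 3-bit example follow the slides, the helper names are invented):

```python
def successor(ident, ring):
    ring = sorted(ring)
    return next((n for n in ring if n >= ident), ring[0])

def finger_table(n, ring, m=3):
    """finger[i] = successor(n + 2^(i-1)) for i = 1..m, computed modulo 2^m."""
    return [successor((n + 2 ** (i - 1)) % 2 ** m, ring) for i in range(1, m + 1)]

nodes = [0, 1, 3]                       # the 3-bit example ring shown on the next slide
for n in nodes:
    print(n, finger_table(n, nodes))    # 0 -> [1, 3, 0], 1 -> [3, 3, 0], 3 -> [0, 0, 0]
```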
UCDavis, ecs251 Spring 2007 05/03/2007P2P89 Scalable Key Location – Finger Tables (diagram: the 3-bit identifier circle with nodes 0, 1, and 3 and their finger tables)
Node 0: start 1 (0+2^0) -> succ. 1; start 2 (0+2^1) -> succ. 3; start 4 (0+2^2) -> succ. 0; keys: 6
Node 1: start 2 (1+2^0) -> succ. 3; start 3 (1+2^1) -> succ. 3; start 5 (1+2^2) -> succ. 0; keys: 1
Node 3: start 4 (3+2^0) -> succ. 0; start 5 (3+2^1) -> succ. 0; start 7 (3+2^2) -> succ. 0; keys: 2
UCDavis, ecs251 Spring 2007 05/03/2007P2P90 Finger Tables l A finger table entry includes both the Chord identifier and the IP address (and port number) of the relevant node. l The first finger of n is the immediate successor of n on the circle.
UCDavis, ecs251 Spring 2007 05/03/2007P2P91 Scalable Key Location – Example query l The path a query for key 54 starting at node 8:
UCDavis, ecs251 Spring 2007 05/03/2007P2P92 Scalable Key Location – A characteristic l Since each node has finger entries at power of two intervals around the identifier circle, each node can forward a query at least halfway along the remaining distance between the node and the target identifier. From this intuition follows a theorem: Theorem: With high probability, the number of nodes that must be contacted to find a successor in an N-node network is O(logN).
UCDavis, ecs251 Spring 2007 05/03/2007P2P93 Complexity of the Search l Time/messages: O(log N) –N: # of nodes on the ring l Space: O(log N) –We need to remember R IP addresses –We need to remember log N fingers l Stabilization depends on the “period”.
UCDavis, ecs251 Spring 2007 05/03/2007P2P94 An Example l M = 4096 (identifier size), so the ring has 2^4096 identifiers l N = 2^16 (# of nodes) l How many entries do we need in the Finger Table? Each node n’ maintains a routing table with up to m entries (which is in fact the number of bits in identifiers), called the finger table. The i-th entry in the table at node n contains the identity of the first node s that succeeds n by at least 2^(i-1) on the identifier circle. s = successor(n + 2^(i-1)).
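A hedged worked answer to the slide's question, following the definition it quotes (the remark about distinct entries is the standard Chord observation, not stated on this slide):

```python
import math

M = 4096                  # identifier bits: the finger table has up to M = 4096 entries
N = 2 ** 16               # number of nodes actually on the ring

# With only 2^16 nodes spread over a 2^4096 identifier space, the fingers for small i
# all land before the immediate successor, so only about log2(N) entries are distinct
# with high probability.
print(M, math.log2(N))    # 4096 table entries, only ~16 of them distinct
```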
UCDavis, ecs251 Spring 2007 05/03/2007P2P95 Complexity of the Search l Time/messages: O(M) –M: # of bits of the identifier l Space: O(M) –We need to remember R IP addresses –We need to remember M fingers l Stabilization depends on the “period”.
UCDavis, ecs251 Spring 2007 05/03/2007P2P96 Structured Peering l Peer identity and routability –2^M identifiers, Finger Table routing l Key/content assignment –Hashing l Dynamics/Failures –Inconsistency??
UCDavis, ecs251 Spring 2007 05/03/2007P2P97 Node Joins and Stabilizations l The most important thing is the successor pointer. l If the successor pointer is kept up to date, which is sufficient to guarantee correctness of lookups, then the finger table can always be verified. l Each node runs a “stabilization” protocol periodically in the background to update its successor pointer and finger table.
UCDavis, ecs251 Spring 2007 05/03/2007P2P98 Node Joins and Stabilizations l “Stabilization” protocol contains 6 functions: –create( ) –join( ) –stabilize( ) –notify( ) –fix_fingers( ) –check_predecessor( )
UCDavis, ecs251 Spring 2007 05/03/2007P2P99 Node Joins – join() l When node n first starts, it calls n.join(n’), where n’ is any known Chord node. l The join() function asks n’ to find the immediate successor of n. l join() does not make the rest of the network aware of n.
UCDavis, ecs251 Spring 2007 05/03/2007P2P100 Node Joins – join()
// create a new Chord ring.
n.create()
  predecessor = nil;
  successor = n;
// join a Chord ring containing node n’.
n.join(n’)
  predecessor = nil;
  successor = n’.find_successor(n);
UCDavis, ecs251 Spring 2007 05/03/2007P2P101 Scalable Key Location – find_successor() l Pseudo code:
// ask node n to find the successor of id
n.find_successor(id)
  if (id ∈ (n, successor])
    return successor;
  else
    n’ = closest_preceding_node(id);
    return n’.find_successor(id);
// search the local table for the highest predecessor of id
n.closest_preceding_node(id)
  for i = m downto 1
    if (finger[i] ∈ (n, id))
      return finger[i];
  return n;
UCDavis, ecs251 Spring 2007 05/03/2007P2P102 Node Joins – stabilize() l Each time node n runs stabilize(), it asks its successor for its predecessor p, and decides whether p should be n’s successor instead. l stabilize() notifies node n’s successor of n’s existence, giving the successor the chance to change its predecessor to n. l The successor does this only if it knows of no closer predecessor than n.
UCDavis, ecs251 Spring 2007 05/03/2007P2P103 Node Joins – stabilize()
// called periodically. verifies n’s immediate
// successor, and tells the successor about n.
n.stabilize()
  x = successor.predecessor;
  if (x ∈ (n, successor))
    successor = x;
  successor.notify(n);
// n’ thinks it might be our predecessor.
n.notify(n’)
  if (predecessor is nil or n’ ∈ (predecessor, n))
    predecessor = n’;
UCDavis, ecs251 Spring 2007 05/03/2007P2P104 Node Joins – Join and Stabilization (diagram: node n joins the ring between predecessor n_p and successor n_s) l n joins –predecessor = nil –n acquires n_s as its successor via some n’ l n runs stabilize –n notifies n_s that n is its new predecessor –n_s acquires n as its predecessor l n_p runs stabilize –n_p asks n_s for its predecessor (now n) –n_p acquires n as its successor –n_p notifies n –n will acquire n_p as its predecessor l All predecessor and successor pointers are now correct l Fingers still need to be fixed, but old fingers will still work
UCDavis, ecs251 Spring 2007 05/03/2007P2P105 Node Joins – fix_fingers() l Each node periodically calls fix_fingers() to make sure its finger table entries are correct. l It is how new nodes initialize their finger tables. l It is how existing nodes incorporate new nodes into their finger tables.
UCDavis, ecs251 Spring 2007 05/03/2007P2P106 Node Joins – fix_fingers()
// called periodically. refreshes finger table entries.
n.fix_fingers()
  next = next + 1;
  if (next > m)
    next = 1;
  finger[next] = find_successor(n + 2^(next-1));
// checks whether predecessor has failed.
n.check_predecessor()
  if (predecessor has failed)
    predecessor = nil;
UCDavis, ecs251 Spring 2007 05/03/2007P2P107 Scalable Key Location – find_successor() l Pseudo code:
// ask node n to find the successor of id
n.find_successor(id)
  if (id ∈ (n, successor])
    return successor;
  else
    n’ = closest_preceding_node(id);
    return n’.find_successor(id);
// search the local table for the highest predecessor of id
n.closest_preceding_node(id)
  for i = m downto 1
    if (finger[i] ∈ (n, id))
      return finger[i];
  return n;
UCDavis, ecs251 Spring 2007 05/03/2007P2P108 Node Failures l Key step in failure recovery is maintaining correct successor pointers l To help achieve this, each node maintains a successor-list of its r nearest successors on the ring l If node n notices that its successor has failed, it replaces it with the first live entry in the list l Successor lists are stabilized as follows: –node n reconciles its list with its successor s by copying s’s successor list, removing its last entry, and prepending s to it. –If node n notices that its successor has failed, it replaces it with the first live entry in its successor list and reconciles its successor list with its new successor.
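A rough sketch of the successor-list maintenance just described (invented helper names; liveness checks and remote calls are faked with plain Python, and the node IDs beyond N38 are made up):

```python
R = 4                                    # keep the r nearest successors (r = R here)

def reconcile(successor, successors_own_list):
    """Copy the successor's list, drop its last entry, and prepend the successor."""
    return ([successor] + successors_own_list[:-1])[:R]

def first_live(successor_list, is_alive):
    """On failure of the immediate successor, fall back to the first live entry."""
    for node in successor_list:
        if is_alive(node):
            return node
    raise RuntimeError("all known successors have failed")

# Example: N8 keeps successors [14, 21, 32, 38] and N14 crashes.
alive = {21, 32, 38, 45, 48}
new_succ = first_live([14, 21, 32, 38], lambda n: n in alive)
# N8 then reconciles with its new successor N21, whose own list is [32, 38, 45, 48].
print(new_succ, reconcile(new_succ, [32, 38, 45, 48]))   # 21 [21, 32, 38, 45]
```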
UCDavis, ecs251 Spring 2007 05/03/2007P2P109 Chord – The Math l Every node is responsible for about K/N keys (N nodes, K keys) l When a node joins or leaves an N-node network, only O(K/N) keys change hands (and only to and from the joining or leaving node) l Lookups need O(log N) messages l To re-establish routing invariants and finger tables after a node joins or leaves, only O(log² N) messages are required