
1 Principles of Reliable Distributed Systems, Lecture 2: Distributed Hash Tables (DHT), Chord. Idit Keidar, Technion EE, Spring 2008

2 Today’s Material
Chord: A Scalable Peer-to-peer Lookup Service for Internet Applications
–Stoica et al.

3 Reminder: Peer-to-Peer Lookup
Insert(key, file)
Lookup(key)
–should find keys inserted at any node

4 Reminder: Overlay Networks
A virtual structure imposed over the physical network (e.g., the Internet)
–over the Internet, there is an (IP-level) link between every pair of nodes
–an overlay uses a fixed subset of these
Why restrict to a subset?

5 Routing/Lookup in Overlays
How does one route a packet to its destination in an overlay? How about lookup(key)?
Unstructured overlay (last week):
–flooding or random walks
Structured overlay (today):
–the links are chosen according to some rule
–tables define the next hop for routing and lookup

6 Structured Lookup Overlays
Many academic systems:
–CAN, Chord, D2B, Kademlia, Koorde, Pastry, Tapestry, Viceroy, …
OverNet is based on the Kademlia algorithm
Symmetric, no hierarchy
Decentralized self-management
Structured overlay: data is stored in a defined place, and search follows a defined path
Implements the Distributed Hash Table (DHT) abstraction

7 Reminder: Hashing
Data structure supporting the operations:
–void insert(key, item)
–item search(key)
The implementation uses a hash function to map keys to array cells
Expected search time O(1)
–provided that there are few collisions
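To make the reminder concrete, here is a minimal hash-table sketch in Python; the class name, cell count, and chaining strategy are illustrative choices, not something the slides specify.

```python
# Minimal chaining hash table: a hash function maps each key to an array cell;
# collisions within a cell are resolved by scanning a short list.
class HashTable:
    def __init__(self, num_cells=16):
        self.cells = [[] for _ in range(num_cells)]

    def _cell(self, key):
        return hash(key) % len(self.cells)      # hash function -> array cell index

    def insert(self, key, item):
        self.cells[self._cell(key)].append((key, item))

    def search(self, key):
        # expected O(1), provided there are few collisions per cell
        for k, item in self.cells[self._cell(key)]:
            if k == key:
                return item
        return None
```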

8 Distributed Hash Tables (DHTs)
Nodes store table entries
–the role of array cells
Good abstraction for lookup? Why?

9 The DHT Service Interface
lookup(key) returns the location of the node currently responsible for this key
key is usually numeric (in some range)

10 Using the DHT Interface
How do you publish a file? How do you find a file?
What does an application need in order to use a DHT?
–data identified with unique keys
–nodes can (agree to) store keys for each other
–what is stored: the object’s location (a pointer) or the actual object (the data)
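As an illustration of publishing and finding a file through the lookup(key) interface, here is a hedged Python sketch; dht.lookup(), node.store(), and node.fetch() are assumed names standing in for the service interface and a node-level storage API, not part of any specific system.

```python
import hashlib

def file_key(name, m=160):
    # hash the file name into the m-bit key space (SHA-1 gives 160 bits)
    return int.from_bytes(hashlib.sha1(name.encode()).digest(), "big") % (2 ** m)

def publish(dht, name, data_or_pointer):
    key = file_key(name)
    node = dht.lookup(key)               # node currently responsible for this key
    node.store(key, data_or_pointer)     # store the object itself, or just a pointer to it

def find(dht, name):
    key = file_key(name)
    node = dht.lookup(key)               # may be a different node than at publish time
    return node.fetch(key)
```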

11 What Does a DHT Implementation Need to Do?
Map keys to nodes
–needs to be dynamic as nodes join and leave
–how does this affect the service interface?
Route a request to the appropriate node
–routing on the overlay

12 Lookup Example (figure): one node performs insert(K1, V1), the pair (K1, V1) is stored at the responsible node, and another node issues lookup(K1)

13 Mapping Keys to Nodes
Goal: load balancing
–why?
Typical approach:
–give an m-bit ID to each node and each key (e.g., by applying SHA-1 to the key or to the node’s IP address)
–map each key to the node whose ID is “close” to the key (requires a distance function)
–how is load balancing achieved?
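A small sketch of this ID assignment, assuming SHA-1 as the slide suggests; the clockwise ring distance and the responsible_node helper are one illustrative way to make “close” precise (Chord’s own rule, successor, appears below).

```python
import hashlib

M = 160  # number of bits in the ID space

def m_bit_id(value: str) -> int:
    # SHA-1 of the key (or of the node's IP address) taken as an m-bit integer
    return int.from_bytes(hashlib.sha1(value.encode()).digest(), "big") % (2 ** M)

def ring_distance(a: int, b: int) -> int:
    # clockwise distance from a to b on the ID circle
    return (b - a) % (2 ** M)

def responsible_node(key_id: int, node_ids: list[int]) -> int:
    # the node whose ID is closest to the key in the clockwise direction
    return min(node_ids, key=lambda n: ring_distance(key_id, n))
```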

14 Routing Issues
Each node must be able to forward each lookup query to a node closer to the destination
Maintain routing tables adaptively
–each node knows some other nodes
–must adapt to changes (joins, leaves, failures)
–goals?

15 Handling Join/Leave
When a node joins, it needs to assume responsibility for some keys
–ask the application to move these keys to it
–how many keys will need to be moved?
When a node fails or leaves, its keys have to be moved to others
–what else is needed in order to implement this?

16 P2P System Interface
Lookup
Join
Move keys

17 Chord
Stoica, Morris, Karger, Kaashoek, and Balakrishnan

18 Chord Logical Structure
m-bit ID space (2^m IDs), usually m = 160
Think of the nodes as organized in a logical ring according to their IDs
(Figure: an example ring with nodes N1, N8, N10, N14, N21, N30, N38, N42, N48, N51, N56)

19 Consistent Hashing: Assigning Keys to Nodes
Key k is assigned to the first node whose ID equals or follows k: successor(k)
(Figure: on the example ring, key K54 is assigned to successor(54) = N56)
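A minimal sketch of successor(k) for a known, sorted set of node IDs; the bisect-based search and the small ID space (m = 6, matching the example ring) are illustrative assumptions.

```python
import bisect

def successor(k, node_ids, m=6):
    # first node whose ID equals or follows k, wrapping around the ring
    ids = sorted(node_ids)
    i = bisect.bisect_left(ids, k % (2 ** m))
    return ids[i % len(ids)]

ring = [1, 8, 10, 14, 21, 30, 38, 42, 48, 51, 56]   # the example ring from the figure
assert successor(54, ring) == 56
assert successor(60, ring) == 1                      # wraps around past the largest ID
```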

20 Moving Keys upon Join/Leave
When a node joins, it becomes responsible for some keys previously assigned to its successor
–a local change
–assuming load is balanced, how many keys should move?
And what happens when a node leaves?

21 Consistent Hashing Guarantees
For any set of N nodes and K keys, w.h.p.:
–each node is responsible for at most (1 + ε)K/N keys
–when an (N + 1)st node joins or leaves, responsibility for O(K/N) keys changes hands (only to or from the joining or leaving node)
For the scheme described above, ε = O(log N)
ε can be reduced to an arbitrarily small constant by having each node run Θ(log N) virtual nodes, each with its own identifier
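A hedged sketch of the virtual-nodes idea: each physical node derives roughly log2(N) identifiers from its own address and participates in the ring once per identifier. The function name, the suffix scheme, and the use of an estimate of N are illustrative assumptions.

```python
import hashlib
import math

def virtual_node_ids(address: str, n_estimate: int, m: int = 160) -> list[int]:
    # roughly log2(N) virtual nodes, each with its own m-bit identifier
    v = max(1, round(math.log2(n_estimate)))
    return [int.from_bytes(hashlib.sha1(f"{address}#{i}".encode()).digest(), "big") % (2 ** m)
            for i in range(v)]

# e.g., with about a million nodes, each physical node would run ~20 virtual nodes
print(len(virtual_node_ids("132.68.1.1:4000", n_estimate=1_000_000)))   # 20
```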

22 Simple Routing Solutions
Each node knows only its successor
–routing around the circle
–good idea?
Each node knows all other nodes
–O(1) routing
–cost?

23 Chord Skiplist Routing
Each node has “fingers” to nodes ½ of the way around the ID space from it, ¼ of the way, …
finger[i] at node n contains successor(n + 2^(i-1))
The successor is finger[1]
(Figure: an example ring with nodes N0, N8, N10, N14, N21, N30, N38, N42, N48, N51, N56)
How many entries are there in the finger table?
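A minimal sketch of building node n’s finger table over a known ring, repeating the successor() helper from the earlier sketch so the block stands alone; the 1-based indexing follows the slide (finger[1] is the successor), and m = 6 matches the example ring.

```python
import bisect

def successor(k, ids, m=6):
    ids = sorted(ids)
    i = bisect.bisect_left(ids, k % (2 ** m))
    return ids[i % len(ids)]

def finger_table(n, ids, m=6):
    # finger[i] = successor(n + 2^(i-1)), for i = 1..m; finger[1] is n's successor
    return {i: successor((n + 2 ** (i - 1)) % (2 ** m), ids, m) for i in range(1, m + 1)}

ring = [0, 8, 10, 14, 21, 30, 38, 42, 48, 51, 56]
print(finger_table(8, ring))   # {1: 10, 2: 10, 3: 14, 4: 21, 5: 30, 6: 42}
```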

24 Example: Chord Fingers
(Figure: node N0’s fingers, labeled finger[1..4], finger[5], finger[6], and finger[7], on a ring with nodes N10, N21, N30, N47, N72, N82, N90, N114)
m entries
log N distinct fingers with high probability

25 Chord Data Structures (At Each Node)
Finger table
–the first finger is the successor
Predecessor

26 Forwarding Queries
A query for key k is forwarded to the finger with the highest ID not exceeding k
(Figure: lookup(K54) on the example ring)
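A hedged sketch of this forwarding rule as a centralized simulation over a known ring, reusing the successor() and finger_table() helpers from the finger-table sketch above; the between() helper, the recursion, and the starting node in the example are illustrative, not the distributed RPC protocol itself.

```python
def between(x, a, b, m=6):
    # is x in the ring interval (a, b], going clockwise from a to b?
    if a == b:
        return True
    return (x - a) % (2 ** m) <= (b - a) % (2 ** m) and x != a

def closest_preceding_finger(n, key, ids, m=6):
    # the finger of n with the highest ID that does not reach past the key
    best = n
    for f in finger_table(n, ids, m).values():
        if between(f, n, key, m) and f != key and between(f, best, key, m):
            best = f
    return best

def find_successor(n, key, ids, m=6):
    succ = successor((n + 1) % (2 ** m), ids, m)       # n's own successor
    if between(key, n, succ, m):
        return succ                                    # the key belongs to n's successor
    nxt = closest_preceding_finger(n, key, ids, m)
    return succ if nxt == n else find_successor(nxt, key, ids, m)

ring = [0, 8, 10, 14, 21, 30, 38, 42, 48, 51, 56]
assert find_successor(8, 54, ring) == 56               # the lookup(K54) example
```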

27 How long does it take?
Each forwarding step is a Remote Procedure Call (RPC)

28 Routing Time
Node n looks up a key stored at node p
p is in n’s i-th interval: p ∈ ((n + 2^(i-1)) mod 2^m, (n + 2^i) mod 2^m]
n contacts f = finger[i]
–the interval is not empty (because p is in it), so f ∈ ((n + 2^(i-1)) mod 2^m, (n + 2^i) mod 2^m]
–RPC to f
f is at least 2^(i-1) away from n
p is at most 2^(i-1) away from f
The distance is halved: at most m steps

29 Routing Time Refined
Assuming uniform node distribution around the circle, the number of nodes in the search space is halved at each step:
–expected number of steps: log N
Note that:
–m = 160
–for 1,000,000 nodes, log N ≈ 20

30 What About Network Distance?
(Figure: the lookup(K54) route on the example ring, with node locations such as Haifa, Texas, and China, so consecutive overlay hops may cross large physical distances)

31 Joining Chord
Goals?
Required steps:
–find your successor
–initialize the finger table and predecessor
–notify other nodes that need to change their finger table and predecessor pointer: O(log^2 N)
–learn the keys that you are responsible for; notify others that you assume control over them

32 Join Algorithm: Take II
Observation: for correctness, successors suffice
–fingers are only needed for performance
Upon join, update the successor only
Periodically,
–check that successors and predecessors are consistent
–fix fingers

33 Creation and Join
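Below is a hedged Python sketch in the spirit of Chord’s create/join and the stabilization protocol described on the surrounding slides. It is a single-process simulation in which nodes hold direct references to each other; the class and method names, the 6-bit ID space, and the direct references are illustrative assumptions, not the distributed, RPC-based implementation.

```python
# Single-process simulation of Chord's join/stabilize/notify (sketch only).
class Node:
    def __init__(self, node_id, m=6):
        self.id, self.m = node_id, m
        self.successor, self.predecessor = self, None   # a new ring contains just this node

    def _between(self, x, a, b):
        # x in the ring interval (a, b]
        return a == b or ((x - a) % (2 ** self.m) <= (b - a) % (2 ** self.m) and x != a)

    def find_successor(self, k):
        # successor-pointer-only routing: correct, though linear without fingers
        n = self
        while not self._between(k, n.id, n.successor.id):
            n = n.successor
        return n.successor

    def join(self, existing):
        # a joiner only learns its successor; stabilization repairs the rest
        self.predecessor = None
        self.successor = existing.find_successor(self.id)

    def stabilize(self):
        # run periodically: adopt a closer successor if one has appeared, then notify it
        x = self.successor.predecessor
        if x is not None and self._between(x.id, self.id, self.successor.id):
            self.successor = x
        self.successor.notify(self)

    def notify(self, n):
        # n believes it might be our predecessor
        if self.predecessor is None or self._between(n.id, self.predecessor.id, self.id):
            self.predecessor = n

# Example: N30 joins a one-node ring, then N20 joins; a few stabilization
# rounds repair the successor and predecessor pointers (as in the join example slide).
a, b = Node(10), Node(30)
b.join(a)
for _ in range(2):
    a.stabilize(); b.stabilize()
c = Node(20)
c.join(a)
for _ in range(2):
    a.stabilize(); b.stabilize(); c.stabilize()
assert a.successor is c and c.successor is b and b.successor is a
```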


35 Join Example (figure)
–the joiner finds its successor
–gets its keys
–stabilize fixes the successor
–stabilize fixes the predecessor

36 Join Stabilization Guarantee
If any sequence of join operations is executed, interleaved with stabilizations,
–then at some time after the last join
–the successor pointers form a cycle over all the nodes in the network
Model assumptions?

37 Performance with Concurrent Joins
Assume a stable network with N nodes with correct finger pointers
Now another set of up to N nodes joins the network,
–and all successor pointers (but perhaps not all finger pointers) are correct,
Then lookups still take O(log N) time w.h.p.
Model assumptions?

38 Failure Handling
Periodically fixing fingers
A list of r successors instead of a single successor
Periodically probing predecessors

39 Failure Detection
Each node has a local failure-detector module
It uses periodic probes and timeouts to check the liveness of successors and fingers
–if the probed node does not respond by a designated timeout, it is suspected to be faulty
A node that suspects its successor (finger) finds a new successor (finger)
False suspicion: the suspected node is not actually faulty
–e.g., it was suspected due to communication problems
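A hedged sketch of how these two slides fit together: probe the current successor with a timeout, and on suspicion fall back to the next live entry in the successor list. The probe() helper (a plain TCP connect), the timeout value, and the function names are illustrative assumptions, not Chord’s actual wire protocol.

```python
import socket

def probe(address, timeout=1.0):
    # returns True if the node accepted a connection within the timeout, False if suspected
    try:
        with socket.create_connection(address, timeout=timeout):
            return True
    except OSError:
        return False

def fix_successor(successor_list):
    # successor_list: the r nearest successors, closest first (addresses as (host, port))
    for addr in successor_list:
        if probe(addr):
            return addr        # the first live entry becomes the new successor
    return None                # all r successors are suspected
```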

40 The Model?
Reliable messages among correct nodes
–no network partitions
Node failures can be accurately detected!
–no false suspicions
Properties hold as long as the failure rate is bounded:
–assume a list of r = Θ(log N) successors
–start from a stable state, and then let each node fail with probability 1/2
–then, w.h.p., find successor returns the closest living successor to the query key
–and the expected time to execute find successor is O(log N)

41 What Can Partitions Do?
(Figure: the example ring with several nodes each marked “suspect successor”, around N30 and N48)

42 What About Moving Keys?
Left up to the application
Solution: keep soft state, refreshed periodically
–every refresh operation performs lookup(key) before storing the key in the right place
How can we increase reliability for the time between a failure and the next refresh?
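A hedged sketch of such a soft-state refresh loop; dht.lookup() and node.store() are the same assumed interface names as in the earlier publish/find sketch, and the refresh period is an arbitrary illustrative value.

```python
import time

def refresh_loop(dht, my_items, period_seconds=60):
    # periodically re-publish every (key, value) pair at whichever node is
    # currently responsible; copies left at old nodes simply age out
    while True:
        for key, value in my_items.items():
            node = dht.lookup(key)       # re-run the lookup: responsibility may have moved
            node.store(key, value)
        time.sleep(period_seconds)
```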

43 Summary: DHT Advantages
Peer-to-peer: no centralized control or infrastructure
Scalability: O(log N) routing, routing tables, and join time
Load balancing
Overlay robustness

44 DHT Disadvantages
No control over where data is stored
In practice, organizations want:
–content locality: explicitly place data where we want it (e.g., inside the organization)
–path locality: guarantee that local traffic (a user in the organization looking for one of the organization’s files) remains local
No prefix search

