Published by Joanna Kelly. Modified over 9 years ago.
1
Chord Fay Chang, Jeffrey Dean, Sanjay Ghemawat, Wilson C. Hsieh, Deborah A. Wallach, Mike Burrows, Tushar Chandra, Andrew Fikes, Robert E. Gruber Google, Inc. OSDI 2006
2
Introduction
- Dynamo stores objects associated with a key through a simple interface:
  - get(), put()
- It should be possible to scale Dynamo incrementally
- This requires the ability to partition data over the set of nodes (storage hosts)
- Dynamo relies on a concept called consistent hashing
  - The approach it uses is similar to that found in Chord.
3
Distributed Hash Tables (DHT)
- Operationally like standard hash tables
- Stores (key, value) pairs
  - The key is like a filename
  - The value can be file contents or a pointer to a location
- Goal: Efficiently insert/lookup/delete (key, value) pairs
- Each peer stores a subset of the (key, value) pairs in the system
4
DHT
- Core operation: Find the node responsible for a key
  - Map key to node
  - Efficiently route insert/lookup/delete requests to this node
- Allow for frequent node arrivals and departures
5
DHT
- Introduce a hash function to map the object being searched for to a unique global identifier:
  - e.g., h(“NGC’02 Tutorial Notes”) → 8045
- Distribute the range of the hash function among all nodes in the network
- Each node must “know about” at least one copy of each object that hashes within its range (when one exists)
[Figure: nodes labeled with the hash ranges they cover (0-999, 1000-1999, 1500-4999, 4500-6999, 7000-8500, 8000-8999, 9000-9500, 9500-9999); the example identifier 8045 is routed to a node whose range contains it]
6
DHT: Desirable Properties
- Key ID space (search space) is uniformly populated
  - Mapping of keys to IDs using (consistent) hashing
- A node is responsible for indexing all the keys in a certain subspace of the ID space
- Nodes have only partial knowledge of other nodes’ responsibilities
- Messages should be routed to a node efficiently (small number of hops)
- Node arrival/departure should only affect a few nodes.
7
Consistent Hashing
- The main idea: map both keys and nodes (node IPs) to the same (metric) ID space
8
Consistent Hashing
- The main idea: map both keys and nodes (node IPs) to the same (metric) ID space
- The ring is just one possibility. Any metric space will do.
9
Consistent Hashing
- With high probability, the hash function balances load (all nodes receive roughly the same number of keys).
- With high probability, when the N-th node joins (or leaves) the network, only an O(1/N) fraction of the keys are moved to a different location.
  - This is clearly the minimum necessary to maintain a balanced load.
10
Consistent Hashing
- The consistent hash function assigns each node and key an m-bit identifier using SHA-1 as a base hash function.
- A node’s identifier is chosen by hashing the node’s IP address.
- A key identifier is produced by hashing the key.
- For more info see:
  - D. R. Karger, E. Lehman, F. Leighton, M. Levine, D. Lewin, and R. Panigrahy, “Consistent hashing and random trees: Distributed caching protocols for relieving hot spots on the World Wide Web,” in Proc. 29th ACM Symp. Theory of Computing, El Paso, TX, May 1997, pp. 654–663.
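The identifier assignment and the "few keys move" property can be sketched as follows. This is a minimal illustration, not the scheme of any particular system: the 32-bit identifier width, the truncation by modulo, the IP addresses, and the object names are all assumptions made for the example. Each key is assigned to the first node at or after it on the ring.

```python
import hashlib
from bisect import bisect_left

M = 32  # identifier width in bits (assumed for this sketch)

def chord_id(value, m=M):
    """Map a string (a key or a node's IP address) to an m-bit identifier."""
    digest = hashlib.sha1(value.encode("utf-8")).digest()  # 160-bit SHA-1 digest
    return int.from_bytes(digest, "big") % (2 ** m)         # keep the low m bits

def owner(key_id, node_ids):
    """First node ID at or after key_id on the ring (node_ids must be sorted)."""
    i = bisect_left(node_ids, key_id)
    return node_ids[i % len(node_ids)]  # wrap around past the highest ID

# Ten hypothetical nodes and a thousand hypothetical keys.
nodes = sorted(chord_id(f"10.0.0.{i}") for i in range(1, 11))
keys = [chord_id(f"object-{i}") for i in range(1000)]
before = {k: owner(k, nodes) for k in keys}

# An eleventh node joins; recompute ownership.
new_node = chord_id("10.0.0.99")
nodes_after = sorted(nodes + [new_node])
after = {k: owner(k, nodes_after) for k in keys}

moved = [k for k in keys if before[k] != after[k]]
print(f"{len(moved)} of {len(keys)} keys moved, all to the new node")
```

Every key that changes owner moves to the newly joined node; no key moves between two existing nodes.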
11
P2P Middleware: Differences
- Different P2P middlewares differ in:
  - The choice of the ID space
  - The structure of their network of nodes (i.e., how each node chooses its neighbors)
  - For each object, the node(s) whose range(s) cover that object must be reachable via a “short” path
- This is a major research topic
12
Chord
- m-bit identifier space for both keys and nodes
- Key identifier = SHA-1(key)
  - Key = “LetItBe” → ID = 50
  - Key = “129.100.16.93” → ID = 70
- How do we assign keys to nodes?
[Figure: SHA-1 mapping keys and addresses to identifiers]
13
Chord
- Nodes are organized in an identifier circle based on node identifiers
- Keys are assigned to their successor node on the identifier circle, i.e., the first node whose ID is equal to or follows the key ID.
14
Chord
- The hash function ensures an even distribution of nodes and keys on the circle
- The range covered by a node runs from its predecessor's ID (exclusive) up to its own ID (inclusive)
- Assume an N-node network
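The successor rule can be sketched with a sorted list of node IDs. The ring below uses only node numbers that appear elsewhere in these slides, in a 6-bit identifier space; whether the original figures contained further nodes is not known, so treat the ring as hypothetical.

```python
from bisect import bisect_left

def successor(ident, nodes):
    """First node whose ID is equal to or follows ident on the circle.

    nodes must be sorted in increasing order.
    """
    i = bisect_left(nodes, ident)
    return nodes[i % len(nodes)]  # past the highest ID, wrap to the lowest

nodes = [8, 14, 21, 32, 38, 48, 56]  # hypothetical ring, IDs in 0..63

print(successor(50, nodes))  # 56: key 50 falls in N56's range (48, 56]
print(successor(21, nodes))  # 21: a key equal to a node ID belongs to that node
print(successor(60, nodes))  # 8: wraps around the top of the identifier space
```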
15
Chord: Search Possibilities
- Routing table size vs. search cost
- Every peer knows every other peer: O(N) routing table size
- Every peer knows only its successor: O(N) search time
- The “compromise” is to have each peer know up to m other peers, spaced at exponentially increasing distances around the circle (the finger table).
16
Finger Table
- Let m be the number of bits in the key/node identifiers
- Each node, n, maintains a routing table with at most m entries, called the finger table.
- The i-th entry in the table at node n contains the identity of the first node, s, that succeeds n by at least 2^(i-1) on the identifier circle.
  - s = successor(n + 2^(i-1)), with arithmetic modulo 2^m
  - s is called the i-th finger of node n
17
Chord: Finger Table
- Finger table: finger[i] = successor(n + 2^(i-1)), where 1 ≤ i ≤ m
- O(log N) table size
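The finger-table definition translates directly into code. The sketch below assumes a 6-bit identifier space and a hypothetical ring built from node numbers that appear in these slides; it computes the table from a global view of the ring, which a real node would not have.

```python
from bisect import bisect_left

def successor(ident, nodes):
    """First node ID at or after ident on the circle (nodes sorted)."""
    i = bisect_left(nodes, ident)
    return nodes[i % len(nodes)]

def finger_table(n, nodes, m):
    """finger[i] = successor(n + 2^(i-1)) for i = 1..m, arithmetic modulo 2^m."""
    ring = 2 ** m
    return [successor((n + 2 ** (i - 1)) % ring, nodes) for i in range(1, m + 1)]

nodes = [8, 14, 21, 32, 38, 48, 56]  # hypothetical ring, m = 6
table = finger_table(32, nodes, 6)
print(table)  # [38, 38, 38, 48, 48, 8]
```

The last entry shows the wraparound: 32 + 32 = 64, which is 0 modulo 2^6, and the successor of 0 is N8.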
26
The Chord algorithm – Scalable node localization
27
Chord: Search
- Assume node n is searching for key k.
- Node n does the following:
  - Find the i-th finger table entry of node n such that k ∈ [finger[i].start, finger[i+1].start), and forward the request to the node in that entry
  - If no such entry exists, forward the request to the node in the last entry of the finger table
  - The receiving node repeats the two steps above until the condition in the first step is satisfied.
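A lookup can be simulated as below. This sketch uses the "closest preceding finger" formulation from the Chord paper, which expresses the interval test above in a slightly different way. The ring, the 6-bit identifier space, and the helper names are assumptions for illustration, and finger tables are computed from a global view only to keep the example short.

```python
from bisect import bisect_left

M = 6
RING = 2 ** M
NODES = [8, 14, 21, 32, 38, 48, 56]  # hypothetical ring, sorted

def successor(ident):
    """First node ID at or after ident (modulo the ring size)."""
    i = bisect_left(NODES, ident % RING)
    return NODES[i % len(NODES)]

def fingers(n):
    """Finger table of node n: successor(n + 2^(i-1)) for i = 1..M."""
    return [successor(n + 2 ** (i - 1)) for i in range(1, M + 1)]

def between(x, a, b):
    """True if x lies in the circular open interval (a, b)."""
    if a < b:
        return a < x < b
    return x > a or x < b  # the interval wraps past zero

def lookup(start, key):
    """Return (node responsible for key, number of forwarding hops)."""
    n, hops = start, 0
    while True:
        succ = fingers(n)[0]  # the first finger is the immediate successor
        if key == succ or between(key, n, succ):  # key in (n, succ]
            return succ, hops
        nxt = n
        for f in reversed(fingers(n)):  # farthest finger that precedes the key
            if between(f, n, key):
                nxt = f
                break
        if nxt == n:  # no useful finger: fall back to the successor
            return succ, hops
        n, hops = nxt, hops + 1

print(lookup(8, 54))  # (56, 1): N8 forwards to N48, whose successor N56 owns key 54
```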
28
Chord: Join
- Nodes can join (and leave) at any time.
- Challenge: Preserving the ability to locate every key in the network
- Chord must preserve the following:
  - Each node’s successor is correctly maintained
  - For every key k, node successor(k) is responsible for k.
- For lookups to be fast, it is desirable for the finger tables to be correct.
29
Chord: Join Implementation
- Each node in Chord maintains a predecessor pointer.
  - This consists of the Chord ID and IP address of the immediate predecessor of that node.
  - It can be used to walk counterclockwise around the identifier circle.
- The new node to be added learns the identity of an existing Chord node by some external mechanism
30
Chord: Join Initialization Steps
- Assume n is the node to join.
- Find any existing node, n’.
- Find the successor of n from n’. Label this successor(n).
- Ask successor(n) for its predecessor. This is labelled predecessor(successor(n)).
31
Chord: Join Example
- Assume N26 wants to join; it finds N8.
- N8’s finger table suggests that N26 will be “between” N21 and N32.
32
Chord: Join (Initialize finger table)
- Node n needs to have its finger table initialized
- Node n can ask its predecessor-to-be for its finger table as a starting point
33
Chord: Join (Changing Existing Finger Tables)
- Node n needs to be entered into the finger tables of some existing nodes.
- Node n becomes the i-th finger of node p if and only if
  - p precedes n by at least 2^(i-1); and
  - the i-th finger of node p succeeds n.
- The first node, p, that satisfies these conditions is the immediate predecessor of n - 2^(i-1)
- For a given n, the algorithm starts with the i-th finger of node n and then continues to walk in the counterclockwise direction on the identifier circle until it encounters a node whose i-th finger precedes n.
34
Chord: Join Example (add N26)

N21 (old finger table)    N21 (new finger table)
N21+1  -> N32             N21+1  -> N26
N21+2  -> N32             N21+2  -> N26
N21+4  -> N32             N21+4  -> N26
N21+8  -> N32             N21+8  -> N32
N21+16 -> N38             N21+16 -> N38
N21+32 -> N56             N21+32 -> N56

- i=1: Does N21 precede N26 by at least 1 (2^(i-1))? Yes: N21+1 becomes N26.
- i=2: Does N21 precede N26 by at least 2? Yes: N21+2 becomes N26.
- i=3: Does N21 precede N26 by at least 4? Yes: N21+4 becomes N26.
- i=4: Does N21 precede N26 by at least 8? No: evaluate N14.
35
Chord: Join Example (add N26)

N14 (old finger table)    N14 (new finger table)
N14+1  -> N21             N14+1  -> N21
N14+2  -> N21             N14+2  -> N21
N14+4  -> N21             N14+4  -> N21
N14+8  -> N32             N14+8  -> N26
N14+16 -> N32             N14+16 -> N32
N14+32 -> N48             N14+32 -> N48

- i=4: Does N14 precede N26 by at least 8? Yes: N14+8 becomes N26.
- i=5: Does N14 precede N26 by at least 16? No: evaluate N8.
- Etc.
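The N26 example can be checked mechanically. The sketch below applies the two-condition update rule to each existing node and compares the result with finger tables recomputed from scratch. It assumes a 6-bit identifier space and a ring containing only the nodes named in these slides (the original figure may show more); `dist` and `becomes_finger` are hypothetical helper names.

```python
from bisect import bisect_left

M = 6
RING = 2 ** M

def successor(ident, nodes):
    """First node ID at or after ident on the circle (nodes sorted)."""
    i = bisect_left(nodes, ident % RING)
    return nodes[i % len(nodes)]

def finger_table(n, nodes):
    return [successor(n + 2 ** (i - 1), nodes) for i in range(1, M + 1)]

def dist(a, b):
    """Clockwise distance from a to b on the identifier circle."""
    return (b - a) % RING

def becomes_finger(p, n, i, old_finger):
    """True if p precedes n by at least 2^(i-1) and p's old i-th finger succeeds n."""
    d_old = dist(p, old_finger) or RING  # a finger equal to p is a full turn away
    return dist(p, n) >= 2 ** (i - 1) and d_old > dist(p, n)

def updated_table(p, n, old_ring):
    """Apply the update rule to every entry of p's old finger table."""
    old = finger_table(p, old_ring)
    return [n if becomes_finger(p, n, i, old[i - 1]) else old[i - 1]
            for i in range(1, M + 1)]

old_ring = [8, 14, 21, 32, 38, 48, 56]
new_ring = sorted(old_ring + [26])

for p in (21, 14):
    print(f"N{p}:", finger_table(p, old_ring), "->", updated_table(p, 26, old_ring))
# N21: [32, 32, 32, 32, 38, 56] -> [26, 26, 26, 32, 38, 56]
# N14: [21, 21, 21, 32, 32, 48] -> [21, 21, 21, 26, 32, 48]
```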
36
Chord: Join (Transferring Keys)
- Move responsibility for all the keys for which node n is now the successor.
- Typically this involves moving the data associated with each key to the new node.
- Node n can become the successor only for keys that were previously the responsibility of the node immediately following n.
- Node n only needs to contact one node to transfer responsibility for all relevant keys.
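A minimal sketch of the transfer, assuming each node keeps its keys in a dictionary indexed by key identifier. The function name `transfer_keys` and the stored values are hypothetical. The new node takes exactly the keys in the interval (predecessor, new node] from its successor.

```python
def in_half_open(x, a, b):
    """True if x lies in the circular interval (a, b]."""
    if a < b:
        return a < x <= b
    return x > a or x <= b  # the interval wraps past zero

def transfer_keys(successor_store, pred_id, new_id):
    """Remove from the successor's store every key now owned by the new node
    and return those keys (with their data) for the new node to keep."""
    moved = {k: v for k, v in successor_store.items()
             if in_half_open(k, pred_id, new_id)}
    for k in moved:
        del successor_store[k]
    return moved

# N26 joins between N21 and N32; N32 currently holds keys 22, 26, 27 and 30.
n32_store = {22: "a", 26: "b", 27: "c", 30: "d"}
n26_store = transfer_keys(n32_store, pred_id=21, new_id=26)
print(n26_store)  # {22: 'a', 26: 'b'}
print(n32_store)  # {27: 'c', 30: 'd'}
```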
37
Chord: Join
- The previous discussion of join focuses on a single node joining.
- What if multiple nodes join?
- Join requires that each node’s successor is correctly maintained
38
Chord: Stabilization Protocol
- The successor/predecessor links are rebuilt by periodic stabilize notification messages
  - Sent by each node to its successor to inform it of the (possibly new) identity of its predecessor
- The successor pointers are used to verify and correct finger table entries.
39
Chord: Join/Stabilize Example
40
- N26 joins the system
- N26 acquires N32 as its successor
- N26 notifies N32
- N32 acquires N26 as its predecessor
41
Chord: Join/Stabilize Example
- N26 copies keys
- N21 runs stabilize() and asks its successor N32 for its predecessor, which is N26.
42
Chord: Join/Stabilize Example
- N21 acquires N26 as its successor
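The join/stabilize sequence above can be replayed with a small in-memory model. This is a sketch under simplifying assumptions: nodes are plain objects calling each other directly rather than exchanging network messages, and the class and method names follow the stabilize/notify terminology but are otherwise hypothetical.

```python
def between(x, a, b):
    """True if x lies in the circular open interval (a, b)."""
    if a < b:
        return a < x < b
    return x > a or x < b

class Node:
    def __init__(self, ident):
        self.id = ident
        self.successor = self
        self.predecessor = None

    def stabilize(self):
        """Ask the successor for its predecessor and adopt it if it is closer."""
        x = self.successor.predecessor
        if x is not None and between(x.id, self.id, self.successor.id):
            self.successor = x
        self.successor.notify(self)

    def notify(self, candidate):
        """candidate believes it may be this node's predecessor."""
        if self.predecessor is None or between(candidate.id, self.predecessor.id, self.id):
            self.predecessor = candidate

# Existing links: N21 -> N32.
n21, n32 = Node(21), Node(32)
n21.successor, n32.predecessor = n32, n21

# N26 joins: it acquires N32 as its successor and notifies N32.
n26 = Node(26)
n26.successor = n32
n32.notify(n26)   # N32 acquires N26 as its predecessor

# N21 runs stabilize(): N32 reports N26 as its predecessor.
n21.stabilize()   # N21 acquires N26 as its successor and notifies N26

print(n21.successor.id, n26.successor.id, n32.predecessor.id, n26.predecessor.id)
# 26 32 26 21
```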
43
Chord Stabilization
- Pointers and finger tables may be in a state of flux
- Is it possible that data will not be found?
  - Yes
- Recovery: try again
44
Chord: Node Failure
- N80 doesn’t know its correct successor, so the lookup is incorrect
[Figure: ring with nodes N10, N80, N85, N102, N113, N120; N10 issues Lookup(90)]
45
Chord: Node Failure
- Solution: Use successor lists
  - Each node knows its r immediate successors
- After a failure, a node will know its first live successor
- Stabilize messages correct finger tables
- Replicas of the data associated with a key at the r successor nodes might be used
  - Application dependent
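Failover with a successor list can be sketched as a scan for the first live entry. The node numbers come from the failure figure, but which nodes have failed, the list length r = 3, and the `is_alive` check are assumptions; a real node would detect failures by timeouts rather than by consulting a set.

```python
def first_live_successor(successor_list, is_alive):
    """Return the first entry of the successor list that still responds, or None."""
    for node in successor_list:
        if is_alive(node):
            return node
    return None

failed = {85, 102}                # hypothetical: N85 and N102 have failed
n80_successors = [85, 102, 113]   # N80's successor list with r = 3

live = first_live_successor(n80_successors, lambda n: n not in failed)
print(live)  # 113: Lookup(90) can now be answered by N113
```

If all r successors fail at the same time, the list is exhausted and the function returns None, which is why r is chosen large enough to make that unlikely.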
46
Chord Properties
- In a system with N nodes and K keys, with high probability…
  - each node is responsible for at most (1+ε)K/N keys
  - each node maintains info about O(log N) other nodes
  - lookups are resolved with O(log N) hops
  - insertions (node joins) require O(log^2 N) messages
- The developers of Chord validated this through simulation studies.
- No consistency among replicas
- Hops have poor network locality
47
Chord: Network Locality
- Nodes close on the ring can be far apart in the network.
[Figure: nodes N20, N40, N41, N80 placed on a network map]
* Figure from http://project-iris.net/talks/dht-toronto-03.ppt