Distributed Lookup Systems

1 Distributed Lookup Systems
Chord: A Scalable Peer-to-peer Lookup Service for Internet Applications
A Scalable Content-Addressable Network

2 Motivation
How to find data in a distributed system?
(Figure: nodes N1–N5 connected over the Internet; a publisher stores Key="LetItBe", Value=MP3 data, and a client issues Lookup("LetItBe") without knowing which node holds it.)

3 Applications
Peer-to-peer systems: Napster, Gnutella, Groove, FreeNet, …
Large-scale storage management systems: Publius, OceanStore, PAST, Farsite, CFS, …
Mirroring (web caching)
Any wide-area name resolution system

4 Outline
Types of solutions
Evaluation criteria
CAN and Chord: basic idea, insert/retrieve, join/leave, recovery from failures

5 Centralized Solution
Central server (Napster)
(Figure: nodes N1–N5 over the Internet; the publisher registers Key="LetItBe", Value=MP3 data with a central DB, and the client sends Lookup("LetItBe") to that DB.)

6 Distributed Solution (1)
Flooding (Gnutella, Morpheus, etc.)
Worst case: O(m) messages per lookup
(Figure: the client's Lookup("LetItBe") is flooded across nodes N1–N5 until it reaches the publisher.)

7 Distributed Solution (2)
Routed messages (Freenet, Tapestry, Chord, CAN, etc.)
(Figure: the client's Lookup("LetItBe") is routed hop by hop to the publisher holding Key="LetItBe", Value=MP3 data.)

8 Routing Challenges
Define a useful nearness metric
Keep the hop count small
Keep the routing table a reasonable size
Stay robust despite rapid changes in membership

9 Evaluation
Scalability: routing path length, per-node state
Latency
Load balancing
Robustness: routing fault tolerance, data availability

10 Content Addressable Network
Basic idea: an Internet-scale hash table
Interface: insert(key, value); value = retrieve(key)
The table is partitioned among many individual nodes

11 CAN - Solution
Virtual d-dimensional Cartesian coordinate space, dynamically partitioned into zones, each "owned" by one node
A key is mapped to a point P
(key, value) is stored at the node that owns the zone P belongs to
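A minimal Python sketch of this mapping, assuming a 2-d unit square and hash functions hx, hy built from salted SHA-1 (the slides only require two uniform hash functions):

import hashlib

def _h(key, salt):
    # hash a key into [0, 1); salted SHA-1 is an assumption, any uniform hash works
    digest = hashlib.sha1((salt + key).encode()).hexdigest()
    return int(digest, 16) / 16 ** 40

def key_to_point(key):
    # P = (hx(K), hy(K)) in the unit square [0, 1) x [0, 1)
    return (_h(key, "x"), _h(key, "y"))

class Zone:
    # an axis-aligned rectangle "owned" by one node
    def __init__(self, x0, x1, y0, y1):
        self.x0, self.x1, self.y0, self.y1 = x0, x1, y0, y1
    def contains(self, p):
        return self.x0 <= p[0] < self.x1 and self.y0 <= p[1] < self.y1

p = key_to_point("LetItBe")
print(p, Zone(0.0, 0.5, 0.0, 1.0).contains(p))  # (key, value) lives at the zone containing P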

12 CAN - Example
(Figure: a 2-d coordinate space partitioned into zones; node I owns one of them.)

13 CAN - Example
node I::insert(K,V)
(Node I initiates the insert.)

14 CAN - Example
node I::insert(K,V)
(1) a = hx(K)

15 CAN - Example
node I::insert(K,V)
(1) a = hx(K), b = hy(K)

16 CAN - Example
node I::insert(K,V)
(1) a = hx(K), b = hy(K)
(2) route(K,V) -> (a,b)

17 CAN - Example
node I::insert(K,V)
(1) a = hx(K), b = hy(K)
(2) route(K,V) -> (a,b)
(3) the node owning (a,b) stores (K,V)
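Tying the three steps together, a toy end-to-end insert into a four-zone space (the hash functions, zone layout, and node names are illustrative, not from the slides):

import hashlib

def h(key, salt):
    # uniform hash into [0, 1); SHA-1 is an assumption
    return int(hashlib.sha1((salt + key).encode()).hexdigest(), 16) / 16 ** 40

# four equal zones of the unit square, each owned by one (hypothetical) node
zones = {"A": (0.0, 0.5, 0.0, 0.5), "B": (0.5, 1.0, 0.0, 0.5),
         "C": (0.0, 0.5, 0.5, 1.0), "D": (0.5, 1.0, 0.5, 1.0)}
store = {owner: {} for owner in zones}

def owner_of(p):
    for owner, (x0, x1, y0, y1) in zones.items():
        if x0 <= p[0] < x1 and y0 <= p[1] < y1:
            return owner

def insert(K, V):
    a, b = h(K, "x"), h(K, "y")       # (1) a = hx(K), b = hy(K)
    owner = owner_of((a, b))          # (2) route (K,V) toward (a,b)
    store[owner][K] = V               # (3) the owner of (a,b) stores (K,V)
    return owner

print(insert("LetItBe", "MP3 data"))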

18 CAN - Routing
Each node maintains state only for its neighbors
A message carries the destination coordinates
Greedy forwarding: send to the neighbor whose coordinates are closest to the destination
(Figure: a message travels from the zone at (x,y) toward the zone containing (a,b).)
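A sketch of the greedy step, assuming each neighbor is summarized by its zone centre (the real check compares the destination against whole neighbor zones):

import math
from dataclasses import dataclass

@dataclass
class Neighbor:
    name: str
    center: tuple   # centre of the neighbour's zone, e.g. (0.25, 0.75)

def next_hop(my_zone_contains_dst, neighbors, dst):
    # pick the neighbour whose zone centre is closest to the destination point
    if my_zone_contains_dst:
        return None                      # this node owns the point: deliver locally
    return min(neighbors,
               key=lambda n: math.hypot(n.center[0] - dst[0],
                                        n.center[1] - dst[1]))

# example: route towards (0.8, 0.8) from a node with two neighbours
hop = next_hop(False,
               [Neighbor("A", (0.25, 0.75)), Neighbor("B", (0.75, 0.75))],
               (0.8, 0.8))
print(hop.name)   # -> "B"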

19 CAN - Node Insertion
1) The new node discovers some node "I" already in the CAN

20 CAN – Node Insertion
2) The new node picks a random point (p,q) in the space

21 CAN – Node Insertion
3) I routes to (p,q) and discovers node J, the owner of that zone

22 CAN - Node Insertion
4) J's zone is split in half; the new node owns one half
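A sketch of the split in step 4, assuming the zone is halved along its longer side (the CAN design cycles through dimensions; either choice illustrates the idea):

from dataclasses import dataclass

@dataclass
class Zone:
    x0: float
    x1: float
    y0: float
    y1: float

def split(z):
    # the old owner J keeps the first half, the joining node gets the second
    if (z.x1 - z.x0) >= (z.y1 - z.y0):        # halve along the longer side
        mid = (z.x0 + z.x1) / 2
        return Zone(z.x0, mid, z.y0, z.y1), Zone(mid, z.x1, z.y0, z.y1)
    mid = (z.y0 + z.y1) / 2
    return Zone(z.x0, z.x1, z.y0, mid), Zone(z.x0, z.x1, mid, z.y1)

j_half, new_half = split(Zone(0.0, 1.0, 0.0, 1.0))
print(j_half, new_half)   # J keeps [0, 0.5) x [0, 1), the new node owns [0.5, 1) x [0, 1)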

23 CAN – Node Failure
Need to repair the space
Recover the database: soft-state updates; use replication and rebuild the database from replicas
Repair routing: takeover algorithm

24 CAN – Takeover Algorithm
Simple failures: know your neighbor's neighbors; when a node fails, one of its neighbors takes over its zone
More complex failure modes: simultaneous failure of multiple adjacent nodes; scoped flooding to discover neighbors; hopefully a rare event

25 CAN - Evaluation
Scalability: per node, the number of neighbors is 2d; the average routing path is (d/4)·n^(1/d) hops
The network can scale without increasing per-node state
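A quick numeric check of these figures (the formulas are from the slide; the concrete n is arbitrary):

def can_state_and_path(n, d):
    # per-node neighbours = 2d; average path = (d/4) * n**(1/d) hops
    return 2 * d, (d / 4) * n ** (1 / d)

for d in (2, 4, 8):
    state, hops = can_state_and_path(1_000_000, d)
    print(f"d={d}: {state} neighbours, ~{hops:.1f} hops")

For a million nodes this prints roughly 500, 32, and 11 hops for d = 2, 4, and 8, which is the trade-off behind the "increase dimensions" improvement on the next slide.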

26 CAN – Design Improvements
Increase dimensions and realities (reduces the path length)
Heuristics (reduce the per-CAN-hop latency): RTT-weighted routing, multiple nodes per zone (peer nodes), deterministically replicated entries

27 CAN - Weaknesses
Impossible to perform a fuzzy search
Susceptible to malicious activity
Must maintain coherence of all the indexed data (network overhead, efficient distribution)
Still relatively high routing latency
Poor performance without the improvements above

28 Chord
Based on consistent hashing for the key-to-node mapping
Standard hashing, e.g. x -> ax + b (mod p), provides good balance across bins
Consistent hashing: a small change in the bucket set does not induce a total remapping of items to buckets
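A small experiment illustrating the difference, with SHA-1 standing in for both the standard and the consistent hash (an assumption; the slide does not fix the functions):

import hashlib
from bisect import bisect_right

def h(s, bits=32):
    return int(hashlib.sha1(s.encode()).hexdigest(), 16) % (2 ** bits)

keys = [f"key-{i}" for i in range(10_000)]

# standard hashing: bucket = hash(key) mod number_of_buckets
before = {k: h(k) % 10 for k in keys}
after = {k: h(k) % 11 for k in keys}          # add one bucket
moved_mod = sum(before[k] != after[k] for k in keys)

# consistent hashing: a key goes to the first node clockwise on the ring
def ring_owner(key, ring):
    i = bisect_right(ring, h(key))
    return ring[i % len(ring)]

ring10 = sorted(h(f"node-{i}") for i in range(10))
ring11 = sorted(ring10 + [h("node-10")])       # add one node
moved_ring = sum(ring_owner(k, ring10) != ring_owner(k, ring11) for k in keys)

print(f"mod-N remapped {moved_mod}/{len(keys)}, ring remapped {moved_ring}/{len(keys)}")

Typically the mod-N scheme remaps the vast majority of keys, while the ring remaps only about 1/11 of them.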

29 Chord IDs
m-bit identifier space for both keys and nodes
Key identifier = SHA-1(key), e.g. Key="LetItBe" -> ID=60
Node identifier = SHA-1(IP address), e.g. ID=123
Both are uniformly distributed
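A sketch of the ID derivation, truncating SHA-1 to m bits (m = 7 here to match the 7-bit ring on the next slide; the IDs 60 and 123 on this slide are illustrative values, and the IP below is hypothetical):

import hashlib

M = 7  # identifier bits

def chord_id(s):
    # SHA-1, reduced to the m-bit identifier space
    return int(hashlib.sha1(s.encode()).hexdigest(), 16) % (2 ** M)

print(chord_id("LetItBe"))    # key identifier
print(chord_id("10.0.0.1"))   # node identifier from an IP address (hypothetical IP)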

30 Chord – Consistent Hashing
Circular 7-bit ID space
A key is stored at its successor: the node with the next-higher ID
(Figure: keys K5, K20, K60 (Key="LetItBe"), K101 and nodes N32, N90, N123 on the ring; K60 is stored at N90.)
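A sketch of the successor rule on the 7-bit ring, using the node IDs from the figure:

from bisect import bisect_left

def successor(key_id, node_ids):
    # the key is stored at the first node whose ID is >= the key ID, wrapping around
    ring = sorted(node_ids)
    i = bisect_left(ring, key_id)
    return ring[i % len(ring)]

nodes = [32, 90, 123]           # N32, N90, N123
print(successor(60, nodes))     # K60 ("LetItBe") -> 90
print(successor(101, nodes))    # K101 -> 123
print(successor(20, nodes))     # K20 -> 32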

31 Chord – Basic Lookup
Every node knows its successor in the ring
(Figure: N10 asks "Where is 'LetItBe'?"; Hash("LetItBe") = K60; the query walks the ring N10 -> N32 -> N55 -> N90, and the answer "N90 has K60" is returned.)
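A sketch of this successor-only lookup, which needs O(N) hops in the worst case (ring node IDs read off the figure):

def in_interval(x, a, b):
    # True if x lies in the circular interval (a, b]
    return (a < x <= b) if a < b else (x > a or x <= b)

def basic_lookup(start, key_id, succ):
    # succ maps a node ID to its successor's ID
    n, hops = start, 0
    while not in_interval(key_id, n, succ[n]):
        n, hops = succ[n], hops + 1
    return succ[n], hops            # the successor of key_id holds the key

ring = [10, 32, 55, 90, 123]
succ = {n: ring[(i + 1) % len(ring)] for i, n in enumerate(ring)}
print(basic_lookup(10, 60, succ))   # -> (90, 2): "N90 has K60" after two forwarding hops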

32 Chord – "Finger Tables"
Every node knows m other nodes in the ring
Distances to them increase exponentially
Finger i points to the successor of n + 2^i
(Figure: fingers of N80 at exponentially growing distances, reaching nodes such as N96, N112, and N16.)

33 Chord – "Finger Tables"
(Figure: N32's finger table on a ring containing N32, N40, N52, N60, N70, N79, N80, N85, N102, N113; its seven fingers point to N40, N40, N40, N40, N52, N70, N102.)
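A sketch reproducing N32's table, assuming the ring consists of the node IDs visible on the slide and that finger i is the successor of n + 2^i for i = 0 … 6:

from bisect import bisect_left

M = 7
RING = sorted([32, 40, 52, 60, 70, 79, 80, 85, 102, 113])

def successor(x):
    i = bisect_left(RING, x % (2 ** M))
    return RING[i % len(RING)]

def finger_table(n):
    return [successor(n + 2 ** i) for i in range(M)]

print(finger_table(32))   # -> [40, 40, 40, 40, 52, 70, 102]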

34 Chord – Lookup Algorithm
(Figure: a lookup starting at N32 uses its finger table (N40, N40, N40, N40, N52, N70, N102) to jump across the ring toward the target key's successor.)
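A sketch of the lookup itself: jump to the closest preceding finger of the target until the target falls between a node and its successor (same ring and finger rule as above; recursion stands in for message forwarding):

from bisect import bisect_left

M = 7
RING = sorted([32, 40, 52, 60, 70, 79, 80, 85, 102, 113])

def successor(x):
    i = bisect_left(RING, x % (2 ** M))
    return RING[i % len(RING)]

def in_interval(x, a, b):
    # circular open interval (a, b)
    return (a < x < b) if a < b else (x > a or x < b)

def fingers(n):
    return [successor(n + 2 ** i) for i in range(M)]

def lookup(n, key_id, hops=0):
    succ_n = fingers(n)[0]                        # finger 0 is the successor
    if key_id == succ_n or in_interval(key_id, n, succ_n):
        return succ_n, hops                       # the successor holds the key
    for f in reversed(fingers(n)):                # closest preceding finger
        if in_interval(f, n, key_id):
            return lookup(f, key_id, hops + 1)
    return lookup(succ_n, key_id, hops + 1)

print(lookup(32, 99))   # -> (102, 3): the key with ID 99 is held by N102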

35 Chord - Node Insertion
Three-step process:
1. Initialize all fingers of the new node
2. Update the fingers of existing nodes
3. Transfer keys from the successor to the new node
Less aggressive mechanism (lazy finger update):
Initialize only the finger to the successor node
Periodically verify the immediate successor and predecessor
Periodically refresh finger table entries

36 Joining the Ring – Step 1
Initialize the new node's finger table:
Locate any node p already in the ring
Ask node p to look up the fingers of the new node
(Figure: N36 joins a ring with N5, N20, N40, N60, N80, N99 and issues Lookup(37, 38, 40, …, 100, 164) via p.)

37 Joining the Ring – Step 2
Update the fingers of existing nodes:
The new node calls an update function on existing nodes
An existing node can recursively update the fingers of other nodes
(Figure: the ring N5, N20, N36, N40, N60, N80, N99 with finger updates triggered by N36's arrival.)

38 Joining the Ring – Step 3
Transfer keys from the successor node to the new node
(Figure: N40 holds K30 and K38; the keys that now map to N36, here K30, are copied from N40 to N36.)
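A compressed sketch of the three steps, using a global view of the ring for brevity (a real node would issue these as lookups through node p rather than reading the ring directly):

from bisect import bisect_left

M = 7

def successor(ring, x):
    i = bisect_left(ring, x % (2 ** M))
    return ring[i % len(ring)]

def join(ring, keys, new_id):
    ring = sorted(ring + [new_id])
    # Step 1: initialise the new node's fingers (normally via Lookup(new_id + 2**i))
    fingers = [successor(ring, new_id + 2 ** i) for i in range(M)]
    # Step 2: existing nodes update fingers that should now point at new_id
    #         (implicit here, since fingers are recomputed from the updated ring)
    # Step 3: keys whose successor is now the new node move from the old successor
    transferred = [k for k in keys if successor(ring, k) == new_id]
    return fingers, transferred

print(join([5, 20, 40, 60, 80, 99], keys=[30, 38], new_id=36))
# K30 moves from N40 to N36; K38 still belongs to N40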

39 Chord – Handling Failures
Use a successor list: each node knows its r immediate successors
After a failure, a node will know its first live successor
Correct successors guarantee correct lookups
The guarantee holds only with some probability, but r can be chosen to make the probability of lookup failure arbitrarily small
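A minimal sketch of the fallback, assuming a set of live node IDs is already known (failure detection itself is out of scope here):

def live_successor(successor_list, alive):
    # fall back to the first live entry in the r-entry successor list
    for s in successor_list:
        if s in alive:
            return s
    return None          # all r successors failed (made unlikely by choosing r large enough)

alive = {80, 99, 5}
print(live_successor([60, 80, 99], alive))   # 60 has failed -> fall back to 80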

40 Chord - Evaluation
Efficient: O(log N) messages per lookup, where N is the total number of servers
Scalable: O(log N) state per node
Robust: survives massive changes in membership, assuming no malicious participants

41 Chord - Weakness
Not that simple (compared to CAN)
Member joining is complicated: the aggressive mechanism requires many messages and updates, and there is no analysis of convergence for the lazy finger mechanism
Key management is mixed between layers: the upper layer performs insertion and handles node failures, while Chord transfers keys when a node joins (there is no leave mechanism!)
The routing table grows with the number of members in the group
Worst-case lookup can be slow

42 Summary
Both systems are fully distributed and scalable
Efficient lookup
Robust
Simple (?)
Susceptible to malicious activity

43 How are these related to OSD?
Very similar if data is "public"
If data is "private", only a few locations are available for storing data
Does OSD help (make it easy) with peer-to-peer computing?
Any more comments?

