1 Chord: A Scalable Peer-to-peer Lookup Protocol for Internet Applications
Ion Stoica, Robert Morris, David Liben-Nowell, David R. Karger, M. Frans Kaashoek, Frank Dabek, Hari Balakrishnan February 2003 IEEE/ACM Transactions on Networking (TON) Presented by Leland Smith

2 Organization
Introduction: What is Chord? Distributed Hash Tables Chord Protocol Simulation Results Future Work Summary and Conclusion Questions

3 Intro: What is Chord? Structured peer-to-peer overlay network
A logical network built on top of the existing Internet. Fully distributed, with no central authority: all nodes are equal. Challenges: How to locate data? How to route efficiently and correctly? How to adapt to changes in the network?

4 Intro: What is Chord? [Figure: example Chord ring with m = 6, 10 nodes, and 5 keys]

5 Intro: What is Chord? Chord is a highly structured peer-to-peer key lookup service based on distributed hash tables. It does not specify how to store the data, only how to find it. It is an API providing just one function: lookup(key), which returns the node at which the key should be stored, if one exists. Designed so that higher-level services, such as persistent storage through replication, can be built on top of its basic mechanism. Examples: storage (CFS), indexing (DNS)

6 Intro: Distributed Hash Tables
Hash tables associate keys with data. DHTs generalize the classic hash table by distributing the load across the nodes of a network. Each connected node is responsible for a portion of the hash table's shared keyspace. A fixed keyspace is used, into which all hash values fall. Other DHT-based networks: Freenet, Pastry, Tapestry, CAN, Ohaha

7 Intro: Unique Identifiers
Each node and key has a unique identifier. Chord uses the 160-bit SHA-1 hash function, making collisions extremely unlikely (2^160 possible identifiers) and scattering hash values widely enough for approximate load balancing. Identifiers are generated by hashing an agreed-upon input that is unique to each node, e.g. its IP address. Each node computes its own identifier; it is not assigned, so no centralization is required.
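A minimal sketch of how a node can derive its own identifier. Chord uses the full 160-bit SHA-1 output; the m = 6 truncation here only matches the slides' example ring, and the IP addresses are made up.

```python
import hashlib

M = 6  # example ring from the slides; real Chord uses m = 160 (full SHA-1)

def node_id(ip_address: str, m: int = M) -> int:
    """Derive an m-bit ring identifier by hashing an input unique to the node."""
    digest = hashlib.sha1(ip_address.encode()).digest()
    return int.from_bytes(digest, "big") % (2 ** m)

# Every node computes its own identifier locally -- no central assignment.
print(node_id("10.0.0.1"), node_id("10.0.0.2"))
```

Because the input (the IP address) is known to the node itself, no coordinator is needed to hand out identifiers.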

8 Chord Ring (1)
[Figure: identifier circle mod 2^m with m = 6, 10 nodes, and 5 keys, showing a node x with its predecessor(x) and successor(x), node and key IDs, key locations, and which node "owns" each key]

9 Chord Ring (2) A correct successor is the only piece of data required for correctness of the protocol. Additional information is maintained only to speed up lookups and provide fault tolerance.

10 Simple Key Location (1) Naïve key lookup strategy:
Is the key I'm looking for between me and my successor? If so, my successor is responsible for the key. If not, forward the lookup request to my successor. The lookup eventually arrives at an answer, but it passes through every node in the identifier space between the source node and the node hosting the key. This obviously doesn't scale well!

11 Simple Key Location (2)

12 Scalable Key Location (1)
Chord improves lookup performance by maintaining a routing table of m nodes for an m-bit identifier space, called the finger table. The ith entry of the table (the ith finger) at node n contains the IP address and identifier of the first node that succeeds n by at least 2^(i-1) on the Chord ring. For i = 1, 2^(1-1) = 2^0 = 1, so the first finger of a node is the node immediately following it on the Chord ring, i.e. its successor.
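Building the finger table from a known set of node identifiers can be sketched as follows; the node set is the same illustrative m = 6 ring used above, chosen for this example rather than taken from the paper.

```python
M = 6
NODES = sorted([1, 8, 14, 21, 32, 38, 42, 48, 51, 56])  # illustrative ring

def successor(x, ids=NODES):
    """First node identifier equal to or following x on the ring."""
    for n in ids:
        if n >= x:
            return n
    return ids[0]  # wrap past 2^M - 1

def finger_table(n, m=M):
    """finger[i] = successor((n + 2^(i-1)) mod 2^m), for i = 1..m."""
    return [successor((n + 2 ** (i - 1)) % 2 ** m) for i in range(1, m + 1)]

print(finger_table(8))  # -> [14, 14, 14, 21, 32, 42]
```

As the text says, the first finger is just the node's successor; later fingers jump exponentially farther around the ring.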

13 Scalable Key Location (2)

14 Scalable Key Location (3)
A better key lookup strategy: Is the key I'm looking for between me and my successor? If so, my successor is responsible for the key. If not, consult my finger table, starting at the farthest finger (the mth finger in an m-bit identifier space) and working backward; forward the lookup to the first finger whose identifier lies between me and the key. This requires only O(log N) routing hops before arriving at an answer in an N-node network.
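The scalable strategy can be sketched end to end. This is a toy in-process model over the same illustrative m = 6 ring as the earlier examples, with finger tables recomputed on the fly instead of cached per node.

```python
M = 6
NODES = sorted([1, 8, 14, 21, 32, 38, 42, 48, 51, 56])  # illustrative ring

def between(x, a, b, incl_right=False):
    """Ring interval test: x in (a, b), or (a, b] when incl_right is set."""
    if a < b:
        return a < x < b or (incl_right and x == b)
    return x > a or x < b or (incl_right and x == b)

def successor(x):
    for n in NODES:
        if n >= x:
            return n
    return NODES[0]  # wrap around

def fingers(n):
    return [successor((n + 2 ** (i - 1)) % 2 ** M) for i in range(1, M + 1)]

def find_successor(n, key):
    """Jump via the farthest finger that precedes the key -- O(log N) hops."""
    hops = 0
    while not between(key, n, successor((n + 1) % 2 ** M), incl_right=True):
        nxt = n
        for f in reversed(fingers(n)):  # farthest finger first
            if between(f, n, key):
                nxt = f
                break
        if nxt == n:  # no finger is closer; hand off to the successor
            break
        n, hops = nxt, hops + 1
    return successor((n + 1) % 2 ** M), hops

print(find_successor(8, 54))  # -> (56, 2): far fewer hops than the naive walk
```

The same lookup that took 7 hops under the naïve strategy now halves the remaining distance on each forwarding step.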

15 Scalable Key Location (4)

16 Stabilization A node must keep its finger table up to date as nodes join, leave, and fail in a dynamic network. To achieve this, each node periodically runs a stabilization process, which refreshes all of its fingers. More importantly, it verifies that it is its successor's predecessor. If it is not, then a node has joined between it and its successor, and the node that joined becomes its new successor.
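The successor-repair half of stabilization can be sketched with in-process objects standing in for remote nodes (no RPCs or failures modeled here); `stabilize` and `notify` below follow the shape of the paper's pseudocode but are a simplification.

```python
class Node:
    def __init__(self, ident):
        self.id = ident
        self.successor = self
        self.predecessor = None

def between(x, a, b):
    """x strictly inside the ring interval (a, b), wrapping modulo 2^m."""
    if a < b:
        return a < x < b
    return x > a or x < b

def stabilize(n):
    """Ask the successor for its predecessor; adopt it if it slid in between."""
    p = n.successor.predecessor
    if p is not None and between(p.id, n.id, n.successor.id):
        n.successor = p  # a node joined between n and its old successor
    notify(n.successor, n)

def notify(s, n):
    """n tells s: 'I might be your predecessor.'"""
    if s.predecessor is None or between(n.id, s.predecessor.id, s.id):
        s.predecessor = n
```

Run periodically at every node, these two calls let a newly joined node be discovered by its neighbors without any global coordination.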

17 Fault Tolerance A Chord node must be able to withstand the failure of nodes in the network, most importantly, its successor. To guard against successor failure, a Chord node maintains a list of its next r successors in its successor list. r consecutive nodes must fail simultaneously in order for the ring to be disrupted, which is very improbable with even modest values of r. Guarding against failure in the finger table is less important. If a node fails during a lookup, the lookup may simply try the next finger in the table.
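The failover described above amounts to scanning the successor list for the first live entry; a minimal sketch, with liveness modeled as simple set membership:

```python
def first_live_successor(successor_list, alive):
    """Fail over to the next live entry; all r entries must fail at once
    for the ring to be disrupted."""
    for s in successor_list:
        if s in alive:
            return s
    raise RuntimeError("all r successors failed")

# Node 8's successor list on a ring where node 14 has just failed:
print(first_live_successor([14, 21, 32], alive={21, 32, 42}))  # -> 21
```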

18 Joining A joining node must know of a node that is already connected to a Chord network. The joining node asks the existing node to find its successor. Once it knows its successor, the joining node has connected to the network. Its successor can then transfer keys it “owns” whose identifiers lie in the joining node’s keyspace to the joining node.
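The key-transfer step can be sketched over a toy model in which the whole ring is a dict from node identifier to stored keys; the node and key values are invented for illustration.

```python
def in_half_open(x, a, b):
    """x in the ring interval (a, b], wrapping modulo 2^m."""
    if a < b:
        return a < x <= b
    return x > a or x <= b

def join(ring, new_id):
    """ring maps node id -> set of stored keys. The joining node's successor
    transfers every key that now falls in the new node's portion of the ring."""
    ids = sorted(ring)
    succ = next((n for n in ids if n >= new_id), ids[0])   # wraps if needed
    pred = max((n for n in ids if n < new_id), default=ids[-1])
    moved = {k for k in ring[succ] if in_half_open(k, pred, new_id)}
    ring[succ] -= moved
    ring[new_id] = moved

ring = {8: {3, 6}, 32: {10, 20, 30}}
join(ring, 21)
print(ring)  # node 21 now owns the keys in (8, 21]
```

Only one node (the successor) is contacted for the transfer, which keeps joins cheap.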

19 Simulation Results (1) Load Balancing: Total Keys vs. Keys per Node
Although variations exist because node identifiers do not uniformly cover the entire identifier space, each node is responsible for about K / N keys in a network with N nodes and K total keys. With high probability, each node is responsible for O(1 / N) of the identifier space. N = 10,000 nodes, K = 500,000 keys

20 Simulation Results (2) Lookup Performance: Path Length vs. Number of Nodes For a network with N nodes, with high probability, lookups are resolved in O(log N) routing hops.

21 Problems addressed Load Balance: The distributed hash function spreads keys evenly over the nodes. Decentralization: Fully distributed, self-organizing. Scalability: Lookup cost grows as the log of the number of nodes. Availability: Automatically adjusts internal tables to reflect changes in the network. Flexible Naming: No constraints on key structure.

22 Summary and Conclusion
Chord solves the common peer-to-peer problem of locating a key in a network in an efficient, decentralized manner; keyword searches, however, remain difficult. Provable correctness. In an N-node network: maintains routing information on only O(log N) nodes; resolves all lookups in O(log N) routing hops. Promising future.

23 Future Work Detecting and healing partitions.
Protecting against malicious nodes. Network/geographic sensitive routing. Realistic load balancing, sensitive to each node’s unique combination of attributes. Anonymity, or at least plausible deniability.

24 Applications Cooperative mirroring Time-shared storage
Distributed indexes Large-scale combinatorial search DNS? Chat?

25 Questions? Thank You!

