Distributed Hash Tables: Chord and Dynamo
Costin Raiciu, Advanced Topics in Distributed Systems, 18/12/2012

Motivation: file sharing
Many users want to share files online. If a file's location is known, downloading is easy – the challenge is finding who stores the file we want.
Early attempts:
– Napster (centralized), Kazaa
– Gnutella (March 2000): completely decentralized

How should we fix Gnutella's problems?
Decouple storage from lookup
– In Gnutella, a node only answers queries for files it has locally
Requirements:
– Extreme scalability: millions of nodes
– Load balance: spread load evenly across nodes
– Availability: must cope with node churn (nodes joining/leaving/failing)

Chord [Stoica et al., Sigcomm 2001]
Opened a new body of research on "Distributed Hash Tables"
– Together with Content Addressable Networks (also Sigcomm 2001)
Chord itself is a scalable lookup protocol; its most popular application is a Distributed Hash Table (DHT)

Chord basics
A single fundamental operation: lookup(key)
– Given a key, find the node responsible for that key
How do we do this?

Consistent hashing
Assign unique m-bit identifiers to both nodes and objects (e.g. files)
– E.g. m = 160, using SHA-1
– Node identifier: hash of its IP address
– Object identifier: hash of its name
Split the key space across all servers
– It is not necessary to store the keys of the files you have!
Who is responsible for storing the metadata relating to a given key?

Key assignment
Identifiers are ordered on an identifier circle modulo 2^m
Key k is assigned to the first node whose identifier is equal to or follows (the identifier of) k in the identifier space
– This node is called the successor node of k, successor(k)
– If identifiers are represented as a circle of numbers from 0 to 2^m − 1, then successor(k) is the first node clockwise from k
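A minimal Python sketch of this mapping, under the slide's assumptions (m = 160, SHA-1); the helper names are illustrative, not from the paper:

    import hashlib
    from bisect import bisect_left

    M = 160  # identifier bits, matching the SHA-1 example on the slide

    def chord_id(name: str) -> int:
        # Hash a node address or object name onto the 2^M identifier circle.
        return int.from_bytes(hashlib.sha1(name.encode()).digest(), "big") % (2 ** M)

    def successor(key_id: int, node_ids: list[int]) -> int:
        # First node clockwise from key_id, wrapping past 2^M - 1 back to 0.
        ring = sorted(node_ids)
        i = bisect_left(ring, key_id)  # first node id >= key_id
        return ring[i % len(ring)]

    # Keys map to the first node at or after their identifier on the circle.
    nodes = [chord_id(f"10.0.0.{i}:4000") for i in range(4)]
    print(successor(chord_id("song.mp3"), nodes))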

Consistent hashing example

Lookup
Each node n maintains a routing table with (at most) m entries, called the finger table
The i-th entry in the table at node n contains the identity of the first node s that succeeds n by at least 2^(i−1) on the circle
– n.finger[i] = successor(n + 2^(i−1)), 1 ≤ i ≤ m
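The formula translates directly into code; a sketch reusing successor() and M from the consistent-hashing snippet above:

    def build_finger_table(n: int, node_ids: list[int], m: int = M) -> list[int]:
        # finger[i] = successor(n + 2^(i-1)) for i = 1..m, arithmetic mod 2^m.
        return [successor((n + 2 ** (i - 1)) % (2 ** m), node_ids)
                for i in range(1, m + 1)]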

Lookup (2)
Each node stores information about only a small number of other nodes: O(log N)
Nodes know more about nodes closely following them on the circle than about nodes farther away
Is there enough information in the finger table to find the successor of an arbitrary key?

How should we use finger pointers to guide the lookup?

Lookup algorithm
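The pseudocode from the original slide is not preserved in this transcript; below is a sketch of the iterative lookup, with each node's fingers and successor held in plain dictionaries as a stand-in for remote calls:

    def between(x: int, a: int, b: int, m: int = M) -> bool:
        # True if x lies strictly inside the clockwise arc (a, b) on the circle.
        a, b, x = a % (2 ** m), b % (2 ** m), x % (2 ** m)
        return (a < x < b) if a < b else (x > a or x < b)

    def find_successor(n: int, key_id: int,
                       fingers: dict[int, list[int]],
                       succ: dict[int, int]) -> int:
        # Follow the closest preceding finger until key_id falls in (n, succ[n]].
        while not (key_id == succ[n] or between(key_id, n, succ[n])):
            for f in reversed(fingers[n]):  # farthest finger first
                if between(f, n, key_id):
                    n = f
                    break
            else:
                n = succ[n]  # no finger precedes the key; step to the successor
        return succ[n]

Each hop moves at least halfway around the remaining arc to the key, which is what bounds the hop count.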

How many hops are required to find a key?
– O(log N) with high probability: each hop at least halves the identifier distance to the target

Node joins
To maintain correctness, Chord maintains two invariants:
– Each node's successor is correctly maintained
– For every key k, successor(k) is responsible for k

Node joins: detail
Chord uses a predecessor pointer to walk counterclockwise
– Each node maintains the Chord ID and IP address of the previous node
– Why?
When a node n joins the network, Chord:
– Initializes the predecessor and fingers of node n
– Updates the fingers and predecessors of existing nodes to reflect the addition of n
– Notifies the higher-layer software so that it can transfer the state associated with keys that n is now responsible for

Stabilization: dealing with concurrent joins and failures
In practice Chord needs to deal with nodes joining the system concurrently and with nodes that fail or leave voluntarily
Solution: every node runs a stabilize process periodically, as sketched below
– When n runs stabilize, it asks n's successor for the successor's predecessor p, and decides whether p should be n's successor instead
– stabilize also notifies n's successor of n's existence, giving the successor the chance to change its predecessor to n
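A minimal sketch of the stabilize/notify pair, reusing the between() helper from the lookup sketch; in a real deployment these are periodic RPCs between machines, not local method calls:

    class Node:
        def __init__(self, ident: int):
            self.id = ident
            self.successor: "Node" = self      # a new ring starts with itself
            self.predecessor: "Node | None" = None

        def stabilize(self) -> None:
            # Ask our successor for its predecessor p; if p has slipped in
            # between us and our successor, adopt p as the new successor.
            p = self.successor.predecessor
            if p is not None and between(p.id, self.id, self.successor.id):
                self.successor = p
            self.successor.notify(self)  # tell the successor we exist

        def notify(self, n: "Node") -> None:
            # Adopt n as predecessor if we have none, or if n lies between
            # our current predecessor and us on the circle.
            if self.predecessor is None or between(n.id, self.predecessor.id, self.id):
                self.predecessor = n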

Implementing a Distributed Hash Table over Chord
put(k, v)
– Look up n, the node responsible for k, and store v on n
get(k)
– Look up the node responsible for k and return the value
What if n fails or leaves?
– Fix: store on n and a few of its successors
– Or locally broadcast the query
How long does it take to join/leave Chord?
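Putting the pieces together, a sketch of this put/get layer reusing chord_id() and successor() from above; the per-node dictionaries stand in for sending the value over the network, and the replication factor follows the fix mentioned on the slide:

    def dht_put(key: str, value: bytes, node_ids: list[int],
                store: dict[int, dict[str, bytes]], replicas: int = 3) -> None:
        ring = sorted(node_ids)
        start = ring.index(successor(chord_id(key), node_ids))
        # Store on the responsible node and a few of its successors.
        for i in range(replicas):
            store[ring[(start + i) % len(ring)]][key] = value

    def dht_get(key: str, node_ids: list[int],
                store: dict[int, dict[str, bytes]]) -> bytes | None:
        # Look up the node responsible for key and return the value.
        return store[successor(chord_id(key), node_ids)].get(key)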

Other aspects of Distributed Hash Tables
How do we deal with security?
– Nodes that return wrong answers
– Nodes that do not forward messages
– …

Applications of Distributed Hash Tables?
A whole body of research:
– Distributed filesystems (PAST, OceanStore)
– Distributed search
– None deployed. Why?
Today:
– Kademlia is used for "tracker-less" torrents

Amazon Dynamo [DeCandia et al., SOSP 2007]
(slides adapted from DeCandia et al.)

Context
We want a distributed storage system to support some of Amazon's tasks:
– best-seller lists
– shopping carts
– customer preferences
– session management
– sales rank
– product catalog
Traditional databases scale poorly and have poor availability

Amazon Dynamo
Requirements:
– Scale
– Simple: key-value
– Highly available
– Guarantee Service Level Agreements (SLAs)
Uses a key-value store as the abstraction

System assumptions and requirements
Query model
– Read and write operations on a data item that is uniquely identified by a key
– No schema needed
– Small objects (< 1 MB) stored as blobs
ACID properties?
– Atomicity and durability, but weaker consistency and no isolation guarantees
Efficiency
– Commodity hardware
– Mind the SLA!
Other assumptions
– The environment is friendly (no security issues)

Amazon request handling: 99.9% SLAs
(figure slide; the diagram is not preserved in this transcript)

Design considerations
Sacrifice strong consistency for availability
– Why are consistency and availability at odds?
Optimistic replication increases availability
– Allows disconnected operation
– This may lead to concurrent updates to the same object: a conflict
When should conflicts be resolved?
– Delaying writes is unacceptable (e.g. a shopping-cart update)
– Resolve conflicts during reads instead of writes, i.e. stay "always writeable"
Who resolves conflicts?
– The application – e.g. merge shopping-cart contents (sketched below)
– The datastore – last write wins
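For the shopping-cart case, application-level reconciliation can be as simple as merging the divergent versions; this is an illustrative sketch, not Amazon's actual merge logic:

    def merge_carts(versions: list[dict[str, int]]) -> dict[str, int]:
        # Union the divergent carts, keeping the larger quantity per item.
        # Additions are never lost; the price is that deleted items can
        # occasionally resurface after a merge.
        merged: dict[str, int] = {}
        for cart in versions:
            for item, qty in cart.items():
                merged[item] = max(merged.get(item, 0), qty)
        return merged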

Other design considerations
– Incremental scalability
– Symmetry
– Decentralization
– Heterogeneity

Partitioning algorithm
Dynamo uses consistent hashing
Issues with plain consistent hashing:
– Load imbalance
– Dealing with heterogeneity
"Virtual nodes": each physical node can be responsible for more than one virtual node

Advantages of using virtual nodes
If a node becomes unavailable, the load it handled is evenly dispersed across the remaining available nodes
When a node becomes available again, it accepts a roughly equivalent amount of load from each of the other available nodes
The number of virtual nodes that a node is responsible for can be decided based on its capacity, accounting for heterogeneity in the physical infrastructure
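A sketch of how virtual nodes fit into the consistent-hashing scheme, reusing the hypothetical chord_id() helper from the Chord section: each physical node claims several tokens on the ring, with more tokens for more capable machines:

    def virtual_tokens(addr: str, weight: int) -> list[int]:
        # Place `weight` virtual nodes for one physical node; a bigger
        # machine gets more tokens, hence a larger share of the key space.
        return [chord_id(f"{addr}#vnode{i}") for i in range(weight)]

    # A node with twice the capacity takes roughly twice the load.
    ring = virtual_tokens("10.0.0.1:4000", 8) + virtual_tokens("10.0.0.2:4000", 16)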

Replication
Each data item is replicated at N hosts
– N is specified per Dynamo instance
"Preference list": the list of nodes responsible for storing a key – its coordinator plus the next N−1 successors on the ring

Data versioning
A put() call may return to its caller before the update has been applied at all replicas
A get() call may return many versions of the same object
Challenge: an object may have distinct version sub-histories, which the system will need to reconcile in the future
Solution: use vector clocks to capture causality between different versions of the same object

Vector clocks
A vector clock is a list of (node, counter) pairs
Every version of every object is associated with one vector clock
If every counter in the first object's clock is less than or equal to the corresponding counter in the second object's clock, then the first is an ancestor of the second and can be forgotten

Vector clock example
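The example figure itself is not preserved in this transcript; the sketch below reproduces the same kind of scenario (the replica names Sx, Sy, Sz follow the paper's example):

    VectorClock = dict[str, int]  # node name -> per-node update counter

    def descends_from(a: VectorClock, b: VectorClock) -> bool:
        # True if b is an ancestor of a: every counter in b is <= the
        # corresponding counter in a, so b can be forgotten.
        return all(a.get(node, 0) >= count for node, count in b.items())

    # Two writes coordinated by Sy and Sz diverge from a common ancestor d2:
    d2 = {"Sx": 2}
    d3 = {"Sx": 2, "Sy": 1}
    d4 = {"Sx": 2, "Sz": 1}
    assert descends_from(d3, d2) and descends_from(d4, d2)  # d2 is an ancestor
    # Neither of d3 and d4 descends from the other: a conflict that must be
    # kept and reconciled on a later read.
    assert not descends_from(d3, d4) and not descends_from(d4, d3)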

Execution of get() and put() operations
A client can select a node in two ways:
1. Route its request through a generic load balancer that will select a node based on load information
2. Use a partition-aware client library that routes requests directly to the appropriate coordinator nodes

Quorum systems
We are balancing writes and reads over N nodes
How do we make sure a read sees the latest write?
– Write on all nodes and wait for replies from all; then read from any node
– Or write to one node and read from all
Quorum systems generalize this: write to W nodes and read from R nodes such that W + R > N
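The overlap condition is a one-liner; a sketch with a few standard configurations:

    def quorums_overlap(n: int, w: int, r: int) -> bool:
        # W + R > N guarantees every read quorum intersects every write
        # quorum, so a read always contacts at least one up-to-date replica.
        return w + r > n

    assert quorums_overlap(3, 2, 2)        # Dynamo's common (N, R, W) = (3, 2, 2)
    assert quorums_overlap(3, 3, 1)        # write-all / read-one
    assert not quorums_overlap(3, 1, 1)    # fast, but reads may miss writes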

Dynamo uses a sloppy quorum
Send each write to all nodes
– Return once W reply
Send each read to all nodes
– Return the result(s) once R reply
What did we lose?

Hinted handoff
Assume N = 3. When B is temporarily down or unreachable during a write, send the replica to E instead
– E's metadata hints that the replica belongs to B, and E will deliver it to B once B recovers
A write will succeed as long as there are W nodes (any W) available in the system
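A sketch of the mechanism under the same example; the names and the dict-based "network" are stand-ins for illustration:

    Store = dict[str, dict[str, tuple[bytes, str | None]]]  # node -> key -> (value, hint)

    def write_with_hint(key: str, value: bytes, preference_list: list[str],
                        alive: set[str], standby: list[str], store: Store) -> None:
        # If a home replica (B in the slide's example) is down, the next
        # healthy node on the ring (E) stores the value together with a
        # hint naming the intended home.
        spares = iter(n for n in standby if n in alive)
        for home in preference_list:
            if home in alive:
                store[home][key] = (value, None)          # normal replica
            else:
                store[next(spares)][key] = (value, home)  # hinted replica

    def handoff(node: str, recovered: str, store: Store) -> None:
        # Once the home node recovers, the stand-in delivers its hinted
        # replicas back and drops its local copies.
        for key, (value, hint) in list(store[node].items()):
            if hint == recovered:
                store[recovered][key] = (value, None)
                del store[node][key]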

Dynamo membership
Membership changes are configured manually
– A gossip-based protocol propagates membership information
– Every node knows about every other node's key range
Failures are detected by each node individually via timeouts
– This enables hinted handoff, etc.

Implementation
Written in Java
A local persistence component allows different storage engines to be plugged in:
– Berkeley Database (BDB) Transactional Data Store: objects of tens of kilobytes
– MySQL: objects larger than tens of kilobytes
– BDB Java Edition, etc.

Evaluation