P2P: Storage

Overall outline
(Relatively) chronological overview of P2P areas: What is P2P? Filesharing → structured networks → storage → the cloud.
Dynamo: design considerations; challenges and design techniques; evaluation, takeaways, and discussion.
Cassandra: notable design choices and a comparison with Dynamo.
Since this is the last class on P2P, we give background as a review: the sequence of areas where the P2P design pattern was applied in research, from the beginning of interest in P2P to the cloud environment where Dynamo picks up. For Cassandra, we mention some interesting design choices and do a bit of comparison to Dynamo.

Background: P2P
Formal definition? Symmetric division of responsibility and functionality: unlike the client-server model, nodes both request and provide service, and each node enjoys the aggregate service provided by its peers. Used correctly, this symmetry can offer better load distribution, fault-tolerance, scalability, and more. P2P was on a fast rise in the early 2000s.
I looked around but couldn't find a formal definition; this is the working definition I use (Hakim may want to interject here). Next we discuss how interest in P2P evolved after it sparked in the early 2000s.

Background: P2P filesharing & unstructured networks
Napster (1999), Gnutella (2000), FreeNet (2000).
Key challenge: decentralize content search and routing.
The main application of P2P in the early 2000s was filesharing systems such as these. The key challenge was using the symmetric P2P design to decentralize content search and to route over an arbitrary topology.

Background: P2P structured networks
CAN (2001), Chord (2001), Pastry (2001), Tapestry (2001): more systematic and formal.
Key challenges: routing latency, churn-resistance, scalability.
Then came structured P2P networks, whose routing geometries are defined with various complex tradeoffs. These are the original four systems; we have reviewed some of them, so we know the main challenges in this area.

Background: P2P Storage
CAN (2001); Chord (2001) → DHash++ (2004); Pastry (2001) → PAST (2001); Tapestry (2001) → Pond (2003); Chord/Pastry → Bamboo (2004).
Key challenges: distrusting peers, high churn rate, low-bandwidth connections.
Storage systems were built on top of these routing geometries. The main challenges they faced were mutually distrusting peers, a high expected churn rate, and low-bandwidth connections.

Background: P2P on the Cloud
In contrast: a single administrative domain, low churn (only due to permanent failure), high-bandwidth connections.
This new environment created new space for research, which is where Dynamo picks up.

Dynamo: Amazon's Highly Available Key-value Store
SOSP 2007: Giuseppe DeCandia, Deniz Hastorun, Madan Jampani, Gunavardhan Kakulapati, Avinash Lakshman, Alex Pilchin, Swaminathan Sivasubramanian, Peter Vosshall and Werner Vogels.
Backs best-seller lists, shopping carts, etc.; also offered as a proprietary service at AWS. (Werner Vogels: Cornell → Amazon.)

Interface
put(key, context, object) → success / fail
get(key) → (set of values, context) / fail
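To make the interface concrete, here is a minimal Python sketch of what a client-facing API could look like. The paper specifies only get() and put() with an opaque context; the class and type names here are hypothetical.

from typing import List, Optional, Tuple

class DynamoClient:
    def put(self, key: str, context: bytes, obj: bytes) -> bool:
        """Store obj under key. The opaque context carries the version
        metadata (vector clocks) returned by a preceding get()."""
        raise NotImplementedError  # sketch only

    def get(self, key: str) -> Optional[Tuple[List[bytes], bytes]]:
        """Return every causally unrelated version of the object, plus an
        opaque context summarizing their vector clocks; None on failure."""
        raise NotImplementedError  # sketch only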

Dynamo's design considerations
Strict performance requirements, tailored closely to the cloud environment:
- Very high write availability (in CAP terms, availability is favored over consistency).
- No isolation, single-key updates only (no transactions).
- 99.9th percentile SLA system (SLA = Service Level Agreement, setting an upper bound on latency).
- Regional power outages are tolerable.
- Symmetry of function across nodes.
- Incremental scalability, with explicit node joins and a low churn rate assumed.
Why are these worth mentioning? They were uncommon at the time, they reflect the cloud environment just discussed, and the later design choices greatly depend on the assumptions made here. In particular: no write should fail or be delayed by consistency management, and incremental scalability means nodes are added manually one at a time with minimal disruption.

List of challenges:
- Incremental scalability and load balance
- Flexible durability
- High write availability
- Handling temporary failure
- Handling permanent failure
- Membership protocol and failure detection
This list generally follows Table 1 of the paper, with non-trivial decisions such as flexible durability added.

List of challenges:
- Incremental scalability and load balance: adding one node at a time, uniform node-key distribution, node heterogeneity
- Flexible durability
- High write availability
- Handling temporary failure
- Handling permanent failure
- Membership protocol and failure detection
For the first challenge, this means minimizing disruption when adding one node at a time, maintaining a uniform node-key distribution, and addressing node heterogeneity.

Incremental scalability and load balance
Consistent hashing with virtual nodes (as seen in Chord): each node gets several smaller key ranges instead of one big one.

Incremental scalability and load balance
Benefits of virtual nodes:
- More uniform key-node distribution
- Node joins and leaves involve only the neighboring nodes
- Variable number of virtual nodes per physical node, which accommodates heterogeneity
A sketch of the ring follows below.
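A minimal sketch of consistent hashing with virtual nodes, assuming MD5 as the ring hash and a per-node weight standing in for hardware capacity; all names are illustrative.

import hashlib
from bisect import bisect_right

class ConsistentHashRing:
    def __init__(self, vnodes_per_weight=8):
        self.vnodes_per_weight = vnodes_per_weight
        self.ring = []  # sorted (ring position, physical node) pairs

    def _pos(self, s: str) -> int:
        return int(hashlib.md5(s.encode()).hexdigest(), 16)

    def add_node(self, node: str, weight: int = 1):
        # A stronger machine gets more virtual nodes, hence more key ranges.
        for i in range(self.vnodes_per_weight * weight):
            self.ring.append((self._pos(f"{node}#vnode{i}"), node))
        self.ring.sort()

    def coordinator(self, key: str) -> str:
        # A key is owned by the first virtual node clockwise from its hash.
        i = bisect_right(self.ring, (self._pos(key), chr(0x10FFFF)))
        return self.ring[i % len(self.ring)][1]

# Usage:
# ring = ConsistentHashRing()
# ring.add_node("A"); ring.add_node("B", weight=2)  # B is twice as capable
# ring.coordinator("user:42")  -> "A" or "B"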

List of challenges:
- Incremental scalability and load balance
- Flexible durability: latency vs. durability
- High write availability
- Handling temporary failure
- Handling permanent failure
- Membership protocol and failure detection

Flexible Durability
Each key has a preference list of nodes.
- N: number of healthy nodes the coordinator references
- W: minimum number of responses for a put
- R: minimum number of responses for a get
R, W, N tradeoffs:
- W↑ ⇒ consistency↑, latency↑
- R↑ ⇒ consistency↑, latency↑
- N↑ ⇒ durability↑, load on coordinator↑
R + W > N: read-your-writes — assuming no inaccessibility, the read and write quorums must overlap, so a read is guaranteed to see the latest write.

Flexible Durability
(Same N, R, W setup as above.) Benefits:
- Tunable consistency, latency, and fault-tolerance
- Latency is set by the fastest W (or R) responders among the N healthy replicas
- Allows hinted handoff (sketch below, after the temporary-failure slide)
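As a rough sketch of how the R/W quorums drive a put and a get, assuming hypothetical replica objects with store()/fetch() methods and a healthy flag (a real coordinator would contact the replicas in parallel rather than sequentially):

def quorum_put(preference_list, key, value, N=3, W=2):
    # Write to the first N healthy nodes; succeed once W acknowledge.
    acks = 0
    for node in [n for n in preference_list if n.healthy][:N]:
        if node.store(key, value):  # hypothetical replica API
            acks += 1
            if acks >= W:
                return True
    return False

def quorum_get(preference_list, key, N=3, R=2):
    # Read from the first N healthy nodes; return once R respond.
    versions = []
    for node in [n for n in preference_list if n.healthy][:N]:
        v = node.fetch(key)  # hypothetical replica API
        if v is not None:
            versions.append(v)
            if len(versions) >= R:
                return versions  # may contain divergent versions to reconcile
    return None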

List of challenges:
- Incremental scalability and load balance
- Flexible durability
- High write availability: writes cannot fail or be delayed because of consistency management
- Handling temporary failure
- Handling permanent failure
- Membership protocol and failure detection
The key idea is that Dynamo aggressively accepts every write it receives.

Achieving High Write Availability
Weak consistency:
- Small W → outdated objects lying around
- Small R → reads may return outdated objects
An update is meaningful by itself and should be preserved, so accept all updates, even those applied to outdated copies. Updates on outdated copies turn an object's version history into a DAG under the "was-before" relation. Given two copies, we should be able to tell:
- was-before relation holds → the newer copy subsumes the older
- independent (concurrent) → preserve both
But a single version number (a Lamport clock) forces a total ordering, hiding concurrency.

Hiding Concurrency
[Figure: an object's version history under a single version counter; a concurrent write handled by node Sz is forced into the total order, so its concurrency with the other branch is hidden.]

Achieving High Write Availability
(Continuing from above.) The fix: a vector clock — one version number per key per machine — preserves concurrency instead of forcing a total order.

Showing Concurrency
[Figure: the same version history with vector clocks; the write handled by Sz adds an Sz entry (e.g., [Sz,2]) to the clock, so the two branches are visibly concurrent and can later be reconciled.]
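A minimal Python sketch of vector-clock versioning matching the scheme above; the method names and node ids like "Sx" are illustrative:

class VectorClock(dict):
    # Maps machine id -> per-key version counter.
    def bump(self, node):
        clock = VectorClock(self)
        clock[node] = clock.get(node, 0) + 1
        return clock

def descends(a, b):
    # a subsumes b if b "was before" a: every counter in b is <= a's.
    return all(a.get(n, 0) >= c for n, c in b.items())

def concurrent(a, b):
    # Neither subsumes the other: preserve both versions.
    return not descends(a, b) and not descends(b, a)

# Example mirroring the figure: two writes diverge after a common ancestor.
d2 = VectorClock().bump("Sx").bump("Sx")   # [Sx,2]
d3 = d2.bump("Sy")                         # [Sx,2],[Sy,1]
d4 = d2.bump("Sz")                         # [Sx,2],[Sz,1]
assert descends(d3, d2) and concurrent(d3, d4)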

Achieving High Write Availability
No write fails or is delayed because of consistency management:
- Immutable objects + vector clock as version
- Automatic subsumption reconciliation
- Client resolves unknown (concurrent) relations through the context

Achieving High Write Availability
Example of client-side reconciliation:
get(k) → {D3, D4}, opaque_context(D3.clock, D4.clock)
/* client reconciles D3 and D4 into D5 */
put(k, opaque_context(D3.clock, D4.clock), D5)
Dynamo then creates a vector clock for D5 that subsumes the clocks in the context.

Achieving High Write Availability
Benefit: aggressively accept all updates.
Problems: reconciliation is pushed to the client; reconciliation is not always possible; and a client must read after each write to obtain the context that chains a sequence of its own updates.
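As an illustration of client-side reconciliation, here is the shopping-cart style of merge (the carts as Python sets are hypothetical): merging by union preserves every add, but it also shows why reconciliation is lossy in general — a concurrently deleted item resurfaces.

def merge_carts(versions):
    # Union all concurrent cart versions returned by a get().
    merged = set()
    for cart in versions:
        merged |= cart
    return merged

# Two concurrent versions: one added "mug", the other deleted "pen".
print(merge_carts([{"book", "pen"}, {"book", "mug"}]))
# -> {"book", "pen", "mug"}: the deleted "pen" resurfaces.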

List of challenges:
- Incremental scalability and load balance
- Flexible durability
- High write availability
- Handling temporary failure: writes cannot fail or be delayed because of temporary inaccessibility
- Handling permanent failure
- Membership protocol and failure detection

Handling Temporary Failures
No write fails or is delayed because of temporary inaccessibility; assume the node will be accessible again soon. The coordinator walks past the unreachable nodes on the preference list and references node N+a to reach W responses. Node N+a keeps the object, tagged with a hint naming the intended node, and passes it back to that node at the first opportunity (hinted handoff).
Benefit: aggressively accept all updates.
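A rough sketch of a sloppy-quorum put with hinted handoff, assuming hypothetical node objects with a reachable flag and store()/store_hinted() methods:

def sloppy_put(preference_list, key, value, N=3, W=2):
    acks = 0
    stand_ins = iter(preference_list[N:])  # nodes beyond the top N
    for target in preference_list[:N]:
        if target.reachable:
            target.store(key, value)
            acks += 1
        else:
            # Walk past the failed node: the next node down the list takes
            # the write, tagged with a hint naming the intended replica.
            stand_in = next(stand_ins, None)
            if stand_in is not None and stand_in.reachable:
                stand_in.store_hinted(key, value, hint=target)
                acks += 1
    return acks >= W

def deliver_hints(node):
    # Run periodically: hand hinted objects back once the target recovers.
    for key, value, target in node.hinted_objects():
        if target.reachable:
            target.store(key, value)
            node.drop_hint(key)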

List of challenges:
- Incremental scalability and load balance
- Flexible durability
- High write availability
- Handling temporary failure
- Handling permanent failure: maintain eventual consistency despite permanent failures
- Membership protocol and failure detection

Permanent failures in Dynamo
Use anti-entropy (replica synchronization) between replicas. Merkle trees let replicas find the key ranges that diverge while exchanging only hashes, speeding up synchronization.
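A compact sketch of Merkle-tree anti-entropy, assuming both replicas build trees of the same shape over the same key ranges and a power-of-two leaf count (all names illustrative):

import hashlib

def h(b: bytes) -> bytes:
    return hashlib.sha256(b).digest()

class Node:
    def __init__(self, left=None, right=None, leaf=None):
        self.left, self.right = left, right
        # A leaf hashes its key range's data; an inner node hashes its children.
        self.hash = h(leaf) if leaf is not None else h(left.hash + right.hash)

def build(leaves):
    nodes = [Node(leaf=l) for l in leaves]  # len(leaves) must be a power of 2
    while len(nodes) > 1:
        nodes = [Node(nodes[i], nodes[i + 1]) for i in range(0, len(nodes), 2)]
    return nodes[0]

def divergent(a, b, out):
    # Descend only where hashes differ; equal subtrees are skipped entirely.
    if a.hash == b.hash:
        return
    if a.left is None:
        out.append((a, b))  # a leaf key range that needs synchronization
        return
    divergent(a.left, b.left, out)
    divergent(a.right, b.right, out)

# Example: two replicas differ only in the third key range.
ra = build([b"r0", b"r1", b"r2", b"r3"])
rb = build([b"r0", b"r1", b"r2-changed", b"r3"])
diffs = []
divergent(ra, rb, diffs)  # diffs holds exactly one leaf pair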

List of challenges:
- Incremental scalability and load balance
- Flexible durability
- High write availability
- Handling temporary failure
- Handling permanent failure
- Membership protocol and failure detection

Membership and failure detection in Dynamo
- Anti-entropy (gossip) to reconcile membership: an eventually consistent view
- Every node knows the full ring, giving constant-time lookup
- Explicit node join and removal
- Seed nodes avoid logical network partitions
- Temporary inaccessibility is detected through timeouts and handled locally
A gossip sketch follows below.
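A minimal sketch of gossip-based membership reconciliation, assuming each node keeps a view mapping node id to a (version, status) pair; the field names are illustrative:

import random

def gossip_round(node, peers):
    # Pick a random peer and merge views; higher versions win, so both
    # sides converge toward an eventually consistent membership view.
    peer = random.choice(peers)
    merged = dict(node.view)
    for nid, (version, status) in peer.view.items():
        if nid not in merged or version > merged[nid][0]:
            merged[nid] = (version, status)
    node.view = dict(merged)
    peer.view = dict(merged)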

Evaluation
[Latency figures] (1) Low variance in read and write latencies. (2) Writes go directly to memory, with reads served from cache. (3) The latency distribution is skewed.

Evaluation
Buffering writes lowers write latency and smooths out the 99.9th percentile extremes, at a durability cost.

Evaluation
Load balance (a node counts as imbalanced when its load deviates more than 15% from the average node load): at lower loads, fewer popular keys are in play, so load is less evenly spread; at higher loads, many popular keys are spread roughly equally among the nodes, and most nodes stay within 15% of the average.

Takeaways
- The user gets knobs (N, R, W) to balance durability, latency, and consistency.
- P2P techniques can be used in the cloud environment to produce highly available services.
- Instead of resolving consistency for all clients at a universally higher latency, let each client resolve its own consistency individually.
- There exist industry services that require every update to be preserved.

Cassandra - A Decentralized Structured Storage System
Avinash Lakshman, Prashant Malik. Avinash Lakshman was on the Dynamo team. Cassandra was used in multiple internal systems at Facebook, including Inbox Search.

Interface / Data Model
Borrows from Bigtable: each row is identified by a key; columns hold the attributes associated with that key; columns are grouped into column families and super columns (column families within column families).
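As a rough illustration of the shape of this data model, using the Inbox Search use case mentioned above (the concrete keys, column names, and values here are invented):

# row key -> column family -> column -> (value, timestamp)
inbox_search = {
    "user42": {                       # row, addressed by key "user42"
        "terms": {                    # a column family
            "dynamo": ("msg17,msg243", 1_700_000_000),
            "chord":  ("msg9",         1_700_000_050),
        },
    },
}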

In Relation to Dynamo
The two systems are very similar in design, but Cassandra departs in a few ways:
- On vector clocks: "A write operation in Dynamo also requires a read to be performed for managing the vector timestamps … limiting [when] handling a very high write throughput."
- Instead of virtual nodes, Cassandra moves the tokens of lightly loaded nodes around the ring: "makes the design and implementation very tractable … deterministic choices about load balancing."
- Consistency options range from a single machine to a quorum, backed by anti-entropy.
- Bootstrapping is automated through ZooKeeper.

Results

Thank you for listening!