Slide 1: Glacier: Highly durable, decentralized storage despite massive correlated failures
Andreas Haeberlen, Alan Mislove, Peter Druschel (Rice University, Houston, TX)
2nd Symposium on Networked Systems Design & Implementation (NSDI), Boston, MA, May 2-4, 2005
© 2005 Andreas Haeberlen, Rice University

Slide 2: Introduction
Many distributed applications require storage.
Cooperative storage: aggregate the storage on the participating nodes.
Advantages: resilient, highly scalable.
Examples: Farsite, PAST, OceanStore.
[Figure: nodes connected by a structured overlay network]

Slide 3: Motivation
Common assumption: high node diversity → failure independence. Unrealistic!
The node population may have low diversity (e.g., the same OS).
Worms can cause large-scale correlated Byzantine failures.
Reactive systems are too slow to prevent data loss.

Slide 4: Related Work
Phoenix and OceanStore use introspection: build a failure model, then store data on nodes with low correlation.
Limitations:
- The model must reflect all possible correlations.
- Even small inaccuracies may lead to data loss.
- Users have an incentive to report incorrect data.

Slide 5: Our Approach: Glacier
Create massive redundancy to ensure that data survives any correlated failure with high probability.
Assumption: the magnitude of the failure can be bounded by a fraction f_max.
Challenges: minimize storage and bandwidth requirements; withstand attacks and Byzantine failures.
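To make the redundancy/durability relationship concrete, one simple back-of-the-envelope model (a simplification for illustration, not the exact analysis from the Glacier paper) assumes that each of the n nodes holding a fragment of an object fails independently with probability at most f_max; the object survives if at least r of its n fragments remain:

P_{\text{survive}} \;\ge\; \sum_{i=r}^{n} \binom{n}{i}\,(1 - f_{\max})^{i}\, f_{\max}^{\,n-i}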

Slide 6: Glacier: Insertion
When a new object is inserted:
1. Apply an erasure code.
2. Attach a manifest with the hashes of all fragments.
3. Send each fragment to a different node.
There is no remote delete operation, but the lifetime of objects can be limited: storage is lease-based and reclaims unused storage.
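The three insertion steps can be pictured with the following minimal Python sketch. The StorageNode class, the placement rule, and the erasure-coding placeholder are illustrative stand-ins, not Glacier's actual code; a real deployment would use a proper (n, r) erasure code such as Reed-Solomon.

import hashlib

class StorageNode:
    """Toy in-memory stand-in for a remote Glacier node."""
    def __init__(self, name):
        self.name = name
        self.fragments = {}   # (object_id, index) -> (fragment, manifest)

    def store(self, object_id, index, fragment, manifest):
        self.fragments[(object_id, index)] = (fragment, manifest)

def encode_fragments(data, n, r):
    """Placeholder for an (n, r) erasure code (any r of the n fragments
    suffice for recovery). Here the data is simply replicated so the
    example runs end to end."""
    return [data for _ in range(n)]

def insert_object(data, nodes, n=48, r=5):
    # 1. Apply the erasure code.
    fragments = encode_fragments(data, n, r)
    # 2. Attach a manifest with the hash of every fragment, so each
    #    fragment can later be verified on its own.
    object_id = hashlib.sha1(data).hexdigest()
    manifest = {
        "object_id": object_id,
        "fragment_hashes": [hashlib.sha1(f).hexdigest() for f in fragments],
        "n": n, "r": r,
    }
    # 3. Send each fragment to a different node.
    for index, fragment in enumerate(fragments):
        nodes[index % len(nodes)].store(object_id, index, fragment, manifest)
    return manifest

nodes = [StorageNode(f"node{i}") for i in range(48)]
insert_object(b"hello glacier", nodes)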

Slide 7: Glacier: Maintenance
Nodes that are close in the identifier space store similar sets of fragments.
Periodic maintenance: ask a peer node for its list of fragments; compare it with the local list and recover any missing fragments.
Fragments remain on their nodes during offline periods.
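One round of this periodic exchange can be sketched as follows; the peer interface (list_fragments, fetch) is an illustrative stand-in for the remote calls, not Glacier's real protocol.

def maintenance_round(local_fragments, peer):
    """local_fragments: dict mapping (object_id, index) -> fragment data.
    peer: an object exposing list_fragments() and fetch(key); both are
    assumed interfaces for the purposes of this sketch."""
    # Ask the peer for the keys of the fragments it holds.
    peer_keys = set(peer.list_fragments())
    local_keys = set(local_fragments)

    # Any key the peer holds but we do not may indicate a fragment we
    # lost (or never received); try to recover it from the peer.
    for key in peer_keys - local_keys:
        fragment = peer.fetch(key)
        if fragment is not None:
            local_fragments[key] = fragment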

Slide 8: Glacier: Recovery
During a failure, some fragments are damaged or lost, and communication may not be possible.
Unaffected nodes do not take any special action: failed nodes are eventually repaired, and maintenance gradually restores the lost fragments.
[Timeline figure: insert, correlated failure at T_fail, offline period]

Slide 9: Glacier: Durability
Example configuration: 48 fragments, any 5 of which are sufficient for recovery.
Bad news: storage overhead of 9.6x.
Good news: survives a 60% correlated failure with high probability P (single object).
[Table with columns: f_max, Durability, Code, Fragments, Storage — more storage yields higher durability]
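Under the simplified independence model sketched after slide 5 (again, an approximation rather than the paper's exact analysis), the single-object survival probability for this example configuration can be evaluated directly in Python:

from math import comb

def survival_probability(n, r, f):
    """Probability that at least r of n fragments survive when each
    fragment holder fails independently with probability f."""
    return sum(comb(n, i) * (1 - f) ** i * f ** (n - i)
               for i in range(r, n + 1))

# Example configuration from the slide: 48 fragments, any 5 sufficient,
# and 60% of the nodes fail.
print(survival_probability(48, 5, 0.6))   # very close to 1
print(48 / 5)                             # storage overhead factor: 9.6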

Slide 10: Aggregation
If objects are small, the result is a huge number of fragments and high overhead for storage and management.
Solution: aggregate objects before storing them in Glacier.
Challenges: untrusted environment; aggregates must be self-authenticating.
[Diagram: App → Glacier vs. App → Aggregation → Glacier]

Slide 11: Aggregation: Links
The mapping from objects to aggregates is crucial: it needs durability and authentication.
Solution: link the aggregates. The result is a DAG, and the mapping can be recovered by traversing it.
The DAG forms a hash tree, so it is easy to authenticate.
The top-level pointer is kept in Glacier itself.
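As a rough illustration of how linked aggregates can be made self-authenticating (illustrative names and structures, not Glacier's actual data formats), each aggregate below carries the hashes of the aggregates it links to, so a traversal starting from the top-level pointer can verify every aggregate it visits:

import hashlib, json

def aggregate_hash(aggregate):
    """Content hash over an aggregate's payload and its outgoing links."""
    blob = json.dumps(aggregate, sort_keys=True).encode()
    return hashlib.sha1(blob).hexdigest()

def make_aggregate(objects, linked_aggregates):
    """Bundle small objects and record hashes of linked (older) aggregates,
    forming a DAG that doubles as a hash tree."""
    return {
        "objects": objects,
        "links": [aggregate_hash(a) for a in linked_aggregates],
    }

def verify(top_hash, store):
    """Walk the DAG from the top-level pointer, checking each aggregate's
    hash; `store` maps hash -> aggregate (e.g., fetched from Glacier)."""
    pending = [top_hash]
    while pending:
        h = pending.pop()
        aggregate = store[h]
        if aggregate_hash(aggregate) != h:
            return False
        pending.extend(aggregate["links"])
    return True

# Example: an older aggregate linked from a newer one; the newest hash
# plays the role of the top-level pointer kept in Glacier itself.
a1 = make_aggregate(["msg1", "msg2"], [])
a2 = make_aggregate(["msg3"], [a1])
store = {aggregate_hash(a1): a1, aggregate_hash(a2): a2}
print(verify(aggregate_hash(a2), store))   # True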

Slide 12: Evaluation
Two sets of experiments: trace-driven simulations (scalability, churn, ...) and an actual deployment, ePOST.
ePOST: a cooperative, serverless email system in production use; initially 17 users and 20 nodes; based on FreePastry, PAST, Scribe, and POST; Glacier was added for durability.
Glacier configuration in ePOST: 48 fragments, 0.2 encoding, f_max = 0.6.
Many days of practical experience (incl. some failures).

Slide 13: Evaluation: Storage
Inherent storage overhead: 48/5 = 9.6x.
On-disk storage used for 1.3 GB of data corresponds to an actual storage overhead of about 12.6x.

Slide 14: Evaluation: Network load
During stable periods, traffic is comparable to PAST.
In the ePOST experiment, a misconfiguration caused frequent traffic spikes: long offline periods were mistaken for failures.

Slide 15: Evaluation: Recovery
Experiment: created a 'clone' of the ePOST ring with only 13 of the 31 nodes (a 58% failure!).
Started the recovery process on a freshly installed node: the user entered an email address and the date of last use; Glacier located the head of the aggregate tree and recovered it.
The system was again ready for use; there was no data loss.

Slide 16: Conclusions
Large-scale correlated failures are a realistic threat to distributed storage systems.
Glacier provides hard durability guarantees with minimal assumptions about the failure model.
Glacier transforms abundant but unreliable disk space into reliable storage; the bandwidth cost is low.
Thank you!

Slide 17: Glacier is available!
Download:
Serverless and secure, easy to set up, uses Glacier for durability.