Efficient Replica Maintenance for Distributed Storage Systems
B-G Chun, F. Dabek, A. Haeberlen, E. Sit, H. Weatherspoon, M. Kaashoek, J. Kubiatowicz, and R. Morris
In Proc. of NSDI, May 2006
Presenter: Fabián E. Bustamante, Fall 2005
EECS 443 Advanced Operating Systems, Northwestern University

Slide 2: Replication in Wide-Area Storage
Applications put & get objects in/from the wide-area storage system
Objects are replicated for
–Availability: a get on an object will return promptly
–Durability: objects put by the app are not lost due to disk failures
–Note that an object may be durably stored but not immediately available
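A toy sketch of that last distinction (hypothetical Python, not from the paper: "durable" means some node holds the object on disk, "available" means at least one holder is reachable right now):

    from dataclasses import dataclass, field

    @dataclass
    class WideAreaStore:
        """Toy model: an object is durable if any node holds it on disk,
        but available only if at least one holder is reachable."""
        replicas: dict = field(default_factory=dict)  # key -> set of holder nodes
        up: set = field(default_factory=set)          # currently reachable nodes

        def put(self, key, nodes):
            self.replicas.setdefault(key, set()).update(nodes)

        def get(self, key):
            holders = self.replicas.get(key, set())
            if holders & self.up:
                return f"value of {key}"    # available: get returns promptly
            if holders:
                raise TimeoutError(f"{key} is durable but not currently available")
            raise KeyError(key)             # no replica anywhere: durability lost

    store = WideAreaStore(up={"n1"})
    store.put("photo", {"n2", "n3"})        # all replicas on nodes that are down
    try:
        store.get("photo")
    except TimeoutError as e:
        print(e)                            # durable on n2/n3, but unavailable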

Slide 3: Goal: durability at low bandwidth cost
Durability is a more practical & useful goal than availability
Threat to durability
–Losing the last copy of an object
–So, create copies faster than they are destroyed
Challenges
–Replication can eat your bandwidth
–Hard to distinguish between transient & permanent failures
–After recovery, some replicas may be on nodes the lookup algorithm does not check
Paper presents Carbonite, an efficient wide-area replication technique for durability

Slide 4: System Environment
Use PlanetLab (PL) as representative
–>600 nodes distributed world-wide
–History traces collected by the CoMon project (every 5 minutes)
–Disk failures from event logs of PlanetLab Central
Synthetic traces
–632 nodes, as in PL
–Failure inter-arrival times drawn from an exponential distribution (mean session time and downtime as in PL)
–Two years instead of one, and an average node lifetime of 1 year
Simulation
–Trace-driven, event-based simulator
–Assumptions: network paths are independent; all nodes are reachable from all other nodes; each node has the same link capacity

PlanetLab trace characteristics:
  Dates                              3/1/05 - 2/28/06
  Hosts                              632
  Transient failures                 21355
  Disk failures                      219
  Transient host downtime (s)        median 1208, avg -, 90th -
  Any failure interarrival (s)       median 305, avg 1467, 90th 3306
  Disk failure interarrival (s)      median 544411, avg -, 90th -
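A minimal sketch of how such a synthetic trace could be generated (the mean session time and downtime below are illustrative placeholders, not the values fitted from the PL trace):

    import random

    def synthetic_trace(n_nodes=632, mean_session=30_000.0, mean_downtime=1_500.0,
                        duration=2 * 365 * 86_400, seed=0):
        """Per-node up/down events with exponential inter-arrival times."""
        rng = random.Random(seed)
        events = []  # (time_s, node, "up" | "down")
        for node in range(n_nodes):
            t, up = 0.0, True
            while t < duration:
                # Sample how long the node stays in its current state.
                t += rng.expovariate(1.0 / (mean_session if up else mean_downtime))
                up = not up
                events.append((t, node, "up" if up else "down"))
        events.sort()
        return events

    trace = synthetic_trace()
    print(len(trace), "events, first:", trace[0])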

Slide 5: Understanding durability
To handle some avg. rate of failure, create new replicas faster than they are destroyed
–Creation rate is a function of per-node access link, number of nodes, and amount of data stored per node
An infeasible system, unable to keep pace w/ the avg. failure rate, will eventually adapt by discarding objects (which ones?)
If the creation rate is just above the failure rate, a failure burst may be a problem
rL: target number of replicas to maintain
Durability does not increase continuously with rL
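A back-of-the-envelope version of that feasibility condition (all numbers below are illustrative assumptions, not figures from the paper): replica bytes are destroyed at roughly nodes x data-per-node / disk MTBF, and can be created at roughly nodes x per-node link bandwidth.

    # Hypothetical feasibility check: can creation keep pace with failures?
    NODES = 632
    DATA_PER_NODE = 500e9       # bytes of replica data stored per node
    LINK_BW = 150e3             # bytes/s of access link usable for repair
    DISK_MTBF = 365 * 86_400    # mean time between disk failures per node (s)

    # Each disk failure destroys DATA_PER_NODE bytes; failures arrive
    # across the system at rate NODES / DISK_MTBF.
    loss_rate = NODES * DATA_PER_NODE / DISK_MTBF   # bytes/s destroyed
    creation_rate = NODES * LINK_BW                 # bytes/s creatable

    print(f"loss {loss_rate / 1e6:.1f} MB/s vs creation {creation_rate / 1e6:.1f} MB/s")
    print("feasible" if creation_rate > loss_rate else "infeasible: objects will be lost")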

Slide 6: Improving repair time
Scope: the set of other nodes that can hold copies of the objects a node is responsible for
Small scope
–Easier to keep track of copies
–Effort of creating copies falls on a small set of nodes
–Addition of nodes may result in needless copying of objects (when combined w/ consistent hashing)
Large scope
–Spreads work among more nodes
–Network traffic sources/destinations are spread
–Transient failures will be noticed by more nodes
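One way to picture scope on a consistent-hashing ring (a simplified sketch; the hash and successor-based placement are assumptions, not the paper's exact scheme): with scope k, an object's replicas may live on the k nodes that follow its key on the ring.

    import hashlib

    def ring_position(name: str) -> int:
        """Map a key or node name onto a 2^32-slot identifier ring."""
        return int.from_bytes(hashlib.sha1(name.encode()).digest()[:4], "big")

    def placement_candidates(obj_key: str, node_ids: list, scope: int) -> list:
        """The `scope` successors of obj_key on the ring. A small scope
        concentrates repair work; a large scope spreads it out."""
        ring = sorted(node_ids)
        pos = ring_position(obj_key)
        start = next((i for i, n in enumerate(ring) if n >= pos), 0)
        return [ring[(start + k) % len(ring)] for k in range(min(scope, len(ring)))]

    nodes = [ring_position(f"node-{i}") for i in range(16)]
    print(placement_candidates("some-object", nodes, scope=4))   # small scope
    print(placement_candidates("some-object", nodes, scope=12))  # large scope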

Slide 7: Reducing transient costs
Impossible to distinguish transient from permanent failures
To minimize network traffic due to transient failures: reintegrate returning replicas
Carbonite
–Select a suitable value for rL
–Respond to each detected failure by creating a new replica
–Reintegrate replicas that come back after transient failures
[Figure: bytes sent by different maintenance algorithms]
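A minimal sketch of that maintenance loop (the reachability check and replica-creation call are hypothetical stand-ins; the key point is that copies on unreachable nodes are remembered and counted again when those nodes return, instead of being rebuilt and later thrown away):

    def maintain(obj, replica_nodes, r_l, is_reachable, create_replica):
        """One maintenance round in the spirit of Carbonite: keep >= r_l
        replicas reachable, but never forget copies on down nodes."""
        reachable = {n for n in replica_nodes if is_reachable(n)}
        for _ in range(max(0, r_l - len(reachable))):
            new_node = create_replica(obj, exclude=replica_nodes)
            replica_nodes.add(new_node)   # remember every copy ever made
        # No deletion step: copies that reappear cut the cost of future repairs.

    # Toy usage with stubbed-out network operations:
    nodes_up = {"a", "b", "d", "e"}
    replicas = {"a", "b", "c"}            # "c" is down, maybe only transiently
    spare = iter(["d", "e"])
    maintain("obj-1", replicas, r_l=3,
             is_reachable=lambda n: n in nodes_up,
             create_replica=lambda obj, exclude: next(n for n in spare if n not in exclude))
    print(replicas)                       # {'a', 'b', 'c', 'd'}: 'c' kept for reintegration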

Slide 8: Reducing transient costs (cont.)
[Figure: bytes sent w/ and w/o reintegration]
[Figure: impact of timeouts on bandwidth and durability]

Slide 9: Assumptions
The PlanetLab testbed can be seen as representative of something
Immutable data
Relatively stable system membership & data loss driven by disk failures
Disk failures are uncorrelated
Simulation
–Network paths are independent
–All nodes reachable from all other nodes
–Each node with same link capacity