Long Term Durability with Seagull
Hakim Weatherspoon (joint work with Jeremy Stribling and the OceanStore group)
University of California, Berkeley
ROC/Sahara/OceanStore Retreat, Lake Tahoe. Monday, January 13, 2003

Questions
Given: wide-area durable storage is complex. What is required to convince you to place your data in this system (or a similar system)?
–How do you know that it works?
–How efficient is it? Bandwidth, latency, throughput.
–Do you trust it? Whom do you sue?
–How much does it cost? Bandwidth, storage, money.
–How reliable is it? MTTDL / Fraction of Blocks Lost Per Year (FBLPY).

Relevance to ROC/Sahara/OceanStore
Components of Communication
–Heartbeating, fault-tolerant routing, etc.
Correlation
–Monitoring, human input, etc.
Detection
–Distributed vs. global.
Repair
–Triggered vs. continuous.
Reliability
–Continuous restart of communication links, etc.
–FBLPY (MTTDL).

Outline
Overview
Experience: lessons learned
Required Components
Future Directions

Deployment
PlanetLab global network:
–98 machines at 42 institutions, in North America, Europe, and Australia.
–1.26 GHz Pentium III (1 GB RAM), 1.8 GHz Pentium 4 (2 GB RAM).
–North American machines (2/3) are on Internet2.

Deployment
Deployed the storage system in November 2002:
–~50 physical machines.
–100 virtual nodes: 3 clients, 93 storage servers, 1 archiver, 1 monitor.
–Supports the OceanStore API: NFS, IMAP, etc.
–Fault injection.
–Fault detection and repair.

Path of an OceanStore Update

OceanStore SW Architecture

Path of a Storage Update: Archiver
Erasure codes:
–Redundancy without the overhead of strict replication.
–Produce n fragments, where any m are sufficient to reconstruct the data (m < n). Rate r = m/n; storage overhead is 1/r.
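To make the encode/decode relationship above concrete, here is a toy Reed-Solomon-style sketch over GF(257). It is not the archiver's actual coder, and the small parameters (m = 4, n = 16, the same rate r = 1/4 used in these slides) are chosen purely for brevity; the point is only that any m of the n fragments reconstruct the data.

```python
# Toy (m, n) erasure code: data symbols are the coefficients of a degree-(m-1)
# polynomial over GF(257); each fragment is one evaluation point. Illustrative
# only; not the OceanStore archiver's implementation.
P = 257  # prime field, large enough to hold one byte per symbol

def encode(symbols, n):
    """Evaluate the polynomial at n distinct points; each (x, y) is a fragment."""
    def poly_eval(x):
        y = 0
        for coeff in reversed(symbols):  # Horner's rule
            y = (y * x + coeff) % P
        return y
    return [(x, poly_eval(x)) for x in range(1, n + 1)]

def decode(fragments, m):
    """Recover the m coefficients from any m fragments (Lagrange interpolation)."""
    pts = fragments[:m]
    coeffs = [0] * m
    for i, (xi, yi) in enumerate(pts):
        basis, denom = [1], 1             # build the i-th Lagrange basis polynomial
        for j, (xj, _) in enumerate(pts):
            if j == i:
                continue
            new = [0] * (len(basis) + 1)  # multiply basis by (x - xj)
            for k, c in enumerate(basis):
                new[k] = (new[k] - c * xj) % P
                new[k + 1] = (new[k + 1] + c) % P
            basis = new
            denom = (denom * (xi - xj)) % P
        scale = yi * pow(denom, P - 2, P) % P   # divide by denom via Fermat inverse
        for k, c in enumerate(basis):
            coeffs[k] = (coeffs[k] + c * scale) % P
    return coeffs

m, n = 4, 16                      # rate r = m/n = 1/4, storage overhead 1/r = 4x
data = [ord(c) for c in "Pond"]   # m one-byte symbols
frags = encode(data, n)
assert decode(frags[5:9], m) == data   # any m surviving fragments recover the data
```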

Durability
Fraction of Blocks Lost Per Year (FBLPY)*:
–r = 1/4, erasure-encoded block (e.g. m = 16, n = 64).
–Increasing the number of fragments increases the durability of a block, at the same storage cost and repair time.
–The n = 4 fragment case is equivalent to replication on four servers.
* Erasure Coding vs. Replication, H. Weatherspoon and J. Kubiatowicz, In Proc. of IPTPS 2002.
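The FBLPY curves in the cited paper come from a model of repair epochs; the much simpler sketch below (my own, with a hypothetical per-fragment survival probability) shows the underlying binomial effect the bullet describes: at a fixed rate r = 1/4, spreading a block over more fragments drives the loss probability down.

```python
# Assuming each fragment independently survives a repair epoch with probability p,
# a block is lost only if fewer than m of its n fragments survive.
from math import comb

def block_loss_prob(n, m, p):
    """P(fewer than m of n fragments survive), i.e. the block is unrecoverable."""
    return sum(comb(n, k) * p**k * (1 - p)**(n - k) for k in range(m))

p = 0.9  # hypothetical per-fragment survival probability
for n, m in [(4, 1), (16, 4), (64, 16)]:          # all rate r = m/n = 1/4
    print(f"n={n:2d}, m={m:2d}: loss probability = {block_loss_prob(n, m, p):.2e}")
```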

Naming and Verification Algorithm
Use a cryptographically secure hash algorithm to detect corrupted fragments.
Verification tree:
–n is the number of fragments.
–Store log2(n) + 1 hashes with each fragment.
–Total of n·(log2(n) + 1) hashes.
Top hash is the block GUID (B-GUID).
–Fragments and blocks are self-verifying.
[Figure: verification tree over four fragments. F1–F4 hash to H1–H4; H12 and H34 hash the pairs; H14 hashes H12 and H34; Hd hashes the data; the B-GUID is computed over H14 and Hd. Fragment 1 is stored with its verification path H2, H34, Hd plus the fragment data.]
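A minimal sketch of the verification-tree idea above (my construction, with SHA-1 as a stand-in for whichever secure hash the implementation uses): the B-GUID is the root, and fragment 1 can be verified against it using only its stored sibling hashes H2, H34, and Hd.

```python
import hashlib

def H(*parts: bytes) -> bytes:
    h = hashlib.sha1()          # stand-in secure hash, for brevity
    for p in parts:
        h.update(p)
    return h.digest()

def build_tree(fragments, data):
    h = [H(f) for f in fragments]             # H1..H4
    h12, h34 = H(h[0], h[1]), H(h[2], h[3])
    h14 = H(h12, h34)
    hd = H(data)
    bguid = H(h14, hd)                        # top hash names the block
    proof_f1 = (h[1], h34, hd)                # fragment 1 travels with H2, H34, Hd
    return bguid, proof_f1

def verify_f1(fragment, proof, bguid):
    h2, h34, hd = proof
    h1 = H(fragment)
    return H(H(H(h1, h2), h34), hd) == bguid  # recompute the path up to the B-GUID

fragments = [b"frag1", b"frag2", b"frag3", b"frag4"]   # stand-ins for encoded fragments
data = b"original block data"
bguid, proof = build_tree(fragments, data)
assert verify_f1(fragments[0], proof, bguid)
assert not verify_f1(b"corrupted", proof, bguid)       # a tampered fragment fails
```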

Enabling Technology: Tapestry DOLR
[Figure: block GUID and fragments located through the Tapestry decentralized object location and routing (DOLR) layer.]

Outline
Overview
Experience: lessons learned
Required Components
Future Directions

Lessons Learned
Need the ability to route to an object if it exists.
–Otherwise a hindrance to a long-running process.
Robustness to node and network failures.
Need tools to diagnose the current state of the network.
Need the ability to run without the inner ring.
Need monitoring, detection, and repair mechanisms:
–Avoid correlated failures.
–Quickly and efficiently detect faults.
–Efficiently repair faults.
Need to perform maintenance in a distributed fashion.

Outline
Overview
Experience: lessons learned
Required Components
Future Directions

Monitor: Low Failure Correlation Dissemination
Model Builder:
–Various sources.
–Models failure correlation.
Set Creator:
–Queries random nodes.
–Produces dissemination sets: storage servers that fail with low correlation.
Disseminator:
–Sends fragments to the members of a set.
[Figure: introspection, human input, and network monitoring feed the Model Builder; its model drives the Set Creator, which probes storage servers and hands a dissemination set to the Disseminator, which sends fragments to the chosen storage servers.]
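As a rough illustration of what the Set Creator does (my own sketch; the correlation model, server names, and greedy heuristic are assumptions, not the Seagull algorithm): given pairwise failure-correlation estimates from the Model Builder, pick a dissemination set whose members are unlikely to fail together.

```python
def pick_dissemination_set(servers, corr, set_size):
    """Greedily grow a set, always adding the server whose maximum failure
    correlation with the already-chosen members is smallest."""
    chosen = [servers[0]]
    while len(chosen) < set_size:
        best = min(
            (s for s in servers if s not in chosen),
            key=lambda s: max(corr[(s, c)] for c in chosen),
        )
        chosen.append(best)
    return chosen

servers = ["a", "b", "c", "d"]          # hypothetical storage servers
corr = {}                               # symmetric pairwise failure correlation in [0, 1]
for i, s in enumerate(servers):
    for j, t in enumerate(servers):
        # toy model: servers at nearby "sites" fail with higher correlation
        corr[(s, t)] = 1.0 if s == t else 1.0 / (1 + abs(i - j))

print(pick_dissemination_set(servers, corr, 3))   # e.g. ['a', 'd', 'b']
```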

Monitor: Low Failure Correlation Dissemination
Sanity check:
–Monitored 1909 web servers.
Future:
–Simple Network Management Protocol (SNMP): standard protocol for monitoring. Query network components for information about their configuration, activity, errors, etc.
–Define an OceanStore/Tapestry Management Information Base (MIB).

Detection
Goal:
–Maintain routing and object state using minimal resources, e.g. less than 1% of bandwidth and CPU cycles.
Server heartbeats:
–“Keep-alive” beacon along each forward link.
–Increasing period (decreasing frequency) with the routing level.
Data-driven server heartbeats:
–“Keep-alive” multicast to all ancestors that hold an object pointer that points to us.
–Multicast with increasing radius.
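A small sketch of the period schedule these bullets imply (the base period and growth factor are hypothetical, not numbers from the talk): keep-alive traffic stays cheap because the beacon period grows with the Tapestry routing level, so nearby links are probed often and distant links rarely.

```python
BASE_PERIOD_S = 10   # hypothetical period for level-1 links
GROWTH = 2           # hypothetical multiplier per routing level

def heartbeat_period(level: int) -> float:
    """Seconds between keep-alive beacons on a forward link at this routing level."""
    return BASE_PERIOD_S * GROWTH ** (level - 1)

for level in range(1, 5):
    period = heartbeat_period(level)
    print(f"level {level}: beacon every {period:5.0f} s ({1 / period:.3f} Hz)")
```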

Detection
Republish / object heartbeats:
–Object heartbeat (republish).
–Heartbeat period increases with distance (i.e. heartbeat frequency decreases with distance). Distance is the number of application-level hops.
Distributed sweep:
–Request objects from storage servers.
–Sweep period increases with distance (i.e. sweep frequency decreases with distance).
Global sweep (responsible party/client):
–Request objects from storage servers at regular intervals.
–Period constant (i.e. frequency constant).

Object Publication and Location

Detection and Repair Schemes

Efficient Repair
Distributed:
–Exploit the DOLR's distributed information and locality.
–Efficient detection and then reconstruction of fragments.
[Figure: repair between nodes A and E across Tapestry routing levels L1–L3, with a ring of L1 heartbeats.]

Detection

Efficient Repair
Continuous vs. triggered:
Continuous (responsible party/client):
–Request objects from storage servers at regular intervals.
–Period constant (i.e. frequency constant).
Triggered (infrastructure):
–FBLPY is proportional to MTTR/MTTF. Disks: MTTF > 100,000 hours. Gnutella: MTTF = 2.3 hours (median 40 minutes).
–Local vs. remote repair. A local stub looks like a durable disk.
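A back-of-the-envelope sketch of the triggered-repair bullet (the MTTR value is hypothetical): if block loss scales with MTTR/MTTF, a repair latency that is negligible for disks is dangerous for Gnutella-like peers, so detection and repair must be far faster on volatile nodes.

```python
# Compare MTTR/MTTF for the two node populations quoted above.
MTTR_HOURS = 1.0   # hypothetical mean time to detect a failure and rebuild fragments

for name, mttf_hours in [("disk", 100_000.0), ("Gnutella peer", 2.3)]:
    ratio = MTTR_HOURS / mttf_hours
    print(f"{name:13s} MTTF = {mttf_hours:>9,.1f} h   MTTR/MTTF = {ratio:.1e}")
```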

Efficient Repair
Reliability vs. cost vs. trust.

Outline
Overview
Experience: lessons learned
Required Components
Future Directions

Future Directions
Redundancy, detection, repair, monitoring:
–None alone is sufficient.
–Only as reliable as the weakest link.
Verify the system?
–The system may always be in an inconsistent state.
–How do you know … the data exists? The data will exist tomorrow?
Applications/usage in the long term:
–NFS, IMAP, rsync (backup), Internet Archive, etc.