Recovery Oriented Computing: Update Armando Fox (in loco Patterson) Summer ROC Retreat, June 2002.



© 2002 Armando Fox

Welcome and ROC Philosophy
- ROC philosophy (“Peres’s Law”): “If a problem has no solution, it may not be a problem, but a fact; not to be solved, but to be coped with over time” (Israeli foreign minister Shimon Peres)
  - Failures (hardware, software, operator-induced) are a fact; recovery is how we cope with them over time
  - Availability = MTTF / MTBF = MTTF / (MTTF + MTTR): rather than just making MTTF very large, make MTTR << MTTF
- ROC Principles
  1. Isolation and partitionability => redundancy
  2. Enable fault injection, output checking => online monitoring & verification
  3. Undo support
  4. Diagnostic support
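The availability relation on this slide, Availability = MTTF / (MTTF + MTTR), can be checked numerically. The sketch below is illustrative only: the function names and the MTTF/MTTR figures (1000 hours and 1 hour) are assumptions chosen for the example, not measurements from the ROC project. It compares a tenfold increase in MTTF with a tenfold reduction in MTTR.

```python
import math


def availability(mttf_hours: float, mttr_hours: float) -> float:
    """Steady-state availability = MTTF / (MTTF + MTTR)."""
    if mttf_hours <= 0 or mttr_hours < 0:
        raise ValueError("MTTF must be positive and MTTR non-negative")
    return mttf_hours / (mttf_hours + mttr_hours)


def downtime_minutes_per_year(avail: float) -> float:
    """Expected unavailable minutes in a 365-day year."""
    return (1.0 - avail) * 365 * 24 * 60


# Hypothetical figures, assumed only for illustration.
baseline = availability(mttf_hours=1000.0, mttr_hours=1.0)
longer_mttf = availability(mttf_hours=10000.0, mttr_hours=1.0)   # 10x MTTF
faster_recovery = availability(mttf_hours=1000.0, mttr_hours=0.1)  # MTTR / 10

for label, a in [
    ("baseline", baseline),
    ("10x MTTF", longer_mttf),
    ("MTTR / 10", faster_recovery),
]:
    print(f"{label:10s} availability={a:.6f} "
          f"downtime={downtime_minutes_per_year(a):.1f} min/year")

# Both improvements yield the same availability (up to rounding),
# because only the ratio MTTR / MTTF enters the formula.
print(math.isclose(longer_mttf, faster_recovery, rel_tol=1e-12))
```

Because availability depends only on the ratio MTTR / MTTF, cutting repair time by a factor of ten buys the same availability as making failures ten times rarer, which is the arithmetic behind "make MTTR << MTTF".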

Major ROC Areas
- Failure detection and diagnosis
  - Pinpoint
  - FIG
  - Internet service failure causes
- Recovery techniques and Design-for-Recovery
  - Recursive Restartability
  - Making state-management tradeoffs explicit (QAPSL)
  - Firm state from infirm components (RAINS)
  - Designing for Undo: theory and practice
- Benchmarking and measurement
  - Dependability benchmarks for various applications
  - End-user availability measurements on the Web
  - Why Internet services fail
  - Estimating the cost of downtime
  - Availability in the PSTN

Recent Publications

ROC Techniques and Tools:
- A Utility-Centered Approach to Internet Services Design. George Candea, Armando Fox, in SIGOPS European Workshop
- FIG: A Prototype Tool for Online Verification of Recovery Mechanisms. P. Broadwell, N. Sastry, J. Traupman, D. Patterson, in SHAMAN workshop at ICS 2002
- Rewind, repair, replay: 3 R’s to Dependability. A. Brown and D. Patterson, SIGOPS European Workshop
- Including the Human Factor in Dependability Benchmarks. A. Brown, L. Chung, D. Patterson. In DSN 2002 Workshop on Dependability Benchmarking.

ROC Measurements:
- Architecture, operation, and dependability of large-scale Internet services: three case studies. D. Oppenheimer and D.A. Patterson. Submission to IEEE Internet Computing special issue on Global Deployment of Data Centers, February (Shorter version in SIGOPS European Workshop)
- Measuring End-User Availability on the Web: Practical Experience. Matthew Merzbacher and Dan Patterson.
- Lessons from the PSTN for Dependable Computing. P. Enriquez, A. Brown, D. Patterson, in SHAMAN workshop at ICS

Fault monitoring/diagnosis:
- An Online Evolutionary Approach to Internet Services. E. Kiciman, M. Chen, E. Brewer. In SIGOPS European Workshop

Recent Evangelism
- Evangelism publications
  - “Case for ROC” Technical Report
  - Introduction to Dependability (;login)
  - A Simple Way to Measure Cost of Downtime (LISA 02)
- Evangelism talks
  - Microsoft Research
  - HPCA 02 keynote (Patterson)
  - FAST keynote (File and Storage Technologies)
  - IBM Autonomic Computing workshops (Almaden & TJ Watson)

About ROC Retreats
- Purpose of semi-annual retreats
  - Progress reports/talks from academia and industry
  - Exposure/feedback on new ideas or work in progress
  - Brainstorming in immersive atmosphere
  - Industry/visitor feedback, opportunities for collaboration
  - Water fights during rafting trip
- Logistics
  - Web server with retreat talks/papers
    - thanks to Mike Howard and Bob Miller
    - WaveLAN “ANY”

Retreat Schedule - a work in progress
- Rest of today
  - OceanStore update from Kubi
  - Intros
  - ROC talks
  - All day: Posters (especially right before & after dinner)
- Tomorrow
  - Morning: OceanStore talks
  - Afternoon: Lunch and rafting
  - Post-rafting: breakout sessions followed by dinner
  - Breakout reporting/joint panel session with SAHARA
- Wednesday
  - Industry talk(s)
  - “Open mike”/outrageous ideas session?
  - Visitor feedback

Breakout Sessions
- Target: 3-4 breakouts
  - Using virtual machine technology for ROC
  - Ideas for the second ROC showcase application
  - Applying ROC to OceanStore
  - Management and Self-healing of large-scale systems
  - Is >100 year storage a pipe dream?
- Other topics solicited
- Final breakout topics will be decided based on interest in each topic and limiting each group size