Evaluating Undo: Human-Aware Recovery Benchmarks. Aaron Brown, with Leonard Chung, Calvin Ling, and William Kakes. January 2004 ROC Retreat.

Slide 2 Recap: ROC Undo
We have developed & built a ROC Undo Tool
– a recovery tool for human operators
– lets operators take a system back in time to undo damage, while preserving end-user work
We have evaluated its feasibility via performance and overhead benchmarks
Now we must answer the key question:
– does Undo-based recovery improve dependability?

Slide 3 Approach: Recovery Benchmarks
Recovery benchmarks measure the dependability impact of recovery
– behavior of the system during the recovery period
– speed of recovery
[Diagram: performability timeline showing normal behavior, fault/error injection, the performability impact (performance, correctness) during recovery, the recovery time, and the point where recovery is complete]
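To make the timeline above concrete, here is a minimal Python sketch of how a benchmark harness might compute the two quantities the slide calls out, recovery time and performability impact, from sampled performability data. The sample format, the 95% "recovered" threshold, and all names are illustrative assumptions, not part of the original ROC harness.

```python
from dataclasses import dataclass
from typing import List, Tuple

@dataclass
class RecoveryMetrics:
    recovery_time: float        # seconds from fault injection until recovery is complete
    performability_loss: float  # integrated shortfall relative to the pre-fault baseline

def score_recovery(samples: List[Tuple[float, float]],
                   fault_time: float,
                   threshold: float = 0.95) -> RecoveryMetrics:
    """samples: time-ordered (timestamp, performability) pairs from the workload
    generator; fault_time: when the fault/error was injected."""
    # Baseline behavior: average performability observed before the fault.
    pre_fault = [p for t, p in samples if t < fault_time]
    baseline = sum(pre_fault) / len(pre_fault)

    # Recovery is treated as complete once performability returns to within
    # `threshold` of baseline and stays there for the remainder of the run.
    recovery_end = samples[-1][0]
    for i, (t, _) in enumerate(samples):
        if t >= fault_time and all(p >= threshold * baseline for _, p in samples[i:]):
            recovery_end = t
            break

    # Performability impact: trapezoidal integration of the shortfall
    # between fault injection and recovery completion.
    loss = 0.0
    window = [(t, p) for t, p in samples if fault_time <= t <= recovery_end]
    for (t0, p0), (t1, p1) in zip(window, window[1:]):
        loss += (max(baseline - p0, 0.0) + max(baseline - p1, 0.0)) / 2.0 * (t1 - t0)

    return RecoveryMetrics(recovery_time=recovery_end - fault_time,
                           performability_loss=loss)
```

Averaging these per-run metrics over many fault-injection runs gives the recovery-benchmark result described on the slide.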

Slide 4 What About the People?
Existing recovery/dependability benchmarks ignore the human operator
– inappropriate for Undo, where the human drives recovery
To measure Undo, we need benchmarks that capture human-driven recovery
– by including people in the benchmarking process

Slide 5 Outline
Introduction
Methodology
– overview
– faultload development
– managing human subjects
Evaluation of Undo
Discussion and conclusions

Slide 6 Methodology
Combine traditional recovery benchmarks with human user studies
– apply workload and faultload
– measure system behavior during recovery from faults
– run multiple trials with a pool of human subjects acting as system operators
Benchmark measures the system, not the humans
– indirectly captures human aspects of recovery
  » quality of situational awareness, applicability of tools, usability & error-proneness of recovery procedures

Slide 7 Human-Aware Recovery Benchmarks
Key components
– workload: reuse performance benchmark
– faultload: survey plus cognitive walkthrough
– metrics: performance, correctness, and availability
– human operators: handle non-self-healing recovery
[Diagram: performability timeline showing normal behavior, fault/error injection, performability impact (performance, correctness), recovery time, and recovery complete]

Slide 8 Developing the Faultload
ROC approach combines surveys and cognitive walkthrough
– surveys to establish common failure modes, symptoms, and error-prone administrative tasks
  » domain-specific, system-independent
– cognitive walkthrough to translate these into a system-specific faultload
Faultload specifies generic errors and events
– provides system-independence, broader applicability
– cognitive walkthrough maps these to system-specific faults

Slide 9 Example: Service Faultload
Web-based survey of admins
– core questions:
  » "Describe any incidents in the past 3 months where data was lost or the service was unavailable."
  » "Describe any administrative tasks you performed in the past 3 months that were particularly challenging."
– cost: 4 x $50 gift certificates to amazon.com
  » raffled off as an incentive for participation
– response: 68 respondents from the SAGE mailing list

Slide 10 Survey Results
[Chart: breakdown of common tasks (151 total), challenging tasks (68 total), and reported problems (12 total) into configuration, deployment/upgrade, and other, with each category split into undoable vs. non-undoable]
– results dominated by
  » configuration errors (e.g., mail filters)
  » botched software/platform upgrades
  » hardware & environmental failures
– Undo potentially useful for the majority of problems

Slide 11 From Survey to Faultload
Cognitive walkthrough example: SW upgrade
– platform: sendmail on Linux
– task: upgrade from one sendmail version to a newer one
– approach:
  1. configure/locate an existing sendmail-on-Linux system
  2. clone the system to a test machine (or use a virtual machine)
  3. attempt the upgrade, identifying possible failure points
     » benchmarker must understand the system to do this
  4. simulate failures and select those that match symptom reports from the task survey
– sample result: simulate a failed upgrade that disables spam filtering by omitting the -DMILTER compile-time flag
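As an illustration of how a survey finding plus a walkthrough result might be recorded as a machine-readable faultload entry, here is a hedged Python sketch. The `FaultloadEntry` structure, its field names, and the `break_milter_build` injection stub are hypothetical, invented for this example rather than taken from the benchmark itself.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class FaultloadEntry:
    generic_error: str          # system-independent error class (from the survey)
    symptom: str                # symptom that administrators reported observing
    inject: Callable[[], None]  # system-specific injection (from the cognitive walkthrough)

def break_milter_build() -> None:
    # Hypothetical injection: install a sendmail binary built without the
    # -DMILTER compile-time flag on the test machine, so the upgrade appears
    # to succeed while spam filtering is silently disabled.
    raise NotImplementedError("replace with the system-specific injection script")

failed_upgrade = FaultloadEntry(
    generic_error="botched software/platform upgrade",
    symptom="mail still flows, but spam filtering silently stops working",
    inject=break_milter_build,
)
```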

Slide 12 Human-Aware Recovery Benchmarks
Key components
– workload: reuse performance benchmark
– faultload: survey plus cognitive walkthrough
– metrics: performance, correctness, and availability
– human operators: handle non-self-healing recovery
[Diagram: performability timeline showing normal behavior, fault/error injection, performability impact (performance, correctness), recovery time, and recovery complete]

Slide 13 Human Subject Protocol
Benchmarks structured as human trials
Protocol
– human subject plays the role of system operator
– subjects complete multiple sessions
– in each session:
  » apply workload to the test system
  » select a random scenario and simulate the problem
  » give the human subject 30 minutes to complete recovery
Results reflect statistical average across subjects
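A minimal sketch of how one session from this protocol might be orchestrated, assuming the harness supplies callables for starting the workload, injecting a scenario, and collecting metrics; all of these names and the exact structure are assumptions made for illustration.

```python
import random
import time

SESSION_LIMIT_SECONDS = 30 * 60  # the 30-minute recovery window from the protocol

def run_session(subject_id: str, scenarios, start_workload, inject_fault, collect_metrics):
    """One session: apply the workload, pick and inject a random scenario,
    let the operator work for 30 minutes, then record system-level metrics."""
    scenario = random.choice(scenarios)   # select a random scenario
    workload = start_workload()           # begin applying the workload to the test system
    fault_time = time.time()
    inject_fault(scenario)                # simulate the problem
    time.sleep(SESSION_LIMIT_SECONDS)     # operator attempts recovery during this window
    return collect_metrics(workload, scenario, fault_time, subject_id)
```

Averaging the metrics returned across subjects and sessions yields the benchmark result the slide refers to.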

Slide 14 The Variability Challenge
Must control human variability to get reproducible, meaningful results
Techniques
– subject pool selection
– screening
– training
– self-comparison
  » each subject faces the same recovery scenario on all systems
  » a system's score is determined by the fraction of subjects with better recovery behavior
  » powerful, but only works for comparison benchmarks
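One way the self-comparison score described above might be computed is sketched below; the pairing of each subject's result on the two systems and the "lower metric is better" convention are assumptions for illustration, not the benchmark's exact scoring rule.

```python
from typing import List, Tuple

def self_comparison_score(paired_results: List[Tuple[float, float]]) -> float:
    """paired_results: one (metric_on_system_A, metric_on_system_B) pair per
    subject, each subject seeing the same scenario on both systems; lower is
    better (e.g., recovery time or amount of lost work).
    Returns the fraction of subjects who recovered better on system A."""
    better_on_a = sum(1 for a, b in paired_results if a < b)
    return better_on_a / len(paired_results)
```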

Slide 15 Outline
Introduction
Methodology
Evaluation of Undo
– setup
– per-subject results
– aggregate results
Discussion and conclusions

Slide 16 Evaluating Undo: Setup
Faultload scenarios
1. SPAM filter configuration error
2. failed server upgrade
3. simple software crash (Undo not useful here)
Subject pool (after screening)
– 12 UCB Computer Science graduate students
Self-comparison protocol
– each subject given the same scenario in each of 2 sessions
  » Undo available in first session only
  » imposes a learning bias against Undo, but lowers variability

Slide 17 Sample Single User Result
Undo significantly improves correctness
– with some (partially-avoidable) availability cost
[Charts: recovery results without Undo vs. with Undo]

Slide 18 Overall Evaluation
Undo significantly improves correctness
– and reduces variance across operators
– statistically-justified, p-value
Undo hurts IMAP availability
– several possible workarounds exist
Overall, Undo has a positive impact on dependability
[Chart: results from sessions where Undo was used]
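The slides do not name the statistical test behind the "statistically justified" claim; as a hedged example, a paired non-parametric test such as the Wilcoxon signed-rank test is one reasonable choice for a small self-comparison sample like this one. The function and variable names below are illustrative.

```python
from scipy.stats import wilcoxon

def undo_effect_significance(correctness_with_undo, correctness_without_undo, alpha=0.05):
    """Paired test across subjects: each subject contributes one correctness
    score from the session with Undo and one from the session without it."""
    statistic, p_value = wilcoxon(correctness_with_undo, correctness_without_undo)
    return p_value, p_value < alpha
```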

Slide 19 Outline
Introduction
Methodology
Evaluation of Undo
Discussion and conclusions

Slide 20 Discussion
Undo-based recovery improves dependability
– reduces incorrectly-handled mail in common failure cases
More can still be done
– tweaks to the Undo implementation will reduce the availability impact
Benchmark methodology is effective at controlling human variability
– self-comparison protocol gives statistically-justified results with 9 subjects (vs. 15+ for a random design)

Slide 21 Future Directions: Controlling Cost
Human subject experiments are still costly
– recruiting and compensating participants
– extra time spent on training, multiple benchmark runs
– extra demands on benchmark infrastructure
– less than a user study, more than a perf. benchmark
A necessary price to pay!
Techniques for cost reduction
– best-case results using a best-of-breed operator
– remote web-based participation
– avoid human trials: extended cognitive walkthrough

Evaluating Undo: Human-Aware Recovery Benchmarks
For more info:
– paper: A. Brown, L. Chung, et al., "Dependability Benchmarking of Human-Assisted Recovery Processes." Submitted to DSN 2004, June 2004.

Backup Slides

Slide 24 Example: Service Faultload
Results of task survey
Lost data / unavailability (12 reports):
– configuration problems (25%)
– hardware/environment (17%)
– upgrade-related (17%)
– operator error (8%)
– user error (8%)
– external resource (8%)
– software error (8%)
– unknown (8%)
Challenging tasks (68 total):
– filter installation (37%)
– platform change/upgrade (26%)
– configuration (13%)
– architecture changes (7%)
– tool development (6%)
– other (6%)
– user education (4%)

Slide 25 Full Summary Dataset