Undo: Update and Futures Aaron Brown ROC Research Group University of California, Berkeley Summer 2003 ROC Retreat 5 June 2003.

Presentation transcript:

Slide 2 Outline
Recap of Undo for Operators
Measurements of undo prototype
Upcoming: human evaluation
Potential future extensions

Slide 3 Recap: What Is “Operator Undo”?
Give operators and system admins the ability to “travel in time”
– to undo the effects of erroneous actions
  » configuration changes
  » new software deployment
  » patches and upgrades
  » problem repairs
– to retroactively repair other problems affecting state
  » software bugs
  » viruses
  » external attacks

Slide 4 Recap: Three R’s Undo Model
Time travel for system operators
– Rewind: roll back all state, users’ and operator’s
– Replay: re-execute rewound user events
  » operator timeline must be restored manually, if desired
  » may cause externally-visible paradoxes for users
– Repair: alter past operator events to avert problems
[Figure: user timeline and operator timeline, with an “Undo!” point]
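
As a rough illustration of the Rewind/Repair/Replay cycle on this slide, the sketch below models service state as a Python dict, a snapshot as a deep copy, and user events as logged messages. Every name in it (`UndoableService`, `snapshot`, `three_rs`) is hypothetical and is not taken from the prototype; it only shows the order of the three steps and why operator events are handled separately from user events.

```python
import copy


class UndoableService:
    """Toy service whose state can be rewound to a snapshot (hypothetical model)."""

    def __init__(self):
        self.state = {"config": "good", "mailbox": []}
        self.snapshots = {}   # label -> deep copy of state
        self.user_log = []    # user events recorded since the last snapshot

    def snapshot(self, label):
        self.snapshots[label] = copy.deepcopy(self.state)
        self.user_log = []

    def user_event(self, message):
        # User events are logged so they can be replayed after a rewind.
        self.user_log.append(message)
        self._apply(message)

    def operator_event(self, config):
        # Operator events are not replayed automatically; the operator
        # restores (or changes) them by hand during the repair step.
        self.state["config"] = config

    def _apply(self, message):
        # In this toy model a bad configuration silently drops mail.
        if self.state["config"] == "good":
            self.state["mailbox"].append(message)

    def three_rs(self, label, repair):
        self.state = copy.deepcopy(self.snapshots[label])  # Rewind all state
        repair(self)                                       # Repair operator timeline
        for message in self.user_log:                      # Replay user events
            self._apply(message)


svc = UndoableService()
svc.snapshot("before-change")
svc.operator_event("broken")   # erroneous configuration change
svc.user_event("msg-1")        # dropped because of the bad configuration
svc.user_event("msg-2")
svc.three_rs("before-change", repair=lambda s: s.operator_event("good"))
print(svc.state["mailbox"])
```

After the cycle, the two user messages that were lost under the bad configuration are present again, while the erroneous operator change is gone.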

Slide 5 A Simple Solution for a Common Case
Undo for services with human end-users
– centralized state scopes the problem
– human users provide flexibility for handling paradoxes
  » undo is typically transparent to end-user, but not perfect
  » worst-case: end-user must reconcile mental model based on supplied hints
Applicability
[Figure: spectrum from “ideally suited to Undo” to “poorly suited to Undo”; labels as extracted: online auctions, missile launch control, online shopping, shared calendaring, financial applications, file/block storage service, web search]

Slide 6 Architecture in Brief
Target
– black-box services with human end-users
– single-host, for simplicity
Approach
– rewindable storage
– intercept, log, replay user requests
Fault assumptions
– service can be arbitrarily incorrect
[Figure: architecture diagram; labels: Operator, Repairs, Application Service (can include user state, application, OS), Rewindable Storage, Users, App. Proxy, App. protocol, User Timeline Log, User events]

Slide 7 Instantiation: E-mail Prototype
Prototype target
– e-mail store service
  » leaf node in e-mail delivery network
Implementation
– NetApp filer provides rewindable storage layer
– e-mail-specific proxy intercepts/replays IMAP & SMTP requests
[Figure: prototype architecture diagram; labels: Operator, Repairs, E-mail Store Service (can include mailboxes, server code, OS), NetApp Filer, Users, IMAP/SMTP Proxy, IMAP/SMTP, IMAP, SMTP, User Timeline Log, events]

Slide 8 Key Concept: Verbs
Verbs encode user events
– encapsulate application protocol commands
  » record of desired user action
  » context-independent record of parameters
  » record of externally-visible output
– intended to capture intent of protocol commands, not effects on system state
Example verbs for e-mail (simplified)
– SMTP: DELIVER {to, from, messageText} {}
– IMAP: COPY {srcFolder, msgNum[], dstFolder} {}
        FETCH {folder, msgNum[], fetchSpec} {text}
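
One possible way to picture the verbs listed on this slide is as small records holding a parameter set and an output set. The sketch below is an assumption-laden illustration, not the prototype's Java classes: the `Verb` dataclass and the helper functions are hypothetical, and only the verb names and field names (`to`, `from`, `messageText`, `srcFolder`, `msgNum`, `dstFolder`, `folder`, `fetchSpec`, `text`) come from the slide.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class Verb:
    """A logged user event: what the user asked for, plus what the user saw."""
    protocol: str
    name: str
    params: Dict[str, Any]                                  # context-independent parameters
    output: Dict[str, Any] = field(default_factory=dict)    # externally-visible output


def deliver(to: str, sender: str, message_text: str) -> Verb:
    # SMTP DELIVER {to, from, messageText} {}: no output is shown to the user.
    return Verb("SMTP", "DELIVER",
                {"to": to, "from": sender, "messageText": message_text})


def copy_messages(src_folder: str, msg_nums: List[int], dst_folder: str) -> Verb:
    # IMAP COPY {srcFolder, msgNum[], dstFolder} {}
    return Verb("IMAP", "COPY",
                {"srcFolder": src_folder, "msgNum": msg_nums, "dstFolder": dst_folder})


def fetch(folder: str, msg_nums: List[int], fetch_spec: str, text: str) -> Verb:
    # IMAP FETCH {folder, msgNum[], fetchSpec} {text}: the text returned to the
    # user is recorded so it can be compared against replayed output later.
    return Verb("IMAP", "FETCH",
                {"folder": folder, "msgNum": msg_nums, "fetchSpec": fetch_spec},
                {"text": text})


log = [
    deliver("alice@example.org", "bob@example.org", "hello"),
    fetch("INBOX", [1], "BODY[]", "hello"),
]
print([v.name for v in log])
```

Recording intent (deliver this message, fetch that one) rather than the resulting file changes is what lets the same log be replayed against a repaired system.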

Slide 9 Role of Verbs
Verbs enable replay
– verb log forms a history of end-user interaction
  » dissociated from original system context
  » annotated with original output to end-user
  » annotated with external consistency policy and compensations for consistency violations
Verbs make it easier to reason about 3R’s
– define exactly what user state is preserved by 3R cycle
Verbs capture key application semantics
– consistency model and commutativity of operations
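
The replay role described on this slide can be sketched as a loop that re-executes each logged verb, compares the new output with the output originally shown to the end-user, and invokes a compensation when they differ. This is a simplified assumption about how such a check might look; the `replay` function, the dict-based verb records, and the toy filter are all hypothetical, and a real external consistency policy would be richer than a plain equality test.

```python
def replay(log, execute, compensate):
    """Re-execute logged verbs and report those whose output changed."""
    violations = []
    for verb in log:
        new_output = execute(verb["name"], verb["params"])
        if new_output != verb["output"]:
            # The user saw something different the first time around.
            compensate(verb, new_output)
            violations.append(verb["name"])
    return violations


mailbox = []


def execute(name, params):
    # Toy repaired system: a hypothetical filter now rewrites message bodies.
    if name == "DELIVER":
        mailbox.append(params["messageText"].replace("VIRUS", "[removed]"))
        return {}
    if name == "FETCH":
        return {"text": mailbox[params["msgNum"]]}
    raise ValueError(f"unknown verb: {name}")


notices = []
log = [
    {"name": "DELIVER", "params": {"messageText": "hello VIRUS"}, "output": {}},
    {"name": "FETCH", "params": {"msgNum": 0}, "output": {"text": "hello VIRUS"}},
]
violations = replay(log, execute,
                    compensate=lambda verb, out: notices.append(out["text"]))
print(violations, notices)
```

Here the replayed FETCH returns different text than the user originally saw, so it is flagged and the compensation records a notice for the user.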

Slide 10 Outline
Recap of Undo for Operators
Measurements of undo prototype
Upcoming: human evaluation
Potential future extensions

Slide 11 Prototype Details
Target service: e-mail store service
– a leaf node in the Internet e-mail network
Prototype details
– wraps an existing IMAP/SMTP store service
  » not platform-specific
  » evaluation uses sendmail and the UW IMAP server
– written in Java
  » ~25K lines (~9K semicolons)
  » about 1/8 the size of the mail service itself, in LoC

Slide 12 Prototype Measurements
Experiments
– space overhead
– time overhead
– rewind & replay time
Evaluation workload
– modified SPECmail2000 workload with 10,000 users
  » simulates traffic seen by ISP mail server
  » modified to use IMAP instead of POP; all mail kept local

Slide 13 Feasibility: Space & Time Overhead
Space overhead
– 0.45 GB/day/1000 users
  » uncompressed
  » Java serialization bug overhead factored out (>2x bigger)
– ~250,000 user-days of data on one 120GB disk
Time overhead
– IMAP/SMTP session lengths for SPECmail workload:
  [Figure: session-length chart]
– below perceived “sluggishness” threshold for interactive apps.
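
The space-overhead figures on this slide can be cross-checked with a few lines of arithmetic. The sketch below assumes the full 120 GB is available for the log and that 1 GB means the same thing in both figures; under those assumptions it gives roughly 267,000 user-days, the same order as the slide's ~250,000, or a bit under four weeks of history for the 10,000-user workload.

```python
GB_PER_DAY_PER_1000_USERS = 0.45   # from the slide (uncompressed)
DISK_GB = 120                      # from the slide
WORKLOAD_USERS = 10_000            # SPECmail2000 workload size used in the evaluation

gb_per_user_day = GB_PER_DAY_PER_1000_USERS / 1000
user_days = DISK_GB / gb_per_user_day          # capacity in user-days
days_of_history = user_days / WORKLOAD_USERS   # undo window for the workload

print(round(user_days), round(days_of_history, 1))
```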

Slide 14 Feasibility: Rewind and Replay
Rewind
– NetApp filer snapshot restore: ~8 seconds
  » independent of amount of data to restore
  » but not undoable
– alternative is O(#files)
  » 10 minutes for 10,000 users
Replay
– replay speed: ~9 verbs/sec
– with parallel, out-of-order (O-O-O) replay
– better connection management will help
– compared to real-time:
  [Figure: replay speed versus real time]
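
To get a feel for what ~9 verbs/sec means in practice, a back-of-the-envelope estimate of replay time can be written as below. The constant rate is a simplifying assumption, and the log size is a made-up number chosen only for illustration; neither the function name nor the backlog figure comes from the slides.

```python
REPLAY_VERBS_PER_SEC = 9.0   # measured replay speed from the slide


def replay_seconds(num_verbs, rate=REPLAY_VERBS_PER_SEC):
    """Estimated wall-clock time to replay a verb log at a constant rate."""
    return num_verbs / rate


# Hypothetical log size, for illustration only.
backlog = 100_000
print(f"{replay_seconds(backlog) / 3600:.1f} hours")
```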

Slide 15 Outline
Recap of Undo for Operators
Measurements of undo prototype
Upcoming: human evaluation
Potential future extensions

Slide 16 Evaluating Undo: Human Factors
Undo is a recovery tool for human operators
– effectiveness depends on how it is used
  » will it address the problems faced by real operators?
  » will operators know when/how to use it?
  » does it improve dependability over manual recovery?
Need methodology that synthesizes systems benchmarking with human studies
– include human operators to drive recovery
– but focus is on the system and system metrics
  » recovery time, dependability, performance

Slide 17 Evaluating Human Factors of Undo
Three-step process
1) survey operators to identify real-world problems
  » evaluate whether Undo will address them
  » collect scenarios for step 2
2) controlled laboratory experiments involving humans
  » evaluate Undo against manual recovery
  » use scenarios from step 1
  » evaluate with dependability metrics: recovery time, correctness, performance
3) long-term ethnographic study of deployed system
  » evaluate dependability benefits of Undo “in the wild”
  » requires time and resources beyond the scope of this work

Slide 18 Step 1: Survey Operators
Online survey of system operators
– questions on daily tasks, challenges, recent problems
– 68 responses
Results
[Figure: three breakdowns, Common Tasks (151 total), Challenging Tasks (68 total), and Lost problems (12 total), each split into configuration, deployment/upgrade, other undoable, and non-undoable; percentage labels as extracted: 50%, 56%, 25%, 26%, 17%, 25%, 18%, 31%, 33%, 12%, 1%, 6%]
  » configuration and deployment issues dominate
  » Undo potentially useful for majority of tasks, problems

Slide 19 Step 2: Lab Experiments w/Humans
Questions to answer
– do operators know when Undo is appropriate?
– does having Undo improve dependability?
Compare systems with & without Undo
– randomized human trials
– each trial structured as a dependability benchmark
In progress

Slide 20 Dependability Benchmarks
Dependability benchmark basics
– apply workload
– simulate realistic problem scenario
– measure recovery time, correctness, performance
[Figure: performability over time, from start of scenario to end of scenario, marking normal behavior, recovery time, and performability impact (performance, correctness)]
– trial scenarios chosen based on survey results
  » including scenarios where Undo is unlikely to help
See:
– Brown, Chung, Patterson, “Including the Human Factor in Dependability Benchmarks”, DSN WDB
– Brown, Patterson, “Towards Availability Benchmarks...”, USENIX 2000
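
The measurements named on this slide (recovery time and performability impact) could be extracted from a trial's throughput trace roughly as follows. This is a minimal sketch under stated assumptions: the trace format, the 5% tolerance, the definition of "recovered" as staying at or above the tolerance floor for the rest of the trace, and the function name are all hypothetical, and correctness is not modeled at all.

```python
def recovery_metrics(samples, normal, fault_time, tolerance=0.05):
    """Return (recovery_time, lost_work) for one benchmark trial.

    samples: list of (time_sec, throughput) pairs in time order.
    normal: throughput during normal behavior.
    fault_time: time at which the problem scenario starts.
    """
    floor = normal * (1 - tolerance)

    # Recovery point: first sample at or after the fault from which
    # throughput stays at or above the floor for the rest of the trace.
    recovered_at = None
    for i, (t, _) in enumerate(samples):
        if t >= fault_time and all(v >= floor for _, v in samples[i:]):
            recovered_at = t
            break

    # Performability impact: work lost relative to normal behavior,
    # accumulated over each sampling interval between fault and recovery.
    lost = 0.0
    for (t0, v0), (t1, _) in zip(samples, samples[1:]):
        if fault_time <= t0 and (recovered_at is None or t0 < recovered_at):
            lost += max(0.0, normal - v0) * (t1 - t0)

    recovery_time = None if recovered_at is None else recovered_at - fault_time
    return recovery_time, lost


trace = [(0, 100), (10, 100), (20, 20), (30, 40), (40, 100), (50, 100)]
recovery_time, lost_work = recovery_metrics(trace, normal=100, fault_time=20)
print(recovery_time, lost_work)
```

In the example trace the fault hits at t=20 and throughput is back to normal at t=40, giving a 20-second recovery time and 1400 units of lost work.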

Slide 21 Lab Experiments with Humans
Some key subtleties
– overcoming mental model inertia
  » select and train less-experienced subjects
– making scenarios tractable
  » subject plays role of shift-work operator repairing documented problem from previous shift
Status: in progress
– experimental protocol defined
– just received Human Subjects Committee approval
– data collection to begin shortly

Slide 22 Outline
Recap of Undo for Operators
Measurements of undo prototype
Upcoming: human evaluation
Potential future extensions

Slide 23 Extending Undo: Other Apps
When is undo possible?
– state is centralized (or observable)
– all output to external entities can be intercepted
  » and can be correlated to user requests
– external output is provisional for some time window
  » e.g., can be cancelled, altered, reissued
  » or simply doesn’t matter in application’s external consistency model
[Figure: spectrum from “ideally suited to Undo” to “poorly suited to Undo”; labels as extracted: online auctions, missile launch control, online shopping, shared calendaring, financial applications, file/block storage service, web search]

Slide 24 Extending Undo: Spheres of Undo
Rewindable storage defines a sphere of undo
All info crossing sphere must be intercepted
– input: becomes verbs
– output: becomes externalized output
  » must be possible to associate output with a verb
[Figure: sphere of undo diagram; labels: Rewindable Storage, Application Service, Sphere of Undo, Users, Service, RS, External service (output consumer), External data source, P (three instances)]
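
The interception rule on this slide (inputs become verbs, outputs become externalized output tied to a verb) might be sketched as a proxy that assigns each incoming request a verb id and tags every outgoing message with it. The class and field names below are hypothetical, and the wrapped service is reduced to a plain function that returns the outputs it wants to send to external parties.

```python
import itertools


class SphereProxy:
    """Intercepts everything crossing a (hypothetical) sphere of undo."""

    def __init__(self, service):
        self.service = service     # callable: (name, params) -> list of outputs
        self.verb_log = []         # inputs, recorded as verbs
        self.externalized = []     # outputs, each associated with a verb id
        self._ids = itertools.count(1)

    def handle_input(self, name, params):
        verb_id = next(self._ids)
        self.verb_log.append({"id": verb_id, "name": name, "params": params})
        for output in self.service(name, params):
            # Tagging output with its verb makes it possible to cancel,
            # alter, or reissue that output after an undo cycle.
            self.externalized.append({"verb_id": verb_id, "output": output})
        return verb_id


def forwarding_service(name, params):
    # Toy service: a FORWARD request emits one message to an external party.
    if name == "FORWARD":
        return [f"to {params['to']}: {params['text']}"]
    return []


proxy = SphereProxy(forwarding_service)
proxy.handle_input("STORE", {"text": "draft"})
proxy.handle_input("FORWARD", {"to": "external-service", "text": "order #1"})
print(proxy.externalized)
```

Only the second request produced external output, and that output carries the id of the verb that caused it.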

Slide 25 Further Extensions
Verb concept may have broader applicability
– impact analysis of configuration changes
  » use verb log as annotated history to evaluate changes on cloned system
– self-checking data set for self-testing components
– general approach to defining & encapsulating application consistency from end-user point of view?
  » today, procedural and implicit
  » can verbs be made declarative?
  » can verbs be extracted automatically from object relationships?

Slide 26 More Verb Extensions
Extending verbs to administrative tasks
– in desktop environment
  » manage software installations/upgrades
  » provide “system refresh” using undo techniques
  » capture configuration changes at intent level
– in server environment
  » move common tasks into undo framework
  » dynamically identify and guide ongoing operations tasks by analyzing verb sequences
– key challenge in either environment is to capture breadth of administrative tasks

Slide 27 Conclusions
E-mail implementation demonstrates feasibility of Undo
– improvements in protocols, base storage technology would help reduce overhead
Human experiments to evaluate usefulness about to begin
Verb construct has significant potential for further research
– extending Undo to broader domains
– exploring other tools to support human operators

Undo: Update and Futures
Acknowledgements
– ROC Undergraduate Benchmarking Group
  » Leonard Chung, Billy Kakes, Calvin Ling
– Berkeley/Stanford ROC Research Group
For more info:
–
–