Software Upgrades for Distributed Systems


Software Upgrades for Distributed Systems
Sameer Ajmani (Google), Barbara Liskov (MIT), Liuba Shrira (Brandeis)
Notes: Are shadow methods similar to Haskell monads, in that they relate types?

Internet Services
- Are long-lived, robust
- Run on many machines
- Must be continuously available
- Have persistent state
- Face ever-changing requirements
- Require software upgrades to:
  - Fix bugs
  - Improve performance
  - Add/change/remove features
Notes: Long-lived implies robust to failures!

Upgrade Requirements
- Automatic, controlled deployment
  - Ensure continuous availability
  - Test new software on a few nodes
  - Upgrade servers before clients
- Mixed-mode operation
  - Upgrades may be slow due to transforms
[Diagram: nodes running v1, v3, v2, v2 (upgrading), v2]

Outline
- System & Upgrade Model
- Specifying Upgrades
- Implementation Models

System Model
- A node is an object of class C
- Different nodes may run different classes
- Nodes communicate via RPCs
Notes: Examples of classes include storage servers, routers, and clients. The model can be extended to more general ones (e.g., multiple objects per physical node).

Upgrade Model
- A class upgrade replaces an old class, C_old, with a new one, C_new
  - Implements types T_old, T_new
  - May be compatible or incompatible
- An upgrade is a set of class upgrades
Notes: We focus on incompatible upgrades here.

Supporting Mixed Mode
- Each node handles calls to past and future versions of itself
- Adding support for new versions must be fast
- Removing support for old versions should be easy
Notes: We chose to have receivers handle simulation; callers are oblivious to the receiver's version. Nodes are responsible for their own behavior. We considered doing simulation on the clients, but that doesn't really work. TRANSITION: We need some software structure to support this!

Simulation Objects
- Each node handles calls to past and future versions of itself
- Future SO: installed when a new version is introduced
- Past SO: installed when the node upgrades
- Future SOs simulate future behavior
- Past SOs simulate past behavior
- Adding/removing an SO does not require a restart
[Diagram: a node with current object C3, future SOs F5 and F4, and past SO P2]
Notes: Describe the diagram. Every node is running a current version. This node is running at version 3, but supports versions 2 through 5. Version 3 is the highly optimized actual software; versions 4, 5, and 2 are simulated via other objects. Version 1 is no longer supported. There needs to be some mechanism for dispatching calls; we'll discuss this later. The arrows are one possible way of connecting nodes.

Simulation Objects
- Each node handles calls to past and future versions of itself
- SOs are only required for certain upgrades
[Diagram: a node with current object C3, future SOs F5 and F4, and past SO P2]
Notes: Internal node upgrades (e.g., performance) that do not change the interface need no SOs. Compatible upgrades need a future SO, but not a past SO. Incompatible upgrades need both future and past SOs; this is our focus.
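As a rough illustration of the diagram (a node at version 3 with future SOs for versions 4 and 5 and a past SO for version 2), the sketch below dispatches each incoming call on the version stamped on it. The `Node` and `Echo` classes and their method names are hypothetical and are not Upstart's API; the SOs here do not delegate to one another.

```python
class Echo:
    """Stand-in for a current object or simulation object (hypothetical)."""

    def __init__(self, label):
        self.label = label

    def ping(self):
        return f"{self.label} handled ping"


class Node:
    """Routes each call to the object that serves the caller's version."""

    def __init__(self, current_version, current_object):
        self.current_version = current_version
        self.objects = {current_version: current_object}

    def install_so(self, version, so):
        # Installing an SO only updates the dispatch table; no restart.
        self.objects[version] = so

    def retire(self, version):
        if version == self.current_version:
            raise ValueError("the current version cannot be retired")
        self.objects.pop(version, None)

    def call(self, version, method, *args):
        if version not in self.objects:
            raise LookupError(f"version {version} is not supported")
        return getattr(self.objects[version], method)(*args)


node = Node(3, Echo("C3"))
node.install_so(2, Echo("P2"))
node.install_so(4, Echo("F4"))
node.install_so(5, Echo("F5"))
```

With this setup a version 5 caller reaches F5, a version 3 caller reaches C3, and a version 1 call is rejected because no object serves that version.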

Specifying Upgrades
Notes: Are shadow methods similar to Haskell monads, in that they relate types?

Specifying Upgrades
- Must behave like a single object
  - Even when upgrades are incompatible
- Upgrade specification must define this
- Goal: no surprises for clients!
- E.g., changing permissions to ACLs
Notes: This is the big contribution. Compatible upgrades are specified implicitly by the subtype relationship.

Constraints on Specifications
- Type requirement: a call of version V behaves according to the specification of type T_V
[Diagram: nodes at versions v1, v3, v3, v3, v2, v1, v2]
Notes: This requirement is stated differently than in the paper (the paper mentions which class implements which type, which should not be specified here). To a node running v3, the whole world appears to be running v3, which simplifies writing software. Calls are stamped with the client version.

Constraints on Specifications
- Sequence requirement: each event must reflect all earlier ones despite:
  - Client upgrades
  - Server upgrades
  - Version introduced
  - Version retired
Notes: Explain version retirement. File system example: a client expects to see the files it wrote (or its peers wrote) despite these events. Client upgrade is interesting: what does the client expect to see on the server after it upgrades?

Example
- ColorSet → FlavorSet: incompatible upgrade
- ColorSet methods: insertColor(x, c), getColor(x), …
  - E.g., { (1, red), (2, blue), (3, red) }
- FlavorSet methods: insertFlavor(x, f), getFlavor(x), …
Notes: This is simplistic but analogous to real examples like adding properties to files in a filesystem.

Specifications: Invariant
- Invariant I relates the object states: I(O_old, O_new)
- I: {x | <x,c> in O_CS} = {x | <x,f> in O_FS}
[Diagram: O_old and O_new related by I]
Notes: Choose the weakest invariant possible. In this case, lots of nondeterminism. The invariant must be TOTAL over all legal O_old, O_new states.

Specifications: Mapping Function
- Mapping function MF defines the initial state: O_new = MF(O_old) s.t. I(O_old, O_new)
- O_FS = MF(O_CS) = {<x,grape> | <x,c> in O_CS}
[Diagram: MF maps O_old to O_new]
Notes: MF establishes the invariant. We could have done something more sophisticated here; we had choices. MF is used to create the future SO when it is installed.
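The invariant I and the mapping function MF for the ColorSet/FlavorSet example can be written out directly. This is a minimal sketch that assumes each object state is modeled as a Python set of (element, attribute) pairs; the function names are illustrative only.

```python
def invariant(o_cs, o_fs):
    """I: both states contain exactly the same elements x."""
    return {x for x, _ in o_cs} == {x for x, _ in o_fs}


def mapping_function(o_cs):
    """MF: every element of the ColorSet gets the flavor 'grape'."""
    return {(x, "grape") for x, _ in o_cs}


o_cs = {(1, "red"), (2, "blue"), (3, "red")}
o_fs = mapping_function(o_cs)
```

Because I constrains only the elements and not the flavors, any flavor assignment over the same elements would also satisfy it, which is the nondeterminism the invariant slide refers to.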

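Relating Behavior
- Only for mutators
[Diagram: mutator m takes O_old to O'_old; I relates O_old to O_new and O'_old to O'_new; the step from O_new to O'_new is marked "?"]
Notes: We thought we could infer the '?' from the invariant, MF, and the spec of m, but we cannot (as shown in the following slides).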

Relating Behavior
- Shadow methods relate behavior
  - T_old.m → T_new.$m
  - T_new.p → T_old.$p
[Diagram: m takes O_old to O'_old and $m takes O_new to O'_new; I relates the old and new states before and after]

Shadow Method Specification
- void ColorSet.$insertFlavor(x, f)
  - Effects: no <x,c> in this_pre ⇒ this_post = this_pre U {<x,blue>}
- Also ColorSet.$delete, FlavorSet.$insertColor, FlavorSet.$delete
Notes: Defining shadow methods is usually simple.
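A small sketch of the `$insertFlavor` shadow may help. It assumes a toy `ColorSet` whose state is a dict from element to color and whose `insert_color` adds an element only when it is absent; both the class layout and that insert semantics are assumptions, since the slides give only the shadow's effects clause.

```python
class ColorSet:
    """Toy ColorSet; state maps each element x to its color."""

    def __init__(self):
        self.colors = {}

    def insert_color(self, x, c):
        # Assumed semantics: insert only if x is not already present.
        if x not in self.colors:
            self.colors[x] = c

    def shadow_insert_flavor(self, x, f):
        # $insertFlavor: if no <x, c> is present, add <x, blue>.
        # A ColorSet stores no flavors, so f itself is not recorded.
        if x not in self.colors:
            self.colors[x] = "blue"


old = ColorSet()
old.insert_color(1, "red")
flavors = {1: "grape"}                # FlavorSet state as produced by MF

# A new-version client calls insertFlavor(2, "mint") on the new state ...
flavors.setdefault(2, "mint")
# ... and the shadow applies the matching effect to the old state.
old.shadow_insert_flavor(2, "mint")
old.shadow_insert_flavor(1, "lime")   # element 1 already present: no change
```

After these calls both states hold elements 1 and 2, so the invariant on element sets still holds even though the old state never learns the flavor.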

The Compound Type
- I, MF, and the shadow methods define a compound type T_old&new
  - All the methods, with extended specs for mutators
- We would like: T_old&new is a subtype of T_old, T_new
- When this doesn't work:
  - Weaken invariant I
  - Upgrade scheduling
  - Disallow methods
Notes: FlavorSet&ColorSet is a subtype of each of FS and CS. TRANSITION: now how to implement this. Consider introducing the compound type before I, MF, and shadows in a future talk.

Implementation Models

Direct Model
- SO handles calls only to its own version
- And delegates down the chain
[Diagram: C2, F4, F3, P1; call labels "v3 calls", "v2 calls", "v2 calls", "v3 calls"]
Notes: Future: terminology: delegation vs. forwarding.

Problems with Direct Model
- Poor expressive power
  - E.g., FlavorSet SO doesn't know about C.insertColor call
- Synchronization
[Diagram: C2, F4, F3, P1; call labels "v3 calls", "v2 calls", "v2 calls", "v3 calls"]
Notes: From the point of view of F4, calls to other objects break encapsulation.

Interceptor Model
- Newest SO gets all calls
- And delegates down the chain
[Diagram: C1, F4, F3, F2; call labels "all calls", "v3,2,1 calls", "v2,1 calls", "v1 calls"]
Notes: TRANSITION: what happens when we upgrade to v2, which is incompatible?

Interceptor Model
- After first (incompatible) upgrade is installed
[Diagram: C2 with its SOs; call labels "all calls", "v3,2,1 calls", "v2,1 calls", "v2 calls"]
Notes: (Somewhat different notation from the paper.) The past SO is "P2" because it was defined in the upgrade for v2, but it handles v1 calls! TRANSITION: what happens when we upgrade to v3, which is compatible?

Interceptor Model
- After second (possibly compatible) upgrade is installed
[Diagram: C3, F4, P3; call labels "all calls", "v3,2,1 calls", "v3 calls"]
Notes: This upgrade is compatible, but we still need a past SO to support version 1! In this model, there is always at most one past SO. This past SO disappears when version 1 is retired.

Interceptor Model Evaluation
- Excellent expressive power
- Future and past SO must do more
  - Can reuse code and delegate
[Diagram: C3, F4, P3; call labels "all calls", "v3,2,1 calls", "v3 calls"]
Notes: TRANSITION: we've implemented a prototype…
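The routing in the interceptor model can be sketched for the configuration before any upgrade (current object C1 with future SOs F2, F3, F4). The `Link` class below is hypothetical; it models only who sees and who handles each call, not how an SO simulates behavior or how Upstart intercepts calls.

```python
class Link:
    """One object in the chain: a simulation object or the current object."""

    def __init__(self, name, version, below=None):
        self.name = name
        self.version = version
        self.below = below      # next object down the chain, if any
        self.seen = []          # every call that passed through this object

    def call(self, version, request):
        self.seen.append((version, request))
        if version == self.version:
            return f"{self.name} handled {request}"
        if self.below is None:
            raise LookupError(f"version {version} is not supported")
        return self.below.call(version, request)


c1 = Link("C1", 1)
f2 = Link("F2", 2, below=c1)
f3 = Link("F3", 3, below=f2)
f4 = Link("F4", 4, below=f3)    # newest SO: entry point for all calls

first = f4.call(1, "insertColor")
second = f4.call(3, "getFlavor")
```

Because every call enters at F4, the newest SO observes old-version mutators such as `insertColor`, which is the expressive power the direct model lacks.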

Prototype Implementation: Upstart
- C++ and Sun RPC
- Intercepts socket(), read(), write()
- Imposes minimal overhead

Summary
- Upstart is the first complete approach
- Allows mixed-mode operation
- The first definition of what must be specified for incompatible upgrades
- A powerful and useful implementation model
- A prototype implementation


Disallowing
- Constraint: never disallow methods of the current object
  - Future SO may disallow T_new methods
  - Past SO may disallow T_old methods
- In either case, disallow:
  - Mutators whose shadows are a problem
  - Observers that expose problems

Some shadows cannot be implemented via delegation
- Disallow methods that have unimplementable shadows
- Add shadow method to delegate via dynamic updating
  - Allowed iff T + shadow is a subtype of T
  - E.g., can't add delete() to GrowSet
- Implement shadow method in interceptor
  - Impacts transform function
  - Won't work for past SOs because of retirement

What does Google do?
- Extensible protocols
  - Assume defaults for missing fields
  - Ignore unexpected fields
- Round-robin upgrades among replicas
- Datacenter-by-datacenter
Notes: Extensible protocols are not perfect. E.g., when adding a do_not_overwrite bit to the Store(key, value) method, assuming the bit is true is correct, but ignoring the bit is dangerous.
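The do_not_overwrite hazard can be shown with a small sketch. It assumes messages are plain dicts and that each server version decodes them with a hypothetical `decode` helper that fills defaults and drops unknown fields; none of this reflects an actual Google protocol or library.

```python
def decode(message, known_fields, defaults):
    """Keep known fields, drop unexpected ones, fill in missing defaults."""
    decoded = {k: v for k, v in message.items() if k in known_fields}
    for field, default in defaults.items():
        decoded.setdefault(field, default)
    return decoded


def store_v1(table, message):
    # Old server: knows only key and value, so it always overwrites.
    req = decode(message, {"key", "value"}, {})
    table[req["key"]] = req["value"]


def store_v2(table, message):
    # New server: a missing do_not_overwrite bit is assumed to be true.
    req = decode(message, {"key", "value", "do_not_overwrite"},
                 {"do_not_overwrite": True})
    if req["do_not_overwrite"] and req["key"] in table:
        return
    table[req["key"]] = req["value"]


old_server = {"k": "original"}
new_server = {"k": "original"}
# New client -> old server: the unexpected bit is silently ignored.
store_v1(old_server, {"key": "k", "value": "changed", "do_not_overwrite": True})
# Old client -> new server: the missing bit defaults to true.
store_v2(new_server, {"key": "k", "value": "changed"})
```

The old server overwrites the value even though the new client asked it not to, while the new server's default leaves the existing value in place.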

END OF SLIDES. The remaining slides are left over from previous talks and may contain stale information.

Code Execution
- Call contains version number
- Called node dispatches
[Diagram: F5 receiving "v5 calls"]
Notes: Delete this slide (move after last slide).

Upstart
- A system that supports upgrades
- And a methodology
- Joint work with Barbara Liskov and Liuba Shrira

Class Upgrade
- New and old classes C_new, C_old
  - Implement T_new, T_old
- Scheduling function SF
- Transform function TF
- Simulation classes S_new, S_old
- Might be incompatible
  - T_new is not a subtype of T_old

Class Upgrade
- Replaces an old class, C_old, with a new one, C_new
- Every node running the old class will switch to the new class eventually
- An upgrade is a set of class upgrades

Defining an Upgrade
- Upgrader enters new upgrade at UDB
- Defines a new version
[Diagram: UDB and nodes]

Propagating an Upgrade
- Nodes query the UDB periodically
- Version numbers flow on all messages
[Diagram: UDB and nodes]

Executing an Upgrade
- If upgrade affects the node:
  - Runs the SF
    - And simulates the future
  - Shuts down, restarts, runs TF
  - Starts up "normally"
    - And simulates the past

Disallowing Example
- GrowSet → IntSet
- For the future SO: disallow IntSet.delete
- For the past SO: disallow GrowSet.isIn
- T_old&new becomes <T_future, T_past>

What Disallowing Provides
- T_future is a subtype of T_old
  - And it implements T_new
- T_past is a subtype of T_new
  - And it implements T_old

Transform Functions
- Implement the identity map
- May need to use future SO, create past SO
- Must be restartable
- Cannot make remote calls

Scheduling Functions
- Can consult the UDB
- Examples:
  - Rolling upgrade
  - Big flip
  - Fast reboot
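The three example schedules can be sketched as predicates that answer "may this node upgrade now?". The signatures below are hypothetical: `nodes` is a fixed ordering of the replicas and `upgraded` stands for the set of already-upgraded nodes that an SF might learn by consulting the UDB. The slides do not define these interfaces, so this shows only the intent of each schedule.

```python
def rolling_upgrade(node, nodes, upgraded):
    """Upgrade one node at a time, in a fixed order."""
    pending = [n for n in nodes if n not in upgraded]
    return bool(pending) and pending[0] == node


def big_flip(node, nodes, upgraded):
    """Upgrade the first half together, then the second half."""
    half = len(nodes) // 2
    first = nodes[:half]
    if node in first:
        return node not in upgraded
    return node not in upgraded and all(n in upgraded for n in first)


def fast_reboot(node, nodes, upgraded):
    """Upgrade every node immediately."""
    return node not in upgraded


nodes = ["a", "b", "c", "d"]
```

The schedules trade upgrade speed against how much of the service stays on the old version at any moment: a rolling upgrade takes one node out at a time, a big flip takes out half, and a fast reboot takes out all of them at once.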

Implementing Upgrades
- Need to provide SOs, TF, SF
- For the SOs, need an implementation model
- See paper for information on TFs and SFs

Summary of Specifications
- Specification defines the compound type T_old&new
  - I, MF, and the shadows
- If the compound type isn't a subtype, disallow

Specifications: Shadow Methods
- Shadow methods relate behavior
  - T_old.m → T_new.$m
  - T_new.p → T_old.$p
- E.g., FlavorSet.$insertColor
[Diagram: m applied to O_old, $m applied to O_new]

Talk Outline
- Upgrade requirements
- Upstart overview
- Specifying upgrades
- Implementing upgrades

Requirement: Generality
- Support for arbitrary changes
- Incompatible upgrades
  - Old features are no longer supported

Requirement: Continuous Availability
- Service is required 24/7
  - Even when upgrading
- Therefore systems upgrade gradually
  - Implies mixed-mode operation

Requirement: Controlled Deployment
- Systems upgrade gradually
  - But with control
- Manual control is impractical
  - An automatic system
  - But upgrader needs control

Requirement: Persistence
- Systems store important state for users
  - It cannot be lost
  - But may need to be transformed

Requirement: Ease of Use
- Avoiding feature creep helps
- Upgrader needs to understand only a few recent versions