1
Upgrading Distributed Systems is not rsync
Sameer Ajmani, Barbara Liskov, Liuba Shrira
MIT Lab for Computer Science
Upgrading distributed systems is not rsync:
  Nodes must upgrade at different times
  Nodes running different versions must communicate
2
Large, Long-Lived Distributed Systems Need Automatic Upgrades
e.g., CDNs, peer-to-peer nets, sensor nets
Software must change over time
No direct operator access
Upgrades must propagate automatically
Upgrades must avoid disrupting service
The last point is key: this isn’t like upgrading a set of workstations with rsync and a cron job. These systems must remain highly available while they upgrade.
3
A System Upgrade
Each node runs at a particular version
New version information propagates to nodes
A node upgrades by shutting down, installing new code, transforming its persistent state, and restarting
These systems are robust, so they tolerate node restarts. Systems may be heterogeneous (e.g., different kinds of nodes in a sensor net), but nodes running the same version are designed to work together.
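The four upgrade steps on this slide can be sketched as follows. This is a minimal illustration, not the authors' infrastructure: the `Node` and `Upgrade` classes, the `apply_upgrade` function, and the in-memory dictionary standing in for persistent state are all hypothetical names introduced here.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict


@dataclass
class Node:
    """Hypothetical node: a version, persistent state, and a running flag."""
    version: int = 1
    state: Dict[str, object] = field(default_factory=dict)
    running: bool = True


@dataclass
class Upgrade:
    """Hypothetical upgrade: a target version plus a state transform."""
    new_version: int
    transform: Callable[[Dict[str, object]], Dict[str, object]]


def apply_upgrade(node: Node, upgrade: Upgrade) -> None:
    """Shut down, install new code, transform state, restart."""
    if upgrade.new_version <= node.version:
        return  # node is already at or past this version
    node.running = False                    # 1. shut down (no requests served)
    node.version = upgrade.new_version      # 2. install new code
    # 3. transform persistent state; work on a copy so the transform
    #    cannot leave the old state half-modified
    node.state = upgrade.transform(dict(node.state))
    node.running = True                     # 4. restart
```

For example, an upgrade to version 2 that changes a stored counter from a string to an integer would pass `lambda s: {**s, "hits": int(s["hits"])}` as the transform.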
4
Upgrades Can Disrupt Service
[Figure: nodes at Version 1 and Version 2]
A node cannot service requests while upgrading
May need to delay a node upgrade:
  To avoid causing too many simultaneous failures
  To test an upgrade on a few nodes
Delayed upgrades imply mixed mode operation
Delay == scheduling: deciding when each node upgrades is a scheduling problem
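One way to picture such a schedule is as a sequence of waves: a small first wave that tests the upgrade on a few nodes, followed by batches sized so that only a bounded number of nodes are down at once. The sketch below assumes this wave-based policy; the function name `plan_waves` and its parameters `canary` and `max_down` are hypothetical and are not taken from the paper.

```python
from typing import List, Sequence


def plan_waves(nodes: Sequence[str], canary: int, max_down: int) -> List[List[str]]:
    """Split nodes into upgrade waves.

    canary:   number of nodes in the initial test wave (may be 0)
    max_down: most nodes allowed to upgrade (be unavailable) at once
    """
    if max_down < 1 or not 0 <= canary <= max_down:
        raise ValueError("need max_down >= 1 and 0 <= canary <= max_down")

    waves: List[List[str]] = []
    first = list(nodes[:canary])
    if first:
        waves.append(first)  # test the upgrade on a few nodes first

    rest = list(nodes[canary:])
    for i in range(0, len(rest), max_down):
        waves.append(rest[i:i + max_down])  # bounded simultaneous restarts
    return waves


if __name__ == "__main__":
    for wave in plan_waves([f"n{i}" for i in range(1, 8)], canary=1, max_down=3):
        print(wave)
```

Between waves, the nodes that have not yet been scheduled keep running the old version, which is exactly the mixed mode situation the slide describes.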
5
Mixed Mode Operation
[Figure: a Version 2 node calling a Version 1 node]
In this example, a node that has upgraded attempts to make a call on a node that has not yet upgraded. When an upgrade just improves performance or fixes a bug, this is no big deal. But when an upgrade enhances the capabilities of a node, changes an interface, or deprecates behavior, calls across versions may fail.
6
Simulation Enables Interoperation
[Figure: a Version 1 node and a Version 2 node]
Nodes label outgoing calls with their version
Nodes dispatch incoming calls to simulation objects
Installing a simulation object is much faster than upgrading
Both upgraded and non-upgraded nodes use simulation
Imperfect simulation may degrade service
In the paper, we discuss when simulation works well and when it doesn’t.
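The labeling and dispatch steps can be illustrated with a small sketch. It assumes a toy key-value service in which version 2 adds a `get_many` method; the classes `Call`, `V1Store`, `V2Simulation`, and `Node` are hypothetical and only show the shape of the idea, not the design described in the paper.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Call:
    version: int      # caller's version, attached to every outgoing call
    method: str
    args: tuple = ()


class V1Store:
    """Hypothetical version 1 code actually installed on the node."""
    def __init__(self) -> None:
        self.data: Dict[str, object] = {}

    def put(self, key: str, value: object) -> str:
        self.data[key] = value
        return "ok"

    def get(self, key: str) -> object:
        return self.data.get(key)


class V2Simulation:
    """Simulation object: offers the (assumed) version 2 interface
    using only version 1 operations on the real store."""
    def __init__(self, real: V1Store) -> None:
        self.real = real

    def put(self, key: str, value: object) -> str:
        return self.real.put(key, value)

    def get(self, key: str) -> object:
        return self.real.get(key)

    def get_many(self, keys: List[str]) -> Dict[str, object]:
        # New in version 2; simulated with repeated version 1 gets,
        # which works but is slower than a native implementation.
        return {k: self.real.get(k) for k in keys}


class Node:
    def __init__(self, version: int, handlers: Dict[int, object]) -> None:
        self.version = version
        self.handlers = handlers  # caller version -> object that handles it

    def make_call(self, method: str, *args: object) -> Call:
        return Call(self.version, method, args)  # label with own version

    def dispatch(self, call: Call) -> object:
        handler = self.handlers.get(call.version)
        if handler is None:
            raise LookupError(f"no handler for version {call.version}")
        return getattr(handler, call.method)(*call.args)
```

In this sketch a node still at version 1 would be built as `Node(1, {1: store, 2: V2Simulation(store)})`, so a call labeled with version 2 reaches the simulation object instead of failing. An upgraded node would symmetrically carry a simulation of the version 1 interface for callers that have not yet upgraded.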
7
Summary
Upgrades require scheduling and simulation
Other issues (covered in the paper):
  Transforms for persistent state
  Correctness of transforms and simulation
  An upgrade infrastructure that supports upgrade propagation, scheduling, code installation, transforms, and simulation