Published by Joanna Marshall. Modified over 9 years ago.
1
Recovery Management in QuickSilver
  Roger Haskin, Yoni Malachi, Wayne Sawdon, and Gregory Chan
  IBM Almaden Research Center
2
Introduction: Problem Domain
  Recovery management in distributed OSs
  Trends in contemporary research: extensibility and distribution
3
Contemporary Recovery Techniques
  Timeouts
    how to distinguish slow from dead?
  Connectionless protocols / stateless servers
    some actions can’t be made idempotent
    retries can cause problems
  Virtual circuits
    can’t handle multiple servers
  Replication
    too expensive for some uses
    how to detect failures?
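The point about retries and idempotency can be illustrated with a small sketch. The `Account` class and `send_with_retries` helper are hypothetical names invented for this example; they are not part of QuickSilver. A duplicate delivery is harmless for an operation that sets a value, but it changes the result for an operation that applies a delta.

```python
class Account:
    """Hypothetical server-side state used only for illustration."""

    def __init__(self, balance):
        self.balance = balance

    def set_balance(self, amount):
        # Idempotent: applying it twice leaves the same state as applying it once.
        self.balance = amount

    def withdraw(self, amount):
        # Not idempotent: every delivery subtracts again.
        self.balance -= amount


def send_with_retries(operation, argument, deliveries):
    """Simulate a client whose retries cause the request to arrive several times."""
    for _ in range(deliveries):
        operation(argument)


safe = Account(100)
send_with_retries(safe.set_balance, 70, deliveries=2)

unsafe = Account(100)
send_with_retries(unsafe.withdraw, 30, deliveries=2)
```

After two deliveries the first account holds 70, as intended, while the second has been debited twice.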
4
QuickSilver: what’s so special?
  Fundamental trade-off:
    generality and efficiency (QuickSilver)
    vs. ease of use (Camelot, Argus, etc.)
  Transparency isn’t always best!
5
QuickSilver: specs and features
  Client-server model
  System services are processes
  IPC message-passing
  More complicated set of failure modes (to handle more specific cases)
  Atomic transactions
6
Server Classes
  Common server classes:
    Volatile (window manager)
    Replicated + volatile (name server)
    Recoverable (file server)
  Long-running transactions need log support
7
Design Goals
  Programs should be resilient to external process and machine failure
  Server processes should contain their own recovery code
  Uniform system-wide architecture for recovery management
  Logically related activities must execute atomically
8
Transaction Structure
  Everything belongs to a transaction
  Globally unique transaction identifiers (tid)
  Each transaction has one owner and multiple participants
  Owner can commit or abort
  Participants can only abort
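The owner/participant rules on this slide can be sketched as a toy model. Everything here is an assumption made for illustration: the `Transaction` class, its method names, and the use of a UUID as a stand-in for a globally unique tid are not taken from QuickSilver itself.

```python
import uuid


class Transaction:
    """Toy model: one owner, many participants (all names hypothetical)."""

    def __init__(self, owner):
        self.tid = uuid.uuid4().hex  # stand-in for a globally unique tid
        self.owner = owner
        self.participants = set()
        self.state = "active"

    def join(self, process):
        # A process becomes a participant in the transaction.
        self.participants.add(process)

    def commit(self, caller):
        # Only the owner is allowed to request commit.
        if caller != self.owner:
            raise PermissionError("only the owner may commit")
        if self.state == "active":
            self.state = "committed"
        return self.state

    def abort(self, caller):
        # The owner or any participant may abort.
        if caller != self.owner and caller not in self.participants:
            raise PermissionError("caller is not part of this transaction")
        if self.state == "active":
            self.state = "aborted"
        return self.state
```

In this sketch a participant that calls `commit` gets a `PermissionError`, while its `abort` succeeds and makes a later `commit` by the owner a no-op.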
9
Recovery Manager: Components
  Transaction Manager: manages commit coordination by communicating with servers at its own node and with transaction managers at other nodes
  Log Manager: serves as a common recovery log both for the TM’s commit log and the servers’ recovery data
  Deadlock Detector: detects and resolves global deadlocks (not implemented)
10
QuickSilver System Structure
11
Transaction Manager
  Tracks transactions for processes on host
  Manages distributed commit protocol
  Distributed transaction is a tree
    Only need to know your superior and your immediate subordinates
  Several alternative commit protocols available to servers
    1-phase: used by volatile servers
    2-phase: used by recoverable servers
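The tree structure can be sketched as follows, assuming a simplified yes/no vote per node. `TMNode`, `collect_votes`, `propagate`, and `run_commit` are hypothetical names for this illustration, and the sketch ignores failures and logging. Each node talks only to its superior and its immediate subordinates.

```python
class TMNode:
    """One transaction manager in the commit tree (illustrative only)."""

    def __init__(self, name, local_vote=True):
        self.name = name
        self.local_vote = local_vote  # True means "willing to commit"
        self.superior = None
        self.subordinates = []
        self.outcome = None

    def add_subordinate(self, node):
        node.superior = self
        self.subordinates.append(node)
        return node

    def collect_votes(self):
        # Ask every immediate subordinate, which in turn asks its own subtree.
        votes = [sub.collect_votes() for sub in self.subordinates]
        return self.local_vote and all(votes)

    def propagate(self, outcome):
        # Pass the decision down one level at a time.
        self.outcome = outcome
        for sub in self.subordinates:
            sub.propagate(outcome)


def run_commit(root):
    """Run both passes from the root (the coordinator) and return the outcome."""
    outcome = "commit" if root.collect_votes() else "abort"
    root.propagate(outcome)
    return outcome
```

A single negative vote anywhere in the tree makes `run_commit` return `"abort"`, and every node ends up with the same `outcome`.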
12
2-Phase Commit
  Voting options
    abort: undo my action, announce abort to others in 2nd phase
    commit-read-only: no recoverable resources modified, don’t include me in 2nd phase
    commit-volatile: same as read-only, but notify me of results of 2nd phase
    commit-recoverable: recoverable state modified, notify me of results of 2nd phase
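A minimal sketch of how a coordinator might combine these four votes. The `Vote` enum and `decide` function are hypothetical names. As a simplifying assumption, only participants that voted commit-volatile or commit-recoverable are placed on the second-phase notification list, since those are the votes that ask for the result.

```python
from enum import Enum


class Vote(Enum):
    ABORT = "abort"
    COMMIT_READ_ONLY = "commit-read-only"
    COMMIT_VOLATILE = "commit-volatile"
    COMMIT_RECOVERABLE = "commit-recoverable"


def decide(votes):
    """Combine first-phase votes.

    votes maps a participant name to its Vote.
    Returns (outcome, participants to notify in the second phase).
    """
    # Any single abort vote aborts the whole transaction.
    outcome = "abort" if Vote.ABORT in votes.values() else "commit"
    # Read-only voters drop out; volatile and recoverable voters want the result.
    wants_result = {Vote.COMMIT_VOLATILE, Vote.COMMIT_RECOVERABLE}
    notify = sorted(name for name, vote in votes.items() if vote in wants_result)
    return outcome, notify


votes = {
    "window_manager": Vote.COMMIT_VOLATILE,
    "name_server": Vote.COMMIT_READ_ONLY,
    "file_server": Vote.COMMIT_RECOVERABLE,
}
outcome, notify = decide(votes)
```

With these votes the outcome is a commit and the name server, having voted commit-read-only, is left out of the second phase.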
13
Transaction Coordination
  Transaction coordinator at transaction birth-site
  Usually a user workstation, likely to fail
  Migrate or replicate coordinator for reliability
14
Log Manager
  Log manager provides optional services
    Backpointers for log replay
    Block I/O access
    Log replication
    Log archival
  Servers tell LM what they need
    Not penalized for services they don’t use
  LM does not interpret data; servers determine recovery strategy
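One way to picture "optional services" and "LM does not interpret data" is the sketch below. The `LogManager` class, its `register`/`append`/`replay_backward` methods, and the use of list indexes as log sequence numbers are all assumptions for illustration, not the QuickSilver interface. Payloads are opaque bytes, and backpointers are maintained only for servers that ask for them.

```python
class LogManager:
    """Toy shared log with per-server optional backpointers (hypothetical API)."""

    def __init__(self):
        self.records = []   # each entry: (server, payload, previous lsn or None)
        self.options = {}   # server -> requested services
        self.last_lsn = {}  # server -> lsn of its latest record

    def register(self, server, backpointers=False):
        # Servers declare which optional services they need.
        self.options[server] = {"backpointers": backpointers}

    def append(self, server, payload: bytes):
        # The payload is stored as-is and never parsed by the log manager.
        use_bp = self.options[server]["backpointers"]
        prev = self.last_lsn.get(server) if use_bp else None
        self.records.append((server, payload, prev))
        lsn = len(self.records) - 1
        if use_bp:
            self.last_lsn[server] = lsn
        return lsn

    def replay_backward(self, server):
        # Follow the backpointer chain from the newest record to the oldest.
        if not self.options[server]["backpointers"]:
            raise ValueError("server did not request backpointers")
        lsn = self.last_lsn.get(server)
        payloads = []
        while lsn is not None:
            _, payload, prev = self.records[lsn]
            payloads.append(payload)
            lsn = prev
        return payloads
```

A server that skips backpointers pays no bookkeeping cost in `append`; deciding what the replayed bytes mean is left entirely to the server.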
15
QuickSilver Distributed IPC
16
Structure of a Distributed Transaction
17
Open questions
  Efficiency vs. transparency?
  Still relevant for today’s hardware?
  …