Recovery Management in QuickSilver Roger Haskin, Yoni Malachi, Wayne Sawdon, and Gregory Chan IBM Almaden Research Center.


1 Recovery Management in QuickSilver Roger Haskin, Yoni Malachi, Wayne Sawdon, and Gregory Chan IBM Almaden Research Center

2 Introduction: Problem Domain
Recovery management in distributed OSs
Trends in contemporary research:
  - Extensibility and distribution

3 Contemporary Recovery Techniques
Timeouts
  - how to distinguish slow from dead?
Connectionless protocols / stateless servers
  - some actions can’t be made idempotent
  - retries can cause problems
Virtual circuits
  - can’t handle multiple servers
Replication
  - too expensive for some uses
  - how to detect failures?

4 QuickSilver: what’s so special?
Fundamental trade-off:
  - Generality & efficiency (QuickSilver) vs. ease of use (Camelot, Argus, etc.)
Transparency isn’t always best!

5 QuickSilver: specs and features
Client-server model
System services are processes
IPC message-passing
More complicated set of failure modes (to handle more specific cases)
Atomic transactions

6 Server Classes
Common server classes:
  - Volatile (window manager)
  - Replicated + volatile (name server)
  - Recoverable (file server)
  - Long-running transactions need log support

7 Design Goals
Programs should be resilient to external process and machine failure
Server processes should contain their own recovery code
Uniform system-wide architecture for recovery management
Logically related activities must execute atomically

8 Transaction Structure
Everything belongs to a transaction
Globally unique transaction identifiers (tid)
Each transaction has one owner and multiple participants
  - Owner can commit or abort
  - Participants can only abort
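The owner/participant rule on this slide can be sketched in a few lines of Python. This is an illustration only, not QuickSilver's interface: the `Transaction` class, its method names, and the use of `uuid4` as a stand-in for a globally unique tid are all assumptions made for the example.

```python
import uuid
from dataclasses import dataclass, field


@dataclass
class Transaction:
    """Hypothetical model of a transaction with one owner and many participants."""
    owner: str
    # uuid4 is only a stand-in for a globally unique tid.
    tid: str = field(default_factory=lambda: uuid.uuid4().hex)
    participants: set = field(default_factory=set)
    state: str = "active"  # "active", "committed", or "aborted"

    def join(self, process: str) -> None:
        """Record a process as a participant in this transaction."""
        self.participants.add(process)

    def abort(self, caller: str) -> None:
        """Owner or any participant may abort a transaction that has not committed."""
        if caller != self.owner and caller not in self.participants:
            raise PermissionError(f"{caller} is not part of transaction {self.tid}")
        if self.state == "committed":
            raise RuntimeError("transaction already committed")
        self.state = "aborted"

    def commit(self, caller: str) -> None:
        """Only the owner may commit, and only while the transaction is active."""
        if caller != self.owner:
            raise PermissionError("only the owner may commit")
        if self.state != "active":
            raise RuntimeError(f"cannot commit a transaction that is {self.state}")
        self.state = "committed"
```

A participant calling `commit` raises `PermissionError` and leaves the transaction active, while the same participant calling `abort` succeeds.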

9 Recovery Manager: Components
Transaction Manager: manages commit coordination by communicating with servers at its own node and with transaction managers at other nodes
Log Manager: serves as a common recovery log both for the TM’s commit log and the server’s recovery data
Deadlock Detector: detects and resolves global deadlocks (not implemented)

10 QuickSilver System Structure

11 Transaction Manager
Tracks transactions for processes on host
Manages distributed commit protocol
Distributed transaction is a tree
  - Only need to know your superior and your immediate subordinates
Several alternative commit protocols available to servers
  - 1-phase: used by volatile servers
  - 2-phase: used by recoverable servers
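As a rough sketch of the tree-structured commit described on this slide, the Python below lets each node talk only to its superior and its immediate subordinates. The names `TMNode`, `prepare`, `finish`, and `run_two_phase` are hypothetical, votes are reduced to a boolean, and messages are modeled as direct method calls; none of this is taken from the QuickSilver implementation.

```python
class TMNode:
    """Hypothetical transaction manager node in a commit tree."""

    def __init__(self, name, local_ok=True):
        self.name = name
        self.local_ok = local_ok   # whether local servers are willing to commit
        self.superior = None       # the only upward link a node keeps
        self.subordinates = []     # immediate children only

    def add_subordinate(self, child):
        child.superior = self
        self.subordinates.append(child)
        return child

    def prepare(self):
        """Phase 1: collect votes from immediate subordinates and combine with the local vote."""
        sub_votes = [s.prepare() for s in self.subordinates]
        return self.local_ok and all(sub_votes)

    def finish(self, commit, log):
        """Phase 2: record the outcome locally, then pass it to immediate subordinates."""
        log.append((self.name, "commit" if commit else "abort"))
        for s in self.subordinates:
            s.finish(commit, log)


def run_two_phase(root):
    """Drive both phases from the root (the coordinator) and return the outcome and a trace."""
    outcome = root.prepare()
    log = []
    root.finish(outcome, log)
    return outcome, log
```

Because each node aggregates its subtree's votes before answering its superior, the root never needs to know about nodes more than one level below it.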

12 2-Phase Commit
Voting options
  - abort: undo my action, announce abort to others in 2nd phase
  - commit-read-only: no recoverable resources modified, don’t include me in 2nd phase
  - commit-volatile: same as read-only, but notify me of results of 2nd phase
  - commit-recoverable: recoverable state modified, notify me of results of 2nd phase
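The four votes can be turned into a small decision function. The sketch below is a minimal Python illustration: the `Vote` enum and `decide` function are hypothetical names, and the choice to notify only commit-volatile and commit-recoverable voters in the second phase (including on abort) is an assumption read off the bullet text, not a statement of the actual protocol.

```python
from enum import Enum


class Vote(Enum):
    ABORT = "abort"
    COMMIT_READ_ONLY = "commit-read-only"
    COMMIT_VOLATILE = "commit-volatile"
    COMMIT_RECOVERABLE = "commit-recoverable"


def decide(votes):
    """Given {participant: Vote}, return (outcome, participants to notify in phase 2).

    Assumption: a single abort vote aborts the transaction. Read-only voters
    asked to be left out of phase 2, and an aborting voter has already undone
    its own work, so neither is notified.
    """
    outcome = "abort" if Vote.ABORT in votes.values() else "commit"
    notify = sorted(
        p for p, v in votes.items()
        if v in (Vote.COMMIT_VOLATILE, Vote.COMMIT_RECOVERABLE)
    )
    return outcome, notify
```

For example, with a file server voting commit-recoverable, a window manager voting commit-volatile, and a name server voting commit-read-only, the outcome is commit and only the first two take part in the second phase.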

13 Transaction Coordination
Transaction coordinator at transaction birth-site
  - Usually a user workstation, likely to fail
  - Migrate or replicate coordinator for reliability

14 Log Manager
Log manager provides optional services
  - Backpointers for log replay
  - Block I/O access
  - Log replication
  - Log archival
Servers tell LM what they need
  - Not penalized for services they don’t use
LM does not interpret data: servers determine recovery strategy
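One way to picture "optional services" and "LM does not interpret data" is an in-memory log that stores opaque bytes and maintains backpointers only for servers that ask for them. The Python below is a toy sketch under those assumptions; the `LogManager` class, `register`, `append`, and `replay_backward` are invented names, and block I/O, replication, and archival are left out.

```python
class LogManager:
    """Toy shared log: records are opaque bytes, backpointers are opt-in per server."""

    def __init__(self):
        self._records = []           # each entry: (server, data, previous LSN or None)
        self._last = {}              # server -> LSN of its most recent record
        self._wants_backptr = set()  # servers that asked for backpointers

    def register(self, server, backpointers=False):
        """A server states which optional services it needs."""
        if backpointers:
            self._wants_backptr.add(server)

    def append(self, server, data: bytes) -> int:
        """Append an uninterpreted record and return its log sequence number."""
        lsn = len(self._records)
        prev = None
        if server in self._wants_backptr:
            # Only servers that opted in pay for backpointer bookkeeping.
            prev = self._last.get(server)
            self._last[server] = lsn
        self._records.append((server, data, prev))
        return lsn

    def replay_backward(self, server):
        """Yield a server's records newest-first by following its backpointers."""
        if server not in self._wants_backptr:
            raise ValueError(f"{server} did not request backpointers")
        lsn = self._last.get(server)
        while lsn is not None:
            _, data, prev = self._records[lsn]
            yield data
            lsn = prev
```

What the bytes mean, and whether they are used for undo or redo, stays entirely with the server that wrote them.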

15 QuickSilver Distributed IPC

16 Structure of a Distributed Transaction

17 Open questions
Efficiency vs. transparency?
Still relevant for today’s hardware?
…

