Recovery Management in QuickSilver Roger Haskin, Yoni Malachi, Wayne Sawdon, and Gregory Chan IBM Almaden Research Center.

Introduction: Problem Domain
- Recovery management in distributed OSs
- Trends in contemporary research:
  - Extensibility and distribution

Contemporary Recovery Techniques
- Timeouts
  - How to distinguish slow from dead?
- Connectionless protocols / stateless servers
  - Some actions can’t be made idempotent
  - Retries can cause problems
- Virtual circuits
  - Can’t handle multiple servers
- Replication
  - Too expensive for some uses
  - How to detect failures?

QuickSilver: what’s so special?
- Fundamental trade-off:
  - Generality and efficiency (QuickSilver) vs. ease of use (Camelot, Argus, etc.)
- Transparency isn’t always best!

QuickSilver: specs and features
- Client-server model
- System services are processes
- IPC message-passing
- More complicated set of failure modes (to handle more specific cases)
- Atomic transactions

Server Classes
- Common server classes:
  - Volatile (window manager)
  - Replicated + volatile (name server)
  - Recoverable (file server)
  - Long-running transactions need log support

Design Goals
- Programs should be resilient to external process and machine failure
- Server processes should contain their own recovery code
- Uniform system-wide architecture for recovery management
- Logically related activities must execute atomically

Transaction Structure
- Everything belongs to a transaction
- Globally unique transaction identifiers (tid)
- Each transaction has one owner and multiple participants
  - Owner can commit or abort
  - Participants can only abort
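
The ownership rule above can be illustrated with a small sketch. This is not QuickSilver code: the `Transaction` class, its method names, and the use of a random UUID as a stand-in for a globally unique tid are all assumptions made for illustration.

```python
import uuid
from dataclasses import dataclass, field


@dataclass
class Transaction:
    """Hypothetical model: one owner, many participants, one global tid."""
    owner: str
    # Stand-in for a globally unique tid; the real tid format is not given here.
    tid: str = field(default_factory=lambda: uuid.uuid4().hex)
    participants: set = field(default_factory=set)
    state: str = "active"

    def join(self, process: str) -> None:
        self.participants.add(process)

    def abort(self, process: str) -> None:
        # The owner and any participant may abort.
        if process != self.owner and process not in self.participants:
            raise PermissionError(f"{process} is not part of transaction {self.tid}")
        if self.state == "active":
            self.state = "aborted"

    def commit(self, process: str) -> None:
        # Only the owner may commit.
        if process != self.owner:
            raise PermissionError("only the owner can commit")
        if self.state != "active":
            raise RuntimeError(f"transaction is already {self.state}")
        self.state = "committed"
```

A participant such as a file server can call `abort` at any time, while a `commit` call from anyone but the owner is rejected and leaves the transaction active.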

Recovery Manager: Components
- Transaction Manager: manages commit coordination by communicating with servers at its own node and with transaction managers at other nodes
- Log Manager: serves as a common recovery log for both the TM’s commit log and the servers’ recovery data
- Deadlock Detector: detects and resolves global deadlocks (not implemented)

QuickSilver System Structure

Transaction Manager
- Tracks transactions for processes on host
- Manages distributed commit protocol
- Distributed transaction is a tree
  - Only need to know your superior and your immediate subordinates
- Several alternative commit protocols available to servers
  - 1-phase: used by volatile servers
  - 2-phase: used by recoverable servers
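
The tree structure can be sketched as follows, assuming a simplified two-phase exchange in which each node talks only to its superior and its immediate subordinates. The `Node` class, the `local_ok` flag, and `run_commit` are hypothetical names, and the real protocol involves messages, logging, and failure handling that are omitted here.

```python
class Node:
    """Hypothetical transaction manager node in a commit tree."""

    def __init__(self, name, local_ok=True):
        self.name = name
        self.local_ok = local_ok   # whether this node's local servers can commit
        self.superior = None       # the only upstream node this TM knows
        self.subordinates = []     # the only downstream nodes this TM knows
        self.outcome = None

    def add_subordinate(self, child):
        child.superior = self
        self.subordinates.append(child)
        return child

    def prepare(self):
        # Phase one: ask immediate subordinates, which ask theirs in turn.
        votes = [child.prepare() for child in self.subordinates]
        return self.local_ok and all(votes)

    def finish(self, commit):
        # Phase two: record the decision and pass it down one level.
        self.outcome = "commit" if commit else "abort"
        for child in self.subordinates:
            child.finish(commit)


def run_commit(root):
    """Drive both phases from the root of the tree (the coordinator)."""
    decision = root.prepare()
    root.finish(decision)
    return root.outcome
```

No node holds a list of all participants: the root learns the combined vote of a whole subtree from the single reply of each immediate subordinate.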

2-Phase Commit
- Voting options
  - abort: undo my action, announce abort to others in 2nd phase
  - commit-read-only: no recoverable resources modified, don’t include me in 2nd phase
  - commit-volatile: same as read-only, but notify me of results of 2nd phase
  - commit-recoverable: recoverable state modified, notify me of results of 2nd phase
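
As a rough illustration of how the four votes could be combined, the sketch below computes the outcome and the set of participants that take part in the second phase. The `Vote` enum and `resolve` function are hypothetical, and the choice not to notify the aborting voter itself is an assumption.

```python
from enum import Enum


class Vote(Enum):
    ABORT = "abort"
    COMMIT_READ_ONLY = "commit-read-only"
    COMMIT_VOLATILE = "commit-volatile"
    COMMIT_RECOVERABLE = "commit-recoverable"


def resolve(votes):
    """Return (outcome, participants to notify in the second phase).

    votes maps a participant name to its phase-one Vote.
    """
    outcome = "abort" if Vote.ABORT in votes.values() else "commit"
    # Read-only voters drop out of phase two. Assumption: a voter that chose
    # abort already knows the outcome and is not notified again.
    notify = sorted(
        name for name, vote in votes.items()
        if vote in (Vote.COMMIT_VOLATILE, Vote.COMMIT_RECOVERABLE)
    )
    return outcome, notify
```

With a window manager voting commit-volatile, a name server voting commit-read-only, and a file server voting commit-recoverable, the outcome is commit and only the window manager and file server hear the result.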

Transaction Coordination
- Transaction coordinator at transaction birth-site
  - Usually a user workstation, likely to fail
  - Migrate or replicate coordinator for reliability

Log Manager
- Log manager provides optional services
  - Backpointers for log replay
  - Block I/O access
  - Log replication
  - Log archival
- Servers tell LM what they need
  - Not penalized for services they don’t use
- LM does not interpret data; servers determine recovery strategy
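
One way to picture an optional service is a log that keeps per-server backpointers only for servers that asked for them. The sketch below is an in-memory toy under that assumption; `LogManager`, `register`, `append`, and `replay_backward` are invented names, and payloads are stored as opaque bytes that the log never inspects.

```python
class LogManager:
    """Hypothetical append-only log that stores opaque bytes per server."""

    def __init__(self):
        self._records = []   # list of (server, payload, backpointer)
        self._options = {}   # server -> set of requested optional services
        self._last = {}      # server -> index of its most recent record

    def register(self, server, backpointers=False):
        self._options[server] = {"backpointers"} if backpointers else set()

    def append(self, server, payload: bytes) -> int:
        index = len(self._records)
        prev = None
        if "backpointers" in self._options[server]:
            # Only servers that requested the service pay for the bookkeeping.
            prev = self._last.get(server)
            self._last[server] = index
        self._records.append((server, payload, prev))
        return index

    def replay_backward(self, server):
        """Yield a server's payloads newest first by following backpointers."""
        if "backpointers" not in self._options[server]:
            raise ValueError(f"{server} did not request backpointers")
        index = self._last.get(server)
        while index is not None:
            _, payload, index = self._records[index]
            yield payload
```

The transaction manager's commit records and a file server's recovery data can share this one log, and interpreting each payload during recovery is left entirely to the server that wrote it.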

QuickSilver Distributed IPC

Structure of a Distributed Transaction

Open questions
- Efficiency vs. transparency?
- Still relevant for today’s hardware?
- …
