ExtraVirt: Detecting and recovering from transient processor faults
Dominic Lucchetti, Steve Reinhardt, Peter Chen
University of Michigan


1. ExtraVirt: Detecting and recovering from transient processor faults
Dominic Lucchetti, Steve Reinhardt, Peter Chen
University of Michigan

2. Flips Happen
Similar die area + decreasing transition energy = increasing risk of transient failure

3. Multi-Processors & Virtual Machine
Multi-Processor
  - Ensure error independence
  - Enable fault detection
  - Efficient resource sharing
Virtual Machine
  - No changes to OS or applications
  - VM replay
      - Synchronize replicas
      - Recover correct state
[Diagram labels: Replica 1, Replica 2, Hypervisor, Device Drivers, Replication Management Layer (RML)]
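The slide names VM replay as the mechanism for keeping replicas synchronized but does not show how. As a rough illustration of the general idea only (log the nondeterministic inputs once, then feed the same log to a second copy), here is a minimal Python sketch. `ToyVM`, `run_and_record`, and `replay` are hypothetical names invented for this example and are not part of ExtraVirt; the arithmetic in `step` is an arbitrary stand-in for deterministic guest execution.

```python
import random
from dataclasses import dataclass


@dataclass
class ToyVM:
    """Hypothetical stand-in for a guest VM: a purely deterministic state machine."""
    state: int = 0

    def step(self, event: int) -> None:
        # Given the same event, the next state is always the same.
        # The only source of nondeterminism is the event itself.
        self.state = (self.state * 31 + event) % 1_000_003


def run_and_record(vm: ToyVM, n_events: int, rng: random.Random) -> list:
    """Run the VM on fresh nondeterministic events and log each one."""
    log = []
    for _ in range(n_events):
        event = rng.randrange(256)  # stand-in for an interrupt or device input
        log.append(event)
        vm.step(event)
    return log


def replay(vm: ToyVM, log: list) -> None:
    """Drive a VM from a previously recorded event log."""
    for event in log:
        vm.step(event)


primary = ToyVM()
log = run_and_record(primary, 100, random.Random(42))

replica = ToyVM()
replay(replica, log)
states_match = primary.state == replica.state
```

Because both copies consume the identical event log, any difference in their final states must come from the execution itself, which is what makes comparing replicas usable as a fault detector.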

4. Example: Memory
Copy on write
  - Reduces overhead
  - Protects checkpoints
Merge on checkpoint
  - Verify correctness
  - Re-execute on deviation
Memory Fault Protection
  - ECC against RAM faults
  - MMU against CPU faults
[Diagram: memory pages A B C D E shown for the Checkpoint, Replica 1, Replica 2, and Replica 3; one copy holds X in place of D and is caught at the Verify step]
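The copy-on-write and merge-on-checkpoint steps above can be sketched roughly as follows. This is a toy model under stated assumptions, not the ExtraVirt implementation: `CowReplica` and `verify_and_merge` are hypothetical names, pages are modeled as dictionary entries, and using a third replica's re-execution as a tie-breaker is one reading of the slide's diagram rather than something the slide states.

```python
from typing import Dict, Optional

Pages = Dict[int, str]  # page number -> contents (toy model of memory)


class CowReplica:
    """Hypothetical replica whose writes land in a private copy-on-write overlay."""

    def __init__(self, checkpoint: Pages) -> None:
        self._checkpoint = checkpoint  # shared; the replica never writes to it
        self.dirty: Pages = {}         # pages modified since the checkpoint

    def read(self, page: int) -> str:
        return self.dirty.get(page, self._checkpoint[page])

    def write(self, page: int, data: str) -> None:
        self.dirty[page] = data        # the checkpoint itself stays untouched


def verify_and_merge(checkpoint: Pages, first: CowReplica, second: CowReplica,
                     tiebreak: Optional[CowReplica] = None) -> bool:
    """Merge dirty pages into the checkpoint only if the replicas agree.

    On a deviation, an optional re-executed replica (assumption) decides
    which of the two original results to trust. Returns False if nothing
    could be verified, leaving the checkpoint unchanged.
    """
    if first.dirty == second.dirty:
        checkpoint.update(first.dirty)
        return True
    if tiebreak is not None:
        for candidate in (first, second):
            if candidate.dirty == tiebreak.dirty:
                checkpoint.update(candidate.dirty)
                return True
    return False


checkpoint = {0: "A", 1: "B", 2: "C", 3: "d", 4: "E"}
r1, r2 = CowReplica(checkpoint), CowReplica(checkpoint)
r1.write(3, "D")
r2.write(3, "X")  # simulated transient fault corrupts one replica's write

ok_without_rerun = verify_and_merge(checkpoint, r1, r2)
after_mismatch = dict(checkpoint)  # snapshot: checkpoint must be unchanged

r3 = CowReplica(checkpoint)        # re-execute the interval from the checkpoint
r3.write(3, "D")
ok_with_rerun = verify_and_merge(checkpoint, r1, r2, r3)
```

Keeping replica writes in a private overlay is what protects the checkpoint in this sketch: a corrupted write can be discarded simply by dropping the overlay, and only verified pages are ever merged.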

5. Status
Present
  - VM Replay
  - Beginnings of Replication Management Layer (RML)
  - Still much to do…
Future
  - Replicate the un-replicated
  - Handle faults in device drivers
  - Expanded fault model
[Diagram labels: Replica 1, Replica 2, Hypervisor/RML, Device Drivers]

6. Questions?

