Execution Replay for Multiprocessor Virtual Machines
George W. Dunlap, Dominic Lucchetti, Michael A. Fetterman, Peter M. Chen


Big ideas
Detection and replay of memory races is possible on commodity hardware
Overhead high for some workloads
…but surprisingly low for other workloads

Execution Replay
[Diagram labels: CPU, Memory, Disk, Network, Keyboard/mouse, Interrupts]

Uses of Execution Replay
Reconstructing state
  – Fault tolerance
Reconstructing execution
  – Debugging
  – Realistic trace generation
Both
  – Intrusion analysis

Single-processor Replay
Basic principles well understood
  – Log all non-deterministic inputs
  – Timing of asynchronous events
Minimal overhead (Dunlap02)
  – 13% worst case
  – Log for months or years
Available commercially
  – VMware: Record/Replay
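The two logging principles on this slide can be sketched as a small record/replay log. This is only an illustration, not the authors' implementation: the `ReplayLog` class, its method names, and the use of a plain instruction count to mark when an asynchronous event arrived are all assumptions made for the example.

```python
from dataclasses import dataclass, field


@dataclass
class ReplayLog:
    """Hypothetical log of everything a deterministic CPU cannot reproduce."""
    entries: list = field(default_factory=list)

    def record_input(self, instr_count, source, value):
        # Non-deterministic input (e.g. network or keyboard data): log the value.
        self.entries.append(("input", instr_count, source, value))

    def record_async(self, instr_count, vector):
        # Asynchronous event (e.g. an interrupt): log only *when* it arrived.
        self.entries.append(("interrupt", instr_count, vector))


def events_due(log, instr_count):
    """During replay, return the logged events to inject at this point."""
    return [e for e in log.entries if e[1] == instr_count]


log = ReplayLog()
log.record_input(100, "network", b"hello")
log.record_async(250, 14)
print(events_due(log, 250))
```

During replay, the same events would be re-injected when execution reaches the same point, so the guest sees an identical sequence of inputs.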

Replay for Multiprocessors
Memory races in multiprocessor VMs
The Ordering Requirement
The CREW Protocol
  – Implementing with page protections
  – Relation to the Ordering Requirement
  – Generating constraints from CREW events
DMA-capable devices and CREW
Performance

The Multiprocessor Challenge
Interleaved reads and writes
  – Fine-grained non-determinism
  – Much more difficult
Existing solutions
  – Hardware modification
  – Software instrumentation
SMP-ReVirt
  – Hardware MMU to detect sharing

Multiprocessor Replay
[Diagram: processors P1 and P2 sharing memory; labels n=3, n=5, if (n<4)]

Ordering Memory Accesses
Preserving order will reproduce execution
  – a→b: “a happens-before b”
  – Ordering is transitive: a→b, b→c means a→c
Two instructions must be ordered if:
  – they both access the same memory, and
  – one of them is a write
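The ordering rule above can be written as a two-line predicate. The `Access` record and the function name are hypothetical and exist only for this sketch.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Access:
    """One memory access (illustrative fields, not from the paper)."""
    cpu: int
    addr: int
    is_write: bool


def must_be_ordered(x, y):
    # Two accesses conflict when they touch the same memory
    # and at least one of them is a write.
    return x.addr == y.addr and (x.is_write or y.is_write)


write_n = Access(cpu=0, addr=0x1000, is_write=True)
read_n = Access(cpu=1, addr=0x1000, is_write=False)
read_again = Access(cpu=0, addr=0x1000, is_write=False)
print(must_be_ordered(write_n, read_n))     # write vs. read, same address
print(must_be_ordered(read_n, read_again))  # two reads never need ordering
```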

Constraints: Enforcing order
To guarantee a→d:
  – a→d
  – b→d
  – a→c
  – b→c
Suppose we need b→c
  – b→c is necessary
  – a→d is redundant
[Diagram: instructions a, b, c, d on processors P1 and P2; label: overconstrained]
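Why a→d becomes redundant once b→c is logged can be checked with a small reachability test over happens-before edges. The sketch assumes, as the slide's diagram appears to show, that P1 executes a then b and P2 executes c then d. The `implied` helper is hypothetical.

```python
def implied(edges, src, dst):
    """True if src happens-before dst follows from edges by transitivity."""
    stack, seen = [src], set()
    while stack:
        node = stack.pop()
        for u, v in edges:
            if u == node and v not in seen:
                if v == dst:
                    return True
                seen.add(v)
                stack.append(v)
    return False


# Assumed program order: P1 runs a, b; P2 runs c, d.
program_order = [("a", "b"), ("c", "d")]

# Logging b→c alone already yields a→b→c→d, so a→d need not be logged.
print(implied(program_order + [("b", "c")], "a", "d"))

# The reverse does not hold: logging only a→d says nothing about b and c.
print(implied(program_order + [("a", "d")], "b", "c"))
```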

CREW Protocol
Each shared object in one of two states:
  – Concurrent-Read: all processors can read, none can write
  – Exclusive-Write: one processor (the owner) can read and write; others have no access

CREW protocol, cont’d
Enforced with hardware MMU
  – Read/write
  – Read-only
  – None
Change CREW states on demand
  – Fault, fixup, re-execute
CREW event
  – Increasing or reducing permission due to CREW state changes
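The on-demand state changes can be modelled as a per-page permission table. This is a simplified sketch under stated assumptions, not the SMP-ReVirt code: the `CrewPage` class is hypothetical, pages start in the concurrent-read state, and a read fault restores read access to every processor at once.

```python
NONE, READ, READ_WRITE = "none", "read-only", "read/write"


class CrewPage:
    """Per-processor permissions for one page (illustrative model only)."""

    def __init__(self, num_cpus):
        # Assumed initial state: concurrent-read, everyone read-only.
        self.perms = {cpu: READ for cpu in range(num_cpus)}

    def _set(self, cpu, new, events):
        old = self.perms[cpu]
        if old != new:
            self.perms[cpu] = new
            events.append((cpu, old, new))  # one CREW event per change

    def access(self, cpu, is_write):
        """Return the CREW events caused by this access (empty if no fault)."""
        events = []
        have = self.perms[cpu]
        if have == READ_WRITE or (have == READ and not is_write):
            return events  # permission already sufficient: no fault
        # Fault: fix up permissions, after which the access is re-executed.
        for other in self.perms:
            if other != cpu:
                # Write fault -> exclusive-write; read fault -> concurrent-read.
                self._set(other, NONE if is_write else READ, events)
        self._set(cpu, READ_WRITE if is_write else READ, events)
        return events


page = CrewPage(num_cpus=2)
no_fault = page.access(0, is_write=False)
write_fault = page.access(1, is_write=True)
read_fault = page.access(0, is_write=False)
print(write_fault)
print(read_fault)
```

In this model the write by processor 1 reduces processor 0 to no access and raises processor 1 to read/write. The later read by processor 0 returns the page to concurrent-read.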

CREW Property
If two instructions on different processors:
  – access the same page,
  – and one of them is a write,
  – there will be a CREW event on each processor between them.

Generating Constraints
State: Concurrent Read
  – All processors read-only
d*: CREW fault
New state: P2 Exclusive
r: privilege reduction
  – Read to None
i: privilege increase
  – Read to Read/write
Log timing of r and i
Constraint:
  – r → i
[Diagram: timelines for P1 and P2 showing a, d, d*, r, and i]
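The r → i rule can be sketched as a function that pairs every privilege reduction in one CREW transition with every privilege increase on a different processor. The `CrewEvent` record, its `instr_count` timing field, and the function name are assumptions for illustration.

```python
from dataclasses import dataclass

RANK = {"none": 0, "read-only": 1, "read/write": 2}


@dataclass(frozen=True)
class CrewEvent:
    """One logged permission change (hypothetical layout)."""
    cpu: str
    instr_count: int  # assumed way of recording the event's timing
    old: str
    new: str


def constraints_for_transition(events):
    """Each reduction r must happen before each increase i on another CPU."""
    reductions = [e for e in events if RANK[e.new] < RANK[e.old]]
    increases = [e for e in events if RANK[e.new] > RANK[e.old]]
    return [(r, i) for r in reductions for i in increases if r.cpu != i.cpu]


# The slide's example: P2 faults on a write to a concurrent-read page.
r = CrewEvent("P1", 1200, "read-only", "none")        # privilege reduction
i = CrewEvent("P2", 3400, "read-only", "read/write")  # privilege increase
print(constraints_for_transition([r, i]))
```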

Direct Memory Access
Device accesses memory directly
Logically another processor
  – Reads and writes need to be ordered
  – IOMMU: can’t fault/fixup/re-execute
Observation: Transaction model
Device: non-preemptible actor

Prototype: SMP-ReVirt
Modified Xen hypervisor
Implement logging, CREW protocol
Details in paper

Evaluation questions
What is the overhead?
What affects performance?
  – In paper
When might I want to use MP?
  – Log with 1, 2, or N CPUs?

Evaluation Workloads
SPLASH2 parallel application suite
  – FMM, LU, ocean, radix, water-spatial, radiosity
Kernel-build
Dbench

Predicting results
Key changes in sharing attributes
  – 4096-byte sharing granularity
  – “Miss” is very expensive
SPLASH2
  – Good: high spatial locality / low false sharing
  – Bad: random access patterns / high false sharing
The Linux kernel
  – Tuned to 16-byte cacheline
  – Involving the kernel may be expensive
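The effect of moving from cache-line to page granularity can be illustrated by checking whether two addresses fall in the same sharing unit. The sizes are the ones quoted on the slide. The helper name and the example addresses are made up for this sketch.

```python
PAGE_SIZE = 4096   # sharing granularity under page-protection CREW
CACHE_LINE = 16    # cache-line size quoted on the slide


def same_unit(addr_a, addr_b, unit):
    """True if both addresses fall in the same aligned block of `unit` bytes."""
    return addr_a // unit == addr_b // unit


# Two unrelated variables 2 KB apart: different cache lines, same page.
x, y = 0x1000, 0x1800
print(same_unit(x, y, CACHE_LINE))  # no sharing seen by the cache
print(same_unit(x, y, PAGE_SIZE))   # falsely shared at page granularity
```

Data laid out to avoid false sharing between cache lines can therefore still collide once the unit of sharing grows to a whole page.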

Single-processor Xen guests

Log Growth Rate
Columns: Workload | Log growth (GB/day) | Days to fill 300 GB
Workloads: FMM, LU, Ocean, Radix, Water-spatial, Kernel-build, Radiosity, Dbench
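The two numeric columns are related by a simple division: days to fill = 300 GB / log growth rate. A minimal sketch follows. The growth rate passed in is a placeholder chosen for the arithmetic, not a measurement from the talk.

```python
def days_to_fill(growth_gb_per_day, capacity_gb=300.0):
    """Days until a log growing at the given rate fills the disk."""
    if growth_gb_per_day <= 0:
        raise ValueError("growth rate must be positive")
    return capacity_gb / growth_gb_per_day


# Placeholder rate for illustration only (not a figure from the slides).
print(days_to_fill(10.0))
```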

2-processor Xen guests

2-processor Xen guests, cont’d

Log Growth Rate
Columns: Workload | Log growth (GB/day) | Days to fill 300 GB
Workloads: FMM, LU, Ocean, Radix, Water-spatial, Kernel-build, Radiosity, Dbench

4-processor Xen guests

Recap
Memory races in multiprocessor VMs
The Ordering Requirement
The CREW Protocol
  – Implementing with page protections
  – Relation to the Ordering Requirement
  – Generating constraints from CREW events
DMA-capable devices and CREW
Performance

Big ideas
Detection and replay of memory races is possible on commodity hardware
Overhead high for some workloads
…but surprisingly low for other workloads

Questions