R2: An application-level kernel for record and replay. Z. Guo, X. Wang, J. Tang, X. Liu, Z. Xu, M. Wu, M. F. Kaashoek, Z. Zhang (MSR Asia, Tsinghua, MIT)

Presentation transcript:

R2: An application-level kernel for record and replay
Z. Guo, X. Wang, J. Tang, X. Liu, Z. Xu, M. Wu, M. F. Kaashoek, Z. Zhang (MSR Asia, Tsinghua, MIT), OSDI’08
Shimin Chen, LBA Reading Group

What is R2?
- Library-based record and replay: intercept calls, record them in a log, replay from the log
- Novel features:
  - Lets users (application developers) decide at which interface to record and replay
  - A set of annotations for the interface calls
- Implementation: Windows; supports the Win32, MPI, and SQLite APIs

Outline Introduction Design overview Execution orders Annotations for optimization Implementation Evaluation Summary

Choosing an Interface for Record & Replay
- Must choose a “cut” in the call graph
- Above the interface: executed during record and during replay
- Below the interface: executed during record; replayed from the log
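
The cut described above can be illustrated with a minimal Python sketch. The `Interposer` class and its log format are hypothetical and are not R2's API; the sketch only shows that code above the interface runs in both modes, while the call below it runs once and is then served from the log.

```python
import time


class Interposer:
    """Wraps functions below the interface (hypothetical helper, not R2's API)."""

    def __init__(self, mode, log=None):
        assert mode in ("record", "replay")
        self.mode = mode
        self.log = [] if log is None else list(log)
        self.cursor = 0  # next log entry to consume during replay

    def wrap(self, func):
        def stub(*args):
            if self.mode == "record":
                result = func(*args)  # below the interface: really executed
                self.log.append((func.__name__, result))
                return result
            _name, result = self.log[self.cursor]  # replay: served from the log
            self.cursor += 1
            return result
        return stub


def app(now):
    # Above the interface: this code runs during record and during replay.
    return f"started at {now()}"


recorder = Interposer("record")
recorded_output = app(recorder.wrap(time.time))

replayer = Interposer("replay", recorder.log)
replayed_output = app(replayer.wrap(time.time))
```

Because `time.time` is never executed during replay, the replayed run produces the same output as the recorded one.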

Isolation Rule
RULE 1 (ISOLATION): All instances of unrecorded reads and writes to a variable should be either below or above the interposed interface.
- Isolates variables above the interface from variables below the interface
- Can hold for Windows. For example, as long as R2 intercepts the complete set of file functions, file descriptors can be recorded

Non-Determinism Rule
Any source of non-determinism should be below the interposed interface.
Sources of non-determinism:
1. Calls that receive external data
2. Shared-memory inter-process communication
3. Variables shared by multiple threads
- R2 can handle 1
- For 2 and 3, the developer must choose a higher-level interface that hides the effects (e.g., lock and unlock for spinlocks)

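Terminology
- Interposed interface: the boundary at which R2 intercepts calls. Code above it runs in replay space; code below it runs in system space.
- R2 syscalls are calls from replay space down into system space; R2 upcalls are calls from system space up into replay space (e.g., callbacks)
- R2 records the output of R2 syscalls, the input of R2 upcalls, and their ordering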

Execution Control
- R2 tracks the state of every thread with a replay/system mode bit
  - The mode bit is updated when crossing the interface
  - Recording is skipped for R2 syscalls made from R2 system space
- When a user invokes R2 with an application:
  1. R2 starts in system space
  2. The application's “main” is treated as an upcall (recorded, entering replay space)
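
A rough Python sketch of the mode bit follows. The decorators `syscall` and `upcall`, the thread-local flag, and the log tuples are all illustrative assumptions rather than R2's actual stubs; the point is only that a nested syscall issued from system space is not logged.

```python
import threading

_state = threading.local()  # per-thread mode bit; True means "system space"
log = []


def in_system_space():
    return getattr(_state, "system", True)  # every thread starts in system space


def syscall(func):
    """Stub for a call from replay space down into system space."""
    def stub(*args):
        if in_system_space():
            return func(*args)  # issued from system space: not recorded
        _state.system = True  # crossing the interface downward
        try:
            result = func(*args)
        finally:
            _state.system = False  # back in replay space
        log.append(("syscall", func.__name__, result))
        return result
    return stub


def upcall(func):
    """Stub for a call from system space up into replay space (e.g. main)."""
    def stub(*args):
        previous = in_system_space()
        log.append(("upcall", func.__name__, args))
        _state.system = False
        try:
            return func(*args)
        finally:
            _state.system = previous
    return stub


@syscall
def inner():
    return 1


@syscall
def outer():
    return inner() + 1  # inner runs while in system space, so it is not logged


@upcall
def main():
    return outer()


result = main()
```

Only the `main` upcall and the outermost `outer` syscall appear in `log`.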

Memory Management
R2 ensures the following in the replay space:
- malloc/free return the same addresses: the replay space uses a dedicated memory pool
- Stack locations are the same: the replay space uses a separate stack per thread, and the system space uses different stacks
- R2 syscalls that return memory buffers, e.g., getcwd(NULL, 0), return them at the same locations: the returned buffer is copied to space allocated from the replay pool
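
Why a dedicated pool yields identical addresses can be sketched in Python with a toy first-fit allocator. `ReplayPool`, its base address, and the `run` helper are hypothetical; the real R2 allocator is not described in the slides. Allocations made by system space bypass the pool, so they cannot shift the addresses seen in replay space.

```python
class ReplayPool:
    """Toy first-fit allocator over a private address range (illustrative only)."""

    def __init__(self, base=0x10000000, size=1 << 20):
        self.free = [(base, size)]  # sorted list of (address, length) holes
        self.live = {}  # address -> length of live blocks

    def malloc(self, n):
        for i, (addr, length) in enumerate(self.free):
            if length >= n:
                rest = (addr + n, length - n)
                self.free[i:i + 1] = [rest] if rest[1] else []
                self.live[addr] = n
                return addr
        raise MemoryError("replay pool exhausted")

    def free_block(self, addr):
        self.free.append((addr, self.live.pop(addr)))
        self.free.sort()  # no coalescing, to keep the sketch short


def run(system_noise):
    pool = ReplayPool()
    # System-space allocations do not go through the replay pool.
    _junk = [bytearray(k) for k in system_noise]
    a = pool.malloc(64)
    b = pool.malloc(128)
    pool.free_block(a)
    c = pool.malloc(32)
    return [a, b, c]


record_addrs = run([10, 2000])  # record run with extra system-space activity
replay_addrs = run([])          # replay run without it
```

Both runs see the same sequence of pool addresses, regardless of what system space allocated in between.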

Annotation and Code Generation
- Developers annotate interface calls; R2 can then automatically generate stub code for record and replay
- Example annotations: direction (in/out) and buffer size, bsize(return)
  - The annotated buffer will be recorded
- This example is simple. If a C++ object is to be recorded, serialization and deserialization should be provided via operator overloading on streams
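
What a generated stub might do for an out buffer whose size is given by the return value can be sketched as follows. This is a Python approximation under stated assumptions: `make_stub`, `fake_recv`, and the log layout are invented for illustration, and R2 itself generates C/C++ stubs from annotated prototypes.

```python
def make_stub(func, out_param, mode, log):
    """Build a record/replay stub for a call that fills an out buffer.

    Mirrors an annotation of the form "out, bsize(return)": the first
    `return value` bytes of the buffer are what gets logged.
    """
    def stub(*args):
        buf = args[out_param]
        if mode == "record":
            n = func(*args)  # run the real call
            log.append((n, bytes(buf[:n])))  # log return value and buffer
            return n
        n, data = log.pop(0)  # replay: do not call func at all
        buf[:n] = data
        return n
    return stub


def fake_recv(sock, buf):
    # Stand-in for a call that receives external data (hypothetical).
    payload = b"hello"
    buf[:len(payload)] = payload
    return len(payload)


def must_not_run(sock, buf):
    raise AssertionError("system-space call executed during replay")


log = []
rec_buf = bytearray(16)
rec_n = make_stub(fake_recv, 1, "record", log)(None, rec_buf)

rep_buf = bytearray(16)
rep_n = make_stub(must_not_run, 1, "replay", log)(None, rep_buf)
```

During replay the buffer contents and return value come entirely from the log.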

Annotation for Asynchronous Operation
- Example: ReadFileEx starts an asynchronous file read, completion is delivered through a callback, and lpOverlapped is the key that identifies the call
- prepare indicates that ReadFileEx issues an asynchronous I/O request keyed by lpOverlapped
- commit indicates that the request keyed by lpOverlapped has completed and that the transferred data size is cbTransferred
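
The prepare/commit pairing can be sketched in Python. `AsyncTracker`, `read_file_ex`, and `completion_callback` are hypothetical stand-ins, and a string replaces the lpOverlapped pointer as the key; the sketch only shows that the buffer is registered at issue time and logged at completion time, when the transferred size is known.

```python
class AsyncTracker:
    """Pairs an asynchronous request with its completion by key (illustrative)."""

    def __init__(self):
        self.pending = {}  # key -> buffer registered by prepare
        self.log = []

    def prepare(self, key, buf):
        self.pending[key] = buf

    def commit(self, key, transferred):
        buf = self.pending.pop(key)
        self.log.append((key, bytes(buf[:transferred])))


tracker = AsyncTracker()


def read_file_ex(key, buf):
    tracker.prepare(key, buf)  # request issued; the data is not available yet
    return True


def completion_callback(key, buf, data):
    buf[:len(data)] = data  # stands in for the OS filling the buffer
    tracker.commit(key, len(data))  # the transferred size is known only now


buf = bytearray(8)
read_file_ex("overlapped-1", buf)
logged_before = list(tracker.log)
completion_callback("overlapped-1", buf, b"abc")
```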

How to track execution orders?
Tracking causality:
- R2 syscall to R2 upcall causality: callback (see the previous example)
- R2 syscall to R2 syscall causality: sync(key)

Recording Event Order (Lamport Clock)
- Thread t’s clock is c(t); event e’s clock is c(e)
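
The slide's update rule is not in the transcript, so the Python sketch below uses the textbook Lamport clock rule as an assumption: an event that causally depends on an earlier event first raises its thread's clock to at least that event's clock, then increments it. The `Thread` class and `record_event` helper are illustrative names.

```python
class Thread:
    def __init__(self, name):
        self.name = name
        self.clock = 0  # c(t)


def record_event(thread, log, after=None):
    """Stamp a new event on `thread`.

    `after` is the clock c(e') of an event this one causally depends on,
    for example the unlock that a lock with the same sync key must follow.
    """
    if after is not None:
        thread.clock = max(thread.clock, after)
    thread.clock += 1
    log.append((thread.clock, thread.name))  # c(e) = c(t)
    return thread.clock


log = []
t1, t2 = Thread("t1"), Thread("t2")
record_event(t1, log)
record_event(t1, log)
unlock = record_event(t1, log)              # third event on t1
lock = record_event(t2, log, after=unlock)  # first event on t2, after the unlock
```

The dependent event always receives a larger timestamp than the event it follows, even though it is the first event on its own thread.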

Replaying Event Order
- Total-order recording + total-order replaying: use a token to serialize execution
- Causal-order recording + total-order replaying: before replay, generate a total order from the recorded causal order
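
One way to picture the second option is sketched below in Python. It assumes events were stamped with Lamport-style clocks, derives a total order by sorting on (clock, thread name), and then uses a shared turn counter as the token. The tie-break and the condition-variable token are assumptions, not details from the slides.

```python
import threading


def total_order_from_clocks(events):
    # Sorting by (clock, thread) respects causality; ties are broken arbitrarily.
    return sorted(events, key=lambda e: (e[0], e[1]))


def replay(events):
    order = total_order_from_clocks(events)
    by_thread = {}
    for ev in order:
        by_thread.setdefault(ev[1], []).append(ev)
    executed, turn, cond = [], [0], threading.Condition()

    def worker(my_events):
        for ev in my_events:
            with cond:
                # Wait until the token says this event is next in the total order.
                cond.wait_for(lambda: order[turn[0]] == ev)
                executed.append(ev[2])
                turn[0] += 1
                cond.notify_all()

    threads = [threading.Thread(target=worker, args=(evs,))
               for evs in by_thread.values()]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return executed


# (clock, thread, action) tuples as they might appear in a causal-order log.
recorded = [(2, "t2", "lock"), (1, "t1", "unlock"),
            (2, "t1", "write"), (3, "t2", "read")]
replayed = replay(recorded)
```

Every replay of this log executes the actions in the same serialized order.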

Outline Introduction Design overview Data transfers Execution orders Annotations for optimization Implementation Evaluation Summary

Reducing Log Size for Frequent Calls
Some calls return the same value most of the time (e.g., GetLastError on Windows returns 0 in most cases)
- “cache” annotation
- R2 caches the last return value
- R2 skips recording the return value for subsequent calls until it changes
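
The effect of the cache annotation on log size can be sketched in Python. Indexing log entries by call number so that replay knows when a change occurs is an assumption of this sketch; the slides do not say how R2 encodes it.

```python
class CachedCall:
    """Logs a return value only when it differs from the previous one."""

    def __init__(self):
        self.log = []  # (call_index, value) entries, written only on change
        self.last = None
        self.calls = 0

    def record(self, value):
        if self.calls == 0 or value != self.last:
            self.log.append((self.calls, value))
            self.last = value
        self.calls += 1
        return value


def replay_values(log, n_calls):
    changes = dict(log)
    out, current = [], None
    for i in range(n_calls):
        current = changes.get(i, current)  # reuse the cached value if unchanged
        out.append(current)
    return out


observed = [0, 0, 0, 5, 0, 0]  # e.g. a sequence of error codes, mostly 0
cache = CachedCall()
for v in observed:
    cache.record(v)
replayed = replay_values(cache.log, len(observed))
```

Six calls produce three log entries here, and replay still returns the original sequence.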

Reproduce Annotation
Some data can be reproduced at replay time without being recorded
- For example, file data read from the local disk
- Such calls can be annotated with “reproduce”
- R2 will then execute the call during replay
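
A minimal Python sketch of the idea: a stub marked as reproducible re-executes the call during replay instead of reading its result from the log. The helper name `make_reproducible_stub` and the temporary file are illustrative assumptions; the sketch relies on the file being unchanged between record and replay.

```python
import os
import tempfile


def make_reproducible_stub(func, mode, log, reproduce=False):
    def stub(*args):
        if reproduce:
            return func(*args)  # executed in both modes; nothing is logged
        if mode == "record":
            result = func(*args)
            log.append(result)
            return result
        return log.pop(0)  # ordinary replay: serve the logged result
    return stub


def read_file(path):
    with open(path, "rb") as f:
        return f.read()


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "piece.bin")
    with open(path, "wb") as f:
        f.write(b"x" * 1024)
    log = []
    recorded = make_reproducible_stub(read_file, "record", log, reproduce=True)(path)
    replayed = make_reproducible_stub(read_file, "replay", log, reproduce=True)(path)
```

The log stays empty, which is where the log-size saving comes from.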

All the Annotations

Outline Introduction Design overview Data transfers Execution orders Defining your own syscalls Annotations for optimization Implementation Evaluation Summary

Detecting Unrecorded Non-Determinism
- R2 records each R2 syscall's signature (e.g., its name) and checks it during replay
- On a mismatch, R2 detects the divergence and reports it
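
The check can be pictured with a short Python sketch. `ReplayMismatch` and `replay_call` are hypothetical names, and the signature here is just the call name, as in the slide's example.

```python
class ReplayMismatch(Exception):
    """Raised when replay issues a different call than the one recorded."""


def replay_call(log, cursor, name):
    recorded_name, result = log[cursor]
    if recorded_name != name:
        raise ReplayMismatch(
            f"event {cursor}: recorded {recorded_name!r}, replay issued {name!r}"
        )
    return result


log = [("open", 3), ("read", 128)]
first = replay_call(log, 0, "open")  # matches the recording

try:
    replay_call(log, 1, "write")  # the replayed run diverged
    mismatch = None
except ReplayMismatch as exc:
    mismatch = str(exc)
```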

Questions to be Answered
- How much effort is required to annotate the syscall/upcall interface?
- How important are annotations to successful replay of applications?
- How much does R2 slow down applications during recording?
- How effective are custom syscall layers and annotations (cache and reproduce) in reducing log size and optimizing performance?
Replay is not evaluated:
- “However, the replayed application without any debugging interaction runs much faster than when recording (e.g., a replay run of BitTorrent file downloading is 13x faster).”

Experimental Setup
All machines:
- 2.0 GHz Xeon dual-core CPU
- 4 GB memory
- two 250 GB, 7200 rpm disks
- running Windows Server 2003 Service Pack 2
- interconnected via a 1 Gbps switch
Unless explicitly specified:
- the application data and R2 log files are kept on the same disk
- total-order recording & execution
- all optimizations (i.e., cache and reproduce) are turned off

Annotation Effort
The paper says:
- Win32 syscall interface (500+ calls): one person-week
- MPI and SQLite: two person-days each

Performance without optimization
- Apache is configured with 250 threads
- ApacheBench mimics 50 concurrent clients downloading 64 KB web pages
- Each configuration executes 500,000 requests

Customized R2 Syscall Layers
- Query: compute vertex degrees in a social network: SELECT COUNT(*) FROM edge GROUP BY src_uid;
- The data set is about 3 MB
- FILE / MEM selects where SQLite stores temporary data

Cache Annotation for Apache
- Profiling shows that 5 R2 syscalls account for more than 50% of syscalls
- Using the cache annotation reduces the log size from 21.99 MB to 18.1 MB

Reproduced File I/O (BitTorrent)
- One machine seeds a 4 GB file; its upload bandwidth is limited to 8 MB/s
- 10 machines download the file concurrently
- The reproduce annotation reduces the average log size from 17.1 GB to 5.4 GB

Reproduced Network I/O
- GE and PU are two MPI benchmarks
- MPI functions are annotated with the reproduce annotation so that messages are not recorded but reproduced during replay

Summary
- Library-based record and replay in software
- Annotations and automatic generation of stub code for record and replay
- Impressively, supports many Win32 applications
- But cannot handle unrecorded non-determinism, e.g., data races in the replay space