Recording Inter-Thread Data Dependencies for Deterministic Replay
Tarun Goyal, Kevin Waugh, Arvind Gopalakrishnan


Debugging Multithreaded Programs
- Debuggers: always helpful
- Aim of discussion
  - Deterministic replay of multiprocessor execution
  - Record nondeterministic events, especially memory races
- Flight Data Recorder (FDR)
- BugNet
- Strata

FDR: Approach
- Deterministic replayers and data race detectors already exist
- FDR also records operating system and I/O activity

FDR: Assumptions
- Sequential consistency
- Directory-based scheme
- Cache size is the same as memory

FDR: Kinds of Logs
- Three kinds of logs, to meet performance, space, and complexity requirements
- Restore a consistent state: logs old memory values on updates (checkpoints and logging)
- Record the outcome of races: assumes SC and records only a subset (implied races are omitted)
- Record system I/O: logs interrupt timing and treats device interfaces as pseudo-processors
- Low time and space overhead, so recording can be continuously enabled
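The "logs old memory on updates" idea can be illustrated with a small software undo log. This is only a sketch under stated assumptions, not FDR's hardware mechanism: the class `UndoLogMemory` and its method names are hypothetical, memory is modeled as a Python dict, and the old value of a location is saved only on its first update after a checkpoint.

```python
class UndoLogMemory:
    """Toy memory that can roll back to the most recent checkpoint."""

    def __init__(self, initial):
        self.mem = dict(initial)
        self.undo = []        # (addr, old_value) pairs since the last checkpoint
        self.logged = set()   # addresses whose old value is already saved

    def checkpoint(self):
        # Start a new interval: earlier old values are no longer needed.
        self.undo.clear()
        self.logged.clear()

    def store(self, addr, value):
        # Save the old value only on the first update in this interval.
        if addr not in self.logged:
            self.undo.append((addr, self.mem.get(addr)))
            self.logged.add(addr)
        self.mem[addr] = value

    def rollback(self):
        # Restore old values in reverse order to recover the checkpoint state.
        for addr, old in reversed(self.undo):
            if old is None:
                self.mem.pop(addr, None)   # location did not exist at the checkpoint
            else:
                self.mem[addr] = old
        self.checkpoint()


m = UndoLogMemory({1: 10})
m.checkpoint()
m.store(1, 20)
m.store(1, 30)   # second update to the same address is not logged again
m.store(2, 5)
entries_logged = len(m.undo)
m.rollback()
```

Logging only the first update per location keeps the log proportional to the number of distinct locations touched in an interval rather than to the number of stores.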

Recording Races
- Nondeterministic thread interleavings, i.e. the outcomes of races, must be logged
- Question: how much must be logged? The answer lies in the memory model, here SC
- Record arcs (ordered pairs of dynamic instructions), but not all of them
- Timestamps of cached blocks are stored; missing timestamps are approximated
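A minimal software sketch of recording race arcs while omitting implied ones is given below. It is a simplification and not the FDR hardware design: the names `RaceRecorder`, `seen`, and `access` are hypothetical, per-thread instruction counts stand in for timestamps, and an arc is skipped only when an earlier arc from the same source thread already orders it through program order (implication through a third thread is not tracked).

```python
class RaceRecorder:
    """Logs arcs (src_tid, src_ic, dst_tid, dst_ic) between conflicting accesses."""

    def __init__(self, n_threads):
        self.ic = [0] * n_threads                     # per-thread instruction counts
        # seen[j][i]: largest instruction count of thread i already ordered before thread j
        self.seen = [[0] * n_threads for _ in range(n_threads)]
        self.last_writer = {}                         # addr -> (tid, ic)
        self.last_readers = {}                        # addr -> {tid: ic}
        self.log = []

    def _order(self, src, dst_tid):
        src_tid, src_ic = src
        if src_tid == dst_tid:
            return                                    # program order, nothing to log
        if self.seen[dst_tid][src_tid] >= src_ic:
            return                                    # implied by an earlier logged arc
        self.seen[dst_tid][src_tid] = src_ic
        self.log.append((src_tid, src_ic, dst_tid, self.ic[dst_tid]))

    def access(self, tid, addr, is_write):
        self.ic[tid] += 1
        writer = self.last_writer.get(addr)
        if writer is not None:
            self._order(writer, tid)                  # read-after-write or write-after-write
        if is_write:
            for r_tid, r_ic in self.last_readers.pop(addr, {}).items():
                self._order((r_tid, r_ic), tid)       # write-after-read
            self.last_writer[addr] = (tid, self.ic[tid])
        else:
            self.last_readers.setdefault(addr, {})[tid] = self.ic[tid]


rec = RaceRecorder(2)
rec.access(0, "x", True)    # thread 0, instruction 1
rec.access(0, "y", True)    # thread 0, instruction 2
rec.access(1, "y", False)   # arc from (0, 2) to (1, 1) is logged
rec.access(1, "x", False)   # arc from (0, 1) is implied and omitted
```

In this run two races occur but only one arc is logged, because the arc on `y` already orders thread 0's earlier write to `x` before thread 1's later read.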

FDR Issues and Optimizations
- Log size: Regulated Transitive Reduction, which judiciously logs strict vector dependencies
- Hardware cost: false races; approximates on LRU in an associative set; 24 KB per core
- Simpler design: take timestamps out of the cache
- TSO model: avoids the replay deadlocks of SC; needs additional information about load values

BugNet: Net the Bug
- Architecture support for deterministic replay debugging
- Focuses on replay of user code and shared libraries
- Builds on and improves the ideas of FDR
- Claims to be viable for use in application software development

Architecture Overview
- Checkpoint-based recording
  - Checkpoint interval snapshots
  - CP buffer (PC + register map)
- Observe the loads done by threads to trace the complete execution
  - Initial register values in a CP
  - The trace of the loads
- Tracking loads works in spite of interrupts, DMA transfers, and other threads writing to shared memory
- Load bits in cache
  - Avoid logging repeated loads, which reduces log size
  - Updated on stores from external events
- FLL and MRB
- Dictionary-based compression for log data
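The load-tracking idea can be sketched in software as follows. This is an illustrative model, not BugNet's hardware: the class `FirstLoadLogger` and its methods are hypothetical, a Python set stands in for the per-block load bits, and the log stores (address, value) pairs for readability rather than any compressed format.

```python
class FirstLoadLogger:
    """Logs a load value only when a replayer could not already know it."""

    def __init__(self, memory):
        self.memory = memory
        self.known = set()   # stand-in for load bits: addresses with a replayable value
        self.fll = []        # logged (addr, value) pairs for the current interval
        self.start = None    # (pc, registers) captured at the checkpoint

    def start_checkpoint(self, pc, regs):
        # A new interval begins: save PC and registers, clear all load bits.
        self.start = (pc, dict(regs))
        self.known.clear()
        self.fll = []

    def load(self, addr):
        value = self.memory[addr]
        if addr not in self.known:
            self.fll.append((addr, value))   # first load since checkpoint or external write
            self.known.add(addr)
        return value

    def store(self, addr, value):
        # The thread's own stores are regenerated during replay, so nothing is logged.
        self.memory[addr] = value
        self.known.add(addr)

    def external_store(self, addr, value):
        # DMA, interrupt handlers, or other threads: the next load must be logged again.
        self.memory[addr] = value
        self.known.discard(addr)


log = FirstLoadLogger({0x10: 7})
log.start_checkpoint(pc=0x400, regs={"r1": 0})
log.load(0x10)
log.load(0x10)                 # repeated load, not logged
log.external_store(0x10, 9)    # e.g. another thread writes the location
log.load(0x10)                 # logged again with the new value
```

Because every value the thread could not have produced itself ends up in the log, replay needs only the checkpoint state and the logged loads, which is why interrupts, DMA, and writes by other threads do not break it.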

FDR vs. BugNet
- FDR
  - Features include tracking I/O, interrupts, and DMA accesses
  - Extra hardware and log size overhead
- BugNet
  - Focuses on application-level software debugging with a simpler scheme
  - Smaller in terms of hardware and log size

Assumptions/Limitations
- Assumes a sequentially consistent memory model
- Won't help in finding bugs caused by interactions with the OS and other system code
- Usability in mainstream systems is questionable
- For debugging user-level applications, is software-based recording more viable?

Strata: Logging Shared Memory Dependencies
- Records memory counts on a dependency
- Hardware/cache-based scheme
- Assumes sequential consistency
- Directory and snoopy cache coherence
- Drop-in replacement for Netzer's scheme
  - Smaller log size
  - Less computation to create the log
  - More complicated replay
- Narayanasamy et al., ASPLOS '06

Strata (cont.)
- Low resource overhead
  - 12% bandwidth on the directory scheme
  - ~0% bandwidth on the snoopy scheme
- Scales linearly with the number of threads
  - Each stratum holds one word per thread
  - Potentially worse than Netzer's scheme
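The stratum idea from the two slides above can be sketched as follows. This is a simplified software model, not the hardware scheme from the paper: the names `StrataRecorder` and `access` are hypothetical, a stratum is stored as the full vector of per-thread memory operation counts, and a new stratum is cut just before any access that conflicts with another thread's access made since the previous stratum.

```python
class StrataRecorder:
    """Logs one vector of per-thread memory-operation counts per stratum."""

    def __init__(self, n_threads):
        self.counts = [0] * n_threads   # memory operations executed per thread
        self.strata = []                # each stratum: one count per thread
        self.writers = {}               # addr -> tid that wrote since the last stratum
        self.readers = {}               # addr -> set of tids that read since the last stratum

    def access(self, tid, addr, is_write):
        writer = self.writers.get(addr)
        conflict = writer is not None and writer != tid
        if is_write and (self.readers.get(addr, set()) - {tid}):
            conflict = True             # write after another thread's read
        if conflict:
            # Cut a stratum before this access so that replay runs everything
            # counted so far ahead of the dependent operation.
            self.strata.append(tuple(self.counts))
            self.writers.clear()
            self.readers.clear()
        self.counts[tid] += 1
        if is_write:
            self.writers[addr] = tid
        else:
            self.readers.setdefault(addr, set()).add(tid)


s = StrataRecorder(2)
s.access(0, "x", True)
s.access(0, "y", True)
s.access(1, "x", False)   # depends on thread 0's write: stratum (2, 0) is cut
s.access(1, "y", False)   # already separated by that stratum, nothing new logged
```

One stratum separates both dependencies here, which illustrates the smaller log. Since each stratum carries one entry per thread, the sketch also shows why the cost grows linearly with the thread count.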

Concerns and Criticisms
- All systems require hardware
  - Significant resource overhead
  - Software would be slower, but still useful
- Consistency models are restrictive
  - Exclude commodity hardware (x86)
- Encourages sloppy programming
- Users != Testers