Efficient and Flexible Architectural Support for Dynamic Monitoring YUANYUAN ZHOU, PIN ZHOU, FENG QIN, WEI LIU, & JOSEP TORRELLAS UIUC.



Outline
– Background
– iWatcher Functionality
– iWatcher Design
– Performance
– Conclusion

Static or Dynamic Monitoring?
Static Monitoring
– Needs annotation, programmer work
– Difficult for unsafe languages (C, C++)
Dynamic Monitoring
– Large instrumentation cost
– Significant slowdown, performance loss
Dynamic monitoring is stronger than static monitoring
– It is based on the actual execution path

Code- or Location-Controlled Dynamic Monitoring?
Code-Controlled Monitoring
– Monitoring performed by special instructions
– Assertions & dynamic checkers belong here
– No hardware support needed
Location-Controlled Monitoring
– Monitoring performed only when the program accesses watched memory locations, by whatever means
– Hardware support is usually required
– iWatcher and hardware-assisted watchpoints belong here

iWatcher Functionality
Flexible and low-overhead dynamic monitoring
With hardware support
– Without expensive exceptions
– The program has its own internal light-weight exception handler, the monitoring function
When a watched memory address is accessed, the monitoring function is automatically executed.

iWatcher Functionality (cont)
If the check performed by the monitoring function fails, then:
– Report: simply report the error (non-interactive)
– Break: raise a hardware exception, switching control to the debugger
– Rollback: revert to a safe checkpoint
More than one monitor may be watching the same address.
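The three reaction modes can be pictured as a small dispatch on the result of the monitoring function. The following is only a sketch: the enum names, the `Action` type, and the `react` function are hypothetical and do not come from the slides, which name the modes but not their encoding.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical encoding of the three reaction modes named on the slide. */
typedef enum { REPORT_MODE, BREAK_MODE, ROLLBACK_MODE } ReactMode;

/* Hypothetical description of what happens after a monitoring function returns. */
typedef enum {
    ACTION_CONTINUE,           /* check passed: resume the program */
    ACTION_LOG_AND_CONTINUE,   /* Report: record the error, keep running */
    ACTION_TRAP_TO_DEBUGGER,   /* Break: raise an exception for the debugger */
    ACTION_RESTORE_CHECKPOINT  /* Rollback: return to a safe checkpoint */
} Action;

/* Map (check result, reaction mode) to the follow-up action. */
Action react(bool check_passed, ReactMode mode) {
    if (check_passed)
        return ACTION_CONTINUE;
    switch (mode) {
    case REPORT_MODE:   return ACTION_LOG_AND_CONTINUE;
    case BREAK_MODE:    return ACTION_TRAP_TO_DEBUGGER;
    case ROLLBACK_MODE: return ACTION_RESTORE_CHECKPOINT;
    }
    assert(0 && "unknown ReactMode");
    return ACTION_CONTINUE;
}
```

The mode only matters when the check fails; a passing check always resumes the program.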

iWatcher – Software Level

    int x, *p;  /* assume invariant: x == 1 */
    iWatcherOn(&x, sizeof(int), READWRITE, BreakMode, &MonitorX, &x, 1);
    ...
    p = foo();     /* a bug: p points to x incorrectly */
    *p = 5;        /* line A: a triggering access */
    z = Array[x];  /* line B: a triggering access */
    ...
    iWatcherOff(&x, sizeof(int), READWRITE, &MonitorX);

    /* Monitoring function: returns true while the invariant holds */
    bool MonitorX(int *x, int value) {
        return (*x == value);
    }

Modest Hardware Support (?)

How to monitor a location?
When iWatcherOn() is called
– Add monitoring function to (software) CheckTable
– If size < LargeRegion → all words are transferred to L2 cache and tagged; update L1 if necessary
– If size > LargeRegion → the entire area is tagged in the Range Watch Table (RWT)
   If RWT full, proceed as if size < LargeRegion
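The placement decision made by iWatcherOn() can be sketched in software as follows. This is a toy model under stated assumptions, not the hardware design: `LARGE_REGION`, `RWT_ENTRIES`, and every identifier below are hypothetical, the CheckTable update is omitted, and per-word cache tagging is not modeled. The slides do not say where a region of exactly LargeRegion bytes goes; this sketch sends it down the small-region path.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define LARGE_REGION 4096u  /* assumed threshold in bytes; the slides give no value */
#define RWT_ENTRIES  4      /* assumed RWT capacity */

typedef struct { uintptr_t start, end; bool valid; } RwtEntry;
static RwtEntry rwt[RWT_ENTRIES];

typedef enum { PLACED_IN_RWT, PLACED_IN_CACHE_TAGS } Placement;

/* Try to claim a free RWT entry for the range [addr, addr + size). */
static bool rwt_alloc(uintptr_t addr, size_t size) {
    for (int i = 0; i < RWT_ENTRIES; i++) {
        if (!rwt[i].valid) {
            rwt[i] = (RwtEntry){ addr, addr + size, true };
            return true;
        }
    }
    return false;  /* RWT full */
}

/* Decide where the watch information for a new region lives. */
Placement place_watch(uintptr_t addr, size_t size) {
    assert(size > 0);
    if (size > LARGE_REGION && rwt_alloc(addr, size))
        return PLACED_IN_RWT;
    /* Small region, or RWT full: fall back to per-word tags in L2/L1. */
    return PLACED_IN_CACHE_TAGS;
}
```

With the assumed capacity of four entries, a fifth large region falls back to per-word cache tags, mirroring the "If RWT full" rule above.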

How to monitor a location? (cont)
If a word is evicted from L2, store the watch bits (if valid) in the Victim WatchFlag Table (VWT)
– If VWT full, O/S support (rare)
When the word is restored, copy the watch bits from VWT
When iWatcherOff() is called:
– Remove monitoring function from CheckTable
– If no monitors are watching this area, update VWT, RWT, L1 and L2 bits as necessary.
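The eviction and refill path through the VWT might look like the sketch below. It rests on several assumptions that are not in the slides: a two-bit flag layout (bit 0 = read-watch, bit 1 = write-watch), a fixed `VWT_ENTRIES` capacity, a linear search instead of an associative lookup, and entries that stay in the table after a refill. All identifiers are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define VWT_ENTRIES 8  /* assumed capacity */

typedef struct { uintptr_t addr; uint8_t watch_bits; bool valid; } VwtEntry;
static VwtEntry vwt[VWT_ENTRIES];

/* On L2 eviction: preserve the watch bits if any are set.
 * Returns false when the VWT is full, the case the slides leave to the OS. */
bool vwt_on_evict(uintptr_t addr, uint8_t watch_bits) {
    assert(watch_bits <= 0x3);  /* assumed layout: bit 0 read, bit 1 write */
    if (watch_bits == 0)
        return true;            /* nothing to preserve */
    int free_slot = -1;
    for (int i = 0; i < VWT_ENTRIES; i++) {
        if (vwt[i].valid && vwt[i].addr == addr) {
            vwt[i].watch_bits = watch_bits;  /* refresh an existing entry */
            return true;
        }
        if (!vwt[i].valid && free_slot < 0)
            free_slot = i;
    }
    if (free_slot < 0)
        return false;
    vwt[free_slot] = (VwtEntry){ addr, watch_bits, true };
    return true;
}

/* On refill: return the saved watch bits, or 0 if the word was not watched. */
uint8_t vwt_on_refill(uintptr_t addr) {
    for (int i = 0; i < VWT_ENTRIES; i++)
        if (vwt[i].valid && vwt[i].addr == addr)
            return vwt[i].watch_bits;
    return 0;
}
```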

How to detect a triggering access?
Out-of-order execution, pipelining → not all instructions will commit
For each Load/Store
– Check if valid entry exists in RWT
– Bring word and WatchFlag from cache (load) or prefetch word to cache and get WatchFlag (store)
– Store the flags in the ReOrder Buffer (ROB)
– Upon retirement of the instruction (if it retires), jump to the monitor if the bits are set.
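Deferring the trigger to retirement is what keeps squashed speculative instructions from firing monitors. A minimal sketch of that decision, assuming a hypothetical ROB entry that carries separate read-watch and write-watch bits (the field names are illustrative, not taken from the slides):

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical ROB entry fields relevant to watch detection. */
typedef struct {
    bool is_mem_access;  /* load or store */
    bool is_store;       /* true for store, false for load */
    bool watch_read;     /* read-watch bit fetched with the word */
    bool watch_write;    /* write-watch bit fetched with the word */
    bool squashed;       /* instruction will never retire */
} RobEntry;

/* Called when an entry reaches the head of the ROB.
 * Returns true if the monitoring function must be invoked. */
bool triggers_at_retire(const RobEntry *e) {
    assert(e != NULL);
    if (e->squashed || !e->is_mem_access)
        return false;
    return e->is_store ? e->watch_write : e->watch_read;
}
```

A load to a location watched only for writes does not trigger, and neither does any access that is squashed before it retires.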

How to Trigger Monitoring Functions?
When a triggering access is detected
– Save processor status and jump to the address held in the Main_Check_Function register
– The main check function scans the CheckTable and serially calls all monitors that:
   watch this address
   for this access mode
– For performance, the Thread-Level Speculation (TLS) mechanism may be used.
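The CheckTable scan can be sketched as a plain loop that calls every matching monitor in turn. This is a software illustration only: the `CheckEntry` layout, the flag encoding, and the names `main_check` and `x_is_one` are assumptions, and saving processor state and TLS are not modeled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

enum { WATCH_READ = 1, WATCH_WRITE = 2 };  /* assumed flag encoding */

typedef bool (*MonitorFn)(void *arg);

/* Hypothetical CheckTable entry: one watched region and one monitor. */
typedef struct {
    uintptr_t start;  /* first watched byte */
    size_t    len;    /* region length in bytes */
    int       flags;  /* WATCH_READ, WATCH_WRITE, or both */
    MonitorFn fn;     /* monitoring function */
    void     *arg;    /* argument passed to fn */
} CheckEntry;

/* Serially run every monitor that covers addr for this access mode.
 * Returns the number of monitors whose check failed. */
int main_check(const CheckEntry *table, size_t n, uintptr_t addr, int access) {
    assert(table != NULL || n == 0);
    int failed = 0;
    for (size_t i = 0; i < n; i++) {
        const CheckEntry *c = &table[i];
        bool covers = addr >= c->start && addr - c->start < c->len;
        if (covers && (c->flags & access) && !c->fn(c->arg))
            failed++;
    }
    return failed;
}

/* Example monitor: the invariant "x == 1". */
bool x_is_one(void *arg) { return *(int *)arg == 1; }
```

Because several entries may cover the same address, the loop does not stop at the first match; each monitor is called in table order.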

Executing Monitoring Functions

Comparison to Other Approaches

Performance Compared to Valgrind
– 4-179% overhead, x less than Valgrind

Performance with/without TLS
– Up to 30% reduction in two cases

Performance varying the fraction of triggering loads and TLS

Performance varying the size of monitoring function and TLS
– Above 4 contexts there is no significant improvement

Conclusion
Some hardware changes
<180% overhead if 20% of loads are monitored
Detects most bugs
– Buffer overflow
– Memory leaks
– Access to non-allocated or non-initialized memory
– …