Pin : Building Customized Program Analysis Tools with Dynamic Instrumentation Chi-Keung Luk, Robert Cohn, Robert Muth, Harish Patil, Artur Klauser, Geoff.


Pin : Building Customized Program Analysis Tools with Dynamic Instrumentation Chi-Keung Luk, Robert Cohn, Robert Muth, Harish Patil, Artur Klauser, Geoff Lowney, Steven Wallace, Vijay Janapa Reddi, Kim Hazelwood

What is Pin?
- Dynamic binary instrumentation system.
- JIT compiler.
- Pintools for writing instrumentation routines.
- Rich API for pintools.
- Call-based model of instrumentation.

Design goals
- Transparency
  - Application observes same addresses (code/data) and values (register/memory).
- Ease of use
  - Architecture knowledge not required.
  - Manual inlining of instrumentation instructions not required.
  - Manual save/restore of architectural state not required.
- Portability
  - Architecture-independent API for pintools.
- Efficiency
  - Optimized instrumentation.
- Robustness
  - Handle binaries with mixed code and data.
  - Handle variable-length instructions.
- Process attaching/detaching
- Support instrumentation at instruction/basic block/routine levels.

System Overview

Instrumentation with Pin
- Attach to process using ptrace.
- Intercept execution of first/next instruction.
- Loop until the process terminates or Pin detaches from it:
  - Generate new code (a trace) for the straight-line code sequence starting from that instruction.
  - Insert calls to instrumentation routines into the jitted trace.
  - Trace is stored in the code cache and executed.
  - Branch(es) in the trace transfer control back to Pin.
  - Repeat starting with the branch target instruction.
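
The loop above can be sketched on a toy machine. This is a minimal illustration, not Pin's implementation: the instruction set (`add`, `jmp`, `jnz`, `halt`), the accumulator state, and the names `build_trace`, `run`, and `analysis` are all assumptions made for the example. A trace here ends at the first branch of any kind, and the code cache is keyed by entry address only.

```python
from typing import Callable, Dict, List, Tuple

# Toy "binary": address -> (opcode, operand). Hypothetical instruction set.
Program = Dict[int, Tuple[str, int]]
BRANCHES = ("jmp", "jnz", "halt")


def build_trace(program: Program, start: int) -> List[Tuple[int, str, int]]:
    """Collect straight-line instructions from `start` up to the first branch."""
    trace = []
    addr = start
    while True:
        op, arg = program[addr]
        trace.append((addr, op, arg))
        if op in BRANCHES:
            return trace
        addr += 1


def run(program: Program, entry: int,
        analysis: Callable[[int, str], None]) -> Tuple[int, int]:
    """Execute via a code cache; call `analysis` before every instruction."""
    code_cache: Dict[int, List[Tuple[int, str, int]]] = {}
    acc = 0
    traces_built = 0
    pc = entry
    while pc is not None:
        trace = code_cache.get(pc)
        if trace is None:                 # no trace yet: "JIT" one and cache it
            trace = build_trace(program, pc)
            code_cache[pc] = trace
            traces_built += 1
        for addr, op, arg in trace:
            analysis(addr, op)            # the call inserted into the trace
            if op == "add":
                acc += arg
            elif op == "jmp":
                pc = arg
            elif op == "jnz":
                pc = arg if acc != 0 else addr + 1
            elif op == "halt":
                pc = None                 # process terminates
    return acc, traces_built


# Count down from 3 to 0, recording every executed address.
PROGRAM: Program = {0: ("add", 3), 1: ("add", -1), 2: ("jnz", 1), 3: ("halt", 0)}
executed: List[int] = []
final_acc, traces_built = run(PROGRAM, 0, lambda addr, op: executed.append(addr))
```

In this run the loop body at address 1 is jitted once and then reused from the cache on the next iteration, which is the point of storing traces rather than re-translating them.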

Trace code management
- Software-based cache:
  - entryIaddr: original instruction address of trace entry.
  - entrySct: static context of trace.
    - Register bindings.
    - Recent call sites (call stack).
  - Two traces are compatible if they have the same entryIaddr and their entrySct values are either identical or differ only in register bindings.
  - JIT generates a new trace only if no compatible trace exists in the code cache.
- Hash table:
  - Trace entry address.
  - Trace entry liveness information.
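
The compatibility rule can be sketched as a lookup that ignores register bindings when comparing static contexts. This is a hedged illustration: the class names `StaticContext`, `Trace`, and `CodeCache`, the method `lookup_or_jit`, and the tuple encodings are hypothetical, and the reconciliation of differing register bindings is not modelled.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class StaticContext:
    reg_binding: Tuple[Tuple[str, str], ...]  # (virtual reg, physical reg) pairs
    call_sites: Tuple[int, ...]               # recent call sites


@dataclass(frozen=True)
class Trace:
    entry_iaddr: int
    entry_sct: StaticContext


def compatible(a: Trace, b: Trace) -> bool:
    """Same entry address, and contexts equal except possibly for register bindings."""
    return (a.entry_iaddr == b.entry_iaddr
            and a.entry_sct.call_sites == b.entry_sct.call_sites)


class CodeCache:
    def __init__(self) -> None:
        self._by_addr: Dict[int, List[Trace]] = {}
        self.jit_count = 0

    def lookup_or_jit(self, entry_iaddr: int, sct: StaticContext) -> Trace:
        wanted = Trace(entry_iaddr, sct)
        for cached in self._by_addr.get(entry_iaddr, []):
            if compatible(cached, wanted):
                return cached             # reuse; bindings would be reconciled here
        self._by_addr.setdefault(entry_iaddr, []).append(wanted)
        self.jit_count += 1               # stands in for generating a new trace
        return wanted


cache = CodeCache()
first = cache.lookup_or_jit(0x1000, StaticContext((("v0", "eax"),), (0x400,)))
second = cache.lookup_or_jit(0x1000, StaticContext((("v0", "ecx"),), (0x400,)))
third = cache.lookup_or_jit(0x1000, StaticContext((("v0", "eax"),), (0x800,)))
```

The second request differs from the first only in its register binding, so it reuses the cached trace; the third has a different call stack and forces a new one.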

Support for multithreaded applications
- Thread-local storage for virtual register spilling.
- Pin steals a physical register (%ebx, %r7) as a pointer to the spill area.
- Application is assumed to be single-threaded until a thread-create syscall is intercepted.
- Spill area is accessed using absolute addressing for single-threaded applications.

Optimized Instrumentation
- Trace linking
- Register re-allocation
- Inlining
- x86 eflags liveness analysis
- Instruction scheduling

Trace Linking
- Branch directly from trace exit to target trace.
- Trivial for direct branches but difficult for indirect branches.
- Optimization techniques
  - Target prediction.
  - Per-indirect-jump hash table.
  - Function cloning for returns using call stack.
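
One way to picture target prediction backed by a per-jump hash table is the sketch below. It is a simplification under stated assumptions: the class `IndirectJumpSite`, its `resolve` method, the fixed `max_predictions` limit, and the string labels are invented for the example, and a "trace" is just whatever the supplied `jit` callback returns.

```python
from typing import Callable, Dict, List, Tuple


class IndirectJumpSite:
    """State kept for a single indirect jump in a trace (hypothetical model)."""

    def __init__(self, max_predictions: int = 2) -> None:
        self.predictions: List[Tuple[int, str]] = []  # compared in order, fast path
        self.table: Dict[int, str] = {}               # this jump's own hash table
        self.max_predictions = max_predictions

    def resolve(self, target: int, jit: Callable[[int], str]) -> Tuple[str, str]:
        # 1. Compare against the predicted targets.
        for addr, trace in self.predictions:
            if addr == target:
                return trace, "predicted"
        # 2. Fall back to the hash table local to this jump.
        if target in self.table:
            return self.table[target], "hashed"
        # 3. Miss everywhere: hand control back to the JIT and remember the result.
        trace = jit(target)
        if len(self.predictions) < self.max_predictions:
            self.predictions.append((target, trace))
        else:
            self.table[target] = trace
        return trace, "jit"


site = IndirectJumpSite(max_predictions=1)
outcomes = [site.resolve(t, lambda addr: f"trace@{addr}")[1]
            for t in (10, 10, 20, 20, 10)]
```

Only the first sighting of each target goes through the JIT; later jumps to the same target stay inside the code cache.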

Register re-allocation
- Obtain registers for the JIT without overwriting the application's scratch registers.
- Interprocedural register allocation.
- Register liveness analysis.
- Reconciliation of register bindings.
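
Register liveness analysis over a single straight-line trace can be sketched as a standard backward pass. The representation is an assumption made for illustration: each instruction is a `(defs, uses)` pair of register-name sets, `live_out` is whatever is assumed live at the trace exit, and the four-register machine is hypothetical.

```python
from typing import FrozenSet, Iterable, List, Set, Tuple

Instr = Tuple[Set[str], Set[str]]  # (registers written, registers read)


def live_before(trace: List[Instr], live_out: Iterable[str]) -> List[FrozenSet[str]]:
    """Return, for each instruction, the registers live just before it."""
    live = set(live_out)
    result: List[FrozenSet[str]] = []
    for defs, uses in reversed(trace):
        live = (live - defs) | uses   # kill what is written, then add what is read
        result.append(frozenset(live))
    result.reverse()
    return result


ALL_REGS = {"eax", "ebx", "ecx", "edx"}
TRACE: List[Instr] = [
    ({"eax"}, {"ebx"}),   # eax <- ebx
    ({"ecx"}, {"eax"}),   # ecx <- eax
    ({"eax"}, set()),     # eax <- constant
]
liveness = live_before(TRACE, live_out={"eax", "ecx"})
free_before_last = ALL_REGS - liveness[2]
```

Registers that are not live at a point hold no value the application still needs, so code inserted there can use them without a save and restore.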

Other instrumentation optimizations
- Inline analysis routines
  - Avoid call/return to/from bridge routine.
  - Avoid call/return to/from analysis routine.
  - Rename caller-save registers, avoid explicit save/restore.
- x86 eflags liveness analysis
  - Avoid save/restore of dead eflags.
- Pintool API (IPOINT_ANYWHERE)
  - Schedule analysis routine to avoid save/restore of eflags.
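
The eflags check can be sketched as a forward scan from the insertion point. This sketch treats eflags as one unit and describes each instruction only by whether it reads or writes the flags; both are simplifications (real x86 instructions can update only some flags), and the function name and tuple layout are assumptions for the example.

```python
from typing import List, Tuple

FlagInstr = Tuple[str, bool, bool]  # (mnemonic, reads_flags, writes_flags)


def flags_dead_at(trace: List[FlagInstr], index: int) -> bool:
    """True if eflags need not be preserved by code inserted before trace[index]."""
    for _, reads, writes in trace[index:]:
        if reads:
            return False   # a later instruction consumes the current flags
        if writes:
            return True    # flags are overwritten before anyone reads them
    return False           # unknown beyond the trace: conservatively live


FLAG_TRACE: List[FlagInstr] = [
    ("mov", False, False),
    ("add", False, True),
    ("jz", True, False),
]
dead_before_mov = flags_dead_at(FLAG_TRACE, 0)
dead_before_jz = flags_dead_at(FLAG_TRACE, 2)
```

Where the flags are dead, an inserted analysis call that clobbers them needs no save/restore; being free to place the call anywhere lets it land at such a point.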

Experimental Evaluation
- IA32, EM64T, Itanium, and ARM ports.
- Instrumentation optimizations.
- Comparison with Valgrind and DynamoRIO.
  - Performance without instrumentation.
  - Performance with basic block counting instrumentation.

Sample Pintools
- Opcodemix.
  - Determine the dynamic mix of opcodes in an execution.
  - Useful for architectural and compiler comparison studies.
- PinPoints.
  - Automated collection and validation of representative instruction traces.
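
What an opcode-mix tool reports can be illustrated with a small tally over a stream of executed opcodes. This is not the Opcodemix pintool itself: the function `opcode_mix` and the sample stream are hypothetical, and in a real tool the stream would come from an analysis call on every executed instruction.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple


def opcode_mix(executed: Iterable[str]) -> Dict[str, Tuple[int, float]]:
    """Map each opcode to (dynamic count, fraction of all executed instructions)."""
    counts = Counter(executed)
    total = sum(counts.values())
    return {op: (n, n / total) for op, n in counts.most_common()}


stream = ["mov", "add", "mov", "jnz", "mov"]
mix = opcode_mix(stream)
```

Because the counts are dynamic, an instruction inside a hot loop contributes once per execution, which is what makes the mix useful when comparing compilers or architectures.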

Questions and Discussions