Instrumentation of Linux Programs with Pin Robert Cohn & C-K Luk Platform Technology & Architecture Development Enterprise Platform Group Intel Corporation.



ASPLOS’04 Pin Tutorial

People
Kim Hazelwood Cettei, Robert Cohn, Artur Klauser, Geoff Lowney, CK Luk, Robert Muth, Harish Patil, Vijay Janapa Reddi, Steven Wallace

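What is Instrumentation?
User-defined code (a counter or a print statement) inserted into a program:

    max = 0;
    for (p = head; p; p = p->next)
    {
        count[0]++;              /* user defined */
        printf("In Loop\n");     /* user defined */
        if (p->value > max)
        {
            count[1]++;          /* user defined */
            printf("In max\n");  /* user defined */
            max = p->value;
        }
    }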
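As a runnable illustration of the idea on this slide, the sketch below instruments the max-finding loop by hand with two counters. The `Node` type, the `Profile` struct and the `findMax` name are assumptions made for this example; the slide only shows the loop body.

```cpp
#include <cassert>

// Hypothetical list node; the slide only uses p->value and p->next.
struct Node {
    int value;
    Node* next;
};

// Counters added by the instrumentation:
// count[0] = loop iterations, count[1] = times a new maximum was found.
struct Profile {
    long count[2] = {0, 0};
};

// The original loop with the instrumentation inserted by hand.
int findMax(const Node* head, Profile& prof) {
    int max = 0;
    for (const Node* p = head; p; p = p->next) {
        prof.count[0]++;        // instrumentation: in loop
        if (p->value > max) {
            prof.count[1]++;    // instrumentation: in max
            max = p->value;
        }
    }
    return max;
}
```

A tool such as Pin inserts this kind of code into the binary at run time, so the program source does not need to be edited as it is here.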

What can Instrumentation do?
Profiler for compiler optimization:
– Basic-block count
– Value profile
Microarchitectural study:
– Instrument branches to simulate branch predictors
– Generate traces
Bug checking:
– Find references to uninitialized, unallocated data
Software tools that use instrumentation:
– Purify, Valgrind, VTune

Dynamic Instrumentation
Pin uses dynamic instrumentation:
– Instrument code when it is executed the first time
Many advantages over static instrumentation:
– No need for a separate instrumentation pass
– Can instrument all user-level code executed:
    Shared libraries
    Dynamically generated code
– Easy to distinguish code and data
– Instrumentation can be turned on/off
– Can attach and instrument an already running process

Execution-driven Instrumentation
[Figure: a compiler translates blocks of the original code into the code cache. The cache first holds blocks 1’ and 2’, then also 3’, 5’ and 6’.]

Transparent Instrumentation
Pin’s instrumentation is transparent:
– Application itself sees the same:
    Code addresses
    Data addresses
    Memory content
– Instrumentation sees the original application:
    Code addresses
    Data addresses
    Memory content
=> Observe original app. behavior, won’t expose latent bugs

Instruction-level Instrumentation
Instrument relative to an instruction:
– Before
– After:
    Fall-through edge
    Taken edge (if it is a branch)

         cmp  %esi, %edx
         jle
         mov  $0x1, %edi
    :    mov  $0x8, %edi

Inserted calls shown on the slide: count(10), count(30), count(20)
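The three instrumentation points can be modelled in plain C++. This is only a sketch of the concept, not Pin code: `BranchCounts` and `runBranch` are hypothetical names, and the branch condition follows AT&T operand order, where `cmp %esi, %edx` followed by `jle` takes the branch when `%edx <= %esi`.

```cpp
#include <cassert>
#include <cstdint>

// Counters for the three instrumentation points of one conditional branch.
struct BranchCounts {
    std::uint64_t before = 0;       // reached the branch
    std::uint64_t fallThrough = 0;  // branch not taken
    std::uint64_t taken = 0;        // branch taken
};

// Models "cmp %esi, %edx; jle ..." and returns the value left in %edi.
int runBranch(int esi, int edx, BranchCounts& c) {
    c.before++;                 // instrumentation before the branch
    if (edx <= esi) {
        c.taken++;              // instrumentation on the taken edge
        return 0x8;             // mov $0x8, %edi at the branch target
    }
    c.fallThrough++;            // instrumentation on the fall-through edge
    return 0x1;                 // mov $0x1, %edi
}
```

After any number of calls, `before` equals `taken + fallThrough`, which is a quick sanity check for an edge profile.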

Pin Instrumentation APIs
Basic APIs are architecture independent:
– Provide common functionalities such as finding out:
    Control-flow changes
    Memory accesses
Architecture-specific APIs for more detailed info:
– IA-32, EM64T, Itanium, XScale
ATOM-based notion:
– Instrumentation routines
– Analysis routines

Instrumentation Routines
User writes instrumentation routines:
– Walk list of instructions, and
– Insert calls to analysis routines
Pin invokes instrumentation routines when placing new instructions in code cache
Repeated execution uses already instrumented code in code cache

Analysis Routines
User inserts calls to analysis routine:
– User-specified arguments
– E.g., increment counter, record data address, …
User writes in C, C++, ASM
Pin provides isolation so analysis does not affect application
Optimizations like inlining, register allocation, and scheduling make it efficient
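The split between the two kinds of routine can be sketched with a toy model: the instrumentation routine runs once, when an instruction first enters the code cache, while the analysis routines it attaches run on every execution. `ToyJit`, `AnalysisFn` and `InstrumentFn` are hypothetical names for this illustration and do not reflect Pin's real data structures.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <map>
#include <utility>
#include <vector>

using Addr = std::uint64_t;
using AnalysisFn = std::function<void(Addr)>;

// Toy stand-in for a JIT with a code cache.
class ToyJit {
public:
    // The instrumentation routine decides which analysis calls to attach.
    using InstrumentFn = std::function<void(Addr, std::vector<AnalysisFn>&)>;

    explicit ToyJit(InstrumentFn instrument)
        : instrument_(std::move(instrument)) {}

    // "Execute" the instruction at addr.
    void execute(Addr addr) {
        auto it = cache_.find(addr);
        if (it == cache_.end()) {
            // First execution: run the instrumentation routine once and
            // keep the instrumented result in the cache.
            std::vector<AnalysisFn> calls;
            instrument_(addr, calls);
            ++instrumentCalls_;
            it = cache_.emplace(addr, std::move(calls)).first;
        }
        // Every execution: run the attached analysis routines.
        for (const AnalysisFn& fn : it->second) fn(addr);
    }

    std::uint64_t instrumentCalls() const { return instrumentCalls_; }

private:
    InstrumentFn instrument_;
    std::map<Addr, std::vector<AnalysisFn>> cache_;
    std::uint64_t instrumentCalls_ = 0;
};
```

Running a three-instruction loop four times through this model calls the instrumentation routine 3 times and an attached counter 12 times, which is why the cost of instrumentation routines matters far less than the cost of analysis routines.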

Example: Instruction Count

    $ /bin/ls
    Makefile          atrace.o     imageload.out  itrace      proccount
    Makefile.example  imageload    inscount0      itrace.o    proccount.o
    atrace            imageload.o  inscount0.o    itrace.out
    $ pin -t inscount0 -- /bin/ls
    Makefile          atrace.o     imageload.out  itrace      proccount
    Makefile.example  imageload    inscount0      itrace.o    proccount.o
    atrace            imageload.o  inscount0.o    itrace.out
    Count

Example: Instruction Count
counter++ is inserted before every instruction:

    counter++;
    sub  $0xff, %edx
    counter++;
    cmp  %esi, %edx
    counter++;
    jle
    counter++;
    mov  $0x1, %edi
    counter++;
    add  $0x10, %eax

ManualExamples/inscount0.C

    #include <iostream>
    #include "pin.H"

    UINT64 icount = 0;

    // Analysis routine: runs each time an instrumented instruction executes.
    VOID docount() { icount++; }

    // Instrumentation routine: inserts a call to docount before the instruction.
    VOID Instruction(INS ins, VOID *v)
    {
        INS_InsertCall(ins, IPOINT_BEFORE, (AFUNPTR)docount, IARG_END);
    }

    // Called when the application exits.
    VOID Fini(INT32 code, VOID *v)
    {
        std::cerr << "Count " << icount << std::endl;
    }

    int main(int argc, char * argv[])
    {
        PIN_Init(argc, argv);
        INS_AddInstrumentFunction(Instruction, 0);
        PIN_AddFiniFunction(Fini, 0);
        PIN_StartProgram();
        return 0;
    }

Example: Instruction Trace

    $ pin -t itrace -- /bin/ls
    Makefile          atrace.o     imageload.out  itrace      proccount
    Makefile.example  imageload    inscount0      itrace.o    proccount.o
    atrace            imageload.o  inscount0.o    itrace.out
    $ head itrace.out
    0x40001e90
    0x40001e91
    0x40001ee4
    0x40001ee5
    0x40001ee7
    0x40001ee8
    0x40001ee9
    0x40001eea
    0x40001ef0
    0x40001ee0

Example: Instruction Trace
printip(ip) is inserted before every instruction:

    printip(ip);
    sub  $0xff, %edx
    printip(ip);
    cmp  %esi, %edx
    printip(ip);
    jle
    printip(ip);
    mov  $0x1, %edi
    printip(ip);
    add  $0x10, %eax

ManualExamples/itrace.C

    #include <stdio.h>
    #include "pin.H"

    FILE * trace;

    // Analysis routine: prints the instruction pointer it is passed.
    VOID printip(VOID *ip) { fprintf(trace, "%p\n", ip); }

    // Instrumentation routine: IARG_INST_PTR is the argument to the analysis routine.
    VOID Instruction(INS ins, VOID *v)
    {
        INS_InsertCall(ins, IPOINT_BEFORE, (AFUNPTR)printip,
                       IARG_INST_PTR, IARG_END);
    }

    int main(int argc, char * argv[])
    {
        trace = fopen("itrace.out", "w");
        PIN_Init(argc, argv);
        INS_AddInstrumentFunction(Instruction, 0);
        PIN_StartProgram();
        return 0;
    }

Arguments to Analysis Routine
Some examples:
IARG_UINT32
– An integer value
IARG_REG_VALUE
– Value of the register specified
IARG_INST_PTR
– Instruction pointer (program counter) value
IARG_BRANCH_TAKEN
– A non-zero value if the branch instrumented is taken
IARG_BRANCH_TARGET_ADDR
– Target address of the branch instrumented
IARG_G_ARG0_CALLER
– 1st general-purpose function argument, as seen by the caller
IARG_MEMORY_READ_EA
– Effective address of a memory read
IARG_END
– Must be the last in IARG list

Instruction Inspection APIs
Some examples:
INS_IsCall (INS ins)
– True if ins is a call instruction
INS_IsRet (INS ins)
– True if ins is a return instruction
INS_IsAtomicUpdate (INS ins)
– True if ins is an instruction that may do an atomic memory update
INS_IsMemoryRead (INS ins)
– True if ins is a memory read instruction
INS_MemoryReadSize (INS ins)
– Return the number of bytes read from memory by this instruction
INS_Address (INS ins)
– Return the instruction’s IP
INS_Size (INS ins)
– Return the size of the instruction (in bytes)

Example: Faster Instruction Count
Instead of counter++ before every instruction, add the number of instructions once per basic block:

    counter += 3;
    sub  $0xff, %edx
    cmp  %esi, %edx
    jle
    counter += 2;
    mov  $0x1, %edi
    add  $0x10, %eax

ManualExamples/inscount1.C

    #include <stdio.h>
    #include "pin.H"

    UINT64 icount = 0;

    // Analysis routine: add the number of instructions in the basic block.
    VOID docount(INT32 c) { icount += c; }

    // Instrumentation routine: one call per basic block in the trace.
    VOID Trace(TRACE trace, VOID *v)
    {
        for (BBL bbl = TRACE_BblHead(trace); BBL_Valid(bbl); bbl = BBL_Next(bbl))
        {
            BBL_InsertCall(bbl, IPOINT_BEFORE, (AFUNPTR)docount,
                           IARG_UINT32, BBL_NumIns(bbl), IARG_END);
        }
    }

    VOID Fini(INT32 code, VOID *v)
    {
        fprintf(stderr, "Count %llu\n", (unsigned long long)icount);
    }

    int main(int argc, char * argv[])
    {
        PIN_Init(argc, argv);
        TRACE_AddInstrumentFunction(Trace, 0);
        PIN_AddFiniFunction(Fini, 0);
        PIN_StartProgram();
        return 0;
    }
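To see why counting per basic block is cheaper, the sketch below compares the two schemes on a toy execution path. It is plain C++ rather than a Pintool, and `CountResult`, `countPerInstruction` and `countPerBlock` are hypothetical names. Both schemes report the same instruction count, but the per-block scheme makes one analysis call per executed block instead of one per executed instruction.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

struct CountResult {
    std::uint64_t icount = 0;         // instructions counted
    std::uint64_t analysisCalls = 0;  // how often the counting routine ran
};

// bblSizes[i] is the number of instructions in basic block i;
// path is the sequence of basic-block indices that were executed.

// One analysis call per executed instruction (inscount0 style).
CountResult countPerInstruction(const std::vector<std::uint32_t>& bblSizes,
                                const std::vector<std::size_t>& path) {
    CountResult r;
    for (std::size_t b : path) {
        for (std::uint32_t i = 0; i < bblSizes.at(b); ++i) {
            r.icount += 1;
            r.analysisCalls++;
        }
    }
    return r;
}

// One analysis call per executed basic block (inscount1 style).
CountResult countPerBlock(const std::vector<std::uint32_t>& bblSizes,
                          const std::vector<std::size_t>& path) {
    CountResult r;
    for (std::size_t b : path) {
        r.icount += bblSizes.at(b);
        r.analysisCalls++;
    }
    return r;
}
```

With the slide's blocks of 3 and 2 instructions, the saving in analysis calls grows with the average basic-block size.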

Trace
Single-entry, multiple-exit instruction sequence
Create a new trace when a new entry is seen

Program:
         sub  $0x5, %esi
    :    add  $0x3, %ebx
         cmp  %esi, %ebx
         jnz  …

Trace 1:
    sub  $0x5, %esi
    add  $0x3, %ebx
    cmp  %esi, %ebx
    jnz  …

Trace 2:
    add  $0x3, %ebx
    cmp  %esi, %ebx
    jnz  …
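The rule "create a new trace when a new entry is seen" can be sketched as a cache keyed by entry point. This toy model assumes, purely for illustration, that a trace ends at the first branch it meets; `Insn`, `Trace` and `TraceCache` are hypothetical names, not Pin types.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <utility>
#include <vector>

struct Insn {
    std::string text;
    bool isBranch;  // assumption: a trace ends at the first branch
};

using Trace = std::vector<std::string>;

class TraceCache {
public:
    explicit TraceCache(std::vector<Insn> program)
        : program_(std::move(program)) {}

    // Return the trace starting at `entry`, building it if the entry is new.
    const Trace& lookup(std::size_t entry) {
        auto it = traces_.find(entry);
        if (it == traces_.end()) {
            Trace t;
            for (std::size_t i = entry; i < program_.size(); ++i) {
                t.push_back(program_[i].text);
                if (program_[i].isBranch) break;
            }
            it = traces_.emplace(entry, std::move(t)).first;
        }
        return it->second;
    }

    // Number of distinct traces built so far.
    std::size_t size() const { return traces_.size(); }

private:
    std::vector<Insn> program_;
    std::map<std::size_t, Trace> traces_;
};
```

Entering the slide's program at `sub` and later at `add` yields two traces that overlap in their last three instructions, so the same original instruction can appear in more than one trace.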

Instrumentation Granularity
“Just-in-time” instrumentation
– Instrument when code is first executed
– 2 granularities:
    Instruction
    Trace (basic blocks)
“Ahead-of-time” instrumentation
– Instrument entire image when first loaded
– 2 granularities:
    Image (shared library, executable)
    Routine

Image Instrumentation
Example: Reporting images loaded and unloaded

    $ pin -t imageload -- /bin/ls
    _insprofiler.C  imageload    imageload.out  insprofiler.C  proccount.C
    atrace.C        imageload.C  inscount0.C    itrace.C       staticcount.C
    atrace.o        imageload.o  inscount1.C    makefile       strace.C
    $ cat imageload.out
    Loading /bin/ls
    Loading /lib/ld-linux.so.2
    Loading /lib/libtermcap.so.2
    Loading /lib/i686/libc.so.6
    Unloading /bin/ls
    Unloading /lib/ld-linux.so.2
    Unloading /lib/libtermcap.so.2
    Unloading /lib/i686/libc.so.6

ManualExamples/imageload.C

    #include <stdio.h>
    #include "pin.H"

    FILE * trace;

    // Called when an image (executable or shared library) is loaded.
    VOID ImageLoad(IMG img, VOID *v)
    {
        fprintf(trace, "Loading %s\n", IMG_Name(img).c_str());
    }

    // Called when an image is unloaded.
    VOID ImageUnload(IMG img, VOID *v)
    {
        fprintf(trace, "Unloading %s\n", IMG_Name(img).c_str());
    }

    VOID Fini(INT32 code, VOID *v) { fclose(trace); }

    int main(int argc, char * argv[])
    {
        trace = fopen("imageload.out", "w");
        PIN_Init(argc, argv);
        IMG_AddInstrumentFunction(ImageLoad, 0);
        IMG_AddUnloadFunction(ImageUnload, 0);
        PIN_AddFiniFunction(Fini, 0);
        PIN_StartProgram();
        return 0;
    }

Routine Instrumentation
SimpleExamples/malloctrace.C

    VOID Image(IMG img, VOID *v)
    {
        RTN mallocRtn = RTN_FindByName(img, "malloc");
        if (RTN_Valid(mallocRtn))
        {
            RTN_Open(mallocRtn);   // fetch insts in mallocRtn

            // Before malloc's entry: pass the 1st argument to malloc (#bytes wanted)
            RTN_InsertCall(mallocRtn, IPOINT_BEFORE, (AFUNPTR)Arg1Before,
                           IARG_G_ARG0_CALLEE, IARG_END);

            // Before malloc's return: pass the 1st return value (address allocated)
            RTN_InsertCall(mallocRtn, IPOINT_AFTER, (AFUNPTR)MallocAfter,
                           IARG_G_RESULT0, IARG_END);

            RTN_Close(mallocRtn);
        }
    }

Example Pintools
Instruction cache simulation
– Replace itrace’s analysis function
Data cache simulation
– Like I-cache, but instrument loads/stores and pass effective address
Malloc/Free trace
– Instrument entry/exit points
Detect out-of-bound stack references
– Instrument instructions that move stack pointer
– Instrument loads/stores to check in bound
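For the cache-simulation tools, the analysis function would be replaced by something that consumes each address instead of printing it. A minimal sketch of such a consumer is a direct-mapped cache model. `DirectMappedCache` is a hypothetical class for illustration, its geometry is arbitrary, and it is plain C++ with no Pin calls; in a real tool, `access` would be invoked from the analysis routine.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Toy direct-mapped cache model; the geometry is an illustrative assumption.
class DirectMappedCache {
public:
    DirectMappedCache(std::size_t numLines, std::size_t lineBytes)
        : lineBytes_(lineBytes), tags_(numLines, 0), valid_(numLines, false) {
        assert(numLines > 0 && lineBytes > 0);
    }

    // Would be called from the analysis routine with each address.
    void access(std::uint64_t addr) {
        const std::uint64_t line = addr / lineBytes_;
        const std::size_t index = static_cast<std::size_t>(line % tags_.size());
        const std::uint64_t tag = line / tags_.size();
        if (valid_[index] && tags_[index] == tag) {
            ++hits_;
        } else {
            ++misses_;          // fill the line on a miss
            valid_[index] = true;
            tags_[index] = tag;
        }
    }

    std::uint64_t hits() const { return hits_; }
    std::uint64_t misses() const { return misses_; }

private:
    std::size_t lineBytes_;
    std::vector<std::uint64_t> tags_;
    std::vector<bool> valid_;
    std::uint64_t hits_ = 0;
    std::uint64_t misses_ = 0;
};
```

Feeding it instruction addresses gives an I-cache model; feeding it the effective addresses of loads and stores gives a D-cache model.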

Instrumentation Library
Pre-defined C++ classes
Implement common instrumentation tasks:
– Icount
    Instruction counting
– Alarm
    Trigger on an event (instruction count or IP)
– Controller
    Detect start and stop of an interval
– Filter
    Skip instrumentation in parts of the program (e.g., ignoring shared libraries)

Instrumentation Performance
=> Pin’s instrumentation is efficient

Advanced Topics
– Symbol and debug information
– Hooks
– Detach/Attach
– Modifying program behavior
– Debugging Pintools

Symbol/Debug Information
Procedure names:
– RTN_Name()
Shared library names:
– IMG_Name()
File and line number information:
– PIN_FindLineFileByAddress()

Hooks
Pintools can catch:
– Shared library load/unload
    IMG_AddInstrumentFunction()
    IMG_AddUnloadFunction()
– Program end
    PIN_AddFiniFunction()
– System calls
    INS_IsSyscall()
– Thread create/end
    Pin 0 provides callbacks for thread create and destroy
    Yet to be done for Pin 2

Detach/Attach
Detach from Pin and execute original code
– PIN_Detach()
– Restore to full speed after sufficient profiling
Attach Pin to an already running process
– Similar to debugger’s attach
– Command line: “pin -pid <pid> -t inscount0”
– Fast forward to where you want to start profiling

Modify Program Behavior with Instrumentation
Analysis routines modify register values
– IARG_RETURN_REGS
Instrumentation modifies register operands
– add %eax, %ebx => add %eax, %edx
Use virtual registers
– add %eax, %ebx => add %eax, REG_INST_G0
Modify memory
– Pintool in the same address space as the program

Debugging Pintools
1. Invoke gdb with your pintool (but don’t use “run”):

    $ gdb inscount0
    (gdb)

2. In another window, start your pintool with “-pause_tool”:

    $ pin -pause_tool -t inscount0 -- /bin/ls
    Pausing to attach to pid <pid>

3. Go back to gdb:
   a) Attach to the process
   b) Use “cont” to continue execution; can set breakpoints as usual

    (gdb) attach <pid>
    (gdb) break main
    (gdb) cont

Status
Pin 0: Itanium-only release 10/2003
– Used by Intel, HP, Oracle, many universities
Pin 2: released 7/15/2004
– IA-32, EM64T, XScale
– Debian, Suse, Red Hat 7.2, 8.0, 9.0, EL3
– gcc, icc
– Over 1000 downloads!

Future Features
– Instrumentation of multithreaded programs
– Windows port?

Summary
Pin: dynamic instrumentation framework for Linux
– IA-32, EM64T, Itanium, and XScale
– Easy to use, transparent, and efficient
Lots of sample tools
Write your own tool!

Acknowledgments
Prof. Dan Connors for providing the website at University of Colorado

Project Engineering
Automatic nightly testing
– 4 architectures
– 6 Linux versions
– 8 compilers
– 9000 binaries
Automatically generated user manual, internal documentation using Doxygen