Pin Tutorial Kim Hazelwood David Kaeli Dan Connors Vijay Janapa Reddi.

Slides:



Advertisements
Similar presentations
SE-292 High Performance Computing Profiling and Performance R. Govindarajan
Advertisements

Profiler In software engineering, profiling ("program profiling", "software profiling") is a form of dynamic program analysis that measures, for example,
Instrumentation of Linux Programs with Pin Robert Cohn & C-K Luk Platform Technology & Architecture Development Enterprise Platform Group Intel Corporation.
Software & Services Group PinPlay: A Framework for Deterministic Replay and Reproducible Analysis of Parallel Programs Harish Patil, Cristiano Pereira,
1 4/20/06 Exploiting Instruction-Level Parallelism with Software Approaches Original by Prof. David A. Patterson.
Integrity & Malware Dan Fleck CS469 Security Engineering Some of the slides are modified with permission from Quan Jia. Coming up: Integrity – Who Cares?
UPC Microarchitectural Techniques to Exploit Repetitive Computations and Values Carlos Molina Clemente LECTURA DE TESIS, (Barcelona,14 de Diciembre de.
- 1 - Copyright © 2006 Intel Corporation. All Rights Reserved. Fault Analysis Using Pin Srilatha (Bobbie) Manne Intel.
Accurately Approximating Superscalar Processor Performance from Traces Kiyeon Lee, Shayne Evans, and Sangyeun Cho Dept. of Computer Science University.
Pin : Building Customized Program Analysis Tools with Dynamic Instrumentation Chi-Keung Luk, Robert Cohn, Robert Muth, Harish Patil, Artur Klauser, Geoff.
Pin PLDI Tutorial Advantages of Pin Instrumentation Easy-to-use Instrumentation: Uses dynamic instrumentation –Do not need source code, recompilation,
SuperPin: Parallelizing Dynamic Instrumentation for Real-Time Performance Steven Wallace and Kim Hazelwood.
Helper Threads via Virtual Multithreading on an experimental Itanium 2 processor platform. Perry H Wang et. Al.
Enabling Efficient On-the-fly Microarchitecture Simulation Thierry Lafage September 2000.
Pin Tutorial Robert Cohn Intel. Pin Tutorial Academia Sinica About Me Robert Cohn –Original author of Pin –Senior Principal Engineer at Intel –Ph.D.
Software & Services Group PinADX: Customizable Debugging with Dynamic Instrumentation Gregory Lueck, Harish Patil, Cristiano Pereira Intel Corporation.
Pipelined Profiling and Analysis on Multi-core Systems Qin Zhao Ioana Cutcutache Weng-Fai Wong PiPA.
Aamer Jaleel Intel® Corporation, VSSAD June 17, 2006
Persistent Code Caching Exploiting Code Reuse Across Executions & Applications † Harvard University ‡ University of Colorado at Boulder § Intel Corporation.
Computer Architecture 2011 – Out-Of-Order Execution 1 Computer Architecture Out-Of-Order Execution Lihu Rappoport and Adi Yoaz.
1 ICS 51 Introductory Computer Organization Fall 2006 updated: Oct. 2, 2006.
Quiz Wei Hsu 8/16/2006. Which of the following instructions are speculative in nature? A)Data cache prefetch instruction B)Non-faulting loads C)Speculative.
Computer Architecture 2011 – out-of-order execution (lec 7) 1 Computer Architecture Out-of-order execution By Dan Tsafrir, 11/4/2011 Presentation based.
Threads 1 CS502 Spring 2006 Threads CS-502 Spring 2006.
San Diego Supercomputer Center Performance Modeling and Characterization Lab PMaC Pin: Building Customized Program Analysis Tools with Dynamic Instrumentation.
Techniques for Efficient Processing in Runahead Execution Engines Onur Mutlu Hyesoon Kim Yale N. Patt.
University of Colorado
Pin2 Tutorial1 Pin Tutorial Kim Hazelwood Robert Muth VSSAD Group, Intel.
Pin PLDI Tutorial Kim Hazelwood David Kaeli Dan Connors Vijay Janapa Reddi.
About Us Kim Hazelwood Vijay Janapa Reddi
Software & Services Group 1 Pin: Intel’s Dynamic Binary Instrumentation Engine Pin Tutorial Intel Corporation Presented By: Tevi Devor CGO ISPASS 2012.
Day 3: Using Process Virtualization Systems Kim Hazelwood ACACES Summer School July 2009.
University of Maryland parseThat: A Robust Arbitrary-Binary Tester for Dyninst Ray Chen.
1 Introduction to SimpleScalar (Based on SimpleScalar Tutorial) CPSC 614 Texas A&M University.
Analyzing parallel programs with Pin Moshe Bach, Mark Charney, Robert Cohn, Elena Demikhovsky, Tevi Devor, Kim Hazelwood, Aamer Jaleel, Chi- Keung Luk,
Process Virtualization and Symbiotic Optimization Kim Hazelwood ACACES Summer School July 2009.
Pin: Intel’s Dynamic Binary Instrumentation Engine Pin Tutorial
Support for Debugging Automatically Parallelized Programs Robert Hood Gabriele Jost CSC/MRJ Technology Solutions NASA.
PMaC Performance Modeling and Characterization Performance Modeling and Analysis with PEBIL Michael Laurenzano, Ananta Tiwari, Laura Carrington Performance.
- 1 - Copyright © 2006 Intel Corporation. All Rights Reserved. Using the Pin Instrumentation Tool for Computer Architecture Research Aamer Jaleel, Chi-Keung.
1 Instrumentation of Intel® Itanium® Linux* Programs with Pin download: Robert Cohn MMDC Intel * Other names and brands.
1 Software Instrumentation and Hardware Profiling for Intel® Itanium® Linux* CGO’04 Tutorial 3/21/04 Robert Cohn, Intel Stéphane Eranian, HP CK Luk, Intel.
Processes and Threads CS550 Operating Systems. Processes and Threads These exist only at execution time They have fast state changes -> in memory and.
Dynamic Compilation and Modification CS 671 April 15, 2008.
Scalable Support for Multithreaded Applications on Dynamic Binary Instrumentation Systems Kim Hazelwood Greg Lueck Robert Cohn.
Source: Operating System Concepts by Silberschatz, Galvin and Gagne.
CS333 Intro to Operating Systems Jonathan Walpole.
CSE 332: C++ debugging Why Debug a Program? When your program crashes –Finding out where it crashed –Examining program memory at that point When a bug.
Day 2: Building Process Virtualization Systems Kim Hazelwood ACACES Summer School July 2009.
Part Two: Optimizing Pintools Robert Cohn Kim Hazelwood.
1 University of Maryland Runtime Program Evolution Jeff Hollingsworth © Copyright 2000, Jeffrey K. Hollingsworth, All Rights Reserved. University of Maryland.
Performance Optimization of Pintools C K Luk Copyright © 2006 Intel Corporation. All Rights Reserved. Reducing Instrumentation Overhead Total Overhead.
1 ROGUE Dynamic Optimization Framework Using Pin Vijay Janapa Reddi PhD. Candidate - Electrical And Computer Engineering University of Colorado at Boulder.
Real-World Pipelines Idea –Divide process into independent stages –Move objects through stages in sequence –At any given times, multiple objects being.
Qin Zhao1, Joon Edward Sim2, WengFai Wong1,2 1SingaporeMIT Alliance 2Department of Computer Science National University of Singapore
PINTOS: An Execution Phase Based Optimization and Simulation Tool) PINTOS: An Execution Phase Based Optimization and Simulation Tool) Wei Hsu, Jinpyo Kim,
Kim Hazelwood Robert Cohn Intel SPI-ST
PinADX: Customizable Debugging with Dynamic Instrumentation
CS399 New Beginnings Jonathan Walpole.
Introduction to SimpleScalar (Based on SimpleScalar Tutorial)
Introduction to SimpleScalar (Based on SimpleScalar Tutorial)
Advantages of Pin Instrumentation
Phase Capture and Prediction with Applications
Hyesoon Kim Onur Mutlu Jared Stark* Yale N. Patt
Henk Corporaal TUEindhoven 2011
When your program crashes
Programming with Shared Memory
Dynamic Hardware Prediction
rePLay: A Hardware Framework for Dynamic Optimization
Dynamic Binary Translators and Instrumenters
Presentation transcript:

Pin Tutorial Kim Hazelwood David Kaeli Dan Connors Vijay Janapa Reddi

Pin Tutorial About Us Kim Hazelwood –Assistant Professor at University of Virginia –Tortola Research Group: HW/SW Collaboration, Virtualization David Kaeli –Full Professor at Northeastern University –NUCAR Research Group: Computer Architecture Dan Connors –Assistant Professor at University of Colorado –DRACO Research Group: Compilers, Instrumentation Vijay Janapa Reddi –Ph.D. Student at Harvard University –VM Optimizations, VM Scalability

Pin Tutorial Agenda I.Pin Intro and Overview II.Fundamental Compiler/Architecture Concepts using Pin III.Advanced Compiler/Architecture Concepts using Pin IV.Exploratory Extensions and Hands-On Workshop

Pin Tutorial What is Instrumentation? A technique that inserts extra code into a program to collect runtime information Instrumentation approaches: Source instrumentation: –Instrument source programs Binary instrumentation: –Instrument executables directly

Pin Tutorial No need to recompile or relink Discover code at runtime Handle dynamically-generated code Attach to running processes Why use Dynamic Instrumentation?

Pin Tutorial How is Instrumentation used in Compiler Research? Program analysis –Code coverage –Call-graph generation –Memory-leak detection –Instruction profiling Thread analysis –Thread profiling –Race detection

Pin Tutorial Trace Generation Branch Predictor and Cache Modeling Fault Tolerance Studies Emulating Speculation Emulating New Instructions How is Instrumentation used in Computer Architecture Research?

Pin Tutorial Advantages of Pin Instrumentation Easy-to-use Instrumentation: Uses dynamic instrumentation –Do not need source code, recompilation, post-linking Programmable Instrumentation: Provides rich APIs to write in C/C++ your own instrumentation tools (called Pintools) Multiplatform: Supports x86, x86-64, Itanium, Xscale Supports Linux, Windows, MacOS Robust: Instruments real-life applications: Database, web browsers, … Instruments multithreaded applications Supports signals Efficient: Applies compiler optimizations on instrumentation code

Pin Tutorial Using Pin Launch and instrument an application $ pin –t pintool –- application Instrumentation engine (provided in the kit) Instrumentation tool (write your own, or use one provided in the kit) Attach to and instrument an application $ pin –t pintool –pid 1234

Pin Tutorial Pin Instrumentation APIs Basic APIs are architecture independent: Provide common functionalities like determining: –Control-flow changes –Memory accesses Architecture-specific APIs e.g., Info about segmentation registers on IA32 Call-based APIs: Instrumentation routines Analysis routines

Pin Tutorial Instrumentation vs. Analysis Concepts borrowed from the ATOM tool: Instrumentation routines define where instrumentation is inserted e.g., before instruction  Occurs first time an instruction is executed Analysis routines define what to do when instrumentation is activated e.g., increment counter  Occurs every time an instruction is executed

Pin Tutorial Pintool 1: Instruction Count sub$0xff, %edx cmp%esi, %edx jle mov$0x1, %edi add$0x10, %eax counter++;

Pin Tutorial Pintool 1: Instruction Count Output $ /bin/ls Makefile imageload.out itrace proccount imageload inscount0 atrace itrace.out $ pin -t inscount0 -- /bin/ls Makefile imageload.out itrace proccount imageload inscount0 atrace itrace.out Count

Pin Tutorial ManualExamples/inscount0.cpp instrumentation routine analysis routine #include #include "pin.h" UINT64 icount = 0; void docount() { icount++; } void Instruction(INS ins, void *v) { INS_InsertCall(ins, IPOINT_BEFORE, (AFUNPTR)docount, IARG_END); } void Fini(INT32 code, void *v) { std::cerr << "Count " << icount << endl; } int main(int argc, char * argv[]) { PIN_Init(argc, argv); INS_AddInstrumentFunction(Instruction, 0); PIN_AddFiniFunction(Fini, 0); PIN_StartProgram(); return 0; } Same source code works on the 4 architectures Pin automatically saves/restores application state

Pin Tutorial Pintool 2: Instruction Trace sub$0xff, %edx cmp%esi, %edx jle mov$0x1, %edi add$0x10, %eax Print(ip); Need to pass ip argument to the analysis routine (printip())

Pin Tutorial Pintool 2: Instruction Trace Output $ pin -t itrace -- /bin/ls Makefile imageload.out itrace proccount imageload inscount0 atrace itrace.out $ head -4 itrace.out 0x40001e90 0x40001e91 0x40001ee4 0x40001ee5

Pin Tutorial ManualExamples/itrace.cpp argument to analysis routine analysis routine instrumentation routine #include #include "pin.H" FILE * trace; void printip(void *ip) { fprintf(trace, "%p\n", ip); } void Instruction(INS ins, void *v) { INS_InsertCall(ins, IPOINT_BEFORE, (AFUNPTR)printip, IARG_INST_PTR, IARG_END); } void Fini(INT32 code, void *v) { fclose(trace); } int main(int argc, char * argv[]) { trace = fopen("itrace.out", "w"); PIN_Init(argc, argv); INS_AddInstrumentFunction(Instruction, 0); PIN_AddFiniFunction(Fini, 0); PIN_StartProgram(); return 0; }

Pin Tutorial Examples of Arguments to Analysis Routine IARG_INST_PTR Instruction pointer (program counter) value IARG_UINT32 An integer value IARG_REG_VALUE Value of the register specified IARG_BRANCH_TARGET_ADDR Target address of the branch instrumented IARG_MEMORY_READ_EA Effective address of a memory read And many more … (refer to the Pin manual for details)

Pin Tutorial Instrumentation Points Instrument points relative to an instruction: Before (IPOINT_BEFORE) After: –Fall-through edge (IPOINT_AFTER) –Taken edge (IPOINT_TAKEN_BRANCH) cmp%esi, %edx jle mov$0x1, %edi : mov $0x8,%edi count()

Pin Tutorial Instruction Basic block –A sequence of instructions terminated at a control-flow changing instruction –Single entry, single exit Trace –A sequence of basic blocks terminated at an unconditional control-flow changing instruction –Single entry, multiple exits Instrumentation Granularity sub$0xff, %edx cmp%esi, %edx jle mov$0x1, %edi add$0x10, %eax jmp 1 Trace, 2 BBs, 6 insts Instrumentation can be done at three different granularities:

Pin Tutorial Recap of Pintool 1: Instruction Count sub$0xff, %edx cmp%esi, %edx jle mov$0x1, %edi add$0x10, %eax counter++; Straightforward, but the counting can be more efficient

Pin Tutorial Pintool 3: Faster Instruction Count sub$0xff, %edx cmp%esi, %edx jle mov$0x1, %edi add$0x10, %eax counter += 3 counter += 2 basic blocks (bbl)

Pin Tutorial ManualExamples/inscount1.cpp #include #include "pin.H“ UINT64 icount = 0; void docount(INT32 c) { icount += c; } void Trace(TRACE trace, void *v) { for (BBL bbl = TRACE_BblHead(trace); BBL_Valid(bbl); bbl = BBL_Next(bbl)) { BBL_InsertCall(bbl, IPOINT_BEFORE, (AFUNPTR)docount, IARG_UINT32, BBL_NumIns(bbl), IARG_END); } void Fini(INT32 code, void *v) { fprintf(stderr, "Count %lld\n", icount); } int main(int argc, char * argv[]) { PIN_Init(argc, argv); TRACE_AddInstrumentFunction(Trace, 0); PIN_AddFiniFunction(Fini, 0); PIN_StartProgram(); return 0; } analysis routine instrumentation routine

Pin Tutorial Modifying Program Behavior Pin allows you not only to observe but also change program behavior Ways to change program behavior: Add/delete instructions Change register values Change memory values Change control flow

Pin Tutorial Instrumentation Library #include #include "pin.H" UINT64 icount = 0; VOID Fini(INT32 code, VOID *v) { std::cerr << "Count " << icount << endl; } VOID docount() { icount++; } VOID Instruction(INS ins, VOID *v) { INS_InsertCall(ins, IPOINT_BEFORE,(AFUNPTR)docount, IARG_END); } int main(int argc, char * argv[]) { PIN_Init(argc, argv); INS_AddInstrumentFunction(Instruction, 0); PIN_AddFiniFunction(Fini, 0); PIN_StartProgram(); return 0; } #include #include "pin.H" #include "instlib.H" INSTLIB::ICOUNT icount; VOID Fini(INT32 code, VOID *v) { cout << "Count" << icount.Count() << endl; } int main(int argc, char * argv[]) { PIN_Init(argc, argv); PIN_AddFiniFunction(Fini, 0); icount.Activate(); PIN_StartProgram(); return 0; } Instruction counting Pin Tool

Pin Tutorial Useful InstLib abstractions ICOUNT –# of instructions executed FILTER –Instrument specific routines or libraries only ALARM –Execution count timer for address, routines, etc. FOLLOW_CHILD –Inject Pin into new process created by parent process TIME_WARP –Preserves RDTSC behavior across executions CONTROL –Limit instrumentation address ranges

Pin Tutorial Useful InstLib ALARM Example

Pin Tutorial Invoke gdb with your pintool (don’t “run”) 2.In another window, start your pintool with the “-pause_tool” flag 3.Go back to gdb window: a) Attach to the process b) “cont” to continue execution; can set breakpoints as usual (gdb) attach (gdb) break main (gdb) cont $ pin –pause_tool 5 –t inscount0 -- /bin/ls Pausing to attach to pid $ gdb inscount0 (gdb) Debugging Pintools

Pin Tutorial Pin Overhead SPEC Integer 2006

Pin Tutorial Adding User Instrumentation

Pin Tutorial Fast exploratory studies Instrumentation ~= native execution Simulation speeds at MIPS Characterize complex applications E.g. Oracle, Java, parallel data-mining apps Simple to build instrumentation tools Tools can feed simulation models in real time Tools can gather instruction traces for later use Instrumentation Driven Simulation

Pin Tutorial Performance Models Branch Predictor Models: PC of conditional instructions Direction Predictor: Taken/not-taken information Target Predictor: PC of target instruction if taken Cache Models: Thread ID (if multi-threaded workload) Memory address Size of memory operation Type of memory operation (Read/Write) Simple Timing Models: Latency information

Pin Tutorial Branch Predictor Model BP Model BPSim Pin Tool Pin Instrumentation RoutinesAnalysis RoutinesInstrumentation Tool API() Branch instr infoAPI data BPSim Pin Tool Instruments all branches Uses API to set up call backs to analysis routines Branch Predictor Model: Detailed branch predictor simulator

Pin Tutorial BranchPredictor myBPU; VOID ProcessBranch(ADDRINT PC, ADDRINT targetPC, bool BrTaken) { BP_Info pred = myBPU.GetPrediction( PC ); if( pred.Taken != BrTaken ) { // Direction Mispredicted } if( pred.predTarget != targetPC ) { // Target Mispredicted } myBPU.Update( PC, BrTaken, targetPC); } VOID Instruction(INS ins, VOID *v) { if( INS_IsDirectBranchOrCall(ins) || INS_HasFallThrough(ins) ) INS_InsertCall(ins, IPOINT_BEFORE, (AFUNPTR) ProcessBranch, ADDRINT, INS_Address(ins), IARG_UINT32, INS_DirectBranchOrCallTargetAddress(ins), IARG_BRANCH_TAKEN, IARG_END); } int main() { PIN_Init(); INS_AddInstrumentationFunction(Instruction, 0); PIN_StartProgram(); } INSTRUMENT BP Implementation ANALYSIS MAIN

Pin Tutorial Branch prediction accuracies range from 0-100% Branches are hard to predict in some phases Can simulate these regions alone by fast forwarding to them in real time Bimodal In McFarling PredictorMcFarling Predictor Bimodal not chosen Branch Predictor Performance - GCC

Pin Tutorial Performance Model Inputs Branch Predictor Models: PC of conditional instructions Direction Predictor: Taken/not-taken information Target Predictor: PC of target instruction if taken Cache Models: Thread ID (if multi-threaded workload) Memory address Size of memory operation Type of memory operation (Read/Write) Simple Timing Models: Latency information

Pin Tutorial Cache Model Cache Pin Tool Pin Instrumentation RoutinesAnalysis RoutinesInstrumentation Tool API() Mem Addr infoAPI data Cache Pin Tool Instruments all instructions that reference memory Use API to set up call backs to analysis routines Cache Model: Detailed cache simulator Cache Simulators

Pin Tutorial CACHE_t CacheHierarchy[MAX_NUM_THREADS][MAX_NUM_LEVELS]; VOID MemRef(int tid, ADDRINT addrStart, int size, int type) { for(addr=addrStart; addr<(addrStart+size); addr+=LINE_SIZE) LookupHierarchy( tid, FIRST_LEVEL_CACHE, addr, type); } VOID LookupHierarchy(int tid, int level, ADDRINT addr, int accessType){ result = cacheHier[tid][cacheLevel]->Lookup(addr, accessType ); if( result == CACHE_MISS ) { if( level == LAST_LEVEL_CACHE ) return; LookupHierarchy(tid, level+1, addr, accessType); } VOID Instruction(INS ins, VOID *v) { if( INS_IsMemoryRead(ins) ) INS_InsertCall(ins, IPOINT_BEFORE, (AFUNPTR) MemRef, IARG_THREAD_ID, IARG_MEMORYREAD_EA, IARG_MEMORYREAD_SIZE, IARG_UINT32, ACCESS_TYPE_LOAD, IARG_END); if( INS_IsMemoryWrite(ins) ) INS_InsertCall(ins, IPOINT_BEFORE, (AFUNPTR) MemRef, IARG_THREAD_ID, IARG_MEMORYWRITE_EA, IARG_MEMORYWRITE_SIZE, IARG_UINT32, ACCESS_TYPE_STORE, IARG_END); } int main() { PIN_Init(); INS_AddInstrumentationFunction(Instruction, 0); PIN_StartProgram(); } INSTRUMENT Cache Implementation ANALYSIS MAIN

Pin Tutorial Performance Models Branch Predictor Models: PC of conditional instructions Direction Predictor: Taken/not-taken information Target Predictor: PC of target instruction if taken Cache Models: Thread ID (if multi-threaded workload) Memory address Size of memory operation Type of memory operation (Read/Write) Simple Timing Models: Latency information

Pin Tutorial Simple Timing Model  = instruction count;  = # branch mispredicts ; A l = # accesses to cache level l ;  = # last level cache (LLC) misses Assume 1-stage pipeline T i cycles for instruction execution Assume branch misprediction penalty T b cycles penalty for branch misprediction Assume cache access & miss penalty T l cycles for demand reference to cache level l T m cycles for demand reference to memory Total cycles = T i + T b + A l T l + T m LLC l = 1

Pin Tutorial cumulative 10 mil phase IPC L1 Miss Rate L2 Miss Rate L3 Miss Rate 2-way 32KB 4-way 256KB 8-way 2MB Several phases of execution Important to pick the correct phase of execution Performance - GCC

Pin Tutorial IPC L1 Miss Rate L2 Miss Rate L3 Miss Rate cumulative 10 mil phase 2-way 32KB 4-way 256KB 8-way 2MB initrepetitive One loop (3 billion instructions) is representative High miss rate at beginning; exploits locality at end Performance – AMMP

Pin Tutorial Knobs- Getting command arguments to your PIN tool Example declarations: KNOB KnobOutputFile(KNOB_MODE_WRITEONCE, "pintool", "o", "dcache.out", "specify dcache file name"); KNOB KnobTrackLoads(KNOB_MODE_WRITEONCE, "pintool", "l", "0", "track individual loads -- increases profiling time"); KNOB KnobThresholdMiss(KNOB_MODE_WRITEONCE, "pintool", "m","100", "only report memops with miss count above threshold"); -m # is the command flag to the pin tool 100 is the default value “only report…” usage of that parm

Pin Tutorial Knobs- Getting command arguments to your PIN tool Example knob use: TrackLoads= KnobTrackLoads.Value(); if( TrackLoads ) { }