San Diego Supercomputer Center
Performance Modeling and Characterization Lab (PMaC)

Pin: Building Customized Program Analysis Tools with Dynamic Instrumentation
C.K. Luk, R. Cohn, R. Muth, H. Patil, A. Klauser, G. Lowney, S. Wallace, V.J. Reddi, K. Hazelwood
Presented by: Michael Laurenzano

What is Program Instrumentation?
Inserting extra code into an application to observe its behavior
–Example: Cache Simulation
    for (int i = 0; i < LENGTH; i++) {
        CacheSim(&A[i]);
        A[i] = (double)i;
        CacheSim(&B[i]);
        B[i] = (double)i;
        CacheSim(&C[i]);
        C[i] = (double)i;
    }
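
The slide does not show what CacheSim does internally. As a minimal sketch, assuming a direct-mapped cache with a 64-byte line and 512 lines (both sizes are illustrative, and `DirectMappedCache` and `fill_instrumented` are hypothetical names), the inserted call could simply classify each address as a hit or a miss:

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Illustrative cache geometry (assumptions, not taken from the slides).
constexpr std::size_t kLineBytes = 64;
constexpr std::size_t kNumLines = 512;

// Hypothetical direct-mapped cache model standing in for CacheSim.
struct DirectMappedCache {
    std::vector<std::uint64_t> tags;
    std::vector<bool> valid;
    std::uint64_t hits = 0;
    std::uint64_t misses = 0;

    DirectMappedCache() : tags(kNumLines, 0), valid(kNumLines, false) {}

    // Record one memory access and update the hit/miss counters.
    void access(const void* addr) {
        std::uint64_t line = reinterpret_cast<std::uintptr_t>(addr) / kLineBytes;
        std::size_t index = static_cast<std::size_t>(line % kNumLines);
        std::uint64_t tag = line / kNumLines;
        if (valid[index] && tags[index] == tag) {
            ++hits;
        } else {
            ++misses;            // cold or conflict miss: install the line
            valid[index] = true;
            tags[index] = tag;
        }
    }
};

// The slide's loop for a single array, with the instrumentation call inserted.
inline void fill_instrumented(DirectMappedCache& cache, double* a, std::size_t n) {
    for (std::size_t i = 0; i < n; ++i) {
        cache.access(&a[i]);            // inserted instrumentation
        a[i] = static_cast<double>(i);  // original application work
    }
}
```

The application's own work is unchanged; only the extra call observes it, which is the point of the slide.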

Uses of Program Instrumentation
Code Profiles
–Basic block/Instruction count
–Operation results
Microarchitectural study
–Branch outcomes
–Memory addresses
Bug checking
–Memory leaks
–Uninitialized data

Pin System Layout
[Figure: Pin system layout; component captions follow]
–Application: the code being analyzed
–Pintool: tells us where and how to perform analysis
–JIT compiler: combines application and pintool code to create instrumented code
–Code cache: stores the instrumented code created by the JIT
–Virtual machine (VM): controls execution, maintains data structures, tracks program state

Simplified Instrumentation
Transfer control to VM at an application control transfer
Look for instrumented version of branch target in code cache
–If found: execute instrumented code
–If not: compile the code, insert into code cache, execute new code
Repeat
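
The loop above can be sketched as a toy dispatcher. This is not Pin's implementation: `ToyVM`, `CompiledTrace`, and the `kExit` sentinel are hypothetical names, and a "compiled trace" is modeled as a callable that returns the next branch target.

```cpp
#include <cstdint>
#include <functional>
#include <unordered_map>

using Addr = std::uint64_t;
constexpr Addr kExit = 0;  // hypothetical sentinel: the application has finished

// A compiled trace runs its instrumented code and returns the next branch target.
using CompiledTrace = std::function<Addr()>;

struct ToyVM {
    std::function<CompiledTrace(Addr)> jit;         // stands in for the JIT compiler
    std::unordered_map<Addr, CompiledTrace> cache;  // stands in for the code cache
    std::uint64_t compiles = 0;
    std::uint64_t dispatches = 0;

    // Run from `pc` until a trace returns kExit.
    void run(Addr pc) {
        while (pc != kExit) {
            ++dispatches;                  // control is back in the VM
            auto it = cache.find(pc);      // look for an instrumented version
            if (it == cache.end()) {       // not found: compile and insert
                it = cache.emplace(pc, jit(pc)).first;
                ++compiles;
            }
            pc = it->second();             // execute; result is the next target
        }
    }
};
```

In this model every control transfer returns to the VM, so a loop body compiled once is still dispatched on each iteration.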

Trace Linking
Transfer control directly between traces
–Branch target must be known statically
–Target trace must be present in code cache
[Figure: Sequence 1 to Sequence 2 under Regular Execution; Trace 1 to Trace 2 through the Virtual Machine under Pin w/o Trace Linking; Trace 1 directly to Trace 2 under Pin w/ Trace Linking]
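
A minimal sketch of the two linking conditions, assuming each trace has a single exit and that a target of 0 means "not known statically" (the `Trace` struct and both functions are hypothetical, not Pin's data structures):

```cpp
#include <cstdint>
#include <unordered_map>

using Addr = std::uint64_t;

struct Trace {
    Addr static_target = 0;  // 0 means the exit target is not known statically
    Trace* link = nullptr;   // set once the exit is patched to jump to the target trace
};

// Patch every unlinked exit whose target is known statically and already
// present in the code cache. Returns the number of exits patched.
inline int link_traces(std::unordered_map<Addr, Trace>& cache) {
    int patched = 0;
    for (auto& entry : cache) {
        Trace& t = entry.second;
        if (t.link != nullptr || t.static_target == 0) continue;
        auto it = cache.find(t.static_target);
        if (it != cache.end()) {
            t.link = &it->second;  // element addresses are stable in unordered_map
            ++patched;
        }
    }
    return patched;
}

// Count how many traces run back to back before control must return to the
// VM (or before `max_traces` is reached, to bound linked loops).
inline int run_linked(const Trace* t, int max_traces) {
    int executed = 0;
    while (t != nullptr && executed < max_traces) {
        ++executed;
        t = t->link;
    }
    return executed;
}
```

An exit stays unlinked until its target trace has been compiled, so linking is repeated as the code cache fills.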

Trace Linking (Indirect)
"Unknown" targets are usually somewhat predictable
–Function typically returns to a few locations (few call sites)
–Indirect jump usually goes to a few locations
Try several predicted targets to see if we can avoid VM intervention
–Short target lists are maintained for each indirect branch
–If we exhaust this list, use the VM
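
The prediction list can be sketched as follows. The list length of 4 and the policy of appending a newly seen target after a VM fallback are assumptions made for illustration; `IndirectBranch` is a hypothetical name.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

using Addr = std::uint64_t;
constexpr std::size_t kMaxPredictions = 4;  // assumed length of the target list

// Per-branch state for one indirect branch (hypothetical model).
struct IndirectBranch {
    std::vector<Addr> predicted;     // short list of previously seen targets
    std::uint64_t chain_hits = 0;    // resolved without the VM
    std::uint64_t vm_fallbacks = 0;  // list exhausted, VM had to look it up

    // Returns true if the actual target matched a prediction.
    bool resolve(Addr actual) {
        for (Addr p : predicted) {   // compare against each predicted target in turn
            if (p == actual) {
                ++chain_hits;
                return true;
            }
        }
        ++vm_fallbacks;              // exhausted the list: use the VM
        if (predicted.size() < kMaxPredictions) {
            predicted.push_back(actual);  // remember this target for next time
        }
        return false;
    }
};
```

A return that always goes to one of a few call sites falls back to the VM once per call site and is resolved from the list afterwards.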

Function Cloning
Most common indirect control transfer is a function return
Create a function instance for each call site
–Return address is then unique and known for each function instance
–Turns this indirect control transfer into a direct control transfer
–Code bloat!
Implemented by keeping a call stack for each instrumented instruction sequence
–Keep the last 4 entries in the call stack
–Call stack represented as a 64-bit integer
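
The slides do not say how four call-stack entries fit into a 64-bit integer. One possible encoding, assuming each call site is given a 16-bit ID (an assumption, as are the names `push_call_site` and `CloneTable`), is a shift register where the oldest entry falls off the top:

```cpp
#include <cstdint>
#include <map>
#include <utility>

using Addr = std::uint64_t;
using CallContext = std::uint64_t;  // four 16-bit call-site IDs, newest in the low bits

// Push a call-site ID; the oldest of the four entries is shifted out.
constexpr CallContext push_call_site(CallContext ctx, std::uint16_t site_id) {
    return (ctx << 16) | site_id;
}

// Most recent call site, which determines where the clone returns to.
constexpr std::uint16_t top_call_site(CallContext ctx) {
    return static_cast<std::uint16_t>(ctx & 0xFFFF);
}

// One clone per (function address, call context) pair.
struct CloneTable {
    std::map<std::pair<Addr, CallContext>, int> clones;  // value: clone number

    int clone_for(Addr fn, CallContext ctx) {
        auto key = std::make_pair(fn, ctx);
        auto it = clones.find(key);
        if (it != clones.end()) return it->second;      // reuse the existing clone
        int id = static_cast<int>(clones.size());
        clones.emplace(key, id);                        // new context: new clone
        return id;
    }
};
```

The size of `clones` grows with the number of distinct contexts, which is the code bloat the slide warns about.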

Register Bindings
Register re-allocation occurs so that Pin can use registers
–The register bindings can be different from one trace to the next
When compiling, keep register bindings from the previous trace if possible
When linking traces, modify the register bindings before going to the next trace
–Usually only a few registers are mismatched in practice
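
A sketch of finding the mismatches at a link, assuming 8 application registers and a binding that maps each one to a physical register number or to memory (`Binding`, `Fixup`, and `reconcile` are hypothetical names). It only lists what must change; it does not order the moves or handle cycles between registers.

```cpp
#include <array>
#include <cstddef>
#include <vector>

constexpr std::size_t kNumVirtualRegs = 8;  // assumed number of application registers
constexpr int kInMemory = -1;               // value is spilled rather than in a register

// binding[v] is the physical register holding application register v, or kInMemory.
using Binding = std::array<int, kNumVirtualRegs>;

struct Fixup {
    std::size_t virt;  // application register whose location differs
    int from;          // location at the exit of the source trace
    int to;            // location expected at the entry of the target trace
};

// List the bindings that must be changed before jumping to the next trace.
inline std::vector<Fixup> reconcile(const Binding& exit_binding,
                                    const Binding& entry_binding) {
    std::vector<Fixup> fixups;
    for (std::size_t v = 0; v < kNumVirtualRegs; ++v) {
        if (exit_binding[v] != entry_binding[v]) {
            fixups.push_back({v, exit_binding[v], entry_binding[v]});
        }
    }
    return fixups;
}
```

When the compiler reuses the previous trace's bindings, the list is empty and the link needs no compensation code.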

Optimization – Inlined Analysis Routines
Without inlining: Application -> Bridge Routine -> Analysis Routine -> Bridge Routine -> Application
With inlining: Application, Bridge Code, Analysis Code, Bridge Code, Application (one instruction stream)
–2 fewer calls and 2 fewer returns
–Other optimizations: constant folding, code relocation

Optimization – eflags Register Liveness
The x86 eflags register is treated as a bit-vector containing state information
–This register can be modified as a side effect of some instructions
eflags might not be live when we reach the analysis routine
–If this is the case, we do not need to save/restore it
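
A simplified liveness check, assuming each instruction is summarized by two flags (the `Instr` model is hypothetical, and real x86 instructions may write only some flag bits, which this sketch ignores). Scanning forward from the instrumentation point, eflags is dead if it is fully overwritten before anything reads it:

```cpp
#include <cstddef>
#include <vector>

// Hypothetical per-instruction summary of eflags behavior.
struct Instr {
    bool reads_flags;       // e.g. a conditional jump
    bool writes_all_flags;  // e.g. a compare (partial writes are not modeled)
};

// Returns true if eflags may be live just before trace[point], meaning an
// analysis call inserted there must save and restore it.
inline bool flags_live_before(const std::vector<Instr>& trace, std::size_t point) {
    for (std::size_t i = point; i < trace.size(); ++i) {
        if (trace[i].reads_flags) return true;        // read before overwrite: live
        if (trace[i].writes_all_flags) return false;  // overwritten first: dead
    }
    return true;  // end of trace reached without a decision: assume live
}
```

Checking the read before the write matters for instructions that do both, since they consume the old value first.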

Optimization – Call Scheduling
User can specify that the routine be put anywhere in the particular scope
–Anywhere in instruction, basic block, function, program, etc.
Pin can then schedule the call where it performs best
–Perhaps at a point where few registers need to be saved
–How well will this actually work?
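
One way to read "a point where few registers need to be saved" is as a minimum search over candidate positions. The sketch below assumes the number of live registers at each position in the scope is already known; `cheapest_call_site` is a hypothetical helper, and the slides do not describe Pin's actual heuristic.

```cpp
#include <cstddef>
#include <vector>

// live_regs[i] is the number of registers that would have to be saved if the
// analysis call were placed before instruction i of the scope.
// Precondition: live_regs is not empty.
inline std::size_t cheapest_call_site(const std::vector<int>& live_regs) {
    std::size_t best = 0;
    for (std::size_t i = 1; i < live_regs.size(); ++i) {
        if (live_regs[i] < live_regs[best]) {
            best = i;  // strictly cheaper; ties keep the earliest position
        }
    }
    return best;
}
```

This only works for analysis routines whose result does not depend on where in the scope they run, such as a per-block counter.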

Basic Pin Overhead
[Figure]

Effectiveness of Optimizations
[Figure]

Questions?