Instrumentation of Linux Programs with Pin
Robert Cohn & C-K Luk
Platform Technology & Architecture Development
Enterprise Platform Group
Intel Corporation

ASPLOS’04 Pin Tutorial

People
Kim Hazelwood Cettei, Robert Cohn, Artur Klauser, Geoff Lowney, CK Luk, Robert Muth, Harish Patil, Vijay Janapa Reddi, Steven Wallace

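What is Instrumentation?

    max = 0;
    for (p = head; p; p = p->next) {
        count[0]++;             // inserted instrumentation (user defined)
        printf("In Loop\n");    // inserted instrumentation (user defined)
        if (p->value > max) {
            count[1]++;         // inserted instrumentation (user defined)
            printf("In max\n"); // inserted instrumentation (user defined)
            max = p->value;
        }
    }
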
What can Instrumentation do?
Profiler for compiler optimization:
  – Basic-block count
  – Value profile
Microarchitectural study:
  – Instrument branches to simulate branch predictors
  – Generate traces
Bug checking:
  – Find references to uninitialized, unallocated data
Software tools that use instrumentation:
  – Purify, Valgrind, VTune

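To make "instrument branches to simulate branch predictors" concrete, here is a minimal sketch of the kind of model an analysis routine could feed with one (branch address, taken) pair per executed branch. The class `TwoBitPredictor` and its methods are hypothetical names for illustration; nothing here is part of Pin, and the 2-bit saturating counter is just one common textbook scheme.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical 2-bit saturating-counter branch predictor (not part of Pin).
// Counter values 0..1 predict "not taken", 2..3 predict "taken".
class TwoBitPredictor {
public:
    explicit TwoBitPredictor(std::size_t entries)
        : table_(entries, 1) {  // every counter starts at 1 ("weakly not taken")
        assert(entries > 0);
    }

    // Feed one executed branch. Returns true if the prediction was correct.
    bool Update(std::uint64_t branchAddr, bool taken) {
        std::uint8_t &ctr = table_[branchAddr % table_.size()];
        const bool predictedTaken = ctr >= 2;
        if (taken && ctr < 3) ++ctr;    // saturate at 3
        if (!taken && ctr > 0) --ctr;   // saturate at 0
        ++branches_;
        if (predictedTaken != taken) ++mispredicts_;
        return predictedTaken == taken;
    }

    std::uint64_t Branches() const { return branches_; }
    std::uint64_t Mispredicts() const { return mispredicts_; }

private:
    std::vector<std::uint8_t> table_;
    std::uint64_t branches_ = 0;
    std::uint64_t mispredicts_ = 0;
};
```

In a real tool, the instrumentation side would presumably insert a call to something like `Update` on every branch, passing the instruction pointer and the taken flag; that wiring is left out here because it depends on the Pin version in use.
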
Dynamic Instrumentation
Pin uses dynamic instrumentation
  – Instruments code when it is executed for the first time
Many advantages over static instrumentation:
  – No need for a separate instrumentation pass
  – Can instrument all user-level code executed
      Shared libraries
      Dynamically generated code
  – Easy to distinguish code and data
  – Instrumentation can be turned on/off
  – Can attach to and instrument an already running process

Execution-driven Instrumentation
[Figure: original code, compiler, and code cache; the code cache holds compiled blocks 1’ and 2’]

Execution-driven Instrumentation (continued)
[Figure: original code, compiler, and code cache; the code cache holds compiled blocks 1’, 2’, 3’, 5’ and 6’]

Transparent Instrumentation
Pin’s instrumentation is transparent:
  – The application itself sees the same:
      Code addresses
      Data addresses
      Memory content
  – Instrumentation sees the original application:
      Code addresses
      Data addresses
      Memory content
Observes the original application behavior; instrumentation will not expose latent bugs

Instruction-level Instrumentation
Instrument relative to an instruction:
  – Before
  – After:
      Fall-through edge
      Taken edge (if it is a branch)

    cmp %esi, %edx
    jle
    mov $0x1, %edi
    :
    mov $0x8, %edi

Inserted calls shown on the slide: count(10), count(20), count(30)

Pin Instrumentation APIs
Basic APIs are architecture independent:
  – Provide common functionality, such as finding out:
      Control-flow changes
      Memory accesses
Architecture-specific APIs for more detailed information
  – IA-32, EM64T, Itanium, XScale
ATOM-based notion:
  – Instrumentation routines
  – Analysis routines

Instrumentation Routines
User writes instrumentation routines:
  – Walk list of instructions, and
  – Insert calls to analysis routines
Pin invokes instrumentation routines when placing new instructions in the code cache
Repeated execution uses already instrumented code in the code cache

Analysis Routines
User inserts calls to analysis routines:
  – User-specified arguments
  – E.g., increment counter, record data address, …
User writes them in C, C++, or assembly
Pin provides isolation so analysis does not affect the application
Optimizations such as inlining, register allocation, and scheduling make it efficient

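The split between instrumentation routines (run once, when code enters the code cache) and analysis routines (run on every execution) can be sketched with a toy model. `ToyCodeCache`, `Addr`, and `AnalysisFn` are hypothetical names; this is an illustration of the idea under simplifying assumptions (one cache entry per instruction address), not how Pin is implemented.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <unordered_map>
#include <utility>
#include <vector>

using Addr = std::uint64_t;
using AnalysisFn = std::function<void(Addr)>;

// Toy model (hypothetical): the first execution of an address invokes the
// instrumentation routine and caches its result; later executions reuse it.
class ToyCodeCache {
public:
    // The instrumentation routine decides which analysis calls to attach
    // to the instruction at a given address.
    explicit ToyCodeCache(std::function<std::vector<AnalysisFn>(Addr)> instrument)
        : instrument_(std::move(instrument)) {}

    // "Execute" the instruction at ip.
    void Execute(Addr ip) {
        auto it = cache_.find(ip);
        if (it == cache_.end()) {
            // Not yet in the cache: instrument once and remember the result.
            ++instrumentCalls_;
            it = cache_.emplace(ip, instrument_(ip)).first;
        }
        // Analysis routines run on every execution.
        for (const AnalysisFn &fn : it->second) fn(ip);
    }

    std::uint64_t InstrumentCalls() const { return instrumentCalls_; }

private:
    std::function<std::vector<AnalysisFn>(Addr)> instrument_;
    std::unordered_map<Addr, std::vector<AnalysisFn>> cache_;
    std::uint64_t instrumentCalls_ = 0;
};
```

Executing a three-instruction loop 100 times in this model calls the instrumentation routine 3 times and the analysis routine 300 times, which is why the cost of instrumentation itself is amortized while analysis routines need to be cheap.
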
Example: Instruction Count

    $ /bin/ls
    Makefile          atrace.o     imageload.out  itrace      proccount
    Makefile.example  imageload    inscount0      itrace.o    proccount.o
    atrace            imageload.o  inscount0.o    itrace.out
    $ pin -t inscount0 -- /bin/ls
    Makefile          atrace.o     imageload.out  itrace      proccount
    Makefile.example  imageload    inscount0      itrace.o    proccount.o
    atrace            imageload.o  inscount0.o    itrace.out
    Count

Example: Instruction Count
counter++; is inserted before every instruction:

    sub $0xff, %edx
    cmp %esi, %edx
    jle
    mov $0x1, %edi
    add $0x10, %eax

ManualExamples/inscount0.C

    #include <iostream>
    #include "pin.H"

    UINT64 icount = 0;

    // Analysis routine: runs before every executed instruction
    VOID docount() { icount++; }

    // Instrumentation routine: inserts the call to the analysis routine
    VOID Instruction(INS ins, VOID *v) {
        INS_InsertCall(ins, IPOINT_BEFORE, (AFUNPTR)docount, IARG_END);
    }

    // Called when the application exits
    VOID Fini(INT32 code, VOID *v) {
        std::cerr << "Count " << icount << std::endl;
    }

    int main(int argc, char * argv[]) {
        PIN_Init(argc, argv);
        INS_AddInstrumentFunction(Instruction, 0);
        PIN_AddFiniFunction(Fini, 0);
        PIN_StartProgram();
        return 0;
    }

Example: Instruction Trace

    $ pin -t itrace -- /bin/ls
    Makefile          atrace.o     imageload.out  itrace      proccount
    Makefile.example  imageload    inscount0      itrace.o    proccount.o
    atrace            imageload.o  inscount0.o    itrace.out
    $ head itrace.out
    0x40001e90
    0x40001e91
    0x40001ee4
    0x40001ee5
    0x40001ee7
    0x40001ee8
    0x40001ee9
    0x40001eea
    0x40001ef0
    0x40001ee0

Example: Instruction Trace
printip(ip); is inserted before every instruction:

    sub $0xff, %edx
    cmp %esi, %edx
    jle
    mov $0x1, %edi
    add $0x10, %eax

ManualExamples/itrace.C

    #include <stdio.h>
    #include "pin.H"

    FILE * trace;

    // Analysis routine: print the instruction pointer
    VOID printip(VOID *ip) { fprintf(trace, "%p\n", ip); }

    // Instrumentation routine
    VOID Instruction(INS ins, VOID *v) {
        INS_InsertCall(ins, IPOINT_BEFORE, (AFUNPTR)printip,
                       IARG_INST_PTR,  // argument passed to the analysis routine
                       IARG_END);
    }

    int main(int argc, char * argv[]) {
        trace = fopen("itrace.out", "w");
        PIN_Init(argc, argv);
        INS_AddInstrumentFunction(Instruction, 0);
        PIN_StartProgram();
        return 0;
    }

Arguments to Analysis Routine
Some examples:
IARG_UINT32
  – An integer value
IARG_REG_VALUE
  – Value of the register specified
IARG_INST_PTR
  – Instruction pointer (program counter) value
IARG_BRANCH_TAKEN
  – A non-zero value if the instrumented branch is taken
IARG_BRANCH_TARGET_ADDR
  – Target address of the instrumented branch
IARG_G_ARG0_CALLER
  – 1st general-purpose function argument, as seen by the caller
IARG_MEMORY_READ_EA
  – Effective address of a memory read
IARG_END
  – Must be the last in the IARG list

Instruction Inspection APIs
Some examples:
INS_IsCall (INS ins)
  – True if ins is a call instruction
INS_IsRet (INS ins)
  – True if ins is a return instruction
INS_IsAtomicUpdate (INS ins)
  – True if ins is an instruction that may do an atomic memory update
INS_IsMemoryRead (INS ins)
  – True if ins is a memory read instruction
INS_MemoryReadSize (INS ins)
  – Return the number of bytes read from memory by this instruction
INS_Address (INS ins)
  – Return the instruction’s IP
INS_Size (INS ins)
  – Return the size of the instruction (in bytes)

Example: Faster Instruction Count
Instead of counter++; before every instruction, update the counter once per basic block:

    counter += 3
    sub $0xff, %edx
    cmp %esi, %edx
    jle
    counter += 2
    mov $0x1, %edi
    add $0x10, %eax

ManualExamples/inscount1.C

    #include <stdio.h>
    #include "pin.H"

    UINT64 icount = 0;

    // Analysis routine: add the number of instructions in the basic block
    VOID docount(UINT32 c) { icount += c; }

    // Instrumentation routine: one call per basic block in the trace
    VOID Trace(TRACE trace, VOID *v) {
        for (BBL bbl = TRACE_BblHead(trace); BBL_Valid(bbl); bbl = BBL_Next(bbl)) {
            BBL_InsertCall(bbl, IPOINT_BEFORE, (AFUNPTR)docount,
                           IARG_UINT32, BBL_NumIns(bbl), IARG_END);
        }
    }

    VOID Fini(INT32 code, VOID *v) {
        fprintf(stderr, "Count %llu\n", (unsigned long long)icount);
    }

    int main(int argc, char * argv[]) {
        PIN_Init(argc, argv);
        TRACE_AddInstrumentFunction(Trace, 0);
        PIN_AddFiniFunction(Fini, 0);
        PIN_StartProgram();
        return 0;
    }

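Why per-block counting is faster can be checked with a small stand-alone model. The sketch below is not Pin code: `CountResult`, `CountPerInstruction`, and `CountPerBlock` are hypothetical helpers that take the sizes of the basic blocks a program executed and report both the instruction total and how many analysis calls were needed to obtain it.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

struct CountResult {
    std::uint64_t instructions;   // total instructions counted
    std::uint64_t analysisCalls;  // number of times an analysis routine ran
};

// Each element of executedBlocks is the number of instructions in one
// executed basic block, in execution order.

// Model of inscount0: one analysis call (counter++) per instruction.
CountResult CountPerInstruction(const std::vector<std::uint32_t> &executedBlocks) {
    CountResult r{0, 0};
    for (std::uint32_t numIns : executedBlocks) {
        for (std::uint32_t i = 0; i < numIns; ++i) {
            r.instructions += 1;
            r.analysisCalls += 1;
        }
    }
    return r;
}

// Model of inscount1: one analysis call (counter += numIns) per basic block.
CountResult CountPerBlock(const std::vector<std::uint32_t> &executedBlocks) {
    CountResult r{0, 0};
    for (std::uint32_t numIns : executedBlocks) {
        r.instructions += numIns;
        r.analysisCalls += 1;
    }
    return r;
}
```

Both functions produce the same instruction total; only the number of analysis calls differs. Note that this model assumes every instruction in an entered block executes, which is also the assumption the per-block scheme makes.
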
Trace
Single-entry, multiple-exit instruction sequence
Create a new trace when a new entry is seen

Program:
    sub $0x5, %esi
    : add $0x3, %ebx
    cmp %esi, %ebx
    jnz …

Trace 1:
    sub $0x5, %esi
    add $0x3, %ebx
    cmp %esi, %ebx
    jnz …

Trace 2:
    add $0x3, %ebx
    cmp %esi, %ebx
    jnz …

Instrumentation Granularity
“Just-in-time” instrumentation
  – Instrument when code is first executed
  – 2 granularities:
      Instruction
      Trace (basic blocks)
“Ahead-of-time” instrumentation
  – Instrument entire image when first loaded
  – 2 granularities:
      Image (shared library, executable)
      Routine

Image Instrumentation
Example: Reporting images loaded and unloaded

    $ pin -t imageload -- /bin/ls
    _insprofiler.C  imageload    imageload.out  insprofiler.C  proccount.C
    atrace.C        imageload.C  inscount0.C    itrace.C       staticcount.C
    atrace.o        imageload.o  inscount1.C    makefile       strace.C
    $ cat imageload.out
    Loading /bin/ls
    Loading /lib/ld-linux.so.2
    Loading /lib/libtermcap.so.2
    Loading /lib/i686/libc.so.6
    Unloading /bin/ls
    Unloading /lib/ld-linux.so.2
    Unloading /lib/libtermcap.so.2
    Unloading /lib/i686/libc.so.6

ManualExamples/imageload.C

    #include <stdio.h>
    #include "pin.H"

    FILE * trace;

    // Called when an image is loaded
    VOID ImageLoad(IMG img, VOID *v) {
        fprintf(trace, "Loading %s\n", IMG_Name(img).c_str());
    }

    // Called when an image is unloaded
    VOID ImageUnload(IMG img, VOID *v) {
        fprintf(trace, "Unloading %s\n", IMG_Name(img).c_str());
    }

    VOID Fini(INT32 code, VOID *v) { fclose(trace); }

    int main(int argc, char * argv[]) {
        trace = fopen("imageload.out", "w");
        PIN_Init(argc, argv);
        IMG_AddInstrumentFunction(ImageLoad, 0);
        IMG_AddUnloadFunction(ImageUnload, 0);
        PIN_AddFiniFunction(Fini, 0);
        PIN_StartProgram();
        return 0;
    }

Routine Instrumentation
SimpleExamples/malloctrace.C

    VOID Image(IMG img, VOID *v) {
        RTN mallocRtn = RTN_FindByName(img, "malloc");
        if (RTN_Valid(mallocRtn)) {
            RTN_Open(mallocRtn);  // fetch insts in mallocRtn
            // Before malloc's entry: pass the 1st argument to malloc (#bytes wanted)
            RTN_InsertCall(mallocRtn, IPOINT_BEFORE, (AFUNPTR)Arg1Before,
                           IARG_G_ARG0_CALLEE, IARG_END);
            // Before malloc's return: pass the 1st return value (address allocated)
            RTN_InsertCall(mallocRtn, IPOINT_AFTER, (AFUNPTR)MallocAfter,
                           IARG_G_RESULT0, IARG_END);
            RTN_Close(mallocRtn);
        }
    }

Example Pintools
Instruction cache simulation
  – Replace itrace’s analysis function
Data cache simulation
  – Like I-cache, but instrument loads/stores and pass the effective address
Malloc/Free trace
  – Instrument entry/exit points
Detect out-of-bound stack references
  – Instrument instructions that move the stack pointer
  – Instrument loads/stores to check that they are in bounds

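As an illustration of what the analysis function of a cache-simulation tool might do with each address it receives, here is a minimal direct-mapped cache model. `DirectMappedCache` is a hypothetical class written for this sketch; the direct-mapped organization and the parameters are assumptions chosen for brevity, and the Pin side that would deliver instruction or effective addresses is not shown.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical direct-mapped cache model (not part of Pin).
class DirectMappedCache {
public:
    DirectMappedCache(std::size_t numLines, std::size_t lineBytes)
        : lineBytes_(lineBytes), valid_(numLines, false), tags_(numLines, 0) {
        assert(numLines > 0 && lineBytes > 0);
    }

    // Simulate one access. Returns true on a hit, false on a miss.
    bool Access(std::uint64_t addr) {
        const std::uint64_t line = addr / lineBytes_;            // cache-line number
        const std::size_t index =
            static_cast<std::size_t>(line % tags_.size());       // which slot
        const std::uint64_t tag = line / tags_.size();           // identifies the line in that slot
        if (valid_[index] && tags_[index] == tag) {
            ++hits_;
            return true;
        }
        // Miss: fill the slot, evicting whatever was there.
        valid_[index] = true;
        tags_[index] = tag;
        ++misses_;
        return false;
    }

    std::uint64_t Hits() const { return hits_; }
    std::uint64_t Misses() const { return misses_; }

private:
    std::size_t lineBytes_;
    std::vector<bool> valid_;
    std::vector<std::uint64_t> tags_;
    std::uint64_t hits_ = 0;
    std::uint64_t misses_ = 0;
};
```

For instruction cache simulation, each traced instruction pointer would be passed to `Access`; for data cache simulation, the effective address of each load or store would be passed instead.
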
Instrumentation Library
Pre-defined C++ classes
Implement common instrumentation tasks:
  – Icount
      Instruction counting
  – Alarm
      Trigger on an event (instruction count or IP)
  – Controller
      Detect start and stop of an interval
  – Filter
      Skip instrumentation in parts of the program (e.g., ignoring shared libraries)

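The idea behind an alarm that triggers on an instruction count can be sketched in a few lines. `IcountAlarm` below is a hypothetical stand-alone class for illustration only; it does not reproduce the interface of the library's Alarm class, which is not shown on the slide.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <utility>

// Hypothetical instruction-count alarm (not the library's actual class).
// The handler fires once, the first time the count reaches the threshold.
class IcountAlarm {
public:
    IcountAlarm(std::uint64_t threshold, std::function<void()> handler)
        : threshold_(threshold), handler_(std::move(handler)) {}

    // Called by an analysis routine with the number of instructions executed
    // since the previous call (1 per instruction, or a basic block's size).
    void Advance(std::uint64_t n) {
        count_ += n;
        if (!fired_ && count_ >= threshold_) {
            fired_ = true;
            handler_();
        }
    }

    bool Fired() const { return fired_; }
    std::uint64_t Count() const { return count_; }

private:
    std::uint64_t threshold_;
    std::function<void()> handler_;
    std::uint64_t count_ = 0;
    bool fired_ = false;
};
```

A controller for an interval could be modeled the same way with two such alarms, one marking the start and one marking the stop.
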
Instrumentation Performance
Pin’s instrumentation is efficient

Advanced Topics
Symbol and debug information
Hooks
Detach/Attach
Modifying program behavior
Debugging Pintools

Symbol/Debug Information
Procedure names:
  – RTN_Name()
Shared library names:
  – IMG_Name()
File and line number information:
  – PIN_FindLineFileByAddress()

Hooks
Pintools can catch:
  – Shared library load/unload
      IMG_AddInstrumentFunction()
      IMG_AddUnloadFunction()
  – Program end
      PIN_AddFiniFunction()
  – System calls
      INS_IsSyscall()
  – Thread create/end
      Pin 0 provides callbacks for thread create and destroy
      Yet to be done for Pin 2

Detach/Attach
Detach from Pin and execute original code
  – PIN_Detach()
  – Restore full speed after sufficient profiling
Attach Pin to an already running process
  – Similar to a debugger’s attach
  – Command line: “pin -pid -t inscount0”
  – Fast-forward to where you want to start profiling

Modify Program Behavior with Instrumentation
Analysis routines modify register values
  – IARG_RETURN_REGS
Instrumentation modifies register operands
  – add %eax, %ebx => add %eax, %edx
Use virtual registers
  – add %eax, %ebx => add %eax, REG_INST_G0
Modify memory
  – Pintool is in the same address space as the program

Debugging Pintools
1. Invoke gdb with your pintool (but don’t use “run”)

    $ gdb inscount0
    (gdb)

2. In another window, start your pintool with “-pause_tool”

    $ pin -pause_tool -t inscount0 -- /bin/ls
    Pausing to attach to pid

3. Go back to gdb:
   a) Attach to the process
   b) Use “cont” to continue execution; breakpoints can be set as usual

    (gdb) attach
    (gdb) break main
    (gdb) cont

Status
Pin 0: Itanium-only release, 10/2003
  – Used by Intel, HP, Oracle, many universities
Pin 2: released 7/15/2004
  – IA-32, EM64T, XScale
  – Debian, Suse, Red Hat 7.2, 8.0, 9.0, EL3
  – gcc, icc
  – Over 1000 downloads!

Future Features
Instrumentation of multithreaded programs
Windows port?

Summary
Pin: dynamic instrumentation framework for Linux
  – IA-32, EM64T, Itanium, and XScale
  – Easy to use, transparent, and efficient
Lots of sample tools
Write your own tool!

Acknowledgments
Prof. Dan Connors for providing the website at the University of Colorado

Project Engineering
Automatic nightly testing
  – 4 architectures
  – 6 Linux versions
  – 8 compilers
  – 9000 binaries
User manual and internal documentation generated automatically with Doxygen