Pin: Intel's Dynamic Binary Instrumentation Engine. Pin Tutorial, Intel Corporation, Software & Services Group. Presented by: Tevi Devor. CGO ISPASS 2012.

2 Software & Services Group 2 Legal Disclaimer INFORMATION IN THIS DOCUMENT IS PROVIDED “AS IS”. NO LICENSE, EXPRESS OR IMPLIED, BY ESTOPPEL OR OTHERWISE, ANY INTELLECTUAL PROPERTY RIGHTS IS GRANTED BY THIS DOCUMENT. INTEL ASSUMES NO LIABILITY WHATSOEVER AND INTEL DISCLAIMS ANY EXPRESS OR IMPLIED WARRANTY, RELATING TO THIS INFORMATION INCLUDING LIABILITY OR WARRANTIES RELATING TO FITNESS FOR A PARTICULAR PURPOSE, MERCHANTABILITY, OR INFRINGEMENT OF ANY PATENT, COPYRIGHT OR OTHER INTELLECTUAL PROPERTY RIGHT. Performance tests and ratings are measured using specific computer systems and/or components and reflect the approximate performance of Intel products as measured by those tests. Any difference in system hardware or software design or configuration may affect actual performance. Buyers should consult other sources of information to evaluate the performance of systems or components they are considering purchasing. For more information on performance tests and on the performance of Intel products, reference www.intel.com/software/products. Intel and the Intel logo are trademarks of Intel Corporation in the U.S. and other countries. *Other names and brands may be claimed as the property of others. Copyright © 2010. Intel Corporation.

3 Software & Services Group 3 Intel compilers, associated libraries and associated development tools may include or utilize options that optimize for instruction sets that are available in both Intel and non-Intel microprocessors (for example SIMD instruction sets), but do not optimize equally for non-Intel microprocessors. In addition, certain compiler options for Intel compilers, including some that are not specific to Intel micro- architecture, are reserved for Intel microprocessors. For a detailed description of Intel compiler options, including the instruction sets and specific microprocessors they implicate, please refer to the “Intel Compiler User and Reference Guides” under “Compiler Options." Many library routines that are part of Intel compiler products are more highly optimized for Intel microprocessors than for other microprocessors. While the compilers and libraries in Intel compiler products offer optimizations for both Intel and Intel-compatible microprocessors, depending on the options you select, your code and other factors, you likely will get extra performance on Intel microprocessors. Intel compilers, associated libraries and associated development tools may or may not optimize to the same degree for non-Intel microprocessors for optimizations that are not unique to Intel microprocessors. These optimizations include Intel® Streaming SIMD Extensions 2 (Intel® SSE2), Intel® Streaming SIMD Extensions 3 (Intel® SSE3), and Supplemental Streaming SIMD Extensions 3 (Intel SSSE3) instruction sets and other optimizations. Intel does not guarantee the availability, functionality, or effectiveness of any optimization on microprocessors not manufactured by Intel. Microprocessor-dependent optimizations in this product are intended for use with Intel microprocessors. 
While Intel believes our compilers and libraries are excellent choices to assist in obtaining the best performance on Intel and non-Intel microprocessors, Intel recommends that you evaluate other compilers and libraries to determine which best meet your requirements. We hope to win your business by striving to offer the best performance of any compiler or library; please let us know if you find we do not. Notice revision #20110307 Optimization Notice

4 Agenda
Part 1: Introduction to Pin
Part 2: Larger Pin tools and writing efficient Pin tools
Part 3: Deeper into Pin API
Part 4: Advanced Pin API
Part 5: Performance numbers

5 Part 1: Introduction to Pin
- Dynamic Binary Instrumentation
- Pin Capabilities
- Overview of how Pin works
- Sample Pin Tools

6 What Does "Pin" Stand For?
Three Letter Acronyms (TLAs) @ Intel
- 26^3 possible TLAs
- 26^3 - 1 are in use at Intel
- Only 1 is not approved for use at Intel
- Guess which one: Pin Is Not an acronym
Pin is based on the post-link optimizer Spike
- Pin is a small Spike
- Spike is EOL
http://www.cgo.org/cgo2004/papers/01_82_luk_ck.pdf

7 Which one of these people is the Pin Performance Guru? (photo slide) tevi.devor@intel.com

8 Instrumentation
A technique that inserts code into a program to collect run-time information
- Program analysis: performance profiling, error detection, capture & replay
- Architectural study: processor and cache simulation, trace collection
Kinds of instrumentation: Source-Code Instrumentation, Static Binary Instrumentation, Dynamic Binary Instrumentation
Dynamic Binary Instrumentation: instrument code just before it runs (Just In Time, JIT)
- No need to recompile or re-link
- Discover code at runtime
- Handle dynamically-generated code
- Attach to running processes
Pin is a dynamic binary instrumentation engine

9 Advantages of Pin Instrumentation
Programmable Instrumentation:
- Provides a rich set of APIs to write your own instrumentation tools, called PinTools, in C, C++, or assembly
- APIs are designed to maximize ease of use and abstract away the underlying instruction set idiosyncrasies
Multiplatform:
- OSs: Windows, Linux
- Architectures: IA-32, Intel64
Robust:
- Instruments real-life applications: databases, web browsers, ...
- Instruments multithreaded applications
- Supports signals and exceptions, self-modifying code, ...
- If you can Run it, you can Pin it
Efficient:
- Applies compiler optimizations on instrumentation code
Pin can be used to instrument all the user level code in an application

10 Pin Instrumentation Capabilities
Use Pin APIs to write PinTools that:
- Replace application functions with your own
  - Call the original application function from within your replacement function
- Fully examine any application instruction, and insert a call to your instrumenting function to be executed whenever that instruction executes
  - Pass parameters to your instrumenting function from a large set of supported parameters: register values (including IP), register values by reference (for modification), memory addresses read/written by the instruction, full register context, ...
- Track function calls including syscalls and examine/change arguments
- Track application threads
- Intercept signals
- Instrument a process tree
- Many other capabilities...
If Pin doesn't have it, you don't want it

11 Software & Services Group 11 Usage of Pin at Intel Profiling and analysis products –Intel Parallel Studio –Amplifier (Performance Analysis) –Lock and waits analysis –Concurrency analysis –Inspector (Correctness Analysis) –Threading error detection (data race and deadlock) –Memory error detection Architectural research and enabling –Emulating new instructions (Intel SDE) –Trace generation –Branch prediction and cache modeling GUI Algorithm PinTool Pin

12 Software & Services Group 12 Example Pin-tools SDE: http://software.intel.com/en-us/articles/intel-software-development-emulator CMP$IM: http://www-mount.ece.umn.edu/~jjyi/MoBS/2008/program/02A-Jaleel.pdf PinPlay: Paper presented at CGO2010 http://www.cgo.org/cgo2010/program.html

13 Pin Usage Outside Intel
Popular and well supported
- 30,000+ downloads, 400+ citations
Free download
- www.pintool.org
- Includes: detailed user manual, source code for 100s of Pin tools
Pin User Group (PinHeads)
- http://tech.groups.yahoo.com/group/pinheads/
- Pin users and Pin developers answer questions

14 Pin Invocation (Windows)
Command line: pin.exe -t inscount.dll -- gzip.exe input.txt
inscount.dll is a PinTool that counts application instructions executed and prints the count at the end (here: Count 258743109).
Launcher process (PIN.EXE):
- CreateProcess(gzip.exe, input.txt, suspended)
- GetContext(&firstAppIp)
- WriteProcessMemory(BootRoutine, BootData): inject the Pin boot routine and its data (firstAppIp, "inscount.dll") into the application
- SetContext(BootRoutineIp)
- Resume at BootRoutine
Application process (application code and data, NTDLL.DLL, PINVM.DLL, PIN.LIB, inscount.dll, code cache; running on the Windows kernel):
- The boot routine loads PINVM.DLL
- Start PINVM.DLL running (firstAppIp, "inscount.dll")
- Load inscount.dll and run its main()
- Starting at the first application IP: read a trace from the application code, JIT it, adding instrumentation code from inscount.dll, and encode the jitted trace into the code cache
- Execute the jitted code
- When execution of the trace ends, call into PINVM.DLL to JIT the next trace, passing in the app IP of the trace's target
- The source trace's exit branch is modified to branch directly to the destination trace
PINVM.DLL components: System Call Dispatcher, Event Dispatcher, Thread Dispatcher, Decoder, Encoder

15 Software & Services Group 15 All code in this presentation is covered by the following: /*BEGIN_LEGAL Intel Open Source License Copyright (c) 2002-2010 Intel Corporation. All rights reserved. Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met: Redistributions of source code must retain the above copyright notice, this list of conditions and the following disclaimer. Redistributions in binary form must reproduce the above copyright notice, this list of conditions and the following disclaimer in the documentation and/or other materials provided with the distribution. Neither the name of the Intel Corporation nor the names of its contributors may be used to endorse or promote products derived from this software without specific prior written permission. THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS ``AS IS'' AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE INTEL OR ITS CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. END_LEGAL */

16 Instruction Counting Tool (inscount.dll)

#include <iostream>
#include "pin.H"

UINT64 icount = 0;

// Analysis routine (execution time routine):
// executes each time the jitted INStruction executes
void docount() { icount++; }

// Instrumentation routine (jitting time routine, Pin callback):
// called during jitting of INS. INS is only valid inside this routine.
void Instruction(INS ins, void *v) {
    INS_InsertCall(ins, IPOINT_BEFORE, (AFUNPTR)docount, IARG_END);
}

// Pin callback: called when the application exits
void Fini(INT32 code, void *v) {
    std::cerr << "Count " << icount << std::endl;
}

int main(int argc, char *argv[]) {
    PIN_Init(argc, argv);
    INS_AddInstrumentFunction(Instruction, 0);
    PIN_AddFiniFunction(Fini, 0);
    PIN_StartProgram();  // Never returns
    return 0;
}

Jitted code: before each application instruction (sub $0xff, %edx; cmp %esi, %edx; jle; mov 0x1, %edi) Pin inserts:
    switch to pin stack; save registers; call docount; restore registers; switch to app stack
or, when Pin inlines docount:
    save eflags; inc icount; restore eflags

For 10pts: Which function is the analysis routine?
For 20pts: Which function is executed more often?

17 Instrumentation vs. Analysis
Concepts borrowed from the ATOM tool:
Instrumentation routines define where instrumentation is inserted
- e.g., before instruction
- Occurs when an instruction is being jitted
Analysis routines define what to do when instrumentation is activated
- e.g., increment counter
- Occurs every time an instruction is executed
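To make the instrumentation/analysis split concrete, here is a minimal sketch that does not use Pin at all. ToyJit and its members are hypothetical names invented for this illustration: it "translates" each static instruction once (calling the instrumentation step) and then executes the translated instructions many times (calling the attached analysis step each time).

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <vector>

// Toy stand-in for a JIT. This is NOT the Pin API; every name is hypothetical.
struct ToyJit {
    std::uint64_t instrumentationCalls = 0;  // once per static instruction
    std::uint64_t analysisCalls = 0;         // once per dynamic execution

    // "Instrumentation routine": runs when an instruction is translated and
    // decides which analysis code to attach to it.
    std::function<void()> instrument() {
        ++instrumentationCalls;
        return [this] { ++analysisCalls; };  // the attached "analysis routine"
    }

    // Translate numStaticIns instructions once, then execute the translated
    // code (a loop body) the given number of times.
    void run(int numStaticIns, int iterations) {
        assert(numStaticIns >= 0 && iterations >= 0);
        std::vector<std::function<void()>> attached;
        for (int i = 0; i < numStaticIns; ++i) attached.push_back(instrument());
        for (int it = 0; it < iterations; ++it)
            for (auto& analysis : attached) analysis();
    }
};
```

For a 4-instruction loop body executed 1000 times, the instrumentation step runs 4 times and the analysis step 4000 times, which is why work placed in analysis routines tends to dominate a tool's cost.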

18 (Diagram of the application process under Pin, for pin.exe -t inscount.dll -- gzip.exe input.txt: starting at the first app IP, Pin reads a trace from the application code, JITs it, adding instrumentation code from inscount.dll, and encodes the jitted trace into the code cache. Components shown: PINVM.DLL with its System Call, Event, and Thread Dispatchers, Decoder and Encoder; inscount.dll; PIN.LIB; code cache; NTDLL.DLL; Windows kernel; launcher process PIN.EXE with boot routine and data.)

19 Trace
Trace: a sequence of continuous instructions, with one entry point
BBL: has one entry point and ends at the first control transfer instruction
(Figure: original code made of BBL#1 through BBL#7, connected by fall-through (FT) and taken (TK) edges, next to the trace built from it: BBL#1', BBL#2', BBL#3, with early exits via stubs and a trace exit via stub.)

20 ManualExamples/inscount2.cpp

#include <cstdio>
#include "pin.H"

UINT64 icount = 0;

// Analysis routine: adds the number of instructions in the BBL
void PIN_FAST_ANALYSIS_CALL docount(UINT32 c) { icount += c; }

void Trace(TRACE trace, void *v) {  // Pin callback
    for (BBL bbl = TRACE_BblHead(trace); BBL_Valid(bbl); bbl = BBL_Next(bbl))
        BBL_InsertCall(bbl, IPOINT_ANYWHERE, (AFUNPTR)docount,
                       IARG_FAST_ANALYSIS_CALL,
                       IARG_UINT32, BBL_NumIns(bbl),
                       IARG_END);
}

void Fini(INT32 code, void *v) {  // Pin callback
    fprintf(stderr, "Count %llu\n", (unsigned long long)icount);
}

int main(int argc, char *argv[]) {
    PIN_Init(argc, argv);
    TRACE_AddInstrumentFunction(Trace, 0);
    PIN_AddFiniFunction(Fini, 0);
    PIN_StartProgram();  // Never returns
    return 0;
}
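The benefit of counting per BBL instead of per instruction can be sketched without Pin. In the toy model below (all names are hypothetical), a program is a list of basic-block sizes plus the sequence of blocks that were executed; both strategies produce the same instruction count, but the per-BBL one makes one analysis call per executed block. The sketch assumes every instruction of an executed block runs to completion.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

struct CountResult {
    std::uint64_t icount;         // instructions counted
    std::uint64_t analysisCalls;  // how many times the analysis routine ran
};

// One analysis call per executed instruction (the first inscount tool).
CountResult countPerInstruction(const std::vector<int>& bblSizes,
                                const std::vector<int>& executedBbls) {
    CountResult r{0, 0};
    for (int b : executedBbls) {
        assert(b >= 0 && static_cast<std::size_t>(b) < bblSizes.size());
        for (int i = 0; i < bblSizes[b]; ++i) {
            r.icount += 1;
            r.analysisCalls += 1;
        }
    }
    return r;
}

// One analysis call per executed basic block, passing the block size
// (the inscount2 approach).
CountResult countPerBbl(const std::vector<int>& bblSizes,
                        const std::vector<int>& executedBbls) {
    CountResult r{0, 0};
    for (int b : executedBbls) {
        assert(b >= 0 && static_cast<std::size_t>(b) < bblSizes.size());
        r.icount += static_cast<std::uint64_t>(bblSizes[b]);
        r.analysisCalls += 1;
    }
    return r;
}
```

With block sizes {3, 5, 2} and the execution path 0, 1, 1, 2, both functions count 15 instructions, using 15 analysis calls in the first case and 4 in the second.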

21 Jitted code for an application trace (inscount2). How many BBLs in this trace?

Application trace (APP IP):
    0x77ec4600  cmp rax, rdx
    0x77ec4603  jz 0x77f1eac9
    0x77ec4609  movzx ecx, [rax+0x2]
    0x77ec460d  call 0x77ef7870

Jitted trace in the code cache:
    0x001de0000  mov r14, 0xc5267d40      // inscount2.docount
    0x001de000a  add [r14], 0x2           // inscount2.docount
    0x001de0015  cmp rax, rdx             // app IP 0x77ec4600
    0x001de0018  jz 0x1deffa0  (L1)       // patched in future
    0x001de001e  mov r14, 0xc5267d40      // inscount2.docount
    0x001de0028  mov [r15+0x60], rax      // save status flags
    0x001de002c  lahf
    0x001de002e  seto al
    0x001de0031  mov [r15+0xd8], ax
    0x001de0039  mov rax, [r15+0x60]
    0x001de003d  add [r14], 0x2           // inscount2.docount
    0x001de0048  movzx edi, [rax+0x2]     // app IP 0x77ec4609; ecx alloced to edi
    0x001de004c  push 0x77ec4612          // push retaddr
    0x001de0051  nop
    0x001de0052  jmp 0x1deffd0  (L2)      // patched in future

L1:
    0x001deffa0  mov [r15+0x40], rsp      // save app rsp
    0x001deffa4  mov rsp, [r15+0x2d0]     // switch to pin stack
    0x001deffab  call [0x2f000000]        // call VmEnter
    // data used by VmEnter, pointed to by return-address of call
    0x001deffb8  _svc(VMSVC_XFER)
    0x001deffc0  _sct(0x00065f998)        // current register mapping
    0x001deffc8  _iaddr(0x077f1eac9)      // app target IP of jz at 0x77ec4603

L2:
    0x001deffd0  mov [r15+0x40], rsp      // save app rsp
    0x001deffd4  mov rsp, [r15+0x2d0]     // switch to pin stack
    0x001deffdb  call [0x2f000000]        // call VmEnter
    // data used by VmEnter, pointed to by return-address of call
    0x001deffe8  _svc(VMSVC_XFER)
    0x001defff0  _sct(0x00065fb60)        // current register mapping
    0x001defff8  _iaddr(0x077ef7870)      // app target IP of call at 0x77ec460d

Notes:
- The mov r14 / add [r14] pairs are the compiler-generated code for docount, inlined by Pin
- r14 is allocated by Pin
- r15 is allocated by Pin and points to the per-thread spill area

22 SimpleExamples/inscount2_mt.cpp

#include <cstdio>
#include "pin.H"

INT32 numThreads = 0;
const INT32 MaxNumThreads = 10000;

struct THREAD_DATA {
    UINT64 _count;
    UINT8 _pad[56];  // guess why?
} icount[MaxNumThreads];

// Analysis routine
VOID PIN_FAST_ANALYSIS_CALL docount(UINT32 c, THREADID tid) { icount[tid]._count += c; }

// Pin callback
VOID ThreadStart(THREADID threadid, CONTEXT *ctxt, INT32 flags, VOID *v) { numThreads++; }

// Jitting time routine: Pin callback
VOID Trace(TRACE trace, VOID *v) {
    for (BBL bbl = TRACE_BblHead(trace); BBL_Valid(bbl); bbl = BBL_Next(bbl))
        BBL_InsertCall(bbl, IPOINT_ANYWHERE, (AFUNPTR)docount,
                       IARG_FAST_ANALYSIS_CALL,
                       IARG_UINT32, BBL_NumIns(bbl),
                       IARG_THREAD_ID,
                       IARG_END);
}

// Pin callback
VOID Fini(INT32 code, VOID *v) {
    for (INT32 t = 0; t < numThreads; t++)
        printf("InsCount[of thread#%d]= %llu\n", t, (unsigned long long)icount[t]._count);
}

int main(int argc, char *argv[]) {
    PIN_Init(argc, argv);
    for (INT32 t = 0; t < MaxNumThreads; t++) { icount[t]._count = 0; }
    PIN_AddThreadStartFunction(ThreadStart, 0);
    TRACE_AddInstrumentFunction(Trace, 0);
    PIN_AddFiniFunction(Fini, 0);
    PIN_StartProgram();  // Never returns
    return 0;
}

Why is there NO synchronization?
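One plausible reading of the two questions on this slide: each thread updates only its own array slot, so no lock is needed, and the 56 bytes of padding make each slot 64 bytes so that two threads' counters do not share a cache line (false sharing). The sketch below only checks that layout arithmetic. It assumes 64-byte cache lines and an array that starts on a line boundary; PaddedCounter and lineOfSlot are hypothetical names, not part of Pin.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Assumption: 64-byte cache lines (typical for IA-32 / Intel64 parts).
constexpr std::size_t kCacheLineBytes = 64;

// Same shape as THREAD_DATA on the slide: an 8-byte counter plus padding.
struct PaddedCounter {
    std::uint64_t count;
    std::uint8_t pad[kCacheLineBytes - sizeof(std::uint64_t)];
};
static_assert(sizeof(PaddedCounter) == kCacheLineBytes,
              "each per-thread slot should fill exactly one cache line");

// Index of the cache line (relative to a line-aligned array start) that holds
// the counter of thread `tid`, when each slot occupies `slotBytes` bytes.
inline std::size_t lineOfSlot(std::size_t tid, std::size_t slotBytes) {
    assert(slotBytes > 0);
    return (tid * slotBytes) / kCacheLineBytes;
}
```

With bare 8-byte counters, threads 0 through 7 would all land on line 0; with the padded struct every thread gets a line of its own.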

23 Software & Services Group 23 Multi-Threading Pin supports multi-threading –Application threads execute jitted code including instrumentation code (inlined and not inlined), without any serialization introduced by Pin –Instrumentation code can use Pin and/or OS synchronization constructs to introduce serialization if needed. –Will see examples of this in Part4 –Pin provides APIs for thread local storage. –Will see examples in Part3 –Pin callbacks are serialized –Jitting is serialized –Only one application thread can be jitting code at any time

24 Memory Read Logger Tool

#include <cstdio>
#include <map>
#include <string>
#include "pin.H"

std::map<ADDRINT, std::string> disAssemblyMap;

// Analysis routine
VOID ReadsMem(ADDRINT applicationIp, ADDRINT memoryAddressRead, UINT32 memoryReadSize) {
    printf("0x%llx %s reads %u bytes of memory at 0x%llx\n",
           (unsigned long long)applicationIp,
           disAssemblyMap[applicationIp].c_str(),
           memoryReadSize,
           (unsigned long long)memoryAddressRead);
}

// Jitting time routine: Pin callback
VOID Instruction(INS ins, void *v) {
    if (INS_IsMemoryRead(ins)) {
        disAssemblyMap[INS_Address(ins)] = INS_Disassemble(ins);
        INS_InsertCall(ins, IPOINT_BEFORE, (AFUNPTR)ReadsMem,
                       IARG_INST_PTR,  // application IP
                       IARG_MEMORYREAD_EA,
                       IARG_MEMORYREAD_SIZE,
                       IARG_END);
    }
}

int main(int argc, char *argv[]) {
    PIN_Init(argc, argv);
    INS_AddInstrumentFunction(Instruction, 0);
    PIN_StartProgram();  // Never returns
    return 0;
}

Jitted code for two application instructions:
    Switch to pin stack
    push 4
    push %eax
    push 0x7f083de
    call ReadsMem
    Pop args off pin stack
    Switch back to app stack
    inc DWORD_PTR[%eax]

    Switch to pin stack
    push 4
    lea %ecx,[%esi]0x8       // Pin has determined that it can overwrite ecx
    push %ecx
    push 0x7f083e4
    call ReadsMem
    Pop args off pin stack
    Switch back to app stack
    inc DWORD_PTR[%esi]0x8

Many other IARGs:
- IARG_BRANCH_TARGET_ADDR: the target address of a branch instruction when executed (including indirect branches)
- IARG_BRANCH_TAKEN: is the branch taken (when executed)
- IARG_REG_VALUE, REG: the value of register REG
- IARG_REG_REFERENCE, REG: a "pointer" to a register
- IARG_FUNCARG_ENTRYPOINT_VALUE, #: the value of ARGUMENT # of the function (RTN instrumentation)
- IARG_CONTEXT: handle to the full register context of the executing thread
- Work in progress: IARG_MAKE_ME_A_COFFEE

Pin does full register allocation during jitting. At the start of jitting a trace, all registers accessed by the application code are considered to be virtual registers. Pin allocates each of these virtual registers to either a physical register or a per-thread register spill area pointed to by ebx (IA-32) or r15 (Intel64).

25 Malloc Wrapping

#include <cstring>
#include "pin.H"

// TimeForOutOfMem() is a tool-defined helper (not shown on the slide)
void *MallocWrapper(const CONTEXT *ctxt, AFUNPTR pf_malloc, size_t size) {
    // Simulate out-of-memory every so often
    void *res;
    if (TimeForOutOfMem()) return NULL;
    // This call is rather expensive; it also has 2 synchronization points
    PIN_CallApplicationFunction(ctxt, PIN_ThreadId(), CALLINGSTD_DEFAULT, pf_malloc,
                                PIN_PARG(void *), &res,
                                PIN_PARG(size_t), size,
                                PIN_PARG_END());
    return res;
}

// Pin callback. Registered by IMG_AddInstrumentFunction
VOID ImageLoad(IMG img, VOID *v) {
    if (strstr(IMG_Name(img).c_str(), "libc.so") ||
        strstr(IMG_Name(img).c_str(), "MSVCR80") ||
        strstr(IMG_Name(img).c_str(), "MSVCR90")) {
        RTN mallocRtn = RTN_FindByName(img, "malloc");
        if (RTN_Valid(mallocRtn)) {
            PROTO protoMalloc = PROTO_Allocate(PIN_PARG(void *), CALLINGSTD_DEFAULT,
                                               "malloc", PIN_PARG(size_t), PIN_PARG_END());
            RTN_ReplaceSignature(mallocRtn, AFUNPTR(MallocWrapper),
                                 IARG_PROTOTYPE, protoMalloc,
                                 IARG_CONST_CONTEXT,
                                 IARG_ORIG_FUNCPTR,
                                 IARG_FUNCARG_ENTRYPOINT_VALUE, 0,
                                 IARG_END);
        }
    }
}

int main(int argc, CHAR *argv[]) {
    PIN_InitSymbols();
    PIN_Init(argc, argv);
    IMG_AddInstrumentFunction(ImageLoad, 0);
    PIN_StartProgram();  // Never returns
    return 0;
}

The ImageLoad callback is called for each image (exe, shared library) loaded into the process. It is called before any code in the loaded image is executed. This is referred to as ahead-of-time instrumentation.

26 Cheaper Malloc Wrapping

#include <cstdio>
#include <cstring>
#include "pin.H"

ADDRINT mallocReturnIp = 0;

VOID BeforeMalloc(ADDRINT returnIp, ADDRINT size) {
    mallocReturnIp = returnIp;
    printf("(tool) call to malloc for %5llu bytes call will return to %p\n",
           (unsigned long long)size, (void *)returnIp);
}

// IfCall function: inlinable, returns TRUE/FALSE
ADDRINT CheckReturn(ADDRINT sp) {
    // Is this a return from malloc?
    return (mallocReturnIp == *(reinterpret_cast<ADDRINT *>(sp)));
}

// ThenCall function: not inlinable. Called iff the IfCall function
// (CheckReturn) returns TRUE
VOID ProcessReturnFromMalloc(ADDRINT ip, ADDRINT returnVal) {
    printf("(tool) return from malloc at %p malloc returns %p\n",
           (void *)ip, (void *)returnVal);
    mallocReturnIp = 0;
}

// IPOINT_AFTER is NOT guaranteed to find all returns, so instrument every ret
static void Instruction(INS ins, void *v) {
    if (INS_IsRet(ins)) {
        INS_InsertIfCall(ins, IPOINT_BEFORE, (AFUNPTR)CheckReturn,
                         IARG_REG_VALUE, REG_STACK_PTR,
                         IARG_END);
        INS_InsertThenCall(ins, IPOINT_BEFORE, (AFUNPTR)ProcessReturnFromMalloc,
                           IARG_INST_PTR,
                           IARG_FUNCRET_EXITPOINT_VALUE,
                           IARG_END);
    }
}

// Pin callback. Registered by IMG_AddInstrumentFunction
VOID ImageLoad(IMG img, VOID *v) {
    if (strstr(IMG_Name(img).c_str(), "libc.so") ||
        strstr(IMG_Name(img).c_str(), "MSVCR80")) {
        RTN mallocRtn = RTN_FindByName(img, "malloc");
        if (RTN_Valid(mallocRtn)) {
            RTN_Open(mallocRtn);
            RTN_InsertCall(mallocRtn, IPOINT_BEFORE, AFUNPTR(BeforeMalloc),
                           IARG_RETURN_IP,
                           IARG_FUNCARG_ENTRYPOINT_VALUE, 0,
                           IARG_END);
            RTN_Close(mallocRtn);
        }
    }
}

int main(int argc, CHAR *argv[]) {
    PIN_InitSymbols();
    PIN_Init(argc, argv);
    IMG_AddInstrumentFunction(ImageLoad, 0);
    INS_AddInstrumentFunction(Instruction, 0);
    PIN_StartProgram();  // Never returns
    return 0;
}

27 Software & Services Group 27 Pin Probe-Mode Probe mode is a method of using Pin to instrument at the function level only. Wrap, Replace, call Analysis function before/after. Replacement or Wrapping function can call the replaced (original) function. The application and the replacement routine are run natively (not Jitted). –Faster than Jit-mode –Puts more responsibility on the tool writer. –Probes can only be placed on RTN boundaries –Must be inserted within the Image load callback. –Pin will automatically remove the probes when an image is unloaded. Many of the PIN APIs that are available in JIT mode are not available in Probe mode.

28 A Sample Probe
- A probe is a jump instruction that overwrites original instruction(s) in the application
- Instrumentation invoked with probes
- Pin copies/translates original bytes so probed (replaced) functions can be called from the replacement function

Original function entry point:
    0x400113d4: push %ebp
    0x400113d5: mov %esp,%ebp
    0x400113d7: push %edi
    0x400113d8: push %esi
    0x400113d9: push %ebx

Entry point overwritten with probe:
    0x400113d4: jmp 0x41481064
    0x400113d9: push %ebx

Tool wrapper function:
    0x41481064: push %ebp           // tool wrapper func
    ::::::::::::::::::::
    0x414827fe: call 0x50000004     // call original func

Copy of entry point with original bytes:
    0x50000004: push %ebp
    0x50000005: mov %esp,%ebp
    0x50000007: push %edi
    0x50000008: push %esi
    0x50000009: jmp 0x400113d9

29 Malloc Wrapping Probe-Mode

#include <cstring>
#include "pin.H"

typedef void *(*MALLOC_FUNPTR)(size_t);

// TimeForOutOfMem() is a tool-defined helper (not shown on the slide)
void *MallocWrapper(AFUNPTR pf_malloc, size_t size) {
    // Simulate out-of-memory every so often
    void *res;
    if (TimeForOutOfMem()) return NULL;
    res = ((MALLOC_FUNPTR)pf_malloc)(size);  // call the original malloc natively
    return res;
}

VOID ImageLoad(IMG img, VOID *v) {
    if (strstr(IMG_Name(img).c_str(), "libc.so") ||
        strstr(IMG_Name(img).c_str(), "MSVCR80") ||
        strstr(IMG_Name(img).c_str(), "MSVCR90")) {
        RTN mallocRtn = RTN_FindByName(img, "malloc");
        if (RTN_Valid(mallocRtn) && RTN_IsSafeForProbedReplacement(mallocRtn)) {
            PROTO proto_malloc = PROTO_Allocate(PIN_PARG(void *), CALLINGSTD_DEFAULT,
                                                "malloc", PIN_PARG(size_t), PIN_PARG_END());
            RTN_ReplaceSignatureProbed(mallocRtn, AFUNPTR(MallocWrapper),
                                       IARG_PROTOTYPE, proto_malloc,
                                       IARG_ORIG_FUNCPTR,
                                       IARG_FUNCARG_ENTRYPOINT_VALUE, 0,
                                       IARG_END);
        }
    }
}

int main(int argc, CHAR *argv[]) {
    PIN_InitSymbols();
    PIN_Init(argc, argv);
    IMG_AddInstrumentFunction(ImageLoad, 0);
    PIN_StartProgramProbed();  // Never returns
    return 0;
}

30 Software & Services Group 30 SDE SDE: A fast functional simulator for applications with new instructions –New instructions have been defined –Compiler generates code with new instructions –What can be used to run the apps with the new instructions? –Use PinTool that emulates new instructions. –vmovdqu ymm?, mem256 vmovdqu mem256, ymm? –16 new 256 bit ymm registers –Read/Write ymm register from/to memory.

31 SDE under Pin
Command line: pin.exe -t sde.dll -- gzip.exe input.txt
- gzip.exe is compiled with a compiler that generates the new instructions
- Starting at the first app IP: read a trace from the application code, JIT it, adding instrumentation code from sde.dll, encode the jitted trace into the code cache, and execute it
- The decoder can decode the new instructions; the host CPU can NOT execute them
- Each new instruction is replaced with a call to an emulation function in the tool
(Diagram: application process with application code and data, PINVM.DLL with its System Call, Event, and Thread Dispatchers, Decoder and Encoder, sde.dll, PIN.LIB, code cache, NTDLL.DLL, on the Windows kernel; launcher process PIN.EXE with boot routine and data: firstAppIp, "sde.dll".)

32 sde_emul.dll Schema

#include "pin.H"

// Emulated ymm register file (declaration not shown on the slide;
// sized for the 16 new 256-bit ymm registers)
static UINT8 ymmRegs[16][32];

VOID EmVmovdquMem2Reg(unsigned int ymmDstRegNum, ADDRINT *ymmMemSrcPtr) {
    PIN_SafeCopy(ymmRegs[ymmDstRegNum], ymmMemSrcPtr, 32);
}

VOID EmVmovdquReg2Mem(unsigned int ymmSrcRegNum, ADDRINT *ymmMemDstPtr) {
    PIN_SafeCopy(ymmMemDstPtr, ymmRegs[ymmSrcRegNum], 32);
}

VOID Instruction(INS ins, VOID *v) {
    switch (INS_Opcode(ins)) {
        :::::
        case XED_ICLASS_VMOVDQU:
            if (INS_IsMemoryRead(ins))        // vmovdqu ymm? <= mem256
                INS_InsertCall(ins, IPOINT_BEFORE, (AFUNPTR)EmVmovdquMem2Reg,
                               IARG_UINT32, REG(INS_OperandReg(ins, 0)) - REG_YMM0,
                               IARG_MEMORYREAD_EA,
                               IARG_END);
            else if (INS_IsMemoryWrite(ins))  // vmovdqu mem256 <= ymm?
                INS_InsertCall(ins, IPOINT_BEFORE, (AFUNPTR)EmVmovdquReg2Mem,
                               IARG_UINT32, REG(INS_OperandReg(ins, 1)) - REG_YMM0,
                               IARG_MEMORYWRITE_EA,
                               IARG_END);
            INS_Delete(ins);  // Processor does NOT execute this instruction
            break;
    }
}

int main(int argc, CHAR *argv[]) {
    PIN_Init(argc, argv);
    INS_AddInstrumentFunction(Instruction, 0);
    PIN_StartProgram();  // Never returns
    return 0;
}

33 Linux Invocation + Injection
Command line: pin -t inscount.so -- gzip input.txt
inscount.so is a PinTool that counts application instructions executed and prints the count at the end.
Pin forks; the child is the injector and the other process is the injectee, which becomes gzip.

Injectee:
- exitLoop = FALSE;
- Ptrace TraceMe
- while(!exitLoop){}
- execv(gzip);  // Injectee freezes

Injector:
- Ptrace Injectee (Injectee freezes)
- Injectee.exitLoop = TRUE;
- Ptrace continue (unfreezes Injectee)
- Execution of the injector resumes after execv(gzip) in the injectee completes
- PtraceCopy(save, gzip.CodeSegment, sizeof(MiniLoader))
- PtraceGetContext(gzip.OrigContext)
- PtraceCopy(gzip.CodeSegment, MiniLoader, sizeof(MiniLoader))
- Ptrace continue@MiniLoader (unfreezes Injectee)
- Wait for MiniLoader complete (SigTrace from Injectee)
- PtraceCopy(gzip.CodeSegment, save, sizeof(MiniLoader))
- PtraceCopy(gzip.pin.stack, gzip.OrigCtxt, sizeof(ctxt))
- Ptrace SetContext(gzip.IP=pin, gzip.SP=pin.Stack)
- Ptrace Detach

MiniLoader (running in the injectee):
- Loads Pin + tool, allocates the Pin stack
- Kill(SigTrace, Injector): freezes until Ptrace Cont

(Diagram: the gzip (injectee) address space ends up holding gzip code and data, Pin code and data, the MiniLoader, inscount2.so, and the Pin stack with gzip's OrigCtxt; IP points into Pin.)

34 Part 1 Summary
Pin is Intel's dynamic binary instrumentation engine
Pin can be used to instrument all user level code
- Windows, Linux
- IA-32, Intel64, IA64
- Product level robustness
- Jit-Mode for full instrumentation: Thread, Function, Trace, BBL, Instruction
- Probe-Mode for Function Replacement/Wrapping/Instrumentation only
- Pin supports multi-threading, with no serialization of jitted application code nor of instrumentation code
Pin API makes Pin tools easy to write
- Presented 6 full Pin tools, each one fit on 1 ppt slide
Popular and well supported
- 30,000+ downloads, 400+ citations
Free download
- www.pintool.org
- Includes: detailed user manual, source code for 100s of Pin tools
Pin User Group
- http://tech.groups.yahoo.com/group/pinheads/
- Pin users and Pin developers answer questions

35 Software & Services Group 35 Part2: Larger Pin tools and writing efficient Pin tools

36 CMP$im: A CMP Cache Simulation Pin Tool
Modeling an 8-core CMP using CMP$im. CMP$im author: Aamer.Jaleel@intel.com
(Diagram: the workload runs under Pin; instrumentation routines pass ThreadID, Address, Size, and Access Type to the cache model, which has parameters to configure the number of cache levels, size, threads per cache, etc. The modeled hierarchy has DL1 and L2 caches, an interconnect, and a last-level cache (LLC) that is either private or shared and banked.)

37 CMP$im: Instrument Memory References (instrumentation routine of the Pin tool)

VOID Instruction(INS ins, VOID *v) {
    if (INS_IsMemoryRead(ins))   // If instruction reads from memory
        INS_InsertCall(ins, IPOINT_BEFORE, (AFUNPTR)MemoryReference,
                       IARG_THREAD_ID,
                       IARG_MEMORYREAD_EA, IARG_MEMORYREAD_SIZE,
                       IARG_UINT32, ACCESS_TYPE_LOAD,
                       IARG_END);
    if (INS_IsMemoryWrite(ins))  // If instruction writes to memory
        INS_InsertCall(ins, IPOINT_BEFORE, (AFUNPTR)MemoryReference,
                       IARG_THREAD_ID,
                       IARG_MEMORYWRITE_EA, IARG_MEMORYWRITE_SIZE,
                       IARG_UINT32, ACCESS_TYPE_STORE,
                       IARG_END);
}

38 CMP$im: Analyze Memory References (analysis routines of the Pin tool)

#include "cache_model.h"

CACHE_t CacheHierarchy[MAX_NUM_THREADS][MAX_NUM_LEVELS];

VOID LookupHierarchy(int tid, int level, ADDRINT addr, int accessType);

VOID MemoryReference(int tid, ADDRINT addrStart, int size, int type) {
    for (ADDRINT addr = addrStart; addr < (addrStart + size); addr += LINE_SIZE)
        LookupHierarchy(tid, FIRST_LEVEL_CACHE, addr, type);
}

VOID LookupHierarchy(int tid, int level, ADDRINT addr, int accessType) {
    int result = CacheHierarchy[tid][level]->Lookup(addr, accessType);
    if (result == CACHE_MISS) {
        if (level == LAST_LEVEL_CACHE) return;
        bool shared = IsShared(level);
        if (shared) AcquireLock(&lock[level], tid);  // Synchronization point
        LookupHierarchy(tid, level + 1, addr, accessType);
        if (shared) ReleaseLock(&lock[level]);
    }
}
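The lookup scheme above can be tried out with a small stand-alone model. This is only a sketch of the idea, not CMP$im's cache model: it assumes direct-mapped caches, a single thread (so no locks), and hypothetical class names. A reference is split into the cache lines it touches, and each line walks down the hierarchy until some level hits.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

constexpr std::uint64_t kInvalidTag = ~std::uint64_t{0};

// Minimal direct-mapped cache: one tag per line, no data, no write policy.
class DirectMappedCache {
public:
    DirectMappedCache(std::size_t numLines, std::size_t lineBytes)
        : lineBytes_(lineBytes), tags_(numLines, kInvalidTag) {
        assert(numLines > 0 && lineBytes > 0);
    }
    // Returns true on a hit; on a miss the line is installed.
    bool access(std::uint64_t addr) {
        const std::uint64_t line = addr / lineBytes_;
        const std::size_t idx = static_cast<std::size_t>(line % tags_.size());
        if (tags_[idx] == line) return true;
        tags_[idx] = line;
        return false;
    }
private:
    std::size_t lineBytes_;
    std::vector<std::uint64_t> tags_;
};

struct ToyHierarchy {
    std::vector<DirectMappedCache> levels;  // levels[0] is closest to the core
    std::vector<std::uint64_t> misses;      // per-level miss counts
    std::size_t lineBytes;

    ToyHierarchy(const std::vector<std::size_t>& linesPerLevel, std::size_t lineBytesIn)
        : misses(linesPerLevel.size(), 0), lineBytes(lineBytesIn) {
        for (std::size_t n : linesPerLevel) levels.emplace_back(n, lineBytesIn);
    }
    // Walk the levels until one hits (or the last level misses).
    void lookupLine(std::uint64_t addr) {
        for (std::size_t l = 0; l < levels.size(); ++l) {
            if (levels[l].access(addr)) return;
            ++misses[l];
        }
    }
    // Split a reference of `size` bytes into the cache lines it touches.
    void memoryReference(std::uint64_t addr, std::size_t size) {
        if (size == 0) return;
        const std::uint64_t first = addr / lineBytes;
        const std::uint64_t last = (addr + size - 1) / lineBytes;
        for (std::uint64_t line = first; line <= last; ++line)
            lookupLine(line * lineBytes);
    }
};
```

Unlike the loop on the slide, this sketch iterates over line indices, so an unaligned access that straddles a line boundary touches both lines.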

39 Software & Services Group 39 Intel Thread Checker Detect data races Instrumentation –Memory operations –Synchronization operations Analysis –Use dynamic history of lock acquisition and release to form a partial order of memory references [Lamport 1978] –Unordered read/write and write/write pairs to same location are races Paul Petersen, Zhiqiang Ma

40 Software & Services Group 40 a documented data race in the art benchmark is detected

41 PinPlay: Workload capture and deterministic replay
Problem: Multi-threaded programs are inherently non-deterministic, making their analysis, simulation, and debugging very challenging
Solution: PinPlay, a Pin-based framework for capturing an execution of a multi-threaded program and replaying it deterministically under Pin
Harish Patil & Cristiano Pereira. Joint work with Brad Calder, UCSD
(Diagram: the logger runs the application and its input under Pin and produces PinPlay logs; the replayer runs under Pin from the logs. Deterministic replay on any machine; app and input not needed once we have the log.)

42 Logging to provide deterministic behavior
Start with checkpoint: memory image of code and data
A thread is deterministic if every load sees either:
- Data from the original checkpoint
- Or a value computed and stored on the thread
Potential non-determinism when a load sees a memory location written by an external agent
- Another thread
- Or system call, DMA, etc.
Log these values with timestamps
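The logging rule above can be illustrated with a toy per-thread logger. This is a simplified sketch under stated assumptions, not PinPlay's implementation: memory is modeled as a map from address to value, the timestamp is a per-thread operation counter, and ThreadLogger and LogEntry are hypothetical names. A load is logged only when the value it observes differs from what the thread itself could predict from the checkpoint and its own stores.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <unordered_map>
#include <utility>
#include <vector>

struct LogEntry {
    std::uint64_t timestamp;  // per-thread operation count at the load
    std::uint64_t addr;
    std::uint64_t value;      // value injected by the external agent
};

class ThreadLogger {
public:
    explicit ThreadLogger(std::unordered_map<std::uint64_t, std::uint64_t> checkpoint)
        : view_(std::move(checkpoint)) {}

    // The thread's own stores are deterministic: just update its view.
    void onStore(std::uint64_t addr, std::uint64_t value) {
        ++clock_;
        view_[addr] = value;
    }

    // A load is logged only if it sees something the thread cannot explain
    // from the checkpoint plus its own stores.
    void onLoad(std::uint64_t addr, std::uint64_t observed) {
        ++clock_;
        auto it = view_.find(addr);
        if (it == view_.end() || it->second != observed) {
            log_.push_back(LogEntry{clock_, addr, observed});
            view_[addr] = observed;  // later loads of this value are predictable
        }
    }

    std::size_t logSize() const { return log_.size(); }
    const LogEntry& entry(std::size_t i) const {
        assert(i < log_.size());
        return log_[i];
    }

private:
    std::unordered_map<std::uint64_t, std::uint64_t> view_;
    std::vector<LogEntry> log_;
    std::uint64_t clock_ = 0;
};
```

During replay, a log like this would let a replayer inject each recorded value at the recorded point, so the thread sees the same data without the other threads or the original input.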

43 Applying multi-threaded tracing to software tools
Debugging: customers are interested in debugging tools derived from PinPlay
–Capture bug at customer, bring home log to debug
–Capture multi-threaded “heisenbug”, replay multiple times
–How: combine PinPlay tracing with transparent debugging
[Diagram: the PinPlay logs feed the replayer (Pin), whose debug agent talks to the debugger over a standard protocol. The Pin debug agent enables custom debugger commands.]

44 Reducing Instrumentation Overhead
Total Overhead = Pin Overhead + Pintool Overhead
–Pin Overhead: ~5% for SPECfp and ~50% for SPECint. The Pin team’s job is to minimize this
–Pintool Overhead: usually much larger than the Pin overhead. Pintool writers can help minimize this!

45 Reducing the Pintool’s Overhead
Pintool’s Overhead = Instrumentation Routines Overhead + Analysis Routines Overhead
Analysis Routines Overhead = Frequency of calling an Analysis Routine x Work required in the Analysis Routine
Work required in the Analysis Routine = Work required for transiting to Analysis Routine + Work done inside Analysis Routine

46 Reducing Work in Analysis Routines
Key: Shift computation from analysis routines to instrumentation routines whenever possible
This usually yields the largest speedup

47 Counting control flow edges
[Figure: control-flow graph of call, jne, jmp, and ret instructions whose edges are labeled with execution counts (100, 60, 40, 60, 40, 1)]

48 Edge Counting: a Slower Version
...
// Analysis
void docount2(ADDRINT src, ADDRINT dst, INT32 taken)
{
    COUNTER *pedg = Lookup(src, dst);  // lookup on every executed branch
    pedg->count += taken;
}
// Instrumentation
void Instruction(INS ins, void *v)
{
    if (INS_IsBranchOrCall(ins))
    {
        INS_InsertCall(ins, IPOINT_BEFORE, (AFUNPTR)docount2, IARG_INST_PTR, IARG_BRANCH_TARGET_ADDR, IARG_BRANCH_TAKEN, IARG_END);
    }
}
...

49 Edge Counting: a Faster Version
// Analysis
// Direct branches: the counter was already looked up at instrumentation time
void docount(COUNTER *pedg, INT32 taken)
{
    pedg->count += taken;
}
// Indirect branches: the target is only known at run time, so look it up on each execution
void docount2(ADDRINT src, ADDRINT dst, INT32 taken)
{
    COUNTER *pedg = Lookup(src, dst);
    pedg->count += taken;
}
// Instrumentation
void Instruction(INS ins, void *v)
{
    if (INS_IsDirectBranchOrCall(ins))
    {
        COUNTER *pedg = Lookup(INS_Address(ins), INS_DirectBranchOrCallTargetAddress(ins));
        INS_InsertCall(ins, IPOINT_BEFORE, (AFUNPTR)docount, IARG_ADDRINT, pedg, IARG_BRANCH_TAKEN, IARG_END);
    }
    else
        INS_InsertCall(ins, IPOINT_BEFORE, (AFUNPTR)docount2, IARG_INST_PTR, IARG_BRANCH_TARGET_ADDR, IARG_BRANCH_TAKEN, IARG_END);
}
…
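To see why the faster version helps, the effect can be reproduced without Pin. The following is a minimal stand-alone sketch, assuming a hypothetical `EdgeTable` in place of the slides' `Lookup()` and plain loops in place of instrumented execution; it only counts how many table lookups each strategy performs.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <utility>
#include <vector>

struct Counter { std::uint64_t count = 0; };

// Hypothetical stand-in for the slides' Lookup(): maps (src, dst) to a counter
// and records how many lookups were performed.
class EdgeTable {
public:
    Counter* Lookup(std::uint64_t src, std::uint64_t dst) {
        ++lookups_;
        // std::map keeps element addresses stable, so the pointer stays valid.
        return &table_[std::make_pair(src, dst)];
    }
    std::uint64_t Lookups() const { return lookups_; }
private:
    std::map<std::pair<std::uint64_t, std::uint64_t>, Counter> table_;
    std::uint64_t lookups_ = 0;
};

struct Branch { std::uint64_t src; std::uint64_t dst; };  // a direct branch

// Slower scheme: the "analysis routine" looks the edge up on every execution.
std::uint64_t RunSlow(EdgeTable& table, const Branch& b, const std::vector<bool>& outcomes) {
    std::uint64_t taken = 0;
    for (bool t : outcomes) {
        Counter* c = table.Lookup(b.src, b.dst);
        c->count += t ? 1 : 0;
        taken = c->count;
    }
    return taken;
}

// Faster scheme: one lookup at "instrumentation time", then the analysis
// routine only increments through the saved pointer.
std::uint64_t RunFast(EdgeTable& table, const Branch& b, const std::vector<bool>& outcomes) {
    Counter* c = table.Lookup(b.src, b.dst);  // done once per static branch
    for (bool t : outcomes) c->count += t ? 1 : 0;
    return c->count;
}
```

For a direct branch executed 100 times, `RunSlow` performs 100 lookups while `RunFast` performs one, and both report the same taken count.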

50 Analysis Routines: Reduce Call Frequency
Key: Instrument at the largest granularity whenever possible
–Instead of inserting one call per instruction
–Insert one call per basic block or trace

51 Slower Instruction Counting
[Figure: counter++ is inserted before every instruction]
sub $0xff, %edx
cmp %esi, %edx
jle
mov $0x1, %edi
add $0x10, %eax

52 Faster Instruction Counting
[Figure: the same five instructions (sub, cmp, jle, mov, add) instrumented in two ways]
Counting at BBL level: counter += 3 before the first basic block (sub, cmp, jle) and counter += 2 before the second (mov, add)
Counting at Trace level: counter += 5 for the whole trace, with counter += 3 annotated at the exit labeled L1
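The saving from coarser granularity can be shown with a small stand-alone model. This is only a sketch under stated assumptions: basic blocks are represented by their instruction counts, an execution is a sequence of block indices, and `CountPerInstruction` and `CountPerBbl` are hypothetical names. In a real Pintool the per-block constant would typically come from `BBL_NumIns()`.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

struct CountResult {
    std::uint64_t instructions;   // value of the instruction counter
    std::uint64_t analysisCalls;  // how many analysis calls were needed
};

// One analysis call (counter++) before every executed instruction.
CountResult CountPerInstruction(const std::vector<unsigned>& bblSizes,
                                const std::vector<std::size_t>& path) {
    CountResult r = {0, 0};
    for (std::size_t bbl : path) {
        for (unsigned i = 0; i < bblSizes[bbl]; ++i) {
            ++r.instructions;
            ++r.analysisCalls;
        }
    }
    return r;
}

// One analysis call (counter += size) at the top of every executed block.
CountResult CountPerBbl(const std::vector<unsigned>& bblSizes,
                        const std::vector<std::size_t>& path) {
    CountResult r = {0, 0};
    for (std::size_t bbl : path) {
        r.instructions += bblSizes[bbl];
        ++r.analysisCalls;
    }
    return r;
}
```

With blocks of 3 and 2 instructions, both schemes compute the same instruction count, but the per-block scheme makes one call per executed block instead of one per executed instruction.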

53 Reducing Work for Analysis Transitions
Reduce number of arguments to analysis routines
–Inline analysis routines
–Pass arguments in registers
–Instrumentation scheduling

54 Reduce Number of Arguments
Eliminate arguments only used for debugging
Instead of passing TRUE/FALSE, create 2 analysis functions
–Instead of inserting a call to: Analysis(BOOL val)
–Insert a call to one of these: AnalysisTrue() AnalysisFalse()
–IARG_CONTEXT is very expensive (> 10 arguments)
–Use the new IARG_CONST_CONTEXT

55 Inlining
Pin will inline analysis functions into jitted application code
// Inlinable: straight-line code
int docount0(int i) { x[i]++; return x[i]; }
// Not inlinable: contains a conditional branch
int docount1(int i) { if (i == 1000) x[i]++; return x[i]; }
// Not inlinable: calls another function
int docount2(int i) { x[i]++; printf("%d", i); return x[i]; }
// Not inlinable: contains a loop
void docount3() { for (int i = 0; i < 100; i++) x[i]++; }

56 Inlining
Use the -log_inline invocation switch to record inlining decisions in pin.log
pin -log_inline -t mytool -- app
Look in pin.log:
Analysis function (0x2a9651854c) from mytool.cpp:53 INLINED
Analysis function (0x2a9651858a) from mytool.cpp:178 NOT INLINED
The last instruction of the first BBL fetched is not a ret instruction
Look at the source or disassembly of the function in mytool.cpp at line 178:
0x0000002a9651858a push rbp
0x0000002a9651858b mov rbp, rsp
0x0000002a9651858e mov rax, qword ptr [rip+0x3ce2b3]
0x0000002a96518595 inc dword ptr [rax]
0x0000002a96518597 mov rax, qword ptr [rip+0x3ce2aa]
0x0000002a9651859e cmp dword ptr [rax], 0xf4240
0x0000002a965185a4 jnz 0x11
–The function could not be inlined because it contains a control-flow changing instruction (other than ret)

57 Conditional Inlining
Inline a common scenario where the analysis routine has a single “if-then”
–The “if” part is always executed
–The “then” part is rarely executed
–Useful cases:
1. “If” can be inlined, “then” cannot
2. “If” has a small number of arguments, “then” has many arguments (or IARG_CONST_CONTEXT)
Pintool writer breaks the analysis routine into two:
–INS_InsertIfCall (ins, …, (AFUNPTR)doif, …)
–INS_InsertThenCall (ins, …, (AFUNPTR)dothen, …)

58 IP-Sampling (a Slower Version)
const INT32 N = 10000;
const INT32 M = 5000;
INT32 icount = N;
VOID IpSample(VOID* ip)
{
    --icount;
    if (icount == 0)
    {
        fprintf(trace, "%p\n", ip);
        icount = N + rand() % M;  // icount is between N and N+M-1
    }
}
VOID Instruction(INS ins, VOID *v)
{
    INS_InsertCall(ins, IPOINT_BEFORE, (AFUNPTR)IpSample, IARG_INST_PTR, IARG_END);
}

59 IP-Sampling (a Faster Version)
// Inlined by Pin: straight-line code with no control flow
INT32 CountDown()
{
    --icount;
    return (icount == 0);
}
// Not inlined: calls fprintf and rand, but runs only once per sample
VOID PrintIp(VOID *ip)
{
    fprintf(trace, "%p\n", ip);
    icount = N + rand() % M;  // icount is between N and N+M-1
}
VOID Instruction(INS ins, VOID *v)
{
    // CountDown() is always called before an inst is executed
    INS_InsertIfCall(ins, IPOINT_BEFORE, (AFUNPTR)CountDown, IARG_END);
    // PrintIp() is called only if the last call to CountDown()
    // returns a non-zero value
    INS_InsertThenCall(ins, IPOINT_BEFORE, (AFUNPTR)PrintIp, IARG_INST_PTR, IARG_END);
}

60 Using Liveness Information
Use -xyzzy -liveness 1
–Safe only if the app does NOT have an exception handler that examines registers that are dead to the application
–Perf gain comes mainly from NOT having to save app flags

61 Jitting time
Jitting is expensive
–Takes far more time to jit an instruction than to execute a jitted instruction
Portions of a workload where very many IPs are jitted and executed only a small number of times
–Jitting time dominates execution time
–E.g. startup of a GUI app, or a compiler compiling a small file
Versus a loop executing a large number of times
–Jitting time is amortized over execution time

62 Optimizing Your Pintools - Summary
Baseline Pin has fairly low overhead for non-jitting portions of workloads (~5-20%)
Adding instrumentation can increase overhead significantly, but you can help!
1. Move work from analysis to instrumentation routines
2. Explore larger granularity instrumentation
3. Explore conditional instrumentation
4. Understand when Pin can inline instrumentation

63 OS Specifics

64 Windows-specific Challenges (1/2)
Handling system calls
–Pin must intercept system calls to regain control of the application on return from the system
–Pin must monitor system calls to notify instrumentation when DLLs are loaded/unloaded, threads are created/terminated, etc.
–System call interface is undocumented
–System call numbers potentially change with each system build
Handling exceptions and asynchronous interruptions
–To maintain control and notify instrumentation about control flow changes, Pin must intercept all transitions from kernel to user mode
–Windows is not designed to have an independent agent interposed between the kernel and application
–The kernel dispatches interruptions via (undocumented) entry points in ntdll.dll
The main obstacle: the direct interface between user-level code and the Windows kernel is undocumented

65 Windows-specific Challenges (2/2)
Injection
–PIN VMM is a DLL that must be loaded into the address space of the application to get initial control of the process
–Windows is not designed for a proprietary loader
–Common practice: intercept control at the entry point of the application
–Instrumentation cannot observe initialization procedures in statically linked application DLLs
–Injection presented in the introduction is referred to as Late Injection
–It misses the initialization procedures in statically linked application DLLs
–Early injection is not trivial
Isolation of instrumentation from the application
–Instrumentation runs in the same process as the application it is observing
–Enabling C run-time in the instrumentation causes sharing of system libraries (e.g. kernel32.dll) and their state with the application
–To be transparent, Pin must
–Preserve original state of system resources
–Avoid reentrant use of shared libraries
Pin minimizes its dependence on Windows system services in order to maximize observability and achieve better isolation

66 Pin Architecture in Windows
[Architecture diagram. Labels shown: PIN.EXE launcher (Injection Helper, Symbol Server, DBGHELP.DLL); CreateProcess at startup time; shared memory; application process containing Application Code and Data, KERNEL32.DLL, NTDLL.DLL, PINTOOL.DLL, and PINVM.DLL (Virtual Machine Monitor) with JIT Compiler, Code Cache, Event Dispatcher, Thread Dispatcher, System Call Emulator, and System Gate; Windows Kernel]

67 Injection
Injection is the procedure for loading PINVM.DLL into the address space of an application and gaining control of execution
Other systems hook the entry point of the application
–Too late: initialization procedures in application DLLs cannot be instrumented
For maximum observability, Pin should inject itself into a new process as early as possible, however…
Pin depends on some basic system services, so it is not possible to load PINVM.DLL until the loader and kernel32.dll have initialized
The optimal injection point: just after initialization of kernel32.dll
–Injection presented in the introduction is referred to as Late Injection
–It misses the initialization procedures in statically linked application DLLs

68 Launch of the Instrumented Process
pin -t pintool.dll -- application.exe
PIN.EXE (using the Debugging API):
–Create (suspended) application process
–Attach to the application as a debugger
–Run the application process until kernel32.dll is loaded and initialized
–Copy Boot Routine into the application process and set PC to start of the routine
–Detach from the application process
Pin Boot Routine:
–Load and start Pin VMM
–Load the instrumentation tool
PINVM.DLL:
–Instrument and execute the application
All application instructions are executed under Pin control
[Diagram labels: PIN.EXE; Windows Kernel; Application Process containing NTDLL.DLL, APPLICATION.EXE, APPLICATION.DLL, KERNEL32.DLL, PINVM.DLL, PINTOOL.DLL]

69 Handling System Calls
Pin must manage the execution of system calls
–To regain control when the system returns to user mode with a modified thread context
–To monitor and handle some important system events
–Loading DLLs, creation and termination of threads and processes, etc.
Pin intercepts system call instructions, not Win32 APIs
–Pin instruments all modules in the user space, including system libraries
–Some applications use the native API (NTDLL interface) directly, bypassing the Win32 API
–The Win32 API layer is very wide, while system call instructions are easy to discover
Three steps in managing system calls:
–Detect a system call and redirect control to VMM
–Execute the system call on behalf of the application
–Regain control when the kernel returns to user mode with a new context
–The system may interrupt system call execution by asynchronous calls to application procedures

70 System Call Interception
Pin detects system call instructions when it generates traces in the code cache
–IA-32: sysenter and int 2E; Intel64: syscall; etc.
–This is a static analysis, so the overhead is low
Pin executes system calls in the VMM, not in the code cache: it emits a jump to the VMM instead of the system call instruction
–Enables flushing the code cache while a system call blocks in the kernel
–VM lock is NOT held during the actual syscall
Some system calls may affect Pin’s internal state. To handle them properly, Pin must know the corresponding system call numbers
–Windows system call numbers are unpublished and potentially change with each system build
–Pin discovers system call numbers dynamically, at an early stage of the injection process
–We trace the corresponding NTDLL functions until a system call instruction is reached and then read the system call number from the EAX/RAX register
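The number-discovery idea can be illustrated with a deliberately simplified stand-in. Instead of tracing an NTDLL function as the slide describes, this sketch statically decodes a straight-line Intel64 stub and understands only three encodings (`mov r10, rcx`, `mov eax, imm32`, `syscall`); the function name `FindSyscallNumber` and the sample bytes are hypothetical, and real stubs may contain other instructions that this decoder rejects.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Walks a code stub and reports the value loaded into EAX at the point where
// a syscall instruction is reached. Returns false on any byte sequence the
// tiny decoder does not recognize, rather than guessing.
bool FindSyscallNumber(const std::vector<std::uint8_t>& code, std::uint32_t* number) {
    assert(number != nullptr);
    bool haveEax = false;
    std::uint32_t eax = 0;
    std::size_t i = 0;
    while (i < code.size()) {
        const std::size_t left = code.size() - i;
        if (left >= 3 && code[i] == 0x4C && code[i + 1] == 0x8B && code[i + 2] == 0xD1) {
            i += 3;  // 4C 8B D1: mov r10, rcx
        } else if (left >= 5 && code[i] == 0xB8) {
            // B8 id: mov eax, imm32 (immediate is little-endian)
            eax = static_cast<std::uint32_t>(code[i + 1]) |
                  (static_cast<std::uint32_t>(code[i + 2]) << 8) |
                  (static_cast<std::uint32_t>(code[i + 3]) << 16) |
                  (static_cast<std::uint32_t>(code[i + 4]) << 24);
            haveEax = true;
            i += 5;
        } else if (left >= 2 && code[i] == 0x0F && code[i + 1] == 0x05) {
            // 0F 05: syscall. EAX now holds the system call number.
            if (!haveEax) return false;
            *number = eax;
            return true;
        } else {
            return false;  // unknown instruction: give up
        }
    }
    return false;  // ran off the end without reaching a syscall
}
```

Given the hypothetical stub `4C 8B D1 B8 55 00 00 00 0F 05 C3`, the function reports 0x55. Executing the stub under trace, as Pin does, avoids needing a decoder that understands every instruction a real stub might contain.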

71 System Call Execution
The System Call Emulator executes all “known” system calls that may affect the VMM state, e.g. memory mappings, creation and termination of threads and processes, etc.
The remaining, unknown system calls are forwarded to the System Gate
–Per-thread procedure that transparently executes system calls and regains control upon return or interruption
–Fills/spills original context before/after system calls
–Recovers original context (PC) when a system call is interrupted
System Gate executes system calls “blindly”, assuming that each of them can arbitrarily modify context and control flow (if interrupted)
[Flow diagram labels: Original Code (sysenter); Code Cache (jmp VMM); Notify Pintool before system call; Is “known” system call? Y: System Call Emulator (Execute/Emulate, jmp ReturnFromSystemCall); N: System Gate (Switch to the application context, int 2e, Switch to the Pin context); ReturnFromSystemCall: Notify Pintool after system call]

72 User Procedure Calls (UPC)
UPC is a control transfer from the kernel to a user-level procedure
Asynchronous procedure call (APC)
–Asynchronous events: file I/O completion, timer expiration
–Thread initialization APC signals start of a new thread
Callback
–Asynchronous Windows GUI message
Exception
–Access violation, illegal instruction, divide by zero, etc.
Asynchronous events are not delivered immediately, but wait in a queue until the application invokes an interruptible (alertable) system call
Pin must intercept UPCs to maintain control of the application and recover the original interruption context (visible to the application)

73 UPC Interception
The kernel dispatches UPCs through entry points in NTDLL.DLL
To intercept UPCs, Pin overwrites the NTDLL entry points with trampolines that jump to the Event Dispatcher in Pin
When a UPC is intercepted, Pin recovers the original interruption context in the UPC frame prepared by the kernel
–JIT Compiler recovers context of exceptions that occurred in the code cache
–System Call Emulator recovers context of interrupted system calls
Pin intercepts all control transfers from the kernel to the user mode
[Diagram: the Windows kernel delivers APCs, callbacks, and exceptions through KiUserApcDispatcher, KiUserCallbackDispatcher, and KiUserExceptionDispatcher in NTDLL.DLL. The UPC Dispatcher in the Pin VMM handles each: APC (recover original PC in the APC frame), Callback (save PC of the interrupted system call), Exception (recover original context in the exception frame). The translated KiUserApcDispatcher, KiUserCallbackDispatcher, and KiUserExceptionDispatcher then run from the code cache.]

74 Exceptions (1/2)
Unlike APCs and callbacks, which are queued and delivered at the next alertable system call, exceptions are synchronous events
Exceptions do not necessarily cause abnormal termination of the process: the application may expect and handle exceptions
Pin must provide exception handlers with the same exception information that accompanies exceptions in the native application
–Exception context, code and exception-specific parameters
From Pin’s perspective, there are three kinds (sources) of exceptions in Windows applications:
–An attempt to fetch an invalid or inaccessible instruction
–An attempt to execute a faulting instruction
–Software exceptions generated by the application

75 Exceptions (2/2)
The decoder (fetcher) of instructions raises an exception if it encounters an invalid or inaccessible instruction
–When the kernel delivers this exception back to the user mode, Pin skips the context translation because it sees the original PC in the exception context
Other hardware exceptions occur in the code cache
–Recovery of the original exception context is nontrivial due to register allocation
–Pin retranslates the interrupted trace to get the virtual-physical register bindings at the faulting point
–Optimization: small cache of register bindings for frequent exceptions
Hardware exceptions may also occur in the tool code
–Pin provides APIs for the tool to manage its exceptions
Application can generate software exceptions using the Win32 API
–The exception context represents an original application state
–Context translation is not needed
Pin delivers precise exceptions to the application’s handlers

76 Multithreading Support
Pin instruments and runs all threads of the application from the first to the last user-mode instruction
–Attaches to a new thread when the system delivers the thread initialization APC
–Maintains control until the thread exits
–Intercepts threads created by remote processes
Pin’s threading activities are transparent to the application
–Pin VMM serializes some of its operations (e.g. JIT compilation), but never executes code of the application under Pin locks
–Except for the initialization phase, Pin never acquires locks in system libraries, e.g. loader lock or process heap lock
–Each thread has a shadow stack that is used by Pin VMM and Pintool

77 Thread-Local State
Key elements of Pin’s thread-local state:
–Spill area keeps values of spilled virtual registers
–JIT-compiled traces need fast access to spilled register values
–Pin steals one physical register to point to the spill area
–TEB state keeps original thread-local state of system libraries, e.g. last Win32 error value, stack limit
–C run-time routines may access/modify these values
–Need to preserve the original state while running in Pin VMM or Pintool
–System call state contains information about active and interrupted system calls in the thread
–The information is used to restore the original context on return from the system
Pin steals one TLS slot from the application to enable fast access to the thread-local data in Pin VMM

78 Thread Suspension and Context Manipulation
A thread can suspend another thread and read/modify its context
–SuspendThread(), GetThreadContext(), SetThreadContext()
Pin must emulate the corresponding system calls to avoid deadlocks and transparency issues
–Target thread may hold a Pin lock
–The thread context is not original
–Suspended traces disable flushing the code cache
Solution: Force a thread to leave the code cache and wait until the thread reaches a safe point
Safe point = no locks, not in the code cache, accessible original context
–Unlink the suspended trace from successors and let it enter VMM
–Block the thread in the safe VMM point or in the System Gate
–Use thread-local data to store and access the original context associated with the safe point

79 Linux-specific Challenges (1/3)
Handling system calls
–Pin must intercept system calls to regain control of the application on return from the system
–Pin must monitor system calls to notify instrumentation when shared libraries are loaded/unloaded, threads are created/terminated, etc.
–Some system calls may behave differently on different Linux distributions.
Signal handling
–Pin must identify whether the signal originated from the application, the tool or Pin itself.
–Pin must not appear to interfere with the application’s signal mask.

80 Linux-specific Challenges (2/3)
Injection
–Pin relies on the ptrace system call for injection. The ptrace system call has known bugs in several Linux versions.
–Some platforms do not allow a child to trace its parent. In such cases the application runs in the child, so its pid changes.
Isolation of instrumentation from the application
–Instrumentation runs in the same process as the application it is observing.
–Pin and the application share the same physical segment registers. In probe mode, this restricts the libc versions allowed.

81 Linux-specific Challenges (3/3)
GLIBC
–Pin must emulate several libc services, for several reasons. For example:
–Pin may run before libc is initialized, e.g. during injection.
–Pin’s libc may “get confused”. For example: libc’s getpid wrapper function incorporates a cache. The first call to getpid actually calls the getpid system call, but any subsequent calls access the cache. Upon a fork, the cache is invalidated and the next call to getpid again calls the getpid system call. Since the process has two copies of libc, only the application’s cache is invalidated and Pin’s copy is stale.
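The stale-cache problem described above can be modeled in a few lines. This is a toy simulation, not glibc or Pin code: `FakeKernel` and `LibcCopy` are hypothetical types, and the "fork" is represented simply by changing the kernel's pid and invalidating only the application's copy.

```cpp
#include <cassert>

// Hypothetical stand-in for the kernel: answers the getpid "system call".
struct FakeKernel {
    int pid;
    int getpidSyscalls;  // how many times the system call was really made
    int GetPid() { ++getpidSyscalls; return pid; }
};

// Hypothetical model of one copy of libc with a cached getpid wrapper.
class LibcCopy {
public:
    explicit LibcCopy(FakeKernel* kernel) : kernel_(kernel), cachedPid_(0), valid_(false) {}

    // First call goes to the kernel; later calls are served from the cache.
    int GetPid() {
        if (!valid_) {
            cachedPid_ = kernel_->GetPid();
            valid_ = true;
        }
        return cachedPid_;
    }

    // What the fork path does for the copy of libc that performed the fork.
    void InvalidateAfterFork() { valid_ = false; }

private:
    FakeKernel* kernel_;
    int cachedPid_;
    bool valid_;
};
```

If both copies have cached pid 100 and the child (pid 101) invalidates only the application's copy, the application's wrapper returns 101 while the second copy still returns 100. Bypassing the cached wrapper and asking the kernel directly, which is the effect of emulating the service, returns the correct value.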

82 Handling System Calls
Pin must manage the execution of system calls
–Pin must maintain control all the time
–System calls are executed inside Pin and return to the application
–In most cases the system call is executed without the Pin VM lock
–Certain system calls are emulated by Pin (see below)
System call emulation
–Pin detects if a system call needs emulation.
–Pin needs to know the attributes of each memory page for SMC support
–Therefore all system calls related to memory are emulated by Pin
–Signal-related system calls are emulated
–Creation of new threads and new child processes
–Setting/getting of the TLS segment registers
–Thread and process termination

83 Signal Handling
Pin registers its own signal handlers for all signals, and saves the application’s handlers.
Pin must handle both synchronous and asynchronous signals.
Asynchronous signals:
–These signals may be delivered “at will”, so Pin waits for a safe point to deliver them.
–When such a signal arrives, Pin’s internal handler registers this signal, unlinks the current trace and resumes execution from the code cache.
–At the trace’s exit point, the executing thread jumps to the VM, thus transferring control over to Pin. The VM checks if there are pending signals and calls the application’s original signal handlers for these signals (jitting them).
Synchronous signals:
–These signals must be delivered immediately.
–They may originate from the application, the tool or Pin itself.
–Pin’s internal handler is called; it determines the origin of the signal and propagates the signal delivery to the tool and application if necessary.
–If the signal is delivered to the application, the application’s signal handler is jitted.
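The deferral of asynchronous signals to a safe point can be sketched as a pending queue. This is an illustrative model only, assuming a hypothetical `DeferredSignals` class: real signal handlers, trace unlinking, and jitting of the application's handler are all outside its scope.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <map>
#include <vector>

// Hypothetical model: the internal handler only records the signal; the
// application's saved handler runs later, at a safe point.
class DeferredSignals {
public:
    // Save the application's handler instead of installing it directly.
    void SetAppHandler(int sig, std::function<void(int)> handler) {
        handlers_[sig] = handler;
    }

    // Internal handler for an asynchronous signal: just note that it arrived.
    void OnAsyncSignal(int sig) { pending_.push_back(sig); }

    // Called when the thread reaches a safe point (for example at a trace
    // exit): run the saved handlers for all pending signals, in arrival order.
    std::size_t DeliverAtSafePoint() {
        std::vector<int> toDeliver;
        toDeliver.swap(pending_);
        std::size_t delivered = 0;
        for (int sig : toDeliver) {
            std::map<int, std::function<void(int)> >::iterator it = handlers_.find(sig);
            if (it != handlers_.end() && it->second) {
                it->second(sig);
                ++delivered;
            }
        }
        return delivered;
    }

    std::size_t PendingCount() const { return pending_.size(); }

private:
    std::map<int, std::function<void(int)> > handlers_;
    std::vector<int> pending_;
};
```

The point of the design is that the application's handler never runs in the middle of a trace: it only runs once control has returned to a state where the original application context is available.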

84 Multithreading Support
Pin instruments and runs all threads of the application from the first to the last user-mode instruction
–Attaches to the thread upon the first user-space instruction
–Maintains control until the thread exits
Pin’s threading activities are transparent to the application
–The Pin VM serializes some of its operations (e.g. JIT compilation), but never executes code of the application under Pin locks
–Each thread has a shadow stack that is used by the Pin VM and the Pintool
–Pin and Pintools are prohibited from using the pthread library due to conflicts with some internal structures. Therefore Pin provides its own APIs for thread creation and control.

85 Thread-Local Storage (Linux)
JIT mode – segment virtualization
–TLS is accessed via the fs (64 bit) or gs (32 bit) segment register.
–Both the application and Pin share this register, but expect different values.
–Pin emulates the application’s usage of the fs/gs register, thus isolating the application’s TLS from Pin’s.
Probe mode – no TLS usage by Pin
–Probe mode does not enable the method described above.
–Pin uses its own version of GLIBC which does not use TLS.

86 Isolation/Windows
Pintools are compiled to use the static CRT
Pin on Windows does not separate DLLs loaded by the tool from the application DLLs: it uses the same system loader.
–The tool should not load any DLL that can be shared with the application.
–The tool should avoid static links to any common DLL, except for those listed in PIN_COMMON_LIBS (see source\tools\ms.flags file).

87 Isolation/Windows
Pin on Windows guarantees safe usage of C/C++ run-time services in Pintools, including indirect calls to the Windows API through the C run-time library.
–Any other use of the Windows API in a Pintool is not guaranteed to be safe
Pin uses some base types that conflict with Windows types. If you use "windows.h", you may see compilation errors. So do:
namespace WINDOWS
{
#include <windows.h>
}

88 Isolation/Linux
Pin is injected into the application’s address space and has its own copy of the dynamic loader and runtime libraries (GLIBC, etc).
Pin uses a small CRT library for direct calls to system calls.
The process has a single signal table (shared among all threads), so Pin manages an internal signal table and emulates all the system calls related to signals.

89 Isolation/Linux
pthread functions cannot be called from an analysis or replacement routine
Pintools on Linux need to take care when calling standard C or C++ library routines from analysis or replacement functions, because the C and C++ libraries linked into Pintools are not thread-safe

90 Part 3: Deeper into Pin API
Agenda
–memtrace_simple tool
–membuffer_simple tool
–branchbuffer_simple tool
–Symbols DebugInfo
–Probe-Mode
–Multi-Threading
–Taint analysis

91 memtrace_simple
Tool code collects pairs of {appIP, memAddr} of memory accessing instructions into a per-thread buffer.
–The buffer is processed when there is no more room in it

92 memtrace_simple
Tool code must
–Instrument each memory accessing instruction
–Determine where in the buffer the {appIP, memAddr} of the instruction should be written
–Determine when the buffer becomes full
Will instrument instructions at the Trace level, i.e. TRACE_AddInstrumentFunction(Trace, 0);
–Not all instructions in the trace will necessarily execute each time the trace is executed, because of early exits.
Will try to allocate, in the buffer, the maximum space needed by the trace at the trace start. If there is not enough space, the buffer is full

93 memtrace_simple
Instrumentation code for each memory accessing instruction in the trace will write its {appIP, memAddr} pair to a constant offset from the start of the trace in the buffer.
–Empty pairs (those instructions that were NOT executed) will be denoted by having an appIP==0.

94 memtrace_simple
[Diagram: a trace with an early exit and a trace exit, made of non memory access instructions and memory access instructions, each memory access instruction preceded by its instrumentation code. Alongside it, the buffer holds {appIP, memAddr} slots; endOfTraceReg and endOfBufferReg point into it, and TotalSizeOccupiedByTraceInBuffer marks the space the trace needs.]
At the start of the trace:
If endOf(Previous)TraceReg + TotalSizeOccupiedByTraceInBuffer > endOfBufferReg Then Call BufferFull
endOfTraceReg += TotalSizeOccupiedByTraceInBuffer

95 memtrace_simple
[Diagram: same layout as the previous slide, now also showing endOfTraceReg after it has been advanced by TotalSizeOccupiedByTraceInBuffer]

96 memtrace_simple
Tool will:
–Iterate through all INSs of the Trace
–Record which ones need to be instrumented (access memory)
–Record the ins, the memop, and the offset from the start of the trace in the buffer where the {appIP, memAddr} pair of this ins should be written
–Get a sum of the TotalSizeOccupiedByTraceInBuffer
–Insert the IF-THEN sequence at the beginning of the trace
–Insert the update of endOfTraceReg just after the IF-THEN sequence
–Iterate through recorded (memory accessing) INSs of the Trace
–Insert the instrumentation code before each recorded memory accessing instruction
–This is the code that writes the {appIP, memAddr} pair into the buffer at the designated offset (from start of trace) for this INS.
–endOfTraceReg and endOfBufferReg are virtual registers allocated by Pin to the Pintool.
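The buffer bookkeeping in the steps above can be tried out without Pin. The following is a stand-alone sketch under stated assumptions: `TraceBuffer`, `BeginTrace`, and `Record` are hypothetical names, slot indices stand in for the endOfTraceReg and endOfBufferReg pointers, "processing" just counts valid pairs, and clearing the slots at allocation time is a choice made here for the sketch rather than something the slides specify.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

struct MemRef {
    std::uintptr_t appIP;    // 0 marks a slot whose instruction did not execute
    std::uintptr_t memAddr;
};

class TraceBuffer {
public:
    explicit TraceBuffer(std::size_t capacity)
        : slots_(capacity, MemRef{0, 0}), endOfTrace_(0), processed_(0), flushes_(0) {}

    // Runs at the start of a trace. Returns the index of the trace's first slot.
    std::size_t BeginTrace(std::size_t slotsNeeded) {
        assert(slotsNeeded <= slots_.size());
        // IF there is no room for the whole trace THEN process the buffer.
        if (endOfTrace_ + slotsNeeded > slots_.size()) BufferFull();
        const std::size_t start = endOfTrace_;
        // Clear the slots so that instructions skipped by an early exit
        // leave appIP == 0 behind.
        for (std::size_t i = 0; i < slotsNeeded; ++i) slots_[start + i] = MemRef{0, 0};
        endOfTrace_ += slotsNeeded;  // allocate the maximum space the trace needs
        return start;
    }

    // Runs before a memory accessing instruction: write its pair at the
    // constant offset assigned to it at instrumentation time.
    void Record(std::size_t traceStart, std::size_t offset,
                std::uintptr_t appIP, std::uintptr_t memAddr) {
        assert(traceStart + offset < endOfTrace_);
        slots_[traceStart + offset] = MemRef{appIP, memAddr};
    }

    // Process all valid pairs, then reset to the beginning of the buffer.
    void BufferFull() {
        for (std::size_t i = 0; i < endOfTrace_; ++i)
            if (slots_[i].appIP != 0) ++processed_;
        endOfTrace_ = 0;
        ++flushes_;
    }

    std::size_t Processed() const { return processed_; }
    std::size_t Flushes() const { return flushes_; }

private:
    std::vector<MemRef> slots_;
    std::size_t endOfTrace_;  // plays the role of endOfTraceReg
    std::size_t processed_;
    std::size_t flushes_;
};
```

With a 4-slot buffer, a trace that reserves 3 slots but exits early after one access, followed by a trace that needs 2 slots, triggers one flush that processes a single valid pair. Reserving the maximum up front is what lets each instruction write to a fixed offset with no per-instruction bounds check.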

97 memtrace_simple
TLS_KEY appThreadRepresentitiveKey;  // Pin TLS key
REG endOfTraceInBufferReg;  // Pin virtual reg that will hold the pointer to the end of the trace data in the buffer
REG endOfBufferReg;  // Pin virtual reg that will hold the pointer to the end of the buffer
// Structure of the {appIP, memAddr} pair of a memory accessing ins in the buffer
struct MEMREF
{
    ADDRINT appIP;
    ADDRINT memAddr;
};
int main(int argc, char * argv[])
{
    PIN_Init(argc, argv);
    // Pin TLS slot for holding the object that represents the application thread
    appThreadRepresentitiveKey = PIN_CreateThreadDataKey(0);
    // Get the registers to be used in each thread for managing the per-thread buffer
    endOfTraceInBufferReg = PIN_ClaimToolRegister();
    endOfBufferReg = PIN_ClaimToolRegister();
    TRACE_AddInstrumentFunction(TraceAnalysisCalls, 0);
    PIN_AddThreadStartFunction(ThreadStart, 0);
    PIN_AddThreadFiniFunction(ThreadFini, 0);
    PIN_AddFiniFunction(Fini, 0);
    PIN_StartProgram();  // never returns
    return 0;
}

98 memtrace_simple
KNOB<UINT32> KnobNumBytesInBuffer(KNOB_MODE_WRITEONCE, "pintool", "num_bytes_in_buffer", "0x100000", "number of bytes in buffer");
APP_THREAD_REPRESENTITVE::APP_THREAD_REPRESENTITVE(THREADID myTid)
{
    _buffer = new char[KnobNumBytesInBuffer.Value()];  // Allocate the buffer
    _numBuffersFilled = 0;
    _numElementsProcessed = 0;
    _myTid = myTid;
}
char * APP_THREAD_REPRESENTITVE::Begin() { return _buffer; }
char * APP_THREAD_REPRESENTITVE::End() { return _buffer + KnobNumBytesInBuffer.Value(); }
// Pin callback on thread creation
VOID ThreadStart(THREADID tid, CONTEXT *ctxt, INT32 flags, VOID *v)
{
    // There is a new APP_THREAD_REPRESENTITVE object for every thread
    APP_THREAD_REPRESENTITVE * appThreadRepresentitive = new APP_THREAD_REPRESENTITVE(tid);
    // A thread will need to look up its APP_THREAD_REPRESENTITVE, so save pointer in Pin TLS
    PIN_SetThreadData(appThreadRepresentitiveKey, appThreadRepresentitive, tid);
    // Initialize endOfTraceInBufferReg to point at beginning of buffer
    PIN_SetContextReg(ctxt, endOfTraceInBufferReg, reinterpret_cast<ADDRINT>(appThreadRepresentitive->Begin()));
    // Initialize endOfBufferReg to point at end of buffer
    PIN_SetContextReg(ctxt, endOfBufferReg, reinterpret_cast<ADDRINT>(appThreadRepresentitive->End()));
}

99 Software & Services Group 99 memtrace_simple / Trace Instrmnt
void TraceAnalysisCalls(TRACE trace, void *) /*TRACE_AddInstrumentFunction(TraceAnalysisCalls, 0)*/
{
    // Go over all BBLs of the trace and for each BBL determine and record the INSs which need
    // to be instrumented - i.e. the ins requires an analysis call
    TRACE_ANALYSIS_CALLS_NEEDED traceAnalysisCallsNeeded;
    for (BBL bbl = TRACE_BblHead(trace); BBL_Valid(bbl); bbl = BBL_Next(bbl))
        DetermineBBLAnalysisCalls(bbl, &traceAnalysisCallsNeeded);
    // If No memory accesses in this trace
    if (traceAnalysisCallsNeeded.NumAnalysisCallsNeeded() == 0) return;
    // APP_THREAD_REPRESENTITVE::CheckIfNoSpaceForTraceInBuffer will determine if there are NOT enough
    // available bytes in the buffer. If there are NOT then it returns TRUE and the BufferFull function is called
    TRACE_InsertIfCall(trace, IPOINT_BEFORE, AFUNPTR(APP_THREAD_REPRESENTITVE::CheckIfNoSpaceForTraceInBuffer),
                       IARG_FAST_ANALYSIS_CALL,
                       IARG_REG_VALUE, endOfTraceInBufferReg, // previous trace
                       IARG_REG_VALUE, endOfBufferReg,
                       IARG_UINT32, traceAnalysisCallsNeeded.TotalSizeOccupiedByTraceInBuffer(),
                       IARG_END);
    TRACE_InsertThenCall(trace, IPOINT_BEFORE, AFUNPTR(APP_THREAD_REPRESENTITVE::BufferFull),
                         IARG_FAST_ANALYSIS_CALL,
                         IARG_REG_VALUE, endOfTraceInBufferReg,
                         IARG_THREAD_ID,
                         IARG_RETURN_REGS, endOfTraceInBufferReg,
                         IARG_END);
    TRACE_InsertCall(trace, IPOINT_BEFORE, AFUNPTR(APP_THREAD_REPRESENTITVE::AllocateSpaceForTraceInBuffer),
                     IARG_FAST_ANALYSIS_CALL,
                     IARG_REG_VALUE, endOfTraceInBufferReg,
                     IARG_UINT32, traceAnalysisCallsNeeded.TotalSizeOccupiedByTraceInBuffer(),
                     IARG_RETURN_REGS, endOfTraceInBufferReg,
                     IARG_END);
    // Insert Analysis Calls for each INS on the trace that was recorded as needing one
    traceAnalysisCallsNeeded.InsertAnalysisCalls();
}
Trace level instrumentation:
–Record all INSs in the trace that access memory.
–At the start of the trace, insert an InsertIf call to CheckIfNoSpaceForTraceInBuffer, which checks whether there is NOT enough room in the buffer for the {appIp, memAddr} pair of each of the recorded INSs. CheckIfNoSpaceForTraceInBuffer returns 1 if there is NOT enough room in the buffer.
–In that case the BufferFull function is called to process all pairs in the buffer and to set endOfTraceInBufferReg to point to the beginning of the buffer.
–Next, insert a call to AllocateSpaceForTraceInBuffer, which allocates space in the buffer for the instrumentation of each memory accessing INS in the trace to write its {appIp, memAddr} pair. This is done by adding the size of the space needed to endOfTraceInBufferReg.
–Finally, iterate over all of the memory accessing INSs in the trace and insert a call to the analysis routine that records the {appIp, memAddr} pair into its allocated space in the buffer.
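The check / flush / reserve sequence above can be modelled outside Pin with plain pointer arithmetic. The following is a stand-alone simulation under stated assumptions, not Pin code: `ThreadBuffer`, `NoSpaceFor`, `ReserveForTrace` and the `flushes` counter are hypothetical names, and "processing" the buffer is reduced to counting flushes.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for one thread's buffer; not part of the Pin API.
struct ThreadBuffer {
    std::vector<char> storage;
    char* endOfTrace;     // plays the role of endOfTraceInBufferReg
    std::size_t flushes;  // how many times the buffer was "processed"

    explicit ThreadBuffer(std::size_t bytes)
        : storage(bytes), endOfTrace(storage.data()), flushes(0) {}

    char* Begin() { return storage.data(); }
    char* End() { return storage.data() + storage.size(); }

    // IF part: true when the records of the next trace do NOT fit.
    // Same test as endOfTrace + traceBytes >= End(), written on sizes
    // so that no out-of-range pointer is ever formed.
    bool NoSpaceFor(std::size_t traceBytes) {
        return traceBytes >= static_cast<std::size_t>(End() - endOfTrace);
    }

    // THEN part: process the buffer, then rewind to its beginning.
    void BufferFull() {
        ++flushes;
        endOfTrace = Begin();
    }

    // Runs at the start of every trace. Assumes traceBytes is smaller
    // than the whole buffer. Returns where the trace's records start.
    char* ReserveForTrace(std::size_t traceBytes) {
        if (NoSpaceFor(traceBytes)) BufferFull();
        endOfTrace += traceBytes;
        return endOfTrace - traceBytes;
    }
};
```

With a 64-byte buffer, two 16-byte traces are placed back to back; a following 32-byte trace would end exactly at the buffer end, which the `>=` comparison treats as "no room", so the buffer is flushed first.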

100 Software & Services Group 100 memtrace_simple
// Class for recording one memory accessing INS that will be instrumented
class ANALYSIS_CALL_INFO
{
  public:
    ANALYSIS_CALL_INFO(INS ins, UINT32 offsetFromTraceStartInBuffer, UINT32 memop) :
        _ins(ins), _offsetFromTraceStartInBuffer(offsetFromTraceStartInBuffer), _memop(memop) {}
    void InsertAnalysisCall(INT32 sizeofTraceInBuffer);
  private:
    INS _ins;
    INT32 _offsetFromTraceStartInBuffer;
    UINT32 _memop;
};
// Class for recording all the memory accessing INSs in the Trace
class TRACE_ANALYSIS_CALLS_NEEDED
{
  public:
    TRACE_ANALYSIS_CALLS_NEEDED() : _numAnalysisCallsNeeded(0), _currentOffsetFromTraceStartInBuffer(0) {}
    UINT32 NumAnalysisCallsNeeded() const { return _numAnalysisCallsNeeded; }
    UINT32 TotalSizeOccupiedByTraceInBuffer() const { return _currentOffsetFromTraceStartInBuffer; }
    void RecordAnalysisCallNeeded(INS ins, UINT32 memop)
    {
        _analysisCalls.push_back(ANALYSIS_CALL_INFO(ins, _currentOffsetFromTraceStartInBuffer, memop));
        _currentOffsetFromTraceStartInBuffer += sizeof(MEMREF);
        _numAnalysisCallsNeeded++;
    }
    void InsertAnalysisCalls();
  private:
    INT32 _currentOffsetFromTraceStartInBuffer;
    INT32 _numAnalysisCallsNeeded;
    vector<ANALYSIS_CALL_INFO> _analysisCalls;
};
// Called for each BBL in the Trace.
// Records each memory accessing INS in the BBL into the vector of ANALYSIS_CALL_INFO
void DetermineBBLAnalysisCalls(BBL bbl, TRACE_ANALYSIS_CALLS_NEEDED * traceAnalysisCallsNeeded)
{
    for (INS ins = BBL_InsHead(bbl); INS_Valid(ins); ins = INS_Next(ins))
    {
        // Iterate over each memory operand of the instruction.
        for (UINT32 memOp = 0; memOp < INS_MemoryOperandCount(ins); memOp++)
            // Record that an analysis call is needed, along with the info needed to generate the analysis call
            traceAnalysisCallsNeeded->RecordAnalysisCallNeeded(ins, memOp);
    }
}


102 Software & Services Group 102 memtrace_simple
Analysis functions inserted at start of each trace.
// IF call: determines if there is NOT enough room in the buffer for the {appIp, memAddr} pairs of all the
// memory accessing INSs in the trace. Inserted and executed at the beginning of each trace. Inlined by Pin.
// Returns 1 if there is NOT enough room, 0 if there is.
static ADDRINT PIN_FAST_ANALYSIS_CALL APP_THREAD_REPRESENTITVE::CheckIfNoSpaceForTraceInBuffer ( // Pin will inline this function
    char * endOfPreviousTraceInBuffer, char * bufferEnd, ADDRINT totalSizeOccupiedByTraceInBuffer)
{
    return (endOfPreviousTraceInBuffer + totalSizeOccupiedByTraceInBuffer >= bufferEnd);
}
// THEN call: processes the buffer and sets endOfTraceInBufferReg to the beginning of the buffer.
// Inserted at the beginning of each trace, just AFTER the IF call. Executed only when the IF function returns 1.
static char * PIN_FAST_ANALYSIS_CALL APP_THREAD_REPRESENTITVE::BufferFull ( // Pin will NOT inline this function
    char *endOfTraceInBuffer, ADDRINT tid)
{
    // Get this thread's APP_THREAD_REPRESENTITVE from the Pin TLS
    APP_THREAD_REPRESENTITVE * appThreadRepresentitive =
        static_cast<APP_THREAD_REPRESENTITVE *>(PIN_GetThreadData(appThreadRepresentitiveKey, tid));
    appThreadRepresentitive->ProcessBuffer(endOfTraceInBuffer);
    // After processing the buffer, move the endOfTraceInBuffer back to the beginning of the buffer
    endOfTraceInBuffer = appThreadRepresentitive->Begin();
    return endOfTraceInBuffer;
}
// Inserted at the beginning of each trace, just after the THEN function. Executed each time the trace executes.
// Inlined by Pin. Allocates space in the buffer for the {appIp, memAddr} pairs of all of the memory accessing
// INSs in the trace.
static char * PIN_FAST_ANALYSIS_CALL APP_THREAD_REPRESENTITVE::AllocateSpaceForTraceInBuffer ( // Pin will inline this function
    char * endOfPreviousTraceInBuffer, ADDRINT totalSizeOccupiedByTraceInBuffer)
{
    return (endOfPreviousTraceInBuffer + totalSizeOccupiedByTraceInBuffer);
}


104 Software & Services Group 104 memtrace_simple
static void PIN_FAST_ANALYSIS_CALL APP_THREAD_REPRESENTITVE::RecordMEMREFInBuffer ( // Pin will inline this function
    char* endOfTraceInBuffer, ADDRINT offsetFromEndOfTrace, ADDRINT appIp, ADDRINT memAddr)
{
    *reinterpret_cast<ADDRINT *>(endOfTraceInBuffer + offsetFromEndOfTrace) = appIp;
    *reinterpret_cast<ADDRINT *>(endOfTraceInBuffer + offsetFromEndOfTrace + sizeof(ADDRINT)) = memAddr;
}
void ANALYSIS_CALL_INFO::InsertAnalysisCall(INT32 sizeofTraceInBuffer)
{
    /* the place in the buffer where the {appIp, memAddr} of this _ins should be recorded is computed by:
       endOfTraceInBufferReg - sizeofTraceInBuffer + _offsetFromTraceStartInBuffer (of this _ins) */
    INS_InsertCall(_ins, IPOINT_BEFORE, AFUNPTR(APP_THREAD_REPRESENTITVE::RecordMEMREFInBuffer),
                   IARG_FAST_ANALYSIS_CALL,
                   IARG_REG_VALUE, endOfTraceInBufferReg,
                   IARG_ADDRINT, ADDRINT(_offsetFromTraceStartInBuffer - sizeofTraceInBuffer),
                   IARG_INST_PTR,
                   IARG_MEMORYOP_EA, _memop,
                   IARG_END);
}
void TRACE_ANALYSIS_CALLS_NEEDED::InsertAnalysisCalls()
{
    // Iterate over the recorded ANALYSIS_CALL_INFO elements and insert the analysis call for each
    for (vector<ANALYSIS_CALL_INFO>::iterator c = _analysisCalls.begin(); c != _analysisCalls.end(); c++)
        c->InsertAnalysisCall(TotalSizeOccupiedByTraceInBuffer());
}
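The address arithmetic in InsertAnalysisCall (end of trace, minus the trace size, plus the record's offset from the trace start) can be checked with a small stand-alone sketch. `MemRef`, `OffsetFromEndOfTrace` and `RecordMemRef` are hypothetical names chosen for this illustration and only mirror the slide's logic; `std::memcpy` is used in place of the slide's pointer casts to keep the sketch free of alignment assumptions.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical record layout: one {appIp, memAddr} pair.
struct MemRef {
    std::uintptr_t appIp;
    std::uintptr_t memAddr;
};

// Signed offset of record `index` relative to the end-of-trace pointer,
// for a trace that reserved room for `numRecords` records.
// Computed as (offset from trace start) - (total trace size), so it is negative.
inline std::ptrdiff_t OffsetFromEndOfTrace(std::size_t index, std::size_t numRecords) {
    const std::ptrdiff_t offsetFromStart = static_cast<std::ptrdiff_t>(index * sizeof(MemRef));
    const std::ptrdiff_t traceSize = static_cast<std::ptrdiff_t>(numRecords * sizeof(MemRef));
    return offsetFromStart - traceSize;
}

// Writes one record at endOfTrace + offset.
inline void RecordMemRef(char* endOfTrace, std::ptrdiff_t offset,
                         std::uintptr_t appIp, std::uintptr_t memAddr) {
    const MemRef r = {appIp, memAddr};
    std::memcpy(endOfTrace + offset, &r, sizeof r);
}
```

Because the offset is fixed at instrumentation time, each inlined analysis call needs only the current end-of-trace register value at run time.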

105 Software & Services Group 105 membuffer_simple
Since managing a per-thread buffer is a necessity for a large class of Pin tools, Pin provides APIs to make it easier.
The Pin Buffering API abstracts away the need for a Pin tool to manage per-thread buffers:
–PIN_DefineTraceBuffer: define a per-thread buffer that each application trace can write data to
–INS_InsertFillBuffer: instrumentation code is generated to write the desired data into the buffer; this code is inlined
–Tool defined BufferFull function: instrumentation code will cause this function to be called when the buffer becomes full

106 Software & Services Group 106 membuffer_simple
The Pin Buffering API actually works somewhat differently from memtrace:
–Instrumentation code will insert the data generated by an INS into the buffer immediately after the data generated by the previously executed instrumented INS
–Better buffer utilization
–Requires the instrumentation to update the next buffer location to write to; this was not required in the memtrace implementation
–All this is invisible to the Pin tool writer
membuffer_simple is a Pin tool that uses the Pin Buffering API to do the same memory access recording that memtrace_simple does
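The append-after-the-previous-record behaviour can be pictured with a small stand-alone model. This is only a sketch of the idea and makes no claim about how Pin implements its buffers: `AppendBuffer` and its callback type are hypothetical, and the capacity is assumed to be greater than zero.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <utility>
#include <vector>

// Hypothetical model of a buffer whose records are appended one after another.
template <typename Record>
class AppendBuffer {
public:
    using FullCallback = std::function<void(const Record*, std::size_t)>;

    // capacity must be > 0 (assumption of this sketch).
    AppendBuffer(std::size_t capacity, FullCallback onFull)
        : _records(capacity), _next(0), _onFull(std::move(onFull)) {}

    // Write the record right after the previous one and advance the
    // next-write position; hand the buffer to the callback once it is full.
    void Append(const Record& r) {
        _records[_next++] = r;
        if (_next == _records.size()) Flush();
    }

    // Hand over whatever is pending (for example when the owning thread exits).
    void Flush() {
        if (_next != 0) _onFull(_records.data(), _next);
        _next = 0;
    }

private:
    std::vector<Record> _records;
    std::size_t _next;
    FullCallback _onFull;
};
```

Compared with the per-trace reservation in memtrace_simple, nothing is left unused at the end of the buffer, at the cost of updating the next-write position on every record.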

107 Software & Services Group 107 membuffer_simple
KNOB<UINT32> KnobNumPagesInBuffer(KNOB_MODE_WRITEONCE, "pintool", "num_pages_in_buffer", "256", "number of pages in buffer");
// Struct of memory reference written to the buffer
struct MEMREF
{
    ADDRINT appIP;
    ADDRINT memAddr;
};
// The buffer ID returned by the one call to PIN_DefineTraceBuffer
BUFFER_ID bufId;
TLS_KEY appThreadRepresentitiveKey;
int main(int argc, char * argv[])
{
    PIN_Init(argc, argv);
    // Pin TLS slot for holding the object that represents an application thread
    appThreadRepresentitiveKey = PIN_CreateThreadDataKey(0);
    // Define the buffer that will be used; a buffer is allocated to each thread when the thread starts running
    bufId = PIN_DefineTraceBuffer(sizeof(struct MEMREF), KnobNumPagesInBuffer,
                                  BufferFull, // This Pin tool function will be called when buffer is full
                                  0);
    // The Instruction function will use the Pin Buffering API to insert the instrumentation code
    // that writes the MEMREF of a memory accessing INS into the buffer
    INS_AddInstrumentFunction(Instruction, 0);
    PIN_AddThreadStartFunction(ThreadStart, 0);
    PIN_AddThreadFiniFunction(ThreadFini, 0);
    PIN_AddFiniFunction(Fini, 0);
    PIN_StartProgram();
}

108 Software & Services Group 108 membuffer_simple
/*
 * Pin generates code to call this function when a buffer fills up, and executes a callback
 * to this function when the thread exits.
 * Pin will NOT inline this function.
 * @param[in] id          buffer handle
 * @param[in] tid         id of owning thread
 * @param[in] ctxt        application context
 * @param[in] buf         actual pointer to buffer
 * @param[in] numElements number of records
 * @param[in] v           callback value
 * @return A pointer to the buffer to resume filling.
 */
VOID * BufferFull(BUFFER_ID id, THREADID tid, const CONTEXT *ctxt, VOID *buf, UINT64 numElements, VOID *v)
{
    // retrieve the APP_THREAD_REPRESENTITVE* of this thread from the Pin TLS
    APP_THREAD_REPRESENTITVE * appThreadRepresentitive =
        static_cast<APP_THREAD_REPRESENTITVE *>(PIN_GetThreadData(appThreadRepresentitiveKey, tid));
    appThreadRepresentitive->ProcessBuffer(buf, numElements);
    return buf;
}

109 Software & Services Group 109 membuffer_simple
VOID Instruction(INS ins, VOID *v)
{
    UINT32 numMemOperands = INS_MemoryOperandCount(ins);
    // Iterate over each memory operand of the instruction.
    for (UINT32 memOp = 0; memOp < numMemOperands; memOp++)
    {
        // Add the instrumentation code to write the appIP and memAddr
        // of this memory operand into the buffer
        // Pin will inline the code that writes to the buffer
        INS_InsertFillBuffer(ins, IPOINT_BEFORE, bufId,
                             IARG_INST_PTR, offsetof(struct MEMREF, appIP),
                             IARG_MEMORYOP_EA, memOp, offsetof(struct MEMREF, memAddr),
                             IARG_END);
    }
}

110 Software & Services Group 110 branchbuffer_simple Use Pin Buffering API to collect a branch trace: –For each executed branch instruction record: –appIP of the branch instruction –targetAddress of the branch instruction –branchTaken boolean

111 Software & Services Group 111 branchbuffer_simple
KNOB<UINT32> KnobNumPagesInBuffer(KNOB_MODE_WRITEONCE, "pintool", "num_pages_in_buffer", "256", "number of pages in buffer");
struct BRANCH_INFO
{
    // This is the structure of the data that will be written into the buffer
    ADDRINT appIP;
    ADDRINT targetAddress;
    BOOL branchTaken;
};
int main(int argc, char *argv[])
{
    PIN_Init(argc, argv);
    bufId = PIN_DefineTraceBuffer(sizeof(BRANCH_INFO), KnobNumPagesInBuffer, BufferFull, 0);
    // Register function to be called to instrument traces
    TRACE_AddInstrumentFunction(Trace, 0);
    // Register function to be called when the application exits
    PIN_AddFiniFunction(Fini, 0);
    // Start the program, never returns
    PIN_StartProgram();
}

112 Software & Services Group 112 branchbuffer_simple
void Trace(TRACE tr, void* V) // TRACE_AddInstrumentFunction(Trace, 0);
{
    for (BBL bbl = TRACE_BblHead(tr); BBL_Valid(bbl); bbl = BBL_Next(bbl))
    {
        // The branch instruction, if it exists, will always be the last in the BBL
        if (INS_IsBranchOrCall(BBL_InsTail(bbl)))
        {
            INS_InsertFillBuffer(BBL_InsTail(bbl), IPOINT_BEFORE, bufId,
                                 IARG_INST_PTR, offsetof(BRANCH_INFO, appIP),
                                 IARG_BRANCH_TARGET_ADDR, offsetof(BRANCH_INFO, targetAddress),
                                 IARG_BRANCH_TAKEN, offsetof(BRANCH_INFO, branchTaken),
                                 IARG_END);
        }
    }
}

113 Software & Services Group 113 Symbols PIN_InitSymbols() –Pin will use whatever symbol information is available –Debug info in the app –Pdb files –Export Tables –On Windows uses dbghelp –See PIN_InitSymbolsAlt() for more control over which symbols will be used Use symbols to instrument/wrap/replace specific functions –wrap/replace: see malloc replacement examples in intro Access application debug information from a Pin tool

114 Software & Services Group 114 Symbols: Instrument malloc and free int main(int argc, char *argv[]) { // Initialize pin symbol manager PIN_InitSymbols(); // See also PIN_InitSymbolsAlt() for more control over which symbols are read PIN_Init(argc,argv); // Register the function ImageLoad to be called each time an image is loaded in the process // This includes the process itself and all shared libraries it loads (implicitly or explicitly) IMG_AddInstrumentFunction(ImageLoad, 0); // Never returns PIN_StartProgram(); }

115 Software & Services Group 115 Symbols: Instrument malloc and free
VOID ImageLoad(IMG img, VOID *v) // Pin Callback. IMG_AddInstrumentFunction(ImageLoad, 0);
{
    // Instrument the malloc() and free() functions. Print the input argument
    // of each malloc() or free(), and the return value of malloc().
    RTN mallocRtn = RTN_FindByName(img, "_malloc"); // Find the malloc() function.
    if (RTN_Valid(mallocRtn))
    {
        RTN_Open(mallocRtn);
        // Instrument malloc() to print the input argument value and the return value.
        RTN_InsertCall(mallocRtn, IPOINT_BEFORE, (AFUNPTR)MallocBefore,
                       IARG_FUNCARG_ENTRYPOINT_VALUE, 0, IARG_END);
        RTN_InsertCall(mallocRtn, IPOINT_AFTER, (AFUNPTR)MallocAfter,
                       IARG_FUNCRET_EXITPOINT_VALUE, IARG_END);
        RTN_Close(mallocRtn);
    }
    RTN freeRtn = RTN_FindByName(img, "_free"); // Find the free() function.
    if (RTN_Valid(freeRtn))
    {
        RTN_Open(freeRtn);
        // Instrument free() to print the input argument value.
        RTN_InsertCall(freeRtn, IPOINT_BEFORE, (AFUNPTR)FreeBefore,
                       IARG_FUNCARG_ENTRYPOINT_VALUE, 0, IARG_END);
        RTN_Close(freeRtn);
    }
}

116 Software & Services Group 116 Symbols: Instrument malloc
Handling name-mangling and multiple symbols at same address
VOID Image(IMG img, VOID *v) // IMG_AddInstrumentFunction(Image, 0);
{
    // Walk through the symbols in the symbol table.
    for (SYM sym = IMG_RegsymHead(img); SYM_Valid(sym); sym = SYM_Next(sym))
    {
        string undFuncName = PIN_UndecorateSymbolName(SYM_Name(sym), UNDECORATION_NAME_ONLY);
        if (undFuncName == "malloc") // Find the malloc function.
        {
            RTN mallocRtn = RTN_FindByAddress(IMG_LowAddress(img) + SYM_Value(sym));
            if (RTN_Valid(mallocRtn))
            {
                RTN_Open(mallocRtn);
                // Instrument to print the input argument value and the return value.
                RTN_InsertCall(mallocRtn, IPOINT_BEFORE, (AFUNPTR)MallocBefore,
                               IARG_FUNCARG_ENTRYPOINT_VALUE, 0, IARG_END);
                RTN_InsertCall(mallocRtn, IPOINT_AFTER, (AFUNPTR)MallocAfter,
                               IARG_FUNCRET_EXITPOINT_VALUE, IARG_END);
                RTN_Close(mallocRtn);
            }
        }
    }
}

117 Software & Services Group 117 Symbols: Accessing Application Debug Info from a Pin Tool: Catch a Memory Overwrite
VOID Instruction(INS ins, VOID *v) // INS_AddInstrumentFunction(Instruction, 0);
{
    UINT32 numMemOperands = INS_MemoryOperandCount(ins);
    // Iterate over each memory operand of the instruction.
    for (UINT32 memOp = 0; memOp < numMemOperands; memOp++)
    {
        if (INS_MemoryOperandIsWritten(ins, memOp))
        {
            // Insert instrumentation code to catch a memory overwrite
            INS_InsertIfCall(ins, IPOINT_BEFORE, AFUNPTR(AnalyzeMemWrite),
                             IARG_FAST_ANALYSIS_CALL,
                             IARG_MEMORYOP_EA, memOp,
                             IARG_MEMORYWRITE_SIZE,
                             IARG_END);
            INS_InsertThenCall(ins, IPOINT_BEFORE, AFUNPTR(MemoryOverWriteAt),
                               IARG_FAST_ANALYSIS_CALL,
                               IARG_INST_PTR,
                               IARG_MEMORYOP_EA, memOp,
                               IARG_MEMORYWRITE_SIZE,
                               IARG_END);
        }
    }
}

118 Software & Services Group 118 Symbols: Accessing Application Debug Info from a Pin Tool
KNOB<ADDRINT> KnobMemAddrBeingOverwritten(KNOB_MODE_WRITEONCE, "pintool", "mem_overwrite_addr", "256", "overwritten memaddr");
static ADDRINT PIN_FAST_ANALYSIS_CALL AnalyzeMemWrite ( // Pin will inline this function, it is the IF part
    ADDRINT memWriteAddr, UINT32 numBytesWritten)
{
    // return 1 if this memory write overwrites the address specified by KnobMemAddrBeingOverwritten
    return (memWriteAddr <= KnobMemAddrBeingOverwritten.Value() &&
            (memWriteAddr + numBytesWritten) > KnobMemAddrBeingOverwritten.Value());
}
static VOID PIN_FAST_ANALYSIS_CALL MemoryOverWriteAt ( // Pin will NOT inline this function, it is the THEN part
    ADDRINT appIP, ADDRINT memWriteAddr, UINT32 numBytesWritten)
{
    INT32 column, lineNum;
    string fileName;
    PIN_GetSourceLocation(appIP, &column, &lineNum, &fileName);
    printf("overwrite of %p from instruction at %p originating from file %s line %d col %d\n",
           reinterpret_cast<void *>(KnobMemAddrBeingOverwritten.Value()), reinterpret_cast<void *>(appIP),
           fileName.c_str(), lineNum, column);
    printf(" writing %d bytes starting at %p\n", numBytesWritten, reinterpret_cast<void *>(memWriteAddr));
}
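The IF-part predicate is a half-open interval test: a write of numBytesWritten bytes starting at memWriteAddr covers the watched address when the address lies in [memWriteAddr, memWriteAddr + numBytesWritten). A minimal stand-alone sketch of that test follows; `WriteCoversAddress` is a hypothetical name, and the comparison is rearranged (an assumption of this sketch, not the slide's form) so that the sum cannot wrap around at the top of the address space.

```cpp
#include <cassert>
#include <cstdint>

// True when the write [writeAddr, writeAddr + numBytes) touches `watched`.
// Equivalent to: writeAddr <= watched && writeAddr + numBytes > watched,
// but written as a subtraction to avoid overflow in the addition.
inline bool WriteCoversAddress(std::uint64_t writeAddr, std::uint32_t numBytes,
                               std::uint64_t watched) {
    return writeAddr <= watched && (watched - writeAddr) < numBytes;
}
```

A 4-byte write at 0x100 covers 0x100 through 0x103 but not 0x104, and a write that starts after the watched address never matches.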

119 Software & Services Group 119 Probe Mode
JIT Mode
–Pin creates a modified copy of the application on-the-fly
–Original code never executes
–More flexible, more common approach
Probe Mode
–Pin modifies the original application instructions
–Inserts jumps to instrumentation code (trampolines)
–Lower overhead (less flexible) approach

120 Software & Services Group 120 Pin Probe-Mode Probe mode is a method of using Pin to instrument at the function level only. Wrap, Replace, call Analysis function before/after. Replacement or Wrapping function can call the replaced (original) function. The application and the replacement routine are run natively (not Jitted). –Faster than Jit-mode –Puts more responsibility on the tool writer. –Probes can only be placed on RTN boundaries –Must be inserted within the Image load callback. –Pin will automatically remove the probes when an image is unloaded. Many of the PIN APIs that are available in JIT mode are not available in Probe mode.

121 Software & Services Group 121 A Sample Probe
–A probe is a jump instruction that overwrites original instruction(s) in the application
–Instrumentation invoked with probes
–Pin copies/translates original bytes so probed (replaced) functions can be called from the replacement function
Original function entry point:
0x400113d4: push %ebp
0x400113d5: mov %esp,%ebp
0x400113d7: push %edi
0x400113d8: push %esi
0x400113d9: push %ebx
Entry point overwritten with probe:
0x400113d4: jmp 0x41481064
0x400113d9: push %ebx
Tool wrapper function (target of the probe):
0x41481064: push %ebp // tool wrapper func
::::::::::::::::::::
0x414827fe: call 0x50000004 // call original func
Copy of entry point with original bytes:
0x50000004: push %ebp
0x50000005: mov %esp,%ebp
0x50000007: push %edi
0x50000008: push %esi
0x50000009: jmp 0x400113d9
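The addresses in the sample are consistent with a 5-byte relative jump on IA-32: the probe sits at 0x400113d4 and the next intact instruction is at 0x400113d9. The sketch below shows how such a `jmp rel32` (opcode 0xE9) would be encoded; `EncodeJmpRel32` is a hypothetical helper, and whether Pin emits exactly this encoding for its probes is an assumption made here for illustration.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Encode "jmp rel32" placed at address `from` and landing at address `to`.
// The displacement is relative to the address of the NEXT instruction
// (from + 5) and is stored little-endian after the 0xE9 opcode byte.
inline std::array<std::uint8_t, 5> EncodeJmpRel32(std::uint32_t from, std::uint32_t to) {
    const std::uint32_t rel = to - (from + 5); // wraps modulo 2^32 for backward jumps
    return {{0xE9,
             static_cast<std::uint8_t>(rel & 0xFF),
             static_cast<std::uint8_t>((rel >> 8) & 0xFF),
             static_cast<std::uint8_t>((rel >> 16) & 0xFF),
             static_cast<std::uint8_t>((rel >> 24) & 0xFF)}};
}
```

For the sample's probe, 0x41481064 - 0x400113d9 = 0x0146FC8B, so the five bytes would be E9 8B FC 46 01. The copied entry point ends with the reverse trip: a jump from 0x50000009 back to 0x400113d9.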

122 Software & Services Group 122 PinProbes Instrumentation Advantages: –Low overhead – few percent –Less intrusive – execute original code –Leverages Pin: –API –Instrumentation engine Disadvantages: –More tool writer responsibility –Routine-level granularity (RTN)

123 Software & Services Group 123 Using Probes to Replace/Wrap a Function RTN_ReplaceSignatureProbed() redirects all calls to application routine rtn to the specified replacementFunction –Can add IARG_* types to be passed to the replacement routine, including pointer to original function and IARG_CONTEXT. –Replacement function can call original function. To use: –Must use PIN_StartProgramProbed() –Application prototype is required

124 Software & Services Group 124 Malloc Replacement Probe-Mode
#include "pin.H"
void * MallocWrapper(AFUNPTR pf_malloc, size_t size)
{
    // Simulate out-of-memory every so often
    void * res;
    if (TimeForOutOfMem()) return (NULL);
    // AFUNPTR is an untyped function pointer, so cast it to malloc's signature before calling
    res = (reinterpret_cast<void *(*)(size_t)>(pf_malloc))(size);
    return res;
}
VOID ImageLoad(IMG img, VOID *v)
{
    if (strstr(IMG_Name(img).c_str(), "libc.so") ||
        strstr(IMG_Name(img).c_str(), "MSVCR80") ||
        strstr(IMG_Name(img).c_str(), "MSVCR90"))
    {
        RTN mallocRtn = RTN_FindByName(img, "malloc");
        if (RTN_Valid(mallocRtn) && RTN_IsSafeForProbedReplacement(mallocRtn))
        {
            PROTO proto_malloc = PROTO_Allocate(PIN_PARG(void *), CALLINGSTD_DEFAULT, "malloc",
                                                PIN_PARG(size_t), PIN_PARG_END());
            RTN_ReplaceSignatureProbed(mallocRtn, AFUNPTR(MallocWrapper),
                                       IARG_PROTOTYPE, proto_malloc,
                                       IARG_ORIG_FUNCPTR,
                                       IARG_FUNCARG_ENTRYPOINT_VALUE, 0,
                                       IARG_END);
        }
    }
}
int main(int argc, CHAR *argv[])
{
    PIN_InitSymbols();
    PIN_Init(argc, argv);
    IMG_AddInstrumentFunction(ImageLoad, 0);
    PIN_StartProgramProbed();
}

125 Software & Services Group 125 Using Probes to Call Analysis Functions RTN_InsertCallProbed() invokes the analysis routine before or after the specified rtn –Use IPOINT_BEFORE or IPOINT_AFTER – Pin may NOT be able to find all AFTER points on the function when it is running in Probe-Mode –PIN IARG_TYPEs are used for arguments To use: –Must use PIN_StartProgramProbed() –Application prototype is required for IPOINT_AFTER

126 Software & Services Group 126 Symbols: Instrument malloc Handling name-mangling and multiple symbols at same address Probe-Mode VOID Image(IMG img, VOID *v) // IMG_AddInstrumentFunction(Image, 0); { // Walk through the symbols in the symbol table. for (SYM sym = IMG_RegsymHead(img); SYM_Valid(sym); sym = SYM_Next(sym)) { string undFuncName = PIN_UndecorateSymbolName(SYM_Name(sym), UNDECORATION_NAME_ONLY); if (undFuncName == "malloc") // Find the malloc function. { RTN mallocRtn = RTN_FindByAddress(IMG_LowAddress(img) + SYM_Value(sym)); if (RTN_Valid(mallocRtn)) { RTN_Open(mallocRtn); PROTO proto_malloc = PROTO_Allocate(PIN_PARG(void *), CALLINGSTD_DEFAULT, "malloc", PIN_PARG(size_t), PIN_PARG_END() ); // Instrument to print the input argument value and the return value. RTN_InsertCallProbed(mallocRtn, IPOINT_BEFORE, (AFUNPTR)MallocBefore, IARG_PROTOTYPE, proto_malloc, IARG_FUNCARG_ENTRYPOINT_VALUE, 0, IARG_END); RTN_InsertCallProbed(mallocRtn, IPOINT_AFTER, (AFUNPTR)MallocAfter, IARG_PROTOTYPE, proto_malloc, IARG_FUNCRET_EXITPOINT_VALUE, IARG_END); RTN_Close(mallocRtn); } } }

127 Software & Services Group 127 Tool Writer Responsibilities No control flow into the instruction space where probe is placed –6 bytes on IA-32, 7 bytes on Intel64, 1 bundle on IA64 –Branch into “replaced” instructions will fail –Probes at function entry point only Thread safety for insertion and deletion of probes –During image load callback is safe –Only loading thread has a handle to the image Replacement function has same behavior as original

128 Software & Services Group 128 Multi-Threading
We have shown a number of examples of Pin tools supporting multi-threading.
Pin fully supports multi-threading:
–Application threads execute jitted code including instrumentation code (inlined and not inlined), without any serialization introduced by Pin
–Instrumentation code can use Pin and/or OS synchronization constructs to introduce serialization if needed.
–Will see examples of this in Part3
–System calls require serialized entry to the VM before and after execution, BUT actual execution is NOT serialized
–Pin does NOT create any threads of its own
–Pin callbacks are serialized
–Including the BufferFull callback
–Jitting is serialized
–Only one application thread can be jitting code at any time

129 Software & Services Group 129 Multi-Threading Pin Tools, in Jit-Mode, can: – Track Threads –ThreadStart, ThreadFini callbacks –IARG_THREAD_ID –Use Pin TLS for thread-specific data –Use Pin Locks to synchronize threads –Create threads to do Pin Tool work –Use Pin provided APIs to do this –Otherwise these threads would be Jitted –Details in Part4

130 Software & Services Group 130 Multi-Threading, Locking Guidelines Basic Rules –If the tool acquires any locks in a Pin call-back, it must release those locks before returning from that call-back. –If the tool acquires any locks in an analysis routine, it must release those locks before returning from the analysis routine. –If the tool calls a Pin API from a call-back, it should not hold any tool locks when calling the API. –If the tool calls a Pin API from an analysis routine, it may need to acquire the Pin client lock first (see the documentation for the API). The tool should not hold any other locks when calling the API.
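The first two basic rules (release every lock before returning from the call-back or analysis routine) are easiest to satisfy with scope-bound locking. The sketch below is stand-alone C++ that uses `std::mutex` and `std::lock_guard` in place of Pin's own lock type, which is a substitution made for illustration; `AnalysisRoutineSketch`, `g_toolLock` and `g_count` are hypothetical names.

```cpp
#include <cassert>
#include <cstdint>
#include <mutex>

static std::mutex g_toolLock;      // hypothetical tool lock
static std::uint64_t g_count = 0;  // shared state protected by g_toolLock

// Stand-in for an analysis routine that updates shared tool state.
// The guard acquires the lock on construction and releases it when the
// function returns, on every return path, so the lock is never held
// after the routine exits.
void AnalysisRoutineSketch(std::uint64_t n) {
    std::lock_guard<std::mutex> guard(g_toolLock);
    g_count += n;
} // g_toolLock is released here
```

A scope guard covers ordinary returns only. The advanced rules on the following slides deal with the case where the trace exits early because the application raises an exception.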

131 Software & Services Group 131 Multi-Threading, Locking Guidelines Advanced Rules –If the tool acquires any locks in a Pin call-back, it must release those locks before returning from that call-back. –If the tool calls a Pin API from a call-back, it should not hold any tool locks when calling the API. –If the tool calls a Pin API from an analysis routine, it may need to acquire the Pin client lock first (see the documentation for the API). If the tool holds a tool lock L while calling the API, that lock L must obey the following sub-rule: –The tool must not acquire lock L from any call-back. This avoids a lock order inversion with respect to the Pin internal locks.

132 Software & Services Group 132 Multi-Threading, Locking Guidelines Advanced Rules –If the tool acquires any locks in an analysis routine, it must release those locks before leaving the trace that contains the analysis routine. Tools must expect that the trace may exit “early” if an application instruction raises an exception. Any lock L, which the tool might hold when the application raises an exception, must obey the following sub-rules: –The tool must establish a call-back that executes when the application raises an exception, and this call-back must release lock L if it was acquired at the time of the exception. Tools can use PIN_AddContextChangeFunction() to establish this call-back. –The tool must not acquire lock L from any call-back. This avoids a lock order inversion with respect to the Pin internal locks.

133 Software & Services Group 133 Taint Analysis
For each instruction:
–Identify source and destination operands
–Explicit, Implicit
–If SRC is tainted then mark DEST as tainted
–If SRC isn’t tainted then mark DEST as not tainted
Sounds simple, right?
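The copy rule can be written down as a tiny stand-alone model: the destination always inherits the taint status of the source. This is a sketch only, independent of Pin; `TaintState` and its members are hypothetical names, and registers are identified by plain integers.

```cpp
#include <cassert>
#include <cstdint>
#include <set>

// Hypothetical taint bookkeeping: a location is tainted iff it is in the set.
struct TaintState {
    std::set<std::uintptr_t> taintedAddrs;
    std::set<int> taintedRegs;

    bool MemTainted(std::uintptr_t a) const { return taintedAddrs.count(a) != 0; }
    bool RegTainted(int r) const { return taintedRegs.count(r) != 0; }

    void SetMem(std::uintptr_t a, bool t) {
        if (t) taintedAddrs.insert(a); else taintedAddrs.erase(a);
    }
    void SetReg(int r, bool t) {
        if (t) taintedRegs.insert(r); else taintedRegs.erase(r);
    }

    // DEST takes on the status of SRC, tainted or clean.
    void RegToMem(int src, std::uintptr_t dst) { SetMem(dst, RegTainted(src)); }
    void MemToReg(std::uintptr_t src, int dst) { SetReg(dst, MemTainted(src)); }
    void RegToReg(int src, int dst) { SetReg(dst, RegTainted(src)); }
};
```

Copying from a clean source clears the destination, which is what makes a location "untainted" again after it has been overwritten.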

134 Software & Services Group 134 Taint Analysis Implicit operands Partial register taint Math instructions Logical instructions Exchange instructions

135 Software & Services Group 135 A simple taint analyzer
[Figure: a set of tainted memory addresses (bffff081, bffff082, b64d4002) and a set of tainted registers (EAX, EDX, ESI)]

136 Software & Services Group 136
#include "pin.H"
#include <fstream>
#include <iostream>
#include <set>
#include "xed-iclass-enum.h"
set<ADDRINT> TaintedAddrs;      // tainted memory addresses
bool TaintedRegs[REG_LAST];     // tainted registers
std::ofstream out;              // output file
KNOB<string> KnobOutputFile(KNOB_MODE_WRITEONCE, "pintool", "o", "taint.out", "specify file name for the output file");
/*!
 * Print out help message.
 */
INT32 Usage()
{
    cerr << "This tool follows the taint defined by the first argument to " << endl
         << "the instrumented program command line and outputs details to a file" << endl;
    cerr << KNOB_BASE::StringKnobSummary() << endl;
    return -1;
}

137 Software & Services Group 137 int main(int argc, char *argv[]) { // Initialize PIN PIN_InitSymbols(); if( PIN_Init(argc,argv) ) { return Usage(); } // Register function to be called to instrument traces TRACE_AddInstrumentFunction(Trace, 0); RTN_AddInstrumentFunction(Routine, 0); // Register function to be called when the application exits PIN_AddFiniFunction(Fini, 0); // init output file string fileName = KnobOutputFile.Value(); out.open(fileName.c_str()); // Start the program, never returns PIN_StartProgram(); return 0; }

138 Software & Services Group 138 /*! * Routine instrumentation, called for every routine loaded * this function adds a call to MainAddTaint on the main function */ VOID Routine(RTN rtn, VOID *v) { RTN_Open(rtn); if (RTN_Name(rtn) == "main") //if this is the main function { RTN_InsertCall(rtn, IPOINT_BEFORE, (AFUNPTR)MainAddTaint, IARG_FUNCARG_ENTRYPOINT_VALUE, 0, IARG_FUNCARG_ENTRYPOINT_VALUE, 1, IARG_END); } RTN_Close(rtn); } /*! * Print out the taint analysis results. * This function is called when the application exits. */ VOID Fini(INT32 code, VOID *v) { DumpTaint(); out.close(); }

139 Software & Services Group 139
VOID DumpTaint()
{
    out << "======================================" << endl;
    out << "Tainted Memory: " << endl;
    set<ADDRINT>::iterator it;
    for (it = TaintedAddrs.begin(); it != TaintedAddrs.end(); it++)
    {
        out << " " << *it;
    }
    out << endl << "***" << endl << "Tainted Regs:" << endl;
    for (int i = 0; i < REG_LAST; i++)
    {
        if (TaintedRegs[i])
        {
            out << REG_StringShort((REG)i);
        }
    }
    out << "======================================" << endl;
}
// This function marks the contents of argv[1] as tainted
VOID MainAddTaint(unsigned int argc, char *argv[])
{
    if (argc != 2) return;
    int n = strlen(argv[1]);
    ADDRINT taint = (ADDRINT)argv[1];
    for (int i = 0; i < n; i++)
        TaintedAddrs.insert(taint + i);
    DumpTaint();
}

140 Software & Services Group 140
// This function represents the case of a register copied to memory
void RegTaintMem(ADDRINT reg_r, ADDRINT mem_w)
{
    out << REG_StringShort((REG)reg_r) << " --> " << mem_w << endl;
    if (TaintedRegs[reg_r])
    {
        TaintedAddrs.insert(mem_w);
    }
    else // reg not tainted --> mem not tainted
    {
        if (TaintedAddrs.count(mem_w)) // if mem is already not tainted nothing to do
        {
            TaintedAddrs.erase(TaintedAddrs.find(mem_w));
        }
    }
}
// this function represents the case of a memory copied to register
void MemTaintReg(ADDRINT mem_r, ADDRINT reg_w, ADDRINT inst_addr)
{
    out << mem_r << " --> " << REG_StringShort((REG)reg_w) << endl;
    if (TaintedAddrs.count(mem_r)) // count is either 0 or 1 for set
    {
        TaintedRegs[reg_w] = true;
    }
    else // mem is clean -> reg is cleaned
    {
        TaintedRegs[reg_w] = false;
    }
}

141
// This function represents the case of a reg copied to another reg
void RegTaintReg(ADDRINT reg_r, ADDRINT reg_w)
{
    out << REG_StringShort((REG)reg_r) << " --> " << REG_StringShort((REG)reg_w) << endl;
    TaintedRegs[reg_w] = TaintedRegs[reg_r];
}

// This function represents the case of an immediate copied to a register
void ImmedCleanReg(ADDRINT reg_w)
{
    out << "immed --> " << REG_StringShort((REG)reg_w) << endl;
    TaintedRegs[reg_w] = false;
}

// This function represents the case of an immediate copied to memory
void ImmedCleanMem(ADDRINT mem_w)
{
    out << "immed --> " << mem_w << endl;
    if (TaintedAddrs.count(mem_w)) // if mem is not tainted, nothing to do
    {
        TaintedAddrs.erase(TaintedAddrs.find(mem_w));
    }
}
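The five propagation rules above can be tried outside Pin. The following is a minimal sketch, not the tool's code: `TaintState` and its method names are hypothetical, registers are modeled as plain indices rather than Pin `REG` values, and only the standard library is used.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <set>
#include <vector>

// Hypothetical stand-in for the tool's global state: the set of tainted
// byte addresses and one taint flag per register index.
struct TaintState {
    std::set<std::uintptr_t> addrs;
    std::vector<bool> regs;

    explicit TaintState(std::size_t numRegs) : regs(numRegs, false) {}

    // reg -> mem: memory inherits the register's taint, or is cleaned.
    void RegToMem(std::size_t reg, std::uintptr_t mem) {
        if (regs[reg]) addrs.insert(mem);
        else addrs.erase(mem); // erase by key is a no-op when absent
    }
    // mem -> reg: the register inherits the memory's taint, or is cleaned.
    void MemToReg(std::uintptr_t mem, std::size_t reg) {
        regs[reg] = addrs.count(mem) != 0;
    }
    // reg -> reg: the destination takes the source's taint flag.
    void RegToReg(std::size_t src, std::size_t dst) {
        regs[dst] = static_cast<bool>(regs[src]);
    }
    // immediate -> reg and immediate -> mem always clean the destination.
    void ImmToReg(std::size_t reg) { regs[reg] = false; }
    void ImmToMem(std::uintptr_t mem) { addrs.erase(mem); }
};
```

Erasing by key avoids the separate `count` and `find` calls used on the slides; the observable behavior is the same.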

142 Helpers
// True if the instruction has an immediate operand
// meant to be called only from instrumentation routines
bool INS_has_immed(INS ins);

// returns the full name of the first register operand written
REG INS_get_write_reg(INS ins);

// returns the full name of the first register operand read
REG INS_get_read_reg(INS ins);

143
/*!
 * This function checks, for each instruction, whether it is a mov that can potentially
 * transfer taint, and if so adds the appropriate analysis routine to check
 * and propagate taint at run-time if needed.
 * This function is called every time a new trace is encountered.
 */
VOID Trace(TRACE trace, VOID *v)
{
    for (BBL bbl = TRACE_BblHead(trace); BBL_Valid(bbl); bbl = BBL_Next(bbl))
    {
        for (INS ins = BBL_InsHead(bbl); INS_Valid(ins); ins = INS_Next(ins))
        {
            if ((INS_Opcode(ins) >= XED_ICLASS_MOV) && (INS_Opcode(ins) <= XED_ICLASS_MOVZX))
            {
                if (INS_has_immed(ins))
                {
                    if (INS_IsMemoryWrite(ins))
                    {   // immed -> mem
                        INS_InsertCall(ins, IPOINT_BEFORE, (AFUNPTR)ImmedCleanMem,
                                       IARG_MEMORYOP_EA, 0,
                                       IARG_END);
                    }
                    else // immed -> reg
                    {
                        REG insreg = INS_get_write_reg(ins);
                        INS_InsertCall(ins, IPOINT_BEFORE, (AFUNPTR)ImmedCleanReg,
                                       IARG_ADDRINT, (ADDRINT)insreg,
                                       IARG_END);
                    }
                } // end of if INS has immed
                else if (INS_IsMemoryRead(ins)) // mem -> reg

144
                else if (INS_IsMemoryRead(ins))
                {   // mem -> reg
                    // in this case we call MemTaintReg to copy the taint if relevant
                    REG insreg = INS_get_write_reg(ins);
                    INS_InsertCall(ins, IPOINT_BEFORE, (AFUNPTR)MemTaintReg,
                                   IARG_MEMORYOP_EA, 0,
                                   IARG_ADDRINT, (ADDRINT)insreg,
                                   IARG_INST_PTR,
                                   IARG_END);
                }
                else if (INS_IsMemoryWrite(ins))
                {   // reg -> mem
                    // in this case we call RegTaintMem to copy the taint if relevant
                    REG insreg = INS_get_read_reg(ins);
                    INS_InsertCall(ins, IPOINT_BEFORE, (AFUNPTR)RegTaintMem,
                                   IARG_ADDRINT, (ADDRINT)insreg,
                                   IARG_MEMORYOP_EA, 0,
                                   IARG_END);
                }
                else if (INS_RegR(ins, 0) != REG_INVALID())
                {   // reg -> reg
                    // in this case we call RegTaintReg
                    REG Rreg = INS_get_read_reg(ins);
                    REG Wreg = INS_get_write_reg(ins);
                    INS_InsertCall(ins, IPOINT_BEFORE, (AFUNPTR)RegTaintReg,
                                   IARG_ADDRINT, (ADDRINT)Rreg,
                                   IARG_ADDRINT, (ADDRINT)Wreg,
                                   IARG_END);
                }
                else
                {
                    out << "serious error?!\n" << endl;
                }
            } // IF opcode is a MOV
        } // For INS
    } // For BBL
} // VOID Trace

145 Part3 Summary
Saw examples of:
– Allocating Pin Registers for Pin Tool Use
– Pin IF-THEN instrumentation
– Changing register values in instrumentation code
– Changing register values in CONTEXT
– Knobs
– Pin TLS
– Pin Buffering API
– Using Symbol and Debug Info
– Probe-Mode
– Multi-Threading support

146 Part4: Advanced Pin API
“To boldly go where few PinHeads have gone before…”
Agenda
– membuffer_threadpool tool
  – Using multiple buffers in the Pin Buffering API
  – Using Pin Tool Threads
  – Using Pin and OS locks to synchronize threads
– System call instrumentation
– Instrumenting a process tree
– CONTEXT* and IARG_CONST_CONTEXT, IARG_CONTEXT
– Managing Exceptions and Signals
– Accessing Decode API
– Pin Code-Cache API
– Transparent debugging, and extending the debugger

147 membuffer_threadpool
Recall membuffer_simple:
– Uses Pin Buffering API
– One buffer for each thread
– Inlined call to INS_InsertFillBuffer writes instrumentation data into the buffer
– Application threads execute jitted application and instrumentation code
– When the buffer becomes full, the Pin Tool defined BufferFull function is called (by the application thread)
  – It processes the data in the buffer
  – After the buffer is processed, it is set to be re-filled from the top
– Application thread continues executing jitted application and instrumentation code

148 membuffer_threadpool
Improvement: process full buffers asynchronously, so application code can continue executing while buffers are being processed.
– Pin Buffering API supports multiple buffers per thread
  – Each application thread will allocate a number of buffers.
  – The buffers allocated by a thread can only be used by the allocating thread, so each application thread will have a buffers-free list, holding all buffers that are not currently full or being filled.
– Pin supports creating Pin Tool threads; these are NOT jitted and can be used to do Pin Tool work asynchronously.
  – A number of these threads will be created. Their job is to:
    – Process buffers that become full. These will be located on a global full-buffers list.
    – After processing, return them to the buffers-free list of the application thread that filled them.
– Application threads execute jitted application code and instrumentation code. The instrumentation code writes data into the buffers and, when it detects that the buffer is full, calls the BufferFull callback.
– The BufferFull callback function will NOT process the buffer
  – Remember it is executed by an application thread
  – It places the buffer on the global full-buffers list
  – It retrieves a free buffer from this application thread’s free buffer list and returns it as the next buffer to fill.

149 membuffer_threadpool
[Diagram: each application thread has its own buffers-free list and a buffer being filled. When a buffer becomes full, the BufferFull function is executed and the buffer goes on the buffers-full list. Pin Tool processing threads take buffers from the buffers-full list; when buffer processing finishes, the buffer is returned to its owner’s buffers-free list.]

150 membuffer_threadpool
int main(int argc, char *argv[])
{
    PIN_Init(argc, argv);

    // Pin TLS slot for holding the object that represents an application thread
    appThreadRepresentitiveKey = PIN_CreateThreadDataKey(0);

    // Define the buffer that will be used
    bufId = PIN_DefineTraceBuffer(sizeof(struct MEMREF), KnobNumPagesInBuffer,
                                  BufferFull, // This Pin tool function will be called when buffer is full
                                  0);

    TRACE_AddInstrumentFunction(Trace, 0); // add an instrumentation callback function

    // add callbacks
    PIN_AddThreadStartFunction(ThreadStart, 0);
    PIN_AddThreadFiniFunction(ThreadFini, 0);
    PIN_AddFiniFunction(Fini, 0);
    PIN_AddFiniUnlockedFunction(FiniUnlocked, 0); // Used for Pin Tool thread termination

    /* It is safe to create internal threads in the tool's main procedure and spawn new
     * internal threads from existing ones. All other places, like Pin callbacks and
     * analysis routines in application threads, are not safe for creating internal threads.
     */
    // NOTE: These threads are NOT jitted. Need to discuss when the threads actually start running
    for (int i = 0; i < KnobNumProcessingThreads; i++)
    {
        THREADID threadId;
        PIN_THREAD_UID threadUid;
        threadId = PIN_SpawnInternalThread(BufferProcessingThread, NULL, 0, &threadUid);
        RecordToolThreadCreated(threadUid); /* Used for Pin Tool thread termination */
    }

    PIN_StartProgram(); /* Start the program, never returns */
}

151 membuffer_threadpool
static void RecordToolThreadCreated(PIN_THREAD_UID threadUid)
{
    // Record the unique ID of the Pin Tool thread
    uidSet.insert(threadUid);
}

// The thread function of Pin Tool threads. This code runs natively: NO jitting
static VOID BufferProcessingThread(VOID *arg)
{
    processingThreadRunning = TRUE; // Indicate that thread has started running
    THREADID myThreadId = PIN_ThreadId();
    while (!doExit)
    {
        VOID *buf;
        UINT64 numElements;
        APP_THREAD_REPRESENTITVE *appThreadRepresentitive;

        // Get full buffer from the full buffer list
        fullBuffersListManager.GetBufferFromList(&buf, &numElements, &appThreadRepresentitive, myThreadId);
        if (buf == NULL)
        {
            // this will happen at process termination time,
            // when there are no buffers left to process
            ASSERTX(doExit);
            break;
        }

        // Process the full buffer
        ProcessBuffer(buf, numElements, appThreadRepresentitive);

        // Put the processed buffer back on the free buffer list of the application thread that owns it
        appThreadRepresentitive->FreeBufferListManager()
            ->PutBufferOnList(buf, 0, appThreadRepresentitive, myThreadId);
    }
}

152 membuffer_threadpool
/*!
 * Called by instrumentation code when a buffer fills up, and by Pin when the thread exits,
 * so the buffer can be processed.
 * Called in the context of the application thread.
 * @param[in] id           buffer handle
 * @param[in] tid          id of owning thread
 * @param[in] ctxt         application context
 * @param[in] buf          actual pointer to buffer
 * @param[in] numElements  number of records
 * @param[in] v            callback value
 * @return A pointer to the buffer to resume filling.
 */
VOID *BufferFull(BUFFER_ID id, THREADID tid, const CONTEXT *ctxt, VOID *buf,
                 UINT64 numElements, VOID *v)
{
    // get the APP_THREAD_REPRESENTITVE of this app thread from the Pin TLS
    APP_THREAD_REPRESENTITVE *appThreadRepresentitive =
        static_cast<APP_THREAD_REPRESENTITVE *>(PIN_GetThreadData(appThreadRepresentitiveKey, tid));

    // Enqueue the full buffer on the full-buffers list, and get the next buffer to fill
    // from this thread’s free buffer list
    VOID *nextBuffToFill = appThreadRepresentitive->EnqueFullAndGetNextToFill(buf, numElements);
    return (nextBuffToFill);
}

153 membuffer_threadpool
VOID *APP_THREAD_REPRESENTITVE::EnqueFullAndGetNextToFill(VOID *fullBuf, UINT64 numElements)
{
    // cannot wait for Pin Tool threads to start running since this may cause deadlock
    // because this app thread may be holding some OS resource that the Pin Tool
    // thread needs to obtain in order to start - e.g. the LoaderLock
    if (!processingThreadRunning)
    {
        // process buffer in this app thread
        ProcessBuffer(fullBuf, numElements, this);
        return fullBuf;
    }

    if (!_buffersAllocated)
    {
        // now allocate the rest of the KnobNumBuffersPerAppThread buffers to be used
        for (int i = 0; i < KnobNumBuffersPerAppThread - 1; i++)
            _freeBufferListManager->PutBufferOnList(PIN_AllocateBuffer(bufId), 0, this, _myTid);
        _buffersAllocated = TRUE;
    }

    // put the fullBuf on the full buffers list; one of the Pin Tool processing threads
    // will pick it from there, process it, and then put it on this app thread's free buffer list
    fullBuffersListManager.PutBufferOnList(fullBuf, numElements, this, _myTid);

    // return the next buffer to fill.
    // It is always taken from the free buffers list of this app thread. If the list is empty then this app
    // thread will be blocked until one is placed there (by one of the Pin Tool buffer processing threads).
    VOID *nextBufToFill;
    UINT64 numElementsDummy;
    APP_THREAD_REPRESENTITVE *appThreadRepresentitiveDummy;
    _freeBufferListManager->GetBufferFromList(&nextBufToFill, &numElementsDummy,
                                              &appThreadRepresentitiveDummy, _myTid);
    ASSERTX(appThreadRepresentitiveDummy == this);
    return nextBufToFill;
}

154 membuffer_threadpool
VOID Instruction(INS ins, VOID *v)
{
    UINT32 numMemOperands = INS_MemoryOperandCount(ins);

    // Iterate over each memory operand of the instruction.
    for (UINT32 memOp = 0; memOp < numMemOperands; memOp++)
    {
        // Add the instrumentation code to write the appIP and memAddr
        // of this memory operand into the buffer.
        // Pin will inline the code that writes to the buffer
        INS_InsertFillBuffer(ins, IPOINT_BEFORE, bufId,
                             IARG_INST_PTR, offsetof(struct MEMREF, appIP),
                             IARG_MEMORYOP_EA, memOp, offsetof(struct MEMREF, memAddr),
                             IARG_END);
    }
}

155 membuffer_threadpool
class BUFFER_LIST_MANAGER
{
  public:
    BUFFER_LIST_MANAGER();

    VOID PutBufferOnList(VOID *buf, UINT64 numElements,
                         APP_THREAD_REPRESENTITVE *appThreadRepresentitive, THREADID tid)
    {
        // build the list element
        BUFFER_LIST_ELEMENT bufferListElement;
        bufferListElement.buf = buf;
        bufferListElement.numElements = numElements;
        bufferListElement.appThreadRepresentitive = appThreadRepresentitive;

        GetLock(&_bufferListLock, tid + 1);           // lock the list, using a Pin lock
        _bufferList.push_back(bufferListElement);     // insert the element at the end of the list
        ReleaseLock(&_bufferListLock);                // unlock the list
        WIND::ReleaseSemaphore(_bufferSem, 1, NULL);  // signal that there is a buffer on the list
    }

    VOID GetBufferFromList(VOID **buf, UINT64 *numElements,
                           APP_THREAD_REPRESENTITVE **appThreadRepresentitive, THREADID tid)
    {
        WIND::WaitForSingleObject(_bufferSem, INFINITE); // wait until there is a buffer on the list
        GetLock(&_bufferListLock, tid + 1);              // lock the list
        if (_bufferList.empty())
        {   // woken by SignalBufferSem at process exit: report "no buffer" to the caller
            *buf = NULL;
            ReleaseLock(&_bufferListLock);
            return;
        }
        BUFFER_LIST_ELEMENT &bufferListElement = (_bufferList.front()); // retrieve the first element of the list
        *buf = bufferListElement.buf;
        *numElements = bufferListElement.numElements;
        *appThreadRepresentitive = bufferListElement.appThreadRepresentitive;
        _bufferList.pop_front();                         // remove the first element from the list
        ReleaseLock(&_bufferListLock);                   // unlock the list
    }

    VOID SignalBufferSem() { WIND::ReleaseSemaphore(_bufferSem, 1, NULL); }
    UINT32 NumBuffersOnList() { return (_bufferList.size()); }

  private:
    struct BUFFER_LIST_ELEMENT // structure of an element of the buffer list
    {
        VOID *buf;
        UINT64 numElements;
        APP_THREAD_REPRESENTITVE *appThreadRepresentitive; // the application thread that owns this buffer
    };

    WIND::HANDLE _bufferSem;  // counting semaphore, value is # of buffers on the list,
                              // value==0 => WaitForSingleObject blocks
    PIN_LOCK _bufferListLock; // Pin Lock
    list<BUFFER_LIST_ELEMENT> _bufferList;
};
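The same put/get discipline (a lock protecting the list, plus a wake-up signal that doubles as the exit notification) can be sketched with standard C++ primitives. This is an illustration under stated assumptions, not the tool's code: `BlockingBufferList`, `BufferEntry`, and `Shutdown` are hypothetical names, and `std::mutex` and `std::condition_variable` stand in for the Pin lock and the Windows semaphore. Whether standard threading primitives are appropriate inside a Pin tool depends on the Pin version and platform.

```cpp
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <cstdint>
#include <list>
#include <mutex>

// Hypothetical list element: a buffer pointer and its record count.
struct BufferEntry {
    void *buf;
    std::uint64_t numElements;
};

class BlockingBufferList {
public:
    // Append a buffer and wake one waiting consumer.
    void Put(void *buf, std::uint64_t numElements) {
        {
            std::lock_guard<std::mutex> guard(lock_);
            list_.push_back(BufferEntry{buf, numElements});
        }
        available_.notify_one();
    }

    // Block until a buffer is available or Shutdown() has been called.
    // Returns false only when the list is empty after shutdown.
    bool Get(BufferEntry *out) {
        std::unique_lock<std::mutex> guard(lock_);
        available_.wait(guard, [this] { return !list_.empty() || shutdown_; });
        if (list_.empty()) return false;
        *out = list_.front();
        list_.pop_front();
        return true;
    }

    // Wake all consumers so they can notice that the process is exiting.
    void Shutdown() {
        {
            std::lock_guard<std::mutex> guard(lock_);
            shutdown_ = true;
        }
        available_.notify_all();
    }

    std::size_t Size() {
        std::lock_guard<std::mutex> guard(lock_);
        return list_.size();
    }

private:
    std::mutex lock_;
    std::condition_variable available_;
    std::list<BufferEntry> list_;
    bool shutdown_ = false;
};
```

Buffers still queued at shutdown are drained before `Get` reports failure, which mirrors the processing threads finishing outstanding work before they exit.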

156 membuffer_threadpool
VOID ThreadFini(THREADID tid, const CONTEXT *ctxt, INT32 code, VOID *v)
{
    // get the APP_THREAD_REPRESENTITVE of this app thread from the Pin TLS
    APP_THREAD_REPRESENTITVE *appThreadRepresentitive =
        static_cast<APP_THREAD_REPRESENTITVE *>(PIN_GetThreadData(appThreadRepresentitiveKey, tid));

    // wait for all my buffers to be processed
    while (appThreadRepresentitive->_freeBufferListManager->NumBuffersOnList()
           != KnobNumBuffersPerAppThread - 1)
        PIN_Sleep(1);

    delete appThreadRepresentitive;
    PIN_SetThreadData(appThreadRepresentitiveKey, 0, tid);
}

static VOID FiniUnlocked(INT32 code, VOID *v)
{
    BOOL waitStatus;
    INT32 threadExitCode;
    doExit = TRUE; // indicate that process is exiting

    // signal all the Pin Tool threads to wake up and recognize the exit
    for (int i = 0; i < KnobNumProcessingThreads; i++)
        fullBuffersListManager.SignalBufferSem();

    // Wait until all Pin Tool threads exit
    for (set<PIN_THREAD_UID>::iterator it = uidSet.begin(); it != uidSet.end(); ++it)
        waitStatus = PIN_WaitForThreadTermination(*it, PIN_INFINITE_TIMEOUT, &threadExitCode);
}

157 System Call Instrumentation
VOID SyscallEntry(THREADID threadIndex, CONTEXT *ctxt, SYSCALL_STANDARD std, VOID *v)
{
    ADDRINT appIP = PIN_GetContextReg(ctxt, REG_INST_PTR);
    printf("syscall# %d at appIP %x param1 %x param2 %x param3 %x param4 %x param5 %x param6 %x\n",
           PIN_GetSyscallNumber(ctxt, std), appIP,
           PIN_GetSyscallArgument(ctxt, std, 0), PIN_GetSyscallArgument(ctxt, std, 1),
           PIN_GetSyscallArgument(ctxt, std, 2), PIN_GetSyscallArgument(ctxt, std, 3),
           PIN_GetSyscallArgument(ctxt, std, 4), PIN_GetSyscallArgument(ctxt, std, 5));
}

VOID SyscallExit(THREADID threadIndex, CONTEXT *ctxt, SYSCALL_STANDARD std, VOID *v)
{
    printf(" returns: %x\n", PIN_GetSyscallReturn(ctxt, std));
}

int main(int argc, char *argv[])
{
    PIN_Init(argc, argv);

    // Instrument system calls via these Pin Callbacks and not via analysis functions
    PIN_AddSyscallEntryFunction(SyscallEntry, 0);
    PIN_AddSyscallExitFunction(SyscallExit, 0);

    PIN_StartProgram(); // Never returns
}

158 Instrumenting a Process Tree
Process A creates Process B
– Process B creates Process C and D
– And so forth
Can use Pin to instrument all or part of the processes of a process tree
– Use the -follow_execv Pin invocation switch to turn this on
Can use different Pin modes (Jit or Probe) on the different processes in the process tree.
Can use different Pin Tools on the different processes of a process tree.
Architecture of processes in the process tree may be intermixed: e.g. Process A is 32 bit, Process B is 64 bit, Process C is 64 bit, Process D is 32 bit…

159 Instrumenting a Process Tree
// If this Pin Callback returns FALSE, then the child process will run natively
BOOL FollowChild(CHILD_PROCESS childProcess, VOID *userData)
{
    BOOL res;
    INT appArgc;
    CHAR const *const *appArgv;
    OS_PROCESS_ID pid = CHILD_PROCESS_GetId(childProcess);

    // Get the command line that the child process will be Pinned with; these are the Pin invocation
    // switches that were specified when this (parent) process was Pinned
    CHILD_PROCESS_GetCommandLine(childProcess, &appArgc, &appArgv);

    // The Pin invocation switches of the child can be made to order
    INT pinArgc = 0;
    CHAR const *pinArgv[20];
    // :::: Put values in pinArgv, set pinArgc to be the number of entries in pinArgv that are to be used
    CHILD_PROCESS_SetPinCommandLine(childProcess, pinArgc, pinArgv);

    return TRUE; /* Specify Child process is to be Pinned */
}

int main(INT32 argc, CHAR **argv)
{
    PIN_Init(argc, argv);
    cout << " Process is running on Pin in " << (PIN_IsProbeMode() ? " Probe " : " Jit ") << " mode " << endl;

    // The FollowChild Pin Callback will be called when the application being Pinned is about to spawn
    // a child process
    PIN_AddFollowChildProcessFunction(FollowChild, 0);

    if (PIN_IsProbeMode())
        PIN_StartProgramProbed(); // Never returns
    else
        PIN_StartProgram();
}

160 CONTEXT*, IARG_CONST_CONTEXT, IARG_CONTEXT
CONTEXT* is a handle to the full register context of the application at a particular point in the execution
– CONTEXT* can NOT be dereferenced. It is a handle to be passed to Pin API functions
CONTEXT* is passed by default to a number of Pin Callback functions, e.g.
– ThreadStart registered by PIN_AddThreadStartFunction
– BufferFull registered by PIN_DefineTraceBuffer
– OnContextChange registered by PIN_AddContextChangeFunction

161 CONTEXT* …
Pin API functions are supplied to Get and Set registers within the CONTEXT
Pin API functions are also supplied to Get and Set the FP context
Can request that a CONTEXT* be passed to an analysis function by requesting IARG_CONTEXT or IARG_CONST_CONTEXT
Requesting IARG_CONTEXT
– The analysis function will NOT be inlined
– The passing of the CONTEXT* is time consuming
Passing IARG_CONST_CONTEXT is ~4X faster than IARG_CONTEXT
– Contents of the CONTEXT* passed for IARG_CONST_CONTEXT can NOT be changed

162 CONTEXT* …
Changes made to the contents of a CONTEXT*
– IARG_CONTEXT
  – Changes made will be visible in subsequent PIN API calls made from within the nesting of the analysis function
  – Changes made will NOT be visible in the application context after return from the analysis function
– Passed to PIN Callbacks
  – Changes made will be visible in both of the above

163 Function Replacement with register change
#include "pin.H"

void *FunctionReplacer(CONTEXT *ctxt, AFUNPTR pf_malloc, size_t size)
{
    void *res;
    CONTEXT writableContext, *context = ctxt;
    if (TimeForRegChange())
    {
        PIN_SaveContext(ctxt, &writableContext); // need to copy the ctxt into a writable context
        context = &writableContext;
        PIN_SetContextReg(context, REG_GAX, 1);
    }
    PIN_CallApplicationFunction(context, PIN_ThreadId(), CALLINGSTD_DEFAULT, pf_malloc,
                                PIN_PARG(void *), &res,
                                PIN_PARG(size_t), size);
    return res;
}

VOID ImageLoad(IMG img, VOID *v) // Pin callback. Registered by IMG_AddInstrumentFunction
{
    RTN rtn = RTN_FindByName(img, "Function");
    if (!RTN_Valid(rtn)) return; // this image does not contain the routine
    PROTO proto = PROTO_Allocate(PIN_PARG(void *), CALLINGSTD_DEFAULT, "proto",
                                 PIN_PARG(size_t), PIN_PARG_END());
    RTN_ReplaceSignature(rtn, AFUNPTR(FunctionReplacer),
                         IARG_PROTOTYPE, proto,
                         IARG_CONST_CONTEXT,
                         IARG_ORIG_FUNCPTR,
                         IARG_FUNCARG_ENTRYPOINT_VALUE, 0,
                         IARG_END);
}

int main(int argc, CHAR *argv[])
{
    PIN_InitSymbols();
    PIN_Init(argc, argv);
    IMG_AddInstrumentFunction(ImageLoad, 0);
    PIN_StartProgram();
}

164 memtrace_simple
KNOB<UINT32> KnobNumBytesInBuffer(KNOB_MODE_WRITEONCE, "pintool", "num_bytes_in_buffer",
                                  "0x100000", "number of bytes in buffer");

APP_THREAD_REPRESENTITVE::APP_THREAD_REPRESENTITVE(THREADID myTid)
{
    _buffer = new char[KnobNumBytesInBuffer.Value()]; // Allocate the buffer
    _numBuffersFilled = 0;
    _numElementsProcessed = 0;
    _myTid = myTid;
}

char *APP_THREAD_REPRESENTITVE::Begin() { return _buffer; }

char *APP_THREAD_REPRESENTITVE::End() { return _buffer + KnobNumBytesInBuffer.Value(); }

VOID ThreadStart(THREADID tid, CONTEXT *ctxt, INT32 flags, VOID *v) // Pin callback on thread creation
{
    // There is a new APP_THREAD_REPRESENTITVE object for every thread
    APP_THREAD_REPRESENTITVE *appThreadRepresentitive = new APP_THREAD_REPRESENTITVE(tid);

    // A thread will need to look up its APP_THREAD_REPRESENTITVE, so save pointer in Pin TLS
    PIN_SetThreadData(appThreadRepresentitiveKey, appThreadRepresentitive, tid);

    // Initialize endOfTraceInBufferReg to point at beginning of buffer
    PIN_SetContextReg(ctxt, endOfTraceInBufferReg,
                      reinterpret_cast<ADDRINT>(appThreadRepresentitive->Begin()));

    // Initialize endOfBufferReg to point at end of buffer
    PIN_SetContextReg(ctxt, endOfBufferReg,
                      reinterpret_cast<ADDRINT>(appThreadRepresentitive->End()));
}

165 Managing Exceptions and Signals

166 Exceptions
Catch Exceptions that occur in Pin Tool code
– Global exception handler
  – PIN_AddInternalExceptionHandler
– Guard code section with exception handler
  – PIN_TryStart
  – PIN_TryEnd

167 Exceptions
VOID InstrumentDivide(INS ins, VOID *v)
{
    if ((INS_Mnemonic(ins) == "DIV") && (INS_OperandIsReg(ins, 0)))
    {   // Will emulate div instruction with register operand
        INS_InsertCall(ins, IPOINT_BEFORE, AFUNPTR(EmulateIntDivide),
                       IARG_REG_REFERENCE, REG_GDX,
                       IARG_REG_REFERENCE, REG_GAX,
                       IARG_REG_VALUE, REG(INS_OperandReg(ins, 0)),
                       IARG_CONST_CONTEXT,
                       IARG_THREAD_ID,
                       IARG_END);
        INS_Delete(ins); // Delete the div instruction
    }
}

int main(int argc, char *argv[])
{
    PIN_Init(argc, argv);
    INS_AddInstrumentFunction(InstrumentDivide, 0);
    PIN_AddInternalExceptionHandler(GlobalHandler, NULL); // Registers a Global Exception Handler
    PIN_StartProgram(); // Never returns
    return 0;
}

168 Exceptions
EXCEPT_HANDLING_RESULT DivideHandler(THREADID tid, EXCEPTION_INFO *pExceptInfo,
                                     PHYSICAL_CONTEXT *pPhysCtxt, // The context when the exception occurred
                                     VOID *appContextArg          // The application context when the exception occurred
                                     )
{
    if (PIN_GetExceptionCode(pExceptInfo) == EXCEPTCODE_INT_DIVIDE_BY_ZERO)
    {
        // Divide by zero occurred in the code emulating the divide; use PIN_RaiseException to raise
        // this exception at the appIP, for handling by the application
        cout << " DivideHandler : Caught divide by zero." << PIN_ExceptionToString(pExceptInfo) << endl;

        // Get the application IP where the exception occurred from the application context
        CONTEXT *appCtxt = (CONTEXT *)appContextArg;
        ADDRINT faultIp = PIN_GetContextReg(appCtxt, REG_INST_PTR);

        // raise the exception at the application IP, so the application can handle it as it wants to
        PIN_SetExceptionAddress(pExceptInfo, faultIp);
        PIN_RaiseException(appCtxt, tid, pExceptInfo); // never returns
    }
    return EHR_CONTINUE_SEARCH;
}

VOID EmulateIntDivide(ADDRINT *pGdx, ADDRINT *pGax, ADDRINT divisor, CONTEXT *ctxt, THREADID tid)
{
    PIN_TryStart(tid, DivideHandler, ctxt); // Register a Guard Code Section Exception Handler
    UINT64 dividend = *pGdx;
    dividend <<= 32;
    dividend += *pGax;
    *pGax = dividend / divisor;
    *pGdx = dividend % divisor;
    PIN_TryEnd(tid); /* Guarded Code Section ends */
}
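The arithmetic inside EmulateIntDivide can be checked in isolation. The sketch below models the 32-bit unsigned DIV instruction, which divides the 64-bit value EDX:EAX by the operand, leaving the quotient in EAX and the remainder in EDX. The function name `EmulateDiv32` is hypothetical, and where the hardware would raise a divide error (zero divisor, or a quotient that does not fit in 32 bits) the sketch returns false instead of faulting.

```cpp
#include <cassert>
#include <cstdint>

// Models 32-bit unsigned DIV: (EDX:EAX) / divisor.
// On success stores the quotient in *eax and the remainder in *edx.
// Returns false, leaving both registers untouched, in the two cases
// where the hardware raises a divide error.
bool EmulateDiv32(std::uint32_t *edx, std::uint32_t *eax, std::uint32_t divisor) {
    if (divisor == 0) return false;                      // divide by zero
    std::uint64_t dividend = (static_cast<std::uint64_t>(*edx) << 32) | *eax;
    std::uint64_t quotient = dividend / divisor;
    if (quotient > 0xFFFFFFFFull) return false;          // quotient overflow
    *eax = static_cast<std::uint32_t>(quotient);
    *edx = static_cast<std::uint32_t>(dividend % divisor);
    return true;
}
```

The slide's version lets the native division fault and relies on the guarded code section to forward the exception to the application; the explicit checks here only make those fault conditions visible.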

169 Exceptions
EXCEPT_HANDLING_RESULT GlobalHandler(THREADID threadIndex, EXCEPTION_INFO *pExceptInfo,
                                     PHYSICAL_CONTEXT *pPhysCtxt, VOID *v)
{
    // Any exception occurring in the Pin Tool, or in Pin, that is not in a Guarded Code Section
    // will cause this function to be executed
    cout << "GlobalHandler: Caught unexpected exception. " << PIN_ExceptionToString(pExceptInfo) << endl;
    return EHR_UNHANDLED;
}

170 Exceptions, Monitoring Application Exceptions
PIN_AddContextChangeFunction
– Can monitor and change the application state at application exceptions

int main(int argc, char **argv)
{
    PIN_Init(argc, argv);
    PIN_AddContextChangeFunction(OnContextChange, 0);
    PIN_StartProgram();
}

171 Exceptions, Monitoring Application Exceptions
static void OnContextChange(THREADID tid, CONTEXT_CHANGE_REASON reason,
                            const CONTEXT *ctxtFrom, // Application's register state at exception point
                            CONTEXT *ctxtTo,         // Application's register state delivered to handler
                            INT32 info, VOID *v)
{
    if (CONTEXT_CHANGE_REASON_SIGRETURN == reason || CONTEXT_CHANGE_REASON_APC == reason ||
        CONTEXT_CHANGE_REASON_CALLBACK == reason || CONTEXT_CHANGE_REASON_FATALSIGNAL == reason ||
        ctxtTo == NULL)
    {   // don't want to handle these
        return;
    }

    // CONTEXT_CHANGE_REASON_EXCEPTION
    // change some register values in the context that the application will see at the handler
    FPSTATE fpContextFromPin;

    // change the bottom 4 bytes of xmm0
    PIN_GetContextFPState(ctxtFrom, &fpContextFromPin);
    fpContextFromPin.fxsave_legacy._xmm[3] = 0xde;
    fpContextFromPin.fxsave_legacy._xmm[2] = 0xad;
    fpContextFromPin.fxsave_legacy._xmm[1] = 0xbe;
    fpContextFromPin.fxsave_legacy._xmm[0] = 0xef;
    PIN_SetContextFPState(ctxtTo, &fpContextFromPin);

    // change rax
    PIN_SetContextReg(ctxtTo, REG_RAX, 0xbaadf00d);
}

172 Signals
Establish an interceptor function for signals delivered to the application
– Tools should never call sigaction() directly to handle signals.
– The function is called whenever the application receives the requested signal, regardless of whether the application has a handler for that signal.
– The function can then decide whether the signal should be forwarded to the application

173 Signals
A tool can take over ownership of a signal in order to:
– Use the signal as an asynchronous communication mechanism to the outside world.
  – For example, if a tool intercepts SIGUSR1, a user of the tool could send this signal and tell the tool to do something. In this usage model, the tool may call PIN_UnblockSignal() so that it will receive the signal even if the application attempts to block it.
– "Squash" certain signals that the application generates.
  – For example, a tool that forces speculative execution in the application may want to intercept and squash exceptions generated in the speculative code.
A tool can set only one "intercept" handler for a particular signal, so a new handler overwrites any previous handler for the same signal. To disable a handler, pass a NULL function pointer.

174 Signals
BOOL EnableInstrumentation = FALSE;

BOOL SignalHandler(THREADID, INT32, CONTEXT *, BOOL, const EXCEPTION_INFO *, void *)
{
    // When tool receives the signal, enable instrumentation. Tool calls
    // PIN_RemoveInstrumentation() to remove any existing instrumentation from Pin's code cache.
    EnableInstrumentation = TRUE;
    PIN_RemoveInstrumentation();
    return FALSE; /* Tell Pin NOT to pass the signal to the application. */
}

VOID Trace(TRACE trace, VOID *)
{
    if (!EnableInstrumentation) return;
    for (BBL bbl = TRACE_BblHead(trace); BBL_Valid(bbl); bbl = BBL_Next(bbl))
        BBL_InsertCall(bbl, IPOINT_BEFORE, AFUNPTR(AnalysisFunc), IARG_INST_PTR, IARG_END);
}

int main(int argc, char *argv[])
{
    PIN_Init(argc, argv);
    // Tool should really determine which signal is NOT in use by application
    PIN_InterceptSignal(SIGUSR1, SignalHandler, 0);
    PIN_UnblockSignal(SIGUSR1, TRUE);
    TRACE_AddInstrumentFunction(Trace, 0);
    PIN_StartProgram();
}

175 Accessing the Decode API
The decoder/encoder used is called XED
– http://www.pintool.org/docs/24110/Xed/html/
Tool code can use the XED API
– E.g. decode an instruction inside an analysis routine.

176 Accessing the Decode API
extern "C" {
#include "xed-interface.h"
}

// Pin will NOT inline this function, it is the THEN part
static VOID PIN_FAST_ANALYSIS_CALL MemoryOverWriteAt(ADDRINT appIP, ADDRINT memWriteAddr,
                                                      UINT32 numBytesWritten)
{
    INT32 column, lineNum;
    string fileName;
    PIN_GetSourceLocation(appIP, &column, &lineNum, &fileName);

    static const xed_state_t dstate = { XED_MACHINE_MODE_LEGACY_32, XED_ADDRESS_WIDTH_32b };
    xed_decoded_inst_t xedd;
    xed_decoded_inst_zero_set_mode(&xedd, &dstate);
    xed_error_enum_t xed_code = xed_decode(&xedd, reinterpret_cast<UINT8 *>(appIP), 15);

    char buf[256];
    xed_decoded_inst_dump_intel_format(&xedd, buf, 256, appIP);
    printf("overwrite of %p from instruction at %p %s originating from file %s line %d col %d\n",
           KnobMemAddrBeingOverwritten, appIP, buf, fileName.c_str(), lineNum, column);
    printf(" writing %d bytes starting at %p\n", numBytesWritten, memWriteAddr);
}

177 Pin Code-Cache API
The Code-Cache API allows a Pin Tool to:
– Inspect Pin's code cache and/or alter the code cache replacement policy
– Assume full control of the code cache
– Remove all or selected traces from the code cache
– Monitor code cache activity, including start/end of execution of code in the code cache

178 Pin Code-Cache API
VOID DoSmcCheck(VOID *traceAddr, VOID *traceCopyAddr, USIZE traceSize, CONTEXT *ctxP)
{
    if (memcmp(traceAddr, traceCopyAddr, traceSize) != 0) /* application code changed */
    {
        // the jitted trace is no longer valid
        free(traceCopyAddr);
        CODECACHE_InvalidateTraceAtProgramAddress((ADDRINT)traceAddr);
        PIN_ExecuteAt(ctxP); /* Continue jitted execution at this application trace */
    }
}

VOID InstrumentTrace(TRACE trace, VOID *v)
{
    VOID *traceAddr;
    VOID *traceCopyAddr;
    USIZE traceSize;

    traceAddr = (VOID *)TRACE_Address(trace); // The appIP of the start of the trace
    traceSize = TRACE_Size(trace);            // The size of the original application trace in bytes
    traceCopyAddr = malloc(traceSize);
    if (traceCopyAddr != 0)
    {
        memcpy(traceCopyAddr, traceAddr, traceSize); // Copy of original application code in trace

        // Insert a call to DoSmcCheck before every trace
        TRACE_InsertCall(trace, IPOINT_BEFORE, (AFUNPTR)DoSmcCheck,
                         IARG_PTR, traceAddr,
                         IARG_PTR, traceCopyAddr,
                         IARG_UINT32, traceSize,
                         IARG_CONTEXT,
                         IARG_END);
    }
}

int main(int argc, char *argv[])
{
    PIN_Init(argc, argv);
    TRACE_AddInstrumentFunction(InstrumentTrace, 0);
    PIN_StartProgram();
}
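The self-modifying-code check above boils down to "snapshot the bytes at instrumentation time, compare before each execution". A minimal sketch of just that comparison, with no Pin dependency, might look as follows. `CodeSnapshot` is a hypothetical name, and a `std::vector` replaces the `malloc`/`free` pair so the copy is released automatically.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <vector>

// Keeps a private copy of a byte range and reports whether the live bytes
// have diverged from that copy since the snapshot was taken.
class CodeSnapshot {
public:
    CodeSnapshot(const void *addr, std::size_t size)
        : addr_(addr),
          copy_(static_cast<const unsigned char *>(addr),
                static_cast<const unsigned char *>(addr) + size) {}

    // True when the original range no longer matches the saved copy.
    bool Modified() const {
        return !copy_.empty() &&
               std::memcmp(addr_, copy_.data(), copy_.size()) != 0;
    }

private:
    const void *addr_;                 // start of the monitored range
    std::vector<unsigned char> copy_;  // bytes as they were at snapshot time
};
```

In the tool, a positive result is what triggers invalidating the cached trace and resuming execution so the changed code is re-jitted.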

179 Transparent debugging, and extending the debugger
Transparently debug the application while it is running on Pin + Pin Tool
– PinADX: Customizable Debugging with Dynamic Instrumentation, CGO 2012
Use a Pin Tool to enhance/extend the debugger capabilities
– Watchpoint: an order of magnitude faster when implemented using a Pin Tool
  – See previous “Symbols: Accessing Application Debug Info from a Pin Tool”
– Which branch is branching to address 0
  – Easy to write a Pin Tool that implements this

180 Debug Application while Running Pin
Useful for Pin-based emulators
– User can debug application while emulating
Provide advanced debugging features with Pin:
– Stack monitoring features
– Buffer overrun detection
– Reverse debugging
– Write your own debugger extension via Pin

181 Naïve Solution Won’t Work
Why can’t we just debug normally?
– Debugger sees Pin state, not application state
– Pin recompiles application code
– Instructions wrong, registers wrong, PC wrong, …
[Diagram: GDB attached directly to the Pinned process, which contains Pin, the Tool, and the Application]

182 Pin Debugger Interface
GDB debugs application (not Pin itself)
Leverage GDB remote protocol
[Diagram labels: Pin process containing Application, Tool, Debug Agent, and Pin; GDB (unmodified); GDB remote protocol (tcp); ABI]

183 Debug the Application with Pin
1. Run Pin with -appdebug
2. Start GDB, enter “target remote …”
3. Set breakpoints, etc. Continue with “cont”
$ pin -appdebug -t tool.so -- ./application
Application stopped until continued from debugger.
Start GDB, then issue this command at the (gdb) prompt:
  target remote :1234
$ gdb ./application
(gdb) target remote :1234
(gdb) break main
(gdb) cont

184 Extending the Debugger
Normal debugging with Pin is useful but limited
Extending the debugger:
  – Add GDB commands via a Pin tool
  – Stop at “semantic breakpoint” via instrumentation

185 Pintool 4: Stack Debugger
$ pin -appdebug -t stack-debugger.so -- ./app
$ gdb ./app
(gdb) target remote :1234
(gdb) monitor stackbreak 4000
Break when thread uses 4000 stack bytes.
(gdb) cont
Thread uses 4004 stack bytes.
[…]
(gdb) monitor stats
Maximum stack usage: 8560 bytes.
The “monitor” commands are implemented in the Pintool

186 Stack-Debugger Instrumentation
[…]
sub $0x60, %esp
cmp %esi, %edx
jle
Thread Start:
  StackBase = %esp;
  MaxStack = 0;
After each stack-changing instruction:
  size = StackBase - %esp;
  if (size > MaxStack) MaxStack = size;
  if (size > StackLimit) TriggerBreakpoint();

187 ManualExamples/stack-debugger.cpp
// Instrumentation routine: insert the call only after instructions that write to %esp
VOID Instruction(INS ins, VOID *)
{
    if (INS_RegWContain(ins, REG_STACK_PTR))
    {
        IPOINT where = (INS_HasFallThrough(ins)) ? IPOINT_AFTER : IPOINT_TAKEN_BRANCH;
        INS_InsertCall(ins, where, (AFUNPTR)OnStackChange,
                       IARG_REG_VALUE, REG_STACK_PTR,
                       IARG_THREAD_ID,
                       IARG_CONST_CONTEXT,
                       IARG_END);
    }
}
// Analysis routine (IARG_CONST_CONTEXT passes a const CONTEXT *)
VOID OnStackChange(ADDRINT sp, THREADID tid, const CONTEXT *ctxt)
{
    size_t size = StackBase - sp;
    if (size > StackMax) StackMax = size;
    if (size > StackLimit)
    {
        ostringstream os;
        os << "Thread uses " << size << " stack bytes.";
        PIN_ApplicationBreakpoint(ctxt, tid, FALSE, os.str()); // Triggers debugger breakpoint
    }
}
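The bookkeeping done by OnStackChange can be tried out without Pin. The sketch below is a stand-alone illustration, not the tool's code: `StackTracker` and its fields are hypothetical names mirroring StackBase, StackMax and StackLimit. Treating a limit of 0 as "no limit set" and clamping a stack pointer above the base to zero usage are assumptions of this sketch.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical per-thread state mirroring StackBase, StackMax and StackLimit.
struct StackTracker {
    std::uintptr_t base;  // stack pointer recorded at thread start
    std::size_t maxUsed;  // largest stack usage seen so far, in bytes
    std::size_t limit;    // break threshold in bytes (assumption: 0 = no limit)

    // Called with the new stack pointer after each stack-changing instruction.
    // Returns true when the tool should trigger a breakpoint.
    bool onStackChange(std::uintptr_t sp) {
        // The stack grows downward, so usage is base - sp.
        // Assumption: if sp is above the base, report zero usage instead of wrapping.
        std::size_t size = (sp <= base) ? static_cast<std::size_t>(base - sp) : 0;
        if (size > maxUsed) maxUsed = size;
        return limit != 0 && size > limit;
    }
};

// Usage: replay the session from the slides (limit 4000, thread reaches 4004 bytes).
inline void stackTrackerExample() {
    StackTracker t{0x7fff0000u, 0, 4000};
    assert(!t.onStackChange(0x7fff0000u - 0x60));  // sub $0x60, %esp: 96 bytes in use
    assert(t.onStackChange(0x7fff0000u - 4004));   // over the limit: break
    assert(t.maxUsed == 4004);
}
```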

188 ManualExamples/stack-debugger.cpp
int main()
{
    […]
    // Hooks the GDB “monitor” command. E.g.:
    //   (gdb) monitor stats
    //   (gdb) monitor stackbreak 4000
    PIN_AddDebugInterpreter(HandleDebugCommand, 0);
}
// cmd receives the text after “monitor”; *result is written to the GDB session
BOOL HandleDebugCommand(const string &cmd, string *result)
{
    if (cmd == "stats")
    {
        ostringstream os;
        os << "Maximum stack usage: " << StackMax << " bytes.\n";
        *result = os.str();
        return TRUE;
    }
    else if (cmd.find("stackbreak ") == 0)
    {
        StackLimit = /* parse limit */;
        ostringstream os;
        os << "Break when thread uses " << StackLimit << " stack bytes.";
        *result = os.str();
        return TRUE;
    }
    return FALSE; // Unknown command
}
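The command dispatch above can be exercised outside Pin. The following is a hedged, stand-alone sketch: `DebuggerState` and `handleDebugCommand` are hypothetical names, and the way the limit is parsed (stream extraction, with a usage message on failure) is one possible choice of this sketch, not something the slides show.

```cpp
#include <cassert>
#include <cstddef>
#include <sstream>
#include <string>

// Hypothetical tool state standing in for the globals StackMax and StackLimit.
struct DebuggerState {
    std::size_t stackMax = 0;
    std::size_t stackLimit = 0;
};

// Handles the text that follows "monitor". Returns false for unknown commands.
bool handleDebugCommand(DebuggerState& st, const std::string& cmd, std::string* result) {
    assert(result != nullptr);
    const std::string prefix = "stackbreak ";
    if (cmd == "stats") {
        std::ostringstream os;
        os << "Maximum stack usage: " << st.stackMax << " bytes.\n";
        *result = os.str();
        return true;
    }
    if (cmd.compare(0, prefix.size(), prefix) == 0) {
        // Assumption: the argument is a decimal byte count.
        std::istringstream is(cmd.substr(prefix.size()));
        std::size_t limit = 0;
        if (!(is >> limit)) {
            *result = "Usage: stackbreak <bytes>\n";  // hypothetical error text
            return true;
        }
        st.stackLimit = limit;
        std::ostringstream os;
        os << "Break when thread uses " << st.stackLimit << " stack bytes.";
        *result = os.str();
        return true;
    }
    return false;  // Unknown command
}
```

Returning true with a usage message on a malformed argument keeps the command recognized, so the user sees the tool's message instead of a generic unknown-command response.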

189 Other Debugger Tools
Breakpoint on buffer overrun
Debug from a recorded log file
Reverse debugging from a recording
Design your own custom debugger tool

190 Pin Debugger Internals

191 Pin Debugger Interface
[Diagram labels: Application, Tool, GDB, Debug Agent, Pin, GDB remote protocol (tcp), Pin process, (unmodified)]

192 Communication Details
[Diagram: GDB, Debug Agent, Pin]
Commands
  – Read / write registers, memory
  – Set breakpoints
  – Continue, single-step, stop
Notifications
  – Breakpoint triggered
  – Caught signal
  – Application exited
Very low level
  – Symbol processing in GDB
  – Expression evaluation in GDB
Reusing GDB’s remote debugging interface
  – Many debuggers have an interface like this
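To give a feel for how low-level this interface is: in the GDB remote serial protocol, each command or reply travels as a packet of the form `$<payload>#<checksum>`, where the checksum is the sum of the payload bytes modulo 256, written as two hex digits. The sketch below frames and validates such packets. The function names are hypothetical, it skips the protocol's escaping and acknowledgement rules, and it says nothing about how Pin's debug agent is actually implemented.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <cstdio>
#include <cstdlib>
#include <string>

// Checksum of a packet payload: sum of its bytes modulo 256.
unsigned rspChecksum(const std::string& payload) {
    unsigned sum = 0;
    for (unsigned char c : payload) sum += c;
    return sum & 0xffu;
}

// Wrap a payload as "$<payload>#<two hex digits>".
// Assumption: the payload contains no '$', '#' or '}' bytes, so no escaping is needed.
std::string rspFrame(const std::string& payload) {
    char hex[3];
    std::snprintf(hex, sizeof(hex), "%02x", rspChecksum(payload));
    return "$" + payload + "#" + hex;
}

// Extract the payload from a framed packet; returns false if the frame is
// malformed or its checksum does not match.
bool rspUnframe(const std::string& frame, std::string* payload) {
    assert(payload != nullptr);
    if (frame.size() < 4 || frame[0] != '$') return false;  // smallest frame: "$#00"
    const std::size_t hash = frame.size() - 3;
    if (frame[hash] != '#') return false;
    if (!std::isxdigit(static_cast<unsigned char>(frame[hash + 1])) ||
        !std::isxdigit(static_cast<unsigned char>(frame[hash + 2]))) return false;
    const unsigned long got = std::strtoul(frame.c_str() + hash + 1, nullptr, 16);
    const std::string body = frame.substr(1, hash - 1);
    if (got != rspChecksum(body)) return false;
    *payload = body;
    return true;
}
```

For example, the read-registers command `g` is framed as `$g#67`, and continue (`c`) as `$c#63`.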

193 Commands / Notifications Virtualized
Commands
  – Read register -> read virtualized app register
  – Set breakpoint -> set virtual breakpoint
  – Single step -> step VM one app instruction
Notifications
  – Signal notification -> on virtual signal in VM
  – Breakpoint notification -> on virtual BP in VM

194 Single Step
[Diagram: original code (instructions 1 to 6) and code cache (1’)]
GDB -> Pin: do single-step
Pin -> GDB: step complete notification
Execution stops in Pin, which waits for GDB to continue

195 Breakpoint
[Diagram: original code (instructions 1 to 6, BP at 4) and code cache (1’, 2’, 3’)]
GDB -> Pin: set breakpoint at 4
GDB -> Pin: continue
Pin -> GDB: breakpoint notification
Execution stops in Pin, which waits for GDB to continue
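The single-step and breakpoint slides can be summarized with a toy model: breakpoints are kept as application addresses and checked by the VM before it runs each application instruction, so the debugger never sees code-cache addresses. The sketch below is purely illustrative and assumes nothing about Pin's internals. `ToyVm`, `StopReason` and the six-instruction straight-line "program" are hypothetical.

```cpp
#include <cassert>
#include <set>
#include <vector>

enum class StopReason { StepComplete, Breakpoint, Exited };

// Toy VM: the "application" is instructions at addresses 1..lastAddr, run in order.
struct ToyVm {
    int pc = 1;                 // next application address to execute
    int lastAddr = 6;           // last application address
    std::set<int> breakpoints;  // virtual breakpoints, keyed by application address
    std::vector<int> executed;  // application addresses executed so far

    // Single step: run exactly one application instruction, then stop.
    StopReason step() {
        if (pc > lastAddr) return StopReason::Exited;
        executed.push_back(pc);
        ++pc;
        return StopReason::StepComplete;
    }

    // Continue: run until the next instruction to execute has a breakpoint.
    // The first instruction always runs, so continuing from a breakpoint makes progress.
    StopReason cont() {
        bool first = true;
        while (pc <= lastAddr) {
            if (!first && breakpoints.count(pc) != 0) return StopReason::Breakpoint;
            executed.push_back(pc);
            ++pc;
            first = false;
        }
        return StopReason::Exited;
    }
};
```

With a breakpoint at address 4, `cont()` executes 1, 2 and 3 and stops with `pc == 4`, matching the breakpoint slide; a second `cont()` then runs 4, 5 and 6 to completion.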

196 Part4 Summary
Boldly went where no pin head has gone before…
  – Lived to tell the tale
  – membuffer_threadpool tool
  – Using multiple buffers in the Pin Buffering API
  – Using Pin Tool Threads
  – Using Pin and OS locks to synchronize threads
  – System call instrumentation
  – Instrumenting a process tree
  – CONTEXT* and IARG_CONTEXT
  – Managing Exceptions and Signals
  – Decode API
  – Pin Code-Cache API
  – Transparent debugging, and extending the debugger

197 Part5 Performance #s

198 Pin Performance (Windows)

199 Pin Performance (Windows)

200 Scalability of Workloads on Pin
Pin VMM serialization does not impact scalability of the application
  – Execution in the code cache is not serialized
Scalability may drop due to limited memory bandwidth (MemTrace) or contention for tool private data (MemError)
[Chart: Scalability; series: No Instrumentation, BBCount, Native, MemTrace, MemError]

201 Kernel Interaction Overhead
Slowdown Relative to Native per Kernel Interaction
  System Calls   12X
  Exceptions     10.5X
  APCs           3X
  Callbacks      1.8X
Cost of a trip in Pin VMM for each system call is high
  – ~3000 cycles in VMM vs. ~500 cycles for ring crossing
  – Future work: a faster path in VMM for system calls
Kernel Interaction Counts
                              Illustrator   Excel     CINEBENCH   POV-Ray
  System Calls                1,659,298     658,683   101,700     75,313
  Exceptions                  1             0         0           0
  APCs                        6             6         2           4
  Callbacks                   73,062        68,767    961         7,682
  Overhead vs. Total Runtime  3.3%          2.8%      <1%
Total overhead for handling kernel interactions is relatively low
  – Kernel interactions are infrequent for the majority of applications

202 Overall Summary
Pin is Intel’s dynamic binary instrumentation engine
Pin can be used to instrument all user-level code
  – Windows, Linux
  – IA-32, Intel64, IA64
  – Product-level robustness
  – Jit-Mode for full instrumentation: Thread, Function, Trace, BBL, Instruction
  – Probe-Mode for Function Replacement/Wrapping/Instrumentation only
  – Pin supports multi-threading, with no serialization of the jitted application or of instrumentation code
Pin API makes Pin tools easy to write
  – Presented many tools, many fit on 1 ppt slide
Pin performance is good
  – Pin APIs provide for writing efficient Pin tools
Popular and well supported
  – 30,000+ downloads, 400+ citations
Free download
  – www.pintool.org
  – Includes: detailed user manual, source code for 100s of Pin tools, tutorials
Pin User Group
  – http://tech.groups.yahoo.com/group/pinheads/
  – Pin users and Pin developers answer questions

