Scalable Support for Multithreaded Applications on Dynamic Binary Instrumentation Systems
Kim Hazelwood, Greg Lueck, Robert Cohn


Hazelwood – ISMM 2009

Dynamic Binary Instrumentation
Inserts or modifies arbitrary instructions in executing binaries, e.g., for instruction counting.
Original instructions:
    sub  $0xff, %edx
    cmp  %esi, %edx
    jle
    mov  $0x1, %edi
    add  $0x10, %eax
Inserted instrumentation: counter++;
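To make the idea concrete, here is a minimal Python sketch, assuming a toy model in which each instruction is a Python function over a register dictionary. The names `instrument`, `run`, `sub_imm`, `mov_imm`, and `add_imm` are hypothetical and are not part of Pin. The instrumenter produces a modified copy of a block with `counter++` inserted before every instruction and leaves the original block untouched.

```python
from typing import Callable, Dict, List

Regs = Dict[str, int]
Instr = Callable[[Regs], None]


def sub_imm(reg: str, imm: int) -> Instr:
    def op(r: Regs) -> None:
        r[reg] = (r[reg] - imm) & 0xFFFFFFFF  # 32-bit wraparound
    return op


def mov_imm(reg: str, imm: int) -> Instr:
    def op(r: Regs) -> None:
        r[reg] = imm
    return op


def add_imm(reg: str, imm: int) -> Instr:
    def op(r: Regs) -> None:
        r[reg] = (r[reg] + imm) & 0xFFFFFFFF
    return op


def instrument(block: List[Instr], counter: List[int]) -> List[Instr]:
    """Return a modified copy of `block` with counter++ before every instruction."""
    def bump(_r: Regs) -> None:
        counter[0] += 1  # the inserted analysis code
    out: List[Instr] = []
    for ins in block:
        out.extend([bump, ins])  # the original instruction is kept unchanged
    return out


def run(block: List[Instr], regs: Regs) -> None:
    for ins in block:
        ins(regs)


# Three of the slide's instructions; the cmp/jle control flow is left out.
block = [sub_imm("edx", 0xFF), mov_imm("edi", 0x1), add_imm("eax", 0x10)]
```

Running the instrumented copy leaves the registers exactly as the original instructions would and sets `counter[0]` to 3.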

Instruction Count Output
$ /bin/ls
Makefile imageload.out itrace proccount imageload inscount atrace itrace.out
$ pin -t inscount.so -- /bin/ls
Makefile imageload.out itrace proccount imageload inscount atrace itrace.out
Count 422838

How Does It Work?
– Generates and caches modified copies of instructions
– The modified (cached) instructions are executed in lieu of the original instructions
Diagram labels: EXE, Transform, Code Cache, Execute, Profile
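The generate, cache, and execute loop can be sketched as a toy dispatcher in Python. Guest code is modeled as hypothetical Python functions keyed by address, and `CodeCache`, `translate`, and `run` are illustrative names, not Pin internals. On a miss the dispatcher builds a modified (profiled) copy of the block and caches it; from then on only the cached copy runs.

```python
from typing import Callable, Dict, Optional

State = Dict[str, int]
Block = Callable[[State], Optional[int]]  # runs one block, returns next address (None = exit)


class CodeCache:
    """Toy dispatcher plus code cache (hypothetical names, not Pin internals)."""

    def __init__(self, program: Dict[int, Block]) -> None:
        self.program = program             # original guest code, never run directly
        self.cache: Dict[int, Block] = {}  # guest address -> modified copy
        self.translations = 0
        self.profile: Dict[int, int] = {}  # guest address -> execution count

    def translate(self, addr: int) -> Block:
        self.translations += 1
        original = self.program[addr]

        def modified(state: State) -> Optional[int]:
            self.profile[addr] = self.profile.get(addr, 0) + 1  # inserted profiling
            return original(state)

        return modified

    def run(self, entry: int, state: State) -> None:
        pc: Optional[int] = entry
        while pc is not None:
            block = self.cache.get(pc)
            if block is None:              # miss: generate and cache a modified copy
                block = self.translate(pc)
                self.cache[pc] = block
            pc = block(state)              # execute the cached copy in lieu of the original


def loop_body(s: State) -> Optional[int]:
    s["i"] += 1
    return 0x20


def loop_test(s: State) -> Optional[int]:
    return 0x10 if s["i"] < 3 else 0x30


def done(s: State) -> Optional[int]:
    return None


PROGRAM: Dict[int, Block] = {0x10: loop_body, 0x20: loop_test, 0x30: done}
```

For this three-iteration loop, each of the three blocks is translated once even though blocks execute seven times in total, which is why caching the translations pays off.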

Why “Dynamic” Instrumentation?
Robustness!
– No need to recompile or relink
– Discover code at runtime
– Handle dynamically generated code
– Attach to running processes
The Code Discovery Problem on x86
Diagram: Instr 1, Instr 2, Instr 3, Jump Reg, DATA, Instr 5, Instr 6, Uncond Branch, PADDING, Instr 8
– Indirect jump to ??
– Data interspersed with code
– Pad for alignment
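A static disassembler that sweeps linearly through the bytes cannot tell the data after an indirect jump from code, while a dynamic system decodes only what execution actually reaches. The sketch below assumes a hypothetical four-opcode ISA with variable-length instructions (it is not the x86 encoding) to show the difference; `linear_sweep` and `discover_at_runtime` are illustrative names.

```python
from typing import List, Tuple

# Hypothetical 4-opcode ISA with variable-length instructions.
LENGTH = {0x01: 1, 0x02: 2, 0x03: 1, 0x04: 1}
NAME = {0x01: "NOP", 0x02: "LOADR", 0x03: "JMPR", 0x04: "HALT"}
# 0x02 imm: r = imm          0x03: jump to the address held in r (indirect)

PROGRAM = bytes([
    0x02, 0x05,  # 0: LOADR 5
    0x03,        # 2: JMPR   (statically: indirect jump to ??)
    0x02, 0xAA,  # 3: DATA that happens to look like "LOADR 0xAA"
    0x01,        # 5: NOP
    0x04,        # 6: HALT
])


def linear_sweep(code: bytes) -> List[Tuple[int, str]]:
    """Static decoding: assume every instruction is followed by another one."""
    out, pc = [], 0
    while pc < len(code):
        op = code[pc]
        out.append((pc, NAME.get(op, "???")))
        pc += LENGTH.get(op, 1)
    return out


def discover_at_runtime(code: bytes, entry: int = 0) -> List[int]:
    """Dynamic discovery: decode only the instructions execution reaches."""
    seen, pc, r = set(), entry, 0
    while True:
        seen.add(pc)
        op = code[pc]
        if op == 0x01:
            pc += 1
        elif op == 0x02:
            r, pc = code[pc + 1], pc + 2
        elif op == 0x03:
            pc = r  # the indirect target is known now, at run time
        elif op == 0x04:
            return sorted(seen)
        else:
            raise ValueError(f"invalid opcode {op:#x} at {pc}")
```

`linear_sweep` reports a bogus `LOADR` at offset 3, which is really data; `discover_at_runtime` finds only offsets 0, 2, 5, and 6, because it knows the indirect jump's target when it executes.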

Intel Pin
– A dynamic binary instrumentation system
– Easy-to-use instrumentation interface
– Supports multiple platforms
  – Four ISAs: IA32, Intel64, IPF, ARM
  – Four OSes: Linux, Windows, FreeBSD, MacOS
– Popular and well supported
  – 32,000+ downloads
  – 400+ citations
  – 500+ mailing list subscribers

Research Applications
– Gather profile information about applications
– Compare programs generated by competing compilers
– Generate a select stream of live information for event-driven simulation
– Add security features
– Emulate new hardware
– Anything and everything multicore

The Problem with Modern Tools
– Many research tools do not support multithreaded guest applications
– Providing support for MT apps is mostly straightforward
– Providing scalable support can be tricky!

Issues that Arise
– Gaining control of executing threads
– Determining what should be private vs. shared between threads
– Code cache maintenance and consistency
– Concurrent instruction writes
– Providing/handling thread-local storage
– Handling indirect branches
– Handling signals / system calls
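Thread-local storage matters for scalable instrumentation: if every thread increments one shared counter, the threads contend for it. A common alternative is one private slot per thread, summed at the end. The Python sketch below shows that pattern with hypothetical names (`PerThreadCounter`, `run_threads`); it does not describe how Pin itself implements thread-local storage.

```python
import threading
from typing import List


class PerThreadCounter:
    """One counter slot per thread ID: the hot path takes no lock because
    each slot is only ever written by its own thread."""

    def __init__(self, max_threads: int) -> None:
        self.slots: List[int] = [0] * max_threads

    def add(self, tid: int, n: int) -> None:
        self.slots[tid] += n  # private to thread `tid`

    def total(self) -> int:
        return sum(self.slots)  # call only after all threads have joined


def run_threads(num_threads: int, blocks_per_thread: int, block_len: int) -> PerThreadCounter:
    """Hypothetical workload: every thread 'executes' the same basic block repeatedly."""
    counter = PerThreadCounter(num_threads)

    def worker(tid: int) -> None:
        for _ in range(blocks_per_thread):
            counter.add(tid, block_len)

    threads = [threading.Thread(target=worker, args=(t,)) for t in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return counter
```

In C or C++ each slot should also be padded to its own cache line, since adjacent counters written by different threads would otherwise falsely share a line.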

The Pin Architecture
Diagram: the Pin VM (JIT Compiler, Dispatcher, Syscall Emulator, Signal Emulator) and the Code Cache, plus a Pin Tool contributing Instrumentation Code, Call-Back Handlers, and Analysis Code. Threads T1 and T2 appear twice: serialized inside the VM and running in parallel in the code cache.
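Assuming the split shown in the diagram (the VM, including the JIT compiler, runs serialized while translated code runs in parallel), a shared cache can take a single lock only on a miss. This Python sketch uses hypothetical names (`SharedCodeCache`, `lookup_or_compile`) and models translations as strings. A double-checked lookup ensures each address is compiled exactly once even when several threads miss on it at the same time.

```python
import threading
from typing import Callable, Dict


class SharedCodeCache:
    """Code cache shared by all threads: lookups of existing translations take
    no lock, while compilation is serialized under a single 'VM' lock."""

    def __init__(self, compile_fn: Callable[[int], str]) -> None:
        self._compile = compile_fn
        self._cache: Dict[int, str] = {}
        self._vm_lock = threading.Lock()
        self.compilations = 0

    def lookup_or_compile(self, addr: int) -> str:
        block = self._cache.get(addr)      # fast path: parallel, no lock
        if block is not None:
            return block
        with self._vm_lock:                # slow path: serialized
            block = self._cache.get(addr)  # re-check: another thread may have won the race
            if block is None:
                block = self._compile(addr)
                self.compilations += 1
                self._cache[addr] = block  # publish only a finished translation
            return block
```

The lock-free fast path here relies on CPython's dictionary operations being atomic; a native implementation would need explicit memory ordering when publishing a new translation.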

Code Cache Consistency
Cached code must be removed for a variety of reasons:
– Dynamically unloaded code
– Ephemeral/adaptive instrumentation
– Self-modifying code
– Bounded code caches
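Whatever the reason for removal, the cache must find every trace built from the affected guest addresses. Because a trace can span several basic blocks, matching on the trace's start address is not enough; the check has to test for overlap. The sketch below is a minimal Python model with hypothetical names (`Trace`, `TraceCache`, `invalidate_range`). A real system must also unlink branches that jump directly into the removed traces, which this sketch omits.

```python
from typing import Dict, List, NamedTuple


class Trace(NamedTuple):
    start: int  # guest address of the trace's first instruction
    end: int    # one past the last guest byte the trace was built from


class TraceCache:
    def __init__(self) -> None:
        self.traces: Dict[int, Trace] = {}  # keyed by guest start address

    def insert(self, trace: Trace) -> None:
        self.traces[trace.start] = trace

    def invalidate_range(self, lo: int, hi: int) -> List[int]:
        """Drop every trace built from any guest byte in [lo, hi), e.g. when a
        library is unloaded or code there is overwritten; return their starts."""
        doomed = [s for s, t in self.traces.items() if t.start < hi and lo < t.end]
        for s in doomed:
            del self.traces[s]
        return sorted(doomed)
```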

Motivating a Bounded Code Cache
Figure: the Perl benchmark

Flushing the Code Cache
– Option 1: All threads have a private code cache (oops, doesn't scale)
– Option 2: Shared code cache across threads
  – If one thread flushes the code cache, other threads may resume in stale memory

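Naïve Flush
– Wait for all threads to leave the code cache and return to the VM before flushing
– Could wait indefinitely!
Figure: timeline of Thread1, Thread2, and Thread3 alternating between the VM and the code cache (CC1 before the flush, CC2 after). Threads that reach the VM early stall there until every thread has arrived; the stalls make up the flush delay.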
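The naive protocol can be modeled as a small state machine. In this single-threaded Python sketch (the class `NaiveFlush` and its methods are hypothetical), a thread that leaves the code cache while a flush is pending stalls in the VM, and the flush happens only when the last thread arrives. A thread that never leaves the cache therefore holds everyone up.

```python
from typing import Iterable, Set


class NaiveFlush:
    """Single-threaded model of the naive flush (hypothetical, for illustration)."""

    def __init__(self, threads: Iterable[int]) -> None:
        self.in_cache: Set[int] = set(threads)  # threads running translated code
        self.stalled: Set[int] = set()          # threads waiting in the VM
        self.flush_pending = False
        self.generation = 1                     # 1 = CC1, 2 = CC2, ...

    def request_flush(self) -> None:
        self.flush_pending = True

    def exit_to_vm(self, tid: int) -> None:
        """Thread `tid` leaves the code cache and enters the VM."""
        self.in_cache.discard(tid)
        if not self.flush_pending:
            self.in_cache.add(tid)      # VM work done; resume in the cache
            return
        self.stalled.add(tid)           # others may still be running old code
        if not self.in_cache:           # the last thread has arrived
            self.generation += 1        # now safe to discard the old contents
            self.flush_pending = False
            self.in_cache, self.stalled = self.stalled, set()
```

Until the final call to `exit_to_vm`, threads 1 and 2 in the test scenario sit stalled while thread 3 keeps running in CC1.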

Generational Flush
– Allow threads to continue making progress in a separate area of the code cache
– Requires a high-water mark
Figure: timeline of Thread1, Thread2, and Thread3 alternating between the VM and the code cache areas CC1 and CC2, without stalls
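A generational flush can be sketched with per-generation occupancy: new translations go into the newest generation, threads re-entering from the VM move to it, and an old generation is freed once no thread remains inside it. The Python model below uses hypothetical names and byte counts. It also suggests why a high-water mark is needed: the flush has to start before the cache is full, because the old generation's memory stays in use until its last thread leaves.

```python
from typing import Dict, Set


class GenerationalCache:
    """Hypothetical model of a generational flush: no thread ever stalls."""

    def __init__(self, capacity: int, high_water: int) -> None:
        self.capacity = capacity      # total bytes available for translations
        self.high_water = high_water  # start a new generation at this size
        self.current = 0
        self.size: Dict[int, int] = {0: 0}                # generation -> bytes used
        self.occupants: Dict[int, Set[int]] = {0: set()}  # generation -> thread IDs inside
        self.where: Dict[int, int] = {}                   # thread ID -> its generation

    def used(self) -> int:
        return sum(self.size.values())

    def add_code(self, nbytes: int) -> None:
        """New translations always go into the current generation."""
        if self.used() + nbytes > self.capacity:
            raise MemoryError("high-water mark set too high: no room for new code")
        self.size[self.current] += nbytes
        if self.size[self.current] >= self.high_water:
            # Flush: open a new generation; old code stays until its threads leave.
            self.current += 1
            self.size[self.current] = 0
            self.occupants[self.current] = set()
            self._reclaim()

    def enter(self, tid: int) -> None:
        """Thread `tid` comes back from the VM and resumes in the newest generation."""
        old = self.where.get(tid)
        if old is not None:
            self.occupants[old].discard(tid)
        self.where[tid] = self.current
        self.occupants[self.current].add(tid)
        self._reclaim()

    def _reclaim(self) -> None:
        for gen in [g for g in self.size if g != self.current and not self.occupants[g]]:
            del self.size[gen], self.occupants[gen]
```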

Memory Scalability of the Code Cache
Ensuring scalability also requires carefully configuring the code stored in the cache.
Trace lengths:
– The first basic block of a trace is non-speculative; the others are speculative
– Longer traces: fewer entries in the lookup table, but more unexecuted code
– Shorter traces: more trace ends, and a trace that ends in a conditional branch has two off-trace paths, so more exit stub code
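The trade-off between lookup entries and exit stubs can be illustrated with a simple cost model, assuming a hot path of basic blocks that is cut into traces of at most `max_blocks` blocks. In this model a conditional branch inside a trace needs one exit stub for its off-trace side, while the block that ends a trace needs two stubs if it ends in a conditional branch and one otherwise. The function `trace_costs` is hypothetical, and the model ignores the opposing cost of longer traces, namely speculative blocks that are never executed.

```python
from typing import List, Tuple


def trace_costs(path: List[bool], max_blocks: int) -> Tuple[int, int]:
    """`path` lists the blocks of a hot path in order; True means the block ends in
    a conditional branch. Returns (lookup-table entries, exit stubs)."""
    entries = stubs = 0
    for i in range(0, len(path), max_blocks):
        trace = path[i:i + max_blocks]
        entries += 1                      # one lookup entry per trace
        for j, conditional in enumerate(trace):
            if j == len(trace) - 1:       # trace end: every successor is off-trace
                stubs += 2 if conditional else 1
            elif conditional:             # mid-trace: only the side not on the trace leaves
                stubs += 1
    return entries, stubs
```

For eight blocks that all end in conditional branches, one-block traces give 8 lookup entries and 16 stubs, four-block traces give 2 entries and 10 stubs, and a single eight-block trace gives 1 entry and 9 stubs.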

Effect of Trace Length on Trace Count

Effect of Trace Length on Memory

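Rewriting Instructions
– Pin must regularly rewrite branches
– x86 provides no atomic way to overwrite a 5-byte branch
– We use a neat trick*, which passes through four states:
  1. the “old” 5-byte branch
  2. a 2-byte self branch written over its first two bytes (a thread reaching it spins in place)
  3. the remaining n-2 bytes of the “new” branch written behind the self branch
  4. the “new” 5-byte branch, completed by overwriting the self branch
* Sundaresan et al. 2006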
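The byte sequence of the trick can be sketched in Python for an x86 5-byte near jump (opcode `E9` followed by a signed 32-bit displacement relative to the end of the instruction) and the 2-byte self branch `EB FE`. The helpers `jmp_rel32` and `patch_states` are hypothetical. The sketch only records the states another thread could observe, assuming each 2-byte write is atomic with respect to instruction fetch; it does not address the processor's actual rules for cross-modifying code.

```python
import struct
from typing import List

SELF_BRANCH = b"\xEB\xFE"  # short jmp with displacement -2: jumps to itself


def jmp_rel32(at: int, target: int) -> bytes:
    """Encode a 5-byte x86 near jmp located at address `at`."""
    return b"\xE9" + struct.pack("<i", target - (at + 5))


def patch_states(old: bytes, new: bytes) -> List[bytes]:
    """Return the successive contents of the 5-byte patch site."""
    assert len(old) == len(new) == 5
    site = bytearray(old)
    states = [bytes(site)]
    site[0:2] = SELF_BRANCH  # 1) 2-byte write: arriving threads now spin here
    states.append(bytes(site))
    site[2:5] = new[2:5]     # 2) n-2 trailing bytes: unreachable behind the spin
    states.append(bytes(site))
    site[0:2] = new[0:2]     # 3) 2-byte write: replace the spin with the new head
    states.append(bytes(site))
    return states
```

Every intermediate state either is one of the two complete branches or begins with the self branch, so a thread never decodes a half-written jump.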

Performance Results
– We use the SPEC OMP 2001 benchmarks, setting the thread count with the OMP_NUM_THREADS environment variable
– We compare:
  – Native performance and scalability
  – Pin (no Pintool): performance scalability
  – Pin (lightweight Pintool): scalability with InsCount, which counts instructions at basic-block (BB) granularity
  – Pin (middleweight Pintool): scalability with MemTrace, which records memory addresses
  – Pin (heavyweight Pintool): scalability with CMP$im, which collects memory addresses and applies a software model of the CMP cache
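Counting at BB granularity means one analysis call per executed basic block that adds the block's instruction count, instead of one call per instruction. The totals agree because control that enters a basic block normally runs all of its instructions. A minimal Python comparison follows, using hypothetical function names and a made-up list of executed block sizes.

```python
from typing import List, Tuple


def count_per_instruction(executed: List[int]) -> Tuple[int, int]:
    """`executed`: sizes (in instructions) of the basic blocks run, in order.
    Returns (instruction count, number of analysis calls)."""
    count = calls = 0
    for size in executed:
        for _ in range(size):
            count += 1  # counter++ before every instruction
            calls += 1
    return count, calls


def count_per_block(executed: List[int]) -> Tuple[int, int]:
    count = calls = 0
    for size in executed:
        count += size   # one call per block adds its statically known length
        calls += 1
    return count, calls
```

For blocks of sizes 5, 3, 5, 5, and 2, both approaches count 20 instructions, but the per-block version makes 5 analysis calls instead of 20.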

Native Scalability of SPEC OMP 2001

Performance Scalability (No Instrumentation)

Performance Scalability (Lightweight Instrumentation)

Performance Scalability (Middleweight Instrumentation)

Performance Scalability (Heavyweight Instrumentation)

Memory Scalability

Summary
– Dynamic instrumentation tools are useful
– In the multicore era, we must provide support for MT application analysis and simulation
– Providing MT support in Pin was easy
– Making it robust and scalable was not easy
http://www.pintool.org

