Published by Regina Cooper. Modified over 9 years ago.
1
Scalable Support for Multithreaded Applications on Dynamic Binary Instrumentation Systems
Kim Hazelwood, Greg Lueck, Robert Cohn
2
Hazelwood – ISMM 2009
Dynamic Binary Instrumentation
Inserts or modifies arbitrary instructions in executing binaries, e.g., to count instructions:
    sub $0xff, %edx
    cmp %esi, %edx
    jle
    mov $0x1, %edi
    add $0x10, %eax
    counter++;
3
Instruction Count Output
$ /bin/ls
Makefile imageload.out itrace proccount imageload inscount atrace itrace.out
$ pin -t inscount.so -- /bin/ls
Makefile imageload.out itrace proccount imageload inscount atrace itrace.out
Count 422838
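The instruction count idea can be sketched with a toy model. This is not Pin's API: the `instrument`, `run`, and `docount` helpers and the list-of-strings program are hypothetical names chosen for illustration, and the "instructions" are never really executed. The sketch only shows the shape of the technique, where a counting call is inserted before every instruction of a copy of the program.

```python
from typing import Callable, List, Union

# Hypothetical toy model, not Pin's API: a "program" is a list of
# instruction strings, and instrumentation inserts a callback before each one.
Instruction = str
Item = Union[Instruction, Callable[[], None]]

counter = 0


def docount() -> None:
    """Analysis routine: runs each time an instrumented instruction is reached."""
    global counter
    counter += 1


def instrument(program: List[Instruction]) -> List[Item]:
    """Return a modified copy with a docount call inserted before each instruction."""
    modified: List[Item] = []
    for ins in program:
        modified.append(docount)  # inserted code
        modified.append(ins)      # original instruction, unchanged
    return modified


def run(items: List[Item]) -> List[Instruction]:
    """Walk the modified copy: call inserted callbacks, record real instructions."""
    executed: List[Instruction] = []
    for item in items:
        if callable(item):
            item()
        else:
            executed.append(item)
    return executed


program = ["sub $0xff, %edx", "cmp %esi, %edx", "mov $0x1, %edi", "add $0x10, %eax"]
executed = run(instrument(program))
print("Count", counter)
```

The original instruction list is left untouched; only the instrumented copy carries the extra calls.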
4
How Does it Work?
Generates and caches modified copies of instructions
Modified (cached) instructions are executed in lieu of original instructions
[Figure: block diagram with labels EXE, Transform, Code Cache, Execute, Profile]
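A minimal sketch of the transform-and-cache loop described above, under stated assumptions: the `CodeCache` class, its `transform` and `lookup` methods, and the address-to-block dictionary are hypothetical and do not reflect Pin's internals. It shows only that the original code is transformed once per block, and that later executions reuse the cached modified copy.

```python
from typing import Dict, List

# Hypothetical toy model: original code maps an address to a basic block
# (a list of instruction strings). None of these names come from Pin.
original_code: Dict[int, List[str]] = {
    0x1000: ["sub $0xff, %edx", "cmp %esi, %edx"],
    0x2000: ["mov $0x1, %edi", "add $0x10, %eax"],
}


class CodeCache:
    def __init__(self) -> None:
        self.cache: Dict[int, List[str]] = {}  # original address -> modified copy
        self.translations = 0                   # number of transforms performed

    def transform(self, addr: int) -> List[str]:
        """Generate a modified copy: prefix each instruction with a count stub."""
        self.translations += 1
        modified: List[str] = []
        for ins in original_code[addr]:
            modified.append("counter++")
            modified.append(ins)
        return modified

    def lookup(self, addr: int) -> List[str]:
        """Return the cached copy, transforming the original only on a miss."""
        if addr not in self.cache:
            self.cache[addr] = self.transform(addr)
        return self.cache[addr]


cc = CodeCache()
# Reach block 0x1000 three times and block 0x2000 once.
for addr in (0x1000, 0x1000, 0x2000, 0x1000):
    block = cc.lookup(addr)  # the modified copy stands in for the original
print(cc.translations)
```

Four lookups cost only two transforms here, which is the reason for caching the modified code.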
5
Why “Dynamic” Instrumentation?
Robustness!
No need to recompile or relink
Discover code at runtime
Handle dynamically-generated code
Attach to running processes
The Code Discovery Problem on x86
[Figure: instruction stream layout with cells Instr 1, Instr 2, Instr 3, Jump Reg, DATA, Instr 5, Instr 6, Uncond Branch, PADDING, Instr 8. Annotations: indirect jump to ??; data interspersed with code; pad for alignment]
6
Intel Pin
A dynamic binary instrumentation system
Easy-to-use instrumentation interface
Supports multiple platforms
– Four ISAs: IA32, Intel64, IPF, ARM
– Four OSes: Linux, Windows, FreeBSD, MacOS
Popular and well supported
– 32,000+ downloads
– 400+ citations
– 500+ mailing list subscribers
7
Research Applications
Gather profile information about applications
Compare programs generated by competing compilers
Generate a select stream of live information for event-driven simulation
Add security features
Emulate new hardware
Anything and everything multicore
8
The Problem with Modern Tools
Many research tools do not support multithreaded guest applications
Providing support for MT apps is mostly straightforward
Providing scalable support can be tricky!
9
Issues that Arise
Gaining control of executing threads
Determining what should be private vs. shared between threads
Code cache maintenance and consistency
Concurrent instruction writes
Providing/handling thread-local storage
Handling indirect branches
Handling signals / system calls
10
The Pin Architecture
[Figure: architecture diagram with labels Pin, Pin Tool, JIT Compiler, Syscall Emulator, Signal Emulator, Dispatcher, Instrumentation Code, Call-Back Handlers, Analysis Code, Code Cache; threads T1 and T2 shown in regions marked Serialized and Parallel]
11
Code Cache Consistency
Cached code must be removed for a variety of reasons:
– Dynamically unloaded code
– Ephemeral/adaptive instrumentation
– Self-modifying code
– Bounded code caches
[Figure: block diagram with labels EXE, Transform, Code Cache, Execute, Profile]
12
Motivating a Bounded Code Cache
[Figure: The Perl Benchmark]
13
Flushing the Code Cache
Option 1: All threads have a private code cache (oops, doesn’t scale)
Option 2: Shared code cache across threads
If one thread flushes the code cache, other threads may resume in stale memory
14
Naïve Flush
Wait for all threads to return to the code cache
Could wait indefinitely!
[Figure: timeline for Thread1, Thread2, and Thread3 with segments labeled VM, CC1, VM stall, and CC2, and a marked Flush Delay]
15
Generational Flush
Allow threads to continue to make progress in a separate area of the code cache
Requires a high water mark
[Figure: timeline for Thread1, Thread2, and Thread3 with segments labeled VM, CC1, and CC2]
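The generational idea can be sketched as follows. This is an illustrative model only: the `GenerationalCodeCache` class and its `enter`, `leave`, and `flush` methods are hypothetical names, the high water mark that would trigger a flush is not modeled, and real systems must also handle locking. The point it shows is that a flush opens a new generation right away, while the old generation is reclaimed only after the last thread still inside it has left.

```python
from typing import Dict, List, Set


class GenerationalCodeCache:
    """Toy model of a generational flush; all names here are hypothetical."""

    def __init__(self) -> None:
        self.current = 0
        self.blocks: Dict[int, Dict[int, str]] = {0: {}}  # generation -> cached code
        self.inside: Dict[int, Set[str]] = {0: set()}     # generation -> threads in it
        self.where: Dict[str, int] = {}                   # thread -> its generation

    def enter(self, thread: str) -> int:
        """A thread leaves the VM and starts executing in the current generation."""
        self.inside[self.current].add(thread)
        self.where[thread] = self.current
        return self.current

    def leave(self, thread: str) -> None:
        """A thread returns to the VM; reclaim its generation if stale and empty."""
        gen = self.where.pop(thread)
        self.inside[gen].discard(thread)
        self._reclaim(gen)

    def flush(self) -> None:
        """Open a new generation immediately instead of waiting for every thread."""
        old = self.current
        self.current += 1
        self.blocks[self.current] = {}
        self.inside[self.current] = set()
        self._reclaim(old)

    def _reclaim(self, gen: int) -> None:
        # An old generation is freed only once no thread can still be running in it.
        if gen != self.current and gen in self.inside and not self.inside[gen]:
            del self.inside[gen]
            del self.blocks[gen]

    def live_generations(self) -> List[int]:
        return sorted(self.blocks)


cc = GenerationalCodeCache()
cc.enter("T1")
cc.enter("T2")
cc.leave("T1")                       # T1 is back in the VM
cc.flush()                           # T2 is still executing in generation 0
t1_gen = cc.enter("T1")              # T1 proceeds in generation 1 without stalling
live_before = cc.live_generations()  # both generations are still allocated
cc.leave("T2")                       # last thread leaves generation 0
live_after = cc.live_generations()
print(t1_gen, live_before, live_after)
```

Compared with the naïve flush, no thread stalls in the VM; the cost is that two generations occupy memory until the old one drains.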
16
Memory Scalability of the Code Cache
Ensuring scalability also requires carefully configuring the code stored in the cache
Trace Lengths
– First basic block is non-speculative, others are speculative
– Longer traces = fewer entries in the lookup table, but more unexecuted code
– Shorter traces = two off-trace paths at ends of basic blocks with conditional branches = more exit stub code
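The trade-off can be made concrete with a deliberately simplified model. The assumptions here are not from the slides: the code is a straight chain of basic blocks, every block ends in a conditional branch, a conditional branch inside a trace needs one exit stub (the other direction stays on-trace), and the branch that ends a trace needs two. The `trace_stats` function is a hypothetical helper, and the model ignores unexecuted speculative code entirely.

```python
import math
from typing import Tuple


def trace_stats(num_blocks: int, max_trace_len: int) -> Tuple[int, int]:
    """Return (trace count, exit stub count) under the simplified model above."""
    # Each trace holds up to max_trace_len consecutive basic blocks.
    traces = math.ceil(num_blocks / max_trace_len)
    # One off-trace stub per block, plus one extra for the block ending each trace,
    # because both directions of that final conditional branch leave the trace.
    exit_stubs = num_blocks + traces
    return traces, exit_stubs


for length in (1, 3, 6):
    print(length, trace_stats(12, length))
```

In this model, longer traces reduce both lookup-table entries and exit stubs; what it does not capture is the cost the slide names on the other side, the speculative blocks that are translated but never executed.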
17
Effect of Trace Length on Trace Count
18
Effect of Trace Length on Memory
19
Rewriting Instructions
Pin must regularly rewrite branches
No atomic branch write on x86
We use a neat trick*:
– “old” 5-byte branch
– 2-byte self branch
– n-2 bytes of “new” branch
– “new” 5-byte branch
* Sundaresan et al. 2006
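The sequence above can be checked with a small byte-level simulation. It is a sketch under assumptions: it takes for granted that each 2-byte store is performed atomically (which on real hardware depends on alignment and on the platform's cross-modifying-code rules), and the helper names `encode_jmp`, `decode`, `patch_states`, and `naive_states` are hypothetical. The encodings are standard x86: `E9 rel32` is a 5-byte near jump and `EB FE` is a 2-byte jump to itself.

```python
from typing import List, Tuple

SELF_BRANCH = bytes([0xEB, 0xFE])  # "jmp rel8" with displacement -2: jumps to itself


def encode_jmp(rel32: int) -> bytes:
    """Encode a 5-byte near jump: opcode 0xE9 plus a little-endian rel32."""
    return bytes([0xE9]) + rel32.to_bytes(4, "little", signed=True)


def decode(mem: bytes) -> Tuple[str, int]:
    """Decode what a thread fetching from this address would execute."""
    if mem[:2] == SELF_BRANCH:
        return ("spin", 0)
    if mem[0] == 0xE9:
        return ("jmp", int.from_bytes(mem[1:5], "little", signed=True))
    return ("invalid", 0)


def patch_states(old: bytes, new: bytes) -> List[bytes]:
    """Every memory state another thread could observe during the three-step patch."""
    mem = bytearray(old)
    states = [bytes(mem)]
    mem[0:2] = SELF_BRANCH  # step 1: 2-byte self branch (assumed atomic store)
    states.append(bytes(mem))
    mem[2:5] = new[2:5]     # step 2: last n-2 bytes of the new branch
    states.append(bytes(mem))
    mem[0:2] = new[0:2]     # step 3: first 2 bytes of the new branch (assumed atomic)
    states.append(bytes(mem))
    return states


def naive_states(old: bytes, new: bytes) -> List[bytes]:
    """States visible if the new branch is simply written one byte at a time."""
    mem = bytearray(old)
    states = []
    for i in range(len(new)):
        mem[i] = new[i]
        states.append(bytes(mem))
    return states


old, new = encode_jmp(0x100), encode_jmp(0x2040)
safe = [decode(s) for s in patch_states(old, new)]
torn = [decode(s) for s in naive_states(old, new)]
print(safe)
print(torn)
```

With the trick, every observable state is the old branch, a harmless spin, or the new branch. The byte-at-a-time write passes through a jump to a target that is neither old nor new.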
20
Performance Results
We use the SPEC OMP 2001 benchmarks
– Thread count set with the OMP_NUM_THREADS environment variable
We compare
– Native performance and scalability
– Pin (no Pintool) performance and scalability
– Pin (lightweight Pintool) scalability: InsCount Pintool, which counts instructions at basic-block (BB) granularity
– Pin (middleweight Pintool) scalability: MemTrace Pintool, which records memory addresses
– Pin (heavyweight Pintool) scalability: CMP$im, which collects memory addresses and applies a software model of the CMP cache
21
Native Scalability of SPEC OMP 2001
22
Performance Scalability (No Instrumentation)
23
Performance Scalability (LightWeight Instrumentation)
24
Performance Scalability (MiddleWeight Instrumentation)
25
Performance Scalability (HeavyWeight Instrumentation)
26
Memory Scalability
27
Summary
Dynamic instrumentation tools are useful
In the multicore era, we must provide support for MT application analysis and simulation
Providing MT support in Pin was easy
Making it robust and scalable was not easy
http://www.pintool.org