1
Instruction-level Tracing: Framework & Applications
Sanjay Bhansali, Binary Technologies Group, Center for Software Excellence (CSE), Microsoft, 11/04/2005
2
Context
Program analysis and transformation technology can have a huge impact on the engineering of software.
Center for Software Excellence: part of the Windows Core OS Division; balances research and innovation with a focus on deployment.
Binary Technologies Group: binary analysis using static and dynamic approaches.
Program analysis (PA) can have a big impact on engineering complex systems: making them more reliable, finding defects early, and making them perform better.
9/20/2018
3
Outline
Applications of Execution Traces
Dynamic Translation
Trace Capture
Trace Replay
Applications
Related Work
Summary
4
Applications of Execution Traces
Debugging, regression analysis, bug detection, coverage analysis, optimization, impact analysis, usage analysis, …
We want to build a framework that makes it easy to attack many different kinds of problems.
5
Run Once, Analyze Many
Complete instruction-level trace
Deterministic, full-fidelity replay of user-mode execution
Pros: run once, analyze multiple times
Cons: trace size, performance
6
Framework for Instruction-Level Tracing and Analysis
Task- and machine-independent
User-mode processes
Modest overhead (space and time)
On-demand tracing
Reduces the engineering effort of building analysis tools
7
Dynamic Binary Translation
Runtime interpretation/translation of binary instructions
Pros: requires no static instrumentation or special symbol information; handles dynamically generated and self-modifying code
Cons: roughly 5x slower than native execution
The competing alternative is to instrument the binary statically.
8
Nirvana Architecture
[Diagram: Application, Nirvana client, Nirvana API, JIT translator, code cache, VM monitor; across the user/kernel boundary, the Nirvana driver and the operating system]
The core component is an instruction emulator that we call Nirvana.
9
JIT Translation Example
Native code:
    mov eax, [ebp]
Translated code:
    mov EDX, tls.ebp
    mov EAX, [EDX]
10
JIT Translation Example
Native code:
    mov eax, [ebp]
Translated code, with a memory-read callback registered:
    mov EDX, tls.ebp
    mov ECX, tls
    call MemReadCallback
    mov EAX, [EDX]
11
Code Cache Management
Single code cache: contention, locality
Per-thread code cache: code bloat
P+d code caches, where P = number of processors
Reuse code caches when possible
Fall back on interpretation
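Here is a minimal sketch of the P+d scheme. It assumes a process-wide pool: a thread takes any free cache and is interpreted when none is free. The slide does not give d, so the sketch makes it a parameter. `CachePool`, `pool_acquire`, and the other names are hypothetical. Letting one thread reuse a cache released by another also assumes that translations carry no thread-specific state.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical sketch: a process-wide pool of P + d code caches.
   P = number of processors; d (extra caches) is not given on the slide. */
#define MAX_CACHES 64

typedef struct {
    bool in_use[MAX_CACHES]; /* cache currently bound to a running thread */
    int count;               /* P + d */
} CachePool;

void pool_init(CachePool *pool, int processors, int extra) {
    pool->count = processors + extra;
    if (pool->count > MAX_CACHES)
        pool->count = MAX_CACHES;
    for (int i = 0; i < pool->count; i++)
        pool->in_use[i] = false;
}

/* Binds a free cache to the calling thread. Assumes translations hold no
   thread-specific state, so a cache released by one thread can be reused by
   the next. Returns the cache index, or -1 when every cache is busy; the
   caller then falls back on interpretation. */
int pool_acquire(CachePool *pool) {
    for (int i = 0; i < pool->count; i++) {
        if (!pool->in_use[i]) {
            pool->in_use[i] = true;
            return i;
        }
    }
    return -1;
}

void pool_release(CachePool *pool, int index) {
    assert(index >= 0 && index < pool->count && pool->in_use[index]);
    pool->in_use[index] = false;
}
```

Take P = 2 and d = 1. A fourth concurrently running thread gets -1 and is interpreted until another thread releases its cache.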
12
Self-Modifying Code
Snoop on system calls that flush the hardware instruction cache
Watch the page protection of code bytes:
If the page is non-writable, mark it, and flush the code cache when its protection changes
Otherwise, insert self-modification checks into the translated code
Fall back on interpretation if there are too many code cache flushes
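A sketch of this policy follows. All names are hypothetical, and `FLUSH_LIMIT` is an invented threshold. The sketch:

- tracks protection per page;
- flushes only the affected page's translations (the slide just says the code cache is flushed);
- asks for an inline check when translating code from a writable page;
- switches to interpretation after too many flushes.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define PAGE_COUNT 16
#define FLUSH_LIMIT 3   /* hypothetical threshold for "too many" flushes */

typedef struct {
    bool has_translations; /* code cache holds blocks translated from this page */
    bool writable;         /* current page protection */
} PageState;

typedef struct {
    PageState pages[PAGE_COUNT];
    int flushes;
    bool interpret_only;   /* set once flushing becomes too frequent */
} SmcMonitor;

void smc_init(SmcMonitor *m) { memset(m, 0, sizeof *m); }

static void flush_page(SmcMonitor *m, int page) {
    m->pages[page].has_translations = false;
    if (++m->flushes > FLUSH_LIMIT)
        m->interpret_only = true;   /* fall back on interpretation */
}

/* Called when code from `page` is translated. Returns true if the block
   needs an inline self-modification check (the page is writable, so
   watching its protection cannot catch writes). */
bool on_translate(SmcMonitor *m, int page) {
    assert(page >= 0 && page < PAGE_COUNT);
    m->pages[page].has_translations = true;
    return m->pages[page].writable;
}

/* Called on an observed protection change: a marked (non-writable,
   translated) page that changes protection is flushed. */
void on_protect(SmcMonitor *m, int page, bool writable) {
    assert(page >= 0 && page < PAGE_COUNT);
    if (m->pages[page].has_translations && !m->pages[page].writable)
        flush_page(m, page);
    m->pages[page].writable = writable;
}

/* Called when a system call that flushes the instruction cache is snooped. */
void on_icache_flush(SmcMonitor *m, int page) {
    assert(page >= 0 && page < PAGE_COUNT);
    if (m->pages[page].has_translations)
        flush_page(m, page);
}
```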
13
Nirvana API
RegisterEventCallback(event, callback)
Events: Translation, InstructionStart, MemRead, MemWrite, FlowChange, Sequencing
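The toy registry below illustrates how a client can hook into an emulator through per-event callbacks. Only the event names come from the slide. `register_event_callback`, `EventInfo`, and the load-only "emulator" are hypothetical stand-ins, not the Nirvana API.

```c
#include <assert.h>
#include <stddef.h>

/* Event kinds named on the slide; all other names are hypothetical. */
typedef enum {
    EV_TRANSLATION, EV_INSTRUCTION_START, EV_MEM_READ,
    EV_MEM_WRITE, EV_FLOW_CHANGE, EV_SEQUENCING, EV_COUNT
} EventKind;

typedef struct { unsigned long pc, addr; int size; } EventInfo;
typedef void (*EventCallback)(const EventInfo *info, void *user);

static struct { EventCallback fn; void *user; } registry[EV_COUNT];

void register_event_callback(EventKind ev, EventCallback fn, void *user) {
    assert((unsigned)ev < (unsigned)EV_COUNT);
    registry[ev].fn = fn;
    registry[ev].user = user;
}

/* Invoked by the emulator at each event point; events with no registered
   callback cost only this check. */
static void raise_event(EventKind ev, const EventInfo *info) {
    if (registry[ev].fn)
        registry[ev].fn(info, registry[ev].user);
}

/* Toy guest instruction: a load of `size` bytes from `addr` at `pc`. */
typedef struct { unsigned long pc, addr; int size; } Load;

void emulate_loads(const Load *code, size_t n) {
    for (size_t i = 0; i < n; i++) {
        EventInfo info = { code[i].pc, code[i].addr, code[i].size };
        raise_event(EV_INSTRUCTION_START, &info);
        raise_event(EV_MEM_READ, &info);
        /* ...the emulated load itself would happen here... */
    }
}

/* Example client callback: totals the bytes read, in the spirit of a
   memory-read logger. */
void count_bytes(const EventInfo *info, void *user) {
    *(unsigned long *)user += (unsigned long)info->size;
}
```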
14
Example Nirvana Client
    /* Memory Read Logger */
    void MemCallback(NirvContext *ctx, void *pAddr, int nBytes);

    bool Initialize()
    {
        if (!InitializeNirvanaClient())
            return false;
        RegisterEventCallback(MemReadEvent, MemCallback);
        return true;
    }

    void MemCallback(NirvContext *ctx, void *pAddr, int nBytes)
    {
        X86REGS *pRegs = (X86REGS *) ctx->cpuRegs;
        Log(pRegs->InstructionPtr(), pAddr, nBytes);  /* instruction, address, size */
    }
15
Tracing & Replay Overview
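[Diagram: Record process: the application runs under Nirvana emulation, and the Trace Writer produces a trace log. Playback process (possibly on a different machine): the Trace Reader feeds the trace log to Nirvana, which replays the execution for a debugger (>>, <<, || controls) or for defect detection.]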
16
Trace Writer
Log only what the processor cannot regenerate:
Values read from memory
Values changed by the kernel
Results of machine- and time-sensitive instructions (cpuid, rdtsc)
Everything else can be regenerated
Trace size is ~4-5 bytes per instruction
Trace files are self-contained
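The principle of logging only what the processor cannot regenerate can be illustrated with rdtsc. The recorder logs the real result, and the replayer hands the same value back. `ValueLog`, `emulate_rdtsc`, and the fake time source are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define LOG_CAP 1024

/* Hypothetical log of results of machine- and time-sensitive instructions. */
typedef struct { uint64_t values[LOG_CAP]; size_t len, pos; } ValueLog;
typedef uint64_t (*Rdtsc)(void);

/* Emulates rdtsc. While recording, the real result is returned and logged;
   while replaying, the logged result is returned instead, so the replayed
   execution sees exactly the values the recorded one did. */
uint64_t emulate_rdtsc(ValueLog *vlog, int replaying, Rdtsc real_rdtsc) {
    if (replaying) {
        assert(vlog->pos < vlog->len);
        return vlog->values[vlog->pos++];
    }
    uint64_t v = real_rdtsc();
    assert(vlog->len < LOG_CAP);
    vlog->values[vlog->len++] = v;
    return v;
}

/* Stand-in time source for the example; a real recorder would execute rdtsc. */
static uint64_t fake_tsc;
uint64_t fake_rdtsc(void) { return fake_tsc += 1000; }
```

During replay, `real_rdtsc` is never called. The log alone determines the value.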
17
Optimization: Trace select reads
Observation: hardware caches eliminate most off-chip reads
Use the same trick to optimize logging:
The logger and the replayer simulate identical cache memories
Only cache misses are logged
Average trace size is <1 bit per instruction
18
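Example
    for (j = 0; j < 10; j++)
    {
        i = i + j;
    }
    k = i;           // value read is 46
    System_call();
    k = i;           // value read is 0 (not predicted)
Here i starts at 1, so the loop leaves i = 1 + (0 + 1 + … + 9) = 46, and every read of i up to that point is predicted. The system call changes i to 0 behind the simulated cache. The only read that is not predicted, and therefore logged, is the one following the system call.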
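The sketch below replays the example above with a prediction cache on both the logging and the replay side. The slides do not specify these details, so the sketch assumes them:

- the cache is direct-mapped and holds single words;
- each log entry records the position of the missing read in program order, which tells the replayer when to consume the log.

All names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define CACHE_LINES 64   /* hypothetical: direct-mapped, one word per line */
#define MAX_MISSES 256

typedef struct { bool valid; uint32_t addr; int32_t value; } Line;
typedef struct { Line lines[CACHE_LINES]; } PredCache;

/* A logged miss: which read (in program order) missed, and its value. */
typedef struct { uint64_t read_index; int32_t value; } Miss;
typedef struct { Miss misses[MAX_MISSES]; size_t len, pos; uint64_t reads; } Trace;

static Line *line_for(PredCache *c, uint32_t addr) {
    return &c->lines[addr % CACHE_LINES];
}

/* Program writes update the cache on both sides (replay regenerates them). */
void cache_write(PredCache *c, uint32_t addr, int32_t value) {
    Line *l = line_for(c, addr);
    l->valid = true;
    l->addr = addr;
    l->value = value;
}

/* Logger: `actual` is the value really in memory; only mispredictions are logged. */
int32_t record_read(PredCache *c, Trace *t, uint32_t addr, int32_t actual) {
    Line *l = line_for(c, addr);
    if (!(l->valid && l->addr == addr && l->value == actual)) {
        assert(t->len < MAX_MISSES);
        t->misses[t->len++] = (Miss){ t->reads, actual };
        cache_write(c, addr, actual);
    }
    t->reads++;
    return actual;
}

/* Rewinds the trace before replaying it. */
void trace_rewind(Trace *t) { t->pos = 0; t->reads = 0; }

/* Replayer: has no memory image; trusts the cache except at logged misses. */
int32_t replay_read(PredCache *c, Trace *t, uint32_t addr) {
    int32_t v;
    if (t->pos < t->len && t->misses[t->pos].read_index == t->reads) {
        v = t->misses[t->pos++].value;
        cache_write(c, addr, v);
    } else {
        Line *l = line_for(c, addr);
        assert(l->valid && l->addr == addr);
        v = l->value;
    }
    t->reads++;
    return v;
}
```

In this run, the write `i = 1` goes through `cache_write`, as do the loop's writes, so the log ends up with a single entry: read number 11 (counting from 0), with value 0.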
19
Sequence Points & Checkpoints
[Diagram: sequence points such as lock xadd, user/kernel and kernel/user transitions, exceptions, and module loads]
Tracing uses per-thread streams for performance
Sequence points impose a partial order on instruction execution across threads
Checkpoint frames provide random access into the trace (every 5 million instructions)
Hard key frames enable a ring buffer
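Here is one way sequence points could order per-thread streams. Each thread stamps its sequence points from a shared atomic counter, and replay merges the streams by stamp. The record layout and function names are hypothetical. The sketch also assumes that each counter increment is ordered with the event it stamps.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_EVENTS 64
#define MAX_THREADS 8

typedef struct { uint64_t seq; int thread; int event_id; } SeqRecord;
typedef struct { SeqRecord recs[MAX_EVENTS]; size_t len; } Stream;  /* one per thread */

static atomic_uint_fast64_t global_seq;  /* shared by all threads */

/* Called by a thread at a sequence point (e.g. lock xadd, a user/kernel
   transition). Assumes the increment is ordered with the event it stamps. */
void log_sequence_point(Stream *s, int thread, int event_id) {
    assert(s->len < MAX_EVENTS);
    uint64_t seq = atomic_fetch_add(&global_seq, 1);
    s->recs[s->len++] = (SeqRecord){ seq, thread, event_id };
}

/* Replay: merge the per-thread streams in sequence-number order. Only the
   sequence points are ordered; instructions between them stay unordered
   across threads, which is what makes the order partial. */
size_t merge_streams(const Stream *streams[], size_t n, SeqRecord *out, size_t cap) {
    size_t next[MAX_THREADS] = {0};
    size_t count = 0;
    assert(n <= MAX_THREADS);
    while (count < cap) {
        int best = -1;
        for (size_t i = 0; i < n; i++) {
            if (next[i] < streams[i]->len &&
                (best < 0 || streams[i]->recs[next[i]].seq <
                             streams[best]->recs[next[best]].seq))
                best = (int)i;
        }
        if (best < 0)
            break;  /* every stream is exhausted */
        out[count++] = streams[best]->recs[next[best]++];
    }
    return count;
}
```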
20
Trace Writer Performance
Application  Simulated instructions (millions)  Trace file size  Bits/instruction  Native time  Time while tracing  Overhead (x)
Gzip         24,097                             245 MB           0.09              11.7 s       187 s               15.98
Excel        1,781                              99 MB            0.47              18.2 s       105 s               5.76
PowerPoint   7,392                              528 MB           0.60              43.6 s       247 s               5.66
IE           116                                5 MB             0.50              0.499 s      6.94 s              13.90
Vulcan       2,408                              152 MB           0.53              2.74 s       46.6 s              17.01
Satsolver    9,431                              1300 MB          1.16              9.78 s       127 s               12.98
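The Bits/instruction and Overhead columns follow from the other columns:

- bits per instruction = trace size * 8 / instructions
- overhead = time while tracing / native time

The small check below assumes 1 MB = 2^20 bytes. That assumption reproduces the listed Gzip, Excel, PowerPoint, Vulcan, and Satsolver values to within rounding.

```c
#include <assert.h>
#include <stdbool.h>

/* Bits of trace per simulated instruction; assumes 1 MB = 2^20 bytes. */
double bits_per_instruction(double trace_mb, double instructions_millions) {
    assert(instructions_millions > 0);
    double bits = trace_mb * 1048576.0 * 8.0;
    return bits / (instructions_millions * 1e6);
}

/* Tracing overhead as a slowdown factor. */
double overhead(double native_s, double tracing_s) {
    assert(native_s > 0);
    return tracing_s / native_s;
}

bool approx_equal(double a, double b, double tolerance) {
    double d = a - b;
    return d < tolerance && d > -tolerance;
}
```

For example, Excel gives 99 * 2^20 * 8 / 1,781,000,000 ≈ 0.466 bits per instruction and 105 / 18.2 ≈ 5.77x. For IE, the same formula gives about 0.36 from 5 MB and 116 million instructions, so the size listed in that row is presumably rounded.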
21
Trace Reader - Replay
Nirvana requests code and data via Fetch operations
The Trace Reader uses the same prediction cache as the Trace Writer
[Diagram: Nirvana issues instruction fetches, data reads, and data writes against the prediction cache; misses are fetched from the trace log]
22
Trace Reader - Navigation
[Diagram: a trace with checkpoint frames 1-8, the current position, and the destination]
Navigation involves going back to the closest checkpoint frame before the destination and executing forward from there to the destination.
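Finding the restart point is a search over checkpoint positions. The sketch assumes checkpoints are stored as sorted instruction counts, starting at 0; the slides place one every 5 million instructions. The function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Index of the last checkpoint at or before `dest`. `checkpoints` holds
   instruction counts in increasing order, with checkpoints[0] == 0. */
size_t closest_checkpoint(const uint64_t *checkpoints, size_t n, uint64_t dest) {
    assert(n > 0 && checkpoints[0] <= dest);
    /* invariant: checkpoints[lo] <= dest, and hi == n or checkpoints[hi] > dest */
    size_t lo = 0, hi = n;
    while (hi - lo > 1) {
        size_t mid = lo + (hi - lo) / 2;
        if (checkpoints[mid] <= dest)
            lo = mid;
        else
            hi = mid;
    }
    return lo;
}

/* Instructions to execute forward after restoring that checkpoint. */
uint64_t replay_distance(const uint64_t *checkpoints, size_t n, uint64_t dest) {
    return dest - checkpoints[closest_checkpoint(checkpoints, n, dest)];
}
```

Suppose the destination is instruction 12,345,678. The reader would restore checkpoint 2 (at 10,000,000) and execute 2,345,678 instructions forward. One possible refinement, not described on the slides, is to skip the restore when the current position already lies between that checkpoint and the destination.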
25
Time Travel Debugging
Examine a program as it runs backwards to find the root cause of a problem
Reverse breakpoints
Step back
Search backwards in time
Used to diagnose bugs in shipped products
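A reverse breakpoint can be built on top of forward-only replay. The replayer processes the segments between checkpoints newest first and, within a segment, remembers the last hit before the current position. The sketch below does this for the condition "write to address A". The `Event` array stands in for what replay would produce, and all names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical replayed event: its position in the trace and the address it
   wrote (0 when the instruction wrote nothing). */
typedef struct { uint64_t pos; uint32_t write_addr; } Event;

/* Reverse breakpoint on "write to addr": returns the index of the most
   recent event before `current` that wrote `addr`, or -1 if there is none.
   seg_start[k] is the first event of the k-th span between checkpoint frames.
   Spans are replayed forward one at a time, newest first, so older spans are
   only replayed if the newer ones contain no hit. */
long reverse_find_write(const Event *events, size_t n,
                        const size_t *seg_start, size_t nseg,
                        uint64_t current, uint32_t addr) {
    assert(nseg > 0 && seg_start[0] == 0);
    for (size_t k = nseg; k-- > 0;) {
        size_t end = (k + 1 < nseg) ? seg_start[k + 1] : n;
        long hit = -1;
        for (size_t i = seg_start[k]; i < end; i++) {  /* forward replay */
            if (events[i].pos >= current)
                break;
            if (events[i].write_addr == addr)
                hit = (long)i;   /* keep the latest hit in this span */
        }
        if (hit >= 0)
            return hit;
    }
    return -1;
}
```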
26
Truscan: Defect Detection Tool
Scans traces for bugs that “hide”: memory leaks, dangling pointers, uninitialized memory
Reports bugs that really happen: no false positives
Debug them with time travel debugging
27
Example: Memory Leak Detection
    eax = HeapAlloc(42);    // ADDR = 0x3004, SIZE = 42, REFCOUNT = 1 (eax)
    mov [0x4004], eax       // REFCOUNT = 2 (eax, [0x4004])
    eax = 0;                // REFCOUNT = 1 ([0x4004])
    mov [0x4004], 0         // REFCOUNT = 0: Leak!!
This example is trivial, but …
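The sketch below is a simplified version of the reference counting in the example. It assumes that every register and memory write is observed and that pointers are only ever stored whole, so encoded or computed pointers are out of scope. `LeakTracker`, `on_write`, and `LOC_EAX` (a pseudo address standing for register eax) are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_ALLOCS 16
#define MAX_LOCS 32
#define LOC_EAX 0xFFFFFFF0u  /* hypothetical key standing for register eax */

typedef struct { uint32_t base, size; int refcount; bool freed, leaked; } Alloc;

typedef struct {
    Alloc allocs[MAX_ALLOCS];
    size_t nallocs;
    uint32_t loc_key[MAX_LOCS];  /* register or memory location */
    int loc_target[MAX_LOCS];    /* allocation it points into, or -1 */
    size_t nlocs;
} LeakTracker;

/* Index of the live allocation containing `value`, or -1 if none. */
static int find_alloc(const LeakTracker *t, uint32_t value) {
    for (size_t i = 0; i < t->nallocs; i++) {
        const Alloc *a = &t->allocs[i];
        if (!a->freed && value >= a->base && value - a->base < a->size)
            return (int)i;
    }
    return -1;
}

static int *target_of(LeakTracker *t, uint32_t loc) {
    for (size_t i = 0; i < t->nlocs; i++)
        if (t->loc_key[i] == loc)
            return &t->loc_target[i];
    assert(t->nlocs < MAX_LOCS);
    t->loc_key[t->nlocs] = loc;
    t->loc_target[t->nlocs] = -1;
    return &t->loc_target[t->nlocs++];
}

void on_alloc(LeakTracker *t, uint32_t base, uint32_t size) {
    assert(t->nallocs < MAX_ALLOCS);
    t->allocs[t->nallocs++] = (Alloc){ base, size, 0, false, false };
}

void on_free(LeakTracker *t, uint32_t base) {
    int i = find_alloc(t, base);
    if (i >= 0)
        t->allocs[i].freed = true;
}

/* Every write of `value` to a register or memory location `loc`. */
void on_write(LeakTracker *t, uint32_t loc, uint32_t value) {
    int *target = target_of(t, loc);
    int old = *target, now = find_alloc(t, value);
    if (old == now)
        return;
    if (now >= 0)
        t->allocs[now].refcount++;
    if (old >= 0 && --t->allocs[old].refcount == 0 && !t->allocs[old].freed)
        t->allocs[old].leaked = true;   /* last reference gone, never freed */
    *target = now;
}
```

Feeding the four steps of the example through `on_write` yields reference counts 1, 2, 1, 0. The allocation at 0x3004 is flagged as leaked on the last write.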
28
Statistics
A Windows application (under development):
600 million instructions
80,000 allocations
30 million pointers
48 leaks (8 unique bugs)
Native: ~9 seconds
Trace: ~44 seconds
Analyze: ~41 minutes (3 GHz, single-threaded, 1 GB RAM)
29
Regression Analysis: TraceDiff
[Diagram: callgraphs of App 1 on OS1 and on OS2 (functions Foo, Bar, m1, m2, m3, p1, m4); instruction sequences from Trace 1 and Trace 2 (mov edi, edi / push ebp / mov ebp, esp / sub esp, 0x54 / cmp [ebp+24], 0 / jne … / call … / mov …) compared by TraceDiff; coverage view: Foo, Bar, m1, m2, m3, p1]
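The slide does not describe TraceDiff's algorithm. One simple possibility is to compare the function-call sequences of two traces with a longest-common-subsequence diff. Calls left unmatched are the differences. All names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_CALLS 64

/* Marks each call in a and b as matched (1) or unmatched (0) using a longest
   common subsequence; unmatched calls are the differences between the two
   traces. Returns the LCS length. */
size_t diff_calls(const char *const *a, size_t na, const char *const *b, size_t nb,
                  int *a_matched, int *b_matched) {
    static size_t L[MAX_CALLS + 1][MAX_CALLS + 1];
    assert(na <= MAX_CALLS && nb <= MAX_CALLS);
    /* L[i][j] = LCS length of the suffixes a[i..] and b[j..] */
    for (size_t i = na + 1; i-- > 0;)
        for (size_t j = nb + 1; j-- > 0;) {
            if (i == na || j == nb)
                L[i][j] = 0;
            else if (strcmp(a[i], b[j]) == 0)
                L[i][j] = L[i + 1][j + 1] + 1;
            else
                L[i][j] = L[i + 1][j] > L[i][j + 1] ? L[i + 1][j] : L[i][j + 1];
        }
    memset(a_matched, 0, na * sizeof *a_matched);
    memset(b_matched, 0, nb * sizeof *b_matched);
    size_t i = 0, j = 0;
    while (i < na && j < nb) {   /* walk the table to recover one LCS */
        if (strcmp(a[i], b[j]) == 0) {
            a_matched[i++] = 1;
            b_matched[j++] = 1;
        } else if (L[i + 1][j] >= L[i][j + 1]) {
            i++;
        } else {
            j++;
        }
    }
    return L[0][0];
}
```

For example, comparing Foo, Bar, m1, m2, m3, p1 with Foo, Bar, m1, m2, m3, m4 matches five calls and leaves p1 and m4 unmatched.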
30
Related Work
Process virtualization: DynamoRIO, Mojo, DELI, ReVirt, Valgrind
Instrumentation: ATOM, Vulcan, SHADE, Pin
Trace compression: VPC
Reverse debugging: ReVirt, Traceback, BugNet, Flashback, FDR
Program/trace diffing and applications: Zeller; Zhang & Gupta
31
Summary
Flexible framework for instruction-level tracing and analysis
Complete, full-fidelity traces
Run once, analyze multiple times
Reasonable overhead
Many useful applications: debugging, defect detection, optimization, …
32
Shameless self-promotion!
Hiring for internships and full-time positions at all levels
Contact:
33
Questions