Published by Domenic Jenkins. Modified over 8 years ago.
1
DEP: Detailed Execution Profile
Qin Zhao1, Joon Edward Sim2, WengFai Wong1,2
1Singapore-MIT Alliance, 2Department of Computer Science, National University of Singapore
{zhaoqin,esim,wongwf}@comp.nus.edu.sg
Larry Rudolph, Singapore-MIT Alliance, Computer Science and Artificial Intelligence Laboratory, Massachusetts Institute of Technology
rudolph@csail.mit.edu
Chine-Cheng Wu, PAS Lab, CSIE, NTU
2
Introduction
Previous profiling approaches need a large amount of memory and incur a large slowdown.
DEP (Detailed Execution Profile) captures the complete dynamic control flow, data dependences, and memory references at the same time.
The profile size is significantly reduced.
DEP is collected with Adept (A Dynamic Execution Profiling Tool), an infrastructure built on the DynamoRIO binary instrumentation framework.
3
DEP Advantage
DEP gives complete coverage of the program, including shared libraries.
Multi-threaded applications can be profiled with independent DEPs.
Collection is efficient, incurring a 5 times slowdown.
The profile contains both memory reference and control flow information.
4
Control Flow Profile: DEPc
The traditional way to record basic block entries uses 4 bytes for each entry.
DEP uses 2 bytes for each entry, plus an extra 2 bytes when needed.
H-tag: the high 2 bytes of the address
L-tag: the low 2 bytes of the address
This compact encoding does not guarantee a space saving in every case.
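The H-tag/L-tag split on this slide can be illustrated with a small sketch. The slide does not say how an H-tag is told apart from an L-tag in the stream, so the escape word `ESCAPE` and the three-word form used when the H-tag changes are assumptions made only for this example, and they cost one word more than the single extra 2 bytes the slide mentions. The function names `encode_depc` and `decode_depc` are hypothetical and are not part of Adept.

```python
ESCAPE = 0x0000  # assumed marker: an H-tag and a verbatim L-tag follow


def encode_depc(block_addrs):
    """Encode 32-bit basic block addresses as a list of 16-bit words."""
    words = []
    current_h = None  # no H-tag has been emitted yet
    for addr in block_addrs:
        h_tag = (addr >> 16) & 0xFFFF
        l_tag = addr & 0xFFFF
        if h_tag != current_h or l_tag == ESCAPE:
            # The H-tag changed, or the L-tag collides with the marker:
            # emit marker, H-tag, then the L-tag verbatim.
            words += [ESCAPE, h_tag, l_tag]
            current_h = h_tag
        else:
            # Common case: same 64 KB region, one 2-byte word is enough.
            words.append(l_tag)
    return words


def decode_depc(words):
    """Rebuild the 32-bit addresses from the word stream."""
    addrs = []
    current_h = None
    i = 0
    while i < len(words):
        if words[i] == ESCAPE:
            current_h = words[i + 1]
            l_tag = words[i + 2]
            i += 3
        else:
            l_tag = words[i]
            i += 1
        addrs.append((current_h << 16) | l_tag)
    return addrs


blocks = [0x0804FFA4, 0x0804FFB0, 0x08050000, 0x08050010]
words = encode_depc(blocks)
loop_words = encode_depc([0x08040010 + 4 * i for i in range(100)])
```

In this sketch a run of blocks inside one 64 KB region costs about 2 bytes per block, while the four-block example that crosses a region boundary needs as many bytes as the 4-byte encoding, which matches the slide's caveat that the scheme does not always save space.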
5
Memory References Profile: DEPm
Memory reference: {pc, addr, size, type}
pc: PC of the memory reference instruction
addr: address of the memory reference
size: size of the data being accessed
type: whether it is a read or a write
Storing only the necessary values that
6
Memory Reference
There are three memory references above:
push ebp; mov 0 -> [esp+4]; mov 0 -> [esp+8];
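As a rough illustration, the three references of this slide can be written out as {pc, addr, size, type} records. The sketch assumes 32-bit mode (so `push ebp` writes 4 bytes below the old `esp`), assumes the two `mov` instructions store 4 bytes each, and uses made-up `pc` values; `MemRef` and `example_refs` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MemRef:
    pc: int    # address of the instruction making the reference
    addr: int  # address being accessed
    size: int  # number of bytes accessed
    type: str  # "read" or "write"


def example_refs(esp, pcs):
    """Return the memory references made by the three-instruction sequence."""
    refs = []
    # push ebp: decrement esp by 4, then write ebp at the new top of stack.
    esp -= 4
    refs.append(MemRef(pcs[0], esp, 4, "write"))
    # mov 0 -> [esp+4] and mov 0 -> [esp+8]: writes relative to the new esp
    # (4-byte operand size is an assumption; the slide does not state it).
    refs.append(MemRef(pcs[1], esp + 4, 4, "write"))
    refs.append(MemRef(pcs[2], esp + 8, 4, "write"))
    return refs


refs = example_refs(esp=0xBFFFF000, pcs=(0x1000, 0x1001, 0x1009))
```

All three addresses follow from a single value of `esp`, which suggests why a profile may not need to store every address separately.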
7
BB_pc+Mem_addr Compared to DEP
DEP triggers fewer analyzer calls than BB_pc+Mem_addr, because its smaller profile data overflows the buffer (the event that signals the analyzer) less often.
Penalty includes:
Stealing and restoring registers
Address calculation
Storage of the address
Updating the profile counter
Extra overhead in DEP:
Checking for H-tag changes
Checking and updating register status
8
DynamoRIO
Runs on IA-32 under both Linux and Windows.
DynamoRIO executes an application by copying its code into a code cache and executing it from there.
The cached code is the same as the original, except that control transfer operations return to DynamoRIO.
The trace cache caches code for indirect branch lookup.
9
ADEPT : A Dynamic Execution Profiling Tool
10
Control Flow: Obtaining DEPc
If the L-tag is 0x0000
11
Memory References: Obtaining DEPm
Each register variable is in one of two states: UPDATED, RECORDED
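One plausible reading of the two states is sketched here. The slide only names UPDATED and RECORDED; the transitions (a write marks the register UPDATED, the first use in an address calculation records its value and marks it RECORDED) are an assumption, and `RegisterTracker` is a hypothetical class, not Adept code.

```python
UPDATED, RECORDED = "UPDATED", "RECORDED"


class RegisterTracker:
    """Record a register's value only when it changed since the last record."""

    def __init__(self, names):
        # Assume nothing has been recorded yet, so every register is UPDATED.
        self.state = {name: UPDATED for name in names}
        self.profile = []  # recorded (register, value) pairs

    def on_write(self, reg):
        # The register was modified; its recorded value is now stale.
        self.state[reg] = UPDATED

    def on_address_use(self, reg, value):
        # The register is used to form a memory address.
        if self.state[reg] == UPDATED:
            self.profile.append((reg, value))
            self.state[reg] = RECORDED
        # If already RECORDED, the value is in the profile; store nothing.


tracker = RegisterTracker(["esp", "ebp"])
tracker.on_address_use("esp", 0xBFFFEFFC)  # recorded
tracker.on_address_use("esp", 0xBFFFEFFC)  # skipped, already RECORDED
tracker.on_write("esp")
tracker.on_address_use("esp", 0xBFFFEFF8)  # recorded again after the write
```

Under this reading, repeated accesses through an unchanged register add nothing to the profile.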
12
Profile Buffer
Stores the collected profile for later analysis.
One buffer for each thread.
A larger buffer reduces the number of analyzer invocations.
The profile buffer has two separate parts, one for DEPc and one for DEPm.
A split of 20% for DEPc and 80% for DEPm works well.
The analyzer is triggered when the buffer is full, using the OS signal raised by a page (segmentation) fault.
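The buffer behaviour described on this slide might be sketched as follows. The sketch counts capacity in entries rather than bytes and detects a full buffer with an explicit check, whereas the slide says Adept relies on a fault signal; `ProfileBuffer`, its parameters, and the `analyzer` callback are hypothetical names.

```python
class ProfileBuffer:
    """Per-thread buffer split between DEPc and DEPm entries."""

    def __init__(self, capacity, analyzer, depc_percent=20):
        # 20% of the entries go to DEPc and the rest to DEPm by default.
        self.depc_capacity = capacity * depc_percent // 100
        self.depm_capacity = capacity - self.depc_capacity
        self.analyzer = analyzer
        self.depc = []
        self.depm = []

    def add_depc(self, entry):
        self.depc.append(entry)
        if len(self.depc) >= self.depc_capacity:
            self._flush()

    def add_depm(self, entry):
        self.depm.append(entry)
        if len(self.depm) >= self.depm_capacity:
            self._flush()

    def _flush(self):
        # Hand both parts to the analyzer, then start with empty parts.
        self.analyzer(self.depc, self.depm)
        self.depc = []
        self.depm = []


calls = []
buf = ProfileBuffer(capacity=10, analyzer=lambda c, m: calls.append((len(c), len(m))))
buf.add_depc(0xFFA4)
for value in range(8):
    buf.add_depm(value)  # the eighth DEPm entry fills its part
```

A larger `capacity` means `_flush` runs less often, which is the slide's point about reducing analyzer invocations.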
13
Optimizing DEPc
Basic block 0x0804ffa4 branches to 0x08050000
14
Optimizing DEPm
15
Evaluation
Platform: dual-core 3.2 GHz Intel Pentium D 840, 2 GB of RAM
OS: Linux Fedora Core 4 and Windows XP SP2
Benchmarks: SPEC CPU2000 integer benchmarks for Linux, SysMark 2004SE for Windows (running Access, PowerPoint and Word)
Compiler: gcc with the -O3 flag
16
Execution Time
17
Relative slowdown
18
Profile Frameworks
Pin
Counts the number of basic blocks executed
Counts the number of memory references
Valgrind
Cachegrind is a cache profiler, used to capture basic block counts and memory reference counts
eWPP (Extended Whole Program Paths)
Records control flow and dependence information
Uses a two-phase profiling approach
First phase: identify all memory dependences
Second phase: collection
19
Profile Size and Compressibility
* CF_bit uses bits and 4-byte target addresses for indirect branches
20
Normalized by the uncompressed BB_pc size
Normalized by the uncompressed Mem_addr size
CF_bit does not compress well
21
Related Work
Whole Execution Traces (WET)
Simulation environment
Extended Whole Program Paths (eWPP)
Encodes trace information in WPP
Whole Program Paths (WPP)
These approaches have difficulty supporting multi-threaded applications.
22
Conclusion
DEP captures the major aspects of program execution: control flow and memory references.
DEP is collected by Adept, which can perform on-line or off-line analysis.
Adept builds the mapping between the collected information and the original application.
Experimental results show a 5 times slowdown and a 40% space saving compared to traditional profiles.
A complete trace that recovers the whole program execution is not always necessary; a particular segment can be reproduced for simulation or replay.
23
Back-up Slides
24
Recovering Memory Reference Trace
Using a naïve approach to recover the memory reference trace from a DEP
25
Recovering Memory References
Scenario 1: complete memory reference profile {pc, addr, size, type}
Scenario 2: DEP collected by Adept
Scenario 2 takes almost triple the native execution time
Tradeoff