1 SWAT Memory Leak Detection Matthias Hauswirth
2 Agenda
  - Approaches to memory leak detection
  - SWAT infrastructure
  - Heap model
  - Staleness predicates
  - Leak analysis tool
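3 Memory Leaks
  [Figure: timeline of object1; event legend: alloc, access, free]
4 Memory Leaks
  [Figure: timelines of object1 and object2 up to shutdown; event legend: alloc, access, free]
5 Memory Leaks
  [Figure: timelines of object1, object2, and object3 up to shutdown; event legend: alloc, access, free; phases marked reachable and unreachable]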
6 Approaches to Leak Detection
  - Survivors: objects surviving until program termination
  - Unreachables: objects unreachable at snapshot (GC)
  - Stales: objects not recently accessed at snapshot (SWAT)
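7 Survivors: Guess
  [Figure: timelines of objects o1 to o5 between startup and shutdown; verdict labels: leak, -]
8 Survivors: Reality
  [Figure: timelines of objects o1 to o5 between startup and shutdown; verdict labels: leak, ?, -]
9 Unreachables: Guess
  [Figure: timelines of objects o1 to o5 between startup and shutdown, with a snapshot marked; verdict labels: leak, alive, -]
10 Unreachables: Reality
  [Figure: timelines of objects o1 to o5 between startup and shutdown, with a snapshot marked; verdict labels: leak, alive, ?, -]
11 Stales (SWAT): Guess
  [Figure: timelines of objects o1 to o5 between startup and shutdown, with a snapshot marked; verdict labels: leak, alive, -]
12 Stales (SWAT): Reality
  [Figure: timelines of objects o1 to o5 between startup and shutdown, with a snapshot marked; verdict labels: leak, alive, -]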
13 SWAT Infrastructure
  [Figure: tool pipeline with the stages instrument, run, postprocess, and view; artifacts shown: winword.exe, winword.swat.exe, swatruntime.dll, source info, snapshots, statistics, settings]
14 Instrument
  [Figure labels: proc1, comp1]
15 Bursty Tracing: Duplicate Basic Blocks
  [Figure labels: proc1, prof$proc1, comp1]
16 Bursty Tracing: Insert Dispatch Checks
  [Figure labels: proc1, prof$proc1, comp1]
17 Instrumentation: Patch Allocations & Frees
  [Figure labels: xalloc, XallocWrapper, comp1, swatruntime.dll]
18 Instrumentation: Instrument Loads & Stores
  [Figure labels: proc1, prof$proc1, comp1, RecordReference, swatruntime.dll]
19 Bursty Tracing
  [Figure: dispatch check state diagram; labels: Dispatch Check, DecOrig, StayOrig, OrigTgt, OrigZero, DecProf, StayProf, StartOrig, StartProf, ProfTgt, OrigSrc, ProfSrc; edge conditions: cOrig==1, cOrig>1, cProf==0, cProf==1, cProf>1]
  Global counters:
  - cOrig: # of StayOrig
  - cProf: # of StayProf
20 Adaptive Bursty Tracing
  - Bursty tracing
    - Sampling rate influences results
    - Rate chosen at runtime
  - Adaptive bursty tracing
    - Different sampling rate for each dispatch check point
    - Start at high rate
    - Wait until average gets down to requested rate
    - Start rate, delta & target rate chosen at runtime
21 Why Adaptive Bursty Tracing?
22 Adaptive Bursty Tracing
  [Figure: dispatch check state diagram, parameterized by dispatch check id (dcid); labels: Dispatch Check, DecOrig, StayOrig, OrigTgt, OrigZero, DecProf, StayProf, StartOrig, StartProf, ProfTgt, OrigSrc, ProfSrc; edge conditions: cOrig[dcid]==1, cOrig[dcid]>1, cProf==0, cProf==1, cProf>1]
  Per-dispatch-check counter:
  - cOrig[dcid]: # of StayOrig
  Global counter:
  - cProf: # of StayProf
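
The counter scheme on this slide can be illustrated with a small sketch. It is not the SWAT implementation: the class name, the parameter names (start_interval, delta, target_interval, burst_length), and the rule of widening the interval by a fixed delta after every burst are assumptions made for illustration. The only parts taken from the slides are the per-dispatch-check countdown cOrig[dcid], the global countdown cProf, and the idea of starting at a high rate and backing off toward a target rate.

```python
class AdaptiveBurstyDispatcher:
    """Illustrative dispatch-check logic; all names here are hypothetical."""

    def __init__(self, start_interval=1, delta=2, target_interval=8, burst_length=1):
        self.start_interval = start_interval    # checks between bursts at first (high rate)
        self.delta = delta                      # how much the interval grows per burst
        self.target_interval = target_interval  # interval at the requested (low) rate
        self.burst_length = burst_length        # dispatch checks spent in profiled code
        self.c_orig = {}     # per-dispatch-check countdown, like cOrig[dcid]
        self.interval = {}   # current interval of each dispatch check
        self.c_prof = 0      # global countdown of profiled checks left, like cProf

    def check(self, dcid):
        """Return True if the profiled copy of the code should run next."""
        if self.c_prof > 0:
            # A burst is in progress: stay in the profiled code.
            self.c_prof -= 1
            return True
        remaining = self.c_orig.get(dcid, self.start_interval)
        if remaining > 1:
            # Keep running the original code and count down.
            self.c_orig[dcid] = remaining - 1
            return False
        # Countdown expired: start a burst and lower this check's sampling rate.
        interval = self.interval.get(dcid, self.start_interval)
        self.interval[dcid] = min(interval + self.delta, self.target_interval)
        self.c_orig[dcid] = self.interval[dcid]
        self.c_prof = self.burst_length - 1
        return True
```

With start_interval=1, delta=2 and target_interval=5, a dispatch check that is hit repeatedly is profiled on its first execution and then at progressively wider gaps until the gap settles at 5, while a dispatch check that has never been hit is still profiled the first time it executes.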
23 Effect of Adaptive Bursty Tracing on Coverage
24 SWAT Heap Model
  Requirements:
  - AllocateObject(eip, startAddress, size)
  - FreeObject(eip, startAddress)
  - FindObject(eip, address)
  - GetObjectIterator()
  Implementations:
  - Hash table (address→objectInfo)
  - Hash table (startAddress→objectInfo)
  - Hash table (address→offsetToStartAddress)
  - Address tree
25 SWAT Heap Model
  [Figure, built up over slides 25 to 30; surviving labels: "Address:", "byte", "0101"]
31 SWAT Heap Model
  - Start address: 0101
  - Size: 8
  - Access count: 19
  - Last access time: 19’000’000
  - Alloc site: EIP 0x
  - Last access site: EIP 0x400190
32 SWAT Heap Model
  Space overhead:
  - Address tree nodes: 0.03 … 0.35 allocated node bytes per allocated byte
  - Overall: 0.12 … 3.4 times the allocated memory
  Time:
  - FindObject(eip, address): log(addressSpaceSize), i.e. 32 nodes for a 32-bit address space
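
To make the four heap model requirements concrete, here is a minimal sketch of the interface. It deliberately does not implement the address tree from the slides; it swaps in a sorted list of start addresses searched with binary search, which is simpler to show. The field names follow the object record on slide 31, while the record_reference helper, its signature, and the "now" timestamp parameter are assumptions (only the name RecordReference appears in the slides).

```python
import bisect
from dataclasses import dataclass
from typing import Optional


@dataclass
class ObjectInfo:
    start_address: int
    size: int
    alloc_site: int                        # EIP of the allocation
    access_count: int = 0
    last_access_time: Optional[int] = None
    last_access_site: Optional[int] = None


class HeapModel:
    """Sorted-list stand-in for the heap model (not the address tree)."""

    def __init__(self):
        self._starts = []    # sorted start addresses of live objects
        self._objects = {}   # start address -> ObjectInfo

    def allocate_object(self, eip, start_address, size):
        if start_address not in self._objects:
            bisect.insort(self._starts, start_address)
        self._objects[start_address] = ObjectInfo(start_address, size, eip)

    def free_object(self, eip, start_address):
        if start_address in self._objects:
            del self._objects[start_address]
            del self._starts[bisect.bisect_left(self._starts, start_address)]

    def find_object(self, eip, address):
        """Return the object containing address (interior pointers allowed)."""
        i = bisect.bisect_right(self._starts, address) - 1
        if i < 0:
            return None
        info = self._objects[self._starts[i]]
        if address < info.start_address + info.size:
            return info
        return None

    def get_object_iterator(self):
        return iter(self._objects.values())

    def record_reference(self, eip, address, now):
        """Hypothetical helper: update access metadata for a load or store."""
        info = self.find_object(eip, address)
        if info is not None:
            info.access_count += 1
            info.last_access_time = now
            info.last_access_site = eip
        return info
```

The lookup must accept any address inside an object, not just its start address, which is why a plain hash table keyed by start address is not enough on its own for FindObject.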
33 Evaluation: Time Overhead
34 Staleness Predicates
  Stale = object not needed anymore
  Stale, if…
  - Never accessed
  - Idle time > t
  - Idle time > n * active time
  [Figure: object timeline split into an active and an idle period; labels: active, idle, t, n*active]
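
The three predicates can be written down directly. The sketch below makes two assumptions that the slide does not spell out: active time is taken to be the span from allocation to the last observed access, and idle time is the span from the last observed access (or from allocation, if the object was never accessed) to the snapshot. The ObjectTimes record and the function names are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ObjectTimes:
    alloc_time: int
    last_access_time: Optional[int]   # None if no access was ever observed


def idle_time(obj, now):
    # Assumption: an object that was never accessed has been idle since allocation.
    last = obj.alloc_time if obj.last_access_time is None else obj.last_access_time
    return now - last


def active_time(obj):
    # Assumption: active time runs from allocation to the last observed access.
    if obj.last_access_time is None:
        return 0
    return obj.last_access_time - obj.alloc_time


def never_accessed(obj, now):
    return obj.last_access_time is None


def idle_longer_than(obj, now, t):
    return idle_time(obj, now) > t


def idle_longer_than_active(obj, now, n):
    return idle_time(obj, now) > n * active_time(obj)
```

For an object allocated at time 0, last accessed at time 10, and inspected at time 100, the idle time is 90 and the active time is 10, so the third predicate declares it stale for n = 5 but not for n = 10.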
35 Evaluation
  - Inject leaks
    - Randomly, at runtime, decide not to execute a free
  - Variables
    - Sampling rate
    - Adaptive or bursty
    - Predicate
  - Measurement results per snapshot
    - List of objects assumed leaked (some true, some false)
    - List of objects assumed alive (some true, some false)
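
A sketch of this evaluation setup, under stated assumptions: leaks are injected by wrapping the free routine and skipping it with some probability, and each snapshot is scored by comparing the objects a predicate assumes leaked against the set of injected leaks. The function names, the leak_probability parameter, and the four bucket names are hypothetical; the slides do not say how the injection decision or the scoring was implemented.

```python
import random


def make_leaky_free(real_free, leak_probability, rng, injected):
    """Wrap a free function so that it randomly skips the free (injects a leak)."""
    def leaky_free(address):
        if rng.random() < leak_probability:
            injected.add(address)   # remember the ground truth for scoring
            return
        real_free(address)
    return leaky_free


def score_snapshot(live_objects, assumed_leaked, injected):
    """Count true/false verdicts for the 'assumed leaked' and 'assumed alive' lists."""
    counts = {"leaked_true": 0, "leaked_false": 0, "alive_true": 0, "alive_false": 0}
    for obj in live_objects:
        is_leak = obj in injected
        if obj in assumed_leaked:
            counts["leaked_true" if is_leak else "leaked_false"] += 1
        else:
            counts["alive_false" if is_leak else "alive_true"] += 1
    return counts
```

Keeping the injected set separate from the tool's own bookkeeping is what makes the verdicts checkable: the predicate never sees which frees were skipped.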
36 Comparing Predicates
37 Comparing Sampling Rates
38 Lucky Omission Effect
  Question: At time of snapshot, is object a leak?
  [Figure: timeline of actual references (axes: time, # actual references) with an injected leak, maxIdleTime, and the snapshot marked]
39 Lucky Omission Effect
  [Figure as on slide 38, adding the label: Low sampling rate]
40 Lucky Omission Effect
  [Figure as on slide 39, adding the verdict for the low sampling rate: assumed leaked: true]
41 Lucky Omission Effect
  [Figure as on slide 40, adding the label: High sampling rate]
42 Lucky Omission Effect
  [Figure as on slide 41, adding the verdict for the high sampling rate: assumed alive: false]
43 Lucky Omission Effect
  [Figure as on slide 42, adding the label: lucky omission window]
44 Mitigation of Lucky Omission Effect
  - Reduce chance of leak happening during maxIdleTime
  - snapshotInterval >> maxIdleTime
  [Figure: timeline of actual references (axes: time, # actual references) with maxIdleTime, snapshotInterval, and the snapshot marked]
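
One way to quantify this mitigation, under a simplifying assumption that is not on the slide: suppose the moment an object leaks is equally likely to fall anywhere within a snapshot interval. The chance that it falls within the last maxIdleTime before the snapshot is then the ratio of the two lengths, which becomes small when snapshotInterval is much larger than maxIdleTime.

```latex
P(\text{leak occurs within maxIdleTime of the snapshot})
  \approx \frac{\text{maxIdleTime}}{\text{snapshotInterval}},
\qquad \text{snapshotInterval} \gg \text{maxIdleTime}
  \;\Rightarrow\; P \ll 1
```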
45 Practical Sampling Rates & Useful Predicates
46 Leak Analysis Tool
47 Ranking
  Sort pairs
  Old rankings:
  - # of stale objects [currently used]
  - # of stale bytes
  - Drag caused by stale objects (bytes*idle time)
  New ranking:
  - # of predicates declaring an object stale
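
The ranking criteria can be sketched as sort keys. The slide does not say what the sorted pairs consist of, so this sketch groups stale objects by an opaque, hypothetical group key and ranks the groups; the function names and the tuple layout (group, size in bytes, idle time) are assumptions as well.

```python
from collections import defaultdict


def rank_groups(stale_objects, criterion="count"):
    """Rank groups of stale objects by count, bytes, or drag (bytes * idle time).

    stale_objects: iterable of (group, size_bytes, idle_time) tuples.
    """
    totals = defaultdict(lambda: {"count": 0, "bytes": 0, "drag": 0})
    for group, size, idle in stale_objects:
        entry = totals[group]
        entry["count"] += 1
        entry["bytes"] += size
        entry["drag"] += size * idle
    return sorted(totals, key=lambda g: totals[g][criterion], reverse=True)


def rank_by_votes(objects, predicates):
    """Rank objects by how many staleness predicates declare them stale."""
    return sorted(objects, key=lambda o: sum(1 for p in predicates if p(o)), reverse=True)
```

The three old criteria can disagree: one large, briefly idle object outranks many small ones by bytes, yet falls behind them by count and by drag if they have been idle far longer.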
48 Conclusions
  - Many approaches to leak detection
  - Predicting leaks by looking at past events:
    - Important objects might never be used (boxsim)
    - Lots of stale objects might indicate a space-inefficient algorithm
  - Leak Analysis Tool
    - Made it easy to find several statically injected leaks
49 Future Work
  - Currently: store source info compactly (at instrumentation time)
    - Snapshots at runtime don’t use source info
    - Post process snapshots to add source info
  - This week: rank leaks
    - Update Leak Analysis Tool to use ranking
    - Run new version on winword.exe and mshtml.dll
  - Later: combine “Unreachables” with “Stales” approach