Published by Leon Barnett. Modified over 6 years ago.
1
Cork: Dynamic Memory Leak Detection with Garbage Collection
Maria Jump, Kathryn S. McKinley
2
Memory Bugs
What memory bugs do explicitly managed languages have?
Does managed memory solve all our memory problems?
A program crashing after days or weeks of execution complicates the debugging process …
3
Memory Leaks in Managed Languages
Result from inadvertently maintaining references to “dead” objects
  Best case: increases GC workload
  Worst case: causes program crash
Dynamically analyze heap to detect systematic heap growth
4
Related Work
Offline techniques:
  Static analysis [Heine et al. 03]
  Heap differencing [JProbe, DePauw et al. 98, 99, 00]
  Allocation and/or usage tracking [OptimizeIt, Rational Purify, HAT, HPROF, Shaham et al. 00]
Online techniques:
  Leakbot (partially online) [Mitchell et al. 03]
  Adaptive usage tracking [Chilimbi et al. 04, Bond et al. 06]
Notes:
  Static analysis (C/C++) uses object ownership to find double frees and missing frees. (What are the downsides?)
  Heap differencing and allocation/usage tracking take heap dumps and analyze them separately from the application, differencing them to find parts of the graph that are no longer being used.
  Leakbot refines heap differencing to identify data structures with potential leaks, then uses online diagnosis of those data structures to report leaks to the user. Still two runs.
  Adaptive usage tracking uses adaptive profiling to reduce the overhead of per-object access tracking. These tools do not account for custom memory management in the C/C++ programs they analyze. Per-instance bookkeeping is too expensive for Java.
  Cork accurately pinpoints systematic heap growth completely online.
5
Cork
Opportunity: a tracing GC visits all the objects in the heap!
Build heap summarization graph
  Class points-from graph (CPFG)
  Summarizes volume of nodes and edges
Identify growth by differencing CPFGs across collections
Identify candidates using node rank
Identify the data structure using edge rank
Note: be clear on the difference between candidates and the data structure.
6
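1. Calculating Type Points-To
[Figure: a heap of instances summarized into a Type Points-To (TPT) graph of types, with volumes on nodes and edges; the legend distinguishes instances from types]
Note: remember to talk about volume (number of bytes).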
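The summarization on this slide can be illustrated with a small sketch. It walks a toy heap from its roots and accumulates the volume (bytes) per type and per type-level edge. The `HeapObject` class, the `summarize` function, and the choice to charge each edge with the bytes of the referenced instance are assumptions made for illustration; Cork does this work inside the collector's own trace.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class HeapObject:
    """Hypothetical stand-in for a heap instance: type name, size in bytes, references."""
    type_name: str
    size: int
    refs: list = field(default_factory=list)


def summarize(roots):
    """Trace from the roots and return (node_volume, edge_volume).

    node_volume maps a type to the total bytes of its reachable instances.
    edge_volume maps (source type, target type) to the bytes of the target
    instances referenced along that edge (an assumed weighting).
    """
    node_volume = defaultdict(int)
    edge_volume = defaultdict(int)
    visited = set()
    stack = list(roots)
    while stack:
        obj = stack.pop()
        if id(obj) in visited:
            continue  # each instance is counted once, as in a tracing collection
        visited.add(id(obj))
        node_volume[obj.type_name] += obj.size
        for target in obj.refs:
            edge_volume[(obj.type_name, target.type_name)] += target.size
            stack.append(target)
    return dict(node_volume), dict(edge_volume)


# Toy heap: one list that references two strings.
a = HeapObject("String", 24)
b = HeapObject("String", 24)
lst = HeapObject("ArrayList", 40, [a, b])
nodes, edges = summarize([lst])
```

The result has one node per type rather than per instance, which is why the graph stays small even when the heap is large.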
7
2. Differencing Graphs
Cork’s optimizations:
  Keeps 3 graphs
  Prunes obviously non-growing parts
  Volume decay guards against premature pruning
  Ranks nodes/edges
[Figure: graphs TPT_i, TPT_{i+1}, and TPT_{i+2} from successive collections, annotated with node and edge volumes]
8
Finding Growth (RRT)
Find nodes of types t that grow
  $V_t(i) > (1 - f) \cdot V_t(i-1)$
  i is the phase and f is a decay factor, e.g., 0.05
Rank nodes and edges
  $r_i = r_{i-1} \pm p_i \cdot (Q - 1)$
  Add to the rank if the type grows in the phase, subtract if it shrinks
  Q is a ratio > 1 of $V_i$ to $V_{i-1}$
Designate a node as a candidate if $r_t(i) > R_{threshold}$
Note: say that we are not sensitive to the rank threshold.
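A minimal sketch of the growth test and rank update follows, under several assumptions that the slide does not settle: p is taken to be the number of consecutive phases in which the type passed the decay test, Q is computed as the larger volume over the smaller so that it stays above 1, and a type that fails the decay test has its rank and phase count reset. The function names and the threshold value are hypothetical.

```python
def update_rank(rank, phases, v_prev, v_curr, decay=0.05):
    """Return (new_rank, new_phases) for one type after one collection."""
    if v_prev <= 0 or v_curr <= 0:
        return rank, phases  # no ratio can be formed yet
    if v_curr <= (1 - decay) * v_prev:
        return 0.0, 0  # assumed: fails the decay test, so treat as non-growing
    phases += 1
    q = max(v_curr, v_prev) / min(v_curr, v_prev)  # ratio kept >= 1
    step = phases * (q - 1)
    if v_curr > v_prev:
        rank += step  # grew in this phase
    elif v_curr < v_prev:
        rank -= step  # shrank, but within the decay allowance
    return rank, phases


def is_candidate(rank, threshold):
    """A type is reported once its rank exceeds the threshold."""
    return rank > threshold


# Volumes of one type over four collections (made-up numbers).
volumes = [100, 120, 150, 148]
rank, phases = 0.0, 0
for prev, curr in zip(volumes, volumes[1:]):
    rank, phases = update_rank(rank, phases, prev, curr)
```

With a decay factor of 0.05, the small dip from 150 to 148 lowers the rank slightly instead of discarding the type, which is the role the slide gives to volume decay.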
9
Reported Candidates
[Chart: number of candidates reported by SRT and RRT for jess, fop, and SPECjbb]
This summarizes Cork’s candidate reports.
10
Finding Data Structure
Type is not enough
Growing edges identify the data structure
  Rank edges
  Calculate a slice from each candidate
    Set of all paths $(n_0 \ldots n_n)$ such that each edge on the path satisfies the edge-rank condition
    “Sees” beyond non-candidate nodes
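The slice computation can be sketched as a graph traversal that starts at a candidate and follows only edges that pass a rank test. This is an illustration under stated assumptions, not Cork's implementation: the graph is given as plain dictionaries, the slice is returned as the set of edges on qualifying paths instead of an enumeration of every path, and the default test (`rank > 0`) is an assumed sign convention that can be swapped through the `is_growing` parameter. The type names in the example are hypothetical.

```python
def slice_from(candidate, edges, edge_rank, is_growing=lambda r: r > 0):
    """Collect the edges reachable from `candidate` along growing edges.

    edges: dict mapping a type to the list of types it is connected to.
    edge_rank: dict mapping (src, dst) to that edge's rank.
    is_growing: assumed test for a growing edge.
    The walk continues through any node, candidate or not.
    """
    seen_nodes = {candidate}
    slice_edges = []
    stack = [candidate]
    while stack:
        node = stack.pop()
        for nxt in edges.get(node, []):
            if not is_growing(edge_rank.get((node, nxt), 0)):
                continue  # non-growing edges end the path here
            slice_edges.append((node, nxt))
            if nxt not in seen_nodes:
                seen_nodes.add(nxt)
                stack.append(nxt)
    return slice_edges


# Hypothetical graph: Leaf is the candidate; the edge to Cache is not growing.
graph = {"Leaf": ["Holder"], "Holder": ["Registry", "Cache"], "Registry": []}
ranks = {("Leaf", "Holder"): 3.0, ("Holder", "Registry"): 1.5, ("Holder", "Cache"): -0.5}
result = slice_from("Leaf", graph, ranks)
```

Here `Holder` need not be a candidate itself for the walk to pass through it, which is one reading of the slide's remark that the slice "sees" beyond non-candidate nodes.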
11
Implementation and Methodology
Jikes RVM with MMTk
Benchmarks: SPECjvm98, DaCapo, SPECjbb2000
Eclipse 3.1.2
Garbage collector
  Generational with 4MB bounded nursery
For performance, report application only
  Replay compilation
  2nd run methodology
Notes: Jikes RVM is a Java-in-Java virtual machine. We also used it to search for memory leaks in Eclipse.
12
Efficiency and Scalability
Node/type data stored in the type information block (TIB), adding 5 words
  1 word for type volume and edge list pointer for each of the previous 4 collections
  1 word for # of phases (p)
Edge data stored in lists
Prune parts of TPFG that are non-growing
Are there ways to implement the type summary graph?
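As a rough picture of the per-type bookkeeping described on this slide, the sketch below keeps four history slots and a phase counter for each type. The class names, the ring-buffer indexing, and the use of a dictionary for the edge list are assumptions for illustration; the slide only states that five words are added to the TIB.

```python
from dataclasses import dataclass, field

HISTORY = 4  # the slide keeps data for the previous 4 collections


@dataclass
class Snapshot:
    """Assumed per-collection record: a type's volume plus its edge list."""
    volume: int = 0
    edges: dict = field(default_factory=dict)  # target type -> edge volume


@dataclass
class TypeSummary:
    """Hypothetical per-type record: 4 snapshot slots and a phase counter."""
    slots: list = field(default_factory=lambda: [None] * HISTORY)
    phases: int = 0

    def record(self, collection, snapshot):
        """Store this collection's snapshot, overwriting the oldest slot."""
        self.slots[collection % HISTORY] = snapshot

    def volume_at(self, collection):
        """Volume stored in the slot for `collection`, or 0 if the slot is empty.

        Only meaningful for the most recent HISTORY collections.
        """
        snap = self.slots[collection % HISTORY]
        return snap.volume if snap is not None else 0


summary = TypeSummary()
for number, volume in enumerate([10, 20, 30, 40, 50]):
    summary.record(number, Snapshot(volume=volume))
```

Because the slots are reused, the space per type stays fixed no matter how many collections have run.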
13
Space Overhead
                          jess      Eclipse   Geomean
# of types (bm+VM)        1744      3365      1747
# of types (TPFG avg)     318       667       334
# of types (TPFG max)     319       775       346
# of edges (avg)          844       4090      904
# of edges (max)          861       7585      1142
% pruned                  66%       42%       60%
Increased alloc %         0.094%    0.167%    0.233%
19% 2.7X 0.233%
Notes: The geomean is across all benchmarks. Surprising result … the heap does not have very many live types at once.
14
Time Overhead
[Plot: normalized total time (y-axis) vs. heap size relative to minimum (x-axis)]
As the heap gets bigger, overhead decreases as GC time decreases.
UMCP
15
Time Overhead
As the heap gets bigger, overhead decreases as GC time decreases.
Is this good enough? Would you add it to your system?
16
Dynamic Heap Analysis with Cork
Cork identified:
  Systematic heap growth
  Growing classes
  Growing data structure
Benchmarks:
  fop – application design
  jess – in input
  SPECjbb2000 – memory leak
  Eclipse # – repeatedly performing a structural (recursive) diff leaks memory
Notes: SPECjbb2000 bug … one of the major reasons they moved to SPECjbb2005 to fix it. Give Mike credit for the Eclipse bug.
17
Eclipse
[Plot: heap occupancy (MB) vs. time (MB of allocation)]
18
Eclipse 115789: CPFG
3365 classes loaded (1773 in Eclipse)
Average graph: 667 nodes, 4090 edges
19
Eclipse : Slice Path
Identifies 7 candidates: $r_t > r_{thres}$
Calculates slice from each candidate: set of all paths $(n_0 \ldots n_n)$ s.t. $r_{n_{k+1} n_k} < 0$
[Figure: slice graph with nodes File, Folder, String[], Object[], ResourceCompareInput$FilteredBufferedResourceNode, ArrayList, ResourceCompareInput]
20
Eclipse
[Plot: heap occupancy (MB) vs. time (MB of allocation)]
21
Eclipse : Slice Path
Identifies 7 candidates: $r_t > r_{thres}$
Calculates slice from each candidate: set of all paths $(n_0 \ldots n_n)$ s.t. $r_{n_{k+1} n_k} < 0$
[Figure: slice graph with nodes File, Folder, String[], Object[], ResourceCompareInput$FilteredBufferedResourceNode, ArrayList, ResourceCompareInput, HashMap]
22
Eclipse
[Plot: heap occupancy (MB) vs. time (MB of allocation)]
23
Cork’s Contributions
Performs dynamic heap analysis to detect systematic heap growth
Uses a class points-from graph to summarize volume relations
  <0.5% space overhead
  ~2% time overhead
Accurately identifies
  User-defined classes causing the growth
  Data structure containing the growth
24
What else can the GC tell us?
Testing time? In deployment?