Cork: Dynamic Memory Leak Detection with Garbage Collection

Presentation transcript:

Cork: Dynamic Memory Leak Detection with Garbage Collection Maria Jump Kathryn S. McKinley {mjump,mckinley}@cs.utexas.edu

Memory Bugs
- What memory bugs do explicitly managed languages have?
- Does managed memory solve all our memory problems?
- A program crashing after days or weeks of execution complicates the debugging process.

Memory Leaks in Managed Languages
- Result from inadvertently maintaining references to "dead" objects
- Best case: increases GC workload
- Worst case: causes a program crash
- Approach: dynamically analyze the heap to detect systematic heap growth

Related Work
Offline techniques:
- Static analysis [Heine et al. 03]
- Heap differencing [JProbe, DePauw et al. 98, 99, 00]
- Allocation and/or usage tracking [OptimizeIt, Rational Purify, HAT, HPROF, Shaham et al. 00]
Online techniques:
- LeakBot (partially online) [Mitchell et al. 03]
- Adaptive usage tracking [Chilimbi et al. 04, Bond et al. 06]
Notes:
- Static analysis (C/C++) uses object ownership to find double frees and missing frees. (What are the downsides?)
- Heap differencing and allocation/usage tracking take heap dumps and analyze them separately from the application, differencing them to find parts of the graph that are no longer being used.
- LeakBot refines heap differencing to identify data structures with potential leaks, then uses online diagnosis of those data structures to report leaks to the user. It still requires two runs.
- Adaptive usage tracking uses adaptive profiling to reduce the overhead of per-object access tracking. These tools do not account for custom memory management in the C/C++ programs they analyze, and per-instance bookkeeping is too expensive for Java.
- Cork accurately pinpoints systematic heap growth completely online.

Cork
- Opportunity: a tracing GC visits all the objects in the heap!
- Build a heap summarization graph: the class points-from graph (CPFG), which summarizes the volume of nodes and edges
- Identify growth by differencing CPFGs across collections
- Identify candidates using node rank
- Identify the data structure using edge rank
Note: be clear on the difference between candidates and the data structure.

1. Calculating Type Points-To
[Figure: a heap of object instances and the corresponding Type Points-To (TPT) graph of types]
Note: remember to talk about volume (number of bytes).
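The construction step above can be sketched as a single trace over a toy heap. This is a minimal illustration, not Cork's implementation inside Jikes RVM: the `Obj` class, the `build_type_graph` function, and the convention of keying each edge as (referring type, referenced type) weighted by the referenced object's bytes are all assumptions made for the example. A points-from view is the same data with the key reversed.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class Obj:
    """Hypothetical heap object: a type name, a size in bytes, outgoing references."""
    type_name: str
    size: int
    refs: list = field(default_factory=list)


def build_type_graph(roots):
    """Trace every reachable object once and summarize volume by type.

    Returns (node_volume, edge_volume):
      node_volume[type]             -> total bytes of live objects of that type
      edge_volume[(src_t, dst_t)]   -> bytes of dst objects referenced from src objects
    """
    node_volume = defaultdict(int)
    edge_volume = defaultdict(int)
    seen = set()
    stack = list(roots)
    while stack:
        obj = stack.pop()
        if id(obj) in seen:          # already scanned, as a mark bit would record
            continue
        seen.add(id(obj))
        node_volume[obj.type_name] += obj.size
        for target in obj.refs:      # each reference contributes to one type edge
            edge_volume[(obj.type_name, target.type_name)] += target.size
            stack.append(target)
    return dict(node_volume), dict(edge_volume)


# Toy heap: an ArrayList holding an Object[] that holds two Strings.
s1, s2 = Obj("String", 32), Obj("String", 32)
arr = Obj("Object[]", 40, [s1, s2])
lst = Obj("ArrayList", 24, [arr])
nodes, edges = build_type_graph([lst])
```

Because the graph has one node per type rather than per object, its size depends on how many types are live, not on how many objects the heap holds.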

2. Differencing Graphs
Cork's optimizations:
- Keeps 3 graphs
- Prunes obviously non-growing parts
- Volume decay guards against premature pruning
- Ranks nodes/edges
[Figure: three consecutive graphs TPTi, TPTi+1, TPTi+2]
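A minimal sketch of differencing over a small window of graphs follows. The `GraphWindow` class and its method names are hypothetical, and the pruning rule shown (drop a type only if it never increased anywhere in the window) is a simplification chosen for the example rather than Cork's exact policy.

```python
from collections import deque


class GraphWindow:
    """Keeps the per-type volumes from the most recent collections (3 by default)."""

    def __init__(self, size=3):
        self.graphs = deque(maxlen=size)   # oldest snapshot falls out automatically

    def add(self, node_volume):
        self.graphs.append(dict(node_volume))

    def deltas(self):
        """Volume change per type between the two most recent graphs."""
        if len(self.graphs) < 2:
            return {}
        prev, cur = self.graphs[-2], self.graphs[-1]
        return {t: cur.get(t, 0) - prev.get(t, 0) for t in set(prev) | set(cur)}

    def non_growing(self):
        """Types whose volume never increased between any adjacent pair in the window."""
        if len(self.graphs) < 2:
            return set()
        snaps = list(self.graphs)
        types = set().union(*snaps)
        return {
            t for t in types
            if all(b.get(t, 0) <= a.get(t, 0) for a, b in zip(snaps, snaps[1:]))
        }


window = GraphWindow()
window.add({"A": 10, "B": 5})
window.add({"A": 20, "B": 5})
window.add({"A": 30, "B": 4})
changes = window.deltas()
prunable = window.non_growing()
```

Looking at more than one pair of graphs before pruning is one way to avoid discarding a type that dipped briefly and then resumed growing.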

Finding Growth (RRT)
- Find nodes of types t that grow: $V_t(i) > (1 - f)\,V_t(i-1)$, where i is the phase and f is a decay factor, e.g., 0.05
- Rank nodes and edges: $r_i = r_{i-1} \pm p_i\,(Q - 1)$; add to the rank if the type grows in phase p, subtract if it shrinks; Q is the ratio (> 1) of $V_i$ to $V_{i-1}$
- Designate a node as a candidate if $r_t(i) > R_{threshold}$
Note: say that we are not sensitive to the rank threshold.
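The rank update can be illustrated for a single type as below. This is a sketch under stated assumptions: `update_rank` is a hypothetical name, Q is taken as the larger volume divided by the smaller so that it is never below 1, and a type that fails the decay test has its rank and phase count reset to zero (the slide does not say what happens in that case). The threshold value 1.0 is also only an example.

```python
def update_rank(rank, phases, v_prev, v_cur, f=0.05):
    """Return (rank, phases) for one type after one collection.

    v_prev, v_cur: the type's volume in bytes at the previous and current collection.
    f: decay factor; small dips within f of the previous volume are tolerated.
    """
    if v_prev <= 0 or v_cur <= 0:
        return 0.0, 0                      # assumption: no ratio defined, start over
    if v_cur <= (1 - f) * v_prev:
        return 0.0, 0                      # assumption: fails V(i) > (1 - f) * V(i-1)
    q = max(v_cur, v_prev) / min(v_cur, v_prev)   # ratio of volumes, always >= 1
    if v_cur > v_prev:
        phases += 1                        # one more phase of growth
        rank += phases * (q - 1)
    elif v_cur < v_prev:
        rank -= phases * (q - 1)           # small shrink lowers the rank
    return rank, phases


# A type that grows, dips slightly (within the 5% decay), then grows again.
volumes = [100, 120, 150, 147, 180]
rank, phases = 0.0, 0
for v_prev, v_cur in zip(volumes, volumes[1:]):
    rank, phases = update_rank(rank, phases, v_prev, v_cur)

R_THRESHOLD = 1.0                          # hypothetical threshold
is_candidate = rank > R_THRESHOLD

# A type that shrinks by 10% in one step is dropped from consideration.
dropped = update_rank(0.7, 2, 100, 90)
```

Weighting each step by the number of growth phases means a type that grows steadily over many collections outranks one with a single large jump.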

Reported Candidates
[Figure: number of candidates reported by SRT and RRT for jess, fop, and SPECjbb]
Note: this summarizes Cork's candidate reports.

Finding Data Structure
- Type is not enough
- Growing edges identify the data structure
- Rank edges
- Calculate a slice from each candidate: the set of all paths (n0…nn) such that …
- "Sees" beyond non-candidate nodes
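A slice can be sketched as a graph traversal from a candidate type. The example below assumes that a "growing" edge is one with positive rank and collects the nodes and edges reachable along such edges instead of enumerating every path; `slice_from`, the type names other than the generic collection types, and all rank values are hypothetical.

```python
def slice_from(candidate, edge_rank):
    """Follow growing edges outward from a candidate type.

    edge_rank maps (from_type, to_type) -> rank. Only edges with rank > 0
    (assumed here to mean "growing") are followed. The traversal continues
    through intermediate types whether or not they are candidates themselves.
    Returns (types_in_slice, edges_in_slice).
    """
    adjacency = {}
    for (src, dst), r in edge_rank.items():
        if r > 0:
            adjacency.setdefault(src, []).append(dst)

    visited = {candidate}
    slice_edges = set()
    stack = [candidate]
    while stack:
        node = stack.pop()
        for nxt in adjacency.get(node, []):
            slice_edges.add((node, nxt))
            if nxt not in visited:
                visited.add(nxt)
                stack.append(nxt)
    return visited, slice_edges


# Hypothetical edge ranks: String objects are held by an Object[] inside an
# ArrayList owned by a Cache; the String -> Logger edge is not growing.
ranks = {
    ("String", "Object[]"): 2.0,
    ("Object[]", "ArrayList"): 1.5,
    ("ArrayList", "Cache"): 0.8,
    ("String", "Logger"): -0.3,
}
types_in_slice, edges_in_slice = slice_from("String", ranks)
```

The result names the containing structure (here the hypothetical Cache) rather than only the leaf type that accumulates, which is the difference between a candidate and the data structure.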

Implementation and Methodology
- Jikes RVM with MMTk
- Benchmarks: SPECjvm98, DaCapo, SPECjbb2000, Eclipse 3.1.2
- Garbage collector: generational with a 4MB bounded nursery
- For performance, report application only
- Replay compilation
- 2nd run methodology
Notes: Jikes RVM is a Java-in-Java virtual machine. We also used it to search for memory leaks in Eclipse.

Efficiency and Scalability
- Node/type data is stored in the type information block (TIB), adding 5 words:
  - 1 word for type volume and edge list pointer for each of the previous 4 collections
  - 1 word for # of phases (p)
- Edge data is stored in lists
- Prune parts of the TPFG that are non-growing
Note: are there ways to implement the type summary graph?
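As a rough worked example of the per-type cost, the arithmetic below multiplies out the 5 added words. The function name and the 4-byte word size are assumptions (a 32-bit VM), and edge lists are not counted; 3365 is the number of classes loaded for Eclipse reported later in the talk.

```python
def tib_overhead_bytes(num_types, history=4, word_bytes=4):
    """Bytes added to type information blocks across all loaded types.

    history: one word per previous collection (4 on the slide)
    plus one word for the phase count, giving 5 words per type.
    word_bytes: assumed word size; 4 bytes corresponds to a 32-bit VM.
    """
    words_per_type = history + 1
    return num_types * words_per_type * word_bytes


eclipse_overhead = tib_overhead_bytes(3365)   # 3365 types * 5 words * 4 bytes
```

Under these assumptions the fixed node data comes to about 66 KB for Eclipse, so it scales with the number of loaded types and not with heap size.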

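Space Overhead

                          jess      Eclipse   Geomean
# of types (bm+VM)        1744      3365      1747
# of types (TPFG avg)     318       667       334
# of types (TPFG max)     319       775       346
# of edges (avg)          844       4090      904
# of edges (max)          861       7585      1142
% pruned                  66%       42%       60%
Increased alloc %         0.094%    0.167%    0.233%

Highlighted geomean figures: 19%, 2.7X, 0.233% (consistent with 334 of 1747 types appearing in the TPFG, 904 edges for 334 nodes, and the increase in allocation).
Notes: the geomean is across ALL benchmarks. Surprising result: the heap does not have very many LIVE types at once.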

Time Overhead
[Figure: normalized total time vs. heap size relative to minimum]
Note: as the heap gets bigger, overhead decreases as GC time decreases.

Time Overhead
- Is this good enough?
- Would you add it to your system?
Note: as the heap gets bigger, overhead decreases as GC time decreases.

Dynamic Heap Analysis with Cork
Cork identified:
- Systematic heap growth
- Growing classes
- Growing data structure
Benchmarks:
- fop: application design
- jess: in input
- SPECjbb2000: memory leak
- Eclipse 3.1.2 #115789: repeatedly performing a structural (recursive) diff leaks memory
[Figure: heap growth for SPECjbb, fop, and jess]
Notes: the SPECjbb2000 bug was one of the major reasons they moved to SPECjbb2005, to fix it. Give Mike credit for the Eclipse bug.

Eclipse 115789
[Figure: heap occupancy (MB) vs. time (MB of allocation)]

Eclipse 115789: CPFG
- 3365 classes loaded (1773 in Eclipse)
- Average graph: 667 nodes, 4090 edges

Eclipse 115789: Slice Path
- Identifies 7 candidates: $r_t > r_{thres}$
- Calculates a slice from each candidate: the set of all paths (n0…nn) s.t. $r_{n_{k+1} n_k} < 0$
- Types shown: File, Folder, String[], Object[], ResourceCompareInput$FilteredBufferedResourceNode, ArrayList, ResourceCompareInput, HashMap

Cork's Contributions
- Performs dynamic heap analysis to detect systematic heap growth
- Uses a class points-from graph to summarize volume relations
- <0.5% space overhead
- ~2% time overhead
- Accurately identifies:
  - User-defined classes causing the growth
  - Data structure containing the growth

What else can the GC tell us? Testing time? In deployment?