Connectivity-Based Garbage Collection
Presenter: Feng Xian
Authors: Martin Hirzel et al.
Published in OOPSLA 2003
2 Garbage Collection Benefits
Garbage collection leads to simpler
– Design: no complex deallocation protocols
– Implementation: automatic deallocation
– Maintenance: fewer bugs
Benefits are widely accepted: Java, C#, Python, …
3 Garbage Collection: Haven’t we solved this problem yet?
For a state-of-the-art garbage collector:
– time: ~14% of execution time
– space: 3x high watermark
– pauses: 0.8 seconds
Can reduce any one cost
Challenge: reduce all three costs
4 Example Heap
Boxes: heap objects
Arrows: pointers
Long box: stack + global variables
[Figure: heap objects o1 to o15, stack variables s1 and s2, and global g]
5 Thesis
1. Objects form distinct data structures
2. Connected objects die together
3. Garbage collectors can exploit 1. and 2. to reclaim objects efficiently
[Figure: example heap with objects o1 to o15 and stack + globals]
6 Experimental Infrastructure
JikesRVM (Research Virtual Machine)
– From IBM Research
– Written in Java
– Application and runtime system share the heap, so good garbage collection is even more important
Benchmarks
– SPECjvm98 suite and SPECjbb2000
– Java Olden suite
– xalan, ipsixql, nfc, jigsaw
7 Outline
– Garbage Collector Design Principles
– Family of Garbage Collectors
– Design Space Exploration
– Conclusion
8 Garbage Collector Design Principles
“Do partial collections.”
– Don’t collect the full heap every time
– Shorter pause times
[Figure: example heap with objects o1 to o15 and stack + globals]
9 Garbage Collector Design Principles
“Predict lifetime based on age.”
Generational hypothesis: most objects die young
Generational garbage collection:
– Partition by age
– Collect young objects most often
Low time overhead
That’s the state of the art.
[Figure: example heap split into a young generation and an old generation, with stack + globals]
10 Garbage Collector Design Principles: Generational GC Problems
– Regular full collections: long peak pause
– Old-to-young pointers: need bookkeeping
[Figure: example heap split into a young generation and an old generation, with stack + globals]
11 Garbage Collector Design Principles
“Collect connected objects together.”
Likelihood that two objects die at the same time:

Connectivity          Likelihood
Any pair              33.1%
Weakly connected      46.3%
Strongly connected    72.4%
Direct pointer        76.4%

[Figure: the Example column illustrated each row with a two-object diagram of o1 and o2]
12 Garbage Collector Design Principles
“Focus on objects with few ancestors.”
Short-lived objects are easy to collect

Lifetime    Median number of ancestor objects
Short       2 objects
Long        83,324 objects
13 Garbage Collector Design Principles
“Predict lifetime based on roots.”

                             Lifetime
Objects reachable …          Short     Long
indirectly from stack        25.6%     16.2%
only directly from stack     32.9%     0.8%
from globals                 4.0%      20.5%
Total                        62.5%     37.5%

For details, see the [ISMM’02] paper.
[Figure: objects o1 to o4 reachable from stack variable s and global g]
14 Outline
– Garbage Collector Design Principles
– Family of Garbage Collectors
– Design Space Exploration
– Conclusion
15 Family of Garbage Collectors
CBGC: Connectivity-Based Garbage Collection
– Do partial collections.
– Collect connected objects together.
– Predict lifetime based on age.
– Focus on objects with few ancestors.
– Predict lifetime based on roots.
[Figure: example heap divided into partitions p1 to p4, with stack + globals]
16 Family of Garbage Collectors: Components of CBGC
Before allocation:
1. Partitioning: decide into which partition to put each object
Collection algorithm:
2. Estimator: estimate dead + live objects for each partition
3. Chooser: choose a “good” set of partitions
4. Partial collection: collect the chosen partitions
17 Family of Garbage Collectors: Partitioning Problem
Find fine-grained partitions, where
– Partition edges respect pointers
– Objects don’t move between partitions
[Figure: example heap divided into partitions p1 to p4, with stack + globals]
18 Family of Garbage Collectors: Partitioning Solutions
Pointer analysis
Type-based [Harris]
– o1 may point to o2 if o1 has a field of a type compatible with o2
– Conservative: the analysis reports the absence of a pointer between two heap objects only if it can prove that no such pointer can exist
[Figure: example heap divided into partitions p1 to p4, with stack + globals]
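The type-based rule above can be sketched at the level of types rather than objects. This is a minimal illustration, not the analysis from [Harris] or from the talk: the function name `may_point_to`, the dictionary encoding of field declarations and subtyping, and the `List`/`Item`/`Book`/`Title` hierarchy are all assumptions made for the example.

```python
def may_point_to(fields, supertypes):
    """Conservative type-level points-to edges (illustrative sketch).

    fields:     type name -> declared types of its pointer fields
    supertypes: type name -> set of its supertypes, including itself
    """
    edges = set()
    for src, declared in fields.items():
        for field_type in declared:
            for dst, supers in supertypes.items():
                # dst is compatible with the field if it is the declared
                # type itself or one of its subtypes.
                if field_type in supers:
                    edges.add((src, dst))
    return edges


# Hypothetical class hierarchy: Book is a subtype of Item.
fields = {"List": ["Item"], "Item": [], "Book": ["Title"], "Title": []}
supertypes = {
    "List": {"List"},
    "Item": {"Item"},
    "Book": {"Book", "Item"},
    "Title": {"Title"},
}
edges = may_point_to(fields, supertypes)
```

Any pair missing from `edges`, such as `("Title", "List")`, is one where the declarations prove that no pointer can exist. Pairs that are present only mean a pointer may exist, which is the conservative direction described on the slide.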
19 Family of Garbage Collectors: Estimator Problem
For each partition, guess
dead
– Objects that can be reclaimed
– Pay-off
live
– Objects that must be traversed
– Cost
[Figure: partitions p1 to p4 with stack + globals, labeled with the estimates 3 dead + 3 live, 1 dead + 2 live, 2 dead + 0 live, and 2 dead + 2 live]
20 Family of Garbage Collectors: Estimator Solutions
Heuristics
– Connected objects die together
– Most objects die young
– Objects reachable from globals live long
– The past predicts the future
[Figure: partitions p1 to p4 with stack + globals, labeled with the estimates 3 dead + 3 live, 1 dead + 2 live, 2 dead + 0 live, and 2 dead + 2 live]
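One way the heuristics above might be combined into a per-partition guess is sketched below. The slides do not say how the estimator weighs them, so everything here is an assumption: the `PartitionStats` fields, the rule that objects allocated since the last collection are counted as dead, and the use of a past survival rate for older objects.

```python
from dataclasses import dataclass


@dataclass
class PartitionStats:
    size: int                     # objects currently in the partition
    allocated_since_gc: int       # objects allocated since its last collection
    past_survival_rate: float     # fraction that survived its last collection
    reachable_from_globals: bool  # whether the partition hangs off a global


def estimate(p):
    """Return a (dead, live) guess for one partition (hypothetical rule)."""
    if p.reachable_from_globals:
        # "Objects reachable from globals live long": assume nothing died.
        return 0, p.size
    # "Most objects die young": count recent allocations as dead.
    old = p.size - p.allocated_since_gc
    # "The past predicts the future": older objects survive at the old rate.
    live = round(old * p.past_survival_rate)
    return p.size - live, live


stack_partition = estimate(PartitionStats(6, 2, 0.75, False))
global_partition = estimate(PartitionStats(3, 1, 0.5, True))
```

Treating the partition as a single unit is where "connected objects die together" enters: the guess is made per partition, not per object.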
21 Family of Garbage Collectors: Chooser Problem
Pick a subset of partitions
– Maximize total dead
– Minimize total live
– Closed under predecessor relation: no bookkeeping for external pointers
[Figure: partitions p1 to p4 with stack + globals, labeled 7 dead + 5 live, 3 dead + 3 live, 1 dead + 2 live, 2 dead + 0 live, and 2 dead + 2 live]
22 Family of Garbage Collectors: Chooser Solutions
– Optimal algorithm based on network flow [TR]
– Simpler, greedy algorithm
[Figure: partitions p1 to p4 with stack + globals, labeled 7 dead + 5 live, 3 dead + 3 live, 1 dead + 2 live, 2 dead + 0 live, and 2 dead + 2 live]
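The slides do not spell out the greedy algorithm, so the following is only one plausible greedy chooser, not the authors' method. It repeatedly adds the predecessor closure with the best dead-to-live ratio while a budget on live objects (a stand-in for pause time) allows. The budget, the ratio `dead / (live + 1)`, the partition edges in `preds`, and the assignment of the slide's estimates to partition names are all assumptions.

```python
def predecessor_closure(p, preds):
    """Return p together with every partition that can reach p."""
    seen, stack = set(), [p]
    while stack:
        q = stack.pop()
        if q not in seen:
            seen.add(q)
            stack.extend(preds.get(q, ()))
    return seen


def choose(estimates, preds, live_budget):
    """Greedily pick a predecessor-closed set of partitions (sketch)."""
    chosen, live_used = set(), 0
    while True:
        best = None
        for p in sorted(estimates):
            if p in chosen:
                continue
            # Partitions that must be added so the set stays closed.
            extra = predecessor_closure(p, preds) - chosen
            dead = sum(estimates[q][0] for q in extra)
            live = sum(estimates[q][1] for q in extra)
            if live_used + live > live_budget:
                continue
            ratio = dead / (live + 1)  # +1 avoids division by zero
            if best is None or ratio > best[0]:
                best = (ratio, extra, live)
        if best is None:
            return chosen
        chosen |= best[1]
        live_used += best[2]


# (dead, live) per partition; values from the slide, placement assumed.
estimates = {"p1": (1, 2), "p2": (3, 3), "p3": (2, 0), "p4": (2, 2)}
preds = {"p3": ["p2"], "p4": ["p3"]}  # hypothetical: p2 -> p3 -> p4
chosen = choose(estimates, preds, live_budget=5)
```

Because every step adds a whole closure, no unchosen partition can point into the result, which is what removes the need to track external pointers.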
23 Family of Garbage Collectors: Partial Collection Problem
– Look only at chosen partitions
– Traverse reachable objects
– Reclaim unreachable objects
[Figure: chosen partitions p2, p3, and p4 shown separately from the rest of the heap, with stack + globals]
24 Family of Garbage Collectors: Partial Collection Solutions
Generalize canonical full-heap algorithms
– Mark and sweep [McCarthy’60]
– Semi-space copying [Cheney’70]
– Treadmill [Baker’92]
[Figure: chosen partitions p2, p3, and p4 shown separately from the rest of the heap, with stack + globals]
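As an illustration of generalizing a full-heap algorithm, here is a minimal sketch of mark and sweep restricted to a chosen set of partitions. It assumes the chosen set is closed under the predecessor relation, so the only pointers into it come from roots or from other chosen partitions. The heap encoding, the function name `partial_mark_sweep`, and the five-object example are hypothetical.

```python
def partial_mark_sweep(heap, partition_of, roots, chosen):
    """Collect only the chosen partitions; return the reclaimed objects.

    heap:         object name -> list of objects it points to
    partition_of: object name -> partition name
    """
    marked = set()
    # Start only from roots that point into the chosen partitions.
    worklist = [o for o in roots if partition_of[o] in chosen]
    while worklist:
        o = worklist.pop()
        if o in marked:
            continue
        marked.add(o)
        for target in heap[o]:
            # Pointers leaving the chosen set are not followed.
            if partition_of[target] in chosen:
                worklist.append(target)
    dead = {o for o in heap if partition_of[o] in chosen and o not in marked}
    for o in dead:
        del heap[o]  # sweep: reclaim unreachable objects
    return dead


# Hypothetical heap: p2 is chosen and may point into p1, never the reverse.
heap = {"a": ["b"], "b": [], "c": ["d", "a"], "d": [], "e": ["d"]}
partition_of = {"a": "p1", "b": "p1", "c": "p2", "d": "p2", "e": "p2"}
reclaimed = partial_mark_sweep(heap, partition_of, roots=["c", "a"], chosen={"p2"})
```

Objects `a` and `b` in the unchosen partition are never visited, which is the source of the shorter pause.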
25 Outline
– Garbage Collector Design Principles
– Family of Garbage Collectors
– Design Space Exploration
– Conclusion
26 Design Space Exploration: Questions
– How good is a naïve CBGC?
– How good could CBGC be in 20 years?
– How well does CBGC do in a JVM?
27 Design Space Exploration: Simulator Methodology
Garbage collection simulator (under GPL)
– Uses traces of allocations and pointer writes from our benchmark runs
Simulator advantages
– Easier to implement a variety of collector algorithms
– Know the entire trace beforehand: can use that for “in 20 years” experiments
Currently adding CBGC to JikesRVM
28 Design Space Exploration: How good is a naïve CBGC?
Collectors compared:
– Full-heap: semi-space copying
– CBGC-naïve: type-based partitioning [Harris], heuristics estimator
– Appel: copying generational
[Figure: three charts (cost in time, cost in space, pause times) for the benchmarks jack, xalan, jbb, and javac]
29 Design Space Exploration: How good could CBGC be in 20 years?
Collectors compared:
– Full-heap: semi-space copying
– CBGC-oracles: partitioning and estimator based on trace
– Appel: copying generational
[Figure: three charts (cost in time, cost in space, pause times) for the benchmarks jack, xalan, jbb, and javac]
30 Design Space Exploration: How good could CBGC be in 20 years?
CBGC with oracles beats Appel
– We did not find a “performance wall”
– CBGC has potential
The performance gap between CBGC with oracles and naïve CBGC is large
– Research challenges
31 How well does CBGC do in a Java virtual machine?
– Implementation in progress
– Need a pointer analysis for the partitioning
32 Contributions presented in this talk
– Connectivity-based GC design principles [ISMM’02]
– CBGC, a new family of garbage collectors [OOPSLA’03]
– Design space exploration with simulator [OOPSLA’03]