Compilation 2007
Garbage Collection
Michael I. Schwartzbach
BRICS, University of Aarhus

The Garbage Collector
- A garbage collector is part of the runtime system
- It reclaims heap-allocated records (objects) that are no longer in use
- A garbage collector should:
  - reclaim all unused records
  - spend very little time per record
  - not cause significant delays
  - allow all of memory to be used
- These are difficult and conflicting requirements

Life Without Garbage Collection
- Unused records must be explicitly deallocated
- This is superior if done correctly
- But it is easy to miss some records
- And it is dangerous to handle pointers (freeing a record too early leaves dangling pointers)
- Memory leaks in real life (ical v.2.1):
  [Figure: memory usage (MB) over time (hours)]

Record Liveness
- Which records are still in use?
- Ideally, those that will be accessed in the future execution of the program
- But that is of course undecidable...
- Basic conservative approximation: a record is live if it is reachable from a stack location (local variable or local stack)
- Dead records may still point to each other

A Heap With Live and Dead Records
[Figure: example heap with roots p, q, r pointing to live records; the dead records point only to each other]

The Mark-and-Sweep Algorithm
- Mark: explore pointers starting from all stack locations and mark all the records encountered
- Sweep: sweep through all records in the heap and reclaim the unmarked ones
- Unmark all marked records
- Assumptions:
  - we know the start and size of each record in memory
  - we know which record fields are pointers
  - reclaimed records are kept in a freelist

Pseudo Code for Mark-and-Sweep

    function DFS(x) {
      if (x is a heap pointer)
        if (x is not marked) {
          mark x;
          for (i=1; i<=|x|; i++)
            DFS(x.f_i);
        }
    }

    function Sweep() {
      freelist = empty;   // rebuilt from all unmarked records, including already free ones
      p = first address in heap;
      while (p < last address in heap) {
        if (p is marked)
          unmark p;
        else {
          p.f_1 = freelist;
          freelist = p;
        }
        p = p + sizeof(p);
      }
    }

    function Mark() {
      foreach (v in a stack frame)
        DFS(v);
    }

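The pseudo code can be turned into a small executable Java sketch. It simulates the heap with ordinary Java objects. The class names MarkSweep and Rec, the `heap` list that stands in for address order, and the `alloc` helper are all hypothetical. Every field is assumed to be a pointer, with null meaning "not a heap pointer", and each record has at least one field, which serves as the freelist link.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch: the heap is simulated by a list of Java objects kept in "address" order.
final class MarkSweep {
    static final class Rec {
        final Rec[] f;        // pointer fields (null = not a heap pointer)
        boolean marked;
        Rec(int fields) { f = new Rec[fields]; }
    }

    final List<Rec> heap = new ArrayList<>();   // every record, in address order
    final List<Rec> roots = new ArrayList<>();  // stand-in for the stack locations
    Rec freelist = null;                        // free records, chained through f[0]

    // Hypothetical helper: always appends, never reuses the freelist.
    Rec alloc(int fields) {
        Rec r = new Rec(Math.max(1, fields));   // f[0] is needed as the freelist link
        heap.add(r);
        return r;
    }

    void dfs(Rec x) {
        if (x != null && !x.marked) {
            x.marked = true;
            for (Rec child : x.f) dfs(child);
        }
    }

    void mark() { for (Rec v : roots) dfs(v); }

    void sweep() {
        freelist = null;                        // rebuilt from scratch
        for (Rec p : heap) {
            if (p.marked) {
                p.marked = false;               // unmark for the next collection
            } else {
                p.f[0] = freelist;              // reuse the first field as the link
                freelist = p;
            }
        }
    }

    void collect() { mark(); sweep(); }

    int freeCount() {
        int n = 0;
        for (Rec r = freelist; r != null; r = r.f[0]) n++;
        return n;
    }
}
```

Consider a heap where root a points to b, while c and d point only to each other. After `collect()`, exactly c and d are on the freelist. Dead records that reference each other do not keep each other alive.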
[Figure: Marking and Sweeping, an 11-step animation of mark-and-sweep on the example heap with roots p, q, r, ending with the unmarked records on the freelist]

Analysis of Mark-and-Sweep
- Assume the heap has H words
- Assume that R words are reachable
- The cost of garbage collection is $c_1 R + c_2 H$ (marking touches the R reachable words, sweeping touches all H words)
- The cost per reclaimed word is $\frac{c_1 R + c_2 H}{H - R}$
- If R is close to H, then this is expensive: the denominator H - R approaches zero

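The following worked example makes "expensive" concrete. It evaluates the cost per reclaimed word at two heap occupancies and then takes the limit. The numbers are illustrative and do not come from the original slides:

```latex
\begin{aligned}
R = \tfrac{1}{2}H: &\quad \frac{c_1 \cdot \tfrac{1}{2}H + c_2 H}{H - \tfrac{1}{2}H} = c_1 + 2c_2 \\
R = \tfrac{9}{10}H: &\quad \frac{c_1 \cdot \tfrac{9}{10}H + c_2 H}{H - \tfrac{9}{10}H} = 9c_1 + 10c_2 \\
R \to H: &\quad \frac{c_1 R + c_2 H}{H - R} \to \infty
\end{aligned}
```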
Allocation
- The freelist must be searched for a record that is large enough to provide the requested memory
- Free records may be sorted by size
- The freelist may become fragmented: it contains many small free records but none that is large enough
- Defragmentation (coalescing) joins adjacent free records into larger ones

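Here is a sketch of first-fit allocation with splitting and coalescing, under some assumptions. Addresses are word indices into a simulated heap. Free blocks live in a TreeMap sorted by address rather than in a linked freelist, which makes adjacent free blocks easy to find. The class and method names are hypothetical.

```java
import java.util.Map;
import java.util.TreeMap;

final class FirstFitAllocator {
    // Free blocks: start address -> size in words, kept sorted by address.
    private final TreeMap<Integer, Integer> free = new TreeMap<>();

    FirstFitAllocator(int heapWords) { free.put(0, heapWords); }

    /** First fit: returns the start of a block of the requested size, or -1 if none is large enough. */
    int allocate(int words) {
        Integer found = null;
        for (Map.Entry<Integer, Integer> e : free.entrySet()) {
            if (e.getValue() >= words) { found = e.getKey(); break; }
        }
        if (found == null) return -1;                    // fragmented or full: time to collect
        int size = free.remove(found);
        if (size > words) free.put(found + words, size - words); // split off the remainder
        return found;
    }

    /** Returns a block to the free set, coalescing it with free neighbours. */
    void release(int start, int words) {
        Integer after = free.remove(start + words);      // free block right after this one?
        if (after != null) words += after;
        Map.Entry<Integer, Integer> before = free.lowerEntry(start);
        if (before != null && before.getKey() + before.getValue() == start) {
            start = before.getKey();                     // merge with the block right before
            words += before.getValue();
        }
        free.put(start, words);
    }

    int largestFree() {
        int max = 0;
        for (int size : free.values()) max = Math.max(max, size);
        return max;
    }
}
```

Fragmentation shows up as follows. A 10-word heap is split into blocks of 4, 4 and 2 words, and the first and last blocks are then released. That leaves 6 free words, but a 5-word request still fails. Releasing the middle block coalesces everything back into one 10-word block.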
Pointer Reversal
- The DFS recursion stack could have size H (for example, when the live records form one long list)
- It has at least size log(H)
- This may be too much (after all, memory is low)
- The recursion stack may be cleverly embedded in the fields of the marked records
- This technique makes mark-and-sweep practical

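Below is a Java sketch of pointer reversal in the Deutsch-Schorr-Waite style. It assumes that each record carries a small extra `done` counter, the index of the field currently being explored, besides its mark bit. The class names are hypothetical. While the marker descends, the field it followed is made to point back to the parent. That field is restored on the way up, so no separate stack is needed.

```java
// Sketch of pointer-reversal marking (Deutsch-Schorr-Waite style) on simulated records.
final class PointerReversal {
    static final class Rec {
        final Rec[] f;     // pointer fields (null = not a heap pointer)
        boolean marked;
        int done;          // index of the field currently being explored
        Rec(int fields) { f = new Rec[fields]; }
    }

    /** Marks everything reachable from root without a recursion stack. */
    static void mark(Rec root) {
        if (root == null || root.marked) return;
        Rec t = null;      // parent of x; older ancestors are chained through reversed fields
        Rec x = root;
        x.marked = true;
        x.done = 0;
        while (true) {
            int i = x.done;
            if (i < x.f.length) {
                Rec y = x.f[i];
                if (y != null && !y.marked) {
                    x.f[i] = t;    // reverse: the field now points back to x's parent
                    t = x;
                    x = y;         // descend into y
                    x.marked = true;
                    x.done = 0;
                } else {
                    x.done = i + 1;
                }
            } else {
                Rec y = x;         // x is finished: climb back to its parent
                x = t;
                if (x == null) return;
                i = x.done;
                t = x.f[i];        // recover the grandparent from the reversed field
                x.f[i] = y;        // restore the original pointer
                x.done = i + 1;
            }
        }
    }
}
```

The marker handles a 100,000-record list whose last record points back to the first. A naive recursive DFS could overflow the Java call stack on such a list.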
The Reference Counting Algorithm
- Maintain a counter of the total number of references to each record
- For each assignment, update the counters
- A record is dead when its counter is zero
- Advantages:
  - catches dead records immediately
  - does not cause long pauses
- Disadvantages:
  - cannot detect cycles of dead records
  - is rather expensive (every pointer assignment updates two counters)

Pseudo Code for Reference Counting

    function Increment(x) {
      x.count++;
    }

    function Decrement(x) {
      if (x is a heap pointer) {
        x.count--;
        if (x.count == 0)
          PutOnFreelist(x);
      }
    }

    function PutOnFreelist(x) {
      Decrement(x.f_1);
      x.f_1 = freelist;
      freelist = x;
    }

    // called when x is reused: fields 2..|x| are released lazily
    function RemoveFromFreelist(x) {
      for (i=2; i<=|x|; i++)
        Decrement(x.f_i);
    }

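The reference counting pseudo code translates directly into Java, including the lazy release of fields 2..|x|. The `assign` method is a hypothetical helper that shows the counter updates for one pointer assignment. Stack references are assumed to be counted through explicit `increment`/`decrement` calls, and every record is assumed to have at least one field for the freelist link.

```java
final class RefCount {
    static final class Rec {
        final Rec[] f;     // pointer fields (null = not a heap pointer)
        int count;         // number of references to this record
        Rec(int fields) { f = new Rec[Math.max(1, fields)]; }
    }

    Rec freelist = null;   // free records, chained through f[0]

    void increment(Rec x) { if (x != null) x.count++; }

    void decrement(Rec x) {
        if (x == null) return;                 // not a heap pointer
        if (--x.count == 0) putOnFreelist(x);
    }

    void putOnFreelist(Rec x) {
        decrement(x.f[0]);                     // only the first field is released now
        x.f[0] = freelist;
        freelist = x;
    }

    /** Takes a record off the freelist and releases its remaining fields (lazy decrement). */
    Rec removeFromFreelist() {
        Rec x = freelist;
        if (x == null) return null;
        freelist = x.f[0];
        x.f[0] = null;
        for (int i = 1; i < x.f.length; i++) { decrement(x.f[i]); x.f[i] = null; }
        return x;
    }

    /** x.f[i] = y with counter maintenance; incrementing first makes x.f[i] = x.f[i] safe. */
    void assign(Rec x, int i, Rec y) {
        increment(y);
        decrement(x.f[i]);
        x.f[i] = y;
    }
}
```

The example shows two things. First, a record's second field is released only when that record is reused. Second, the cycle problem appears: two records that point to each other keep counts of 1 after the last outside reference is dropped, so they are never freed.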
The Stop-and-Copy Algorithm
- Divide the heap space into two parts
- Only use one part at a time
- When it runs full, copy live records to the other part of the heap space
- Then switch the roles of the two parts
- Advantages:
  - fast allocation (no freelist)
  - avoids fragmentation
- Disadvantage:
  - wastes half your memory

Before and After Stop-and-Copy
[Figure: before, the heap is split into from-space and to-space, with next and limit pointers; after, the roles of from-space and to-space are swapped]

Pseudo Code for Stop-and-Copy

    function Forward(x) {
      if (x ∈ from-space) {
        if (x.f_1 ∈ to-space)       // already copied: f_1 holds the forwarding pointer
          return x.f_1;
        else {
          for (i=1; i<=|x|; i++)
            next.f_i = x.f_i;
          x.f_1 = next;
          next = next + sizeof(x);
          return x.f_1;
        }
      } else
        return x;
    }

    function Copy() {
      scan = next = start of to-space;
      foreach (v in a stack frame)
        v = Forward(v);
      while (scan < next) {
        for (i=1; i<=|scan|; i++)
          scan.f_i = Forward(scan.f_i);
        scan = scan + sizeof(scan);
      }
    }

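Here is the same algorithm, Cheney's breadth-first copying, as a Java sketch. Several simplifications are assumptions. To-space is an ArrayList whose indices play the role of addresses, so `scan` becomes a list index and `next` becomes the list size. The forwarding pointer lives in a separate `forward` field instead of overwriting `f_1`. The class names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

final class StopAndCopy {
    static final class Rec {
        final Rec[] f;     // pointer fields (null = not a heap pointer)
        Rec forward;       // forwarding pointer, set once the record has been copied
        Rec(int fields) { f = new Rec[fields]; }
    }

    private List<Rec> toSpace = new ArrayList<>();

    /** Forward(x): copy x into to-space the first time it is seen; return the copy. */
    private Rec forward(Rec x) {
        if (x == null) return null;                  // not a heap pointer
        if (x.forward != null) return x.forward;     // already copied
        Rec copy = new Rec(x.f.length);
        System.arraycopy(x.f, 0, copy.f, 0, x.f.length); // fields still point into from-space
        x.forward = copy;
        toSpace.add(copy);                           // next = next + sizeof(x)
        return copy;
    }

    /** Copy(): forward the roots in place, then scan to-space until scan catches up with next. */
    List<Rec> collect(Rec[] roots) {
        toSpace = new ArrayList<>();
        for (int i = 0; i < roots.length; i++) roots[i] = forward(roots[i]);
        for (int scan = 0; scan < toSpace.size(); scan++) {
            Rec r = toSpace.get(scan);
            for (int i = 0; i < r.f.length; i++) r.f[i] = forward(r.f[i]);
        }
        return toSpace;  // the live records, in breadth-first order
    }
}
```

The scan needs no recursion and no extra stack, because the region between `scan` and `next` is itself the queue of records whose fields still point into from-space.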
[Figure: Stopping and Copying, a 13-step animation of stop-and-copy on the example heap with roots p, q, r, copying the live records from from-space to to-space and then swapping the two spaces]

Analysis of Stop-and-Copy
- Assume the heap has H words
- Assume that R words are reachable
- The cost of garbage collection is $c_3 R$ (only the reachable records are touched)
- The cost per reclaimed word is $\frac{c_3 R}{H/2 - R}$
- This can be made arbitrarily small by increasing H: it approaches zero as H grows

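For comparison with mark-and-sweep, here is a worked example that keeps R fixed and grows the heap. It assumes H/2 > R, so that the live records fit in one half. The numbers are illustrative and do not come from the original slides:

```latex
\begin{aligned}
H = 4R: &\quad \frac{c_3 R}{2R - R} = c_3 \\
H = 10R: &\quad \frac{c_3 R}{5R - R} = \tfrac{1}{4}c_3 \\
H \to \infty: &\quad \frac{c_3 R}{H/2 - R} \to 0
\end{aligned}
```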
Recognizing Records and Pointers
- Earlier assumptions:
  - we know the start and size of each record in memory
  - we know which record fields are pointers
- For object-oriented languages, each record already contains a pointer to a class descriptor
- For general languages, we must sacrifice a few bytes per record
- For the stack frame:
  - use a bit per stack location, or
  - use a table per program point

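One way to record "which fields are pointers" in a class descriptor is a bitmap in which bit i is set when field i holds a pointer. This Java sketch assumes at most 64 fields per class. ClassDescriptor and its methods are hypothetical names, not the API of any particular VM.

```java
import java.util.ArrayList;
import java.util.List;

final class ClassDescriptor {
    final String name;
    final int fieldCount;
    final long pointerMap;   // bit i is set when field i holds a pointer

    ClassDescriptor(String name, boolean... isPointer) {
        if (isPointer.length > 64) throw new IllegalArgumentException("at most 64 fields in this sketch");
        this.name = name;
        this.fieldCount = isPointer.length;
        long map = 0;
        for (int i = 0; i < isPointer.length; i++) if (isPointer[i]) map |= 1L << i;
        this.pointerMap = map;
    }

    boolean isPointer(int field) { return ((pointerMap >>> field) & 1L) != 0; }

    /** The fields a collector must follow in every record of this class. */
    List<Integer> pointerFields() {
        List<Integer> out = new ArrayList<>();
        for (int i = 0; i < fieldCount; i++) if (isPointer(i)) out.add(i);
        return out;
    }
}
```

A class whose fields are (int, pointer, int, pointer) gets the map 1010 in binary, so the collector follows fields 1 and 3 only. A table per program point for stack frames can use the same bitmap idea, with one bit per stack slot.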
Conservative Garbage Collection
- For mark-and-sweep, we may use a conservative approximation to recognize pointers
- A word is a pointer if it looks like one (its value is an address in the range of the heap space)
- This will recognize too many pointers
- Thus, too many records will be marked as live
- This does not work for stop-and-copy, which must update every pointer to a moved record: a word that merely looks like a pointer may be an integer, which must not be changed

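The "looks like a pointer" test is easy to state in code. This sketch checks the heap range from the slide. It also checks alignment, which is an added assumption: many allocators align records, and the check filters out some false pointers. All names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

final class ConservativeScan {
    /** True if the word could be a record address: inside [heapStart, heapEnd) and aligned. */
    static boolean looksLikePointer(long word, long heapStart, long heapEnd, int alignment) {
        return word >= heapStart && word < heapEnd && (word - heapStart) % alignment == 0;
    }

    /** The stack words a conservative marker would treat as roots. */
    static List<Long> candidateRoots(long[] stackWords, long heapStart, long heapEnd, int alignment) {
        List<Long> out = new ArrayList<>();
        for (long w : stackWords)
            if (looksLikePointer(w, heapStart, heapEnd, alignment)) out.add(w);
        return out;
    }
}
```

The test cannot tell a real pointer 0x1008 from an integer that happens to equal 0x1010. Both are kept as roots, which is exactly why too many records are marked as live.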
Triggering Garbage Collection
- A collection must be triggered when there is no more free heap space
- But this may cause a long pause in the execution
- Collections may be triggered by heuristics:
  - after a certain number of records have been allocated
  - when only a certain fraction of the heap is free
  - after a certain period of time
  - when the program is not busy

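The mandatory trigger and the first two heuristics can be combined into a simple policy. This Java sketch is hypothetical: the class name, parameters and thresholds are illustrative. The time-based and idle-time triggers are omitted.

```java
final class GcTrigger {
    private final int allocationBudget;    // collect after this many allocations...
    private final double minFreeFraction;  // ...or when less than this fraction of the heap is free
    private int allocatedSinceLastGc = 0;

    GcTrigger(int allocationBudget, double minFreeFraction) {
        this.allocationBudget = allocationBudget;
        this.minFreeFraction = minFreeFraction;
    }

    /** Called on each allocation; returns true if a collection should run now. */
    boolean onAllocate(long freeWords, long heapWords) {
        allocatedSinceLastGc++;
        return freeWords == 0                              // mandatory: no free heap space left
            || allocatedSinceLastGc >= allocationBudget     // heuristic: number of allocations
            || freeWords < minFreeFraction * heapWords;     // heuristic: fraction of heap free
    }

    /** Called by the collector after it has run. */
    void collected() { allocatedSinceLastGc = 0; }
}
```

The heuristics fire early, while there is still free space, so the collector can run before the program is forced to wait.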
Generational Collection
- Observation: the young die quickly!
- The collector should focus on young records
- Divide the heap into generations: $G_0, G_1, G_2, \ldots$
- All records in $G_i$ are younger than records in $G_{i+1}$
- Collect $G_0$ often, $G_1$ less often, and so on
- Promote a record from $G_i$ to $G_{i+1}$ when it survives several collections

Collecting a Generation
- How to collect the $G_0$ generation:
  - roots are no longer just stack locations, but also pointers from $G_1, G_2, \ldots$
  - it could be expensive to find those pointers
  - fortunately they are rare, so we can remember them
- Ways to remember pointers:
  - maintain a set of all updated records
  - mark pages of memory that contain updated records (using hardware or software)

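Here is a minimal Java sketch of the "set of updated records" approach. Every pointer store goes through a write barrier. The barrier remembers an older record when it is made to point into $G_0$. A minor collection then uses the stack plus the remembered records as roots and does not trace older generations. Generations are plain integers, promotion is not modelled, and all names are hypothetical.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

final class Generational {
    static final class Rec {
        final Rec[] f;
        final int generation;   // 0 = youngest generation, 1 = next, ...
        Rec(int fields, int generation) { f = new Rec[fields]; this.generation = generation; }
    }

    // Remembered set: older records updated to point into generation 0 (identity-based).
    final Set<Rec> remembered = new HashSet<>();

    /** Write barrier: the program performs every pointer store x.f[i] = y through this method. */
    void store(Rec x, int i, Rec y) {
        x.f[i] = y;
        if (y != null && x.generation > 0 && y.generation == 0) remembered.add(x);
    }

    /** Marks generation 0 only: roots are the stack plus the fields of remembered records. */
    Set<Rec> liveYoung(List<Rec> stackRoots) {
        Deque<Rec> work = new ArrayDeque<>();
        for (Rec r : stackRoots) if (r != null) work.push(r);
        for (Rec old : remembered)
            for (Rec y : old.f) if (y != null) work.push(y);
        Set<Rec> live = new HashSet<>();
        while (!work.isEmpty()) {
            Rec r = work.pop();
            if (r.generation != 0 || !live.add(r)) continue; // older records are not traced
            for (Rec child : r.f) if (child != null) work.push(child);
        }
        return live;
    }
}
```

With an empty stack, a young record is still found live when an old record points to it, because the barrier has remembered the old record. Young-to-young stores never enter the remembered set.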
Incremental Collection
- A garbage collector creates (long) pauses
- This is bad for real-time programs
- An incremental collector interleaves its work with the program, or runs concurrently with it (in a separate thread)
- It must now handle simultaneous heap updates

The Tricoloring Algorithm
- Records are colored black, grey, or white:
  - black: visited, and all children visited
  - grey: visited, but not all children visited
  - white: not visited
- The program may update the heap as it pleases, but must maintain an invariant: no black record points to a white record

Pseudo Code for Tricoloring

    function Tricolor() {
      color all records white;
      color all roots grey;
      while (more grey records) {
        x = a grey record;
        for (i=1; i<=|x|; i++)
          if (x.f_i is white)
            color x.f_i grey;
        color x black;
      }
      reclaim all white records;
    }

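The tricoloring loop can be written in Java with an explicit worklist that holds exactly the grey records. This is a stop-the-world sketch with no concurrent program running, and the names are hypothetical. Only white children are shaded, so no record is processed twice and the loop terminates even on cycles.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

final class Tricolor {
    enum Color { WHITE, GREY, BLACK }

    static final class Rec {
        final Rec[] f;
        Color color = Color.WHITE;
        Rec(int fields) { f = new Rec[fields]; }
    }

    /** Runs the tricolor marking and returns the white (unreachable) records. */
    static List<Rec> collect(List<Rec> heap, List<Rec> roots) {
        for (Rec r : heap) r.color = Color.WHITE;       // color all records white
        Deque<Rec> grey = new ArrayDeque<>();           // the set of grey records
        for (Rec r : roots) shade(r, grey);             // color all roots grey
        while (!grey.isEmpty()) {
            Rec x = grey.pop();                         // x = a grey record
            for (Rec child : x.f) shade(child, grey);   // white children become grey
            x.color = Color.BLACK;                      // x and all its children visited
        }
        List<Rec> white = new ArrayList<>();            // these may now be reclaimed
        for (Rec r : heap) if (r.color == Color.WHITE) white.add(r);
        return white;
    }

    private static void shade(Rec r, Deque<Rec> grey) {
        if (r != null && r.color == Color.WHITE) { r.color = Color.GREY; grey.push(r); }
    }
}
```

A record turns black only after all its children have been shaded, so the invariant (no black record points to a white record) holds after every step of the loop.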
Maintaining the Invariant
- Write barriers: the assignment
      x.f_i = y;
  becomes
      black2grey(x).f_i = y;
- Read barriers: the read
      y = x.f_i;
  becomes
      y = white2grey(x.f_i);
- Both require synchronization between the running program and the collector

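Both barriers can be added to the same color model. This Java sketch is single-threaded: it shows the color changes but none of the synchronization the slide requires. The names `writeBarrier`, `readBarrier` and `drain` are hypothetical. The write barrier turns a black record back to grey when it is updated (black2grey). The read barrier shades a white record grey when a pointer to it is loaded (white2grey).

```java
import java.util.Deque;

final class Barriers {
    enum Color { WHITE, GREY, BLACK }

    static final class Rec {
        final Rec[] f;
        Color color = Color.WHITE;
        Rec(int fields) { f = new Rec[fields]; }
    }

    // white2grey: a white record becomes grey and joins the collector's worklist
    private static void shade(Rec r, Deque<Rec> grey) {
        if (r != null && r.color == Color.WHITE) { r.color = Color.GREY; grey.push(r); }
    }

    /** Write barrier: black2grey(x).f[i] = y, so x will be rescanned. */
    static void writeBarrier(Rec x, int i, Rec y, Deque<Rec> grey) {
        if (x.color == Color.BLACK) { x.color = Color.GREY; grey.push(x); }
        x.f[i] = y;
    }

    /** Read barrier: y = white2grey(x.f[i]). */
    static Rec readBarrier(Rec x, int i, Deque<Rec> grey) {
        Rec y = x.f[i];
        shade(y, grey);
        return y;
    }

    /** Collector step: process grey records until none remain. */
    static void drain(Deque<Rec> grey) {
        while (!grey.isEmpty()) {
            Rec x = grey.pop();
            for (Rec child : x.f) shade(child, grey);
            x.color = Color.BLACK;
        }
    }
}
```

Consider a program that stores a pointer to a white record c into an already black record a and then deletes the only other path to c. The write barrier sends a back to grey, so c is still marked before the white records are reclaimed.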
Garbage Collection in Java
- Sun's HotSpot VM uses by default:
  - two generations: "nursery" and "old objects"
  - the nursery is collected using stop-and-copy
  - the old objects are collected using mark-and-sweep, in a version that also compacts the live records
- For real-time applications:
  - use option -Xincgc
  - a more sophisticated incremental algorithm
  - 10% slower but with shorter pauses

Finalizers
- If an object has a finalize() method, it will be invoked before the object is reclaimed by the garbage collector
- But there is no guarantee how soon this happens
- This method may actually resurrect the object (for example, by storing this in a reachable location)
- Typically, the garbage collector needs an extra pass to find out if the dead really stay dead

Interacting With the Garbage Collector
- Request a collection manually: System.gc(); (this is only a hint, and the JVM is not required to collect)
- The java.lang.ref package allows variations of the pointer concept:
  - SoftReference
  - WeakReference

Soft References
- The garbage collector may reclaim an object that has soft references but no ordinary (strong) references
- This is typically used for caching:

    SoftReference<Image> sr = null;
    ...
    Image img = (sr == null) ? null : sr.get();
    if (img == null) {            // not loaded yet, or reclaimed by the collector
      img = getImage("huge.gif");
      sr = new SoftReference<Image>(img);
    }
    display(img);
    img = null;                   // drop the strong reference; only sr remains

Weak References
- The garbage collector will reclaim an object that has weak references but no strong or soft references
- This is used in java.util.WeakHashMap, where entries are automatically removed when their keys are no longer in ordinary (strong) use
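A common use of WeakHashMap is a side table that attaches data to objects without keeping them alive. This Java sketch uses only the standard java.util.WeakHashMap API, and the Labels class is hypothetical. An entry disappears only after the collector has cleared its key, and when that happens is not specified, so that part cannot be tested deterministically. WeakHashMap compares keys with equals(), so it is intended mainly for keys whose equals() tests object identity.

```java
import java.util.Map;
import java.util.WeakHashMap;

/** Hypothetical side table: labels objects without keeping them reachable. */
final class Labels {
    private final Map<Object, String> table = new WeakHashMap<>();

    void attach(Object key, String label) { table.put(key, label); }

    String lookup(Object key) { return table.get(key); }

    int size() { return table.size(); }  // may shrink on its own after a collection
}
```

While the program still holds the key, the label can be looked up as usual. Once the last strong reference to the key is gone, the entry becomes eligible for removal without any explicit call.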