Copying GC and Reference Counting Jonathan Kalechstain Tel Aviv University 11/11/2014
Outline of talk: Reminder; Copying Garbage Collection Algorithm; Reference Counting Algorithm; Summary
Recap: Last week we went through two algorithms, Mark and Sweep and Mark-Compact. Mark and Sweep suffers from fragmentation; Mark-Compact is very costly.
Mark & Sweep: Traverse the live objects and mark them black. White objects can be reclaimed. [Figure: roots (registers) and the heap]
Mark-Compact: During the run, objects are allocated and reclaimed, and the heap gradually becomes fragmented. When space is too fragmented to allocate, a compaction algorithm is used: move all live objects to the beginning of the heap and update all pointers to reference the new locations. [Figure: the heap]
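As a reminder of how the compaction step works, here is a minimal Python sketch. It assumes the mark phase has already set a hypothetical `marked` flag on each `Cell`, models the heap as a dict from address to object, and builds a new dict instead of moving objects in place, so it only illustrates the forwarding-address and pointer-update logic.

```python
from dataclasses import dataclass, field


@dataclass
class Cell:
    size: int
    marked: bool                                  # set by the (assumed) mark phase
    ptrs: list = field(default_factory=list)      # addresses this object points to


def compact(heap: dict, roots: list) -> tuple:
    """Slide marked objects to the start of the heap (address 0) and fix pointers.

    heap maps address -> Cell; returns (new_heap, new_roots, free).
    """
    # 1. Compute forwarding addresses, visiting objects in address order.
    forward, free = {}, 0
    for addr in sorted(heap):
        if heap[addr].marked:
            forward[addr] = free
            free += heap[addr].size
    # 2. Update every pointer held by the roots.
    new_roots = [forward[r] for r in roots]
    # 3. "Move" each live object, rewriting its pointers to the new locations.
    new_heap = {}
    for addr, cell in heap.items():
        if cell.marked:
            new_heap[forward[addr]] = Cell(cell.size, True, [forward[p] for p in cell.ptrs])
    return new_heap, new_roots, free


heap = {0: Cell(16, False), 16: Cell(8, True, [40]), 24: Cell(16, False), 40: Cell(8, True)}
new_heap, new_roots, free = compact(heap, roots=[16])
print(new_roots, sorted(new_heap), free)   # [0] [0, 8] 16
```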
Copying garbage collection [Figure: the heap split into Part I and Part II; the roots; objects A, B, C, D, E]
The collection copies… [Figure: A and C copied]
Roots are updated; Part I is reclaimed. [Figure: Part I, Part II, the roots, A and C]
Algorithm
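Algorithm (2) [Figure: fromspace and tospace, with the free and top pointers]
Algorithm (3) [Figure: fromspace, tospace, objects A and C, free and top pointers]
Algorithm (3) [Figure: fromspace, tospace, objects A and C, free and top pointers]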
Algorithm (4)
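Algorithm (5) [Figure: two snapshots of fromspace and tospace with objects A and C and the free and top pointers; the two spaces appear in swapped order in the second snapshot]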
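Something about pointers: Pointers(N) is the set of addresses of N's pointer fields, not the values stored in them. Here: Pointers(N) = {1000, 1004}. [Figure: object N; fields holding 2000, 1800 and null]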
Algorithm (6)
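Algorithm (7) Processing a root that points to A (free = 1000 initially; A is 40 bytes): 1. fromRef = &A 2. toRef = FA(fromRef) = null (A has not been copied yet) 3. toRef = free = 1000; free = free + 40 = 1040; FA(fromRef) = 1000; return 1000. [Figure: roots, fld, fromRef, FA]
Algorithm (8) Processing field A.ptr1: 1. fromRef = A.ptr1 2. toRef = FA(fromRef) = null 3. toRef = free = 1040; free = free + 40 = 1080; FA(fromRef) = 1040; return 1040. [Figure: ptr1, FA]
Algorithm (5) [Figure: free = 1080; ptr1]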
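The trace above follows a process/forward/copy structure, and the Python sketch below replays it. It is a model, not a real collector: the names `Obj` and `Collector` are hypothetical, addresses are plain integers, `free` starts at 1000, and every object is 40 bytes as in the trace. The tospace "copy" is the same Python object, so updating `a.fields` stands in for updating A's tospace copy while it is scanned.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass(eq=False)
class Obj:
    """Hypothetical fromspace object: a name, a byte size and named pointer fields."""
    name: str
    size: int = 40                                # the trace assumes 40-byte objects
    fields: dict = field(default_factory=dict)    # field name -> Obj, address or None
    forwarding: Optional[int] = None              # FA(obj): tospace address once copied


class Collector:
    def __init__(self, tospace_start: int):
        self.free = tospace_start                 # next free byte in tospace
        self.tospace = {}                         # tospace address -> object copied there
        self.worklist = []                        # copied (grey) objects still to scan

    def process(self, slots: dict, key) -> None:
        """process(fld): slots[key] plays the role of the field at address fld."""
        from_ref = slots[key]
        if from_ref is not None:
            slots[key] = self.forward(from_ref)

    def forward(self, from_ref: Obj) -> int:
        to_ref = from_ref.forwarding              # toRef = FA(fromRef)
        if to_ref is None:                        # not copied yet
            to_ref = self.copy(from_ref)
        return to_ref

    def copy(self, from_ref: Obj) -> int:
        to_ref = self.free                        # toRef = free
        self.free += from_ref.size                # free = free + 40
        self.tospace[to_ref] = from_ref           # stands in for move(fromRef, toRef)
        from_ref.forwarding = to_ref              # FA(fromRef) = toRef
        self.worklist.append(to_ref)
        return to_ref


# Replay the trace: a root points to A, and A.ptr1 points to B.
b = Obj("B")
a = Obj("A", fields={"ptr1": b})
roots = {"root0": a}
gc = Collector(tospace_start=1000)
gc.process(roots, "root0")      # A copied to 1000; free = 1040
gc.process(a.fields, "ptr1")    # B copied to 1040; free = 1080
print(roots["root0"], a.fields["ptr1"], gc.free)   # 1000 1040 1080
```

Processing a second pointer to A would find a non-null forwarding address and return 1000 without advancing `free`, which is the fact Lemma 1 relies on.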
Definitions: A node copied to tospace is considered grey (it has gone through process()). A node whose pointers have been updated is considered black. Once a node is black, all its children have been copied and are grey (or black if they were already scanned).
Back to Algorithm
Work List Implementation: So far we assumed that nodes are taken from the work list in some order. Implementation possibilities: - Queue - Stack - Scan pointer (Cheney, 1970). As we will see, different implementations can affect performance.
Cheney’s Worklist
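In Cheney's scheme tospace itself is the work list: objects between `scan` and `free` are grey, objects before `scan` are black. A minimal Python sketch under simplifying assumptions: each object occupies one tospace slot (so `scan` and `free` are list indices), and the `Node` class and `cheney_collect` function are hypothetical names.

```python
from typing import Optional


class Node:
    """Hypothetical heap object: a name plus a list of pointer fields."""

    def __init__(self, name: str, *children: "Node"):
        self.name = name
        self.fields: list = list(children)       # Node references (or None)
        self.forward: Optional["Node"] = None    # forwarding address once copied


def cheney_collect(roots: list) -> list:
    """Copy everything reachable from roots into a fresh tospace and return it.

    tospace[:scan] is black, tospace[scan:] (up to free) is grey, and
    fromspace objects that are never copied stay white (garbage).
    """
    tospace: list = []

    def forward(obj: Optional[Node]) -> Optional[Node]:
        if obj is None:
            return None
        if obj.forward is None:                  # white: copy it, it becomes grey
            copy = Node(obj.name)
            copy.fields = list(obj.fields)       # still point into fromspace
            tospace.append(copy)                 # free = len(tospace) advances
            obj.forward = copy
        return obj.forward

    for i, r in enumerate(roots):                # process the roots first
        roots[i] = forward(r)

    scan = 0
    while scan < len(tospace):                   # work list = tospace[scan:free]
        obj = tospace[scan]
        obj.fields = [forward(c) for c in obj.fields]
        scan += 1                                # obj is now black
    return tospace


# A -> B, A -> C, C -> A (a cycle); D is unreachable garbage.
b, c = Node("B"), Node("C")
a = Node("A", b, c)
c.fields.append(a)
d = Node("D", b)
roots = [a]
to = cheney_collect(roots)
print([n.name for n in to])                      # ['A', 'B', 'C']
```

Because copying appends at `free` and scanning advances `scan` in the same order, the traversal is breadth-first and no separate stack or queue is needed.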
Full Running Example: Note the colors! White: not visited. Grey: copied but not yet scanned. Black: scanned (and will never be scanned again).
Full Running Example(2) [Figure: fromspace and tospace] L is a linked list, directly reachable from the roots.
Full Running Example(3) [Figure: fromspace and tospace, with the scan and free pointers] 1. L was copied to tospace as L'. 2. L's forwarding address points to L'. 3. scan points to the start of tospace. 4. L' still points to A and E in fromspace. 5. The scan of L' continues on the next slide…
Full Running Example(4) [Figure: fromspace and tospace, with the scan and free pointers] Flow: 1. Get the next item from remove(worklist) and advance scan. 2. Copy all its children, updating free and the references.
Full Running Example(5) [Figure: the next step of the same flow; scan and free pointers]
Full Running Example(6) [Figure: the next step of the same flow; scan and free pointers]
Full Running Example(7) [Figure: the next step of the same flow; scan and free pointers]
Full Running Example(8) [Figure: the next step of the same flow; scan and free pointers]
Correctness
Lemma 1: free is updated a finite number of times.
Lemma 2: free += c iff at some later phase scan += c (here c stands for an object and its byte size).
Lemma 3: The algorithm terminates.
Lemma 4: All live objects are copied.
Lemma 5: All live objects are scanned.
Theorem 1: All scanned objects are updated correctly at the end of the algorithm.
Correctness(1) Lemma 1: free is updated a finite number of times. - There is a finite number of objects. - free is updated only in copy(). - When an object is copied, its forwarding address is set. - Once an object's forwarding address is not null, copy() is never invoked on it again. Hence each object is copied at most once, so free is updated at most once per object.
Correctness(2)
Correctness(3)
Correctness(4)
Correctness(5) Lemma 4: All live objects are copied. By induction on d, the distance from the roots. Base: d = 1. At the beginning all roots are processed, so every object they reference is copied. Step: assume the claim for some d, and consider an object o at distance d+1. Let s be o's parent, at distance d. By the induction hypothesis s was discovered and copied, and by Lemma 2 it was later scanned. When s was scanned, o was copied (unless it had already been copied). Lemma 5: All live objects are scanned. - Follows immediately from Lemmas 2 and 4.
Correctness(6) Theorem 1: All scanned objects are updated correctly at the end of the algorithm. - When an object is scanned, each of its pointers is updated: either its target is copied now and the pointer is set to the new copy, or the target was already copied and the pointer is set from its forwarding address. - By Lemmas 4 and 5, all live objects are copied and scanned.
Traversal Order & Locality: In the example and algorithm we saw, the traversal order was BFS, and the only extra memory required was the scan pointer. Is BFS better than DFS?
Traversal Order & Locality(2) [Figure: the same objects laid out in BFS order and in DFS order across Page 1, Page 2 and Page 3]
Traversal Order & Locality(3) BFS tends to separate children from their parents; DFS keeps them closer together. This matters because cache misses and page faults are expensive. However, DFS requires a stack and therefore more space. A compromise?
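To make the locality argument concrete, the following Python sketch lays out the same small tree in BFS and DFS order and counts how many parent-child pointers stay on the same page. The tree shape and the page size of 3 objects are arbitrary assumptions chosen for illustration.

```python
from collections import deque

# Hypothetical object graph: a complete binary tree, node -> children.
TREE = {1: [2, 3], 2: [4, 5], 3: [6, 7], 4: [], 5: [], 6: [], 7: []}
PAGE_SIZE = 3   # assumed number of objects per page


def bfs_order(root):
    """Copy order of a queue-based (Cheney-style) collector."""
    order, seen, queue = [], {root}, deque([root])
    while queue:
        n = queue.popleft()
        order.append(n)
        for c in TREE[n]:
            if c not in seen:
                seen.add(c)
                queue.append(c)
    return order


def dfs_order(root):
    """Copy order of a stack-based collector (needs an explicit stack)."""
    order, seen, stack = [], set(), [root]
    while stack:
        n = stack.pop()
        if n in seen:
            continue
        seen.add(n)
        order.append(n)
        stack.extend(reversed(TREE[n]))   # visit the leftmost child first
    return order


def same_page_edges(order):
    """Count parent -> child pointers whose two ends land on the same page."""
    page = {n: i // PAGE_SIZE for i, n in enumerate(order)}
    return sum(page[p] == page[c] for p in TREE for c in TREE[p])


print(same_page_edges(bfs_order(1)), same_page_edges(dfs_order(1)))   # 2 3
```

On this tree DFS keeps 3 of the 6 parent-child pointers within a page while BFS keeps only 2; DFS pays for it with the explicit stack.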
Pros/cons of copying GC: Pros: easy allocation; avoids external fragmentation; can reorder objects to decrease future page faults and cache misses; easy to implement. Cons: uses only half of the heap at a time, so it requires more collections for the same heap size.
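"Easy allocation" means that allocating is just a pointer bump inside tospace. A hypothetical Python model (the class name and byte sizes are assumptions) that also shows the half-heap cost: only `heap_bytes // 2` bytes are usable before a collection and flip are needed.

```python
class SemispaceHeap:
    """Hypothetical semispace heap: allocation bumps 'free' inside tospace."""

    def __init__(self, heap_bytes: int):
        self.semispace = heap_bytes // 2   # only half of the heap is usable at once
        self.tospace_start = 0
        self.free = 0                      # next free byte in tospace
        self.top = self.semispace          # end of tospace

    def allocate(self, size: int):
        """Return the new object's address, or None if a collection is needed."""
        if self.free + size > self.top:
            return None                    # tospace is full: collect (and flip)
        addr = self.free
        self.free += size                  # bump the pointer: no free lists
        return addr

    def flip(self):
        """Swap fromspace and tospace, as at the start of a copying collection."""
        self.tospace_start = self.semispace - self.tospace_start
        self.free = self.tospace_start
        self.top = self.tospace_start + self.semispace


heap = SemispaceHeap(1024)
first = heap.allocate(100)       # 0
second = heap.allocate(100)      # 100
too_big = heap.allocate(400)     # None: 200 + 400 > 512
heap.flip()                      # (a real collector would now copy live objects)
after_flip = heap.allocate(40)   # 512
```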
Reference counting: Recall that we would like to know whether an object is reachable from the roots. Associate a reference count field with each object: how many pointers reference this object. When nothing points to an object, it can be deleted. Very simple, and used in many systems.
Basic Reference Counting: Each object has an RC field; new objects get o.RC := 1. When a pointer p that points to o1 is modified to point to o2, we execute o1.RC-- and o2.RC++. If then o1.RC == 0: - Delete o1. - Decrement o.RC for every child o of o1. - Recursively delete objects whose RC is decremented to 0. [Figure: pointer p, objects o1 and o2]
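The basic algorithm can be sketched in Python as follows. `RCObject`, `write` and `delete_reference` are hypothetical names, and the model is single-threaded, so it ignores the atomicity issue. One deliberate deviation from the order on the slide: the sketch increments o2 before decrementing o1, because when o1 and o2 are the same object with RC = 1, decrementing first would free an object that is still referenced.

```python
class RCObject:
    """Hypothetical heap object with a reference count and named pointer fields."""

    def __init__(self, name: str, heap: set):
        self.name = name
        self.rc = 1                     # new objects get o.RC := 1
        self.fields = {}                # field name -> RCObject or None
        self.heap = heap
        heap.add(self)


def delete_reference(obj) -> None:
    """o.RC--; at zero, delete o and recursively decrement its children."""
    if obj is None:
        return
    obj.rc -= 1
    if obj.rc == 0:
        for child in obj.fields.values():
            delete_reference(child)
        obj.heap.discard(obj)           # free(o)


def write(src: RCObject, fld: str, new) -> None:
    """Make src.fld point to new (o2) instead of its old target (o1)."""
    old = src.fields.get(fld)
    if new is not None:
        new.rc += 1                     # o2.RC++ before o1.RC--: if o1 is o2,
    src.fields[fld] = new               # decrementing first could free it
    delete_reference(old)               # o1.RC--, maybe freeing o1's subgraph


heap = set()
root = RCObject("root", heap)                             # held by an (unmodelled) root
root.fields["p"] = RCObject("o1", heap)                   # RC 1: held by root.p
root.fields["p"].fields["c"] = RCObject("child", heap)    # RC 1: held by o1.c
root.fields["q"] = RCObject("o2", heap)                   # RC 1: held by root.q
write(root, "p", root.fields["q"])                        # root.p: o1 -> o2
print(sorted(o.name for o in heap))                       # ['o2', 'root']
```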
Reference counting: The algorithm is direct. Updating reference counts is part of the mutator's responsibility. Because there can be many mutators, the write function must be atomic.
Easy Reference Counting
Example [Figure: objects L (RC=1), A (RC=1), B (RC=1), C (RC=2), D (RC=1), each with fields ptrA and ptrB]
Example: L.ptrA = D [Figure: L (RC=1), A (RC=0), B (RC=0), C (RC=1), D (RC=2)]
Example: L.ptrB = null [Figure: L (RC=1), C (RC=0), D (RC=1)]
Example [Figure: L (RC=1), D (RC=1)]
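The example can be replayed with a few lines of Python. The slide images are not available, so the pointer edges below are inferred from the reference counts shown (L.ptrA -> A, L.ptrB -> C, A -> B and C, C -> D); which of A's fields points to B rather than C is an assumption. The dictionaries `rc` and `fields` are hypothetical stand-ins for the heap.

```python
# Edges inferred from the RC values on the slides (an assumption).
rc = {"L": 1, "A": 1, "B": 1, "C": 2, "D": 1}
fields = {
    "L": {"ptrA": "A", "ptrB": "C"},
    "A": {"ptrA": "B", "ptrB": "C"},
    "B": {"ptrA": None, "ptrB": None},
    "C": {"ptrA": "D", "ptrB": None},
    "D": {"ptrA": None, "ptrB": None},
}


def delete_reference(name):
    """Decrement name's RC; at zero, free it and recursively release its children."""
    if name is None:
        return
    rc[name] -= 1
    if rc[name] == 0:
        for child in fields[name].values():
            delete_reference(child)
        del fields[name]                 # free: only live objects keep fields


def write(src, fld, new):
    """src.fld = new, with the counter updates of basic reference counting."""
    if new is not None:
        rc[new] += 1
    old = fields[src][fld]
    fields[src][fld] = new
    delete_reference(old)


write("L", "ptrA", "D")
after_ptrA = dict(rc)        # {'L': 1, 'A': 0, 'B': 0, 'C': 1, 'D': 2}
write("L", "ptrB", None)
after_ptrB = dict(rc)        # {'L': 1, 'A': 0, 'B': 0, 'C': 0, 'D': 1}
print(sorted(fields))        # ['D', 'L']: the objects left on the final slide
```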
Advantages: Memory management costs are distributed throughout the computation. An object can be recycled as soon as it becomes garbage. Operates well on an almost full heap (immediate recycling). Doesn't need to know the roots or the layout of the program.
Disadvantages (of the last algorithm): Big overhead. - Every pointer read/write requires updating counters. - Even iterating through a list requires continual updates to the objects' counters. Read-only operations require stores to memory (counter updates), which pollute the cache and induce extra memory traffic. No handling of cycles.
Conclusion: The naive algorithm is too weak: its synchronization and per-operation overhead are too costly. We need another approach. Postpone cleanup to a specific time and freeze the mutators during that phase?
Deferred Reference Counting: Postpone some of the updates to a “stop the world” phase. This saves many update and synchronization actions, but we need to pay attention to correctness. Updates for pointers from the roots (stacks, registers, etc.) are postponed to a later phase. Keep a zero count table (ZCT) holding all objects that have no references from the heap.
[Figure: heap and stack] 1. A.ptr = B (heap pointer) => B.rc++ immediately. 2. stack.someptr = B (root pointer) => B.rc++ later.
Algorithm
Algorithm (2)
Algorithm (3)
atomic collect():
    for each fld in Roots              /* mark stacks */
        addReference(*fld)
    sweepZCT()
    for each fld in Roots              /* unmark stacks */
        deleteReferenceToZCT(*fld)

sweepZCT():
    while not isEmpty(zct)
        ref <- remove(zct)
        if rc(ref) = 0
            for each fld in Pointers(ref)
                deleteReference(*fld)
            free(ref)
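A single-threaded Python sketch of deferred reference counting with a zero count table, following the structure of collect() and sweepZCT() above. Assumptions: the class `DeferredRCHeap` is hypothetical, objects are named by strings, each has a fixed list of pointer fields, and `roots` is a plain list whose stores are never counted; only `collect()` counts them temporarily.

```python
class DeferredRCHeap:
    """Hypothetical heap where rc counts only pointers stored in heap objects."""

    def __init__(self):
        self.rc = {}          # object name -> number of references from the heap
        self.fields = {}      # object name -> list of pointer fields (names or None)
        self.zct = set()      # zero count table: rc == 0, but roots may still refer
        self.roots = []       # stack/register slots: stores here are never counted

    def new(self, name, nfields=2):
        self.rc[name] = 0
        self.fields[name] = [None] * nfields
        self.zct.add(name)    # no heap references yet
        return name

    def write(self, src, i, ref):
        """Heap store src[i] = ref, counted immediately (case 1 on the slide)."""
        if ref is not None:
            self.rc[ref] += 1                 # addReference(ref)
        old = self.fields[src][i]
        self.fields[src][i] = ref
        if old is not None:                   # deleteReferenceToZCT(old)
            self.rc[old] -= 1
            if self.rc[old] == 0:
                self.zct.add(old)

    def _delete_reference(self, ref):
        if ref is not None:
            self.rc[ref] -= 1
            if self.rc[ref] == 0:
                self._free(ref)

    def _free(self, ref):
        for child in self.fields[ref]:        # recursively release children
            self._delete_reference(child)
        del self.rc[ref], self.fields[ref]
        self.zct.discard(ref)

    def collect(self):
        """Stop-the-world phase: count roots, sweep the ZCT, uncount roots."""
        for ref in self.roots:                # mark the stacks
            if ref is not None:
                self.rc[ref] += 1
        while self.zct:                       # sweepZCT()
            ref = self.zct.pop()
            if self.rc[ref] == 0:
                self._free(ref)
        for ref in self.roots:                # unmark the stacks
            if ref is not None:
                self.rc[ref] -= 1
                if self.rc[ref] == 0:
                    self.zct.add(ref)


h = DeferredRCHeap()
a, b, c = h.new("A"), h.new("B"), h.new("C")
h.roots.append(a)     # root store: no rc update (case 2 on the slide)
h.write(a, 0, b)      # heap store: rc["B"] becomes 1 immediately (case 1)
h.collect()           # C has rc 0 and no root refers to it, so it is freed
print(sorted(h.rc))   # ['A', 'B']
```

A is referenced only from a root, so after `collect()` it is back in the ZCT with rc 0; it is freed only by a later collection in which no root refers to it.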
Cyclic Reference Counting: Until now we ignored cycles. In a cycle, the RC of every node is greater than zero. We will give an overview of “The Recycler” [Bacon 2001; Bacon and Rajan 2001; Pa, 2007].
[Figure: a cycle C whose nodes have RC = 1, and the rest of the heap, Heap\C] Even if the cycle is disconnected from the rest of the heap, the RC algorithm cannot tell that it is garbage.
Observations: 1. Garbage cycles can arise only from a pointer deletion that leaves a reference count greater than zero. 2. In any garbage structure, all reference counts must be due to pointers between objects within the structure.
A is considered a candidate. [Figure: the roots and nodes with RC = 2, RC = 1, RC = 1]
A is considered a candidate and colored purple. [Figure: the roots and nodes with RC = 2, RC = 1, RC = 1]
Colors for algorithm: 1. Black: an object that has been freed or that is alive. 2. White: garbage to be freed. 3. Purple: a root candidate of a garbage cycle. 4. Grey: a possible member of a garbage cycle.
The Big Picture: 1. Identify possible roots of garbage cycles (purple nodes). 2. Decrement the reference counts due to internal pointers and color all visited nodes grey. 3. If the reference count of an object is greater than zero, color black all of its fanout (restoring their counts). 4. Mark all other nodes white. 5. Free all white nodes.
collect() is called… [Figure: the roots and nodes with RC = 1, RC = 0, RC = 0]
[Figure: the roots and nodes with RC = 1, RC = 0, RC = 1]
[Figure: the roots and nodes with RC = 1, RC = 1]
Algorithm
Algorithm (2)
Algorithm (3)
atomic collect():
    markCandidates()
    for each ref in candidates
        scan(ref)
    collectCandidates()

markCandidates():
    for ref in candidates
        if color(ref) = purple
            markGrey(ref)
        else
            remove(candidates, ref)
            if color(ref) = black && rc(ref) = 0
                free(ref)
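The trial-deletion steps can be sketched in Python. This is a synchronous, single-threaded model and does not capture the concurrency aspects of the Recycler. Assumptions: `Obj`, `Recycler` and `link` are hypothetical names, freeing an object just removes it from the `live` set, and objects R and Y are given RC = 1 to stand for references held by roots. The method names follow markCandidates(), markGrey(), scan(), scanBlack(), collectCandidates() and collectWhite() from the slides.

```python
BLACK, GREY, WHITE, PURPLE = "black", "grey", "white", "purple"


class Obj:
    def __init__(self, name):
        self.name, self.rc, self.color = name, 0, BLACK
        self.fields = []            # outgoing pointers (Obj instances)
        self.buffered = False       # True while in the candidates buffer


class Recycler:
    def __init__(self):
        self.live, self.candidates = set(), []

    def link(self, src, dst):       # src.fld = dst  =>  addReference(dst)
        src.fields.append(dst)
        dst.rc += 1
        dst.color = BLACK

    def delete_reference(self, o):
        o.rc -= 1
        if o.rc == 0:               # plain RC: release o and its children
            for c in o.fields:
                self.delete_reference(c)
            o.color = BLACK
            if not o.buffered:
                self.live.discard(o)
        elif o.color != PURPLE:     # RC > 0 after a deletion: candidate root
            o.color = PURPLE
            if not o.buffered:
                o.buffered = True
                self.candidates.append(o)

    def collect(self):
        self.mark_candidates()
        for o in self.candidates:
            self.scan(o)
        self.collect_candidates()

    def mark_candidates(self):
        for o in list(self.candidates):
            if o.color == PURPLE:
                self.mark_grey(o)
            else:
                self.candidates.remove(o)
                o.buffered = False
                if o.color == BLACK and o.rc == 0:
                    self.live.discard(o)

    def mark_grey(self, o):         # trial-delete the internal pointers
        if o.color != GREY:
            o.color = GREY
            for c in o.fields:
                c.rc -= 1
                self.mark_grey(c)

    def scan(self, o):
        if o.color == GREY:
            if o.rc > 0:
                self.scan_black(o)  # still referenced from outside
            else:
                o.color = WHITE
                for c in o.fields:
                    self.scan(c)

    def scan_black(self, o):        # undo the trial deletion for o's fanout
        o.color = BLACK
        for c in o.fields:
            c.rc += 1
            if c.color != BLACK:
                self.scan_black(c)

    def collect_candidates(self):
        while self.candidates:
            o = self.candidates.pop()
            o.buffered = False
            self.collect_white(o)

    def collect_white(self, o):
        if o.color == WHITE and not o.buffered:
            o.color = BLACK
            for c in o.fields:
                self.collect_white(c)
            self.live.discard(o)    # free(o)


rec = Recycler()
r, y, a, b, c, d = (Obj(n) for n in "RYABCD")
rec.live.update({r, y, a, b, c, d})
r.rc = y.rc = 1                     # R and Y are held by (unmodelled) roots
for src, dst in [(r, a), (a, b), (b, a), (y, c), (c, d), (d, c), (r, c)]:
    rec.link(src, dst)
for dst in (a, c):                  # drop R -> A and R -> C
    r.fields.remove(dst)
    rec.delete_reference(dst)       # A and C become purple candidates
rec.collect()
print(sorted(o.name for o in rec.live))   # ['C', 'D', 'R', 'Y']
```

Here the A-B cycle is garbage and is freed, while the C-D cycle survives: scan() finds that C still has a positive count (the reference from Y), and scanBlack() restores the counts that markGrey() trial-deleted.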
Algorithm (4)
Algorithm (5)
Running Example [Figure: nodes with RC = 2, 1, 2, 1, 2]
Running Example: deleteReference(ref to D) [Figure: nodes with RC = 1, 2, 1, 2]
Running Example: markCandidates() calls markGrey(&D) [Figure: nodes with RC = 1, 0, 2, 1]
Running Example: markGrey(&D) calls markGrey(&A) and markGrey(&E) [Figure: nodes with RC = 0, 1, 0, 1]
Running Example: every reachable node is grey [Figure: nodes with RC = 0, 1, 0]
Running Example: scan(&D) [Figure: nodes with RC = 0, 1, 0]
Running Example: scan(&A), scan(&E) [Figure: nodes with RC = 0, 1, 0]
Running Example: scan(&C) [Figure: nodes with RC = 0, 1, 0]
Running Example: scanBlack(&B) [Figure: nodes with RC = 0, 1, 0, 1]
Running Example: scanBlack(&B) performs RC(E)++ [Figure: nodes with RC = 0, 1, 0, 1]
Running Example: scanBlack(&E) performs RC(C)++ [Figure: nodes with RC = 0, 1]
Running Example: scanBlack(&C) [Figure: nodes with RC = 0, 1]
Running Example: collectCandidates() calls collectWhite(&D) [Figure: nodes with RC = 0, 1]
Running Example: collectWhite(&A) [Figure: nodes with RC = 0, 1]
Running Example: free(&A), free(&D) [Figure: remaining node with RC = 1]
Summary: We have explored two basic algorithms and some variations of them. 1. Copying GC: explored the algorithm, proved its correctness and discussed data structures and scanning order. 2. Reference Counting: explored the basic algorithm, the deferred reference counting algorithm and a cycle removal algorithm.
Questions ?