© Richard Jones, 1999BCS Advanced Programming SG: Garbage Collection 14 October Garbage Collection Richard Jones Computing Laboratory University of Kent at Canterbury BCS Advanced Programming SG Thursday 14 October 1999 ©Richard Jones, All rights reserved.
© Richard Jones, 1999BCS Advanced Programming SG: Garbage Collection 14 October Why garbage collect? Language requirement many languages assume GC, e.g. allocated objects may survive much longer than the method that created them Problem requirement the nature of the problem may make it very hard/impossible to determine when something is garbage Abstraction and Modularity Not a silver bullet some memory management problems cannot be solved using automatic GC, e.g. if you forget to drop references to objects that you no longer need.
© Richard Jones, 1999BCS Advanced Programming SG: Garbage Collection 14 October What is garbage? Almost all garbage collectors assume the following definition of live objects called liveness by reachability: if you can get to an object, then it is live. More formally: An object is live if and only if: it is referenced in a predefined variable called a root, or it is referenced in a variable contained in a live object. Garage example: I need my dinghy, & so I need its sails, & so I need their bag. Roots include words in the static area, registers and words on the execution stack that point into the heap.
© Richard Jones, 1999BCS Advanced Programming SG: Garbage Collection 14 October The basic algorithms Think of clearing out your garage: Reference counting: Keep a note on each object in your garage, indicating the number of live references to the object. If an object’s reference count goes to zero, throw the object out (it’s dead). Mark-Sweep: Put a note on objects you need (roots). Then recursively put a note on anything needed by a live object. Afterwards, check all objects and throw out objects without notes. Mark-Compact: Put notes on objects you need (as above). Move anything with a note on it to the back of the garage. Burn everything at the front of the garage (it’s all dead). Copying: Move objects you need to a new garage. Then recursively move anything needed by an object in the new garage. Afterwards, burn down the old garage (any objects in it are dead)!
© Richard Jones, 1999BCS Advanced Programming SG: Garbage Collection 14 October Your choice! Basic algorithms Improving mark-sweep GC Generational GC Incremental GC Conservative GC Distributed GC
© Richard Jones, 1999BCS Advanced Programming SG: Garbage Collection 14 October Reference counting The simplest form of garbage collection is reference counting. Basic idea: count the number of references from live objects. Each object has a reference count (RC) when a reference is copied, the referent’s RC is incremented when a reference is deleted, the referent’s RC is decremented an object can be reclaimed when its RC = 0 Common implementation: smart pointers
© Richard Jones, 1999BCS Advanced Programming SG: Garbage Collection 14 October Advantages of reference counting Simple to implement Costs distributed throughout program Good locality of reference: only touch old and new targets' RCs Works well because few objects are shared and many are short-lived — space can be reused Minimal zombie time from when an object becomes garbage until it is collected Immediate finalisation is possible (due to near zero zombie time)
© Richard Jones, 1999BCS Advanced Programming SG: Garbage Collection 14 October Disadvantages of reference counting Not comprehensive (does not collect all garbage): cannot reclaim cyclic data structures High cost of manipulating RCs: cost is ever-present even if no garbage is collected Bad for concurrency: RC manipulations must be atomic Tightly coupled interface to mutator. smart pointers raw pointers High space overheads for some applications Recursive freeing cascade is only bounded by heap size
© Richard Jones, 1999BCS Advanced Programming SG: Garbage Collection 14 October Mark-Sweep Mark-sweep is a tracing algorithm — it follows (traces) references from live objects to find other live objects. Implementation: Each object has a mark-bit associated with it. There are two phases: Mark phase: starting from the roots, the graph is traced and the mark-bit is set in each unmarked object encountered. At the end of the mark phase, unmarked objects are garbage. Sweep phase: starting from the bottom, the heap is swept –mark-bit not set:the object is reclaimed –mark-bit set:the mark-bit is cleared
© Richard Jones, 1999BCS Advanced Programming SG: Garbage Collection 14 October Comprehensive: cyclic garbage collected naturally No run-time overhead on pointer manipulations Loosely coupled to mutator Does not move objects does not break any mutator invariants optimiser-friendly requires only one reference to each live object to be discovered (rather than having to find every reference) Advantages of mark-sweep
© Richard Jones, 1999BCS Advanced Programming SG: Garbage Collection 14 October Disadvantages of mark-sweep Stop/start nature leads to disruptive pauses and long zombie times. Complexity is O(heap) rather than O(live) every live object is visited in mark phase every object, alive or dead, is visited in sweep phase Degrades with residency (heap occupancy) the collector needs headroom in the heap to avoid thrashing Fragmentation and mark-stack overflow are issues Tracing collectors must be able to find roots (unlike reference counting)
© Richard Jones, 1999BCS Advanced Programming SG: Garbage Collection 14 October Cheney copying GC Divide heap into 2 halves (semi-spaces) named Fromspace and Tospace Allocate objects in Tospace When Tospace is full flip the roles of the semi-spaces pick out all live data in Fromspace and copy them to Tospace preserve sharing by leaving a forwarding address in the Fromspace replica use Tospace objects as a work queue
© Richard Jones, 1999BCS Advanced Programming SG: Garbage Collection 14 October copy root and update pointer, leaving forwarding address scan A' copy B and C, leaving forwarding addresses scan B' copy D and E, leaving forwarding addresses scan C' copy F and G, leaving forwarding addresses scan D' and E' nothing to do scan F' use A's forwarding address scan G' nothing to do scan=free so collection is complete Copying GC Example
© Richard Jones, 1999BCS Advanced Programming SG: Garbage Collection 14 October Advantages of copying GC Compaction for free Allocation is very cheap for all object sizes out-of-space check is pointer comparison simply increment free pointer to allocate Only live data is processed (commonly a small fraction of the heap) Fixed space overheads free and scan pointers forwarding addresses can be written over user data Comprehensive: cyclic garbage collected naturally Simple to implement a reasonably efficient copying GC
© Richard Jones, 1999BCS Advanced Programming SG: Garbage Collection 14 October Disadvantages of copying GC Stop-and-copy may be disruptive Degrades with residency Requires twice the address space of other simple collectors touch twice as many pages trade-off against fragmentation Cost of copying large objects Long-lived data may be repeatedly copied All references must be updated Moving objects may break mutator invariants Breadth-first copying may disturb locality patterns
© Richard Jones, 1999BCS Advanced Programming SG: Garbage Collection 14 October Mark-compact collection Mark-compact collectors make at least two passes over the heap after marking to relocate objects to update references (not necessarily in this order) Issues how many passes? compaction style –sliding: preserve the original order of objects –linearising: objects that reference each other are placed adjacently (as far as possible) –arbitrary: objects moved without regard for original order or referential locality
© Richard Jones, 1999BCS Advanced Programming SG: Garbage Collection 14 October Complexity: caveat emptor Claim: “Copying GC is always better than mark-sweep GC” Simple mark-sweep GC is O(heap-size) every live object is visited in mark phase every object, alive or dead, is visited in sweep phase Simple copying GC is O(live) only live objects are copied But... Copying is more expensive than setting a bit Efficient implementations of mark-sweep are dominated by cost of mark phase –linear scanning less expensive than tracing –cost of sweep can be reduced further Simple asymptotic analyses are misleading
© Richard Jones, 1999BCS Advanced Programming SG: Garbage Collection 14 October GC Metrics Execution time total execution time distribution of GC execution time time to allocate a new object Delay time length of disruptive pauses zombie times Memory usage additional memory overhead fragmentation virtual memory and cache performance Other important metrics comprehensiveness implementation simplicity and robustness
© Richard Jones, 1999BCS Advanced Programming SG: Garbage Collection 14 October Your choice! Basic algorithms Improving mark-sweep GC Generational GC Incremental GC Conservative GC Distributed GC
© Richard Jones, 1999BCS Advanced Programming SG: Garbage Collection 14 October Improving Mark-sweep GC Problem: mark-sweep collectors touch all memory both dead and alive — not good in a VM environment. Mark-sweep actions mark phase: mark-bits of all live objects modified sweep phase: all objects — live or dead — modified Reducing the cost of the sweep want to avoid sweep's O(heap) cost manipulate mark-bits more efficiently want to minimise paging
© Richard Jones, 1999BCS Advanced Programming SG: Garbage Collection 14 October Bitmap marking Store mark-bits in a separate bitmap table if smallest object is two 32-bit words, bitmap is 1.5% of heap A dvantages of bitmaps mark-bits held in RAM can be read or written without page faults no object/page is dirtied during marking –no page need be written back to swap disk live objects need not be touched during sweep –atomic objects need never be touched at all objects live and die in clusters, so test 32 bits at a time increased safety: less chance of program messing with bits Access to mark-bits is more expensive
© Richard Jones, 1999BCS Advanced Programming SG: Garbage Collection 14 October Lazy sweeping Reduce pauses by sweeping in parallel with mutator sweeper only modifies mark-bits and (maybe) garbage both are invisible to mutator thus sweeper and mutator cannot interfere Do a small amount of sweeping at each allocation sweep until sufficient free memory is found save reclaimed objects in a free-list or fixed-size vector Lazy sweeping and bitmaps deal with every bit in a bitmap word at the same time. since objects live and die in clusters, bitmaps of clusters can be test and cleared in groups of 32
© Richard Jones, 1999BCS Advanced Programming SG: Garbage Collection 14 October block headers blocks
© Richard Jones, 1999BCS Advanced Programming SG: Garbage Collection 14 October Your choice! Basic algorithms Improving mark-sweep GC Generational GC Incremental GC Conservative GC Distributed GC
© Richard Jones, 1999BCS Advanced Programming SG: Garbage Collection 14 October The generational hypothesis Weak generational hypothesis “Most objects die young” [Ungar, 1984] It is common for 80-95% objects to die before a further megabyte has been allocated 50-90% of CL and 75-95% of Haskell objects die before they are 10kb old SML/NJ reclaims 98% of any generation at each collection Only 1% Cedar objects survived beyond 721kb of allocation 95% of Java objects are ‘short-lived’ Strong generational hypothesis “The older an object is, the more likely it is to die?” does not appear to hold
© Richard Jones, 1999BCS Advanced Programming SG: Garbage Collection 14 October Generational GC Strategy: Segregate objects by age into generations Collect different generations at different frequencies Concentrate on the nursery generation By concentrating on a small part of the heap pause times can be reduced overall effort can also be reduced
© Richard Jones, 1999BCS Advanced Programming SG: Garbage Collection 14 October Example Update with pointer to a; request new object allocation request fails; perform minor collection further updates, allocation, etc... where are the roots for the new generation?
© Richard Jones, 1999BCS Advanced Programming SG: Garbage Collection 14 October Issues raised by generational garbage collection Old-young pointers give rise to roots for the young generation: how can these roots be discovered? Tenured garbage in older generations cannot be reclaimed by minor collections: how can the volume of older garbage be minimised? When should surviving objects be promoted to the next generation? How do we record ages? How should generations be organised?
© Richard Jones, 1999BCS Advanced Programming SG: Garbage Collection 14 October Problem: Intergenerational pointers We can collect the young generation on its own (minor collection) Old-young pointers give rise to roots for the young generation such pointers are comparatively rare they arise from destructive pointer writes these assignments can be trapped with a write barrier implementations: remembered sets, card tables Young-old pointers are common but not a problem if we collect younger generation whenever we collect an older one (major collection)
© Richard Jones, 1999BCS Advanced Programming SG: Garbage Collection 14 October Problem: Tenured garbage Garbage in older generations cannot be reclaimed by minor collections. Tenured garbage also causes the retention of young objects it references (nepotism) Pause time depends on size of the youngest generation how large should this be? how early should object be promoted to next generation? Too large? pause time will be too long Too small? young objects do not have sufficient time to die older generations will fill too fast
© Richard Jones, 1999BCS Advanced Programming SG: Garbage Collection 14 October Solutions Multiple generations allows youngest generation to be kept small allows older data more time to die Controlling promotion rate how many collections before promotion? GC when youngest generation is full or when allocation threshold is reached? adaptive techniques Techniques for age recording creation and aging spaces, buckets Other organisations of heap regions large object area mature object space
© Richard Jones, 1999BCS Advanced Programming SG: Garbage Collection 14 October Train algorithm The ‘Train algorithm’ can collect a region incrementally Divide this mature object space into cars each car has a remembered set collect one car at a time But data structures may span cars so group cars into trains idea is to accumulate linked data structure into a single train To collect a From-car move objects referenced from outside MOS into a fresh train copy their descendents into this train copy objects referenced from other trains to those trains
© Richard Jones, 1999BCS Advanced Programming SG: Garbage Collection 14 October reference to A is external so copy A and its descendents in car 1 to a new car move object P referenced from other trains to that train; move object X referenced from car in this train to the last car in this train. Example
© Richard Jones, 1999BCS Advanced Programming SG: Garbage Collection 14 October Benefits and costs Benefits once a structure is held in a single train, it can be reclaimed if there are no external references to it volume of data copied is bounded at each collection: one car-full objects are clustered and compacted as they are copied Costs complex space required by remembered sets is large
© Richard Jones, 1999BCS Advanced Programming SG: Garbage Collection 14 October Generational GC: a summary Highly successful for a range of applications reduces pause time to a level acceptable for interactive applications improves paging and cache behaviour reduces the overall cost of garbage collection requires a low survival rate, infrequent major collections, low overall cost of write barrier But generational GC is not a universal panacea. It attempts to improve expected pause time at expense of the worst case objects may not die sufficiently fast applications may thrash the write barrier too many old-young pointers, or very deep execution stacks, may increase pause times
© Richard Jones, 1999BCS Advanced Programming SG: Garbage Collection 14 October Your choice! Basic algorithms Improving mark-sweep GC Generational GC Incremental GC Conservative GC Distributed GC
© Richard Jones, 1999BCS Advanced Programming SG: Garbage Collection 14 October Incremental/concurrent garbage collection runs collector in parallel with mutator attempts to bound pause time many soft real-time solutions but no general hard real-time solutions yet Sequential GC can be made incremental by interleaving collection with allocation. Alternatively, run GC concurrently in a separate thread. at each allocation, do a small amount of GC work. tune the rate of collection to the rate of allocation to prevent mutator running out of memory before collection is complete
© Richard Jones, 1999BCS Advanced Programming SG: Garbage Collection 14 October Asynchronous execution of mutator and collector introduces a coherency problem. For example, Synchronisation Update(right(B), right(A)) right(A) = nil Update(right(A), right(B)) right(B) = nil
© Richard Jones, 1999BCS Advanced Programming SG: Garbage Collection 14 October Tricolour abstraction Black object and its immediate descendants have been visited GC has finished with black objects and need not visit again. Grey object has been visited but its components may not have been scanned. or, for an incremental/concurrent GC, the mutator has rearranged connectivity of the graph. in either case, the collector must visit them again. White object is unvisited and, at the end of the phase, garbage. A collection terminates when no grey objects remain, i.e. all live objects have been blackened.
© Richard Jones, 1999BCS Advanced Programming SG: Garbage Collection 14 October Two ways to prevent disruption There are two ways to prevent the mutator from interfering with a collection by writing white pointers into black objects. 1)Ensure the mutator never sees a white object when mutator attempts to access a white object, the object is visited by the collector protect white objects with a read-barrier 2)Record where mutator writes black-white pointers, so that the GC can (re)visit objects protect objects with a write-barrier
© Richard Jones, 1999BCS Advanced Programming SG: Garbage Collection 14 October Write-barrier methods To falsely reclaim an object, two conditions must hold: a pointer to the white object is written into a black object and furthermore, this must be the only reference to the white object the original reference to the white object is destroyed If does not hold, there will be at least one path to each reachable white object that passes through a grey object. If does not hold, the white object will still be reachable through the original reference. Write barrier methods incremental update methods catch changes to connectivity snapshot-at-the-beginning methods prevent the loss of the original reference
© Richard Jones, 1999BCS Advanced Programming SG: Garbage Collection 14 October Issues Barriers lead to floating garbage. How conservative is an algorithm? how much garbage is left floating in the heap until the next collection cycle? what is the policy towards new (and often short-lived) objects? what colour are they allocated? How expensive is the barrier? How is initialisation achieved? How is termination achieved? In particular how are large root sets handled? how are threads synchronised?
© Richard Jones, 1999BCS Advanced Programming SG: Garbage Collection 14 October Best known method was introduced by Dijkstra et al Update (A,C) { *A = C shade(C) } shade(P) { if white(P) colour(P) = grey } The barrier traps attempts to install a pointer to a white object into a black object incrementally records changes to the shape of the graph prevents condition arising but no special action is required when a pointer is deleted Incremental update methods
© Richard Jones, 1999BCS Advanced Programming SG: Garbage Collection 14 October Snapshot at the beginning Update(A, C) { shade(*A) *A = C } The barrier remembers old references prevents condition arising More conservative than IU Simpler termination than IU
© Richard Jones, 1999BCS Advanced Programming SG: Garbage Collection 14 October Read barrier methods Idea: don't let the mutator see white objects so it cannot disrupt the collector Best known is Baker's copying collector. During collection each mutator read from Fromspace is trapped by read barrier writes are OK — they are not trapped objects are copied to Tospace, at B allocation is made at top of Tospace, at T new allocation copied objects
© Richard Jones, 1999BCS Advanced Programming SG: Garbage Collection 14 October Limitations of simple read-barrier methods Pointer reads are very common (15% of instructions?) read barrier is expensive (30% overhead?) pauses may be unpredictable and tightly clustered work done by read barrier is unbounded –depends on the size of the object the root set must be scanned atomically at the flip –pause depends on depth of execution stack
© Richard Jones, 1999BCS Advanced Programming SG: Garbage Collection 14 October VM methods Concurrency without a fine-grain barrier Appel-Ellis-Li is based on Baker uses OS memory protection pagewise black-only read barrier more conservative than Baker Implementation lock grey pages mutator access springs trap –scan (blacken) grey page, copying its children –unlock page collector also scavenges in the background
© Richard Jones, 1999BCS Advanced Programming SG: Garbage Collection 14 October Appel-Ellis-Li issues Trap and scanner threads must be able to access protected Tospace pages Page traps are expensive, so not a real-time algorithm However, cost is application dependent. trap sprung only once per page per collection: suppose every element of a large array was updated at each iteration –only the first update would spring the OS trap –but every update would pay for a software write barrier
© Richard Jones, 1999BCS Advanced Programming SG: Garbage Collection 14 October Your choice! Basic algorithms Improving mark-sweep GC Generational GC Incremental GC Conservative GC Distributed GC
© Richard Jones, 1999BCS Advanced Programming SG: Garbage Collection 14 October Conservative collectors for C and C++ Conservative collectors try hard to find garbage but if in doubt, they must be conservative and declare questionable objects live. Conservative collectors have little or no knowledge of where roots are to be found stack frame layout which words are pointers and which are not Further constraints: values of words cannot be changed unless it is safe to do so compiler optimisations may compromise reachability invariants memory manager must be library-safe Solutions Conservative mark-sweep collectors ‘Mostly Copying’ collectors
© Richard Jones, 1999BCS Advanced Programming SG: Garbage Collection 14 October BDW collector C interface To make a C program a garbage collected program, just add: #define malloc(sz)GC_malloc(sz) #define realloc(p,sz)GC_realloc(p,sz) #define free(p) C++ interface Objects derived from class gc are collectable. class A : public gc {...}; A* a = new A; // a is collectable. Objects allocated with ::operator new are uncollectable Both uncollectable and collectable objects can be explicitly deleted delete invokes an object's destructors and frees its storage immediately.
© Richard Jones, 1999BCS Advanced Programming SG: Garbage Collection 14 October BDW Execution time
© Richard Jones, 1999BCS Advanced Programming SG: Garbage Collection 14 October BDW Maximum memory usage Note: with Ultrix allocator, gawk and cfrac used only 79 and 64kb respectively
© Richard Jones, 1999BCS Advanced Programming SG: Garbage Collection 14 October Conservative GC vs. explicit deallocation BDW is comparable with explicit deallocation mean execution time overhead 19% above the best performs best with programs that allocate small objects But BDW has substantial space overhead mean overhead is 58% (excluding programs that allocate little) fixed overheads account for large space overheads for these programs Some caveats about these figures No attempts were made optimise for GC Programming style was ignored At best, these surveys provide an upper bound on the cost of this conservative collector.
© Richard Jones, 1999BCS Advanced Programming SG: Garbage Collection 14 October Pointer finding algorithm No coopera tion from the compiler no knowledge of heap or stack layout objects do not have headers (that the collector can use) Does a pointer p refer to the heap? compare with highest and lowest plausible addresses Has that heap block been allocated? obtain from p through a two-level search tree Is the offset of the object from the start of the block a multiple of the object size for that block? use an object_map for each object size is the entry in this map for (this block, this object) valid?
© Richard Jones, 1999BCS Advanced Programming SG: Garbage Collection 14 October Pointer misidentification Problem: wrongly identifying a bit-pattern as a pointer would cause a leak most (small) integers cannot be valid heap addresses classes of data such as compressed bitmaps: use GC_malloc_atomic to avoid scanning them uninitialised stack frame slots: clear a few stack frames before GC Blacklisting if reference points into the heap but fails validity tests black list that address — do not use it for allocation call the GC before any allocation to detect false references from static data! In practice leaks are relatively uncommon defensive programming techniques prevent common scenarios
© Richard Jones, 1999BCS Advanced Programming SG: Garbage Collection 14 October Conservative GC & optimising compilers Problem: garbage collectors assume liveness = pointer reachability but programming practices may disguise pointers, e.g. arithmetic on pointers reversing pointers with exclusive-or operator. Valid compiler optimisations may render objects temporarily invisible. For example, the following optimisation destroys even interior pointers sum = 0; for(i=0; i<SIZE; i++) sum += x[i]+y[i] sum = 0; diff = y - x; xend = x + SIZE; for(; x<xend; x++) sum += *x + *(x+diff); x -= SIZE; y = x +diff
© Richard Jones, 1999BCS Advanced Programming SG: Garbage Collection 14 October Your choice! Basic algorithms Improving mark-sweep GC Generational GC Incremental GC Conservative GC Distributed GC
© Richard Jones, 1999BCS Advanced Programming SG: Garbage Collection 14 October Problems of a distributed world Concurrency everywhere must avoid dead-locks, live-locks Communication is costly changing the reference count of a remote object may cost 10,000 times as much as changing the count of a local object Not easy to get complete knowledge of object graph synchronisation is expensive Faults everywhere communications, processes
© Richard Jones, 1999BCS Advanced Programming SG: Garbage Collection 14 October Ideal distributed collector Safe Complete reclaim all garbage including cycles Concurrent mutator-collector and collector-collector Efficient garbage should be reclaimed promptly Expedient make progress despite unavailability of parts of system Scalable to networks of many processes Fault tolerant against message delay, duplication or loss, and process failure
© Richard Jones, 1999BCS Advanced Programming SG: Garbage Collection 14 October Distributed GC Solutions Two (or more) levels of GC run a collector local to each machine run a distributed collector Distributed Reference Counting reference counting is a scalable solution reference listing also an alternative race conditions must not cause premature reclamation cannot reclaim garbage cycles Leases objects are reclaimed when all remote leases on them have expired
© Richard Jones, 1999BCS Advanced Programming SG: Garbage Collection 14 October Distributed tracing Comprehensive collectors require tracing but this requires global synchronisation: must conservatively identify all live objects Tracing with timestamps propagate timestamps to 'live' objects weaker synchronisation: any object with timestamp less than global threshold is garbage Weaken 'comprehensiveness' tracing within groups direct tracing: trace garbage rather than live objects –hybrid reference counting / partial tracing algorithms –back tracing
© Richard Jones, 1999BCS Advanced Programming SG: Garbage Collection 14 October Your choice! Basic algorithms Improving mark-sweep GC Generational GC Incremental GC Conservative GC Distributed GC
© Richard Jones, 1999BCS Advanced Programming SG: Garbage Collection 14 October In conclusion Garbage collection is a relatively mature technology. But hard problems remain. Commercial deployment of collector technology is still at an early stage. There are few players, and they use a small set of solutions. There are no magic solutions to all problems: know your application! Resources Garbage Collection page this talk
© Richard Jones, 1999BCS Advanced Programming SG: Garbage Collection 14 October