© Richard Jones, Eric Jul, mmnet GC & MM Summer School, July A Rapid Introduction to Garbage Collection Richard Jones Computing Laboratory University of Kent at Canterbury mm-net Garbage Collection & Memory Management Summer School Tuesday 20 July 2004 © Richard Jones, All rights reserved.
PART 1: Introduction. Motivation for garbage collection. What to look for.
Why garbage collect? Finite storage requirement: computers have finite, limited storage. Language requirement: many OO languages assume GC, e.g. allocated objects may survive much longer than the method that created them. Problem requirement: the nature of the problem may make it very hard or impossible to determine when something is garbage.
Why automatic garbage collection? Because human programmers just can't get it right: either too little is collected, leading to memory leaks, or too much is collected, leading to broken programs. Explicit memory management also conflicts with the software engineering principles of abstraction and modularity. It's not a silver bullet, though. Some memory management problems cannot be solved using automatic GC, e.g. if you forget to drop references to objects that you no longer need. Some environments are inimical to garbage collection: embedded systems with limited memory, and hard real-time systems.
PART 2: The Basics. What is garbage? The concept of liveness by reachability. The basic algorithms. The cost of garbage collection.
What is garbage? Almost all garbage collectors assume the following definition of live objects, called liveness by reachability: if you can get to an object, then it is live. More formally, an object is live if and only if it is referenced in a predefined variable called a root, or it is referenced in a variable contained in a live object (i.e. it is transitively referenced from a root). Non-live objects are called dead objects, i.e. garbage.
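The definition of liveness by reachability can be sketched as a graph traversal. This is a minimal Python illustration, assuming a toy heap represented as a dictionary from object names to the names they reference; `live_objects`, `heap` and the object names are hypothetical and do not come from the slides.

```python
def live_objects(roots, refs):
    """Return the set of objects transitively reachable from the roots.

    roots: names of objects referenced from root variables (assumed input).
    refs:  dict mapping each object name to the names it references.
    """
    live = set()
    worklist = list(roots)
    while worklist:
        obj = worklist.pop()
        if obj in live:
            continue                      # already known to be live
        live.add(obj)
        worklist.extend(refs.get(obj, ()))
    return live


# Toy heap: A -> B -> C is reachable from the root; D and E only reference each other.
heap = {"A": ["B"], "B": ["C"], "C": [], "D": ["E"], "E": ["D"]}
live = live_objects(["A"], heap)
garbage = set(heap) - live                # everything that is not live is dead
```

Note that D and E reference each other but are still garbage, because neither is reachable from a root.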
Roots. Objects and references can be considered a directed graph. Live objects are those reachable from a root. A process executing a computation is called a mutator: it simply modifies the object graph dynamically. Determining the roots of a computation is, in general, language-dependent. In common language implementations, roots include words in the static area, registers, and words on the execution stack that point into the heap.
The basic algorithms. Reference counting: Keep a note on each object in your garage, indicating the number of live references to the object. If an object's reference count goes to zero, throw the object out (it's dead). Mark-Sweep: Put a note on objects you need (roots). Then recursively put a note on anything needed by a live object. Afterwards, check all objects and throw out objects without notes. Mark-Compact: Put notes on objects you need (as above). Move anything with a note on it to the back of the garage. Burn everything at the front of the garage (it's all dead). Copying: Move objects you need to a new garage. Then recursively move anything needed by an object in the new garage. Afterwards, burn down the old garage (any objects in it are dead)!
Reference counting. The simplest form of garbage collection is reference counting. Basic idea: count the number of references from live objects. Each object has a reference count (RC): when a reference is copied, the referent's RC is incremented; when a reference is deleted, the referent's RC is decremented; an object can be reclaimed when its RC = 0. [Figure not reproduced: Update(left(R), S)]
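The counting rules above can be sketched in a few lines of Python. This is an illustrative toy only: the `Obj` class, the `inc`, `dec` and `update` helpers, and the `freed` log are hypothetical names, and roots are simulated by calling `inc` directly. The last four lines show the well-known limitation that a cycle keeps its counts above zero even after it becomes unreachable.

```python
class Obj:
    """A heap object with a reference count (names are illustrative)."""
    def __init__(self, name):
        self.name = name
        self.rc = 0
        self.fields = {}          # field name -> Obj or None


freed = []                        # names of reclaimed objects, in order


def inc(obj):
    if obj is not None:
        obj.rc += 1


def dec(obj):
    if obj is None:
        return
    obj.rc -= 1
    if obj.rc == 0:               # last reference gone: reclaim immediately
        for child in obj.fields.values():
            dec(child)            # recursive freeing cascade
        obj.fields.clear()
        freed.append(obj.name)


def update(holder, field, new):
    """Overwrite holder.field with new, adjusting both counts."""
    inc(new)                      # increment first, so update(x, f, x.f) is safe
    dec(holder.fields.get(field))
    holder.fields[field] = new


r, s, t = Obj("R"), Obj("S"), Obj("T")
inc(r); inc(s)                    # pretend root variables hold R and S
update(r, "left", t)              # T.rc == 1
update(r, "left", s)              # T.rc drops to 0, so T is reclaimed

x, y = Obj("X"), Obj("Y")
inc(x)                            # a root holds X
update(x, "next", y)
update(y, "next", x)              # X and Y now form a cycle
dec(x)                            # root dropped: unreachable, but counts stay at 1
```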
Advantages of reference counting. Simple to implement. Costs distributed throughout the program. Good locality of reference: only touch old and new targets' RCs. Works well because few objects are shared and many are short-lived. Zombie time minimized: the zombie time is the time from when an object becomes garbage until it is collected. Immediate finalisation is possible (due to near-zero zombie time).
Disadvantages of reference counting. Not comprehensive (does not collect all garbage): cannot reclaim cyclic data structures. High cost of manipulating RCs: the cost is ever-present, even if no garbage is collected. Bad for concurrency: RC updates need an atomic operation such as Compare&Swap. Tightly coupled interface to mutator. High space overheads. Recursive freeing cascade.
Mark-Sweep. Mark-sweep is a tracing algorithm: it works by following (tracing) references from live objects to find other live objects. Implementation: each object has a mark-bit associated with it. There are two phases. Mark phase: starting from the roots, the graph is traced and the mark-bit is set in each unmarked object encountered. At the end of the mark phase, unmarked objects are garbage. Sweep phase: starting from the bottom, the heap is swept; if the mark-bit is not set, the object is reclaimed; if the mark-bit is set, the mark-bit is cleared.
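The two phases can be sketched as follows. This is a minimal Python illustration under simplifying assumptions: the heap is a plain list of hypothetical `Cell` objects, the mark-bit is a boolean attribute, and "reclaiming" simply means dropping the object from the list. An explicit stack is used for marking instead of recursion.

```python
class Cell:
    """Toy heap object with a mark-bit (illustrative, not from the slides)."""
    def __init__(self, name):
        self.name = name
        self.marked = False
        self.refs = []


def mark(roots):
    """Mark phase: trace from the roots, setting the mark-bit of each object reached."""
    stack = list(roots)
    while stack:
        obj = stack.pop()
        if not obj.marked:
            obj.marked = True
            stack.extend(obj.refs)


def sweep(heap):
    """Sweep phase: visit every object, live or dead, in heap order."""
    survivors = []
    for obj in heap:
        if obj.marked:
            obj.marked = False        # clear the bit ready for the next cycle
            survivors.append(obj)
        # unmarked objects are simply not kept, i.e. reclaimed
    return survivors


a, b, c, d = (Cell(n) for n in "ABCD")
a.refs = [b]
c.refs = [d]
d.refs = [c]                          # C and D form an unreachable cycle
heap = [a, b, c, d]
mark([a])
heap = sweep(heap)
names = [obj.name for obj in heap]
```

Unlike reference counting, the unreachable cycle C, D is reclaimed without any special treatment.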
A simple mark-sweep example [figure not reproduced]
Advantages of mark-sweep. Comprehensive: cyclic garbage collected naturally. No run-time overhead on pointer manipulations. Loosely coupled to mutator. Does not move objects: does not break any mutator invariants; optimiser-friendly; requires only one reference to each live object to be discovered (rather than having to find every reference).
Disadvantages of mark-sweep. Stop/start nature leads to disruptive pauses and long zombie times. Complexity is O(heap) rather than O(live): every live object is visited in the mark phase; every object, alive or dead, is visited in the sweep phase. Degrades with residency (heap occupancy): the collector needs headroom in the heap to avoid thrashing. Fragmentation and mark-stack overflow are issues. Tracing collectors must be able to find roots (unlike reference counting).
Fast allocation? Problem: non-moving memory managers (mark-sweep, reference counting) fragment the heap. A compacted heap offers better spatial locality (e.g. better virtual memory and cache performance) and allows fast allocation: merely bump a pointer.
Copying garbage collection. Divide the heap into 2 halves called semi-spaces and named Fromspace and Tospace. Allocate objects in Tospace. When Tospace is full: flip the roles of the semi-spaces; pick out all live data in Fromspace and copy them to Tospace; preserve sharing by leaving a forwarding address in the Fromspace replica; use Tospace objects as a work queue.
Copying GC example [figure not reproduced]. Copy the root and update the pointer, leaving a forwarding address. Scan A': copy B and C, leaving forwarding addresses. Scan B': copy D and E, leaving forwarding addresses. Scan C': copy F and G, leaving forwarding addresses. Scan D' and E': nothing to do. Scan F': use A's forwarding address. Scan G': nothing to do. scan = free, so collection is complete.
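The walk-through above can be reproduced with a small simulation. This is a hedged Python sketch, not the slides' implementation: each semi-space is modelled as a list, references are list indices, the free pointer is the length of Tospace, and `make`, `collect` and the extra unreachable object H are assumptions added for illustration.

```python
def make(name, *fields):
    """Toy Fromspace object: fields are indices into Fromspace."""
    return {"name": name, "fields": list(fields), "forward": None}


def collect(fromspace, roots):
    tospace = []                              # 'free' is len(tospace)

    def copy(i):
        obj = fromspace[i]
        if obj["forward"] is None:            # not copied yet
            obj["forward"] = len(tospace)     # leave a forwarding address
            tospace.append({"name": obj["name"] + "'",
                            "fields": list(obj["fields"])})
        return obj["forward"]                 # sharing is preserved

    new_roots = [copy(r) for r in roots]
    scan = 0
    while scan < len(tospace):                # scan == free: collection complete
        obj = tospace[scan]                   # Tospace doubles as the work queue
        obj["fields"] = [copy(f) for f in obj["fields"]]
        scan += 1
    return tospace, new_roots


# Indices: A=0 B=1 C=2 D=3 E=4 F=5 G=6 H=7; F points back to A, H is unreachable.
fromspace = [make("A", 1, 2), make("B", 3, 4), make("C", 5, 6),
             make("D"), make("E"), make("F", 0), make("G"), make("H")]
tospace, roots = collect(fromspace, [0])
order = [obj["name"] for obj in tospace]
```

Objects land in Tospace in breadth-first order, and F' ends up pointing at A' through A's forwarding address rather than at a second copy of A.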
Advantages of copying GC. Compaction for free. Allocation is very cheap for all object sizes: the out-of-space check is a pointer comparison; simply increment the free pointer to allocate. Only live data is processed (commonly a small fraction of the heap). Fixed space overheads: free and scan pointers; forwarding addresses can be written over user data. Comprehensive: cyclic garbage collected naturally. Simple to implement a reasonably efficient copying GC.
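Bump-pointer allocation in a compacted space can be sketched as below. This is a toy Python model, assuming addresses are plain integers; `BumpAllocator`, `free` and `limit` are illustrative names, and a real allocator would trigger a collection rather than return `None`.

```python
class BumpAllocator:
    """Toy bump-pointer allocator over a contiguous space of `size` bytes."""
    def __init__(self, size):
        self.free = 0                 # next free address
        self.limit = size             # end of the space

    def alloc(self, nbytes):
        if self.free + nbytes > self.limit:   # out-of-space check: one comparison
            return None               # a real system would collect here
        addr = self.free
        self.free += nbytes           # allocation: one increment
        return addr


space = BumpAllocator(32)
first = space.alloc(8)
second = space.alloc(16)
third = space.alloc(16)               # does not fit in the remaining 8 bytes
```

The cost is the same for every object size, which is the point made on the slide.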
Disadvantages of copying GC. Stop-and-copy may be disruptive. Degrades with residency. Requires twice the address space of other simple collectors: touches twice as many pages; trade-off against fragmentation. Cost of copying large objects. Long-lived data may be repeatedly copied. All references must be updated. Moving objects may break mutator invariants. Breadth-first copying may disturb locality patterns.
Mark-compact collection. Mark-compact collectors make at least two passes over the heap after marking: to relocate objects, and to update references (not necessarily in this order). Issues: how many passes? Compaction style: sliding (preserve the original order of objects); linearising (objects that reference each other are placed adjacently, as far as possible); arbitrary (objects moved without regard for original order or referential locality).
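One possible sliding compactor can be sketched with three passes after marking: compute forwarding addresses, update references, then relocate. This Python toy is only one of several possible pass structures; the dictionary layout (`addr`, `size`, `marked`, `refs`) and the function name `compact` are assumptions made for illustration.

```python
def compact(objects):
    """Slide marked objects towards address 0, preserving their order.

    objects: list of dicts in address order with keys
             addr, size, marked, refs (list of addresses of other objects).
    Returns the surviving objects and the new free pointer.
    """
    # Pass 1: compute a forwarding address for each marked object.
    forward, free = {}, 0
    for obj in objects:
        if obj["marked"]:
            forward[obj["addr"]] = free
            free += obj["size"]

    live = [obj for obj in objects if obj["marked"]]

    # Pass 2: update references (live objects only reference live objects).
    for obj in live:
        obj["refs"] = [forward[r] for r in obj["refs"]]

    # Pass 3: relocate the objects and clear their mark-bits.
    for obj in live:
        obj["addr"] = forward[obj["addr"]]
        obj["marked"] = False
    return live, free


heap = [
    {"addr": 0,  "size": 4, "marked": False, "refs": []},
    {"addr": 4,  "size": 2, "marked": True,  "refs": [10]},
    {"addr": 6,  "size": 4, "marked": False, "refs": []},
    {"addr": 10, "size": 3, "marked": True,  "refs": [4]},
]
live, free = compact(heap)
layout = [(obj["addr"], obj["refs"]) for obj in live]
```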
Cost metrics. Many cost metrics can be of interest (albeit not necessarily at the same time). They cover different types of concern. The metrics are partially orthogonal, partially overlapping, and certainly also partially contradictory. In general it is not possible to identify one particular metric as the most important in all cases: it is application dependent. Because different GC algorithms emphasise different metrics, it is also, in general, not possible to point out one particular GC algorithm as "the best". In the following, we present the most important metrics to consider when choosing a collector algorithm.
GC Metrics. Execution time: total execution time; distribution of GC execution time; time to allocate a new object. Memory usage: additional memory overhead; fragmentation; virtual memory and cache performance. Delay time: length of disruptive pauses; zombie times. Other important metrics: comprehensiveness; implementation simplicity and robustness.
Execution time metrics. Total execution time: relevant for applications such as batch processing; can be less important for some applications, e.g. where there is much idle time (interactive applications). Distribution of GC execution time: the absolute amount of execution time consumed may be less important than the amortisation of that cost over the mutator's execution. The time to allocate a new object: for some applications it may be important to be able to allocate new objects fast.
Delay time metrics. Length of disruptive pauses: for applications requiring rapid response, e.g. most interactive applications, the length of the disruptive pauses introduced by the collector may be the most relevant metric. Zombie times: the delay from when an object becomes garbage until the memory allocated to it is actually collected. Long zombie times require more memory to be available (to house the dead, as yet uncollected, objects).
Memory metrics. Amount of extra memory consumed: some algorithms work better with (or simply require) large amounts of extra memory. Memory fragmentation: some algorithms result in much fragmentation of memory, while others actually reduce fragmentation. Virtual memory and cache performance: the interaction between virtual memory, caches, and the garbage collector can be quite important. Some algorithms touch all parts of allocated memory (live as well as dead objects, and even unallocated memory) while others touch only limited amounts (e.g. live objects only).
Other important metrics. Comprehensiveness: does it find all garbage? Some collectors are comprehensive: they collect all garbage. Others are conservative: they leave some garbage uncollected. For example, some do not collect cyclic object structures, while others retain some dead objects because they cannot clearly identify all garbage. Implementation simplicity and robustness: at times, simplicity of implementation is most important (get the job done!). Is the garbage collector robust? How tightly coupled to the mutator is it?
PART 3: Generational GC
The generational hypothesis. Weak generational hypothesis: "Most objects die young" [Ungar, 1984]. It is common for 80-95% of objects to die before a further megabyte has been allocated. 95% of objects are 'short-lived' in many Java programs. 50-90% of CL and 75-95% of Haskell objects die before they are 10 KB old. SML/NJ reclaims 98% of any generation at each collection. Only 1% of Cedar objects survived beyond 721 KB of allocation.
Generational GC. Strategy: segregate objects by age into generations; collect different generations at different frequencies; concentrate on the nursery generation. By concentrating on a small part of the heap, pause times can be reduced: Java HotSpot claims pause times are typically reduced by a factor of 5.
Semi-space copying vs. generational GC [figure not reproduced]
Example [figure not reproduced]. Update with pointer to a; request new object; allocation fails; perform minor collection; further updates, allocation, etc. Where are the roots for the new generation?
Not a universal panacea. Generational GC is a successful strategy for many but not all programs. There are common examples of programs that do not obey the weak generational hypothesis: it is common for programs to retain most objects for a long time and then to release them all at the same time. Generational GC imposes a cost on the mutator: pointer writes become more expensive, and certain classes of program may thrash the write barrier, e.g. a program may repeatedly process a large, long-lived array of heap-allocated data, updating many, if not all, of the array's slots at each iteration. There is also the need to copy objects.
Roots of the new generation [figure not reproduced]
Issues raised by generational garbage collection. Old-young pointers give rise to roots for the young generation: how can these roots be discovered? Garbage in older generations cannot be reclaimed by minor collections: how can this be minimised? When should surviving objects be promoted to the next generation? How do we record object ages? How should generations be organised?
Problem: Intergenerational pointers. We can collect the young generation on its own (minor collection). Old-young pointers give rise to roots for the young generation: such pointers are comparatively rare; they arise from destructive pointer writes; these assignments can be trapped with a write barrier. Young-old pointers are common, but not a problem if we collect the younger generations whenever we collect an older one (major collection).
Problem: Tenured garbage. Garbage in older generations cannot be reclaimed by minor collections. Tenured garbage also causes the retention of young objects it references (nepotism).
Problem: Promotion policies. Aims of generational GC: to reduce pause time; to reduce the overall costs of long-lived objects. Pause time depends on the size of the youngest generation: how large should this be? How early should objects be promoted to the next generation? Too late: pause time will be too long. Too early: young objects do not have sufficient time to die; older generations will fill too fast; major collections will be too frequent; the working set will be diluted; the cost of the write barrier increases (more old-young pointers).
Multiple generations. Allow the youngest generation to be kept small: hence, short pauses for minor collections and good locality (try to tune to the size of the cache?). Filter objects prematurely promoted: allows more time to die; prevents garbage reaching the oldest generation. But: more complex; intermediate collections may also be disruptive.
Promotion rate. Promotion rate depends on the number of minor collections that an object must survive. Copy count = 1? En masse promotion: all young objects promoted; allows simple heap organisation, but young objects have too little opportunity to die; promotion rate % higher than necessary. Copy count = 2? Denies promotion to recently created objects; may reduce survivors by a factor of 2, but may increase copying costs by less than half. Copy count > 2? Gives diminishing returns.
Adaptive techniques. The number of survivors of a scavenge predicts the next pause time: don't promote any objects if the volume of survivors was low; otherwise set the threshold to promote sufficient survivors to reduce the next pause. But this cannot reduce the amount of tenured garbage. So allow the boundary between generations to move in both directions? But then all forward pointers must be recorded, not just intergenerational ones.
Generation organisations. 1) One semi-space per generation, en masse promotion: no need to record ages; youngest region recycled at each GC, giving good locality; requires multiple generations; causes more write-barrier traps. 2) Divide the generation into a creation space and an aging space: the aging space holds survivors from the creation space (it must be organised into semi-spaces since objects may be held for more than one scavenge, but the semi-spaces can be kept small); the creation space can be reused at each collection (locality benefit; keep in physical memory or cache).
Other organisations. Bucket-brigades: divide each generation into sub-regions; determine age by a pointer comparison rather than recording per-object ages. Large Object Areas: managed by non-copying GC. Older-First Collection: assumes that recently allocated objects need time to die; objects laid out in allocation order; collector processes objects in older-first order. Beltway: a flexible framework that exploits and separates object age and incrementality. It encompasses all other copying collectors.
Inter-generational pointers. Need to discover these as they are roots for younger generations. Scan older generations? No cost to the mutator; requires more scanning at GC time (but scanning is faster than tracing and has better locality). Trap writes? Inter-generational pointers (IGPs) are caused by pointer stores and by promotion; software or hardware techniques can be used. Issues: cost to the mutator (time, code size); space overhead; discovery at scavenge-time.
Roots of the new generation [figure not reproduced]
Write barriers. Write barriers can be implemented in software or with operating system support. Software: have the compiler emit extra instructions. Operating system: use the memory protection mechanism; use the paging mechanism's dirty bits.
Trapping pointer writes. Problem: need to trap pointer writes that may create old-young pointers. Don't need to trap stores to the stack or initialising stores (can initialising stores be distinguished from other pointer writes?). Writes are relatively uncommon, but this is language dependent. Lisp: 5-10% of memory references are non-initialising pointer stores. SML/NJ: frequency of pointer stores is less than 1%. Cost also depends on whether the language is interpreted or not. There are two ways to do this.
Remembered sets. Keep the address of each old object containing a pointer to a young one. Question: address of object or address of slot in object? Scanning cost depends on the size of the set (not the number of pointer stores) and on the size of the objects (if slot addresses are not remembered).
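A remembered-set write barrier can be sketched as follows. This Python toy remembers whole objects rather than slots, which is only one of the two options raised above; the `Node` class, the `old` flag, and the `write` and `minor_roots` helpers are hypothetical names introduced for illustration.

```python
class Node:
    """Toy object; `old` says which generation it lives in (assumption)."""
    def __init__(self, name, old=False):
        self.name, self.old, self.fields = name, old, {}


remembered = []        # old objects that may contain pointers to young ones


def write(holder, field, target):
    """Pointer store with a write barrier that records old-young pointers."""
    holder.fields[field] = target
    if holder.old and target is not None and not target.old:
        if holder not in remembered:      # the set grows per object, not per store
            remembered.append(holder)


def minor_roots(stack_roots):
    """Roots for a minor collection: young stack roots plus remembered-set targets."""
    roots = [r for r in stack_roots if not r.old]
    for holder in remembered:             # scan each remembered object's fields
        roots.extend(t for t in holder.fields.values()
                     if t is not None and not t.old)
    return roots


old = Node("O", old=True)
y1, y2 = Node("y1"), Node("y2")
write(old, "f", y1)        # old -> young: recorded
write(old, "f", y1)        # repeated store: the set does not grow
write(y1, "g", y2)         # young -> young: not recorded
root_names = [n.name for n in minor_roots([])]
```

Because whole objects are remembered, every field of a remembered object must be scanned at collection time, which is the size-of-objects cost mentioned on the slide.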
Card tables. Aims: a faster write barrier; small memory cost; portability. Divide the heap into small cards (e.g. with 128-byte cards, the table is < 1% of the heap). Set a bit in the card table unconditionally whenever a word in the heap corresponding to that card is modified. Scan modified cards at scavenge-time. Result: fast trap (3 instructions); cost of scanning proportional to the number and size of cards marked. Difficulty: how to handle objects that span cards?
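The card-marking barrier can be sketched in Python as below. This is an illustration under stated assumptions: heap addresses are plain integers, one byte per card is used instead of one bit for simplicity, and `CardTable`, `write_barrier` and `dirty_cards` are hypothetical names.

```python
CARD_SIZE = 128                       # bytes per card, as in the example above


class CardTable:
    """Toy card table: one byte per card (a real table might use one bit)."""
    def __init__(self, heap_bytes):
        ncards = (heap_bytes + CARD_SIZE - 1) // CARD_SIZE
        self.dirty = bytearray(ncards)

    def write_barrier(self, slot_addr):
        # Unconditional: no test of what was written or where it points.
        self.dirty[slot_addr // CARD_SIZE] = 1

    def dirty_cards(self):
        """Cards the collector must scan at scavenge-time."""
        return [i for i, d in enumerate(self.dirty) if d]


table = CardTable(1024)               # 1024-byte heap -> 8 cards
for addr in (0, 130, 255):            # three stores touching two cards
    table.write_barrier(addr)
cards = table.dirty_cards()
```

The barrier does no filtering, so the scanning cost moves to the collector, which must examine every word of each dirty card.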
Generational GC: a summary. Highly successful for a range of applications: reduces pause time to a level acceptable for interactive applications; improves paging and cache behaviour; reduces the overall cost of garbage collection. Requires a low survival rate, infrequent major collections, and a low overall cost of the write barrier. But generational GC is not a universal panacea. It attempts to improve expected pause time at the expense of the worst case: objects may not die sufficiently fast; applications may thrash the write barrier; too many old-young pointers, or very deep execution stacks, may increase pause times.
PART 4: The GC Interface
GC Interface. Programmer's viewpoint: memory management policies and configurations; when to collect?; region in which to allocate (tracing or reference counting? moving or non-moving? immortal? region sizes); programming restrictions. Implementer's viewpoint: when to collect? (System.gc()? region full? other data structures?); safe points; threads (synchronisation, avoiding locks); root finding.
Root finding. Global roots, roots in other regions: static area, class data structures; card tables, rem-sets (maintained by write-barriers). Stacks: which slots contain references? Scan conservatively, or type accurately: the compiler generates a stack map; map PC to stack info; a significant fraction of code size, but compresses well; only collect at GC-safe points; ambiguities (Java jsr problem).
Safe-points. Before a 'stop the world' GC, need to halt all mutator threads at safe-points: make every instruction a safe point? Polling (e.g. before method calls, back-branches, etc.). Patching (GC patches in suspensions). Suspension can account for a significant proportion of GC time.
PART 5: Incremental and concurrent garbage collection. Incremental/concurrent garbage collection: runs the collector interleaved/in parallel with the mutator; attempts to bound pause time; many soft real-time solutions, but no general hard real-time solutions yet.
Terminology [figure not reproduced; legend: mutator, collector, idle; panels: incremental collection, concurrent collection, parallel collection]. Suppose GC 'should' account for 20% of execution time. Multiple GC threads: GC = 20% of time. Single GC thread: 33% of time.
Synchronisation. Asynchronous execution of mutator and collector introduces a coherency problem. For example, in the marking phase [interleaving figure not reproduced]. Mutator actions: Update(right(B), right(A)); right(A) = nil; Update(right(A), right(B)); right(B) = nil. Collector actions: collector marks A; collector scans A; collector marks B; collector scans B.
Tricolour abstraction. Black: the object and its immediate descendants have been visited; the GC has finished with black objects and need not visit them again. Grey: the object has been visited but its components may not have been scanned or, for an incremental/concurrent GC, the mutator has rearranged the connectivity of the graph; in either case, the collector must visit it again. White: the object is unvisited and, at the end of the phase, garbage. A collection terminates when no grey objects remain, i.e. all live objects have been blackened.
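The three colours can be made concrete with a small stop-the-world marking loop. This is a minimal Python sketch, assuming a toy graph given as a dictionary from object names to the names they reference; `tricolour_mark` and the `grey` worklist are illustrative names.

```python
WHITE, GREY, BLACK = "white", "grey", "black"


def tricolour_mark(roots, refs):
    """Colour every object; objects still white at the end are garbage."""
    colour = {obj: WHITE for obj in refs}
    grey = []                               # the grey set, used as a worklist

    def shade(obj):
        if colour[obj] == WHITE:            # visited for the first time
            colour[obj] = GREY
            grey.append(obj)

    for r in roots:
        shade(r)
    while grey:                             # terminates when no grey objects remain
        obj = grey.pop()
        for child in refs[obj]:
            shade(child)
        colour[obj] = BLACK                 # object and its children have been visited
    return colour


graph = {"A": ["B", "C"], "B": ["C"], "C": ["A"], "D": ["A"]}
colour = tricolour_mark(["A"], graph)
```

In this non-incremental setting no black object ever points to a white one; the barriers discussed next exist to keep the collector correct once the mutator runs during marking.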
Example: grey set [figure not reproduced]
Two ways to prevent disruption. There are two ways to prevent the mutator from interfering with a collection by writing white pointers into black objects. 1) Ensure the mutator never sees a white object: when the mutator attempts to access a white object, the object is visited by the collector; protect white objects with a read-barrier. 2) Record where the mutator writes black-white pointers, so that the GC can (re)visit objects; protect objects with a write-barrier.
Write-barrier methods. To falsely reclaim an object, two conditions must hold: (1) a pointer to the white object is written into a black object and, furthermore, this must be the only reference to the white object; (2) the original reference to the white object is destroyed. If (1) does not hold, there will be at least one path to each reachable white object that passes through a grey object. If (2) does not hold, the white object will still be reachable through the original reference. Write barrier methods: incremental update methods catch changes to connectivity; snapshot-at-the-beginning methods prevent the loss of the original reference.
Issues. Barriers lead to floating garbage. How conservative is an algorithm? How much garbage is left floating in the heap until the next collection cycle? What is the policy towards new (and often short-lived) objects: what colour are they allocated? How expensive is the barrier? How is initialisation achieved? How is termination achieved? In particular, how are large root sets handled, and how are threads synchronised?
Incremental update methods. The best known method was introduced by Dijkstra et al.: Update(A, C) { *A = C; shade(C) } where shade(P) { if white(P) colour(P) = grey }. The barrier traps attempts to install a pointer to a white object into a black object; incrementally records changes to the shape of the graph; prevents the first condition for false reclamation (a white pointer stored in a black object) from arising. No special action is required when a pointer is deleted.
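The Update/shade pseudocode can be exercised on a three-object scenario. This Python sketch is a toy model, not the published algorithm in full: each object has a single `right` slot, the collector's progress is simulated by setting colours by hand, and `Obj`, `grey_set` and `finish_marking` are hypothetical names.

```python
WHITE, GREY, BLACK = "white", "grey", "black"


class Obj:
    """Toy object with one pointer slot, `right` (illustrative)."""
    def __init__(self, name, right=None):
        self.name, self.colour, self.right = name, WHITE, right


grey_set = []


def shade(p):
    if p is not None and p.colour == WHITE:
        p.colour = GREY
        grey_set.append(p)


def update(a, c):
    """Incremental-update barrier: store c into a.right, then shade c."""
    a.right = c
    shade(c)


def finish_marking():
    while grey_set:
        obj = grey_set.pop()
        shade(obj.right)
        obj.colour = BLACK


c = Obj("C")
b = Obj("B", right=c)
a = Obj("A")
a.colour = BLACK           # the collector has already scanned A
shade(b)                   # B is grey, waiting to be scanned
update(a, b.right)         # mutator installs white C into black A: C is shaded
update(b, None)            # mutator destroys the original reference to C
finish_marking()
```

Without the `shade(c)` call in `update`, C would stay white after B is scanned and would be reclaimed even though A still points to it.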
Snapshot at the beginning. Update(A, C) { shade(*A); *A = C }. The barrier remembers old references; prevents the second condition for false reclamation (loss of the original reference) from arising. More conservative than incremental update (IU). Simpler termination: only have to scan roots once, even if there is no barrier on local variables and temporaries.
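The snapshot barrier can be tried on the same kind of toy heap. This Python sketch again assumes single-slot objects and hand-set colours; `Obj`, `grey_set` and `finish_marking` are hypothetical names. The second case illustrates why the method is more conservative: an object that becomes garbage during marking is shaded anyway and floats until the next cycle.

```python
WHITE, GREY, BLACK = "white", "grey", "black"


class Obj:
    """Toy object with one pointer slot, `right` (illustrative)."""
    def __init__(self, name, right=None):
        self.name, self.colour, self.right = name, WHITE, right


grey_set = []


def shade(p):
    if p is not None and p.colour == WHITE:
        p.colour = GREY
        grey_set.append(p)


def update(a, c):
    """Snapshot barrier: shade the old referent, then overwrite the slot."""
    shade(a.right)
    a.right = c


def finish_marking():
    while grey_set:
        obj = grey_set.pop()
        shade(obj.right)
        obj.colour = BLACK


# Case 1: the only pointer to white C moves from grey B to black A.
c = Obj("C")
b = Obj("B", right=c)
a = Obj("A")
a.colour = BLACK
shade(b)
update(a, b.right)         # old value of A.right is None: nothing to shade
update(b, None)            # old value C is shaded before the reference is lost

# Case 2: F becomes unreachable during marking but is shaded anyway.
f = Obj("F")
e = Obj("E", right=f)
shade(e)
update(e, None)            # F is now garbage, yet it has been shaded grey

finish_marking()
```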
Read barrier methods. Idea: don't let the mutator see white objects, so it cannot disrupt the collector; the mutator can still access black objects. Question: should the mutator be allowed to see grey objects as well? Best known is Baker's copying collector. During collection, each mutator read from Fromspace is trapped by the read barrier; writes are OK and are not trapped; objects are copied to Tospace, at B; allocation is made at the top of Tospace, at T. [Figure not reproduced; labels: new allocation, copied objects]
THE END