Dynamic Compilation
Garbage Collection
Vijay Janapa Reddi
The University of Texas at Austin
Today
  - Garbage Collection
    - Why use garbage collection?
    - What is garbage?
      - Reachable vs live, stack maps, etc.
    - Allocators and their collection mechanisms
      - Semispace
      - Marksweep
      - Performance comparisons
    - Incremental age based collection
      - Write barriers: Friend or foe?
      - Generational
      - Beltway
    - More performance
Basic VM Structure
  [Figure: Program/Bytecode feeds the Class Loader, Verifier, etc.; the Executing Program runs on top of the Heap, Thread Scheduler, Dynamic Compilation Subsystem, and Garbage Collector]
True or False?
  - Real programmers use languages with explicit memory management.
  - I can optimize my memory management much better than any garbage collector.
    - Scope of effort?
Why Use Garbage Collection?
  - Software engineering benefits
    - Less user code compared to explicit memory management (MM)
    - Less user code to get correct
    - Protects against some classes of memory errors
      - No free(), thus no premature free(), no double free(), and no forgetting to free()
    - Not perfect, memory can still “leak”
      - Programmers still need to eliminate all pointers to objects the program no longer needs
  - Performance: space/time tradeoff
    - Time proportional to dead objects (explicit MM, reference counting) or live objects (semispace, marksweep)
    - Throughput versus pause time
      - Less frequent collection typically reduces total time but can increase space requirements and pause times
    - Hidden locality benefits?
GC: a tool for all occasions?
  - When might you NOT be willing to use a garbage collector?
What is Garbage?
  - In theory, any object the program will never reference again
    - But the compiler & runtime system cannot figure that out
  - In practice, any object the program cannot reach is garbage
    - Approximate liveness with reachability
  - OK, so how do we know what data is reachable?
    - Keep track of pointers
    - They tell you how to “reach” some other piece of data
  - What about programming languages like C?
    - X = *(<arbitrary address>)
    - Everything is (potentially) reachable
    - It’s up to the programmer… malloc() & free()
What is Garbage?
  - Managed languages couple GC with “safe” pointers
    - Programs may not access arbitrary addresses in memory
    - The compiler can identify and provide to the garbage collector all the pointers, thus
      - “Once garbage, always garbage”
    - Runtime system can potentially relocate objects by updating pointers
Reference Counting
  - Whenever a pointer is assigned, update the reference count of the object it points to.
  - When a count is decremented to 0, the object is freed.
  - Consider: pop of a stack…
  [Figure: linked stack reached from Head; node reference counts across the three animation steps: 1 1 1, then 2 1, then 1 1]
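The pop example can be traced with a small simulation. This is a minimal sketch, not a real runtime: `Obj`, `Roots`, `assign`, `inc`, `dec`, `build_stack`, `pop`, and `trace` are hypothetical names, counts are stored explicitly, and local temporaries such as `top` are assumed not to be counted.

```python
trace = []  # (object name, new count) after every increment or decrement


class Obj:
    """A simulated heap object carrying an explicit reference count."""

    def __init__(self, name):
        self.name = name
        self.count = 0
        self.next = None
        self.freed = False


class Roots:
    """Holds the Head pointer of the stack."""

    def __init__(self):
        self.head = None


def inc(obj):
    if obj is not None:
        obj.count += 1
        trace.append((obj.name, obj.count))


def dec(obj):
    """Drop one reference; at zero, free the object and release its field."""
    if obj is None:
        return
    obj.count -= 1
    trace.append((obj.name, obj.count))
    if obj.count == 0:
        obj.freed = True
        child, obj.next = obj.next, None
        dec(child)


def assign(holder, field, new):
    """Counted pointer store: holder.field = new."""
    old = getattr(holder, field)
    inc(new)  # increment first so that self-assignment stays safe
    setattr(holder, field, new)
    dec(old)


def build_stack():
    """Head -> a -> b -> c, every node with count 1."""
    roots = Roots()
    a, b, c = Obj("a"), Obj("b"), Obj("c")
    assign(b, "next", c)
    assign(a, "next", b)
    assign(roots, "head", a)
    return roots, (a, b, c)


def pop(roots):
    """Head = Head.next; the old top is freed once its count reaches 0."""
    top = roots.head
    assign(roots, "head", top.next)
    return top.name
```

During `pop`, node `b` briefly has count 2 (referenced by both Head and `a.next`); freeing `a` then drops it back to 1, which is the sequence of counts shown in the figure.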
Reference Counting
  - What if we want to delete this deque?
    - Head = Tail = NULL;
  - Cycles lead to orphaned garbage
  [Figure: doubly linked deque reached from Head and Tail; node reference counts 2 2 2 before the assignment, 1 2 1 after Head and Tail are set to NULL]
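The deque case can be reproduced with the same kind of simulation. Again a sketch under assumptions: `Node`, `Deque`, `assign`, `build_deque`, and `drop_deque` are hypothetical names, and both `next` and `prev` are counted references.

```python
class Node:
    """A simulated deque node with an explicit reference count."""

    def __init__(self, name):
        self.name = name
        self.count = 0
        self.next = None
        self.prev = None
        self.freed = False


class Deque:
    def __init__(self):
        self.head = None
        self.tail = None


def inc(node):
    if node is not None:
        node.count += 1


def dec(node):
    """Drop one reference; at zero, free the node and release both fields."""
    if node is None:
        return
    node.count -= 1
    if node.count == 0:
        node.freed = True
        nxt, prv = node.next, node.prev
        node.next = node.prev = None
        dec(nxt)
        dec(prv)


def assign(holder, field, new):
    """Counted pointer store: holder.field = new."""
    old = getattr(holder, field)
    inc(new)
    setattr(holder, field, new)
    dec(old)


def build_deque():
    """Head -> a <-> b <-> c <- Tail."""
    d = Deque()
    a, b, c = Node("a"), Node("b"), Node("c")
    assign(d, "head", a)
    assign(a, "next", b)
    assign(b, "prev", a)
    assign(b, "next", c)
    assign(c, "prev", b)
    assign(d, "tail", c)
    return d, [a, b, c]


def drop_deque(d):
    """Head = Tail = NULL."""
    assign(d, "head", None)
    assign(d, "tail", None)
```

After `drop_deque`, no count reaches zero: each node is still referenced by a neighbour, so a pure reference counter never frees any of them even though the program can no longer reach the deque.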
Reference Counting
  - Reference counting is used in C++ “smart pointers”
    - “Shared” pointers cause reference counting
    - “Weak” pointers won’t keep an object alive (they don’t affect the reference count)
    - Programmers need to pay attention to using the right kind of pointer
  - Sort of a half-way point between manual new/delete and “real” garbage collection
  [Figure: deque reached from Head and Tail with weak back pointers; node reference counts 1 1 2]
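The effect of weak back pointers can be tried out in Python rather than C++. This sketch assumes CPython, whose objects are reference counted and backed by a separate cycle collector; the cycle collector is switched off during the experiment so that only reference counting is at work. `Node`, `build_deque`, and `freed_after_drop` are hypothetical names; `weakref.ref` plays the role of the “weak” pointer.

```python
import gc
import weakref


class Node:
    def __init__(self, name):
        self.name = name
        self.next = None  # strong ("shared") forward pointer
        self.prev = None  # back pointer, weak or strong depending on the build


def build_deque(names, weak_back_pointers):
    head = tail = None
    for name in names:
        node = Node(name)
        if tail is None:
            head = node
        else:
            tail.next = node
            node.prev = weakref.ref(tail) if weak_back_pointers else tail
        tail = node
    return head, tail


def freed_after_drop(weak_back_pointers):
    """Return, per node, whether it was freed right after Head = Tail = None."""
    was_enabled = gc.isenabled()
    gc.disable()  # leave only reference counting active
    try:
        head, tail = build_deque(["a", "b", "c"], weak_back_pointers)
        probes = []  # weak probes observe the nodes without keeping them alive
        cur = head
        while cur is not None:
            probes.append(weakref.ref(cur))
            cur = cur.next
        head = tail = None
        result = [probe() is None for probe in probes]
    finally:
        if was_enabled:
            gc.enable()
        gc.collect()  # clean up any cycles left by the strong variant
    return result
```

Under the CPython assumption, `freed_after_drop(True)` reports every node freed as soon as Head and Tail are cleared, while `freed_after_drop(False)` reports none freed until the cycle collector runs.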
Tracing Collectors
  - A more robust solution is a “tracing” collector
  - Start with a “root set” of all references
    - Objects you “know” without having to have a pointer to them (e.g. globals)
  - See what the objects in the root set point to, follow those, etc.
    - Need to know how to find pointers within an object
  - Basically, trace through to find all the reachable objects
    - Everything else must be garbage
Tracing Collectors
  - Compiler produces a stack-map at GC safe-points and Type Information Blocks
  - GC safe points: new(), method entry, method exit, & back-edges (thread switch points)
  - Stack-map: enumerate global variables, stack variables, live registers
    - This code is hard to get right! Why?
  - Type Information Blocks: identify reference fields in objects
    - for each type i (class) in the program, a map TIB_i
  [Figure: roots (globals, stack, registers) pointing into a heap holding objects A, B, C; code fragment "r0 = obj" with "PC -> p.f = obj"; example TIB map entries 2, 3]
Tracing Collectors
  - Tracing collector (semispace, marksweep)
    - Marks the objects reachable from the roots live, and then performs a transitive closure over them
    - All unmarked objects are dead, and can be reclaimed
  [Figure: animation of mark steps from the roots (globals, stack, registers) through heap objects A, B, C, followed by a sweep; code fragment "r0 = obj" with "PC -> p.f = obj"]
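Marking and reclaiming can be sketched in a few lines. This is an illustration under assumptions, not a real collector: `HeapObject`, `TIB`, `mark`, `sweep`, `collect`, and `example_heap` are hypothetical names, the root set is passed in as a plain list (standing in for what a stack map would enumerate), and the `TIB` table lists, per type, which field indices hold references.

```python
class HeapObject:
    def __init__(self, name, type_name, fields):
        self.name = name
        self.type_name = type_name
        self.fields = list(fields)
        self.marked = False


# Hypothetical type information blocks: reference field indices per type.
TIB = {"Pair": [0, 1], "Box": [1], "Leaf": []}


def mark(roots):
    """Transitive closure from the roots, following only reference fields."""
    worklist = [r for r in roots if r is not None]
    while worklist:
        obj = worklist.pop()
        if obj.marked:
            continue
        obj.marked = True
        for index in TIB[obj.type_name]:
            child = obj.fields[index]
            if child is not None and not child.marked:
                worklist.append(child)


def sweep(heap):
    """Split the heap into survivors and reclaimed objects; clear mark bits."""
    survivors, reclaimed = [], []
    for obj in heap:
        if obj.marked:
            obj.marked = False
            survivors.append(obj)
        else:
            reclaimed.append(obj)
    return survivors, reclaimed


def collect(heap, roots):
    mark(roots)
    return sweep(heap)


def example_heap():
    leaf = HeapObject("leaf", "Leaf", [7])
    box = HeapObject("box", "Box", [42, leaf])  # field 0 is an int, field 1 a reference
    p1 = HeapObject("p1", "Pair", [None, None])
    p2 = HeapObject("p2", "Pair", [p1, None])
    p1.fields[0] = p2  # unreachable cycle p1 <-> p2
    return [leaf, box, p1, p2], [box]
```

In the example heap, `box` and `leaf` survive while the cycle `p1`/`p2` is reclaimed, since tracing depends on reachability rather than on counts.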
Conservative Collectors
  - What if we didn’t have the type information block?
    - That is, we can’t identify the pointers within an object
    - e.g. with C or vanilla C++ pointers
  - Answer: do “conservative collection”
    - Treat every value like it might be a pointer
    - If it looks like it might point to a memory region in the heap, assume it is a pointer
      - Trace the block of data that was “malloc’d” that contains that address
    - In some architectures, pointers must be word-aligned (least significant two bits are zero), which helps filter out random integers
    - But “unfortunate integers” can keep memory alive
    - Also, can’t move objects since we can’t safely backpatch pointers (they might really be integers)
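The pointer-guessing rule can be illustrated with a toy heap of integer words. This is a sketch under assumptions: `ConservativeHeap`, `malloc_at`, `find_block`, `mark`, and `example` are hypothetical names, addresses are plain integers, and a word is 4 bytes so that aligned pointers have their two low bits clear.

```python
WORD = 4  # bytes per word; aligned pointers have the two low bits clear


class ConservativeHeap:
    def __init__(self):
        self.blocks = {}  # block start address -> list of payload words

    def malloc_at(self, start, words):
        """Record an allocated block at a fixed address (for the example)."""
        self.blocks[start] = list(words)

    def find_block(self, value):
        """Return the start of the block containing value, or None."""
        if value % WORD != 0:
            return None  # misaligned: cannot be a pointer
        for start, words in self.blocks.items():
            if start <= value < start + WORD * len(words):
                return start  # interior pointers keep the whole block alive
        return None

    def mark(self, root_words):
        """Treat every word as a potential pointer and trace conservatively."""
        marked = set()
        worklist = list(root_words)
        while worklist:
            start = self.find_block(worklist.pop())
            if start is None or start in marked:
                continue
            marked.add(start)
            worklist.extend(self.blocks[start])
        return marked


def example():
    heap = ConservativeHeap()
    heap.malloc_at(0x1000, [0x2000, 5])  # really points to the block at 0x2000
    heap.malloc_at(0x2000, [1, 2])
    heap.malloc_at(0x3000, [9, 9])       # garbage
    heap.malloc_at(0x4000, [0, 0])       # garbage, but see the roots
    # 0x4004 is just an integer that happens to look like a heap address;
    # 0x3002 is misaligned and is filtered out.
    roots = [0x1000, 0x4004, 0x3002]
    return heap, roots
```

Here the block at `0x4000` is retained only because of the “unfortunate integer” `0x4004`, and the collector has no safe way to tell, which is also why it cannot relocate the block.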
Semispace
  - Fast bump pointer allocation
  - Requires copying collection
  - Cannot incrementally reclaim memory, must free en masse
  - Reserves 1/2 the heap to copy into, in case all objects are live
  [Figure: heap split into “from space” and “to space”]
Semispace
  - Mark phase:
    - copies object when collector first encounters it
    - installs forwarding pointers
    - performs transitive closure, updating pointers as it goes
    - reclaims “from space” en masse
    - start allocating again into “to space”
  [Figure: animation of live objects being copied from “from space” to “to space”]
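The copying steps can be sketched as follows. The slides do not say in which order objects are traversed; this sketch swaps in a Cheney-style breadth-first scan as one common choice. `Obj`, `SemispaceHeap`, `alloc`, `collect`, and `demo` are hypothetical names, the root set is a dictionary whose values the collector rewrites, and objects are modelled as Python objects with a simulated address rather than raw memory.

```python
class Obj:
    def __init__(self, name, size, addr):
        self.name = name
        self.size = size
        self.addr = addr      # simulated address within the current semispace
        self.refs = []
        self.forward = None   # forwarding pointer, set once the object is copied


class SemispaceHeap:
    def __init__(self, semispace_bytes):
        self.limit = semispace_bytes
        self.bump = 0         # bump pointer: next free address
        self.objects = []     # objects in address order

    def alloc(self, name, size):
        """Bump pointer allocation: check the limit, then advance the pointer."""
        if self.bump + size > self.limit:
            raise MemoryError("semispace full: run collect() first")
        obj = Obj(name, size, self.bump)
        self.bump += size
        self.objects.append(obj)
        return obj

    def collect(self, roots):
        to_space = SemispaceHeap(self.limit)

        def copy(obj):
            if obj is None:
                return None
            if obj.forward is None:            # first encounter: copy it
                clone = to_space.alloc(obj.name, obj.size)
                clone.refs = list(obj.refs)    # still from-space pointers
                obj.forward = clone            # install forwarding pointer
            return obj.forward

        for key in list(roots):
            roots[key] = copy(roots[key])
        scan = 0
        while scan < len(to_space.objects):    # transitive closure
            obj = to_space.objects[scan]
            obj.refs = [copy(r) for r in obj.refs]  # update pointers
            scan += 1
        # From space is reclaimed en masse; allocation resumes in to space.
        self.bump, self.objects = to_space.bump, to_space.objects


def demo():
    heap = SemispaceHeap(64)
    a = heap.alloc("a", 16)
    d = heap.alloc("d", 16)   # garbage: nothing reachable points to it
    b = heap.alloc("b", 16)
    c = heap.alloc("c", 16)
    a.refs, b.refs, d.refs = [b, c], [c], [a]
    roots = {"r": a}
    heap.collect(roots)
    return heap, roots
```

After `demo()`, the survivors sit contiguously at the start of the new space in the order the scan reached them, and the shared object `c` has been copied only once because the second visit finds its forwarding pointer.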
Semispace
  - Notice:
    - fast allocation
    - locality of contemporaneously allocated objects
    - locality of objects connected by pointers
    - wasted space
Marksweep
  - Free-lists organized by size
    - blocks of same size, or
    - individual objects of same size
  - Most objects are small (< 128 bytes)
  [Figure: heap with free lists for size classes 4, 8, 12, 16, ..., 128, ...]
Marksweep
  - Allocation
    - Grab a free object off the free list
    - No more memory of the right size triggers a collection
    - Mark phase: find the live objects
    - Sweep phase: put free ones on the free list
  [Figure: heap with free lists for size classes 4, 8, 12, 16, ..., 128, ...]
Marksweep
  - Mark phase
    - Transitive closure marking all the live objects
  - Sweep phase
    - sweep the memory for free objects, populating the free lists
    - can be made incremental by organizing the heap in blocks and sweeping one block at a time on demand
  [Figure: heap with free lists for size classes 4, 8, 12, 16, ..., 128, ...]
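Free-list allocation with a collection on exhaustion can be sketched like this. It is a toy model under assumptions: `Cell`, `MarkSweepHeap`, `size_class`, and `cells_per_class` are hypothetical names, the size classes are taken to be the multiples of 4 up to 128 bytes suggested by the figure, and the sweep covers the whole heap at once rather than one block at a time.

```python
SIZE_CLASSES = list(range(4, 129, 4))  # 4, 8, 12, ..., 128 bytes


def size_class(nbytes):
    """Round a request up to the nearest size class."""
    if not 0 < nbytes <= 128:
        raise ValueError("only small objects are modelled here")
    return (nbytes + 3) // 4 * 4


class Cell:
    def __init__(self, size):
        self.size = size
        self.in_use = False
        self.marked = False
        self.refs = []


class MarkSweepHeap:
    def __init__(self, cells_per_class):
        self.cells = [Cell(s) for s in SIZE_CLASSES for _ in range(cells_per_class)]
        self.free_lists = {s: [] for s in SIZE_CLASSES}
        for cell in self.cells:
            self.free_lists[cell.size].append(cell)

    def alloc(self, nbytes, roots):
        """Grab a cell off the matching free list; collect if it is empty."""
        size = size_class(nbytes)
        if not self.free_lists[size]:
            self.collect(roots)
        if not self.free_lists[size]:
            raise MemoryError("no free cell of size %d" % size)
        cell = self.free_lists[size].pop()
        cell.in_use = True
        cell.refs = []
        return cell

    def collect(self, roots):
        # Mark phase: transitive closure over everything reachable from roots.
        worklist = list(roots)
        while worklist:
            cell = worklist.pop()
            if not cell.marked:
                cell.marked = True
                worklist.extend(cell.refs)
        # Sweep phase: unmarked cells in use go back on their free list.
        for cell in self.cells:
            if cell.marked:
                cell.marked = False
            elif cell.in_use:
                cell.in_use = False
                cell.refs = []
                self.free_lists[cell.size].append(cell)
```

For example, with `cells_per_class=2`, allocating a rooted 10-byte object and an unrooted 12-byte object exhausts the 12-byte class; the next request in that class triggers a collection and reuses the unrooted cell in place, since live objects are never moved.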
Marksweep
  - space efficiency
  - incremental object reclamation
  - relatively slower allocation time
  - poor locality of contemporaneously allocated objects
How do these differences play out in practice?
  - Marksweep
    - space efficiency
    - incremental object reclamation
    - relatively slower allocation time
    - poor locality of contemporaneously allocated objects
  - Semispace
    - fast allocation
    - locality of contemporaneously allocated objects
    - locality of objects connected by pointers
    - wasted space
Methodology [SIGMETRICS 2004]
  - Compare Marksweep (MS) and Semispace (SS)
    - Mutator time, GC time, total time
  - Jikes RVM & MMTk
    - replay compilation
    - measure second iteration without compilation
  - Platforms
    - 1.6GHz G5 (PowerPC 970)
    - 1.9GHz AMD Athlon 2600+
    - 2.6GHz Intel P4
    - Linux 2.6.0 with perfctr patch & libraries
    - Separate accounting of GC & mutator counts
  - SPECjvm98 & pseudojbb
Allocation Mechanism
  - Bump pointer: ~70 bytes of IA32 instructions, 726 MB/s
  - Free list: ~140 bytes of IA32 instructions, 654 MB/s
  - Bump pointer is 11% faster in a tight loop
    - < 1% in a practical setting
    - No significant difference (?)
[Figure: Mutator Time; panels for jess, javac, pseudojbb, and the geometric mean]
[Figure: Garbage Collection Time; panels for javac, pseudojbb, jess, and the geometric mean]
[Figure: Total Time; panels for javac, pseudojbb, jess, and the geometric mean]
[Figure: MS/SS Crossover; one plot each for 1.6GHz PPC, 1.9GHz AMD, 2.6GHz P4, and 3.2GHz P4, plus a summary plot with all four platforms labeled along a locality/space axis]