Myths & Realities: The Performance Impact of Garbage Collection

Steve Blackburn, Department of Computer Science, Australian National University
Perry Cheng, TJ Watson Research Center, IBM Research
Kathryn McKinley, Department of Computer Sciences, University of Texas at Austin

Monday, April 13, 2015
Myths & Realities: The performance impact of garbage collection

Background
No prior apples-to-apples comparisons
MMTk
– Canonical policies implemented (SS, MS, RC, genX, etc.)
– Shared mechanisms
– Good performance (match/beat old Watson GCs)
– Ideal platform for apples-to-apples comparisons

Some Questions
Architecture
– How well do modern OO languages play to modern architectures?
Collection
– Is generational GC “a waste of time”?
– Are write barriers expensive?
Allocation
– Free list or bump pointer?
“Locality is everything”
– Really???
– Is it different for young & old? Why?
Locality and architecture
– What is the impact, what is the trend?

Methodology
Jikes RVM & MMTk
Platforms
– 1.6GHz G5 (PowerPC 970)
– 1.9GHz AMD Athlon
– GHz Intel P4
Linux with perfctr patch & libraries
– Separate accounting of GC & Mutator perf counts
SPECjvm98 & pseudojbb

Architecture
Relative Performance
[Figure: relative performance on Athlon, P4 2.6GHz and G5 1.6GHz for compress, jess, raytrace, db, javac, mtrt, jack and pseudojbb]

Architecture - Q & A
How big is the mismatch between modern arch & modern languages???

Allocation
Allocation Choices
Bump pointer
– ~70 bytes of IA32 instructions, 726MB/s
Free list
– ~140 bytes of IA32 instructions, 654MB/s
Bump pointer 11% faster in tight loop
– < 1% in practical setting
– No significant difference (?)
Second order effects?
– Locality??
– Collection mechanism??

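To make the two allocation choices concrete, here is a minimal sketch of both fast paths over a simulated address space. It is not MMTk code: the class names, the 8-byte alignment, and the four size classes are all assumptions chosen for illustration. The bump pointer hands out adjacent addresses in allocation order, while the segregated free list places each object in the region of its size class. The 11% figure on the slide follows from the two throughputs: 726 / 654 ≈ 1.11.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;

/** Toy allocators over a simulated address space (illustrative only, not MMTk). */
public class AllocSketch {

    /** Bump pointer: one cursor, one limit check, contiguous addresses. */
    static final class BumpPointer {
        private int cursor;
        private final int limit;

        BumpPointer(int start, int limit) {
            this.cursor = start;
            this.limit = limit;
        }

        /** Returns the address of the new object, or -1 if the region is full. */
        int alloc(int bytes) {
            int aligned = (bytes + 7) & ~7; // round up to 8 bytes (assumed alignment)
            if (cursor + aligned > limit) return -1;
            int address = cursor;
            cursor += aligned;
            return address;
        }
    }

    /** Segregated free list: one queue of free cells per size class. */
    static final class FreeList {
        private static final int[] CLASS_BYTES = {16, 32, 64, 128}; // assumed classes
        private final List<ArrayDeque<Integer>> freeCells = new ArrayList<>();

        /** Carves cellsPerClass cells for each size class, one region per class. */
        FreeList(int start, int cellsPerClass) {
            int base = start;
            for (int size : CLASS_BYTES) {
                ArrayDeque<Integer> cells = new ArrayDeque<>();
                for (int i = 0; i < cellsPerClass; i++) cells.addLast(base + i * size);
                freeCells.add(cells);
                base += cellsPerClass * size;
            }
        }

        /** Index of the smallest class that fits, or -1 if the request is too large. */
        private static int sizeClass(int bytes) {
            for (int c = 0; c < CLASS_BYTES.length; c++) {
                if (bytes <= CLASS_BYTES[c]) return c;
            }
            return -1;
        }

        /** Returns a free cell of the fitting class, or -1 if none is available. */
        int alloc(int bytes) {
            int c = sizeClass(bytes);
            if (c < 0) return -1;
            Integer address = freeCells.get(c).pollFirst();
            return address == null ? -1 : address;
        }

        /** Returns a cell to the head of its class list so it is reused first. */
        void free(int address, int bytes) {
            int c = sizeClass(bytes);
            if (c >= 0) freeCells.get(c).addFirst(address);
        }
    }

    public static void main(String[] args) {
        BumpPointer bp = new BumpPointer(0, 1024);
        FreeList fl = new FreeList(0, 4);
        System.out.println("bump: " + bp.alloc(10) + ", " + bp.alloc(20));
        System.out.println("free list: " + fl.alloc(10) + ", " + fl.alloc(20));
    }
}
```

The free-list path does more work per allocation (size-class lookup plus a list operation), which is in line with the larger instruction footprint quoted above. Note also that two objects allocated back to back end up in different regions whenever their sizes fall in different classes.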
Implications for Locality
Compare SS & MS mutator
– Mutator time = total time - GC time
– Mutator memory performance: L1, L2 & TLB

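The subtraction used for mutator time can be sketched as a small accounting helper. This is a hypothetical class, not the Jikes RVM or perfctr interface: it takes timestamps as arguments (in practice they might come from `System.nanoTime()`), accumulates the time spent between GC begin and end events, and derives mutator time as the remainder. The same pattern would apply to any monotonically increasing counter, such as a cache-miss count.

```java
/** Hypothetical helper that splits elapsed time into GC and mutator shares. */
public class PhaseTimer {
    private final long startNanos; // timestamp at which measurement began
    private long gcNanos;          // total time attributed to the collector
    private long gcStart = -1;     // start of the collection in progress, if any

    PhaseTimer(long nowNanos) {
        this.startNanos = nowNanos;
    }

    /** Call when a collection starts. */
    void gcBegin(long nowNanos) {
        gcStart = nowNanos;
    }

    /** Call when a collection ends; charges the interval to the collector. */
    void gcEnd(long nowNanos) {
        gcNanos += nowNanos - gcStart;
        gcStart = -1;
    }

    long gcNanos() {
        return gcNanos;
    }

    /** Mutator time = total time - GC time. */
    long mutatorNanos(long nowNanos) {
        return (nowNanos - startNanos) - gcNanos;
    }

    public static void main(String[] args) {
        PhaseTimer timer = new PhaseTimer(System.nanoTime());
        timer.gcBegin(System.nanoTime());
        timer.gcEnd(System.nanoTime());
        System.out.println("mutator ns: " + timer.mutatorNanos(System.nanoTime()));
    }
}
```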
[Figures: per-benchmark plots for jess, javac, pseudojbb and db]

Locality
Bump Pointer & Free List
Is the locality differential age-dependent?
Re-run experiment with GenCopy & GenMS
– Generational variants of MarkSweep & SemiSpace
– Young objects treated identically
– Mature objects either SemiSpace or MarkSweep

Bump Pointer & Free List
Bump Pointer & Free List
Why?
Mature space locality?
Nursery absorbs most allocs – lower frag
Relatively frequent copying in SS
Contiguous allocation in nursery?

Bump Pointer & Free List
Why?
Mature space locality
Nursery absorbs most allocs – lower frag
Relatively frequent copying in SS
Contiguous allocation in nursery

Bump Pointer & Free List
Run SS & MS in “infinite” heap

Bump Pointer & Free List
Run SS & MS in “infinite” heap
Infinite heap does not degrade locality (!?)
– Exceptions: jess (degrades), db (improves). Why?
– Is spatial locality unimportant in mature space???

BP & FL Locality Implications
Is spatial locality unimportant in mature space??
– No [Huang et al. OOPSLA 2004]
– But perhaps temporal locality is more significant
Seems clear contiguous allocation is good
– Vast majority of objects < cache line
– h/w prefetcher may be significant
Hard to improve over alloc order, easy to mess up?
– Unlikely to be true: MarkSweep < Compacting < SemiSpace

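The point about small objects and contiguous allocation can be illustrated with a toy model that counts how many cache lines a group of objects spans. Everything here is an assumption for illustration: the 64-byte line size, the class and method names, and the example addresses. It is not a measurement and does not model a hardware prefetcher.

```java
import java.util.HashSet;
import java.util.Set;

/** Toy model: how many cache lines do a few small objects span? */
public class CacheLineSketch {
    static final int LINE_BYTES = 64; // assumed cache line size

    /** Counts the distinct cache lines covered by objects at the given addresses. */
    static int linesTouched(int[] addresses, int[] sizes) {
        Set<Integer> lines = new HashSet<>();
        for (int i = 0; i < addresses.length; i++) {
            int first = addresses[i] / LINE_BYTES;
            int last = (addresses[i] + sizes[i] - 1) / LINE_BYTES;
            for (int line = first; line <= last; line++) lines.add(line);
        }
        return lines.size();
    }

    /** Addresses produced by packing objects back to back in allocation order. */
    static int[] contiguous(int[] sizes) {
        int[] addresses = new int[sizes.length];
        int cursor = 0;
        for (int i = 0; i < sizes.length; i++) {
            addresses[i] = cursor;
            cursor += sizes[i];
        }
        return addresses;
    }

    public static void main(String[] args) {
        int[] sizes = {16, 32, 16};
        // Hypothetical size-segregated placement: the 32-byte object lives in
        // a different region from the two 16-byte objects.
        int[] segregated = {0, 1024, 16};
        System.out.println("contiguous: " + linesTouched(contiguous(sizes), sizes));
        System.out.println("segregated: " + linesTouched(segregated, sizes));
    }
}
```

In this model, objects allocated together and smaller than a line share lines under contiguous placement, whereas size-segregated placement can spread the same objects over more lines.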
Locality & Architecture
MS/SS Crossover: 1.6GHz PPC
MS/SS Crossover: 1.9GHz AMD
MS/SS Crossover: 2.6GHz P4
MS/SS Crossover: 3.2GHz P4
MS/SS Crossover
[Figure: MS/SS crossover points for the 1.6GHz, 1.9GHz, 2.6GHz and 3.2GHz platforms; labels: locality, space]

Conclusions
Need for (re)evaluation of GC performance
– Key GC insights > 20yrs old
– Technology has changed
– Absence of apples-to-apples comparisons
– Highly architecturally sensitive
MMTk + perf counters
– High performance infrastructure
– Multiple GCs, shared mechanisms
Some myths exposed & new realities