© Imperial College London Exploring the Barrier to Entry Incremental Generational Garbage Collection for Haskell Andy Cheadle & Tony Field Imperial College.

Presentation transcript:

Exploring the Barrier to Entry: Incremental Generational Garbage Collection for Haskell
Andy Cheadle & Tony Field, Imperial College London
Simon Marlow & Simon Peyton Jones, Microsoft Research, Cambridge, UK
Lyndon While, The University of Western Australia, Perth

Page 2: Introduction
We focus on Haskell, with the intent of building a garbage collector for GHC that is:
– Efficient
– Barrierless
– Hybrid
– Incremental
– Generational
– …
Investigate pause-time bounds and mutator utilisation.
Explore application to other dynamic dispatch systems.

Page 3: Highlights
Improving Non-Stop Haskell
– Incremental GC read-barrier optimisation without the per-object space overhead
Bridging the Generation Gap
– Generational GC write-barrier optimisation
Consistent Mutator Utilisation
– Time-based versus work-based scheduling

Page 4: Barriers: Friend or Foe – Summary
Blackburn & Hosking, ISMM 2004
Conditional read-barrier
– AMD: 21.24%, P4: 15.91%, PPC: 6.49%
– Incremental GC: standard Baker read-barrier
Unconditional read-barrier
– AMD: 8.05%, P4: 5.04%, PPC: 0.85%
– Brooks indirection read-barrier
– Metronome ‘Eager’ barrier: ~4%
– BUT: space overhead -> increased GC count
Must consider GC cost!
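The difference between the two barrier styles can be sketched in a few lines of Python. This is only a toy model: the `Obj` class, the `in_from_space` flag and the function names are assumptions made for illustration and are not taken from the slides or from any real collector.

```python
class Obj:
    """Toy heap object; every field here is a hypothetical stand-in."""
    def __init__(self, value, in_from_space=False):
        self.value = value
        self.in_from_space = in_from_space
        # Brooks-style indirection: each object carries a forwarding
        # pointer that initially refers to the object itself. This extra
        # field is the per-object space overhead.
        self.forward = self


def evacuate(obj):
    """Copy obj to to-space once and leave a forwarding pointer behind."""
    if obj.forward is not obj:          # already copied
        return obj.forward
    copy = Obj(obj.value, in_from_space=False)
    obj.forward = copy
    return copy


def baker_read(obj):
    """Conditional barrier: test on every load, copy only if needed."""
    if obj.in_from_space:
        return evacuate(obj)
    return obj


def brooks_read(obj):
    """Unconditional barrier: always follow the forwarding pointer."""
    return obj.forward
```

The conditional barrier pays for a test and branch on every load, while the unconditional one pays for an extra word per object and an extra dereference.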

Page 5: Non-Stop Haskell
Implementing Baker’s incremental collector typically introduces high overheads
– The software read-barrier
We have shown that this can be done efficiently in systems with dynamic dispatching
Caveat: dynamic dispatching already “costs” something; we show that incremental garbage collection comes at virtually no extra cost.

Page 6: Dynamic Dispatch and the STG Machine
The STG machine is a model for the compilation of lazy functional languages
All objects are represented on the heap as closures
To compute function ‘f’ applied to arguments ‘a b c d’, jump to the entry code
[Diagram: a heap closure with heap-pointer words and immediate words; its first word points to a static info table holding the layout (2, 2), other fields and the entry code]
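As a rough illustration of this closure layout, the following Python sketch models a closure as an info pointer plus a payload, and "entering" a closure as a jump through the info pointer. The class names, the `enter` function and the example thunk are hypothetical; the real STG machine does this in compiled code, not with Python objects.

```python
class InfoTable:
    """Static, per-closure-type data: a name plus the entry code."""
    def __init__(self, name, entry_code):
        self.name = name
        self.entry_code = entry_code    # callable that receives the closure


class Closure:
    """Heap object: the first word is the info pointer, the rest is payload."""
    def __init__(self, info, payload):
        self.info = info
        self.payload = payload


def enter(closure):
    """Dynamic dispatch: jump through the info pointer to the entry code."""
    return closure.info.entry_code(closure)


# Hypothetical thunk type whose entry code sums its payload.
SUM_INFO = InfoTable("sum_thunk", lambda c: sum(c.payload))
```

Because every use of a closure goes through `enter`, replacing the info pointer changes what happens on the next entry without any extra test at the call site.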

Page 7: The Read-Barrier Invariant
[Diagram: a stack whose top frame 1 is scavenged, while frames 2 and 3 (return addresses 2r and 3r) are unscavenged; from-space and to-space are shown, with Problem 1 and Problem 2 marked]

Page 8: Invariant Problem 1: Scavenging Closures
When the garbage collector is on, make info pointers point to code that scavenges evacuated closures before entering them
At all other times the system operates with no read barrier!
[Diagram: a closure whose info pointer refers to self-scavenging code]

Page 9
Q: How do we restore the original info pointer?
A: We remember it when the closure is evacuated
Non-Stop Haskell: use an extra word in to-space
Note: the space overhead applies only to objects copied from from-space, but effectively reduces to-space by 30%
Freshly allocated objects carry no space overhead
[Diagram: the evacuated closure gains an extra word at offset -1; info tables for the entry code and the self-scavenging code are shown]
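The extra-word scheme can be sketched as follows. This is a toy Python model under stated assumptions: `Closure.saved_info` stands in for the extra word, `in_to_space` and `forward` are illustrative bookkeeping fields, and none of the names correspond to GHC's runtime.

```python
class InfoTable:
    def __init__(self, name, entry_code):
        self.name = name
        self.entry_code = entry_code


class Closure:
    def __init__(self, info, payload, in_to_space=False):
        self.info = info
        self.payload = payload          # closures or plain values
        self.in_to_space = in_to_space
        self.saved_info = None          # the "extra word" (evacuated copies only)
        self.forward = None             # forwarding pointer left in from-space


def enter(c):
    return c.info.entry_code(c)


def evacuate(c):
    """Copy a from-space closure; the copy gets the self-scavenging info pointer."""
    if not isinstance(c, Closure) or c.in_to_space:
        return c
    if c.forward is not None:
        return c.forward
    copy = Closure(SELF_SCAV_INFO, list(c.payload), in_to_space=True)
    copy.saved_info = c.info            # remember the original info pointer
    c.forward = copy
    return copy


def self_scavenge_entry(c):
    """Scavenge the closure, restore its info pointer, then enter it normally."""
    c.payload = [evacuate(f) for f in c.payload]
    c.info, c.saved_info = c.saved_info, None
    return enter(c)


SELF_SCAV_INFO = InfoTable("self_scav", self_scavenge_entry)
```

Once a closure has been entered during collection, its info pointer is back to normal, so later entries pay nothing.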

Page 10
Q: How do we restore the original info pointer?
A: We remember it when the closure is evacuated
In production: specialise every closure type at compile time
The runtime space overhead is replaced by a static one of ~25%
[Diagram: the self-scavenging code for a specialised closure type ends with a JMP to the original entry code]

Page 11: Invariant Problem 2: Stack Scavenging
STG machine stack frames look just like closures
Before returning to the caller frame we ‘hijack’ the caller’s return address, replacing it with a pointer to self-scavenging code for that frame
[Diagram: stack snapshots before and after a return. Frame 1 is scavenged and frames 2 and 3 are unscavenged; after the return, frame 2 is scavenged. The hijacked return code reads “scav; mod 3r; update; return” and “scav; mod 4r; update; return”, in place of the original “update; return”]
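The hijacking step can be modelled with a small Python sketch. It is a simplification under assumptions: the stack is a list with the top frame last, `Frame.return_code` stands in for a return address, and `start_gc`, `hijack` and `do_return` are invented names rather than anything in GHC.

```python
class Frame:
    def __init__(self, name, return_code):
        self.name = name
        self.return_code = return_code  # code run when control returns here
        self.scavenged = False


def hijack(frame, stack, log):
    """Replace an unscavenged frame's return code with self-scavenging code."""
    if frame.scavenged:
        return
    original = frame.return_code

    def self_scav(f):
        f.scavenged = True              # scav: scavenge this frame
        log.append("scav " + f.name)
        if len(stack) > 1:              # mod: hijack the next frame down
            hijack(stack[-2], stack, log)
        f.return_code = original        # restore the real return address
        return original(f)

    frame.return_code = self_scav


def start_gc(stack, log):
    """Scavenge only the top frame, then hijack its caller."""
    top = stack[-1]
    top.scavenged = True
    log.append("scav " + top.name)
    if len(stack) > 1:
        hijack(stack[-2], stack, log)


def do_return(stack):
    """Pop the finished frame and return into its caller."""
    stack.pop()
    if stack:
        caller = stack[-1]
        return caller.return_code(caller)
    return None
```

Only one frame beyond the running one is ever prepared in advance, so the stack is scavenged lazily as the program returns through it.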

Page 12: Background Scavenging
GHC’s heap is block allocated. So, scavenge at:
– Every Allocation (EA)
– Every Block allocation (EB)
Reduce forced-completions via block chaining
Incremental scavenger pauses are allocation-dependent
Exploit GHC’s lightweight scheduler to implement a time-scheduled scavenger (Jikes RVM Metronome)
– Consistent mutator utilisation
– Increase in forced-completions due to allocation bursts
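A toy model helps show why allocation-driven scavenging gives uneven utilisation while time-driven scavenging does not. Everything in this Python sketch is an assumption for illustration: the tick-based trace, the cost factor `k` and the fixed `quantum` are invented, and the model ignores forced completions entirely.

```python
def work_based(allocs, k):
    """Collector time in each tick is proportional to the allocation in it.

    allocs: units allocated by the mutator per tick (assumed input).
    k: hypothetical collector time per unit allocated.
    Returns (mutator_time, collector_time) pairs, one per tick.
    """
    return [(1.0, k * a) for a in allocs]


def time_based(allocs, quantum):
    """Collector gets the same fixed time slice in every tick."""
    return [(1.0, quantum) for _ in allocs]


def utilisation(trace):
    """Fraction of each tick's total time spent in the mutator."""
    return [m / (m + g) for m, g in trace]


allocs = [0, 0, 8, 0]                       # an allocation burst in one tick
wb = utilisation(work_based(allocs, 0.5))   # collector time follows the burst
tb = utilisation(time_based(allocs, 1.0))   # same total collector time, spread evenly
```

In this trace both schedules do the same total amount of collector work, but the work-based one concentrates it in the burst tick, so its worst-case utilisation is lower.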

Page 13: Results – Binary Sizes

Page 14: Results – Runtimes

Page 21: The Generational Write-barrier
[Diagram: generation N and generation N – 1, the root set, an inter-generational pointer, and the root set for generation N – 1]
Depending on the number of updates, the write-barrier can impose an overhead of 8–24% (NJ/ML and Clean).
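A conventional generational write barrier can be sketched as a test on every pointer store. The Python below is a generic textbook-style model, not GHC's implementation: the `Heap` and `Obj` classes, the generation numbering (0 is youngest) and the remembered set are all assumptions for illustration.

```python
class Obj:
    def __init__(self, generation):
        self.generation = generation    # 0 = youngest (assumed numbering)
        self.fields = {}


class Heap:
    def __init__(self):
        # Older objects that point into a younger generation; these act
        # as extra roots when the younger generation is collected.
        self.remembered = set()

    def write(self, obj, field, target):
        """Store a pointer, recording old-to-young references."""
        # The barrier: this test runs on every pointer update.
        if isinstance(target, Obj) and obj.generation > target.generation:
            self.remembered.add(obj)
        obj.fields[field] = target
```

The cost comes from running the test on every update, whether or not the store actually creates an inter-generational pointer.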

Page 22: Bridging the Generation Gap
We implement in GHC a mechanism that again exploits dynamic dispatch to eliminate unnecessary write-barriers:
[Diagram: generation 0 holds THUNK_SELECT, THUNK_1 and THUNK_2; the root set and the root set for generation 0 are shown. Step: promote to generation 1]

Page 23: Bridging the Generation Gap
[Diagram: generations 0 and 1 with THUNK_SELECT, THUNK_1, THUNK_2 and IND_PRE_UPD; the root set and the root set for generation 0 are shown. Step: force THUNK selectee evaluation]

Page 24: Bridging the Generation Gap
[Diagram: generations 0 and 1 with THUNK_SELECT, THUNK_1, THUNK_2, IND_UPD and IND_PRE_UPD; the root set and the root set for generation 0 are shown]

Page 25: Bridging the Generation Gap
[Diagram: generations 0 and 1 with THUNK_SELECT, THUNK_1, IND_OLDGEN, IND_UPD, IND_PRE_UPD and CONSTR_2; the root set, the root set for generation 0 and an inter-generational pointer are shown]
Preliminary benchmarks suggested a reduction of 5–9%; in production it is actually around 2–3%.

Page 26: Ongoing Work
Application of read-barrier optimisation to Java
Unfortunately Java programs are not “pure” in their use of dynamic dispatch
– Field access via get() / set() methods
– Inlining must be disallowed
Investigating within Jikes RVM:
– Inter- and intra-class inlining
– Code bloat arising from get() / set() methods, restricted inlining and additional per-class VMT
– Cost of VMT TIB pointer flip

Page 27: Conclusion
Removal of collector-specific barriers and tests:
– Yields cheaper ‘vanilla’ collectors
– Allows the efficient hybridisation of multiple collector algorithms
Time-based scheduling is massively attractive, but:
– Complete decoupling from the allocator is problematic*
A hybrid approach looks promising:
– Parameterised by mutator utilisation
– Sensitive to allocation rate
Elimination of per-object overhead:
– Mandatory for our production collector