Portable, mostly-concurrent, mostly-copying GC for multi-processors
Tony Hosking, Secure Software Systems Lab, Purdue University

Platform assumptions
– Symmetric multi-processor (SMP/CMP)
– Multiple mutator threads
– (Large heaps)

Desirable properties
– Maximize throughput
– Minimize collector pauses
– Scalability

Exploiting parallelism
– Avoid contention
– (Mostly-)Concurrent allocation
– (Mostly-)Concurrent collection

Concurrent allocation
– Use thread-private allocation “pages”
– Threads contend for free pages
– Each thread allocates from its own page
  – multiple small objects per page, or
  – multiple pages per large object
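The allocation scheme on this slide can be sketched in Python as follows. This is a minimal illustration, not the Modula-3 run-time's allocator: `PagePool`, `ThreadAllocator`, and the 4096-byte `PAGE_SIZE` are hypothetical names and values chosen for the example, and the sketch does not model contiguity of multi-page objects or the return of pages to the pool.

```python
import threading

PAGE_SIZE = 4096  # bytes per page; an assumed value, not taken from the slides


class PagePool:
    """Shared pool of free pages: the only place threads contend."""

    def __init__(self, num_pages):
        self._lock = threading.Lock()
        self._free = list(range(num_pages))

    def take(self, count=1):
        """Remove and return `count` free page numbers under the lock."""
        with self._lock:
            if len(self._free) < count:
                raise MemoryError("out of free pages")
            pages = self._free[:count]
            del self._free[:count]
            return pages


class ThreadAllocator:
    """Bump allocator over a page owned by exactly one thread (no locking)."""

    def __init__(self, pool):
        self._pool = pool
        self._page = None
        self._offset = PAGE_SIZE  # forces a page fetch on first allocation

    def alloc(self, size):
        """Return a (page, offset) pair for a new object of `size` bytes."""
        if size > PAGE_SIZE:
            # Large object: claim as many whole pages as it needs.
            count = -(-size // PAGE_SIZE)  # ceiling division
            return (self._pool.take(count)[0], 0)
        if self._offset + size > PAGE_SIZE:
            # Current page is full: contend once for a fresh private page.
            self._page = self._pool.take()[0]
            self._offset = 0
        addr = (self._page, self._offset)
        self._offset += size
        return addr
```

Each mutator thread would own one `ThreadAllocator`; the lock is taken only when a page runs out, so small allocations on the common path touch no shared state.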

Concurrent collection: The tricolour abstraction
– Black: “live”, scanned, cannot refer to white
– Grey: “live” wavefront, still to be scanned, may refer to any colour
– White: hypothetical garbage

Garbage collection
– White = whole heap
– Shade root targets grey
– While grey nonempty:
  – Shade one grey object black
  – Shade its white children grey
– At end, white objects are garbage
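The loop on this slide can be sketched in Python as follows. The heap representation (a dict from object id to a list of child ids) and the function name `mark` are assumptions made for the example; a real collector would keep colours in object headers or page tables rather than in sets.

```python
def mark(heap, roots):
    """Tricolour marking over `heap` (dict: object id -> list of child ids).

    Returns the set of objects still white at the end, i.e. the garbage.
    """
    white = set(heap)   # white = whole heap
    grey = []           # wavefront: reached but not yet scanned
    black = set()       # scanned; holds no references to white objects

    def shade(obj):
        # Only white objects change colour; grey and black stay as they are.
        if obj in white:
            white.remove(obj)
            grey.append(obj)

    for root in roots:  # shade root targets grey
        shade(root)

    while grey:         # while grey nonempty
        obj = grey.pop()
        for child in heap[obj]:
            shade(child)    # shade its white children grey
        black.add(obj)      # shade the scanned object black

    return white
```

For example, with `heap = {"a": ["b"], "b": ["a", "c"], "c": [], "d": ["e"], "e": ["d"]}` and the single root `"a"`, the cycle between `"d"` and `"e"` is never reached and both stay white.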

Copying collection
– Partition white from black by copying
– Reclaim white partition wholesale
– At next GC, “flip” black to white
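A minimal Python sketch of partitioning by copying is given below. It uses a Cheney-style breadth-first scan, which is an assumption of this example (the slide does not name a traversal order), and models the from-space as a dict from address to a list of child addresses; `copy_collect` and the forwarding table are illustrative names.

```python
def copy_collect(from_space, roots):
    """Copy everything reachable from `roots` out of `from_space`.

    from_space: dict mapping old address -> list of child old addresses.
    Returns (to_space, new_roots); to_space is a list indexed by new address.
    """
    to_space = []   # survivors; whatever stays behind is reclaimed wholesale
    forward = {}    # old address -> new address (forwarding pointers)

    def copy(old):
        # Copy an object once; afterwards just return its forwarding address.
        if old not in forward:
            forward[old] = len(to_space)
            to_space.append(list(from_space[old]))  # fields still hold old addrs
        return forward[old]

    new_roots = [copy(r) for r in roots]

    scan = 0
    while scan < len(to_space):
        # Objects at index >= scan are grey: copied but not yet scanned.
        to_space[scan] = [copy(child) for child in to_space[scan]]
        scan += 1   # the object at the old scan index is now black

    return to_space, new_roots
```

After the call, `from_space` contains only garbage and stale copies, so it can be discarded as a unit; at the next collection the roles of the two spaces are exchanged.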

Incremental collection (diagram: mutator threads)

Concurrent collection (diagram: mutator threads, background GC thread)

Concurrent mutators
– Mutation changes reachability during GC
– Loss of black/grey reference is safe
  – Non-white object losing its last reference will be garbage at next GC
– New reference from black to white
  – New reference may make target live
  – Collector may never see new reference
– Mutations may require compensation

Compensation options
– Prevent mutator from creating black-to-white references
  – write barrier on black
  – read barrier on grey to prevent mutator obtaining white refs
– Prevent destruction of any path from a grey object to a white object without telling GC
  – write barrier on grey
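The first option, a write barrier on black, can be sketched in Python as follows. The sketch shades the white target grey whenever a reference is stored into a black object (often called an insertion barrier); the `Obj` and `Collector` classes and their method names are hypothetical, and the collector is stepped by hand to stand in for concurrent execution.

```python
WHITE, GREY, BLACK = "white", "grey", "black"


class Obj:
    def __init__(self, name):
        self.name = name
        self.colour = WHITE
        self.fields = {}


class Collector:
    def __init__(self):
        self.grey = []  # worklist of grey objects

    def shade(self, obj):
        """Turn a white object grey; other colours are left alone."""
        if obj is not None and obj.colour == WHITE:
            obj.colour = GREY
            self.grey.append(obj)

    def write(self, src, field, target):
        """Mutator store with a write barrier on black sources."""
        if src.colour == BLACK:
            self.shade(target)  # never leave a black-to-white reference
        src.fields[field] = target

    def step(self):
        """Scan one grey object: shade its children, then blacken it."""
        obj = self.grey.pop()
        for child in obj.fields.values():
            self.shade(child)
        obj.colour = BLACK
```

A short scenario shows why compensation is needed. Suppose `a` refers to `b` and `b` refers to `c`, and the collector has already blackened `a`. If the mutator then stores `c` into `a` and clears the reference in `b`, the only path to `c` runs through a black object; without the barrier in `write`, `c` would stay white and be reclaimed while still reachable.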

Mostly-copying GC [Bartlett]
– Copying collection with ambiguous roots
  – Uncooperative compilers
  – Untidy references
  – Explicit pinning
– Pin ambiguously-referenced objects
  – Shade their page grey without copying
– Assume heap accuracy
  – Copy remaining heap-referenced objects
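The pinning decision can be sketched in Python as follows. This is an illustration under stated assumptions: addresses are plain integers, pages are `PAGE_SIZE` bytes (an assumed value), and `pin_pages` and `plan_collection` are hypothetical helpers that only classify objects; they do not perform the copy.

```python
PAGE_SIZE = 4096  # bytes per page; an assumed value, not taken from the slides


def pin_pages(ambiguous_words, heap_pages):
    """Conservatively pin every heap page that an ambiguous word might point into.

    ambiguous_words: raw integers found in stacks/registers (may not be pointers).
    heap_pages: set of page numbers currently allocated in the heap.
    """
    pinned = set()
    for word in ambiguous_words:
        page = word // PAGE_SIZE
        if page in heap_pages:  # could be a pointer, so the object must not move
            pinned.add(page)
    return pinned


def plan_collection(ambiguous_words, heap_pages, heap_referenced):
    """Split accurately-referenced objects into those that stay and those that move.

    heap_referenced: addresses of objects reached through accurate heap references.
    Returns (pinned_pages, stay_in_place, to_copy).
    """
    pinned = pin_pages(ambiguous_words, heap_pages)
    stay = sorted(a for a in heap_referenced if a // PAGE_SIZE in pinned)
    move = sorted(a for a in heap_referenced if a // PAGE_SIZE not in pinned)
    return pinned, stay, move
```

Words that fall outside the heap (small integers, addresses of unallocated pages) pin nothing, which is what lets most objects be copied even though the roots are ambiguous.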

Incremental MCGC [DeTreville]
– Enforce grey mutator invariant
  – STW greys ambiguously-referenced pages
  – Read barrier on grey using VM page protection
– Read barrier
  – Stop mutator threads
  – Unprotect page
  – Copy white targets to grey
  – Shade page black
  – Restart threads
– Atomic system call wrappers unprotect parameter targets (otherwise traps in OS return error)

Concurrent MCGC?
– Stopping all threads at each increment is prohibitive on SMP and impedes concurrency
– BUT barriers are difficult to place on ambiguous references with uncooperative compilers
– ALSO preemptive scheduling may break wrapper atomicity

Mostly-concurrent MCGC
– Enforce black mutator invariant
  – STW blackens ambiguously-referenced pages
  – Read barrier on load of accurate (tidy) grey reference
– Read barrier: blacken grey references as they are loaded
– No system call wrappers: arguments are always black

Read barrier on load of grey
– Object header bit marks grey objects
– Inline fast path checks grey bit in target header, calls out to slow path if set
– Out-of-line slow path:
  – Lock heap meta-data
  – For each (grey) source object in target page:
    – Copy white targets to grey
    – Clear grey header bit
  – Shade target page black
  – Unlock heap meta-data
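The fast and slow paths on this slide can be sketched in Python as follows. The `Page` and `Obj` classes, the `forward` field, and the `to_page` argument are assumptions made for the example; in particular, `to_page` stands for a separate grey page that receives new copies and must differ from the page being cleaned. The sketch is not the barrier emitted by the Modula-3 front-end.

```python
import threading

heap_lock = threading.Lock()  # stands in for the heap meta-data lock


class Page:
    def __init__(self, colour):
        self.colour = colour  # "white", "grey" or "black"
        self.objects = []


class Obj:
    def __init__(self, page, grey_bit, fields=()):
        self.page = page
        self.grey_bit = grey_bit  # header bit tested by the inline fast path
        self.fields = list(fields)
        self.forward = None       # set once a white object has been copied
        page.objects.append(self)


def copy_to_grey(obj, to_page):
    """Return a non-white version of `obj`, copying it to `to_page` if white."""
    if obj.page.colour != "white":
        return obj
    if obj.forward is None:
        obj.forward = Obj(to_page, True, obj.fields)  # new copy starts grey
    return obj.forward


def slow_path(target, to_page):
    """Out-of-line path: clean the whole page that holds `target`."""
    with heap_lock:
        page = target.page
        if page.colour == "black":  # another thread cleaned it first
            return
        for src in list(page.objects):
            if src.grey_bit:
                src.fields = [copy_to_grey(f, to_page) for f in src.fields]
                src.grey_bit = False
        page.colour = "black"


def load(src, index, to_page):
    """Mutator load of src.fields[index] with the read barrier applied."""
    target = src.fields[index]
    if target.grey_bit:  # inline fast path: one header-bit test
        slow_path(target, to_page)
    return target
```

The fast path reads the header bit without taking the lock; the slow path rechecks the page colour once the lock is held, so a second thread that arrives after the page has been cleaned returns without redoing the work.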

Coherence for fast path
– STW phase synchronizes mutators’ views of heap state
– Grey bits are set only in newly-copied objects (i.e., newly-allocated grey pages) since most recent STW
– Mutators can never see a cleared grey header unless the page is also black
– Seeing a spurious grey header due to weak ordering is benign: slow path will synchronize

Implementation
– Modula-3: gcc-based compiler back-end
– No tricky target-specific stack-maps
– Compiler front-end emits barriers
– M3 threads map to preemptively-scheduled POSIX pthreads
– Stop/start threads: signals + semaphores, or OS primitives if available
– Simple to port: Darwin (OS X), Linux, Solaris, Alpha/OSF

Experiments
– Parallelized GCOld benchmark to permit throughput measurements for multiple mutators
– Measures steady-state GC throughput
– 2 platforms:
  – 2 x 2.3GHz PowerPC Macintosh Xserve running OS X
  – 8 x 700MHz Intel Pentium 3 SMP running Linux 2.6

Figure: Read barriers, STW (1 user-level mutator thread, work=1)

Figure: Elapsed time (s) (1 system-level mutator thread, work=1)

Figure: Heap size (1 system-level mutator thread)

Figure: BMU (1 system-level mutator thread, work=1000, ratio=1)

Figure: Scalability (work=1000, ratio=1, 8xP3)

Figure: Java Hotspot server (work=1000, 8xP3)

Conclusions
– Mostly-concurrent, mostly-copying collection is feasible for multi-processors (proof-of-existence)
– Performance is good (scalable)
– Portable: changes only to compiler front-end to introduce barriers, and to GC run-time system
– Compiler back-end unchanged: full-blown optimizations enabled, no stack-map overheads

Future work
– Convert read barrier to “clean” only target object instead of whole page

Figure: BMU (1 system-level mutator thread, work=10, ratio=1)

Figure: Scalability (work=10, ratio=1, 8xP3)

Figure: Java Hotspot server (work=10, 8xP3)