An On-the-Fly Reference Counting Garbage Collector for Java Erez Petrank Technion – Israel Institute of Technology Joint work with Yossi Levanoni – Microsoft.

Presentation transcript:

An On-the-Fly Reference Counting Garbage Collector for Java
Erez Petrank, Technion – Israel Institute of Technology
Joint work with Yossi Levanoni, Microsoft Corporation
ACM Conference on Object Oriented Programming Systems Languages & Applications
Tampa, Florida, October 18, 2001

Levanoni & Petrank, On-the-Fly Reference Counting

Garbage Collection Today
Two classic approaches:
  – Tracing [McCarthy 1960]: trace reachable objects, reclaim objects not traced.
  – Reference counting [Collins 1960]: keep a reference count for each object, reclaim objects with count 0.
Today’s advanced environments:
  – multiprocessors
  – huge memories

Motivation for RC
Reference counting work is proportional to the work spent on object creations and pointer modifications.
  – Can tracing deal with tomorrow’s huge heaps?
Reference counting has good locality.
Tracing dominates today’s JVMs. Is that justified?
The challenge:
  – RC write barriers seem too expensive.
  – RC seems impossible to “parallelize”.

This work
An improved RC algorithm (suitable for Java):
  – Reduced overhead on the write barrier.
  – Concurrent with low overhead: on-the-fly, no synchronization operation in the write barrier, runs on a multiprocessor.
  – Thus: low latency, high performance.
Implementation:
  – JVM: Sun’s Java Virtual Machine
  – Platform: 4-way IBM Netfinity 8500R server with 550 MHz Intel Pentium III Xeon processors and 2 GB memory.

Agenda
  – Introduction
  – Motivation
  – The Algorithm
  – Related issues
  – Implementation and Measurements
  – Conclusions

Terminology

Basic Reference Counting
Each object has an RC field; new objects get o.RC := 1.
When a pointer p that points to o1 is modified to point to o2, we do: o1.RC--, o2.RC++.
If then o1.RC == 0:
  – Delete o1.
  – Decrement o.RC for all children o of o1.
  – Recursively delete objects whose RC is decremented to 0.
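The rules on this slide can be sketched in C as follows. This is a minimal single-threaded illustration, not the collector from the talk; the names `rc_new`, `rc_update`, `rc_release`, the fixed `MAX_CHILDREN` fan-out and the `live_objects` counter are assumptions introduced for the example.

```c
#include <stdlib.h>

#define MAX_CHILDREN 2   /* hypothetical fixed fan-out, for the example only */

typedef struct Object {
    int rc;                              /* reference count */
    struct Object *child[MAX_CHILDREN];  /* outgoing pointers */
} Object;

static int live_objects = 0;             /* bookkeeping for the example only */

/* Allocate an object; the creating reference accounts for rc = 1. */
Object *rc_new(void) {
    Object *o = calloc(1, sizeof *o);
    if (o == NULL) abort();
    o->rc = 1;
    live_objects++;
    return o;
}

/* Drop one reference; when the count reaches 0, release the children
   recursively and free the object. */
void rc_release(Object *o) {
    if (o == NULL) return;
    if (--o->rc > 0) return;
    for (int i = 0; i < MAX_CHILDREN; i++)
        rc_release(o->child[i]);
    free(o);
    live_objects--;
}

/* Pointer update: make *slot point to new_target (o2.RC++, o1.RC--). */
void rc_update(Object **slot, Object *new_target) {
    Object *old = *slot;
    if (new_target != NULL) new_target->rc++;  /* increment first: safe if old == new_target */
    *slot = new_target;
    rc_release(old);
}

int rc_live_objects(void) { return live_objects; }
```

Incrementing the new target before releasing the old one keeps an object alive when a slot is overwritten with the value it already holds.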

Deferred Reference Counting
Problem: updating counts on every change to a program variable (local) costs too much.
Solution [Deutsch & Bobrow]:
  – Don’t update RC for locals.
  – “Once in a while”: collect all objects with o.RC = 0 that are not referenced from local roots.
Deferred RC reduces overhead by 80%. It is used in most modern RC systems.
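A rough sketch of the deferred scheme, assuming objects without outgoing pointers and a caller that supplies the object table and the local roots explicitly. The names `heap_store`, `local_store`, `deferred_collect` and the `reclaimed` flag are hypothetical and exist only for this illustration.

```c
#include <stdbool.h>
#include <stddef.h>

typedef struct Object {
    int heap_rc;      /* counts references from heap slots only */
    bool reclaimed;   /* set instead of freeing, so the effect is observable */
} Object;

/* Store into a heap slot: the only kind of store that maintains counts. */
void heap_store(Object **slot, Object *target) {
    if (target != NULL) target->heap_rc++;
    if (*slot != NULL) (*slot)->heap_rc--;
    *slot = target;
}

/* Store into a local variable: deliberately no count maintenance. */
void local_store(Object **local, Object *target) {
    *local = target;
}

/* "Once in a while": reclaim every object whose heap count is 0 and that
   no local root references. Returns the number of objects reclaimed. */
size_t deferred_collect(Object **objects, size_t n_objects,
                        Object **roots, size_t n_roots) {
    size_t reclaimed = 0;
    for (size_t i = 0; i < n_objects; i++) {
        Object *o = objects[i];
        if (o->reclaimed || o->heap_rc != 0) continue;
        bool in_roots = false;
        for (size_t j = 0; j < n_roots; j++)
            if (roots[j] == o) in_roots = true;
        if (!in_roots) {
            o->reclaimed = true;
            reclaimed++;
        }
    }
    return reclaimed;
}
```

An object with heap count 0 survives a collection as long as some local still refers to it, which is why the roots must be scanned before anything is reclaimed.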

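Multithreaded RC?
Problem:
  – Parallel updates confuse counts. Suppose A.next initially points to B and two threads update it concurrently:
      Thread 1: Read A.next; A.next ← C; B.RC--; C.RC++
      Thread 2: Read A.next; A.next ← D; B.RC--; D.RC++
    Both threads read B as the old value, so B.RC is decremented twice, and both C.RC and D.RC are incremented although A ends up referencing only one of them.
  – (And more: updating ref counts in parallel leads to races.)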
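The interleaving on this slide can be replayed deterministically by splitting each unsynchronized update into its read half and its write half. The sketch below uses no real threads; the function names `racy_read_old` and `racy_finish_update` are invented for the illustration.

```c
#include <stddef.h>

typedef struct Node {
    int rc;
    struct Node *next;
} Node;

/* First half of an unsynchronized update: read the old target. */
Node *racy_read_old(Node *holder) {
    return holder->next;
}

/* Second half: install the new target and adjust both counts,
   using the (possibly stale) old value that was read earlier. */
void racy_finish_update(Node *holder, Node *old, Node *new_target) {
    holder->next = new_target;
    if (old != NULL) old->rc--;
    if (new_target != NULL) new_target->rc++;
}
```

If both "threads" call `racy_read_old` before either calls `racy_finish_update`, B is decremented twice and the overwritten target (C) keeps a count of 1 with no reference to it, so it would never be reclaimed.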

Multithreaded RC
Problem:
  – Parallel updates confuse counts.
  – Updating ref counts in parallel leads to races.
[DeTreville]:
  – Lock the heap for each pointer modification.
  – Each thread records its updates in a buffer.
  – Once in a while (snapshot-like), the GC thread:
      reads all buffers to update ref counts,
      reclaims all objects with rc 0 that are not referenced from locals.

To Summarize…
Overhead on the write barrier is considered high.
  – Even with the deferred RC of Deutsch & Bobrow.
Using reference counting concurrently with program threads seems to bear a high synchronization cost.
  – A lock or “compare & swap” for each pointer update.

Improving RC
Consider a pointer p that takes the following values between GCs: O0, O1, O2, …, On.
All RC algorithms perform 2n operations: O0.RC--; O1.RC++; O1.RC--; O2.RC++; O2.RC--; … ; On.RC++.
Each intermediate object O1, …, On-1 receives one increment and one matching decrement, which cancel out.
So only two operations are needed: O0.RC-- and On.RC++.
[Figure: pointer p moving from O0 through O1, O2, O3, O4, … to On]
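The cancellation argument can be checked with a small sketch that applies both strategies to the same sequence of pointer values. Objects are represented by integer ids indexing a count array; `apply_all_updates` and `apply_coalesced` are hypothetical helper names, and `values` holds the n+1 entries O0 … On.

```c
#include <stddef.h>

/* Classic RC: one decrement and one increment per assignment.
   Returns the number of counter operations performed (2n). */
size_t apply_all_updates(int *rc, const int *values, size_t n) {
    size_t ops = 0;
    for (size_t i = 0; i < n; i++) {
        rc[values[i]]--;       /* old target loses a reference */
        rc[values[i + 1]]++;   /* new target gains a reference */
        ops += 2;
    }
    return ops;
}

/* Coalesced RC: only the first old value and the final value matter.
   Returns the number of counter operations performed (at most 2). */
size_t apply_coalesced(int *rc, const int *values, size_t n) {
    if (n == 0) return 0;
    rc[values[0]]--;
    rc[values[n]]++;
    return 2;
}
```

Starting from identical count arrays, both functions leave the same counts behind, even when the pointer revisits an object along the way.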

Improving RC cont’d
Don’t record all pointer modifications; record only the first modification of each pointer between GCs (this captures O0).
During the collection, for each recorded pointer p:
  – find O0 by checking the record,
  – find On by reading the heap during the collection.
Apply only two operations for each such pointer: O0.RC-- and On.RC++.
This greatly reduces the number of logging and counter updates for normal benchmarks!

Improving Synch. Overhead
Simple solutions bear unacceptable overhead:
  – DeTreville uses a lock for all pointer modifications.
  – Simple alternatives require 3 compare-and-swaps.
Our second contribution:
  – A carefully designed write barrier (and an observation) allows elimination of all sync. operations from the write barrier.

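The write barrier

    void Update(Object **slot, Object *new) {
        Object *old = *slot;      /* read the old value first */
        if (!IsDirty(slot)) {     /* first update of this slot since the last collection? */
            log(slot, old);       /* record the slot together with its old value */
            SetDirty(slot);
        }
        *slot = new;              /* the store comes after the dirty flag is set */
    }

Observation: if two threads
  1. invoke the write barrier in parallel, and
  2. both log an old value,
then both record the same old value.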
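A compilable, single-threaded sketch of this barrier is given below. It assumes a dirty flag stored next to each slot and one global fixed-size log; the slide specifies neither, so `Slot`, `LogEntry`, `log_buf`, `LOG_CAPACITY`, `log_first_update` and `barrier_update` are all hypothetical. The sketch does not model per-thread buffers or memory ordering, both of which matter on a real multiprocessor.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

typedef struct Object { int rc; } Object;

typedef struct {
    Object *value;   /* the pointer stored in this slot */
    bool dirty;      /* set once the slot has been logged in this cycle */
} Slot;

typedef struct {
    Slot *slot;      /* which slot was modified */
    Object *old;     /* its value before the first modification */
} LogEntry;

#define LOG_CAPACITY 64
static LogEntry log_buf[LOG_CAPACITY];
static size_t log_len = 0;

/* Append one record; a real collector would chain a fresh buffer
   instead of aborting when the log is full. */
static void log_first_update(Slot *slot, Object *old) {
    if (log_len == LOG_CAPACITY) abort();
    log_buf[log_len].slot = slot;
    log_buf[log_len].old = old;
    log_len++;
}

/* Write barrier: log only the first update of a slot per cycle. */
void barrier_update(Slot *slot, Object *new_value) {
    Object *old = slot->value;       /* read old value before testing the flag */
    if (!slot->dirty) {
        log_first_update(slot, old);
        slot->dirty = true;
    }
    slot->value = new_value;         /* store only after the flag is set */
}
```

Updating the same slot twice leaves a single log entry holding the value the slot had before the first update.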

Intermediate Algorithm: Snapshot Oriented, Concurrent
Use the write barrier with program threads.
To collect:
  – Stop all threads.
  – Scan roots (locals).
  – Get the buffers with modified slots.
  – Clear all dirty bits.
  – Resume threads.
  – For each modified slot: decrease rc for the old value (written in the buffer), increase rc for the current value (“read heap”).
  – Reclaim non-local objects with rc 0.
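The collection steps above might look roughly like this in C. The sketch is sequential: it assumes the log and the roots have already been gathered while threads were stopped, that objects have no outgoing pointers (so there is no recursive deletion), and it ignores slots that are modified again after threads resume. `collect`, `Slot`, `LogEntry` and the `reclaimed` flag are names invented for the illustration.

```c
#include <stdbool.h>
#include <stddef.h>

typedef struct Object {
    int rc;          /* heap reference count */
    bool reclaimed;  /* set instead of freeing, so the effect is observable */
} Object;

typedef struct { Object *value; bool dirty; } Slot;
typedef struct { Slot *slot; Object *old; } LogEntry;

/* One collection cycle. Returns the number of objects reclaimed. */
size_t collect(LogEntry *entries, size_t n_entries,
               Object **objects, size_t n_objects,
               Object **roots, size_t n_roots) {
    /* Clear the dirty bits so the next cycle logs first updates again. */
    for (size_t i = 0; i < n_entries; i++)
        entries[i].slot->dirty = false;

    /* For each modified slot: rc-- for the logged old value,
       rc++ for the value currently in the heap. */
    for (size_t i = 0; i < n_entries; i++) {
        Object *old = entries[i].old;
        Object *cur = entries[i].slot->value;
        if (old != NULL) old->rc--;
        if (cur != NULL) cur->rc++;
    }

    /* Reclaim objects with rc 0 that no local root references. */
    size_t reclaimed = 0;
    for (size_t i = 0; i < n_objects; i++) {
        Object *o = objects[i];
        if (o->reclaimed || o->rc != 0) continue;
        bool is_root = false;
        for (size_t j = 0; j < n_roots; j++)
            if (roots[j] == o) is_root = true;
        if (!is_root) {
            o->reclaimed = true;
            reclaimed++;
        }
    }
    return reclaimed;
}
```

Only the logged old value and the slot's current value are touched; intermediate values the slot held during the cycle never have their counts adjusted.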

The Sliding View Algorithm: On-the-Fly
Do all collection as threads run:
  – Read threads’ buffers (one thread at a time),
  – Clear all dirty bits,
  – Update reference counts,
  – Read roots of each thread, one at a time,
  – Reclaim (recursively) objects with rc 0.
Note: the rc’s are not correct for any specific point in time, yet, with care, most dead objects may be reclaimed!
Borrows ideas from [Lamport et al.].

Cycles Collection
Our solution: use a tracing algorithm infrequently.
  – Currently this is the most efficient solution.
  – Cycle collectors have high cost.
We propose a new on-the-fly mark & sweep algorithm that works best with the same sliding view.
  – It can also be used “on its own”.

Implementation for Java
Based on Sun’s JDK 1.2.2 for Windows NT.
Main features:
  – 2-bit RC field per object (à la [Wise et al.])
  – A supplemental sliding view tracing algorithm
  – A custom allocator for on-the-fly RC:
      Multi-level, fine-grained locking
      Supports sporadic reclamation of objects
      Supports sweeping the heap
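The slide does not say how a 2-bit count handles overflow. One common treatment of limited-width counts, assumed here and not confirmed by the source, is that a count reaching the maximum becomes "stuck" and is only corrected by a tracing pass, which would fit the supplemental tracing algorithm listed above. The names `RC2_STUCK`, `rc2_increment` and `rc2_decrement` are hypothetical.

```c
#include <stdint.h>

#define RC2_STUCK 3u   /* largest value a 2-bit field can hold */

/* Increment, saturating at the stuck value. */
uint8_t rc2_increment(uint8_t rc) {
    return rc >= RC2_STUCK ? (uint8_t)RC2_STUCK : (uint8_t)(rc + 1u);
}

/* Decrement, unless the count is stuck (its true value is unknown)
   or already 0. */
uint8_t rc2_decrement(uint8_t rc) {
    if (rc >= RC2_STUCK || rc == 0u) return rc;
    return (uint8_t)(rc - 1u);
}
```

Under this assumption an object whose count once reached 3 can no longer be reclaimed by counting alone, so an occasional tracing collection is needed for such objects as well as for cycles.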

Performance Measurements
First multiprocessor measurements in a “normal” environment!
  – (Previously reported measurements assumed one CPU is free for GC all the time.)
Benchmarks:
  – Server benchmarks:
      SPECjbb: simulates business-like transactions in a large firm
      MTRT: a multi-threaded ray tracer
  – Client benchmarks:
      SPECjvm: a suite of mostly single-threaded client benchmarks

Improved RC
How many RC updates are eliminated?

Benchmark    No. of stores   No. of “first” stores   Ratio of “first” stores
jbb             71,011,357                 264,115                     1/269
Compress            64,905                      51                    1/1273
Db              33,124,780                  30,696                    1/1079
Jack           135,174,775                   1,546                   1/87435
Javac           22,042,028                 535,296                      1/41
Jess            26,258,107                  27,333                     1/961
mpegaudio        5,517,795                      51                  1/108192
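The ratio column can be reproduced from the two count columns by dividing and rounding to the nearest integer. The helper below is a small sketch for checking the arithmetic; the name `stores_per_first_store` is made up for this example.

```c
/* Number of stores per "first" store, rounded to the nearest integer.
   Both arguments are assumed to be positive. */
long stores_per_first_store(long stores, long first_stores) {
    return (stores + first_stores / 2) / first_stores;
}
```

For example, `stores_per_first_store(71011357L, 264115L)` gives 269, matching the 1/269 ratio reported for jbb.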

SPECjbb Latency (Max Transaction Time)

SPECjbb Throughput

MTRT Throughput

SPECjbb Heap Utilization

Client Performance

Related Work
On-the-fly tracing:
  – Dijkstra et al. (1976), Steele (1976), Lamport (1976),
  – Kung & Song (1977), Gries (1977), Ben-Ari (1982, 1984), Huelsbergen et al. (1993, 1998)
  – Doligez-Gonthier-Leroy (1993-4), Domani-Kolodner-Petrank (2000)
Concurrent reference counting:
  – DeTreville (1990),
  – Martinez et al. (1990), Lins (1992)
  – Plakal & Fischer (2001),
  – Bacon et al. (2001)

Conclusions
A new algorithm for reference counting:
  – Low overhead on pointer modification
  – On-the-fly
Implementation for Java.
Measurements show high throughput and low latency.
To be out soon: a matching paper on the sliding view tracing collector.