0 Parallel and Concurrent Real-time Garbage Collection Part II: Memory Allocation and Sweeping David F. Bacon T.J. Watson Research Center.

Slides:



Advertisements
Similar presentations
Garbage collection David Walker CS 320. Where are we? Last time: A survey of common garbage collection techniques –Manual memory management –Reference.
Advertisements

1 Write Barrier Elision for Concurrent Garbage Collectors Martin T. Vechev Cambridge University David F. Bacon IBM T.J.Watson Research Center.
David Notkin Autumn 2009 CSE303 Lecture 13 This space for rent.
Compilation /15a Lecture 13 Compiling Object-Oriented Programs Noam Rinetzky 1.
Garbage Collection What is garbage and how can we deal with it?
Reducing Pause Time of Conservative Collectors Toshio Endo (National Institute of Informatics) Kenjiro Taura (Univ. of Tokyo)
Run-time organization  Data representation  Storage organization: –stack –heap –garbage collection Programming Languages 3 © 2012 David A Watt,
An On-the-Fly Mark and Sweep Garbage Collector Based on Sliding Views Hezi Azatchi - IBM Yossi Levanoni - Microsoft Harel Paz – Technion Erez Petrank –
Locality-Conscious Lock-Free Linked Lists Anastasia Braginsky & Erez Petrank 1.
0 Parallel and Concurrent Real-time Garbage Collection Part III: Tracing, Snapshot, and Defragmentation David F. Bacon T.J. Watson Research Center.
1 The Compressor: Concurrent, Incremental and Parallel Compaction. Haim Kermany and Erez Petrank Technion – Israel Institute of Technology.
MOSTLY PARALLEL GARBAGE COLLECTION Authors : Hans J. Boehm Alan J. Demers Scott Shenker XEROX PARC Presented by:REVITAL SHABTAI.
Memory Management Chapter 5 Mooly Sagiv
0 Parallel and Concurrent Real-time Garbage Collection Part I: Overview and Memory Allocation Subsystem David F. Bacon T.J. Watson Research Center.
Honors Compilers Addressing of Local Variables Mar 19 th, 2002.
Correctness-Preserving Derivation of Concurrent Garbage Collection Algorithms Martin T. Vechev Eran Yahav David F. Bacon University of Cambridge IBM T.J.
Incremental Garbage Collection
Compilation 2007 Garbage Collection Michael I. Schwartzbach BRICS, University of Aarhus.
1 An Efficient On-the-Fly Cycle Collection Harel Paz, Erez Petrank - Technion, Israel David F. Bacon, V. T. Rajan - IBM T.J. Watson Research Center Elliot.
Damien Doligez Georges Gonthier POPL 1994 Presented by Eran Yahav Portable, Unobtrusive Garbage Collection for Multiprocessor Systems.
CS61C Midterm Review on C & Memory Management Fall 2006 Aaron Staley Some material taken from slides by: Michael Le Navtej Sadhal.
Reference Counters Associate a counter with each heap item Whenever a heap item is created, such as by a new or malloc instruction, initialize the counter.
UniProcessor Garbage Collection Techniques Paul R. Wilson University of Texas Presented By Naomi Sapir Tel-Aviv University.
A Parallel, Real-Time Garbage Collector Author: Perry Cheng, Guy E. Blelloch Presenter: Jun Tao.
1 Overview Assignment 6: hints  Living with a garbage collector Assignment 5: solution  Garbage collection.
SEG Advanced Software Design and Reengineering TOPIC L Garbage Collection Algorithms.
Garbage Collection. Terminology Heap – a finite pool of data cells, can be organized in many ways Roots - Pointers from the program into the Heap. – We.
An example demonstrating the ABA problem Xinyu Feng University of Science and Technology of China.
David F. Bacon Perry Cheng V.T. Rajan IBM T.J. Watson Research Center The Metronome: A Hard Real-time Garbage Collector.
1 CSC 222: Computer Programming II Spring 2004 Pointers and linked lists  human chain analogy  linked lists: adding/deleting/traversing nodes  Node.
Parallel GC (Chapter 14) Eleanor Ainy December 16 th
A Mostly Non-Copying Real-Time Collector with Low Overhead and Consistent Utilization David Bacon Perry Cheng (presenting) V.T. Rajan IBM T.J. Watson Research.
Parallel Checking of Expressive Heap Assertions Greta YorshMartin VechevEran YahavBard Bloom IBM T.J. Watson Research Center.
1 File Systems: Consistency Issues. 2 File Systems: Consistency Issues File systems maintains many data structures  Free list/bit vector  Directories.
Incremental Garbage Collection Uwe Kern 23. Januar 2002
Memory Management II: Dynamic Storage Allocation Mar 7, 2000 Topics Segregated free lists –Buddy system Garbage collection –Mark and Sweep –Copying –Reference.
OOPLs /FEN March 2004 Object-Oriented Languages1 Object-Oriented Languages - Design and Implementation Java: Behind the Scenes Finn E. Nordbjerg,
Chapter 4 Memory Management Virtual Memory.
Garbage Collection and Classloading Java Garbage Collectors  Eden Space  Surviver Space  Tenured Gen  Perm Gen  Garbage Collection Notes Classloading.
Garbage Collection and Memory Management CS 480/680 – Comparative Languages.
Concurrent Garbage Collection Presented by Roman Kecher GC Seminar, Tel-Aviv University 23-Dec-141.
1 Computer Systems II Introduction to Processes. 2 First Two Major Computer System Evolution Steps Led to the idea of multiprogramming (multiple concurrent.
Wait-Free Multi-Word Compare- And-Swap using Greedy Helping and Grabbing Håkan Sundell PDPTA 2009.
UniProcessor Garbage Collection Techniques Paul R. Wilson University of Texas Presented By Naomi Sapir Tel-Aviv University.
David F. Bacon Perry Cheng V.T. Rajan IBM T.J. Watson Research Center ControllingFragmentation and Space Consumption in the Metronome.
A REAL-TIME GARBAGE COLLECTOR WITH LOW OVERHEAD AND CONSISTENT UTILIZATION David F. Bacon, Perry Cheng, and V.T. Rajan IBM T.J. Watson Research Center.
2/4/20161 GC16/3011 Functional Programming Lecture 20 Garbage Collection Techniques.
Real-time collection for multithreaded Java Microcontroller Garbage Collection. Garbage Collection. Application of Java in embedded real-time systems.
Concurrent Mark-Sweep Presented by Eyal Dushkin GC Seminar, Tel-Aviv University
® July 21, 2004GC Summer School1 Cycles to Recycle: Copy GC Without Stopping the World The Sapphire Collector Richard L. Hudson J. Eliot B. Moss Originally.
The Metronome Washington University in St. Louis Tobias Mann October 2003.
By Anand George SourceLens.org Copyright. All rights reserved. Content Owner - Meera R (meera at sourcelens.org)
GC Assertions: Using the Garbage Collector To Check Heap Properties Samuel Z. Guyer Tufts University Edward Aftandilian Tufts University.
Dynamic Compilation Vijay Janapa Reddi
Concepts of programming languages
Joint work with Yong Li, Xinyu Feng, Zhong Shao and Yu Zhang
David F. Bacon, Perry Cheng, and V.T. Rajan
Memory Management and Garbage Collection Hal Perkins Autumn 2011
Simulated Pointers.
Strategies for automatic memory management
Memory Management Kathryn McKinley.
Chapter 12 Memory Management
Multicore programming
List Allocation and Garbage Collection
Parallel GC and Heap Management in Poly/ML and Isabelle
Where is all the knowledge we lost with information? T. S. Eliot
Continuously and Compacting By Lyndon Meadow
CSC Multiprocessor Programming, Spring, 2011
Reference Counting.
Presentation transcript:

0 Parallel and Concurrent Real-time Garbage Collection Part II: Memory Allocation and Sweeping David F. Bacon T.J. Watson Research Center

1 Page Data Synchronization, Take free

2 Page Data, Take free Thread Thread 2

3 free ? 16 free 16 free 64 free 64 free 256 free temp full temp full

4 Compare-and-Swap* boolean CAS(int* addr, int old, int new) { atomic { if (* addr == old) { *addr = new; return true; } else { return false; } IBM 370 since 1970, Intel since 1989 Load-Reserved: ARM 6 since 1991, IBM POWER2 since 1993

5 Non-locking* Stack push(Page page) { do { Page t = top; page->next = t; } while (! CAS(&top, t, page); } Page pop() { Page t = null; do { t = top; if (t == null) return; Page n = t->next; } while (! CAS(&top, t, n); return t; } top

6 A A B B C C A A B B C C C C A A B B Thread 1 x=pop() start Thread 2 a=pop() b=pop() top Thread 2 push(A) C C B B top Thread 1 pop() finish n t n t n t A A b a b

7 Correct Non-locking Stack push(PageId page) { do { = top; page->next = t; } while (! CAS(&top,, ); } PageId pop() { PageId t = 0; do { = top; if (t==0) return 0; } while (! CAS(&top,, ); return t; }

8 Really Correct Non-locking Stack push(Page page) { do { Page t = LOAD_AND_RESERVE(&top); page->next = t; } while (! STORE_CONDITIONAL(&top, page)); } Page pop() { Page t = null; do { Page t = LOAD_AND_RESERVE(&top); if (t==null) return null; } while (! STORE_CONDITIONAL(&top, t->next)); return t; }

9 Coalescing Ideal: –getMultiplePages(n) returns best fit –Does not cause mutator activities to block –Does not cause collection activities to block –Does not suffer pathology under contention Reality: tradeoffs between –Quality of fit –Probability of blocking –Overhead (number of CAS’s) free ?

10 Cartesian Tree + Try-lock 44 ? x.left.size = x.right.size x.left.address < x.address < x.right.address

11 Non-locking Rebalancing Tree Homework… …but I won’t use it. Solution would be –Too complex to be reliable –Require too many CAS operations to be fast

12 The Fix: Transactional Memory AtomicCartesianTree extends CartesianTree { atomic Block getBlock() { return super.getBlock(); } atomic void releaseBlock(Block b) { super.releaseBlock(b); } atomic Block[] getMultipleBlocks(int n) { return super.getMultipleBlocks(n); }

13 Hierarchical Non-locking Buddy?

14 So much for allocation… Garbage Collection

15 Handling the Concurrency ? r r p p T T a b X X a b U U a b Z Z a b W W a b Y Y a b Y Y a b X X a b GC Threads Application (“mutator”) Threads Trace Collect Format Load pointer Store pointer Allocate

16 Yuasa Snapshot Algorithm (1990) Logically –Mark-and-Sweep Style Algorithm –Take a “copy-on-write” heap snapshot –Collect the garbage in that snapshot Physically –Stop all threads –Copy their stacks (“roots”) –Force them to save over-written pointers –Trace roots and over-written pointers * Dijktstra’75, Steele’76

17 1: Take Logical Snapshot Stack r r p p T T a b X X a b U U a b Z Z a b W W a b Y Y a b

18 2(a): Copy Over-written Pointers Stack p p r r T T a b X X a b U U a b Z Z a b W W a b Y Y a b

19 2(b): Trace Stack p p T T a b X X a b U U a b Z Z a b W W a b Y Y a b W W a b Z Z a b Y Y a b X X a b * Color is per-object mark bit

20 2(c): Allocate “Black” Stack p p T T a b X X a b U U a b Z Z a b W W a b Y Y a b s s V V a b W W a b Z Z a b Y Y a b X X a b

21 3(a): Sweep Garbage Stack p p T T a b X X a b U U a b Z Z a b W W a b Y Y a b W W a b Z Z a b Y Y a b X X a b s s V V a b free

22 3(b): Allocate “White” Stack p p T T a b X X a b U U a b Z Z a b Y Y a b W W a b Z Z a b Y Y a b X X a b s s V V a b free V2 a b t t

23 4: Clear Marks Stack p p T T a b X X a b U U a b Z Z a b Y Y a b Z Z a b Y Y a b X X a b s s V V a b free V2 a b t t V V a b W W a b W W a b

24 Yuasa Algorithm Phases Snapshot stack and global roots Trace Flip Sweep Flip Clear Marks * Synchronous

25 Metronome-2 Concurrency GC Worker Threads GC Master Thread Application Threads (may do GC work)

26 Metronome-2 Phases Initiation –Setup turn double barrier on Root Scan –Active Finalizer scan –Class scan –Thread scan** switch to single barrier, color to black –Debugger, JNI, Class Loader scan Trace –Trace* –Trace Terminate*** Re-materialization 1 –Weak/Soft/Phantom Reference List Transfer –Weak Reference clearing** (snapshot) Re-Trace 1 –Trace Master –(Trace*) –(Trace Terminate***) Re-materialization 2 –Finalizable Processing Clearing –Monitor Table clearing –JNI Weak Global clearing –Debugger Reference clearing –JVMTI Table clearing –Phantom Reference clearing Re-Trace 2 –Trace Master –(Trace*) –(Trace Terminate***) –Class Unloading Completion –Finalizer Wakeup –Class Unloading Flush –Clearable Compaction** –Book-keeping * Parallel ** Callback *** Single actor symmetric Flip –Move Available Lists to Full List* (contention) turn write barrier off –Flush Per-thread Allocation Pages** switch allocation color to white switch to temp full list Sweeping –Sweep* –Switch to regular Full List** –Move Temp Full List to regular Full List* (contention)

27 Part 3: Sweep Initiation –Setup turn double barrier on Root Scan –Active Finalizer scan –Class scan –Thread scan** switch to single barrier, color to black –Debugger, JNI, Class Loader scan Trace –Trace* –Trace Terminate*** Re-materialization 1 –Weak/Soft/Phantom Reference List Transfer –Weak Reference clearing** (snapshot) Re-Trace 1 –Trace Master –(Trace*) –(Trace Terminate***) Re-materialization 2 –Finalizable Processing Clearing –Monitor Table clearing –JNI Weak Global clearing –Debugger Reference clearing –JVMTI Table clearing –Phantom Reference clearing Re-Trace 2 –Trace Master –(Trace*) –(Trace Terminate***) –Class Unloading Completion –Finalizer Wakeup –Class Unloading Flush –Clearable Compaction** –Book-keeping * Parallel ** Callback *** Single actor symmetric Flip –Move Available Lists to Full List* (contention) turn write barrier off –Flush Per-thread Allocation Pages** switch allocation color to white switch to temp full list Sweeping –Sweep* –Switch to regular Full List** –Move Temp Full List to regular Full List* (contention)

28 Assume Trace Has Finished Stack p p T T a b X X a b U U a b Z Z a b W W a b Y Y a b s s V V a b W W a b Z Z a b Y Y a b X X a b

29 JVM Application Thread Structure Thread 1 inVM  color Thread 2 epoch 37 inVM  color writebuf 43 pthread pthread_t writebuf 41 pthread pthread_t next thread list dblb  sweep  epoch 43 sweep  BarrierOn true BarrierOn false

30 “Move Available” Phase

31 free ? 16 free 16 free 64 free 64 free 256 free 256 temp full temp full available

32 free ? 16 free 16 free 64 free 64 free 256 free 256 temp full temp full available contention?

33 Generic Structure of Move Phase: Shared Monotonic Work Pool GC Worker Threads GC Master Thread Application Threads (may do GC work) pool of work

34 Performing a GC Work Quantum every (500us) tryToDoSomeWorkForGC(); tryToDoSomeWorkForGC() { short phase = enterPhase(); if (phase > 0) doQuantum(phase); leavePhase(); }

35 Phase Entry short enterPhase() { thread->epoch = Epoch.newest; while (true) = PhaseInfo; if (workers == phaseTable[phase].maxWorkers) return -1; if (CAS(PhaseInfo,, )) return phase; } PhaseInfo phase workers PhaseInfo phase workers moveavail 3 ABA?

36 Phase Exit short leavePhase() { while (true) = PhaseInfo; if (workers > 1 || moreWorkForPhase(phase)) newPhase = ; else if (phase == TRACE_PHASE) newPhase = ; else newPhase = ; if (CAS(PhaseInfo,, newPhase)) return newPhase; } PhaseInfo phase workers PhaseInfo phase workers moveavail 3

37 “Per-Thread Flush” Phase

38 Flush Per-Thread Pages Thread 1 inVM  color Thread 2 epoch 37 inVM  writebuf 43 pthread pthread_t writebuf 41 pthread pthread_t next thread list dblb  epoch 43 BarrierOn true BarrierOn false color PhaseInfo phase workers PhaseInfo phase workers flush 1 color sweep  sweep 

39 free ? 16 free 16 free 64 free 64 free 256 free 256 temp full temp full available Put Full White Pages on temp-full List

40 How is this Phase Different? GC Worker Threads GC Master Thread Application Threads (may do GC work)  pool of work Per-Thread State Update

41 Callback Mechanism bool doCallbacks(int callbackId) { bool alldone = true; LOCK(threadlist); for each (Thread thread in threadlist) if (hasCallbackWork(thread, callbackId); if (! thread.inVM && TRY_LOCK(thread)) doCallbackWorkForThread(thread, callbackId); UNLOCK(thread); else postCallback(thread, callbackId); alldone = false; UNLOCK(threadlist); return alldone; } * Non-locking list with cursor?

42 “Sweep” Phase

43 free ? 16 free 16 free 64 free 64 free 256 free 256 temp full temp full 64 available Sweep Phase

44 “Switch Full List” Phase

45 Switch Back to Normal Full List Thread 1 inVM  color Thread 2 epoch 37 inVM  writebuf 43 pthread pthread_t writebuf 41 pthread pthread_t next thread list dblb  epoch 43 BarrierOn true BarrierOn false color PhaseInfo phase workers PhaseInfo phase workers switchfull 1 color sweep  sweep 

46 “Drain Temp Full” Phase

47 free ? 16 free 16 free 64 free 64 free 256 free 256 temp full temp full 64 available Drain Temp Full

48 Done with Collection!! But… –That Was the Easy Part And there are still some problems –What if threads go out to lunch?

49