Published by Jaycee Bainbridge. Modified over 9 years ago.
1
Portable, mostly-concurrent, mostly-copying GC for multi-processors
Tony Hosking
Secure Software Systems Lab
Purdue University
2
Platform assumptions
– Symmetric multi-processor (SMP/CMP)
– Multiple mutator threads
– (Large heaps)
3
Desirable properties
– Maximize throughput
– Minimize collector pauses
– Scalability
4
Exploiting parallelism
– Avoid contention
– (Mostly-)Concurrent allocation
– (Mostly-)Concurrent collection
5
Concurrent allocation
– Use thread-private allocation “pages”
– Threads contend for free pages
– Each thread allocates from its own page
  – multiple small objects per page, or
  – multiple pages per large object
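The allocation scheme on this slide can be sketched roughly as follows. This is a minimal illustration, not the talk's implementation: the names `PagePool`, `ThreadAllocator`, and `PAGE_SIZE`, and the 4096-byte page size, are assumptions made for the example. The only shared, locked structure is the pool of free pages; each thread bump-allocates inside its own page without synchronization.

```python
import threading

PAGE_SIZE = 4096  # assumed page size in bytes (hypothetical)


class PagePool:
    """Shared pool of free pages: the only point of contention."""

    def __init__(self, n_pages):
        self._lock = threading.Lock()
        # Pages are only ever taken from the front, so the remaining
        # free pages always form one contiguous run in this sketch.
        self._free = list(range(n_pages))

    def take(self, count=1):
        with self._lock:
            if len(self._free) < count:
                raise MemoryError("no free pages (a real system would collect)")
            pages = self._free[:count]
            del self._free[:count]
            return pages


class ThreadAllocator:
    """Per-thread bump allocator: no lock on the common path."""

    def __init__(self, pool):
        self._pool = pool
        self._page = None
        self._offset = PAGE_SIZE  # forces a page fetch on first use

    def alloc(self, size):
        """Return an address as a (page, offset) pair."""
        if size > PAGE_SIZE:
            # Large object: multiple whole pages for one object.
            n = -(-size // PAGE_SIZE)  # ceiling division
            return (self._pool.take(n)[0], 0)
        if self._offset + size > PAGE_SIZE:
            # Current page is full: contend for a fresh page.
            self._page = self._pool.take()[0]
            self._offset = 0
        addr = (self._page, self._offset)
        self._offset += size
        return addr
```

Each mutator thread would own one `ThreadAllocator`, so threads touch the lock only when a page fills up or a large object is requested.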
6
Concurrent collection: The tricolour abstraction
– Black: “live”; scanned; cannot refer to white
– Grey: “live”; wavefront; still to be scanned; may refer to any color
– White: hypothetical garbage
7
Garbage collection
– White = whole heap
– Shade root targets grey
– While grey nonempty
  – Shade one grey object black
  – Shade its white children grey
– At end, white objects are garbage
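The loop on this slide can be written out as a short sketch. The heap representation (a dict from object name to a list of child names) and the function name `mark` are assumptions chosen for illustration only.

```python
from collections import deque


def mark(heap, roots):
    """Tricolour marking over heap: dict of object -> list of children."""
    colour = {obj: "white" for obj in heap}  # white = whole heap
    grey = deque()

    # Shade root targets grey.
    for r in roots:
        if colour[r] == "white":
            colour[r] = "grey"
            grey.append(r)

    # While grey is nonempty: take one grey object, shade its white
    # children grey, then shade the object itself black.
    while grey:
        obj = grey.popleft()
        for child in heap[obj]:
            if colour[child] == "white":
                colour[child] = "grey"
                grey.append(child)
        colour[obj] = "black"

    return colour  # anything still white is garbage
```

For example, with `heap = {"a": ["b"], "b": ["c", "a"], "c": [], "d": ["a"]}` and roots `["a"]`, the objects `a`, `b`, and `c` end black and `d` stays white, since nothing reachable refers to it.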
8
Copying collection
– Partition white from black by copying
– Reclaim white partition wholesale
– At next GC, “flip” black to white
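One common way to partition by copying is a Cheney-style scan, sketched below. The slide does not say which copying algorithm is used, so treat this as one possible instance; `copy_collect` and the dict/list heap model are hypothetical. Objects between the scan index and the end of to-space play the role of the grey wavefront.

```python
def copy_collect(from_space, roots):
    """Copy everything reachable from roots out of from_space.

    from_space: dict of name -> list of referenced names (the white partition).
    Returns (to_space, new_roots); to_space is a list of field lists whose
    entries are indices into to_space (the black partition).
    """
    to_space = []
    forward = {}  # old name -> new index (forwarding pointers)

    def evacuate(name):
        if name not in forward:
            forward[name] = len(to_space)
            # Fields still hold old names until this copy is scanned.
            to_space.append(list(from_space[name]))
        return forward[name]

    new_roots = [evacuate(r) for r in roots]

    scan = 0
    while scan < len(to_space):  # entries at index >= scan are grey
        to_space[scan] = [evacuate(f) for f in to_space[scan]]
        scan += 1  # entry is now black

    return to_space, new_roots
```

After the call, `from_space` can be discarded wholesale; at the next collection the roles flip and `to_space` becomes the white partition.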
9
Mutator threads Incremental collection
10
Mutator threads Concurrent collection Background GC thread
11
Concurrent mutators
– Mutation changes reachability during GC
– Loss of black/grey reference is safe
  – Non-white object losing its last reference will be garbage at next GC
– New reference from black to white
  – New reference may make target live
  – Collector may never see new reference
– Mutations may require compensation
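The hazard described on this slide can be reproduced with a small interleaving. The sketch below is illustrative only: `Collector`, `reachable`, and `lost_object_demo` are hypothetical names, and the collector is stepped by hand to stand in for a concurrent thread. The mutator stores a reference to white `C` into black `A`, then removes the only grey-to-white path to `C`; with no compensation, `C` finishes white although it is still reachable.

```python
class Collector:
    """Tricolour marker that can be advanced one grey object at a time."""

    def __init__(self, heap, roots):
        self.heap = heap
        self.colour = {o: "white" for o in heap}
        self.grey = []
        for r in roots:
            self.shade(r)

    def shade(self, obj):
        if self.colour[obj] == "white":
            self.colour[obj] = "grey"
            self.grey.append(obj)

    def step(self):
        """Scan one grey object; return False once no grey objects remain."""
        if not self.grey:
            return False
        obj = self.grey.pop(0)
        for child in self.heap[obj]:
            self.shade(child)
        self.colour[obj] = "black"
        return True


def reachable(heap, roots):
    seen, stack = set(), list(roots)
    while stack:
        o = stack.pop()
        if o not in seen:
            seen.add(o)
            stack.extend(heap[o])
    return seen


def lost_object_demo():
    heap = {"A": ["B"], "B": ["C"], "C": []}
    gc = Collector(heap, ["A"])
    gc.step()               # A is now black, B grey, C white
    heap["A"].append("C")   # mutator: new black-to-white reference
    heap["B"].remove("C")   # mutator: destroys the grey-to-white path
    while gc.step():
        pass
    return gc.colour, reachable(heap, ["A"])
```

Calling `lost_object_demo()` yields a colouring in which `C` is white while the reachability check still includes `C`, which is exactly the case that needs compensation.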
12
Compensation options
– Prevent mutator from creating black-to-white references
  – write barrier on black
  – read barrier on grey to prevent mutator obtaining white refs
– Prevent destruction of any path from a grey object to a white object without telling GC
  – write barrier on grey
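The two write-barrier options can be sketched side by side. This is an illustration under assumptions, not the talk's code: the class `Heap`, the methods `store_insertion` and `store_deletion`, and the driver `run` are invented for the example, and the deletion barrier is applied conservatively to any source that is not yet black rather than to grey sources only.

```python
class Heap:
    def __init__(self, objects, roots):
        self.fields = {o: list(fs) for o, fs in objects.items()}
        self.colour = {o: "white" for o in objects}
        self.grey = []
        for r in roots:
            self.shade(r)

    def shade(self, obj):
        if obj is not None and self.colour[obj] == "white":
            self.colour[obj] = "grey"
            self.grey.append(obj)

    def step(self):
        """Scan one grey object; return False once none remain."""
        if not self.grey:
            return False
        obj = self.grey.pop(0)
        for child in self.fields[obj]:
            self.shade(child)
        self.colour[obj] = "black"
        return True

    def store_insertion(self, src, i, new):
        # Write barrier on black: never leave a black-to-white reference.
        if self.colour[src] == "black":
            self.shade(new)
        self.fields[src][i] = new

    def store_deletion(self, src, i, new):
        # Write barrier on unscanned sources (assumption: grey or white):
        # tell the GC about the target that is about to be overwritten.
        if self.colour[src] != "black":
            self.shade(self.fields[src][i])
        self.fields[src][i] = new


def run(barrier):
    """Replay the problematic interleaving using the named barrier."""
    h = Heap({"A": ["B", None], "B": ["C"], "C": []}, roots=["A"])
    store = getattr(h, "store_" + barrier)
    h.step()              # A black, B grey, C white
    store("A", 1, "C")    # new reference from black A to white C
    store("B", 0, None)   # remove the grey-to-white path B -> C
    while h.step():
        pass
    return h.colour
```

Both `run("insertion")` and `run("deletion")` finish with `C` black: the first because the store into `A` shades `C`, the second because overwriting `B`'s field shades the old target.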
13
Mostly-copying GC [Bartlett]
– Copying collection with ambiguous roots
  – Uncooperative compilers
  – Untidy references
  – Explicit pinning
– Pin ambiguously-referenced objects
  – Shade their page grey without copying
– Assume heap accuracy
  – Copy remaining heap-referenced objects
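A rough sketch of the mostly-copying idea follows. The heap model (integer addresses, a `page_of` map), the function name `mostly_copying_collect`, and the way fresh addresses are generated are all assumptions for illustration. Any root value that happens to match an object address pins that object's whole page in place; everything else reachable through (accurate) heap references is copied.

```python
import itertools


def mostly_copying_collect(heap, page_of, ambiguous_roots):
    """heap: addr -> list of field addrs; page_of: addr -> page number.

    Returns (pinned_pages, forward, new_heap).
    """
    # Ambiguous roots may be integers that merely look like pointers,
    # so the pages they appear to reference are pinned, not copied.
    pinned = {page_of[v] for v in ambiguous_roots if v in heap}

    new_heap, forward, worklist = {}, {}, []
    fresh = itertools.count(max(heap, default=0) + 1000)  # hypothetical to-space addresses

    # Every object on a pinned page stays where it is and must be scanned.
    for addr in heap:
        if page_of[addr] in pinned:
            new_heap[addr] = list(heap[addr])
            worklist.append(addr)

    def relocate(target):
        if page_of[target] in pinned:
            return target                      # pinned: address unchanged
        if target not in forward:
            forward[target] = next(fresh)      # copy on first visit
            new_heap[forward[target]] = list(heap[target])
            worklist.append(forward[target])
        return forward[target]

    # Heap references are assumed accurate, so fields can be updated.
    while worklist:
        obj = worklist.pop()
        new_heap[obj] = [relocate(f) for f in new_heap[obj]]

    return pinned, forward, new_heap
```

Note that an unreferenced object sharing a page with a pinned one is retained as well, which is the conservative cost of pinning at page granularity in this sketch.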
14
Incremental MCGC [DeTreville]
– Enforce grey mutator invariant
  – STW greys ambiguously-referenced pages
  – Read barrier on grey using VM page protection
– Read barrier
  – Stop mutator threads
  – Unprotect page
  – Copy white targets to grey
  – Shade page black
  – Restart threads
– Atomic system call wrappers unprotect parameter targets (otherwise traps in OS return error)
15
Concurrent MCGC?
– Stopping all threads at each increment is prohibitive on SMP & impedes concurrency
– BUT barriers difficult to place on ambiguous references with uncooperative compilers
– ALSO preemptive scheduling may break wrapper atomicity
16
Mostly-concurrent MCGC
– Enforce black mutator invariant
  – STW blackens ambiguously-referenced pages
  – Read barrier on load of accurate (tidy) grey reference
– Read barrier:
  – Blacken grey references as they are loaded
– No system call wrappers: arguments are always black
17
Read barrier on load of grey
– Object header bit marks grey objects
– Inline fast path checks grey bit in target header, calls out to slow path if set
– Out-of-line slow path:
  – Lock heap meta-data
  – For each (grey) source object in target page
    – Copy white targets to grey
    – Clear grey header bit
  – Shade target page black
  – Unlock heap meta-data
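The fast path and slow path on this slide might look roughly like the following. It is a sketch under stated assumptions rather than the actual runtime: `Obj`, `Page`, `Heap`, `load`, `clean`, and `copy` are hypothetical names, page colours are plain strings, and copies always go to a grey page other than the one being cleaned.

```python
import threading


class Page:
    def __init__(self, colour):
        self.colour = colour   # "white", "grey" or "black"
        self.objects = []


class Obj:
    def __init__(self, page, fields=()):
        self.grey = False      # header bit: set only in newly-copied objects
        self.page = page
        self.fields = list(fields)
        self.forward = None    # forwarding pointer once copied
        page.objects.append(self)


class Heap:
    def __init__(self):
        self.lock = threading.Lock()   # guards heap meta-data
        self.to_page = None            # grey page currently receiving copies

    def copy(self, obj):
        """Copy a white target to grey (caller holds the lock)."""
        if obj.page.colour != "white":
            return obj
        if obj.forward is None:
            if self.to_page is None:
                self.to_page = Page("grey")
            clone = Obj(self.to_page, obj.fields)
            clone.grey = True
            obj.forward = clone
        return obj.forward

    def load(self, src, index):
        """Read barrier: the mutator only ever obtains black references."""
        ref = src.fields[index]
        if ref is not None and ref.grey:   # inline fast path: one bit test
            self.clean(ref)                # out-of-line slow path
        return ref

    def clean(self, target):
        with self.lock:                    # lock heap meta-data
            page = target.page
            if page.colour != "grey":
                return                     # another thread already cleaned it
            if self.to_page is page:
                self.to_page = None        # do not copy into the page being cleaned
            for obj in page.objects:       # each grey source object in the page
                obj.fields = [None if f is None else self.copy(f)
                              for f in obj.fields]
                obj.grey = False           # clear grey header bit
            page.colour = "black"          # shade target page black
```

Loading a grey reference through `load` therefore returns the same object, now on a black page, with its white targets replaced by grey copies that will themselves be cleaned when first loaded.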
18
Coherence for fast path
– STW phase synchronizes mutators’ views of heap state
– Grey bits are set only in newly-copied objects (i.e., newly-allocated grey pages) since most recent STW
– Mutators can never see a cleared grey header unless the page is also black
– Seeing a spurious grey header due to weak ordering is benign: slow path will synchronize
19
Implementation
– Modula-3: gcc-based compiler back-end
– No tricky target-specific stack-maps
– Compiler front-end emits barriers
– M3 threads map to preemptively-scheduled POSIX pthreads
– Stop/start threads: signals + semaphores, or OS primitives if available
– Simple to port: Darwin (OS X), Linux, Solaris, Alpha/OSF
20
Experiments
– Parallelized GCOld benchmark to permit throughput measurements for multiple mutators
– Measures steady-state GC throughput
– 2 platforms:
  – 2 x 2.3GHz PowerPC Macintosh Xserve running OS X 10.4.4
  – 8 x 700MHz Intel Pentium 3 SMP running Linux 2.6
21
Read Barriers: STW 1 user-level mutator thread, work=1
22
Elapsed time (s) 1 system-level mutator thread, work=1
23
Heap size 1 system-level mutator thread
24
BMU 1 system-level mutator thread, work=1000, ratio=1
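Assuming BMU here stands for bounded mutator utilization, the metric behind this plot can be sketched as below. The discrete timeline (one entry per time tick, 1 when the mutator runs and 0 when the collector has it paused) and the names `mmu` and `bmu` are simplifications made for the example; they are not taken from the talk. Minimum mutator utilization for a window size is the worst mutator share over any window of that size, and BMU additionally takes the minimum over all larger windows, which makes the curve monotonic.

```python
def mmu(timeline, w):
    """Minimum mutator utilization over every window of exactly w ticks."""
    n = len(timeline)
    return min(sum(timeline[i:i + w]) / w for i in range(n - w + 1))


def bmu(timeline, w):
    """Bounded mutator utilization: worst MMU over windows of w ticks or more."""
    return min(mmu(timeline, x) for x in range(w, len(timeline) + 1))
```

For the timeline `[1, 0, 1, 1, 0, 1]`, every 3-tick window contains two mutator ticks, but the 4-tick window `[0, 1, 1, 0]` contains only two as well, so the BMU at window size 3 is bounded by that larger window.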
25
Scalability work=1000, ratio=1, 8xP3
26
Java Hotspot server work=1000, 8xP3
27
Conclusions
– Mostly-concurrent, mostly-copying collection is feasible for multi-processors (proof-of-existence)
– Performance is good (scalable)
– Portable: changes only to compiler front-end to introduce barriers, and to GC run-time system
– Compiler back-end unchanged: full-blown optimizations enabled, no stack-map overheads
28
Future work
– Convert read barrier to “clean” only target object instead of whole page
29
BMU 1 system-level mutator thread, work=10, ratio=1
30
Scalability work=10, ratio=1, 8xP3
31
Java Hotspot server work=10, 8xP3