Published by Jenny Byrd. Modified over 9 years ago.
1
Reducing Pause Time of Conservative Collectors
Toshio Endo (National Institute of Informatics)
Kenjiro Taura (Univ. of Tokyo)
2
Incremental GC for soft-realtime applications
[Steele 75] [Yuasa 90] [Doligez 93]
Target: multimedia, games, etc.
  – Pauses should be <10ms
Collection tasks are divided into small pieces
Success: pauses of <5ms [Cheng 01]
  – They assume compiler cooperation
Pause reduction for ‘conservative’ GCs is insufficient
3
Conservative GC [Boehm et al. 88]
Mark-sweep GC for C/C++ programs
No compiler cooperation (e.g., write barriers)
Mostly parallel GC [Boehm et al. 91]
  – Incremental, conservative
  – Pauses >100ms are fairly common
4
Write barriers in conservative GCs
No fine-grain write barrier from the compiler, so the collector relies on the VM’s write protection
Coarse grain
  – Page level
  – Detects only the first update after protection
This restricts the design
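The coarse-grained behaviour described on this slide can be illustrated with a toy model. This is only a sketch in Python, not the collector's code: a real implementation would use OS page protection and a fault handler, whereas here `PageWriteBarrier`, `PAGE_SIZE` and the integer addresses are hypothetical stand-ins introduced for illustration.

```python
PAGE_SIZE = 4096  # assumed page size in bytes (illustrative only)


class PageWriteBarrier:
    """Toy model of a write barrier built on VM page protection."""

    def __init__(self, num_pages):
        self.protected = [False] * num_pages
        self.dirty = set()  # pages written since the last protection
        self.traps = 0      # number of simulated write faults

    def protect_all(self):
        """Write-protect every page and forget earlier dirty pages."""
        self.protected = [True] * len(self.protected)
        self.dirty.clear()

    def write(self, addr):
        """Simulate a user write to address addr."""
        page = addr // PAGE_SIZE
        if self.protected[page]:
            # Simulated fault handler: remember the page, then unprotect it.
            self.traps += 1
            self.dirty.add(page)
            self.protected[page] = False
        # Any later write to the same page proceeds without a trap.


barrier = PageWriteBarrier(num_pages=4)
barrier.protect_all()
barrier.write(100)   # first write to page 0: trapped
barrier.write(200)   # second write to page 0: not trapped
barrier.write(5000)  # first write to page 1: trapped
```

After the three writes only two traps have occurred, and the collector knows pages 0 and 1 are dirty but not which words in them changed.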
5
Incremental mark sweep algorithms
Snapshot at beginning & DLG [Yuasa 90] [Doligez 93]
  – Make (conceptual) heap snapshot before marking
  – Promise short pause
  – Large space overhead with VM write barrier
Incremental update [Steele 75] [Dijkstra 78]
  – Maintain consistency after marking
  – Need final marking before finish: unboundedly long!
  – Only choice with VM
6
Contributions
Analyze why previous algorithms fail
Propose techniques to bound pauses & guarantee progress
Show a ‘stress-test’ benchmark: iukiller
Demonstrate experimental results
  – <5ms in applications
  – <12ms in the stress-test benchmark (constant across all heap sizes)
(This talk omits parallel issues)
7
Overview of presentation
Mostly parallel GC
Techniques to reduce pause time
Experimental results
Related work
Summary
8
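Mostly parallel garbage collector (1)
[Flowchart] Start GC → write-protect heap → incremental mark (interleaved with user program) → final marking → incremental sweep (interleaved with user program) → End GC
Trap handler (on a user write fault): remember the address of the dirty (= updated) page, then unprotect it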
9
Mostly parallel garbage collector (2)
Second update is un-trapped
  – Mark r in final phase
Need final marking
[Figure: objects p, q and r before and after writes]
10
Final marking
1. Scan all dirty pages + root
2. Mark all unmarked objects from scanned region
The amount of work is unbounded
  – # of dirty pages
  – Objects reachable from a dirty page
Makes pauses >100ms
[Figure: heap and root]
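To see why the work is unbounded, the two steps can be sketched over a toy object graph. This is a hedged illustration, not the collector's implementation: `final_marking`, the dictionary-based heap, and the pseudo-object `"root"` are assumptions made for the example. Following a later slide, the scan of a dirty page starts from the marked objects on it.

```python
def final_marking(roots, dirty_pages, objects_on_page, children, marked):
    """Toy final marking; returns how many objects were newly marked."""
    work = 0
    # Step 1: the scanned region is the roots plus marked objects on dirty pages.
    stack = list(roots)
    for page in dirty_pages:
        stack.extend(o for o in objects_on_page[page] if o in marked)
    # Step 2: mark every unmarked object reachable from the scanned region.
    while stack:
        obj = stack.pop()
        for child in children.get(obj, ()):
            if child not in marked:
                marked.add(child)
                work += 1
                stack.append(child)
    return work


# One dirty page holding one marked object p, with a long unmarked chain below it.
n = 1000
children = {"root": ["p"], "p": ["c0"]}
for i in range(n - 1):
    children[f"c{i}"] = [f"c{i + 1}"]
marked = {"p"}
work = final_marking(["root"], [0], {0: ["p"]}, children, marked)
```

Here a single dirty page forces the pause to mark all `n` chained objects, so limiting the number of dirty pages alone does not limit the pause.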
11
Overview of presentation
Mostly parallel garbage collector
Techniques to reduce pause time
Experimental results
Related work
Summary
12
Goal of our collector
Bound pause time (< constant)
  – Mutator utilization is important, but focus on pause
Guarantee progress of collection
Combine two techniques:
  – Bound dirty pages (BD)
  – Retry incremental marking (RI)
13
Bounding dirty pages (1)
Basic collector produces many dirty pages
Keep # of dirty pages < a given limit
  – If it exceeds the limit, choose a dirty page
  – Re-protect, scan, clean it
  – Good: reduces work in final marking
  – Bad: more protection cost
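A minimal sketch of the bounding step, under stated assumptions: the slide does not say which dirty page is chosen, so the oldest one (FIFO) is used here, and the set is kept at or below `limit` although the slide's bound may be strict. `BoundedDirtyPages` and its fields are hypothetical names, and appending to `rescanned` stands in for actually scanning the page's objects.

```python
from collections import deque


class BoundedDirtyPages:
    """Toy model: keep the number of dirty pages at or below a limit."""

    def __init__(self, limit):
        self.limit = limit
        self.dirty = deque()   # dirty pages, oldest first
        self.protected = set()
        self.rescanned = []    # pages cleaned early, in order

    def protect(self, pages):
        self.protected.update(pages)

    def on_write(self, page):
        if page not in self.protected:
            return  # page already dirty: the write is not trapped
        self.protected.discard(page)
        self.dirty.append(page)
        while len(self.dirty) > self.limit:
            victim = self.dirty.popleft()  # assumed policy: oldest dirty page
            self.protected.add(victim)     # re-protect first so later writes trap again
            self.rescanned.append(victim)  # stand-in for scanning, which cleans the page


bd = BoundedDirtyPages(limit=2)
bd.protect(range(5))
for page in (0, 1, 2, 3):
    bd.on_write(page)
```

After the four writes, pages 0 and 1 have been re-protected and scanned early, leaving only pages 2 and 3 for final marking. The extra re-protection is the cost the slide lists under "Bad".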
14
Bounding dirty pages (2)
Is pause now bounded? … No!
Unmarked objects reachable from a dirty page are not bounded
[Figure: heap and root]
15
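Retrying incremental marking (1)
[Flowchart] Start GC → write-protect heap → incremental mark (interleaved with user program; trap handler on write faults) → final marking → finished before limit? Yes: incremental sweep (interleaved with user program) → End GC. No: retry incremental marking!
Keep the work of final marking < a given limit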
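The control flow of the retry loop can be sketched as follows. This is a simplified model, not the authors' code: `run_cycle` is a hypothetical function, and `hidden_per_round` is an assumed input giving how many objects are still unmarked when each round of incremental marking ends.

```python
def run_cycle(hidden_per_round, budget):
    """Toy GC cycle; returns the final-marking work done in each pause."""
    pauses = []
    for hidden in hidden_per_round:
        # Incremental marking runs here, interleaved with the user program,
        # so it does not count towards the pause.
        pauses.append(min(hidden, budget))  # final marking stops at the budget
        if hidden <= budget:
            return pauses  # finished before the limit: go on to sweeping
        # Otherwise the final marking aborts and incremental marking is retried.
    raise RuntimeError("starvation: final marking never finished")


pauses = run_cycle([500, 40, 3], budget=10)
```

With these inputs the cycle needs three rounds and no pause does more than `budget` units of work. If every round leaves more than `budget` objects unmarked, for example `run_cycle([500, 500, 500], budget=10)`, the function raises, which is the starvation risk that retrying alone carries.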
16
Retrying incremental marking (2)
Good: Bound length of single final marking
Bad: Risk of starvation (no progress)
  – Final marking may abort before it finishes scanning the (unbounded) dirty pages
  – Unmarked objects may ‘escape’ from collector
17
The worst case
Abort a final marking with no progress
[Figure: repeating cycle labelled ‘write’, ‘Incr. finishes’, ‘Final aborts’]
18
Ensuring bounded pause and progress
Either is insufficient … Need two techniques:
  – Bounding dirty pages (BD)
  – Retrying incremental marking (RI)
BD: every final marking can scan all dirty pages
  – It finds some unmarked objects, if any
19
Overview of presentation
Mostly parallel garbage collector
Techniques to reduce pause time
Experimental results
Related work
Summary
20
Experimental Environments
400MHz UltraSPARC, Solaris 8
Four GCs
  – Stop: Stop-the-world GC
  – Basic: Basic incremental GC
  – BD: Use bounding dirty pages
  – BD+R: Use bounding dirty pages + retrying incremental marking
Basic/BD/BD+R: GC starts when heap usage > 75%
BD/BD+R: # of dirty pages < 16
21
The iukiller synthetic benchmark
‘Stress-test’ benchmark for mostly parallel GC
Trees tend to escape from collector
Final marking tends to be long
[Figure: root, large binary trees, repeat]
22
Results of iukiller benchmark: the maximum pause time
Previous collectors fail
  – > 1.8 seconds
  – The larger the heap, the longer the pause
BD+R achieves <12ms pause
  – Independent of heap size
23
Application benchmarks
Programs written in C/C++
  – deltablue: an incremental constraint solver (25MB)
  – espresso: a logic optimizer for PLA (10MB)
  – N-Body: an N-Body solver with Barnes-Hut (15MB)
  – CKY: a context free grammar parser (40MB)
  – Cube: a Rubik’s cube puzzle solver (8MB)
24
Results of application benchmarks: the maximum pause time
BD+R achieves <5ms pause in five applications
BD is also OK (< 16ms)
[Chart labels: 215ms, 283ms]
25
Results of application benchmarks: overhead
BD/BD+R is <9% slower than Basic
  – More protection
All incr. GCs are 1 to 53% slower than Stop
  – VM write barrier
  – Floating garbage
  – More GC cycles
[Chart: total execution times (‘Stop’ = 1)]
26
Related work
[Appel et al. 88]
  – Copy GC with VM read barrier. Slower than write barrier
[Furuso et al. 91]
  – Snapshot-at-beginning on VM. Large space overhead
Recent version of [Boehm et al. 91]
  – Time limit on final marking. Risks of starvation
[Printezis et al. 00] [Ossia et al. 02]
  – Keep # of dirty cards small. Final marking is still unbounded
27
Summary
An incremental conservative GC
  – Short pause (<5ms in 5 applications)
  – GC progress
Use both techniques:
  – Bounding dirty pages
  – Retrying incremental marking
28
Future direction
Reducing overhead of BD
  – Strategy for proper limit for dirty pages
Bounding roots to be scanned
  – Protect stacks partially
29
Mostly parallel garbage collector (cont. 1)
[Timeline figure; legend: User, GC]
Stop-the-world GC: marking, sweeping
Mostly parallel GC, one GC cycle: initialization & protection, concurrent marking, final marking, concurrent sweeping
30
Mostly parallel garbage collector (cont. 2)
Protect heap and start marking from roots
Proceed with concurrent marking
User program may
  – update pointers
  – create new objects
Concurrent marking finishes
  – But some reachable objects are still unmarked!!
Perform final marking atomically from
  – marked objects in dirty pages
  – roots
[Figure: heap and root]
35
Technique 2: Retrying concurrent marking
Instead of a single final marking, we repeat concurrent marking and termination check
  – If the termination check takes longer than a given limit, it aborts and restarts concurrent marking
Boehm’s implementation on the Web repeats the termination check up to twice
[Timeline figure: GC cycle: initialization, concurrent marking, termination check, concurrent sweeping]
36
Discussion on techniques
Each technique is not novel, but combining the two is essential
Without retrying, final marking may still be long
Without bounding, progress of termination check may be insufficient
[Figure: w/o bounding: termination check aborted; with bounding: termination check found unmarked objects]
37
Other techniques
Concurrent protecting
  – Atomic protecting takes O(heap-size) time!
Allocating black in later stages of GC cycle
  – Allocating always black retains many short-lived objects
  – Allocating always white (unmarked) may prevent GC progress
  – Allocating white first, and black later
38
Results: Minimum mutator utilization (MMU)
Window sizes are on a log scale
The optimized collector shows good MMUs for small windows
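For readers unfamiliar with the metric, MMU for a window size w is the smallest fraction of any time window of length w that is left to the user program. The sketch below computes it from a list of pause intervals; the function `mmu`, its arguments, and the sample pause times are illustrative assumptions, not data from the talk. It assumes non-overlapping pauses and a window no longer than the run.

```python
def mmu(pauses, total_time, window):
    """Minimum mutator utilization for one window size.

    pauses: non-overlapping (start, end) GC pause intervals within [0, total_time].
    """
    def gc_time(t0, t1):
        # Total pause time that overlaps the window [t0, t1].
        return sum(max(0.0, min(e, t1) - max(s, t0)) for s, e in pauses)

    # GC time per window is piecewise linear in the window start, so its
    # maximum lies where a window edge meets a pause edge, or at the run's ends.
    candidates = {0.0, total_time - window}
    for s, e in pauses:
        candidates.update((s, e, s - window, e - window))
    worst = max(gc_time(t, t + window)
                for t in candidates if 0.0 <= t <= total_time - window)
    return (window - worst) / window


# Hypothetical run of 100 ms with two 5 ms pauses, 3 ms apart.
pauses = [(10, 15), (18, 23)]
small_window = mmu(pauses, total_time=100, window=10)
whole_run = mmu(pauses, total_time=100, window=100)
```

In this made-up run, the worst 10 ms window contains 7 ms of pauses, so utilization there is 30%, while over the whole run it is 90%. Pauses that cluster together hurt small windows even when each pause is short.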
39
Results of application benchmarks: the number of repetitions
BD+R: Repetitions of incr. marking per GC
  – Usually <2 times
  – No infinite loop
The worst case is 5 times. Need improvement?