Published by Preston Singleton. Modified over 9 years ago.
1
The Metronome: A Hard Real-time Garbage Collector
David F. Bacon, Perry Cheng, V.T. Rajan
IBM T.J. Watson Research Center
2
The Problem
Real-time systems growing in importance
  – Many CPUs in a BMW: “80% of innovation in SW”
Programmers left behind
  – Still using assembler and C
  – Lack productivity advantages of Java
Result
  – Complexity
  – Low reliability or very high validation cost
3
Problem Domain
Hard real-time garbage collection
Uniprocessor
  – Multiprocessors very rare in real-time systems
  – Complication: collector must be finely interleaved
  – Simplification: memory model easier to program
    – No truly concurrent operations
    – Sequentially consistent
4
Metronome Project Goals
Make GC feasible for hard real-time systems
Provide simple application interface
Develop technology that is efficient:
  – Throughput, space comparable to stop-the-world
BREAK THE MILLISECOND BARRIER
  – While providing even CPU utilization
5
Outline
What is “Real-Time” Garbage Collection?
Problems with Previous Work
Overview of the Metronome
Scheduling
Empirical Results
Conclusions
6
What is “Real-Time” Garbage Collection?
7
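3 Uniprocessor GC Types
[Figure: timelines for stop-the-world (STW), incremental (Inc), and real-time (RT) collectors; x-axis: time; collections GC #1 and GC #2 marked]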
8
Utilization vs Time: 2 s Window
[Figure: panels for STW, Inc, and RT]
9
Utilization vs Time: 0.4 s Window
[Figure: panels for STW, Inc, and RT]
10
Minimum Mutator Utilization
[Figure: curves for STW, Inc, and RT]
11
What is Real-time? It Is …
Maximum pause time < required response
CPU utilization sufficient to accomplish task
  – Measured with MMU
Memory requirement < resource limit
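Minimum mutator utilization (MMU) can be made concrete with a small calculation. The sketch below is illustrative only and is not taken from the Metronome implementation; the function names `collector_time_in_window` and `mmu`, and the representation of a trace as a list of collector intervals, are assumptions made for this example.

```python
def collector_time_in_window(pauses, start, end):
    """Total collector time that falls inside the window [start, end]."""
    return sum(max(0.0, min(e, end) - max(s, start)) for s, e in pauses)


def mmu(pauses, total_time, window):
    """Minimum mutator utilization for one window size.

    pauses: list of non-overlapping (start, end) collector intervals.
    total_time: length of the whole trace; window: window size (same unit).
    """
    if not 0 < window <= total_time:
        raise ValueError("window must be in (0, total_time]")
    last_start = total_time - window
    # Collector time in a sliding window peaks when the window starts at a
    # pause start or ends at a pause end, so only those positions are checked
    # (plus the two ends of the trace).
    candidates = {0.0, last_start}
    for s, e in pauses:
        candidates.add(s)
        candidates.add(e - window)
    worst = 0.0
    for start in candidates:
        start = min(max(start, 0.0), last_start)
        worst = max(worst, collector_time_in_window(pauses, start, start + window))
    return 1.0 - worst / window


# Two hypothetical 100 ms traces, each with 20 ms of collector time in total.
stop_the_world = [(40.0, 60.0)]
interleaved = [(10.0 * i + 5.0, 10.0 * i + 10.0) for i in range(4)]
```

With these made-up traces, `mmu(stop_the_world, 100.0, 10.0)` is 0.0 because a 10 ms window fits entirely inside the single long pause, while `mmu(interleaved, 100.0, 10.0)` is 0.5. Total collector time is the same in both cases; only its distribution differs, which is what MMU is meant to capture.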
12
Problems with Previous Work
13
No Compaction
Hypothesis: fragmentation not a problem
  – Can use avoidance and coalescing [Johnstone]
  – Non-moving incremental collection is simpler
Problem: long-running applications
  – Reboot your car every 500 miles?
[Figure: memory scale marked 1 X, 2 X, 3 X, 4 X max live]
14
Copying Collection
Idea: copy into to-space concurrently
Compaction is part of basic operation
Problem: space usage
  – 2 semi-spaces plus space to mutate during GC
  – Requires 4-8 X max live data in practice
[Figure: memory scale marked 1 X, 2 X, 3 X, 4 X max live]
15
Work-Based Scheduling
The Baker fallacy:
  – “A real-time list processing system is one in which the time required by the elementary list processing operations…is bounded by a small constant” [Baker’78]
Implicitly assumes GC work done in mutator
  – What does “small constant” mean?
  – Typically, constant is not so small
    – And there is variability (fault rate analogy)
16
Overview of the Metronome
17
Is it real-time? Yes
Maximum pause time < 4 ms
MMU > 50% ± 2%
Memory requirement < 2 X max live
18
Components of the Metronome
Segregated free list allocator
  – Geometric size progression limits internal fragmentation
Write barrier: snapshot-at-the-beginning [Yuasa]
Read barrier: to-space invariant [Brooks]
  – New techniques with only 4% overhead
Incremental mark-sweep collector
  – Mark phase fixes stale pointers
Selective incremental defragmentation
  – Moves < 2% of traced objects
Arraylets: bound fragmentation, large object ops
Time-based scheduling
[Slide legend: Old, New]
19
Segregated Free List Allocator
Heap divided into fixed-size pages
Each page divided into fixed-size blocks
Objects allocated in smallest block that fits
[Figure: pages with block sizes 24, 16, and 12]
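The three rules on this slide can be sketched as a toy allocator. This is a minimal illustration under stated assumptions, not the Metronome allocator: the class name `SegregatedAllocator`, the use of integer addresses, and the tiny page size in the usage lines are all hypothetical.

```python
from bisect import bisect_left
from collections import deque


class SegregatedAllocator:
    """Toy segregated free-list allocator over integer addresses."""

    def __init__(self, page_size, block_sizes, num_pages):
        self.page_size = page_size
        self.block_sizes = sorted(block_sizes)
        self.free_pages = deque(range(num_pages))        # unused page indices
        self.free_lists = {s: deque() for s in self.block_sizes}

    def allocate(self, nbytes):
        """Return (address, block_size) using the smallest block that fits."""
        i = bisect_left(self.block_sizes, nbytes)
        if i == len(self.block_sizes):
            raise ValueError("request exceeds the largest block size")
        size = self.block_sizes[i]
        free = self.free_lists[size]
        if not free:
            if not self.free_pages:
                raise MemoryError("no free pages left")
            base = self.free_pages.popleft() * self.page_size
            # Carve the whole page into blocks of this one size; the leftover
            # bytes at the end of the page are page-internal fragmentation.
            free.extend(base + k * size for k in range(self.page_size // size))
        return free.popleft(), size

    def free(self, address, size):
        """Return a block to the free list of its size class."""
        self.free_lists[size].append(address)


# Hypothetical tiny heap: two 64-byte pages, block sizes as in the figure.
heap = SegregatedAllocator(page_size=64, block_sizes=[12, 16, 24], num_pages=2)
first = heap.allocate(13)    # 13 bytes go into a 16-byte block
```

Because each page holds blocks of a single size, a freed block can only be reused for requests in the same size class, which is why pages can be left sparsely occupied when the size distribution shifts.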
20
Fragmentation on a Page
Internal: wasted space at end of object
Page-internal: wasted space at end of page
External: blocks needed for other size
[Figure: a page with external, internal, and page-internal fragmentation labelled]
21
Limiting Internal Fragmentation
Choose page size P and block sizes s_k such that
  – s_k = s_{k-1} (1 + ρ)
  – s_max = P ρ
Then
  – Internal and page-internal fragmentation < ρ
Example:
  – P = 16 KB, ρ = 1/8, s_max = 2 KB
  – Internal and page-internal fragmentation < 1/8
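The bound on this slide can be checked numerically. The sketch below generates a geometric size progression and measures the waste; it is an illustration only. The smallest block size of 16 bytes and the rounding down of each size to a whole number of bytes are assumptions of this example, not details given on the slide.

```python
from bisect import bisect_left
from fractions import Fraction


def size_classes(page_size, rho, min_size):
    """Block sizes s_k = s_{k-1} * (1 + rho), rounded down, up to s_max = P * rho."""
    s_max = int(page_size * rho)
    sizes = [min_size]
    while sizes[-1] < s_max:
        nxt = int(sizes[-1] * (1 + rho))     # round down to whole bytes
        nxt = max(nxt, sizes[-1] + 1)        # always make progress
        sizes.append(min(nxt, s_max))
    return sizes


def block_size_for(sizes, nbytes):
    """Smallest block size that fits a request of nbytes."""
    i = bisect_left(sizes, nbytes)
    if i == len(sizes):
        raise ValueError("request exceeds s_max")
    return sizes[i]


def page_internal_waste(page_size, block_size):
    """Bytes left over at the end of a page carved into equal blocks."""
    return page_size % block_size


P = 16 * 1024            # page size from the slide's example
RHO = Fraction(1, 8)     # rho from the slide's example
SIZES = size_classes(P, RHO, min_size=16)
```

For requests between the assumed minimum size and s_max, the wasted fraction `(block_size_for(SIZES, n) - n) / n` stays below ρ, and `page_internal_waste(P, s) / P` stays below ρ for every size class because no block is larger than P ρ.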
22
Fragmentation: ρ=1/8 vs ρ=1/2
23
Write Barrier: Snapshot-at-start
Problem: mutator changes object graph
Solution: write barrier prevents lost objects
Logically, collector takes atomic snapshot
  – Objects live at snapshot will not be collected
  – Write barrier saves overwritten pointers [Yuasa]
  – Write buffer must be drained periodically
[Figure: objects A, B, C and the write buffer (WB) holding B]
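A small simulation shows why saving the overwritten pointer prevents lost objects. This is a sketch in the spirit of a snapshot-at-the-beginning barrier, not the Metronome code; the names `Obj`, `Heap`, `write`, `mark_increment`, and `demo` are hypothetical, and marking is modelled as a simple worklist with a per-increment budget.

```python
class Obj:
    def __init__(self, name):
        self.name = name
        self.fields = {}      # field name -> Obj or None
        self.marked = False


class Heap:
    def __init__(self):
        self.marking = False
        self.write_buffer = []

    def write(self, obj, field, new_value):
        """Pointer store with a snapshot-style write barrier."""
        old = obj.fields.get(field)
        if self.marking and old is not None:
            self.write_buffer.append(old)    # save the overwritten pointer
        obj.fields[field] = new_value

    def mark_increment(self, worklist, budget):
        """Mark up to `budget` objects; return True once marking is complete."""
        while budget > 0:
            if not worklist:
                if not self.write_buffer:
                    return True
                # Drain the write buffer: saved pointers are treated as roots.
                worklist.extend(self.write_buffer)
                self.write_buffer.clear()
            obj = worklist.pop()
            if obj.marked:
                continue
            obj.marked = True
            worklist.extend(v for v in obj.fields.values() if v is not None)
            budget -= 1
        return not worklist and not self.write_buffer


def demo():
    heap, a, b, c = Heap(), Obj("A"), Obj("B"), Obj("C")
    b.fields["g"] = c                    # at the snapshot, C is reachable via B
    heap.marking = True
    worklist = [b, a]                    # roots; A is popped first
    heap.mark_increment(worklist, 1)     # collector marks A, then yields
    heap.write(a, "f", c)                # mutator hides C behind marked A
    heap.write(b, "g", None)             # barrier saves the old pointer to C
    done = heap.mark_increment(worklist, 10)
    return a, b, c, done
```

In `demo`, C is live at the snapshot but ends up reachable only from the already-marked A. Without the barrier the collector would never visit it; with the barrier, C is found when the write buffer is drained.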
24
Read Barrier: To-space Invariant
Problem: collector moves objects (defragmentation)
  – and mutator is finely interleaved
Solution: read barrier ensures consistency
  – Each object contains a forwarding pointer [Brooks]
  – Read barrier unconditionally forwards all pointers
  – Mutator never sees old versions of objects
[Figure: from-space and to-space, before and after a move; objects A, X, Y, Z and the copy A′]
25
Read Barrier Optimizations
Barrier variants: when to redirect
  – Lazy: easier for collector (no fixup)
  – Eager: better for performance (loop over a[i])
Standard optimizations: CSE, code motion
Problem: pointers can be null
  – Augment read barrier for GetField(x,offset):
      tmp = x[offset];
      return tmp == null ? null : tmp[redirect]
  – Optimize by null-check combining, sinking
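The null-checked barrier for GetField can be written out as a short runnable sketch. It is an illustration under assumptions, not the compiler-generated barrier: the `Obj` class, the `redirect` attribute as the forwarding pointer, and the `move` helper are hypothetical stand-ins, and the reference `x` passed to `get_field` is assumed to be already forwarded.

```python
class Obj:
    def __init__(self, **fields):
        self.fields = dict(fields)
        self.redirect = self      # forwarding pointer: self until the object moves


def get_field(x, offset):
    """Read barrier: forward the loaded pointer, passing null through."""
    tmp = x.fields[offset]
    return None if tmp is None else tmp.redirect


def move(obj):
    """Collector side: copy the object and forward the old copy to the new one."""
    copy = Obj(**obj.fields)
    obj.redirect = copy
    return copy


y = Obj(value=None)
x = Obj(f=y, g=None)
before = get_field(x, "f")    # y has not moved, so this is y itself
y_new = move(y)
after = get_field(x, "f")     # the stale pointer in x.f is forwarded to y_new
```

The field `x.f` still holds the old pointer after the move, but the mutator only ever observes `y_new`, which is the to-space invariant. The extra null test is the cost that null-check combining and sinking aim to reduce.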
26
Read Barrier Results
Conventional wisdom: read barriers too slow
  – Previous studies: 20-40% overhead [Zorn, Nielsen]
27
Incremental Mark-Sweep
Mark/sweep finely interleaved with mutator
Write barrier ensures no lost objects
  – Must treat objects in write buffer as roots
Read barrier ensures consistency
  – Marker always traces correct object
With barriers, interleaving is simple
28
Pointer Fixup During Mark
When can a moved object be freed?
  – When there are no more pointers to it
Mark phase updates pointers
  – Redirects forwarded pointers as it marks them
Object moved in collection n can be freed:
  – At the end of mark phase of collection n+1
[Figure: from-space and to-space; objects A, X, Y, Z and the copy A′]
29
Selective Defragmentation
When do we move objects?
  – When there is fragmentation
Usually, program exhibits locality of size
  – Dead objects are re-used quickly
Defragment either when
  – Dead objects are not re-used for a GC cycle
  – Free pages fall below limit for performing a GC
In practice: we move 2-3% of data traced
  – Major improvement over copying collector
30
Arraylets
Large arrays create problems
  – Fragment memory space
  – Cannot be moved in a short, bounded time
Solution: break large arrays into arraylets
  – Access via indirection; move one arraylet at a time
Optimizations
  – Type-dependent code optimized for contiguous case
  – Opportunistic contiguous allocation
[Figure: array A split into arraylets A1, A2, A3]
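Access via indirection can be sketched with a spine of pointers to fixed-size chunks. This is a minimal illustration, not the Metronome object layout; the class name `ArrayletArray`, the `spine` list, and the chunk size used in the usage lines are assumptions for the example.

```python
class ArrayletArray:
    """Array stored as fixed-size arraylets reached through a spine."""

    def __init__(self, length, arraylet_size, fill=0):
        self.length = length
        self.arraylet_size = arraylet_size
        count = -(-length // arraylet_size)          # ceiling division
        self.spine = [
            [fill] * min(arraylet_size, length - k * arraylet_size)
            for k in range(count)
        ]

    def _locate(self, index):
        if not 0 <= index < self.length:
            raise IndexError(index)
        return divmod(index, self.arraylet_size)     # (arraylet, offset)

    def __getitem__(self, index):
        k, offset = self._locate(index)
        return self.spine[k][offset]

    def __setitem__(self, index, value):
        k, offset = self._locate(index)
        self.spine[k][offset] = value

    def move_arraylet(self, k):
        """Relocate one arraylet: copy a bounded chunk, then update the spine."""
        self.spine[k] = list(self.spine[k])


arr = ArrayletArray(length=10, arraylet_size=4)
for i in range(10):
    arr[i] = i * i
old_chunk = arr.spine[1]
arr.move_arraylet(1)     # work is bounded by arraylet_size, not by array length
```

Each move copies at most `arraylet_size` elements, so the time for one relocation step is bounded regardless of how large the whole array is. The price is the extra spine lookup on every access, which the contiguous-case optimizations on the slide are meant to avoid.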
31
Scheduling
32
Work-based Scheduling
Trigger the collector to collect C_W bytes
  – Whenever the mutator allocates Q_W bytes
[Figures: MMU (CPU utilization) vs window size (s, log scale); space (MB) vs time (s)]
33
Time-based Scheduling
Trigger collector to run for C_T seconds
  – Whenever mutator runs for Q_T seconds
[Figures: space (MB) vs time (s); MMU (CPU utilization) vs window size (s, log scale)]
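The two triggers can be compared with a small simulation. This is a sketch under simplifying assumptions, not the Metronome scheduler: times are integer microseconds, each work-based increment is assumed to cost a fixed `increment_us`, and the function names and the allocation burst in the usage lines are hypothetical.

```python
def time_based_pauses(q_t, c_t, gc_work):
    """Collector intervals when the collector runs c_t after every q_t of mutator time."""
    pauses, now, remaining = [], 0, gc_work
    while remaining > 0:
        now += q_t                         # mutator quantum
        run = min(c_t, remaining)          # collector quantum
        pauses.append((now, now + run))
        now += run
        remaining -= run
    return pauses


def work_based_pauses(alloc_events, q_w, increment_us):
    """Collector intervals when one increment runs per q_w bytes allocated.

    alloc_events: list of (mutator_time_us, nbytes), sorted by time.
    """
    pauses, gc_total, since_gc = [], 0, 0
    for mutator_time, nbytes in alloc_events:
        since_gc += nbytes
        while since_gc >= q_w:
            since_gc -= q_w
            start = mutator_time + gc_total    # wall clock includes earlier pauses
            pauses.append((start, start + increment_us))
            gc_total += increment_us
    return pauses


def longest_collector_run(pauses):
    """Longest stretch of back-to-back collector time."""
    longest, run_start, run_end = 0, None, None
    for start, end in pauses:
        if run_end is not None and start <= run_end:
            run_end = max(run_end, end)
        else:
            run_start, run_end = start, end
        longest = max(longest, run_end - run_start)
    return longest


time_based = time_based_pauses(q_t=5000, c_t=5000, gc_work=20000)
burst = [(1000, 1024)] * 4                     # four allocations at the same instant
work_based = work_based_pauses(burst, q_w=1024, increment_us=2000)
```

In the time-based schedule the longest collector run is always C_T (5000 µs here) and the utilization over one full period is Q_T / (Q_T + C_T). In the work-based schedule the allocation burst causes four increments to run back to back (8000 µs here), so the effective pause depends on the mutator's allocation behaviour rather than on a fixed quantum.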
34
Parameterization
[Diagram: Mutator, Collector, and Tuner with their parameters]
Mutator: a*(ΔGC) = allocation rate; m = maximum live memory
Collector: R = collection rate
Tuner: Δt = real-time interval; s = maximum used memory; u = CPU utilization
35
Empirical Results
36
Pause time distribution: javac
[Figure: panels for time-based scheduling and work-based scheduling; 12 ms marked]
37
Utilization vs Time: javac
[Figure: utilization (0.0 to 1.0) vs time (s); panels for time-based scheduling and work-based scheduling; 0.45 marked]
38
MMU: javac
39
Space Usage: javac
40
Conclusions
41
The Metronome provides true real-time GC
  – First collector to do so without major sacrifice
    – Short pauses (4 ms)
    – High MMU during collection (50%)
    – Low memory consumption (2 X max live)
Critical features
  – Time-based scheduling
  – Hybrid, mostly non-copying approach
  – Integration with compiler
42
Future Work
Two main goals:
  – Reduce pause time, memory requirement
  – Increase predictability
Pause time:
  – Expect sub-millisecond using current techniques
  – For 10’s of microseconds, need interrupt-based
Predictability
  – Studying parameterization of collector
  – Good research area