Presentation is loading. Please wait.

Presentation is loading. Please wait.

David F. Bacon Perry Cheng V.T. Rajan IBM T.J. Watson Research Center The Metronome: A Hard Real-time Garbage Collector.

Similar presentations


Presentation on theme: "David F. Bacon Perry Cheng V.T. Rajan IBM T.J. Watson Research Center The Metronome: A Hard Real-time Garbage Collector."— Presentation transcript:

1 David F. Bacon Perry Cheng V.T. Rajan IBM T.J. Watson Research Center The Metronome: A Hard Real-time Garbage Collector

2 The Problem Real-time systems growing in importance – Many CPUs in a BMW: “80% of innovation in SW” Programmers left behind – Still using assembler and C – Lack productivity advantages of Java Result – Complexity – Low reliability or very high validation cost

3 Problem Domain Hard real-time garbage collection Uniprocessor – Multiprocessors very rare in real-time systems – Complication: collector must be finely interleaved – Simplification: memory model easier to program No truly concurrent operations Sequentially consistent

4 Metronome Project Goals Make GC feasible for hard real-time systems Provide simple application interface Develop technology that is efficient: – Throughput, Space comparable to stop-the- world BREAK THE MILLISECOND BARRIER – While providing even CPU utilization

5 Outline What is “Real-Time” Garbage Collection? Problems with Previous Work Overview of the Metronome Scheduling Empirical Results Conclusions

6 What is “ Real-Time ” Garbage Collection?

7 3 Uniprocessor GC Types STW Inc RT time GC #1GC #2

8 Utilization vs Time: 2s Window STW Inc RT

9 Utilization vs Time:.4 s Window STW Inc RT

10 Minimum Mutator Utilization STW Inc RT

11 What is Real-time? It Is … Maximum pause time < required response CPU Utilization sufficient to accomplish task – Measured with MMU Memory requirement < resource limit

12 Problems with Previous Work

13 No Compaction Hypothesis: fragmentation not a problem – Can use avoidance and coalescing [ Johnstone] – Non-moving incremental collection is simpler Problem: long-running applications – Reboot your car every 500 miles? 1 X max live 2 X max live 3 X max live 4 X max live

14 Copying Collection Idea: copy into to-space concurrently Compaction is part of basic operation Problem: space usage – 2 semi-spaces plus space to mutate during GC – Requires 4-8 X max live data in practice 1 X max live 2 X max live 3 X max live 4 X max live

15 Work-Based Scheduling The Baker fallacy: – “A real-time list processing system is one in which the time required by the elementary list processing operations…is bounded by a small constant” [Baker’78] Implicitly assumes GC work done in mutator – What does “small constant” mean? – Typically, constant is not so small And there is variability (fault rate analogy)

16 Overview of the Metronome

17 Is it real-time? Yes Maximum pause time < 4 ms MMU > 50% ± 2% Memory requirement < 2 X max live

18 Components of the Metronome Segregated free list allocator – Geometric size progression limits internal fragmentation Write barrier: snapshot-at-the-beginning [Yuasa] Read barrier: to-space invariant [Brooks] – New techniques with only 4% overhead Incremental mark-sweep collector – Mark phase fixes stale pointers Selective incremental defragmentation – Moves < 2% of traced objects Arraylets: bound fragmentation, large object ops Time-based scheduling OldNew

19 Segregated Free List Allocator Heap divided into fixed-size pages Each page divided into fixed-size blocks Objects allocated in smallest block that fits 24 16 12

20 Fragmentation on a Page Internal: wasted space at end of object Page-internal: wasted space at end of page External: blocks needed for other size external internal page-internal

21 Limiting Internal Fragmentation Choose page size P and block sizes s k such that – s k = s k-1 (1+ ρ ) – s max = P ρ Then – Internal and page-internal fragmentation < ρ Example: – P =16KB, ρ =1/8, s max = 2KB – Internal and page-internal fragmentation < 1/8

22 Fragmentation: ρ=1/8 vs ρ=1/2

23 Write Barrier: Snapshot-at-start Problem: mutator changes object graph Solution: write barrier prevents lost objects Logically, collector takes atomic snapshot – Objects live at snapshot will not be collected – Write barrier saves overwritten pointers [Yuasa] – Write buffer must be drained periodically WB A C BB

24 Read Barrier: To-space Invariant Problem: Collector moves objects (defragmentation) – and mutator is finely interleaved Solution: read barrier ensures consistency – Each object contains a forwarding pointer [Brooks] – Read barrier unconditionally forwards all pointers – Mutator never sees old versions of objects From-spaceTo-space A X Y Z A X Y Z A′ BEFOREAFTER

25 Read Barrier Optimizations Barrier variants: when to redirect – Lazy: easier for collector (no fixup) – Eager: better for performance (loop over a[i]) Standard optimizations: CSE, code motion Problem: pointers can be null – Augment read barrier for GetField(x,offset): tmp = x[offset]; return tmp == null ? null : tmp[redirect] – Optimize by null-check combining, sinking

26 Read Barrier Results Conventional wisdom: read barriers too slow – Previous studies: 20-40% overhead [Zorn,Nielsen]

27 Incremental Mark-Sweep Mark/sweep finely interleaved with mutator Write barrier ensures no lost objects – Must treat objects in write buffer as roots Read barrier ensures consistency – Marker always traces correct object With barriers, interleaving is simple

28 Pointer Fixup During Mark When can a moved object be freed? – When there are no more pointers to it Mark phase updates pointers – Redirects forwarded pointers as it marks them Object moved in collection n can be freed: – At the end of mark phase of collection n+1 From-spaceTo-space A X Y Z A′

29 Selective Defragmentation When do we move objects? – When there is fragmentation Usually, program exhibits locality of size – Dead objects are re-used quickly Defragment either when – Dead objects are not re-used for a GC cycle – Free pages fall below limit for performing a GC In practice: we move 2-3% of data traced – Major improvement over copying collector

30 Arraylets Large arrays create problems – Fragment memory space – Can not be moved in a short, bounded time Solution: break large arrays into arraylets – Access via indirection; move one arraylet at a time Optimizations – Type-dependent code optimized for contiguous case – Opportunistic contiguous allocation A1A2A3 A

31 Scheduling

32 Work-based Scheduling Trigger the collector to collect C W bytes – Whenever the mutator allocates Q W bytes MMU (CPU Utilization) Window Size (s) - log Space (MB) Time (s)

33 Time-based Scheduling Trigger collector to run for C T seconds – Whenever mutator runs for Q T seconds Space (MB) Time (s) MMU (CPU Utilization) Window Size (s) - log

34 Parameterization Mutator a * (ΔGC) m Collector R Tuner Δt s u Allocation Rate Maximum Live Memory Collection Rate Real Time Interval Maximum Used Memory CPU Utilization

35 Empirical Results

36 Pause time distribution: javac 12 ms Time-based SchedulingWork-based Scheduling

37 Utilization vs Time: javac Time (s) 0.4 0.2 0.0 0.6 0.8 1.0 Utilization (%) 0.4 0.2 0.0 0.6 0.8 1.0 Time-based SchedulingWork-based Scheduling 0.45

38 MMU: javac

39 Space Usage: javac

40 Conclusions

41 The Metronome provides true real-time GC – First collector to do so without major sacrifice Short pauses (4 ms) High MMU during collection (50%) Low memory consumption (2x max live) Critical features – Time-based scheduling – Hybrid, mostly non-copying approach – Integration w/compiler

42 Future Work Two main goals: – Reduce pause time, memory requirement – Increase predictability Pause time: – Expect sub-millisecond using current techniques – For 10’s of microseconds, need interrupt- based Predictability – Studying parameterization of collector – Good research area


Download ppt "David F. Bacon Perry Cheng V.T. Rajan IBM T.J. Watson Research Center The Metronome: A Hard Real-time Garbage Collector."

Similar presentations


Ads by Google