Published by Leslie Baldwin. Modified over 9 years ago.
1
Message Analysis-Guided Allocation and Low-Pause Incremental Garbage Collection in a Concurrent Language
Konstantinos Sagonas, Jesper Wilhelmsson
Uppsala University, Sweden
2
Goals of this work
Efficiently implement concurrency through asynchronous message-passing
Memory management with real-time characteristics
o Short stop-times
o High mutator utilization
Design for multithreading
3
Our context: Erlang
Designed for highly concurrent applications
Soft real-time
Light-weight processes
No destructive updates
Data types: atoms, numbers, PIDs, tuples, cons cells (lists), binaries [slide label: heap data]
4
Our context: the Erlang/OTP system
Industrial-strength implementation
Used in embedded applications
Three memory architectures: [ISMM’02]
o Private
o Shared
o Hybrid
5
Private heaps
[Diagram: two processes (P), each with its own stack and heap]
6
[Diagram: two processes (P); sending a message is an O(|message|) copy]
7
Private heaps
[Diagram: two processes (P)]
Garbage collection is a private business
Fast memory reclamation of terminated processes
8
Shared heap
[Diagram: two processes (P); sending a message is O(1)]
Global synchronization
Longer stop-times
No fast reclamation of process-local data
9
Hybrid architecture
[Diagram: two processes (P) with process-local heaps, a message area, and a big objects area]
10
Allocating messages in the message area
Several possible methods
o User annotations
o Dynamic monitoring [Petrank et al ISMM’02]
o Static analysis guided allocation
11
Static message analysis [SAS’03]
Similar to escape analysis
Allocation is process-local by default
o Possible messages allocated on message area
o Copy on demand
Analysis is quite precise
o Typically finds 99% of all messages
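The allocation policy on this slide can be illustrated with a minimal sketch. It is not the Erlang/OTP implementation: `Term`, `Process`, and `send` are hypothetical names, and a term is modelled as a flat object with a size, whereas real terms are nested and only their process-local parts would need copying.

```python
from dataclasses import dataclass, field


@dataclass
class Term:
    words: int             # size of the term in heap words
    in_message_area: bool  # True if the analysis placed it in the message area


@dataclass
class Process:
    mailbox: list = field(default_factory=list)


def send(receiver: Process, term: Term) -> int:
    """Deliver term to receiver and return the number of words copied."""
    copied = 0
    if not term.in_message_area:
        # The analysis did not flag this term as a possible message,
        # so it lives on the sender's local heap: copy it on demand.
        term = Term(words=term.words, in_message_area=True)
        copied = term.words
    # Terms already in the message area are passed by reference.
    receiver.mailbox.append(term)
    return copied
```

Under these assumptions, a send costs nothing extra when the analysis predicted the message and falls back to an O(|message|) copy otherwise.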
12
Garbage Collection in Hybrid Arch.
Process-local heaps
o Private business: no synchronization required
Message area
o Two generations
o Copying collector in young generation: fast allocation
o Mark-and-sweep in old generation: prevents repeated copying of old objects
13
GC of the message area is a bottleneck
1. Generational process scanning
2. Remembered set in local heaps
The root-set for the message area consists of all stacks and process-local heaps
This is not enough... We need an incremental collector in the Message Area!
14
Properties of incremental collector
No overhead on mutator
No space overhead on heap objects
Short stop-times
High mutator utilization
15
Organization of the Message Area
Young generation: nursery and from-space
o Nursery and from-space always have a constant size (= 100k words)
Fwd: storage area for forwarding pointers
o Size bound by (currently = )
Old generation: list of arbitrary sized areas
o Free-list, first-fit allocation
Black-map: bit-array used to mark objects in mark-and-sweep
16
Organization of the Message Area
[Diagram: nursery with N_top, N_limit, and the allocation limit]
17
Incremental collector
Two approaches to choose from:
o Work-based: reclaim n live words each step
o Time-based: a step takes no more than t ms
n and t are user-specified
18
Work-based collection
The mutator wants to allocate need words
reclaim = max(n, need)
Allocation limit = N_top + reclaim
[Diagram: nursery with N_top, N_limit, and the allocation limit]
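The two formulas on this slide can be sketched as a small function. The function name and the choice of words as the unit for `n_top` are assumptions for illustration; the slide only gives the arithmetic.

```python
def work_based_step(n_top: int, need: int, n: int) -> tuple[int, int]:
    """Return (reclaim, allocation_limit) for one work-based step.

    n_top: current top of the nursery, in words (assumed unit)
    need:  words the mutator wants to allocate
    n:     user-specified number of live words to reclaim per step
    """
    # Reclaim at least as much as the mutator is about to allocate.
    reclaim = max(n, need)
    # The mutator may allocate up to this point before the next step.
    allocation_limit = n_top + reclaim
    return reclaim, allocation_limit
```

For example, with `n = 100` a request for 5 words still triggers 100 words of reclamation, while a request for 250 words raises the step to 250.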
19
Time-based collection
1. User annotations (as in Metronome)
2. Dynamic worst-case calculation
How much can the mutator allocate?
How much live data is there?
20
Time-based collection
ΔGC = reclaimed after GC - reclaimed before GC
GC steps = ( [term lost in extraction] - reclaimed after GC ) / ΔGC
w_M = N_free / GC steps
Allocation limit = N_top + w_M
[Diagram: nursery with N_top, N_limit, and the allocation limit]
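The formulas on this slide lost some symbols in extraction. The sketch below reads them as ΔGC = reclaimed after GC - reclaimed before GC, GC steps = (X - reclaimed after GC) / ΔGC, and w_M = N_free / GC steps, which is an interpretation rather than a confirmed transcription. The quantity X did not survive, so the function takes it as a caller-supplied parameter with the hypothetical name `work_bound`. Rounding and the handling of the zero-progress case are also assumptions.

```python
import math


def mutator_allowance(reclaimed_before: int, reclaimed_after: int,
                      work_bound: int, n_free: int) -> int:
    """Estimate w_M, the words the mutator may allocate before the next step.

    work_bound is a hypothetical stand-in for the term missing from the slide.
    """
    delta_gc = reclaimed_after - reclaimed_before  # progress made by the last step
    remaining = work_bound - reclaimed_after       # work still to be done
    if remaining <= 0:
        return n_free      # nothing left to collect: all free space is usable
    if delta_gc <= 0:
        return 0           # no measurable progress: be conservative (assumption)
    gc_steps = math.ceil(remaining / delta_gc)     # steps needed at this rate
    return n_free // gc_steps                      # spread free space over them


def allocation_limit(n_top: int, w_m: int) -> int:
    """Allocation limit = N_top + w_M, as on the slide."""
    return n_top + w_m
```

With 200 words reclaimed in the last step and 1,000 words of work remaining, five more steps are expected, so 5,000 free words yield an allowance of 1,000 words per step.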
21
Collecting the Message Area
[Animation, slides 21 to 37. Diagram labels: processes P1, P2, P3; Process Queue; Fwd; Fromspace; Nursery; allocation limit]
Cheap write barrier: link the receiver to a list in the send operation
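The write barrier named in this animation ("link receiver to a list in the send operation") can be sketched as follows. The class and field names are hypothetical, and applying the barrier only while a collection cycle is active is an assumption; the slides state only that the receiver is linked to a list during send.

```python
class Process:
    def __init__(self, name: str):
        self.name = name
        self.mailbox = []
        self.on_barrier_list = False  # one flag per process, none per object


class MessageArea:
    def __init__(self):
        self.gc_active = False   # True while an incremental cycle is running
        self.barrier_list = []   # receivers the collector must scan again

    def send(self, receiver: Process, msg) -> None:
        receiver.mailbox.append(msg)
        # Barrier: remember the receiver once, so the collector can
        # rescan it for references into the message area.
        if self.gc_active and not receiver.on_barrier_list:
            receiver.on_barrier_list = True
            self.barrier_list.append(receiver)
```

The cost per send is one flag test and at most one list append, which is consistent with the barrier being described as cheap.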
38
Performance evaluation: Settings
Intel Xeon 2.4 GHz, 1 GB RAM, Linux
Start with small process-local heaps (233 words, growing when needed)
Measure active CPU time
o using hardware performance monitors
39
Performance evaluation: Benchmarks
Mnesia: distributed database system
o 1,109 processes
o 2,892,855 messages
Yaws: HTTP Web server
o 420 processes
o 2,275,467 messages
Adhoc: data mining application
o 137 processes
o 246,021 messages
40
Stop-times – Time-based
[Plots: Mnesia, Yaws; t = 1 ms]
41
Stop-times – Work-based
[Plots: Adhoc, Yaws; n = 2 words]
Mean: 3, Geo. Mean: 2
Mean: 9, Geo. Mean: 1
42
Stop-times – Work-based
[Plots: Adhoc, Yaws; n = 100 words; axis: Time (µs)]
Mean: 53, Geo. Mean: 46
Mean: 268, Geo. Mean: 36
43
Message area total GC times, incremental vs. non-incremental (times in ms)
Benchmark   n = 2 MA GC   n = 100 MA GC   n = 1000 MA GC   Non-Inc. MA GC
Mnesia      182           164             156              88
Yaws        373           374             242              153
Adhoc       244           203             78               27
44
Runtimes – Incremental (times in ms)
Benchmark   Mutator    Local GC   MA n = 2   MA n = 100   MA n = 1000
Mnesia      52,906     4,439      182        164          156
Yaws        237,629    11,728     373        374          242
Adhoc       61,045     8,194      244        203          78
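To put the runtimes table in perspective, the sketch below computes the share of total runtime spent in message-area (MA) collection. The figures are read from the table's fused digits (Mutator, Local GC, then MA for n = 2, 100, 1000). Treating total runtime as the sum of the three columns is an assumption, and the names are illustrative.

```python
# (mutator ms, local GC ms, {n: message-area GC ms}) per benchmark
RUNTIMES_MS = {
    "Mnesia": (52_906, 4_439, {2: 182, 100: 164, 1000: 156}),
    "Yaws": (237_629, 11_728, {2: 373, 100: 374, 1000: 242}),
    "Adhoc": (61_045, 8_194, {2: 244, 100: 203, 1000: 78}),
}


def ma_gc_share(benchmark: str, n: int) -> float:
    """Fraction of total runtime spent in message-area GC.

    Assumes total runtime = mutator + local GC + message-area GC.
    """
    mutator, local_gc, ma = RUNTIMES_MS[benchmark]
    total = mutator + local_gc + ma[n]
    return ma[n] / total


if __name__ == "__main__":
    for name in RUNTIMES_MS:
        for n in (2, 100, 1000):
            print(f"{name:7s} n={n:<5d} {ma_gc_share(name, n):.3%}")
```

Under that assumption, message-area collection stays below 1% of total runtime for every benchmark and every value of n in the table.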
45
Minimum Mutator Utilization
The minimum fraction of time that the mutator executes in any time window of a given size [Cheng & Blelloch PLDI 2001]
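As an illustration of the definition, the sketch below computes minimum mutator utilization for one window size from a list of collector pauses. The function name and the input format (non-overlapping `(start, end)` pause intervals inside a run of length `total`) are assumptions; this is not the measurement code behind the plots.

```python
def mmu(pauses, total, window):
    """Minimum mutator utilization for one window size.

    pauses: non-overlapping (start, end) collector pauses within [0, total]
    total:  length of the whole run
    window: window size, with 0 < window <= total
    """
    if not 0 < window <= total:
        raise ValueError("window must satisfy 0 < window <= total")

    def gc_time_in(start):
        # Total pause time overlapping the window [start, start + window].
        end = start + window
        return sum(max(0.0, min(e, end) - max(s, start)) for s, e in pauses)

    # The worst window either starts where a pause starts or ends where a
    # pause ends, so only those positions (clamped to the run) are checked.
    last_start = total - window
    candidates = {0.0, last_start}
    for s, e in pauses:
        for start in (s, e - window):
            candidates.add(min(max(start, 0.0), last_start))

    worst = max(gc_time_in(c) for c in candidates)
    return (window - worst) / window
```

For pauses at `(10, 12)` and `(20, 21)` in a run of length 100, a window of 10 can contain at most 2 units of pause, giving a utilization of 0.8; a window of 12 can contain both pauses in full, giving 0.75.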
46
Mutator Utilization – Work-based
[Plots: Adhoc, Yaws; n = 100 words]
47
Concluding Remarks
Memory allocator is guided by the intended use of data
Incremental Garbage Collector
o High mutator utilization
o Small overhead on total runtime
o No mutator overhead
o Small space overhead
o Really short stop-times!
48
Runtimes, incremental vs. non-incremental (times in ms)
Benchmark   Inc. Mutator   Non-Inc. Mutator
Mnesia      52,906         53,276
Yaws        237,629        240,985
Adhoc       61,045         61,578
49
Total GC times, incremental vs. non-incremental (times in ms)
Benchmark   Inc. Local GC   Non-Inc. Local GC
Mnesia      4,439           4,487
Yaws        11,728          11,359
Adhoc       8,194           7,848
50
Mutator Utilization – Time-based
[Plots: Mnesia, Yaws; t = 1 ms]