1
An On-the-Fly Reference Counting Garbage Collector for Java
Erez Petrank, Technion – Israel Institute of Technology
Joint work with Yossi Levanoni, Microsoft Corporation
ACM Conference on Object-Oriented Programming, Systems, Languages & Applications
Tampa, Florida, October 18, 2001
2
Levanoni & Petrank, On-the-Fly Reference Counting
Garbage Collection Today
Two classic approaches:
– Tracing [McCarthy 1960]: trace reachable objects, reclaim objects not traced.
– Reference counting [Collins 1960]: keep a reference count for each object, reclaim objects with count 0.
Today’s advanced environments:
– multiprocessors
– huge memories
3
Motivation for RC
Reference counting work is proportional to the work on object creations and pointer modifications.
– Can tracing deal with tomorrow’s huge heaps?
Reference counting has good locality.
Tracing rules in JVMs; is that justified?
The challenge:
– RC write barriers seem too expensive.
– RC seems impossible to “parallelize”.
4
This Work
An improved RC (suitable for Java):
– Reduced overhead on the write barrier.
– Concurrent with low overhead: on-the-fly, no synchronization operation in the write barrier, suitable for multiprocessors.
– Thus: low latency, high performance.
Implementation:
– JVM: Sun’s Java Virtual Machine 1.2.2
– Platform: 4-way IBM Netfinity 8500R server with 550 MHz Intel Pentium III Xeon processors and 2 GB memory.
5
Agenda
Introduction
Motivation
The Algorithm
Related issues
Implementation and Measurements
Conclusions
6
Terminology
7
Basic Reference Counting
Each object has an RC field; new objects get o.RC := 1.
When a pointer p that points to o1 is modified to point to o2, we do: o1.RC--, o2.RC++.
If then o1.RC == 0:
– Delete o1.
– Decrement o.RC for all children o of o1.
– Recursively delete objects whose RC is decremented to 0.
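The rules on this slide can be sketched in C as follows. This is only an illustrative, single-threaded sketch: the fixed fan-out `MAX_CHILDREN`, the `live_objects` counter, and the names `rc_new`, `rc_release`, and `rc_update` are assumptions made for the example and do not come from the talk.

```c
#include <assert.h>
#include <stdlib.h>

#define MAX_CHILDREN 2  /* hypothetical fixed fan-out, for illustration only */

typedef struct Object {
    int rc;                              /* reference count */
    struct Object *child[MAX_CHILDREN];  /* outgoing pointers */
} Object;

static int live_objects = 0;  /* bookkeeping for the example only */

/* Allocate an object; a new object starts with rc = 1, which accounts
   for the reference handed back to the caller. */
Object *rc_new(void) {
    Object *o = calloc(1, sizeof *o);
    assert(o != NULL);
    o->rc = 1;
    live_objects++;
    return o;
}

/* Drop one reference. When the count reaches 0, release the children
   (recursively) and free the object. */
void rc_release(Object *o) {
    if (o == NULL) return;
    assert(o->rc > 0);
    if (--o->rc == 0) {
        for (int i = 0; i < MAX_CHILDREN; i++)
            rc_release(o->child[i]);
        free(o);
        live_objects--;
    }
}

/* Pointer update: *slot changes from o1 to o2. */
void rc_update(Object **slot, Object *o2) {
    Object *o1 = *slot;
    if (o2 != NULL) o2->rc++;  /* increment first, so o1 == o2 is safe */
    *slot = o2;
    rc_release(o1);            /* o1.RC--, deleting recursively at 0 */
}
```

Note that every pointer store pays for two count updates here, which is the cost the rest of the talk sets out to reduce.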
9
Deferred Reference Counting
Problem: the overhead of updating counts for program variables (locals) is too high.
Solution [Deutsch & Bobrow]:
– Don’t update RC for locals.
– “Once in a while”: collect all objects with o.RC = 0 that are not referenced from local roots.
Deferred RC reduces overhead by 80%.
Used in most modern RC systems.
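A minimal sketch of the deferred collection step, assuming counts track heap references only and that the set of candidate objects and the local roots are available as plain arrays. The `reclaimed` flag, the array-based interface, and the function name `deferred_collect` are hypothetical; the sketch also ignores recursive deletion of children to stay short.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef struct Object {
    int rc;          /* counts heap references only; locals are not counted */
    bool reclaimed;  /* stands in for actually freeing the object */
} Object;

/* Reclaim every object whose heap count is 0 and that no local root
   references. Returns the number of objects reclaimed. */
size_t deferred_collect(Object *objs[], size_t n_objs,
                        Object *roots[], size_t n_roots) {
    size_t reclaimed = 0;
    for (size_t i = 0; i < n_objs; i++) {
        Object *o = objs[i];
        assert(o != NULL);
        if (o->reclaimed || o->rc != 0) continue;

        bool in_roots = false;
        for (size_t j = 0; j < n_roots; j++) {
            if (roots[j] == o) { in_roots = true; break; }
        }
        if (!in_roots) {
            o->reclaimed = true;
            reclaimed++;
        }
    }
    return reclaimed;
}
```

An object with count 0 survives as long as some local still refers to it, which is what allows stores to locals to skip the count updates.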
10
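Multithreaded RC?
Problem:
– Parallel updates confuse counts. Example: A.next initially points to B, and two threads redirect it to C and D:
  Thread 1: Read A.next; A.next ← C; B.RC--; C.RC++
  Thread 2: Read A.next; A.next ← D; B.RC--; D.RC++
  Both threads decrement B.RC, and both C.RC and D.RC are incremented although A.next ends up pointing to only one of them.
– (And more: updating the reference counts in parallel is itself a race.)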
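The interleaving on this slide can be replayed deterministically, without real threads, to see the resulting counts. The split into `read_phase` and `write_phase` and the initial counts (B referenced once, C and D referenced only from locals) are assumptions made for the illustration.

```c
#include <stddef.h>

typedef struct Node {
    int rc;
    struct Node *next;
} Node;

/* The old value a thread observed before it performs its store. */
typedef struct { Node *old; } Pending;

/* Step 1 of an unsynchronized update: read the current target. */
Pending read_phase(Node *a) {
    Pending p = { a->next };
    return p;
}

/* Steps 2-4: store the new target, then adjust both counts. */
void write_phase(Node *a, Node *new_target, Pending p) {
    a->next = new_target;
    if (p.old != NULL) p.old->rc--;
    new_target->rc++;
}

/* Replays the bad interleaving: both threads read A.next before either
   one writes. Returns B's final count. */
int replay_race(Node *a, Node *b, Node *c, Node *d) {
    a->next = b;
    b->rc = 1;
    c->rc = 0;
    d->rc = 0;

    Pending t1 = read_phase(a);  /* Thread 1 sees B */
    Pending t2 = read_phase(a);  /* Thread 2 also sees B */
    write_phase(a, c, t1);       /* Thread 1: A.next <- C */
    write_phase(a, d, t2);       /* Thread 2: A.next <- D */
    return b->rc;
}
```

After the replay, B has been decremented twice and C carries a count of 1 even though no heap pointer refers to it.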
11
Multithreaded RC
Problem:
– Parallel updates confuse counts.
– Updating reference counts in parallel races.
[DeTreville]:
– Lock the heap for each pointer modification.
– Each thread records its updates in a buffer.
– Once in a while (snapshot-like), the GC thread:
  reads all buffers to update reference counts,
  reclaims all objects with rc 0 that are not referenced from locals.
12
To Summarize…
Overhead on the write barrier is considered high.
– Even with the deferred RC of Deutsch & Bobrow.
Using reference counting concurrently with program threads seems to bear a high synchronization cost.
– A lock or “compare & swap” for each pointer update.
13
Improving RC
Consider a pointer p that takes the following values between GCs: O0, O1, O2, …, On.
All RC algorithms perform 2n operations: O0.RC--; O1.RC++; O1.RC--; O2.RC++; O2.RC--; … ; On.RC++;
But only two operations are needed: O0.RC-- and On.RC++
[Figure: pointer p pointing in turn to O0, O1, O2, O3, O4, …, On]
14
Improving RC cont’d
Don’t record all pointer modifications. Record only the first modification of each pointer between GCs (which yields O0).
During the collection, for each recorded pointer p:
– find O0 by checking the record,
– find On by reading the heap during the collection.
Apply only two operations for each such pointer: O0.RC-- and On.RC++
[Figure: pointer p pointing in turn to O0, O1, O2, O3, O4, …, On]
This reduces the number of logging and counter updates by a factor of 100–1000 for normal benchmarks!
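To make the arithmetic concrete, the sketch below applies both schemes to the same sequence of values taken by one pointer and counts the operations. Objects are represented by indices into an `rc` array, and the function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Naive scheme: adjust two counts on every assignment.
   values[0..n_values-1] are the objects O0..On that the pointer held.
   Returns the number of count operations performed (2n). */
int naive_updates(int rc[], const int values[], size_t n_values) {
    assert(rc != NULL && values != NULL);
    int ops = 0;
    for (size_t i = 1; i < n_values; i++) {
        rc[values[i - 1]]--;  /* old target loses a reference */
        rc[values[i]]++;      /* new target gains a reference */
        ops += 2;
    }
    return ops;
}

/* Coalesced scheme: only the first old value and the final value matter.
   Returns the number of count operations performed (2). */
int coalesced_updates(int rc[], const int values[], size_t n_values) {
    assert(rc != NULL && values != NULL);
    if (n_values < 2) return 0;
    rc[values[0]]--;             /* O0.RC-- */
    rc[values[n_values - 1]]++;  /* On.RC++ */
    return 2;
}
```

Both functions leave the counts in the same state, because the intermediate increments and decrements on O1 through O(n-1) cancel out.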
15
Improving Synch. Overhead
Simple solutions bear unacceptable overhead:
– DeTreville uses a lock for all pointer modifications.
– Simple alternatives require 3 compare-and-swaps.
Our second contribution:
– A carefully designed write barrier (and an observation) allows elimination of all synchronization operations from the write barrier.
16
The Write Barrier
  Update(Object **slot, Object *new) {
      Object *old = *slot;       /* read the old value first */
      if (!IsDirty(slot)) {
          log(slot, old);        /* record only the first modification */
          SetDirty(slot);
      }
      *slot = new;
  }
Observation: if two threads
1. invoke the write barrier in parallel, and
2. both log an old value,
then both record the same old value.
17
Intermediate Algorithm: Snapshot-Oriented, Concurrent
Use the write barrier with program threads.
To collect:
– Stop all threads.
– Scan roots (locals).
– Get the buffers with modified slots.
– Clear all dirty bits.
– Resume threads.
– For each modified slot: decrease rc for the old value (written in the buffer), increase rc for the current value (“read heap”).
– Reclaim non-local objects with rc 0.
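The write barrier and the count-update step of this intermediate algorithm fit together roughly as sketched below. This is a single-threaded illustration under several assumptions: the dirty flag is stored next to the slot, there is one fixed-size buffer, and stopping threads, root scanning, and reclamation are omitted. The names `Slot`, `Buffer`, `update`, and `process_buffer` are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef struct Object { int rc; } Object;

/* A heap slot together with its dirty flag (layout is an assumption). */
typedef struct Slot {
    Object *ref;
    bool dirty;
} Slot;

typedef struct LogEntry {
    Slot *slot;
    Object *old;  /* value of the slot before its first modification */
} LogEntry;

enum { LOG_CAPACITY = 64 };

typedef struct Buffer {
    LogEntry entries[LOG_CAPACITY];
    size_t count;
} Buffer;

/* Write barrier: log the old value only on the first store to a slot
   since the last collection. No lock or compare-and-swap is used. */
void update(Buffer *buf, Slot *slot, Object *new_ref) {
    Object *old = slot->ref;
    if (!slot->dirty) {
        assert(buf->count < LOG_CAPACITY);
        buf->entries[buf->count++] = (LogEntry){ slot, old };
        slot->dirty = true;
    }
    slot->ref = new_ref;
}

/* Collector side: clear dirty bits, then apply one decrement (logged old
   value) and one increment (current heap value) per logged slot.
   Returns the number of log entries processed. */
size_t process_buffer(Buffer *buf) {
    size_t n = buf->count;
    for (size_t i = 0; i < n; i++) {
        LogEntry *e = &buf->entries[i];
        e->slot->dirty = false;
        if (e->old != NULL) e->old->rc--;
        if (e->slot->ref != NULL) e->slot->ref->rc++;
    }
    buf->count = 0;
    return n;
}
```

However many times a slot is overwritten between collections, it contributes a single log entry and two count updates.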
18
The Sliding View Algorithm: On-the-Fly
Do all collection work as the threads run:
– Read the threads’ buffers (one thread at a time),
– Clear all dirty bits,
– Update reference counts,
– Read the roots of each thread, one at a time,
– Reclaim (recursively) objects with rc 0.
Note: the rc’s are not correct for any specific point in time, yet, with care, most dead objects may be reclaimed!
Borrow ideas from [Lamport et al.].
19
Cycle Collection
Our solution: use a tracing algorithm infrequently.
– Currently this is the most efficient solution.
– Cycle collectors have high cost.
We propose a new on-the-fly mark & sweep algorithm that works best with the same sliding view.
– Can also be used “on its own”.
20
Implementation for Java
Based on Sun’s JDK 1.2.2 for Windows NT.
Main features:
– 2-bit RC field per object (à la [Wise et al.])
– A supplemental sliding view tracing algorithm
– A custom allocator for on-the-fly RC:
  Multi-level, fine-grained locking
  Supports sporadic reclamation of objects
  Supports sweeping the heap
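The slide does not spell out how a 2-bit count behaves on overflow. A common scheme for small count fields is a "sticky" counter that saturates at its maximum and is then left for a tracing pass to correct. The sketch below shows that scheme as an assumption, not as the authors' exact encoding.

```c
#include <assert.h>

enum {
    RC_BITS  = 2,
    RC_STUCK = (1 << RC_BITS) - 1  /* 3: saturated, no longer tracked */
};

/* Increment, saturating at RC_STUCK. */
unsigned rc_inc(unsigned rc) {
    assert(rc <= RC_STUCK);
    return rc < RC_STUCK ? rc + 1 : RC_STUCK;
}

/* Decrement. A stuck count stays stuck, because the true count is no
   longer known; only a tracing collection could recompute it. */
unsigned rc_dec(unsigned rc) {
    assert(rc > 0 && rc <= RC_STUCK);
    return rc == RC_STUCK ? RC_STUCK : rc - 1;
}
```

Under this assumption, objects whose count gets stuck can only be reclaimed by the supplemental tracing algorithm listed on the slide.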
21
Performance Measurements
First multiprocessor measurements in a “normal” environment!
– (Previously reported measurements assumed one CPU is free for GC all the time.)
Benchmarks:
– Server benchmarks
  SPECjbb2000: simulates business-like transactions in a large firm
  MTRT: a multi-threaded ray tracer
– Client benchmarks
  SPECjvm98: a suite of mostly single-threaded client benchmarks
22
Improved RC
How many RC updates are eliminated?
Benchmark    No. of stores   No. of “first” stores   Ratio of “first” stores
jbb             71,011,357                 264,115                     1/269
Compress            64,905                      51                    1/1273
Db              33,124,780                  30,696                    1/1079
Jack           135,174,775                   1,546                   1/87435
Javac           22,042,028                 535,296                      1/41
Jess            26,258,107                  27,333                     1/961
mpegaudio        5,517,795                      51                  1/108192
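The ratio column appears to be the number of stores divided by the number of “first” stores, rounded to the nearest integer. The helper below reproduces that arithmetic; the function name and the rounding rule are assumptions inferred from the table's figures.

```c
#include <assert.h>

/* Stores per logged "first" store, rounded to the nearest integer.
   A result of k corresponds to a table entry of 1/k. */
long long stores_per_first_store(long long stores, long long first_stores) {
    assert(stores >= 0 && first_stores > 0);
    return (stores + first_stores / 2) / first_stores;
}
```

For example, the jbb row gives 71,011,357 / 264,115 ≈ 268.9, which rounds to the listed 1/269.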
23
SPECjbb Latency (Max Transaction Time)
24
SPECjbb Throughput
25
MTRT Throughput
26
SPECjbb Heap Utilization
27
Client Performance
28
Related Work
On-the-fly tracing:
– Dijkstra et al. (1976), Steele (1976), Lamport (1976),
– Kung & Song (1977), Gries (1977), Ben-Ari (1982, 1984), Huelsbergen et al. (1993, 1998),
– Doligez-Gonthier-Leroy (1993-4), Domani-Kolodner-Petrank (2000)
Concurrent reference counting:
– DeTreville (1990),
– Martinez et al. (1990), Lins (1992),
– Plakal & Fischer (2001),
– Bacon et al. (2001)
29
Conclusions
A new algorithm for reference counting:
– Low overhead on pointer modification
– On-the-fly
Implementation for Java.
Measurements show high throughput and low latency.
To be out soon: a matching paper on the sliding view tracing collector.