
1 Using Prefetching to Improve Reference-Counting Garbage Collectors. Harel Paz, IBM Haifa Research Lab; Erez Petrank, Microsoft Research and Technion.

2 Motivation for Prefetching. Memory speed falls far behind processor speed; a fast cache creates a buffer between the CPU and memory. Early (software) prefetch is available on most platforms and reduces (hides) memory stalls, but it may also increase the number of executed instructions and pollute the cache, and its benefit is schedule-sensitive.
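As a minimal illustration of the idea (not taken from the talk), the sketch below prefetches the successor of a linked-list node while the current node is processed, so the successor's cache miss overlaps useful work. The node type and the use of the GCC/Clang __builtin_prefetch intrinsic are assumptions made for this sketch.

    #include <stddef.h>

    struct node {
        struct node *next;
        long payload;
    };

    /* Sum a linked list, prefetching the next node while the current one
       is processed, so the next node's miss latency is (partly) hidden. */
    long sum_list(struct node *n)
    {
        long total = 0;
        while (n != NULL) {
            if (n->next != NULL)
                __builtin_prefetch(n->next, 0, 3);  /* read, high temporal locality */
            total += n->payload;
            n = n->next;
        }
        return total;
    }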

3 Garbage Collection. The application allocates space dynamically; the garbage collector automatically frees "unreachable" space. Software engineering benefits: it ameliorates memory leaks and prevents dereferencing of freed objects. Built into modern languages like Java and C#.

4 Garbage Collection: Two Classic Approaches. Reference counting [Collins 1960]: keep a reference count for each object and reclaim objects whose count reaches 0. Tracing [McCarthy 1960]: trace the reachable objects and reclaim the objects not traced.

5 RC vs. Tracing. Tracing work is proportional to the amount of live data; RC work is proportional to the amount of application work (pointer updates and allocation/reclamation). Recommended settings for tracing: large heaps (infrequent collections), small live data (the young generation).

6 RC vs. Tracing. Tracing work is proportional to the amount of live data; RC work is proportional to the amount of application work (pointer updates and allocation/reclamation). Recommended settings for RC: large live data with infrequent updates (the old generation), and tight heaps (frequent collections): systems that cache data (smart servers of various kinds, web servers, web browsers) and small devices.

7 RC vs. Tracing. The same comparison and recommended settings for RC as on the previous slide, with one addition: RC has good locality.

8 RC Problems & Recent Advances. RC was believed to carry high overhead: a high cost per pointer update, including an atomic operation for each update. This was solved recently [Levanoni-Petrank OOPSLA'01, TOPLAS'06]. RC algorithmic and engineering research had been lacking; recent results include generations [Azatchi-Petrank CC'03, Paz et al. CC'05], cycle collection [Paz et al. CC'05, TOPLAS'07], and static analysis [Joisha ISMM'06]. This work: the efficacy of prefetching for RC.

9 Talk Plan. Introduction (GC, RC, prefetching). A generic modern RC collector. Locating opportunities for prefetching and adding prefetch code (two examples). Measuring the improved performance. Related work. Conclusion.

10 Basic Reference Counting. Each object has an rc field; new objects get o.rc := 1. When a pointer p that points to o1 is modified to point to o2, execute o2.rc++ and o1.rc--. If o1.rc == 0 then: delete o1; decrement o.rc for each child o of o1; recursively delete objects whose rc drops to 0.
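A sketch of this naive write barrier in C. The object layout (an rc field plus a fixed child-pointer array) and the free_object hook are assumptions for illustration, not the representation used in the talk; the modified flag is included here only because later sketches reuse the same struct.

    #include <stdbool.h>

    typedef struct object {
        int rc;                    /* reference count */
        bool modified;             /* used by the coalescing barrier in later sketches */
        int nchildren;
        struct object *child[8];   /* outgoing pointers (fixed size for the sketch) */
    } object;

    void free_object(object *o);   /* hypothetical allocator hook */

    /* Decrement o's count; if it drops to zero, release o and its children. */
    static void rc_dec(object *o)
    {
        if (--o->rc == 0) {
            for (int i = 0; i < o->nchildren; i++)
                if (o->child[i] != NULL)
                    rc_dec(o->child[i]);
            free_object(o);
        }
    }

    /* The slot *p currently points to o1; redirect it to o2. */
    void rc_update(object **p, object *o2)
    {
        object *o1 = *p;
        if (o2 != NULL) o2->rc++;      /* new target gains a reference */
        *p = o2;
        if (o1 != NULL) rc_dec(o1);    /* old target loses a reference */
    }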

11 Deferred Reference Counting. Problem: the overhead of counting updates of program variables (locals) is too high. Solution [Deutsch & Bobrow 1976]: do not update rc for local variables (roots); "once in a while", collect all objects with o.rc = 0 that are not referenced from local variables. Deferred RC reduces the overhead by 80% and is used in most modern RC systems. Still, the overhead of heap pointer modifications is too costly.
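A deliberately rough sketch of the deferred idea, reusing the object struct from the previous sketch: heap-slot writes still maintain counts, local-variable writes pay nothing, and the periodic sweep frees zero-count objects not referenced from the roots. The walk over all objects stands in for the zero-count table that real deferred collectors maintain; referenced_from_roots and free_object are hypothetical helpers, and the recursive decrement of a freed object's children is omitted.

    #include <stdbool.h>

    bool referenced_from_roots(object *o);   /* hypothetical root scan */
    void free_object(object *o);             /* hypothetical allocator hook */

    /* Heap-slot write barrier: counts are maintained, but nothing is freed
       eagerly; a zero count is not proof of garbage, since locals are not counted. */
    void deferred_heap_update(object **slot, object *newval)
    {
        if (newval != NULL) newval->rc++;
        if (*slot != NULL) (*slot)->rc--;
        *slot = newval;
    }

    /* Local-variable writes use no barrier at all. */

    /* "Once in a while": free objects whose rc is 0 and that the roots do not reference. */
    void collect_deferred(object **all_objects, int count)
    {
        for (int i = 0; i < count; i++) {
            object *o = all_objects[i];
            if (o->rc == 0 && !referenced_from_roots(o))
                free_object(o);    /* children's decrements omitted for brevity */
        }
    }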

12 Reducing RC Overhead: An Observation [Levanoni-Petrank]. Consider a pointer p that takes the values O0, O1, O2, …, On between GCs. Naive RC algorithms perform 2n operations: O0.rc--; O1.rc++; O1.rc--; O2.rc++; O2.rc--; …; On.rc++. But only two operations are needed: O0.rc-- and On.rc++.

13 Updates Coalescing. During program execution: p ← O1 (record p, record p's previous value O0, mark p "modified"); p ← O2 (do nothing); …; p ← On (do nothing). At garbage collection, for each modified slot p: read p to get On, read the records to get O0; then On.rc++ and O0.rc--. This reduces overhead dramatically.

14 Some Technical Remarks. We actually log each object that gets modified (not just a single pointer). Reason 1: we do not want a modified flag per pointer. Reason 2: an object's pointers tend to get modified together. When an object is first modified it is marked "modified", added to the list of modified objects (ModBuffer), and all its non-null pointer values are added to a list (DecBuffer) of rc's to be decremented.

15 Pointer Modification and Allocation. ModBuffer contains all modified objects (the rc of their new children is incremented at collection time); DecBuffer contains all previous pointer values of modified objects. Pseudocode:

Update(O, offset, obj):
    if (!O.modified)
        set O.modified
        add O to ModBuffer
        for each non-null ptr of O
            add ptr to DecBuffer
    write(O, offset, obj)

Allocate(O):    (for a new object O)
    set O.modified
    add O to ModBuffer
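A C rendering of this barrier, continuing the toy object model from the basic-RC sketch (whose modified flag is used here). The fixed-size ModBuffer and DecBuffer arrays and the allocate_object helper are assumptions, and bounds checks are omitted for brevity.

    #define BUF_MAX 65536
    static object *mod_buffer[BUF_MAX];  static int mod_size;  /* modified objects   */
    static object *dec_buffer[BUF_MAX];  static int dec_size;  /* old pointer values */

    object *allocate_object(void);       /* hypothetical allocator */

    /* Coalescing write barrier: store obj into child slot `offset` of O.
       Only the first write to O since the last collection does logging work. */
    void coalescing_update(object *O, int offset, object *obj)
    {
        if (!O->modified) {
            O->modified = true;
            mod_buffer[mod_size++] = O;
            for (int i = 0; i < O->nchildren; i++)     /* snapshot outgoing pointers */
                if (O->child[i] != NULL)
                    dec_buffer[dec_size++] = O->child[i];
        }
        O->child[offset] = obj;                        /* the actual pointer store */
    }

    /* New objects are logged as modified so their children are incremented
       at the next collection. */
    object *coalescing_allocate(void)
    {
        object *O = allocate_object();
        O->modified = true;
        mod_buffer[mod_size++] = O;
        return O;
    }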

16 Collector Building Blocks. The two major collection components are the reference-count increment stage, and the reference-count decrement and object-deletion stage. Additional relevant code: preparation of the data structures for the allocator.

17 Processing ModBuffer. Pseudocode:

Process-ModBuffer:
    for each object obj listed in ModBuffer do
        unset obj.modified
        for each pointer ptr of obj do
            increment the rc of the object referenced by ptr

(The slide illustrates a ModBuffer holding Obj-1 with children A, B, C; Obj-2 with D, E; and Obj-3 with F, G, H, I.)
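The increment stage as a plain C loop over the buffers defined in the previous sketch, before any prefetching is added; each child dereference is a potential cache miss on the child's header.

    /* Process-ModBuffer, no prefetching: clear each logged object's flag and
       increment the count of every object it currently points to. */
    void process_modbuffer(void)
    {
        for (int i = 0; i < mod_size; i++) {
            object *obj = mod_buffer[i];
            obj->modified = false;
            for (int j = 0; j < obj->nchildren; j++) {
                object *child = obj->child[j];
                if (child != NULL)
                    child->rc++;        /* likely miss: child's header line */
            }
        }
        mod_size = 0;
    }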

18 Processing ModBuffer with Prefetching. Pseudocode:

Process-ModBuffer:
    prefetch the first object listed in ModBuffer
    previous := dummyObject
    for each object obj listed in ModBuffer do
        prefetch the next object listed in ModBuffer
        unset obj.modified
        for each pointer ptr of obj do
            prefetch the rc field of the object referenced by ptr
            increment the rc of the object referenced by previous
            previous := ptr
    increment the rc of the object referenced by previous
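The same loop with the slide's two prefetch points spelled out in C, using the GCC/Clang __builtin_prefetch intrinsic as a stand-in for whatever prefetch mechanism the Jikes RVM implementation uses. The next ModBuffer entry is prefetched one iteration ahead, and each child's rc field is prefetched while the increment of the previously seen child is performed, so the prefetch has time to complete; a dummy object absorbs the priming increment.

    static object dummy_object;      /* absorbs the first, meaningless increment */

    void process_modbuffer_prefetch(void)
    {
        if (mod_size > 0)
            __builtin_prefetch(mod_buffer[0]);          /* first logged object */

        object *previous = &dummy_object;
        for (int i = 0; i < mod_size; i++) {
            object *obj = mod_buffer[i];
            if (i + 1 < mod_size)
                __builtin_prefetch(mod_buffer[i + 1]);  /* next object, one step ahead */
            obj->modified = false;
            for (int j = 0; j < obj->nchildren; j++) {
                object *child = obj->child[j];
                if (child == NULL)
                    continue;
                __builtin_prefetch(&child->rc);         /* rc field of the new child */
                previous->rc++;                         /* increment delayed one step */
                previous = child;
            }
        }
        previous->rc++;                                 /* last pending increment */
        mod_size = 0;
    }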

19 Overall: 5 Prefetch Opportunities. Two in processing the ModBuffer (as discussed), two in processing the DecBuffer, and one in preparing the data structures for the allocator.

20 Measurements. Implemented on top of the reference-counting collector of Jikes RVM (no cycle collection). Benchmarks: SPECjbb2000, SPECjvm98, DaCapo. Platform: a dual Intel Xeon 1.8 GHz workstation.

21 Prefetch Effectiveness

22 Partition into Phases

23 Partitioning Improvements for ModBuffer Processing

24 Repeating Accesses. Prefetching helps only if the object is not already in the cache. Tracing: each object is accessed once. RC: reference counts are accessed repeatedly, in an arbitrary order. How many of the accesses are repetitive? An object access is w-repetitive if the object was accessed within the most recent w accesses.
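A small sketch of how the w-repetitive fraction could be computed offline from a recorded trace of accessed addresses; the trace array is an assumption for illustration (the paper instruments the collector itself), and the O(n*w) scan is acceptable for a measurement tool.

    /* Fraction of accesses whose address already appears among the previous
       w accesses, i.e. accesses that are likely still cached. */
    double w_repetitive_fraction(void **trace, int n, int w)
    {
        int repeats = 0;
        for (int i = 0; i < n; i++) {
            for (int j = i - 1; j >= 0 && j >= i - w; j--) {
                if (trace[j] == trace[i]) {
                    repeats++;
                    break;
                }
            }
        }
        return n > 0 ? (double)repeats / n : 0.0;
    }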

25 Repeating Accesses

26 Repeating Accesses for ModBuffer Method

27 Hardware Counters

28 Hardware Counter Averages. Overhead reduction: Cycles Stalled / L2 Cache Misses / TLB Misses / Overall GC Time: 9.6% / 0.4% / 8.7%.

29 Further Reading on our RC Project. A new reference-counting algorithm [OOPSLA'01, TOPLAS'06]; generations for RC [CC'03, CC'05]; cycle collection [CC'05, TOPLAS, to appear]; prefetching for RC [CC'07]; a sliding-views tracing collector [OOPSLA'03].

30 Prefetching for Tracing Collectors. Prefetch objects as they are pushed on the mark stack [Boehm ISMM'00]. Prefetching for a generational copying garbage collector [Cahoon, PhD thesis, 2002]. Improved performance by using a FIFO queue instead of a stack [Cher et al. ASPLOS'04].

31 Conclusion. A study of prefetching for reference counting. It is potentially less effective than for tracing, since many accesses are repetitive. In practice: 5 prefetch opportunities and a substantial improvement in collector efficiency (8.7% on average). This is part of a larger study of algorithms and engineering issues for reference counting.

