1
Generational Stack Collection and Profile-Driven Pretenuring. Perry Cheng, Robert Harper, Peter Lee. Presented by Moti Alperovitch (moti@nmt.co.il)
2
The problem: Some data die young, and some data die old. In deeply recursive programs, most of the stack is unwound very infrequently. Rescanning unchanged roots can dominate collection time.
3
We compare the following collectors: Semispace copy collection (Cheney). Generational collection. Generational collection with stack markers. Pretenuring with stack markers.
4
Semispace copy collection: Scan the stack for roots and copy the data reachable from the roots into an unused area (nursery, survivor space). Disadvantage: –All reachable data is copied at every collection, even though some data die young and some die old.
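The copy step can be sketched as follows. This is a minimal illustration of Cheney-style copying, not the collector measured in the talk. The heap model is an assumption made for the example: an object is a list of fields, and a pointer field is a tuple `("ptr", index)` into the current space.

```python
def cheney_collect(from_space, roots):
    """Copy everything reachable from `roots` into a fresh to-space.

    Hypothetical heap model: from_space is a list of objects, each object
    is a list of fields, and a pointer field is ("ptr", index).
    """
    to_space = []
    forward = {}  # from-space index -> to-space index (forwarding table)

    def copy(idx):
        # Copy an object once; later visits reuse the forwarding entry.
        if idx not in forward:
            forward[idx] = len(to_space)
            to_space.append(list(from_space[idx]))
        return forward[idx]

    new_roots = [copy(r) for r in roots]

    scan = 0
    while scan < len(to_space):  # to-space doubles as the work queue
        obj = to_space[scan]
        for i, field in enumerate(obj):
            if isinstance(field, tuple) and field[0] == "ptr":
                obj[i] = ("ptr", copy(field[1]))
        scan += 1
    return to_space, new_roots
```

Unreachable objects are never visited, so the cost is proportional to the live data, but every live object is copied again at every collection.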
5
Generational collection: Based on semispace copy collection. Arrange the heap into areas according to object lifetime. Disadvantage: –For programs with deep call chains, stack scanning can take a lot of time. –Long-lived objects are typically copied several times before they are tenured.
6
Generational stack collection: Use stack markers to cache the result of the root scan. Disadvantage: –Long-lived objects are typically copied several times before they are tenured.
7
Pretenuring: Make a profiling run to build a lifetime profile for objects, keyed by their allocation site.
8
TIL compiler: An optimizing compiler for Standard ML (SML). Intensional polymorphism. Nearly tag-free garbage collection. Conventional functional-language optimizations. Loop optimizations.
9
Stack scanning: At any execution point, data is live if it will be accessed as the program continues to execute. The collector must retain all data that is reachable by following pointers from the roots. The roots are the registers and the stack slots.
10
Difficulties: Determining the root set accurately. With callee-save registers, the content of a register or stack slot can come from a caller's frame, so stack frames cannot be decoded in isolation. With polymorphism, the compiler cannot always compute statically whether a value is a pointer or not.
11
Finding the roots: When the GC is called from the mutator, the return address (RA) indicates the current execution point. By looking up the RA in a table, we can determine the layout of the frame that called the GC. By continuing this way, frame by frame, we can find the roots.
12
Finding the roots: Determine the root set starting from the initial frame and scanning downwards. Scanning in both directions is needed because the type of some stack slots depends on a previous stack slot.
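The table-driven walk can be sketched as follows. The layout is an assumption chosen for the example, not TIL's actual frame format: the stack is a flat list of words, each frame stores its saved return address in its first word, and the hypothetical `table` maps a return address to the size of the frame it identifies.

```python
def walk_frames(stack, sp, ra, table):
    """Return (start, size) for every frame, newest first.

    Assumed layout (illustrative only): `sp` is the index of the newest
    frame, `ra` is the return address the GC was called with, and
    `table` maps a return address to the frame size in words.
    """
    frames = []
    while ra in table:          # an unknown RA marks the bottom of the stack
        size = table[ra]
        frames.append((sp, size))
        ra = stack[sp]          # saved RA: the execution point in the caller
        sp += size              # the caller's frame follows this one
    return frames
```

For example, with `table = {0x10: 3, 0x20: 2}` and `stack = [0x20, "a", "b", 0, "c"]`, calling `walk_frames(stack, 0, 0x10, table)` visits a 3-word frame at index 0 and then a 2-word frame at index 3.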
13
Trace table information: The return address (RA). Stack frame size. For each stack slot we record its trace: –Pointer: The compiler statically determined that it is a pointer. –Non-pointer: The value is not a root. –Callee-save + (register): Callee-save information.
14
Trace table information - 2: –Compute: The compiler could not statically determine the pointer status of the value. The entry carries additional information that says where the type of the value resides.
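The four kinds of trace can be sketched as a small decoder for one frame. The encoding here is invented for illustration: traces are tuples, a callee-save trace names the register whose pointer status was established while decoding the caller, and a compute trace names the slot holding a runtime type, with the string `"boxed"` standing in for "this type is a pointer type".

```python
def frame_roots(slots, traces, reg_is_ptr):
    """Return the indices of the slots in one frame that hold pointers.

    Hypothetical encoding: traces[i] is one of
      ("pointer",), ("nonpointer",),
      ("callee", reg)    # slot saves a caller's register
      ("compute", slot)  # runtime type of the value lives in another slot
    reg_is_ptr maps register names to the status found in the caller.
    """
    roots = []
    for i, trace in enumerate(traces):
        kind = trace[0]
        if kind == "pointer":
            is_ptr = True
        elif kind == "nonpointer":
            is_ptr = False
        elif kind == "callee":
            is_ptr = reg_is_ptr[trace[1]]      # depends on the caller's frame
        elif kind == "compute":
            is_ptr = slots[trace[1]] == "boxed"  # decided at run time
        else:
            raise ValueError(f"unknown trace {trace!r}")
        if is_ptr:
            roots.append(i)
    return roots
```

The callee-save case shows why a frame cannot be decoded in isolation: the same frame yields different roots depending on what the caller kept in the register.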
15
Stack frames and the corresponding table entry. [Figure: a stack frame (RA = 0x2001c718, slots 1 to 6 holding sample values such as 42, INT and 3.1415) shown beside its table entry (RA = 0x2001c718, frame size = 6, then per-slot traces including Non-pointer, Pointer, Compute: Stack 4 and Compute: Callee $10, followed by trace info on registers).]
16
Semispace versus generational collection
17
Semispace versus generational collection
19
Semispace versus generational collection
20
Stack marking: When the stack is deep, scanning the roots may take a dominant share of the GC time. Most of the stack usually does not change between the previous GC and the current one. Marking the stack frames that did not change can significantly speed up root scanning.
21
Marking the stack - 1st method: In each stack frame, add a flag that says whether the frame has changed. The collector resets the flag when it passes the frame, and the mutator sets it. Disadvantage: –The mutator is involved in the GC process. –The compiled code must do extra work for the GC on every return, while most of the time the GC is not running.
22
Marking the stack - 2nd method: When scanning the roots, set the RA of every n-th stack frame to a special stub function. The stub function holds a table of the original RAs. When a marked frame returns, the stub notes that this frame was deactivated and continues to the original RA.
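A minimal sketch of the stub-marker idea, modelling only return addresses. The names `MarkedStack` and `STUB`, and the use of frame depth as the key of the stub table, are assumptions for the example. Frames older than the newest intact marker cannot have been active since the last scan, so only frames from that marker upward are rescanned.

```python
STUB = "stub"  # hypothetical sentinel standing in for the stub's address


class MarkedStack:
    def __init__(self, n):
        self.n = n
        self.frames = []  # one return address per frame, oldest first
        self.saved = {}   # stub table: frame depth -> original RA

    def call(self, ra):
        self.frames.append(ra)

    def ret(self):
        depth = len(self.frames) - 1
        ra = self.frames.pop()
        if ra == STUB:
            # The stub runs: the marker is consumed and the original
            # return address is restored, so the mutator sees no change.
            ra = self.saved.pop(depth)
        return ra

    def scan(self):
        """Return the depths that must be rescanned, then re-mark."""
        start = max(self.saved, default=0)  # newest intact marker
        scanned = list(range(start, len(self.frames)))
        for depth in range(0, len(self.frames), self.n):
            if depth not in self.saved:
                self.saved[depth] = self.frames[depth]
                self.frames[depth] = STUB
        return scanned
```

The mutator pays nothing on ordinary returns; only a return through a marked frame takes the detour through the stub.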
23
Marking the stack - 2nd method, problems: –Functions do not always return normally. –When an exception is raised, the stack is unwound frame by frame until a matching handler is found, without going through the stubs. –Fortunately, we can maintain a value M, updated on every exception, that holds the shallowest stack pointer reached as a result of a raised exception.
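One way to combine the marker table with M is sketched below. Using frame depths instead of stack pointers is an assumption for the example (depth 0 is the oldest frame, and `m` is the depth of the shallowest frame an exception unwound to, or `None` if no exception was raised). Markers above M belong to frames that were discarded without running their stubs, so they are ignored.

```python
def reusable_prefix(marker_depths, m):
    """Number of oldest frames whose cached roots can be reused.

    marker_depths: depths of markers still recorded in the stub table.
    m: shallowest depth reached by exception unwinding since the last
       scan, or None if no exception was raised (illustrative model).
    """
    valid = [d for d in marker_depths if m is None or d <= m]
    # Frames strictly older than the newest valid marker are unchanged.
    return max(valid, default=0)
```

With markers at depths 0, 4 and 8, no exception gives a reusable prefix of 8 frames, while an exception that unwound to depth 6 invalidates the marker at 8 and leaves a prefix of 4.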
24
Stack Marker improvement
25
Pretenuring: Use profile data to predict the survival rate of an object. We speculate that objects allocated from the same place in the program have similar lifetimes. To check this hypothesis, we partition the program's heap allocations by allocation site.
26
Pretenuring - 2: The compiler is modified so that the program updates a table of allocation sites whenever an object is created. During garbage collection the entries are updated. After each collection we scan the allocation area to locate dead objects and update the entries of their allocation sites.
27
Pretenuring - 3: Using this information we can compute statistics on the number, size and average age of the objects created at each allocation site. We report only allocation sites that account for at least 1% of the allocations or 1% of the copied data.
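The per-site summary and the 1% filter can be sketched as follows. The record format `(site, size_bytes, bytes_copied)` and the function name are assumptions for the example, and the average-age statistic is left out to keep it short.

```python
from collections import defaultdict


def summarize_sites(records, threshold=0.01):
    """Aggregate per-object records into per-site statistics.

    records: iterable of (site, size_bytes, bytes_copied), a hypothetical
    profile format. A site is reported only if it accounts for at least
    `threshold` of all allocated bytes or of all copied bytes.
    """
    alloc = defaultdict(int)
    copied = defaultdict(int)
    count = defaultdict(int)
    for site, size, bytes_copied in records:
        alloc[site] += size
        copied[site] += bytes_copied
        count[site] += 1

    total_alloc = sum(alloc.values())
    total_copied = sum(copied.values())
    report = {}
    for site in alloc:
        a = alloc[site] / total_alloc if total_alloc else 0.0
        c = copied[site] / total_copied if total_copied else 0.0
        if a >= threshold or c >= threshold:
            report[site] = {"objects": count[site],
                            "alloc_share": a,
                            "copied_share": c}
    return report
```

A site with few allocations can still appear in the report if its objects are copied often, which is exactly the kind of site that pretenuring targets.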
28
The profile results
30
The results: 90% of the allocations have a very short lifetime, but 96-99% of the copied data comes from 4 sites.
31
Using the profile data: Objects created at allocation sites with long lifetimes are allocated directly in the older generation. Problem: An object allocated directly in the older generation may hold a reference to an object in the younger generation.
32
Solutions? Allocate such objects in the young generation. –May lead to a lot more copying. Remember the area of the older generation that holds references into the young generation, and scan it on each minor collection. –Scanning without copying does not take much time.
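The second solution can be sketched as follows. Everything here is a simplified model rather than the collector from the talk: the `Heap` class, the `scan_region` list of pretenured objects allocated since the last minor collection, and the immediate promotion of all survivors are assumptions for the example, and mutation of old objects (which would need a write barrier) is ignored.

```python
class Heap:
    def __init__(self, pretenured_sites):
        self.pretenured_sites = set(pretenured_sites)
        self.young = []
        self.old = []
        self.scan_region = []  # pretenured objects since the last minor GC

    def alloc(self, site, fields):
        obj = {"site": site, "fields": list(fields)}
        if site in self.pretenured_sites:
            self.old.append(obj)          # allocated directly in the old generation
            self.scan_region.append(obj)  # remembered for the next minor GC
        else:
            self.young.append(obj)
        return obj

    def minor_gc(self, stack_roots):
        """Promote young objects reachable from the roots or the scan region."""
        young_ids = {id(o) for o in self.young}
        work = list(stack_roots)
        for obj in self.scan_region:      # scanned in place, never copied
            work.extend(obj["fields"])
        live = set()
        while work:
            x = work.pop()
            if isinstance(x, dict) and id(x) in young_ids and id(x) not in live:
                live.add(id(x))
                work.extend(x["fields"])
        survivors = [o for o in self.young if id(o) in live]
        self.old.extend(survivors)        # simplification: promote at once
        self.young = []
        self.scan_region = []
        return survivors
```

The pretenured object is never copied, but its fields are still visited on the next minor collection, which is why the collection cost does not fall as far as the drop in copied bytes alone would suggest.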
33
Improvement from pretenuring (ms)
34
Improvement from pretenuring (bytes copied)
35
Comparison of all the methods
36
Conclusion for pretenuring: The reduction in GC time is smaller than expected from the reduction in data copied. Since we still have to check the younger generation, the GC time is still proportional to the live data (with a smaller constant).
37
Suggestion to improve the speed: Perform control-flow and data-flow analysis on objects.
38
Conclusions: The generational collector is twice as fast in GC time. It also improves GC time because it improves cache locality. For programs that use a deep stack, caching the root data can improve GC time by up to 74%. Profiling the heap can improve speed by 50% in some cases.
39
The End