1 GC Advantage: Improving Program Locality
Xianglong Huang, Zhenlin Wang, Stephen M. Blackburn, Kathryn S. McKinley, J. Eliot B. Moss, Perry Cheng

2 Motivation
- Memory gap
- How are Java programs affected?

3 MarkSweep vs. Copying
- pseudojbb

4 Motivation
- javac with perfect L1 and L2 caches
- 16 KB L1, 256 KB L2
- Appel, GCTk
- Breadth first

5 Motivation
- A copying collector can reorder objects
- Goal: take advantage of the copying collector's ability to reorder objects to improve locality

6 Exploring The Space
- Different policies for traversing roots
- Class-oblivious traversal orders
  - Which traversal order is the best?
- Class-based traversal orders
  - How to find the “important” data structures?

7 Different Root Traversal Policies
Two different types of roots:
- Stack, global variables
- Remembered sets (for generational collection)
Different traversal orders:
- Copy all roots before traversing any children
- Copy each root and its children (root-by-root)
- Split roots
  - Stack first, then the children
  - Remset first, then the children
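The root policies above can be sketched as one traversal that takes groups of roots, where each group's roots are all copied before any of that group's children. This is a minimal illustration, not the collector's actual code: the heap graph, the root names, the breadth-first handling of children, and the reading of "split roots" as "finish one root type and its children before the other" are all assumptions.

```python
from collections import deque

# Hypothetical heap: two stack roots, one remset root, and their children.
GRAPH = {"s1": ["a", "b"], "s2": ["c"], "r1": ["d"]}
STACK = ["s1", "s2"]
REMSET = ["r1"]

def copy_group(graph, group, seen, order):
    """Copy every root in the group, then their children breadth first."""
    queue = deque()
    for root in group:
        if root not in seen:
            seen.add(root)
            order.append(root)
            queue.append(root)
    while queue:
        node = queue.popleft()
        for child in graph.get(node, []):
            if child not in seen:
                seen.add(child)
                order.append(child)
                queue.append(child)

def traverse(graph, groups):
    """Return the copy order when the root groups are processed in sequence."""
    order, seen = [], set()
    for group in groups:
        copy_group(graph, group, seen, order)
    return order

def all_roots_first(graph, stack, remset):
    return traverse(graph, [stack + remset])

def root_by_root(graph, stack, remset):
    return traverse(graph, [[root] for root in stack + remset])

def stack_first(graph, stack, remset):
    return traverse(graph, [stack, remset])

def remset_first(graph, stack, remset):
    return traverse(graph, [remset, stack])

print(all_roots_first(GRAPH, STACK, REMSET))  # ['s1', 's2', 'r1', 'a', 'b', 'c', 'd']
print(root_by_root(GRAPH, STACK, REMSET))     # ['s1', 'a', 'b', 's2', 'c', 'r1', 'd']
```

In this sketch, root-by-root places each root next to its own children, while copying all roots first separates every root from its children by the remaining roots.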

8 Experiment Setup
- JikesRVM, JMTk
- Generational copying collector with a bounded nursery size of 4 MB
- PseudoAdaptive, 2nd iteration

9 Different Root Traversal Policies
- RxR (root-by-root) has the best mutator locality

10 Different Root Traversal Policies
- Total execution time

11 Exploring The Space
- Different policies for traversing roots
- Class-oblivious traversal orders
  - Which traversal order is the best?
- Class-based traversal orders
  - How to find the “important” data structures?

12 Different Traversal Orders
- Breadth first: 1,2,3,4,5,6,7
- Pure depth first: 1,2,6,3,4,7,5
- Pure depth first, LIFO: 1,5,4,7,3,2,6
[Figure: example object graph with nodes 1 to 7]

13 Different Traversal Orders
- Breadth first: 1,2,3,4,5,6,7
- Pure depth first: 1,2,6,3,4,7,5
- Pure depth first, LIFO: 1,5,4,7,3,2,6
- Partial depth first, 2 children: 1,2,6,3,4,5,7
[Figure: example object graph with nodes 1 to 7]
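The four orders can be reproduced with a short sketch. The object graph is inferred from the listed orders (1 points to 2, 3, 4, 5; 2 points to 6; 4 points to 7), since the figure itself is not available. The partial depth-first rule used here is one interpretation that matches the listed result: the first k children of an object are followed depth first, while the remaining children are copied immediately and scanned later in FIFO order. It should not be read as the collector's actual implementation.

```python
from collections import deque

# Assumed graph, inferred from the traversal orders listed on the slide.
GRAPH = {1: [2, 3, 4, 5], 2: [6], 4: [7]}

def breadth_first(graph, root):
    order, seen, queue = [], {root}, deque([root])
    while queue:
        node = queue.popleft()
        order.append(node)
        for child in graph.get(node, []):
            if child not in seen:
                seen.add(child)
                queue.append(child)
    return order

def depth_first(graph, root):
    order, seen = [], set()
    def visit(node):
        if node in seen:
            return
        seen.add(node)
        order.append(node)
        for child in graph.get(node, []):
            visit(child)
    visit(root)
    return order

def depth_first_lifo(graph, root):
    # Children are pushed in field order, so the last child is popped first.
    order, seen, stack = [], set(), [root]
    while stack:
        node = stack.pop()
        if node in seen:
            continue
        seen.add(node)
        order.append(node)
        stack.extend(graph.get(node, []))
    return order

def partial_depth_first(graph, root, k=2):
    order, seen, pending = [], set(), deque()
    def copy(node):
        if node in seen:
            return False
        seen.add(node)
        order.append(node)
        return True
    def scan(node):
        children = graph.get(node, [])
        for child in children[:k]:      # first k children: depth first
            if copy(child):
                scan(child)
        for child in children[k:]:      # the rest: copy now, scan later
            if copy(child):
                pending.append(child)
    copy(root)
    scan(root)
    while pending:
        scan(pending.popleft())
    return order

print(breadth_first(GRAPH, 1))        # [1, 2, 3, 4, 5, 6, 7]
print(depth_first(GRAPH, 1))          # [1, 2, 6, 3, 4, 7, 5]
print(depth_first_lifo(GRAPH, 1))     # [1, 5, 4, 7, 3, 2, 6]
print(partial_depth_first(GRAPH, 1))  # [1, 2, 6, 3, 4, 5, 7]
```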

14 Class Oblivious Type
- Different traversal policies
- Partial DF (depth first) is the best

15 Exploring The Space
- Different policies for traversing roots
- Class-oblivious traversal orders
  - Which traversal order is the best?
- Class-based traversal orders
  - How to find the “important” data structures?

16 Class-based Traversal
- Class-oblivious traversal orders are inflexible
- Class-based object traversal
  - Static profiling
  - Dynamic sampling

17 Static Profiling
- Profile object accesses
- Find hot pairs with strong correlation
- Example
  - (1,4), (4,7) and (2,6) have strong correlation
  - Order: 1,4,7,2,6,3,5
[Figure: example object graph with nodes 1 to 7]
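The example order can be reproduced with a small sketch in which hot edges are followed immediately and the remaining children are deferred. The graph is the one assumed for the earlier traversal-order slides (1 points to 2, 3, 4, 5; 2 points to 6; 4 points to 7), and the placement rule is an illustrative guess that matches the order on the slide, not a description of the profiler or collector used in the talk.

```python
from collections import deque

# Assumed object graph and the hot pairs given on the slide.
GRAPH = {1: [2, 3, 4, 5], 2: [6], 4: [7]}
HOT_PAIRS = [(1, 4), (4, 7), (2, 6)]

def hot_pair_order(graph, root, hot_pairs):
    """Copy hot (parent, child) pairs adjacently; defer cold children."""
    hot = set(hot_pairs)
    order, seen, pending = [], set(), deque([root])

    def place(node):
        if node in seen:
            return
        seen.add(node)
        order.append(node)
        children = graph.get(node, [])
        for child in children:
            if (node, child) in hot:
                place(child)            # hot child goes right after its parent chain
        pending.extend(c for c in children if (node, c) not in hot)

    while pending:
        place(pending.popleft())
    return order

print(hot_pair_order(GRAPH, 1, HOT_PAIRS))  # [1, 4, 7, 2, 6, 3, 5]
```

With no hot pairs the same function falls back to a plain breadth-first order.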

18 Online Profiling
- Use the adaptive compiler's sampling
  - Hot methods
  - Hot basic blocks
- Use field accesses to indicate hot fields
- Example (in a hot method): { Class A a; a.b=…; … }
[Figure: object A whose field b references object B]
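The idea of turning sampled field accesses into per-class advice can be sketched as follows. Everything here is hypothetical: the sample format, the threshold, and the function names are chosen for illustration and are not taken from JikesRVM or its adaptive compiler.

```python
from collections import Counter

def hot_fields(samples, threshold):
    """Map each class to the fields accessed at least `threshold` times.

    `samples` is a list of (class_name, field_name) pairs, standing in for
    field accesses observed in hot methods or hot basic blocks.
    """
    advice = {}
    for (cls, field), count in Counter(samples).items():
        if count >= threshold:
            advice.setdefault(cls, []).append(field)
    return advice

def scan_order(cls, fields, advice):
    """Order an object's pointer fields so advised hot fields are copied first."""
    hot = [f for f in fields if f in advice.get(cls, [])]
    cold = [f for f in fields if f not in hot]
    return hot + cold

# Field b of class A is sampled often; everything else is cold.
samples = [("A", "b"), ("A", "b"), ("A", "b"), ("A", "c"), ("B", "x")]
advice = hot_fields(samples, threshold=2)
print(advice)                                    # {'A': ['b']}
print(scan_order("A", ["c", "b", "d"], advice))  # ['b', 'c', 'd']
```

Classes with no advice keep their original field order, which corresponds to the case where the collector has nothing to go on for most copied objects.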

19 Online Profiling
- Microbenchmark results

20 Online Profiling
- Geometric mean

21 Reasons
- No advice for most of the objects copied
  - For jess, db and raytrace, we pick only <<1% of the objects as hot objects
  - 5% for javac
- The hot fields are within the first 2 pointers
  - 90% of the advised objects for javac

22 Online Profiling
- PseudoJBB mutator results
  - Generate advice for 23% of the copied objects
  - 75% of the objects have advised hot fields other than the first 2

23 Questions
- Have we found all the hot objects?
  - Not all hot objects are connected?
- Is class-based good enough?
  - For pseudojbb, do we need instance-based?
- Locality for the nursery objects?

24 Future Work
- Sampling technique
  - Catch more hot object accesses
    - Lower the threshold
    - Hot objects that are not connected
  - Dynamically change the advice for phase changes
- Nursery locality
- Different traversal orders for cold objects
- Instance-based

25 Conclusion
- Reordering objects during copying collection can improve locality
- Among class-oblivious traversal orders, partial depth first is the best
- Class-based traversal with online profiling
  - is more flexible, up to 50% better
  - has very low overhead, ~0%
- Some mysteries remain

26 Questions?

27 Answers?
- Lower the sampling threshold; do not sample only the hot methods
- For objects with only 1 or 2 pointers, it may be easier to just use depth first
- Maybe the nursery locality is more important
- Instance-based advice

28 Online Profiling
- Execution overhead

29 Online Profiling
- Microbenchmark results for mutator time

30 Different Root Traversal Policies
- _227_mtrt

31 Static Profiling
- Results

32 Answers?
- Most objects have only one pointer
- Percentage of objects copied by advice (are they really hot?)
  - For pseudojbb ~50%, for jess <<1%, for our microbenchmark ~16%
- Change!
- Half of the pairs do not form chains longer than 2
- Maybe the nursery locality is more important

33 Class Oblivious Orderings
- Different traversal policies
- Partial DF is better
- pseudoJBB

34 Motivation
- MarkSweep vs. copying collector
- Mutator time of _213_javac

35 Motivation
- Mutator L2 misses
- _213_javac

