Published by Barbara Bradford. Modified over 9 years ago.
1 Garbage Collection Advantage: Improving Program Locality
Xianglong Huang (UT), Stephen M Blackburn (ANU), Kathryn S McKinley (UT), J Eliot B Moss (UMass), Zhenlin Wang (MTU), Perry Cheng (IBM)
2 Motivation
Memory gap problem
OO programs are becoming more popular
OO programs exacerbate the memory gap problem
– Automatic memory management
– Pointer data structures
– Many small methods
Goal: improve OO program locality
3 Cache Performance Matters
4 Opportunity
A generational copying garbage collector reorders objects at runtime
5 Copying of Linked Objects
[Figure: seven linked objects, numbered 1 to 7, and their layout after breadth-first copying]
6 Copying of Linked Objects
[Figure: the same seven objects laid out after breadth-first copying and after depth-first copying]
7 Copying of Linked Objects
[Figure: the same seven objects laid out after breadth-first copying, depth-first copying, and online object reordering]
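The difference between the two static copy orders can be sketched in a few lines of Java. This is an illustration only, not the collector from the talk: the `Node` and `CopyOrderDemo` names and the seven-object tree (1 points to 2 and 3, 2 points to 4 and 5, 3 points to 6 and 7) are assumptions, since the slide figure's exact edges did not survive extraction. The sketch also assumes a tree with no shared or cyclic references; a real copying collector would use forwarding pointers to avoid copying an object twice.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Hypothetical heap object: an id plus pointer fields to other objects.
final class Node {
    final int id;
    final List<Node> fields = new ArrayList<>();
    Node(int id) { this.id = id; }
}

final class CopyOrderDemo {
    // Assumed example graph: 1 -> {2, 3}, 2 -> {4, 5}, 3 -> {6, 7}.
    static Node sampleTree() {
        Node[] n = new Node[8];
        for (int i = 1; i <= 7; i++) n[i] = new Node(i);
        n[1].fields.add(n[2]); n[1].fields.add(n[3]);
        n[2].fields.add(n[4]); n[2].fields.add(n[5]);
        n[3].fields.add(n[6]); n[3].fields.add(n[7]);
        return n[1];
    }

    // Breadth-first: a FIFO queue places siblings next to each other.
    static List<Integer> breadthFirst(Node root) {
        List<Integer> order = new ArrayList<>();
        Deque<Node> queue = new ArrayDeque<>();
        queue.add(root);
        while (!queue.isEmpty()) {
            Node cur = queue.poll();
            order.add(cur.id);          // "copy" the object: record its position
            queue.addAll(cur.fields);   // children wait behind everything already queued
        }
        return order;
    }

    // Depth-first: a LIFO stack places a parent next to its first child.
    static List<Integer> depthFirst(Node root) {
        List<Integer> order = new ArrayList<>();
        Deque<Node> stack = new ArrayDeque<>();
        stack.push(root);
        while (!stack.isEmpty()) {
            Node cur = stack.pop();
            order.add(cur.id);
            // Push in reverse so the first field is popped (copied) first.
            for (int i = cur.fields.size() - 1; i >= 0; i--) {
                stack.push(cur.fields.get(i));
            }
        }
        return order;
    }

    public static void main(String[] args) {
        System.out.println("breadth-first: " + breadthFirst(sampleTree()));
        System.out.println("depth-first:   " + depthFirst(sampleTree()));
    }
}
```

On this assumed tree the breadth-first order is 1, 2, 3, 4, 5, 6, 7 and the depth-first order is 1, 2, 4, 5, 3, 6, 7. Neither order knows which pointers the program actually follows, which is the gap online object reordering targets.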
8 Outline
Motivation
Online Object Reordering (OOR)
Methodology
Experimental Results
Conclusion
9 Online Object Reordering
Where are the cache misses?
How to identify hot field accesses at runtime?
How to reorder the objects?
10 Where Are The Cache Misses?
[Figure: heap structure, not to scale, showing VM Objects, Stack, Older Generation, and Nursery]
11 Where Are The Cache Misses?
12 Where Are The Cache Misses?
Two opportunities to reorder objects in the older generation
– Promote nursery objects
– Full heap collection
13 How to Find Hot Fields?
Runtime info (intercept every read)?
Compiler analysis?
Runtime information + compiler analysis
Key: Low overhead estimation
14 Which Classes Need Reordering?
Step 1: Compiler analysis
– Excludes cold basic blocks
– Identifies field accesses
Step 2: JIT adaptive sampling identifies hot methods
– Marks the field accesses in hot methods as hot
Key: Low overhead estimation
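The two steps above can be sketched as a small bookkeeping class. This is a minimal illustration under stated assumptions, not the Jikes RVM code: `HotFieldEstimator` and its method names are hypothetical, fields are identified by plain strings such as "A.b", and the compiler analysis and sampler are represented only by the calls they would make.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical bookkeeping for hot-field estimation (all names are assumptions).
final class HotFieldEstimator {
    // Step 1 result: per method, the field accesses found outside cold basic blocks.
    private final Map<String, List<String>> accessesByMethod = new HashMap<>();
    // Step 2 result: fields accessed by at least one hot method.
    private final Set<String> hotFields = new HashSet<>();

    // Called once per compiled method with the accesses the compiler kept.
    void recordCompilerAnalysis(String method, List<String> fieldAccesses) {
        accessesByMethod.put(method, List.copyOf(fieldAccesses));
    }

    // Called when sampling reports a hot method: all its recorded accesses become hot.
    void methodBecameHot(String method) {
        hotFields.addAll(accessesByMethod.getOrDefault(method, List.of()));
    }

    boolean isHot(String field) {
        return hotFields.contains(field);
    }

    public static void main(String[] args) {
        HotFieldEstimator est = new HotFieldEstimator();
        // The access to A.c sits in a cold block, so the compiler never records it.
        est.recordCompilerAnalysis("Foo", List.of("A.b"));
        est.methodBecameHot("Foo");
        System.out.println("A.b hot? " + est.isHot("A.b"));
        System.out.println("A.c hot? " + est.isHot("A.c"));
    }
}
```

The work happens at compile time and at sampling time rather than on every field read, which is how the sketch reflects the low-overhead goal.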
15 Example: Compiler Analysis
Method Foo {
  Class A a;
  try {
    …=a.b;
    …
  } catch(Exception e){
    …a.c
  }
}
Compiler:
– Hot BB (the try body): collect access info
– Cold BB (the catch handler): ignore
Compiler access list:
1. A.b
2. ….
….
16 Example: Adaptive Sampling
Method Foo {
  Class A a;
  try {
    …=a.b;
    …
  } catch(Exception e){
    …a.c
  }
}
Adaptive sampling: Foo is hot
Foo accesses:
1. A.b
2. ….
….
Result: A.b is hot
[Figure: object A with fields b and c, and A's type information]
17 Copying of Linked Objects
[Figure: the seven linked objects copied by online object reordering, guided by type information, into a hot space and a cold space]
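One way to read this slide is that the collector consults per-type advice while copying, so that objects reached through hot fields land next to their parents and everything else is deferred. The sketch below illustrates that idea only; it is not the MMTk implementation. The `Obj` and `HotFirstCopyDemo` names, the "Type.field" string keys, and the two work lists are assumptions, and the sketch produces a single copy order rather than modelling separate hot and cold spaces.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Deque;
import java.util.IdentityHashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical heap object with named pointer fields.
final class Obj {
    final String name;
    final String type;
    final Map<String, Obj> fields = new LinkedHashMap<>(); // keeps declaration order
    Obj(String name, String type) { this.name = name; this.type = type; }
}

final class HotFirstCopyDemo {
    // Assumed example: object a of type A declares field c before field b.
    static Obj sample() {
        Obj a = new Obj("a", "A");
        a.fields.put("c", new Obj("c", "C"));
        a.fields.put("b", new Obj("b", "B"));
        return a;
    }

    // Returns the order in which objects would be copied, given hot-field advice.
    static List<String> copyOrder(Obj root, Set<String> hotFields) {
        List<String> order = new ArrayList<>();
        Set<Obj> copied = Collections.newSetFromMap(new IdentityHashMap<>());
        Deque<Obj> hot = new ArrayDeque<>();   // referents of hot fields: copied first
        Deque<Obj> cold = new ArrayDeque<>();  // everything else: deferred
        hot.add(root);
        while (!hot.isEmpty() || !cold.isEmpty()) {
            Obj cur = !hot.isEmpty() ? hot.poll() : cold.poll();
            if (!copied.add(cur)) continue;    // already copied via another pointer
            order.add(cur.name);
            for (Map.Entry<String, Obj> f : cur.fields.entrySet()) {
                String key = cur.type + "." + f.getKey();
                (hotFields.contains(key) ? hot : cold).add(f.getValue());
            }
        }
        return order;
    }

    public static void main(String[] args) {
        System.out.println("with advice A.b: " + copyOrder(sample(), Set.of("A.b")));
        System.out.println("without advice:  " + copyOrder(sample(), Set.of()));
    }
}
```

With the advice "A.b", the referent of `b` is copied directly after `a` even though `c` is declared first. With no advice, the order falls back to plain declaration order.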
18 OOR System Overview
[Figure: system diagram. Components: Source Code, Baseline Compiler, Executing Code, Adaptive Sampling, Hot Methods, Optimizing Compiler, Access Info Database, GC. Edge labels: Adds Entries, Look Up, Register Hot Field Accesses, Advice. Annotations: GC: Copies Objects, Affects Locality, Improves Locality. Legend: OOR addition, JikesRVM component, Input/Output]
19 Outline
Motivation
Online Object Reordering
Methodology
Experimental Results
Conclusion
20 Methodology: Virtual Machine
Jikes RVM
– VM written in Java
– High performance
– Timer-based adaptive sampling
– Dynamic optimization
Experiment setup
– Pseudo-adaptive
– 2nd iteration [Eeckhout et al.]
21 Methodology: Memory Management
Memory Management Toolkit (MMTk):
– Allocators and garbage collectors
– Multi-space heap
  Boot image
  Large object space (LOS)
  Immortal space
Experiment setup
– Generational copying GC with 4M bounded nursery
22 Overhead: OOR Analysis Only
Benchmark    Base execution time (sec)    With only OOR analysis (sec)    Overhead
jess         4.39                         4.43                             0.84%
jack         5.79                         5.82                             0.57%
raytrace     4.63                         4.61                            -0.59%
mtrt         4.95                         4.99                             0.70%
javac        12.83                        12.70                           -1.05%
compress     8.56                         8.54                             0.20%
pseudojbb    13.39                        13.43                            0.36%
db           18.88                                                        -0.03%
antlr        0.94                         0.91                            -2.90%
hsqldb       160.56                       158.46                          -1.30%
ipsixql      41.62                        42.43                            1.93%
jython       37.71                        37.16                           -1.44%
ps-fun       129.24                       128.04                          -1.03%
Mean                                                                      -0.19%
23 Detailed Experiments
Separate application and GC time
Vary thresholds for method heat
Vary thresholds for cold basic blocks
Three architectures
– x86, AMD, PowerPC
x86 performance counters:
– DL1, trace cache, L2, DTLB, ITLB
24 Performance javac
25 Performance db
26 Performance jython
Any static ordering leaves you vulnerable to pathological cases.
27 Phase Changes
28 Related Work
Evaluate static orderings [Wilson et al.]
– Large performance variation
Static profiling [Chilimbi et al., and others]
– Lack of flexibility
Instance-based object reordering [Chilimbi et al.]
– Too expensive
29 Conclusion
Static traversal orders show up to 25% performance variation
OOR improves on or matches the best static ordering
OOR has very low overhead
Past predicts future
30 Questions?
Thank you!
31 OOR System Overview
Records object accesses in each method (excludes cold basic blocks)
Finds hot methods by adaptive sampling
Reorders objects with hot fields in older generation during GC
Copies hot objects into separate region