Trace Fragment Selection within Method-based JVMs
Duane Merrill, Kim Hazelwood
VEE '08
Overview
Would trace fragment dispatch benefit VMs with JITs?
– Fragment dispatch as a feedback-directed optimization
Why?
– Improve VM performance via better instruction layout
Outline
– Motivation
– New scheme for trace selection
– Viability in JikesRVM
    Evaluate opportunities for code improvement
    Evaluate trace selection overhead
Traditional VM Adaptive Code Generation
[Diagram: Phase 1, interpreter. Compilation shape: source instruction; dispatch shape: corresponding machine code (MC) instruction(s). Phase 2, JIT method compilation. Compilation shape: source method; dispatch shape: corresponding MC code array. Phase 3, more advanced JIT compilation: update class/TOC dispatch tables, perform OSR. The phase 1 and phase 2 boxes also show a "machine code trace fragment" dispatch shape.]
SDT (Software Dynamic Translation) / DBI (Dynamic Binary Instrumentation) / Embedded VM Adaptive Code Generation
[Diagram: Phase 1, interpreter. Compilation shape: source instruction; dispatch shape: corresponding MC instruction(s). Phase 2, JIT method compilation. Compilation shape: source method; dispatch shape: corresponding MC code array. Phase 3, more advanced JIT compilation: update class/TOC dispatch tables, perform OSR. The phase 1 and phase 2 boxes also show a "machine code trace fragment" dispatch shape.]
Proposed VM Adaptive Code Generation
[Diagram: Phase 1, interpreter. Compilation shape: source instruction; dispatch shape: corresponding MC instruction(s), plus a "machine code trace fragment" box. Phase 2, JIT method compilation. Compilation shape: source method; dispatch shape(s): corresponding MC code array and machine code trace fragment. Phase 3, more advanced JIT compilation: update class/TOC dispatch tables, perform OSR.]
Trace Fragment Dispatch
Trace
– A specific sequence of instructions observed at runtime
– Spans:
    Branches
    Procedure calls and returns
    Potentially an arbitrary number of instructions
Trace fragment
– A finite, linear sequence of machine code instructions
– Single-entry, multiple-exit (i.e., a superblock)
– Cached, linked
[Figure: control-flow graphs of foo() (blocks A to E) and bar() (blocks M to P); the recorded trace A B D M O P E becomes a linear fragment with early exits to C and to N.]
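The single-entry, multiple-exit property can be made concrete with a minimal Python sketch built on the slide's example trace A B D M O P E. This is not JikesRVM code. The class name `TraceFragment` is hypothetical. The CFG edges are assumptions read from the figure: B may branch off-trace to C, and M may branch off-trace to N.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class TraceFragment:
    """Hypothetical single-entry, multiple-exit (superblock-like) run of blocks."""
    blocks: list[str]                                              # on-trace blocks, in order
    side_exits: dict[str, set[str]] = field(default_factory=dict)  # block -> off-trace targets

    @property
    def entry(self) -> str:
        # The only point at which control may enter the fragment.
        return self.blocks[0]

    def run(self, path: list[str]) -> tuple[str, str | None]:
        """Replay an executed block path (compared as far as it goes)."""
        if not path or path[0] != self.entry:
            raise ValueError("a fragment can only be entered at its first block")
        for i in range(1, min(len(self.blocks), len(path))):
            if path[i] != self.blocks[i]:
                prev = self.blocks[i - 1]
                if path[i] not in self.side_exits.get(prev, set()):
                    raise ValueError(f"{prev} has no exit to {path[i]}")
                # Early exit: execution resumes outside the fragment.
                return ("exited", path[i])
        return ("completed", None)


frag = TraceFragment(list("ABDMOPE"), {"B": {"C"}, "M": {"N"}})
print(frag.run(list("ABDMOPE")))  # ('completed', None)
print(frag.run(list("ABC")))      # ('exited', 'C')
print(frag.run(list("ABDMN")))    # ('exited', 'N')
```

Control can enter only at A but can leave at any block that has a side exit. This asymmetry is what distinguishes a superblock from an arbitrary region.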
Trace Fragment Dispatch: The Good
Location, location, location
– “Inlining-like”: context sensitive, partial
– Spatial locality provides most of the achieved speedup
Simple, low-cost “local” optimizations
– Redundancy elimination
Nimbly adjusts to changing behavior
– Efficient
– Lots of early exits? Discard the fragment and re-trace
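The "discard and re-trace" policy can be sketched as a toy cache that tracks how often each fragment is left early. The threshold values and the names `FragmentCache` and `record` are hypothetical. The slides do not specify the exact policy.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class FragmentStats:
    entries: int = 0
    early_exits: int = 0


class FragmentCache:
    """Toy fragment cache that discards fragments with too many early exits."""

    def __init__(self, max_exit_ratio: float = 0.5, min_entries: int = 100):
        self.max_exit_ratio = max_exit_ratio  # hypothetical policy knob
        self.min_entries = min_entries        # hypothetical: avoid judging too early
        self.fragments: dict[str, FragmentStats] = {}  # trace head -> stats

    def install(self, head: str) -> None:
        self.fragments[head] = FragmentStats()

    def record(self, head: str, exited_early: bool) -> bool:
        """Count one fragment execution; return True if the fragment was discarded."""
        stats = self.fragments[head]
        stats.entries += 1
        stats.early_exits += int(exited_early)
        if (stats.entries >= self.min_entries
                and stats.early_exits / stats.entries > self.max_exit_ratio):
            # Drop it: execution falls back to method code and the head can be re-traced.
            del self.fragments[head]
            return True
        return False


cache = FragmentCache(max_exit_ratio=0.5, min_entries=4)
cache.install("A")
print([cache.record("A", e) for e in (True, False, True, True)])  # [False, False, False, True]
print("A" in cache.fragments)  # False
```

Because fragments are cheap to rebuild, discarding one costs little. No recompilation of the method is involved.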
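Trace Fragment Dispatch: The Bad
Lacks optimization power
– Data-flow analysis
– Code motion & loop optimizations
Code expansion
– Tail duplication
– Exponential growth (if all paths are maintained indefinitely)
[Figure: the foo()/bar() example. Besides the trace A B D M O P E, the early exit to C yields a fragment C D M O P E and the early exit to N yields a fragment N P E, so blocks D, M, O, P and E are duplicated. The figure also marks a branch back to A.]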
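To see why keeping every path leads to exponential growth, assume (hypothetically) a region of k back-to-back if/else diamonds followed by one shared exit block. The original code holds 3k + 1 blocks. Keeping one fragment per path, however, requires 2^k fragments of 2k + 1 blocks each, because tail duplication copies the shared blocks into every fragment.

```python
from __future__ import annotations

from itertools import product


def fragments_for_all_paths(k: int) -> list[list[str]]:
    """One linear fragment per path through k back-to-back if/else diamonds.

    Hypothetical shape: diamond i has head Hi and arms Ti/Fi; every path ends at EXIT.
    """
    fragments = []
    for choices in product((True, False), repeat=k):
        path = []
        for i, take_then in enumerate(choices):
            path += [f"H{i}", f"T{i}" if take_then else f"F{i}"]
        path.append("EXIT")  # the shared tail is copied into every fragment
        fragments.append(path)
    return fragments


def expansion(k: int) -> tuple[int, int]:
    """(blocks in the original code, blocks copied into fragments)."""
    original = 3 * k + 1  # k heads + 2k arms + EXIT, each stored once
    copied = sum(len(f) for f in fragments_for_all_paths(k))
    return original, copied


for k in (1, 3, 10):
    print(k, expansion(k))
# 1 (4, 6)
# 3 (10, 56)
# 10 (31, 21504)
```

The original code grows linearly in k, while the per-path fragments grow as 2^k (2k + 1). This gap is why a mixed-mode scheme that lets cold paths fall back to method code keeps expansion manageable.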
Supplement Method Dispatch with Trace Dispatch
Why?
– Improve VM performance via better instruction layout
– Easily disposable fragments reflect current program behavior
How?
– JIT compiler inserts instrumentation into method code arrays:
    Monitor potential “hot trace headers”
    Record control flow
– VM runtime assembles & patches trace fragments:
    Blocks “scavenged” from compiled code arrays
    Conditionals adjusted for proper fallthroughs
    Method code arrays patched to transfer control to fragments
    New fragments linked to existing fragments
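The "conditionals adjusted for proper fallthroughs" step can be illustrated with a toy assembler over a made-up mini-ISA. The `Block` type, the condition names and the `("exit", label)` stubs are all assumptions, not JikesRVM's representation. Blocks are copied verbatim. When the trace followed a branch's taken edge, the condition is inverted: the on-trace successor becomes the fallthrough and the off-trace successor becomes an exit.

```python
from __future__ import annotations

from dataclasses import dataclass

# Condition codes and their negations (hypothetical mini-ISA).
INVERT = {"eq": "ne", "ne": "eq", "lt": "ge", "ge": "lt", "gt": "le", "le": "gt"}


@dataclass
class Block:
    body: list[str]  # straight-line instructions, copied ("scavenged") verbatim
    term: tuple      # ("jmp", target) or ("br", cond, taken, fallthrough)


def assemble(trace: list[str], blocks: dict[str, Block]) -> list:
    """Lay out a recorded trace so each on-trace successor is the fallthrough.

    Off-trace successors become ("exit", label) stubs that return to the
    method code array (or could later be linked to another fragment).
    """
    code: list = []
    for i, name in enumerate(trace):
        block = blocks[name]
        code += block.body
        nxt = trace[i + 1] if i + 1 < len(trace) else None
        if block.term[0] == "br":
            _, cond, taken, fall = block.term
            if nxt == taken:
                # Trace took the branch: invert it so the trace falls through.
                code.append(("br", INVERT[cond], ("exit", fall)))
            elif nxt == fall:
                code.append(("br", cond, ("exit", taken)))
            elif nxt is None:
                # Last block: both successors leave the fragment.
                code.append(("br", cond, ("exit", taken)))
                code.append(("jmp", ("exit", fall)))
            else:
                raise ValueError(f"{name} has no edge to {nxt}")
        else:
            _, target = block.term
            if nxt is None:
                code.append(("jmp", ("exit", target)))
            elif nxt != target:
                raise ValueError(f"{name} has no edge to {nxt}")
            # A jump to the next trace block is simply dropped.
    return code


blocks = {
    "A": Block(["i += 1"], ("br", "ge", "X", "B")),  # branch to X on ge, else fall to B
    "B": Block(["load"], ("br", "eq", "D", "C")),
    "D": Block(["store"], ("jmp", "A")),
}
for instr in assemble(["A", "B", "D"], blocks):
    print(instr)
```

In this example, B's `eq` branch to D was the on-trace edge, so it becomes `ne` with C as the exit target. D's jump back to A is left as an exit stub that a real system would link back to the fragment's own entry.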
Easy Fragment Management
Improved trace selection
– JIT identifies trace starting locations
– VM determines trace stopping locations
“Friendly” encoding of instructions
– Built-in patch spots
– Avoids pesky PC-relative jumps (e.g., switch statements)
Knowledge of language implementation features:
– Calling conventions
– Stack layout
– Virtual method dispatch tables
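One way to picture "built-in patch spots" is a fixed-size placeholder that the JIT emits in front of each potential trace head. Installing or removing a fragment then becomes a single slot overwrite rather than a relocation. This toy model, including `MethodCodeArray` and the `jmp_fragment` pseudo-instruction, is a hypothetical illustration and not JikesRVM's actual encoding.

```python
class MethodCodeArray:
    """Toy method code array with a patch slot reserved at each trace head."""

    NOP = ("nop",)

    def __init__(self, instructions, trace_heads):
        # instructions: (label or None, instruction) pairs in layout order.
        self.code = []
        self.patch_spot = {}
        for label, instr in instructions:
            if label in trace_heads:
                # The JIT emits a fixed-size placeholder the VM may overwrite later.
                self.patch_spot[label] = len(self.code)
                self.code.append(self.NOP)
            self.code.append(instr)

    def install(self, head, fragment_id):
        # Redirecting control is one slot write; nothing else in the array moves.
        self.code[self.patch_spot[head]] = ("jmp_fragment", fragment_id)

    def uninstall(self, head):
        # Disposing of a fragment restores the placeholder.
        self.code[self.patch_spot[head]] = self.NOP


arr = MethodCodeArray([("A", "cmp i, n"), (None, "bge X"), ("B", "load")], {"A"})
arr.install("A", 7)
print(arr.code)  # [('jmp_fragment', 7), 'cmp i, n', 'bge X', 'load']
arr.uninstall("A")
print(arr.code)  # [('nop',), 'cmp i, n', 'bge X', 'load']
```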
Efficient Fragment Management
“Mixed-mode” scheme:
– Execution in both method code arrays & trace fragments
    Share the same register allocation
– Control flows off-trace into method code arrays
    Fewer trace fragments
    Manageable code expansion
– JVM control is already built into yield points
– Disposable trace fragments
    No need to redo expensive analysis as behavior changes
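The mixed-mode scheme can be simulated by replaying an executed block path. At a trace head with an installed fragment, blocks run in the fragment until control leaves it. Execution then continues in the method code array, or in another fragment if the exit target is itself a trace head. The `dispatch` function and the fragment table are illustrative assumptions.

```python
def dispatch(path, fragments):
    """Replay an executed block path in mixed mode.

    fragments maps a trace head to its fragment's block list (hypothetical
    table). Returns, for every executed block, "frag" or "method".
    """
    modes = []
    i = 0
    while i < len(path):
        frag = fragments.get(path[i])
        if frag is None:
            modes.append("method")  # ordinary method code array
            i += 1
            continue
        assert frag[0] == path[i], "fragments are entered only at their head"
        j = 0
        while j < len(frag) and i < len(path) and path[i] == frag[j]:
            modes.append("frag")
            i += 1
            j += 1
        # Fragment finished or control left it: continue at path[i] in method
        # code, or in another fragment if path[i] is itself a trace head.
    return modes


fragments = {"A": list("ABDMOPE"), "N": list("NPE")}
print(dispatch(list("ABDMNPE"), fragments))   # all 7 blocks run in fragments
print(dispatch(list("ABCDMOPE"), fragments))  # 2 fragment blocks, then 6 method blocks
```

The second path leaves the fragment at C, which has no fragment of its own, so the rest of the iteration runs in the method code array. No extra fragment is built for this cold path.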
Our Work: Trace Fragment Selection
1. Develop a new trace selection methodology
– Leverage JIT global analysis and the VM runtime
2. Implement trace selection in JikesRVM and evaluate its viability
– Do recorded traces indicate room for code improvement?
– Do the traces exhibit good characteristics?
– Is the instrumentation overhead reasonable?
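Improved Trace Selection: Starting Locations
1. Loop header locations
– Identified by JIT loop analysis
– More accurate than the “target of backward branch” heuristic
2. “Early exit” blocks
– Allow trace fragments to be “layered”
3. Method prologue
– Catches recursive execution
[Figures: (i) the foo()/bar() example, where trace A B D M O P E starts at A with a branch back to A, and a second fragment N P E is layered on the early exit to N; (ii) a method foo() with blocks A to D, where trace A B D starts at the method prologue and exits to C and to the epilogue.]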
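The sketch below illustrates why loop analysis beats the "target of backward branch" heuristic. It finds natural-loop headers with the textbook dominator-based back-edge test and compares them with the layout-based heuristic. The slides do not say which loop analysis JikesRVM uses. The CFG and block layout are made up so that a backward jump in the layout is not a loop.

```python
def dominators(succ, entry):
    """Iterative dominator sets for a CFG given as {block: [successors]}."""
    nodes = list(succ)
    preds = {n: [] for n in nodes}
    for n, targets in succ.items():
        for t in targets:
            preds[t].append(n)
    dom = {n: set(nodes) for n in nodes}
    dom[entry] = {entry}
    changed = True
    while changed:
        changed = False
        for n in nodes:
            if n == entry:
                continue
            incoming = [dom[p] for p in preds[n]]
            new = {n} | (set.intersection(*incoming) if incoming else set())
            if new != dom[n]:
                dom[n], changed = new, True
    return dom


def loop_headers(succ, entry):
    """Natural-loop headers: targets of back edges (target dominates source)."""
    dom = dominators(succ, entry)
    return {t for n, targets in succ.items() for t in targets if t in dom[n]}


def backward_branch_targets(succ, layout):
    """Layout heuristic: every branch to a lower-or-equal address 'is' a loop header."""
    pos = {b: i for i, b in enumerate(layout)}
    return {t for n, targets in succ.items() for t in targets if pos[t] <= pos[n]}


# Y jumps "backward" to X in the layout, but no cycle passes through X.
succ = {"E": ["Y"], "Y": ["X"], "X": ["H"], "H": ["B", "Z"], "B": ["H"], "Z": []}
layout = ["E", "X", "H", "B", "Y", "Z"]
print(sorted(loop_headers(succ, "E")))                # ['H']
print(sorted(backward_branch_targets(succ, layout)))  # ['H', 'X']
```

The heuristic flags X only because of where the code happens to be placed. The dominator test depends solely on the CFG, which the JIT already has.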
Improved Trace Selection: Stopping Criteria
1. Cycle: returned to the loop header
2. Abutted: arrived at another loop header
3. Length limited (unusual): 128 basic blocks encountered
4. Rejoined (unusual): returned to a basic block already in the trace
5. Exited (unusual): exited the method without meeting the above conditions (identifiable by stack height)
[Figure: the foo()/bar() example with trace A B D M O P E, a branch back to A, and the layered fragment N P E.]
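The five stopping criteria can be expressed as a small recorder over a stream of (block, stack depth) events. This is a hedged sketch. The event format, the order of the checks and the extra `"incomplete"` result for a truncated event stream are assumptions. Only the criteria themselves and the 128-block cap come from the slide.

```python
MAX_BLOCKS = 128  # length limit from the slide


def record_trace(events, loop_headers):
    """Record one trace from (block, stack_depth) events; the first event is the head.

    Returns (trace_blocks, stop_reason).
    """
    it = iter(events)
    head, head_depth = next(it)
    trace, seen = [head], {head}
    for block, depth in it:
        if depth < head_depth:
            return trace, "exited"    # returned out of the head's method
        if block == head:
            return trace, "cycle"     # back at the starting loop header
        if block in loop_headers:
            return trace, "abutted"   # reached a different loop header
        if block in seen:
            return trace, "rejoined"  # revisited a block already in the trace
        if len(trace) == MAX_BLOCKS:
            return trace, "length"    # hit the 128-block cap
        trace.append(block)
        seen.add(block)
    return trace, "incomplete"        # ran out of events (sketch only)


events = [("A", 0), ("B", 0), ("D", 0), ("M", 1), ("O", 1), ("P", 1), ("E", 0), ("A", 0)]
print(record_trace(events, {"A"}))  # (['A', 'B', 'D', 'M', 'O', 'P', 'E'], 'cycle')
```

M, O and P are recorded at depth 1 because they execute inside bar(). The return to depth 0 at E is not an exit, because it only brings control back to the head's own method.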
JIT-Inserted Instrumentation
[Figure: (a) Assembly of the original method code block: blocks A, B, C and D, one of them labeled as the loop header, with low-fidelity instrumentation (loop header counters) TRACE_HEAD_A and TRACE_HEAD_B, trampolines TRAMPOLINE_A and TRAMPOLINE_B, and a JUMP_BLOCK. (b) Assembly of the code block to be used for tracing: blocks A', B', C' and D', with high-fidelity instrumentation (paths through blocks) INSTRUM_A to INSTRUM_D and trampolines TRAMPOLINE_A' to TRAMPOLINE_D'. The slide's final build no longer shows TRACE_HEAD_A.]
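The split between low-fidelity and high-fidelity instrumentation can be modeled as cheap per-head counters. Once a head becomes hot, the model switches to logging every block until control returns to that head. Several details are simplifying assumptions: the threshold, the class name, the use of a mode flag (instead of the separate tracing copy of the code block shown in the figure), and stopping only on a return to the head.

```python
class TraceHeadProfiler:
    """Cheap per-head counters that switch one head to full path recording when hot."""

    def __init__(self, heads, hot_threshold=50):
        self.counts = {h: 0 for h in heads}  # low-fidelity: one counter per loop header
        self.hot_threshold = hot_threshold   # hypothetical tuning parameter
        self.recording = None                # head currently being traced, if any
        self.path = []

    def on_block(self, block):
        """Called for each executed block; returns a recorded path when one completes."""
        if self.recording is not None:
            # High-fidelity: log every block until control returns to the head.
            if block == self.recording:
                done, self.recording, self.path = self.path, None, []
                self.counts[block] = 0
                return done
            self.path.append(block)
            return None
        if block in self.counts:
            self.counts[block] += 1
            if self.counts[block] >= self.hot_threshold:
                self.recording, self.path = block, [block]
        return None


prof = TraceHeadProfiler({"A"}, hot_threshold=2)
results = [prof.on_block(b) for b in "ABABDA"]
print(results[-1])  # ['A', 'B', 'D']
```

Only heads that cross the threshold pay for per-block logging. This split is the motivation for keeping the always-on instrumentation low-fidelity.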
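Improvement Opportunity
[Figure: the foo()/bar() example laid out in memory, with the blocks shown in the order A B D E C M N P O across a 1 GB virtual address space from 5B0480C6 (low) to 9BFE8D1F (high); transitions along the trace are marked as fallthrough transitions or gap transitions.]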
Trace Layouts in Address Space (_227_mtrt)
[Figure: recorded traces plotted across the 1 GB virtual address space, from 5B0480C6 (low) to 9BFE8D1F (high).]
Trace Continuity
DaCapo & SPECjvm98 benchmarks
– One third of traces are necessarily fragmented (inter-procedural)
– Most intra-procedural traces are non-contiguous
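The contiguity classification can be sketched as follows. The sketch assumes that each traced block is known by its method and its [start, end) byte range. It also assumes that blocks from different methods live in separate code arrays, which is why inter-procedural traces are treated as necessarily fragmented. The function name is hypothetical.

```python
def classify_trace(blocks):
    """blocks: (method, start, end) byte ranges of each traced block, end exclusive."""
    if len({method for method, _, _ in blocks}) > 1:
        # Blocks live in different methods' code arrays (assumed non-adjacent).
        return "inter-procedural"
    for (_, _, prev_end), (_, start, _) in zip(blocks, blocks[1:]):
        if start != prev_end:
            return "non-contiguous"  # a taken jump separates consecutive blocks
    return "contiguous"


print(classify_trace([("foo", 0, 16), ("foo", 16, 40)]))    # contiguous
print(classify_trace([("foo", 0, 16), ("foo", 64, 80)]))    # non-contiguous
print(classify_trace([("foo", 0, 16), ("bar", 100, 120)]))  # inter-procedural
```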
Transitions between basic blocks
– Transfer to the appropriate fallthrough block 80% of the time
– 15% misprediction rate for local control flow
– 20% of all transitions could benefit from trace fragment dispatch

Distance                    Transition gaps
0 B to 64 B (cache line)    34.7%
65 B to 4 KB (page)         40.7%
More than 4 KB              24.6%
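Here is a rough sketch of bucketing transition gaps into the table's distance ranges. It assumes that a gap's distance is measured from the end of the source block to the start of the target block. It also treats a transition whose target equals the source block's end as a fallthrough and excludes it. The function names and the sample addresses are hypothetical.

```python
CACHE_LINE = 64  # bytes
PAGE = 4096      # bytes


def gap_bucket(source_end: int, target: int) -> str:
    """Bucket one non-fallthrough transition by the distance it jumps."""
    distance = abs(target - source_end)
    if distance <= CACHE_LINE:
        return "0-64B"
    if distance <= PAGE:
        return "65B-4KB"
    return "4KB+"


def gap_histogram(transitions):
    """transitions: (end of source block, start of target block) address pairs.

    Fallthroughs (target == source_end) are excluded; returns percentages.
    """
    counts = {"0-64B": 0, "65B-4KB": 0, "4KB+": 0}
    for source_end, target in transitions:
        if target != source_end:
            counts[gap_bucket(source_end, target)] += 1
    total = sum(counts.values())
    if total == 0:
        return {bucket: 0.0 for bucket in counts}
    return {bucket: 100.0 * n / total for bucket, n in counts.items()}


sample = [(100, 100), (100, 140), (100, 2000), (100, 100_000), (100, 60)]
print(gap_histogram(sample))  # {'0-64B': 50.0, '65B-4KB': 25.0, '4KB+': 25.0}
```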
Trace Characteristics
– Cycle and abutted traces make up the majority
– Few length-limited or rejoined traces
– Surprisingly large number of exited traces
    Sporadic loops
Instrumentation Overhead (Startup)
– One-iteration tests (40x)
– Mixed results: from a 7.4% slowdown (jython) to a 6.5% speedup (_227_mtrt)
– Average startup overhead: 1.7%
Instrumentation Overhead (Steady State)
– 40-iteration tests (8x)
– Average steady-state overhead: 1.7%
Summary
Envision trace fragment dispatch as a feedback-directed optimization
– Locality optimizations not addressed by the JIT compiler
– Adapts to changing behavior without recompilation
More accurate trace selection
– Enabled by co-location with the JIT and the VM runtime
Evaluated opportunity and cost
– 20% of basic block transitions do not use the sequential fallthrough
– 25% of taken branches/calls transfer control more than a 4 KB page away
– Minimal startup and maintenance overhead for trace selection
Questions?
Normalized Trace Layouts (_227_mtrt)
[Figure: normalized layouts of the recorded traces.]