Lengthening Traces to Improve Opportunities for Dynamic Optimization Chuck Zhao, Cristiana Amza, Greg Steffan, University of Toronto Youfeng Wu Intel Research.

1 Lengthening Traces to Improve Opportunities for Dynamic Optimization
Chuck Zhao, Cristiana Amza, Greg Steffan (University of Toronto)
Youfeng Wu (Intel Research)
Feb. 16, 2007, Interact-12, HPCA

2 Intel’s StarDBT Project
- StarDBT: a Dynamic Binary Translation framework
  - Operates on traces, optimizes hot traces
- Long-term goal: use StarDBT to allow legacy apps to exploit TM support (NOT by automatically parallelizing legacy apps)
  - Allow speculative sequential optimizations
  - Use hardware TM’s checkpoint/restore
- Problem: default traces are too small
  - TM overheads would overwhelm benefits
- Challenge: lengthening traces can be tricky

3 Trace Formation
[Figure: a basic-block profile over blocks A-G beside a trace profile with on-trace blocks A, B, D, F, G and off-trace stubs for C and E]
- Control flow that goes off-trace can be costly

4 Trade-offs when Lengthening Traces
[Figure: trace A, B, D, F, G with 5% side exits, labeled with side-exit ratio and completion ratio: 100% - 10% = 90% and 100% - 25% = 75%]
- Trade-offs:
  - longer traces have more optimization opportunities
  - longer traces have more side-exit branches
- Completion ratio: likelihood of execution staying on trace
  - percentage of execution reaching the trace tail
- A sweet spot exists in between; can we find it?

5 Our Work So Far (i.e., this talk)
1. Lengthening traces while maintaining completion ratios
   - Through unrolling and straightening
   - A characterization of the impact on traces: length, completion ratio, unroll factor, …
2. Improving optimization opportunities on longer traces
   - Improve Local Value Numbering (LVN) hits
   - Measurement of impact on performance is pending
3. Performing on-the-fly actions by the DBT system
   - Decisions made by instrumenting/sampling code online

6 Related Work
- Binary Translation Systems
  - Dynamo
  - DynamoRIO
  - PIN
  - StarDBT
    - transparent translation
    - x86 legacy code
- Trace Collection and Optimizations
  - Java JIT
  - Dynamo, DynamoRIO, Mojo
  - StarDBT
    - x86 binary level
    - MRET² to improve trace formation
    - aggressive trace optimizations
- First full analysis of trace-lengthening issues for DBT systems

7 StarDBT Trace Types
- self type
- other-trace type
- elsewhere type
[Figure: traces a, b, c, d and the dispatcher]

8 Lengthening Traces Through Unrolling
- Unrolling increases a trace’s length, but reduces its completion ratio
[Figure: trace a at unroll factors 1, 2, and 3; completion ratio: 90%, 81%, 72.9%]

9 Finding the Sweet-Spot Unroll Factor
Given p_orig = 99% and p_target = 90%:

  Unroll factor    Completion ratio
  1                p (0.99)
  2                p^2 (0.98)
  3                p^3 (0.97)
  …                …
  N = 10           p^10 (0.904)
  N = 11           p^11 (0.895)

- Traces with 100% completion ratio: set N = 10, chosen by system designer
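The table on this slide can be reproduced with a few lines of Python. This is a minimal sketch, not StarDBT code: the function name `max_unroll_factor` and the `cap` parameter are hypothetical, and it assumes each unrolled copy completes independently with probability p_orig, so N copies complete with probability p_orig^N (which matches the 90%, 81%, 72.9% sequence on the previous slide).

```python
def max_unroll_factor(p_orig: float, p_target: float, cap: int = 10) -> int:
    """Return the largest N <= cap such that p_orig ** N >= p_target.

    Assumption (not from the slides): every unrolled copy of the trace
    completes independently with probability p_orig.
    A result of 1 means the trace is left as it is (not unrolled).
    """
    if not (0.0 < p_orig <= 1.0 and 0.0 < p_target <= 1.0):
        raise ValueError("ratios must lie in (0, 1]")
    n = 1
    # Keep unrolling while one more copy still meets the target ratio.
    while n < cap and p_orig ** (n + 1) >= p_target:
        n += 1
    return n


# Slide values: p_orig = 99%, p_target = 90%.
sweet_spot = max_unroll_factor(0.99, 0.90, cap=100)

# A trace that always completes never drops below the target,
# so the designer-chosen cap (10 on the slide) decides.
always_completes = max_unroll_factor(1.0, 0.90, cap=10)
```

With the slide's numbers the loop stops at N = 10, because 0.99^11 is roughly 0.895 and falls below the 90% target.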

10 Lengthening Traces Through Straightening
[Figure: straightening of traces b, c, and d]
- We don’t yet implement/evaluate straightening

11 Evaluation

12 Distribution of Original Completion Ratios
- Majority of hot traces have completion ratios in 90%-100%
[Figure: distribution of hot traces by original completion ratio]

13 Impact of Unrolling on Hot Trace Size
- Lengthening increases hot trace size by more than 36%
- Selected SPECint CPU2000 benchmarks with MinneSPEC input
[Figure: average number of instructions vs. completion ratio; lengthened traces are 36% longer]

14 How Much Are Traces Unrolled?
- Hot traces are unrolled on average by 1.38x or more
[Figure: average unroll factor vs. target completion ratio; 1.38-1.58x, compared with not unrolled]

15 Average Completion Ratio After Lengthening
- Lengthening traces reduces completion ratio by < 0.5%
[Figure: completion ratio after lengthening, by completion ratio (axis ticks 10%-90%); difference < 0.5%]

16 Impact of Lengthening on Optimizations

17 Local Value Numbering (LVN)
- No need to build a Control Flow Graph (CFG)
  - Partial info
- No need to perform Data Flow Analysis (DFA)
  - Expensive, relies on CFG
- Can be arranged into a single-pass scan
  - Ease of implementation
  - Relatively lightweight algorithm
- Performs three optimizations:
  - Common Subexpression Elimination (CSE)
  - Copy Propagation (CP)
  - Dead-Code Elimination (DCE)
- LVN is common in JIT optimizers

18 Ex: LVN On a Lengthened Trace
Original Traces:
  first trace:   … c = a + b; d = a; e = b
  second trace:  f = d + e; d = x; …
Lengthened Trace (subscripts are value numbers):
  …
  c3 = a1 + b2
  d1 = a1
  e2 = b2
  f3 = d1 + e2    becomes f3 = c3 (CSE hit)
  d4 = x4         makes d1 = a1 dead (DCE hit)
  …
Optimized Trace:
  …
  c = a + b
  e = b
  f = c
  d = x
  …
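The example on this slide can be replayed with a small value-numbering pass. The sketch below is illustrative only and is not the StarDBT implementation: the tuple encoding `(dest, op, args)`, the helper names `lvn`, `dce`, and `fmt`, and the choice to treat every variable as live at the end of the trace are all assumptions. It also ignores side exits inside the trace, where `d = a` could still be observed.

```python
def lvn(trace):
    """Single-pass local value numbering over (dest, op, args) tuples.

    Returns the rewritten trace and the number of CSE hits.
    Only "copy" and the commutative "+" are modeled in this sketch.
    """
    var_vn = {}    # variable -> value number it currently holds
    expr_vn = {}   # (op, vn, vn) -> value number of that expression
    next_vn = 0
    out, cse_hits = [], 0

    def vn_of(var):
        nonlocal next_vn
        if var not in var_vn:            # first sight of an input value
            next_vn += 1
            var_vn[var] = next_vn
        return var_vn[var]

    for dest, op, args in trace:
        vns = tuple(vn_of(a) for a in args)
        if op == "copy":
            vn = vns[0]                  # dest shares the source's number
            out.append((dest, op, args))
        else:
            key = (op,) + tuple(sorted(vns))
            holder = None
            if key in expr_vn:
                vn = expr_vn[key]
                # Reuse only if some variable still holds that value.
                holder = next((v for v, n in var_vn.items() if n == vn), None)
            else:
                next_vn += 1
                vn = expr_vn[key] = next_vn
            if holder is not None:
                cse_hits += 1
                out.append((dest, "copy", (holder,)))
            else:
                out.append((dest, op, args))
        var_vn[dest] = vn
    return out, cse_hits


def dce(trace, live_out):
    """Backward scan that drops writes overwritten before any later use."""
    live, kept = set(live_out), []
    for dest, op, args in reversed(trace):
        if dest in live:
            kept.append((dest, op, args))
            live.discard(dest)
            live.update(args)
    kept.reverse()
    return kept


def fmt(instr):
    dest, op, args = instr
    return f"{dest} = {args[0]}" if op == "copy" else f"{dest} = {args[0]} {op} {args[1]}"


# Lengthened trace from the slide.
lengthened = [
    ("c", "+", ("a", "b")),
    ("d", "copy", ("a",)),
    ("e", "copy", ("b",)),
    ("f", "+", ("d", "e")),
    ("d", "copy", ("x",)),
]
numbered, cse_hits = lvn(lengthened)
optimized = [fmt(i) for i in dce(numbered, live_out={"c", "d", "e", "f"})]
```

Neither original trace offers a hit on its own: `f = d + e` only matches `c = a + b` once both sit in the same lengthened trace, which is the point the slide makes.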

19 LVN Hits Improvement (%)
- 10+% more LVN hits are available through lengthening
[Figure: % increase in LVN hits (axis ticks 5%-35%) vs. target completion ratio]

20 Ongoing Work
- Complete DBT Optimization Framework
- Evaluate speculative optimizations on long hot traces with high completion ratios
- Automatically determine optimal transaction granularity
- Use HTM to support trace-based speculative optimizations

21 Control Speculation
[Figure: cmp … followed by a branch with 10-% and 90+% paths; ld x=[y] on the 90+% path]
  ld.s x = [y]
  if(c){
    chk.s x, recovery
  next:
    …
  }
  recovery:
    ld x=[y]
    jmp next
- A Compiler Framework for Speculative Analysis and Optimizations: Lin et al., PLDI 03

22 Use HTM to Support Trace-based Speculative Optimizations
[Figure: cmp … followed by a branch with 10-% and 90+% paths; ld x=[y] on the 90+% path]
  start_tx
  ld x = [y]
  if(c){
    chk x, abort_tx
    …
  }
  commit_tx
- Use longer traces with high completion ratio as tx granularity
- HTM hardware support simplifies speculative optimization

23 Conclusion
- Traces can be effectively lengthened
  - increase in trace size by 36+%
  - decrease in completion ratio by less than 0.5%
- Longer traces provide better opportunities for optimization
  - increase in LVN hits by 10+%

24 Q + A

25 Complete StarDBT Optimization Framework
- x86 CISC ISA
  - code patching won’t work
  - Really need a code generator and IR
- Design + implement a low-level Runtime IR
  - close to hardware
  - capture + represent all necessary low-level info
  - easy to convert from/to machine code
  - easy to implement analysis and optimizations
- Starting point
  - Dynamo IR
  - LLVM IR
  - GCC RTL
  - …

26 StarDBT Overall Structure

27 Trace Formation Heuristics
- MRET: Most Recent Execution Tail
  - originally proposed by Dynamo
  - Trace head
    - loop head (backward branch target)
    - sampling counter reaches a certain threshold
  - Trace tail
    - satisfies certain trace-tail conditions
- MRET²: 2-pass MRET
  - perform 2 independent MRET trace formations
  - intersect traces with common head

28 Traces and Hot Traces
- Trace
  - MRET² recognizes trace heads
  - Trace tails satisfy certain conditions
  - Blocks in between become a trace
- Hot Trace
  - Based on recognized traces
  - Put in additional software counters
    - head: head counter
    - each early-exit branch: off-trace counters
  - sampling: hot trace’s completion ratio
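One way the counters on this slide could yield a completion ratio is sketched below. The class `HotTraceCounters`, its field names, and the exit labels are hypothetical; the sketch assumes that every entry counted at the trace head either leaves through exactly one early-exit branch or reaches the trace tail.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class HotTraceCounters:
    """Software counters attached to one hot trace (illustrative only)."""
    head: int = 0                                              # entries at the trace head
    off_trace: Dict[str, int] = field(default_factory=dict)   # early exit -> count

    def enter(self) -> None:
        self.head += 1

    def take_exit(self, exit_id: str) -> None:
        self.off_trace[exit_id] = self.off_trace.get(exit_id, 0) + 1

    def completion_ratio(self) -> Optional[float]:
        """Fraction of entries that reached the tail; None before any entry."""
        if self.head == 0:
            return None
        exits = sum(self.off_trace.values())
        return (self.head - exits) / self.head


# 1000 entries, two side exits taken 50 times each: 10% side-exit ratio.
counters = HotTraceCounters()
for i in range(1000):
    counters.enter()
    if i < 50:
        counters.take_exit("exit_after_B")
    elif i < 100:
        counters.take_exit("exit_after_D")
ratio = counters.completion_ratio()
```

Here the two 5% exits add up to a 10% side-exit ratio, so the sampled completion ratio is 90%, the same arithmetic as the 100% - 10% = 90% label on the trade-offs slide.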
