Published by Horace Hood. Modified over 9 years ago.
Diverge-Merge Processor (DMP)
Hyesoon Kim, José A. Joao, Onur Mutlu*, Yale N. Patt
HPS Research Group, University of Texas at Austin
*Microsoft Research
Outline
- Predicated Execution
- Diverge-Merge Processor (DMP)
- Implementation of DMP
- Experimental Evaluation
- Conclusion
Predicated Execution
Convert control-flow dependence to data dependence.

Source code:
    if (cond) { b = 0; } else { b = 1; }

Normal branch code:
    A:  p1 = (cond)
        branch p1, TARGET
    B:  mov b, 1
        jmp JOIN
    C:  TARGET: mov b, 0

Predicated code:
    A:  p1 = (cond)
    B:  (!p1) mov b, 1
    C:  (p1) mov b, 0

[Figure: control-flow graph with blocks A, B, C, D; the branch in A has a taken (T) edge and a not-taken (N) edge.]
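The equivalence between the two code shapes on this slide can be checked with a small model. This is only an illustrative sketch in Python, not the authors' code: the function names are made up here, and Python conditionals stand in for the predicated `mov` instructions.

```python
def branch_version(cond: bool) -> int:
    """Normal branch code: only one of the two moves is executed."""
    if cond:      # branch p1, TARGET
        b = 0     # TARGET: mov b, 0
    else:
        b = 1     # mov b, 1 ; jmp JOIN
    return b


def predicated_version(cond: bool) -> int:
    """Predicated code: both moves are fetched; the predicate gates each write."""
    p1 = cond             # p1 = (cond)
    b = None
    if not p1:            # (!p1) mov b, 1
        b = 1
    if p1:                # (p1) mov b, 0
        b = 0
    return b
```

Both functions return the same value for either outcome of `cond`; the predicated form has no control-flow choice between the two moves, only a data dependence on `p1`.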
Benefit of Predicated Execution
Predicated execution can be high-performance and energy-efficient.
[Figure: pipeline diagram (Fetch, Decode, Rename, Schedule, RegisterRead, Execute) showing blocks A through F under branch prediction, where a misprediction causes a pipeline flush, and under predicated execution, where some instructions become nops.]
Limitations/Problems of Predication
- ISA: Predicate registers and predicated instructions are required.
  Dynamic-hammock predication [Klauser '98] can solve this problem, but it is only applicable to simple hammocks.
- Adaptivity: Static predication is not adaptive to run-time branch behavior. Branch behavior changes based on input set, phase, and control-flow path.
  Wish branches [Kim '05]
- Complex CFG: A large subset of control-flow graphs is not converted to predicated code: function calls, loops, many instructions inside a region, and complex CFGs.
  Hyperblock [Mahlke '92] cannot adapt to frequently-executed paths dynamically.
Diverge-Merge Processor (DMP)
DMP can dynamically predicate complex branches (in addition to simple hammocks).
- The compiler identifies:
  - Diverge branches
  - Control-flow merge (CFM) points
- The microarchitecture decides when and what to predicate dynamically.
Dynamic Predication
Klauser et al. [PACT'98]: Dynamic-hammock predication

Original code (the branch in A is low-confidence):
    A:  p1 = (cond)
        branch p1, TARGET
    B:  mov R1, 1
        jmp JOIN
    C:  TARGET: mov R1, 0
    H:  JOIN: add R5, R1, 1

Dynamically predicated code:
    (mov R1, 1)   PR10 = 1
    (mov R1, 0)   PR11 = 0
    select-µop:   PR12 = (cond) ? PR11 : PR10

select-µops correspond to φ-nodes in SSA.
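The select-µop on this slide behaves like a φ-node resolved by the branch condition. A minimal Python sketch of that behaviour follows; the dictionary of physical registers and the function name are illustrative assumptions, and the destination of the final `add` is stored under the architectural name `R5` only to keep the sketch short.

```python
def dynamic_hammock(cond: bool) -> dict:
    """Model both sides of the hammock writing separate physical registers."""
    pr = {}
    pr["PR10"] = 1                                     # (mov R1, 1), not-taken side
    pr["PR11"] = 0                                     # (mov R1, 0), taken side
    pr["PR12"] = pr["PR11"] if cond else pr["PR10"]    # select-µop
    pr["R5"] = pr["PR12"] + 1                          # JOIN: add R5, R1, 1
    return pr
```

Whichever way the branch resolves, the instruction at `JOIN` reads `PR12`, so neither side has to be flushed.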
Diverge-Merge Processor
[Figure: control-flow graph with blocks A through H, distinguishing frequently executed paths from not frequently executed paths. A is the diverge branch and H is the CFM point. Annotation: insert select-µops.]
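Diverge-Merge Processor
[Figure: the same control-flow graph (blocks A through H) with a legend for diverge branch, executed block, and CFM point, and for frequently vs. not frequently executed paths. A is the diverge branch and H is the CFM point.]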
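Control-Flow Graphs
- CFG types: simple hammock, nested hammock, frequently-hammock, loop, non-merging
- Techniques compared: DMP, dynamic-hammock predication, software predication (SW pred), wish branches (Wish br.), dual-path
[Table: which technique handles which CFG type; the individual entries are not recoverable.]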
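Dual-path Execution vs. DMP
[Figure: CFG with blocks A through F and a low-confidence branch in A. Dual-path execution fetches C, D, E, F on path 1 and B, D, E, F on path 2. DMP fetches C on path 1 and B on path 2, then D, E, F once after the CFM point.]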
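Control-Flow Graphs
- CFG types: simple hammock, nested hammock, frequently-hammock, loop, non-merging
- Techniques compared: DMP, dynamic-hammock predication, software predication (SW pred), wish branches (Wish br.), dual-path
[Table: which technique handles which CFG type; one entry is marked "sometimes". The individual entries are not recoverable.]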
Distribution of Mispredicted Branches
66% of mispredicted branches can be dynamically predicated in DMP.
Fetch Mechanism
[Figure: control-flow graph with blocks A through H and the predicted path highlighted. A is the diverge branch and H is the CFM point. Annotations: low confidence; round-robin fetch.]
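One way to read the "round-robin fetch" annotation is that, after a low-confidence diverge branch, the front end alternates between the two paths until each reaches the CFM point and then continues on a single stream. The Python sketch below illustrates that reading only; the function name, the per-block granularity, and the assignment of blocks to paths are assumptions made here, not details taken from the slide.

```python
from itertools import zip_longest


def round_robin_fetch(path1, path2, after_cfm):
    """Alternate between two paths up to the CFM point, then fetch once from it."""
    order = []
    for blk1, blk2 in zip_longest(path1, path2):
        if blk1 is not None:
            order.append(("path1", blk1))
        if blk2 is not None:
            order.append(("path2", blk2))
    # Blocks from the CFM point onward are fetched a single time.
    order.extend(("merged", blk) for blk in after_cfm)
    return order


# Hypothetical block assignment: C and E on one path, B on the other, CFM at H.
fetch_order = round_robin_fetch(["C", "E"], ["B"], ["H"])
```

With this hypothetical assignment, `fetch_order` interleaves C and B, then E, and reaches H only once.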
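Dynamic Predication
The processor forks the RAT, RAS, and GHR at the diverge branch.

Original code and its renamed form:
    branch r0, C          ->  branch pr10, C ; p1 = pr10
    add r1 <- r3, #1      ->  (p1)  add pr21 <- pr13, #1
    add r1 <- r2, #-1     ->  (!p1) add pr31 <- pr12, #-1
    add r4 <- r1, r3      ->  add pr24 <- pr41, pr13

    select-µop: pr41 = p1 ? pr21 : pr31

Register alias tables (columns: Arch., Phy., M):
    Before the fork:          R1 -> PR11, R2 -> PR12, R3 -> PR13
    Forked copies RAT1, RAT2: R1 -> PR21 (M = 1) on one path, R1 -> PR31 (M = 1) on the other; R2 -> PR12 and R3 -> PR13 in both

[Figure: control-flow graph with blocks A, B, C, E, H.]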
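The renaming steps on this slide can be sketched as a small model: fork the RAT at the diverge branch, set the M bit whenever a path renames a destination, and emit one select-µop at the CFM point for every architectural register modified on either path. This is an assumption-laden sketch, not the hardware algorithm from the paper; `fork_rat`, `rename_dest`, `merge_rats`, and the `alloc` callback are hypothetical names.

```python
def fork_rat(rat):
    """Copy the mappings and clear every M (modified) bit."""
    return {arch: (phys, False) for arch, (phys, _m) in rat.items()}


def rename_dest(rat, arch, new_phys):
    """Map a destination register to a new physical register and set its M bit."""
    rat[arch] = (new_phys, True)


def merge_rats(rat_taken, rat_not_taken, alloc):
    """At the CFM point, emit a select-µop for each register modified on either path."""
    merged, select_uops = {}, []
    for arch in rat_taken:
        phys_t, m_t = rat_taken[arch]
        phys_n, m_n = rat_not_taken[arch]
        if m_t or m_n:
            dest = alloc(arch)  # hypothetical physical-register allocator
            select_uops.append(f"{dest} = p1 ? {phys_t} : {phys_n}")
            merged[arch] = (dest, False)
        else:
            merged[arch] = (phys_t, False)
    return merged, select_uops


# Register names follow the slide; the allocator always returns PR41 here.
rat = {"R1": ("PR11", False), "R2": ("PR12", False), "R3": ("PR13", False)}
rat_taken, rat_not_taken = fork_rat(rat), fork_rat(rat)
rename_dest(rat_taken, "R1", "PR21")       # (p1)  add r1 <- r3, #1
rename_dest(rat_not_taken, "R1", "PR31")   # (!p1) add r1 <- r2, #-1
merged, uops = merge_rats(rat_taken, rat_not_taken, lambda arch: "PR41")
```

After the merge, `R1` maps to `PR41`, which is the source the renamed `add pr24 <- pr41, pr13` reads; `R2` and `R3` keep their original mappings because neither path modified them.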
DMP Support
- ISA support
  - Mark diverge branches/CFM points.
- Compiler support [CGO'07]
  - The compiler identifies diverge branches and the corresponding CFM points.
- Hardware support
  - Confidence estimator
  - Fetch mechanisms
  - Load/store processing
  - Instruction retirement
  - Dynamic predication
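Hardware Complexity Analysis
- Hardware components compared: ST-LD forwarding, select-µop generation, rename support, front-end, check flush/no flush, predicate registers, confidence estimator
- Techniques compared: SW pred., wish br., dyn. ham., dual path, multi path, DMP
[Table: which technique requires which component; the individual entries are not recoverable.]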
Simulation Methodology
- 12 SPEC 2000 INT, 5 SPEC 95 INT
  - Different input sets for profiling and evaluation
- Alpha ISA execution-driven simulator
- Baseline processor configuration
  - 64KB perceptron predictor/O-GEHL (paper)
  - Minimum 30-cycle branch misprediction penalty
  - 8-wide, 512-entry instruction window
  - 2KB 12-bit history enhanced JRS confidence estimator
- Less aggressive processor (paper)
- Power model using Wattch
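For readers unfamiliar with the confidence estimator listed above, the sketch below shows a plain JRS-style estimator: a table of saturating counters indexed by the branch address XORed with global history, incremented on a correct prediction and reset on a misprediction. It does not reproduce the "enhanced" variant named on the slide, and the parameters are assumptions: 12 index bits with 4-bit counters is one configuration that adds up to 2 KB, and the threshold of 15 is arbitrary.

```python
class JRSConfidenceEstimator:
    """Plain JRS-style estimator with resetting counters (illustrative only)."""

    def __init__(self, index_bits=12, counter_bits=4, threshold=15):
        self.mask = (1 << index_bits) - 1
        self.max_count = (1 << counter_bits) - 1
        self.threshold = threshold
        self.table = [0] * (1 << index_bits)
        self.history = 0  # global branch history, one bit per branch outcome

    def _index(self, pc):
        return (pc ^ self.history) & self.mask

    def is_high_confidence(self, pc):
        return self.table[self._index(pc)] >= self.threshold

    def update(self, pc, prediction_correct, taken):
        i = self._index(pc)
        if prediction_correct:
            self.table[i] = min(self.table[i] + 1, self.max_count)
        else:
            self.table[i] = 0  # a misprediction resets the counter
        self.history = ((self.history << 1) | int(taken)) & self.mask
```

In a DMP-like design, a diverge branch for which `is_high_confidence` returns false would be a candidate for dynamic predication, while a high-confidence branch would simply follow the branch predictor.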
Different CFG types
[Figure]
Performance Improvement
[Figure]
Energy Consumption
[Figure]
Conclusion
- DMP introduces the concept of frequently-hammocks, and it dynamically predicates complex CFGs.
- DMP can overcome the three major limitations of software predication: ISA support, adaptivity, and complex CFGs.
- DMP reduces branch mispredictions energy-efficiently: 19% performance improvement, 9% less energy.
- DMP divides the work between the compiler and the microarchitecture:
  - The compiler analyzes the control-flow graphs.
  - The microarchitecture decides when and what to predicate dynamically.
Thank You!!
Questions?