Wrong Path Events and Their Application to Early Misprediction Detection and Recovery David N. Armstrong Hyesoon Kim Onur Mutlu Yale N. Patt University.

Presentation transcript:

Wrong Path Events and Their Application to Early Misprediction Detection and Recovery
David N. Armstrong, Hyesoon Kim, Onur Mutlu, Yale N. Patt
University of Texas at Austin

Motivation
Branch predictors are not perfect.
Branch misprediction penalty causes significant performance degradation.
We would like to reduce the misprediction penalty by detecting the misprediction and initiating recovery before the mispredicted branch is resolved.

Branch Misprediction Resolution Latency

Performance Potential of Early Misprediction Recovery

An Observation and A Question
In an out-of-order processor, some instructions are executed on the mispredicted path (wrong-path instructions).
Is the behavior of wrong-path instructions different from the behavior of correct-path instructions?
  – If so, we can use the difference in behavior for early misprediction detection and recovery.

Talk Outline
Concept of Wrong Path Events
Types of Wrong Path Events
Experimental Evaluation
Realistic Early Misprediction Recovery
Shortcomings and Future Research
Conclusions

What is a Wrong Path Event?
An instance of illegal or unusual behavior that is more likely to occur on the wrong path than on the correct path.
Wrong Path Event = WPE
Probability (wrong path | WPE) ~ 1
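The "~ 1" condition can be unpacked with Bayes' rule. This is a standard identity, not a formula taken from the slides, and the event names are only labels for the two paths:

```latex
\[
P(\text{wrong path} \mid \text{WPE}) =
\frac{P(\text{WPE} \mid \text{wrong path})\, P(\text{wrong path})}
     {P(\text{WPE} \mid \text{wrong path})\, P(\text{wrong path})
      + P(\text{WPE} \mid \text{correct path})\, P(\text{correct path})}
\]
```

The ratio approaches 1 when the event almost never occurs on the correct path, which is the property a useful WPE needs. The talk later reports that only 0.05% of all WPEs occur on the correct path, which corresponds to a value of about 0.9995.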

Why Does a WPE Occur?
A wrong-path instruction may be executed before the mispredicted branch is executed.
  – Because the mispredicted branch may be dependent on a long-latency instruction.
The wrong-path instruction may consume a data value that is not properly initialized.

WPE Example from eon: NULL pointer dereference
    for (int i = 0; i < length(); i++) {
        structure *ptr = array[i];
        if (ptr->x) {
            // ...
        }
    }

Array of pointers to structs (shown at every step): array[0] = x8ABCD0, array[1] = xEFF8B0, then the array boundary, then x0 in the next slot.

Beginning of the loop: i = 0

First iteration: i = 0, ptr = x8ABCD0, *ptr is dereferenced

Loop branch correctly predicted: i = 1

Second iteration: i = 1, ptr = xEFF8B0, *ptr is dereferenced

Loop exit branch mispredicted: i = 2

Third iteration on wrong path: i = 2, ptr = 0

Wrong Path Event: i = 2, ptr = 0, *ptr is a NULL pointer dereference!
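The walk-through above can be mimicked in software. The following is a minimal sketch, not the authors' simulator: `Structure`, `first_null_dereference`, and the `extra_wrong_path_iterations` parameter are hypothetical names, and the function checks the pointer instead of faulting so that the event can be reported.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <vector>

// Hypothetical stand-in for the struct type walked by the eon loop.
struct Structure {
    int x;
};

// Runs the loop for `length` iterations plus a number of extra iterations,
// the way a processor keeps fetching after mispredicting the loop exit
// branch. Returns the index of the first NULL pointer that would have been
// dereferenced, or nothing if every visited pointer was valid.
std::optional<std::size_t> first_null_dereference(
        const std::vector<const Structure*>& slots,
        std::size_t length,
        std::size_t extra_wrong_path_iterations) {
    const std::size_t limit = length + extra_wrong_path_iterations;
    assert(limit <= slots.size());  // stay inside the modeled memory
    for (std::size_t i = 0; i < limit; ++i) {
        const Structure* ptr = slots[i];
        if (ptr == nullptr) {
            return i;  // hardware would flag a wrong path event here
        }
        if (ptr->x) {
            // ... loop body elided, as in the slide
        }
    }
    return std::nullopt;
}
```

With two valid pointers followed by a zero slot, zero extra iterations report nothing, while one extra iteration reports index 2, matching the third iteration on the wrong path.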

Types of WPEs
Due to memory instructions
  – NULL pointer dereference
  – Write to read-only page
  – Unaligned access (illegal in the Alpha ISA)
  – Access to an address out of segment range
  – Data access to code segment
  – Multiple concurrent TLB misses
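Three of the memory-related events above depend only on the address and page attributes of a single access, so they can be sketched as a small classifier. This is an illustrative model under stated assumptions: the type and field names are invented here, a NULL dereference is simplified to "address equals zero", and the read-only flag is assumed to be supplied by the caller (in hardware it would come from the TLB entry).

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical subset of the memory-related WPE types.
enum class MemoryWpe {
    None,
    NullPointerDereference,
    WriteToReadOnlyPage,
    UnalignedAccess,
};

// Illustrative description of one memory access; field names are assumptions.
struct MemoryAccess {
    std::uint64_t address;
    unsigned size_bytes;     // access size, must be a power of two
    bool is_write;
    bool page_is_read_only;  // assumed to be looked up by the caller
};

// Returns the first matching event type, or None for an ordinary access.
MemoryWpe classify(const MemoryAccess& access) {
    assert(access.size_bytes != 0 &&
           (access.size_bytes & (access.size_bytes - 1)) == 0);
    if (access.address == 0) {
        return MemoryWpe::NullPointerDereference;
    }
    if (access.is_write && access.page_is_read_only) {
        return MemoryWpe::WriteToReadOnlyPage;
    }
    if (access.address % access.size_bytes != 0) {
        return MemoryWpe::UnalignedAccess;
    }
    return MemoryWpe::None;
}
```

The order of the checks is a design choice of this sketch; the slides do not say which event takes priority when several apply to the same access.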

Types of WPEs (continued)
Due to control-flow instructions
  – Misprediction under misprediction
      If three branches are executed and resolved as mispredicts while there are older unresolved branches in the processor, it is almost certain that one of the older unresolved branches is mispredicted.
  – Return address stack underflow
  – Unaligned instruction fetch address (illegal in Alpha)
Due to arithmetic instructions
  – Some arithmetic exceptions, e.g., divide by zero
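The "misprediction under misprediction" rule amounts to a small counter with a threshold of three. The sketch below is one way to express it; the class name is hypothetical, and the reset policy (clearing the count when a mispredict resolves with no older unresolved branch, or on an explicit `reset()`) is an assumption that the slides do not specify.

```cpp
#include <cassert>

// Counts branches that resolve as mispredicted while an older branch is
// still unresolved, and signals a WPE once the count reaches the threshold.
class MispredictUnderMispredictDetector {
public:
    explicit MispredictUnderMispredictDetector(unsigned threshold = 3)
        : threshold_(threshold) {
        assert(threshold > 0);
    }

    // Call whenever a branch resolves as mispredicted.
    // Returns true when this resolution should be treated as a WPE.
    bool on_mispredict_resolved(bool older_unresolved_branch_exists) {
        if (!older_unresolved_branch_exists) {
            count_ = 0;  // assumption: nothing older can be at fault
            return false;
        }
        ++count_;
        return count_ >= threshold_;
    }

    // Assumed to be called after a recovery flushes the window.
    void reset() { count_ = 0; }

private:
    unsigned threshold_;
    unsigned count_ = 0;
};
```

The default threshold of three comes from the slide; making it a constructor parameter only keeps the sketch easy to experiment with.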

Experimental Evaluation

Two Empirical Questions
1. How often do WPEs occur?
2. When do WPEs occur on the wrong path?

Simulation Methodology
Execution-driven Alpha ISA simulator
  – Faithful modeling of wrong-path execution and wrong-path misprediction recoveries
8-wide out-of-order processor
256-entry instruction window
Large and accurate branch predictors
  – 64K-entry gshare and 64K-entry PAs hybrid
  – 64K-entry target cache for indirect branches
  – 64-entry return address stack
Minimum 30-cycle branch misprediction penalty
Minimum 500-cycle memory latency, 1 MB L2 cache
SPEC 2000 integer benchmarks

Mispredictions Resulting in WPEs

Distribution of WPEs

When do WPEs Occur?

Early Misprediction Recovery Using WPEs

Early Misprediction Recovery
Once a WPE is detected, there may be:
  – no older unresolved branch in the window
      Do nothing (false alarm)
  – one older unresolved branch in the window
      The processor initiates early recovery for that branch, guessing that it is mispredicted.
  – multiple older unresolved branches in the window
      May be different instances of the same static branch.
To initiate early misprediction recovery, we need a mechanism that decides which branch is the oldest mispredicted branch in the window.
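The three cases above reduce to counting the unresolved branches that are older than the WPE-generating instruction. A minimal sketch, assuming a hypothetical `Branch` record with a sequence number and a resolved flag, and a window listed oldest first:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Possible responses to a detected WPE, following the three cases.
enum class WpeAction {
    FalseAlarm,           // no older unresolved branch: do nothing
    RecoverSingleBranch,  // exactly one candidate: recover for it
    NeedBranchSelection,  // several candidates: a selection mechanism is needed
};

// Hypothetical view of one branch in the instruction window.
struct Branch {
    unsigned seq_no;
    bool resolved;
};

WpeAction on_wpe_detected(const std::vector<Branch>& window,
                          unsigned wpe_seq_no) {
    std::size_t older_unresolved = 0;
    for (std::size_t i = 0; i < window.size(); ++i) {
        // The window is assumed to be ordered from oldest to youngest.
        assert(i == 0 || window[i - 1].seq_no < window[i].seq_no);
        if (!window[i].resolved && window[i].seq_no < wpe_seq_no) {
            ++older_unresolved;
        }
    }
    if (older_unresolved == 0) {
        return WpeAction::FalseAlarm;
    }
    if (older_unresolved == 1) {
        return WpeAction::RecoverSingleBranch;
    }
    return WpeAction::NeedBranchSelection;
}
```

The function only classifies the situation; choosing among several candidates is left to a separate mechanism.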

A Realistic Recovery Mechanism
The distance in the instruction window between the WPE-generating instruction and the oldest mispredicted branch is predictable.
The first time a WPE is encountered, the processor records the distance between the WPE-generating instruction and the mispredicted branch.
The next time the same instruction generates a WPE, the processor predicts that the branch that has the same distance to the WPE-generating instruction is mispredicted.

Ld C Generates a WPE
Instr.   SeqNo   PC
Br A1    10      PC A
Br A2    20      PC A
Ld C     40      PC C    (NULL pointer dereference)

WPE is Recorded
Instr.   SeqNo   PC
Br A1    10      PC A
Br A2    20      PC A
Ld C     40      PC C
WPE Record: SeqNo 40, PC C

Br A1 Resolves as a Mispredict
Instr.   SeqNo   PC
Br A1    10      PC A    (misprediction)
Br A2    20      PC A
Ld C     40      PC C
WPE Record: SeqNo 40, PC C
Distance = 40 - 10 = 30

Distance is Recorded in a Table
Table index: PC C XOR Global History
Valid   Distance
0
0
1       30

Ld C Again Generates a WPE
Instr.   SeqNo   PC
Br A1    25      PC A
Br A2    35      PC A
Ld C     55      PC C    (NULL pointer dereference)

Distance Table is Accessed
Table index: PC C XOR Global History
Valid   Distance
1       30

Br A1 is Predicted To Be a Mispredict
Instr.   SeqNo   PC
Br A1    25      PC A
Br A2    35      PC A
Ld C     55      PC C
Distance = 30
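The record-then-predict sequence in the preceding slides can be sketched as a small table indexed by the WPE instruction's PC XORed with the global history. This is an illustration under assumptions, not the authors' hardware design: the class and method names are hypothetical, the table is direct-mapped with a power-of-two size and no tags, and distance is measured as a difference of sequence numbers, as in the example.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <vector>

class DistancePredictor {
    struct Entry {
        bool valid = false;
        unsigned distance = 0;
    };

public:
    explicit DistancePredictor(std::size_t entries) : table_(entries) {
        // Power-of-two size so the index can be formed with a mask.
        assert(entries != 0 && (entries & (entries - 1)) == 0);
    }

    // Called once the mispredicted branch behind a recorded WPE resolves.
    void record(std::uint64_t wpe_pc, std::uint64_t global_history,
                unsigned wpe_seq_no, unsigned branch_seq_no) {
        assert(branch_seq_no < wpe_seq_no);  // the branch must be older
        Entry& entry = table_[index(wpe_pc, global_history)];
        entry.valid = true;
        entry.distance = wpe_seq_no - branch_seq_no;
    }

    // Called when a WPE is detected. Returns the sequence number of the
    // branch predicted to be mispredicted, if a usable entry exists.
    std::optional<unsigned> predict(std::uint64_t wpe_pc,
                                    std::uint64_t global_history,
                                    unsigned wpe_seq_no) const {
        const Entry& entry = table_[index(wpe_pc, global_history)];
        if (!entry.valid || entry.distance > wpe_seq_no) {
            return std::nullopt;
        }
        return wpe_seq_no - entry.distance;
    }

private:
    std::size_t index(std::uint64_t pc, std::uint64_t history) const {
        return static_cast<std::size_t>((pc ^ history) & (table_.size() - 1));
    }

    std::vector<Entry> table_;
};
```

Replaying the slide numbers: recording a WPE at SeqNo 40 against the branch at SeqNo 10 stores a distance of 30, and a later WPE from the same load at SeqNo 55 (with the same global history) yields SeqNo 25, which is Br A1 in the second table.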

Three Issues in Realistic Recovery
Recovery may be initiated for a correctly-predicted correct-path branch.
  – Happens for 3.8% of WPE detections.
A WPE may occur on the correct path!
  – Happens rarely (0.05% of all WPEs).
  – We see no recovery initiated in this case.
Early recovery for indirect branches
  – The processor needs to predict a new target address.
  – Target addresses can also be recorded in the distance prediction table.

IPC Improvement

Shortcomings
1. WPEs do not occur frequently. Their coverage of mispredicted branches is low.
2. WPEs do not occur early enough.
3. Staying on the wrong path a few more cycles is sometimes more beneficial for performance than recovering early.

Future Research Directions
Finding/generating new types of WPEs
  – Can the compiler help?
  – Better understanding of the differences between wrong path and correct path needed.
Identifying what makes a particular wrong-path episode beneficial for performance
  – Understanding the trade-offs related to the wrong path would be useful in deciding when to initiate early recovery.
What else can WPEs be used for?
  – Other kinds of speculation?
  – Energy savings?

Conclusions
Wrong-path instructions sometimes exhibit illegal and unusual behavior, called wrong path events (WPEs).
WPEs can be used to guess that the processor is on the wrong path and to initiate early misprediction recovery.
A realistic and accurate early misprediction recovery mechanism is described.
Shortcomings of WPEs are discussed.
Future research should focus on finding new WPEs.

Backup Slides

Related Work
Glew proposed the use of bad memory addresses and illegal instructions as strong indicators of branch mispredictions [Wisc 00].
Jacobsen et al. explored branch confidence estimation to predict the likelihood that a branch is mispredicted [ISCA 96].
Manne et al. used branch confidence to gate the pipeline when there is a high likelihood that the processor is on the wrong path [ISCA 98].
Jiménez et al. proposed an overriding predictor scheme, where a more accurate slow predictor overrides the prediction made by a less accurate fast predictor [MICRO 00].
Falcón et al. proposed the use of predictions made for younger branches to re-evaluate the prediction made for an older branch [ISCA 04].

Contributions
1. A new idea in branch prediction.
2. We observe that wrong-path instructions exhibit illegal and unusual behavior, called wrong path events (WPEs).
3. We describe a novel mechanism to trigger early misprediction detection and recovery using WPEs.
4. We analyze and discuss the shortcomings of WPEs and propose new research areas to address these shortcomings.

Distribution of Cycles Between WPE and Branch Resolution

Distance Predictor Accuracy

WPE and Misprediction Rate