1 Lecture 7: Speculative Execution and Recovery Branch prediction and speculative execution, precise interrupt, reorder buffer.

Slides:

Advertisements

Similar presentations

Hardware-Based Speculation. Exploiting More ILP Branch prediction reduces stalls but may not be sufficient to generate the desired amount of ILP One way.

Advertisements

1 Lecture 11: Modern Superscalar Processor Models Generic Superscalar Models, Issue Queue-based Pipeline, Multiple-Issue Design.

Lec18.1 Step by step for Dynamic Scheduling by reorder buffer Copyright by John Kubiatowicz (http.cs.berkeley.edu/~kubitron)

1 COMP 206: Computer Architecture and Implementation Montek Singh Wed, Oct 19, 2005 Topic: Instruction-Level Parallelism (Multiple-Issue, Speculation)

A scheme to overcome data hazards

Dynamic ILP: Scoreboard Professor Alvin R. Lebeck Computer Science 220 / ECE 252 Fall 2008.

1 Lecture: Out-of-order Processors Topics: out-of-order implementations with issue queue, register renaming, and reorder buffer, timing, LSQ.

Computer Organization and Architecture (AT70.01) Comp. Sc. and Inf. Mgmt. Asian Institute of Technology Instructor: Dr. Sumanta Guha Slide Sources: Based.

COMP25212 Advanced Pipelining Out of Order Processors.

Computer Architecture Lec 8 – Instruction Level Parallelism.

1 COMP 206: Computer Architecture and Implementation Montek Singh Mon., Oct. 14, 2002 Topic: Instruction-Level Parallelism (Multiple-Issue, Speculation)

EEL 5708 Speculation. Branch prediction. Superscalar processors. Lotzi Bölöni.

CS 211: Computer Architecture Lecture 5 Instruction Level Parallelism and Its Dynamic Exploitation Instructor: M. Lancaster Corresponding to Hennessey.

/ Computer Architecture and Design Instructor: Dr. Michael Geiger Summer 2014 Lecture 6: Speculation.

CPE 731 Advanced Computer Architecture ILP: Part IV – Speculative Execution Dr. Gheith Abandah Adapted from the slides of Prof. David Patterson, University.

DAP Spr.‘98 ©UCB 1 Lecture 6: ILP Techniques Contd. Laxmi N. Bhuyan CS 162 Spring 2003.

Computer Architecture 2011 – Out-Of-Order Execution 1 Computer Architecture Out-Of-Order Execution Lihu Rappoport and Adi Yoaz.

CPSC614 Lec 5.1 Instruction Level Parallelism and Dynamic Execution #4: Based on lectures by Prof. David A. Patterson E. J. Kim.

1 Zvika Guz Slides modified from Prof. Dave Patterson, Prof. John Kubiatowicz, and Prof. Nancy Warter-Perez Out Of Order Execution.

1 Lecture 7: Out-of-Order Processors Today: out-of-order pipeline, memory disambiguation, basic branch prediction (Sections 3.4, 3.5, 3.7)

Computer Architecture Lecture 18 Superscalar Processor and High Performance Computing.

1 COMP 206: Computer Architecture and Implementation Montek Singh Mon., Oct. 9, 2002 Topic: Instruction-Level Parallelism (Multiple-Issue, Speculation)

Review of CS 203A Laxmi Narayan Bhuyan Lecture2.

1 IBM System 360. Common architecture for a set of machines. Robert Tomasulo worked on a high-end machine, the Model 91 (1967), on which they implemented.

CIS 629 Fall 2002 Multiple Issue/Speculation Multiple Instruction Issue: CPI < 1 To improve a pipeline’s CPI to be better [less] than one, and to utilize.

1 Lecture 8: Branch Prediction, Dynamic ILP Topics: static speculation and branch prediction (Sections )

1 Lecture 9: Dynamic ILP Topics: out-of-order processors (Sections )

Computer Architecture 2010 – Out-Of-Order Execution 1 Computer Architecture Out-Of-Order Execution Lihu Rappoport and Adi Yoaz.

1 Overcoming Control Hazards with Dynamic Scheduling & Speculation.

1 Sixth Lecture: Chapter 3: CISC Processors (Tomasulo Scheduling and IBM System 360/91) Please recall:  Multicycle instructions lead to the requirement.

1 Chapter 2: ILP and Its Exploitation Review simple static pipeline ILP Overview Dynamic branch prediction Dynamic scheduling, out-of-order execution Hardware-based.

1 Advanced Computer Architecture Dynamic Instruction Level Parallelism Lecture 2.

1 Lecture 5 Overview of Superscalar Techniques CprE 581 Computer Systems Architecture, Fall 2009 Zhao Zhang Reading: Textbook, Ch. 2.1 “Complexity-Effective.

1 Lecture 6 Tomasulo Algorithm CprE 581 Computer Systems Architecture, Fall 2009 Zhao Zhang Reading:Textbook 2.4, 2.5.

CSCE 614 Fall Hardware-Based Speculation As more instruction-level parallelism is exploited, maintaining control dependences becomes an increasing.

1 Lecture 7: Speculative Execution and Recovery using Reorder Buffer Branch prediction and speculative execution, precise interrupt, reorder buffer.

1 Lecture 5: Dependence Analysis and Superscalar Techniques Overview Instruction dependences, correctness, inst scheduling examples, renaming, speculation,

1 Lecture: Out-of-order Processors Topics: a basic out-of-order processor with issue queue, register renaming, and reorder buffer.

Out-of-order execution Lihu Rappoport 11/ MAMAS – Computer Architecture Out-Of-Order Execution Dr. Lihu Rappoport.

1 Lecture 10: Memory Dependence Detection and Speculation Memory correctness, dynamic memory disambiguation, speculative disambiguation, Alpha Example.

CS 5513 Computer Architecture Lecture 6 – Instruction Level Parallelism continued.

CS203 – Advanced Computer Architecture ILP and Speculation.

Ch2. Instruction-Level Parallelism & Its Exploitation 2. Dynamic Scheduling ECE562/468 Advanced Computer Architecture Prof. Honggang Wang ECE Department.

Instruction-Level Parallelism and Its Dynamic Exploitation

IBM System 360. Common architecture for a set of machines

/ Computer Architecture and Design

/ Computer Architecture and Design

Tomasulo Loop Example Loop: LD F0 0 R1 MULTD F4 F0 F2 SD F4 0 R1

CS203 – Advanced Computer Architecture

Lecture: Out-of-order Processors

Advantages of Dynamic Scheduling

11/14/2018 CPE 631 Lecture 10: Instruction Level Parallelism and Its Dynamic Exploitation Aleksandar Milenković, Electrical and Computer.

CMSC 611: Advanced Computer Architecture

Lecture 6: Advanced Pipelines

A Dynamic Algorithm: Tomasulo’s

Out of Order Processors

Lecture 8: ILP and Speculation Contd. Chapter 2, Sections 2. 6, 2

Lecture 11: Memory Data Flow Techniques

Lecture: Out-of-order Processors

Lecture 8: Dynamic ILP Topics: out-of-order processors

Adapted from the slides of Prof

Lecture 7: Dynamic Scheduling with Tomasulo Algorithm (Section 2.4)

September 20, 2000 Prof. John Kubiatowicz

Larry Wittie Computer Science, StonyBrook University and ~lw

Adapted from the slides of Prof

Chapter 3: ILP and Its Exploitation

September 20, 2000 Prof. John Kubiatowicz

Overcoming Control Hazards with Dynamic Scheduling & Speculation

Lecture 9: Dynamic ILP Topics: out-of-order processors

Presentation transcript:

1 Lecture 7: Speculative Execution and Recovery Branch prediction and speculative execution, precise interrupt, reorder buffer

2 Control Dependencies Every instruction is control dependent on some set of branches if p1 S1; if p2 S2; S1 is control dependent on p1, and S2 is control dependent on p2 but not on p1. control dependencies must be preserved to preserve program order

3 Control Dependence Ignored If CPU stalls on branches, how much would CPI increase? Control dependence need not be preserved in the whole execution willing to execute instructions that should not have been executed, thereby violating the control dependences, if can do so without affecting correctness of the program Two properties critical to program correctness are data flow and exception behavior

4 Branch Prediction and Speculative Execution Speculation is to run instructions on prediction – predictions could be wrong. Branch prediction: cannot be avoided, could be very accurate Mis-prediction is less frequent event – but can we ignore? Example: for (i=0; i<1000; i++) C[i] = A[i]+B[i]; Branch prediction: predict the execution as accurate as possible (frequent cases) Speculative execution recovery: if prediction is wrong, roll the execution back

5 Exception Behavior Preserving exception behavior -- exceptions must be raised exactly as in sequential execution Same sequences No “extra” exceptions Example: DADDUR2,R3,R4 BEQZR2,L1 LWR1,0(R2) L1: Problem with moving LW before BEQZ ? Again, a dynamic execution must look like a sequential execution, any time when it is stopped

6 Precise Interrupts Tomasulo had: In-order issue, out-of-order execution, and out-of-order completion Need to “fix” the out-of-order completion aspect so that we can find precise breakpoint in instruction stream.

7 Branch Prediction vs. Precise Interrupt Mis-prediction is “exception” on the branch inst Execution “branches out” on exceptions Every instruction is “predicted” not to take the “branch” to interrupt handler Same technique for handling both issue: in-order completion or commit: change register/memory only in program order (sequential) How does it ensure the correctness?

8 The Hardware: Reorder Buffer If inst write results in program order, reg/memory always get the correct values Reorder buffer (ROB) – reorder out-of-order inst to program order at the time of writing reg/memory (commit) If some inst goes wrong, handle it at the time of commit – just flush inst afterwards Inst cannot write reg/memory immediately after execution, so ROB also buffer the results No such a place in Tomasulo original Reorder Buffer Decode FU1FU2 RS Fetch Unit Rename L-bufS-buf DM Regfile IM

9 Four Steps of Speculative Tomasulo Algorithm 1.Issue—get instruction from FP Op Queue If reservation station and reorder buffer slot free, issue instr & send operands & reorder buffer no. for destination (this stage sometimes called “dispatch”) 2.Execution—operate on operands (EX) When both operands ready then execute; if not ready, watch CDB for result; when both in reservation station, execute; checks RAW (sometimes called “issue”) 3.Write result—finish execution (WB) Write on Common Data Bus to all awaiting FUs & reorder buffer; mark reservation station available. 4.Commit—update register with reorder result When instr. at head of reorder buffer & result present, update register with result (or store to memory) and remove instr from reorder buffer. Mispredicted branch flushes reorder buffer (sometimes called “graduation”)

10 Reorder Buffer Details Holds branch valid and exception bits Flush pipeline when any bit is set How do the architectural states look like after the flushing? Holds dest, result and PC Write results to dest at the time of commit Which PC to hold? A ready bit (not shown) indicates if the Supplies operands between execution complete and commit Reorder Buffer Dest reg Result Exceptions? Program Counter Branch or L/W? Ready?

11 Speculative Execution Recovery Flush the pipeline on mis-prediction MIPS 5-stage pipeline used flushing on taken branches Where is the flush signal from? When to flush? Which components are flushed? Reorder Buffer Decode FU1FU2 RS Fetch Unit Rename L-bufS-buf DM Regfile IM

12 Changes to Other Components Use ROB index as tag Why not RS index any more? Why is ROB index a valid choice? Renaming table maps architecture registers to ROB index if the register is renamed Reservation stations now use ROB index for tracking dependence and for wakeup Again tag (now ROB index) and data are broadcast on CDB at writeback Inst may receive values from reg/mem, data broadcasting, or ROB

13 Code Example Loop: LD R2, 0(R1) DADDIUR2, R2, #1 SD R2, 0(R1) DADDIU R1, R1, #4 BNE R2, R3, Loop How would this code be executed? InstIssueExecMemory read Write results Commit LD12345 ……………… ………………

14 Summary Reservations stations: implicit register renaming to larger set of registers + buffering source operands Prevents registers as bottleneck Avoids WAR, WAW hazards of Scoreboard Not limited to basic blocks when compared to static scheduling (integer units gets ahead, beyond branches) Today, helps cache misses as well Don’t stall for L1 Data cache miss (insufficient ILP for L2 miss?) Can support memory-level parallelism Lasting Contributions Dynamic scheduling Register renaming Load/store disambiguation (discuss later) 360/91 descendants are Pentium III; PowerPC 604; MIPS R10000; HP-PA 8000; Alpha 21264

15 Dynamic Scheduling: The Only Choice? Most high-performance processors today are dynamically scheduled superscalar processors With deeper and n-way issue pipeline Other alternatives to exploit instruction-level parallelism Statically scheduled superscalar VLIW Mixed effort: EPIC – Explicit Parallel Instruction Computing Example: Intel Itanium processors Why is dynamic scheduling so popular today? Technology trends: increasing transistor budget, deeper pipeline, wide issue