Anshul Kumar, CSE IITD CSL718 : Superscalar Processors Speculative Execution 2nd Feb, 2006.


Slide 2: Handling Control Dependence
Simple pipeline
–Branch prediction reduces stalls due to control dependence
Wide issue processor
–Branch prediction alone is not sufficient
–Instructions on the predicted path need to be fetched and EXECUTED (speculative execution)

Slide 3: What is required for speculation?
Branch prediction to choose which instructions to execute
Execution of instructions before control dependences are resolved
Ability to undo the effects of an incorrectly speculated sequence
Preservation of correct behaviour under exceptions

Slide 4: Types of speculation
Hardware based speculation
–done with dynamic branch prediction and dynamic scheduling
–used in Superscalar processors
Compiler based speculation
–done with static branch prediction and static scheduling
–used in VLIW processors

Slide 5: Extending Tomasulo’s scheme for speculative execution
Introduce re-order buffer (ROB)
Add another stage: “commit”
Normal execution: Issue → Execute → Write result
Speculative execution: Issue → Execute → Write result → Commit

Slide 6: Extending Tomasulo’s scheme for speculative execution (contd.)
Write results into ROB in the “write result” stage
Write results into register file or memory in the “commit” stage
Dependent instructions can read operands from ROB
A speculative instruction commits only if the prediction is determined to be correct
Instructions may complete execution out-of-order, but they commit in-order
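
The last point of slide 6 (out-of-order completion, in-order commit) can be illustrated with a small sketch. The helper `commit_order` and the instruction names are hypothetical and not part of the slides; the sketch only models the order in which entries leave the ROB, not their results.

```python
from collections import deque

def commit_order(program_order, completion_order):
    """Return, for each completion event, the instructions that commit then.

    Hypothetical helper: instructions finish in completion_order, but they
    may leave the reorder buffer only from its head, i.e. in program order.
    """
    rob = deque(program_order)   # left end of the deque is the ROB head
    done = set()
    trace = []
    for inst in completion_order:
        done.add(inst)
        committed = []
        # Commit from the head for as long as the oldest entry has its result.
        while rob and rob[0] in done:
            committed.append(rob.popleft())
        trace.append(committed)
    return trace

# I3 finishes first but has to wait behind I1 and I2.
trace = commit_order(["I1", "I2", "I3"], ["I3", "I1", "I2"])
print(trace)   # [[], ['I1'], ['I2', 'I3']]
```

I3 completes first yet commits last, which is what lets a mispredicted branch ahead of it cancel its effects.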

Slide 7: Recall Tomasulo’s scheme ...

Slide 8: Issue
Get next instruction from instruction queue
Check if there is a matching RS which is empty
–no: structural hazard, instruction stalls
–yes: issue the instruction to that RS
For each operand, check if it is available in RF
–yes: put the operand in the RS
–no: keep track of FU that will produce it

Slide 9: Execute
If one or more operands are not available, wait and monitor CDB
When an operand becomes available, it is placed in RS
When all operands are available, start execution
A choice may need to be made if multiple instructions become ready at the same time

Slide 10: Write result
When result is available
–write it on CDB and
–from there into RF and relevant RSs
Mark RS as available

Slide 11: More formal description ...

Slide 12: RS and RF fields
RS: op, busy, Qj, Vj, Qk, Vk
RF: val, Qi

Slide 13: Issue
Get instruction from instruction queue
Wait until ∃ r such that RS[r].busy = no
if (RF[rs].Qi ≠ 0) {RS[r].Qj ← RF[rs].Qi}
else {RS[r].Vj ← RF[rs].val; RS[r].Qj ← 0}
similarly for rt
RS[r].op ← op; RS[r].busy ← yes; RF[rd].Qi ← r

Slide 14: Execute
Wait until RS[r].Qj = 0 and RS[r].Qk = 0
Compute result: operation is RS[r].op, operands are RS[r].Vj and RS[r].Vk

Slide 15: Write result
Wait until execution complete at r and CDB available
∀ x if (RF[x].Qi = r) {RF[x].val ← result; RF[x].Qi ← 0}
∀ x if (RS[x].Qj = r) {RS[x].Vj ← result; RS[x].Qj ← 0}
similarly for Qk / Vk
RS[r].busy ← no
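
The three steps of slides 13 to 15 can be sketched in Python as follows. This is a minimal sketch, not the original hardware description: the class names `Station` and `Register`, the opcode table `OPS`, and the register names are assumptions made for illustration, and station index 0 is reserved to mean "no producer", as the slides' tests against 0 imply.

```python
from dataclasses import dataclass
import operator

OPS = {"ADD": operator.add, "MUL": operator.mul}   # illustrative opcode table

@dataclass
class Station:            # one reservation station (RS)
    busy: bool = False
    op: str = ""
    Qj: int = 0           # 0: Vj is valid; otherwise index of the producing RS
    Vj: float = 0.0
    Qk: int = 0
    Vk: float = 0.0

@dataclass
class Register:           # one register file (RF) entry
    val: float = 0.0
    Qi: int = 0           # 0: val is current; otherwise index of the producing RS

def issue(stations, rf, op, rd, rs, rt):
    """Slide 13: claim a free station, or return None on a structural hazard."""
    r = next((i for i, s in stations.items() if not s.busy), None)
    if r is None:
        return None
    s = stations[r]
    if rf[rs].Qi != 0:
        s.Qj = rf[rs].Qi                  # operand pending: remember its producer
    else:
        s.Vj, s.Qj = rf[rs].val, 0
    if rf[rt].Qi != 0:
        s.Qk = rf[rt].Qi
    else:
        s.Vk, s.Qk = rf[rt].val, 0
    s.op, s.busy = op, True
    rf[rd].Qi = r                         # rd will be produced by station r
    return r

def execute(stations, r):
    """Slide 14: compute once both operands are present; None means keep waiting."""
    s = stations[r]
    if s.Qj != 0 or s.Qk != 0:
        return None
    return OPS[s.op](s.Vj, s.Vk)

def write_result(stations, rf, r, result):
    """Slide 15: broadcast on the CDB to registers and stations waiting on r."""
    for reg in rf.values():
        if reg.Qi == r:
            reg.val, reg.Qi = result, 0
    for s in stations.values():
        if s.Qj == r:
            s.Vj, s.Qj = result, 0
        if s.Qk == r:
            s.Vk, s.Qk = result, 0
    stations[r].busy = False

stations = {1: Station(), 2: Station()}
rf = {"F0": Register(2.0), "F2": Register(3.0), "F4": Register(), "F6": Register()}

mul = issue(stations, rf, "MUL", "F4", "F0", "F2")      # F4 <- F0 * F2
add = issue(stations, rf, "ADD", "F6", "F4", "F0")      # F6 <- F4 + F0 (RAW on F4)
stalled = issue(stations, rf, "ADD", "F2", "F0", "F0")  # no free station left
blocked = execute(stations, add)                        # F4 not produced yet
write_result(stations, rf, mul, execute(stations, mul))
write_result(stations, rf, add, execute(stations, add))
```

Here the register file is updated as soon as a result appears on the CDB, so there is no way to undo a result computed on a mispredicted path.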

Slide 16: Tomasulo’s scheme plus ROB ...

Slide 17: Issue
Get next instruction from instruction queue
Check if there is a matching RS which is empty and an empty slot in ROB
–no: structural hazard, instruction stalls
–yes: issue the instruction to that RS and mark the ROB slot, also put ROB slot number in RS
For each operand, check if it is available in RF or ROB
–yes: put the operand in the RS
–no: keep track of FU that will produce it

Slide 18: Execute (no change)
If one or more operands are not available, wait and monitor CDB
When an operand becomes available, it is placed in RS
When all operands are available, start execution
A choice may need to be made if multiple instructions become ready at the same time

Slide 19: Write result
When result is available
–write it on CDB with ROB tag and
–from there into ROB (instead of RF) and relevant RSs
Mark RS as available

Slide 20: Commit (non-branch instruction)
Wait until instruction reaches head of ROB
Update RF
Remove instruction from ROB

Slide 21: Commit (branch instruction)
Wait until instruction reaches head of ROB
If branch is mispredicted,
–flush ROB
–restart execution at correct successor of the branch instruction
else
–remove instruction from ROB

Slide 22: More formal description ...

Slide 23: RS fields
op, busy, Qi, Qj, Vj, Qk, Vk

Slide 24: RF fields
val, Qi, busy

Slide 25: ROB fields
inst, busy, rdy, val, dst
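
One possible way to model the ROB fields of slide 25 is a fixed-size circular queue. This is a sketch under assumptions: the class `ReorderBuffer`, its `head`/`tail` pointers and its method names are illustrative and do not appear in the slides (slide 26 only mentions the "ROB tail"). Slots are numbered from 1 so that 0 can keep meaning "no tag".

```python
from dataclasses import dataclass

@dataclass
class ROBEntry:
    busy: bool = False   # slot is allocated
    rdy: bool = False    # result has been written into val
    inst: str = ""       # operation
    dst: str = ""        # destination register name
    val: float = 0.0     # result, meaningful only when rdy is set

class ReorderBuffer:
    """Fixed-size circular queue with slots numbered 1..size."""

    def __init__(self, size):
        self.size = size
        self.entries = {i: ROBEntry() for i in range(1, size + 1)}
        self.head = 1    # oldest allocated slot (next to commit)
        self.tail = 1    # next slot to allocate

    def _next(self, i):
        return i % self.size + 1          # wrap from size back to 1

    def allocate(self, inst, dst):
        """Claim the tail slot; return its tag, or None when the ROB is full."""
        b = self.tail
        if self.entries[b].busy:
            return None
        self.entries[b] = ROBEntry(busy=True, inst=inst, dst=dst)
        self.tail = self._next(b)
        return b

    def release_head(self):
        """Free the head slot (done at commit) and return the freed entry."""
        entry = self.entries[self.head]
        self.entries[self.head] = ROBEntry()
        self.head = self._next(self.head)
        return entry

rob = ReorderBuffer(2)
t1 = rob.allocate("MUL", "F4")
t2 = rob.allocate("ADD", "F6")
t3 = rob.allocate("SUB", "F8")   # ROB full: issue would stall
first = rob.release_head()       # oldest entry leaves first
t4 = rob.allocate("SUB", "F8")   # the freed slot 1 is reused
```

Allocating at the tail and releasing at the head is what keeps the entries in program order.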

Slide 26: Issue
Get instruction from instruction queue
Wait until ∃ r such that RS[r].busy = no and ROB[b].busy = no, where b = ROB tail
if (RF[rs].busy) {h ← RF[rs].Qi;
  if (ROB[h].rdy) {RS[r].Vj ← ROB[h].val; RS[r].Qj ← 0}
  else {RS[r].Qj ← h} }
else {RS[r].Vj ← RF[rs].val; RS[r].Qj ← 0}
similarly for rt
RS[r].op ← op; RS[r].busy ← yes; RS[r].Qi ← b
RF[rd].Qi ← b; RF[rd].busy ← yes; ROB[b].busy ← yes
ROB[b].inst ← op; ROB[b].dst ← rd; ROB[b].rdy ← no
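
The issue step of slide 26 can be sketched as below, using plain dictionaries whose keys follow the field names of slides 23 to 25. The function names `read_operand` and `issue`, the helper constructors, and the explicit `tail` argument are assumptions made for illustration; advancing the tail pointer is left to the caller.

```python
def read_operand(rf, rob, src):
    """Return (Q, V) for one source register.

    Q == 0 means V holds the value; otherwise Q is the ROB tag to wait for.
    """
    reg = rf[src]
    if reg["busy"]:
        h = reg["Qi"]                  # ROB entry that will produce src
        if rob[h]["rdy"]:
            return 0, rob[h]["val"]    # result already sits in the ROB
        return h, None                 # still executing: wait on tag h
    return 0, reg["val"]               # committed value in the register file

def issue(stations, rf, rob, tail, op, rd, rs, rt):
    """Issue one instruction; return the station used, or None on a stall."""
    free = [r for r, s in stations.items() if not s["busy"]]
    if not free or rob[tail]["busy"]:
        return None                    # structural hazard: no RS or no ROB slot
    r, b = free[0], tail
    # Sources are read before rd is renamed, so rd == rs works correctly.
    Qj, Vj = read_operand(rf, rob, rs)
    Qk, Vk = read_operand(rf, rob, rt)
    stations[r] = dict(busy=True, op=op, Qi=b, Qj=Qj, Vj=Vj, Qk=Qk, Vk=Vk)
    rf[rd].update(Qi=b, busy=True)
    rob[b] = dict(busy=True, rdy=False, inst=op, dst=rd, val=None)
    return r

def empty_rs():
    return dict(busy=False, op=None, Qi=0, Qj=0, Vj=None, Qk=0, Vk=None)

def empty_rob():
    return dict(busy=False, rdy=False, inst=None, dst=None, val=None)

stations = {1: empty_rs(), 2: empty_rs()}
rob = {1: empty_rob(), 2: empty_rob()}
rf = {name: dict(val=v, Qi=0, busy=False)
      for name, v in [("F0", 2.0), ("F2", 3.0), ("F4", 0.0), ("F6", 0.0)]}

r1 = issue(stations, rf, rob, 1, "MUL", "F4", "F0", "F2")    # operands from RF
r2 = issue(stations, rf, rob, 2, "ADD", "F6", "F4", "F0")    # F4 pending in ROB slot 1
stall = issue(stations, rf, rob, 1, "ADD", "F0", "F2", "F2") # nothing free
```

Note that the tags stored in Qj, Qk and RF[].Qi are now ROB slot numbers, not reservation station numbers.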

Slide 27: Execute (no change)
Wait until RS[r].Qj = 0 and RS[r].Qk = 0
Compute result: operation is RS[r].op, operands are RS[r].Vj and RS[r].Vk

Slide 28: Write result
Wait until execution complete at r and CDB available
b ← RS[r].Qi; RS[r].busy ← no
∀ x if (RS[x].Qj = b) {RS[x].Vj ← result; RS[x].Qj ← 0}
similarly for Qk / Vk
ROB[b].rdy ← yes; ROB[b].val ← result
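
A minimal sketch of the write result step with a ROB follows. It assumes, in line with slide 6, that the result goes only to the ROB and to waiting reservation stations, and that the register file is left untouched until commit. The dictionary layout and the function name `write_result` are illustrative.

```python
def write_result(stations, rob, r, result):
    """Broadcast the result of station r on the CDB under its ROB tag."""
    b = stations[r]["Qi"]              # ROB tag assigned at issue
    stations[r]["busy"] = False        # the station can be reused
    for s in stations.values():        # wake up consumers waiting on tag b
        if s["Qj"] == b:
            s["Vj"], s["Qj"] = result, 0
        if s["Qk"] == b:
            s["Vk"], s["Qk"] = result, 0
    rob[b]["val"], rob[b]["rdy"] = result, True   # the RF is not written here

# Station 1 (ROB tag 1) has finished a MUL; station 2 waits on tag 1 for Vj.
stations = {
    1: dict(busy=True, op="MUL", Qi=1, Qj=0, Vj=2.0, Qk=0, Vk=3.0),
    2: dict(busy=True, op="ADD", Qi=2, Qj=1, Vj=None, Qk=0, Vk=2.0),
}
rob = {
    1: dict(busy=True, rdy=False, inst="MUL", dst="F4", val=None),
    2: dict(busy=True, rdy=False, inst="ADD", dst="F6", val=None),
}
rf = {"F4": dict(val=0.0, Qi=1, busy=True)}

write_result(stations, rob, 1, 6.0)
```

After the call, station 2 holds both operands and can start, while `rf["F4"]` still holds its old value and stays marked busy until the MUL commits.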

Slide 29: Commit (non-branch instruction)
Wait until instruction reaches head of ROB (entry h) and ROB[h].rdy = yes
d ← ROB[h].dst
RF[d].val ← ROB[h].val
ROB[h].busy ← no
if (RF[d].Qi = h) {RF[d].busy ← no}
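
The commit step of slide 29 can be sketched as follows. The function name `commit`, the returned head index, and the 1-based circular slot numbering are assumptions for illustration. The example deliberately has two in-flight writers of the same register to show why the test on RF[d].Qi is needed.

```python
def commit(rob, rf, head):
    """Try to commit the entry at the ROB head; return the new head index."""
    e = rob[head]
    if not (e["busy"] and e["rdy"]):
        return head                    # head not finished yet: keep waiting
    d = e["dst"]
    rf[d]["val"] = e["val"]            # architectural state changes only here
    e["busy"] = False
    if rf[d]["Qi"] == head:            # no younger instruction has renamed d
        rf[d]["busy"] = False
    return head % len(rob) + 1         # slots are numbered 1..len(rob)

# Two in-flight writers of F4: slot 1 (older, ready) and slot 2 (younger, pending).
rob = {
    1: dict(busy=True, rdy=True, inst="MUL", dst="F4", val=6.0),
    2: dict(busy=True, rdy=False, inst="ADD", dst="F4", val=None),
}
rf = {"F4": dict(val=0.0, Qi=2, busy=True)}   # F4 is mapped to the younger writer

head = commit(rob, rf, 1)             # commits slot 1, F4 stays busy
after_first = dict(rf["F4"])          # snapshot for inspection
head = commit(rob, rf, head)          # slot 2 not ready: head does not move
waiting_head = head
rob[2].update(rdy=True, val=9.0)      # slot 2 completes
head = commit(rob, rf, head)          # commits slot 2, F4 becomes free
```

The first commit writes 6.0 into F4 but leaves it busy, because consumers issued later must still take F4 from ROB slot 2.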

Slide 30: Commit (branch instruction)
Wait until instruction reaches head of ROB (entry h) and ROB[h].rdy = yes
If branch is mispredicted,
–clear ROB, RF[ ].Qi
–fetch branch dest
else
–ROB[h].busy ← no
–if (RF[d].Qi = h) {RF[d].busy ← no}
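
The branch commit of slide 30 can be sketched as below. Several details are assumptions not stated in the slides: the function name `commit_branch`, the `target` argument standing for the correct successor address, and the clearing of RF[ ].busy alongside RF[ ].Qi (slide 26 tests RF[ ].busy, so a stale busy flag would send later instructions to a flushed ROB entry). Resetting the reservation stations and the ROB head and tail pointers would presumably also be needed and is omitted here.

```python
def commit_branch(rob, rf, h, mispredicted, target):
    """Commit the branch in ROB slot h; return the address to refetch from, or None."""
    if mispredicted:
        for e in rob.values():         # every other entry is younger than the branch
            e.update(busy=False, rdy=False)
        for reg in rf.values():        # no register waits on the ROB any more
            reg.update(Qi=0, busy=False)
        return target                  # fetch restarts at the correct successor
    rob[h]["busy"] = False             # correct prediction: just retire the branch
    return None

# Slot 1 holds a branch; slot 2 holds an ADD executed speculatively after it.
rob = {
    1: dict(busy=True, rdy=True, inst="BEQ", dst=None, val=None),
    2: dict(busy=True, rdy=True, inst="ADD", dst="F6", val=8.0),
}
rf = {"F6": dict(val=1.0, Qi=2, busy=True)}

redirect = commit_branch(rob, rf, 1, mispredicted=True, target=0x40)
```

The speculative result 8.0 never reaches F6: it existed only in the ROB, so clearing the ROB is enough to undo it.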