CPSC614 Lec 5.1 Instruction Level Parallelism and Dynamic Execution #4: Based on lectures by Prof. David A. Patterson E. J. Kim.


CPSC614 Lec 5.2 Correlating Predictors
Two-level predictors
    if (d == 0)
        d = 1;
    if (d == 1)

CPSC614 Lec 5.3
Table columns: initial value of d | b1 | value of d before b2 | b2

CPSC614 Lec 5.4 1-bit Predictor (Initialized to NT)
Table columns: d | b1 prediction | b1 action | new b1 prediction | b2 prediction | b2 action | new b2 prediction

CPSC614 Lec 5.5 (1,1) Predictor
Every branch has two separate prediction bits.
–First bit: the prediction if the last branch in the program is not taken.
–Second bit: the prediction if the last branch in the program is taken.
Write the pair of prediction bits together.

CPSC614 Lec 5.6 Combinations & Meaning
Prediction bits   Prediction if last branch not taken   Prediction if last branch taken
NT/NT             NT                                    NT
NT/T              NT                                    T
T/NT              T                                     NT
T/T               T                                     T

CPSC614 Lec 5.7 (m,n) Predictor
Uses the outcomes of the last m branches to choose from 2^m branch predictors, each of which is an n-bit predictor.
Yields higher prediction rates than the 2-bit scheme.
Requires only a trivial amount of additional hardware.
The global history of the most recent m branches is recorded in an m-bit shift register.
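The (m,n) scheme described on this slide can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the slide author's design: the class name `CorrelatingPredictor`, the helper `mispredictions`, the word-aligned PC indexing, and the all-zero (not taken) initial state are all hypothetical choices made for the example.

```python
class CorrelatingPredictor:
    """Sketch of an (m, n) predictor: m global-history bits select one of
    2**m n-bit saturating counters within the entry chosen by the branch PC."""

    def __init__(self, m, n, entries):
        self.m, self.n, self.entries = m, n, entries
        self.history = 0                      # m-bit shift register, newest outcome in bit 0
        self.max_count = (1 << n) - 1         # saturation limit of an n-bit counter
        # counters[entry][history]; assumed to start at 0 (predict not taken)
        self.counters = [[0] * (1 << m) for _ in range(entries)]

    def _index(self, pc):
        # Assumption: word-aligned instructions, low PC bits pick the entry
        return (pc >> 2) % self.entries

    def predict(self, pc):
        counter = self.counters[self._index(pc)][self.history]
        return counter >= (1 << (self.n - 1))  # upper half of the range means "taken"

    def update(self, pc, taken):
        row = self.counters[self._index(pc)]
        if taken:
            row[self.history] = min(row[self.history] + 1, self.max_count)
        else:
            row[self.history] = max(row[self.history] - 1, 0)
        if self.m:
            # Shift the newest outcome into the global history register
            self.history = ((self.history << 1) | int(taken)) & ((1 << self.m) - 1)


def mispredictions(predictor, trace):
    """Count wrong predictions over a list of (pc, taken) pairs."""
    misses = 0
    for pc, taken in trace:
        if predictor.predict(pc) != taken:
            misses += 1
        predictor.update(pc, taken)
    return misses
```

On a made-up trace of one branch that alternates taken and not taken, `CorrelatingPredictor(1, 1, 16)` mispredicts only during warm-up, while a history-free `CorrelatingPredictor(0, 1, 16)` mispredicts every time. That gap is the correlation effect the slide refers to.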

CPSC614 Lec 5.8

CPSC614 Lec 5.9 (m,n) Predictor
Total number of bits = 2^m × n × (number of prediction entries selected by the branch address)
Examples
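The storage formula above is easy to check numerically. The function name `predictor_bits` and the table sizes used below are illustrative assumptions, not values taken from the slide.

```python
def predictor_bits(m, n, entries):
    """Total state bits of an (m, n) predictor with `entries` entries
    selected by the branch address: 2**m * n * entries."""
    return (2 ** m) * n * entries


# A plain 2-bit predictor is the (0, 2) case: 4096 entries cost 8192 bits.
plain_2bit = predictor_bits(0, 2, 4096)
# A (2, 2) predictor with 1024 entries uses the same 8192 bits.
correlating = predictor_bits(2, 2, 1024)
```

The two configurations show the trade-off: for a fixed bit budget, each extra history bit halves the number of entries that the branch address can select.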

CPSC614 Lec 5.10

CPSC614 Lec 5.11 Tournament Predictors Most popular multilevel branch predictors

CPSC614 Lec 5.12 Tournament Predictors
A tournament predictor uses multiple predictors (one based on global information, one based on local information) and combines them with a selector, so it can select the right predictor for the right branch.
Alpha 21264
–Uses the most sophisticated branch predictor as of 2001.
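A heavily simplified sketch of the selector idea follows. It is not the Alpha design: the names `TournamentPredictor`, `bump`, and `run`, the 2-bit counters everywhere, the table sizes, and the rule of training the selector only when the two predictors disagree are all assumptions made for illustration.

```python
def bump(counter, up):
    """Move a 2-bit saturating counter (0..3) one step up or down."""
    return min(3, counter + 1) if up else max(0, counter - 1)


class TournamentPredictor:
    def __init__(self, history_bits=2, entries=16):
        self.mask = (1 << history_bits) - 1
        self.entries = entries
        self.history = 0                              # global branch history
        self.global_ctr = [0] * (1 << history_bits)   # indexed by global history
        self.local_ctr = [0] * entries                # indexed by branch PC
        self.selector = [0] * entries                 # >= 2 means "trust global"

    def _index(self, pc):
        return (pc >> 2) % self.entries               # assumed word-aligned PCs

    def _components(self, pc):
        i = self._index(pc)
        return i, self.global_ctr[self.history] >= 2, self.local_ctr[i] >= 2

    def predict(self, pc):
        i, global_says, local_says = self._components(pc)
        return global_says if self.selector[i] >= 2 else local_says

    def update(self, pc, taken):
        i, global_says, local_says = self._components(pc)
        if global_says != local_says:
            # Train the selector toward whichever component was right
            self.selector[i] = bump(self.selector[i], global_says == taken)
        self.global_ctr[self.history] = bump(self.global_ctr[self.history], taken)
        self.local_ctr[i] = bump(self.local_ctr[i], taken)
        self.history = ((self.history << 1) | int(taken)) & self.mask


def run(predictor, trace):
    """Return the number of mispredictions over (pc, taken) pairs."""
    misses = 0
    for pc, taken in trace:
        misses += predictor.predict(pc) != taken
        predictor.update(pc, taken)
    return misses
```

With a made-up alternating branch, the local counter alone never locks on, the history-indexed counters do, and the selector for that branch drifts toward the global side after a few disagreements.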

CPSC614 Lec 5.13

CPSC614 Lec 5.14

CPSC614 Lec 5.15 Need Address at Same Time as Prediction
Branch Target Buffer (BTB): the address of the branch indexes the buffer to get the prediction AND the branch target address (if taken).
Each entry holds: Branch PC, Predicted PC, extra prediction state bits.
The PC of the instruction in FETCH is compared (=?) against the stored branch PCs:
–Yes: instruction is a branch; use the predicted PC as the next PC.
–No: branch not predicted; proceed normally (Next PC = PC+4).
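The lookup on this slide can be sketched with a dictionary standing in for the associative buffer. The class name `BranchTargetBuffer`, the `resolve` method, and the policy of dropping an entry when a branch turns out not taken are assumptions for illustration; the extra prediction state bits from the slide are omitted.

```python
class BranchTargetBuffer:
    """Sketch of a BTB: maps the PC of a predicted-taken branch to its target."""

    def __init__(self):
        self.table = {}   # branch PC -> predicted PC

    def next_pc(self, pc):
        # Hit: the instruction is a branch, so fetch from the predicted PC.
        # Miss: not predicted as a branch, so fall through to PC + 4.
        return self.table.get(pc, pc + 4)

    def resolve(self, pc, taken, target):
        # Called once the branch outcome is known (assumed update policy).
        if taken:
            self.table[pc] = target
        else:
            self.table.pop(pc, None)


btb = BranchTargetBuffer()
first_fetch = btb.next_pc(0x100)      # miss: no entry yet
btb.resolve(0x100, True, 0x80)        # branch was taken to 0x80
second_fetch = btb.next_pc(0x100)     # hit: predicted PC is used
```

Because the lookup uses only the fetch PC, the predicted address is available in the same cycle as the prediction, before the instruction is even decoded.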

CPSC614 Lec 5.16 Multiple-Issue Processors
Allow multiple instructions to issue in a clock cycle.
Ideal CPI < 1
2 flavors
–Superscalar
–VLIW (Very Long Instruction Word)

CPSC614 Lec 5.17 Superscalar Processors
Issue varying numbers of instructions per clock
–statically scheduled
»using compiler techniques
»in-order execution
–dynamically scheduled
»Tomasulo's algorithm
»out-of-order execution

CPSC614 Lec 5.18 Superscalar MIPS: 2 instructions, 1 FP & 1 anything
–Fetch 64 bits/clock cycle; Int on left, FP on right
–Can only issue 2nd instruction if 1st instruction issues
–More ports for FP registers to do FP load & FP op in a pair
Type              Pipe stages
Int. instruction  IF  ID  EX  MEM WB
FP instruction    IF  ID  EX  MEM WB
Int. instruction      IF  ID  EX  MEM WB
FP instruction        IF  ID  EX  MEM WB
Int. instruction          IF  ID  EX  MEM WB
FP instruction            IF  ID  EX  MEM WB
Figure 3.24 P.219

CPSC614 Lec 5.19 VLIW Processors
Issue a fixed number of instructions, formatted either as one large instruction or as a fixed instruction packet, with the parallelism among instructions explicitly indicated by the instruction (EPIC: Explicitly Parallel Instruction Computers).
Statically scheduled by the compiler.

CPSC614 Lec 5.20
Name | Issue structure | Hazard detection | Scheduling | Distinguishing characteristic | Examples
Superscalar (static) | dynamic | h/w | static | in-order execution | Sun UltraSPARC II/III
Superscalar (dynamic) | dynamic | h/w | dynamic | some out-of-order execution | IBM Power2
Superscalar (speculative) | dynamic | h/w | dynamic w/ speculation | out-of-order execution w/ speculation | Pentium III/4, MIPS R10K, Alpha 21264
VLIW/LIW | static | s/w | static | no hazards between issue packets | Trimedia, i860
EPIC | mostly static | mostly s/w | mostly static | explicit dependences marked by compiler | Itanium

CPSC614 Lec 5.21 Hardware-Based Speculation
As more instruction-level parallelism is exploited, maintaining control dependences becomes an increasing burden.
=> Speculate on the outcome of branches and execute the program as if the guesses were correct: hardware speculation.

CPSC614 Lec 5.22 Key Ideas of Hardware Speculation
Dynamic Branch Prediction
–Choose which instruction to execute.
Speculation
–Allow the execution of instructions before the control dependences are resolved (with the ability to undo the effects of an incorrectly speculated sequence).
Dynamic Scheduling
–Deal with the scheduling of different combinations of basic blocks.

CPSC614 Lec 5.23 Examples
PowerPC 603/604/G3/G4
MIPS R10000/12000
Intel Pentium II/III/4
Alpha 21264
AMD K5/K6/Athlon

CPSC614 Lec 5.24 What about Precise Interrupts?
Tomasulo had: in-order issue, out-of-order execution, and out-of-order completion.
Need to “fix” the out-of-order completion aspect so that we can find a precise breakpoint in the instruction stream.

CPSC614 Lec 5.25 Relationship between precise interrupts and speculation
Speculation is a form of guessing.
Important for branch prediction:
–Need to “take our best shot” at predicting branch direction.
If we speculate and are wrong, we need to back up and restart execution at the point where we predicted incorrectly:
–This is exactly the same as precise exceptions!
Technique for both precise interrupts/exceptions and speculation: in-order completion or commit

CPSC614 Lec 5.26 HW support for precise interrupts
Need HW buffer for results of uncommitted instructions: reorder buffer
–3 fields: instr, destination, value
–Use reorder buffer number instead of reservation station when execution completes
–Supplies operands between execution complete & commit
–(Reorder buffer can be operand source => more registers like RS)
–Instructions commit
–Once instruction commits, result is put into register
–As a result, easy to undo speculated instructions on mispredicted branches or exceptions
Figure labels: Reorder Buffer, FP Op Queue, FP Adder, Res Stations, FP Regs

CPSC614 Lec 5.27 Four Steps of Speculative Tomasulo Algorithm
1. Issue: get instruction from FP Op Queue
   If a reservation station and a reorder buffer slot are free, issue instr & send operands & reorder buffer no. for destination (this stage is sometimes called “dispatch”).
2. Execution: operate on operands (EX)
   When both operands are ready, execute; if not ready, watch the CDB for the result; when both are in the reservation station, execute; checks RAW (sometimes called “issue”).
3. Write result: finish execution (WB)
   Write on Common Data Bus to all awaiting FUs & reorder buffer; mark reservation station available.
4. Commit: update register with reorder result
   When instr. is at head of reorder buffer & result is present, update register with result (or store to memory) and remove instr from reorder buffer. A mispredicted branch flushes the reorder buffer (sometimes called “graduation”).
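The commit step above is what turns out-of-order completion into in-order register updates. The sketch below models only the reorder buffer side of the four steps; reservation stations, the CDB, and stores to memory are left out. The names `RobEntry`, `ReorderBuffer`, and the one-commit-per-call policy are hypothetical choices for illustration.

```python
from collections import deque
from dataclasses import dataclass
from typing import Optional


@dataclass
class RobEntry:
    instr: str                      # instruction name, for tracing only
    dest: Optional[str]             # destination register, None for branches
    value: Optional[float] = None
    ready: bool = False             # set when the result has been written
    mispredicted: bool = False      # set for a branch that guessed wrong


class ReorderBuffer:
    def __init__(self, size):
        self.size = size
        self.entries = deque()      # head of the deque is the oldest instruction
        self.regs = {}              # architectural registers, updated only at commit

    def issue(self, instr, dest=None):
        """Step 1: allocate a slot in program order, or stall (None) if full."""
        if len(self.entries) >= self.size:
            return None
        entry = RobEntry(instr, dest)
        self.entries.append(entry)
        return entry

    def write_result(self, entry, value=None, mispredicted=False):
        """Step 3: results may arrive in any order."""
        entry.value, entry.ready, entry.mispredicted = value, True, mispredicted

    def commit(self):
        """Step 4: retire at most one instruction, and only from the head."""
        if not self.entries or not self.entries[0].ready:
            return None
        head = self.entries.popleft()
        if head.mispredicted:
            self.entries.clear()    # flush everything issued after the branch
        elif head.dest is not None:
            self.regs[head.dest] = head.value
        return head.instr
```

A short usage example: if `ADDD` finishes before an older `DIVD`, `commit()` returns `None` until `DIVD` has written its result, so the register state never reflects a younger instruction ahead of an older one. A mispredicted branch at the head discards all younger entries without touching `regs`.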

CPSC614 Lec 5.28 What are the hardware complexities with reorder buffer (ROB)?
How do you find the latest version of a register?
–(As specified by Smith paper) need associative comparison network
–Could use future file or just use the register result status buffer to track which specific reorder buffer has received the value
Need as many ports on ROB as register file
Figure labels: Reorder Buffer, FP Op Queue, FP Adder, Res Stations, FP Regs, Comparison network
Reorder table fields: Dest Reg, Result, Exceptions?, Valid, Program Counter