Instruction-Level Parallelism: Advanced HW Approaches. CSCE430/830 Computer Architecture. Lecturer: Prof. Hong Jiang. Fall, 2006.

Presentation transcript:

Dynamic Hardware Branch Prediction: control dependences rapidly become the limiting factor as the amount of ILP to be exploited increases, which is particularly true when multiple instructions are to be issued per cycle.
–Basic Branch Prediction and Branch-Prediction Buffers
  »A branch-prediction buffer is a small memory indexed by the lower portion of the address of the branch instruction, containing a bit that says whether the branch was recently taken or not. It is simple, and useful only when the branch delay is longer than the time to calculate the target address.
  »The prediction bit is inverted each time there is a wrong prediction. This causes an accuracy problem: a branch that is almost always taken is mispredicted twice, rather than once, when it is not taken. A remedy is the 2-bit predictor, a special case of the n-bit predictor (saturating counter), which performs well (accuracy: 82% to 99%).
[Figure: state-transition diagram of the 2-bit predictor, with "predict taken" and "predict not taken" states connected by "taken" and "not taken" transitions]
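The difference between the 1-bit and 2-bit schemes described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the class names, the 4-bit index, the branch address 0x40, and the "loop branch taken 9 times, then not taken" pattern are all hypothetical and do not come from the slides.

```python
class OneBitPredictor:
    """1-bit scheme: each entry remembers only the last outcome."""

    def __init__(self, index_bits=4):
        self.mask = (1 << index_bits) - 1
        self.bits = [False] * (1 << index_bits)  # False = predict not taken

    def predict(self, pc):
        return self.bits[pc & self.mask]

    def update(self, pc, taken):
        self.bits[pc & self.mask] = taken


class TwoBitPredictor:
    """2-bit saturating counter: 0,1 predict not taken; 2,3 predict taken."""

    def __init__(self, index_bits=4):
        self.mask = (1 << index_bits) - 1
        self.counters = [0] * (1 << index_bits)

    def predict(self, pc):
        return self.counters[pc & self.mask] >= 2

    def update(self, pc, taken):
        i = pc & self.mask
        if taken:
            self.counters[i] = min(3, self.counters[i] + 1)
        else:
            self.counters[i] = max(0, self.counters[i] - 1)


def count_mispredictions(predictor, pc, outcomes):
    """Predict, compare with the real outcome, then train the predictor."""
    misses = 0
    for taken in outcomes:
        if predictor.predict(pc) != taken:
            misses += 1
        predictor.update(pc, taken)
    return misses


# Hypothetical loop branch: taken 9 times, then falls through, 3 times over.
loop_branch = ([True] * 9 + [False]) * 3
print(count_mispredictions(OneBitPredictor(), 0x40, loop_branch))
print(count_mispredictions(TwoBitPredictor(), 0x40, loop_branch))
```

In this run the 1-bit predictor misses 6 times (twice per pass through the loop) and the 2-bit predictor misses 5 times (two warm-up misses, then once per pass). A real buffer would typically index with the address bits above the instruction-alignment bits; the sketch uses the low bits directly for brevity.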

Dynamic Hardware Branch Prediction:
–Correlating Branch Predictors
  »In the code below, the behavior of branch b3 is correlated with the behavior of branches b1 and b2 (if b1 and b2 are both not taken, b3 will be taken). A predictor that uses only the behavior of a single branch to predict the outcome of that branch can never capture this behavior.
  »Branch predictors that use the behavior of other branches to make a prediction are called correlating predictors or two-level predictors.

Source code:
    if (aa==2) aa=0;
    if (bb==2) bb=0;
    if (aa!=bb) {

MIPS code, with aa and bb assigned to registers R1 and R2:
        DSUBUI R3,R1,#2
        BNEZ   R3,L1      ;branch b1 (aa!=2)
        DADD   R1,R0,R0   ;aa=0
    L1: DSUBUI R3,R2,#2
        BNEZ   R3,L2      ;branch b2 (bb!=2)
        DADD   R2,R0,R0   ;bb=0
    L2: DSUBU  R3,R1,R2   ;R3=aa-bb
        BEQZ   R3,L3      ;branch b3 (aa==bb)

Dynamic Hardware Branch Prediction:

Source code:
    if (d==0) d=1;
    if (d==1)

MIPS code, with d assigned to register R1:
        BNEZ   R1,L1      ;branch b1 (d!=0)
        DADDIU R1,R0,#1   ;d==0, so d=1
    L1: DADDIU R3,R1,#-1
        BNEZ   R3,L2      ;branch b2 (d!=1)
        …
    L2:

Possible execution sequences:

| Initial value of d | d==0? | b1 | Value of d before b2 | d==1? | b2 |
|---|---|---|---|---|---|
| 0 | Yes | Not taken | 1 | Yes | Not taken |
| 1 | No | Taken | 1 | Yes | Not taken |
| 2 | No | Taken | 2 | No | Taken |

Behavior of a 1-bit Standard Predictor Initialized to Not Taken, with d alternating between 2 and 0 (100% wrong prediction):

| d=? | b1 prediction | b1 action | New b1 prediction | b2 prediction | b2 action | New b2 prediction |
|---|---|---|---|---|---|---|
| 2 | NT | T | T | NT | T | T |
| 0 | T | NT | NT | T | NT | NT |
| 2 | NT | T | T | NT | T | T |
| 0 | T | NT | NT | T | NT | NT |
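The 1-bit table above can be reproduced with a short simulation. This is a minimal sketch: the function names are hypothetical, and the branch outcomes are derived directly from the source fragment (b1 is taken when d != 0, b2 is taken when d != 1 after the first statement has run).

```python
def branch_outcomes(d):
    """Return (b1_taken, b2_taken) for one pass through the code fragment."""
    b1_taken = d != 0          # BNEZ R1,L1
    if d == 0:
        d = 1                  # DADDIU R1,R0,#1
    b2_taken = d != 1          # BNEZ R3,L2 with R3 = d - 1
    return b1_taken, b2_taken


def simulate_one_bit(d_values):
    """One prediction bit per branch, both initialized to not taken."""
    prediction = {"b1": False, "b2": False}
    rows = []
    for d in d_values:
        b1_taken, b2_taken = branch_outcomes(d)
        for name, taken in (("b1", b1_taken), ("b2", b2_taken)):
            rows.append((d, name, prediction[name], taken))
            prediction[name] = taken  # 1-bit scheme: remember last outcome
    return rows


for d, name, predicted, taken in simulate_one_bit([2, 0, 2, 0]):
    status = "ok" if predicted == taken else "MISPREDICT"
    print(d, name, "pred=T" if predicted else "pred=NT",
          "act=T" if taken else "act=NT", status)
```

Every one of the eight predictions is wrong, because each branch simply alternates and the single bit always holds the previous, opposite outcome.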

Dynamic Hardware Branch Prediction:
–Correlating Branch Predictors
  »The standard predictor mispredicted all branches!

Possible execution sequences:

| Initial value of d | d==0? | b1 | Value of d before b2 | d==1? | b2 |
|---|---|---|---|---|---|
| 0 | Yes | Not taken | 1 | Yes | Not taken |
| 1 | No | Taken | 1 | Yes | Not taken |
| 2 | No | Taken | 2 | No | Taken |

Meaning of the 2 prediction bits:

| The 2 prediction bits (p1/p2) | Prediction if last branch not taken (p1) | Prediction if last branch taken (p2) |
|---|---|---|
| NT/NT | NT | NT |
| NT/T | NT | T |
| T/NT | T | NT |
| T/T | T | T |

The Action of the 1-bit Predictor with 1-bit Correlation, Initialized to Not Taken/Not Taken:

| d=? | b1 prediction | b1 action | New b1 prediction | b2 prediction | b2 action | New b2 prediction |
|---|---|---|---|---|---|---|
| 2 | NT/NT | T | T/NT | NT/NT | T | NT/T |
| 0 | T/NT | NT | T/NT | NT/T | NT | NT/T |
| 2 | T/NT | T | T/NT | NT/T | T | NT/T |
| 0 | T/NT | NT | T/NT | NT/T | NT | NT/T |
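The same experiment can be repeated for the 1-bit predictor with 1 bit of correlation. The sketch below is illustrative only: the function names are hypothetical, and it assumes the global history starts as "last branch not taken", which is consistent with the first row of the table (the p1 bit of b1 is the one that gets updated).

```python
def branch_outcomes(d):
    """Return (b1_taken, b2_taken) for one pass through the code fragment."""
    b1_taken = d != 0
    if d == 0:
        d = 1
    return b1_taken, d != 1


def simulate_correlating(d_values):
    """Each branch keeps two bits [p1, p2]: p1 is used when the previously
    executed branch was not taken, p2 when it was taken."""
    bits = {"b1": [False, False], "b2": [False, False]}  # NT/NT
    last_taken = False        # assumed initial global history
    mispredicted = []
    for d in d_values:
        for name, taken in zip(("b1", "b2"), branch_outcomes(d)):
            slot = 1 if last_taken else 0
            mispredicted.append(bits[name][slot] != taken)
            bits[name][slot] = taken   # update only the bit that was used
            last_taken = taken         # 1-bit global history
    return mispredicted, bits


mispredicted, final_bits = simulate_correlating([2, 0, 2, 0])
print(mispredicted)
print(final_bits)
```

Only the two predictions of the first iteration (d=2) are wrong; after that b1 settles at T/NT and b2 at NT/T, matching the table.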

Dynamic Hardware Branch Prediction:
–Correlating Branch Predictors
  »With the 1-bit correlation predictor, also called a (1,1) predictor, the only misprediction is on the first iteration!
  »In the general case, an (m,n) predictor uses the behavior of the last m branches to choose from 2^m branch predictors, each of which is an n-bit predictor for a single branch.
  »The number of bits in an (m,n) predictor is: 2^m * n * (number of prediction entries selected by the branch address)
[Figure: a (2,2) predictor. The 4 lower bits of the branch address select a row of 2-bit per-branch predictors, and a 2-bit global branch history (shift register) selects one predictor within the row to supply the prediction.]
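A general (m,n) predictor and the bit-count formula can be sketched as follows. The class name, the table layout, and the choice to index with the low address bits are assumptions made for illustration; the slides only give the formula and the block diagram.

```python
class CorrelatingPredictor:
    """(m, n) predictor sketch: m bits of global history select one of
    2**m n-bit saturating counters in the row chosen by the branch address."""

    def __init__(self, m, n, index_bits):
        self.m = m
        self.n = n
        self.index_mask = (1 << index_bits) - 1
        self.history = 0                      # global shift register, m bits
        self.max_count = (1 << n) - 1
        self.table = [[0] * (1 << m) for _ in range(1 << index_bits)]

    def predict(self, pc):
        counter = self.table[pc & self.index_mask][self.history]
        return counter > self.max_count // 2  # upper half predicts taken

    def update(self, pc, taken):
        row = self.table[pc & self.index_mask]
        if taken:
            row[self.history] = min(self.max_count, row[self.history] + 1)
        else:
            row[self.history] = max(0, row[self.history] - 1)
        # Shift the outcome into the global history, keeping m bits.
        self.history = ((self.history << 1) | int(taken)) & ((1 << self.m) - 1)

    def total_bits(self):
        # 2^m * n * (number of entries selected by the branch address)
        return (1 << self.m) * self.n * len(self.table)


print(CorrelatingPredictor(2, 2, 10).total_bits())  # (2,2), 1K entries
print(CorrelatingPredictor(0, 2, 12).total_bits())  # (0,2), 4K entries
```

Both configurations use 8192 bits, so a (2,2) predictor with 1K entries costs the same storage as a plain 2-bit predictor with 4K entries. Setting m=0 degenerates to the ordinary n-bit predictor with no correlation.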

Dynamic Hardware Branch Prediction:
–Performance of Correlating Branch Predictors

Dynamic Hardware Branch Prediction:
–Tournament Predictors: Adaptively Combining Local and Global Predictors
  »Tournament predictors take the insight that adding global information to local predictors improves performance to the next level, by using multiple predictors (usually one based on global information and one based on local information) and combining them with a selector.
  »They give better accuracy at medium sizes (8K bits to 32K bits) and make more effective use of very large numbers of prediction bits: the right predictor for the right branch.
  »Existing tournament predictors use a 2-bit saturating counter per branch to choose between two different predictors. The counter is incremented whenever the "predicted" predictor is correct and the other predictor is incorrect, and it is decremented in the reverse situation.
[Figure: state-transition diagram of the 2-bit selector, with two "use predictor 1" states and two "use predictor 2" states. Transitions are labeled with pairs such as 0/1 and 1/0, giving the correctness of predictor 1 / predictor 2; pairs 0/0 and 1/1 leave the state unchanged.]
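The selector logic can be sketched as a 2-bit saturating counter. The encoding here is an assumption for illustration (counter values 0 and 1 select predictor 1, values 2 and 3 select predictor 2), and the class and method names are hypothetical.

```python
class TournamentSelector:
    """2-bit saturating counter that picks between two predictors."""

    def __init__(self):
        self.counter = 0  # assumed encoding: 0,1 -> predictor 1; 2,3 -> predictor 2

    def choose(self):
        return 1 if self.counter < 2 else 2

    def update(self, p1_correct, p2_correct):
        # Move only when exactly one predictor was right; when both are
        # right or both are wrong (1/1 or 0/0) the state is unchanged.
        if p1_correct and not p2_correct:
            self.counter = max(0, self.counter - 1)
        elif p2_correct and not p1_correct:
            self.counter = min(3, self.counter + 1)


def tournament_predict(selector, prediction_1, prediction_2):
    """Return the prediction of whichever predictor the selector trusts."""
    return prediction_1 if selector.choose() == 1 else prediction_2


selector = TournamentSelector()
selector.update(p1_correct=False, p2_correct=True)
selector.update(p1_correct=False, p2_correct=True)
print(selector.choose())
```

After two outcomes in which only predictor 2 was right, the selector switches to predictor 2. A single contrary outcome from the saturated state is not enough to switch back, which is the same hysteresis the 2-bit branch predictor provides.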

Dynamic Hardware Branch Prediction:
–Performance of Tournament Predictors
[Figures: fraction of predictions due to the local predictor; misprediction rate of 3 different predictors]

Dynamic Hardware Branch Prediction:
–The Alpha 21264 Branch Predictor:
  »4K 2-bit saturating counters, indexed by the local branch address, choose between:
     A Global Predictor that has
       –4K entries that are indexed by the history of the last 12 branches;
       –Each entry is a standard 2-bit predictor
     A Local Predictor that consists of a two-level predictor
       –At the top level is a local history table consisting of 1024 10-bit entries, with each entry corresponding to the most recent 10 branch outcomes for the entry;
       –At the bottom level is a table of 1K entries, indexed by the 10-bit entry of the top level, consisting of 3-bit saturating counters which provide the local prediction
  »It uses a total of 29K bits for branch prediction, resulting in very high accuracy: 1 misprediction in 1000 for SPECfp95 and 11.5 in 1000 for SPECint95
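The 29K-bit total can be checked with simple arithmetic. The sketch below assumes the local history table has 1024 entries of 10 bits each, which is consistent with the 10-bit index into the 1K-entry second-level table; the dictionary keys are descriptive names chosen for illustration.

```python
K = 1024

# Storage of each structure in bits (entries * bits per entry).
storage_bits = {
    "selector (4K 2-bit counters)": 4 * K * 2,
    "global predictor (4K 2-bit entries)": 4 * K * 2,
    "local history table (assumed 1024 x 10 bits)": 1 * K * 10,
    "local counters (1K 3-bit counters)": 1 * K * 3,
}

total = sum(storage_bits.values())
for name, bits in storage_bits.items():
    print(f"{name}: {bits // K}K bits")
print(f"total: {total // K}K bits")
```

The four structures contribute 8K, 8K, 10K, and 3K bits, which adds up to the 29K bits quoted on the slide.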

High-Performance Instruction Delivery:
–Branch-Target Buffers
  »A branch-target buffer is a branch-prediction cache that stores the predicted address for the next instruction after a branch:
     Predicting the next instruction address before decoding the current instruction!
     Accessing the target buffer during the IF stage using the instruction address of the fetched instruction (a possible branch) to index the buffer.
[Figure: branch-target buffer. The PC of the instruction to fetch is looked up against the stored instruction addresses; each entry holds a predicted PC and a "branch predicted taken or untaken" field. No match: the instruction is not predicted to be a branch; proceed normally. Match: the instruction is a taken branch and the predicted PC should be used as the next PC.]

Handling branch-target buffers:
–IF: Send PC to memory and branch-target buffer. Entry found in branch-target buffer?
  »No, ID: Is instruction a taken branch?
     No: normal instruction execution (0 cycle penalty)
     Yes, EX: enter branch instruction address and next PC into branch-target buffer (2 cycle penalty)
  »Yes, ID: Send out predicted PC. EX: Taken branch?
     No: mispredicted branch, kill fetched instruction; restart fetch at other target; delete entry from target buffer (2 cycle penalty)
     Yes: branch correctly predicted; continue execution with no stalls (0 cycle penalty)

Integrated Instruction Fetch Units: to meet the demands of multiple-issue processors, recent designs have used an integrated instruction fetch unit that integrates several functions:
–Integrated branch prediction: the branch predictor becomes part of the instruction fetch unit and is constantly predicting branches, so as to drive the fetch pipeline
–Instruction prefetch: to deliver multiple instructions per clock, the instruction fetch unit will likely need to fetch ahead, autonomously managing the prefetching of instructions and integrating it with branch prediction
–Instruction memory access and buffering: encapsulates the complexity of fetching multiple instructions per clock, trying to hide the cost of crossing cache blocks, and provides buffering, acting as an on-demand unit to provide instructions to the issue stage as needed and in the quantity needed
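The branch-target buffer decision steps and their penalties can be sketched as a small model. Everything here is a simplification for illustration: the class name, the unbounded dictionary standing in for the buffer, and the assumption that a stored target is always the correct one for a taken branch are not from the slides.

```python
class BranchTargetBuffer:
    """Toy model: maps the PC of a taken branch to its predicted next PC."""

    def __init__(self):
        self.entries = {}  # hypothetical unbounded buffer, no tags or eviction

    def lookup(self, pc):
        """IF stage: return the predicted PC, or None if no entry exists."""
        return self.entries.get(pc)

    def resolve(self, pc, is_taken_branch, target=None):
        """Apply the actual outcome; return the penalty in cycles."""
        if pc in self.entries:
            if is_taken_branch:
                return 0           # correctly predicted, no stalls
            del self.entries[pc]   # mispredicted: kill fetch, delete entry
            return 2
        if is_taken_branch:
            self.entries[pc] = target  # enter branch address and next PC
            return 2
        return 0                   # normal instruction execution


btb = BranchTargetBuffer()
penalties = [
    btb.resolve(0x40, True, target=0x80),  # first encounter, taken
    btb.resolve(0x40, True, target=0x80),  # hit, taken
    btb.resolve(0x40, False),              # hit, but falls through
    btb.resolve(0x10, False),              # not a taken branch, no entry
]
print(penalties)
```

The four cases produce penalties of 2, 0, 2, and 0 cycles, matching the four leaves of the decision steps above.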

Taking Advantage of More ILP with Multiple Issue
–Superscalar: issue varying numbers of instructions per cycle that are either statically scheduled (using compiler techniques, thus in-order execution) or dynamically scheduled (using techniques based on Tomasulo's algorithm, thus out-of-order execution);
–VLIW (very long instruction word): issue a fixed number of instructions formatted either as one large instruction or as a fixed instruction packet with the parallelism among instructions explicitly indicated by the instruction (hence, they are also known as EPIC, explicitly parallel instruction computers). VLIW and EPIC processors are inherently statically scheduled by the compiler.

| Common Name | Issue Structure | Hazard Detection | Scheduling | Distinguishing Characteristics | Examples |
|---|---|---|---|---|---|
| Superscalar (static) | Dynamic (IS packet <= 8) | Hardware | Static | In-order execution | Sun UltraSPARC II/III |
| Superscalar (dynamic) | Dynamic (split & piped) | Hardware | Dynamic | Some out-of-order execution | IBM Power2 |
| Superscalar (speculative) | Dynamic | Hardware | Dynamic with speculation | Out-of-order execution with speculation | Pentium III/4, MIPS R10K, Alpha 21264, HP PA 8500, IBM RS64III |
| VLIW/LIW | Static | Software | Static | No hazards between issue packets | Trimedia, i860 |
| EPIC | Mostly static | Mostly software | Mostly static | Explicit dependences marked by compiler | Itanium |

Taking Advantage of More ILP with Multiple Issue
–Multiple Instruction Issue with Dynamic Scheduling: dual issue with Tomasulo's algorithm

| Iteration No. | Instructions | Issues at | Executes | Mem Access | Write CDB | Comments |
|---|---|---|---|---|---|---|
| 1 | L.D F0,0(R1) | 1 | 2 | 3 | 4 | First issue |
| 1 | ADD.D F4,F0,F2 | 1 | 5 | | 8 | Wait for L.D |
| 1 | S.D F4,0(R1) | 2 | 3 | 9 | | Wait for ADD.D |
| 1 | DADDIU R1,R1,#-8 | 2 | 4 | | 5 | Wait for ALU |
| 1 | BNE R1,R2,Loop | 3 | 6 | | | Wait for DADDIU |
| 2 | L.D F0,0(R1) | 4 | 7 | 8 | 9 | Wait for BNE complete |
| 2 | ADD.D F4,F0,F2 | 4 | 10 | | 13 | Wait for L.D |
| 2 | S.D F4,0(R1) | 5 | 8 | 14 | | Wait for ADD.D |
| 2 | DADDIU R1,R1,#-8 | 5 | 9 | | 10 | Wait for ALU |
| 2 | BNE R1,R2,Loop | 6 | 11 | | | Wait for DADDIU |
| 3 | L.D F0,0(R1) | 7 | 12 | 13 | 14 | Wait for BNE complete |
| 3 | ADD.D F4,F0,F2 | 7 | 15 | | 18 | Wait for L.D |
| 3 | S.D F4,0(R1) | 8 | 13 | 19 | | Wait for ADD.D |
| 3 | DADDIU R1,R1,#-8 | 8 | 14 | | 15 | Wait for ALU |
| 3 | BNE R1,R2,Loop | 9 | 16 | | | Wait for DADDIU |

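Taking Advantage of More ILP with Multiple Issue: resource usage

Entries are written as iteration/instruction.

| Clock number | Integer ALU | FP ALU | Data cache | CDB | Comments |
|---|---|---|---|---|---|
| 2 | 1/L.D | | | | |
| 3 | 1/S.D | | 1/L.D | | |
| 4 | 1/DADDIU | | | 1/L.D | |
| 5 | | 1/ADD.D | | 1/DADDIU | |
| 6 | | | | | |
| 7 | 2/L.D | | | | |
| 8 | 2/S.D | | 2/L.D | 1/ADD.D | |
| 9 | 2/DADDIU | | 1/S.D | 2/L.D | |
| 10 | | 2/ADD.D | | 2/DADDIU | |
| 11 | | | | | |
| 12 | 3/L.D | | | | |
| 13 | 3/S.D | | 3/L.D | 2/ADD.D | |
| 14 | 3/DADDIU | | 2/S.D | 3/L.D | |
| 15 | | 3/ADD.D | | 3/DADDIU | |
| 16 | | | | | |
| 17 | | | | | |
| 18 | | | | 3/ADD.D | |
| 19 | | | 3/S.D | | |
| 20 | | | | | |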

Taking Advantage of More ILP with Multiple Issue
–Multiple Instruction Issue with Dynamic Scheduling: with an added adder and CDB

| Iteration No. | Instructions | Issues at | Executes | Mem Access | Write CDB | Comments |
|---|---|---|---|---|---|---|
| 1 | L.D F0,0(R1) | 1 | 2 | 3 | 4 | First issue |
| 1 | ADD.D F4,F0,F2 | 1 | 5 | | 8 | Wait for L.D |
| 1 | S.D F4,0(R1) | 2 | 3 | 9 | | Wait for ADD.D |
| 1 | DADDIU R1,R1,#-8 | 2 | 3 | | 4 | Executes earlier |
| 1 | BNE R1,R2,Loop | 3 | 5 | | | Wait for DADDIU |
| 2 | L.D F0,0(R1) | 4 | 6 | 7 | 8 | Wait for BNE complete |
| 2 | ADD.D F4,F0,F2 | 4 | 9 | | 12 | Wait for L.D |
| 2 | S.D F4,0(R1) | 5 | 7 | 13 | | Wait for ADD.D |
| 2 | DADDIU R1,R1,#-8 | 5 | 6 | | 7 | Executes earlier |
| 2 | BNE R1,R2,Loop | 6 | 8 | | | Wait for DADDIU |
| 3 | L.D F0,0(R1) | 7 | 9 | 10 | 11 | Wait for BNE complete |
| 3 | ADD.D F4,F0,F2 | 7 | 12 | | 15 | Wait for L.D |
| 3 | S.D F4,0(R1) | 8 | 10 | 16 | | Wait for ADD.D |
| 3 | DADDIU R1,R1,#-8 | 8 | 9 | | 10 | Executes earlier |
| 3 | BNE R1,R2,Loop | 9 | 11 | | | Wait for DADDIU |

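Taking Advantage of More ILP with Multiple Issue: more resources

Entries are written as iteration/instruction.

| Clock number | Integer ALU | Address adder | FP ALU | Data cache | CDB #1 | CDB #2 |
|---|---|---|---|---|---|---|
| 2 | | 1/L.D | | | | |
| 3 | 1/DADDIU | 1/S.D | | 1/L.D | | |
| 4 | | | | | 1/L.D | 1/DADDIU |
| 5 | | | 1/ADD.D | | | |
| 6 | 2/DADDIU | 2/L.D | | | | |
| 7 | | 2/S.D | | 2/L.D | 2/DADDIU | |
| 8 | | | | | 1/ADD.D | 2/L.D |
| 9 | 3/DADDIU | 3/L.D | 2/ADD.D | 1/S.D | | |
| 10 | | 3/S.D | | 3/L.D | 3/DADDIU | |
| 11 | | | | | 3/L.D | |
| 12 | | | 3/ADD.D | | 2/ADD.D | |
| 13 | | | | 2/S.D | | |
| 14 | | | | | | |
| 15 | | | | | 3/ADD.D | |
| 16 | | | | 3/S.D | | |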