EENG449b/Savvides Lec 10.1 2/17/04 February 17, 2004 Prof. Andreas Savvides Spring 2004 EENG 449bG/CPSC 439bG.


EENG 449bG/CPSC 439bG Computer Systems
Lecture 11: Instruction Level Parallelism II
Prof. Andreas Savvides, Spring 2004, February 17, 2004

Announcements
 – Midterm next Thursday, 02/19/04
 – TA extra office hour
    » Sobeeh will have an extra office hour tomorrow
    » Office hours 5:00–7:00pm, AKW 201
 – Reading for this lecture: Chapter 3 pages

Dynamic Hardware Prediction
Last time: Tomasulo's algorithm for ILP
 – Dynamic scheduling
 – Register renaming
 – Dynamic memory disambiguation
    » Avoids conflicts between load and store instructions
 – Tomasulo's algorithm deals with data dependences
Today: dynamic branch prediction
 – Deals with control dependences
 – Control dependences become the limiting factor in ILP optimizations
    » Remember from last lecture: basic block sizes are only 4–7 instructions…

Predicting Branches
In Appendix A: static techniques
 – Delay slot execution
 – The action taken does not depend on the dynamic behavior of a branch
Dynamic branch prediction
 – Try to predict the outcome of a branch early in order to avoid stalls
 – Branch prediction is critical for multiple-issue processors
    » In an n-issue processor, branches arrive up to n times as often per clock cycle as in a single-issue processor

Branch Prediction Metrics
To evaluate the effectiveness of branch prediction, you need to consider
 – Prediction accuracy
 – Penalties associated with branches taken and not taken
 – These penalties depend on
    » Pipeline design
    » Type of predictor
    » Branch frequency
    » Strategy for dealing with a misprediction

Basic Branch Predictor
Use a 1-bit branch prediction buffer, or branch history table
 – 1 bit of memory stating whether the branch was recently taken or not
 – Indexed by the lower portion of the branch instruction's address
 – The bit is updated each time the branch instruction is executed
Problem with 1-bit prediction
 – A loop branch is mispredicted twice every time the loop runs, even if it is taken many times in a row
 – Imagine executing a loop
    » The predictor will be wrong on the first and last iterations
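To make the loop problem concrete, here is a minimal Python sketch of a 1-bit branch history table. The table size (2^10 entries), the 4-byte instruction size used to drop the low address bits, and the loop branch address `LOOP_BRANCH_PC` are all hypothetical choices for illustration.

```python
from typing import List


class OneBitPredictor:
    """1-bit branch history table indexed by low-order branch address bits.
    Table size is a hypothetical choice for illustration."""

    def __init__(self, index_bits: int = 10) -> None:
        self.mask = (1 << index_bits) - 1
        self.table = [False] * (1 << index_bits)  # False = predict not taken

    def _index(self, pc: int) -> int:
        # Drop the byte offset of (assumed) 4-byte instructions, keep low bits.
        return (pc >> 2) & self.mask

    def predict(self, pc: int) -> bool:
        return self.table[self._index(pc)]

    def update(self, pc: int, taken: bool) -> None:
        # The single bit simply records the most recent outcome.
        self.table[self._index(pc)] = taken


def count_mispredictions(predictor: OneBitPredictor, pc: int,
                         outcomes: List[bool]) -> int:
    """Predict, compare with the actual outcome, then update, for each execution."""
    misses = 0
    for taken in outcomes:
        if predictor.predict(pc) != taken:
            misses += 1
        predictor.update(pc, taken)
    return misses


LOOP_BRANCH_PC = 0x400100          # hypothetical address of the loop branch
loop_run = [True] * 9 + [False]    # taken 9 times, then falls out of the loop

p = OneBitPredictor()
print(count_mispredictions(p, LOOP_BRANCH_PC, loop_run))  # 2
print(count_mispredictions(p, LOOP_BRANCH_PC, loop_run))  # 2 (first and last iterations again)
```

Even in steady state the single bit flips on the loop exit, so the next run's first iteration is mispredicted as well.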

A 2-bit Prediction Scheme
2-bit prediction scheme
 – Generalizes to n-bit prediction
A prediction must miss twice before it is changed
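A common way to realize the 2-bit scheme, and its n-bit generalization, is a saturating counter that predicts taken when its value is in the upper half of its range. The Python sketch below assumes that encoding and an initial "strongly not taken" value of 0; both are illustrative choices.

```python
from typing import List


class SaturatingCounter:
    """n-bit saturating counter: predict taken when value >= 2**(n-1)."""

    def __init__(self, n_bits: int = 2, initial: int = 0) -> None:
        self.max = (1 << n_bits) - 1
        self.threshold = 1 << (n_bits - 1)
        self.value = initial

    def predict(self) -> bool:
        return self.value >= self.threshold

    def update(self, taken: bool) -> None:
        # Count up on taken, down on not taken, saturating at both ends.
        if taken:
            self.value = min(self.value + 1, self.max)
        else:
            self.value = max(self.value - 1, 0)


def count_mispredictions(counter: SaturatingCounter, outcomes: List[bool]) -> int:
    misses = 0
    for taken in outcomes:
        if counter.predict() != taken:
            misses += 1
        counter.update(taken)
    return misses


loop_run = [True] * 9 + [False]    # loop branch: taken 9 times, then exits
c = SaturatingCounter(n_bits=2)
print(count_mispredictions(c, loop_run))  # 3 (cold start)
print(count_mispredictions(c, loop_run))  # 1 (steady state: only the loop exit)
```

Starting from "strongly taken" (value 3), one not-taken outcome moves the counter to 2, which still predicts taken; only a second miss flips the prediction.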

Branch Prediction Implementation Implications
Branch predictors are held in branch prediction buffers
 – Implemented as small caches accessed with the instruction address during the IF stage of the pipeline
 – Or implemented as a pair of bits attached to each block in the instruction cache
This branch prediction scheme does not help in the basic 5-stage pipeline
 – There, whether the branch is taken and its target address are computed in the same stage, so knowing the predicted direction earlier saves no cycles…

Branch Prediction Accuracy on SPEC89 Benchmarks
Using 2-bit prediction, 4KB cache
[Figure: prediction accuracy for FP programs and integer programs]

Performance of SPEC89 Benchmarks
Remember
 – To evaluate performance you need to know the branch frequencies and misprediction penalties
FP programs typically come from scientific applications and are more loop-based
Branches are harder to predict in integer programs
 – Integer programs also typically have a higher branch frequency
How can this be improved?
 – Perhaps increase the size of the prediction buffer
 – Increase the effectiveness of the predictor
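The reminder above can be written as a simple formula: branch stall cycles per instruction = branch frequency × misprediction rate × misprediction penalty. The Python sketch below applies it to two hypothetical workloads; the numbers are illustrative only and are not the SPEC89 measurements from the figure.

```python
def branch_stall_cycles_per_instruction(branch_freq: float,
                                        mispredict_rate: float,
                                        penalty_cycles: float) -> float:
    """Average stall cycles per instruction caused by mispredicted branches."""
    return branch_freq * mispredict_rate * penalty_cycles


def effective_cpi(base_cpi: float, branch_freq: float,
                  mispredict_rate: float, penalty_cycles: float) -> float:
    """Base CPI plus the branch-misprediction stall component."""
    return base_cpi + branch_stall_cycles_per_instruction(
        branch_freq, mispredict_rate, penalty_cycles)


# Illustrative numbers only: the "integer-like" workload has more branches
# and a higher misprediction rate than the "FP-like" one.
fp_like = effective_cpi(1.0, branch_freq=0.10, mispredict_rate=0.05, penalty_cycles=3)
int_like = effective_cpi(1.0, branch_freq=0.20, mispredict_rate=0.10, penalty_cycles=3)
print(f"{fp_like:.3f} {int_like:.3f}")  # 1.015 1.060
```

Doubling both the branch frequency and the misprediction rate quadruples the stall component, which is why integer programs suffer more.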

Effects of Prediction Buffer Size

Correlating Branch Predictors
What about considering the behavior of branches other than the one we are trying to predict?
Goal: use correlating, or 2-level, predictors to exploit the correlation between consecutive branches…

Branch Correlation Example
C code:
    if (aa==2)
        aa=0;
    if (bb==2)
        bb=0;
    if (aa!=bb) {
MIPS code (aa in R1, bb in R2):
          DSUBUI R3,R1,#2
          BNEZ   R3,L1      ; branch b1 (aa!=2)
          DADD   R1,R0,R0   ; aa=0
    L1:   DSUBUI R3,R2,#2
          BNEZ   R3,L2      ; branch b2 (bb!=2)
          DADD   R2,R0,R0   ; bb=0
    L2:   DSUBU  R3,R1,R2   ; R3=aa-bb
          BEQZ   R3,L3      ; branch b3 (aa==bb)
Branch b3 is correlated with b1 and b2: if b1 and b2 are both not taken, aa and bb are both set to 0, so b3 will be taken.
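The correlation can be checked by brute force. The Python sketch below models the outcome of each branch in the MIPS sequence (assuming aa is in R1 and bb in R2, as the code implies) and confirms over a small, arbitrarily chosen range of values that b3 is always taken when b1 and b2 both fall through.

```python
from typing import Tuple


def branch_outcomes(aa: int, bb: int) -> Tuple[bool, bool, bool]:
    """Return (b1, b2, b3) taken flags for the MIPS sequence above."""
    b1 = aa != 2          # BNEZ R3,L1 with R3 = aa - 2
    if not b1:
        aa = 0            # DADD R1,R0,R0
    b2 = bb != 2          # BNEZ R3,L2 with R3 = bb - 2
    if not b2:
        bb = 0            # DADD R2,R0,R0
    b3 = aa == bb         # BEQZ R3,L3 with R3 = aa - bb
    return b1, b2, b3


# Whenever b1 and b2 are both not taken, b3 must be taken.
for aa in range(-3, 6):
    for bb in range(-3, 6):
        b1, b2, b3 = branch_outcomes(aa, bb)
        if not b1 and not b2:
            assert b3
```

A predictor that only sees b3's own history cannot exploit this; one that also sees the outcomes of b1 and b2 can.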

Correlated Branch Example
Consider the following code (d in R1):
    if (d==0)
        d=1;
    if (d==1)
          BNEZ   R1,L1      ; branch b1 (d!=0)
          DADDUI R1,R0,#1   ; d==0, so d=1
    L1:   DADDUI R3,R1,#-1
          BNEZ   R3,L2      ; branch b2 (d!=1)
          …
    L2:
What are the possible execution sequences when d=0, 1, 2?
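One way to answer the question is to model each branch in Python, assuming d is held in R1 as the code implies. The sketch prints the outcome of b1 and b2 for each starting value of d.

```python
from typing import Tuple


def outcomes(d: int) -> Tuple[bool, bool]:
    """(b1_taken, b2_taken) for the code above."""
    b1 = d != 0           # BNEZ R1,L1
    if not b1:
        d = 1             # DADDUI R1,R0,#1
    b2 = d != 1           # DADDUI R3,R1,#-1 ; BNEZ R3,L2
    return b1, b2


for d in (0, 1, 2):
    b1, b2 = outcomes(d)
    print(d, "T" if b1 else "NT", "T" if b2 else "NT")
# 0 NT NT
# 1 T NT
# 2 T T
```

Whenever b1 is not taken, b2 is also not taken: that is the correlation the next slides try to exploit.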

Using a 1-bit Predictor
Consider a sequence of d=2,0,2,0 and a 1-bit predictor (P. = prediction, A. = actual outcome, NP. = new prediction):

    d | P. b1 | A. b1 | NP. b1 | P. b2 | A. b2 | NP. b2
    2 | NT    | T     | T      | NT    | T     | T
    0 | T     | NT    | NT     | T     | NT    | NT
    2 | NT    | T     | T      | NT    | T     | T
    0 | T     | NT    | NT     | T     | NT    | NT

All branches are mispredicted!!!
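The table can be reproduced with a short Python simulation. It assumes, as the table does, that b1 and b2 each get their own 1-bit entry and that both start out predicting not taken.

```python
from typing import List, Tuple


def outcomes(d: int) -> Tuple[bool, bool]:
    """(b1_taken, b2_taken) for the d==0 / d==1 code."""
    b1 = d != 0
    if not b1:
        d = 1
    b2 = d != 1
    return b1, b2


def simulate_one_bit(sequence: List[int]) -> int:
    pred = {"b1": False, "b2": False}   # initial predictions: not taken
    misses = 0
    for d in sequence:
        b1, b2 = outcomes(d)
        for name, taken in (("b1", b1), ("b2", b2)):
            if pred[name] != taken:
                misses += 1
            pred[name] = taken          # 1-bit: remember the last outcome
    return misses


print(simulate_one_bit([2, 0, 2, 0]))  # 8: every one of the 8 branch executions misses
```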

Using a 1-bit Predictor with 1-bit Correlation
Notation X/Y:
 – X: prediction used if the last branch was NOT taken
 – Y: prediction used if the last branch was taken
NOTE: "last branch" refers to the preceding branch instruction, not the previous execution of the current branch instruction

Using a 1-bit Predictor with 1-bit Correlation
Consider a sequence of d=2,0,2,0 and a 1-bit predictor with 1-bit correlation (P. = prediction, A. = actual outcome, NP. = new prediction):

    d | P. b1 | A. b1 | NP. b1 | P. b2 | A. b2 | NP. b2
    2 | NT/NT | T     | T/NT   | NT/NT | T     | NT/T
    0 | T/NT  | NT    | T/NT   | NT/T  | NT    | NT/T
    2 | T/NT  | T     | T/NT   | NT/T  | T     | NT/T
    0 | T/NT  | NT    | T/NT   | NT/T  | NT    | NT/T

Mispredictions occur only on the first iteration (d=2)!
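The same simulation with one bit of correlation gives each branch two 1-bit predictions, selected by the outcome of the preceding branch. The Python sketch assumes the branch executed before the first b1 was not taken and that all predictions start as NT, matching the NT/NT starting state in the table.

```python
from typing import Dict, List, Tuple


def outcomes(d: int) -> Tuple[bool, bool]:
    """(b1_taken, b2_taken) for the d==0 / d==1 code."""
    b1 = d != 0
    if not b1:
        d = 1
    b2 = d != 1
    return b1, b2


def simulate_correlating(sequence: List[int]) -> List[Tuple[int, int, str]]:
    # pred[name][0]: prediction if the last branch was NOT taken
    # pred[name][1]: prediction if the last branch was taken
    pred: Dict[str, List[bool]] = {"b1": [False, False], "b2": [False, False]}
    last_taken = False      # assumed outcome of the branch before the first b1
    misses = []
    for i, d in enumerate(sequence):
        b1, b2 = outcomes(d)
        for name, taken in (("b1", b1), ("b2", b2)):
            slot = int(last_taken)
            if pred[name][slot] != taken:
                misses.append((i, d, name))
            pred[name][slot] = taken
            last_taken = taken   # the "last branch" is the preceding branch instruction
    return misses


print(simulate_correlating([2, 0, 2, 0]))  # [(0, 2, 'b1'), (0, 2, 'b2')]
```

Both misses fall in iteration 0, the first d=2, exactly as in the table.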

(m,n) Predictors
Use the behavior of the last m branches to choose from 2^m branch predictors
 – Each is an n-bit predictor for a single branch
Ex.: a (2,2) branch predictor
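A general (m,n) predictor can be sketched as a table whose row is selected by the branch address and whose column is selected by an m-bit global history register; each cell is an n-bit saturating counter. The Python sketch below assumes that organization, a hypothetical 2^10-row table, and hypothetical addresses `B1_PC` and `B2_PC` for b1 and b2. Its storage is 2^m × n × (number of rows) bits.

```python
from typing import Tuple


class CorrelatingPredictor:
    """(m, n) predictor sketch: m bits of global history select one of
    2**m n-bit saturating counters in the row chosen by the branch address."""

    def __init__(self, m: int, n: int, index_bits: int = 10) -> None:
        self.m, self.n = m, n
        self.rows = 1 << index_bits
        self.history = 0                      # last m outcomes, 1 = taken
        self.max = (1 << n) - 1
        self.threshold = 1 << (n - 1)
        self.counters = [[0] * (1 << m) for _ in range(self.rows)]

    def _position(self, pc: int) -> Tuple[int, int]:
        return (pc >> 2) % self.rows, self.history

    def predict(self, pc: int) -> bool:
        row, col = self._position(pc)
        return self.counters[row][col] >= self.threshold

    def update(self, pc: int, taken: bool) -> None:
        row, col = self._position(pc)
        c = self.counters[row][col]
        self.counters[row][col] = min(c + 1, self.max) if taken else max(c - 1, 0)
        # Shift the outcome into the global history, keeping only m bits.
        self.history = ((self.history << 1) | int(taken)) & ((1 << self.m) - 1)

    def storage_bits(self) -> int:
        return (1 << self.m) * self.n * self.rows


def outcomes(d: int) -> Tuple[bool, bool]:
    b1 = d != 0
    if not b1:
        d = 1
    b2 = d != 1
    return b1, b2


B1_PC, B2_PC = 0x1000, 0x1008   # hypothetical addresses of b1 and b2


def count_misses(pred: CorrelatingPredictor, sequence) -> int:
    misses = 0
    for d in sequence:
        for pc, taken in zip((B1_PC, B2_PC), outcomes(d)):
            if pred.predict(pc) != taken:
                misses += 1
            pred.update(pc, taken)
    return misses


print(count_misses(CorrelatingPredictor(m=1, n=1), [2, 0, 2, 0]))  # 2
print(count_misses(CorrelatingPredictor(m=0, n=1), [2, 0, 2, 0]))  # 8 (no correlation)
print(CorrelatingPredictor(m=2, n=2).storage_bits())               # 8192
```

With m=0 the sketch degenerates to a plain n-bit predictor, and with m=1, n=1 it reproduces the correlated table above.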

Tournament Predictors
n-bit predictors use local information
(m,n) predictors use global information
Tournament predictors
 – Combine local and global information for enhanced performance
Example of tournament predictors
 – Multilevel branch predictors
    » Use several levels of branch prediction tables
    » Use an algorithm to select among multiple predictors
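A minimal tournament predictor can be sketched in Python with three tables of 2-bit counters: a "local" predictor indexed by branch address, a "global" predictor indexed by the global history, and a per-address chooser that moves toward whichever component was right when they disagree. The table sizes, the 8-bit history, and treating "local" as a plain per-address counter (rather than a local history table) are all simplifying assumptions for illustration.

```python
class Counter2:
    """2-bit saturating counter; 'high' means the upper half (2 or 3)."""

    def __init__(self, value: int = 1) -> None:
        self.value = value

    def high(self) -> bool:
        return self.value >= 2

    def bump(self, up: bool) -> None:
        self.value = min(self.value + 1, 3) if up else max(self.value - 1, 0)


class TournamentPredictor:
    def __init__(self, index_bits: int = 10, history_bits: int = 8) -> None:
        self.index_mask = (1 << index_bits) - 1
        self.hist_mask = (1 << history_bits) - 1
        self.history = 0
        self.local = [Counter2() for _ in range(1 << index_bits)]
        self.glob = [Counter2() for _ in range(1 << history_bits)]
        self.chooser = [Counter2() for _ in range(1 << index_bits)]  # high => use global

    def predict(self, pc: int) -> bool:
        i = (pc >> 2) & self.index_mask
        component = self.glob[self.history] if self.chooser[i].high() else self.local[i]
        return component.high()

    def update(self, pc: int, taken: bool) -> None:
        i = (pc >> 2) & self.index_mask
        local_ok = self.local[i].high() == taken
        global_ok = self.glob[self.history].high() == taken
        if local_ok != global_ok:
            # Move the chooser toward whichever component was right.
            self.chooser[i].bump(up=global_ok)
        self.local[i].bump(taken)
        self.glob[self.history].bump(taken)
        self.history = ((self.history << 1) | int(taken)) & self.hist_mask


# b1/b2 behavior for d = 2, 0, 2, 0, ... at hypothetical addresses.
A_PC, B_PC = 0x2000, 0x2010
pattern = [(A_PC, True), (B_PC, True), (A_PC, False), (B_PC, False)]


def misses_in_last(pred: TournamentPredictor, iterations: int, last_n: int) -> int:
    results = []
    for _ in range(iterations):
        for pc, taken in pattern:
            results.append(pred.predict(pc) != taken)
            pred.update(pc, taken)
    return sum(results[-last_n:])


print(misses_in_last(TournamentPredictor(), iterations=200, last_n=100))  # 0
```

Here the per-address counters keep mispredicting the alternating branches, while the global component learns the pattern, so the chooser settles on the global predictor.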

Comparing Predictors

High Performance Instruction Delivery
What else can be done besides branch prediction?
We need high-bandwidth instruction delivery
 – Modern multiple-issue processors require 4–8 instructions per clock cycle

Branch-Target Buffers (BTB)
How can we further reduce the branch penalty?
 – We need to know the address of the next instruction to fetch
 – If we know during IF that the instruction is a branch, and we know what the next PC should be, the branch penalty can be zero
Branch-target buffer: stores the predicted address of the next instruction after a branch
Advantage for a 5-stage pipeline
 – The predicted instruction address is known 1 cycle earlier: in the IF stage instead of the ID stage

BTB Has a Cache Structure
 – The lookup field holds the addresses of known branches
 – Note that only predicted-taken branches need to be stored

Branch Target Buffer Operation
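The BTB operation can be sketched in Python as a lookup in IF followed by an update when the branch resolves. For simplicity this sketch models the BTB as an unbounded, fully associative dictionary from branch PC to predicted target, rather than a finite cache, and assumes 4-byte instructions; as on the slide, only branches that were taken are kept.

```python
from typing import Dict, Optional

INST_BYTES = 4   # assumed fixed instruction size


class BranchTargetBuffer:
    def __init__(self) -> None:
        self.entries: Dict[int, int] = {}   # branch PC -> predicted target PC

    def lookup(self, pc: int) -> Optional[int]:
        # In IF: a hit means "predicted-taken branch, fetch from the target next".
        return self.entries.get(pc)

    def next_fetch_pc(self, pc: int) -> int:
        target = self.lookup(pc)
        return target if target is not None else pc + INST_BYTES

    def resolve(self, pc: int, taken: bool, target: int) -> bool:
        """Called when the branch resolves; returns True if IF fetched
        from the wrong address."""
        predicted = self.next_fetch_pc(pc)
        actual = target if taken else pc + INST_BYTES
        if taken:
            self.entries[pc] = target      # enter (or refresh) a taken branch
        else:
            self.entries.pop(pc, None)     # drop a branch that was not taken
        return predicted != actual


btb = BranchTargetBuffer()
print(btb.resolve(0x100, taken=True, target=0x80))   # True: miss, branch now entered
print(hex(btb.next_fetch_pc(0x100)))                 # 0x80
print(btb.resolve(0x100, taken=True, target=0x80))   # False: correct prediction
print(btb.resolve(0x100, taken=False, target=0x80))  # True: mispredicted, entry removed
```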

Integrated Instruction Fetch Units
Instead of treating instruction fetch as a single pipeline stage, use a more advanced instruction fetch unit
 – To support the demands of multiple-issue processors
An integrated IF unit has 3 main functions
 – Integrated branch prediction
 – Instruction prefetch
    » Autonomously fetches instructions ahead of the current point of execution
 – Instruction memory access and buffering
    » Tries to hide the overhead of fetching instructions from multiple cache lines by buffering instructions

Return Address Predictors
Predict the targets of indirect jumps whose targets are not known at compile time
 – Mainly returns from procedure calls
    » Procedures are called from different points in the code, so the same return jumps to different addresses
Use a small stack of return addresses
 – When a procedure is called, push the return address onto the stack; pop the stack on return
 – If the stack is at least as deep as the maximum call depth, returns are predicted perfectly
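A fixed-depth return address stack can be sketched in Python with `collections.deque(maxlen=...)`, which discards the oldest entry when a push overflows. That overflow policy, and the return addresses used below, are assumptions for illustration; real designs vary.

```python
from collections import deque
from typing import Deque, Optional


class ReturnAddressStack:
    def __init__(self, depth: int) -> None:
        # A full deque with maxlen drops its oldest (leftmost) entry on append.
        self.stack: Deque[int] = deque(maxlen=depth)

    def on_call(self, return_pc: int) -> None:
        self.stack.append(return_pc)

    def predict_return(self) -> Optional[int]:
        return self.stack.pop() if self.stack else None


def mispredicted_returns(stack_depth: int, call_depth: int) -> int:
    """Nest call_depth calls, then return from all of them."""
    ras = ReturnAddressStack(stack_depth)
    return_pcs = [0x1000 + 4 * i for i in range(call_depth)]  # hypothetical addresses
    for r in return_pcs:
        ras.on_call(r)
    misses = 0
    for r in reversed(return_pcs):
        if ras.predict_return() != r:
            misses += 1
    return misses


print(mispredicted_returns(stack_depth=4, call_depth=3))  # 0: stack deep enough
print(mispredicted_returns(stack_depth=2, call_depth=3))  # 1: outermost return lost
```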

Prediction Stack Performance
Results based on a number of SPEC benchmarks

Recap
So far we have seen
Dynamic scheduling: reduces stalls due to data dependences
 – Tomasulo's algorithm
Dynamic branch prediction: tries to reduce stalls due to control dependences
 – n-bit predictors, (m,n) predictors, tournament predictors
Achieving an ideal CPI of 1
 – Branch-target buffers, integrated IF, return address prediction

Multiple Issue Processors
Try to issue multiple instructions per clock cycle
Two basic flavors
 – Superscalar processors
    » Issue a variable number of instructions per clock cycle
    » Can be statically or dynamically scheduled
 – VLIW (Very Long Instruction Word) processors
    » Issue a fixed number of instructions formatted as a packet of smaller instructions
    » Parallelism across instructions is explicitly indicated
    » Statically scheduled by the compiler

Next Time
Midterm
Next Tuesday
 – Multiple Issue Processors