Lecture: Branch Prediction

Slides:



Advertisements
Similar presentations
Branch prediction Titov Alexander MDSP November, 2009.
Advertisements

Pipelining V Topics Branch prediction State machine design Systems I.
Dynamic Branch Prediction (Sec 4.3) Control dependences become a limiting factor in exploiting ILP So far, we’ve discussed only static branch prediction.
CPE 631: Branch Prediction Electrical and Computer Engineering University of Alabama in Huntsville Aleksandar Milenkovic,
Dynamic Branch Prediction
CPE 731 Advanced Computer Architecture ILP: Part II – Branch Prediction Dr. Gheith Abandah Adapted from the slides of Prof. David Patterson, University.
1 Lecture: Branch Prediction Topics: branch prediction, bimodal/global/local/tournament predictors, branch target buffer (Section 3.3, notes on class webpage)
W04S1 COMP s1 Seminar 4: Branch Prediction Slides due to David A. Patterson, 2001.
1 Introduction Background: CS 3810 or equivalent, based on Hennessy and Patterson’s Computer Organization and Design Text for CS/EE 6810: Hennessy and.
1 Lecture 7: Static ILP, Branch prediction Topics: static ILP wrap-up, bimodal, global, local branch prediction (Sections )
1 COMP 206: Computer Architecture and Implementation Montek Singh Wed., Oct. 8, 2003 Topic: Instruction-Level Parallelism (Dynamic Branch Prediction)
1 Lecture 7: Out-of-Order Processors Today: out-of-order pipeline, memory disambiguation, basic branch prediction (Sections 3.4, 3.5, 3.7)
1 Lecture 8: Branch Prediction, Dynamic ILP Topics: branch prediction, out-of-order processors (Sections )
EENG449b/Savvides Lec /17/04 February 17, 2004 Prof. Andreas Savvides Spring EENG 449bG/CPSC 439bG.
1 Lecture 8: Branch Prediction, Dynamic ILP Topics: branch prediction, out-of-order processors (Sections )
Combining Branch Predictors
Branch Target Buffers BPB: Tag + Prediction
1 Introduction Background: CS 3810 or equivalent, based on Hennessy and Patterson’s Computer Organization and Design Text for CS/EE 6810: Hennessy and.
1 Lecture 8: Instruction Fetch, ILP Limits Today: advanced branch prediction, limits of ILP (Sections , )
1 Lecture 18: Pipelining Today’s topics:  Hazards and instruction scheduling  Branch prediction  Out-of-order execution Reminder:  Assignment 7 will.
1 Lecture 8: Branch Prediction, Dynamic ILP Topics: static speculation and branch prediction (Sections )
1 COMP 740: Computer Architecture and Implementation Montek Singh Thu, Feb 19, 2009 Topic: Instruction-Level Parallelism III (Dynamic Branch Prediction)
Dynamic Branch Prediction
EENG449b/Savvides Lec /25/05 March 24, 2005 Prof. Andreas Savvides Spring g449b EENG 449bG/CPSC 439bG.
CIS 429/529 Winter 2007 Branch Prediction.1 Branch Prediction, Multiple Issue.
Spring 2003CSE P5481 Control Hazard Review The nub of the problem: In what pipeline stage does the processor fetch the next instruction? If that instruction.
1 Lecture 7: Branch prediction Topics: bimodal, global, local branch prediction (Sections )
1 Lecture 7: Static ILP and branch prediction Topics: static speculation and branch prediction (Appendix G, Section 2.3)
CS 7810 Lecture 6 The Impact of Delay on the Design of Branch Predictors D.A. Jimenez, S.W. Keckler, C. Lin Proceedings of MICRO
1 Lecture 2: Metrics to Evaluate Systems Topics: Power and technology trends wrap-up, benchmark suites, performance equation, summarizing performance with.
1 CS/EE 6810: Computer Architecture Class format:  Most lectures on YouTube *BEFORE* class  Use class time for discussions, clarifications, problem-solving,
CSCI 6461: Computer Architecture Branch Prediction Instructor: M. Lancaster Corresponding to Hennessey and Patterson Fifth Edition Section 3.3 and Part.
CSC 4250 Computer Architectures October 31, 2006 Chapter 3.Instruction-Level Parallelism & Its Dynamic Exploitation.
CS 6290 Branch Prediction. Control Dependencies Branches are very frequent –Approx. 20% of all instructions Can not wait until we know where it goes –Long.
1 Lecture 2: Metrics to Evaluate Systems Topics: Metrics: power, reliability, cost, benchmark suites, performance equation, summarizing performance with.
Adapted from Computer Organization and Design, Patterson & Hennessy, UCB ECE232: Hardware Organization and Design Part 13: Branch prediction (Chapter 4/6)
1 Lecture: Static ILP Topics: predication, speculation (Sections C.5, 3.2)
1 Lecture: Out-of-order Processors Topics: branch predictor wrap-up, a basic out-of-order processor with issue queue, register renaming, and reorder buffer.
Branch Prediction Perspectives Using Machine Learning Veerle Desmet Ghent University.
Dynamic Branch Prediction
CSL718 : Pipelined Processors
Lecture: Out-of-order Processors
CS203 – Advanced Computer Architecture
Computer Structure Advanced Branch Prediction
Lecture: Branch Prediction
Computer Architecture Advanced Branch Prediction
UNIVERSITY OF MASSACHUSETTS Dept
COSC3330 Computer Architecture Lecture 15. Branch Prediction
CMSC 611: Advanced Computer Architecture
Lecture 19: Branches, OOO Today’s topics: Instruction scheduling
Module 3: Branch Prediction
Lecture 6: Static ILP, Branch prediction
Lecture 18: Pipelining Today’s topics:
Dynamic Hardware Branch Prediction
Lecture: Static ILP, Branch Prediction
CPE 631: Branch Prediction
Branch statistics Branches occur every 4-6 instructions (16-25%) in integer programs; somewhat less frequently in scientific ones Unconditional branches.
Lecture 18: Pipelining Today’s topics:
Lecture: Branch Prediction
Lecture: Out-of-order Processors
Dynamic Branch Prediction
Lecture 19: Branches, OOO Today’s topics: Instruction scheduling
Pipelining and control flow
Lecture 10: Branch Prediction and Instruction Delivery
Lecture 20: OOO, Memory Hierarchy
Lecture 20: OOO, Memory Hierarchy
Lecture 18: Pipelining Today’s topics:
Adapted from the slides of Prof
Lecture 7: Branch Prediction, Dynamic ILP
CPE 631 Lecture 12: Branch Prediction
Presentation transcript:

Lecture: Branch Prediction Topics: power/energy basics and DFS/DVFS, branch prediction, bimodal/global/local/tournament predictors, branch target buffer (Section 3.3, notes on class webpage)

Power Consumption Trends Dyn power a activity x capacitance x voltage2 x frequency Capacitance per transistor and voltage are decreasing, but number of transistors is increasing at a faster rate; hence clock frequency must be kept steady Leakage power is also rising; is a function of transistor count, leakage current, and supply voltage Power consumption is already between 100-150W in high-performance processors today Energy = power x time = (dynpower + lkgpower) x time

Power Vs. Energy Energy is the ultimate metric: it tells us the true “cost” of performing a fixed task Power (energy/time) poses constraints; can only work fast enough to max out the power delivery or cooling solution If processor A consumes 1.2x the power of processor B, but finishes the task in 30% less time, its relative energy is 1.2 X 0.7 = 0.84; Proc-A is better, assuming that 1.2x power can be supported by the system

Reducing Power and Energy Can gate off transistors that are inactive (reduces leakage) Design for typical case and throttle down when activity exceeds a threshold DFS: Dynamic frequency scaling -- only reduces frequency and dynamic power, but hurts energy DVFS: Dynamic voltage and frequency scaling – can reduce voltage and frequency by (say) 10%; can slow a program by (say) 8%, but reduce dynamic power by 27%, reduce total power by (say) 23%, reduce total energy by 17% (Note: voltage drop  slow transistor  freq drop)

DFS and DVFS DFS DVFS

Pipeline without Branch Predictor PC IF (br) Reg Read Compare Br-target PC + 4 In the 5-stage pipeline, a branch completes in two cycles  If the branch went the wrong way, one incorrect instr is fetched  One stall cycle per incorrect branch

Pipeline with Branch Predictor PC IF (br) Reg Read Compare Br-target Branch Predictor In the 5-stage pipeline, a branch completes in two cycles  If the branch went the wrong way, one incorrect instr is fetched  One stall cycle per incorrect branch

1-Bit Bimodal Prediction For each branch, keep track of what happened last time and use that outcome as the prediction What are prediction accuracies for branches 1 and 2 below: while (1) { for (i=0;i<10;i++) { branch-1 … } for (j=0;j<20;j++) { branch-2

2-Bit Bimodal Prediction For each branch, maintain a 2-bit saturating counter: if the branch is taken: counter = min(3,counter+1) if the branch is not taken: counter = max(0,counter-1) If (counter >= 2), predict taken, else predict not taken Advantage: a few atypical branches will not influence the prediction (a better measure of “the common case”) Especially useful when multiple branches share the same counter (some bits of the branch PC are used to index into the branch predictor) Can be easily extended to N-bits (in most processors, N=2)

Bimodal 1-Bit Predictor Branch PC Table of 1K entries Each entry is a bit 10 bits The table keeps track of what the branch did last time

Bimodal 2-Bit Predictor Branch PC Table of 1K entries Each entry is a 2-bit sat. counter 10 bits The table keeps track of the common-case outcome for the branch

Correlating Predictors Basic branch prediction: maintain a 2-bit saturating counter for each entry (or use 10 branch PC bits to index into one of 1024 counters) – captures the recent “common case” for each branch Can we take advantage of additional information? If a branch recently went 01111, expect 0; if it recently went 11101, expect 1; can we have a separate counter for each case? If the previous branches went 01, expect 0; if the previous branches went 11, expect 1; can we have a separate counter for each case? Hence, build correlating predictors

Global Predictor Branch PC Global history Table of 10 bits 16K entries Each entry is a 2-bit sat. counter 10 bits CAT Global history The table keeps track of the common-case outcome for the branch/history combo

Local Predictor Branch PC Also a two-level predictor that only uses local histories at the first level Branch PC Table of 16K entries of 2-bit saturating counters Use 6 bits of branch PC to index into local history table 10110111011001 14-bit history indexes into next level Table of 64 entries of 14-bit histories for a single branch

Local Predictor Branch PC Local history 10 bit entries 10 bits XOR Table of 1K entries Each entry is a 2-bit sat. counter 6 bits Local history 10 bit entries 64 entries The table keeps track of the common-case outcome for the branch/local-history combo

Local/Global Predictors Instead of maintaining a counter for each branch to capture the common case, Maintain a counter for each branch and surrounding pattern If the surrounding pattern belongs to the branch being predicted, the predictor is referred to as a local predictor If the surrounding pattern includes neighboring branches, the predictor is referred to as a global predictor

Tournament Predictors A local predictor might work well for some branches or programs, while a global predictor might work well for others Provide one of each and maintain another predictor to identify which predictor is best for each branch Alpha 21264: 1K entries in level-1 1K entries in level-2 4K entries 12-bit global history Total capacity: ? Local Predictor M U X Global Predictor Branch PC Tournament Predictor Table of 2-bit saturating counters

Branch Target Prediction In addition to predicting the branch direction, we must also predict the branch target address Branch PC indexes into a predictor table; indirect branches might be problematic Most common indirect branch: return from a procedure – can be easily handled with a stack of return addresses

Title Bullet