CSC 4250 Computer Architectures, October 31, 2006. Chapter 3: Instruction-Level Parallelism & Its Dynamic Exploitation


Simple 5-Stage Pipeline
Branch prediction may not help in the 5-stage pipeline: IF | ID | EX | ME | WB
We decode the branch instruction, test the branch condition, and compute the branch target address during ID
There is no gain in predicting the branch outcome as late as ID
How to speed up branch handling?

How to Reduce Branch Penalty
5-stage pipeline: IF | ID | EX | ME | WB
"Predict" the fetched instruction as a branch instruction ─ decide during IF that the instruction just fetched is a branch
"Predict" the target instruction and fetch it next ─ no need to compute the address of the next instruction
The branch penalty becomes zero cycles if the prediction is correct
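The idea above, choosing the next fetch address during IF, can be sketched as a small branch-target buffer. This is a minimal illustration, not the textbook's hardware design; all names are invented for the sketch, and a MIPS-like 4-byte instruction size is assumed.

```python
# Minimal branch-target-buffer (BTB) sketch, consulted during IF.
# Only taken branches are stored, matching the assumption in Figure 3.21.

class BranchTargetBuffer:
    """Maps the PC of a fetched instruction to its predicted target PC."""

    def __init__(self):
        self.entries = {}  # fetch PC -> predicted target PC

    def predict_next_pc(self, pc):
        # Hit: the instruction at `pc` is predicted to be a taken branch,
        # so fetch from the stored target next (zero-cycle branch if right).
        # Miss: predict sequential execution (pc + 4 on a MIPS-like ISA).
        return self.entries.get(pc, pc + 4)

    def update(self, pc, taken, target):
        if taken:
            self.entries[pc] = target     # enter or refresh a taken branch
        else:
            self.entries.pop(pc, None)    # drop a branch that was not taken

btb = BranchTargetBuffer()
btb.update(0x100, taken=True, target=0x200)
print(hex(btb.predict_next_pc(0x100)))  # 0x200 (BTB hit: fetch target next)
print(hex(btb.predict_next_pc(0x104)))  # 0x108 (miss: fetch sequentially)
```

The point of the sketch is that the lookup uses only the fetch PC, which is available in IF, so no decoding is needed before redirecting the fetch stream.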

Figure: A branch-target buffer

Figure: Steps to handle an instruction with a branch-target buffer

Figure 3.21: Penalties, assuming that we store only taken branches in the buffer
If the branch is in the buffer but not correctly predicted, the penalty is one clock cycle to update the buffer with the correct information (during which an instruction cannot be fetched) plus one clock cycle to restart fetching at the correct instruction
If the branch is not found in the buffer but turns out to be taken, a two-cycle penalty is incurred, during which time the buffer is updated

  Instruction in buffer   Prediction   Actual branch   Penalty cycles
  Yes                     Taken        Taken           0
  Yes                     Taken        Not taken       2
  No                                   Taken           2
  No                                   Not taken       0

Example (p. 211)
Determine the total branch penalty for a branch-target buffer, assuming the penalty cycles from Figure 3.21. The following assumptions are made:
 Prediction accuracy is 90% (for instructions in the buffer)
 Hit rate in the buffer is 90% (for branches predicted taken)
 Assume that 60% of the branches are taken

Answer (p. 211)
Compute the penalty by looking at two events: the branch is predicted taken but ends up being not taken, and the branch is taken but is not found in the buffer. Both carry a penalty of two cycles.
Probability (branch in buffer, but actually not taken) = percent buffer hit rate × percent incorrect predictions = 90% × 10% = 0.09
Probability (branch not in buffer, but actually taken) = 10%
Branch penalty = (0.09 + 0.10) × 2 = 0.38 cycles
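The same arithmetic, spelled out. The percentages are taken directly from the slide's assumptions; the variable names are illustrative.

```python
# Expected branch penalty for the example on p. 211.

hit_rate = 0.90   # fraction of branches found in the buffer
accuracy = 0.90   # fraction of correct predictions, given a hit
taken    = 0.60   # fraction of branches that are taken (given, but unused below)

p_hit_wrong  = hit_rate * (1 - accuracy)    # in buffer, but actually not taken
p_miss_taken = 0.10                         # not in buffer, but actually taken
penalty = (p_hit_wrong + p_miss_taken) * 2  # both events cost 2 cycles each

print(round(penalty, 2))  # 0.38
```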

Comparison
Branch-Target Buffer (BTB) versus Branch-Prediction Buffer (BPB):
 Shape, size, and contents
 Which stage in the pipeline?
 How to find an entry?
 Placement of an entry
 Replacement of an entry
 With a BTB, why do we still need a BPB?
 Does the BPB save any clock cycles?
 If predicted not taken, should the branch instruction be kept in the BTB?

Variation of Branch-Target Buffer (p. 211)
Store one or more target instructions instead of, or in addition to, the predicted target address. Two potential advantages:
 Allows the branch-target buffer access to take longer than the time between successive instruction fetches, possibly allowing a larger branch-target buffer
 Allows us to perform an optimization called branch folding

Branch Folding (p. 213)
Use branch folding to obtain zero-cycle unconditional branches. Consider a branch-target buffer that buffers instructions from the predicted path and is accessed with the address of an unconditional branch. The only function of the unconditional branch is to change the PC. Thus, when the branch-target buffer signals a hit and indicates that the branch is unconditional, the pipeline can simply substitute the instruction from the branch-target buffer in place of the instruction returned from the cache (which is the unconditional branch).
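A toy model of the substitution described above. This is a hedged sketch, not real fetch hardware: the BTB here stores the target instruction alongside the target address, and the addresses, instruction strings, and the `fetch` function are all invented for the example.

```python
# Branch-folding sketch: on a BTB hit for an unconditional branch, the
# fetch stage hands the pipeline the stored *target* instruction instead
# of the branch itself, so the branch effectively costs zero cycles.

btb = {
    # fetch PC: (unconditional?, target PC, instruction stored at target)
    0x100: (True, 0x200, "add r1, r2, r3"),
}

def fetch(pc, icache):
    entry = btb.get(pc)
    if entry and entry[0]:                 # hit on an unconditional branch
        _, target, target_instr = entry
        return target_instr, target + 4    # the branch is "folded away"
    return icache[pc], pc + 4              # normal sequential fetch

icache = {0x100: "j 0x200", 0x104: "nop", 0x200: "add r1, r2, r3"}
instr, next_pc = fetch(0x100, icache)
print(instr, hex(next_pc))  # add r1, r2, r3 0x204
```

Note that the jump at 0x100 never enters the pipeline at all: the instruction stream continues directly with the instruction at the target.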

Integrated Instruction Fetch Unit
An instruction fetch unit that integrates several functions:
1. Integrated branch prediction ─ the branch predictor becomes part of the integrated unit and constantly predicts branches, so as to drive the fetch pipeline
2. Instruction prefetch ─ the unit autonomously manages prefetching, integrating it with branch prediction
3. Instruction memory access and buffering ─ the unit uses prefetching to hide the cost of crossing cache blocks; it also provides buffering, delivering instructions to the issue stage as needed and in the quantity needed

Return Address Predictor
Want to predict indirect jumps, i.e., jumps whose destination address varies at run time
The vast majority of indirect jumps come from procedure returns: 85% for SPEC89
We could predict procedure returns with a branch-target buffer, but accuracy will be low if a procedure is called from multiple sites and the calls from one site are not clustered in time
What can we do?
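The standard answer, which the next figure evaluates, is a small stack of return addresses: push on every call, pop a prediction on every return. A minimal sketch, with an invented class name and a fixed depth of 8:

```python
# Return-address-stack sketch: because returns match calls in LIFO order,
# a small hardware stack predicts returns correctly even when a procedure
# is called from many different sites.

class ReturnAddressStack:
    def __init__(self, depth=8):
        self.depth = depth
        self.stack = []

    def on_call(self, return_address):
        if len(self.stack) == self.depth:
            self.stack.pop(0)              # overflow: discard the oldest entry
        self.stack.append(return_address)

    def on_return(self):
        # Predicted target of the indirect jump implementing the return;
        # None models an empty stack (fall back to the BTB).
        return self.stack.pop() if self.stack else None

ras = ReturnAddressStack()
ras.on_call(0x104)            # call from site A returns to 0x104
ras.on_call(0x244)            # nested call from site B returns to 0x244
print(hex(ras.on_return()))   # 0x244 -- inner return predicted correctly
print(hex(ras.on_return()))   # 0x104 -- outer return predicted correctly
```

Mispredictions occur only when the call depth exceeds the stack size, which motivates the figure's observation that a modest buffer works well.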

Figure: Prediction accuracy for a return-address buffer operated as a stack
The accuracy is the fraction of return addresses predicted correctly. Since call depths are typically not large, a modest buffer works well.