1 COMP 206: Computer Architecture and Implementation Montek Singh Wed, Sep 21, 2005 Topic: Pipelining -- Intermediate Concepts (Control Hazards)

2 Control Hazard
  - A peculiar kind of RAW hazard involving the program counter
      - PC written by branch instruction
      - PC read by instruction fetch unit (not another instruction)
  - Possible misbehavior: the instructions fetched and executed after the branch instruction are not the ones specified by the branch instruction

3 Control Hazard: Example
  [Figure: timing of the instruction sequence Br-1, Br, Br+1, Br+2, Br+3, …, T for an unpipelined implementation and for a pipelined implementation with the PNT strategy]

4 More on Control Hazards
  - Branch delay: the length of the control hazard
  - What determines branch delay?
      - We need to know that we have a branch instruction
      - We need to have the BTA (branch target address)
      - We need to know the branch outcome
      - So, we have to wait until we know all of these quantities
  - An older pipeline (DLX, HP2):
      - …computes BTA in EX
      - …computes branch outcome in EX
      - …changes PC in MEM
  - To reduce branch delay, these steps are moved to earlier pipeline stages in MIPS (HP3):
      - Can't move up beyond ID (need to know it's a branch instruction)

5 Reducing Branch Delays
  [Figure: pipelined datapath with the IF/ID and ID/EX pipeline registers labeled]
  Example:
          sub $10, $4, $8
          beq $10, $3, go
          add $12, $2, $5
          ...
      go: lw  $4, 16($12)

6 Dealing with Branch Delays
  - Four strategies
      - Stall
      - Predict Taken, variation A (PTA)
      - Predict Taken, variation B (PTB)
      - Predict Not Taken (PNT)
  - Consider a hypothetical 12-stage pipeline
      - Instruction is fetched in stage 1 (IF)
      - Opcode becomes known in stage 2 (ID)
      - BTA becomes known in stage 4
      - Branch outcome becomes known in stage 6
  - Parameters
      - PU, PT, PNT: penalties of unconditional branch, taken branch, untaken branch
      - T: probability of branch being taken

7 Stall Strategy: 12-Stage Pipeline
  - Pipeline stalls on all branches
  - Instructions 1 and 8 are branches
      - 1 is not taken, 8 is taken
  - Opcode determination in stage 2 stalls pipeline
  - Branch outcome determination in stage 6 restarts pipeline from IF or ID
  - BTA determination in stage 4 would restart pipeline from IF for jumps
  - PU = 3, PT = 5, PNT = 4

8 PNT Strategy: 12-Stage Pipeline
  - Pipeline continues execution assuming that the branch will fall through
  - Instructions 1 and 12 are branches
      - 1 is not taken, 12 is taken
  - Branch outcome determination in stage 6 restarts pipeline from IF for taken branches (cancelling instructions already in pipeline)
  - BTA determination in stage 4 would restart pipeline from IF for jumps
  - PU = 3, PT = 5, PNT = 0

9 PTA Strategy: 12-Stage Pipeline
  - Pipeline predicts all branches to be taken and restarts pipeline from IF at BTA as soon as BTA is known (cancelling instructions already in pipeline)
  - Instructions 1 and 7 are branches
      - 1 is not taken, 7 is taken
  - Branch outcome determination in stage 6 restarts pipeline from IF for untaken branches (cancelling instructions already in pipeline)
  - PU = 3, PT = 3, PNT = 5

10 PTB Strategy: 12-Stage Pipeline
  - Pipeline predicts all branches to be taken and starts fetching from BTA as soon as it is known in stage 4 (but without cancelling instructions already in pipeline)
  - Instructions 1 and 10 are branches
      - 1 is not taken, 10 is taken
  - Branch outcome determination in stage 6 restarts pipeline from IF on the fall-through path (for untaken branches), and causes cancellation
  - PU = 3, PT = 3, PNT = 2
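The penalties quoted on the four strategy slides can be reproduced from the three stage numbers of the hypothetical pipeline. The sketch below is one reading of the slides, not something the slides state: the function name `branch_penalties` and the stage-difference formulas are assumptions chosen because they match the quoted PU, PT, and PNT values.

```python
def branch_penalties(opcode_stage=2, bta_stage=4, outcome_stage=6):
    """Return {strategy: (PU, PT, PNT)} in cycles.

    Assumed model (inferred from the slide values, not stated there):
    a redirect decided in stage k costs k - 1 cycles when fetch must
    restart from IF.
    """
    pu = bta_stage - 1           # jumps redirect once the BTA is known
    resolve = outcome_stage - 1  # restart from IF once the outcome is known
    return {
        # Stall: an untaken branch restarts from ID, saving one cycle.
        "Stall": (pu, resolve, outcome_stage - opcode_stage),
        # PNT: fall-through fetch is already correct for untaken branches.
        "PNT": (pu, resolve, 0),
        # PTA: redirect at the BTA; an untaken branch restarts from IF.
        "PTA": (pu, bta_stage - 1, resolve),
        # PTB: earlier fetches are kept, so an untaken branch only loses
        # the target-path instructions fetched between BTA and outcome.
        "PTB": (pu, bta_stage - 1, outcome_stage - bta_stage),
    }


if __name__ == "__main__":
    for name, (pu, pt, pnt) in branch_penalties().items():
        print(f"{name}: PU = {pu}, PT = {pt}, PNT = {pnt}")
```

Changing the stage arguments shows how moving BTA or outcome computation earlier shrinks each penalty, under the same assumed model.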

11 Effect of Control Hazards on Pipelines
  Assume that 20% of all instructions are transfers of control, split 5% for unconditional jumps and 15% for conditional branches. For each of the four branching schemes for the 12-stage pipeline, determine the branch penalty as a function of T, the probability of a branch being taken.

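12 Solution for 12-Stage Pipeline
  Penalty per transfer of control (5%/20% = 0.25 unconditional, 15%/20% = 0.75 conditional):
  - Stall: 0.25*3 + 0.75*(T*5 + (1-T)*4) = 3.75 + 0.75T
  - PTA: 0.25*3 + 0.75*(T*3 + (1-T)*5) = 4.5 - 1.5T
  - PTB: 0.25*3 + 0.75*(T*3 + (1-T)*2) = 2.25 + 0.75T
  - PNT: 0.25*3 + 0.75*(T*5 + (1-T)*0) = 0.75 + 3.75T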
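As a numerical check on these expressions, the sketch below evaluates the penalty per transfer of control for any T. The names `PENALTIES`, `penalty_per_branch`, and `cpi_overhead` are illustrative, and `cpi_overhead` assumes branch penalties are the only source of stall cycles, which the slides do not state.

```python
# (PU, PT, PNT) for the hypothetical 12-stage pipeline, from the strategy slides.
PENALTIES = {
    "Stall": (3, 5, 4),
    "PTA": (3, 3, 5),
    "PTB": (3, 3, 2),
    "PNT": (3, 5, 0),
}


def penalty_per_branch(strategy, t, frac_uncond=0.25):
    """Average penalty in cycles per transfer of control.

    frac_uncond is the share of transfers that are unconditional
    (5% of 20% = 0.25 in the exercise); t is the probability that a
    conditional branch is taken.
    """
    pu, pt, pnt = PENALTIES[strategy]
    conditional = t * pt + (1 - t) * pnt
    return frac_uncond * pu + (1 - frac_uncond) * conditional


def cpi_overhead(strategy, t, branch_freq=0.20):
    """Extra cycles per instruction, assuming no other stall sources."""
    return branch_freq * penalty_per_branch(strategy, t)


if __name__ == "__main__":
    for name in PENALTIES:
        print(name, penalty_per_branch(name, 0.6), cpi_overhead(name, 0.6))
```

Sweeping T from 0 to 1 with this function shows where the strategies cross over, for example PNT against PTB.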

13 Delayed Branches on MIPS
  - One branch delay slot on MIPS
  - Always execute instruction in branch delay slot (irrespective of branch outcome)
  - Question: What instruction do we put in the branch delay slot?
      - Fill with NOP (always possible, penalty = 1)
      - Fill from before (not always possible, penalty = 0)
      - Fill from target (not always possible, penalty = 1-T)
          - BTA is dynamic
          - BTA is another branch
      - Fill from fall-through (not always possible, penalty = T)
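The four fill options and their penalties can be sketched as a small lookup, with an average over a mix of fill sources. The function names and the `mix` dictionary format are illustrative assumptions; only the per-option penalties come from the slide.

```python
def delay_slot_penalty(fill, t):
    """Expected wasted cycles for one delay slot, given fill source and T."""
    if fill == "nop":
        return 1.0        # slot never does useful work
    if fill == "before":
        return 0.0        # instruction is needed on both paths
    if fill == "target":
        return 1.0 - t    # wasted when the branch is not taken
    if fill == "fall_through":
        return t          # wasted when the branch is taken
    raise ValueError(f"unknown fill source: {fill}")


def expected_penalty(mix, t):
    """Average penalty for a mix {fill_source: fraction} that sums to 1."""
    return sum(frac * delay_slot_penalty(fill, t) for fill, frac in mix.items())


if __name__ == "__main__":
    # Hypothetical mix, for illustration only.
    print(expected_penalty({"before": 0.5, "target": 0.3, "nop": 0.2}, t=0.6))
```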

14 Details of Various Branch Flavors
  [Figure: flow graph in which block A B C D ends in the test X: cond; the true edge leads to block E F G H and the false edge to block M N P Q]
  Ordinary:
      A: B: C: D: X: if cond goto E   M: N: P: Q: …   E: F: G: H:
  Delayed, filled from before:
      A: B: C: X: if cond goto E   D:   M: N: P: Q: …   E: F: G: H:

15 Instruction Sequence Alteration Strategies
  - To allow for more aggressive filling of branch delay slot from target or fall-through, we can selectively cancel instructions
  - Classification of branches
      - Delayed branch
          - Instruction in branch delay slot is always executed
      - Plain branch
          - Instruction in branch delay slot is cancelled if branch is taken
          - Useful if compiler filled branch delay slot from fall-through
      - Canceling (annulling, nullifying) branch
          - Instruction in branch delay slot is cancelled if branch is not taken
          - Useful if compiler filled branch delay slot from target
  - Should not cancel instruction if it may cause exception
  - A bit in the instruction set by compiler makes the choice
      - MIPS, SPARC, PA-RISC: delayed (0), canceling (1)
      - M 88000, i860: delayed (0), plain (1)
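The three branch kinds differ only in when the delay-slot instruction is allowed to complete. A minimal sketch of that rule follows; the function name and the string labels are illustrative, not taken from any ISA manual.

```python
def delay_slot_executes(kind, taken):
    """Return True if the delay-slot instruction completes.

    kind is one of "delayed", "plain", "canceling"; taken is the
    branch outcome.
    """
    if kind == "delayed":
        return True        # always executed
    if kind == "plain":
        return not taken   # cancelled when the branch is taken
    if kind == "canceling":
        return taken       # cancelled when the branch is not taken
    raise ValueError(f"unknown branch kind: {kind}")


if __name__ == "__main__":
    for kind in ("delayed", "plain", "canceling"):
        for taken in (True, False):
            print(kind, taken, delay_slot_executes(kind, taken))
```

This shows why a plain branch pairs with a fall-through fill and a canceling branch with a target fill: in each case the slot instruction completes only on the path it was copied from.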

16 Example: Branch Penalties
  Consider a DLX pipeline with a single branch delay slot in which 25% of branches are unconditional. 50% of the unconditional branches have their delay slots filled from before, 40% from the target, and 10% with NOPs. The branch delay slots of the conditional branches are filled from various sources as shown in the table below, depending on the kind of branch used. For each of the cases, determine the branch penalty as a function of T, the probability that a conditional branch is taken. How do these penalties compare to those obtained by using a Stall, PT, or PNT strategy?
  For all of Stall, PT, and PNT on DLX: PU = 1, PT = 1, PNT = 0
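The structure of this calculation can be sketched as follows. The unconditional mix is the one given in the exercise; the conditional mix is a parameter because the table is not reproduced in this transcript. Two assumptions are made here: the per-fill penalties are those of the delayed-branch slide, and an unconditional branch is treated as always taken (T = 1), so a fill from the target costs nothing.

```python
def slot_penalty(fill, t):
    """Delay-slot penalty by fill source (values from the delayed-branch slide)."""
    return {"nop": 1.0, "before": 0.0, "target": 1.0 - t, "fall_through": t}[fill]


def dlx_branch_penalty(cond_mix, t, frac_uncond=0.25):
    """Average penalty per branch on DLX with one delay slot.

    cond_mix maps fill source -> fraction for conditional branches and
    must be supplied by the caller (the exercise's table is not shown).
    """
    uncond_mix = {"before": 0.5, "target": 0.4, "nop": 0.1}
    # Assumption: unconditional branches are always taken, so t = 1 here.
    uncond = sum(f * slot_penalty(fill, 1.0) for fill, f in uncond_mix.items())
    cond = sum(f * slot_penalty(fill, t) for fill, f in cond_mix.items())
    return frac_uncond * uncond + (1 - frac_uncond) * cond


if __name__ == "__main__":
    # Worst case for illustration: every conditional slot holds a NOP.
    print(dlx_branch_penalty({"nop": 1.0}, t=0.5))
```

Under these assumptions the unconditional branches contribute 0.25 * 0.1 = 0.025 cycles per branch regardless of T.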

17 Solution: Branch Penalties