EECS 470 Pipeline Control Hazards Lecture 5 Coverage: Chapter 3 & Appendix A.


Pipeline function for BEQ
Fetch: read the instruction from memory
Decode: read the source operands from the register file
Execute: calculate the target address and test for equality
Memory: send the target to the PC if the test is equal
Writeback: nothing left to do

Control Hazards

        time →
beq:   fetch  decode  execute  memory  writeback
sub:          fetch   decode   execute ...

Approaches to handling control hazards
Avoidance
– Make sure there are no hazards in the code
Detect and Stall
– Delay fetch until the branch is resolved
Speculate and Squash if wrong
– Go ahead and fetch more instructions in case the prediction is correct, but squash them if they should not have been executed

Handling branch hazards: avoid all hazards
Don’t have branch instructions!
– Maybe a little impractical
Predication can eliminate some branches
– If-conversion
– Hyperblocks

if-conversion

if (a == b) {
    x++;
    y = n / d;
}

With a branch:
    sub  t1 ← a, b
    jnz  t1, PC+2
    add  x ← x, #1
    div  y ← n, d

Fully predicated:
    sub      t1 ← a, b
    add(t1)  x ← x, #1
    div(t1)  y ← n, d

With conditional moves:
    sub       t1 ← a, b
    add       t2 ← x, #1
    div       t3 ← n, d
    cmov(t1)  x ← t2
    cmov(t1)  y ← t3
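The cmov idea can be sketched in Python (an illustrative model, not the lecture's ISA): both results are computed unconditionally, and conditional moves commit them only when the predicate holds, so no branch (and no control hazard) is needed.

```python
def branchy(a, b, x, n, d):
    # Original control flow: a branch decides whether the body runs.
    y = None
    if a == b:
        x = x + 1
        y = n // d
    return x, y

def if_converted(a, b, x, y, n, d):
    # If-converted form: compute both results, then cmov on the predicate.
    # Note that the divide now ALWAYS executes, a real cost of if-conversion.
    p = (a == b)        # predicate register (the t1 of the slide)
    t2 = x + 1          # speculative add
    t3 = n // d         # speculative div
    x = t2 if p else x  # cmov(p) x <- t2
    y = t3 if p else y  # cmov(p) y <- t3
    return x, y
```

Both versions agree on the architectural result; the if-converted one trades extra work for branch-free control flow.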

Removing hazards by redefining a branch instruction
Redefine branch instructions: ptbeq regA regB offset ("prepare to branch if equal")
If (R[regA] == R[regB]): execute the instructions at PC+1, PC+2, and PC+3, then branch to PC+1+offset

ptbnz example

Original code:        Rescheduled with delay slots:
  t = 5                 g = c + 2
  n = 7                 bnz g, PC+4
  g = c + 2             t = 5
  bnz g, PC+1           n = 7
  m = 5                 noop
  a = 3                 m = 5
                        a = 3
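A minimal interpreter makes the delay-slot semantics concrete. Everything here (mnemonics, tuple encoding, the three delay slots) is an assumption for illustration, matching the slide's rescheduling: after a taken branch the next three instructions still execute, then control transfers to PC+1+offset.

```python
DELAY_SLOTS = 3   # instructions that always execute after a taken branch

def run(program, regs, max_steps=100):
    pc = 0
    slots_left, target = None, None
    steps = 0
    while 0 <= pc < len(program) and steps < max_steps:
        steps += 1
        op, *args = program[pc]
        is_branch = False
        if op == "set":                      # set rd, imm
            regs[args[0]] = args[1]
        elif op == "addi":                   # addi rd, rs, imm
            regs[args[0]] = regs[args[1]] + args[2]
        elif op == "ptbnz":                  # prepare-to-branch if nonzero
            is_branch = True
            if regs[args[0]] != 0:
                slots_left, target = DELAY_SLOTS, pc + 1 + args[1]
        elif op == "noop":
            pass
        pc += 1
        if not is_branch and slots_left is not None:
            slots_left -= 1                  # one delay slot consumed
            if slots_left == 0:
                pc, slots_left = target, None
    return regs

# The rescheduled code from the slide: g = c + 2 hoisted above the branch,
# t/n/noop filling the three delay slots, and PC+4 skipping over m = 5.
prog = [("addi", "g", "c", 2),
        ("ptbnz", "g", 4),
        ("set", "t", 5),
        ("set", "n", 7),
        ("noop",),
        ("set", "m", 5),
        ("set", "a", 3)]
```

With c = 1 the branch is taken: t and n still execute in the delay slots, m = 5 is skipped, and a = 3 runs at the target.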

Problems with this solution
Old programs (legacy code) may not run correctly on new implementations
– Longer pipelines tend to need more noops
Programs get larger as noops are included
– Especially a problem for machines that try to execute more than one instruction every cycle
– It is harder to find useful instructions
Program execution is slower
– CPI is one, but some instructions are noops

Handling control hazards: detect and stall
Detection:
– Must wait until decode
– Compare the opcode to beq or jalr
– Alternately, this is just another control signal
Stall:
– Keep the current instruction in fetch
– Pass a noop to the decode stage (not execute!)
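A back-of-the-envelope model of the stall cost (the 20% branch frequency and 3-cycle bubble below are assumed values, not from the lecture): with detect-and-stall, every branch pays the full bubble whether or not it is taken.

```python
def cpi_detect_and_stall(branch_frac, stall_cycles):
    """Effective CPI for a base-CPI-1 pipeline that stalls on every branch."""
    return 1.0 + branch_frac * stall_cycles

# If 20% of instructions are branches and each one holds fetch for the
# 3 cycles until the branch resolves in memory:
cpi = cpi_detect_and_stall(0.20, 3)   # about 1.6
```

A 60% CPI penalty is why the later slides look for something better than stalling.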

[Datapath figure: PC, instruction memory, register file, ALU, data memory, and the IF/ID, ID/EX, EX/Mem, and Mem/WB pipeline latches; a bnz r1 sits in decode while the control unit holds fetch.]

[Datapath figure: the same pipeline, with a mux feeding a noop into the decode stage while the PC and IF/ID latch are held.]

Control Hazards (detect and stall)

        time →
beq:   fetch  decode  execute  memory     writeback
sub:          fetch   fetch    fetch      fetch (PC+1) or fetch (target)

Problems with detect and stall
CPI increases every time a branch is detected! Is that necessary?
Not always!
– The branch is taken only about half of the time
Let’s assume that it is NOT taken…
– In this case, we can ignore the beq (treat it like a noop)
– Keep fetching from PC+1
What if we are wrong?
– That’s OK, as long as we do not COMPLETE any instructions we mistakenly executed (i.e., don’t perform writeback)

Handling control hazards: speculate and squash
Speculate: assume not equal (not taken)
– Keep fetching from PC+1 until we know that the branch is really taken
Squash: stop the bad instructions if the branch is taken
– Send a noop to decode, execute, and memory
– Send the target address to the PC
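A toy cycle count for speculate-not-taken with squash. The 3-stage resolve depth is an assumption (the branch resolves in memory with fetch, decode, and execute behind it); only taken branches pay a penalty, which mirrors the slides: untaken branches are free.

```python
def simulate(branch_outcomes, resolve_depth=3):
    """Count (cycles charged to branches, squashed instructions).

    branch_outcomes: True where the branch is taken.
    resolve_depth: wrong-path instructions in flight when it resolves.
    """
    cycles = 0
    squashed = 0
    for taken in branch_outcomes:
        cycles += 1                  # the branch itself
        if taken:
            squashed += resolve_depth
            cycles += resolve_depth  # bubbles replace the squashed work
    return cycles, squashed
```

With detect-and-stall every branch would cost 1 + resolve_depth cycles; here only the taken half does.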

[Datapath figure: the pipeline with beq in the memory stage (where the "equal" signal is computed) and sub, add, and nand fetched behind it; on a taken branch, the wrongly fetched instructions are replaced with noops and the target is sent to the PC.]

Problems with fetching PC+1
CPI increases every time a branch is taken!
– About half of the time
Is that necessary? No! But how can you fetch from the target before you even know that the previous instruction is a branch, much less whether it is taken?

[Datapath figure: the pipeline extended with a branch PC (bpc) carried alongside the instruction, a target adder, an "eq?" comparison, and a mux selecting the next PC.]

Branch Target Buffer
Send the fetch PC to the BTB:
– Found? Use the predicted target PC
– Not found? Use PC+1
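The BTB lookup above can be modeled as a dictionary keyed by fetch PC. This is only a sketch: a real BTB is a small tagged memory, usually set-associative, but the hit/miss behavior is the same.

```python
class BTB:
    def __init__(self):
        self.table = {}                 # fetch PC -> predicted target PC

    def predict(self, pc):
        """Next fetch PC: a hit returns the stored target,
        a miss falls through to PC + 1."""
        return self.table.get(pc, pc + 1)

    def update(self, pc, target):
        """On a taken branch, remember its target for next time."""
        self.table[pc] = target

btb = BTB()
next_pc = btb.predict(100)      # miss: fall through to 101
btb.update(100, 40)             # the branch at 100 was taken to 40
next_pc = btb.predict(100)      # hit: predicted target 40
```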

Branch prediction
Predict not taken: ~50% accurate
– No BTB needed; always use PC+1
Predict backward taken: ~65% accurate
– BTB holds targets for backward branches (loops)
Predict same as last time: ~80% accurate
– Update the BTB for any taken branch
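A sketch of "predict same as last time" as a 1-bit predictor. The trace below is a made-up loop branch, not the lecture's benchmark, but it shows why the accuracy lands near the quoted figure: a loop branch taken 9 times and then falling through mispredicts exactly twice.

```python
def last_time_accuracy(outcomes, initial=False):
    """Predict that each branch does whatever it did last time."""
    correct = 0
    last = initial
    for taken in outcomes:
        if taken == last:
            correct += 1
        last = taken
    return correct / len(outcomes)

# Loop branch: taken 9 times, then not taken at the loop exit.
acc = last_time_accuracy([True] * 9 + [False])   # 8/10 = 0.8
```

The two misses are the stale first prediction and the loop exit; longer loops push the accuracy higher, which is why loop-heavy code does well under this scheme.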

What about indirect branches?
Could we use the same approach?
– PC+1 is an unlikely target for an indirect jump
– Indirect jumps often have multiple targets (for the same instruction):
  Switch statements
  Virtual function calls
  Shared library (DLL) calls

Indirect jumps: a special case
Return address stack (RAS)
– Function returns have deterministic behavior (usually):
  Returns go to many different locations, so a BTB doesn’t work well
  The return location is known ahead of time (it is in some register at the time of the call)
– Build a specialized structure for return addresses:
  Call instructions write the return address to R31 AND push it onto the RAS
  Return instructions pop the predicted target off the stack
– Issues: finite size (save or forget on overflow?)
– Issues: long jumps (clear when wrong?)
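A sketch of the return address stack, with an assumed bounded depth to model the finite-size issue from the slide: on overflow the oldest entry is silently forgotten, so very deep call chains lose their outermost predictions.

```python
from collections import deque

class ReturnAddressStack:
    def __init__(self, depth=8):
        self.stack = deque(maxlen=depth)   # overflow drops the oldest entry

    def on_call(self, pc):
        # The call at `pc` will return to the instruction after it.
        self.stack.append(pc + 1)

    def predict_return(self):
        """Pop the predicted target; None means the stack underflowed
        (e.g. deep recursion already overflowed the finite stack)."""
        return self.stack.pop() if self.stack else None

ras = ReturnAddressStack(depth=2)
ras.on_call(100)            # outer call, return address 101
ras.on_call(200)            # nested call, return address 201
ras.on_call(300)            # overflow: 101 is forgotten
```

Popping now predicts 301, then 201, then underflows, exactly the "save or forget" trade-off the slide raises.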

Branch prediction
Pentium: ~85% accurate
Pentium Pro: ~92% accurate
Best paper designs: ~96% accurate

Costs of branch prediction/speculation
Performance costs?
– Minimal: no difference between waiting and squashing, and it is a huge gain when the prediction is correct!
Power?
– Large: in very long/wide pipelines, many instructions can be squashed
– Squashed instructions = (# mispredictions) × (pipeline length before the target resolves) × (pipeline width)
Area?
– Can be large: predictors can get very big, as we will see next time
Complexity?
– Designs are more complex
– Testing becomes more difficult
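A worked instance of the squash formula above. The instruction count, branch frequency, prediction accuracy, width, and resolve depth are all assumed numbers for illustration, not figures from the lecture.

```python
def squashed_instructions(mispredictions, resolve_depth, width):
    # Each misprediction discards `resolve_depth` fetch groups of up to
    # `width` instructions fetched down the wrong path.
    return mispredictions * resolve_depth * width

instructions = 1_000_000_000
branches = instructions * 0.20              # assume 20% branches
mispredicts = branches * (1 - 0.95)         # assume 95% accuracy
wasted = squashed_instructions(mispredicts, resolve_depth=10, width=4)
# roughly 4e8: even at 95% accuracy, a wide, deep machine squashes
# hundreds of millions of instructions' worth of fetch/execute energy.
```

This is why the power cost is listed as "large" even though the performance cost is minimal.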

What else can be speculated?
Dependencies
– "I think this data is coming from that store instruction"
Values
– "I think I will load a 0 value"
Accuracy?
– Branch prediction (direction) is Boolean (T, NT)
– Branch targets are stable or predictable (RAS)
– Dependencies are limited
– Values cover a huge space (0 – 4B)