
EECE476: Computer Architecture Lecture 19: Pipelining Reducing Control Hazard Penalty Chapter 6.6 The University of British ColumbiaEECE 476© 2005 Guy Lemieux

2 Reminders
Midterm 1: Tomorrow!
 – 50 minutes, not open book, calculator OK
 – Based on your assignments + lecture material
 – Covers EVERYTHING up to end of week 5
 – No Verilog, Altera tools on midterm
 – Try Lecture 14 study problems
 – Try extra study problems on web
Partner signup deadline: Friday!
 – MUST work in pairs
 – EMAIL ME IMMEDIATELY IF YOU’RE STILL ALONE
 – MUST email to project TA: 2 names, student #s, emails
 – Late penalty: 5% of final grade

3 Directly Reducing Penalty of Control Hazards
Control hazards demand solutions:
 – Stalling
 – Nullify
 – Branch delay slots
 – All of these negatively impact performance (in various ways)
Alternative
 – Can we directly reduce the negative impact of control hazards?
Yes!
 – Execute branch/jump instruction earlier in pipeline
 – Outcome known sooner
 – Fewer instructions enter pipeline after branch (before outcome known)
 – For BEQ, we must detect “equals” earlier

4 BEQ in “X” Stage Instead?
Move logic gates from “M” stage into “X” stage
 – “AND” gate, PCSrc mux
 – These logic gates depend on results *after* ALU
Benefit
 – Only 2 instructions follow BEQ into pipeline
 – Improves only BEQ-if-taken performance
But…
 – ALU may be slowest part of pipeline
 – Causes longer delay path after ALU, “X” stage slower
 – Clock rate may be affected
 – This may negatively affect ALL instructions, be careful!
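The trade-off on slides 3 and 4 (fewer instructions behind the branch versus a possibly slower clock) can be sketched in Python. This is only an illustrative model, not part of the lecture: the stage names come from the slides, while the function names, the branch frequency, the taken rate, and the clock periods are hypothetical assumptions.

```python
STAGES = ["I", "D", "X", "M", "W"]  # five-stage pipeline order used in the slides


def instructions_after_branch(resolve_stage):
    """Instructions that enter the pipeline behind a branch before its
    outcome is known: one for every stage in front of the resolving stage."""
    return STAGES.index(resolve_stage)


def avg_time_per_instr(resolve_stage, clock_ns, branch_frac, taken_frac):
    """Average time per instruction in ns.

    Assumption (hypothetical model): the instructions behind a branch are
    wasted only when the branch is taken, and nothing else stalls.
    """
    penalty = instructions_after_branch(resolve_stage)
    cpi = 1.0 + branch_frac * taken_frac * penalty
    return cpi * clock_ns


# Made-up numbers: 20% branches, 60% of them taken.
# Resolving in X is assumed to stretch the clock from 1.0 ns to 1.1 ns.
in_m = avg_time_per_instr("M", clock_ns=1.0, branch_frac=0.2, taken_frac=0.6)
in_x = avg_time_per_instr("X", clock_ns=1.1, branch_frac=0.2, taken_frac=0.6)
print(f"resolve in M: {in_m:.3f} ns, resolve in X: {in_x:.3f} ns")
```

With these made-up numbers both configurations land at about 1.36 ns per instruction, which is the point of the warning: a 10% slower clock can cancel the gain from the shorter branch penalty.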

5 Moving BEQ into “X” Stage
[Datapath figure: red shows extra signal delay after ALU]

6 BEQ in “D” Stage Instead?
Move logic gates from “M” stage into “D” stage
 – But we need ALU to compute (Rs – Rt) and Zero
 – How can we do this?
Key benefit
 – Only 1 instruction follows “BEQ” into pipeline
Notice
 – “EQ” can be computed efficiently in “D” stage
     (Rs XOR Rt) == 32’b0
     Simple bitwise XOR
     “== 32’b0” is simple: wide 32-input NOR gate, single output
 – No need for subtraction
     Simple logic, no carry chain
 – Move the “+ SgnExt(Imm16)” adder logic (branch target address) into “D” stage as well
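The XOR-then-NOR equality check can be mimicked in Python on 32-bit values. This is a behavioural sketch only, not hardware; the function names and the `MASK32` constant are made up for illustration.

```python
MASK32 = 0xFFFFFFFF  # keep values to 32 bits, like the registers on the slide


def eq_xor_nor(rs, rt):
    """Equality as on the slide: bitwise XOR, then a wide NOR."""
    diff = (rs ^ rt) & MASK32  # a 1 bit wherever rs and rt differ
    return diff == 0           # 32-input NOR: true only if every bit is 0


def eq_subtract(rs, rt):
    """Equality the ALU way: subtract, then test the Zero flag."""
    return ((rs - rt) & MASK32) == 0


print(eq_xor_nor(0x1234, 0x1234), eq_xor_nor(0x1234, 0x1235))
```

Both functions give the same answer for any pair of 32-bit values. The difference is in hardware cost: the subtraction needs a carry chain, while the XOR and NOR do not.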

7 Detecting “EQ” Earlier

8 Reducing Branch Penalty
Reducing branch penalty
 – Compute (Rs == Rt) and target address in “D” stage
 – Reduces branch delay to 1 cycle
 – Works well
But, this introduces a forwarding error! Suppose
     ADD $1, $2, $3    ← previous instruction
     BEQ $1, $2, 7     ← RAW hazard: needs new $1, forwarding?
     NOP               ← delay slot
 – Dependency causes data hazard
Solutions?
 – Option 1: Stall until writeback of dependent instruction
 – Option 2A: Forward as much as possible (stall 1 cycle for LW)
 – Option 2B: Forward a bit less (stall 2 cycles for LW, 1 cycle for others)
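The RAW dependency in the ADD/BEQ example amounts to a register-number comparison. The Python sketch below shows one way to express that check; the `Instr` record and the function name are hypothetical, and a real hazard detection unit would compare fields held in pipeline registers rather than Python tuples.

```python
from collections import namedtuple

# Hypothetical instruction record: opcode, destination register, source registers.
Instr = namedtuple("Instr", ["op", "dest", "srcs"])


def beq_raw_hazard(prev, branch):
    """True if the branch reads a register that the previous instruction writes.

    Register $0 is hardwired to zero in MIPS, so a write to it never
    creates a dependency.
    """
    if prev.dest is None or prev.dest == 0:
        return False
    return prev.dest in branch.srcs


add = Instr("ADD", dest=1, srcs=(2, 3))     # ADD $1, $2, $3
beq = Instr("BEQ", dest=None, srcs=(1, 2))  # BEQ $1, $2, 7
print(beq_raw_hazard(add, beq))
```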

9 Data Hazard with “BEQ”
Example: Clock cycle 1
     1  ADD $1,$2,$3    I D X M W
        BEQ $1,$2, 7    I D X M W
        NOP             I D X M W
[Pipeline diagram; datapath stages: I D X M W]

10 Data Hazard with “BEQ”
Clock cycle 2
     1  ADD $1,$2,$3    I D X M W
     2  BEQ $1,$2, 7    I D X M W
        NOP             I D X M W
[Pipeline diagram; datapath stages: I D X M W]

11 Data Hazard with “BEQ”
Clock cycle 3: normal forwarding into X doesn’t work, arrives too late!
     1  ADD $1,$2,$3    I D X M W
     2  BEQ $1,$2, 7    I D X M W
     3  NOP             I D X M W
[Pipeline diagram; datapath stages: I D X M W]

12 Forwarding with “BEQ”
Clock cycle 4, Option 2B: insert “bubble”, forward to D
     1  ADD $1,$2,$3    I D X M W
     2  ? (bubble)      I D X ? ?
     3  BEQ $1,$2, 7    I D D X M W
     4  NOP             I D X M W
[Pipeline diagram; datapath stages: I D ? M W, with the bubble in X]

13 Cause of the Error?
Moved “BEQ” execution from X to D
 – PROBLEM: pipeline currently only forwards data into X stage
To resolve, we have two options:
 – Option 1: Low performance, add Hazard Detection Unit (HDU) condition
     BEQ depends on earlier instruction
     Stall >= 1 cycle until dependent instruction finishes writeback
     If not told otherwise, assume this approach
 – Option 2: Higher performance, add Forward Detection Unit (FDU) and muxes
     Option 2A: Forward data from ALU out, DataMem out, W result into D stage
       – Avoids most stalls (no HDU needed, except for LW case again).
       – Longer delay, will probably slow clock and affect all instructions.
     Option 2B: Forward data from M, W stages into D stage; stall if dependent instruction is in X
       – HDU must now stall 1 cycle when BEQ depends on immediately prior R-type instruction.
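The stall counts quoted for Options 2A and 2B follow from asking which stage the producing instruction must reach before its result can be forwarded into D. The Python sketch below is a simplified model, not the course's HDU design; the table and function names are hypothetical. Option 1 is left out because its stall count depends on register-file timing that the slides do not spell out.

```python
STAGES = ["I", "D", "X", "M", "W"]

# Earliest stage the producer must occupy for its result to reach D.
# 2A forwards straight from ALU out (X) and DataMem out (M);
# 2B forwards only from the M and W stages.
FORWARD_FROM = {
    "2A": {"R": "X", "LW": "M"},
    "2B": {"R": "M", "LW": "W"},
}


def beq_stalls(option, producer, distance=1):
    """Stall cycles for a BEQ in D that depends on an earlier instruction.

    producer: "R" for an R-type result, "LW" for a load.
    distance: 1 means the producer is the instruction just before the BEQ,
              so it sits in X when the BEQ first reaches D.
    """
    ready = STAGES.index(FORWARD_FROM[option][producer])
    producer_stage = STAGES.index("D") + distance
    return max(0, ready - producer_stage)


for option in ("2A", "2B"):
    for producer in ("R", "LW"):
        print(option, producer, beq_stalls(option, producer))
```

For an immediately preceding producer this gives 0 and 1 stall cycles under 2A (R-type, LW) and 1 and 2 under 2B, matching the counts listed on slide 8.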

14 Control Hazards Summary
Branches/jumps cause interruptions to control flow
 – This affects the stream of instructions entering pipeline after the branch/jump
These interruptions cause a utilization problem
 – We may fetch the wrong instruction(s) after branch/jump
     Option 1: stall after every branch/jump
     Option 2: nullify-if-branch-taken (small performance improvement)
     Option 3: declare as a “delay slot”, always-execute (bad idea for future ISAs)
 – Default: assume MIPS behaviour: Option 3 with 1 delay slot
To reduce utilization problem, move branch/jump to D stage
 – This introduces new data hazards if the branch depends upon recent instructions
These hazards introduce a new forwarding problem
 – Branch/jump may depend on result of recent instruction(s)
     Option 1: HDU forces stall until writeback (multiple cycles)
     Option 2: minimal HDU stall, forward data when dependence can be resolved
       – 2A: more forwarding needed, stall only for LW, may slow down clock speed
       – 2B: less forwarding needed, stall for LW and R-type until forwarding can be used
 – Default: assume Option 1 (stall until writeback) for this course