
Prof. John Nestor ECE Department Lafayette College Easton, Pennsylvania 18042 ECE 313 - Computer Organization Lecture 18 - Pipelined.




1 Prof. John Nestor
ECE Department, Lafayette College, Easton, Pennsylvania 18042
nestorj@lafayette.edu
ECE 313 - Computer Organization
Lecture 18 - Pipelined Processor Design 2
Fall 2004
Reading: 6.3-6.6, 6.8
Portions of these slides are derived from:
- Textbook figures © 1998 Morgan Kaufmann Publishers, all rights reserved
- Tod Amon's COD2e Slides © 1998 Morgan Kaufmann Publishers, all rights reserved
- Dave Patterson's CS 152 Slides - Fall 1997 © UCB
- Rob Rutenbar's 18-347 Slides - Fall 1999 CMU
- other sources as noted

2 ECE 313 Fall 2004 Lecture 18 - Pipelining 2
Pipelining Outline
- Introduction
- Pipelined Processor Design ←
  - Datapath
  - Control
  - Dealing with Hazards & Forwarding
  - Branch Prediction
  - Exceptions
  - Performance
- Advanced Pipelining
  - Superscalar
  - Dynamic Pipelining
  - Examples

3 Pipelining in MIPS
- MIPS architecture was designed to be pipelined
  - Simple instruction format (makes IF, ID easy)
    - Single-word instructions
    - Small number of instruction formats
    - Common fields in same place (e.g., rs, rt) in different formats
  - Memory operations only in lw, sw instructions (simplifies EX)
  - Memory operands aligned in memory (simplifies MEM)
  - Single value for writeback (limits forwarding)
- Pipelining is harder in CISC architectures

4 Pipelined Datapath with Control Signals

5 Next Step: Adding Control
- Basic approach: build on single-cycle control
  - Place control unit in ID stage
  - Pass control signals to following stages
- Later: extra features to deal with:
  - Data forwarding
  - Stalls
  - Exceptions

6 Control for Pipelined Datapath
Source: Book Fig. 6.29, p. 469
Control signals shown: RegDst, ALUOp[1:0], ALUSrc, MemRead, MemWrite, Branch, RegWrite, MemtoReg
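A minimal Python sketch of "decode once in ID, then pass the signals along". It assumes the textbook's single-cycle control values for R-type, lw, sw and beq (don't-care values written as 0) and groups them by the stage that consumes them; the names `CONTROL`, `CARRIED` and `stage_bundles` are illustrative only.

```python
# Single-cycle control values per instruction class (don't-cares written as 0),
# grouped by the pipeline stage that uses them.
CONTROL = {
    "R-type": {"EX": {"RegDst": 1, "ALUOp": 0b10, "ALUSrc": 0},
               "MEM": {"Branch": 0, "MemRead": 0, "MemWrite": 0},
               "WB": {"RegWrite": 1, "MemtoReg": 0}},
    "lw":     {"EX": {"RegDst": 0, "ALUOp": 0b00, "ALUSrc": 1},
               "MEM": {"Branch": 0, "MemRead": 1, "MemWrite": 0},
               "WB": {"RegWrite": 1, "MemtoReg": 1}},
    "sw":     {"EX": {"RegDst": 0, "ALUOp": 0b00, "ALUSrc": 1},
               "MEM": {"Branch": 0, "MemRead": 0, "MemWrite": 1},
               "WB": {"RegWrite": 0, "MemtoReg": 0}},
    "beq":    {"EX": {"RegDst": 0, "ALUOp": 0b01, "ALUSrc": 0},
               "MEM": {"Branch": 1, "MemRead": 0, "MemWrite": 0},
               "WB": {"RegWrite": 0, "MemtoReg": 0}},
}

# Control groups each pipeline register still has to carry forward.
CARRIED = {
    "ID/EX": ("EX", "MEM", "WB"),
    "EX/MEM": ("MEM", "WB"),
    "MEM/WB": ("WB",),
}


def stage_bundles(instr_class, pipeline_register):
    """Return the control fields stored in `pipeline_register` for one instruction."""
    signals = CONTROL[instr_class]
    return {group: signals[group] for group in CARRIED[pipeline_register]}
```

For example, `stage_bundles("lw", "MEM/WB")` keeps only the WB fields (RegWrite = 1, MemtoReg = 1): the EX and MEM signals have already been used, so each pipeline register only needs the signals for stages the instruction has not yet reached.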

7 Control for Pipelined Datapath
Source: Book Fig. 6.25, p. 401

8 Datapath and Control Unit

9 Tracking Control Signals - Cycle 1 (figure: LW in IF)

10 Tracking Control Signals - Cycle 2 (figure: SW in IF, LW in ID)

11 Tracking Control Signals - Cycle 3 (figure: ADD in IF, SW in ID, LW in EX)

12 Tracking Control Signals - Cycle 4 (figure: SUB in IF, ADD in ID, SW in EX, LW in MEM)

13 Tracking Control Signals - Cycle 5 (figure: SUB in ID, ADD in EX, SW in MEM, LW in WB)
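The stage contents over these five cycles can be reproduced with a short Python model. It assumes the instruction order LW, SW, ADD, SUB, one fetch per cycle, and no stalls; `occupancy` is a hypothetical helper name.

```python
STAGES = ("IF", "ID", "EX", "MEM", "WB")


def occupancy(program, cycle):
    """Map each stage to the instruction it holds in `cycle` (cycles numbered from 1).

    Assumes in-order fetch of one instruction per cycle and no stalls or flushes.
    """
    result = {}
    for depth, stage in enumerate(STAGES):
        index = cycle - 1 - depth  # instruction fetched `depth` cycles earlier
        if 0 <= index < len(program):
            result[stage] = program[index]
    return result


program = ["lw", "sw", "add", "sub"]
for cycle in range(1, 6):
    print(cycle, occupancy(program, cycle))
```

In cycle 5 the model gives SUB in ID, ADD in EX, SW in MEM and LW in WB; each instruction's control bundle moves one stage to the right at every clock edge.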

14 Pipelining Outline - Coming Up
- Introduction
- Pipelined Processor Design
  - Datapath
  - Control
  - Dealing with Hazards & Forwarding ←
  - Branch Prediction
  - Exceptions
  - Performance
- Advanced Pipelining
  - Superscalar
  - Dynamic Pipelining
  - Examples

15 Data Hazards Revisited
- Data hazards occur when a result is used before it has been stored in the register file (Fig. 6.28)
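To make "used before it is stored" concrete, here is a Python sketch that lists read-after-write hazards in a straight-line sequence. It assumes a 5-stage pipeline without forwarding in which the register file is written in the first half of a cycle and read in the second half, so only the next two instructions would read a stale value. The instruction sequence and the name `find_hazards` are hypothetical.

```python
def find_hazards(program, window=2):
    """Return (producer, consumer, register) triples for reads that come too early.

    program: list of (dest, sources) tuples; dest is None for instructions
             that do not write a register (e.g. sw).
    window:  number of following instructions that would read a stale value.
    """
    hazards = []
    for i, (dest, _) in enumerate(program):
        if dest is None or dest == "$0":   # $0 is never really written
            continue
        for j in range(i + 1, min(i + 1 + window, len(program))):
            dest_j, sources_j = program[j]
            if dest in sources_j:
                hazards.append((i, j, dest))
            if dest_j == dest:             # later readers depend on instruction j instead
                break
    return hazards


# Hypothetical sequence: sub writes $2, which the following instructions read.
program = [
    ("$2", ("$1", "$3")),     # sub $2, $1, $3
    ("$12", ("$2", "$5")),    # and $12, $2, $5
    ("$13", ("$6", "$2")),    # or  $13, $6, $2
    ("$14", ("$2", "$2")),    # add $14, $2, $2
    (None, ("$15", "$2")),    # sw  $15, 100($2)
]
print(find_hazards(program))  # [(0, 1, '$2'), (0, 2, '$2')]
```

Only the `and` and `or` read `$2` too early; the `add` three instructions later reads it in the same cycle it is written, which the split-cycle register file handles.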

16 Data Hazard Solution: Forwarding
- Key idea: pass the result internally to the instruction that needs it before it is stored in the register file (Fig. 6.29)

17 Data Hazard Solution: Forwarding
- Add hardware to feed back ALU and MEM results to both ALU inputs (Fig. 6.32)

18 Controlling Forwarding
- Need to test when register numbers match in the rs, rt, and rd fields stored in pipeline registers
- "EX" hazard:
  - EX/MEM: test whether the instruction writes the register file and examine its rd register
  - ID/EX: test whether the instruction reads an rs or rt register that matches the rd register in EX/MEM
- "MEM" hazard:
  - MEM/WB: test whether the instruction writes the register file and examine its rd (rt) register
  - ID/EX: test whether the instruction reads an rs or rt register that matches the rd (rt) register in MEM/WB

19 Forwarding Unit Detail - EX Hazard
    if (EX/MEM.RegWrite
        and (EX/MEM.RegisterRd ≠ 0)
        and (EX/MEM.RegisterRd = ID/EX.RegisterRs)) ForwardA = 10
    if (EX/MEM.RegWrite
        and (EX/MEM.RegisterRd ≠ 0)
        and (EX/MEM.RegisterRd = ID/EX.RegisterRt)) ForwardB = 10

20 Forwarding Unit Detail - MEM Hazard
    if (MEM/WB.RegWrite
        and (MEM/WB.RegisterRd ≠ 0)
        and (MEM/WB.RegisterRd = ID/EX.RegisterRs)) ForwardA = 01
    if (MEM/WB.RegWrite
        and (MEM/WB.RegisterRd ≠ 0)
        and (MEM/WB.RegisterRd = ID/EX.RegisterRt)) ForwardB = 01

21 EX Hazard Complication
- What if a register is changed more than once?
    add $1, $1, $2
    add $1, $1, $3
    add $1, $1, $4
- Answer: forward the most recent result (the one in the MEM stage)

22 Forwarding Unit Detail - MEM Hazard Revised
    if (MEM/WB.RegWrite
        and (MEM/WB.RegisterRd ≠ 0)
        and (EX/MEM.RegisterRd ≠ ID/EX.RegisterRs)
        and (MEM/WB.RegisterRd = ID/EX.RegisterRs)) ForwardA = 01
    if (MEM/WB.RegWrite
        and (MEM/WB.RegisterRd ≠ 0)
        and (EX/MEM.RegisterRd ≠ ID/EX.RegisterRt)
        and (MEM/WB.RegisterRd = ID/EX.RegisterRt)) ForwardB = 01
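The EX and revised MEM conditions can be combined into one forwarding-unit sketch in Python. The `PipeReg` record and function names are hypothetical, and only the fields the conditions use are modeled. Checking the EX hazard first gives the "most recent result wins" priority. Note one deliberate difference: here MEM/WB forwarding is suppressed only when the full EX condition holds, whereas the slide's guard compares register numbers alone, so as written it would also block forwarding when the instruction in EX/MEM does not write the register file.

```python
from dataclasses import dataclass


@dataclass
class PipeReg:
    """Subset of pipeline-register fields used by the forwarding unit."""
    reg_write: bool = False
    rd: int = 0   # destination register number (rt for lw)
    rs: int = 0
    rt: int = 0


def forward_select(ex_mem, mem_wb, id_ex, source):
    """Mux select for one ALU input: '10' from EX/MEM, '01' from MEM/WB, '00' from registers."""
    src = id_ex.rs if source == "rs" else id_ex.rt
    # EX hazard: the newest result (in EX/MEM) takes priority.
    if ex_mem.reg_write and ex_mem.rd != 0 and ex_mem.rd == src:
        return "10"
    # MEM hazard: only reached when the EX hazard did not fire.
    if mem_wb.reg_write and mem_wb.rd != 0 and mem_wb.rd == src:
        return "01"
    return "00"


def forwarding_unit(ex_mem, mem_wb, id_ex):
    """Return (ForwardA, ForwardB) for the instruction currently in EX."""
    return (forward_select(ex_mem, mem_wb, id_ex, "rs"),
            forward_select(ex_mem, mem_wb, id_ex, "rt"))


# Third "add $1, $1, $4" in EX; both older adds wrote $1.
third = PipeReg(rs=1, rt=4)
print(forwarding_unit(PipeReg(True, 1), PipeReg(True, 1), third))  # ('10', '00')
```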

23 Forwarding Elaboration (Fig. 6.33)
- Extra 2:1 mux needed for immediate instructions (labeled "Added Mux" in the figure)

24 Data Hazards and Stalls
- We still have to stall when a register is loaded from memory and used by the following instruction (Fig. 6.34)

25 Data Hazards and Stalls
- Add a hazard detection unit to detect this condition and stall (Fig. 6.35)
- Note: the figure contains a typo; it should read AND

26 Pipelined Processor with Hazard Detection (Fig. 6.36)

27 Data Transfer Instructions - Binary Representation
- Used for load and store instructions
  - op: basic operation of the instruction (opcode)
  - rs: first register source operand (base address)
  - rt: second register operand (source for sw, destination for lw)
  - offset: 16-bit signed address offset (-32,768 to +32,767)
- Also called "I-Format" or "I-Type" instructions

    | op     | rs     | rt     | offset  |
    | 6 bits | 5 bits | 5 bits | 16 bits |
    (rs and offset together form the address)
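The field layout can be checked by packing and unpacking an instruction word. This Python sketch uses the standard lw opcode (35) and register numbers $s3 = 19 and $t0 = 8; the function names are hypothetical. Decoding sign-extends the 16-bit offset, which is what gives the -32,768 to +32,767 range.

```python
def encode_i_format(op, rs, rt, offset):
    """Pack an I-format instruction: op(6) | rs(5) | rt(5) | offset(16)."""
    if not -32768 <= offset <= 32767:
        raise ValueError("offset must fit in 16 signed bits")
    return (op << 26) | (rs << 21) | (rt << 16) | (offset & 0xFFFF)


def decode_i_format(word):
    """Split a 32-bit I-format word into fields, sign-extending the offset."""
    op = (word >> 26) & 0x3F
    rs = (word >> 21) & 0x1F
    rt = (word >> 16) & 0x1F
    offset = word & 0xFFFF
    if offset & 0x8000:          # bit 15 set: the offset is negative
        offset -= 0x10000
    return op, rs, rt, offset


# lw $t0, 32($s3): opcode 35, rs = $s3 (19), rt = $t0 (8)
word = encode_i_format(35, 19, 8, 32)
print(hex(word))               # 0x8e680020
print(decode_i_format(word))   # (35, 19, 8, 32)
```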

28 Hazard Detection Unit - Control Detail
    if (ID/EX.MemRead
        and ((ID/EX.RegisterRt = IF/ID.RegisterRs)
             or (ID/EX.RegisterRt = IF/ID.RegisterRt)))
        stall the pipeline

29 Hazard Detection Unit - What Happens
- MUX zeros out the control signals for the instruction in ID
  - "squashes" the instruction
  - the resulting "no-op" propagates through the following stages
- IF/ID holds the stalled instruction until the next clock cycle
- PC holds its current value until the next clock cycle (the same instruction is fetched again)
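The detection condition and its three effects can be sketched in Python as follows. The action names `pc_write`, `if_id_write` and `zero_control` are hypothetical labels for "PC holds", "IF/ID holds" and "MUX zeros out control"; register numbers are plain integers.

```python
def load_use_hazard(id_ex_mem_read, id_ex_rt, if_id_rs, if_id_rt):
    """True when a load in EX writes a register that the instruction in ID reads."""
    return id_ex_mem_read and (id_ex_rt == if_id_rs or id_ex_rt == if_id_rt)


def hazard_unit(id_ex_mem_read, id_ex_rt, if_id_rs, if_id_rt):
    """Return the stall actions as hypothetical control bits."""
    stall = load_use_hazard(id_ex_mem_read, id_ex_rt, if_id_rs, if_id_rt)
    return {
        "pc_write": not stall,      # PC keeps its value during a stall
        "if_id_write": not stall,   # IF/ID keeps the stalled instruction
        "zero_control": stall,      # mux sends a no-op (bubble) into ID/EX
    }


# lw $2, 20($1) in EX, followed by and $4, $2, $5 in ID: stall one cycle.
print(hazard_unit(True, 2, 2, 5))
# If the instruction in EX is not a load, forwarding covers it: no stall.
print(hazard_unit(False, 2, 2, 5))
```

After the one-cycle bubble, the load has reached MEM/WB and its value can be forwarded to the dependent instruction.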

30 Branch Hazards
- Just stalling for each branch is not practical
- Common assumption: branch not taken
- When the assumption fails: flush three instructions (Fig. 6.37)

31 Reducing Branch Delay
- Key idea: move branch logic to the ID stage of the pipeline
  - New adder calculates the branch target: PC + 4 + (sign-extend(IMM) << 2)
  - New hardware tests rs == rt after register read
  - Add a flush signal to squash the instruction in the IF/ID register
- Reduced penalty (1 cycle) when branch taken
- Example: Figure 6.38, p. 420
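The target computation and the equality test done in ID can be written out in a few lines of Python. This is a sketch assuming a 32-bit PC and a beq-style comparison; `branch_target` and `resolve_beq` are hypothetical names. Note the parentheses: the sign-extended offset is shifted first, then added to PC + 4.

```python
def sign_extend16(imm):
    """Interpret the low 16 bits of `imm` as a two's-complement value."""
    imm &= 0xFFFF
    return imm - 0x10000 if imm & 0x8000 else imm


def branch_target(pc, imm16):
    """Target of a taken branch at address `pc`: PC + 4 + (sign-extend(IMM) << 2)."""
    return (pc + 4 + (sign_extend16(imm16) << 2)) & 0xFFFFFFFF


def resolve_beq(pc, imm16, rs_value, rt_value):
    """Resolve beq in ID: return (taken, address of the next instruction to execute)."""
    taken = rs_value == rt_value
    return taken, branch_target(pc, imm16) if taken else pc + 4


print(hex(branch_target(0x40, 3)))       # 0x50
print(hex(branch_target(0x40, 0xFFFF)))  # 0x40 (offset -1 word)
```

When `taken` is true, the instruction already fetched from PC + 4 is the one squashed by the IF/ID flush, giving the 1-cycle penalty.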

32 Pipelined Processor - Branch Hardware in ID (Old Fig. 6.51)

33 Pipelining Outline
- Introduction
- Pipelined Processor Design
  - Datapath
  - Control
  - Dealing with Hazards & Forwarding
  - Branch Prediction ←
  - Exceptions
  - Performance
- Advanced Pipelining
  - Superscalar
  - Dynamic Pipelining
  - Examples

34 Branch Prediction
- Key idea: instead of always assuming the branch is not taken, use a prediction based on previous history
- Branch history table:
  - small memory indexed by the lower bits of the instruction address
  - saves "what happened" on the last execution: branch taken or branch not taken
- Use history to make prediction

35 More about Branch Prediction
- Consider nested loops:

    for (i=1; i<M; i++) {
    oloop: ...
        for (j=1; j<N; j++) {
        iloop: ...
        }   bne $1, $2, iloop
    }   bne $3, $4, oloop

- With one bit of history, the inner-loop branch is mispredicted on its first and last execution in each pass of the outer loop
- More history can improve performance
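A 1-bit branch history table and the nested-loop behavior can be simulated in Python. The table size (16 entries), the initial "not taken" state, the branch address and the class and function names are all assumptions for illustration. The inner `bne` is modeled as taken on every iteration except the last.

```python
class OneBitBHT:
    """Branch history table with one bit per entry."""

    def __init__(self, entries=16):
        if entries & (entries - 1) != 0:
            raise ValueError("entries must be a power of two")
        self.mask = entries - 1
        self.taken = [False] * entries    # start by predicting "not taken"

    def index(self, pc):
        # Drop the two byte-offset bits, then keep the low-order bits of the address.
        return (pc >> 2) & self.mask

    def predict(self, pc):
        return self.taken[self.index(pc)]

    def update(self, pc, outcome):
        self.taken[self.index(pc)] = outcome


def inner_branch_outcomes(outer, inner):
    """Inner bne outcomes: taken inner-1 times, then not taken, once per outer pass."""
    return ([True] * (inner - 1) + [False]) * outer


def count_mispredictions(bht, pc, outcomes):
    misses = 0
    for outcome in outcomes:
        if bht.predict(pc) != outcome:
            misses += 1
        bht.update(pc, outcome)
    return misses


misses = count_mispredictions(OneBitBHT(), 0x400020, inner_branch_outcomes(outer=3, inner=10))
print(misses)  # 6: the first and last execution in each of the 3 passes
```

Because the table is indexed only by low address bits, two branches whose addresses differ by a multiple of 64 bytes share an entry here; a small table trades accuracy for size.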

36 Branch Prediction with 2-Bit History
- Key idea: the prediction must be wrong twice before it changes
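One common way to implement "wrong twice before changing" is a 2-bit saturating counter, sketched below in Python; the state diagram in the slide's figure may differ in detail. States 0 and 1 predict not taken, 2 and 3 predict taken. The starting state (0) and the loop pattern (inner branch taken 9 times, then not taken, for 3 passes) are assumptions.

```python
class TwoBitCounter:
    """2-bit saturating counter: states 0, 1 predict not taken; 2, 3 predict taken."""

    def __init__(self, state=0):
        self.state = state

    def predict(self):
        return self.state >= 2

    def update(self, taken):
        if taken:
            self.state = min(self.state + 1, 3)
        else:
            self.state = max(self.state - 1, 0)


def mispredictions(counter, outcomes):
    misses = 0
    for taken in outcomes:
        if counter.predict() != taken:
            misses += 1
        counter.update(taken)
    return misses


outcomes = ([True] * 9 + [False]) * 3
print(mispredictions(TwoBitCounter(), outcomes))  # 5
```

The 5 misses are two warm-up misses plus one per pass at the loop exit. A single not-taken outcome only moves the counter from 3 to 2, so re-entering the loop is still predicted taken; a 1-bit scheme would miss twice per pass on the same pattern.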

37 Pipelining Outline
- Introduction
- Pipelined Processor Design
  - Datapath
  - Control
  - Dealing with Hazards & Forwarding
  - Branch Prediction
  - Exceptions ←
  - Performance
- Advanced Pipelining
  - Superscalar
  - Dynamic Pipelining
  - Examples

38 Pipelining and Exceptions
- Exceptions require suspension of execution
- Complicating factors:
  - Several instructions are in the pipeline
  - An exception may occur before an instruction is complete
  - Must flush the pipeline to suspend execution, but may lose information about the exception

39 Pipelining and Exceptions (cont'd) (Fig. 6.42, old 6.55)

40 Pipelining and Exceptions (cont'd)
- Operation: Figure 6.43 (p. 508)
- Exceptions make life difficult; take a computer architecture course to learn more.

41 Pipelining Outline
- Introduction
- Pipelined Processor Design
  - Datapath
  - Control
  - Dealing with Hazards & Forwarding
  - Branch Prediction
  - Exceptions
  - Performance ←
- Advanced Pipelining
  - Superscalar
  - Dynamic Pipelining
  - Examples

42 Performance of the Pipelined Implementation
- Use the gcc instruction mix to calculate CPI:

    Instruction  Frequency  Cycles
    lw           25%        1 (2 when there is a load-use hazard)
    sw           10%        1
    R-type       52%        1
    branch       11%        1 (2 when the prediction is wrong)
    jump          2%        2

- Assumptions:
  - 50% of load instructions are followed by an immediate use
  - 25% of branch predictions are wrong
- Calculating CPI:
  - Average lw cycles = 0.5 * 1 + 0.5 * 2 = 1.5; average branch cycles = 0.75 * 1 + 0.25 * 2 = 1.25
  - CPI = (1.5 cycles * 0.25) + (1 cycle * 0.10) + (1 cycle * 0.52) + (1.25 cycles * 0.11) + (2 cycles * 0.02)
  - CPI = 1.17 cycles per instruction
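The CPI arithmetic can be checked with a few lines of Python. Frequencies, penalties and assumptions are the slide's; the variable names are illustrative. Each class contributes its frequency times its average cycle count, where the average folds in the stall probability.

```python
LOAD_USE_FRACTION = 0.50     # loads followed immediately by a use (1 extra cycle)
MISPREDICT_FRACTION = 0.25   # wrongly predicted branches (1 extra cycle)

# instruction class -> (fraction of instructions, average cycles)
mix = {
    "lw":     (0.25, 1 + LOAD_USE_FRACTION * 1),     # 1.5 cycles
    "sw":     (0.10, 1.0),
    "R-type": (0.52, 1.0),
    "branch": (0.11, 1 + MISPREDICT_FRACTION * 1),   # 1.25 cycles
    "jump":   (0.02, 2.0),
}


def average_cpi(mix):
    """Weighted average of cycles per instruction over the mix."""
    return sum(freq * cycles for freq, cycles in mix.values())


print(round(average_cpi(mix), 4))  # 1.1725, rounded to 1.17 on the slide
```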

43 Performance of the Pipelined Implementation (cont'd)
- Calculate the average execution time:
    Pipelined     1.17 CPI * 200 ps/clock = 234 ps
    Single-cycle  1 CPI * 600 ps/clock    = 600 ps
    Multicycle    4.12 CPI * 200 ps/clock = 824 ps
- Speedup of pipelined implementation:
  - 2.56X faster than single cycle (600 / 234)
  - 3.5X faster than multicycle (824 / 234)
- CPI varies with the instruction mix, i.e., with the benchmark used
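The same comparison in Python, taking the slide's CPI values and clock periods as given (including 4.12 CPI for the multicycle design). Average time per instruction is CPI times clock period, and speedup is the ratio of those times.

```python
CLOCK_PS = {"pipelined": 200, "single-cycle": 600, "multicycle": 200}
CPI = {"pipelined": 1.17, "single-cycle": 1.0, "multicycle": 4.12}

# Average execution time per instruction, in picoseconds.
time_ps = {name: CPI[name] * CLOCK_PS[name] for name in CPI}

for name in ("single-cycle", "multicycle"):
    speedup = time_ps[name] / time_ps["pipelined"]
    print(f"pipelined is {speedup:.2f}x faster than {name}")
# pipelined is 2.56x faster than single-cycle
# pipelined is 3.52x faster than multicycle
```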

44 Pipelining Outline
- Introduction
- Pipelined Processor Design
  - Datapath
  - Control
  - Dealing with Hazards & Forwarding
  - Branch Prediction
  - Exceptions
  - Performance
- Advanced Pipelining ←
  - Superscalar
  - Dynamic Pipelining
  - Examples


