1 Pipelining Reconsider the data path we just did Each instruction takes from 3 to 5 clock cycles However, there are parts of hardware that are idle many.

Slides:



Advertisements
Similar presentations
Morgan Kaufmann Publishers The Processor
Advertisements

Lecture Objectives: 1)Define pipelining 2)Calculate the speedup achieved by pipelining for a given number of instructions. 3)Define how pipelining improves.
Review: Pipelining. Pipelining Laundry Example Ann, Brian, Cathy, Dave each have one load of clothes to wash, dry, and fold Washer takes 30 minutes Dryer.
Chapter Six 1.
11/1/2005Comp 120 Fall November Exam Postmortem 11 Classes to go! Read Sections 7.1 and 7.2 You read 6.1 for this time. Right? Pipelining then.
ECE 445 – Computer Organization
Prof. John Nestor ECE Department Lafayette College Easton, Pennsylvania ECE Computer Organization Pipelined Processor.
Review: MIPS Pipeline Data and Control Paths
ENEE350 Ankur Srivastava University of Maryland, College Park Based on Slides from Mary Jane Irwin ( )
1  1998 Morgan Kaufmann Publishers Chapter Six Enhancing Performance with Pipelining.
Mary Jane Irwin ( ) [Adapted from Computer Organization and Design,
ENEE350 Ankur Srivastava University of Maryland, College Park Based on Slides from Mary Jane Irwin ( )
Pipelining III Andreas Klappenecker CPSC321 Computer Architecture.
1  2004 Morgan Kaufmann Publishers Chapter Six. 2  2004 Morgan Kaufmann Publishers Pipelining The laundry analogy.
©UCB CS 162 Computer Architecture Lecture 3: Pipelining Contd. Instructor: L.N. Bhuyan
Chapter Six Enhancing Performance with Pipelining
1 Chapter Six - 2nd Half Pipelined Processor Forwarding, Hazards, Branching EE3055 Web:
1 Stalling  The easiest solution is to stall the pipeline  We could delay the AND instruction by introducing a one-cycle delay into the pipeline, sometimes.
L18 – Pipeline Issues 1 Comp 411 – Spring /03/08 CPU Pipelining Issues Finishing up Chapter 6 This pipe stuff makes my head hurt! What have you.
L17 – Pipeline Issues 1 Comp 411 – Fall /1308 CPU Pipelining Issues Finishing up Chapter 6 This pipe stuff makes my head hurt! What have you been.
1 CSE SUNY New Paltz Chapter Six Enhancing Performance with Pipelining.
Computer Organization Lecture Set – 06 Chapter 6 Huei-Yung Lin.
Lecture 28: Chapter 4 Today’s topic –Data Hazards –Forwarding 1.
1  1998 Morgan Kaufmann Publishers Chapter Six Enhancing Performance with Pipelining.
1  1998 Morgan Kaufmann Publishers Chapter Six. 2  1998 Morgan Kaufmann Publishers Pipelining Improve performance by increasing instruction throughput.
Lecture 15: Pipelining and Hazards CS 2011 Fall 2014, Dr. Rozier.
Pipeline Data Hazards: Detection and Circumvention Adapted from Computer Organization and Design, Patterson & Hennessy, © 2005, and from slides kindly.
Pipelined Datapath and Control
Chapter 4 CSF 2009 The processor: Pipelining. Performance Issues Longest delay determines clock period – Critical path: load instruction – Instruction.
Chapter 4 The Processor CprE 381 Computer Organization and Assembly Level Programming, Fall 2012 Revised from original slides provided by MKP.
CSE 340 Computer Architecture Summer 2014 Basic MIPS Pipelining Review.
CS.305 Computer Architecture Enhancing Performance with Pipelining Adapted from Computer Organization and Design, Patterson & Hennessy, © 2005, and from.
CMPE 421 Parallel Computer Architecture
1 Designing a Pipelined Processor In this Chapter, we will study 1. Pipelined datapath 2. Pipelined control 3. Data Hazards 4. Forwarding 5. Branch Hazards.
CMPE 421 Parallel Computer Architecture Part 2: Hardware Solution: Forwarding.
Chapter 6 Pipelined CPU Design. Spring 2005 ELEC 5200/6200 From Patterson/Hennessey Slides Pipelined operation – laundry analogy Text Fig. 6.1.
CECS 440 Pipelining.1(c) 2014 – R. W. Allison [slides adapted from D. Patterson slides with additional credits to M.J. Irwin]
Cs 152 L1 3.1 DAP Fa97,  U.CB Pipelining Lessons °Pipelining doesn’t help latency of single task, it helps throughput of entire workload °Multiple tasks.
Chap 6.1 Computer Architecture Chapter 6 Enhancing Performance with Pipelining.
CSIE30300 Computer Architecture Unit 04: Basic MIPS Pipelining Hsin-Chou Chi [Adapted from material by and
1 (Based on text: David A. Patterson & John L. Hennessy, Computer Organization and Design: The Hardware/Software Interface, 3 rd Ed., Morgan Kaufmann,
2/15/02CSE Data Hazzards Data Hazards in the Pipelined Implementation.
Computing Systems Pipelining: enhancing performance.
1  1998 Morgan Kaufmann Publishers Chapter Six. 2  1998 Morgan Kaufmann Publishers Pipelining Improve perfomance by increasing instruction throughput.
University of Texas at Austin CS352H - Computer Systems Architecture Fall 2009 Don Fussell CS352H: Computer Systems Architecture Topic 9: MIPS Pipeline.
Instructor: Senior Lecturer SOE Dan Garcia CS 61C: Great Ideas in Computer Architecture Pipelining Hazards 1.
LECTURE 7 Pipelining. DATAPATH AND CONTROL We started with the single-cycle implementation, in which a single instruction is executed over a single cycle.
CSE431 L06 Basic MIPS Pipelining.1Irwin, PSU, 2005 MIPS Pipeline Datapath Modifications  What do we need to add/modify in our MIPS datapath? l State registers.
Introduction to Computer Organization Pipelining.
Lecture 9. MIPS Processor Design – Pipelined Processor Design #1 Prof. Taeweon Suh Computer Science Education Korea University 2010 R&E Computer System.
L17 – Pipeline Issues 1 Comp 411 – Fall /23/09 CPU Pipelining Issues Read Chapter This pipe stuff makes my head hurt! What have you been.
Pipelining Intro Computer Organization 1 Computer Science Dept Va Tech January 2006 ©2006 McQuain & Ribbens Basic Instruction Timings Making some assumptions.
1  2004 Morgan Kaufmann Publishers No encoding: –1 bit for each datapath operation –faster, requires more memory (logic) –used for Vax 780 — an astonishing.
Pipelining: Implementation CPSC 252 Computer Organization Ellen Walker, Hiram College.
CSE 340 Computer Architecture Spring 2016 Overcoming Data Hazards.
Chapter Six.
Computer Organization
ECE232: Hardware Organization and Design
Chapter 4 The Processor Part 3
Review: MIPS Pipeline Data and Control Paths
Computer Organization CS224
Pipelined Control (Simplified)
Chapter Six.
Chapter Six.
The Processor Lecture 3.5: Data Hazards
November 5 No exam results today. 9 Classes to go!
CSC3050 – Computer Architecture
Pipelining (II).
Morgan Kaufmann Publishers The Processor
Systems Architecture II
Presentation transcript:

1 Pipelining Reconsider the data path we just did Each instruction takes from 3 to 5 clock cycles However, there are parts of hardware that are idle many time We can reorganize the operation Make each hardware block independent –1. Instruction Fetch Unit –2. Register Read Unit –3. ALU Unit –4. Data Memory Read/Write Unit –5. Register Write Unit Units in 3 and 5 cannot be independent, but operations can be Let each unit just do its required job for each instruction If for some instruction, a unit need not do anything, it can simply perform a noop

2 Gain of Pipelining Improve performance by increasing instruction throughput Ideal speedup is number of stages in the pipeline Do we achieve this? No, why not?

3 Pipelining What makes it easy –all instructions are the same length –just a few instruction formats –memory operands appear only in loads and stores What makes it hard? –structural hazards: suppose we had only one memory –control hazards: need to worry about branch instructions –data hazards: an instruction depends on a previous instruction We’ll study these issues using a simple pipeline Other complication: –exception handling –trying to improve performance with out-of-order execution, etc.

4 Basic Idea What do we need to add to actually split the datapath into stages?

5 Pipelined Data Path Can you find a problem even if there are no dependencies? What instructions can we execute to manifest the problem?

6 Corrected Data Path

7 Execution Time Time of n instructions depends on –Number of instructions n –# of stages k –# of control hazard and penalty of each step –# of data hazards and penalty for each Time = n + k load hazard penalty + branch penalty Load hazard penalty is 1 or 0 cycle –depending on data use with forwarding branch penalty is 3, 2, 1, or zero cycles depending on scheme

8 Design and Performance Issues With Pipelining Pipelined processors are not EASY to design Technology affect implementation Instruction set design affect the performance, i.e., beq, bne More stages do not lead to higher performance

9 Pipeline Operation In pipeline one operation begins in every cycle Also, one operation completes in each cycle Each instruction takes 5 clock cycles (k cycles in general) When a stage is not used, no control needs to be applied In one clock cycle, several instructions are active Different stages are executing different instructions How to generate control signals for them is an issue

10 Graphically Representing Pipelines Can help with answering questions like: –how many cycles does it take to execute this code? –what is the ALU doing during cycle 4? –use this representation to help understand datapaths

11 Instruction Format JUMP JUMP ADDRESS REG 1REG 2 BEQ/BNE BRANCH ADDRESS OFFSET REG 1REG 2 SW STORE ADDRESS OFFSET REG 1REG 2 LW LOAD ADDRESS OFFSET REG 1REG 2DSTR-TYPESHIFT AMOUNTADD/AND/OR/SLT

12 Operation for Each Instruction LW: 1. READ INST 2. READ REG 1 READ REG 2 3. ADD REG 1 + OFFSET 4. READ MEM 5. WRITE REG2 SW: 1. READ INST 2. READ REG 1 READ REG 2 3. ADD REG 1 + OFFSET 4. WRITE MEM 5. R-Type: 1. READ INST 2. READ REG 1 READ REG 2 3. OPERATE on REG 1 / REG WRITE DST BR-Type: 1. READ INST 2. READ REG 1 READ REG 2 3. SUB REG 2 from REG JMP-Type: 1. READ INST

13 Pipeline Data Path Operation PCPC 4 ADDADD INST MEMORY IA INST MUXMUX MUXMUX MUXMUX Control REG FILE RA RA2 RD1 RD2 WA WD MUXMUX Sign Ext Shift Left 2 MUXMUX MUXMUX MUXMUX ALUALU ADDADD MUXMUX MEMMEM WD ADDR

14 Fetch Unit PCPC 4 ADDADD INST MEMORY IA INST MUXMUX MUXMUX MUXMUX NPC INST Jump Address Jump Register Address Branch Address

15 Register Fetch Unit Control REG FILE RA RA2 RD1 RD2 WA WD NPC INST

16 ALU Operation and Branch Logic MUXMUX Sign Ext Shift Left 2 MUXMUX MUXMUX MUXMUX ALUALU ADDADD RD1 RD2 INST Branch address Reg Write Address Write Data ALU OUTPUT

17 Memory and Write back Stage MUXMUX MEMMEM WD ADDR WRITE DATA ADDR Data Read Data ALU

18 Pipeline Data Path Operation PCPC 4 ADDADD INST MEMORY IA INST MUXMUX MUXMUX MUXMUX Control REG FILE RA RA2 RD1 RD2 WA WD MUXMUX Sign Ext Shift Left 2 MUXMUX MUXMUX MUXMUX ALUALU ADDADD MUXMUX MEMMEM WD ADDR

19 Problem with starting next instruction before first is finished –dependencies that “go backward in time” are data hazards Dependencies

20 Consider the following program add$t0, $t1, $t2 add$t1, $t0, $t3 and$t2, $t4, $t0 or$t3, $t1, $t0 slt$t4, $t2, $t3 Problem with starting next instruction before first is finished –dependencies that “go backward in time” are data hazards A program with data dependencies

21 Data Path Operation C1 C2 C3 C4 C5 C6 C7 C8 C9 ALUALU MUXMUX INST FETCH REG FILE MUXMUX DATA MEMORY ALUALU MUXMUX INST FETCH REG FILE MUXMUX DATA MEMORY ALUALU MUXMUX INST FETCH REG FILE MUXMUX DATA MEMORY ALUALU MUXMUX INST FETCH REG FILE MUXMUX DATA MEMORY ALUALU MUXMUX INST FETCH REG FILE MUXMUX DATA MEMORY add $t0, $t1, $t2 add $t1, $t0, $t3 and $t2, $t4, $t0 or $t3, $t1, $t0 slt $t4, $t2, $t3

22 Have compiler guarantee no hazards Where do we insert the “no-ops” ? sub$2, $1, $3 and $12, $2, $5 or$13, $6, $2 add$14, $2, $2 sw$15, 100($2) Problem: this really slows us down! –Also, the program will always be slow even if a techniques like forwarding is employed afterwards in newer version Hardware can detect dependencies and insert no-ops in hardware –Hardware detection and no-op insertion is called stalling –This is a bubble in pipeline and waste one cycle at all stages –Need two or three bubbles between write and read of a register Solution: Software No-ops/Hardware Bubbles

23 Hazard Detection Unit Stall by letting an instruction that won’t write anything go forward

24 Stalling Hardware detection and no-op insertion is called stalling We stall the pipeline by keeping an instruction in the same stage

25 Stalled Operation (no write before read) C1 C2 C3 C4 C5 C6 C7 C8 C9 ALUALU MUXMUX INST FETCH REG FILE MUXMUX DATA MEMORY ALUALU MUXMUX INST FETCH REG FILE MUXMUX DATA MEMORY ALUALU MUXMUX INST FETCH REG FILE MUXMUX DATA MEMORY ALUALU MUXMUX INST FETCH REG FILE MUXMUX DATA MEMORY ALUALU MUXMUX INST FETCH REG FILE MUXMUX DATA MEMORY add $t0, $t1, $t2 add $t1, $t0, $t3

26 Stalled Operation (write before read) C1 C2 C3 C4 C5 C6 C7 C8 C9 ALUALU MUXMUX INST FETCH REG FILE MUXMUX DATA MEMORY ALUALU MUXMUX INST FETCH REG FILE MUXMUX DATA MEMORY ALUALU MUXMUX INST FETCH REG FILE MUXMUX DATA MEMORY ALUALU MUXMUX INST FETCH REG FILE MUXMUX DATA MEMORY ALUALU MUXMUX INST FETCH REG FILE MUXMUX DATA MEMORY add $t0, $t1, $t2 add $t1, $t0, $t3 and $t2, $t4, $t0 add $t1, $t0, $t3

27 EX hazard –If ((EX/MEM.RegWrite) and (EX/MEM.RegisterRd != 0) and (EX/MEM.REgisterRd = ID/EX.RegisterRs)) ForwardA = 10 –If ((EX/MEM.RegWrite) and (EX/MEM.RegisterRd != 0) and (EX/MEM.RegisterRd = ID/EX.RegisterRt)) ForwardB = 10 MEM hazard –If ((MEM/WB.RegWrite) and (MEM/WB.REgisterRd != 0) and (MEM/WB.REgisterRd = ID/EX.RegisterRs)) ForwardA = 01 –If ((MEM/WB.RegWrite) and (MEM/WB.REgisterRd != 0) and (MEM/WB.REgisterRd = ID/EX.RegisterRt)) ForwardB = 10 In case of lw followed by a sw instruction, forwarding will not work. This is because data in MEM stage are still being read –Plan on adding forwarding in MEM stage of put a stall/bubble In case of lw followed by an instruction that uses the value –One has to add an stall Detecting Hazards for Forwarding

28 Use temporary results, don’t wait for them to be written –register file forwarding to handle read/write to same register –ALU forwarding –May also need forwarding to memory (think!!) Forwarding what if this $2 was $13?

29 Forwarding

30 Load word can still cause a hazard: –an instruction tries to read a register following a load instruction that writes to the same register. Thus, we need a hazard detection unit to “stall” the load instruction Can't always forward

31 When we decide to branch, other instructions are in the pipeline! We are predicting “branch not taken” –need to add hardware for flushing instructions if we are wrong Branch Hazards

32 Improving Performance Try and avoid stalls! E.g., reorder these instructions: lw $t0, 0($t1) lw $t2, 4($t1) sw $t2, 0($t1) sw $t0, 4($t1) Add a “branch delay slot” –the next instruction after a branch is always executed –rely on compiler to “fill” the slot with something useful Superscalar: start more than one instruction in the same cycle

33 Other Issues in Pipelines Exceptions –Errors in ALU for arithmetic instructions –Memory non-availability Exceptions lead to a jump in a program However, the current PC value must be saved so that the program can return to it back for recoverable errors Multiple exception can occur in a pipeline Preciseness of exception location is important in some cases I/O exceptions are handled in the same manner

34 Handling Branches Branch Prediction –Usually we may simply assume that branch is not taken –If it is taken, then we flush the pipeline Clear control signals for instruction following branch Delayed branch –Fill instructions that need to be executed even if branch occur –If none available fill NOOPs Reduce delay in resolving branches –Compare at register stage –Branch prediction table PC value (for branch) and next address One or two bits to store what should be prediction

35 Two State vs Four State Branch Prediction Two state model Four State Model Predict Taken Predict Not Taken Not Taken Taken Predict Not Taken Predict Not Taken Not Taken Taken Predict Taken Predict Taken Not Taken Taken

36 Pipeline with Early Branch Resolution/Exception

37 Superscalar Architecture

38 A Modern Pipelined Microprocessor

39 Important Facts to Remember Pipelined processors divide the execution in multiple steps However pipeline hazards reduce performance –Structural, data, and control hazard Data forwarding helps resolve data hazards –But all hazards cannot be resolved –Some data hazards require bubble or noop insertion Effects of control hazard reduced by branch prediction –Predict always taken, delayed slots, branch prediction table –Structural hazards are resolved by duplicating resources

40 We have 5 stages. What needs to be controlled in each stage? –Instruction Fetch and PC Increment –Instruction Decode / Register Fetch –Execution –Memory Stage –Write Back How would control be handled in an automobile plant? –a fancy control center telling everyone what to do? –should we use a finite state machine? Pipeline control

41 Pipeline Control

42 Pass control signals along just like the data Pipeline Control

43 Data Path with Control

44 Flushing Instructions