Z3, built by German scientist Konrad Zuse (pictured) and demonstrated in 1941. Z3 used mechanical relays and the program was on a punched tape.

1 Z3, built by German scientist Konrad Zuse (pictured) and demonstrated in 1941. Z3 used mechanical relays and the program was on a punched tape. It used a binary floating point format. The picture above was of a reproduction built in the 1960s.

2 COMP 206: Computer Architecture and Implementation Montek Singh Thu, Feb 5, 2009 Topic: Pipelining II (Intermediate Concepts)

3 Outline
- Control of the pipeline
- Performance improvement
- Problems: Hazards
  - Structural hazards
  - Data hazards
  - Hazard resolution
Reading: Appendix A (HP4)

4 Pipeline with Control Signals Note that we want control to follow the instruction through the pipeline. For example, RegWrite is used at the WB stage, not ID

5 Detail: ALU Control ALU op in low-order bits. Instruction fields: Op, Rs1, Rs2, Rd, Opx

6 ALUSrc Simple control: register for R-type, immediate for lw and sw

7 RegDst Chooses which portion of the instruction to use as the destination register (R-type and lw place it in different fields)

8 Not to Belabor the Point…
- Signals are decoded and carried as far as necessary
  - “Main Control”: generates control signals during Reg/Dec
- Control signals for Exec (ExtOp, ALUSrc, ...) are used 1 cycle later
- Control signals for Mem (MemWr, Branch) are used 2 cycles later
- Control signals for WrB (MemtoReg, RegWr) are used 3 cycles later
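
The idea that control is decoded once and then travels with the instruction can be sketched in a few lines of Python. This is only an illustration: the function `trace`, the table `USED_IN`, and the signal values chosen for lw are hypothetical, and placing RegWr in the WrB group is an assumption based on slide 4 (RegWrite at WB).

```python
# Stage in which each group of control signals is used, following the slide.
# RegWr in the WrB group is an assumption based on slide 4 (RegWrite at WB).
USED_IN = {
    "Exec": ("ExtOp", "ALUSrc"),
    "Mem": ("MemWr", "Branch"),
    "WrB": ("MemtoReg", "RegWr"),
}

def trace(decoded):
    """Shift decoded control bundles through the pipeline registers.

    decoded: list of (name, signals) pairs; entry k is decoded in cycle k.
    Returns a list of (cycle, stage, name, signals used in that stage).
    """
    regs = {"Exec": None, "Mem": None, "WrB": None}
    pending = list(decoded)
    log = []
    cycle = 0
    while pending or any(regs.values()):
        # Every bundle moves one stage to the right each clock.
        regs["WrB"] = regs["Mem"]
        regs["Mem"] = regs["Exec"]
        regs["Exec"] = pending.pop(0) if pending else None
        cycle += 1
        for stage, entry in regs.items():
            if entry is not None:
                name, signals = entry
                used = {s: signals[s] for s in USED_IN[stage]}
                log.append((cycle, stage, name, used))
    return log

# Illustrative control values for lw (assumed, not taken from the slides).
LW_SIGNALS = {"ExtOp": 1, "ALUSrc": 1, "MemWr": 0,
              "Branch": 0, "MemtoReg": 1, "RegWr": 1}
lw_log = trace([("lw", LW_SIGNALS)])
for row in lw_log:
    print(row)
```

Each stage only ever looks at the bundle sitting in its own pipeline register, which is why the Exec, Mem, and WrB signals show up 1, 2, and 3 cycles after decode.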

9 Control and Data

10 Aside
Sounds more complex than it really is to implement
- In Verilog or VHDL you would just write simple logical expressions (or just rename wires)
- Ex: assign ALUSrc = (Inst[31:26] == 35 || Inst[31:26] == 43);
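
For readers who want to experiment without a Verilog simulator, the same decode logic can be mimicked in Python. This is a rough sketch: the helper names (`encode_i`, `encode_r`, `alu_src`, `dest_reg`) are made up for illustration, and the field positions are the standard MIPS ones (opcode in bits 31:26, rt in 20:16, rd in 15:11). Opcodes 35 and 43 are the lw and sw values used in the slide.

```python
LW, SW, RTYPE = 35, 43, 0  # MIPS opcodes; 35 and 43 appear in the slide

def encode_i(op, rs, rt, imm):
    """Build an I-type word: op | rs | rt | 16-bit immediate."""
    return (op << 26) | (rs << 21) | (rt << 16) | (imm & 0xFFFF)

def encode_r(rs, rt, rd, funct):
    """Build an R-type word with opcode 0 and shamt 0."""
    return (rs << 21) | (rt << 16) | (rd << 11) | funct

def opcode(inst):
    return (inst >> 26) & 0x3F

def alu_src(inst):
    """Mirror of the Verilog expression: immediate operand for lw and sw."""
    return opcode(inst) in (LW, SW)

def dest_reg(inst):
    """RegDst choice: rd field for R-type, rt field for lw."""
    if opcode(inst) == RTYPE:
        return (inst >> 11) & 0x1F
    return (inst >> 16) & 0x1F

sub_word = encode_r(rs=1, rt=3, rd=2, funct=0x22)   # sub $2, $1, $3
lw_word = encode_i(LW, rs=9, rt=8, imm=4)           # lw $8, 4($9)
print(alu_src(sub_word), dest_reg(sub_word))
print(alu_src(lw_word), dest_reg(lw_word))
```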

11 Single Cycle vs. Multiple Cycle vs. Pipelined
[Figure: timing diagram of Load, Store, and R-type instructions under three implementations. Single cycle implementation: one long clock cycle per instruction, with part of the cycle wasted for Store. Multiple cycle implementation (Cycle 1 to Cycle 10): Load takes Ifetch, Reg, Exec, Mem, Wr; Store takes Ifetch, Reg, Exec, Mem; R-type starts with Ifetch. Pipelined implementation: Load, Store, and R-type overlap, each passing through Ifetch, Reg, Exec, Mem, Wr.]

12 CPU Designs: Summary
- Disadvantages of the Single Cycle Processor
  - Long cycle time
  - Cycle time wasted for the faster instructions
- Multiple Clock Cycle Processor
  - Divide the instructions into smaller steps
  - Execute each step (instead of the entire instruction) in 1 cycle
- Pipelined Processor
  - Natural enhancement of the multiple clock cycle processor
  - Each functional unit used only once per instruction
  - If an instruction is going to use a functional unit:
    - it must use it at the same stage as all other instructions
  - Pipeline Control:
    - each stage’s control signal depends ONLY on the instruction that is currently in that stage

13 Speedup & Throughput of a Pipeline

14 Example from HP4
- Non-pipelined, 1 ns clk, 4 cycles for ALU ops & branches, and 5 cycles for memory ops
- Relative frequencies: 40%, 20%, 40%
- Pipelined, 0.2 ns overhead (setup, etc.)
  Avg. execution time (non-pipelined) = Clock × Avg. CPI
    = 1 ns × ((40% + 20%) × 4 + 40% × 5)
    = 1 ns × 4.4
    = 4.4 ns
  Pipelined clock = 1 ns + 0.2 ns = 1.2 ns
  Speedup = Unpipelined / Pipelined
    = 4.4 ns / 1.2 ns
    = 3.7 times
- The overhead limits total speedup
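
The arithmetic in this example is easy to check with a short script. The sketch below assumes, as the example does, that the pipelined machine completes one instruction per clock and that its clock is the 1 ns stage time plus the 0.2 ns overhead; the variable names are made up for illustration.

```python
def average_cpi(mix):
    """mix: list of (relative frequency, cycles) pairs."""
    return sum(freq * cycles for freq, cycles in mix)

# ALU ops 40% at 4 cycles, branches 20% at 4 cycles, memory ops 40% at 5 cycles
mix = [(0.40, 4), (0.20, 4), (0.40, 5)]

clock_ns = 1.0
overhead_ns = 0.2

unpipelined_ns = clock_ns * average_cpi(mix)   # average time per instruction
pipelined_ns = clock_ns + overhead_ns          # one instruction per (longer) clock
speedup = unpipelined_ns / pipelined_ns

print(f"unpipelined: {unpipelined_ns:.1f} ns")
print(f"pipelined:   {pipelined_ns:.1f} ns")
print(f"speedup:     {speedup:.1f}x")
```

Setting `overhead_ns` to 0 gives a speedup of 4.4, which makes the last bullet concrete: the 0.2 ns of latch overhead alone costs about 0.7 of the ideal speedup.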

15 Not Quite this Rosy
- Run into problems with contention for resources and dependencies between instructions
- Next: Hazards

16 Pipeline Hazards: Structural Hazard
- A relation between two instructions indicating that:
  - the two instructions may want to use the same hardware resource (function unit, register file port, shared bus, cache port, etc.)
  - …at the same time
- In principle, eliminated by duplicating resources
  - Low hardware utilization; increased cost
- MIPS pipeline as designed so far does not have a structural hazard
  - But we had to avoid it (see example later)
- Usually occurs when a long-latency functional unit is not fully pipelined (e.g., a floating point unit)

17 Structural Hazard: Example Consider system w/ single-ported memory

18 Solutions
- Stall (insert a bubble)
- We could also use other techniques, such as split cache, instruction buffer, etc.
  - More when we discuss memory

19 Resolving Structural Hazards
- Early resolution (scheduling)
  - Done well before the collision could occur, and usually at a place different from where the collision could happen
  - Example: instructions are delayed in the ID stage
- Late resolution
  - Done at the place where the collision might happen
  - Done just before the collision is about to happen
  - Example: Using an arbiter or a priority encoder
    - One instruction wins
    - Others are denied access, stall, and wait for their next chance
- Why allow structural hazards in the first place?
  - Reduce cost
  - Reduce unit latency (by avoiding pipeline latch delays)
  - Hazards may be infrequent (“make common case fast”)
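
Late resolution with a priority encoder can be pictured with a toy model. The sketch below is one possible reading, not a description of any particular machine: it assumes a fixed priority order (lower index wins), a shared unit that is busy for one cycle per grant, and the hypothetical function names `arbitrate` and `schedule`.

```python
def arbitrate(requests):
    """Fixed-priority encoder: return the lowest index that is requesting."""
    for index, requesting in enumerate(requests):
        if requesting:
            return index
    return None

def schedule(first_request_cycle):
    """Grant a shared unit to one requester per cycle.

    first_request_cycle[i] is the cycle in which requester i first asks.
    Losers keep requesting (stall) until they win.
    Returns {requester index: cycle granted}.
    """
    granted = {}
    cycle = min(first_request_cycle)
    while len(granted) < len(first_request_cycle):
        requests = [
            i not in granted and first <= cycle
            for i, first in enumerate(first_request_cycle)
        ]
        winner = arbitrate(requests)
        if winner is not None:
            granted[winner] = cycle   # one instruction wins
        cycle += 1                    # the others wait for their next chance
    return granted

# Two instructions collide in cycle 4; a third arrives in cycle 5.
print(schedule([4, 4, 5]))
```

In this run the second requester stalls one cycle and the third is pushed back as well, which is the cost accepted in exchange for not duplicating the unit.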

20 Data Hazard: Example
- Consider the following code fragment
    sub $2, $1, $3     # Reg 2 written
    and $12, $2, $5    # $2 used
    or  $13, $6, $2
    add $14, $2, $2
    sw  $15, 100($2)
- Clearly the programmer would expect the newly set value of register 2 to be used

21 Data Hazard: Example (contd)

22 No Problem with “sw”

23 Maybe OK with “add” If register file can be read and written in a half cycle each

24 Not Correct Result for “and”, “or”
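
The verdicts on the last few slides follow from simple cycle counting, sketched below. The model is an assumption made for illustration: instruction i (counting from 0) is fetched in cycle i + 1, reads its registers in ID in cycle i + 2, and writes back in WB in cycle i + 5, with no forwarding. `PROGRAM`, `read_status`, and `hazards` are hypothetical names.

```python
# (text, destination register or None, source registers)
PROGRAM = [
    ("sub $2, $1, $3", 2, (1, 3)),
    ("and $12, $2, $5", 12, (2, 5)),
    ("or $13, $6, $2", 13, (6, 2)),
    ("add $14, $2, $2", 14, (2, 2)),
    ("sw $15, 100($2)", None, (15, 2)),
]

def read_status(producer, consumer, half_cycle_regfile=True):
    """Compare the consumer's ID cycle with the producer's WB cycle."""
    write_cycle = producer + 5
    read_cycle = consumer + 2
    if read_cycle > write_cycle:
        return "ok"
    if read_cycle == write_cycle and half_cycle_regfile:
        return "ok"       # write in first half of the cycle, read in second half
    return "stale"

def hazards(program, half_cycle_regfile=True):
    """Status of each instruction that reads a register written earlier."""
    result = {}
    for j, (text, _, sources) in enumerate(program):
        for i in range(j - 1, -1, -1):        # nearest earlier writer first
            dest = program[i][1]
            if dest is not None and dest in sources:
                result[text] = read_status(i, j, half_cycle_regfile)
                break
    return result

for text, status in hazards(PROGRAM).items():
    print(f"{text:18s} {status}")
```

Calling `hazards(PROGRAM, half_cycle_regfile=False)` flips add to stale, which is why the slide only says "maybe OK" for it.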

25 Types of Data Hazards
- RAW hazards corresponding to value dependences are most difficult to deal with, since they can never be eliminated
  - The second instruction is waiting for information produced by the first instruction
- WAR and WAW hazards are name dependences
  - Two instructions happen to use the same register (name), although they don’t have to
  - Can often be eliminated by renaming, either in software or hardware
    - Implies the use of additional resources, hence additional cost
    - Renaming is not always possible: implicit operands such as accumulator, PC, or condition codes cannot be renamed
  - These hazards don’t cause problems for MIPS pipeline
    - Relative timing does not change even with pipelined execution, because reads occur early and writes occur late in pipeline

26 Easy Fix!!! Let Compiler Deal
    sub $2, $1, $3     # Reg 2 written
    nop
    nop
    and $12, $2, $5    # $2 used, now OK
    or  $13, $6, $2
    add $14, $2, $2
    sw  $15, 100($2)
- Original code sequence common, though, so it’s not a good solution to waste so much (clock) time
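
A compiler pass that pads the code this way might look roughly like the sketch below. It assumes a register file that is written in the first half of a cycle and read in the second, so a consumer must sit at least three slots after its producer; the function `insert_nops` and the tuple layout are hypothetical, and a real compiler would try to reorder useful instructions into the gaps first.

```python
# (text, destination register or None, source registers)
PROGRAM = [
    ("sub $2, $1, $3", 2, (1, 3)),
    ("and $12, $2, $5", 12, (2, 5)),
    ("or $13, $6, $2", 13, (6, 2)),
    ("add $14, $2, $2", 14, (2, 2)),
    ("sw $15, 100($2)", None, (15, 2)),
]

NOP = ("nop", None, ())

def insert_nops(program, min_distance=3):
    """Pad with nops until every consumer is min_distance slots after its producer."""
    out = []
    for text, dest, sources in program:
        needed = 0
        # Only the last (min_distance - 1) emitted slots can be too close.
        for back in range(1, min_distance):
            if back > len(out):
                break
            prev_dest = out[-back][1]
            if prev_dest is not None and prev_dest in sources:
                needed = max(needed, min_distance - back)
        out.extend([NOP] * needed)
        out.append((text, dest, sources))
    return out

padded = insert_nops(PROGRAM)
for text, _, _ in padded:
    print(text)
```

For this fragment the pass emits two nops after sub and none anywhere else, so the code runs correctly but takes two extra cycles.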

27 Hardware Solution: Forwarding
- Correct value of $2 is available, just not stored in the register. Send from where available!

28 How to Detect the Hazard
- Let’s look at logic for the two types of data hazards we’ve seen so far
  1a. EX/MEM.RegisterRd = ID/EX.RegisterRs
  1b. EX/MEM.RegisterRd = ID/EX.RegisterRt
  2a. MEM/WB.RegisterRd = ID/EX.RegisterRs
  2b. MEM/WB.RegisterRd = ID/EX.RegisterRt
- Rd – destination
- Rs and Rt – two sources

29 Type 1a and 2b Hazards
  1a. EX/MEM.RegisterRd = ID/EX.RegisterRs
  2b. MEM/WB.RegisterRd = ID/EX.RegisterRt
To detect a hazard, AND these conditions with the RegWrite signal
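
The four conditions, each ANDed with RegWrite, translate almost directly into code. The sketch below is a software model for illustration only: `PipeReg` and `forward_sources` are made-up names, and two details are assumptions that the slides do not state, namely skipping register 0 (which always reads as zero in MIPS) and preferring EX/MEM over MEM/WB when both match, since EX/MEM holds the newer value.

```python
from dataclasses import dataclass

@dataclass
class PipeReg:
    """The fields of EX/MEM or MEM/WB that the detection logic looks at."""
    reg_write: bool = False
    rd: int = 0

def forward_sources(id_ex_rs, id_ex_rt, ex_mem, mem_wb):
    """Return where each ALU operand should come from: 'EX/MEM', 'MEM/WB' or 'REG'."""
    def pick(source):
        # Conditions 1a / 1b, ANDed with RegWrite (the rd != 0 check is assumed).
        if ex_mem.reg_write and ex_mem.rd != 0 and ex_mem.rd == source:
            return "EX/MEM"
        # Conditions 2a / 2b, ANDed with RegWrite.
        if mem_wb.reg_write and mem_wb.rd != 0 and mem_wb.rd == source:
            return "MEM/WB"
        return "REG"
    return pick(id_ex_rs), pick(id_ex_rt)

# and $12, $2, $5 in EX while sub $2, $1, $3 is one stage ahead: type 1a
print(forward_sources(2, 5, PipeReg(True, 2), PipeReg()))
# or $13, $6, $2 in EX while sub is two stages ahead: type 2b
print(forward_sources(6, 2, PipeReg(True, 12), PipeReg(True, 2)))
```

Without the RegWrite term, an instruction such as sw, which carries register numbers in its fields but writes no register, could trigger a bogus forward.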

30 From This Pipeline to…

31 Modified Pipeline From EX/MEM & MEM/WB to each of 2 reg inputs

32 With Control (RegWr)

33 Example
    sub $2, $1, $3
    and $4, $2, $5
    or  $4, $4, $2
    add $9, $4, $2
- Lots of potential hazards

34 Clock 3

35 Clock 4

36 Clock 5

37 Clock 6 Make sure we get the correct value of $4
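
The four clock snapshots can be replayed with a small script. This is a sketch under stated assumptions: sub is fetched in clock 1, so instruction i is in EX in clock i + 3; every instruction in the example writes a register, so RegWrite is always 1; and when both EX/MEM and MEM/WB hold a match, the EX/MEM value is taken because it is newer. The names `PROGRAM`, `select`, and `trace` are hypothetical.

```python
# (op, rd, rs, rt) for the example sequence
PROGRAM = [
    ("sub", 2, 1, 3),
    ("and", 4, 2, 5),
    ("or", 4, 4, 2),
    ("add", 9, 4, 2),
]

def select(source, ex_mem_rd, mem_wb_rd):
    """Pick the operand source, checking the newer result (EX/MEM) first."""
    if ex_mem_rd is not None and ex_mem_rd == source:
        return "EX/MEM"
    if mem_wb_rd is not None and mem_wb_rd == source:
        return "MEM/WB"
    return "REG"

def trace(program):
    """Map each clock to (op in EX, source of rs operand, source of rt operand)."""
    rows = {}
    for i, (op, _, rs, rt) in enumerate(program):
        clock = i + 3
        ex_mem_rd = program[i - 1][1] if i >= 1 else None   # one instruction ahead
        mem_wb_rd = program[i - 2][1] if i >= 2 else None   # two instructions ahead
        rows[clock] = (op,
                       select(rs, ex_mem_rd, mem_wb_rd),
                       select(rt, ex_mem_rd, mem_wb_rd))
    return rows

for clock, row in trace(PROGRAM).items():
    print(clock, row)
```

In clock 6 both and (in MEM/WB) and or (in EX/MEM) have $4 as destination; add must take the value from EX/MEM, the result of or, to get the correct $4. Its $2 operand comes from the register file, since sub has already written back.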

38 Next Time
- Sometimes forwarding not good enough
- Control hazards