1 Z3, built by German scientist Konrad Zuse (pictured) and demonstrated in 1941. Z3 used mechanical relays and the program was on a punched tape. It used.


1 Z3, built by German scientist Konrad Zuse (pictured) and demonstrated in 1941. The Z3 used electromechanical relays, and its program was stored on punched tape. It used a binary floating-point format. The picture above was of a reproduction built in the 1960s.

2 COMP 206: Computer Architecture and Implementation
Montek Singh
Thu, Feb 5, 2009
Topic: Pipelining II (Intermediate Concepts)

3 Outline
- Control of the pipeline
- Performance improvement
- Problems: Hazards
  - Structural hazards
  - Data hazards
  - Hazard resolution
Reading: Appendix A (HP4)

4 Pipeline with Control Signals
Note that we want control to follow the instruction. For example, RegWrite is applied at the WB stage, not at ID.

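5 Detail: ALU Control
ALU op in low-order bits
[Figure: instruction format with Op in bits 31-26, Rs1 in bits 25-21, Rs2 in bits 20-16, Rd in bits 15-11, and Opx in the low-order bits; further field boundaries are marked at bits 10, 6, and 5]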

6 ALUSrc
Simple control: register for R-type, immediate for lw and sw

7 RegDst
Chooses which portion of the instruction to use as the destination register (R-type and lw use different instruction formats)

8 Not to Belabor the Point…
- Signals are decoded and carried as far as necessary
  - “Main Control”: generates control signals during Reg/Dec
- Control signals for Exec (ExtOp, ALUSrc, ...) are used 1 cycle later
- Control signals for Mem (MemWr, Branch) are used 2 cycles later
- Control signals for WrB (MemtoReg, RegWr) are used 3 cycles later

9 Control and Data

10 Aside
Sounds more complex than it really is to implement
- In Verilog or VHDL you would just write simple logical expressions (or just rename wires)
- Ex: assign ALUSrc = (Inst[31:26] == 35 || Inst[31:26] == 43);
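
The same decode can be mirrored in software. The following is a minimal Python sketch, not part of the original slides; the names `opcode`, `alu_src`, `LW_OPCODE`, and `SW_OPCODE` are illustrative, and it assumes the standard MIPS opcodes 35 (lw) and 43 (sw) that the Verilog expression tests for.

```python
LW_OPCODE = 35  # load word (assumed standard MIPS encoding)
SW_OPCODE = 43  # store word (assumed standard MIPS encoding)

def opcode(inst: int) -> int:
    """Return bits 31..26 of a 32-bit instruction word."""
    return (inst >> 26) & 0x3F

def alu_src(inst: int) -> bool:
    """True when the ALU's second operand is the immediate (lw or sw)."""
    return opcode(inst) in (LW_OPCODE, SW_OPCODE)

# Hypothetical instruction words, built field by field for readability.
lw_word = (LW_OPCODE << 26) | (1 << 21) | (2 << 16) | 100  # lw $2, 100($1)
r_type_word = 0 << 26                                      # opcode 0: R-type
print(alu_src(lw_word), alu_src(r_type_word))
```

As in the hardware version, the signal is a pure function of the opcode field, which is why it amounts to a one-line expression.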

11 Single Cycle vs. Multiple Cycle vs. Pipelined
[Figure: clock timing diagrams for a Load, Store, R-type sequence. Single Cycle Implementation: Load and Store each occupy one long cycle (Cycle 1, Cycle 2), with part of the Store cycle marked "Waste". Multiple Cycle Implementation: Load (Ifetch, Reg, Exec, Mem, Wr), Store (Ifetch, Reg, Exec, Mem), and the start of R-type (Ifetch) run back to back over Cycles 1-10. Pipelined Implementation: Load, Store, and R-type each pass through Ifetch, Reg, Exec, Mem, Wr, overlapped in time.]

12 CPU Designs: Summary
- Disadvantages of the Single Cycle Processor
  - Long cycle time
  - Cycle time wasted for the faster instructions
- Multiple Clock Cycle Processor
  - Divide the instructions into smaller steps
  - Execute each step (instead of the entire instruction) in 1 cycle
- Pipelined Processor
  - Natural enhancement of the multiple clock cycle processor
  - Each functional unit used only once per instruction
  - If an instruction is going to use a functional unit:
    - it must use it at the same stage as all other instructions
  - Pipeline Control:
    - each stage’s control signal depends ONLY on the instruction that is currently in that stage

13 Speedup & Throughput of a Pipeline

14 Example from HP4
- Non-pipelined, 1 ns clk, 4 cycles for ALU ops & branches, and 5 cycles for memory ops
- Relative frequencies: 40% (ALU ops), 20% (branches), 40% (memory ops)
- Pipelined, 0.2 ns overhead (setup, etc.) added to the clock
  - Avg. execution time (non-pipelined) = Clock × Avg. CPI
    = 1 ns × ((40% + 20%) × 4 + 40% × 5)
    = 1 ns × 4.4
    = 4.4 ns
  - Speedup = Unpipelined / Pipelined
    = 4.4 ns / 1.2 ns
    = 3.7 times
- The overhead limits total speedup
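
The arithmetic in this example can be checked with a few lines of Python. This is a sketch under the slide's stated numbers; the helper name `average_cpi` and the variable names are illustrative, and the pipelined machine is assumed to complete one instruction per (lengthened) clock cycle.

```python
def average_cpi(mix):
    """mix: list of (frequency, cycles) pairs whose frequencies sum to 1."""
    return sum(freq * cycles for freq, cycles in mix)

clock_ns = 1.0      # non-pipelined clock period
overhead_ns = 0.2   # pipelining overhead added to the clock
mix = [(0.4, 4), (0.2, 4), (0.4, 5)]  # ALU ops, branches, memory ops

unpipelined_ns = clock_ns * average_cpi(mix)  # average time per instruction
pipelined_ns = clock_ns + overhead_ns         # assumes ideal CPI of 1
speedup = unpipelined_ns / pipelined_ns

print(f"{unpipelined_ns:.1f} ns per instruction, speedup {speedup:.2f}")
```

Raising `overhead_ns` shows directly how the overhead caps the achievable speedup.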

15 Not Quite this Rosy
- Run into problems with contention for resources and dependencies between instructions
- Next: Hazards

16 Pipeline Hazards: Structural Hazard
- A relation between two instructions indicating that:
  - the two instructions may want to use the same hardware resource (function unit, register file port, shared bus, cache port, etc.)
  - …at the same time
- In principle, eliminated by duplicating resources
  - Low hardware utilization; increased cost
- MIPS pipeline as designed so far does not have a structural hazard
  - But we had to avoid it (see example later)
- Usually occurs when a long-latency functional unit is not fully pipelined (e.g., a floating point unit)

17 Structural Hazard: Example
Consider a system with single-ported memory

18 Solutions
- Stall (insert a bubble)
- We could also use other techniques, such as split cache, instruction buffer, etc.
  - More when we discuss memory

19 Resolving Structural Hazards
- Early resolution (scheduling)
  - Done well before the collision could occur, and usually at a place different from where the collision could happen
  - Example: instructions are delayed in the ID stage
- Late resolution
  - Done at the place where the collision might happen
  - Done just before the collision is about to happen
  - Example: Using an arbiter or a priority encoder
    - One instruction wins
    - Others are denied access, stall, and wait for their next chance
- Why allow structural hazards in the first place?
  - Reduce cost
  - Reduce unit latency (by avoiding pipeline latch delays)
  - Hazards may be infrequent (“make common case fast”)
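
Late resolution with a fixed-priority arbiter can be sketched in Python as follows. This is an illustration only: the function name `priority_arbiter` and the convention that index 0 has the highest priority are assumptions, not something the slides specify.

```python
def priority_arbiter(requests):
    """Grant a shared resource to the highest-priority requester.

    requests: list of bools; index 0 has the highest priority (assumed).
    Returns (winner, stalled): the winning index (or None if nobody
    requested) and the list of requesters that must stall and retry.
    """
    winner = None
    stalled = []
    for i, wants in enumerate(requests):
        if not wants:
            continue
        if winner is None:
            winner = i          # first active requester wins
        else:
            stalled.append(i)   # everyone else is denied this cycle
    return winner, stalled

# Requesters 1 and 2 collide on the same unit in this cycle.
print(priority_arbiter([False, True, True]))
```

Here requester 1 is granted access and requester 2 stalls until the next cycle, which matches the "one wins, others wait" behavior described above.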

20 Data Hazard: Example
- Consider the following code fragment
    sub $2, $1, $3      # Reg 2 written
    and $12, $2, $5     # $2 used
    or  $13, $6, $2
    add $14, $2, $2
    sw  $15, 100($2)
- Clearly the programmer would expect the newly set value of register 2 to be used

21 Data Hazard: Example (contd)

22 No Problem with “sw”

23 Maybe OK with “add”
If the register file can be written in the first half of a cycle and read in the second half

24 Not Correct Result for “and”, “or”

25 Types of Data Hazards
- RAW hazards corresponding to value dependences are the most difficult to deal with, since they can never be eliminated
  - The second instruction is waiting for information produced by the first instruction
- WAR and WAW hazards are name dependences
  - Two instructions happen to use the same register (name), although they don’t have to
  - Can often be eliminated by renaming, either in software or hardware
    - Implies the use of additional resources, hence additional cost
    - Renaming is not always possible: implicit operands such as accumulator, PC, or condition codes cannot be renamed
  - These hazards don’t cause problems for the MIPS pipeline
    - Relative timing does not change even with pipelined execution, because reads occur early and writes occur late in the pipeline
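
The three hazard types can be told apart from the read and write sets of two instructions. The sketch below is illustrative: `classify_hazards` and the dictionary layout are hypothetical, and it reports potential hazards (dependences) without modeling whether a particular pipeline actually exposes them.

```python
def classify_hazards(first, second):
    """Classify dependences between two instructions in program order.

    first, second: dicts with 'reads' and 'writes' sets of register names.
    Returns the set of hazard types that 'second' may have on 'first'.
    """
    hazards = set()
    if first["writes"] & second["reads"]:
        hazards.add("RAW")  # second needs a value produced by first
    if first["reads"] & second["writes"]:
        hazards.add("WAR")  # second overwrites a name first still reads
    if first["writes"] & second["writes"]:
        hazards.add("WAW")  # both write the same name
    return hazards

sub = {"reads": {"$1", "$3"}, "writes": {"$2"}}    # sub $2, $1, $3
and_ = {"reads": {"$2", "$5"}, "writes": {"$12"}}  # and $12, $2, $5
print(classify_hazards(sub, and_))
```

Only the RAW case reflects a true flow of data; the WAR and WAW cases disappear if the second instruction is given a different destination register, which is the renaming idea mentioned above.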

26 Easy Fix!!! Let Compiler Deal
    sub $2, $1, $3      # Reg 2 written
    nop
    nop
    and $12, $2, $5     # $2 used, now OK
    or  $13, $6, $2
    add $14, $2, $2
    sw  $15, 100($2)
- Original code sequence common, though, so it’s not a good solution to waste so much (clock) time
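
The compiler-side fix can be sketched as a small padding pass. This is a simplified illustration, not a real compiler pass: `insert_nops`, the tuple layout, and `MIN_DISTANCE = 3` are assumptions. The distance of 3 slots corresponds to a pipeline without forwarding whose register file is written in the first half of a cycle and read in the second half; special handling of register $0 is left out.

```python
MIN_DISTANCE = 3  # assumption: consumer must issue >= 3 slots after producer

def insert_nops(program):
    """Pad a program with nops so RAW-dependent instructions are far enough apart.

    program: list of (text, dest_or_None, sources) tuples.
    Returns the list of instruction texts including inserted 'nop's.
    """
    out = []
    last_write = {}  # register -> index in `out` of its most recent writer
    for text, dest, sources in program:
        needed = 0
        for reg in sources:
            if reg in last_write:
                gap = len(out) - last_write[reg]
                needed = max(needed, MIN_DISTANCE - gap)
        out.extend(["nop"] * needed)
        if dest is not None:
            last_write[dest] = len(out)  # position this instruction will take
        out.append(text)
    return out

program = [
    ("sub $2, $1, $3", "$2", ["$1", "$3"]),
    ("and $12, $2, $5", "$12", ["$2", "$5"]),
    ("or $13, $6, $2", "$13", ["$6", "$2"]),
    ("add $14, $2, $2", "$14", ["$2", "$2"]),
    ("sw $15, 100($2)", None, ["$15", "$2"]),
]
padded = insert_nops(program)
print("\n".join(padded))
```

Under these assumptions the pass inserts two nops after the sub and none elsewhere, since the later instructions are already far enough from the write of $2.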

27 Hardware Solution: Forwarding
- The correct value of $2 is available, just not yet stored in the register file. Send it from where it is available!

28 How to Detect the Hazard
- Let’s look at logic for the two types of data hazards we’ve seen so far
    1a. EX/MEM.RegisterRd = ID/EX.RegisterRs
    1b. EX/MEM.RegisterRd = ID/EX.RegisterRt
    2a. MEM/WB.RegisterRd = ID/EX.RegisterRs
    2b. MEM/WB.RegisterRd = ID/EX.RegisterRt
- Rd: destination
- Rs and Rt: the two sources

29 Type 1a and 2b Hazards
    1a. EX/MEM.RegisterRd = ID/EX.RegisterRs
    2b. MEM/WB.RegisterRd = ID/EX.RegisterRt
To detect a hazard, AND each condition with the RegWrite signal held in the same pipeline register (EX/MEM or MEM/WB)
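
The detection conditions combined with RegWrite can be written as a small selection function. This is a hedged sketch, not the slides' circuit: `forward_select` and the dictionary fields are hypothetical names, giving EX/MEM priority over MEM/WB is an assumption (it picks the most recent result), and the usual check that Rd is not register 0 is omitted for brevity.

```python
def forward_select(ex_mem, mem_wb, id_ex_src):
    """Choose where one ALU input should come from.

    ex_mem, mem_wb: dicts with 'reg_write' (bool) and 'rd' (register number)
                    describing the instructions held in those pipeline registers.
    id_ex_src: source register number (Rs or Rt) of the instruction in EX.
    """
    if ex_mem["reg_write"] and ex_mem["rd"] == id_ex_src:
        return "EX/MEM"    # type 1 hazard: result of the previous instruction
    if mem_wb["reg_write"] and mem_wb["rd"] == id_ex_src:
        return "MEM/WB"    # type 2 hazard: result from two instructions back
    return "REGFILE"       # no hazard: use the value read in ID

# sub $2, $1, $3 is in MEM while and $12, $2, $5 is in EX (type 1a).
sub_in_mem = {"reg_write": True, "rd": 2}
idle_wb = {"reg_write": False, "rd": 0}
print(forward_select(sub_in_mem, idle_wb, 2))
```

The function would be evaluated once for Rs and once for Rt, giving the a and b variants of each hazard type.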

30 From This Pipeline to…

31 Modified Pipeline
Forwarding paths run from EX/MEM and MEM/WB to each of the 2 ALU register inputs

32 With Control (RegWr)

33 Example
    sub $2, $1, $3
    and $4, $2, $5
    or  $4, $4, $2
    add $9, $4, $2
- Lots of potential hazards

34 Clock 3

35 Clock 4

36 Clock 5

37 Clock 6
Make sure we get the correct value of $4

38 Next Time
- Sometimes forwarding is not good enough
- Control hazards

