1 Z3, built and demonstrated by German scientist Konrad Zuse (pictured). The Z3 used mechanical relays, and its program was stored on punched tape. It used a binary floating-point format. The picture above is of a reproduction built in the 1960s.
2 COMP 206: Computer Architecture and Implementation Montek Singh Thu, Feb 5, 2009 Topic: Pipelining II (Intermediate Concepts)
3 Outline: Control of the pipeline. Performance improvement. Problems: Hazards (structural hazards, data hazards, hazard resolution). Reading: Appendix A (HP4)
4 Pipeline with Control Signals: Note that we want the control signals to follow the instruction through the pipeline. For example, RegWrite is applied at the WB stage, not at ID.
5 Detail: ALU Control. ALU op in the low-order bits. Instruction fields: Op, Rs1, Rs2, Rd, Opx
6 ALUSrc: Simple control: register operand for R-type, immediate for lw and sw
7 RegDst: Chooses which portion of the instruction to use as the destination register (R-type and lw place the destination in different fields)
8 Not to Belabor the Point… Signals are decoded and carried as far as necessary. “Main Control” generates the control signals during Reg/Dec. Control signals for Exec (ExtOp, ALUSrc, ...) are used 1 cycle later. Control signals for Mem (MemWr, Branch) are used 2 cycles later. Control signals for WrB (MemtoReg, RegWr) are used 3 cycles later.
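The idea of carrying decoded signals "as far as necessary" can be sketched in a few lines of Python. This is only an illustrative model, not the course's implementation: the dictionary layout, the `step` function, and the grouping of signals by consuming stage are assumptions, and RegWr is placed in the WB group following the earlier note that RegWrite belongs to the WB stage.

```python
# Illustrative model (assumed structure): control signals grouped by the
# stage that consumes them, shifted through the pipeline registers.
EMPTY = {}

def step(pipe, decoded):
    """Advance one clock. Each pipeline register drops the group that the
    stage behind it has already consumed and passes the rest along."""
    return {
        "ID/EX": decoded,  # carries EX, MEM and WB groups
        "EX/MEM": {k: v for k, v in pipe["ID/EX"].items() if k != "EX"},
        "MEM/WB": {k: v for k, v in pipe["EX/MEM"].items() if k != "MEM"},
    }

# Hypothetical decoded signals for a load (values chosen for illustration).
lw = {
    "EX": {"ExtOp": 1, "ALUSrc": 1},
    "MEM": {"MemWr": 0, "Branch": 0},
    "WB": {"MemtoReg": 1, "RegWr": 1},
}

pipe = {"ID/EX": EMPTY, "EX/MEM": EMPTY, "MEM/WB": EMPTY}
pipe = step(pipe, lw)      # end of Reg/Dec: Exec uses its group 1 cycle later
pipe = step(pipe, EMPTY)   # Mem uses its group 2 cycles later
pipe = step(pipe, EMPTY)   # WrB uses its group 3 cycles later
print(pipe["MEM/WB"])
```

After three steps only the WB group of the load remains, sitting in MEM/WB, which is the point of the slide: each signal travels exactly as far as the stage that needs it.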
9 Control and Data
10 Aside: Sounds more complex than it really is to implement. In Verilog or VHDL you would just write simple logical expressions (or just rename wires). Ex: assign ALUSrc = (Inst[31:26] == 35 || Inst[31:26] == 43);
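For readers without a Verilog simulator at hand, the same decode can be sketched in Python. The opcode values 35 (lw) and 43 (sw) come from the slide; the opcode 0 for R-type and the field positions (rd in bits 15:11, rt in bits 20:16) are the standard MIPS encoding. The function names are made up for this example.

```python
OP_RTYPE, OP_LW, OP_SW = 0, 35, 43

def opcode(inst):
    """Bits 31:26 of a 32-bit instruction word."""
    return (inst >> 26) & 0x3F

def alu_src(inst):
    """True when the second ALU operand is the immediate (lw and sw)."""
    return opcode(inst) in (OP_LW, OP_SW)

def dest_register(inst):
    """RegDst choice: rd (bits 15:11) for R-type, rt (bits 20:16) otherwise."""
    if opcode(inst) == OP_RTYPE:
        return (inst >> 11) & 0x1F
    return (inst >> 16) & 0x1F

# lw $2, 100($1)  and  sub $2, $1, $3, encoded by hand
lw_word = (35 << 26) | (1 << 21) | (2 << 16) | 100
sub_word = (0 << 26) | (1 << 21) | (3 << 16) | (2 << 11) | 34
print(alu_src(lw_word), alu_src(sub_word))
print(dest_register(lw_word), dest_register(sub_word))
```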
11 Single Cycle vs. Multiple Cycle vs. Pipelined [Timing diagram for a Load, Store, R-type sequence. Single Cycle Implementation: one long clock cycle per instruction (Cycle 1, Cycle 2), with part of the Store cycle marked as waste. Multiple Cycle Implementation (Cycle 1 to Cycle 10): Load takes Ifetch, Reg, Exec, Mem, Wr; Store takes Ifetch, Reg, Exec, Mem; R-type then starts with Ifetch. Pipelined Implementation: Load, Store, and R-type overlap, each passing through Ifetch, Reg, Exec, Mem, Wr.]
12 CPU Designs: Summary. Disadvantages of the Single Cycle Processor: long cycle time; cycle time wasted for the faster instructions. Multiple Clock Cycle Processor: divide the instructions into smaller steps; execute each step (instead of the entire instruction) in 1 cycle. Pipelined Processor: natural enhancement of the multiple clock cycle processor; each functional unit is used only once per instruction; if an instruction is going to use a functional unit, it must use it at the same stage as all other instructions. Pipeline Control: each stage’s control signal depends ONLY on the instruction that is currently in that stage.
13 Speedup & Throughput of a Pipeline
14 Example from HP4. Non-pipelined: 1 ns clock, 4 cycles for ALU ops and branches, and 5 cycles for memory ops. Relative frequencies: 40%, 20%, 40%. Pipelined: 0.2 ns overhead (setup, etc.). Avg. execution time (non-pipelined) = Clock × Avg. CPI = 1 ns × ((40% + 20%) × 4 + 40% × 5) = 1 ns × 4.4 = 4.4 ns. Pipelined time per instruction = 1 ns + 0.2 ns overhead = 1.2 ns. Speedup = Unpipelined / Pipelined = 4.4 ns / 1.2 ns = 3.7 times. The overhead limits total speedup.
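The arithmetic of this example can be checked with a short Python sketch. The numbers are the ones on the slide; the variable names are chosen for this example, and the pipelined time per instruction is taken as the 1 ns clock plus the 0.2 ns overhead.

```python
# Instruction mix and cycle counts from the slide
freq = {"alu": 0.40, "branch": 0.20, "mem": 0.40}
cycles = {"alu": 4, "branch": 4, "mem": 5}
clock_ns = 1.0
overhead_ns = 0.2

avg_cpi = sum(freq[k] * cycles[k] for k in freq)   # weighted average CPI
unpipelined_ns = clock_ns * avg_cpi                # avg. time per instruction
pipelined_ns = clock_ns + overhead_ns              # one instruction per (longer) clock
speedup = unpipelined_ns / pipelined_ns

print(f"unpipelined {unpipelined_ns:.1f} ns, pipelined {pipelined_ns:.1f} ns, "
      f"speedup {speedup:.1f}x")
```

Setting `overhead_ns` to 0 gives a speedup of 4.4, which shows how much of the ideal gain the latch overhead takes away.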
15 Not Quite This Rosy: We run into problems with contention for resources and dependencies between instructions. Next: Hazards
16 Pipeline Hazards: Structural Hazard. A relation between two instructions indicating that the two instructions may want to use the same hardware resource (function unit, register file port, shared bus, cache port, etc.) at the same time. In principle, eliminated by duplicating resources, at the price of low hardware utilization and increased cost. The MIPS pipeline as designed so far does not have structural hazards, but we had to avoid them (see example later). Usually occurs when a long-latency functional unit is not fully pipelined (e.g., a floating point unit).
17 Structural Hazard: Example Consider system w/ single-ported memory
18 Solutions: Stall (insert a bubble). We could also use other techniques, such as a split cache, an instruction buffer, etc. More when we discuss memory.
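A rough Python sketch of the stall solution for a single-ported memory is given here. It assumes a five-stage pipeline in which the MEM stage comes three cycles after instruction fetch, and it models only the memory port; the function name and the boolean-list input are invented for the example.

```python
def fetch_cycles(uses_data_memory):
    """Return the cycle in which each instruction is fetched.

    uses_data_memory[i] is True for loads and stores. A fetch is delayed
    (a bubble is inserted) whenever an earlier load/store needs the single
    memory port in that cycle.
    """
    busy = set()     # cycles in which the port is taken by a MEM-stage access
    fetches = []
    cycle = 1
    for is_mem in uses_data_memory:
        while cycle in busy:      # structural hazard: stall the fetch
            cycle += 1
        fetches.append(cycle)
        if is_mem:
            busy.add(cycle + 3)   # MEM is three stages after IF
        cycle += 1
    return fetches

# A load followed by four ALU instructions: the fourth fetch collides
# with the load's memory access and slips by one cycle.
print(fetch_cycles([True, False, False, False, False]))  # [1, 2, 3, 5, 6]
```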
19 Resolving Structural Hazards. Early resolution (scheduling): done well before the collision could occur, and usually at a place different from where the collision could happen. Example: instructions are delayed in the ID stage. Late resolution: done at the place where the collision might happen, just before the collision is about to happen. Example: using an arbiter or a priority encoder; one instruction wins, and the others are denied access, stall, and wait for their next chance. Why allow structural hazards in the first place? Reduce cost. Reduce unit latency (by avoiding pipeline latch delays). Hazards may be infrequent (“make common case fast”).
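Late resolution with a priority encoder can be illustrated with a tiny Python sketch. The fixed-priority policy (lowest index wins) is just one possible choice, picked here for illustration; the slide does not say which policy the arbiter uses.

```python
def arbitrate(requests):
    """Fixed-priority arbiter sketch: the lowest-numbered requester wins.

    Returns (winner, stalled), where stalled lists the requesters that were
    denied access this cycle and must try again later.
    """
    winner = None
    stalled = []
    for i, wants_resource in enumerate(requests):
        if not wants_resource:
            continue
        if winner is None:
            winner = i
        else:
            stalled.append(i)
    return winner, stalled

print(arbitrate([False, True, True]))  # (1, [2])
```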
20 Data Hazard: Example. Consider the following code fragment:
    sub $2, $1, $3      # Reg 2 written
    and $12, $2, $5     # $2 used
    or  $13, $6, $2
    add $14, $2, $2
    sw  $15, 100($2)
Clearly the programmer would expect the newly set value of register 2 to be used.
21 Data Hazard: Example (contd)
22 No Problem with “sw”
23 Maybe OK with “add”: only if the register file can be read and written in a half cycle each
24 Not Correct Result for “and”, “or”
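The three cases on the preceding slides (fine for sw, borderline for add, wrong for and/or) follow from simple cycle counting, sketched here in Python. The model assumes a five-stage pipeline with no stalls and no forwarding, with registers read in ID and written in WB; the function and label names are made up for the example.

```python
ID_OFFSET = 1   # stage offsets from the fetch cycle: IF=0, ID=1, EX=2, MEM=3, WB=4
WB_OFFSET = 4

def read_status(producer_index, consumer_index):
    """Compare the consumer's register read (ID) with the producer's write (WB)."""
    write_cycle = producer_index + WB_OFFSET
    read_cycle = consumer_index + ID_OFFSET
    if read_cycle > write_cycle:
        return "ok"
    if read_cycle == write_cycle:
        return "same-cycle"   # ok only if the write lands before the read
    return "stale"

program = ["sub", "and", "or", "add", "sw"]   # sub writes $2, the rest read it
for i, name in enumerate(program[1:], start=1):
    print(name, read_status(0, i))
```

The "same-cycle" case is the add instruction: it works only when the register file can be written and read within one cycle.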
25 Types of Data Hazards. RAW hazards corresponding to value dependences are the most difficult to deal with, since they can never be eliminated: the second instruction is waiting for information produced by the first instruction. WAR and WAW hazards are name dependences: two instructions happen to use the same register (name), although they don’t have to. These can often be eliminated by renaming, either in software or hardware. Renaming implies the use of additional resources, hence additional cost. Renaming is not always possible: implicit operands such as accumulator, PC, or condition codes cannot be renamed. These hazards don’t cause problems for the MIPS pipeline: relative timing does not change even with pipelined execution, because reads occur early and writes occur late in the pipeline.
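The three hazard types can be told apart by comparing the destination and source registers of two instructions. The following Python sketch does only that; the `(dest, sources)` tuple representation and the function name are assumptions for illustration.

```python
def classify(first, second):
    """Classify dependences between two instructions in program order.

    Each instruction is a (dest, sources) pair; dest is None when the
    instruction writes no register (e.g. sw).
    """
    dest1, srcs1 = first
    dest2, srcs2 = second
    kinds = set()
    if dest1 is not None and dest1 in srcs2:
        kinds.add("RAW")   # second reads what first writes (value dependence)
    if dest2 is not None and dest2 in srcs1:
        kinds.add("WAR")   # second overwrites what first reads (name dependence)
    if dest1 is not None and dest1 == dest2:
        kinds.add("WAW")   # both write the same register (name dependence)
    return kinds

sub = ("$2", ["$1", "$3"])        # sub $2, $1, $3
and_ = ("$12", ["$2", "$5"])      # and $12, $2, $5
print(classify(sub, and_))        # {'RAW'}
```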
26 Easy Fix!!! Let Compiler Deal:
    sub $2, $1, $3      # Reg 2 written
    nop
    nop
    and $12, $2, $5     # $2 used, now OK
    or  $13, $6, $2
    add $14, $2, $2
    sw  $15, 100($2)
The original code sequence is common, though, so it’s not a good solution to waste so much (clock) time.
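What the compiler would have to do can be sketched as a small Python pass. It assumes that a consumer is safe once it sits at least three instruction slots after the producer, which matches the two nops on the slide and relies on the register file being written and read in the same cycle. The tuple format and the name `insert_nops` are invented for this example.

```python
MIN_DISTANCE = 3   # assumed: producer at slot i, consumer safe at slot i + 3

def insert_nops(program):
    """Pad a list of (op, dest, sources) tuples with nops to avoid RAW hazards."""
    out = []
    last_write = {}   # register -> slot in `out` of its most recent writer
    for op, dest, srcs in program:
        needed = 0
        for reg in srcs:
            if reg in last_write:
                gap = last_write[reg] + MIN_DISTANCE - len(out)
                needed = max(needed, gap)
        out.extend([("nop", None, [])] * needed)
        out.append((op, dest, srcs))
        if dest is not None:
            last_write[dest] = len(out) - 1
    return out

program = [
    ("sub", "$2", ["$1", "$3"]),
    ("and", "$12", ["$2", "$5"]),
    ("or", "$13", ["$6", "$2"]),
    ("add", "$14", ["$2", "$2"]),
    ("sw", None, ["$15", "$2"]),
]
print([op for op, _, _ in insert_nops(program)])
```

For the slide's fragment this yields sub, nop, nop, and, or, add, sw: two wasted slots for every such dependence, which is why the next slide turns to hardware.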
27 Hardware Solution: Forwarding. The correct value of $2 is available, just not yet stored in the register. Send it from where it is available!
28 How to Detect the Hazard. Let’s look at the logic for the two types of data hazards we’ve seen so far: 1a. EX/MEM.RegisterRd = ID/EX.RegisterRs; 1b. EX/MEM.RegisterRd = ID/EX.RegisterRt; 2a. MEM/WB.RegisterRd = ID/EX.RegisterRs; 2b. MEM/WB.RegisterRd = ID/EX.RegisterRt. Rd: destination. Rs and Rt: the two sources.
29 Type 1a and 2b Hazards. 1a. EX/MEM.RegisterRd = ID/EX.RegisterRs (sub followed by and); 2b. MEM/WB.RegisterRd = ID/EX.RegisterRt (sub followed by or). To detect a hazard, AND these conditions with the RegWrite signal.
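The four detection conditions, each ANDed with RegWrite, translate almost directly into code. The Python sketch below models each pipeline register as a small dictionary; that representation and the function name are assumptions, and the RegWrite used is taken to be the one carried in the same pipeline register as the Rd being compared.

```python
def detect_hazards(ex_mem, mem_wb, id_ex):
    """Return the set of hazard labels (1a, 1b, 2a, 2b) that are active."""
    hazards = set()
    if ex_mem["RegWrite"]:
        if ex_mem["Rd"] == id_ex["Rs"]:
            hazards.add("1a")
        if ex_mem["Rd"] == id_ex["Rt"]:
            hazards.add("1b")
    if mem_wb["RegWrite"]:
        if mem_wb["Rd"] == id_ex["Rs"]:
            hazards.add("2a")
        if mem_wb["Rd"] == id_ex["Rt"]:
            hazards.add("2b")
    return hazards

bubble = {"Rd": 0, "RegWrite": False}
sub = {"Rd": 2, "RegWrite": True}          # sub $2, $1, $3
and_result = {"Rd": 12, "RegWrite": True}  # and $12, $2, $5

# "and" in EX while "sub" is one stage ahead
print(detect_hazards(sub, bubble, {"Rs": 2, "Rt": 5}))      # {'1a'}
# "or $13, $6, $2" in EX while "sub" is two stages ahead
print(detect_hazards(and_result, sub, {"Rs": 6, "Rt": 2}))  # {'2b'}
```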
30 From This Pipeline to…
31 Modified Pipeline: Forwarding paths from EX/MEM and MEM/WB to each of the 2 register inputs of the ALU
32 With Control (RegWr)
33 Example:
    sub $2, $1, $3
    and $4, $2, $5
    or  $4, $4, $2
    add $9, $4, $2
Lots of potential hazards
34 Clock 3
35 Clock 4
36 Clock 5
37 Clock 6 Make sure we get the correct value of $4
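Getting the correct value of $4 comes down to which forwarding source wins when both pipeline registers hold a result for the same register. A hedged Python sketch follows; it assumes the more recent result (EX/MEM) takes priority over MEM/WB, and it leaves out any special handling of register $0. The dictionary layout and names are invented for the example.

```python
def forward_select(src, ex_mem, mem_wb):
    """Pick where an ALU operand comes from, preferring the newest result."""
    if ex_mem["RegWrite"] and ex_mem["Rd"] == src:
        return "EX/MEM"
    if mem_wb["RegWrite"] and mem_wb["Rd"] == src:
        return "MEM/WB"
    return "REGFILE"

# Clock 6 of the example: add $9, $4, $2 is in EX,
# or $4, $4, $2 is one stage ahead and and $4, $2, $5 two stages ahead.
or_instr = {"Rd": 4, "RegWrite": True}
and_instr = {"Rd": 4, "RegWrite": True}

print(forward_select(4, or_instr, and_instr))   # EX/MEM: the value written by "or"
print(forward_select(2, or_instr, and_instr))   # REGFILE: "sub" has already written $2
```

Without the priority, the add could pick up the older $4 produced by the and instruction instead of the one produced by the or.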
38 Next Time: Sometimes forwarding is not good enough. Control hazards.