1 COMP 206: Computer Architecture and Implementation Montek Singh Wed., Sep 11, 2002 Topic: Pipelining (Intermediate Concepts)
2 How About Control Signals?
[Figure: pipelined datapath with the IF/ID, ID/Ex, Ex/Mem, and Mem/Wr registers, shown while a load is in its Exec stage (ExtOp=1, ALUSrc=1, ALUOp=Add, RegDst=0); Ex/Mem holds the load’s address]
Key observation: Control signals at stage N = Func(instruction at stage N), for N = Exec, Mem, or WrB.
- Example: control signals at the Exec stage = Func(Load’s Exec)
- What about Ifetch and Reg/Dec?
3 Pipeline Control
[Figure: Main Control in the Reg/Dec stage; the signals ExtOp, ALUOp, RegDst, ALUSrc, Branch, MemWr, MemtoReg, and RegWr pass through the ID/Ex, Ex/Mem, and Mem/Wr registers, with MemtoReg and RegWr carried through to WrB]
“Main Control” generates the control signals during Reg/Dec:
- Control signals for Exec (ExtOp, ALUSrc, ...) are used 1 cycle later
- Control signals for Mem (MemWr, Branch) are used 2 cycles later
- Control signals for WrB (MemtoReg, RegWr) are used 3 cycles later
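The delayed use of control signals can be pictured as a control word that is decoded once in Reg/Dec and then shifted through the pipeline registers, shedding each group of signals as it is consumed. The following is a minimal sketch of that idea, not the actual control logic: it models only signal names (no values), it assumes RegWr is the write-back signal as the figure labels indicate, and the names `SIGNAL_GROUPS`, `cycles_until_used`, and `pipeline_register_contents` are hypothetical.

```python
# Hypothetical model: which control signals each stage consumes.
# Only names are modelled; signal values are left out.
SIGNAL_GROUPS = {
    "Exec": ["ExtOp", "ALUSrc", "ALUOp", "RegDst"],
    "Mem": ["MemWr", "Branch"],
    "WrB": ["MemtoReg", "RegWr"],
}
STAGES_AFTER_DECODE = ["Exec", "Mem", "WrB"]
PIPELINE_REGISTERS = ["ID/Ex", "Ex/Mem", "Mem/Wr"]


def cycles_until_used(signal):
    """Number of cycles after Reg/Dec at which `signal` is consumed."""
    for delay, stage in enumerate(STAGES_AFTER_DECODE, start=1):
        if signal in SIGNAL_GROUPS[stage]:
            return delay
    raise KeyError(signal)


def pipeline_register_contents():
    """Signals each pipeline register must still carry for later stages."""
    carried = {}
    for i, register in enumerate(PIPELINE_REGISTERS):
        remaining = []
        for stage in STAGES_AFTER_DECODE[i:]:
            remaining.extend(SIGNAL_GROUPS[stage])
        carried[register] = remaining
    return carried


for register, signals in pipeline_register_contents().items():
    print(register, signals)
```

In this sketch ID/Ex carries all eight signals, Ex/Mem drops the Exec group, and Mem/Wr carries only the write-back pair.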
4 A More Extensive Pipelining Example
Program (each instruction passes through Ifetch, Reg/Dec, Exec, Mem, WrB, starting one cycle apart):
- 0: Load
- 4: R-type
- 8: Store
- 12: Beq (target is 1000)
Pipeline state:
- End of Cycle 4: Load’s Mem, R-type’s Exec, Store’s Reg, Beq’s Ifetch
- End of Cycle 5: Load’s WrB, R-type’s Mem, Store’s Exec, Beq’s Reg
- End of Cycle 6: R-type’s WrB, Store’s Mem, Beq’s Exec
- End of Cycle 7: Store’s WrB, Beq’s Mem
5 Pipelining Example: End of Cycle 4
0: Load’s Mem; 4: R-type’s Exec; 8: Store’s Reg; 12: Beq’s Ifetch
6 Pipelining Example: End of Cycle 5
0: Lw’s Wr; 4: R’s Mem; 8: Store’s Exec; 12: Beq’s Reg; 16: R’s Ifetch
7 Pipelining Example: End of Cycle 6
4: R’s Wr; 8: Store’s Mem; 12: Beq’s Exec; 16: R’s Reg; 20: R’s Ifetch
8 Pipelining Example: End of Cycle 7
8: Store’s Wr; 12: Beq’s Mem; 16: R’s Exec; 20: R’s Reg; 24: R’s Ifetch
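The cycle-by-cycle snapshots in this example follow from one rule: an instruction fetched in cycle s is in stage number (c - s) during cycle c. A small sketch of that rule, assuming one instruction enters Ifetch per cycle and nothing stalls; the function name `snapshot` is hypothetical, and only the first four instructions of the example are modelled.

```python
STAGES = ["Ifetch", "Reg/Dec", "Exec", "Mem", "WrB"]


def snapshot(program, cycle):
    """Map each in-flight instruction to the stage it occupies in `cycle`.

    Cycles are 1-based. Assumes one instruction is fetched per cycle
    and that no instruction ever stalls.
    """
    occupancy = {}
    for index, name in enumerate(program):
        fetch_cycle = index + 1
        stage = cycle - fetch_cycle
        if 0 <= stage < len(STAGES):
            occupancy[name] = STAGES[stage]
    return occupancy


program = ["0: Load", "4: R-type", "8: Store", "12: Beq"]
for cycle in range(4, 8):
    print(f"End of cycle {cycle}: {snapshot(program, cycle)}")
```

Instructions that have left the pipeline, or have not yet been fetched, simply do not appear in the returned mapping.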
9 CPU Designs: Summary
Disadvantages of the Single Cycle Processor
- Long cycle time
- Cycle time is too long for all instructions except the Load
Multiple Clock Cycle Processor
- Divide the instructions into smaller steps
- Execute each step (instead of the entire instruction) in one cycle
Pipelined Processor
- Natural enhancement of the multiple clock cycle processor
- Each functional unit can only be used once per instruction
- If an instruction is going to use a functional unit, it must use it at the same stage as all other instructions
- Pipeline Control: each stage’s control signal depends ONLY on the instruction that is currently in that stage
10 Single Cycle vs. Multiple Cycle vs. Pipelined
[Figure: timing diagrams for the three implementations. Single Cycle Implementation: Load and Store each take one long cycle (Cycle 1, Cycle 2), with wasted time marked after the Store. Multiple Cycle Implementation (Cycle 1 to Cycle 10): Load (Ifetch, Reg, Exec, Mem, Wr), then Store (Ifetch, Reg, Exec, Mem), then R-type (Ifetch, ...). Pipelined Implementation: Load, Store, and R-type overlap, each passing through Ifetch, Reg, Exec, Mem, Wr]
11 Pipelining: Notation, Terminology etc.
Time
- Discrete time steps
- Represented as 1, 2, 3, …
Space
- Pipe stages or segments (things that do processing)
- Represented as P, Q, R, S (or F, D, X, M, W for the MIPS pipeline)
Operands
- Instructions or data items
- Things that flow through, and are processed by, the pipeline
- Represented as a, b, c, …
In drawing pipelines, we conceal the obvious fact that each operand undergoes some changes in each pipe stage
12 Notations for Describing Pipelines
- Space-time diagram, or Gantt chart
- Reservation table by stages
  - Rows represent pipeline stages
  - Unbounded one way
  - Notation of HP3
- Reservation table by instructions
  - Rows represent operands
  - Unbounded both ways
13 Basic Terms
- Filling a pipeline
- Flushing or draining a pipeline
- Stage or segment delay
  - Each stage may have a different stage delay
- Beat time (= max stage delay), or clock cycle time
- Number of stages
- End-to-end latency
  - number of stages × beat time
- Stages are separated by latches (registers)
14 Speedup & Throughput of a Pipeline
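The body of this slide was not captured in the transcript. As a hedged illustration, the usual textbook estimate can be sketched as follows, assuming k stages, a beat time shared by all stages, n operands, no stalls, and an unpipelined time of k beats per operand. The function names are hypothetical and the formulas are the standard ones, not necessarily those on the slide.

```python
def pipelined_time(n, k, beat=1.0):
    """Time for n operands to pass through a k-stage pipeline, no stalls.

    The first operand needs k beats to fill the pipe; each later
    operand completes one beat after its predecessor.
    """
    return (k + n - 1) * beat


def unpipelined_time(n, k, beat=1.0):
    """Time when each operand takes all k beats before the next starts."""
    return n * k * beat


def speedup(n, k, beat=1.0):
    return unpipelined_time(n, k, beat) / pipelined_time(n, k, beat)


def throughput(n, k, beat=1.0):
    """Operands completed per unit time."""
    return n / pipelined_time(n, k, beat)


for n in (1, 10, 100, 10_000):
    print(n, round(speedup(n, k=5), 3), round(throughput(n, k=5), 3))
```

Under these assumptions the speedup approaches the number of stages, and the throughput approaches one operand per beat, as n grows.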
15 Pipeline Hazards: Structural Hazard
- A relation between two instructions indicating that the two instructions may want to use the same hardware resource (function unit, register file port, shared bus, cache port, etc.) at the same time
- In principle, can always be eliminated by duplicating resources
  - Low hardware utilization
  - Increased cost
- MIPS pipeline as designed so far does not have a structural hazard
  - But we had to avoid it (see example later)
- Usually occurs when a functional unit is not fully pipelined (e.g., in floating point pipeline)
16 Example: Unified I- and D-Memory
[Figure: pipeline diagrams that are invalid because of a structural hazard on the single memory port]
[Figure: pipeline diagrams with hazards resolved]
17 Resolving Structural Hazards
Early resolution (scheduling)
- Done well before the collision could occur, and usually at a place different from where the collision could happen
- Example: instructions are delayed in the ID stage
Late resolution
- Done at the place where the collision might happen
- Done just before the collision is about to happen
- Example: using an arbiter or a priority encoder
  - One instruction wins
  - Others are denied access, stall, and wait for their next chance
Why allow structural hazards in the first place?
- Reduce cost
- Reduce unit latency (by avoiding pipeline latch delays)
- Hazards may be infrequent events (“make the common case fast”)
18 Example: Cost of Structural Hazard
Suppose that 40% of instruction mix are loads or stores, and that the ideal CPI of the pipelined machine is 1. Assume that the machine with the structural hazard has a clock rate that is 5% higher than the clock rate of the machine without the hazard. Which pipeline is faster, and by how much?
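One way to work this example is to compare average instruction times, where average instruction time = CPI × clock cycle time. The sketch below assumes that every load or store costs exactly one stall cycle on the machine with the hazard; that assumption is not stated on the slide, and the function name is hypothetical.

```python
def average_instruction_time(ideal_cpi, stall_fraction, stall_cycles, clock_rate):
    """Average time per instruction, in units of the reference clock period."""
    cpi = ideal_cpi + stall_fraction * stall_cycles
    return cpi / clock_rate


# Machine without the hazard: reference clock rate, no stalls.
without_hazard = average_instruction_time(1.0, 0.0, 0, clock_rate=1.00)

# Machine with the hazard: 40% of instructions stall one cycle (assumed),
# but the clock runs 5% faster.
with_hazard = average_instruction_time(1.0, 0.4, 1, clock_rate=1.05)

ratio = with_hazard / without_hazard
print(f"with hazard / without hazard = {ratio:.2f}")
```

Under the one-stall-per-memory-reference assumption the ratio is 1.4 / 1.05, about 1.33, so the machine without the structural hazard comes out roughly 1.3 times faster despite its slower clock.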
19 Data Hazard: Setup
Instruction u
- D(u): domain of instruction u
  - The set of all memory locations, registers (including implicit ones), flags, condition codes etc. that may be read by instruction u
- R(u): range of instruction u
  - The set of all memory locations, registers (including implicit ones), flags, condition codes etc. that may be written by instruction u
u < v is a relation that means that instruction u precedes instruction v in the original program order (i.e., on an unpipelined machine)
The relation < is irreflexive, anti-symmetric, and transitive
20 Data Hazard: Definition
Given two instructions u and v, such that u < v, there is a data hazard between them if any of the following conditions holds:
- RAW (read after write): R(u) ∩ D(v) ≠ ∅
- WAR (write after read): D(u) ∩ R(v) ≠ ∅
- WAW (write after write): R(u) ∩ R(v) ≠ ∅
The existence of one of these conditions means that a change in the order of reading/writing operands by the instructions from the order seen by sequentially executing instructions on an unpipelined machine could violate the intended semantics
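The definition can be exercised with Python sets standing in for the domain D and range R of each instruction. This is a sketch that assumes the conditions are the usual three set-intersection tests (RAW, WAR, WAW); the function name `hazards` and the second and third instruction pairs are hypothetical illustrations.

```python
def hazards(u_reads, u_writes, v_reads, v_writes):
    """Hazard kinds between u and v, where u precedes v in program order.

    u_reads / v_reads play the role of D(u) / D(v);
    u_writes / v_writes play the role of R(u) / R(v).
    """
    found = set()
    if u_writes & v_reads:
        found.add("RAW")  # v reads something u writes
    if u_reads & v_writes:
        found.add("WAR")  # v overwrites something u reads
    if u_writes & v_writes:
        found.add("WAW")  # both write the same location
    return found


# u: ADD R1, R2, R3    v: SUB R4, R5, R1
print(hazards({"R2", "R3"}, {"R1"}, {"R5", "R1"}, {"R4"}))

# u: ADD R1, R2, R3    v: SUB R2, R5, R6   (hypothetical pair)
print(hazards({"R2", "R3"}, {"R1"}, {"R5", "R6"}, {"R2"}))

# u: ADD R1, R2, R3    v: SUB R1, R5, R6   (hypothetical pair)
print(hazards({"R2", "R3"}, {"R1"}, {"R5", "R6"}, {"R1"}))
```

More than one kind can hold for the same pair, which is why the function returns a set.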
21 Why Data Hazards Occur
- Pipelining changes relative timing of instructions
- Reads and writes occur at fixed positions of the pipeline
- So, if two instructions are “too close” (a function of pipeline structure), the order of reads and writes could change and produce incorrect values
Example:
    XOR R2, R2, R1
    XOR R1, R1, R2
    XOR R2, R2, R1
- This instruction sequence exchanges values in R1 and R2
- On unpipelined MIPS, back-to-back execution of the sequence produces correct results
- On current pipelined MIPS, initiation of the sequence in consecutive cycles produces incorrect results
  - Reads are early, writes are late, so RAW dependences would be violated
22 Data Dependence and Hazards
- True (value, flow) dependence between instructions u and v means u produces a result value that v uses
  - This is a producer-consumer relationship
  - This is a dependence based on values, not on the names of the containers of the values
- Every true dependence is a RAW hazard
- Not every RAW hazard is a true dependence
  - Any RAW hazard that cannot be removed by renaming is a true dependence
Original program
    1: A = B+C
    2: A = D+E
    3: G = A+H
True dependence: (2,3). RAW hazard: (1,3), (2,3)
Renamed program
    1: X = B+C
    2: A = D+E
    3: G = A+H
True dependence: (2,3). RAW hazard: (2,3)
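The difference between a RAW hazard (based on names) and a true dependence (based on values) can be checked mechanically on the two programs from this slide. In the sketch below each statement is a hypothetical `(destination, sources)` pair; a RAW hazard is any earlier writer of a name that a later statement reads, while a true dependence links a reader only to the most recent writer.

```python
def raw_hazards(program):
    """Pairs (u, v), u < v, where v reads a name that u writes."""
    pairs = []
    for i, (dest_u, _) in enumerate(program):
        for j in range(i + 1, len(program)):
            if dest_u in program[j][1]:
                pairs.append((i + 1, j + 1))
    return pairs


def true_dependences(program):
    """Pairs (u, v) where u is the most recent writer of a name v reads."""
    last_writer = {}
    pairs = set()
    for j, (dest, sources) in enumerate(program, start=1):
        for name in sources:
            if name in last_writer:
                pairs.add((last_writer[name], j))
        last_writer[dest] = j
    return sorted(pairs)


original = [("A", ["B", "C"]), ("A", ["D", "E"]), ("G", ["A", "H"])]
renamed = [("X", ["B", "C"]), ("A", ["D", "E"]), ("G", ["A", "H"])]

for label, program in (("original", original), ("renamed", renamed)):
    print(label, "RAW:", raw_hazards(program), "true:", true_dependences(program))
```

Renaming the destination of statement 1 removes the RAW hazard (1,3) and leaves the true dependence (2,3) untouched.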
23 More on Hazards
- RAW hazards corresponding to value dependences are most difficult to deal with, since they can never be eliminated
  - The second instruction is waiting for information produced by the first instruction
- WAR and WAW hazards are name dependences
  - Two instructions happen to use the same register (name), although they don’t have to
  - Can often be eliminated by renaming, either in software or hardware
    - Implies the use of additional resources, hence additional cost
    - Renaming is not always possible: implicit operands such as accumulator, PC, or condition codes cannot be renamed
  - These hazards don’t cause problems for MIPS pipeline
    - Relative timing does not change even with pipelined execution, because reads occur early and writes occur late in pipeline
24 The Precedence Relation
- Consider a straight line program listed in original program order
- Define a relation D (the dependence relation) between pairs of instructions (u, v) as follows:
  - D(u, v) if and only if (u < v), and there is a WAR, WAW, or RAW hazard between instructions u and v
  - D is irreflexive and anti-symmetric but not transitive
- Define the precedence relation P as the transitive closure of the dependence relation D
  - P is irreflexive, anti-symmetric, and transitive
  - Represent P by the graph of its transitive reduction (the precedence graph)
  - If P(u,v), then u must precede v in execution: the two instructions cannot be interchanged, and in a pipeline they must maintain a “sufficient” distance
Example:
    ADD R4, R5, R6
    ADD R3, R4, R5
    ADD R2, R3, R7
25 Example of Precedence Relation
    1: ADD R1, R7, R8
    2: SW 2000(R9), R8
    3: LW R3, 0(R1)
    4: LW R4, 3000(R9)
    5: ADD R5, R3, R4
    6: MUL R6, R5, R5
Assume that registers R7, R8, R9 are already initialized such that (R7)+(R8) = (R9)+2000 holds
[Figure: precedence graph over instructions 1 to 6]
Valid sequences:
- 4, 1, 2, 3, 5, 6
- 1, 4, 2, 3, 5, 6
- 1, 2, 4, 3, 5, 6
- 1, 2, 3, 4, 5, 6
- 4, 2, 1, 3, 5, 6
- 2, 4, 1, 3, 5, 6
- 2, 1, 4, 3, 5, 6
- 2, 1, 3, 4, 5, 6
These eight sequences of the six instructions can result in correct execution, because they respect the sequencing constraints of the precedence graph. We still have to ensure that they maintain “sufficient” distance in the instruction pipeline, which depends on the structure of the pipeline and the latencies of the operations.
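The count of eight can be reproduced by brute force: build the dependence relation D from read and write sets, then keep every permutation in which each dependent pair stays in order. In this sketch the strings `"M[R9+2000]"` and `"M[R9+3000]"` are hypothetical symbolic names for memory words, and instruction 3 is taken to read `M[R9+2000]` because of the slide's assumption that (R7)+(R8) = (R9)+2000.

```python
from itertools import permutations

# instruction number -> (locations read, locations written)
PROGRAM = {
    1: ({"R7", "R8"}, {"R1"}),                # ADD R1, R7, R8
    2: ({"R8", "R9"}, {"M[R9+2000]"}),        # SW 2000(R9), R8
    3: ({"R1", "M[R9+2000]"}, {"R3"}),        # LW R3, 0(R1)
    4: ({"R9", "M[R9+3000]"}, {"R4"}),        # LW R4, 3000(R9)
    5: ({"R3", "R4"}, {"R5"}),                # ADD R5, R3, R4
    6: ({"R5"}, {"R6"}),                      # MUL R6, R5, R5
}


def depends(u, v):
    """True if a RAW, WAR, or WAW hazard exists between u and v (u < v)."""
    u_reads, u_writes = PROGRAM[u]
    v_reads, v_writes = PROGRAM[v]
    return bool(u_writes & v_reads or u_reads & v_writes or u_writes & v_writes)


# Dependence relation D over pairs in original program order.
D = [(u, v) for u in PROGRAM for v in PROGRAM if u < v and depends(u, v)]


def respects(order):
    """True if every pair in D keeps its relative order in `order`."""
    position = {instr: i for i, instr in enumerate(order)}
    return all(position[u] < position[v] for u, v in D)


valid = [order for order in permutations(PROGRAM) if respects(order)]
print("D =", D)
print(len(valid), "valid orders")
for order in valid:
    print(order)
```

Checking D alone is enough here, because any order that respects every pair in D also respects its transitive closure P.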
26 Data Hazard: Effect on Compiler
27 Data Hazard: Effect on Pipelining
    1: ADD R1, R2, R3
    2: SUB R4, R5, R1
    3: AND R6, R1, R7
    4: OR R8, R1, R9
    5: XOR R10, R1, R11
RAW hazards: (1,2), (1,3), (1,4), (1,5)
28 Value Forwarding/Bypassing
- There is slack in how soon a value is actually available and how late it is actually required in the pipeline
  - Result of R-type available at end of X stage
  - Operand of dependent R-type not needed until beginning of X stage
- Communication of values among instructions happens through register file
  - Globally known names of containers of values
  - Accessed at fixed stages of pipeline (read in D, written in W)
- Forwarding/bypassing/short-circuiting corresponds to establishing a direct path between the producer of a value and its consumer, bypassing the container
  - Allows us to exploit slack
  - Requires additional resources (forwarding paths and controller)
- Identify all forwarding paths needed on MIPS (Figure in book is incomplete)
29 Forwarding: Example 2
    1: ADD R1, R2, R3
    2: LW R4, 0(R1)
    3: SW 12(R1), R4
[Figure: execution without forwarding, stalling as necessary]
[Figure: execution with forwarding]
30 Forwarding & Stalling: Example 3
    L1: LW R2, 40(R8)
    L2: LW R3, 60(R8)
    A: ADD R4, R2, R3
    S: SW 60(R8), R4
Load has a latency of one cycle that cannot be hidden, as seen between L2 and A
[Figures: pipeline diagrams captioned “Without forwarding: stall where needed”, “Bad attempt at forwarding!”, and “With forwarding: need a stall!”, annotated “Cannot go backward in time!” and “Cannot jump ahead in time!”]
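The stall counts for this example can be estimated with a simple in-order model that tracks the minimum issue distance between a producer and its consumer. The sketch below is a rough approximation under stated assumptions, not the exact behaviour shown in the slide's diagrams: without forwarding the distance is 3 (the consumer's D stage coincides with the producer's W stage, with the register file written in the first half of the cycle and read in the second); with forwarding it is 1 after an ALU result and 2 after a load; a store's operands are treated like any other operand, which is conservative. All names are hypothetical.

```python
def issue_slots(program, forwarding):
    """Cycle in which each instruction effectively starts, in program order.

    program: list of (kind, dest, sources); kind is "load", "alu", or "store".
    Stall cycles are modelled as a delayed start of the stalled instruction.
    """
    producer = {}  # register -> (start cycle, kind) of its latest writer
    slots = []
    next_free = 1
    for kind, dest, sources in program:
        cycle = next_free
        for reg in sources:
            if reg in producer:
                p_cycle, p_kind = producer[reg]
                if not forwarding:
                    distance = 3
                elif p_kind == "load":
                    distance = 2
                else:
                    distance = 1
                cycle = max(cycle, p_cycle + distance)
        slots.append(cycle)
        if dest is not None:
            producer[dest] = (cycle, kind)
        next_free = cycle + 1  # in-order issue, one instruction per cycle
    return slots


def stall_cycles(program, forwarding):
    """Total stall cycles relative to back-to-back issue."""
    slots = issue_slots(program, forwarding)
    return slots[-1] - slots[0] - (len(program) - 1)


example3 = [
    ("load", "R2", ["R8"]),         # L1: LW R2, 40(R8)
    ("load", "R3", ["R8"]),         # L2: LW R3, 60(R8)
    ("alu", "R4", ["R2", "R3"]),    # A:  ADD R4, R2, R3
    ("store", None, ["R8", "R4"]),  # S:  SW 60(R8), R4
]

print("without forwarding:", stall_cycles(example3, forwarding=False))
print("with forwarding:   ", stall_cycles(example3, forwarding=True))
```

In this model forwarding removes every stall except the one between L2 and A, which matches the statement that a load has one cycle of latency that cannot be hidden.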
31 Forwarding & Stalling: Example 4
    L: LW R1, 0(R1)
    S: SUB R4, R1, R5
    A: AND R6, R1, R7
    O: OR R8, R1, R9
No forwarding needed from L to A: can resolve this by writing register file in first half of cycle and reading it in second half of cycle.
32 [Figure: datapath with a Forwarding Unit driving Forward A and Forward B; labels: Rs, Rt, Rd, Registers, Data Memory, ID/EX, EX/MEM, MEM/WB, Load Data Forwarding]
33 Compile-Time Scheduling
Source statements:
    A = B + C;
    D = E - F;
Original order:
    L1: LW Rb, B
    L2: LW Rc, C
    A: ADD Ra, Rb, Rc
    S1: SW A, Ra
    L3: LW Re, E
    L4: LW Rf, F
    S: SUB Rd, Re, Rf
    S2: SW D, Rd
Scheduled order:
    L1: LW Rb, B
    L2: LW Rc, C
    L3: LW Re, E
    A: ADD Ra, Rb, Rc
    L4: LW Rf, F
    S1: SW A, Ra
    S: SUB Rd, Re, Rf
    S2: SW D, Rd
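The benefit of the reordering can be checked by counting load-use pairs, that is, instructions that read the destination of the load immediately before them. The sketch assumes a pipeline with forwarding in which each such pair costs one stall cycle; it counts a store after a load the same way, which may overcount on pipelines that can forward load data to a store. The tuple layout `(label, opcode, dest, sources)` and the function name are hypothetical.

```python
def load_use_stalls(program):
    """Count instructions that read the destination of the preceding load."""
    stalls = 0
    for previous, current in zip(program, program[1:]):
        _, opcode, dest, _ = previous
        if opcode == "LW" and dest in current[3]:
            stalls += 1
    return stalls


original = [
    ("L1", "LW", "Rb", []),
    ("L2", "LW", "Rc", []),
    ("A", "ADD", "Ra", ["Rb", "Rc"]),
    ("S1", "SW", None, ["Ra"]),
    ("L3", "LW", "Re", []),
    ("L4", "LW", "Rf", []),
    ("S", "SUB", "Rd", ["Re", "Rf"]),
    ("S2", "SW", None, ["Rd"]),
]

# Same instructions, reordered so no instruction follows the load it needs.
order = ["L1", "L2", "L3", "A", "L4", "S1", "S", "S2"]
by_label = {instr[0]: instr for instr in original}
scheduled = [by_label[label] for label in order]

print("original: ", load_use_stalls(original))
print("scheduled:", load_use_stalls(scheduled))
```

Under these assumptions the original order has two load-use pairs (L2 to A and L4 to S) and the scheduled order has none.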