Presentation is loading. Please wait.

Presentation is loading. Please wait.

EEL-4713 Ann Gordon-Ross 1 EEL-4713C Computer Architecture Designing a Pipelined Processor.

Similar presentations


Presentation on theme: "EEL-4713 Ann Gordon-Ross 1 EEL-4713C Computer Architecture Designing a Pipelined Processor."— Presentation transcript:

1 EEL-4713 Ann Gordon-Ross 1 EEL-4713C Computer Architecture Designing a Pipelined Processor

2 EEL-4713 Ann Gordon-Ross 2 Outline °Introduction to the pipelined datapath/control Key concepts and examples Beginning with multi-cycle datapath °Reading: 4.5-4.9

3 EEL-4713 Ann Gordon-Ross 3 Recap: Multiple Cycle Implementation °Break the instruction into smaller steps °Execute each step (instead of the entire instruction) in one cycle -Strive to keep steps with similar length °Instructions take multiple cycles to execute Only one instruction uses the datapath at any given cycle °Pipelining Goal: improve performance by having multiple instructions in the datapath at any given cycle

4 EEL-4713 Ann Gordon-Ross 4 Pipelining is Natural! °Laundry Example °Alice, Bob, Charlie, and Debbie each have one load of clothes to wash, dry, and fold °Washer takes 30 minutes °Drier takes 30 minutes °“Folder” takes 30 minutes °“Stasher” takes 30 minutes to put clothes into drawers ABCD

5 EEL-4713 Ann Gordon-Ross 5 Sequential Laundry °Sequential laundry takes 8 hours for 4 loads 30 TaskOrderTaskOrder B C D A Time 30 6 PM 7 8 9 10 11 12 1 2 AM

6 EEL-4713 Ann Gordon-Ross 6 Pipelined Laundry: Start work ASAP °Pipelined laundry takes 3.5 hours for 4 loads! TaskOrderTaskOrder 12 2 AM 6 PM 7 8 9 10 11 1 Time B C D A 30

7 EEL-4713 Ann Gordon-Ross 7 Pipelining Lessons °Pipelining doesn’t help latency of single task, it helps throughput of entire workload °Multiple tasks operating simultaneously using different resources °Potential speedup = Number of pipe stages °Also: Pipeline rate limited by slowest pipeline stage Unbalanced lengths of pipe stages reduces speedup Time to “fill” pipeline and time to “drain” it reduces speedup Dependences - stalls 6 PM 789 Time B C D A 30 TaskOrderTaskOrder

8 EEL-4713 Ann Gordon-Ross 8 Why Pipeline? °Suppose we execute 100 instructions °Single Cycle Machine, 220MHz 4.5 ns/cycle x 1 CPI x 100 inst = 450 ns °Multicycle Machine, 1GHz, average CPI 4.6 1 ns/cycle x 4.6 CPI x 100 inst = 460 ns °Ideal pipelined machine, 1GHz 1 ns/cycle x (1 CPI x 100 inst + 4 cycle drain) = 104 ns

9 EEL-4713 Ann Gordon-Ross 9 Datapath: from multicycle to pipelined °Key approach: Begin by considering each cycle a pipeline stage Add registers between consecutive stages to store data and control bits flowing from one stage to the next Add logic to deal with “hazards” -Ensure functional units are not used by more than one stage at the same time -Ensure producer/consumer dependences are not violated Memory/register write timing must change -Recall single and multiple cycle: read at beginning of cycle, write at end -Pipelined: could write every cycle, writes occur at the beginning of the clock cycle, reads at the end °Example: load instruction

10 EEL-4713 Ann Gordon-Ross 10 Timing Diagram of a Load Instruction Clk PC Rs, Rt, Rd, Op, Func Clk-to-Q ALUctr Instruction Memory Access Time Old ValueNew Value RegWrOld ValueNew Value Delay through Control Logic busA Register File Access Time Old ValueNew Value busB ALU Delay Old ValueNew Value Old ValueNew Value Old Value ExtOpOld ValueNew Value ALUSrcOld ValueNew Value AddressOld ValueNew Value busWOld ValueNew Delay through Extender & Mux Data Memory Access Time Instruction FetchInstr Decode / Reg. Fetch AddressReg WrData Memory Register File Write Time

11 EEL-4713 Ann Gordon-Ross 11 The Five Stages of Load °Ifetch: Instruction Fetch Fetch the instruction from the Instruction Memory °Reg/Dec: Registers Fetch and Instruction Decode °Exec: Calculate the memory address °Mem: Read the data from the Data Memory °Wr: Write the data back to the register file Cycle 1Cycle 2Cycle 3Cycle 4Cycle 5 IfetchReg/DecExecMemWrLoad

12 EEL-4713 Ann Gordon-Ross 12 Pipelining the Load Instruction °The five independent functional units in the pipeline datapath are: Instruction Memory for the Ifetch stage Register file: read ports (bus A and busB) for the Reg/Dec stage ALU for the Exec stage Data Memory for the Mem stage Register File’s Write port (bus W) for the Wr stage °One instruction enters the pipeline every cycle One instruction comes out of the pipeline (complete) every cycle The effective cycles per Instruction (CPI) is 1 Clock Cycle 1Cycle 2Cycle 3Cycle 4Cycle 5Cycle 6Cycle 7 IfetchReg/DecExecMemWr1st lw IfetchReg/DecExecMemWr2nd lw IfetchReg/DecExecMemWr3rd lw

13 EEL-4713 Ann Gordon-Ross 13 Why Pipeline? Because the resources are there! I n s t r. O r d e r Time (clock cycles) Inst 0 Inst 1 Inst 2 Inst 4 Inst 3 ALU Im Reg DmReg ALU Im Reg DmReg ALU Im Reg DmReg ALU Im Reg DmReg ALU Im Reg DmReg

14 EEL-4713 Ann Gordon-Ross 14 Conventional Pipelined Execution Representation IFetchDcdExecMemWB IFetchDcdExecMemWB IFetchDcdExecMemWB IFetchDcdExecMemWB IFetchDcdExecMemWB IFetchDcdExecMemWB Program Flow Time

15 EEL-4713 Ann Gordon-Ross 15 Single Cycle, Multiple Cycle, vs. Pipeline Clk Cycle 1 Multiple Cycle Implementation: IfetchRegExecMemWr Cycle 2Cycle 3Cycle 4Cycle 5Cycle 6Cycle 7Cycle 8Cycle 9Cycle 10 LoadIfetchRegExecMemWr IfetchRegExecMem LoadStore Pipeline Implementation: IfetchRegExecMemWrStore Clk Single Cycle Implementation: LoadStoreWaste Ifetch R-type IfetchRegExecMemWrR-type Cycle 1Cycle 2

16 EEL-4713 Ann Gordon-Ross 16 *Can pipelining get us into trouble? °Yes: Pipeline Hazards structural hazards: attempt to use the same resource two different ways at the same time -E.g., combined washer/dryer would be a structural hazard or folder busy doing something else (watching TV) data hazards: attempt to use item before it is ready -E.g., one sock of pair in dryer and one in washer; can’t fold until get sock from washer through dryer -instruction depends on result of prior instruction still in the pipeline control hazards: attempt to make a decision before condition is evaluated -branch instructions °Can always resolve hazards by waiting pipeline control must detect the hazard take action (or delay action) to resolve hazards

17 EEL-4713 Ann Gordon-Ross 17 Mem Single Memory is a Structural Hazard I n s t r. O r d e r Time (clock cycles) Load Instr 1 Instr 2 Instr 3 Instr 4 ALU Mem Reg MemReg ALU Mem Reg MemReg ALU Mem Reg MemReg ALU Reg MemReg ALU Mem Reg MemReg Detection is easy in this case! (left half highlight means write, right half read) Notice this ordering is different than the single cycle implementation!

18 EEL-4713 Ann Gordon-Ross 18 Mem Single Register file is not a Structural Hazard I n s t r. O r d e r Time (clock cycles) Load Instr 1 Instr 2 Instr 3 Instr 4 ALU Mem Reg MemReg ALU Mem Reg MemReg ALU Mem Reg MemReg ALU Reg MemReg ALU Mem Reg MemReg Detection is easy in this case! (left half highlight means write, right half read) Notice this ordering is different than the single cycle implementation!

19 EEL-4713 Ann Gordon-Ross 19 Structural Hazards limit performance °Example: if average 1.3 memory accesses per instruction and only one memory access per cycle then average CPI >= 1.3 otherwise resource is more than 100% utilized More on Hazards later

20 EEL-4713 Ann Gordon-Ross 20 The Four Stages of R-type °Ifetch: Instruction Fetch Fetch the instruction from the Instruction Memory °Reg/Dec: Registers Fetch and Instruction Decode °Exec: ALU operates on the two register operands °Wr: Write the ALU output back to the register file Cycle 1Cycle 2Cycle 3Cycle 4 IfetchReg/DecExecWrR-type

21 EEL-4713 Ann Gordon-Ross 21 Pipelining the R-type and Load Instruction °We have a problem: Two instructions try to write to the register file at the same time! Clock Cycle 1Cycle 2Cycle 3Cycle 4Cycle 5Cycle 6Cycle 7Cycle 8Cycle 9 IfetchReg/DecExecWrR-type IfetchReg/DecExecWrR-type IfetchReg/DecExecMemWrLoad IfetchReg/DecExecWrR-type IfetchReg/DecExecWrR-type Ops! We have a problem!

22 EEL-4713 Ann Gordon-Ross 22 Important Observation °Each functional unit can only be used once per instruction °Each functional unit must be used at the same stage for all instructions: Load uses Register File’s Write Port during its 5th stage R-type uses Register File’s Write Port during its 4th stage IfetchReg/DecExecMemWrLoad 12345 IfetchReg/DecExecWrR-type 1234

23 EEL-4713 Ann Gordon-Ross 23 Solution 1: Insert “Bubble/Stall” into the Pipeline °Insert a “bubble/stall” into the pipeline to prevent 2 writes at the same cycle The control logic can be complex Clock Cycle 1Cycle 2Cycle 3Cycle 4Cycle 5Cycle 6Cycle 7Cycle 8Cycle 9 IfetchReg/DecExecWrR-type IfetchReg/DecExec IfetchReg/DecExecMemWrLoad IfetchReg/DecExecWr R-type IfetchReg/DecExecWr R-type Pipeline Bubble /Stall IfetchReg/DecExecWr

24 EEL-4713 Ann Gordon-Ross 24 Solution 2: Delay R-type’s Write by One Cycle °Delay R-type’s register write by one cycle: Now R-type instructions also use Reg File’s write port at Stage 5 Mem stage is a NOOP stage: nothing is being done Clock Cycle 1Cycle 2Cycle 3Cycle 4Cycle 5Cycle 6Cycle 7Cycle 8Cycle 9 IfetchReg/DecMemWrR-type IfetchReg/DecMemWrR-type IfetchReg/DecExecMemWrLoad IfetchReg/DecMemWrR-type IfetchReg/DecMemWrR-type IfetchReg/DecExecWrR-typeMem Exec 12345

25 EEL-4713 Ann Gordon-Ross 25 The Four Stages of Store °Ifetch: Instruction Fetch Fetch the instruction from the Instruction Memory °Reg/Dec: Registers Fetch and Instruction Decode °Exec: Calculate the memory address °Mem: Write the data into the Data Memory Cycle 1Cycle 2Cycle 3Cycle 4 IfetchReg/DecExecMemStoreWr

26 EEL-4713 Ann Gordon-Ross 26 The Four Stages of Beq °Ifetch: Instruction Fetch Fetch the instruction from the Instruction Memory °Reg/Dec: Registers Fetch and Instruction Decode °Exec: ALU compares the two register operands Adder calculates the branch target address °Mem: If the registers we compared in the Exec stage are the same, Write the branch target address into the PC Cycle 1Cycle 2Cycle 3Cycle 4 IfetchReg/DecExecMemBeqWr

27 EEL-4713 Ann Gordon-Ross 27 A Pipelined Datapath IF/ID RegisterID/Ex RegisterEx/Mem Register Mem/Wr Register PC Data Mem WA Di RADo IUnit A I RFile Di Ra Rb Rw MemWr RegWr ExtOp Exec Unit busA busB Imm16 ALUOp ALUSrc Mux 1 0 MemtoReg 1 0 RegDst Rt Rd Imm16 PC+4 Rs Rt PC+4 Zero Branch 1 0 Clk IfetchReg/DecExecMemWr

28 EEL-4713 Ann Gordon-Ross 28 The Instruction Fetch Stage IF/ID: lw $1, 100 ($2) ID/Ex Register Ex/Mem RegisterMem/Wr Register PC = 14 Data Mem WA Di RADo IUnit A I RFile Di Ra Rb Rw MemWr RegWr ExtOp Exec Unit busA busB Imm16 ALUOp ALUSrc Mux 1 0 MemtoReg 1 0 RegDst Rt Rd Imm16 PC+4 Rs Rt PC+4 Zero Branch 1 0 Clk IfetchReg/DecExecMem You are here! °Location 10: lw $1, 0x100($2) $1 <- Mem[($2) + 0x100]

29 EEL-4713 Ann Gordon-Ross 29 A Detailed View of the Instruction Unit °Location 10: lw $1, 0x100($2) IF/ID: lw $1, 100 ($2) PC = 14 1 0 10 Adder Instruction Memory “4”“4” Instruction Address Clk Ifetch You are here! Reg/Dec

30 EEL-4713 Ann Gordon-Ross 30 The Decode / Register Fetch Stage IF/ID: ID/Ex: Reg. 2 & 0x100 Ex/Mem Register Mem/Wr Register PC Data Mem WA Di RADo IUnit A I RFile Di Ra Rb Rw MemWr RegWr ExtOp Exec Unit busA busB Imm16 ALUOp ALUSrc Mux 1 0 MemtoReg 1 0 RegDst Rt Rd Imm16 PC+4 Rs Rt PC+4 Zero Branch 1 0 Clk IfetchReg/DecExecMem You are here! °Location 10: lw $1, 0x100($2) $1 <- Mem[($2) + 0x100]

31 EEL-4713 Ann Gordon-Ross 31 Load’s Address Calculation Stage IF/ID: ID/Ex Register Ex/Mem: Load ’ s Address Mem/Wr Register PC Data Mem WA Di RADo IUnit A I RFile Di Ra Rb Rw MemWr RegWr ExtOp=1 Exec Unit busA busB Imm16 ALUOp=Add ALUSrc=1 Mux 1 0 MemtoReg 1 0 RegDst=0 Rt Rd Imm16 PC+4 Rs Rt PC+4 Zero Branch 1 0 Clk IfetchReg/DecExecMem You are here! °Location 10: lw $1, 0x100($2) $1 <- Mem[($2) + 0x100]

32 EEL-4713 Ann Gordon-Ross 32 A Detailed View of the Execution Unit ID/Ex Register Ex/Mem: Load ’ s Memory Address ALU Control ALUctr 32 busA 32 busB Extender Mux 16 imm16 ALUSrc=1 ExtOp=1 3 ALU Zero 0 1 32 ALUout 32 Adder 3 ALUOp=Add << 2 32 PC+4 Target 32 Clk Exec You are here! Mem

33 EEL-4713 Ann Gordon-Ross 33 Load’s Memory Access Stage IF/ID: ID/Ex RegisterEx/Mem Register Mem/Wr: Load ’ s Data PC Data Mem WA Di RADo IUnit A I RFile Di Ra Rb Rw MemWr=0 RegWr ExtOp Exec Unit busA busB Imm16 ALUOp ALUSrc Mux 1 0 MemtoReg 1 0 RegDst Rt Rd Imm16 PC+4 Rs Rt PC+4 Zero Branch=0 1 0 Clk IfetchReg/DecExecMem You are here! °Location 10: lw $1, 0x100($2) $1 <- Mem[($2) + 0x100]

34 EEL-4713 Ann Gordon-Ross 34 Load’s Write Back Stage IF/ID: ID/Ex Register Ex/Mem RegisterMem/Wr Register PC Data Mem WA Di RADo IUnit A I RFile Di Ra Rb Rw MemWr RegWr=1 ExtOp Exec Unit busA busB Imm16 ALUOp ALUSrc Mux 1 0 MemtoReg=1 1 0 RegDst Rt Rd Imm16 PC+4 Rs Rt PC+4 Zero Branch 1 0 Clk IfetchReg/DecExecMem You are somewhere out there! °Location 10: lw $1, 0x100($2) $1 <- Mem[($2) + 0x100] Wr

35 EEL-4713 Ann Gordon-Ross 35 How About Control Signals? IF/ID: ID/Ex Register Ex/Mem: Load ’ s Address Mem/Wr Register PC Data Mem WA Di RADo IUnit A I RFile Di Ra Rb Rw MemWr RegWr ExtOp=1 Exec Unit busA busB Imm16 ALUOp=Add ALUSrc=1 Mux 1 0 MemtoReg 1 0 RegDst=0 Rt Rd Imm16 PC+4 Rs Rt PC+4 Zero Branch 1 0 IfetchReg/DecExecMem °Key Observation: Control Signals at Stage N = Func (Instr. at Stage N) N = Exec, Mem, or Wr IF and ID are the same for every instruction °Example: Controls Signals at Exec Stage = Func(Load’s Exec) Wr

36 EEL-4713 Ann Gordon-Ross 36 Pipeline Control °The Main Control generates the control signals during Reg/Dec Control signals for Exec (ExtOp, ALUSrc,...) are used 1 cycle later Control signals for Mem (MemWr Branch) are used 2 cycles later Control signals for Wr (MemtoReg MemWr) are used 3 cycles later IF/ID RegisterID/Ex Register Ex/Mem Register Mem/Wr Register Reg/DecExecMem ExtOp ALUOp RegDst ALUSrc Branch MemWr MemtoReg RegWr Main Control ExtOp ALUOp RegDst ALUSrc MemtoReg RegWr MemtoReg RegWr MemtoReg RegWr Branch MemWr Branch MemWr Wr

37 EEL-4713 Ann Gordon-Ross 37 Beginning of the Wr’s Stage: A Real World Problem °At the beginning of the Wr stage, we have a problem if: RegAdr’s (Rd or Rt) Clk-to-Q > RegWr’s Clk-to-Q °Similarly, at the beginning of the Mem stage, we have a problem if: WrAdr’s Clk-to-Q > MemWr’s Clk-to-Q °We have a race condition between Address and Write Enable! Could write to incorrect address Ex/MemMem/Wr RegAdr RegWrMemWr Data WrAdr Data Reg File Data Memory Clk RegAdr RegWr RegWr ’ s Clk-to-Q RegAdr ’ s Clk-to-Q Clk WrAdr MemWr MemWr ’ s Clk-to-Q WrAdr ’ s Clk-to-Q

38 EEL-4713 Ann Gordon-Ross 38 The Pipeline Problem °Multiple Cycle design prevents race condition between Addr and WrEn: Make sure Address is stable by the end of Cycle N Asserts WrEn during Cycle N + 1 °This approach can NOT be used in the pipeline design because: Must be able to write the register file every cycle Must be able write the data memory every cycle Clock IfetchReg/DecExecMemWrStore IfetchReg/DecExecMemWrStore IfetchReg/DecExecMemWrR-type IfetchReg/DecExecMemWrR-type

39 EEL-4713 Ann Gordon-Ross 39 Synchronize Register File & Synchronize Memory °Solution: And the Write Enable signal with the Clock Must consult circuit expert to ensure no timing violation: -Example: Clock High Time > Write Access Delay WrEn I_Addr I_Data Reg File or Memory Clk I_Addr I_WrEn Address Data I_WrEn C_WrEn Clk Address Data WrEn Reg File or Memory Synchronize Memory and Register File Address, Data, and WrEn must be stable at least 1 set-up time before the Clk edge

40 EEL-4713 Ann Gordon-Ross 40 *A More Extensive Pipelining Example Clock Cycle 1Cycle 2Cycle 3Cycle 4Cycle 5Cycle 6Cycle 7Cycle 8 IfetchReg/DecExecMemWr0: Load IfetchReg/DecExecMemWr4: R-type IfetchReg/DecExecMemWr8: Store IfetchReg/DecExecMemWr12: Beq (target is 1000) End of Cycle 4 End of Cycle 5 End of Cycle 6 End of Cycle 7 °End of Cycle 4: Load’s Mem, R-type’s Exec, Store’s Reg, Beq’s Ifetch °End of Cycle 5: Load’s Wr, R-type’s Mem, Store’s Exec, Beq’s Reg °End of Cycle 6: R-type’s Wr, Store’s Mem, Beq’s Exec °End of Cycle 7: Store’s Wr, Beq’s Mem 16, 20, 24: R-type

41 EEL-4713 Ann Gordon-Ross 41 Pipelining Example: End of Cycle 4 °0: Load’s Mem 4: R-type’s Exec 8: Store’s Reg 12: Beq’s Ifetch IF/ID: Beq Instruction ID/Ex: Store ’ s busA & BEx/Mem: R-type ’ s Result Mem/Wr: Load ’ s Dout PC = 16 Data Mem WA Di RADo IUnit A I RFile Di Ra Rb Rw RegWr=0 ExtOp=x Exec Unit busA busB Imm16 ALUOp=R-type ALUSrc=0 Mux 1 0 MemtoReg=x 1 0 RegDst=1 Rt Rd Imm16 PC+4 Rs Rt PC+4 Zero Branch=0 1 0 12: Beq ’ s Ifet 8: Store ’ s Reg4: R-type ’ s Exec0: Load ’ s Mem Clk MemWr=0 Clk

42 EEL-4713 Ann Gordon-Ross 42 Pipelining Example: End of Cycle 5 °0: Lw’s Wr 4: R’s Mem 8: Store’s Exec 12: Beq’s Reg 16: R’s Ifetch IF/ID: Instruction @ 16 ID/Ex: Beq ’ s busA & B Ex/Mem: Store ’ s AddressMem/Wr: R-type ’ s Result PC = 20 Data Mem WA Di RADo IUnit A I RFile Di Ra Rb Rw RegWr=1 ExtOp=1 Exec Unit busA busB Imm16 ALUOp=Add ALUSrc=1 Mux 1 0 MemtoReg=1 1 0 RegDst=x Rt Rd Imm16 PC+4 Rs Rt PC+4 Zero Branch=0 1 0 16: R ’ s Ifet 12: Beq ’ s Reg8: Store ’ s Exec4: R-type ’ s Mem 0: Load ’ s Wr Clk MemWr=0 Clk

43 EEL-4713 Ann Gordon-Ross 43 Pipelining Example: End of Cycle 6 °4: R’s Wr 8: Store’s Mem 12: Beq’s Exec 16: R’s Reg 20: R’s Ifet IF/ID: Instruction @ 20 ID/Ex:R-type ’ s busA & B Ex/Mem: Beq ’ s Results Mem/Wr: Nothing for St PC = 24 Data Mem WA Di RADo IUnit A I RFile Di Ra Rb Rw RegWr=1 ExtOp=1 Exec Unit busA busB Imm16 ALUOp=Sub ALUSrc=0 Mux 1 0 MemtoReg=0 1 0 RegDst=x Rt Rd Imm16 PC+4 Rs Rt PC+4 Zero Branch=0 1 0 20: R-type ’ s Ifet 16: R-type ’ s Reg12: Beq ’ s Exec8: Store ’ s Mem 4: R-type ’ s Wr Clk MemWr=1 Clk

44 EEL-4713 Ann Gordon-Ross 44 Pipelining Example: End of Cycle 7 °8: Store’s Wr 12: Beq’s Mem 16: R’s Exec 20: R’s Reg 24: R’s Ifet IF/ID: Instruction @ 24 ID/Ex:R-type ’ s busA & BEx/Mem: Rtype ’ s Results Mem/Wr:Nothing for BeqPC = 1000 Data Mem WA Di RADo IUnit A I RFile Di Ra Rb Rw RegWr=0 ExtOp=x Exec Unit busA busB Imm16 ALUOp=R-type ALUSrc=0 Mux 1 0 MemtoReg=x 1 0 RegDst=1 Rt Rd Imm16 PC+4 Rs Rt PC+4 Zero Branch=1 1 0 24: R-type ’ s Ifet 20: R-type ’ s Reg16: R-type ’ s Exec12: Beq ’ s Mem 8: Store ’ s Wr Clk MemWr=0 Clk

45 EEL-4713 Ann Gordon-Ross 45 The Delay Branch Phenomenon °Although Beq is fetched during Cycle 4: Target address is NOT written into the PC until the end of Cycle 7 Branch’s target is NOT fetched until Cycle 8 3-instruction delay before the branch take effect °This is referred to as Branch Hazard: Clever design techniques can reduce the delay to ONE instruction Cycle 4Cycle 5Cycle 6Cycle 7Cycle 8Cycle 9Cycle 10Cycle 11 IfetchReg/DecExecMemWr IfetchReg/DecExecMemWr 16: R-type IfetchReg/DecExecMemWr IfetchReg/DecExecMemWr24: R-type 12: Beq (target is 1000) 20: R-type Clk IfetchReg/DecExecMemWr1000: Target of Br

46 EEL-4713 Ann Gordon-Ross 46 The Delay Load Phenomenon °Although Load is fetched during Cycle 1: The data is NOT written into the Reg File until the end of Cycle 5 We cannot read this value from the Reg File until Cycle 6 3-instruction delay before the load take effect °This is referred to as Data Hazard: Clever design techniques can reduce the delay to ONE instruction Clock Cycle 1Cycle 2Cycle 3Cycle 4Cycle 5Cycle 6Cycle 7Cycle 8 IfetchReg/DecExecMemWrI0: Load IfetchReg/DecExecMemWrPlus 1 IfetchReg/DecExecMemWrPlus 2 IfetchReg/DecExecMemWrPlus 3 IfetchReg/DecExecMemWrPlus 4

47 EEL-4713 Ann Gordon-Ross 47 Summary °Disadvantages of the Single Cycle Processor Long cycle time Cycle time is too long for all instructions except the Load °Multiple Clock Cycle Processor: Divide the instructions into smaller steps Execute each step (instead of the entire instruction) in one cycle °Pipeline Processor: Natural enhancement of the multiple clock cycle processor Each functional unit can only be used once per instruction If a instruction is going to use a functional unit: -it must use it at the same stage as all other instructions Pipeline Control: -Each stage’s control signal depends ONLY on the instruction that is currently in that stage


Download ppt "EEL-4713 Ann Gordon-Ross 1 EEL-4713C Computer Architecture Designing a Pipelined Processor."

Similar presentations


Ads by Google