1 COMP 206: Computer Architecture and Implementation Montek Singh Mon., Sep 9, 2002 Topic: Pipelining Basics.


2 Lecture Overview
  Reading: Appendix A (HP3)
  - A Pipelined Processor
      - Introduction to the concept of a pipelined processor
      - Pipelined datapath
      - Pipeline example: load instruction
  - Pipelined Datapath and Pipelined Control
  - Pipeline Example: Interaction among Instructions

3 Pipelining: It’s Natural!
  Laundry example:
  - Ann, Brian, Cathy, and Dave (A, B, C, D) each have one load of clothes to wash, dry, and fold
  - Washer takes 30 minutes
  - Dryer takes 40 minutes
  - “Folder” takes 20 minutes

4 Sequential Laundry
  - Sequential laundry takes 6 hours for 4 loads
  - If they learned pipelining, how long would laundry take?
  [Figure: task order (A, B, C, D) versus time, 6 PM to midnight. Each load runs wash (30), dry (40), fold (20) to completion before the next load starts.]

5 Pipelined Laundry: Start Work ASAP
  - Pipelined laundry takes 3.5 hours for 4 loads
  [Figure: task order (A, B, C, D) versus time, 6 PM to midnight. Each load starts its next step as soon as the machine is free.]
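The 6-hour and 3.5-hour figures can be checked with a short scheduling sketch. It assumes a load may wait between machines until the next one is free; the function and variable names are illustrative and not part of the lecture.

```python
# Stage durations in minutes, taken from the laundry example.
STAGES = [("wash", 30), ("dry", 40), ("fold", 20)]
LOADS = ["A", "B", "C", "D"]


def sequential_minutes(n_loads, stages=STAGES):
    """Each load runs every stage before the next load starts."""
    return n_loads * sum(minutes for _, minutes in stages)


def pipelined_schedule(loads=LOADS, stages=STAGES):
    """Return {load: [(stage, start, end), ...]} with work started ASAP.

    A stage starts once the same load has left the previous stage AND
    the previous load has left this stage (assumption: loads can wait
    between machines).
    """
    stage_free = [0] * len(stages)  # time at which each machine is free
    schedule = {}
    for load in loads:
        ready = 0  # time at which this load can enter its next stage
        rows = []
        for i, (name, minutes) in enumerate(stages):
            start = max(ready, stage_free[i])
            end = start + minutes
            rows.append((name, start, end))
            stage_free[i] = end
            ready = end
        schedule[load] = rows
    return schedule


def pipelined_minutes(loads=LOADS, stages=STAGES):
    """Finish time of the last stage of the last load."""
    schedule = pipelined_schedule(loads, stages)
    return max(end for rows in schedule.values() for _, _, end in rows)


if __name__ == "__main__":
    print(sequential_minutes(4) / 60)  # hours, sequential
    print(pipelined_minutes() / 60)    # hours, pipelined
```

In this sketch the dryer is the bottleneck: after the first wash, a new load finishes drying every 40 minutes, which is why the total is 30 + 4 × 40 + 20 = 210 minutes.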

6 Pipelining Lessons
  - Pipelining doesn’t help the latency of a single task; it helps the throughput of the entire workload
  - Pipeline rate is limited by the slowest pipeline stage
  - Multiple tasks operate simultaneously
  - Potential speedup = number of pipe stages
  - Unbalanced lengths of pipe stages reduce speedup
  - Time to “fill” the pipeline and time to “drain” it reduce speedup
  [Figure: pipelined laundry for A, B, C, D, starting at 6 PM, with stage times 30, 40, 20.]
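These lessons can be made concrete with a closed-form estimate. The sketch below assumes identical tasks that enter the pipeline back to back, so the total time is one full pass plus one slowest-stage interval per additional task; the function names are hypothetical.

```python
def pipelined_time(n_tasks, stage_times):
    """One full pass to fill, then one task per slowest-stage interval."""
    return sum(stage_times) + (n_tasks - 1) * max(stage_times)


def speedup(n_tasks, stage_times):
    """Sequential time divided by pipelined time."""
    sequential = n_tasks * sum(stage_times)
    return sequential / pipelined_time(n_tasks, stage_times)


if __name__ == "__main__":
    laundry = [30, 40, 20]   # unbalanced stages from the laundry example
    balanced = [30, 30, 30]  # hypothetical balanced stages, same total
    for n in (4, 100, 10_000):
        print(n, round(speedup(n, laundry), 3), round(speedup(n, balanced), 3))
```

With balanced stages the speedup approaches the number of stages (3) as the task count grows. With the 30/40/20 split it approaches 90/40 = 2.25, and for only 4 loads the fill and drain time keeps it lower still.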

7 The Five Stages of a RISC Instruction
  - Ifetch: Instruction fetch
      - Fetch the instruction from the instruction memory
  - Reg/Dec: Register fetch and instruction decode
  - Exec: Calculate the memory address
  - Mem: Read the data from the data memory
  - WrB: Write the data back to the register file
  [Figure: Load occupies Ifetch, Reg/Dec, Exec, Mem, WrB in cycles 1 to 5.]

8 Key Ideas Behind Instruction Pipelining
  - The load instruction has 5 stages:
      - Five independent functional units to work on each stage
      - Each functional unit is used only once!
  - A second load can start doing Ifetch as soon as the first load finishes its Ifetch stage
  - Each load still takes five cycles to complete
      - The latency of a single load is still 5 cycles
  - The throughput is much higher
      - CPI approaches 1
      - Cycle time is ~1/5th the cycle time of the single-cycle implementation
  - Instructions start executing before previous instructions complete execution
  [Figure: Load: Ifetch, Reg/Dec, Exec, Mem, WrB.]

9 Pipelining the LOAD Instruction
  - The five independent pipeline stages are:
      - Read next instruction: the Ifetch stage
      - Decode instruction and fetch register values: the Reg/Dec stage
      - Execute the operation: the Exec stage
      - Access data memory: the Mem stage
      - Write data to destination register: the WrB stage
  - One instruction enters the pipeline every cycle
      - One instruction comes out of the pipeline (completed) every cycle
      - The “effective” CPI is 7/3 (tends to 1); ~1/5 cycle time

  Clock    Cycle 1  Cycle 2  Cycle 3  Cycle 4  Cycle 5  Cycle 6  Cycle 7
  1st lw   Ifetch   Reg/Dec  Exec     Mem      WrB
  2nd lw            Ifetch   Reg/Dec  Exec     Mem      WrB
  3rd lw                     Ifetch   Reg/Dec  Exec     Mem      WrB
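The 7/3 figure follows from counting cycles: three loads in a five-stage pipeline finish in 5 + 3 - 1 = 7 cycles. A minimal sketch, assuming one instruction issues every cycle with no stalls (the function names are illustrative):

```python
from fractions import Fraction


def total_cycles(n_instr, n_stages=5):
    """Cycles until the last of n_instr back-to-back instructions completes."""
    return n_stages + n_instr - 1


def effective_cpi(n_instr, n_stages=5):
    """Total cycles divided by instruction count, kept as an exact fraction."""
    return Fraction(total_cycles(n_instr, n_stages), n_instr)


if __name__ == "__main__":
    for n in (1, 3, 10, 1000):
        print(n, effective_cpi(n), float(effective_cpi(n)))
```

For a single instruction the value is 5 (the latency is unchanged), for three loads it is 7/3, and it tends to 1 as the instruction count grows.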

10 The Four Stages of R-type
  - Ifetch: Instruction fetch
      - Fetch the instruction from the instruction memory
  - Reg/Dec: Register fetch and instruction decode
  - Exec: ALU operates on the two register operands
  - WrB: Write the ALU output back to the register file
  [Figure: R-type occupies Ifetch, Reg/Dec, Exec, WrB in cycles 1 to 4.]

11 Pipelining the R-type and Load Instructions
  - We have a problem called a pipeline conflict or hazard
      - Two instructions try to write to the register file at the same time!
      - “Contention for a shared resource” (in OS terminology)
  - It is no longer meaningful to talk about the execution of a single instruction in isolation
      - Execution is inherently concurrent; need to achieve serializability

  Clock    Cycle 1  Cycle 2  Cycle 3  Cycle 4  Cycle 5  Cycle 6  Cycle 7  Cycle 8  Cycle 9
  R-type   Ifetch   Reg/Dec  Exec     Wr
  R-type            Ifetch   Reg/Dec  Exec     Wr
  Load                       Ifetch   Reg/Dec  Exec     Mem      Wr
  R-type                              Ifetch   Reg/Dec  Exec     Wr
  R-type                                       Ifetch   Reg/Dec  Exec     Wr

  OOPS! We have a problem! (The load and the R-type that follows it both reach Wr in cycle 7.)
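The conflict can be found mechanically by recording the cycle in which each instruction reaches its write stage. This is a sketch that assumes one instruction issues per cycle starting at cycle 1, as in the diagram on this slide; the table and function names are hypothetical.

```python
from collections import defaultdict

# Stage sequences before any fix: R-type has 4 stages, Load has 5.
STAGES = {
    "R-type": ["Ifetch", "Reg/Dec", "Exec", "Wr"],
    "Load": ["Ifetch", "Reg/Dec", "Exec", "Mem", "Wr"],
}

# Instruction mix from the slide, in program order.
PROGRAM = ["R-type", "R-type", "Load", "R-type", "R-type"]


def write_port_conflicts(program, stages=STAGES):
    """Return {cycle: [(index, kind), ...]} for cycles with more than one writer."""
    writers = defaultdict(list)
    for index, kind in enumerate(program):
        issue_cycle = index + 1  # one instruction enters per cycle
        wr_cycle = issue_cycle + stages[kind].index("Wr")
        writers[wr_cycle].append((index, kind))
    return {cycle: who for cycle, who in writers.items() if len(who) > 1}


if __name__ == "__main__":
    print(write_port_conflicts(PROGRAM))
```

For this instruction mix the only cycle with two writers is cycle 7: the load (third instruction) and the R-type issued right after it.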

12 Important Observations
  - Each functional unit can only be used once per instruction
  - Each functional unit must be used at the same stage for all instructions
      - Load uses the register file’s write port during its 5th stage
      - R-type uses the register file’s write port during its 4th stage
  - How to resolve this pipeline hazard?

  Stage    1        2        3        4        5
  Load     Ifetch   Reg/Dec  Exec     Mem      WrB
  R-type   Ifetch   Reg/Dec  Exec     WrB

13 Solution: Delay R-type’s Write by 1 Cycle
  - Delay R-type’s register write by one cycle:
      - Now R-type instructions also use the register file’s write port at stage 5
      - Mem stage is a NO-OP stage: nothing is being done. Effective CPI?

  Stage    1        2        3        4        5
  R-type   Ifetch   Reg/Dec  Exec     Mem      WrB

  Clock    Cycle 1  Cycle 2  Cycle 3  Cycle 4  Cycle 5  Cycle 6  Cycle 7  Cycle 8  Cycle 9
  R-type   Ifetch   Reg/Dec  Exec     Mem      WrB
  R-type            Ifetch   Reg/Dec  Exec     Mem      WrB
  Load                       Ifetch   Reg/Dec  Exec     Mem      WrB
  R-type                              Ifetch   Reg/Dec  Exec     Mem      WrB
  R-type                                       Ifetch   Reg/Dec  Exec     Mem      WrB
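One way to check the fix is to confirm that, once every instruction uses the same five stages, no two instructions reach WrB in the same cycle. The sketch below assumes one instruction issues per cycle with no stalls; under that assumption it also gives one possible answer to the "Effective CPI?" question for this five-instruction sequence. The names are illustrative.

```python
from fractions import Fraction

# After the fix, R-type and Load share the same five stages
# (Mem is a no-op for R-type).
FIVE_STAGES = ["Ifetch", "Reg/Dec", "Exec", "Mem", "WrB"]

# Instruction mix from the slide, in program order.
PROGRAM = ["R-type", "R-type", "Load", "R-type", "R-type"]


def write_cycles(program, stages=FIVE_STAGES):
    """Cycle in which each instruction uses the register file write port."""
    wr_offset = stages.index("WrB")
    return [(index + 1) + wr_offset for index in range(len(program))]


def effective_cpi(program, stages=FIVE_STAGES):
    """Total cycles for the whole sequence divided by instruction count."""
    total_cycles = len(stages) + len(program) - 1
    return Fraction(total_cycles, len(program))


if __name__ == "__main__":
    cycles = write_cycles(PROGRAM)
    print(cycles, len(set(cycles)) == len(cycles))  # all write cycles distinct?
    print(effective_cpi(PROGRAM))
```

Every write now lands in its own cycle (5 through 9), and the five instructions finish in 9 cycles, so the effective CPI for this short sequence is 9/5 and tends to 1 for longer ones.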

14 The Four Stages of Store
  - Ifetch: Instruction fetch
      - Fetch the instruction from the instruction memory
  - Reg/Dec: Register fetch and instruction decode
  - Exec: Calculate the memory address
  - Mem: Write the data into the data memory
  [Figure: Store occupies Ifetch, Reg/Dec, Exec, Mem in cycles 1 to 4; the diagram also shows a WrB slot after Mem.]

15 The Four Stages of Beq
  - Ifetch: Instruction fetch
      - Fetch the instruction from the instruction memory
  - Reg/Dec: Register fetch and instruction decode
  - Exec: ALU compares the two register operands
      - Adder calculates the branch target address
  - Mem: If the registers we compared in the Exec stage are the same,
      - Write the branch target address into the PC
  [Figure: Beq occupies Ifetch, Reg/Dec, Exec, Mem in cycles 1 to 4; the diagram also shows a WrB slot after Mem.]
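The Exec and Mem steps of beq can be sketched as follows. The sketch assumes the usual MIPS convention that the target is PC+4 plus the sign-extended 16-bit immediate shifted left by 2, and a 32-bit PC; the function names are hypothetical.

```python
def sign_extend16(imm16):
    """Interpret the low 16 bits as a two's-complement value."""
    imm16 &= 0xFFFF
    return imm16 - 0x10000 if imm16 & 0x8000 else imm16


def branch_target(pc, imm16):
    """Exec stage adder: PC+4 plus the word offset scaled to bytes."""
    return (pc + 4 + (sign_extend16(imm16) << 2)) & 0xFFFFFFFF


def beq_next_pc(pc, rs_value, rt_value, imm16):
    """Mem stage: write the target into the PC only if the operands are equal."""
    if rs_value == rt_value:
        return branch_target(pc, imm16)
    return (pc + 4) & 0xFFFFFFFF


if __name__ == "__main__":
    print(beq_next_pc(8, 5, 5, 3))       # taken, forward by 3 words
    print(beq_next_pc(8, 5, 6, 3))       # not taken
    print(beq_next_pc(8, 5, 5, 0xFFFF))  # taken, offset of -1 word
```

An immediate of 0xFFFF encodes -1, so a taken branch at address 8 with that offset returns to address 8.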

16 A Pipelined Datapath
  [Figure: pipelined datapath. The stages Ifetch, Reg/Dec, Exec, Mem, WrB are separated by the IF/ID, ID/Ex, Ex/Mem, and Mem/Wr pipeline registers, all clocked by Clk. Components: PC, IF unit, register file (Ra, Rb, Rw, Di), Exec unit, data memory (RA, WA, Di, Do). Values carried forward: PC+4, Rs, Rt, Rd, Imm16, busA, busB, Zero. Control signals: ExtOp, ALUSrc, ALUOp, RegDst, MemWr, Branch, MemtoReg, RegWr.]

17 The Instruction Fetch Stage
  - Location 8: lw $1, 0x100($2)
  - $1 ← Mem[($2) + 0x100]
  [Figure: pipelined datapath with “You are here!” marking the Ifetch stage. The lw instruction is being loaded into the IF/ID register; PC = 12.]

18 Detailed View of the Instruction Fetch Unit
  - Location 8: lw $1, 0x100($2)
  [Figure: the PC supplies instruction address “8” to the instruction memory; an adder adds “4” so that PC = 12; the lw instruction and the 32-bit PC+4 pass to the Reg/Dec stage. “You are here!” marks Ifetch.]

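19 The Decode / Register Fetch Stage
  - Location 8: lw $1, 0x100($2)
  - $1 ← Mem[($2) + 0x100]

20 Detailed View of the Fetch/Decode Stage
  [Figure: the instruction fields (op, rs, rt, rd, func, imm16) feed the control unit and the register file (ports Ra, Rb, Rw, Din; clocked by Clk). Bus-A, Bus-B, Imm16, rt, rd, and PC+4 are passed on to the next stage.]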

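21 Load’s Address Calculation Stage
  - Location 8: lw $1, 0x100($2)
  - $1 ← Mem[($2) + 0x100]

22 Detailed View of the Execution Unit
  [Figure: between the ID/Ex and Ex/Mem registers. The extender (ExtOp=1) widens the 16-bit imm16 to 32 bits; a mux (ALUSrc=1) selects between busB and the extended immediate; the ALU (ALUOp=Add, 3-bit ALUctr from ALU control) takes busA and the mux output and produces Zero and the 32-bit ALUout. A separate adder forms Target from PC+4 and the immediate shifted left by 2. Ex/Mem receives the load’s memory address. “You are here!” marks Exec.]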

23 Load’s Memory Access Stage
  - Location 8: lw $1, 0x100($2)
  - $1 ← Mem[($2) + 0x100]

24 Load’s Write Back Stage
  - Location 8: lw $1, 0x100($2)
  - $1 ← Mem[($2) + 0x100]
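The walk of lw $1, 0x100($2) through the five stages can be traced with a small functional sketch. It models only the values handed from stage to stage, not the pipeline registers or control signals; the register and memory contents in the example are hypothetical.

```python
def sign_extend16(imm16):
    """Extender with ExtOp=1: treat imm16 as a two's-complement value."""
    imm16 &= 0xFFFF
    return imm16 - 0x10000 if imm16 & 0x8000 else imm16


def run_load(regs, data_mem, pc, rt, rs, imm16):
    """Trace lw rt, imm16(rs) stage by stage; updates regs in place."""
    trace = []

    # Ifetch: read the instruction at pc and compute PC+4.
    trace.append(("Ifetch", {"PC+4": pc + 4}))

    # Reg/Dec: decode and read the base register onto busA.
    bus_a = regs[rs]
    trace.append(("Reg/Dec", {"busA": bus_a, "imm16": imm16}))

    # Exec: the ALU adds busA and the sign-extended immediate.
    address = (bus_a + sign_extend16(imm16)) & 0xFFFFFFFF
    trace.append(("Exec", {"ALUout": address}))

    # Mem: read the data memory at the computed address.
    data = data_mem[address]
    trace.append(("Mem", {"Do": data}))

    # WrB: write the loaded value into the destination register.
    regs[rt] = data
    trace.append(("WrB", {"Rw": rt, "Di": data}))
    return trace


if __name__ == "__main__":
    regs = {1: 0, 2: 0x2000}     # hypothetical register contents
    data_mem = {0x2100: 42}      # hypothetical memory contents
    for stage, values in run_load(regs, data_mem, pc=8, rt=1, rs=2, imm16=0x100):
        print(stage, values)
    print(regs[1])
```

With the instruction at location 8, the fetch stage yields PC+4 = 12, matching the PC value shown in the fetch-stage figures.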
