1
COMP 206: Computer Architecture and Implementation
Montek Singh
Mon., Sep 9, 2002
Topic: Pipelining Basics
2
Lecture Overview
Reading: Appendix A (HP3)
- A Pipelined Processor
  - Introduction to the concept of a pipelined processor
  - Pipelined Datapath
  - Pipeline example: Load Instruction
- Pipelined Datapath and Pipelined Control
- Pipeline Example: Interaction among Instructions
3
Pipelining: It’s Natural!
Laundry Example:
- Ann, Brian, Cathy, Dave (loads A, B, C, D) each have one load of clothes to wash, dry, and fold
- Washer takes 30 minutes
- Dryer takes 40 minutes
- “Folder” takes 20 minutes
4
Sequential Laundry
[Figure: timeline from 6 PM to midnight; loads A, B, C, D in task order, each taking 30 + 40 + 20 minutes, one after another]
- Sequential laundry takes 6 hours for 4 loads
- If they learned pipelining, how long would laundry take?
5
Pipelined Laundry: Start work ASAP
[Figure: timeline from 6 PM to midnight; loads A, B, C, D in task order, overlapped; stage times 30, 40, 20 minutes]
- Pipelined laundry takes 3.5 hours for 4 loads
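The two laundry totals can be checked with a small simulation. This is only a sketch: the function names are made up for illustration, and it assumes a finished load may wait (in a basket, say) until the next machine is free.

```python
def sequential_minutes(stage_minutes, loads):
    """Each load runs through every stage before the next load starts."""
    return loads * sum(stage_minutes)


def pipelined_minutes(stage_minutes, loads):
    """Each stage starts as soon as the load is ready and the machine is free."""
    stage_free = [0] * len(stage_minutes)  # time at which each machine becomes free
    finish = 0
    for _ in range(loads):
        t = 0  # time at which this load is ready for its next stage
        for s, duration in enumerate(stage_minutes):
            start = max(t, stage_free[s])
            t = start + duration
            stage_free[s] = t
        finish = t
    return finish


laundry = [30, 40, 20]  # washer, dryer, "folder"
print(sequential_minutes(laundry, 4) / 60)  # 6.0 hours
print(pipelined_minutes(laundry, 4) / 60)   # 3.5 hours
```

The 40-minute dryer sets the pace: after the first load, one load finishes every 40 minutes.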
6
Pipelining Lessons
[Figure: pipelined laundry timeline, 6 PM to 9 PM; loads A, B, C, D in task order; stage times 30, 40, 20 minutes]
- Pipelining doesn’t help latency of a single task; it helps throughput of the entire workload
- Pipeline rate is limited by the slowest pipeline stage
- Multiple tasks operate simultaneously
- Potential speedup = number of pipe stages
- Unbalanced lengths of pipe stages reduce speedup
- Time to “fill” the pipeline and time to “drain” it reduce speedup
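A rough way to see these lessons numerically is the closed-form time for a linear pipeline. It is a sketch under two assumptions not stated on the slide: all tasks are identical, and a task may wait between stages. The function names are illustrative only.

```python
def pipelined_time(stage_times, tasks):
    """First task passes through every stage (fill and drain);
    each later task finishes one bottleneck interval after the previous one."""
    return sum(stage_times) + (tasks - 1) * max(stage_times)


def speedup(stage_times, tasks):
    """Sequential time divided by pipelined time."""
    return tasks * sum(stage_times) / pipelined_time(stage_times, tasks)


print(speedup([30, 40, 20], 4))       # unbalanced stages, few tasks
print(speedup([30, 40, 20], 10_000))  # unbalanced stages, many tasks
print(speedup([30, 30, 30], 10_000))  # balanced stages, many tasks
```

With many tasks the fill and drain cost fades: balanced stages approach a speedup of 3 (the number of stages), while the 30/40/20 laundry pipeline levels off near 90/40 = 2.25 because the dryer is the bottleneck.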
7
The Five Stages of a RISC Instruction
- Ifetch: Instruction Fetch
  - Fetch the instruction from the Instruction Memory
- Reg/Dec: Registers Fetch and Instruction Decode
- Exec: Calculate the memory address
- Mem: Read the data from the Data Memory
- WrB: Write the data back to the register file

        Cycle 1  Cycle 2  Cycle 3  Cycle 4  Cycle 5
Load    Ifetch   Reg/Dec  Exec     Mem      WrB
8
Key Ideas Behind Instruction Pipelining
- The load instruction has 5 stages:
  - Five independent functional units to work on each stage
    - Each functional unit is used only once!
  - A second load can start doing Ifetch as soon as the first load finishes its Ifetch stage
  - Each load still takes five cycles to complete
    - The latency of a single load is still 5 cycles
  - The throughput is much higher
    - CPI approaches 1
    - Cycle time is ~1/5th the cycle time of the single-cycle implementation
- Instructions start executing before previous instructions complete execution

Load    Ifetch   Reg/Dec  Exec     Mem      WrB
9
Pipelining the LOAD Instruction
- The five independent pipeline stages are:
  - Read next instruction: The Ifetch stage
  - Decode instruction and fetch register values: The Reg/Dec stage
  - Execute the operation: The Exec stage
  - Access data memory: The Mem stage
  - Write data to destination register: The WrB stage
- One instruction enters the pipeline every cycle
  - One instruction comes out of the pipeline (completed) every cycle
  - The “effective” CPI is 7/3 (tends to 1); ~1/5 cycle time

Clock    Cycle 1  Cycle 2  Cycle 3  Cycle 4  Cycle 5  Cycle 6  Cycle 7
1st lw   Ifetch   Reg/Dec  Exec     Mem      WrB
2nd lw            Ifetch   Reg/Dec  Exec     Mem      WrB
3rd lw                     Ifetch   Reg/Dec  Exec     Mem      WrB
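The 7/3 figure is just the seven cycles in the diagram divided by the three loads. A short sketch of that arithmetic follows; the function names are illustrative, and it assumes an ideal pipeline with no stalls and a cycle time of exactly 1/5 of the single-cycle design.

```python
from fractions import Fraction


def total_cycles(instructions, stages=5):
    """The first instruction needs `stages` cycles; each later one
    completes exactly one cycle after its predecessor."""
    return stages + (instructions - 1)


def effective_cpi(instructions, stages=5):
    return Fraction(total_cycles(instructions, stages), instructions)


def speedup_over_single_cycle(instructions, stages=5):
    """Single-cycle time is N * T; pipelined time is (N + stages - 1) * T / stages."""
    return Fraction(stages * instructions, total_cycles(instructions, stages))


print(effective_cpi(3))                    # 7/3
print(float(effective_cpi(1000)))          # 1.004
print(float(speedup_over_single_cycle(1000)))
```

As the instruction count grows, the effective CPI tends to 1 and the speedup tends to the number of stages.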
10
The Four Stages of R-type
- Ifetch: Instruction fetch
  - Fetch the instruction from the instruction memory
- Reg/Dec: Registers fetch and instruction decode
- Exec: ALU operates on the two register operands
- WrB: Write the ALU output back to the register file

          Cycle 1  Cycle 2  Cycle 3  Cycle 4
R-type    Ifetch   Reg/Dec  Exec     WrB
11
Pipelining the R-type and Load Instructions

Clock     Cycle 1  Cycle 2  Cycle 3  Cycle 4  Cycle 5  Cycle 6  Cycle 7  Cycle 8  Cycle 9
R-type    Ifetch   Reg/Dec  Exec     Wr
R-type             Ifetch   Reg/Dec  Exec     Wr
Load                        Ifetch   Reg/Dec  Exec     Mem      Wr
R-type                               Ifetch   Reg/Dec  Exec     Wr
R-type                                        Ifetch   Reg/Dec  Exec     Wr

OOPS! We have a problem!
- We have a problem called pipeline conflict or hazard
  - Two instructions try to write to the register file at the same time!
  - “Contention for a shared resource” (in OS terminology)
- It is no longer meaningful to talk about the execution of a single instruction in isolation
  - Execution is inherently concurrent; need to achieve serializability
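The conflict can be found mechanically by recording which stage each instruction occupies in each cycle. The following is a minimal sketch, assuming one instruction is fetched per cycle starting in cycle 1; the `STAGES` table and `find_conflicts` are illustrative names, not part of the lecture.

```python
from collections import defaultdict

# Stage sequences as given on the R-type and Load slides.
STAGES = {
    "r-type": ["Ifetch", "Reg/Dec", "Exec", "WrB"],
    "load": ["Ifetch", "Reg/Dec", "Exec", "Mem", "WrB"],
}


def find_conflicts(program, stages=STAGES):
    """Return {(cycle, stage): [instruction indices]} for every stage
    that two or more instructions need in the same cycle."""
    usage = defaultdict(list)
    for i, kind in enumerate(program):
        for offset, stage in enumerate(stages[kind]):
            cycle = i + 1 + offset  # instruction i is fetched in cycle i + 1
            usage[(cycle, stage)].append(i)
    return {key: users for key, users in usage.items() if len(users) > 1}


program = ["r-type", "r-type", "load", "r-type", "r-type"]
print(find_conflicts(program))  # {(7, 'WrB'): [2, 3]}
```

For the sequence in the diagram, the load (index 2) and the R-type fetched right after it (index 3) both need the register file's write port in cycle 7.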
12
Important Observations
- Each functional unit can only be used once per instruction
- Each functional unit must be used at the same stage for all instructions
  - Load uses Register File’s Write Port during its 5th stage
  - R-type uses Register File’s Write Port during its 4th stage

Stage     1        2        3        4        5
Load      Ifetch   Reg/Dec  Exec     Mem      WrB
R-type    Ifetch   Reg/Dec  Exec     WrB

- How to resolve this pipeline hazard?
13
Solution: Delay R-type’s Write by 1 Cycle
- Delay R-type’s register write by one cycle:
  - Now R-type instructions also use Reg File’s write port at Stage 5
  - Mem stage is a NO-OP stage: nothing is being done. Effective CPI?

Stage     1        2        3        4        5
R-type    Ifetch   Reg/Dec  Exec     Mem      WrB

Clock     Cycle 1  Cycle 2  Cycle 3  Cycle 4  Cycle 5  Cycle 6  Cycle 7  Cycle 8  Cycle 9
R-type    Ifetch   Reg/Dec  Exec     Mem      WrB
R-type             Ifetch   Reg/Dec  Exec     Mem      WrB
Load                        Ifetch   Reg/Dec  Exec     Mem      WrB
R-type                               Ifetch   Reg/Dec  Exec     Mem      WrB
R-type                                        Ifetch   Reg/Dec  Exec     Mem      WrB
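Why the padding works can be seen from the write-back cycles alone. This sketch assumes one instruction is fetched per cycle starting in cycle 1 and that every instruction writes back in its last stage; `writeback_cycles` and the stage-count tables are hypothetical names used only here.

```python
def writeback_cycles(program, stage_count):
    """Instruction i (0-based) is fetched in cycle i + 1, so its last
    stage falls in cycle i + stage_count[kind]."""
    return [i + stage_count[kind] for i, kind in enumerate(program)]


program = ["r-type", "r-type", "load", "r-type", "r-type"]

before = writeback_cycles(program, {"r-type": 4, "load": 5})
after = writeback_cycles(program, {"r-type": 5, "load": 5})

print(before)  # [4, 5, 7, 7, 8]  two writes collide in cycle 7
print(after)   # [5, 6, 7, 8, 9]  one write per cycle
```

With every instruction padded to five stages, the write port is used in a different cycle by each instruction. For this five-instruction sequence the last write lands in cycle 9, so the effective CPI is 9/5, and like the load-only case it tends to 1 as more instructions follow.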
14
The Four Stages of Store
- Ifetch: Instruction fetch
  - Fetch the instruction from the instruction memory
- Reg/Dec: Registers fetch and instruction decode
- Exec: Calculate the memory address
- Mem: Write the data into the data memory

         Cycle 1  Cycle 2  Cycle 3  Cycle 4
Store    Ifetch   Reg/Dec  Exec     Mem      (WrB slot shown in the figure, not among Store's four stages)
15
The Four Stages of Beq
- Ifetch: Instruction fetch
  - Fetch the instruction from the instruction memory
- Reg/Dec: Registers fetch and instruction decode
- Exec:
  - ALU compares the two register operands
  - Adder calculates the branch target address
- Mem: If the registers we compared in the Exec stage are the same,
  - Write the branch target address into the PC

       Cycle 1  Cycle 2  Cycle 3  Cycle 4
Beq    Ifetch   Reg/Dec  Exec     Mem      (WrB slot shown in the figure, not among Beq's four stages)
16
A Pipelined Datapath
[Figure: pipelined datapath. The stages Ifetch, Reg/Dec, Exec, Mem, WrB are separated by the IF/ID, ID/Ex, Ex/Mem, and Mem/Wr pipeline registers, all clocked by Clk. Components: PC, IF Unit, register file (RFile: Ra, Rb, Rw, Di; busA, busB), Exec Unit, Data Mem (WA, Di, RA, Do), muxes. Control signals: RegWr, ExtOp, ALUOp, ALUSrc, RegDst, MemWr, MemtoReg, Branch. Other labels: Imm16, PC+4, Rs, Rt, Rd, Zero.]
17
The Instruction Fetch Stage
Location 8: lw $1, 0x100($2)
$1 <- Mem{($2) + 0x100}
[Figure: the pipelined datapath with a “You are here!” marker; PC = 12; the IF/ID register holds lw $1, 0x100($2).]
18
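Detailed View of the Instruction Fetch Unit
Location 8: lw $1, 0x100($2)
[Figure: instruction fetch unit, with a “You are here!” marker between Ifetch and Reg/Dec. Labels: PC = 12, “8”, Instruction Address, Instruction Memory, Adder, “4”, PC+4 (32 bits), Clk; the fetched instruction is lw $1, 0x100($2).]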
19
The Decode / Register Fetch Stage
Location 8: lw $1, 0x100($2)
$1 <- Mem{($2) + 0x100}
20
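Detailed View of the Fetch/Decode Stage
[Figure: fetch/decode stage. Labels: instruction fields OP, rs, rt, rd, func, Imm16; PC + 4; Control; Register File with ports Ra (rs), Rb (rt), Rw, Din; outputs Bus-A, Bus-B; values rt, rd, Imm16, PC+4 passed on; Clk.]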
21
Load’s Address Calculation Stage
Location 8: lw $1, 0x100($2)
$1 <- Mem{($2) + 0x100}
22
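Detailed View of the Execution Unit
[Figure: execution unit between the ID/Ex register and the Ex/Mem register (Ex/Mem: Load’s Memory Address), with a “You are here!” marker between Exec and Mem. Labels: busA (32), busB (32), imm16 (16), Extender (ExtOp=1), Mux (ALUSrc=1), ALU Control (ALUOp=Add, 3-bit ALUctr), ALU with outputs Zero and ALUout (32), Adder with inputs PC+4 and “<< 2” producing Target (32), Clk.]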
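The labels in the execution unit figure suggest the following behaviour for a load. This is a sketch only: it assumes ExtOp=1 means sign extension, that ALUSrc=1 selects the extended immediate, and that the branch adder adds PC+4 to the extended immediate shifted left by 2 (the usual MIPS convention). Only the Add operation is modelled, and the function names are made up for illustration.

```python
MASK32 = 0xFFFFFFFF


def sign_extend16(imm16):
    """Interpret the low 16 bits as a two's-complement value."""
    imm16 &= 0xFFFF
    return imm16 - 0x10000 if imm16 & 0x8000 else imm16


def exec_unit(bus_a, bus_b, imm16, pc_plus4, alu_src=1):
    """Model the Exec stage with ALUOp = Add."""
    ext = sign_extend16(imm16)              # Extender, ExtOp = 1 (assumed sign-extend)
    operand_b = ext if alu_src else bus_b   # Mux controlled by ALUSrc
    alu_out = (bus_a + operand_b) & MASK32  # ALU: add
    zero = alu_out == 0                     # Zero flag of the ALU result
    target = (pc_plus4 + (ext << 2)) & MASK32  # separate Adder for the branch target
    return {"ALUout": alu_out, "Zero": zero, "Target": target}


# lw $1, 0x100($2) fetched from location 8, with $2 = 0x2000 chosen as an example value
result = exec_unit(bus_a=0x2000, bus_b=0, imm16=0x100, pc_plus4=12)
print(hex(result["ALUout"]))  # 0x2100, the load's memory address
```

For the load, only `ALUout` matters; `Target` and `Zero` are computed by the same hardware but are used by branches such as beq.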
23
Load’s Memory Access Stage
Location 8: lw $1, 0x100($2)
$1 <- Mem{($2) + 0x100}
24
Load’s Write Back Stage
Location 8: lw $1, 0x100($2)
$1 <- Mem{($2) + 0x100}
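The five stages of the load walked through above can be strung together as one short sketch. The dictionaries stand in for the IF/ID, ID/Ex, Ex/Mem, and Mem/Wr pipeline registers; the tuple encoding of the instruction, the function name `run_load`, and the register and memory contents are assumptions made up for this example.

```python
def run_load(regs, dmem, imem, pc):
    """Push one lw through Ifetch, Reg/Dec, Exec, Mem, WrB in order."""
    # Ifetch: read the instruction and compute PC + 4
    if_id = {"instr": imem[pc], "pc4": pc + 4}

    # Reg/Dec: decode the fields and read the base register
    op, rt, rs, imm16 = if_id["instr"]
    assert op == "lw"  # this sketch handles loads only
    id_ex = {"busA": regs[rs], "imm16": imm16, "rt": rt}

    # Exec: sign-extend the immediate and add it to the base register
    ext = imm16 - 0x10000 if imm16 & 0x8000 else imm16
    ex_mem = {"addr": (id_ex["busA"] + ext) & 0xFFFFFFFF, "rt": id_ex["rt"]}

    # Mem: read the data memory at the computed address
    mem_wr = {"data": dmem[ex_mem["addr"]], "rt": ex_mem["rt"]}

    # WrB: write the loaded value into the destination register
    regs[mem_wr["rt"]] = mem_wr["data"]
    return if_id["pc4"]


regs = {1: 0, 2: 0x2000}             # example value for $2
dmem = {0x2100: 42}                  # example memory contents
imem = {8: ("lw", 1, 2, 0x100)}      # Location 8: lw $1, 0x100($2)

next_pc = run_load(regs, dmem, imem, pc=8)
print(next_pc, regs[1])  # 12 42
```

In the real pipeline these five steps happen in five consecutive clock cycles, with other instructions occupying the stages this load has already left.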