Lecture 7: An Introduction to Pipelining
Pipelining: It's Natural!
Laundry example: Ann, Brian, Cathy, and Dave (loads A, B, C, D) each have one load of clothes to wash, dry, and fold.
Washer takes 30 minutes.
Dryer takes 40 minutes.
"Folder" takes 20 minutes.
Sequential Laundry
[Figure: timeline from 6 PM to midnight; loads A to D (task order) each pass through washer (30 min), dryer (40 min), and folder (20 min) one after another.]
Sequential laundry takes 6 hours for 4 loads. If they learned pipelining, how long would laundry take?
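As a quick check of the 6-hour figure, here is a minimal Python sketch in which each load occupies the washer, dryer, and folder in turn before the next load starts. The function name `sequential_minutes` is illustrative only.

```python
# Stage times in minutes for one load (from the slide): washer, dryer, folder.
STAGE_MINUTES = [30, 40, 20]

def sequential_minutes(num_loads, stages=STAGE_MINUTES):
    """Each load finishes all of its stages before the next load begins."""
    return num_loads * sum(stages)

total = sequential_minutes(4)
print(total, "minutes =", total / 60, "hours")  # 360 minutes = 6.0 hours
```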
Pipelined Laundry: Start Work ASAP
[Figure: timeline starting at 6 PM; each load starts washing as soon as the washer is free, so loads A to D overlap in the washer, dryer, and folder.]
Pipelined laundry takes 3.5 hours for 4 loads.
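The 3.5-hour figure can be reproduced with a small greedy simulation, assuming one machine per stage and that each load starts a stage as soon as both the load and that machine are free (the "start work ASAP" rule). `pipelined_schedule` is a hypothetical helper, not part of the lecture.

```python
STAGE_MINUTES = [30, 40, 20]  # washer, dryer, folder

def pipelined_schedule(num_loads, stages=STAGE_MINUTES):
    """Return, for each load, a list of (start, end) times, one per stage."""
    machine_free = [0] * len(stages)  # time at which each machine becomes idle
    schedule = []
    for _ in range(num_loads):
        ready = 0  # time at which this load is ready for its next stage
        times = []
        for s, duration in enumerate(stages):
            start = max(ready, machine_free[s])  # wait for load and machine
            end = start + duration
            machine_free[s] = end
            ready = end
            times.append((start, end))
        schedule.append(times)
    return schedule

sched = pipelined_schedule(4)
for name, times in zip("ABCD", sched):
    print(name, times)
finish = sched[-1][-1][1]
print(finish, "minutes =", finish / 60, "hours")  # 210 minutes = 3.5 hours
```

The dryer (40 minutes) is the bottleneck: from load B onward every load waits for the dryer, which is why the total is 30 + 4 × 40 + 20 = 210 minutes.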
Pipelining Lessons
Pipelining doesn't help the latency of a single task; it helps the throughput of the entire workload.
Pipeline rate is limited by the slowest pipeline stage.
Multiple tasks operate simultaneously.
Potential speedup = number of pipe stages.
Unbalanced lengths of pipe stages reduce speedup.
Time to "fill" the pipeline and time to "drain" it reduce speedup.
[Figure: pipelined laundry timeline starting at 6 PM; loads A to D in task order with stage times of 30, 40, and 20 minutes.]
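The lessons about the slowest stage and fill/drain time can be made concrete with a closed form for n identical tasks, assuming one unit per stage and the greedy "start ASAP" rule: the first task takes the full latency, and each later task finishes one slowest-stage time after the previous one. The balanced stage list below is a hypothetical variant of the laundry with the same 90-minute total.

```python
def pipelined_time(n, stages):
    """Finish time of n identical tasks through the stages, greedy start.

    The first task needs sum(stages) (pipeline fill); after that, one task
    completes every max(stages) time units, since the slowest stage sets the rate.
    """
    return sum(stages) + (n - 1) * max(stages)

def speedup(n, stages):
    sequential = n * sum(stages)
    return sequential / pipelined_time(n, stages)

unbalanced = [30, 40, 20]   # laundry stages from the slides
balanced = [30, 30, 30]     # hypothetical balanced version, same 90-minute total
for n in (4, 100, 10_000):
    print(n, round(speedup(n, unbalanced), 3), round(speedup(n, balanced), 3))
```

At n = 4 the speedups are about 1.71 (360/210) and 2.0, because fill and drain time dominate. As n grows, the unbalanced laundry approaches 90/40 = 2.25, and only the balanced version approaches the ideal of 3, the number of stages.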
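Definitions
Pipe stage or pipe segment: one step of the pipeline, which completes part of an instruction.
Pipeline depth: the number of pipe stages.
Machine cycle: the time needed to move an instruction one step down the pipeline; it is set by the slowest stage and is usually one clock cycle.
Latency: the time to complete a single instruction.
Throughput: the number of instructions completed per unit time.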
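Design Issues
Balance the length of each pipeline stage. If the stages are perfectly balanced:
$$\text{Time per instruction on pipelined machine} = \frac{\text{Time per instruction on unpipelined machine}}{\text{Depth of the pipeline}}$$
Problems: usually, stages are not balanced; pipelining adds overhead; hazards (conflicts) occur.
Performance (throughput, CPU performance equation): pipelining can be viewed either as a decrease of the CPI or as a decrease of the clock cycle time.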
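The CPU performance equation, CPU time = instruction count × CPI × clock cycle time, shows both views of the performance gain. The numbers below (1,000,000 instructions, a CPI of 5, a 10 ns or 50 ns clock) are made up for illustration, and the sketch ignores pipelining overhead and hazards, so both views give the ideal speedup equal to a five-stage depth.

```python
def cpu_time_ns(instruction_count, cpi, cycle_ns):
    # CPU performance equation: time = IC x CPI x clock cycle time
    return instruction_count * cpi * cycle_ns

IC = 1_000_000  # hypothetical instruction count

# View 1: unpipelined multicycle machine (several short cycles per instruction);
# pipelining keeps the cycle time and drives the CPI toward 1.
multicycle = cpu_time_ns(IC, cpi=5, cycle_ns=10)
pipelined_a = cpu_time_ns(IC, cpi=1, cycle_ns=10)

# View 2: unpipelined single-cycle machine (CPI = 1, one long cycle);
# pipelining keeps the CPI at 1 and splits the cycle across 5 stages.
single_cycle = cpu_time_ns(IC, cpi=1, cycle_ns=50)
pipelined_b = cpu_time_ns(IC, cpi=1, cycle_ns=50 / 5)

print(multicycle / pipelined_a, single_cycle / pipelined_b)  # 5.0 5.0
```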
DLX Implementation
Integer subset of DLX: load/store word, branch, and integer ALU instructions; no jumps, no floating point (FP).
Unpipelined implementation: a maximum of five cycles per instruction.
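Instruction Formats (bit 0 is the most significant bit)
I-type: opcode [0-5] | rs1 [6-10] | rd [11-15] | immediate [16-31]
R-type: opcode [0-5] | rs1 [6-10] | rs2 [11-15] | rd [16-20] | function [21-31]
J-type: opcode [0-5] | name [6-31]
Fixed-field decoding: the register specifiers sit at fixed bit positions, so the registers can be read while the opcode is still being decoded.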
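The field layout above can be sketched as a small decoder, assuming DLX bit numbering (bit 0 = most significant bit of the 32-bit word). The opcode value in the example is hypothetical, since the slides give no encodings, and the format is passed in explicitly even though real hardware would determine it from the opcode.

```python
def bits(word, first, last):
    """Return IR[first..last] in DLX numbering, where bit 0 is the MSB."""
    width = last - first + 1
    return (word >> (31 - last)) & ((1 << width) - 1)

# Field boundaries from the formats above: (name, first bit, last bit).
FORMATS = {
    "I": [("opcode", 0, 5), ("rs1", 6, 10), ("rd", 11, 15), ("immediate", 16, 31)],
    "R": [("opcode", 0, 5), ("rs1", 6, 10), ("rs2", 11, 15), ("rd", 16, 20),
          ("function", 21, 31)],
    "J": [("opcode", 0, 5), ("name", 6, 31)],
}

def decode(word, fmt):
    """Split a 32-bit instruction word into the fields of format fmt."""
    return {name: bits(word, first, last) for name, first, last in FORMATS[fmt]}

# Hypothetical I-type word: opcode 0b100011, rs1 = 2, rd = 3, immediate = 0xFFFC.
word = (0b100011 << 26) | (2 << 21) | (3 << 16) | 0xFFFC
print(decode(word, "I"))  # {'opcode': 35, 'rs1': 2, 'rd': 3, 'immediate': 65532}
```

Because rs1 and rs2 occupy the same positions in every format that has them, the same two `bits` extractions feed the register file regardless of format, which is the point of fixed-field decoding.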
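1st and 2nd Instruction Cycles
Instruction fetch (IF):
IR ← Mem[PC]; NPC ← PC + 4
Instruction decode & register fetch (ID):
A ← Regs[IR6..10]; B ← Regs[IR11..15]; Imm ← ((IR16)^16 ## IR16..31)
(## denotes concatenation; (IR16)^16 replicates the sign bit IR16 sixteen times, so Imm is the sign-extended 16-bit immediate.)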
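The IF and ID register transfers can be sketched in Python, assuming instruction memory is modeled as a dict from byte address to 32-bit word and the register file as a list of 32 integers. `fetch_and_decode` and `sign_extend_imm` are hypothetical names; the returned dict stands in for the IR, NPC, A, B, and Imm registers.

```python
MASK32 = 0xFFFFFFFF

def bits(word, first, last):
    """IR[first..last] in DLX numbering (bit 0 = most significant bit)."""
    return (word >> (31 - last)) & ((1 << (last - first + 1)) - 1)

def sign_extend_imm(ir):
    """Imm <- ((IR16)^16 ## IR16..31): copy sign bit IR16 into the upper 16 bits."""
    imm = bits(ir, 16, 31)
    if bits(ir, 16, 16):   # sign bit set: replicate it 16 times
        imm |= 0xFFFF0000
    return imm             # returned as a 32-bit pattern

def fetch_and_decode(pc, mem, regs):
    # IF: IR <- Mem[PC]; NPC <- PC + 4
    ir = mem[pc]
    npc = (pc + 4) & MASK32
    # ID: both register fields are read unconditionally (fixed-field decoding),
    # even if the instruction turns out not to use B.
    a = regs[bits(ir, 6, 10)]
    b = regs[bits(ir, 11, 15)]
    return {"IR": ir, "NPC": npc, "A": a, "B": b, "Imm": sign_extend_imm(ir)}

# Hypothetical contents: one I-type word at address 0, R2 = 100, R3 = 7.
mem = {0: (0b100011 << 26) | (2 << 21) | (3 << 16) | 0xFFFC}
regs = [0] * 32
regs[2], regs[3] = 100, 7
latches = fetch_and_decode(0, mem, regs)
print(latches["NPC"], latches["A"], latches["B"], hex(latches["Imm"]))  # 4 100 7 0xfffffffc
```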
3rd Instruction Cycle: Execution & Effective Address (EX)
Memory reference: ALUOutput ← A + Imm
Register-register ALU instruction: ALUOutput ← A func B
Register-immediate ALU instruction: ALUOutput ← A op Imm
Branch: ALUOutput ← NPC + Imm; Cond ← (A op 0)
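A sketch of the EX step, continuing the same modeling assumptions (32-bit values as Python ints masked to 32 bits, latches as a dict). The slides write "func" and "op" without listing them, so the small `ALU_OPS` and `BRANCH_TESTS` tables below are hypothetical examples.

```python
MASK32 = 0xFFFFFFFF

# Hypothetical operation tables; the slides do not enumerate func/op.
ALU_OPS = {
    "add": lambda x, y: (x + y) & MASK32,
    "sub": lambda x, y: (x - y) & MASK32,
    "and": lambda x, y: x & y,
    "or": lambda x, y: x | y,
}
BRANCH_TESTS = {"eq": lambda a: a == 0, "ne": lambda a: a != 0}

def execute(kind, latches, op=None):
    """EX step. kind is 'mem', 'rr', 'ri' or 'branch'; returns (ALUOutput, Cond)."""
    a, b, imm, npc = latches["A"], latches["B"], latches["Imm"], latches["NPC"]
    if kind == "mem":      # effective address: A + Imm
        return (a + imm) & MASK32, None
    if kind == "rr":       # register-register: A func B
        return ALU_OPS[op](a, b), None
    if kind == "ri":       # register-immediate: A op Imm
        return ALU_OPS[op](a, imm), None
    if kind == "branch":   # target NPC + Imm, condition (A op 0)
        return (npc + imm) & MASK32, BRANCH_TESTS[op](a)
    raise ValueError(f"unknown instruction kind {kind!r}")

latches = {"A": 100, "B": 7, "Imm": 0xFFFFFFFC, "NPC": 4}  # Imm = -4
print(execute("mem", latches))           # (96, None)
print(execute("rr", latches, "sub"))     # (93, None)
print(execute("branch", latches, "eq"))  # (0, False)
```

Adding the sign-extended pattern 0xFFFFFFFC and masking to 32 bits is the same as subtracting 4, which is why the effective address comes out as 96.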
4th Instruction Cycle: Memory Access & Branch Completion (MEM)
Memory reference: PC ← NPC
LMD ← Mem[ALUOutput] (load), or Mem[ALUOutput] ← B (store)
Branch: if (Cond) PC ← ALUOutput; else PC ← NPC
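A sketch of the MEM step under the same assumptions, with data memory modeled as a dict from address to word. The slide lists only memory references and branches; the final fall-through, which also sets PC ← NPC for ALU instructions, is an assumption of this sketch. `memory_access` is a hypothetical name.

```python
def memory_access(kind, latches, alu_output, cond, mem, is_load=False):
    """MEM step; returns (new PC, LMD). mem is a dict: address -> word."""
    npc = latches["NPC"]
    if kind == "mem":
        if is_load:
            return npc, mem[alu_output]   # LMD <- Mem[ALUOutput]
        mem[alu_output] = latches["B"]    # Mem[ALUOutput] <- B
        return npc, None
    if kind == "branch":
        # if (Cond) PC <- ALUOutput; else PC <- NPC
        return (alu_output if cond else npc), None
    return npc, None                      # assumed: ALU instructions, PC <- NPC

mem = {96: 42}
latches = {"NPC": 4, "B": 7}
print(memory_access("mem", latches, 96, None, mem, is_load=True))  # (4, 42)
memory_access("mem", latches, 100, None, mem)                      # store B
print(mem[100])                                                    # 7
print(memory_access("branch", latches, 64, True, mem))             # (64, None)
```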
5th Instruction Cycle: Write-Back (WB)
Register-register ALU instruction: Regs[IR16..20] ← ALUOutput
Register-immediate ALU instruction: Regs[IR11..15] ← ALUOutput
Load instruction: Regs[IR11..15] ← LMD
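Since only ALU instructions and loads write back, stores and branches finish in the MEM cycle, which is why five cycles is a maximum rather than a fixed count. The sketch below shows the destination-field choice in WB and the resulting average CPI; the instruction mix is hypothetical, and DLX's hardwired-zero R0 is not special-cased.

```python
def bits(word, first, last):
    """IR[first..last] in DLX numbering (bit 0 = most significant bit)."""
    return (word >> (31 - last)) & ((1 << (last - first + 1)) - 1)

def write_back(kind, ir, regs, alu_output=None, lmd=None):
    """WB step; the destination field depends on the instruction format."""
    if kind == "rr":
        regs[bits(ir, 16, 20)] = alu_output   # R-type rd is IR16..20
    elif kind == "ri":
        regs[bits(ir, 11, 15)] = alu_output   # I-type rd is IR11..15
    elif kind == "load":
        regs[bits(ir, 11, 15)] = lmd          # loads are I-type
    # "store" and "branch": nothing to do, they already finished in MEM

# Cycles per class in the unpipelined implementation.
CYCLES = {"load": 5, "rr": 5, "ri": 5, "store": 4, "branch": 4}

def average_cpi(mix):
    """mix maps instruction class -> fraction of executed instructions."""
    return sum(CYCLES[kind] * fraction for kind, fraction in mix.items())

# Hypothetical instruction mix, for illustration only.
mix = {"branch": 0.12, "store": 0.10, "load": 0.25, "rr": 0.33, "ri": 0.20}
print(round(average_cpi(mix), 2))  # 4.78
```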
Datapath
[Figure: unpipelined DLX datapath divided into IF, ID, EX, MEM, and WB; components include PC, an adder (+4), NPC, instruction cache, IR, register file (outputs A and B), sign extend (Imm), muxes, ALU, a Zero? test producing Cond, ALU output, data cache, and LMD.]
Control
[Figure: control steps for the unpipelined implementation; steps 1 and 2 are common to all instructions, followed by class-specific steps 3 to 5 for load, register-register (RR) ALU, store, and immediate (Imm) ALU instructions.]
Basic Pipeline
Clock number   1    2    3    4    5    6    7    8    9
Instr i        IF   ID   EX   MEM  WB
Instr i+1           IF   ID   EX   MEM  WB
Instr i+2                IF   ID   EX   MEM  WB
Instr i+3                     IF   ID   EX   MEM  WB
Instr i+4                          IF   ID   EX   MEM  WB
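The diagram above follows a simple rule: a new instruction enters IF every clock, so instruction k (counting from 0) is in stage s during clock k + s + 1, and n instructions need n + 4 clocks on a five-stage pipeline. A minimal sketch that regenerates the table; `pipeline_diagram` is a hypothetical helper.

```python
STAGES = ["IF", "ID", "EX", "MEM", "WB"]

def pipeline_diagram(num_instructions, stages=STAGES):
    """One row per instruction, one cell per clock; a new instruction each clock."""
    total_clocks = num_instructions + len(stages) - 1
    rows = []
    for k in range(num_instructions):
        row = [""] * total_clocks
        for s, name in enumerate(stages):
            row[k + s] = name   # instruction k is in stage s during clock k + s + 1
        rows.append(row)
    return rows

rows = pipeline_diagram(5)
print("Clock   " + "".join(f"{c:<5}" for c in range(1, len(rows[0]) + 1)))
for k, row in enumerate(rows):
    label = "i" if k == 0 else f"i+{k}"
    print(f"{label:<8}" + "".join(f"{cell:<5}" for cell in row))
```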
Pipeline Resources
[Figure: five overlapping instructions, each drawn as IM, Reg, ALU, DM, Reg (instruction memory, register file, ALU, data memory, register file) and shifted one clock cycle from the previous instruction.]
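The resource figure shows why the datapath uses separate instruction and data memories: in a given clock, one instruction can be in IF (using IM) while an older one is in MEM (using DM). The sketch below counts resource claims per clock. Treating the register file's ID read and WB write as separate uses reflects the usual design that writes in the first half of the clock and reads in the second half; the `merge` argument, which models a single unified memory, is a hypothetical what-if.

```python
from collections import Counter

# Resource used by an instruction in each of its five clocks (IF..WB).
# The figure draws both register-file uses as "Reg"; here the read (ID)
# and write (WB) are kept apart so they do not count as a conflict.
USES = ["IM", "Reg-read", "ALU", "DM", "Reg-write"]

def conflicts(num_instructions, uses=USES, merge=None):
    """Return sorted (clock, unit) pairs claimed by two or more instructions.

    merge optionally maps several resource names onto one physical unit.
    """
    merge = merge or {}
    claims = Counter()
    for k in range(num_instructions):
        for s, res in enumerate(uses):
            claims[(k + s + 1, merge.get(res, res))] += 1
    return sorted(key for key, count in claims.items() if count > 1)

print(conflicts(5))                                    # []
print(conflicts(5, merge={"IM": "Mem", "DM": "Mem"}))  # [(4, 'Mem'), (5, 'Mem')]
```

With one shared memory, instruction i+3's fetch collides with instruction i's data access in clock 4, which would be a structural hazard.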
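Pipelined Datapath
[Figure: DLX datapath with pipeline registers IF/ID, ID/EX, EX/MEM, and MEM/WB between the stages; components include PC, an adder (+4), instruction cache, register file, sign extend, muxes, ALU, a Zero? test, and data cache.]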
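The pipeline registers IF/ID, ID/EX, EX/MEM, and MEM/WB all load on the same clock edge, so each stage works on what the previous stage produced in the previous clock. The sketch below tracks only which instruction each register holds, not its data fields (IR, A, B, Imm, ALUOutput, LMD); `step` and `run` are hypothetical helpers.

```python
def step(latches, next_instr):
    """Advance one clock.

    latches = [IF/ID, ID/EX, EX/MEM, MEM/WB] at the start of the clock.
    The instruction in MEM/WB is in WB during this clock and completes.
    All registers load together, so the new contents come from the old ones.
    """
    in_wb = latches[-1]
    new = [next_instr] + latches[:-1]   # every instruction moves down one stage
    return new, in_wb

def run(program, clocks):
    """Return (clock, instruction) pairs for each instruction as it completes WB."""
    latches = [None] * 4
    completed = []
    for clock in range(1, clocks + 1):
        fetched = program[clock - 1] if clock <= len(program) else None
        latches, in_wb = step(latches, fetched)
        if in_wb is not None:
            completed.append((clock, in_wb))
    return completed

print(run(["i", "i+1", "i+2"], 7))  # [(5, 'i'), (6, 'i+1'), (7, 'i+2')]
```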
Performance Limitations
Imbalance among pipe stages: the clock cycle can be no shorter than the slowest stage.
Pipelining overhead: pipeline register (latch) delay and clock skew.
Clock cycle > clock skew + latch overhead
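Both limitations can be folded into one estimate, assuming the clock must fit the slowest stage plus latch overhead and clock skew, and comparing against an unpipelined machine that runs the same stage logic end to end in one long cycle. All delays below are hypothetical, in nanoseconds.

```python
def pipelined_cycle_ns(stage_delays_ns, latch_overhead_ns, clock_skew_ns):
    # The clock must cover the slowest stage plus register delay and skew.
    return max(stage_delays_ns) + latch_overhead_ns + clock_skew_ns

def speedup(stage_delays_ns, latch_overhead_ns, clock_skew_ns):
    unpipelined = sum(stage_delays_ns)  # whole instruction in one long cycle
    return unpipelined / pipelined_cycle_ns(
        stage_delays_ns, latch_overhead_ns, clock_skew_ns)

balanced = [10, 10, 10, 10, 10]   # hypothetical stage delays (ns)
unbalanced = [10, 8, 10, 10, 7]   # hypothetical stage delays (ns)
print(speedup(balanced, 0, 0))    # 5.0
print(speedup(balanced, 1, 0.5))
print(speedup(unbalanced, 1, 0.5))
```

With no overhead and balanced stages the speedup equals the depth (5). Adding 1.5 ns of latch delay and skew lowers it to about 4.35, and unbalanced stages lower it further to about 3.91, because the same 11.5 ns clock must now cover less total work.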