An Introduction to pipelining

Lecture 7 An Introduction to pipelining

Pipelining: It’s Natural!
Laundry example
Ann, Brian, Cathy, and Dave (A, B, C, D) each have one load of clothes to wash, dry, and fold
Washer takes 30 minutes
Dryer takes 40 minutes
“Folder” takes 20 minutes

Sequential Laundry
[Figure: timeline from 6 PM to midnight; loads A, B, C, D in task order, each taking 30 + 40 + 20 minutes, one load after another]
Sequential laundry takes 6 hours for 4 loads
If they learned pipelining, how long would laundry take?

Pipelined Laundry
Start work ASAP
[Figure: timeline starting at 6 PM; loads A, B, C, D in task order, overlapped across the 30-minute washer, 40-minute dryer, and 20-minute folder]
Pipelined laundry takes 3.5 hours for 4 loads
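The 6-hour and 3.5-hour figures can be checked with a small scheduling sketch. It assumes one washer, one dryer, and one folder, that all four loads are ready at 6 PM, and that a finished load can wait in a basket until the next machine is free. The function name is made up for this illustration.

```python
def pipelined_finish_times(stage_minutes, n_loads):
    """Finish time in minutes of each load, with one machine per stage."""
    stage_free = [0] * len(stage_minutes)  # when each machine is next free
    finishes = []
    for _ in range(n_loads):
        t = 0  # every load is ready at time 0 (6 PM)
        for s, duration in enumerate(stage_minutes):
            start = max(t, stage_free[s])  # wait for the load and the machine
            t = start + duration
            stage_free[s] = t
        finishes.append(t)
    return finishes


stages = [30, 40, 20]  # washer, dryer, folder
sequential_minutes = 4 * sum(stages)
pipelined_minutes = pipelined_finish_times(stages, 4)[-1]
print(sequential_minutes / 60, pipelined_minutes / 60)  # 6.0 3.5
```

After the first load, one load finishes every 40 minutes, because the dryer is the slowest stage.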

Pipelining Lessons
Pipelining doesn’t help the latency of a single task; it helps the throughput of the entire workload
Pipeline rate is limited by the slowest pipeline stage
Multiple tasks operate simultaneously
Potential speedup = number of pipe stages
Unbalanced lengths of pipe stages reduce speedup
Time to “fill” the pipeline and time to “drain” it reduce speedup
[Figure: pipelined laundry timeline, 6 PM to 9 PM, loads A, B, C, D with stage times 30, 40, 20]
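A rough way to see the last three lessons in numbers, assuming identical tasks that pass through every stage in order (the function name and the closed-form timing are assumptions of this sketch, not part of the slides):

```python
def laundry_speedup(stage_minutes, n_tasks):
    """Speedup of a pipelined run over a sequential run of identical tasks."""
    sequential = n_tasks * sum(stage_minutes)
    # The first task fills the pipeline; each later task adds one slowest-stage time.
    pipelined = sum(stage_minutes) + (n_tasks - 1) * max(stage_minutes)
    return sequential / pipelined


print(laundry_speedup([30, 40, 20], 4))     # unbalanced stages, few loads
print(laundry_speedup([30, 40, 20], 1000))  # unbalanced stages, many loads
print(laundry_speedup([30, 30, 30], 1000))  # balanced stages, many loads
```

With many loads the balanced case approaches 3, the number of stages, while the unbalanced case stays below 90/40 = 2.25. With only four loads, fill and drain time keep the speedup lower still.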

Definitions
Pipe stage or pipe segment
Pipeline depth
Machine cycle
Latency
Throughput

Design Issues
Balance the length of each pipeline stage
  With perfectly balanced stages: time per instruction on pipelined machine = time per instruction on unpipelined machine / depth of the pipeline
Problems
  Usually, stages are not balanced
  Pipelining overhead
  Hazards (conflicts)
Performance (throughput, CPU performance equation)
  Decrease of the CPI
  Decrease of cycle time

DLX Implementation
Integer subset of DLX
  load/store word
  branch
  integer ALU
  NO jumps, NO FP
Unpipelined implementation
  maximum five cycles per instruction

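Instruction Formats
I-type: opcode (bits 0..5), rs1 (bits 6..10), rd (bits 11..15), immediate (bits 16..31)
R-type: opcode (bits 0..5), rs1 (bits 6..10), rs2 (bits 11..15), rd (bits 16..20), function (bits 21..31)
J-type: opcode (bits 0..5), name (bits 6..31)
Fixed-field decoding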
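Fixed-field decoding means every field can be extracted before the opcode is known. The sketch below assumes a 32-bit word in which bit 0 is the most significant bit, which matches the slides' use of bit 16 as the sign bit of the immediate. The helper names and the sample opcode value 35 are arbitrary choices for illustration.

```python
def bits(word, first, last):
    """Extract bits first..last of a 32-bit word, where bit 0 is the MSB."""
    width = last - first + 1
    return (word >> (31 - last)) & ((1 << width) - 1)


def sign_extend16(value):
    """Interpret a 16-bit field as a signed two's-complement number."""
    return value - (1 << 16) if value & 0x8000 else value


def decode(word):
    """Pull out every field of all three formats; the opcode decides which apply."""
    return {
        "opcode": bits(word, 0, 5),
        "rs1": bits(word, 6, 10),
        "rs2_or_rd": bits(word, 11, 15),   # rs2 in R-type, rd in I-type
        "rd_rtype": bits(word, 16, 20),
        "function": bits(word, 21, 31),
        "immediate": sign_extend16(bits(word, 16, 31)),
        "name": bits(word, 6, 31),
    }


# An I-type word built by hand: opcode 35 (arbitrary), rs1 = 2, rd = 1, immediate = -4
word = (35 << 26) | (2 << 21) | (1 << 16) | 0xFFFC
fields = decode(word)
print(fields["opcode"], fields["rs1"], fields["rs2_or_rd"], fields["immediate"])  # 35 2 1 -4
```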

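1st and 2nd Instruction cycles
Instruction fetch (IF)
  IR ← Mem[PC]
  NPC ← PC + 4
Instruction decode & register fetch (ID)
  A ← Regs[IR6..10]
  B ← Regs[IR11..15]
  Imm ← ((IR16)^16 ## IR16..31)
  (bit 16 of IR replicated 16 times, concatenated with bits 16..31: the sign-extended immediate)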

3rd Instruction cycle
Execution & effective address (EX)
Memory reference
  ALUOutput ← A + Imm
Register-Register ALU instruction
  ALUOutput ← A func B
Register-Immediate ALU instruction
  ALUOutput ← A op Imm
Branch
  ALUOutput ← NPC + Imm
  Cond ← (A op 0)

4th Instruction cycle
Memory access & branch completion (MEM)
Memory reference
  PC ← NPC
  LMD ← Mem[ALUOutput] (load)
  Mem[ALUOutput] ← B (store)
Branch
  if (cond) PC ← ALUOutput; else PC ← NPC

5th Instruction cycle
Write-back (WB)
Register-register ALU instruction
  Regs[IR16..20] ← ALUOutput
Register-immediate ALU instruction
  Regs[IR11..15] ← ALUOutput
Load instruction
  Regs[IR11..15] ← LMD
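The five instruction cycles can be strung together into a toy interpreter that runs one instruction at a time. This is only a sketch: the opcode and function numbers are invented for the example (they are not the real DLX encodings), memory is a Python dict keyed by byte address, the branch test is fixed to "A equals 0", and 32-bit overflow is ignored.

```python
# Hypothetical opcode and function numbers, chosen for this sketch only
OP_RR, OP_ADDI, OP_LW, OP_SW, OP_BEQZ = 0, 1, 2, 3, 4
FUNC_ADD, FUNC_SUB = 0, 1


def field(word, first, last):
    """Bits first..last of a 32-bit word, where bit 0 is the MSB."""
    return (word >> (31 - last)) & ((1 << (last - first + 1)) - 1)


def enc_i(op, rs1, rd, imm):
    """Build an I-type word (helper for the demo program below)."""
    return (op << 26) | (rs1 << 21) | (rd << 16) | (imm & 0xFFFF)


def enc_r(rs1, rs2, rd, func):
    """Build an R-type word (helper for the demo program below)."""
    return (OP_RR << 26) | (rs1 << 21) | (rs2 << 16) | (rd << 11) | func


def step(pc, regs, mem):
    """Run one instruction through IF, ID, EX, MEM, WB and return the new PC."""
    # IF: fetch the instruction and compute the next sequential PC
    ir = mem[pc]
    npc = pc + 4
    # ID: read both register operands and sign-extend the immediate
    op = field(ir, 0, 5)
    a = regs[field(ir, 6, 10)]
    b = regs[field(ir, 11, 15)]
    raw = field(ir, 16, 31)
    imm = raw - 0x10000 if raw & 0x8000 else raw
    # EX: effective address, ALU result, or branch target and condition
    cond = False
    if op in (OP_LW, OP_SW, OP_ADDI):
        alu_output = a + imm
    elif op == OP_RR:
        alu_output = a + b if field(ir, 21, 31) == FUNC_ADD else a - b
    elif op == OP_BEQZ:
        alu_output = npc + imm
        cond = (a == 0)
    else:
        raise ValueError(f"unknown opcode {op}")
    # MEM: memory access and branch completion
    lmd = None
    if op == OP_LW:
        lmd = mem[alu_output]
    elif op == OP_SW:
        mem[alu_output] = b
    new_pc = alu_output if cond else npc
    # WB: write the result into the register file
    if op == OP_RR:
        regs[field(ir, 16, 20)] = alu_output
    elif op == OP_ADDI:
        regs[field(ir, 11, 15)] = alu_output
    elif op == OP_LW:
        regs[field(ir, 11, 15)] = lmd
    return new_pc


regs = [0] * 32
mem = {
    0: enc_i(OP_ADDI, 0, 1, 5),    # r1 <- r0 + 5
    4: enc_i(OP_SW, 0, 1, 100),    # Mem[100] <- r1
    8: enc_i(OP_LW, 0, 2, 100),    # r2 <- Mem[100]
    12: enc_r(1, 2, 3, FUNC_ADD),  # r3 <- r1 + r2
    16: enc_i(OP_BEQZ, 0, 0, -8),  # r0 is 0, so branch to (16 + 4) - 8 = 12
}
pc = 0
for _ in range(5):
    pc = step(pc, regs, mem)
print(regs[1], regs[2], regs[3], mem[100], pc)  # 5 5 10 5 12
```

Every instruction walks through all five steps here; the unpipelined machine in the slides can finish some instructions in fewer cycles.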

Datapath
[Figure: unpipelined DLX datapath divided into IF, ID, EX, MEM, WB. Labeled elements: PC, Add (+4), NPC, Instr. Cache, IR, Regs, Sign extend, A, B, Imm, Mux, Zero?, Cond, ALU, ALU Output, Data Cache, LMD]

Control
[Figure: control state diagram with states Step 1 through Step 5; Steps 3 and 4 are split by instruction type (Load, RR ALU, Store, Imm)]

Basic Pipeline
          Clock number
Instr #   1    2    3    4    5    6    7    8    9
i         IF   ID   EX   MEM  WB
i+1            IF   ID   EX   MEM  WB
i+2                 IF   ID   EX   MEM  WB
i+3                      IF   ID   EX   MEM  WB
i+4                           IF   ID   EX   MEM  WB
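The staircase pattern follows from one rule: instruction i+k enters stage s in clock cycle k + s + 1. A short sketch that generates the same table, assuming no stalls (the function name is illustrative):

```python
STAGES = ["IF", "ID", "EX", "MEM", "WB"]


def pipeline_table(n_instructions, stages=STAGES):
    """One row per instruction, one column per clock cycle, no stalls."""
    n_cycles = n_instructions + len(stages) - 1
    rows = []
    for k in range(n_instructions):
        row = [""] * n_cycles
        for s, name in enumerate(stages):
            row[k + s] = name  # instruction i+k is in stage s during cycle k+s+1
        rows.append(row)
    return rows


for k, row in enumerate(pipeline_table(5)):
    label = "i" if k == 0 else f"i+{k}"
    print(f"{label:<6}" + "".join(f"{cell:<5}" for cell in row))
```

Five instructions finish in 5 + 5 - 1 = 9 clock cycles, and from cycle 5 onward one instruction completes per cycle.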

Pipeline Resources
[Figure: overlapped instructions, each using IM, Reg, ALU, DM, Reg in successive clock cycles]

Pipelined Datapath
[Figure: datapath with pipeline registers IF/ID, ID/EX, EX/MEM, MEM/WB between the stages. Labeled elements: PC, Add (+4), Instr. Cache, Regs, Sign extend, Mux, Zero?, ALU, Data Cache]

Performance limitations
Imbalance among pipe stages
  limits cycle time to slowest stage
Pipelining overhead
  Pipeline register delay
  Clock skew
Clock cycle > clock skew + latch overhead
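A back-of-the-envelope sketch of how imbalance and overhead cut into the ideal speedup. The stage delays and the 1 ns overhead below are made-up numbers. The sketch also assumes that an unpipelined instruction takes the sum of all stage delays and that the pipeline completes one instruction per cycle with no hazards.

```python
def pipelined_cycle_time(stage_delays_ns, overhead_ns):
    """Clock cycle = slowest stage + pipeline register delay and clock skew."""
    return max(stage_delays_ns) + overhead_ns


def pipeline_speedup(stage_delays_ns, overhead_ns):
    """Unpipelined time per instruction divided by the pipelined cycle time."""
    unpipelined_ns = sum(stage_delays_ns)
    return unpipelined_ns / pipelined_cycle_time(stage_delays_ns, overhead_ns)


delays = [10, 8, 10, 10, 7]  # hypothetical IF, ID, EX, MEM, WB delays in ns
print(pipelined_cycle_time(delays, 1))  # 11
print(pipeline_speedup(delays, 1))      # about 4.09, below the ideal 5
print(pipeline_speedup([9] * 5, 0))     # balanced, no overhead: exactly 5.0
```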
