ECE232: Hardware Organization and Design

Part 11: Pipelining
Chapter 4/6

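CPI Calculation
CPI stands for the average number of Cycles Per Instruction
Assume an instruction mix of 24% loads, 12% stores, 44% R-format, 18% branches, and 2% jumps
With the multicycle cycle counts (loads 5, stores 4, R-format 4, branches 3, jumps 3):
CPI = 0.24 * 5 + 0.12 * 4 + 0.44 * 4 + 0.18 * 3 + 0.02 * 3 = 4.04
Speedup?
Question: Can we achieve a CPI of 1?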
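As a rough check on the arithmetic, the weighted-average CPI can be computed directly. This is only a sketch: the per-class cycle counts (5 for loads, 4 for stores and R-format, 3 for branches and jumps) are assumed from the usual multicycle MIPS design rather than read off the slide, and the names `MIX`, `CYCLES`, and `average_cpi` are illustrative.

```python
# Instruction mix from the slide (fractions of executed instructions).
MIX = {"load": 0.24, "store": 0.12, "r_format": 0.44, "branch": 0.18, "jump": 0.02}

# ASSUMPTION: cycles per instruction class in a multicycle MIPS design.
CYCLES = {"load": 5, "store": 4, "r_format": 4, "branch": 3, "jump": 3}


def average_cpi(mix, cycles):
    """Return the frequency-weighted average number of cycles per instruction."""
    return sum(mix[kind] * cycles[kind] for kind in mix)


if __name__ == "__main__":
    print(f"CPI = {average_cpi(MIX, CYCLES):.2f}")
```

Changing a single entry in `CYCLES` shows how strongly the most frequent class (R-format here) drives the average.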

Speeding up through pipelining
Ann, Brian, Cathy, and Dave (A, B, C, D) each have one load of clothes to wash, dry, fold, and put away
Washer takes 30 minutes
Dryer takes 30 minutes
“Folder” takes 30 minutes
“Stasher” takes 30 minutes to put clothes into drawers

Sequential Laundry
[Figure: timeline from 6 PM to 2 AM; loads A, B, C, D run one after another in task order, each occupying four 30-minute stages]
Sequential laundry takes 8 hours for 4 loads
If they learned pipelining, how long would laundry take?

Pipelined Laundry: Start work ASAP
[Figure: timeline starting at 6 PM; loads A, B, C, D in task order, each starting 30 minutes after the previous one]
Pipelined laundry takes 3.5 hours for 4 loads!
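The two laundry totals follow from simple counting, sketched below under the slide's numbers (four loads, four 30-minute stages). The function names are illustrative, and the pipelined formula assumes every stage takes the same time and a new load can start every 30 minutes.

```python
STAGE_MINUTES = 30  # washer, dryer, "folder", and "stasher" each take 30 minutes
NUM_STAGES = 4
NUM_LOADS = 4       # Ann, Brian, Cathy, Dave


def sequential_minutes(loads, stages, stage_minutes):
    """Each load runs through every stage before the next load starts."""
    return loads * stages * stage_minutes


def pipelined_minutes(loads, stages, stage_minutes):
    """The first load fills the pipeline; each later load finishes one slot after it."""
    return (stages + loads - 1) * stage_minutes


if __name__ == "__main__":
    seq = sequential_minutes(NUM_LOADS, NUM_STAGES, STAGE_MINUTES)
    pipe = pipelined_minutes(NUM_LOADS, NUM_STAGES, STAGE_MINUTES)
    print(f"sequential: {seq / 60} h, pipelined: {pipe / 60} h")
```

The `stages - 1` extra slots are the fill and drain time, which is why four loads do not reach a full 4x speedup.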

Pipelining Lessons
[Figure: pipelined laundry timeline starting at 6 PM; loads A, B, C, D in task order]
Pipelining doesn’t help the latency of a single task; it helps the throughput of the entire workload
Multiple tasks operate simultaneously using different resources
Potential speedup = number of pipe stages
Pipeline rate is limited by the slowest pipeline stage
Unbalanced lengths of pipe stages reduce speedup
Time to “fill” the pipeline and time to “drain” it reduce speedup

Pipelining Instructions
Fetch = 10 ns, Decode = 6 ns, Execute = 8 ns, Memory = 10 ns, Write back = 6 ns
[Figure: instruction versus time (in cycles); each instruction passes through the stages F, D, EX, M, W]
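The stage latencies above illustrate why unbalanced stages cost speedup. The following is a sketch, not the deck's own calculation: it assumes the pipelined clock period equals the slowest stage and that an unpipelined baseline spends the sum of the stage delays on each instruction (the 45 ns single-cycle figure used later in the deck is not derived here). All names are illustrative.

```python
# Stage latencies from the slide, in nanoseconds.
STAGE_NS = {"F": 10, "D": 6, "EX": 8, "M": 10, "W": 6}


def clock_period_ns(stage_ns):
    """The pipelined clock must be long enough for the slowest stage."""
    return max(stage_ns.values())


def unpipelined_time_ns(n, stage_ns):
    """ASSUMPTION: each instruction takes the sum of all stage delays."""
    return n * sum(stage_ns.values())


def pipelined_time_ns(n, stage_ns):
    """Fill the pipeline once, then complete one instruction per clock."""
    return (len(stage_ns) + n - 1) * clock_period_ns(stage_ns)


if __name__ == "__main__":
    n = 100
    speedup = unpipelined_time_ns(n, STAGE_NS) / pipelined_time_ns(n, STAGE_NS)
    print(f"clock = {clock_period_ns(STAGE_NS)} ns, speedup = {speedup:.2f}")
```

Under these assumptions the speedup stays below 4 even for long instruction streams, although there are five stages, because the 6 ns and 8 ns stages sit idle for part of every 10 ns cycle.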

Single Cycle, Multiple Cycle, vs. Pipeline
[Figure: clock timing diagrams for a Load, Store, R-type sequence. Single Cycle Implementation: Load, then Store, with the end of the Store cycle marked "Waste". Multiple Cycle Implementation (Cycle 1 through Cycle 10): Load takes Ifetch, Reg, Exec, Mem, Wr; Store takes Ifetch, Reg, Exec, Mem; R-type begins with Ifetch. Pipeline Implementation: Load, Store, and R-type each pass through Ifetch, Reg, Exec, Mem, Wr, overlapped.]
Here are the timing diagrams showing the differences between the single cycle, multiple cycle, and pipeline implementations. For example, in the pipeline implementation, we can finish executing the Load, Store, and R-type instruction sequence in seven cycles. In the multiple cycle implementation, however, we cannot start executing the store until Cycle 6 because we must wait for the load instruction to complete. Similarly, we cannot start the execution of the R-type instruction until the store instruction has completed its execution in Cycle 9.
In the single cycle implementation, the cycle time is set to accommodate the longest instruction, the load. Consequently, the cycle time for the single cycle implementation can be five times longer than that of the multiple cycle implementation. Perhaps more importantly, since the cycle time has to be long enough for the load instruction, it is too long for the store instruction, so the last part of the cycle is wasted.

Why Pipeline?
Suppose we execute 100 instructions
Single Cycle Machine
  45 ns/cycle x 1 CPI x 100 inst = 4500 ns
Multicycle Machine
  10 ns/cycle x 4.04 CPI (for the given inst mix) x 100 inst = 4040 ns
  Instruction mix of 24% loads, 12% stores, 44% R-format, 18% branches, and 2% jumps
Ideal pipelined machine (with 5 stages)
  10 ns/cycle x (1 CPI x 100 inst + 4 cycle drain) = 1040 ns
Speedup = 4.33 vs. single-cycle, 3.88 vs. multi-cycle (for the given inst mix)
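The comparison above can be reproduced with a few lines of arithmetic. This sketch uses only the slide's figures (45 ns and 10 ns cycle times, CPI of 4.04, five stages, 100 instructions); the function names and default arguments are illustrative.

```python
def single_cycle_ns(n, cycle_ns=45):
    """One long cycle per instruction (CPI = 1)."""
    return cycle_ns * 1 * n


def multicycle_ns(n, cycle_ns=10, cpi=4.04):
    """Short cycles, but several of them per instruction on average."""
    return cycle_ns * cpi * n


def pipelined_ns(n, cycle_ns=10, stages=5):
    """Ideal pipeline: one instruction per cycle plus (stages - 1) drain cycles."""
    return cycle_ns * (1 * n + (stages - 1))


if __name__ == "__main__":
    n = 100
    single, multi, pipe = single_cycle_ns(n), multicycle_ns(n), pipelined_ns(n)
    print(f"single-cycle: {single} ns, multicycle: {multi:.0f} ns, pipelined: {pipe} ns")
    print(f"speedup: {single / pipe:.2f} vs. single-cycle, {multi / pipe:.2f} vs. multicycle")
```

Raising `n` shows the drain cycles mattering less and less, so the speedup over the single-cycle machine approaches 45 / 10 = 4.5.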

Why Pipeline? Because the resources are there!
[Figure: Inst 1 through Inst 5 in instruction order versus time (clock cycles), each using the Im, Reg, ALU, and Dm resources in successive cycles]

Pipelining Rules
[Figure: Inst 1 through Inst 5 overlapped across the IMem, Reg, ALU, and DMem resources]
Forward traveling signals at each stage are latched
Only perform logic on signals in the same stage
  Signal labeling is useful to prevent errors, e.g., IRR, IRA, IRM, IRW
Backward traveling signals at each stage represent hazards

MIPS Pipelined Datapath
State registers between pipeline stages to isolate them
IF: IFetch, ID: Dec, EX: Execute, MEM: MemAccess, WB: WriteBack
[Figure: pipelined datapath with PC, Instruction Memory, Register File, Sign Extend (16 to 32 bits), Shift left 2, two adders, ALU, and Data Memory, separated by the IFetch/Dec, Dec/Exec, Exec/Mem, and Mem/WB pipeline registers and driven by the System Clock; Inst 1 through Inst 5 occupy the five stages]
Note two exceptions to the left-to-right flow:
  WB, which writes the result back into the register file in the middle of the datapath
  Selection of the next value of the PC, where one input comes from the calculated branch address from the MEM stage
Only later instructions in the pipeline can be influenced by these two REVERSE data movements. The first one (WB to ID) leads to data hazards. The second one (MEM to IF) leads to control hazards.
All instructions must update some state in the processor (the register file, the memory, or the PC), so a separate pipeline register after the write-back stage would be redundant with the state that is updated and is not needed.
The PC can be thought of as a pipeline register: the one that feeds the IF stage of the pipeline. Unlike all of the other pipeline registers, the PC is part of the visible architecture state; its content must be saved when an exception occurs (the contents of the other pipeline registers are discarded).

Pipeline Hazards
Data hazards: an instruction uses the result of a previous instruction (RAW)
  ADD R1, R2, R3
  SUB R4, R1, R5
or
  SW R1, 4(R2)
  LW R3, 4(R2)
Control hazards: the address of the next instruction to be executed depends on a previous instruction
  BEQ R1,R2,CONT
  SUB R6,R7,R8
  …
  CONT: ADD R3,R4,R5
Structural hazards: two instructions need access to the same resource
  e.g., single memory shared for instruction fetch and load/store

Structural Hazard
[Figure: time (clock cycles) versus instruction order for lw, Inst 1, Inst 2, Inst 3, Inst 4, each using Mem, Reg, ALU, Mem, Reg; in the same cycle lw is reading data from memory while Inst 3 is reading its instruction from memory]
Fix with separate instruction and data memories (I$ and D$)
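The single-memory conflict can be located mechanically. The sketch below assumes an ideal five-stage pipeline with no stalls, in which instruction i (0-based) is fetched in cycle i + 1 and reaches its memory stage in cycle i + 4; `MEM_OPS` and `memory_conflicts` are hypothetical names used only for illustration.

```python
MEM_OPS = {"lw", "sw"}  # instructions that access data memory in the MEM stage


def memory_conflicts(program):
    """Return (cycle, mem_index, fetch_index) for each single-memory conflict.

    ASSUMPTION: ideal 5-stage pipeline without stalls; instruction i (0-based)
    is fetched in cycle i + 1 and is in its MEM stage in cycle i + 4.
    """
    conflicts = []
    for i, op in enumerate(program):
        fetch_index = i + 3  # the instruction being fetched while i is in MEM
        if op in MEM_OPS and fetch_index < len(program):
            conflicts.append((i + 4, i, fetch_index))
    return conflicts


if __name__ == "__main__":
    program = ["lw", "inst1", "inst2", "inst3", "inst4"]
    for cycle, mem_i, fetch_i in memory_conflicts(program):
        print(f"cycle {cycle}: {program[mem_i]} (data) vs. {program[fetch_i]} (fetch)")
```

With separate instruction and data memories the two accesses go to different resources, so the list of conflicts is irrelevant and no stall is needed.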

Data Hazards (RAW)
Instruction versus time (in cycles):
  ADD R1, R2, R3    F  D  EX M  W        write data to R1 here (W)
  SUB R4, R1, R5       F  D  EX M  W     get data from R1 here (D)

One Way to handle a Data Hazard
By waiting (introducing stalls), but this impacts CPI
[Figure: add $1,… followed by three stall cycles, then sub $4,$1,$5; each instruction uses IM, Reg, ALU, DM, Reg]

Must allow Wr/Rd in REG in same cycle
Split cycle into two halves
[Figure: time (clock cycles) versus instruction order for Inst 1 through Inst 5, each using Im, Reg, ALU, Dm, Reg]

Only two stall cycles
Write in 1st half, Read in 2nd half
[Figure: add $1,… followed by two stall cycles, then sub $4,$1,$5 and and $6,$1,$7; each instruction uses IM, Reg, ALU, DM, Reg]

Register File (write and then read)
Fix register file access hazard by doing reads in the second half of the cycle and writes in the first half
[Figure: time (clock cycles) versus instruction order for add $1,, Inst 1, Inst 2, or $8,$1,$9, each using IM, Reg, ALU, DM, Reg; the register write of add and the register read of or fall in the same cycle, relative to the clock edge that controls loading of pipeline state registers]
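The stall counts in the last few slides (three without the split cycle, two with it) can be derived from stage positions. The sketch assumes a five-stage pipeline without forwarding, where the producer writes the register file in its W stage and the consumer reads it in its D stage; `raw_stalls` and its parameters are illustrative names.

```python
D_STAGE, W_STAGE = 1, 4  # 0-based stage positions in F, D, EX, M, W


def raw_stalls(distance, split_cycle_regfile):
    """Stall cycles for a consumer issued `distance` instructions after its producer.

    ASSUMPTION: no forwarding. With a split cycle the register write (first
    half) and read (second half) may share a cycle; otherwise the read must
    wait until the cycle after the write.
    """
    write_cycle = W_STAGE                   # producer is issued in cycle 0
    natural_read_cycle = distance + D_STAGE  # consumer's D stage with no stalls
    earliest_read = write_cycle if split_cycle_regfile else write_cycle + 1
    return max(0, earliest_read - natural_read_cycle)


if __name__ == "__main__":
    for split in (False, True):
        print(f"split cycle = {split}: {raw_stalls(1, split)} stall cycle(s)")
```

A consumer three instructions behind the producer, such as the `or` in the register file slide, needs no stall once the cycle is split.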

Forwarding with Load-use Data Hazards
[Figure: lw $1,4($2) followed by sub $4,$1,$5, and $6,$1,$7, or $8,$1,$9, and xor $4,$1,$5, each using IM, Reg, ALU, DM, Reg; sub needs to stall]
Note that lw is just another example of register usage (beyond ALU ops)
Need to stall even with forwarding when the data hazard involves a load
Will still need one stall cycle even with forwarding
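The one remaining stall for a load can be seen by comparing when a value becomes available with when the next instruction needs it. This is a sketch under assumed timing: forwarding delivers operands to the start of the consumer's EX stage, ALU results are ready at the end of the producer's EX stage, and load data at the end of its MEM stage. The function name is illustrative.

```python
EX_STAGE, MEM_STAGE = 2, 3  # 0-based stage positions in F, D, EX, M, W


def forwarding_stalls(producer_is_load, distance):
    """Stall cycles for a consumer `distance` instructions after the producer.

    ASSUMPTION: full forwarding into EX; a value produced at the end of cycle c
    can be used by an EX stage that runs in cycle c + 1 or later.
    """
    ready_after = MEM_STAGE if producer_is_load else EX_STAGE
    consumer_ex = distance + EX_STAGE  # consumer's EX cycle with no stalls
    return max(0, ready_after + 1 - consumer_ex)


if __name__ == "__main__":
    print("lw then sub:", forwarding_stalls(True, 1), "stall cycle(s)")
    print("add then sub:", forwarding_stalls(False, 1), "stall cycle(s)")
```

Placing one independent instruction between the load and its first use (distance 2) removes the stall under the same assumptions.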

Injecting Bubbles
[Figure: pipelined datapath (IF, ID, EX, MEM, WB) shown in two consecutive cycles. First cycle: and, sub, lw, Inst -1, Inst -2. Next cycle: and, sub, bubble, lw, Inst -1.]

3 Types of Data Hazards
RAW (read after write)
  only hazard for ‘fixed’ pipelines
  later instruction must read after earlier instruction writes
    add $1,$2,$3    F  D  EX M  W
    sub $4,$1,$5       F  D  EX M  W
WAW (write after write)
  variable-length pipeline
  later instruction must write after earlier instruction writes
    div $1,$4,$3    F  D  E1 E2 E3 E4 E5 W
    add $1,$2,$5       F  D  EX M  W
WAR (write after read)
  instruction with late read (e.g., waiting for an execution unit)
  later instruction must write after earlier instruction reads
    mlt $4,$1,$3    F  D  s1 s2 s3 s4 s5 E1 E2 E3 W
    add $1,$2,$5       F  D  EX M  W
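The three categories come down to which registers two instructions share. The sketch below only identifies the register dependences; whether a dependence becomes an actual hazard depends on the pipeline timing discussed above. `Instr` and `hazards` are hypothetical names, and each instruction is reduced to one destination and a tuple of sources.

```python
from typing import NamedTuple, Tuple


class Instr(NamedTuple):
    dest: str               # register written
    srcs: Tuple[str, ...]   # registers read


def hazards(earlier, later):
    """Return the set of dependence types from `earlier` to `later`."""
    found = set()
    if earlier.dest in later.srcs:
        found.add("RAW")  # later reads what earlier writes
    if earlier.dest == later.dest:
        found.add("WAW")  # both write the same register
    if later.dest in earlier.srcs:
        found.add("WAR")  # later overwrites what earlier still has to read
    return found


if __name__ == "__main__":
    add = Instr("$1", ("$2", "$3"))
    sub = Instr("$4", ("$1", "$5"))
    print(hazards(add, sub))
```

Applied to the slide's three instruction pairs, the function reports RAW, WAW, and WAR respectively.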

Control Hazard
[Figure: instruction versus time (in cycles); JR R25 passes through F, D, EX, M, W, and the following instruction (XX: ADD ...) needs the jump destination at its fetch, before JR has made the destination available]
Simple solution: flush instruction fetch until branch resolved
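The cost of flushing can be estimated with a simple CPI model. This is a rough sketch: the 20% control-flow fraction combines the 18% branches and 2% jumps of the deck's instruction mix, while the 3-cycle penalty is a hypothetical value corresponding to resolving the target in the MEM stage; it also assumes every control-flow instruction pays the full penalty. The function name is illustrative.

```python
def cpi_with_flush(control_fraction, penalty_cycles, base_cpi=1.0):
    """Average CPI when every control-flow instruction flushes `penalty_cycles` fetches.

    ASSUMPTION: no prediction and no delay slots; the penalty is paid every time.
    """
    return base_cpi + control_fraction * penalty_cycles


if __name__ == "__main__":
    control_fraction = 0.18 + 0.02  # branches + jumps from the instruction mix
    print(f"CPI = {cpi_with_flush(control_fraction, penalty_cycles=3):.2f}")
```

Resolving the target earlier in the pipeline lowers `penalty_cycles` and brings the CPI back toward 1.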
