Pipelining - 1.

Slides:

Advertisements

Similar presentations

Prof. John Nestor ECE Department Lafayette College Easton, Pennsylvania ECE Computer Organization Pipelined Processor.

Advertisements

Review: MIPS Pipeline Data and Control Paths

ENEE350 Ankur Srivastava University of Maryland, College Park Based on Slides from Mary Jane Irwin ( )

Prof. John Nestor ECE Department Lafayette College Easton, Pennsylvania ECE Computer Organization Lecture 18 - Pipelined.

1  2004 Morgan Kaufmann Publishers Chapter Six. 2  2004 Morgan Kaufmann Publishers Pipelining The laundry analogy.

©UCB CS 162 Computer Architecture Lecture 3: Pipelining Contd. Instructor: L.N. Bhuyan

Chapter Six Enhancing Performance with Pipelining

1 Stalling  The easiest solution is to stall the pipeline  We could delay the AND instruction by introducing a one-cycle delay into the pipeline, sometimes.

 The actual result $1 - $3 is computed in clock cycle 3, before it’s needed in cycles 4 and 5  We forward that value to later instructions, to prevent.

1 Pipelining Reconsider the data path we just did Each instruction takes from 3 to 5 clock cycles However, there are parts of hardware that are idle many.

Pipeline Data Hazards: Detection and Circumvention Adapted from Computer Organization and Design, Patterson & Hennessy, © 2005, and from slides kindly.

Chapter 4 CSF 2009 The processor: Pipelining. Performance Issues Longest delay determines clock period – Critical path: load instruction – Instruction.

11/13/2015 8:57 AM 1 of 86 Pipelining Chapter 6. 11/13/2015 8:57 AM 2 of 86 Overview of Pipelining Pipelining is an implementation technique in which.

CMPE 421 Parallel Computer Architecture Part 2: Hardware Solution: Forwarding.

1 (Based on text: David A. Patterson & John L. Hennessy, Computer Organization and Design: The Hardware/Software Interface, 3 rd Ed., Morgan Kaufmann,

CSE431 L07 Overcoming Data Hazards.1Irwin, PSU, 2005 CSE 431 Computer Architecture Fall 2005 Lecture 07: Overcoming Data Hazards Mary Jane Irwin (

Computing Systems Pipelining: enhancing performance.

1/24/ :00 PM 1 of 86 Pipelining Chapter 6. 1/24/ :00 PM 2 of 86 Overview of Pipelining Pipelining is an implementation technique in which.

CSIE30300 Computer Architecture Unit 05: Overcoming Data Hazards Hsin-Chou Chi [Adapted from material by and

Pipelining: Implementation CPSC 252 Computer Organization Ellen Walker, Hiram College.

Spr 2016, Mar 9... ELEC / Lecture 7 1 ELEC / Computer Architecture and Design Spring 2016 Pipeline Control and Performance.

EECS 322 April 10, 2000 EECS 322 Computer Architecture Pipeline Control, Data Hazards and Branch Hazards.

CSE 340 Computer Architecture Spring 2016 Overcoming Data Hazards.

Computer Organization

Stalling delays the entire pipeline

Note how everything goes left to right, except …

CDA 3101 Spring 2016 Introduction to Computer Organization

Pipelining Chapter 6.

Morgan Kaufmann Publishers

Performance of Single-cycle Design

Single Clock Datapath With Control

Appendix C Pipeline implementation

Chapter 4 The Processor Part 4

ECS 154B Computer Architecture II Spring 2009

ECS 154B Computer Architecture II Spring 2009

\course\cpeg323-08F\Topic6b-323

Forwarding Now, we’ll introduce some problems that data hazards can cause for our pipelined processor, and show how to handle them with forwarding.

Chapter 4 The Processor Part 3

Review: MIPS Pipeline Data and Control Paths

Morgan Kaufmann Publishers The Processor

Morgan Kaufmann Publishers The Processor

Morgan Kaufmann Publishers The Processor

Chapter 4 The Processor Part 2

Pipelining review.

Single-cycle datapath, slightly rearranged

Pipelining Chapter 6.

Morgan Kaufmann Publishers Enhancing Performance with Pipelining

A pipeline diagram Clock cycle lw $t0, 4($sp) IF ID

Lecture 9. MIPS Processor Design – Pipelined Processor Design #2

Pipelining in more detail

Data Hazards Data Hazard

Lecture 5. MIPS Processor Design

Pipeline control unit (highly abstracted)

The Processor Lecture 3.6: Control Hazards

Control unit extension for data hazards

The Processor Lecture 3.5: Data Hazards

Instruction Execution Cycle

Pipeline control unit (highly abstracted)

CSC3050 – Computer Architecture

Pipeline Control unit (highly abstracted)

Pipelining (II).

Control unit extension for data hazards

Morgan Kaufmann Publishers The Processor

Introduction to Computer Organization and Architecture

Control unit extension for data hazards

Stalls and flushes Last time, we discussed data hazards that can occur in pipelined CPUs if some instructions depend upon others that are still executing.

The Processor: Datapath & Control.

©2003 Craig Zilles (derived from slides by Howard Huang)

Pipelined datapath and control

ELEC / Computer Architecture and Design Spring 2015 Pipeline Control and Performance (Chapter 6) Vishwani D. Agrawal James J. Danaher.

Presentation transcript:

Pipelining - 1

Pipelining: Definition An implementation technique instructions are overlapped in execution. A key technique for improving performance in many, if not all, contemporary processors. In multiple cycle implementation, steps of instructions are executed in different stages. functional units for different purposes In pipelined datapath Some units used more efficiently Some units must be replicated

Laundry Analogy Four different stages Identical time (30 min)

Pipelined Laundry

Multicycle Datapath

Pipelined Datapath MIPS instructions classically take five steps: IF - Fetch instruction from memory ID - Read registers while decoding the instruction EX - Execute the operation or calculate an address MEM - Access an operand in memory WB - Write result into a register

Times of Instruction Steps Instruction class Inst. Fetch Register Read ALU op Data access Register write Total time lw 200 ps 100 ps 800 ps sw 700 ps R-format 600 ps Branch 500 ps

Pipelined vs. Nonpipelined Program execution order 200 400 600 800 1000 1200 1400 1600 1800 Time IF ID EX MEM WB lw $t1, 100($s1) lw $t2, 200($s1) lw $t2, 300($s1) lw $t1, 100($s1) lw $t2, 200($s1) lw $t2, 300($s1) Program execution order Time 200 400 600 800 1000 1200 1400 1600 1800 IF EX MEM WB ID EX MEM WB ID IF EX MEM WB ID IF

Aspects of Pipelining Latency is the same (usually a little worse) Instruction throughput increases. What is the ideal speedup due to pipelining? In practice, it is hard to achieve the theoretical speedup because of 1) imperfectly balanced instruction steps and 2) hazards. Features of MIPS that make the pipelining easy: Instructions are of the same length. Few instruction formats Load/store architecture Operands are aligned in memory

Pipelining Problems Structural hazards: Control hazards: Data hazards: Hardware cannot support the combination of instructions in the same clock cycle. one memory for instruction and data. Control hazards: branch instructions change the flow of instructions Data hazards: an instruction depends on the result of a previous instruction The result may not be available when needed by the current instruction

Pipeline Stalls on Control Hazards Program execution order 200 400 600 800 1000 1200 1400 1600 1800 IF ID EX MEM WB add $t1, $s1, $s2 IF ID EX MEM WB beq $t2, $t3, 40 200 ps EX MEM WB ID IF lw $t4, 300($s1) 400 ps Assume that we can resolve branch in the second step Next instruction cannot start until the end of second cycle. The pipeline stops issuing new instruction, or it stalls.

Predict the Outcome Program execution order 2 4 6 8 10 12 14 16 18 IF ID EX MEM WB add $t1, $s1, $s2 IF ID EX MEM WB beq $t2, $t3, 40 2 ns EX MEM WB ID IF lw $t4, 300($s1) 2 ns One simple approach is to predict that branches will always fail. When otherwise occurs (branch is indeed taken), then started instruction is discarded (flushed out) and the instruction at the branch address will start executing.

Delayed Branches A solution actually used by the MIPS architecture. Delayed branches always execute next sequential instruction MIPS software places an instruction immediately after the delayed branch instruction that is not affected by the branch Compilers fill about 50% of the branch delay slots with useful instructions Program execution order 2 4 6 8 10 12 14 16 18 IF ID EX MEM WB beq $t2, $t3, 40 WB EX MEM ID IF add $t1, $s1, $s2 2 ns EX MEM WB ID IF lw $t4, 300($s1)

Data Hazards An instruction depends on the results of a previous instruction still in the pipeline. Example: add $s0, $t0,$t1 sub $t2, $s0, $t3 This is also known as data dependency. Normally, these types of data hazards could severely stall the pipeline We have to add three bubbles in the pipeline.

Forwarding First solution: compiler optimizations rearrange the instruction sequence. Second solution: forwarding (or bypassing). ALU creates the result much sooner than the result is actually written back to a register. As soon as the ALU creates it, the result immediately can be forwarded to the next instruction. Direct use of ALU results.

Direct use of ALU result Forwarding Direct use of ALU result

Forwarding: Examples Forwarding when an R-format instruction following a load instruction.

Reordering Code to Avoid Pipeline Stalls Code that causes a pipeline stall due to data hazard. lw $t0, 0($t1) lw $t2, 4($t1) addi $s0, $t2, 0x5 addi $s1, $t0, 0xA lw $t0, 0($t1) lw $t2, 4($t1) addi $s1, $t0, 0xA addi $s0, $t2, 0x5 Reordered code. No data hazard.

Recall: Single Cycle Datapath

Instruction and Data Flow Instruction and data move generally from left to right with two exceptions: Write-back stage, which places the result back into the register file in the middle of the data path => data hazard The selection of the next value of the PC, choosing between the incremented PC and the branch address from the MEM stage => control hazard

Design of Pipelined Datapath Break the datapath into smaller segments These segments can be shared by instructions Use registers between two consecutive segments of the datapath to hold the intermediate results. Pipeline registers

Pipelined Datapath

Example: IF of lw PC M U X IF/ID ID/EX EX/MEM MEM/WB Add 4 Add M U X Shift Left 2 PC M U X Instruction Memory Registers ALU Data Memory rt M U X 32 16 Sign Ext

Example: ID of lw PC M U X IF/ID ID/EX EX/MEM MEM/WB Add 4 Add M U X Shift Left 2 PC M U X Instruction Memory Registers ALU Data Memory M U X 32 16 Sign Ext rt rt rt

Example: EX of lw PC M U X IF/ID ID/EX EX/MEM MEM/WB Add 4 Add M U X Shift Left 2 PC M U X Instruction Memory Registers ALU Data Memory M U X 32 16 Sign Ext rt rt rt

Example: MEM of lw PC M U X IF/ID ID/EX EX/MEM MEM/WB Add 4 Add M U X Shift Left 2 PC M U X Instruction Memory Registers ALU Data Memory M U X 32 16 Sign Ext rt rt rt

Example: WB of lw PC M U X IF/ID ID/EX EX/MEM MEM/WB Add 4 Add M U X Shift Left 2 PC M U X Instruction Memory Registers ALU Data Memory M U X 32 16 Sign Ext rt rt rt

Example 2: sw, EX stage PC M U X IF/ID ID/EX EX/MEM MEM/WB Add 4 Add M Shift Left 2 PC M U X Instruction Memory Registers ALU Data Memory M U X 32 16 Sign Ext

Example 2: MEM of sw PC M U X IF/ID ID/EX EX/MEM MEM/WB Add 4 Add M U Shift Left 2 PC M U X Instruction Memory Registers ALU Data Memory M U X 32 16 Sign Ext

Oops! Nothing happening Example 2: WB of sw M U X Oops! Nothing happening IF/ID ID/EX EX/MEM MEM/WB Add 4 Add Shift Left 2 PC M U X Instruction Memory Registers ALU Data Memory M U X 32 16 Sign Ext

Summary So Far Not all instructions require the complete datapath. No information transfer from one pipeline stage to the next is possible except through pipeline registers Each component can be used only in a single pipeline stage Everything that happened in the previous stage will be overwritten.

Pipelined Control PC is written on each clock cycle (no write signal) No write signals for pipeline registers. IF stage: no control signal since instruction is read and PC is updated each cycle ID stage: No control signals EX stage: RegDst, ALUOp, ALUSrc MEM stage: branch, MemRead, MemWrite WB stage: MemtoReg, RegWrite

Pipelined Datapath with Control Signals M U X PCSrc ID/EX EX/MEM MEM/WB IF/ID Add Add Shift Left 2 4 RegWrite branch ALU MemtoReg MemWrite ALUSrc PC Registers zero M U X Instruction Memory M U X Data Memory 16 Sign Ext 32 ALU cntrl [15 0] 6 [20 16] rt M U X [15 11] rd MemRead ALUOp RegDst

Control Signals for Instructions Nine control signals that start in the EX stage Main control unit generates the control signals during the ID stage. Instruction EX Stage MEM Stage WB Stage RegDest ALU Op1 ALU Op0 ALUSrc Branch Mem Read Write RegWrite Memto-Reg R-format 1 lw sw x beq

Control Lines for the Last Three Stages instruction WB WB MEM MEM WB EX IF/ID ID/EX EX/MEM MEM/WB

Pipelined Datapath with Control Signals PCSrc M U X ID/EX EX/MEM Control WB M WB MEM/WB IF/ID EX M WB Add Add Shift Left 2 4 RegWrite branch MemtoReg ALU ALUSrc MemWrite PC zero Registers Instruction Memory M U X M U X Data Memory Sign Ext ALU cntrl [15 0] 16 32 [20 16] M U X MemRead [15 11] ALUOp RegDst

Example Show how these five instructions will go through the pipeline: lw $10, 20($1) sub $11, $2, $3 and $12, $4, $5 or $13, $6, $7 add $14, $8, $9

Clock 1 PCSrc PC IF: lw $10, 20($1) ID: before<1> EX: before<2> MEM: before<3> WB: … PCSrc M U X ID/EX Control 00 00 EX/MEM WB 000 000 00 M WB MEM/WB 0000 EX 00 IF/ID M WB Add Add Shift Left 2 4 RegWrite branch MemtoReg ALU ALUSrc MemWrite PC Registers zero Instruction Memory M U X M U X Data Memory Sign Ext ALU cntrl [15 0] 16 32 [20 16] M U X [15 11] MemRead ALUOp RegDst

Clock 2 PC IF: sub $11, $2, $3 ID: lw $10, 20($1) EX: before<1> MEM: before<2> WB: … PCSrc M U X ID/EX Control 11 00 EX/MEM WB 010 000 00 M WB MEM/WB IF/ID 0001 EX 00 M WB Add Add Shift Left 2 4 RegWrite branch MemtoReg ALU ALUSrc MemWrite PC Registers zero Instruction Memory M U X M U X Data Memory Sign Ext ALU cntrl [15 0] 16 32 [20 16] M U X [15 11] MemRead ALUOp RegDst

Clock 3 zero PC IF:and $12, $4, $5 ID: sub $11, $2, $3 EX: lw $10, 20($1) MEM: before<1> WB: … PCSrc M U X ID/EX Control 10 11 EX/MEM WB 000 010 00 M WB MEM/WB IF/ID 1100 EX 00 M WB Add 1 Add Shift Left 2 4 RegWrite branch MemtoReg ALUSrc MemWrite PC Registers ALU zero Instruction Memory M U X M U X Data Memory Sign Ext ALU cntrl [15 0] 16 32 [20 16] M U X [15 11] MemRead ALUOp RegDst

Clock 4 1 PC IF:or $13, $6,$7 ID:and $12, $4, $5 EX: sub $11, $2, $3 MEM: lw $10, 20($1) WB: … PCSrc M U X ID/EX Control 10 10 EX/MEM WB 000 000 11 M WB 1 MEM/WB 1 1100 EX 10 IF/ID M WB Add Add Shift Left 2 4 RegWrite branch MemtoReg ALUSrc MemWrite PC Registers ALU zero Instruction Memory M U X M U X Data Memory Sign Ext ALU cntrl [15 0] 16 32 [20 16] M U X [15 11] MemRead ALUOp RegDst

Clock 5 PC IF:add $14, $8, $9 ID:or $13, $6,$7 EX:and $12, $4, $5 MEM:sub … WB: lw PCSrc M U X ID/EX Control 10 10 EX/MEM WB 000 000 10 M WB 1 MEM/WB 1100 EX 1 IF/ID 10 M WB 1 Add Add Shift Left 2 4 RegWrite branch MemtoReg ALUSrc MemWrite PC Registers ALU zero Instruction Memory M U X M U X Data Memory Sign Ext ALU cntrl [15 0] 16 32 [20 16] M U X [15 11] MemRead ALUOp RegDst

Data Hazards & Forwarding Assumption: a register can be read and written in the same clock cycle.

Data Hazards & Forwarding Software solution for data hazards: sub $2, $1, $3 nop and $12, $2, $5 or $13, $6, $2 add $14, $2, $2 sw $15, 100($2) But, these dependencies happen too often to rely on compilers

Pipelined Datapath with Control Signals M U X PCSrc ID/EX EX/MEM MEM/WB IF/ID Add Add Shift Left 2 4 RegWrite branch ALU MemtoReg MemWrite ALUSrc PC Registers zero M U X Instruction Memory M U X Data Memory 16 Sign Ext [15 0] 32 ALU cntrl 6 [20 16] M U X [15 11] MemRead ALUOp RegDst

A Notation for Detecting Data Hazards Two types of data hazards. EX/MEM.rd = ID/EX.rs EX/MEM.rd = ID/EX.rt MEM/WB.rd = ID/EX.rs MEM/WB.rd = ID/EX.rt Easy to detect data hazards using this notation sub $2, $1, $3 and $12, $2, $5 or $13, $6, $2 sub-and: EX/MEM.rd = ID/EX.rs = $2 sub-or: MEM/WB.rd = ID/EX.rt = $2

Forwarding

Pipeline Without Forwarding Registers ALU M U X ID/EX EX/MEM Data Memory EX/MEM.rd rd rt

Pipeline with Forwarding Unit Registers ALU M U X ID/EX EX/MEM Data Memory EX/MEM.rd rd rt EX/MEM.RegWrite MEM/WB.RegWrite ForwardA ForwardB rs MEM/WB.rd Forwarding Unit

Control Signals to Resolve Data Dependencies EX Hazard: if (EX/MEM.RegWrite and (EX/MEM.rd  0) and (EX/MEM.rd = ID/EX.rs)) ForwardA = 10 if (EX/MEM.RegWrite and (EX/MEM.rd  0) and (EX/MEM.rd = ID/EX.rt)) ForwardB = 10 MEM Hazard: if (MEM/WB.RegWrite and (MEM/WB.rd  0) and (MEM/WB.rd = ID/EX.rs)) ForwardA = 01 if (MEM/WB.RegWrite and (MEM/WB.rd  0) and (MEM/WB.rd = ID/EX.rt)) ForwardB = 01

Control Values for Forwarding MUXes MUX control Source Explanation ForwardA=00 ID/EX The first ALU operand comes from the register file ForwardA=10 EX/MEM The first ALU operand is forwarded from the prior ALU result ForwardA=01 MEM/WB The first ALU operand is forwarded from the data memory or an earlier ALU result ForwardB=00 The second ALU operand comes from the register file ForwardB=10 The second ALU operand is forwarded from the prior ALU result ForwardB=01 The second ALU operand is forwarded from the data memory or an earlier ALU result

Can’t Always Forward - Stalls Load word can still cause a hazard:

Pipelined Datapath with Control PCSrc M U X ID/EX EX/MEM Control WB M WB MEM/WB IF/ID EX M WB Add Add Shift Left 2 4 RegWrite branch MemtoReg rs ALU rt ALUSrc MemWrite PC zero Registers Instruction Memory M U X M U X Data Memory Sign Ext ALU cntrl [15 0] 16 32 [20 16] rt M U X MemRead [15 11] ALUOp RegDst

Hazard Detection Unit If (ID/EX.MemRead and ((ID/EX.rt = IF/ID.rs) or (ID/EX.rt = IF/ID.rt))) stall the pipeline

Stalling the Pipeline

How to Stall the Pipeline? The instructions in the IF and ID stages must be stalled; (i.e. not allowed to make progress) IF Stage PC and IF/ID registers have write enable inputs Prevent the PC register and the IF/ID pipeline register from changing (i.e. set their write enable inputs to 0). The instruction in the IF stage continue to read instructions using the same PC.

How to Stall the Pipeline ID Stage Since the data in the IF/ID register doesn’t change, the same instruction is decoded The registers in the ID stage will continue to be read using the same instruction fields in the IF/ID pipeline register. Hazard is detected in the ID stage, thus the we insert a bubble in the EX stage. Inserting a bubble means deasserting all the control signals in the ID stage in order to prevent the bubble writing into anything. Recall that control signals are generated in this stage

Pipelined Control

Pipelined Datapath with Control M U X PCSrc ID/EX EX/MEM MEM/WB IF/ID Add Add Shift Left 2 4 RegWrite branch ALU MemtoReg MemWrite ALUSrc PC Registers zero M U X Instruction Memory M U X Data Memory [15 0] 16 Sign Ext 32 ALU cntrl 6 [20 16] M U X [15 11] MemRead ALUOp

Branch (or Control) Hazards When the branch is decided to be taken, successive instructions following branch are already in the pipeline.

Branch (or Control) Hazards Note that the branch instruction decides whether to branch in the MEM stage different than multicycle desing Control hazards are relatively simpler to understand than data hazards They occur less frequently. There is nothing as effective against control hazards as forwarding is for data hazards. Simpler schemes to overcome misprediction or ease of penalties.

Assume Branch Not Taken “Stall until the branch is complete” is too slow. Alternative is to assume that the branch will not be taken If the branch is taken, the instructions that are being fetched and decoded must be discarded. If half the time the branches are taken, this optimization halves the cost of control hazards. To discard instructions, we merely change the original control values to 0s. Discarding instructions means we must be able to flush instructions in the IF, ID, and EX stages.

Reducing Branch Penalty Moving branch execution earlier in the pipeline If the next PC for a branch is selected at the end of EX stage, then the branch penalty becomes only two clock cycles. Many MIPS implementation move the branch execution to the ID stage. Address calculation can easily be moved to ID stage since the PC and the immediate field are in the IF/ID register. Comparing can be done faster using XOR and OR gates than using subtraction

Flushing @ Branch Misprediction Shift Left 2 + IF.FLUSH Hazard Detection Unit M U X ID/EX EX/MEM Control WB M U X M WB MEM/WB IF/ID EX M WB + RegWrite M U X 4 ALU = PC Registers Data Memory M U X Instruction Memory M U X Sign Ext M U X Forwarding Unit

Dynamic Branch Prediction Branch prediction buffer or branch history table contains a bit for each branch about its past behavior. The buffer is indexed by the lower portion of the address of the instruction. branch target buffer 1-bit prediction scheme has performance shortcomings. Consider a loop branch that branches nine times in a row, then it is not taken once. The accuracy of this prediction scheme is?

2-bit Prediction Scheme In 2-bit prediction scheme, a prediction must be wrong twice before it is changed.

Delayed-Branch Scheduling (a) From before (b) From target sub $t4, $t5, $t6 add $t1, $t2, $t3 beq $t1, $zero Delay slot (c) From fall through add $t1, $t2, $t3 beq $t1, $zero or $t7, $t8, $t9 sub $t4, $t5, $t6 Delay slot add $t1, $t2, $t3 beq $t2, $zero Delay slot beq $t2, $zero becomes add $t1, $t2, $t3 beq $t1, $zero becomes becomes add $t1, $t2, $t3 beq $t2, $zero sub $t4, $t5, $t6 or $t7, $t8, $t9 add $t1, $t2, $t3 sub $t4, $t5, $t6

Visiting the Old Example 1/2 Delays of functional units Memory access: 200 ps, ALU op: 100 ps, Register access: 50 ps. Single cycle datapath 200 + 50 + 100 + 200 + 50 = 600 ps SPECInt2000 Loads: 25%; Stores: 10%; Branch: 11%; Jumps: 2%; ALU Operations: 52% Multi cycle data path CPI = 525% + 410% + 311% + 32% + 452% = 4.12. execution time per instruction: ?  ? = ? ps

Visiting the Old Example 2/2 Assumptions: 50% of loads are immediately followed by an instruction that uses the result of load Branch penalty on misprediction: one clock cycle. 25% of branches are mispredicted. Jumps: 2 clock cycles Pipelined datapath CPI = 1.525% + 110% + 1.2511% + 22% + 152% = 1.1725 Average time per instruction: 1.1725  200 ps = 234.5 ps

Exception in the Pipelined Datapath 0x40 sub $11, $2, $4 0x44 and $12, $2, $5 0x48 or $13, $2, $6 0x4C add $1, $2, $1 0x50 slt $15, $6, $7 0x54 lw $16, 50($7) Suppose an overflow exception occurs in the add instruction. Exception handling routine 0x40000040 sw $25, 1000($0) 0x40000044 sw $26, 1004($0) ...

Example: clock 6 X = causing an exception lw $16, 50($7) slt $15, $6, $7 add $1, $2, $1 or $13, … and $12 … EX.Flush Hazard Detection Unit X M U X ID/EX M U X Control OR WB EX/MEM ID.Flush M M U X WB MEM/WB IF/ID EX M U X M WB Cause + + Except PC Shift Left 2 RegWrite M U X 4 ALU PC = Registers Data Memory M U X Instruction Memory M U X Sign Ext M U X Forwarding Unit IF.FLUSH

Example: clock 7 = + + sw $25, 1000($0) bubble bubble bubble or $13, … EX.Flush Hazard Detection Unit M U X ID/EX M U X Control OR WB EX/MEM ID.Flush M M U X WB MEM/WB IF/ID EX M U X M WB Cause + + Except PC Shift Left 2 RegWrite M U X 4 ALU PC = Registers Data Memory M U X Instruction Memory M U X Sign Ext M U X Forwarding Unit IF.FLUSH