55:035 Computer Architecture and Organization Lecture 10.

Slides:

Advertisements

Similar presentations

Pipeline Example: cycle 1 lw R10,9(R1) sub R11,R2, R3 and R12,R4, R5 or R13,R6, R7.

Advertisements

Instruction-Level Parallelism (ILP)

Review: MIPS Pipeline Data and Control Paths

ENEE350 Ankur Srivastava University of Maryland, College Park Based on Slides from Mary Jane Irwin ( )

ENEE350 Ankur Srivastava University of Maryland, College Park Based on Slides from Mary Jane Irwin ( )

Prof. John Nestor ECE Department Lafayette College Easton, Pennsylvania ECE Computer Organization Lecture 18 - Pipelined.

©UCB CS 162 Computer Architecture Lecture 3: Pipelining Contd. Instructor: L.N. Bhuyan

Chapter Six Enhancing Performance with Pipelining

1 Stalling  The easiest solution is to stall the pipeline  We could delay the AND instruction by introducing a one-cycle delay into the pipeline, sometimes.

 The actual result $1 - $3 is computed in clock cycle 3, before it’s needed in cycles 4 and 5  We forward that value to later instructions, to prevent.

ENGS 116 Lecture 51 Pipelining and Hazards Vincent H. Berk September 30, 2005 Reading for today: Chapter A.1 – A.3, article: Patterson&Ditzel Reading for.

Chapter 4 Sections 4.1 – 4.4 Appendix D.1 and D.2 Dr. Iyad F. Jafar Basic MIPS Architecture: Single-Cycle Datapath and Control.

1 Stalls and flushes  So far, we have discussed data hazards that can occur in pipelined CPUs if some instructions depend upon others that are still executing.

Supplementary notes for pipelining LW ____,____ SUB ____,____,____ BEQ ____,____,____ ; assume that, condition for branch is not satisfied OR ____,____,____.

COSC 3430 L08 Basic MIPS Architecture.1 COSC 3430 Computer Architecture Lecture 08 Processors Single cycle Datapath PH 3: Sections

1 Pipelining Reconsider the data path we just did Each instruction takes from 3 to 5 clock cycles However, there are parts of hardware that are idle many.

Pipeline Data Hazards: Detection and Circumvention Adapted from Computer Organization and Design, Patterson & Hennessy, © 2005, and from slides kindly.

Automobile Manufacturing 1. Build frame. 60 min. 2. Add engine. 50 min. 3. Build body. 80 min. 4. Paint. 40 min. 5. Finish.45 min. 275 min. Latency: Time.

Pipelined Datapath and Control

1 A pipeline diagram  A pipeline diagram shows the execution of a series of instructions. —The instruction sequence is shown vertically, from top to bottom.

CPE432 Chapter 4B.1Dr. W. Abu-Sufah, UJ Chapter 4B: The Processor, Part B-2 Read Section 4.7 Adapted from Slides by Prof. Mary Jane Irwin, Penn State University.

Chapter 4 CSF 2009 The processor: Pipelining. Performance Issues Longest delay determines clock period – Critical path: load instruction – Instruction.

Comp Sci pipelining 1 Ch. 13 Pipelining. Comp Sci pipelining 2 Pipelining.

11/13/2015 8:57 AM 1 of 86 Pipelining Chapter 6. 11/13/2015 8:57 AM 2 of 86 Overview of Pipelining Pipelining is an implementation technique in which.

CMPE 421 Parallel Computer Architecture

Electrical and Computer Engineering University of Cyprus LAB3: IMPROVING MIPS PERFORMANCE WITH PIPELINING.

CMPE 421 Parallel Computer Architecture Part 2: Hardware Solution: Forwarding.

CECS 440 Pipelining.1(c) 2014 – R. W. Allison [slides adapted from D. Patterson slides with additional credits to M.J. Irwin]

1 (Based on text: David A. Patterson & John L. Hennessy, Computer Organization and Design: The Hardware/Software Interface, 3 rd Ed., Morgan Kaufmann,

CSE431 L07 Overcoming Data Hazards.1Irwin, PSU, 2005 CSE 431 Computer Architecture Fall 2005 Lecture 07: Overcoming Data Hazards Mary Jane Irwin (

Performance of Single-cycle Design

1/24/ :00 PM 1 of 86 Pipelining Chapter 6. 1/24/ :00 PM 2 of 86 Overview of Pipelining Pipelining is an implementation technique in which.

CSIE30300 Computer Architecture Unit 05: Overcoming Data Hazards Hsin-Chou Chi [Adapted from material by and

Chapter 4 From: Dr. Iyad F. Jafar Basic MIPS Architecture: Single-Cycle Datapath and Control.

Introduction to Computer Organization Pipelining.

Datapath and Control AddressInstruction Memory Write Data Reg Addr Register File ALU Data Memory Address Write Data Read Data PC Read Data Read Data.

Fall 2015, Oct 5... ELEC / Lecture 6 1 ELEC / Computer Architecture and Design Fall 2015 Pipelining (Chapter 6) Vishwani.

Pipelining: Implementation CPSC 252 Computer Organization Ellen Walker, Hiram College.

Spr 2016, Mar 9... ELEC / Lecture 7 1 ELEC / Computer Architecture and Design Spring 2016 Pipeline Control and Performance.

CSE 340 Computer Architecture Spring 2016 Overcoming Data Hazards.

Computer Architecture Lecture 6.  Our implementation of the MIPS is simplified memory-reference instructions: lw, sw arithmetic-logical instructions:

Computer Organization

Stalling delays the entire pipeline

Note how everything goes left to right, except …

CDA 3101 Spring 2016 Introduction to Computer Organization

Performance of Single-cycle Design

Single Clock Datapath With Control

ECS 154B Computer Architecture II Spring 2009

ECS 154B Computer Architecture II Spring 2009

ECE232: Hardware Organization and Design

Forwarding Now, we’ll introduce some problems that data hazards can cause for our pipelined processor, and show how to handle them with forwarding.

Chapter 4 The Processor Part 3

Review: MIPS Pipeline Data and Control Paths

Chapter 6 Enhancing Performance with Pipelining

Chapter 4 The Processor Part 2

Single-cycle datapath, slightly rearranged

ELEC / Computer Architecture and Design Spring Pipelining (Chapter 6)

A pipeline diagram Clock cycle lw $t0, 4($sp) IF ID

Pipelining in more detail

Data Hazards Data Hazard

The Processor Lecture 3.6: Control Hazards

The Processor Lecture 3.5: Data Hazards

Pipelining Appendix A and Chapter 3.

Introduction to Computer Organization and Architecture

Pipelining - 1.

A relevant question Assuming you’ve got: One washer (takes 30 minutes)

Stalls and flushes Last time, we discussed data hazards that can occur in pipelined CPUs if some instructions depend upon others that are still executing.

©2003 Craig Zilles (derived from slides by Howard Huang)

Pipelined datapath and control

ELEC / Computer Architecture and Design Spring 2015 Pipeline Control and Performance (Chapter 6) Vishwani D. Agrawal James J. Danaher.

Presentation transcript:

55:035 Computer Architecture and Organization Lecture 10

Outline  Pipelining Basics Throughput Execution Time Pipeline Registers Program Execution Hazards  Structural  Data  Control 255:035 Computer Architecture and Organization

ILP: Instruction Level Parallelism  Single-cycle and multi-cycle datapaths execute one instruction at a time.  How can we get better performance?  Answer: Execute multiple instruction at a time:  Pipelining – Enhance a multi-cycle datapath to fetch one instruction every cycle.  Parallelism – Fetch multiple instructions every cycle. 355:035 Computer Architecture and Organization

Automobile Team Assembly 1 car assembled every four hours 6 cars per day 180 cars per month 2,040 cars per year 1 hour 455:035 Computer Architecture and Organization

Automobile Assembly Line First car assembled in 4 hours (pipeline latency) thereafter 1 car per hour 21 cars on first day, thereafter 24 cars per day 717 cars per month 8,637 cars per year Task 1 1 hour Task 2 1 hour Task 3 1 hour Task 4 1 hour MecahnicalElectricalPaintingTesting 555:035 Computer Architecture and Organization

Throughput: Team Assembly Mechanical Electrical Painting Testing Time of assembling one car=n hours where n is the number of nearly equal subtasks, each requiring 1 unit of time Throughput=1/ncars per unit time Red car completed Red car started TimeBlue car started Blue car completed 655:035 Computer Architecture and Organization

Throughput: Assembly Line Mechanical Electrical Painting Testing Car 1 Car 2 Car 3 Car 4. Car 1 complete Car 2 complete Time to complete first car= ntime units (latency) Cars completed in time T= T – n + 1 Throughput= 1- (n - 1)/ T car per unit time Throughput (assembly line)1 – (n - 1)/ T n (n – 1) ─────────────────── =──────── =n – ───── → n Throughput (team assembly) 1/n T as T→∞ time 755:035 Computer Architecture and Organization

Some Features of Assembly Line Task 1 1 hour Task 2 1 hour Task 3 1 hour Task 4 1 hour MechanicalElectricalPaintingTesting Electrical parts delivered (JIT) Defect found Stall assembly line to fix the cause of defect 3 cars in the assembly line are suspects, to be removed (flush pipeline) 855:035 Computer Architecture and Organization

Pipelining in a Computer Divide datapath into nearly equal tasks, to be performed serially and requiring non-overlapping resources. Insert registers at task boundaries in the datapath; registers pass the output data from one task as input data to the next task. Synchronize tasks with a clock having a cycle time that just exceeds the time required by the longest task. Break each instruction down into a fixed number of tasks so that instructions can be executed in a staggered fashion. 955:035 Computer Architecture and Organization

Single-Cycle Datapath InstructionclassInstr. fetch fetch(IF) Instr. Decode (also reg. file read) (ID) Execution (ALU Operation)(EX) Data access (MEM) Write Back (Reg. file write) (WB) Total time lw lw2ns1ns2ns2ns1ns8ns sw sw2ns1ns2ns2ns8ns R-format add, sub, and, or, slt 2ns1ns2ns1ns8ns B-format, beq 2ns1ns2ns8ns No operation on data; idle time equalizes instruction length to a fixed clock period. 1055:035 Computer Architecture and Organization

Execution Time: Single-Cycle IF ID EX MEM WB Time (ns) lw $1, 100($0) lw $2, 200($0) lw $3, 300($0) Clock cycle time = 8 ns Total time for executing three lw instructions = 24 ns 1155:035 Computer Architecture and Organization

Pipelined Datapath InstructionclassInstr. fetch fetch(IF) Instr. Decode (also reg. file read) (ID) Execu- tion (ALU Opera- tion) (EX) Data access (MEM) Write Back (Reg. file write) (WB) Total time lw lw2ns1ns2ns2ns2ns1ns2ns10ns sw sw2ns1ns2ns2ns2ns1ns2ns10ns R-format: add, sub, and, or, slt 2ns1ns2ns2ns2ns1ns2ns10ns B-format:beq2ns1ns2ns2ns2ns1ns2ns10ns No operation on data; idle time inserted to equalize instruction lengths. 1255:035 Computer Architecture and Organization

Execution Time: Pipeline IF ID EX MEM RW Time (ns) lw $1, 100($0) lw $2, 200($0) lw $3, 300($0) Clock cycle time = 2 ns, four times faster than single-cycle clock Total time for executing three lw instructions = 14 ns Single-cycle time24 Performance ratio =──────────── =── = 1.7 Pipeline time :035 Computer Architecture and Organization

Pipeline Performance Clock cycle time = 2 ns 1,003 lw instructions: Total time for executing 1,003 lw instructions = 2,014 ns Single-cycle time8,024 Performance ratio =──────────── =──── = 3.98 Pipeline time2,014 10,003 lw instructions: Performance ratio =80,024 / 20,014 =3.998 → Clock cycle ratio (4) Pipeline performance approaches clock-cycle ratio for long programs. 1455:035 Computer Architecture and Organization

Single-Cycle Datapath Instr. mem. PC Add Reg. File Data mem. 1 mux 0 0 mux mux 0 Sign ext. Shift left 2 ALU Cont. CONTROL opcode MemWrite MemRead ALU Branch zero ALU MemtoReg ALUOp ALUSrc RegDst RegWrite IF: Instr. fetch ID: Instr. decode, reg. file read EX: Execute, address calc. MEM: mem. access WB: writeback 1555:035 Computer Architecture and Organization

Pipelining of RISC Instructions Fetch Instruction Examine Opcode Fetch Operands Perform Operation Store Result Although an instruction takes five clock cycles, one instruction is completed every cycle. IFIDEXMEMWB InstructionDecode ExecuteMemoryWrite Fetchinstruction andOperationBack Fetch operands to Reg file 1655:035 Computer Architecture and Organization

Pipeline Registers 1755:035 Computer Architecture and Organization

Pipeline Register Functions  Four pipeline registers are added: Register name Data held IF/ID PC+4, Instruction word (IW) ID/EX PC+4, R1, R2, IW(0-15) sign ext., IW(11-15) EX/MEM PC+4, zero, ALUResult, R2, IW(11-15) or IW(16-20) MEM/WB M[ALUResult], ALUResult, IW(11-15) or IW(16-20) 1855:035 Computer Architecture and Organization

Pipelined Datapath Instr mem PC Add Reg. File Data mem 1 mux 0 0 mux 1 4 Sgn ext Shift left 2 opcode ALU zero ALU IF/IDID/EXEX/MEMMEM/WB for R-type for I-type lw 1955:035 Computer Architecture and Organization

Five-Cycle Pipeline CC1 CC2 CC3 CC4 CC5 IM ID, REG. FILE READ ALU DM REG. FILE WRITE IF/IDID/EX MEM/WB EX/MEM 2055:035 Computer Architecture and Organization

Add Instruction  add$t0, $s1, $s2 Machine instruction word opcode $s1 $s2 $t0function CC1 CC2 CC3 CC4 CC5 IM ID, REG. FILE READ ALU DM REG. FILE WRITE IF/IDID/EX MEM/WB EX/MEM IFID EX MEM WB read $s1 add write $t0 read $s2 $s1+$s2 2155:035 Computer Architecture and Organization

Pipelined Datapath Executing add PC Reg. File 1 mux 0 0 mux 1 4 Sign ext. Shift left 2 opcode ALU zero ALU IF/IDID/EXEX/MEMMEM/WB s1 $s for R-type for I-type lw t0 Instr mem s2 $s2 Add addr Data mem data 2255:035 Computer Architecture and Organization

Load Instruction  lw$t0, 1200 ($t1)   opcode $t1 $t01200 CC1 CC2 CC3 CC4 CC5 IM ID, REG. FILE READ ALU DM REG. FILE WRITE IF/IDID/EX MEM/WB EX/MEM IFID EX MEM WB read $t1 add read write $t0 sign ext $t M[addr] :035 Computer Architecture and Organization

Pipelined Datapath Executing lw PC Add Reg. File addr Data mem data 1 mux 0 0 mux 1 4 Sign ext. Shift left 2 opcode ALU zero ALU IF/IDID/EXEX/MEMMEM/WB t $t for R-type for I-type lw t0 Instr mem 2455:035 Computer Architecture and Organization

Store Instruction  sw$t0, 1200 ($t1)   opcode $t1$t01200 CC1 CC2 CC3 CC4 CC5 IM ID, REG. FILE READ ALU DM REG. FILE WRITE IF/IDID/EX MEM/WB EX/MEM IFID EX MEM WB read $t1 add write sign ext $t M[addr] 1200 (addr) ← $t0 2555:035 Computer Architecture and Organization

Pipelined Datapath Executing sw PC Reg. File addr Data mem data 1 mux 0 0 mux 1 4 Sign ext. Shift left 2 opcode ALU zero ALU IF/IDID/EXEX/MEMMEM/WB t $t for R-type for I-type lw Instr mem t0 $t0 Add 2655:035 Computer Architecture and Organization

Executing a Program Consider a five-instruction segment: lw$10, 20($1) sub$11, $2, $3 add$12, $3, $4 lw$13, 24($1) add$14, $5, $6 2755:035 Computer Architecture and Organization

Program Execution time CC1 CC2 CC3 CC4 CC5 IM ID, REG. FILE READ ALU DM REG. FILE WRITE IF/ID ID/EX MEM/WB EX/MEM IM ID, REG. FILE READ ALU DM REG. FILE WRITE IF/ID ID/EX MEM/WB EX/MEM IM ID, REG. FILE READ ALU DM REG. FILE WRITE IF/ID ID/EX MEM/WB EX/MEM IM ID, REG. FILE READ ALU DM REG. FILE WRITE IF/IDID/EX MEM/WB EX/MEM IM ID, REG. FILE READ ALU DM REG. FILE WRITE IF/IDID/EX MEM/WB EX/MEM lw$10, 20($1) sub$11, $2, $3 add$12, $3, $4 lw$13, 24($1) add$14, $5, $6 Program instructions 2855:035 Computer Architecture and Organization

CC5 Instr mem PC Add Reg. File Data mem. 1 mux 0 0 mux 1 4 Sign ext. Shift left 2 opcode ALU zero ALU IF/IDID/EXEX/MEMMEM/WB for R-type for I-type lw IF: add $14, $5, $6ID: lw $13, 24($1)EX: add $12, $3, $4 MEM: sub $11, $2, $3 WB: lw $10, 20($1) 2955:035 Computer Architecture and Organization

Advantages of Pipeline After the fifth cycle (CC5), one instruction is completed each cycle; CPI ≈ 1, neglecting the initial pipeline latency of 5 cycles.  Pipeline latency is defined as the number of stages in the pipeline, or  The number of clock cycles after which the first instruction is completed. The clock cycle time is about four times shorter than that of single-cycle datapath and about the same as that of multicycle datapath. For multicycle datapath, CPI = 3. …. So, pipelined execution is faster, but :035 Computer Architecture and Organization

Pipeline Hazards  Definition: Hazard in a pipeline is a situation in which the next instruction cannot complete execution one clock cycle after completion of the present instruction.  Three types of hazards: Structural hazard (resource conflict) Data hazard Control hazard 3155:035 Computer Architecture and Organization

Structural Hazard  Two instructions cannot execute due to a resource conflict.  Example: Consider a computer with a common data and instruction memory. The fourth cycle of a lw instruction requires memory access (memory read) and at the same time the first cycle of the fourth instruction requires instruction fetch (memory read). This will cause a memory resource conflict. 3255:035 Computer Architecture and Organization

Example of Structural Hazard CC1 CC2 CC3 CC4 CC5 IM/DM ID, REG. FILE READ ALU IM/DM REG. FILE WRITE IF/ID ID/EX MEM/WB EX/MEM IM/DM ID, REG. FILE READ ALU IM/DM REG. FILE WRITE IF/IDID/EX MEM/WB EX/MEM IM/DM ID, REG. FILE READ ALU IM/DM REG. FILE WRITE IF/IDID/EX MEM/WB EX/MEM IM/DM ID, REG. FILE READ ALU IM/DM REG. FILE WRITE IF/IDID/EX MEM/WB EX/MEM lw$10, 20($1) sub$11, $2, $3 add$12, $3, $4 lw$13, 24($1) time Program instructions Common data and instr. Mem. Needed by two instructions 3355:035 Computer Architecture and Organization

Possible Remedies for Structural Hazards  Provide duplicate hardware resources in datapath.  Control unit or compiler can insert delays (no-op cycles) between instructions. This is known as pipeline stall or bubble. 3455:035 Computer Architecture and Organization

Stall (Bubble) for Structural Hazard CC1 CC2 CC3 CC4 CC5 IM/DM ID, REG. FILE READ ALU IM/DM REG. FILE WRITE IF/IDID/EX MEM/WB EX/MEM IM/DM ID, REG. FILE READ ALU IM/DM REG. FILE WRITE IF/ID ID/EX MEM/WB EX/MEM IM/DM ID, REG. FILE READ ALU IM/DM REG. FILE WRITE IF/ID ID/EX MEM/WB EX/MEM IM/DM ID, REG. FILE READ ALU IM/DM REG. FILE WRITE IF/ID ID/EX MEM/WB EX/MEM lw$10, 20($1) sub$11, $2, $3 add$12, $3, $4 lw$13, 24($1) time Program instructions Stall (bubble) 3555:035 Computer Architecture and Organization

Data Hazard  Data hazard means that an instruction cannot be completed because the needed data, being generated by another instruction in the pipeline, is not available.  Example: consider two instructions:  add$s0, $t0, $t1  sub$t2, $s0, $t3# needs $s0 3655:035 Computer Architecture and Organization

Example of Data Hazard CC1 CC2 CC3 CC4 CC5 IM ID, REG. FILE READ ALU DM REG. FILE WRITE IF/ID ID/EXMEM/WB EX/MEM IM ID, REG. FILE READ ALU DM REG. FILE WRITE IF/IDID/EX MEM/WB EX/MEM add$s0, $t0, $t1 sub$t2, $s0, $t3 time Program instructions Write s0 in CC5 Read s0 and t3 in CC3 We need to read s0 from reg file in cycle 3 But s0 will not be written in reg file until cycle 5 However, s0 will only be used in cycle 4 And it is available at the end of cycle :035 Computer Architecture and Organization

Forwarding or Bypassing  Output of a resource used by an instruction is forwarded to the input of some resource being used by another instruction.  Forwarding can eliminate some, but not all, data hazards. 3855:035 Computer Architecture and Organization

Forwarding for Data Hazard CC1 CC2 CC3 CC4 CC5 IM ID, REG. FILE READ ALU DM REG. FILE WRITE IF/ID ID/EX MEM/WB EX/MEM IM ID, REG. FILE READ ALU DM REG. FILE WRITE IF/IDID/EX MEM/WB EX/MEM add$s0, $t0, $t1 sub$t2, $s0, $t3 time Program instructions Write s0 in CC5 Read s0 and t3 in CC3 Forwarding 3955:035 Computer Architecture and Organization

Forwarding Unit Hardware Forwarding Unit Data Mem. ALU FORW. MUX ID/EXEX/MEMMEM/WB MUX Destination registers Source reg. IDs from opcode Data to reg. file 4055:035 Computer Architecture and Organization

Forwarding Alone May Not Work CC1 CC2 CC3 CC4 CC5 IM ID, REG. FILE READ ALU DM REG. FILE WRITE IF/IDID/EX MEM/WB EX/MEM IM ID, REG. FILE READ ALU DM REG. FILE WRITE IF/IDID/EX MEM/WB EX/MEM lw$s0, 20($s1) sub$t2, $s0, $t3 time Program instructions Write s0 in CC5 Read s0 and t3 in CC3 data needed by sub (data hazard) data available from memory only at the end of cycle :035 Computer Architecture and Organization

Use Bubble and Forwarding CC1 CC2 CC3 CC4 CC5 IM ID, REG. FILE READ ALU DM REG. FILE WRITE IF/ID ID/EXMEM/WB EX/MEM lw$s0, 20($s1) sub$t2, $s0, $t3 time Program instructions Write s0 in CC5 Forwarding stall (bubble) IM ID, REG. FILE READ ALU DM REG. FILE WRITE IF/IDID/EX MEM/WB EX/MEM 4255:035 Computer Architecture and Organization

Hazard Detection Unit Hardware Source reg. IDs from opcode Forwarding Unit Data Mem. ALU FORW. MUX ID/EX EX/MEMMEM/WB Control signals to reg. file Hazard Detection Unit PC Control NOP MUX 0 IF/ID Instruction Disable write 4355:035 Computer Architecture and Organization

Resolving Hazards  Hazards are resolved by Hazard detection and forwarding units.  Compiler’s understanding of how these units work can improve performance. 4455:035 Computer Architecture and Organization

Avoiding Stall by Code Reorder C code: A = B + E; C = B + F; MIPS code: lw$t1,0($t0). $t1 written lw$t2,4($t0)..$t2 written add$t3,$t1, $t2... $t1, $t2 needed sw$t3,12($t0).... lw$t4,8($t0)..... $t4 written add$t5,$t1, $t $t4 needed sw$t5,16,($t0) :035 Computer Architecture and Organization

Reordered Code C code: A = B + E; C = B + F; MIPS code: lw$t1,0($t0) lw$t2,4($t0) lw$t4,8($t0) add$t3,$t1, $t2no hazard sw$t3,12($t0) add$t5,$t1, $t4no hazard sw$t5,16,($t0) 4655:035 Computer Architecture and Organization

Control Hazard  Instruction to be fetched is not known!  Example: Instruction being executed is branch- type, which will determine the next instruction: add$4, $5, $6 beq$1, $2, 40 next instruction... 40and$7, $8, $9 4755:035 Computer Architecture and Organization

Stall on Branch CC1 CC2 CC3 CC4 CC5 IM ID, REG. FILE READ ALU DM REG. FILE WRITE IF/ID ID/EX MEM/WB EX/MEM IM ID, REG. FILE READ ALU DM REG. FILE WRITE IF/ID ID/EX MEM/WB EX/MEM IM ID, REG. FILE READ ALU DM REG. FILE WRITE IF/ID ID/EX MEM/WB EX/MEM add$4, $5, $6 beq $1, $2, 40 next instruction or and$7, $8, $9 time Program instructions Stall (bubble)

Why Only One Stall?  Extra hardware in ID phase:  Additional ALU to compute branch address  Comparator to generate zero signal  Hazard detection unit writes the branch address in PC 4955:035 Computer Architecture and Organization

Ways to Handle Branch  Stall or bubble  Branch prediction: Heuristics  Next instruction  Prediction based on statistics (dynamic)  Hardware decision (dynamic) Prediction error: pipeline flush  Delayed branch 5055:035 Computer Architecture and Organization

Delayed Branch Example  Stall on branch add $4, $5, $6 beq $1, $2, skip next instruction... skip or $7, $8, $9  Delayed branch beq $1, $2, skip add $4, $5, $6 next instruction... skip or $7, $8, $9 Instruction executed irrespective of branch decision 5155:035 Computer Architecture and Organization

Delayed Branch CC1 CC2 CC3 CC4 CC5 IM ID, REG. FILE READ ALU DM REG. FILE WRITE IF/ID ID/EX MEM/WB EX/MEM IM ID, REG. FILE READ ALU DM REG. FILE WRITE IF/ID ID/EX MEM/WB EX/MEM IM ID, REG. FILE READ ALU DM REG. FILE WRITE IF/ID ID/EX MEM/WB EX/MEM add $4, $5, $6 beq $1, $2, skip next instruction or skip or $7, $8, $9 time Program instructions 5255:035 Computer Architecture and Organization

Summary: Hazards Structural hazards  Cause: resource conflict  Remedies: (i) hardware resources, (ii) stall (bubble) Data hazards  Cause: data unavailablity  Remedies: (i) forwarding, (ii) stall (bubble), (iii) code reordering Control hazards  Cause: out-of-sequence execution (branch or jump)  Remedies: (i) stall (bubble), (ii) branch prediction/pipeline flush, (iii) delayed branch/pipeline flush 5355:035 Computer Architecture and Organization