Stalling: the easiest solution is to stall the pipeline

Presentation transcript:

1 Stalling
- The easiest solution is to stall the pipeline.
- We could delay the AND instruction by introducing a one-cycle delay into the pipeline, sometimes called a bubble.
- Notice that we’re still using forwarding in cycle 5, to get data from the MEM/WB pipeline register to the ALU.
  lw  $2, 20($3)
  and $12, $2, $5
[Figure: pipeline diagram (IM, Reg, ALU, DM, Reg) of the two instructions against clock cycles]

2 Stalling and forwarding
- Without forwarding, we’d have to stall for two cycles to wait for the LW instruction’s writeback stage.
- In general, you can always stall to avoid hazards, but dependencies are very common in real code, and frequent stalling can reduce performance significantly.
  lw  $2, 20($3)
  and $12, $2, $5
[Figure: pipeline diagram (IM, Reg, ALU, DM, Reg) of the two instructions against clock cycles]

Load-Use Hazard Detection
- Check when the using instruction is decoded in the ID stage.
- ALU operand register numbers in the ID stage are given by IF/ID.RegisterRs and IF/ID.RegisterRt.
- Load-use hazard when
  ID/EX.MemRead and ((ID/EX.RegisterRt = IF/ID.RegisterRs) or (ID/EX.RegisterRt = IF/ID.RegisterRt))
- If detected, stall and insert a bubble.
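The load-use condition above can be written as a small predicate. This is only a sketch: the function and parameter names are made up for illustration and mirror the pipeline-register fields named on the slide.

```python
def load_use_hazard(id_ex_mem_read: bool, id_ex_rt: int,
                    if_id_rs: int, if_id_rt: int) -> bool:
    """True when the instruction in ID reads a register that a load in EX
    has not fetched from memory yet (the slide's load-use condition)."""
    return id_ex_mem_read and (id_ex_rt == if_id_rs or id_ex_rt == if_id_rt)


# lw $2, 20($3) is in EX (MemRead = 1, rt = 2); and $12, $2, $5 is in ID (rs = 2, rt = 5).
print(load_use_hazard(True, 2, 2, 5))   # True
# Same register numbers, but the instruction in EX is not a load.
print(load_use_hazard(False, 2, 2, 5))  # False
```

The check compares register numbers only, so it is performed in ID before the dependent instruction has consumed any stale value.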

How to Stall the Pipeline
- Force control values in the ID/EX register to 0.
  - EX, MEM and WB do a nop (no-operation).
- Prevent update of the PC and the IF/ID register.
  - The using instruction is decoded again.
  - The following instruction is fetched again.
  - The 1-cycle stall allows MEM to read the data for lw.
  - The data can subsequently be forwarded to the EX stage.
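A minimal sketch of these two actions at one clock edge, assuming a much-reduced model of the front end. The `FrontEnd` type, the `clock_edge` function and the three control signal names are illustrative choices, not part of the slides.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class FrontEnd:
    pc: int            # address of the next instruction to fetch
    if_id_instr: str   # instruction currently held in the IF/ID register
    id_ex_ctrl: dict   # control values latched into the ID/EX register


# All-zero control values: EX, MEM and WB then do nothing (a bubble).
NOP_CTRL = {"RegWrite": 0, "MemRead": 0, "MemWrite": 0}


def clock_edge(state: FrontEnd, stall: bool,
               fetched_instr: str, decoded_ctrl: dict) -> FrontEnd:
    """Advance the front end by one cycle, or hold it when stalling."""
    if stall:
        # PC and IF/ID keep their values, so the same instruction is
        # decoded again and the following one is fetched again.
        return replace(state, id_ex_ctrl=dict(NOP_CTRL))
    return FrontEnd(pc=state.pc + 4,
                    if_id_instr=fetched_instr,
                    id_ex_ctrl=decoded_ctrl)


state = FrontEnd(pc=8, if_id_instr="and $12, $2, $5",
                 id_ex_ctrl={"RegWrite": 1, "MemRead": 1, "MemWrite": 0})
stalled = clock_edge(state, True, "or $13, $12, $2",
                     {"RegWrite": 1, "MemRead": 0, "MemWrite": 0})
print(stalled.pc, stalled.if_id_instr, stalled.id_ex_ctrl)
```

Holding the PC and IF/ID while zeroing ID/EX is what turns one repeated decode into exactly one bubble travelling down the pipeline.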

5 Stalling delays the entire pipeline
- If we delay the second instruction, we’ll have to delay the third one too.
  - This is necessary to make forwarding work between AND and OR.
  - It also prevents problems such as two instructions trying to write to the same register in the same cycle.
  lw  $2, 20($3)
  and $12, $2, $5
  or  $13, $12, $2
[Figure: pipeline diagram (IM, Reg, ALU, DM, Reg) of the three instructions against clock cycles]

6
- But what about the ALU during cycle 4, the data memory in cycle 5, and the register file write in cycle 6?
- Those units aren’t used in those cycles because of the stall, so we can set the EX, MEM and WB control signals to all 0s.
  lw  $2, 20($3)
  and $12, $2, $5
  or  $13, $12, $2
[Figure: pipeline diagram of the three instructions against clock cycles, annotated "What about EX, MEM, WB"]

7 Detecting Stalls, cont.
- When should stalls be detected?
  - EX stage (of the instruction causing the stall)
- What is the stall condition?
  if (ID/EX.MemRead = 1 and (ID/EX.rt = IF/ID.rs or ID/EX.rt = IF/ID.rt)) then stall
  lw  $2, 20($3)
  and $12, $2, $5
[Figure: pipeline diagram of the two instructions with the if/id, id/ex, ex/mem and mem/wb pipeline registers marked]

8 Adding hazard detection to the CPU

Stalls and Performance
- Stalls reduce performance
  - But are required to get correct results
- Compiler can arrange code to avoid hazards and stalls
  - Requires knowledge of the pipeline structure

Code Scheduling to Avoid Stalls
- Reorder code to avoid use of a load result in the next instruction.
- Example: C code for A = B + E; C = B + F;

Original order (13 cycles):
  lw  $t1, 0($t0)
  lw  $t2, 4($t0)
  (stall)
  add $t3, $t1, $t2
  sw  $t3, 12($t0)
  lw  $t4, 8($t0)
  (stall)
  add $t5, $t1, $t4
  sw  $t5, 16($t0)

Reordered (11 cycles):
  lw  $t1, 0($t0)
  lw  $t2, 4($t0)
  lw  $t4, 8($t0)
  add $t3, $t1, $t2
  sw  $t3, 12($t0)
  add $t5, $t1, $t4
  sw  $t5, 16($t0)
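The 13 and 11 cycle figures can be checked with a rough count. The sketch assumes a 5-stage pipeline with forwarding, where the only stall is one cycle when an instruction reads the destination of the load immediately before it. The tuple encoding of instructions and the name `count_cycles` are made up for illustration.

```python
def count_cycles(program, stages=5):
    """program: list of (op, dest, sources) tuples in program order."""
    stalls = 0
    for prev, cur in zip(program, program[1:]):
        prev_op, prev_dest, _ = prev
        _, _, cur_sources = cur
        # Load-use: the value is not available until the load's MEM stage.
        if prev_op == "lw" and prev_dest in cur_sources:
            stalls += 1
    # The first instruction needs `stages` cycles, each later one adds a cycle.
    return len(program) + (stages - 1) + stalls


original = [
    ("lw",  "$t1", ["$t0"]),
    ("lw",  "$t2", ["$t0"]),
    ("add", "$t3", ["$t1", "$t2"]),
    ("sw",  None,  ["$t3", "$t0"]),
    ("lw",  "$t4", ["$t0"]),
    ("add", "$t5", ["$t1", "$t4"]),
    ("sw",  None,  ["$t5", "$t0"]),
]
reordered = [original[i] for i in (0, 1, 4, 2, 3, 5, 6)]

print(count_cycles(original))   # 13
print(count_cycles(reordered))  # 11
```

Moving the third load up separates each load from its consumer, so both bubbles disappear without changing the result.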

11 Branches in the original pipelined datapath
[Figure: pipelined datapath with instruction memory, registers, ALU and data memory, the IF/ID, ID/EX, EX/MEM and MEM/WB pipeline registers, and the control signals PCSrc, RegDst, RegWrite, ALUSrc, ALUOp, MemRead, MemWrite and MemToReg]
- When are they resolved?

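Branch Hazards
- If the branch outcome is determined in MEM, the instructions fetched after the branch must be flushed (set their control values to 0).
[Figure: pipeline diagram marking the PC and the instructions to flush]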

Reducing Branch Delay
- Move hardware to determine outcome to ID stage
  - Target address adder
  - Register comparator
- Example: branch taken
  36: sub $10, $4, $8
  40: beq $1, $3, 7
  44: and $12, $2, $5
  48: or  $13, $2, $6
  52: add $14, $4, $2
  56: slt $15, $6, $…
  …
  …:  lw  $4, 50($7)
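As a worked check of where the taken branch in this example continues, the sketch below applies the standard MIPS rule that the beq offset counts words relative to the instruction after the branch. The function name is made up for illustration.

```python
def branch_target(branch_pc: int, word_offset: int) -> int:
    """Target address of a MIPS conditional branch: PC + 4 + offset * 4."""
    return branch_pc + 4 + (word_offset << 2)


# beq $1, $3, 7 located at address 40
print(branch_target(40, 7))  # 72
```

This is the addition performed by the target address adder that the slide moves into the ID stage.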

Example: Branch Taken

Data Hazards for Branches
- If a comparison register is a destination of the 2nd or 3rd preceding ALU instruction
  - Can resolve using forwarding
  add $1, $2, $3
  add $4, $5, $6
  …
  beq $1, $4, target
[Figure: pipeline diagram, one IF ID EX MEM WB row per instruction]

Data Hazards for Branches
- If a comparison register is a destination of the preceding ALU instruction or the 2nd preceding load instruction
  - Need 1 stall cycle
  lw  $1, addr
  add $4, $5, $6
  (beq stalled)
  beq $1, $4, target
[Figure: pipeline diagram, one IF ID EX MEM WB row per instruction, with the beq held for one cycle]

Data Hazards for Branches
- If a comparison register is a destination of the immediately preceding load instruction
  - Need 2 stall cycles
  lw  $1, addr
  (beq stalled)
  (beq stalled)
  beq $1, $0, target
[Figure: pipeline diagram, one IF ID EX MEM WB row per instruction, with the beq held for two cycles]
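The three branch cases above follow one rule, sketched here under these assumptions: a 5-stage pipeline, the branch compares its registers in ID, an ALU result can be forwarded once EX (stage 3) has finished, and a loaded value once MEM (stage 4) has finished. The names `READY_AFTER_STAGE` and `branch_stall_cycles` are made up for illustration.

```python
# Stage after which the producer's result exists: EX = 3, MEM = 4.
READY_AFTER_STAGE = {"alu": 3, "load": 4}


def branch_stall_cycles(producer_kind: str, distance: int) -> int:
    """Stall cycles for a branch resolved in ID.

    distance = 1 means the producer immediately precedes the branch,
    2 means it is the 2nd preceding instruction, and so on.
    Cycles are counted from the producer's IF stage (cycle 1).
    """
    first_usable_cycle = READY_AFTER_STAGE[producer_kind] + 1
    branch_id_cycle = 2 + distance  # branch IF is cycle 1 + distance
    return max(0, first_usable_cycle - branch_id_cycle)


for kind, distance in [("alu", 3), ("alu", 2), ("alu", 1),
                       ("load", 2), ("load", 1)]:
    print(kind, distance, branch_stall_cycles(kind, distance))
```

The loop prints 0, 0, 1, 1 and 2 stall cycles for the five cases, matching the three slides.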

Branch Prediction
- Longer pipelines can’t readily determine branch outcome early
  - Stall penalty becomes unacceptable
- Predict (i.e., guess) outcome of branch
  - Only stall if prediction is wrong
- Simplest prediction strategy: predict branches not taken
  - Works well for loops if the loop tests are done at the start
  - Fetch instruction after branch, with no delay

Dynamic Branch Prediction
- In deeper and superscalar pipelines, branch penalty is more significant
- Use dynamic prediction
  - Branch prediction buffer (aka branch history table)
  - Indexed by recent branch instruction addresses
  - Stores outcome (taken/not taken)
  - To execute a branch
    - Check table, expect the same outcome
    - Start fetching from fall-through or target
    - If wrong, flush pipeline and flip prediction

1-Bit Predictor: Shortcoming
- Inner loop branches mispredicted twice!
  outer: …
         …
  inner: …
         …
         beq …, …, inner
         …
         beq …, …, outer
- Mispredict as taken on last iteration of inner loop
- Then mispredict as not taken on first iteration of inner loop next time around
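A small simulation makes the double misprediction concrete. It is a sketch with assumed loop counts: the inner branch is taken on every iteration except the last, and the predictor simply remembers the previous outcome. The function name and the iteration counts are made up for illustration.

```python
def one_bit_mispredictions(outcomes, initial_prediction=False):
    """Count mispredictions of a 1-bit predictor over a list of outcomes
    (True = taken). The predictor always predicts the last outcome seen."""
    prediction = initial_prediction
    misses = 0
    for taken in outcomes:
        if prediction != taken:
            misses += 1
        prediction = taken  # flip (or keep) the single history bit
    return misses


inner_iterations = 4  # branch taken 3 times, then falls through once
outer_iterations = 3
outcomes = ([True] * (inner_iterations - 1) + [False]) * outer_iterations

print(one_bit_mispredictions(outcomes))  # 6, i.e. 2 per run of the inner loop
```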

2-Bit Predictor
- Only change prediction on two successive mispredictions
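One common way to realize this is a 2-bit saturating counter. That is an assumption here, since the slide's state diagram is not part of the transcript. In this sketch the counter values 2 and 3 predict taken, 0 and 1 predict not taken, and the counter starts in the strongly-taken state. The function name and the loop counts are made up for illustration.

```python
def two_bit_mispredictions(outcomes, counter=3):
    """Count mispredictions of a 2-bit saturating counter predictor.

    outcomes: list of booleans (True = taken).
    counter:  initial state, 0..3; values >= 2 predict taken.
    """
    misses = 0
    for taken in outcomes:
        if (counter >= 2) != taken:
            misses += 1
        # Move one step toward the observed outcome, saturating at 0 and 3.
        counter = min(3, counter + 1) if taken else max(0, counter - 1)
    return misses


inner_iterations = 4  # branch taken 3 times, then falls through once
outer_iterations = 3
outcomes = ([True] * (inner_iterations - 1) + [False]) * outer_iterations

print(two_bit_mispredictions(outcomes))  # 3, i.e. 1 per run of the inner loop
```

The single not-taken outcome at loop exit only weakens the prediction, so the first iteration of the next run is still predicted taken.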

Calculating the Branch Target
- Even with predictor, still need to calculate the target address
  - 1-cycle penalty for a taken branch
- Branch target buffer
  - Cache of target addresses
  - Indexed by PC when instruction fetched
  - If hit and instruction is branch predicted taken, can fetch target immediately
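A minimal sketch of the lookup described above. A real branch target buffer is a small, fixed-size hardware cache. The unbounded dictionary, the class name and `next_fetch_pc` are simplifications made up for illustration.

```python
class BranchTargetBuffer:
    """Toy branch target buffer: maps a branch's PC to its last known target."""

    def __init__(self):
        self._targets = {}

    def lookup(self, pc):
        # Returns the cached target, or None on a miss.
        return self._targets.get(pc)

    def update(self, pc, target):
        self._targets[pc] = target


def next_fetch_pc(pc, btb, predict_taken):
    """Address to fetch next, decided while the instruction at pc is fetched."""
    target = btb.lookup(pc)
    if target is not None and predict_taken:
        return target  # hit and predicted taken: fetch the target immediately
    return pc + 4      # miss or predicted not taken: fall through


btb = BranchTargetBuffer()
btb.update(40, 72)  # learned when the branch at 40 was resolved earlier
print(next_fetch_pc(40, btb, predict_taken=True))   # 72
print(next_fetch_pc(40, btb, predict_taken=False))  # 44
```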

Concluding Remarks
- ISA influences design of datapath and control
- Datapath and control influence design of ISA
- Pipelining improves instruction throughput using parallelism
  - More instructions completed per second
  - Latency for each instruction not reduced
- Hazards: structural, data, control
- Main additions in hardware:
  - forwarding unit
  - hazard detection and stalling
  - branch predictor
  - branch target table