Computer Architecture 2015– Pipeline 1 Computer Architecture Pipeline By Yoav Etsion & Dan Tsafrir Presentation based on slides by David Patterson, Avi.

Slides:

Advertisements

Similar presentations

CMSC 611: Advanced Computer Architecture Pipelining Some material adapted from Mohamed Younis, UMBC CMSC 611 Spr 2003 course slides Some material adapted.

Advertisements

Pipeline Computer Organization II 1 Hazards Situations that prevent starting the next instruction in the next cycle Structural hazards – A required resource.

Pipeline Hazards Pipeline hazards These are situations that inhibit that the next instruction can be processed in the next stage of the pipeline. This.

Computer Architecture 2014– Pipeline 1 Computer Architecture Pipeline By Yoav Etsion & Dan Tsafrir Presentation based on slides by David Patterson, Avi.

Computer Structure 2013 – Pipeline 1 Computer Structure MIPS Pipeline Lihu Rappoport and Adi Yoaz Some of the slides were taken from: (1) Avi Mendelson.

Computer Architecture 2011 – Pipeline 1 Computer Architecture MIPS Pipeline Lihu Rappoport and Adi Yoaz Some of the slides were taken from: (1) Avi Mendelson.

Computer Architecture 2011 – Out-Of-Order Execution 1 Computer Architecture Out-Of-Order Execution Lihu Rappoport and Adi Yoaz.

ENEE350 Ankur Srivastava University of Maryland, College Park Based on Slides from Mary Jane Irwin ( )

Mary Jane Irwin ( ) [Adapted from Computer Organization and Design,

1 Lecture 7: Out-of-Order Processors Today: out-of-order pipeline, memory disambiguation, basic branch prediction (Sections 3.4, 3.5, 3.7)

ENEE350 Ankur Srivastava University of Maryland, College Park Based on Slides from Mary Jane Irwin ( )

1  2004 Morgan Kaufmann Publishers Chapter Six. 2  2004 Morgan Kaufmann Publishers Pipelining The laundry analogy.

Computer Architecture 2011 – out-of-order execution (lec 7) 1 Computer Architecture Out-of-order execution By Dan Tsafrir, 11/4/2011 Presentation based.

1 Stalling  The easiest solution is to stall the pipeline  We could delay the AND instruction by introducing a one-cycle delay into the pipeline, sometimes.

Computer ArchitectureFall 2007 © October 24nd, 2007 Majd F. Sakr CS-447– Computer Architecture.

L17 – Pipeline Issues 1 Comp 411 – Fall /1308 CPU Pipelining Issues Finishing up Chapter 6 This pipe stuff makes my head hurt! What have you been.

Computer ArchitectureFall 2007 © October 22nd, 2007 Majd F. Sakr CS-447– Computer Architecture.

Pipelined Datapath and Control (Lecture #15) ECE 445 – Computer Organization The slides included herein were taken from the materials accompanying Computer.

Computer Architecture 2010 – Out-Of-Order Execution 1 Computer Architecture Out-Of-Order Execution Lihu Rappoport and Adi Yoaz.

ENGS 116 Lecture 51 Pipelining and Hazards Vincent H. Berk September 30, 2005 Reading for today: Chapter A.1 – A.3, article: Patterson&Ditzel Reading for.

1 Stalls and flushes  So far, we have discussed data hazards that can occur in pipelined CPUs if some instructions depend upon others that are still executing.

Lecture 15: Pipelining and Hazards CS 2011 Fall 2014, Dr. Rozier.

Pipeline Hazard CT101 – Computing Systems. Content Introduction to pipeline hazard Structural Hazard Data Hazard Control Hazard.

1 Pipelining Reconsider the data path we just did Each instruction takes from 3 to 5 clock cycles However, there are parts of hardware that are idle many.

CSE 340 Computer Architecture Summer 2014 Basic MIPS Pipelining Review.

CS.305 Computer Architecture Enhancing Performance with Pipelining Adapted from Computer Organization and Design, Patterson & Hennessy, © 2005, and from.

CMPE 421 Parallel Computer Architecture

1 Designing a Pipelined Processor In this Chapter, we will study 1. Pipelined datapath 2. Pipelined control 3. Data Hazards 4. Forwarding 5. Branch Hazards.

CS 1104 Help Session IV Five Issues in Pipelining Colin Tan, S

Chapter 6 Pipelined CPU Design. Spring 2005 ELEC 5200/6200 From Patterson/Hennessey Slides Pipelined operation – laundry analogy Text Fig. 6.1.

CECS 440 Pipelining.1(c) 2014 – R. W. Allison [slides adapted from D. Patterson slides with additional credits to M.J. Irwin]

Winter 2002CSE Topic Branch Hazards in the Pipelined Processor.

CSIE30300 Computer Architecture Unit 04: Basic MIPS Pipelining Hsin-Chou Chi [Adapted from material by and

1 (Based on text: David A. Patterson & John L. Hennessy, Computer Organization and Design: The Hardware/Software Interface, 3 rd Ed., Morgan Kaufmann,

1  1998 Morgan Kaufmann Publishers Chapter Six. 2  1998 Morgan Kaufmann Publishers Pipelining Improve perfomance by increasing instruction throughput.

Branch Hazards and Static Branch Prediction Techniques

Adapted from Computer Organization and Design, Patterson & Hennessy, UCB ECE232: Hardware Organization and Design Part 13: Branch prediction (Chapter 4/6)

CMPE 421 Parallel Computer Architecture Part 3: Hardware Solution: Control Hazard and Prediction.

CSIE30300 Computer Architecture Unit 06: Containing Control Hazards

Lecture 9. MIPS Processor Design – Pipelined Processor Design #1 Prof. Taeweon Suh Computer Science Education Korea University 2010 R&E Computer System.

Computer Architecture 2011– pipeline (lec2-3) 1 Computer Architecture MIPS Pipeline By Dan Tsafrir, 7/3/2011, 14/3/2011 Presentation based on slides by.

Out-of-order execution Lihu Rappoport 11/ MAMAS – Computer Architecture Out-Of-Order Execution Dr. Lihu Rappoport.

ECE/CS 552: Pipeline Hazards © Prof. Mikko Lipasti Lecture notes based in part on slides created by Mark Hill, David Wood, Guri Sohi, John Shen and Jim.

Computer Structure 2015 – Pipeline 1 Computer Structure Pipeline Lecturer: Aharon Kupershtok Created by Lihu Rappoport.

Pipelining: Implementation CPSC 252 Computer Organization Ellen Walker, Hiram College.

Computer Organization

Computer Organization CS224

Stalling delays the entire pipeline

Chapter 4 The Processor Part 4

ECS 154B Computer Architecture II Spring 2009

Chapter 4 The Processor Part 3

Morgan Kaufmann Publishers The Processor

Computer Architecture MIPS Pipeline

Pipelining review.

The processor: Pipelining and Branching

Pipelining in more detail

Pipeline control unit (highly abstracted)

The Processor Lecture 3.6: Control Hazards

Control unit extension for data hazards

Pipeline control unit (highly abstracted)

CS203 – Advanced Computer Architecture

Pipeline Control unit (highly abstracted)

Pipelining (II).

Control unit extension for data hazards

Wackiness Algorithm A: Algorithm B:

Control unit extension for data hazards

ELEC / Computer Architecture and Design Spring 2015 Pipeline Control and Performance (Chapter 6) Vishwani D. Agrawal James J. Danaher.

Presentation transcript:

Computer Architecture 2015– Pipeline 1 Computer Architecture Pipeline By Yoav Etsion & Dan Tsafrir Presentation based on slides by David Patterson, Avi Mendelson, Randi Katz, and Lihu Rappoport

Computer Architecture 2015– Pipeline 2 Pipelined CPU with Control

Computer Architecture 2015– Pipeline 3 Pipeline Hazards: 1. Structural Hazards

Computer Architecture 2015– Pipeline 4 Structural Hazard  Two instructions attempt to use same resource simultaneously  Problem: register-file accessed in 2 stages  Write during stage 5 (WB)  Read during stage 2 (ID) => Resource (RF) conflict  Solution  Clock splits stage into two  Reads are combinatorial; Writes are sequential  2 read ports, 1 write port (separate)

Computer Architecture 2015– Pipeline 5 Structural Hazard  Problem: memory accessed in 2 stages  Fetch (stage 1), when reading instructions from memory  Memory (stage 4), when data is read/written from/to memory  Princeton architecture  Solution  Split data/inst. Memories  Harvard architecture  Today, separate instruction cache and data cache

Computer Architecture 2015– Pipeline 6 Pipeline Hazards: 2. Data Hazards

Computer Architecture 2015– Pipeline 7  When two instructions access the same register  RAW: Read-After-Write  True dependency  WAR: Write-After-Read  Anti-dependency  WAW: Write-After-Write  False-dependency  Key problem with regular in-order pipelines is RAW  We will also learn about out-of-order pipelines Data Dependencies

Computer Architecture 2015– Pipeline 8  Problem with starting next instruction before first is finished  dependencies that “go backward in time” are data hazards Data Dependencies sub R2, R1, R3 and R12,R2, R5 or R13,R6, R2 add R14,R2, R2 sw R15,100(R2) Program execution order CC 1CC 2CC 3CC 4CC 5CC 6 Time (clock cycles) CC 7CC 8CC –20 Value of R

Computer Architecture 2015– Pipeline 9 IM bubble IM IM RAW Hazard: HW Solution 1 - Add Stalls IMReg CC 1CC 2CC 3CC 4CC 5CC 6 Time (clock cycles) CC 7CC 8CC 9 100–20 Value of R2 DM Reg IMDMReg Reg IMReg IM Reg DMReg IMDMReg Reg Reg DM sub R2, R1, R3 stall and R12,R2, R5 or R13,R6, R2 add R14,R2, R2 sw R15,100(R2) Program execution order Let the hardware detect hazard and add stalls if needed Problem: slow! Solution: forwarding whenever possible

Computer Architecture 2015– Pipeline 10  Use temporary results, don’t wait for them to be written to the register file  register file forwarding to handle read/write to same register  ALU forwarding RAW Hazard: HW Solution 2 - Forwarding IMReg IMReg IMRegDMReg IMDMReg IMDMReg DMReg Reg Reg Reg XXX–20XXXXX XXXX– XXXX DM sub R2, R1, R3 and R12,R2, R5 or R13,R6, R2 add R14,R2, R2 sw R15,100(R2) Program execution order CC 1CC 2CC 3CC 4CC 5CC 6 Time (clock cycles) CC 7CC 8CC –20 Value of R Value EX/MEM Value MEM/WB

Computer Architecture 2015– Pipeline 11 Forwarding Hardware

Computer Architecture 2015– Pipeline 12 Forwarding Hardware Added 2 mux units before ALU Each mux gets 3 inputs, from: 1.Prev stage (ID/EX) 2.Next stage (EX/MEM) 3.The one after (MEM/WB) Forward unit tells the 2 mux units which input to use

Computer Architecture 2015– Pipeline 13  “load” op can cause “un-forwardable” hazards  load value to R  In the next instruction, use R as input  A bigger problem in longer pipelines Can't always forward (stall inevitable) Reg IM Reg Reg IM IMRegDMReg IMDMReg IMDMReg DMReg Reg Reg DM CC 1CC 2CC 3CC 4CC 5CC 6 Time (clock cycles) CC 7CC 8CC 9 Program execution order lw R2, 30(R1) and R12,R2, R5 or R13,R6, R2 add R14,R2, R2 sw R15,100(R2)

Computer Architecture 2015– Pipeline 14 Pipeline Hazards: 3. Control Hazards

Computer Architecture 2015– Pipeline 15 Branch, but where?  The decision to branch happens deep within the pipeline  Likewise, the target of the branch becomes known deep within the pipeline  How does this affect the pipeline logic?  For example…

Computer Architecture 2015– Pipeline 16 Executing a BEQ Instruction (i) BEQ R4, R5, 27 → if (R4-R5=0) then PC  PC+4+SignExt(27)*4 ; else PC  PC+4 0 or 4 beq R4, R5, 27 8 and 12 sw 16 sub Assume this program state

Computer Architecture 2015– Pipeline 17 Executing a BEQ Instruction (i) BEQ R4, R5, 27 → if (R4-R5=0) then PC  PC+4+SignExt(27)*4 ; else PC  PC+4 0 or 4 beq R4, R5, 27 8 and 12 sw 16 sub We know: Values of registers We don’t know: If branch will be taken What is its target

Computer Architecture 2015– Pipeline 18 Executing a BEQ Instruction (ii) BEQ R4, R5, 27 → if (R4-R5=0) then PC  PC+4+SignExt(27)*4 ; else PC  PC+4 0 or 4 beq R4, R5, 27 8 and 12 sw 16 sub …Now we know, but only in next cycle will this effect PC Calculate branch condition = compute R4-R5 & compare to 0 Calculate branch target

Computer Architecture 2015– Pipeline 19 Executing a BEQ Instruction (iii) BEQ R4, R5, 27 → if (R4-R5=0) then PC  PC+4+SignExt(27)*4 ; else PC  PC+4 0 or 4 beq R4, R5, 27 8 and 12 sw 16 sub Finally, if taken, branch sets the PC

Computer Architecture 2015– Pipeline 20 Control Hazard on Branches And Beq sub sw Inst from target IMRegDM Reg PC IMRegDM Reg IMRegDM Reg IMRegDM Reg IMRegDM Reg Outcome: The 3 instructions following the branch are in the pipeline even if branch is taken!

Computer Architecture 2015– Pipeline 21 Stall  Easiest solution:  Stall pipe when branch encountered until resolved  But there’s a prices. Assume:  CPI = 1  20% of instructions are branches (realistic)  Stall 3 cycles on every branch (extra 3 cycles for each branch)  Then the price is:  CPI new = × 3 = 1.6// 1 = all instr., including branch  [ CPI new = CPI Ideal + avg. stall cycles / instr. ]  Namely:  IPC drops from 1 to 1/1.6  We lose ~37% of the performance!

Computer Architecture 2015– Pipeline 22 Traps, Exceptions and Interrupts  Indication of events that require a higher authority to intervene (i.e. the operating system)  Atomically changes the protection mode and branches to OS  Protection mode determines what the running is allowed to do (access devices, memory, etc).  Traps: Synchronous  The program asks for OS services (e.g. access a device)  Exceptions: Synchronous  The program did something bad (divide-by-zero; prot. violation)  Interrupts: Asynchronous  An external device needs OS attention (finished an operation)  Can these be handled like regular branches?

Computer Architecture 2015– Pipeline 23 Branch Prediction and Speculative Execution

Computer Architecture 2015– Pipeline 24 Static prediction: branch not taken  Execute instructions from the fall-through (not-taken) path  As if there is no branch  If the branch is not-taken (~50%), no penalty is paid  If branch actually taken  Flush the fall-through path instructions before they change the machine state (memory / registers)  Fetch the instructions from the correct (taken) path  Assuming ~50% branches not taken on average  CPI new = 1 + (0.2 × 0.5) × 3 = 1.3  23% slowdown instead of 37%  What happens in longer pipelines?

Computer Architecture 2015– Pipeline 25 Dynamic branch prediction  Branch prediction is a key impediment to performance  Modern processors employ complex branch predictors  Often achieve < 3% misprediction rate  Given an instruction, we need to predict  Is it a branch?  Branch taken?  Target address?  To avoid stalling  Prediction needed at end of ‘fetch’  Before we even now what the instruction is…  A simple mechanism: Branch Target Buffer (BTB)

Computer Architecture 2015– Pipeline 26 BTB – the idea fast lookup table PC of fetched instruction ?= Predicted branch taken or not taken? (last few times) No => we don’t know, so we don’t predict Yes => instruction is a branch, so let’s predict it Branch PC Target PC History Predicted Target (Works in a straightforward manner only for direct branches, otherwise target PC changes)

Computer Architecture 2015– Pipeline 27 How it works in a nutshell  Until proven otherwise, assume branches are not taken  Fall through instructions (assume branch has no effect)  Upon the first time a branch is taken  Pay the price (in terms of stalls), but  Save the details of the branch in the BTB (= PC, target PC, and whether or not branch was taken)  While fetching, HW checks in parallel  Whether PC is in BTB  If found, make a prediction  Taken? Address?  Upon misprediction  Flush (throw out) pipeline content & start over from right PC

Computer Architecture 2015– Pipeline 28 Prediction steps 1. Allocate  Insert instruction to BTB once identified as taken branch  Do we want to insert not-taken branches?  Option: Implicitly predict they’d continue not to be taken  Insert both conditional & unconditional  To identify, and to save arithmetic 2. Predict  BTB lookup done in parallel to PC-lookup, providing:  Indication whether PC is a branch (=> BTB “hit”)  Branch target  Branch direction (forward or backward in program)  Branch type (conditional or not) 3. Update (when branch taken & its outcome becomes known)  Branch target, history (taken or not)

Computer Architecture 2015– Pipeline 29 Misprediction  Occurs when  Predict = not taken, reality = taken  Predict = taken, reality = not taken  Branch taken as predicted, but wrong target (indirect, as in the jmp register)  Must flush pipeline  Reset pipeline registers (similar to turning all into NOPs)  Commonly, other flush methods are easier to implement  Set the PC to the correct path  Start fetching instruction from correct path

Computer Architecture 2015– Pipeline 30 CPI  Assuming a fraction of p correct predictions  CPI_new = 1 + (0.2 × (1-p)) × 3  Example, p=0.7  CPI_new = 1 + (0.2 × 0.3) × 3 = 1.18  Example, p=0.98  CPI_new = 1 + (0.2 × 0.02) × 3 =  (But this is a simplistic model; in reality the price can sometimes be much higher.)

Computer Architecture 2015– Pipeline 31 History & prediction algorithm  “Always backward” prediction  Works for long loops  Some branches exhibit “locality”  Typically behave as the last time they were invoked  Typically depend on their previous outcome (& it alone)  Can save a history window  What happened last time, and before that, and before…  The bigger the window, the greater the complexity  Some branches regularly alternate between taken & untaken  Taken, then untaken, then taken, …  Need only one history bit to identify this  Some branches are correlated with previous branches  Those that lead to them

Computer Architecture 2015– Pipeline 32 Adding a BTB to the Pipeline

Computer Architecture 2015– Pipeline 33 Using The BTB PC moves to next instruction Inst Mem gets PC and fetches new inst BTB gets PC and looks it up IF/ID reg. loaded with new inst BTB Hit ?Br taken ? PC  PC + 4PC  pred addr IF ID IF/ID reg. loaded with pred inst IF/ID reg. loaded with seq. inst Branch ? yesno yes noyes EXE

Computer Architecture 2015– Pipeline 34 Using The BTB (cont.) ID EXE MEM WB Branch ? Calculate br cond & trgt Flush pipe & update PC Corect pred ? yesno IF/ID latch loaded with correct inst continue Update BTB yes no continue

Computer Architecture 2015– Pipeline 35 Prediction algorithm  Can do an entire course on this issue  Still actively researched  As noted, modern predictors can often achieve misprediction < 3%  Still, it has been shown that these 3% can sometimes significantly worsen performance  A real problem in out-of-order pipelines  We did not talk about the issue of indirect branches  As in virtual function calls (object oriented)  Where the branch target is written in memory, elsewhere