Slide 1 Instruction-Level Parallelism Review of Pipelining (the laundry analogy)

Presentation transcript:

Slide 1 Instruction-Level Parallelism Review of Pipelining (the laundry analogy)

Slide 2 Instruction-Level Parallelism Review of Pipelining (Appendix A)

Slide 3 Instruction-Level Parallelism Review of Pipelining (Appendix A)
–MIPS pipeline five stages:
»IF – instruction fetch
»ID – instruction decode and operand fetch
»EX – execution using the ALU, including effective-address and target-address computation
»MEM – memory access for load and store instructions
»WB – write result back to the (destination) register

Slide 4 The “naïve” MIPS pipeline

Slide 5 The “naïve” MIPS pipeline -- implementation Instruction-Level Parallelism

Slide 6 Instruction-Level Parallelism A series of datapaths shifted in time

Slide 7 Instruction-Level Parallelism A pipeline showing the pipeline registers between stages

Slide 8 The major hurdles of pipelining: pipeline hazards Structural Hazards: resource conflicts, e.g., on buses, register-file ports, or memory ports.

Slide 9 The major hurdles of pipelining: pipeline hazards Data Hazards: data dependency (producer-consumer relationship, or read after write). Some can be resolved by forwarding.

Slide 10 The major hurdles of pipelining: pipeline hazards Data Hazards: data hazard detection in the MIPS pipeline

Slide 11 The major hurdles of pipelining: pipeline hazards Data Hazards: the logic for forwarding data in the MIPS pipeline

Slide 12 The major hurdles of pipelining: pipeline hazards Data Hazards: the forwarding of data in the MIPS pipeline

Slide 13 The major hurdles of pipelining: pipeline hazards Data Hazards: some cannot be resolved by forwarding, thus requiring stalls

Slide 14 The major hurdles of pipelining: pipeline hazards Data Hazards: avoid non-forwardable data hazards through compiler scheduling

Slide 15 The major hurdles of pipelining: pipeline hazards Branch (Control) Hazards: can cause greater performance loss (e.g., a 3-cycle loss in the “naïve” MIPS pipeline)

Slide 16 The major hurdles of pipelining: pipeline hazards Branch (Control) Hazards: improved MIPS pipeline with a one-cycle loss

Slide 17 The major hurdles of pipelining: pipeline hazards Reducing branch penalties:
1. Freeze or flush
2. Predict-not-taken or predict-taken
3. Delayed branch, with execution order:
1) Branch instruction
2) Sequential successor
3) Branch target if taken
“Canceling/nullifying” branch: the delay-slot instruction is nullified if the prediction is incorrect

Slide 18 The major hurdles of pipelining: pipeline hazards Scheduling the branch delay slot

Slide 19 Performance of Pipelining Example 1: Consider an unpipelined machine A and a pipelined machine B, where the clock cycle times are CCT(A) = 10 ns and CCT(B) = 11 ns, and machine A has CPI(A) = 4 for ALU instructions, 4 for branches, and 5 for loads/stores. Assuming an instruction mix of 40% ALU, 20% branches, and 40% loads/stores, what is the speedup of B over A under ideal conditions?
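
The arithmetic behind Example 1 can be sketched as below. The figures for machine A and both clock cycle times come from the slide; the value CPI = 1 for machine B is an assumption that stands in for "ideal conditions" (no stalls), and the function names are illustrative only.

```python
# Instruction mix and per-class CPI for the unpipelined machine A (from the slide).
MIX = {"alu": 0.40, "branch": 0.20, "load_store": 0.40}
CPI_A = {"alu": 4, "branch": 4, "load_store": 5}
CCT_A_NS = 10.0     # clock cycle time of machine A
CCT_B_NS = 11.0     # clock cycle time of machine B
CPI_B_IDEAL = 1.0   # ASSUMPTION: an ideal pipeline completes one instruction per cycle


def average_cpi(mix, cpi):
    """Weighted average CPI over the instruction mix."""
    return sum(mix[kind] * cpi[kind] for kind in mix)


def speedup_b_over_a():
    """Ratio of average time per instruction on A to that on B."""
    time_a = average_cpi(MIX, CPI_A) * CCT_A_NS   # 4.4 cycles * 10 ns
    time_b = CPI_B_IDEAL * CCT_B_NS               # 1 cycle * 11 ns
    return time_a / time_b


if __name__ == "__main__":
    print(f"average CPI of A: {average_cpi(MIX, CPI_A):.2f}")
    print(f"speedup of B over A: {speedup_b_over_a():.2f}")
```

Under these assumptions the average CPI of A is 4.4, so A needs 44 ns per instruction against 11 ns for B, a speedup of 4.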

Slide 20 Performance of Pipelining Impacts of pipeline hazards

Slide 21 Performance of Pipelining Performance of branch schemes: overall costs of a variety of branch schemes with the MIPS pipeline

Slide 22 Performance of Pipelining Example 2: For a deeper pipeline such as that of the MIPS R4000, it takes three pipeline stages before the target address is known and an additional stage before the condition is evaluated. This leads to the branch penalties for the three simplest branch schemes (the penalty table is not preserved in this transcript). Find the effective addition to the CPI arising from branches for this pipeline, assuming that unconditional, untaken conditional, and taken conditional branches account for 4%, 6%, and 10% of instructions, respectively. Answer:
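
A minimal sketch of the Example 2 calculation follows. The branch frequencies come from the slide. The penalty table did not survive in the transcript, so the cycle counts below are an assumption: they are the values from the Hennessy and Patterson textbook example that this slide appears to follow, and the scheme names are illustrative labels.

```python
# Branch frequencies from the slide, as fractions of all instructions.
FREQ = {"unconditional": 0.04, "untaken": 0.06, "taken": 0.10}

# ASSUMPTION: penalty cycles per branch scheme, taken from the textbook
# version of this example; the slide's own table is missing.
PENALTY = {
    "flush":           {"unconditional": 2, "untaken": 3, "taken": 3},
    "predict_taken":   {"unconditional": 2, "untaken": 3, "taken": 2},
    "predict_untaken": {"unconditional": 2, "untaken": 0, "taken": 3},
}


def cpi_addition(scheme):
    """Extra cycles per instruction caused by branches under one scheme."""
    return sum(FREQ[kind] * PENALTY[scheme][kind] for kind in FREQ)


if __name__ == "__main__":
    for scheme in PENALTY:
        print(f"{scheme}: +{cpi_addition(scheme):.2f} CPI")
```

With these assumed penalties the additions to the CPI come out as 0.56 for flushing, 0.46 for predict-taken, and 0.38 for predict-untaken.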

Slide 23 What Makes Pipelining Hard to Implement? 1. Exceptional conditions (e.g., interrupts) often change the order of instruction execution.

Slide 24 What Makes Pipelining Hard to Implement? Actions needed for different types of exceptional conditions:

Slide 25 What Makes Pipelining Hard to Implement? Stopping and Restarting Execution: Two Challenges

Slide 26 What Makes Pipelining Hard to Implement? Stopping and Restarting Execution: Two Challenges (cont’d)

Slide 27 What Makes Pipelining Hard to Implement? Precise Exception Handling in MIPS
Pipeline stage: problem exceptions occurring
–IF: page fault on instruction fetch; misaligned memory access; memory-protection violation
–ID: undefined or illegal opcode
–EX: arithmetic exception
–MEM: page fault on data fetch; misaligned memory access; memory-protection violation
–WB: none

Slide 28 What Makes Pipelining Hard to Implement? Precise Exception Handling in MIPS

Slide 29 Extending MIPS Pipeline to Handle Multicycle Operations
Handle floating-point operations:
–Single cycle (CPI = 1) => very long CCT or a highly complex logic circuit
–Multiple cycles => long latency, with the EX cycle repeated many times and/or with multiple FP functional units
The MIPS pipeline with three additional unpipelined floating-point units

Slide 30 Extending MIPS Pipeline to Handle Multicycle Operations
Pipelining FP functional units:
–Latency: number of intervening cycles between the producer and the consumer of an operand (0 for ALU and 1 for LW)
–Initiation interval: minimum number of cycles between two issues of instructions using the same functional unit
[Table: latency and initiation interval for each functional unit (integer ALU, data memory, FP add, FP multiply, FP divide); values not preserved]

Slide 31 Extending MIPS Pipeline to Handle Multicycle Operations
Pipeline timing of a set of independent FP instructions:
MUL.D: IF ID M1 M2 M3 M4 M5 M6 M7 MEM WB
ADD.D: IF ID A1 A2 A3 A4 MEM WB
L.D: IF ID EX MEM WB
S.D: IF ID EX MEM WB
A typical FP code sequence showing the stalls arising from RAW hazards ("stall …" marks a run of stall cycles):
L.D F4,0(R2): IF ID EX MEM WB
MUL.D F0,F4,F6: IF ID stall M1 M2 M3 M4 M5 M6 M7 MEM WB
ADD.D F2,F0,F8: IF stall ID stall … A1 A2 A3 A4 MEM WB
S.D F2,0(R2): IF stall … ID EX stall … MEM
Three instructions want to perform a write back to the FP register file simultaneously:
MUL.D F0,F4,F6: IF ID M1 M2 M3 M4 M5 M6 M7 MEM WB
…: IF ID EX MEM WB
…: IF ID EX MEM WB
ADD.D F2,F4,F6: IF ID A1 A2 A3 A4 MEM WB
…: IF ID EX MEM WB
…: IF ID EX MEM WB
L.D F2,0(R2): IF ID EX MEM WB
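
The write-back collision in the last sequence can be checked with a small timing sketch. The number of execution stages per unit (7 for multiply, 4 for add, 1 for integer and load) is read off the stage names in the slide. The rest is an assumption made for illustration: one instruction is fetched per cycle starting at cycle 1, nothing stalls, and the unnamed filler instructions do not write an FP register.

```python
# Execution stages per functional unit, read off the slide (M1..M7, A1..A4, EX).
EX_STAGES = {"fp_mul": 7, "fp_add": 4, "int": 1}


def wb_cycle(if_cycle, unit):
    """Cycle in which an instruction reaches WB: IF, ID, EX stages, MEM, WB."""
    return if_cycle + 1 + EX_STAGES[unit] + 2


def write_port_conflicts(program):
    """Map each WB cycle used by more than one FP-writing instruction to their names.

    `program` is a list of (name, unit, writes_fp_register) tuples in fetch order.
    ASSUMPTION: one instruction is fetched per cycle and nothing stalls.
    """
    writers = {}
    for if_cycle, (name, unit, writes_fp) in enumerate(program, start=1):
        if writes_fp:
            writers.setdefault(wb_cycle(if_cycle, unit), []).append(name)
    return {cycle: names for cycle, names in writers.items() if len(names) > 1}


# The third sequence from the slide; "other" stands for the unnamed instructions.
PROGRAM = [
    ("MUL.D", "fp_mul", True),
    ("other", "int", False),
    ("other", "int", False),
    ("ADD.D", "fp_add", True),
    ("other", "int", False),
    ("other", "int", False),
    ("L.D", "int", True),
]

if __name__ == "__main__":
    print(write_port_conflicts(PROGRAM))
```

Under these assumptions all three named instructions reach WB in cycle 11, which is the structural hazard on the FP register write port that the slide points out.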

Slide 32 Extending MIPS Pipeline to Handle Multicycle Operations
Difficulties in exploiting ILP: various hazards impose dependences among instructions. For an instruction i followed by an instruction j:
–RAW (read after write): j tries to read a source before i writes it
–WAW (write after write): j tries to write an operand before it is written by i
–WAR (write after read): j tries to write a destination before it is read by i
Implementing pipelining in FP: hazards and forwarding in longer-latency pipelines
–Divide is not fully pipelined (structural hazard)
–Multiple writes in a cycle, and instructions arrive at WB at varying times: WAW and structural hazards. Would there be WAR?
–Out-of-order completion of instructions => more problems for exception handling
–Higher RAW frequency and longer stalls due to longer latency
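
The three dependence types can be expressed as simple register-set comparisons. The sketch below is illustrative only: the `Instr` record and the `hazards` function are hypothetical names, and the check reports potential hazards between an earlier instruction i and a later instruction j, not whether a given pipeline would actually stall.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class Instr:
    """Hypothetical instruction record: one destination, several sources."""
    name: str
    dest: Optional[str]       # None for instructions that write no register
    srcs: Tuple[str, ...]


def hazards(i, j):
    """Potential hazards when j follows i in program order."""
    found = set()
    if i.dest is not None and i.dest in j.srcs:
        found.add("RAW")      # j reads what i writes
    if i.dest is not None and i.dest == j.dest:
        found.add("WAW")      # j writes what i writes
    if j.dest is not None and j.dest in i.srcs:
        found.add("WAR")      # j writes what i reads
    return found


if __name__ == "__main__":
    mul = Instr("MUL.D", "F0", ("F4", "F6"))
    add = Instr("ADD.D", "F2", ("F0", "F8"))
    load = Instr("L.D", "F2", ("R2",))
    print(hazards(mul, add))   # ADD.D reads F0 produced by MUL.D
    print(hazards(add, load))  # both write F2
```

In the usage lines, the first pair is the RAW case from the FP code sequence (ADD.D reads F0) and the second is the WAW case on F2.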

Slide 33 Extending MIPS Pipeline to Handle Multicycle Operations
Introduce interlock:
–track the use of the write port at ID and stall issue if a conflict is detected
–use a shift register to track issued instructions' use of the write port
–stall when entering MEM:
1. can stall any of the contending instructions,
2. no need to detect the conflict early, when it is harder to see,
3. give priority to the unit with the longest latency,
4. can cause bottleneck stalling
–WAW occurs if LD is issued one cycle earlier and has F2 as its destination (WAW with ADDD). Solutions:
1. delay issuing LD until ADDD enters MEM, or
2. stamp out the result of ADDD
»Hazard detection with the FP pipeline:
1. check for structural hazards: a. functional units, b. write ports
2. check for RAW hazard: source reg. in ID = dest. reg. (issued)
3. check for WAW hazard: dest. reg. in ID = dest. reg. (issued)
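
The shift-register interlock can be sketched as a bit vector in which bit k means "the write port is already reserved k cycles from now". This is a simplified model, not the actual hardware: the class and function names are hypothetical, only the single FP write port is modelled, and the distances used in the example (9, 6, and 3 cycles from ID to WB for multiply, add, and load) are counted from the stage sequences in the timing diagrams.

```python
class WritePortTracker:
    """Bit k set means the write port is reserved k cycles from the current cycle."""

    def __init__(self):
        self.bits = 0

    def can_issue(self, cycles_until_wb):
        return ((self.bits >> cycles_until_wb) & 1) == 0

    def reserve(self, cycles_until_wb):
        self.bits |= 1 << cycles_until_wb

    def tick(self):
        """Advance one clock cycle: every reservation moves one step closer."""
        self.bits >>= 1


def schedule(distances):
    """Return the ID cycle at which each instruction issues.

    `distances` lists, in program order, the number of cycles from ID to WB,
    or None for an instruction that does not use the tracked write port.
    An instruction whose WB slot is taken stalls in ID, holding up its successors.
    """
    tracker = WritePortTracker()
    issue_cycles = []
    cycle = 0
    for distance in distances:
        while distance is not None and not tracker.can_issue(distance):
            tracker.tick()        # stall one cycle in ID
            cycle += 1
        if distance is not None:
            tracker.reserve(distance)
        issue_cycles.append(cycle)
        tracker.tick()
        cycle += 1
    return issue_cycles


if __name__ == "__main__":
    # MUL.D, two others, ADD.D, two others, L.D
    print(schedule([9, None, None, 6, None, None, 3]))
```

In this model ADD.D and L.D are each held in ID for one cycle, so the three write backs land in consecutive cycles instead of colliding.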

Slide 34 Extending MIPS Pipeline to Handle Multicycle Operations
Maintain precise exception:
–Example of out-of-order completion:
»DIVF F0, F2, F3
»ADDF F10, F10, F8
»SUBF F12, F12, F14
An exception in SUBF at the end of ADDF causes an imprecise exception, which cannot be solved by HW/SW
–Solutions:
1. Fast imprecise (tolerable in the 60s and 70s, but much less so now due to pipelined FP, virtual memory, and the IEEE standard) or slow precise
2. Buffering of results until all predecessors finish:
–the bigger the difference among instruction execution lengths, the more expensive to implement (e.g., a large number of comparators and MUXs and a large amount of buffer space)
–history file: keeps track of the original values of registers
–future file: keeps newer values of registers until all predecessors are completed
3. Quasi-precise exception: keep enough information for the trap-handling routine to create a precise sequence for the exception:
–operations in the pipeline and their PCs
–software finishes all instructions issued prior to the latest completed instruction
4. Guarded issuing: issue only if it is certain that all prior instructions will complete without causing an exception
–stalling to maintain precise exception
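
The history-file idea can be illustrated with a small software model. This is a sketch under assumptions, not the hardware design: the `HistoryFile` class and its methods are hypothetical, program order is represented by a sequence number, and the scenario (DIVF faulting after the two later instructions have already written their results) is chosen only to show the rollback. It also assumes WAW hazards are prevented elsewhere, so undoing the log in reverse order restores the right values.

```python
class HistoryFile:
    """Remembers the old value of every register write so it can be undone."""

    def __init__(self, registers):
        self.registers = dict(registers)
        self.history = []   # (sequence number, register, old value), in write order

    def write(self, seq, reg, value):
        """Record the overwritten value, then perform the register write."""
        self.history.append((seq, reg, self.registers[reg]))
        self.registers[reg] = value

    def rollback(self, faulting_seq):
        """Undo writes made by instructions later in program order than the faulting one."""
        kept = []
        for seq, reg, old_value in reversed(self.history):
            if seq > faulting_seq:
                self.registers[reg] = old_value   # restore, newest write first
            else:
                kept.append((seq, reg, old_value))
        self.history = list(reversed(kept))


if __name__ == "__main__":
    # Sequence numbers: 0 = DIVF, 1 = ADDF, 2 = SUBF (register values are made up).
    hf = HistoryFile({"F0": 0.0, "F8": 2.0, "F10": 1.0, "F12": 5.0, "F14": 3.0})
    hf.write(1, "F10", 3.0)   # ADDF F10, F10, F8 completes early
    hf.write(2, "F12", 2.0)   # SUBF F12, F12, F14 completes early
    hf.rollback(0)            # DIVF raises an exception afterwards
    print(hf.registers["F10"], hf.registers["F12"])
```

After the rollback, F10 and F12 hold their original values again, so the state looks as if no instruction after DIVF had executed.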

Slide 35 The MIPS R4000 Pipeline –R4000 pipeline leads to a 2-cycle load delay

Slide 36 The MIPS R4000 Pipeline –R4000 pipeline leads to a 3-cycle basic branch delay since the condition evaluation is performed during the EX stage

Slide 37 Dynamic Scheduling with Scoreboard
Dynamic Scheduling: hardware rearranges the instruction execution order to reduce stalls:
1. handles situations where dependences are unknown or difficult to detect at compile time, thus simplifying the compiler design;
2. increases portability of the compiled code;
3. solves problems associated with the so-called “head-of-the-queue” (HOTQ) blocking caused by “in-order issue” of earlier pipelines.
Example: MIPS, which is “in-order issue”, can be made to execute “out-of-order” (implying “out-of-order” completion) by splitting ID into two phases:
(1) In-order issue: check for structural hazards.
(2) Read operands: wait until there are no data hazards, then read operands (and then execute, possibly out of order!).
The HOTQ problem above can be solved in this new MIPS!
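
The head-of-the-queue effect can be seen in a toy timing model. Everything here is an assumption for illustration: the three-instruction program (a divide, an add that depends on it, and an independent subtract), the latencies of 10 and 2 cycles, and the function name. The model ignores structural, WAR, and WAW hazards, which a real scoreboard must also check.

```python
# ASSUMPTION: illustrative execution latencies in cycles, not taken from the slides.
LATENCY = {"div": 10, "add": 2}

# Hypothetical program: (name, unit, destination, sources).
PROGRAM = [
    ("DIV.D", "div", "F0", ("F2", "F4")),
    ("ADD.D", "add", "F10", ("F0", "F8")),    # needs F0 from the divide
    ("SUB.D", "add", "F12", ("F8", "F14")),   # independent of both
]


def start_cycles(program, in_order):
    """Cycle in which each instruction begins execution.

    One instruction issues per cycle. With in_order=True an instruction cannot
    start before its predecessor has started; with in_order=False it starts as
    soon as it has issued and its source operands are ready.
    """
    ready = {}     # register -> cycle in which its new value becomes available
    starts = []
    for index, (name, unit, dest, srcs) in enumerate(program):
        operands_ready = max((ready.get(src, 0) for src in srcs), default=0)
        earliest = index
        if in_order and starts:
            earliest = max(earliest, starts[-1] + 1)
        start = max(earliest, operands_ready)
        starts.append(start)
        ready[dest] = start + LATENCY[unit]
    return starts


if __name__ == "__main__":
    print("in-order:    ", start_cycles(PROGRAM, in_order=True))
    print("out-of-order:", start_cycles(PROGRAM, in_order=False))
```

In the in-order case the independent SUB.D waits behind the stalled ADD.D; once issue and operand read are separated, it starts right after it issues.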

Slide 38 Dynamic Scheduling with Scoreboard

Slide 39 Dynamic Scheduling with Scoreboard

Slide 40 Dynamic Scheduling with Scoreboard

Slide 41 Dynamic Scheduling with Scoreboard

Slide 42 Dynamic Scheduling with Scoreboard

Slide 43 Dynamic Scheduling with Scoreboard

Slide 44 Dynamic Scheduling with Scoreboard

Slide 45 Unpipelined Processor (MIPS)

Slide 46 Pipelined Processor (MIPS)

Slide 47 The Eight-stage Pipeline of the R4000

Slide 48 A 2-cycle Load Delay of The R4000 Integer Pipeline