L17 – Pipeline Issues 1 Comp 411 – Fall 2008 11/1308 CPU Pipelining Issues Finishing up Chapter 6 This pipe stuff makes my head hurt! What have you been.

Slides:



Advertisements
Similar presentations
Morgan Kaufmann Publishers The Processor
Advertisements

Pipeline Computer Organization II 1 Hazards Situations that prevent starting the next instruction in the next cycle Structural hazards – A required resource.
Lecture Objectives: 1)Define pipelining 2)Calculate the speedup achieved by pipelining for a given number of instructions. 3)Define how pipelining improves.
Pipeline Issues This pipeline stuff makes my head hurt! Maybe it’s that dumb hat.
Pipelined Processor II (cont’d) CPSC 321
11/1/2005Comp 120 Fall November Exam Postmortem 11 Classes to go! Read Sections 7.1 and 7.2 You read 6.1 for this time. Right? Pipelining then.
Kevin Walsh CS 3410, Spring 2010 Computer Science Cornell University Pipeline Hazards See: P&H Chapter 4.7.
1  1998 Morgan Kaufmann Publishers Chapter Six Enhancing Performance with Pipelining.
Mary Jane Irwin ( ) [Adapted from Computer Organization and Design,
ENEE350 Ankur Srivastava University of Maryland, College Park Based on Slides from Mary Jane Irwin ( )
Pipelining III Andreas Klappenecker CPSC321 Computer Architecture.
1  2004 Morgan Kaufmann Publishers Chapter Six. 2  2004 Morgan Kaufmann Publishers Pipelining The laundry analogy.
1 Chapter Six - 2nd Half Pipelined Processor Forwarding, Hazards, Branching EE3055 Web:
L18 – Pipeline Issues 1 Comp 411 – Spring /03/08 CPU Pipelining Issues Finishing up Chapter 6 This pipe stuff makes my head hurt! What have you.
1 CSE SUNY New Paltz Chapter Six Enhancing Performance with Pipelining.
Lec 9: Pipelining Kavita Bala CS 3410, Fall 2008 Computer Science Cornell University.
1  1998 Morgan Kaufmann Publishers Chapter Six Enhancing Performance with Pipelining.
ENGS 116 Lecture 51 Pipelining and Hazards Vincent H. Berk September 30, 2005 Reading for today: Chapter A.1 – A.3, article: Patterson&Ditzel Reading for.
1  1998 Morgan Kaufmann Publishers Chapter Six. 2  1998 Morgan Kaufmann Publishers Pipelining Improve performance by increasing instruction throughput.
Lecture 15: Pipelining and Hazards CS 2011 Fall 2014, Dr. Rozier.
1 Pipelining Reconsider the data path we just did Each instruction takes from 3 to 5 clock cycles However, there are parts of hardware that are idle many.
Chapter 4 CSF 2009 The processor: Pipelining. Performance Issues Longest delay determines clock period – Critical path: load instruction – Instruction.
Comp Sci pipelining 1 Ch. 13 Pipelining. Comp Sci pipelining 2 Pipelining.
CSE 340 Computer Architecture Summer 2014 Basic MIPS Pipelining Review.
L16 – Pipelining 1 Comp 411 – Spring /20/2011 Pipelining Between 411 problems sets, I haven’t had a minute to do laundry Now that’s what I call dirty.
CS.305 Computer Architecture Enhancing Performance with Pipelining Adapted from Computer Organization and Design, Patterson & Hennessy, © 2005, and from.
CMPE 421 Parallel Computer Architecture
1 Designing a Pipelined Processor In this Chapter, we will study 1. Pipelined datapath 2. Pipelined control 3. Data Hazards 4. Forwarding 5. Branch Hazards.
Computer Organization and Design Building a Computer!
Chapter 6 Pipelined CPU Design. Spring 2005 ELEC 5200/6200 From Patterson/Hennessey Slides Pipelined operation – laundry analogy Text Fig. 6.1.
CECS 440 Pipelining.1(c) 2014 – R. W. Allison [slides adapted from D. Patterson slides with additional credits to M.J. Irwin]
Cs 152 L1 3.1 DAP Fa97,  U.CB Pipelining Lessons °Pipelining doesn’t help latency of single task, it helps throughput of entire workload °Multiple tasks.
Chap 6.1 Computer Architecture Chapter 6 Enhancing Performance with Pipelining.
CSIE30300 Computer Architecture Unit 04: Basic MIPS Pipelining Hsin-Chou Chi [Adapted from material by and
1 (Based on text: David A. Patterson & John L. Hennessy, Computer Organization and Design: The Hardware/Software Interface, 3 rd Ed., Morgan Kaufmann,
Computing Systems Pipelining: enhancing performance.
1  1998 Morgan Kaufmann Publishers Chapter Six. 2  1998 Morgan Kaufmann Publishers Pipelining Improve perfomance by increasing instruction throughput.
Oct. 18, 2000Machine Organization1 Machine Organization (CS 570) Lecture 4: Pipelining * Jeremy R. Johnson Wed. Oct. 18, 2000 *This lecture was derived.
Instructor: Senior Lecturer SOE Dan Garcia CS 61C: Great Ideas in Computer Architecture Pipelining Hazards 1.
Computer Organization and Design Pipelining Montek Singh Mon, Dec 2, 2013 Lecture 16.
Computer Organization and Design Pipelining Montek Singh Dec 2, 2015 Lecture 16 (SELF STUDY – not covered on the final exam)
1. Convert the RISCEE 1 Architecture into a pipeline Architecture (like Figure 6.30) (showing the number data and control bits). 2. Build the control line.
11 Pipelining Kosarev Nikolay MIPT Oct, Pipelining Implementation technique whereby multiple instructions are overlapped in execution Each pipeline.
CSE431 L06 Basic MIPS Pipelining.1Irwin, PSU, 2005 MIPS Pipeline Datapath Modifications  What do we need to add/modify in our MIPS datapath? l State registers.
Introduction to Computer Organization Pipelining.
Lecture 9. MIPS Processor Design – Pipelined Processor Design #1 Prof. Taeweon Suh Computer Science Education Korea University 2010 R&E Computer System.
L17 – Pipeline Issues 1 Comp 411 – Fall /23/09 CPU Pipelining Issues Read Chapter This pipe stuff makes my head hurt! What have you been.
Chapter Six.
Morgan Kaufmann Publishers
ELEN 468 Advanced Logic Design
Single Clock Datapath With Control
Computer Organization and Design Building a Computer!
Chapter 4 The Processor Part 3
Pipelining review.
Pipelining Read Chapter
Building a Computer I wonder where this goes? MIPS Kit ALU 1
Lecture 9. MIPS Processor Design – Pipelined Processor Design #2
How Computers Work Lecture 13 Details of the Pipelined Beta
Building a Computer I wonder where this goes? MIPS Kit ALU 1
Pipelining in more detail
Chapter Six.
Chapter Six.
Computer Organization and Design Pipelining
November 5 No exam results today. 9 Classes to go!
Computer Organization and Design Building a Computer!
CS203 – Advanced Computer Architecture
Building a Computer! Don Porter Lecture 14.
Computer Organization and Design Building a Computer!
Guest Lecturer: Justin Hsia
Presentation transcript:

L17 – Pipeline Issues 1 Comp 411 – Fall /1308 CPU Pipelining Issues Finishing up Chapter 6 This pipe stuff makes my head hurt! What have you been beating your head against?

L17 – Pipeline Issues 2 Comp 411 – Fall /1308 ALU AB ALUFN Data Memory RD WD R/W Adr Wr WDSEL PC+4 Z VNC PC +4 Instruction Memory A D 00 BT PC :J :00 JT PCSEL x x x PC REG 00 IR REG WA Register File RA1RA2 RD1RD2 J: Imm: + x4 BT JT Rt: Rs: ASEL 20 BSEL 01 SEXT shamt: “16” 1 = BZ 5-Stage miniMIPS PC ALU 00 IR ALU A B WD ALU PC MEM 00 IR MEM Y MEM WD MEM WA Register File WA WD WE WERF WASEL Rd: Rt: “31” “27” Instruction Fetch Register File ALU Write Back PC WB 00 IR WB Y WB Memory Address is available right after instruction enters Memory stage Data is needed just before rising clock edge at end of Write Back stage almost 2 clock cycles Omits some details NO bypass or interlock logic

L17 – Pipeline Issues 3 Comp 411 – Fall /1308 Pipelining Improve performance by increasing instruction throughput Ideal speedup is number of stages in the pipeline. Do we achieve this?

L17 – Pipeline Issues 4 Comp 411 – Fall /1308 Pipelining What makes it easy all instructions are the same length just a few instruction formats memory operands appear only in loads and stores What makes it hard? structural hazards: suppose we had only one memory control hazards: need to worry about branch instructions data hazards: an instruction depends on a previous instruction Individual Instructions still take the same number of cycles But we’ve improved the through-put by increasing the number of simultaneously executing instructions

L17 – Pipeline Issues 5 Comp 411 – Fall /1308 Structural Hazards Inst Fetch Reg Read ALUData Access Reg Write Inst Fetch Reg Read ALUData Access Reg Write Inst Fetch Reg Read ALUData Access Reg Write Inst Fetch Reg Read ALUData Access Reg Write

L17 – Pipeline Issues 6 Comp 411 – Fall /1308 Problem with starting next instruction before first is finished dependencies that “go backward in time” are data hazards Data Hazards

L17 – Pipeline Issues 7 Comp 411 – Fall /1308 Have compiler guarantee no hazards Where do we insert the “nops” ? sub$2, $1, $3 and $12, $2, $5 or$13, $6, $2 add$14, $2, $2 sw$15, 100($2) Problem: this really slows us down! Software Solution

L17 – Pipeline Issues 8 Comp 411 – Fall /1308 Use temporary results, don’t wait for them to be written register file forwarding to handle read/write to same register ALU forwarding Forwarding

L17 – Pipeline Issues 9 Comp 411 – Fall /1308 Load word can still cause a hazard: an instruction tries to read a register following a load instruction that writes to the same register. Thus, we need a hazard detection unit to “stall” the instruction Can't always forward

L17 – Pipeline Issues 10 Comp 411 – Fall /1308 Stalling We can stall the pipeline by keeping an instruction in the same stage

L17 – Pipeline Issues 11 Comp 411 – Fall /1308 When we decide to branch, other instructions are in the pipeline! We are predicting “branch not taken” need to add hardware for flushing instructions if we are wrong Branch Hazards

L17 – Pipeline Issues 12 Comp 411 – Fall /1308 Improving Performance Try to avoid stalls! E.g., reorder these instructions: lw $t0, 0($t1) lw $t2, 4($t1) sw $t2, 0($t1) sw $t0, 4($t1) Add a “branch delay slot” the next instruction after a branch is always executed rely on compiler to “fill” the slot with something useful Superscalar: start more than one instruction in the same cycle

L17 – Pipeline Issues 13 Comp 411 – Fall /1308 Dynamic Scheduling The hardware performs the “scheduling” hardware tries to find instructions to execute out of order execution is possible speculative execution and dynamic branch prediction All modern processors are very complicated Pentium 4: 20 stage pipeline, 6 simultaneous instructions PowerPC and Pentium: branch history table Compiler technology important

L17 – Pipeline Issues 14 Comp 411 – Fall /1308 ALU AB ALUFN RD WD R/W Adr Wr WDSEL PC+4 Z VNC PC +4 Instruction Memory A D 00 BT PC :J :00 JT PCSEL x x x PC REG 00 IR REG WA Register File RA1RA2 RD1RD2 J: Imm: + x4 BT JT Rt: Rs: ASEL 20 BSEL 01 SEXT shamt: “16” 1 = BZ 5-Stage miniMIPS PC ALU 00 IR ALU A B WD ALU PC MEM 00 IR MEM Y MEM WD MEM WA Register File WA WD WE WERF WASEL Rd: Rt: “31” “27” Instruction Fetch Register File ALU Write Back PC WB 00 IR WB Y WB Memory We wanted a simple, clean pipeline but… added CLK EN to freeze IF/RF stages so we can wait for lw to reach WB stage NOP Data Memory broke the sequential semantics of ISA by adding a branch delay-slot and early branch resolution logic added A/B bypass muxes to get data before it’s written to regfile

L17 – Pipeline Issues 15 Comp 411 – Fall /1308 Bypass MUX Details RD1RD2 Register File A BypassB Bypass ASEL 02 A ALU BSEL 01 B ALU ’16’SEXT(imm) To ALU WD ALU To Mem JT from ALU/MEM/WB/PC 1 shamt = BZ The previous diagram was oversimplified. Really need for the bypass muxes to precede the A and B muxes to provide the correct values for the jump target (JT), write data, and early branch decision logic.

L17 – Pipeline Issues 16 Comp 411 – Fall /1308 ALU AB ALUFN RD WD R/W Adr Wr WDSEL PC+4 Z VNC PC +4 Instruction Memory A D 00 BT PC :J :00 JT PCSEL x x x PC REG 00 IR REG WA Register File RA1RA2 RD1RD2 J: Imm: + x4 BT JT Rt: Rs: ASEL 20 BSEL 01 SEXT shamt: “16” 1 = BZ Final 5-Stage miniMIPS PC ALU 00 IR ALU A B WD ALU PC MEM 00 IR MEM Y MEM WD MEM WA Register File WA WD WE WERF WASEL Rd: Rt: “31” “27” Instruction Fetch Register File ALU Write Back PC WB 00 IR WB Y WB Memory Data Memory NOP Added branch delay slot and early branch resolution logic to fix a CONTROL hazard Added lots of bypass paths and detection logic to fix various STRUCTURAL hazards Added pipeline interlocks to fix load delay STRUCTURAL hazard NOP

L17 – Pipeline Issues 17 Comp 411 – Fall /1308 Pipeline Summary (I) Started with unpipelined implementation – direct execute, 1 cycle/instruction – it had a long cycle time: mem + regs + alu + mem + wb We ended up with a 5-stage pipelined implementation – increase throughput (3x???) – delayed branch decision (1 cycle) Choose to execute instruction after branch – delayed register writeback (3 cycles) Add bypass paths (6 x 2 = 12) to forward correct value – memory data available only in WB stage Introduce NOPs at IR ALU, to stall IF and RF stages until LD result was ready

L17 – Pipeline Issues 18 Comp 411 – Fall /1308 Pipeline Summary (II) Fallacy #1: Pipelining is easy Smart people get it wrong all of the time! Fallacy #2: Pipelining is independent of ISA Many ISA decisions impact how easy/costly it is to implement pipelining (i.e. branch semantics, addressing modes). Fallacy #3: Increasing Pipeline stages improves performance Diminishing returns. Increasing complexity.