Computer Organization and Design Pipelining Montek Singh Dec 2, 2015 Lecture 16 (SELF STUDY – not covered on the final exam)

Pipelining Between 411 problem sets, I haven't had a minute to do laundry. Now that's what I call dirty laundry!

Laundry Example
 - Device: Washer. Function: Fill, Agitate, Spin. Washer PD = 30 mins
 - Device: Dryer. Function: Heat, Spin. Dryer PD = 60 mins
 - INPUT: dirty laundry. OUTPUT: 4 more weeks

Laundry: One Load at a Time
 - Everyone knows that the real reason one puts off doing laundry so long is not because we procrastinate, are lazy, or even have better things to do. The fact is, doing laundry one load at a time is not smart.
 - Step 1: wash (30 mins). Step 2: dry (60 mins).
 - Total = Washer PD + Dryer PD = 30 + 60 = 90 mins

Laundry: Doing N Loads!
 - Here's how one would do laundry the "unpipelined" way: wash load 1, dry load 1, wash load 2, dry load 2, and so on.
 - Total = N * (Washer PD + Dryer PD) = N * 90 mins

Laundry: Doing N Loads!
 - Here's how to "pipeline" the laundry process: while load 1 dries, load 2 washes, and so on. Much more efficient!
 - Total = N * Max(Washer PD, Dryer PD) = N * 60 mins
 - Actually, it's more like N * 60 + 30 if we account for the startup time (i.e., filling up the pipeline) correctly. When doing pipeline analysis, we're mostly interested in the "steady state", where we assume we have an infinite supply of inputs.

Recall Our Performance Measures
 - Latency: delay from input to corresponding output
     Unpipelined Laundry = 90 mins
     Pipelined Laundry = 120 mins (assuming the wash is started as soon as possible and the load waits, wet, in the washer until the dryer is available)
 - Throughput: rate at which inputs or outputs are processed
     Unpipelined Laundry = 1/90 outputs/min
     Pipelined Laundry = 1/60 outputs/min
 - Even though pipelining increases latency, it takes less time per load.

Pipelining Summary
 - Advantages:
     Higher throughput than a combinational system
     Different parts of the logic work on different parts of the problem
 - Disadvantages:
     Generally increases latency
     Only as good as the *weakest* link (often called the pipeline's BOTTLENECK)
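To make the bottleneck idea concrete, here is a minimal sketch (Python, not part of the original slides) that derives the clock period, latency, and steady-state throughput of a registered pipeline from its stage delays, using the laundry numbers as the worked example and ignoring register overhead:

# Sketch: pipeline timing from stage delays (register setup/clk-to-q overhead ignored)
def pipeline_timing(stage_delays):
    clock = max(stage_delays)              # the slowest stage (the bottleneck) sets the clock
    latency = clock * len(stage_delays)    # every stage now occupies one full clock period
    throughput = 1.0 / clock               # one result per clock in steady state
    return clock, latency, throughput

# Laundry: washer = 30 mins, dryer = 60 mins
print(pipeline_timing([30, 60]))           # (60, 120, 0.0166...) -> 120 min latency, 1/60 loads per min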

Review of CPU Performance
 - MIPS = Millions of Instructions per Second
 - Freq = Clock Frequency, MHz
 - CPI = Clocks per Instruction
 - MIPS = Freq / CPI
To increase MIPS:
 1. DECREASE CPI.
    - RISC simplicity reduces CPI to 1.0.
    - CPI below 1.0? State-of-the-art multiple-instruction-issue machines.
 2. INCREASE Freq.
    - Freq is limited by the delay along the longest combinational path; hence
    - PIPELINING is the key to improving performance.
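As a quick check on the formula (the numbers below are illustrative assumptions, not from the slides), raising Freq or lowering CPI raises MIPS directly; pipelining attacks the Freq term:

# Illustrative only: MIPS = Freq / CPI, with Freq in MHz
for freq_mhz, cpi in [(100, 1.0), (400, 1.0), (400, 1.3)]:
    print(freq_mhz, cpi, freq_mhz / cpi)   # 100.0, 400.0, then ~307.7 MIPS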

Where Are the Bottlenecks?
 - Pipelining goal: break LONG combinational paths, i.e., put the memories and the ALU in separate stages.

Goal: 5-Stage Pipeline
GOAL: Maintain (nearly) 1.0 CPI, but increase clock speed to barely include the slowest components (memories, register file, ALU)
APPROACH: structure the processor as a 5-stage pipeline:
 - IF (Instruction Fetch) stage: maintains the PC, fetches one instruction per cycle and passes it to ...
 - ID/RF (Instruction Decode/Register File) stage: decodes control lines and selects source operands, passing them to ...
 - ALU stage: performs the specified operation, passes the result to ...
 - MEM (Memory) stage: if it's a lw, uses the ALU result as an address, passes memory data (or the ALU result if not a lw) to ...
 - WB (Write-Back) stage: writes the result back into the register file.
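To see how instructions occupy these five stages over time, here is a small sketch (Python, an illustration rather than anything in the slides) that prints the classic pipeline diagram for a hazard-free instruction sequence; the instruction strings are made up:

# Sketch: cycle-by-cycle stage occupancy for a hazard-free 5-stage pipeline
STAGES = ["IF", "ID/RF", "ALU", "MEM", "WB"]

def pipeline_diagram(instructions):
    n_cycles = len(instructions) + len(STAGES) - 1
    for i, instr in enumerate(instructions):
        row = ["     "] * n_cycles
        for s, stage in enumerate(STAGES):
            row[i + s] = f"{stage:<5}"      # instruction i is in stage s during cycle i + s
        print(f"{instr:<16}" + " ".join(row))

pipeline_diagram(["add $5, $1, $3", "or  $6, $7, $8", "sub $9, $10, $11"])

With enough instructions in flight, one completes every cycle even though each individual instruction still spends five cycles in the machine.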

5-Stage miniMIPS
[Datapath diagram: the miniMIPS datapath (PC + 4 logic, instruction memory, register file, ALU, data memory, and write-back muxes) cut into Instruction Fetch, Register File, ALU, Memory, and Write-Back stages by PC/IR pipeline registers between the stages. Omits some details.]
 - The memory address is available right after the instruction enters the Memory stage.
 - The data is needed just before the rising clock edge at the end of the Write-Back stage.

Pipelining
 - Improve performance by increasing instruction throughput
 - Ideal speedup is the number of stages in the pipeline. Do we achieve this?
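For intuition about the ideal (the picosecond figures below are illustrative assumptions, not slide numbers): splitting an 800 ps single-cycle datapath into five perfectly balanced 160 ps stages lets the clock run 5x faster, but the pipeline still needs 4 extra cycles to fill, and hazards plus unbalanced stages push real speedups below 5x.

# Illustrative: speedup of a 5-stage pipeline over single-cycle execution
N = 1_000_000                          # instructions executed
single_cycle_time = N * 800            # ps: one 800 ps cycle per instruction
pipelined_time = (N + 4) * 160         # ps: 160 ps clock, plus 4 cycles to fill the pipeline
print(single_cycle_time / pipelined_time)   # ~5.0, the ideal; hazards reduce this in practice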

Pipelining
 - What makes it easy?
     all instructions are the same length
     just a few instruction formats
     memory operands appear only in loads and stores
 - What makes it hard?
     structural hazards: suppose we had only one memory
     control hazards: need to worry about branch instructions
     data hazards: an instruction depends on a previous instruction
 - Net effect:
     Individual instructions still take the same number of cycles
     But throughput improves by increasing the number of simultaneously executing instructions

Data Hazards
 - Problem with starting the next instruction before the first is finished: dependencies that "go backward in time" are data hazards.

Software Solution
 - Have the compiler guarantee no hazards. Where do we insert the "nops"? Between "producing" and "consuming" instructions!
     sub $2, $1, $3
     and $12, $2, $5
     or  $13, $6, $2
     add $14, $2, $2
     sw  $15, 100($2)
   Here the sub produces $2 and the and/or that follow consume it, so the nops go right after the sub, before the and.
 - Problem: this really slows us down!

Forwarding
 - Bypass/forward results as soon as they are produced/needed. Don't wait for them to be written back into registers!
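As a concrete sketch of the idea (a Python simplification, not the slides' hardware; the pipeline-register and field names are my own shorthand for the registers after the ALU and MEM stages), the forwarding unit compares destination registers of older, still-in-flight instructions against the source register of the instruction now in the ALU stage:

# Sketch: forwarding-unit choice for one ALU source operand
def forward_select(rs, alu_mem, mem_wb):
    # Newest result wins: take the value sitting in the ALU/MEM pipeline register first...
    if alu_mem["reg_write"] and alu_mem["rd"] != 0 and alu_mem["rd"] == rs:
        return "ALU/MEM result"
    # ...then the value about to be written back from MEM/WB...
    if mem_wb["reg_write"] and mem_wb["rd"] != 0 and mem_wb["rd"] == rs:
        return "MEM/WB result"
    # ...otherwise the value read from the register file is already correct.
    return "register file"

# sub $2,$1,$3 has just left the ALU stage; and $12,$2,$5 is now in the ALU stage and needs $2:
print(forward_select(2, {"reg_write": True, "rd": 2}, {"reg_write": False, "rd": 0}))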

Can't always forward
 - A load word can still cause a hazard: an instruction tries to read a register following a load instruction that writes to the same register. STALL!

Stalling
 - When needed, stall the pipeline by keeping an instruction in the same stage for an extra clock cycle.
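A minimal sketch of the load-use hazard check (again Python, with field names of my own choosing rather than the slides'): if the instruction in the ALU stage is a load whose destination matches either source of the instruction being decoded, stall for one cycle by inserting a bubble; after that, forwarding can supply the value.

# Sketch: load-use hazard detection, checked during decode
def must_stall(alu_stage, decode_stage):
    # Stall if the instruction in the ALU stage is a load whose destination
    # register is a source of the instruction currently being decoded.
    return (alu_stage["mem_read"] and
            alu_stage["dest"] in (decode_stage["rs"], decode_stage["rt"]))

# lw $2, 20($1) is in the ALU stage; and $4, $2, $5 is being decoded:
if must_stall({"mem_read": True, "dest": 2}, {"rs": 2, "rt": 5}):
    print("stall")   # hold the PC and the fetched instruction for one cycle, insert a bubble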

Branch Hazards
 - When branching, other instructions are already in the pipeline! We need to add hardware for flushing those instructions if we guessed wrong.
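To get a rough feel for the cost (all numbers below are made-up assumptions, not from the slides): if 20% of instructions are branches and each wrong guess flushes three instructions, the effective CPI rises from 1.0 as shown here.

# Illustrative: effect of branch hazards on CPI (all numbers are assumptions)
branch_frac, flush_penalty = 0.20, 3     # 20% branches, 3 flushed instructions per miss
for miss_rate in (0.5, 0.1):             # e.g., a naive guess vs. a decent branch predictor
    cpi = 1.0 + branch_frac * miss_rate * flush_penalty
    print(miss_rate, cpi)                # 0.5 -> 1.3, 0.1 -> 1.06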

Pipeline Summary
 - A very common technique to improve the throughput of any circuit; used in all modern processors!
 - Fallacies:
     "Pipelining is easy." No, smart people get it wrong all of the time!
     "Pipelining is independent of ISA." No, many ISA decisions impact how easy/costly it is to implement pipelining (e.g., branch semantics, addressing modes).
     "Increasing pipeline stages improves performance." No, returns diminish because of increasing complexity.