Pipelining Intro
Computer Organization, Computer Science Dept, Virginia Tech, January 2006
©2006 McQuain & Ribbens

Basic Instruction Timings

Making some assumptions regarding the operation times for some of the basic hardware units in our datapath, we have the following timings:

Instruction class   Instruction fetch   Register read   ALU operation   Data access   Register write   Total time
lw                  200 ps              100 ps          200 ps          200 ps        100 ps           800 ps
sw                  200 ps              100 ps          200 ps          200 ps                         700 ps
R-format            200 ps              100 ps          200 ps                        100 ps           600 ps
beq                 200 ps              100 ps          200 ps                                         500 ps

How long would it take to execute the following sequence of instructions?

   lw $1, 100($0)
   lw $2, 200($0)
   lw $3, 300($0)

But maybe there's a way we can cheat and complete the sequence faster.
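
Using the table above, and treating each instruction as running to completion before the next one begins (the single-cycle view assumed so far), the answer is just the sum of the three load times:

   3 × 800 ps = 2400 ps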

Basic Idea

What if we think of the datapath as a linear sequence of stages? Can we operate the stages independently, using an earlier one to begin the next instruction before the previous one has completed? (Note: the figure shows the single-cycle datapath.) We have 5 stages, which means that on any given cycle up to 5 different instructions will be at various points of execution.

Pipelining

Improve performance by increasing instruction throughput. We've only considered unimaginative execution; consider our longest instruction. Ideal speedup is the number of stages in the pipeline. Do we achieve this?

Pipelining

The average time between initiating instructions has dropped to 200 ps. Why do we have idle "gaps"?

Details. Assume:
-register file write occurs in the first half of a cycle
-register file read occurs in the second half of a cycle

Total time here is 1400 ps versus 2400 ps for the original version… but consider how this would look if we had 1,000,000 more lw instructions in our sequence.
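
A rough calculation, assuming the pipeline keeps issuing one lw every 200 ps cycle:

   non-pipelined: 1,000,003 × 800 ps ≈ 800,000,000 ps
   pipelined:     (1,000,003 + 4) × 200 ps ≈ 200,000,000 ps

The speedup approaches 800/200 = 4, the ratio of the full instruction time to the slowest stage time, rather than the ideal factor of 5, because the stage times are not perfectly balanced.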

MIPS add Pipeline

Here's how the pipeline stages would look from the perspective of an add instruction. Shading indicates when the instruction is using a particular hardware resource. Note that the computed result isn't written into the register file until the 5th stage. What if the next instruction needs the result from the add instruction? Depending on when the result is needed, we may have to stall the pipeline until the result becomes available.

Pipelining

What makes it easy?
-all instructions are the same length
-just a few instruction formats
-memory operands appear only in loads and stores

What makes it hard?
-structural hazards: suppose we had only one memory
-control hazards: need to worry about branch instructions
-data hazards: an instruction depends on a previous instruction

We'll build a simple pipeline and look at these issues. We'll talk about modern processors and what really makes it hard:
-exception handling
-trying to improve performance with out-of-order execution, etc.

Pipeline Hazards

In some cases, the next instruction cannot execute in the following clock cycle. We will introduce some of the potential issues in the next few slides.

structural hazard
-hardware cannot support the necessary combination of operations at once
-reconsider the earlier example with a single memory unit and a fourth lw instruction

data hazard
-data that is necessary to execute the instruction is not yet available
-consider:
   add $s0, $t0, $t1
   sub $t2, $s0, $s3
-a load-use hazard occurs when data imported by a load instruction is not available when it is requested (see the example below)
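
For illustration (the register names here are arbitrary, not from the slide), a load-use pair looks like this:

   lw  $s0, 0($t1)      # $s0 is not available until the load's MEM stage completes
   sub $t2, $s0, $t3    # wants $s0 one cycle earlier, in its EX stage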

Data Hazard Example: Forwarding

Here, the second instruction needs the final result from the first instruction during the register fetch portion of the instruction decode phase. Obviously the value will not be available in register $s0 until the first instruction has completed. However, the computed value IS actually available after the first instruction finishes its third stage, just in time to satisfy the need of the ALU when the second instruction reaches its third stage. This is indicated in the figure by a forwarding link. In principle, the hazard here could be detected and handled. But what if the "forwarding" link actually went backwards?
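
A sketch of the timing, assuming the 5-stage pipeline and the add/sub pair from the earlier slide:

   cycle:               1    2    3    4    5    6
   add $s0, $t0, $t1    IF   ID   EX   MEM  WB
   sub $t2, $s0, $s3         IF   ID   EX   MEM  WB

The add produces $s0 at the end of cycle 3 (its EX stage); the sub needs it at the start of cycle 4 (its EX stage), so the value can be forwarded from the EX/MEM pipeline register straight to the ALU input.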

Data Hazard Example: Stalling

Here the first instruction is a load, and its result simply won't be available in time. As indicated, this can be resolved by stalling the pipeline, delaying the initiation of the second instruction for 1 cycle. Again, if we can detect this situation, we can in principle impose the solution shown in the figure. A pipeline stall is often referred to as a bubble.
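
A sketch of the stalled timing, assuming forwarding is also available and using illustrative register names:

   cycle:               1    2    3    4    5    6    7
   lw  $s0, 0($t1)      IF   ID   EX   MEM  WB
   sub $t2, $s0, $t3         IF   ID   --   EX   MEM  WB      (-- = bubble)

The loaded value appears at the end of the lw's MEM stage (cycle 4); after the one-cycle bubble it can be forwarded into the sub's EX stage in cycle 5.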

Control Hazards: Stall on Branch

A control hazard occurs when the instruction that was fetched is not the one that is needed. Note that our pipeline discussion so far assumes sequential execution; when the current instruction is a conditional branch, this may be incorrect. One approach would be to stall when a branch instruction is discovered, until the necessary computations are completed, and then fetch the correct instruction next.
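
A sketch of what the stall looks like, assuming (a common simplification) that extra hardware resolves the branch by the end of its ID stage, so each branch costs one bubble:

   cycle:               1    2    3    4    5    6
   beq $1, $2, 40       IF   ID   EX   MEM  WB
   (bubble)                  --
   next instruction               IF   ID   EX   ...

The instruction after the branch cannot be fetched until cycle 3, once the branch outcome is known.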

Control Hazards: Branch Prediction

A second approach is to predict whether the branch will be taken and load the corresponding instruction into the pipeline. If we guess that the branch will NOT be taken, we just increment the PC, fetch, and proceed. In the example shown, this worked out perfectly: there was no delay… however, what if the branch HAD been taken? More sophisticated variants actually retain information (history) about individual branch instructions and use that history to predict future behavior.

Control Hazards: Delayed Branch

A third approach is to delay the time at which the branch takes effect, by always executing the next sequential instruction following a branch instruction and then making the branch (if necessary) immediately after that one-instruction delay. The assembler accomplishes this automatically by placing an instruction immediately after the branch instruction that is not affected by the branch. This approach is used in the MIPS architecture.

   ;; programmer writes:
   add $4, $5, $6
   beq $1, $2, 40
   or  $7, $8, $9

   ;; assembler writes:
   beq $1, $2, 40
   add $4, $5, $6
   or  $7, $8, $9

Of course, it's not always that simple. What would we do if the add instruction had stored its result in one of the registers used by the beq instruction?
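
In that case the add could not be moved into the delay slot. One common fallback (a sketch, not from the slide) is for the assembler to fill the slot with a no-op instead:

   ;; if the add wrote $1 or $2, the assembler could emit:
   add $4, $5, $6
   beq $1, $2, 40
   nop              # delay slot filled with a no-op
   or  $7, $8, $9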

Software Solution

Have the assembler guarantee no hazards. One approach would be to rearrange statements; another would be to insert no-op (no operation) statements to induce the necessary stalls. Where do we insert the "no-ops"?

   sub $2, $1, $3
   and $12, $2, $5
   or  $13, $6, $2
   add $14, $2, $2
   sw  $15, 100($2)

Problem: this really slows us down!
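
One possible placement (a sketch, assuming no forwarding and the earlier assumption that the register file writes in the first half of a cycle and reads in the second half): both the and and the or would read $2 before the sub has written it, so two no-ops after the sub are enough:

   sub $2, $1, $3
   nop                 # wait for $2 to be written
   nop
   and $12, $2, $5     # reads $2 in the same cycle the sub writes it
   or  $13, $6, $2
   add $14, $2, $2
   sw  $15, 100($2)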

Check Yourself

What difficulties can you identify in the following code sequences?

   ;; 1
   lw  $t0, 0($t0)
   add $t1, $t0, $t0

   ;; 2
   add  $t1, $t0, $t0
   addi $t2, $t0, 5
   addi $t4, $t1, 5

   ;; 3
   addi $t1, $t0, 1
   addi $t2, $t0, 2
   addi $t3, $t0, 3
   addi $t4, $t0, 4
   addi $t5, $t0, 5

Answers:
1. The result of the lw is needed by the add during its second cycle; a stall is needed.
2. The result of the add is needed by the second addi during its second cycle, but isn't written to the register file until the next cycle; however, we can forward the value, since it's been computed two cycles before it's written.
3. No problems here.

Basic Idea Redux

What do we need to add/modify to actually split the datapath into stages? Instructions and data generally move from left to right. Two exceptions:
-write-back of data to the register file
-selecting the next value for the PC (incremented PC versus branch address)

The cases where data flows right to left do not affect the current instruction; rather, they affect later instructions. The first case can lead to a data hazard; the second can lead to a control hazard.

Analysis

Consider a time line showing overlapping pipeline logic for a set of instructions. Problem: the original contents of the IR will be lost when the next instruction is fetched, but those original contents are needed at a later cycle as well. (Why?) So, how do we fix this? Basically, we need the ability to preserve the results generated in each stage until they are actually needed. So, we can add a bank of storage locations between each pair of stages.

Datapath with Pipeline Registers

Here's a first attempt; we just add an unspecified amount of storage between datapath stages. Callouts on the figure:
-the IR is embedded here
-the original IR contents are passed forward… for later use
-the incremented PC value is passed forward

How large must the IF/ID register storage be? The next order of business is to examine the other inter-stage registers.
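
Working it out for the IF/ID register (assuming the 32-bit MIPS datapath used throughout these slides): it must hold the 32-bit fetched instruction plus the 32-bit incremented PC, i.e.

   32 + 32 = 64 bits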

Boundary Analysis

Let's consider the "boundaries". No pipeline register is needed after the WB stage. Why? What about the PC? In effect it IS a pipeline register, feeding the IF stage. The next order of business is to examine the other inter-stage registers.

Load Instruction Analysis: IF

The instruction is fetched into the pipeline register. The PC is incremented by 4; the result is written back into the PC, but also into the IF/ID pipeline register in case it is needed later (we don't know what the instruction actually is yet). Register shading indicates whether a write (left half) or a read (right half) is occurring.

Load Instruction Analysis: ID

The register read numbers are supplied to the register file. The 16-bit immediate field is supplied to the sign-extender. The values read from the register file and the extended immediate field are stored in the next pipeline register. The incremented PC value is also passed forward to the next-stage pipeline register.

Load Instruction Analysis: EX

The contents of the first read register and the sign-extended immediate are sent to the ALU from the pipeline register. The resulting sum is then placed into the next-stage pipeline register. The incremented PC value is NOT carried forward… why?

Load Instruction Analysis: MEM

The computed address is passed from the pipeline register to the memory unit. The retrieved data is written into the next-stage pipeline register.

Load Instruction Analysis: WB

The data is retrieved from the pipeline register and written into the register file.

Load Instruction Analysis

So, what have we learned? One key point: a logical component of the datapath, like the ALU, must be used only in a single pipeline stage… otherwise, we have a structural hazard. If you were paying very close attention, we've uncovered a bug in the proposed handling of a load instruction. Take another look at what happens in the final stage… where does the number of the write register come from? Alas, we will no longer have the original instruction in the IF/ID pipeline register, and so we won't have the information we need. Solution: pass the write register number forward to the MEM/WB pipeline register, so it is still available during the final stage.
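
One way to picture the fix (a sketch, assuming the standard MIPS field layout, where a load's destination register is the rt field, bits 20-16 of the instruction):

   IF/ID.instruction[20-16] -> ID/EX -> EX/MEM -> MEM/WB -> register file "Write register" input

Each pipeline register simply latches the 5-bit register number along with its other contents, so the number arrives at the WB stage in the same cycle as the loaded data.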

Summary

Further, similar analysis of other instructions leads to a corrected, but incomplete, pipeline design. An important question is just how much storage each pipeline register must provide; that is left to the reader.

Pipeline Control

We have 5 stages. What needs to be controlled in each stage?
-Instruction Fetch and PC Increment
-Instruction Decode / Register Fetch
-Execution
-Memory Stage
-Write Back

How would control be handled in an automobile plant?
-a fancy control center telling everyone what to do?
-should we use a finite state machine?

Pipeline Control

Identify the necessary control signals:
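
One common grouping (a sketch, reusing the control signals of the single-cycle design; nothing special is needed for IF and ID, since the same things happen there every cycle):

   EX stage:  RegDst, ALUOp, ALUSrc
   MEM stage: Branch, MemRead, MemWrite
   WB stage:  MemtoReg, RegWrite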

Pipeline Control

Pass control signals along just like the data.

Datapath with Control