S. Barua – CPSC 440, Chapter 6: Enhancing Performance with Pipelining

Presentation transcript:

Chapter 6: Enhancing Performance with Pipelining
This chapter presents pipelining as a means of improving performance.
Topics to be covered:
• Pipeline concept, its potential for speedup, and the need for balance among the pipeline stages
• Pipeline hazards
• Techniques to resolve hazards

What is Pipelining
Pipelining is an implementation technique that overlaps the execution of multiple instructions.
• An instruction is broken into smaller steps.
• Each smaller step (a pipeline stage or pipeline segment) takes a fraction of the time needed to complete the entire instruction.

Example (without pipelining)
Consider the lw instruction for the multicycle implementation discussed in Chapter 5. The operation times for the major functional units in the implementation are as follows:
  Memory units:  200 ps (for read and write)
  ALU:           200 ps
  Register file: 100 ps (for read and write)
Assume that the multiplexors, control unit, PC access, and sign-extension unit have no delay.

Example (without pipelining) - Continued
Five steps are involved in fetching and executing lw. The time taken to complete each step is as follows:
  Instruction fetch: 200 ps
  Register read:     100 ps (for the base value)
  ALU:               200 ps (for the memory address)
  Memory read:       200 ps (for reading data from memory)
  Register write:    100 ps (for the register write)
Execution time for one lw instruction = 200 + 100 + 200 + 200 + 100 = 800 ps
Execution time for a sequence of 3 lw instructions = 3 × 800 = 2400 ps

Example (with pipelining)
Since the lw instruction is divided into five steps, a 5-stage pipeline is employed.
• Each pipeline stage takes one clock cycle.
• The clock cycle for a pipeline stage must be long enough to accommodate the slowest operation (200 ps in our example).
As Figure 6.3 shows, every stage in the pipelined example takes the full 200 ps clock cycle, so the first lw instruction completes after 5 × 200 = 1000 ps, and each additional lw instruction adds 200 ps to the total execution time. Thus, the total execution time for the sequence of 3 lw instructions is 1000 + 2 × 200 = 1400 ps, compared with 2400 ps without pipelining.

Figure 6.3 Nonpipelined versus pipelined execution of 3 lw instructions
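The timing comparison in this example can be reproduced with a short calculation. The sketch below is only an illustration: the function names are hypothetical, and it assumes that every pipeline stage is stretched to the slowest step (200 ps), so the first instruction occupies five full cycles and each later instruction finishes one cycle after its predecessor.

```python
# Step latencies in picoseconds, taken from the lw example in the slides.
STEP_PS = {
    "instruction fetch": 200,
    "register read": 100,
    "ALU": 200,
    "memory read": 200,
    "register write": 100,
}


def nonpipelined_time_ps(n_instructions: int) -> int:
    """Each instruction runs start to finish before the next one begins."""
    if n_instructions <= 0:
        return 0
    return n_instructions * sum(STEP_PS.values())


def pipelined_time_ps(n_instructions: int) -> int:
    """One instruction completes per cycle once the pipeline has filled.

    Assumption: the clock cycle equals the slowest step, and there are
    no hazards or stalls.
    """
    if n_instructions <= 0:
        return 0
    cycle = max(STEP_PS.values())
    n_stages = len(STEP_PS)
    return (n_stages + n_instructions - 1) * cycle


if __name__ == "__main__":
    for n in (1, 3, 1000):
        print(n, nonpipelined_time_ps(n), pipelined_time_ps(n))
```

For a single instruction the pipelined version is actually slower (1000 ps versus 800 ps); the benefit appears only as more instructions overlap.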

Pipeline Performance - Summary
• Pipelining does not reduce the execution time of an individual instruction.
• Pipelining improves performance by increasing instruction throughput.
• The pipelined processor has a lower average CPI than a multicycle implementation with the same clock rate.
• The pipelined processor has a lower product of clock cycle time and CPI than the single-cycle implementation.
• Ideal speedup is approximately equal to the number of pipeline stages when the stages are balanced and the instruction count is large.
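The relationship between speedup, stage count, and stage balance can be written out explicitly. The formula below is a sketch under stated assumptions, not something given in the slides: the symbols are introduced here for illustration, with k stages whose individual latencies are t_1 ... t_k, n instructions, no hazards, and a clock cycle equal to the slowest stage.

```latex
\[
S(n) \;=\; \frac{n \sum_{i=1}^{k} t_i}{(k + n - 1)\,\max_i t_i},
\qquad
\lim_{n \to \infty} S(n) \;=\; \frac{\sum_{i=1}^{k} t_i}{\max_i t_i} \;\le\; k .
\]
```

Equality with k holds only when all t_i are equal. With the lw latencies (200, 100, 200, 200, 100 ps), S(3) = 2400/1400, roughly 1.71, and the limit is 800/200 = 4 rather than 5, which is why balance among the stages matters.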

Need for Registers Between Pipeline Stages
Registers are needed between the pipeline stages:
• To store the value(s) generated by each pipeline stage
• To allow the datapath to be shared by other instructions in the pipeline
All instructions advance during each clock cycle from one pipeline register to the next.
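The way instructions advance from one pipeline register to the next can be sketched with a small simulation. This is a simplified model for illustration only: the stage names IF, ID, EX, MEM, and WB and the function name are assumptions, each list slot stands in for whatever a real pipeline register would hold for the instruction in that stage, and no hazards are modeled.

```python
STAGES = ["IF", "ID", "EX", "MEM", "WB"]


def simulate(instructions):
    """Return one dict per clock cycle mapping stage name -> instruction.

    regs[i] holds the instruction currently occupying stage i
    (None means the stage is empty in that cycle).
    """
    regs = [None] * len(STAGES)
    pending = list(instructions)
    history = []
    while pending or any(r is not None for r in regs):
        # On each clock edge every instruction moves one stage forward,
        # the instruction in the last stage leaves the pipeline, and a
        # new instruction (if any) enters the first stage.
        regs = [pending.pop(0) if pending else None] + regs[:-1]
        if all(r is None for r in regs):
            break  # the pipeline has drained
        history.append(dict(zip(STAGES, regs)))
    return history


if __name__ == "__main__":
    for cycle, state in enumerate(simulate(["lw1", "lw2", "lw3"]), start=1):
        print(cycle, state)
```

Three instructions occupy the pipeline for 5 + 3 - 1 = 7 cycles in this model.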

Pipeline Hazards
Hazard: a situation in pipelining when the next instruction cannot execute in the next clock cycle.
Three types of hazards:
• Structural hazard
• Data hazard
• Control (branch) hazard

Structural Hazard
The hardware cannot support the combination of instructions that we want to execute in the same clock cycle.
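One common illustration of a structural hazard, not spelled out in the slides, is a pipeline with a single memory shared by instruction fetch and data access. The sketch below assumes a 5-stage pipeline in which instruction i is fetched in cycle i + 1 and reaches its memory stage three cycles later; the function name and the set of memory opcodes are hypothetical.

```python
MEMORY_OPS = ("lw", "sw")


def memory_conflicts(opcodes):
    """List (cycle, data_access_index, fetch_index) conflicts.

    Assumptions: one instruction is issued per cycle with no stalls,
    instruction i (0-based) is in IF during cycle i + 1 and in MEM
    during cycle i + 4, and a single memory serves both stages.
    """
    conflicts = []
    for i, op in enumerate(opcodes):
        fetch_index = i + 3  # the instruction being fetched while i is in MEM
        if op in MEMORY_OPS and fetch_index < len(opcodes):
            conflicts.append((i + 4, i, fetch_index))
    return conflicts


if __name__ == "__main__":
    print(memory_conflicts(["lw", "add", "sub", "and"]))
```

With separate instruction and data memories the conflict disappears, since the two stages no longer compete for the same resource.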

Data Hazards
A data hazard can occur when an instruction in the pipeline depends on data produced by an earlier instruction that is still in the pipeline.
Consider the following sequence of instructions:
  add $s0, $t0, $t1
  sub $t2, $s0, $t3
The sub instruction depends on the result that the add instruction writes to register $s0.
Now consider the following sequence of instructions:
  lw  $s0, 20($t1)
  sub $t2, $s0, $t3
The data required by the sub instruction is available only after the fourth stage (memory read) of the lw instruction.
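The dependence in these two sequences can be detected mechanically by comparing each instruction's source registers with the destination registers of the instructions just ahead of it. The sketch below is illustrative only: the Instr type, the function name, and the window parameter (how many preceding instructions are checked) are assumptions, and it ignores cases such as an intervening overwrite of the same register.

```python
from typing import NamedTuple, Tuple


class Instr(NamedTuple):
    op: str
    dest: str
    srcs: Tuple[str, ...]


def raw_dependences(program, window=2):
    """Return (producer_index, consumer_index, register) triples.

    A triple is reported when an instruction reads a register written
    by one of the `window` instructions immediately before it.
    """
    found = []
    for j, consumer in enumerate(program):
        for i in range(max(0, j - window), j):
            producer = program[i]
            if producer.dest in consumer.srcs:
                found.append((i, j, producer.dest))
    return found


if __name__ == "__main__":
    program = [
        Instr("add", "$s0", ("$t0", "$t1")),
        Instr("sub", "$t2", ("$s0", "$t3")),
    ]
    print(raw_dependences(program))
```

For the lw example, the base register $t1 is the only source of the load, and $s0 is its destination, so the same check flags the sub that follows it.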

Data Hazard - Solutions
Two methods are used to resolve a data hazard.
• Forwarding or bypassing: Retrieves the missing data element from internal buffers instead of waiting for it to arrive from the register or memory location specified by the instruction (Figure 6.5).
• Pipeline stall (bubble): Stalls the pipeline by the required number of stages. This guarantees correct execution but can lower performance. In our example (lw followed by sub), we would have to stall by one stage even with forwarding, because the loaded value is not available until after the fourth stage (Figure 6.6).

Figure 6.5 Forwarding or bypassing

Figure 6.6 Pipeline stall (bubble)
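The one-stage stall for lw followed by a dependent instruction can be counted for a whole instruction sequence. The sketch below assumes a pipeline with forwarding in which the only remaining data stall is this load-use case; the Instr type and the function name are hypothetical.

```python
from typing import NamedTuple, Tuple


class Instr(NamedTuple):
    op: str
    dest: str
    srcs: Tuple[str, ...]


def load_use_stalls(program):
    """Count one-cycle bubbles needed when forwarding is available.

    Assumption: a bubble is required only when an instruction reads
    the register loaded by the lw immediately before it.
    """
    stalls = 0
    for prev, nxt in zip(program, program[1:]):
        if prev.op == "lw" and prev.dest in nxt.srcs:
            stalls += 1
    return stalls


if __name__ == "__main__":
    program = [
        Instr("lw", "$s0", ("$t1",)),
        Instr("sub", "$t2", ("$s0", "$t3")),
    ]
    print(load_use_stalls(program))
```

Placing an independent instruction between the lw and the sub removes the bubble in this model, which is the idea behind instruction scheduling.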

Control (Branch) Hazards
In a pipeline, an instruction is fetched every clock cycle to keep the pipeline full. If the instruction fetched is a branch, the decision about whether to branch is not made until the memory pipeline stage, so the correct instructions to fetch in the meantime are not yet known. The delay in determining the proper instruction to fetch is called a “control hazard” or “branch hazard”.

Resolving Branch Hazards
Techniques employed are:
• Always stall: The pipeline is stalled until the branch is complete. The penalty is several clock cycles.
• Assume branch not taken: Instructions following the branch continue to be fetched and executed on the assumption that the branch will not be taken. If the branch is taken, the instructions that are being fetched and decoded are discarded (flushed).
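The cost of the two techniques can be compared with a simple effective-CPI estimate. This is a back-of-the-envelope sketch: the function names are hypothetical, the base CPI of 1 assumes no other stalls, and the branch fraction, taken fraction, and penalty used in the usage lines are made-up values rather than measurements.

```python
def cpi_always_stall(branch_fraction, penalty_cycles, base_cpi=1.0):
    """Every branch pays the full penalty."""
    return base_cpi + branch_fraction * penalty_cycles


def cpi_assume_not_taken(branch_fraction, taken_fraction,
                         penalty_cycles, base_cpi=1.0):
    """Only taken branches pay the penalty (their successors are flushed)."""
    return base_cpi + branch_fraction * taken_fraction * penalty_cycles


if __name__ == "__main__":
    # Hypothetical workload: 20% branches, half of them taken,
    # 3-cycle penalty.
    print(cpi_always_stall(0.2, 3))
    print(cpi_assume_not_taken(0.2, 0.5, 3))
```

Under these assumed numbers, assuming the branch is not taken halves the branch overhead, because untaken branches proceed without any lost cycles.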

Performance of Pipelined Systems
Pipelining reduces the average execution time per instruction, thereby improving system performance. Hazards limit the performance improvement, but appropriate hardware and software techniques can be devised to circumvent these limits.

Superscalar Technique
The internal components of the computer are replicated so that the processor can launch multiple instructions in every pipeline stage. Launching multiple instructions per stage allows the instruction execution rate to exceed the clock rate (CPI < 1).
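The claim that the instruction execution rate can exceed the clock rate follows directly from CPI dropping below 1. The sketch below shows the arithmetic for an idealized case; the function names are hypothetical, and it assumes every issue slot is filled in every cycle, which real programs rarely achieve.

```python
def ideal_cpi(issue_width):
    """Best-case CPI when `issue_width` instructions launch every cycle."""
    return 1.0 / issue_width


def instructions_per_second(clock_hz, cpi):
    """Instruction execution rate implied by a clock rate and a CPI."""
    return clock_hz / cpi


if __name__ == "__main__":
    # Hypothetical 1 GHz processor that launches 2 instructions per cycle.
    cpi = ideal_cpi(2)
    print(cpi, instructions_per_second(1e9, cpi))
```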