CS 352 : Computer Organization and Design University of Wisconsin-Eau Claire Dan Ernst Pipelining Basics.

Presentation transcript:

The 5 Cycles in MIPS
MIPS steps:
1. Fetch the instruction from RAM
2. Decode and read the registers
3. Execute the operation or calculate the effective address
4. Read/write RAM; store the registers
5. Save a RAM read into registers
Pipelining principle: multiple instructions are overlapped in execution
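
The overlap principle can be sketched in a few lines of Python. This is only an illustration, assuming an ideal five-stage pipeline with no stalls; the function names `pipeline_diagram` and `total_cycles` and the stage letters F, D, E, M, W (as used in the diagrams later in these slides) are choices made for this sketch.

```python
STAGES = "FDEMW"  # fetch, decode, execute, memory, write-back


def pipeline_diagram(names):
    """Return one text row per instruction, each shifted one cycle to the right."""
    width = max(len(name) for name in names)
    rows = []
    for i, name in enumerate(names):
        # Instruction i starts its F stage i cycles after the first one,
        # so pad i empty cycles before its stages.
        cells = [" "] * i + list(STAGES)
        rows.append(f"{name:<{width}}  " + " ".join(cells))
    return rows


def total_cycles(n_instructions, n_stages=len(STAGES)):
    """Cycles needed to finish n instructions when nothing stalls."""
    return n_stages + n_instructions - 1


for row in pipeline_diagram(["lw", "add", "sw"]):
    print(row)
print(total_cycles(3), "cycles")
```

Each row starts one cycle after the previous one, so three instructions finish in 7 cycles instead of 15.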

Basic Pipelining History
CDC 6600
– One of the first pipelined processors
– Dates back to 1964
– Designed by Seymour Cray
Most modern CPUs, even in PCs and embedded chips, now include pipelining.

Performance Possibilities
Consider 1000 instructions to be pipelined
Single-cycle machine / non-pipelined
– CCT (clock cycle time) = 8 ns due to longest datapath
– CPI = 1 but 8 ns per instruction
– 8 ns * 1000 = 8000 ns
Multi-cycle machine / pipelined
– CCT = 2 ns due to longest stage in datapath
– 5 stages → 10 ns per instruction
– 8 ns + 2 ns * 1000 = 2008 ns (the 8 ns is the time to "fill" the pipeline)
Speedup = 8000 / 2008 = 3.98 ≈ 4
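
The arithmetic on this slide can be checked with a short Python sketch. The function names are made up for illustration; the 8 ns and 2 ns cycle times and the five stages come from the slide, and the model assumes no stalls.

```python
def single_cycle_time_ns(n_instructions, cycle_ns=8):
    """Non-pipelined: every instruction takes one long clock cycle."""
    return n_instructions * cycle_ns


def pipelined_time_ns(n_instructions, stage_ns=2, n_stages=5):
    """Pipelined: (stages - 1) cycles to fill, then one instruction per cycle."""
    fill_ns = (n_stages - 1) * stage_ns
    return fill_ns + n_instructions * stage_ns


n = 1000
speedup = single_cycle_time_ns(n) / pipelined_time_ns(n)
print(single_cycle_time_ns(n), pipelined_time_ns(n), f"{speedup:.2f}")
```

As the instruction count grows, the fill time matters less and the speedup approaches 8 ns / 2 ns = 4.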

Pipeline Performance
A single instruction takes more (or the same amount of) time
A group / sequence of instructions takes less time
Pipelining increases throughput rather than decreasing execution time for an individual instruction
Design principle:
– Good designs demand good compromises

CPI Revisited
CPI = (total # of cycles) / (total # of instructions)
Hypothetically, the CPI of a pipelined processor is 1 (one instruction completes every cycle once the pipeline is full)
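
A small Python sketch shows why the CPI is only hypothetically 1. It assumes a five-stage pipeline; the function `cpi` and its `stall_cycles` parameter are illustrative names, and the stall count in the second call is an arbitrary example value, not a measurement.

```python
def cpi(n_instructions, n_stages=5, stall_cycles=0):
    """Cycles per instruction, counting pipeline fill and any stall cycles."""
    total_cycles = (n_stages - 1) + n_instructions + stall_cycles
    return total_cycles / n_instructions


print(cpi(1000))                    # fill cycles only: slightly above 1
print(cpi(1000, stall_cycles=200))  # hypothetical 200 stall cycles
```

With no stalls, 1000 instructions take 1004 cycles, so the CPI is 1.004; every stall cycle pushes it further above 1.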

Hazards: Limits to Pipelined Performance

Roadblocks to Pipelining
Structural hazards
– Multiple instructions vying for a single shared resource
– Ex: RAM, ALU
– Instruction!
Data hazards
– Later instruction uses the result of an earlier instruction
– Ex: lw followed by an add that uses the loaded data
Control hazards
– Fetch of a later instruction relies on the result of an earlier instruction to determine the correct control path
– Ex: conditional branches that are taken

Structural Hazards
Suppose Princeton architecture: one RAM for both instructions and data
Structural hazard → two instructions require RAM in the same cycle
Need to use Harvard architecture to accommodate this

lw   F D E M W
       F D E M W
         F D E M W
           F

Structural Hazards
Which "instruction" is coming from the I-MEM in any given cycle?
– Need to replicate it!
Structural hazards can (usually) be removed by adding duplicate hardware

More Structural Hazards
How do I read and write to the register file at the same time?

Pipelining Requirements
Decode is performed in the second half of the D stage
– D stage involves a read from the register file
Write back is performed in the first half of the W stage
– W stage involves a write to the register file
Not actually how it is implemented (but the concept works)

lw   F D E M W
lw     F D E M W
lw       F D E M W

Control & Data Hazards: Solutions

Hazard #2 - Data Hazards
nand cannot read reg 1 until add has stored it
Since read/write can occur in the same cycle, must stall 2 cycles here before nand can proceed

add    F D E M W
nand     F - - D E M W

Solutions to Data Hazards
Forwarding / bypassing
– Data is forwarded, as soon as it is available, from one stage to another
– Forwarding occurs prior to the M/W stages
Result of add is forwarded from the E stage (output reg from ALU of add) to the E stage of nand (back to the ALU again, but for nand this time)
– reg 1 is not written until the W stage, but its value is used earlier anyway

add    F D E M W
nand     F D E M W

More Forwarding / Bypassing

add 1 2 3   F D E M W
sw            F D E M W

Data Hazards: Load Stalls
Cannot forward "back in time": must permit a "load stall" to wait on the result of the load
– Forwarding can't solve everything (unfortunately)

Without a stall (would need the loaded value before it exists):
lw          F D E M W
add 2 1 3     F D E M W

With a one-cycle load stall:
lw          F D E M W
add 2 1 3     F D - E M W
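
The stall counts from the data-hazard slides can be collected into one small Python sketch. It is a simplified model under stated assumptions: a five-stage F D E M W pipeline, a register file written in the first half of W and read in the second half of D, and forwarding into the E stage. The function name `raw_stalls` and its parameters are made up for illustration, and the model ignores stalls already caused by other instructions.

```python
def raw_stalls(producer, distance, forwarding):
    """Stall cycles for a consumer that reads a register written by `producer`.

    producer:   "alu" (result ready after E) or "load" (result ready after M)
    distance:   1 if the consumer immediately follows the producer, 2 if one
                instruction sits between them, and so on
    forwarding: whether results can be forwarded to the consumer's E stage
    """
    if distance < 1:
        raise ValueError("distance must be at least 1")
    if not forwarding:
        # The consumer's D stage must not come before the producer's W stage,
        # which is 3 cycles after the producer's own D stage.
        return max(0, 3 - distance)
    # With forwarding, an ALU result is usable one cycle later,
    # a loaded value two cycles later.
    ready_after = 2 if producer == "load" else 1
    return max(0, ready_after - distance)


print(raw_stalls("alu", 1, forwarding=False))  # add then nand, no forwarding
print(raw_stalls("alu", 1, forwarding=True))   # add then nand, forwarding
print(raw_stalls("load", 1, forwarding=True))  # lw then add: the load stall
```

The three calls reproduce the 2-cycle stall, the stall-free forwarding case, and the 1-cycle load stall shown in the diagrams.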

Test Yourself
Consider the following instruction sequence:

add 1 2 2   F D E M W
add 1 1 2     F D E M W
add 1 1 1       F D E M W

What are the forwarding paths required to correctly implement this sequence?
Are there any forwarding paths that conflict?

Hazard #3 - Control Hazards
The lw instruction should only complete if the branch fails!

add 4 5 6      F D E M W
beq 1 2 loop     F D E M W
lw                 F D E M W

Control Hazards (2)
Stalls are "bubbles" in the pipeline: no useful work is accomplished in a stall
The multi-cycle machine "resolves" branches in the E stage
– Branch resolution could be completed in the D stage if we pass rA and rB through a special "subtractor" and bypass the A and B regs
– Resolving branches in the D stage requires only a single cycle of stalling in the pipeline (vs 2 if we stick to branch resolution in E)

add 4 5 6            F D E M W
beq 1 2 loop           F D E M W
lw                       F D
add                        F
next instruction →           F D E M W

Simple Solutions to Control Hazards
What to do about control hazards:
1. Always stall
– Resolve branches fast: in the D stage to reduce the stall to 1 cycle
2. Guess! (ok, "predict")
– Gamble on the most likely outcome of the branch test, and fetch the instruction that would be executed
– If wrong → undo the fetch, and get the correct instruction
– Ex: always predict branch failure, or always predict branch success

Branch Prediction Example (1)
Predict failure → if correct, this sequence proceeds without a stall
Branch failure is equivalent to nop since the branch instruction does nothing

add 4 5 6      F D E M W
beq 1 2 loop     F D E M W
lw                 F D E M W

Branch Prediction Example (2)
Predict failure → if incorrect, must clear out the incorrect lw instruction and refetch the correct next instruction instead
– Results in a 1 cycle stall when using early resolution

add 4 5 6             F D E M W
beq 1 2 loop            F D E M W
lw                        F - - - -
correct instruction         F D E M W

(Somewhat) Clever Solutions to Control Hazards
3. Dynamic branch prediction
– Predict the next instruction based on the past history of the branch instruction
– Requires a table of recent results of all branches encountered: the "branch prediction table"
Could predict branches with a 1 bit predictor model:
– Save the result of a branch in a 1 bit buffer
– The buffer is a table indexed by the low order bits of the address of the branch instruction
If buffer contents = 0 → predict branch not taken
If buffer contents = 1 → predict branch taken

1 Bit Dynamic Branch Prediction

for (int i = 0; i < 10; i++) { … }

becomes

      lw   1 0 ten
      add  2 0 0
loop: beq  2 1 exit
      …

If using simple "not taken" prediction, we're wrong 90% of the time!
With a 1-bit predictor:
– On first iteration, prediction is "do not take branch" → incorrect
– On last iteration, prediction is "take branch" → incorrect
– 2 mispredictions out of 10 tests → 80% correct for 90% branch success
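
The 80% figure can be reproduced with a small Python sketch of a 1-bit prediction table. The class `OneBitPredictor`, the 4-bit index width, and the branch address `pc=2` are assumptions made for illustration; the outcome list simply encodes the slide's scenario of a branch that is taken 9 times out of 10, with every table entry starting at "not taken".

```python
class OneBitPredictor:
    """Table of 1-bit entries indexed by the low-order bits of the branch address."""

    def __init__(self, index_bits=4):
        self.mask = (1 << index_bits) - 1
        self.table = [0] * (1 << index_bits)  # 0 = predict not taken

    def predict(self, pc):
        return self.table[pc & self.mask] == 1

    def update(self, pc, taken):
        # Remember only what the branch did last time.
        self.table[pc & self.mask] = 1 if taken else 0


def accuracy(predictor, pc, outcomes):
    """Fraction of branch outcomes that the predictor guesses correctly."""
    correct = 0
    for taken in outcomes:
        if predictor.predict(pc) == taken:
            correct += 1
        predictor.update(pc, taken)
    return correct / len(outcomes)


loop_branch = [True] * 9 + [False]  # taken 9 times, then not taken once
print(accuracy(OneBitPredictor(), pc=2, outcomes=loop_branch))
```

The first and last outcomes are mispredicted, which gives 8 correct predictions out of 10.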

2 Bit Dynamic Branch Prediction
Could predict branches with a 2 bit predictor/corrector FSM:
– (basically a 2-bit "saturating" counter)
On the same example, we get 90% correct with 90% branch success

[Figure: state diagram of the 2-bit predictor FSM, with weak and strong "Predict taken" and "Predict not taken" states and transitions labeled "taken" / "not taken"]
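
A 2-bit saturating counter can be sketched in Python as follows. The state encoding (0 and 1 predict not taken, 2 and 3 predict taken) and the starting state of 2 ("weakly taken") are assumptions for this sketch; the slide does not say which state the FSM starts in, and a start in a "not taken" state would cost extra mispredictions on the first pass through the loop.

```python
def two_bit_accuracy(outcomes, state=2):
    """Accuracy of one 2-bit saturating counter on a list of branch outcomes.

    States 0-1 predict not taken, states 2-3 predict taken.
    """
    correct = 0
    for taken in outcomes:
        if (state >= 2) == taken:
            correct += 1
        # Move one step toward the actual outcome, saturating at 0 and 3.
        state = min(3, state + 1) if taken else max(0, state - 1)
    return correct / len(outcomes)


loop_branch = [True] * 9 + [False]  # taken 9 times, then not taken once
print(two_bit_accuracy(loop_branch))      # one pass through the loop
print(two_bit_accuracy(loop_branch * 2))  # the same loop executed twice
```

A single "not taken" outcome only weakens the prediction instead of flipping it, so a repeated loop costs one misprediction per pass rather than two.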

Modern Branch Prediction
Modern branch prediction is extremely important!
– Long pipelines → huge branch penalties
– We need to be right as much as possible.
Because of the importance, modern predictors are also
– Extremely complex (some mimic AI routines in hardware)
– Take up a lot of space (lots of memories to store historical information)

Branch Prediction: Some Stats
Predict:
– Not taken: ~50-60% accurate
– Not taken, but backward branches taken: ~65% accurate
– Same as last time: ~80% accurate
Actual Designs
– Pentium: ~85% accurate
– Pentium Pro: ~92% accurate
Researched Designs
– Papers have demonstrated 96-98% accuracy