How Computers Work
Lecture 12: Introduction to Pipelining
Page 2: A Common Chore of College Life
Page 3: Propagation Times
Tpd wash = _______
Tpd dry = _______
Page 4: Doing 1 Load
Step 1:
Step 2:
Total Time = _______________ = _______________
Page 5: Doing 2 Loads, the Combinational (Harvard) Method
Step 1:
Step 2:
Step 3:
Step 4:
Total Time = ________
Page 6: Doing 2 Loads, the Pipelined (MIT) Method
Step 1:
Step 2:
Step 3:
Total Time = ________
Page 7: Doing N Loads
Harvard Method: _________________
MIT Method: ____________________
Page 8: A Few Definitions
Latency: time for 1 object to pass through the entire system.
  (= ________ for Harvard laundry)
  (= ________ for MIT laundry)
Throughput: rate at which objects go through.
  (= ________ for Harvard laundry)
  (= ________ for MIT laundry)
Page 9: A Computational Problem
Add 4 numbers: A + B + C + D
[Diagram: a tree of three adders; A and B feed one adder, C and D feed another, and their outputs feed a third.]
Page 10: As a Combinational Circuit
[The adder tree, each adder with delay Tpd.]
Latency = 2 Tpd
Throughput = 1 / (2 Tpd)
Page 11: As a Pipelined Circuit
[The adder tree with pipeline registers after each level; clock period = Tpd.]
Latency = 2 Tpd
Throughput = 1 / Tpd
Page 12: Simplifying Assumptions
1. Synchronous inputs.
2. Ideal registers: Ts = Th = 0, Tpd,c-q = 0, Tcd,c-q = 0.
Page 13: An Inhomogeneous Case (Combinational)
[Two multipliers (Tpd = 2) feeding an adder (Tpd = 1).]
Latency = 3
Throughput = 1/3
Page 14: An Inhomogeneous Case (Pipelined)
[The same circuit with a pipeline register after each element; the clock period of 2 is set by the multipliers.]
Latency = 4
Throughput = 1/2
Page 15: How about this one?
[Circuit of five elements: * (1), + (4), + (1), + (4), + (1).]
Combinational: latency = 6, throughput = 1/6.
Pipelined: latency = 12, throughput = 1/4.
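The latency/throughput bookkeeping on pages 10-15 follows one rule: with ideal (zero-overhead) registers, a pipeline's clock period is set by its slowest stage. A minimal Python sketch of that arithmetic, checked against the page 13/14 numbers (illustrative only, not lecture code):

```python
# Latency/throughput for a circuit cut into stages with the given
# combinational delays, assuming ideal registers (Ts = Th = Tpd,c-q = 0).

def combinational(stage_delays):
    # Unpipelined: delays add along the path; one result per full pass.
    latency = sum(stage_delays)
    return latency, 1 / latency

def pipelined(stage_delays):
    # Pipelined: the clock period is set by the slowest stage, and an
    # item takes one clock per stage to get through.
    clock = max(stage_delays)
    return clock * len(stage_delays), 1 / clock

# The inhomogeneous case: a multiplier stage (Tpd = 2) feeding an adder stage (Tpd = 1).
print(combinational([2, 1]))   # (3, 0.333...) -> latency 3, throughput 1/3
print(pipelined([2, 1]))       # (4, 0.5)      -> latency 4, throughput 1/2
```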
Page 16: How MIT Students REALLY Do Laundry
Steady-State Throughput = ____________
Steady-State Latency = ____________
Page 17: Interleaving (an Alternative to Pipelining)
For N units, each of delay Tpd, in steady state:
Throughput = N / Tpd
Latency = Tpd
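The interleaving numbers in the same hedged sketch style (N staggered copies of one unit; the example values are made up):

```python
# Interleaving: N copies of a circuit of delay Tpd accept inputs
# round-robin, one every Tpd/N. Each item still sees a full Tpd of latency.
def interleaved(n_units, tpd):
    return tpd, n_units / tpd      # (latency, throughput)

print(interleaved(4, 8))           # (8, 0.5): 4 units of delay 8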
Page 18: Interleaving Parallel Circuits
[Diagram: parallel copies of the circuit driven by staggered clocks clk1-clk4; a selector (sel) steers successive inputs x1, x2, x3, x4 to successive copies and merges their outputs.]
Page 19: Definition of a Well-Formed Pipeline
The same number of registers lies along the path from any input to every computational unit.
- Ensures that every computational unit sees its inputs IN PHASE.
- Holds (non-obviously) whenever the number of registers between all inputs and all outputs is the same.
Page 20: Method for Forming Well-Formed Pipelines
- Add registers to the system outputs at will.
- Propagate registers from intermediate outputs to intermediate inputs, cloning registers as necessary.
[Example circuit: * (2) and a chain of adders + (1), + (1), + (1), + (1).]
A mechanical check of the page 19 definition is sketched below.
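A sketch of that check (illustrative, not from the lecture): model the circuit as a DAG whose edges carry register counts, and require every path from the primary inputs into a unit to cross the same number of registers.

```python
# Well-formedness check: graph[node] = list of (predecessor, registers on
# that edge). Every path from the inputs to `node` must cross the same
# number of registers, or the unit would see out-of-phase inputs.

def register_depth(graph, node, memo=None):
    memo = {} if memo is None else memo
    if node in memo:
        return memo[node]
    preds = graph.get(node, [])
    if not preds:                         # a primary input
        memo[node] = 0
        return 0
    depths = {register_depth(graph, p, memo) + regs for p, regs in preds}
    if len(depths) > 1:
        raise ValueError(f"ill-formed at {node}: unequal depths {depths}")
    memo[node] = depths.pop()
    return memo[node]

# The 4-number adder tree with one register after each first-level adder:
tree = {"add3": [("add1", 1), ("add2", 1)], "add1": [], "add2": []}
print(register_depth(tree, "add3"))       # 1 -> inputs arrive in phase
```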
Page 21: Method for Maximizing Throughput
- Pipeline around the longest-latency element.
- Pipeline around other sections with latency as large as possible, but <= the longest-latency element.
[Example: the * (2) / + (1) circuit from page 20, cut into three stages of latency <= 2.]
Combinational: latency = 5, throughput = 1/5.
Pipelined: latency = 6, throughput = 1/2.
Page 22: A Few Questions
Assuming a circuit is pipelined for optimum throughput with 0-delay registers:
- Is the pipelined throughput always greater than or equal to the combinational throughput?
  A: Yes.
- Is the pipelined latency ever less than the combinational latency?
  A: No.
- When is the pipelined latency equal to the combinational latency?
  A: When the contents of all pipeline stages have equal combinational latency.
Page 23: CPU Performance
MIPS = millions of instructions per second
Freq = clock frequency, MHz
CPI = clocks per instruction

MIPS = Freq / CPI

To increase MIPS:
1. DECREASE CPI.
   - RISC reduces CPI to ~1.
   - CPI < 1? Tough... we'll see multiple-instruction-issue machines at the end of the term.
2. INCREASE Freq.
   - Freq is limited by the delay along the longest combinational path; hence
   - PIPELINING is the key to improved performance through fast clocks.
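The formula made concrete (the specific numbers below are invented for illustration):

```python
# MIPS = Freq / CPI, with Freq in MHz.
def mips(freq_mhz, cpi):
    return freq_mhz / cpi

print(mips(100, 4.0))   # 25.0: a 100 MHz machine averaging 4 clocks/instruction
print(mips(100, 1.0))   # 100.0: same clock at CPI = 1 (the pipelined goal)
```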
Page 24: Review: A Top-Down View of the Beta Architecture
With ST(ra, C, rc): Mem[C + <rc>] <- <ra>
Page 25: Pipeline Stages
GOAL: maintain (nearly) 1.0 CPI, but increase clock speed.
APPROACH: structure the processor as a 4-stage pipeline:
- IF (Instruction Fetch): maintains the PC, fetches one instruction per cycle, and passes it to
- RF (Register File): reads source operands from the register file and passes them to
- ALU: performs the indicated operation and passes the result to
- WB (Write-Back): writes the result back into the register file.
WHAT OTHER information do we have to pass down the pipeline?
Page 26: Sketch of 4-Stage Pipeline
[Diagram: instruction registers separate the stages. IF (Instruction Fetch) passes the instruction to the RF-read stage, whose combinational logic produces operands A and B; the ALU stage computes Y; the Write Back stage drives the register-file write port.]
Page 27: [Diagram: the 4-stage pipelined datapath, stages IF, RF, ALU, WB.]
Page 28: 4-Pipeline Parallelism
Consider a sequence of instructions:
  ...
  ADDC(r1, 1, r2)
  SUBC(r1, 1, r3)
  XOR(r1, r5, r1)
  MUL(r1, r2, r0)
  ...
Executed on our 4-stage pipeline (time runs left to right):
  ADDC(r1,1,r2)    IF   RF   ALU  WB
  SUBC(r1,1,r3)         IF   RF   ALU  WB
  XOR(r1,r5,r1)              IF   RF   ALU  WB
  MUL(r1,r2,r0)                   IF   RF   ALU  WB
Reads: ADDC and SUBC read r1; XOR reads r1, r5; MUL reads r1, r2 (each in its RF stage).
Writes: r2, r3, r1, r0 are written in the successive WB stages.
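The staircase diagram above can be generated mechanically; a small sketch (the names and formatting are mine, not the lecture's): instruction i enters IF in cycle i and advances one stage per cycle.

```python
# Print the stage-occupancy diagram for an ideal 4-stage pipeline:
# instruction i occupies stage s during cycle i + s.
STAGES = ["IF", "RF", "ALU", "WB"]

def schedule(instructions):
    width = len(instructions) + len(STAGES) - 1
    for i, instr in enumerate(instructions):
        row = ["    "] * width
        for s, stage in enumerate(STAGES):
            row[i + s] = f"{stage:<4}"
        print(f"{instr:<18}" + "".join(row))

schedule(["ADDC(r1,1,r2)", "SUBC(r1,1,r3)",
          "XOR(r1,r5,r1)", "MUL(r1,r2,r0)"])
```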
Page 29: Pipeline Problems
BUT, consider instead:
  LOOP: ADD(r1, r2, r3)
        CMPLEC(r3, 100, r0)
        BT(r0, LOOP)
        XOR(r31, r31, r3)
        MUL(r1, r2, r2)
        ...
  ADD(r1,r2,r3)       IF   RF   ALU  WB
  CMPLEC(r3,100,r0)        IF   RF   ALU  WB
  BT(r0,LOOP)                   IF   RF   ALU  WB
  XOR(r31,r31,r3)                    IF   RF   ALU  WB
  MUL(r1,r2,r2)                           IF   RF   ALU  WB
Page 30: Pipeline Hazards
PROBLEM: the contents of a register WRITTEN by instruction k are READ by instruction k+1... before they are stored in the RF!
E.g.:
  ADD(r1, r2, r3)
  CMPLEC(r3, 100, r0)
  MULC(r1, 100, r4)
  SUB(r1, r2, r5)
fails, since CMPLEC sees a "stale" r3:
  ADD(r1,r2,r3)       IF   RF   ALU  WB         <- r3 written here
  CMPLEC(r3,100,r0)        IF   RF   ALU  WB    <- but r3 read here, before the write
Page 31: SOLUTIONS: 1. "Program around it."
Document the weirdo semantics and declare it a software problem.
- Breaks sequential semantics!
- Costs code efficiency.
EXAMPLE: rewrite
  ADD(r1, r2, r3)
  CMPLEC(r3, 100, r0)
  MULC(r1, 100, r4)
  SUB(r1, r2, r5)
as
  ADD(r1, r2, r3)
  MULC(r1, 100, r4)
  SUB(r1, r2, r5)
  CMPLEC(r3, 100, r0)
  ADD(r1,r2,r3)       IF   RF   ALU  WB                 <- r3 written
  MULC(r1,100,r4)          IF   RF   ALU  WB
  SUB(r1,r2,r5)                 IF   RF   ALU  WB
  CMPLEC(r3,100,r0)                  IF   RF   ALU  WB  <- r3 read, after the write
HOW OFTEN can we do this?
Page 32: SOLUTIONS: 2. Stall the pipeline.
Freeze the IF and RF stages for 2 cycles, inserting NOPs into the ALU instruction register...
DRAWBACK: SLOW.
  ADD(r1,r2,r3)       IF   RF   ALU  WB                       <- r3 written
  NOP                      IF   RF   ALU  WB
  NOP                           IF   RF   ALU  WB
  CMPLEC(r3,100,r0)                  IF   RF   ALU  WB        <- r3 read, in time
  BT(r0,LOOP)                             IF   RF   ALU  WB
  XOR(r31,r31,r3)                              IF   RF   ALU  WB
  MUL(r1,r2,r2)                                     IF   RF   ALU  WB
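A sketch of the stall logic as software (the (name, dest, sources) encoding is mine, for illustration): an instruction must not issue until 3 slots after the instruction that writes one of its sources, so insert NOPs until that distance holds.

```python
# Insert NOPs so a register written by one instruction is not read until
# that instruction's WB has completed (a distance of 3 issue slots in
# this 4-stage pipeline, matching the 2 NOPs on the slide).
NOP = ("NOP", None, ())

def insert_stalls(program, dist=3):
    out = []
    for instr in program:
        _, _, sources = instr
        # Stall while any of the previous dist-1 slots writes a source.
        while any(dest is not None and dest in sources
                  for _, dest, _ in out[-(dist - 1):]):
            out.append(NOP)
        out.append(instr)
    return [name for name, _, _ in out]

prog = [("ADD(r1,r2,r3)",     "r3", ("r1", "r2")),
        ("CMPLEC(r3,100,r0)", "r0", ("r3",))]
print(insert_stalls(prog))
# ['ADD(r1,r2,r3)', 'NOP', 'NOP', 'CMPLEC(r3,100,r0)']
```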
Page 33: SOLUTIONS: 3. Bypass paths.
Add extra data paths & control logic to re-route data in problem cases.
  ADD(r1,r2,r3)       IF   RF   ALU  WB         <- sum produced (ALU stage)
  CMPLEC(r3,100,r0)        IF   RF   ALU  WB    <- sum used (ALU stage, one cycle later)
The bypass path forwards the sum directly from where it is produced to where it is used, skipping the register file.
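The bypass idea as a sketch (the signal names are invented; the real control logic comes next lecture): the operand mux in front of the ALU prefers the youngest in-flight result whose destination matches the register being read.

```python
# Bypass/forwarding: prefer in-flight results over the (possibly stale)
# register file. Each in-flight entry is (dest_register, value).
def read_operand(reg, regfile, alu_result, wb_result):
    for dest, value in (alu_result, wb_result):   # youngest first
        if dest == reg:
            return value        # forwarded over a bypass path
    return regfile[reg]         # no hazard: normal register-file read

regfile = {"r3": 0}             # stale r3 still in the RF
# ADD's sum (42) is sitting at the ALU output when CMPLEC reads r3:
print(read_operand("r3", regfile, ("r3", 42), (None, None)))   # 42
```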
Page 34: Hardware Implementation of Bypass Paths
[Diagram: the 4-stage datapath (IF, RF, ALU, WB) with bypass paths routed back to the ALU operand inputs.]
Page 35: Next Time
- Detailed Design of Bypass Paths + Control Logic
- What to Do When Bypass Paths Don't Work
  - Branch Delays / Tradeoffs
  - Load/Store Delays / Tradeoffs
  - Multi-Stage Memory Pipeline