Chapter 4 The Processor CprE 381 Computer Organization and Assembly Level Programming, Fall 2012 Revised from original slides provided by MKP.


SCP With Jumps Added

Performance Issues
- Longest delay determines clock period
  - Critical path: load instruction
  - Instruction memory → register file → ALU → data memory → register file
- Not feasible to vary period for different instructions
- Violates design principle: making the common case fast
- We will improve performance by pipelining

Pipelining Analogy (§4.5 An Overview of Pipelining)
- Pipelined laundry: overlapping execution
  - Parallelism improves performance
- Four loads: Speedup = 8/3.5 = 2.3
- Non-stop: Speedup = 2n/(0.5n + 1.5) ≈ 4 = number of stages
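The laundry numbers can be checked with a short sketch. It assumes four stages of 0.5 hours each, which is what the 2n and 0.5n terms on the slide imply but which the transcript does not state outright; the function name `laundry_speedup` is made up for illustration.

```python
def laundry_speedup(n_loads: int, stages: int = 4, stage_hours: float = 0.5) -> float:
    """Speedup of pipelined over sequential laundry for n_loads loads.

    Assumption: every stage takes the same time (0.5 h) and there are 4 stages.
    """
    # Sequential: each load runs through all stages before the next one starts.
    sequential = n_loads * stages * stage_hours
    # Pipelined: the first load takes `stages` steps, each later load adds one step.
    pipelined = (stages + n_loads - 1) * stage_hours
    return sequential / pipelined


for n in (4, 100, 10_000):
    print(n, round(laundry_speedup(n), 2))
```

For four loads this gives 8/3.5, and as the number of loads grows the ratio approaches the number of stages.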

MIPS Pipeline
Five stages, one step per stage
1. IF: Instruction fetch from memory
2. ID: Instruction decode & register read
3. EX: Execute operation or calculate address
4. MEM: Access memory operand
5. WB: Write result back to register

Pipeline Performance
- Assume time for stages is
  - 100ps for register read or write
  - 200ps for other stages
- Compare pipelined datapath with single-cycle datapath

Instr     Instr fetch  Register read  ALU op  Memory access  Register write  Total time
lw        200ps        100ps          200ps   200ps          100ps           800ps
sw        200ps        100ps          200ps   200ps          -               700ps
R-format  200ps        100ps          200ps   -              100ps           600ps
beq       200ps        100ps          200ps   -              -               500ps

Pipeline Performance
- Single-cycle (Tc = 800ps)
- Pipelined (Tc = 200ps)

Pipeline Speedup
- If all stages are balanced, i.e., all take the same time:
  Time between instructions (pipelined) = Time between instructions (nonpipelined) / Number of stages
- If not balanced, speedup is less
- Speedup due to increased throughput
  - Latency (time for each instruction) does not decrease
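A small sketch of the timing arithmetic, using the stage latencies from the Pipeline Performance slide (100ps for register read or write, 200ps for the other stages). The helper names are hypothetical, and the model assumes no hazards or stalls.

```python
# Stage latencies in picoseconds, taken from the Pipeline Performance slide.
STAGE_PS = {"IF": 200, "ID": 100, "EX": 200, "MEM": 200, "WB": 100}


def single_cycle_period(stage_ps):
    # The clock must fit the longest instruction (lw), which uses every stage.
    return sum(stage_ps.values())


def pipelined_period(stage_ps):
    # Every stage gets one clock cycle, so the slowest stage sets the period.
    return max(stage_ps.values())


def total_time(n, stage_ps, pipelined):
    """Time in ps to run n instructions, assuming no hazards or stalls."""
    if pipelined:
        return (len(stage_ps) + n - 1) * pipelined_period(stage_ps)
    return n * single_cycle_period(stage_ps)


print(single_cycle_period(STAGE_PS), pipelined_period(STAGE_PS))
for n in (3, 1_000_000):
    speedup = total_time(n, STAGE_PS, False) / total_time(n, STAGE_PS, True)
    print(n, round(speedup, 2))
```

Because the stages are not balanced, the long-run speedup tends toward 800/200 = 4 rather than the stage count of 5, and each individual instruction still takes five 200ps cycles to complete.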

Pipelining and ISA Design
- MIPS ISA designed for pipelining
  - All instructions are 32 bits
    - Easier to fetch and decode in one cycle
    - cf. x86: 1- to 17-byte instructions
  - Few and regular instruction formats
    - Can decode and read registers in one step
  - Load/store addressing
    - Can calculate address in 3rd stage, access memory in 4th stage
  - Alignment of memory operands
    - Memory access takes only one cycle

Hazards
- Situations that prevent starting the next instruction in the next cycle
- Structure hazards
  - A required resource is busy
- Data hazard
  - Need to wait for previous instruction to complete its data read/write
- Control hazard
  - Deciding on control action depends on previous instruction
- There are ways to handle those hazards. Let's ignore them for now.

MIPS Pipelined Datapath (§4.6 Pipelined Datapath and Control)
- Right-to-left flow (labelled WB and MEM in the figure) leads to hazards

Pipeline registers
- Need registers between stages
  - To hold information produced in previous cycle

Pipeline Operation
- Cycle-by-cycle flow of instructions through the pipelined datapath
  - "Single-clock-cycle" pipeline diagram
    - Shows pipeline usage in a single cycle
    - Highlight resources used
  - cf. "multi-clock-cycle" diagram
    - Graph of operation over time
- We'll look at "single-clock-cycle" diagrams for load & store
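The two diagram styles can be sketched in a few lines. This is a toy model that assumes one instruction enters the pipeline per cycle with no stalls; `pipeline_diagram`, `render` and `snapshot` are illustrative names, not part of the slides.

```python
STAGES = ["IF", "ID", "EX", "MEM", "WB"]


def pipeline_diagram(instructions):
    """Multi-clock-cycle view: one row per instruction, one column per cycle.

    Assumes no hazards, so instruction i occupies stage s in cycle i + s (0-based).
    """
    n_cycles = len(instructions) + len(STAGES) - 1
    rows = []
    for i, instr in enumerate(instructions):
        cells = [""] * n_cycles
        for s, stage in enumerate(STAGES):
            cells[i + s] = stage
        rows.append((instr, cells))
    return rows


def render(rows):
    """Format the rows as aligned plain text."""
    width = max(len(instr) for instr, _ in rows)
    return "\n".join(
        instr.ljust(width) + " | " + " ".join(c.ljust(3) for c in cells)
        for instr, cells in rows
    )


def snapshot(rows, cycle):
    """Single-clock-cycle view: stage -> instruction for a 1-based cycle number."""
    return {cells[cycle - 1]: instr for instr, cells in rows if cells[cycle - 1]}


program = ["lw $t0, 100($gp)", "add $t2, $t0, $t1", "sw $t3, 108($gp)"]
rows = pipeline_diagram(program)
print(render(rows))
print(snapshot(rows, 3))
```

`render` gives the view over time, while `snapshot` answers the single-cycle question of which instruction sits in which stage during one particular cycle.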

IF for Load, Store, …

ID for Load, Store, …

EX for Load

MEM for Load

WB for Load
- Wrong register number

Corrected Datapath for Load

EX for Store

MEM for Store

WB for Store

Multi-Cycle Pipeline Diagram
- Form showing resource usage

Multi-Cycle Pipeline Diagram
- Traditional form

Single-Cycle Pipeline Diagram
- State of pipeline in a given cycle

Pipelined Control (Simplified)

Pipelined Control
- Control signals derived from instruction
  - As in single-cycle implementation

Pipelined Control

Pipeline Summary (The BIG Picture)
- Pipelining improves performance by increasing instruction throughput
  - Executes multiple instructions in parallel
  - Each instruction has the same latency
- Subject to hazards
  - Structure, data, control (will be studied)
- Instruction set design affects complexity of pipeline implementation

Hazards
- Situations that prevent starting the next instruction in the next cycle
- Structure hazards
  - A required resource is busy
- Data hazard
  - Need to wait for previous instruction to complete its data read/write
- Control hazard
  - Deciding on control action depends on previous instruction

Structure Hazards
- Conflict for use of a resource
- In MIPS pipeline with a single memory
  - Load/store requires data access
  - Instruction fetch would have to stall for that cycle
    - Would cause a pipeline "bubble"
- Hence, pipelined datapaths require separate instruction/data memories
  - Or separate instruction/data caches
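To see where the single-memory conflict lands, here is a rough sketch. It assumes one instruction is fetched per cycle with no stalls, that cycles are numbered from 1, and that only lw and sw touch data memory; `memory_conflicts` is a hypothetical helper.

```python
def memory_conflicts(opcodes):
    """Find cycles in which a single shared memory would be needed twice.

    opcodes: list of opcode strings in program order.
    Returns (cycle, index_of_load_or_store, index_of_instruction_being_fetched).
    Assumption: instruction i (0-based) is in IF in cycle i + 1 and in MEM in cycle i + 4.
    """
    conflicts = []
    for i, op in enumerate(opcodes):
        if op in ("lw", "sw"):
            mem_cycle = i + 4
            fetch_index = mem_cycle - 1  # the instruction fetched in that same cycle
            if fetch_index < len(opcodes):
                conflicts.append((mem_cycle, i, fetch_index))
    return conflicts


print(memory_conflicts(["lw", "add", "sub", "and", "or"]))
```

In this example the lw reaches MEM in cycle 4, which is also the cycle in which the fourth instruction would be fetched, so with one memory the fetch would have to wait.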

Structure Hazards
- How about the Registers?
  - In a given cycle, an lw/ALU instruction may write to the Registers while a new instruction is reading from the Registers
- This is NOT a structure hazard
  - The two instructions use different ports of the Registers
- This is a data hazard (to be discussed)

Data Hazards
- An instruction depends on completion of data access by a previous instruction
    add $s0, $t0, $t1
    sub $t2, $s0, $t3
- How about the following code?
    lw  $t0, 100($gp)
    lw  $t1, 104($gp)
    add $t2, $t0, $t1
    sub $t3, $t2, $s0
    sw  $t3, 108($gp)
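One way to answer the question for the five-instruction sequence is to list its read-after-write dependences through registers. The sketch below encodes each instruction by hand as (opcode, destination, sources); that encoding and the name `raw_dependences` are assumptions made for illustration.

```python
# Hand-encoded version of the lw/lw/add/sub/sw sequence: (opcode, dest, sources).
PROGRAM = [
    ("lw",  "$t0", ["$gp"]),
    ("lw",  "$t1", ["$gp"]),
    ("add", "$t2", ["$t0", "$t1"]),
    ("sub", "$t3", ["$t2", "$s0"]),
    ("sw",  None,  ["$t3", "$gp"]),   # sw reads $t3 and $gp, writes no register
]


def raw_dependences(program):
    """Return (producer_index, consumer_index, register) for every register read
    whose most recent writer is an earlier instruction in the program."""
    last_writer = {}
    deps = []
    for i, (_op, dest, sources) in enumerate(program):
        for reg in sources:
            if reg in last_writer:
                deps.append((last_writer[reg], i, reg))
        if dest is not None and dest != "$zero":
            last_writer[dest] = i
    return deps


for producer, consumer, reg in raw_dependences(PROGRAM):
    print(f"instruction {consumer} reads {reg} written by instruction {producer}")
```

Each reported pair is a potential data hazard; whether it actually costs a stall depends on how far apart the two instructions are and on whether forwarding is available.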

Data Hazards
- A naïve approach: stall the 2nd instruction in the dependence
    add $s0, $t0, $t1
    sub $t2, $s0, $t3

Data Hazards in ALU Instructions

Dependencies & Forwarding

Detecting the Need to Forward
- Pass register numbers along pipeline
  - e.g., ID/EX.RegisterRs = register number for Rs sitting in ID/EX pipeline register
- ALU operand register numbers in EX stage are given by
  - ID/EX.RegisterRs, ID/EX.RegisterRt
- Data hazards when
  - 1a. EX/MEM.RegisterRd = ID/EX.RegisterRs
  - 1b. EX/MEM.RegisterRd = ID/EX.RegisterRt
    (1a, 1b: forward from EX/MEM pipeline register)
  - 2a. MEM/WB.RegisterRd = ID/EX.RegisterRs
  - 2b. MEM/WB.RegisterRd = ID/EX.RegisterRt
    (2a, 2b: forward from MEM/WB pipeline register)

Detecting the Need to Forward
- But only if forwarding instruction will write to a register!
  - EX/MEM.RegWrite, MEM/WB.RegWrite
- And only if Rd for that instruction is not $zero
  - EX/MEM.RegisterRd ≠ 0, MEM/WB.RegisterRd ≠ 0

Forwarding Paths

Forwarding Conditions
- EX hazard
    if (EX/MEM.RegWrite and (EX/MEM.RegisterRd ≠ 0)
        and (EX/MEM.RegisterRd = ID/EX.RegisterRs))
      ForwardA = 10
    if (EX/MEM.RegWrite and (EX/MEM.RegisterRd ≠ 0)
        and (EX/MEM.RegisterRd = ID/EX.RegisterRt))
      ForwardB = 10
- MEM hazard
    if (MEM/WB.RegWrite and (MEM/WB.RegisterRd ≠ 0)
        and (MEM/WB.RegisterRd = ID/EX.RegisterRs))
      ForwardA = 01
    if (MEM/WB.RegWrite and (MEM/WB.RegisterRd ≠ 0)
        and (MEM/WB.RegisterRd = ID/EX.RegisterRt))
      ForwardB = 01
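The conditions above translate almost directly into code. The following is a minimal sketch, not the slide's hardware: the `PipeReg` class and function names are invented, the 00 default (take the operand from the register file) is an assumption the slide text does not spell out, and checking the EX hazard first so that the most recent result wins is a choice made here rather than something stated on this slide.

```python
from dataclasses import dataclass


@dataclass
class PipeReg:
    """The two fields of a pipeline register that the forwarding check needs."""
    reg_write: bool  # RegWrite control signal carried by the instruction
    rd: int          # destination register number


def forward_select(ex_mem: PipeReg, mem_wb: PipeReg, src: int) -> int:
    """Return the 2-bit mux select for one ALU operand whose source register is src."""
    # EX hazard: the instruction one ahead will write the register we read.
    if ex_mem.reg_write and ex_mem.rd != 0 and ex_mem.rd == src:
        return 0b10
    # MEM hazard: the instruction two ahead will write the register we read.
    if mem_wb.reg_write and mem_wb.rd != 0 and mem_wb.rd == src:
        return 0b01
    # Assumed default: no forwarding, use the value read in ID.
    return 0b00


def forwarding_unit(ex_mem: PipeReg, mem_wb: PipeReg, id_ex_rs: int, id_ex_rt: int):
    """Return (ForwardA, ForwardB) for the instruction currently in EX."""
    return (forward_select(ex_mem, mem_wb, id_ex_rs),
            forward_select(ex_mem, mem_wb, id_ex_rt))


# add $s0, $t0, $t1 followed by sub $t2, $s0, $t3 ($s0 is register 16, $t3 is 11).
print(forwarding_unit(PipeReg(True, 16), PipeReg(False, 0), id_ex_rs=16, id_ex_rt=11))
```

For the add/sub pair, the sub's first operand is selected from the EX/MEM register and its second operand comes from the register file.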

More Thoughts on Data Hazards
- When are we concerned with data hazards?
  - Instruction B uses the output of another instruction A; and
  - When B reads the Registers, A hasn't yet written to the Registers
- In a single-cycle processor, this never happens
  - All operations of A complete before B starts
- In a pipelined processor, we have multiple instructions pending in the pipeline

More Thoughts on Data Hazards
- Two types of data dependences
  - Register: value passed through a register
    - Producer: ALU, lw
    - Consumer: ALU, lw, sw, beq
  - Memory: value passed through memory
    - Producer: sw
    - Consumer: lw

More Thoughts on Data Hazards
- No pipeline bubble with data forwarding
  - An ALU instruction produces its register output value at the end of its EX stage
  - Other instructions consume their register inputs at the beginning of their EX stage
  - The datum is already in the pipeline when needed!
- How about the lw instruction?
  - An lw instruction produces its register output value at the end of its MEM stage
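The timing argument can be worked through numerically. This sketch uses only the two statements above (ALU results are ready at the end of EX, lw results at the end of MEM) and assumes every consumer needs its input at the start of its own EX stage and that forwarding paths exist; `bubbles_needed` is a hypothetical helper.

```python
# Stage number (IF=1 ... WB=5) at whose end the producer's value becomes available.
PRODUCE_STAGE = {"alu": 3, "lw": 4}
CONSUME_STAGE = 3  # assumption: consumers need the value at the start of EX


def bubbles_needed(producer: str, distance: int) -> int:
    """Bubbles required between a producer and a consumer issued `distance`
    instructions later (1 = the very next instruction), with forwarding."""
    # Producer enters IF in cycle 1, so its value is ready at the end of this cycle.
    ready_cycle = PRODUCE_STAGE[producer]
    # Consumer enters IF in cycle 1 + distance and reaches EX in this cycle.
    ex_cycle = distance + CONSUME_STAGE
    # The value must be ready by the end of the cycle before the consumer's EX.
    return max(0, ready_cycle - (ex_cycle - 1))


for kind in ("alu", "lw"):
    for d in (1, 2):
        print(kind, d, bubbles_needed(kind, d))
```

Under these assumptions an ALU result reaches the next instruction with no bubble, while a value loaded by lw arrives one cycle too late for an instruction that immediately follows it.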