Instructor: Senior Lecturer SOE Dan Garcia CS 61C: Great Ideas in Computer Architecture Pipelining Hazards 1.

Slides:

Advertisements

Similar presentations

Pipeline Computer Organization II 1 Hazards Situations that prevent starting the next instruction in the next cycle Structural hazards – A required resource.

Advertisements

Lecture Objectives: 1)Define pipelining 2)Calculate the speedup achieved by pipelining for a given number of instructions. 3)Define how pipelining improves.

CMPT 334 Computer Organization

Pipelining I (1) Fall 2005 Lecture 18: Pipelining I.

Pipelining II (1) Fall 2005 Lecture 19: Pipelining II.

CS61C L29 CPU Design : Pipelining to Improve Performance II (1) Garcia, Fall 2006 © UCB Running away with it  Our #8 football team had a great effort.

CS61C L24 Review Pipeline © UC Regents 1 CS61C - Machine Structures Lecture 24 - Review Pipelined Execution November 29, 2000 David Patterson

ECE 445 – Computer Organization

Pipelined Datapath and Control (Lecture #13) ECE 445 – Computer Organization The slides included herein were taken from the materials accompanying Computer.

CS Computer Architecture 1 CS 430 – Computer Architecture Pipelined Execution - Review William J. Taffe using slides of David Patterson.

Inst.eecs.berkeley.edu/~cs61c UCB CS61C : Machine Structures Lecture 29 – CPU Design : Pipelining to Improve Performance II Cal researcher Marty.

CS61C L30 CPU Design : Pipelining to Improve Performance II (1) Garcia, Spring 2007 © UCB E-voting bill in congress!  Rep Rush Holt (D-NJ) has a bill.

ENEE350 Ankur Srivastava University of Maryland, College Park Based on Slides from Mary Jane Irwin ( )

Prof. John Nestor ECE Department Lafayette College Easton, Pennsylvania Computer Organization Pipelined Processor Design 1.

1  2004 Morgan Kaufmann Publishers Chapter Six. 2  2004 Morgan Kaufmann Publishers Pipelining The laundry analogy.

©UCB CS 162 Computer Architecture Lecture 3: Pipelining Contd. Instructor: L.N. Bhuyan

1 Stalling  The easiest solution is to stall the pipeline  We could delay the AND instruction by introducing a one-cycle delay into the pipeline, sometimes.

CS61C L31 Pipelined Execution, part II (1) Garcia, Fall 2004 © UCB Lecturer PSOE Dan Garcia inst.eecs.berkeley.edu/~cs61c.

Scott Beamer, Instructor

Prof. John Nestor ECE Department Lafayette College Easton, Pennsylvania ECE Computer Organization Lecture 17 - Pipelined.

Morgan Kaufmann Publishers

CS 61C L38 Pipelined Execution, part II (1) Garcia, Spring 2004 © UCB Lecturer PSOE Dan Garcia inst.eecs.berkeley.edu/~cs61c.

Lecture 15: Pipelining and Hazards CS 2011 Fall 2014, Dr. Rozier.

CS3350B Computer Architecture Winter 2015 Lecture 6.2: Instructional Level Parallelism: Hazards and Resolutions Marc Moreno Maza

CS 61C: Great Ideas in Computer Architecture (Machine Structures) Pipelining Instructors: Krste Asanovic & Vladimir Stojanovic

CS 61C: Great Ideas in Computer Architecture Lecture 13: 5-Stage MIPS CPU (Pipelining) Instructor: Sagar Karandikar

1 Pipelining Reconsider the data path we just did Each instruction takes from 3 to 5 clock cycles However, there are parts of hardware that are idle many.

Computer Science Education

Lecture 8: Processors, Introduction EEN 312: Processors: Hardware, Software, and Interfacing Department of Electrical and Computer Engineering Spring 2014,

Lecture 14: Processors CS 2011 Fall 2014, Dr. Rozier.

Pipeline Computer Organization II 1 Pipelining Analogy Pipelined laundry: overlapping execution – Parallelism improves performance Four loads: – Speedup.

University of Texas at Austin CS352H - Computer Systems Architecture Fall 2009 Don Fussell CS352H: Computer Systems Architecture Topic 8: MIPS Pipelined.

Computer Organization CS224 Fall 2012 Lesson 28. Pipelining Analogy  Pipelined laundry: overlapping execution l Parallelism improves performance §4.5.

Morgan Kaufmann Publishers

Chapter 4 CSF 2009 The processor: Pipelining. Performance Issues Longest delay determines clock period – Critical path: load instruction – Instruction.

Comp Sci pipelining 1 Ch. 13 Pipelining. Comp Sci pipelining 2 Pipelining.

Chapter 4 The Processor. Chapter 4 — The Processor — 2 Introduction We will examine two MIPS implementations A simplified version A more realistic pipelined.

Chapter 4 The Processor CprE 381 Computer Organization and Assembly Level Programming, Fall 2012 Revised from original slides provided by MKP.

CSE 340 Computer Architecture Summer 2014 Basic MIPS Pipelining Review.

CMPE 421 Parallel Computer Architecture

CSCI-365 Computer Organization Lecture Note: Some slides and/or pictures in the following are adapted from: Computer Organization and Design, Patterson.

CS 61C: Great Ideas in Computer Architecture Pipelining & Hazards 1 Instructors: John Wawrzynek & Vladimir Stojanovic

Cs 152 L1 3.1 DAP Fa97,  U.CB Pipelining Lessons °Pipelining doesn’t help latency of single task, it helps throughput of entire workload °Multiple tasks.

1  1998 Morgan Kaufmann Publishers Chapter Six. 2  1998 Morgan Kaufmann Publishers Pipelining Improve perfomance by increasing instruction throughput.

Oct. 18, 2000Machine Organization1 Machine Organization (CS 570) Lecture 4: Pipelining * Jeremy R. Johnson Wed. Oct. 18, 2000 *This lecture was derived.

Instructor: Senior Lecturer SOE Dan Garcia CS 61C: Great Ideas in Computer Architecture Pipelining Hazards 1.

LECTURE 7 Pipelining. DATAPATH AND CONTROL We started with the single-cycle implementation, in which a single instruction is executed over a single cycle.

Introduction to Computer Organization Pipelining.

Lecture 9. MIPS Processor Design – Pipelined Processor Design #1 Prof. Taeweon Suh Computer Science Education Korea University 2010 R&E Computer System.

CSCI-365 Computer Organization Lecture Note: Some slides and/or pictures in the following are adapted from: Computer Organization and Design, Patterson.

Pipelining: Implementation CPSC 252 Computer Organization Ellen Walker, Hiram College.

CS 61C: Great Ideas in Computer Architecture Pipelining and Hazards 1 Instructors: Vladimir Stojanovic and Nicholas Weaver

CS 110 Computer Architecture Lecture 11: Pipelining Instructor: Sören Schwertfeger School of Information Science and Technology.

Lecture 5. MIPS Processor Design Pipelined MIPS #1 Prof. Taeweon Suh Computer Science & Engineering Korea University COSE222, COMP212 Computer Architecture.

Pipelining concepts, datapath and hazards

Morgan Kaufmann Publishers

Performance of Single-cycle Design

Instructor: Justin Hsia

Pipeline Implementation (4.6)

Chapter 4 The Processor Part 3

Review: MIPS Pipeline Data and Control Paths

Morgan Kaufmann Publishers The Processor

Pipelining Chapter 6.

Instruction Execution Cycle

Pipeline Control unit (highly abstracted)

Morgan Kaufmann Publishers The Processor

Guest Lecturer: Justin Hsia

Presentation transcript:

Instructor: Senior Lecturer SOE Dan Garcia CS 61C: Great Ideas in Computer Architecture Pipelining Hazards 1

Great Idea #4: Parallelism Smart Phone Warehouse Scale Computer Leverage Parallelism & Achieve High Performance Core … Memory Input/Output Computer Core Parallel Requests Assigned to computer e.g. search “Garcia” Parallel Threads Assigned to core e.g. lookup, ads Parallel Instructions > 1 one time e.g. 5 pipelined instructions Parallel Data > 1 data one time e.g. add of 4 pairs of words Hardware descriptions All gates functioning in parallel at same time Software Hardware Cache Memory Core Instruction Unit(s) Functional Unit(s) A 0 +B 0 A 1 +B 1 A 2 +B 2 A 3 +B 3 Logic Gates 2

Review of Last Lecture Implementing controller for your datapath – Take decoded signals from instruction and generate control signals – Use “AND” and “OR” Logic scheme Pipelining improves performance by exploiting Instruction Level Parallelism – 5-stage pipeline for MIPS: IF, ID, EX, MEM, WB – Executes multiple instructions in parallel – What can go wrong??? 3

Agenda Pipelining Performance Structural Hazards Administrivia Data Hazards – Forwarding – Load Delay Slot Control Hazards 4

Review: Pipelined Datapath 5

Pipelined Execution Representation Every instruction must take same number of steps, so some stages will idle – e.g. MEM stage for any arithmetic instruction IFIDEXMEMWB IFIDEXMEMWB IFIDEXMEMWB IFIDEXMEMWB IFIDEXMEMWB IFIDEXMEMWB Time 6

Graphical Pipeline Diagrams Use datapath figure below to represent pipeline: IFIDEXMemWB ALU I$ Reg D$Reg 1. Instruction Fetch 2. Decode/ Register Read 3. Execute4. Memory 5. Write Back PC instruction memory +4 Register File rt rs rd ALU Data memory imm MUX 7

InstrOrderInstrOrder Load Add Store Sub Or I$ Time (clock cycles) I$ ALU Reg I$ D$ ALU Reg D$ Reg I$ D$ Reg ALU Reg D$ Reg D$ ALU RegFile: left half is write, right half is read Reg I$ Graphical Pipeline Representation 8

Pipelining Performance (1/3) Use T c (“time between completion of instructions”) to measure speedup – – Equality only achieved if stages are balanced (i.e. take the same amount of time) If not balanced, speedup is reduced Speedup due to increased throughput – Latency for each instruction does not decrease 9

Pipelining Performance (2/3) Assume time for stages is – 100ps for register read or write – 200ps for other stages What is pipelined clock rate? – Compare pipelined datapath with single-cycle datapath InstrInstr fetch Register read ALU opMemory access Register write Total time lw200ps100 ps200ps 100 ps800ps sw200ps100 ps200ps 700ps R-format200ps100 ps200ps100 ps600ps beq200ps100 ps200ps500ps 10

Pipelining Performance (3/3) Single-cycle T c = 800 ps Pipelined T c = 200 ps 11

Pipelining Hazards A hazard is a situation that prevents starting the next instruction in the next clock cycle 1)Structural hazard – A required resource is busy (e.g. needed in multiple stages) 2)Data hazard – Data dependency between instructions – Need to wait for previous instruction to complete its data read/write 3)Control hazard – Flow of execution depends on previous instruction 12

1. Structural Hazards Conflict for use of a resource MIPS pipeline with a single memory? – Load/Store requires memory access for data – Instruction fetch would have to stall for that cycle Causes a pipeline “bubble” Hence, pipelined datapaths require separate instruction/data memories – Separate L1 I$ and L1 D$ take care of this 13

I$ Load Instr 1 Instr 2 Instr 3 Instr 4 ALU I$ Reg D$Reg ALU I$ Reg D$Reg ALU I$ Reg D$Reg ALU Reg D$Reg ALU I$ Reg D$Reg InstrOrderInstrOrder Time (clock cycles) Structural Hazard #1: Single Memory Trying to read same memory twice in same clock cycle 14

Structural Hazard #2: Registers (1/2) I$ Load Instr 1 Instr 2 Instr 3 Instr 4 ALU I$ Reg D$Reg ALU I$ Reg D$Reg ALU I$ Reg D$Reg ALU Reg D$Reg ALU I$ Reg D$Reg InstrOrderInstrOrder Time (clock cycles) Can we read and write to registers simultaneously? 15

Structural Hazard #2: Registers (2/2) Two different solutions have been used: 1)Split RegFile access in two: Write during 1 st half and Read during 2 nd half of each clock cycle Possible because RegFile access is VERY fast (takes less than half the time of ALU stage) 2)Build RegFile with independent read and write ports Conclusion: Read and Write to registers during same clock cycle is okay 16

Administrivia Check-in with Project 3… 17

2. Data Hazards (1/2) Consider the following sequence of instructions: add $t0, $t1, $t2 sub $t4, $t0, $t3 and $t5, $t0, $t6 or $t7, $t0, $t8 xor $t9, $t0, $t10 18

2. Data Hazards (2/2) Data-flow backwards in time are hazards sub $t4,$t0,$t3 ALU I$ Reg D$Reg and $t5,$t0,$t6 ALU I$ Reg D$Reg or $t7,$t0,$t8 I$ ALU Reg D$Reg xor $t9,$t0,$t10 ALU I$ Reg D$Reg add $t0,$t1,$t2 IFID/RFEXMEMWB ALU I$ Reg D$ Reg InstrOrderInstrOrder Time (clock cycles) 19

Data Hazard Solution: Forwarding Forward result as soon as it is available – OK that it’s not stored in RegFile yet sub $t4,$t0,$t3 ALU I$ Reg D$Reg and $t5,$t0,$t6 ALU I$ Reg D$Reg or $t7,$t0,$t8 I$ ALU Reg D$Reg xor $t9,$t0,$t10 ALU I$ Reg D$Reg add $t0,$t1,$t2 IFID/RFEXMEMWB ALU I$ Reg D$ Reg 20

Datapath for Forwarding (1/2) What changes need to be made here? 21

Datapath for Forwarding (2/2) Handled by forwarding unit 22

Data Hazard: Loads (1/4) Recall: Dataflow backwards in time are hazards Can’t solve all cases with forwarding –Must stall instruction dependent on load, then forward (more hardware) sub $t3,$t0,$t2 ALU I$ Reg D$Reg lw $t0,0($t1) IFID/RFEX MEM WB ALU I$ Reg D$ Reg 23

Data Hazard: Loads (2/4) Hardware stalls pipeline – Called “hardware interlock” sub $t3,$t0,$t2 ALU I$ Reg D$Reg bub ble and $t5,$t0,$t4 ALU I$ Reg D$Reg bub ble or $t7,$t0,$t6 I$ ALU Reg D$ bub ble lw $t0, 0($t1) IFID/RFEX MEM WB ALU I$ Reg D$ Reg Schematically, this is what we want, but in reality stalls done “horizontally” How to stall just part of pipeline? 24

Data Hazard: Loads (3/4) Stall is equivalent to nop sub $t3,$t0,$t2 and $t5,$t0,$t4 or $t7,$t0,$t6 I$ ALU Reg D$ lw $t0, 0($t1) ALU I$ Reg D$ Reg bub ble ALU I$ Reg D$ Reg ALU I$ Reg D$ Reg nop 25

Data Hazard: Loads (4/4) Slot after a load is called a load delay slot – If that instruction uses the result of the load, then the hardware interlock will stall it for one cycle – Letting the hardware stall the instruction in the delay slot is equivalent to putting a nop in the slot (except the latter uses more code space) Idea: Let the compiler put an unrelated instruction in that slot  no stall! 26

Code Scheduling to Avoid Stalls Reorder code to avoid use of load result in the next instruction! MIPS code for D=A+B; E=A+C; # Method 1: lw$t1, 0($t0) lw$t2, 4($t0) add$t3, $t1, $t2 sw$t3, 12($t0) lw$t4, 8($t0) add$t5, $t1, $t4 sw$t5, 16($t0) # Method 2: lw$t1, 0($t0) lw$t2, 4($t0) lw$t4, 8($t0) add$t3, $t1, $t2 sw$t3, 12($t0) add$t5, $t1, $t4 sw$t5, 16($t0) Stall! 13 cycles 11 cycles 27

Question: For each code sequences below, choose one of the statements below: No stalls as is A) No stalls with forwarding B) Must stall C) 1: lw$t0,0($t0) add $t1,$t0,$t0 2: add $t1,$t0,$t0 addi $t2,$t0,5 addi $t4,$t1,5 3: addi $t1,$t0,1 addi $t2,$t0,2 addi $t3,$t0,2 addi $t3,$t0,4 addi $t5,$t1,5 28 PEER INSTRUCTION

Code Sequence 1 I$ lw add instr ALU I$ Reg D$Reg ALU I$ Reg D$Reg ALU I$ Reg D$Reg ALU Reg D$Reg ALU I$ Reg D$Reg InstrOrderInstrOrder Time (clock cycles) Must stall 29 PEER INSTRUCTION ANSWER

Code Sequence 2 I$ add addi instr ALU I$ Reg D$Reg ALU I$ Reg D$Reg ALU I$ Reg D$Reg ALU Reg D$Reg ALU I$ Reg D$Reg InstrOrderInstrOrder Time (clock cycles) forwarding no forwarding No stalls with forwarding 30 PEER INSTRUCTION ANSWER

Code Sequence 3 I$ addi ALU I$ Reg D$Reg ALU I$ Reg D$Reg ALU I$ Reg D$Reg ALU Reg D$Reg ALU I$ Reg D$Reg InstrOrderInstrOrder Time (clock cycles) No stalls as is 31 PEER INSTRUCTION ANSWER

Summary Hazards reduce effectiveness of pipelining – Cause stalls/bubbles Structural Hazards – Conflict in use of datapath component Data Hazards – Need to wait for result of a previous instruction Control Hazards – Address of next instruction uncertain/unknown – More to come next lecture! 32