Guest Lecturer: Justin Hsia

Presentation transcript:

CS 61C: Great Ideas in Computer Architecture
Pipelining Hazards
Guest Lecturer: Justin Hsia
4/12/2013 Spring 2013 -- Lecture #31

Great Idea #4: Parallelism
Leverage Parallelism & Achieve High Performance, in software and in hardware:
- Parallel Requests: assigned to computer, e.g. search “Garcia”
- Parallel Threads: assigned to core, e.g. lookup, ads
- Parallel Instructions: > 1 instruction @ one time, e.g. 5 pipelined instructions
- Parallel Data: > 1 data item @ one time, e.g. add of 4 pairs of words
- Hardware descriptions: all gates functioning in parallel at same time
[Figure: hardware hierarchy from Warehouse Scale Computer and Smart Phone down through Computer (Core, Memory, Input/Output), Core (Instruction Unit(s), Functional Unit(s), Cache Memory), functional units computing A0+B0, A1+B1, A2+B2, A3+B3, and Logic Gates]

Review of Last Lecture
- Implementing controller for your datapath
  - Take decoded signals from instruction and generate control signals
  - Use “AND” and “OR” logic scheme
- Pipelining improves performance by exploiting Instruction Level Parallelism
  - 5-stage pipeline for MIPS: IF, ID, EX, MEM, WB
  - Executes multiple instructions in parallel
  - What can go wrong???

Agenda
- Pipelining Performance
- Structural Hazards
- Administrivia
- Data Hazards
  - Forwarding
  - Load Delay Slot
- Control Hazards

Review: Pipelined Datapath
[Figure: pipelined datapath; source: Morgan Kaufmann Publishers, Chapter 4, The Processor]

Pipelined Execution Representation
[Figure: instructions staggered over time, each passing through IF, ID, EX, MEM, WB]
- Each instruction has identical latency!
- Every instruction must take the same number of steps, so some stages will idle
  - e.g. MEM stage for any arithmetic instruction

Graphical Pipeline Diagrams
[Figure: five-stage datapath: 1. Instruction Fetch, 2. Decode/Register Read, 3. Execute, 4. Memory, 5. Write Back; components: PC, +4, instruction memory, Register File (rs, rt, rd), imm, MUX, ALU, Data memory]
- Use datapath figure below to represent pipeline:
[Figure: stages IF, ID, EX, Mem, WB drawn as I$, Reg, ALU, D$, Reg]

Graphical Pipeline Representation
- RegFile: left half is write, right half is read
[Figure: pipeline diagram with time (clock cycles) across and instruction order down: Load, Add, Store, Sub, Or, each passing through I$, Reg, ALU, D$, Reg]
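The staggered diagram described above can be approximated in text. This is a minimal sketch, assuming an ideal pipeline with no stalls in which the instruction at position i enters IF in cycle i + 1; the function name `pipeline_diagram` is made up for illustration.

```python
STAGES = ["IF", "ID", "EX", "MEM", "WB"]


def pipeline_diagram(instructions):
    """Return one text row per instruction for an ideal (stall-free) pipeline.

    Row i is shifted right by i cycles, so column c shows the stage that
    instruction occupies in clock cycle c + 1.
    """
    total_cycles = len(instructions) + len(STAGES) - 1
    rows = []
    for i, name in enumerate(instructions):
        cells = [""] * total_cycles
        for s, stage in enumerate(STAGES):
            cells[i + s] = stage
        # 6-character label column, then one 4-character column per cycle.
        rows.append(f"{name:<6}" + "".join(f"{c:<4}" for c in cells).rstrip())
    return rows


if __name__ == "__main__":
    for row in pipeline_diagram(["Load", "Add", "Store", "Sub", "Or"]):
        print(row)
```

Reading down any column shows which instruction occupies which stage in that clock cycle.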

Pipelining Performance (1/3)
- Use Tc (“time between completion of instructions”) to measure speedup:
    Tc,pipelined >= Tc,single-cycle / Number of stages
  - Equality only achieved if stages are balanced (i.e. take the same amount of time)
- If not balanced, speedup is reduced
- Speedup due to increased throughput
  - Latency for each instruction does not decrease
7/24/2012 Summer 2012 -- Lecture #21

Pipelining Performance (2/3)
- Assume time for stages is
  - 100ps for register read or write
  - 200ps for other stages
- What is pipelined clock rate?
  - Compare pipelined datapath with single-cycle datapath
Instr     Instr fetch  Register read  ALU op  Memory access  Register write  Total time
lw        200ps        100ps          200ps   200ps          100ps           800ps
sw        200ps        100ps          200ps   200ps          -               700ps
R-format  200ps        100ps          200ps   -              100ps           600ps
beq       200ps        100ps          200ps   -              -               500ps

Pipelining Performance (3/3)
- Single-cycle: Tc = 800 ps
- Pipelined: Tc = 200 ps
- Here using Tc as “time between completion of instructions.”
[Figure: single-cycle and pipelined execution timelines]
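The two Tc values follow directly from the stage times assumed on the previous slide. Here is a small sketch of that arithmetic; the stage-to-instruction mapping mirrors the table of per-instruction times, the dictionary and function names are illustrative, and the run-time formula assumes an ideal pipeline with no stalls.

```python
# Stage latencies in picoseconds: 100 ps for register read/write, 200 ps otherwise.
STAGE_PS = {"IF": 200, "ID": 100, "EX": 200, "MEM": 200, "WB": 100}

# Stages each instruction class actually needs in the single-cycle datapath.
USES = {
    "lw": ["IF", "ID", "EX", "MEM", "WB"],
    "sw": ["IF", "ID", "EX", "MEM"],
    "R-format": ["IF", "ID", "EX", "WB"],
    "beq": ["IF", "ID", "EX"],
}


def single_cycle_tc():
    """Clock period of the single-cycle design: the slowest instruction (lw)."""
    return max(sum(STAGE_PS[s] for s in stages) for stages in USES.values())


def pipelined_tc():
    """Clock period of the pipelined design: the slowest single stage."""
    return max(STAGE_PS.values())


def total_time(n_instructions, pipelined):
    """Time in ps to finish n instructions, assuming no stalls."""
    if pipelined:
        # The first instruction fills the pipeline, then one completes per cycle.
        return (n_instructions + len(STAGE_PS) - 1) * pipelined_tc()
    return n_instructions * single_cycle_tc()


if __name__ == "__main__":
    print("single-cycle Tc:", single_cycle_tc(), "ps")
    print("pipelined Tc:", pipelined_tc(), "ps")
    print("Tc ratio:", single_cycle_tc() / pipelined_tc())
    for n in (3, 1000):
        print(n, "instructions:", total_time(n, False), "ps vs", total_time(n, True), "ps")
```

The Tc ratio comes out to 4 rather than 5 because the stages are not balanced: the 100 ps register stages still occupy a full 200 ps cycle.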

Pipelining Hazards
A hazard is a situation that prevents starting the next instruction in the next clock cycle.
- Structural hazard
  - A required resource is busy (e.g. needed in multiple stages)
- Data hazard
  - Data dependency between instructions
  - Need to wait for previous instruction to complete its data read/write
- Control hazard
  - Flow of execution depends on previous instruction

1. Structural Hazards
- Conflict for use of a resource
- MIPS pipeline with a single memory?
  - Load/Store requires memory access for data
  - Instruction fetch would have to stall for that cycle
    - Causes a pipeline “bubble”
- Hence, pipelined datapaths require separate instruction/data memories
  - Separate L1 I$ and L1 D$ take care of this

Structural Hazard #1: Single Memory
[Figure: pipeline diagram with time (clock cycles) across and instruction order down: Load, Instr 1, Instr 2, Instr 3, Instr 4]
- Trying to read same memory twice in same clock cycle
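To see where the conflict falls, the sketch below lists the cycles in which a load or store is in MEM while a later instruction is in IF. It assumes one unified memory, one instruction issued per cycle with no stalls, and cycles numbered from 1; the function name is hypothetical.

```python
MEMORY_OPS = {"lw", "sw"}


def single_memory_conflicts(opcodes):
    """Return (cycle, mem_instr_index, fetch_instr_index) for each conflict.

    Instruction i (0-based) is in IF during cycle i + 1 and in MEM during
    cycle i + 4, so its data access collides with the fetch of instruction i + 3.
    """
    conflicts = []
    for i, op in enumerate(opcodes):
        if op not in MEMORY_OPS:
            continue
        mem_cycle = i + 4
        fetch_index = mem_cycle - 1  # instruction whose IF lands in that cycle
        if fetch_index < len(opcodes):
            conflicts.append((mem_cycle, i, fetch_index))
    return conflicts


if __name__ == "__main__":
    # Load followed by four other instructions, as in the diagram above.
    print(single_memory_conflicts(["lw", "add", "sub", "and", "or"]))
```

For the Load sequence this reports cycle 4, where the load's data access overlaps the fetch of Instr 3.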

Structural Hazard #2: Registers (1/2)
[Figure: pipeline diagram with time (clock cycles) across and instruction order down: Load, Instr 1, Instr 2, Instr 3, Instr 4]
- Can we read and write to registers simultaneously?

Structural Hazard #2: Registers (2/2)
- Two different solutions have been used:
  1. Split RegFile access in two: Write during 1st half and Read during 2nd half of each clock cycle
     - Possible because RegFile access is VERY fast (takes less than half the time of ALU stage)
  2. Build RegFile with independent read and write ports
- Conclusion: Read and Write to registers during same clock cycle is okay
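A toy model of the first solution may help: within one clock cycle the write is applied first and the reads happen afterwards, so a read sees a value written in the same cycle. The class and method names are made up for illustration, and this is a behavioral sketch rather than a description of the actual hardware.

```python
class RegFile:
    """32-register file that writes in the first half-cycle, reads in the second."""

    def __init__(self):
        self.regs = [0] * 32

    def cycle(self, write=None, reads=()):
        """Simulate one clock cycle.

        write: optional (register_index, value) pair applied in the first half.
        reads: register indices read in the second half.
        """
        if write is not None:
            index, value = write
            if index != 0:  # register 0 ($zero) is hardwired to 0 in MIPS
                self.regs[index] = value
        return [self.regs[r] for r in reads]


if __name__ == "__main__":
    rf = RegFile()
    # Write-back of one instruction and register read of another in the same cycle.
    print(rf.cycle(write=(8, 42), reads=(8, 9)))
```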

Administrivia
- Project 2: Performance Optimization
  - Part 1 due Sunday (4/14)
  - Part 2 released by Sunday night, due 4/21
  - Built-in performance competition for Part 2!
- Never too early to start looking at past exams!
- Dan has moved his OH because of traveling
  - See Piazza post @1492

2. Data Hazards (1/2)
Consider the following sequence of instructions:
    add $t0, $t1, $t2
    sub $t4, $t0, $t3
    and $t5, $t0, $t6
    or  $t7, $t0, $t8
    xor $t9, $t0, $t10

2. Data Hazards (2/2)
- Data flowing backwards in time is a hazard
[Figure: pipeline diagram (IF, ID/RF, EX, MEM, WB) with time (clock cycles) across and instruction order down:
    add $t0,$t1,$t2
    sub $t4,$t0,$t3
    and $t5,$t0,$t6
    or  $t7,$t0,$t8
    xor $t9,$t0,$t10]

Data Hazard Solution: Forwarding
- Forward result as soon as it is available
  - OK that it’s not stored in RegFile yet
[Figure: pipeline diagram (IF, ID/RF, EX, MEM, WB) with forwarding paths:
    add $t0,$t1,$t2
    sub $t4,$t0,$t3
    and $t5,$t0,$t6
    or  $t7,$t0,$t8
    xor $t9,$t0,$t10]
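Which of these instructions actually need a forwarded value can be worked out from the distance to the producing instruction. The sketch below assumes the 5-stage pipeline with a RegFile that writes in the first half of a cycle and reads in the second, so only consumers one or two instructions behind the producer need forwarding. The parser handles only the three-register format used here, and all names are illustrative.

```python
import re


def parse(line):
    """Split 'op $rd, $rs, $rt' into (op, dest, sources). R-format only."""
    op, rest = line.split(None, 1)
    regs = re.findall(r"\$\w+", rest)
    return op, regs[0], regs[1:]


def needs_forwarding(program):
    """Return (producer_index, consumer_index, register) triples.

    A consumer at distance 3 reads the RegFile in the same cycle the producer
    writes it back, which is fine with the split-cycle RegFile, so only
    distances 1 and 2 are reported. The nearest producer wins.
    """
    parsed = [parse(line) for line in program]
    hazards = []
    for j, (_, _, sources) in enumerate(parsed):
        for reg in dict.fromkeys(sources):  # unique sources, original order
            for i in (j - 1, j - 2):
                if i >= 0 and parsed[i][1] == reg:
                    hazards.append((i, j, reg))
                    break
    return hazards


if __name__ == "__main__":
    program = [
        "add $t0, $t1, $t2",
        "sub $t4, $t0, $t3",
        "and $t5, $t0, $t6",
        "or  $t7, $t0, $t8",
        "xor $t9, $t0, $t10",
    ]
    for producer, consumer, reg in needs_forwarding(program):
        print(f"{program[consumer]!r} needs {reg} forwarded from {program[producer]!r}")
```

Under these assumptions only sub and and need the forwarded $t0; or and xor can read it from the RegFile.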

Datapath for Forwarding (1/2)
- What changes need to be made here?
[Figure: pipelined datapath, from Chapter 4, The Processor]

Datapath for Forwarding (2/2)
- Handled by forwarding unit
[Figure: datapath with forwarding unit; see figure 4.54 on p. 368]

Data Hazard: Loads (1/4)
- Recall: data flowing backwards in time is a hazard
- Can’t solve all cases with forwarding
  - Must stall instruction dependent on load, then forward (more hardware)
[Figure: pipeline diagram (IF, ID/RF, EX, MEM, WB):
    lw  $t0,0($t1)
    sub $t3,$t0,$t2]

Data Hazard: Loads (2/4)
- Hardware stalls pipeline
  - Called “hardware interlock”
- Schematically, this is what we want, but in reality stalls done “horizontally”
- How to stall just part of pipeline?
[Figure: pipeline diagram (IF, ID/RF, EX, MEM, WB) with a bubble inserted after the load:
    lw  $t0, 0($t1)
    sub $t3,$t0,$t2
    and $t5,$t0,$t4
    or  $t7,$t0,$t6]

Data Hazard: Loads (3/4)
- Stall is equivalent to nop
[Figure: pipeline diagram in which the bubble is shown as a nop:
    lw  $t0, 0($t1)
    nop
    sub $t3,$t0,$t2
    and $t5,$t0,$t4
    or  $t7,$t0,$t6]

Data Hazard: Loads (4/4)
- Slot after a load is called a load delay slot
  - If that instruction uses the result of the load, then the hardware interlock will stall it for one cycle
  - Letting the hardware stall the instruction in the delay slot is equivalent to putting a nop in the slot (except the latter uses more code space)
- Idea: Let the compiler put an unrelated instruction in that slot -> no stall!

Code Scheduling to Avoid Stalls
- Reorder code to avoid use of load result in the next instruction!
- MIPS code for D=A+B; E=A+C;
# Method 1: 13 cycles
    lw  $t1, 0($t0)
    lw  $t2, 4($t0)
    add $t3, $t1, $t2     # Stall!
    sw  $t3, 12($t0)
    lw  $t4, 8($t0)
    add $t5, $t1, $t4     # Stall!
    sw  $t5, 16($t0)
# Method 2: 11 cycles
    lw  $t1, 0($t0)
    lw  $t2, 4($t0)
    lw  $t4, 8($t0)
    add $t3, $t1, $t2
    sw  $t3, 12($t0)
    add $t5, $t1, $t4
    sw  $t5, 16($t0)
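The 13-cycle and 11-cycle figures can be checked with a short model. The sketch assumes a 5-stage pipeline with full forwarding, where the only stall is one cycle whenever an instruction uses a register loaded by the instruction immediately before it; the parser covers only the lw, sw, and three-register forms used here, and the function names are illustrative.

```python
import re


def parse(line):
    """Return (op, dest, sources) for the lw / sw / R-format subset used here."""
    op, rest = line.split(None, 1)
    regs = re.findall(r"\$\w+", rest)
    if op == "sw":  # sw reads both of its registers and writes none
        return op, None, regs
    return op, regs[0], regs[1:]


def load_use_stalls(program):
    """Count instructions that read a register loaded by the previous lw."""
    parsed = [parse(line) for line in program]
    return sum(
        1
        for prev, cur in zip(parsed, parsed[1:])
        if prev[0] == "lw" and prev[1] in cur[2]
    )


def total_cycles(program, stages=5):
    """Cycles to drain the pipeline: fill time plus one per instruction and stall."""
    return len(program) + (stages - 1) + load_use_stalls(program)


METHOD_1 = [
    "lw $t1, 0($t0)", "lw $t2, 4($t0)", "add $t3, $t1, $t2", "sw $t3, 12($t0)",
    "lw $t4, 8($t0)", "add $t5, $t1, $t4", "sw $t5, 16($t0)",
]
METHOD_2 = [
    "lw $t1, 0($t0)", "lw $t2, 4($t0)", "lw $t4, 8($t0)", "add $t3, $t1, $t2",
    "sw $t3, 12($t0)", "add $t5, $t1, $t4", "sw $t5, 16($t0)",
]

if __name__ == "__main__":
    print("Method 1:", total_cycles(METHOD_1), "cycles")
    print("Method 2:", total_cycles(METHOD_2), "cycles")
```

Both methods execute the same seven instructions; moving the third lw ahead of the first add removes both load-use pairs.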

3. Control Hazards
- Branch (beq, bne) determines flow of control
  - Fetching next instruction depends on branch outcome
  - Pipeline can’t always fetch correct instruction
    - Still working on ID stage of branch
- Simple Solution: Stall on every branch until we have the new PC value
  - How long must we stall?

Branch Stall
- When is comparison result available?
[Figure: pipeline diagram with time (clock cycles) across and instruction order down: beq, Instr 1, Instr 2, Instr 3, Instr 4]
- TWO bubbles required per branch!
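The cost of stalling on every branch can be estimated with a short calculation. This sketch assumes the two-bubble penalty stated above, no other hazards, and that a branch only costs cycles when at least one instruction follows it; the function name and the example program are made up.

```python
BRANCH_OPS = {"beq", "bne"}


def cycles_with_branch_stalls(opcodes, bubbles_per_branch=2, stages=5):
    """Cycles to run the sequence when every branch stalls the fetch behind it.

    A branch in the last position delays nothing, so it is not counted.
    """
    stalling_branches = sum(1 for op in opcodes[:-1] if op in BRANCH_OPS)
    return len(opcodes) + (stages - 1) + bubbles_per_branch * stalling_branches


if __name__ == "__main__":
    program = ["beq", "add", "sub", "and", "or"]
    print("without stalls:", cycles_with_branch_stalls(program, bubbles_per_branch=0))
    print("with stalls:   ", cycles_with_branch_stalls(program))
```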

Summary
- Hazards reduce effectiveness of pipelining
  - Cause stalls/bubbles
- Structural Hazards
  - Conflict in use of datapath component
- Data Hazards
  - Need to wait for result of a previous instruction
- Control Hazards
  - Address of next instruction uncertain/unknown
- More to come next lecture!

Question: For each code sequence below, choose one of the statements below:
  1: lw   $t0,0($t0)
     add  $t1,$t0,$t0
  2: addi $t2,$t0,5
     addi $t4,$t1,5
  3: addi $t1,$t0,1
     addi $t2,$t0,2
     addi $t3,$t0,2
     addi $t3,$t0,4
     addi $t5,$t1,5
  A) No stalls as is
  B) No stalls with forwarding
  C) Must stall

Code Sequence 1: Must stall
[Figure: pipeline diagram with time (clock cycles) across and instruction order down: lw, add, instr]
7/25/2012 Summer 2012 -- Lecture #22

Code Sequence 2: No stalls with forwarding
[Figure: pipeline diagram with time (clock cycles) across and instruction order down: add, addi, instr; data paths labeled “forwarding” and “no forwarding”]

Code Sequence 3: No stalls as is
[Figure: pipeline diagram of the addi instructions, with time (clock cycles) across and instruction order down]