Pipelining concepts, datapath and hazards

Pipelining concepts, datapath and hazards Lecture 17 CDA 3103 07-16-2014

Boolean Exprs for Controller
Instruction fields (Instruction<31:0> from Inst Memory): Op = <31:26>, Rs = <25:21>, Rt = <20:16>, Rd = <15:11>, Func = <5:0>, Imm16 = <15:0>, Adr (jump target) = <25:0>.
Op 0-5 are really instruction bits 26-31; Func 0-5 are really instruction bits 0-5.
rtype = ~op5 · ~op4 · ~op3 · ~op2 · ~op1 · ~op0
ori = ~op5 · ~op4 · op3 · op2 · ~op1 · op0
lw = op5 · ~op4 · ~op3 · ~op2 · op1 · op0
sw = op5 · ~op4 · op3 · ~op2 · op1 · op0
beq = ~op5 · ~op4 · ~op3 · op2 · ~op1 · ~op0
jump = ~op5 · ~op4 · ~op3 · ~op2 · op1 · ~op0
add = rtype · func5 · ~func4 · ~func3 · ~func2 · ~func1 · ~func0
sub = rtype · func5 · ~func4 · ~func3 · ~func2 · func1 · ~func0
Encodings (s = rs, t = rt, d = rd, i = immediate/address bits):
ADD   0000 00ss ssst tttt dddd d000 0010 0000
SUB   0000 00ss ssst tttt dddd d000 0010 0010
ORI   0011 01ss ssst tttt iiii iiii iiii iiii
LW    1000 11ss ssst tttt iiii iiii iiii iiii
SW    1010 11ss ssst tttt iiii iiii iiii iiii
BEQ   0001 00ss ssst tttt iiii iiii iiii iiii
JUMP  0000 10ii iiii iiii iiii iiii iiii iiii
How do we implement this in gates?
Dr Dan Garcia

Boolean Exprs for Controller
RegDst = add + sub
ALUSrc = ori + lw + sw
MemtoReg = lw
RegWrite = add + sub + ori + lw
MemWrite = sw
nPCsel = beq
Jump = jump
ExtOp = lw + sw
ALUctr[0] = sub + beq
ALUctr[1] = ori
(assume ALUctr is 00 ADD, 01 SUB, 10 OR)

Controller Implementation
(Figure: opcode and func feed an "AND" logic plane that decodes add, sub, ori, lw, sw, beq and jump; an "OR" logic plane combines those terms into RegDst, ALUSrc, MemtoReg, RegWrite, MemWrite, nPCsel, Jump, ExtOp, ALUctr[0] and ALUctr[1].)
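To make the AND/OR structure concrete, here is a minimal Python sketch that evaluates the slide's Boolean expressions in software. The function names (`field`, `product_term`, `control`) are hypothetical, chosen only for illustration; the opcode and funct patterns are copied from the encoding table, and `ALUctr` uses the slide's assumed coding (00 ADD, 01 SUB, 10 OR).

```python
def field(instr, hi, lo):
    """Extract Instruction<hi:lo> as an unsigned integer."""
    return (instr >> lo) & ((1 << (hi - lo + 1)) - 1)


def product_term(value, pattern):
    """AND of literals. `pattern` is written MSB first, so "001101" on a
    6-bit field means ~b5 · ~b4 · b3 · b2 · ~b1 · b0."""
    width = len(pattern)
    return all(((value >> (width - 1 - i)) & 1) == int(ch)
               for i, ch in enumerate(pattern))


def control(instr):
    op = field(instr, 31, 26)    # op5..op0
    func = field(instr, 5, 0)    # func5..func0

    # "AND" plane: one product term per instruction
    rtype = product_term(op, "000000")
    add = rtype and product_term(func, "100000")
    sub = rtype and product_term(func, "100010")
    ori = product_term(op, "001101")
    lw = product_term(op, "100011")
    sw = product_term(op, "101011")
    beq = product_term(op, "000100")
    jump = product_term(op, "000010")

    # "OR" plane: each control signal is a sum of decoded instructions
    return {
        "RegDst": add or sub,
        "ALUSrc": ori or lw or sw,
        "MemtoReg": lw,
        "RegWrite": add or sub or ori or lw,
        "MemWrite": sw,
        "nPCsel": beq,
        "Jump": jump,
        "ExtOp": lw or sw,
        # ALUctr[1] = ori, ALUctr[0] = sub + beq
        "ALUctr": (int(ori) << 1) | int(sub or beq),
    }


# lw $t0, 0($t1): opcode 100011, rs = 9 ($t1), rt = 8 ($t0)
print(control((0b100011 << 26) | (9 << 21) | (8 << 16)))
```

In hardware each product term is one AND gate with some inputs inverted, and each control signal is one OR gate over those terms; the dictionary simply plays the role of the OR plane's outputs.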

Call home, we’ve made HW/SW contact!
High Level Language Program (e.g., C):
    temp = v[k]; v[k] = v[k+1]; v[k+1] = temp;
  ↓ Compiler
Assembly Language Program (e.g., MIPS):
    lw $t0, 0($2)
    lw $t1, 4($2)
    sw $t1, 0($2)
    sw $t0, 4($2)
  ↓ Assembler
Machine Language Program (MIPS):
    0000 1001 1100 0110 1010 1111 0101 1000
    1010 1111 0101 1000 0000 1001 1100 0110
    1100 0110 1010 1111 0101 1000 0000 1001
    0101 1000 0000 1001 1100 0110 1010 1111
  ↓ Machine Interpretation
Hardware Architecture Description (e.g., block diagrams)
  ↓ Architecture Implementation
Logic Circuit Description (Circuit Schematic Diagrams)

Review: Single-cycle Processor
Five steps to design a processor:
1. Analyze instruction set → datapath requirements
2. Select set of datapath components & establish clock methodology
3. Assemble datapath meeting the requirements
4. Analyze implementation of each instruction to determine the setting of control points that effect the register transfer
5. Assemble the control logic: formulate logic equations, design circuits
(Figure: a processor, made of control and datapath, connected to memory, input and output.)

Agenda
Pipelining Performance
Structural Hazards
Data Hazards
Forwarding
Load Delay Slot
Control Hazards

Performance Issues
Morgan Kaufmann Publishers 11 September, 2018, Chapter 4 – The Processor
Longest delay determines clock period.
Critical path: load instruction (instruction memory → register file → ALU → data memory → register file).
Not feasible to vary period for different instructions: violates design principle "making the common case fast".
We will improve performance by pipelining.

Single Cycle Performance
Assume the time for actions is 100 ps for register read or write and 200 ps for other events. What is the clock rate?

Instr     Instr fetch  Register read  ALU op  Memory access  Register write  Total time
lw        200 ps       100 ps         200 ps  200 ps         100 ps          800 ps
sw        200 ps       100 ps         200 ps  200 ps                         700 ps
R-format  200 ps       100 ps         200 ps                 100 ps          600 ps
beq       200 ps       100 ps         200 ps                                 500 ps

Clock rate = 1 / 800 ps = 1.25 GHz, because the single clock period must fit the slowest instruction (lw).
What can we do to improve clock rate? Will this improve performance as well? We want an increased clock rate to mean faster programs.
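The clock-rate question can be checked with a few lines of Python. This is only a sketch of the slide's arithmetic: the stage latencies are the slide's assumptions, and the per-instruction step lists mirror the table (sw skips register write, R-format skips memory, beq skips both).

```python
# Slide assumptions: 100 ps for register read/write, 200 ps for everything else.
LATENCY_PS = {"fetch": 200, "reg_read": 100, "alu": 200, "mem": 200, "reg_write": 100}

# Which steps each instruction class actually uses (rows of the table).
USES = {
    "lw": ["fetch", "reg_read", "alu", "mem", "reg_write"],
    "sw": ["fetch", "reg_read", "alu", "mem"],
    "R-format": ["fetch", "reg_read", "alu", "reg_write"],
    "beq": ["fetch", "reg_read", "alu"],
}


def total_time_ps(instr_class):
    return sum(LATENCY_PS[step] for step in USES[instr_class])


# A single-cycle clock period must fit the slowest instruction (lw).
period_ps = max(total_time_ps(c) for c in USES)
clock_ghz = 1000 / period_ps  # 1 GHz corresponds to a 1000 ps period
print(period_ps, clock_ghz)  # 800 1.25
```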

Gotta Do Laundry
Ann, Brian, Cathy, Dave (A, B, C, D) each have one load of clothes to wash, dry, fold, and put away.
Washer takes 30 minutes
Dryer takes 30 minutes
"Folder" takes 30 minutes
"Stasher" takes 30 minutes to put clothes into drawers

Sequential Laundry
Sequential laundry takes 8 hours for 4 loads (4 loads × 4 stages × 30 minutes).
(Figure: timeline from 6 PM to 2 AM; tasks A, B, C, D run one after another in 30-minute steps.)

Pipelined Laundry
Pipelined laundry takes 3.5 hours for 4 loads!
(Figure: timeline starting at 6 PM; tasks A, B, C, D overlap, each starting 30 minutes after the previous one.)

Pipelining Lessons (1/2)
Pipelining doesn’t help the latency of a single task; it helps the throughput of the entire workload.
Multiple tasks operate simultaneously using different resources.
Potential speedup = number of pipe stages.
Time to "fill" the pipeline and time to "drain" it reduces speedup: 2.3X vs. 4X in this example (8 hours / 3.5 hours ≈ 2.3).
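The laundry numbers follow from a simple count, sketched here in Python under the slide's assumption that every stage takes exactly 30 minutes. With k stages and n loads, the pipelined schedule needs k steps to finish the first load and one more step for each remaining load.

```python
def sequential_minutes(loads, stages, stage_minutes):
    # each load goes through every stage before the next load starts
    return loads * stages * stage_minutes


def pipelined_minutes(loads, stages, stage_minutes):
    # `stages` steps to fill the pipeline, then one load finishes per step
    return (stages + loads - 1) * stage_minutes


seq = sequential_minutes(4, 4, 30)
pipe = pipelined_minutes(4, 4, 30)
print(seq / 60, pipe / 60, round(seq / pipe, 2))  # 8.0 3.5 2.29

# With many loads the fill/drain time is amortized and the speedup nears 4X.
print(round(sequential_minutes(1000, 4, 30) / pipelined_minutes(1000, 4, 30), 2))  # 3.99
```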

Pipelining Lessons (2/2)
Suppose a new washer takes 20 minutes and a new stasher takes 20 minutes. How much faster is the pipeline?
Pipeline rate is limited by the slowest pipeline stage.
Unbalanced lengths of pipe stages reduce speedup.
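One way to answer the question is to simulate the four loads. The sketch below is a hypothetical model, not from the slides: it assumes a load may wait in a basket between machines (unlimited buffering), so each stage starts once both the load's previous stage and the machine itself are free.

```python
def completion_times(loads, stage_minutes):
    """Return the time (in minutes) at which each load leaves the last stage."""
    machine_free = [0] * len(stage_minutes)  # when each machine finished its previous load
    done = []
    for _ in range(loads):
        ready = 0  # every load is available at time 0
        for s, minutes in enumerate(stage_minutes):
            start = max(ready, machine_free[s])
            ready = start + minutes
            machine_free[s] = ready
        done.append(ready)
    return done


print(completion_times(4, [30, 30, 30, 30]))  # [120, 150, 180, 210]
print(completion_times(4, [20, 30, 30, 20]))  # [100, 130, 160, 190]
```

In both runs the loads finish 30 minutes apart, because the 30-minute dryer and folder set the pipeline rate; the faster washer and stasher only shorten the fill and drain, trimming the total from 210 to 190 minutes under this model.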

Steps in Executing MIPS
1) IFtch: Instruction Fetch, Increment PC
2) Dcd: Instruction Decode, Read Registers
3) Exec: Mem-ref: Calculate Address; Arith-log: Perform Operation
4) Mem: Load: Read Data from Memory; Store: Write Data to Memory
5) WB: Write Data Back to Register

Single Cycle Datapath
(Figure: PC, instruction memory, +4 adder, register file (rs, rt, rd), immediate, ALU and data memory, divided into 1. Instruction Fetch, 2. Decode/Register Read, 3. Execute, 4. Memory, 5. Write Back.)

Pipeline registers
Need registers between stages to hold information produced in the previous cycle.
(Figure: the same five-stage datapath with a register inserted between each pair of stages.)

More Detailed Pipeline

IF for Load, Store, …

ID for Load, Store, …

EX for Load

MEM for Load

WB for Load – Oops! Wrong register number
By the time the load reaches WB, IF/ID holds the fields of a later instruction, so a write register number taken from there belongs to the wrong instruction.

Corrected Datapath for Load
The destination register number travels with the load through the ID/EX, EX/MEM and MEM/WB pipeline registers and is used from MEM/WB in the WB stage.
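The register-number bug can be reproduced with a toy model of the four pipeline registers. In this hypothetical sketch, `Instr`, `run` and the `buggy` flag are illustrative names only: each clock edge shifts every instruction one pipeline register forward, and WB either uses the destination carried in MEM/WB (the corrected datapath) or, with `buggy=True`, the field of whatever instruction currently sits in IF/ID.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Instr:
    text: str
    dest: Optional[int]  # register number written in WB, None if no write


def run(program: List[Instr], cycles: int, buggy: bool = False) -> List[Tuple[int, str, Optional[int]]]:
    """Return (cycle, instruction, register written) for every write-back."""
    if_id = id_ex = ex_mem = mem_wb = None  # pipeline registers
    writes = []
    for cycle in range(1, cycles + 1):
        # WB stage: the instruction held in MEM/WB writes the register file.
        if mem_wb is not None and mem_wb.dest is not None:
            if buggy:
                # Wrong: IF/ID now holds an instruction fetched three cycles later.
                reg = if_id.dest if if_id is not None else None
            else:
                # Corrected: the number travelled with the load through every register.
                reg = mem_wb.dest
            writes.append((cycle, mem_wb.text, reg))
        # Clock edge: every instruction moves one pipeline register forward.
        mem_wb, ex_mem, id_ex = ex_mem, id_ex, if_id
        if_id = program[cycle - 1] if cycle <= len(program) else None  # IF
    return writes


prog = [Instr("lw $t0,0($t1)", 8), Instr("add $t3,$t4,$t5", 11),
        Instr("sub $t6,$t7,$t8", 14), Instr("or $t9,$s0,$s1", 25)]
print(run(prog, 8)[0])              # (5, 'lw $t0,0($t1)', 8)
print(run(prog, 8, buggy=True)[0])  # (5, 'lw $t0,0($t1)', 25)
```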

Pipelined Execution Representation
(Figure: successive instructions staggered in time, each passing through IF, ID, EX, MEM, WB.)
Each instruction has identical latency! Every instruction must take the same number of steps, so some stages will idle, e.g., the MEM stage for any arithmetic instruction.

Graphical Pipeline Diagrams
Use the datapath figure to represent the pipeline, naming each stage by its main resource: IF (I$), ID (Reg), EX (ALU), Mem (D$), WB (Reg).
(Figure: PC, instruction memory, +4, register file (rs, rt, rd), immediate, MUX, ALU and data memory, divided into 1. Instruction Fetch, 2. Decode/Register Read, 3. Execute, 4. Memory, 5. Write Back.)

Graphical Pipeline Representation
(Figure: Load, Add, Store, Sub and Or flow through I$, Reg, ALU, D$, Reg, one clock cycle apart.)
RegFile: left half is write, right half is read.
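A pipeline diagram like the one on the slide can be generated mechanically: without stalls, instruction i (counting from 0) is in stage s during clock cycle i + s + 1. The sketch below is a hypothetical text renderer, not part of the lecture.

```python
STAGES = ["IF", "ID", "EX", "MEM", "WB"]


def pipeline_diagram(names):
    """One text row per instruction; each column is one clock cycle."""
    cycles = len(names) + len(STAGES) - 1
    rows = []
    for i, name in enumerate(names):
        cells = ["   "] * cycles
        for s, stage in enumerate(STAGES):
            cells[i + s] = stage.ljust(3)  # cycle i + s + 1, stored 0-based
        rows.append(f"{name:<6}" + " ".join(cells))
    return rows


for row in pipeline_diagram(["Load", "Add", "Store", "Sub", "Or"]):
    print(row.rstrip())
```

Reading any column top to bottom shows which instruction occupies each stage in that cycle, which is exactly what the hazard slides inspect.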

Pipelining Performance (1/3)
Use Tc ("time between completion of instructions") to measure speedup:
$T_c^{\text{pipelined}} = T_c^{\text{single-cycle}} / \text{number of stages}$
Equality is only achieved if stages are balanced (i.e., take the same amount of time); if not balanced, speedup is reduced.
Speedup is due to increased throughput; latency for each instruction does not decrease.

Pipelining Performance (2/3)
Assume the time for stages is 100 ps for register read or write and 200 ps for other stages (the same per-instruction timings as in the Single Cycle Performance table: lw 800 ps, sw 700 ps, R-format 600 ps, beq 500 ps).
What is the pipelined clock rate? Compare the pipelined datapath with the single-cycle datapath.

Pipelining Performance (3/3)
Here Tc is the "time between completion of instructions."
Single-cycle Tc = 800 ps
Pipelined Tc = 200 ps (the clock must fit the slowest 200 ps stage, i.e., a 5 GHz clock)
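The comparison can be checked numerically. This sketch assumes the five stage latencies from the slide and a long instruction stream, so pipeline fill time and hazards are ignored.

```python
stage_ps = [200, 100, 200, 200, 100]  # IF, ID (reg read), EX, MEM, WB (reg write)

single_cycle_tc = sum(stage_ps)                # 800 ps: every instruction gets an lw-length cycle
pipelined_tc = max(stage_ps)                   # 200 ps: the clock must fit the slowest stage
balanced_tc = single_cycle_tc / len(stage_ps)  # 160 ps if the five stages were balanced

print(single_cycle_tc, pipelined_tc, balanced_tc)  # 800 200 160.0
print(single_cycle_tc / pipelined_tc)              # 4.0 (ideal would be 5.0)
print(1000 / pipelined_tc)                         # 5.0 (GHz)
```

The gap between 4.0 and the ideal 5.0 is the "stages are not balanced" penalty: the 100 ps register stages still occupy a full 200 ps cycle.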

Pipelining Hazards
A hazard is a situation that prevents starting the next instruction in the next clock cycle.
Structural hazard: a required resource is busy (e.g., needed in multiple stages).
Data hazard: data dependency between instructions; need to wait for the previous instruction to complete its data read/write.
Control hazard: flow of execution depends on the previous instruction.

1. Structural Hazards
Conflict for use of a resource.
MIPS pipeline with a single memory? Load/store requires memory access for data, so instruction fetch would have to stall for that cycle, causing a pipeline "bubble".
Hence, pipelined datapaths require separate instruction/data memories; separate L1 I$ and L1 D$ take care of this.

Structural Hazard #1: Single Memory
(Figure: Load followed by Instr 1 to Instr 4; in the cycle when Load accesses D$, Instr 3 is being fetched from I$.)
Trying to read the same memory twice in the same clock cycle.
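The single-memory conflict can be located with a short calculation. In this hypothetical sketch, instruction i (counting from 0) is fetched in cycle i + 1 and, if it is a load or store, uses data memory in cycle i + 4, which is exactly when instruction i + 3 is being fetched; stalls are not modeled.

```python
def memory_conflicts(ops):
    """Return (cycle, load/store index, fetching index) for each clash on a
    single shared memory, assuming no stalls."""
    conflicts = []
    for i, op in enumerate(ops):
        fetcher = i + 3
        if op in ("lw", "sw") and fetcher < len(ops):
            conflicts.append((i + 4, i, fetcher))
    return conflicts


print(memory_conflicts(["lw", "add", "sub", "or", "and"]))  # [(4, 0, 3)]
```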

Structural Hazard #2: Registers (1/2)
(Figure: Load followed by Instr 1 to Instr 4; Load writes the register file in the same cycle that Instr 3 reads it.)
Can we read and write to registers simultaneously?

Structural Hazard #2: Registers (2/2)
Two different solutions have been used:
1) Split RegFile access in two: write during the 1st half and read during the 2nd half of each clock cycle. This is possible because RegFile access is VERY fast (it takes less than half the time of the ALU stage).
2) Build RegFile with independent read and write ports.
Conclusion: reading and writing registers during the same clock cycle is okay.

2. Data Hazards (1/2)
Consider the following sequence of instructions:
    add $t0, $t1, $t2
    sub $t4, $t0, $t3
    and $t5, $t0, $t6
    or  $t7, $t0, $t8
    xor $t9, $t0, $s0

2. Data Hazards (2/2)
Data flowing backward in time indicates a hazard.
(Figure: the sequence above drawn as a pipeline diagram with stages IF, ID/RF, EX, MEM, WB.)
add writes $t0 in WB (cycle 5), but sub reads $t0 in ID in cycle 3 and and reads it in cycle 4. or reads it in cycle 5, which works because the RegFile writes in the first half of the cycle and reads in the second.

Data Hazard Solution: Forwarding
Forward the result as soon as it is available (at the end of add’s EX stage); it’s OK that it’s not stored in the RegFile yet.

Datapath for Forwarding (1/2)
What changes need to be made here?

Datapath for Forwarding (2/2)
Handled by the forwarding unit (figure 4.54, p. 368).
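The forwarding unit's decision can be sketched as a comparison of register numbers held in the pipeline registers. The following Python is modeled on the forwarding conditions given in Patterson & Hennessy, Chapter 4; `Latch` and `forward` are hypothetical names, and checking EX/MEM first gives the most recent result priority when both pipeline registers match. The same function serves both ALU operands (pass rs or rt as `src`).

```python
from dataclasses import dataclass


@dataclass
class Latch:
    """The fields of EX/MEM or MEM/WB that the forwarding unit inspects."""
    reg_write: bool  # will the instruction write a register?
    rd: int          # its destination register number


def forward(ex_mem: Latch, mem_wb: Latch, src: int) -> int:
    """Mux select for one ALU operand whose register number is `src`:
    0b10 = from EX/MEM, 0b01 = from MEM/WB, 0b00 = from the register file."""
    if ex_mem.reg_write and ex_mem.rd != 0 and ex_mem.rd == src:
        return 0b10
    if mem_wb.reg_write and mem_wb.rd != 0 and mem_wb.rd == src:
        return 0b01
    return 0b00


# add $t0,$t1,$t2 then sub $t4,$t0,$t3: when sub is in EX, add is in EX/MEM.
print(forward(Latch(True, 8), Latch(False, 0), 8))  # 2 (EX/MEM)
```

The `rd != 0` test keeps results "written" to $zero from being forwarded, since $zero must always read as 0.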

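Data Hazard: Loads (1/4)
Recall: data flowing backward in time indicates a hazard.
We can’t solve all cases with forwarding: the loaded value only exists at the end of lw’s MEM stage, but the next instruction needs it at the start of its EX stage in that same cycle.
Must stall the instruction dependent on the load, then forward (more hardware).
(Figure: lw $t0,0($t1) followed by sub $t3,$t0,$t2; stages IF, ID/RF, EX, MEM, WB.)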

Data Hazard: Loads (2/4)
Hardware stalls the pipeline; this is called a "hardware interlock".
(Figure: lw $t0, 0($t1), then a bubble, then sub $t3,$t0,$t2, and $t5,$t0,$t4, or $t7,$t0,$t6; stages IF, ID/RF, EX, MEM, WB.)
Schematically, this is what we want, but in reality stalls are done "horizontally".
How to stall just part of the pipeline?

Data Hazard: Loads (3/4)
Stall is equivalent to nop:
    lw  $t0, 0($t1)
    nop
    sub $t3, $t0, $t2
    and $t5, $t0, $t4
    or  $t7, $t0, $t6
(Figure: the nop flows through the pipeline as a bubble.)

Data Hazard: Loads (4/4)
The slot after a load is called a load delay slot.
If the instruction in that slot uses the result of the load, the hardware interlock will stall it for one cycle.
Letting the hardware stall the instruction in the delay slot is equivalent to putting a nop in the slot (except that the latter uses more code space).
Idea: let the compiler put an unrelated instruction in that slot → no stall!
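The hardware interlock needs only one check in the decode stage. The sketch below follows the load-use condition described in Patterson & Hennessy (a load in EX whose rt matches a source register of the instruction in ID); the function name and arguments are hypothetical. It is conservative: it compares against IF/ID's rt even for instructions that do not read rt.

```python
def load_use_stall(id_ex_mem_read: bool, id_ex_rt: int, if_id_rs: int, if_id_rt: int) -> bool:
    """True when the instruction in ID must wait one cycle: the instruction
    in EX is a load (MemRead) whose destination rt is one of ID's sources."""
    return id_ex_mem_read and id_ex_rt in (if_id_rs, if_id_rt)


# lw $t0,0($t1) in EX (rt = 8), sub $t3,$t0,$t2 in ID (rs = 8, rt = 10)
print(load_use_stall(True, 8, 8, 10))  # True: insert one bubble
```

When the check fires, the Patterson & Hennessy design holds the PC and IF/ID registers for a cycle and zeroes the control bits entering ID/EX, which is how the "nop" bubble is produced in hardware.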

Code Scheduling to Avoid Stalls
Reorder code to avoid using a load result in the next instruction!
MIPS code for D = A + B; E = A + C; (A, B, C, D, E at offsets 0, 4, 8, 12, 16 from $t0)
    # Method 1: 13 cycles
    lw  $t1, 0($t0)
    lw  $t2, 4($t0)
    add $t3, $t1, $t2   # stall! uses $t2 loaded by the previous instruction
    sw  $t3, 12($t0)
    lw  $t4, 8($t0)
    add $t5, $t1, $t4   # stall! uses $t4 loaded by the previous instruction
    sw  $t5, 16($t0)

    # Method 2: 11 cycles
    lw  $t1, 0($t0)
    lw  $t2, 4($t0)
    lw  $t4, 8($t0)
    add $t3, $t1, $t2
    sw  $t3, 12($t0)
    add $t5, $t1, $t4
    sw  $t5, 16($t0)
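The 13- and 11-cycle figures can be reproduced with a small counter. This sketch assumes the five-stage pipeline with full forwarding, so the only stall is one bubble when an instruction reads a register loaded by the instruction immediately before it; the tuple format (opcode, destination, sources) is hypothetical.

```python
def count_cycles(program, stages=5):
    """program: list of (opcode, dest, sources). Cycles = fill + one per
    instruction after the first + one bubble per load-use pair."""
    stalls = sum(
        1
        for (op, dest, _), (_, _, sources) in zip(program, program[1:])
        if op == "lw" and dest in sources
    )
    return stages + len(program) - 1 + stalls


method1 = [("lw", "$t1", ["$t0"]), ("lw", "$t2", ["$t0"]),
           ("add", "$t3", ["$t1", "$t2"]), ("sw", None, ["$t3", "$t0"]),
           ("lw", "$t4", ["$t0"]), ("add", "$t5", ["$t1", "$t4"]),
           ("sw", None, ["$t5", "$t0"])]
method2 = [method1[0], method1[1], method1[4], method1[2],
           method1[3], method1[5], method1[6]]
print(count_cycles(method1), count_cycles(method2))  # 13 11
```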

3. Control Hazards
Branch (beq, bne) determines flow of control.
Fetching the next instruction depends on the branch outcome, so the pipeline can’t always fetch the correct instruction: it is still working on the ID stage of the branch.
Simple solution: stall on every branch until we have the new PC value.
How long must we stall?

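Branch Stall
When is the comparison result available?
(Figure: beq followed by Instr 1 to Instr 4; the comparison is done by the ALU in beq’s EX stage.)
The outcome is known only at the end of beq’s EX stage (cycle 3), so the next correct instruction cannot be fetched until cycle 4.
TWO bubbles required per branch!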
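The cost of stalling on every branch can be estimated with the usual CPI arithmetic. This is a sketch under stated assumptions: an otherwise ideal CPI of 1, two bubbles per branch as on the slide, and a branch frequency that is purely a hypothetical input.

```python
def cpi_with_branch_stalls(branch_fraction, bubbles_per_branch=2, base_cpi=1.0):
    """Average cycles per instruction when every branch stalls the pipeline."""
    return base_cpi + branch_fraction * bubbles_per_branch


# Hypothetical mix: one instruction in four is a branch.
print(cpi_with_branch_stalls(0.25))  # 1.5
```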

Summary
Hazards reduce the effectiveness of pipelining; they cause stalls/bubbles.
Structural hazards: conflict in use of a datapath component.
Data hazards: need to wait for the result of a previous instruction.
Control hazards: address of the next instruction is uncertain/unknown.

Question: For each code sequence below, choose one of the statements below:
1: lw $t0,0($t0)
   add $t1,$t0,$t0
2: add $t1,$t0,$t0
   addi $t2,$t0,5
   addi $t4,$t1,5
3: addi $t1,$t0,1
   addi $t2,$t0,2
   addi $t3,$t0,2
   addi $t3,$t0,4
   addi $t5,$t1,5
A) No stalls as is
B) No stalls with forwarding
C) Must stall

Code Sequence 1
(Figure: lw followed by add in pipeline diagram form.)
add needs $t0 at the start of its EX stage, in the same cycle that lw is still reading it from data memory, so forwarding alone cannot help.
Answer: C) Must stall

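Code Sequence 2
(Figure: add followed by the two addi instructions in pipeline diagram form, drawn with and without forwarding.)
addi $t4,$t1,5 reads $t1 in ID (cycle 4) before add writes it back (cycle 5), so it needs the value forwarded from add (MEM/WB to EX); with forwarding there is no stall.
Answer: B) No stalls with forwarding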

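Code Sequence 3
(Figure: the five addi instructions in pipeline diagram form.)
The only true dependency is addi $t5,$t1,5 on addi $t1,$t0,1, four instructions earlier: $t1 is written back in cycle 5 and read in cycle 6, so there is no hazard.
Answer: A) No stalls as is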
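The three answers can be checked with a short classifier. The sketch assumes the slides' pipeline: the RegFile writes in the first half of a cycle and reads in the second, ALU results can be forwarded, and a loaded value cannot be forwarded to the instruction immediately after the load. Under those assumptions a dependency at distance 1 or 2 needs forwarding, and a load feeding the very next instruction must stall. The tuple format (opcode, destination, sources) is hypothetical.

```python
def classify(program):
    """program: list of (opcode, dest, sources).
    Returns "A" (no stalls as is), "B" (no stalls with forwarding)
    or "C" (must stall)."""
    answer = "A"
    for j, (_, _, sources) in enumerate(program):
        for reg in sources:
            # find the most recent earlier instruction that writes reg
            for i in range(j - 1, -1, -1):
                op, dest, _ = program[i]
                if dest == reg:
                    distance = j - i
                    if op == "lw" and distance == 1:
                        return "C"
                    if distance <= 2:
                        answer = "B"
                    break
    return answer


seq1 = [("lw", "$t0", ["$t0"]), ("add", "$t1", ["$t0", "$t0"])]
seq2 = [("add", "$t1", ["$t0", "$t0"]), ("addi", "$t2", ["$t0"]),
        ("addi", "$t4", ["$t1"])]
seq3 = [("addi", "$t1", ["$t0"]), ("addi", "$t2", ["$t0"]),
        ("addi", "$t3", ["$t0"]), ("addi", "$t3", ["$t0"]),
        ("addi", "$t5", ["$t1"])]
print(classify(seq1), classify(seq2), classify(seq3))  # C B A
```

Writing $t3 twice in sequence 3 is not flagged, since an in-order five-stage pipeline performs the two writes in program order.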