Morgan Kaufmann Publishers

Presentation transcript:

Pipeline Performance
Morgan Kaufmann Publishers 17 April, 2017. Chapter 4: The Processor

Assume the time for each stage is:
  100ps for register read or write
  200ps for other stages
Compare the pipelined datapath with the single-cycle datapath.

Instr     Instr fetch  Register read  ALU op  Memory access  Register write  Total time
lw        200ps        100ps          200ps   200ps          100ps           800ps
sw        200ps        100ps          200ps   200ps                          700ps
R-format  200ps        100ps          200ps                  100ps           600ps
beq       200ps        100ps          200ps                                  500ps

Pipeline Performance (continued)
[Figure: single-cycle execution (Tc = 800ps) compared with pipelined execution (Tc = 200ps)]
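The two clock periods can be derived from the stage times assumed above. The following is a minimal sketch, not part of the original slides: the names `STAGE_PS` and `USES` are hypothetical, and the mapping of instruction classes to stages is an assumption based on the standard five-stage MIPS datapath (the slide itself only states the per-stage times and the totals).

```python
# Stage latencies in picoseconds, from the slide's assumptions:
# 100ps for register read (ID) or write (WB), 200ps for the other stages.
STAGE_PS = {"IF": 200, "ID": 100, "EX": 200, "MEM": 200, "WB": 100}

# Assumption: which stages each instruction class actually uses.
USES = {
    "lw": ["IF", "ID", "EX", "MEM", "WB"],
    "sw": ["IF", "ID", "EX", "MEM"],
    "R-format": ["IF", "ID", "EX", "WB"],
    "beq": ["IF", "ID", "EX"],
}


def total_time_ps(instr):
    """Time one instruction needs when executed on its own."""
    return sum(STAGE_PS[stage] for stage in USES[instr])


def single_cycle_period_ps():
    """The single-cycle clock must fit the slowest instruction."""
    return max(total_time_ps(instr) for instr in USES)


def pipelined_period_ps():
    """The pipelined clock must fit the slowest stage."""
    return max(STAGE_PS.values())


if __name__ == "__main__":
    for instr in USES:
        print(instr, total_time_ps(instr), "ps")
    print("single-cycle Tc:", single_cycle_period_ps(), "ps")
    print("pipelined Tc:", pipelined_period_ps(), "ps")
```

Under these assumptions the slowest instruction (lw) sets the single-cycle period, while the slowest stage sets the pipelined period.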

Pipeline Speedup
If all stages are balanced (i.e., all take the same time):
$$\text{Time between instructions}_{\text{pipelined}} = \frac{\text{Time between instructions}_{\text{nonpipelined}}}{\text{Number of stages}}$$
If the stages are not balanced, the speedup is less.
The speedup is due to increased throughput.
Latency (the time for each instruction) does not decrease.
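As a worked instance of the balanced-stage formula, using the numbers from the Pipeline Performance slides (800ps nonpipelined, five stages, 200ps pipelined clock). The symbols $T_{\text{ideal}}$ and $T_{\text{actual}}$ are introduced here for illustration only:

```latex
T_{\text{ideal}} = \frac{800\,\text{ps}}{5} = 160\,\text{ps},
\qquad
T_{\text{actual}} = 200\,\text{ps} \quad (\text{slowest stage}),
\qquad
\text{Speedup} = \frac{800\,\text{ps}}{200\,\text{ps}} = 4 < 5 .
```

Because the register read and write stages take only 100ps while the others take 200ps, the stages are not balanced and the speedup falls short of the number of stages.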

Pipelining and ISA Design
The MIPS ISA is designed for pipelining:
All instructions are 32 bits
  Easier to fetch and decode in one cycle
  cf. x86: 1- to 15-byte instructions
Few and regular instruction formats
  Can decode and read registers in one step
Load/store addressing
  Can calculate the address in the 3rd stage and access memory in the 4th stage
Alignment of memory operands
  Memory access takes only one cycle

Improving performance
Two ideas for improving performance:
1. Split each instruction into multiple steps, each taking one cycle
  Steps: IF (instruction fetch), ID (instruction decode), EX (execute ALU operation), MEM (memory access), WB (register write-back)
  Slow instructions take more cycles than fast instructions
  This is known as a multi-cycle implementation
2. Crucial observation: each instruction uses only a portion of the datapath in each step
  We can overlap instructions, since each uses a different portion of the datapath
  This is known as a pipelined implementation
Examples of pipelining: any assembly process (cars, sandwiches), multiple loads of laundry (washer + dryer can be pipelined), etc.

Pipelining not just Multiprocessing
Pipelining does involve parallel processing, but in a specific way.
Both multiprocessing and pipelining relate to the processing of multiple “things” using multiple “functional units”:
  In multiprocessing, each thing is processed entirely by a single functional unit (e.g., multiple lanes at the supermarket).
  In pipelining, each thing is broken into a sequence of pieces, where each piece is handled by a different (specialized) functional unit (e.g., checker vs. bagger).
Pipelining and multiprocessing are not mutually exclusive.
  Modern processors do both, with multiple pipelines (e.g., superscalar).
Pipelining is a general-purpose efficiency technique, used elsewhere in CS:
  Networking, I/O devices, server software architecture

Instruction Fetch (IF)
While IF is executing, the rest of the datapath is sitting idle…
[Figure: single-cycle datapath with instruction memory, register file, sign extend, ALU, data memory, and their control signals]

Instruction Decode (ID)
Then while ID is executing, the IF-related portion becomes idle…
[Figure: single-cycle datapath with instruction memory, register file, sign extend, ALU, data memory, and their control signals]

Execute (EX)
…and so on for the EX portion…
[Figure: single-cycle datapath with instruction memory, register file, sign extend, ALU, data memory, and their control signals]

Memory (MEM)
…the MEM portion…
[Figure: single-cycle datapath with instruction memory, register file, sign extend, ALU, data memory, and their control signals]

Write back (WB)
…and the WB portion.
[Figure: single-cycle datapath with instruction memory, register file, sign extend, ALU, data memory, and their control signals]

Decoding and fetching together
Why don’t we go ahead and fetch the next instruction while we’re decoding the first one?
[Figure: single-cycle datapath, with the 2nd instruction being fetched while the 1st instruction is decoded]

Executing, decoding and fetching
Similarly, once the first instruction enters its Execute stage, we can go ahead and decode the second instruction.
But now the instruction memory is free again, so we can fetch the third instruction!
[Figure: single-cycle datapath, with the 3rd instruction being fetched, the 2nd decoded, and the 1st executed]

Break datapath into 5 stages
Each stage has its own functional units.
Full pipeline: the datapath is simultaneously working on 5 instructions!
[Figure: datapath divided into the stages IF, ID, EXE, MEM, and WB, with the newest instruction in IF and the oldest in WB]

A pipeline diagram

Clock cycle         1    2    3    4    5    6    7    8    9
lw $t0, 4($sp)      IF   ID   EX   MEM  WB
sub $v0, $a0, $a1        IF   ID   EX   MEM  WB
and $t1, $t2, $t3             IF   ID   EX   MEM  WB
or $s0, $s1, $s2                   IF   ID   EX   MEM  WB
addi $sp, $sp, -4                       IF   ID   EX   MEM  WB

A pipeline diagram shows the execution of a series of instructions.
  The instruction sequence is shown vertically, from top to bottom.
  Clock cycles are shown horizontally, from left to right.
  Each instruction is divided into its component stages.
This clearly indicates the overlapping of instructions. For example, there are three instructions active in the third cycle above:
  The “lw” instruction is in its Execute stage.
  Simultaneously, the “sub” is in its Instruction Decode stage.
  Also, the “and” instruction is just being fetched.
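A pipeline diagram like this one can be generated mechanically, since on an ideal pipeline each instruction starts one cycle after its predecessor. The sketch below is illustrative only and assumes no stalls or hazards; the function names `stage_in_cycle` and `pipeline_diagram` are hypothetical.

```python
STAGES = ["IF", "ID", "EX", "MEM", "WB"]

PROGRAM = [
    "lw $t0, 4($sp)",
    "sub $v0, $a0, $a1",
    "and $t1, $t2, $t3",
    "or $s0, $s1, $s2",
    "addi $sp, $sp, -4",
]


def stage_in_cycle(instr_index, cycle):
    """Stage occupied by instruction `instr_index` (0-based) in `cycle`
    (1-based), or None if the instruction is not in the pipeline."""
    offset = cycle - 1 - instr_index
    if 0 <= offset < len(STAGES):
        return STAGES[offset]
    return None


def pipeline_diagram(instructions):
    """Render the diagram as text: one row per instruction, one column per cycle."""
    total_cycles = len(STAGES) - 1 + len(instructions)
    width = max(len(text) for text in instructions) + 2
    cycles = range(1, total_cycles + 1)
    lines = ["Clock cycle".ljust(width) + "".join(str(c).ljust(5) for c in cycles)]
    for index, text in enumerate(instructions):
        cells = [stage_in_cycle(index, c) or "" for c in cycles]
        lines.append(text.ljust(width) + "".join(cell.ljust(5) for cell in cells))
    return "\n".join(line.rstrip() for line in lines)


if __name__ == "__main__":
    print(pipeline_diagram(PROGRAM))
```

Calling `stage_in_cycle` for cycle 3 reproduces the observation in the slide: the first instruction is in EX, the second in ID, and the third in IF.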

Pipeline terminology

Clock cycle         1    2    3    4    5    6    7    8    9
lw $t0, 4($sp)      IF   ID   EX   MEM  WB
sub $v0, $a0, $a1        IF   ID   EX   MEM  WB
and $t1, $t2, $t3             IF   ID   EX   MEM  WB
or $s0, $s1, $s2                   IF   ID   EX   MEM  WB
addi $sp, $sp, -4                       IF   ID   EX   MEM  WB

The pipeline depth is the number of stages (in this case, five).
In the first four cycles here, the pipeline is filling, since there are unused functional units.
In cycle 5, the pipeline is full. Five instructions are being executed simultaneously, so all hardware units are in use.
In cycles 6-9, the pipeline is emptying.
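The filling, full, and emptying phases follow from counting how many instructions are in flight in each cycle. This is a small sketch under the assumption of an ideal pipeline with at least as many instructions as stages; `active_instructions` and `phase` are hypothetical helper names.

```python
def active_instructions(cycle, n_instructions, depth=5):
    """Number of instructions in the pipeline during a 1-based clock cycle.

    Instruction i (0-based) occupies cycles i + 1 through i + depth."""
    oldest = max(0, cycle - depth)             # oldest instruction still in flight
    newest = min(n_instructions - 1, cycle - 1)  # most recently fetched instruction
    return max(0, newest - oldest + 1)


def phase(cycle, n_instructions, depth=5):
    """Classify a cycle as 'filling', 'full', or 'emptying'.

    Assumes n_instructions >= depth, so the pipeline does become full."""
    if active_instructions(cycle, n_instructions, depth) == depth:
        return "full"
    return "filling" if cycle < depth else "emptying"


if __name__ == "__main__":
    for cycle in range(1, 10):
        print(cycle, active_instructions(cycle, 5), phase(cycle, 5))
```

For the five-instruction example, this gives four filling cycles, one full cycle, and four emptying cycles, matching the slide.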

Pipelining Performance

Clock cycle         1    2    3    4    5    6    7    8    9
lw $t0, 4($sp)      IF   ID   EX   MEM  WB
lw $t1, 8($sp)           IF   ID   EX   MEM  WB
lw $t2, 12($sp)               IF   ID   EX   MEM  WB
lw $t3, 16($sp)                    IF   ID   EX   MEM  WB
lw $t4, 20($sp)                         IF   ID   EX   MEM  WB

Execution time on an ideal pipeline: time to fill the pipeline + one cycle per instruction.
How long for N instructions? k - 1 + N cycles, where k = pipeline depth.
Alternate way of arriving at this formula: k cycles for the first instruction, plus 1 for each of the remaining N - 1 instructions.
Compare this pipelined implementation (2ns clock period) vs. a single-cycle implementation (8ns clock period). How much faster is pipelining for N = 1000?
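The closing question can be answered directly from the k - 1 + N formula. The sketch below assumes the five-stage pipeline from these slides (k = 5) and the stated 2ns and 8ns clock periods; the function names are hypothetical.

```python
def pipelined_cycles(n_instructions, depth=5):
    """Cycles on an ideal pipeline: depth - 1 to fill, then one per instruction."""
    return depth - 1 + n_instructions


def pipelined_time_ns(n_instructions, clock_ns=2, depth=5):
    return pipelined_cycles(n_instructions, depth) * clock_ns


def single_cycle_time_ns(n_instructions, clock_ns=8):
    """Single-cycle design: one (long) clock cycle per instruction."""
    return n_instructions * clock_ns


def speedup(n_instructions):
    return single_cycle_time_ns(n_instructions) / pipelined_time_ns(n_instructions)


if __name__ == "__main__":
    n = 1000
    print(pipelined_cycles(n), "cycles,", pipelined_time_ns(n), "ns pipelined")
    print(single_cycle_time_ns(n), "ns single-cycle")
    print("speedup: %.2f" % speedup(n))
```

For N = 1000 this gives 1004 cycles (2008ns) pipelined against 8000ns single-cycle, a speedup of about 3.98. As N grows, the fill time matters less and the speedup approaches the ratio of the clock periods, 4.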

Pipeline Datapath: Resource Requirements

Clock cycle         1    2    3    4    5    6    7    8    9
lw $t0, 4($sp)      IF   ID   EX   MEM  WB
lw $t1, 8($sp)           IF   ID   EX   MEM  WB
lw $t2, 12($sp)               IF   ID   EX   MEM  WB
lw $t3, 16($sp)                    IF   ID   EX   MEM  WB
lw $t4, 20($sp)                         IF   ID   EX   MEM  WB

We need to perform several operations in the same cycle:
  Increment the PC and add registers at the same time.
  Fetch one instruction while another one reads or writes data.
What does that mean for our hardware?
  A separate adder and ALU
  Two memories (instruction memory and data memory)
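One way to see why the hardware must be duplicated is to check, cycle by cycle, whether two occupied stages would need the same unit. The following sketch is an illustration only: the resource names and the helpers `stages_busy` and `conflict_cycles` are hypothetical, and it assumes one instruction enters the pipeline per cycle.

```python
STAGES = ["IF", "ID", "EX", "MEM", "WB"]


def stages_busy(cycle, n_instructions):
    """Set of stages occupied in a 1-based cycle on an ideal pipeline."""
    busy = set()
    for index in range(n_instructions):
        offset = cycle - 1 - index
        if 0 <= offset < len(STAGES):
            busy.add(STAGES[offset])
    return busy


def conflict_cycles(resource_of_stage, n_instructions):
    """Cycles in which two occupied stages need the same hardware resource.

    `resource_of_stage` maps a stage name to the unit it uses."""
    total_cycles = len(STAGES) - 1 + n_instructions
    conflicts = []
    for cycle in range(1, total_cycles + 1):
        needed = [
            resource_of_stage[stage]
            for stage in sorted(stages_busy(cycle, n_instructions))
            if stage in resource_of_stage
        ]
        if len(needed) != len(set(needed)):
            conflicts.append(cycle)
    return conflicts


# Hypothetical designs for the five lw instructions in the diagram.
ONE_MEMORY = {"IF": "memory", "MEM": "memory"}
ONE_ALU = {"IF": "alu", "EX": "alu"}  # PC increment and ALU op on one unit
SPLIT = {"IF": "instruction memory", "MEM": "data memory"}

if __name__ == "__main__":
    print("shared memory conflicts:", conflict_cycles(ONE_MEMORY, 5))
    print("shared ALU conflicts:", conflict_cycles(ONE_ALU, 5))
    print("split memories conflicts:", conflict_cycles(SPLIT, 5))
```

With a single memory, the fetch of a later instruction collides with the data access of an earlier one; with a single ALU, the PC increment collides with an earlier instruction's ALU operation. Separate units remove both collisions.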

Single-cycle datapath, slightly rearranged
[Figure: single-cycle datapath showing the PC, the two adders (PC + 4 and branch target, with shift left 2), instruction memory, register file, sign extend, ALU, data memory, and the control signals PCSrc, RegWrite, RegDst, ALUSrc, ALUOp, MemRead, MemWrite, and MemToReg]

Pipeline registers
In pipelining, we divide instruction execution into multiple cycles: IF, ID, EX, MEM, WB.
Information computed during one cycle may be needed in a later cycle:
  The instruction read in the IF stage determines which registers are fetched in the ID stage, what immediate is used in the EX stage, and what the destination register is for WB.
  Register values read in ID are used in the EX and/or MEM stages.
  The ALU output produced in EX is an effective address for MEM or a result for WB.
That is a lot of information to save! It is saved in intermediate registers called pipeline registers.
The registers are named for the stages they connect: IF/ID, ID/EX, EX/MEM, MEM/WB.
No register is needed after the WB stage, because after WB the instruction is done.

Pipelined datapath
[Figure: the rearranged single-cycle datapath with the pipeline registers IF/ID, ID/EX, EX/MEM, and MEM/WB inserted between the stages]

Propagating values forward
Data values required later are propagated through the pipeline registers.
The most extreme example is the destination register (rd or rt):
  It is retrieved in IF, but isn’t updated until WB.
  Thus, it must be passed through all pipeline stages, as shown in red on the next slide.
Notice that we can’t keep a single “instruction register,” because the pipelined machine needs to fetch a new instruction every clock cycle.
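The way a destination register number travels through the pipeline registers can be sketched as a simple shift on every clock edge. This is an illustrative model, not the slides' hardware description: `propagate_destinations` is a hypothetical helper, and the sample register numbers are the destinations of the example sequence used later in these slides.

```python
PIPELINE_REGISTERS = ["IF/ID", "ID/EX", "EX/MEM", "MEM/WB"]


def propagate_destinations(dest_registers):
    """Return (cycle, destination) pairs for each write-back.

    One instruction is fetched per cycle. At each clock edge the destination
    number fetched in that cycle is latched into IF/ID and every older
    destination moves one pipeline register to the right. The WB stage uses
    whatever MEM/WB holds during the cycle."""
    held = [None] * len(PIPELINE_REGISTERS)  # contents of IF/ID ... MEM/WB
    writes = []
    total_cycles = len(dest_registers) + len(PIPELINE_REGISTERS)
    for cycle in range(1, total_cycles + 1):
        wb_dest = held[-1]  # value visible to WB in this cycle
        if wb_dest is not None:
            writes.append((cycle, wb_dest))
        fetched = dest_registers[cycle - 1] if cycle <= len(dest_registers) else None
        held = [fetched] + held[:-1]  # clock edge: shift everything along
    return writes


if __name__ == "__main__":
    for cycle, dest in propagate_destinations([8, 2, 9, 16, 13]):
        print("cycle", cycle, "writes register", dest)
```

Each destination number reaches the register file's write port four clock edges after its instruction was fetched, which is why a single instruction register would not suffice.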

The destination register
[Figure: pipelined datapath with the pipeline registers IF/ID, ID/EX, EX/MEM, and MEM/WB, showing the destination register number (Instr [20 - 16] or Instr [15 - 11]) carried through to the register file's write register input]

What about control signals?
Control signals are generated much as in the single-cycle processor: in the ID stage, the processor decodes the instruction fetched in IF and produces the appropriate control values.
Some of the control signals will not be needed until later stages.
  These signals must be propagated through the pipeline until they reach the appropriate stage.
  We just pass them in the pipeline registers, along with the data.
Control signals can be categorized by the pipeline stage that uses them:

Stage  Control signals needed
EX     ALUSrc, ALUOp, RegDst
MEM    MemRead, MemWrite, PCSrc
WB     RegWrite, MemToReg
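The grouping of control signals by stage can be sketched as a bundle that shrinks as it moves through the pipeline registers. This is a minimal illustration: the helper names are hypothetical, only lw and sub are modeled (with the values shown for them in the cycle-by-cycle slides of this deck), and PCSrc is left out because its value depends on the branch outcome.

```python
# Control values grouped by the stage that consumes them.
CONTROL = {
    "lw": {
        "EX": {"ALUSrc": 1, "ALUOp": "add", "RegDst": 0},
        "MEM": {"MemRead": 1, "MemWrite": 0},
        "WB": {"RegWrite": 1, "MemToReg": 1},
    },
    "sub": {
        "EX": {"ALUSrc": 0, "ALUOp": "sub", "RegDst": 1},
        "MEM": {"MemRead": 0, "MemWrite": 0},
        "WB": {"RegWrite": 1, "MemToReg": 0},
    },
}


def id_ex_bundle(opcode):
    """Signals produced in ID and latched into ID/EX (all three groups)."""
    return dict(CONTROL[opcode])


def ex_mem_bundle(id_ex):
    """EX consumes its group; only the MEM and WB groups move on."""
    return {stage: signals for stage, signals in id_ex.items() if stage != "EX"}


def mem_wb_bundle(ex_mem):
    """MEM consumes its group; only the WB group moves on."""
    return {stage: signals for stage, signals in ex_mem.items() if stage != "MEM"}


if __name__ == "__main__":
    bundle = id_ex_bundle("lw")
    print("ID/EX :", bundle)
    bundle = ex_mem_bundle(bundle)
    print("EX/MEM:", bundle)
    bundle = mem_wb_bundle(bundle)
    print("MEM/WB:", bundle)
```

Each pipeline register therefore needs to hold fewer control bits than the one before it.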

Pipelined datapath and control
[Figure: pipelined datapath with a Control unit in the ID stage; the EX, M, and WB control groups are stored in ID/EX, the M and WB groups in EX/MEM, and the WB group in MEM/WB]

An example execution sequence
Here’s a sample sequence of instructions to execute (addresses in decimal):

1000: lw $8, 4($29)
1004: sub $2, $4, $5
1008: and $9, $10, $11
1012: or $16, $17, $18
1016: add $13, $14, $0

We’ll make some assumptions, just so we can show actual data values:
  Each register contains its number plus 100. For instance, register $8 contains 108, register $29 contains 129, etc.
  Every data memory location contains 99.
Our pipeline diagrams will follow some conventions:
  An X indicates values that aren’t important, like the constant field of an R-type instruction.
  Question marks ??? indicate values we don’t know, usually resulting from instructions coming before and after the ones in our example.
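The values that appear in the cycle-by-cycle slides can be checked by executing the sequence directly. The sketch below runs the instructions one after another rather than modeling the pipeline, which gives the same results here because no instruction reads a register written by an earlier one. Treating $0 as hardwired to zero is an assumption that overrides the "number plus 100" rule for that register; `run` and the tuple encoding are hypothetical.

```python
MEMORY_VALUE = 99  # every data memory location contains 99


def initial_registers():
    """Each register holds its number plus 100, except $0 (assumed zero)."""
    registers = {number: number + 100 for number in range(32)}
    registers[0] = 0
    return registers


# (operation, destination, first source register, second source or offset)
PROGRAM = [
    ("lw", 8, 29, 4),     # lw  $8, 4($29)
    ("sub", 2, 4, 5),     # sub $2, $4, $5
    ("and", 9, 10, 11),   # and $9, $10, $11
    ("or", 16, 17, 18),   # or  $16, $17, $18
    ("add", 13, 14, 0),   # add $13, $14, $0
]


def run(program):
    """Execute the program sequentially; return {destination: value written}."""
    registers = initial_registers()
    written = {}
    for operation, dest, src, other in program:
        if operation == "lw":
            address = registers[src] + other  # base register + offset
            assert address >= 0
            value = MEMORY_VALUE              # all locations hold the same value
        elif operation == "sub":
            value = registers[src] - registers[other]
        elif operation == "and":
            value = registers[src] & registers[other]
        elif operation == "or":
            value = registers[src] | registers[other]
        elif operation == "add":
            value = registers[src] + registers[other]
        else:
            raise ValueError("unsupported operation: " + operation)
        registers[dest] = value
        written[dest] = value
    return written


if __name__ == "__main__":
    for dest, value in run(PROGRAM).items():
        print("$%d <- %d" % (dest, value))
```

The lw address works out to 129 + 4 = 133, and the values written back are 99, -1, 110, 119, and 114.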

Cycle 1 (filling)
IF: lw $8, 4($29) (PC = 1000, PC + 4 = 1004)
ID: ???
EX: ??? (ALUSrc = ?, ALUOp = ???, RegDst = ?)
MEM: ??? (MemWrite = ?, MemRead = ?)
WB: ??? (RegWrite = ?, MemToReg = ?)
[Figure: pipelined datapath annotated with the values for this cycle]

Cycle 2
IF: sub $2, $4, $5 (PC = 1004, PC + 4 = 1008)
ID: lw $8, 4($29) (reads $29 = 129)
EX: ??? (ALUSrc = ?, ALUOp = ???, RegDst = ?)
MEM: ??? (MemWrite = ?, MemRead = ?)
WB: ??? (RegWrite = ?, MemToReg = ?)
[Figure: pipelined datapath annotated with the values for this cycle]

Cycle 3
IF: and $9, $10, $11 (PC = 1008, PC + 4 = 1012)
ID: sub $2, $4, $5 (reads $4 = 104 and $5 = 105)
EX: lw $8, 4($29) (ALUSrc = 1, ALUOp = add, RegDst = 0; ALU result 129 + 4 = 133)
MEM: ??? (MemWrite = ?, MemRead = ?)
WB: ??? (RegWrite = ?, MemToReg = ?)
[Figure: pipelined datapath annotated with the values for this cycle]

Cycle 4
IF: or $16, $17, $18 (PC = 1012, PC + 4 = 1016)
ID: and $9, $10, $11 (reads $10 = 110 and $11 = 111)
EX: sub $2, $4, $5 (ALUSrc = 0, ALUOp = sub, RegDst = 1; ALU result 104 - 105 = -1)
MEM: lw $8, 4($29) (MemWrite = 0, MemRead = 1; address 133, data read 99)
WB: ??? (RegWrite = ?, MemToReg = ?)
[Figure: pipelined datapath annotated with the values for this cycle]

Cycle 5 (full)
IF: add $13, $14, $0 (PC = 1016, PC + 4 = 1020)
ID: or $16, $17, $18 (reads $17 = 117 and $18 = 118)
EX: and $9, $10, $11 (ALUSrc = 0, ALUOp = and, RegDst = 1; operands 110 and 111)
MEM: sub $2, $4, $5 (MemWrite = 0, MemRead = 0; ALU result -1)
WB: lw $8, 4($29) (RegWrite = 1, MemToReg = 1; writes 99 to $8)
[Figure: pipelined datapath annotated with the values for this cycle]

Cycle 6 (emptying)
IF: ??? (PC = 1020)
ID: add $13, $14, $0 (reads $14 = 114)
EX: or $16, $17, $18 (ALUSrc = 0, ALUOp = or, RegDst = 1; operands 117 and 118, ALU result 119)
MEM: and $9, $10, $11 (MemWrite = 0, MemRead = 0; ALU result 110)
WB: sub $2, $4, $5 (RegWrite = 1, MemToReg = 0; writes -1 to $2)
[Figure: pipelined datapath annotated with the values for this cycle]

Cycle 7
IF: ???
ID: ???
EX: add $13, $14, $0 (ALUSrc = 0, ALUOp = add, RegDst = 1; first operand 114)
MEM: or $16, $17, $18 (MemWrite = 0, MemRead = 0; ALU result 119)
WB: and $9, $10, $11 (RegWrite = 1, MemToReg = 0; writes 110 to $9)
[Figure: pipelined datapath annotated with the values for this cycle]

Cycle 8
IF: ???
ID: ???
EX: ??? (ALUSrc = ?, ALUOp = ???, RegDst = ?)
MEM: add $13, $14, $0 (MemWrite = 0, MemRead = 0; ALU result 114)
WB: or $16, $17, $18 (RegWrite = 1, MemToReg = 0; writes 119 to $16)
[Figure: pipelined datapath annotated with the values for this cycle]

Cycle 9
IF: ???
ID: ???
EX: ??? (ALUSrc = ?, ALUOp = ???, RegDst = ?)
MEM: ??? (MemWrite = ?, MemRead = ?)
WB: add $13, $14, $0 (RegWrite = 1, MemToReg = 0; writes 114 to $13)
[Figure: pipelined datapath annotated with the values for this cycle]

That’s a lot of diagrams there
Compare the last few slides with the pipeline diagram above.
  You can see how instruction executions are overlapped.
  Each functional unit is used by a different instruction in each cycle.
  The pipeline registers save control and data values generated in previous clock cycles for later use.
  When the pipeline is full in clock cycle 5, all of the hardware units are utilized. This is the ideal situation, and what makes pipelined processors so fast.

Clock cycle         1    2    3    4    5    6    7    8    9
lw $t0, 4($sp)      IF   ID   EX   MEM  WB
sub $v0, $a0, $a1        IF   ID   EX   MEM  WB
and $t1, $t2, $t3             IF   ID   EX   MEM  WB
or $s0, $s1, $s2                   IF   ID   EX   MEM  WB
add $t5, $t6, $0                        IF   ID   EX   MEM  WB

Note how everything goes left to right, except …
[Figure: pipelined datapath with the pipeline registers IF/ID, ID/EX, EX/MEM, and MEM/WB]