Major CPU Design Steps 1. Analyze instruction set operations using independent RTN ISA => RTN => datapath requirements. This provides the the required.

Slides:



Advertisements
Similar presentations
ELEN 350 Multi-Cycle Datapath Adapted from the lecture notes of John Kubiatowicz (UCB) and Hank Walker (TAMU)
Advertisements

1 Chapter Five The Processor: Datapath and Control.
EECC550 - Shaaban #1 Lec # 4 Summer Major CPU Design Steps 1Using independent RTN, write the micro- operations required for all target ISA.
EECC550 - Shaaban #1 Lec # 5 Winter Major CPU Design Steps 1. Analyze instruction set operations using independent RTN ISA => RTN => datapath.
CS61C L26 Single Cycle CPU Datapath II (1) Garcia © UCB Lecturer PSOE Dan Garcia inst.eecs.berkeley.edu/~cs61c CS61C : Machine.
Chapter 5 The Processor: Datapath and Control Basic MIPS Architecture Homework 2 due October 28 th. Project Designs due October 28 th. Project Reports.
EECC550 - Shaaban #1 Lec # 5 Winter CPU Design Steps 1. Analyze instruction set operations using independent ISA => RTN => datapath requirements.
Savio Chau Single Cycle Controller Design Last Time: Discussed the Designing of a Single Cycle Datapath Control Datapath Memory Processor (CPU) Input Output.
CPU Organization (Design)
CSE378 Multicycle impl,.1 Drawbacks of single cycle implementation All instructions take the same time although –some instructions are longer than others;
EECC550 - Shaaban #1 Lec # 5 Winter CPU Design Steps 1. Analyze instruction set operations using independent RTN => datapath requirements.
EECC550 - Shaaban #1 Lec # 5 Winter CPU Design Steps 1. Analyze instruction set operations using independent RTN => datapath requirements.
EECC550 - Shaaban #1 Lec # 4 Winter CPU Organization Datapath Design: –Capabilities & performance characteristics of principal Functional.
ECE 232 L15.Miulticycle.1 Adapted from Patterson 97 ©UCBCopyright 1998 Morgan Kaufmann Publishers ECE 232 Hardware Organization and Design Lecture 15 Multi-cycle.
Microprocessor Design
EECC250 - Shaaban #1 lec #22 Winter The Von-Neumann Computer Model Partitioning of the computing engine into components: –Central Processing.
ENEE350 Ankur Srivastava University of Maryland, College Park Based on Slides from Mary Jane Irwin ( )
ECE 232 L13. Control.1 ©UCB, DAP’ 97 ECE 232 Hardware Organization and Design Lecture 13 Control Design
EECC550 - Shaaban #1 Lec # 4 Winter CPU Organization (Design) Datapath Design: –Capabilities & performance characteristics of principal.
CS61C L26 CPU Design : Designing a Single-Cycle CPU II (1) Garcia, Fall 2006 © UCB Lecturer SOE Dan Garcia inst.eecs.berkeley.edu/~cs61c.
EECC550 - Shaaban #1 Lec # 4 Winter CPU Organization (Design) Datapath Design: –Capabilities & performance characteristics of principal.
EECC550 - Shaaban #1 Lec # 4 Winter Major CPU Design Steps 1Using independent RTN, write the micro- operations required for all target.
EEM 486: Computer Architecture Lecture 3 Designing a Single Cycle Datapath.
EECC550 - Shaaban #1 Selected Chapter 5 For More Practice Exercises Winter The MIPS jump and link instruction, jal is used to support procedure.
EECC550 - Shaaban #1 Lec # 5 Winter Major CPU Design Steps 1. Analyze instruction set operations using independent RTN ISA => RTN => datapath.
EECC550 - Shaaban #1 Lec # 5 Spring CPU Design Steps 1. Analyze instruction set operations using independent RTN => datapath requirements.
EECC550 - Shaaban #1 Lec # 5 Spring CPU Design Steps 1. Analyze instruction set operations using independent RTN => datapath requirements.
Dr. Iyad F. Jafar Basic MIPS Architecture: Multi-Cycle Datapath and Control.
CS3350B Computer Architecture Winter 2015 Lecture 5.6: Single-Cycle CPU: Datapath Control (Part 1) Marc Moreno Maza [Adapted.
COSC 3430 L08 Basic MIPS Architecture.1 COSC 3430 Computer Architecture Lecture 08 Processors Single cycle Datapath PH 3: Sections
Computer Organization CS224 Fall 2012 Lesson 26. Summary of Control Signals addsuborilwswbeqj RegDst ALUSrc MemtoReg RegWrite MemWrite Branch Jump ExtOp.
CASE STUDY OF A MULTYCYCLE DATAPATH. Alternative Multiple Cycle Datapath (In Textbook) Minimizes Hardware: 1 memory, 1 ALU Ideal Memory Din Address 32.
EEM 486: Computer Architecture Designing Single Cycle Control.
CPE232 Basic MIPS Architecture1 Computer Organization Multi-cycle Approach Dr. Iyad Jafar Adapted from Dr. Gheith Abandah slides
EEM 486: Computer Architecture Designing a Single Cycle Datapath.
Computer Architecture and Design – ECEN 350 Part 6 [Some slides adapted from A. Sprintson, M. Irwin, D. Paterson and others]
Datapath and Control Unit Design
CS3350B Computer Architecture Winter 2015 Lecture 5.7: Single-Cycle CPU: Datapath Control (Part 2) Marc Moreno Maza [Adapted.
1 Processor: Datapath and Control Single cycle processor –Datapath and Control Multicycle processor –Datapath and Control Microprogramming –Vertical and.
1 Processor: Datapath and Control Single cycle processor –Datapath and Control Multicycle processor –Datapath and Control Microprogramming –Vertical and.
LECTURE 6 Multi-Cycle Datapath and Control. SINGLE-CYCLE IMPLEMENTATION As we’ve seen, single-cycle implementation, although easy to implement, could.
ECE-C355 Computer Structures Winter 2008 The MIPS Datapath Slides have been adapted from Prof. Mary Jane Irwin ( )
Datapath and Control AddressInstruction Memory Write Data Reg Addr Register File ALU Data Memory Address Write Data Read Data PC Read Data Read Data.
EEM 486: Computer Architecture Lecture 3 Designing Single Cycle Control.
Single Cycle Controller Design
Chapter 4 From: Dr. Iyad F. Jafar Basic MIPS Architecture: Multi-Cycle Datapath and Control.
Computer Architecture Lecture 6.  Our implementation of the MIPS is simplified memory-reference instructions: lw, sw arithmetic-logical instructions:
Design a MIPS Processor (II)
Multi-Cycle Datapath and Control
Problem with Single Cycle Processor Design
IT 251 Computer Organization and Architecture
Designing a Multicycle Processor
Processor (I).
CS/COE0447 Computer Organization & Assembly Language
Multiple Cycle Implementation of MIPS-Lite CPU
CPU Organization (Design)
Chapter Five The Processor: Datapath and Control
CS152 Computer Architecture and Engineering Lecture 8 Designing a Single Cycle Datapath Start: X:40.
Vishwani D. Agrawal James J. Danaher Professor
COMS 361 Computer Organization
COSC 2021: Computer Organization Instructor: Dr. Amir Asif
Processor: Multi-Cycle Datapath & Control
Multi-Cycle Datapath Lecture notes from MKP, H. H. Lee and S. Yalamanchili.
Chapter Four The Processor: Datapath and Control
5.5 A Multicycle Implementation
Instructors: Randy H. Katz David A. Patterson
Alternative datapath (book): Multiple Cycle Datapath
The Processor: Datapath & Control.
COMS 361 Computer Organization
Processor: Datapath and Control
Presentation transcript:

Major CPU Design Steps 1. Analyze instruction set operations using independent RTN ISA => RTN => datapath requirements. This provides the the required datapath components and how they are connected to meet ISA requirements. 2. Select required datapath components, connections & establish clock methodology (e.g clock edge-triggered). 3. Assemble datapath meeting the requirements. 4. Identify and define the function of all control points or signals needed by the datapath. Analyze implementation of each instruction to determine setting of control points that affects its operations and register transfer. 5. Design & assemble the control logic. Hard-Wired: Finite-state machine implementation. Microprogrammed. Datapath Control (Chapter 5.5)

Single Cycle MIPS Datapath: CPI = 1, Long Clock Cycle T = I x CPI x C imm16 32 ALUop (2-bits) Clk busW RegWr busA busB 5 Rw Ra Rb 32 32-bit Registers Rs Rt Rd RegDst Extender Mux 16 ALUSrc ExtOp MemtoReg Data In WrEn Adr Data Memory MemWr ALU Zero Instruction<31:0> 1 <21:25> <16:20> <11:15> <0:15> Imm16 = Adder PC 00 4 PCSrc PC Ext Inst Branch PC+4 Target R[rs] R[rt] Main (Includes ORI not in book version) Control Function Field Jump Not Included

Drawbacks of Single-Cycle Processor Long cycle time: All instructions must take as much time as the slowest: Cycle time for load is longer than needed for all other instructions. Real memory is not as well-behaved as idealized memory Cannot always complete data access in one (short) cycle. Impossible to implement complex, variable-length instructions and complex addressing modes in a single cycle. e.g indirect memory addressing. High and duplicate hardware resource requirements Any hardware functional unit cannot be used more than once in a single cycle (e.g. ALUs). Cannot pipeline (overlap) the processing of one instruction with the previous instructions. (instruction pipelining, chapter 6).

Abstract View of Single Cycle CPU PC Next PC Register Fetch ALU Reg. Wrt Access Mem Data Instruction Result Store ALUctr RegDst ALUSrc ExtOp MemWr Equal Branch, Jump RegWr MemRd Main Control control op fun Ext 1 ns 2 ns 1 ns 2 ns 2 ns One CPU Clock Cycle Duration C = 8ns One instruction per cycle CPI = 1 Assuming the following datapath/control hardware components delays: Memory Units: 2 ns ALU and adders: 2 ns Register File: 1 ns Control Unit < 1 ns

Single Cycle Instruction Timing PC Inst Memory mux ALU Data Mem Reg File cmp Arithmetic & Logical Load Store Branch Critical Path setup (Determines CPU clock cycle, C)

Clock Cycle Time & Critical Path One CPU Clock Cycle Duration C = 8ns here Clk . Critical Path i.e longest delay Critical path: the slowest path between any two storage devices Clock Cycle time is a function of the critical path, and must be greater than: Clock-to-Q + Longest Delay Path through the Combination Logic + Setup + Clock Skew Assuming the following datapath/control hardware components delays: Memory Units: 2 ns ALU and adders: 2 ns Register File: 1 ns Control Unit < 1 ns

Reducing Cycle Time: Multi-Cycle Design Cut combinational dependency graph by inserting registers / latches. The same work is done in two or more shorter cycles, rather than one long cycle. storage element storage element Two shorter cycles One long cycle Acyclic Combinational Logic (A) Acyclic Combinational Logic Cycle 1 e.g CPI =1 e.g CPI =2 => storage element Acyclic Combinational Logic (B) Cycle 2 storage element Place registers to: Get a balanced clock cycle length Save any results needed for the remaining cycles storage element

Basic MIPS Instruction Processing Steps Instruction Memory Instruction Fetch Decode Execute Result Store Next } Obtain instruction from program storage Instruction ¬ Mem[PC] Update program counter to address of next instruction Common steps for all instructions PC ¬ PC + 4 Determine instruction type Obtain operands from registers Done by Control Unit Compute result value or status Store result in register/memory if needed (usually called Write Back).

Partitioning The Single Cycle Datapath Add registers between steps to break into cycles 2 ns 1 ns 2 ns Branch, Jump 1 ns 2 ns ExtOp MemRd MemWr RegDst RegWr MemWr ALUSrc ALUctr Reg. File Operand Fetch Exec Instruction Fetch Mem Access PC Next PC Result Store Data Mem Data Memory Access Cycle (MEM) Instruction Fetch Cycle (IF) Instruction Decode Cycle (ID) Execution Cycle (EX) Write back Cycle (WB) 1 2 3 4 5 Place registers to: Get a balanced clock cycle length Save any results needed for the remaining cycles

Example Multi-cycle Datapath To Control Unit Branch, Jump MemToReg MemRd MemWr RegDst RegWr ExtOp ALUSrc ALUctr Equal Reg. File A Ext ALU Reg File R PC Instruction Fetch IR Next PC B Mem Access M Instruction Fetch (IF) 2ns Instruction Decode (ID) 1ns Data Mem Execution (EX) 2ns Memory (MEM) 2ns Write Back (WB) 1ns 1 2 3 4 5 Registers added: All clock-edge triggered (not shown register write enable control lines) IR: Instruction register A, B: Two registers to hold operands read from register file. R: or ALUOut, holds the output of the main ALU M: or Memory data register (MDR) to hold data read from data memory CPU Clock Cycle Time: Worst cycle delay = C = 2ns (ignoring MUX, CLK-Q delays) Assuming the following datapath/control hardware components delays: Memory Units: 2 ns ALU and adders: 2 ns Register File: 1 ns Control Unit < 1 ns Thus Clock Rate: f = 1 / 2ns = 500 MHz

Operations (Dependant RTN) for Each Cycle Logic Immediate IR ¬ Mem[PC] A ¬ R[rs] B ¬ R[rt R ¬ A OR ZeroExt[imm16] R[rt] ¬ R PC ¬ PC + 4 R-Type IR ¬ Mem[PC] A ¬ R[rs] B ¬ R[rt] R ¬ A funct B R[rd] ¬ R PC ¬ PC + 4 Load IR ¬ Mem[PC] A ¬ R[rs] B ¬ R[rt R ¬ A + SignEx(Im16) M ¬ Mem[R] R[rt] ¬ M PC ¬ PC + 4 Store IR ¬ Mem[PC] A ¬ R[rs] B ¬ R[rt] R ¬ A + SignEx(Im16) Mem[R] ¬ B PC ¬ PC + 4 Branch IR ¬ Mem[PC] A ¬ R[rs] B ¬ R[rt] Zero ¬ A - B If Zero = 1: PC ¬ PC + 4 + (SignExt(imm16) x4) else (i.e Zero =0): PC ¬ PC + 4 Instruction Fetch IF ID EX MEM WB Instruction Decode Execution Memory Write Back Instruction Fetch (IF) & Instruction Decode cycles are common for all instructions

MIPS Multi-Cycle Datapath: Five Cycles of Load IF ID EX MEM WB Load 1- Instruction Fetch (IF): Fetch the instruction from instruction Memory. 2- Instruction Decode (ID): Operand Register Fetch and Instruction Decode. 3- Execute (EX): Calculate the effective memory address. 4- Memory (MEM): Read the data from the Data Memory. 5- Write Back (WB): Write the loaded data to the register file. Update PC.

Multi-cycle Datapath Instruction CPI R-Type/Immediate: Require four cycles, CPI = 4 IF, ID, EX, WB Loads: Require five cycles, CPI = 5 IF, ID, EX, MEM, WB Stores: Require four cycles, CPI = 4 IF, ID, EX, MEM Branches/Jumps: Require three cycles, CPI = 3 IF, ID, EX Average or effective program CPI: 3 £ CPI £ 5 depending on program profile (instruction mix).

Single Cycle Vs. Multi-Cycle CPU Clk Cycle 1 Multiple Cycle Implementation: IF ID EX MEM WB Cycle 2 Cycle 3 Cycle 4 Cycle 5 Cycle 6 Cycle 7 Cycle 8 Cycle 9 Cycle 10 Load Store Single Cycle Implementation: Waste R-type 8 ns 2ns (500 MHz) 8ns (125 MHz) Single-Cycle CPU: CPI = 1 C = 8ns One million instructions take = I x CPI x C = 106 x 1 x 8x10-9 = 8 msec Multi-Cycle CPU: CPI = 3 to 5 C = 2ns One million instructions take from 106 x 3 x 2x10-9 = 6 msec to 106 x 5 x 2x10-9 = 10 msec depending on instruction mix used. f = 125 MHz f = 500 MHz Assuming the following datapath/control hardware components delays: Memory Units: 2 ns ALU and adders: 2 ns Register File: 1 ns Control Unit < 1 ns

Finite State Machine (FSM) Control Model Control Unit Design: Finite State Machine (FSM) Control Model State specifies control points (outputs) for Register Transfer. Control points (outputs) are assumed to depend only on the current state and not inputs (i.e. Moore finite state machine) Transfer (register/memory writes) and state transition occur upon exiting the state on the falling edge of the clock. Control State Next State Logic Output Logic inputs (opcode, conditions) outputs (control points) Last State State X Current State Register Transfer Control Points e.g Flip-Flops State Transition Depends on Inputs Next State To datapath

Control Specification For Multi-cycle CPU Finite State Machine (FSM) - State Transition Diagram IR ¬ MEM[PC] R-type A ¬ R[rs] B ¬ R[rt] R ¬ A fun B R[rd] ¬ R PC ¬ PC + 4 R ¬ A or ZX R[rt] ¬ R PC ¬ PC + 4 ORi R ¬ A + SX R[rt] ¬ M M ¬ MEM[R] LW MEM[R] ¬ B BEQ & Zero BEQ & ~Zero PC ¬ PC + 4+ SX || 00 SW “instruction fetch” “decode / operand fetch” Execute Memory Write-back (Start state) To instruction fetch 13 states: 4 State Flip-Flops needed To instruction fetch To instruction fetch

Traditional FSM Controller next state Outputs control points state op cond Next State Logic Output Logic State Transition Table next State Inputs control points 11 Equal 6 Opcode Current State State 4 Outputs (Control points) op To datapath datapath State State register (4 Flip-Flops)

Traditional FSM Controller datapath + state diagram => control Translate RTN statements into control points. Assign states. Implement the controller. More on FSM controller implementation in Appendix C

Mapping RTNs To Control Points Examples & State Assignments IR ¬ MEM[PC] 0000 R-type A ¬ R[rs] B ¬ R[rt] 0001 R ¬ A fun B 0100 R[rd] ¬ R PC ¬ PC + 4 0101 R ¬ A or ZX 0110 R[rt] ¬ R PC ¬ PC + 4 0111 ORi R ¬ A + SX 1000 R[rt] ¬ M 1010 M ¬ MEM[R] 1001 LW 1011 MEM[R] ¬ B 1100 BEQ & Zero BEQ & ~Zero 0011 PC ¬ PC + 4+SX || 00 0010 SW “instruction fetch” “decode / operand fetch” Execute Memory Write-back imem_rd, IRen Aen, Ben ALUfun, Sen RegDst, RegWr, PCen 1 4 2 6 8 11 12 3 9 To instruction fetch state 0000 5 7 10 To instruction fetch state 0000 To instruction fetch state 0000

Detailed Control Specification - State Transition Table Current Op field Z Next IR PC Ops Exec Mem Write-Back State en sel A B Ex Sr ALU S R W M M-R Wr Dst 0000 ?????? ? 0001 1 0001 BEQ 0 0011 1 1 0001 BEQ 1 0010 1 1 0001 R-type x 0100 1 1 0001 orI x 0110 1 1 0001 LW x 1000 1 1 0001 SW x 1011 1 1 0010 xxxxxx x 0000 1 1 0011 xxxxxx x 0000 1 0 0100 xxxxxx x 0101 0 1 fun 1 0101 xxxxxx x 0000 1 0 0 1 1 0110 xxxxxx x 0111 0 0 or 1 0111 xxxxxx x 0000 1 0 0 1 0 1000 xxxxxx x 1001 1 0 add 1 1001 xxxxxx x 1010 1 0 1 1010 xxxxxx x 0000 1 0 1 1 0 1011 xxxxxx x 1100 1 0 add 1 1100 xxxxxx x 0000 1 0 0 1 IF ID BEQ Can be combined in one state R ORI LW SW More on FSM controller implementation in Appendix C

Alternative Multiple Cycle Datapath (In Textbook) Miminizes Hardware: 1 memory, 1 ALU PCWr PCWrCond PCSrc Zero IorD MemWr IRWr RegDst RegWr ALUSrcA 1 32 32 Mux PC Mux 1 32 Instruction Reg PC Zero Rs Mux 1 Ra 32 Address 5 32 Rt ALU Out 32 Rb busA A 32 Ideal Memory 32 ALU 5 Reg File Mux 1 Rt Rw 32 32 32 B 32 32 Mem Data Reg Rd 4 Din Dout busW busB 1 32 2 ALU Control Mux 1 3 MemRd Extend << 2 Imm 16 32 ALUOp MemtoReg ALUSrcB

Alternative Multiple Cycle Datapath (In Textbook) rs rt rd imm16 Shared instruction/data memory unit A single ALU shared among instructions Shared units require additional or widened multiplexors Temporary registers to hold data between clock cycles of the instruction: Additional registers: Instruction Register (IR), Memory Data Register (MDR), A, B, ALUOut (Figure 5.27 page 322)

Alternative Multiple Cycle Datapath With Control Lines (Fig 5 Alternative Multiple Cycle Datapath With Control Lines (Fig 5.28 In Textbook) (ORI not supported, Jump supported) PC+ 4 Branch Target rs rt rd 2 imm16 32 PC (Figure 5.28 page 323)

The Effect of The 1-bit Control Signals Name RegDst RegWrite ALUSrcA MemRead MemWrite MemtoReg IorD IRWrite PCWrite PCWriteCond Effect when deasserted (=0) The register destination number for the write register comes from the rt field (instruction bits 20:16). None The first ALU operand is the PC The value fed to the register write data input comes from ALUOut register. The PC is used to supply the address to the memory unit. Effect when asserted (=1) The register destination number for the write register comes from the rd field (instruction bits 15:11). The register on the write register input is written with the value on the Write data input. The First ALU operand is register A (I.e R[rs]) Content of memory specified by the address input are put on the memory data output. Memory contents specified by the address input is replaced by the value on the Write data input. The value fed to the register write data input comes from data memory register (MDR). The ALUOut register is used to supply the the address to the memory unit. The output of the memory is written into Instruction Register (IR) The PC is written; the source is controlled by PCSource The PC is written if the Zero output of the ALU is also active. (Figure 5.29 page 324)

The Effect of The 2-bit Control Signals Name ALUOp ALUSrcB PCSource Value (Binary) 00 01 10 11 Effect The ALU performs an add operation The ALU performs a subtract operation The funct field of the instruction determines the ALU operation (R-Type) The second input of the ALU comes from register B The second input of the ALU is the constant 4 The second input of the ALU is the sign-extended 16-bit immediate (imm16) field of the instruction in IR The second input of the ALU is is the sign-extended 16-bit immediate field of IR shifted left 2 bits Output of the ALU (PC+4) is sent to the PC for writing The content of ALUOut (the branch target address) is sent to the PC for writing The jump target address (IR[25:0] shifted left 2 bits and concatenated with PC+4[31:28] is sent to the PC for writing i.e jump address (Figure 5.29 page 324)

Operations (Dependant RTN) for Each Cycle R-Type IR ¬ Mem[PC] PC ¬ PC + 4 A ¬ R[rs] B ¬ R[rt] ALUout ¬ PC + (SignExt(imm16) x4) ALUout ¬ A funct B R[rd] ¬ ALUout Load IR ¬ Mem[PC] PC ¬ PC + 4 A ¬ R[rs] B ¬ R[rt] ALUout ¬ PC + (SignExt(imm16) x4) ALUout ¬ A + SignEx(Imm16) MDR ¬ Mem[ALUout] R[rt] ¬ MDR Store IR ¬ Mem[PC] PC ¬ PC + 4 A ¬ R[rs] B ¬ R[rt] ALUout ¬ PC + (SignExt(imm16) x4) ALUout ¬ A + SignEx(Imm16) Mem[ALUout] ¬ B Branch IR ¬ Mem[PC] PC ¬ PC + 4 A ¬ R[rs] B ¬ R[rt] ALUout ¬ PC + (SignExt(imm16) x4) Zero ¬ A - B Zero: PC ¬ ALUout Jump IR ¬ Mem[PC] PC ¬ PC + 4 A ¬ R[rs] B ¬ R[rt] ALUout ¬ PC + (SignExt(imm16) x4) PC ¬ Jump Address Instruction Fetch IF ID EX MEM WB Instruction Decode Execution Memory Write Back Instruction Fetch (IF) & Instruction Decode (ID) cycles are common for all instructions

High-Level View of Finite State Machine Control (Figure 5.32) (Figure 5.33) (Figure 5.34) (Figure 5.35) (Figure 5.36) First steps are independent of the instruction class Then a series of sequences that depend on the instruction opcode Then the control returns to fetch a new instruction. Each box above represents one or several state. (Figure 5.31 page 332)

Instruction Fetch (IF) and Decode (ID) FSM States A ¬ R[rs] B ¬ R[rt] ALUout ¬ PC + (SignExt(imm16) x4) IF ID IR ¬ Mem[PC] PC ¬ PC + 4 (Figure 5.33) (Figure 5.34) (Figure 5.35) (Figure 5.36) (Figure 5.32 page 333)

Instruction Fetch (IF) Cycle (State 0) IR ¬ Mem[PC] PC ¬ PC + 4 MemRead = 1 ALUSrcA = 0 IorD = 0 IRWrite =1 ALUSrcB = 01 ALUOp = 00 (add) PCWrite = 1 PCSource = 00 (ORI not supported, Jump supported) PC+ 4 Branch Target rs rt rd 2 imm16 32 PC 00 1 1 1 01 1 00 Add (Figure 5.28 page 323)

Instruction Decode (ID) Cycle (State 1) A ¬ R[rs] B ¬ R[rt] ALUout ¬ PC + (SignExt(imm16) x4) ALUSrcA = 0 ALUSrcB = 11 ALUOp = 00 (add) (Calculate branch target) (ORI not supported, Jump supported) PC+ 4 Branch Target rs rt rd 2 imm16 32 PC 11 00 Add (Figure 5.28 page 323)

Load/Store Instructions FSM States (From Instruction Decode) EX ALUout ¬ A + SignEx(Imm16) MDR ¬ Mem[ALUout] Mem[ALUout] ¬ B MEM R[rt] ¬ MDR WB To Instruction Fetch (Figure 5.32) (Figure 5.33 page 334)

Load/Store Execution (EX) Cycle (State 2) Effective address calculation ALUSrcA = 1 ALUSrcB = 10 ALUOp = 00 (add) ALUout ¬ A + SignEx(Imm16) (ORI not supported, Jump supported) PC+ 4 Branch Target rs rt rd 2 imm16 32 PC 10 1 00 Add (Figure 5.28 page 323)

Load Memory (MEM) Cycle (State 3) MDR ¬ Mem[ALUout] MemRead = 1 IorD = 1 (ORI not supported, Jump supported) PC+ 4 Branch Target rs rt rd 2 imm16 32 PC 1 1 (Figure 5.28 page 323)

Load Write Back (WB) Cycle (State 4) R[rt] ¬ MDR RegWrite = 1 MemtoReg = 1 RegDst = 0 (ORI not supported, Jump supported) PC+ 4 Branch Target rs rt rd 2 imm16 32 PC 1 1 (Figure 5.28 page 323)

Store Memory (MEM) Cycle (State 5) Mem[ALUout] ¬ B MemWrite = 1 IorD = 1 (ORI not supported, Jump supported) PC+ 4 Branch Target rs rt rd 2 imm16 32 PC 1 1 (Figure 5.28 page 323)

R-Type Instructions FSM States (From Instruction Decode) R-Type Instructions FSM States EX ALUout ¬ A funct B WB R[rd] ¬ ALUout To State 0 (Instruction Fetch) (Figure 5.32) (Figure 5.34 page 335)

R-Type Execution (EX) Cycle (State 6) ALUout ¬ A funct B ALUSrcA = 1 ALUSrcB = 00 ALUOp = 10 (R-Type) (ORI not supported, Jump supported) PC+ 4 Branch Target rs rt rd 2 imm16 32 PC 00 1 10 R-Type (Figure 5.28 page 323)

R-Type Write Back (WB) Cycle (State 7) R[rd] ¬ ALUout RegWrite = 1 MemtoReg = 0 RegDst = 1 (ORI not supported, Jump supported) PC+ 4 Branch Target rs rt rd 2 imm16 32 PC 1 1 (Figure 5.28 page 323)

Branch Instruction Single EX State Jump Instruction Single EX State (From Instruction Decode) (From Instruction Decode) Zero ¬ A - B Zero : PC ¬ ALUout PC ¬ Jump Address EX EX To State 0 (Instruction Fetch) (Figure 5.32) To State 0 (Instruction Fetch) (Figure 5.32) (Figures 5.35, 5.36 page 337)

Branch Execution (EX) Cycle (State 8) Zero ¬ A - B Zero : PC ¬ ALUout ALUSrcA = 1 ALUSrcB = 00 ALUOp = 01 (Subtract) PCWriteCond = 1 PCSource = 01 (ORI not supported, Jump supported) PC+ 4 Branch Target rs rt rd 2 imm16 32 PC 1 01 00 1 01 Subtract (Figure 5.28 page 323)

Jump Execution (EX) Cycle (State 9) PC ¬ Jump Address PCWrite = 1 PCSource = 10 (ORI not supported, Jump supported) PC+ 4 Branch Target rs rt rd 2 imm16 32 PC 10 1 1 (Figure 5.28 page 323)

FSM State Transition Diagram (From Book) IF ID (Figure 5.38 page 339) A ¬ R[rs] B ¬ R[rt] ALUout ¬ PC + (SignExt(imm16) x4) (Figure 5.38 page 339) IR ¬ Mem[PC] PC ¬ PC + 4 ALUout ¬ A + SignEx(Imm16) EX PC ¬ Jump Address ALUout ¬ A func B Zero ¬ A -B Zero: PC ¬ ALUout MDR ¬ Mem[ALUout] WB MEM R[rd] ¬ ALUout Mem[ALUout] ¬ B Total 10 states R[rt] ¬ MDR WB More on FSM controller implementation in Appendix C

MIPS Multi-cycle Datapath Performance Evaluation What is the average CPI? State diagram gives CPI for each instruction type. Workload (program) below gives frequency of each type. Type CPIi for type Frequency CPIi x freqIi Arith/Logic 4 40% 1.6 Load 5 30% 1.5 Store 4 10% 0.4 branch 3 20% 0.6 Average CPI: 4.1 Better than CPI = 5 if all instructions took the same number of clock cycles (5).

Adding Support for swap to Multi Cycle Datapath (For More Practice Exercise 5.42) You are to add support for a new instruction, swap that exchanges the values of two registers to the MIPS multicycle datapath of Figure 5.28 on page 232 swap $rs, $rt Swap used the R-Type format with: the value of field rs = the value of field rd Add any necessary datapaths and control signals to the multicycle datapath. Find a solution that minimizes the number of clock cycles required for the new instruction without modifying the register file. Justify the need for the modifications, if any. Show the necessary modifications to the multicycle control finite state machine of Figure 5.38 on page 339 when adding the swap instruction. For each new state added, provide the dependent RTN and active control signal values. i.e No additional register write ports

Adding swap Instruction Support to Multi Cycle Datapath We assume here rs = rd in instruction encoding Swap $rs, $rt R[rt] ¬ R[rs] R[rs] ¬ R[rt] op rs rt rd [31-26] [25-21] [20-16] [10-6] 2 2 PC+ 4 rs Branch Target rt R[rs] R[rt] rd imm16 2 The outputs of A and B should be connected to the multiplexor controlled by MemtoReg if one of the two fields (rs and rd) contains the name of one of the registers being swapped. The other register is specified by rt. The MemtoReg control signal becomes two bits. (For More Practice Exercise 5.42)

Adding swap Instruction Support to Multi Cycle Datapath IF A ¬ R[rs] B ¬ R[rt] ALUout ¬ PC + (SignExt(imm16) x4) IR ¬ Mem[PC] PC ¬ PC + 4 ID EX ALUout ¬ A + SignEx(Imm16) WB1 R[rd] ¬ B ALUout ¬ A func B Zero ¬ A -B Zero: PC ¬ ALUout WB2 R[rt] ¬ A R[rd] ¬ ALUout MEM WB Swap takes 4 cycles WB (For More Practice Exercise 5.42)

Adding Support for add3 to Multi Cycle Datapath (For More Practice Exercise 5.45) You are to add support for a new instruction, add3, that adds the values of three registers, to the MIPS multicycle datapath of Figure 5.28 on page 232 For example: add3 $s0,$s1, $s2, $s3 Register $s0 gets the sum of $s1, $s2 and $s3. The instruction encoding uses a modified R-format, with an additional register specifier rx added replacing the five low bits of the “funct” field. Add necessary datapath components, connections, and control signals to the multicycle datapath without modifying the register bank or adding additional ALUs. Find a solution that minimizes the number of clock cycles required for the new instruction. Justify the need for the modifications, if any. Show the necessary modifications to the multicycle control finite state machine of Figure 5.38 on page 339 when adding the add3 instruction. For each new state added, provide the dependent RTN and active control signal values. OP rs rt rd rx $s1 $s2 Not used 6 bits [31-26] 5 bits [25-21] [20-16] [15-11] add3 [4-0] $s0 $s3 [10-5]

Exercise 5.45: add3 instruction support to Multi Cycle Datapath Add3 $rd, $rs, $rt, $rx R[rd] ¬ R[rs] + R[rt] + R[rx] rx is a new register specifier in field [0-4] of the instruction No additional register read ports or ALUs allowed Modified R-Format op rs rt rd rx [31-26] [25-21] [20-16] [10-6] [4-0] 2 WriteB 2 2 PC+ 4 rs Branch Target rt rx rd imm16 1. ALUout is added as an extra input to first ALU operand MUX to use the previous ALU result as an input for the second addition. 2. A multiplexor should be added to select between rt and the new field rx containing register number of the 3rd operand (bits 4-0 for the instruction) for input for Read Register 2. This multiplexor will be controlled by a new one bit control signal called ReadSrc. 3. WriteB control line added to enable writing R[rx] to B

Exercise 5.45: add3 instruction support to Multi Cycle Datapath IF A ¬ R[rs] B ¬ R[rt] ALUout ¬ PC + (SignExt(imm16) x4) IR ¬ Mem[PC] PC ¬ PC + 4 ID EX ALUout ¬ A + SignEx(Im16) WriteB EX1 ALUout ¬ A + B B ¬ R[rx] WriteB ALUout ¬ A func B EX2 Zero ¬ A -B Zero: PC ¬ ALUout ALUout ¬ ALUout + B R[rd] ¬ ALUout MEM WB Add3 takes 5 cycles WB (For More Practice Exercise 5.45)