Multiple Cycle Implementation of MIPS-Lite CPU

Slides:



Advertisements
Similar presentations
1 Chapter Five The Processor: Datapath and Control.
Advertisements

CS-447– Computer Architecture Lecture 12 Multiple Cycle Datapath
The Processor: Datapath & Control
The Processor Data Path & Control Chapter 5 Part 2 - Multi-Clock Cycle Design N. Guydosh 2/29/04.
Chapter 5 The Processor: Datapath and Control Basic MIPS Architecture Homework 2 due October 28 th. Project Designs due October 28 th. Project Reports.
CSE378 Multicycle impl,.1 Drawbacks of single cycle implementation All instructions take the same time although –some instructions are longer than others;
1 5.5 A Multicycle Implementation A single memory unit is used for both instructions and data. There is a single ALU, rather than an ALU and two adders.
ENEE350 Ankur Srivastava University of Maryland, College Park Based on Slides from Mary Jane Irwin ( )
Advanced Computer Architecture 5MD00 / 5Z033 MIPS Design data path and control Henk Corporaal TUEindhoven 2007.
331 W10.1Spring :332:331 Computer Architecture and Assembly Language Spring 2005 Week 10 Building a Multi-Cycle Datapath [Adapted from Dave Patterson’s.
1 The Processor: Datapath and Control We will design a microprocessor that includes a subset of the MIPS instruction set: –Memory access: load/store word.
The Multicycle Processor CPSC 321 Andreas Klappenecker.
CSE431 L05 Basic MIPS Architecture.1Irwin, PSU, 2005 CSE 431 Computer Architecture Fall 2005 Lecture 05: Basic MIPS Architecture Review Mary Jane Irwin.
Dr. Iyad F. Jafar Basic MIPS Architecture: Multi-Cycle Datapath and Control.
Prof. John Nestor ECE Department Lafayette College Easton, Pennsylvania ECE Computer Organization Multi-Cycle Processor.
COSC 3430 L08 Basic MIPS Architecture.1 COSC 3430 Computer Architecture Lecture 08 Processors Single cycle Datapath PH 3: Sections
Computer Architecture Chapter 5 Fall 2005 Department of Computer Science Kent State University.
CPE232 Basic MIPS Architecture1 Computer Organization Multi-cycle Approach Dr. Iyad Jafar Adapted from Dr. Gheith Abandah slides
1 CS/COE0447 Computer Organization & Assembly Language Multi-Cycle Execution.
C HAPTER 5 T HE PROCESSOR : D ATAPATH AND C ONTROL M ULTICYCLE D ESIGN.
Gary MarsdenSlide 1University of Cape Town Stages.
1 Processor: Datapath and Control Single cycle processor –Datapath and Control Multicycle processor –Datapath and Control Microprogramming –Vertical and.
1  2004 Morgan Kaufmann Publishers Chapter Five.
Multicycle Implementation
LECTURE 6 Multi-Cycle Datapath and Control. SINGLE-CYCLE IMPLEMENTATION As we’ve seen, single-cycle implementation, although easy to implement, could.
ECE-C355 Computer Structures Winter 2008 The MIPS Datapath Slides have been adapted from Prof. Mary Jane Irwin ( )
Datapath and Control AddressInstruction Memory Write Data Reg Addr Register File ALU Data Memory Address Write Data Read Data PC Read Data Read Data.
Prof. John Nestor ECE Department Lafayette College Easton, Pennsylvania ECE Computer Organization Lecture 16 - Multi-Cycle.
Datapath and Control AddressInstruction Memory Write Data Reg Addr Register File ALU Data Memory Address Write Data Read Data PC Read Data Read Data.
Chapter 4 From: Dr. Iyad F. Jafar Basic MIPS Architecture: Multi-Cycle Datapath and Control.
1 CS/COE0447 Computer Organization & Assembly Language Chapter 5 Part 3.
Computer Architecture Lecture 6.  Our implementation of the MIPS is simplified memory-reference instructions: lw, sw arithmetic-logical instructions:
Design a MIPS Processor (II)
Multi-Cycle Datapath and Control
CS161 – Design and Architecture of Computer Systems
IT 251 Computer Organization and Architecture
Chapter 5 The Processor: Datapath and Control
/ Computer Architecture and Design
Systems Architecture I
Extensions to the Multicycle CPU
Five Execution Steps Instruction Fetch
MIPS Instructions.
Multi-Cycle CPU.
Basic MIPS Architecture
CS/COE0447 Computer Organization & Assembly Language
CSCE 212 Chapter 5 The Processor: Datapath and Control
Processor: Finite State Machine & Microprogramming
Chapter Five The Processor: Datapath and Control
Multicycle Approach Break up the instructions into steps
The Multicycle Implementation
CS/COE0447 Computer Organization & Assembly Language
Chapter Five The Processor: Datapath and Control
Drawbacks of single cycle implementation
The Multicycle Implementation
Systems Architecture I
The Processor Lecture 3.2: Building a Datapath with Control
Vishwani D. Agrawal James J. Danaher Professor
Multicycle Approach We will be reusing functional units
COSC 2021: Computer Organization Instructor: Dr. Amir Asif
COSC 2021: Computer Organization Instructor: Dr. Amir Asif
Processor: Multi-Cycle Datapath & Control
Multicycle Design.
CS/COE0447 Computer Organization & Assembly Language
Multi-Cycle Datapath Lecture notes from MKP, H. H. Lee and S. Yalamanchili.
Chapter Four The Processor: Datapath and Control
5.5 A Multicycle Implementation
Systems Architecture I
The Processor: Datapath & Control.
Processor: Datapath and Control
CS161 – Design and Architecture of Computer Systems
Presentation transcript:

Multiple Cycle Implementation of MIPS-Lite CPU CS365 Lecture 8

Review on Single Cycle Datapath Subset of the core MIPS ISA Arithmetic/Logic instructions: AND, OR, ADD, SUB, SLT Data flow instructions: LW, SW Branch instructions: BEQ, J Five steps in processor design Analyze the instruction Determine the datapath components Assemble the components Determine the control Design the control unit CS465 Multi-cycle CPU

Complete Single Cycle Datapath How lw, sw, R-Type, beq, j instructions work? CS465 Multi-cycle CPU

Delays in Single Cycle Datapath What are the delays for lw, sw, R-Type, beq, j instructions? CS465 Multi-cycle CPU

Single Cycle Implementation Calculate cycle time assuming negligible delays except: memory (2ns), ALU and adders (2ns), register file access (1ns) Instruction class Instruction Fetch Register Access ALU Register/ Memory Access R-Type X R 6 Load M 8 Store 7 Branch 5 Jump 2 CS465 Multi-cycle CPU

Remarks on Single Cycle Datapath Single cycle datapath ensures the execution of any instruction within one clock cycle Functional units must be duplicated if used multiple times by one instruction, e.g. ALU Why? Functional units can be shared if used by different instructions Single cycle datapath is not efficient in time Clock cycle time is determined by the instruction taking the longest time, eg. lw in MIPS Variable clock cycle time is too complicated Alternative design/implementation approaches Multiple clock cycles per instruction – this lecture Pipelining – Chapter 6 CS465 Multi-cycle CPU

Multi-cycle Approach Single cycle problems: One solution: What if we had a more complicated instruction like FP? Wasteful of area One solution: Use a “smaller” cycle time Have different instructions take different numbers of cycles A “multicycle” datapath P C M e m o r y A d s I n t u c i a D g R # L U B O CS465 Multi-cycle CPU

Multistage Execution of Instructions We have informally described instructions as executing in several steps Instruction fetch and PC update Instruction decode and reading from registers Performing an ALU computation Reading/writing data memory Storing data back to register file What if we made these stages explicit in the hardware design? CS465 Multi-cycle CPU

Benefits Performance benefits Cost benefits Each instruction can execute only the stages that are necessary E.g. branch instruction beq Instructions can complete as soon as possible, instead of being limited by the slowest instruction Cost benefits As an added bonus, we can eliminate some of the extra hardware from the single-cycle datapath We will restrict to using each functional unit once per cycle We could reuse some units in a different cycle during the execution of a single instruction E.g. beq can compare operand and compute target in different cycles CS465 Multi-cycle CPU

Multicycle Approach We will be reusing functional units ALU used to compute address and to increment PC Memory used for instruction and data Unlike the single cycle implementation, our control signals will not be determined solely by instruction E.g., what should the ALU do for a “subtract” instruction? Need to specify not only instruction but also which cycle in instruction’s execution We will use a finite state machine for control CS465 Multi-cycle CPU

Review: Finite State Machines A set of states and Next state function (determined by current state and the input) Output function (determined by current state and possibly input) There are two types of FSM according to the output function N e x t - s a f u n c i o C r l k O p I CS465 Multi-cycle CPU

Moore Machine vs. Mealy Machine The output function depends on the current state Mealy machine The output function depends on both the current state and the current input We will use a Moore machine Refer to Appendix B.10 (page B-67) N e x t - s a f u n c i o C r l k O p I CS465 Multi-cycle CPU

Multicycle Approach Break up the instructions into steps, each step takes a cycle Balance the amount of work to be done Restrict each cycle to use only one major functional unit At the end of a cycle Store values for use in later cycles Introduce additional “internal” registers S h i f t l e 2 P C M m o r y D a W d u x 1 R g s 4 I n c [ 5 – ] 3 6 A L U Z B O CS465 Multi-cycle CPU

Changes to Datapath A single memory unit is used to store both instructions and data Memory address specified by PC or ALUout Need a new multiplexor (IorD) to control whether the memory is being accessed to read an instruction or read/write data (for lw/sw) P C M e m o r y A d s I n t u c i a D g R # L U B O CS465 Multi-cycle CPU

Changes to Datapath New internal registers added to support multi-cycle operation Store data output by functional elements for use in subsequent cycles Memory access: Instruction register (IR), memory data register (MDR); register access:A, B; ALU op: ALUOut All except IR hold data only between adjacent cycles Only IR needs to hold the instruction until the end of execution of that instruction, hence need a write control P C M e m o r y A d s I n t u c i a D g R # L U B O CS465 Multi-cycle CPU

Changes to Datapath ALU used for all arithmetic (including computing branch target address and PC+4) Need a new multiplexor for top input to ALU (ALUSrcA): choose between PC and register A Bottom input multiplexor (ALUSrcB) to ALU extended to have four inputs (instead of two in single cycle datapath): register B, sign-extended immediate, address offset (sign-extended and shifted), constant 4 P C M e m o r y A d s I n t u c i a D g R # L U B O CS465 Multi-cycle CPU

Multicycle Datapath with Control Figure 5.27 CS465 Multi-cycle CPU

Multicycle Datapath & Control S h i f t l e 2 P C M u x 1 R g s r W d a I n c o [ 5 – ] 4 3 6 A L U Z m y B D O p - J 8 With support to jump and branch Figure 5.28 CS465 Multi-cycle CPU

Five Execution Steps Instruction fetch Instruction decode and register fetch Execution, memory address computation, or branch completion Memory access or R-type instruction completion Write-back step INSTRUCTIONS TAKE FROM 3 - 5 CYCLES! CS465 Multi-cycle CPU

High Level View of FSM Control Figure 5.31 CS465 Multi-cycle CPU

Step 1: Instruction Fetch Operations Use PC to get instruction and put it in the Instruction Register Increment the PC by 4 and put the result back in PC Described in RTL (Register-Transfer Language) IR = Memory[PC]; PC = PC + 4; Control signals Assert MemRead, IRwrite Set IorD to 0  select PC as the source of the address Increment PC by 4: setting ALUSrcA to 0 (sending PC to ALU), ALUSrcB to 01 (sending 4), ALUOp  00 (for add), store PC+4 to PC  PCSource 00, and PCWrite  1 What is the advantage of updating the PC now? succintly CS465 Multi-cycle CPU

Step 2: Instr. Decode & Reg. Fetch Operations Read registers rs and rt in case we need them Compute the branch address in case the instruction is a branch RTL: A = Reg[IR[25-21]]; B = Reg[IR[20-16]]; ALUOut = PC + (sign-extend(IR[15-0]) << 2); Control signals We are not setting any control lines based on the instruction type yet (we are busy "decoding" it in our control logic) Set ALUSrcA to 0 so that PC is sent to ALU, ALUSrcB to 11 so that sign-extended and shifted offset to ALU CS465 Multi-cycle CPU

Instruction Fetch & Decode A L U S r c = B 1 O p M e m R a d I o D W i t P C u n s f h / g ( ' ) - y E Q J F 5 . 3 4 35 36 CS465 Multi-cycle CPU

Step 3: Execution ALU is performing one of three functions, based on instruction type Need to set ALUSrcA, ALUSrcB, ALUOp Memory reference (LW/SW) ALUOut = A + sign-extend(IR[15-0]); R-type: ALUOut = A op B; Branch: if (A==B) PC = ALUOut; Need to set PCWriteCond, PCSource, use Zero Jump PC = PC[31:28]|(IR[25:0], 2’b00) Need to set PCWrite, PCSource CS465 Multi-cycle CPU

Step 4: R-type or Memory-access Loads and stores access memory MDR = Memory[ALUOut]; or Memory[ALUOut] = B; Need to set MemRead/MemWrite, IorD R-type instructions finish Reg[IR[15-11]] = ALUOut; Need to set RegDst, RegWrite, MemtoReg The write actually takes place at the end of the cycle on the edge CS465 Multi-cycle CPU

Step 5: Write-back For LW instruction: Reg[IR[20-16]]= MDR; Need to set RegWrite, MemtoReg, RegDst What about all the other instructions? CS465 Multi-cycle CPU

FSM for Memory Access Instr. W r i t I o D = 1 R a d A L U S c B O p g s y u n ( ' ) - b k 4 2 5 3 F T . CS465 Multi-cycle CPU

FSM for R-format Instructions L U S r c = 1 B O p R e g D s t W i M m o E x u n - y l 6 7 ( ) F a T 5 . 3 2 CS465 Multi-cycle CPU

FSM for Branch Instruction p l e t i 8 ( O = ' E Q ) F s 1 T g u 5 . 3 2 A L U S P C W d CS465 Multi-cycle CPU

FSM for Jump J u m p c o l e t i n 9 ( O = ' ) F r s a 1 T g 5 . 3 2 P g 5 . 3 2 P C W S CS465 Multi-cycle CPU

Summary of Steps CS465 Multi-cycle CPU

Complete Finite State Machine W r i t e S o u c = 1 A L U B O p n d R g D s M m I a f h / J l E x y - b k ( ' ) Q 4 9 8 6 2 7 5 3 CS465 Multi-cycle CPU

Implementing the Control Value of control signals is dependent upon: What instruction is being executed Which step is being performed Use the information we’ve accumulated to specify a finite state machine Specify the finite state machine graphically, or Use microprogramming Implementation can be derived from specification CS465 Multi-cycle CPU

Finite State Machine for Control Implementation: P C W r i t e o n d I D M m R g S u c A L U O p B s N 3 2 1 5 4 a f l CS465 Multi-cycle CPU

Simple Questions How many cycles will it take to execute this code? lw $t2, 0($t3) lw $t3, 4($t3) beq $t2, $t3, Label #assume not taken add $t5, $t2, $t3 sw $t5, 8($t3) Label: ... What is going on during the 8th cycle of execution? In what cycle does the actual addition of $t2 and $t3 takes place? CS465 Multi-cycle CPU

Exceptions Hardest part of control is to implement exceptions & interrupts Type of event From where? MIPS terminology I/O device request External Interrupt Invoke the operating system from user program Internal Exception Arithmetic overflow Using an undefined instruction Hardware malfunctions Internal or External Exception or interrupt CS465 Multi-cycle CPU

How Are Exceptions Handled? In our design, we will consider two types of exceptions Arithmetic overflow Execution of an undefined instruction Actions on exception Save address of offending instruction in the Exception Program Counter (EPC) Transfer control to the operating system at a pre-specified address (exception handler) OS then takes appropriate action CS465 Multi-cycle CPU

Exception Handling For the OS to take appropriate action, it must know the reason for the exception Two ways to communicate reason to OS Have a Status register which holds a field that indicates the reason for the exception Vectored interrupts Address to which control is transferred depends upon the cause of the exception MIPS uses first method above; it has a register called Cause (in addition to the EPC register) Undefined instruction =0 Arithmetic overflow =1 CS465 Multi-cycle CPU

CPU w/ Exception Support h i f t l e 2 M m o r y D a W d u x 1 I n s c [ 5 – ] 4 g 3 6 A L U Z B R P C O p - J 8 E CS465 Multi-cycle CPU

Exception Handling Datapath additions Control signals Three steps EPC, Cause (for undefined instruction, Cause = 0, arithmetic overflow Cause = 1) Control signals EPCWrite, CauseWrite IntCause (sets the Cause register) PCSrc has to be modified so that one of its sources is the OS entry point Three steps Write Cause EPC = PC – 4 (Have to use ALU, so need to expand multiplexors for ALUSrcA and ALUSrcB Write PC CS465 Multi-cycle CPU

CPU w/ Exception Support h i f t l e 2 M m o r y D a W d u x 1 I n s c [ 5 – ] 4 g 3 6 A L U Z B R P C O p - J 8 E CS465 Multi-cycle CPU

States for Handling Exceptions 1 T o s t a e b g i n x r u c S = A L U B O p E P C W I PC CS465 Multi-cycle CPU

FSM including Exception Support A L U S r c = 1 B O p P C W i t e o n d u R g D s M m I a f h / J l E x y - b k ( ' ) Q 4 9 8 6 2 7 5 3 v w CS465 Multi-cycle CPU

Exercise Given the following instruction mix, what is the CPI for the multi-cycle CPU, assuming each step requires 1 clock cycle? Loads: 20%, stores: 10%, branches: 13%, jumps: 2%, ALU: 55% Idea Get the cycles needed for every instruction type: these are their individual CPIs Calculate the average CPI considering the instruction mix (instruction frequency) CS465 Multi-cycle CPU

Cycles for Individual Instructions Loads: 5, stores: 4 ALU instructions: 4 Branches: 3 Jumps: 3 CS465 Multi-cycle CPU

Exercise Given the following instruction mix, what is the CPI for the multi-cycle CPU, assuming each step requires 1 clock cycle? Loads: 20%, stores: 10%, branches: 13%, jumps: 2%, ALU: 55% CPI for each of them: loads: 5, stores: 4, branches: 3, jumps:3, ALU: 4 Average CPI = 0.205 + 0.104 + 0.133 + 0.023 + 0.554 = 4.05 Compare with the worst-case CPI of 5.0 if all instructions take the same number of cycles CS465 Multi-cycle CPU

Summary Multiple-cycle datapath and control Typical steps Break up the instructions into steps, each step takes a cycle Reduced clock cycle time; different instructions take varied cycles to implement Functional units can be reused Typical steps Instruction fetch Instruction decode and register fetch Execution: R-type, memory address computation, condition check Memory access or R-type write Write back from memory CS465 Multi-cycle CPU

Next Lecture Topic Reading Pipelining Ch6 Patterson and Hennessy CS465 Multi-cycle CPU