Single Cycle MIPS Implementation

Slides:

Advertisements

Similar presentations

Adding the Jump Instruction

Advertisements

Pipeline Example: cycle 1 lw R10,9(R1) sub R11,R2, R3 and R12,R4, R5 or R13,R6, R7.

Pipelined Processor II (cont’d) CPSC 321

Kevin Walsh CS 3410, Spring 2010 Computer Science Cornell University Performance See: P&H 1.4.

1 Chapter Five The Processor: Datapath and Control.

1 Lecture 17: Basic Pipelining Today’s topics:  5-stage pipeline  Hazards and instruction scheduling Mid-term exam stats:  Highest: 90, Mean: 58.

ENEE350 Ankur Srivastava University of Maryland, College Park Based on Slides from Mary Jane Irwin ( )

EECE476 Lecture 9: Multi-cycle CPU Datapath Chapter 5: Section 5.5 The University of British ColumbiaEECE 476© 2005 Guy Lemieux.

1 5.5 A Multicycle Implementation A single memory unit is used for both instructions and data. There is a single ALU, rather than an ALU and two adders.

Multicycle Review Performance Examples. Single Cycle MIPS Implementation All instructions take the same amount of time Signals propagate along longest.

©UCB CS 162 Computer Architecture Lecture 3: Pipelining Contd. Instructor: L.N. Bhuyan

©UCB CS 161Computer Architecture Chapter 5 Lecture 11 Instructor: L.N. Bhuyan Adapted from notes by Dave Patterson (http.cs.berkeley.edu/~patterson)

Dr. Iyad F. Jafar Basic MIPS Architecture: Multi-Cycle Datapath and Control.

COSC 3430 L08 Basic MIPS Architecture.1 COSC 3430 Computer Architecture Lecture 08 Processors Single cycle Datapath PH 3: Sections

Datapath and Control: MultiCycle Implementation. Performance of Single Cycle Machines °Assume following operation times: Memory units : 200 ps ALU and.

COMP541 Multicycle MIPS Montek Singh Apr 8, 2015.

CSE 340 Computer Architecture Summer 2014 Basic MIPS Pipelining Review.

CS.305 Computer Architecture Enhancing Performance with Pipelining Adapted from Computer Organization and Design, Patterson & Hennessy, © 2005, and from.

1 Designing a Pipelined Processor In this Chapter, we will study 1. Pipelined datapath 2. Pipelined control 3. Data Hazards 4. Forwarding 5. Branch Hazards.

CPE232 Basic MIPS Architecture1 Computer Organization Multi-cycle Approach Dr. Iyad Jafar Adapted from Dr. Gheith Abandah slides

1 CS/COE0447 Computer Organization & Assembly Language Multi-Cycle Execution.

CSCI-365 Computer Organization Lecture Note: Some slides and/or pictures in the following are adapted from: Computer Organization and Design, Patterson.

Let’s look at a normal lw instruction first… 1. 2 Register file addresscontent 6 (00110) (00111) OpcodeSource register Destination register.

COMP541 Multicycle MIPS Montek Singh Mar 25, 2010.

Performance of Single-cycle Design

IT 251 Computer Organization and Architecture Multi Cycle CPU Datapath Chia-Chi Teng.

LECTURE 6 Multi-Cycle Datapath and Control. SINGLE-CYCLE IMPLEMENTATION As we’ve seen, single-cycle implementation, although easy to implement, could.

Datapath and Control AddressInstruction Memory Write Data Reg Addr Register File ALU Data Memory Address Write Data Read Data PC Read Data Read Data.

Adapted from Computer Organization and Design, Patterson & Hennessy, UCB ECE232: Hardware Organization and Design Part 10: Control Design

Chapter 4 From: Dr. Iyad F. Jafar Basic MIPS Architecture: Multi-Cycle Datapath and Control.

1 CS/COE0447 Computer Organization & Assembly Language Chapter 5 Part 3.

1 The final datapath. 2 Control  The control unit is responsible for setting all the control signals so that each instruction is executed properly. —The.

Multi-Cycle Datapath and Control

Single Cycle CPU.

Exceptions Another form of control hazard Could be caused by…

IT 251 Computer Organization and Architecture

Lecture: Pipelining Basics

Performance of Single-cycle Design

Systems Architecture I

Multi-Cycle CPU.

Single Cycle Processor

D.4 Finite State Diagram for the Multi-cycle processor

Multi-Cycle CPU.

ECS 154B Computer Architecture II Spring 2009

Multiple Cycle Implementation of MIPS-Lite CPU

Chapter 4 The Processor Part 2

Chapter Five The Processor: Datapath and Control

Multicycle Approach Break up the instructions into steps

A pipeline diagram Clock cycle lw $t0, 4($sp) IF ID

The Multicycle Implementation

CS/COE0447 Computer Organization & Assembly Language

Serial versus Pipelined Execution

Pipelining in more detail

Chapter Five The Processor: Datapath and Control

Drawbacks of single cycle implementation

The Multicycle Implementation

Systems Architecture I

Vishwani D. Agrawal James J. Danaher Professor

COSC 2021: Computer Organization Instructor: Dr. Amir Asif

Processor: Multi-Cycle Datapath & Control

Multi-Cycle Datapath Lecture notes from MKP, H. H. Lee and S. Yalamanchili.

Chapter Four The Processor: Datapath and Control

Control Unit for Multiple Cycle Implementation

5.5 A Multicycle Implementation

Processor Design Datapath and Design.

Systems Architecture I

Control Unit for Multiple Cycle Implementation

FloorPlan for Multicycle MIPS

A relevant question Assuming you’ve got: One washer (takes 30 minutes)

The Processor: Datapath & Control.

Presentation transcript:

Single Cycle MIPS Implementation All instructions take the same amount of time Signals propagate along longest path CPI = 1 Lots of operations happening in parallel Increment PC ALU Branch target computation Inefficient

Multicycle MIPS Implementation Instructions take different number of cycles Cycles are identical in length Share resources across cycles E.g. one ALU for everything Minimize hardware Cycles are independent across instructions R-type and memory-reference instructions do different things in their 4th cycles CPI is 3,4, or 5 depending on instruction “Independent” cycles – just means that there is no “reservation” of a cycle so that only R-type instructions do useful work in that cycle, while others wait.

Multicycle versions of various instructions R-type (add, sub, etc.) – 4 cycles Read instruction Decode/read registers ALU operation ALU Result stored back to destination register. Branch – 3 cycles Get branch address (ALU); read regs for comparing. ALU compares registers; if branch taken, update PC

Multicycle versions of various instructions Load – 5 cycles Read instruction Decode/read registers ALU adds immediate to register to form address Address passed to memory; data is read into MDR Data in MDR is stored into destination register Store – 4 cycles ALU adds immediate to a register to form address Save data from the other source register into memory at address from cycle 3

Control for new instructions Suppose we introduce lw2r: lw2r $1, $2, $3: compute address as $2+$3 put result into $1. In other words: lw $1, 0($2+$3) R-type instruction How does the state diagram change?

Control for new instructions Suppose we introduce lw2r: lw2r $1, $2, $3: compute address as $2+$3 Load value at this address into $1 In other words: lw $1, 0($2+$3) R-type instruction How does the state diagram change? New states: A,B,C State 1 (op=‘lw2r’) State A  State B  State C  back to 0 A controls: ALUOp=00, ALUSrcA=1, ALUSrcB=0 B controls: MemRead=1, IorD = 1 C controls: RegDst = 1, RegWrite = 1, MemToReg = 1

Performance CPI: cycles per instruction Execution time = IC * CPI * C Average CPI based on instruction mixes Execution time = IC * CPI * C Where IC = instruction count; C = clock cycle time Performance: inverse of execution time MIPS = million instructions per second Higher is better Amdahl’s Law: CPI is what you want to minimize for better performance, assuming same program and same processor speed. Ideal CPI? 1 for pipelined, in reality greater. For modern CPU’s (superscalar, dynamic sched, VLIW) – historically went from CPI of about 1 to a CPI of about 1 (!!!). Why? – slower memory, slower wires, etc.

Performance Examples Finding average CPI:

Performance Examples Finding average CPI:

Performance Examples CPI = 1.74 Assume a 2GHz P4, with program consisting of 1,000,000,000 instructions. Find execution time

Performance Examples CPI = 1.74, 2GHz P4, 10^9 instructions. Execution_time = IC * CPI * Cycletime = 10^9 * 1.74 * 0.5 ns = 0.87 seconds

Performance Examples We improve the design and change CPI of load/store to 1. Speedup assuming the same program?

Performance Examples We improve the design and change CPI of load/store to 1. Speedup assuming the same program/cycle time? CPInew = 0.5*1 + 0.08*2 + 0.08*3 + 0.34*1 CPInew = 1.24 Speedup = 1.74/1.24 = 1.4

Amdahl’s Law Suppose I make my add instructions twice as fast. Suppose 20% of my program is doing adds Speedup? What if I make the adds infinitely fast? New Exectime = old_exectime(4/5 + (1/5)/2) = 9/10 * old_exectime. Speedup = 10/9 Infinitely fast  speedup = 5/4 = 20% improvement!

Amdahl’s Law Suppose I make my add instructions twice as fast. Suppose 20% of my program is doing adds Speedup? New Exectime = old_exectime(4/5 + (1/5)/2) = 9/10 * old_exectime Speedup = 10/9 What if I make the adds infinitely fast? Speedup = 5/4, only 20% improvement! New Exectime = old_exectime(4/5 + (1/5)/2) = 9/10 * old_exectime. Speedup = 10/9 Infinitely fast  speedup = 5/4 = 20% improvement!