Multicycle Review Performance Examples

Single Cycle MIPS Implementation
- All instructions take the same amount of time
- Signals propagate along the longest path
- CPI = 1
- Lots of operations happen in parallel
  - Increment PC
  - ALU
  - Branch target computation
- Inefficient

Multicycle MIPS Implementation
- Instructions take different numbers of cycles
- Cycles are identical in length
- Share resources across cycles
  - E.g. one ALU for everything
  - Minimizes hardware
- Cycles are independent across instructions
  - R-type and memory-reference instructions do different things in their 4th cycles
- CPI is 3, 4, or 5 depending on the instruction

Multicycle versions of various instructions
R-type (add, sub, etc.): 4 cycles
  1. Read instruction
  2. Decode/read registers
  3. ALU operation
  4. ALU result stored back to destination register
Branch: 3 cycles
  1. Read instruction
  2. Get branch address (ALU); read registers for comparing
  3. ALU compares registers; if branch taken, update PC

Multicycle versions of various instructions
Load: 5 cycles
  1. Read instruction
  2. Decode/read registers
  3. ALU adds immediate to register to form address
  4. Address passed to memory; data is read into MDR
  5. Data in MDR is stored into destination register
Store: 4 cycles
  1. Read instruction
  2. Decode/read registers
  3. ALU adds immediate to a register to form address
  4. Save data from the other source register into memory at address from cycle 3
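The per-instruction step lists can be written down as a small lookup table, which makes the cycle counts easy to check. This is only a sketch: the names `STEPS` and `cycles` are made up for illustration, and the step descriptions are paraphrased from the slides.

```python
# Hypothetical table: one entry per clock cycle for each instruction class,
# paraphrased from the step lists on the slides.
STEPS = {
    "R-type": [
        "read instruction",
        "decode / read registers",
        "ALU operation",
        "store ALU result in destination register",
    ],
    "branch": [
        "read instruction",
        "compute branch address (ALU) / read registers",
        "compare registers (ALU); update PC if taken",
    ],
    "load": [
        "read instruction",
        "decode / read registers",
        "ALU adds immediate to register to form address",
        "read memory at address into MDR",
        "store MDR into destination register",
    ],
    "store": [
        "read instruction",
        "decode / read registers",
        "ALU adds immediate to register to form address",
        "write other source register to memory at address",
    ],
}


def cycles(kind):
    """Number of clock cycles an instruction class needs (one per step)."""
    return len(STEPS[kind])


for kind in STEPS:
    print(f"{kind:7s} {cycles(kind)} cycles")
```

Every class starts with the same instruction read; the classes only diverge in later cycles, which is why the CPI ranges from 3 to 5.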

Control for new instructions
- Suppose we introduce lw2r:
  - lw2r $1, $2, $3: compute address as $2+$3, load the value at this address into $1
  - In other words: lw $1, 0($2+$3)
  - R-type instruction
- How does the state diagram change?

Control for new instructions
- Suppose we introduce lw2r:
  - lw2r $1, $2, $3: compute address as $2+$3, load the value at this address into $1
  - In other words: lw $1, 0($2+$3)
  - R-type instruction
- How does the state diagram change?
  - New states: A, B, C
  - State 1 -> (op='lw2r') State A -> State B -> State C -> back to 0
  - A controls: ALUOp=00, ALUSrcA=1, ALUSrcB=0
  - B controls: MemRead=1, IorD=1
  - C controls: RegDst=1, RegWrite=1, MemToReg=1
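The lw2r path through the state diagram can be sketched as a tiny state machine. The control values are the ones listed on the slide; the Python names (`LW2R_CONTROLS`, `next_state`, `trace`) are hypothetical, and states 0 and 1 are assumed to be the usual shared fetch and decode states. Only the lw2r path is modeled.

```python
# Control signals asserted in each new state (values taken from the slide).
LW2R_CONTROLS = {
    "A": {"ALUOp": 0b00, "ALUSrcA": 1, "ALUSrcB": 0},  # address = $2 + $3
    "B": {"MemRead": 1, "IorD": 1},                    # read memory at that address
    "C": {"RegDst": 1, "RegWrite": 1, "MemToReg": 1},  # write loaded data to rd
}

# Unconditional transitions. State 1 (decode) is handled separately because
# its successor depends on the opcode.
NEXT = {0: 1, "A": "B", "B": "C", "C": 0}


def next_state(state, op):
    """Return the successor state; only the lw2r branch out of state 1 is modeled."""
    if state == 1:
        if op != "lw2r":
            raise ValueError("this sketch only models lw2r")
        return "A"
    return NEXT[state]


def trace(op="lw2r"):
    """List the states visited by one instruction, starting at fetch (state 0)."""
    path = [0]
    state = next_state(0, op)
    while state != 0:
        path.append(state)
        state = next_state(state, op)
    return path


print(trace())  # [0, 1, 'A', 'B', 'C']
```

The trace visits five states, so under these assumptions lw2r takes 5 cycles, the same as an ordinary lw.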

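Performance
- CPI: cycles per instruction
  - Average CPI is based on the instruction mix
- Execution time = IC * CPI * C
  - Where IC = instruction count; C = clock cycle time
- Performance: inverse of execution time
- MIPS = million instructions per second
  - Higher is better
- Amdahl’s Law: new execution time = old execution time * ((1 - f) + f/s), where f is the fraction of execution time that is improved and s is how many times faster that fraction becomes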

Performance Examples
- Finding average CPI:
  - CPI = 0.50*... + ...*... + ...*... + ...*1
  - CPI = 1.74

Performance Examples
- CPI = 1.74
- Assume a 2 GHz P4, with a program consisting of 1,000,000,000 instructions.
- Find the execution time

Performance Examples
- CPI = 1.74, 2 GHz P4, 10^9 instructions.
- Execution time = IC * CPI * cycle time
  = 10^9 * 1.74 * 0.5 ns
  = 0.87 seconds
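The same arithmetic as a short script, as a sanity check. The function names are illustrative; the inputs (10^9 instructions, CPI 1.74, 2 GHz clock) come from the slide, and the MIPS helper simply applies the "million instructions per second" definition given earlier.

```python
def execution_time(instruction_count, cpi, clock_hz):
    """Execution time in seconds: IC * CPI * cycle time, with cycle time = 1 / clock rate."""
    cycle_time = 1.0 / clock_hz
    return instruction_count * cpi * cycle_time


def mips_rating(instruction_count, seconds):
    """Millions of instructions executed per second."""
    return instruction_count / (seconds * 1e6)


t = execution_time(1_000_000_000, 1.74, 2e9)
print(f"execution time: {t:.2f} s")  # execution time: 0.87 s
print(f"MIPS rating: {mips_rating(1_000_000_000, t):.0f}")
```

A 2 GHz clock gives a 0.5 ns cycle, which is where the 0.5 ns factor in the slide comes from.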

Performance Examples
- We improve the design and change the CPI of load/store to 1.
- Speedup assuming the same program?

Performance Examples
- We improve the design and change the CPI of load/store to 1.
- Speedup assuming the same program/cycle time?
  - CPI new = 0.5*... + ...*... + ...*... + ...*1
  - CPI new = 1.24
  - Speedup = 1.74/1.24 = 1.4
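Because the program and the cycle time are unchanged, instruction count and cycle time cancel out of the speedup and only the CPI ratio remains. A minimal sketch of that cancellation, reusing the 10^9-instruction, 0.5 ns figures from the earlier example (the function names are illustrative):

```python
def exec_time(instruction_count, cpi, cycle_time):
    """Execution time = IC * CPI * C."""
    return instruction_count * cpi * cycle_time


def speedup(old_time, new_time):
    """Speedup is old execution time divided by new execution time."""
    return old_time / new_time


IC = 10**9        # same program before and after
CYCLE = 0.5e-9    # same 0.5 ns cycle time before and after

s = speedup(exec_time(IC, 1.74, CYCLE), exec_time(IC, 1.24, CYCLE))
print(f"speedup: {s:.2f}")  # speedup: 1.40
```

Changing `IC` or `CYCLE` leaves the result unchanged, since both appear in the numerator and the denominator.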

Amdahl’s Law
- Suppose I make my add instructions twice as fast.
- Suppose 20% of my program’s execution time is spent doing adds
- Speedup?
- What if I make the adds infinitely fast?

Amdahl’s Law
- Suppose I make my add instructions twice as fast.
- Suppose 20% of my program’s execution time is spent doing adds
- Speedup?
  - New exec time = old_exectime * (4/5 + (1/5)/2) = 9/10 * old_exectime
  - Speedup = 10/9
- What if I make the adds infinitely fast?
  - Speedup = 5/4: execution time drops by only 20%!
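Both answers follow from one formula, sketched below with exact fractions so the results match the slide's 10/9 and 5/4. The function name `amdahl_speedup` and its parameters are illustrative; `fraction` is the share of the old execution time spent in the improved part and `factor` is how many times faster that part becomes.

```python
from fractions import Fraction
import math


def amdahl_speedup(fraction, factor):
    """Overall speedup when `fraction` of the old execution time runs `factor` times faster."""
    if factor == math.inf:
        # The improved part takes no time at all; only the untouched part remains.
        remaining = 1 - fraction
    else:
        remaining = (1 - fraction) + fraction / factor
    return 1 / remaining


adds = Fraction(1, 5)  # 20% of execution time is spent doing adds

print(amdahl_speedup(adds, 2))         # 10/9
print(amdahl_speedup(adds, math.inf))  # 5/4
```

However fast the adds become, the other 80% of the execution time is untouched, so the speedup can never exceed 1/0.8 = 1.25.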

Multicycle performance example
- Multicycle can have better performance than single cycle
  - Instructions take only as many cycles as they need
- CPI example
  - Cycles: loads: 5, stores: 4, R-type: 4, branches: 3
  - % of total instructions: loads: 22%, stores: 11%, R-type: 50%, branches: 17%
- Same # of instructions for single cycle and multicycle!
- CPI single = 1, but each cycle of M single is equivalent to 5 cycles of M multi
  - So effectively, CPI single = 5 for this comparison
- CPI multi = 5*.22 + 4*.11 + 4*.50 + 3*.17 = 4.05
- Speedup = 5/4.05 = 1.23
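The weighted-average CPI and the resulting speedup can be reproduced in a few lines. The cycle counts and instruction mix are the slide's; the names are illustrative, and the comparison keeps the slide's assumption that one single-cycle clock period equals five multicycle clock periods (the longest instruction).

```python
# Cycles per instruction class and instruction mix, as given on the slide.
CYCLES = {"load": 5, "store": 4, "R-type": 4, "branch": 3}
MIX = {"load": 0.22, "store": 0.11, "R-type": 0.50, "branch": 0.17}


def average_cpi(cycles, mix):
    """Average CPI: sum over classes of (cycles for the class) * (its share of instructions)."""
    if abs(sum(mix.values()) - 1.0) > 1e-9:
        raise ValueError("instruction mix must sum to 100%")
    return sum(cycles[kind] * share for kind, share in mix.items())


cpi_multi = average_cpi(CYCLES, MIX)

# Assumption from the slide: a single-cycle clock period is as long as the
# slowest instruction, i.e. 5 multicycle periods, so its effective CPI is 5.
effective_cpi_single = max(CYCLES.values())

print(f"CPI multi = {cpi_multi:.2f}")                         # CPI multi = 4.05
print(f"speedup = {effective_cpi_single / cpi_multi:.2f}")    # speedup = 1.23
```

The gain comes entirely from instructions that need fewer than 5 cycles; a mix made up only of loads would show no speedup under these assumptions.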