EECS 322: Computer Architecture

EECS 322: Computer Architecture
Single-cycle
Multi-cycle FSM controller
Multi-cycle microcontroller
CWRU EECS 322 March 6, 2000

MIPS instruction formats
Arithmetic: add $rd,$rs,$rt
  fields: op | rs | rt | rd | ... | funct
Data Transfer: lw $rd,offset($rs) and sw $rd,offset($rs)
  fields: op | rs | rt | Address
Conditional branch: beq $rd,$rs,raddr
  fields: op | rs | rt | Address (PC-relative addressing: PC + Address)
Unconditional jump: j addr
  fields: op | Address (pseudodirect addressing)
[Figure: MIPS addressing modes, including immediate, PC-relative, and pseudodirect addressing]

Single Cycle Implementation: Single Cycle = 2 adders + 1 ALU
Calculate instruction cycle time assuming negligible delays except: memory (2ns), ALU and adders (2ns), register file access (1ns)
- Adder1: PC <- PC + 4
- Adder2: PC <- PC + signext(IR[15-0]) << 2
- Adder3: Arithmetic ALU

Single/Multi-Clock Comparison
add = 6ns = Fetch(2ns)+RegR(1ns)+ALU(2ns)+RegW(1ns)
lw  = 8ns = Fetch(2ns)+RegR(1ns)+ALU(2ns)+MemR(2ns)+RegW(1ns)
sw  = 7ns = Fetch(2ns)+RegR(1ns)+ALU(2ns)+MemW(2ns)
beq = 5ns = Fetch(2ns)+RegR(1ns)+ALU(2ns)
j   = 2ns = Fetch(2ns)
Architecturally improved performance without speeding up the clock!
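
The per-instruction totals can be re-derived by summing component delays. The sketch below assumes the delays given on the single-cycle slide (memory 2ns, ALU 2ns, register file access 1ns for both reads and writes); the names DELAY_NS, PATHS, and latency_ns are illustrative, not from the slides.

```python
# Component delays in nanoseconds (register file access assumed 1ns for read and write).
DELAY_NS = {"Fetch": 2, "RegR": 1, "ALU": 2, "MemR": 2, "MemW": 2, "RegW": 1}

# Functional units each instruction class uses, in order.
PATHS = {
    "add": ["Fetch", "RegR", "ALU", "RegW"],
    "lw":  ["Fetch", "RegR", "ALU", "MemR", "RegW"],
    "sw":  ["Fetch", "RegR", "ALU", "MemW"],
    "beq": ["Fetch", "RegR", "ALU"],
    "j":   ["Fetch"],
}

def latency_ns(instr):
    """Sum the delays of every unit on the instruction's path."""
    return sum(DELAY_NS[unit] for unit in PATHS[instr])

latencies = {instr: latency_ns(instr) for instr in PATHS}

# A single-cycle machine must stretch its clock to fit the slowest instruction.
single_cycle_clock_ns = max(latencies.values())

print(latencies)
print(single_cycle_clock_ns)
```

Under these assumptions the slowest instruction (lw) sets the single-cycle clock period, so every other instruction wastes the difference.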

Some Design Trade-offs
High level design techniques
- Algorithms: change instruction usage; minimize Σ n_instruction * t_instruction
- Architecture: Datapath, FSM, Microprogramming
- adders: ripple versus carry lookahead
- multiplier types, ...
Lower level design techniques (closer to physical design)
- clocking: single versus multi clock
- technology: layout tools: better place and route
- process technology: 0.5 micron to 0.18 micron

Single-cycle problems
- what if we had a more complicated instruction like floating point? (fadd = 30ns, fmul = 100ns)
- wasteful of area (2 adders + 1 ALU)
One Solution:
- use a "smaller" cycle time (if the technology can do it)
- have different instructions take different numbers of cycles
- a "multicycle" datapath (1 ALU)
Multi-cycle approach: we will be reusing functional units
- ALU used to increment PC (Adder1) and to compute address (Adder2)
- Memory used for instruction and data

Reality Check: Intel 8086 clock cycles
Category             Cycles    Instruction
Arithmetic           3         add reg16, reg16
                     118-133   mul dx:ax, reg16 (very slow!!)
                     128-154   imul dx:ax, reg16
                     114-162   div dx:ax, reg16
                     165-184   idiv dx:ax, reg16
Data Transfer        14        mov reg16, mem16
                     15        mov mem16, reg16
Conditional Branch   4/16      je displacement8
Unconditional Jump   15        jmp segment:offset16

Multi-cycle Datapath: Multi-cycle = 1 ALU + Controller

Multi-cycle Datapath: with controller

Multi-cycle: 5 execution steps
T1 (a,lw,sw,beq,j) Instruction Fetch
T2 (a,lw,sw,beq,j) Instruction Decode and Register Fetch
T3 (a,lw,sw,beq,j) Execution, Memory Address Calculation, or Branch Completion
T4 (a,lw,sw) Memory Access or R-type instruction completion
T5 (lw) Write-back step
INSTRUCTIONS TAKE FROM 3 - 5 CYCLES!
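
To see what "3 to 5 cycles" means for average performance, the sketch below computes an average CPI. It takes R-type as 4 cycles (the T4 step is labelled R-type instruction completion), lw as 5, sw as 4, and beq and j as 3. The instruction mix is a hypothetical example chosen only for illustration; it does not come from the slides.

```python
# Cycles per instruction class in the multi-cycle design.
CYCLES = {"R-type": 4, "lw": 5, "sw": 4, "beq": 3, "j": 3}

# Hypothetical instruction mix (fractions sum to 1.0); an assumption for illustration.
MIX = {"R-type": 0.49, "lw": 0.22, "sw": 0.11, "beq": 0.16, "j": 0.02}

def average_cpi(cycles, mix):
    """Weighted average of cycle counts over the instruction mix."""
    return sum(cycles[name] * fraction for name, fraction in mix.items())

cpi = average_cpi(CYCLES, MIX)
print(round(cpi, 2))
```

The average lands below the 5-cycle worst case because only loads need all five steps.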

Multi-cycle Approach
All operations in each clock cycle Ti are done in parallel, not sequentially!
For example, in T1, IR = Memory[PC] and PC = PC+4 are done simultaneously!
Between clocks T2 and T3 the microcode sequencer will do a dispatch 1.

Multi-cycle using Microprogramming
Finite State Machine (hardwired control) versus Microcode controller (firmware)
[Figure: two block diagrams. Finite state machine: combinational control logic, state register, next state, inputs from instruction register opcode field, datapath control outputs. Microcode controller: microcode storage, microprogram counter, adder, address select logic, sequencing control, inputs from instruction register opcode field, datapath control outputs.]
Requires microcode memory to be faster than main memory

Microcode: Trade-offs
Distinction between specification and implementation is sometimes blurred
Specification Advantages:
- Easy to design and write (maintenance)
- Design architecture and microcode in parallel
Implementation (off-chip ROM) Advantages:
- Easy to change since values are in memory
- Can emulate other architectures
- Can make use of internal registers
Implementation Disadvantages, SLOWER now that:
- Control is implemented on same chip as processor
- ROM is no longer faster than RAM
- No need to go back and make changes

Microinstruction format

Microinstruction format: Maximally vs. Minimally Encoded
No encoding:
- 1 bit for each datapath operation
- faster, requires more memory (logic)
- used for Vax 780: an astonishing 400K of memory!
Lots of encoding:
- send the microinstructions through logic to get control signals
- uses less memory, slower
Historical context of CISC:
- Too much logic to put on a single chip with everything else
- Use a ROM (or even RAM) to hold the microcode
- It's easy to add new instructions
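
The encoding trade-off can be sketched for a single microinstruction field. The example below assumes a hypothetical SRC2 field with four choices: with no encoding it stores one bit per choice, while the encoded form stores a 2-bit index that must pass through decode logic before it can drive the datapath. The choice list and all function names are illustrative assumptions.

```python
# Hypothetical choices for one microinstruction field (ALU second source).
SRC2_CHOICES = ["B", "4", "Extend", "ExtSh"]

def one_hot(choice):
    """No encoding: one stored bit per datapath operation, usable directly."""
    return [1 if c == choice else 0 for c in SRC2_CHOICES]

def encode(choice):
    """Lots of encoding: store only the index of the choice."""
    return SRC2_CHOICES.index(choice)

def decode(code):
    """The extra logic an encoded field needs to recover the control lines."""
    return one_hot(SRC2_CHOICES[code])

unencoded_bits = len(SRC2_CHOICES)                    # bits stored per microinstruction
encoded_bits = (len(SRC2_CHOICES) - 1).bit_length()   # bits stored when encoded

print(unencoded_bits, encoded_bits)
print(decode(encode("ExtSh")))
```

The encoded field halves the storage here, at the cost of a decode step on every microinstruction.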

Microprogramming: program

Microprogramming: program overview
T1  Fetch
T2  Fetch+1
T3  (Dispatch 1) Rformat1 | BEQ1 | JUMP1 | Mem1
T4  Rformat1+1 | (Dispatch 2) LW2 | SW2
T5  LW2+1
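
The sequencing in this overview can be sketched as a small table-driven walk. The Seq column values (Seq, D#1, D#2, Fetch) follow the per-step slides; the Python names MICROPROGRAM, DISPATCH, and trace are illustrative assumptions, and the sketch models only sequencing, not the datapath.

```python
# Each entry: (label, sequencing action).
# "Seq" = fall through to the next entry, "Fetch" = instruction done,
# "D#1"/"D#2" = pick the next label from a dispatch table.
MICROPROGRAM = [
    ("Fetch", "Seq"),
    ("Fetch+1", "D#1"),
    ("Mem1", "D#2"),
    ("LW2", "Seq"),
    ("LW2+1", "Fetch"),
    ("SW2", "Fetch"),
    ("Rformat1", "Seq"),
    ("Rformat1+1", "Fetch"),
    ("BEQ1", "Fetch"),
    ("JUMP1", "Fetch"),
]

DISPATCH = {
    "D#1": {"R-type": "Rformat1", "lw": "Mem1", "sw": "Mem1",
            "beq": "BEQ1", "j": "JUMP1"},
    "D#2": {"lw": "LW2", "sw": "SW2"},
}

def trace(instr):
    """Return the microinstruction labels visited while executing one instruction."""
    labels = [label for label, _ in MICROPROGRAM]
    index = 0
    visited = []
    while True:
        label, action = MICROPROGRAM[index]
        visited.append(label)
        if action == "Fetch":        # back to Fetch: this instruction is complete
            return visited
        if action == "Seq":          # next microinstruction in order
            index += 1
        else:                        # dispatch on the instruction class
            index = labels.index(DISPATCH[action][instr])

for name in ["R-type", "lw", "sw", "beq", "j"]:
    print(name, trace(name))
```

The length of each trace equals the number of clock cycles that instruction class takes.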

Microprogram stepping: T1 Fetch
(Done in parallel) IR <- MEMORY[PC] & PC <- PC + 4
Label    ALU   SRC1  SRC2    RCntl  Memory    PCwrite   Seq
Fetch    add   pc    4       -      ReadPC    ALU       Seq

T2 Fetch + 1
A <- Reg[IR[25-21]] & B <- Reg[IR[20-16]] & ALUOut <- PC + signext(IR[15-0]) << 2
Label    ALU   SRC1  SRC2    RCntl  Memory    PCwrite   Seq
-        add   pc    ExtSh   Read   -         -         D#1
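
The address arithmetic in this step can be sketched as follows. This is a minimal illustration that assumes 32-bit wraparound and that PC has already been incremented by 4 in T1; the function names are not from the slides.

```python
def sign_extend16(value):
    """Interpret the low 16 bits of value as a signed two's-complement number."""
    value &= 0xFFFF
    return value - 0x10000 if value & 0x8000 else value

def branch_target(pc, ir):
    """ALUOut <- PC + (signext(IR[15-0]) << 2), truncated to 32 bits."""
    offset = sign_extend16(ir) << 2      # word offset converted to a byte offset
    return (pc + offset) & 0xFFFFFFFF

# An offset field of 3 words moves forward by 12 bytes.
print(hex(branch_target(0x00400004, 0x10000003)))
# An offset field of 0xFFFF is -1 word, so the target moves back 4 bytes.
print(hex(branch_target(0x00400008, 0x1000FFFF)))
```

Computing the target speculatively in T2 costs nothing extra because the ALU is otherwise idle during decode.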

T3 Dispatch 1: Mem1
ALUOut <- A + sign_extend(IR[15-0])
Label    ALU   SRC1  SRC2    RCntl  Memory    PCwrite   Seq
Mem1     add   A     Extend  -      -         -         D#2

T4 Dispatch 2: LW2
MDR <- Memory[ALUOut]
Label    ALU   SRC1  SRC2    RCntl  Memory    PCwrite   Seq
LW2      -     -     -       -      ReadALU   -         Seq

T5 LW2+1
Reg[ IR[20-16] ] <- MDR
Label    ALU   SRC1  SRC2    RCntl  Memory    PCwrite   Seq
-        -     -     -       WMDR   -         -         Fetch
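
Putting the five lw steps together, the register transfers can be traced in a few lines. This is a sketch under simplifying assumptions: memory is a Python dict keyed by byte address, registers are a 32-entry list, and the function name run_lw is hypothetical.

```python
def run_lw(pc, regs, memory):
    """Apply the T1..T5 register transfers for one lw; return the updated PC."""
    # T1 Fetch: IR <- Memory[PC], PC <- PC + 4
    ir = memory[pc]
    pc = pc + 4
    # T2 Fetch+1: A <- Reg[IR[25-21]] (B and the branch target are unused by lw)
    rs = (ir >> 21) & 0x1F
    rt = (ir >> 16) & 0x1F
    a = regs[rs]
    # T3 Mem1: ALUOut <- A + sign_extend(IR[15-0])
    imm = ir & 0xFFFF
    if imm & 0x8000:
        imm -= 0x10000
    alu_out = a + imm
    # T4 LW2: MDR <- Memory[ALUOut]
    mdr = memory[alu_out]
    # T5 LW2+1: Reg[IR[20-16]] <- MDR
    regs[rt] = mdr
    return pc

# lw $10, 8($11): opcode 35, rs = 11, rt = 10, offset = 8
instruction = (35 << 26) | (11 << 21) | (10 << 16) | 8
registers = [0] * 32
registers[11] = 100
mem = {0: instruction, 108: 42}
next_pc = run_lw(0, registers, mem)
print(next_pc, registers[10])
```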

T4 Dispatch 2: SW2
Memory[ ALUOut ] <- B
Label    ALU   SRC1  SRC2    RCntl  Memory    PCwrite   Seq
SW2      -     -     -       -      WriteALU  -         Fetch

T3 Dispatch 1: Rformat1
ALUOut <- A op(IR[31-26]) B
Label    ALU   SRC1  SRC2    RCntl  Memory    PCwrite   Seq
Rf...1   op    A     B       -      -         -         Seq

T4 Dispatch 1: Rformat1+1
Reg[ IR[15-11] ] <- ALUOut
Label    ALU   SRC1  SRC2    RCntl  Memory    PCwrite   Seq
-        -     -     -       WALU   -         -         Fetch

T3 Dispatch 1: BEQ1
If (A - B == 0) { PC <- ALUOut; }
ALUOut = Address computed in T2!
Label    ALU   SRC1  SRC2    RCntl  Memory    PCwrite   Seq
BEQ1     subt  A     B       -      -         ALUOut-0  Fetch

T3 Dispatch 1: Jump1
PC <- PC[31-28] || IR[25-0]<<2
Label    ALU   SRC1  SRC2    RCntl  Memory    PCwrite   Seq
Jump1    -     -     -       -      -         Jaddr     Fetch
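
The pseudodirect jump address is a bit concatenation, which can be sketched with masks and shifts. The function name jump_target is illustrative; the example instruction words are made up for the demonstration.

```python
def jump_target(pc, ir):
    """PC <- PC[31-28] || (IR[25-0] << 2): keep the top 4 PC bits, replace the rest."""
    upper = pc & 0xF0000000              # PC[31-28]
    lower = (ir & 0x03FFFFFF) << 2       # 26-bit field becomes a 28-bit byte address
    return upper | lower

print(hex(jump_target(0x00400004, 0x08100000)))
print(hex(jump_target(0x90000004, 0x0BFFFFFF)))
```

Because the top four bits come from the PC, a jump can only reach addresses within the current 256 MB region.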

The Big Picture