1
EECS 322: Computer Architecture
Single-cycle
Multi-cycle FSM controller
Multi-cycle microcontroller
CWRU EECS 322 March 6, 2000
2
MIPS instruction formats
MIPS addressing modes (figure):
1. Immediate addressing: op | rs | rt | Immediate
2. Register addressing: op | rs | rt | rd | ... | funct
   Arithmetic: add $rd,$rs,$rt
3. Base addressing: op | rs | rt | Address, added to a register to address memory (byte, halfword, word)
   Data Transfer: lw $rt,offset($rs) and sw $rt,offset($rs)
4. PC-relative addressing: op | rs | rt | Address, added to the PC
   Conditional branch: beq $rs,$rt,raddr
5. Pseudodirect addressing: op | Address, concatenated with the PC
   Unconditional jump: j addr
3
Single Cycle = 2 adders + 1 ALU
Single Cycle Implementation
Calculate the instruction cycle time assuming negligible delays except: memory (2 ns), ALU and adders (2 ns), register file access (1 ns).
Adder1: PC ← PC + 4
Adder2: PC ← PC + (signext(IR[15-0]) << 2)
Adder3: arithmetic ALU
4
Architecture improves performance without speeding up the clock!
Single/Multi-Clock Comparison
add = 6 ns = Fetch(2 ns) + RegR(1 ns) + ALU(2 ns) + RegW(1 ns)
lw  = 8 ns = Fetch(2 ns) + RegR(1 ns) + ALU(2 ns) + MemR(2 ns) + RegW(1 ns)
sw  = 7 ns = Fetch(2 ns) + RegR(1 ns) + ALU(2 ns) + MemW(2 ns)
beq = 5 ns = Fetch(2 ns) + RegR(1 ns) + ALU(2 ns)
j   = 2 ns = Fetch(2 ns)
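The totals above can be re-derived mechanically. The following is a minimal sketch, assuming the unit delays from the single-cycle slide (memory 2 ns, ALU 2 ns, register file 1 ns for both reads and writes); the dictionary and function names are made up for illustration.

```python
# Unit delays in ns, as stated on the single-cycle slide.
# Register file access is taken as 1 ns for both read (RegR) and write (RegW).
DELAY = {"Fetch": 2, "RegR": 1, "ALU": 2, "MemR": 2, "MemW": 2, "RegW": 1}

# Functional units each instruction passes through, in order.
PATH = {
    "add": ["Fetch", "RegR", "ALU", "RegW"],
    "lw":  ["Fetch", "RegR", "ALU", "MemR", "RegW"],
    "sw":  ["Fetch", "RegR", "ALU", "MemW"],
    "beq": ["Fetch", "RegR", "ALU"],
    "j":   ["Fetch"],
}

def latency(instr):
    """Sum the delays of every unit on the instruction's path."""
    return sum(DELAY[unit] for unit in PATH[instr])

latencies = {instr: latency(instr) for instr in PATH}

# A single-cycle machine must stretch its one clock to the slowest instruction.
single_cycle_clock = max(latencies.values())

print(latencies)
print("single-cycle clock period (ns):", single_cycle_clock)
```

With one clock for everything, the period is set by lw, so a jump that needs only the fetch still occupies a full lw-length cycle.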
5
Some Design Trade-offs
High-level design techniques
  Algorithms: change instruction usage; minimize n_instruction * t_instruction
  Architecture: datapath, FSM, microprogramming
    adders: ripple versus carry lookahead
    multiplier types, …
Lower-level design techniques (closer to physical design)
  clocking: single versus multi clock
  technology:
    layout tools: better place and route
    process technology: 0.5 micron to 0.18 micron
6
Single-cycle problems
What if we had a more complicated instruction like floating point? (fadd = 30 ns, fmul = 100 ns)
Wasteful of area (2 adders + 1 ALU)
One solution:
  use a “smaller” cycle time (if the technology can do it)
  have different instructions take different numbers of cycles
  a “multicycle” datapath (1 ALU)
Multi-cycle approach
  We will be reusing functional units:
    ALU used to increment PC (Adder1) and to compute address (Adder2)
    Memory used for instruction and data
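To make the floating-point problem concrete, here is a small sketch that adds the fadd and fmul times from this slide to the integer latencies from the single/multi-clock comparison (6, 8, 7, 5, 2 ns) and measures how much of a single long cycle each instruction would leave idle. The variable names are illustrative only.

```python
# Integer latencies (ns) from the single/multi-clock comparison slide.
integer_ns = {"add": 6, "lw": 8, "sw": 7, "beq": 5, "j": 2}
# Floating-point latencies (ns) quoted on this slide.
float_ns = {"fadd": 30, "fmul": 100}

all_ns = {**integer_ns, **float_ns}

# In a single-cycle design the clock must cover the slowest instruction.
clock_ns = max(all_ns.values())

# Idle time per instruction: the part of the cycle in which nothing useful happens.
idle_ns = {name: clock_ns - t for name, t in all_ns.items()}

print("clock period (ns):", clock_ns)
for name, idle in idle_ns.items():
    print(f"{name}: {idle} ns idle per cycle")
```

Under these numbers every integer instruction would spend more than 90% of the cycle waiting, which is the motivation for letting different instructions take different numbers of shorter cycles.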
7
Reality Check: Intel 8086 clock cycles
Arithmetic
  add reg16, reg
  mul dx:ax, reg16 (very slow!!)
  imul dx:ax, reg
  div dx:ax, reg
  idiv dx:ax, reg16
Data Transfer
  mov reg16, mem
  mov mem16, reg16
Conditional Branch
  /16 je displacement8
Unconditional Jump
  jmp segment:offset16
8
Multi-cycle = 1 ALU + Controller
Multi-cycle Datapath
9
Multi-cycle Datapath: with controller
10
Multi-cycle: 5 execution steps
T1 (a,lw,sw,beq,j) Instruction Fetch
T2 (a,lw,sw,beq,j) Instruction Decode and Register Fetch
T3 (a,lw,sw,beq,j) Execution, Memory Address Calculation, or Branch Completion
T4 (a,lw,sw) Memory Access or R-type instruction completion
T5 (lw) Write-back step
INSTRUCTIONS TAKE FROM 3 TO 5 CYCLES!
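The cycle count per instruction class follows directly from which steps each class takes part in. The sketch below assumes that arithmetic ("a") instructions finish in T4 and only lw needs T5, which is what the Rformat1+1 and LW2+1 microinstructions later in the deck indicate; the table and function names are hypothetical.

```python
# Which instruction classes are active in each step ("a" = arithmetic / R-type).
# Assumption: R-type completes in T4, so only lw uses T5.
STEP_USERS = [
    ("T1", {"a", "lw", "sw", "beq", "j"}),  # instruction fetch
    ("T2", {"a", "lw", "sw", "beq", "j"}),  # decode and register fetch
    ("T3", {"a", "lw", "sw", "beq", "j"}),  # execute / address / branch completion
    ("T4", {"a", "lw", "sw"}),              # memory access or R-type completion
    ("T5", {"lw"}),                         # write-back
]

def cycles(instr_class):
    """Count the steps in which the instruction class takes part."""
    return sum(1 for _, users in STEP_USERS if instr_class in users)

cycle_counts = {c: cycles(c) for c in ["a", "lw", "sw", "beq", "j"]}
print(cycle_counts)
```

Branches and jumps leave the datapath after three cycles, while a load is the only instruction that needs all five.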
11
Multi-cycle Approach
All operations in each clock cycle Ti are done in parallel, not sequentially! For example, in T1, IR ← Memory[PC] and PC ← PC + 4 are done simultaneously.
T1 T2 T3 T4 T5
Between clocks T2 and T3 the microcode sequencer performs Dispatch 1.
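In software, "done in parallel" means every right-hand side is evaluated from the old register values before any register is updated. The sketch below contrasts that with a naive sequential model; the state dictionary, the toy memory contents, and the function names are invented for illustration.

```python
def t1_parallel(state, memory):
    """Model one clock edge: both transfers read the OLD state."""
    return {"IR": memory[state["PC"]], "PC": state["PC"] + 4}

def t1_sequential_wrong(state, memory):
    """Incorrect model: PC is updated first, so the fetch sees the NEW PC."""
    pc = state["PC"] + 4
    return {"IR": memory[pc], "PC": pc}

# Toy instruction memory keyed by byte address (contents are placeholders).
memory = {0: "instr@0", 4: "instr@4"}
start = {"PC": 0, "IR": None}

after_parallel = t1_parallel(start, memory)
after_wrong = t1_sequential_wrong(start, memory)

print(after_parallel)  # fetches the instruction at the old PC
print(after_wrong)     # skips the instruction at address 0
```

The parallel version fetches from address 0 and leaves PC at 4; the sequential version would silently skip the first instruction.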
12
Multi-cycle using Microprogramming
Finite State Machine (hardwired control), figure: combinational control logic with a state register; inputs from the instruction register opcode field; outputs are the next state and the datapath control outputs.
Microcode controller (firmware), figure: microcode storage addressed by a microprogram counter; an adder and address select logic driven by the sequencing control; inputs from the instruction register opcode field; outputs are the datapath control outputs.
Requires microcode memory to be faster than main memory
13
Microcode: Trade-offs
Distinction between specification and implementation is sometimes blurred
Specification advantages:
  Easy to design and write (maintenance)
  Design architecture and microcode in parallel
Implementation (off-chip ROM) advantages:
  Easy to change since values are in memory
  Can emulate other architectures
  Can make use of internal registers
Implementation disadvantages, SLOWER now that:
  Control is implemented on same chip as processor
  ROM is no longer faster than RAM
  No need to go back and make changes
14
Microinstruction format
15
Microinstruction format: Maximally vs. Minimally Encoded
No encoding:
  1 bit for each datapath operation
  faster, requires more memory (logic)
  used for Vax 780: an astonishing 400K of memory!
Lots of encoding:
  send the microinstructions through logic to get control signals
  uses less memory, slower
Historical context of CISC:
  Too much logic to put on a single chip with everything else
  Use a ROM (or even RAM) to hold the microcode
  It’s easy to add new instructions
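The memory-versus-speed trade-off can be seen on a single field. The sketch below takes one hypothetical field with four possible values (the names in SRC2_CHOICES are illustrative, not the deck's exact field definition) and compares a one-bit-per-operation layout with an encoded layout that needs a decoder.

```python
import math

# Hypothetical choices for one microinstruction field.
SRC2_CHOICES = ["B", "4", "Extend", "ExtSh"]

def unencoded(choice):
    """No encoding: one control bit per choice, usable directly as control lines."""
    return [1 if c == choice else 0 for c in SRC2_CHOICES]

def encoded(choice):
    """Lots of encoding: store only the index of the choice."""
    return SRC2_CHOICES.index(choice)

def decode(index):
    """Extra logic the encoded field must pass through to get control lines."""
    return [1 if i == index else 0 for i in range(len(SRC2_CHOICES))]

unencoded_width = len(SRC2_CHOICES)                      # bits stored per field
encoded_width = math.ceil(math.log2(len(SRC2_CHOICES)))  # bits stored per field

print(unencoded("4"), "stored in", unencoded_width, "bits")
print(decode(encoded("4")), "stored in", encoded_width, "bits plus a decoder")
```

Both layouts produce the same control lines; the encoded one halves the storage for this field at the cost of a decode step on every microinstruction.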
16
Microprogramming: program
17
Microprogramming: program overview
T1: Fetch
T2: Fetch+1, then Dispatch 1
T3: Rformat1, BEQ1, JUMP1, or Mem1 (Mem1 is followed by Dispatch 2)
T4: Rformat1+1, LW2, or SW2
T5: LW2+1
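The overview can be read as a small sequencer: each microinstruction either falls through to the next one (Seq), returns to Fetch, or jumps through a dispatch table. The following sketch traces the labels visited for each instruction class, using the Seq, Fetch, D#1 and D#2 values from the microinstruction slides in this deck; the instruction-class keys and table names are hypothetical.

```python
# (label, sequencing field) in microprogram order.
MICROPROGRAM = [
    ("Fetch", "Seq"), ("Fetch+1", "D#1"),
    ("Mem1", "D#2"), ("LW2", "Seq"), ("LW2+1", "Fetch"), ("SW2", "Fetch"),
    ("Rformat1", "Seq"), ("Rformat1+1", "Fetch"),
    ("BEQ1", "Fetch"), ("JUMP1", "Fetch"),
]
INDEX = {label: i for i, (label, _) in enumerate(MICROPROGRAM)}

# Dispatch tables keyed by a hypothetical instruction-class name.
DISPATCH1 = {"rformat": "Rformat1", "beq": "BEQ1", "j": "JUMP1",
             "lw": "Mem1", "sw": "Mem1"}
DISPATCH2 = {"lw": "LW2", "sw": "SW2"}

def trace(instr_class):
    """Return the labels executed for one instruction, one per clock."""
    upc, visited = 0, []
    while True:
        label, seq = MICROPROGRAM[upc]
        visited.append(label)
        if seq == "Fetch":        # instruction finished; next clock refetches
            return visited
        if seq == "Seq":          # fall through to the next microinstruction
            upc += 1
        elif seq == "D#1":        # dispatch on opcode after decode
            upc = INDEX[DISPATCH1[instr_class]]
        else:                     # "D#2": split loads from stores
            upc = INDEX[DISPATCH2[instr_class]]

for c in ["rformat", "lw", "sw", "beq", "j"]:
    print(c, trace(c))
```

The length of each trace is the instruction's cycle count, so the same table that drives control also documents timing.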
18
Microprogram stepping: T1 Fetch
(Done in parallel) IR ← MEMORY[PC] & PC ← PC + 4
Label | ALU | SRC1 | SRC2 | RCntl | Memory | PCwrite | Seq
Fetch | add | pc | 4 | | ReadPC | ALU | Seq
19
T2 Fetch + 1
A ← Reg[IR[25-21]] & B ← Reg[IR[20-16]] & ALUOut ← PC + (signext(IR[15-0]) << 2)
Label | ALU | SRC1 | SRC2 | RCntl | Memory | PCwrite | Seq
 | add | pc | ExtSh | Read | | | D#1
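The ALUOut transfer in T2 is the branch-target calculation: sign-extend the low 16 bits of IR, shift left by two, and add to PC (which already holds PC + 4 after T1). A minimal sketch with hypothetical helper names and made-up instruction words:

```python
def sign_extend16(value):
    """Interpret the low 16 bits of value as a two's-complement number."""
    value &= 0xFFFF
    return value - 0x10000 if value & 0x8000 else value

def branch_target(pc, ir):
    """ALUOut <- PC + (signext(IR[15-0]) << 2), wrapped to 32 bits."""
    offset_words = sign_extend16(ir)
    return (pc + (offset_words << 2)) & 0xFFFFFFFF

# Forward branch by 3 words and backward branch by 1 word (example values).
forward = branch_target(0x00000100, 0x10000003)
backward = branch_target(0x00400004, 0x1000FFFF)

print(hex(forward), hex(backward))
```

The target is computed for every instruction before the opcode is known; only BEQ1 later decides whether to use it.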
20
T3 Dispatch 1: Mem1
ALUOut ← A + sign_extend(IR[15-0])
Label | ALU | SRC1 | SRC2 | RCntl | Memory | PCwrite | Seq
Mem1 | add | A | Extend | | | | D#2
21
T4 Dispatch 2: LW2
MDR ← Memory[ALUOut]
Label | ALU | SRC1 | SRC2 | RCntl | Memory | PCwrite | Seq
LW2 | | | | | ReadALU | | Seq
22
T5 LW2+1
Reg[IR[20-16]] ← MDR
Label | ALU | SRC1 | SRC2 | RCntl | Memory | PCwrite | Seq
 | | | | WMDR | | | Fetch
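Steps T3 to T5 of a load can be followed on a toy machine state. The sketch below is only an illustration of the three register transfers (Mem1, LW2, LW2+1); the register-file and memory dictionaries, the function name, and the example values are assumptions, and the instruction word is assembled from the standard lw field layout (opcode 0x23).

```python
def sign_extend16(value):
    """Interpret the low 16 bits of value as a two's-complement number."""
    value &= 0xFFFF
    return value - 0x10000 if value & 0x8000 else value

def lw_t3_to_t5(reg, memory, ir):
    """Apply Mem1, LW2 and LW2+1 and return the updated register file."""
    rs = (ir >> 21) & 0x1F          # IR[25-21]
    rt = (ir >> 16) & 0x1F          # IR[20-16]
    a = reg[rs]                     # latched in T2: A <- Reg[IR[25-21]]
    alu_out = a + sign_extend16(ir) # T3 Mem1:  ALUOut <- A + sign_extend(IR[15-0])
    mdr = memory[alu_out]           # T4 LW2:   MDR <- Memory[ALUOut]
    updated = dict(reg)
    updated[rt] = mdr               # T5 LW2+1: Reg[IR[20-16]] <- MDR
    return updated

# lw $8, 8($29): opcode 0x23, rs = 29, rt = 8, offset = 8.
ir = (0x23 << 26) | (29 << 21) | (8 << 16) | 8
reg = {29: 0x1000, 8: 0}
memory = {0x1008: 42}

result = lw_t3_to_t5(reg, memory, ir)
print(result)
```

Each assignment inside the function corresponds to one clock, which is why the load is the longest instruction in the multi-cycle design.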
23
T4 Dispatch 2: SW2
Memory[ALUOut] ← B
Label | ALU | SRC1 | SRC2 | RCntl | Memory | PCwrite | Seq
SW2 | | | | | WriteALU | | Fetch
24
T3 Dispatch 1: Rformat1
ALUOut ← A op(IR[31-26]) B
Label | ALU | SRC1 | SRC2 | RCntl | Memory | PCwrite | Seq
Rf...1 | op | A | B | | | | Seq
25
T4 Dispatch 1: Rformat1+1
Reg[IR[15-11]] ← ALUOut
Label | ALU | SRC1 | SRC2 | RCntl | Memory | PCwrite | Seq
 | | | | WALU | | | Fetch
26
T3 Dispatch 1: BEQ1
If (A - B == 0) { PC ← ALUOut; }
ALUOut = address computed in T2!
Label | ALU | SRC1 | SRC2 | RCntl | Memory | PCwrite | Seq
BEQ1 | subt | A | B | | | ALUOut-0 | Fetch
27
T3 Dispatch 1: Jump1
PC ← PC[31-28] || (IR[25-0] << 2)
Label | ALU | SRC1 | SRC2 | RCntl | Memory | PCwrite | Seq
Jump | | | | | | Jaddr | Fetch
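The jump target is built by concatenation rather than addition: the top four bits of PC are kept and the 26-bit field from IR, shifted left by two, fills the remaining 28 bits. A minimal sketch, with a hypothetical function name and made-up example words:

```python
def jump_target(pc, ir):
    """PC <- PC[31-28] || (IR[25-0] << 2)."""
    upper = pc & 0xF0000000             # PC[31-28], kept in place
    lower = (ir & 0x03FFFFFF) << 2      # IR[25-0] as a word address, 28 bits
    return upper | lower

# j with a 26-bit field of 0x0100000, executed with PC = 0x00400004.
near = jump_target(0x00400004, 0x08100000)
# Largest field value, executed from the 0x9....... region.
far = jump_target(0x90000004, 0x0BFFFFFF)

print(hex(near), hex(far))
```

Because the upper four bits come from PC, a jump can only reach addresses inside the current 256 MB region.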
28
The Big Picture