Designing a Multicycle Processor

Slides:



Advertisements
Similar presentations
EEM 486 EEM 486: Computer Architecture Lecture 4 Designing a Multicycle Processor.
Advertisements

ELEN 350 Multi-Cycle Datapath Adapted from the lecture notes of John Kubiatowicz (UCB) and Hank Walker (TAMU)
361 multicontroller.1 ECE 361 Computer Architecture Lecture 11: Designing a Multiple Cycle Controller.
361 datapath Computer Architecture Lecture 8: Designing a Single Cycle Datapath.
EECC550 - Shaaban #1 Lec # 5 Winter Major CPU Design Steps 1. Analyze instruction set operations using independent RTN ISA => RTN => datapath.
CS61C L26 Single Cycle CPU Datapath II (1) Garcia © UCB Lecturer PSOE Dan Garcia inst.eecs.berkeley.edu/~cs61c CS61C : Machine.
CS61C L26 CPU Design : Designing a Single-Cycle CPU II (1) Garcia, Spring 2007 © UCB 3.6 TB DVDs? Maybe!  Researchers at Harvard have found a way to use.
361 multipath..1 ECE 361 Computer Architecture Lecture 10: Designing a Multiple Cycle Processor.
CS152 / Kubiatowicz Lec10.1 3/1/99©UCB Spring 1999 Alternative datapath (book): Multiple Cycle Datapath °Miminizes Hardware: 1 memory, 1 adder Ideal Memory.
EECC550 - Shaaban #1 Lec # 5 Winter CPU Design Steps 1. Analyze instruction set operations using independent ISA => RTN => datapath requirements.
Savio Chau Single Cycle Controller Design Last Time: Discussed the Designing of a Single Cycle Datapath Control Datapath Memory Processor (CPU) Input Output.
EECC550 - Shaaban #1 Lec # 5 Winter CPU Design Steps 1. Analyze instruction set operations using independent RTN => datapath requirements.
EECC550 - Shaaban #1 Lec # 5 Winter CPU Design Steps 1. Analyze instruction set operations using independent RTN => datapath requirements.
ECE 232 L15.Miulticycle.1 Adapted from Patterson 97 ©UCBCopyright 1998 Morgan Kaufmann Publishers ECE 232 Hardware Organization and Design Lecture 15 Multi-cycle.
Microprocessor Design
Give qualifications of instructors: DAP
CS152 / Kubiatowicz Lec9.1 2/26/03©UCB Spring 2003 CS 152 Computer Architecture and Engineering Lecture 9 Designing a Multicycle Processor February 26,
ECE 232 L13. Control.1 ©UCB, DAP’ 97 ECE 232 Hardware Organization and Design Lecture 13 Control Design
CS61C L26 CPU Design : Designing a Single-Cycle CPU II (1) Garcia, Fall 2006 © UCB Lecturer SOE Dan Garcia inst.eecs.berkeley.edu/~cs61c.
EEM 486: Computer Architecture Lecture 3 Designing a Single Cycle Datapath.
CS 61C L16 Datapath (1) A Carle, Summer 2004 © UCB inst.eecs.berkeley.edu/~cs61c/su05 CS61C : Machine Structures Lecture #16 – Datapath Andy.
361 control Computer Architecture Lecture 9: Designing Single Cycle Control.
CS61C L26 CPU Design : Designing a Single-Cycle CPU II (1) Garcia, Spring 2010 © UCB inst.eecs.berkeley.edu/~cs61c UC Berkeley CS61C : Machine Structures.
EECC550 - Shaaban #1 Lec # 5 Spring CPU Design Steps 1. Analyze instruction set operations using independent RTN => datapath requirements.
ECE 232 L12.Datapath.1 Adapted from Patterson 97 ©UCBCopyright 1998 Morgan Kaufmann Publishers ECE 232 Hardware Organization and Design Lecture 12 Datapath.
EECC550 - Shaaban #1 Lec # 5 Spring CPU Design Steps 1. Analyze instruction set operations using independent RTN => datapath requirements.
CS3350B Computer Architecture Winter 2015 Lecture 5.6: Single-Cycle CPU: Datapath Control (Part 1) Marc Moreno Maza [Adapted.
CASE STUDY OF A MULTYCYCLE DATAPATH. Alternative Multiple Cycle Datapath (In Textbook) Minimizes Hardware: 1 memory, 1 ALU Ideal Memory Din Address 32.
EEM 486: Computer Architecture Designing Single Cycle Control.
EEM 486: Computer Architecture Designing a Single Cycle Datapath.
CPE 442 single-cycle datapath.1 Intro. To Computer Architecture CpE242 Computer Architecture and Engineering Designing a Single Cycle Datapath.
W.S Computer System Design Lecture 4 Wannarat Suntiamorntut.
CS3350B Computer Architecture Winter 2015 Lecture 5.7: Single-Cycle CPU: Datapath Control (Part 2) Marc Moreno Maza [Adapted.
By Wannarat Computer System Design Lecture 4 Wannarat Suntiamorntut.
Csci 136 Computer Architecture II –Single-Cycle Datapath Xiuzhen Cheng
EEM 486: Computer Architecture Lecture 3 Designing Single Cycle Control.
CS 61C: Great Ideas in Computer Architecture (Machine Structures) Single-Cycle CPU Datapath & Control Part 2 Instructors: Krste Asanovic & Vladimir Stojanovic.
Single Cycle Controller Design
CS/EE 362 multipath..1 ©DAP & SIK 1995 CS/EE 362 Hardware Fundamentals Lecture 14: Designing a Multi-Cycle Datapath (Chapter 5: Hennessy and Patterson)
Design a MIPS Processor (II)
Problem with Single Cycle Processor Design
Designing a Single-Cycle Processor
IT 251 Computer Organization and Architecture
(Chapter 5: Hennessy and Patterson) Winter Quarter 1998 Chris Myers
Designing a Multicycle Processor
Processor (I).
CS/COE0447 Computer Organization & Assembly Language
CpE242 Computer Architecture and Engineering Designing a Single Cycle Datapath Start: X:40.
CPU Organization (Design)
Single Cycle CPU Design
(Chapter 5: Hennessy and Patterson) Winter Quarter 1998 Chris Myers
CS 704 Advanced Computer Architecture
CS 704 Advanced Computer Architecture
John Kubiatowicz (http.cs.berkeley.edu/~kubitron)
Instructors: Randy H. Katz David A. Patterson
John Lazzaro ( CS152 – Computer Architecture and Engineering Lecture 8 – Multicycle Design and Microcode John.
The Single Cycle Datapath
Single Cycle datapath.
CS152 Computer Architecture and Engineering Lecture 8 Designing a Single Cycle Datapath Start: X:40.
COMS 361 Computer Organization
EECS 361 Computer Architecture Lecture 11: Designing a Multiple Cycle Controller Start X:40.
inst.eecs.berkeley.edu/~cs61c-to
CSC3050 – Computer Architecture
Prof. Giancarlo Succi, Ph.D., P.Eng.
Instructors: Randy H. Katz David A. Patterson
Alternative datapath (book): Multiple Cycle Datapath
COMS 361 Computer Organization
What You Will Learn In Next Few Sets of Lectures
Designing a Single-Cycle Processor
Processor: Datapath and Control
Presentation transcript:

Designing a Multicycle Processor

The Big Picture: Where are We Now? The Five Classic Components of a Computer Today’s Topic: Designing the Datapath for the Multiple Clock Cycle Datapath Processor Input Control Memory So where are in in the overall scheme of things. Well, we just finished designing the processor’s datapath. Now I am going to show you how to design the control for the datapath. +1 = 7 min. (X:47) Datapath Output

Abstract View of our single cycle processor Main Control op ALU control fun ALUSrc Equal ExtOp MemRd MemWr nPC_sel RegDst RegWr MemWr ALUctr Reg. Wrt Register Fetch ALU Ext Mem Access PC Next PC Instruction Fetch Result Store Data Mem looks like a FSM with PC as state

What’s wrong with our CPI=1 processor? Arithmetic & Logical PC Inst Memory Reg File ALU mux mux setup Load PC Inst Memory Reg File ALU Data Mem mux mux setup Critical Path Store PC Inst Memory Reg File ALU Data Mem mux Branch PC Inst Memory Reg File cmp mux Long Cycle Time All instructions take as much time as the slowest Real memory is not as nice as our idealized memory cannot always get the job done in one (short) cycle

Memory Access Time Physics => fast memories are small (large memories are slow) question: register file vs. memory => Use a hierarchy of memories Storage Array selected word line storage cell address bit line address decoder sense amps mem. bus proc. bus memory L2 Cache Cache Processor 1 time-period 20 - 50 time-periods 2-3 time-periods

Reducing Cycle Time Cut combinational dependency graph and insert register / latch Do same work in two fast cycles, rather than one slow one May be able to short-circuit path and remove some components for some instructions! storage element storage element Acyclic Combinational Logic Acyclic Combinational Logic (A)  storage element Acyclic Combinational Logic (B) storage element

Worst Case Timing (Load) Clk Clk-to-Q PC Old Value New Value Instruction Memoey Access Time Rs, Rt, Rd, Op, Func Old Value New Value Delay through Control Logic ALUctr Old Value New Value ExtOp Old Value New Value ALUSrc Old Value New Value This timing diagram shows the worst case timing of our single cycle datapath which occurs at the load instruction. Clock to Q time after the clock tick, PC will present its new value to the Instruction memory. After a delay of instruction access time, the instruction bus (Rs, Rt, ...) becomes valid. Then three things happens in parallel: (a) First the Control generates the control signals (Delay through Control Logic). (b) Secondly, the regiser file is access to put Rs onto busA. (c) And we have to sign extended the immediate field to get the second operand (busB). Here I asuume register file access takes longer time than doing the sign extension so we have to wait until busA valid before the ALU can start the address calculation (ALU delay). With the address ready, we access the data memory and after a delay of the Data Memory Access time, busW will be valid. And by this time, the control unit whould have set the RegWr signal to one so at the next clock tick, we will write the new data coming from memory (busW) into the register file. +3 = 77 min. (Y:57) MemtoReg Old Value New Value Register Write Occurs RegWr Old Value New Value Register File Access Time busA Old Value New Value Delay through Extender & Mux busB Old Value New Value ALU Delay Address Old Value New Value Data Memory Access Time busW 2/23/04 Old Value ©UCB Spring 2004 New

Basic Limits on Cycle Time Next address logic PC <= branch ? PC + offset : PC + 4 Instruction Fetch InstructionReg <= Mem[PC] Register Access A <= R[rs] ALU operation R <= A + B Control MemRd MemWr RegWr MemWr nPC_sel RegDst ALUSrc ALUctr ExtOp Reg. File Operand Fetch Exec Instruction Fetch Mem Access PC Next PC Result Store Data Mem

Partitioning the CPI=1 Datapath Add registers between smallest steps Place enables on all registers Equal MemRd MemWr RegWr MemWr nPC_sel RegDst ExtOp ALUSrc ALUctr Reg. File Operand Fetch Exec Instruction Fetch Mem Access PC Next PC Result Store Data Mem

Example Multicycle Datapath Ext ALU Reg. File Mem Access Data Result Store RegDst RegWr MemWr MemRd S M MemToReg Equal ALUctr ALUSrc ExtOp nPC_sel E Reg File A PC IR Next PC B Instruction Fetch Operand Fetch Critical Path ?

Recall: Step-by-step Processor Design Step 1: ISA => Logical Register Transfers Step 2: Components of the Datapath Step 3: RTL + Components => Datapath Step 4: Datapath + Logical RTs => Physical RTs Step 5: Physical RTs => Control

Step 4: R-rtype (add, sub, . . .) Logical Register Transfer Physical Register Transfers inst Logical Register Transfers ADDU R[rd] <– R[rs] + R[rt]; PC <– PC + 4 inst Physical Register Transfers IR <– MEM[pc] ADDU A<– R[rs]; B <– R[rt] S <– A + B R[rd] <– S; PC <– PC + 4 Time E Reg. File Reg File A S Exec PC IR Next PC Inst. Mem B Mem Access M Data Mem

Step 4: Logical immed Logical Register Transfer Physical Register Transfers inst Logical Register Transfers ORI R[rt] <– R[rs] OR ZExt(Im16); PC <– PC + 4 inst Physical Register Transfers IR <– MEM[pc] ORI A<– R[rs]; B <– R[rt] S <– A or ZExt(Im16) R[rt] <– S; PC <– PC + 4 Time E Reg. File Reg File A S Exec PC IR Next PC Inst. Mem B Mem Access M Data Mem

Step 4 : Load Logical Register Transfer Physical Register Transfers inst Logical Register Transfers LW R[rt] <– MEM[R[rs] + SExt(Im16)]; PC <– PC + 4 Logical Register Transfer Physical Register Transfers inst Physical Register Transfers IR <– MEM[pc] LW A<– R[rs]; B <– R[rt] S <– A + SExt(Im16) M <– MEM[S] R[rd] <– M; PC <– PC + 4 Time E Reg. File Reg File A S Exec PC IR Next PC Inst. Mem B Mem Access M Data Mem

Step 4 : Store Logical Register Transfer Physical Register Transfers inst Logical Register Transfers SW MEM[R[rs] + SExt(Im16)] <– R[rt]; PC <– PC + 4 Logical Register Transfer Physical Register Transfers inst Physical Register Transfers IR <– MEM[pc] SW A<– R[rs]; B <– R[rt] S <– A + SExt(Im16); MEM[S] <– B PC <– PC + 4 Time E Reg. File Reg File A S Exec PC IR Next PC Inst. Mem B Mem Access M Data Mem

Step 4 : Branch Logical Register Transfer Physical Register Transfers inst Logical Register Transfers BEQ if R[rs] == R[rt] then PC <= PC + 4+SExt(Im16) || 00 else PC <= PC + 4 inst Physical Register Transfers IR <– MEM[pc] BEQ E<– (R[rs] = R[rt]) if !E then PC <– PC + 4 else PC <–PC+4+SExt(Im16)||00 Time E Reg. File Reg File S A Exec PC IR Next PC Inst. Mem B Mem Access M Data Mem 2/23/04 ©UCB Spring 2004

Alternative datapath (book): Multiple Cycle Datapath Miminizes Hardware: 1 memory, 1 adder PCWr PCWrCond PCSrc BrWr Zero IorD MemWr IRWr RegDst RegWr ALUSelA 1 Target 32 32 Mux PC Mux 1 32 Zero Rs Mux 1 Ra 32 RAdr 5 32 Rt Rb busA 32 32 Ideal Memory ALU Instruction Reg 5 Reg File 32 ALU Out Putting it all together, here it is: the multiple cycle datapath we set out to built. +1 = 47 min. (Y:47) Mux 1 Rt 4 Rw 32 WrAdr 32 32 Rd 1 32 Din Dout busW busB 32 32 2 ALU Control Mux 1 3 << 2 Extend Imm 16 32 ALUOp ExtOp MemtoReg ALUSelB

Our Control Model State specifies control points for Register Transfer Transfer occurs upon exiting state (same falling edge) inputs (conditions) Next State Logic State X Register Transfer Control Points Control State Depends on Input Output Logic outputs (control points)

Step 4  Control Specification for multicycle proc “instruction fetch” IR <= MEM[PC] A <= R[rs] B <= R[rt] “decode / operand fetch” LW R-type ORi SW BEQ Execute Memory Write-back S <= A fun B S <= A or ZX S <= A + SX S <= A + SX PC <= Next(PC,Equal) M <= MEM[S] MEM[S] <= B PC <= PC + 4 R[rd] <= S PC <= PC + 4 R[rt] <= S PC <= PC + 4 R[rt] <= M PC <= PC + 4

Traditional FSM Controller next state state op cond control points Truth Table next State control points 11 Equal 6 State 4 op datapath State

Step 5  (datapath + state diagram control) Translate RTs into control points Assign states Then go build the controller

Mapping RTs to Control Points IR <= MEM[PC] “instruction fetch” imem_rd, IRen A <= R[rs] B <= R[rt] “decode” Aen, Ben, Een LW R-type ORi SW BEQ Execute Memory Write-back S <= A fun B S <= A or ZX S <= A + SX S <= A + SX PC <= Next(PC,Equal) ALUfun, Sen M <= MEM[S] MEM[S] <= B PC <= PC + 4 R[rd] <= S PC <= PC + 4 RegDst, RegWr, PCen R[rt] <= S PC <= PC + 4 R[rt] <= M PC <= PC + 4

Assigning States “instruction fetch” IR <= MEM[PC] 0000 “decode” A <= R[rs] B <= R[rt] “decode” 0001 LW R-type ORi SW BEQ Execute Memory Write-back S <= A fun B S <= A or ZX S <= A + SX S <= A + SX PC <= Next(PC) 0100 0110 1000 1011 0011 M <= MEM[S] MEM[S] <= B PC <= PC + 4 1001 1100 R[rd] <= S PC <= PC + 4 R[rt] <= S PC <= PC + 4 R[rt] <= M PC <= PC + 4 0101 0111 1010

(Mostly) Detailed Control Specification (missing0) State Op field Eq Next IR PC Ops Exec Mem Write-Back en sel A B E Ex Sr ALU S R W M M-R Wr Dst 0000 ?????? ? 0001 1 0001 BEQ x 0011 1 1 1 0001 R-type x 0100 1 1 1 0001 ORI x 0110 1 1 1 0001 LW x 1000 1 1 1 0001 SW x 1011 1 1 1 0011 xxxxxx 0 0000 1 0 x 0 x 0011 xxxxxx 1 0000 1 1 x 0 x 0100 xxxxxx x 0101 0 1 fun 1 0101 xxxxxx x 0000 1 0 0 1 1 0110 xxxxxx x 0111 0 0 or 1 0111 xxxxxx x 0000 1 0 0 1 0 1000 xxxxxx x 1001 1 0 add 1 1001 xxxxxx x 1010 1 0 1 1010 xxxxxx x 0000 1 0 1 1 0 1011 xxxxxx x 1100 1 0 add 1 1100 xxxxxx x 0000 1 0 0 1 0 -all same in Moore machine BEQ: R: ORi: LW: SW:

Performance Evaluation What is the average CPI? state diagram gives CPI for each instruction type workload gives frequency of each type Type CPIi for type Frequency CPIi x freqIi Arith/Logic 4 40% 1.6 Load 5 30% 1.5 Store 4 10% 0.4 branch 3 20% 0.6 Average CPI: 4.1

Use this structure to construct a simple “microsequencer” Controller Design The state digrams that arise define the controller for an instruction set processor are highly structured Use this structure to construct a simple “microsequencer” Control reduces to programming this very simple device  microprogramming sequencer control datapath control micro-PC microinstruction

Example: Jump-Counter i i 0000 i+1 Map ROM None of above: Do nothing (for wait states) op-code zero inc load Counter

Using a Jump Counter “instruction fetch” IR <= MEM[PC] 0000 inc A <= R[rs] B <= R[rt] “decode” 0001 load LW R-type ORi SW BEQ Execute Memory Write-back S <= A fun B S <= A or ZX S <= A + SX S <= A + SX PC <= Next(PC) 0100 0110 1000 1011 0011 inc inc inc inc zero M <= MEM[S] MEM[S] <= B PC <= PC + 4 1001 1100 inc R[rd] <= S PC <= PC + 4 R[rt] <= S PC <= PC + 4 R[rt] <= M PC <= PC + 4 zero 0101 0111 1010 zero zero zero

Our Microsequencer taken datapath control Z I L Micro-PC op-code Map ROM

Microprogram Control Specification µPC Taken Next IR PC Ops Exec Mem Write-Back en sel A B Ex Sr ALU S R W M M-R Wr Dst 0000 ? inc 1 0001 0 load 1 1 0011 0 zero 1 0 0011 1 zero 1 1 0100 x inc 0 1 fun 1 0101 x zero 1 0 0 1 1 0110 x inc 0 0 or 1 0111 x zero 1 0 0 1 0 1000 x inc 1 0 add 1 1001 x inc 1 0 1 1010 x zero 1 0 1 1 0 1011 x inc 1 0 add 1 1100 x zero 1 0 0 1 0 BEQ R: ORi: LW: SW:

“microprogrammed control” Overview of Control Control may be designed using one of several initial representations. The choice of sequence control, and how logic is represented, can then be determined independently; the control can then be implemented with one of several methods using a structured logic technique. Initial Representation Finite State Diagram Microprogram Sequencing Control Explicit Next State Microprogram counter Function + Dispatch ROMs Logic Representation Logic Equations Truth Tables Implementation PLA ROM Technique “hardwired control” “microprogrammed control”

Disadvantages of the Single Cycle Processor Summary Disadvantages of the Single Cycle Processor Long cycle time Cycle time is too long for all instructions except the Load Multiple Cycle Processor: Divide the instructions into smaller steps Execute each step (instead of the entire instruction) in one cycle Partition datapath into equal size chunks to minimize cycle time ~10 levels of logic between latches Follow same 5-step method for designing “real” processor

Control is specified by finite state digram Summary (cont’d) Control is specified by finite state digram Specialize state-diagrams easily captured by microsequencer simple increment & “branch” fields datapath control fields Control design reduces to Microprogramming Control is more complicated with: complex instruction sets restricted datapaths (see the book) Simple Instruction set and powerful datapath  simple control could try to reduce hardware (see the book) rather go for speed => many instructions at once!