Presentation is loading. Please wait.

Presentation is loading. Please wait.

Designing a Multicycle Processor

Similar presentations


Presentation on theme: "Designing a Multicycle Processor"— Presentation transcript:

1 Designing a Multicycle Processor

2 The Big Picture: Where are We Now?
The Five Classic Components of a Computer Today’s Topic: Designing the Datapath for the Multiple Clock Cycle Datapath Processor Input Control Memory So where are in in the overall scheme of things. Well, we just finished designing the processor’s datapath. Now I am going to show you how to design the control for the datapath. +1 = 7 min. (X:47) Datapath Output

3 Abstract View of our single cycle processor
Main Control op ALU control fun ALUSrc Equal ExtOp MemRd MemWr nPC_sel RegDst RegWr MemWr ALUctr Reg. Wrt Register Fetch ALU Ext Mem Access PC Next PC Instruction Fetch Result Store Data Mem looks like a FSM with PC as state

4 What’s wrong with our CPI=1 processor?
Arithmetic & Logical PC Inst Memory Reg File ALU mux mux setup Load PC Inst Memory Reg File ALU Data Mem mux mux setup Critical Path Store PC Inst Memory Reg File ALU Data Mem mux Branch PC Inst Memory Reg File cmp mux Long Cycle Time All instructions take as much time as the slowest Real memory is not as nice as our idealized memory cannot always get the job done in one (short) cycle

5 Memory Access Time Physics => fast memories are small (large memories are slow) question: register file vs. memory => Use a hierarchy of memories Storage Array selected word line storage cell address bit line address decoder sense amps mem. bus proc. bus memory L2 Cache Cache Processor 1 time-period time-periods 2-3 time-periods

6 Reducing Cycle Time Cut combinational dependency graph and insert register / latch Do same work in two fast cycles, rather than one slow one May be able to short-circuit path and remove some components for some instructions! storage element storage element Acyclic Combinational Logic Acyclic Combinational Logic (A) storage element Acyclic Combinational Logic (B) storage element

7 Worst Case Timing (Load)
Clk Clk-to-Q PC Old Value New Value Instruction Memoey Access Time Rs, Rt, Rd, Op, Func Old Value New Value Delay through Control Logic ALUctr Old Value New Value ExtOp Old Value New Value ALUSrc Old Value New Value This timing diagram shows the worst case timing of our single cycle datapath which occurs at the load instruction. Clock to Q time after the clock tick, PC will present its new value to the Instruction memory. After a delay of instruction access time, the instruction bus (Rs, Rt, ...) becomes valid. Then three things happens in parallel: (a) First the Control generates the control signals (Delay through Control Logic). (b) Secondly, the regiser file is access to put Rs onto busA. (c) And we have to sign extended the immediate field to get the second operand (busB). Here I asuume register file access takes longer time than doing the sign extension so we have to wait until busA valid before the ALU can start the address calculation (ALU delay). With the address ready, we access the data memory and after a delay of the Data Memory Access time, busW will be valid. And by this time, the control unit whould have set the RegWr signal to one so at the next clock tick, we will write the new data coming from memory (busW) into the register file. +3 = 77 min. (Y:57) MemtoReg Old Value New Value Register Write Occurs RegWr Old Value New Value Register File Access Time busA Old Value New Value Delay through Extender & Mux busB Old Value New Value ALU Delay Address Old Value New Value Data Memory Access Time busW 2/23/04 Old Value ©UCB Spring 2004 New

8 Basic Limits on Cycle Time
Next address logic PC <= branch ? PC + offset : PC + 4 Instruction Fetch InstructionReg <= Mem[PC] Register Access A <= R[rs] ALU operation R <= A + B Control MemRd MemWr RegWr MemWr nPC_sel RegDst ALUSrc ALUctr ExtOp Reg. File Operand Fetch Exec Instruction Fetch Mem Access PC Next PC Result Store Data Mem

9 Partitioning the CPI=1 Datapath
Add registers between smallest steps Place enables on all registers Equal MemRd MemWr RegWr MemWr nPC_sel RegDst ExtOp ALUSrc ALUctr Reg. File Operand Fetch Exec Instruction Fetch Mem Access PC Next PC Result Store Data Mem

10 Example Multicycle Datapath
Ext ALU Reg. File Mem Access Data Result Store RegDst RegWr MemWr MemRd S M MemToReg Equal ALUctr ALUSrc ExtOp nPC_sel E Reg File A PC IR Next PC B Instruction Fetch Operand Fetch Critical Path ?

11 Recall: Step-by-step Processor Design
Step 1: ISA => Logical Register Transfers Step 2: Components of the Datapath Step 3: RTL + Components => Datapath Step 4: Datapath + Logical RTs => Physical RTs Step 5: Physical RTs => Control

12 Step 4: R-rtype (add, sub, . . .)
Logical Register Transfer Physical Register Transfers inst Logical Register Transfers ADDU R[rd] <– R[rs] + R[rt]; PC <– PC + 4 inst Physical Register Transfers IR <– MEM[pc] ADDU A<– R[rs]; B <– R[rt] S <– A + B R[rd] <– S; PC <– PC + 4 Time E Reg. File Reg File A S Exec PC IR Next PC Inst. Mem B Mem Access M Data Mem

13 Step 4: Logical immed Logical Register Transfer
Physical Register Transfers inst Logical Register Transfers ORI R[rt] <– R[rs] OR ZExt(Im16); PC <– PC + 4 inst Physical Register Transfers IR <– MEM[pc] ORI A<– R[rs]; B <– R[rt] S <– A or ZExt(Im16) R[rt] <– S; PC <– PC + 4 Time E Reg. File Reg File A S Exec PC IR Next PC Inst. Mem B Mem Access M Data Mem

14 Step 4 : Load Logical Register Transfer Physical Register Transfers
inst Logical Register Transfers LW R[rt] <– MEM[R[rs] + SExt(Im16)]; PC <– PC + 4 Logical Register Transfer Physical Register Transfers inst Physical Register Transfers IR <– MEM[pc] LW A<– R[rs]; B <– R[rt] S <– A + SExt(Im16) M <– MEM[S] R[rd] <– M; PC <– PC + 4 Time E Reg. File Reg File A S Exec PC IR Next PC Inst. Mem B Mem Access M Data Mem

15 Step 4 : Store Logical Register Transfer Physical Register Transfers
inst Logical Register Transfers SW MEM[R[rs] + SExt(Im16)] <– R[rt]; PC <– PC + 4 Logical Register Transfer Physical Register Transfers inst Physical Register Transfers IR <– MEM[pc] SW A<– R[rs]; B <– R[rt] S <– A + SExt(Im16); MEM[S] <– B PC <– PC + 4 Time E Reg. File Reg File A S Exec PC IR Next PC Inst. Mem B Mem Access M Data Mem

16 Step 4 : Branch Logical Register Transfer Physical Register Transfers
inst Logical Register Transfers BEQ if R[rs] == R[rt] then PC <= PC + 4+SExt(Im16) || 00 else PC <= PC + 4 inst Physical Register Transfers IR <– MEM[pc] BEQ E<– (R[rs] = R[rt]) if !E then PC <– PC else PC <–PC+4+SExt(Im16)||00 Time E Reg. File Reg File S A Exec PC IR Next PC Inst. Mem B Mem Access M Data Mem 2/23/04 ©UCB Spring 2004

17 Alternative datapath (book): Multiple Cycle Datapath
Miminizes Hardware: 1 memory, 1 adder PCWr PCWrCond PCSrc BrWr Zero IorD MemWr IRWr RegDst RegWr ALUSelA 1 Target 32 32 Mux PC Mux 1 32 Zero Rs Mux 1 Ra 32 RAdr 5 32 Rt Rb busA 32 32 Ideal Memory ALU Instruction Reg 5 Reg File 32 ALU Out Putting it all together, here it is: the multiple cycle datapath we set out to built. +1 = 47 min. (Y:47) Mux 1 Rt 4 Rw 32 WrAdr 32 32 Rd 1 32 Din Dout busW busB 32 32 2 ALU Control Mux 1 3 << 2 Extend Imm 16 32 ALUOp ExtOp MemtoReg ALUSelB

18 Our Control Model State specifies control points for Register Transfer
Transfer occurs upon exiting state (same falling edge) inputs (conditions) Next State Logic State X Register Transfer Control Points Control State Depends on Input Output Logic outputs (control points)

19 Step 4  Control Specification for multicycle proc
“instruction fetch” IR <= MEM[PC] A <= R[rs] B <= R[rt] “decode / operand fetch” LW R-type ORi SW BEQ Execute Memory Write-back S <= A fun B S <= A or ZX S <= A + SX S <= A + SX PC <= Next(PC,Equal) M <= MEM[S] MEM[S] <= B PC <= PC + 4 R[rd] <= S PC <= PC + 4 R[rt] <= S PC <= PC + 4 R[rt] <= M PC <= PC + 4

20 Traditional FSM Controller
next state state op cond control points Truth Table next State control points 11 Equal 6 State 4 op datapath State

21 Step 5  (datapath + state diagram control)
Translate RTs into control points Assign states Then go build the controller

22 Mapping RTs to Control Points
IR <= MEM[PC] “instruction fetch” imem_rd, IRen A <= R[rs] B <= R[rt] “decode” Aen, Ben, Een LW R-type ORi SW BEQ Execute Memory Write-back S <= A fun B S <= A or ZX S <= A + SX S <= A + SX PC <= Next(PC,Equal) ALUfun, Sen M <= MEM[S] MEM[S] <= B PC <= PC + 4 R[rd] <= S PC <= PC + 4 RegDst, RegWr, PCen R[rt] <= S PC <= PC + 4 R[rt] <= M PC <= PC + 4

23 Assigning States “instruction fetch” IR <= MEM[PC] 0000 “decode”
A <= R[rs] B <= R[rt] “decode” 0001 LW R-type ORi SW BEQ Execute Memory Write-back S <= A fun B S <= A or ZX S <= A + SX S <= A + SX PC <= Next(PC) 0100 0110 1000 1011 0011 M <= MEM[S] MEM[S] <= B PC <= PC + 4 1001 1100 R[rd] <= S PC <= PC + 4 R[rt] <= S PC <= PC + 4 R[rt] <= M PC <= PC + 4 0101 0111 1010

24 (Mostly) Detailed Control Specification (missing0)
State Op field Eq Next IR PC Ops Exec Mem Write-Back en sel A B E Ex Sr ALU S R W M M-R Wr Dst 0000 ?????? ? 0001 BEQ x 0001 R-type x 0001 ORI x 0001 LW x 0001 SW x 0011 xxxxxx x x 0011 xxxxxx x x 0100 xxxxxx x fun 1 0101 xxxxxx x 0110 xxxxxx x or 1 0111 xxxxxx x 1000 xxxxxx x add 1 1001 xxxxxx x 1010 xxxxxx x 1011 xxxxxx x add 1 1100 xxxxxx x -all same in Moore machine BEQ: R: ORi: LW: SW:

25 Performance Evaluation
What is the average CPI? state diagram gives CPI for each instruction type workload gives frequency of each type Type CPIi for type Frequency CPIi x freqIi Arith/Logic 4 40% 1.6 Load 5 30% 1.5 Store 4 10% 0.4 branch 3 20% 0.6 Average CPI: 4.1

26 Use this structure to construct a simple “microsequencer”
Controller Design The state digrams that arise define the controller for an instruction set processor are highly structured Use this structure to construct a simple “microsequencer” Control reduces to programming this very simple device  microprogramming sequencer control datapath control micro-PC microinstruction

27 Example: Jump-Counter
i i 0000 i+1 Map ROM None of above: Do nothing (for wait states) op-code zero inc load Counter

28 Using a Jump Counter “instruction fetch” IR <= MEM[PC] 0000 inc
A <= R[rs] B <= R[rt] “decode” 0001 load LW R-type ORi SW BEQ Execute Memory Write-back S <= A fun B S <= A or ZX S <= A + SX S <= A + SX PC <= Next(PC) 0100 0110 1000 1011 0011 inc inc inc inc zero M <= MEM[S] MEM[S] <= B PC <= PC + 4 1001 1100 inc R[rd] <= S PC <= PC + 4 R[rt] <= S PC <= PC + 4 R[rt] <= M PC <= PC + 4 zero 0101 0111 1010 zero zero zero

29 Our Microsequencer taken datapath control Z I L Micro-PC op-code
Map ROM

30 Microprogram Control Specification
µPC Taken Next IR PC Ops Exec Mem Write-Back en sel A B Ex Sr ALU S R W M M-R Wr Dst 0000 ? inc 1 load 1 1 zero zero 0100 x inc fun 1 0101 x zero 0110 x inc or 1 0111 x zero 1000 x inc add 1 1001 x inc x zero 1011 x inc add 1 1100 x zero BEQ R: ORi: LW: SW:

31 “microprogrammed control”
Overview of Control Control may be designed using one of several initial representations. The choice of sequence control, and how logic is represented, can then be determined independently; the control can then be implemented with one of several methods using a structured logic technique. Initial Representation Finite State Diagram Microprogram Sequencing Control Explicit Next State Microprogram counter Function + Dispatch ROMs Logic Representation Logic Equations Truth Tables Implementation PLA ROM Technique “hardwired control” “microprogrammed control”

32 Disadvantages of the Single Cycle Processor
Summary Disadvantages of the Single Cycle Processor Long cycle time Cycle time is too long for all instructions except the Load Multiple Cycle Processor: Divide the instructions into smaller steps Execute each step (instead of the entire instruction) in one cycle Partition datapath into equal size chunks to minimize cycle time ~10 levels of logic between latches Follow same 5-step method for designing “real” processor

33 Control is specified by finite state digram
Summary (cont’d) Control is specified by finite state digram Specialize state-diagrams easily captured by microsequencer simple increment & “branch” fields datapath control fields Control design reduces to Microprogramming Control is more complicated with: complex instruction sets restricted datapaths (see the book) Simple Instruction set and powerful datapath  simple control could try to reduce hardware (see the book) rather go for speed => many instructions at once!


Download ppt "Designing a Multicycle Processor"

Similar presentations


Ads by Google