1
Designing a Multicycle Processor
Fudan University, Software School, 2017
2
Processor Design is a Process
Bottom-up: assemble components in the target technology to establish critical timing.
Top-down: specify component behavior from high-level requirements.
Iterative refinement: establish a partial solution, then expand and improve it.
[Figure: design hierarchy. Instruction Set Architecture => processor (datapath, control) => Reg. File, Mux, ALU, Reg, Mem, Decoder, Sequencer => Cells => Gates]
3
A Single Cycle Datapath
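[Figure: single-cycle datapath. The Instruction Fetch Unit (controlled by nPC_sel) supplies Instruction<31:0>, split into Rs, Rt, Rd, and Imm16. A RegDst mux selects Rt or Rd as the write register Rw of the 32 x 32-bit register file (ports Ra, Rb, Rw; buses busA, busB, busW; RegWr, Clk). The Extender (ExtOp) widens imm16 to 32 bits, the ALUSrc mux selects busB or the extended immediate, the ALU (ALUctr) produces the result and Equal, the Data Memory (WrEn = MemWr, Adr, Data In, Clk) is addressed by the ALU output, and the MemtoReg mux selects busW.]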
4
The “Truth Table” for the Main Control
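[Table: main control truth table. Columns (by op): R-type, ori, lw, sw, beq, jump. Rows (outputs): RegDst, ALUSrc, MemtoReg, RegWrite, MemWrite, Branch, Jump, ExtOp, ALUop (symbolic: “R-type”, Or, Add, Subtract, xxx), encoded as ALUop<2>, ALUop<1>, ALUop<0>. The main control takes the 6-bit op and produces ALUop; the local ALU control combines ALUop with func to produce ALUctr.]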
5
PLA Implementation of the Main Control
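[Figure: PLA implementation of the main control. Inputs op<5> .. op<0> are decoded into the terms R-type, ori, lw, sw, beq, jump, which drive the outputs RegWrite, ALUSrc, MemtoReg, MemWrite, Branch, Jump, RegDst, ExtOp, ALUop<2>, ALUop<1>, ALUop<0>.]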
6
Systematic Generation of Control
[Figure: the opcode and conditions feed the control logic / store (PLA, ROM); the selected microinstruction is decoded into control points that drive the datapath.]
In our single-cycle processor, each instruction is realized by exactly one control command, or “microinstruction”.
In general, the controller is a finite state machine.
A microinstruction can also control sequencing (see later).
7
The Big Picture: Where are We Now?
The five classic components of a computer: Input, Output, Memory, and the Processor (Control and Datapath).
Today’s topic: designing the multiple clock cycle datapath.
8
Abstract View of our single cycle processor
[Figure: Main Control (input op) and ALU control (input fun) generate nPC_sel, RegDst, RegWr, ExtOp, ALUSrc, ALUctr, MemRd, MemWr. The datapath steps are Next PC, Instruction Fetch, Register Fetch, Ext, ALU, Mem Access (Data Mem), and Result Store; Equal is fed back to control.]
Looks like an FSM with the PC as state.
9
What’s wrong with our CPI=1 processor?
[Figure: components on the delay path of each instruction class]
  Arithmetic & Logical: PC, Inst Memory, Reg File, ALU, mux, mux, setup
  Load (critical path): PC, Inst Memory, Reg File, ALU, Data Mem, mux, mux, setup
  Store: PC, Inst Memory, Reg File, ALU, Data Mem, mux
  Branch: PC, Inst Memory, Reg File, cmp, mux
Long cycle time: all instructions take as much time as the slowest one.
Real memory is not as nice as our idealized memory: it cannot always get the job done in one (short) cycle.
10
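Memory Access Time
Physics => fast memories are small (large memories are slow)
=> Use a hierarchy of memories
[Figure: storage array with address decoder, word lines, bit lines, storage cells, and sense amps; selected word line driven from the address. Hierarchy: Processor, Cache, L2 Cache, memory, connected by the proc. bus and mem. bus; access times are labeled in time-periods (1 time-period, 2-3 time-periods, and more for the outer levels).]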
11
Reducing Cycle Time
Cut the combinational dependency graph and insert a register / latch.
Do the same work in two fast cycles, rather than one slow one.
May be able to short-circuit the path and remove some components for some instructions!
[Figure: acyclic combinational logic between storage elements, split into Logic (A) and Logic (B) with a storage element inserted between them]
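The effect of cutting the logic can be checked with a little arithmetic. A minimal sketch, assuming the clock period is the longest logic delay between storage elements plus a fixed register overhead; the function names and the delay values in the usage note are illustrative assumptions, not figures from the slides:

```python
def one_cycle_period(logic_a, logic_b, reg_overhead):
    """Clock period when Logic (A) and Logic (B) sit in the same cycle."""
    return logic_a + logic_b + reg_overhead


def two_cycle_period(logic_a, logic_b, reg_overhead):
    """Clock period after a register is inserted between Logic (A) and Logic (B).

    The slower of the two halves now sets the period.
    """
    return max(logic_a, logic_b) + reg_overhead


def two_cycle_latency(logic_a, logic_b, reg_overhead):
    """Total time for the work once it is spread over two cycles."""
    return 2 * two_cycle_period(logic_a, logic_b, reg_overhead)
```

With hypothetical delays of 4 and 4 time units and a register overhead of 1, the period drops from 9 to 5 while the latency of the full operation rises from 9 to 10. The gain comes from instructions that can skip a step, not from the split itself.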
12
An Abstract View of the Critical Path (Load)
Register file and ideal memory:
  The CLK input is a factor ONLY during write operations.
  During read operations, they behave as combinational logic: Address valid => Output valid after “access time.”
Critical Path (Load Operation) =
  PC’s Clk-to-Q
  + Instruction Memory’s Access Time
  + Register File’s Access Time
  + ALU to Perform a 32-bit Add
  + Data Memory Access Time
  + Setup Time for Register File Write
  + Clock Skew
[Figure: PC and Next Address logic, Ideal Instruction Memory (Instruction Address in; Rd, Rs, Rt, Imm16 out), 32 x 32-bit register file (Rw, Ra, Rb; outputs A and B), ALU, Ideal Data Memory (Data Address, Data In); PC, register file, and data memory are clocked by Clk.]
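The critical-path sum for the load can be tallied numerically. A minimal sketch; every delay value below is a hypothetical placeholder in nanoseconds chosen for illustration, not a figure from the slides:

```python
# Hypothetical component delays in nanoseconds (illustrative assumptions only).
LOAD_PATH_NS = {
    "pc_clk_to_q": 0.5,
    "instruction_memory_access": 2.0,
    "register_file_access": 1.0,
    "alu_32bit_add": 2.0,
    "data_memory_access": 2.0,
    "register_file_setup": 0.5,
    "clock_skew": 0.5,
}


def critical_path_ns(delays):
    """Sum the delays along one path; the clock period must be at least this long."""
    return sum(delays.values())


def max_clock_mhz(delays):
    """Highest clock frequency (in MHz) whose period still covers the path."""
    return 1000.0 / critical_path_ns(delays)


if __name__ == "__main__":
    print(critical_path_ns(LOAD_PATH_NS), "ns")
```

Because every instruction shares the same clock in the single-cycle design, this load path sets the period for all of them.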
13
Worst Case Timing (Load)
[Timing diagram: each signal changes from Old Value to New Value after the listed delay]
  PC: after Clk-to-Q
  Rs, Rt, Rd, Op, Func: after the Instruction Memory Access Time
  ALUctr, ExtOp, ALUSrc, MemtoReg, RegWr: after the Delay through Control Logic
  busA: after the Register File Access Time
  busB: after the Delay through Extender & Mux
  Address: after the ALU Delay
  busW: after the Data Memory Access Time
  Register Write Occurs once busW is valid and RegWr is asserted.
14
Five stages of single cycle datapath
15
Ideal and Real Memory
16
Race condition
17
How to avoid the race condition
18
How to avoid the race condition
19
Basic Limits on Cycle Time
Next address logic: PC <= branch ? PC + offset : PC + 4
Instruction Fetch: InstructionReg <= Mem[PC]
Register Access: A <= R[rs]
ALU operation: R <= A + B
[Figure: single-cycle datapath steps Next PC, Instruction Fetch, Operand Fetch (Reg. File), Exec, Mem Access (Data Mem), Result Store, with control signals nPC_sel, RegDst, RegWr, ExtOp, ALUSrc, ALUctr, MemRd, MemWr]
20
Partitioning the CPI=1 Datapath
Add registers between smallest steps.
Place enables on all registers.
[Figure: the same datapath steps (Next PC, Instruction Fetch, Operand Fetch, Exec, Mem Access, Result Store) with registers inserted between them; control signals nPC_sel, RegDst, RegWr, ExtOp, ALUSrc, ALUctr, MemRd, MemWr; Equal output]
21
Example Multicycle Datapath
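[Figure: example multicycle datapath. Registers PC, IR, A, B, E, S, and M sit between Next PC, Instruction Fetch, Operand Fetch (Reg File), Ext / ALU, Mem Access (Data Mem), and Result Store; control signals nPC_sel, ExtOp, ALUSrc, ALUctr, MemRd, MemWr, MemToReg, RegDst, RegWr; Equal output.]
Advantages?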
22
Recall: Step-by-step Processor Design
Step 1: ISA => Logical Register Transfers
Step 2: Components of the Datapath
Step 3: RTL + Components => Datapath
Step 4: Datapath + Logical RTs => Physical RTs
Step 5: Physical RTs => Control
23
Step 4: R-type (add, sub, . . .)
Logical Register Transfers
  ADDU   R[rd] <- R[rs] + R[rt]; PC <- PC + 4
Physical Register Transfers (one line per time step)
  IR <- MEM[pc]
  A <- R[rs]; B <- R[rt]
  S <- A + B
  R[rd] <- S; PC <- PC + 4
[Figure: multicycle datapath with PC, Next PC, Inst. Mem, IR, Reg File, A, B, E, Exec, S, Mem Access (Data Mem), M]
24
Step 4: Logical immed
Logical Register Transfers
  ORI    R[rt] <- R[rs] OR ZExt(Im16); PC <- PC + 4
Physical Register Transfers (one line per time step)
  IR <- MEM[pc]
  A <- R[rs]; B <- R[rt]
  S <- A or ZExt(Im16)
  R[rt] <- S; PC <- PC + 4
[Figure: multicycle datapath with PC, Next PC, Inst. Mem, IR, Reg File, A, B, E, Exec, S, Mem Access (Data Mem), M]
25
Step 4: Load
Logical Register Transfers
  LW     R[rt] <- MEM[R[rs] + SExt(Im16)]; PC <- PC + 4
Physical Register Transfers (one line per time step)
  IR <- MEM[pc]
  A <- R[rs]; B <- R[rt]
  S <- A + SExt(Im16)
  M <- MEM[S]
  R[rt] <- M; PC <- PC + 4
[Figure: multicycle datapath with PC, Next PC, Inst. Mem, IR, Reg File, A, B, E, Exec, S, Mem Access (Data Mem), M]
26
Step 4: Store
Logical Register Transfers
  SW     MEM[R[rs] + SExt(Im16)] <- R[rt]; PC <- PC + 4
Physical Register Transfers (one line per time step)
  IR <- MEM[pc]
  A <- R[rs]; B <- R[rt]
  S <- A + SExt(Im16)
  MEM[S] <- B; PC <- PC + 4
[Figure: multicycle datapath with PC, Next PC, Inst. Mem, IR, Reg File, A, B, E, Exec, S, Mem Access (Data Mem), M]
27
Step 4: Branch
Logical Register Transfers
  BEQ    if R[rs] == R[rt] then PC <- PC + 4 + SExt(Im16) || 00 else PC <- PC + 4
Physical Register Transfers (one line per time step)
  IR <- MEM[pc]
  E <- (R[rs] = R[rt])
  if !E then PC <- PC + 4 else PC <- PC + 4 + SExt(Im16) || 00
[Figure: multicycle datapath with PC, Next PC, Inst. Mem, IR, Reg File, A, B, E, Exec, S, Mem Access (Data Mem), M]
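The physical register transfers for the five instruction types can be replayed in software. A minimal sketch, assuming a decoded-instruction dictionary stands in for IR and that unset memory reads as 0; the names `run_instruction`, `write_reg`, and `sign_extend16` and the dictionary layout are hypothetical, introduced only for this illustration:

```python
MASK32 = 0xFFFFFFFF


def sign_extend16(imm):
    """Interpret the low 16 bits of imm as a signed two's-complement value."""
    imm &= 0xFFFF
    return imm - 0x10000 if imm & 0x8000 else imm


def write_reg(regs, index, value):
    """Write a 32-bit value; register 0 stays zero, as in MIPS."""
    if index != 0:
        regs[index] = value & MASK32


def run_instruction(state, inst):
    """Apply the physical register transfers for one instruction.

    state: {'pc': int, 'regs': list of 32 ints, 'mem': dict address -> word}
    inst:  decoded instruction, e.g. {'op': 'lw', 'rs': 1, 'rt': 2, 'imm': 8}
    Returns the number of time steps (cycles) used.
    """
    regs, mem, pc = state["regs"], state["mem"], state["pc"]
    op = inst["op"]
    # Step 1: IR <- MEM[pc] (inst stands in for the fetched, decoded IR).
    # Step 2: A <- R[rs]; B <- R[rt]; for BEQ, E <- (A == B).
    a, b = regs[inst["rs"]], regs[inst["rt"]]
    cycles = 2
    next_pc = pc + 4
    if op == "addu":
        s = (a + b) & MASK32                           # S <- A + B
        write_reg(regs, inst["rd"], s)                 # R[rd] <- S
        cycles += 2
    elif op == "ori":
        s = a | (inst["imm"] & 0xFFFF)                 # S <- A or ZExt(Im16)
        write_reg(regs, inst["rt"], s)                 # R[rt] <- S
        cycles += 2
    elif op == "lw":
        s = (a + sign_extend16(inst["imm"])) & MASK32  # S <- A + SExt(Im16)
        m = mem.get(s, 0)                              # M <- MEM[S]
        write_reg(regs, inst["rt"], m)                 # R[rt] <- M
        cycles += 3
    elif op == "sw":
        s = (a + sign_extend16(inst["imm"])) & MASK32  # S <- A + SExt(Im16)
        mem[s] = b                                     # MEM[S] <- B
        cycles += 2
    elif op == "beq":
        if a == b:                                     # E selects the next PC
            next_pc = pc + 4 + (sign_extend16(inst["imm"]) << 2)
        cycles += 1
    else:
        raise ValueError("unsupported op: " + op)
    state["pc"] = next_pc & MASK32
    return cycles
```

The returned step counts (4 for ADDU and ORI, 5 for LW, 4 for SW, 3 for BEQ) follow directly from the number of transfer lines listed for each instruction.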
28
Alternative datapath (book): Multiple Cycle Datapath
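Minimizes hardware: 1 memory, 1 adder
[Figure: book-style multicycle datapath. A single Ideal Memory (RAdr, WrAdr, Din, Dout) is addressed through the IorD mux by the PC or the ALU output; the Instruction Reg (IRWr) supplies Rs, Rt, Rd, and Imm16; the register file (Ra, Rb, Rw, busA, busB, busW) is written under RegWr with RegDst selecting Rt or Rd; the ALUSelA and ALUSelB muxes select the ALU operands (PC or busA; busB, 4, extended immediate, or extended immediate << 2); ALU Control (ALUOp) drives the ALU, which produces Zero and ALU Out; a Target register (BrWr) holds the branch target; PCSrc, PCWr, and PCWrCond control the PC; ExtOp controls the Extend unit; MemtoReg selects the register write data.]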
29
Instruction Fetch (beginning)
30
Instruction Fetch (end)
31
Instruction Fetch
32
Instruction decode/register fetch
33
Instruction decode/register fetch
34
Branch Completion
35
Instruction decode (R-type)
36
The Execution of R-type
37
The Completion of R-type
38
Instruction decode (ORi)
39
The Execution of ORi
40
The Completion of ORi
41
Instruction decode (mem op)
42
Memory address computation
43
Memory Access (Store)
44
Memory Access (Load)
45
Write back
46
Finite State Machine
A finite state machine gives directions on how to change states.
The directions are defined by a next-state function.
A signal that is not explicitly asserted is deasserted, rather than don’t care. For example, the RegWrite signal should be asserted only when a register file entry is to be written; when it is not explicitly asserted, it must be deasserted.
47
Our Control Model
State specifies control points for Register Transfer.
Transfer occurs upon exiting the state (same falling edge).
[Figure: inputs (conditions) feed the Next State Logic; the Control State X feeds the Output Logic, which produces the outputs (control points) for the register transfer. The next control state depends on the input.]
48
The complete FSM
49
Step 4 Control Specification for multicycle proc
“instruction fetch”: IR <= MEM[PC]
“decode / operand fetch”: A <= R[rs]; B <= R[rt]
Execute:
  R-type: S <= A fun B
  ORi:    S <= A or ZX
  LW:     S <= A + SX
  SW:     S <= A + SX
  BEQ:    PC <= Next(PC,Equal)
Memory:
  LW:     M <= MEM[S]
  SW:     MEM[S] <= B; PC <= PC + 4
Write-back:
  R-type: R[rd] <= S; PC <= PC + 4
  ORi:    R[rt] <= S; PC <= PC + 4
  LW:     R[rt] <= M; PC <= PC + 4
50
Traditional FSM Controller
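[Figure: traditional FSM controller. A truth table takes the current State, op, and cond (Equal) and produces the next State and the control points that drive the datapath. Widths annotated in the figure: 11, 6 (op), 4 (State).]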
51
Step 5 (datapath + state diagram control)
Translate RTs into control points.
Assign states.
Then go build the controller.
52
Mapping RTs to Control Points
“instruction fetch”: IR <= MEM[PC]  =>  imem_rd, IRen
“decode”: A <= R[rs]; B <= R[rt]  =>  Aen, Ben, Een
Execute:
  R-type: S <= A fun B  =>  ALUfun, Sen
  ORi:    S <= A or ZX
  LW:     S <= A + SX
  SW:     S <= A + SX
  BEQ:    PC <= Next(PC,Equal)
Memory:
  LW:     M <= MEM[S]
  SW:     MEM[S] <= B; PC <= PC + 4
Write-back:
  R-type: R[rd] <= S; PC <= PC + 4  =>  RegDst, RegWr, PCen
  ORi:    R[rt] <= S; PC <= PC + 4
  LW:     R[rt] <= M; PC <= PC + 4
53
Assigning States
State   Step                  Register transfers
0000    “instruction fetch”   IR <= MEM[PC]
0001    “decode”              A <= R[rs]; B <= R[rt]
0100    R-type execute        S <= A fun B
0101    R-type write-back     R[rd] <= S; PC <= PC + 4
0110    ORi execute           S <= A or ZX
0111    ORi write-back        R[rt] <= S; PC <= PC + 4
1000    LW execute            S <= A + SX
1001    LW memory             M <= MEM[S]
1010    LW write-back         R[rt] <= M; PC <= PC + 4
1011    SW execute            S <= A + SX
1100    SW memory             MEM[S] <= B; PC <= PC + 4
0011    BEQ execute           PC <= Next(PC)
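The state assignment above amounts to a next-state function. A minimal sketch using the 4-bit encodings from the slide; the table and function names (`DISPATCH`, `SEQUENTIAL`, `next_state`, `cycles_for`) and the opcode labels used as dictionary keys are assumptions made for this illustration:

```python
FETCH, DECODE = 0b0000, 0b0001

# Dispatch out of "decode": the first execute state for each instruction type.
DISPATCH = {
    "R-type": 0b0100,
    "ORi": 0b0110,
    "LW": 0b1000,
    "SW": 0b1011,
    "BEQ": 0b0011,
}

# States with a fixed successor; any state not listed returns to fetch.
SEQUENTIAL = {
    0b0100: 0b0101,  # R-type execute -> write-back
    0b0110: 0b0111,  # ORi execute -> write-back
    0b1000: 0b1001,  # LW execute -> memory
    0b1001: 0b1010,  # LW memory -> write-back
    0b1011: 0b1100,  # SW execute -> memory
}


def next_state(state, op):
    """Next-state function: depends on the op field only in the decode state."""
    if state == FETCH:
        return DECODE
    if state == DECODE:
        return DISPATCH[op]
    return SEQUENTIAL.get(state, FETCH)


def cycles_for(op):
    """Count the states visited from fetch until control returns to fetch."""
    state, count = FETCH, 1
    while True:
        state = next_state(state, op)
        if state == FETCH:
            return count
        count += 1
```

Walking the function from fetch back to fetch gives the number of cycles each instruction type spends in the controller.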
54
(Mostly) Detailed Control Specification (missing0)
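Column groups: State | Op field | Eq | Next | IR (en) | PC (sel) | Ops (A, B, E) | Exec (Ex, Sr, ALU, S) | Mem (R, W, M) | Write-Back (M-R, Wr, Dst)
[Only the State, Op field, Eq, ALU, and S entries survive; the remaining cells are blank in this transcript.]

        State   Op field   Eq   ALU   S
        0000    ??????     ?
        0001    BEQ        x
        0001    R-type     x
        0001    ORI        x
        0001    LW         x
        0001    SW         x
          (rows for state 0001: all same in Moore machine)
BEQ:    0011    xxxxxx     x
        0011    xxxxxx     x
R:      0100    xxxxxx     x    fun   1
        0101    xxxxxx     x
ORi:    0110    xxxxxx     x    or    1
        0111    xxxxxx     x
LW:     1000    xxxxxx     x    add   1
        1001    xxxxxx     x
        1010    xxxxxx     x
SW:     1011    xxxxxx     x    add   1
        1100    xxxxxx     x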
55
Performance Evaluation
What is the average CPI?
  The state diagram gives the CPI for each instruction type.
  The workload gives the frequency of each type.

Type           CPIi for type   Frequency   CPIi x freqIi
Arith/Logic    4               40%         1.6
Load           5               30%         1.5
Store          4               10%         0.4
branch         3               20%         0.6

Average CPI: 1.6 + 1.5 + 0.4 + 0.6 = 4.1
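The weighted sum is easy to reproduce. A minimal sketch using the per-type CPI and frequency figures from the slide; the dictionary layout and the name `average_cpi` are assumptions for illustration:

```python
# (CPI for the type, fraction of the workload), taken from the slide.
WORKLOAD = {
    "arith_logic": (4, 0.40),
    "load": (5, 0.30),
    "store": (4, 0.10),
    "branch": (3, 0.20),
}


def average_cpi(workload):
    """Average CPI = sum over instruction types of CPI_i * frequency_i."""
    return sum(cpi * freq for cpi, freq in workload.values())


if __name__ == "__main__":
    print(round(average_cpi(WORKLOAD), 2))
```

Changing the frequencies shows how sensitive the average is to the share of loads, the only five-cycle instruction type in this mix.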
56
Controller Design
The state diagrams that define the controller for an instruction set processor are highly structured.
Use this structure to construct a simple “microsequencer”.
Control reduces to programming this very simple device: microprogramming.
[Figure: sequencer control and datapath control; the micro-PC selects the microinstruction]
57
Example: Jump-Counter
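[Figure: jump counter. From state i, the counter can return to 0000 (zero), advance to i+1 (inc), or load a target that the Map ROM looks up from the op-code (load).]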
58
Using a Jump Counter
59
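Our Microsequencer
[Figure: the Micro-PC is updated by the Z (zero), I (inc), and L (load) controls; the op-code indexes the Map ROM that supplies the load target; taken is the condition input; the microinstruction supplies the datapath control.]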
60
Microprogram Control Specification
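Column groups: µPC | Taken | Next | IR (en) | PC (sel) | Ops (A, B) | Exec (Ex, Sr, ALU, S) | Mem (R, W, M) | Write-Back (M-R, Wr, Dst)
[Only the µPC, Taken, Next, ALU, and S entries survive; some µPC values are blank in this transcript.]

        µPC     Taken   Next   ALU   S
        0000    ?       inc
                        load
BEQ:                    zero
                        zero
R:      0100    x       inc    fun   1
        0101    x       zero
ORi:    0110    x       inc    or    1
        0111    x       zero
LW:     1000    x       inc    add   1
        1001    x       inc
                x       zero
SW:     1011    x       inc    add   1
        1100    x       zero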
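The zero / inc / load sequencing in this specification can be exercised with a small model of the micro-PC. A minimal sketch; the Map ROM contents and the µPC values for the decode (0001), BEQ (0011), and LW write-back (1010) rows are assumptions taken from the state assignment used earlier in the deck, and the names `MAP_ROM`, `SEQUENCE`, `next_upc`, and `trace` are hypothetical:

```python
# Map ROM: op-code -> micro-PC of the first execute step (assumed contents).
MAP_ROM = {
    "R-type": 0b0100,
    "ORi": 0b0110,
    "LW": 0b1000,
    "SW": 0b1011,
    "BEQ": 0b0011,
}

# Sequencing field of each microinstruction: "zero", "inc", or "load".
SEQUENCE = {
    0b0000: "inc",   # instruction fetch
    0b0001: "load",  # decode: dispatch through the Map ROM
    0b0011: "zero",  # BEQ
    0b0100: "inc", 0b0101: "zero",                  # R-type
    0b0110: "inc", 0b0111: "zero",                  # ORi
    0b1000: "inc", 0b1001: "inc", 0b1010: "zero",   # LW
    0b1011: "inc", 0b1100: "zero",                  # SW
}


def next_upc(upc, op):
    """Apply the sequencing field of the current microinstruction."""
    action = SEQUENCE[upc]
    if action == "zero":
        return 0
    if action == "inc":
        return upc + 1
    return MAP_ROM[op]  # "load"


def trace(op):
    """List the micro-PC values visited while executing one instruction."""
    upc, visited = 0, [0]
    while True:
        upc = next_upc(upc, op)
        if upc == 0:
            return visited
        visited.append(upc)
```

For example, `trace("LW")` walks 0000, 0001, 1000, 1001, 1010 before returning to 0000, so only the decode step needs the op-code.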
61
Designing a Microinstruction Set
1) Start with list of control signals
2) Group signals together that make sense (vs. random): called “fields”
3) Place fields in some logical order (e.g., ALU operation & ALU operands first and microinstruction sequencing last)
4) To minimize the width, encode operations that will never be used at the same time
5) Create a symbolic legend for the microinstruction format, showing name of field values and how they set the control signals
Use computers to design computers
62
Again: Alternative multicycle datapath (book)
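Minimizes hardware: 1 memory, 1 adder
[Figure: book-style multicycle datapath, now with explicit A, B, ALU Out, and Mem Data Reg registers. A single Ideal Memory (RAdr, WrAdr, Din, Dout) is addressed through the IorD mux; the Instruction Reg (IRWr) supplies Rs, Rt, Rd, and Imm16; the register file (Ra, Rb, Rw, busA, busB, busW) is written under RegWr with RegDst selecting Rt or Rd; the ALUSelA and ALUSelB muxes select the ALU operands; ALU Control (ALUOp) drives the ALU, which produces Zero; PCSrc, PCWr, and PCWrCond control the PC; ExtOp controls the Extend unit; MemtoReg selects the register write data.]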
63
1&2) Start with list of control signals, grouped into fields
Single Bit Control
Signal name    Effect when deasserted          Effect when asserted
ALUSelA        1st ALU operand = PC            1st ALU operand = Reg[rs]
RegWrite       None                            Reg. is written
MemtoReg       Reg. write data input = ALU     Reg. write data input = memory
RegDst         Reg. dest. no. = rt             Reg. dest. no. = rd
MemRead        None                            Memory at address is read, MDR <= Mem[addr]
MemWrite       None                            Memory at address is written
IorD           Memory address = PC             Memory address = S
IRWrite        None                            IR <= Memory
PCWrite        None                            PC <= PCSource
PCWriteCond    None                            IF ALUzero then PC <= PCSource
PCSource       PCSource = ALU                  PCSource = ALUout
ExtOp          Zero Extended                   Sign Extended

Multiple Bit Control
Signal name    Value   Effect
ALUOp          00      ALU adds
               01      ALU subtracts
               10      ALU does function code
               11      ALU does logical OR
ALUSelB        00      2nd ALU input =
               01      2nd ALU input = Reg[rt]
               10      2nd ALU input = extended, shift left 2
               11      2nd ALU input = extended
64
5) Legend of Fields and Symbolic Names
[Table: Field Name | Values for Field | Function of Field with Specific Value]
65
Controller handles non-ideal memory
“instruction fetch”: IR <= MEM[PC]; stay while wait, advance on ~wait
“decode / operand fetch”: A <= R[rs]; B <= R[rt]
Execute:
  R-type: S <= A fun B
  ORi:    S <= A or ZX
  LW:     S <= A + SX
  SW:     S <= A + SX
  BEQ:    PC <= Next(PC)
Memory:
  LW:     M <= MEM[S]; stay while wait, advance on ~wait
  SW:     MEM[S] <= B; stay while wait, advance on ~wait
Write-back:
  R-type: R[rd] <= S; PC <= PC + 4
  ORi:    R[rt] <= S; PC <= PC + 4
  LW:     R[rt] <= M; PC <= PC + 4
  SW:     PC <= PC + 4
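The cost of wait states can be estimated by counting how long the controller sits in each state. A minimal sketch, assuming each state normally takes one cycle and that an asserted wait holds the controller in the same state for whole extra cycles; the path list and the name `cycles_with_waits` are hypothetical:

```python
# States a load passes through, in order.
LW_PATH = ["fetch", "decode", "execute", "memory", "write-back"]


def cycles_with_waits(path, wait_cycles):
    """Total cycles for one instruction.

    path:        ordered list of state names the instruction visits
    wait_cycles: state name -> number of cycles that 'wait' stays asserted there
    """
    total = 0
    for state in path:
        # One cycle for the state itself, plus every cycle spent looping on wait.
        total += 1 + wait_cycles.get(state, 0)
    return total
```

With no waits a load takes 5 cycles; if, hypothetically, the instruction fetch stalls for 2 cycles and the data access for 3, the same load takes 10.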
66
Really Simple Time-State Control
instruction fetch: IR <= MEM[PC]; stay while wait, advance on ~wait
decode: A <= R[rs]; B <= R[rt]
Execute:
  R-type: S <= A fun B
  ORi:    S <= A or ZX
  LW:     S <= A + SX
  SW:     S <= A + SX
Memory:
  LW:     M <= MEM[S]; stay while wait
  SW:     MEM[S] <= B; stay while wait
write-back:
  R-type: R[rd] <= S; PC <= PC + 4
  ORi:    R[rt] <= S; PC <= PC + 4
  LW:     R[rt] <= M; PC <= PC + 4
  SW:     PC <= PC + 4
  BEQ:    PC <= Next(PC)
67
Time-state Control Path
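Local decode and control at each stage.
[Figure: the instruction moves through the registers IR, IRex, IRmem, IRwb (with a Valid bit), each feeding its own local controller: Dcd Ctrl, Ex Ctrl, Mem Ctrl, WB Ctrl. Datapath: PC, Next PC, Inst. Mem, Reg File, A, B, Exec, S, Mem Access (Data Mem), M; Equal output.]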
68
Overview of Control
Control may be designed using one of several initial representations. The choice of sequence control, and how logic is represented, can then be determined independently; the control can then be implemented with one of several methods using a structured logic technique.

Initial Representation:     Finite State Diagram           | Microprogram
Sequencing Control:         Explicit Next State Function   | Microprogram counter + Dispatch ROMs
Logic Representation:       Logic Equations                | Truth Tables
Implementation Technique:   PLA                            | ROM
                            “hardwired control”            | “microprogrammed control”
69
Summary
Disadvantages of the Single Cycle Processor
  Long cycle time
  Cycle time is too long for all instructions except the Load
Multiple Cycle Processor:
  Divide the instructions into smaller steps
  Execute each step (instead of the entire instruction) in one cycle
Partition datapath into equal size chunks to minimize cycle time
  ~10 levels of logic between latches
Follow same 5-step method for designing “real” processor
70
Summary (cont’d)
Control is specified by a finite state diagram.
Specialized state diagrams are easily captured by a microsequencer:
  simple increment & “branch” fields
  datapath control fields
Control design reduces to microprogramming.
Control is more complicated with:
  complex instruction sets
  restricted datapaths (see the book)
Simple instruction set and powerful datapath => simple control
71
Exceptions
[Figure: an Exception transfers control from the user program to the System Exception Handler, which later returns from exception to the user program]
Normal control flow: sequential, jumps, branches, calls, returns
Exception = unprogrammed control transfer
  system takes action to handle the exception
    must record the address of the offending instruction
    record any other information necessary to return afterwards
  returns control to user
    must save & restore user state
72
Two Types of Exceptions: Interrupts and Traps
Interrupts
  caused by external events: Network, Keyboard, Disk I/O, Timer
  asynchronous to program execution
  Most interrupts can be disabled for brief periods of time
  Some (like “Power Failing”) are non-maskable (NMI)
  may be handled between instructions
  simply suspend and resume user program
Traps
  caused by internal events
    exceptional conditions (overflow)
    errors (parity)
    faults (non-resident page)
  synchronous to program execution
  condition must be remedied by the handler
  instruction may be retried or simulated and program continued, or program may be aborted
73
Big Picture: user / system modes
Two modes of execution (user/system):
  The operating system runs in privileged mode and has access to all of the resources of the computer.
  It presents “virtual resources” to each user that are more convenient than the physical resources:
    files vs. disk sectors
    virtual memory vs. physical memory
  It protects each user program from others.
  It protects the system from malicious users.
  The OS is assumed to “know best”, and is trusted code, so enter system mode on exception.
Exceptions allow the system to take action in response to events that occur while the user program is executing:
  Might provide supplemental behavior (dealing with denormal floating-point numbers, for instance).
  “Unimplemented instruction” is used to emulate instructions that were not included in hardware (e.g., MicroVAX).
74
Additions to MIPS ISA to support Exceptions?
Exception state is kept in “coprocessor 0”.
  Use mfc0 to read the contents of these registers.
  Every register is 32 bits, but may be only partially defined.
BadVAddr (register 8): contains the memory address at which the offending memory reference occurred
Status (register 12): interrupt mask and enable bits
Cause (register 13): the cause of the exception
  Bits 5 to 2 of this register encode the exception type (e.g., undefined instruction = 10 and arithmetic overflow = 12)
EPC (register 14): address of the affected instruction (register 14 of coprocessor 0)
Control signals to write BadVAddr, Status, Cause, and EPC
Be able to write the exception address into PC ( hex)
May have to undo PC = PC + 4, since we want EPC to point to the offending instruction (not its successor): PC = PC - 4
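Extracting the exception type from the Cause register is a shift and a mask. A minimal sketch covering only the two codes named on the slide; the function names and the name table are assumptions for illustration:

```python
# Exception codes named on the slide (bits 5 to 2 of the Cause register).
EXCEPTION_NAMES = {
    10: "undefined instruction",
    12: "arithmetic overflow",
}


def exception_code(cause):
    """Return the 4-bit exception type held in bits 5..2 of a 32-bit Cause value."""
    return (cause >> 2) & 0xF


def describe(cause):
    """Map a Cause value to a readable name, or 'unknown' if it is not listed here."""
    return EXCEPTION_NAMES.get(exception_code(cause), "unknown")
```

A handler would read Cause with mfc0 and branch on this field; all other bits of the register are ignored by the mask.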
75
Example: How Control Handles Traps in our FSD
Undefined instruction: detected when no next state is defined from state 1 for the op value.
  We handle this exception by defining the next state value for all op values other than lw, sw, 0 (R-type), jmp, beq, and ori as new state 12.
  Shown symbolically using “other” to indicate that the op field does not match any of the opcodes that label arcs out of state 1.
Arithmetic overflow: detected on ALU ops such as signed add.
  Used to save PC and enter exception handler.
External interrupt: flagged by an asserted interrupt line.
  Again, must save PC and enter exception handler.
Note: The challenge in designing the control of a real machine is to handle the different interactions between instructions and other exception-causing events such that the control logic remains small and fast.
  Complex interactions make the control unit the most challenging aspect of hardware design.
76
How to add traps and interrupts to the state diagram?
0000 “instruction fetch”: IR <= MEM[PC]; PC <= PC + 4
  Pending INT => Handle Interrupt: EPC <= PC - 4; PC <= exp_addr; cause <= 0 (INT)
0001 “decode”: S <= PC + SX
  other => undefined instruction: EPC <= PC - 4; PC <= exp_addr; cause <= 10 (RI)
Execute:
  0010 BEQ:    S <= A - B; If A = B then PC <= S
  0100 R-type: S <= A fun B
  0110 ORi:    S <= A op ZX
  1000 LW:     S <= A + SX
  1011 SW:     S <= A + SX
  overflow => EPC <= PC - 4; PC <= exp_addr; cause <= 12 (Ovf)
Memory:
  1001 LW: M <= MEM[S]
  1100 SW: MEM[S] <= B
Write-back:
  0101 R-type: R[rd] <= S
  0111 ORi:    R[rt] <= S
  1010 LW:     R[rt] <= M
Interrupts are precise because user-visible state is committed after exceptions are flagged!
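The three exception arcs perform the same register transfers with different cause values. A minimal sketch, assuming PC has already been incremented by the fetch state (hence EPC <= PC - 4); the exception address is passed in as a parameter because its value is not given here, and the names `take_exception` and `CAUSE_*` are hypothetical:

```python
MASK32 = 0xFFFFFFFF

# Cause values used on the state diagram.
CAUSE_INT = 0    # external interrupt
CAUSE_RI = 10    # undefined (reserved) instruction
CAUSE_OVF = 12   # arithmetic overflow


def take_exception(cpu, cause, exp_addr):
    """Apply EPC <= PC - 4; cause <= code; PC <= exp_addr.

    cpu:      dict with at least 'pc'; 'epc' and 'cause' are written here
    exp_addr: handler entry address (hypothetical value supplied by the caller)
    """
    cpu["epc"] = (cpu["pc"] - 4) & MASK32  # address of the offending instruction
    cpu["cause"] = cause
    cpu["pc"] = exp_addr & MASK32
    return cpu
```

Because the register file and memory are written only in later states, taking the exception here leaves the user-visible state untouched, which is what makes the exceptions precise.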
77
Where to get more information?
D. Patterson, “Microprogramming,” Scientific American, March 1983.
D. Patterson and D. Ditzel, “The Case for the Reduced Instruction Set Computer,” Computer Architecture News 8, 6 (October 15, 1980).