COMPUTER ARCHITECTURE
Processor: Finite State Machine & Microprogramming
(Based on text: David A. Patterson & John L. Hennessy, Computer Organization and Design: The Hardware/Software Interface, 3rd Ed., Morgan Kaufmann, 2007)
COURSE CONTENTS
- Introduction
- Instructions
- Computer Arithmetic
- Performance
- Processor: Datapath
- Processor: Control
- Pipelining Techniques
- Memory
- Input/Output Devices
PROCESSOR: DATAPATH & CONTROL
- Multicycle Datapath & Control
- Control: Finite State Machine
- Control: Microprogramming
Defining the Control for Multi-Cycle Datapath
Multi-cycle vs. single-cycle datapath:
- for single-cycle, truth tables specify the setting of the control signals based on the instruction
- for multi-cycle, control is more complex because each instruction is executed in steps; control must specify both the control signals in any step and the next step in the sequence
The values of the control signals depend on:
- which instruction is being executed
- which step is being performed
Two different control techniques:
- Finite state machine (FSM)
- Microprogramming
In both cases the implementation can be derived from the specification
Finite State Machine (FSM) Control
- Consists of a set of states and directions on how to change states
- Each state specifies a set of control signal outputs that are asserted when the machine is in that state
- Each state in the FSM takes 1 clock cycle
- The first two states (state 0 and state 1) are common to all instructions
- After state 1, the signals asserted depend on the instruction (this process is called instruction decoding)
- After the last step (state) of an instruction, the FSM returns to state 0 to begin fetching the next instruction
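The state-to-state behaviour described above can be sketched in a few lines of Python. This is only an illustration, assuming the state numbering 0 to 9 used elsewhere in these slides; the table names, the function names and the instruction-class strings are made up for the example.

```python
# Hypothetical sketch of the FSM next-state function (state numbers as in these slides).
FETCH, DECODE, ADDRESS = 0, 1, 2

# State entered after decode (state 1), chosen by instruction class.
AFTER_DECODE = {"lw": 2, "sw": 2, "rtype": 6, "beq": 8, "j": 9}
# State entered after memory address computation (state 2).
AFTER_ADDRESS = {"lw": 3, "sw": 5}
# States with a single, fixed successor.
FIXED_NEXT = {0: 1, 3: 4, 6: 7, 4: 0, 5: 0, 7: 0, 8: 0, 9: 0}


def next_state(state, instr):
    """Return the state that follows `state` for instruction class `instr`."""
    if state == DECODE:
        return AFTER_DECODE[instr]
    if state == ADDRESS:
        return AFTER_ADDRESS[instr]
    return FIXED_NEXT[state]


def trace(instr):
    """List the states visited by one instruction, one state per clock cycle."""
    states = [FETCH]
    while True:
        nxt = next_state(states[-1], instr)
        if nxt == FETCH:  # back to state 0: the instruction is finished
            return states
        states.append(nxt)


for name in ("lw", "sw", "rtype", "beq", "j"):
    print(name, trace(name), "cycles:", len(trace(name)))
```

Since every state takes one clock cycle, the length of each trace is the cycle count of that instruction class.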
The Complete FSM Control
Graphical specification: [Figure: state diagram of the complete FSM control, states 0 to 9]
CPI in Multi-Cycle CPU
Example:
CPI = 0.22 x 5 + 0.11 x 4 + 0.49 x 4 + 0.16 x 3 + 0.02 x 3
    = 1.1 + 0.44 + 1.96 + 0.48 + 0.06
    = 4.04
Better than the worst-case CPI of 5 (the CPI if every instruction took the same number of cycles as the longest one)
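The weighted sum in this example is easy to check mechanically. The sketch below uses only the (fraction, cycles) pairs from the slide; the slide does not say which instruction class each pair belongs to, so the pairs are left unlabeled, and the function name is illustrative.

```python
# (fraction of executed instructions, clock cycles) pairs from the example.
MIX = [(0.22, 5), (0.11, 4), (0.49, 4), (0.16, 3), (0.02, 3)]


def average_cpi(mix):
    """Weighted average of cycle counts; the fractions must sum to 1."""
    total_fraction = sum(fraction for fraction, _ in mix)
    assert abs(total_fraction - 1.0) < 1e-9, "fractions must sum to 1"
    return sum(fraction * cycles for fraction, cycles in mix)


cpi = average_cpi(MIX)
worst_case = max(cycles for _, cycles in MIX)
print(f"CPI = {cpi:.2f}, worst case = {worst_case}")  # prints "CPI = 4.04, worst case = 5"
```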
FSM Controller Implementation
Typically implemented as a block of combinational logic plus a state register that holds the current state
[Figure: combinational control logic with state register]
- Total of 10 states (0 to 9) --> 4-bit state register
- Combinational control logic:
  - Inputs: current state and any input used to determine the next state (in this case the 6-bit opcode)
  - Outputs: next state number and control signals to be asserted for the current state
- Note: here the control-signal outputs depend only on the current state, not on the inputs (Moore machine)
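PLA Implementation of the Combinational Control Logic
If I picked a horizontal or a vertical line, could you explain it?
Note: the upper half is the AND plane and the lower half is the OR plane
[Figure: PLA with inputs Op5-Op0 and S3-S0; outputs are the datapath control signals and NS3-NS0]
In the equations below, X' denotes NOT X.
Example: PCWrite = 1 if (current state is state 0) or (current state is state 9), i.e.,
  PCWrite = S3' S2' S1' S0' + S3 S2' S1' S0
Example: next-state bit 2, NS2 = 1 (i.e. the next state is 4, 5, 6, or 7) if (current state is 3) or (current state is 2 and op = 101011 (sw)) or (current state is 1 and op = 000000 (R-type)) or (current state is 6), i.e.,
  NS2 = S3' S2' S1 S0
      + S3' S2' S1 S0' Op5 Op4' Op3 Op2' Op1 Op0
      + S3' S2' S1' S0 Op5' Op4' Op3' Op2' Op1' Op0'
      + S3' S2 S1 S0'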
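The two example conditions can be written as Boolean functions and checked over all states. This is a minimal sketch: pc_write spells out the two AND-plane product terms on the state bits, while ns2 uses integer comparisons as shorthand for its four product terms. The function names are illustrative and are not part of any PLA tool.

```python
def bits(value, width):
    """Return the bits of `value`, most significant first."""
    return [(value >> i) & 1 for i in reversed(range(width))]


def pc_write(state):
    """PCWrite is asserted in state 0 (0000) or state 9 (1001)."""
    s3, s2, s1, s0 = bits(state, 4)
    state0 = (not s3) and (not s2) and (not s1) and (not s0)
    state9 = s3 and (not s2) and (not s1) and s0
    return int(bool(state0 or state9))


SW, RTYPE = 0b101011, 0b000000  # opcodes quoted in the example


def ns2(state, opcode):
    """Next-state bit 2: set when the next state is 4, 5, 6 or 7."""
    return int(
        state == 3
        or (state == 2 and opcode == SW)
        or (state == 1 and opcode == RTYPE)
        or state == 6
    )


print([s for s in range(10) if pc_write(s)])
```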
ROM Implementation of Combinational Control Logic
- Combinational control logic can be expressed as a truth table: inputs are the current state bits (S3 - S0) and opcode bits (Op5 - Op0); outputs are the control signals and next-state bits (NS3 - NS0)
- A ROM can be used to implement a truth table:
  - if the address (input) is m bits wide, we can address 2^m entries in the ROM
  - the outputs are the n bits of data that the address points to
Example (ROM with m = 3 address bits and n = 4 data bits):
  address   data
  0 0 0     0 0 1 1
  0 0 1     1 1 0 0
  0 1 0     1 1 0 0
  0 1 1     1 0 0 0
  1 0 0     0 0 0 0
  1 0 1     0 0 0 1
  1 1 0     0 1 1 0
  1 1 1     0 1 1 1
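A ROM used this way is just a lookup table indexed by the input bits. The sketch below reads the example's digits as eight rows, each a 3-bit address followed by a 4-bit data word (m = 3, n = 4); that reading and the function names are assumptions made for illustration.

```python
# ROM contents from the example: list index = 3-bit address, value = 4-bit data word.
ROM = [0b0011, 0b1100, 0b1100, 0b1000, 0b0000, 0b0001, 0b0110, 0b0111]


def rom_read(address, m=3):
    """Return the data word stored at `address` (an m-bit value)."""
    if not 0 <= address < 2 ** m:
        raise ValueError("address out of range")
    return ROM[address]


def output_column(bit):
    """Truth-table column of one output bit (bit 0 = rightmost) over all addresses."""
    return [(word >> bit) & 1 for word in ROM]


print(format(rom_read(0b101), "04b"))  # data word at address 101
print(output_column(3))                # leftmost output bit as a function of the address
```

Each output column is one Boolean function of the address bits, which is why a single ROM can implement several truth-table outputs at once.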
ROM Implementation of Combinational Control Logic
- How many inputs are there? 6 bits for the opcode + 4 bits for the current state = 10 address lines (i.e., 2^10 = 1024 different addresses)
- How many outputs are there? 16 datapath-control outputs + 4 next-state bits = 20 output bits
- ROM is 2^10 x 20 = 20K bits (and a rather unusual size)
- Rather wasteful, since lots of input combinations (addresses) will never occur: e.g. many opcodes are illegal, and some states (e.g. states 10 to 15) are illegal
ROM vs. PLA
Break up the table into two parts:
- 4 state bits tell you the 16 outputs: 2^4 x 16 bits of ROM
- 10 bits tell you the 4 next-state bits: 2^10 x 4 bits of ROM + small circuit
- Total: 4.3K bits of ROM + small circuit
PLA is much smaller:
- can share product terms
- only needs entries that produce an active output
- can take don't-cares into account
Size is (#inputs x #product-terms) + (#outputs x #product-terms)
For this example, PLA size is proportional to (10 x 17) + (20 x 17) = 510 PLA cells
A PLA cell is usually about the size of a ROM cell (bit), only slightly bigger
PLA is a much more efficient implementation for this control unit
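The size figures quoted for the single ROM, the split ROM and the PLA follow from two small formulas. A quick check in Python, with illustrative function names:

```python
def rom_bits(address_bits, data_bits):
    """A ROM needs one data word for every possible address."""
    return (2 ** address_bits) * data_bits


def pla_cells(inputs, outputs, product_terms):
    """AND plane (inputs x terms) plus OR plane (outputs x terms)."""
    return inputs * product_terms + outputs * product_terms


single_rom = rom_bits(10, 20)                  # 20480 bits = 20K
split_rom = rom_bits(4, 16) + rom_bits(10, 4)  # 256 + 4096 = 4352 bits, about 4.3K
pla = pla_cells(10, 20, 17)                    # 170 + 340 = 510 cells

print(single_rom, split_rom, pla)
```

The ROM cost grows exponentially with the number of address bits, while the PLA cost grows only with the number of product terms actually needed.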
Microprogramming Control
- If the assembly language instruction set becomes very large, the FSM could require hundreds to thousands of states and many arcs (sequences): very complex
- Complex control is better managed by microprogramming
Basic idea:
- All control signals in a cycle form a microinstruction; each microinstruction defines:
  - the set of datapath control signals that must be asserted in a given state (cycle)
  - the next microinstruction
- Executing a microinstruction = asserting the control signals specified
- A sequence of microinstructions forms a microprogram
- Each cycle, a microinstruction is fetched from the microprogram and executed
Microprogramming: designing the control as a program that implements machine instructions in terms of simpler microinstructions
- Each control state corresponds to a microinstruction
- Our basic FSM: 10 states --> 10 microinstructions
Microinstruction Format
- A microinstruction contains several fields + 1 label
- Each field specifies a non-overlapping set of control signals
- Signals that are never asserted simultaneously may share the same field
- The last field specifies how to choose the next microinstruction
- Label: some microinstructions have a label so that they can be branched to
- In our example, we have 7 fields + 1 label
  - 1st to 6th fields: control specification; 7th field: next microinstruction
A Microprogram Control Unit
- Microinstructions are placed in a ROM or PLA
- The state (in the state register) enters as the input or address that selects the current microinstruction, which in turn asserts the relevant control signals
- The state changes at the clock edge
- Sequencing: ways to choose the next microinstruction (next state):
  - increment the current address/state (AddrCtl selects the +1 adder) (Seq)
  - branch to the microinstruction that begins execution of the next MIPS instruction (AddrCtl selects address 0) (Fetch)
  - choose the next microinstruction based on the opcode (AddrCtl selects a dispatch table) (Dispatch)
[Figure: control unit built from a PLA or ROM, a state register, an adder, and address select logic driven by AddrCtl and Op[5-0]]
A Review of Our State Diagram
Graphical specification: [Figure: state diagram of the complete FSM control, states 0 to 9]
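Sequencing: Address Select Logic
[Figure: address select logic: a mux controlled by AddrCtl selects among the incremented state, dispatch ROM 1, dispatch ROM 2, and 0]

Dispatch ROM 1
  Op       Opcode name   Value
  000000   R-format      0110
  000010   jmp           1001
  000100   beq           1000
  100011   lw            0010
  101011   sw            0010

Dispatch ROM 2
  Op       Opcode name   Value
  100011   lw            0011
  101011   sw            0101

  State number   Address-control action      Value of AddrCtl
  0              Use incremented state       3
  1              Use dispatch ROM 1          1
  2              Use dispatch ROM 2          2
  3              Use incremented state       3
  4              Replace state number by 0   0
  5              Replace state number by 0   0
  6              Use incremented state       3
  7              Replace state number by 0   0
  8              Replace state number by 0   0
  9              Replace state number by 0   0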
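The address select logic can be sketched as a function of the current state, its AddrCtl value and the opcode. Two things here are assumptions rather than facts stated on the slide: the sw entry of dispatch ROM 1 is taken to be 0010 (state 2, the address-computation state shared with lw), and the AddrCtl values are taken to be 3 = increment, 1 = dispatch ROM 1, 2 = dispatch ROM 2, 0 = go to state 0. All Python names are illustrative.

```python
# Assumed AddrCtl encodings: 0 = go to state 0, 1 = dispatch ROM 1,
# 2 = dispatch ROM 2, 3 = use incremented state.
TO_FETCH, DISPATCH_1, DISPATCH_2, INCREMENT = 0, 1, 2, 3

DISPATCH_ROM_1 = {
    0b000000: 0b0110,  # R-format
    0b000010: 0b1001,  # jmp
    0b000100: 0b1000,  # beq
    0b100011: 0b0010,  # lw
    0b101011: 0b0010,  # sw (assumed: same address-computation state as lw)
}
DISPATCH_ROM_2 = {0b100011: 0b0011, 0b101011: 0b0101}  # lw, sw

# AddrCtl value of each of the 10 microinstructions, indexed by state number.
ADDR_CTL = [INCREMENT, DISPATCH_1, DISPATCH_2, INCREMENT, TO_FETCH,
            TO_FETCH, INCREMENT, TO_FETCH, TO_FETCH, TO_FETCH]


def next_address(state, opcode):
    """Model of the mux: pick the next state according to AddrCtl."""
    ctl = ADDR_CTL[state]
    if ctl == INCREMENT:
        return state + 1
    if ctl == DISPATCH_1:
        return DISPATCH_ROM_1[opcode]
    if ctl == DISPATCH_2:
        return DISPATCH_ROM_2[opcode]
    return 0


def run(opcode):
    """States visited while executing one instruction with this opcode."""
    state, visited = 0, []
    while True:
        visited.append(state)
        state = next_address(state, opcode)
        if state == 0:
            return visited


print(run(0b100011))  # lw
```

Compared with a full next-state truth table, only the two dispatch ROMs depend on the opcode; every other transition needs just the 2-bit AddrCtl field.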
A Microprogram Control Unit
[Figure: a microprogram control unit controlling the datapath, with the sequencer and the microcode storage marked]
- The ROM or PLA is now the microcode memory (control memory)
- The state register is now the microprogram counter (µPC)
A Review of Datapath & Control
[Figure: multicycle datapath with its control signals]
Note the reason for each control signal; also note that we have included the jump instruction
A Review of the Instruction Execution Steps
1. IR <= Memory[PC]; PC <= PC + 4; (State 0)
2. Instruction decode (all instructions): A <= Reg[IR[25:21]]; B <= Reg[IR[20:16]]; ALUOut <= PC + (sign-extend(IR[15:0]) << 2); (State 1)
3. Depending on the instruction:
   - Memory address computation (for lw, sw): ALUOut <= A + sign-extend(IR[15:0]); (State 2)
   - ALU (R-type): ALUOut <= A op B; (State 6)
   - Conditional branch: if (A==B) then PC <= ALUOut; (State 8)
   - Jump: PC <= PC[31:28] || (IR[25:0]<<2); (State 9)
4. For lw or sw instructions (access memory): MDR <= Memory[ALUOut]; (State 3) or Memory[ALUOut] <= B; (State 5)
   For ALU (R-type) instructions (write result to register): Reg[IR[15:11]] <= ALUOut; (State 7)
5. For lw instruction only (write data from MDR to register): Reg[IR[20:16]] <= MDR; (State 4)
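The address arithmetic in states 1 and 9 (sign extension, shift by 2, concatenation with the upper PC bits) is easy to get wrong by hand. The following is a minimal sketch of those register transfers on 32-bit values; the helper names and the sample instruction words are made up for illustration.

```python
MASK32 = 0xFFFFFFFF


def sign_extend16(value):
    """Sign-extend a 16-bit field to 32 bits (result kept as an unsigned int)."""
    value &= 0xFFFF
    return (value | 0xFFFF0000) if value & 0x8000 else value


def branch_target(pc, ir):
    """State 1: ALUOut <= PC + (sign-extend(IR[15:0]) << 2); PC was already incremented in state 0."""
    return (pc + (sign_extend16(ir & 0xFFFF) << 2)) & MASK32


def jump_target(pc, ir):
    """State 9: PC <= PC[31:28] || (IR[25:0] << 2)."""
    return (pc & 0xF0000000) | ((ir & 0x03FFFFFF) << 2)


# Branch with offset -1 from an instruction at 0x00400000 (PC already holds 0x00400004).
print(hex(branch_target(0x00400004, 0x1000FFFF)))
# Jump with a 26-bit field of 0x0100000.
print(hex(jump_target(0x00400004, 0x08100000)))
```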
A Symbolic Microprogram
- A specification methodology appropriate if there are hundreds of opcodes, modes, cycles, etc.
- Signals are specified symbolically using microinstructions
  - E.g. Read PC = read memory using PC as the address and write the result into IR (and MDR) (see next slide for details)
- Our symbolic microprogram with 10 microinstructions: [Table: symbolic microprogram]
- Microassembler: performs checks to remove combinations that cannot be supported in the datapath
Control Signals for Each Symbol in Each Field in the Microprogram
Maximally vs Minimally Encoded
No encoding of control signals in the microinstruction format (horizontal microprogram):
- 1 bit for each control signal in the datapath operation; e.g. control signals s, t, u, v, w, x, y, z occupy 8 bits in the microinstruction
- faster, but requires more memory (logic)
- used for the VAX 780: an astonishing 400K of control memory!
Lots of encoding of control signals in the microinstruction format (vertical microprogram):
- e.g. s, t, u, v, w, x, y, z are encoded in, say, 4 bits, with 0000 meaning u = 1 (others = 0), 1010 meaning u = w = 1 (others = 0), etc.; i.e. each combination of signals that is actually used gets its own code
- the microinstructions are sent through logic to get the control signals
- uses less memory, but slower
Select a good trade-off
Microcode implementation: on-chip vs off-chip
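The difference between the two formats can be seen in a short sketch. It uses the eight signal names and the two 4-bit codes quoted on the slide; the bit order of the horizontal word (s leftmost, z rightmost) and all Python names are assumptions made for illustration.

```python
SIGNALS = ["s", "t", "u", "v", "w", "x", "y", "z"]


def horizontal_decode(word):
    """Horizontal format: the 8-bit field is the control word, one bit per signal."""
    width = len(SIGNALS)
    return {name: (word >> (width - 1 - i)) & 1 for i, name in enumerate(SIGNALS)}


# Vertical format: a 4-bit code selects one permitted combination of signals.
# Only the two codes mentioned on the slide are filled in here.
VERTICAL_CODES = {0b0000: {"u"}, 0b1010: {"u", "w"}}


def vertical_decode(code):
    """The extra decoding step that makes the vertical format slower."""
    asserted = VERTICAL_CODES[code]
    return {name: int(name in asserted) for name in SIGNALS}


print(vertical_decode(0b1010))
print(horizontal_decode(0b00101000))  # same signals, but 8 bits instead of 4
```

The vertical form stores 4 bits per microinstruction instead of 8 but needs the lookup (decode logic) before the signals reach the datapath.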
Exceptions
- Exception: unexpected event from within the processor (e.g. arithmetic overflow)
- Interrupt: "unexpected" event from outside the processor (e.g. from an I/O device)
- An exception or an interrupt causes an unexpected change in control flow: how does the control unit handle an exception/interrupt?
- In case of an exception, the processor should:
  - save the address of the offending instruction in the exception program counter (EPC)
  - indicate the reason for the exception in the Cause register (a status register)
  - transfer control to the operating system at some specified address (the OS can then provide some service: taking a predefined action in response to overflow, or stopping the program and reporting an error)
- If the OS continues program execution, it uses EPC to determine where to restart
- Another way is vectored interrupts: the address to which control is transferred is determined by the cause of the exception

  Type of event                   From where   Terminology
  I/O device request              External     Interrupt
  Invoke OS from user program     Internal     Exception
  Arithmetic overflow             Internal     Exception
  Using undefined instruction     Internal     Exception
  Hardware malfunctions           Either       Exception or interrupt
Exception Handling by the Control Unit
- Two more control signals: EPCWrite and CauseWrite; also IntCause
- Modify the mux feeding the PC into a 4-way mux so that the exception address can be loaded into the PC (the exception address is the OS entry point for exception handling, 8000 0180 hex for MIPS)
- To handle two types of exceptions (undefined instruction and arithmetic overflow), add two states to the state diagram to do the above:
  - one entered when no state is defined for the op value at state 1 (state 10)
  - the other entered when overflow is detected from the ALU in state 7 (state 11)
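What the two added states do can be sketched as plain register updates. This is an illustration only: the cause codes 0 and 1, the dictionary standing in for the processor registers, and the function names are all assumptions; only the exception address and the state numbers 10 and 11 come from the slide.

```python
EXCEPTION_ADDRESS = 0x80000180  # OS entry point given on the slide

# Cause codes are assumed for illustration; the slide does not give them.
CAUSE_UNDEFINED_INSTRUCTION = 0
CAUSE_OVERFLOW = 1


def exception_state(state, opcode_defined, overflow):
    """Return the exception state to enter from `state`, or None if there is none."""
    if state == 1 and not opcode_defined:
        return 10  # undefined instruction detected at decode
    if state == 7 and overflow:
        return 11  # arithmetic overflow reported by the ALU
    return None


def take_exception(cpu, cause, offending_pc):
    """Effect of state 10 or 11 on the registers."""
    cpu["EPC"] = offending_pc      # EPCWrite
    cpu["Cause"] = cause           # CauseWrite, value chosen by IntCause
    cpu["PC"] = EXCEPTION_ADDRESS  # extra input of the PC mux
    return cpu


cpu = {"PC": 0x00400008, "EPC": 0, "Cause": 0}
if exception_state(7, opcode_defined=True, overflow=True) == 11:
    take_exception(cpu, CAUSE_OVERFLOW, offending_pc=0x00400004)
print({name: hex(value) for name, value in cpu.items()})
```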
Chapter Summary
Part 1:
- Elements of datapath: instruction subset, resources, clocking method
- Datapath for different instruction classes
- Building single-cycle datapath: multiplexors, functional units, control signals
- Single-cycle datapath control unit logic: ALU control, main control
- Single-cycle datapath & control: complete picture, critical path, problems
Part 2:
- Multi-cycle datapath: approach, additional registers & multiplexors, control signals
- Breaking instructions into execution steps
- Multi-cycle datapath & control: complete picture
- Finite state machine (FSM) (hardwired) control & controller implementation
- Microprogramming: control, microinstruction format, controller implementation, symbolic microprogram & its control signals, issues
- Exception handling