CS 704 Advanced Computer Architecture

Lecture 9: Computer Hardware Design (Multi Cycle and Pipeline - Datapath and Control Design)
Prof. Dr. M. Ashraf Chughtai
Welcome to the 9th lecture of the series on Advanced Computer Architecture. Today we will continue the review discussion on the hardware design of computers.

Lecture 9 – Computer Hardware Design (3)
Today’s Topics
- Recap: multi cycle datapath and control
- Features of multi cycle design
- Multi cycle control design
- Introduction to pipeline datapath
- Summary
After a quick review of the previous lectures on Instruction Set principles, we will start our discussion on hardware design principles.
MAC/VU-Advanced Computer Architecture Lecture 9 – Computer Hardware Design (3)

Recap: Lecture 8
Information flow and control signals for the single cycle datapath to execute:
- Add/Subtract Instruction
- Immediate Instruction
- Load/Store Instructions
- Control Instructions
Analysis of the single cycle datapath: how effectively are the different sections used?

How effectively are the different sections used?
- Memory is used twice, at different times (i.e., for Instruction Fetch and for Load or Store)
- The adders in the IF section are used once, for a fraction of the time (Fetch phase)
- The ALU is used for the execution of R-type instructions and for memory address calculation
Conclusion: we can reduce the hardware without hurting performance by using extra control

Multiple Cycle Approach
[Timing diagram: one instruction spread over clock cycles 1 to 5: Ifetch, ID/Reg, Exec, Mem, Wr]
The operations of the single cycle are performed in five steps:
1. Instruction Fetch
2. Instruction Decode and Register Read
3. Execute (R-type or I-type operation, or address calculation for Load/Store/Branch)
4. Memory (Read/Write)
5. Write (to register file)

Multiple Cycle Approach
In the Single Cycle implementation, the cycle time is set to accommodate the longest instruction, the Load instruction.
In the Multiple Cycle implementation, the cycle time is set to accommodate the longest step, the memory read/write.
Consequently, the cycle time of the Single Cycle implementation can be up to five times longer than that of the Multiple Cycle implementation.
As an example, if T = 5 µsec for the single cycle implementation, then T = 1 µsec for the multi cycle implementation.
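The relation between the two cycle times can be sketched as follows. This is only an illustration: the per-step latencies are hypothetical values, chosen equal so that the result matches the T = 5 µsec versus T = 1 µsec example, and the function names are not from the lecture.

```python
# Hypothetical latency of each of the five steps, in microseconds.
# Equal values are an assumption made to reproduce the lecture's example.
STEP_LATENCY_US = {
    "fetch": 1.0,
    "decode": 1.0,
    "execute": 1.0,
    "memory": 1.0,
    "writeback": 1.0,
}


def single_cycle_period(latency):
    """One cycle must cover every step of the longest instruction (Load)."""
    return sum(latency.values())


def multi_cycle_period(latency):
    """One cycle only has to cover the slowest individual step."""
    return max(latency.values())


t_single = single_cycle_period(STEP_LATENCY_US)
t_multi = multi_cycle_period(STEP_LATENCY_US)
print(t_single, t_multi)
```

If the steps are not equally long, the ratio between the two periods drops below five, which is why the factor of five is an upper bound for a five-step design.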

Single Cycle vs. Multiple Cycle
[Timing diagram: in the multiple cycle implementation, the Load, Store and R-type instructions proceed step by step (Ifetch, ID/Reg, Exec, Mem, Wr) across clock cycles 1 to 10; in the single cycle implementation, Load and Store each occupy one long cycle, with the unused part of the cycle marked "Waste".]

Single Cycle vs. Multiple Cycle: Explanation
For different classes of instructions, the multi cycle implementation may take 3, 4 or 5 cycles to fetch and execute an instruction.
To compare the performance of the single cycle and multi cycle implementations, let us consider a program segment comprising three instructions, in the sequence:
- Load
- Store
- R-type (say Add)

The execution time for these three instructions using the single cycle implementation, with a cycle length of 5 µsec, is:
T exe = 3 x 5 µsec = 15 µsec.
Note that the cycle time is just long enough for the Load instruction, but it is too long for the Store and R-type instructions. So the last (write) part of the cycle in the case of Store, and the 4th (memory) part in the case of the R-type instruction, is wasted.

In the multi cycle implementation, Load completes in 5 cycles, and Store and R-type each take 4 cycles. Thus, these three instructions take 5 + 4 + 4 = 13 cycles. If the cycle length is 1 µsec, the execution time for the three instructions is:
T exe = 13 x 1 µsec = 13 µsec.
Conclusion: the multi cycle implementation is 15/13 ≈ 1.15 times faster.
Next: high-level view of the multi cycle datapath
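The comparison above is simple arithmetic and can be checked with a short script. The cycle counts and cycle times are the ones used in this example; the dictionary and function names are illustrative only.

```python
# Cycles per instruction class in the multi cycle design, as used in the example.
CYCLES = {"load": 5, "store": 4, "rtype": 4}

SINGLE_CYCLE_TIME_US = 5.0  # one long cycle per instruction
MULTI_CYCLE_TIME_US = 1.0   # one short cycle per step


def single_cycle_exec_time(program):
    """Every instruction takes exactly one long cycle."""
    return len(program) * SINGLE_CYCLE_TIME_US


def multi_cycle_exec_time(program):
    """Each instruction takes only as many short cycles as it needs."""
    return sum(CYCLES[kind] for kind in program) * MULTI_CYCLE_TIME_US


program = ["load", "store", "rtype"]
t_single = single_cycle_exec_time(program)
t_multi = multi_cycle_exec_time(program)
speedup = t_single / t_multi
print(t_single, t_multi, round(speedup, 2))
```

Changing the instruction mix in `program` shows how the advantage depends on how many instructions need fewer than five cycles.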

High Level View of Multiple Cycle Datapath
[Block diagram: PC; Memory (Address, instruction or data); Instruction Register; Register File (Rreg #, Wreg #); operand registers A and B; ALU; ALUout]
Putting it all together, here it is: the multiple cycle datapath we set out to build.

High level view of Multiple Cycle Datapath: Explanation
- A shared memory is used, as the instruction fetch and the data read/write are performed in different cycles.
- The single ALU is shared among instruction fetch (PC increment), execution of arithmetic and logic instructions, and address calculation, each in a different cycle.
- The use of a shared functional unit (the ALU) requires additional multiplexers or wider multiplexers.
- New temporary registers (Instruction Register, Memory Data Register, operand registers A and B, and ALUout) are included to hold information for use in a later cycle. E.g., data read from memory in cycle 4 is written to the register file in cycle 5 (Load); operand registers A and B, read in cycle 2, may be used in cycle 3 or 4; and so on.

Multiple Cycle Datapath Design
[Datapath diagram]
Control signals: PCWr, PCWrCond, PCSrc, BrWr, IRWr, IorD, MemWr, RegDst, RegWr, ALUSelA, ALUSelB, ALUOp, ExtOp, MemtoReg
Components: PC, Target register, Ideal Memory (RAdr, WrAdr, Din, Dout), IR, MDR, Reg File (Ra, Rb, Rw, busA, busB, busW), A and B registers, ALU (with Zero output), ALU Control, ALU Out, Extend (Imm16 to 32 bits), shift left by 2 (<< 2), Mux 1 to Mux 6
Putting it all together, here it is: the multiple cycle datapath we set out to build.

Multiple Cycle Datapath Architecture
Minimized hardware: 1 memory, 1 adder
Cycle 1 - [Instruction Fetch]:
First, the select input of MUX-1 is set to IorD = 0, so the PC is connected to the memory read address input RAdr; the instruction is fetched from memory at Dout and placed in the Instruction Register by asserting IRWr [Yellow Path].
Second, the select input ALUSelA of MUX-3 is set to 0 and ALUSelB of MUX-5 is set to 00 to add 4 to the PC; then PCSrc of MUX-2 is set to 0 and PCWr is asserted to load PC+4 into the PC as the address of the next instruction.

Cycle 2 - [ID and Reg. Rd.]:
First, the instruction is decoded; the Rs, Rt, Rd and Imm16 fields are made available on their respective lines (shown in orange).
Second, the registers addressed by Rs and Rt are read onto buses A and B, respectively.

Cycle 3 - [Exe]: The select inputs ALUSelA and ALUSelB of MUX-3 and MUX-5, respectively, are set for the instruction in hand; the operation to perform is supplied at the ALUOp input of the ALU Control unit.
- For R-type instructions: ALUSelA = 1 and ALUSelB = 01 to connect bus A and bus B to the ALU to perform the operation [Green Path]

- For I-type and memory instructions: ALUSelA = 1 and ALUSelB = 11 to connect bus A and the sign-extended Imm16 to the ALU to perform the operation on immediate data [Red Path]
The ALU output is kept in the ALU Out register: as the result of the ALU operation in the case of an I-type instruction, and as the memory address in the case of the memory instructions Load/Store.

- For branch (control) instructions:
1: Condition test: ALUSelA = 1 and ALUSelB = 01; ALUOp = SUB. If the ALU output Zero = 1, then assert PCWrCond.
2: PC <= PC + 4 + [sign-extended Imm16 shifted left by 2 bits]: ALUSelA = 0; ALUSelB = 10. Assert BrWr, and set PCSrc of MUX-2 = 1 to pass the target address to the PC [Blue Path]
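The two branch steps (condition test by subtraction, then target address calculation) can be sketched in software. This is a behavioural illustration only, not a model of the actual hardware timing; the function names are hypothetical, and 32-bit addresses are assumed.

```python
def sign_extend_16(imm16):
    """Interpret the low 16 bits of imm16 as a signed two's-complement value."""
    imm16 &= 0xFFFF
    return imm16 - 0x10000 if imm16 & 0x8000 else imm16


def branch_next_pc(pc_plus_4, imm16, bus_a, bus_b):
    """Return the next PC for a branch-if-equal style instruction.

    pc_plus_4 is the PC value already incremented during instruction fetch.
    """
    # Step 1: condition test, the ALU subtracts and raises Zero on equality.
    zero = ((bus_a - bus_b) & 0xFFFFFFFF) == 0
    # Step 2: target = PC + 4 + (sign-extended Imm16 shifted left by 2 bits).
    target = (pc_plus_4 + (sign_extend_16(imm16) << 2)) & 0xFFFFFFFF
    # The PC is overwritten with the target only when the condition holds.
    return target if zero else pc_plus_4


print(hex(branch_next_pc(0x1004, 0x0003, 5, 5)))
print(hex(branch_next_pc(0x1004, 0xFFFF, 7, 7)))
print(hex(branch_next_pc(0x1004, 0x0003, 1, 2)))
```

The second call uses an immediate of -1, so the branch goes one word backwards from PC + 4.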

Cycle 4 - [Memory Instruction Load/Store]
- Load instruction: IorD = 1 to pass the ALUout register as the RAdr (read address) input to the memory, so that the data is read at Dout [Dark Green Path]
- Store instruction: MemWr is asserted; the ALUout register output is wired to WrAdr (write address input) [Dark Green Path], and bus B of the register file is wired to Din (data in) of the memory [Dark Blue Path]

Cycle 5 - [Write Back]
- R-type instruction: RegDst of MUX-4 = 1 to select Rd as the destination address; MemtoReg = 0 to connect ALUout to bus W; and RegWr is asserted
- I-type instruction: RegDst of MUX-4 = 0 to select Rt as the destination address; MemtoReg = 0 to connect ALUout to bus W; and RegWr is asserted

Cycle 5 - [Write Back]
- Load instruction: RegDst of MUX-4 = 0 to select Rt as the destination address; MemtoReg = 1 to connect Dout of the memory to bus W of the register file; and RegWr is asserted
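The write-back settings for the three instruction classes can be collected in one table and applied by a small behavioural model. This is a sketch under assumptions: the signal names follow the datapath figure, the register file is modelled as a plain Python list, and the helper name `write_back` is hypothetical.

```python
# Write-back control settings per instruction class, as described for cycle 5.
WRITE_BACK = {
    "rtype": {"RegDst": 1, "MemtoReg": 0, "RegWr": 1},
    "itype": {"RegDst": 0, "MemtoReg": 0, "RegWr": 1},
    "load":  {"RegDst": 0, "MemtoReg": 1, "RegWr": 1},
}


def write_back(kind, rt, rd, alu_out, mem_dout, regs):
    """Apply the cycle 5 control settings to a list-based register file."""
    ctl = WRITE_BACK[kind]
    dest = rd if ctl["RegDst"] else rt                 # destination select (MUX-4)
    bus_w = mem_dout if ctl["MemtoReg"] else alu_out   # data select onto bus W
    if ctl["RegWr"]:
        regs[dest] = bus_w
    return dest, bus_w


regs = [0] * 32
print(write_back("rtype", rt=8, rd=9, alu_out=111, mem_dout=222, regs=regs))
print(write_back("load", rt=8, rd=9, alu_out=111, mem_dout=222, regs=regs))
```

Keeping the settings in a table makes it easy to see that the three cases differ only in the two multiplexer selects.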

Multi Cycle Control design
Control may be designed in the following steps, using one of two initial representations:
- Finite State Machine: the sequence control is defined by explicit next-state functions, the logic is represented by logic equations, and PLAs are usually used to implement the machine.
- Micro-program: a micro-program counter and a dispatch ROM define the sequence control, the logic is represented by truth tables, and the control is implemented using ROM.

Multi Cycle Controller FSM Specifications
[State diagram, listed by state code]
- 0000 "instruction fetch": IR <= MEM[PC]; PC <= PC + 4
- 0001 "decode": A <= R[rs]; B <= R[rt]
Execute:
- 0010 (BEQ): S <= A - B
- 0011 (BEQ, Equal): PC <= PC + SX || 00; on ~Equal, return to instruction fetch
- 0100 (R-type): S <= A fun B
- 0110 (ORi): S <= A op ZX
- 1000 (LW): S <= A + SX
- 1011 (SW): S <= A + SX
Memory:
- 1001 (LW): M <= MEM[S]
- 1100 (SW): MEM[S] <= B
Write-back:
- 0101 (R-type): R[rd] <= S
- 0111 (ORi): R[rt] <= S
- 1010 (LW): R[rt] <= M
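The FSM specification above can be expressed as an explicit next-state function. The sketch below uses the state codes from the specification; assigning 0010 to the subtract state and 0011 to the branch-taken state is an assumption drawn from reading the flattened diagram, and the function names are illustrative.

```python
FETCH, DECODE = "0000", "0001"
BEQ_TEST, BEQ_TAKEN = "0010", "0011"  # assumed assignment of the two BEQ states

# First execute state reached from decode, selected by instruction class.
DISPATCH = {"BEQ": BEQ_TEST, "R-type": "0100", "ORi": "0110", "LW": "1000", "SW": "1011"}

# Unconditional successors; states not listed here return to instruction fetch.
NEXT = {"0100": "0101", "0110": "0111", "1000": "1001", "1001": "1010", "1011": "1100"}


def next_state(state, opcode, equal=False):
    """Explicit next-state function of the controller."""
    if state == FETCH:
        return DECODE
    if state == DECODE:
        return DISPATCH[opcode]
    if state == BEQ_TEST:
        return BEQ_TAKEN if equal else FETCH
    return NEXT.get(state, FETCH)


def trace(opcode, equal=False):
    """List the states visited by one instruction, starting at fetch."""
    states = [FETCH]
    while True:
        nxt = next_state(states[-1], opcode, equal)
        if nxt == FETCH:
            return states
        states.append(nxt)


for op in ("LW", "SW", "R-type", "ORi"):
    print(op, trace(op))
print("BEQ taken", trace("BEQ", equal=True))
```

The length of each trace is the number of cycles that instruction class needs: five for LW and four for SW, R-type and ORi.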

Micro program Controller
[Block diagram: Control Logic connected to the Multicycle Datapath (Outputs, Inputs); State Reg; Adder (+1); Address Select Logic; Opcode]

“Macroinstruction” Interpretation
[Diagram: Main Memory holds the user program plus data (this can change!), e.g., ADD, SUB, AND, ..., DATA. Each macroinstruction (e.g., AND) is mapped into a microsequence held in the CPU control memory, which drives the execution unit.]
Example microsequence: Fetch; Calc Operand Addr; Fetch Operand(s); Calculate; Save Answer(s)

Designing a Microinstruction Set
1) Start with the list of control signals
2) Group signals together that make sense (vs. random): these groups are called "fields"
3) Place the fields in some logical order (e.g., ALU operation and ALU operands first, microinstruction sequencing last)
4) Create a symbolic legend for the microinstruction format, showing the names of the field values and how they set the control signals (use computers to design computers)
5) To minimize the width, encode operations that will never be used at the same time
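Steps 2 to 5 can be illustrated by packing a few fields into one microinstruction word. The field names, widths and ordering below are hypothetical and chosen only for illustration; they are not the microinstruction format of any particular machine.

```python
# Hypothetical fields in a logical order: ALU operation and operand selects
# first, sequencing last. Each entry is (field name, width in bits).
FIELDS = [("alu_op", 2), ("alu_sel_a", 1), ("alu_sel_b", 2), ("seq", 2)]

WORD_WIDTH = sum(width for _, width in FIELDS)


def pack(values):
    """Encode a dict of field values into a single microinstruction word."""
    word = 0
    for name, width in FIELDS:
        value = values[name]
        if not 0 <= value < (1 << width):
            raise ValueError(f"{name}={value} does not fit in {width} bits")
        word = (word << width) | value
    return word


def unpack(word):
    """Decode a microinstruction word back into its fields."""
    values = {}
    for name, width in reversed(FIELDS):
        values[name] = word & ((1 << width) - 1)
        word >>= width
    return values


fetch = {"alu_op": 0, "alu_sel_a": 0, "alu_sel_b": 0, "seq": 1}
word = pack(fetch)
print(WORD_WIDTH, format(word, f"0{WORD_WIDTH}b"), unpack(word) == fetch)
```

Encoding `alu_sel_b` in 2 bits, rather than as four separate one-hot select lines, is an instance of step 5: its four choices are never used at the same time, so the encoded field saves two bits of width.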

Computer Architecture
Microprogramming
- Specialized state diagrams are easily captured by a microsequencer
  - simple increment and "branch" fields
  - datapath control fields
- Control design reduces to microprogramming
- Microprogramming is a fundamental concept
  - implement an instruction set by building a very simple processor and interpreting the instructions
  - essential for very complex instructions and when few register transfers are possible
  - overkill when the ISA matches the datapath 1-1
MAC/EE 443 Lecture 15 Computer Architecture

Microprogramming: inspiration for RISC
- If simple instructions could execute at a very high clock rate…
- If you could even write compilers to produce microinstructions…
- If most programs use simple instructions and addressing modes…
- If microcode is kept in RAM instead of ROM so as to fix bugs…
- If the same memory used for control memory could be used instead as a cache for "macroinstructions"…
Then why not skip instruction interpretation by a micro-program and simply compile directly into the lowest-level language of the machine? (Microprogramming is overkill when the ISA matches the datapath 1-1.)

Summary
- Single cycle versus multi cycle datapath
- Key components of the multi cycle datapath
- Design and information flow in the multi cycle datapath
- Multi cycle control unit design
  - Finite State Machine-based control unit
  - Micro-program-based controller

Assalam-u-Alaikum and Allah Hafiz
