1 Chapter 5: Datapath and Control CS 447 Jason Bakos.

Presentation transcript:

1 Chapter 5: Datapath and Control CS 447 Jason Bakos

2 Review of Digital Logic Review AND, OR, NOT, and XOR gates Review negative-logic (inverted) inputs and outputs –NAND, NOR, XNOR –Sum-of-products with NAND gates –Product-of-sums with NOR gates “Double-bubble” cancellation DeMorgan’s Law –Completeness of NAND and NOR gates Review of muxes and decoders Boolean algebra equations vs. digital logic gate schematics Review of truth tables –Product-of-sums

3 Review of Digital Logic Logic minimization –Boolean algebra
Identity Law –A+0=A and A*1=A
Zero and One Laws –A+1=1 and A*0=0
Inverse Laws –A + (not A)=1 and A*(not A)=0
Commutative Laws –A+B=B+A and A*B=B*A
Associative Laws –A+(B+C)=(A+B)+C and A*(B*C)=(A*B)*C
Distributive Laws –A*(B+C)=AB+AC and A+(B*C)=(A+B)*(A+C)
DeMorgan’s Law –not (A+B)=(not A)*(not B) and not(A*B)=(not A)+(not B)
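
The laws above can be checked by brute force, since a two- or three-variable identity has only a handful of input combinations. The following is a minimal sketch; the helper name `holds` is made up for this illustration.

```python
from itertools import product

def holds(law, n_vars):
    """Return True if law(*bits) is true for every True/False assignment."""
    return all(law(*bits) for bits in product((False, True), repeat=n_vars))

# DeMorgan's Law, both forms
demorgan_or = lambda a, b: (not (a or b)) == ((not a) and (not b))
demorgan_and = lambda a, b: (not (a and b)) == ((not a) or (not b))

# Distributive Law, the less intuitive form: A+(B*C) = (A+B)*(A+C)
distributive = lambda a, b, c: (a or (b and c)) == ((a or b) and (a or c))

print(holds(demorgan_or, 2), holds(demorgan_and, 2), holds(distributive, 3))
```

A candidate identity that is not a law, such as A+B=A*B, makes `holds` return False.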

4 Review of Digital Logic –Review Karnaugh Map logic minimization mux2 example –Review “don’t care” logic minimization mux2 example –Review Boolean algebra logic minimization mux2 example

5 Memory Devices Consider cross-coupled NOR gates –This is the simplest memory device, called an SR latch
R S | Q+ Qb+ | comment
0 0 | Q  Qb  | hold
0 1 | 1  0   | set
1 0 | 0  1   | reset
1 1 | 0  0   | “invalid”
Let’s eliminate the S input and provide a clock input In this configuration, the clock acts as an “enable” and is a level-sensitive clock
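
The behavior of the cross-coupled NOR gates can be sketched in software by re-evaluating both gates until the outputs stop changing. This is an illustrative model only: the function names are made up here, and both gates are assumed to update at the same time.

```python
def nor(a, b):
    return int(not (a or b))

def sr_latch(r, s, q, qb):
    """Settle cross-coupled NOR gates starting from outputs (q, qb)."""
    for _ in range(4):  # a few passes are enough for these cases to settle
        q_new = nor(r, qb)   # Q  = NOR(R, Qb)
        qb_new = nor(s, q)   # Qb = NOR(S, Q)
        if (q_new, qb_new) == (q, qb):
            break
        q, qb = q_new, qb_new
    return q, qb

print(sr_latch(0, 1, 0, 1))  # set
print(sr_latch(1, 0, 1, 0))  # reset
print(sr_latch(0, 0, 1, 0))  # hold
print(sr_latch(1, 1, 1, 0))  # both inputs high: Q and Qb are no longer complements
```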

6 Memory Devices Clocked memory devices are divided into two categories: –Latches are level-sensitive devices where the output samples the input the entire time the clock signal is high: Latches are “transparent”, they are open whenever the clock is asserted –Flip-flops only sample the input on the rising or falling edge of the clock We only want state changes on one of the edges of the clock
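
The difference between the two categories shows up when the input changes while the clock is high. Below is a small behavioral sketch (not a gate-level design); `d_latch`, `DFlipFlop`, and `trace` are names invented for this example, and the flip-flop is assumed to be rising-edge triggered.

```python
def d_latch(clk, d, q):
    """Level-sensitive: the output follows d for as long as clk is high."""
    return d if clk else q

class DFlipFlop:
    """Rising-edge triggered: q changes only when clk goes from 0 to 1."""
    def __init__(self):
        self.q = 0
        self.prev_clk = 0

    def step(self, clk, d):
        if clk and not self.prev_clk:
            self.q = d
        self.prev_clk = clk
        return self.q

def trace(clks, ds):
    """Return a list of (latch_q, flip_flop_q) pairs, one per time step."""
    latch_q, ff = 0, DFlipFlop()
    out = []
    for clk, d in zip(clks, ds):
        latch_q = d_latch(clk, d, latch_q)
        out.append((latch_q, ff.step(clk, d)))
    return out

# d changes twice while clk stays high (steps 2 and 3): only the latch follows it
print(trace([0, 1, 1, 1, 0, 1], [1, 1, 0, 1, 0, 0]))
```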

7 Memory Devices Here’s a master-slave approach to designing a falling-edge triggered FF Here’s a timing diagram for this device

8 Memory Devices Flip flops, depending on their design and technology, have set-up and hold times –Set-up time is the amount of time the input signal (D) must be stable prior to the clock edge that samples it –Hold time is the amount of time the input signal (D) must be stable after the clock edge

9 Memory Devices For the master-slave design, the set-up time was very long, which is why we need a better design –We won’t get into other ways to design edge-triggered flip-flops, but there are many with varying numbers of gates Usually the classic SR-latch acts as a building block for such devices –Flip-flops also have asynchronous sets/resets and sometimes enables –Some textbooks refer to the last design as a “pulse”-triggered flip-flop, since the input must be stable for the entire clock pulse

10 Finite State Machines (FSM) So far we’ve mainly done circuit design with combinational logic systems –Combinational logic circuits have an output that is some function of the inputs Next we’re going to start using sequential systems –Sequential circuits have an output that is some function of the inputs and their input history The first example of these is state machines

11 Finite State Machines (FSM) State machines can be either synchronous or asynchronous –Synchronous state machines only change state with a clock event (edge) –Asynchronous state machines do not have this restriction –We’ll start by building a synchronous state machine We’ll assume we have access to good positive-edge-triggered D flip-flop cells

12 Finite State Machines Here are two different representations of the FSM in digital logic:

13 Finite State Machines There are two different ways of designing state machines: Mealy and Moore –In all state machines, the next state (which will be the current state after the next clock edge) is computed as a combinational function of the current state and the inputs –The outputs, on the other hand, are computed either as a function of the current state or as a function of the current state AND the inputs (hence Moore vs. Mealy) Note: Moore is less, because Moore machines are restricted to synchronous outputs (outputs that only change on a clock edge) Mealy machines do not have this restriction

14 Finite State Machines In order to build a state machine, we must first have our input signals and output signals Then we start adding states and transitions –For a Mealy machine, the outputs will be on the transitions –For a Moore machine, the outputs will be in the states
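
To make the Moore/Mealy distinction concrete, here is a sketch of one detector built both ways. The example (assert the output when two consecutive 1s have been seen) and all names are assumptions chosen for illustration. In the Moore version the output table is indexed by state only; in the Mealy version it is indexed by (state, input), i.e. it sits on the transitions.

```python
def run_moore(bits):
    # Next state depends on (state, input); output depends on the state alone.
    nxt = {("S0", 0): "S0", ("S0", 1): "S1",
           ("S1", 0): "S0", ("S1", 1): "S2",
           ("S2", 0): "S0", ("S2", 1): "S2"}
    out = {"S0": 0, "S1": 0, "S2": 1}
    state, outputs = "S0", []
    for b in bits:
        state = nxt[(state, b)]        # clock edge
        outputs.append(out[state])     # output as seen after the edge
    return outputs

def run_mealy(bits):
    # Output depends on (state, input), so one fewer state is needed.
    nxt = {("S0", 0): "S0", ("S0", 1): "S1",
           ("S1", 0): "S0", ("S1", 1): "S1"}
    out = {("S0", 0): 0, ("S0", 1): 0, ("S1", 0): 0, ("S1", 1): 1}
    state, outputs = "S0", []
    for b in bits:
        outputs.append(out[(state, b)])  # output as seen before the edge
        state = nxt[(state, b)]
    return outputs

bits = [1, 1, 0, 1, 1, 1]
print(run_moore(bits))
print(run_mealy(bits))
```

Both functions report the same detections. In hardware the Mealy output would appear during the cycle in which the second 1 arrives, while the Moore output appears only after the following clock edge.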

15 Finite State Machines Next, we need to encode state values for each of our states –Try to minimize bit changes on state transitions –Recall: We’ll need at least ⌈lg n⌉ flip-flops if we have n states –Then, use Karnaugh maps to minimize our next-state and output logic –Note: we could use a state machine table (truth table)
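
As a quick sketch of the encoding step: the minimum flip-flop count is the ceiling of lg n, and a Gray code is one possible (assumed, not prescribed by the slides) way to get a single bit change between consecutive states of a simple cycle. The function names are made up for this example.

```python
def flip_flops_needed(n_states):
    """Minimum number of state bits for n_states states (at least 1)."""
    return max(1, (n_states - 1).bit_length())

def gray_code(i):
    """Binary-reflected Gray code of i: neighbours differ in exactly one bit."""
    return i ^ (i >> 1)

for n in (2, 5, 8, 9):
    print(n, "states ->", flip_flops_needed(n), "flip-flops")

# State codes for an 8-state cycle, printed as 3-bit strings
print([format(gray_code(i), "03b") for i in range(8)])
```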

16 Finite State Machine Examples First, let’s tackle an example –3 bit counter –Outputs: 3 counter bits (no inputs) Here’s another example –Let’s design a combination lock with 2-bit combination inputs and an enter key –The output will be an “unlock” signal Next, let’s do a Coke machine example (where a coke is 35 cents) –Inputs: quarter, dime, nickel –Output: release_coke
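
The Coke machine can be sketched as a state machine whose state is the amount of money inserted so far. This is only one possible design: it assumes one coin per clock, Mealy-style output, and no change returned on overpayment. None of these details are fixed by the slide, and the names are invented here.

```python
COIN_VALUE = {"nickel": 5, "dime": 10, "quarter": 25}
PRICE = 35  # cents, as stated in the example

def coke_step(state, coin):
    """Return (next_state, release_coke) for one coin input."""
    total = state + COIN_VALUE[coin]
    if total >= PRICE:
        return 0, 1   # assumption: overpayment is kept, no change is given
    return total, 0

def run(coins):
    state, outputs = 0, []
    for c in coins:
        state, release = coke_step(state, c)
        outputs.append(release)
    return outputs

print(run(["quarter", "dime"]))
print(run(["dime", "dime", "dime", "nickel"]))
```

Because the state only takes the values 0, 5, ..., 30, seven states (three flip-flops) are enough under these assumptions.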

17 Registers A register is simply an array of D flip-flops (8-bit, 32-bit, etc.) The important distinction between flip-flops and registers is that it is VERY important for registers to have enable inputs

18 Wide Multiplexors Wide multiplexors (not an official name) are simply an array of single muxes –For example, if we want a 32 bit 4-to-1 mux, we need to array 32 4-to-1 muxes Using state machine controllers, registers, and muxes, we can very easily implement control for a digital system
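
The "array of single muxes" idea can be written out directly: each output bit is chosen by its own 1-bit 4-to-1 mux, and all of them share the same select lines. A minimal sketch with made-up function names:

```python
def mux4_1bit(sel, d0, d1, d2, d3):
    """A single 4-to-1 mux on one bit."""
    return (d0, d1, d2, d3)[sel]

def wide_mux4(sel, a, b, c, d, width=32):
    """A width-bit 4-to-1 mux built from `width` 1-bit muxes sharing sel."""
    out = 0
    for i in range(width):
        bit = mux4_1bit(sel, (a >> i) & 1, (b >> i) & 1, (c >> i) & 1, (d >> i) & 1)
        out |= bit << i
    return out

print(hex(wide_mux4(2, 0x11111111, 0x22222222, 0xDEADBEEF, 0x0)))
```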

19 Example: Checksummer You are to design a device that accepts a data packet composed of a series of 8-bit words. The packet format is the following:
Field: synch. character 1 | synch. character 2 | length | data payload   | checksum
Width: 8 bits             | 8 bits             | 8 bits | ‘length’ bytes | 8 bits
Each 8-bit word is valid on the falling edge of each clock. The synch. characters signal the beginning of a new packet. Synch. character 1 is “ ” and synch. character 2 is “ ”. The length field specifies how many words are contained in the data portion of the packet. The data payload is the actual data payload of the packet (which can be anything). Your device will keep a running modulo 256 sum of these data words and compare that value to the value of the checksum field at the end of the packet.

20 Example: Checksummer Your device has the following input signals: –Clock – clock input –DataIn – 8-bit bus that puts a new character out on every falling edge of the clock –Reset – active-high reset The device will have the following output signals: –ChecksumError – this signal will be asserted for one clock cycle following the data input if there is a checksum error in the data packet. It must be valid on the rising edge that defines the end of the checksum word. –DataValid – this signal goes high on the rising edge that defines the beginning of the payload and goes low on the rising edge that defines the beginning of the checksum word.

21 Example: Checksummer First, what type of components do we need for this device? How do we design the state machine control? –There are too many signals to actually implement the controller on the board How do we interconnect this device?
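
One way to approach the state machine question is to model the controller behaviorally before drawing gates. The sketch below is an assumption-laden illustration, not the intended solution: the state names are invented, the values of `SYNC1` and `SYNC2` are placeholders because the transcript does not give the synch. characters, and the output timing (ChecksumError, DataValid) is reduced to one pass/fail result per packet.

```python
SYNC1, SYNC2 = 0xAA, 0x55  # placeholders only: the real synch. values are not given

def check_packets(words):
    """Return one bool per packet found: True means the checksum matched."""
    state, remaining, total = "WAIT_SYNC1", 0, 0
    results = []
    for w in words:
        if state == "WAIT_SYNC1":
            if w == SYNC1:
                state = "WAIT_SYNC2"
        elif state == "WAIT_SYNC2":
            # simplification: anything other than SYNC2 restarts the search
            state = "LENGTH" if w == SYNC2 else "WAIT_SYNC1"
        elif state == "LENGTH":
            remaining, total = w, 0
            state = "PAYLOAD" if w > 0 else "CHECKSUM"
        elif state == "PAYLOAD":
            total = (total + w) % 256      # running modulo-256 sum
            remaining -= 1
            if remaining == 0:
                state = "CHECKSUM"
        else:  # CHECKSUM
            results.append(w == total)
            state = "WAIT_SYNC1"
    return results

print(check_packets([SYNC1, SYNC2, 3, 1, 2, 3, 6]))
print(check_packets([SYNC1, SYNC2, 2, 200, 100, 0]))
```

In hardware, `remaining` would be a down-counter register, `total` an 8-bit accumulator register, and the comparison an 8-bit equality checker, with the FSM driving their enables.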

22 Chapter 5: Datapath and Control (Part 2) CS 447 Jason Bakos

23 Building a Datapath Which components do we need for the A/L, load, and branch classes of MIPS instructions? –First, we need a memory to hold our instructions Assume it has an address input, a data output, and MemRead and MemWrite control signals –A Program Counter (PC) register to hold the address of the next instruction Typical register (clk, en, rst, D, and Q) –ALU (the one we built in Chap. 4) A, B, ALUOp, and Out –Register file Dual-port (ReadAddr1, ReadAddr2, WriteReg, WriteData, RegWrite, ReadData1, ReadData2) –Instruction Register Like the PC, but holds the current instruction word

24 Building a Datapath

25 Datapaths Assuming our instruction is already fetched, using our components we need to build datapaths for the following: –PC=PC+4 –Executing A/L R-type instruction and writing back result –Executing load/store effective address calculation We need a sign extender for this –Computing a branch target address and determining whether or not a branch should be taken (for beq) We need a sign extender and a shifter that shifts left by 2 bits for this
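
The two address calculations above can be sketched in a few lines. The function names are made up for this example, and the branch target is computed relative to the already incremented PC, which is how MIPS defines it.

```python
def sign_extend16(imm16):
    """Interpret the low 16 bits as a two's-complement number."""
    imm16 &= 0xFFFF
    return imm16 - 0x10000 if imm16 & 0x8000 else imm16

def effective_address(base, imm16):
    """Load/store address: base register plus sign-extended offset."""
    return (base + sign_extend16(imm16)) & 0xFFFFFFFF

def branch_target(pc_plus_4, imm16):
    """beq target: PC+4 plus the sign-extended offset shifted left by 2."""
    return (pc_plus_4 + (sign_extend16(imm16) << 2)) & 0xFFFFFFFF

print(sign_extend16(0xFFFC))                            # offset field 0xFFFC means -4
print(hex(effective_address(0x10010000, 0xFFFC)))
print(hex(branch_target(0x00400008, 0xFFFC)))
```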

26 Datapaths PC+4 datapathR-type A/L datapath

27 Datapaths Load/Store Datapath

28 Datapaths Branch (beq) Datapath

29 Simple CPU Implementation We want to implement the simplest possible implementation of our MIPS subset of instructions –lw/sw –beq –add, sub, and, or, and slt

30 Combining Datapaths Let’s combine the datapaths that we looked at into a single datapath Let’s assume that we want to execute all our instructions in a single clock cycle –This means that we can only use each datapath component once per instruction We need a separate instruction and data memory We may need to duplicate some components (but we can share components across different instruction types) We need multiplexors for this

31 Integrated Datapaths Here we combine all our datapaths We also add our fetch hardware Next we’ll need a control unit to assert the control signals

32 Control Signals Recall the ALU control table… Let’s create a small control “lookup table” for the ALU...
ALU Operation  Function
000            and
001            or
010            add
110            subtract
111            set on less than

33 Control Signals
Instruction   ALUOp  Func Field  Desired ALU Action  ALU Control Input
LW            00     XXXXXX      add                 010
SW            00     XXXXXX      add                 010
BEQ           01     XXXXXX      subtract            110
R-type (add)  10     100000      add                 010
R-type (sub)  10     100010      subtract            110
R-type (and)  10     100100      and                 000
R-type (or)   10     100101      or                  001
R-type (slt)  10     101010      slt                 111
Note that ALUOp will come from the main control unit
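
The ALU control is small enough to write as a lookup. In this sketch the function name `alu_control` is made up, and the funct encodings are the standard MIPS values for add, sub, and, or, and slt.

```python
# funct field -> 3-bit ALU control input, used only for R-type (ALUOp = 10)
FUNCT_TO_ALU = {
    0b100000: 0b010,  # add
    0b100010: 0b110,  # sub
    0b100100: 0b000,  # and
    0b100101: 0b001,  # or
    0b101010: 0b111,  # slt
}

def alu_control(alu_op, funct=0):
    """Map (ALUOp, funct) to the ALU control input."""
    if alu_op == 0b00:          # lw / sw: add for the address calculation
        return 0b010
    if alu_op == 0b01:          # beq: subtract to compare
        return 0b110
    return FUNCT_TO_ALU[funct]  # R-type: decode the funct field

print(format(alu_control(0b00), "03b"))
print(format(alu_control(0b01), "03b"))
print(format(alu_control(0b10, 0b101010), "03b"))
```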

34 Designing the Main Control Unit First, let’s take a look at all our current control signals and their effect...
Signal Name: effect when deasserted / effect when asserted
RegDst: Register destination comes from the rt field (20-16) / Register destination comes from the rd field (15-11)
RegWrite: None / A register is written to
ALUSrc: The second ALU operand comes from the register file (read data 2) / The second ALU operand is the sign-extended immediate
PCSrc: The PC is replaced by the output of the adder (PC+4) / The PC is replaced by the output of the adder that computes the branch target
MemRead: None / Data memory is read
MemWrite: None / Data memory is written
MemtoReg: The value fed to the register file comes from the ALU / The value fed to the register file comes from data memory

35 CPU with Control Unit

36 R-type Control For an R-type instruction, let’s decide what needs to be done (note this is done in parallel) –Fetch instruction and increment PC by 4 –Read two registers –ALU does computation –Result is written back to register file

37 Load/Store Control Let’s decide what needs to be done for a lw instruction –Fetch/increment PC –Read base register from reg. file –ALU computes effective address (base+offset) –Data from memory is written back to register file

38 Branch-on-Equal Control Finally, let’s decide what needs to be done in order to perform the beq instruction –Fetch/increment PC –Read two registers –ALU subtracts –Adder computes the branch target (PC+4+offset*4) –Zero result from ALU decides if we should write the new value to the PC

39 Control Signals
Instruction  RegDst  ALUSrc  MemtoReg  RegWrite  MemRead  MemWrite  Branch  ALUOp1  ALUOp0
R-type       1       0       0         1         0        0         0       1       0
lw           0       1       1         1         1        0         0       0       0
sw           X       1       X         0         0        1         0       0       0
beq          X       0       X         0         0        0         1       0       1
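
The main control unit for the single-cycle CPU is a pure function of the opcode, so it can be sketched as a table lookup. The names `Control`, `MAIN_CONTROL`, `decode`, and `pc_src` are invented for this illustration; the opcode values are the standard MIPS encodings, and `None` stands for a don't-care (X).

```python
from collections import namedtuple

Control = namedtuple(
    "Control", "RegDst ALUSrc MemtoReg RegWrite MemRead MemWrite Branch ALUOp")

X = None  # don't care
MAIN_CONTROL = {
    0x00: Control(1, 0, 0, 1, 0, 0, 0, 0b10),  # R-type
    0x23: Control(0, 1, 1, 1, 1, 0, 0, 0b00),  # lw
    0x2B: Control(X, 1, X, 0, 0, 1, 0, 0b00),  # sw
    0x04: Control(X, 0, X, 0, 0, 0, 1, 0b01),  # beq
}

def decode(instr):
    """Look up the control signals from the opcode field (bits 31-26)."""
    return MAIN_CONTROL[(instr >> 26) & 0x3F]

def pc_src(ctrl, alu_zero):
    """Take the branch only if this is a branch and the ALU result is zero."""
    return int(bool(ctrl.Branch and alu_zero))

print(decode(0x8D280004))             # lw  $t0, 4($t1)
print(decode(0x01085020))             # add $t2, $t0, $t0
print(pc_src(decode(0x1108FFFF), 1))  # beq with equal operands
```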

40 Control Next time we’ll find out why a single-cycle CPU like this is not practical –We need an FSM to handle control in order to reuse components during a single instruction execution

41 Chapter 5: Datapath and Control (Part 3) CS 447 Jason Bakos

42 Single-Cycle CPU The single-cycle CPU from the last lecture had a CPI of 1 –Clock cycle is determined by the longest possible path in the machine Loads are the worst – they use 5 functional units in series –Performance, utilization, and efficiency are not going to be good, because most instructions don’t need such a long clock cycle –A variable-speed clock could be used to solve this problem, but it hinders parallelism Pipelining overlaps instruction executions
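
The clock-period argument can be made concrete with numbers. The latencies below are hypothetical values chosen only for illustration (they do not come from these slides), as are the dictionary and function names.

```python
# Hypothetical functional-unit latencies in picoseconds (illustrative only)
LAT = {"imem": 200, "regread": 100, "alu": 200, "dmem": 200, "regwrite": 100}

# Functional units each instruction class uses in series
PATH = {
    "R-type": ["imem", "regread", "alu", "regwrite"],
    "lw":     ["imem", "regread", "alu", "dmem", "regwrite"],
    "sw":     ["imem", "regread", "alu", "dmem"],
    "beq":    ["imem", "regread", "alu"],
}

def latency(kind):
    return sum(LAT[unit] for unit in PATH[kind])

def single_cycle_period():
    """The clock must be long enough for the slowest instruction."""
    return max(latency(kind) for kind in PATH)

for kind in PATH:
    print(kind, latency(kind), "ps")
print("clock period:", single_cycle_period(), "ps")
```

With these assumed numbers the load sets the period, and a beq leaves a sizeable part of every cycle unused.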

43 Multicycle Implementation Break instructions into steps, where each step requires one clock cycle We want to reuse functional units within an instruction instead of just across instructions –Reduces hardware Use single memory for instructions and data Single ALU instead of one ALU and two adders Add registers to functional units to hold intermediate results (state data) for future cycles –Use within instruction executions Register file and memory hold state data to be used across instruction executions –These are programmer-visible We will need an FSM to control the CPU

44 Registers Locations of registers are determined by the following: –What combinational units will fit in one clock cycle Assume memory access, regfile access (two reads or one write), or ALU operation Any data needed by these operations must be stored in a temporary register –Instruction Register, Memory Data Register, A, B, and ALUOut registers added to design –All these except IR only need to hold data between two adjacent clock cycles –What data are needed in later cycles implementing the instruction

45 Multiplexors Need to add extra multiplexors (or expand existing muxes) to facilitate the reuse of the ALU within instructions –Add mux to first ALU input –Expand mux to second ALU input

46 Multicycle CPU

47 Breaking Instruction Execution into Clock Cycles Goal is to balance the latency of the operations performed during each clock cycle –At most one of the following can occur in series: One ALU operation One register file access (or multiple in parallel) One memory access (this is a joke, but we’ll accept this for now)

48 Execution Stages In order to clearly define the CPU operation for each step in the operation, we’ll use RTL (register transfer language) Architecture research has defined 5 standard phases of instruction execution –Instruction fetch –Decode Fetch register values from register file –Execute Perform arithmetic/logic operation –Memory Load/Store memory –Write back Write register result back to register file

49 Execution Stages Fetch –IR=Memory[PC] –PC=PC+4 Decode –A=Reg[IR[25..21]] –B=Reg[IR[20..16]] –ALUOut=PC+(sign_extend(IR[15..0]) << 2)

50 Execution Stages Execute –Memory access ALUOut=A+sign_extend(IR[15..0]) –R-type ALUOut=A op B –Branch (beq) if (A==B) PC=ALUOut –Jump PC=PC[31..28] || (IR[25..0]<<2)

51 Execution Stages Memory Access/Write Back –Load MDR=Memory[ALUOut] –Store Memory[ALUOut]=B –R-type Reg[IR[15..11]]=ALUOut Memory Read Completion –Load Reg[IR[20..16]]=MDR
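
The RTL steps for the execution stages can be strung together into a small behavioral model that also reports how many cycles each instruction takes. This is a sketch under assumptions: the function names are made up, memory is a dictionary keyed by word-aligned byte address, only lw, sw, beq, j, and the five R-type operations are handled, and writes to register 0 are not special-cased.

```python
M32 = 0xFFFFFFFF

def sext16(x):
    x &= 0xFFFF
    return x - 0x10000 if x & 0x8000 else x

def signed(x):
    return x - (1 << 32) if x & 0x80000000 else x

def run_instruction(pc, reg, mem):
    """Execute one instruction step by step; return (new_pc, cycles)."""
    # Cycle 1, fetch: IR=Memory[PC]; PC=PC+4
    ir = mem[pc]
    pc = (pc + 4) & M32
    # Cycle 2, decode: read A and B, compute the branch target speculatively
    a = reg[(ir >> 21) & 31]
    b = reg[(ir >> 16) & 31]
    alu_out = (pc + (sext16(ir) << 2)) & M32
    op, funct = ir >> 26, ir & 0x3F
    if op == 0x02:                         # j completes in cycle 3
        return (pc & 0xF0000000) | ((ir & 0x03FFFFFF) << 2), 3
    if op == 0x04:                         # beq completes in cycle 3
        return (alu_out if a == b else pc), 3
    if op in (0x23, 0x2B):                 # lw / sw
        alu_out = (a + sext16(ir)) & M32   # cycle 3: effective address
        if op == 0x2B:
            mem[alu_out] = b               # cycle 4: Memory[ALUOut]=B
            return pc, 4
        mdr = mem[alu_out]                 # cycle 4: MDR=Memory[ALUOut]
        reg[(ir >> 16) & 31] = mdr         # cycle 5: Reg[IR[20..16]]=MDR
        return pc, 5
    # R-type: cycle 3 computes ALUOut, cycle 4 writes Reg[IR[15..11]]
    ops = {0x20: a + b, 0x22: a - b, 0x24: a & b, 0x25: a | b,
           0x2A: int(signed(a) < signed(b))}
    reg[(ir >> 11) & 31] = ops[funct] & M32
    return pc, 4

reg = [0] * 32
reg[9] = 0x10010000
mem = {0x400000: 0x8D280004,   # lw  $t0, 4($t1)
       0x400004: 0x01085020,   # add $t2, $t0, $t0
       0x400008: 0x1108FFFF,   # beq $t0, $t0, -1 (branches to itself)
       0x10010004: 42}
pc = 0x400000
for _ in range(3):
    pc, cycles = run_instruction(pc, reg, mem)
    print(hex(pc), cycles)
```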

52 Control Signals Control Unit signals –Refer to figure 5.34 (pg. 384) in the book ALU Control signals –Provide an appropriate ALUOp signal based on what the ALU is being used for (if for an R-type, perform lookup based on function code)

53 Control Signals All that’s left is for us to build the control unit as an FSM and the ALU control as a lookup table

54 Control Unit The fetch and decode stages are the same for every instruction...

55 Control Unit Here are the states and transitions for the memory-reference instructions

56 Control Unit Here are the states and transitions for R-type, branch, and jump instructions

57 Control Unit Final control unit FSM...

58 Problems to Think About How could we add bne, blt, and bgez instructions to our CPU? How do you calculate CPI for our CPU if we are given instruction-type distributions?
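
For the CPI question, one approach is a weighted average of the per-class cycle counts of the multicycle design (5 for loads, 4 for stores and R-type, 3 for branches and jumps, following the execution stages). The instruction mix below is hypothetical, and the names are invented for this sketch.

```python
# Cycles per instruction class in the multicycle design
CYCLES = {"lw": 5, "sw": 4, "R-type": 4, "beq": 3, "j": 3}

def average_cpi(mix):
    """mix maps instruction class -> fraction of executed instructions."""
    assert abs(sum(mix.values()) - 1.0) < 1e-9, "fractions must sum to 1"
    return sum(frac * CYCLES[kind] for kind, frac in mix.items())

# Hypothetical distribution, for illustration only
mix = {"lw": 0.25, "sw": 0.10, "R-type": 0.50, "beq": 0.10, "j": 0.05}
print(round(average_cpi(mix), 2))
```

With this assumed mix the average comes out to about 4.1 cycles per instruction, compared with a CPI of 1 (at a much longer clock period) for the single-cycle design.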