Lecture 16: Basic CPU Design

Slides:

Advertisements

Similar presentations

CS/COE1541: Introduction to Computer Architecture Datapath and Control Review Sangyeun Cho Computer Science Department University of Pittsburgh.

Advertisements

Adding the Jump Instruction

Multicycle Datapath & Control Andreas Klappenecker CPSC321 Computer Architecture.

1 Chapter Five The Processor: Datapath and Control.

1 Lecture 17: Basic Pipelining Today’s topics:  5-stage pipeline  Hazards and instruction scheduling Mid-term exam stats:  Highest: 90, Mean: 58.

CS-447– Computer Architecture Lecture 12 Multiple Cycle Datapath

The Processor: Datapath & Control

1  1998 Morgan Kaufmann Publishers Chapter Five The Processor: Datapath and Control.

Chapter 5 The Processor: Datapath and Control Basic MIPS Architecture Homework 2 due October 28 th. Project Designs due October 28 th. Project Reports.

Lec 17 Nov 2 Chapter 4 – CPU design data path design control logic design single-cycle CPU performance limitations of single cycle CPU multi-cycle CPU.

Computer ArchitectureFall 2007 © October 3rd, 2007 Majd F. Sakr CS-447– Computer Architecture.

Preparation for Midterm Binary Data Storage (integer, char, float pt) and Operations, Logic, Flip Flops, Switch Debouncing, Timing, Synchronous / Asynchronous.

ENEE350 Ankur Srivastava University of Maryland, College Park Based on Slides from Mary Jane Irwin ( )

The Processor 2 Andreas Klappenecker CPSC321 Computer Architecture.

Chapter Five The Processor: Datapath and Control.

1 Lecture 14: FSM and Basic CPU Design Today’s topics:  Finite state machines  Single-cycle CPU Reminder: midterm on Tue 10/24  will cover Chapters.

Datapath and Control Andreas Klappenecker CPSC321 Computer Architecture.

Class 9.1 Computer Architecture - HUJI Computer Architecture Class 9 Microprogramming.

Processor I CPSC 321 Andreas Klappenecker. Midterm 1 Thursday, October 7, during the regular class time Covers all material up to that point History MIPS.

The Multicycle Processor CPSC 321 Andreas Klappenecker.

The Processor Andreas Klappenecker CPSC321 Computer Architecture.

The Processor Data Path & Control Chapter 5 Part 1 - Introduction and Single Clock Cycle Design N. Guydosh 2/29/04.

CSE431 L05 Basic MIPS Architecture.1Irwin, PSU, 2005 CSE 431 Computer Architecture Fall 2005 Lecture 05: Basic MIPS Architecture Review Mary Jane Irwin.

Lecture 24: CPU Design Today’s topic –Multi-Cycle ALU –Introduction to Pipelining 1.

Dr. Iyad F. Jafar Basic MIPS Architecture: Multi-Cycle Datapath and Control.

Chapter 4 Sections 4.1 – 4.4 Appendix D.1 and D.2 Dr. Iyad F. Jafar Basic MIPS Architecture: Single-Cycle Datapath and Control.

COSC 3430 L08 Basic MIPS Architecture.1 COSC 3430 Computer Architecture Lecture 08 Processors Single cycle Datapath PH 3: Sections

Lec 15Systems Architecture1 Systems Architecture Lecture 15: A Simple Implementation of MIPS Jeremy R. Johnson Anatole D. Ruslanov William M. Mongan Some.

CSE 340 Computer Architecture Summer 2014 Basic MIPS Pipelining Review.

EE204 L12-Single Cycle DP PerformanceHina Anwar Khan EE204 Computer Architecture Single Cycle Data path Performance.

CPE232 Basic MIPS Architecture1 Computer Organization Multi-cycle Approach Dr. Iyad Jafar Adapted from Dr. Gheith Abandah slides

ECE 445 – Computer Organization

Computer Architecture and Design – ECEN 350 Part 6 [Some slides adapted from A. Sprintson, M. Irwin, D. Paterson and others]

1. Building A CPU  We’ve built a small ALU l Add, Subtract, SLT, And, Or l Could figure out Multiply and Divide  What about the rest l How do.

COMP541 Multicycle MIPS Montek Singh Mar 25, 2010.

Lecture 16: Basic Pipelining

ECE-C355 Computer Structures Winter 2008 The MIPS Datapath Slides have been adapted from Prof. Mary Jane Irwin ( )

Datapath and Control AddressInstruction Memory Write Data Reg Addr Register File ALU Data Memory Address Write Data Read Data PC Read Data Read Data.

Datapath and Control AddressInstruction Memory Write Data Reg Addr Register File ALU Data Memory Address Write Data Read Data PC Read Data Read Data.

COM181 Computer Hardware Lecture 6: The MIPs CPU.

MIPS Processor.

1 CS/COE0447 Computer Organization & Assembly Language Chapter 5 Part 3.

May 22, 2000Systems Architecture I1 Systems Architecture I (CS ) Lecture 14: A Simple Implementation of MIPS * Jeremy R. Johnson Mon. May 17, 2000.

Chapter 4 From: Dr. Iyad F. Jafar Basic MIPS Architecture: Multi-Cycle Datapath and Control.

1 The final datapath. 2 Control  The control unit is responsible for setting all the control signals so that each instruction is executed properly. —The.

Computer Architecture Lecture 6.  Our implementation of the MIPS is simplified memory-reference instructions: lw, sw arithmetic-logical instructions:

1 Lecture 13: Sequential Circuits, FSM Today’s topics:  Sequential circuits  Finite state machines  Single-cycle CPU Reminder: midterm on Tue 10/20.

CS161 – Design and Architecture of Computer Systems

CS161 – Design and Architecture of Computer Systems

IT 251 Computer Organization and Architecture

Lecture 15: Basic CPU Design

Lecture 16: Basic Pipelining

Systems Architecture I

Morgan Kaufmann Publishers

CS/COE0447 Computer Organization & Assembly Language

Multiple Cycle Implementation of MIPS-Lite CPU

Lecture 16: Basic Pipelining

Lecture 13: Sequential Circuits, FSM

Lecture 15: Basic CPU Design

Lecture 13: Sequential Circuits, FSM

MIPS Processor.

Systems Architecture I

The Processor Lecture 3.2: Building a Datapath with Control

Multicycle Approach We will be reusing functional units

Systems Architecture I

COSC 2021: Computer Organization Instructor: Dr. Amir Asif

Chapter Four The Processor: Datapath and Control

Systems Architecture I

CS161 – Design and Architecture of Computer Systems

Presentation transcript:

Lecture 16: Basic CPU Design Today’s topics: Single-cycle CPU Multi-cycle CPU Reminder: Assignment 6 will be posted today – due in a week

Basic MIPS Architecture Now that we understand clocks and storage of states, we’ll design a simple CPU that executes: basic math (add, sub, and, or, slt) memory access (lw and sw) branch and jump instructions (beq and j)

Implementation Overview We need memory to store instructions to store data for now, let’s make them separate units We need registers, ALU, and a whole lot of control logic CPU operations common to all instructions: use the program counter (PC) to pull instruction out of instruction memory read register values

Note: we haven’t bothered View from 30,000 Feet Note: we haven’t bothered showing multiplexors What is the role of the Add units? Explain the inputs to the data memory unit Explain the inputs to the ALU Explain the inputs to the register unit

Clocking Methodology Which of the above units need a clock? What is being saved (latched) on the rising edge of the clock? Keep in mind that the latched value remains there for an entire cycle

Implementing R-type Instructions Instructions of the form add $t1, $t2, $t3 Explain the role of each signal

Implementing Loads/Stores Instructions of the form lw $t1, 8($t2) and sw $t1, 8($t2) Where does this input come from?

Implementing J-type Instructions Instructions of the form beq $t1, $t2, offset

View from 10,000 Feet

View from 5,000 Feet

Single Vs. Multi-Cycle Machine In this implementation, every instruction requires one cycle to complete  cycle time = time taken for the slowest instruction If the execution was broken into multiple (faster) cycles, the shorter instructions can finish sooner Cycle time = 20 ns Cycle time = 5 ns 1 cycle 4 cycles Load Load 1 cycle 3 cycles Add Add 1 cycle 2 cycles Beq Beq Time for a load, add, and beq = 60 ns 45 ns

Multi-Cycle Processor Single memory unit shared by instructions and memory Single ALU also used for PC updates Registers (latches) to store the result of every block

Cycle 1 The PC is used to select the appropriate instruction out of the memory unit The instruction is latched into the instruction register at the end of the clock cycle The ALU performs PC+4 and stores it in the PC at the end of the clock cycle (note that ALU is free this cycle) The control circuits must now be “cycle-aware” – the new PC need not look up the instr-memory until we’re done executing the current instruction

Cycle 2 The instruction specifies the required register values – these are read from the register file and stored in latches A and B (this happens even if the operands are not required) The last 16 bits are also used to compute PC+4+offset (in case this instruction turns out to be a branch) – this is latched into ALUOut Note that we haven’t yet figured out the instruction type, so the above operations are “speculative”

Cycle 3 The operations depend on the instruction type Memory access: the address is computed by adding the offset to the value read from the register file, result is latched into ALUOut ALU: ALU operations are performed on the values read from the register file and the result is latched into ALUOut Branch: the ALU performs the operations for “beq” and if the branch happens, the branch target (currently in ALUOut) is latched into the PC at the end of the cycle Note that the branch operation has completed by the end of cycle 3, the other two are still

Cycle 4 Memory access: the address in ALUOut is used to pick out a word from memory – this is latched into the memory data register ALU: the result latched into ALUOut is fed as input to the register file, the instruction stored in the instruction-latch specifies where the result is written into At the end of this cycle, the ALU operation and memory writes are complete

Cycle 5 Memory read: the value read from memory (and latched in the memory data register) is now written into the register file Summary: Branches and jumps: 3 cycles ALU, stores: 4 cycles Memory access: 5 cycles ALU is slower since it requires a register file write Store is slower since it requires a data memory write Load is slower since it requires a data memory read and a register file write

Average CPI Now we can compute average CPI for a program: if the given program is composed of loads (25%), stores (10%), branches (13%), and ALU ops (52%), the average CPI is 0.25 x 5 + 0.1 x 4 + 0.13 x 3 + 0.52 x 4 = 4.12 You can break this CPU design into shorter cycles, for example, a load would then take 10 cycles, stores 8, ALU 8, branch 6  average CPI would double, but so would the clock speed, the net performance would remain roughly the same Later, we’ll see that this strategy does help in most other cases.

Control Logic Note that the control signals for every unit are determined by two factors: the instruction type the cycle number for this instruction The control is therefore implemented as a finite state machine – every cycle, the FSM transitions to a new state with a certain set of outputs (the control signals) and this is a function of the inputs (the instr type)

Title Bullet