Chapter 5 Processor Design. Spring 2005 ELEC 5200/6200 From Patterson/Hennessey Slides We're ready to look at an implementation of the MIPS Simplified.

Slides:



Advertisements
Similar presentations
1  1998 Morgan Kaufmann Publishers Summary:. 2  1998 Morgan Kaufmann Publishers How many cycles will it take to execute this code? lw $t2, 0($t3) lw.
Advertisements

331 W08.1Spring :332:331 Computer Architecture and Assembly Language Spring 2006 Week 8: Datapath Design [Adapted from Dave Patterson’s UCB CS152.
1 Chapter Five The Processor: Datapath and Control.
CS-447– Computer Architecture Lecture 12 Multiple Cycle Datapath
The Processor: Datapath & Control
1  1998 Morgan Kaufmann Publishers Chapter Five The Processor: Datapath and Control.
Chapter 5 The Processor: Datapath and Control Basic MIPS Architecture Homework 2 due October 28 th. Project Designs due October 28 th. Project Reports.
CS 161Computer Architecture Chapter 5 Lecture 12
Embedded Systems in Silicon TD5102 MIPS design Datapath and Control Henk Corporaal Technical University.
Levels in Processor Design
1 Chapter Five. 2 We're ready to look at an implementation of the MIPS Simplified to contain only: –memory-reference instructions: lw, sw –arithmetic-logical.
ENEE350 Ankur Srivastava University of Maryland, College Park Based on Slides from Mary Jane Irwin ( )
The Processor 2 Andreas Klappenecker CPSC321 Computer Architecture.
Advanced Computer Architecture 5MD00 / 5Z033 MIPS Design data path and control Henk Corporaal TUEindhoven 2007.
Chapter Five The Processor: Datapath and Control.
331 W10.1Spring :332:331 Computer Architecture and Assembly Language Spring 2005 Week 10 Building a Multi-Cycle Datapath [Adapted from Dave Patterson’s.
1  1998 Morgan Kaufmann Publishers We're ready to look at an implementation of the MIPS Simplified to contain only: –memory-reference instructions: lw,
CPU Architecture Why not single cycle? Why not single cycle? Hardware complexity Hardware complexity Why not pipelined? Why not pipelined? Time constraints.
Datapath and Control Andreas Klappenecker CPSC321 Computer Architecture.
Class 9.1 Computer Architecture - HUJI Computer Architecture Class 9 Microprogramming.
Processor I CPSC 321 Andreas Klappenecker. Midterm 1 Thursday, October 7, during the regular class time Covers all material up to that point History MIPS.
1 We're ready to look at an implementation of the MIPS Simplified to contain only: –memory-reference instructions: lw, sw –arithmetic-logical instructions:
The Processor Data Path & Control Chapter 5 Part 1 - Introduction and Single Clock Cycle Design N. Guydosh 2/29/04.
Dr. Iyad F. Jafar Basic MIPS Architecture: Multi-Cycle Datapath and Control.
Chapter 4 Sections 4.1 – 4.4 Appendix D.1 and D.2 Dr. Iyad F. Jafar Basic MIPS Architecture: Single-Cycle Datapath and Control.
Computing Systems The Processor: Datapath and Control.
COSC 3430 L08 Basic MIPS Architecture.1 COSC 3430 Computer Architecture Lecture 08 Processors Single cycle Datapath PH 3: Sections
1 For some program running on machine X, Performance X = 1 / Execution time X "X is n times faster than Y" Performance X / Performance Y = n Problem:
Lec 15Systems Architecture1 Systems Architecture Lecture 15: A Simple Implementation of MIPS Jeremy R. Johnson Anatole D. Ruslanov William M. Mongan Some.
EECS 322: Computer Architecture
1 Computer Organization & Design Microcode for Control Sec. 5.7 (CDROM) Appendix C (CDROM) / / pdf / lec_3a_notes.pdf.
CPE232 Basic MIPS Architecture1 Computer Organization Multi-cycle Approach Dr. Iyad Jafar Adapted from Dr. Gheith Abandah slides
Computer Architecture and Design – ECEN 350 Part 6 [Some slides adapted from A. Sprintson, M. Irwin, D. Paterson and others]
1  2004 Morgan Kaufmann Publishers Chapter Five.
1  1998 Morgan Kaufmann Publishers Value of control signals is dependent upon: –what instruction is being executed –which step is being performed Use.
ECE-C355 Computer Structures Winter 2008 The MIPS Datapath Slides have been adapted from Prof. Mary Jane Irwin ( )
1  1998 Morgan Kaufmann Publishers Simple Implementation Include the functional units we need for each instruction Why do we need this stuff?
COM181 Computer Hardware Lecture 6: The MIPs CPU.
1  2004 Morgan Kaufmann Publishers No encoding: –1 bit for each datapath operation –faster, requires more memory (logic) –used for Vax 780 — an astonishing.
May 22, 2000Systems Architecture I1 Systems Architecture I (CS ) Lecture 14: A Simple Implementation of MIPS * Jeremy R. Johnson Mon. May 17, 2000.
Chapter 4 From: Dr. Iyad F. Jafar Basic MIPS Architecture: Multi-Cycle Datapath and Control.
1. 2 MIPS Hardware Implementation Full die photograph of the MIPS R2000 RISC Microprocessor. The 1986 MIPS R2000 with five pipeline stages and 450,000.
Computer Architecture Lecture 6.  Our implementation of the MIPS is simplified memory-reference instructions: lw, sw arithmetic-logical instructions:
Design a MIPS Processor (II)
CS161 – Design and Architecture of Computer Systems
CS161 – Design and Architecture of Computer Systems
Chapter 5 The Processor: Datapath and Control
Computer Organization & Design Microcode for Control Sec. 5
MIPS Instructions.
Processor (I).
CS/COE0447 Computer Organization & Assembly Language
Multiple Cycle Implementation of MIPS-Lite CPU
Chapter Five.
Processor: Finite State Machine & Microprogramming
Chapter Five The Processor: Datapath and Control
The Multicycle Implementation
Chapter Five The Processor: Datapath and Control
The Multicycle Implementation
The Processor Lecture 3.1: Introduction & Logic Design Conventions
Multicycle Approach We will be reusing functional units
Systems Architecture I
Processor (II).
Processor: Multi-Cycle Datapath & Control
Multicycle Design.
Basic MIPS Implementation
Multi-Cycle Datapath Lecture notes from MKP, H. H. Lee and S. Yalamanchili.
Chapter Four The Processor: Datapath and Control
The Processor: Datapath & Control.
CS161 – Design and Architecture of Computer Systems
Presentation transcript:

Chapter 5 Processor Design

Spring 2005 ELEC 5200/6200 From Patterson/Hennessey Slides We're ready to look at an implementation of the MIPS Simplified to contain only: memory-reference instructions: lw, sw arithmetic-logical instructions: add, sub, and, or, slt control flow instructions: beq, j Generic Implementation: use the program counter (PC) to supply instruction address get the instruction from memory read registers use the instruction to decide exactly what to do All instructions use the ALU after reading the registers Why? memory-reference? arithmetic? control flow? The Processor: Datapath & Control

Spring 2005 ELEC 5200/6200 From Patterson/Hennessey Slides Abstract / Simplified View: Two types of functional units: elements that operate on data values (combinational) elements that contain state (sequential) More Implementation Details

Spring 2005 ELEC 5200/6200 From Patterson/Hennessey Slides Our Implementation An edge triggered methodology Typical execution: read contents of some state elements, send values through some combinational logic write results to one or more state elements

Spring 2005 ELEC 5200/6200 From Patterson/Hennessey Slides Built using D flip-flops Register File Do you understand? What is the “Mux” above?

Spring 2005 ELEC 5200/6200 From Patterson/Hennessey Slides Register File Note: we still use the real clock to determine when to write

Spring 2005 ELEC 5200/6200 From Patterson/Hennessey Slides Simple Implementation Include the functional units we need for each instruction Why do we need this stuff?

Spring 2005 ELEC 5200/6200 From Patterson/Hennessey Slides Information flow – add instruction PC Address bus Data bus Instruction memory Read register 1 Read register 2 Write register Write data Read data 1 Read data 2 ALU rs rt rd Register operands Rd = Rs + Rt Instruction address Instruction code Register file 4 + Write register Write PC Function select rs rt rd shamt opcode function

Spring 2005 ELEC 5200/6200 From Patterson/Hennessey Slides Information flow – lw instruction PC Address bus Data bus Instruction memory Read register 1 Read register 2 Write register Write data Read data 1 Read data 2 ALU rs rt rd Register file 4 + Write register Write PC Function select Address bus Data out Data memory Sign extend Read memory 16-bit displacement 32-bit displacement rs rt displacement opcode Rt = Mem[Rs+disp] operand address

Spring 2005 ELEC 5200/6200 From Patterson/Hennessey Slides Information flow – sw instruction PC Address bus Data bus Instruction memory Read register 1 Read register 2 Write register Write data Read data 1 Read data 2 ALU rs Register file 4 + Write register Write PC Function select Address bus Data out Data memory Sign extend Read memory 16-bit constant 32-bit displacement Write memory Data in rs rt displacement opcode Mem[Rs+disp] = Rt rt rd

Spring 2005 ELEC 5200/6200 From Patterson/Hennessey Slides Information flow – beq instruction PC Address bus Data bus Instruction memory Read register 1 Read register 2 Write register Write data Read data 1 Read data 2 ALU rs Register file 4 + Write register Write PC Function select Address bus Data out Data memory Sign extend Read memory Write memory Data in rs rt displacement Shift x 2 + branch address zero pc + 4 pc offset (shifted) rt rd

Spring 2005 ELEC 5200/6200 From Patterson/Hennessey Slides Control Unit Selecting the operations to perform (ALU, read/write, etc.) Controlling the flow of data (multiplexor inputs) Information comes from the 32 bits of the instruction Example: add $8, $17, $18 Instruction Format: op rs rtrd shamtfunct ALU's operation based on instruction type & function code

Spring 2005 ELEC 5200/6200 From Patterson/Hennessey Slides Ex., what should the ALU do with this instruction lw $1, 100($2) op rs rt 16 bit offset ALU control input 0000 AND 0001OR 0010add 0110subtract 0111set-on-less-than 1100NOR Refer to previous mux-based ALU design for control coding ALU Function Control

Spring 2005 ELEC 5200/6200 From Patterson/Hennessey Slides Must describe hardware to compute 4-bit ALU control input given instruction type 00 = lw, sw 01 = beq, 10 = arithmetic derive function code for arithmetic Describe it using a truth table (can turn into gates): ALUOp computed from instruction type ALU Function Control

Spring 2005 ELEC 5200/6200 From Patterson/Hennessey Slides Control Unit Design – Identify Control Points Text – Fig. 5.24

Spring 2005 ELEC 5200/6200 From Patterson/Hennessey Slides Control Logic

Spring 2005 ELEC 5200/6200 From Patterson/Hennessey Slides Performance of Single Cycle Implementation Cycle time determined by length of the longest path Calculate cycle time assuming negligible delays except: memory (200ps), ALU/adders (100ps), register file access (50ps) Instruction Class Instruction Memory Register Read ALU Operation Data Memory Register WriteTotal R-type ps LW ps SW ps BEQ ps J ps

Spring 2005 ELEC 5200/6200 From Patterson/Hennessey Slides Muticycle Approach Single Cycle Problems: what if we had a more complicated instruction like floating point? wasteful of area (ex. multiple adder circuits) clock cycle must accommodate longest delay path (wasted time for many instructions) One Solution: partition hardware into multiple stages use a “smaller” cycle time – to accommodate longest “stage” have different instructions take different numbers of cycles reuse components for different functions in different cycles use one ALU for instruction operands and next PC value use one memory for instructions and data use a finite state machine for control

Spring 2005 ELEC 5200/6200 From Patterson/Hennessey Slides Break up the instructions into steps, each step takes a cycle balance the amount of work to be done restrict each cycle to use only one major functional unit At the end of a cycle store values for use in later cycles (easiest thing to do) introduce additional “internal” registers Multicycle Approach

Spring 2005 ELEC 5200/6200 From Patterson/Hennessey Slides Instructions from ISA perspective Consider each instruction from perspective of ISA. Example: The add instruction changes a register. Register specified by bits 15:11 of instruction. Instruction specified by the PC. New value is the sum (“op”) of two registers. Registers specified by bits 25:21 and 20:16 of the instruction Reg[Memory[PC][15:11]] <= Reg[Memory[PC][25:21]] op Reg[Memory[PC][20:16]] In order to accomplish this we must break up the instruction. (kind of like introducing variables when programming)

Spring 2005 ELEC 5200/6200 From Patterson/Hennessey Slides Breaking down an instruction ISA definition of arithmetic: Reg[Memory[PC][15:11]] <= Reg[Memory[PC][25:21]] op Reg[Memory[PC][20:16]] Could break down to: IR <= Memory[PC] A <= Reg[IR[25:21]] B <= Reg[IR[20:16]] ALUOut <= A op B Reg[IR[15:11]] <= ALUOut We forgot an important part of the instruction execution! PC <= PC + 4

Spring 2005 ELEC 5200/6200 From Patterson/Hennessey Slides Idea behind multicycle approach We define each instruction from the ISA perspective (do this!) Break it down into steps following our rule that data flows through at most one major functional unit (e.g., balance work across steps) Introduce new registers as needed (e.g, A, B, ALUOut, MDR, etc.) Finally try and pack as much work into each step (avoid unnecessary cycles) while also trying to share steps where possible (minimizes control, helps to simplify solution) Result: Our book’s multicycle Implementation!

Spring 2005 ELEC 5200/6200 From Patterson/Hennessey Slides Instruction Fetch Instruction Decode and Register Fetch Execution, Memory Address Computation, or Branch Completion Memory Access or R-type instruction completion Write-back step INSTRUCTIONS TAKE FROM CYCLES! Five Execution Steps

Spring 2005 ELEC 5200/6200 From Patterson/Hennessey Slides Use PC to get instruction and put it in the Instruction Register. Increment the PC by 4 and put the result back in the PC. Can be described succinctly using RTL "Register-Transfer Language" IR <= Memory[PC]; PC <= PC + 4; Can we figure out the values of the control signals? What is the advantage of updating the PC now? Step 1: Instruction Fetch

Spring 2005 ELEC 5200/6200 From Patterson/Hennessey Slides Read registers rs and rt in case we need them Compute the branch address in case the instruction is a branch RTL: A <= Reg[IR[25:21]]; B <= Reg[IR[20:16]]; ALUOut <= PC + (sign-extend(IR[15:0]) << 2); We aren't setting any control lines based on the instruction type (we are busy "decoding" it in our control logic) Step 2: Instruction Decode and Register Fetch

Spring 2005 ELEC 5200/6200 From Patterson/Hennessey Slides ALU is performing one of three functions, based on instruction type Memory Reference: ALUOut <= A + sign-extend(IR[15:0]); R-type: ALUOut <= A op B; Branch: if (A==B) PC <= ALUOut; Step 3 (instruction dependent)

Spring 2005 ELEC 5200/6200 From Patterson/Hennessey Slides Loads and stores access memory MDR <= Memory[ALUOut]; or Memory[ALUOut] <= B; R-type instructions finish Reg[IR[15:11]] <= ALUOut; The write actually takes place at the end of the cycle on the edge Step 4 (R-type or memory-access)

Spring 2005 ELEC 5200/6200 From Patterson/Hennessey Slides Reg[IR[20:16]] <= MDR; Which instruction needs this? Step 5 (Write-back step)

Spring 2005 ELEC 5200/6200 From Patterson/Hennessey Slides Summary:

Spring 2005 ELEC 5200/6200 From Patterson/Hennessey Slides Complete Multi-Cycle Datapath Text – Fig. 5.28

Spring 2005 ELEC 5200/6200 From Patterson/Hennessey Slides Finite state machines: a set of states and next state function (determined by current state and the input) output function (determined by current state and possibly input) Review: finite state machines We’ll use a Moore machine (output based only on current state) Combinational Logic Memory Outputs (control Signals) Next State Present State Inputs (opcode)

Spring 2005 ELEC 5200/6200 From Patterson/Hennessey Slides Value of control signals is dependent upon: what instruction is being executed which step is being performed Use the information we’ve accumulated to specify a finite state machine specify the finite state machine graphically, or use microprogramming Implementation can be derived from specification Implementing the Control

Spring 2005 ELEC 5200/6200 From Patterson/Hennessey Slides State transitions next state is function of current state and inputs Timing control signals load registers/memory asserted if name only otherwise not asserted Path setup control signals control mux’s and ALU exact value listed don’t care if not listed Graphical Specification of FSM Text – Fig. 5.38

Spring 2005 ELEC 5200/6200 From Patterson/Hennessey Slides Fetch & decode cycles

Spring 2005 ELEC 5200/6200 From Patterson/Hennessey Slides Execute LW/SW

Spring 2005 ELEC 5200/6200 From Patterson/Hennessey Slides Execute R-type

Spring 2005 ELEC 5200/6200 From Patterson/Hennessey Slides Execute BEQ & J

Spring 2005 ELEC 5200/6200 From Patterson/Hennessey Slides Implementation: Finite State Machine for Control

Spring 2005 ELEC 5200/6200 From Patterson/Hennessey Slides PLA Implementation

Spring 2005 ELEC 5200/6200 From Patterson/Hennessey Slides ROM = "Read Only Memory" values of memory locations are fixed ahead of time 2 10 x 20 = 20K bits ROM can be used to implement the controller truth table m = 10 address inputs: 6 opcode + 4 state bits n = 20 outputs: 16 datapath-control signals + 4 state bits Rather wasteful, since for lots of the entries, the outputs are the same — i.e., opcode is often ignored ROM Implementation mn

Spring 2005 ELEC 5200/6200 From Patterson/Hennessey Slides Complex instructions: the "next state" is often current state + 1 Another Implementation Style

Spring 2005 ELEC 5200/6200 From Patterson/Hennessey Slides Details Dispatch ROM 1Dispatch ROM 2 OpOpcode nameValueOpOpcode nameValue R-format lw jmp sw beq lw sw 0010

Spring 2005 ELEC 5200/6200 From Patterson/Hennessey Slides Microprogramming

Spring 2005 ELEC 5200/6200 From Patterson/Hennessey Slides A specification methodology appropriate if hundreds of opcodes, modes, cycles, etc. signals specified symbolically using microinstructions Will two implementations of the same architecture have the same microcode? What would a microassembler do? Microprogramming

Spring 2005 ELEC 5200/6200 From Patterson/Hennessey Slides Microinstruction format

Spring 2005 ELEC 5200/6200 From Patterson/Hennessey Slides No encoding: 1 bit for each datapath operation faster, requires more memory (logic) used for Vax 780 — an astonishing 400K of memory! Lots of encoding: send the microinstructions through logic to get control signals uses less memory, slower Historical context of CISC: Too much logic to put on a single chip with everything else Use a ROM (or even RAM) to hold the microcode It’s easy to add new instructions Maximally vs. Minimally Encoded

Spring 2005 ELEC 5200/6200 From Patterson/Hennessey Slides Microcode: Trade-offs Distinction between specification and implementation is sometimes blurred Specification Advantages: Easy to design and write Design architecture and microcode in parallel Implementation (off-chip ROM) Advantages Easy to change since values are in memory Can emulate other architectures Can make use of internal registers Implementation Disadvantages, SLOWER now that: Control is implemented on same chip as processor ROM is no longer faster than RAM No need to go back and make changes

Spring 2005 ELEC 5200/6200 From Patterson/Hennessey Slides Historical Perspective In the ‘60s and ‘70s microprogramming was very important for implementing machines This led to more sophisticated ISAs and the VAX In the ‘80s RISC processors based on pipelining became popular Pipelining the microinstructions is also possible! Implementations of IA-32 architecture processors since 486 use: “hardwired control” for simpler instructions (few cycles, FSM control implemented using PLA or random logic) “microcoded control” for more complex instructions (large numbers of cycles, central control store) The IA-64 architecture uses a RISC-style ISA and can be implemented without a large central control store

Spring 2005 ELEC 5200/6200 From Patterson/Hennessey Slides Chapter 5 Summary If we understand the instructions… We can build a simple processor! If instructions take different amounts of time, multi- cycle is better Datapath implemented using: Combinational logic for arithmetic State holding elements to remember bits Control implemented using: Combinational logic for single-cycle implementation Finite state machine for multi-cycle implementation