Computer Organization Lecture Set – 05.1 Chapter 5 Huei-Yung Lin.


H.Y. Lin, CCUEE Computer Organization

Roadmap for the Term: Major Topics
- Computer Systems Overview
- Technology Trends
- Performance
- Instruction Sets (and Software)
- Logic and Arithmetic
- Processor Implementation (current topic)
- Memory Systems
- Input/Output

Outline - Processor Implementation
- Overview (current topic)
  - Review of Processor Operation
  - Steps in Processor Design
  - Implementation Styles
  - The “MIPS Lite” Instruction Subset
- Single-Cycle Implementation
- Multi-Cycle Implementation
- Pipelined Implementation

Review: The “Five Classic Components”
- Processor
  - Datapath
  - Control
- Memory
- Input
- Output
[Figure: block diagram of Input, Processor (Control, Datapath), Output, and Memory]

Review: Processor Operation
Executing Programs - the “fetch/execute” cycle
- Processor fetches instruction from memory
- Processor executes “machine language” instruction
  - Perform calculation
  - Read/write data
- Repeat with “next” instruction
[Figure: Processor (Control, Datapath, PC) sends Address to Memory and receives Instruction]

Processor Design Goals
Design hardware that:
- Fetches instructions from memory
- Executes instructions as specified by ISA
Design considerations:
- Cost
- Speed
- Power

Steps in Processor Design
1. Analyze instruction set; get datapath requirements
2. Select datapath components and establish clocking methodology
3. Assemble datapath that meets requirements
4. Determine control signal values for each instruction
5. Assemble control logic to generate control signals

Processor Implementation Styles
Single Cycle
- Perform each instruction in 1 clock cycle
- Disadvantage: only as fast as the “slowest” instruction
Multi-Cycle
- Break fetch/execute cycle into multiple steps
- Perform 1 step in each clock cycle
Pipelined
- Execute each instruction in multiple steps
- Perform 1 step / instruction in each clock cycle
- Process multiple instructions in parallel - “assembly line”

“MIPS Lite” - A Pedagogical Example
Use MIPS to illustrate processor design
Limit initial design to a subset of instructions:
- Memory access: lw, sw
- Arithmetic/Logical: add, sub, and, or, slt
- Branch/Jump: beq, j
Add instructions as we go along (e.g., addi)

Review - MIPS Instruction Formats
Field definitions:
- op: instruction opcode
- rs, rt, rd: source (2) and destination (1) register numbers
- shamt: shift amount
- funct: function code (works with opcode to specify op)
- offset/immediate: address offset or immediate value
- address: target address for jumps
R-Format: op (6 bits) | rs (5 bits) | rt (5 bits) | rd (5 bits) | shamt (5 bits) | funct (6 bits)
I-Format: op (6 bits) | rs (5 bits) | rt (5 bits) | offset (16 bits)
J-Format: op (6 bits) | address (26 bits)
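The field layout above can be sketched as a small Python decoder. This is an illustrative sketch, not part of the original slides: the function name `decode` and the example word are assumptions, and the register numbers used ($s0 = 16, $s1 = 17, $s2 = 18) follow the usual MIPS register numbering.

```python
def decode(word):
    """Split a 32-bit MIPS instruction word into every field any format might use.

    Which fields are meaningful depends on the format (R, I, or J);
    the caller picks the ones that apply.
    """
    return {
        "op":      (word >> 26) & 0x3F,       # bits 31:26, all formats
        "rs":      (word >> 21) & 0x1F,       # bits 25:21, R and I formats
        "rt":      (word >> 16) & 0x1F,       # bits 20:16, R and I formats
        "rd":      (word >> 11) & 0x1F,       # bits 15:11, R format
        "shamt":   (word >> 6) & 0x1F,        # bits 10:6,  R format
        "funct":   word & 0x3F,               # bits 5:0,   R format
        "offset":  word & 0xFFFF,             # bits 15:0,  I format
        "address": word & 0x03FFFFFF,         # bits 25:0,  J format
    }


# Hypothetical example: add $s0, $s1, $s2 (rd=16, rs=17, rt=18, funct=0x20)
word = (0 << 26) | (17 << 21) | (18 << 16) | (16 << 11) | (0 << 6) | 0x20
fields = decode(word)
```

Extracting all fields unconditionally mirrors what the hardware does: the wires for every field are always present, and the control logic decides which ones matter.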

MIPS Instruction Subset
Arithmetic & Logical Instructions
- add $s0, $s1, $s2
- sub $s0, $s1, $s2
- and $s0, $s1, $s2
- or $s0, $s1, $s2
Data Transfer Instructions
- lw $s1, offset($s0)
- sw $s2, offset($s3)
Branch
- beq $s0, $s1, offset
- j address

MIPS Instruction Execution
General Procedure
1. Fetch Instruction from memory
2. Decode Instruction, read register values
3. If necessary, perform an ALU operation
4. If load or store, do memory access
5. Write results back to register file and increment PC
Register Transfers provide a concise description

Register Transfers for the MIPS Subset
Instruction Fetch
  Instruction <= MEM[PC]
Instruction Execution
  Instr.  Register Transfers
  add     R[rd] <= R[rs] + R[rt]; PC <= PC + 4
  sub     R[rd] <= R[rs] - R[rt]; PC <= PC + 4
  and     R[rd] <= R[rs] & R[rt]; PC <= PC + 4
  or      R[rd] <= R[rs] | R[rt]; PC <= PC + 4
  lw      R[rt] <= MEM[R[rs] + s_extend(offset)]; PC <= PC + 4
  sw      MEM[R[rs] + s_extend(offset)] <= R[rt]; PC <= PC + 4
  beq     if (R[rs] == R[rt]) then PC <= PC + 4 + s_extend(offset<<2) else PC <= PC + 4
  j       PC <= (PC + 4)[31:28] concatenated with (I[25:0] << 2)
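The register transfers for this subset can be traced with a short Python sketch of the fetch/execute loop. Everything here is an illustrative assumption rather than the lecture's own code: instructions are represented as already-decoded dictionaries, memory is a dictionary keyed by byte address, and the names `step`, `state`, and `program` are hypothetical. The jump target uses the standard MIPS rule (upper four bits of PC + 4 joined with the 26-bit address shifted left by 2).

```python
MASK32 = 0xFFFFFFFF


def s_extend(value16):
    """Sign-extend a 16-bit field to a Python int."""
    return value16 - 0x10000 if value16 & 0x8000 else value16


def step(state, instr):
    """Apply one instruction's register transfers to state (keys: R, MEM, PC)."""
    R, MEM = state["R"], state["MEM"]
    pc_plus_4 = (state["PC"] + 4) & MASK32
    next_pc = pc_plus_4                       # default: PC <= PC + 4
    name = instr["name"]
    rs, rt, rd = instr.get("rs", 0), instr.get("rt", 0), instr.get("rd", 0)
    if name == "add":
        R[rd] = (R[rs] + R[rt]) & MASK32
    elif name == "sub":
        R[rd] = (R[rs] - R[rt]) & MASK32
    elif name == "and":
        R[rd] = R[rs] & R[rt]
    elif name == "or":
        R[rd] = R[rs] | R[rt]
    elif name == "lw":
        R[rt] = MEM.get((R[rs] + s_extend(instr["offset"])) & MASK32, 0)
    elif name == "sw":
        MEM[(R[rs] + s_extend(instr["offset"])) & MASK32] = R[rt]
    elif name == "beq":
        if R[rs] == R[rt]:
            next_pc = (pc_plus_4 + (s_extend(instr["offset"]) << 2)) & MASK32
    elif name == "j":
        next_pc = (pc_plus_4 & 0xF0000000) | (instr["address"] << 2)
    else:
        raise ValueError("instruction not in the subset: " + name)
    R[0] = 0                                  # $zero always reads as 0
    state["PC"] = next_pc


# Hypothetical four-instruction program stored at byte addresses 0, 4, 8, 12.
program = [
    {"name": "add", "rd": 3, "rs": 1, "rt": 2},
    {"name": "sw", "rt": 3, "rs": 0, "offset": 8},
    {"name": "lw", "rt": 4, "rs": 0, "offset": 8},
    {"name": "beq", "rs": 3, "rt": 4, "offset": 2},
]
state = {"R": [0] * 32, "MEM": {}, "PC": 0}
state["R"][1], state["R"][2] = 5, 7

# Fetch/execute loop: fetch the instruction at PC, execute it, repeat.
while state["PC"] // 4 < len(program):
    step(state, program[state["PC"] // 4])
```

In this run the branch is taken, so the final PC is 12 + 4 + (2 << 2) = 24, which is past the end of the program and stops the loop.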

Outline - Processor Implementation
- Overview
- Single-Cycle Implementation
  1. Analyze instruction set; get datapath requirements (current step)
  2. Select datapath components and establish clocking methodology
  3. Assemble datapath that meets requirements
  4. Determine control signal values for each instruction
  5. Assemble control logic to generate control signals
- Multi-Cycle Implementation
- Pipelined Implementation

Step 1: Instruction Set Requirements
Memory
- Read Instructions
- Read and Write Data
Registers - 32
- read (from rs field in instruction)
- read (from rt field in instruction)
- write (from rd or rt field in instruction)
PC
Sign Extender
Add and Subtract (register values)
Add 4 or extended immediate to PC

Outline - Processor Implementation
- Overview
- Single-Cycle Implementation
  1. Analyze instruction set; get datapath requirements
  2. Select datapath components and establish clocking methodology (current step)
  3. Assemble datapath that meets requirements
  4. Determine control signal values for each instruction
  5. Assemble control logic to generate control signals
- Multi-Cycle Implementation
- Pipelined Implementation

Step 2(a): Choose Datapath Components
Combinational Components
- Adder
- ALU
- Multiplexer
- Sign Extender
Storage Components
- Registers
- Register File
- Memory

Datapath Combinational Components
[Figure: Adder, ALU, Multiplexer, Sign Extender]
NOTES:
- Blue-green inputs are control lines
- Blue lines often hidden to suppress detail

Datapath Storage - Registers
- Registers store multiple bit values
- New value loaded on clock edge when EN asserted

Datapath Storage: Idealized Memory
Data Read
- Place address on ADDR
- Assert MemRead
- Data available on RD after memory “access time”
Data Write
- Place address on ADDR
- Place data input on WD
- Assert MemWrite
- Data written on clock edge

Datapath Storage: Register File
Register File - 32 registers (including $zero)
Two data outputs RD1, RD2
- Assert register number RN1/RN2
- Read output RD1/RD2 after “access time” (propagation delay)
One data input WD
- Assert register number WN
- Assert value on WD
- Assert RegWrite
- Value loaded on clock edge
Implemented as a small multiport memory
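A behavioral sketch of this register file in Python may help fix the port names. It is only a model under stated assumptions: the class name `RegisterFile` and method names are hypothetical, reads are treated as instantaneous (no access time), and writes to register 0 are ignored so that $zero stays 0.

```python
class RegisterFile:
    """32 x 32-bit registers with two read ports and one write port."""

    def __init__(self):
        self.regs = [0] * 32

    def read(self, rn1, rn2):
        """Combinational read: returns (RD1, RD2) for register numbers RN1, RN2."""
        return self.regs[rn1], self.regs[rn2]

    def clock_edge(self, wn, wd, reg_write):
        """On the clock edge, load WD into register WN only if RegWrite is asserted."""
        if reg_write and wn != 0:      # assumption: $zero is hard-wired to 0
            self.regs[wn] = wd & 0xFFFFFFFF


rf = RegisterFile()
rf.clock_edge(wn=8, wd=42, reg_write=1)   # written
rf.clock_edge(wn=9, wd=99, reg_write=0)   # ignored: RegWrite not asserted
rd1, rd2 = rf.read(8, 9)
```

Separating `read` from `clock_edge` reflects the slide's distinction between reads, which settle after a propagation delay, and writes, which take effect only on the clock edge.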

Step 2(b): Choose Clocking Methodology
Clocking methodology defines
- When signals can be read from storage elements
- When signals can be written to storage elements
Typical clocking methodologies
- Single-Phase Edge Triggered
- Single-Phase Level Triggered
- Multiple-Phase Level Triggered
Authors’ choice: Single-Phase Edge Triggered
- All registers updated on one edge of clock cycle
- Simplest to work with

Review: Edge-Triggered Clocking
Controls sequential circuit operation
- Register outputs change after first clock edge
- Combinational logic determines “next state”
- Storage elements store new state on next clock edge
[Figure: Register Output feeds Combinational Logic (Adder, Mux), which feeds Register Input; both registers share Clock]

Review: Edge-Triggered Clocking
Propagation delay - t_prop
- Logic (including register outputs)
- Interconnect
Register setup time - t_setup
[Figure: Register Output, Combinational Logic (Adder, Mux), Register Input, and Clock, with t_prop and t_setup marked]
t_clock > t_prop + t_setup
t_clock = t_prop + t_setup + t_slack
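The timing constraint can be checked numerically with a few lines of Python. The function names and the example delays (550 ps propagation, 20 ps setup, 600 ps clock) are made-up values for illustration only.

```python
def min_clock_period(t_prop, t_setup):
    """Lower bound on the clock period: t_clock must exceed t_prop + t_setup."""
    return t_prop + t_setup


def slack(t_clock, t_prop, t_setup):
    """t_slack from t_clock = t_prop + t_setup + t_slack (negative means a violation)."""
    return t_clock - (t_prop + t_setup)


# Hypothetical delays in picoseconds.
bound = min_clock_period(t_prop=550, t_setup=20)
t_slack = slack(t_clock=600, t_prop=550, t_setup=20)
```

A negative slack would mean the register input is not stable early enough before the clock edge, so the clock period has to be lengthened or the logic shortened.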

Outline - Processor Implementation
- Overview
- Single-Cycle Implementation
  1. Analyze instruction set; get datapath requirements
  2. Select datapath components and establish clocking methodology
  3. Assemble datapath that meets requirements (current step)
  4. Determine control signal values for each instruction
  5. Assemble control logic to generate control signals
- Multi-Cycle Implementation
- Pipelined Implementation

Step 3: Assemble Datapath
Tasks processor must implement
1. Fetch Instruction from memory
2. Decode Instruction, read register values
3. If necessary, perform an ALU operation
4. If load or store, perform memory access
5. Write results back to register file and increment PC
How can we do this with the datapath hardware?

Datapath for Instruction Fetch
Instruction <= MEM[PC]
PC <= PC + 4

Datapath for R-Type Instructions
add rd, rs, rt
R[rd] <= R[rs] + R[rt]

Datapath for Load/Store Instructions
lw rt, offset(rs)
R[rt] <= MEM[R[rs] + s_extend(offset)]

Datapath for Load/Store Instructions
sw rt, offset(rs)
MEM[R[rs] + s_extend(offset)] <= R[rt]

Datapath for Branch Instructions
beq rs, rt, offset
if (R[rs] == R[rt]) then PC <= PC + 4 + s_extend(offset<<2)

Putting It All Together ...
Goal: merge datapaths for each function
- Instruction Fetch
- R-Type Instructions
- Load/Store Instructions
- Branch instructions
Add multiplexers to steer data as needed

Example: Combine R-Type and Load/Store Datapaths
Select an ALU input from either
- Register File output RD2 (for R-Type)
- Sign-extender output (for LW/SW)
Select Register File input WD from either
- ALU output (for R-Type)
- Memory output RD (for LW)

Combined Datapath: R-Type and Load/Store Instructions

Combined Datapath: Executing an R-Type Instruction
add rd, rs, rt

Combined Datapath: Executing a load instruction
lw rt, offset(rs)

Combined Datapath: Executing a store instruction
sw rt, offset(rs)

Complete Single-Cycle Datapath

Complete Datapath Executing add
add rd, rs, rt

Complete Datapath Executing load
lw rt, offset(rs)

Complete Datapath Executing store
sw rt, offset(rs)

Complete Datapath Executing branch
beq r1, r2, offset

Refining the Complete Datapath
Depending on the instruction, register file input WN is fed by different fields of the instruction
- R-Type Instructions: rd field (bits 15:11)
- Load Instruction: rt field (bits 20:16)
Result: need an additional multiplexer on WN input
R-Format: op (6 bits) | rs (5 bits) | rt (5 bits) | rd (5 bits) | shamt (5 bits) | funct (6 bits)
I-Format: op (6 bits) | rs (5 bits) | rt (5 bits) | offset (16 bits)

Complete Datapath (Refined)

Complete Single-Cycle Datapath
Control signals shown in blue

Outline - Processor Implementation
- Overview
- Single-Cycle Implementation
  1. Analyze instruction set; get datapath requirements
  2. Select datapath components and establish clocking methodology
  3. Assemble datapath that meets requirements
  4. Determine control signal values for each instruction (current step)
  5. Assemble control logic to generate control signals
- Multi-Cycle Implementation
- Pipelined Implementation

Control Unit Design
Desired function:
- Given an instruction word ...
- Generate control signals needed to execute instruction
Implemented as a combinational logic function:
- Inputs
  - Instruction word - op and funct fields
  - ALU status output - Zero
- Outputs - processor control points
  - ALU control signals
  - Multiplexer control signals
  - Register File & memory control signals

Determining Control Points
For each instruction type, determine proper value for each control point (control signal)
- 0
- 1
- X (don’t care - either 1 or 0)
Ultimately ... use these values to build a truth table

Review: ALU Control Signals
Functions: Figure B.5.13 (also in Ch. 5 - p. 301)
  ALU control input   Function
  000                 AND
  001                 OR
  010                 add
  110                 subtract
  111                 set on less than
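The function table can be written out as a small Python model of the ALU. This is a sketch under assumptions not stated on the slide: operands are 32-bit unsigned values, set on less than compares them as signed two's-complement numbers, and the function name `alu` is hypothetical. The second return value models the Zero status output.

```python
MASK32 = 0xFFFFFFFF


def to_signed(x):
    """Interpret a 32-bit unsigned value as two's-complement."""
    return x - (1 << 32) if x & 0x80000000 else x


def alu(control, a, b):
    """Return (result, zero) for the 3-bit ALU control input in the table."""
    if control == 0b000:
        result = a & b
    elif control == 0b001:
        result = a | b
    elif control == 0b010:
        result = (a + b) & MASK32
    elif control == 0b110:
        result = (a - b) & MASK32
    elif control == 0b111:
        result = 1 if to_signed(a) < to_signed(b) else 0
    else:
        raise ValueError("unused ALU control code")
    return result, int(result == 0)


difference, zero = alu(0b110, 5, 5)   # subtract equal values: Zero is asserted
```

Subtracting equal operands yields a result of 0 with Zero asserted, which is how beq tests for equality.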

Control Signals - R-Type Instruction
[Figure: datapath with control signals shown in blue; the ALU operation value depends on funct]

Control Signals - lw Instruction
[Figure: datapath with control signals shown in blue]

Control Signals - sw Instruction
[Figure: datapath with control signals shown in blue]

Control Signals - beq Instruction
[Figure: datapath with control signals shown in blue; the branch target is selected if Zero=1]

Outline - Processor Implementation
- Overview
- Single-Cycle Implementation
  1. Analyze instruction set; get datapath requirements
  2. Select datapath components and establish clocking methodology
  3. Assemble datapath that meets requirements
  4. Determine control signal values for each instruction
  5. Assemble control logic to generate control signals (current step)
- Multi-Cycle Implementation
- Pipelined Implementation

Control Unit Structure

More Notes About Control Unit Structure
Control unit as shown: one huge logic block
Idea: decompose into smaller logic blocks
- Smaller blocks can be faster
- Smaller blocks are easier to work with
Observation (rephrased):
- The only control signal that depends on the funct field is the ALU Operation signal
- Idea: separate logic for ALU control

Modified Control Unit Structure
This is called “derived control” or “local decoding”

Datapath with Modified Control Unit

Review from Ch. 4: ALU Function
Functions: Figure B.5.13 (also in Ch. 5 - p. 301)
  ALU control input   Function
  000                 AND
  001                 OR
  010                 add
  110                 subtract
  111                 set on less than

ALU Usage in Processor Design
Usage depends on instruction type
- Instruction type (specified by opcode)
- funct field (r-type instructions only)
Encode instruction type in ALUOp signal
  Instr. type     ALUOp   Operation   funct    Desired Action     ALU Ctl
  data transfer   00      lw          XXXXXX   add                010
  data transfer   00      sw          XXXXXX   add                010
  branch          01      beq         XXXXXX   subtract           110
  r-type          10      add         100000   add                010
  r-type          10      sub         100010   subtract           110
  r-type          10      and         100100   and                000
  r-type          10      or          100101   or                 001
  r-type          10      slt         101010   set on less than   111
XXXXXX means “don’t care”

ALU Control - Truth Table (Fig. 5-13)
Use don’t care values to minimize length
- Ignore F5, F4 (they are always “10”)
- Assume ALUOp never equals “11”
  ALUOp1  ALUOp0  F5  F4  F3  F2  F1  F0   Operation
  0       0       X   X   X   X   X   X    010
  X       1       X   X   X   X   X   X    110
  1       X       X   X   0   0   0   0    010
  1       X       X   X   0   0   1   0    110
  1       X       X   X   0   1   0   0    000
  1       X       X   X   0   1   0   1    001
  1       X       X   X   1   0   1   0    111
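The ALU control block can be sketched as a Python function of ALUOp and funct. This assumes the usual textbook encoding (ALUOp 00 for lw/sw, 01 for beq, 10 for R-type) and the standard MIPS funct codes; the name `alu_control` is hypothetical. As on the slide, F5 and F4 are ignored and ALUOp is assumed never to be 11.

```python
# Low four funct bits (F3..F0) -> 3-bit ALU control, for R-type instructions.
FUNCT_TO_ALU = {
    0b0000: 0b010,  # add
    0b0010: 0b110,  # sub
    0b0100: 0b000,  # and
    0b0101: 0b001,  # or
    0b1010: 0b111,  # slt
}


def alu_control(alu_op, funct):
    """Map 2-bit ALUOp and 6-bit funct to the 3-bit ALU control input."""
    if alu_op == 0b00:          # lw / sw: add to form the address
        return 0b010
    if alu_op & 0b01:           # X1 row: beq subtracts to compare
        return 0b110
    return FUNCT_TO_ALU[funct & 0xF]   # 1X rows: decode F3..F0


ctl_for_sub = alu_control(0b10, 0b100010)
```

Testing only the low ALUOp bit for the branch case is what the don't-care entry in the ALUOp1 column permits, and it keeps the logic small.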

ALU Control - Implementation
Figure C.2.3, page C-6

One More Modification - for Branch
- BEQ instruction depends on Zero output of ALU
- No other instruction uses Zero output
Local decoding
- Implement with new "Branch" control signal
- Add AND gate to generate PCSelect
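The AND gate and the PC multiplexer it drives can be sketched in Python as follows. The function names are hypothetical, and the branch target formula follows the beq register transfer used elsewhere in these slides.

```python
def s_extend(value16):
    """Sign-extend a 16-bit field to a Python int."""
    return value16 - 0x10000 if value16 & 0x8000 else value16


def next_pc(pc, branch, zero, offset16):
    """Select the next PC: PCSelect = Branch AND Zero."""
    pc_plus_4 = (pc + 4) & 0xFFFFFFFF
    pc_select = branch & zero                      # the added AND gate
    target = (pc_plus_4 + (s_extend(offset16) << 2)) & 0xFFFFFFFF
    return target if pc_select else pc_plus_4      # the PC multiplexer


taken = next_pc(0x100, branch=1, zero=1, offset16=3)
not_taken = next_pc(0x100, branch=1, zero=0, offset16=3)
```

Because Zero is gated by Branch, an R-type instruction whose result happens to be zero does not redirect the PC.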

Processor Design - Branch Modification

Control Unit Implementation
Review: Opcodes for key instructions
Control Unit Truth Table: Fill in the blanks (or see Fig. 5-18, p. 308)
- Inputs: Op5, Op4, Op3, Op2, Op1, Op0
- Outputs: RegDst, ALUSrc, MemtoReg, RegWrite, MemRead, MemWrite, Branch, ALUOp1, ALUOp0
- Rows: R-type, lw, sw, beq
Implementation: Decoder + 2 Gates (Fig. C.2.5)
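One way to check a filled-in truth table is to write it as a lookup in Python. The values below are an assumption based on the standard MIPS opcodes and the usual single-cycle control settings, not a copy of the slide (whose cells are left blank); `None` stands for a don't-care (X), and the names `CONTROL` and `main_control` are hypothetical.

```python
X = None  # don't care

# opcode -> control signals; ALUOp is the two-bit value ALUOp1 ALUOp0.
CONTROL = {
    0b000000: dict(RegDst=1, ALUSrc=0, MemtoReg=0, RegWrite=1,
                   MemRead=0, MemWrite=0, Branch=0, ALUOp=0b10),  # R-type
    0b100011: dict(RegDst=0, ALUSrc=1, MemtoReg=1, RegWrite=1,
                   MemRead=1, MemWrite=0, Branch=0, ALUOp=0b00),  # lw
    0b101011: dict(RegDst=X, ALUSrc=1, MemtoReg=X, RegWrite=0,
                   MemRead=0, MemWrite=1, Branch=0, ALUOp=0b00),  # sw
    0b000100: dict(RegDst=X, ALUSrc=0, MemtoReg=X, RegWrite=0,
                   MemRead=0, MemWrite=0, Branch=1, ALUOp=0b01),  # beq
}


def main_control(opcode):
    """Return the control signal settings for a 6-bit opcode."""
    return CONTROL[opcode]


lw_signals = main_control(0b100011)
```

The don't-care entries appear exactly where an instruction does not write the register file, so the destination-register and write-data multiplexer settings have no effect.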

Control Unit Implementation

Final Extension: Implementing j (jump)
Instruction Format
  J-Format: op (6 bits) | address (26 bits)
Register Transfer:
  PC <= (PC + 4)[31:28] concatenated with (I[25:0] << 2)
Remember, it’s unconditional
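The jump target calculation can be sketched in Python. This assumes the standard MIPS rule that the upper four bits of PC + 4 are joined with the 26-bit address field shifted left by 2; the function name and the example PC and instruction word are hypothetical.

```python
def jump_target(pc, instr_word):
    """Compute the j target from the current PC and the 32-bit instruction word."""
    upper = (pc + 4) & 0xF0000000               # (PC + 4)[31:28]
    lower = (instr_word & 0x03FFFFFF) << 2      # I[25:0] << 2
    return upper | lower


# Hypothetical example: opcode 000010 (j) with address field 0x0100004.
word = (0b000010 << 26) | 0x0100004
target = jump_target(0x00400018, word)
```

No ALU result or Zero test is involved, which is why the jump needs only one more multiplexer input on the PC.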

Final Extension: Implementing jump

The Problem with Single-Cycle Processor Implementation: Performance
Performance is limited by the slowest instruction
Example: suppose we have the following delays
- Memory read/write: 200 ps
- ALU and adders: 100 ps
- Register File read/write: 50 ps
What is the critical path for each instruction?
  Instruction   Instr. memory   Register read   ALU   Data memory   Register write   Total
  R-format      200             50              100   0             50               400 ps
  Load word     200             50              100   200           50               600 ps
  Store word    200             50              100   200           -                550 ps
  Branch        200             50              100   0             -                350 ps
  Jump          200             -               -     -             -                200 ps
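The critical paths follow from adding the unit delays along each instruction's path, which a few lines of Python make explicit. The path lists are an assumption about which units each instruction class uses in the single-cycle datapath (instruction memory, register read, ALU, data memory, register write); the names `DELAY`, `PATHS`, and `totals` are hypothetical.

```python
# Unit delays in picoseconds, taken from the example above.
DELAY = {"mem": 200, "alu": 100, "reg": 50}

# Units on the critical path of each instruction class, in order of use.
PATHS = {
    "R-format":   ["mem", "reg", "alu", "reg"],
    "Load word":  ["mem", "reg", "alu", "mem", "reg"],
    "Store word": ["mem", "reg", "alu", "mem"],
    "Branch":     ["mem", "reg", "alu"],
    "Jump":       ["mem"],
}

totals = {name: sum(DELAY[unit] for unit in path) for name, path in PATHS.items()}

# In a single-cycle design every instruction takes one clock period,
# so the period must cover the slowest instruction.
clock_period = max(totals.values())
```

Under these assumptions the load word path sets the clock period, so even a jump that needs only the instruction fetch occupies the full cycle.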

Alternatives to Single-Cycle
Multicycle Processor Implementation
- Shorter clock cycle
- Multiple clock cycles per instruction
- Some instructions take more cycles than others
- Less hardware required
Pipelined Implementation
- Overlap execution of instructions
- Try to get short cycle times and low CPI
- More hardware required ... but also more performance!