Lecture 25 Logistics Last lecture Today HW8 due today

Slides:



Advertisements
Similar presentations
Chapter 7 Henry Hexmoor Registers and RTL
Advertisements

CS-447– Computer Architecture Lecture 12 Multiple Cycle Datapath
Levels in Processor Design
Topics covered: CPU Architecture CSE 243: Introduction to Computer Architecture and Hardware/Software Interface.
Chapter 16 Control Unit Implemntation. A Basic Computer Model.
Computer Organization. Motivation Computer Design as an application of digital logic design  Computer = Processing Unit + Memory System  Processing.
Chapter 5 Basic Processing Unit
Chap 7. Register Transfers and Datapaths. 7.1 Datapaths and Operations Two types of modules of digital systems –Datapath perform data-processing operations.
Lec 15Systems Architecture1 Systems Architecture Lecture 15: A Simple Implementation of MIPS Jeremy R. Johnson Anatole D. Ruslanov William M. Mongan Some.
Computer Organization CS224 Fall 2012 Lesson 22. The Big Picture  The Five Classic Components of a Computer  Chapter 4 Topic: Processor Design Control.
IT253: Computer Organization Lecture 9: Making a Processor: Single-Cycle Processor Design Tonga Institute of Higher Education.
Datapath and Control Unit Design
Computer Organization CDA 3103 Dr. Hassan Foroosh Dept. of Computer Science UCF © Copyright Hassan Foroosh 2002.
1 CSE370, Lecture 26 Lecture 26 u Logistics n Ant extra credit problem due today n Extra lab check-off times íMonday 12:30-4:20 íTuesday 12:00-2:00 n All.
Control units In the last lecture, we introduced the basic structure of a control unit, and translated our assembly instructions into a binary representation.
Functions of Processor Operation Addressing modes Registers i/o module interface Memory module interface Interrupts.
Controller Implementation
Dr.Ahmed Bayoumi Dr.Shady Elmashad
Computer Organization and Architecture + Networks
Control Unit Lecture 6.
ARM Organization and Implementation
Edexcel GCSE Computer Science Topic 15 - The Processor (CPU)
Lecture 27 Logistics Last lecture Today: HW8 due Friday
Chap 7. Register Transfers and Datapaths
Morgan Kaufmann Publishers
Morgan Kaufmann Publishers The Processor
Basics of digital systems
Morgan Kaufmann Publishers
Morgan Kaufmann Publishers The Processor
Prof. Sirer CS 316 Cornell University
Register Transfer and Microoperations
Sequential logic examples
ECE 448 Lecture 6 Finite State Machines State Diagrams vs. Algorithmic State Machine (ASM) Charts.
Processor (I).
CS/COE0447 Computer Organization & Assembly Language
An Introduction to Microprocessor Architecture using intel 8085 as a classic processor
Instructor: Alexander Stoytchev
Lecture 27 Logistics Last lecture Today: HW8 due Friday
Basic Processing Unit Unit- 7 Engineered for Tomorrow CSE, MVJCE.
Single-Cycle CPU DataPath.
Computer Organization “Central” Processing Unit (CPU)
Morgan Kaufmann Publishers Computer Organization and Assembly Language
Controller Implementation--Part II
Controller Implementation--Part I
Instructor: Alexander Stoytchev
Levels in Processor Design
Rocky K. C. Chang 6 November 2017
Computer Organization
Levels in Processor Design
Guest Lecturer TA: Shreyas Chand
Registers.
Fundamental Concepts Processor fetches one instruction at a time and perform the operation specified. Instructions are fetched from successive memory locations.
Computer Architecture
William Stallings Computer Organization and Architecture 8th Edition
COMS 361 Computer Organization
Recall: ROM example Here are three functions, V2V1V0, implemented with an 8 x 3 ROM. Blue crosses (X) indicate connections between decoder outputs and.
Levels in Processor Design
Branch instructions We’ll implement branch instructions for the eight different conditions shown here. Bits 11-9 of the opcode field will indicate the.
Overview Last lecture Digital hardware systems Today
Prof. Sirer CS 316 Cornell University
Levels in Processor Design
Control units In the last lecture, we introduced the basic structure of a control unit, and translated our assembly instructions into a binary representation.
ECE 352 Digital System Fundamentals
The Stored Program Computer
Lecture 6 CdM-8 CPU overview
Review: The whole processor
Levels in Processor Design
Computer Operation 6/22/2019.
Processor: Datapath and Control
Chapter 4 The Von Neumann Model
Presentation transcript:

Lecture 25 Logistics Last lecture Today HW8 due today Ant extra credit due Friday Final exam, Wednesday March 18, 2:30-4:20 pm here Review session Monday, March 16, 4:30 pm, here Last lecture Encoding & Partitioning examples Today Pipelining & Retiming Control vs Datapath in a simple computer design

Other sequential logic optimization techniques Pipelining --- allows faster clock speed Retiming --- can reduce registers or change delays

Pipelining related definitions Latency: Time to perform a computation Data input to data output Throughput: Input or output data rate Typically the clock rate Combinational delays drive performance Define d  delay through slowest combinational stage n  number of stages from input to output Latency  n * d (in sec) Throughput  1/d (in Hz)

Pipelining What? Why? Subdivide combinational logic Add registers between logic Why? Trade latency for throughput Increased throughput Reduce logic delays Increase clock speed Increased latency Takes cycles to fill the pipe Increase circuit utilization Simultaneous computations Logic Reg Logic Reg Logic Reg

Pipelining When? Where? Need throughput more than latency Reg Logic Reg When? Need throughput more than latency Signal processing Logic delays > setup/hold times Acyclic logic Where? At natural breaks in the combinational logic Adding registers makes sense

Pipelining example

Pipelining and clock skew Which is faster? Which is safer?

Retiming Pipelining adds registers Retiming moves registers around To increase the clock speed Retiming moves registers around Reschedules computations to optimize performance Minimize critical path Optimize logic across register boundaries Reduce register count Without altering functionality

× Retiming in a nutshell Change position of FFs For speed To suit implementation target Retiming modifies state assignment Preserves FSM functionality b a ? ×

Retiming ground rules Rules: Remove one register from each input and add one to each output Remove one register from each output and add one to each input Combinational logic Register

Retiming examples Reduce register count Change output delays a D Q a x b D Q x d d b D Q

Optimal pipelining Add registers Use retiming to optimize location 8 7 13 10 5 6 Input Output Added registers for pipelining

Example: Digital correlator yt = (xt, a0) + (xt–1, a1) + (xt–2, a2) + (xt–3, a3)  is a comparator: (x, a) = 1 if x = a; 0 otherwise yt is the number of matches between input and pattern a0a1a2a3 yt Output + + + xt Input d d d d a0 a1 a2 a3

Example: Digital correlator (cont’d) Delays: Comparator = 3; adder = 7 Output + + + Original design cycle time = 24 Input d d d d a0 a1 a2 a3 + d Input Output a0 a1 a2 a3 Retimed design cycle time = 13

Data-path and control Digital hardware systems = data-path + control datapath: registers, counters, combinational functional units (e.g., ALU), communication (e.g., busses) control: FSM generating sequences of control signals that instructs datapath what to do next "puppeteer who pulls the strings" control status info and inputs control signal outputs state data-path "puppet"

Tri-state gates The third value Tri-state gates logic values: “0”, “1” don't care: “X” (must be 0 or 1 in real circuit!) third value or state: “Z” — high impedance, infinite R, no connection Tri-state gates additional input – output enable (OE) output values are 0, 1, and Z when OE is high, the gate functions normally when OE is low, the gate is disconnected from wire at output allows more than one gate to be connected to the same output wire as long as only one has its output enabled at any one time (otherwise, sparks could fly) OE In Out In OE Out X 0 Z 0 1 0 1 1 1 100 non-inverting tri-state buffer In OE Out

Tri-state and multiplexing When using tri-state logic (1) make sure never more than one "driver" for a wire at any one time (pulling high and low at the same time can severely damage circuits) (2) make sure to only use value on wire when its being driven (using a floating value may cause failures) Using tri-state gates to implement an economical multiplexer when Select is high Input1 is connected to F when Select is low Input0 is connected to F this is essentially a 2:1 mux OE F Input0 Input1 Select

Open-collector gates and wired-AND Open collector: another way to connect gate outputs to the same wire gate only has the ability to pull its output low it cannot actively drive the wire high (default – pulled high through resistor) Wired-AND can be implemented with open collector logic if A and B are "1", output is actively pulled low if C and D are "1", output is actively pulled low if one gate output is low and the other high, then low wins if both gate outputs are "1", the wire value "floats", pulled high by resistor low to high transition usually slower than it would have been with a gate pulling high hence, the two NAND functions are ANDed together with ouputs wired together using "wired-AND" to form (AB)'(CD)' open-collector NAND gates

Structure of a computer Block diagram view control signals data conditions Data Path Control address read/write data Processor Memory System central processing unit (CPU) instruction unit – instruction fetch and interpretation FSM execution unit – functional units and registers

LD asserted during a lo-to-hi clock transition loads new data into FFs Registers Selectively loaded – EN or LD input Output enable – OE input Multiple registers – group 4 or 8 in parallel OE Q7 Q6 Q5 Q4 Q3 Q2 Q1 Q0 LD D7 D6 D5 D4 D3 D2 D1 D0 CLK OE asserted causes FF state to be connected to output pins; otherwise they are left unconnected (high impedance) LD asserted during a lo-to-hi clock transition loads new data into FFs

Register files Collections of registers in one package two-dimensional array of FFs address used as index to a particular word can have separate read and write addresses so can do both at same time 4 by 4 register file 16 D-FFs organized as four words of four bits each write-enable (load) read-enable (output enable) RE RB RA WE WB WA D3 D2 D1 D0 Q3 Q2 Q1 Q0

Memories Larger collections of storage elements implemented not as FFs but as much more efficient latches high-density memories use 1 to 5 switches (transitors) per memory bit Static RAM – 1024 words each 4 bits wide once written, memory holds forever (not true for denser dynamic RAM) address lines to select word (10 lines for 1024 words) read enable same as output enable often called chip select permits connection of many chips into larger array write enable (same as load enable) bi-directional data lines output when reading, input when writing RD WR A9 A8 A7 A6 A5 A4 A3 A2 A1 A0 IO3 IO2 IO1 IO0

Instruction sequencing Example – an instruction to add the contents of two registers (Rx and Ry) and place result in a third register (Rz) Step 1: get the ADD instruction from memory into an instruction register (IR) Step 2: decode instruction instruction in IR has the code of an ADD instruction register indices used to generate output enables for registers Rx and Ry register index used to generate load signal for register Rz Step 3: execute instruction enable Rx and Ry output and direct to ALU setup ALU to perform ADD operation direct result to Rz so that it can be loaded into register

Instruction types Data manipulation Data staging Control add, subtract increment, decrement multiply shift, rotate immediate operands Data staging load/store data to/from memory register-to-register move Control conditional/unconditional branches in program flow subroutine call and return

Elements of the control unit (aka instruction unit) Standard FSM elements state register next-state logic output logic (datapath/control signalling) Moore or synchronous Mealy machine to avoid loops unbroken by FF Plus additional "control" registers instruction register (IR) program counter (PC) Inputs/outputs outputs control elements of data path inputs from data path used to alter flow of program (test if zero)

Instruction execution Control state diagram (for each diagram) reset fetch instruction decode execute Instructions partitioned into three classes branch load/store register-to-register Different sequence through diagram for each instruction type Reset Init Initialize Machine Fetch Instr. Load/ Store XEQ Instr. Branch Register- to-Register Branch Taken Branch Not Taken Incr. PC

Data path (hierarchy) Arithmetic circuits constructed in hierarchical and iterative fashion each bit in datapath is functionally identical 4-bit, 8-bit, 16-bit, 32-bit , 32-bit datapaths Cin Ain Bin Sum Cout FA HA

Data path (ALU) ALU block diagram input: data and operation to perform output: result of operation and status information 16 A B S Z N Operation

Data path (ALU + registers) Accumulator special register one of the inputs to ALU output of ALU stored back in accumulator One-address instructions operation and address of one operand other operand and destination is accumulator register AC  AC op Mem[addr] "single address instructions” (AC implicit operand) Multiple registers part of instruction used to choose register operands 16 Z N OP AC REG

Data path (bit-slice) Bit-slice concept – iterate to build n-bit wide datapaths CO CI ALU AC R0 from memory rs rt rd CO ALU AC R0 from memory rs rt rd CI 1 bit wide 2 bits wide

Instruction path Program counter Instruction register keeps track of program execution address of next instruction to read from memory may have auto-increment feature or use ALU Instruction register current instruction includes ALU operation and address of operand also holds target of jump instruction immediate operands Relationship to data path PC may be incremented through ALU contents of IR may also be required as input to ALU

Data path (memory interface) separate data and instruction memory (Harvard architecture) two address busses, two data busses single combined memory (Princeton architecture) single address bus, single data bus Separate memory ALU output goes to data memory input register input from data memory output data memory address from instruction register instruction register from instruction memory output instruction memory address from program counter Single memory address from PC or IR memory output to instruction and data registers memory input from ALU output

Block diagram of processor Register transfer view of Princeton architecture which register outputs are connected to which register inputs arrows represent data-flow, other are control signals from control FSM MAR may be a simple multiplexer rather than separate register MBR is split in two (REG and IR) load control for each register load path 16 REG AC rd wr 16 16 store path data Data Memory (16-bit words) OP N addr 8 Z Control FSM MAR 16 IR PC 16 16 OP 16

Block diagram of processor Register transfer view of Harvard architecture which register outputs are connected to which register inputs arrows represent data-flow, other are control signals from control FSM two MARs (PC and IR) two MBRs (REG and IR) load control for each register Control FSM 16 Z N OP AC REG load path store path Data Memory (16-bit words) PC IR data addr rd wr Inst Memory (8-bit words)