CS 150 - Spring 2008 – Lec #17 – Retiming

Presentation transcript:

State Machine Timing
Retiming: slosh logic between registers to balance latencies and improve clock timings; accelerate or retard the cycle in which outputs are asserted.
Parallelism: doing more than one thing at a time.
Pipelining: splitting computations into overlapped, smaller time steps.

Recall: Synchronous Mealy Machine Discussion
Placement of flip-flops before and after the output logic changes the timing of when the output signals are asserted …
[Figure: synchronizer circuitry at inputs and outputs; cases labeled "2 cycles", "1 cycle, constraint", and "1 cycle, no constraint"]

Recall: Synchronous Mealy Machine with Synchronizers Following Outputs
Case III: Synchronized Outputs
Signal goes into effect one cycle later.
A asserted during Cycle 0, ƒ' asserted in next cycle.
Effect of ƒ delayed one cycle.

Vending Machine State Machine
Moore machine: outputs associated with states.
Mealy machine: outputs associated with transitions.
[Figure: Moore state diagram with states 0¢ [0], 5¢, 10¢, 15¢ [1] and transition labels N’ D’ + Reset, N, D, N+D, N’ D’, Reset’, Reset]
[Figure: Mealy state diagram with states 0¢, 5¢, 10¢, 15¢ and transition labels (N’ D’ + Reset)/0, N/0, D/0, D/1, N+D/1, N’ D’/0, Reset’/1, Reset/0]

State Machine Retiming
Moore vs. (Async) Mealy Machine: Vending Machine Example
Moore: Open asserted only when in state 15.
Mealy: Open asserted when the last coin is inserted, leading to state 15.

State Machine Retiming
Retiming the Moore machine: faster generation of outputs.
Synchronizing the Mealy machine: add a FF, delaying the output.
These two implementations have identical timing behavior.
Push the AND gate through the state FFs and synchronize with an output FF.
Like computing Open in the prior state and delaying it one state time.
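The claim that the retimed Moore machine and the synchronized Mealy machine have identical timing can be checked with a small cycle-by-cycle model. This is only a sketch: the function names are hypothetical, a synchronous reset from any state is assumed, and the simultaneous nickel-and-dime case (a don't-care in the lecture) is arbitrarily saturated at 15.

```python
def next_state(state, nickel, dime, reset):
    """Next-state logic; states are cents collected (0, 5, 10, 15).

    Assumption: reset is synchronous and works from any state.
    """
    if reset:
        return 0
    coins = 5 * nickel + 10 * dime
    return min(state + coins, 15)  # saturate at 15 (stay open until reset)


def simulate(inputs):
    """Return per-cycle Open traces for three implementations.

    moore  : Open decoded from the current state (state == 15)
    mealy  : Open decoded from state and inputs (asserted one cycle earlier)
    synced : the Mealy output passed through an output flip-flop
    """
    state, open_ff = 0, False
    moore, mealy, synced = [], [], []
    for nickel, dime, reset in inputs:
        nxt = next_state(state, nickel, dime, reset)
        moore.append(state == 15)
        mealy.append(nxt == 15)
        synced.append(open_ff)
        # Clock edge: state register and output FF load together.
        state, open_ff = nxt, (nxt == 15)
    return moore, mealy, synced


# Nickel, then dime, then idle, then reset, then idle.
trace = [(1, 0, 0), (0, 1, 0), (0, 0, 0), (0, 0, 1), (0, 0, 0)]
moore, mealy, synced = simulate(trace)
print(moore)   # Open as seen from the Moore machine
print(mealy)   # Open as seen from the asynchronous Mealy machine
print(synced)  # Open after the output flip-flop
```

In this model the registered Mealy output matches the Moore output in every cycle, while the unregistered Mealy output leads both by one cycle.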

State Machine Retiming
Effect on timing of Open signal (Moore case)
[Timing diagram: Clk, State, Open, and Retimed Open waveforms, annotated with FF prop delay, Out prop delay, Out calc plus set-up, and Open calculation]
NOTE: overlaps with Next State calculation.

State Machine Retiming
Timing behavior is the same, but are the implementations really identical?
[Figure: FF input logic in the retimed Moore implementation and in the synchronous Mealy implementation]
The only difference is in the don’t-care case of a nickel and a dime at the same time.

General Picture
[Figure: latch banks alternating with logic blocks of delay T1, T2, and T3]
Cycle is set by MAX(T1, T2, T3).
Move latches to reduce max delay.
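A minimal sketch of the idea, assuming the delays are plain numbers and that moving a latch bank can be modeled as shifting some logic delay from one stage to its neighbor. The function names and the delay values are made up for illustration.

```python
def cycle_time(stage_delays):
    """The clock period is set by the slowest logic block."""
    return max(stage_delays)


def move_delay(stage_delays, src, dst, amount):
    """Model moving a latch bank: shift `amount` of delay from stage src to stage dst.

    Only adjacent stages may exchange logic, and total delay is preserved.
    """
    if abs(src - dst) != 1:
        raise ValueError("logic can only move across one latch bank at a time")
    if amount > stage_delays[src]:
        raise ValueError("cannot move more delay than the stage has")
    new = list(stage_delays)
    new[src] -= amount
    new[dst] += amount
    return new


before = [6, 2, 4]                  # hypothetical T1, T2, T3
after = move_delay(before, 0, 1, 2)  # push 2 units of logic from T1 into T2
print(cycle_time(before), cycle_time(after))
```

The total logic delay is unchanged; only the maximum over the stages drops, which is what shortens the cycle.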

Example
Two cycles from inputs to output. Circuit delay is 3.
[Figure: example circuit]

Exchange latch and XOR gate
[Figure: circuit with the latch moved across the XOR gate]

Retimed Circuit
Cycle = 2, Delay = 2
[Figure: retimed circuit]

Retimed Circuit
Cycle = 2, Delay = 2
[Figure: retimed circuit, next step]

Retimed, Simplified Circuit
Cycle = 2, Delay = 2
[Figure: retimed and simplified circuit]

Retiming and Resynthesis
Move all latches together.
Resynthesize logic to optimize for power, delay.
Reinsert latches in circuit to balance delays.

Example
[Figure: example circuit]

Example Retimed
[Figure: example circuit after retiming]

Example Resynthesized
[Figure: example circuit after resynthesis]

Parallelism
Doing more than one thing at a time: optimization in h/w often involves using parallelism to trade between cost and performance.
Example, student final grade calculation:
  read mt1, mt2, final, project;
  grade = 0.2 × mt1 + 0.2 × mt2 + 0.2 × final + 0.4 × project;
  write grade;
High-performance hardware implementation: as many operations as possible are done in parallel.
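One way to picture the fully parallel implementation is as a balanced tree: four multipliers in the first level, two adders in the second, one adder in the third. The tree shape below is an assumption, since the slide's diagram is not in the transcript, and `Fraction` is used only to keep the arithmetic exact.

```python
from fractions import Fraction

W_EXAM = Fraction(1, 5)     # 0.2
W_PROJECT = Fraction(2, 5)  # 0.4


def grade_parallel(mt1, mt2, final, project):
    """Evaluate the grade as three levels of operators.

    Everything within a level is independent, so in hardware each level
    could be computed by separate units at the same time.
    """
    # Level 1: four multipliers.
    p1, p2, p3, p4 = W_EXAM * mt1, W_EXAM * mt2, W_EXAM * final, W_PROJECT * project
    # Level 2: two adders.
    s1, s2 = p1 + p2, p3 + p4
    # Level 3: one adder.
    return s1 + s2


# Hardware cost of this organization: 4 multipliers, 3 adders, 3 operator levels.
print(grade_parallel(80, 90, 70, 100))
```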

Parallelism
Is there a lower cost hardware implementation? Different tree organization?
Can factor out multiply by 0.2: grade = 0.2 × (mt1 + mt2 + final) + 0.4 × project
How about sharing operators (multipliers and adders)?
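A quick check that factoring out the multiply by 0.2 leaves the result unchanged while using fewer multipliers. The function names are hypothetical, and exact rational arithmetic is used so the two forms can be compared with `==`.

```python
from fractions import Fraction

W_EXAM = Fraction(1, 5)     # 0.2
W_PROJECT = Fraction(2, 5)  # 0.4


def grade_direct(mt1, mt2, final, project):
    """Original form: 4 multiplies, 3 adds."""
    return W_EXAM * mt1 + W_EXAM * mt2 + W_EXAM * final + W_PROJECT * project


def grade_factored(mt1, mt2, final, project):
    """Factored form: 2 multiplies, 3 adds."""
    return W_EXAM * (mt1 + mt2 + final) + W_PROJECT * project


scores = (80, 90, 70, 100)
print(grade_direct(*scores), grade_factored(*scores))
```

The factored form saves two multipliers, at the price of a longer chain of dependent operations before the final add.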

Time Multiplexing
Reuse a single ALU for all adds and multiplies.
Lower hardware cost, longer latency, BUT must add muxes/registers/control.
Consider the combinational hardware circuit diagram as an abstract computation graph.
[Figure: computation graph and alternative building blocks]

Time Multiplexing
[Figure: computation graph of multiply (x) and add (+) nodes with inputs mt1, mt2, final, proj and constants 0.2 and 0.4]
Time-multiplexing “covers” the computation graph by performing the action of each node one at a time.
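The covering idea can be sketched as a single ALU driven by a fixed schedule, one graph node per cycle, with a small register file holding intermediate values. The schedule, register names (`t1`, `t2`, `acc`, `c02`, `c04`), and the helper functions are all illustrative assumptions, not the lecture's design.

```python
from fractions import Fraction


def alu(op, a, b):
    """The one shared functional unit: it either multiplies or adds."""
    if op == "mul":
        return a * b
    if op == "add":
        return a + b
    raise ValueError(f"unknown op: {op}")


# One computation-graph node per cycle: (destination, op, source A, source B).
SCHEDULE = [
    ("t1", "mul", "c02", "mt1"),
    ("t2", "mul", "c02", "mt2"),
    ("acc", "add", "t1", "t2"),
    ("t1", "mul", "c02", "final"),
    ("acc", "add", "acc", "t1"),
    ("t1", "mul", "c04", "proj"),
    ("acc", "add", "acc", "t1"),
]


def run(inputs):
    """Execute the schedule; return (grade, cycles used)."""
    regs = dict(inputs)
    regs["c02"] = Fraction(1, 5)  # constant 0.2
    regs["c04"] = Fraction(2, 5)  # constant 0.4
    cycles = 0
    for dest, op, src_a, src_b in SCHEDULE:
        regs[dest] = alu(op, regs[src_a], regs[src_b])  # muxes pick the operands
        cycles += 1
    return regs["acc"], cycles


print(run({"mt1": 80, "mt2": 90, "final": 70, "proj": 100}))
```

Seven graph nodes take seven cycles on one ALU, compared with three operator levels in the fully parallel tree: lower hardware cost, longer latency.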

Time Multiplexing
[Figure: alternative computation graphs of multiply (x) and add (+) nodes with inputs mt1, mt2, final, Proj and constants 0.2, 0.4, and 1]

Pipelining Principle
Pipelining review from CS61C. Analogy to washing clothes:
step 1: wash (20 minutes)
step 2: dry (20 minutes)
step 3: fold (20 minutes)
60 minutes x 4 loads => 4 hours
[Figure: wash, dry, and fold rows, each handling load1 through load4, staggered by one 20-minute step]
Overlapped in 20-minute steps => 2 hours

Pipelining
[Figure: wash, dry, and fold rows, each handling load1 through load4, staggered by one 20-minute step]
As the number of loads increases, the average time per load approaches 20 minutes.
Latency (time from start to end) for one load = 60 min.
Throughput = 3 loads/hour.
Pipelined throughput ≈ # of pipe stages x un-pipelined throughput.
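The laundry numbers can be reproduced with two small formulas. This is a sketch assuming three equal 20-minute stages and an otherwise ideal pipeline; the function names are made up for the example.

```python
def sequential_minutes(loads, stages=3, stage_min=20):
    """Each load finishes all stages before the next load starts."""
    return loads * stages * stage_min


def pipelined_minutes(loads, stages=3, stage_min=20):
    """The first load takes `stages` steps; each later load adds one step."""
    if loads == 0:
        return 0
    return (stages + loads - 1) * stage_min


for n in (1, 4, 100):
    total = pipelined_minutes(n)
    print(n, sequential_minutes(n), total, total / n)
```

For 4 loads this gives 240 minutes sequentially and 120 minutes pipelined, and the average time per load drifts toward 20 minutes as the number of loads grows.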

Pipelining
General principle: cut the CL block into pieces (stages) and separate with registers.
Unpipelined: assume T = 8 ns and T_FF (setup + clk-to-Q) = 1 ns. F = 1/(9 ns) = 111 MHz.
Pipelined: assume T1 = T2 = 4 ns. T’ = 4 ns + 1 ns + 4 ns + 1 ns = 10 ns. F = 1/(4 ns + 1 ns) = 200 MHz.
CL block produces a new result every 5 ns instead of every 9 ns.
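The arithmetic above can be written as a small calculation, assuming the logic is cut into equal stages and each stage pays the same flip-flop overhead. The function names are illustrative.

```python
def clock_mhz(logic_ns, stages, ff_overhead_ns=1.0):
    """Clock frequency when `logic_ns` of logic is cut into equal stages."""
    period_ns = logic_ns / stages + ff_overhead_ns
    return 1000.0 / period_ns  # 1/ns = 1000 MHz


def latency_ns(logic_ns, stages, ff_overhead_ns=1.0):
    """Time for one input to pass through every stage and its register."""
    return stages * (logic_ns / stages + ff_overhead_ns)


print(clock_mhz(8, 1), latency_ns(8, 1))  # unpipelined
print(clock_mhz(8, 2), latency_ns(8, 2))  # two stages of 4 ns each
```

Latency rises from 9 ns to 10 ns because of the extra register, but a new result comes out every 5 ns.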

Limits on Pipelining
Without FF overhead, throughput improvement is proportional to the # of stages.
After many stages are added, FF overhead begins to dominate.
FF “overhead” is the setup and clk-to-Q times.
Other limiters to effective pipelining:
Clock skew contributes to clock overhead
Unequal stages
FFs dominate cost
Clock distribution power consumption
Feedback (dependencies between loop iterations)
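A rough illustration of how FF overhead caps the gain, assuming perfectly equal stages, no clock skew, and the 8 ns logic and 1 ns overhead figures used earlier in the lecture. The `speedup` helper is hypothetical.

```python
def speedup(logic_ns, stages, ff_overhead_ns=1.0):
    """Throughput gain over the unpipelined circuit with the same FF overhead."""
    unpipelined_period = logic_ns + ff_overhead_ns
    pipelined_period = logic_ns / stages + ff_overhead_ns
    return unpipelined_period / pipelined_period


for n in (1, 2, 4, 8, 16, 64):
    print(n, round(speedup(8, n), 2))

# As the stage count grows, the period approaches the FF overhead alone,
# so the speedup can never exceed (logic + overhead) / overhead.
limit = (8 + 1.0) / 1.0
print("upper bound:", limit)
```

With zero overhead the speedup would equal the stage count; with 1 ns of overhead it stays below 9 no matter how many stages are added.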

Pipelining Example
F(x) = y_i = a x_i^2 + b x_i + c
x and y are assumed to be “streams”.
Divide into 3 (nearly) equal stages. Insert pipeline registers at dashed lines.
[Figure: computation graph with dashed lines marking the pipeline registers]
Can we pipeline basic operators?
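A cycle-by-cycle sketch of a three-stage pipeline for this polynomial. The stage split used here (stage 1: x·x and b·x; stage 2: a·x² and b·x + c; stage 3: the final add) is an assumption, since the slide's dashed lines are not in the transcript, and `pipelined_poly` is a made-up name.

```python
def pipelined_poly(xs, a, b, c):
    """Stream xs through a 3-stage pipeline computing a*x*x + b*x + c.

    Returns the contents of the output register after each clock edge;
    None marks cycles where the pipeline is still filling.
    """
    r1 = None  # stage 1 register: (x*x, b*x)
    r2 = None  # stage 2 register: (a*x*x, b*x + c)
    r3 = None  # stage 3 register: y
    outputs = []
    stream = list(xs) + [None, None]  # two extra cycles to drain the pipeline
    for x in stream:
        # Compute every stage from the current register values ...
        n3 = r2[0] + r2[1] if r2 is not None else None
        n2 = (a * r1[0], r1[1] + c) if r1 is not None else None
        n1 = (x * x, b * x) if x is not None else None
        # ... then clock all registers at once.
        r1, r2, r3 = n1, n2, n3
        outputs.append(r3)
    return outputs


print(pipelined_poly([1, 2, 3], a=1, b=2, c=3))
```

Each stage works on a different element of the stream in the same cycle, so once the pipeline is full one result emerges per clock.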

Example: Pipelined Adder
Possible, but usually not done … (arithmetic units can often be made sufficiently fast without internal pipelining)

Summary
State machine retiming (vending machine example): very simple output function in this particular case. But if the output takes a long time to compute relative to the next-state computation, retiming can “balance” these calculations and reduce the cycle time.
Parallelism: tradeoffs in cost and performance. Time reuse of hardware reduces cost but sacrifices performance.
Pipelining: introduce registers to split a computation, reducing cycle time and allowing parallel computation. Trade latency (number of stage delays) for cycle time reduction.

Announcements
Midterm II – POSTPONED! Lets you complete checkpoint 2.
Tuesday, April 15 March in CS 150 Laboratory, 2:10 - 3:30+ (that is one week from Tuesday; tell your friends!)
Closed book, open double-sided crib sheet.
TA review session next week.
Five or so design-oriented questions covering material since the start of the course, with heavy emphasis on material since MT 1.