1
Chapter 2: Combinational Logic Design
Digital Design Chapter 2: Combinational Logic Design Slides to accompany the textbook Digital Design, First Edition, by Frank Vahid, John Wiley and Sons Publishers, 2007. Copyright © 2007 Frank Vahid Instructors of courses requiring Vahid's Digital Design textbook (published by John Wiley and Sons) have permission to modify and use these slides for customary course-related activities, subject to keeping this copyright notice in place and unmodified. These slides may be posted as unanimated pdf versions on publicly-accessible course websites.. PowerPoint source (or pdf with animations) may not be posted to publicly-accessible websites, but may be posted for students on internal protected sites or distributed directly to students by other electronic means. Instructors may make printouts of the slides available to students for a reasonable photocopying charge, without incurring royalties. Any other use requires explicit permission. Instructors may obtain PowerPoint source or obtain special use permissions from Wiley – see for information.
2
Introduction
2.1 Introduction
Let’s learn to design digital circuits. We’ll start with a simple form of circuit, the combinational circuit: a digital circuit whose outputs depend solely on the present combination of the circuit inputs’ values.
[Figure: for given values on inputs a and b, the output F of a combinational digital circuit is known (F=1 in the figure), while the output of a sequential digital circuit is not (F=?).]
Note: Slides with animation are denoted with a small red "a" near the animated items.
3
Switches Electronic switches are the basis of binary digital circuits
2.2 Switches
Electronic switches are the basis of binary digital circuits.
Electrical terminology:
Voltage: difference in electric potential between two points.
Current: flow of charged particles.
Resistance: tendency of a wire to resist current flow.
Ohm’s Law: V = I * R.
[Figure: a 9 V source across a 2 ohm resistance drives a current of 4.5 A, since I = V / R = 9 / 2.]
4
Switches
A switch has three parts:
Source input and output: current wants to flow from the source input to the output.
Control input: voltage that controls whether that current can flow.
[Figure: (a) a switch with its control input “off” and with its control input “on”; (b) switch technologies shown next to a quarter for relative size: discrete transistor, IC, relay, vacuum tube.]
5
2.3 The CMOS Transistor
CMOS transistor: the basic switch in modern ICs.
nMOS: conducts when gate = 1, does not conduct when gate = 0.
pMOS: does not conduct when gate = 1, conducts when gate = 0.
Silicon is not quite a conductor or an insulator: it is a semiconductor.
6
2.4 Boolean Logic Gates: Building Blocks for Digital Circuits (Because Switches are Hard to Work With)
“Logic gates” are better digital circuit building blocks than switches (transistors). Why?
7
Boolean Algebra and its Relation to Digital Circuits
Variables represent 0 or 1 only. Operators return 0 or 1 only.
Basic operators:
AND: a AND b returns 1 only when both a=1 and b=1.
OR: a OR b returns 1 if either (or both) a=1 or b=1.
NOT: NOT a returns the opposite of a (1 if a=0, 0 if a=1).
a b | a AND b | a OR b
0 0 |    0    |   0
0 1 |    0    |   1
1 0 |    0    |   1
1 1 |    1    |   1
a | NOT a
0 |   1
1 |   0
8
Converting to Boolean Equations
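Convert the following English statements to Boolean equations.
Q1. a is 1 and b is 1. Answer: F = a AND b
Q2. Either of a or b is 1. Answer: F = a OR b
Q3. Both a and b are not 0. Answer: the English is ambiguous, so there are two options.
(a) Option 1, read as “a is not 0 and b is not 0”: F = NOT(NOT(a)) AND NOT(NOT(b)), which equals a AND b
(b) Option 2, read as “a and b are not both 0”: F = NOT(NOT(a) AND NOT(b)), which equals a OR b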
9
Relating Boolean Algebra to Digital Design
Implement Boolean operators using transistors. Call those implementations logic gates.
[Figure: for each of the NOT, OR, and AND gates, the slide shows the symbol, the truth table, and the transistor circuit, with inputs x and y and output F.]
10
NOT/OR/AND Logic Gate Timing Diagrams
[Figure: timing diagrams plotting inputs x and y and output F over time for the NOT, OR, and AND gates.]
11
Example: Seat Belt Warning Light System
Design circuit for warning light.
Sensors: s=1: seat belt fastened; k=1: key inserted; p=1: person in seat.
Capture Boolean equation: person in seat, and seat belt not fastened, and key inserted:
w = p AND NOT(s) AND k
Convert equation to circuit.
[Figure: circuit BeltWarn with inputs k, p, s and output w.]
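The warning-light equation can be sanity-checked by evaluating it for every sensor combination. The following is a minimal sketch; the function name `belt_warn` is an illustrative choice, not from the slides, and signals are modeled as the integers 0 and 1.

```python
from itertools import product

def belt_warn(k: int, p: int, s: int) -> int:
    """w = p AND NOT(s) AND k, with every signal modeled as 0 or 1."""
    return p & (1 - s) & k

# Print the full truth table: the light is on for exactly one combination
# (key inserted, person in seat, belt not fastened).
for k, p, s in product((0, 1), repeat=3):
    print(f"k={k} p={p} s={s} -> w={belt_warn(k, p, s)}")
```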
12
Boolean Algebra Terminology
Example equation: F(a,b,c) = a’bc + abc’ + ab + c
Variable: represents a value (0 or 1). Three variables: a, b, and c.
Literal: appearance of a variable, in true or complemented form. Nine literals: a’, b, c, a, b, c’, a, b, and c.
Product term: product of literals. Four product terms: a’bc, abc’, ab, c.
Sum-of-products: equation written as OR of product terms only. Above equation is in sum-of-products form. “F = (a+b)c + d” is not.
13
Boolean Algebra Properties
Commutative: a + b = b + a; a * b = b * a
Distributive: a * (b + c) = a * b + a * c; a + (b * c) = (a + b) * (a + c) (this one is tricky!)
Associative: (a + b) + c = a + (b + c); (a * b) * c = a * (b * c)
Identity: 0 + a = a + 0 = a; 1 * a = a * 1 = a
Complement: a + a’ = 1; a * a’ = 0
To prove, just evaluate all possibilities.
Example uses of the properties: show abc + abc’ = ab.
Use first distributive property: abc + abc’ = ab(c+c’).
Complement property: replace c+c’ by 1: ab(c+c’) = ab(1).
Identity property: ab(1) = ab*1 = ab.
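“Evaluate all possibilities” is easy to mechanize. The sketch below, with the hypothetical helper name `holds`, compares two Boolean expressions over every input combination; it checks the tricky distributive property and the worked example abc + abc’ = ab.

```python
from itertools import product

def holds(lhs, rhs, n: int) -> bool:
    """True if lhs and rhs agree for every combination of n inputs (0 or 1)."""
    return all(lhs(*v) == rhs(*v) for v in product((0, 1), repeat=n))

def NOT(a: int) -> int:
    return 1 - a

# Second distributive property: a + (b * c) = (a + b) * (a + c)
tricky = holds(lambda a, b, c: a | (b & c),
               lambda a, b, c: (a | b) & (a | c), 3)

# Worked example: abc + abc' = ab
example = holds(lambda a, b, c: (a & b & c) | (a & b & NOT(c)),
                lambda a, b, c: a & b, 3)

print(tricky, example)
```

The same helper can be pointed at any of the other listed properties, such as DeMorgan’s Law.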
14
Boolean Algebra: Additional Properties
Null elements: a + 1 = 1; a * 0 = 0
Idempotent Law: a + a = a; a * a = a
Involution Law: (a’)’ = a
DeMorgan’s Law: (a + b)’ = a’b’; (ab)’ = a’ + b’ (very useful!)
To prove, just evaluate all possibilities.
[Figure: two circuits, each with inputs a, b, c and output S.]
15
Representations of Boolean Functions
2.6 Representations of Boolean Functions
A function can be represented in different ways. The slide shows seven representations of the same function F(a,b), using four different methods: English, Equation, Circuit, and Truth Table.
English 1: F outputs 1 when a is 0 and b is 0, or when a is 0 and b is 1.
English 2: F outputs 1 when a is 0, regardless of b’s value.
Equation 1: F(a,b) = a’b’ + a’b
Equation 2: F(a,b) = a’
Circuit 1 and Circuit 2: [figures]
Truth table:
a b | F
0 0 | 1
0 1 | 1
1 0 | 0
1 1 | 0
16
Truth Table Representation of Boolean Functions
Define value of F for each possible combination of input values.
2-input function: 4 rows. 3-input function: 8 rows. 4-input function: 16 rows.
[Figure: truth table layouts for (a) two inputs a, b; (b) three inputs a, b, c; (c) four inputs a, b, c, d.]
Q: Use truth table to define function F(a,b,c) that is 1 when abc is 5 or greater in binary.
a b c | F
0 0 0 | 0
0 0 1 | 0
0 1 0 | 0
0 1 1 | 0
1 0 0 | 0
1 0 1 | 1
1 1 0 | 1
1 1 1 | 1
17
Standard Representation: Truth Table
How can we determine if two functions are the same?
We could use algebraic methods. But if we fail to transform one equation into the other, does that prove they are not equal? No.
Solution: convert to truth tables. There is only ONE truth table representation of a given function. It is a standard representation: for a given function, only one version in standard form exists.
Q: Determine if F=ab+a’ is the same function as F=a’b’+a’b+ab, by converting each to a truth table first.
a b | F = ab + a’ | F = a’b’ + a’b + ab
0 0 |      1      |         1
0 1 |      1      |         1
1 0 |      0      |         0
1 1 |      1      |         1
Same.
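The truth-table comparison can be sketched in a few lines. The helper name `truth_table` is an assumption for illustration; it lists F for inputs in the usual binary counting order (00, 01, 10, 11).

```python
from itertools import product

def truth_table(f, n: int) -> list:
    """Return the output column of f's truth table, rows in binary order."""
    return [f(*v) for v in product((0, 1), repeat=n)]

def f1(a, b):
    # F = ab + a'
    return (a & b) | (1 - a)

def f2(a, b):
    # F = a'b' + a'b + ab
    return ((1 - a) & (1 - b)) | ((1 - a) & b) | (a & b)

same = truth_table(f1, 2) == truth_table(f2, 2)
print(truth_table(f1, 2), truth_table(f2, 2), same)
```

Because a function has only one truth table, equal output columns mean the two equations describe the same function.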
18
Canonical Form -- Sum of Minterms
Truth tables are too big for numerous inputs. Use a standard form of equation instead, known as canonical form.
Boolean algebra: create sum of minterms.
Minterm: product term in which every variable of the function appears exactly once as a literal, in true or complemented form.
Just multiply out the equation until it is a sum of product terms. Then expand each term until all terms are minterms.
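As a cross-check on the algebraic expansion, the sum of minterms can also be read off a truth table: one minterm per row where the function is 1. This is a sketch of that alternative route, not the multiply-out procedure itself; `sum_of_minterms` is a hypothetical helper, and it writes the complement mark as an ASCII apostrophe.

```python
from itertools import product

def sum_of_minterms(f, names: str) -> str:
    """Build the canonical sum-of-minterms string for f over the given variables."""
    terms = []
    for values in product((0, 1), repeat=len(names)):
        if f(*values):
            # Each variable appears once: true form for 1, complemented for 0.
            terms.append("".join(n if v else n + "'" for n, v in zip(names, values)))
    return " + ".join(terms)

# F = ab + a' expands to three minterms.
canonical = sum_of_minterms(lambda a, b: (a & b) | (1 - a), "ab")
print(canonical)
```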
19
Multiple-Output Circuits
Many circuits have more than one output. Can give each a separate circuit, or can share gates.
Ex: F = ab + c’, G = ab + bc
Option 1: Separate circuits.
Option 2: Shared gates (the ab term is common to F and G).
[Figure: circuits with inputs a, b, c and outputs F, G for both options.]
20
Multiple-Output Example: BCD to 7-Segment Converter
a = w’x’y’z’ + w’x’yz’ + w’x’yz + w’xy’z + w’xyz’ + w’xyz + wx’y’z’ + wx’y’z
b = w’x’y’z’ + w’x’y’z + w’x’yz’ + w’x’yz + w’xy’z’ + w’xyz + wx’y’z’ + wx’y’z
21
Combinational Logic Design Process
2.7 Combinational Logic Design Process
Step 1: Capture the function. Create a truth table or equations, whichever is most natural for the given problem, to describe the desired behavior of the combinational logic.
Step 2: Convert to equations. This step is only necessary if you captured the function using a truth table instead of equations. Create an equation for each output by ORing all the minterms for that output. Simplify the equations if desired.
Step 3: Implement as a gate-based circuit. For each output, create a circuit corresponding to the output’s equation. (Sharing gates among multiple outputs is optional.)
22
Example: Number of 1s Count
Problem: Output in binary on two outputs yz the number of 1s on three inputs abc (e.g., abc=010 gives yz=01, and abc=000 gives yz=00).
Step 1: Capture the function. Truth table or equation? Truth table is straightforward.
Step 2: Convert to equation.
y = a’bc + ab’c + abc’ + abc
z = a’b’c + a’bc’ + ab’c’ + abc
Step 3: Implement as a gate-based circuit.
[Figure: one circuit with inputs a, b, c for y and one for z.]
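A quick way to gain confidence in the two equations is to evaluate them for all eight input combinations and compare the binary value yz with the actual count of 1s. A minimal sketch, with `count_ones` as an illustrative name:

```python
from itertools import product

def count_ones(a: int, b: int, c: int) -> tuple:
    """Evaluate the Step 2 equations; returns (y, z)."""
    na, nb, nc = 1 - a, 1 - b, 1 - c
    y = (na & b & c) | (a & nb & c) | (a & b & nc) | (a & b & c)
    z = (na & nb & c) | (na & b & nc) | (a & nb & nc) | (a & b & c)
    return y, z

for a, b, c in product((0, 1), repeat=3):
    y, z = count_ones(a, b, c)
    # yz read as a 2-bit binary number should equal the number of 1s.
    print(f"abc={a}{b}{c} -> yz={y}{z}", "ok" if 2 * y + z == a + b + c else "MISMATCH")
```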
23
2.8 More Gates
NAND: Opposite of AND (“NOT AND”)
NOR: Opposite of OR (“NOT OR”)
XOR: Exactly 1 input is 1, for 2-input XOR. (For more inputs: odd number of 1s.)
XNOR: Opposite of XOR (“NOT XOR”)
x y | NAND | NOR | XOR | XNOR
0 0 |  1   |  1  |  0  |  1
0 1 |  1   |  0  |  1  |  0
1 0 |  1   |  0  |  1  |  0
1 1 |  0   |  0  |  0  |  1
NAND is the same as AND with power & ground switched. Why? nMOS conducts 0s well, but not 1s (reasons beyond our scope), so NAND is more efficient. Likewise, NOR is the same as OR with power/ground switched.
AND in CMOS: NAND with NOT. OR in CMOS: NOR with NOT. So NAND/NOR are more common.
24
More Gates: Example Uses
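Aircraft lavatory sign example: S = (abc)’, a 3-input NAND.
Detecting all 0s: use NOR.
Detecting equality: use XNOR.
Detecting odd # of 1s: use XOR. Useful for generating a “parity” bit, common for detecting errors.
[Figures: lavatory sign circuit with inputs a, b, c and output S; equality detector comparing a0 b0, a1 b1, a2 b2 to produce A=B.]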
25
Completeness of NAND
Any Boolean function can be implemented using just NAND gates. Why? Need AND, OR, and NOT.
NOT: 1-input NAND (or 2-input NAND with inputs tied together)
AND: NAND followed by NOT
OR: NAND preceded by NOTs
Likewise for NOR.
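The three constructions can be written out directly, using nothing but a NAND function. This is an illustrative sketch; the trailing underscores in `not_`, `and_`, and `or_` only avoid clashing with Python keywords.

```python
def nand(a: int, b: int) -> int:
    return 1 - (a & b)

def not_(a: int) -> int:
    # NOT: 2-input NAND with inputs tied together
    return nand(a, a)

def and_(a: int, b: int) -> int:
    # AND: NAND followed by NOT
    return not_(nand(a, b))

def or_(a: int, b: int) -> int:
    # OR: NAND preceded by NOTs
    return nand(not_(a), not_(b))

for a in (0, 1):
    for b in (0, 1):
        print(a, b, "AND:", and_(a, b), "OR:", or_(a, b), "NOT a:", not_(a))
```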
26
2.9 Decoders and Muxes
Decoder: popular combinational logic building block, in addition to logic gates. Converts input binary number to one high output.
2-input decoder: four possible input binary numbers, so it has four outputs, one for each possible input binary number.
Internal design: AND gate for each output to detect input combination: d0 = i1’i0’, d1 = i1’i0, d2 = i1i0’, d3 = i1i0.
Decoder with enable e: outputs all 0 if e=0; regular behavior if e=1.
n-input decoder: 2^n outputs.
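A behavioral sketch of an n-input decoder with enable follows. It models what the decoder does rather than its AND-gate internals; the function name and the list-based interface (i0 first, d0 first) are assumptions made for illustration.

```python
def decoder(inputs: list, e: int = 1) -> list:
    """n-input decoder with enable.

    inputs: [i0, i1, ...] with i0 the least significant bit.
    Returns [d0, d1, ..., d(2^n - 1)]; exactly one output is 1 when e=1,
    and all outputs are 0 when e=0.
    """
    n = len(inputs)
    value = sum(bit << k for k, bit in enumerate(inputs))
    return [int(e == 1 and j == value) for j in range(2 ** n)]

print(decoder([1, 0]))        # i1 i0 = 01
print(decoder([1, 1]))        # i1 i0 = 11
print(decoder([1, 1], e=0))   # disabled
```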
27
Multiplexor (Mux)
Mux: Another popular combinational building block. Routes one of its N data inputs to its one output, based on binary value of select inputs.
4-input mux: needs 2 select inputs to indicate which input to route through.
8-input mux: 3 select inputs.
N inputs: log2(N) selects.
Like a railyard switch.
28
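Mux Internal Design
2x1 mux: when s0=0, the AND gate for i0 passes i0 (1*i0=i0) and the OR gate outputs d = 0+i0 = i0.
[Figures: internal design of the 2x1 mux (data inputs i0, i1, select s0, output d) and of the 4x1 mux (data inputs i0, i1, i2, i3, selects s1, s0, output d).]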
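The mux behavior can be sketched as AND-OR equations: each data input is ANDed with the select combination that chooses it, and the results are ORed. The equations below are the standard form and are an assumption about what the slide’s figures show; the function names are illustrative.

```python
def mux2(i0: int, i1: int, s0: int) -> int:
    """2x1 mux: d = i0*s0' + i1*s0."""
    return (i0 & (1 - s0)) | (i1 & s0)

def mux4(i0: int, i1: int, i2: int, i3: int, s1: int, s0: int) -> int:
    """4x1 mux: one AND term per data input, selected by s1 s0."""
    n1, n0 = 1 - s1, 1 - s0
    return (i0 & n1 & n0) | (i1 & n1 & s0) | (i2 & s1 & n0) | (i3 & s1 & s0)

# With s1 s0 = 10, input i2 is routed to the output.
print(mux4(0, 0, 1, 0, s1=1, s0=0))
```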
29
Muxes Commonly Together -- N-bit Mux
Ex: Two 4-bit inputs, A (a3 a2 a1 a0) and B (b3 b2 b1 b0). A 4-bit 2x1 mux (just four 2x1 muxes sharing a select line) can select between A and B.
[Figure: four 2x1 muxes with data inputs a3/b3, a2/b2, a1/b1, a0/b0, a shared select s0, and outputs c3, c2, c1, c0. Simplifying notation: a single 4-bit 2x1 mux symbol, with 4-bit inputs I0 (A) and I1 (B), select s0, and 4-bit output D (C), is short for the four separate muxes.]
30
N-bit Mux Example
Four possible display items: Temperature (T), Average miles-per-gallon (A), Instantaneous mpg (I), and Miles remaining (M). Each is 8 bits wide.
Choose which to display using two inputs x and y.
Use 8-bit 4x1 mux.
31
2.10 Additional Considerations: Schematic Capture and Simulation
Schematic capture: computer tool for user to capture logic circuit graphically.
Simulator: computer tool to show what circuit outputs would be for given inputs. Outputs commonly displayed as waveform.
[Figure: simulating a captured circuit with inputs i0, i1 produces waveforms for outputs d3, d2, d1, d0.]
32
Additional Considerations Non-Ideal Gate Behavior -- Delay
Real gates have some delay. Outputs don’t change immediately after inputs change.
33
Chapter Summary
Combinational circuits: circuit whose outputs are a function of present inputs. No “state”.
Switches: basic component in digital circuits.
Boolean logic gates: AND, OR, NOT. Better building block than switches. Enables use of Boolean algebra to design circuits.
Boolean algebra: uses true/false variables/operators.
Representations of Boolean functions: can translate among them.
Combinational design process: translate from equation (or table) to circuit through well-defined steps.
More gates: NAND, NOR, XOR, XNOR also useful.
Muxes and decoders: additional useful combinational building blocks.
34
Chapter 3: Sequential Logic Design -- Controllers
Digital Design Chapter 3: Sequential Logic Design -- Controllers Slides to accompany the textbook Digital Design, First Edition, by Frank Vahid, John Wiley and Sons Publishers, 2007. Copyright © 2007 Frank Vahid Instructors of courses requiring Vahid's Digital Design textbook (published by John Wiley and Sons) have permission to modify and use these slides for customary course-related activities, subject to keeping this copyright notice in place and unmodified. These slides may be posted as unanimated pdf versions on publicly-accessible course websites.. PowerPoint source (or pdf with animations) may not be posted to publicly-accessible websites, but may be posted for students on internal protected sites or distributed directly to students by other electronic means. Instructors may make printouts of the slides available to students for a reasonable photocopying charge, without incurring royalties. Any other use requires explicit permission. Instructors may obtain PowerPoint source or obtain special use permissions from Wiley – see for information.
35
3.1 Introduction
Sequential circuit: output depends not just on present inputs (as in combinational circuit), but on past sequence of inputs. Stores bits, also known as having “state”.
In this chapter, we will:
Design a new building block, a flip-flop, that stores one bit.
Combine that block to build multi-bit storage: a register.
Describe the sequential behavior using a finite state machine.
Convert a finite state machine to a controller: a sequential circuit having a register and combinational logic.
[Figure: for given values on inputs a and b, the output F of a combinational digital circuit is known, while for a sequential digital circuit F=?. Must know sequence of past inputs to know output.]
Note: Slides with animation are denoted with a small red "a" near the animated items.
36
Example Needing Bit Storage
3.2 Example Needing Bit Storage
Flight attendant call button. Press call: light turns on, and stays on after button released. Press cancel: light turns off.
1. Call button pressed: light turns on.
2. Call button released: light stays on.
3. Cancel button pressed: light turns off.
Logic gate circuit to implement this? A first circuit computing Q from Call and Cancel doesn’t work: Q=1 when Call=1, but doesn’t stay 1 when Call returns to 0. Need some form of “feedback” in the circuit.
[Figure: bit storage block with inputs Call button and Cancel button, driving the blue light.]
37
First attempt at Bit Storage
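We need some sort of feedback. Does the circuit on the right do what we want?
[Figure: a gate with input S whose output Q is fed back as its other input t, with timing diagrams of S, t, and Q.]
No: once S=1 makes Q=1, the feedback keeps Q at 1, and no value of S can bring Q back to 0.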
38
Bit Storage Using an SR Latch
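SR latch: inputs S (set) and R (reset), output Q.
Does the circuit to the right, with cross-coupled NOR gates, do what we want?
Walk through the input sequence (t is the output of the NOR gate fed by S):
S=0, R=1: Q becomes 0 (reset).
S=0, R=0: Q stays 0.
S=1, R=0: Q becomes 1 (set).
S=0, R=0: Q stays 1.
Recall the NOR truth table: the output is 1 only when both inputs are 0.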
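The cross-coupled NOR behavior can be explored with a small simulation that evaluates the two gates repeatedly until the feedback loop settles. This is an idealized sketch (no gate delays, SR=11 not modeled); `sr_latch` and the fixed number of passes are illustrative assumptions.

```python
def nor(a: int, b: int) -> int:
    return 1 - (a | b)

def sr_latch(s: int, r: int, q: int) -> int:
    """Return the settled Q for inputs s, r, given the previous Q.

    Assumes s and r are not both 1. A few passes around the feedback
    loop are enough for the two NOR outputs to stop changing.
    """
    for _ in range(4):
        t = nor(s, q)   # NOR gate fed by S and Q
        q = nor(r, t)   # NOR gate fed by R and t
    return q

# Call button pressed, released, then Cancel pressed, released.
q = 0
trace = []
for s, r in [(1, 0), (0, 0), (0, 1), (0, 0)]:
    q = sr_latch(s, r, q)
    trace.append(q)
print(trace)
```

The trace shows Q staying at 1 after the set input returns to 0, which is the storage behavior the call-button example needs.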
39
Example Using SR Latch for Bit Storage
SR latch can serve as bit storage in previous example of flight-attendant call button.
Call=1: sets Q to 1. Q stays 1 even after Call=0.
Cancel=1: resets Q to 0.
But, there’s a problem...
[Figure: bit storage implemented as an SR latch, with the Call button on S, the Cancel button on R, and Q driving the blue light.]
40
Problem with SR Latch
Problem: If S=1 and R=1 simultaneously, we don’t know what value Q will take. Q may oscillate. Then, because one path will be slightly longer than the other, Q will eventually settle to 1 or 0, but we don’t know which.
41
Problem with SR Latch
Problem is not just one of a user pressing two buttons at the same time. It can also occur even if the SR inputs come from a circuit that supposedly never sets S=1 and R=1 at the same time, but does, due to different delays of different paths.
[Figure: an arbitrary circuit with signals X and Y driving S and R of an SR latch with output Q.]
The longer path from X to R than to S causes SR=11 for a short time, which could be long enough to cause oscillation.
42
Solution: Level-Sensitive SR Latch
Add enable input “C” as shown.
Only let S and R change when C=0. Ensure circuit in front of SR never sets SR=11, except briefly due to path delays.
Change C to 1 only after sufficient time for S and R to be stable. When C becomes 1, the stable S and R values pass through the two AND gates to the SR latch’s S1 and R1 inputs.
[Figure: level-sensitive SR latch (S and R each ANDed with C to form S1 and R1) and its symbol with inputs S, C, R and outputs Q, Q’. Timing diagram: though SR=11 briefly, S1R1 never = 11.]
43
Clock Signals for a Latch
How do we know when it’s safe to set C=1? Most common solution: make C pulse up/down.
C=0: safe to change X, Y.
C=1: must not change X, Y. We’ll see how to ensure that later.
Clock signal: pulsing signal used to enable latches.
Synchronous circuit: sequential circuit whose storage components all use clock signals. Most common type.
Asynchronous circuits: important topic, but left for advanced course.
[Figure: level-sensitive SR latch with X and Y driving S and R, and Clk driving C.]
44
Clocks
Clock period: time interval between pulses. Above signal: period = 20 ns.
Clock cycle: one such time interval. Above signal shows 3.5 clock cycles.
Clock frequency: 1/period. Above signal: frequency = 1 / 20 ns = 50 MHz. (1 Hz = 1/s)
Freq    | Period
100 GHz | 0.01 ns
10 GHz  | 0.1 ns
1 GHz   | 1 ns
100 MHz | 10 ns
10 MHz  | 100 ns
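The period and frequency relationship is a one-line reciprocal. The sketch below reproduces the 20 ns example and the period column of the table; the function names are illustrative, and values are printed with rounding because floating-point division is not exact.

```python
def frequency_hz(period_seconds: float) -> float:
    """Clock frequency is the reciprocal of the clock period."""
    return 1.0 / period_seconds

def period_seconds(freq_hz: float) -> float:
    """Clock period is the reciprocal of the clock frequency."""
    return 1.0 / freq_hz

# 20 ns period, expressed in MHz
print(f"{frequency_hz(20e-9) / 1e6:.1f} MHz")

# Period in ns for each frequency in the table
for f in (100e9, 10e9, 1e9, 100e6, 10e6):
    print(f"{f / 1e6:g} MHz -> {period_seconds(f) * 1e9:g} ns")
```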
45
Level-Sensitive D Latch
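SR latch requires careful design to ensure SR=11 never occurs. D latch relieves designer of that burden: the inserted inverter ensures R is always the opposite of S.
[Figure: D latch internal design, with D driving S directly and R through an inverter, and enable C; D latch symbol with inputs D, C and outputs Q, Q’.]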
46
Problem with Level-Sensitive D Latch
D latch still has a problem (as does SR latch). When C=1, through how many latches will a signal travel? Depends on how long C=1.
Clk_A: signal may travel through multiple latches.
Clk_B: signal may travel through fewer latches.
Hard to pick C that is just the right length.
Can we design bit storage that only stores a value on the rising edge of a clock signal?
[Figure: clock waveform Clk with rising edges marked.]
47
D Flip-Flop
Flip-flop: bit storage that stores on clock edge, not level.
One design: master-servant. Two latches, output of first goes to input of second, and master latch has inverted clock signal. So master is loaded when C=0, then servant when C=1. When C changes from 0 to 1, master is disabled and servant is loaded with the value that was at D just before C changed, i.e., the value at D during the rising edge of C.
Note: Hundreds of different flip-flop designs exist.
[Figure: D flip-flop built from a master D latch (Dm, Cm, Qm) and a servant D latch (Ds, Cs, Qs, Qs’), with Clk inverted into Cm; clock waveform with rising edges marked.]
48
D Flip-Flop
The triangle means clock input, edge triggered.
[Symbols: rising-edge triggered D flip-flop and falling-edge triggered D flip-flop, each with input D and outputs Q, Q’. Clock waveforms with rising edges and falling edges marked.]
Internal design of the falling-edge triggered version: just invert servant clock rather than master.
49
D Flip-Flop
Solves problem of not knowing through how many latches a signal travels when C=1. In the figure, signal travels through exactly one flip-flop, for Clk_A or Clk_B.
Why? Because on rising edge of Clk, all four flip-flops are loaded simultaneously, then all four no longer pay attention to their input until the next rising edge. Doesn’t matter how long Clk is 1.
[Figure: four D flip-flops in series (D1/Q1, D2/Q2, D3/Q3, D4/Q4) with input Y, clocked by Clk (Clk_A or Clk_B). Two latches inside each flip-flop.]
50
D Latch vs. D Flip-Flop
Latch is level-sensitive: stores D when C=1.
Flip-flop is edge triggered: stores D when C changes from 0 to 1.
Saying “level-sensitive latch” or “edge-triggered flip-flop” is redundant.
Two types of flip-flops: rising or falling edge triggered.
Comparing behavior of latch and flip-flop: [timing diagram on slide]
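The difference between the two storage elements shows up clearly in a discrete-time simulation. The sketch below is idealized: waveforms are lists of samples, there are no delays, and the flip-flop samples D at the same step where the clock rises. The function names and the sample waveforms are assumptions for illustration.

```python
def d_latch(clk: list, d: list, q: int = 0) -> list:
    """Level-sensitive: Q follows D at every sample where C=1, holds otherwise."""
    out = []
    for c, x in zip(clk, d):
        if c == 1:
            q = x
        out.append(q)
    return out

def d_flip_flop(clk: list, d: list, q: int = 0) -> list:
    """Rising-edge triggered: Q takes D only when C goes from 0 to 1."""
    out = []
    prev = 0
    for c, x in zip(clk, d):
        if prev == 0 and c == 1:
            q = x
        prev = c
        out.append(q)
    return out

clk = [0, 1, 1, 1, 0, 0, 1, 1]
d   = [1, 1, 0, 1, 0, 0, 0, 1]
latch_q = d_latch(clk, d)
ff_q = d_flip_flop(clk, d)
print("latch:    ", latch_q)
print("flip-flop:", ff_q)
```

While the clock is high, the latch output tracks every change on D, whereas the flip-flop output changes only at the two rising edges.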
51
Bit Storage Summary
SR latch. Feature: S=1 sets Q to 1, R=1 resets Q to 0. Problem: SR=11 yields undefined Q.
Level-sensitive SR latch. Feature: S and R only have effect when C=1. We can design outside circuit so SR=11 never happens when C=1. Problem: avoiding SR=11 can be a burden.
D latch. Feature: SR can’t be 11 if D is stable before and while C=1, and will be 11 for only a brief glitch even if D changes while C=1. Problem: C=1 too long propagates new values through too many latches; too short may not enable a store.
D flip-flop. Feature: Only loads D value present at rising clock edge, so values can’t propagate to other flip-flops during same clock cycle. Tradeoff: uses more gates internally than D latch, and requires more external gates than SR, but gate count is less of an issue today.
We considered increasingly better bit storage until we arrived at the robust D flip-flop bit storage.
52
Basic Register
Typically, we store multi-bit items, e.g., storing a 4-bit binary number.
Register: multiple flip-flops sharing clock signal.
From this point, we’ll use registers for bit storage. No need to think of latches or flip-flops, but now you know what’s inside a register.
53
Finite-State Machines (FSMs) and Controllers
3.3 Finite-State Machines (FSMs) and Controllers
Want sequential circuit with particular behavior over time.
Example: Laser timer. Push button: x=1 for 3 clock cycles.
How? Let’s try three flip-flops. b=1 gets stored in first D flip-flop, then in 2nd flip-flop on next cycle, then in 3rd flip-flop on next. OR the three flip-flop outputs, so x should be 1 for three cycles.
[Figure: controller with button input b, clock clk, and output x driving the laser aimed at the patient.]
54
Describing Behavior of Sequential Circuit: FSM
Finite-State Machine (FSM): a way to describe desired behavior of a sequential circuit. Akin to Boolean equations for combinational behavior. List states, and transitions among states.
Example: Make x toggle (0 to 1, or 1 to 0) every clock cycle.
Two states: “Off” (x=0), and “On” (x=1).
Transition from Off to On, or On to Off, on rising clock edge (clk^).
Arrow with no starting state points to initial state (when circuit first starts).
[FSM: Outputs: x. States Off (x=0) and On (x=1), with a clk^ transition in each direction.]
55
FSM Example: 0,1,1,1,repeat
Want 0, 1, 1, 1, 0, 1, 1, 1, ... Each value for one clock cycle.
Can describe as FSM: four states, transition on rising clock edge to next state.
[FSM: Outputs: x. Off (x=0), On1 (x=1), On2 (x=1), On3 (x=1), each advancing to the next on clk^, with On3 returning to Off. Timing diagram: clk, state, and output x.]
56
Extend FSM to Three-Cycles High Laser Timer
Inputs: b; Outputs: x
Four states.
Wait in “Off” state while b is 0 (b’).
When b is 1 (and rising clock edge), transition to On1, which sets x=1.
On next two clock edges, transition to On2, then On3, which also set x=1.
So x=1 for three cycles after button pressed.
[FSM: Off (x=0) loops on b’*clk^ and goes to On1 on b*clk^; On1, On2, On3 (each x=1) advance on clk^, with On3 returning to Off.]
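The laser timer FSM can be stepped in software, one rising clock edge per input sample. This is a minimal behavioral sketch; the function names and the sample input sequence are illustrative, and the output is taken from the state entered on each edge.

```python
def next_state(state: str, b: int) -> str:
    """Transition function of the laser timer FSM."""
    if state == "Off":
        return "On1" if b else "Off"      # wait in Off while b is 0
    return {"On1": "On2", "On2": "On3", "On3": "Off"}[state]

def output(state: str) -> int:
    """x=0 in Off, x=1 in On1, On2, On3."""
    return 0 if state == "Off" else 1

def run(b_values: list, state: str = "Off") -> list:
    xs = []
    for b in b_values:                    # one entry per rising clock edge
        state = next_state(state, b)
        xs.append(output(state))
    return xs

# Button seen as 1 on the second edge only.
xs = run([0, 1, 0, 0, 0, 0])
print(xs)
```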
57
FSM Simplification: Rising Clock Edges Implicit
Showing rising clock on every transition: cluttered. Make implicit: assume every edge has rising clock, even if not shown.
What if we wanted a transition without a rising edge? We don’t consider such asynchronous FSMs: less common, and advanced topic. Only consider synchronous FSMs: rising edge on every transition.
Note: A transition with no associated condition thus transitions to the next state on the next clock cycle.
[FSM: Inputs: b; Outputs: x. The laser timer FSM redrawn without clk^ on its transitions: Off (x=0) loops on b’ and goes to On1 on b; On1, On2, On3 (each x=1) have unlabeled transitions.]
58
FSM Definition
FSM consists of:
Set of states. Ex: {Off, On1, On2, On3}
Set of inputs, set of outputs. Ex: Inputs: {b}, Outputs: {x}
Initial state. Ex: “Off”
Set of transitions: describes next states. Ex: Has 5 transitions
Set of actions: sets outputs while in states. Ex: x=0, x=1, x=1, and x=1
We often draw an FSM graphically, known as a state diagram. Can also use table (state table), or textual languages.
[FSM: Inputs: b; Outputs: x. Laser timer state diagram with states Off, On1, On2, On3.]
59
FSM Example: Secure Car Key
Many new car keys include a tiny computer chip.
When car starts, car’s computer (under engine hood) requests identifier from key. Key transmits identifier. If it does not, computer shuts off car.
FSM:
Wait until computer requests ID (a=1).
Transmit ID (in this case, 1101).
[FSM: Inputs: a; Outputs: r. Wait (r=0) loops on a’; then K1 (r=1), K2 (r=1), K3 (r=0), K4 (r=1) in sequence.]
60
FSM Example: Secure Car Key (cont.)
Nice feature of FSM: can evaluate output behavior for different input sequences. Timing diagrams show states and output values for different input waveforms.
Q: Determine states and r value for given input waveform.
[Figure: the secure car key FSM (Inputs: a; Outputs: r) with two timing diagrams showing clk, input a, the state sequence Wait, K1, K2, K3, K4, and output r.]
61
FSM Example: Code Detector
Unlock door (u=1) only when buttons pressed in sequence: start, then red, blue, green, red.
Input from each button: s, r, g, b. Also, output a indicates that some colored button is pressed.
FSM:
Wait for start (s=1) in “Wait”.
Once started (“Start”): if see red, go to “Red1”. Then, if see blue, go to “Blue”. Then, if see green, go to “Green”. Then, if see red, go to “Red2”. In that state, open the door (u=1).
Wrong button at any step: return to “Wait”, without opening door.
[Figure: code detector with inputs s (Start), r (Red), g (Green), b (Blue), a and output u to the door lock. FSM: Inputs: s,r,g,b,a; Outputs: u. States Wait, Start, Red1, Blue, Green (u=0) and Red2 (u=1), with transitions s’, s, a’, ar, ab, ag, ar.]
Q: Can you trick this FSM to open the door, without knowing the code?
A: Yes, hold all buttons simultaneously.
62
Improve FSM for Code Detector
Inputs: s,r,g,b,a; Outputs: u
New transition conditions detect if a wrong button is pressed, and return to “Wait”.
[FSM: Wait (u=0) loops on s’ and goes to Start on s. Start, Red1, Blue, and Green (each u=0) loop on a’; the correct buttons ar, ab, ag, ar advance through Red1, Blue, Green to Red2 (u=1); the new conditions ar’, ab’, ag’, ar’ return to Wait.]
FSM provides formal, concrete means to accurately define desired behavior.
Note: small problem still remains; we’ll discuss later.
63
Standard Controller Architecture
How to implement an FSM as a sequential circuit? Use standard architecture, known as a controller:
State register: to store the present state.
Combinational logic: to compute outputs, and next state.
For laser timer FSM: 2-bit state register, can represent four states. Input b, output x.
[Figure: laser timer controller, with combinational logic taking b, s1, s0 and producing x, n1, n0, and a state register clocked by clk. General version: combinational logic with FSM inputs I, FSM outputs O, present state S, and next state N, plus an m-bit state register.]
64
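3.4 Controller Design
Five-step controller design process:
Step 1: Capture the FSM
Step 2: Create architecture
Step 3: Encode the states
Step 4: Create state table
Step 5: Implement combinational logic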
65
Controller Design: Laser Timer Example
Step 1: Capture the FSM. Already done.
Step 2: Create architecture. 2-bit state register (for 4 states). Input b, output x. Next state signals n1, n0.
Step 3: Encode the states. Any encoding with each state unique will work. Here: Off = 00, On1 = 01, On2 = 10, On3 = 11.
[Figures: laser timer FSM (Inputs: b; Outputs: x) annotated with the encodings; controller architecture with combinational logic, state register, and signals b, x, s1, s0, n1, n0, clk.]
66
Controller Design: Laser Timer Example (cont)
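Step 4: Create state table
FSM: Inputs: b; Outputs: x. Off = 00 (x=0), On1 = 01 (x=1), On2 = 10 (x=1), On3 = 11 (x=1).
State | s1 s0 b | x | n1 n0
Off   | 0  0  0 | 0 | 0  0
Off   | 0  0  1 | 0 | 0  1
On1   | 0  1  0 | 1 | 1  0
On1   | 0  1  1 | 1 | 1  0
On2   | 1  0  0 | 1 | 1  1
On2   | 1  0  1 | 1 | 1  1
On3   | 1  1  0 | 1 | 0  0
On3   | 1  1  1 | 1 | 0  0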
67
Controller Design: Laser Timer Example (cont)
Step 5: Implement combinational logic
x = s1 + s0 (note from the table that x=1 if s1 = 1 or s0 = 1)
n1 = s1’s0b’ + s1’s0b + s1s0’b’ + s1s0’b
n1 = s1’s0 + s1s0’
n0 = s1’s0’b + s1s0’b’ + s1s0’b
n0 = s1’s0’b + s1s0’
68
Controller Design: Laser Timer Example (cont)
Step 5: Implement combinational logic (cont)
x = s1 + s0
n1 = s1’s0 + s1s0’
n0 = s1’s0’b + s1s0’
[Figure: gate-level combinational logic for x, n1, n0 connected to the state register (s1, s0), with input b and clock clk.]
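The three equations plus a 2-bit state register are enough to simulate the controller cycle by cycle. The following is a sketch under simple assumptions: the register starts at 00, the input b is sampled once per clock cycle, and x is recorded during each cycle before the register loads the next state.

```python
def logic(s1: int, s0: int, b: int) -> tuple:
    """Combinational logic of the laser timer controller: returns (n1, n0, x)."""
    n1 = ((1 - s1) & s0) | (s1 & (1 - s0))           # n1 = s1's0 + s1s0'
    n0 = ((1 - s1) & (1 - s0) & b) | (s1 & (1 - s0)) # n0 = s1's0'b + s1s0'
    x = s1 | s0                                      # x = s1 + s0
    return n1, n0, x

def simulate(b_values: list) -> list:
    s1 = s0 = 0                   # assume the state register starts at 00 (Off)
    xs = []
    for b in b_values:
        n1, n0, x = logic(s1, s0, b)
        xs.append(x)              # output during the current cycle
        s1, s0 = n1, n0           # register loads next state on the clock edge
    return xs

xs = simulate([0, 1, 0, 0, 0, 0])
print(xs)
```

In the trace, x is 1 for exactly three consecutive cycles after the cycle in which b was 1.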
69
Understanding the Controller’s Behavior
[Figure: walk-through of the laser timer controller over successive clock cycles, showing the values of b, x, n1, n0, s1, s0 in the circuit next to the current state in the FSM (00 Off, 01 On1, 10 On2, 11 On3). Inputs: b; Outputs: x.]
70
Controller Example: Button Press Synchronizer
Want simple sequential circuit that converts a button press to a single-cycle duration, regardless of the length of time that the button is actually pressed. We assumed such an ideal button press signal in earlier examples, like the button in the laser timer controller.
[Figure: button press synchronizer with input bi and output bo.]
71
Controller Example: Button Press Synchronizer (cont)
Step 1: FSM. FSM inputs: bi; FSM outputs: bo. States A (bo=0), B (bo=1), C (bo=0), with transitions on bi and bi’.
Step 2: Create architecture. State register with s1, s0; combinational logic with FSM input bi, FSM output bo, and next state signals n1, n0; clock clk.
Step 3: Encode states. A = 00, B = 01, C = 10.
Step 4: State table. [table on slide]
Step 5: Create combinational circuit.
n1 = s1’s0bi + s1s0’bi
n0 = s1’s0’bi
bo = s1’s0bi’ + s1’s0bi = s1’s0
72
Controller Example: Sequence Generator
Want to generate sequence 0001, 0011, 1100, 1000, (repeat). Each value for one clock cycle. Common, e.g., to create a pattern in 4 lights, or to control the magnets of a “stepper motor”.
Step 1: Create FSM. Inputs: none; Outputs: w,x,y,z. States A (wxyz=0001), B (wxyz=0011), C (wxyz=1100), D (wxyz=1000), visited in that order and repeating.
Step 2: Create architecture. State register with s1, s0; combinational logic producing w, x, y, z and n1, n0; clock clk.
Step 3: Encode states. A = 00, B = 01, C = 10, D = 11.
Step 4: Create state table.
State | s1 s0 | w x y z | n1 n0
A     | 0  0  | 0 0 0 1 | 0  1
B     | 0  1  | 0 0 1 1 | 1  0
C     | 1  0  | 1 1 0 0 | 1  1
D     | 1  1  | 1 0 0 0 | 0  0
Step 5: Create combinational circuit.
w = s1
x = s1s0’
y = s1’s0
z = s1’
n1 = s1 xor s0
n0 = s0’
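Running the Step 5 equations from state 00 should reproduce the requested sequence. A small sketch, with illustrative function names and the assumption that the register starts in state A (00):

```python
def outputs(s1: int, s0: int) -> str:
    """Evaluate w, x, y, z from the present state and return them as a string."""
    w = s1
    x = s1 & (1 - s0)
    y = (1 - s1) & s0
    z = 1 - s1
    return f"{w}{x}{y}{z}"

def next_state(s1: int, s0: int) -> tuple:
    """n1 = s1 xor s0, n0 = s0'."""
    return s1 ^ s0, 1 - s0

def generate(cycles: int) -> list:
    s1 = s0 = 0                  # assumed initial state A = 00
    seq = []
    for _ in range(cycles):
        seq.append(outputs(s1, s0))
        s1, s0 = next_state(s1, s0)
    return seq

seq = generate(5)
print(seq)
```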
73
Controller Example: Secure Car Key
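(from earlier example)
Step 1: FSM. Inputs: a; Outputs: r. States Wait (r=0, loops on a’), K1, K2, K3, K4.
Step 2: Architecture. 3-bit state register with s2, s1, s0; combinational logic with input a, output r, and next state signals n2, n1, n0; clock clk.
Step 3: Encode states: 000, 001, 010, 011, 100.
Step 4: State table. [table on slide]
We’ll omit Step 5.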
74
Example: Seq. Circuit to FSM (Reverse Engineering)
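What does this circuit do? Work backwards.
From the circuit: y = s1’, z = s1s0’, n1 = (s1 xor s0)x, n0 = (s1’*s0’)x.
Then draw the states, then the states with outputs, then the states with outputs and transitions. Pick any state names you want.
[Figure: sequential circuit with input x, outputs y, z, state register (s1, s0, n1, n0), and clock clk.]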
75
Common Pitfalls Regarding Transition Properties
Only one condition should be true, for all transitions leaving a state. Else, which one?
One condition must be true. Else, where go?
76
Verifying Correct Transition Properties
Can verify using Boolean algebra.
Only one condition true: AND of each condition pair (for transitions leaving a state) should equal 0. This proves the pair can never simultaneously be true.
One condition true: OR of all conditions of transitions leaving a state should equal 1. This proves at least one condition must be true.
Example
Q: For shown transitions (conditions a and a’b), prove whether:
* Only one condition true (AND of each pair is always 0)
* One condition true (OR of all transitions is always 1)
Answer:
a * a’b = (a * a’) * b = 0 * b = 0. OK!
a + a’b = a*(1+b) + a’b = a + ab + a’b = a + (a+a’)b = a + b. Fails! Might not be 1 (i.e., a=0, b=0)
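Both properties can also be checked by brute force over all input combinations, which is a useful cross-check on the algebra. In this sketch, `check_transitions` is a hypothetical helper that takes the conditions on the transitions leaving one state.

```python
from itertools import product

def check_transitions(conditions: list, n: int) -> tuple:
    """Return (exclusive, complete) for the conditions leaving one state.

    exclusive: no input combination makes two conditions true at once.
    complete:  every input combination makes at least one condition true.
    """
    rows = list(product((0, 1), repeat=n))
    counts = [sum(c(*v) for c in conditions) for v in rows]
    exclusive = all(k <= 1 for k in counts)
    complete = all(k >= 1 for k in counts)
    return exclusive, complete

# Conditions from the example: a and a'b
conds = [lambda a, b: a, lambda a, b: (1 - a) & b]
result = check_transitions(conds, 2)
print(result)  # (True, False): exclusive, but no transition when a=0, b=0
```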
77
Evidence that Pitfall is Common
Recall code detector FSM. We “fixed” a problem with the transition conditions. Do the transitions obey the two required transition properties?
Consider the transitions of state Start (conditions ar, a’, and a(r’+b+g)), and the “only one true” property:
ar * a’ = (a*a’)r = 0*r = 0
a’ * a(r’+b+g) = (a’*a)*(r’+b+g) = 0*(r’+b+g) = 0
ar * a(r’+b+g) = (a*a)*r*(r’+b+g) = a*r*(r’+b+g) = arr’+arb+arg = 0 + arb+arg = arb + arg = ar(b+g)
Fails! Means that two of Start’s transitions could be true.
Intuitively: press red and blue buttons at the same time: conditions ar and a(r’+b+g) will both be true. Which one should be taken?
Q: How to solve?
A: ar should be arb’g’ (likewise for ab, ag, ar)
[FSM: code detector with states Wait, Start, Red1, Blue, Green (u=0) and Red2 (u=1), and transitions s’, s, a’, ar, ab, ag, ar.]
78
Simplifying Notations
FSMs: Assume unassigned output implicitly assigned 0.
Sequential circuits: Assume unconnected clock inputs connected to same external clock.
79
Non-Ideal Flip-Flop Behavior
Can’t change flip-flop input too close to clock edge.
Setup time: time that D must be stable before edge. Else, stable value not present at internal latch.
Hold time: time that D must be held stable after edge. Else, new value doesn’t have time to loop around and stabilize in internal latch.
Setup time violation: leads to oscillation!
80
Flip-Flop Set and Reset Inputs
Some flip-flops have additional inputs:
Synchronous reset: clears Q to 0 on next clock edge.
Synchronous set: sets Q to 1 on next clock edge.
Asynchronous reset: clears Q to 0 immediately (not dependent on clock edge). Example timing diagram shown.
Asynchronous set: sets Q to 1 immediately.
81
Initial State of a Controller
All our FSMs had initial state, but our sequential circuit designs did not. Can accomplish using flip-flops with reset/set inputs. Shown circuit initializes flip-flops to 01. Designer must ensure reset input is 1 during power up of circuit, by electronic circuit design. [Figure: FSM with input x, output b, and states Off, On1, On2, On3; controller with combinational logic (inputs x, s1, s0; outputs b, n1, n0) and a state register whose flip-flops have R and S inputs tied to reset]
82
Chapter Summary Sequential circuits
Sequential circuits have state. Created robust bit-storage device: D flip-flop. Put several together to build register, which we used to hold state. Defined FSM formal model to describe sequential behavior. Using solid mathematical models -- Boolean equations for combinational circuits, and FSMs for sequential circuits -- is very important. Defined 5-step process to convert FSM to sequential circuit (controller). So now we know how to build the class of sequential circuits known as controllers.
83
Chapter 4: Datapath Components
Digital Design Chapter 4: Datapath Components Slides to accompany the textbook Digital Design, First Edition, by Frank Vahid, John Wiley and Sons Publishers, 2007. Copyright © 2007 Frank Vahid Instructors of courses requiring Vahid's Digital Design textbook (published by John Wiley and Sons) have permission to modify and use these slides for customary course-related activities, subject to keeping this copyright notice in place and unmodified. These slides may be posted as unanimated pdf versions on publicly-accessible course websites.. PowerPoint source (or pdf with animations) may not be posted to publicly-accessible websites, but may be posted for students on internal protected sites or distributed directly to students by other electronic means. Instructors may make printouts of the slides available to students for a reasonable photocopying charge, without incurring royalties. Any other use requires explicit permission. Instructors may obtain PowerPoint source or obtain special use permissions from Wiley – see for information.
84
Introduction
4.1 Introduction. Chapters 2 & 3: Introduced increasingly complex digital building blocks: gates, multiplexors, decoders, basic registers, and controllers. Controllers good for systems with control inputs/outputs. Control input: Single bit (or just a few), representing environment event or state, e.g., 1 bit representing button pressed. Data input: Multiple bits collectively representing single entity, e.g., 7 bits representing temperature in binary. Need building blocks for data: Datapath components, aka register-transfer-level (RTL) components, store/transform data. Put datapath components together to form a datapath. This chapter introduces numerous datapath components, and simple datapaths. Next chapter will combine controllers and datapaths into “processors”. Note: Slides with animation are denoted with a small red "a" near the animated items
85
Registers
4.2 Registers. Can store data, very common in datapaths. Basic register of Ch 3: Loaded every cycle. Useful for implementing FSM -- stores encoded state. For other uses, may want to load only on certain cycles. How extend to only load on certain cycles? [Figure: controller with combinational logic and state register; 4-bit register with inputs I3..I0, outputs Q3..Q0, clk, and a load input. Basic register loads on every clock cycle.]
86
Register with Parallel Load
Add 2x1 mux to front of each flip-flop Register’s load input selects mux input to pass Either existing flip-flop value, or new value to load
87
Basic Example Using Registers
This example will show how registers load simultaneously on clock cycles. Notice that all load inputs are set to 1 in this example -- just for demonstration purposes. [Figure: 4-bit registers R0, R1, R2 with ld inputs, inputs a3..a0, and a shared clk]
88
Basic Example Using Registers
[Figure, continued: 4-bit registers R0, R1, R2 with ld inputs, inputs a3..a0, and a shared clk]
89
Register Example using the Load Input: Weight Sampler
Scale has two displays: Present weight and Saved weight. Useful to compare present item with previous item. Use register to store weight. Pressing button causes present weight to be stored in register. Register contents always displayed as “Saved weight,” even when new present weight appears. [Figure: scale drives the Present weight display and inputs I3..I0 of a 4-bit register in the Weight Sampler; Save button b drives load; outputs Q3..Q0 drive the Saved weight display]
90
Register Example: Temperature History Display
Recall Chpt 3 example. Timer pulse every hour. Previously used as clock. Better design only connects oscillator to clock inputs -- use registers with load input, connect to timer pulse. [Figure: TemperatureHistoryStorage with 5-bit registers Ra, Rb, Rc holding a4..a0, b4..b0, c4..c0; in the new design osc drives the clock inputs and the timer pulse drives ld]
91
Register Example: Above-Mirror Display
Ch2 example: Four simultaneous values from car’s computer. To reduce wires: Computer writes only 1 value at a time, loads into one of four registers. Was: 8+8+8+8 = 32 wires. Now: 8+2+1 = 11 wires. [Figure: 8-bit data C from the car's central computer goes to registers reg0..reg3 (T, A, I, M); a 2x4 decoder with inputs a1, a0 and enable e (load) drives each register's load input; an 8-bit 4x1 mux with select inputs x, y (s1, s0) passes one register to the above-mirror display as D. Shorthand notation: an 8-bit bus is drawn as one line labeled 8; registers are loaded on clock edge]
92
Register Example: Computerized Checkerboard
Each register holds values for one column of lights: a 1 lights the light. Microprocessor loads one register at a time. Occurs fast enough that user sees entire board change at once.
93
Register Example: Computerized Checkerboard
[Figure: registers R0..R7, one per column of LEDs; data D, enable e, and clk; address inputs i2,i1,i0 select 000 (R0), 001 (R1), 010 (R2), 011 (R3), 100 (R4), 101 (R5), 110 (R6), 111 (R7)]
94
Shift Register Shift right
Shift right: Move each bit one position right. Shift in 0 to leftmost bit. Q: Do four right shifts on 1001, showing value after each shift. A: 1001 (original), 0100, 0010, 0001, 0000. Implementation: Connect flip-flop output to next flip-flop’s input; shr_in feeds the leftmost flip-flop. [Figure: register contents before and after shift right]
95
Shift Register To allow register to either shift or retain, use 2x1 muxes shr: 0 means retain, 1 shift shr_in: value to shift in May be 0, or 1 Note: Can easily design shift register that shifts left instead
96
Rotate Register Rotate right: Like shift right, but leftmost bit comes from rightmost bit. [Figure: register contents before and after rotate right]
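A small Python sketch of the shift-right and rotate-right behaviors described on these slides, modeling register contents as a list of bits with the leftmost bit first (the function names are illustrative only):

```python
def shift_right(bits, shr_in=0):
    """Move each bit one position right; shr_in enters at the leftmost bit."""
    return [shr_in] + bits[:-1]

def rotate_right(bits):
    """Like shift right, but the leftmost bit comes from the rightmost bit."""
    return [bits[-1]] + bits[:-1]

def as_text(bits):
    return "".join(str(b) for b in bits)

# Four right shifts on 1001, shifting in 0 each time
value = [1, 0, 0, 1]
history = []
for _ in range(4):
    value = shift_right(value)
    history.append(as_text(value))
print(history)                              # ['0100', '0010', '0001', '0000']
print(as_text(rotate_right([1, 0, 0, 1])))  # 1100
```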
97
Shift Register Example: Above-Mirror Display
Earlier example: 8+2+1 = 11 wires from car’s computer to above-mirror display’s four registers. Better than 32 wires, but 11 still a lot -- want fewer for smaller wire bundles. Use shift registers. Wires: 1+2+1=4. Computer sends one value at a time, one bit per clock cycle. Note: the data line c is 1 bit, rather than 8 bits like before. [Figure: 1-bit line c feeds shr_in of shift registers reg0..reg3 (T, A, I, M); a 2x4 decoder with inputs a1, a0 and enable e (shift) drives each register's shr input; an 8-bit 4x1 mux with select inputs x, y passes one register to the display as D]
98
Multifunction Registers
Many registers have multiple functions: Load, shift, clear (load all 0s). And retain present value, of course. Easily designed using muxes: Just connect each mux input to achieve desired function. Functions (s1 s0 -> operation): 00 Maintain present value; 01 Parallel load; 10 Shift right; 11 (unused - let's load 0s).
99
Multifunction Registers
Functions (s1 s0 -> operation): 00 Maintain present value; 01 Parallel load; 10 Shift right; 11 Shift left.
100
Multifunction Registers with Separate Control Inputs
[Figure: 4-bit multifunction register (I3..I0, Q3..Q0, shr_in, shl_in, s1, s0) with a combinational circuit that converts control inputs ld, shr, shl into s1, s0] Truth table for combinational circuit (inputs ld shr shl -> outputs s1 s0, operation): 000 -> 00 Maintain present value; 001 -> 11 Shift left; 010 -> 10 Shift right; 011 -> 10 Shift right (shr has priority over shl); 100, 101, 110, 111 -> 01 Parallel load (ld has priority). s1 = ld’*shr’*shl + ld’*shr*shl’ + ld’*shr*shl. s0 = ld’*shr’*shl + ld.
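The two equations can be checked against the stated priorities (ld over shr over shl) by enumerating all eight input combinations. A minimal Python sketch, assuming the mux encoding used on these slides (00 maintain, 01 parallel load, 10 shift right, 11 shift left); the function names are illustrative:

```python
from itertools import product

def inv(bit):
    """Invert a single bit (0 -> 1, 1 -> 0)."""
    return 1 - bit

def map_controls(ld, shr, shl):
    """Combinational circuit: separate control inputs -> mux selects (s1, s0)."""
    s1 = (inv(ld) & inv(shr) & shl) | (inv(ld) & shr & inv(shl)) | (inv(ld) & shr & shl)
    s0 = (inv(ld) & inv(shr) & shl) | ld
    return s1, s0

def expected(ld, shr, shl):
    """Same mapping written directly from the priority rules."""
    if ld:
        return (0, 1)   # parallel load
    if shr:
        return (1, 0)   # shift right
    if shl:
        return (1, 1)   # shift left
    return (0, 0)       # maintain present value

for ld, shr, shl in product([0, 1], repeat=3):
    assert map_controls(ld, shr, shl) == expected(ld, shr, shl)
print("equations match the priority table")
```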
101
Register Operation Table
Register operations typically shown using compact version of table. X means same operation whether value is 0 or 1. One X expands to two rows. Two Xs expand to four rows. Put highest priority control input on left to make reduced table simple. Compact table (ld shr shl -> operation): 0 0 0 Maintain value; 0 0 1 Shift left; 0 1 X Shift right; 1 X X Parallel load.
102
Register Design Process
Can design register with desired operations using simple four-step process
103
Register Design Example
Desired register operations: Load, shift left, synchronous clear, synchronous set. Step 1: Determine mux size. 5 operations: above, plus maintain present value (don’t forget this one!) --> Use 8x1 mux. Step 2: Create mux operation table (s2 s1 s0 -> operation): 000 Maintain present value; 001 Parallel load; 010 Shift left; 011 Synchronous clear; 100 Synchronous set. Step 3: Connect mux inputs. [Figure: 8x1 mux with select inputs s2, s1, s0 in front of flip-flop n, with mux inputs from Qn, In, and Qn-1] Step 4: Map control lines (inputs clr set ld shl -> outputs s2 s1 s0, operation): 0 0 0 0 -> 000 Maintain present value; 0 0 0 1 -> 010 Shift left; 0 0 1 X -> 001 Parallel load; 0 1 X X -> 100 Set to all 1s; 1 X X X -> 011 Clear to all 0s. s2 = clr’*set. s1 = clr’*set’*ld’*shl + clr. s0 = clr’*set’*ld + clr.
104
Register Design Example
[Figure: 4-bit register with inputs I3..I0, shl_in, s2, s1, s0 and outputs Q3..Q0; a combinational circuit maps control inputs shl, ld, set, clr to s2, s1, s0] Step 4: Map control lines (inputs clr set ld shl -> outputs s2 s1 s0, operation): 0 0 0 0 -> 000 Maintain present value; 0 0 0 1 -> 010 Shift left; 0 0 1 X -> 001 Parallel load; 0 1 X X -> 100 Set to all 1s; 1 X X X -> 011 Clear to all 0s. s2 = clr’*set. s1 = clr’*set’*ld’*shl + clr. s0 = clr’*set’*ld + clr.
105
Adders Adds two N-bit binary numbers
4.3 Adders. Adds two N-bit binary numbers. 2-bit adder: adds two 2-bit numbers, outputs 3-bit result, e.g., 01 + 11 = 100 (1 + 3 = 4). Can design using combinational design process of Ch 2, but doesn’t work well for reasonable-size N. Why not? [Truth table: inputs a1 a0 b1 b0, outputs c s1 s0; 16 rows]
106
Why Adders Aren’t Built Using Standard Combinational Design Process
4.3 Why Adders Aren’t Built Using Standard Combinational Design Process. Truth table too big. 2-bit adder’s truth table shown: has 2^(2+2) = 16 rows. 8-bit adder: 2^(8+8) = 65,536 rows. 16-bit adder: 2^(16+16) = ~4 billion rows. 32-bit adder: ... Big truth table with numerous 1s/0s yields big logic. Plot shows number of transistors for N-bit adders, using state-of-the-art automated combinational design tool. [Plot: transistors (0 to 10000) versus N (1 to 8)] Q: Predict number of transistors for 16-bit adder. A: 1000 transistors for N=5, doubles for each increase of N. So transistors = 1000*2^(N-5). Thus, for N=16, transistors = 1000*2^(16-5) = 1000*2048 = 2,048,000. Way too many!
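The growth figures on this slide are easy to reproduce. A short Python sketch, assuming the slide's rule of thumb (about 1000 transistors at N=5, doubling with each additional bit); the function names are hypothetical:

```python
def truth_table_rows(n):
    """Rows in the truth table of an N-bit adder: two N-bit inputs -> 2^(N+N)."""
    return 2 ** (n + n)

def estimated_transistors(n):
    """Slide's estimate, intended for N >= 5: 1000 transistors at N=5, doubling per bit."""
    return 1000 * 2 ** (n - 5)

for n in (2, 8, 16):
    print(n, truth_table_rows(n))
print(estimated_transistors(16))  # 2048000
```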
107
Alternative Method to Design an Adder: Imitate Adding by Hand
Alternative adder design: mimic how people do addition by hand. One column at a time: compute sum, add carry to next column. [Figure: by-hand addition of binary numbers A and B, column by column, with carries]
108
Alternative Method to Design an Adder: Imitate Adding by Hand
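Create component for each column: adds that column’s bits, generates sum and carry bits. [Figure: by-hand addition of A and B, with a half-adder (inputs a, b; outputs co, s) for the rightmost column and full-adders (inputs a, b, ci; outputs co, s) for the other columns]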
109
Half-Adder Half-adder: Adds 2 bits, generates sum and carry
Half-adder: Adds 2 bits, generates sum and carry. Design using combinational design process from Ch 2. Step 1: Capture the function (inputs a b -> outputs co s): 00 -> 0 0; 01 -> 0 1; 10 -> 0 1; 11 -> 1 0. Step 2: Convert to equations: co = ab; s = a’b + ab’ (same as s = a xor b). Step 3: Create the circuit. [Figure: half-adder circuit and block symbol with inputs a, b and outputs co, s]
110
Full-Adder Full-adder: Adds 3 bits, generates sum and carry
Full-adder: Adds 3 bits, generates sum and carry. Design using combinational design process from Ch 2. Step 1: Capture the function (truth table with inputs a, b, ci and outputs co, s). Step 2: Convert to equations (c denotes ci). co = a’bc + ab’c + abc’ + abc. co = a’bc + abc + ab’c + abc + abc’ + abc. co = (a’+a)bc + (b’+b)ac + (c’+c)ab. co = bc + ac + ab. s = a’b’c + a’bc’ + ab’c’ + abc. s = a’(b’c + bc’) + a(b’c’ + bc). s = a’(b xor c) + a(b xor c)’. s = a xor b xor c. Step 3: Create the circuit. [Figure: full-adder circuit and block symbol with inputs a, b, ci and outputs co, s]
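The simplified full-adder equations can be checked against ordinary arithmetic for all eight input combinations. A minimal Python sketch (the function name is illustrative):

```python
from itertools import product

def full_adder(a, b, ci):
    """Full-adder using the simplified equations from the slide."""
    s = a ^ b ^ ci                      # s = a xor b xor c
    co = (b & ci) | (a & ci) | (a & b)  # co = bc + ac + ab
    return co, s

# The two output bits, read as a 2-bit number, must equal a + b + ci.
for a, b, ci in product([0, 1], repeat=3):
    co, s = full_adder(a, b, ci)
    assert 2 * co + s == a + b + ci
print("full-adder equations verified")
```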
111
Carry-Ripple Adder Using half-adder and full-adders, we can build adder that adds like we would by hand. Called a carry-ripple adder. 4-bit adder shown: Adds two 4-bit numbers, generates 5-bit output. 5-bit output can be considered 4-bit “sum” plus 1-bit “carry out”. Can easily build any size adder. [Figure: (a) one half-adder (HA) for a0, b0 and three full-adders (FA) for bits 1 to 3, chained through co and ci, producing s3..s0 and co; (b) 4-bit adder block symbol]
112
Carry-Ripple Adder Using full-adder instead of half-adder for first bit, we can include a “carry in” bit in the addition. Will be useful later when we connect smaller adders to form bigger adders. [Figure: (a) four full-adders (FA) chained through co and ci, with inputs a3..a0, b3..b0, ci and outputs co, s3..s0; (b) 4-bit adder block symbol with ci and co]
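A behavioral Python sketch of a carry-ripple adder built from full-adders, as described above. Bit lists are given least significant bit first, which is an assumption of this sketch rather than a convention from the slides, and the example operands are chosen only for illustration:

```python
def full_adder(a, b, ci):
    """One column: returns (carry out, sum)."""
    s = a ^ b ^ ci
    co = (b & ci) | (a & ci) | (a & b)
    return co, s

def carry_ripple_add(a_bits, b_bits, ci=0):
    """Add two equal-length bit lists (index 0 = least significant bit)."""
    carry = ci
    s_bits = []
    for a, b in zip(a_bits, b_bits):
        carry, s = full_adder(a, b, carry)  # carry ripples to the next column
        s_bits.append(s)
    return carry, s_bits

# 0111 + 0001, written LSB first
co, s = carry_ripple_add([1, 1, 1, 0], [1, 0, 0, 0])
print(co, s[::-1])  # 0 [1, 0, 0, 0]  -> 5-bit result 01000
```

The loop mirrors the hardware chain: each column cannot produce its final outputs until the carry from the column to its right has arrived.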
113
Carry-Ripple Adder’s Behavior
Assume all inputs initially 0. [Figure: chain of four full-adders with new input values applied] (answer should be 01000). Output after 2 ns (1 FA delay): Wrong answer -- something wrong? No -- just need more time for carry to ripple through the chain of full adders.
114
Carry-Ripple Adder’s Behavior
(answer should be 01000) [Figure, continued: (b) outputs after 4 ns (2 FA delays); (c) outputs after 6 ns (3 FA delays); (d) output after 8 ns (4 FA delays)] Correct answer appears after 4 FA delays.
115
Cascading Adders
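[Figure: (a) two 4-bit adders cascaded, with the co of the low-order adder connected to the ci of the high-order adder, producing s7..s0 and co; (b) 8-bit adder block with inputs a7..a0, b7..b0 and outputs s7..s0]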
116
Shifters Shifting (e.g., left shifting 0011 yields 0110) useful for:
4.4 Shifters. Shifting (e.g., left shifting 0011 yields 0110) useful for: Manipulating bits. Converting serial data to parallel (remember earlier above-mirror display example with shift registers). Shift left once is same as multiplying by 2 (0011 (3) becomes 0110 (6)). Why? Essentially appending a 0 -- note that multiplying a decimal number by 10 is accomplished just by appending 0, i.e., by shifting left (55 becomes 550). Shift right once is same as dividing by 2. [Figures: (a) left shifter with inputs i3..i0, in and outputs q3..q0, symbol <<1; shifter with left shift or no shift (control input sh, built from 2x1 muxes); shifter with left shift, right shift, and no shift (control inputs shL, shR; shift-in inputs inL, inR)]
117
Shifter Example: Approximate Celsius to Fahrenheit Converter
Convert 8-bit Celsius input to 8-bit Fahrenheit output. F = C * 9/5 + 32. Approximate: F = C*2 + 32. Use left shift: F = left_shift(C) + 32. [Figure: C (e.g., 12) passes through a <<1 shifter with 0 shifted in, giving C*2 (24); an 8-bit adder adds 32, giving F (56)]
118
Barrel Shifter A shifter that can shift by any amount
A shifter that can shift by any amount. 4-bit barrel left shifter can shift left by 0, 1, 2, or 3 positions. 8-bit barrel left shifter can shift left by 0, 1, 2, 3, 4, 5, 6, or 7 positions. (Shifting an 8-bit number by 8 positions is pointless -- you just lose all the bits.) Shift-by-1 shifter uses 2x1 muxes. Could design 8-bit barrel shifter using 8x1 muxes and lots of wires: too many wires. More elegant design: Chain three shifters: 4, 2, and 1. Can achieve any shift of 0..7 by enabling the correct combination of those three shifters, i.e., shifts should sum to desired amount. Q: xyz=??? to shift by 5? A: xyz=101 (shift by 4, then by 1; net result: shift by 5). [Figure: 8-bit input I passes through <<4, <<2, and <<1 shifters enabled by x, y, z, producing Q]
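A behavioral Python sketch of the chained design, assuming x enables the shift-by-4 stage, y the shift-by-2 stage, and z the shift-by-1 stage (that assignment of control bits is an assumption of this sketch, and the function name is illustrative):

```python
def barrel_shift_left(value, x, y, z, width=8):
    """Chain of three left shifters (by 4, 2, 1), each enabled by one control bit."""
    mask = (1 << width) - 1       # keep only `width` bits; shifted-out bits are lost
    if x:
        value = (value << 4) & mask
    if y:
        value = (value << 2) & mask
    if z:
        value = (value << 1) & mask
    return value

# Shift by 5 = 4 + 1, so xyz = 101
print(format(barrel_shift_left(0b00000111, 1, 0, 1), "08b"))  # 11100000

# Every shift amount 0..7 is reachable: xyz is simply the amount in binary
for amount in range(8):
    x, y, z = (amount >> 2) & 1, (amount >> 1) & 1, amount & 1
    assert barrel_shift_left(0b00000001, x, y, z) == (1 << amount) & 0xFF
```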
119
4.5 Comparators N-bit equality comparator: Outputs 1 if two N-bit numbers are equal. 4-bit equality comparator with inputs A and B: a3 must equal b3, a2 = b2, a1 = b1, a0 = b0. Two bits are equal if both 1, or both 0: eq = (a3b3 + a3’b3’) * (a2b2 + a2’b2’) * (a1b1 + a1’b1’) * (a0b0 + a0’b0’). Recall that XNOR outputs 1 if its two input bits are the same: eq = (a3 xnor b3) * (a2 xnor b2) * (a1 xnor b1) * (a0 xnor b0). [Figure: (a) four XNOR gates feeding an AND gate, with example 0110 = 0111 ? giving eq = 0; (b) 4-bit equality comparator block]
120
Magnitude Comparator N-bit magnitude comparator: Indicates whether A>B, A=B, or A<B, for its two N-bit inputs A and B. How design? Consider how compare by hand. First compare a3 and b3. If equal, compare a2 and b2. And so on. Stop if comparison not equal -- whichever’s bit is 1 is greater. If never see unequal bit pair, A=B. Example: A=1011, B=1001. a3 and b3: equal. a2 and b2: equal. a1 and b1: unequal, and a1 is 1, so A > B.
121
Magnitude Comparator By-hand example leads to idea for design
Start at left, compare each bit pair, pass results to the right. Each bit pair called a stage. Each stage has 3 inputs indicating results of higher stage, passes results to lower stage. [Figure: (a) Stage 3 (a3, b3) through Stage 0 (a0, b0), each with inputs in_gt, in_eq, in_lt and outputs out_gt, out_eq, out_lt; initial inputs Igt, Ieq, Ilt; final outputs AgtB, AeqB, AltB; (b) 4-bit magnitude comparator block]
122
Magnitude Comparator Each stage: out_gt = in_gt + (in_eq * a * b’)
Each stage: out_gt = in_gt + (in_eq * a * b’): A>B (so far) if already determined in higher stage, or if higher stages equal but in this stage a=1 and b=0. out_lt = in_lt + (in_eq * a’ * b): A<B (so far) if already determined in higher stage, or if higher stages equal but in this stage a=0 and b=1. out_eq = in_eq * (a XNOR b): A=B (so far) if already determined in higher stage and in this stage a=b too. Simple circuit inside each stage, just a few gates (not shown).
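The stage equations can be exercised directly. A minimal Python sketch, assuming the initial inputs are Igt=0, Ieq=1, Ilt=0 and that bits are listed most significant first (function names are illustrative):

```python
def stage(in_gt, in_eq, in_lt, a, b):
    """One comparator stage, using the three equations from the slide."""
    out_gt = in_gt | (in_eq & a & (1 - b))
    out_lt = in_lt | (in_eq & (1 - a) & b)
    out_eq = in_eq & (1 - (a ^ b))        # a XNOR b
    return out_gt, out_eq, out_lt

def compare(a_bits, b_bits):
    """Ripple the comparison from the highest stage to the lowest."""
    gt, eq, lt = 0, 1, 0                  # assumed Igt, Ieq, Ilt
    for a, b in zip(a_bits, b_bits):
        gt, eq, lt = stage(gt, eq, lt, a, b)
    return gt, eq, lt                     # AgtB, AeqB, AltB

print(compare([1, 0, 1, 1], [1, 0, 0, 1]))  # (1, 0, 0): A > B
```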
123
Magnitude Comparator How does it work? 1011 = 1001 ?
[Figure: four-stage comparator with A=1011, B=1001 and Ieq=1. (a) Ieq=1 causes Stage 3 to compare a3 and b3; they are equal, so out_eq=1 passes to Stage 2. (b) Stage 2 compares a2 and b2; they are equal, so out_eq=1 passes to Stage 1]
124
Magnitude Comparator 1011 = 1001 ? Final answer appears on the right
Final answer appears on the right. Takes time for answer to “ripple” from left to right. Thus called “carry-ripple style” after the carry-ripple adder, even though there’s no “carry” involved. [Figure, continued: (c) Stage 1 compares a1=1 and b1=0; a1 > b1, so out_gt=1. (d) Stage 0 passes the result through: AgtB=1, AeqB=0, AltB=0]
125
4.6 Counters N-bit up-counter: N-bit register that can increment (add 1 to) its own value on each clock cycle. 0000, 0001, 0010, 0011, ...., 1110, 1111, 0000. Note how count “rolls over” from 1111 to 0000. Terminal (last) count, tc, equals 1 during value just before rollover. Internal design: Register, incrementer, and N-input AND gate to detect terminal count. [Figure: 4-bit up-counter block with cnt input, 4-bit output C, and tc output; internally a 4-bit register (ld), a +1 incrementer, and an AND gate producing tc]
126
Incrementer Counter design used incrementer
Incrementer design: Could use carry-ripple adder with B input set to 00...001. But when adding 00...001 to another number, the leading 0’s obviously don’t need to be considered -- so just two bits being added per column. Use half-adders (adds two bits) rather than full-adders (adds three bits). [Figure: (a) four half-adders (HA) chained, with inputs a3..a0 and 1, outputs s3..s0 and co; (b) +1 incrementer block]
127
Incrementer Can build faster incrementer using combinational logic design process. Capture truth table. Derive equation for each output: c0 = a3a2a1a0 ... s0 = a0’. Results in small and fast circuit. Note: works for small N -- larger N leads to exponential growth, like for N-bit adder. [Truth table: inputs a3 a2 a1 a0, outputs c0 s3 s2 s1 s0]
128
Counter Example: Mode in Above-Mirror Display
Recall above-mirror display example from Chapter 2. Assumed component that incremented xy input each time button pressed: 00, 01, 10, 11, 00, 01, 10, 11, 00, ... Can use 2-bit up-counter. Assumes mode=1 for just one clock cycle during each button press. Recall “Button press synchronizer” example from Chapter 3. [Figure: 2-bit up-counter with cnt driven by mode, clk, and outputs c1, c0 driving x, y]
129
Counter Example: 1 Hz Pulse Generator Using 256 Hz Oscillator
Suppose have 256 Hz oscillator, but want 1 Hz pulse. 1 Hz is 1 pulse per second -- useful for keeping time. Design using 8-bit up-counter, use tc output as pulse. Counts from 0 to 255 (256 counts), so pulses tc every 256 cycles. [Figure: 8-bit up-counter with cnt=1, clocked by osc (256 Hz); 8-bit output C unused; tc drives p (1 Hz)]
130
Down-Counter 4-bit down-counter
4-bit down-counter: 1111, 1110, 1101, 1100, …, 0011, 0010, 0001, 0000, 1111, … Terminal count is 0000: use NOR gate to detect. Need decrementer (-1) -- design like designed incrementer. [Figure: 4-bit down-counter with cnt input, 4-bit register (ld), -1 decrementer, NOR gate producing tc, and 4-bit output C]
131
Up/Down-Counter Can count either up or down
Includes both incrementer and decrementer. Use dir input to select, using 2x1 mux: dir=0 means up. Likewise, dir selects appropriate terminal count value. [Figure: 4-bit up/down counter with 4-bit register (ld, clr), +1 and -1 blocks, a 4-bit 2x1 mux selected by dir, a 2x1 mux selecting tc, and 4-bit output C]
132
Counter with Parallel Load
Up-counter that can be loaded with external value. Designed using 2x1 mux -- ld input selects incremented value or external value. Load the internal register when loading external value or when counting. [Figure: 4-bit up-counter with 4-bit load input L, ld, cnt, a 4-bit 2x1 mux, 4-bit register, +1 incrementer, and outputs C and tc]
133
Counter with Parallel Load
Useful to create pulses at specific multiples of clock, not just at N-bit counter’s natural wrap-around of 2^N. Example: Pulse every 9 clock cycles. Use 4-bit down-counter with parallel load. Set parallel load input to 8 (1000). Use terminal count to reload: when count reaches 0, next cycle loads 8. Why load 8 and not 9? Because 0 is included in count sequence: 8, 7, 6, 5, 4, 3, 2, 1, 0 --> 9 counts. [Figure: 4-bit down-counter with cnt=1, L=1000, tc connected to ld, and clk]
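A cycle-by-cycle Python sketch of this arrangement, assuming the counter starts at the load value and that tc is 1 while the count is 0 (the function name pulse_train is hypothetical):

```python
def pulse_train(load_value, cycles):
    """Down-counter with parallel load; tc reloads the counter on the next cycle."""
    count = load_value
    pulses = []
    for _ in range(cycles):
        tc = 1 if count == 0 else 0       # terminal count detected at 0
        pulses.append(tc)
        count = load_value if tc else count - 1
    return pulses

pulses = pulse_train(8, 27)
print([i for i, p in enumerate(pulses) if p])  # [8, 17, 26]: one pulse every 9 cycles
```

Loading 8 gives nine distinct count values (8 down to 0), which is why the pulses are nine cycles apart.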
134
Counter Example: 1 Hz Pulse Generator from 60 Hz Clock
U.S. electricity standard uses 60 Hz signal. Device may convert that to 1 Hz signal to count seconds. Use 6-bit up-counter: can count from 0 to 63. Create simple logic to detect 59 (for 60 counts). Use to clear the counter back to 0 (or to load 0). [Figure: 6-bit up-counter with cnt=1, clocked by osc (60 Hz); logic on output C detects 59 and drives clr and p (1 Hz); tc unused]
135
Subtractor Can build subtractor as we built carry-ripple adder
4.8 Subtractor. Can build subtractor as we built carry-ripple adder. Mimic subtraction by hand: compute borrows from columns on left. Use full-subtractor component: wi is borrow by column on right, wo borrow from column on left. [Figure: (a) by-hand subtraction worked column by column, 1st through 4th column, with borrows; (b) four full-subtractors (FS) chained through wo and wi, with inputs a3..a0, b3..b0 and outputs s3..s0; (c) 4-bit subtractor block]
136
Subtractor Example: DIP-Switch Based Adding/Subtracting Calculator
Extend earlier calculator example. Switch f indicates whether want to add (f=0) or subtract (f=1). Use subtractor and 2x1 mux. [Figure: DIP switches feed A and B of an 8-bit adder (ci, co) and an 8-bit subtractor (wi, wo); an 8-bit 2x1 mux selected by f passes one result to an 8-bit register (ld, clk) in CALC, which drives the LEDs]
137
Subtractor Example: Color Space Converter – RGB to CMYK
Color is often represented as weights of three colors: red, green, and blue (RGB). Perhaps 8 bits each, so specific color is 24 bits. White: R=11111111, G=11111111, B=11111111. Black: R=00000000, G=00000000, B=00000000. Other colors: values in between, e.g., R= , G= , B= would be a reddish purple. Good for computer monitors, which mix red, green, and blue lights to form all colors. Printers use opposite color scheme, because inks absorb light. Use complementary colors of RGB: Cyan (absorbs red; reflects green and blue), Magenta (absorbs green), and Yellow (absorbs blue).
138
Subtractor Example: Color Space Converter – RGB to CMYK
Printers must quickly convert RGB to CMY: C=255-R, M=255-G, Y=255-B. Use subtractors as shown. [Figure: RGB to CMY converter with three 8-bit subtractors, each subtracting R, G, or B from 255 to produce C, M, Y]
139
Subtractor Example: Color Space Converter – RGB to CMYK
Try to save colored inks. Expensive. Imperfect: mixing C, M, Y doesn’t yield good-looking black. Solution: Factor out the black or gray from the color, print that part using black ink. e.g., CMY of (250,200,200) = (200,200,200) + (50,0,0). (200,200,200) is a dark gray: use black ink.
140
Subtractor Example: Color Space Converter – RGB to CMYK
Call black part K. (200,200,200): K=200. (Letter “B” already used for blue.) Compute minimum of C, M, Y values: use MIN component designed earlier, using comparator and mux, to compute K. Output resulting K value, and subtract K value from C, M, and Y values. Ex: Input of (250,200,200) yields output of (50,0,0,200). [Figure: RGB to CMYK converter with an RGB to CMY block, MIN components producing K, and three 8-bit subtractors producing C2, M2, Y2]
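The conversion described on these slides amounts to a few subtractions and a minimum. A behavioral Python sketch, assuming 8-bit color values (0 to 255); the function names are illustrative and do not correspond to specific hardware components:

```python
def rgb_to_cmy(r, g, b):
    """C=255-R, M=255-G, Y=255-B."""
    return 255 - r, 255 - g, 255 - b

def cmy_to_cmyk(c, m, y):
    """Factor out the black part K = min(C, M, Y) and subtract it from each ink."""
    k = min(c, m, y)
    return c - k, m - k, y - k, k

def rgb_to_cmyk(r, g, b):
    return cmy_to_cmyk(*rgb_to_cmy(r, g, b))

print(cmy_to_cmyk(250, 200, 200))  # (50, 0, 0, 200)
print(rgb_to_cmyk(5, 55, 55))      # (50, 0, 0, 200): same color given as RGB
```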
141
Representing Negative Numbers: Two’s Complement
Negative numbers common. How represent in binary? Signed-magnitude: Use leftmost bit for sign bit. So -5 would be: 1101 using four bits, 10000101 using eight bits. Better way: Two’s complement. Big advantage: Allows us to perform subtraction using addition. Thus, only need adder component, no need for separate subtractor component!
142
Ten’s Complement Before introducing two’s complement, let’s consider ten’s complement. But, be aware that computers DO NOT USE TEN’S COMPLEMENT. Introduced for intuition only. Complements for each base ten digit: 1 and 9, 2 and 8, 3 and 7, 4 and 6, 5 and 5, 6 and 4, 7 and 3, 8 and 2, 9 and 1. Complement is the number that when added results in 10.
143
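Ten’s Complement Nice feature of ten’s complement
Nice feature of ten’s complement: Instead of subtracting a number, adding its complement results in answer exactly 10 too much. So just drop the 1 -- results in subtracting using addition only. For example, instead of 7 - 4, add the complement of 4: 7 + 6 = 13; dropping the 1 leaves 3.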
144
Two’s Complement is Easy to Compute: Just Invert Bits and Add 1
Hold on! Sure, adding the ten’s complement achieves subtraction using addition only. But don’t we have to perform subtraction to have determined the complement in the first place? e.g., we only know that the complement of 4 is 6 by subtracting 10-4=6 in the first place. True -- but in binary, it turns out that the two’s complement can be computed easily. Two’s complement of 011 is 101, because 011 + 101 is 1000. Could compute complement of 011 as 1000 - 011 = 101. Easier method: Just invert all the bits, and add 1. The complement of 011 is 100+1 = 101 -- it works! Q: What is the two’s complement of 0101? A: 1010+1=1011 (check: 0101+1011=10000). Q: What is the two’s complement of 0011? A: 1100+1=1101.
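The invert-and-add-1 rule is easy to try out. A minimal Python sketch, with values held as unsigned integers of a given bit width (the function name is illustrative):

```python
def twos_complement(value, width):
    """Invert all bits, add 1, and keep only `width` bits."""
    mask = (1 << width) - 1
    inverted = value ^ mask          # invert every bit
    return (inverted + 1) & mask

print(format(twos_complement(0b011, 3), "03b"))   # 101
print(format(twos_complement(0b0101, 4), "04b"))  # 1011
print(format(twos_complement(0b0011, 4), "04b"))  # 1101

# Check: a number plus its two's complement is 2^width (1 followed by zeros)
assert 0b0101 + twos_complement(0b0101, 4) == 0b10000
```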
145
Two’s Complement Subtractor Built with an Adder
Using two’s complement: A - B = A + (-B) = A + (two’s complement of B) = A + invert_bits(B) + 1. So build subtractor using adder by inverting B’s bits, and setting carry in to 1. [Figure: N-bit adder with input A, inverted B, cin = 1, and output S]
146
Adder/Subtractor Adder/subtractor: control input determines whether add or subtract Can use 2x1 mux – sub input passes either B or inverted B Alternatively, can use XOR gates – if sub input is 0, B’s bits pass through; if sub input is 1, XORs invert B’s bits
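A behavioral Python sketch of the XOR-based adder/subtractor, assuming the sub input drives both the XOR gates on B and the adder's carry-in, as the two's complement subtractor requires (the function name and the default 4-bit width are illustrative):

```python
def add_sub(a, b, sub, width=4):
    """sub=0: a + b.  sub=1: a - b, computed as a + invert_bits(b) + 1."""
    mask = (1 << width) - 1
    b_in = b ^ (mask if sub else 0)   # XOR gates: pass B when sub=0, invert B when sub=1
    total = a + b_in + sub            # sub also serves as the carry-in
    return total & mask               # keep the width-bit sum, drop the carry out

print(format(add_sub(0b0111, 0b0101, sub=1), "04b"))  # 0010 (7 - 5 = 2)
print(format(add_sub(0b0011, 0b0101, sub=1), "04b"))  # 1110 (3 - 5 = -2 in two's complement)
print(format(add_sub(0b0011, 0b0100, sub=0), "04b"))  # 0111 (3 + 4 = 7)
```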
147
Adder/Subtractor Example: Calculator
Previous calculator used separate adder and subtractor. Improve by using adder/subtractor, and two’s complement numbers. [Figure: new design with DIP switches feeding A and B of an 8-bit adder/subtractor (sub driven by f), whose output S loads an 8-bit register (ld, clk) in CALC driving the LEDs; old design with separate 8-bit adder, 8-bit subtractor, and 2x1 mux]
148
Overflow Sometimes result can’t be represented with given number of bits. Either too large magnitude of positive or negative. e.g., 4-bit two’s complement addition of 0111+0001 (7+1=8). But 4-bit two’s complement can’t represent number >7. 0111+0001 = 1000, WRONG answer: 1000 in two’s complement is -8, not +8. Adder/subtractor should indicate when overflow has occurred, so result can be discarded.
149
Detecting Overflow: Method 1
Assuming 4-bit two’s complement numbers, can detect overflow by detecting when the two numbers’ sign bits are the same but are different from the result’s sign bit. If the two numbers’ sign bits are different, overflow is impossible: adding a positive and negative can’t exceed largest magnitude positive or negative. Simple circuit: overflow = a3’b3’s3 + a3b3s3’. Include “overflow” output bit on adder/subtractor. [Figure: example additions (a), (b), (c) with sign bits marked, showing overflow and no-overflow cases] If the numbers’ sign bits have the same value, which differs from the result’s sign bit, overflow has occurred.
150
Detecting Overflow: Method 2
Even simpler method: Detect difference between carry-in to sign bit and carry-out from sign bit. Yields simpler circuit: overflow = c3 xor c4. [Figure: example additions (a), (b), (c) with carries shown, illustrating overflow and no-overflow cases] If the carry into the sign bit column differs from the carry out of that column, overflow has occurred.
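Both detection methods can be compared exhaustively for 4-bit operands. A Python sketch, where c3 is taken to be the carry into the sign-bit column and c4 the carry out of it (the function name is illustrative):

```python
def add_with_overflow(a, b, width=4):
    """Return (sum, overflow by method 1, overflow by method 2) for two's complement add."""
    mask = (1 << width) - 1
    low_mask = mask >> 1                       # all bits below the sign bit
    s = (a + b) & mask

    def sign(v):
        return (v >> (width - 1)) & 1

    a_s, b_s, s_s = sign(a), sign(b), sign(s)
    # Method 1: overflow = a3'b3's3 + a3b3s3'
    method1 = ((1 - a_s) & (1 - b_s) & s_s) | (a_s & b_s & (1 - s_s))
    # Method 2: overflow = c3 xor c4
    carry_in = ((a & low_mask) + (b & low_mask)) >> (width - 1)
    carry_out = (a + b) >> width
    method2 = carry_in ^ carry_out
    return s, method1, method2

print(add_with_overflow(0b0111, 0b0001))  # (8, 1, 1): 0111+0001=1000, overflow

# The two methods agree for every pair of 4-bit operands
for a in range(16):
    for b in range(16):
        _, m1, m2 = add_with_overflow(a, b)
        assert m1 == m2
```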
151
Arithmetic-Logic Unit: ALU
4.9 Arithmetic-Logic Unit: ALU ALU: Component that can perform any of various arithmetic (add, subtract, increment, etc.) and logic (AND, OR, etc.) operations, based on control inputs Motivation: Suppose want multi-function calculator that not only adds and subtracts, but also increments, ANDs, ORs, XORs, etc.
152
Multifunction Calculator without an ALU
Can build multifunction calculator using separate components for each operation, and muxes. But too many wires, and wasted power computing all those operations when at any time you only use one of the results. [Figure: DIP switches feed A and B into separate components (+, -, +1, AND, OR, XOR, NOT); an 8-bit 8x1 mux selected by x, y, z (s2, s1, s0) passes one result to an 8-bit register (ld, clk) in CALC driving the LEDs. Annotations: a lot of wires; wasted power]
153
ALU More efficient design uses ALU
ALU design is not just separate components multiplexed (same problem as previous slide!). Instead, ALU design uses single adder, plus logic in front of adder’s A and B inputs. Logic in front is called an arithmetic-logic extender. Extender modifies the A and B inputs such that desired operation will appear at output of the adder.
154
Arithmetic-Logic Extender in Front of ALU
xyz=000: Want S=A+B -- just pass a to ia, b to ib, and set cin=0. xyz=001: Want S=A-B -- pass a to ia, b’ to ib, and set cin=1. xyz=010: Want S=A+1 -- pass a to ia, set ib=0, and set cin=1. xyz=011: Want S=A -- pass a to ia, set ib=0, and set cin=0. xyz=100: Want S=A AND B -- set ia=a*b, ib=0, and cin=0. Others: likewise. Based on above, create logic for ia(x,y,z,a,b) and ib(x,y,z,a,b) for each abext, and create logic for cin(x,y,z), to complete design of the AL-extender component.
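A behavioral Python sketch of the extender-plus-adder idea, covering only the five operations listed above. It works on whole 8-bit values rather than per-bit abext logic, which is a simplification of this sketch; the function name is illustrative:

```python
def alu(x, y, z, a, b, width=8):
    """AL-extender chooses ia, ib, cin; a single adder then produces S."""
    mask = (1 << width) - 1
    op = (x, y, z)
    if op == (0, 0, 0):      # S = A + B
        ia, ib, cin = a, b, 0
    elif op == (0, 0, 1):    # S = A - B: invert B, carry in 1
        ia, ib, cin = a, b ^ mask, 1
    elif op == (0, 1, 0):    # S = A + 1
        ia, ib, cin = a, 0, 1
    elif op == (0, 1, 1):    # S = A
        ia, ib, cin = a, 0, 0
    elif op == (1, 0, 0):    # S = A AND B: extender computes the AND, adder adds 0
        ia, ib, cin = a & b, 0, 0
    else:
        raise ValueError("operation not listed on the slide")
    return (ia + ib + cin) & mask  # the one shared adder

print(alu(0, 0, 0, 5, 3))  # 8
print(alu(0, 0, 1, 5, 3))  # 2
print(alu(1, 0, 0, 6, 3))  # 2
```

In every case the final line is the same addition, which is the point of the design: only the small logic in front of the adder changes per operation.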
155
ALU Example: Multifunction Calculator
Design using ALU is elegant and efficient. No mass of wires. No big waste of power. [Figure: old design with separate components and an 8-bit 8x1 mux (a lot of wires, wasted power); new design with DIP switches feeding A and B of an ALU controlled by x, y, z, whose output S loads an 8-bit register (ld, clk) in CALC driving the LEDs]
156
4.10 Register Files MxN register file component provides efficient access to M N-bit-wide registers. If we have many registers but only need to access one or two at a time, a register file is more efficient. Ex: Above-mirror display (earlier example), but this time having 16 32-bit registers. Too many wires, and big mux is too slow. [Figure: earlier design with four 8-bit registers reg0..reg3, a 2x4 decoder, and an 8-bit 4x1 mux, next to a version with 16 32-bit registers reg0..reg15, a 4x16 decoder (i3-i0), and a 32-bit 16x1 mux (s3-s0). Annotations: congestion, too much fanout, huge mux]
157
Register File Instead, want component that has one data input and one data output, and allows us to specify which internal register to write and which to read. [Figure: 16x32 register file with 32-bit W_data, 4-bit W_addr, W_en, 32-bit R_data, 4-bit R_addr, R_en]
158
Register File Timing Diagram
Can write one register and read one register each clock cycle. May be same register. [Figure: timing diagram for a 4x32 register file with 32-bit W_data and R_data, 2-bit W_addr and R_addr, W_en, R_en]
159
Register-File Example: Above-Mirror Display
16 32-bit registers that can be written by car’s computer, and displayed. Use 16x32 register file. Simple, elegant design. Register file hides complexity internally. And because only one register needs to be written and/or read at a time, internal design is simple. [Figure: OLD design with 16 32-bit registers, a 4x16 decoder, and a 32-bit 16x1 mux (congestion, too much fanout, huge mux), replaced by a single 16x32 register file]
160
Chapter Summary. Need datapath components to store and operate on multibit data; also known as register-transfer-level (RTL) components. Components introduced: registers, shifters, adders, comparators, counters, multipliers, subtractors, arithmetic-logic units, register files. Next, we’ll combine knowledge of combinational logic design, sequential logic design, and datapath components, to build digital circuits that can perform general and powerful computations.
161
Chapter 5: Register-Transfer Level (RTL) Design
Digital Design Chapter 5: Register-Transfer Level (RTL) Design Slides to accompany the textbook Digital Design, First Edition, by Frank Vahid, John Wiley and Sons Publishers, 2007. Copyright © 2007 Frank Vahid Instructors of courses requiring Vahid's Digital Design textbook (published by John Wiley and Sons) have permission to modify and use these slides for customary course-related activities, subject to keeping this copyright notice in place and unmodified. These slides may be posted as unanimated pdf versions on publicly-accessible course websites.. PowerPoint source (or pdf with animations) may not be posted to publicly-accessible websites, but may be posted for students on internal protected sites or distributed directly to students by other electronic means. Instructors may make printouts of the slides available to students for a reasonable photocopying charge, without incurring royalties. Any other use requires explicit permission. Instructors may obtain PowerPoint source or obtain special use permissions from Wiley – see for information.
162
Introduction
5.1 Introduction. Chapter 3: Controllers. Control input/output: single bit (or just a few) representing event or state. Finite-state machine describes behavior; implemented as state register and combinational logic. Chapter 4: Datapath components. Data input/output: multiple bits collectively representing a single entity. Datapath components included registers, adders, ALU, comparators, register files, etc. This chapter: custom processors. Processor: controller and datapath components working together to implement an algorithm. [Figure: a controller (state register plus combinational logic, with FSM inputs bi and outputs bo) beside a datapath built from registers, a comparator, an ALU, and a register file.] Note: Slides with animation are denoted with a small red "a" near the animated items.
163
RTL Design: Capture Behavior, Convert to Circuit
Recall. Chapter 2: Combinational Logic Design. First step: capture behavior (using equation or truth table). Remaining steps: convert to circuit. Chapter 3: Sequential Logic Design. First step: capture behavior (using FSM). RTL Design (the method for creating custom processors). First step: capture behavior (using high-level state machine, to be introduced). [Figure: Capture behavior, then Convert to circuit.]
164
5.2 RTL Design Method
165
RTL Design Method: “Preview” Example
Soda dispenser. c: bit input, 1 when coin deposited. a: 8-bit input having value of deposited coin. s: 8-bit input having cost of a soda. d: bit output, processor sets to 1 when total value of deposited coins equals or exceeds cost of a soda. How can we precisely describe this processor’s behavior? [Figure: soda dispenser processor with inputs a, s, c and output d; in the example a = 25, s = 50, and tot goes from 25 to 50.]
166
Preview Example: Step 1 -- Capture High-Level State Machine
Declare local register tot. Init state: set d=0, tot=0. Wait state: wait for coin; if see coin, go to Add state. Add state: update total value, tot = tot + a (remember, a is present coin’s value); go back to Wait state. In Wait state, if tot >= s, go to Disp(ense) state. Disp state: set d=1 (dispense soda); return to Init state. High-level state machine: Inputs: c (bit), a (8 bits), s (8 bits). Outputs: d (bit). Local registers: tot (8 bits). States: Init (d=0, tot=0), Wait, Add (tot=tot+a), Disp (d=1). Transitions from Wait: c’*(tot<s) stays in Wait, c goes to Add, c’*(tot<s)’ goes to Disp. [Figure: soda dispenser processor with 8-bit inputs a and s, input c, and output d.]
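The high-level state machine above can be tried out as a short C simulation. This is a sketch under simplifying assumptions: each call runs the current state's actions and then takes its transition, so d reads 1 right after the Disp state has run; the names Soda and soda_step are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

typedef enum { INIT, WAIT, ADD, DISP } SodaState;

typedef struct {
    SodaState state;
    uint8_t tot;   /* local register: total value deposited so far */
    uint8_t d;     /* output: 1 means dispense */
} Soda;

/* Advance the high-level state machine by one clock cycle.
 * c = coin detected, a = value of that coin, s = cost of a soda. */
void soda_step(Soda *m, int c, uint8_t a, uint8_t s)
{
    assert(m != 0);
    switch (m->state) {
    case INIT:
        m->d = 0; m->tot = 0;
        m->state = WAIT;
        break;
    case WAIT:
        if (c)                 m->state = ADD;   /* arc c */
        else if (m->tot >= s)  m->state = DISP;  /* arc c'*(tot<s)' */
        /* otherwise c'*(tot<s): stay in Wait */
        break;
    case ADD:
        m->tot = (uint8_t)(m->tot + a);
        m->state = WAIT;
        break;
    case DISP:
        m->d = 1;
        m->state = INIT;
        break;
    }
}
```

Feeding two 25-cent coins with s = 50 walks Init, Wait, Add, Wait, Add, Wait, Disp and leaves d = 1.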
167
Preview Example: Step 2 -- Create Datapath
Need tot register. Need 8-bit comparator to compare s and tot. Need 8-bit adder to perform tot = tot + a. Wire the components as needed for above. Create control inputs/outputs, give them names. [Figure: datapath with 8-bit register tot (ld, clr), an 8-bit adder, and an 8-bit < comparator; data inputs s and a; control inputs tot_ld and tot_clr; control output tot_lt_s.]
168
Preview Example: Step 3 – Connect Datapath to a Controller
Controller’s inputs: external input c (coin detected); input from datapath comparator’s output, which we named tot_lt_s. Controller’s outputs: external output d (dispense soda); outputs to datapath to load and clear the tot register. [Figure: controller with input c and output d, connected to the datapath (8-bit inputs s and a) through tot_ld, tot_clr, and tot_lt_s.]
169
Preview Example: Step 4 – Derive the Controller’s FSM
Same states and arcs as high-level state machine, but set/read datapath control signals for all datapath operations and conditions. Controller FSM: Inputs: c, tot_lt_s (bit). Outputs: d, tot_ld, tot_clr. States: Init (d=0, tot_clr=1), Wait, Add (tot_ld=1), Disp (d=1). Transitions from Wait: c’*tot_lt_s stays in Wait, c goes to Add, c’*tot_lt_s’ goes to Disp. [Figure: controller connected to the datapath through tot_ld, tot_clr, and tot_lt_s.]
170
Preview Example: Completing the Design
Implement the FSM as a state register and logic, as in Ch3. Table shown on right. [Figure: the controller FSM with inputs c and tot_lt_s and outputs d, tot_ld, tot_clr, and its state table.]
171
Step 1: Create a High-Level State Machine
Let’s consider each step of the RTL design process in more detail. Step 1: soda dispenser example. Not an FSM because: multi-bit (data) inputs a and s; local register tot; data operations tot=0, tot<s, tot=tot+a. Useful high-level state machine: data types beyond just bits; local registers; arithmetic equations/expressions. [Figure: soda dispenser high-level state machine with states Init, Wait, Add, Disp.]
172
Step 1 Example: Laser-Based Distance Measurer
Example of how to create a high-level state machine to describe desired processor behavior. Laser-based distance measurement: pulse laser, measure time T to sense reflection. Laser light travels at speed of light, 3*10^8 m/sec. The pulse covers the distance twice (out and back), so 2D = T sec * 3*10^8 m/sec, and distance is thus D = T sec * 3*10^8 m/sec / 2. [Figure: laser and sensor at distance D from the object of interest; T (in seconds) is the round-trip time.]
173
Step 1 Example: Laser-Based Distance Measurer
Inputs/outputs. B: bit input, from button to begin measurement. L: bit output, activates laser. S: bit input, senses laser reflection. D: 16-bit output, displays computed distance. [Figure: laser-based distance measurer block with B from button, L to laser, S from sensor, and 16-bit D to display.]
174
Step 1 Example: Laser-Based Distance Measurer
Step 1: Create high-level state machine. Begin by declaring inputs and outputs. Create initial state, name it S0. Initialize laser to off (L=0). Initialize displayed distance to 0 (D=0). High-level state machine so far: Inputs: B, S (1 bit each). Outputs: L (bit), D (16 bits). S0: L = 0 (laser off), D = 0 (distance = 0).
175
Step 1 Example: Laser-Based Distance Measurer
Add another state, call it S1, that waits for a button press. B’: stay in S1, keep waiting. B: go to a new state S2. Q: What should S2 do? A: Turn on the laser. High-level state machine so far: Inputs: B, S (1 bit each). Outputs: L (bit), D (16 bits). S0 (L = 0, D = 0) goes to S1; S1 loops on B’ (button not pressed) and leaves on B (button pressed).
176
Step 1 Example: Laser-Based Distance Measurer
Add a state S2 that turns on the laser (L=1). Then turn off laser (L=0) in a state S3. Q: What to do next? A: Start timer, wait to sense reflection. High-level state machine so far: Inputs: B, S (1 bit each). Outputs: L (bit), D (16 bits). S0 (L = 0, D = 0); S1 (loops on B’, leaves on B); S2 (L = 1, laser on); S3 (L = 0, laser off).
177
Step 1 Example: Laser-Based Distance Measurer
Stay in S3 until sense reflection (S). To measure time, count cycles for which we are in S3. To count, declare local register Dctr. Increment Dctr each cycle in S3. Initialize Dctr to 0 in S1; S2 would have been O.K. too. High-level state machine so far: Inputs: B, S (1 bit each). Outputs: L (bit), D (16 bits). Local registers: Dctr (16 bits). S0 (L = 0, D = 0); S1 (Dctr = 0, reset cycle count; loops on B’, leaves on B); S2 (L = 1); S3 (L = 0, Dctr = Dctr + 1, count cycles; loops on S’, no reflection; leaves on S, reflection).
178
Step 1 Example: Laser-Based Distance Measurer
Once reflection detected (S), go to new state S4. Calculate distance: assuming clock frequency is 3x10^8 Hz, Dctr holds number of meters, so D=Dctr/2. After S4, go back to S1 to wait for button again. Complete high-level state machine: Inputs: B, S (1 bit each). Outputs: L (bit), D (16 bits). Local registers: Dctr (16 bits). S0 (L = 0, D = 0); S1 (Dctr = 0; loops on B’, leaves on B); S2 (L = 1); S3 (L = 0, Dctr = Dctr + 1; loops on S’, leaves on S); S4 (D = Dctr / 2, calculate D; then back to S1).
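As a check on the five states, here is a minimal C simulation of the measurer. With a 3x10^8 Hz clock, light travels 1 meter per cycle, so the cycle count in S3 is the round-trip distance in meters. The names Measurer, measurer_step, and measure are hypothetical, and the driver assumes the reflection arrives during the last counted cycle.

```c
#include <assert.h>
#include <stdint.h>

typedef enum { S0, S1, S2, S3, S4 } LaserState;

typedef struct {
    LaserState state;
    uint8_t  L;     /* output: 1 turns the laser on */
    uint16_t D;     /* output: displayed distance in meters */
    uint16_t Dctr;  /* local register: cycles spent waiting in S3 */
} Measurer;

/* Run the current state's actions, then take its outgoing transition.
 * B = button pressed, S = reflection sensed (both sampled this cycle). */
void measurer_step(Measurer *m, int B, int S)
{
    assert(m != 0);
    switch (m->state) {
    case S0: m->L = 0; m->D = 0;          m->state = S1;  break;
    case S1: m->Dctr = 0;          if (B) m->state = S2;  break;
    case S2: m->L = 1;                    m->state = S3;  break;
    case S3: m->L = 0; m->Dctr++;  if (S) m->state = S4;  break;
    case S4: m->D = m->Dctr / 2;          m->state = S1;  break;
    }
}

/* Hypothetical driver: press the button once, then report D when the
 * reflection arrives after round_trip_cycles cycles spent in S3. */
uint16_t measure(uint16_t round_trip_cycles)
{
    Measurer m = { S0, 0, 0, 0 };
    uint16_t k;
    assert(round_trip_cycles >= 1);
    measurer_step(&m, 0, 0);                  /* S0 */
    measurer_step(&m, 1, 0);                  /* S1, button pressed */
    measurer_step(&m, 0, 0);                  /* S2, laser on */
    for (k = 1; k <= round_trip_cycles; k++)  /* S3 until reflection */
        measurer_step(&m, 0, k == round_trip_cycles);
    measurer_step(&m, 0, 0);                  /* S4, compute D */
    return m.D;
}
```

For example, a reflection sensed after 200 cycles in S3 gives Dctr = 200 and a displayed distance of 100 meters.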
179
Step 2: Create a Datapath
Datapath must: implement data storage; implement data computations. Look at high-level state machine, do three substeps. (a) Make data inputs/outputs be datapath inputs/outputs. (b) Instantiate declared registers into the datapath (also instantiate a register for each data output). (c) Examine every state and transition, and instantiate datapath components and connections to implement any data computations. Instantiate: to introduce a new component into a design.
180
Step 2 Example: Laser-Based Distance Measurer
(a) Make data inputs/outputs be datapath inputs/outputs. (b) Instantiate declared registers into the datapath (also instantiate a register for each data output). (c) Examine every state and transition, and instantiate datapath components and connections to implement any data computations. [Figure: the laser measurer high-level state machine beside the datapath so far: Dreg, a 16-bit register (load, clear) driving output D, with controls Dreg_ld and Dreg_clr; Dctr, a 16-bit up-counter (count, clear) with controls Dctr_cnt and Dctr_clr.]
181
Step 2 Example: Laser-Based Distance Measurer
(c) (continued) Examine every state and transition, and instantiate datapath components and connections to implement any data computations. [Figure: the output Q of the 16-bit up-counter Dctr passes through a >>1 shifter (divide by 2) into the data input I of the 16-bit register Dreg, whose output Q drives D; controls are Dreg_clr, Dreg_ld, Dctr_clr, and Dctr_cnt.]
182
Step 2 Example Showing Mux Use
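Introduce mux when one component input can come from more than one source. Example: local registers E, F, G, R (16 bits); one state computes R = E + F and another computes R = R + G. [Figure: (a) the two states; (b) and (c) the adder with inputs A and B connected for each computation alone; (d) 2x1 muxes in front of adder inputs A and B, with select lines add_A_s0 and add_B_s0, so one adder serves both computations.]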
183
Step 3: Connecting the Datapath to a Controller
Laser-based distance measurer example. Easy: just connect all control signals between controller and datapath. [Figure: controller (inputs B from button and S from sensor, output L to laser, 300 MHz clock) connected to the datapath through Dreg_clr, Dreg_ld, Dctr_clr, and Dctr_cnt; the datapath’s 16-bit output D goes to the display.]
184
Step 4: Deriving the Controller’s FSM
FSM has same structure as high-level state machine. Inputs/outputs all bits now. Replace data operations by bit operations using datapath. Controller FSM: Inputs: B, S. Outputs: L, Dreg_clr, Dreg_ld, Dctr_clr, Dctr_cnt. S0: L = 0 (laser off), Dreg_clr = 1, Dreg_ld = 0, Dctr_clr = 0, Dctr_cnt = 0 (clear D reg). S1: L = 0, Dreg_clr = 0, Dreg_ld = 0, Dctr_clr = 1, Dctr_cnt = 0 (clear count); loops on B’, leaves on B. S2: L = 1 (laser on), Dreg_clr = 0, Dreg_ld = 0, Dctr_clr = 0, Dctr_cnt = 0. S3: L = 0 (laser off), Dreg_clr = 0, Dreg_ld = 0, Dctr_clr = 0, Dctr_cnt = 1 (count up); loops on S’, leaves on S. S4: L = 0, Dreg_clr = 0, Dreg_ld = 1, Dctr_clr = 0, Dctr_cnt = 0 (load D reg with Dctr/2, stop counting); returns to S1.
185
Step 4: Deriving the Controller’s FSM
Using the shorthand that outputs not assigned in a state are implicitly assigned 0, the controller FSM becomes: Inputs: B, S. Outputs: L, Dreg_clr, Dreg_ld, Dctr_clr, Dctr_cnt. S0: L = 0 (laser off), Dreg_clr = 1 (clear D reg). S1: Dctr_clr = 1 (clear count); loops on B’, leaves on B. S2: L = 1 (laser on). S3: Dctr_cnt = 1 (count up); loops on S’, leaves on S. S4: Dreg_ld = 1 (load D reg with Dctr/2), Dctr_cnt = 0 (stop counting); returns to S1.
186
Step 4. Implement FSM as state register and logic (Ch3) to complete the design. [Figure: the controller FSM (inputs B, S; outputs L, Dreg_clr, Dreg_ld, Dctr_clr, Dctr_cnt) connected to the datapath (16-bit up-counter Dctr, >>1 shifter, 16-bit register Dreg), with a 300 MHz clock.]
187
RTL Design Examples and Issues
5.3 RTL Design Examples and Issues. We’ll use several more examples to illustrate RTL design. Example: Bus interface. Master processor can read register from any peripheral. Each register has unique 4-bit address (assume 1 register/periph.). Master sets rd=1, A=address. Appropriate peripheral places register data on 32-bit D lines. Periph’s address provided on Faddr inputs (maybe from DIP switches, or another register). [Figure: master processor connected over the processor bus (rd, 4-bit A, 32-bit D) to peripherals Per0 through Per15; each peripheral contains a bus interface (with 4-bit Faddr input) and a main part supplying 32-bit Q.]
188
RTL Example: Bus Interface
Step 1: Create high-level state machine. State WaitMyAddress: output “nothing” (“Z”) on D, store peripheral’s register value Q into local register Q1; wait until this peripheral’s address is seen (A=Faddr) and rd=1. State SendData: output Q1 onto D, wait for rd=0 (meaning main processor is done reading the D lines). Inputs: rd (bit); Q (32 bits); A, Faddr (4 bits). Outputs: D (32 bits). Local register: Q1 (32 bits). WaitMyAddress (D = “Z”, Q1 = Q) goes to SendData on (A = Faddr) and rd, and stays on ((A = Faddr) and rd)’. SendData (D = Q1) stays on rd and returns to WaitMyAddress on rd’.
189
RTL Example: Bus Interface
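High-level state machine: Inputs: rd (bit); Q (32 bits); A, Faddr (4 bits). Outputs: D (32 bits). Local register: Q1 (32 bits). WaitMyAddress (D = “Z”, Q1 = Q) goes to SendData on (A = Faddr) and rd, and stays on ((A = Faddr) and rd)’. SendData (D = Q1) stays on rd and returns to WaitMyAddress on rd’.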
190
RTL Example: Bus Interface
Step 2: Create a datapath. (a) Datapath inputs/outputs. (b) Instantiate declared registers. (c) Instantiate datapath components and connections. [Figure: bus interface datapath: 32-bit register Q1 (ld driven by Q1_ld) loaded from Q; a 4-bit = comparator on A and Faddr producing A_eq_Faddr; Q1 placed on the 32-bit D lines when D_en is 1.]
191
RTL Example: Bus Interface
Step 3: Connect datapath to controller. Step 4: Derive controller’s FSM. Controller FSM: Inputs: rd, A_eq_Faddr (bit). Outputs: Q1_ld, D_en (bit). WaitMyAddress: D_en = 0, Q1_ld = 1; goes to SendData on A_eq_Faddr and rd, otherwise stays. SendData: D_en = 1, Q1_ld = 0; stays on rd, returns to WaitMyAddress on rd’. [Figure: the controller connected to the bus interface datapath through Q1_ld, D_en, and A_eq_Faddr.]
192
RTL Example: Video Compression – Sum of Absolute Differences
Video is a series of frames (e.g., 30 per second). Most frames similar to previous frame. Compression idea: just send difference from previous frame. [Figure: (a) two digitized frames of 1 Mbyte each, whose only difference is a moving ball; (b) digitized frame 1 (1 Mbyte) plus the difference of frame 2 from frame 1 (0.01 Mbyte): just send the difference.]
193
RTL Example: Video Compression – Sum of Absolute Differences
Need to quickly determine whether two frames are similar enough to just send difference for second frame. Compare corresponding 16x16 “blocks”. Treat 16x16 block as 256-byte array. Compute the absolute value of the difference of each array item. Sum those differences: if above a threshold, send complete frame for second frame; if below, can use difference method (using another technique, not described). Each array item is a pixel, assumed to be represented as 1 byte (actually, a color picture might have 3 bytes per pixel, for intensity of red, green, and blue components of pixel). [Figure: corresponding blocks of Frame 1 and Frame 2 being compared.]
194
RTL Example: Video Compression – Sum of Absolute Differences
Want fast sum-of-absolute-differences (SAD) component. When go=1, sums the absolute differences of element pairs in arrays A and B, outputs that sum. [Figure: SAD component with inputs 256-byte array A, 256-byte array B, and go, and integer output sad.]
195
RTL Example: Video Compression – Sum of Absolute Differences
S0: wait for go. S1: initialize sum and index. S2: check if done (i>=256). S3: add difference to sum, increment index. S4: done, write to output sad_reg. High-level state machine: Inputs: A, B (256 byte memory); go (bit). Outputs: sad (32 bits). Local registers: sum, sad_reg (32 bits); i (9 bits). S0 loops on !go and goes to S1 on go. S1: sum = 0, i = 0. S2: goes to S3 on i<256 and to S4 on (i<256)’. S3: sum=sum+abs(A[i]-B[i]), i=i+1; back to S2. S4: sad_reg = sum; back to S0.
196
RTL Example: Video Compression – Sum of Absolute Differences
Step 2: Create datapath. [Figure: the high-level state machine beside the datapath: 9-bit register i (i_inc, i_clr) driving AB_addr and a <256 comparator producing i_lt_256; 8-bit A_data and B_data feeding a subtractor and abs; a 32-bit adder and register sum (sum_ld, sum_clr); 32-bit register sad_reg (sad_reg_ld) driving output sad.]
197
RTL Example: Video Compression – Sum of Absolute Differences
Step 3: Connect to controller. Step 4: Replace high-level state machine by FSM. Controller FSM: S0 loops on go’ and leaves on go. S1: sum_clr=1, i_clr=1 (sum=0, i=0). S2: tests i_lt_256. S3: sum_ld=1, AB_rd=1, i_inc=1 (sum=sum+abs(A[i]-B[i]), i=i+1). S4: sad_reg_ld=1 (sad_reg=sum). [Figure: the controller connected to the SAD datapath through AB_rd, i_lt_256, i_inc, i_clr, sum_ld, sum_clr, and sad_reg_ld.]
198
RTL Example: Video Compression – Sum of Absolute Differences
Comparing software and custom circuit SAD. Circuit: two states (S2 & S3) for each i, 256 i’s, so 512 clock cycles. Software: loop (for i = 1 to 256), but for each i, must move memory to local registers, subtract, compute absolute value, add to sum, increment i: say about 6 cycles per array item, so 256*6 = 1536 cycles. Circuit is about 3 times as fast. Later, we’ll see how to build SAD circuit that is even faster.
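The cycle count for the circuit can be checked by walking the states in C. This sketch counts one clock cycle per state visit, starting just after go is seen in S0; the names SadResult and sad_hlsm are hypothetical. For a 256-element block the walk reports 515 cycles: the 512 cycles of S2/S3 pairs quoted above, plus S1, the final S2 check, and S4.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

typedef struct {
    uint32_t sad;     /* value loaded into sad_reg in S4 */
    unsigned cycles;  /* clock cycles spent in S1 through S4 */
} SadResult;

/* Walk the S1..S4 states once, as if go had just been seen in S0. */
SadResult sad_hlsm(const uint8_t A[256], const uint8_t B[256])
{
    enum { S1, S2, S3, S4, DONE } state = S1;
    uint32_t sum = 0;
    unsigned i = 0;            /* 9-bit register on the slide */
    SadResult r = { 0, 0 };
    assert(A != 0 && B != 0);
    while (state != DONE) {
        r.cycles++;            /* every state occupies one clock cycle */
        switch (state) {
        case S1: sum = 0; i = 0;                   state = S2;    break;
        case S2: state = (i < 256) ? S3 : S4;                     break;
        case S3: sum += (uint32_t)abs(A[i] - B[i]); i++;
                 state = S2;                                      break;
        case S4: r.sad = sum;                      state = DONE;  break;
        default:                                                  break;
        }
    }
    return r;
}
```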
199
RTL Design Pitfalls and Good Practice
Common pitfall: Assuming a register is updated in the state in which it’s written. Final value of Q? Final state? Answers may surprise you: value of Q is unknown, and final state is C, not D. Why? State A: R=99 and Q=R happen simultaneously. State B: R is not updated with R+1 until the next clock cycle, simultaneously with the state register being updated.
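The timing can be imitated in C by computing every next value from the current register contents and then loading all registers together. The slide's figure is not reproduced here, so the arcs out of state B (R<100 goes to C, otherwise D) are an assumption chosen to match the stated answers; Regs, clock_edge, and pitfall_demo are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>

typedef enum { ST_A, ST_B, ST_C, ST_D } PState;

typedef struct {
    PState  state;
    uint8_t R, Q;
} Regs;

/* One clock cycle: every next value is computed from the CURRENT register
 * contents, and all registers (including the state register) load together. */
Regs clock_edge(Regs cur)
{
    Regs next = cur;
    switch (cur.state) {
    case ST_A:
        next.R = 99;
        next.Q = cur.R;          /* sees the old R, not 99 */
        next.state = ST_B;
        break;
    case ST_B:
        next.R = (uint8_t)(cur.R + 1);
        /* assumed arcs: R<100 goes to C, otherwise D; cur.R is still 99 */
        next.state = (cur.R < 100) ? ST_C : ST_D;
        break;
    default:
        break;                   /* C and D just hold in this sketch */
    }
    return next;
}

/* Starting from an arbitrary R, Q ends up with the stale value. */
void pitfall_demo(void)
{
    Regs r = { ST_A, 7, 0 };
    r = clock_edge(r);           /* leaves state A */
    assert(r.R == 99 && r.Q == 7);
    r = clock_edge(r);           /* leaves state B */
    assert(r.R == 100 && r.state == ST_C);
}
```

Q receives whatever R held before state A, which is why its final value is unknown, and the condition in B is evaluated while R is still 99.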
200
RTL Design Pitfalls and Good Practice
Solutions: read register in following state (Q=R); insert extra state so that conditions use updated value. Other solutions are possible, depending on the example.
201
RTL Design Pitfalls and Good Practice
Common pitfall: Reading outputs. Outputs can only be written. Solution: introduce additional register, which can be written and read. [Figure: (a) Inputs: A, B (8 bits); Outputs: P (8 bits); state S: P=A; state T: P=P+B, which reads output P. (b) Same inputs and outputs plus local register R (8 bits); state S: R=A, P=A; state T: P=R+B.]
202
RTL Design Pitfalls and Good Practice
Good practice: Register all data outputs. In fig (a), output P would show spurious values as addition computes. Furthermore, longest register-to-register path, which determines clock period, is not known until that output is connected to another component. In fig (b), spurious outputs reduced, and longest register-to-register path is clear. [Figure: (a) register R and input B feed an adder whose output is P directly; (b) the adder output is loaded into register Preg, which drives P.]
203
Control vs. Data Dominated RTL Design
Designs often categorized as control-dominated or data-dominated. Control-dominated design: controller contains most of the complexity. Data-dominated design: datapath contains most of the complexity. General, descriptive terms: no hard rule that separates the two types of designs. Laser-based distance measurer: control dominated. Bus interface, SAD circuit: mix of control and data. Now let’s do a data dominated design.
204
Data Dominated RTL Design Example: FIR Filter
Filter concept. Suppose X is data from a temperature sensor, and particular input sequence is 180, 180, 181, 240, 180, 181 (one per clock cycle). That 240 is probably wrong! Could be electrical noise. Filter should remove such noise in its output Y. Simple filter: output average of last N values. Small N: less filtering. Large N: more filtering, but less sharp output. [Figure: digital filter with 12-bit input X, 12-bit output Y, and clk.]
205
Data Dominated RTL Design Example: FIR Filter
“Finite Impulse Response”: simply a configurable weighted sum of past input values, y(t) = c0*x(t) + c1*x(t-1) + c2*x(t-2). Above known as “3 tap”; tens of taps more common. Very general filter: user sets the constants (c0, c1, c2) to define specific filter. RTL design Step 1: Create high-level state machine. But there really is none! Data dominated indeed. Go straight to step 2. [Figure: digital filter with 12-bit input X, 12-bit output Y, and clk.]
206
Data Dominated RTL Design Example: FIR Filter
Step 2: Create datapath. Begin by creating chain of xt registers to hold past values of X. Suppose sequence is: 180, 181, 240. [Figure: the values shift through the register chain one position per clock cycle: first 180; then 181 followed by 180; then 240 followed by 181 and 180.]
207
Data Dominated RTL Design Example: FIR Filter
Step 2: Create datapath (cont.). Instantiate registers for c0, c1, c2. Instantiate multipliers to compute c*x values. [Figure: 3-tap FIR filter with registers xt0, xt1, xt2 holding x(t), x(t-1), x(t-2), registers c0, c1, c2, and a multiplier for each pair.]
208
Data Dominated RTL Design Example: FIR Filter
Step 2: Create datapath (cont.). Instantiate adders. [Figure: 3-tap FIR filter in which the three products c0*x(t), c1*x(t-1), c2*x(t-2) are summed by adders to produce Y.]
209
Data Dominated RTL Design Example: FIR Filter
Step 2: Create datapath (cont.). Add circuitry to allow loading of particular c register. [Figure: 3-tap FIR filter in which a 2x4 decoder with address inputs Ca1, Ca0 and enable CL selects which of c0, c1, c2 loads from input C; the adder output is registered in yreg, which drives Y.]
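The datapath built up over the last few slides behaves like the following C sketch. It is a behavioral model only: Fir3, fir_load_c, and fir_step are hypothetical names, values are kept in 32-bit integers rather than 12-bit signals, and the function returns the sum the adders present right after the shift (on the slide, yreg would capture that sum at the following clock edge).

```c
#include <assert.h>
#include <stdint.h>

typedef struct {
    int32_t c[3];    /* constants c0, c1, c2 */
    int32_t xt[3];   /* xt0, xt1, xt2 hold x(t), x(t-1), x(t-2) */
} Fir3;

/* Load one constant register, like the decoder-selected load on the slide. */
void fir_load_c(Fir3 *f, unsigned index, int32_t value)
{
    assert(f != 0 && index < 3);
    f->c[index] = value;
}

/* One clock edge: shift x into the xt chain, then form the weighted sum
 * y(t) = c0*x(t) + c1*x(t-1) + c2*x(t-2). */
int32_t fir_step(Fir3 *f, int32_t x)
{
    assert(f != 0);
    f->xt[2] = f->xt[1];
    f->xt[1] = f->xt[0];
    f->xt[0] = x;
    return f->c[0] * f->xt[0] + f->c[1] * f->xt[1] + f->c[2] * f->xt[2];
}
```

With all three constants set to 1, feeding 180, 180, 181 yields 541 on the third step, the sum of the last three inputs.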
210
Data Dominated RTL Design Example: FIR Filter
y(t) = c0*x(t) + c1*x(t-1) + c2*x(t-2). Step 3 & 4: Connect to controller, create FSM. No controller needed: extreme data-dominated example. (Example of an extreme control-dominated design: an FSM, with no datapath.) Comparing the FIR circuit to a software implementation. Circuit: assume adder has 2-gate delay, multiplier has 20-gate delay. Longest path goes through one multiplier and two adders: 20 + 2 + 2 = 24-gate delay. A 100-tap filter, following design on previous slide, would have about a 34-gate delay: 1 multiplier and 7 adders on longest path. Software: 100-tap filter needs 100 multiplications, 100 additions. Say 2 instructions per multiplication, 2 per addition. Say 10-gate delay per instruction. (100*2 + 100*2)*10 = 4000 gate delays. Circuit is more than 100 times as fast. Wow.
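The delay figures above can be reproduced with a few lines of C. The sketch assumes the products are summed by a balanced tree of 2-input adders, which is what gives 7 adders on the longest path for 100 taps; the function names are hypothetical.

```c
#include <assert.h>

/* Levels in a balanced tree of 2-input adders that sums `taps` products. */
unsigned adder_tree_levels(unsigned taps)
{
    unsigned levels = 0, width = 1;
    assert(taps >= 1);
    while (width < taps) { width *= 2; levels++; }
    return levels;
}

/* Longest path in gate delays: one multiplier, then the adder tree. */
unsigned fir_circuit_delay(unsigned taps, unsigned mul_delay, unsigned add_delay)
{
    return mul_delay + adder_tree_levels(taps) * add_delay;
}

/* Software estimate: instructions per multiply and per add, times the
 * gate delay charged to each instruction. */
unsigned fir_software_delay(unsigned taps, unsigned instr_per_mul,
                            unsigned instr_per_add, unsigned delay_per_instr)
{
    return (taps * instr_per_mul + taps * instr_per_add) * delay_per_instr;
}
```

With the slide's numbers, the 3-tap circuit comes to 24 gate delays, the 100-tap circuit to 34, and the 100-tap software version to 4000.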
211
Determining Clock Frequency
5.4 Determining Clock Frequency. Designers of digital circuits often want fastest performance, which means they want a high clock frequency. Frequency limited by longest register-to-register delay, known as critical path. If clock is any faster, incorrect data may be stored into register. Longest path on right is 2 ns (ignoring wire delays, and register setup and hold times, for simplicity). [Figure: registers a and b feed an adder with 2 ns delay whose output is loaded into register c; clk drives the registers.]
212
Critical Path
Example shows four paths. a to c through +: 2 ns. a to d through + and *: 7 ns. b to d through + and *: 7 ns. b to d through *: 5 ns. Longest path is thus 7 ns: Max(2,7,7,5) = 7 ns. Fastest frequency: 1 / 7 ns = about 142 MHz. [Figure: registers a and b feed an adder (2 ns delay) and a multiplier (5 ns delay) whose outputs are loaded into registers c and d.]
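The same calculation, written as a small C sketch: take the maximum over all register-to-register path delays, then invert it. The function names are hypothetical, and wire delays and setup and hold times are ignored here, as on the slide.

```c
#include <assert.h>
#include <stddef.h>

/* Longest register-to-register delay, in nanoseconds. */
double critical_path_ns(const double *path_ns, size_t n)
{
    double longest;
    size_t k;
    assert(path_ns != 0 && n > 0);
    longest = path_ns[0];
    for (k = 1; k < n; k++)
        if (path_ns[k] > longest) longest = path_ns[k];
    return longest;
}

/* Fastest clock frequency in MHz for a given critical path. */
double max_frequency_mhz(double critical_ns)
{
    assert(critical_ns > 0.0);
    return 1000.0 / critical_ns;   /* 1 / (t ns) = (1000 / t) MHz */
}
```

For the four paths above (2, 7, 7, and 5 ns) the critical path is 7 ns, giving a little under 143 MHz.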
213
Critical Path Considering Wire Delays
Real wires have delay too; must include in critical path. Example shows two paths, each 0.5 + 2 + 0.5 = 3 ns. Trend: in the 1980s/1990s, wire delays were tiny compared to logic delays. But wire delays are not shrinking as fast as logic delays; wire delays may even be greater than logic delays! Must also consider register setup and hold times, which also add to path. Then add some time to the computed path, just to be safe, e.g., if path is 3 ns, say 4 ns instead. [Figure: registers a and b each reach the 2 ns adder through a 0.5 ns wire, and the adder output reaches register c through another 0.5 ns wire.]
214
A Circuit May Have Numerous Paths
Paths can exist: in the datapath; in the controller; between the controller and datapath. May be hundreds or thousands of paths. Timing analysis tools that evaluate all possible paths automatically are very helpful. [Figure: soda dispenser controller (combinational logic and state register) and datapath (tot register, 8-bit adder, 8-bit < comparator) with three example paths labeled (a), (b), and (c).]
215
Behavioral Level Design: C to Gates
5.5 Behavioral Level Design: C to Gates. Earlier sum-of-absolute-differences example started with a high-level state machine. C code is an even better starting point: easier to understand. C code:
int SAD (byte A[256], byte B[256]) // not quite C syntax
{
   uint sum; short uint i;
   sum = 0;
   i = 0;
   while (i < 256) {
      sum = sum + abs(A[i] - B[i]);
      i = i + 1;
   }
   return sum;
}
[Figure: the corresponding high-level state machine with states S0 through S4.]
216
Behavioral-Level Design: Start with C (or Similar Language)
Replace first step of RTL design method by two steps: capture in C, then convert C to high-level state machine. Step 1A: Capture in C. Step 1B: Convert to high-level state machine. How to convert from C to high-level state machine?
217
Converting from C to High-Level State Machine
Convert each C construct to equivalent states and transitions. Assignment statement (target = expression;): becomes one state with assignment. If-then statement (if (cond) { // then stmts }): becomes state with condition check, transitioning to “then” statements if condition true, otherwise to ending state; “then” statements would also be converted to states. [Figure: one state containing target=expression; a condition-check state with arc cond to (then stmts) and arc !cond to (end).]
218
Converting from C to High-Level State Machine
If-then-else (if (cond) { // then stmts } else { // else stmts }): becomes state with condition check, transitioning to “then” statements if condition true, or to “else” statements if condition false. While loop statement (while (cond) { // while stmts }): becomes state with condition check, transitioning to while loop’s statements if true, then transitioning back to condition check. [Figure: a condition-check state with arc cond to (then stmts) and arc !cond to (else stmts), both leading to (end); a condition-check state with arc cond to (while stmts), which loops back, and arc !cond to (end).]
219
Simple Example of Converting from C to High-Level State Machine
Simple example: Computing the maximum of two numbers. (a) Inputs: uint X, Y. Outputs: uint Max. if (X > Y) { Max = X; } else { Max = Y; } Convert if-then-else statement to states (b): a condition-check state goes to (then stmts) on X>Y and to (else stmts) on !(X>Y), both leading to (end). Then convert assignment statements to states (c): state Max=X on X>Y and state Max=Y on !(X>Y), both leading to (end).
220
Example: Converting Sum-of-Absolute-Differences C code to High-Level State Machine
Convert each construct to states. Simplify when possible, e.g., merge states. From high-level state machine, follow RTL design method to create circuit. Thus, can convert C to gates using straightforward automatable process. Not all C constructs can be efficiently converted; use C subset if intended for circuit. Can use languages other than C, of course. (a) C code:
Inputs: byte A[256], B[256]; bit go; Output: int sad
main() {
   uint sum; short uint i;
   while (1) {
      while (!go);
      sum = 0;
      i = 0;
      while (i < 256) {
         sum = sum + abs(A[i] - B[i]);
         i = i + 1;
      }
      sad = sum;
   }
}
[Figure: panels (b) through (g) convert the code step by step: the while (!go) wait state with arcs !go and go; a state with sum=0 and i=0; the inner while loop with arcs i<256 and !(i<256); the loop body sum=sum+abs(A[i]-B[i]) and i=i+1; and the final state sad = sum.]
224
5.6 Memory Components. Register-transfer level design instantiates datapath components to create datapath, controlled by a controller. A few more components are often used outside the controller and datapath. MxN memory: M words, N bits wide each. Several varieties of memory, which we now introduce. [Figure: M × N memory, M words, N bits wide each.]
225
Random Access Memory (RAM)
RAM: readable and writable memory. “Random access memory”: strange name, created several decades ago to contrast with sequentially-accessed storage like tape drives. Logically same as register file: memory with address inputs, data inputs/outputs, and control. RAM usually has just one port; register file usually two or more. RAM vs. register file: RAM typically larger than roughly 512 or 1024 words. RAM typically stores bits using a bit storage approach that is more efficient than a flip flop. RAM typically implemented on a chip in a square rather than rectangular shape, which keeps longest wires (hence delay) short. [Figure: 16x32 register file from Chpt. 4 (W_data, W_addr, W_en, R_data, R_addr, R_en) beside the RAM block symbol for a 1024x32 RAM with 32-bit data, 10-bit addr, rw, and en.]
226
RAM Internal Structure
Similar internal structure as register file. Decoder enables appropriate word based on address inputs. rw controls whether cell is written or read. Let’s see what’s inside each RAM cell. [Figure: internal structure of an MxN RAM (1024x32 in the block symbol). Let A = log2 M. An AxM decoder with enable takes addr0 through addr(A-1) and turns on one word line; each word is a row of bit storage blocks (aka “cells”) with word enable, rw, and data connections; wdata(N-1) through wdata0 enter the columns and rdata(N-1) through rdata0 leave them; clk, en, and rw go to all cells.]
227
Static RAM (SRAM)
“Static” RAM cell: 6 transistors (recall inverter is 2 transistors). Writing this cell: word enable input comes from decoder. When 0, value d loops around inverters; that loop is where a bit stays stored. When 1, the data bit value enters the loop: data is the bit to be stored in this cell, and data’ enters on other side. Example shows a “1” being written into cell. [Figure: SRAM cell with lines data and data’, two cross-coupled inverters holding d and d’, and word enable; in the example word enable = 1 and data = 1.]
228
Static RAM (SRAM)
“Static” RAM cell. Reading this cell: somewhat trickier. When rw set to read, the RAM logic sets both data and data’ to 1. The stored bit d will pull either the left line or the right line down slightly below 1. “Sense amplifiers” detect which side is slightly pulled down. The electrical description of SRAM is really beyond our scope; just the general idea here, mainly to contrast with DRAM. [Figure: SRAM cell with word enable = 1 and both data and data’ driven to 1; one line is pulled to slightly less than 1, and both lines go to the sense amplifiers.]
229
Dynamic RAM (DRAM)
“Dynamic” RAM cell
  1 transistor (rather than 6).
  Relies on a large capacitor to store the bit.
    Write: the transistor conducts, and the data voltage level gets stored on the top plate of the capacitor.
    Read: just look at the value of d.
  Problem: the capacitor discharges over time.
    Must “refresh” regularly, by reading d and then writing it right back.
[Figure: 1024x32 RAM block symbol; (a) DRAM cell with data line, word enable, stored value d, and a slowly discharging capacitor; (b) timing sketch of data, enable, and d as d discharges]
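The refresh idea can be illustrated with a toy model, sketched below under loud assumptions: charge is an integer percentage, it leaks by a fixed amount per time step, and a read sees 1 whenever the charge is above half. None of these numbers come from the slides or from any real DRAM; DramCell and bit_after are hypothetical names.

```python
class DramCell:
    """Toy DRAM cell: the stored bit is a charge level that leaks away over time."""

    FULL_CHARGE = 100    # assumed: charge right after writing a 1 (percent)
    LEAK_PER_STEP = 10   # assumed: charge lost per time step
    THRESHOLD = 50       # assumed: a read sees 1 only above this level

    def __init__(self):
        self.charge = 0

    def write(self, bit):
        self.charge = self.FULL_CHARGE if bit else 0

    def read(self):
        return 1 if self.charge > self.THRESHOLD else 0

    def tick(self):
        """One time step of capacitor discharge."""
        self.charge = max(0, self.charge - self.LEAK_PER_STEP)

    def refresh(self):
        """Read d and write it right back, restoring full charge for a stored 1."""
        self.write(self.read())


def bit_after(steps, refresh_every=None):
    """Write a 1, let time pass, and return what a read sees afterwards."""
    cell = DramCell()
    cell.write(1)
    for step in range(1, steps + 1):
        cell.tick()
        if refresh_every and step % refresh_every == 0:
            cell.refresh()
    return cell.read()


print(bit_after(10))                    # 0: the bit has leaked away
print(bit_after(10, refresh_every=3))   # 1: periodic refresh preserved it
```

Note that a refresh that comes too late simply writes back the wrong value, which is why the refresh period must be shorter than the time it takes the charge to cross the threshold.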
230
Comparing Memory Types
Register file
  Fastest.
  But biggest size.
SRAM
  Fast.
  More compact than a register file.
DRAM
  Slowest, and refreshing takes time.
  But very compact.
Use a register file for small items, SRAM for large items, and DRAM for huge items.
Note: DRAM’s big capacitor requires a special chip design process, so DRAM is often a separate chip.
[Figure: size comparison (not to scale) of an MxN memory with the same number of bits implemented as a register file, an SRAM, and a DRAM]
231
Reading and Writing a RAM
Writing
  Put the address on the addr lines and the data on the data lines, and set rw=1, en=1.
Reading
  Set the addr and en lines, but put nothing (Z) on the data lines, and set rw=0.
  The data will appear on the data lines.
Don’t forget to obey setup and hold times.
  In short: keep inputs stable before and after a clock edge.
[Figure: timing diagrams. Three clock cycles with addr = 9, 13, 9; data = 500, 999, Z (then 500 driven by the RAM); rw = 1 (1 means write), 1, 0; after cycle 1 RAM[9] equals 500, after cycle 2 RAM[13] equals 999. Panel (b) marks setup time, hold time, and access time relative to clk]
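The three-cycle sequence in the timing diagram (write 500 to address 9, write 999 to address 13, read address 9) can be replayed with a small behavioral sketch. The function clock_edge and the use of None to stand for Z are assumptions made for illustration; setup, hold, and access times are not modeled.

```python
Z = None  # stands in for a high-impedance (undriven) data bus


def clock_edge(ram, addr, data, rw, en):
    """Apply one rising clock edge and return the value seen on the data lines."""
    if not en:
        return Z                # RAM disabled: nothing drives the data lines
    if rw == 1:
        ram[addr] = data        # write: the caller drives the data lines
        return data
    return ram[addr]            # read: the RAM drives the data lines


ram = [0] * 1024
trace = [
    clock_edge(ram, addr=9, data=500, rw=1, en=1),   # cycle 1: RAM[9] now equals 500
    clock_edge(ram, addr=13, data=999, rw=1, en=1),  # cycle 2: RAM[13] now equals 999
    clock_edge(ram, addr=9, data=Z, rw=0, en=1),     # cycle 3: read RAM[9]
]
print(trace)  # [500, 999, 500]
```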
232
RAM Example: Digital Sound Recorder
Behavior
  Record: digitize sound and store it as a series of 12-bit digital values in RAM.
    We’ll use a 4096x16 RAM (a 12-bit-wide RAM is not common).
  Play back later.
  Common behavior in telephone answering machines, toys, and voice recorders.
To record, the processor should read the a-to-d converter and store the read values into successive RAM words.
To play, the processor should read successive RAM words and enable the d-to-a converter.
233
RAM Example: Digital Sound Recorder
RTL design of processor
  Create a high-level state machine.
  Begin with the record behavior.
    Keep a local register a.
      Stores the current address, ranging from 0 to 4095 (thus needs 12 bits).
    Create a state machine that counts from 0 to 4095 using a.
    For each a:
      Read the analog-to-digital converter: ad_ld=1, ad_buf=1.
      Write to RAM at address a: Ra=a, Rrw=1, Ren=1.
[Figure: processor connected to the analog-to-digital converter (ad_ld), the digital-to-analog converter (da_ld), the ad_buf buffer, and a 4096x16 RAM (Ra 12 bits, data 16 bits, Rrw, Ren). Record behavior state machine: S (a=0), T (ad_ld=1, ad_buf=1, Ra=a, Rrw=1, Ren=1), U (a=a+1), with transitions labeled a<4095 and a=4095; local register a (12 bits)]
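One plausible reading of the record state machine can be stepped through in software, as sketched below. The transition structure (S to T, T to U while a<4095, U back to T, stop when a=4095) is inferred from the state labels and is an assumption, as are the names record and read_adc; the RAM is modeled as a plain list.

```python
def record(read_adc, ram):
    """Step through states S, T, U until every RAM word has been written once."""
    state, a = "S", 0
    while True:
        if state == "S":              # S: a = 0
            a = 0
            state = "T"
        elif state == "T":            # T: ad_ld=1, ad_buf=1, Ra=a, Rrw=1, Ren=1
            ram[a] = read_adc()       # sample the converter and write RAM[a]
            if a == len(ram) - 1:     # a = 4095 for a 4096-word RAM: recording done
                return
            state = "U"               # a < 4095: keep going
        else:                         # U: a = a + 1
            a += 1
            state = "T"


# Stand-in for the a-to-d converter: hands out 12-bit sample values 0, 1, 2, ...
samples = iter(range(4096))
ram = [None] * 4096
record(lambda: next(samples) & 0xFFF, ram)
print(ram[0], ram[4095])  # 0 4095
```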
234
RAM Example: Digital Sound Recorder
Now create the play behavior.
  Use local register a again, and create a state machine that counts from 0 to 4095 again.
  For each a:
    Read RAM.
    Write to the digital-to-analog converter.
  Note: must write the d-to-a converter one cycle after reading RAM, when the read data is available on the data bus.
The record and play state machines would be parts of a larger state machine controlled by signals that determine when to record or play.
[Figure: same processor, converters, ad_buf, data bus, and 4096x16 RAM as before. Play behavior state machine: V (a=0), W (ad_buf=0, Ra=a, Rrw=0, Ren=1), X (da_ld=1, a=a+1), with transitions labeled a<4095 and a=4095; local register a (12 bits)]
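The one-cycle gap between reading RAM and loading the d-to-a converter is the key detail of the play behavior. The sketch below is one assumed interpretation of states V, W, and X, with the names play and write_dac invented for illustration; the variable data_bus holds the value that becomes available after the read in W.

```python
def play(ram, write_dac):
    """Step through states V, W, X, sending every RAM word to the d-to-a converter."""
    state, a, data_bus = "V", 0, None
    while True:
        if state == "V":              # V: a = 0
            a = 0
            state = "W"
        elif state == "W":            # W: ad_buf=0, Ra=a, Rrw=0, Ren=1 (start the read)
            data_bus = ram[a]         # read data is on the bus for the next state
            state = "X"
        else:                         # X: da_ld=1, one cycle after the read
            write_dac(data_bus)
            if a == len(ram) - 1:     # a = 4095: playback done
                return
            a += 1                    # a = a + 1
            state = "W"


ram = list(range(4096))
played = []
play(ram, played.append)
print(played[:3], len(played))  # [0, 1, 2] 4096
```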
235
Read-Only Memory – ROM
Memory that can only be read from, not written to.
  Data lines are output only.
  No need for an rw input.
Advantages over RAM
  Compact: may be smaller.
  Nonvolatile: saves bits even if the power supply is turned off.
  Speed: may be faster (especially than DRAM).
  Low power: doesn’t need a power supply to save bits, so can extend battery life.
Choose ROM over RAM if the stored data won’t change (or won’t change often).
  For example, a table of Celsius-to-Fahrenheit conversions in a digital thermometer.
[Figure: 1024×32 RAM block symbol (data, addr, rw, en) beside the 1024x32 ROM block symbol (data, addr, en)]
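The thermometer example amounts to a lookup table whose address is the Celsius reading. A minimal sketch follows, assuming a 128-word ROM addressed by whole, non-negative Celsius degrees and storing rounded Fahrenheit values; the table size, the rounding, and the name rom_read are illustrative choices, not details from the slides.

```python
# Hypothetical ROM contents: the word at address c holds the Fahrenheit value for c degrees Celsius.
ROM_WORDS = 128
CELSIUS_TO_FAHRENHEIT_ROM = tuple(round(c * 9 / 5 + 32) for c in range(ROM_WORDS))


def rom_read(addr, en=1):
    """Return the stored word when enabled; there is no rw input and no write path."""
    return CELSIUS_TO_FAHRENHEIT_ROM[addr] if en else None


print(rom_read(0))    # 32
print(rom_read(100))  # 212
```

Using a tuple rather than a list mirrors the read-only nature of the memory: the contents are fixed when the table is built.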
236
Read-Only Memory – ROM: Internal Structure
Internal logical structure is similar to RAM, without the data input lines.
Let A = log2(M) be the number of address inputs for a ROM with M words.
[Figure: 1024x32 ROM block symbol and its internal structure: address inputs addr0 ... addr(A-1) feed the A×M decoder (enabled by en), decoder outputs d0 ... d(M-1) each enable one word of bit storage blocks (aka “cells”), and the ROM cells drive rdata(N-1) ... rdata0]
237
ROM Types
If a ROM can only be read, how are the bits stored in the first place?
  Storing bits in a ROM is known as programming.
  Several methods exist.
Mask-programmed ROM
  Bits are hardwired as 0s or 1s during chip manufacturing.
    The 2-bit word in the figure stores “10”.
    word enable (from the decoder) simply passes the hardwired value through a transistor.
  Notice how compact, and fast, this memory would be.
[Figure: two mask-programmed ROM cells on one word enable line, each driving its own data line; the word stores “10”]
238
ROM Types
Fuse-Based Programmable ROM
  Each cell has a fuse.
  A special device, known as a programmer, blows certain fuses (using higher-than-normal voltage).
    Those cells will be read as 0s (involving some special electronics).
    Cells with unblown fuses will be read as 1s.
    The 2-bit word in the figure stores “10”.
  Also known as One-Time Programmable (OTP) ROM.
[Figure: two fuse-based ROM cells on one word enable line, each with its own data line; one cell has an intact fuse and the other a blown fuse]
239
ROM Types
Erasable Programmable ROM (EPROM)
  Uses a “floating-gate transistor” in each cell.
  A special programmer device uses higher-than-normal voltage to cause electrons to tunnel into the gate.
    Electrons become trapped in the gate.
    Only done for cells that should store 0.
    Other cells (without electrons trapped in the gate) will be 1.
    The 2-bit word in the figure stores “10”.
    Details are beyond our scope; only the general idea is necessary here.
  To erase, shine ultraviolet light onto the chip.
    Gives trapped electrons the energy to escape.
    Requires the chip package to have a window.
[Figure: two EPROM cells on one word enable line, each a floating-gate transistor with its own data line; one cell has trapped electrons in its floating gate]
240
ROM Types
Electronically-Erasable Programmable ROM (EEPROM)
  Similar to EPROM.
    Uses floating-gate transistors, with electronic programming to trap electrons in certain cells.
  But erasing is done electronically, not using UV light.
  Erasing is done one word at a time.
Flash memory
  Like EEPROM, but all words (or large blocks of words) can be erased simultaneously.
  Became common relatively recently (late 1990s).
Both types are in-system programmable.
  Can be programmed with new stored bits while in the system in which the ROM operates.
  Requires bi-directional data lines and a write control input.
  Also needs a busy output to indicate that erasing is in progress, since erasing takes some time.
[Figure: 1024x32 EEPROM block symbol with 32-bit data, 10-bit addr, en, write, and busy]
241
ROM Example: Digital Telephone Answering Machine Using a Flash Memory
Want to record the outgoing announcement.
  When rec=1, record digitized sound in locations 0 to 4095.
  When play=1, play those stored sounds to the digital-to-analog converter.
What type of memory?
  Should store without a power supply: ROM, not RAM.
  Should be in-system programmable: EEPROM or Flash, not EPROM, OTP ROM, or mask-programmed ROM.
  Will always erase the entire memory when reprogramming: Flash is better than EEPROM.
[Figure: microphone feeding the analog-to-digital converter (ad_ld), ad_buf, a processor with rec and play inputs (record button), a 4096x16 Flash (Ra 12 bits, data 16 bits, Rrw, Ren, er, bu connected to busy), and the digital-to-analog converter (da_ld) driving a speaker that plays “We’re not home.”]
242
ROM Example: Digital Telephone Answering Machine Using a Flash Memory
High-level state machine
  Once rec=1, begin erasing the flash by setting er=1.
  Wait for the flash to finish erasing by waiting for bu=0.
  Execute a loop that sets local register a from 0 to 4095, reading the analog-to-digital converter and writing to flash for each a.
[Figure: same answering-machine block diagram as on the previous slide. State machine: S (er=1, entered on rec), T (er=0, a=0; loops on bu, leaves on bu’), U (ad_ld=1, ad_buf=1, Ra=a, Rrw=1, Ren=1, a=a+1), V, with transitions labeled a<4096 and a=4096; local register a (13 bits)]
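The erase-then-record sequence can be sketched behaviorally as below. The Flash class is a hypothetical stand-in: the three-cycle erase time and the all-ones erased value are assumptions made only so the busy wait has something to wait for, and record_announcement is an invented name. The loop runs while a<4096, which is why the figure gives register a 13 bits.

```python
class Flash:
    """Hypothetical flash model: erasing takes a few cycles, during which bu=1."""

    ERASE_CYCLES = 3   # assumed; real erase times are device-specific

    def __init__(self, num_words):
        self.words = [0] * num_words
        self.busy_left = 0

    @property
    def bu(self):
        return 1 if self.busy_left > 0 else 0

    def start_erase(self):
        """Corresponds to er=1: erase all words at once."""
        self.words = [0xFFFF] * len(self.words)   # assumed erased value
        self.busy_left = self.ERASE_CYCLES

    def tick(self):
        if self.busy_left:
            self.busy_left -= 1

    def write(self, addr, data):
        if self.bu:
            raise RuntimeError("write attempted while flash is busy erasing")
        self.words[addr] = data & 0xFFFF


def record_announcement(flash, read_adc):
    """Erase, wait for bu=0, then store one converter sample per address."""
    flash.start_erase()                # er=1
    waited = 0
    while flash.bu:                    # wait for bu=0
        flash.tick()
        waited += 1
    a = 0                              # a=0
    while a < len(flash.words):        # loop while a<4096
        flash.write(a, read_adc())     # ad_ld=1, ad_buf=1, Ra=a, Rrw=1, Ren=1
        a += 1                         # a=a+1
    return waited


flash = Flash(4096)
samples = iter(range(4096))
waited = record_announcement(flash, lambda: next(samples))
print(waited, flash.words[0], flash.words[4095])  # 3 0 4095
```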
243
Blurring of Distinction Between ROM and RAM
We said that
  RAM is readable and writable.
  ROM is read-only.
But some ROMs act almost like RAMs.
  EEPROM and Flash are in-system programmable.
    Essentially means that writes are slow.
    Also, the number of writes may be limited (perhaps a few million times).
And some RAMs act almost like ROMs.
  Non-volatile RAMs: can save their data without the power supply.
    One type: built-in battery, may work for up to 10 years.
    Another type: includes a ROM backup for the RAM; a controller writes the RAM contents to ROM before turning off.
New memory technologies are evolving that merge RAM and ROM benefits, e.g., MRAM.
Bottom line
  Lots of choices are available to the designer, who must find the best fit with the design goals.
[Figure: spectrum from ROM to RAM, with Flash, EEPROM, and NVRAM in between]
244
Hierarchy and Abstraction
Hierarchy often involves not just grouping items into a new item, but also associating higher-level behavior with the new item, known as abstraction.
  e.g., an 8-bit adder has an understandable high-level behavior: it adds two 8-bit binary numbers.
Frees the designer from having to remember, or even from having to understand, the lower-level details.
[Figure: 8-bit adder with inputs a7..a0, b7..b0, and ci, and outputs co and s7..s0]
245
Hierarchy and Composing Larger Components from Smaller Versions
A common task is to compose smaller components into a larger one.
  Gates: suppose you have plenty of 3-input AND gates, but need a 9-input AND gate.
    Can simply compose the 9-input gate from several 3-input gates.
  Muxes: suppose you have 4x1 and 2x1 muxes, but need an 8x1 mux.
    s2 selects either the top or the bottom 4x1 mux.
    s1s0 select the particular 4x1 input.
    Implements an 8x1 mux: 8 data inputs, 3 selects, one output.
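Both compositions can be checked with a short functional sketch. The function names (and3, and9, mux2, mux4, mux8) are illustrative, and bits are modeled as Python integers 0 and 1; the 9-input AND uses three first-level 3-input gates feeding a fourth, which is one natural arrangement.

```python
def and3(a, b, c):
    """A 3-input AND gate on bits 0/1."""
    return a & b & c


def and9(bits):
    """9-input AND built from four 3-input ANDs: three at the first level, one combining them."""
    if len(bits) != 9:
        raise ValueError("and9 needs exactly 9 input bits")
    return and3(and3(*bits[0:3]), and3(*bits[3:6]), and3(*bits[6:9]))


def mux2(i0, i1, s0):
    """2x1 mux: s0 picks between two inputs."""
    return i1 if s0 else i0


def mux4(inputs, s1, s0):
    """4x1 mux: s1s0 pick one of four inputs."""
    return inputs[(s1 << 1) | s0]


def mux8(inputs, s2, s1, s0):
    """8x1 mux from two 4x1 muxes and one 2x1 mux."""
    top = mux4(inputs[0:4], s1, s0)      # s1s0 select within the top 4x1
    bottom = mux4(inputs[4:8], s1, s0)   # and within the bottom 4x1
    return mux2(top, bottom, s2)         # s2 selects top or bottom


data = [10, 11, 12, 13, 14, 15, 16, 17]
print(mux8(data, 1, 0, 1))   # 15, i.e. input number 0b101 = 5
print(and9([1] * 9))         # 1
```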
246
Hierarchy and Composing Larger Components from Smaller Versions
Composing memory is very common.
Making memory words wider
  Easy: just place memories side by side until the desired width is obtained.
  Share the address/control lines, and concatenate the data lines.
  Example: compose 1024x8 ROMs into a 1024x32 ROM.
[Figure: four 1024x8 ROMs side by side sharing the 10-bit addr and en inputs; their four 8-bit data outputs are concatenated into data(31..0), forming a 1024x32 ROM]
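Widening can be sketched as four byte-wide lookups at the same address whose results are concatenated. In the sketch below the ROMs are modeled as tuples, make_rom and read_wide are invented names, and the choice that the first ROM supplies the most significant byte is an assumption (the slide only says the data lines are concatenated).

```python
def make_rom(contents):
    """Model a 1024x8 ROM as an immutable tuple of bytes."""
    return tuple(value & 0xFF for value in contents)


def read_wide(roms, addr, en=1):
    """Read a wide word: every ROM sees the same addr and en, and the outputs are concatenated.

    roms[0] supplies the most significant byte (an assumed ordering).
    """
    if not en:
        return None
    word = 0
    for rom in roms:
        word = (word << 8) | rom[addr]
    return word


roms = [
    make_rom([0x12] * 1024),
    make_rom([0x34] * 1024),
    make_rom([0x56] * 1024),
    make_rom([0x78] * 1024),
]
print(hex(read_wide(roms, addr=5)))  # 0x12345678
```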
247
Hierarchy and Composing Larger Components from Smaller Versions
Creating memory with more words
  Put memories on top of one another until the desired number of words is achieved.
  Use a decoder to select among the memories.
    Can use the highest-order address input(s) as the decoder input.
    Although actually, any address line could be used.
  Example: compose 1024x8 memories into a 2048x8 memory. a10 just chooses which memory to access.
To create a memory with more words and wider words, can first compose to enough words, then widen.
[Figure: 2048x8 ROM (11-bit addr, 8-bit data, en) built from two 1024x8 ROMs; a9..a0 go to both ROMs, and a 1x2 decoder with input a10 and enable en drives d0 and d1 to the two ROMs’ en inputs]
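Adding words can be sketched with the 1x2 decoder made explicit: a10 picks which 1024x8 memory is enabled, and a9..a0 address a word inside it. The name read_tall and the tuple-based ROM model are assumptions for illustration.

```python
def read_tall(rom_low, rom_high, addr, en=1):
    """Read from a 2048x8 memory built from two 1024x8 memories."""
    a10 = (addr >> 10) & 1          # highest-order address bit feeds the 1x2 decoder
    low_addr = addr & 0x3FF         # a9..a0 go to both memories
    d0 = 1 if (en and a10 == 0) else 0   # decoder output enabling the first memory
    d1 = 1 if (en and a10 == 1) else 0   # decoder output enabling the second memory
    if d0:
        return rom_low[low_addr]
    if d1:
        return rom_high[low_addr]
    return None                     # en=0: neither memory is enabled


rom_low = tuple([0xAA] * 1024)      # holds words 0..1023
rom_high = tuple([0x55] * 1024)     # holds words 1024..2047
print(hex(read_tall(rom_low, rom_high, 5)))          # 0xaa
print(hex(read_tall(rom_low, rom_high, 1024 + 5)))   # 0x55
```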
248
Chapter Summary
Modern digital design involves creating processor-level components.
The four-step RTL method can be used:
  1. High-level state machine
  2. Create datapath
  3. Connect datapath to controller
  4. Derive controller FSM
Several examples
  Control dominated, data dominated, and mix.
Determining the fastest clock frequency
  By finding the critical path.
Behavioral-level design: C to gates
  By using a method to convert C (subset) to a high-level state machine.
Additional RTL components
  Memory: RAM, ROM
  Queues
Hierarchy: a key concept used throughout Chapters 2-5.