ECE465 Lecture Notes # 11 Clocking Methodologies Shantanu Dutt UIC Acknowledgement: (1) Most slides prepared by Huan Ren from Prof. Dutt’s Lecture Notes (some modifications made by Prof. Dutt). (2) Some slides extracted from Prof. David Pan’s (UT Austin) slides as indicated.
Timing Methodologies
Synchronous Sequential Circuits
[Figure: block diagram of a synchronous sequential circuit (external I/P, comb. logic, memory, external O/P, Clk) beside an example state transition diagram with states A, B and C. $T^{OP}_{P,Logic}$ is the critical path delay in the o/p logic part and $T^{NS}_{P,Logic}$ is the critical path delay in the NS logic part of the comb. logic.]
Features Required for Correct Operation
1) All state transitions take place only with respect to a particular event in the clock (e.g., the positive or negative edge). In the example, transitions occur only on the positive edge of Clk.
Timing Methodologies (contd)
Features Required for Correct Operation
2) Only one state transition should take place in one clock period.
3) All inputs to all FFs/latches should be correctly available with the appropriate setup time ($T_{setup}$ or $T_{su}$) and hold time ($T_{hold}$ or $T_h$) around the triggering edge of the clock.
[Figure: timing diagram of Input and Clock with $T_{period} = T_{Clk}$. The input is stable for $\ge T_{setup}$ before and $\ge T_{hold}$ after each triggering edge. Successive triggering edges mark the i'th, (i+1)'th, (i+2)'th and (i+3)'th state transitions, any of which could be to the same state.]
Clock Routing
[Figure: a clock tree, i.e. a path from the clock source to the clock sinks (FFs).]
The goal of a clock tree is to get the clock signal from a clock source to the clock sinks. The relative arrival times, shape, and amplitude of the clock signal at the sinks need to meet certain criteria for the circuit to work correctly. Simple wires are susceptible to the influence of signals in nearby wires and to their own parasitics, among other things, which may compromise the integrity of the signal and make the signal arriving at the sinks unacceptable.
Different FFs are at different distances from the clock source. This leads to the clock arriving at different FFs at slightly different times. This difference in clock arrival times is called clock skew.
From: David Pan, UT Austin
Timing Methodologies: Clock Skew Problem
General definition of clock skew: the maximum arrival time difference of the "same" clock edge between all FF pairs.
Negative clock skew: $T^{-}_{skew} = -\max\{|t_{arriv}(FF_i) - t_{arriv}(FF_j)| : FF_i \text{ drives } FF_j,\ t_{arriv}(FF_i) < t_{arriv}(FF_j)\}$, where $t_{arriv}(FF_i)$ is the clock arrival time at $FF_i$.
Positive clock skew: $T^{+}_{skew} = \max\{t_{arriv}(FF_i) - t_{arriv}(FF_j) : FF_i \text{ drives } FF_j,\ t_{arriv}(FF_i) > t_{arriv}(FF_j)\}$
Real-world problems that can cause the two requirements (hold time or setup time) to be violated:
Hold time violation problem: the clock arrives at the driving FF before it arrives at the sink/driven FF (negative skew).
Safe: if the blue horse wins the race, and wins it by a margin of at least $T_h$.
Unsafe: if the brown horse wins the race. The new value of D2 via Q1 overwrites the old value before Q2 is loaded with the earlier, correct value. This causes an incorrect Q2 change when the +ve edge arrives at Clk2.
[Figure: IN drives D1 of FF1; Q1 drives D2 of FF2 through Logic; Clk reaches the FFs as Clk1 and Clk2. Timing diagram of Clk1, Clk2, D1, D2, Q1 and Q2 with the edges separated by $|T^{-}_{skew}|$ and the two racing signals (labelled 1 and 2) drawn as a blue and a brown horse. Values before the clock +ve edge: current state 00; correct transition to 10; incorrect transition to 11.]
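The two skew definitions above can be sketched in a few lines of Python. This is only an illustration: the function name `clock_skews`, the FF names, the arrival times (in ps) and the list of driver/driven pairs are all made-up assumptions, and returning 0 when no pair qualifies is a convention chosen here, not something stated on the slide.

```python
def clock_skews(t_arriv, drives):
    """Return (T-skew, T+skew) for a set of FF pairs.

    t_arriv: dict mapping FF name -> clock arrival time (hypothetical values, ps)
    drives:  list of (driver, driven) FF pairs
    """
    # Negative skew: clock reaches the driving FF before the driven FF.
    neg = [abs(t_arriv[i] - t_arriv[j])
           for i, j in drives if t_arriv[i] < t_arriv[j]]
    # Positive skew: clock reaches the driven FF before the driving FF.
    pos = [t_arriv[i] - t_arriv[j]
           for i, j in drives if t_arriv[i] > t_arriv[j]]
    t_minus = -max(neg) if neg else 0   # assumed convention: 0 if no such pair
    t_plus = max(pos) if pos else 0
    return t_minus, t_plus


# Hypothetical example: three FFs connected in a ring.
arrivals = {"FF1": 100, "FF2": 250, "FF3": 50}
pairs = [("FF1", "FF2"), ("FF2", "FF3"), ("FF3", "FF1")]
t_minus_skew, t_plus_skew = clock_skews(arrivals, pairs)
print(t_minus_skew, t_plus_skew)
```

Only pairs where one FF actually drives the other are considered, which is why the pair list is passed in explicitly instead of comparing all FFs against each other.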
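Safe Value of Negative $T_{skew}$ ($T^{-}_{skew}$)
[Figure: IN drives D1 of FF1; Q1 drives D2 of FF2 through Logic; Clk reaches the FFs as Clk1 and Clk2. Timing diagram of Clk1, D1, Q1, D2 and Clk2 showing the typical or min $T_{PLH}$ of FF1, the min $T_{P,Logic}$, the skew $T_{skew}$ between Clk1 and Clk2, and the $\ge T_{su}$ and $\ge T_h$ windows around the clock edges.]
For a 0 to 1 transition of Q1, safe if: $\min(T_{PLH} \text{ of FF}) + \min(T_{P,Logic} \text{ between Q1 and D2}) > |T^{-}_{skew}| + T_h$
i.e. if: $|T^{-}_{skew}| < \min(T_{PLH}) + \min(T_{P,Logic}) - T_h$
Similarly, for a 1 to 0 transition of Q1, $T_{PHL}$ comes into play, and it is safe if: $|T^{-}_{skew}| < \min(T_{PHL}) + \min(T_{P,Logic}) - T_h$
Thus we need: $|T^{-}_{skew}| < \min(\min T_{PLH}, \min T_{PHL}) + \min(T^{NS}_{P,Logic}) - T_h = \min(T_{P,FF}) + \min(T^{NS}_{P,Logic}) - T_h$
where $T^{NS}_{P,Logic}$ is the prop. delay of the next state (NS) logic portion of the entire comb. logic in the system, which is the relevant logic block wrt clock skew.
Thus, the safe $|T^{-}_{skew}|$ limit for negative skew is based on the minimum propagation delay of the FFs and the NS logic.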
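As a quick numerical check of the hold-time bound above, the following Python sketch evaluates the limit on |T-skew| and tests a given skew against it. The function names and all delay values (in ps) are hypothetical, chosen only to illustrate the inequality.

```python
def max_safe_negative_skew(min_tplh, min_tphl, min_tns_logic, t_hold):
    """Upper limit on |T-skew|: min(TP,FF) + min(TNS_P,Logic) - Th."""
    min_tp_ff = min(min_tplh, min_tphl)  # worst case over both Q1 transitions
    return min_tp_ff + min_tns_logic - t_hold


def hold_is_safe(skew_magnitude, min_tplh, min_tphl, min_tns_logic, t_hold):
    """True if |T-skew| is strictly below the safe limit."""
    limit = max_safe_negative_skew(min_tplh, min_tphl, min_tns_logic, t_hold)
    return skew_magnitude < limit


# Hypothetical numbers in ps: min TPLH = 120, min TPHL = 100,
# min NS logic delay = 80, hold time = 50.
limit = max_safe_negative_skew(120, 100, 80, 50)
print(limit)
print(hold_is_safe(100, 120, 100, 80, 50))
print(hold_is_safe(150, 120, 100, 80, 50))
```

Note that the bound uses minimum delays throughout: a fast FF followed by a short logic path is the case most likely to overwrite D2 too early.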
Another problem of clock skew: positive skew
Positive clock skew (the clock arriving at the driven/sink FF before the driving FF) can cause setup time violations:
If the clock is not designed taking +ve skew into account, then there will not be enough time to complete the FF-load and comb. logic operations $T_{su}$ time before the next clock edge arrives at Clk2.
If +ve clock skew is taken into account, as it should be, the clock period $T_{Clk}$ will be larger by an amount of $T^{+}_{skew}$, thus making the clock "unnecessarily" slower.
[Figure: IN drives D1 of FF1; Q1 drives D2 of FF2 through Logic (comb. logic with delays $T^{OP}_{P,Logic}$ and $T^{NS}_{P,Logic}$); Clk reaches the FFs as Clk1 and Clk2. Timing diagram with Clk2 leading Clk1 by $T^{+}_{skew}$, leaving less of $T_{Clk}$ available for the logic and FF delays $T_{FF} + T_{logic} + T_{su}$.]
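Determining Clock Period: Edge Triggered System
[Figure: clock waveform Clk marking where a level sens. latch, a positive edge trigg. FF and a negative edge trigg. FF respond. Block diagram of comb. logic (delays $T^{OP}_{P,Logic}$ and $T^{NS}_{P,Logic}$) and a memory of FF bank with delay $T_{P,FF}$ (FF1 on Clk1, FF2 on Clk2). Timing diagram of Clk1 and Clk2 over one $T_{Clk}$ showing $T_{P,FF}$, $T_{P,Logic}$, $T_{su}$ and $T^{+}_{skew}$.]
$T_{Clk} - T^{+}_{skew} > \max(T_{P,FF}) + \max(T^{NS}_{P,Logic}) + T_{setup} = T_{P,FF} + T^{NS}_{P,Logic} + T_{setup}$
i.e., we will use the normal convention of using $T_{P,FF}$ to mean $\max(T_{P,FF})$, which is Max(typical $T_{PHL}$ and typical $T_{PLH}$), and $T^{NS}_{P,Logic}$ to mean $\max(T^{NS}_{P,Logic})$.
Also, $T_{Clk} - T^{+}_{skew} > T_{P,FF} + T^{OP}_{P,Logic}$, where $T^{OP}_{P,Logic}$ is the delay of the output logic portion of the combinational logic.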
Determining the Clock Period (Contd.)
Without skew: $T_{Clk} \ge T_{P,FF} + T^{NS}_{P,Logic} + T_{setup}$ AND $T_{Clk} \ge T_{P,FF} + T^{OP}_{P,Logic}$
With skew: $T_{Clk} > T^{+}_{skew} + T_{P,FF} + T^{NS}_{P,Logic} + T_{setup}$ AND $T_{Clk} > T^{+}_{skew} + T_{P,FF} + T^{OP}_{P,Logic}$ (the o/p needs to be generated before new Q values are asserted at the next earliest +ve edge of the clock at some FF)
i.e., $T_{Clk} > \max(T^{+}_{skew} + T_{P,FF} + T^{NS}_{P,Logic} + T_{setup},\ T^{+}_{skew} + T_{P,FF} + T^{OP}_{P,Logic})$
Use a 10% buffer for safety: $T_{Clk} = 1.1 \max(T^{+}_{skew} + T_{P,FF} + T^{NS}_{P,Logic} + T_{setup},\ T^{+}_{skew} + T_{P,FF} + T^{OP}_{P,Logic})$
Note: If $T^{+}_{skew} = 0$, then $T^{+}_{skew}$ can be replaced in the above expression for $T_{Clk}$ by the min. magnitude negative skew $T^{-}_{min\text{-}skew}$ (this will have a -ve sign) to actually reduce the clock period, where $T^{-}_{min\text{-}skew} = -\min\{|t_{arriv}(FF_i) - t_{arriv}(FF_j)| : FF_i \text{ drives } FF_j,\ t_{arriv}(FF_i) < t_{arriv}(FF_j)\}$. Why is it correct to do so?
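The clock period expression with the 10% safety buffer can be evaluated as in the sketch below. The function name, parameter names and the delay values (in ns) are assumptions made for illustration; only the formula itself comes from the slide.

```python
def edge_triggered_clock_period(t_plus_skew, tp_ff, tns_logic, top_logic,
                                t_setup, margin=1.1):
    """TClk = margin * max(NS-path bound, O/P-path bound).

    tp_ff, tns_logic and top_logic are the maximum (worst-case) delays,
    following the convention used on the slide.
    """
    ns_path = t_plus_skew + tp_ff + tns_logic + t_setup  # next-state path
    op_path = t_plus_skew + tp_ff + top_logic            # output-logic path
    return margin * max(ns_path, op_path)


# Hypothetical delays in ns: here the output logic is the slower path.
t_clk = edge_triggered_clock_period(
    t_plus_skew=0.2, tp_ff=0.5, tns_logic=2.0, top_logic=2.8, t_setup=0.3)
print(round(t_clk, 3))  # about 3.85 ns for these numbers
```

In this example the output path (3.5 ns) dominates the next-state path (3.0 ns), which shows why both bounds have to be checked even though only the next-state path includes the setup time.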
Determining the Clock Period of a Datapath w/ a Controller FSM
Ignoring clock skew here for simplicity. It can be added later, after deciding the non-skew clock period, by adding $1.1\,T^{+}_{skew}$ to it.
[Figure: controller FSM (next state comb. logic, FFs and output logic, clocked by CLK) with I/Ps (external + from datapath) and O/Ps (= control signals), driving a datapath of registers (datapath FFs) and FU(s) through control logic (muxes, decoders, tri-state buffers, load/enable i/ps); bus widths are labelled n, m1 and m2.]
Delay1 = $T_{P,FF} + T^{NS}_{P,Logic} + T_{setup}$
Delay2 = $T_{P,FF} + T^{OP}_{P,Logic}$ + time to reach the input muxes or output demuxes
Subpath delay $D_i = T_{P,FF} + T_{FU(s)} + T_{setup}$ (+ $T^{OP}_{P,Logic} + T_{mux}$ if there is an i/p mux on the subpath); a separate formulation is needed for mux+demux or only a demux on the subpath.
$T_1 = \max(\text{Delay1}, \text{Delay2})$
What if the smallest subpath delay $D_{min}$ is $> T_1$? Why waste resources counting $\lceil D_{min}/T_1 \rceil$ cc's?
We can set $T_1 = \max(\text{Delay1}, \text{Delay2}, D_{min})$. But this can waste time. E.g., $T_1 = 1.5$ ns and the 3 subpath delays are $D_1 = 6$ ns, $D_2 = 9$ ns, $D_3 = 15$ ns. $T_1 = 6$ ns reduces cc counting, but this wastes 3 ns in waiting for the $D_2$ and $D_3$ delays. Why?
A simple technique: find the approximate greatest common divisor (gcd) of the various subpath delays $\ge \max(\text{Delay1}, \text{Delay2})$. Update $T_1 = \max(\text{Delay1}, \text{Delay2}, \text{above gcd})$. E.g., in the above example, $T_1 = 3$ ns. There is no waste of time, plus reduced counting of cc's per subpath delay compared to $T_1 = \max(\text{Delay1}, \text{Delay2})$.
Make $T_{Clk} = 1.1\,T_1$.
Each subpath w/ delay $D_i$ will then have a cc delay of $\lceil D_i/T_{Clk} \rceil$.
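The gcd technique can be tried out with the sketch below. Several things here are assumptions and not part of the slide: the function names, the value of Delay2, the reading that only subpath delays at or above max(Delay1, Delay2) enter the gcd, and the way "approximate" is implemented (delays are rounded to a hypothetical `resolution` grid before an integer gcd is taken). The helper `wasted_time` also makes the 3 ns waste in the 6 ns example visible: D2 = 9 ns needs 2 cc = 12 ns and D3 = 15 ns needs 3 cc = 18 ns.

```python
import math
from functools import reduce


def datapath_clock(delay1, delay2, subpath_delays, resolution=0.5, margin=1.1):
    """Return (T1, TClk, cc count per subpath delay)."""
    base = max(delay1, delay2)
    # Assumed reading: only subpaths at least as long as base enter the gcd.
    long_paths = [d for d in subpath_delays if d >= base]
    # "Approximate" gcd: snap delays to a grid, then take an integer gcd.
    ticks = [round(d / resolution) for d in long_paths]
    approx_gcd = reduce(math.gcd, ticks) * resolution if ticks else 0.0
    t1 = max(base, approx_gcd)
    t_clk = margin * t1
    cycles = {d: math.ceil(d / t_clk) for d in subpath_delays}
    return t1, t_clk, cycles


def wasted_time(t1, delay):
    """Idle time when a subpath of the given delay is counted in units of T1."""
    return math.ceil(delay / t1) * t1 - delay


# Slide example: T1 = 1.5 ns (Delay2 = 1.2 ns is made up), subpaths 6, 9, 15 ns.
t1, t_clk, cycles = datapath_clock(1.5, 1.2, [6.0, 9.0, 15.0])
print(t1, cycles)
print([wasted_time(6.0, d) for d in (6.0, 9.0, 15.0)])  # with T1 = Dmin = 6 ns
print([wasted_time(3.0, d) for d in (6.0, 9.0, 15.0)])  # with T1 = gcd = 3 ns
```

The cycle counts are computed against TClk = 1.1 T1, as on the slide, so the 10% buffer is already included in them.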
B) Another Problem in Seq. Circuits: Race Condition (multiple state changes in a cc)
A race condition occurs when a FF/latch output changes more than once in a clock cycle (cc). This happens when, after the O/P of a latch changes, it feeds back to the latch input via some logic while the latch is still enabled in the same cc. This causes the O/P to change again.
[Figure: a D latch whose Q output feeds back to D through comb. logic with other I/Ps; timing diagram of Clk, D and Q with the $\ge T_{su}$ window marked, showing 2 changes of state in Q in 1 cc.]
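Race Condition (contd)
A race condition is generally a problem with level sensitive latches. It can be solved using:
a) Edge-triggered FFs.
[Figure: a D FF whose Q output feeds back to D through comb. logic with other I/Ps; timing diagram of Clk, D and Q showing only 1 O/P change per cc.]
b) Narrow-width clocking.
[Figure: a D latch whose Q output feeds back to D through comb. logic with other I/Ps, driven by a narrow width Clk of pulse width $T_w$ and period $T_{Clk}$.]
$T_{Clk} > T_{skew} + T_{P,FF} + T_{P,Logic} + T_{setup}$
$T_w < \min(T_{P,FF}) + \min(T_{P,Logic})$, where $\min(T_{P,FF}) = \min(\min T_{PLH}, \min T_{PHL})$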
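The two narrow-width clocking constraints can be checked together, as in this small sketch. The function name and the numbers (in ps) are hypothetical; the point is only that the pulse width is limited by minimum delays while the period is limited by maximum delays.

```python
def narrow_clock_ok(t_w, t_clk, t_skew, min_tp_ff, min_tp_logic,
                    max_tp_ff, max_tp_logic, t_setup):
    """Check both narrow-width clocking constraints from the slide."""
    # Pulse must close before the new Q can race back around to D.
    width_ok = t_w < min_tp_ff + min_tp_logic
    # Period must still cover the slowest FF + logic path plus setup and skew.
    period_ok = t_clk > t_skew + max_tp_ff + max_tp_logic + t_setup
    return width_ok, period_ok


# Hypothetical values in ps.
print(narrow_clock_ok(t_w=150, t_clk=1000, t_skew=40,
                      min_tp_ff=100, min_tp_logic=80,
                      max_tp_ff=140, max_tp_logic=600, t_setup=50))
print(narrow_clock_ok(t_w=200, t_clk=800, t_skew=40,
                      min_tp_ff=100, min_tp_logic=80,
                      max_tp_ff=140, max_tp_logic=600, t_setup=50))
```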
Correct State Transition Using Level-Sensitive Latches: No race cond. but potential exists
[Figure: state transition diagram with edges labelled input/output, and a sequence of circuit snapshots of comb. logic feeding 2 level sens. latches (CS, NS, Clk), showing the transition for the darkened arrow.]
Race Condition due to unequal path delays for different NS bits: Incorrect State Transition Using Level-Sensitive Latches
The required transition for the thick arrow becomes the incorrect transition corresponding to the dashed arrow.
[Figure: state transition diagram (states 00, 01, 10, 11) with edges labelled input/output, and a sequence of circuit snapshots of comb. logic feeding 2 level-sens. latches clocked by Clk, with one NS bit path marked slow and the other marked fast.]
No Race Condition Using Edge-Triggered FFs
Correct transition for the darkened arrow irrespective of the relative speed of the different excitation (next state) outputs.
[Figure: state transition diagram with edges labelled input/output, and a sequence of circuit snapshots of comb. logic feeding 2 M-S or edge-triggered FFs clocked by Clk, with one NS bit path marked slow and the other marked fast. The clock waveform marks the period between state transitions (also the clock period).]
No Race Condition Using 2-phase non-overlapping clocking and MS level sensitive latches
Generally, Cost(master-slave (MS) LS latches) < Cost(edge-trigg. FF).
Correct transition for the darkened arrow irrespective of the relative speed of the different excitation (next state) outputs.
[Figure: state transition diagram (states 00, 01, 10) with edges labelled input/output, and a sequence of circuit snapshots of comb. logic feeding MS level sensitive latches clocked by Clk1 and Clk2, with fast and slow NS bit paths. Waveforms of the non-overlapping clocks Clk1 and Clk2 with intervals $T_{1-2}$ and $T_{2-1}$ separated by $T_{gap}$.]
Two-phase clock period determination
[Figure: comb. logic with I/Ps and O/Ps, and CS/NS latches clocked by Clk1 and Clk2. Waveforms of Clk1 and Clk2 over one $T_{Clk}$, showing $T_{1-2}$ (split into $aT_{1-2}$ and $(1-a)T_{1-2}$), $T_{2-1}$, $T_{gap1}$ and $T_{gap2}$.]
$T_{gap1} > T_{skew}$ (to avoid overlap and thus a race condition; this also takes care of the skew problem that reduces the part of the clock period available for the delays of the FF + logic + $T_{su}$)
$T_{2-1} + aT_{1-2} + T_{gap1} > T_{P,FF} + T_{P,Logic} + T_{su} + T_{skew}$, with $0 < a < 1$   (1)
(Note: introducing a $T_{gap1}$ of at least $T_{skew}$ also takes care of the reqmt to allow for $T_{skew}$ in the above sum of the 3 delay components.)
$(1-a)T_{1-2} > T_{P,FF} + T_{su}$   (2)
The value of $a$ is really not going to matter, since it disappears in $aT_{1-2} + (1-a)T_{1-2} = T_{1-2}$. On adding (1) and (2), with $T_{gap1}$ covering $T_{skew}$, we get:
$T_{2-1} + T_{1-2} > 2T_{P,FF} + T_{P,Logic} + 2T_{su}$   (3)
$T_{1-2} = T_{2-1}$ (for symmetry requirements)
$T_{gap1} = T_{gap2} > T_{skew}$ (for symmetry requirements); this again also takes care of skew reducing the part of the clock period in which the various prop. delays and setup times are incurred.
So, finally: $T_{Clk} = 1.1(T_{2-1} + T_{1-2} + T_{gap1} + T_{gap2}) = 1.1(2T_{P,FF} + T_{P,Logic} + 2T_{su} + 2T_{skew})$ [w/ 10% safety gap]
Note: $T_{gap1} = T_{gap2} = T_{skew}$ takes care of both requirements: a) no overlap in Clk1 and Clk2 due to skew; b) enough clock period $T_{Clk}$ to process all delays, where two different arrival times of Clk1 (or Clk2) at two different master (or slave) latches can differ by $T_{skew}$ (the "usual" problem that we saw for edge-triggered FFs). No extra $T_{skew}$ allowance is needed in $T_{Clk}$ for the latter issue.
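The final two-phase expression can be evaluated as in the sketch below, which also splits the result into the symmetric phase intervals and gaps. The function name, the returned dictionary keys and the delay values (in ps) are assumptions; the sketch takes Tgap1 = Tgap2 = Tskew and uses the right-hand side of (3) as the combined length of T2-1 and T1-2 before the 10% buffer is applied to the whole period.

```python
def two_phase_clock(tp_ff, tp_logic, t_su, t_skew, margin=1.1):
    """Two-phase clock budget following equation (3) and the final TClk formula."""
    t_gap = t_skew                                # Tgap1 = Tgap2 = Tskew
    phases_total = 2 * tp_ff + tp_logic + 2 * t_su  # T2-1 + T1-2 from (3)
    t_phase = phases_total / 2                    # T1-2 = T2-1 by symmetry
    t_clk = margin * (phases_total + 2 * t_gap)   # 10% safety gap on the total
    return {"t_phase": t_phase, "t_gap": t_gap, "t_clk": t_clk}


# Hypothetical delays in ps.
budget = two_phase_clock(tp_ff=100, tp_logic=600, t_su=50, t_skew=40)
print(budget["t_phase"], budget["t_gap"], round(budget["t_clk"], 3))
```

Compared with the edge-triggered formula, the period here pays for two latch delays and two setup times per cycle, which is the price of using cheaper master-slave level-sensitive latches.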
Clock Skew Clock skew is the maximum difference in the arrival time of a clock signal at two different components. Clock skew forces designers to use a large time period between clock pulses. This makes the system slower. So, in addition to other objectives, clock skew should be minimized during clock routing. From: David Pan, UT Austin
Clock Design Problem
What are the main concerns for clock design?
Skew: the No. 1 concern for clock networks. For increased clock frequency, skew may contribute over 10% of the system cycle time.
Power: very important, as the clock is a major power consumer! It switches at every clock cycle!
Noise: the clock is often a very strong aggressor. It may need shielding.
Delay: not really important. But slew rate is important (sharp transition).
Within most VLSI circuits, data transfer between functional elements is synchronized by the processing clock. Clock frequency can directly determine the performance of a chip. In the case of a microprocessor, the clock frequency determines the Instructions Per Second: $IPS = f \times N_{IPC}$. The clock signal is generated external to the chip and provided to the chip through the clock pin. Each functional unit computes and waits for the next clock signal to pass its results to another unit before the next processing cycle. The clock should arrive at the same time at all functional units to minimize delay.
From: David Pan, UT Austin
The Clock Routing Problem
Given a source and n sinks (FFs). Connect all sinks to the source by an interconnect tree so as to minimize:
Clock Skew $= \max_{i,j} |t_i - t_j|$
Delay $= \max_i t_i$
Total wirelength
Noise and coupling effect
From: David Pan, UT Austin
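The first two objectives are easy to evaluate once the source-to-sink arrival times of a candidate tree are known. In this sketch the function name and the arrival times (in ps) are made up; it relies on the fact that the largest pairwise difference is simply the latest arrival minus the earliest.

```python
from itertools import combinations


def skew_and_delay(arrivals):
    """Return (clock skew, delay) for a list of sink arrival times t_i."""
    skew = max(arrivals) - min(arrivals)  # equals max over i,j of |t_i - t_j|
    delay = max(arrivals)
    return skew, delay


# Hypothetical sink arrival times in ps.
sinks = [120, 135, 128, 142]
skew, delay = skew_and_delay(sinks)

# Brute-force check of the pairwise definition.
pairwise = max(abs(a - b) for a, b in combinations(sinks, 2))
print(skew, delay, pairwise)
```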
H-Tree Clock Routing
[Figure: H-tree clock routing with its tapping point, shown for 4 points and for 16 points.]
From: David Pan, UT Austin
Method of Means and Medians (MMM) Applicable when the clock terminals are arbitrarily arranged. Follows a strategy very similar to H-Tree. Recursively partition the terminals into two sets of equal size (median). Then, connect the center of mass of the whole circuit to the centers of mass of the two sub-circuits (mean). Clock skew is only minimized heuristically. The resulting tree may not have zero-skew. From: David Pan, UT Austin
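A minimal sketch of the recursion described above is given here. It is not taken from the slides: the function names, the sink coordinates, and the choice to alternate between x and y cuts at successive levels are assumptions made for illustration, and wire segments are returned as straight point-to-point connections without any detailed routing.

```python
def center_of_mass(points):
    """Mean x and mean y of a list of (x, y) sink locations."""
    n = len(points)
    return (sum(x for x, _ in points) / n, sum(y for _, y in points) / n)


def mmm(points, split_on_x=True):
    """Return a list of (parent_center, child_center) tree segments."""
    if len(points) <= 1:
        return []  # a single sink needs no further partitioning
    axis = 0 if split_on_x else 1
    ordered = sorted(points, key=lambda p: p[axis])
    mid = len(ordered) // 2                      # median cut: equal-size halves
    halves = (ordered[:mid], ordered[mid:])
    root = center_of_mass(points)                # mean of the whole set
    segments = []
    for half in halves:
        segments.append((root, center_of_mass(half)))
        # Assumption: alternate the cut direction at each level.
        segments.extend(mmm(half, not split_on_x))
    return segments


# Hypothetical sinks at the corners of a 2 x 2 square.
tree = mmm([(0, 0), (0, 2), (2, 0), (2, 2)])
for parent, child in tree:
    print(parent, "->", child)
```

For this symmetric input the result is an H shape, which matches the remark that the strategy is very similar to the H-Tree; for irregular sink placements the branch lengths generally differ, so skew is reduced only heuristically.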
An Example of MMM
[Figure: an example of MMM partitioning, with the centers of mass marked.]
From: David Pan, UT Austin