Adapting Synchronizers to the Effects of On-Chip Variability David Kinniment Alex Yakovlev Jun Zhou Gordon Russell Presented by Dmitry Verbitsky.


Adapting Synchronizers to the Effects of On-Chip Variability David Kinniment Alex Yakovlev Jun Zhou Gordon Russell Presented by Dmitry Verbitsky

Overview –Introduction –Effects of On-chip Variability on Synchronizer Performance –Proposed Adaptation Schemes –Conclusions

Inter-Clock Domain Communication Data transfer between different clock domains must be performed carefully. An incoming data change near the receiver clock edge causes metastability, which may lead to a functional failure due to non-deterministic propagation delay –Either the set-up or the hold time is not satisfied

Metastability Resolution Real life: the FF will finally resolve into one of the stable states –Thanks to thermal noise –Thanks to the next clock transition

Synchronization Failure (1) Metastability is not a singular problem at the sampling time, it spreads through your circuit causing total failure!

Synchronization Failure (2) A long delay due to metastability (M/S) causes a violation of the cycle time. Failures are due to a new M/S event or an incorrect function

MTBF Mean Time Between Failures
Given metastability at t = 0, the probability of still being metastable at t > 0 is e^(-t/τ)
Failure: still metastable by the next clock
–p(failure) = p(enter m.s.) × p(still m.s. after T)
–Rate(failure) = Rate(enter m.s.) × p(still m.s. after T) = W × Fc × Fd × e^(-T/τ)
–MTBF = 1/Rate(failure) = e^(T/τ) / (W × Fc × Fd)
τ: resolution time constant of the synchronizer
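The MTBF relation on this slide can be checked numerically with a minimal sketch. It assumes W is the metastability window in seconds, Fc the clock frequency and Fd the data rate in Hz; the function and parameter names are illustrative and do not come from the presentation.

```python
import math


def failure_rate(t_sync, tau, window, f_clk, f_data):
    """Rate(failure) = W * Fc * Fd * exp(-T / tau), in failures per second."""
    return window * f_clk * f_data * math.exp(-t_sync / tau)


def mtbf(t_sync, tau, window, f_clk, f_data):
    """Mean time between failures in seconds: the reciprocal of the failure rate."""
    return 1.0 / failure_rate(t_sync, tau, window, f_clk, f_data)


if __name__ == "__main__":
    # Illustrative (assumed) values: tau = 100 ps, W = 20 ps, 10 MHz clock, 5 MHz data.
    for t_sync in (1e-9, 2e-9, 3e-9):
        print(t_sync, mtbf(t_sync, 100e-12, 20e-12, 10e6, 5e6))
```

Each additional τ of synchronization time multiplies the MTBF by e, which is why small changes in τ have such a large effect on reliability.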

Sources of device variability Random dopant fluctuations (RDF) Line-edge/line-width roughness (LER/LWR) Oxide thickness variations (OTV)

LER/LWR

RDF influence on Vt, 90 nm NMOS

LER/LWR influence on σ(Vt)

Threshold Voltage Effects on τ and MTBF

Effects of On-chip Variability on Synchronizer Performance (1) Process variation: at 180 nm σ = 4%, so we can expect that one synchronizer out of 1000 may have a 12% worse value of τ. At 45 nm this value will reach 50%. M. Garg et al., ISCAS 2005, May 2005 & ITRS 2005

Temperature and Supply Voltage Effects on Synchronization (for a CMOS device in the saturation region) When the synchronizer operates at a low supply voltage, the decrease in drain current lengthens the delay and extends the time constant.

Drain Current vs Temperature Near ZTC (zero temperature coefficient) point temperature dependence is minimized

Carrier mobility vs Temperature Increases when temperature decreases High mobility increases the current At a high supply voltage (Vdd > ZTC and Vdd >> Vth) the drain current is dominantly controlled by the carrier mobility, and hence decreases with temperature rise

Threshold Voltage vs Temperature Increases when temperature decreases Higher Vth decreases the current When Vdd approaches Vth (Vdd < ZTC), Vth has a stronger effect on the drain current, and as a result the current grows with temperature rise

Drain Current in Saturation Region Vth: threshold voltage, μ: carrier mobility

High Vdd

Low Vdd
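The opposite temperature trends at high and low Vdd can be illustrated with a small sketch. It assumes the textbook square-law saturation current with the gate tied to Vdd, a power-law mobility model and a linearly falling threshold voltage; the model and all parameter values are hypothetical and are not taken from the slides.

```python
def drain_current(vdd, temp_k, *, k0=1.0, vth0=0.4, t0=300.0,
                  mobility_exp=1.5, vth_tc=1e-3):
    """Normalised square-law saturation current with the gate driven to Vdd.

    Assumed models (illustrative only):
      mobility ~ (T / T0) ** -mobility_exp   (falls as temperature rises)
      Vth(T)   = vth0 - vth_tc * (T - T0)    (falls as temperature rises)
    """
    mobility = (temp_k / t0) ** (-mobility_exp)
    vth = vth0 - vth_tc * (temp_k - t0)
    overdrive = max(vdd - vth, 0.0)
    return 0.5 * k0 * mobility * overdrive ** 2


if __name__ == "__main__":
    for vdd in (1.2, 0.5):
        cold = drain_current(vdd, 300.0)
        hot = drain_current(vdd, 400.0)
        trend = "decreases" if hot < cold else "increases"
        print(f"Vdd = {vdd} V: current {trend} with temperature")
```

With these assumed parameters the mobility term dominates at Vdd = 1.2 V and the threshold term dominates at Vdd = 0.5 V, matching the qualitative behaviour described on the preceding slides.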

Gate Delay vs Vdd, T

τ vs Vdd, T

Effects of On-chip Variability on Synchronizer Performance (2) Voltage and temperature variations: a disproportionate effect is observed. As a result, a 50% reduction in power supply voltage may cause an increase of over 100% in τ. Simulation results of a Jamb latch at 90 nm

Synchronizer Selection Scheme (1) Problem: –Technology: 90 nm –τ: 11 ps –σ: 8% In the worst case we have to allow for a τ that is 3.09σ worse than nominal to ensure that the probability of a synchronizer having τ worse than this is 0.001. This adds to the delay of all synchronizers on the chip and therefore affects the system performance.

Synchronizer Selection Scheme (2) Solution 1: –Make the width of all transistors in the synchronizer N times larger (say N = 4) –Assuming this reduces most of the process variations, the deviation is now σ/√N = 4% –The worst-case τ is reduced accordingly, but the power is increased by 4 times. –Increasing transistor size cannot reduce all kinds of process variations, so the actual σ will be more than 4%.
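The worst-case figures for the two cases above can be estimated with a short sketch. It assumes τ is normally distributed and that σ is expressed as a fraction of the nominal τ; the function name is hypothetical, and the printed values are computed under these assumptions rather than copied from the slides.

```python
import math
from statistics import NormalDist


def worst_case_tau(tau_nominal, sigma_rel, tail_prob):
    """Tau exceeded only with probability tail_prob, for normally distributed tau."""
    z = NormalDist().inv_cdf(1.0 - tail_prob)  # about 3.09 for tail_prob = 0.001
    return tau_nominal * (1.0 + z * sigma_rel)


if __name__ == "__main__":
    tau_nominal_ps = 11.0   # from the slide: 90 nm, tau = 11 ps
    sigma = 0.08            # from the slide: sigma = 8%
    n = 4                   # transistor width scaling factor of Solution 1

    standard = worst_case_tau(tau_nominal_ps, sigma, 0.001)
    widened = worst_case_tau(tau_nominal_ps, sigma / math.sqrt(n), 0.001)
    print(f"standard size: {standard:.2f} ps, {n}x width: {widened:.2f} ps")
```

Under these assumptions the standard-size synchronizer has to be budgeted at roughly 13.7 ps and the widened one at roughly 12.4 ps.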

Synchronizer Selection Scheme (3) Solution 2: –Make N standard size synchronizers, measure their  on chip, and select the best one. –After the selection, all the others are powered down, as is the measurement circuitry. –Power during operation is therefore the same as for a single small synchronizer, but the performance is improved.

Synchronizer Selection Scheme (3) Example of N = 4 case –If the probability of one synchronizer having τ worse than a given value is p, the probability of all 4 synchronizers having τ worse than this is p^4
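A sketch of the best-of-N argument: if the N synchronizers vary independently (an assumption), all of them are worse than a threshold with probability p^N, so a chip-level target of 0.001 only requires p = 0.001^(1/N) for each one. The helper below is hypothetical, and it again assumes a normal distribution with σ relative to the nominal τ.

```python
from statistics import NormalDist


def best_of_n_worst_case(tau_nominal, sigma_rel, n, target_prob):
    """Tau threshold that the best of n independent synchronizers exceeds
    only with probability target_prob."""
    p_single = target_prob ** (1.0 / n)         # allowed per-synchronizer tail
    z = NormalDist().inv_cdf(1.0 - p_single)
    return tau_nominal * (1.0 + z * sigma_rel)


if __name__ == "__main__":
    # Slide values: tau = 11 ps, sigma = 8%, N = 4; target probability 0.001.
    print(best_of_n_worst_case(11.0, 0.08, 4, 0.001))
```

With N = 1 the function reduces to the single-synchronizer 3.09σ allowance; with N = 4 the threshold drops to a little under 12 ps under these assumptions.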

Synchronizer Selection Scheme (4) Solution 2 achieves a better τ than Solution 1. Solution 2 deals with all kinds of process variations (Solution 1 doesn’t deal with oxide thickness)

Synchronization Time Adjustment Scheme (1) Problem: –PVT variations cause a 50% worse value of τ –To achieve the required MTBF, the synchronization time of all synchronizers has to be extended to over 1.5 times its original value –The extended synchronization time may be wasted

Synchronization Time Adjustment Scheme (2) Solution: –Adjust the synchronization time of each synchronizer according to the actual PVT and data rate variations to improve the system performance, on the condition that the required MTBF is met
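The adjustment idea can be sketched by inverting the MTBF relation MTBF = e^(T/τ) / (W × Fc × Fd) for the synchronization time T. The function names and the example values below are assumptions for illustration, not the authors' implementation.

```python
import math


def mtbf(t_sync, tau, window, f_clk, f_data):
    """MTBF in seconds for synchronization time t_sync."""
    return math.exp(t_sync / tau) / (window * f_clk * f_data)


def required_sync_time(mtbf_target, tau, window, f_clk, f_data):
    """Smallest synchronization time that meets mtbf_target."""
    return tau * math.log(mtbf_target * window * f_clk * f_data)


if __name__ == "__main__":
    # Assumed values: W = 20 ps, 10 MHz clock, 5 MHz data, MTBF target 1e10 s.
    nominal = required_sync_time(1e10, 50e-12, 20e-12, 10e6, 5e6)
    degraded = required_sync_time(1e10, 75e-12, 20e-12, 10e6, 5e6)
    print(nominal, degraded, degraded / nominal)
```

Because T scales linearly with τ for a fixed MTBF target, a 50% worse τ requires 1.5 times the synchronization time. Measuring the actual τ lets each synchronizer use only the time it needs.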

On-Chip Measurement of Failure Rates (1)

On-Chip Measurement of Failure Rates (2)

Calculation of τ from Measured Failure Rates

Calculation of MTBF from Measured Failure Rates

Synchronizer Selection Scheme Architecture N redundant synchronizers Shared by N synchronizers, from which the best one is to be selected Values from counter2 are stored in a FIFO for comparison

Synchronizer Adjustment Scheme Architecture VDL – variable delay line. Used to control the synchronization time of the synchronizer Registers – used to hold the delay of the VDL Comparator – compares calculated MTBF with the user-required Memory – stores the calculation results for later use and user- inputted data such as clock frequencies for calculation

FPGA Implementation
Scheme | On-Chip Overhead | Off-Chip Overhead
Synchronizer Selection Scheme | 9 flip-flops and 6 gates per synchronizer | 34 flip-flops and 110 gates
Synchronization Time Adjustment Scheme | 33 flip-flops and 104 gates per synchronizer | 436 flip-flops and 732 gates

Failure Detector Used to detect failures at 2 different sampling times of the synchronizer output. The synchronizer is clocked by the SCLK signal, which is generated from the local clock signal CLK. Synchronization time = |rising_edge(SCLK) - rising_edge(CLK)|. T2 - T1 = 100 ps in the FPGA implementation

Failure Counters (1)

Failure Counters (2) Count the number of failures detected at different sampling times. Counters 1 and 2 are used to count the number of failures at the sampling times SCLK+T1 and SCLK+T2. Counter3 is used to count the number of clock cycles. For the synchronizer selection scheme Counter3 is not needed, so the hardware overhead can be further reduced

Synchronizer Selection Circuit 4 p-type transistors are used to switch the power for the four synchronizers. After the best one is selected, the other three are powered down, as is the selection circuitry

Variable Delay Line(1) Usually implemented by transistor-level circuits. In an FPGA it can only be implemented as inverter chains. Inverters, in turn, are implemented by LUTs. LUT delay + wire delay > 1 ns on Spartan3. A smaller incremental delay can be achieved by using the connection delay difference on the FPGA

Variable Delay Line(2) Careful placing of internal XOR gates can give an incremental delay, which is the difference between the connection delays of two neighboring paths, down to 100 ps. With the VDL implemented on chip, an incremental delay of 1 ps can easily be achieved

Implementation of τ and MTBF Calculation A = MTBF2, B = MTBF1, E = T2 – T1, G = T3 – T1, I = Counter3_output, F = τ
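The calculation behind these variables can be sketched as follows. It assumes that the MTBF at each sampling time is estimated as elapsed time divided by the failure count, and that MTBF grows as e^(T/τ), which gives τ = (T2 – T1) / ln(MTBF2 / MTBF1), or F = E / ln(A / B) in the slide's variables. The function names are hypothetical, and the use of Counter3 as the elapsed cycle count is an assumption.

```python
import math


def mtbf_from_counts(cycles, failures, f_clk):
    """Estimate MTBF as elapsed time (cycles / f_clk) divided by the failure count."""
    return cycles / (f_clk * failures)


def tau_from_mtbf(mtbf1, mtbf2, t1, t2):
    """tau = (T2 - T1) / ln(MTBF2 / MTBF1), i.e. F = E / ln(A / B)."""
    return (t2 - t1) / math.log(mtbf2 / mtbf1)


def extrapolate_mtbf(mtbf1, tau, t1, t3):
    """MTBF at synchronization time T3: MTBF1 * exp((T3 - T1) / tau)."""
    return mtbf1 * math.exp((t3 - t1) / tau)


if __name__ == "__main__":
    # Illustrative counts: 1e9 clock cycles at 10 MHz, sampling times 100 ps apart.
    mtbf1 = mtbf_from_counts(1e9, 73891, 10e6)   # failures counted at T1
    mtbf2 = mtbf_from_counts(1e9, 10000, 10e6)   # failures counted at T2
    tau = tau_from_mtbf(mtbf1, mtbf2, 0.0, 100e-12)
    print(tau, extrapolate_mtbf(mtbf1, tau, 0.0, 3.5e-9))
```

Only the ratio of the two failure counts enters the τ calculation, so the cycle count is needed only when an absolute MTBF is required.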

Division Implementation The divider is pipelined to achieve high performance and low area. The divisor and dividend inputs are multiplexed to make it reusable. The Control Counter counts the number of clock cycles used for division. The Register stores the divider output for later division steps

Log Calculation Implementation(1)

Log Calculation Implementation(2) Uses lookup tables. Because the values can be very large, it is impossible to build a full log LUT. Different resolutions can be used for calculating different values (high resolution for small values and low resolution for larger ones). 3 LUTs are used to provide an accuracy of 2 decimals, which leads to an error of 1% in the calculated MTBF
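One way to picture the multi-resolution lookup is the sketch below: three tables cover three decades, each with a step size proportional to the magnitude of its range. The ranges, step sizes and function names are assumptions chosen for illustration and do not reproduce the tables used in the FPGA design.

```python
import math

# (low, high, step) for each table: fine steps for small values, coarse for large.
RANGES = [(1.0, 10.0, 0.01), (10.0, 100.0, 0.1), (100.0, 1000.0, 1.0)]


def build_tables(ranges):
    """Precompute ln() at every grid point of every range."""
    tables = []
    for low, high, step in ranges:
        size = int(round((high - low) / step))
        tables.append([math.log(low + i * step) for i in range(size)])
    return tables


TABLES = build_tables(RANGES)


def lut_ln(x):
    """Approximate ln(x) by looking up the grid point at or just below x."""
    for (low, high, step), table in zip(RANGES, TABLES):
        if low <= x < high:
            index = min(int((x - low) / step), len(table) - 1)
            return table[index]
    raise ValueError("x is outside the tabulated range")


if __name__ == "__main__":
    for x in (2.5, 55.5, 123.4):
        print(x, lut_ln(x), math.log(x))
```

Each table in this sketch has 900 entries, and the relative step never exceeds 1%, so the lookup error stays below about 0.01 across all three decades.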

Hardware Saving 80% of the on-chip overhead goes to the VDL implementation on the FPGA. When implemented on chip using transistors, the overhead will be reduced by 50%. The off-chip part can also be reduced by lowering the calculation accuracy

Application of 2 Schemes (1) The synchronizer selection scheme is aimed at improving synchronizer performance subject to process variation. It only needs to operate once, when setting up the chip, since the process variation is fixed when the chip is fabricated. After the selection, power consumption is the same as that of a single synchronizer, because all redundant modules are powered down. The scheme has a small overhead and can be put entirely on chip

Application of 2 Schemes (2) The synchronization time adjustment scheme is used to deal with PVT and data rate variations. It consumes a relatively large amount of power and hardware. It only needs to operate once when dealing with process or fixed Vdd variations, and can be powered down afterwards. When dealing with frequent data rate or power variations, the scheme needs to be put entirely on chip and operate frequently. Power consumption can be reduced by reducing hardware complexity and adjustment rate

Test Results(1) Measured MTBF vs Data Rate –Synchronization time = 3.5 ns –Clock frequency = 10 MHz MTBF decreases as the data rate increases, as expected

Test Results(2) Measured MTBF vs Synchronization Time –Data rate = 5 MHz –Clock frequency = 10 MHz MTBF increases as the synchronization time increases, as expected

Test Results(3) Measured τ vs Vdd τ increases as Vdd decreases, as expected

Conclusions Two adaptation schemes have been proposed to reduce the effects of on-chip variability on synchronizers. Both were implemented on a Xilinx Spartan3 FPGA. The synchronizer selection scheme deals with process variations, has a small overhead and can be put entirely on chip. The synchronization time adjustment scheme deals with PVT and data rate variations. It has a relatively large overhead, which can be reduced by lowering the calculation accuracy of the MTBF.

References
J. Zhou, D. J. Kinniment, G. Russell, and A. Yakovlev, “Adapting Synchronizers to the Effects of On Chip Variability”, 14th IEEE International Symposium on Asynchronous Circuits and Systems.
Michael Kayam, Ran Ginosar and Charles E. Dike, “Symmetric Boost Synchronizer for Robust Low Voltage, Low Temperature Operation,” Technical Report, Jan
R. Dobkin, “vSync HDK customers presentation”
D. J. Kinniment, A. Bystrov, A. V. Yakovlev, “Synchronization Circuit Performance”, IEEE Journal of Solid-State Circuits, 37(2), 2002

The End Questions?

Rules for Normally Distributed Data If a data distribution is approximately normal, then: –68% of the data values are within 1σ of the mean –95% of the data values are within 2σ of the mean –99.7% of the data values are within 3σ of the mean
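These percentages, and the one-sided 3.09σ tail used earlier in the talk, can be checked with Python's standard library. This is a small verification sketch; the function names are illustrative.

```python
from statistics import NormalDist

STANDARD_NORMAL = NormalDist()  # mean 0, standard deviation 1


def fraction_within(k):
    """Fraction of normally distributed values within k standard deviations of the mean."""
    return STANDARD_NORMAL.cdf(k) - STANDARD_NORMAL.cdf(-k)


def upper_tail(k):
    """Fraction of values more than k standard deviations above the mean."""
    return 1.0 - STANDARD_NORMAL.cdf(k)


if __name__ == "__main__":
    for k in (1, 2, 3):
        print(k, fraction_within(k))
    print(3.09, upper_tail(3.09))
```

The one-sided tail beyond 3.09σ is about 0.001, which is where the "one synchronizer out of 1000" figure comes from.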