Practical Design and Performance Evaluation of Completion Detection Circuits. Fu-Chiung Cheng, Department of Computer Science, Columbia University.

Presentation transcript:

Outline: Motivation; Previous Work; New Completion Detection Circuit; Performance Evaluation; Conclusion

Motivation Circuits are either synchronous or asynchronous. Synchronization: synchronous circuits use a global clock; asynchronous circuits use start and completion mechanisms.

Motivation Potential advantages of async. design: no clock skew problem; low power consumption; average-case performance; modularity, composability, and reusability; easier technology migration. The promise of high performance is especially attractive.

Motivation High-performance async. design needs: 1. fast self-timed components with good average-case performance; 2. fast completion detection circuits that detect the completion. [Figure: self-timed component with inputs A, B and outputs S; OR gates produce Ack_0 ... Ack_{n-1}, and a C-element produces DoneReset]

Motivation Fast self-timed components: 1. Delay-insensitive carry-lookahead adders 2. Delay-insensitive comparators

Motivation Fast completion detection circuits: 1. Completion detection circuits (CDCs) are considered the major overhead. 2. This paper addresses the design of fast completion detection circuits.

Previous Work: Self-timed components may use either 1. a bundled-data protocol or 2. dual-rail signaling.

Previous Work: CDCs for bundled-data components 1. Delay elements (an inverter chain): the delay must exceed the worst-case delay. 2. Speculative completion [Nowick97]: performance depends on A. the number of matched delays and B. the associated abort detection network. 3. Current-sensing completion detection [Dean94, Grass96]: A. consumes substantial power; B. requires several gate delays.

Previous Work: CDCs for dual-rail self-timed components 1. General model: A. n two-input ORs; B. one n-input C-element. 2. Operations: A. computation cycle: DoneReset=1; B. reset cycle: DoneReset=0. [Figure: self-timed component with inputs A, B and outputs S; OR gates produce Ack_0 ... Ack_{n-1}, and a C-element produces DoneReset]
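The general model above can be sketched at the logic level. This is a minimal behavioral sketch, not the circuit from the slides: the function names (`ack_signals`, `c_element`, `done_reset`) and the encoding of each dual-rail output as a `(true_rail, false_rail)` pair are assumptions made for illustration.

```python
from typing import List, Sequence, Tuple


def ack_signals(dual_rail: Sequence[Tuple[int, int]]) -> List[int]:
    """One two-input OR per dual-rail output bit: Ack_i = true_rail OR false_rail."""
    return [t | f for t, f in dual_rail]


def c_element(inputs: Sequence[int], prev: int) -> int:
    """n-input C-element: 1 when all inputs are 1, 0 when all are 0, else hold prev."""
    if all(inputs):
        return 1
    if not any(inputs):
        return 0
    return prev


def done_reset(dual_rail: Sequence[Tuple[int, int]], prev: int) -> int:
    """DoneReset for the general model: n ORs feeding one n-input C-element."""
    return c_element(ack_signals(dual_rail), prev)


# Walk a 3-bit result through one computation cycle and one reset cycle.
state = 0
history = []
for outputs in [
    [(0, 0), (0, 0), (0, 0)],  # all outputs reset
    [(1, 0), (0, 0), (0, 1)],  # two bits valid: DoneReset holds 0
    [(1, 0), (0, 1), (0, 1)],  # all bits valid: DoneReset rises
    [(0, 0), (0, 1), (0, 1)],  # one bit reset: DoneReset holds 1
    [(0, 0), (0, 0), (0, 0)],  # all bits reset: DoneReset falls
]:
    state = done_reset(outputs, state)
    history.append(state)
print(history)
```

DoneReset changes only when every Ack_i agrees, which is why the n-input C-element sits on the critical path of both cycles.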

Previous Work: N-input C-element built as a tree of 2-input C-elements: 1. long delay 2. large variance. [Figure: Ack_0, Ack_1, ..., Ack_{n-2}, Ack_{n-1} combined pairwise by a tree of C-elements]
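The long delay of the tree can be illustrated by counting levels. The sketch below assumes a balanced tree in which each level pairs up the remaining signals; the slides do not state the tree shape, and the function names are hypothetical.

```python
def c_tree_levels(n: int) -> int:
    """Levels of 2-input C-elements needed to combine n Ack signals (balanced tree)."""
    levels = 0
    while n > 1:
        n = (n + 1) // 2  # pair up signals; an odd one passes through to the next level
        levels += 1
    return levels


def c_tree_elements(n: int) -> int:
    """Each 2-input C-element merges two signals into one, so n signals need n - 1."""
    return max(n - 1, 0)


for width in (2, 8, 32):
    print(width, c_tree_levels(width), c_tree_elements(width))
```

Under this assumption the last Ack_i to arrive passes through every level, so the detection delay grows with the logarithm of the number of output bits.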

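Previous Work: N-input C-element: 1. More efficient implementation: $\mathit{DoneReset} = \mathit{done} + \mathit{reset} \cdot \mathit{DoneReset}$. A. done circuit: an n-input AND, $\mathit{done} = \mathrm{Ack}_0 \cdot \mathrm{Ack}_1 \cdots \mathrm{Ack}_{n-1}$. B. reset circuit: an n-input OR, $\mathit{reset} = \mathrm{Ack}_0 + \mathrm{Ack}_1 + \cdots + \mathrm{Ack}_{n-1}$. C. a 2-input C-element. 2. Delay and variance: better than the tree of 2-input C-elements. [Figure: Ack_0 ... Ack_{n-1} feed an AND gate (done) and an OR gate (reset), which drive a 2-input C-element producing DoneReset]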
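The done/reset decomposition can be checked against the behavior of an n-input C-element by exhaustive enumeration. This is a logic-level sketch only; the function names are assumptions, and the right-hand DoneReset in the equation is modeled as the previous output value `prev`.

```python
from itertools import product
from typing import Sequence


def c_element_n(acks: Sequence[int], prev: int) -> int:
    """Reference n-input C-element: follow the inputs when they agree, else hold."""
    if all(acks):
        return 1
    if not any(acks):
        return 0
    return prev


def done_reset_next(acks: Sequence[int], prev: int) -> int:
    """DoneReset = done + reset * DoneReset, with done = AND(acks), reset = OR(acks)."""
    done = int(all(acks))   # n-input AND
    reset = int(any(acks))  # n-input OR
    return done | (reset & prev)


def equivalent(n: int) -> bool:
    """Compare both models over every input pattern and both previous states."""
    return all(
        done_reset_next(acks, prev) == c_element_n(acks, prev)
        for acks in product((0, 1), repeat=n)
        for prev in (0, 1)
    )


print(all(equivalent(n) for n in range(1, 9)))
```

The check passes because done implies reset: when the inputs disagree, done is 0 and reset is 1, so the expression reduces to the previous value.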

Previous Work: Wuu’s CDCs [Wuu93]: A. done circuit: a tree of NAND gates B. reset circuit: a tree of NOR gates C. long delay D. small variance E. uses static gates. [Figure: done and reset circuits]

Previous Work: Yun’s CDCs [Yun97]: A. done circuit: a tree of domino logic B. no reset circuit C. variable delay D. large variance E. uses dynamic CMOS

Our Design Computation-completion detection circuit: a dynamic n-input NOR and a static 2-input NOR.

Our Design Reset-completion detection circuit: a dynamic 2n-input OR.

Our Design Computation cycle: For the done signal, once all Ack_i signals have gone high, 1. the PMOS transistor (Ack_i) will be closed (conducting) and 2. all NMOS transistors will be open (non-conducting). 3. Thus, the done signal will be turned on.

Our Design Computation cycle: The reset signal is turned on as soon as any Ack_i signal goes high.

Our Design Reset cycle: The done signal is turned off as soon as any Ack_i signal is turned off.

Our Design Reset cycle: The reset signal is turned off only after all Ack_i signals are turned off.
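The four cases above (done and reset in the computation and reset cycles) can be traced with a small logic-level model. This sketch ignores transistor-level behavior and timing. The arrival order of the Ack_i signals and the function names are assumptions chosen for illustration.

```python
from typing import List, Sequence, Tuple


def done_signal(acks: Sequence[int]) -> int:
    """done is on only while every Ack_i is high."""
    return int(all(acks))


def reset_signal(acks: Sequence[int]) -> int:
    """reset is on while any Ack_i is high."""
    return int(any(acks))


def trace(n: int) -> List[Tuple[Tuple[int, ...], int, int]]:
    """Raise Ack_0..Ack_{n-1} one by one (computation), then lower them (reset)."""
    acks = [0] * n
    rows = [(tuple(acks), done_signal(acks), reset_signal(acks))]
    for value in (1, 0):  # 1: computation cycle, 0: reset cycle
        for i in range(n):
            acks[i] = value  # assumed arrival order: Ack_0 first, Ack_{n-1} last
            rows.append((tuple(acks), done_signal(acks), reset_signal(acks)))
    return rows


for acks, done, reset in trace(3):
    print(acks, "done =", done, "reset =", reset)
```

In the trace, reset rises with the first Ack_i and done rises with the last, while in the reset cycle done falls with the first Ack_i and reset falls with the last.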

Our Design done + reset circuits = dual-rail multi-input C-element. done + reset circuits + 2-input C-element = single-rail multi-input C-element. [Figure: implementation of the 2-input C-element]

DIRCA With CDC: part 1

DIRCA With CDC: part 2

Our Design The PMOS transistor in the pull-up circuit of the done circuit saves power in non-operation mode. In a quiescent state, all Ack_i signals are zero, so all pull-down transistors are closed (conducting). To save power, the pull-up transistor is open (non-conducting), which cuts off the path from Vdd to ground.

Our Design If the input of the pull-up transistor goes low too early, power is wasted. If it goes low too late, the done signal takes longer to turn on. For low power consumption, use the latest Ack_i signal as this input; for high performance, use any Ack_i signal that is not the latest.

SPICE Output: done circuit, ChengDone0: 1. Ack0 is the latest signal. 2. Input pulses: 3 and 4. 3. Traces: buffered input, Ack0, Done, DoneReset (200). Delay = 0.55 ns.

SPICE Output: done circuit, ChengDone1: 1. Ack1 is the latest signal. 2. Input pulses: 5 and 6. 3. Traces: buffered input, Ack1, Done, DoneReset (200). Delay = 0.22 ns.

SPICE Output: done circuit, ChengDone37: 1. All Ack signals arrive at the same time. 2. Traces: Done, DoneReset (200). Delay = 0.64 ns.

SPICE Output: reset circuit, ChengReset0: 1. Ack0 is the latest signal. 2. Input pulses: 3 and 4. 3. Traces: buffered input, Reset, DoneReset (200). Delay = 1.23 ns.

SPICE Output: reset circuit, ChengReset1: 1. Ack0 is the latest signal. 2. Input pulses: 3 and 4. 3. Traces: buffered input, Reset, DoneReset (200). Delay = 0.87 ns.

SPICE Output: reset circuit, ChengReset37: 1. All Ack signals reset at the same time. 2. Traces: Done, DoneReset (200). Delay = 1.34 ns.

Our Design Constraint: the circuit must still produce the correct output when the pull-up transistor is conducting and only one pull-down transistor is conducting. This can be achieved by properly sizing the transistors.

Logic Complexity # of transistors

Performance Evaluation SPICE simulation: 1. MOSIS 2-micron CMOS, level 2 parameters. 2. W = 3 µm, L = 2 µm (buffer delay 0.4 ns; 2-input NOR delay 0.18 ns). Computation-completion detection circuits: 38 typical cases (for Wuu, Yun, and Cheng). The measured delay includes the delay of the OR gate that generates Ack_i. Reset-completion detection circuits: 38 typical cases (for Wuu and Cheng).

Performance Evaluation

Conclusions A new completion detection circuit for dual-rail self-timed components: 1. very fast computation-completion detection 2. very fast reset-completion detection. A low-overhead, very fast completion detection circuit is crucial for high-performance self-timed circuits.

Conclusions SPICE simulation results: 1. our computation-completion detection circuit is 9 times faster than Wuu's and Yun's; 2. our reset-completion detection circuit is 2.7 times faster than Wuu's.