Published by Britney Cameron. Modified over 9 years ago.
1
Practical Design and Performance Evaluation of Completion Detection Circuits
Fu-Chiung Cheng
Department of Computer Science
Columbia University
2
Outline
1. Motivation
2. Previous Work
3. New Completion Detection Circuit
4. Performance Evaluation
5. Conclusion
3
Motivation
Circuits: synchronous or asynchronous.
Synchronization:
Sync: a global clock
Async: start and completion mechanisms
4
Motivation
Potential advantages of async. design:
No clock skew problem
Low power consumption
Average-case performance
Modularity, composability and reusability
Easier technology migration
The promise of high performance is especially attractive.
5
Motivation
High-performance async. design:
1. fast self-timed components with good average-case performance
2. fast completion detection circuits, which detect when a computation has completed.
[Figure: a self-timed component with dual-rail inputs A and B and dual-rail outputs S for bits 0 to n-1; each output pair is ORed into Ack 0 ... Ack n-1, and a C-element combines the Acks into DoneReset.]
7
Motivation
Fast self-timed components:
1. Delay-insensitive carry-lookahead adders
2. Delay-insensitive comparators
8
Motivation
Fast completion detection circuits:
1. Completion detection circuits (CDCs) are considered the major overhead.
2. This paper addresses the design of fast completion detection circuits.
9
Previous Work: Self-timed components may use
1. a bundled data protocol
2. dual-rail signaling
10
Previous Work: CDCs for bundled data components
1. Delay elements (an inverter chain): delay > worst-case delay.
2. Speculative completion [Nowick97]: performance depends on
A. the number of matched delays and
B. the associated abort detection network
3. Current-sensing completion detection [Dean94, Grass96]:
A. consumes substantial power
B. requires several gate delays
11
Previous Work: CDCs for dual-rail self-timed components
1. General model:
A. n two-input ORs
B. 1 n-input C-element
2. Operations:
A. computation cycle: DoneReset=1
B. reset cycle: DoneReset=0
[Figure: a self-timed component with dual-rail inputs A and B and dual-rail outputs S for bits 0 to n-1; each output pair is ORed into Ack 0 ... Ack n-1, and an n-input C-element combines the Acks into DoneReset.]
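The general model on this slide can be illustrated with a small behavioural sketch in Python. It is not the circuit itself: the function names, the 3-bit width, and the input sequence are assumptions chosen for illustration, and the C-element is modelled by its usual rule (rise when all inputs are 1, fall when all are 0, otherwise hold).

```python
from typing import List, Tuple


def ack_signals(rails: List[Tuple[int, int]]) -> List[int]:
    """One 2-input OR per dual-rail output bit: Ack i = S_i^0 OR S_i^1."""
    return [r0 | r1 for r0, r1 in rails]


def c_element(inputs: List[int], prev: int) -> int:
    """n-input C-element: 1 when all inputs are 1, 0 when all are 0, else hold."""
    if all(inputs):
        return 1
    if not any(inputs):
        return 0
    return prev


# Hypothetical 3-bit output bus stepping through one computation/reset cycle.
sequence = [
    [(0, 0), (0, 0), (0, 0)],  # quiescent: every rail pair is empty
    [(0, 1), (0, 0), (1, 0)],  # two bits valid, one still computing
    [(0, 1), (1, 0), (1, 0)],  # all bits valid: computation complete
    [(0, 0), (1, 0), (0, 0)],  # resetting: one bit still holds data
    [(0, 0), (0, 0), (0, 0)],  # all bits reset: reset complete
]

done_reset = 0
trace = []
for rails in sequence:
    done_reset = c_element(ack_signals(rails), done_reset)
    trace.append(done_reset)

print(trace)  # [0, 0, 1, 1, 0]
```

DoneReset rises only once every bit has a valid code word and falls only once every bit has returned to the empty code word, which matches the computation and reset cycles listed above.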
12
Previous Work: N-input C-element as a tree of 2-input C-elements
1. long delay
2. large variance
[Figure: Ack 0, Ack 1, ..., Ack n-2, Ack n-1 combined pairwise by a tree of 2-input C-elements.]
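Why the tree is slow can be seen from a rough count. The sketch below assumes a balanced tree and counts only C-element levels on the longest path; it says nothing about actual gate delays, and the function names are hypothetical.

```python
def c_tree_depth(n: int) -> int:
    """Levels of 2-input C-elements on the longest path of a balanced tree."""
    if n < 1:
        raise ValueError("need at least one Ack input")
    # ceil(log2(n)) computed with integers only
    return (n - 1).bit_length()


def c_tree_size(n: int) -> int:
    """Number of 2-input C-elements needed to merge n Ack signals."""
    if n < 1:
        raise ValueError("need at least one Ack input")
    return n - 1


for n in (2, 8, 32, 33):
    print(n, c_tree_depth(n), c_tree_size(n))
# 2 1 1
# 8 3 7
# 32 5 31
# 33 6 32
```

Under this assumption a 32-bit component places five C-elements in series between the last Ack and DoneReset.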
13
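Previous Work: N-input C-element
1. More efficient implementation: $\mathit{DoneReset} = \mathit{done} + \mathit{reset} \cdot \mathit{DoneReset}$
A. done circuit: an n-input AND, $\mathit{done} = \mathit{Ack}_0 \cdot \mathit{Ack}_1 \cdots \mathit{Ack}_{n-1}$
B. reset circuit: an n-input OR, $\mathit{reset} = \mathit{Ack}_0 + \mathit{Ack}_1 + \cdots + \mathit{Ack}_{n-1}$
C. a 2-input C-element
2. Delay and variance: better than the tree of 2-input C-elements
[Figure: Ack 0 ... Ack n-1 feed an n-input AND (done) and an n-input OR (reset); a 2-input C-element combines done and reset into DoneReset.]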
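As a sanity check on the equation, the sketch below exhaustively compares the done/reset formulation with the behavioural definition of an n-input C-element. The width n = 4 and all function names are assumptions made for illustration.

```python
from itertools import product


def done_reset_next(acks, prev):
    """DoneReset = done + reset * DoneReset, with done = AND and reset = OR."""
    done = int(all(acks))   # n-input AND of the Acks
    reset = int(any(acks))  # n-input OR of the Acks
    return done | (reset & prev)


def c_element(inputs, prev):
    """Reference C-element: 1 when all inputs are 1, 0 when all are 0, else hold."""
    if all(inputs):
        return 1
    if not any(inputs):
        return 0
    return prev


n = 4  # illustrative width
mismatches = [
    (acks, prev)
    for acks in product((0, 1), repeat=n)
    for prev in (0, 1)
    if done_reset_next(acks, prev) != c_element(acks, prev)
]
print(len(mismatches))  # 0
```

Both formulations agree for every input combination and every previous output, so the AND/OR pair followed by a single state-holding stage behaves as one n-input C-element.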
14
Previous Work: Wuu’s CDCs [Wuu93]
A. done circuit: a tree of NAND gates
B. reset circuit: a tree of NOR gates
C. long delay
D. small variance
E. uses static gates
[Figure: done and reset circuits.]
15
Previous Work: Yun’s CDCs [Yun97]
A. done circuit: a tree of domino logic
B. no reset circuit
C. variable delay
D. large variance
E. uses dynamic CMOS
16
Our Design: computation-completion detection circuit
[Figure: circuit schematic; labeled parts: dynamic n-input NOR, static 2-input NOR.]
17
Our Design: reset-completion detection circuit
[Figure: circuit schematic; labeled part: dynamic 2n-input OR.]
18
Our Design
Computation cycle, done signal:
1. the PMOS transistor (driven by Ack i) will be closed (conducting) and
2. all NMOS transistors will be open (non-conducting).
3. Thus, the done signal will be turned on.
19
Our Design
Computation cycle, reset signal: it is turned on as soon as any Ack i signal goes high.
20
Our Design
Reset cycle, done signal: it is turned off as soon as any Ack i signal is turned off.
21
Our Design
Reset cycle, reset signal: it is turned off only after all Ack i signals are turned off.
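The behaviour of the two signals over the computation and reset cycles can be traced at the logic level. This is a minimal sketch that ignores the transistor-level implementation and timing; it assumes that done is high exactly when all Ack i are high and reset is high exactly when any Ack i is high. The names and the 3-bit example are hypothetical.

```python
from typing import List, Tuple


def done_signal(acks: List[int]) -> int:
    """Assumed logic-level behaviour of the done circuit: all Acks high."""
    return int(all(acks))


def reset_signal(acks: List[int]) -> int:
    """Assumed logic-level behaviour of the reset circuit: any Ack high."""
    return int(any(acks))


def trace(ack_steps: List[List[int]]) -> List[Tuple[int, int]]:
    """Return (done, reset) for each snapshot of the Ack lines."""
    return [(done_signal(a), reset_signal(a)) for a in ack_steps]


# Hypothetical 3-bit example with staggered Ack arrivals and departures.
steps = [
    [0, 0, 0],  # quiescent
    [1, 0, 0],  # computation cycle: first Ack rises, reset turns on
    [1, 1, 0],
    [1, 1, 1],  # last Ack rises, done turns on
    [1, 0, 1],  # reset cycle: first Ack falls, done turns off
    [0, 0, 0],  # last Ack falls, reset turns off
]

for acks, (done, reset) in zip(steps, trace(steps)):
    print(acks, "done =", done, "reset =", reset)
```

In the trace, reset leads done during the computation cycle and done leads reset during the reset cycle.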
22
Our Design
done + reset circuits = dual-rail multi-input C-element
done + reset circuits + 2-input C-element = single-rail multi-input C-element
Implementation of the 2-input C-element: [Figure: circuit schematic.]
23
DIRCA With CDC: part 1
24
DIRCA With CDC: part 2
25
Our Design
The PMOS transistor in the pull-up circuit of the done circuit saves power in non-operation mode.
In a quiescent state, all Ack i signals are zero and all pull-down transistors are closed.
To save power, the pull-up transistor is open, which cuts off the path from Vdd to ground.
26
Our Design
If the input goes low too early, power is wasted.
If the input goes low too late, it takes longer to turn on the done signal.
For low power consumption: use the latest Ack i signal.
For high performance: use any Ack i signal other than the latest.
27
SPICE Output: done circuit
ChengDone0:
1. Ack0 is the latest signal.
2. input pulses: 3 and 4
3. buffered input: 1004
4. Ack0: 100
5. Done: 24680
6. DoneReset: 200
Delay = 0.55 ns
28
SPICE Output: done circuit
ChengDone1:
1. Ack1 is the latest signal.
2. input pulses: 5 and 6
3. buffered input: 1006
4. Ack1: 101
5. Done: 24680
6. DoneReset: 200
Delay = 0.22 ns
29
SPICE Output: done circuit
ChengDone37:
1. All Ack signals arrive at the same time.
2. Done: 24680
3. DoneReset: 200
Delay = 0.64 ns
30
SPICE Output: reset circuit
ChengReset0:
1. Ack0 is the latest signal.
2. input pulses: 3 and 4
3. buffered input: 1004
5. Reset: 13579
6. DoneReset: 200
Delay = 1.23 ns
31
SPICE Output: reset circuit
ChengReset1:
1. Ack0 is the latest signal.
2. input pulses: 3 and 4
3. buffered input: 1004
5. Reset: 13579
6. DoneReset: 200
Delay = 0.87 ns
32
SPICE Output: reset circuit
ChengReset37:
1. All Ack signals reset at the same time.
2. Done: 24680
3. DoneReset: 200
Delay = 1.34 ns
33
Our Design
Constraint: [...] when [...] is conducting and only one pull-down transistor is conducting.
This can be achieved by sizing the transistors properly.
34
Logic Complexity
[Table: number of transistors.]
35
Performance Evaluation
SPICE simulation:
1. uses MOSIS 2 micron CMOS level 2 parameters
2. W = 3 µm, L = 2 µm (buffer: 0.4 ns; 2-input NOR: 0.18 ns)
Computation-completion detection circuits: 38 typical cases (for Wuu, Yun and Cheng). The measured delay includes the delay of the OR gate for Ack i.
Reset-completion detection circuits: 38 typical cases (Wuu and Cheng).
36
Performance Evaluation
40
Conclusions
A new completion detection circuit for dual-rail self-timed components:
1. very fast computation-completion detection
2. very fast reset-completion detection
A low-overhead, very fast completion detection circuit is crucial for high-performance self-timed circuits.
41
Conclusions
SPICE simulation results:
1. Our computation-completion detection circuit is 9 times faster than Wuu's and Yun's.
2. Our reset-completion detection circuit is 2.7 times faster than Wuu's.