1 Practical Design and Performance Evaluation of Completion Detection Circuits Fu-Chiung Cheng Department of Computer Science Columbia University.



2 Outline: Motivation; Previous Work; New Completion Detection Circuit; Performance Evaluation; Conclusion

3 Motivation
Circuits: synchronous or asynchronous.
Synchronization:
Sync: a global clock.
Async: start and completion mechanisms.

4 Motivation
Potential advantages of async. design: no clock skew problem; low power consumption; average-case performance; modularity, composability and reusability; easier technology migration.
The promise of high performance is especially attractive.

5 Motivation
High-performance async. design needs:
1. fast self-timed components with good average-case performance
2. fast completion detection circuits that detect completion
[Figure: self-timed component with inputs A, B and outputs S (bits 0 to n-1); OR gates (+) produce Ack 0 ... Ack n-1, which a C-element (C) combines into DoneReset]


7 Motivation
Fast self-timed components:
1. Delay-insensitive carry-lookahead adders
2. Delay-insensitive comparators

8 Motivation
Fast completion detection circuits:
1. Completion detection circuits (CDCs) are considered the major overhead.
2. This paper addresses the design of fast completion detection circuits.

9 Previous Work: Self-timed components may use
1. bundled data protocol
2. dual-rail signaling

10 Previous Work: CDCs for bundled data components
1. Delay elements (an inverter chain): delay > worst-case delay.
2. Speculative completion [Nowick97]: performance depends on
A. the number of matched delays and
B. the associated abort detection network.
3. Current-sensing completion detection [Dean94, Grass96]:
A. consumes substantial power
B. requires several gate delays

11 Previous Work: CDCs for dual-rail self-timed components
1. General model:
A. n two-input ORs
B. one n-input C-element
2. Operation:
A. computation cycle: DoneReset = 1
B. reset cycle: DoneReset = 0
[Figure: self-timed component with inputs A, B and outputs S (bits 0 to n-1); OR gates (+) produce Ack 0 ... Ack n-1, which a C-element (C) combines into DoneReset]
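The general model above can be sketched behaviorally. This is a minimal illustration, not the circuit from the slides: it assumes each output bit is a dual-rail pair where (0, 0) is the reset (spacer) value and one rail high means valid data, and that each Ack is the OR of the two rails of one bit. The names `acks` and `CElement` are hypothetical.

```python
from typing import List, Tuple

def acks(rails: List[Tuple[int, int]]) -> List[int]:
    """One two-input OR per bit (assumed mapping): Ack_i = rail0_i OR rail1_i."""
    return [s0 | s1 for s0, s1 in rails]

class CElement:
    """n-input C-element: output rises when all inputs are 1,
    falls when all inputs are 0, and otherwise holds its value."""

    def __init__(self) -> None:
        self.out = 0

    def update(self, inputs: List[int]) -> int:
        if all(inputs):
            self.out = 1
        elif not any(inputs):
            self.out = 0
        return self.out

# A 3-bit result becomes valid bit by bit, then resets bit by bit.
steps = [
    [(0, 0), (0, 0), (0, 0)],  # all bits reset
    [(0, 1), (0, 0), (0, 0)],  # bit 0 valid
    [(0, 1), (1, 0), (0, 0)],  # bit 1 valid
    [(0, 1), (1, 0), (0, 1)],  # bit 2 valid: computation complete
    [(0, 0), (1, 0), (0, 1)],  # bit 0 reset
    [(0, 0), (0, 0), (0, 0)],  # all bits reset: reset complete
]
c = CElement()
trace = [c.update(acks(s)) for s in steps]
print(trace)
```

DoneReset stays low until the last bit becomes valid and stays high until the last bit has reset, which is the two-phase behavior listed under "Operation".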

12 Previous Work: N-input C-element as a tree of 2-input C-elements
1. long delay
2. large variance
[Figure: tree of 2-input C-elements combining Ack 0, Ack 1, ..., Ack n-2, Ack n-1]
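To make the "long delay" point concrete, the sketch below models a tree of 2-input C-elements and counts its levels. It is a behavioral illustration only: gate delays are not modeled, an odd leftover input is assumed to pass through to the next level, and the names `c2`, `CElementTree` and `tree_depth` are hypothetical.

```python
from typing import Dict, List, Tuple

def c2(a: int, b: int, prev: int) -> int:
    """2-input C-element: follow the inputs when they agree, else hold."""
    return a if a == b else prev

class CElementTree:
    """Tree of 2-input C-elements; each node keeps its own state."""

    def __init__(self) -> None:
        self.state: Dict[Tuple[int, int], int] = {}

    def update(self, acks: List[int]) -> int:
        # Expects at least one input.
        level, vals = 0, list(acks)
        while len(vals) > 1:
            nxt = []
            for i in range(0, len(vals) - 1, 2):
                key = (level, i // 2)
                out = c2(vals[i], vals[i + 1], self.state.get(key, 0))
                self.state[key] = out
                nxt.append(out)
            if len(vals) % 2:  # odd input passes through unchanged
                nxt.append(vals[-1])
            vals, level = nxt, level + 1
        return vals[0]

def tree_depth(n: int) -> int:
    """Number of C-element levels a signal crosses: ceil(log2(n))."""
    return (n - 1).bit_length()

tree = CElementTree()
outputs = [tree.update(a) for a in
           ([1, 1, 1, 0], [1, 1, 1, 1], [0, 1, 1, 1], [0, 0, 0, 0])]
print(outputs, tree_depth(32))
```

For 32 Ack signals the last-arriving signal crosses five C-element levels, whereas an Ack that arrives early only waits at its own node, which is one way to read the delay and variance remarks on this slide.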

13 Previous Work: N-input C-element
1. More efficient implementation: DoneReset = done + reset · DoneReset
A. done circuit: an n-input AND, done = Ack_0 · Ack_1 · … · Ack_{n-1}
B. reset circuit: an n-input OR, reset = Ack_0 + Ack_1 + … + Ack_{n-1}
C. a 2-input C-element
2. Delay and variance: better than the tree of 2-input C-elements
[Figure: n-input AND (done) and n-input OR (reset) over Ack 0 ... Ack n-1, feeding a 2-input C-element that drives DoneReset]
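The decomposition above can be checked exhaustively for small n. The sketch below compares three next-state functions: a direct n-input C-element, the expression done + reset · DoneReset, and a 2-input C-element fed by done and reset. It uses the standard C-element equation c = ab + c(a + b); the function names are hypothetical.

```python
from itertools import product
from typing import Sequence

def c_element_next(acks: Sequence[int], prev: int) -> int:
    """Direct n-input C-element: set on all ones, clear on all zeros, else hold."""
    if all(acks):
        return 1
    if not any(acks):
        return 0
    return prev

def done_reset_next(acks: Sequence[int], prev: int) -> int:
    """DoneReset = done + reset * DoneReset."""
    done = int(all(acks))   # n-input AND
    reset = int(any(acks))  # n-input OR
    return done | (reset & prev)

def c2_next(a: int, b: int, prev: int) -> int:
    """2-input C-element: c = a*b + c*(a + b)."""
    return (a & b) | (prev & (a | b))

mismatches = [
    (acks, prev)
    for n in range(1, 7)
    for acks in product((0, 1), repeat=n)
    for prev in (0, 1)
    if not (
        c_element_next(acks, prev)
        == done_reset_next(acks, prev)
        == c2_next(int(all(acks)), int(any(acks)), prev)
    )
]
print(mismatches)
```

The three agree because done implies reset, so done · reset reduces to done and DoneReset · (done + reset) reduces to DoneReset · reset.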

14 Previous Work: Wuu’s CDCs [Wuu93]
A. done circuit: a tree of NANDs
B. reset circuit: a tree of NORs
C. long delay
D. small variance
E. uses static gates
[Figure: done and reset circuits]

15 Previous Work: Yun’s CDCs [Yun97]
A. done circuit: a tree of domino logic
B. no reset circuit
C. variable delay
D. large variance
E. uses dynamic CMOS

16 Our Design
Computation-completion detection circuit (dynamic n-input NOR; static 2-input NOR)

17 Our Design
Reset-completion detection circuit (dynamic 2n-input OR)

18 Our Design
Computation cycle, done signal:
1. the PMOS transistor (Acki) will be closed (conducting), and
2. all NMOS transistors will be open (non-conducting).
3. Thus, the done signal will be turned on.

19 Our Design
Computation cycle, reset signal: turned on as soon as any Acki signal goes high.

20 Our Design
Reset cycle, done signal: turned off as soon as any Acki signal is turned off.

21 Our Design
Reset cycle, reset signal: turned off only after all Acki signals are turned off.
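The four cases on slides 18 to 21 can be traced at the logic level. The sketch below treats done as the AND and reset as the OR of the Ack signals and ignores transistor-level behavior. The arrival orders are made-up example data, and the names are hypothetical.

```python
from typing import List, Tuple

def done(acks: List[int]) -> int:
    """done is high only when every Ack is high."""
    return int(all(acks))

def reset(acks: List[int]) -> int:
    """reset is high while any Ack is high."""
    return int(any(acks))

# Hypothetical order in which three Ack signals rise, then fall.
rise_order = [2, 0, 1]
fall_order = [1, 2, 0]

acks = [0, 0, 0]
trace: List[Tuple[int, int]] = [(done(acks), reset(acks))]
for i in rise_order:   # computation cycle
    acks[i] = 1
    trace.append((done(acks), reset(acks)))
for i in fall_order:   # reset cycle
    acks[i] = 0
    trace.append((done(acks), reset(acks)))

done_trace = [d for d, _ in trace]
reset_trace = [r for _, r in trace]
print(done_trace)
print(reset_trace)
```

In the trace, reset rises with the first Ack and falls with the last one, while done rises with the last Ack and falls with the first one, matching the four statements above.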

22 Our Design
done + reset circuits = dual-rail multi-input C-element
done + reset circuits + 2-input C-element = single-rail multi-input C-element
[Figure: implementation of the 2-input C-element]

23 DIRCA With CDC: part 1

24 DIRCA With CDC: part 2

25 Our Design
The PMOS transistor in the pull-up circuit of the done circuit saves power in non-operation mode. In a quiescent state, all Acki signals are zero and all pull-down transistors are closed (conducting). To save power, the pull-up transistor is open, cutting off the path from Vdd to ground.

26 Our Design
If the input low arrives too early, power is wasted. If the input low arrives too late, the done signal takes longer to turn on.
Low power consumption: latest Acki signal.
High performance: any not-latest Acki signal.

27 SPICE Output: done circuit
ChengDone0:
1. Ack0 is the latest signal.
2. input pulses: 3 and 4
3. buffered input: 1004
4. Ack0: 100
5. Done: 24680
6. DoneReset: 200
Delay = 0.55 ns

28 SPICE Output: done circuit
ChengDone1:
1. Ack1 is the latest signal.
2. input pulses: 5 and 6
3. buffered input: 1006
4. Ack1: 101
5. Done: 24680
6. DoneReset: 200
Delay = 0.22 ns

29 SPICE Output: done circuit
ChengDone37:
1. All Ack arrive at the same time
2. Done: 24680
3. DoneReset: 200
Delay = 0.64 ns

30 SPICE Output: reset circuit
ChengReset0:
1. Ack0 is the latest signal.
2. input pulse: 3 and 4
3. buffered input: 1004
5. Reset: 13579
6. DoneReset: 200
Delay = 1.23 ns

31 SPICE Output: reset circuit
ChengReset1:
1. Ack0 is the latest signal.
2. input pulse: 3 and 4
3. buffered input: 1004
5. Reset: 13579
6. DoneReset: 200
Delay = 0.87 ns

32 SPICE Output: reset circuit
ChengReset37:
1. All Ack reset at the same time
2. Done: 24680
3. DoneReset: 200
Delay = 1.34 ns

33 Our Design Constraint: when conducting, when only one pull-down transistor is conducting. This can be achieved by properly sizing transistors.

34 Logic Complexity # of transistors

35 Performance Evaluation
SPICE simulation:
1. MOSIS 2-micron CMOS level 2 parameters
2. W = 3u, L = 2u (buffer: 0.4 ns; 2-input NOR: 0.18 ns)
Computation-completion detection circuits: 38 typical cases (Wuu, Yun and Cheng). The measured delay includes the delay of the OR gate for Acki.
Reset-completion detection circuits: 38 typical cases (Wuu and Cheng).

36 Performance Evaluation


40 Conclusions
A new completion detection circuit for dual-rail self-timed components:
1. very fast computation-completion detection
2. very fast reset-completion detection
A low-overhead, very fast completion detection circuit is crucial for high-performance self-timed circuits.

41 Conclusions
SPICE simulation results:
1. Our computation-completion detection circuit is 9 times faster than Wuu's and Yun's.
2. Our reset-completion detection circuit is 2.7 times faster than Wuu's.

