Synthesis of Asynchronous Control Circuits with Automatically Generated Relative Timing Assumptions Jordi Cortadella, University Politècnica de Catalunya.


1

2 Synthesis of Asynchronous Control Circuits with Automatically Generated Relative Timing Assumptions. Jordi Cortadella, University Politècnica de Catalunya; Mike Kishinevsky, Intel Corporation; Steven M. Burns, Intel Corporation; Ken Stevens, Intel Corporation. Earlier contributions: Luciano Lavagno, Alex Kondratyev, Alex Yakovlev, Alexander Taubin.

3 Outline: Why asynchronous; Relative timing; Reminder: design flow for asynchronous circuits; Lazy transition systems; Timing assumptions and constraints; Automatic generation of timing assumptions; Results.

4 Why asynchronous? –All high-performance “synchronous” design styles are “asynchronous in the small” (within one or a few clocks). Example: [ISSCC 2001 Intel paper on a 4 GHz IEU for 0.18 um CMOS in the Pentium 4(TM)]. This requires asynchronous-style timing analysis. –The relative sequential distance within a die for global wires is growing. –Can we deliver a global clock N years from now?

5 Timing assumptions in design flow. Synchronous circuits (e.g., static CMOS): –max delay: stabilize within a clock (- setup - clock2q - clock_skew); –min delay: stabilize after hold time (+ clock_skew - clock2q). Speed-independent = quasi-delay insensitive: wire delays after a fork are smaller than fan-out gate delays [Muller59, Varshavsky et al. 80, Martin89, …]. Problem: fat circuits. Burst-mode FSM: the circuit stabilizes between two changes at the inputs [Nowick91, Yun94]. Problem: fundamental mode is similar to synchronous (external alignment by the worst case). Timed circuits: absolute bounds on gate / environment delays are known a priori (before physical design) [Myers95]. Problem: how do you know absolute delays before sizing/physical design?
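The synchronous max-delay and min-delay conditions on this slide can be written as two inequalities. The following is a minimal sketch, assuming all quantities are given in the same time unit (picoseconds here); the function name and the sample numbers are hypothetical and only illustrate the two bounds.

```python
def sync_timing_ok(max_delay, min_delay, period, setup, hold, clock2q, clock_skew):
    """Check the two synchronous timing conditions from the slide.

    max delay: logic must stabilize within period - setup - clock2q - clock_skew.
    min delay: logic must not change before hold + clock_skew - clock2q.
    All arguments are assumed to use the same time unit.
    """
    max_ok = max_delay <= period - setup - clock2q - clock_skew
    min_ok = min_delay >= hold + clock_skew - clock2q
    return max_ok, min_ok


# Hypothetical numbers in picoseconds: max bound is 800, min bound is 30.
print(sync_timing_ok(max_delay=700, min_delay=40, period=1000,
                     setup=50, hold=80, clock2q=100, clock_skew=50))
```

Both bounds depend on absolute numbers (period, skew), which is the contrast with relative timing, where only the order of two events matters.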

6 Relative Timing Asynchronous Circuits. [Figure: speed-independent C-element with signals a, b, c.] Timing assumption (on environment): a- before b-. [Figure: RT C-element with signals a, b, c.] RT C-element: faster, smaller; correct only under the timing constraint a- before b-.

7 Relative Timing Circuits. Assumptions: “a before b” –for concurrent events: reduces the reachable state space; –for ordered events: permits early enabling; –both increase the don’t-care space for logic synthesis => simpler logic (better area and timing). “Assume - if useful - guarantee” approach: the tool uses assumptions to derive a circuit together with the timing constraints that must be met in the physical design flow. Applied to the design of the Rotating Asynchronous Pentium Processor(TM) Instruction Decoder (K. Stevens, S. Rotem et al., Intel Corporation).

8 STG for the READ cycle (VME Bus Controller). Transitions: LDS+, LDTACK+, D+, DTACK+, DSr-, D-, DTACK-, LDS-, LDTACK-, DSr+. Signals: LDS, LDTACK, D, DSr, DTACK.

9 State Graph (Read cycle). Arc labels: DSr+, DTACK-, LDS-, LDTACK-, D-, DSr-, DTACK+, D+, LDTACK+, LDS+.

10 Binary encoding of signals. Arc labels: DSr+, DTACK-, LDS-, LDTACK-, D-, DSr-, DTACK+, D+, LDTACK+, LDS+. State codes: 10000, 10010, 10110, 01110, 01110, 01100, 00110, 10110, in signal order (DSr, DTACK, LDTACK, LDS, D).
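How the binary codes arise can be sketched by firing events one at a time on a bit vector in the slide's signal order (DSr, DTACK, LDTACK, LDS, D). This is an illustrative sketch only: the all-zero initial state and the particular interleaving in `ASSUMED_TRACE` are assumptions, not taken from the slide, and the helper names are hypothetical.

```python
SIGNALS = ("DSr", "DTACK", "LDTACK", "LDS", "D")  # signal order from the slide


def fire(code, event):
    """Return the state code after firing an event such as 'LDS+' or 'DSr-'."""
    name, sign = event[:-1], event[-1]
    i = SIGNALS.index(name)
    before, after = ("0", "1") if sign == "+" else ("1", "0")
    if code[i] != before:
        raise ValueError(f"{event} is not consistent with code {code}")
    return code[:i] + after + code[i + 1:]


def encode_trace(trace, initial="00000"):
    """Fire the events of a trace in order and collect the visited codes."""
    codes, code = [], initial
    for event in trace:
        code = fire(code, event)
        codes.append(code)
    return codes


# Assumed interleaving of the read cycle (hypothetical, for illustration).
ASSUMED_TRACE = ["DSr+", "LDS+", "LDTACK+", "D+", "DTACK+",
                 "DSr-", "D-", "DTACK-", "DSr+"]
print(encode_trace(ASSUMED_TRACE))
```

Under these assumptions the code 10110 is visited twice, once after LDTACK+ and once after the second DSr+, which is consistent with 10110 appearing twice among the slide's state codes.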

11 Next-state function. Legend: 0 → 1, 1 → 0, 0 → 0, 1 → 1. Events: LDS+, LDS-. State code shown: 10110.

12 Karnaugh map for LDS. [Figure: two Karnaugh maps over DTACK, DSr, D and LDTACK, one for LDS = 0 and one for LDS = 1. Entries are 0, 1 or - (don't care); one entry is marked 0/1?.]

13 Signal Insertion. Arc labels: LDS-, LDTACK-, D-, DSr-, LDTACK+, LDS+, CSC-, CSC+. State codes: 101101, 101100.

14 Speed-independent netlist

15 Transition systems. Regions: ER(LDS+), ER(LDS-). Events: LDS+, LDS-. Legend: 1 → 0, 0 → 1. Excitation region: enabling = firing, since delay can be zero.

16 Lazy Transition Systems. Regions: ER(LDS+), ER(LDS-), FR(LDS-). Events: LDS+, LDS-, DTACK-. Event LDS- is lazy: firing = subset of enabling.

17 Timing assumptions. (a before b) for concurrent events: concurrency reduction for firing and enabling. (a before b) for ordered events: early enabling. (a simultaneous to b wrt c) for triples of events: combination of the above.
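The first case, concurrency reduction, can be illustrated on a toy model. The sketch below assumes a simplified setting in which every event fires once and a state is the set of events fired so far; this is not the lazy transition system formalism of the talk, and the function name is hypothetical. An assumption "a before b" is modeled as an extra ordering pair.

```python
def reachable(events, before):
    """Enumerate reachable states of a toy one-shot event system.

    events: iterable of event names, each firing at most once.
    before: set of (p, q) pairs meaning p must fire before q.
    A state is the frozenset of events fired so far.
    """
    start = frozenset()
    seen, stack = {start}, [start]
    while stack:
        state = stack.pop()
        for e in events:
            if e in state:
                continue
            # e is enabled once all of its required predecessors have fired.
            if all(p in state for (p, q) in before if q == e):
                nxt = state | {e}
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
    return seen


untimed = reachable(["a", "b"], before=set())
timed = reachable(["a", "b"], before={("a", "b")})
print(len(untimed), len(timed))  # the state "only b fired" is no longer reachable
```

Each state that becomes unreachable is a code the logic no longer has to handle, which is where the additional don't cares come from.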

18 Speed-independent Netlist LDS+LDTACK+D+DTACK+DSr-D- DTACK- LDS-LDTACK- DSr+ DTACK D DSr LDS LDTACK csc map

19 Adding timing assumptions (I). Assumption: LDTACK- before DSr+ (labels in figure: FAST, SLOW). [Figure: STG of the READ cycle and netlist with signals DTACK, D, DSr, LDS, LDTACK, csc, map.]

20 Adding timing assumptions (I) DTACK D DSr LDS LDTACK csc map LDS+LDTACK+D+DTACK+DSr-D- DTACK- LDS-LDTACK- DSr+ LDTACK- before DSr+

21 State space domain LDTACK- before DSr+ LDTACK- DSr+

22 State space domain LDTACK- before DSr+ LDTACK- DSr+

23 State space domain LDTACK- before DSr+ LDTACK- DSr+ Two more unreachable states

24 Boolean domain. [Figure: two Karnaugh maps over DTACK, DSr, D and LDTACK, one for LDS = 0 and one for LDS = 1. Entries are 0, 1 or - (don't care); one entry is marked 0/1?.]

25 Boolean domain. [Figure: two Karnaugh maps over DTACK, DSr, D and LDTACK, one for LDS = 0 and one for LDS = 1; the 0/1? entry is gone and one more entry is - (don't care).] One more DC vector for all signals. One state conflict is removed.

26 Netlist with one constraint LDS+LDTACK+D+DTACK+DSr-D- DTACK- LDS-LDTACK- DSr+ DTACK D DSr LDS LDTACK csc map

27 Netlist with one constraint. [Figure: STG of the READ cycle and netlist with signals DTACK, D, DSr, LDS, LDTACK.] TIMING CONSTRAINT: LDTACK- before DSr+.

28 Timing assumptions. (a before b) for concurrent events: concurrency reduction for firing and enabling. (a before b) for ordered events: early enabling. (a simultaneous to b wrt c) for triples of events: combination of the above.

29 Ordered events: early enabling. [Figure: orderings of events a, b, c and gates F, G.] Logic for gate c may change.

30 Adding timing assumptions (II) LDS+LDTACK+D+DTACK+DSr-D- DTACK- LDS-LDTACK- DSr+ DTACK D DSr LDS LDTACK D- before LDS-

31 State space domain. Assumption: D- before LDS-. Events shown: LDS-, D-, DSr-. Reachable space is unchanged. For LDS-, enabling can be changed in one state (potential enabling for LDS-).

32 Boolean domain. [Figure: two Karnaugh maps over DTACK, DSr, D and LDTACK, one for LDS = 0 and one for LDS = 1. Entries are 0, 1 or - (don't care).]

33 Boolean domain. [Figure: two Karnaugh maps over DTACK, DSr, D and LDTACK, one for LDS = 0 and one for LDS = 1, with one more - (don't care) entry.] One more DC vector for one signal: LDS. If used: LDS = DSr; otherwise: LDS = DSr + D.
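The two candidate covers for LDS on this slide can be compared directly. The sketch below enumerates the values of DSr and D and lists where LDS = DSr and LDS = DSr + D disagree; the function names are hypothetical.

```python
from itertools import product


def lds_with_dc(dsr, d):
    """Cover when the extra don't care is used: LDS = DSr."""
    return bool(dsr)


def lds_without_dc(dsr, d):
    """Cover when the extra don't care is not used: LDS = DSr + D."""
    return bool(dsr) or bool(d)


# Input combinations (DSr, D) on which the two covers give different values.
differing = [(dsr, d) for dsr, d in product((0, 1), repeat=2)
             if lds_with_dc(dsr, d) != lds_without_dc(dsr, d)]
print(differing)
```

The covers differ only when DSr = 0 and D = 1, so the simpler cover is usable exactly when LDS is a don't care on such codes.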

34 Before early enabling LDS+LDTACK+D+DTACK+DSr-D- DTACK- LDS-LDTACK- DSr+ DTACK D DSr LDS LDTACK

35 Netlist with two constraints. [Figure: STG of the READ cycle and netlist with signals DTACK, D, DSr, LDS, LDTACK.] TIMING CONSTRAINTS: LDTACK- before DSr+ and D- before LDS-. Both timing assumptions are used for optimization and become constraints.

36 Deriving automatic timing assumptions. Rule I (out of 6): a, b are non-input events. –Untimed ordering: (a||b) and (a enabled before b), but not vice versa. –Derived assumption: a fires before b. –Justification: the delay of one gate can be made shorter than the delay of two (or more) gates: del(a) < del(c) + del(b). [Figure: events a, b, c.]

37 Deriving automatic timing assumptions. Rule I (out of 6): a, b are non-input events. –Untimed ordering: (a||b) and (a enabled before b), but not vice versa. –Derived assumption: a fires before b. –Justification: the delay of one gate can be made shorter than the delay of two (or more) gates. –Effect I: a state becomes DC for all signals. [Figure: events a, b, c.]

38 Deriving automatic timing assumptions. Rule I (out of 6): a, b are non-input events. –Untimed ordering: (a||b) and (a enabled before b), but not vice versa. –Derived assumption: a fires before b. –Justification: the delay of one gate can be made shorter than the delay of two (or more) gates. –Effect II: another state becomes local DC for the signal of event b. [Figure: events a, b, c.]
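Rule I can be approximated on a small causality graph. The sketch below is a simplification, not the tool's algorithm: it assumes an acyclic graph of one-shot events, treats two events as concurrent when neither causally precedes the other, and uses the longest path from the sources (a count of gate delays) as a stand-in for "enabled before". All names and the example graph are hypothetical.

```python
from itertools import combinations


def descendants(arcs, event):
    """All events causally after `event` in an acyclic set of (u, v) arcs."""
    out, stack = set(), [event]
    while stack:
        x = stack.pop()
        for u, v in arcs:
            if u == x and v not in out:
                out.add(v)
                stack.append(v)
    return out


def depth(arcs, event):
    """Longest path to `event`, used here as a count of gate delays."""
    preds = [u for u, v in arcs if v == event]
    return 0 if not preds else 1 + max(depth(arcs, p) for p in preds)


def rule_one(events, arcs, inputs):
    """Return pairs (a, b) meaning 'assume a fires before b'."""
    assumptions = set()
    for a, b in combinations(sorted(events), 2):
        if a in inputs or b in inputs:
            continue  # the rule applies to non-input events only
        if b in descendants(arcs, a) or a in descendants(arcs, b):
            continue  # ordered events, not concurrent
        da, db = depth(arcs, a), depth(arcs, b)
        if da < db:
            assumptions.add((a, b))
        elif db < da:
            assumptions.add((b, a))
    return assumptions


# Hypothetical graph: input t triggers a and c; c triggers b.
example = rule_one({"t", "a", "b", "c"}, {("t", "a"), ("t", "c"), ("c", "b")}, {"t"})
print(example)
```

In the example, a is one gate delay from t while b is two (through c), matching the justification del(a) < del(c) + del(b); a and c are at equal depth, so no assumption is derived for that pair.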

39 Backannotation of Timing Constraints. Timed circuits require post-verification. Can synthesis tools help? –Report the least stringent set of timing constraints required for the correctness of the circuit. –Not all initial timing assumptions may be required. Petrify reports a set of constraints on the order of firing that guarantee circuit correctness.

40 Timing constraints generation. Assumptions: d before b and c before e and a before d. [Figure: events a, b, c, d, e and the resulting state graph.]

41 Timing constraints generation. Assumptions: d before b and c before e and a before d. [Figure: events a, b, c, d, e and the resulting state graph.]

42 Timing constraints generation. Assumptions: d before b and c before e and a before d. [Figure: state graph with the correct behavior highlighted.]

43 Timing constraints generation. Assumptions: d before b and c before e and a before d. [Figure: state graph with incorrect behavior marked 1 and 2.]

44 Covering incorrect behavior. Assumptions: d before b and c before e and a before d. [Figure: state graph over events a, b, c, d, e with labels 1 to 5.] Candidate constraints and the labels they cover: d before b {1, 3}; d before c {1}; c before e {2, 4}. Other possible constraints remove states from the assumption domain => invalid.

45 Covering incorrect behavior. Assumptions: d before b and c before e and a before d. [Figure: state graph over events a, b, c, d, e with labels 1 to 5.] Candidate constraints: d before c {1}; c before e {2, 4}. Constraints for the minimal cost solution: d before c and c before e.
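Choosing constraints can be viewed as a small covering problem. The sketch below takes the candidate constraints and their label sets from the slides, but two things are assumptions: that the labels which must be covered are 1 and 2 (the ones marked as incorrect behavior), and that the cost of a constraint is the number of labels it removes. The brute-force search and the function name are illustrative only.

```python
from itertools import combinations


def min_cost_cover(must_remove, candidates):
    """Brute-force the cheapest set of constraints covering `must_remove`.

    candidates maps a constraint name to the set of labels it removes.
    Cost of a solution (assumed): total number of labels its constraints remove.
    Returns (cost, tuple_of_constraint_names) or None if no cover exists.
    """
    best = None
    names = sorted(candidates)
    for r in range(1, len(names) + 1):
        for combo in combinations(names, r):
            removed = set().union(*(candidates[n] for n in combo))
            if must_remove <= removed:
                cost = sum(len(candidates[n]) for n in combo)
                if best is None or cost < best[0]:
                    best = (cost, combo)
    return best


CANDIDATES = {
    "d before b": {1, 3},
    "d before c": {1},
    "c before e": {2, 4},
}
solution = min_cost_cover({1, 2}, CANDIDATES)
print(solution)
```

Under these assumptions the search selects "d before c" together with "c before e", the same pair the slide reports as the minimal cost solution.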

46 Timing-aware state encoding. Solve only state conflicts reachable in the RT assumptions domain. Generate automatic timing assumptions for inserted state signals => state signals can be implemented as RT logic. State variables inserted concurrently with I/O events => latency and cycle time reduction.

47 Value of Relative Timing. RT circuits provide up to 2-3x (1.3-2x) delay and area reduction with respect to SI circuits synthesized without (with) concurrency reduction. Automatic generation of timing assumptions => foundation for automatic synthesis of RT circuits with area/performance comparable to or better than manual design. Back-annotation of timing constraints => minimal required timing information for the back-end tools. Timing-aware state encoding allows significant area/performance optimization.

48 Design flow without timing. Specification (STG) → [Reachability analysis] → State Graph → [State encoding] → SG with CSC → [Boolean minimization] → Next-state functions → [Logic decomposition] → Decomposed functions → [Technology mapping] → Gate netlist.

49 Design Flow with Timing. Specification (STG + user assumptions) → [Reachability analysis] → Lazy State Graph → [Timing-aware state encoding] → Lazy SG with CSC → [Boolean minimization] → Next-state functions → [Logic decomposition] → Decomposed functions → [Technology mapping] → Gate netlist. Also shown: Automatic Timing Assumptions, Required Timing Constraints.

50 FIFO example FIFO li lo ro ri li- li+ lo+ lo- ro+ ro- ri+ ri-

51 Speed-Independent Implementation without concurrency reduction 3 state signals are required

52 SI implementation with concurrency reduction li lo ro ri x li- li+ lo+ lo- ro+ ro- ri+ ri- x+ x- + gC + -

53 RT implementation li lo ro ri x li- li+ lo+ lo- ro+ ro- ri+ ri- x+ x- OR li- li+ lo+ lo- ro+ ro- ri+ ri- x+ x-

54 RT implementation. [Figure: FIFO with signals li, lo, ro, ri, x, an OR gate, and events li-, li+, lo+, lo-, ro+, ro-, ri+, ri-, x+, x-.] To satisfy the constraint: Delay(x-) < Delay(ri+) and Delay(lo+) + Delay(x-) < Delay(ro+) + Delay(ri+). All constraints are either satisfied by default or easy to satisfy by sizing.
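The two delay inequalities on this slide can be checked mechanically once delay estimates are available. The sketch below assumes a dictionary of per-event delays in a common unit; the function name and the sample values are hypothetical and are not results from the talk.

```python
def rt_constraints_hold(delay):
    """Check the two relative timing conditions stated on the slide.

    delay: mapping from event name to its delay (same unit for all entries).
    """
    first = delay["x-"] < delay["ri+"]
    second = delay["lo+"] + delay["x-"] < delay["ro+"] + delay["ri+"]
    return first and second


# Hypothetical delays (arbitrary units): an internal gate versus environment responses.
sample = {"x-": 1, "lo+": 1, "ro+": 1, "ri+": 3}
print(rt_constraints_hold(sample))
```

Both conditions compare sums of delays against each other rather than against absolute bounds, so they can be met by relative gate sizing.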

