2
Introduction to asynchronous circuit design: specification and synthesis Jordi Cortadella, Universitat Politècnica de Catalunya, Spain Michael Kishinevsky, Intel Corporation, USA Alex Kondratyev, Theseus Logic, USA Luciano Lavagno, Università di Udine, Italy
3
Outline I: Introduction to basic concepts on asynchronous design II: Synthesis of control circuits from STGs III: Advanced topics on synthesis of control circuits from STGs IV: Synthesis from HDL and other synthesis paradigms Note: no references in the tutorial
4
Introduction to asynchronous circuit design: specification and synthesis Part I: Introduction to basic concepts on asynchronous circuit design
5
Outline What is an asynchronous circuit ? Asynchronous communication Asynchronous logic blocks Micropipelines Control specification and implementation Delay models Why asynchronous circuits ?
6
Synchronous circuit [Figure: registers R and combinational logic CL driven by a common clock CLK] Implicit synchronization
7
Asynchronous circuit [Figure: registers R and combinational logic CL linked by Req and Ack wires] Explicit synchronization: Req/Ack handshakes
8
Synchronous communication Clock edges determine the time instants where data must be sampled Data wires may glitch between clock edges (set-up/hold times must be satisfied) Data are transmitted at a fixed rate (clock frequency) 110010
9
Dual rail Two wires per bit –“00” = spacer, “01” = 0, “10” = 1 n-bit data communication requires 2n wires Each bit is self-timed Other delay-insensitive codes exist 11 00 1 0
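The dual-rail code on this slide can be sketched in a few lines of Python. This is a minimal illustration, assuming the first wire of each pair is the "true" rail, so that "10" = 1 and "01" = 0 as stated above; the function names are hypothetical.

```python
SPACER = (0, 0)  # "00": no data present on this bit

def encode_bit(bit):
    """Return the (t, f) rail pair for one data bit: 1 -> (1, 0), 0 -> (0, 1)."""
    return (1, 0) if bit else (0, 1)

def decode_pair(pair):
    """Return 0 or 1 for a codeword, None for the spacer; (1, 1) is not a codeword."""
    if pair == SPACER:
        return None
    if pair == (1, 1):
        raise ValueError("11 is not a dual-rail codeword")
    return 1 if pair == (1, 0) else 0

def encode_word(bits):
    """An n-bit word needs 2n wires: one rail pair per bit."""
    return [encode_bit(b) for b in bits]

def is_complete(pairs):
    """A word is valid once every bit has left the spacer state."""
    return all(decode_pair(p) is not None for p in pairs)
```

Because each bit announces its own validity, a receiver can detect completion by checking `is_complete` on the whole word, with no separate validity wire.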
10
Bundled data Validity signal –Similar to an aperiodic local clock n-bit data communication requires n+1 wires Data wires may glitch when not valid Signaling protocols –level sensitive (latch) –transition sensitive (register): 2-phase / 4-phase 110010
11
Example: memory read cycle Transition signaling, 4-phase Valid address Address Valid data Data AA DD
12
Example: memory read cycle Transition signaling, 2-phase Valid address Address Valid data Data AA DD
13
Outline What is an asynchronous circuit ? Asynchronous communication Asynchronous logic blocks Micropipelines Control specification and implementation Delay models Why asynchronous circuits ?
14
Asynchronous modules Signaling protocol: reqin+ start+ [computation] done+ reqout+ ackout+ ackin+ reqin- start- [reset] done- reqout- ackout- ackin- (more concurrency is also possible, e.g. by overlapping the return-to-zero phase of step i-1 with the evaluation phase of step i) [Figure: DATA PATH with Data IN and Data OUT; CONTROL with req in, ack in, req out, ack out; start and done between control and data path]
15
Completion detection Cdone Completion detection tree
16
Asynchronous latches: C element
A B | Z+
0 0 | 0
0 1 | Z
1 0 | Z
1 1 | 1
[Figure: C element symbol and transistor-level implementation between Vdd and Gnd, inputs A and B, output Z]
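The truth table above corresponds to the next-state equation Z+ = AB + Z(A + B). A minimal Python sketch of that behavior follows; the helper `run` and its input sequence are illustrative only.

```python
def c_element(a, b, z):
    """Next output of a 2-input C element: Z+ = AB + Z(A + B)."""
    return (a & b) | (z & (a | b))

def run(inputs, z=0):
    """Apply a sequence of (A, B) input pairs and record the output after each."""
    trace = []
    for a, b in inputs:
        z = c_element(a, b, z)
        trace.append(z)
    return trace
```

The output rises only when both inputs are 1, falls only when both are 0, and otherwise holds its previous value.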
17
Dual-rail logic A.t A.f B.t B.f C.t C.f Dual-rail AND gate Valid behavior for monotonic environment
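A dual-rail AND gate can be modelled as below. The gate equations (C.t = A.t AND B.t, C.f = A.f OR B.f) are the common textbook construction and are an assumption here, since the figure itself is not recoverable from the transcript.

```python
ONE, ZERO, SPACER = (1, 0), (0, 1), (0, 0)  # (t, f) rail pairs

def dual_rail_and(a, b):
    """a and b are (t, f) pairs; returns the (t, f) pair of C = A AND B."""
    (at, af), (bt, bf) = a, b
    return (at & bt, af | bf)
```

With both inputs at the spacer the output is the spacer too. A single 0 input already produces a valid 0 output, so the gate behaves correctly only if the environment changes the inputs monotonically between spacer and codeword.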
18
Differential cascode voltage switch logic start A.t B.t C.t A.fB.f C.f Z.tZ.f done 3-input AND/NAND gate
19
Bundled-data logic blocks delay startdone logic Conventional logic + matched delay
20
Micropipelines (Sutherland 89) LLLLlogic R in A out C C C C R out A in delay
21
Data-path / Control LLLLlogic R in R out CONTROL A in A out
22
Outline What is an asynchronous circuit ? Asynchronous communication Asynchronous logic blocks Micropipelines Control specification and implementation Delay models Why asynchronous circuits ?
23
Control specification A+ B+ A- B- A B A input B output
24
Control specification A+ B+ A- B- A B
25
Control specification A+ B- A- B+ A B
26
Control specification A+ C- A- C+ A C B+ B- B C
27
Control specification A+ C- A- C+ A C B+ B- B C
28
Control specification C C Ri Ro Ai Ao Ri+ Ao+ Ri- Ao- Ro+ Ai+ Ro- Ai- Ri Ro Ao Ai FIFO cntrl
29
A simple filter: specification y := 0; loop x := READ (IN); WRITE (OUT, (x+y)/2); y := x; end loop R in A in A out R out IN OUT filter
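The filter specification above translates directly into a small Python generator. This is only a functional sketch of the data computation; it says nothing about the handshakes on R in / A in and R out / A out, and the name `filter_stream` is hypothetical.

```python
def filter_stream(samples):
    """Yield (x + y) / 2 for each input x, where y is the previous input (initially 0)."""
    y = 0
    for x in samples:        # x := READ(IN)
        yield (x + y) / 2    # WRITE(OUT, (x + y) / 2)
        y = x                # y := x
```

For example, `list(filter_stream([2, 4, 8]))` averages each sample with its predecessor.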
30
A simple filter: block diagram [Figure: latches x and y, adder +, and a control block; handshake pairs (Rin, Ain), (Rx, Ax), (Ry, Ay), (Ra, Aa), (Rout, Aout); data ports IN and OUT] x and y are level-sensitive latches (transparent when R=1). + is a bundled-data adder (matched delay between Ra and Aa). Rin indicates the validity of IN. After Ain+ the environment is allowed to change IN. (Rout, Aout) control a level-sensitive latch at the output.
31
A simple filter: control spec. [Figure: filter block diagram and the control STG over the events Rin+ Ain+ Rin- Ain-, Rx+ Ax+ Rx- Ax-, Ry+ Ay+ Ry- Ay-, Ra+ Aa+ Ra- Aa-, Rout+ Aout+ Rout- Aout-]
32
A simple filter: control impl. [Figure: the same control STG and a circuit containing a C element that connects Rin, Ain, Rx, Ax, Ry, Ay, Ra, Aa, Rout, Aout]
33
Control: observable behavior [Figure: STG over the events Rin+ Ain+ Rin- Ain-, Rx+ Ax+ Rx- Ax-, Ry+ Ay+ Ry- Ay-, Ra+ Aa+ Ra- Aa-, Rout+ Aout+ Rout- Aout-, z+ z-; circuit with a C element and internal signal z]
34
Outline What is an asynchronous circuit ? Asynchronous communication Asynchronous logic blocks Micropipelines Control specification and implementation Delay models Why asynchronous circuits ?
35
Taking delays into account x+ x- y+ y- z+ z- x z y x’ z’ Delay assumptions: Environment: 3 time units Gates: 1 time unit events: x+ x’- y+ z+ z’- x- x’+ z- z’+ y- time: 3 4 5 6 7 9 10 12 13 14
36
Taking delays into account x+ x- y+ y- z+ z- x z y x’ z’ Delay assumptions: unbounded delays events: x+ x’- y+ z+ x- x’+ y- time: 3 4 5 6 9 10 11 very slow failure !
37
Gate vs wire delay models Gate delay model: delays in gates, no delays in wires Wire delay model: delays in gates and wires
38
Delay models for async. circuits Bounded delays (BD): realistic for gates and wires. –Technology mapping is easy, verification is difficult Speed independent (SI): Unbounded (pessimistic) delays for gates and “negligible” (optimistic) delays for wires. –Technology mapping is more difficult, verification is easy Delay insensitive (DI): Unbounded (pessimistic) delays for gates and wires. –DI class (built out of basic gates) is almost empty Quasi-delay insensitive (QDI): Delay insensitive except for critical wire forks (isochronic forks). –Formally, it is the same as speed independent –In practice, different synthesis strategies are used BD SI QDI DI
39
Motivation (designer’s view) Modularity –Plug-and-play interconnectivity Reusability –IPs with abstract timing behaviors High-performance –Average-case performance (no worst-case delay synchronization) –No clock skew (local timing assumptions instead) Many interfaces are asynchronous –Buses, networks,...
40
Motivation (technology aspects) Low power –Automatic clock gating Electromagnetic compatibility –No peak currents around clock edges Robustness –High immunity to technology and environment variations (in-die variations, temperature, power supply,...)
41
Problems Concurrent models for specification –CSP, Petri nets,... Difficult to design –Hazards, synchronization Complex timing analysis –Difficult to estimate performance Difficult to test –No way to stop the clock
42
But we have some success stories... Philips; AMULET microprocessors; Sharp; Intel (RAPPID); IBM (interlocked pipeline); Start-up companies: –Theseus Logic, Cogency, ADD...
43
Introduction to asynchronous circuit design: specification and synthesis Part II: Synthesis of control circuits from STGs
44
Outline Overview of the synthesis flow Specification State graph and next-state functions State encoding Implementability conditions Speed-independent circuit –Complex gates –C-element architecture
45
Design flow: Specification (STG) → [Reachability analysis] → State Graph → [State encoding] → SG with CSC → [Boolean minimization] → Next-state functions → [Logic decomposition] → Decomposed functions → [Technology mapping] → Gate netlist
46
x y z x+ x- y+ y- z+ z- Signal Transition Graph (STG) x y z
47
x y z x+ x- y+ y- z+ z-
48
x+ x- y+ y- z+ z- xyz 000 x+ 100 y+ z+ y+ 101 110 111 x- 001 011 y+ z- 010 y-
49
xyz 000 x+ 100 y+ z+ y+ 101 110 111 x- 001 011 y+ z- 010 y- Next-state functions
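Deriving next-state functions from a state graph can be sketched as follows. The arc list is an assumption: it is one reading of the flattened figure (states in xyz order), not something the transcript states arc by arc. For each reachable state, the implied next value of a signal is 1 if its rising event is enabled, 0 if its falling event is enabled, and its current value otherwise.

```python
SIGNALS = "xyz"

# (source code, event, target code); assumed reading of the state graph figure.
ARCS = [
    ("000", "x+", "100"),
    ("100", "y+", "110"), ("100", "z+", "101"),
    ("110", "z+", "111"), ("101", "y+", "111"),
    ("101", "x-", "001"), ("111", "x-", "011"),
    ("001", "y+", "011"),
    ("011", "z-", "010"),
    ("010", "y-", "000"),
]

def next_state_table(signal):
    """Map each reachable state code to the implied next value of `signal`."""
    i = SIGNALS.index(signal)
    states = {s for s, _, _ in ARCS} | {t for _, _, t in ARCS}
    table = {}
    for s in sorted(states):
        enabled = {e for src, e, _ in ARCS if src == s}
        if signal + "+" in enabled:
            table[s] = 1            # rising transition is excited
        elif signal + "-" in enabled:
            table[s] = 0            # falling transition is excited
        else:
            table[s] = int(s[i])    # signal is stable
    return table
```

On this arc list the three tables agree with x = z'(x + y'), y = x + z and z = x + y'z on every reachable state; unreachable codes are don't cares for Boolean minimization.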
50
x z y
51
Outline Overview of the synthesis flow Specification State graph and next-state functions State encoding Implementability conditions Speed-independent circuit –Complex gates –C-element architecture
52
Design flow: Specification (STG) → [Reachability analysis] → State Graph → [State encoding] → SG with CSC → [Boolean minimization] → Next-state functions → [Logic decomposition] → Decomposed functions → [Technology mapping] → Gate netlist
53
VME bus Device LDS LDTACK D DSr DSw DTACK VME Bus Controller Data Transceiver Bus DSr LDS LDTACK D DTACK Read Cycle
54
STG for the READ cycle LDS+LDTACK+D+DTACK+DSr-D- DTACK- LDS-LDTACK- DSr+ LDS LDTACK D DSr DTACK VME Bus Controller
55
Choice: Read and Write cycles DSr+ LDS+ LDTACK+ D+ DTACK+ DSr- D- LDS- LDTACK-DTACK- DSw+ D+ LDS+ LDTACK+ D- DTACK+ DSw- LDS- LDTACK-DTACK-
56
Choice: Read and Write cycles DTACK- DSr+ LDS+ LDTACK+ D+ DTACK+ DSr- D- LDS- LDTACK- DSw+ D+ LDS+ LDTACK+ D- DTACK+ DSw- LDS- LDTACK-DTACK-
57
Choice: Read and Write cycles DTACK- DSr+ LDS+ LDTACK+ D+ DTACK+ DSr- D- LDS- LDTACK- DSw+ D+ LDS+ LDTACK+ D- DTACK+ DSw- LDS- LDTACK-DTACK-
58
Choice: Read and Write cycles DTACK- DSr+ LDS+ LDTACK+ D+ DTACK+ DSr- D- LDS- LDTACK- DSw+ D+ LDS+ LDTACK+ D- DTACK+ DSw- LDS- LDTACK-DTACK-
59
Circuit synthesis Goal: –Derive a hazard-free circuit under a given delay model and mode of operation
60
Outline Overview of the synthesis flow Specification State graph and next-state functions State encoding Implementability conditions Speed-independent circuit –Complex gates –C-element architecture
61
Design flow: Specification (STG) → [Reachability analysis] → State Graph → [State encoding] → SG with CSC → [Boolean minimization] → Next-state functions → [Logic decomposition] → Decomposed functions → [Technology mapping] → Gate netlist
62
STG for the READ cycle LDS+LDTACK+D+DTACK+DSr-D- DTACK- LDS-LDTACK- DSr+ LDS LDTACK D DSr DTACK VME Bus Controller
63
Binary encoding of signals DSr+ DTACK- LDS- LDTACK- D- DSr-DTACK+ D+ LDTACK+ LDS+
64
Binary encoding of signals DSr+ DTACK- LDS- LDTACK- D- DSr-DTACK+ D+ LDTACK+ LDS+ 10000 10010 10110 0111001110 01100 00110 10110 (DSr, DTACK, LDTACK, LDS, D)
65
QR (LDS+) QR (LDS-) Excitation / Quiescent Regions ER (LDS+) ER (LDS-) LDS- LDS+ LDS-
66
Next-state function 0 1 LDS- LDS+ LDS- 1 0 0 0 1 1 10110
67
Karnaugh map for LDS [Figure: two Karnaugh maps over DTACK DSr and D LDTACK, one for LDS = 0 and one for LDS = 1; one entry is marked 0/1?, and unreachable codes are don't cares (-)]
68
Outline Overview of the synthesis flow Specification State graph and next-state functions State encoding Implementability conditions Speed-independent circuit –Complex gates –C-element architecture
69
Design flow: Specification (STG) → [Reachability analysis] → State Graph → [State encoding] → SG with CSC → [Boolean minimization] → Next-state functions → [Logic decomposition] → Decomposed functions → [Technology mapping] → Gate netlist
70
Concurrency reduction LDS- LDS+ LDS- 10110 DSr+
71
Concurrency reduction LDS+LDTACK+D+DTACK+DSr-D- DTACK- LDS-LDTACK- DSr+
72
State encoding conflicts LDS- LDTACK- LDTACK+ LDS+ 10110
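The conflict at code 10110 can be reproduced with a small reachability analysis. The sketch below treats the read-cycle STG as a marked graph whose places are the causality arcs. The arc list and the initial marking are assumptions (a reading of the figure), as is the choice of DSr and LDTACK as inputs, which follows the VME bus controller diagram.

```python
SIGNALS = ["DSr", "DTACK", "LDTACK", "LDS", "D"]
OUTPUTS = {"DTACK", "LDS", "D"}  # DSr and LDTACK are inputs

# Causality arcs of the read-cycle STG (assumed reading of the figure).
ARCS = [
    ("DSr+", "LDS+"), ("LDS+", "LDTACK+"), ("LDTACK+", "D+"),
    ("D+", "DTACK+"), ("DTACK+", "DSr-"), ("DSr-", "D-"),
    ("D-", "DTACK-"), ("D-", "LDS-"), ("DTACK-", "DSr+"),
    ("LDS-", "LDTACK-"), ("LDTACK-", "LDS+"),
]
INITIAL = frozenset({("DTACK-", "DSr+"), ("LDTACK-", "LDS+")})  # marked arcs

def enabled(marking):
    """Events whose incoming arcs are all marked."""
    events = {t for _, t in ARCS}
    return {e for e in events if all(a in marking for a in ARCS if a[1] == e)}

def fire(marking, code, event):
    """Move tokens across `event` and flip the corresponding bit of the code."""
    consumed = {a for a in ARCS if a[1] == event}
    produced = {a for a in ARCS if a[0] == event}
    i = SIGNALS.index(event[:-1])
    bit = "1" if event.endswith("+") else "0"
    return frozenset((marking - consumed) | produced), code[:i] + bit + code[i + 1:]

def state_graph():
    """All reachable (marking, binary code) pairs, starting from code 00000."""
    start = (INITIAL, "00000")
    seen, todo = {start}, [start]
    while todo:
        marking, code = todo.pop()
        for e in enabled(marking):
            nxt = fire(marking, code, e)
            if nxt not in seen:
                seen.add(nxt)
                todo.append(nxt)
    return seen

def csc_conflicts(states):
    """Codes shared by states that enable different output events."""
    by_code = {}
    for marking, code in states:
        outs = frozenset(e for e in enabled(marking) if e[:-1] in OUTPUTS)
        by_code.setdefault(code, set()).add(outs)
    return {code for code, outs in by_code.items() if len(outs) > 1}
```

Under these assumptions the graph has 14 states but only 13 distinct codes: code 10110 is reached once with D+ enabled and once with LDS- enabled, so the next-state functions of D and LDS are not well defined there.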
73
Signal Insertion LDS- LDTACK- D- DSr- LDTACK+ LDS+ CSC- CSC+ 101101 101100
74
Outline Overview of the synthesis flow Specification State graph and next-state functions State encoding Implementability conditions Speed-independent circuit –Complex gates –C-element architecture
75
Design flow: Specification (STG) → [Reachability analysis] → State Graph → [State encoding] → SG with CSC → [Boolean minimization] → Next-state functions → [Logic decomposition] → Decomposed functions → [Technology mapping] → Gate netlist
76
Complex-gate implementation Under what conditions does a hazard-free implementation exist?
77
Implementability conditions Consistency –Rising and falling transitions of each signal alternate in any trace Complete state coding (CSC) –Next-state functions correctly defined Persistency –No event can be disabled by another event (unless they are both inputs)
78
Implementability conditions Consistency + CSC + persistency There exists a speed-independent circuit that implements the behavior of the STG (under the assumption that any Boolean function can be implemented with one complex gate)
79
Persistency 100000001 a- c+ b+b+ b+b+ a c b a c b is this a pulse ? Speed independence glitch-free output behavior under any delay
80
Speed-independent implementations How can the implementability conditions –Consistency –Complete state coding –Persistency be satisfied? Standard circuit architectures: –Complex (hazard-free) gates –C elements with monotonic covers –“Standard” gates and latches
81
a+ b+ c+ d+ a- b- d- a+ c-a- 0000 1000 1100 0100 0110 0111 1111 10111011 00111001 0001 a+ b+ c+ a- b- c- a+ c- a- d- d+
82
0000 1000 1100 0100 0110 0111 1111 10111011 00111001 0001 a+ b+ c+ a- b- c- a+ c- a- d- d+ ab cd 00011110 00 01 11 10 1 11 11 1 0 0000 ER(d+) ER(d-)
83
ab cd 00011110 00 01 11 10 1 11 11 1 0 0000 0000 1000 1100 0100 0110 0111 1111 10111011 00111001 0001 a+ b+ c+ a- b- c- a+ c- a- d- d+ Complex gate
84
Implementation with C elements C R S z S+ z+ S- R+ z- R- S (set) and R (reset) must be mutually exclusive S must cover ER(z+) and must not intersect ER(z-) QR(z-) R must cover ER(z-) and must not intersect ER(z+) QR(z+)
85
ab cd 00011110 00 01 11 10 1 11 11 1 0 0000 0000 1000 1100 0100 0110 0111 1111 10111011 00111001 0001 a+ b+ c+ a- b- c- a+ c- a- d- d+ C S R d
86
0000 1000 1100 0100 0110 0111 1111 10111011 00111001 0001 a+ b+ c+ a- b- c- a+ c- a- d- d+ C S R d but...
87
0000 1000 1100 0100 0110 0111 1111 10111011 00111001 0001 a+ b+ c+ a- b- c- a+ c- a- d- d+ C S R d Assume that R=ac has an unbounded delay Starting from state 0000 (R=1 and S=0): a+ ; R- ; b+ ; a- ; c+ ; S+ ; d+ ; R+ disabled (potential glitch)
88
ab cd 00011110 00 01 11 10 1 11 11 1 0 0000 0000 1000 1100 0100 0110 0111 1111 10111011 00111001 0001 a+ b+ c+ a- b- c- a+ c- a- d- d+ C S R d Monotonic covers
89
C-based implementations C S R d C d a b c a b c d weak generalized C element (gC)
90
Synthesis exercise y- z-w- y+x+ z+ x- w+ 1011 0111 0011 1001 1000 1010 0001 00000101 00100100 0110 y- y+ x- x+ w+ w- z+ z- w- z- y+ x+ Derive circuits for signals x and z (complex gates and monotonic covers)
91
Synthesis exercise 1011 0111 0011 1001 1000 1010 0001 00000101 00100100 0110 y- y+ x- x+ w+ w- z+ z- w- z- y+ x+ wx yz 00011110 00 01 11 10 - - - - Signal x 1 0 1 1 1 1 1 0 0 0 0 0
92
Synthesis exercise 1011 0111 0011 1001 1000 1010 0001 00000101 00100100 0110 y- y+ x- x+ w+ w- z+ z- w- z- y+ x+ wx yz 00011110 00 01 11 10 - - - - Signal z 1 0 0 0 0 1 1 1 0 0 0 0
93
Introduction to asynchronous circuit design: specification and synthesis Part III: Advanced topics on synthesis of control circuits from STGs
94
Outline Logic decomposition –Hazard-free decomposition –Signal insertion –Technology mapping Optimization based on timing information –Relative timing –Timing assumptions and constraints –Automatic generation of timing assumptions
95
Design flow: Specification (STG) → [Reachability analysis] → State Graph → [State encoding] → SG with CSC → [Boolean minimization] → Next-state functions → [Logic decomposition] → Decomposed functions → [Technology mapping] → Gate netlist
96
No Hazards a b c x 0 abcx 1000 1100 b+ 0100 a- 0110 c+ 1 1 0 0 1 1 0 1 0 1 0 0
97
Decomposition May Lead to Hazards abcx 1000 1100 b+ 0100 a- 0110 c+ a b z c x 1 0 0 0 0 1000 1100 0100 0110 1 1 0 0 0 1 1 1 0 0 0 1 1 0 0 0 1 1 1 1 0 1 0 1 0
98
Decomposition Acknowledgement Generating candidates Hazard-free signal insertion –Event insertion –Signal insertion
99
Global acknowledgement a b c z a b d y d-b+d+y+a-y-c+d- c-d+z-b-z+c+a+c-
100
a b c z a b d y How about 2-input gates ? d-b+d+y+a-y-c+d- c-d+z-b-z+c+a+c-
101
a b c z a b d y d-b+d+y+a-y-c+d- c-d+z-b-z+c+a+c- How about 2-input gates ?
102
a b c z a b d y 0 0 d-b+d+y+a-y-c+d- c-d+z-b-z+c+a+c- How about 2-input gates ?
103
a b c z a b d y d-b+d+y+a-y-c+d- c-d+z-b-z+c+a+c- How about 2-input gates ?
104
c z d y a b d-b+d+y+a-y-c+d- c-d+z-b-z+c+a+c- How about 2-input gates ?
105
Strategy for logic decomposition Each decomposition defines a new internal signal Method: Insert new internal signals such that –After resynthesis, some large gates are decomposed –The new specification is hazard-free Generate candidates for decomposition using standard logic factorization techniques: –Algebraic factorization –Boolean factorization (boolean relations)
106
y- z-w- y+x+ z+ x- w+ Decomposition example 10011011 1000 1010 0001 00000101 00100100 01100111 0011 y- y+ x- x+ w+ w- z+ z- w- z- y+ x+ wxyz
107
yz=1 yz=0 10011011 1000 1010 0001 00000101 00100100 01100111 0011 y- y+ x- x+ w+ w- z+ z- w- z- y+ x+ 10011011 1000 1010 0001 00000101 00100100 01100111 0011 y- y+ x- x+ w+ w- z+ z- w- z- y+ x+ C C x y x y w z x y z y z w z w z y
108
y- z-w- y+x+ z+ x- w+ s- s+ s- s+ s- s=1 s=0 10011011 1000 1010 0111 0011 y+ x- w+ z+ z- 0001 00000101 00100100 0110 x+ w- z- y+ x+ 1001 1000 1010 y+ z- 0111 y-
109
s- s+ s- s=1 s=0 10011011 1000 1010 0111 0011 y+ x- w+ z+ z- 0001 00000101 00100100 0110 x+ w- z- y+ x+ 1001 1000 1010 y+ z- 0111 C C x y x y w z x y z w z w z y s y-
110
C C x y x y w z x y z y z w z w z y yz=1yz=0 10011011 1000 1010 0001 00000101 00100100 01100111 0011 y- y+ x- x+ w+ w- z+ z- w- z- y+ x+ 1011 1000 1010 0001 00000101 00100100 01100111 0011 y- y+ x- x+ w+ w- z+ z- w- z- y+ x+ 1001
111
s- s+ s=1 s=0 1001 1011 0111 0011 x- w+ z+ 0001 00000101 00100100 0110 x+ w- z- y+ x+ 1001 1000 1010 y+ z- 0111 y- z-w- y+x+ z+ x- w+ s- s+ z- is delayed by the new transition s- !
112
C C x y x y w z x y z w z w z yyyyyyy s- s+ s=1 s=0 1001 1011 0111 0011 x- w+ z+ 0001 00000101 00100100 0110 x+ w- z- y+ x+ 1001 1000 1010 y+ z- 0111 y-
113
F C Sr D Decomposition (Algebraic, Boolean relations) Hazard-free ? (Event insertion) NO YES C C C C Sr D D
114
F C D Hazard-free ? (Event insertion) NO YES C C Sr D until no more progress Decomposition (Algebraic, Boolean relations)
115
Signal insertion for function F State Graph F=0F=1 Insertion by input borders F- F+
116
Event insertion a b ER(x) c
117
Event insertion a b ER(x) c x x x x b SR(x) a
118
Properties to preserve a a b b a a b b a a b b x a a b b a a b b b a a b b x x a is persistent a is disabled by b = hazards
119
Boolean decomposition F x1x1 xnxn f HG x1x1 xnxn h1h1 hmhm f f = F (x 1,…,x n )f = G(H(x 1,…,x n )) Our problem: Given F and G, find H
120
C h1h1 h2h2 f state f next(f) (h 1,h 2 ) s 1 0 0 (0,-) (-,0) s 2 0 1 (1,1) s 3 1 0 (0,0) s 4 1 1 (-,1) (1,-) dc - - (-,-) This is a Boolean Relation
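The Boolean relation in this table can be regenerated by enumeration: for each pair (f, next(f)), collect every input pair (h1, h2) that drives a C element from f to next(f). A minimal sketch, with hypothetical function names:

```python
from itertools import product

def c_element(h1, h2, f):
    """Next output of a C element with inputs h1, h2 and current output f."""
    return (h1 & h2) | (f & (h1 | h2))

def allowed_inputs(f, f_next):
    """All (h1, h2) pairs compatible with the transition f -> f_next."""
    return {(h1, h2) for h1, h2 in product((0, 1), repeat=2)
            if c_element(h1, h2, f) == f_next}
```

For f = 0 and next(f) = 0 this returns three pairs, which is exactly the set covered by (0,-) and (-,0). Several output vectors are acceptable for one state, which is why the problem is a Boolean relation rather than a function with don't cares.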
121
y- a+c- d- a- c+ a+ y+ a- c- d+ c+ y a c d F Rs y R S
122
y- a+c- d- a- c+ a+ y+ a- c- d+ c+ y a c d Rs y a c d c d
123
y- a+c- d- a- c+ a+ y+ a- c- d+ c+ y a c d Rs ya
124
y- a+c- d- a- c+ a+ y+ a- c- d+ c+ y a c d Rs ya D d c
125
Technology mapping Merging small gates into larger gates introduces no new hazards Standard synchronous technique can be applied, e.g. BDD-based boolean matching Handles sequential gates and combinational feedbacks Due to hazards there is no guarantee to find correct mapping (some gates cannot be decomposed) Timing-aware decomposition can be applied in these rare cases
126
Design flow: Specification (STG) → [Reachability analysis] → State Graph → [State encoding] → SG with CSC → [Boolean minimization] → Next-state functions → [Logic decomposition] → Decomposed functions → [Technology mapping] → Gate netlist
127
Timing assumptions in design flow Speed-independent: wire delays after a fork smaller than fan-out gate delays Burst-mode: circuit stabilizes between two changes at the inputs Timed circuits: Absolute bounds on gate / environment delays are known a priori (before physical design)
128
Relative Timing Circuits Assumptions: “a before b” –for concurrent events: reduces reachable state space –for ordered events: permits early enabling –both increase don’t care space for logic synthesis => simplify logic (better area and timing) “Assume - if useful - guarantee” approach: assumptions are used by the tool to derive a circuit and the required timing constraints that must be met in the physical design flow Applied to the design of the Revolving Asynchronous Pentium Processor(TM) Instruction Decoder (K. Stevens, S. Rotem et al., Intel Corporation)
129
Speed-independent C-element Relative Timing Asynchronous Circuits a- before b- Timing assumption (on environment): a b c RT C-element: faster,smaller; correct only under timing constraint: a- before b- a b c
130
State Graph (Read cycle) DSr+ DTACK- LDS- LDTACK- D- DSr-DTACK+ D+ LDTACK+ LDS+
131
Lazy Transition Systems ER (LDS+) ER (LDS-) LDS- LDS+ LDS- DTACK- FR (LDS-) Event LDS- is lazy: firing = subset of enabling
132
Timing assumptions (a before b) for concurrent events: concurrency reduction for firing and enabling; (a before b) for ordered events: early enabling; (a simultaneous to b wrt c) for triples of events: combination of the above
133
Speed-independent Netlist LDS+LDTACK+D+DTACK+DSr-D- DTACK- LDS-LDTACK- DSr+ DTACK D DSr LDS LDTACK csc map
134
Adding timing assumptions (I) LDS+LDTACK+D+DTACK+DSr-D- DTACK- LDS-LDTACK- DSr+ DTACK D DSr LDS LDTACK csc map LDTACK- before DSr+ FAST SLOW
135
Adding timing assumptions (I) DTACK D DSr LDS LDTACK csc map LDS+LDTACK+D+DTACK+DSr-D- DTACK- LDS-LDTACK- DSr+ LDTACK- before DSr+
136
State space domain LDTACK- before DSr+ LDTACK- DSr+
137
State space domain LDTACK- before DSr+ LDTACK- DSr+
138
State space domain LDTACK- before DSr+ LDTACK- DSr+ Two more unreachable states
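One way to see the effect of "LDTACK- before DSr+" on the state space is to model the assumption as an extra causality arc from LDTACK- to DSr+ and rerun reachability. This is only a sketch: the read-cycle arcs, the initial marking, and the modelling of the assumption as an arc are all assumptions, not the tool's actual algorithm.

```python
SIGNALS = ["DSr", "DTACK", "LDTACK", "LDS", "D"]

# Causality arcs of the read-cycle STG (assumed reading of the figure).
ARCS = [
    ("DSr+", "LDS+"), ("LDS+", "LDTACK+"), ("LDTACK+", "D+"),
    ("D+", "DTACK+"), ("DTACK+", "DSr-"), ("DSr-", "D-"),
    ("D-", "DTACK-"), ("D-", "LDS-"), ("DTACK-", "DSr+"),
    ("LDS-", "LDTACK-"), ("LDTACK-", "LDS+"),
]
MARKED = [("DTACK-", "DSr+"), ("LDTACK-", "LDS+")]  # initially marked arcs
TIMING = ("LDTACK-", "DSr+")                        # "LDTACK- before DSr+"

def reachable(arcs, marked):
    """Return the set of reachable (marking, code) pairs from code 00000."""
    start = (frozenset(marked), "00000")
    seen, todo = {start}, [start]
    while todo:
        marking, code = todo.pop()
        for event in {t for _, t in arcs}:
            inputs = [a for a in arcs if a[1] == event]
            if not all(a in marking for a in inputs):
                continue  # event not enabled in this marking
            outputs = [a for a in arcs if a[0] == event]
            i = SIGNALS.index(event[:-1])
            bit = "1" if event[-1] == "+" else "0"
            nxt = (frozenset((marking - set(inputs)) | set(outputs)),
                   code[:i] + bit + code[i + 1:])
            if nxt not in seen:
                seen.add(nxt)
                todo.append(nxt)
    return seen

untimed = reachable(ARCS, MARKED)
timed = reachable(ARCS + [TIMING], MARKED + [TIMING])
```

With these arcs the untimed graph has 14 states and the timed one 12. The two states that disappear are the ones entered when DSr+ fires ahead of LDTACK-, including the second occurrence of code 10110, so every remaining state has a unique code.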
139
Boolean domain [Figure: two Karnaugh maps over DTACK DSr and D LDTACK, one for LDS = 0 and one for LDS = 1; one entry is marked 0/1?, and unreachable codes are don't cares (-)]
140
Boolean domain [Figure: the same two Karnaugh maps for LDS; the entry previously marked 0/1? now has a single value and one more code has become a don't care] One more DC vector for all signals. One state conflict is removed.
141
Netlist with one constraint LDS+LDTACK+D+DTACK+DSr-D- DTACK- LDS-LDTACK- DSr+ DTACK D DSr LDS LDTACK csc map
142
Netlist with one constraint LDS+LDTACK+D+DTACK+DSr-D- DTACK- LDS-LDTACK- DSr+ DTACK D DSr LDS LDTACK LDTACK- before DSr+ TIMING CONSTRAINT
143
Timing assumptions (a before b) for concurrent events: concurrency reduction for firing and enabling; (a before b) for ordered events: early enabling; (a simultaneous to b wrt c) for triples of events: combination of the above
144
Ordered events: early enabling a c b a a c b a b b c c F G Logic for gate c may change
145
Adding timing assumptions (II) LDS+LDTACK+D+DTACK+DSr-D- DTACK- LDS-LDTACK- DSr+ DTACK D DSr LDS LDTACK D- before LDS-
146
State space domain LDS- D- Reachable space is unchanged For LDS- enabling can be changed in one state D- before LDS- Potential enabling for LDS- DSr-
147
Boolean domain [Figure: the two Karnaugh maps for LDS (LDS = 0 and LDS = 1) over DTACK DSr and D LDTACK, as obtained after the first timing assumption]
148
Boolean domain [Figure: the same two Karnaugh maps for LDS, with one more entry turned into a don't care] One more DC vector for one signal: LDS. If used: LDS = DSr, otherwise: LDS = DSr + D
149
Before early enabling LDS+LDTACK+D+DTACK+DSr-D- DTACK- LDS-LDTACK- DSr+ DTACK D DSr LDS LDTACK
150
Netlist with two constraints LDS+LDTACK+D+DTACK+DSr-D- DTACK- LDS-LDTACK- DSr+ LDTACK- before DSr+ and D- before LDS- TIMING CONSTRAINTS DTACK D DSr LDS LDTACK Both timing assumptions are used for optimization and become constraints
151
Rule I (out of 6): a,b - non-input events –Untimed ordering: a||b and a enabled before b, but not vice versa –Derived assumption: a fires before b –Justification: delay of a gate can be made shorter than delay of two (or more) gates: del(a) < del(c)+del(b) Deriving automatic timing assumptions aaa b b b c c
152
Rule I (out of 6): a,b - non-input events –Untimed ordering: (a||b) and (a enabled before b), but not vice versa –Derived assumption: a fires before b –Justification: delay of a gate can be made shorter than delay of two (or more) gates Deriving automatic timing assumptions aaa b b b c c –Effect I: a state becomes DC for all signals
153
Rule I (out of 6): a,b - non-input events –Untimed ordering: (a||b) and (a enabled before b), but not vice versa –Derived assumption: a fires before b –Justification: delay of a gate can be made shorter than delay of two (or more) gates Deriving automatic timing assumptions aaa b b b c c –Effect II: another state becomes local DC for signal of event b
154
Backannotation of Timing Constraints Timed circuits require post-verification Can synthesis tools help ? –Report the least stringent set of timing constraints required for the correctness of the circuit –Not all initial timing assumptions may be required Petrify reports a set of constraints for order of firing that guarantee the circuit correctness
155
Timing constraints generation a b c d e d d e e b b c c d a Assumptions: d before b and c before e and a before d
156
Timing constraints generation a b c d e Assumptions: d before b and c before e and a before d d d e e b b c c d a
157
Timing constraints generation a b c d e Assumptions: d before b and c before e and a before d d d e e b b c c Correct behavior d a
158
Timing constraints generation a b c d e Assumptions: d before b and c before e and a before d d d e e b b c c 1 2 Incorrect behavior d a
159
Covering incorrect behavior a b c d e Assumptions: d before b and c before e and a before d d d e e b b c c 1 24 3 {1, 3} d before b {1} d before c d a 5 {2, 4} c before e Other possible constraints remove states from assumption domain => invalid
160
Covering incorrect behavior a b c d e Assumptions: d before b and c before e and a before d d d e e b b c c 1 24 3 {1} d before c d a 5 {2, 4} c before e Constraints for the minimal cost solution: d before c and c before e
161
Timing aware state encoding Solve only state conflicts reachable in the RT assumptions domain Generate automatic timing assumptions for inserted state signals => state signals can be implemented as RT logic State variables inserted concurrently with I/O events => latency and cycle time reduction
162
Value of Relative Timing RT circuits provide up to 2-3x (1.3-2x) delay and area reduction with respect to SI circuits synthesized without (with) concurrency reduction Automatic generation of timing assumptions => foundation for automatic synthesis of RT circuits with area/performance comparable to or better than manual designs Back-annotation of timing constraints => minimal required timing information for the back-end tools Timing-aware state encoding allows significant area/performance optimization
163
Design Flow with Timing: Specification (STG + user assumptions) → [Reachability analysis] → Lazy State Graph → [Timing-aware state encoding] → Lazy SG with CSC → [Boolean minimization] → Next-state functions → [Logic decomposition] → Decomposed functions → [Technology mapping] → Gate netlist. Also shown: Automatic Timing Assumptions; Required Timing Constraints
164
FIFO example FIFO li lo ro ri li- li+ lo+ lo- ro+ ro- ri+ ri-
165
Speed-Independent Implementation without concurrency reduction 3 state signals are required
166
SI implementation with concurrency reduction li lo ro ri x li- li+ lo+ lo- ro+ ro- ri+ ri- x+ x- + gC + -
167
RT implementation li lo ro ri x li- li+ lo+ lo- ro+ ro- ri+ ri- x+ x- OR li- li+ lo+ lo- ro+ ro- ri+ ri- x+ x-
168
RT implementation li lo ro ri x li- li+ lo+ lo- ro+ ro- ri+ ri- x+ x- OR li- li+ lo+ lo- ro+ ro- ri+ ri- x+ x- To satisfy the constraint: Delay(x- ) < Delay (ri+ ) and Delay(lo+) + Delay(x- ) < Delay(ro+ ) + Delay (ri+ ) All constraints are either satisfied by default or easy to satisfy by sizing
169
Introduction to asynchronous circuit design: specification and synthesis Part IV: Synthesis from HDL Other synthesis paradigms
170
Outline Synthesis from standard HDL (Verilog) [L. Lavagno et al Async00] –Subset for asynchronous specification –Data-path/control partitioning –Circuit architecture. Control generation Synthesis from asynchronous HDL (CSP, Tangram) –CSP for control generation [A. Martin et al, Caltech] –Tangram for silicon compilation [K. van Berkel et al, Philips] Control synthesis using FSMs [K. Yun, S. Nowick] –Burst-mode machines –Comparison with STGs Disclaimer: this is NOT a comprehensive review
171
Motivation Language-based design is a key enabler of synchronous logic success Use HDL as single language for –specification –logic simulation and debugging –synthesis –post-layout simulation HDL must support multiple levels of abstraction
172
Control-data partitioning Splitting of asynchronous control and synchronous data path Automated insertion of bundling delays CONTROL UNIT DATA PATH delay request acknowledge
173
Design flow Control/data splitting STG (control) HDL specification Synthesizable HDL (data) Synthesis (petrify) Timing analysis (Synopsys) HDL implementation Synthesis (Synopsys) Logic implementation Delay insertion Logic delays
174
Asynchronous Verilog subset by example
always begin
  wait(start);
  R = SMP * 3;
  RES = SMP * 4 + R;
  if(RES[7] == 1) RES = 0;
  else begin
    if(RES[6] == 1) RES = 1;
  end;
  done = 1;
  wait(!start);
  done = 0;
end
[Figure: data path with SMP, R and RES, and control unit (C.U.) with start and done]
begin-end for sequencing, fork-join for concurrency, if-else for input choice
Only structured mix of sequencing, concurrency and choice can be specified
175
Controller design flow Trace Expressions Circuit Petri Net Transformations Reductions Synthesis HDL Syntax-directed translation
176
Trace expressions: example ( a || ( b ; c) ) || (d e) || ; a bc de
177
Reduction Example a fb c dg h e d; a; ( b || f ) c g; h; e
178
Transformation: concurrency reduction a fb c d ; || a bcdf ; ; Concurrency in TE: b and f have a common parallel father
179
a fb c d f and b are ordered ; || a bcdf ; ; ; Transformation: concurrency reduction
180
Synthesis Place-based encoding ( based on a David-cell approach) Transformations to improve area and performance Structural methods to derive a circuit [Pastor et al.] Transactions on CAD, Nov’98
181
Place-based encoding p1 p2 p3 p4 t1 t2 p3+ p1- p2- p4+ p3- t1 t2 p1+ p2+ p4- 1100 0010 0001 ER(t1) = 111- ER(t2) = --11
182
Synthesis example: VME bus p2+ ldtack+ p8-p11- lds+ p1+ D+ p3+ p1- p2- p4+ dtack+ p3- p5+ dsr- p4- p9+p6+ D-p5- p10+p7+ lds-dtack- p9-p6- p11+ ldtack-p8+ dsr+ p10- p7- LDTACK+ D+ DTACK+ DSr- D- DTACK-LDS- LDTACK-DSr+ LDS+ Place encoding
183
VME bus spec after transforms p2+ ldtack+ p8-p11- lds+ p1+ D+ p3+ p1- p2- p4+ dtack+ p3- p5+ dsr- p4- p9+p6+ D-p5- p10+p7+ lds-dtack- p9-p6- p11+ ldtack-p8+ dsr+ p10- p7- ldtack+ lds+d+ dtack+ dsr-p9+ d- lds-dtack- p9-ldtack- dsr+ Reductions Transforms
184
Deriving Next state function x+ z+ z- y- x- y+ p1 p2 p3 p4 p5 p6 p7 Next-state function of signal y ? 000 1-0 1-1 0-1 -0- -1- 010
185
Deriving Next State function x+ z+ z- y- x- y+ p1 p2 p3 p4 p5 p6 p7 Next-state function of signal y ? 000 1-0 1-1 0-1 10- 11- -11 010 y = x + z
186
Conclusion Initial prototype of automated flow without state explosion for ASIC design –From HDLs (control / data splitting) –Existing tools for data-path synthesis –Direct synthesis guarantees implementation (HDL Petri net, Petri-net-based encoding) –Synthesis of large controllers by efficient spec models (Free-choice Petri nets + trace expressions) –Exploration of the design space (optimization) by property-preserving transformations –Logic synthesis by structural methods Quality of design often acceptable Timing post-optimization can be applied
187
Synthesis from asynchronous HDL CSP based languages CSP = communicating sequential processes [T. Hoare] Two synthesis techniques –based on program transformations [Caltech] –based on direct compilation [Philips] Tools are more mature than for asynchronous synthesis from standard HDL Complete shift in design methodology is required
188
Using CSP for control generation After li goes high do full handshake at the right, then complete handshake at the left and iterate. [Figure: Q element with ports li, lo, ro, ri] STG: li+ ro+ ri+ ro- ri- lo+ li- lo- CSP: *[[li];ro+;[ri];ro-;[not ri];lo+;[not li];lo-] “;” = sequencing operator ro+ = ro goes high; ro- = ro goes low [li] = wait until li is high; [not li] = wait until li is low
189
Using CSP for control generation CSP: *[[li];ro+;[ri];ro-;[not ri];lo+;[not li];lo-] Production rules: li -> ro+; ri -> ro-; not ri -> lo+; not li -> lo- Conflict: ro+ and ro- are not mutually exclusive (since ri+ and li+ are not) Eliminate conflict by state signal insertion (= CSC) [Figure: gate for ro with inputs ri and li and a weak feedback]
190
Conflict elimination CSP: *[[li];ro+;[ri];x+;[x];ro-;[not ri];lo+;[not li];x-;[not x];lo-] Production rules: not x and li -> ro+; x or not li -> ro-; x and not ri -> lo+; not x or ri -> lo-; ri -> x+; not li -> x- [Figure: circuit with state-holding element (FF) producing x and not x, and signals li, lo, ri, ro]
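The production rules above can be exercised with a small event-driven simulation. The circuit guards are taken from the slide; the environment guards (li+ when lo is low, li- when lo is high, ri+ when ro is high, ri- when ro is low) are an assumption modelling ordinary four-phase handshake partners, and all signals are assumed to start low.

```python
def enabled_events(s):
    """Events whose guard holds and whose firing would change the signal."""
    li, ri, ro, lo, x = (s[k] for k in ("li", "ri", "ro", "lo", "x"))
    rules = {
        # circuit production rules from the slide
        "ro+": (not x) and li, "ro-": x or not li,
        "lo+": x and not ri,   "lo-": (not x) or ri,
        "x+": ri,              "x-": not li,
        # environment (assumed four-phase handshake partners)
        "li+": not lo, "li-": lo,
        "ri+": ro,     "ri-": not ro,
    }
    return sorted(e for e, guard in rules.items()
                  if guard and s[e[:-1]] != (e[-1] == "+"))

def simulate(steps):
    """Fire one enabled event per step, starting with all signals low."""
    s = dict.fromkeys(("li", "ri", "ro", "lo", "x"), False)
    trace = []
    for _ in range(steps):
        events = enabled_events(s)
        if not events:
            break
        e = events[0]
        s[e[:-1]] = e[-1] == "+"
        trace.append(e)
    return trace
```

Under these assumptions exactly one event is enabled at each step, and ten steps reproduce the order of the handshake expansion: li+ ro+ ri+ x+ ro- ri- lo+ li- x- lo-.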
191
Conclusions Generating circuits from CSP control program is similar to STG synthesis One can be reduced to the other Particular technique may vary. Direct CSP program transformations can be (and were) used instead of methods based on state space generation See reference list for more details
192
Buffer example in Tangram (a?byte & b!byte) begin x0: var byte | forever do a?x0 ; b!x0 od end Buffer * x a b T ; T a b passive port active port Each circle mapped to a netlist Data path Q element
193
Summary Tangram program is partitioned into data path and control Data path is implemented as dual or single rail Control is mapped to composition of standard elements (“;” “||” etc) Each standard element is mapped to a circuit Post-optimization is done Composing islands of control elements and re-synthesis with STG can give more aggressive optimization Philips made a few chips using Tangram, including a product: 8051 micro-controller in low-power pager Myna (25 weeks of battery life from one AAA battery) Similar approach used in Balsa (Manchester Univ., public domain)
194
Burst mode FSM s1 s2 s3 s4 b-/x- a+b+/y+ a-/x+y- c+/y- c-/y+ Close to synchronous FSMs with binary encoded I/O Work in bursts: –Input transitions fire –Output transitions fire –State signals change Mostly limited to fundamental mode: next input burst cannot arrive before stabilization at the outputs
195
Extended Burst mode s1 s2 s3 s4 b-/x- a+b*/y+ a-/x+y- c+/y- c-/y+ Directed don’t cares (b*): some concurrency is allowed for input transitions that do not influence an output burst Conditional guards = “if b=1 then …”
196
Synthesis of XBM Next state and output functions free of functional and logic hazards Sequential feedbacks should not introduce new hazards State assignment –one state of the BM spec to one layer of Karnaugh map –compatible layers are merged –layers are compatible if merging does not introduce CSC violations or hazards –Layers are encoded using race free encoding
197
XBM and STG s1 s2 s3 s4 b-/x- a+b*/y+ a-/x+y- c+/y- c-/y+ x- a+ y+ b+ eps c- a- c+ y- y+ x+ y- b-
198
Summary Specification: XBM is subclass of STGs Synthesis: techniques are extensions of synchronous state assignment and logic minimization Timing: –environment is limited to fundamental mode (difficult for pipelined and highly concurrent systems) –internals are delay insensitive See reference list for details
199
Summary Specification: Signal Transition Graph (formalized timing diagram) Synthesis: –state encoding –Boolean function derivation –algebraic and Boolean sequential decomposition –technology mapping Timing: –delay model implies timing constraints –exploiting timing assumptions leads to minimization and generates further assumptions Future work: –integrated flow –testing