Asynchronous Interface Specification, Analysis and Synthesis J. Cortadella Universitat Politècnica de Catalunya, Barcelona M. Kishinevsky Intel Corporation Thanks to: Alex Kondratyev (The University of Aizu)
What is it about? This tutorial is not about asynchronous signals coming into a synchronous circuit, hence not about synchronization problem It is about asynchronous circuits communicating via handshakes or timing assumptions Note: slides contain no references. All references can be found in the paper
Motivation Interfaces are often asynchronous Subsystems with different clocks often want to talk to each other Self timing provides functional and temporal modularity … and no clock skew, low power, low EMI, average performance, ...
Design flow Specification (STG) State Graph SG with CSC Next-state functions Decomposed functions Gate netlist Reachability analysis State encoding Boolean minimization Logic decomposition Technology mapping Design flow
x x y y z z z+ x- x+ y+ z- y- Signal Transition Graph (STG)
x y z x+ x- y+ y- z+ z-
xyz 000 100 101 110 111 x+ x- y+ y- z+ z- 001 011 010 x+ z+ y+ x- y-
Synchronous xyz 000 100 101 110 111 001 011 010 Asynchronous x+ z+ y+ Current state Next state Asynchronous Current state Next state
Next-state functions xyz 000 100 101 110 111 001 011 010 x+ z+ y+ x-
Next-state functions x z y
Design flow Specification (STG) State Graph SG with CSC Reachability analysis State Graph State encoding SG with CSC Design flow Boolean minimization Next-state functions Logic decomposition Decomposed functions Technology mapping Gate netlist
VME bus Device Read Cycle Bus DSr LDS LDTACK D DTACK LDS LDTACK D DSr DSw DTACK VME Bus Controller Data Transceiver Bus
STG for the READ cycle DSr+ DTACK- LDS+ LDTACK+ D+ DTACK+ DSr- D- VME Bus Controller LDTACK DTACK
Choice: Read and Write cycles DSr+ LDS+ LDTACK+ D+ DTACK+ DSr- D- LDS- LDTACK- DTACK- DSw+ D+ LDS+ LDTACK+ D- DTACK+ DSw- LDS- LDTACK- DTACK-
Choice: Read and Write cycles DTACK- DSr+ LDS+ LDTACK+ D+ DTACK+ DSr- D- LDS- LDTACK- DSw+ DSw-
Choice: Read and Write cycles DTACK- DSr+ LDS+ LDTACK+ D+ DTACK+ DSr- D- LDS- LDTACK- DSw+ DSw-
Choice: Read and Write cycles DTACK- DSr+ LDS+ LDTACK+ D+ DTACK+ DSr- D- LDS- LDTACK- DSw+ DSw-
Circuit synthesis Goal: Derive a hazard-free circuit under a given delay model and mode of operation
Modes of operation Fundamental mode Input / Output mode Single-input changes Multiple-input changes Input / Output mode Concurrency circuit / environment Current state Next state
STG for the READ cycle DSr+ DTACK- LDS+ LDTACK+ D+ DTACK+ DSr- D- VME Bus Controller LDTACK DTACK
Speed independence Delay model Conditions for implementability: Unbounded gate / environment delays Certain wire delays shorter than certain paths in the circuit Conditions for implementability: Consistency Complete State Coding Output persistency
Other synthesis approaches Burst-mode machines Mealy-like FSMs Fundamental mode (slow environment) VLSI programming Syntax-directed translation from CSP (“Communicating Sequential Processes”) No logic synthesis Circuit size ~ Size of the specification
Design flow Specification (STG) State Graph SG with CSC Reachability analysis State Graph State encoding SG with CSC Design flow Boolean minimization Next-state functions Logic decomposition Decomposed functions Technology mapping Gate netlist
State Graph (Read cycle) DSr+ DTACK- LDS+ LDTACK- LDTACK- LDTACK- DSr+ DTACK- LDTACK+ LDS- LDS- LDS- DSr+ DTACK- D+ D- DTACK+ DSr-
Binary encoding of signals LDS = 0 LDS = 1 LDS - LDS + DSr+ DTACK- LDS+ LDTACK- LDTACK- LDTACK- DSr+ DTACK- LDTACK+ LDS- LDS- LDS- DSr+ DTACK- D+ D- DTACK+ DSr-
Binary encoding of signals 10000 DSr+ DTACK- LDS+ LDTACK- LDTACK- LDTACK- 10010 DSr+ DTACK- 01100 00110 LDTACK+ LDS- LDS- LDS- DSr+ DTACK- 10110 01110 10110 D+ D- DTACK+ DSr- (DSr , DTACK , LDTACK , LDS , D)
Excitation / Quiescent Regions ER (LDS+) ER (LDS-) QR (LDS+) QR (LDS-) LDS- LDS+
Next-state function 0 1 LDS- LDS+ 0 0 1 1 1 0 10110
Karnaugh map for LDS - 1 - - - 1 - - - - - - - - - - - - - 1 1 1 - - DTACK DSr D LDTACK 00 01 11 10 DTACK DSr D LDTACK 00 01 11 10 - 1 - - - 1 - - - - - - - - - - - - - 1 1 1 - - 0/1?
Design flow Specification (STG) State Graph SG with CSC Reachability analysis State Graph State encoding SG with CSC Design flow Boolean minimization Next-state functions Logic decomposition Decomposed functions Technology mapping Gate netlist
Concurrency reduction DSr+ LDS+ LDS- LDS- LDS- 10110 10110
Concurrency reduction DSr+ DTACK- LDS+ LDTACK+ D+ DTACK+ DSr- D- LDTACK- LDS-
State encoding conflicts LDS+ LDTACK- LDTACK+ LDS- 10110 10110
Signal Insertion 101101 101100 CSC- CSC+ LDS+ LDTACK- LDTACK+ LDS- D- DSr-
Design flow Specification (STG) State Graph SG with CSC Reachability analysis State Graph State encoding SG with CSC Design flow Boolean minimization Next-state functions Logic decomposition Decomposed functions Technology mapping Gate netlist
Complex-gate implementation
Design flow Specification (STG) State Graph SG with CSC Reachability analysis State Graph State encoding SG with CSC Design flow Boolean minimization Next-state functions Logic decomposition Decomposed functions Technology mapping Gate netlist
Hazards abcx 1000 1100 b+ 1 1 1 1 a b c x 0100 a- 0110 c+
Hazards abcx 1000 1100 b+ 0100 a- 0110 c+ 1000 1 1 1 1 1 1 1 1 a b z c 1 1 1 1 1 1 1 1 a b z c x 1100 1100 0100 0110
Decomposition Global acknowledgement Generating candidates Hazard-free signal insertion Event insertion Signal insertion
Global acknowledgement y+ a- y- c+ c- z- b- z+ a+ a b c z d y
How about 2-input gates ? c z b a y d d- b+ d+ y+ a- y- c+ c- z- b- z+
How about 2-input gates ? c z b a a y b d d- b+ d+ y+ a- y- c+ c- z-
How about 2-input gates ? c z b a a y b d d- b+ d+ y+ a- y- c+ c- z- c z b a a y b d
How about 2-input gates ? c z b a a y b d d- b+ d+ y+ a- y- c+ c- z-
How about 2-input gates ? c z a b y d d- b+ d+ y+ a- y- c+ c- z- b- z+
Strategy for correct logic decomposition Each decomposition defines a new internal signal of the circuit Method: Insert new internal signals such that After resynthesis, some large gates are decomposed The new specification is hazard-free under unbounded gate delays
- Algebraic factorization Decomposition -Boolean relations - Algebraic factorization C Sr D F C C D until no more progress Sr Sr Hazard-free ? (Signal insertion) C D C NO YES
Decomposition example 1001 1011 1000 1010 0001 0000 0101 0010 0100 0110 0111 0011 y- y+ x- x+ w+ w- z+ z- y- z- w- y+ x+ z+ x- w+
C x y w z 1001 1011 1000 1010 0001 0000 0101 0010 0100 0110 0111 0011 y- y+ x- x+ w+ w- z+ z- 1001 1011 1000 1010 0001 0000 0101 0010 0100 0110 0111 0011 y- y+ x- x+ w+ w- z+ z- yz=1 yz=0
s C x y w z y- s- s+ s=1 s=0 1001 1011 1000 1010 0111 0011 y+ x- w+ z+ 0001 0000 0101 0010 0100 0110 x+ w-
s- s+ s- s+ s- s+ s- s+ s- s+ s- s+ s- s+ s- s+ y- z- w- y+ x+ z+ x- 1001 1011 1000 1010 0111 0011 y+ x- w+ z+ z- 0001 0000 0101 0010 0100 0110 x+ w- s- s- s+ s- s+ s- s+ s- s+ s- s+ s- s+ s- s+ s- s+ s- s- s+ s=0
C x y w z 1001 1011 1000 1010 0001 0000 0101 0010 0100 0110 0111 0011 y- y+ x- x+ w+ w- z+ z- yz=1 yz=0
s- s+ s- s+ s- s+ s- s+ s- s+ s- s+ s- s+ s- s+ y- z- w- w+ y+ x+ x- 1001 1011 s- s- s+ s- s+ s- s+ s- s+ s- s+ s- s+ s- s+ s- s+ 1001 w+ z- 0001 0000 0101 0010 0100 0110 x+ w- z- y+ 0011 1000 z- w- w+ y+ x- 1010 y+ x+ x- 0111 s+ s=0 z+ z+ 0111 z- is delayed by the new transition s- !
C x y w z 1001 1011 1000 1010 0001 0000 0101 0010 0100 0110 0111 0011 y- y+ x- x+ w+ w- z+ z- 1001 1011 1000 1010 0001 0000 0101 0010 0100 0110 0111 0011 y- y+ x- x+ w+ w- z+ z- yz=1 yz=0 y y y y y y y
Design flow Specification (STG) State Graph SG with CSC Reachability analysis State Graph State encoding SG with CSC Design flow Boolean minimization Next-state functions Logic decomposition Decomposed functions Technology mapping Gate netlist
Technology mapping BDD-based Boolean matching Handles sequential gates and combinational feedbacks Merging small gates into larger gates introduces no new hazards
Timing optimization Specification (STG) State Graph SG with CSC Reachability analysis State Graph State encoding SG with CSC Timing optimization Boolean minimization Next-state functions Logic decomposition Decomposed functions Technology mapping Gate netlist
Timing optimization If exact timing bounds are unknown, use relative timing assumptions “a+ before b+” a+ b+
Timing assumptions For concurrent events For ordered events concurrency reduction (some states become unreachable) For ordered events early enabling (but without introducing new reachable states)
READ control in 2-input gates DSr+ DTACK- LDS+ LDTACK+ D+ DTACK+ DSr- D- LDTACK- LDS- D DTACK LDS map csc DSr LDTACK
Adding timing assumptions (I) LDS+ LDTACK+ D+ DTACK+ DSr- D- DTACK- LDS- LDTACK- DSr+ LDTACK- before DSr+ D DTACK FAST SLOW LDS map csc DSr LDTACK
Adding timing assumption (I) LDS+ LDTACK+ D+ DTACK+ DSr- D- DTACK- LDS- LDTACK- DSr+ DTACK D DSr LDS LDTACK LDTACK- before DSr+ TIMING CONSTRAINT
Timing optimization for ordered events Early enabling: b a c b a b c c a b c
Adding timing assumptions (II) DSr+ DTACK- LDS+ LDTACK+ D+ DTACK+ DSr- D- D- before LDS- LDTACK- LDS- D DTACK LDS DSr LDTACK
Adding timing assumptions (II) DSr+ DTACK- LDS+ LDTACK+ D+ DTACK+ DSr- D- LDTACK- LDS- DTACK D DSr LDS LDTACK LDTACK- before DSr+ and D- before LDS- TIMING CONSTRAINT
Formal verification Property verification Implementation verification Implementability properties Consistency, persistency, state coding … Behavioral properties (safeness, liveness) Mutual exclusion, “ack” after “req”, ... Implementation verification Circuit Specification Circuit < Specification
Fighting the state explosion FSM-like verification methods: Symbolic methods (BDDs) Partial order reductions Petri-net based methods: Petri net unfoldings Structural theory (invariants)
Testing of asynchronous circuits Can be more expensive than for synchronous Scan-based and IDDQ techniques can be used for stuck-at, delay and bridging faults There are ATPG techniques
Summary Asynchronous design is applicable to asynchronous interfaces high-performance computing low-power design low-emission design There is an increasing interest of a few companies: Intel, Philips, Sun, Sharp, ARM, HP, Cogency
Summary (continued) Asynchronous circuits are more difficult to design than synchronous Clock distribution and on-die variations makes synchronous design more difficult CAD support is crucial CAD tools have matured All synthesis steps of the design flow presented in this tutorial are supported by the tool Petrify