Introduction to asynchronous circuit design: specification and synthesis Jordi Cortadella, Universitat Politècnica de Catalunya, Spain Michael Kishinevsky, Intel Corporation, USA Alex Kondratyev, Theseus Logic, USA Luciano Lavagno, Università di Udine, Italy
Outline I: Introduction to basic concepts on asynchronous design II: Synthesis of control circuits from STGs III: Advanced topics on synthesis of control circuits from STGs IV: Synthesis from HDL and other synthesis paradigms Note: no references in the tutorial
Part I: Introduction to basic concepts on asynchronous circuit design
Outline: What is an asynchronous circuit? Asynchronous communication. Asynchronous logic blocks. Micropipelines. Control specification and implementation. Delay models. Why asynchronous circuits?
Synchronous circuit (figure: registers R separated by combinational logic CL, with a common clock CLK). Implicit synchronization.
Asynchronous circuit (figure: registers R separated by combinational logic CL, with Req and Ack signals). Explicit synchronization: Req/Ack handshakes.
Synchronous communication: Clock edges determine the time instants at which data must be sampled. Data wires may glitch between clock edges (set-up/hold times must be satisfied). Data are transmitted at a fixed rate (clock frequency).
Dual rail: Two wires per bit: “00” = spacer, “01” = 0, “10” = 1. n-bit data communication requires 2n wires. Each bit is self-timed. Other delay-insensitive codes exist.
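The dual-rail code listed on this slide can be sketched in Python. The function names and the representation of a code word as a pair (first wire, second wire) are assumptions made for illustration; the slide only fixes the three code words. The word “11” is not listed on the slide and is treated here as illegal.

```python
SPACER = (0, 0)  # "00": no data on the wire pair


def encode_bit(value):
    """Return the dual-rail code word for a bit: 1 -> "10", 0 -> "01"."""
    if value not in (0, 1):
        raise ValueError("value must be 0 or 1")
    return (1, 0) if value == 1 else (0, 1)


def decode_pair(pair):
    """Return 0 or 1 for a valid code word, or None for the spacer."""
    if pair == SPACER:
        return None
    if pair == (0, 1):
        return 0
    if pair == (1, 0):
        return 1
    # Assumption: "11" is not part of the code and is rejected.
    raise ValueError("'11' is not a dual-rail code word")


def encode_word(bits):
    """Encode n bits as n wire pairs, i.e. 2n wires in total."""
    return [encode_bit(b) for b in bits]


word = encode_word([1, 0, 1, 1])
wire_count = sum(len(pair) for pair in word)  # 2n wires for n bits
```

Between two data words every pair returns to the spacer, which is what lets the receiver tell consecutive equal values apart.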
Bundled data: Validity signal, similar to an aperiodic local clock. n-bit data communication requires n+1 wires. Data wires may glitch when not valid. Signaling protocols: level sensitive (latch); transition sensitive (register): 2-phase / 4-phase.
Example: memory read cycle (figure: waveforms of Address, A, Data and D, with the valid-address and valid-data intervals marked). Transition signaling, 4-phase.
Example: memory read cycle (figure: waveforms of Address, A, Data and D, with the valid-address and valid-data intervals marked). Transition signaling, 2-phase.
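The difference between the two transition-signaling protocols can be sketched as event sequences on the wires A and D. This is a minimal illustration, assuming that A announces the valid address and D answers with valid data, and that the events strictly alternate; the exact ordering in the slide's waveforms is not recoverable from the text. The function names are hypothetical.

```python
def four_phase(transfers):
    """4-phase: every transfer raises both wires and returns them to zero."""
    return ["A+", "D+", "A-", "D-"] * transfers


def two_phase(transfers):
    """2-phase: every transition is an event, with no return-to-zero phase."""
    events = []
    level = 0
    for _ in range(transfers):
        level ^= 1  # each transfer toggles both wires once
        sign = "+" if level else "-"
        events += ["A" + sign, "D" + sign]
    return events


events_4p = four_phase(2)  # 8 transitions for two read cycles
events_2p = two_phase(2)   # 4 transitions for two read cycles
```

Under these assumptions, two 2-phase read cycles produce the same wire activity as a single 4-phase cycle, at the cost of logic that reacts to both rising and falling transitions.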
Asynchronous modules (figure: DATA PATH with Data IN, Data OUT, start and done; CONTROL with req in, ack in, req out and ack out). Signaling protocol: reqin+ start+ [computation] done+ reqout+ ackout+ ackin+ reqin- start- [reset] done- reqout- ackout- ackin-. More concurrency is also possible, e.g. by overlapping the return-to-zero phase of step i-1 with the evaluation phase of step i.
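The signaling protocol of this slide can be replayed as a list of events applied to the six handshake wires. The sketch below assumes all wires start at 0 and treats [computation] and [reset] as untimed phases that produce no event of their own; `apply_event` and `run_cycle` are hypothetical helper names.

```python
PROTOCOL = [
    "reqin+", "start+", "done+", "reqout+", "ackout+", "ackin+",  # evaluation
    "reqin-", "start-", "done-", "reqout-", "ackout-", "ackin-",  # return to zero
]
SIGNALS = ("reqin", "start", "done", "reqout", "ackout", "ackin")


def apply_event(state, event):
    """Apply one transition such as 'reqin+' and reject non-transitions."""
    name, sign = event[:-1], event[-1]
    new_value = 1 if sign == "+" else 0
    if state[name] == new_value:
        raise ValueError(event + " does not change " + name)
    state[name] = new_value


def run_cycle():
    """Replay one full handshake cycle and record the state after each event."""
    state = {name: 0 for name in SIGNALS}
    history = []
    for event in PROTOCOL:
        apply_event(state, event)
        history.append((event, dict(state)))
    return state, history


final_state, history = run_cycle()
```

After the sixth event every wire is high, and after the twelfth every wire is low again, so the module is ready for the next cycle.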
Asynchronous latches: C element (figure: transistor-level and gate-level implementations between Vdd and Gnd, inputs A and B, output Z)
A B | Z+
0 0 | 0
0 1 | Z
1 0 | Z
1 1 | 1
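The C element's next-state table can be written as a one-line function. This is a behavioral sketch only (no delays, no transistor detail); the name `c_element` is an assumption.

```python
def c_element(a, b, z):
    """Next output Z+: follow the inputs when they agree, otherwise hold Z."""
    return a if a == b else z


# Replay an input sequence and record the output after each step.
z = 0
trace = []
for a, b in [(1, 0), (1, 1), (0, 1), (0, 0)]:
    z = c_element(a, b, z)
    trace.append(z)
```

The output rises only after both inputs have risen and falls only after both have fallen, which is why the element is used to join handshake signals.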
Dual-rail logic: Dual-rail AND gate (figure: inputs A.t, A.f, B.t, B.f; outputs C.t, C.f). Valid behavior for monotonic environment.
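A dual-rail AND gate can be sketched with one AND and one OR gate. The equations C.t = A.t AND B.t and C.f = A.f OR B.f are a common realization and are an assumption here, since the slide's schematic is not recoverable from the text. Code words are pairs (t, f), with (0, 0) as the spacer.

```python
SPACER = (0, 0)
TRUE = (1, 0)   # t rail high
FALSE = (0, 1)  # f rail high


def dual_rail_and(a, b):
    """Assumed realization: C.t = A.t AND B.t, C.f = A.f OR B.f."""
    a_t, a_f = a
    b_t, b_f = b
    return (a_t & b_t, a_f | b_f)


# Exercise the gate on all valid input combinations.
results = {(a, b): dual_rail_and(a, b) for a in (TRUE, FALSE) for b in (TRUE, FALSE)}
```

In this realization a single FALSE input already produces a FALSE output while the other input is still at the spacer. The output rails never glitch as long as the inputs move monotonically from spacer to code word and back, which is the condition the slide names.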
Completion detection: Completion detection tree (figure: tree ending in a C element that produces done).
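Completion detection for a dual-rail word can be sketched as follows. The sketch assumes that each bit is valid when either of its rails is high, and it abstracts the tree as a single n-input C element: done rises when all bits are valid, falls when all bits are at the spacer, and holds otherwise. The function names are hypothetical.

```python
def bit_valid(pair):
    """A dual-rail bit carries data when either rail is high."""
    t, f = pair
    return t | f


def completion(word, prev_done):
    """n-input C element over the per-bit validity signals (an abstraction of the tree)."""
    valids = [bit_valid(pair) for pair in word]
    if all(v == 1 for v in valids):
        return 1  # every bit has arrived
    if all(v == 0 for v in valids):
        return 0  # every bit is back at the spacer
    return prev_done  # partial word: hold the previous value


done = 0
done_trace = []
for word in [[(1, 0), (0, 0)], [(1, 0), (0, 1)], [(0, 0), (0, 1)], [(0, 0), (0, 0)]]:
    done = completion(word, done)
    done_trace.append(done)
```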
Differential cascode voltage switch logic: 3-input AND/NAND gate (figure: dual-rail inputs A.t, A.f, B.t, B.f, C.t, C.f; dual-rail outputs Z.t, Z.f; start and done signals).
Bundled-data logic blocks: Conventional logic + matched delay (figure: logic block with a delay element between start and done).
Micropipelines (Sutherland 89) (figure: latches L alternating with logic blocks; a control chain of C elements and delay elements; handshake signals Rin, Ain, Rout, Aout).
Data-path / Control (figure: data path of latches L alternating with logic blocks; a CONTROL block with handshake signals Rin, Ain, Rout, Aout).
Control specification (figure: block with A as input and B as output; transitions shown: B+, A-, B-).
Control specification (figure: block with signals A and B; transitions shown: B+, A-, B-).
Control specification (figure: block with signals A and B; transitions shown: B-, A-, B+).
Control specification (figure: blocks with signals A, B and C; transitions shown: B+, C+, A-, B-, C-).
Control specification (figure: blocks with signals A, B and C; transitions shown: B+, C+, A-, B-, C-).
Control specification: FIFO cntrl with signals Ri, Ao, Ro, Ai (figure: transitions Ri+, Ao+, Ri-, Ao-, Ro+, Ai+, Ro-, Ai-; implementation with a C element).
A simple filter: specification (figure: block "filter" with input IN, Rin, Ain and output OUT, Rout, Aout)
y := 0;
loop
  x := READ (IN);
  WRITE (OUT, (x+y)/2);
  y := x;
end loop
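The data behavior of this specification, without any handshaking, can be sketched in Python. The sketch assumes a finite list of input samples in place of the endless loop and uses real division for (x+y)/2; the slide does not say whether the division is integer or real. The name `averaging_filter` is hypothetical.

```python
def averaging_filter(samples):
    """Emit the average of each sample and the previous one (y starts at 0)."""
    y = 0
    outputs = []
    for x in samples:               # x := READ (IN)
        outputs.append((x + y) / 2)  # WRITE (OUT, (x+y)/2)
        y = x                        # y := x
    return outputs


outputs = averaging_filter([2, 4, 6])
```

For the inputs 2, 4, 6 the outputs are 1.0, 3.0, 5.0: the first output averages the sample with the initial value y = 0.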
A simple filter: block diagram (figure: latches x and y, adder +, and a control block with handshake pairs Rin/Ain, Rout/Aout, Rx/Ax, Ry/Ay, Ra/Aa; data ports IN and OUT). x and y are level-sensitive latches (transparent when R=1). + is a bundled-data adder (matched delay between Ra and Aa). Rin indicates the validity of IN. After Ain+ the environment is allowed to change IN. (Rout,Aout) control a level-sensitive latch at the output.
A simple filter: control spec. (figure: block diagram of x, y, + and control, with a signal transition graph over the handshakes Rin+ Ain+ Rin- Ain-, Rx+ Ax+ Rx- Ax-, Ry+ Ay+ Ry- Ay-, Ra+ Aa+ Ra- Aa-, Rout+ Aout+ Rout- Aout-).
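A simple filter: control impl. (figure: control circuit with signals Rin, Ain, Rx, Ax, Ry, Ay, Ra, Aa, Rout, Aout, shown next to the signal transition graph over the handshakes Rin+ Ain+ Rin- Ain-, Rx+ Ax+ Rx- Ax-, Ry+ Ay+ Ry- Ay-, Ra+ Aa+ Ra- Aa-, Rout+ Aout+ Rout- Aout-).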
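Control: observable behavior (figure: control circuit with signals Rin, Ain, Rx, Ax, Ry, Ay, Ra, Aa, Rout, Aout and z; signal transition graph over the transitions Rin+, Rin-, Ain+, Ain-, Rx+, Rx-, Ax+, Ax-, Ry+, Ry-, Ay+, Ay-, Ra+, Ra-, Aa+, Aa-, Rout+, Rout-, Aout+, Aout-, z+, z-).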
Taking delays into account (figure: specification over x+ x- y+ y- z+ z-; circuit with signals x, y, z, x’, z’)
Delay assumptions: Environment: 3 time units. Gates: 1 time unit.
events: x+  x’-  y+  z+  z’-  x-  x’+  z-  z’+  y-
time:   3   4    5   6   7    9   10   12  13   14
Taking delays into account (figure: specification over x+ x- y+ y- z+ z-; circuit with signals x, y, z, x’, z’, one gate marked "very slow")
Delay assumptions: unbounded delays.
events: x+  x’-  y+  z+  x-  x’+  y-  failure!
time:   3   4    5   6   9   10   11
Gate vs wire delay models: Gate delay model: delays in gates, no delays in wires. Wire delay model: delays in gates and wires.
Delay models for async. circuits
Bounded delays (BD): realistic for gates and wires. Technology mapping is easy, verification is difficult.
Speed independent (SI): Unbounded (pessimistic) delays for gates and “negligible” (optimistic) delays for wires. Technology mapping is more difficult, verification is easy.
Delay insensitive (DI): Unbounded (pessimistic) delays for gates and wires. DI class (built out of basic gates) is almost empty.
Quasi-delay insensitive (QDI): Delay insensitive except for critical wire forks (isochronic forks). Formally, it is the same as speed independent. In practice, different synthesis strategies are used.
(figure: diagram relating the classes BD, DI, SI, QDI)
Motivation (designer’s view): Modularity: plug-and-play interconnectivity. Reusability: IPs with abstract timing behaviors. High performance: average-case performance (no worst-case delay synchronization); no clock skew (local timing assumptions). Many interfaces are asynchronous: buses, networks, ...
Motivation (technology aspects): Low power: automatic clock gating. Electromagnetic compatibility: no peak currents around clock edges. Robustness: high immunity to technology and environment variations (in-die variations, temperature, power supply, ...).
Dissuasion: Concurrent models for specification: CSP, Petri nets, ...: no more FSMs. Difficult to design: hazards, synchronization. Complex timing analysis: difficult to estimate performance. Difficult to test: no way to stop the clock.
But ... some success stories: Philips. AMULET microprocessors. Sharp. Intel (RAPPID). IBM (interlocked pipeline). Start-up companies: Theseus Logic, Cogency, ...