Asynchronous Circuit Compilation
Dr. Doug Edwards
Southampton: Oct 99
Overview
• Asynchronous circuits
• Advantages
• Asynchronous design paradigms
• Syntax-directed compilation: handshake circuits
• Balsa
• Datapath compilation
• Design example: DMA controller

Asynchronous (Self-Timed) Basics
• Synchronous circuits: a global clock separates system states
  – a time-domain view of system activity
• Asynchronous circuits: input changes separate system states
  – a sequence (trace) domain view of system activity

Why Asynchronous?
• Low power
  – data-driven: power is used only to do useful work
  – zero dynamic power when idle, with instant restart
• Low EMI
  – in a clocked circuit, all switching noise is correlated with the clock
  – asynchronous circuits have "distributed" switching activity, leading to uncorrelated EMI

Why Asynchronous? (cont.)
• No clock distribution problems
• Composability/modularity facilitates IP reuse
• Average-case performance
  – exploits the fact that the worst case often occurs infrequently

Timing Models
• Delay insensitive (DI): delays in gates and wires are arbitrary
• Quasi-delay insensitive (QDI): as DI, but assuming isochronic forks
• Speed independent (SI): wires have no delay, gate delays are arbitrary
• Bounded delay: single-sided timing constraints

Asynchronous Design Paradigms
• AFSMs (asynchronous finite state machines): for fast controllers etc.
  – traditionally hard: hazards, races, state assignment problems
  – research has led to new techniques: STG/Petri-net-based SI circuits, burst-mode circuits
• Macromodule-like approaches for larger systems
  – micropipeline approach, handshake circuits

Asynchronous Control
• With no clock, some other means is required to coordinate control flow
• Use a request/acknowledge handshake
[Figure: Sender with Req and Ack handshake wires]

Signalling Protocols
• req and ack are abstractions: a signalling protocol is layered on top of them
• Two common protocols:
  – 2-phase (transition signalling, NRZ)
  – 4-phase (return-to-zero signalling)

Data Validity Models
• Self-timed: the validity of the data is encoded within the data itself
  – redundant coding, e.g. dual rail: each data bit requires two wires
  – 00 -> no data, 01 -> '0', 10 -> '1' (11 is unused)
• Bundled data: conventional datapath
  – validity is assured by imposing timing constraints
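The dual-rail code above can be sketched in a few lines of Python. This is an illustrative model only. It assumes each bit is a wire pair written (t, f), so the pair strings match the slide: 00 is no data (the spacer), 01 is '0' and 10 is '1'. A word is complete only when every bit holds a valid code, which is the completion detection that bundled data avoids.

```python
from typing import List, Optional, Tuple

Rail = Tuple[int, int]          # (t, f) wires for one bit
SPACER: Rail = (0, 0)           # "no data"

def encode(value: int, width: int) -> List[Rail]:
    """Encode an unsigned integer as dual-rail pairs, least significant bit first."""
    return [(1, 0) if (value >> i) & 1 else (0, 1) for i in range(width)]

def bit_valid(pair: Rail) -> bool:
    return pair in ((0, 1), (1, 0))

def complete(word: List[Rail]) -> bool:
    """Completion detection: every bit must carry a valid code."""
    return all(bit_valid(p) for p in word)

def decode(word: List[Rail]) -> Optional[int]:
    """Return the value once the word is complete, None while any bit is still spacer."""
    if any(p == (1, 1) for p in word):
        raise ValueError("11 is not a legal dual-rail code")
    if not complete(word):
        return None
    return sum(1 << i for i, p in enumerate(word) if p == (1, 0))
```

The receiver cannot use a partially arrived word. With `encode(5, 4)`, setting any one bit back to `SPACER` makes `decode` return `None` until that bit arrives.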
2-phase Protocol
• Events are transitions: one transition on Req and one on Ack make up a transaction, whichever direction the edges go
[Figure: Req/Ack timing diagram showing successive transactions, with data valid during each]

4-phase Protocol
• Signals are returned to their initial state after each transaction
• Several interleavings of the signal transitions are possible (the simplest is Req↑, Ack↑, Req↓, Ack↓)
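As a rough comparison, the sketch below lists the wire events for n transactions under each protocol, using the simplest 4-phase interleaving. It is a model of the event traces only, not of any circuit. A 2-phase transaction needs half as many transitions, but its circuits must respond to edges in both directions. This is one reason for the "think 2-phase, build 4-phase" advice on the next slide.

```python
from typing import Dict, List, Tuple

Event = Tuple[str, int]  # (wire, new level)

def two_phase(transactions: int) -> List[Event]:
    """Transition signalling: one req edge and one ack edge per transaction."""
    req, ack = 0, 0
    trace: List[Event] = []
    for _ in range(transactions):
        req ^= 1
        trace.append(("req", req))
        ack ^= 1
        trace.append(("ack", ack))
    return trace

def four_phase(transactions: int) -> List[Event]:
    """Return-to-zero: req+ ack+ req- ack- per transaction (one possible interleaving)."""
    trace: List[Event] = []
    for _ in range(transactions):
        trace += [("req", 1), ("ack", 1), ("req", 0), ("ack", 0)]
    return trace

def final_levels(trace: List[Event]) -> Dict[str, int]:
    levels = {"req": 0, "ack": 0}
    for wire, level in trace:
        levels[wire] = level
    return levels
```

`two_phase(2)` produces exactly the same edges as `four_phase(1)`. The difference lies only in how many transactions those edges are taken to represent.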
Comparison of Approaches
• 2-phase vs 4-phase
  – 2-phase is conceptually simpler (once an event mind-set is adopted)
  – but 2-phase circuits are slower and more complex
  – hence: think 2-phase, build 4-phase
• Bundled data vs dual rail
  – current orthodoxy: bundled data is faster, lower power and smaller in area
  – the tolerancing (timing margin) task is no worse than for a clocked design

Current Approach
• QDI control
• Bounded-delay (bundled-data) datapath
• 4-phase signalling
[Figure: AMULET3i]

Asynchronous HDLs
• Conventional programming languages lack three necessary constructs:
  – communication
  – parallelism/concurrency
  – sharing (of hardware)
• Conventional HDLs lack adequate:
  – fine-grain concurrency
  – channel-based communication primitives

Asynchronous HDLs (2)
• Tangram, Balsa
  – CSP-based, plus data types, etc.
  – based on an underlying formal semantics:
    – guarantees correct composition rules
    – easier composition than in synchronous circuits???
  – transparent compilation:
    – each production rule in the language translates to an intermediate handshake circuit
    – allows the designer to infer circuit cost and performance from the program

Handshake Circuits (1)
• Circuits communicate along channels
• Channels connect ports at the circuit interface
• Ports have:
  – type
  – direction
  – sense

Handshake Circuits (2)
• Port type determines the number of data wires
  – no data wires == control-only port!
• Port direction is input, output or control-only
• Port sense:
  – active: initiates transfers
  – passive: responds to requests
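The three port attributes can be captured in a small data structure. The sketch below is hypothetical: the class and function names are not taken from the Balsa tools. It assumes the usual handshake-circuit rules. A channel joins exactly one active port to one passive port of the same type, and data flows from an output to an input. A push channel therefore has an active output and a passive input, and a pull channel has the opposite.

```python
from dataclasses import dataclass
from enum import Enum

class Direction(Enum):
    INPUT = "input"
    OUTPUT = "output"
    CONTROL = "control-only"

class Sense(Enum):
    ACTIVE = "active"    # initiates transfers (drives req)
    PASSIVE = "passive"  # responds to requests (drives ack)

@dataclass(frozen=True)
class Port:
    name: str
    width: int           # port type: number of data wires
    direction: Direction
    sense: Sense

    def __post_init__(self) -> None:
        # A port has no data wires exactly when it is control-only.
        if (self.width == 0) != (self.direction is Direction.CONTROL):
            raise ValueError(f"{self.name}: only control-only ports have no data wires")

def connect(a: Port, b: Port) -> None:
    """Check that a channel may join ports a and b (assumed rules, see text)."""
    if a.sense == b.sense:
        raise ValueError("a channel needs exactly one active and one passive end")
    if a.width != b.width:
        raise ValueError("port types (widths) must match")
    # Equal widths imply both ends are control-only or neither is.
    if a.direction is not Direction.CONTROL and a.direction == b.direction:
        raise ValueError("data must flow from an output to an input")
```

With these checks, a push channel (active output to passive input) and a pull channel (passive output to active input) are both accepted. Two active ends, or ports of different widths, are rejected.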
Micropipeline-Style Circuits: Push Circuits
[Animation over several slides: a circuit (cct) with a passive input port and an active output port, each with req, ack and data wires]
• The circuit waits for data
• Data arrives
• Data validity is signalled (input req)
• The circuit accepts the data
• The circuit signals that the data has been taken (input ack)
• The circuit outputs data
• The circuit signals validity (output req)
• The receiver takes the data (output ack)

Micropipeline-Style Circuits (cont.)
• The 4-phase protocol is not detailed here
• The previous circuit decoupled input and output, which implies a latch inside the handshake circuit
• An alternative is for the input handshake to enclose the output handshake

Enclosed Handshake: Push Circuits
[Animation over several slides: the same circuit]
• Data arrives
• Data validity is signalled (input req)
• The circuit accepts the data
• The circuit outputs data
• The circuit signals validity (output req)
• The receiver takes the data (output ack)
• The input handshake completes (input ack)
  – no latch is required: the sender holds the input data valid until the output has been taken
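The difference between the two push circuits can be stated as a property of their 4-phase event traces. In this sketch, both traces and the validity rule are assumptions made for illustration. Bundled input data is taken to be valid from i.req+ until i.ack+. The enclosed version places the whole output handshake inside that window, as the Tangram enclosure "i : o" does. If the input is still valid when the receiver takes the output, the circuit needs no latch.

```python
from typing import List, Tuple

Event = Tuple[str, str]  # (wire, "+" rising or "-" falling)

def decoupled_push() -> List[Event]:
    """Input handshake finishes before the output starts, so x must be latched."""
    return [("i.req", "+"), ("i.ack", "+"), ("i.req", "-"), ("i.ack", "-"),
            ("o.req", "+"), ("o.ack", "+"), ("o.req", "-"), ("o.ack", "-")]

def enclosed_push() -> List[Event]:
    """The whole output handshake sits between i.req+ and i.ack+."""
    return [("i.req", "+"), ("o.req", "+"), ("o.ack", "+"), ("o.req", "-"),
            ("o.ack", "-"), ("i.ack", "+"), ("i.req", "-"), ("i.ack", "-")]

def input_held_during_output(trace: List[Event]) -> bool:
    """True if the input data (valid from i.req+ to i.ack+) is still valid
    when the receiver takes the output (o.ack+)."""
    pos = {ev: k for k, ev in enumerate(trace)}
    return pos[("i.req", "+")] < pos[("o.ack", "+")] < pos[("i.ack", "+")]
```

The price of dropping the latch is that the sender stays stalled until the receiver has accepted the data, so the two stages are no longer decoupled.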
Tangram-Style Circuits: Pull Circuits
• Active-ported circuits: control driven
[Animation over several slides: a circuit (cct) with an active input port (req, ack, data)]
• The circuit demands data (input req)
• The data is sent on demand
• The data is accepted and can then be released

Balsa
• A language for synthesising large asynchronous circuits and systems
• CSP/OCCAM background
• Tangram-like
  – based on the Tangram compilation function
  – compiles to a small (but expanding) set of handshake circuits
• Origins: ESPRIT EXACT project

Balsa Language Features
• Data types based on sequences of bits
  – arrays and records are bit-based
  – element extraction is by array slicing
  – strict data typing
• Structural iteration
• Arrayed channels
• Parameterised and recursive functions

Balsa Language Features (cont.)
• Enclosed selection semantics
  – allows passive-ported circuits
  – allows push (micropipeline-style) circuits
  – allows unbuffered (latch-free) circuits
  – can be considered a restricted form of Burns' probe construct

Balsa Source
Example: Single-Place Buffer

    import [balsa.types.basic]                 -- library mechanism

    public type word is 16 bits                -- visibility; type declaration

    procedure buffer (input i : word;          -- procedure definition;
                      output o : word) is      --   channel declarations
      local variable x : word                  -- implies a latch
    begin
      loop                                     -- repeat forever
        i -> x ;  -- Input communication: read channel i into local variable x (';' = sequential operation)
        o <- x    -- Output communication: write local variable x to channel o
      end
    end
Buffer Handshake Circuit: Single-Place Buffer
[Figure: handshake circuit for the single-place buffer, started from an activation channel: a repeater (#) drives a sequencer (;), which drives two transferrers (T) connected to the variable x and to the channels i and o]
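The circuit above follows directly from the source, one component per construct. The sketch below is a toy syntax-directed translator for the buffer's body. The AST classes and component strings are hypothetical, and real Balsa compilation (balsa-c) produces far more detailed handshake netlists. It shows only the mapping: `loop` becomes a repeater, `;` a sequencer, each communication a transferrer, and each declared variable a variable component.

```python
from dataclasses import dataclass
from typing import List, Union

@dataclass
class Input:        # i -> x
    chan: str
    var: str

@dataclass
class Output:       # o <- x
    chan: str
    var: str

@dataclass
class Seq:          # left ; right
    left: "Cmd"
    right: "Cmd"

@dataclass
class Loop:         # loop body end
    body: "Cmd"

Cmd = Union[Input, Output, Seq, Loop]

def compile_cmd(cmd: Cmd, out: List[str]) -> None:
    """One handshake component per construct, then recurse into sub-commands."""
    if isinstance(cmd, Loop):
        out.append("repeater #")
        compile_cmd(cmd.body, out)
    elif isinstance(cmd, Seq):
        out.append("sequencer ;")
        compile_cmd(cmd.left, out)
        compile_cmd(cmd.right, out)
    elif isinstance(cmd, Input):
        out.append(f"transferrer T ({cmd.chan} -> {cmd.var})")
    elif isinstance(cmd, Output):
        out.append(f"transferrer T ({cmd.var} -> {cmd.chan})")
    else:
        raise TypeError(f"unknown command {cmd!r}")

def compile_procedure(variables: List[str], body: Cmd) -> List[str]:
    components = [f"variable {v}" for v in variables]
    compile_cmd(body, components)
    return components
```

Because the translation is this direct, counting components in the source already gives a first estimate of circuit cost, which is the "transparent compilation" property claimed for Tangram and Balsa.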
Buffer Handshake Circuit: Single-Place Buffer (operation)
[Animation over several slides of the circuit above]
• The repeater is activated
• The sequencer handshakes to the left transferrer
• The transferrer requests data from the environment (on i)
• The data is transferred to variable x
• The variable handshake completes
• The transferrer's handshake with the environment completes
• The left transferrer's handshake completes
• The sequencer handshakes to the right transferrer
• The transferrer reads the variable
• The transferrer outputs to the environment (on o)
• The handshakes complete
• The sequencer completes its input handshake
• The repeater initiates another transfer, and so on
Example: 2-Place Buffer

    import [balsa.types.basic]
    import [buffer1a]                          -- reuse component: the 1-place buffer
    public type word is 16 bits

    procedure buffer2c (input i : word; output o : word) is
      local channel c : word                   -- internal channel connects the two 1-place buffers
    begin
      buffer (i, c) || buffer (c, o)           -- parallel composition; buffers connected by the common name c
    end
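The behaviour of `buffer2c` can be approximated in software by running two copies of the one-place buffer as threads joined by a channel `c`. In this sketch, Python threads stand in for the handshake components. `Channel` is a hypothetical unbuffered (rendezvous) channel, so `send` returns only after a receiver has taken the value, much like a req/ack handshake. Each stage's local variable `x` is then the only storage, giving two places in total. Unlike the Balsa `loop`, each stage stops after `n` items so that the threads finish.

```python
import threading
from typing import Generic, List, Optional, TypeVar

T = TypeVar("T")

class Channel(Generic[T]):
    """Unbuffered channel: send() completes only after recv() has taken the value."""

    def __init__(self) -> None:
        self._cv = threading.Condition()
        self._item: Optional[T] = None
        self._full = False    # a value is on offer (like req high)
        self._taken = False   # the receiver has taken it (like ack high)

    def send(self, value: T) -> None:
        with self._cv:
            while self._full:
                self._cv.wait()
            self._item, self._full, self._taken = value, True, False
            self._cv.notify_all()
            while not self._taken:
                self._cv.wait()
            self._full = False
            self._cv.notify_all()

    def recv(self) -> T:
        with self._cv:
            while not self._full or self._taken:
                self._cv.wait()
            self._taken = True
            self._cv.notify_all()
            return self._item  # type: ignore[return-value]

def buffer(i: Channel, o: Channel, n: int) -> None:
    """loop i -> x ; o <- x end, cut off after n iterations."""
    for _ in range(n):
        x = i.recv()   # i -> x : the local variable x acts as the latch
        o.send(x)      # o <- x

def buffer2(values: List[int]) -> List[int]:
    """buffer (i, c) || buffer (c, o), fed by a producer thread on i."""
    i: Channel[int] = Channel()
    c: Channel[int] = Channel()
    o: Channel[int] = Channel()
    n = len(values)

    def producer() -> None:
        for v in values:
            i.send(v)

    threads = [threading.Thread(target=buffer, args=(i, c, n)),
               threading.Thread(target=buffer, args=(c, o, n)),
               threading.Thread(target=producer)]
    for t in threads:
        t.start()
    received = [o.recv() for _ in range(n)]
    for t in threads:
        t.join()
    return received
```

`buffer2([1, 2, 3, 4])` returns the values in their original order. Adding more stages would add more places without changing the `buffer` procedure, which is the reuse the slide points out.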
2-Place Buffer Handshake Circuit
[Figure: two buffer blocks B started by a par component, connected by the internal channel c through a passivator, with external channels i and o]

2-Place Buffer Handshake Circuit (expanded)
[Figure: the same circuit with each B expanded into its single-place buffer handshake circuit (#, ;, T, T, x), plus the par component and the passivator on c]

Peephole Optimisation
• Composition of handshake circuits leads to inefficiencies at circuit boundaries
• These are removed by straightforward peephole optimisations
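Optimised 2-Place Buffer Circuit
[Figure: the optimised circuit; labels include #, ;, T, x, i and a control-only channel]

The Repeater
• "Formal" definition:
  $\mathrm{REP}(a^{\circ}, b^{\bullet}) = (a^{\circ} : \#[b^{\bullet}])$
  – $^{\bullet}$ denotes an active port
  – $^{\circ}$ denotes a passive port
  – $\#$ denotes repeat
  – $:$ denotes handshake enclosure

The Repeater (cont.)
• "Formal" definition, expanded down to the 4-phase signal transitions:
  $\mathrm{REP}(a^{\circ}, b^{\bullet}) = (a^{\circ} : \#[b^{\bullet}]) = (a^{\circ} : \#[b^{\bullet}\!\uparrow ; b^{\bullet}\!\downarrow]) = (a_r\!\uparrow : \#[b_r\!\uparrow ; b_a\!\uparrow ; b_r\!\downarrow ; b_a\!\downarrow])$
[Figure: repeater with passive port a (wires a_r, a_a) and active port b (wires b_r, b_a)]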
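The last form of the definition can be read as a trace generator. The sketch assumes that the environment on b acknowledges immediately, so b_a events appear directly after the repeater's requests. Because the repeat never terminates, the enclosing handshake on a never reaches a_a: a repeater, once activated, is never acknowledged.

```python
from itertools import islice
from typing import Iterator, List, Tuple

Event = Tuple[str, str]  # (wire, "+" rising or "-" falling)

def repeater() -> Iterator[Event]:
    """(a_r+ : #[b_r+ ; b_a+ ; b_r- ; b_a-]) as an infinite event trace."""
    yield ("a_r", "+")          # passive port a is activated by the environment
    while True:                 # '#': repeat forever
        yield ("b_r", "+")      # repeater requests on its active port b
        yield ("b_a", "+")      # environment acknowledges (assumed immediate)
        yield ("b_r", "-")      # return to zero
        yield ("b_a", "-")

def first_events(n: int) -> List[Event]:
    return list(islice(repeater(), n))
```

Because the generator is lazy, `first_events(n)` can take any finite prefix of the infinite behaviour.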
The Transferrer
• Several implementations; the simplest is wire-only:
[Figure: wire-only transferrer with activation port a, input port b and output port c: a_r drives b_r, b_a drives c_r, c_a drives a_a, and data[n] passes directly from b to c]
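A wire-only transferrer contains no gates, so its behaviour comes entirely from the handshakes around it. The sketch below models the wiring shown above as plain assignments. It assumes port a is the passive activation, b is an active input that pulls data, and c is an active output that pushes it. A data source on b and a sink on c (both hypothetical and instantly responsive) complete one 4-phase activation. The loop iterates wires and environment until nothing changes.

```python
from typing import Dict, List

def transferrer_wires(w: Dict[str, int]) -> None:
    """Wire-only transferrer: each assignment is just a wire."""
    w["b_r"] = w["a_r"]          # activation request pulls from b
    w["c_r"] = w["b_a"]          # b acknowledges with data: offer it on c
    w["c_data"] = w["b_data"]    # data[n] passes straight through
    w["a_a"] = w["c_a"]          # c has taken the data: acknowledge activation

def run_transfer(value: int) -> List[int]:
    """One 4-phase activation of the transferrer; returns what the sink received."""
    w = dict.fromkeys(["a_r", "a_a", "b_r", "b_a", "c_r", "c_a", "b_data", "c_data"], 0)
    received: List[int] = []

    def settle() -> None:
        for _ in range(10):
            before = dict(w)
            transferrer_wires(w)
            if w["b_r"] and not w["b_a"]:
                w["b_data"] = value              # source drives data before acknowledging
            w["b_a"] = w["b_r"]
            if w["c_r"] and not w["c_a"]:
                received.append(w["c_data"])     # sink samples data, then acknowledges
            w["c_a"] = w["c_r"]
            if w == before:
                return
        raise RuntimeError("circuit did not settle")

    for level in (1, 0):                         # a_r+ then a_r-
        w["a_r"] = level
        settle()
        if w["a_a"] != level:
            raise RuntimeError("unexpected acknowledge on a")
    return received
```

The data crosses exactly once per activation. The bundling constraint here is that the source drives `b_data` before raising `b_a`.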
Balsa Toolkit (1)
• balsa-c: the compiler for the language
• breeze2dot: produces a PostScript plot of the generated handshake circuits
• breezecost: reports the cost of the compiled circuit in arbitrary units

Balsa Toolkit (2)
• breeze2lard: the interface to the LARD simulation environment
  – Balsa source is translated to LARD
  – a simple test harness is generated
• balsa-md: an automatic makefile generation facility
• balsa-mgr: a GUI project manager

Mod-16 Counter (all even)

Bundled-Data Datapaths
• Problems:
  – random standard-cell layout (mixed control and datapath)
  – timing analysis required
  – robustness of the design is reduced
• Possible solutions:
  – DI codes
  – hybrid bundled + DI
  – simpler timing analysis

DI Codes
• Dual rail (used in the first Tangram system)
  – can use a standard-cell approach without timing analysis
  – no need to distinguish between control and data
• Abandoned in favour of bundled data because of:
  – the area cost of the extra wires
  – the area and time cost of completion detection
• Tangram/Balsa generates push-pull pipelines with expensive synchronisation

Generic Pipeline
• Passivators join compiled procedures
[Figure: two compiled procedures B, with input i and output o, joined on the internal channel c by a passivator]

Passivator Implementation
[Figure, bundled data: a C-element combines a_r and b_r and drives both a_a and b_a; data[n] passes straight through]
[Figure, dual rail: the data rails d0/d1 of bits 0 to n-1 carry the request, so completion must be detected with an n-wide C-gate built from C-elements]

DI Code Synchronisations
• Expensive: a C-element synchronisation tree is needed
• A partial solution (not always possible or desirable):
  – transform to a push-style datapath
  – (not possible in Tangram, only in Balsa)
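The cost mentioned above comes from the synchronisation tree. The sketch below assumes the usual construction: each bit's rails are ORed into a "bit valid" signal, and a tree of 2-input Muller C-elements combines these into the n-wide C-gate. Because C-elements hold their output until all inputs agree, the detector reports completion only when every bit is valid. It reports a return to spacer only when every bit has reset. An n-bit word needs roughly n-1 C-elements and about log2(n) levels of delay on every transfer.

```python
from typing import List, Tuple

class CElement:
    """Muller C-element: the output follows the inputs when they all agree, else holds."""

    def __init__(self) -> None:
        self.out = 0

    def update(self, *inputs: int) -> int:
        if all(inputs):
            self.out = 1
        elif not any(inputs):
            self.out = 0
        return self.out

class CompletionTree:
    """Dual-rail completion detector: OR per bit, then a tree of C-elements."""

    def __init__(self, n: int) -> None:
        self.n = n
        self.levels: List[List[CElement]] = []
        width = n
        while width > 1:
            width = (width + 1) // 2          # an odd signal passes through a 1-input element
            self.levels.append([CElement() for _ in range(width)])

    def update(self, word: List[Tuple[int, int]]) -> int:
        if len(word) != self.n:
            raise ValueError("word width does not match the tree")
        signals = [t | f for (t, f) in word]  # per-bit "valid"
        for level in self.levels:
            signals = [c.update(*signals[2 * k: 2 * k + 2]) for k, c in enumerate(level)]
        return signals[0]
```

Sequencing the detector through spacer, a partly arrived word, a complete word, a partly reset word and spacer again gives outputs 0, 0, 1, 1, 0.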
Push Pipeline
[Figure: compiled procedures B, with input i and output o, joined on channel c by a wires-only passive input port connector]

Hybrid Solutions
• Use DI coding within a bundled-datapath framework
  – e.g. dual-rail carry signals within a conventional adder
  – early completion is easily detected
• Average-case performance
• Only applicable to a few datapath operations
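The dual-rail carry example can be modelled with a simple step count. The sketch below assumes unit delay per carry stage. When a_i equals b_i, a bit's carry-out is known at once: generate if both are 1, kill if both are 0. Otherwise the bit propagates and must wait for its carry-in. With dual-rail carries the adder can detect when every carry is valid, so completion time depends on the longest propagate run in the actual operands, not on the full adder width.

```python
from typing import Tuple

def carry_completion_add(a: int, b: int, width: int, cin: int = 0) -> Tuple[int, int]:
    """Return (sum mod 2**width, steps until every carry has a valid code)."""
    ready = [False] * width    # carry-out of bit i is valid
    carry = [0] * width
    steps = 0
    while not all(ready):
        steps += 1
        prev_ready, prev_carry = ready[:], carry[:]
        for i in range(width):
            if prev_ready[i]:
                continue
            ai, bi = (a >> i) & 1, (b >> i) & 1
            if ai == bi:                      # generate (1,1) or kill (0,0): no waiting
                carry[i], ready[i] = ai, True
            elif i == 0:                      # propagate the known carry-in
                carry[i], ready[i] = cin, True
            elif prev_ready[i - 1]:           # propagate: uses last step's carry
                carry[i], ready[i] = prev_carry[i - 1], True
    total = 0
    for i in range(width):
        c_in = cin if i == 0 else carry[i - 1]
        total |= (((a >> i) ^ (b >> i) ^ c_in) & 1) << i
    return total, steps
```

With 4-bit operands, `carry_completion_add(5, 5, 4)` finishes in one step and `carry_completion_add(15, 0, 4)` is the worst case at four steps. For random operands the longest propagate run is typically short (on the order of log2 of the width), which is where the average-case benefit comes from.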
Simpler Timing Analysis
• Separate control and datapath
  – generate a regular, compiled datapath
  – area improvement over standard cells (because of the regular layout)
  – generate matched delay paths (cf. self-timed PLAs)
• Must be able to recognise the datapath
  – difficult: control often contains datapath-like elements
  – e.g. start at the variables and work backwards...

Datapath Meets Control
• Example: the Balsa case statement
[Figure: data n bits wide is expanded into true/complement lines (dual-rail expansion) and then into a 1-hot encoding]

Case Component
• Input from the datapath
  – dual rail simplifies the internal logic
• Expansions are parameterisable
• The "encode" component is similar
  – the opposite of case, with a true/false expansion
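The case component's two stages can be sketched as pure functions. This is a hypothetical gate-level model, not the Balsa component itself. The dual-rail expansion gives each bit a true line and a complement line. Each of the 2^n one-hot outputs is then a single AND that picks one line per bit, with no inverters, which is why the dual-rail input simplifies the internal logic. The `encode_one_hot` function shows the opposite direction, assumed here to be what the "encode" component does.

```python
from typing import List, Tuple

def true_complement(value: int, n: int) -> List[Tuple[int, int]]:
    """Dual-rail expansion of an n-bit value: a (true, complement) line pair per bit."""
    return [((value >> i) & 1, 1 - ((value >> i) & 1)) for i in range(n)]

def case_one_hot(lines: List[Tuple[int, int]]) -> List[int]:
    """Output k ANDs the true line of each bit set in k and the complement
    line of each bit clear in k, so exactly one output is high."""
    outs = []
    for k in range(2 ** len(lines)):
        term = 1
        for i, (t, c) in enumerate(lines):
            term &= t if (k >> i) & 1 else c
        outs.append(term)
    return outs

def encode_one_hot(outs: List[int]) -> int:
    """Hypothetical inverse: bit i is the OR of the outputs whose index has bit i set."""
    n = (len(outs) - 1).bit_length()
    value = 0
    for i in range(n):
        if any(o for k, o in enumerate(outs) if (k >> i) & 1):
            value |= 1 << i
    return value
```

For example, `case_one_hot(true_complement(5, 3))` raises only output 5 of 8, and `encode_one_hot` maps that back to 5.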
Simpler Timing Analysis (cont.)
• Tool support required
  – use existing (non-Balsa) tools if possible
  – automatically add matched paths/delays to synthesised datapaths
• Design our own cells where appropriate
  – e.g. hybrid stages

Future Work
• Provide support for DI, hybrid and datapath-compiled datapaths
  – even with datapath compilation, some datapath logic would still be standard cell, e.g. the instruction decoder (control heavy) and datapath within control
  – cost of connecting separate blocks in layout
• A test design is required (datapath heavy)

Tool Enhancement
• balsa-c: support for attribution to select compilation mechanisms/optimisation schemes
• breeze2lard: new models
• balsa-netlist:
  – new tech-mapping descriptions
  – interface to datapath compilers

AMULET3i
• Asynchronous macrocell
  – ARM-compatible processor core
  – full-custom RAM
  – compiled ROM
  – Balsa-compiled DMA controller
  – test interface, synchronous and off-chip bus bridges
• Synchronous peripherals
  – designed by a commercial partner...

AMULET3 System
[Figure: system block diagram with CPU/RAM, ROM, DMAC, Periph1 and a sync bridge, connected via MARBLE and SOCB]

DMA Local RAM Access
[Figure: the same system diagram, highlighting DMA access to the local RAM]

DMA Peripheral Accesses
[Figure: the same system diagram, highlighting DMA requests from the peripherals]

Requirements/Specification
• 16 clients, 32 channels
• 3 channel types: a complicated register structure
• Programmable one-to-many client-to-channel mapping
• Support for synchronous requests
• Transfers mostly between synchronous clients

Controller Structure

Two Controller Descriptions
• Sequential (previous slides)
  – very simple control flow
  – requires two passes through the register bank
  – slow! Only memory decoupling helps
• Parallel (next slides)
  – decouples TE actions from memory reads/writes with a new unit, the Transfer Interface (TI)
  – interrupts the register bank at the end of a transfer

"Parallel" Design

The Design
• 919 lines of Balsa describing the register bank control, TE and TI
• Custom register banks and Synchronous Peripheral Interface
• Miscellaneous glue in standard cells
  – register bank controllers
  – MARBLE interfaces
• Compass Design Automation CAD tools

Implementation Technology
• 0.35 µm, 3-layer-metal (3LM) CMOS
• Standard cells from ARM Ltd.
• Locally designed complex gates and asynchronous elements/gates
• Automated standard-cell place and route (P&R)
• Only "essential" and simple gate-level optimisation (by hand)

Design Partitioning
[Figure series: DMA controller layout with each region highlighted in turn]
• MARBLE bus: outside the DMA controller
• Balsa-synthesised standard cells
• Custom "regular" layout
• Hand-designed standard cells

DMA Controller Floor-Plan