Published by Darleen Ward. Modified over 9 years ago.
1
Advanced Digital Design
Asynchronous EDA
by A. Steininger, J. Lechner and R. Najvirt
Vienna University of Technology
2
Lecture "Advanced Digital Design" © A. Steininger & J. Lechner & R. Najvirt / TU Vienna
Overview
- Synchronous-Asynchronous Direct Translation (SADT)
- Null Convention Logic
- Syntax Directed Compilation (Balsa)
- Martin Synthesis (Caltech Asynchronous Synthesis Tools)
3
Synchronous-Asynchronous Direct Translation (SADT)
- Starting point: synchronous circuit description in a standard HDL
- Synthesis with conventional tools into a synchronous gate-level netlist
- Transformation of the synchronous netlist into an asynchronous netlist
- Technology mapping
- Place and route
- Timing verification
4
De-synchronization
- SADT approach
- Design style: bundled data
- Substitution of flip-flops by latches
- Substitution of the clock by local asynchronous controllers
- De-synchronized circuits...
  - never halt (liveness)
  - perform the same computations as the synchronous circuit (flow equivalence)
5
De-synchronization: Conversion steps
1. Conversion of flip-flops to latches
   - Each D flip-flop is separated into master/slave latches
2. Generation of delay elements for request signals
   - Matched to the length of the critical path of the combinational logic
3. Implementation and wiring of asynchronous latch controllers
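Step 2 can be illustrated with a minimal Python sketch, assuming the combinational logic between two latch stages is given as a DAG with one delay value per gate. The names `gate_delay`, `fanin` and the 10 % safety `margin` are illustrative assumptions, not part of the lecture or of any de-synchronization tool.

```python
from functools import lru_cache


def critical_path_delay(gate_delay, fanin):
    """Longest accumulated delay through a combinational DAG.

    gate_delay: dict gate name -> delay of that gate
    fanin:      dict gate name -> tuple of driving gates (absent = primary input)
    """
    @lru_cache(maxsize=None)
    def arrival(gate):
        # Arrival time at the gate output = own delay + latest input arrival.
        preds = fanin.get(gate, ())
        return gate_delay[gate] + max((arrival(p) for p in preds), default=0.0)

    return max(arrival(g) for g in gate_delay)


def matched_delay(gate_delay, fanin, margin=0.1):
    """Delay for the request line: critical path plus an assumed safety margin."""
    return critical_path_delay(gate_delay, fanin) * (1.0 + margin)


# Hypothetical three-gate block: g1 feeds g2, and both feed g3.
delays = {"g1": 1.0, "g2": 2.0, "g3": 1.5}
drivers = {"g2": ("g1",), "g3": ("g1", "g2")}
print(critical_path_delay(delays, drivers))
```

In a bundled-data design the request must arrive after the data is stable, which is why the sketch rounds the delay up rather than down.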
6
De-synchronization: Circuit Architecture
[Figure: synchronous circuit and de-synchronized circuit; Cortadella et al., 06]
7
De-synchronization: Asynchronous Controllers
- Controller for master/slave latches
- 4-phase protocol
- Different controller implementations with more or less concurrency possible
  - Non-overlapping
  - Semi-decoupled 4-phase
  - Fully-decoupled 4-phase
  - De-synchronization control
- More concurrency => faster pipeline
- More concurrency => larger controllers
8
De-synchronization: Flow Equivalence
Definition: Two circuits are flow-equivalent if
- they have the same set of latches, and
- for each latch, the sequence of stored values is the same in both circuits.
[Cortadella et al., 06]
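The definition can be turned into a simple check on simulation traces. The following Python sketch assumes each trace is a dict from latch name to the list of values stored in that latch, in order. Because simulation traces are finite, it compares only the common prefix of the two sequences; that truncation is an assumption of this sketch, not part of the definition.

```python
def flow_equivalent(trace_a, trace_b):
    """Check flow equivalence of two circuits on finite latch traces.

    trace_a, trace_b: dict latch name -> list of values stored, in order.
    Timing is deliberately ignored: only the order of stored values matters.
    """
    # Condition 1: same set of latches.
    if set(trace_a) != set(trace_b):
        return False
    # Condition 2: same value sequence per latch (on the common prefix).
    for latch in trace_a:
        n = min(len(trace_a[latch]), len(trace_b[latch]))
        if trace_a[latch][:n] != trace_b[latch][:n]:
            return False
    return True


# Hypothetical traces: the de-synchronized run was simply stopped earlier on L1.
sync_trace = {"L1": [0, 1, 1, 0], "L2": [1, 1, 0, 0]}
desync_trace = {"L1": [0, 1, 1], "L2": [1, 1, 0, 0]}
print(flow_equivalent(sync_trace, desync_trace))
```

Note that the check says nothing about when a value is stored, which is what allows the latches of the de-synchronized circuit to run ahead of or behind each other.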
9
De-synchronization: Pros/Cons
Advantages
- Use of standard HDLs
- Use of industrial-strength synthesis tools
- Almost no re-education for hardware designers necessary
- Simple porting of legacy designs
- Negligible area overhead compared to synchronous implementation
Disadvantages
- 1-to-1 mapping of synchronous circuits can lead to sub-optimal designs
10
Click Elements
- Published as an implementation style for data-driven compilation (Haste)
- Also useful for implementing asynchronous equivalents of synchronous circuits
- Uses flip-flops for storage
- Most elements implementable with cells from a standard (synchronous) library
- Arbiter still required (not for SADT)
11
Click Elements
[Figure: not preserved in this transcript]
12
Null Convention Logic: Synthesis
- RTL synthesis
  - Transform VHDL/Verilog to 3NCL netlist
  - Netlist contains just AND & INV gates
  - Off-the-shelf synthesis tools
  - NULL values are treated as "don't care"
  - Logic optimizations
- Dual-rail expansion
  - 3NCL netlist to 2NCL netlist
  - DIMS implementation of AND & INV gates
  - Produces a delay-insensitive circuit
  - Logic optimizations
13
Dual Rail NAND
[Figure: DIMS implementation; Ligthart et al., 2000]
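Since the figure is not available here, the following Python sketch gives a behavioural model of a dual-rail NAND in DIMS style: one C-element per input minterm, with the minterm outputs combined onto the two output rails. The encoding as `(true_rail, false_rail)` pairs and the class names are assumptions of this sketch; the gate-level structure in the cited figure may differ in detail.

```python
# Assumed dual-rail encoding: (true_rail, false_rail)
NULL, TRUE, FALSE = (0, 0), (1, 0), (0, 1)


class CElement:
    """Muller C-element: output follows the inputs only when they all agree."""

    def __init__(self):
        self.out = 0

    def update(self, *inputs):
        if all(inputs):
            self.out = 1
        elif not any(inputs):
            self.out = 0
        return self.out  # otherwise: hold the previous value


class DimsNand:
    """Dual-rail NAND built from four minterm C-elements (DIMS style)."""

    def __init__(self):
        self.minterm = {(va, vb): CElement() for va in (0, 1) for vb in (0, 1)}

    def update(self, a, b):
        def rail(x, value):
            # Rail of signal x that is asserted when x carries `value`.
            return x[0] if value else x[1]

        fired = {k: c.update(rail(a, k[0]), rail(b, k[1]))
                 for k, c in self.minterm.items()}
        # NAND is true for every minterm except a=1, b=1.
        true_rail = int(any(v for k, v in fired.items() if k != (1, 1)))
        false_rail = fired[(1, 1)]
        return (true_rail, false_rail)


gate = DimsNand()
for a, b in [(NULL, NULL), (TRUE, NULL), (TRUE, TRUE), (NULL, TRUE), (NULL, NULL)]:
    print(a, b, "->", gate.update(a, b))
```

The output becomes valid only after both inputs are valid and returns to NULL only after both inputs are NULL, which is the behaviour that makes the DIMS circuit insensitive to input arrival order.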
14
Null Convention Logic: Technology Mapping
- DIMS implementation is inefficient
- Technology mapping onto threshold gates
- Circuit functionality is fully described by the set function of the DIMS implementation
- DIMS smoothing: derive a Boolean network representing the set function
- Threshold gates have a specific set function
- Perform logic optimization and map the Boolean network to the available threshold gates
15
Dual Rail NAND
[Figure: DIMS implementation and set function; Ligthart et al., 2000]
16
Null Convention Logic: Threshold Gates
- Library of threshold gates by Theseus
- All unate functions with up to 4 inputs
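A behavioural model may help to see what "threshold gate" means in NCL. The Python sketch below models only the unweighted m-of-n case with hysteresis: the output is set when at least m inputs are 1, reset when all inputs are 0, and held otherwise. The class name and interface are assumptions of this sketch; the actual library also contains gates that are not plain m-of-n functions.

```python
class ThresholdGate:
    """Unweighted m-of-n NCL threshold gate with hysteresis (behavioural model)."""

    def __init__(self, m, n):
        if not 1 <= m <= n:
            raise ValueError("threshold m must satisfy 1 <= m <= n")
        self.m, self.n, self.out = m, n, 0

    def update(self, *inputs):
        if len(inputs) != self.n:
            raise ValueError("expected %d inputs" % self.n)
        ones = sum(1 for v in inputs if v)
        if ones >= self.m:      # set function: threshold reached
            self.out = 1
        elif ones == 0:         # reset only when every input is back at 0
            self.out = 0
        return self.out         # otherwise hold (hysteresis)


th23 = ThresholdGate(2, 3)
print([th23.update(*v) for v in [(1, 0, 0), (1, 1, 0), (1, 0, 0), (0, 0, 0)]])
```

In this model a gate with m = n behaves like a C-element and a gate with m = 1 behaves like an OR gate, so only the set function distinguishes the gates, which is what the technology mapping step exploits.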
17
Syntax-Directed Compilation
- 1-to-1 mapping of language constructs to handshake circuit components
- Uses a library of highly optimized standard cell components for simpler physical synthesis and verification
- Allows an experienced designer to easily envision the resulting circuit but limits optimization potential
18
Balsa Handshake Circuits
- Approx. 40 handshake components
- Connected over channels
  - Data path associated
  - Pure control channels (no data transferred)
- Active ports initiate communication
- Passive ports respond to requests
- Push channel
  - Data flows from active to passive port
- Pull channel
  - Data flows from passive to active port
19
Example: Handshake Components
- Fetch ( )
  - Transfers data upon request
- Case (@)
  - Conditional control flow element
Source: [Balsa Manual]
20
Example: Modulo-10 Counter

    import [balsa.types.basic]

    type C_size is nibble
    constant max_count = 9

    procedure count10(sync aclk; output count: C_size) is
      variable count_reg : C_size
      variable tmp : C_size
    begin
      loop
        sync aclk;
        if count_reg /= max_count then
          tmp := (count_reg + 1 as C_size)
        else
          tmp := 0
        end || count <- count_reg ;
        count_reg := tmp
      end -- loop
    end -- begin
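For readers unfamiliar with Balsa, the behaviour of `count10` can be mirrored in a few lines of Python. This is only a behavioural sketch: each `next()` on the generator stands for one handshake on `aclk`, and the initial value 0 for `count_reg` is an assumption, since the Balsa procedure does not initialise the variable.

```python
from itertools import islice

MAX_COUNT = 9


def count10(initial=0):
    """Behavioural model of the Balsa count10 procedure."""
    count_reg = initial
    while True:
        # Balsa computes the next value and sends the current one in parallel (||).
        tmp = count_reg + 1 if count_reg != MAX_COUNT else 0
        yield count_reg       # count <- count_reg
        count_reg = tmp       # count_reg := tmp


print(list(islice(count10(), 12)))
```

The temporary variable exists in the Balsa source because `count_reg` is read by the output and written by the increment; splitting read and write into two sequential phases avoids accessing the variable for both at once.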
21
Example: Modulo-10 Counter
[Figure; Source: Balsa Manual]
22
Martin Synthesis
- The so-called Martin synthesis process is seminal work of the asynchronous design group around A. J. Martin at Caltech
- Design entry is CHP (Communicating Hardware Processes); the result is a production rule set (PRS)
- Performs several transformations with designer-modifiable intermediate steps
23
Communicating Hardware Processes
Main constructs:
- Simple assignment
    v := true or v := false
- Selection
    [G1 -> S1 [] G2 -> S2]
    [G] is [G -> skip]
- Repetition
    *[G1 -> S1 [] G2 -> S2]
    *[S] is *[true -> S]
- Sequencing and concurrent execution
    S1; S2 and S1, S2
- Communication
    C (synchronization)
    C!x (transmission)
    C?x (reception)
    #C (probe)
24
Process Decomposition
- First transformation
- Reduces processes with complex control structures to simple concurrent subprocesses
- Either syntax-directed (SDD) or data-driven (DDD)
25
Syntax Directed Decomposition
Rule: A process P containing a construct S can be replaced by processes P1 and P2 and a new channel C, by replacing S in P with the communication C (giving P1) and creating P2 of the form *[[#C -> S; C]].
E.g.
    P:  *[A; *[B1 -> S1 [] B2 -> S2]; B]
    P1: *[A; C; B]
    P2: *[[#C & B1 -> S1
         [] #C & B2 -> S2
         [] #C & ~B1 & ~B2 -> C]]
26
Data Driven Decomposition
- More fine-grained than SDD
- At the end, clustering can be performed to merge subprocesses again for better performance
- First transformation: conversion to dynamic single assignment (DSA) form
  - Each variable can be written only once in each main loop iteration, e.g.:
        *[A?a; X!a; B?a; Y!a]
    becomes
        *[A?a1; X!a1; B?a2; Y!a2]
27
Data Driven Decomposition (2)
- Second transformation is projection
- First, transformations to allow projection, e.g. variable duplication and channel addition:
      *[A?a; x := a, y := ~a; X!x, Y!y]
      *[A?a; a1 := a, a2 := a; x := a1, y := ~a2; X!x, Y!y]
      *[A?a; {Ax!a, Ax?a1}, {Ay!a, Ay?a2}; x := a1, y := ~a2; X!x, Y!y]
- Then projection onto sets of assignments
      Sets: {A?, a, Ax!, Ay!} {Ax?, a1, x, X!} {Ay?, a2, y, Y!}
      Projection: *[A?a; Ax!a, Ay!a], *[Ax?a1; x := a1; X!x], *[Ay?a2; y := ~a2; Y!y]
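That the three projected processes compute the same outputs as the original process can be checked with a small Python sketch. It models the internal channels `Ax` and `Ay` as FIFO queues and runs the processes as round-robin generators; CHP channels are synchronous, so the queues and the scheduler are simplifying assumptions of this sketch, and all function names are hypothetical.

```python
from collections import deque


def original(inputs):
    """*[A?a; x := a, y := ~a; X!x, Y!y] as a plain loop."""
    xs, ys = [], []
    for a in inputs:
        xs.append(a)
        ys.append(not a)
    return xs, ys


def projected(inputs):
    """The three projected processes, connected by channels Ax and Ay."""
    ax, ay = deque(), deque()   # internal channels, modelled as FIFOs (assumption)
    xs, ys = [], []

    def distribute():           # *[A?a; Ax!a, Ay!a]
        for a in inputs:
            ax.append(a)
            ay.append(a)
            yield True

    def copy_to_x():            # *[Ax?a1; x := a1; X!x]
        while True:
            if ax:
                xs.append(ax.popleft())
            yield

    def invert_to_y():          # *[Ay?a2; y := ~a2; Y!y]
        while True:
            if ay:
                ys.append(not ay.popleft())
            yield

    source, workers = distribute(), [copy_to_x(), invert_to_y()]
    running = True
    while running or ax or ay:
        if running:
            running = next(source, False)   # False once the inputs are exhausted
        for w in workers:
            next(w)
    return xs, ys


data = [True, False, False, True]
print(original(data) == projected(data))
```

Each projected process touches only the variables and channel actions in its own set, which is why the decomposition is possible after variable duplication and channel addition.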
28
Handshake Expansion (HSE)
- Each communication channel is replaced by handshake signals, e.g.:
      *[…; C; …], *[#C -> …; C]
  is transformed to (4-phase handshake)
      *[…; r := 1; [a]; r := 0; [~a]; …], *[r -> …; a := 1; [~r]; a := 0]
- Reshuffling can then be used to increase concurrency/performance (different handshake controllers)
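The 4-phase expansion can be traced with two cooperating Python generators, one for the active side and one for the passive side. This is a minimal sketch under the assumption that both signals start at 0; the function names and the alternating scheduler are hypothetical and only serve to make the waits `[a]`, `[~a]`, `[r]`, `[~r]` visible.

```python
def active(sig, log, rounds):
    """*[r := 1; [a]; r := 0; [~a]] for a fixed number of rounds."""
    for _ in range(rounds):
        sig["r"] = 1
        log.append("r+")
        while not sig["a"]:     # [a]
            yield
        sig["r"] = 0
        log.append("r-")
        while sig["a"]:         # [~a]
            yield


def passive(sig, log):
    """*[[r]; a := 1; [~r]; a := 0]"""
    while True:
        while not sig["r"]:     # [r]
            yield
        sig["a"] = 1
        log.append("a+")
        while sig["r"]:         # [~r]
            yield
        sig["a"] = 0
        log.append("a-")


def run(rounds):
    """Alternate between both sides and return the order of signal transitions."""
    sig, log = {"r": 0, "a": 0}, []
    act, pas = active(sig, log, rounds), passive(sig, log)
    while True:
        try:
            next(act)
        except StopIteration:
            return log
        next(pas)


print(run(1))
```

Reshuffling corresponds to moving the return-to-zero transitions (`r-`, `a-`) relative to the surrounding actions, which changes how much the two sides can overlap without changing the four events per communication.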
29
Production Rule Expansion (PRE)
- Transforms HSE to PR in three steps:
  - State variable insertion
  - PR generation
  - Symmetrisation
- Sequencing must be implemented explicitly
      *[[Lr]; Rr := 1; [Ra]; Rr := 0; [~Ra]; La := 1; [~Lr]; La := 0]

      Lr  -> Rr+
      Ra  -> Rr-
      ~Ra -> La+
      ~Lr -> La-
30
Production Rule Expansion (PRE), continued
- Original HSE:
      *[[Lr]; Rr := 1; [Ra]; Rr := 0; [~Ra]; La := 1; [~Lr]; La := 0]
- With state variable x inserted:
      *[[Lr]; Rr := 1; [Ra]; x := 1; [x]; Rr := 0; [~Ra]; La := 1; [~Lr]; x := 0; [~x]; La := 0]

      ~x & Lr  -> Rr+
      Ra       -> x+
      x        -> Rr-
      x & ~Ra  -> La+
      ~Lr      -> x-
      ~x       -> La-
31
Production Rule Expansion (PRE), continued
- Original HSE:
      *[[Lr]; Rr := 1; [Ra]; Rr := 0; [~Ra]; La := 1; [~Lr]; La := 0]
- With state variable x inserted:
      *[[Lr]; Rr := 1; [Ra]; x := 1; [x]; Rr := 0; [~Ra]; La := 1; [~Lr]; x := 0; [~x]; La := 0]
- After symmetrisation:

      ~x & Lr  -> Rr+
      Ra       -> x+
      ~Lr | x  -> Rr-
      x & ~Ra  -> La+
      ~Lr      -> x-
      Ra | ~x  -> La-
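The symmetrised rule set can be exercised with a small Python simulation. The environment rules (a left neighbour that toggles `Lr` in response to `La`, and a right neighbour whose `Ra` follows `Rr`) and the all-zero initial state are assumptions of this sketch; they are not given on the slides. The simulation also asserts that exactly one rule can change the state at every step under this environment.

```python
def simulate(cycles=1):
    """Fire production rules one at a time and record the transitions."""
    s = dict(Lr=0, Rr=0, Ra=0, La=0, x=0)
    rules = [
        # Circuit rules (symmetrised set from the slide): (guard, variable, value)
        (lambda s: not s["x"] and s["Lr"], "Rr", 1),
        (lambda s: s["Ra"], "x", 1),
        (lambda s: not s["Lr"] or s["x"], "Rr", 0),
        (lambda s: s["x"] and not s["Ra"], "La", 1),
        (lambda s: not s["Lr"], "x", 0),
        (lambda s: s["Ra"] or not s["x"], "La", 0),
        # Environment rules (assumed 4-phase neighbours)
        (lambda s: not s["La"], "Lr", 1),
        (lambda s: s["La"], "Lr", 0),
        (lambda s: s["Rr"], "Ra", 1),
        (lambda s: not s["Rr"], "Ra", 0),
    ]
    trace = []
    while len(trace) < 10 * cycles:         # ten transitions per handshake cycle
        # A rule is effective only if its guard holds and it changes the variable.
        enabled = [(v, val) for guard, v, val in rules if guard(s) and s[v] != val]
        assert len(enabled) == 1, "expected exactly one effective rule"
        var, val = enabled[0]
        s[var] = val
        trace.append(var + ("+" if val else "-"))
    return trace


print(" ".join(simulate()))
```

The recorded order of transitions follows the HSE with the state variable inserted, which is the sense in which the production rules implement the sequencing explicitly.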
32
Summary
- Synchronous-Asynchronous Direct Translation
  - Synthesis with standard tools
  - Synchronous-asynchronous transformation
- Martin Synthesis
  - Process decomposition
  - Handshake expansion
  - Production rule expansion
33
References
- Jordi Cortadella, Alex Kondratyev, Luciano Lavagno, Christos P. Sotiriou. Desynchronization: Synthesis of Asynchronous Circuits From Synchronous Specifications. 2006.
- Alain J. Martin. Programming in VLSI: From Communicating Processes to Self-timed VLSI Circuits. 1987.
- Catherine G. Wong and Alain J. Martin. High-Level Synthesis of Asynchronous Systems by Data-Driven Decomposition. 2003.
- Ad Peeters, Frank te Beest, Mark de Wit, Willem Mallon. Click Elements – An Implementation Style for Data-Driven Compilation. 2010.