Area and Speed Oriented Implementations of Asynchronous Logic Operating Under Strong Constraints
EUROMICRO DSD 2010, Lille
Outline
- Asynchronous circuits model used
- Motivation & proposed method
- Experimental results
- Conclusions
Asynchronous Circuits Model Used
- Unbounded delay model
  - Gate and wire delays are unbounded
  - The circuit is able to recognize the moment when input states have changed
- Dual-rail encoding
  - Positive and negative values of each signal are provided
  - f(0) = 1, f(1) = 0: logic 1
  - f(0) = 0, f(1) = 1: logic 0
  - f(0) = 0, f(1) = 0: space state (spacer)
  - f(0) = 1, f(1) = 1: not allowed
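The dual-rail code listed on this slide can be sketched as a pair of helper functions. This is only an illustration: the names `encode`, `decode`, and `SPACER` are hypothetical, and the rail pair is written as (f(0), f(1)) exactly as on the slide.

```python
from typing import Optional, Tuple

Rails = Tuple[int, int]  # (f(0), f(1)) as listed on the slide

SPACER: Rails = (0, 0)   # space state: no valid data on the wires


def encode(value: Optional[int]) -> Rails:
    """Map a logic value (or None for 'no data') to its rail pair."""
    if value is None:
        return SPACER
    if value == 1:
        return (1, 0)    # slide: f(0) = 1, f(1) = 0 is logic 1
    if value == 0:
        return (0, 1)    # slide: f(0) = 0, f(1) = 1 is logic 0
    raise ValueError("logic value must be 0, 1, or None")


def decode(rails: Rails) -> Optional[int]:
    """Map a rail pair back to a logic value; None means spacer."""
    if rails == (1, 0):
        return 1
    if rails == (0, 1):
        return 0
    if rails == (0, 0):
        return None
    raise ValueError("rail pair (1, 1) is not allowed")
```

A receiver can tell that data has arrived simply by seeing a rail pair leave the spacer, which is what lets the circuit recognize input changes without a clock.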
Four-Phase Discipline
- Inputs in space state (00)
- Inputs in working state (10, 01)
- Outputs in space state (00)
- Outputs in working state (10, 01)
Seitz’s Constraints
- Strong constraints
  - Each output changes its state only when all inputs have changed their state
- In contrast to weak constraints
  - Some outputs are permitted to change their state when some inputs have changed their state
Seitz’s Strong Constraints
- Pros
  - Regularity
  - Extra completion detection logic not needed
  - Circuit delay is based on actual gate delays
  - No additional synchronization chains
- Cons
  - Rather high area and delay
- Implementations
  - DIMS (Delay-Insensitive Minterm Synthesis)
  - NCL (Null Convention Logic)
  - Direct Logic
DIMS (Delay-Insensitive Minterm Synthesis)
- 2-level implementation
- 2^n n-input C-elements + n-input OR
- Function implemented as sum-of-minterms
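A small behavioural model may help make the DIMS structure concrete. The sketch below is not the authors' implementation: the class names `CElement` and `DimsNode` are hypothetical, the C-element is modelled with its usual hold behaviour, and it is assumed that each output rail is the OR of the C-elements for the minterms where the function is 1 or 0 respectively. Rail pairs are written (f(0), f(1)), with (1, 0) meaning logic 1.

```python
from itertools import product
from typing import Callable, Sequence, Tuple

Rails = Tuple[int, int]  # (1, 0) = logic 1, (0, 1) = logic 0, (0, 0) = spacer


class CElement:
    """Muller C-element: output changes only when all inputs agree."""

    def __init__(self) -> None:
        self.out = 0

    def update(self, inputs: Sequence[int]) -> int:
        if all(inputs):
            self.out = 1
        elif not any(inputs):
            self.out = 0
        # otherwise hold the previous output
        return self.out


class DimsNode:
    """n-input node: one n-input C-element per minterm, ORed per output rail."""

    def __init__(self, func: Callable[..., int], n: int) -> None:
        self.minterms = list(product((0, 1), repeat=n))   # 2^n minterms
        self.values = [func(*m) for m in self.minterms]   # function value per minterm
        self.cells = [CElement() for _ in self.minterms]

    def update(self, inputs: Sequence[Rails]) -> Rails:
        fired = []
        for minterm, cell in zip(self.minterms, self.cells):
            # For each variable, pick the rail that is high when it equals the minterm bit.
            picked = [r[0] if bit == 1 else r[1] for bit, r in zip(minterm, inputs)]
            fired.append(cell.update(picked))
        rail_one = int(any(f for f, v in zip(fired, self.values) if v == 1))
        rail_zero = int(any(f for f, v in zip(fired, self.values) if v == 0))
        return (rail_one, rail_zero)


xor = DimsNode(lambda a, b: a ^ b, 2)
print(xor.update([(0, 0), (0, 0)]))  # both inputs in spacer
print(xor.update([(1, 0), (0, 0)]))  # only a = 1 has arrived
print(xor.update([(1, 0), (0, 1)]))  # a = 1, b = 0 both valid
print(xor.update([(0, 0), (0, 1)]))  # a back in spacer, b still valid
print(xor.update([(0, 0), (0, 0)]))  # both back in spacer
```

In this model the output leaves the spacer only after every input is valid and returns to it only after every input is back in the spacer, which matches the strong-constraint behaviour described earlier.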
NCL (Null Convention Logic)
- Library of 27 special gates
- Based on threshold functions
- Any function of up to 4 inputs can be implemented
- … but in dual-rail, 4 inputs = 2 variables only
Direct Logic
- Two-level C-OR DIMS logic implemented as a single gate
- Both positive and complemented outputs are provided
- Different delays for each input
Comparison
- DIMS and Direct logic (columns: Inputs; DIMS Trans., Delay; Direct logic Trans., Delay): N/A90N/A 6896N/A158N/A
- NCL 2-input gate (columns: Trans., Delay): AND, OR215.8; XOR248.6
Multi-Level Dual-Rail Network
- Positive and complemented values of each signal provided
- Each node implemented as DIMS, NCL, or Direct logic
Motivation & Proposed Method
- State-of-the-art
  - Nodes are implemented as simple gates (NAND, XOR)
  - 4 × 2-input gate = 4 × 22 = 88 transistors in Direct logic
Motivation & Proposed Method
- Proposed
  - Nodes are implemented as complex gates
  - 1 × 2-input gate + 1 × 3-input gate = 22 + 34 = 56 transistors
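The saving in this motivating example can be worked out directly from the two totals quoted on the slides (88 transistors for four simple 2-input gates, 56 for the complex-gate version). The variable names below are illustrative only, and the figure applies to this one example, not to the benchmark results reported later.

```python
# Transistor counts quoted on the slides (Direct logic).
SIMPLE_GATE_2IN = 22                  # one 2-input gate
simple_total = 4 * SIMPLE_GATE_2IN    # four 2-input simple gates
complex_total = 56                    # one 2-input + one 3-input complex gate

saved = simple_total - complex_total
saving_pct = 100.0 * saved / simple_total

print(f"{saved} transistors saved ({saving_pct:.1f}%)")
# prints "32 transistors saved (36.4%)"
```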
Motivation & Proposed Method
- State-of-the-art
  - Nodes are implemented as simple gates (NAND, XOR)
- Proposed
  - Nodes are implemented as complex gates, i.e. gates with a given number of inputs and an arbitrary function
  - Can be implemented both in DIMS and Direct logic
  - Like FPGA LUTs
  - Tools for synchronous synthesis can be used
    - FPGA mapping
Where’s the Problem?
- Facts: increasing the number of node inputs will
  - Decrease the number of nodes
  - Decrease the number of levels
  - Increase the node size
  - Increase the node delay
- Question: Where is the trade-off?
Experimental Setup
- 228 circuits processed (MCNC, ISCAS)
- Optimized by the ABC choice script
- Mapped into k-input NANDs (ABC map command): state-of-the-art (k-NAND)
- Mapped into k-LUTs (ABC fpga command): complex gates (k-CG)
- Mapped into MCNC standard cells (ABC map): something in-between (SC)
- k = 2…6
- Implemented as DIMS, Direct logic, and NCL
Results – DIMS – Area
Results – DIMS – Delay
Discussion – DIMS
- Implementation using arbitrary 2-input gates is the best one, both in area and delay
  - No big surprise: complexity (and delay) of DIMS grows exponentially with the number of gate inputs
- Results are consistent: the more node inputs, the higher the area and delay
Results – Direct Logic – Area
Results – Direct Logic – Delay
Discussion – Direct Logic
- Implementation using 3-input complex gates is the best one, both in area and delay
  - This is a good result confirming our theory
  - Results are consistent, no coincidence
- State-of-the-art 2-NAND implementation is extremely inefficient; compared with it, 3-CG gives
  - 21% area improvement
  - 19% delay improvement
- 3-CG implementation is even better than NCL
  - 10% area improvement
  - 19% delay improvement
Conclusions
- An efficient implementation of asynchronous logic operating under strong constraints is proposed
- Tools (& methods) for synchronous synthesis are used for asynchronous synthesis
- 3-input complex nodes implemented using Direct logic
- Extensive experiments confirmed the theory
  - approx. 20% area and delay improvement vs. all state-of-the-art methods