Direct synthesis of large-scale asynchronous controllers using a Petri-net-based approach
Ivan Blunno (Politecnico di Torino), Alex Bystrov (Univ. Newcastle upon Tyne), Josep Carmona (Univ. Politècnica de Catalunya), Jordi Cortadella (Univ. Politècnica de Catalunya), Luciano Lavagno (Università di Udine), Alex Yakovlev (Univ. Newcastle upon Tyne)
Outline + Motivation Design flow Verilog HDL specification Petri nets and trace expressions Synthesis process Conclusion
Motivation: Language-based design is a key enabler of the success of synchronous logic. Use HDL as a single language for: specification; logic simulation and debugging; synthesis; post-layout simulation. HDL must support multiple levels of abstraction.
Motivation HDL generates large asynchronous controllers: need direct synthesis Guarantee an implementation Automatic exploration of the design space Benefit from existing structural methods for logic synthesis Benefit (at the design stage) from existing performance estimation approaches
Design flow Control/data splitting STG (control) HDL specification Synthesizable HDL (data) Synthesis (petrify) Timing analysis (Synopsys) HDL implementation Synthesis (Synopsys) Logic implementation Delay insertion Logic delays
Design flow What is available? simulators (no synchronous assumption…) logic synthesis (from BFSM, STG, …) layout (almost like synchronous…) What is missing? translator from HDL to synthesis specification model translator from synthesis implementation model to HDL
Other approaches: Special-purpose languages. Pros: syntax and semantics can be tailored to asynchronous Models of Computation (STG, BFSM, process algebras). Cons: not familiar to designers, no standard tool support. Examples: Tangram, Communicating Hardware Processes, Balsa.
Our approach: General-purpose language. Pros: several tools available, broad user base. Cons: syntax and semantics oriented to gates (not STGs, BFSMs or process algebras); need to define a subset for synthesis (full language only good for simulation). Choice: VHDL or Verilog; Verilog is used here [Blunno & Lavagno, ASYNC’00]
Outline Motivation Design flow + Verilog HDL specification Petri nets and trace expressions Synthesis Conclusion
Asynchronous Verilog subset
Module and signal declaration:
    module example(a, b, c, d);
    input a, b[7..0];
    output c, d;
    reg e, f, g[11..0];
Currently only a single module is supported.
An always loop surrounds the live behavior.
An initial block defines the initialization sequence.
Asynchronous Verilog subset Transitions: input signals: wait statement wait(a);... wait (!b); output signals: assignment statement c = a + b; Each statement generates a trace expression and a datapath fragment
Asynchronous Verilog subset Causality relations: Verilog statements begin-end for sequencing fork-join for concurrency if-then-else for input choice Only structured mix of sequencing, concurrency and choice can be specified
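The mapping from structured statements to trace expressions can be illustrated with a minimal Python sketch. The tuple-based mini-AST, the function name `to_te`, and the textual notation (`;` for sequencing, `||` for concurrency, `|` for choice, `+`/`-` suffixes for rising and falling input events) are assumptions made for this illustration and are not taken from the authors' tool.

```python
# Hypothetical mini-AST: every statement is a tuple tagged with its kind.
def to_te(stmt):
    """Translate a structured statement into a trace-expression string."""
    kind = stmt[0]
    if kind == "wait":      # wait(a) -> event a+, wait(!a) -> event a-
        _, signal, level = stmt
        return f"{signal}{'+' if level else '-'}"
    if kind == "assign":    # output assignment -> one control event
        return stmt[1]
    if kind == "begin":     # begin-end -> sequencing
        return "(" + " ; ".join(to_te(s) for s in stmt[1]) + ")"
    if kind == "fork":      # fork-join -> concurrency
        return "(" + " || ".join(to_te(s) for s in stmt[1]) + ")"
    if kind == "if":        # if-then-else -> choice (assumed notation "|")
        return "(" + to_te(stmt[1]) + " | " + to_te(stmt[2]) + ")"
    raise ValueError(f"unsupported statement kind: {kind}")

spec = ("begin", [
    ("wait", "start", True),
    ("fork", [("assign", "R"), ("assign", "RES")]),
    ("assign", "done"),
])
print(to_te(spec))
```

Because each construct maps to exactly one operator, only structured mixes of sequencing, concurrency and choice can be produced, which matches the restriction stated on the slide.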
Example: simple filter
    always begin
        wait(start);
        R = SMP * 3;
        RES = SMP * 4;
        if (b7 == 1) RES = 0;
        else begin
            if (b6 == 1) RES = 1;
        end
        done = 1;
        wait(!start);
        done = 0;
    end
Control-data partitioning Splitting of asynchronous control and synchronous data path Automated insertion of bundling delays CONTROL UNIT DATA PATH delay request acknowledge
Outline Motivation Design flow Verilog HDL specification + Petri nets and trace expressions Synthesis Conclusion
Controller design flow PNTE Circuit Petri Net Transformations Reductions Synthesis HDL Syntax-directed translation
Design flow PNTE Boolean equations Performance Estimation Area Estimation Critical cycles Transformations Cost estimation Structural synthesis
PNTE Free-choice Petri net Transitions are trace expressions Trace expressions represent well-structured event relations –Causality –Concurrency –Choice
Trace expressions (TE): a trace expression is an event e, a sequence TE ; TE, a parallel composition TE || TE, or a choice TE TE. Trace expressions are a subset of CCS agent expressions [Milner 80]
Trace expressions: example: ( a || ( b ; c ) ) || ( d e ) [Figure: expression tree with operator nodes || and ; over the leaves a, b, c, d, e]
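A trace expression is naturally represented as a binary tree whose leaves are events. The sketch below encodes the example as nested tuples; the tags `seq`, `par`, `alt`, the helper names, and the printed choice symbol `|` are assumptions chosen for this illustration.

```python
# Leaves are event names (str); inner nodes are (operator, left, right).
SYMBOL = {"seq": ";", "par": "||", "alt": "|"}  # "|" for choice is assumed

def show(te):
    """Render a trace-expression tree as a fully parenthesized string."""
    if isinstance(te, str):
        return te
    op, left, right = te
    return f"({show(left)} {SYMBOL[op]} {show(right)})"

def events(te):
    """List the events (leaves) of the tree from left to right."""
    if isinstance(te, str):
        return [te]
    return events(te[1]) + events(te[2])

# a in parallel with (b then c), in parallel with a choice between d and e
example = ("par", ("par", "a", ("seq", "b", "c")), ("alt", "d", "e"))
print(show(example))
print(events(example))
```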
From PN to PNTE: Reductions to simplify the net structure. Computing concurrency relations takes –O(n^2) in trace expressions –O(n^3) in free-choice systems [Kovalyov & Esparza]
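The quadratic bound for trace expressions can be made concrete with a small Python sketch. It assumes that the relation between two events is given by the operator at their lowest common ancestor in the expression tree and that event names are unique; the tuple encoding and the function name `relations` are hypothetical.

```python
def relations(te, rel=None):
    """Return (leaves, rel), where rel maps each unordered pair of events
    to the operator at their lowest common ancestor."""
    if rel is None:
        rel = {}
    if isinstance(te, str):
        return [te], rel
    op, left, right = te
    left_events, _ = relations(left, rel)
    right_events, _ = relations(right, rel)
    # Every pair is labelled exactly once, at the node where it first meets,
    # so the total work is proportional to the number of pairs: O(n^2).
    for x in left_events:
        for y in right_events:
            rel[frozenset((x, y))] = op
    return left_events + right_events, rel

# Assumed example: a ; ((b || f) ; (c ; d))
te = ("seq", "a", ("seq", ("par", "b", "f"), ("seq", "c", "d")))
_, rel = relations(te)
print(rel[frozenset(("b", "f"))])
```

Two events are concurrent in this sketch exactly when their entry in `rel` is `"par"`.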
Reductions [Figure: two transitions TE1 and TE2 in series are fused into a single transition labelled TE1 ; TE2]
Reductions [Figure: two transitions TE1 and TE2 in parallel are fused into a single transition labelled TE1 || TE2]
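The series reduction can be sketched in Python as follows. The net encoding (`pre`, `post`, `label` dictionaries and a `marked` set) and the deliberately conservative side conditions (the connecting place is unmarked, is the only output of the first transition and the only input of the second) are assumptions for this illustration; the conditions used by the authors may be more general. The parallel reduction is analogous and omitted.

```python
def fuse_series(pre, post, label, marked):
    """Fuse one pair t1 -> p -> t2 into a transition labelled 'TE1 ; TE2'.
    Returns True if a fusion was performed."""
    for t1 in list(post):
        if len(post[t1]) != 1:
            continue
        (p,) = post[t1]
        producers = [t for t in post if p in post[t]]
        consumers = [t for t in pre if p in pre[t]]
        if p in marked or producers != [t1] or len(consumers) != 1:
            continue
        t2 = consumers[0]
        if t2 == t1 or pre[t2] != {p}:
            continue
        # t1 absorbs t2: labels are sequenced, t1 inherits t2's outputs.
        label[t1] = f"({label[t1]} ; {label[t2]})"
        post[t1] = post.pop(t2)
        del pre[t2], label[t2]
        return True
    return False

# Assumed example: a ring p0 -> a -> p1 -> b -> p2 -> c -> p0, token in p0.
pre = {"a": {"p0"}, "b": {"p1"}, "c": {"p2"}}
post = {"a": {"p1"}, "b": {"p2"}, "c": {"p0"}}
label = {"a": "a", "b": "b", "c": "c"}
while fuse_series(pre, post, label, marked={"p0"}):
    pass
print(label)
```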
Example [Figure: Petri net over the events a, b, c, d, e, f, g, h reduced to a PNTE; transition labels: d; a; ( b || f ) c g; h; e]
Outline Motivation Design flow Verilog HDL specification Petri nets and trace expressions + Synthesis Conclusion
Exploration of the design space Kit of transformations at Petri net –Concurrency reduction –Increase of concurrency –Event hiding Fast cost estimation –Area (Boolean equations) –Performance (critical cycles)
Transformations at the net level Concurrency reduction a fb c d f and b are concurrent !
Transformations at the net level Concurrency reduction a fb c d f and b are ordered !
Transformations at the net level: Concurrency reduction in TE [Figure: net over a, f, b, c, d and its TE tree with operator nodes ; and || over the leaves a, b, c, d, f] Concurrency in TE: b and f have a common parallel ancestor
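Transformations at the net level: Concurrency reduction in TE [Figure: net over a, f, b, c, d and its TE tree with operator nodes ; and || over the leaves a, b, c, d, f] Concurrency reduction: replace the parallel operator with a sequence operator (;)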
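Concurrency reduction on the tree can be sketched as replacing the parallel node at which the two events meet by a sequence node. The tuple encoding, the function names, and the exact shape of the example tree (reconstructed here as a ; ((b || f) ; (c ; d))) are assumptions. Note that the replacement orders the two whole subtrees, not only the two named events.

```python
def leaves(te):
    """Set of events below a node."""
    return {te} if isinstance(te, str) else leaves(te[1]) | leaves(te[2])

def reduce_concurrency(te, first, second):
    """Order `first` before `second` by turning the parallel node at which
    they meet into a sequence node."""
    if isinstance(te, str):
        return te
    op, left, right = te
    l, r = leaves(left), leaves(right)
    if (first in l and second in r) or (first in r and second in l):
        if op != "par":
            raise ValueError("events are not concurrent")
        return ("seq", left, right) if first in l else ("seq", right, left)
    if first in l and second in l:
        return (op, reduce_concurrency(left, first, second), right)
    if first in r and second in r:
        return (op, left, reduce_concurrency(right, first, second))
    return te

te = ("seq", "a", ("seq", ("par", "b", "f"), ("seq", "c", "d")))
print(reduce_concurrency(te, "f", "b"))
```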
Transformations at the net level Increase of concurrency a fb c d c is ordered with f and b!
Transformations at the net level Increase of concurrency a f bc d c, f and b are concurrent!
Transformations at the net level Increase of concurrency in TE a fb c d ; || a bcdf ; ; Increase of concurrency: reorganizing the subtree
Transformations at the net level Increase of concurrency in TE a fb c d Increase of concurrency: reorganizing the subtree ; || a bcdf ; ; d c
Transformations at the net level Increase of concurrency in TE a fb c d ; a Increase of concurrency: reorganizing the subtree ; b || cf d
Transformations at the net level Event hiding a fb c d hiding of b !
Transformations at the net level a f c d b hidden ! Event hiding
Transformations at the net level a fb c d ; || a bcdf ; ; Event hiding : delete the corresponding leaf... Event hiding in TE
Transformations at the net level a fb c d ; a cd ; ; || f Event hiding : delete the corresponding leaf... Event hiding in TE
Transformations at the net level: Event hiding in TE [Figure: net over a, f, b, c, d; TE tree after hiding b, with the single-child || node above f removed] Event hiding: delete the corresponding leaf... and simplify the tree structure
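The two steps of event hiding (delete the leaf, then simplify the tree) fit in one recursive pass. This is a minimal Python sketch under the same assumed tuple encoding; the function name `hide` and the example tree a ; ((b || f) ; (c ; d)) are illustrative assumptions.

```python
def hide(te, event):
    """Remove `event` from the tree; an operator left with a single child
    is replaced by that child. Returns None if nothing remains."""
    if isinstance(te, str):
        return None if te == event else te
    op, left, right = te
    left, right = hide(left, event), hide(right, event)
    if left is None:
        return right
    if right is None:
        return left
    return (op, left, right)

te = ("seq", "a", ("seq", ("par", "b", "f"), ("seq", "c", "d")))
print(hide(te, "b"))
```

In the example, the parallel node above b and f collapses to the leaf f once b is removed.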
Synthesis of control logic for large-scale controllers: Direct translation from Petri net (or STG-h/s-refined) specifications. Logic synthesis from fully refined STGs with pseudo-one-hot encoding, structural techniques and STG-level optimisations.
Why direct translation? Logic synthesis has problems with state space explosion, repetitive and regular structures (log-based encoding approach) Direct translation has linear complexity but can be area inefficient (inherent one-hot encoding) What about performance?
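The one-hot flavour of direct translation can be illustrated behaviourally: one storage cell per place, so the cell count grows linearly with the net. The Python sketch below is only an abstraction of the token game on a safe net, not a gate-level model of any particular cell; the names `fire`, `cells`, `pre` and `post` are assumptions.

```python
def fire(cells, pre, post, t):
    """Fire transition t on a one-hot state (one boolean cell per place)."""
    if not all(cells[p] for p in pre[t]):
        raise ValueError(f"{t} is not enabled")
    new = dict(cells)
    for p in pre[t]:
        new[p] = False   # token leaves the input places
    for p in post[t]:
        new[p] = True    # token arrives at the output places
    return new

# Assumed two-place ring: p1 -> t1 -> p2 -> t2 -> p1
pre = {"t1": ["p1"], "t2": ["p2"]}
post = {"t1": ["p2"], "t2": ["p1"]}
state = {"p1": True, "p2": False}
state = fire(state, pre, post, "t1")
print(state)
```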
Shifter Example (x:=y;y:=a)* [Bystrov et al., 6th UK Async Forum, ’99]
Control logic option                                   Speed (ns)
Refined STG directly synthesized by Petrify            5.4
Circuit decomposition with two D-elements              4.2
Circuit decomposition and Petrify re-synthesis         3.3
Re-synthesis with relative timing                      1.7
Direct Translation of Petri Nets Previous work dates back to 70s Synthesis into event-based (2-phase) circuits (similar to micropipeline control) –S.Patil, F.Furtek (MIT) Synthesis into level-based (4-phase) circuits (similar to synthesis from one-hot encoded FSMs) –R. David (’69, translation FSM graphs to CUSA cells) –L. Hollaar (’82, translation from parallel flowcharts) –V. Varshavsky et al. (’90,’96, translation from PN into an interconnection of David Cells)
David’s original approach [Figure: fragment of a flow graph with states a, b, c, d and conditions x1, x2, x’1, x’2, next to the CUSA cell storing state b with signals ya, yb, yc]
Hollaar’s approach K L A B K N M L N Fragment of flow-chart One-hot circuit cell A B (0) (1) 1 1 M
Hollaar’s approach K L M A B K N M L N Fragment of flow-chart One-hot circuit cell A B (1) 0 1
Varshavsky’s Approach p1p2 p1 p2 (1)(0) (1) 1* (1) Operation Controlled To Operation
Varshavsky’s Approach p1p2 p1 p2 (1)(0) 0->11->0 (1)
Varshavsky’s Approach p1p2 p1 p2 1->0 0->1 1->0 1->0->1 1*
Translation in brief: This method has been used to design the control of a token ring adaptor [Yakovlev et al., Async. Design Methods, 1995]. The controller comprised about 80 David cells with 50 controlled handshakes.
Direct translation examples: In this work we tried direct translation: From an STG-refined specification (VME bus controller) –Worse than logic synthesis. From a largish abstract specification with a high degree of repetition (mod-6 counter) –Considerable gain over logic synthesis. From a small concurrent specification with dense coding space (“butterfly” circuit) –Similar to or better than logic synthesis.
Example 1: VME bus controller Result of direct translation (DC unoptimised):
VME bus controller After DC-optimisation (in the style of Varshavsky et al WODES’96)
David Cell library
“Data path” control logic Example of interface with a handshake control (DTACK, DSR/DSW):
Ex 2: “Flat” mod-6 Counter. TE-like specification: ((p?;q!)^5; p?; c!)* [Figure: 5-safe Petri net with transitions p?, q!, c! and arc weights 5]
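Read as a trace, the specification says that each of the first five inputs p is answered by q and the sixth by c, after which the pattern repeats. A minimal Python sketch of the expected output sequence (the generator name `mod6_outputs` is hypothetical):

```python
from itertools import islice

def mod6_outputs():
    """Yield the output that answers each successive input p."""
    while True:
        for _ in range(5):
            yield "q"   # (p? ; q!) repeated five times
        yield "c"       # sixth p? is answered by c!

print(list(islice(mod6_outputs(), 12)))
```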
“Flat” mod-6 Counter Refined (by hand) and optimised (by Petrify) Petri net:
“Flat” mod-6 counter Result of direct translation (optimised by hand):
David Cells and Timed circuits: (a) Speed-independent (b) With Relative Timing
“Flat” mod-6 counter: (a) speed-independent (b) with relative timing
“Butterfly” circuit a+a- b- dummy b+ Initial Specification: STG after CSC resolution: a+ a- b+ b- x+ x- y+ y- z+ z-
“Butterfly” circuit Speed-independent logic synthesis solution:
“Butterfly” circuit Speed-independent DC-circuit:
“Butterfly” circuit DC-circuit with aggressive relative timing:
Comparison with logic synthesis
Example                                       Logic synthesis   DC-translation
VME-bus (overall operation cycle)             6 ns              11 ns
Mod-6 count (p->q/c, worst case cycle)        >5 ns             1.6 ns
Butterfly (with RT, operation cycle)          2 ns              1.8 ns
DC control with Relative Timing DC op1op2
DC control with Relative Timing [Figure: David cells (DC) controlling operations op1 and op2]
David Cell type                    Token shift time
Speed-independent                  1.2 ns
Mild RT (fast bkwd reset)          0.8 ns
Aggressive RT (fast fwd set)       0.4 ns
Synthesis: Encoding based on a David-cell approach. Transformations to improve area and performance. Structural methods to derive a circuit [Pastor et al., Transactions on CAD, Nov ’98]
Synthesis x+ z+ z- y- x- y+ p1 p2 p3 p4 p5 p6 p7 Next-state function of signal y ?
Synthesis x+ z+ z- y- x- y+ p1 p2 p3 p4 p5 p6 p7 Next-state function of signal y ? y = x + z
Synthesis example: VME bus Device LDS LDTACK D DSr DSw DTACK VME Bus Controller Data Transceiver Bus DSr LDS LDTACK D DTACK Read Cycle
Synthesis example: VME bus LDTACK+ D+ DTACK+ DSr- D- DTACK-LDS- LDTACK-DSr+ LDS+ READ CYCLE SPECIFICATION LDTACK+ D+ DTACK+ DSr- D- DTACK- LDS- LDTACK- DSr+ LDS+ csc0- csc0+ PETRIFY ( Optimizing Performance )
Synthesis example: VME bus [Figure: read-cycle STG with state signals p1 to p11 inserted between the transitions LDS+, LDTACK+, D+, DTACK+, DSr-, D-, DTACK-, LDS-, LDTACK-, DSr+]
Synthesis example: VME bus [Figure: the fully encoded STG next to a version in which only the state-signal transitions p9+ and p9- remain]
Synthesis example: VME bus [Figure: STG over ldtack, lds, d, dtack, dsr with the state signal p9]
Cost estimation: Heuristics: –AREA: {# literals in each Excitation Region} –PERFORMANCE: length of the critical cycle in the net. Exploration of the design space is guided by cost estimations.
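One common way to quantify the critical cycle of a marked graph is the largest ratio of accumulated delay to tokens over all simple cycles. The Python sketch below uses that measure as an assumption; the slides only state that the length of the critical cycle is used. Event delays, token counts, and the function name `critical_cycle` are hypothetical, and the brute-force enumeration is only suitable for small nets.

```python
def critical_cycle(delay, tokens):
    """delay: event -> delay; tokens: (u, v) -> tokens on the arc u -> v.
    Returns (ratio, cycle) for the simple cycle with the largest
    total delay per token. Token-free cycles are skipped."""
    succ = {}
    for (u, v) in tokens:
        succ.setdefault(u, []).append(v)
    best = (0.0, None)

    def dfs(start, node, path, toks):
        nonlocal best
        for nxt in succ.get(node, []):
            t = toks + tokens[(node, nxt)]
            if nxt == start:
                if t > 0:
                    ratio = sum(delay[n] for n in path) / t
                    if ratio > best[0]:
                        best = (ratio, list(path))
            elif nxt not in path and nxt > start:
                # Only visit events ordered after `start`, so each cycle
                # is enumerated once, from its smallest event.
                path.append(nxt)
                dfs(start, nxt, path, t)
                path.pop()

    for s in sorted(delay):
        dfs(s, s, [s], 0)
    return best

delay = {"a": 1, "b": 2, "c": 3}
tokens = {("a", "b"): 0, ("b", "a"): 1, ("b", "c"): 0, ("c", "a"): 1}
print(critical_cycle(delay, tokens))
```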
Performance estimation: critical cycles e a b c d f g h i j k e a b c d f g h i j k Marked-Graph Decomposition
Conclusions: Fully automated design flow –From HDLs (control / data splitting) –Existing tools for data-path synthesis –Direct synthesis guarantees implementation (HDL to Petri net, Petri-net-based encoding) –Synthesis of large controllers by efficient spec models (free-choice Petri nets + trace expressions) –Exploration of the design space (optimization) by property-preserving transformations –Logic synthesis by structural methods