Published by Timothy Miller. Modified over 9 years ago.
1
Electrical and Computer Engineering
Optimization of Data-Flow Computations Using Canonical TED Representation
Muhammad Noman Ashraf
ECE 667 Synthesis and Verification of Digital Systems, Spring 2011
Paper: M. Ciesielski, D. Gomez-Prado, Q. Ren, J. Guillot and Emmanuel Boutillon, “Optimization of Data-Flow Computations Using Canonical TED Representation”, in IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems
Slides adapted from D. Gomez-Prado, Q. Ren, M. Ciesielski, J. Guillot and Emmanuel Boutillon, “Optimizing Data Flow Graphs to Minimize Hardware Implementations”, DATE (2009)
2
Overview
- Motivation
- TED Review
- Related Work
- TED Decomposition System
- TED Linearization
- Product-Term Extraction
- Sum-Term Extraction
- Reordering
- DFG Generation
- Replacing Constant Multipliers by Shifters
- Conclusion
- References
3
Motivation
- Expanded form: F = a·f·g + a·f·d·c + a·c·e·g. Number of operations: 8 MPY, 2 ADD
- Minimum number of operations: F = a·(f·(g + d·c) + c·e·g). Number of operations: 5 MPY, 2 ADD; latency L = 3 MPY + 2 ADD (5 steps)
- Lower latency: F = (a·f)·(g + d·c) + (a·c)·e·g. Number of operations: 6 MPY, 2 ADD; latency L = 3 MPY + 1 ADD (4 steps)
- Res: 2 MPY, 1 ADD
Slide adapted from M. Ciesielski, D. Gomez-Prado, Q. Ren, J. Guillot and Emmanuel Boutillon, “Optimizing Data Flow Graphs to Minimize Hardware Implementations”, DATE (2009)
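The operation counts and latencies quoted on this slide can be checked with a small expression-tree sketch. The helper names (`mul`, `add`, `count_ops`, `depth`, `evaluate`) are made up for this illustration and are not part of TDS. Latency is taken here as the depth of the tree, assuming every operation takes one step and ignoring resource limits.

```python
def mul(*xs):
    """Build a left-to-right chain of binary multiplications."""
    node = xs[0]
    for x in xs[1:]:
        node = ("*", node, x)
    return node

def add(*xs):
    """Build a left-to-right chain of binary additions."""
    node = xs[0]
    for x in xs[1:]:
        node = ("+", node, x)
    return node

def count_ops(e):
    """Return (multiplications, additions) used by the expression tree."""
    if isinstance(e, str):
        return (0, 0)
    op, left, right = e
    ml, al = count_ops(left)
    mr, ar = count_ops(right)
    return (ml + mr + int(op == "*"), al + ar + int(op == "+"))

def depth(e):
    """Critical path length, assuming one step per operation (assumption)."""
    if isinstance(e, str):
        return 0
    return 1 + max(depth(e[1]), depth(e[2]))

def evaluate(e, env):
    """Evaluate the tree with variable values taken from env."""
    if isinstance(e, str):
        return env[e]
    op, left, right = e
    x, y = evaluate(left, env), evaluate(right, env)
    return x * y if op == "*" else x + y

# The three forms of F from the slide.
expanded = add(mul("a", "f", "g"), mul("a", "f", "d", "c"), mul("a", "c", "e", "g"))
min_ops = mul("a", add(mul("f", add("g", mul("d", "c"))), mul("c", "e", "g")))
low_latency = add(mul(mul("a", "f"), add("g", mul("d", "c"))),
                  mul(mul("a", "c"), "e", "g"))

for name, expr in [("expanded", expanded), ("min_ops", min_ops),
                   ("low_latency", low_latency)]:
    print(name, count_ops(expr), depth(expr))
```

The trade-off is visible in the numbers: the form with the fewest multiplications has the longer critical path, while one extra multiplication shortens the schedule by a step.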
4
TED Review [Construction]
[Figure: the TED is built from the sub-expressions z·u, q·w, (z·u + q·w), x·(z·u + q·w), p·w^2 and y·w, giving F = x·(z·u + q·w) + p·w^2 + y·w.]
- Canonical for the given order: x, z, u, q, p, y, w
- Notation: edge labels ^1 and ^2 give the power of the variable (w, w^2); the w^2 edge makes this TED NON-LINEAR
5
RELATED WORK
- HDL compilers
- High-level synthesis systems: Cyber, Spark, Catapult C
  - Lack local optimality
- Kernel-based decomposition [Hosangadi et al., Optimizing Polynomial Expressions by algebraic factorization and cse, IEEE Transactions 2005]
  - Lacks canonicity
- Cut-based decomposition (TED based) [Askar et al., “Data-flow transformations using Taylor expansion diagrams,” in Proc. Des. Autom. Test Eur., 2007]
  - Limitation: only applicable to TEDs with the disjoint decomposition property
6
Cut-based decomposition (Related Work)
- Top-down approach
- Apply a series of cuts (additive and multiplicative) to the edges such that each cut separates the graph into two disjoint sub-graphs
- A different sequence of cuts results in a different DFG
- Sequence: A3, A1, M1, A2
7
Cut-based decomposition (Related Work)
- Top-down approach
- Apply a series of cuts (additive and multiplicative) to the edges such that each cut separates the graph into two disjoint sub-graphs
- A different sequence of cuts results in a different DFG
- Sequence: A1, A3, M1, A2
- Sequence: A3, A1, M1, A2
8
TED decomposition [TDS]
- The cut-based decomposition mentioned earlier only works for TEDs with the disjoint decomposition property
  - Many TEDs don't have this property
- New approach: bottom-up
  - Identify algebraic operations and extract them from the graph
  - Also works for TEDs without the disjoint decomposition property
- TED-based factorization, CSE, and decomposition are jointly referred to as TED decomposition
- Systematically involves:
  - Linearization
  - Product-term extraction
  - Sum-term extraction
  - Reordering
  - DFG generation
9
TDS System Overview
[Figure: block diagram of the TDS flow alongside the HLS flow. Labels recoverable from the diagram:]
- Inputs: matrix transforms, polynomials, C, behavioral HDL
- DFG extraction; original DFG
- Functional TED; structural elements; structural DFG
- TED-based transformations: TED linearization, variable ordering, TED factorization & decomposition, constant multiplication & shifter generation, common subexpression elimination (CSE)
- DFG-based transformations: static timing analysis, latency optimization, resource constraints
- Behavioral transformations; design objectives; design constraints
- Optimized DFG; TDS netlist
- High-level synthesis (GAUT); RTL VHDL
10
TED Linearization
- A TED naturally represents a polynomial in its factored form
- This efficiency is missing when considering non-linear expressions
- Example: F = a^2·c + a·b·c
  - a could be factored out
  - Split a^2 into a1 and a2
  - F = a1·(a2 + b)·c
11
TED Linearization [back to previous example]
[Figure: the example TED before and after linearization.]
- Split w^2 into w1 and w2
- The linearized TED is the input to TED decomposition
12
TED Linearization [Concept]
[Figure: a node x with edges ^0, ^1, ..., ^n to F0, F1, ..., Fn is replaced by a chain of nodes x1, x2, ..., xn, each with edges ^0 and ^1, leading to F0, F1, ..., F(n-1), Fn.]
- Split x^k = x1·x2·x3·...·xk, where xi = xj for all i, j
- Iteratively perform splitting on high-order nodes
- The above substitution results in Horner form, which contains the minimum number of multiplications
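The Horner form that linearization produces can be illustrated on an ordinary polynomial. In this sketch plain numeric coefficients stand in for the sub-functions F0..Fn (an assumption made for the example; in a TED they are sub-graphs). The function names are made up for the illustration. It compares the multiplication count of Horner evaluation with term-by-term evaluation.

```python
def horner(coeffs, x):
    """Evaluate coeffs[0] + coeffs[1]*x + ... + coeffs[n]*x**n in Horner form.

    Returns (value, number_of_multiplications).
    """
    acc = coeffs[-1]
    mults = 0
    for c in reversed(coeffs[:-1]):
        acc = acc * x + c  # one multiplication per remaining coefficient
        mults += 1
    return acc, mults

def term_by_term(coeffs, x):
    """Evaluate the same polynomial term by term, forming each power of x
    by repeated multiplication. Returns (value, number_of_multiplications)."""
    total, mults = coeffs[0], 0
    for k in range(1, len(coeffs)):
        power = x
        for _ in range(k - 1):
            power *= x
            mults += 1
        total += coeffs[k] * power
        mults += 1
    return total, mults

coeffs = [3, 0, 2, 5]  # 3 + 2*x**2 + 5*x**3
print(horner(coeffs, 2))
print(term_by_term(coeffs, 2))
```

For degree n, the Horner form needs n multiplications, while the term-by-term form above needs n(n+1)/2.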
13
Product Term Extraction
- Extractable product term: a product of variables that appears in the expression only once
  - Can be extracted from the TED without duplicating any of its variables
- In the TED: a set of nodes connected by a series of multiplicative edges
  - Only the starting and ending nodes can have incident additive edges
  - Starting and ending nodes can have more than one incoming or outgoing multiplicative edge
  - The ending node can be the terminal node 1
- [TDS] recursively identifies such terms by traversing the graph in a bottom-up fashion
  - For each node, a depth-first approach is used for including nodes in the product term
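The chain condition above can be sketched on a toy graph. This is a simplified illustration, not the TDS implementation. The dictionary layout (`var`, `mul`, `add`), the terminal name `"ONE"` and the function names are assumptions made for the example. For brevity the sketch walks down from a chosen start node rather than traversing the graph bottom-up as TDS does.

```python
def in_degrees(ted):
    """Count incoming edges (multiplicative and additive) for every node."""
    deg = {}
    for node in ted.values():
        for child in (node["mul"], node["add"]):
            if child is not None:
                deg[child] = deg.get(child, 0) + 1
    return deg

def product_term(ted, start):
    """Follow multiplicative edges from `start`, collecting variables while
    each next node has a single parent and no additive edge of its own.

    Returns (variables_in_term, ending_node).
    """
    deg = in_degrees(ted)
    term = [ted[start]["var"]]
    node = ted[start]["mul"]
    while node in ted and deg.get(node, 0) == 1 and ted[node]["add"] is None:
        term.append(ted[node]["var"])
        node = ted[node]["mul"]
    return term, node

# Toy linear TED fragment for z*u + q*w (hypothetical encoding):
# each node means add_child + var * mul_child, and "ONE" is the terminal 1.
ted = {
    "z": {"var": "z", "mul": "u", "add": "q"},
    "u": {"var": "u", "mul": "ONE", "add": None},
    "q": {"var": "q", "mul": "w", "add": None},
    "w": {"var": "w", "mul": "ONE", "add": None},
}

print(product_term(ted, "z"))
print(product_term(ted, "q"))
```

If a node such as `w` had a second parent, the walk would stop in front of it: a shared node can end a product term but cannot be absorbed into it without duplicating the variable.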
14
Product-Term Extraction [back to example]
[Figure: bottom-up traversal of the example TED, with product terms P1 and P2 marked.]
- Start at u: u has only one * parent ... YES; u has only one child path ... YES: CONTINUE
- At z: z has only one * parent ... YES; z has only one * child path ... NO: BACKTRACK
- Extracted product term: P1 = z·u
15
Sum Term Extraction
- Extractable sum term: a sum of variables that appears in the expression only once
  - Can be extracted from the TED without duplicating any of its variables
- In the TED: “Set of nodes incident to multiplicative edges joined at a single common node, such that nodes in question are connected by a chain of additive edges only”
- [TDS] recursively identifies such terms by traversing the graph in a bottom-up fashion
  - For each node, make a list of incident nodes and extract the nodes from the list if they are connected by additive edges only
- [TDS] uses the associativity property of addition
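The quoted condition can be sketched on a toy graph. As before, this is a simplified illustration under assumed names (`var`, `mul`, `add`, `"ONE"`, `sum_term`), not the TDS data structure. It starts from one node and follows additive edges, where TDS works from the list of nodes incident to a common node.

```python
def sum_term(ted, start):
    """Follow additive edges from `start`, collecting variables while each
    node's multiplicative edge leads to the same common node.

    Returns (variables_in_term, common_node).
    """
    common = ted[start]["mul"]
    term = []
    node = start
    while node in ted and ted[node]["mul"] == common:
        term.append(ted[node]["var"])
        node = ted[node]["add"]
    return term, common

# Toy linear TED for (b + d) * c (hypothetical encoding):
# each node means add_child + var * mul_child, and "ONE" is the terminal 1.
ted = {
    "b": {"var": "b", "mul": "c", "add": "d"},
    "d": {"var": "d", "mul": "c", "add": None},
    "c": {"var": "c", "mul": "ONE", "add": None},
}

print(sum_term(ted, "b"))
```

Here `b` and `d` are linked by an additive edge and both multiply into `c`, so `b + d` can be pulled out as a single sum term.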
16
Sum-Term Extraction [back to example]
[Figure: example TED with the starting node marked and sum term S1 extracted; the remaining support is kept (irreducible).]
18
Example to illustrate Associativity*
[Figure: TED from which the sum terms S1 = b + d and S2 = a + c are extracted.]
19
Sum-Term Extraction [cont. - back to example]
- If sum-term extraction results in more product terms, go back
- Stop when the TED is irreducible
- Now generate the DFG (to be explained later)
20
Reordering [Back to previous example -> Iteration 2 extraction]
[Figure: example TED with extracted terms P3, P4, P5, S2, S3.]
- Stop when the TED is irreducible
21
Normal Factored Form*
F = S3 = P5 + P4
  = x·S2 + w1·S1
  = x·(P1 + P3) + w1·(P2 + y)
  = x·(z·u + q·w1) + w1·(p·w2 + y)
  = x·(z·u + q·w) + w·(p·w + y)
- Total: 5 MPY, 3 ADD
- The factored form associated with a TED is called the normal factored form (NFF) for that TED if the order of variables in the factored form is compatible with the order in the given TED
- Theorem: the NFF derived from a linear TED is unique (canonical)
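The substitution chain on this slide can be checked numerically. The sketch below rebuilds F from the extracted terms P1..P5 and S1..S3 exactly as they are named on the slide, counts the operations, and compares the result with the expanded polynomial x·z·u + x·q·w + p·w^2 + y·w obtained by multiplying out the last line. The function names and the counting helpers are made up for the illustration.

```python
def nff(x, z, u, q, p, y, w):
    """Evaluate F through the extracted terms and count the operations used.

    Returns (value, {"MPY": ..., "ADD": ...}).
    """
    counts = {"MPY": 0, "ADD": 0}

    def mpy(a, b):
        counts["MPY"] += 1
        return a * b

    def add(a, b):
        counts["ADD"] += 1
        return a + b

    P1 = mpy(z, u)      # P1 = z*u
    P3 = mpy(q, w)      # P3 = q*w1 (w1 and w2 both carry the value of w)
    P2 = mpy(p, w)      # P2 = p*w2
    S1 = add(P2, y)     # S1 = P2 + y
    S2 = add(P1, P3)    # S2 = P1 + P3
    P4 = mpy(w, S1)     # P4 = w1*S1
    P5 = mpy(x, S2)     # P5 = x*S2
    S3 = add(P5, P4)    # F = S3 = P5 + P4
    return S3, counts

def expanded(x, z, u, q, p, y, w):
    """The same polynomial multiplied out."""
    return x * z * u + x * q * w + p * w * w + y * w

args = (2, 3, 5, 7, 11, 13, 17)
print(nff(*args), expanded(*args))
```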
22
DFG Generation and Optimization
- Transform each irreducible TED into a simple DFG
  - Additive edge -> addition operation
  - Multiplicative edge -> multiplication operation
  - Break multiple-operand operations into a chain of operations
- [TDS] maintains a hash table for DFG nodes, keyed by the corresponding function
  - Helps in reusing a node if the same function/expression is found again
  - Captures redundancy due to poor variable order during factorization
- The DFG is not unique
  - It can be restructured and balanced to minimize cost
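The node-reuse idea can be sketched with a small hash-consing builder. This is an illustration under stated assumptions, not the TDS code. The class `DFG` and its methods are hypothetical, and the table is keyed by the structure of the operation (operator plus operand ids) as a stand-in for keying by the function itself.

```python
class DFG:
    """Minimal data-flow graph builder that reuses structurally equal nodes."""

    def __init__(self):
        self.nodes = []   # node id -> ("var", name) or (op, left_id, right_id)
        self.table = {}   # structural key -> node id

    def _intern(self, key):
        """Return the id of the node with this key, creating it if needed."""
        if key not in self.table:
            self.table[key] = len(self.nodes)
            self.nodes.append(key)
        return self.table[key]

    def var(self, name):
        return self._intern(("var", name))

    def op(self, kind, *operands):
        """Break an n-ary '+' or '*' into a left-to-right chain of binary nodes."""
        acc = operands[0]
        for nxt in operands[1:]:
            left, right = sorted((acc, nxt))  # '+' and '*' are commutative
            acc = self._intern((kind, left, right))
        return acc

    def operator_count(self):
        return sum(1 for n in self.nodes if n[0] != "var")

g = DFG()
z, u, q, w = (g.var(name) for name in "zuqw")
p1 = g.op("*", z, u)
again = g.op("*", u, z)            # same product, operands swapped
s = g.op("+", p1, g.op("*", q, w))
print(p1 == again, g.operator_count())
```

Because the second request for `z*u` hits the table, no new multiplier node is created, which is how repeated sub-expressions end up shared in the generated DFG.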
23
Data Flow Graph
[Figure: DFG of the factored form, scheduled in steps 1 to 4.]
- Total: 5 MPY, 3 ADD
- Reordering cost: L = 2 MPY + 2 ADD; Req 3 MPY, 2 ADD
24
Reordering [-> Iteration 3 extraction]
[Figure: reordered TED with extracted terms S2, P3, P4, S3; annotated L = 2 MPY + 2 ADD, Req 3 MPY, 2 ADD.]
- Cost evaluation involves:
  - Reordering of variables
  - Extraction
  - DFG generation
  - Annotating latency and resource requirements
25
Generating and evaluating new Data Flow Graph [Iteration 3]
F = S3 = P4 + P3
  = w·S2 + x·P1
  = w·(q + S1) + x·(z·u)
  = w·(q + P2 + y) + x·z·u
  = w·(q + p·w + y) + x·z·u
- Total: 4 MPY, 3 ADD
- [Figure: two schedules of the new DFG (steps 1 to 4 and 1 to 5), annotated L = 2 MPY + 2 ADD and L = 2 MPY + 3 ADD; Req 1 MPY, 1 ADD.]
- Reordering cost: L = 2 MPY + 2 ADD; Req 2 MPY, 1 ADD
- Previous cost: L = 2 MPY + 2 ADD; Req = 3 MPY, 2 ADD
26
Reordering [-> Iteration 4 extraction, DFG generation]
[Figure: DFG scheduled in steps 1 to 4.]
- Through reordering, all cases can be obtained
- Design space exploration
27
Replacing constant multipliers* by shifters
- Transform constant multiplications into shifters, while considering factorization involving shifters
- Steps:
  - Represent each constant in CSD format, using a shift variable L^i (instead of 2^i) for shifting i bits
  - Generate the TED with shift variables, linearize it, and perform decomposition
  - Replace terms involving shift variables (L^i) by i-bit shifters
- Example: 7a + 6b = (L^3 - 1)·a + (L^3 - L)·b = L^3·(a + b) - L·b - a = ((a + b) << 3) - (a + (b << 1))
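The example on this slide can be reproduced with a short sketch. The digit recoding below is the standard non-adjacent-form construction of a CSD representation. The function names are made up for the illustration, and the sketch works on Python integers rather than on a TED.

```python
def csd_digits(n):
    """CSD digits of a positive integer, least significant first.

    Each digit is -1, 0 or 1, and no two adjacent digits are non-zero.
    """
    digits = []
    while n != 0:
        if n % 2:
            d = 2 - (n % 4)   # +1 if n % 4 == 1, -1 if n % 4 == 3
            n -= d
        else:
            d = 0
        digits.append(d)
        n //= 2
    return digits

def shift_multiply(x, constant):
    """Multiply x by a positive constant using only shifts, adds and subtracts."""
    total = 0
    for i, d in enumerate(csd_digits(constant)):
        if d == 1:
            total += x << i
        elif d == -1:
            total -= x << i
    return total

def factored(a, b):
    """7a + 6b in the factored shifter form from the slide."""
    return ((a + b) << 3) - (a + (b << 1))

print(csd_digits(7), csd_digits(6))
print(shift_multiply(5, 7), factored(3, 4))
```

The digits show 7 = 8 - 1 and 6 = 8 - 2. Factoring out the common shift by 3 leaves two shifters and three additions or subtractions for the whole expression, with no multiplier.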
28
TDS - TED Decomposition System RECAP
- Read in the CDFG file (cdfg), a polynomial expression (poly), or pre-coded DSP transforms (tr)
- Translate into a functional TED (dfg2ted) and structural elements (comparators etc.)
- Linearize its data path (linearize)
- Iterate:
  - Iterate: product-term extraction, sum-term extraction
  - Reorder to minimize latency (reorder)
- Result: a set of irreducible TEDs
- Produce the final DFG (ted2dfg) and annotate back the CDFG file (write)
- Target: data-flow and computation-intensive designs (DSP)
- Design space exploration
29
Conclusion
- Results in the paper show a 15% latency improvement and a 7% area reduction when using the DFG generated by TDS instead of the one from KBD (kernel-based decomposition)
- Far better results when compared to the original DFG
- TDS serves as a front end to GAUT
- Fundamental limitation: decomposition depends on variable reordering, which is an expensive operation
30
REFERENCES
- M. Ciesielski, D. Gomez-Prado, Q. Ren, J. Guillot and Emmanuel Boutillon, “Optimization of Data-Flow Computations Using Canonical TED Representation”, in IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems
- M. Ciesielski, S. Askar, D. Gomez-Prado, J. Guillot, and E. Boutillon, “Data-flow transformations using Taylor expansion diagrams,” in Proc. Des. Autom. Test Eur., 2007, pp. 455–460
- TDS: TED-Based Dataflow Decomposition System, Univ. Massachusetts, Amherst, MA. [Online]. Available: http://www.ecs.umass.edu/ece/labs/vlsicad/tds.html
31
QUESTIONS?
32
Experiment Setup*
[Figure: the TDS flow / HLS flow block diagram from the TDS System Overview slide, with the three compared DFG sources labeled KBD, ORIGINAL and TED.]
33
Results*
[Figure: results, including KBD.]
34
Results: Quintic Spline*
[Figure: results for the quintic spline, including KBD.]
35
Results: Quartic Spline*
[Figure: results for the quartic spline, including KBD.]
36
Improvement over KBD and Original*
[Figure: improvement of TDS over KBD and over the original DFG.]