Factoring and Eliminating Common Subexpressions in Polynomial Expressions International Conference on Computer Aided Design (ICCAD), 2004 Farzan Fallah.

Factoring and Eliminating Common Subexpressions in Polynomial Expressions
International Conference on Computer Aided Design (ICCAD), 2004
Farzan Fallah, Advanced CAD Research, Fujitsu Labs. of America
Anup Hosangadi, Ryan Kastner, ECE Department, UCSB

Outline Introduction Related Work Algebraic techniques for redundancy elimination Experimental results Conclusions

Introduction Embedded system applications need to compute polynomial expressions –Continuous functions can be approximated by polynomials to desired degree of accuracy. –Adaptive signal processing (Polynomial filters ) –Polynomial interpolation/extrapolation in Computer Graphics –Encryption

Introduction
Multiplications are expensive in embedded systems
No good optimization tool exists for reducing the complexity of polynomials
–Designers rely on hand-optimized libraries
Conventional optimization techniques
–CSE, value numbering: not suited for polynomials
–Horner form: most popular representation
–a_n x^n + a_{n-1} x^{n-1} + … + a_1 x + a_0 = (…((a_n x + a_{n-1})x + a_{n-2})x + … + a_1)x + a_0
–Not good for multivariate polynomials
–Handles only a single polynomial expression at a time
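As a minimal sketch of the Horner form on this slide (the function names `horner` and `naive` are illustrative, not from the talk), the nested evaluation needs one multiplication per coefficient instead of recomputing each power:

```python
def horner(coeffs, x):
    """Evaluate a polynomial given coefficients [a_n, ..., a_1, a_0]."""
    result = 0
    for a in coeffs:
        # One multiplication and one addition per coefficient.
        result = result * x + a
    return result


def naive(coeffs, x):
    """Evaluate the same polynomial term by term, for comparison."""
    n = len(coeffs) - 1
    return sum(a * x ** (n - i) for i, a in enumerate(coeffs))


coeffs = [2, -3, 0, 5]  # 2x^3 - 3x^2 + 5
print(horner(coeffs, 4), naive(coeffs, 4))  # prints "85 85"
```

This only covers the univariate, single-expression case, which is exactly the limitation the slide points out.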

Introduction
Quartic-spline polynomial (3-D graphics)
P = zu^4 + 4avu^3 + 6bu^2v^2 + 4uv^3w + qv^4
Horner form (from Maple™)
P = zu^4 + (4au^3 + (6bu^2 + (4uw + qv)v)v)v (17 multiplications)
Proposed algebraic method:
d1 = v^2; d2 = 4v
P = u^3(uz + a*d2) + d1(q*d1 + u(w*d2 + 6bu)) (11 multiplications)
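The three forms of the quartic-spline polynomial can be checked against each other numerically. The sketch below is only a sanity check, not the authors' tool; the function names are illustrative, and it assumes d2 = 4v, which is the value for which the factored form expands back to P.

```python
import random


def quartic_direct(u, v, w, z, a, b, q):
    """Sum-of-products form of the quartic-spline polynomial."""
    return z*u**4 + 4*a*v*u**3 + 6*b*u**2*v**2 + 4*u*v**3*w + q*v**4


def quartic_horner(u, v, w, z, a, b, q):
    """Horner form in v, as quoted on the slide."""
    return z*u**4 + (4*a*u**3 + (6*b*u**2 + (4*u*w + q*v)*v)*v)*v


def quartic_factored(u, v, w, z, a, b, q):
    """Factored form with shared subexpressions d1 and d2."""
    d1 = v * v
    d2 = 4 * v  # assumption: d2 = 4v
    u3 = u * u * u
    return u3*(u*z + a*d2) + d1*(q*d1 + u*(w*d2 + 6*b*u))


rng = random.Random(0)
for _ in range(100):
    args = [rng.randint(-9, 9) for _ in range(7)]
    assert quartic_direct(*args) == quartic_horner(*args) == quartic_factored(*args)
```

Integer inputs are used so that the comparison is exact rather than subject to floating-point rounding.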

Related Work
Expression factorization (M. A. Breuer, JACM '69)
–Allows only one kind of operator at a time
Symbolic algebra techniques (A. Peymandoust, De Micheli, DAC '01)
–Used for mapping DSP datapaths (polynomials) to library elements
–Results depend upon exponential library search, e.g. a^2 - b^2 = (a+b)(a-b) iff (a+b) or (a-b) is in the library
–Manipulates only one expression at a time
F1 = A + B + C + D; F2 = A + P + D; => Extract (A + D)

Motivating Example
Consider the set of expressions
P1 = x^3y + x^2y^2z
P2 = 4x + 4yz - xyz
P3 = 4xy - x^2y
–Naïve implementation: 16 multiplications, 4 additions/subtractions
Using CSE
–12 multiplications, 4 additions/subtractions

Motivating Example Using our algebraic techniques –Total 7 multiplications, 3 additions/subtractions –Savings of 5 multiplications, 1 addition/subtraction compared to CSE Impossible to obtain such results using conventional techniques

Introduction to algebraic techniques for redundancy elimination Algebraic techniques in multi-level logic synthesis (MLLS) –Decomposition, factoring reduce number of literals – Distill and Condense use Rectangle Covering methods. Polynomial Expressions (Our Technique) –Factoring, Single term common subexpressions reduces number of multiplications –Multiple term common subexpressions reduces number of additions and possibly multiplications Key Differences (Generalization to handle higher orders) –Kernelling techniques –Finding single cube intersections

Introduction to our technique (Outline) Find a subset of all possible subexpressions (kernel generation) Transformation of Polynomial Expressions –Problem formulation Extract multiple term common subexpressions and factors Extract single term common factors

Introduction to our technique
Terminology
–Literal: a variable or a constant, e.g. a, b, 2, 3.14
–Cube: product of literals, e.g. +3a^2b, -2a^3b^2c
–SOP: sum of cubes, e.g. +3a^2b - 2a^3b^2c
–Cube-free expression: no literal or cube can divide all the cubes of the expression
–Kernel: a cube-free sub-expression of an expression, e.g. 3 - 2abc
–Co-kernel: a cube that is used to divide an expression to get a kernel, e.g. a^2b

Introduction to our Technique
Matrix representation of arithmetic expressions
–F = x^3y - xy^2z is represented by
  +/-   x  y  z
   +    3  1  0
   -    1  2  1
–Each row represents a product term
–Each column represents a variable/constant
–Each element (i,j) represents the power of variable j in term i
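One way to hold this matrix in code is a list of signs plus a list of exponent rows. The layout and the helper name `evaluate` below are assumptions made for illustration; the slide only specifies the matrix itself.

```python
from math import prod

variables = ("x", "y", "z")
signs = [+1, -1]       # the +/- column
powers = [
    (3, 1, 0),         # x^3 y
    (1, 2, 1),         # x y^2 z
]


def evaluate(signs, powers, values):
    """Evaluate the expression encoded by the sign column and power matrix."""
    return sum(
        sign * prod(v ** p for v, p in zip(values, row))
        for sign, row in zip(signs, powers)
    )


x, y, z = 2, 3, 5
assert evaluate(signs, powers, (x, y, z)) == x**3 * y - x * y**2 * z
```

Keeping exponents as integer rows makes cube division a simple element-wise subtraction, which is what the kernel-generation slides rely on.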

Generation of Kernels (example)
P1 = x^3y + x^2y^2z, {L} = {x, y, z}
–Divide by x: Ft = P1/x = x^2y + xy^2z
  P1:  x  y  z        Ft:  x  y  z
       3  1  0             2  1  0
       2  2  1             1  2  1

Generation of Kernels (example)
Ft = P1/x = x^2y + xy^2z
C = biggest cube dividing all cubes of Ft
  Ft:  x  y  z     C:  x  y  z     Ft/C:  x  y  z
       2  1  0         1  1  0            1  0  0
       1  2  1                            0  1  1

Generation of Kernels (example)
Obtain kernel: F1 = Ft/C = (x^2y + xy^2z)/(xy) = (x + yz)
Obtain co-kernel: D1 = x*(xy) = x^2y
–No kernels within F1. Go back to P1 = x^3y + x^2y^2z
–Divide now by the next variable, y: Ft = x^3 + x^2yz
–C = x^2
–But x ∈ C and x comes before y in the literal order (x < y): stop here, to avoid repeating the same kernel Ft/C = (x + yz)
–No more kernels extracted
–Record kernel F1 = P1 with co-kernel '1'
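The walk-through above can be sketched as a short recursive routine. This is an illustrative reading of the slides, not the authors' implementation: terms are stored as (sign, exponent tuple) pairs, all function names are hypothetical, and the final step of recording the whole expression with co-kernel 1 is left out.

```python
from typing import List, Tuple

Term = Tuple[int, Tuple[int, ...]]  # (sign, exponent of each literal)


def largest_common_cube(terms: List[Term]) -> Tuple[int, ...]:
    """Column-wise minimum exponent: the biggest cube dividing every term."""
    return tuple(min(col) for col in zip(*(exps for _, exps in terms)))


def divide(terms: List[Term], cube: Tuple[int, ...]) -> List[Term]:
    """Divide every term by a cube by subtracting exponents."""
    return [(s, tuple(e - c for e, c in zip(exps, cube))) for s, exps in terms]


def generate_kernels(poly: List[Term]):
    """Return (co-kernel, kernel) pairs found by repeated literal division."""
    n = len(poly[0][1])
    found = []

    def recurse(start, terms, cokernel):
        for j in range(start, n):
            with_j = [t for t in terms if t[1][j] > 0]
            if len(with_j) < 2:
                continue  # literal j must appear in at least two terms
            literal = tuple(int(k == j) for k in range(n))
            ft = divide(with_j, literal)
            c = largest_common_cube(ft)
            if any(c[k] > 0 for k in range(j)):
                continue  # an earlier literal is in C: kernel already found
            f1 = divide(ft, c)
            d1 = tuple(a + b + l for a, b, l in zip(cokernel, c, literal))
            found.append((d1, f1))
            recurse(j, f1, d1)  # look for kernels inside the new kernel

    recurse(0, poly, (0,) * n)
    return found


# P1 = x^3 y + x^2 y^2 z over the literals (x, y, z)
P1 = [(+1, (3, 1, 0)), (+1, (2, 2, 1))]
for cokernel, kernel in generate_kernels(P1):
    print(cokernel, kernel)
```

For P1 this yields the single pair with co-kernel exponents (2, 1, 0), i.e. x^2y, and kernel x + yz, matching the slide. Constants can be handled by giving them their own literal column, as the matrix representation allows.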

Concept of kernels and co-kernels
Theorem: Two expressions f and g can have a multiple-term common subexpression iff there are two kernels K_f and K_g having a multiple-term intersection
Multiple-term common subexpressions are detected by intersecting sets of kernels
Each co-kernel : kernel pair represents a possible factorization
–e.g. x^3y + x^2y^2z = [x^2y](x + yz)
The set of kernels is a subset of all possible subexpressions

All Kernels and Co Kernels Which kernels to choose?

Kernel Cube Matrix (KCM)
One row for each kernel generated
One column for each distinct kernel cube
Each non-zero element represents a term
  Co-kernel \ Kernel cube    x      yz     4      -yz    -x
  4                          1(3)   1(4)   0      0      0
  x^2y                       1(1)   1(2)   0      0      0
  x                          0      0      1(3)   1(5)   0
  xy                         0      0      1(6)   0      1(7)
  yz                         0      0      1(4)   0      1(5)

Finding Kernel Intersections (Distill Algorithm) Each kernel intersection or factor appears as a rectangle –Rectangle: Set of rows and columns such that all elements are ‘1’ Value of a rectangle = weighted sum of the number of operations saved Goal: Maximum valued rectangular covering of KCM Greedy heuristic: covering by prime rectangles –Prime rectangle: Rectangle not covered by any other rectangle
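To make the notion of a prime rectangle concrete, the sketch below enumerates them by brute force on the 0/1 pattern of the KCM from the example. It is exponential in the number of rows and is meant only as an illustration; it is not the greedy heuristic of the talk, it does not compute rectangle values, and the function name is hypothetical.

```python
from itertools import combinations


def prime_rectangles(matrix):
    """Return all prime rectangles of a 0/1 matrix as (rows, cols) tuples."""
    n_rows, n_cols = len(matrix), len(matrix[0])
    found = set()
    for size in range(1, n_rows + 1):
        for rows in combinations(range(n_rows), size):
            # Columns that are 1 in every chosen row.
            cols = tuple(c for c in range(n_cols)
                         if all(matrix[r][c] for r in rows))
            if not cols:
                continue
            # Grow the row set to every row that is 1 in all those columns,
            # so the rectangle cannot be extended in either direction.
            full_rows = tuple(r for r in range(n_rows)
                              if all(matrix[r][c] for c in cols))
            found.add((full_rows, cols))
    return sorted(found)


# Rows: co-kernels 4, x^2y, x, xy, yz. Columns: kernel cubes x, yz, 4, -yz, -x.
kcm = [
    [1, 1, 0, 0, 0],
    [1, 1, 0, 0, 0],
    [0, 0, 1, 1, 0],
    [0, 0, 1, 0, 1],
    [0, 0, 1, 0, 1],
]
for rows, cols in prime_rectangles(kcm):
    print(rows, cols)
```

The rectangle with rows (0, 1) and columns (0, 1) corresponds to the kernel intersection x + yz shared by the co-kernels 4 and x^2y.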

Finding Kernel Intersections (Distill Algorithm)
Formula for the value of a rectangle
–R = number of rows
–C = number of columns
–M(R_i) = # of multiplications in row (co-kernel) i
–M(C_i) = # of multiplications in column (kernel-cube) i
–m = ratio of weights of multiplication to addition
Value = (formula missing from the transcript)
Formula calculates savings in operation count

Distill Algorithm
  Co-kernel \ Kernel cube    x      yz     4      -yz    -x
  4                          1(3)   1(4)   0      0      0
  x^2y                       1(1)   1(2)   0      0      0
  x                          0      0      1(3)   1(5)   0
  xy                         0      0      1(6)   0      1(7)
  yz                         0      0      1(4)   0      1(5)
d1 = (x + yz)
4x + 4yz = 4d1
x^3y + x^2y^2z = x^2y d1
Saves 5 multiplications and 1 addition

Distill Algorithm
  Co-kernel \ Kernel cube    x      yz     4      -yz    -x
  4                          1(3)   1(4)   0      0      0
  x^2y                       1(1)   1(2)   0      0      0
  x                          0      0      1(3)   1(5)   0
  xy                         0      0      1(6)   0      1(7)
  yz                         0      0      1(4)   0      1(5)
Remove covered terms
d2 = 4 - x
4xy - x^2y = xy d2
Saves 2 multiplications

Distill Algorithm
Distill algorithm exits after no more kernel intersections can be found
P1 = x^2y d1        d1 = x + yz
P2 = 4d1 - xyz      d2 = 4 - x
P3 = xy d2
Can further optimize by finding single cube intersections

Finding single cube intersections (Condense Algorithm)
Need an algorithm for finding single-term common subexpressions
Consider two single-term expressions
–F1 = a^4b^3c
–F2 = a^2b^4c^2
Form the Cube Variable Incidence Matrix (CIM)
        a  b  c
  F1    4  3  1
  F2    2  4  2
One row for each product term. One column for each variable

Finding single cube intersections (Condense algorithm)
Each (single-term) common subexpression appears as a rectangle
–Rectangle: set of rows and columns where all elements are non-zero
Value of a rectangle is the number of multiplications saved by selecting it
–C = cube corresponding to the rectangle
–Value = Rows * ((Σ C[i]) - 1)
Maximum valued rectangular covering will give the minimum number of multiplications
Use greedy iterative covering by prime rectangles
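A small sketch of the value formula, applied to the two-term CIM of F1 = a^4b^3c and F2 = a^2b^4c^2. It assumes that C[i] denotes the exponent of variable i in the common cube and that the cube for a set of rows is the column-wise minimum; both function names are illustrative.

```python
def common_cube(rows):
    """Largest cube shared by all rows: the column-wise minimum exponent."""
    return tuple(min(col) for col in zip(*rows))


def rectangle_value(rows):
    """Value = Rows * (sum of the cube's exponents - 1), as on the slide."""
    cube = common_cube(rows)
    return len(rows) * (sum(cube) - 1)


cim = [
    (4, 3, 1),  # F1 = a^4 b^3 c
    (2, 4, 2),  # F2 = a^2 b^4 c^2
]
print(common_cube(cim))      # prints "(2, 3, 1)", i.e. a^2 b^3 c
print(rectangle_value(cim))  # prints "10"
```

The common cube (2, 3, 1) is the cube a^2b^3c that the Condense example extracts first.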

Finding single cube intersections (Condense algorithm)
        a  b  c
  F1    4  3  1
  F2    2  4  2
Common cube (2 3 1): d1 = a^2b^3c
        a  b  c  d1
  F1    2  0  0  1
  F2    0  1  1  1
  d1    2  3  1  0
Common cube (0 1 1 0): d2 = bc

Finding single cube intersections (Condense algorithm)
        a  b  c  d1  d2
  F1    2  0  0  1   0
  F2    0  0  0  1   1
  d1    2  2  0  0   1
  d2    0  1  1  0   0
Common cube (2 0 0 0 0): d3 = a^2

Finding single cube intersections (Condense algorithm)
Final CIM
        a  b  c  d1  d2  d3
  F1    0  0  0  1   0   1
  F2    0  0  0  1   1   0
  d1    0  2  0  0   1   1
  d2    0  1  1  0   0   0
  d3    2  0  0  0   0   0
Final implementation (7 multiplications)
d3 = a*a
d2 = b*c
d1 = b*b*d2*d3
F1 = d1*d3
F2 = d1*d2
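The final implementation can be checked by running it on a small wrapper type that counts multiplications. The `Counted` class and `count_multiplications` helper are hypothetical scaffolding for this check, not part of the talk.

```python
class Counted:
    """Integer wrapper that counts every multiplication performed."""
    count = 0

    def __init__(self, value):
        self.value = value

    def __mul__(self, other):
        Counted.count += 1
        return Counted(self.value * other.value)


def condensed(a, b, c):
    """Final implementation from the slide."""
    d3 = a * a
    d2 = b * c
    d1 = b * b * d2 * d3
    f1 = d1 * d3
    f2 = d1 * d2
    return f1, f2


def count_multiplications(fn, *values):
    """Run fn on wrapped values; return (plain results, multiplication count)."""
    Counted.count = 0
    results = fn(*(Counted(v) for v in values))
    return tuple(r.value for r in results), Counted.count


(f1, f2), mults = count_multiplications(condensed, 2, 3, 5)
assert f1 == 2**4 * 3**3 * 5      # a^4 b^3 c
assert f2 == 2**2 * 3**4 * 5**2   # a^2 b^4 c^2
print(mults)  # prints "7"
```

Evaluating each of a^4b^3c and a^2b^4c^2 as a plain product would take 7 multiplications per expression, so the shared cubes halve the count in this example.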

Cube Literal Matrix (Condense Algorithm)
CIM for our example after the Distill algorithm
  Term  +/-   x  y  z  4  d1  d2
  1     +     2  1  0  0  1   0
  2     +     0  0  0  1  1   0
  3     -     1  1  1  0  0   0
  4     +     1  1  0  0  0   1
  5     +     1  0  0  0  0   0
  6     +     0  1  1  0  0   0
  7     +     0  0  0  1  0   0
  8     -     1  0  0  0  0   0
Save 2 multiplications by extracting xy

Condense Algorithm
Extracting xy
  Term  +/-   x  y  z  4  d1  d2
  1     +     1  0  0  0  1   0
  2     +     0  0  0  1  1   0
  3     -     0  0  1  0  0   0
  4     +     0  0  0  0  0   1
  5     +     1  0  0  0  0   0
  6     +     0  1  1  0  0   0
  7     +     0  0  0  1  0   0
  8     -     1  0  0  0  0   0
No more favorable cube intersections found

Final Implementation –Total 7 multiplications, 3 additions/subtractions –Savings of 5 multiplications, 1 addition/subtraction compared to CSE Impossible to obtain such results using conventional techniques
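The operation counts on this slide can be reproduced with the sketch below. It assumes the three expressions implied by the term numbering in the KCM slides (P1 = x^3y + x^2y^2z, P2 = 4x + 4yz - xyz, P3 = 4xy - x^2y), and `d3` is a hypothetical name for the extracted cube xy.

```python
import random


def original(x, y, z):
    """The three expressions in sum-of-products form."""
    p1 = x**3 * y + x**2 * y**2 * z
    p2 = 4*x + 4*y*z - x*y*z
    p3 = 4*x*y - x**2 * y
    return p1, p2, p3


def optimized(x, y, z):
    """After Distill (d1, d2) and Condense (d3): 7 multiplications, 3 add/sub."""
    d1 = x + y*z          # 1 multiplication, 1 addition
    d2 = 4 - x            # 1 subtraction
    d3 = x * y            # 1 multiplication (hypothetical name for cube xy)
    p1 = x * d3 * d1      # 2 multiplications
    p2 = 4*d1 - d3*z      # 2 multiplications, 1 subtraction
    p3 = d3 * d2          # 1 multiplication
    return p1, p2, p3


rng = random.Random(1)
for _ in range(100):
    point = [rng.randint(-9, 9) for _ in range(3)]
    assert original(*point) == optimized(*point)
```

The per-line comments add up to the 7 multiplications and 3 additions/subtractions quoted on the slide.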

Optimization of sin(x)
  Co-kernel \ Kernel cube   1     -S3x^2  S5x^4  -S7x^6  -S3   S5x^2  -S7x^4  S5    -S7x^2
  x                         1(1)  1(2)    1(3)   1(4)    0     0      0       0     0
  x^3                       0     0       0      0       1(2)  1(3)   1(4)    0     0
  x^5                       0     0       0      0       0     0      0       1(3)  1(4)
Sin(x) = x + x^3(-S3 + S5x^2 - S7x^4)
Saves 6 multiplications

Optimization of sin(x)
  Co-kernel \ Kernel cube   1     x^2 d1  S5    -S7x^2
  x                         1(1)  1(2)    0     0
  x^2                       0     0       1(4)  1(5)
Final implementation:
X = x*x
Sin(x) = x*(1 + (-S3 + (S5 - S7*X)*X)*X)
–Total 5 multiplications and 3 additions/subtractions
Same as the hand-optimized form in GNU C
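A quick numerical check of the factored form, assuming S3, S5 and S7 are the Taylor coefficients 1/3!, 1/5! and 1/7! (the slides do not state their values) and that the S7 term is subtracted, as in the expansion x - S3x^3 + S5x^5 - S7x^7. The function names are illustrative.

```python
import math

# Assumed values: Taylor-series coefficients of sin(x).
S3 = 1 / math.factorial(3)
S5 = 1 / math.factorial(5)
S7 = 1 / math.factorial(7)


def sin_factored(x):
    """Factored form: 5 multiplications, 3 additions/subtractions."""
    X = x * x
    return x * (1 + (-S3 + (S5 - S7 * X) * X) * X)


def sin_expanded(x):
    """Term-by-term form of the same degree-7 polynomial."""
    return x - S3 * x**3 + S5 * x**5 - S7 * x**7


for x in (0.0, 0.1, 0.5, 1.0):
    assert math.isclose(sin_factored(x), sin_expanded(x), abs_tol=1e-12)
    print(x, sin_factored(x), math.sin(x))
```

The two forms agree up to floating-point rounding, and both track `math.sin` closely for small |x|, where the truncated series is accurate.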

Experimental Setup (Sequential processor) Signal processing and multimedia applications –MP3 decoder, Mesa (graphics), Adaptive filter, FFT, FIR –Taylor series approximation of trigonometric functions –Optimizations on arithmetic subgraphs from Dataflow graphs (DFGs) Polynomials from computer graphics –Multivariate polynomial approximation Compared number of operations with CSE and Horner form Estimated savings in clock cycles on ARM core

Experimental results (comparing number of operations from different methods; A = additions, M = multiplications)
  Application            Function         Unoptimized    CSE           Horner        Our technique
                                          A      M       A     M       A     M       A     M
  MP3 decoder            hwin_init        80     260     72    162     80    110     64    86
  MP3 decoder            imdct            63     189     63    108     63    90      63    54
  Mesa                   gl_rotation      10     92      10    34      10    37      10    15
  Adaptive filter        LMS              35     130     35    85      35    55      35    40
  Gaussian noise filter  FIR              36     224     36    143     36    89      36    63
  Fast convolution       FFT              45     194     45    112     45    83      45    56
  Graphics               quartic-spline   4      23      4     17      4     20      4     14
  Graphics               quintic-spline   5      34      5     22      5     23      5     16
  Graphics               chebyshev        8      32      8     18      8     18      8     11
  Graphics               cos-wavelet      17     43      17    24      17    19      15    17
  Average                                 30.3   122     29.5  72.5    30.3  54.4    28.5  37.2
Average run time = 0.45s for our technique

Experimental results (Improvement over CSE and Horner; M = multiplications, cycles = clock cycles on ARM7)
  Application            Function         Over CSE           Over Horner
                                          M       cycles     M       cycles
  MP3 decoder            hwin_init        46.9%   44.0%      21.8%   21.6%
  MP3 decoder            imdct            50.0%   44.7%      40.0%   35.1%
  Mesa                   gl_rotation      55.9%   52.8%      59.5%   56.4%
  Adaptive filter        LMS              52.9%   48.9%      27.3%   24.2%
  Gaussian noise filter  FIR              55.9%   53.3%      29.2%   27.0%
  Fast convolution       FFT              50.0%   46.3%      32.5%   29.3%
  Graphics               quartic-spline   17.6%   16.8%      30.0%   28.8%
  Graphics               quintic-spline   27.3%   26.1%      30.4%   29.2%
  Graphics               chebyshev        38.9%   35.7%      38.9%   35.7%
  Graphics               cos-wavelet      29.2%   27.0%      10.5%   10.7%
  Average                                 42.5%   39.6%      32.0%   29.8%
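The multiplication improvements over CSE follow directly from the operation counts on the previous slide. The sketch below recomputes that column from the CSE and "our technique" multiplication counts; the dictionary layout and function name are illustrative.

```python
# function: (multiplications with CSE, multiplications with our technique)
mult_counts = {
    "hwin_init": (162, 86),
    "imdct": (108, 54),
    "gl_rotation": (34, 15),
    "LMS": (85, 40),
    "FIR": (143, 63),
    "FFT": (112, 56),
    "quartic-spline": (17, 14),
    "quintic-spline": (22, 16),
    "chebyshev": (18, 11),
    "cos-wavelet": (24, 17),
}


def improvement(before, after):
    """Percentage reduction in operation count, rounded to one decimal."""
    return round(100 * (before - after) / before, 1)


for name, (cse, ours) in mult_counts.items():
    print(f"{name:15s} {improvement(cse, ours):5.1f}%")
```

The clock-cycle columns cannot be recomputed this way, since they come from estimates on the ARM core rather than from operation counts alone.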

Conclusions Development of new algebraic technique for optimizing polynomial expressions. Currently used for minimizing number of arithmetic operations using greedy rectangular covering Results better than conventional techniques

Future Work Develop and implement optimal algorithms to compare results with our greedy heuristic. Optimization for delay, energy. Integrate our technique with conventional compiler optimization pass to measure impact on the whole application.

Thank You Questions ??

Extra slides

Finding Kernel Intersections (Distill Algorithm)
Worst case scenario for Distill algorithm
Number of prime rectangles is exponential in the number of rows/columns
–Heuristic methods to find the best prime rectangle
–In practice polynomial expressions are not so large
  1 1 1 1
  1 1 1 1
  1 1 1 1
  1 1 1 1
  1 1 1 1
