1
Reducing Hardware Complexity of Linear DSP Systems by Iteratively Eliminating Two-Term Common Subexpressions
IEEE/ACM Asia South Pacific Design Automation Conference (ASP-DAC), Shanghai, 2005
Anup Hosangadi, Ryan Kastner (ECE Department, UCSB)
Farzan Fallah (Advanced CAD Research, Fujitsu Labs of America)
2
Outline
  Introduction
  Related Work
  Polynomial transformation
  Common subexpression elimination
  Results
  Conclusions
3
Introduction
Multiplications by constants are encountered in many application areas:
  DSP transforms in audio, video, and image processing (DFT, DCT, IDCT, etc.)
  Filtering operations in communication (FIR, IIR filters)
  Multiple Input Multiple Output (MIMO) systems
  Polynomials in computer graphics
4
Introduction
Multiplication is expensive in hardware.
Decompose constant multiplications into shifts and additions:
  13*X = (1101)_2 * X = X + X<<2 + X<<3
Signed digits can reduce the number of additions/subtractions:
  Canonical Signed Digits (CSD) (Knuth'74)
  (57)_10 = (0111001)_2 = (100-1001)_CSD
Further reduction is possible by common subexpression elimination:
  Up to 50% reduction (R. Hartley, TCS'96)
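The CSD recoding mentioned on this slide can be sketched in a few lines. This is a minimal illustration, not code from the presentation: the function names `to_csd` and `shift_add` are assumptions, and digits are returned least-significant first.

```python
def to_csd(n: int) -> list[int]:
    """Signed digits (least significant first) of a non-negative integer in CSD form."""
    digits = []
    while n != 0:
        if n & 1:
            # Pick +1 when n % 4 == 1 and -1 when n % 4 == 3, so the remainder
            # is divisible by 4 and no two adjacent digits are nonzero.
            d = 2 - (n & 3)
            n -= d
        else:
            d = 0
        digits.append(d)
        n >>= 1
    return digits


def shift_add(digits: list[int], x: int) -> int:
    """Evaluate constant * x using only shifts, additions and subtractions."""
    return sum(d * (x << i) for i, d in enumerate(digits))


# 57 needs four nonzero binary digits (111001) but only three CSD digits.
print(to_csd(57)[::-1])          # most significant digit first
print(shift_add(to_csd(57), 5))  # same value as 57 * 5
```

Each nonzero digit beyond the first costs one addition or subtraction, which is why fewer nonzero digits means fewer adders.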
5
Introduction
Common subexpressions = common digit patterns
  F_1 = 7*X = (0111)_2 * X = X + X<<1 + X<<2
  F_2 = 13*X = (1101)_2 * X = X + X<<2 + X<<3
  Cost without sharing: 4+, 4<<
Shared pattern "0101" => D_1 = X + X<<2
  F_1 = D_1 + X<<1
  F_2 = D_1 + X<<3
  Cost with sharing: 3+, 3<<
Good for a single variable: FIR filters (transposed form)
Multiple variables? (DFT, DCT, etc.)
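As a quick check of the sharing shown on this slide, the sketch below computes 7*X and 13*X both directly and through the shared pattern D_1 = X + X<<2. The function names are illustrative assumptions.

```python
def direct(x: int) -> tuple[int, int]:
    """7*x and 13*x without sharing: 4 additions, 4 shifts."""
    f1 = x + (x << 1) + (x << 2)
    f2 = x + (x << 2) + (x << 3)
    return f1, f2


def shared(x: int) -> tuple[int, int]:
    """Same outputs with the common pattern computed once: 3 additions, 3 shifts."""
    d1 = x + (x << 2)       # shared digit pattern "0101"
    f1 = d1 + (x << 1)
    f2 = d1 + (x << 3)
    return f1, f2


print(direct(9), shared(9))
```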
6
Related Work
Simple bipartite matching (Potkonjak et al., TCAD'95)
  (10101) and (01101) => common pattern = "101"
  (10010) and (010010) => cannot detect pattern "1001"
Recursive Shift and Add (RESANDS) (H. Nguyen et al., TVLSI 2000)
  (10010) and (010010) => common pattern "1001"
Exhaustive enumeration of all digit patterns (Pasko et al., TCAD'99)
  (1011) => "0011", "1001", "1010", "0101", "1011"
7
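Related Work
Extending techniques for multiple variables (Potkonjak et al., TCAD'95)
\[
\begin{bmatrix} Y_1 \\ Y_2 \\ Y_3 \end{bmatrix}
=
\begin{bmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{bmatrix}
\begin{bmatrix} X_1 \\ X_2 \\ X_3 \end{bmatrix}
\]
[Figure: example bit patterns 101100, 011101, 100101 and outputs Y_1, Y_2, Y_3; labels "All Distinct S_ij X_j" and "C_ik D_k"]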
8
Related Work
Multiple-variable common subexpression elimination (A. Hosangadi et al., ASAP'04)
  Polynomial transformation of linear systems
  Uses rectangle covering methods
  Cannot find subexpressions with reversed signs, e.g. (X_1 - X_2<<1) ≠ (X_2<<1 - X_1)
    A common occurrence when signed digits are used
  Rectangle covering has exponential complexity
Is there a method to overcome these limitations?
9
Related Work
Algebraic methods in multi-level logic synthesis (MLLS)
  Reduce the literal count in a set of Boolean expressions
  Factoring and decomposition: established algebraic techniques
  Typically used for thousands of variables and literals
Apply these methods to optimize linear systems?
  D_1 = X_1 + X_2<<2
  Y_1 = D_1 + D_1<<3 + X_1<<3
  Y_2 = D_1 + X_2<<2
10
Linear systems and polynomial transformation
View linear systems as a set of arithmetic expressions
  Expressions consist of the +, -, << operators
Develop a methodology for extracting common subexpressions
Polynomial formulation, where L^i stands for a left shift by i bits:
\[ C \times X = \sum_i \pm X L^i \]
  (14)_10 × X = (1110)_2 × X = X<<3 + X<<2 + X<<1 = XL^3 + XL^2 + XL^1
              = (100-10)_CSD × X = XL^4 - XL^1
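One way to hold the polynomial form in a program is as a list of signed terms. The sketch below is an assumed representation, not the authors' data structure: `Term`, `binary_terms`, `show` and `evaluate` are hypothetical names, and substituting L = 2 (a left shift) recovers the numeric value.

```python
from typing import NamedTuple


class Term(NamedTuple):
    sign: int    # +1 or -1
    var: str     # variable name, e.g. "X"
    shift: int   # exponent i of L, i.e. a left shift by i bits


def binary_terms(c: int, var: str = "X") -> list[Term]:
    """One +var*L^i term per set bit of the non-negative constant c."""
    return [Term(1, var, i) for i in range(c.bit_length()) if (c >> i) & 1]


def show(terms: list[Term]) -> str:
    """Render terms in the slide's notation, e.g. '+XL^4 -XL^1'."""
    return " ".join(
        f"{'+' if t.sign > 0 else '-'}{t.var}" + (f"L^{t.shift}" if t.shift else "")
        for t in terms
    )


def evaluate(terms: list[Term], values: dict[str, int]) -> int:
    """Substitute L = 2: each term becomes a shifted, signed copy of its variable."""
    return sum(t.sign * (values[t.var] << t.shift) for t in terms)


binary_form = binary_terms(14)                     # XL^1 + XL^2 + XL^3
csd_form = [Term(1, "X", 4), Term(-1, "X", 1)]     # XL^4 - XL^1, from the slide
print(show(binary_form), "|", show(csd_form))
```

Both forms evaluate to 14*X; the CSD form simply uses one term fewer.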
11
Linear Systems and polynomial transformation
H.264 Integer Transform
\[
\begin{bmatrix} Y_0 \\ Y_1 \\ Y_2 \\ Y_3 \end{bmatrix}
=
\begin{bmatrix} 1 & 1 & 1 & 1 \\ 2 & 1 & -1 & -2 \\ 1 & -1 & -1 & 1 \\ 1 & -2 & 2 & -1 \end{bmatrix}
\begin{bmatrix} X_0 \\ X_1 \\ X_2 \\ X_3 \end{bmatrix}
\]
Decomposing constant multiplications:
  Y_0 = X_0 + X_1 + X_2 + X_3
  Y_1 = X_0<<1 + X_1 - X_2 - X_3<<1
  Y_2 = X_0 - X_1 - X_2 + X_3
  Y_3 = X_0 - X_1<<1 + X_2<<1 - X_3
Cost: 12+, 4<<
12
Linear Systems and polynomial transformation
H.264 Integer Transform
\[
\begin{bmatrix} Y_0 \\ Y_1 \\ Y_2 \\ Y_3 \end{bmatrix}
=
\begin{bmatrix} 1 & 1 & 1 & 1 \\ 2 & 1 & -1 & -2 \\ 1 & -1 & -1 & 1 \\ 1 & -2 & 2 & -1 \end{bmatrix}
\begin{bmatrix} X_0 \\ X_1 \\ X_2 \\ X_3 \end{bmatrix}
\]
Polynomial transformation:
  Y_0 = X_0 + X_1 + X_2 + X_3
  Y_1 = X_0 L + X_1 - X_2 - X_3 L
  Y_2 = X_0 - X_1 - X_2 + X_3
  Y_3 = X_0 - X_1 L + X_2 L - X_3
Cost: 12+, 4<<
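The 12+, 4<< figure on this slide can be reproduced by expanding every matrix entry into signed, shifted terms and counting. The sketch below assumes a plain binary expansion of each coefficient (sufficient here, since all entries are ±1 or ±2); `row_terms` and `cost` are hypothetical helper names.

```python
H264 = [
    [1,  1,  1,  1],
    [2,  1, -1, -2],
    [1, -1, -1,  1],
    [1, -2,  2, -1],
]


def row_terms(row: list[int]) -> list[tuple[int, int, int]]:
    """Expand one matrix row into (sign, input index j, shift i) terms."""
    terms = []
    for j, c in enumerate(row):
        sign = 1 if c > 0 else -1
        mag = abs(c)
        for i in range(mag.bit_length()):
            if (mag >> i) & 1:
                terms.append((sign, j, i))
    return terms


def cost(matrix: list[list[int]]) -> tuple[int, int]:
    """(additions/subtractions, shifts) before any subexpression sharing."""
    rows = [row_terms(r) for r in matrix]
    adds = sum(len(t) - 1 for t in rows if t)        # n terms need n - 1 adders
    shifts = sum(1 for t in rows for (_, _, i) in t if i > 0)
    return adds, shifts


print(cost(H264))
```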
13
Fx algorithm
Concurrent Decomposition and Factorization of Boolean Expressions (J. Rajski et al., TCAD'92)
  Popularly known as the Fast-Extract (Fx) algorithm
  Expression f = gh + r
    g = (ab + c) => double-cube divisor
    g = ab => single-cube divisor
Fx algorithm for linear systems?
14
Two-term divisors
Obtained from every pair of terms in each expression
  Divide by the minimum exponent of L
  e.g. F = X_1 + X_2 L + X_3 L^3
    {+X_2 L, +X_3 L^3}: divide by L => (X_2 + X_3 L^2)
    Divisors = (X_1 + X_2 L), (X_1 + X_3 L^3), (X_2 + X_3 L^2)
Two divisors intersect if they are equal up to sign and the terms involved are distinct
  (X_1 - X_2 L) ∩ (X_1 - X_2 L) ≠ φ
  (X_1 - X_2 L) ∩ (-X_1 + X_2 L) ≠ φ (reversed signs allowed!)
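The divisor generation on this slide can be sketched as follows. Terms are assumed to be `(sign, variable, exponent)` tuples, and `canonical` is one possible normalisation (an assumption, not necessarily the authors' choice) that makes a divisor and its sign-reversed form compare equal.

```python
from itertools import combinations


def two_term_divisors(terms):
    """All two-term divisors of one expression, each divided by its minimum exponent of L."""
    divisors = []
    for a, b in combinations(terms, 2):
        m = min(a[2], b[2])
        divisors.append(((a[0], a[1], a[2] - m), (b[0], b[1], b[2] - m)))
    return divisors


def canonical(divisor):
    """Order the two terms and force the first sign to +1, so d and -d coincide."""
    first, second = sorted(divisor, key=lambda t: (t[1], t[2]))
    if first[0] < 0:
        first = (-first[0],) + first[1:]
        second = (-second[0],) + second[1:]
    return (first, second)


# F = X1 + X2*L + X3*L^3, the example from the slide.
F = [(1, "X1", 0), (1, "X2", 1), (1, "X3", 3)]
for d in two_term_divisors(F):
    print(d)
```

With this normalisation, (X_1 - X_2 L) and (-X_1 + X_2 L) map to the same key, so a hash table keyed on `canonical(d)` would count them as the same divisor.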
15
Two-term divisors
Theorem: a multiple-term common subexpression exists in a set of expressions iff there is a non-overlapping intersection among the two-term divisors
Many divisors have intersections; which one to choose?
  Greedily select the divisor with the largest number of intersections
Selecting a divisor changes the expressions
  Perform concurrent decomposition of the expressions
16
Algorithm (Step 1)
Creating the set of divisors {Divisors}:
  {Divisors} = φ;
  for each expression P_i {
    {D_new} = Divisors for P_i;
    {Divisors} = {Divisors} ∪ {D_new};
    Update frequency statistics of {Divisors};
  }
17
Algorithm (Step 2)
Common subexpression elimination:
  {Divisors} = Set of all 2-term divisors;
  while (intersections present) {
    Find Best_Divisor in {Divisors};
    {T} = Set of terms involved in intersection;
    {D} = Set of divisors involving any term in {T};
    {Divisors} = {Divisors} - {D};
    Rewrite expressions;
    {D_new} = New divisors involving new terms;
    {Divisors} = {Divisors} ∪ {D_new};
  }
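The greedy loop above can be sketched in Python under some loud simplifications: divisors are recomputed from scratch on every iteration instead of being updated incrementally, extracted divisor definitions are not themselves searched for further sharing, and ties are broken by first occurrence, so the extraction order may differ from the one on the slides. All function names are assumptions. A term is `(sign, name, shift)`.

```python
from collections import defaultdict
from itertools import combinations


def occurrences(exprs):
    """Map each sign-normalised two-term divisor to the places it occurs."""
    found = defaultdict(list)
    for e, terms in enumerate(exprs):
        for i, j in combinations(range(len(terms)), 2):
            a, b = sorted((terms[i], terms[j]), key=lambda t: (t[1], t[2]))
            m = min(a[2], b[2])
            # Factor out a's sign and L^m so that d and -d share one key.
            key = ((1, a[1], a[2] - m), (a[0] * b[0], b[1], b[2] - m))
            found[key].append((e, i, j, a[0], m))
    return found


def pick_disjoint(occ):
    """Keep only occurrences that do not reuse a term (non-overlapping)."""
    used, chosen = set(), []
    for e, i, j, sign, m in occ:
        if (e, i) not in used and (e, j) not in used:
            used.update({(e, i), (e, j)})
            chosen.append((e, i, j, sign, m))
    return chosen


def eliminate(exprs):
    """Greedy two-term extraction; returns (divisor definitions, rewritten expressions)."""
    exprs = [list(terms) for terms in exprs]
    defs = {}
    while True:
        best_key, best = None, []
        for key, occ in occurrences(exprs).items():
            chosen = pick_disjoint(occ)
            if len(chosen) > len(best):
                best_key, best = key, chosen
        if len(best) < 2:            # nothing is shared any more
            return defs, exprs
        name = f"D{len(defs)}"
        defs[name] = best_key
        drop, add = defaultdict(set), defaultdict(list)
        for e, i, j, sign, m in best:
            drop[e].update({i, j})
            add[e].append((sign, name, m))
        for e in drop:
            kept = [t for k, t in enumerate(exprs[e]) if k not in drop[e]]
            exprs[e] = kept + add[e]


def evaluate(defs, exprs, inputs):
    """Numeric check with L = 2; definitions are evaluated in creation order."""
    values = dict(inputs)
    for name, divisor in defs.items():
        values[name] = sum(s * (values[v] << k) for s, v, k in divisor)
    return [sum(s * (values[v] << k) for s, v, k in terms) for terms in exprs]


# H.264 integer transform in polynomial form.
Y = [
    [(1, "X0", 0), (1, "X1", 0), (1, "X2", 0), (1, "X3", 0)],
    [(1, "X0", 1), (1, "X1", 0), (-1, "X2", 0), (-1, "X3", 1)],
    [(1, "X0", 0), (-1, "X1", 0), (-1, "X2", 0), (1, "X3", 0)],
    [(1, "X0", 0), (-1, "X1", 1), (1, "X2", 1), (-1, "X3", 0)],
]
defs, reduced = eliminate(Y)
adds = len(defs) + sum(len(terms) - 1 for terms in reduced)
print(defs, reduced, adds)
```

Recomputing all divisors each round keeps the sketch short at the cost of extra work; the slide's version removes only the divisors touching rewritten terms and adds the new ones.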
18
Algorithm complexity
M×M constant matrix; N digits of precision
[Figure: M×M matrix of N-digit constants (e.g. 1111, 1110, 0011, 1010) producing outputs Y_0 ... Y_{M-1}]
  Y_0 = X_0 + X_0 L + ... + X_{M-1} L^3 + X_{M-1}
O(MN) terms => O(M^2 N^2) divisors
19
Algorithm (Step 1): complexity
Creating the set of divisors {Divisors}:
  {Divisors} = φ;
  for each expression P_i {
    {D_new} = Divisors for P_i;
    {Divisors} = {Divisors} ∪ {D_new};
    Update frequency statistics of {Divisors};
  }
Complexity annotations on the slide: O(M^2 N^2) distinct divisors; O(M^2 N^2); O(M^3 N^2)
20
Algorithm (Step 2): complexity
Common subexpression elimination:
  {Divisors} = Set of all 2-term divisors;
  while (intersections present) {
    Find Best_Divisor in {Divisors};
    {T} = Set of terms involved in intersection;
    {D} = Set of divisors involving any term in {T};
    {Divisors} = {Divisors} - {D};
    Rewrite expressions;
    {D_new} = New divisors involving new terms;
    {Divisors} = {Divisors} ∪ {D_new};
  }
Complexity annotation on the slide: O(M^2 N^2)
21
Algorithm: H.264 example
>> Select D_0 = (X_0 + X_3)
  Y_0 = X_0 + X_1 + X_2 + X_3
  Y_1 = X_0 L + X_1 - X_2 - X_3 L
  Y_2 = X_0 - X_1 - X_2 + X_3
  Y_3 = X_0 - X_1 L + X_2 L - X_3
22
Algorithm: H.264 example
>> Select D_1 = (X_1 - X_2)
  Y_0 = D_0 + X_1 + X_2
  Y_1 = X_0 L + X_1 - X_2 - X_3 L
  Y_2 = D_0 - X_1 - X_2
  Y_3 = X_0 - X_1 L + X_2 L - X_3
23
Algorithm: H.264 example
>> Select D_2 = (X_1 + X_2)
  Y_0 = D_0 + X_1 + X_2
  Y_1 = X_0 L + D_1 - X_3 L
  Y_2 = D_0 - X_1 - X_2
  Y_3 = X_0 - D_1 L - X_3
24
Algorithm: H.264 example
>> Select D_3 = (X_0 - X_3)
  Y_0 = D_0 + D_2
  Y_1 = X_0 L + D_1 - X_3 L
  Y_2 = D_0 - D_2
  Y_3 = X_0 - D_1 L - X_3
25
Final Implementation
Extracting 4 divisors:
  D_0 = X_0 + X_3    Y_0 = D_0 + D_2
  D_1 = X_1 - X_2    Y_1 = D_1 + D_3 L
  D_2 = X_1 + X_2    Y_2 = D_0 - D_2
  D_3 = X_0 - X_3    Y_3 = D_3 - D_1 L
Cost: 8+, 2<<
  Original: 12+, 4<<
  Rectangle Covering: 10+, 3<<
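The final implementation can be checked numerically against the original matrix product. The sketch below writes L as a left shift by one bit; the function names are illustrative assumptions.

```python
def h264_direct(x0: int, x1: int, x2: int, x3: int) -> list[int]:
    """Row-by-row product with the H.264 integer transform matrix."""
    return [
        x0 + x1 + x2 + x3,
        2 * x0 + x1 - x2 - 2 * x3,
        x0 - x1 - x2 + x3,
        x0 - 2 * x1 + 2 * x2 - x3,
    ]


def h264_shared(x0: int, x1: int, x2: int, x3: int) -> list[int]:
    """Implementation from the slide: 8 additions/subtractions and 2 shifts."""
    d0 = x0 + x3
    d1 = x1 - x2
    d2 = x1 + x2
    d3 = x0 - x3
    return [d0 + d2, d1 + (d3 << 1), d0 - d2, d3 - (d1 << 1)]


print(h264_direct(3, -5, 7, 11), h264_shared(3, -5, 7, 11))
```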
26
Experimental Setup
Goal
  Reduction in the number of additions/subtractions
  Effect on area and latency after synthesis
  Simulate designs to estimate power consumption
Transforms
  DCT, IDCT, DFT, DST, DHT
  8x8 constant matrices
  16 digits of precision (CSD representation)
Compare with
  Potkonjak (TCAD'95)
  RESANDS (Nguyen et al., TVLSI 2000)
  Rectangle Covering (A. Hosangadi et al., ASAP'04)
27
Experimental Results
Number of additions/subtractions

Example    Original (I)   Potkonjak (II)   RESANDS (III)   Rectangle Covering (IV)   Two-term CSE (V)
DCT        274            202              227             174                       153
IDCT       242            183              222             162                       143
RealDFT    253            193              208             165                       144
ImagDFT    207            178              198             134                       124
DST        320            238              252             200                       187
DHT        284            209              211             175                       158
Average    263.3          200.5            219.7           168.3                     151.5
Run time                                                   0.81s                     0.08s
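The relative savings implied by these counts can be worked out directly. The sketch below copies the numbers from the results slide above; the helper names are assumptions, and the percentages are derived here rather than quoted from the presentation.

```python
# Additions/subtractions per benchmark, columns (I) to (V) of the results slide.
ADDS = {
    "DCT":     (274, 202, 227, 174, 153),
    "IDCT":    (242, 183, 222, 162, 143),
    "RealDFT": (253, 193, 208, 165, 144),
    "ImagDFT": (207, 178, 198, 134, 124),
    "DST":     (320, 238, 252, 200, 187),
    "DHT":     (284, 209, 211, 175, 158),
}


def column_average(col: int) -> float:
    """Average of one column, rounded to one decimal as on the slide."""
    return round(sum(row[col] for row in ADDS.values()) / len(ADDS), 1)


def reduction_percent(baseline_col: int, new_col: int) -> float:
    """Percentage drop in total operations from the baseline column to the new one."""
    base = sum(row[baseline_col] for row in ADDS.values())
    new = sum(row[new_col] for row in ADDS.values())
    return 100.0 * (base - new) / base


print([column_average(c) for c in range(5)])
print(f"vs original:           {reduction_percent(0, 4):.1f}%")
print(f"vs rectangle covering: {reduction_percent(3, 4):.1f}%")
```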
28
Experimental results
Synthesis results (minimum latency constraints)
(III) RESANDS, (IV) Rect. Covering, (V) 2-term CSE

           Area (Library Units)          Latency (Clock cycles)
Example    (III)     (IV)     (V)        (III)   (IV)   (V)
DCT        90667     73311    66759      10      11     10
IDCT       81868     66864    62883      10      11     10
R-DFT      90496     69827    64026      10      11     10
I-DFT      75140     55940    54606      10      ?      ?
DST        108101    84715    81214      11      ?      ?
DHT        93939     71272    67775      11      ?      ?
Average    90110     70322    66211      10.3    10.8   10.2

? marks a latency value missing from the extracted slide. The DHT row retains one further value, 10, whose column is unclear.
29
Experimental results
Power consumption (µWatts)
(III) RESANDS, (IV) Rect. Covering, (V) 2-term CSE

Example    (III)    (IV)     (V)
DCT        729      504      531
IDCT       662      547      569
R-DFT      707      544      554
I-DFT      644      575      490
DST        607      718      595
DHT        598      545      527
Average    657.8    572.2    544.3
30
Conclusions
A new technique for eliminating common subexpressions in linear systems
  Fewer operations than known methods
  Much faster than rectangle covering
Combine with scheduling on given resources
31
Thank you
Questions?