1 Combined LNS Adder/Subtractors for DCT Hardware Jie Ruan & Mark G. Arnold
2 Outline: Logarithmic Number System (LNS); Discrete Cosine Transform (DCT); Combined LNS adder/subtractor
3 LNS (Logarithmic Number System): represents a number by a sign bit and an exponent to a certain base b. [Figure: word format. Sign bit S, followed by an (n-1)-bit exponent, of which F bits set the precision.]
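The representation above can be sketched in a few lines of Python. This is only an illustration: the function names `lns_encode` and `lns_decode`, the default base 2, the default F = 4, and the decision to raise an error for zero are assumptions made here, not details taken from the slides.

```python
import math

def lns_encode(value, frac_bits=4, base=2.0):
    """Return (sign, exponent): exponent is log_base|value| rounded to
    frac_bits fractional bits and stored as a scaled integer."""
    if value == 0:
        # Assumption: zero is rejected; a real design would reserve a code for it.
        raise ValueError("zero has no logarithm")
    sign = 1 if value < 0 else 0
    scaled = math.log(abs(value)) / math.log(base) * (1 << frac_bits)
    return sign, round(scaled)

def lns_decode(sign, exponent, frac_bits=4, base=2.0):
    """Convert a (sign, scaled-integer exponent) pair back to a real number."""
    magnitude = base ** (exponent / (1 << frac_bits))
    return -magnitude if sign else magnitude
```

With F fractional bits the exponent is off by at most half a step, so the relative error of a round trip stays below about 2^(1/2^(F+1)) - 1 regardless of the magnitude of the number.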
4 Properties of LNS: large dynamic range; multiplication, division and exponentiation are easy; addition is not a linear operation in LNS; the cost of an adder grows exponentially with word length; LNS is advantageous at low precision.
5 LNS Arithmetic Units. Multiplication: $\log_b(XY) = \log_b X + \log_b Y$, so the cost is a fixed-point adder. Addition: a more complex process than multiplication. E.g., to calculate $\log_b(X+Y)$ with $x=\log_b X$, $y=\log_b Y$: (1) calculate $z = x - y$, which corresponds to $Z = X/Y$; (2) table lookup $s_b(z) = \log_b(1+b^z)$, which corresponds to $1 + X/Y$; (3) $\log_b(X+Y) = y + s_b(z)$, which corresponds to $Y(1+X/Y) = X+Y$. Subtraction uses $d_b(z) = \log_b|1-b^z|$ in place of $s_b(z)$.
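The three steps can be traced with a small Python sketch. It assumes base b = 2 and positive operands, and it evaluates s_b and d_b directly with floating-point math where the hardware would use a table lookup; the function names are chosen here for illustration.

```python
import math

B = 2.0  # assumed base b

def s_b(z):
    """Addition function: log_b(1 + b^z)."""
    return math.log(1.0 + B ** z, B)

def d_b(z):
    """Subtraction function: log_b|1 - b^z| (undefined at z = 0, i.e. X == Y)."""
    return math.log(abs(1.0 - B ** z), B)

def lns_mul(x, y):
    """log_b(XY): a single fixed-point addition."""
    return x + y

def lns_add(x, y):
    """log_b(X + Y) for positive X, Y."""
    z = x - y          # step 1: z corresponds to X / Y
    return y + s_b(z)  # steps 2 and 3: lookup, then add y

def lns_sub(x, y):
    """log_b|X - Y| for positive X, Y."""
    z = x - y
    return y + d_b(z)
```

For example, with x = log2(6) and y = log2(2), `lns_add` returns log2(8), `lns_sub` returns log2(4), and `lns_mul` returns log2(12), up to floating-point rounding.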
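6 LNS Multiplication and Addition [Figure: two block diagrams, with $x=\log_b X$, $y=\log_b Y$. LNS multiplication: one adder computes $\log_b(XY) = x + y$. LNS addition: a subtractor forms $z = x - y$, which addresses the tables $s_b(z) = \log_b(1+2^z)$ and $d_b(z) = \log_b|1-2^z|$ (base $b=2$); an adder then outputs $\log_b(X+Y) = y + s_b(z)$, or $y + d_b(z)$ when $S_x \ne S_y$.]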
7 Discrete Cosine Transform: an important part of MPEG encoding, which uses a 2-dimensional 8x8 DCT. The 2-D DCT is usually performed through 2 rounds of 1-D DCT to reduce the hardware cost.
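The "2 rounds of 1-D DCT" decomposition can be sketched as follows. The 1-D transform here is the direct, unfactored DCT-II with orthonormal scaling; that scaling and the function names are assumptions for illustration, since the slides do not state a normalization.

```python
import math

N = 8  # block size used in MPEG

def dct_1d(v):
    """Direct 8-point DCT-II (orthonormal scaling assumed)."""
    out = []
    for k in range(N):
        scale = math.sqrt(1.0 / N) if k == 0 else math.sqrt(2.0 / N)
        acc = sum(v[n] * math.cos((2 * n + 1) * k * math.pi / (2 * N))
                  for n in range(N))
        out.append(scale * acc)
    return out

def dct_2d(block):
    """2-D DCT of an 8x8 block as two rounds of 1-D DCT."""
    rows = [dct_1d(r) for r in block]              # round 1: every row
    cols = [dct_1d(list(c)) for c in zip(*rows)]   # round 2: every column
    return [list(r) for r in zip(*cols)]           # transpose back
```

Each round applies the same 1-D unit eight times, which is why a hardware design only needs one 1-D DCT datapath.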
8 LNS DCT in MPEG encoding: floating-point cost is too high for portable systems; LNS gives the same visual result as fixed-point at the same precision; LNS has a shorter word length than fixed-point at the same dynamic range and precision. For MPEG-1: fixed-point needs (12+F) bits, LNS needs (6+F) bits.
9 Fast DCT algorithm. Chen’s 1-D DCT algorithm (one cycle): directly factorizes the DCT matrix; 16 multiplications, 26 additions; performs one 8-point 1-D DCT in one cycle. Two-cycle version, by reusing hardware: 14 adders, 10 multipliers; performs one 8-point 1-D DCT in two cycles.
10 Diagram of Chen’s 1-D DCT [Figure: flow graph of Chen’s 8-point 1-D DCT. Inputs f(0) to f(7); outputs in the order F(0), F(4), F(2), F(6), F(1), F(5), F(3), F(7). Branch coefficients: S(1/4), C(1/4), S(1/8), C(1/8), -C(1/8), S(1/16), C(1/16), -S(7/16), C(7/16), S(5/16), C(5/16), -S(3/16), C(3/16).] Notation: S(m/n)=sin(mπ/n), C(m/n)=cos(mπ/n).
11 Combined LNS adders/subtractors. Many computational units in the DCT compute both X+Y and X-Y from the same pair of inputs. The two computations always access different tables (one the s_b(z) table, the other the d_b(z) table), so the table-lookup part and some combinational parts can be shared between them.
12 Combined LNS adder/subtractors, with $x=\log_b X$, $y=\log_b Y$. X+Y: (1) $z=x-y$; (2) table lookup $s_b(z)=\log_b(1+2^z)$; (3) $y+s_b(z)$. X-Y: (1) $z=x-y$; (2) table lookup $d_b(z)=\log_b|1-2^z|$; (3) $y+d_b(z)$. Both computations use the same hardware and the same address z for the two different tables.
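A minimal Python sketch of this sharing, assuming base 2 and nonzero operands with different magnitudes: z is computed once and used for both functions. The function name and the sign arguments `sx`, `sy` are hypothetical, and the two functions are evaluated directly here where the hardware would read two tables at the same address.

```python
import math

B = 2.0  # assumed base b

def combined_add_sub(x, y, sx=0, sy=0):
    """Return (log_b|X+Y|, log_b|X-Y|) from x = log_b|X|, y = log_b|Y|
    and the sign bits sx, sy."""
    z = x - y                        # shared subtractor: one address for both tables
    bz = B ** z
    s = math.log(1.0 + bz, B)        # s_b(z)
    d = math.log(abs(1.0 - bz), B)   # d_b(z), singular when |X| == |Y|
    if sx == sy:
        return y + s, y + d          # same signs: sum uses s_b, difference uses d_b
    return y + d, y + s              # different signs: the two tables swap roles
```

For X = 5 and Y = 3 this yields log2(8) and log2(2); flipping the sign of X swaps the two results, since the two outputs always use opposite tables.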
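13 Combined LNS adder/subtractors (type 1) [Figure: block diagram. One subtractor forms $z=x-y$, which addresses both the $s_b(z)$ and $d_b(z)$ tables. One output adder gives $\log_b(X+Y) = y+s_b(z)$ (or $y+d_b(z)$ when $S_x \ne S_y$); a second output adder gives $\log_b|X-Y| = y+d_b(z)$ (or $y+s_b(z)$ when $S_x \ne S_y$).]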
15 Diagram of Chen’s 1-D DCT [Figure: the flow graph of slide 10 shown again, with the coefficient groups S(1/8), C(1/8), -C(1/8) repeated alongside it, apparently to mark the units that use them.] Notation: S(m/n)=sin(mπ/n), C(m/n)=cos(mπ/n).
16 Combined LNS adder/subtractors. Some computation units perform the computations below: a_1 X + a_2 Y and -a_2 X + a_1 Y (a_1, a_2 are constants), e.g. with the coefficients S(1/8), C(1/8) and S(1/8), -C(1/8). The two computations access different tables in an LNS adder, so they can share the table-lookup part at the cost of some extra combinational hardware. The table lookups of the two computations use different addresses.
17 Combined LNS adder/subtractors (type 2) [Figure: block diagram. Inputs $\log_b a_1X$, $\log_b a_2Y$, $\log_b a_2X$, $\log_b a_1Y$. Two subtractors form the addresses $z_1$ and $z_2$ for the shared $s_b$ and $d_b$ tables. Outputs: $\log_b(a_1X+a_2Y) = y+s_b(z_1)$ (or $y+d_b(z_1)$ when $S_x \ne S_y$) and $\log_b(-a_2X+a_1Y) = y+d_b(z_2)$ (or $y+s_b(z_2)$ when $S_x \ne S_y$).]
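One way to read the type 2 unit is sketched below in Python. The pairing of inputs into z_1 and z_2 is an assumption (the wiring is not recoverable from the slide text), as are the function name, base 2, and the restriction to positive X, Y and positive constants; s_b and d_b are again computed directly instead of being looked up.

```python
import math

B = 2.0  # assumed base b

def lns_rotate_pair(x, y, log_a1, log_a2):
    """Return (log_b(a1*X + a2*Y), log_b|-a2*X + a1*Y|) for positive X, Y, a1, a2,
    given x = log_b X, y = log_b Y and the logs of the two constants."""
    # Multiplying by a constant is a fixed-point addition in LNS.
    a1x, a2x = log_a1 + x, log_a2 + x
    a1y, a2y = log_a1 + y, log_a2 + y
    z1 = a1x - a2y   # address for the sum (assumed pairing)
    z2 = a2x - a1y   # address for the difference: not the same as z1
    out_sum = a2y + math.log(1.0 + B ** z1, B)         # a2*Y * (1 + a1*X / (a2*Y))
    out_diff = a1y + math.log(abs(1.0 - B ** z2), B)   # a1*Y * |1 - a2*X / (a1*Y)|
    return out_sum, out_diff
```

Because z_1 and z_2 differ, the two lookups cannot share one address as in type 1, which is consistent with the extra combinational hardware mentioned for this unit.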
18 Portions of table-lookup part in LNS adders
19 ROM size with/without combined LNS adder/subtractors
20 Hardware comparison for LNS adder and LNS adder/subtractors
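21 LNS adder/subtractors in Chen’s hardware [Table: number of LNS adders of each kind. Columns: Ordinary, Type 1, Type 2. Rows: direct inferred hardware; two-cycle version hardware. Cell values as extracted, with cell boundaries lost: 14432]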
22 Hardware comparison for Chen’s DCT algorithm at F=4
23 Conclusion: combined LNS adder/subtractors give significant area savings in DCT hardware; they are suitable for reducing area in portable MPEG devices; there is some overhead when converting to/from fixed-point.