Function Evaluation Using Tables and Small Multipliers
CS252A, Spring 2005
Jason Fong
Overview
- Goal: compute values of elementary functions such as sin(x), cos(x), and e^x
- A full lookup table would be too large
- Bipartite and multipartite tables
  - Split the lookup into multiple smaller tables and add their values to obtain an approximation
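The sketch below makes the bipartite idea concrete. It is a minimal first-order version in Python, and the function names are hypothetical. Practical bipartite designs choose the split, and the point where the slope is evaluated, more carefully. A 3k-bit input is split into three k-bit parts. One table holds f at the top 2k bits, and a second holds a slope correction indexed by x0 and x2. The result then costs two lookups and one addition.

```python
import math

def build_bipartite(f, df, k):
    """Two tables for a minimal first-order bipartite scheme.

    A 3k-bit input x in [0, 1) is split into k-bit integers x0, x1, x2 with
    x = x0*2^-k + x1*2^-2k + x2*2^-3k.
    """
    size, a = 1 << k, 2.0 ** -k
    # T0(x0, x1): f at the value of the top 2k bits of x
    t0 = [[f(x0 * a + x1 * a**2) for x1 in range(size)] for x0 in range(size)]
    # T1(x0, x2): slope at x0 times the weight of the low k bits
    t1 = [[df(x0 * a) * x2 * a**3 for x2 in range(size)] for x0 in range(size)]
    return t0, t1


def eval_bipartite(t0, t1, k, x):
    """Two table lookups and one addition replace a 2^(3k)-entry table."""
    mask = (1 << k) - 1
    x0, x1, x2 = x >> (2 * k), (x >> k) & mask, x & mask
    return t0[x0][x1] + t1[x0][x2]


k = 4
t0, t1 = build_bipartite(math.exp, math.exp, k)  # (e^x)' = e^x
max_err = max(abs(eval_bipartite(t0, t1, k, x) - math.exp(x * 2.0 ** (-3 * k)))
              for x in range(1 << (3 * k)))
print(2 * (1 << (2 * k)), "entries instead of", 1 << (3 * k), "| max error", max_err)
```

With k = 4, the two tables hold 512 entries instead of 4096. For e^x on [0, 1), the error of this simple version is bounded by about e·2^(-3k), because the slope is taken at x0 rather than at x0 + x1·2^(-2k).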
Table Method With Small Multipliers
- Similar to the multipartite method
- Approximates f with a Taylor expansion truncated after five terms (through the fourth-order term)
- Uses a set of smaller tables and a few small multipliers
- Gives better precision for the same amount of hardware than the bipartite and multipartite methods
Taylor Series
- Approximates the value of f(x) near x = a
- More terms give a better approximation
- Not directly usable for tables, though: every term depends on the full input x, so tabulating it as-is would still need a full-size table
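For reference, here is the standard m-term Taylor expansion of f about a, with its Lagrange remainder. The symbol m is generic and does not come from the slides. The remainder shrinks like |x − a|^m, which is why more terms help near a. Every term also involves (x − a), and therefore all of x.

```latex
f(x) = \sum_{j=0}^{m-1} \frac{f^{(j)}(a)}{j!}\,(x-a)^{j} + R_m(x),
\qquad
R_m(x) = \frac{f^{(m)}(\xi)}{m!}\,(x-a)^{m}
\ \text{for some } \xi \text{ between } a \text{ and } x
```

With m = 5 and a = x0, this is the expansion the method starts from.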
Making a Taylor Series Useful
- Split the n-bit input x into x0, x1, x2, x3, x4
  - x0, x1, x2, x3 are k bits wide
  - x4 is p bits wide
  - 4k + p = n, with p < k
- Use the first 5 terms, and set a = x0
- Rearrange the terms into groups that each depend on only two parts of x
  - Reduces the number of possible values for each group
  - Reduces the number of rows in each group's table of values
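The split itself is plain bit slicing. This minimal Python sketch assumes x is held as an n-bit unsigned integer whose value is x/2^n. The function name split_input is hypothetical.

```python
def split_input(x, k, p):
    """Split an n-bit input (n = 4k + p) into (x0, x1, x2, x3, x4).

    x0 holds the most significant k bits and x4 the least significant p bits.
    """
    n = 4 * k + p
    if not (0 <= x < (1 << n) and 0 < p < k):
        raise ValueError("need 0 <= x < 2^n and 0 < p < k")
    mask = (1 << k) - 1
    x4 = x & ((1 << p) - 1)
    x3 = (x >> p) & mask
    x2 = (x >> (p + k)) & mask
    x1 = (x >> (p + 2 * k)) & mask
    x0 = x >> (p + 3 * k)
    return x0, x1, x2, x3, x4


# n = 23, k = 5, p = 3: fields separated by underscores for readability
x = 0b10110_01101_00011_11100_101
print(split_input(x, 5, 3))
```

Read as fractions, the fields satisfy x/2^n = x0·2^(-k) + x1·2^(-2k) + x2·2^(-3k) + x3·2^(-4k) + x4·2^(-n).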
Resulting Formula
- Each term depends on only two parts of x
- Compute all possible values of each term and store them in a lookup table
- A table's row number is obtained by concatenating its two input parts
- Some terms require small multiplications
- Add all the terms together to get the function value
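One way to carry out this grouping is written out below as a hedged reconstruction. It treats x0 and the other parts as fractions in [0, 1). It keeps the terms of the five-term expansion down to weight 2^(-4k) and drops smaller ones. Their weight, 2^(-5k), lies below the target precision 2^(-(4k+p)) up to constant factors, since p < k. The grouping was chosen to match the tables and the two multipliers listed later: A, B, C and E are indexed by two k-bit parts, and D by x0 and x4. The exact grouping in Defour, de Dinechin and Muller's scheme may differ.

```latex
\begin{aligned}
x &= x_0 + 2^{-k}x_1 + 2^{-2k}x_2 + 2^{-3k}x_3 + 2^{-4k}x_4,
  \qquad \delta = x - x_0 < 2^{-k}\\
f(x) &\approx \sum_{j=0}^{4} \frac{f^{(j)}(x_0)}{j!}\,\delta^{j}
  \approx A(x_0,x_1) + B(x_0,x_2) + C(x_0,x_3) + D(x_0,x_4)
  + x_2\,E(x_0,x_1) + x_3\,2^{-k}E(x_0,x_1)\\[4pt]
A(x_0,x_1) &= f(x_0) + 2^{-k}x_1 f'(x_0) + 2^{-2k}\tfrac{x_1^2}{2} f''(x_0)
  + 2^{-3k}\tfrac{x_1^3}{6} f'''(x_0) + 2^{-4k}\tfrac{x_1^4}{24} f^{(4)}(x_0)\\
B(x_0,x_2) &= 2^{-2k}x_2 f'(x_0) + 2^{-4k}\tfrac{x_2^2}{2} f''(x_0)\\
C(x_0,x_3) &= 2^{-3k}x_3 f'(x_0)\\
D(x_0,x_4) &= 2^{-4k}x_4 f'(x_0)\\
E(x_0,x_1) &= 2^{-3k}x_1 f''(x_0) + 2^{-4k}\tfrac{x_1^2}{2} f'''(x_0)
\end{aligned}
```

The term 2^(-k)E is needed only down to 2^(-(4k+p+g)), so the product x3·2^(-k)E uses just p+g bits of E. The product x2·E uses k+p+g bits. This matches a k × (k+p+g) multiplier and a k × (p+g) multiplier.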
Input Restrictions
- x is in a fixed-point format
- x is in the range [0,1)
- Range reduction is common in approximation methods:
  - Apply a transformation to reduce the range of the input
  - Obtain the approximation
  - Apply another transformation to obtain the final value
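As one common example of range reduction, the sketch below handles e^x. It is not necessarily the reduction used in this project. The names exp_with_range_reduction and core are hypothetical, and math.exp stands in for the table-based core.

```python
import math

LN2 = math.log(2)

def exp_with_range_reduction(y, core):
    """Compute e^y for any real y with a core that only handles inputs in [0, 1).

    1. Reduce: y = m*ln2 + r with integer m, so r lies in [0, ln2), inside [0, 1).
    2. Approximate: core(r) ~ e^r (e.g. a table-based evaluator).
    3. Reconstruct: e^y = 2^m * e^r, an exact scaling by a power of two.
    """
    m = math.floor(y / LN2)
    r = y - m * LN2  # floating-point rounding may put r a hair outside [0, ln2)
    return math.ldexp(core(r), m)


print(exp_with_range_reduction(5.3, math.exp), math.exp(5.3))
```

Reducing by multiples of ln 2 makes the final transformation a shift of the exponent, which is cheap in hardware.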
Block Diagram
Area Reduction in Tables
- Example: n = 23, k = 5, p = 3; g = number of guard bits
- Full lookup table: 2^n entries, each 4k+p = n bits
  - 2^23 ≈ 8.4 million rows
- Smaller tables:
  - Table A: 2^(2k) entries of 4k+p+g bits
  - Table B: 2^(2k) entries of 2k+p+g bits
  - Tables C and E: 2 × 2^(2k) entries of k+p+g bits
  - Table D: 2^(p+k) entries of p+g bits
  - 1024 + 1024 + 2048 + 256 = 4,352 rows in total
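The row counts above can be checked with a few lines of arithmetic. This sketch also totals the bits. The guard-bit count g = 3 in the example is a hypothetical value, since the slides do not give g, and the function name is hypothetical too.

```python
def table_sizes(k, p, g):
    """Rows and bits of the full table vs. the five small tables A-E."""
    n = 4 * k + p
    full_rows = 2 ** n
    full_bits = full_rows * n
    # (number of rows, word width) for each small table
    small = {
        "A": (2 ** (2 * k), 4 * k + p + g),
        "B": (2 ** (2 * k), 2 * k + p + g),
        "C": (2 ** (2 * k), k + p + g),
        "E": (2 ** (2 * k), k + p + g),
        "D": (2 ** (p + k), p + g),
    }
    small_rows = sum(rows for rows, _ in small.values())
    small_bits = sum(rows * width for rows, width in small.values())
    return full_rows, full_bits, small_rows, small_bits


full_rows, full_bits, small_rows, small_bits = table_sizes(5, 3, 3)
print(full_rows, small_rows)   # 8388608 4352
print(full_bits, small_bits)   # 192937984 67072
```

With these assumptions, total storage shrinks by a factor of roughly 2,900.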
Multipliers
- Two small multipliers: k × (k+p+g) bits and k × (p+g) bits
- One operand (k bits) is less than ¼ of the input precision n
- Modern FPGAs include small hardware multipliers (DSP blocks)
Implementation
- A Java program calculates the table values
- The function evaluator was implemented using Altera Quartus II
- Size and delay were measured for an Altera Stratix II FPGA
Building Table Values
- The Java program generates Verilog code implementing each lookup table
- It iterates through each combination of (x0, x1), (x0, x2), etc. and calculates the corresponding table value
- Correctness is checked by iterating through all values of x and comparing with the function's real value
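The sketch below follows the same idea in Python; the original generator was written in Java. Its output format is not shown, so the module layout here is an assumption. Each table is emitted as a combinational Verilog case statement addressed by the concatenated inputs. The example fills table C for f = e^x, taking C(x0, x3) = f'(x0)·2^(-3k)·x3 (x3 read as a fraction) as an assumed definition. Entries are rounded to units of 2^(-(n+g)), with a hypothetical g = 3. The names verilog_rom and table_c are hypothetical.

```python
import math

def verilog_rom(name, addr_bits, width, values):
    """Return Verilog source for a combinational ROM: data = values[addr]."""
    lines = [
        f"module {name} (",
        f"    input  wire [{addr_bits - 1}:0] addr,",
        f"    output reg  [{width - 1}:0] data",
        ");",
        "    always @(*) begin",
        "        case (addr)",
    ]
    for addr, value in enumerate(values):
        lines.append(f"            {addr_bits}'d{addr}: data = {width}'d{value};")
    lines += [
        f"            default: data = {width}'d0;",
        "        endcase",
        "    end",
        "endmodule",
    ]
    return "\n".join(lines)


# Example: table C for f = e^x, addressed by the concatenation {x0, x3}.
k, p, g = 5, 3, 3                  # g = 3 is a hypothetical guard-bit count
n = 4 * k + p
values = []
for x0 in range(1 << k):           # outer loop: high address bits
    for x3 in range(1 << k):       # inner loop: low address bits
        # Assumed definition: C(x0, x3) = f'(x0) * 2^-3k * (x3 / 2^k), f' = e^x
        c = math.exp(x0 * 2.0 ** -k) * x3 * 2.0 ** (-4 * k)
        values.append(round(c * 2 ** (n + g)))  # units of 2^-(n+g)
width = max(values).bit_length()   # f' > 1 for e^x, so take the width from the data
src = verilog_rom("table_c", 2 * k, width, values)
print("\n".join(src.splitlines()[:8]))
```

Taking the width from the largest entry matters here. For e^x the derivative exceeds 1, so the entries need a couple of bits beyond k+p+g.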
Guard Bits
- The worst-case number of guard bits required can be found from the logic structure
- Not all of those guard bits may actually be needed
- Adjust the number of guard bits to find the minimum needed for a particular function
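The exhaustive check and the guard-bit search can be sketched in Python under stated assumptions. The sketch uses a five-table grouping A-E with the two products x2·E and x3·2^(-k)·E. That grouping is consistent with the table sizes above, but it is not necessarily the exact formula used in the project. The sketch rounds every table entry to g guard bits, truncates the two products, and rounds the sum to n bits. All function names are hypothetical. Small parameters (k = 3, p = 2, so n = 14) keep the exhaustive loop fast. At this size the dropped Taylor terms alone may cost more than one ulp, so the error target is a parameter.

```python
import math

def build_tables(derivs, k, p, g):
    """Tables A-E as integers in units of 2^-(n+g), with n = 4k + p.

    derivs[j] is the j-th derivative of f (j = 0..4). Each input part is read
    as a fraction in [0, 1); a table address is the concatenation of its two parts.
    """
    n, a = 4 * k + p, 2.0 ** -k
    scale = 2.0 ** (n + g)

    def q(v):  # round to the nearest table unit
        return round(v * scale)

    A, B, C, D, E = [], [], [], [], []
    for i0 in range(1 << k):
        d = [fn(i0 * a) for fn in derivs]  # f, f', ..., f'''' at x0
        for i in range(1 << k):
            t = i * a  # x1, x2 or x3 as a fraction
            A.append(q(d[0] + a * t * d[1] + a**2 * t**2 / 2 * d[2]
                       + a**3 * t**3 / 6 * d[3] + a**4 * t**4 / 24 * d[4]))
            B.append(q(a**2 * t * d[1] + a**4 * t**2 / 2 * d[2]))
            C.append(q(a**3 * t * d[1]))
            E.append(q(a**3 * t * d[2] + a**4 * t**2 / 2 * d[3]))
        for i in range(1 << p):
            D.append(q(a**4 * (i / (1 << p)) * d[1]))
    return A, B, C, D, E


def evaluate(tables, x, k, p, g):
    """Approximate f(x / 2^n) for an n-bit integer x; result in units of 2^-n."""
    A, B, C, D, E = tables
    m = (1 << k) - 1
    x4 = x & ((1 << p) - 1)
    x3, x2, x1, x0 = ((x >> (p + s * k)) & m for s in range(4))
    e = E[(x0 << k) | x1]
    total = (A[(x0 << k) | x1] + B[(x0 << k) | x2] + C[(x0 << k) | x3]
             + D[(x0 << p) | x4]
             + ((x2 * e) >> k)            # k x (k+p+g) multiply: x2 * E, truncated
             + ((x3 * (e >> k)) >> k))    # k x (p+g) multiply: x3 * 2^-k * E, truncated
    return (total + ((1 << g) >> 1)) >> g  # round off the g guard bits


def max_error_ulps(f, derivs, k, p, g):
    """Exhaustive check: largest |table result - f(x)| over all n-bit inputs, in ulps."""
    n = 4 * k + p
    tables = build_tables(derivs, k, p, g)
    return max(abs(evaluate(tables, x, k, p, g) - f(x / 2**n) * 2**n)
               for x in range(1 << n))


def min_guard_bits(f, derivs, k, p, target_ulps, max_g=12):
    """Smallest g in [0, max_g] whose exhaustive max error is below target_ulps."""
    for g in range(max_g + 1):
        if max_error_ulps(f, derivs, k, p, g) < target_ulps:
            return g
    return None


derivs = [math.exp] * 5  # every derivative of e^x is e^x
g = min_guard_bits(math.exp, derivs, k=3, p=2, target_ulps=8)
print(g, max_error_ulps(math.exp, derivs, 3, 2, g))
```

Raising g shrinks only the rounding and truncation error. Once the dropped Taylor terms dominate, extra guard bits stop helping, and that is the point the search looks for.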
Results
- Synthesized for an Altera Stratix II (… ALUTs, 96 DSP blocks used as multipliers)
- f(x) = e^x, n = …: … ALUTs (17%), 4 DSP blocks (4%), 23 ns delay
In Comparison...
Function          ALUTs   DSPs   Delay (ns)
e^x, n = …        …       …      …
e^x, n = …        …       …      …
sin(x), n = …     …       …      …
adder, n = …      …       …      …
adder, n = …      …       …      …
Possible Improvements
- Optimize the final adder
  - Currently a generic parallel adder
  - Not all operands are the same width, so a custom adder can exploit this
- Merge the multiplications into the final adder
  - Feed the multipliers' partial-product arrays directly into the adder
- Change how the input x is split
  - Can reduce table size
  - But requires more complicated formulas for the table values
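The sketch below illustrates merging multiplications into the final adder. Instead of finishing the product x2·E in a separate multiplier, its shifted partial products join the table outputs as extra rows of one multi-operand adder. That adder is built from carry-save (3:2) stages, with a single carry-propagate addition at the end. The operand values and function names are hypothetical, and the bit alignment of each operand is omitted for clarity.

```python
def carry_save_add(a, b, c):
    """3:2 compressor on whole words: a + b + c == s + carry, with no carry chain."""
    s = a ^ b ^ c
    carry = ((a & b) | (a & c) | (b & c)) << 1
    return s, carry


def partial_products(x, y, width_x):
    """Shifted partial products of x * y, one row per bit of x (array multiplier)."""
    return [(y << i) if (x >> i) & 1 else 0 for i in range(width_x)]


def multi_operand_sum(operands):
    """Reduce many rows with carry-save stages, then do one carry-propagate add."""
    rows = list(operands)
    while len(rows) > 2:
        s, c = carry_save_add(rows[0], rows[1], rows[2])
        rows = rows[3:] + [s, c]  # three rows in, two rows out
    return sum(rows)  # the single final carry-propagate addition


# Hypothetical table outputs A, B, C, D plus the rows of x2 * E
a_out, b_out, c_out, d_out = 1000, 200, 30, 4
x2, e = 0b10110, 0b1011011
ops = [a_out, b_out, c_out, d_out] + partial_products(x2, e, 5)
print(multi_operand_sum(ops), a_out + b_out + c_out + d_out + x2 * e)
```

Only the last step propagates carries. The separate multiplier and the separate addition of its result therefore collapse into one adder tree.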