Bitwidth Analysis with Application to Silicon Compilation Mark Stephenson Jonathan Babb Saman Amarasinghe MIT Laboratory for Computer Science.


1 Bitwidth Analysis with Application to Silicon Compilation Mark Stephenson Jonathan Babb Saman Amarasinghe MIT Laboratory for Computer Science

2 Goal
June 19th, 2000, www.cag.lcs.mit.edu/bitwise
For a program written in a high-level language, automatically find the minimum number of bits needed to represent:
– each static variable in the program;
– each operation in the program.

3 Usefulness of Bitwidth Analysis
– Higher language abstraction
– Enables other compiler optimizations:
1. Synthesizing application-specific processors
2. Optimizing for power-aware processors
3. Extracting more parallelism for SIMD processors

4 Bitwidth Opportunities
Runtime profiling reveals plenty of bitwidth opportunities. For the SPECint95 benchmark suite:
– Over 50% of operands use less than half the number of bits specified by the programmer.

5 Analysis Constraints
Bitwidth results must maintain program correctness for all input data sets.
– Results must not depend on runtime data.
A static analysis can do very well, even in light of this constraint.

6 Bitwidth Extraction
Use the abundant hints in the source language to discover bitwidths with near-optimal precision.
Caveats:
– The analysis is limited to fixed-point variables.
– We assume source program correctness.

7 The Hints
Bitwidth-refining constructs:
1. Arithmetic operations
2. Boolean operations
3. Bitmask operations
4. Loop induction variable bounding
5. Clamping operations
6. Type castings
7. Static array index bounding

8 1. Arithmetic Operations
Example (bitwidths after each statement):
    int a;
    unsigned b;
    a = random();
    b = random();   // a: 32 bits, b: 32 bits
    a = a / 2;      // a: 31 bits, b: 32 bits
    b = b >> 4;     // a: 31 bits, b: 28 bits
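The refinements on this slide can be sketched with a simple data-range model. The C below is a hypothetical illustration, not Bitwise's implementation: `range_t`, `bits_needed`, `range_div_const`, and `range_shr` are names introduced here, values are tracked as 64-bit ranges so 32-bit endpoints cannot overflow, and, like the slide, it treats the result of `random()` as an unknown full-width value.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical sketch, not Bitwise's implementation.
   A data-range: every value v with lo <= v <= hi. */
typedef struct { int64_t lo, hi; } range_t;

/* Smallest width holding every value in r: plain binary when r.lo >= 0,
   two's complement otherwise. Capped at 63 bits in this sketch. */
static int bits_needed(range_t r) {
    assert(r.lo <= r.hi);
    int n = 1;
    if (r.lo >= 0) {
        while (n < 63 && r.hi > (INT64_C(1) << n) - 1)
            n++;
    } else {
        while (n < 63 && (r.lo < -(INT64_C(1) << (n - 1)) ||
                          r.hi > (INT64_C(1) << (n - 1)) - 1))
            n++;
    }
    return n;
}

/* a / c for a constant c > 0. C division truncates toward zero, which is
   monotonic in a, so the range endpoints map to the new endpoints. */
static range_t range_div_const(range_t a, int64_t c) {
    assert(c > 0);
    range_t r = { a.lo / c, a.hi / c };
    return r;
}

/* b >> s for a non-negative range b and a constant shift s. */
static range_t range_shr(range_t b, int s) {
    assert(b.lo >= 0 && s >= 0 && s < 63);
    range_t r = { b.lo >> s, b.hi >> s };
    return r;
}
```

For `a = a / 2` on an unknown `int`, the range shrinks to [-2^30, 2^30 - 1], which fits in 31 bits; for `b = b >> 4` on an unknown `unsigned`, it shrinks to [0, 2^28 - 1], which fits in 28 bits.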

9 2. Boolean Operations
Example:
    int a;           // a: 32 bits
    a = (b != 15);   // a: 1 bit

10 3. Bitmask Operations
Example:
    int a;                  // a: 32 bits
    a = random() & 0xff;    // a: 8 bits
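A hypothetical sketch of the boolean and bitmask rules in a 64-bit data-range style (all names are introduced here for illustration): a comparison always yields 0 or 1, and AND-ing with a non-negative constant mask can never produce a value outside [0, mask].

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical sketch, not Bitwise's implementation. */
typedef struct { int64_t lo, hi; } range_t;

/* Width of a non-negative range in plain binary (at least 1 bit). */
static int unsigned_bits(range_t r) {
    assert(0 <= r.lo && r.lo <= r.hi);
    int n = 1;
    while (n < 63 && r.hi > (INT64_C(1) << n) - 1)
        n++;
    return n;
}

/* Result of a comparison such as (b != 15): always 0 or 1. */
static range_t range_compare(void) {
    range_t r = { 0, 1 };
    return r;
}

/* x & mask for a constant mask >= 0. Every result bit outside the mask is 0,
   so the result lies in [0, mask]; if x is also non-negative, x & mask <= x. */
static range_t range_and_const(range_t x, int64_t mask) {
    assert(mask >= 0);
    range_t r = { 0, mask };
    if (x.lo >= 0 && x.hi < mask)
        r.hi = x.hi;
    return r;
}
```

Whatever `random()` returns, `range_and_const` bounds the result by `0xff`, so `a` needs 8 bits; a comparison result needs 1.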

11 4. Loop Induction Variable Bounding
Applicable to for-loop induction variables.
Example:
    int i;                           // i: 32 bits
    for (i = 0; i < 6; i++) { … }    // i: 3 bits
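Assuming the bounds are constants and the loop body never assigns `i`, the bound can be read straight off the loop header. Note that `i` reaches the limit itself on exit, so `i < 6` gives the range [0, 6]. The helper names below are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical sketch, not Bitwise's implementation. */
typedef struct { int64_t lo, hi; } range_t;

/* Width of a non-negative range in plain binary (at least 1 bit). */
static int unsigned_bits(range_t r) {
    assert(0 <= r.lo && r.lo <= r.hi);
    int n = 1;
    while (n < 63 && r.hi > (INT64_C(1) << n) - 1)
        n++;
    return n;
}

/* Range of i in: for (i = start; i < limit; i++) { ... }
   assuming start < limit are constants and the body never writes i.
   Inside the body i runs from start to limit - 1; after the final
   increment, i == limit when the loop exits. */
static range_t induction_range(int64_t start, int64_t limit) {
    assert(0 <= start && start < limit);
    range_t r = { start, limit };
    return r;
}
```

With a limit of 8 instead of 6, `i` would need 4 bits, because the exit value 8 does not fit in 3 bits.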

12 5. Clamping Optimization
Multimedia codes often simulate saturating instructions.
Example:
    int valpred;                  // valpred: 32 bits
    if (valpred > 32767)
        valpred = 32767;
    else if (valpred < -32768)
        valpred = -32768;         // valpred: 16 bits
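The clamp can be modeled with range refinement: on the path where neither test fires, `valpred` is known to lie within the bounds, and on the other two paths it is overwritten with a constant. The sketch below (hypothetical names, 64-bit data-ranges) joins the three paths, including only those that are feasible for the incoming range.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical sketch, not Bitwise's implementation. */
typedef struct { int64_t lo, hi; } range_t;

/* Smallest width holding every value in r (two's complement if negative). */
static int bits_needed(range_t r) {
    assert(r.lo <= r.hi);
    int n = 1;
    if (r.lo >= 0) {
        while (n < 63 && r.hi > (INT64_C(1) << n) - 1)
            n++;
    } else {
        while (n < 63 && (r.lo < -(INT64_C(1) << (n - 1)) ||
                          r.hi > (INT64_C(1) << (n - 1)) - 1))
            n++;
    }
    return n;
}

/* Range of v after:
     if (v > hi) v = hi; else if (v < lo) v = lo;
   The result is the union of the ranges along the feasible paths. */
static range_t clamp_range(range_t v, int64_t lo, int64_t hi) {
    assert(lo <= hi && v.lo <= v.hi);
    /* Fall-through path: both tests failed, so lo <= v <= hi. */
    range_t r = { v.lo > lo ? v.lo : lo, v.hi < hi ? v.hi : hi };
    int have = r.lo <= r.hi;
    if (v.hi > hi) {               /* v > hi is possible: v becomes hi */
        if (!have) r.lo = hi;
        r.hi = hi;
        have = 1;
    }
    if (v.lo < lo) {               /* v < lo is possible: v becomes lo */
        if (!have) r.hi = lo;
        r.lo = lo;
        have = 1;
    }
    assert(have);
    return r;
}
```

For an unknown `int`, the result is [-32768, 32767], which needs 16 bits; an input already known to be in [0, 100] passes through unchanged.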

13 6. Type Casting (Part I)
Example:
    int a;     // a: 32 bits
    char b;    // b: 8 bits
    a = b;     // a: 8 bits, b: 8 bits

14 6. Type Casting (Part II)
Example:
    int a;     // a: 32 bits
    char b;    // b: 8 bits
    b = a;     // a: 8 bits, b: 8 bits
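Part I is forward propagation: `a` inherits the 8-bit range of `b`. Part II runs backward: converting `a` to an 8-bit type can only observe the low 8 bits of `a`, so if `a` has no other uses its upper bits never need to be computed. The hypothetical sketch below uses `uint8_t` rather than `char`, because the signedness of plain `char` is implementation-defined, while conversion to an unsigned type is always reduction modulo 2^8.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical sketch, not Bitwise's implementation. */

/* Mask of the source bits that a truncating conversion to a
   dest_bits-wide type can observe (the "demanded" bits). */
static uint64_t demanded_bits(int dest_bits) {
    assert(dest_bits > 0 && dest_bits < 64);
    return (UINT64_C(1) << dest_bits) - 1;
}

/* Part II: b = a with an 8-bit unsigned destination, defined as a mod 256. */
static uint8_t assign_to_u8(uint32_t a) {
    return (uint8_t)a;
}

/* Part I: widening an 8-bit unsigned value cannot create new bits,
   so the 32-bit copy still fits in 8 bits. */
static uint32_t assign_from_u8(uint8_t b) {
    return b;
}
```

Masking `a` down to its demanded bits before the conversion never changes the result, which is what licenses discarding the upper 24 bits.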

15 7. Array Index Optimization
An index into an array can be bounded by the size of the array.
Example:
    int a, b;          // a: 32 bits, b: 32 bits
    int X[1024];
    X[a] = X[4*b];     // a: 10 bits, b: 8 bits
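The array example combines two steps: the declared size bounds each index expression, and a backward transfer through the multiplication bounds `b`. A hypothetical sketch (in a full analysis the backward result would also be intersected with `b`'s existing range, which here is the full `int` range and so changes nothing):

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical sketch, not Bitwise's implementation. */
typedef struct { int64_t lo, hi; } range_t;

/* Width of a non-negative range in plain binary (at least 1 bit). */
static int unsigned_bits(range_t r) {
    assert(0 <= r.lo && r.lo <= r.hi);
    int n = 1;
    while (n < 63 && r.hi > (INT64_C(1) << n) - 1)
        n++;
    return n;
}

/* Valid indices of an array of n elements; the analysis assumes the
   source program never indexes out of bounds. */
static range_t index_range(int64_t n) {
    assert(n > 0);
    range_t r = { 0, n - 1 };
    return r;
}

/* Floor and ceiling of a / c for c > 0 (C's / truncates toward zero). */
static int64_t floor_div(int64_t a, int64_t c) {
    return a / c - (a % c != 0 && a < 0);
}
static int64_t ceil_div(int64_t a, int64_t c) {
    return a / c + (a % c != 0 && a > 0);
}

/* Backward transfer for y = c * x with a constant c > 0: from y's range,
   x must satisfy ceil(y.lo / c) <= x <= floor(y.hi / c). */
static range_t mul_const_backward(range_t y, int64_t c) {
    assert(c > 0);
    range_t r = { ceil_div(y.lo, c), floor_div(y.hi, c) };
    return r;
}
```

`X[1024]` gives indices in [0, 1023] (10 bits), and `4*b` in that range forces `b` into [0, 255] (8 bits).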

16 Data-Flow Analysis
Three candidate lattices:
– Bitwidth
– Vector of bits
– Data-ranges
Propagating bitwidths: a = a + 1 takes a from 4 bits to 5 bits.

17 Data-Flow Analysis (continued)
Propagating bit vectors: a = a + 1 takes a from 1X to XXX (X = unknown bit).

18 Data-Flow Analysis (continued)
Propagating data-ranges: a = a + 1. Four bits are required.
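The difference between the bitwidth and data-range lattices on `a = a + 1` can be made concrete. In the hypothetical sketch below, the starting range [0, 13] is an illustrative assumption (the slide's actual ranges are not available), chosen because it fits in 4 bits like the slide's example; the bit-vector lattice is not modeled.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical sketch, not Bitwise's implementation. */
typedef struct { int64_t lo, hi; } range_t;

/* Width of a non-negative range in plain binary (at least 1 bit). */
static int unsigned_bits(range_t r) {
    assert(0 <= r.lo && r.lo <= r.hi);
    int n = 1;
    while (n < 63 && r.hi > (INT64_C(1) << n) - 1)
        n++;
    return n;
}

/* Bitwidth lattice: adding values of widths wa and wb may carry out of the
   top bit, so the result is conservatively given max(wa, wb) + 1 bits. */
static int bitwidth_add(int wa, int wb) {
    return (wa > wb ? wa : wb) + 1;
}

/* Data-range lattice: [a.lo, a.hi] + [b.lo, b.hi] = [a.lo + b.lo, a.hi + b.hi]. */
static range_t range_add(range_t a, range_t b) {
    range_t r = { a.lo + b.lo, a.hi + b.hi };
    return r;
}
```

The bitwidth lattice grows `a` to 5 bits on every increment, whereas the range [1, 14] still fits in 4; only a range reaching 16 would need a fifth bit.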

19 Propagating Data-Ranges
Propagate data-ranges forward and backward over the control-flow graph, using the transfer functions described in the paper.
Use Static Single Assignment (SSA) form with extensions to:
– gracefully handle pointers and arrays;
– extract data-range information from conditional statements.
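As an illustration of forward and backward propagation, here are plausible transfer functions for `c = a + b` on data-ranges. This is a hypothetical sketch; the paper defines its own set. Forward, `c` gets the sum of the operand ranges; backward, a known range for `c` bounds `a` to `c - b`, intersected with what was already known about `a`.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical sketch, not Bitwise's implementation. */
typedef struct { int64_t lo, hi; } range_t;

/* Intersection of two ranges; returns 0 if they do not overlap. */
static int range_intersect(range_t x, range_t y, range_t *out) {
    range_t r = { x.lo > y.lo ? x.lo : y.lo, x.hi < y.hi ? x.hi : y.hi };
    if (r.lo > r.hi)
        return 0;
    *out = r;
    return 1;
}

/* Forward transfer for c = a + b. */
static range_t add_forward(range_t a, range_t b) {
    range_t r = { a.lo + b.lo, a.hi + b.hi };
    return r;
}

/* Backward transfer for c = a + b, solved for a: a = c - b, so
   a lies in [c.lo - b.hi, c.hi - b.lo]. */
static range_t add_backward(range_t c, range_t b) {
    range_t r = { c.lo - b.hi, c.hi - b.lo };
    return r;
}
```

If later uses show that `a1 = a0 + 1` only needs to cover [-1, 9], the backward step narrows `a0` from the full `int` range to [-2, 8].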

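20 Example of Data-Range Propagation
    a0 = input()
    a1 = a0 + 1
    if (a1 < 0)                    // true edge
        a2 = a1 : (a1 < 0)
        a3 = a2 + 1
    else                           // false edge
        a4 = a1 : (a1 ≥ 0)
        c0 = a4
    a5 = φ(a3, a4)
    b0 = array[a5]
The statements a2 = a1 : (a1 < 0) and a4 = a1 : (a1 ≥ 0) are range-refinement functions: each copies a1 while recording the condition known to hold on that edge.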

21 Example of Data-Range Propagation (continued)
Same control-flow graph as slide 20; array's bounds are [0:9].
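The propagation over this example can be sketched as one forward pass followed by one backward pass, which suffices here because the graph has no loops. This is a hypothetical illustration, not Bitwise's algorithm: it assumes the graph branches to the true edge when `a1 < 0`, an `int` input, and array bounds [0:9], and it combines the two uses of `a1` with a union because each use sees a disjoint part of `a1`'s values.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical sketch, not Bitwise's implementation. */
typedef struct { int64_t lo, hi; } range_t;

static range_t mk(int64_t lo, int64_t hi) {
    assert(lo <= hi);  /* an empty range would mean unreachable code */
    range_t r = { lo, hi };
    return r;
}
static int64_t max64(int64_t x, int64_t y) { return x > y ? x : y; }
static int64_t min64(int64_t x, int64_t y) { return x < y ? x : y; }

/* Intersection: both constraints hold. */
static range_t meet(range_t x, range_t y) { return mk(max64(x.lo, y.lo), min64(x.hi, y.hi)); }
/* Union: the value comes from one path or the other. */
static range_t join(range_t x, range_t y) { return mk(min64(x.lo, y.lo), max64(x.hi, y.hi)); }

typedef struct { range_t a0, a1, a2, a3, a4, a5, c0; } example_ranges;

static example_ranges propagate(int64_t array_len) {
    example_ranges r;
    /* Forward pass, in program order. */
    r.a0 = mk(INT32_MIN, INT32_MAX);                  /* a0 = input(): unknown int */
    r.a1 = mk(r.a0.lo + 1, r.a0.hi + 1);              /* a1 = a0 + 1 */
    r.a2 = meet(r.a1, mk(INT64_MIN, -1));             /* a2 = a1 : (a1 < 0) */
    r.a3 = mk(r.a2.lo + 1, r.a2.hi + 1);              /* a3 = a2 + 1 */
    r.a4 = meet(r.a1, mk(0, INT64_MAX));              /* a4 = a1 : (a1 >= 0) */
    r.c0 = r.a4;                                      /* c0 = a4 */
    r.a5 = join(r.a3, r.a4);                          /* a5 = phi(a3, a4) */

    /* Backward pass: b0 = array[a5] in a correct program means
       0 <= a5 <= array_len - 1. */
    r.a5 = meet(r.a5, mk(0, array_len - 1));
    r.a3 = meet(r.a3, r.a5);                          /* a3 flows into a5 */
    r.a4 = meet(r.a4, r.a5);                          /* so does a4 */
    r.c0 = r.a4;                                      /* c0 is a copy of a4 */
    r.a2 = meet(r.a2, mk(r.a3.lo - 1, r.a3.hi - 1));  /* a3 = a2 + 1 */
    /* a1 reaches either a2 or a4, depending on the branch, so its
       range is the union of what those two uses allow. */
    r.a1 = meet(r.a1, join(r.a2, r.a4));
    r.a0 = meet(r.a0, mk(r.a1.lo - 1, r.a1.hi - 1));  /* a1 = a0 + 1 */
    return r;
}
```

Under these assumptions `a5`, `a4`, and `c0` shrink to [0, 9], `a3` to [0, 0], `a2` to [-1, -1], `a1` to [-1, 9], and the input `a0` to [-2, 8].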

22 What to Do with Loops?
Finding the fixed point around back edges will often saturate data-ranges.

23 What to Do with Loops? (continued)
Example:
    a = 0
    for (y = 1; y < 100; y++)
        a = a + 5;
SSA form:
    a0 = 0
    y0 = 1
    y1 = φ(y0, y2)
    a1 = φ(a0, a3)
    y1 < 100          (loop test)
    a2 = a1 + 5
    y2 = y1 + 1
Ranges at successive iterations: a: 0..0, y: 1..1; a: 0..5, y: 1..2; a: 0..10, y: 1..3; a: 0..20, y: 1..5; a: 0..25, y: 1..6; eventually a: 0..∞, y: 1..∞.
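The saturation problem is easy to reproduce. This hypothetical sketch repeatedly pushes ranges around the back edge of the example loop; as in the slide's illustration, it ignores the `y1 < 100` test, so each round widens `a` by 5 and `y` by 1, and no fixed point is reached until the ranges saturate at the limits of the type.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical sketch, not Bitwise's implementation. */
typedef struct { int64_t lo, hi; } range_t;

/* Union of two ranges, as computed at a phi. */
static range_t join(range_t x, range_t y) {
    range_t r = { x.lo < y.lo ? x.lo : y.lo, x.hi > y.hi ? x.hi : y.hi };
    return r;
}

static range_t add_const(range_t x, int64_t c) {
    range_t r = { x.lo + c, x.hi + c };
    return r;
}

typedef struct { range_t a, y; } loop_ranges;

/* Ranges at the loop-header phis after `rounds` trips around the back edge,
   ignoring the y1 < 100 test just as the slide's illustration does. */
static loop_ranges naive_rounds(int rounds) {
    assert(rounds >= 0);
    const range_t a_init = { 0, 0 }, y_init = { 1, 1 };
    loop_ranges s;
    s.a = a_init;
    s.y = y_init;
    for (int k = 0; k < rounds; k++) {
        s.a = join(a_init, add_const(s.a, 5));  /* a1 = phi(a0, back-edge a1 + 5) */
        s.y = join(y_init, add_const(s.y, 1));  /* y1 = phi(y0, back-edge y1 + 1) */
    }
    return s;
}
```

After 5 rounds `a` is 0..25 and `y` is 1..6, as on the slide; after 1000 rounds `a` is still growing (0..5000).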

24 Our Loop Solution
Find the closed-form solutions to commonly occurring sequences.
– A sequence is a mutually dependent group of instructions.
Use the closed-form solutions to determine final ranges.
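Applied to the loop from the previous slide, a closed form replaces iteration entirely. A hypothetical sketch for the simplest sequence shape, an update `x = x + inc` with a known trip count (the names and the restriction to this one shape are assumptions of the sketch):

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical sketch, not Bitwise's implementation. */
typedef struct { int64_t lo, hi; } range_t;

/* Width of a non-negative range in plain binary (at least 1 bit). */
static int unsigned_bits(range_t r) {
    assert(0 <= r.lo && r.lo <= r.hi);
    int n = 1;
    while (n < 63 && r.hi > (INT64_C(1) << n) - 1)
        n++;
    return n;
}

/* Iterations of: for (i = start; i < limit; i += step), with step > 0. */
static int64_t trip_count(int64_t start, int64_t limit, int64_t step) {
    assert(step > 0);
    if (start >= limit)
        return 0;
    return (limit - start + step - 1) / step;
}

/* Closed form of the linear sequence x = x + inc, run n times from init:
   x takes the values init, init + inc, ..., init + inc * n. */
static range_t linear_sequence_range(int64_t init, int64_t inc, int64_t n) {
    assert(n >= 0);
    int64_t last = init + inc * n;
    range_t r = { init < last ? init : last, init < last ? last : init };
    return r;
}
```

`for (y = 1; y < 100; y++)` runs 99 times, so `a` stays within [0, 495] (9 bits) and `y` within [1, 100] (7 bits), instead of saturating.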

25 Finding the Closed-Form Solution
    a = 0
    for i = 1 to 10
        a = a + 1
        for j = 1 to 10
            a = a + 2
            for k = 1 to 10
                a = a + 3
    ... = a + 4


27 Finding the Closed-Form Solution (continued)
Non-trivial to find the exact ranges.


29 Finding the Closed-Form Solution (continued)
Can easily find a conservative range.

30 Solving the Linear Sequence
Figure out the iteration count of each loop.

31 Solving the Linear Sequence (continued)
Find out how much each instruction contributes to the sequence, using the iteration count: contribution = increment × iteration count.

32 Solving the Linear Sequence (continued)
Sum all the contributions together, and take the data-range union with the initial value: final range = (sum of contributions) ∪ initial value.
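The three steps can be checked on the example, assuming that the three loops are nested (the flattened slide text suggests this but does not show indentation) and each runs 10 times. Under that assumption `a = a + 1` executes 10 times, `a = a + 2` 100 times, and `a = a + 3` 1000 times, giving contributions of 10, 200, and 3000, so `a` lies in [0, 3210] and `a + 4` in [4, 3214]. The sketch below, with hypothetical names, computes this closed form and cross-checks it by running the loop nest.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical sketch, not Bitwise's implementation. */
typedef struct { int64_t lo, hi; } range_t;

enum { DEPTH = 3 };

/* Closed form for a sequence of increments nested DEPTH loops deep:
   inc[k] sits inside the outermost k + 1 loops, and loop k runs trips[k]
   times per entry. Increments are assumed non-negative. */
static range_t nested_sequence_range(int64_t init, const int64_t inc[DEPTH],
                                     const int64_t trips[DEPTH]) {
    int64_t executions = 1, total = 0;
    for (int k = 0; k < DEPTH; k++) {
        assert(inc[k] >= 0 && trips[k] >= 0);
        executions *= trips[k];          /* how often inc[k] runs in total */
        total += inc[k] * executions;    /* its contribution to the sequence */
    }
    range_t r = { init, init + total };  /* union with the initial value */
    return r;
}

/* Brute-force check: execute the loop nest and track the largest a.
   a only increases, so checking once per outer iteration finds the maximum. */
static int64_t simulate_max(void) {
    int64_t a = 0, max = 0;
    for (int i = 1; i <= 10; i++) {
        a = a + 1;
        for (int j = 1; j <= 10; j++) {
            a = a + 2;
            for (int k = 1; k <= 10; k++)
                a = a + 3;
        }
        if (a > max)
            max = a;
    }
    return max;
}
```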

33 Results
– Standalone Bitwise compiler:
  – bits cut from scalar variables;
  – bits cut from array variables.
– With the DeepC silicon compiler.

34 Percentage of Original Scalar Bits
[Chart]

35 Percentage of Original Array Bits
[Chart]

36 DeepC Compiler Targeted to FPGAs
[Flow diagram from a C/Fortran program to a physical circuit. Stages include the SUIF frontend, pointer alias and other high-level analyses, bitwidth analysis, Raw parallelization, DeepC specialization, MachSUIF codegen, Verilog output, and traditional CAD optimizations.]

37 FPGA Area
[Chart: area (CLB count, 0 to 2000) without and with Bitwise, per benchmark. Benchmarks (main datapath width): adpcm (8), bubblesort (32), convolve (16), histogram (16), intfir (32), intmatmul (16), jacobi (8), life (1), median (32), mpegcorr (16), newlife (1), parity (32), pmatch (32), sor (32).]

38 FPGA Clock Speed (50 MHz Target)
[Chart: XC4000-09 clock speed (MHz, 0 to 150) without and with Bitwise, for adpcm, bubblesort, convolve, histogram, intfir, intmatmul, jacobi, life, median, mpegcorr, newlife, parity, pmatch, and sor.]

39 Power Savings
[Chart: average dynamic power (mW, 0 to 5) without and with bitwidth analysis, for bubblesort, histogram, jacobi, and pmatch.]

40 Related Work
– Data-range propagation for branch prediction [Patterson]
– Symbolic data-range analysis [Rugina et al.]
– Bitwidth propagation [Ananian]
– Bit-vector propagation [Razdan; Budiu et al.]

41 Summary
Developed Bitwise: a scalable bitwidth analyzer
– Standard data-flow analysis
– Loop analysis
– Incorporates pointer analysis
Demonstrated savings when targeting silicon from high-level languages:
– 57% less area
– up to 86% improvement in clock speed
– less than 50% of the power


43 Power Savings
C → ASIC
– IBM SA27E process, 0.15 micron drawn
– 200 MHz
Methodology
– C → RTL
– RTL simulation → register switching activity
– Synthesis reports dynamic power

44 Mismatched Bitwidths
When the operands of an instruction differ in size, type-conversion instructions are added that convert both operands to an integer as wide as the wider of the two, with the appropriate sign.
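A hypothetical sketch of the inserted conversions, assuming each operand is extended according to its own signedness (sign extension for signed operands, zero extension for unsigned ones). The slide says only "with the appropriate sign", so this rule is an interpretation, and `operand_t`, `common_width`, and `widen` are names introduced here.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical sketch, not Bitwise's implementation.
   An operand whose value is held in the low `width` bits of `bits`. */
typedef struct { uint64_t bits; int width; int is_signed; } operand_t;

static uint64_t low_mask(int w) {
    return w >= 64 ? ~UINT64_C(0) : (UINT64_C(1) << w) - 1;
}

/* Width both operands are converted to: the wider of the two. */
static int common_width(operand_t x, operand_t y) {
    return x.width > y.width ? x.width : y.width;
}

/* Conversion inserted before the operation: sign-extend a signed operand,
   zero-extend an unsigned one, up to to_width bits. */
static uint64_t widen(operand_t op, int to_width) {
    assert(op.width >= 1 && op.width <= to_width && to_width <= 64);
    uint64_t v = op.bits & low_mask(op.width);
    if (op.is_signed && ((v >> (op.width - 1)) & 1))
        v |= low_mask(to_width) & ~low_mask(op.width);
    return v;
}
```

A 4-bit signed -1 (`0xF`) combined with an 8-bit unsigned operand is widened to `0xFF`, while a 4-bit unsigned 15 becomes `0x0F`.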

