Bitwidth Analysis with Application to Silicon Compilation

Mark Stephenson, Jonathan Babb, Saman Amarasinghe
MIT Laboratory for Computer Science
June 19th, 2000, www.cag.lcs.mit.edu/bitwise

Goal
For a program written in a high-level language, automatically find the minimum number of bits needed to represent:
- each static variable in the program
- each operation in the program

Usefulness of Bitwidth Analysis
- Higher language abstraction
- Enables other compiler optimizations
- Synthesizing application-specific processors

DeepC Compiler
Targeted to FPGAs.

    C/Fortran program
      -> SUIF frontend
           pointer alias and other high-level analyses
           bitwidth analysis
           Raw parallelization
      -> MachSUIF codegen
           DeepC specialization
      -> Verilog
      -> traditional CAD optimizations
      -> physical circuit

Usefulness of Bitwidth Analysis
- Higher language abstraction
- Enables other compiler optimizations
- Synthesizing application-specific processors
- Optimizing for power-aware processors
- Extracting more parallelism for single instruction, multiple data (SIMD) processors

Bitwidth Opportunities
Runtime profiling reveals plenty of bitwidth opportunities: for the SPECint95 benchmark suite, over 50% of operands use less than half the number of bits specified by the programmer.

Analysis Constraints
- Bitwidth results must maintain program correctness for all input data sets
- Results are not runtime or data dependent
A static analysis can do very well, even in light of this constraint.

Bitwidth Extraction
Use the abundant hints in the source language to discover bitwidths with near-optimal precision.
Caveats:
- The analysis is limited to fixed-point variables
- We assume source program correctness

The Hints: Bitwidth-Refining Constructs
1. Arithmetic operations
2. Boolean operations
3. Bitmask operations
4. Loop induction variable bounding
5. Clamping operations
6. Type castings
7. Static array index bounding

1. Arithmetic Operations

Example:
    int a;
    unsigned b;
    a = random();
    b = random();
    a = a / 2;    /* a: 32 bits -> 31 bits */
    b = b >> 4;   /* b: 32 bits -> 28 bits */

Arithmetic operations such as divide can be used to reduce bitwidth. Before the divide instruction executes, a's bitwidth is 32; after the divide by 2, only 31 bits are needed to represent the variable. The slide also shows a right shift: b's value needs 32 bits before the shift, and because the value has been shifted right by 4, only 28 bits are required afterward.
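To make the range bookkeeping concrete, here is a minimal C sketch of how such transfer functions might look; the range_t type and the function names are illustrative assumptions, not the Bitwise implementation.

    #include <stdint.h>

    typedef struct { int64_t lo, hi; } range_t;   /* inclusive data-range <lo,hi> */

    /* Dividing by a positive constant maps both bounds directly
       (integer division is monotonic for a positive divisor). */
    static range_t range_div_const(range_t r, int64_t c) {
        range_t out = { r.lo / c, r.hi / c };
        return out;
    }

    /* Logical right shift by k drops k high-order bits;
       assumes an unsigned value, as b is in the example above. */
    static range_t range_shr_const(range_t r, unsigned k) {
        range_t out = { (int64_t)((uint64_t)r.lo >> k),
                        (int64_t)((uint64_t)r.hi >> k) };
        return out;
    }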

2. Boolean Operations

Example:
    int a;
    a = (b != 15);   /* a: 32 bits -> 1 bit */

In the C programming language, it is common to use a 32-bit integer data type to hold a boolean value, as in this example, so identifying such operations can be very profitable. Before the boolean operation, 32 bits are required to represent a; after it, only 1 bit is required.
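Expressed as a range rule (a trivial sketch, with our own naming): the result of any comparison lies in <0,1>, so a single bit suffices downstream.

    #include <stdint.h>

    typedef struct { int64_t lo, hi; } range_t;   /* inclusive data-range <lo,hi> */

    /* A comparison result is always 0 or 1, whatever the operands are. */
    static range_t range_of_comparison(void) {
        range_t out = { 0, 1 };
        return out;
    }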

3. Bitmask Operations

Example:
    int a;
    a = random() & 0xff;   /* a: 32 bits -> 8 bits */

Many codes use bitmask operations. In this example, even though we do not know the return value of random, we know that a requires at most 8 bits because it is ANDed with the constant 0xff.
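A hedged sketch of the corresponding range rule (type and function name are ours, not the paper's): ANDing with a non-negative constant mask bounds the result by the mask, no matter what the other operand is.

    #include <stdint.h>

    typedef struct { int64_t lo, hi; } range_t;   /* inclusive data-range <lo,hi> */

    /* x & mask with mask >= 0 always lands in <0, mask>. */
    static range_t range_and_mask(range_t x, int64_t mask) {
        (void)x;                     /* the other operand may be completely unknown */
        range_t out = { 0, mask };
        return out;
    }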

4. Loop Induction Variable Bounding

Applicable to for-loop induction variables.

Example:
    int i;
    for (i = 0; i < 6; i++) {
      ...
    }
    /* i: 32 bits -> 3 bits */

Another high-level technique is to identify for-loop induction variables. In essence, the for loop determines the range of values the induction variable can assume, and this range can be converted to a bitwidth. Within and after the for loop in this example, i requires only 3 bits to represent the range of values from 0 to 6.
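Converting a range to a width is a small computation; the helper below is a sketch of one way to do it for a non-negative range <0, hi> (the function name is ours, not the paper's). For the loop above, hi = 6, so 3 bits suffice.

    #include <stdint.h>

    /* Minimum number of bits needed to represent every value in <0, hi>. */
    static unsigned bits_for_unsigned(uint64_t hi) {
        unsigned bits = 1;           /* even the value 0 occupies one bit */
        while (hi >>= 1)
            bits++;
        return bits;
    }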

5. Clamping Optimization

Multimedia codes often simulate saturating instructions.

Example:
    int valpred;
    ...
    if (valpred > 32767)
        valpred = 32767;
    else if (valpred < -32768)
        valpred = -32768;
    /* valpred: 32 bits -> 16 bits */

We call this the clamping optimization. Many multimedia codes contain sequences of instructions that simulate saturating arithmetic; in other words, the programmer wants to clamp the range of a variable. In this example, taken from one of our benchmarks, adpcm, the programmer restricts valpred to values that can be represented with 16 bits, so after the clamping code executes, only 16 bits are required to represent valpred.
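For a range that includes negative values, like the one this clamp produces, the width also needs a sign bit. The helper below is a sketch under a two's-complement assumption (the function name is ours, not the paper's); for <-32768, 32767> it returns 16.

    #include <stdint.h>

    /* Minimum two's-complement width that holds every value in <lo, hi>, lo < 0. */
    static unsigned bits_for_signed(int64_t lo, int64_t hi) {
        unsigned bits = 1;                           /* sign bit */
        uint64_t mag = (uint64_t)(-(lo + 1));        /* -(lo+1) avoids overflow at INT64_MIN */
        if (hi > 0 && (uint64_t)hi > mag)
            mag = (uint64_t)hi;
        while (mag) {                                /* magnitude bits */
            bits++;
            mag >>= 1;
        }
        return bits;
    }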

6. Type Casting (Part I)

Example:
    int a;
    char b;
    a = b;   /* a: 32 bits -> 8 bits, b: 8 bits */

Type promotion also serves as a good tool for reducing bitwidths. In this example, a is an integer and b is a character, so after the assignment of the 8-bit value b to a, a's value occupies at most 8 bits.

6. Type Casting (Part II)

Example:
    int a;
    char b;
    b = a;   /* a: 32 bits -> 8 bits, b: 8 bits */

The narrowing direction is useful as well. Here the 32-bit value a is assigned to the 8-bit variable b; assuming the program is correct, a's value must fit in 8 bits, so the 8-bit width can be propagated back to a.
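A sketch of the corresponding range rule, assuming a signed 8-bit char and with naming of our own: the assignment clips the known range to the char range, and under the program-correctness assumption the same clipped range also holds for the source a.

    #include <stdint.h>

    typedef struct { int64_t lo, hi; } range_t;   /* inclusive data-range <lo,hi> */

    /* Narrowing assignment to a signed char clips the range to <-128, 127>. */
    static range_t range_narrow_to_char(range_t r) {
        if (r.lo < -128) r.lo = -128;
        if (r.hi >  127) r.hi =  127;
        return r;
    }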

7. Array Index Optimization

An index into an array can be bounded based on the bounds of the array.

Example:
    int a, b;
    int X[1024];
    X[a] = X[4*b];   /* a: 32 bits -> 10 bits, b: 32 bits -> 8 bits */

Assuming program correctness, we can use the bounds information of an array to restrict the ranges of variables that index into it. In this example, an index into the array X can be at most 10 bits, otherwise a buffer overrun would result, so we can set the index variables a and b accordingly. This slide also alludes to another optimization we implemented, backward propagation: the bound on the index 4*b is used to re-compute the bitwidths of ancestor instructions, leaving b at 8 bits. A later slide explains this in more detail.
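A sketch of the backward step for the 4*b index, assuming program correctness as the slide does (names and structure are illustrative): clamp the index range to the array bounds, then divide it back through the constant multiply.

    #include <stdint.h>

    typedef struct { int64_t lo, hi; } range_t;   /* inclusive data-range <lo,hi> */

    /* Backward step for "index = c * b" used as X[index] with X[0..n-1], c > 0:
       clamp the index to the legal range, then recover a range for b. */
    static range_t back_propagate_index(range_t b, int64_t c, int64_t n) {
        range_t idx = { 0, n - 1 };                /* legal index range */
        range_t out = { idx.lo / c, idx.hi / c };  /* simplified: exact for a zero lower bound */
        /* intersect with whatever was already known about b */
        if (out.lo < b.lo) out.lo = b.lo;
        if (out.hi > b.hi) out.hi = b.hi;
        return out;
    }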

Propagating Data-Ranges

Data-flow analysis. Three candidate lattices:
- Bitwidth
- Vector of bits
- Data-ranges

    Propagating bitwidths:    a: 4 bits  -->  a = a + 1  -->  a: 5 bits
    Propagating bit vectors:  a: 1X      -->  a = a + 1  -->  a: XXX
    Propagating data-ranges:  a: <0,13>  -->  a = a + 1  -->  a: <1,14>   (4 bits in both cases)

Everything shown to this point is a high-level idea; let's dive into the details of the implementation. First of all, we perform data-flow analysis, and we explored three candidate lattices for it. The first lattice simply keeps track of a variable's bitwidth. Propagating bitwidths, with a initially 4 bits, we must conservatively assume that even an increment instruction always produces a carry, so a becomes 5 bits, when in all likelihood it does not. The second lattice we considered, a vector of bits, has advantages over the bitwidth lattice, but it suffers from the same arithmetic imprecision: a known pattern such as 1X becomes XXX after the increment. We decided to propagate data-ranges, which are simply all the integers between a lower and an upper bound, because of the three structures the data-range lattice is the only one that handles arithmetic expressions well, and all the code we examined had some degree of arithmetic computation. Data-ranges handle arithmetic much better: a's range is initially <0,13>, and after the increment it becomes <1,14>; in both cases 4 bits are sufficient to represent a. It is also easy to compute the number of bits needed to represent a data-range.
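A minimal sketch of the data-range operations this choice implies (type and function names are ours): pointwise addition as the transfer function for a + b, and an interval union as the lattice join at control-flow merges.

    #include <stdint.h>

    typedef struct { int64_t lo, hi; } range_t;   /* inclusive data-range <lo,hi> */

    /* Transfer function for addition: <0,13> + <1,1> = <1,14>. */
    static range_t range_add(range_t a, range_t b) {
        range_t out = { a.lo + b.lo, a.hi + b.hi };
        return out;
    }

    /* Lattice join at a control-flow merge: the smallest range covering both inputs. */
    static range_t range_union(range_t a, range_t b) {
        range_t out = { a.lo < b.lo ? a.lo : b.lo,
                        a.hi > b.hi ? a.hi : b.hi };
        return out;
    }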

Propagating Data-Ranges

- Propagate data-ranges forward and backward over the control-flow graph using the transfer functions described in the paper.
- Use Static Single Assignment (SSA) form, with extensions to:
  - gracefully handle pointers and arrays
  - extract data-range information from conditional statements

With the lattice set, we propagate the data-ranges both forward and backward in the control-flow graph; I'll show an example of this in a minute. We also chose to use SSA form because, in the common case of forward propagation, it is an efficient form for data-range propagation. In the next slide I'll show how we extended SSA form to extract data-range information from conditional statements. Because we're pressed for time, I won't be talking about the extensions that let us gracefully handle pointers and arrays; they are in the paper.

Example of Data-Range Propagation

    a0 = input()
    a1 = a0 + 1
    branch on (a1 < 0)
      true edge:   a2 = a1 : (a1 < 0)    <- range-refinement function
                   a3 = a2 + 1
      false edge:  a4 = a1 : (a1 >= 0)   <- range-refinement function
                   c0 = a4
      merge:       a5 = phi(a3, a4)
                   b0 = array[a5]

Here are the extensions to SSA form that extract data-range information from conditional statements. We call them range-refinement functions. They allow us to restrict the range of a variable based on the outcome of the test on it. The block on the true edge corresponds to the path where the predicate holds, so there we know that the value of the predicate variable a1 is less than 0. There is also a refinement function for the path where the predicate is false.

Example of Data-Range Propagation (with ranges)

The slide annotates each SSA value in the graph above with its forward-propagated range and with the narrower range obtained after backward propagation from the array access, whose bounds are [0:9]. The forward pass gives, for example, a0 = <-128,127> and a1 = <-127,127>; the true-edge refinement gives a2 = <-127,-1>, the false edge gives a4 = <0,127>, and the merge gives a5 = <-126,127>. Backward propagation from array's bounds [0:9] then narrows a5 and a4 to <0,9> and pushes back to a1 = <-1,9> and a0 = <-2,8>.

What to do with Loops?

As we just saw, straight-line code turns out to be fairly straightforward to analyze. Loops, on the other hand, present a bit of a challenge. In traditional data-flow analysis, you iterate over the control-flow graph applying transfer functions until a fixed point is reached.

What to do with Loops?

Finding the fixed point around back edges will often saturate data-ranges.

Example:
    a = 0;
    for (y = 1; y < 100; y++)
        a = a + 5;

Iterating to a fixed point saturates even the simplest arithmetic expression: in this example, a fixed point is reached only when both linear sequences saturate. Because instructions in loops comprise the bulk of dynamically executed instructions, it is important that we analyze them well.

The Loop Solution

- Classify groups of dependent instructions into sequences.
  Linear sequence examples:
      i = i + 1;    // a counter
      j = k + n;
      k = j + 1;    // a mutually dependent pair
- Call a solver to find a closed-form solution that approximates the range.

We came up with a solution that allows us to accurately determine the data-ranges of operands and expressions in loops. We find closed-form solutions to commonly occurring sequences, where a sequence is defined as a group of mutually dependent instructions; an example follows in a minute to elucidate this point. We then use the closed-form solutions to determine the final ranges.

Finding the Closed-Form Solution

    a0 = 0                      <0,0>
    for i = 1 to 10
        a1 = a0 + 1             <1,460>
        for j = 1 to 10
            a2 = a1 + 2         <3,480>
        for k = 1 to 10
            a3 = a2 + 3         <24,510>
    ... = a3 + 4                <510,510>

Like I said before, a sequence is a group of mutually dependent instructions. In this example there is one sequence, made up of the assignments to a1, a2, and a3. Shown to the right of each instruction in the sequence is the actual range of values that the instruction takes on. It is interesting to note that each instruction in the sequence takes on a different range, and it turns out to be non-trivial to find the exact ranges. A conservative range of <0,510>, however, can easily be found.

Solving the Linear Sequence

    for i = 1 to 10
        a1 = a0 + 1     iteration count <1,10>,  contribution <1,10>*<1,1> = <1,10>
        for j = 1 to 10
            a2 = a1 + 2 iteration count <1,100>, contribution <1,100>*<2,2> = <2,200>
        for k = 1 to 10
            a3 = a2 + 3 iteration count <1,100>, contribution <1,100>*<3,3> = <3,300>
    ... = a3 + 4

    (<1,10> + <2,200> + <3,300>) ∪ <0,0> = <0,510>

Before I explain the algorithm, note that there are several types of sequences to detect and solve, not just linear sequences, and each requires a different method to compute its closed-form solution. The algorithm for linear sequences begins by calculating the iteration count of each loop, where the iteration count is defined as the number of times the instructions in the loop body will be executed. We then use the iteration count and the per-iteration growth of each instruction to determine how much that instruction contributes to the growth of the sequence. Finally, we sum all the contributions together and take the data-range union with the initial value.
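A small, hedged C sketch of this computation on the example above; the iteration counts and growths are hard-coded and the structure is ours, not the Bitwise code.

    #include <stdio.h>
    #include <stdint.h>

    typedef struct { int64_t lo, hi; } range_t;   /* inclusive data-range <lo,hi> */

    int main(void) {
        /* iteration counts of the three loops and per-iteration growth of each
           instruction in the sequence (a1 += 1, a2 += 2, a3 += 3) */
        range_t iters[3]  = { {1, 10}, {1, 100}, {1, 100} };
        int64_t growth[3] = { 1, 2, 3 };
        range_t init = { 0, 0 };                   /* initial value a0 = 0 */

        range_t sum = { 0, 0 };
        for (int n = 0; n < 3; n++) {              /* sum the contributions */
            sum.lo += iters[n].lo * growth[n];
            sum.hi += iters[n].hi * growth[n];
        }
        /* data-range union with the initial value */
        range_t result = { sum.lo < init.lo ? sum.lo : init.lo,
                           sum.hi > init.hi ? sum.hi : init.hi };
        printf("<%lld,%lld>\n", (long long)result.lo, (long long)result.hi);  /* <0,510> */
        return 0;
    }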

Summary

- Developed Bitwise: a compiler that automatically determines integer bitwidths
  - Propagates value ranges
  - Loop analysis
- Demonstrated savings when targeting silicon from high-level languages onto FPGAs:
  - 57% less area
  - up to 86% improvement in clock speed
  - less than 50% of the power

In summary, we created the Bitwise compiler, which determines operand bitwidths with excellent precision. Bitwise uses a suite of techniques, including standard data-flow analysis, sophisticated loop analysis, and pointer analysis. We demonstrate substantial savings when targeting silicon.