Radu Rugina and Martin Rinard Laboratory for Computer Science

Presentation transcript:

Symbolic Bounds Analysis of Pointers, Array Indices, and Accessed Memory Regions Radu Rugina and Martin Rinard Laboratory for Computer Science Massachusetts Institute of Technology

Outline
  Examples
  Key Problem: Extracting Symbolic Bounds for Accessed Memory Regions
  Key Technology: Formulating and Solving Systems of Symbolic Inequality Constraints
  Results
  Conclusion

Example - Divide and Conquer Sort
  7 4 6 1 3 5 8 2
  4 7 6 1 5 3 8 2   Divide
  4 7 1 6 3 5 2 8   Conquer
  1 4 6 7 2 3 5 8   Combine
  1 2 3 4 5 6 7 8

“Sort n Items in d, Using t as Temporary Storage”

    void sort(int *d, int *t, int n) {
      if (n > CUTOFF) {
        sort(d,t,n/4);
        sort(d+n/4,t+n/4,n/4);
        sort(d+2*(n/4),t+2*(n/4),n/4);
        sort(d+3*(n/4),t+3*(n/4),n-3*(n/4));
        merge(d,d+n/4,d+n/2,t);
        merge(d+n/2,d+3*(n/4),d+n,t+n/2);
        merge(t,t+n/2,t+n,d);
      } else
        insertionSort(d,d+n);
    }

Motivating Problem: Exploit parallelism in this code.

“Recursively Sort Four Quarters of d”
  Divide array into subarrays and recursively sort subarrays.
  Subproblems Identified Using Pointers Into Middle of Array (d, d+n/4, d+n/2, d+3*(n/4)):
    d: 4 7 6 1 5 3 8 2
  Sorted Results Written Back Into Input Array:
    d: 4 7 1 6 3 5 2 8

“Merge Sorted Quarters of d Into Halves of t”
    d: 4 7 1 6 3 5 2 8
    t: 1 4 6 7 (at t), 2 3 5 8 (at t+n/2)

“Merge Sorted Halves of t Back Into d”
    t: 1 4 6 7 (at t), 2 3 5 8 (at t+n/2)
    d: 1 2 3 4 5 6 7 8

“Use a Simple Sort for Small Problem Sizes”
  insertionSort(d,d+n) sorts the elements between d and d+n.
    first slide:  4 7 6 1 5 3 8 2
    second slide: 4 7 1 6 5 3 8 2

Parallel Sort

    void sort(int *d, int *t, int n) {
      if (n > CUTOFF) {
        spawn sort(d,t,n/4);
        spawn sort(d+n/4,t+n/4,n/4);
        spawn sort(d+2*(n/4),t+2*(n/4),n/4);
        spawn sort(d+3*(n/4),t+3*(n/4),n-3*(n/4));
        sync;
        spawn merge(d,d+n/4,d+n/2,t);
        spawn merge(d+n/2,d+3*(n/4),d+n,t+n/2);
        sync;
        merge(t,t+n/2,t+n,d);
      } else
        insertionSort(d,d+n);
    }
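The parallel version can be read against a plain sequential rendering. The following is a minimal sketch in standard C, not the authors' code: the value of CUTOFF is an assumption (the slides give none), the comments only mark where the Cilk spawn and sync keywords would go, and the midpoint is written as 2*(n/4) instead of n/2 so that the sketch also sorts correctly when n is not a multiple of 4.

```c
#include <assert.h>
#include <stddef.h>

#define CUTOFF 4  /* assumed value; the slides do not specify one */

/* Sort the elements in [l, h) in place by insertion. */
static void insertionSort(int *l, int *h) {
    ptrdiff_t n = h - l;
    for (ptrdiff_t i = 1; i < n; i++) {
        int k = l[i];
        ptrdiff_t j = i;
        while (j > 0 && k < l[j - 1]) { l[j] = l[j - 1]; j--; }
        l[j] = k;
    }
}

/* Merge the sorted runs [l, m) and [m, h) into out[0 .. (h-l)-1]. */
static void merge(const int *l, const int *m, const int *h, int *out) {
    const int *a = l, *b = m;
    while (a < m && b < h) *out++ = (*b < *a) ? *b++ : *a++;
    while (a < m) *out++ = *a++;
    while (b < h) *out++ = *b++;
}

/* Sort n items in d, using t as temporary storage. */
void sort(int *d, int *t, int n) {
    assert(n >= 0);
    if (n > CUTOFF) {
        int q = n / 4;
        /* Four calls on disjoint pieces of d and t: spawn candidates. */
        sort(d, t, q);
        sort(d + q, t + q, q);
        sort(d + 2 * q, t + 2 * q, q);
        sort(d + 3 * q, t + 3 * q, n - 3 * q);
        /* sync would go here */
        merge(d, d + q, d + 2 * q, t);                  /* spawn candidate */
        merge(d + 2 * q, d + 3 * q, d + n, t + 2 * q);  /* spawn candidate */
        /* sync would go here */
        merge(t, t + 2 * q, t + n, d);
    } else {
        insertionSort(d, d + n);
    }
}
```

Calling sort(a, tmp, n) with a scratch array tmp of n ints leaves a sorted; the final merge has to wait for both half-merges because it reads everything they wrote into t.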

What Do You Need To Know To Exploit This Form of Parallelism? Symbolic Information About Accessed Memory Regions

Information Needed To Exploit Parallelism
  Calls to sort access disjoint parts of d and t.
  Together, the calls access [d,d+n-1] and [t,t+n-1].

    sort(d,t,n/4);
    sort(d+n/4,t+n/4,n/4);
    sort(d+2*(n/4),t+2*(n/4),n/4);
    sort(d+3*(n/4),t+3*(n/4),n-3*(n/4));

Information Needed To Exploit Parallelism
  The first two calls to merge access disjoint parts of d and t.
  Together, the calls access [d,d+n-1] and [t,t+n-1].

    merge(d,d+n/4,d+n/2,t);
    merge(d+n/2,d+3*(n/4),d+n,t+n/2);
    merge(t,t+n/2,t+n,d);

Information Needed To Exploit Parallelism
  Calls to insertionSort access [d,d+n-1].

    insertionSort(d,d+n);

What Do You Need To Know To Exploit This Form of Parallelism?
  Symbolic Information About Accessed Memory Regions:
    sort(d,t,n) accesses [d,d+n-1] and [t,t+n-1]
    insertionSort(l,h) accesses [l,h-1]
    merge(l,m,h,d) accesses [l,h-1] and [d,d+(h-l)-1]

How Hard Is It To Figure These Things Out? Challenging.

    void insertionSort(int *l, int *h) {
      int *p, *q, k;
      for (p = l+1; p < h; p++) {
        for (k = *p, q = p-1; l <= q && k < *q; q--)
          *(q+1) = *q;
        *(q+1) = k;
      }
    }

Not immediately obvious that insertionSort(l,h) accesses [l,h-1].

How Hard Is It To Figure These Things Out?

    void merge(int *l1, int *m, int *h2, int *d) {
      int *h1 = m;
      int *l2 = m;
      while ((l1 < h1) && (l2 < h2))
        if (*l1 < *l2) *d++ = *l1++;
        else *d++ = *l2++;
      while (l1 < h1) *d++ = *l1++;
      while (l2 < h2) *d++ = *l2++;
    }

Not immediately obvious that merge(l,m,h,d) accesses [l,h-1] and [d,d+(h-l)-1].
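One way to build confidence in the claimed regions before proving them is to record every load and store during a run. The sketch below is only an illustration: traced_merge, Touched and touch are hypothetical names, and the routine works on array indices rather than raw pointers so that offsets are easy to compare. A run like this is evidence for the claim, not the proof that the analysis provides.

```c
#include <assert.h>
#include <limits.h>

/* Smallest and largest index touched in one array (hypothetical helper type). */
typedef struct { int lo, hi; } Touched;

static void touch(Touched *r, int i) {
    if (i < r->lo) r->lo = i;
    if (i > r->hi) r->hi = i;
}

/* merge(l, m, h, d) from the slide, rewritten over indices into src and dst
   so that every load and every store is recorded. */
void traced_merge(const int *src, int l, int m, int h, int *dst, int d,
                  Touched *reads, Touched *writes) {
    int a = l, b = m;
    reads->lo = writes->lo = INT_MAX;
    reads->hi = writes->hi = INT_MIN;
    while (a < m && b < h) {
        touch(reads, a);
        touch(reads, b);
        touch(writes, d);
        if (src[a] < src[b]) dst[d++] = src[a++];
        else                 dst[d++] = src[b++];
    }
    while (a < m) { touch(reads, a); touch(writes, d); dst[d++] = src[a++]; }
    while (b < h) { touch(reads, b); touch(writes, d); dst[d++] = src[b++]; }
}
```

For a merge of the runs at indices [0,2) and [2,4), the recorded reads and writes both span 0 through 3, which matches [l,h-1] and [d,d+(h-l)-1].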

Issues
  Heavy Use of Pointers
  Pointers into Middle of Arrays
  Pointer Arithmetic
  Pointer Comparison
  Multiple Procedures: sort(int *d, int *t, int n), insertionSort(int *l, int *h), merge(int *l, int *m, int *h, int *t)
  Recursion

How the Compiler Does It

Compiler Structure
  Pointer Analysis: Disambiguate References at Granularity of Allocation Blocks
  Bounds Analysis: Symbolic Upper and Lower Bounds for Each Memory Access in Each Procedure
  Region Analysis: Symbolic Regions Accessed By Execution of Each Procedure
  Parallelization: Independent Procedure Calls That Can Execute in Parallel

Example – Array Increment

    void f(char *p, int n) {
      if (n > CUTOFF) {
        f(p, n/2);      /* increment first half */
        f(p+n/2, n/2);  /* increment second half */
      } else {
        /* base case: increment small array */
        int i = 0;
        while (i < n) { *(p+i) += 1; i++; }
      }
    }

Intra-procedural Bounds Analysis
  For each integer variable at each program point, derive lower and upper bounds.
  Bounds are symbolic expressions:
    variables represent initial values of parameters of the enclosing procedure
    bounds are linear combinations of variables
  Example expression for f(p,n): p+n-1

Bounds Analysis
  What are the upper and lower bounds for the region accessed by the while loop in the base case?

    int i = 0;
    while (i < n) { *(p+i) += 1; i++; }

Bounds Analysis, Step 1: Build control flow graph
  Block 1: i = 0
  Block 2: i < n
  Block 3: *(p+i) += 1; i = i+1
  Edges: block 1 to block 2, block 2 to block 3 (test true), block 3 back to block 2.

Bounds Analysis, Step 2: Set up bounds at beginning of basic blocks
  i = 0:                   l1 ≤ i ≤ u1
  i < n:                   l2 ≤ i ≤ u2
  *(p+i) += 1; i = i+1:    l3 ≤ i ≤ u3

Bounds Analysis, Step 3: Compute transfer functions
  i = 0:          before l1 ≤ i ≤ u1, after 0 ≤ i ≤ 0
  i < n:          before l2 ≤ i ≤ u2, true branch l2 ≤ i ≤ n-1, false branch l2 ≤ i ≤ u2
  *(p+i) += 1:    before l3 ≤ i ≤ u3, after l3 ≤ i ≤ u3
  i = i+1:        before l3 ≤ i ≤ u3, after l3+1 ≤ i ≤ u3+1
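The transfer functions can be made concrete by representing each bound as a linear expression in the parameters p and n. This is a minimal sketch under that assumption; LinExpr, Bounds and the function names are hypothetical and do not come from the authors' implementation, and integer coefficients are used for brevity even though the slides say the framework needs rational coefficients.

```c
#include <assert.h>

/* Symbolic expression cp*p + cn*n + c0 over the parameters p and n of f. */
typedef struct { int cp, cn, c0; } LinExpr;

/* Symbolic interval lo <= i <= hi. */
typedef struct { LinExpr lo, hi; } Bounds;

static LinExpr add_const(LinExpr e, int k) { e.c0 += k; return e; }

/* Transfer function for "i = 0": the bounds become 0 <= i <= 0. */
Bounds after_assign_zero(void) {
    Bounds out;
    out.lo = (LinExpr){0, 0, 0};
    out.hi = (LinExpr){0, 0, 0};
    return out;
}

/* Transfer function for "i = i + 1": both bounds shift by one. */
Bounds after_increment(Bounds in) {
    Bounds out;
    out.lo = add_const(in.lo, 1);
    out.hi = add_const(in.hi, 1);
    return out;
}

/* True branch of "i < n": the upper bound is replaced by n - 1,
   the lower bound is unchanged. */
Bounds after_test_less_than_n(Bounds in) {
    Bounds out = in;
    out.hi = (LinExpr){0, 1, -1};
    return out;
}

/* Evaluate an expression for concrete parameter values. */
long eval(LinExpr e, long p, long n) { return e.cp * p + e.cn * n + e.c0; }
```

Applying after_increment to the interval 0 ≤ i ≤ n-1 yields 1 ≤ i ≤ n, the same shift the slide writes as l3+1 ≤ i ≤ u3+1.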

Bounds Analysis, Step 4: Set up constraints for bounds
  Bounds on entry to the first block: -∞ ≤ i ≤ +∞
  i = 0:          after 0 ≤ i ≤ 0
  i < n:          before l2 ≤ i ≤ u2, true branch l2 ≤ i ≤ n-1, false branch l2 ≤ i ≤ u2
  *(p+i) += 1:    before l3 ≤ i ≤ u3, after l3 ≤ i ≤ u3
  i = i+1:        after l3+1 ≤ i ≤ u3+1
  Constraints on lower bounds: l2 ≤ 0, l2 ≤ l3+1, l3 ≤ l2
  Constraints on upper bounds: 0 ≤ u2, u3+1 ≤ u2, n-1 ≤ u3

Bounds Analysis, Step 5: Generate symbolic expressions for bounds
  Goal: express bounds in terms of parameters
    l2 = c1p + c2n + c3
    l3 = c4p + c5n + c6
    u2 = c7p + c8n + c9
    u3 = c10p + c11n + c12

Bounds Analysis, Step 6: Substitute expressions into constraints
    c1p + c2n + c3 ≤ 0
    c1p + c2n + c3 ≤ c4p + c5n + c6 + 1
    c4p + c5n + c6 ≤ c1p + c2n + c3
    0 ≤ c7p + c8n + c9
    c10p + c11n + c12 + 1 ≤ c7p + c8n + c9
    n - 1 ≤ c10p + c11n + c12

Goal: Solve Symbolic Constraint System
  Find values for the constraint variables c1, ..., c12 that satisfy the inequality constraints.
  Maximize lower bounds; minimize upper bounds.

Bounds Analysis, Step 7: Reduce symbolic inequalities to linear inequalities
    c1p + c2n + c3 ≤ c4p + c5n + c6 if c1 ≤ c4, c2 ≤ c5, and c3 ≤ c6
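The reduction is a sufficient condition, and it relies on p and n being non-negative (the Details slide lists positivity analysis as required for its correctness). The sketch below compares the coefficient-wise test against a brute-force check over a small grid of non-negative values; the names LinExpr, leq_by_coefficients and leq_on_grid are hypothetical.

```c
#include <assert.h>

/* Symbolic expression cp*p + cn*n + c0. */
typedef struct { int cp, cn, c0; } LinExpr;

/* Sufficient test from the slide: compare coefficient by coefficient. */
int leq_by_coefficients(LinExpr a, LinExpr b) {
    return a.cp <= b.cp && a.cn <= b.cn && a.c0 <= b.c0;
}

static long eval(LinExpr e, long p, long n) {
    return e.cp * p + e.cn * n + e.c0;
}

/* Brute-force check that a <= b for every 0 <= p, n < limit. */
int leq_on_grid(LinExpr a, LinExpr b, long limit) {
    for (long p = 0; p < limit; p++)
        for (long n = 0; n < limit; n++)
            if (eval(a, p, n) > eval(b, p, n)) return 0;
    return 1;
}
```

With a = 0 and b = n the coefficient test succeeds and the grid check agrees, but the inequality 0 ≤ n fails for n = -1, which is why the variables must be known to be non-negative. The test is not necessary either: n - 1 ≤ n + p holds on the grid and also passes coefficient-wise, whereas p ≤ p + n - 1 fails the coefficient test even though it holds whenever n ≥ 1.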

Bounds Analysis, Step 7: Apply reduction and generate a linear program
  Lower bounds:
    c1 ≤ 0, c2 ≤ 0, c3 ≤ 0
    c1 ≤ c4, c2 ≤ c5, c3 ≤ c6+1
    c4 ≤ c1, c5 ≤ c2, c6 ≤ c3
  Upper bounds:
    0 ≤ c7, 0 ≤ c8, 0 ≤ c9
    c10 ≤ c7, c11 ≤ c8, c12+1 ≤ c9
    0 ≤ c10, 1 ≤ c11, -1 ≤ c12
  Objective function:
    max: (c1 + ••• + c6) - (c7 + ••• + c12)
  where c1, ..., c6 are the lower-bound coefficients and c7, ..., c12 the upper-bound coefficients.
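For a system this small, the optimum can be found without an LP solver. The sketch below substitutes exhaustive search over a small integer grid for the linear program, which is a deliberate simplification and not what the framework does (the slides say lp_solve is used and coefficients are rational). It encodes the constraints l2 ≤ 0, l2 ≤ l3+1, l3 ≤ l2, 0 ≤ u2, u3+1 ≤ u2 and n-1 ≤ u3 after the coefficient-wise reduction; all function names are hypothetical.

```c
#include <assert.h>

/* Feasibility of the lower-bound constraints; c[0..5] stands for c1..c6. */
static int lower_ok(const int *c) {
    return c[0] <= 0 && c[1] <= 0 && c[2] <= 0                 /* l2 <= 0    */
        && c[0] <= c[3] && c[1] <= c[4] && c[2] <= c[5] + 1    /* l2 <= l3+1 */
        && c[3] <= c[0] && c[4] <= c[1] && c[5] <= c[2];       /* l3 <= l2   */
}

/* Feasibility of the upper-bound constraints; c[0..5] stands for c7..c12. */
static int upper_ok(const int *c) {
    return 0 <= c[0] && 0 <= c[1] && 0 <= c[2]                 /* 0 <= u2    */
        && c[3] <= c[0] && c[4] <= c[1] && c[5] + 1 <= c[2]    /* u3+1 <= u2 */
        && 0 <= c[3] && 1 <= c[4] && -1 <= c[5];               /* n-1 <= u3  */
}

/* Enumerate c in {-2..2}^6 and keep the feasible point with the best sum.
   sign = +1 maximizes the sum (lower bounds), sign = -1 minimizes it. */
static void search(int (*ok)(const int *), int sign, int *best) {
    int c[6], found = 0, best_sum = 0;
    for (int code = 0; code < 15625; code++) {   /* 5^6 combinations */
        int x = code, sum = 0;
        for (int k = 0; k < 6; k++) { c[k] = x % 5 - 2; x /= 5; sum += c[k]; }
        if (!ok(c)) continue;
        if (!found || sign * sum > sign * best_sum) {
            found = 1;
            best_sum = sum;
            for (int k = 0; k < 6; k++) best[k] = c[k];
        }
    }
    assert(found);
}

/* Fills lower[0..5] with c1..c6 and upper[0..5] with c7..c12. */
void solve_bounds(int *lower, int *upper) {
    search(lower_ok, +1, lower);
    search(upper_ok, -1, upper);
}
```

The two halves of the objective are independent, so the lower and upper coefficients are searched separately. On this grid the search returns all zeros for c1..c6 and (0, 1, 0, 0, 1, -1) for c7..c12, that is l2 = l3 = 0, u2 = n and u3 = n-1.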

Bounds Analysis, Step 7: Apply reduction and generate a linear program
  This is a linear program (LP), not an integer linear program (ILP).
  The coefficients in the symbolic expressions are rational numbers.
  Rational coefficients are needed for expressions like the middle of an array: low+(high - low)/2

Bounds Analysis, Step 8: Solve linear program to extract bounds
  Solution:
    c1 = 0, c2 = 0, c3 = 0
    c4 = 0, c5 = 0, c6 = 0
    c7 = 0, c8 = 1, c9 = 0
    c10 = 0, c11 = 1, c12 = -1
  Extracted bounds: l2 = 0, l3 = 0, u2 = n, u3 = n-1
  Bounds at each program point:
    i = 0:          before -∞ ≤ i ≤ +∞, after 0 ≤ i ≤ 0
    i < n:          before 0 ≤ i ≤ n, true branch 0 ≤ i ≤ n-1, false branch 0 ≤ i ≤ n
    *(p+i) += 1:    before 0 ≤ i ≤ n-1, after 0 ≤ i ≤ n-1
    i = i+1:        after 1 ≤ i ≤ n
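The extracted bounds can be checked against an actual execution of the loop. This is a small sanity check, not part of the analysis: run_checked_loop is a hypothetical name, and the assertions simply restate the bounds 0 ≤ i ≤ n at the loop test, 0 ≤ i ≤ n-1 at the store, and 1 ≤ i ≤ n after the increment.

```c
#include <assert.h>

/* Runs the base-case loop of f on n elements and asserts the solved bounds
   at each program point. Returns the number of stores performed. */
int run_checked_loop(char *p, int n) {
    int stores = 0;
    int i = 0;
    for (;;) {
        assert(0 <= i && i <= n);        /* l2 = 0, u2 = n     */
        if (!(i < n)) break;
        assert(0 <= i && i <= n - 1);    /* l3 = 0, u3 = n - 1 */
        *(p + i) += 1;
        stores++;
        i++;
        assert(1 <= i && i <= n);        /* l3 + 1, u3 + 1     */
    }
    return stores;
}
```

Since the store happens only where 0 ≤ i ≤ n-1 holds, the loop touches exactly the addresses p through p+n-1.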

Region Analysis
  Goal: Compute Accessed Regions of Memory
  Intra-Procedural:
    Use bounds at each load or store
    Compute accessed region
  Inter-Procedural:
    Use intra-procedural results
    Set up another symbolic constraint system
    Solve to find regions accessed by entire execution of the procedure

Basic Principle of Inter-Procedural Region Analysis
  For each procedure: generate symbolic expressions for upper and lower bounds of accessed regions.
  Constraint system:
    Accessed regions include regions accessed by statements in the procedure
    Accessed regions include regions accessed by invoked procedures

Inter-Procedural Constraints in Example
  f(p,n) accesses [ l(f,p,n), u(f,p,n) ]

    void f(char *p, int n) {
      if (n > CUTOFF) {
        f(p, n/2);
        f(p+n/2, n/2);
      } else {
        int i = 0;
        while (i < n) { *(p+i) += 1; i++; }
      }
    }

  From the call f(p, n/2):      l(f,p,n) ≤ l(f,p,n/2)      u(f,p,n) ≥ u(f,p,n/2)
  From the call f(p+n/2, n/2):  l(f,p,n) ≤ l(f,p+n/2,n/2)  u(f,p,n) ≥ u(f,p+n/2,n/2)
  From the base case loop:      l(f,p,n) ≤ p               u(f,p,n) ≥ p+n-1

Derive Constraint System
  Generate symbolic expressions:
    l(f,p,n) = C1p + C2n + C3
    u(f,p,n) = C4p + C5n + C6
  Build constraint system:
    C1p + C2n + C3 ≤ p
    C4p + C5n + C6 ≥ p + n - 1
    C1p + C2n + C3 ≤ C1p + C2(n/2) + C3
    C4p + C5n + C6 ≥ C4p + C5(n/2) + C6
    C1p + C2n + C3 ≤ C1(p+n/2) + C2(n/2) + C3
    C4p + C5n + C6 ≥ C4(p+n/2) + C5(n/2) + C6

Solve Constraint System
  Simplify constraint system:
    C1p + C2n + C3 ≤ p
    C4p + C5n + C6 ≥ p + n - 1
    C2n ≤ C2(n/2)
    C5n ≥ C5(n/2)
    C2(n/2) ≤ C1(n/2)
    C5(n/2) ≥ C4(n/2)
  Generate and solve linear program:
    l(f,p,n) = p
    u(f,p,n) = p+n-1
  Access region: [p, p+n-1]
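The computed region [p, p+n-1] can be compared with what the recursive procedure actually touches. The sketch below instruments f to record the smallest and largest offset it writes; the value of CUTOFF and the names f_traced and trace_f are assumptions made for the illustration.

```c
#include <assert.h>
#include <limits.h>

#define CUTOFF 4  /* assumed value; the slides do not specify one */

static int lo_seen, hi_seen;  /* smallest and largest offset from the root p */

/* f from the slides, with each store recorded as an offset from base. */
static void f_traced(char *base, char *p, int n) {
    if (n > CUTOFF) {
        f_traced(base, p, n / 2);
        f_traced(base, p + n / 2, n / 2);
    } else {
        for (int i = 0; i < n; i++) {
            int off = (int)(p + i - base);
            if (off < lo_seen) lo_seen = off;
            if (off > hi_seen) hi_seen = off;
            *(p + i) += 1;
        }
    }
}

/* Runs f on p with size n and reports the offsets actually written. */
void trace_f(char *p, int n, int *lo, int *hi) {
    lo_seen = INT_MAX;
    hi_seen = INT_MIN;
    f_traced(p, p, n);
    *lo = lo_seen;
    *hi = hi_seen;
}
```

For n = 16 the recorded offsets run from 0 to 15, exactly the computed region. For n = 9 the integer divisions drop the last element and only offsets 0 to 7 are written, so [p, p+n-1] is a safe over-approximation rather than an exact description.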

Parallelization
  Dependence testing of two calls:
    Do accessed regions intersect?
    Based on comparing upper and lower bounds of accessed regions
  Parallelization:
    Find sequences of independent calls
    Execute independent calls in parallel
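The intersection test itself is a comparison of interval endpoints. The sketch below applies it to concrete values of n instead of symbolic expressions, which is a simplification of what the compiler does; Region, regions_intersect and sort_calls_independent are hypothetical names, and the four regions are those of sort calls whose quarters start at 0, n/4, 2*(n/4) and 3*(n/4).

```c
#include <assert.h>

/* Closed interval [lo, hi] of element offsets; empty when lo > hi. */
typedef struct { long lo, hi; } Region;

int regions_intersect(Region a, Region b) {
    if (a.lo > a.hi || b.lo > b.hi) return 0;  /* an empty region touches nothing */
    return a.lo <= b.hi && b.lo <= a.hi;
}

/* Region [start, start+len-1] accessed by a sort call on len elements. */
static Region sort_region(long start, long len) {
    Region r;
    r.lo = start;
    r.hi = start + len - 1;
    return r;
}

/* Returns 1 when the four recursive sort calls are pairwise independent. */
int sort_calls_independent(long n) {
    long q = n / 4;
    Region r[4];
    r[0] = sort_region(0, q);
    r[1] = sort_region(q, q);
    r[2] = sort_region(2 * q, q);
    r[3] = sort_region(3 * q, n - 3 * q);
    for (int i = 0; i < 4; i++)
        for (int j = i + 1; j < 4; j++)
            if (regions_intersect(r[i], r[j])) return 0;
    return 1;
}
```

The same regions apply to both d and t, since each call receives matching offsets into the two arrays.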

Details
  Inter-procedural positivity analysis:
    Verify that variables are positive
    Required for correctness of reduction
  Correlation analysis
  Integer division:
    Basic idea: (n-1)/2 ≤ ⌊n/2⌋ ≤ n/2
    Generalized: (n-m+1)/m ≤ ⌊n/m⌋ ≤ n/m
  Linear system decomposition
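The integer-division bounds are easy to check numerically. In the sketch below the inequality is multiplied through by m so that only integer arithmetic is needed; the function name division_bounds_hold is hypothetical, and the check assumes n ≥ 0 and m ≥ 1, where C integer division coincides with the floor.

```c
#include <assert.h>

/* Checks (n-m+1)/m <= floor(n/m) <= n/m in the equivalent integer form
   n-m+1 <= m*floor(n/m) <= n. Assumes n >= 0 and m >= 1. */
int division_bounds_hold(long n, long m) {
    assert(n >= 0 && m >= 1);
    long q = n / m;  /* truncating division equals the floor for n >= 0 */
    return (n - m + 1) <= m * q && m * q <= n;
}
```

The lower bound holds because the remainder of n divided by m is at most m-1.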

Comparison to Dataflow Analysis
  Dataflow analysis:
    Uses iterative algorithms
    Cannot handle lattices with infinite ascending chains, because termination is not guaranteed
  Our framework:
    Reduces the analysis to a linear program
    Works for lattices with infinite ascending chains like integers, rational numbers or polynomials
    No possibility of non-termination

Uses of Symbolic Bounds Information
  Transformations:
    Automatic Parallelization Of Sequential Programs
    Bounds Checks Elimination For Safe Programs
  Verifications:
    Data Race Detection For Parallel Programs
    Array Bounds Checking For Unsafe Programs

Application of Analysis Framework
  Bitwidth Analysis:
    Computes minimum number of bits to represent computed values
    Important for hardware synthesis from high level languages
  For our framework, bitwidth analysis is a special case:
    Compute precise numeric bounds
    Constraint system = linear program
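Once numeric bounds on a value are known, the bit width follows directly. The sketch below shows that last step only; unsigned_bits and signed_bits are hypothetical names, and the choice of unsigned versus two's-complement encoding is an assumption, since the slides do not say which representation the hardware synthesis targets.

```c
#include <assert.h>

/* Minimum number of bits for an unsigned value in [0, hi]. */
int unsigned_bits(unsigned long long hi) {
    int bits = 1;  /* even the value 0 occupies one bit */
    while (hi >>= 1) bits++;
    return bits;
}

/* Minimum two's-complement width that holds every value in [lo, hi].
   A w-bit two's-complement integer covers [-2^(w-1), 2^(w-1) - 1]. */
int signed_bits(long long lo, long long hi) {
    assert(lo <= hi);
    int bits = 1;
    while (bits < 64 &&
           (lo < -(1LL << (bits - 1)) || hi > (1LL << (bits - 1)) - 1))
        bits++;
    return bits;
}
```

For example, a loop index bounded by 0 ≤ i ≤ 255 needs 8 bits unsigned, while a value bounded by -128 ≤ x ≤ 127 needs 8 bits in two's complement.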

Experimental Results
  Implementation: SUIF, lp_solve, Cilk
  Parallelization speedups, reported for 1, 2, 4, 6 and 8 processors (rows with four values are missing one entry in the transcript):
    Fibonacci: 0.76, 1.52, 3.03, 4.55, 6.04
    Quicksort: 1.00, 1.99, 3.89, 5.68, 7.36
    Mergesort: 2.00, 3.90, 5.70, 7.41
    Heat: 1.03, 2.02, 5.53, 6.83
    BlockMul: 0.97, 1.86, 3.84, 7.54
    NoTempMul: 1.02, 2.01, 4.03, 6.02, 8.02
    LU: 0.98, 1.95, 5.66, 7.39
  Close to linear speedups
  Most of parallelism detected
  Compiler also verified that:
    Parallel versions were free of data races
    Benchmarks do not violate the array bounds

Experimental Results Implementation - SUIF, lp_solve Bitwidth reduction:

Context
  Mainstream parallelizing compilers:
    Loop nests, dense matrices
    Affine access functions
  Our framework focuses on:
    Recursion, dynamically allocated arrays
    Pointers, pointer arithmetic
  Key problems: pointer analysis, symbolic region analysis, solving linear programs

Conclusion
  Novel framework for symbolic bounds analysis:
    Uses symbolic constraint systems
    Reduces problem to linear programs
    More powerful than iterative approaches
  Analysis uses:
    Parallelization, data race detection
    Detecting array bounds violations
    Array bounds check elimination
    Bitwidth analysis