Program Analysis and Design Conformance Martin Rinard Laboratory for Computer Science Massachusetts Institute of Technology.


Research Overview: Program Analysis
- Commutativity Analysis for C++ Programs [PLDI96]
- Memory Disambiguation for Multithreaded C Programs
  - Pointer Analysis [PLDI99]
  - Region Analysis [PPoPP99, PLDI00]
- Pointer and Escape Analysis for Multithreaded Java Programs [OOPSLA99, PLDI01, PPoPP01]

Research Overview: Transformations
- Automatic Parallelization
  - Object-Oriented Programs with Linked Data Structures [PLDI96]
  - Divide and Conquer Programs [PPoPP99, PLDI00]
- Synchronization Optimizations
  - Lock Coarsening [POPL97, PLDI98]
  - Synchronization Elimination [OOPSLA99]
  - Optimistic Synchronization Primitives [PPoPP97]
- Memory Management Optimizations
  - Stack Allocation [OOPSLA99, PLDI01]
  - Per-Thread Heap Allocation

Research Overview: Verifications of Safety Properties
- Data Race Freedom [PLDI00]
- Array Bounds Checks [PLDI00]
- Correctness of Region-Based Allocation [PPoPP01]
- Credible Compilation [RTRV99]
  - Correctness of Dataflow Analysis Results
  - Correctness of Standard Compiler Optimizations

Talk Overview
- Memory Disambiguation
  - Goal: Verify Data Race Freedom for Multithreaded Divide and Conquer Programs
  - Analyses: Pointer Analysis, Accessed Region Analysis
  - Experience integrating information from the developer into the memory disambiguation analysis
- Role Verification
- Design Conformance

Basic Memory Disambiguation Problem
*p = v; (write v into the memory location that p points to)
What memory locations may *p = v access?
- Without any analysis: *p = v may access any location
- With analysis: *p = v may access this location, and does not access these other memory locations

Static Memory Disambiguation Analyze the program to characterize the memory locations that statements in the program read and write Fundamental problem in program analysis with many applications

Application: Verify Data Race Freedom
Program: *p = v1 || *q = v2 (the two writes run in parallel)
[Figure: memory diagrams labeled "Does This" and "NOT This" for the writes *p = v1 and *q = v2]

Example - Divide and Conquer Sort
[Figure: the sort example built up in three phases: Divide, Conquer, Combine]

Divide and Conquer Algorithms
- Lots of Recursively Generated Concurrency
- Recursively Solve Subproblems in Parallel
- Combine Results in Parallel

“Sort n Items in d, Using t as Temporary Storage”

void sort(int *d, int *t, int n) {
  if (n > CUTOFF) {
    /* Divide array into subarrays and recursively sort subarrays in parallel. */
    /* Subproblems are identified using pointers into the middle of the array: */
    /* d, d+n/4, d+n/2, d+3*(n/4). Sorted results are written back into d. */
    spawn sort(d, t, n/4);
    spawn sort(d+n/4, t+n/4, n/4);
    spawn sort(d+2*(n/4), t+2*(n/4), n/4);
    spawn sort(d+3*(n/4), t+3*(n/4), n-3*(n/4));
    sync;
    /* Merge sorted quarters of d into halves of t (t and t+n/2). */
    spawn merge(d, d+n/4, d+n/2, t);
    spawn merge(d+n/2, d+3*(n/4), d+n, t+n/2);
    sync;
    /* Merge sorted halves of t back into d. */
    merge(t, t+n/2, t+n, d);
  } else {
    /* Use a simple sort for small problem sizes: sorts [d, d+n). */
    insertionSort(d, d+n);
  }
}

What Do You Need To Know To Verify Data Race Freedom? Points-to Information (data blocks that pointers point into) Region Information (accessed regions within data blocks)

Information Needed To Verify Race Freedom
  sort(d,t,n/4);
  sort(d+n/4,t+n/4,n/4);
  sort(d+n/2,t+n/2,n/4);
  sort(d+3*(n/4),t+3*(n/4),n-3*(n/4));
- d and t point to different memory blocks
- Calls to sort access disjoint parts of d and t
- Together, calls access [d,d+n-1] and [t,t+n-1]
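The disjointness claim for the four sort calls can be checked on concrete sizes. The following is a minimal sketch, not the analysis itself: it assumes the quarter boundaries fall at multiples of n/4 (which coincides with d+n/2 whenever n is divisible by 4), and the names Range, sort_quarters, and quarters_tile are hypothetical helpers introduced only for illustration.

```c
#include <stddef.h>
#include <stdbool.h>

/* Half-open range [lo, hi) of element offsets into one array. */
typedef struct { size_t lo, hi; } Range;

/* Offsets touched by the four recursive sort calls on an n-element array.
   Assumption: the third quarter starts at offset 2*(n/4). */
static void sort_quarters(size_t n, Range q[4]) {
    size_t k = n / 4;
    q[0] = (Range){0, k};
    q[1] = (Range){k, 2 * k};
    q[2] = (Range){2 * k, 3 * k};
    q[3] = (Range){3 * k, n};   /* last call gets n - 3*(n/4) elements */
}

/* True when consecutive ranges abut exactly, so the four ranges are
   pairwise disjoint and together cover [0, n). */
static bool quarters_tile(size_t n, const Range q[4]) {
    if (q[0].lo != 0 || q[3].hi != n) return false;
    for (int i = 0; i < 3; i++)
        if (q[i].hi != q[i + 1].lo) return false;
    return true;
}
```

Calling sort_quarters(10, q) followed by quarters_tile(10, q) exercises a size that is not a multiple of 4, where the last quarter absorbs the remainder.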

Information Needed To Verify Race Freedom
  merge(d,d+n/4,d+n/2,t);
  merge(d+n/2,d+3*(n/4),d+n,t+n/2);
  merge(t,t+n/2,t+n,d);
- d and t point to different memory blocks
- First two calls to merge access disjoint parts of d, t
- Together, calls access [d,d+n-1] and [t,t+n-1]

Information Needed To Verify Race Freedom
  insertionSort(d,d+n);
- Calls to insertionSort access [d,d+n-1]

What Do You Need To Know To Verify Data Race Freedom? Points-to Information (d and t point to different data blocks) Symbolic Region Information (accessed regions within d and t blocks)

How Hard Is It For the Program Analysis To Figure These Things Out?
Challenging

void insertionSort(int *l, int *h) {
  int *p, *q, k;
  for (p = l+1; p < h; p++) {
    for (k = *p, q = p-1; l <= q && k < *q; q--)
      *(q+1) = *q;
    *(q+1) = k;
  }
}
Not immediately obvious that insertionSort(l,h) accesses [l,h-1]
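One way to build confidence in the claimed region [l,h-1] is to trace the offsets that a run actually dereferences. The sketch below is a dynamic check on one input, not the static analysis from the talk. The loop is restructured so that q never moves below l, and Touched, note, and insertion_sort_traced are hypothetical names.

```c
#include <stddef.h>

/* Lowest and highest element offsets (relative to l) touched by a run. */
typedef struct { ptrdiff_t min, max; } Touched;

static void note(Touched *t, const int *base, const int *p) {
    ptrdiff_t off = p - base;
    if (off < t->min) t->min = off;
    if (off > t->max) t->max = off;
}

/* Insertion sort over [l, h) that records every dereferenced offset. */
static Touched insertion_sort_traced(int *l, int *h) {
    Touched t = { h - l, -1 };       /* empty trace: min > max */
    for (int *p = l + 1; p < h; p++) {
        note(&t, l, p);              /* read *p */
        int k = *p;
        int *q = p;                  /* slot that will receive k */
        while (q > l) {
            note(&t, l, q - 1);      /* read *(q-1) */
            if (!(k < *(q - 1))) break;
            note(&t, l, q);          /* write *q */
            *q = *(q - 1);
            q--;
        }
        note(&t, l, q);              /* write *q */
        *q = k;
    }
    return t;
}
```

For an array a of five elements, insertion_sort_traced(a, a+5) should report offsets within 0 through 4, matching [l,h-1]. A trace only covers the inputs tried, which is why the talk turns to static bounds analysis.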

How Hard Is It For the Program Analysis To Figure These Things Out?
void merge(int *l1, int *m, int *h2, int *d) {
  int *h1 = m;
  int *l2 = m;
  while ((l1 < h1) && (l2 < h2))
    if (*l1 < *l2) *d++ = *l1++;
    else *d++ = *l2++;
  while (l1 < h1) *d++ = *l1++;
  while (l2 < h2) *d++ = *l2++;
}
Not immediately obvious that merge(l,m,h,d) accesses [l,h-1] and [d,d+(h-l)-1]

Issues
- Heavy Use of Pointers
  - Pointers into Middle of Arrays
  - Pointer Arithmetic
  - Pointer Comparison
- Multiple Procedures
  - sort(int *d, int *t, int n)
  - insertionSort(int *l, int *h)
  - merge(int *l, int *m, int *h, int *t)
- Recursion
- Multithreading

Pointer Analysis
For each program point, computes where each pointer may point, e.g. "$p \to x$ before statement *p = 1"
Complications
1. Statically unbounded number of locations
   - recursive data structures (lists, trees)
   - dynamically allocated arrays
2. Multiple possible executions of the program may create different dynamic data structures

Memory Abstraction
[Figure: physical memory (stack and heap) mapped to abstract memory]
- Allocation block for each variable declaration
- Allocation block for each memory allocation site

Pointer Analysis Summary Key Challenge for Multithreaded Programs: Analyzing interactions between threads Solution: Interference Edges Record edges generated by each thread Captures effect of parallel threads on points-to information of other threads

What Pointer Analysis Gives Us Disambiguation of Memory Accesses Via Pointers Pointer-based loads and stores: use pointer analysis results to derive the allocation block that each pointer-based load or store statement accesses MOD-REF or READ-WRITE SETS Analysis: All loads and stores Procedures: use the memory access information for loads and stores to compute the allocation blocks that each procedure accesses

Is This Information Enough?
NO. Necessary but not sufficient: parallel tasks access (disjoint) regions of the same allocated block of memory.

Structure of Analysis
1. Pointer Analysis: Disambiguate Memory at the Granularity of Allocation Blocks
2. Bounds Analysis: Symbolic Upper and Lower Bounds for Each Memory Access in Each Procedure
3. Region Analysis: Symbolic Regions Accessed By Execution of Each Procedure
4. Data Race Freedom: Check that Parallel Threads Are Independent

Running Example – Array Increment
void f(char *p, int n) {
  if (n > CUTOFF) {
    spawn f(p, n/2);      /* increment first half */
    spawn f(p+n/2, n/2);  /* increment second half */
    sync;
  } else {
    /* base case: increment small array */
    int i = 0;
    while (i < n) { *(p+i) += 1; i++; }
  }
}
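The running example can be run as plain C by treating the Cilk keywords as no-ops, so the spawned calls simply execute in order. This is a sketch under two assumptions that are not in the slides: the macro definitions for spawn and sync, and the value chosen for CUTOFF.

```c
#include <stddef.h>

/* Assumption: erase the Cilk keywords so the code compiles as plain C. */
#define spawn
#define sync

#define CUTOFF 4   /* hypothetical threshold; the slides give no value */

static void f(char *p, int n) {
    if (n > CUTOFF) {
        spawn f(p, n / 2);          /* increment first half */
        spawn f(p + n / 2, n / 2);  /* increment second half */
        sync;
    } else {
        /* base case: increment small array */
        int i = 0;
        while (i < n) { *(p + i) += 1; i++; }
    }
}
```

With a zeroed 16-byte buffer, f(buf, 16) leaves every byte equal to 1. Note that for an odd n above CUTOFF the two halves cover only n-1 elements, since both recursive calls receive n/2.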

Intraprocedural Bounds Analysis
[Pipeline: Pointer Analysis, Bounds Analysis (this stage), Region Analysis, Data Race Detection]
Symbolic Upper and Lower Bounds for Each Memory Access in Each Procedure

Intraprocedural Bounds Analysis
GOAL: For each pointer and array index variable at each program point, derive lower and upper bounds
E.g. "$0 \le i \le n-1$ at statement *(p+i) += 1"
Bounds are symbolic expressions
- variables represent initial values of parameters of enclosing procedure
- bounds are combinations of variables
- example expression for f(p,n): p+(n/2)-1

Intraprocedural Bounds Analysis
What are upper and lower bounds for i at each program point in base case?
  int i = 0;
  while (i < n) {
    *(p+i) += 1;
    i++;
  }

Bounds Analysis, Step 1: Build control flow graph
  block 1: i = 0
  block 2: i < n
  block 3: *(p+i) += 1; i = i+1

Bounds Analysis, Step 2: Set up bounds at beginning of basic blocks
  $l_1 \le i \le u_1$ at entry to: i = 0
  $l_2 \le i \le u_2$ at entry to: i < n
  $l_3 \le i \le u_3$ at entry to: *(p+i) += 1; i = i+1

Bounds Analysis, Step 3: Compute transfer functions
  i = 0: entry $l_1 \le i \le u_1$, exit $0 \le i \le 0$
  i < n: entry $l_2 \le i \le u_2$, true branch $l_2 \le i \le n-1$, false branch $n \le i \le u_2$
  *(p+i) += 1: entry $l_3 \le i \le u_3$, after the store $l_3 \le i \le u_3$
  i = i+1: exit $l_3+1 \le i \le u_3+1$

Bounds Analysis, Step 4: Key step: set up constraints for bounds
Bounds at entry to the first block are unconstrained: $-\infty \le i \le +\infty$
Build Region Constraints
  $[0, 0] \subseteq [l_2, u_2]$
  $[l_3+1, u_3+1] \subseteq [l_2, u_2]$
  $[l_2, n-1] \subseteq [l_3, u_3]$
Inequality Constraints
  lower bounds: $l_2 \le 0$, $l_2 \le l_3+1$, $l_3 \le l_2$
  upper bounds: $0 \le u_2$, $u_3+1 \le u_2$, $n-1 \le u_3$

Bounds Analysis, Step 5: Generate symbolic expressions for bounds
Goal: express bounds in terms of parameters
  $l_2 = c_1 p + c_2 n + c_3$
  $l_3 = c_4 p + c_5 n + c_6$
  $u_2 = c_7 p + c_8 n + c_9$
  $u_3 = c_{10} p + c_{11} n + c_{12}$
Constraints to satisfy: $l_2 \le 0$, $l_2 \le l_3+1$, $l_3 \le l_2$, $0 \le u_2$, $u_3+1 \le u_2$, $n-1 \le u_3$

Bounds Analysis, Step 6: Substitute expressions into constraints
Lower bounds:
  $c_1 p + c_2 n + c_3 \le 0$
  $c_1 p + c_2 n + c_3 \le c_4 p + c_5 n + c_6 + 1$
  $c_4 p + c_5 n + c_6 \le c_1 p + c_2 n + c_3$
Upper bounds:
  $0 \le c_7 p + c_8 n + c_9$
  $c_{10} p + c_{11} n + c_{12} + 1 \le c_7 p + c_8 n + c_9$
  $n - 1 \le c_{10} p + c_{11} n + c_{12}$

Bounds Analysis, Step 7: Reduce symbolic inequalities to linear inequalities
  $c_1 p + c_2 n + c_3 \le c_4 p + c_5 n + c_6$ if $c_1 \le c_4$, $c_2 \le c_5$, and $c_3 \le c_6$
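The reduction rule is a sufficient condition: comparing coefficients one by one implies the symbolic inequality whenever the variables are non-negative. A small sketch, assuming p and n are non-negative (an assumption the slide does not state) and using the hypothetical names Affine, eval, and dominated:

```c
#include <stdbool.h>

/* Symbolic bound cp*p + cn*n + c0 over the parameters of f(p, n). */
typedef struct { long cp, cn, c0; } Affine;

static long eval(Affine a, long p, long n) {
    return a.cp * p + a.cn * n + a.c0;
}

/* Coefficient-wise comparison from the slide. When it holds and
   p, n >= 0, eval(a, p, n) <= eval(b, p, n) follows. */
static bool dominated(Affine a, Affine b) {
    return a.cp <= b.cp && a.cn <= b.cn && a.c0 <= b.c0;
}
```

For example, n-1 (coefficients 0, 1, -1) is dominated by n (coefficients 0, 1, 0). The converse does not hold in general, so the reduction may reject some valid bounds in exchange for producing only linear constraints.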

Bounds Analysis, Step 8: Apply reduction and generate a linear program
Lower bounds:
  $c_1 \le 0$, $c_2 \le 0$, $c_3 \le 0$
  $c_1 \le c_4$, $c_2 \le c_5$, $c_3 \le c_6 + 1$
  $c_4 \le c_1$, $c_5 \le c_2$, $c_6 \le c_3$
Upper bounds:
  $0 \le c_7$, $0 \le c_8$, $0 \le c_9$
  $c_{10} \le c_7$, $c_{11} \le c_8$, $c_{12} + 1 \le c_9$
  $0 \le c_{10}$, $1 \le c_{11}$, $-1 \le c_{12}$
Objective Function:
  max: $(c_1 + \dots + c_6) - (c_7 + \dots + c_{12})$

Bounds Analysis, Step 9: Solve linear program to extract bounds
Solution:
  $c_1 = 0$, $c_2 = 0$, $c_3 = 0$
  $c_4 = 0$, $c_5 = 0$, $c_6 = 0$
  $c_7 = 0$, $c_8 = 1$, $c_9 = 0$
  $c_{10} = 0$, $c_{11} = 1$, $c_{12} = -1$
Symbolic Bounds:
  $l_2 = 0$, $l_3 = 0$, $u_2 = n$, $u_3 = n-1$

Bounds Analysis, Step 10: Substitute bounds at each program point
Symbolic Bounds: $l_2 = 0$, $l_3 = 0$, $u_2 = n$, $u_3 = n-1$
  i = 0: entry $-\infty \le i \le +\infty$, exit $0 \le i \le 0$
  i < n: entry $0 \le i \le n$, true branch $0 \le i \le n-1$, false branch $n \le i \le n$
  *(p+i) += 1: entry $0 \le i \le n-1$, after the store $0 \le i \le n-1$
  i = i+1: exit $1 \le i \le n$

Compute access regions at each load or store
Symbolic Bounds: $l_2 = 0$, $l_3 = 0$, $u_2 = n$, $u_3 = n-1$
  The store *(p+i) += 1 executes with $0 \le i \le n-1$
Access Regions: [p, p+n-1]
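The reported solution can be sanity-checked by plugging it back into the six inequality constraints of Step 4 for concrete values of p and n. The sketch below does only that check; it does not solve the linear program. Bound, at, constraints_hold, and access_region are hypothetical names, and n is assumed non-negative.

```c
#include <stdbool.h>

/* Symbolic bound cp*p + cn*n + c0. */
typedef struct { long cp, cn, c0; } Bound;

static long at(Bound b, long p, long n) {
    return b.cp * p + b.cn * n + b.c0;
}

/* Solution reported on the slides: l2 = 0, l3 = 0, u2 = n, u3 = n-1. */
static const Bound L2 = {0, 0, 0}, L3 = {0, 0, 0};
static const Bound U2 = {0, 1, 0}, U3 = {0, 1, -1};

/* Checks the six inequality constraints of Step 4 at one (p, n). */
static bool constraints_hold(long p, long n) {
    long l2 = at(L2, p, n), l3 = at(L3, p, n);
    long u2 = at(U2, p, n), u3 = at(U3, p, n);
    return l2 <= 0 && l2 <= l3 + 1 && l3 <= l2
        && 0 <= u2 && u3 + 1 <= u2 && n - 1 <= u3;
}

/* Access region [p + l3, p + u3] of the store *(p+i) += 1. */
static void access_region(long p, long n, long *lo, long *hi) {
    *lo = p + at(L3, p, n);
    *hi = p + at(U3, p, n);
}
```

With p = 100 and n = 8, access_region yields the closed interval from 100 to 107, which is [p, p+n-1].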

Interprocedural Region Analysis
[Pipeline: Pointer Analysis, Bounds Analysis, Region Analysis (this stage), Data Race Detection]
Symbolic Regions Accessed By Execution of Each Procedure

Interprocedural Region Analysis
GOAL: Compute accessed regions of memory for each procedure, e.g. "f(p,n) accesses [p, p+n-1]"
Same Approach
- Set up target bounds of accessed regions
- Build a constraint system to compute these bounds
Constraint System
Accessed regions for a procedure must include:
1. Regions accessed by statements in the procedure
2. Regions accessed by invoked procedures

Region Analysis in Example
void f(char *p, int n) {
  if (n > CUTOFF) {
    spawn f(p, n/2);      /* accesses [ l(p,n/2), u(p,n/2) ] */
    spawn f(p+n/2, n/2);  /* accesses [ l(p+n/2,n/2), u(p+n/2,n/2) ] */
    sync;
  } else {
    int i = 0;
    while (i < n) { *(p+i) += 1; i++; }  /* accesses [ p, p+n-1 ] */
  }
}
f(p,n) accesses [ l(p,n), u(p,n) ]

Derive Constraint System
Region constraints:
  $[l(p,n/2), u(p,n/2)] \subseteq [l(p,n), u(p,n)]$
  $[l(p+n/2,n/2), u(p+n/2,n/2)] \subseteq [l(p,n), u(p,n)]$
  $[p, p+n-1] \subseteq [l(p,n), u(p,n)]$
Reduce to inequalities between lower/upper bounds
Further reduce to a linear program and solve:
  $l(p,n) = p$
  $u(p,n) = p+n-1$
Access region for f(p,n): [p, p+n-1]
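The three region constraints can be checked against the stated solution l(p,n) = p, u(p,n) = p+n-1 for concrete values. This sketch only verifies a candidate solution on sample inputs, assuming n is non-negative and integer division as in C. Region, f_region, contains, and region_constraints_hold are hypothetical names.

```c
#include <stdbool.h>

typedef struct { long lo, hi; } Region;   /* closed interval [lo, hi] */

/* Candidate solution from the slide: f(p, n) accesses [p, p+n-1]. */
static Region f_region(long p, long n) {
    return (Region){ p, p + n - 1 };
}

/* An empty region (lo > hi) is contained in any region. */
static bool contains(Region outer, Region inner) {
    return inner.lo > inner.hi
        || (outer.lo <= inner.lo && inner.hi <= outer.hi);
}

static bool region_constraints_hold(long p, long n) {
    Region whole = f_region(p, n);
    return contains(whole, f_region(p, n / 2))           /* first call  */
        && contains(whole, f_region(p + n / 2, n / 2))   /* second call */
        && contains(whole, (Region){ p, p + n - 1 });    /* base case   */
}
```

Looping n from 0 to 20 with any fixed p exercises both even and odd sizes, including the empty regions that arise when n/2 is 0.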

Data Race Freedom
[Pipeline: Pointer Analysis, Bounds Analysis, Region Analysis, Data Race Freedom (this stage)]
Check that Parallel Threads Are Independent

Data Race Freedom
Dependence testing of two statements
- Do accessed regions intersect?
- Based on comparing upper and lower bounds of accessed regions
Absence of data races
- Check that all the statements that execute in parallel are independent

Data Race Freedom
void f(char *p, int n) {
  if (n > CUTOFF) {
    spawn f(p, n/2);      /* accesses [ p, p+n/2-1 ] */
    spawn f(p+n/2, n/2);  /* accesses [ p+n/2, p+n-1 ] */
    sync;
  } else {
    int i = 0;
    while (i < n) { *(p+i) += 1; i++; }
  }
}
f(p,n) accesses [ p, p+n-1 ]
The regions of the two spawned calls do not intersect: no data races!
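The independence check amounts to an interval intersection test on the regions of the statements that run in parallel. A minimal sketch on concrete values, assuming both regions lie in the same allocation block; Region, is_empty, intersects, and spawned_calls_independent are hypothetical names.

```c
#include <stdbool.h>

typedef struct { long lo, hi; } Region;   /* closed interval [lo, hi] */

static bool is_empty(Region r) { return r.lo > r.hi; }

/* Two regions of the same allocation block conflict only if they overlap. */
static bool intersects(Region a, Region b) {
    if (is_empty(a) || is_empty(b)) return false;
    return a.lo <= b.hi && b.lo <= a.hi;
}

/* Regions of the two spawned calls f(p, n/2) and f(p+n/2, n/2). */
static bool spawned_calls_independent(long p, long n) {
    Region first  = { p,         p + n / 2 - 1 };
    Region second = { p + n / 2, p + n / 2 + n / 2 - 1 };
    return !intersects(first, second);
}
```

The first region ends at p+n/2-1 and the second starts at p+n/2, so the test succeeds for every non-negative n. The analysis in the talk establishes the same fact symbolically rather than by enumeration.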

Fundamental Property of the Analysis: No Fixed Point Computations
The analysis does not use fixed-point computations:
- The problem is reduced to a linear program
- The solution to the linear program directly gives the symbolic lower and upper bounds
Fixed-point approaches:
- Termination is not guaranteed: analysis domain of symbolic expressions has infinite ascending chains
- Use imprecise techniques to ensure termination:
  - Artificially truncate number of iterations
  - Use imprecise widening operators

Experience
Set of benchmark programs
Two versions of each benchmark
- Sequential version written in C
- Multithreaded version written in Cilk
Experiments:
1. Data Race Freedom for the multithreaded versions
2. Array Bounds Violation Detection for both sequential and multithreaded versions
3. Automatic Parallelization for the sequential version

Data Races and Array Bounds Violations
Columns: Application, Data races (multithreaded), Array Bounds Violations (multithreaded), Array Bounds Violations (sequential)
  QuickSort: NO
  MergeSort: NO
  BlockMul: NO
  NoTempMul: NO
  LU: NO
  Knapsack: YES, NO
  Heat: NO

Parallel Performance
[Figure: performance plots for Quicksort, Mergesort, Heat, BlockMul, NoTempMul, and LU]

Summary
Sophisticated Memory Disambiguation Analysis
- Points-to Information
- Accessed Region Information
- Automatic
- Interprocedural
- Handles Multithreaded Programs
Other Uses Besides Data Race Freedom
- Bitwidth Analysis
- Array-Bounds Check Elimination
- Buffer Overrun Detection

Bigger Picture Analysis has a very specific goal Developer understands and cares about results Points-to and region information is (implicitly) part of the interface of each procedure Developer understands interfaces Developer has expectations about analysis results Analysis can identify serious programming errors Developer expectations are implicit

Idea
Enhance procedure interface to make points-to and region information explicit
- Points-to language
  - Points-to graphs at entry and exit
  - Effect on points-to relationships
- Region language
  - Symbolic specification of accessed regions
Developer provides information
Analysis verifies that it is correct, and that correctness implies data race freedom

Points-to Language
f(p, q, n) {
  context {
    entry: p->_a, q->_b;
    exit:  p->_a, _a->_c, q->_b, _b->_d;
  }
  context {
    entry: p->_a, q->_a;
    exit:  p->_a, _a->_c, q->_a;
  }
}
[Figure: entry and exit points-to graphs for the two contexts for f(p,q,n)]

Verifying Points-to Information
One (flow sensitive) analysis per context for f(p,q,n):
1. Start with entry points-to graph
2. Analyze procedure
3. Check result against exit points-to graph
Similarly for other context
[Figure: entry and exit points-to graphs for the contexts for f(p,q,n)]

Analysis of Call Statements
g(r,n) {. f(r,s,n);. }
1. Analysis produces points-to graph before call
2. Retrieve declared contexts from callee
3. Find context with matching entry graph
4. Apply corresponding exit points-to graph
5. Continue analysis after call
Result
- Points-to declarations separate analysis of multiple procedures
- Transformed global, whole-program analysis into local analysis that operates on each procedure independently
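The context-matching step can be pictured for the two declared contexts of f(p,q,n), one where p and q point to different blocks and one where they point to the same block. The sketch below is a deliberate simplification: it assumes each actual parameter has exactly one abstract target block, whereas a real points-to graph may give a pointer several targets. Block, Context, and match_context are hypothetical names.

```c
/* Abstract allocation-block ids. Assumption: one target per pointer. */
typedef int Block;

/* The two declared contexts of f(p, q, n) from the points-to language
   example: p and q in different blocks, or p and q in the same block. */
typedef enum { CTX_DISTINCT = 0, CTX_ALIASED = 1 } Context;

/* Picks the declared context whose entry graph matches the caller's
   points-to facts for the actuals r and s at the call f(r, s, n). */
static Context match_context(Block r_target, Block s_target) {
    return r_target == s_target ? CTX_ALIASED : CTX_DISTINCT;
}
```

Once a context is selected, the caller's analysis would apply that context's exit graph and continue after the call, which is what lets each procedure be analyzed on its own.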

Experience
- Implemented points-to and region languages
- Integrated with points-to and region analyses
- Divide and Conquer Benchmarks
  - Sorting Programs: Quicksort (QS), Mergesort (MS)
  - Dense Matrix Computations: Matrix multiply (MM), LU decomposition (LU)
  - Scientific Computation: Heat (H)
- We added points-to and region information

Programming Overhead
[Figure: proportion of C code, region declarations, and points-to declarations for QS, MS, MM, LU, and H]

Evaluation
How difficult is it to provide declarations?
  Not that difficult
  Have to write comparatively little code
  Must know information anyway
How much benefit does analysis obtain?
  Substantial benefit
  Simpler analysis software (no complex interprocedural analysis)
  More scalable, precise analysis

Evaluation
Software Engineering Benefits of Points-to and Region Declarations
  Improved communication between developer and analysis
  Analysis reflects developer’s expectations
  Enhanced code reliability
  Enhanced interface information
  Analyze incomplete programs
    Programs that use libraries
    Programs under development

Evaluation
Drawbacks of Points-to and Region Declarations
  Have to learn new language
  Have to integrate into development process
  Legacy software issues (programmer may not know points-to and region information)

Steps to Design Conformance
Verify that Program Correctly Implements Key Design Properties as Expressed by Developer or Designer
  Role Verification
  Design Conformance for Object Models (joint with Daniel Jackson, MIT LCS)
Context: Air Traffic Control Software
  MIT LCS (Daniel Jackson, Martin Rinard)
  MIT Aero-Astro Department (R. John Hansman)
  NASA Ames Research Center (Michelle Eshow)
  Kansas State University CS Dept. (David Schmidt)
  CTAS (Center/TRACON Automation System)

Role Verification
Objects play different roles during their lifetime in computation
  Parked Aircraft, Taxiing Aircraft, Cleared for Takeoff Aircraft, In Flight Aircraft
Roles reflect constraints on activities of object
System actions must respect role constraints
  Parked Aircraft can’t take off
Action violations indicate system confusion
Goals
  Obtain role information from developer
  Check that program uses roles correctly
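As a small illustration of roles constraining actions, the sketch below tracks the role of one aircraft through a sequence of actions and reports any action that the current role does not permit. The four roles come from the slide; the action names (taxi, clear, take_off) and their role transitions are hypothetical.

```python
from enum import Enum
from typing import Dict, List, Tuple


class Role(Enum):
    PARKED = "Parked Aircraft"
    TAXIING = "Taxiing Aircraft"
    CLEARED = "Cleared for Takeoff Aircraft"
    IN_FLIGHT = "In Flight Aircraft"


# Hypothetical actions: name -> (role required before, role held after).
ACTIONS: Dict[str, Tuple[Role, Role]] = {
    "taxi": (Role.PARKED, Role.TAXIING),
    "clear": (Role.TAXIING, Role.CLEARED),
    "take_off": (Role.CLEARED, Role.IN_FLIGHT),
}


def check_actions(start: Role, actions: List[str]) -> Tuple[Role, List[str]]:
    """Replay actions from a starting role and collect role-constraint violations."""
    role = start
    violations: List[str] = []
    for action in actions:
        required, result = ACTIONS[action]
        if role is not required:
            violations.append(
                f"{action} requires {required.value}, object is {role.value}")
            continue  # a rejected action leaves the role unchanged
        role = result
    return role, violations


ok_role, ok_violations = check_actions(Role.PARKED, ["taxi", "clear", "take_off"])
bad_role, bad_violations = check_actions(Role.PARKED, ["take_off"])
```

The second call models the slide's example: a Parked Aircraft that attempts to take off is reported as a violation rather than silently changing role.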

Role Classification
Two General Kinds of Classification
  Content-based (predicate on object fields determines role)
  Relative (points-to relationships determine role)
Role Classification is Application Dependent
[Figure: class Aircraft with roles Flying Aircraft, Parked Aircraft, Taxiing Aircraft, Cleared Aircraft]

Standard View of Object
[Figure: object with incoming references and fields Flight Plan, Trajectory, Flight Name, Runway, Gate; outgoing references to List of Meter Fixes, Sequence Of Points, String, Runway Object, Gate Object]

Relative Role Classification
Points-to relationships define roles
Specify sources of incoming edges
  Field of an object playing a given role
  Global or local variable
Specify target of outgoing edges
Specify available fields in each role
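One way to picture a relative role definition is as a record of required incoming edges plus the set of fields available in that role, checked against a concrete heap. The sketch below assumes a heap encoded as nested dictionaries; the field names (occupant, flight_plan, and so on) are hypothetical, and constraints on the targets of outgoing edges are left out to keep it short.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, Optional, Tuple

# Assumed heap encoding: object id -> field name -> target object id (None if unset).
Heap = Dict[str, Dict[str, Optional[str]]]


@dataclass(frozen=True)
class RoleDef:
    name: str
    incoming: FrozenSet[Tuple[str, str]]  # required (source role, source field) pairs
    fields: FrozenSet[str]                # fields available (non-null) in this role


def plays_role(obj: str, role: RoleDef, heap: Heap, known_roles: Dict[str, str]) -> bool:
    """Check whether obj satisfies a role's incoming-edge and available-field constraints."""
    for src_role, src_field in role.incoming:
        referenced = any(
            known_roles.get(src) == src_role and flds.get(src_field) == obj
            for src, flds in heap.items())
        if not referenced:
            return False
    live = {f for f, target in heap[obj].items() if target is not None}
    return live == set(role.fields)


# Example: an aircraft referenced from a gate object, with no runway assigned.
heap: Heap = {
    "gate7": {"occupant": "ac1"},
    "ac1": {"flight_plan": "fp1", "flight_name": "name1", "gate": "gate7", "runway": None},
}
known_roles = {"gate7": "Gate Object"}
parked = RoleDef("Parked Aircraft",
                 frozenset({("Gate Object", "occupant")}),
                 frozenset({"flight_plan", "flight_name", "gate"}))
cleared = RoleDef("Cleared for Takeoff Aircraft",
                  frozenset({("Runway Object", "occupant")}),
                  frozenset({"flight_plan", "flight_name", "runway"}))
```

Here the same Aircraft object qualifies as a Parked Aircraft only because of how other objects refer to it and which of its fields are set, not because of its class.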

Example Roles
[Figure: Parked Aircraft role of Aircraft, with Gate Object and fields Flight Plan, Trajectory, Flight Name, Runway, Gate]

Example Roles
[Figure: Cleared for Takeoff Aircraft role of Aircraft, with Runway Object, List of Meter Fixes, String, and fields Flight Plan, Trajectory, Flight Name, Runway, Gate]

Role Verification
Analysis Obtains
  Role Definitions
  Method Information
    Roles of parameters and globals on entry
    Role changes that method performs
    Role of return value
Intraprocedural Analysis
  Simulates potential executions of method
  Precise abstraction of heap
  Use role information for invoked methods
  Verify correctness of role information
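The intraprocedural flavor of this check can be sketched by treating a method body as a sequence of calls and using only the declared role information of each invoked method. Everything concrete here is an assumption for illustration: the MethodSpec record, the method names, and the restriction to straight-line bodies with no heap abstraction.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class MethodSpec:
    entry: Dict[str, str]   # parameter -> role required on entry
    exit: Dict[str, str]    # parameter -> role held on exit (the declared role change)


# A call is (callee name, mapping from callee parameter to caller variable).
Call = Tuple[str, Dict[str, str]]


def verify_method(spec: MethodSpec, body: List[Call],
                  specs: Dict[str, MethodSpec]) -> List[str]:
    """Simulate a straight-line body using callee specs; return any role errors found."""
    roles = dict(spec.entry)
    errors: List[str] = []
    for callee, binding in body:
        callee_spec = specs[callee]
        for formal, var in binding.items():
            if roles.get(var) != callee_spec.entry[formal]:
                errors.append(f"{callee}: {var} is {roles.get(var)}, "
                              f"expected {callee_spec.entry[formal]}")
        for formal, var in binding.items():
            roles[var] = callee_spec.exit[formal]
    for var, declared in spec.exit.items():
        if roles.get(var) != declared:
            errors.append(f"exit: {var} is {roles.get(var)}, declared {declared}")
    return errors


# Hypothetical method specifications and two candidate bodies for a method `depart`.
specs = {
    "taxi": MethodSpec({"a": "Parked"}, {"a": "Taxiing"}),
    "clear": MethodSpec({"a": "Taxiing"}, {"a": "Cleared"}),
}
depart = MethodSpec({"ac": "Parked"}, {"ac": "Cleared"})
good_body = [("taxi", {"a": "ac"}), ("clear", {"a": "ac"})]
bad_body = [("clear", {"a": "ac"})]
```

The bad body is rejected at the call to clear, because the aircraft is still Parked when a Taxiing aircraft is required.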

Benefits of Roles
Software Engineering Benefits
  Safety checks that take application semantics into account
  Enhanced implementation transparency
Transformations Enabled By Precise Referencing Behavior
  Safe real-time memory management
  Parallelization and race detection for programs with linked data structures
  Optimized atomic transactions

Key Issue: Obtaining Role Information
Range of Developer and Designer Involvement
Some Involvement Reasonable and Necessary: Roles Reflect Application-Specific Properties
Primary Focus: Role Definitions
  Determine analysis distinctions
  Relevance of extracted information
Secondary Focus: Method Specifications
  Developer specifies roles of parameters
  Analysis extracts role changes

Design Conformance
Software Development Activities
  Requirements
  Design
  Implementation
Design is Partial
  Focus on Important Aspects
  Omit Many Low-Level Details
Design and Implementation are Disconnected
  No guarantee that code conforms to design

Goal of Design Conformance
Establish and mechanically check conformance
Use specific design formalism (object models)
  Boxes (objects) and Arrows (relations between objects)
[Figure: object model with boxes Aircraft, Flying Aircraft, Parked Aircraft, Taxiing Aircraft, Cleared Aircraft, Meter Fix, Flight Plan; arrows marked +]

Key Issue
Establishing correspondence between object model and implementation
Object models usually at a higher level of abstraction
  Many relations in object model realized as group of objects and references
  Object model may entirely omit some objects or references
Enables designer to focus on important aspects
But complicates path to conformance analysis

[Figure: Abstract Object Model, Intermediate Object Model, Concrete Object Model, and Roles for the Aircraft, Flight Plan, and Meter Fix example]

Concretization Specifications
Maps Between Object Models
Enables Designer/Developer to Establish Correspondence Between Object Models
Specify how Object Model is Realized in Code
  Foundation for design conformance analysis
  Guides implementation of object model
  Implementation patterns for object models
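A concretization specification can be pictured as a map from one abstract relation to the chain of fields that realizes it in code. The sketch below is only an illustration: the heap encoding, the field names (fixes, elements), and the reading of the + marking as "one or more targets" are assumptions, not details taken from the talk.

```python
from typing import Dict, List

# Assumed heap encoding: object id -> field name -> list of target object ids.
Heap = Dict[str, Dict[str, List[str]]]


def follow(heap: Heap, start: str, path: List[str]) -> List[str]:
    """Follow a chain of fields from one object and return the objects reached."""
    frontier = [start]
    for field in path:
        frontier = [t for obj in frontier for t in heap.get(obj, {}).get(field, [])]
    return frontier


def conforms(heap: Heap, sources: List[str], path: List[str], multiplicity: str) -> bool:
    """Check one abstract relation, realized by a field path, against its multiplicity."""
    for src in sources:
        count = len(follow(heap, src, path))
        if multiplicity == "+" and count < 1:   # assumed meaning: one or more
            return False
    return True


# Hypothetical concretization: the abstract relation Flight Plan -> Meter Fix is
# realized as a list object reached through `fixes`, whose `elements` are the fixes.
FLIGHT_PLAN_TO_METER_FIX = ["fixes", "elements"]

heap: Heap = {
    "fp1": {"fixes": ["list1"]},
    "list1": {"elements": ["mf1", "mf2"]},
    "fp2": {"fixes": ["list2"]},
    "list2": {"elements": []},
}
```

The intermediate list objects never appear in the abstract model, which is the kind of gap the concretization specification is meant to bridge.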

Design Conformance Benefits
Higher Confidence in Software
  Promote clean implementation of design
  Guarantee important design properties
Design becomes useful throughout entire development cycle
  Updated as implementation changes
  Reliable source of information
  Enables more precise, relevant analysis

Related Work
Pointer Analysis
  Landi, Ryder, Zhang – PLDI93
  Emami, Ghiya, Hendren – PLDI94
  Wilson, Lam – PLDI96
  Rugina, Rinard – PLDI99
  Rountev, Ryder – CC01
  Salcianu, Rinard – PPoPP01
Region Analysis
  Triolet, Irigoin, Feautrier – PLDI86
  Havlak, Kennedy – IEEE TPDS91
  Rugina, Rinard – PLDI00
Pointer Specifications
  Hendren, Hummel, Nicolau – PLDI92
  Guyer, Lin – LCPC00

Related Work
Shape Analysis [CWZ90, GH96, FL97, SRW99, MS01]
Extended Type Systems
  FX/87 [GJLS87]
  Dependent Types [XF99]
Program Verification
  ESC [DLNS98]
  PVS [ORRSS96]
Implementations of Object Models [HBR00]

Conclusion
Developer and Designer Interact with Analysis
Benefits
  More precise, relevant analysis
  Verify key safety and design properties
  Enhance utility of design
  Enable powerful transformations
Key Issue: Determining appropriate abstractions to leverage
  Access regions, roles, object models
Abstractions Share Several Features
  Identify important properties of data
  Relate properties of data to behavior of computation