Abstract Interpretation and Future Program Analysis Problems
Martin Rinard, Alexandru Salcianu
Laboratory for Computer Science, Massachusetts Institute of Technology

Abstract Interpretation: The Early Years
- Formal connection between
  - Sound analysis of program
  - Execution of program
- Broader impact
  - Insight that analysis is execution
  - Reduced need to think of analysis as reasoning about all possible executions!
- Good fit with analysis problems of that era
  - Properties of local variables
  - Within single procedure

How Is Abstract Interpretation Holding Up?
- Technical result as relevant as ever
- Moore's Law effects
  - Much more computing power for analysis
  - More complex programs
- Ambitious analyses
  - Heap properties
  - Multiple threads
  - Interprocedural partial program analyses
- Stretch intuitive vision of analysis as execution

Outline
- Combined pointer and escape analysis
  - Rationale behind design decisions
  - Alternative choices in design space
- Challenges and predictions
- Bigger picture

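Goal of Pointer Analysis
- Characterize objects to which pointers point
  - Synthesize finite set of object representatives
  - Derive representative(s) each pointer points to
- Example: r = p.f;
  - “p.f points to an object with one particular representative, so after the execution of r = p.f, r may point to an object with that representative, but not to an object with any of the other representatives”
[Figure: points-to graph for r = p.f, with nodes for p and r and an edge labeled f]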

Our Pointer Analysis Goals
- Accurate for multithreaded programs
- Compositional, partial program analysis
  - Analyze each procedure once
  - Independently of callers
  - May skip analysis of invoked procedures
- Why?
  - Parts of program unavailable (different language, not written yet)
  - Parts may be irrelevant for desired result

Analysis Abstraction
- Basic abstraction is points-to graph
  - Nodes represent objects in heap
  - Edges represent references in heap
[Figure: points-to graph with labels p, q, u and edges labeled f]

Two Kinds of Edges
- Inside edges (solid): represent references created inside analyzed part of program
- Outside edges (dashed): represent references created outside analyzed part of program
[Figure: points-to graph with labels p, q, u and edges labeled f]

Two Kinds of Nodes
- Inside nodes (solid): represent objects created inside analyzed part of program
- Outside nodes (dashed): represent objects
  - Created outside analyzed part of program, or
  - Accessed via edges created outside analyzed part of program
[Figure: points-to graph with labels p, q, u and edges labeled f]

Key Question
- What does the heap look like when the procedure begins its execution?
- Previous algorithms analyzed callers before callees, so a model of the heap was always available
- Unfortunately, this approach requires analysis of the entire program in top-down fashion
- Our solution: use the code to reconstruct what the (accessed part of the) heap must look like

Analysis In Example

    m(p, q) {
      r = new C();
      p.f = r;
      s = q;
      do s = s.f; until (s = null);
      t = new C();
      s.f = t;
      u = new C();
    }

[Figure, built up one statement at a time: parameter nodes p and q; a node for r; an edge f from p to r; s starting at q; an edge f out of q for the first load]

Analysis In Example (loop in m(p, q): do s = s.f; until (s = null);)
- One option: continue to expand graph
- But the analysis may never terminate…
[Figure: points-to graph with a further f edge added for the repeated load]

Analysis In Example (loop in m(p, q): do s = s.f; until (s = null);)
- Instead have one outside node per load statement
  - Represents all objects loaded at that statement
  - Bounds graph and guarantees termination
[Figure: points-to graph in which the load node for s = s.f has an f edge back to itself]
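The rule of one outside node per load statement can be sketched in Python. This is a simplified illustration under stated assumptions, not the authors' implementation: the statement encoding, the node names (`P:` for parameter nodes, `I:` for inside nodes, `L:` for load nodes), and the function `analyze` are all hypothetical, the loop condition is ignored, and inside nodes are treated as never reachable from outside code.

```python
def analyze(stmts, params):
    # One outside "parameter" node per formal parameter (assumed distinct).
    env = {p: {"P:" + p} for p in params}
    edges = set()  # (source, field, target, kind); kind is "inside" or "outside"

    def snapshot(e):
        return {v: frozenset(ns) for v, ns in e.items()}, frozenset(edges)

    def run(block, env):
        for stmt in block:
            op = stmt[0]
            if op == "new":                      # var = new C()
                _, var, site = stmt
                env[var] = {"I:" + site}
            elif op == "copy":                   # dst = src
                _, dst, src = stmt
                env[dst] = set(env.get(src, ()))
            elif op == "store":                  # base.field = src
                _, base, field, src = stmt
                for b in env.get(base, ()):
                    for v in env.get(src, ()):
                        edges.add((b, field, v, "inside"))
            elif op == "load":                   # dst = base.field
                _, dst, base, field, site = stmt
                load_node = "L:" + site          # one node per load statement
                targets = set()
                for b in list(env.get(base, ())):
                    targets |= {d for (s, f, d, k) in edges
                                if s == b and f == field and k == "inside"}
                    if not b.startswith("I:"):   # simplification: inside nodes are captured
                        edges.add((b, field, load_node, "outside"))
                        targets.add(load_node)
                env[dst] = targets
            elif op == "do_until":               # loop condition is ignored
                _, body = stmt
                run(body, env)                   # the body executes at least once
                while True:                      # then iterate to a fixed point
                    before = snapshot(env)
                    again = {v: set(ns) for v, ns in env.items()}
                    run(body, again)
                    for v, ns in again.items():
                        env.setdefault(v, set()).update(ns)
                    if snapshot(env) == before:
                        break

    run(stmts, env)
    return env, edges


program = [
    ("new", "r", "r"), ("store", "p", "f", "r"), ("copy", "s", "q"),
    ("do_until", [("load", "s", "s", "f", "ld")]),
    ("new", "t", "t"), ("store", "s", "f", "t"), ("new", "u", "u"),
]
env, edges = analyze(program, ["p", "q"])
```

Because every iteration of the loop reuses the single node `L:ld`, the second pass only adds a self edge and the third pass changes nothing, so the fixed point is reached with a bounded graph.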

Consequences of This Decision
- Multiple objects represented by single node (load node in loop)
- But can also have single object represented by multiple nodes in graph (!!) (object loaded at multiple statements)

    do a = q.f; until (a = null);
    do b = q.f; until (b = null);

[Figure: q with f edges to one load node per load statement]

Consequences of This Decision
- Form of points-to graph depends on program
- Programs with identical behavior but different graphs…

    do s = s.f; until (s = null);

    s = s.f; while (s != null) s = s.f

[Figure: the points-to graph for each version; the second version has two load statements, so its graph has an additional f edge]

Analysis In Example (remaining statements of m(p, q): t = new C(); s.f = t; u = new C();)
[Figure, built up one statement at a time: a node for t; an edge f from the node s points to, to t; a node for u with no edges]

What Does Result Tell Us?
- Nodes (inside, captured)
  - Created inside analyzed part of program
  - Unreachable from unanalyzed part of program
  - Complete information about referencing relationships!
- Nodes (inside, escaped)
  - Created inside analyzed part of program
  - But reachable from unanalyzed part of program
  - Incomplete information
- Nodes (outside)
  - Created outside analyzed part of program
  - Incomplete information
[Figure: points-to graph for m(p, q) with labels p, q, r, s, t, u and edges labeled f]

Crucial Distinction: Escaped vs. Captured
- Enables analysis to identify regions of heap where it has complete information
- Crucial for both
  - Accuracy of analysis
  - Effective use of analysis results
[Figure: points-to graph for m(p, q) with labels p, q, r, s, t, u and edges labeled f]
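One way to picture the escaped/captured split is as a reachability computation: anything reachable from an outside node may be touched by unanalyzed code. The Python sketch below assumes exactly that simplification (the full analysis also has to consider other escape routes, such as return values or objects handed to other threads); the function `classify` and the node names are hypothetical.

```python
def classify(kinds, edges):
    """kinds: node -> "inside" or "outside"; edges: set of (src, field, dst)."""
    succ = {}
    for src, _field, dst in edges:
        succ.setdefault(src, set()).add(dst)

    # Everything reachable from an outside node is visible to unanalyzed code.
    reachable = {n for n, k in kinds.items() if k == "outside"}
    work = list(reachable)
    while work:
        node = work.pop()
        for nxt in succ.get(node, ()):
            if nxt not in reachable:
                reachable.add(nxt)
                work.append(nxt)

    result = {}
    for node, kind in kinds.items():
        if kind == "outside":
            result[node] = "outside"
        else:
            result[node] = "escaped" if node in reachable else "captured"
    return result


# Graph for the running example m(p, q); "load" is the load node of s = s.f.
kinds = {"p": "outside", "q": "outside", "load": "outside",
         "r": "inside", "t": "inside", "u": "inside"}
edges = {("p", "f", "r"), ("q", "f", "load"),
         ("load", "f", "load"), ("load", "f", "t")}
labels = classify(kinds, edges)
```

In this sketch only the object allocated for `u` is captured, so it is the only object whose references the analysis knows completely.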

Multiple Calling Contexts
- Two key assumptions
  - p and q refer to different objects
  - Parallel threads may access objects
[Figure: points-to graph computed for m(p, q) under these assumptions, with labels p, q, r, s, t and edges labeled f]

Multiple Calling Contexts
- What if p and q refer to the same object? (i.e. p and q aliased)
[Figure: the original points-to graph for m(p, q) next to the graph in which p and q share one node]

Multiple Calling Contexts
- What if p and q refer to the same object and there are no parallel threads?
[Figure: the original graph, the graph with p and q merged, and the smaller graph that results for this context, with labels p, q, r, s, t and edges labeled f]

Issues
- Substantially different results for different calling contexts
- But caller is unavailable at analysis time…
- New analysis for each possible context?
  - Lots of contexts…
  - Most of which probably won't be needed…

Our Solution
- Analyze assuming
  - Distinct parameters
  - Parallel threads
- Aliased parameters at caller? Merge nodes…
- No parallel threads? Remove outside edges and nodes…
[Figure: the general points-to graph for m(p, q) and the two specialized graphs derived from it]
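The "merge nodes" step for aliased parameters can be sketched as a renaming over the edge set. This Python fragment is only an illustration with hypothetical names (`merge_nodes`, the node labels); it does not attempt the second specialization, removing outside edges and nodes when there are no parallel threads, whose exact rules the slides do not spell out.

```python
def merge_nodes(edges, keep, drop):
    """Rename node `drop` to `keep` in every (src, field, dst, kind) edge."""
    def rename(node):
        return keep if node == drop else node
    return {(rename(s), f, rename(d), k) for (s, f, d, k) in edges}


# Graph for m(p, q) analyzed with distinct parameters; "load" is the load node.
distinct = {
    ("p", "f", "r", "inside"),
    ("q", "f", "load", "outside"),
    ("load", "f", "load", "outside"),
    ("load", "f", "t", "inside"),
}

# Caller passes the same object for p and q: fold q's node into p's node.
aliased = merge_nodes(distinct, keep="p", drop="q")
```

After the merge, the node for p carries both the inside edge to r and the outside edge to the load node, which is the situation the aliased calling context has to account for.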

Solution Is Not Perfect
- Specialization can lose precision: can have two procedures such that when analyzed with
  - Distinct parameters: same analysis result
  - Aliased parameters: different analysis result
- Conceptually complex analysis
  - Think about all contexts during analysis
  - Start to lose intuition of analysis as execution
- Difficult time applying abstract interpretation framework

Abstract Interpretation and Analysis
- V: concrete values
- A: abstract values
- α: abstraction function
- γ: concretization function
- Abstract interpretation is parameterized framework
[Figure: concrete transition t_v from v1 to v2 and abstract transition t_a from a1 to a2, related by α]
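The diagram on this slide corresponds to the usual local soundness requirement on the abstract transfer function. The statement below is the standard textbook form, written with the slide's symbols; treating γ as a map from abstract values to sets of concrete values is an assumption, since the slide does not give the types.

```latex
\[
  \gamma : A \to \mathcal{P}(V), \qquad t_v : V \to V, \qquad t_a : A \to A
\]
\[
  \forall a \in A,\ \forall v \in \gamma(a):\quad t_v(v) \in \gamma\bigl(t_a(a)\bigr)
\]
```

Read informally: if a concrete value is described by an abstract value before a step, the result of the concrete step is still described by the result of the abstract step.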

Applying Framework
- A: points-to graphs
- V: concrete heaps
- α: points-to graph for a given heap
  - Points-to graph depends on program
  - Need to augment heap with access history
- γ: all heaps that correspond to points-to graph
- OK, I give up…

Correctness Proof
- Inductively construct a relation between
  - Objects in heap
  - Nodes that represent objects
- Invariants that characterize the relation
- Transfer function
  - Takes points-to graph and the relation
  - Gives new points-to graph and new relation
- Prove that transfer functions preserve invariants

Threads and Abstract Interpretation
- Philosophy of abstract interpretation
  - Come up with a decent abstraction
  - Execute program on that abstraction
- Problem with threads
  - Execution usually modeled as interleaving
  - Too many interleavings!

Our Solution
- Points-to graphs explicitly represent all possible interactions between parallel threads
- Basic analysis approach
  - Analyze each thread in isolation
  - To compute combined effect of multiple threads
    - Retrieve result for each thread
    - Compute interactions that may occur
- Outside edges: interactions in which one thread reads a reference created by parallel thread
- Inside edges: interactions in which one thread creates a reference read by parallel thread

Interthread Analysis: n(p,q) || m(p,q)
- Retrieve points-to graph from analysis of each thread
- Establish correspondence between nodes
  - Node A corresponds to node B if A may represent the same object as B
  - Start with parameter nodes
- Compute interactions between threads
  - Match inside and outside edges
  - For each outside node, compute nodes in other graph that it represents
- Use computed representation relationship to combine graphs and obtain single graph for the execution of both threads
[Figure, built up over these steps: the points-to graph of each thread, each with parameter nodes p and q, and the combined graph]

Property of Analysis
- Flow-sensitive within each thread (if reorder statements, get different result)
- Flow-insensitive between threads
  - Assumes interactions can happen
    - Any number of times
    - In any order
  - Analysis models interactions that can't actually happen in any interleaved execution

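Imprecision Due To Flow Insensitivity

    n(a,b,c) { 1: p=b.f  p.f=a  2: a.f=b }  ||  m(a,c) { 3: q=a.f  4: q.f=c }

[Figure: points-to graphs over a, b, c for each thread and for the interthread analysis result, with one edge of the result drawn in blue; caption: execution order required to produce blue edge, with the statements numbered 1 to 4]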
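The gap between interleaved execution and a flow-insensitive combination can be reproduced on this example with a small Python experiment. It is a simplified stand-in, not the interthread algorithm itself: concrete objects A, B, C play the role of a, b, c, all f fields are assumed to start out null, a store through a null pointer is assumed to do nothing, and "interactions any number of times, in any order" is modeled as a fixed point over the union of both threads' statements. All names are hypothetical.

```python
from itertools import combinations

# ("load", x, y) means x = y.f; ("store", x, y) means x.f = y.
THREAD_N = [("load", "p", "b"), ("store", "p", "a"), ("store", "a", "b")]
THREAD_M = [("load", "q", "a"), ("store", "q", "c")]


def fresh_env():
    return {"a": "A", "b": "B", "c": "C", "p": None, "q": None}


def interleavings(n, m):
    """All merges of n and m that keep each thread's program order."""
    total = len(n) + len(m)
    for slots in combinations(range(total), len(n)):
        it_n, it_m = iter(n), iter(m)
        yield [next(it_n) if i in slots else next(it_m) for i in range(total)]


def edges_from_interleavings():
    seen = set()
    for order in interleavings(THREAD_N, THREAD_M):
        env, heap = fresh_env(), {}
        for op, x, y in order:
            if op == "load":
                env[x] = heap.get(env[y]) if env[y] is not None else None
            elif env[x] is not None:       # assumption: null store is a no-op
                heap[env[x]] = env[y]
                seen.add((env[x], env[y]))
    return seen


def edges_flow_insensitive():
    pts = {v: ({o} if o else set()) for v, o in fresh_env().items()}
    edges = set()
    changed = True
    while changed:                          # apply statements in any order, repeatedly
        changed = False
        for op, x, y in THREAD_N + THREAD_M:
            if op == "load":
                new = {d for (s, d) in edges if s in pts[y]} - pts[x]
                if new:
                    pts[x] |= new
                    changed = True
            else:
                new = {(s, d) for s in pts[x] for d in pts[y]} - edges
                if new:
                    edges |= new
                    changed = True
    return edges


interleaved = edges_from_interleavings()
insensitive = edges_flow_insensitive()
extra = insensitive - interleaved
```

In this sketch the flow-insensitive result contains one edge, from C to A, that no interleaving produces: creating it would need `p=b.f` to see the effect of `q.f=c`, which in turn depends on `a.f=b`, a statement that follows `p=b.f` in its own thread.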

Weak Memory Consistency Models

Initially: y=1, x=0
Thread 1 and Thread 2 run in parallel: one executes y=0; x=1, the other executes z = x+y
What is value of z?
- Three interleavings
  - z = x+y; y=0; x=1 gives z = 1
  - y=0; z = x+y; x=1 gives z = 0
  - y=0; x=1; z = x+y gives z = 1
- z can be 0 or 1
- INCORRECT REASONING!

Initially: y=1, x=0
Thread 1 and Thread 2 run in parallel: one executes y=0; x=1, the other executes z = x+y
What is value of z?
- z can be 0 or 1 OR 2!
- Memory system can reorder writes as long as it preserves illusion of sequential execution within each thread!
  - If the write x=1 becomes visible before the write y=0, z = x+y can read x=1 and y=1
- Different threads can observe different orders!
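The three outcomes can be checked by brute force. The Python sketch below models write reordering very crudely, as permission for the two writes to take effect in either order; that is an illustrative assumption, not a formal memory model, and the names are hypothetical.

```python
from itertools import permutations


def run(order):
    """Execute the three operations in the given global order and return z."""
    mem = {"x": 0, "y": 1}          # initial values from the slide
    z = None
    for op in order:
        if op == "y=0":
            mem["y"] = 0
        elif op == "x=1":
            mem["x"] = 1
        else:                        # "z=x+y"
            z = mem["x"] + mem["y"]
    return z


ops = ["y=0", "x=1", "z=x+y"]
all_orders = list(permutations(ops))

# Interleavings that respect the program order y=0 before x=1.
in_order = [o for o in all_orders if o.index("y=0") < o.index("x=1")]
interleaved_values = {run(o) for o in in_order}

# Crude weak-memory model: the two writes may also take effect in swapped order.
reordered_values = {run(o) for o in all_orders}
```

Only the order in which `x=1` takes effect first, then `z=x+y` runs, then `y=0` takes effect yields 2, which is the outcome that reasoning by interleaving misses.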

Implications for Example

    n(a,b,c) { 1: p=b.f  p.f=a  2: a.f=b }  ||  m(a,c) { 3: q=a.f  4: q.f=c }

- Blue edge in interthread analysis result can actually occur in some execution!
- Can't reason about program by interleaving statements…
[Figure: points-to graphs over a, b, c for each thread and for the interthread analysis result, with the statements numbered 1 to 4]

Implications for Analysis of Multithreaded Programs
- Analyzing all statement interleavings is unsound
- We believe that our flow-insensitive analysis is sound even for weak consistency models
- But formal semantics of weak memory consistency models still under development
  - Maessen, Arvind, Shen (OOPSLA 2000)
  - Manson, Pugh (Java Grande/ISCOPE 2001)
- Unclear how to prove ANY analysis sound…

Challenges and Predictions

Need To Analyze Partial Programs
- Fact of life: whole program may be either
  - Unavailable,
  - Infeasible to analyze, or
  - Unnecessary to analyze
- Challenges
  - What is starting context(s) for analysis?
  - What is effect of invoked but unanalyzed parts of program?
  - Especially difficult for linked data structures

Need To Analyze Partial Programs: Predictions
- Future analyses will not use presented technique
  - Care about more sophisticated properties
  - Need more information about calling context
  - Many potential calling contexts never used
- Analysis will instead start with specification
  - Provided by programmer
  - Automatically guessed by unsound static analysis heuristic or dynamic analysis
- Then automatically verify specification

Multithreaded Programs
- Challenge: too many potential executions
- Prediction: more two-phase analyses
  - Phase one
    - Analyze each thread in isolation
    - Represent potential interactions between analyzed thread and other threads
  - Phase two
    - Collect results from parallel threads
    - Compute interactions between threads

Multithreaded Programs: Prediction
- Language will enforce more structured model
  - Enhanced type system
  - Force threads to interact only at explicit synchronization points
- Development of structured analyses
  - Analyze single thread in isolation between synchronization points
  - Apply potential interaction effects only at synchronization points

Weak Memory Consistency Models
- Challenges
  - Lack of good formal semantics
  - Explosion in possible program behaviors
- Short term prediction
  - Development of formal semantics
  - Flow-insensitive analyses proved sound
- Long term prediction
  - Structured model will force threads to interact only at synchronization points
  - Eliminate visibility of weak models

Trends
- More sophisticated properties
- Harsher analysis environments
  - Partial programs
  - Threads with weak consistency models
- Role of abstract interpretation
  - Intuition of analysis as execution breaking down as analyses become more ambitious
- Analyses starting to look like verifications
  - Synthesis of loop invariants
  - Synthesizing global view of computation

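Bigger Picture
[Figure: chart of analysis approaches. Axis labels: "No idea what program should do" versus "Can write full formal specification for program", and "Don't care if program works reliably or not" versus "Correctness crucial". Plotted approaches: program verification, abstract interpretation, unsound static analyses, dynamic analyses. Several regions are marked with question marks.]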
