
CS 6340 Constraint-Based Analysis

Motivation
Designing an efficient program analysis is challenging.
Program Analysis = Specification + Implementation
● Specification (“what”): e.g., no null pointer is dereferenced along any path in the program.
● Implementation (“how”): many design choices:
  – forward vs. backward traversal
  – symbolic vs. explicit representation
  – ...
These choices are nontrivial! Consider null pointer dereference analysis:
● If the program has no null pointer assignments (v = null), forward traversal is best.
● If the program has no pointer dereferences (v->next), backward traversal is best.

What Is Constraint-Based Analysis?
Designing an efficient program analysis is challenging.
Program Analysis = Specification + Implementation
● Specification (“what”): defined by the user in the constraint language.
● Implementation (“how”): automated by the constraint solver.

Benefits of Constraint-Based Analysis
● Separates analysis specification from implementation
  – The analysis writer can focus on “what” rather than “how”
● Yields natural program specifications
  – Constraints are usually local, and their conjunction captures global properties
● Enables sophisticated analysis implementations
  – Leverages powerful, off-the-shelf solvers

QUIZ: Specification & Implementation
Consider a dataflow analysis such as live variables analysis. If one expresses it as a constraint-based analysis, one must still decide:
[ ] The order in which statements should be processed.
[✓] What the gen and kill sets for each kind of statement are.
[ ] In what language to implement the chaotic iteration algorithm.
[✓] Whether to take intersection or union at merge points.
The checked options belong to the specification; the other two are implementation choices that the constraint solver takes over.

Outline of this Lesson
● A constraint language: Datalog
● Two static analyses in Datalog:
  – Intra-procedural analysis: computing reaching definitions
  – Inter-procedural analysis: computing points-to information

A Constraint Language: Datalog
● A declarative logic programming language
● Not Turing-complete: a subset of Prolog, or SQL with recursion
  => Efficient algorithms exist to evaluate Datalog programs
● Originated as a query language for deductive databases
● Later applied in many other domains: software analysis, data mining, networking, security, knowledge representation, cloud computing, ...
● Many implementations: LogicBlox, bddbddb, IRIS, Paddle, ...

Syntax of Datalog: Example
Input Relations:
  edge(n:N, m:N)
Output Relations:
  path(n:N, m:N)
Rules:
  path(x, x).
  path(x, z) :- path(x, y), edge(y, z).
● A relation is similar to a table in a database; a tuple in a relation is similar to a row in a table. Here, edge is a table with columns n and m.
● Rules are deductive and hold universally (i.e., variables like x, y, z can be replaced by any constant). They specify “if ... then ...” logic.
● First rule: (if TRUE,) there is a path from each node to itself.
● Second rule: if there is a path from node x to y, and there is an edge from y to z, then there is a path from x to z.

Semantics of Datalog: Example
Input Relations:
  edge(n:N, m:N)
Output Relations:
  path(n:N, m:N)
Rules:
  path(x, x).
  path(x, z) :- path(x, y), edge(y, z).
Input Tuples:
  edge(0, 1), edge(0, 2), edge(2, 3), edge(2, 4)
Output Tuples:
  path(0, 0), path(1, 1), path(2, 2), path(3, 3), path(4, 4),
  path(0, 1), path(0, 2), path(2, 3), path(2, 4),
  path(0, 3), path(0, 4)
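To make the semantics concrete, here is a minimal Python sketch of naive bottom-up evaluation: apply the rules repeatedly until no new tuple is derived. The function name `transitive_paths` is hypothetical, and real Datalog solvers use far more efficient strategies (e.g., semi-naive evaluation or BDDs); this only illustrates the least-fixpoint reading of the two rules.

```python
def transitive_paths(nodes, edges):
    """Naive bottom-up evaluation of the two path rules."""
    # Rule 1: path(x, x). holds for every constant x in the domain N.
    path = {(x, x) for x in nodes}
    while True:
        # Rule 2: path(x, z) :- path(x, y), edge(y, z).
        derived = {(x, z) for (x, y) in path for (y2, z) in edges if y == y2}
        if derived <= path:      # nothing new: least fixpoint reached
            return path
        path |= derived


nodes = {0, 1, 2, 3, 4}
edges = {(0, 1), (0, 2), (2, 3), (2, 4)}
paths = transitive_paths(nodes, edges)
print(sorted(paths))
```

The loop terminates because the set of possible tuples over a finite domain is finite and `path` only grows.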

QUIZ: Computation Using Datalog
Check each of the Datalog programs below that computes in relation scc exactly those pairs of nodes (n1, n2) such that n2 is reachable from n1 AND n1 is reachable from n2.
[ ] scc(n1, n2) :- edge(n1, n2), edge(n2, n1).
[✓] scc(n1, n2) :- path(n1, n2), path(n2, n1).
[✓] scc(n1, n2) :- path(n1, n3), path(n3, n2), path(n2, n4), path(n4, n1).
[ ] scc(n1, n2) :- path(n1, n3), path(n2, n3).
The third program is equivalent to the second because path is reflexive and transitive.
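The four candidate rules can be compared on a small graph. The sketch below is an illustration only: the graph (a 3-cycle plus one extra edge) is a made-up example, and each rule body is translated by hand into a Python set comprehension, with `any(...)` standing in for the variables n3 and n4 that appear only in the rule body.

```python
from itertools import product

nodes = {0, 1, 2, 3}
edges = {(0, 1), (1, 2), (2, 0), (2, 3)}   # hypothetical example graph

# path = reflexive-transitive closure of edge (the rules from the earlier slides)
path = {(x, x) for x in nodes}
while True:
    new = {(x, z) for (x, y) in path for (y2, z) in edges if y == y2} - path
    if not new:
        break
    path |= new

pairs = list(product(nodes, repeat=2))
# Option 1: edge(n1, n2), edge(n2, n1)
opt1 = {(a, b) for a, b in pairs if (a, b) in edges and (b, a) in edges}
# Option 2: path(n1, n2), path(n2, n1)
opt2 = {(a, b) for a, b in pairs if (a, b) in path and (b, a) in path}
# Option 3: path(n1, n3), path(n3, n2), path(n2, n4), path(n4, n1)
opt3 = {(a, b) for a, b in pairs
        if any((a, c) in path and (c, b) in path for c in nodes)
        and any((b, c) in path and (c, a) in path for c in nodes)}
# Option 4: path(n1, n3), path(n2, n3)
opt4 = {(a, b) for a, b in pairs
        if any((a, c) in path and (b, c) in path for c in nodes)}

print(len(opt1), len(opt2), len(opt3), len(opt4))
```

On this graph, option 1 misses the cycle because no two nodes are joined by edges in both directions, and option 4 relates every pair because all nodes reach node 3.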


Dataflow Analysis in Datalog
Recall the specification of reaching definitions analysis:
  $\mathrm{OUT}[n] = (\mathrm{IN}[n] - \mathrm{KILL}[n]) \cup \mathrm{GEN}[n]$
  $\mathrm{IN}[n] = \bigcup_{n' \in \mathrm{predecessors}(n)} \mathrm{OUT}[n']$

Reaching Definitions Analysis in Datalog
Input Relations:
  kill(n:N, d:D)   Definition d is killed by statement n.
  gen (n:N, d:D)   Definition d is generated by statement n.
  next(n:N, m:N)   Statement m is an immediate successor of statement n.
Output Relations:
  in (n:N, d:D)    Definition d may reach the program point just before statement n.
  out(n:N, d:D)    Definition d may reach the program point just after statement n.
Rules:
  out(n, d) :- gen(n, d).
  out(n, d) :- in(n, d), !kill(n, d).
  in (m, d) :- out(n, d), next(n, m).

Reaching Definitions Analysis: Example
Control-flow graph:
  1: entry
  2: x = 8
  3: (x != 1)?   true -> 4, false -> 5
  4: x = x - 1   (then back to 3)
  5: exit
Input Relations:
  kill(n:N, d:D)
  gen (n:N, d:D)
  next(n:N, m:N)
Output Relations:
  in (n:N, d:D)
  out(n:N, d:D)
Rules:
  out(n, d) :- gen(n, d).
  out(n, d) :- in(n, d), !kill(n, d).
  in (m, d) :- out(n, d), next(n, m).
Input Tuples:
  kill(4, 2),
  gen (2, 2), gen (4, 4),
  next(1, 2), next(2, 3), next(3, 4), next(3, 5), next(4, 3)
Output Tuples:
  in (3, 2), in (3, 4), in (4, 2), in (4, 4), in (5, 2), in (5, 4),
  out(2, 2), out(3, 2), out(3, 4), out(4, 4), out(5, 2), out(5, 4)
Note that out(4, 2) is not derived: definition 2 reaches statement 4, but kill(4, 2) blocks the second rule.
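The example can be checked mechanically. Below is a minimal Python sketch of a fixpoint evaluation of the three rules on the input tuples above; the function name `reaching_definitions` is hypothetical. Negation is applied only to the input relation kill, which never changes during evaluation, so iterating to a fixpoint is safe. On these inputs the rules do not derive out(4, 2), since definition 2 is killed at statement 4.

```python
def reaching_definitions(kill, gen, nxt):
    """Fixpoint evaluation of the three reaching-definitions rules."""
    in_, out = set(), set()
    while True:
        # out(n, d) :- gen(n, d).
        # out(n, d) :- in(n, d), !kill(n, d).
        new_out = set(gen) | {(n, d) for (n, d) in in_ if (n, d) not in kill}
        # in(m, d) :- out(n, d), next(n, m).
        new_in = {(m, d) for (n, d) in out for (n2, m) in nxt if n == n2}
        if new_out <= out and new_in <= in_:
            return in_, out
        out |= new_out
        in_ |= new_in


kill = {(4, 2)}
gen = {(2, 2), (4, 4)}
nxt = {(1, 2), (2, 3), (3, 4), (3, 5), (4, 3)}
in_, out = reaching_definitions(kill, gen, nxt)
print("in :", sorted(in_))
print("out:", sorted(out))
```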

QUIZ: Live Variables Analysis
Complete the Datalog program below by filling in the rules for live variables analysis.
Input Relations:
  kill(n:N, v:V)
  gen (n:N, v:V)
  next(n:N, m:N)
Output Relations:
  in (n:N, v:V)
  out(n:N, v:V)
Rules (answer):
  in (n, v) :- gen(n, v).
  in (n, v) :- out(n, v), !kill(n, v).
  out(n, v) :- in(m, v), next(n, m).
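As a sanity check on the answer, here is a small Python sketch that evaluates the live-variables rules. The input tuples are an assumption for illustration: they model a loop with statements 2: x = 8, 3: (x != 1)?, 4: x = x - 1, where gen(n, v) means v is used at n and kill(n, v) means v is assigned at n. The function name `live_variables` is hypothetical.

```python
def live_variables(kill, gen, nxt):
    """Fixpoint evaluation of the live-variables rules (a backward analysis)."""
    in_, out = set(), set()
    while True:
        # in(n, v) :- gen(n, v).
        # in(n, v) :- out(n, v), !kill(n, v).
        new_in = set(gen) | {(n, v) for (n, v) in out if (n, v) not in kill}
        # out(n, v) :- in(m, v), next(n, m).
        new_out = {(n, v) for (n, m) in nxt for (m2, v) in in_ if m == m2}
        if new_in <= in_ and new_out <= out:
            return in_, out
        in_ |= new_in
        out |= new_out


# Assumed inputs: x is used at 3 and 4, and assigned at 2 and 4.
gen = {(3, "x"), (4, "x")}
kill = {(2, "x"), (4, "x")}
nxt = {(1, 2), (2, 3), (3, 4), (3, 5), (4, 3)}
live_in, live_out = live_variables(kill, gen, nxt)
print("in :", sorted(live_in))
print("out:", sorted(live_out))
```

Compared with reaching definitions, the roles of in and out are swapped and information flows from a statement to its predecessors, which is all it takes to turn a forward analysis into a backward one in this style.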


Pointer Analysis in Datalog
Consider a flow-insensitive may-alias analysis for a simple language:
  (function body)     f(v) { s1, ..., sn }
  (statement)         s ::= v = new h | v = u | return u | v = f(u)
  (pointer variable)  u, v
  (allocation site)   h
  (function name)     f
We first handle the intra-procedural statements (v = new h and v = u), then calls and returns.

Pointer Analysis in Datalog: Intra-procedural
Recall the specification:
[Figure: points-to graphs before and after each kind of statement. For v = new h, v keeps its existing edge to h2 and gains an edge to h. For v = u, v keeps its existing edge to h and gains an edge to h2, the site that u points to.]
Input Relations:
  new   (v:V, h:H)
  assign(v:V, u:V)
Output Relations:
  points(v:V, h:H)
Rules:
  points(v, h) :- new(v, h).
  points(v, h) :- assign(v, u), points(u, h).
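A minimal Python sketch of these two rules follows. The four-statement program in the comments is a made-up example, and `points_to` is a hypothetical helper name. Because the analysis is flow-insensitive, the order of the statements does not matter and assignments only ever add edges.

```python
def points_to(new, assign):
    """Fixpoint evaluation of the two intra-procedural points-to rules."""
    # points(v, h) :- new(v, h).
    points = set(new)
    while True:
        # points(v, h) :- assign(v, u), points(u, h).
        derived = {(v, h) for (v, u) in assign for (u2, h) in points if u == u2}
        if derived <= points:
            return points
        points |= derived


# Hypothetical program:  a = new h1;  b = new h2;  c = a;  c = b;
new = {("a", "h1"), ("b", "h2")}
assign = {("c", "a"), ("c", "b")}
points = points_to(new, assign)
print(sorted(points))
```

Here c ends up pointing to both h1 and h2: the second assignment does not overwrite the first, which is the weak-update behavior expected of a flow-insensitive analysis.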

Pointer Analysis in Datalog: Inter-procedural
We now handle the remaining statements of the language, return u and v = f(u). Example program:
  x = new h1;
  y = f(x);
  f(v) { u = v; return u; }
Parameter passing and return can be treated as assignments! For the call y = f(x):
  v = x    (parameter passing)
  u = v    (body of f)
  y = u    (return)
Input Relations:
  new   (v:V, h:H)
  assign(v:V, u:V)
  arg   (f:F, v:V)
  ret   (f:F, u:V)
  call  (y:V, f:F, x:V)
For the example: call(y, f, x), arg(f, v), ret(f, u).
Output Relations:
  points(v:V, h:H)
Rules:
  points(v, h) :- new(v, h).
  points(v, h) :- assign(v, u), points(u, h).
  points(v, h) :- call(_, f, x), arg(f, v), points(x, h).
  points(y, h) :- call(y, f, _), ret(f, u), points(u, h).
The underscore is a wildcard (“don’t care”).
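The four rules can be exercised on the example program with a short Python sketch. The function name `points_to` and the tuple encoding of the relations are assumptions made for illustration; the rules themselves are the ones on the slide.

```python
def points_to(new, assign, call, arg, ret):
    """Fixpoint evaluation of the four inter-procedural points-to rules."""
    points = set(new)                      # points(v, h) :- new(v, h).
    while True:
        derived = set()
        # points(v, h) :- assign(v, u), points(u, h).
        for (v, u) in assign:
            derived |= {(v, h) for (u2, h) in points if u2 == u}
        for (y, f, x) in call:
            # points(v, h) :- call(_, f, x), arg(f, v), points(x, h).
            for (f2, v) in arg:
                if f2 == f:
                    derived |= {(v, h) for (x2, h) in points if x2 == x}
            # points(y, h) :- call(y, f, _), ret(f, u), points(u, h).
            for (f2, u) in ret:
                if f2 == f:
                    derived |= {(y, h) for (u2, h) in points if u2 == u}
        if derived <= points:
            return points
        points |= derived


# x = new h1;  y = f(x);  f(v) { u = v; return u; }
points = points_to(
    new={("x", "h1")},
    assign={("u", "v")},
    call={("y", "f", "x")},
    arg={("f", "v")},
    ret={("f", "u")},
)
print(sorted(points))
```

The allocation site h1 flows from x to the parameter v, then to u inside f, and finally to y through the return rule.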

QUIZ: Querying Pointer Analysis in Datalog
Check each of the Datalog programs below that computes in relation mustNotAlias each pair of variables (u, v) such that u and v do not alias in any run of the program.
[ ] mustNotAlias(u, v) :- points(u, h1), points(v, h2), h1 != h2.
[ ] mayAlias(u, v) :- points(u, _), points(v, _).
    mustNotAlias(u, v) :- !mayAlias(u, v).
[✓] common(u, v, h) :- points(u, h), points(v, h).
    mayAlias(u, v) :- common(u, v, _).
    mustNotAlias(u, v) :- !mayAlias(u, v).
[✓] mayAlias(u, v) :- points(u, h), points(v, h).
    mustNotAlias(u, v) :- !mayAlias(u, v).
The last two programs are equivalent: two variables may alias exactly when they share at least one allocation site. The first program is wrong whenever u and v each point to more than one site.
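The difference between the candidate queries shows up on a tiny points-to relation. The sketch below uses made-up tuples and takes the variable domain V to be the variables that appear in points (an assumption; a variable with an empty points-to set would have to be added to the domain explicitly).

```python
from itertools import product

# Hypothetical result of a points-to analysis.
points = {("u", "h1"), ("u", "h2"), ("v", "h1"), ("v", "h2"), ("w", "h3")}
variables = {var for (var, _) in points}

# mayAlias(u, v) :- points(u, h), points(v, h).
may_alias = {(a, b) for (a, h) in points for (b, h2) in points if h == h2}

# mustNotAlias(u, v) :- !mayAlias(u, v).   (negation ranges over V x V)
must_not_alias = set(product(variables, repeat=2)) - may_alias

# First quiz option: points(u, h1), points(v, h2), h1 != h2.
first_option = {(a, b) for (a, h1) in points for (b, h2) in points if h1 != h2}

print(sorted(must_not_alias))
print(("u", "v") in first_option)
```

The first option reports (u, v) as never aliasing even though both may point to h1, so it is unsound; the negation-based query only reports pairs whose points-to sets are disjoint.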

Context Sensitivity
Example program with two calls to f:
  x = new h1;
  z = new h2;
  y = f(x);
  w = f(z);
  f(v) { u = v; return u; }
Input Relations:
  new(v:V, h:H)  assign(v:V, u:V)  arg(f:F, v:V)  ret(f:F, u:V)  call(y:V, f:F, x:V)
Output Relations:
  points(v:V, h:H)
Rules:
  points(v, h) :- new(v, h).
  points(v, h) :- assign(v, u), points(u, h).
  points(v, h) :- call(_, f, x), arg(f, v), points(x, h).
  points(y, h) :- call(y, f, _), ret(f, u), points(u, h).
The two calls behave like these assignments:
  y = f(x):  v = x;  u = v;  y = u
  w = f(z):  v = z;  u = v;  w = u
Resulting points-to graph:
  x -> h1
  z -> h2
  v, u, y, w -> h1 and h2
Imprecision! Both calls share the same v and u, so y and w each appear to point to both h1 and h2.

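Cloning-Based Inter-procedural Analysis
● Achieves context sensitivity by inlining procedure calls
● Cloning depth: precision vs. scalability
Example program with labeled call sites i and j:
  x = new h1;
  z = new h2;
  i: y = f(x);
  j: w = f(z);
  f(v) { u = v; return u; }
Each call site gets its own copy of the variables of f:
  call i:  vi = x;  ui = vi;  y = ui
  call j:  vj = z;  uj = vj;  w = uj
Resulting points-to graph:
  x, vi, ui, y -> h1
  z, vj, uj, w -> h2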
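The effect of cloning can be reproduced with the two basic points-to rules alone, treating parameter passing and return as assignments. The sketch below is an illustration under that encoding; `points_to` is a hypothetical helper name. It solves the example once with a single shared copy of v and u, and once with one copy per call site.

```python
def points_to(new, assign):
    """points(v,h) :- new(v,h).  points(v,h) :- assign(v,u), points(u,h)."""
    points = set(new)
    while True:
        derived = {(v, h) for (v, u) in assign for (u2, h) in points if u == u2}
        if derived <= points:
            return points
        points |= derived


new = {("x", "h1"), ("z", "h2")}

# Context-insensitive: both calls share f's variables v and u.
shared = points_to(new, {("v", "x"), ("v", "z"), ("u", "v"),
                         ("y", "u"), ("w", "u")})

# Cloning-based: call sites i and j each get their own copy of v and u.
cloned = points_to(new, {("vi", "x"), ("ui", "vi"), ("y", "ui"),
                         ("vj", "z"), ("uj", "vj"), ("w", "uj")})


def sites(points, var):
    """Allocation sites that var may point to."""
    return {h for (v, h) in points if v == var}


print("shared:", sites(shared, "y"), sites(shared, "w"))
print("cloned:", sites(cloned, "y"), sites(cloned, "w"))
```

With shared variables, y and w both pick up h1 and h2; with one clone per call site, y points only to h1 and w only to h2.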

What about Recursion?
  x = new h1;
  z = new h2;
  y = f(x);
  w = f(z);
  f(v) { if (*) v = f(v); return v; }
Need infinite cloning depth to differentiate the points-to sets of x, y from those of w, z!

Summary-Based Inter-procedural Analysis
● Use the incoming program states to differentiate calls to the same procedure
● Same incoming program states yield same outgoing program states for a given procedure
● As precise as cloning-based analysis with infinite cloning depth

Other Constraint Languages
Constraint Language | Problem Expressed | Example Solvers
Datalog | Least solution of deductive inference rules | LogicBlox, bddbddb
SAT | Boolean satisfiability problem | MiniSat, Glucose
MaxSAT | Boolean satisfiability problem extended with optimization | open-wbo, SAT4j
SMT | Satisfiability modulo theories problem | Z3, Yices
MaxSMT | Satisfiability modulo theories problem extended with optimization | Z3

What Have We Learned?
● Constraint-based analysis and its benefits
● The Datalog constraint language
● How to express static analyses in Datalog
  – Analysis logic == constraints in Datalog
  – Analysis inputs and outputs == relations of tuples
● Context-insensitive and context-sensitive inter-procedural analysis