SPEED: Statically Estimating Symbolic Computational Complexity of Programs. Sumit Gulwani, MSR Redmond


Sumit Gulwani (MSR Redmond), Trishul Chilimbi (MSR Redmond), Krishna Mehra (MSR Bangalore)

Problem Definition
Compute symbolic complexity bounds of procedures in terms of their inputs (assuming unit cost for each statement).
Different cost metrics can be used:
– Count only memory instructions.
– Count only memory-allocation instructions, weighted by the memory allocated (space bounds).
– Count only network instructions, weighted appropriately (network-traffic bounds).
Bounds can also be computed for interesting code fragments:
– e.g., code executed between a lock acquire and the matching release.
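To illustrate how the choice of cost metric changes what is counted, the sketch below evaluates one made-up execution trace under four metrics. The trace format, instruction kinds, payload sizes, and metric functions are all assumptions for illustration; a symbolic bound produced by the analysis would have to upper-bound such per-trace costs for every input.

```python
from typing import Callable, List, Tuple

# Hypothetical trace format: (instruction kind, payload in bytes).
Trace = List[Tuple[str, int]]
Metric = Callable[[str, int], int]

def unit_metric(kind: str, payload: int) -> int:
    return 1                                        # unit cost for every statement

def memory_metric(kind: str, payload: int) -> int:
    return 1 if kind in ("load", "store") else 0    # count only memory instructions

def space_metric(kind: str, payload: int) -> int:
    return payload if kind == "alloc" else 0        # allocations weighted by bytes allocated

def network_metric(kind: str, payload: int) -> int:
    return payload if kind == "send" else 0         # network instructions weighted by bytes sent

def cost(trace: Trace, metric: Metric) -> int:
    """Total cost of one concrete trace under the given metric."""
    return sum(metric(kind, payload) for kind, payload in trace)

EXAMPLE_TRACE: Trace = [("assign", 0), ("load", 0), ("alloc", 64),
                        ("store", 0), ("send", 128), ("alloc", 16)]

if __name__ == "__main__":
    for name, metric in [("unit", unit_metric), ("memory", memory_metric),
                         ("space", space_metric), ("network", network_metric)]:
        print(name, cost(EXAMPLE_TRACE, metric))
```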

Comparison with Profiling
Profiling:
– (+) Generates the real running cost.
– (-) Only as good as the test inputs.
– (-) Requires building executables => late feedback.
Static symbolic complexity:
– (-) Ignores low-level architectural features such as caches and pipelines.
– (+) Generates worst-case bounds, and can also generate predicated bounds.
– (+) Requires compiling only the relevant procedures => immediate feedback.

Applications
– Immediate feedback during code development: code editing; use of unfamiliar APIs.
– Performance analysis: identifying corner cases.
– Embedded systems: establishing space bounds.
– Soft real-time systems: establishing time bounds; feedback into a runtime power-management scheme.

Outline
→ Challenges in Bounds Analysis
Idea #1: Proof Structure (control flow)
Idea #2: Quantitative Functions (data structures)

Challenges in Computing Bounds
Presence of control flow:
– Bounds for even simple programs are non-linear and disjunctive.
– Sometimes even proving termination is hard.
Presence of data structures:
– Expressing bounds requires numerical functions over data structures.
– Computing these bounds requires sophisticated shape analysis.

Counter-Instrumentation-Based Solution
The main challenge is computing loop bounds. A simple counter-instrumentation scheme rewrites

    while (cond) do S

into

    c := 0;
    while (cond) do { S; c := c+1; }

Loop bounds can then be obtained by computing bounds on c using invariant generation tools [CAV '08]. However, the required invariants are usually disjunctive and non-linear, and they refer to the heap, which makes them hard to compute.
Our solution: a refinement of this scheme that allows bounds to be generated using simple linear invariant generation tools.
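A minimal Python sketch of the counter instrumentation on an assumed loop that counts x up from x0 to n. The assertion stands in for the linear invariant an invariant generator would need to find at the back-edge (c == x - x0 and x <= n), from which the bound c <= max(0, n - x0) follows.

```python
def count_iterations(x0: int, n: int) -> int:
    """Instrumented version of `x := x0; while (x < n) do x := x + 1`."""
    x = x0
    c = 0                      # inserted: c := 0 before the loop
    while x < n:               # while (cond)
        x += 1                 # S
        c += 1                 # inserted: c := c + 1 at the end of the body
        # Linear invariant at the back-edge that an invariant generator could infer.
        assert c == x - x0 and x <= n
    return c                   # hence c <= max(0, n - x0)
```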

Example: Non-linear Bounds

    int size; // Assume(0 ≤ e1.len, e2.len ≤ size);
    Equals(StringBuffer s1, StringBuffer s2) {
      c1 := c2 := c3 := 0;
      e1 := s1.GetHead(); e2 := s2.GetHead();
      i1 := e1.len-1; i2 := e2.len-1;
      while (true) {
        while (i1 >= 0 && i2 >= 0) {
          if (e1.arr[i1] != e2.arr[i2]) return 0;
          i1--; i2--; c3++;
        }
        while (i1 < 0 && e1 != null) {
          e1 := s1.GetNext(e1); i1 := i1+e1.len; c1++; c3 := 0;
        }
        while (i2 < 0 && e2 != null) {
          e2 := s2.GetNext(e2); i2 := i2+e2.len; c2++; c3 := 0;
        }
        if (i1 < 0) return (i2 < 0);
        if (i2 < 0) return 0;
        c3++;
      }
      return 1;
    }

Total iterations of the 2nd and 3rd inner loops: Len(s1) and Len(s2), respectively.
For each iteration of the 2nd and 3rd inner loops, the combined iterations of the 1st inner loop and the outer loop are at most size.
Therefore, the total complexity is (1+size)*(1+Len(s1)+Len(s2)).
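The following Python sketch mirrors the control structure and counters of Equals so that the stated bounds can be checked on concrete inputs. It is an illustrative model, not the original implementation: a StringBuffer is assumed to be a list of string chunks, Len(s) is taken to be the number of chunks, size is the maximum chunk length, and a null chunk returned by GetNext is treated as having length 0 so the sketch never dereferences null. c3_total accumulates every increment of c3, i.e. the combined iterations of the 1st inner loop and the outer loop.

```python
from typing import List, Optional, Tuple

def equals_with_counters(s1: List[str], s2: List[str]) -> Tuple[int, Tuple[int, int, int]]:
    """Mirror of the slide's Equals; returns (result, (c1, c2, c3_total))."""
    def chunk(s: List[str], k: int) -> Optional[str]:
        return s[k] if k < len(s) else None      # GetHead/GetNext; None plays null

    def length(e: Optional[str]) -> int:
        return len(e) if e is not None else 0    # assumption: a null chunk has length 0

    k1 = k2 = 0
    e1, e2 = chunk(s1, 0), chunk(s2, 0)
    i1, i2 = length(e1) - 1, length(e2) - 1
    c1 = c2 = c3 = c3_total = 0
    while True:
        while i1 >= 0 and i2 >= 0:               # 1st inner loop
            if e1[i1] != e2[i2]:
                return 0, (c1, c2, c3_total)
            i1 -= 1
            i2 -= 1
            c3 += 1
            c3_total += 1
        while i1 < 0 and e1 is not None:         # 2nd inner loop
            k1 += 1
            e1 = chunk(s1, k1)
            i1 += length(e1)
            c1 += 1
            c3 = 0
        while i2 < 0 and e2 is not None:         # 3rd inner loop
            k2 += 1
            e2 = chunk(s2, k2)
            i2 += length(e2)
            c2 += 1
            c3 = 0
        if i1 < 0:
            return int(i2 < 0), (c1, c2, c3_total)
        if i2 < 0:
            return 0, (c1, c2, c3_total)
        c3 += 1                                  # outer-loop back-edge
        c3_total += 1
```

For example, comparing ["abc", "de"] with itself returns (1, (2, 2, 6)): each 2nd/3rd inner loop runs once per chunk (the last step reaching null), and c3 is incremented six times in total.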

Example: Disjunctive Bounds

    Example2(int n, x0, z0) {
      c1 := 0; c2 := 0;
      x := x0; z := z0;
      while (x < n)
        if (z > x) { x++; c1++; }
        else { z++; c2++; }
    }

Termination can be proved using a disjunctively well-founded relation. Bounds can even be computed using the following proof structure:
– Number of times the if-branch is executed (if at all): n-x0.
– Number of times the else-branch is executed (if at all): n-z0.
– Therefore, total iterations: Max(0, n-x0) + Max(0, n-z0).
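A direct Python transcription of Example2 (a sketch; the test sweeps small integer inputs) makes it easy to check that each branch counter stays within its stated bound.

```python
from typing import Tuple

def example2(n: int, x0: int, z0: int) -> Tuple[int, int]:
    """Transcription of Example2; returns the branch counters (c1, c2)."""
    c1 = c2 = 0
    x, z = x0, z0
    while x < n:
        if z > x:
            x += 1
            c1 += 1        # if-branch: at most max(0, n - x0) times
        else:
            z += 1
            c2 += 1        # else-branch: at most max(0, n - z0) times
    return c1, c2
```

With n = 10, x0 = 0, z0 = 5 both bounds are tight: the result is (10, 5), i.e. 15 total iterations.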

Outline
Challenges in Bounds Analysis
→ Idea #1: Proof Structure (control flow)
Idea #2: Quantitative Functions (data structures)

Proof Structure
A proof structure specifies where to increment and initialize multiple counter variables. It is a tuple (M, G) such that:
– M maps each back-edge q to some counter variable c; "c++" is placed at q.
– G is a DAG over counter variables; "c := 0" is placed at procedure entry and wherever any predecessor of c in G is incremented.
An invariant generation tool can bound counters instrumented in this way.
Proof structure for the StringBuffer example (q: back-edge of the outer loop; qi: back-edge of the i-th inner loop):
M = {q → c3, q1 → c3, q2 → c1, q3 → c2}
G = {c1 → c3, c2 → c3}, i.e., c3 is reset wherever c1 or c2 is incremented.
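The placement rules can be made concrete with a small Python sketch that turns (M, G) into the statements inserted at each back-edge and checks that G is a DAG. For the StringBuffer example it assumes G = {c1 → c3, c2 → c3}, the edges implied by the c3 := 0 statements that accompany c1++ and c2++ in Equals. The function names are hypothetical.

```python
from typing import Dict, List, Set, Tuple

Edge = Tuple[str, str]

def is_dag(counters: Set[str], G: Set[Edge]) -> bool:
    """Kahn's algorithm: True iff G has no cycle over the given counters."""
    indegree = {c: 0 for c in counters}
    for _, dst in G:
        indegree[dst] += 1
    ready = [c for c in counters if indegree[c] == 0]
    visited = 0
    while ready:
        c = ready.pop()
        visited += 1
        for src, dst in G:
            if src == c:
                indegree[dst] -= 1
                if indegree[dst] == 0:
                    ready.append(dst)
    return visited == len(counters)

def instrumentation(M: Dict[str, str], G: Set[Edge]) -> Dict[str, List[str]]:
    """Statements inserted at each back-edge q: M(q)++, then resets of its G-successors."""
    plan = {}
    for q, c in M.items():
        successors = sorted(dst for src, dst in G if src == c)
        plan[q] = [f"{c}++"] + [f"{d} := 0" for d in successors]
    return plan

def entry_resets(M: Dict[str, str]) -> List[str]:
    """Every counter is also initialized at procedure entry."""
    return [f"{c} := 0" for c in sorted(set(M.values()))]

# StringBuffer example (G assumed from the c3 := 0 placement in Equals).
M = {"q": "c3", "q1": "c3", "q2": "c1", "q3": "c2"}
G = {("c1", "c3"), ("c2", "c3")}
```

Here instrumentation(M, G) places "c1++" followed by "c3 := 0" at q2, which is exactly what the second inner loop of Equals does.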

Computing a Bound from a Proof Structure
Given a proof structure (M, G), the bound U is computed as:
U = Sum{ TotalBound(c) | c }
TotalBound(c) = Max{ 0, B(q) | M(q) = c } × (1 + Sum{ TotalBound(c') | (c', c) ∈ G })
where B(q) is the bound computed on M(q) at q.
Bound for the StringBuffer example:
U = Len(s1) + Len(s2) + (1+size) × (Len(s1)+Len(s2))
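A small Python sketch of the TotalBound recursion on numeric per-back-edge bounds. The example is a hypothetical doubly nested loop, not the StringBuffer example: the outer back-edge qo is mapped to c1 with B(qo) = n, the inner back-edge qi to c2 with B(qi) = m, and G = {c1 → c2} because the inner counter is reset whenever the outer one is incremented.

```python
from typing import Dict, Set, Tuple

def bound_from_proof_structure(M: Dict[str, str], G: Set[Tuple[str, str]],
                               B: Dict[str, int]) -> int:
    """U = Sum{TotalBound(c) | c}, where
    TotalBound(c) = Max{0, B(q) | M(q) = c} * (1 + Sum{TotalBound(c') | (c', c) in G})."""
    def total_bound(c: str) -> int:
        own = max([0] + [B[q] for q, counter in M.items() if counter == c])
        predecessors = [src for src, dst in G if dst == c]
        # The recursion terminates because G is a DAG.
        return own * (1 + sum(total_bound(p) for p in predecessors))
    return sum(total_bound(c) for c in set(M.values()))

# Hypothetical loop nest: for i in 0..n-1 { for j in 0..m-1 { } }
M = {"qo": "c1", "qi": "c2"}
G = {("c1", "c2")}
```

With n = 3 and m = 4 this gives U = n + m·(1 + n) = 3 + 4·4 = 19, which over-approximates the 3 + 12 = 15 iterations of the actual loop nest; the extra m accounts for the c2 segment that starts at procedure entry.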

Automatically Computing a Proof Structure
The number of potential proof structures (M, G) is exponential in the number of back-edges.
– Hence a naive search is expensive.
Key idea: adding counters and dependencies increases the ability of an invariant generation tool to discover bounds.
– But we cannot simply make all counters depend on each other.
– We need to find the right set of dependencies, one that forms a DAG.
There is an algorithm, quadratic in the number of back-edges, that computes a (counter-optimal) proof structure [POPL '09].
– A counter-optimal proof structure uses a minimal number of counters and minimal dependencies between counters.
– Generally, this leads to more precise bounds.

Outline
Challenges in Bounds Analysis
Idea #1: Proof Structure (control flow)
→ Idea #2: Quantitative Functions (data structures)

Quantitative Functions
Quantitative functions are defined over a tuple of abstract data structures.
– They are similar to ghost fields.
– Len(L): length of list L.
– Pos(e, L): position of list element e in list L.
Their semantics is defined by describing the effect of data-structure methods on the quantitative functions:
– as a sequence of (conditional) assignments and assumes;
– which can also refer to unscoped (universally quantified) variables, such as e' below.
Data-structure operation: updates to quantitative functions
– L.Append(e): Len(L)++; Pos(e,L) := Len(L);
– L.Delete(e): Len(L)--; if (Pos(e,L) < Pos(e',L)) Pos(e',L)--;
– e1 := L.GetNext(e2): Pos(e1,L) := Pos(e2,L)+1; Assume(Pos(e1,L) ≤ Len(L));
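To show how the update rules keep the ghost values in sync with a real list, here is a Python sketch that pairs a concrete list with Len(L) and Pos(e, L) (1-based positions; distinct, hashable elements assumed). Append and Delete update the ghost state only through the rules above, with the unscoped e' in the Delete rule instantiated with every tracked element; GetNext asserts the rule's postconditions instead of assigning, because the concrete list already determines the successor. The class name is hypothetical.

```python
from typing import Dict, Hashable, List, Optional

class GhostList:
    """A concrete list plus ghost quantitative functions Len(L) and Pos(e, L)."""

    def __init__(self) -> None:
        self.items: List[Hashable] = []      # concrete contents
        self.Len = 0                         # ghost Len(L)
        self.Pos: Dict[Hashable, int] = {}   # ghost Pos(e, L), 1-based

    def append(self, e: Hashable) -> None:
        self.items.append(e)
        self.Len += 1                        # Len(L)++
        self.Pos[e] = self.Len               # Pos(e,L) := Len(L)

    def delete(self, e: Hashable) -> None:
        self.items.remove(e)
        self.Len -= 1                        # Len(L)--
        p = self.Pos.pop(e)
        for other in self.Pos:               # e' instantiated with every tracked element
            if p < self.Pos[other]:
                self.Pos[other] -= 1         # if (Pos(e,L) < Pos(e',L)) Pos(e',L)--

    def get_next(self, e2: Hashable) -> Optional[Hashable]:
        i = self.items.index(e2)
        e1 = self.items[i + 1] if i + 1 < len(self.items) else None
        if e1 is not None:
            assert self.Pos[e1] == self.Pos[e2] + 1   # Pos(e1,L) = Pos(e2,L)+1
            assert self.Pos[e1] <= self.Len           # Assume(Pos(e1,L) <= Len(L))
        return e1
```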

Principles Behind Defining Quantitative Functions
Precision
– Defining more quantitative functions increases the ability of a linear invariant generation tool to find bounds.
– In practice, a few quantitative functions are usually sufficient.
Soundness
– Method annotations are always sound from the tool's perspective.
– It is the user's responsibility to ensure that the intended semantics matches the method annotations.
– Verification is possible if the intended semantics can be described in an appropriate logic (Gulwani, Sagiv, Lev-Ami: "A Combination Framework for Tracking Partition Sizes", POPL).

Computing Invariants over Quantitative Functions
Instrument each data-structure method call with its effect, which allows the quantitative functions to be treated as uninterpreted functions.
– Instantiate unscoped variables with all appropriate terms.
Use a linear invariant generation tool with support for uninterpreted functions:
– Abstract-interpretation-based technique: combine the polyhedron abstract domain [Cousot, POPL '79] with the uninterpreted-functions domain [Gulwani, Necula, SAS '04] using domain combinators [Gulwani, Tiwari, PLDI '06].
– Constraint-based invariant generation technique [Beyer et al., VMCAI '07].
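A hypothetical sketch of the instantiation step: for a call L.Delete(e), the universally quantified e' in the Delete annotation is replaced by every element-typed term in scope, producing plain updates over uninterpreted Len and Pos terms that a linear invariant generator can process. The function name and the string output format are illustrative assumptions, not the tool's actual representation.

```python
from typing import List

def instantiate_delete(list_term: str, deleted: str, element_terms: List[str]) -> List[str]:
    """Instantiate the L.Delete(e) annotation for every element term in scope."""
    updates = [f"Len({list_term}) := Len({list_term}) - 1"]
    for t in element_terms:
        if t == deleted:
            continue  # Pos(e,L) < Pos(e,L) is false, so this instance is a no-op
        updates.append(f"if (Pos({deleted},{list_term}) < Pos({t},{list_term})) "
                       f"Pos({t},{list_term}) := Pos({t},{list_term}) - 1")
    return updates
```

For instance, instantiate_delete("L", "e", ["e", "f", "g"]) yields the Len update plus one conditional Pos update each for f and g.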

Example: Breadth-First Traversal

    BFT(List L):
      ToDo.Init();
      L.MoveTo(L.Head(), ToDo);
      c := 0;
      while (!ToDo.IsEmpty())
        e := ToDo.Head();
        ToDo.Delete(e);
        foreach successor s in e.Successors()
          if (L.contains(s)) L.MoveTo(s, ToDo);
        c++;

Inductive invariant at the back-edge of the while loop:
c ≤ Old(Len(L)) - Len(L) - Len(ToDo) ∧ Len(L) ≥ 0 ∧ Len(ToDo) ≥ 0
This implies a bound of Old(Len(L)) for the while loop.
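A Python sketch of BFT that checks the inductive invariant at every back-edge. The graph representation (a dict from node to successor list), the use of a Python list for L, and a deque for ToDo are assumptions; the guard on the initial MoveTo avoids taking the head of an empty list.

```python
from collections import deque
from typing import Deque, Dict, List

def bft(nodes: List[int], successors: Dict[int, List[int]]) -> int:
    """Breadth-first traversal with loop counter c; returns c."""
    L = list(nodes)                      # unvisited nodes
    old_len = len(L)                     # Old(Len(L))
    todo: Deque[int] = deque()           # ToDo.Init()
    if L:
        todo.append(L.pop(0))            # L.MoveTo(L.Head(), ToDo)
    c = 0
    while todo:                          # while (!ToDo.IsEmpty())
        e = todo.popleft()               # e := ToDo.Head(); ToDo.Delete(e)
        for s in successors.get(e, []):
            if s in L:                   # L.contains(s)
                L.remove(s)
                todo.append(s)           # L.MoveTo(s, ToDo)
        c += 1
        # Inductive invariant at the back-edge of the while loop.
        assert c <= old_len - len(L) - len(todo) and len(L) >= 0 and len(todo) >= 0
    return c                             # hence c <= Old(Len(L))
```

On the graph {1: [2, 3], 2: [4], 3: [4], 4: [1]} with L = [1, 2, 3, 4, 5], the loop runs 4 times; node 5 is unreachable, so the bound of 5 is not reached.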

Quantitative Functions for Bit-vectors
– Ones(b): number of 1 bits in b.
– One(b): position of the least significant 1 bit in b.
– Bits(b): number of bits in b.
Data-structure operation: updates to quantitative functions
– a := b << index: Ones(a) := ?; Assume(Ones(a) ≤ Ones(b)); One(a) := index + One(b);
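A Python sketch of the three bit-vector functions and of the annotation for a := b << index. It assumes a fixed width of 32 bits (so Bits(b) is constant) and 0-based bit positions. Shift results are truncated to that width, which is why Ones can only decrease, and One(a) = index + One(b) is checked only while the lowest 1 bit survives the shift.

```python
WIDTH = 32  # assumed fixed bit-vector width, i.e. Bits(b) == WIDTH

def Ones(b: int) -> int:
    """Number of 1 bits in b."""
    return bin(b).count("1")

def One(b: int) -> int:
    """0-based position of the least significant 1 bit in b (b must be nonzero)."""
    if b == 0:
        raise ValueError("One(b) is undefined for b == 0")
    return (b & -b).bit_length() - 1

def Bits(b: int) -> int:
    """Number of bits in b: here simply the assumed fixed width."""
    return WIDTH

def shl(b: int, index: int) -> int:
    """a := b << index, truncated to WIDTH bits."""
    return (b << index) & ((1 << WIDTH) - 1)

def check_shift_annotation(b: int, index: int) -> None:
    a = shl(b, index)
    # Assume(Ones(a) <= Ones(b)): bits shifted out can only lower the count.
    assert Ones(a) <= Ones(b)
    # One(a) := index + One(b), valid while the lowest 1 bit is not shifted out.
    if a != 0 and One(b) + index < WIDTH:
        assert One(a) == index + One(b)
```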

Example

    Iterate(BitVector a):
      b := a; c := 0;
      while (BitScanForward(&id1, b))
        b := b | ((1 << id1)-1);        // set all bits before id1
        if (BitScanForward(&id2, ~b)) break;
        b := b & (~((1 << id2)-1));     // reset bits before id2
        c++;

Each loop iteration masks a chunk of consecutive 1s to 0.
Our tool computes the invariant:
c ≤ Ones(a)-Ones(b) ∧ 2c ≤ One(b)-One(a) ∧ One(b) ≤ Bits(a)
This implies a bound of Min{ Ones(a), Bits(a)/2 }.
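The loop can be emulated in Python to check both the invariant and the final bound. BitScanForward is modeled by a hypothetical helper that returns the index of the lowest set bit, or None when its argument is zero (an assumption about its semantics), and ~b is masked to the assumed width. The conjuncts that mention One(b) are checked only while b is nonzero, since One is undefined for zero.

```python
from typing import Optional

def ones(b: int) -> int:
    return bin(b).count("1")                  # Ones(b)

def one(b: int) -> int:
    return (b & -b).bit_length() - 1          # One(b), 0-based; b must be nonzero

def bit_scan_forward(x: int) -> Optional[int]:
    """Index of the lowest set bit of x, or None when x == 0."""
    return one(x) if x != 0 else None

def iterate(a: int, width: int = 32) -> int:
    mask = (1 << width) - 1                   # Bits(a) == width (assumed)
    a &= mask
    b, c = a, 0
    while True:
        id1 = bit_scan_forward(b)
        if id1 is None:
            break
        b |= (1 << id1) - 1                   # set all bits before id1
        id2 = bit_scan_forward(~b & mask)
        if id2 is None:                       # b is all ones: stop
            break
        b &= ~((1 << id2) - 1)                # reset bits before id2
        c += 1
        # Invariant from the slide, checked at the back-edge.
        assert c <= ones(a) - ones(b)
        if b != 0:
            assert 2 * c <= one(b) - one(a) and one(b) <= width
    return c
```

With an 8-bit width, a = 0b01010101 takes 4 iterations, matching Bits(a)/2, while a = 0b10101010 takes 3, because the run that reaches the top bit triggers the break before c++.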

Quantitative Functions for Lists of Lists
TotalNodes(L) = Sum{ Len(e') | L.BelongsTo(e') }
MaxNodes(L) = Max{ Len(e') | L.BelongsTo(e') }
Program and bound:

    for (e := L.Head(); e != null; e := L.GetNext(e));

Bound: Len(L)

    for (e := L.Head(); e != null; e := L.GetNext(e))
      for (f := e.Head(); f != null; f := e.GetNext(f));

Bound: Len(L) + TotalNodes(L)

    for (e := L.Head(); e != null; e := L.GetNext(e))
      if (*) break;
    for (f := e.Head(); f != null; f := e.GetNext(f));

Bound: Len(L) + MaxNodes(L)
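A Python sketch that counts loop iterations for the second and third programs on an assumed list-of-lists representation (a Python list of Python lists) and compares them with the bounds. The nondeterministic if (*) break is modeled by a hypothetical break_at index, with None meaning the loop never breaks; in that case the inner loop is skipped, since e would be null.

```python
from typing import List, Optional

def total_nodes(L: List[list]) -> int:
    return sum(len(e) for e in L)                  # TotalNodes(L)

def max_nodes(L: List[list]) -> int:
    return max((len(e) for e in L), default=0)     # MaxNodes(L)

def nested_iterations(L: List[list]) -> int:
    """Second program: the inner loop runs inside the outer loop."""
    count = 0
    for e in L:
        count += 1                                 # outer iteration
        for _ in e:
            count += 1                             # inner iteration
    return count

def break_then_scan_iterations(L: List[list], break_at: Optional[int]) -> int:
    """Third program: `if (*) break` is modeled by breaking at index break_at."""
    count = 0
    chosen = None
    for i, e in enumerate(L):
        count += 1
        if i == break_at:
            chosen = e
            break
    if chosen is not None:                         # without a break, e would be null
        count += len(chosen)                       # inner loop over the chosen e only
    return count
```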

Quantitative Functions for Trees
– Nodes(T): total number of nodes in tree T.
– Height(T): height of tree T.
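A sketch of the two tree functions on an assumed binary-tree class, plus a loop whose iteration count is bounded by Height(T). Height is taken here as the number of nodes on the longest root-to-leaf path (0 for the empty tree); the slide does not fix a convention, and the class and function names are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Tree:
    left: Optional["Tree"] = None
    right: Optional["Tree"] = None

def nodes(t: Optional[Tree]) -> int:
    """Nodes(T): total number of nodes."""
    return 0 if t is None else 1 + nodes(t.left) + nodes(t.right)

def height(t: Optional[Tree]) -> int:
    """Height(T): nodes on the longest root-to-leaf path (assumed convention)."""
    return 0 if t is None else 1 + max(height(t.left), height(t.right))

def leftmost_walk_iterations(t: Optional[Tree]) -> int:
    """Iterations of `while (t != null) t := t.left`, bounded by Height(T)."""
    c = 0
    while t is not None:
        t = t.left
        c += 1
    return c
```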

Conclusion
Applications of symbolic bounds analysis
– Interactive code development, embedded/real-time systems.
Challenges in bounds analysis
– Control flow leads to non-linear and disjunctive bounds.
– Data structures require numerical shape analysis.
Idea #1: Proof Structure (control flow)
– Addresses the issue of non-linear and disjunctive bounds.
– Reduces bounds analysis to linear numerical shape analysis.
Idea #2: Quantitative Functions (data structures)
– Further reduces bounds analysis to linear invariant generation over uninterpreted functions.