Safely Supporting Probabilistic Data: PL Techniques as Part of the Story Dan Grossman University of Washington.

Slides:



Advertisements
Similar presentations
Demand-driven inference of loop invariants in a theorem prover
Advertisements

Automated Theorem Proving Lecture 1. Program verification is undecidable! Given program P and specification S, does P satisfy S?
A Program Transformation For Faster Goal-Directed Search Akash Lal, Shaz Qadeer Microsoft Research.
Abstraction and Modular Reasoning for the Verification of Software Corina Pasareanu NASA Ames Research Center.
Programming Abstractions for Approximate Computing Michael Carbin with Sasa Misailovic, Hank Hoffmann, Deokhwan Kim, Stelios Sidiroglou, Martin Rinard.
A survey of techniques for precise program slicing Komondoor V. Raghavan Indian Institute of Science, Bangalore.
An Integration of Program Analysis and Automated Theorem Proving Bill J. Ellis & Andrew Ireland School of Mathematical & Computer Sciences Heriot-Watt.
CSE503: SOFTWARE ENGINEERING SYMBOLIC TESTING, AUTOMATED TEST GENERATION … AND MORE! David Notkin Spring 2011.
C. FlanaganSAS’04: Type Inference Against Races1 Type Inference Against Races Cormac Flanagan UC Santa Cruz Stephen N. Freund Williams College.
A Type-Checked Restrict Qualifier Jeff Foster OSQ Retreat May 9-10, 2001.
Well-cooked Spaghetti: Weakest-Precondition of Unstructured Programs Mike Barnett and Rustan Leino Microsoft Research Redmond, WA, USA.
School of Computer ScienceG53FSP Formal Specification1 Dr. Rong Qu Introduction to Formal Specification
Composing Dataflow Analyses and Transformations Sorin Lerner (University of Washington) David Grove (IBM T.J. Watson) Craig Chambers (University of Washington)
CS527: (Advanced) Topics in Software Engineering Overview of Software Quality Assurance Tao Xie ©D. Marinov, T. Xie.
CUTE: A Concolic Unit Testing Engine for C Technical Report Koushik SenDarko MarinovGul Agha University of Illinois Urbana-Champaign.
1 Debugging and Testing Overview Defensive Programming The goal is to prevent failures Debugging The goal is to find cause of failures and fix it Testing.
Mathematical Modeling and Formal Specification Languages CIS 376 Bruce R. Maxim UM-Dearborn.
Type Systems CS Definitions Program analysis Discovering facts about programs. Dynamic analysis Program analysis by using program executions.
Writing Systems Software in a Functional Language An Experience Report Iavor Diatchki, Thomas Hallgren, Mark Jones, Rebekah Leslie, Andrew Tolmach.
QuickCheck: A Lightweight Tool for Random Testing of Haskell Programs By Koen Claessen, Juhn Hughes ME: Mike Izbicki.
A Specification Logic for Exceptions and Beyond Cristina David Cristian Gherghina National University of Singapore.
Types and Programming Languages Lecture 11 Simon Gay Department of Computing Science University of Glasgow 2006/07.
Random Interpretation Sumit Gulwani UC-Berkeley. 1 Program Analysis Applications in all aspects of software development, e.g. Program correctness Compiler.
Software Testing Mehwish Shafiq. Testing Testing is carried out to validate and verify the piece developed in order to give user a confidence to use reliable.
CUTE: A Concolic Unit Testing Engine for C Koushik SenDarko MarinovGul Agha University of Illinois Urbana-Champaign.
Introduction to Software Analysis CS Why Take This Course? Learn methods to improve software quality – reliability, security, performance, etc.
EnerJ: Approximate Data Types for Safe and General Low-Power Computation (PLDI’2011) Adrian Sampson, Werner Dietl, Emily Fortuna Danushen Gnanapragasam,
Rely: Verifying Quantitative Reliability for Programs that Execute on Unreliable Hardware Michael Carbin, Sasa Misailovic, and Martin Rinard MIT CSAIL.
CSE 331 SOFTWARE DESIGN & IMPLEMENTATION SYMBOLIC TESTING Autumn 2011.
LECTURE 10 Semantic Analysis. REVIEW So far, we’ve covered the following: Compilation methods: compilation vs. interpretation. The overall compilation.
A Single Intermediate Language That Supports Multiple Implemtntation of Exceptions Delvin Defoe Washington University in Saint Louis Department of Computer.
Functional Programming
The architecture of the P416 compiler
Type Checking and Type Inference
CSE341: Programming Languages Lecture 11 Type Inference
Data Flow Analysis Suman Jana
Types for Programs and Proofs
CSE 374 Programming Concepts & Tools
Representation, Syntax, Paradigms, Types
Testing UW CSE 160 Winter 2017.
Ch. 4 – Semantic Analysis Errors can arise in syntax, static semantics, dynamic semantics Some PL features are impossible or infeasible to specify in grammar.
(One-Path) Reachability Logic
Types of Testing Visit to more Learning Resources.
Testing UW CSE 160 Spring 2018.
The Future of Software Engineering: Tools
Stack Memory 1 (also called Call Stack)
Phil Tayco Slide version 1.0 Created Oct 2, 2017
Securing A Compiler Transformation
Chapter 15 Debugging.
University Of Virginia
Lecture 5 Floyd-Hoare Style Verification
Testing UW CSE 160 Winter 2016.
Information Security CS 526
The Procedure Abstraction Part I: Basics
CSE341: Programming Languages Lecture 11 Type Inference
Representation, Syntax, Paradigms, Types
Semantics In Text: Chapter 3.
Chapter 15 Debugging.
CSE341: Programming Languages Lecture 11 Type Inference
Representation, Syntax, Paradigms, Types
Information Security CS 526
Representation, Syntax, Paradigms, Types
CSE341: Programming Languages Lecture 11 Type Inference
CUTE: A Concolic Unit Testing Engine for C
Assertions References: internet notes; Bertrand Meyer, Object-Oriented Software Construction; 4/25/2019.
CSE 1020:Software Development
Information Security CS 526
CSE341: Programming Languages Lecture 11 Type Inference
Adding Uncertainty to a RETE-OO Rule Engine
CSE341: Programming Languages Lecture 11 Type Inference
Presentation transcript:

Safely Supporting Probabilistic Data: PL Techniques as Part of the Story Dan Grossman University of Washington

Executive summary Language design+implementation to help wrangle uncertainty PL concepts+tools are a huge help But also need to learn from statisticians Validate these claims with UW’s approximate computing work Acknowledgments: 9 co-authors [papers at end] Especially Adrian Sampson, Luis Ceze Including Kathryn and Todd

Background (1/2): PL bread-and-butter Types for information flow int<H> x; int<L> y; if(x) y = 7; ✗ Symbolic execution z: z = x; if(x!=0) z = x*y; ? : == x * y Type inference let f =  y. y+7 let z = f 9 let q = z && true ✗ Function inlining/specialization int f(int x, int y){ return x*y; } f(0,a) 0 f(3,b) f(3,b) f(1,c) c

Background (2/2): Approximation Full bit-precision is unnecessary and wastes energy Allowing probabilistic [in]correctness can work! Let ALUs and memory produce garbage with low-nonzero probability But most code/programmers want nothing to do with that…

EnerJ (and EnerC) a la 2011 Information flow is exactly the right high-level abstraction Type qualifier for @approx Explicit endorse as needed Convenient: Opt-in with precise default Overloaded operations and methods Strong guarantee: Approximate data has no effect on precise data except via endorse (classic non-interference theorem) @approx int x = 12; int y = 27; y = x*2; x = y*3; @approx int z = f(x); if(looks_okay(z)) int w = endorse(z); … ✗

EnerJ limitations Only “best effort” semantics for approximate computation Encapsulated all the probability, and then ignored it! No approximate control-flow (without endorse) Stronger limitation to ensure non-interference, no crashes, no extra non-termination, …

Adding probabilities (2015) Address limitation #1 directly: @approx<p> int: static guarantee that at run-time value will be correct with at least probability p Operator uses (e.g., +) also have correctness probability EnerJ’s @approx is @approx<0.0> Precise is @approx<1.0> Natural subtyping: @approx<p> t <: @approx<q> t if p >= q [See also Mike’s Rely and Chisel work (2014)]

Essential additions Type inference, part 1: Programmer states probabilities at key points (inputs, outputs) Automatic solver fills in the rest, and/or programmer can provide more annotations Type inference, part 2: Problem often under-constrained; goal is to save as much energy as possible within constraints We use Microsoft’s Z3 solver with a custom objective function

Essential additions Type inference, part 1: Programmer states probabilities at key points (inputs, outputs) Automatic solver fills in the rest, and/or programmer can provide more annotations Type inference, part 2: Problem often under-constrained; goal is to save as much energy as possible within constraints We use Microsoft’s Z3 solver with a custom objective function Method specialization Up to k approximation settings for each method Opt-in dynamic tracking for loop-carried dependencies

Still not much statistics Additions are all “PL bread-and-butter” Uses only one trivial statistical fact: Result type is precise if x, y, (and addition) are independent Result type is sound regardless of [in]dependence Other panelists all make much better use of statistics, like in our probabilistic assertions work… @approx<p1> int x = …; @approx<p2> int y = …; x +<p3> y // @approx<p1*p2*p3>

Probabilistic assertions (2014) Much richer setting: Inputs/values can have arbitrary distributions, not just “Bernouilli failure” Dependence tracked via symbolic execution, even through if-statements and some loops Evaluate arbitrary probabilistic assertions: passert(e,p,c) Key insight: Data-structure produced by symbolic execution is an “expression DAG” and a “Bayesian network” So apply compiler and statistical optimizations to it Followed by hypothesis testing

The limitation EnerJ and follow-on work gave static guarantees regardless of input Probabilistic assertions either revalidates for each input (testing) or needs probabilistic assumptions (distributions) of inputs Need more research on: Bridging this gap Supporting unbounded loops by soundly trimming low- probability paths

The big context “Early days” Excited by the panel’s work, but many open questions… Technical questions: (loops, modularity, scale, …) Tools questions, also with some preliminary work Debugging, profiling, monitoring Error messages Is “adding statistical properties” to modern language design The True Way Forward or a local optimum to avoid?

To learn more EnerJ: Approximate Data Types for Safe and General Low-Power Computation. Adrian Sampson, Werner Dietl, Emily Fortuna, Danushen Gnanapragasam, Luis Ceze, Dan Grossman. PLDI2011 Expressing and Verifying Probabilistic Assertions. Adrian Sampson, Pavel Panchekha, Todd Mytkowicz, Kathryn S. McKinley, Dan Grossman, Luis Ceze. PLDI2014 Probability Type Inference for Flexible Approximate Programming. Brett Boston, Adrian Sampson, Dan Grossman, Luis Ceze. OOPSLA 2015