How analysis can hinder source code manipulation, and what to do about it — Michael Ernst, University of Washington

Static analysis pros and cons
Benefits of program analysis:
+ identifies/prevents bugs
+ enables transformation
+ aids re-engineering & other development tasks
Program analysis is mandatory for effective software development
This reliance can have negative effects:
- delays testing
- discourages restructuring/refactoring

My background & biases
My previous research thrusts: slicing, dynamic analysis, testing, types, security (not clones, much)
Today’s talk: inspired by types and testing
– Type-checking is the analysis
– Transformation is the manipulation
Many of the ideas generalize more broadly
– Type inference
– General restructuring

Why is program transformation hard? Case study: convert a program to a stronger type system

Two conclusions of this talk
1. We need to strengthen type systems
2. We need to weaken type systems

Upgrading your type system
For untyped programs:
– Add types to Scheme, Python
For typed programs:
– Generics (parametric polymorphism): Java 5
– Information flow (e.g., Jif)
– Pluggable type qualifiers: nullness, immutability, …
For stronger properties:
– Extended Static Checking: array bounds
– Discharge arbitrary assertions

Java Generics
Convert Java 1.4 code to Java 5 code
– Key benefit: type-safe containers
Instantiation problem: List myList = …;  →  List<…> myList = …;
Parameterization problem
– Adding parameters changes the type constraints:
  class Map { Object get(Object key) { … } }  →  class Map<K,V> { V get(K key) { … } }
Series of research papers
– OOPSLA 2004, ECOOP 2005, ICSE 2007
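To make the instantiation problem above concrete, here is a minimal before/after sketch (hypothetical code; the String element type is only an assumption for illustration):

```java
import java.util.ArrayList;
import java.util.List;

public class GenericsMigration {
    // Java 1.4 style: a raw container; every use site needs a cast.
    static String firstRaw(List rawNames) {
        return (String) rawNames.get(0);      // unchecked at compile time
    }

    // Java 5 style: the element type is part of the declaration.
    static String firstTyped(List<String> names) {
        return names.get(0);                  // no cast; checked statically
    }

    public static void main(String[] args) {
        List<String> names = new ArrayList<String>();
        names.add("alice");
        System.out.println(firstRaw(names) + " / " + firstTyped(names));
    }
}
```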

Java standard libraries
The JDK libraries are not generics-correct
– The signatures use generic types
– The implementations do not type-check
Retained from Java 1.4, usually without change
Why?
– Old design flaws were revealed
– Danger of refactoring was too great
– The code already worked
– It wasn’t worth it

Immutability
Goal: avoid unintended side effects
– in the context of an imperative language
Dozens of papers proposing type systems
Little empirical evaluation
– Javari: 160KLOC case studies (OOPSLA 2005)
– IGJ: 106KLOC case studies (FSE 2007)
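As a rough sketch of the property these systems check (the annotation is spelled as a comment here; the concrete syntax differs between Javari and IGJ, so treat the names as illustrative):

```java
import java.util.Date;

public class ReadOnlyExample {
    // The parameter is intended as a read-only reference: the method promises not to
    // mutate the Date it receives. A Javari/IGJ-style checker verifies that promise.
    static long millisSince(/*readonly*/ Date start) {
        // start.setTime(0);  // a reference-immutability checker would reject this mutation
        return System.currentTimeMillis() - start.getTime();
    }

    public static void main(String[] args) {
        System.out.println(millisSince(new Date(0)));
    }
}
```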

Obstacles to evaluation and use of a type system
Building a type checker is hard
– Solution: Checker Framework (ISSTA 2008)
Building a type inference is hard
– Example: Javarifier (ECOOP 2008)
– Inference is much harder than checking
– No general framework for implementations
Programs resist transformation
– Often type-incorrect
– Difficult to make type-correct

Eliminating null pointer exceptions
A common analysis
– NPEs are pervasive and important
– Problem is simple to express
– We should be able to solve this problem
Goal: prove all dereferences safe
– Suppose: 95% success rate, program with 100,000 dereferences
– 5000 dereferences to check manually!
– Empirically, how many are false positives?
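For instance, a pluggable nullness checker in the style of the Checker Framework asks that every dereference of a possibly-null value be guarded; a minimal sketch, assuming the org.checkerframework nullness annotations are on the classpath:

```java
import org.checkerframework.checker.nullness.qual.Nullable;

public class NullnessExample {
    // s may be null, so the checker demands a test before any dereference.
    static int length(@Nullable String s) {
        if (s == null) {
            return 0;
        }
        // Safe here: the checker knows s is non-null on this path.
        // Dereferencing s without the test would be flagged as a possible NPE.
        return s.length();
    }
}
```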

Analysis power vs. transparency
A powerful analysis can prove many facts
– Example: analysis that considers all possible execution flows in your program
– Pointer analysis, type inference
A transparent analysis has comprehensible behavior and results
– Results depend on local information only
– Small change to program ⇒ small change in analysis results
Making an analysis more transparent
– Concrete error cases & counterexamples, …
– User-supplied annotations
Do programmers need more power or transparency?
– We need research addressing this question

Feedback: the essence of engineering
Feedback during the development process is essential for improving software quality.
Two useful types of feedback:
– static checking via type systems and code analysis
– dynamic checking via tests and experimentation

Get the feedback you need
Static analysis:
– Is the design sound and consistent? Does the system fit together as a whole and implement the design?
– Guarantees the absence of errors
– Alternative: discover errors piecemeal during testing or in the field
Experimentation and testing:
– Does this algorithm work? Do the pieces fit together to implement the desired functionality?
– Many important properties are beyond any static analysis: user satisfaction, but algorithmic properties too
Each is most useful in certain circumstances
– Sound reasoning can trump quick experimentation
– Quick experimentation can trump sound reasoning

Problem: programmers cannot choose between these approaches
A programmer should always be able to
– execute the software to run tests
– statically verify (types or other properties)
Current languages and environments impose their own model of feedback
– Depends largely on the typing discipline

Dynamically-typed languages
Good support for testing: at any moment, run tests
– No need to bother with irritating type declarations
No support for type-checking
– And most other static analyses are also very hard
– Errors persist, and are even discovered in the field
– Simple errors turn into debugging marathons
– Also true for statically-typed languages without support for immutability or other properties
Programmers attempt to emulate a type system
– Naming conventions, comments, assertions, unit tests
– Less effective and more work than type-checking
– Despite its roots, aggressive testing is still desirable
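To stay with one example language, here is that emulation pattern rendered in Java terms: a naming convention plus a run-time assertion standing in for what a nullness qualifier would state statically (hypothetical code):

```java
public class EmulatedTypeCheck {
    // Convention: parameters whose names end in "NonNull" must not be null.
    // The assertion fires only at run time, only on executed paths, and only
    // when assertions are enabled -- far weaker than a static type check.
    static int length(String sNonNull) {
        assert sNonNull != null : "convention violated: sNonNull must not be null";
        return sNonNull.length();
    }

    public static void main(String[] args) {
        System.out.println(length("hello"));
    }
}
```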

Statically-typed languages
Supports both testing and type-checking, but in a specific order:
1. First, type-check: all types must be perfect
2. Then, experiment with the system
You cannot run the system if type errors exist
– This delays the learning that comes from experimentation
– Type system fails to capture many important properties of software
– A quick experiment might have indicated a problem and saved time overall
– In other cases, type-checking saves time overall
This approach stems from:
– assumptions about what errors are important
– desire to simplify compilers & other tools
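A small hypothetical illustration of the ordering problem: if the commented-out line below were active, its type error would keep the whole class from compiling, and therefore keep main's unrelated quick experiment from running at all:

```java
public class BlockedExperiment {
    // Unfinished helper the author plans to fix later.
    static int parsePort(String s) {
        // return s;   // type error: incompatible types, String -> int;
        //             // with this line active, javac rejects the entire class
        return Integer.parseInt(s);
    }

    public static void main(String[] args) {
        // The quick experiment the author actually wants to try right now:
        System.out.println("connecting to port " + parsePort("8080") + "...");
    }
}
```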

Who is in charge?
Both language-imposed approaches lead to frustration and wasted effort
The programmer should be able to do either type of checking at any time, because the programmer
– knows the biggest risks or uncertainties at the moment
– knows how to address them
… or even do both at the same time

Two research questions
1. When is static feedback (types) most useful? When is dynamic feedback (testing) most useful?
– Not: “When does lack of X get most in the way?”
2. What language model or toolset gives the benefits of both approaches?

Giving the benefits of both approaches
– Add a static type system to a dynamically-typed language
– Relax a static type system

Add types to a dynamically-typed language
Popular approach among academics
– Bring religion to the heathens
Not yet successful
– Java generics can be viewed as an exception
– Maybe pluggable types will be an exception

Difficulties with re-engineering a program to add stronger types
The design may not be expressible in a statically type-safe way
– May take advantage of the benefits of dynamic typing
– Requires many loopholes (casts, suppressed warnings), limiting its value
The high-level design may be instantiable type-safely, but this instantiation is not
– Programmer had no pressure to make the program type-safe
– Programmer chose an implementation strategy that does not preserve type safety
– Would have been easy to choose the other implementation approach from the start
– The type-safe version is better, but you learn that in retrospect
There may be too many errors to correct
– Cost of change may outweigh benefits
– Especially if the program works now
– Every change carries a risk
Design may be too hard to understand
– Better analysis tools are required
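A hypothetical example of the “loopholes” point: retrofitting a generic type onto legacy code that builds a heterogeneous list forces an unchecked cast, and the suppressed warning weakens exactly the guarantee the new type was meant to provide:

```java
import java.util.ArrayList;
import java.util.List;

public class RetrofittedTypes {
    // Legacy code built a heterogeneous list; the new signature claims List<String>.
    @SuppressWarnings({"rawtypes", "unchecked"})
    static List<String> legacyNames() {
        List raw = new ArrayList();
        raw.add("alice");
        raw.add(42);                   // violates the claimed element type
        return (List<String>) raw;     // loophole: unchecked cast silences the checker
    }

    public static void main(String[] args) {
        List<String> names = legacyNames();
        System.out.println(names.get(0));   // works
        // String s = names.get(1);         // would fail at run time, far from the cause
    }
}
```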

A few reasons for type errors
– Heterogeneity
– Application invariants
– Simple bugs
Especially when the property is checkable at run time
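For the “application invariants” case, a hypothetical sketch of a property the program maintains and can check at run time but cannot state in Java’s type system (all names are illustrative):

```java
import java.util.HashMap;
import java.util.Map;

public class AppInvariant {
    // Invariant maintained by the application, not by the types:
    // keys ending in "_count" always map to Integer values.
    static final Map<String, Object> settings = new HashMap<>();

    static int getCount(String key) {
        assert key.endsWith("_count") : "not a count key: " + key;
        Object value = settings.getOrDefault(key, 0);
        return (Integer) value;   // checkable at run time; invisible to the static types
    }

    public static void main(String[] args) {
        settings.put("retry_count", 3);
        System.out.println(getCount("retry_count"));
    }
}
```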

The transformation is possible But if you want a typed program, it’s best to aim at that from the beginning

Relax a statically-typed language
This approach has not been attempted before (to my knowledge)
Start with a typed program
Temporarily, at defined moments in the development lifecycle, ignore the types
– Treat the program as if it had been written in a dynamically typed language
– Ignore types during compilation and execution
– Exception: resolving overloading/dispatch
– Exception: log any errors that would have been thrown
A “weekend release” for type systems

Not just dynamic typing or optional typing
Different philosophy from dynamically typed languages: dynamically typed execution is temporary, during testing or development
– Programmer will re-introduce types after the dynamic experiment
– Programmer views loopholes in the static type system as rarities, not commonplaces

Implementation approach
– Mask but log all errors
– If the execution fails, show the log to find the first evidence of the failure
Prototype shows promise of this approach
User testing is necessary
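This is not the prototype mentioned above, just a toy Java sketch of the “mask but log” idea: a would-be type error is recorded rather than thrown, execution continues, and the log later points to the first evidence of trouble:

```java
import java.util.ArrayList;
import java.util.List;

public class MaskedTypeErrors {
    static final List<String> typeErrorLog = new ArrayList<>();

    // Record the mismatch instead of failing immediately, then let execution continue,
    // much as a dynamically typed language would.
    @SuppressWarnings("unchecked")
    static <T> T maskedCast(Class<T> expected, Object value) {
        if (value != null && !expected.isInstance(value)) {
            typeErrorLog.add("expected " + expected.getName()
                    + ", got " + value.getClass().getName());
        }
        return (T) value;   // erased cast: nothing is checked here
    }

    public static void main(String[] args) {
        Object fromElsewhere = "42";   // a String where an Integer was assumed
        Object masked = maskedCast(Integer.class, fromElsewhere);
        System.out.println("execution continued; masked value = " + masked);
        System.out.println("type error log: " + typeErrorLog);
    }
}
```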

Changing type systems
Strengthen them to enforce more guarantees
Weaken them to enable better experimentation
– Enables better transformations as well
Instances of:
– combining static and dynamic analysis
– inverting common approaches
Can be extended to other analyses and transformations