ESP: Program Verification Of Millions of Lines of Code Manuvir Das Researcher PPRC Reliability Team Microsoft Research.



Motivation
- No Buffer Overruns!
- No Resource Leaks!
- No Privilege Misuse!

Approach
- Redundancy is good
- Redundancy exposes inconsistency
- Inconsistency points to errors
- Compare
  - what the programmer should do
  - what her code actually does

Lightweight specifications
- Rules
  - Describe correct behavior
  - Readable/writable by programmers
- Specify limited properties
  - not total correctness/verification
- Compare rules against code

Types are rules
- Programmers use types to
  - document interface syntax
  - represent program abstractions
- Types are written, read and checked
  - routine part of development process
- Why are types successful?
  - types are lightweight specifications
  - type checking is fast & routine
  - errors are found early, at compile-time

Can we extend this approach?
- Specify and check other properties
  - languages to express rules
  - tools to check that code obeys rules
- Goal is partial correctness
  - detect and report important classes of errors
  - no guarantee of program correctness
- Systematic tools of various flavors
  - compile-time verifiers and bug-finders
  - run-time monitors and fault injectors
  - document generators

Rule-based programming
[Diagram. Labels: Source Code; Development; Testing; Precise Rules; Program Analysis Engine; Static Verification Tool; Rules; Read for understanding; New API rules; Drive testing tools; 100% path coverage; Defects.]

ESP
[Diagram. Labels: C/C++ Code; OPAL Rules; Path-sensitive Dataflow Analysis; ESP; Rules; 100% path coverage; Defects.]

Requirements
- Scalability
  - Complete coverage
  - Millions of lines of code
  - All features of C/C++
- Usability
  - Low number of false positives
  - Simple rule description language
  - Informative error reports

The bottom line
- Can ESP verify a million lines of code?
- We’re not sure … yet
- We’ve done 150 KLOC in 70 s and 50 MB
- So, we’re cautiously optimistic

Are we running into a wall?
- Verification demands precision
  - Need to minimize false error reports
  - Must analyze each execution path
- Big programs demand scalability
  - Exponentially/infinitely many paths
  - Cannot analyze each execution path
  - Must use approximate analysis

Research problem
- Can we invent a verification method that
  - is always conservative,
  - is always scalable,
  - is almost always precise, and
  - matches our intuition?
- Yes, for a certain class of rules
  - Finite state, temporal safety properties

Finite state safety properties
- Property is described by an FSA
- As the program executes, a monitor
  - tracks the current state of the FSA
  - updates the current state
  - signals an error when the FSA transitions into special error states
- Goal of verification:
  - Is there some execution path that would cause the monitor to signal an error?
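The monitor described on this slide can be sketched as a small table-driven state machine. This is an illustrative sketch, not ESP's implementation: the state and event names come from the stdio example used in the following slides, while the type names, `monitor_step`, and `monitor_signals_error` are hypothetical. The Opened -> Error transition on Open and the absorbing Error state are assumptions read off the diagram labels.

```c
#include <stddef.h>

/* States and events of the file-handle property FSA (names from the slides). */
typedef enum { ST_CLOSED, ST_OPENED, ST_ERROR } State;
typedef enum { EV_OPEN, EV_PRINT, EV_CLOSE } Event;

/* NEXT[state][event]. Opened->Opened on Print, Opened->Closed on Close and
 * Closed->Error on Print/Close follow the rules given later in the slides.
 * ASSUMPTION: Opened->Error on Open, and Error is absorbing. */
static const State NEXT[3][3] = {
    /* ST_CLOSED */ { ST_OPENED, ST_ERROR,  ST_ERROR  },
    /* ST_OPENED */ { ST_ERROR,  ST_OPENED, ST_CLOSED },
    /* ST_ERROR  */ { ST_ERROR,  ST_ERROR,  ST_ERROR  },
};

/* Update the monitor's current state for one event. */
State monitor_step(State current, Event ev) {
    return NEXT[current][ev];
}

/* Replay an event trace from the initial state (Closed).
 * Returns 1 if the monitor signals an error at any point, 0 otherwise. */
int monitor_signals_error(const Event *trace, size_t n) {
    State s = ST_CLOSED;
    for (size_t i = 0; i < n; i++) {
        s = monitor_step(s, trace[i]);
        if (s == ST_ERROR) return 1;
    }
    return 0;
}
```

A run-time monitor replays one concrete trace at a time. Verification asks the same question for every trace the program could produce.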

Example: stdio usage in gcc
    void main () {
      if (dump) fil = fopen(dumpFile,"w");
      if (p) x = 0; else x = 1;
      if (dump) fclose(fil);
    }
[FSA diagram. States: Closed, Opened, Error. Edge labels: Open, Print, Open, Close, Print/Close, *.]
    void main () {
      if (dump) Open;
      if (p) x = 0; else x = 1;
      if (dump) Close;
    }

Path-sensitive property analysis
- Symbolically evaluate the program
- Track FSA state and execution state
- At branch points:
  - Execution state implies branch direction?
    - Yes: process appropriate branch
    - No: split state and process both branches
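The branch rule above can be sketched for a single boolean condition. This is a minimal sketch under simplifying assumptions: the execution state holds three-valued knowledge about one condition only, and the names `Truth`, `SymState`, and `split_at_branch` are hypothetical.

```c
/* Three-valued knowledge about a branch condition in the execution state. */
typedef enum { VAL_FALSE, VAL_TRUE, VAL_UNKNOWN } Truth;

/* A symbolic state: FSA state plus what is known about one condition. */
typedef struct { int fsa_state; Truth cond; } SymState;

/* Process a branch on the condition. Writes the successor state for each
 * feasible arm and returns a bitmask: bit 0 set = true arm is processed,
 * bit 1 set = false arm is processed. An arm that the execution state rules
 * out is skipped and its output is left untouched. */
int split_at_branch(SymState in, SymState *out_true, SymState *out_false) {
    int arms = 0;
    if (in.cond != VAL_FALSE) {       /* true arm is feasible */
        *out_true = in;
        out_true->cond = VAL_TRUE;    /* record the assumption on this path */
        arms |= 1;
    }
    if (in.cond != VAL_TRUE) {        /* false arm is feasible */
        *out_false = in;
        out_false->cond = VAL_FALSE;
        arms |= 2;
    }
    return arms;
}
```

When the condition is already known, only one arm is followed. When it is unknown, the state is split and each copy remembers which way it went, which is what makes later branches on the same condition decidable.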

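Example
[Control-flow graph of the stdio example. Node labels: entry, dump, p, x = 0, x = 1, Open, Close, exit, dump; branch edges labelled T and F.]
Symbolic states shown on the graph: [Opened|dump=T], [Closed|dump=T], [Opened|dump=T,p=T], [Opened|dump=T,p=T,x=0], [Closed|dump=T,p=T,x=0], [Closed], [Opened|dump=T,p=F], [Opened|dump=T,p=F,x=1], [Closed|dump=T,p=F,x=1].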

Dataflow property analysis
- Track only FSA state
- Ignore non-state-changing code
- At control flow join points:
  - Accumulate FSA states
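A minimal sketch of this analysis on the stdio example, representing a set of FSA states as a bit mask so that accumulating states at a join is a bitwise OR. The function names are hypothetical, and the Opened -> Error transition on Open and the absorbing Error state are assumptions. Because the branch conditions are ignored, the result contains Error even though no real execution reaches it.

```c
/* FSA states as bits, so a set of states fits in one unsigned value. */
#define ST_CLOSED 0x1u
#define ST_OPENED 0x2u
#define ST_ERROR  0x4u

typedef enum { EV_OPEN, EV_CLOSE } Event;

/* Transition for a single FSA state.
 * ASSUMPTION: Open while Opened is an error, and Error is absorbing. */
static unsigned step_one(unsigned state, Event ev) {
    if (state == ST_ERROR) return ST_ERROR;
    if (ev == EV_OPEN) return state == ST_CLOSED ? ST_OPENED : ST_ERROR;
    return state == ST_OPENED ? ST_CLOSED : ST_ERROR;   /* EV_CLOSE */
}

/* Transfer function: apply the event to every state in the set. */
unsigned transfer(unsigned set, Event ev) {
    unsigned out = 0;
    for (unsigned s = ST_CLOSED; s <= ST_ERROR; s <<= 1)
        if (set & s) out |= step_one(s, ev);
    return out;
}

/* Join point: accumulate FSA states from both incoming edges. */
unsigned join(unsigned a, unsigned b) { return a | b; }

/* if (dump) Open; if (p) x = 0; else x = 1; if (dump) Close; */
unsigned analyze_example(void) {
    unsigned s = ST_CLOSED;                  /* entry */
    s = join(transfer(s, EV_OPEN), s);       /* T arm opens, F arm does nothing */
    /* if (p) ... does not change the FSA state, so it is ignored */
    s = join(transfer(s, EV_CLOSE), s);      /* T arm closes, F arm does nothing */
    return s;
}
```

The set at exit is {Error, Closed, Opened}: the analysis cannot tell that the Close arm is only taken when the Open arm was, so it reports a false error.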

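Example
[Control-flow graph of the stdio example. Node labels: entry, dump, p, x = 0, x = 1, Open, Close, exit, dump; branch edges labelled T and F.]
FSA state sets shown on the graph: {Closed,Opened}, {Error,Closed,Opened}, {Closed}, {Closed,Opened}.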

Why is this code correct?
    void main () {
      if (dump) Open;
      if (p) x = 0; else x = 1;
      if (dump) Close;
    }
[FSA diagram. States: Closed, Opened, Error. Edge labels: Open, Print, Open, Close, Print/Close, *.]

When is a branch relevant?
- Precise answer
  - When the value of the branch condition determines the property FSA state
- Heuristic answer
  - When the property FSA is driven to different states along the arms of the branch statement

Property simulation
- Modification of path-sensitive analysis
- At control flow join points:
  - States agree on property FSA state?
    - Yes: merge states
    - No: process states separately
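The join rule can be sketched on the stdio example as follows. This is a simplified illustration, not ESP's implementation: the execution state tracks only what is known about `dump` and `p`, merging keeps only the facts both states agree on, and all type and function names are hypothetical. The Opened -> Error transition on Open and the absorbing Error state are assumptions.

```c
#include <stddef.h>

typedef enum { ST_CLOSED, ST_OPENED, ST_ERROR } State;
typedef enum { VAL_FALSE, VAL_TRUE, VAL_UNKNOWN } Truth;

/* Symbolic state: property FSA state plus execution state (two flags). */
typedef struct { State fsa; Truth dump; Truth p; } SymState;

/* After a join there is at most one symbolic state per FSA state. */
typedef struct { SymState items[3]; size_t n; } StateSet;

static Truth merge_fact(Truth a, Truth b) { return a == b ? a : VAL_UNKNOWN; }

/* Join rule: states with the same FSA state are merged, keeping only the
 * facts they agree on; states with different FSA states stay separate. */
void join_add(StateSet *set, SymState s) {
    for (size_t i = 0; i < set->n; i++) {
        if (set->items[i].fsa == s.fsa) {
            set->items[i].dump = merge_fact(set->items[i].dump, s.dump);
            set->items[i].p = merge_fact(set->items[i].p, s.p);
            return;
        }
    }
    set->items[set->n++] = s;
}

/* ASSUMPTION: Open while Opened is an error, and Error is absorbing. */
static State fsa_step(State s, int is_open) {
    if (s == ST_ERROR) return ST_ERROR;
    if (is_open) return s == ST_CLOSED ? ST_OPENED : ST_ERROR;
    return s == ST_OPENED ? ST_CLOSED : ST_ERROR;
}

/* if (dump) <Open or Close>; followed by the join after the branch. */
static StateSet branch_on_dump(const StateSet *in, int is_open) {
    StateSet out = { .n = 0 };
    for (size_t i = 0; i < in->n; i++) {
        SymState s = in->items[i];
        if (s.dump != VAL_FALSE) {               /* true arm feasible */
            SymState t = s;
            t.dump = VAL_TRUE;
            t.fsa = fsa_step(t.fsa, is_open);
            join_add(&out, t);
        }
        if (s.dump != VAL_TRUE) {                /* false arm feasible */
            SymState f = s;
            f.dump = VAL_FALSE;
            join_add(&out, f);
        }
    }
    return out;
}

/* if (p) x = 0; else x = 1; neither arm changes the FSA state. */
static StateSet branch_on_p(const StateSet *in) {
    StateSet out = { .n = 0 };
    for (size_t i = 0; i < in->n; i++) {
        SymState s = in->items[i];
        if (s.p != VAL_FALSE) { SymState t = s; t.p = VAL_TRUE;  join_add(&out, t); }
        if (s.p != VAL_TRUE)  { SymState f = s; f.p = VAL_FALSE; join_add(&out, f); }
    }
    return out;
}

StateSet simulate_example(void) {
    StateSet s = { .n = 0 };
    join_add(&s, (SymState){ ST_CLOSED, VAL_UNKNOWN, VAL_UNKNOWN });
    s = branch_on_dump(&s, 1);   /* if (dump) Open;  */
    s = branch_on_p(&s);         /* if (p) x = 0; else x = 1; */
    s = branch_on_dump(&s, 0);   /* if (dump) Close; */
    return s;
}
```

In this sketch the branch on `p` is merged away because both arms leave the FSA in the same state, while the correlation between `dump` and the FSA state survives the first join. The simulation therefore ends in the single state [Closed] with no Error.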

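Example
[Control-flow graph of the stdio example. Node labels: entry, dump, p, x = 0, x = 1, Open, Close, exit, dump; branch edges labelled T and F.]
Symbolic states shown on the graph: [Opened|dump=T], [Closed], [Closed|dump=T], [Closed|dump=F], [Opened|dump=T,p=T,x=0], [Opened|dump=T,p=F,x=1], [Opened|dump=T], [Closed|dump=F], [Closed|dump=T], [Closed|dump=F], [Closed].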

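Loop example
[Control-flow graph with a loop. Node labels: entry, *, new++, Close, exit, new != old, new = old, Open; branch edges labelled T and F.]
Symbolic states shown on the graph: [Closed], [Opened|new=old], [Closed|new=old+1], [Opened|new=old], [Closed|new=old], [Closed|new=old+1].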

Making property simulation work
- Real programs are complex
  - Multiple FSAs
  - Aliasing
- Real code bases are very large
  - Well beyond a million lines
- ESP = Property Simulation + Multiple FSAs + Aliasing + Component-wise Analysis

Problem: Multiple FSAs
    void main () {
      if (dump1) fil1 = fopen(dumpFile1,"w");
      if (dump2) fil2 = fopen(dumpFile2,"w");
      if (dump1) fclose(fil1);
      if (dump2) fclose(fil2);
    }
  Transition   Source code pattern
  Close        fclose(e)
  Open         e = fopen(_)
    void main () {
      if (dump1) Open(fil1);
      if (dump2) Open(fil2);
      if (dump1) Close(fil1);
      if (dump2) Close(fil2);
    }
[FSA diagram. States: Closed, Opened, Error. Edge labels: Open, Print, Open, Close, Print/Close, *.]

Property simulation, bit by bit
    void main () {
      if (dump1) Open;
      if (dump2) ID;
      if (dump1) Close;
      if (dump2) ID;
    }
    void main () {
      if (dump1) ID;
      if (dump2) Open;
      if (dump1) ID;
      if (dump2) Close;
    }
- Problem: property state can be exponential
- Solution: track one FSA at a time

Property simulation, bit by bit
- One FSA at a time
  + Avoids exponential property state
  + Fewer branches are relevant
  + Lifetimes are often short
  + Smaller memory footprint
  + Embarrassingly parallel
  − Cannot correlate FSAs
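To make "exponential property state" concrete, here is a small worked count under the assumption that the FSAs are independent and each has k states. Tracking n of them jointly gives up to k^n combined property states, while tracking one at a time gives n separate analyses over k states each. The function names are hypothetical.

```c
/* Upper bound on joint property states for n independent FSAs with k
 * states each: k^n. (Overflows for large inputs; fine for small counts.) */
unsigned long long joint_states(unsigned k, unsigned n) {
    unsigned long long total = 1;
    for (unsigned i = 0; i < n; i++)
        total *= k;
    return total;
}

/* Total FSA states across n separate one-FSA-at-a-time analyses: n * k. */
unsigned long long separate_states(unsigned k, unsigned n) {
    return (unsigned long long)k * n;
}
```

For the 15 file handles in the cc1 skeleton and the four states of the stdio rules shown later in the talk (Uninit, Closed, Opened, Error), `joint_states(4, 15)` is 1,073,741,824 while `separate_states(4, 15)` is 60.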

Problem: Aliasing
    void main () {
      if (dump1) fil1 = fopen(dumpFile1,"w");
      if (dump2) fil2 = fopen(dumpFile2,"w");
      fil3 = fil1;
      if (dump1) fclose(fil3);
      if (dump2) fclose(fil2);
    }

ESP Model: Values Have State
- During execution, the program
  - creates stateful values
  - changes the state of stateful values
- The programmer defines
  - how values are created (syntactic patterns)
  - how values change state (syntactic patterns)
- Syntactic expressions are aliases for values

OPAL Rule Descriptions
- Object Property Automata Language
    State Closed
    State Opened
    State Error
    Initial Event Open { _object_ ASTFUNCTIONCALL { ASTSYMBOL "fopen" } { _anyargs_ } }
    Event Close { ASTFUNCTIONCALL { ASTSYMBOL "fclose" } { _object_ } }
    Transition _ -> Opened on Open
    Transition Opened -> Closed on Close
    Transition Closed -> Error on Close "File already closed"

Parameterized transitions
    void main () {
      if (dump1) fil1 = fopen(dumpFile1,"w");
      if (dump2) fil2 = fopen(dumpFile2,"w");
      fil3 = fil1;
      if (dump1) fclose(fil3);
      if (dump2) fclose(fil2);
    }

Parameterized transitions
    void main () {
      if (dump1) { t1 = fopen(dumpFile1,"w"); Open(t1); fil1 = t1; }
      if (dump2) { t2 = fopen(dumpFile2,"w"); Open(t2); fil2 = t2; }
      fil3 = fil1;
      if (dump1) { fclose(fil3); Close(fil3); }
      if (dump2) { fclose(fil2); Close(fil2); }
    }

Expressions are value aliases
    void main () {
      if (dump1) { t1 = fopen(dumpFile1,"w"); Open(t1); fil1 = t1; }
      if (dump2) { t2 = fopen(dumpFile2,"w"); Open(t2); fil2 = t2; }
      fil3 = fil1;
      if (dump1) { fclose(fil3); Close(fil3); }
      if (dump2) { fclose(fil2); Close(fil2); }
    }

Value-alias analysis
- Is expression e an alias for value v?
- ESP uses GOLF to answer this query
- Generalized One Level Flow
  - Context-sensitive
  - Largely flow-insensitive
  - Millions of lines of code, in seconds

Putting it all together
- Property simulation
  - Identify and track relevant execution state
- Syntactic patterns + value-alias analysis
  - Identify and isolate individual FSAs
- One FSA at a time
  - Bit vector analysis for safety properties

Case study: stdio usage in gcc
- cc1 from gcc version (Spec95)
- Does cc1 always print to opened files?
- cc1 is a complex program:
  - 140K non-blank, non-comment lines of C
  - 2149 functions, 66 files, 1086 globals
  - Call graph includes one 450-function SCC

Skeleton of cc1 source
    FILE *f1, …, *f15;
    int p1, …, p15;
    void compileFile() {
      if (p1) f1 = fopen(…);
      …
      if (p15) f15 = fopen(…);
      restOfComp();
      if (p1) fclose(f1);
      …
      if (p15) fclose(f15);
    }
    void restOfComp() {
      if (p1) printRtl(f1);
      …
      if (p15) printRtl(f15);
      restOfComp();
    }
    void printRtl(FILE *f) { fprintf(f); }

OPAL rules for stdio usage
    State Uninit
    State Closed
    State Opened
    State Error
    Initial Event Decl {ASTDECLARATION {_object_ ASTSYMBOL _any_}}
    Initial Event Open {_object_ ASTFUNCTIONCALL {ASTSYMBOL "fopen"} {_anyargs_}}
    Event Print {ASTFUNCTIONCALL {ASTSYMBOL "fprintf"} {_object_,_anyargs_}}
    Event Close {ASTFUNCTIONCALL {ASTSYMBOL "fclose"} {_object_}}
    Transition _ -> Uninit on Decl
    Transition _ -> Opened on Open
    Transition Uninit -> Error on Print "File not opened"
    Transition Opened -> Opened on Print
    Transition Closed -> Error on Print "Printing to closed file"
    Transition Opened -> Closed on Close
    Transition Closed -> Error on Close "File already closed"
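The transition rules above map naturally onto a lookup table. The sketch below mirrors the seven transitions and their messages in C. It is not how ESP represents OPAL rules internally. The names `Rule`, `RULES`, and `apply_rules` are hypothetical, and leaving the state unchanged when no rule matches is an assumption, since the slide does not say what happens in that case.

```c
#include <stddef.h>

typedef enum { ST_UNINIT, ST_CLOSED, ST_OPENED, ST_ERROR, ST_ANY } State;
typedef enum { EV_DECL, EV_OPEN, EV_PRINT, EV_CLOSE } Event;

/* One OPAL transition: from-state (ST_ANY stands for "_"), triggering
 * event, to-state, and the optional error message. */
typedef struct { State from; Event on; State to; const char *message; } Rule;

static const Rule RULES[] = {
    { ST_ANY,    EV_DECL,  ST_UNINIT, NULL },
    { ST_ANY,    EV_OPEN,  ST_OPENED, NULL },
    { ST_UNINIT, EV_PRINT, ST_ERROR,  "File not opened" },
    { ST_OPENED, EV_PRINT, ST_OPENED, NULL },
    { ST_CLOSED, EV_PRINT, ST_ERROR,  "Printing to closed file" },
    { ST_OPENED, EV_CLOSE, ST_CLOSED, NULL },
    { ST_CLOSED, EV_CLOSE, ST_ERROR,  "File already closed" },
};

/* Apply the first matching rule. On return, *message (if non-NULL) holds
 * the rule's error message or NULL.
 * ASSUMPTION: when no rule matches, the state is left unchanged. */
State apply_rules(State current, Event ev, const char **message) {
    for (size_t i = 0; i < sizeof RULES / sizeof RULES[0]; i++) {
        if (RULES[i].on == ev &&
            (RULES[i].from == ST_ANY || RULES[i].from == current)) {
            if (message) *message = RULES[i].message;
            return RULES[i].to;
        }
    }
    if (message) *message = NULL;
    return current;
}
```

For example, applying `EV_PRINT` to a handle in `ST_UNINIT` yields `ST_ERROR` with the message "File not opened", matching the third transition on the slide.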

Experimental results
- Precision
  - Verification succeeds for every file handle
  - No transitions to Error; no false errors
- Scalability
  - Average per handle: 72.9 seconds, 49.7 MB
  - Single 1 GHz PIII laptop with 512 MB RAM
- We have proved that:
  - Each of the 646 calls to fprintf in the source code prints to a valid, open file

Ongoing research
- Path-sensitive value-alias analysis
  - Value-alias sets
    - Expressions that hold tracked value
  - Track value-alias sets during simulation
  - Add value-alias sets to property state
  - When things get complicated, use GOLF
- Component-wise analysis
  - Identify and analyze components
  - Link using less precise analysis