ESP: Program Verification Of Millions of Lines of Code Manuvir Das Researcher PPRC Reliability Team Microsoft Research.



2 ESP: Program Verification Of Millions of Lines of Code
Manuvir Das
Researcher
PPRC Reliability Team
Microsoft Research

3 Motivation
- No Buffer Overruns!
- No Resource Leaks!
- No Privilege Misuse!

4 Approach
- Redundancy is good
- Redundancy exposes inconsistency
- Inconsistency points to errors
- Compare
  - what the programmer should do
  - what her code actually does

5 Lightweight specifications
- Rules
  - Describe correct behavior
  - Readable/writable by programmers
- Specify limited properties
  - not total correctness/verification
- Compare rules against code

6 Types are rules
- Programmers use types to
  - document interface syntax
  - represent program abstractions
- Types are written, read and checked
  - routine part of development process
- Why are types successful?
  - types are lightweight specifications
  - type checking is fast & routine
  - errors are found early, at compile-time

7 Can we extend this approach?
- Specify and check other properties
  - languages to express rules
  - tools to check that code obeys rules
- Goal is partial correctness
  - detect and report important classes of errors
  - no guarantee of program correctness
- Systematic tools of various flavors
  - compile-time verifiers and bug-finders
  - run-time monitors and fault injectors
  - document generators

8 Rule-based programming
[Diagram; surviving labels: Source Code, Testing, Development, Precise Rules, Program Analysis Engine, Read for understanding, New API rules, Drive testing tools, Static Verification Tool, Rules, 100% path coverage, Defects]

9 ESP
[Diagram; surviving labels: C/C++ Code, OPAL Rules, Path-sensitive Dataflow Analysis, ESP, Rules, 100% path coverage, Defects]

10 Requirements
- Scalability
  - Complete coverage
  - Millions of lines of code
  - All features of C/C++
- Usability
  - Low number of false positives
  - Simple rule description language
  - Informative error reports

11 The bottom line
- Can ESP verify a million lines of code?
- We’re not sure ... yet
- We’ve done 150 KLOC in 70 s and 50 MB
- So, we’re cautiously optimistic

12 Are we running into a wall?
- Verification demands precision
  - Need to minimize false error reports
  - Must analyze each execution path
- Big programs demand scalability
  - Exponentially/infinitely many paths
  - Cannot analyze each execution path
  - Must use approximate analysis

13 Research problem
- Can we invent a verification method that
  - is always conservative,
  - is always scalable,
  - is almost always precise, and
  - matches our intuition?
- Yes, for a certain class of rules
  - Finite state, temporal safety properties

14 Finite state safety properties
- Property is described by an FSA
- As the program executes, a monitor
  - tracks the current state of the FSA
  - updates the current state
  - signals an error when the FSA transitions into special error states
- Goal of verification:
  - Is there some execution path that would cause the monitor to signal an error?

15 Example: stdio usage in gcc

    void main() {
        if (dump)
            fil = fopen(dumpFile, "w");
        if (p)
            x = 0;
        else
            x = 1;
        if (dump)
            fclose(fil);
    }

The same code with stdio calls replaced by FSA events:

    void main() {
        if (dump)
            Open;
        if (p)
            x = 0;
        else
            x = 1;
        if (dump)
            Close;
    }

[Figure: property FSA with states Closed, Opened, Error; edge labels Open, Print, Open, Close, Print/Close, *]
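The monitor described on the previous slide can be sketched as a table-driven FSA. This is a minimal illustration, not ESP's implementation: the names `State`, `Event`, `fsa_step` and `fsa_run` are hypothetical, and the edge "Open while already Opened goes to Error" is an assumption, since the diagram's edges cannot be fully recovered from the transcript.

```c
#include <assert.h>
#include <stddef.h>

/* States and events of the stdio property FSA (names are illustrative). */
typedef enum { ST_CLOSED, ST_OPENED, ST_ERROR, NUM_STATES } State;
typedef enum { EV_OPEN, EV_PRINT, EV_CLOSE, NUM_EVENTS } Event;

/* Transition table indexed by [state][event].
 * Assumption: Open on an already opened file is an error. */
static const State TRANSITION[NUM_STATES][NUM_EVENTS] = {
    /* Closed */ { ST_OPENED, ST_ERROR,  ST_ERROR  },
    /* Opened */ { ST_ERROR,  ST_OPENED, ST_CLOSED },
    /* Error  */ { ST_ERROR,  ST_ERROR,  ST_ERROR  },
};

/* Advance the monitor by one event. */
State fsa_step(State s, Event e) {
    assert(s < NUM_STATES && e < NUM_EVENTS);
    return TRANSITION[s][e];
}

/* Run the monitor over a whole event trace, starting in Closed. */
State fsa_run(const Event *trace, size_t n) {
    State s = ST_CLOSED;
    for (size_t i = 0; i < n; i++) {
        s = fsa_step(s, trace[i]);
    }
    return s;
}
```

Feeding the trace Open, Print, Close to `fsa_run` ends in Closed, while Open, Close, Print ends in Error. Verification asks whether any feasible program path produces a trace of the second kind.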

16 Path-sensitive property analysis
- Symbolically evaluate the program
- Track FSA state and execution state
- At branch points:
  - Execution state implies branch direction?
    - Yes: process appropriate branch
    - No: split state and process both branches

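17 Example
[Figure: control-flow graph of the stdio example (entry, branch on dump, Open, branch on p, x = 0, x = 1, branch on dump, Close, exit; edges labeled T/F), annotated with these symbolic states:]
- [Closed]
- [Opened|dump=T]
- [Opened|dump=T,p=T], [Opened|dump=T,p=T,x=0], [Closed|dump=T,p=T,x=0]
- [Opened|dump=T,p=F], [Opened|dump=T,p=F,x=1], [Closed|dump=T,p=F,x=1]
- [Closed|dump=T]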
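The branch rule of path-sensitive analysis (follow one arm when the execution state decides the condition, otherwise split) might be sketched as follows. This is a simplified illustration under stated assumptions: conditions are single boolean variables, and `SymState`, `Fact` and `branch` are hypothetical names rather than anything taken from ESP.

```c
#include <assert.h>

typedef enum { ST_CLOSED, ST_OPENED, ST_ERROR } State;
typedef enum { UNKNOWN, IS_TRUE, IS_FALSE } Fact;
enum { VAR_DUMP, VAR_P, NUM_VARS };

/* One symbolic state: FSA state plus what is known about each variable. */
typedef struct {
    State fsa;
    Fact  vars[NUM_VARS];
} SymState;

/* Process a branch on boolean variable `var`.
 * If the true arm is feasible, *t receives the state entering it;
 * likewise *f for the false arm. An infeasible arm's output is left untouched.
 * Returns a bitmask: bit 0 = true arm reachable, bit 1 = false arm reachable. */
int branch(const SymState *in, int var, SymState *t, SymState *f) {
    assert(var >= 0 && var < NUM_VARS);
    int arms = 0;
    if (in->vars[var] != IS_FALSE) {   /* not known false: true arm feasible */
        *t = *in;
        t->vars[var] = IS_TRUE;        /* record what the branch taught us */
        arms |= 1;
    }
    if (in->vars[var] != IS_TRUE) {    /* not known true: false arm feasible */
        *f = *in;
        f->vars[var] = IS_FALSE;
        arms |= 2;
    }
    return arms;
}
```

In the stdio example, the first `if (dump)` splits [Closed] into two states, while the second `if (dump)` sees dump=T already recorded and follows only the true arm, so Close is never applied to a state where the file was not opened.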

18 Dataflow property analysis
- Track only FSA state
- Ignore non-state-changing code
- At control flow join points:
  - Accumulate FSA states

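19 Example
[Figure: control-flow graph of the stdio example (entry, branch on dump, Open, branch on p, x = 0, x = 1, branch on dump, Close, exit; edges labeled T/F), annotated with these FSA state sets:]
- {Closed}
- {Closed,Opened}
- {Closed,Opened}
- {Error,Closed,Opened}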
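The imprecision of the dataflow approach on this example can be reproduced with a small bit-vector sketch. It assumes the same Closed/Opened/Error FSA as the stdio slide (with Open on an opened file treated as an error, which is an assumption), and the helper names `apply_event`, `join`, `maybe_event` and `analyze_example` are hypothetical.

```c
#include <assert.h>

typedef enum { ST_CLOSED, ST_OPENED, ST_ERROR, NUM_STATES } State;
typedef enum { EV_OPEN, EV_CLOSE, NUM_EVENTS } Event;
typedef unsigned StateSet;              /* bit i set <=> state i is possible */

#define SET_OF(s) (1u << (s))

static const State TRANSITION[NUM_STATES][NUM_EVENTS] = {
    /* Closed */ { ST_OPENED, ST_ERROR  },
    /* Opened */ { ST_ERROR,  ST_CLOSED },
    /* Error  */ { ST_ERROR,  ST_ERROR  },
};

/* Transfer function: apply event e to every state in the set. */
StateSet apply_event(StateSet in, Event e) {
    assert(e < NUM_EVENTS);
    StateSet out = 0;
    for (int s = 0; s < NUM_STATES; s++) {
        if (in & SET_OF(s)) out |= SET_OF(TRANSITION[s][e]);
    }
    return out;
}

/* Join at a control-flow merge: accumulate (union) the incoming sets. */
StateSet join(StateSet a, StateSet b) { return a | b; }

/* `if (cond) event;` with the condition ignored: join both arms. */
StateSet maybe_event(StateSet in, Event e) {
    return join(apply_event(in, e), in);
}

/* The example: if (dump) Open; if (p) ...; if (dump) Close; */
StateSet analyze_example(void) {
    StateSet s = SET_OF(ST_CLOSED);     /* entry: {Closed} */
    s = maybe_event(s, EV_OPEN);        /* {Closed, Opened} */
    /* if (p) x = 0; else x = 1; changes no FSA state, so it is skipped */
    s = maybe_event(s, EV_CLOSE);       /* {Closed, Opened, Error} */
    return s;
}
```

Because the correlation between the two tests of `dump` is lost at the first join, Close is applied to the Closed state and Error appears in the final set, even though no real execution reaches it.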

20 Why is this code correct?

    void main() {
        if (dump)
            Open;
        if (p)
            x = 0;
        else
            x = 1;
        if (dump)
            Close;
    }

[Figure: property FSA with states Closed, Opened, Error; edge labels Open, Print, Open, Close, Print/Close, *]

21 When is a branch relevant?
- Precise answer
  - When the value of the branch condition determines the property FSA state
- Heuristic answer
  - When the property FSA is driven to different states along the arms of the branch statement

22 Property simulation
- Modification of path-sensitive analysis
- At control flow join points:
  - States agree on property FSA state?
    - Yes: merge states
    - No: process states separately

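23 Example
[Figure: control-flow graph of the stdio example (entry, branch on dump, Open, branch on p, x = 0, x = 1, branch on dump, Close, exit; edges labeled T/F), annotated with these symbolic states:]
- [Closed]
- [Opened|dump=T], [Closed|dump=F]
- [Opened|dump=T,p=T,x=0], [Opened|dump=T,p=F,x=1]
- [Opened|dump=T], [Closed|dump=F]
- [Closed|dump=T], [Closed|dump=F]
- [Closed]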
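The join rule of property simulation (merge states that agree on the FSA state, keep the others apart) could look roughly like this. It is a sketch under simplifying assumptions: execution state is a vector of boolean facts (the `x` values from the slide are omitted), merging keeps only the facts both states agree on, and `SymState`, `merge_into` and `join_states` are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>

typedef enum { ST_CLOSED, ST_OPENED, ST_ERROR } State;
typedef enum { UNKNOWN, IS_TRUE, IS_FALSE } Fact;
enum { VAR_DUMP, VAR_P, NUM_VARS };

typedef struct {
    State fsa;
    Fact  vars[NUM_VARS];
} SymState;

/* Merge b into a: keep only the facts on which both states agree. */
static void merge_into(SymState *a, const SymState *b) {
    assert(a->fsa == b->fsa);
    for (int v = 0; v < NUM_VARS; v++) {
        if (a->vars[v] != b->vars[v]) a->vars[v] = UNKNOWN;
    }
}

/* Join point: states with equal FSA state are merged, others stay separate.
 * Compacts `states` in place and returns the new number of states. */
size_t join_states(SymState *states, size_t n) {
    size_t out = 0;
    for (size_t i = 0; i < n; i++) {
        size_t j = 0;
        while (j < out && states[j].fsa != states[i].fsa) j++;
        if (j < out) {
            merge_into(&states[j], &states[i]);   /* same FSA state: merge */
        } else {
            states[out++] = states[i];            /* new FSA state: keep apart */
        }
    }
    return out;
}
```

At the join after `if (p)`, the states [Opened|dump=T,p=T] and [Opened|dump=T,p=F] collapse to [Opened|dump=T], so the irrelevant branch on `p` costs nothing. At the join after `if (dump) Open`, the Opened and Closed states stay separate, which preserves the correlation with the later `if (dump) Close`.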

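24 Loop example
[Figure: control-flow graph of a loop; surviving node labels: entry, new = old, Open, *, new++, Close, new != old, exit; edges labeled T/F; annotated with these symbolic states:]
- [Closed]
- [Opened|new=old]
- [Closed|new=old+1]
- [Opened|new=old]
- [Closed|new=old]
- [Closed|new=old+1]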

25 Making property simulation work
- Real programs are complex
  - Multiple FSAs
  - Aliasing
- Real code bases are very large
  - Well beyond a million lines
- ESP = Property Simulation + Multiple FSAs + Aliasing + Component-wise Analysis

26 Problem: Multiple FSAs

    void main() {
        if (dump1)
            fil1 = fopen(dumpFile1, "w");
        if (dump2)
            fil2 = fopen(dumpFile2, "w");
        if (dump1)
            fclose(fil1);
        if (dump2)
            fclose(fil2);
    }

    Transition    Source code pattern
    Close         fclose(e)
    Open          e = fopen(_)

The same code with the patterns replaced by FSA events:

    void main() {
        if (dump1)
            Open(fil1);
        if (dump2)
            Open(fil2);
        if (dump1)
            Close(fil1);
        if (dump2)
            Close(fil2);
    }

[Figure: property FSA with states Closed, Opened, Error; edge labels Open, Print, Open, Close, Print/Close, *]

27 Property simulation, bit by bit

Tracking only the first file (events on the other file become ID):

    void main() {
        if (dump1)
            Open;
        if (dump2)
            ID;
        if (dump1)
            Close;
        if (dump2)
            ID;
    }

Tracking only the second file:

    void main() {
        if (dump1)
            ID;
        if (dump2)
            Open;
        if (dump1)
            ID;
        if (dump2)
            Close;
    }

- Problem: property state can be exponential
- Solution: track one FSA at a time

28 Property simulation, bit by bit
- One FSA at a time
  + Avoids exponential property state
  + Fewer branches are relevant
  + Lifetimes are often short
  + Smaller memory footprint
  + Embarrassingly parallel
  − Cannot correlate FSAs

29 Problem: Aliasing

    void main() {
        if (dump1)
            fil1 = fopen(dumpFile1, "w");
        if (dump2)
            fil2 = fopen(dumpFile2, "w");
        fil3 = fil1;
        if (dump1)
            fclose(fil3);
        if (dump2)
            fclose(fil2);
    }

30 ESP Model: Values Have State
- During execution, the program
  - creates stateful values
  - changes the state of stateful values
- The programmer defines
  - how values are created (syntactic patterns)
  - how values change state (syntactic patterns)
- Syntactic expressions are aliases for values

31 OPAL Rule Descriptions
- Object Property Automata Language

    State Closed
    State Opened
    State Error

    Initial Event Open { _object_ ASTFUNCTIONCALL { ASTSYMBOL "fopen" } { _anyargs_ } }
    Event Close { ASTFUNCTIONCALL { ASTSYMBOL "fclose" } { _object_ } }

    Transition _ -> Opened on Open
    Transition Opened -> Closed on Close
    Transition Closed -> Error on Close "File already closed"

32 Parameterized transitions

    void main() {
        if (dump1)
            fil1 = fopen(dumpFile1, "w");
        if (dump2)
            fil2 = fopen(dumpFile2, "w");
        fil3 = fil1;
        if (dump1)
            fclose(fil3);
        if (dump2)
            fclose(fil2);
    }

33 Parameterized transitions

    void main() {
        if (dump1) {
            t1 = fopen(dumpFile1, "w");
            Open(t1);
            fil1 = t1;
        }
        if (dump2) {
            t2 = fopen(dumpFile2, "w");
            Open(t2);
            fil2 = t2;
        }
        fil3 = fil1;
        if (dump1) {
            fclose(fil3);
            Close(fil3);
        }
        if (dump2) {
            fclose(fil2);
            Close(fil2);
        }
    }

34 Expressions are value aliases

    void main() {
        if (dump1) {
            t1 = fopen(dumpFile1, "w");
            Open(t1);
            fil1 = t1;
        }
        if (dump2) {
            t2 = fopen(dumpFile2, "w");
            Open(t2);
            fil2 = t2;
        }
        fil3 = fil1;
        if (dump1) {
            fclose(fil3);
            Close(fil3);
        }
        if (dump2) {
            fclose(fil2);
            Close(fil2);
        }
    }

35 Value-alias analysis
- Is expression e an alias for value v?
- ESP uses GOLF to answer this query
- Generalized One Level Flow
  - Context-sensitive
  - Largely flow-insensitive
  - Millions of lines of code, in seconds

36 Putting it all together
- Property simulation
  - Identify and track relevant execution state
- Syntactic patterns + value-alias analysis
  - Identify and isolate individual FSAs
- One FSA at a time
  - Bit vector analysis for safety properties

37 Case study: stdio usage in gcc
- cc1 from gcc version 2.5.3 (Spec95)
- Does cc1 always print to opened files?
- cc1 is a complex program:
  - 140K non-blank, non-comment lines of C
  - 2149 functions, 66 files, 1086 globals
  - Call graph includes one 450-function SCC

38 Skeleton of cc1 source

    FILE *f1, …, *f15;
    int p1, …, p15;

    void compileFile() {
        if (p1) f1 = fopen(…);
        …
        if (p15) f15 = fopen(…);
        restOfComp();
        if (p1) fclose(f1);
        …
        if (p15) fclose(f15);
    }

    void restOfComp() {
        if (p1) printRtl(f1);
        …
        if (p15) printRtl(f15);
        restOfComp();
    }

    void printRtl(FILE *f) {
        fprintf(f);
    }

39 OPAL rules for stdio usage

    State Uninit
    State Closed
    State Opened
    State Error

    Initial Event Decl {ASTDECLARATION {_object_ ASTSYMBOL _any_}}
    Initial Event Open {_object_ ASTFUNCTIONCALL {ASTSYMBOL "fopen"} {_anyargs_}}
    Event Print {ASTFUNCTIONCALL {ASTSYMBOL "fprintf"} {_object_,_anyargs_}}
    Event Close {ASTFUNCTIONCALL {ASTSYMBOL "fclose"} {_object_}}

    Transition _ -> Uninit on Decl
    Transition _ -> Opened on Open
    Transition Uninit -> Error on Print "File not opened"
    Transition Opened -> Opened on Print
    Transition Closed -> Error on Print "Printing to closed file"
    Transition Opened -> Closed on Close
    Transition Closed -> Error on Close "File already closed"

40 Experimental results
- Precision
  - Verification succeeds for every file handle
  - No transitions to Error; no false errors
- Scalability
  - Average per handle: 72.9 seconds, 49.7 MB
  - Single 1 GHz PIII laptop with 512 MB RAM
- We have proved that:
  - Each of the 646 calls to fprintf in the source code prints to a valid, open file

41 Ongoing research
- Path-sensitive value-alias analysis
  - Value-alias sets
    - Expressions that hold tracked value
  - Track value-alias sets during simulation
  - Add value-alias sets to property state
  - When things get complicated, use GOLF
- Component-wise analysis
  - Identify and analyze components
  - Link using less precise analysis

