Scalable Defect Detection Manuvir Das, Zhe Yang, Daniel Wang Center for Software Excellence Microsoft Corporation

What this series covers
– Various techniques for static analysis of large, real imperative programs
– Lessons from our experience building static analyses in the "real world"
  – "real world" == Microsoft product teams
– A pragmatic methodology for mixing specifications with static analysis

Who we are
– Microsoft Corporation
  – Center for Software Excellence, Program Analysis Group
  – 10 full-time people, including 7 PhDs
– We are program analysis researchers
  – But we measure our success by impact
  – CSE impact on Windows Vista:
    – Developers fixed ~100K bugs that we found
    – Developers added ~500K specifications we designed
    – We answered thousands of developer emails

The real world
– Code on a massive scale
  – 10s of millions of lines of code
  – Many configurations & code branches
– Developers on a massive scale
  – Small mistakes in tools are magnified
  – Small developer overheads are magnified
– Defects on a massive scale
  – Bug databases and established processes rule
  – Defect classes repeat, both across code bases and across defect properties

Code in the real world
[Diagram: a Main Branch feeding multiple Team Branches, each synced to individual developer Desktops]

Process in the real world – 1
– An opportunity for lightweight tools
  – "always on" on every developer desktop
  – issues tracked within the program artifacts
  – enforcement by rejection at "quality gate"
– Speed, suppression, determinism
[Branch diagram as on the previous slide]

Process in the real world – 2
– An opportunity for heavyweight tools
  – run routinely after integration in the main branch
  – issues tracked through a central bug database
  – enforcement by developer "bug cap"
– Scale, uniqueness, defect management
[Branch diagram as on the previous slide]

Implications for analysis
– Scale, scale, scale
  – Should be run routinely on massive scale
– High accuracy
  – Ratio of bugs "worth fixing" should be high
– High clarity
  – Defect reports must be understandable
– Low startup cost
  – Developer effort to get results must be low
– High return on investment
  – More developer effort should reveal more bugs
– High agility
  – New defect detection tools should be easy to produce

Our solutions over time
– Gen 1: Manual Review
  – Too many code paths to think about
– Gen 2: Massive Testing
  – Inefficient detection of simple errors
– Gen 3: Global Program Analysis
  – Delayed results
– Gen 4: Local Program Analysis
  – Lack of calling context limits accuracy
– Gen 5: Formal interface specifications

Contrast this with …
– Build it into the language
  – e.g. memory management in Java/C#
– If not, then fix it with a type system
  – e.g. memory safety in Cyclone
– If not, then add formal specifications
  – e.g. memory defect detection in ESC/Java
– If not, then find bugs with static analysis
  – e.g. memory defect detection in PREfix
– If not, then find bugs with dynamic analysis

Our approach to scale
– Scalable whole program analysis
  – Combine lightweight analysis everywhere with heavyweight analysis in just the right places
– Accurate modular analysis
  – Assume availability of function-level pre-conditions and post-conditions
  – Powerful analysis + defect bucketing
– Programmer supplied specifications
  – Designed to be developer friendly
  – Automatically inferred via global analysis

… explained in 3 lectures
– Scalable whole program analysis
  – ESP – Manuvir Das
– Accurate modular analysis
  – espX, μSPACE – Zhe Yang
– Programmer supplied specifications
  – SAL, SALinfer – Daniel Wang

Scalable Whole Program Analysis

Safety properties
– Dynamic checking
  – Instrument the program with a monitor
  – Fail if the monitor enters a bad state
– What is a safety property?
  – Anything that can be monitored
– Static checking
  – Simulate all possible executions of the instrumented program

Example

  void main () {
    if (dump)
      fil = fopen(dumpFile, "w");
    if (p)
      x = 0;
    else
      x = 1;
    if (dump)
      fclose(fil);
  }

[Monitor state machine: states Closed, Opened, Error; edges labeled Open, Print, Close, Print/Close, *]

  void main () {
    if (dump)
      Open;
    if (p)
      x = 0;
    else
      x = 1;
    if (dump)
      Close;
  }
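To make "instrument the program with a monitor" concrete, here is a minimal, hypothetical C sketch of the example with an explicit runtime monitor: the monitor tracks the Closed/Opened state of the handle and fails as soon as a transition would enter the error state. The helper names (mon_open, mon_close) and hard-coded inputs are illustrative only, not part of ESP.

  #include <stdio.h>
  #include <stdlib.h>

  typedef enum { CLOSED, OPENED } MonState;
  static MonState mon = CLOSED;            /* monitor starts in Closed */

  static void mon_open(void) {             /* Open transition */
      if (mon == OPENED) { fprintf(stderr, "monitor: Open in Opened state\n"); exit(1); }
      mon = OPENED;
  }

  static void mon_close(void) {            /* Close transition */
      if (mon == CLOSED) { fprintf(stderr, "monitor: Close in Closed state\n"); exit(1); }
      mon = CLOSED;
  }

  int main(void) {
      int dump = 1, p = 0, x;              /* hypothetical inputs */
      FILE *fil = NULL;
      if (dump) { fil = fopen("dump.txt", "w"); mon_open(); }   /* NULL check omitted */
      if (p) x = 0; else x = 1;
      if (dump) { fclose(fil); mon_close(); }
      (void)x;
      return 0;
  }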

Symbolic evaluation
– Execute multiple paths in the program
  – Use symbolic values for variables
  – Execution state = Symbolic state + Monitor state
– Assignments & function calls:
  – Update execution state
– Branch points:
  – Does execution state imply branch direction?
  – Yes: process appropriate branch
  – No: split & update state, process branches
– Merge points:
  – Collapse identical states
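As a rough illustration of the branch rule, the following C sketch (types and names are hypothetical, not ESP's) represents an execution state for the running example as a symbolic value for dump plus the monitor state, and splits the state only when the branch direction is not implied.

  typedef enum { MON_CLOSED, MON_OPENED, MON_ERROR } MonState;
  typedef enum { SYM_TRUE, SYM_FALSE, SYM_UNKNOWN } SymBool;

  typedef struct {
      SymBool dump;    /* symbolic value of the condition `dump`   */
      MonState mon;    /* current state of the file-handle monitor */
  } ExecState;

  /* explore(state, 1) processes the true arm, explore(state, 0) the false arm */
  static void branch_on_dump(ExecState s, void (*explore)(ExecState, int)) {
      if (s.dump == SYM_TRUE)  { explore(s, 1); return; }   /* direction implied */
      if (s.dump == SYM_FALSE) { explore(s, 0); return; }   /* direction implied */
      ExecState t = s, f = s;                               /* unknown: split    */
      t.dump = SYM_TRUE;  explore(t, 1);
      f.dump = SYM_FALSE; explore(f, 0);
  }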

Example
[Control-flow graph of the example, annotated with symbolic-evaluation states: [Closed] at entry; [Opened|dump=T] and [Closed|dump=F] after the first branch on dump; [Opened|dump=T,p=T,x=0] and [Opened|dump=T,p=F,x=1] after the branch on p; [Closed|dump=T,p=T,x=0] and [Closed|dump=T,p=F,x=1] after Close]

Example
[The same control-flow graph, highlighting the dump=F paths: [Closed] at entry; [Closed|dump=F] after the first branch; [Closed|dump=F,p=T,x=0] and [Closed|dump=F,p=F,x=1] through the branch on p and past the skipped Close]

Assessment
[+] can make this arbitrarily precise
[+] can show debugging traces
[-] may not scale (exponential paths)
[-] may not terminate (loops)
PREfix (SP&E 2000) – explore a subset of all paths

Dataflow analysis Merge points: –Collapse all states –One execution state per program point
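When all incoming states are collapsed into one per program point, the monitor component effectively becomes a set of possible monitor states, and merge points take the union. A tiny, hypothetical C sketch of that join (not ESP's representation):

  typedef struct { unsigned mon_set; } DfState;   /* bit 0 = Closed, bit 1 = Opened, bit 2 = Error */

  /* Merge point: collapse all incoming states by union. After the first branch
     on dump, the single state becomes {Closed, Opened}, so the later fclose
     can no longer be shown safe. */
  static DfState join(DfState a, DfState b) {
      DfState r = { a.mon_set | b.mon_set };
      return r;
  }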

Example
[The same control-flow graph under dataflow analysis, with one merged state per point: [Closed] at entry; [Opened|dump=T] and [Closed|dump=F] collapse to [Opened/Closed] after the branch on dump; [Opened/Closed|p=T,x=0] and [Opened/Closed|p=F,x=1] after the branch on p; [Closed/Error] after Close]

Assessment
[-] precision is limited
[-] cannot show debugging traces
[+] scales well
[+] terminates
CQual (PLDI 2002) – apply to type-based properties

Property simulation Merge points: –Do states agree on monitor state? –Yes: merge states –No: process states separately ESP –PLDI 2002, ISSTA 2004
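A minimal sketch of that merge rule, reusing the hypothetical ExecState encoding from the symbolic-evaluation sketch: two incoming states are merged only when their monitor states agree, and symbolic facts on which they disagree are dropped. This illustrates the policy, not ESP's actual data structures.

  typedef enum { MON_CLOSED, MON_OPENED, MON_ERROR } MonState;
  typedef enum { SYM_TRUE, SYM_FALSE, SYM_UNKNOWN } SymBool;
  typedef struct { SymBool dump; MonState mon; } ExecState;

  /* Returns 1 and fills *out if the states may be merged; returns 0 if they
     disagree on the monitor state and must be processed separately. */
  static int try_merge(ExecState a, ExecState b, ExecState *out) {
      if (a.mon != b.mon) return 0;                            /* keep separate */
      out->mon  = a.mon;
      out->dump = (a.dump == b.dump) ? a.dump : SYM_UNKNOWN;   /* join facts    */
      return 1;
  }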

Example
[The same control-flow graph under property simulation: states that agree on the monitor state are merged, so after the branch on p the two Opened states collapse to [Opened|dump=T] while [Closed|dump=F] is kept separate; after the second branch on dump the final states are [Closed|dump=T] and [Closed|dump=F], and Error is never reached]

Assessment
[=] is usually precise
[=] can usually show debugging traces
[=] usually scales well
[=] usually terminates
ESP – a pragmatic compromise

Multiple state machines

  void main () {
    if (dump1)
      fil1 = fopen(dumpFile1, "w");
    if (dump2)
      fil2 = fopen(dumpFile2, "w");
    if (dump1)
      fclose(fil1);
    if (dump2)
      fclose(fil2);
  }

  Code Pattern   | Monitor Transition
  e = fopen(_)   | Open(e)
  fclose(e)      | Close(e)

Multiple state machines

  void main () {
    if (dump1)
      Open(fil1);
    if (dump2)
      Open(fil2);
    if (dump1)
      Close(fil1);
    if (dump2)
      Close(fil2);
  }

  Code Pattern   | Monitor Transition
  e = fopen(_)   | Open(e)
  fclose(e)      | Close(e)
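Seen dynamically, "multiple state machines" means one monitor instance per tracked value: each handle returned by fopen carries its own Opened/Closed state, keyed by the handle. A minimal, hypothetical C sketch of that keying (the fixed-size table and names are illustrative; bounds checks omitted):

  #include <stdio.h>

  typedef enum { CLOSED, OPENED } MonState;
  typedef struct { FILE *key; MonState st; } MonEntry;

  static MonEntry table[16];      /* one entry per tracked handle */
  static int n;

  static void Open(FILE *e) {     /* e = fopen(_)  =>  Open(e) */
      table[n].key = e;
      table[n].st  = OPENED;
      n++;
  }

  static void Close(FILE *e) {    /* fclose(e)  =>  Close(e) */
      for (int i = 0; i < n; i++)
          if (table[i].key == e) { table[i].st = CLOSED; return; }
  }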

Bit vector analysis

  void main () {
    if (dump1) Open;
    if (dump2) ID;
    if (dump1) Close;
    if (dump2) ID;
  }

  void main () {
    if (dump1) ID;
    if (dump2) Open;
    if (dump1) ID;
    if (dump2) Close;
  }

Bit vector analysis
– Source to sink safety properties
  – Sources: object creation points or function/component entry points
  – Sinks: transitions to the error state
– Analyze every source independently
  – Requires (exponentially) less memory
  – Spans smaller segments of code
  – Parallelizes easily

Memory aliasing

  void main () {
    if (dump)
      fil = fopen(dumpFile, "w");
    pfil = &fil;
    if (dump)
      fclose(*pfil);
  }

Does Event(exp) invoke a transition on the monitor for location l?

Value alias analysis
– Precise alias analysis is expensive
– Solution: value alias sets (ISSTA 2004)
  – For a given execution state, which syntactic expressions refer to location l?
  – Must and May sets for accuracy
  – Transfer functions to update these
– Observation: we can make value alias analysis path-sensitive by tracking value alias sets as part of the monitor state
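A minimal sketch of that observation, with hypothetical names: the tracked state for a handle carries not just its monitor state but also the sets of expressions that must or may denote it, so an event on any must-alias (e.g. fclose(*pfil) after pfil = &fil) is recognized as a transition on the same monitor.

  typedef enum { CLOSED, OPENED, ERROR } MonState;

  typedef struct {
      MonState st;                        /* monitor state of the tracked handle   */
      const char *must[8]; int n_must;    /* expressions that definitely denote it */
      const char *may[8];  int n_may;     /* expressions that possibly denote it   */
  } TrackedValue;

  /* After `pfil = &fil;` the must set is { "fil", "*pfil" }, so a later
     `fclose(*pfil)` is treated as Close on the handle stored in fil. */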

Selective merging
– Property simulation is really an instance of a more general analysis approach
– Selective merging
  – Define a projection on symbolic states
  – Define equality on projections
  – Ensure that the domain of projections is finite
– Merge points:
  – Do states agree on projection?
  – Yes: merge states
  – No: process states separately
– Examples
  – Value flow analysis, call graph analysis
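In code terms, the merge policy is parameterized by the projection; property simulation is the instance whose projection returns the monitor state. A small, hypothetical C sketch of that interface:

  typedef struct ExecState ExecState;              /* opaque symbolic execution state */
  typedef int (*Projection)(const ExecState *);    /* must have a finite range        */

  /* Two states fall into the same partition (and may be merged) exactly when
     their projections are equal; otherwise they are processed separately. */
  static int may_merge(const ExecState *a, const ExecState *b, Projection pi) {
      return pi(a) == pi(b);
  }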

ESP at Microsoft

Windows Vista
  Issue                           | Fixed | Noise
  Security – RELOJ                | 386   | 4%
  Security – Impersonation Token  | 135   | 10%
  Security – OpenView             | 54    | 2%
  Leaks – RegCloseHandle          | 63    | 0%

In Progress
  Issue                           | Found
  Localization – Constant strings | 1214
  Security – ClientID             | 282
  …

Summary
– Scalable whole program analysis
  – Combine lightweight analysis everywhere with heavyweight analysis in just the right places
– Accurate modular analysis
  – Assume availability of function-level pre-conditions and post-conditions
  – Powerful analysis + defect bucketing
– Programmer supplied specifications
  – Designed to be developer friendly
  – Automatically inferred via global analysis

© 2007 Microsoft Corporation. All rights reserved. This presentation is for informational purposes only. MICROSOFT MAKES NO WARRANTIES, EXPRESS OR IMPLIED, IN THIS SUMMARY. research.microsoft.com/manuvir