Scalable Defect Detection Manuvir Das, Zhe Yang, Daniel Wang Center for Software Excellence Microsoft Corporation.

Presentation transcript:


Part III Lightweight Specifications for Win32 APIs Center for Software Excellence

The Win32 API: The Win32 API is the layer on which all modern Windows applications are built. .NET is built on top of it and contains many managed classes that wrap Win32 functionality.

Business Goals: I. Significantly reduce the number of exploitable buffer overruns in Windows Vista. II. Change the development process so that products after Vista are more secure.

Standard Annotation Language (SAL): Created in June 2002 as a joint effort between the product groups and CSE. Specifies programmer intent, which leads to: – Better coverage (fewer false negatives) – Reduced noise (fewer false positives) – An ecosystem of tools – High-impact results

Measured Outcomes: Headers for a toy application expose only 1/5th of all Win32 APIs. Developers did more than the minimum required for security! #include int _tmain(…) { return 0; } Mutable String Arguments Functions Total109620,928 Annotated10316,918

How We Got There! [Diagram: Code; SALInfer, MIDL Compiler, Manual Annotation; SAL; espX, IO, MSRC, Prefast, Prefix, Truscan, …; Warnings; Triage; Code Fixes, SAL Fixes.] Drive these to zero! Lock in progress!

Focus of This Talk: [Same diagram, with Manual Annotation and the tools (espX, IO, MSRC, Prefast, Prefix, Truscan, …) called out.]

Technical Design Goals: Improve the coverage and accuracy of static tools. Lock in progress for the future. Be usable by an average Windows developer. Cannot break existing Win32 public APIs or force changes in data structures (i.e., no fat pointers).

Technical Design Non-Goals: No need to guarantee safety. No need to be efficiently checked as part of the normal foreground “edit-debug-compile” loop. No need to handle all the corner cases. No need to be “pretty”.

Take a Peek Yourself! For MSDN-documented Win32 APIs, start here or Google “Live Search” for them. Annotated headers can be downloaded from the Vista SDK (first search hit for “Vista SDK”).

MSDN Documentation for an API

memcpy, wmemcpy (cont.): For every API there’s usually a wide version. Many errors come from confusing “byte” counts with “element” counts. Just say “No” to bad APIs. Not all the information is relevant to buffer overruns.
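The byte-versus-element confusion mentioned on this slide can be sketched in portable C. The helper names below (`copy_elements`, `copy_bytes`, `demo_counts_agree`) are made up for illustration; only `memcpy`, `wmemcpy`, and `wmemcmp` are standard library calls.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <wchar.h>

/* wmemcpy takes an ELEMENT count: the number of wchar_t values to copy. */
void copy_elements(wchar_t *dest, const wchar_t *src, size_t count)
{
    wmemcpy(dest, src, count);
}

/* memcpy takes a BYTE count, so the same element count must be scaled
   by sizeof(wchar_t). Forgetting the scaling copies too little; scaling
   a count that is already in bytes overruns the destination. */
void copy_bytes(wchar_t *dest, const wchar_t *src, size_t count)
{
    memcpy(dest, src, count * sizeof(wchar_t));
}

/* Hypothetical check: returns 1 when both helpers copy the whole string,
   terminator included. */
int demo_counts_agree(void)
{
    const wchar_t src[] = L"Hello";
    size_t count = sizeof(src) / sizeof(src[0]);   /* 6 elements */
    wchar_t a[6], b[6];

    copy_elements(a, src, count);
    copy_bytes(b, src, count);
    return wmemcmp(a, b, count) == 0 && a[5] == L'\0';
}
```

An annotation that states which unit a size parameter uses (the `_bcount` versus `_ecount` families shown later in the talk) lets a tool flag a call that passes the wrong one.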

This, unfortunately, is a typical Win32 API.

A common pattern. A not-so-common pattern.

How to Solve a Problem like MultiByteToWideChar? Start with an approximate specification. See how much noise and how many real bugs you find. Power up the tools and refine until you find the next thing to worry about. Conditional null termination is needed to handle the case when cbMultiByte is -1. Buffer size weakening handles the cbMultiByte 0 case.

A common pattern to communicate buffer sizes to the callee. Optional reference argument! Optional buffer!

Double null termination!

Does Your Head Hurt Yet?

Does Your Head Hurt Yet? If only C had exceptions, garbage collection, and a better string type, the Win32 APIs would be much simpler!

Does Your Head Hurt Yet? I WISH IT DID!

The Next Best Thing: Use the .NET Win32 bindings until it does!

The Next Best Thing: So when are they going to rewrite Vista in C#?

So That’s Why It Took Five Years! Read up about the "Longhorn Reset" nt_of_Windows_Vista

So That’s Why It Took Five Years! Intel and AMD will solve this problem eventually! Until then we have SAL.

MultiByteToWideChar:
WINBASEAPI int WINAPI MultiByteToWideChar(
    __in UINT CodePage,
    __in DWORD dwFlags,
    __in_bcount(cbMultiByte) LPCSTR lpMultiByteStr,
    __in int cbMultiByte,
    __out_ecount_opt(cchWideChar) LPWSTR lpWideCharStr,
    __in int cchWideChar);
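To make the contract above concrete without depending on Windows, here is a minimal portable sketch. `widen_ascii` and `widen_ascii_demo` are hypothetical stand-ins, not part of Win32, and the sketch handles ASCII only. It mimics two documented behaviors of the real function: a source length of -1 means the input is null terminated and the terminator is converted too, and a destination size of 0 turns the call into a size query that never touches the output buffer. Treating a source length of 0 as a failure is an assumption of this sketch. The SAL macros are defined to nothing so that any C compiler accepts the code.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <wchar.h>

/* SAL macros expand to nothing here; they only document intent. */
#ifndef __in_bcount
#define __in_bcount(size)
#endif
#ifndef __out_ecount_opt
#define __out_ecount_opt(size)
#endif

/* Hypothetical stand-in, ASCII only. Returns the number of wide characters
   written (or required, when cchWide is 0); returns 0 on failure. */
int widen_ascii(__in_bcount(cbSrc) const char *src, int cbSrc,
                __out_ecount_opt(cchWide) wchar_t *dst, int cchWide)
{
    int i, n;

    if (src == NULL || cbSrc == 0 || cbSrc < -1 || cchWide < 0)
        return 0;
    /* cbSrc == -1: the terminator is converted as well, so the output is
       null terminated only in this case (conditional null termination). */
    n = (cbSrc == -1) ? (int)strlen(src) + 1 : cbSrc;
    if (cchWide == 0)
        return n;            /* size query: dst is not used and may be NULL */
    if (dst == NULL || cchWide < n)
        return 0;            /* buffer missing or too small */
    for (i = 0; i < n; i++)
        dst[i] = (wchar_t)(unsigned char)src[i];
    return n;
}

/* The usual two-call pattern: ask for the size, then convert. */
int widen_ascii_demo(void)
{
    wchar_t buf[8];
    int need = widen_ascii("abc", -1, NULL, 0);
    int got  = widen_ascii("abc", -1, buf, need);

    return need == 4 && got == 4 && buf[3] == L'\0'
        && widen_ascii("abc", 3, buf, 2) == 0;
}
```

Note that `__in_bcount(cbSrc)` says nothing useful when `cbSrc` is -1, and the `_opt` on the output buffer is only legitimate when `cchWide` is 0. Those two gaps are the kind of refinement the earlier "How to Solve a Problem like MultiByteToWideChar?" slide describes.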

BCryptResolveProviders:
NTSTATUS WINAPI BCryptResolveProviders(
    __in_opt LPCWSTR pszContext,
    __in_opt ULONG dwInterface,
    __in_opt LPCWSTR pszFunction,
    __in_opt LPCWSTR pszProvider,
    __in ULONG dwMode,
    __in ULONG dwFlags,
    __inout ULONG* pcbBuffer,
    __deref_opt_inout_bcount_part_opt(*pcbBuffer, *pcbBuffer) PCRYPT_PROVIDER_REFS *ppBuffer);

GetEnvironmentStrings:
WINBASEAPI __out __nullnullterminated LPCH WINAPI GetEnvironmentStrings( VOID );
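A double-null-terminated block is a sequence of null-terminated strings followed by one extra null. The sketch below walks such a block in portable C; `count_strings` and `count_strings_demo` are illustrative names, and the annotation macro is defined to nothing.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#ifndef __nullnullterminated
#define __nullnullterminated
#endif

/* Count the strings in a block laid out as "A=x\0B=y\0\0". */
size_t count_strings(__nullnullterminated const char *block)
{
    size_t count = 0;
    const char *p = block;

    assert(block != NULL);
    while (*p != '\0') {          /* an empty string marks the end */
        count++;
        p += strlen(p) + 1;       /* skip this string and its terminator */
    }
    return count;
}

int count_strings_demo(void)
{
    /* The literal's implicit terminator supplies the second null. */
    static const char env[] = "PATH=C:\\bin\0TEMP=C:\\tmp\0";

    return count_strings(env) == 2 && count_strings("\0") == 0;
}
```

If the final extra null is missing, the loop reads past the end of the block, which is why the property is worth stating in the function's contract.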

End of Section A Questions?

From Types to Program Logics: A Recipe for SAL. A story inspired by true events.

A Recipe for SAL: 1) Start with a simple Cyclone-like type system. 2) Slowly shape it into a powerful program logic for describing common Win32 APIs. 3) Add some syntactic sugar and abstraction facilities. 4) Mix in a lot of developer feedback. 5) Bake it until it’s properly done! It’s getting there but still needs some cooking!

Types vs Program Logic: Types describe the representation of a value in a given program state. Program logics describe transitions between program states. Aside: each execution step in a type-safe imperative language preserves types, so types by themselves are sufficient to establish a wide class of properties without the need for a program logic.

Concrete Values: [Diagram: scalars ('\0', …, 'a', 'b', 'c', … and …, -2, -1, 0, 1, 2, …), pointers, cells (?, 'a', 1), and an extent ('H' 'e' 'l' 'o' '\0' ?).]

Abstract Values: [Diagram: abstract names A, B, …, X, Y, Z; some scalar, some pointer, some cell, and some extent of n cells.]

Program State: [Diagram: roots x, y, and p; store cells 'a', 1, and 1.]

Well-Typed Program State: [Diagram: roots char x, int y, and int* p; store cells 'a', 1, and 1.]

Well-Typed Program State: [Diagram: roots char x, int y, and int* p; store cells 'a' and 1.]

Well-Typed Program States: [Diagram: roots char x, int y, and int* p; store cells 'a', 1, and 1.] C types are not descriptive enough to avoid errors.

Well-Typed Program States: [Diagram: roots char x, int y, and int* p; store cells 'a' and 1.] Use Cyclone-style qualifiers to be more precise!

Generalizing: [Diagram: int* buf pointing to cells 1 2 3; int cbuf; int* buf pointing to an extent of N cells.] What's wrong with this?

Is it Initialized or Not? [Diagram: int* buf pointing to cells 1 2 3, 1 ? ?, or 1 ? 3.]

[Diagram: @extent(3,3) int* buf, int* buf, int* buf; cells 1 ? 3.] Just give up here!

Refined Abstract Extent: [Diagram: an extent of n cells, the first m of which are initialized.] m is the initialized count and n is the extent capacity, where m <= n.

Some Special Cases: When m == n, a fully initialized extent. When m == 0, some allocated extent.
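The refined extent (capacity n, initialized count m, with m <= n) can be modeled as a small C struct. This is only an illustration of the abstraction; `struct extent` and its helper functions are hypothetical names, and SAL itself attaches these properties to ordinary pointers rather than to a wrapper struct.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model of the abstract extent: `cap` cells are allocated (n)
   and the first `count` of them are initialized (m). */
struct extent {
    int   *cells;
    size_t cap;     /* n: extent capacity   */
    size_t count;   /* m: initialized count */
};

int extent_ok(const struct extent *e)       { return e->count <= e->cap; }
int extent_is_full(const struct extent *e)  { return e->count == e->cap; } /* m == n */
int extent_is_fresh(const struct extent *e) { return e->count == 0; }      /* m == 0 */

/* Initialize the next cell; returns 0 when the extent is already full. */
int extent_append(struct extent *e, int value)
{
    assert(extent_ok(e));
    if (e->count == e->cap)
        return 0;
    e->cells[e->count++] = value;
    return 1;
}

int extent_demo(void)
{
    int storage[3];
    struct extent e = { storage, 3, 0 };
    int fresh = extent_is_fresh(&e);
    int ok = extent_append(&e, 1) && extent_append(&e, 2) && extent_append(&e, 3);

    return fresh && ok && extent_is_full(&e) && !extent_append(&e, 4) && extent_ok(&e);
}
```

The two special cases on the slide correspond to `extent_is_full` (fully initialized) and `extent_is_fresh` (allocated, nothing initialized yet).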

[Diagram: int* buf pointing to cells ? ? ?; int cbuf; int* buf pointing to an extent of N cells.]

Qualified Types Useful for Win32 APIs: t ::= int | void | char | t* | q 1 … q n t q 1, e 1 ) 1, e 2 ) 1, e 2 ) op ::= == | = | != e ::= ….

A Qualified Type: @numelts(count) const void *src, size_t count) It seems to work? What's wrong?

Which One is Right? int *p) { *p = 1; } int *p) { *p = 1; } Types don’t capture the state transition!

Program State: [Diagram: int* p before and after the call f(&p); afterwards the cell holds 1.] Pre-condition and post-condition: a pre/post pair makes up a contract!

Contracts with Program Logics: @numelts(1) } int *p) { *p = 1; }

Contracts with Program Logics: } int *p) { *p = 1; } Simplify because C is call by value!

Contracts with Program Logics: } int *p) { *p = 1; } Who in their right mind is going to write that!

Contracts with Program Logics: #define @alloced(1) } void f(__out int *p) { *p = 1; } C preprocessor macros to the rescue! Defined to the empty string for compatibility.
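A minimal sketch of the "defined to the empty string" idea follows, assuming a compiler that knows nothing about SAL. The `#ifndef` guard is there because Microsoft headers may already define `__out`. The `assert` shows the one part of the pre-condition that plain C can check at run time; the rest of the contract stays in the annotation for the analysis tools. `f_demo` is a made-up caller.

```c
#include <assert.h>
#include <stddef.h>

#ifndef __out
#define __out   /* expands to nothing: documentation for tools only */
#endif

/* Pre-condition: p points to one allocated int (contents unknown).
   Post-condition: *p is initialized. */
void f(__out int *p)
{
    assert(p != NULL);   /* the part of the pre-condition C can check */
    *p = 1;
}

int f_demo(void)
{
    int x;               /* allocated but uninitialized: satisfies the pre-condition */

    f(&x);
    return x == 1;       /* reading x is safe only because of the post-condition */
}
```

Because the macro vanishes during preprocessing, the annotated header compiles to the same code as the unannotated one.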

Single Element Contracts: #define } #define @alloced(1) } #define @numelts(1) }

Contracts for Element Extents: #define } #define @alloced(cap) } #define @extent(count,cap) } Note the order of the args.

Contracts for Element Extents:
#define __out_ecount_full(e) \
    __out_ecount_part(e,e)
#define __inout_ecount_full(e) …
/* opt versions */
#define __in_ecount_opt(e) …
#define __out_ecount_part_opt(cap,count) …
#define __inout_ecount_part_opt(cap,count) …
#define __out_ecount_full_opt(e) …
#define __inout_ecount_full_opt(e) …

Contracts for Byte Extents:
#define __in_bcount(e) …
#define __out_bcount_part(cap,count) …
#define __inout_bcount_part(cap,count) …
#define __out_bcount_full(e) …
#define __inout_bcount_full(e) …
#define __in_bcount_opt(e) …
#define __out_bcount_part_opt(cap,count) …
#define __inout_bcount_part_opt(cap,count) …
#define __out_bcount_full_opt(e) …
#define __inout_bcount_full_opt(e) …

Developers can learn a small set of macros and be productive quickly. [Chart: distribution of macros used across the Vista source base.]

Contract for memcpy:
__out_bcount_full(count)
void* memcpy(
    __out_bcount_full(count) void *dest,
    __in_bcount(count) const void *src,
    size_t count);
Ignore the meaningless pre-condition.

What about pointers to pointers?
void f( __out int*)* p) {
    static int l = 3;
    if(…) *p = NULL;
    else *p = &l;
}
void f(__deref_out_opt int **p) { … }
Syntax makes applying automatically inferred annotations to legacy code tractable!
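What `__deref_out_opt` asks of a caller can be sketched as follows. `find_default` and `read_or` are hypothetical names, and the elided condition from the slide is replaced by an explicit `available` flag for the sake of a runnable example.

```c
#include <assert.h>
#include <stddef.h>

#ifndef __deref_out_opt
#define __deref_out_opt
#endif

/* On return *p is initialized, but the pointer stored there may be NULL. */
void find_default(__deref_out_opt int **p, int available)
{
    static int value = 3;

    assert(p != NULL);
    *p = available ? &value : NULL;
}

/* Caller side: test *p before dereferencing it. */
int read_or(int available, int fallback)
{
    int *q;

    find_default(&q, available);
    return (q != NULL) ? *q : fallback;
}
```

The `_opt` suffix is what tells a checker that dereferencing `q` without the NULL test is a potential bug.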

How We Got There! [Diagram: Code; SALInfer, MIDL Compiler, Manual Annotation; SAL; espX, IO, MSRC, Prefast, Prefix, Truscan, …; Warnings; Triage; Code Fixes, SAL Fixes.] Inference bootstrapped everything!

What about Nested Pointers? #define @alloced(1) @numelt(1) } Pushes the context of the assertion down a pointer level.

Annotated Types for Win32 APIs: t ::= int | char | void | t* | t at ::= a 1 … a n t p 1, e 1 ) | … a { p 1 … p n } { p 1 … p n } | p op ::= … e ::= …. The actual primitive syntax is different. Just use the macros! Your code will be non-portable if you don't! Annotated types are split into annotations and type, not mixed in as type qualifiers.

What About This Case?
bool f(__out_opt int *p) {
    if(p != NULL) { *p = 1; return true; }
    return false;
}
Need to introduce conditional contracts!

Adding __success(cond): Most conditional behavior is related to error-handling protocols (i.e., exceptions via return codes). Introduce a specialized construct for this case: __success(expr) f(…); means the post-conditions only hold when "expr" is true (non-zero) on return from the function. Full conditional support is on the roadmap!

Using Success:
__success(return == true)
bool f(__out_opt int *p) {
    if(p != NULL) { *p = 1; return true; }
    return false;
}
Is _opt the right thing?

Using Success Correctly!
__success(return == true)
bool f(__out int *p) {
    if(p != NULL) { *p = 1; return true; }
    return false;
}
Annotate for the successful case!
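From the caller's side, `__success` means an output parameter may be read only after checking the return value. The sketch below uses a made-up function, `parse_digit`, with both macros defined to nothing so it compiles anywhere.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#ifndef __out
#define __out
#endif
#ifndef __success
#define __success(expr)
#endif

/* Hypothetical parser: *value is initialized only when true is returned. */
__success(return == true)
bool parse_digit(char c, __out int *value)
{
    if (c < '0' || c > '9')
        return false;      /* failure: *value is left untouched */
    *value = c - '0';
    return true;
}

/* Caller: v is read only on the success path. */
int digit_or(char c, int fallback)
{
    int v;                 /* uninitialized until parse_digit succeeds */

    return parse_digit(c, &v) ? v : fallback;
}
```

Without the `__success` annotation, a tool would have to assume `*value` is initialized on every return path and would miss a caller that reads `v` after a failure.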


Contracts For StringCchCat:
HRESULT StringCchCat(
    __post __nullterminated __out LPTSTR pszDest,
    __range(0,STRSAFE_MAX_CCH) size_t cchDest,
    __nullterminated __in LPCTSTR pszSrc);
Much more verbose than we'd like!

Types with Contracts For StringCchCat:
typedef __nullterminated TCHAR* LPTSTR;
typedef const LPTSTR LPCTSTR;
typedef __range(0,STRSAFE_MAX_CCH) size_t STRSIZE;
HRESULT StringCchCat(
    __out LPTSTR pszDest,
    __in STRSIZE cchDest,
    __in LPCTSTR pszSrc);
Must mean null terminated only in the post-condition!

New TCHAR* LPSTR; LPSTR s) { s[0]='\0'; } Annotations associated with types only apply when an extent is "valid".

Memory Semantics Revisited: Allocated: can be written to, but nothing is known about its contents. Initialized: the contents are in a known state. Valid: type-specific properties hold.

Lifecycle of an LPTSTR: [Diagram: three states of LPTSTR s, from cells ? ? ? to cells 'a' '\0' ?, the last marked Valid.]

Validity: Related Work. Validity is a lot like the Boogie methodology used in Spec#: – Not as general, since validity is just baked into the macros – Many things are conditionally valid because of __success – Full conditional pre/post will allow more flexibility. Even without it we can do some interesting things with objects: – Treat them like structs! – Added in a few defaults

Structure Annotations: Describe properties of buffers embedded in structs/classes. Three scenarios are supported: – Outlined structure buffers – Structs with inline buffers – Header structs. Structure descriptions interact with __in, __out, and __inout to determine the pre/post rules for functions using structure buffers.

struct buf {
    int n;
    __field_ecount(n) int *data;
};
struct ibuf {
    int n;
    __field_ecount(n) int data[1];
};
__struct_bcount(n * sizeof(int))
struct hbuf {
    int n;
    int data[1];
};
[Diagram: memory layouts of buf, ibuf, and hbuf, each with n elements.]
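For the outlined-buffer case (`struct buf`), the annotation states that `data` has `n` elements. A minimal sketch of code that establishes and then relies on that relationship is shown below. `buf_init`, `buf_get`, and `buf_demo` are hypothetical helpers, and storing NULL when `n` is 0 is a choice made by this sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#ifndef __field_ecount
#define __field_ecount(size)
#endif

struct buf {
    int n;
    __field_ecount(n) int *data;    /* outlined buffer: n elements elsewhere */
};

/* Establish the invariant: after success, data really has n elements.
   Returns 1 on success and 0 if the allocation failed. */
int buf_init(struct buf *b, int n)
{
    assert(b != NULL && n >= 0);
    b->data = (n > 0) ? calloc((size_t)n, sizeof(int)) : NULL;
    b->n = (n > 0 && b->data == NULL) ? 0 : n;
    return b->n == n;
}

/* Bounds-checked read that relies on the n/data relationship. */
int buf_get(const struct buf *b, int i, int *out)
{
    if (i < 0 || i >= b->n)
        return 0;
    *out = b->data[i];
    return 1;
}

int buf_demo(void)
{
    struct buf b;
    int v = -1, ok;

    if (!buf_init(&b, 4))
        return 0;
    b.data[3] = 42;
    ok = buf_get(&b, 3, &v) && v == 42 && !buf_get(&b, 4, &v);
    free(b.data);
    return ok;
}
```

With the field annotation in place, a checker can verify accesses like the one in `buf_get` against `n` without any annotation on the function itself.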

Zero Sized Buffers and NULL:
struct buf {
    int n;
    __field_ecount(n) int *data;
};
[Diagram: a buf with n == 0.] _opt versions are available but generally not needed.

SAL Annotations for Classes:
class Stack {
public:
    Stack(int max);     // Stack(__out Stack *this,int max);
    int Pop();          // int Pop(__inout Stack *this);
    void Push(int v);   // void Push(__inout Stack *this,int v);
    ~Stack();           // treated specially
private:
    int m_max;
    int m_top;
    __field_ecount_part(m_max,m_top) int *m_buf;
};
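The comments on the slide treat each method as a function taking an explicit `this` pointer. Written out in plain C, with the same field annotation, that reading looks roughly like the sketch below. All function names are hypothetical, and the `stack_valid` check is an assumption about what "valid" would mean for this type: `m_buf` has `m_max` cells, of which the first `m_top` are initialized.

```c
#include <assert.h>
#include <stddef.h>

#ifndef __field_ecount_part
#define __field_ecount_part(cap, count)
#endif
#ifndef __inout
#define __inout
#endif

struct stack {
    int m_max;                                      /* capacity          */
    int m_top;                                      /* initialized count */
    __field_ecount_part(m_max, m_top) int *m_buf;
};

/* The invariant that __inout expects to hold before and after each call. */
int stack_valid(const struct stack *s)
{
    return s->m_buf != NULL && 0 <= s->m_top && s->m_top <= s->m_max;
}

int stack_push(__inout struct stack *s, int v)
{
    assert(stack_valid(s));
    if (s->m_top == s->m_max)
        return 0;                                   /* full */
    s->m_buf[s->m_top++] = v;
    return 1;
}

int stack_pop(__inout struct stack *s, int *v)
{
    assert(stack_valid(s));
    if (s->m_top == 0)
        return 0;                                   /* empty */
    *v = s->m_buf[--s->m_top];
    return 1;
}

int stack_demo(void)
{
    int cells[2], v = 0;
    struct stack s = { 2, 0, cells };

    return stack_push(&s, 10) && stack_push(&s, 20) && !stack_push(&s, 30)
        && stack_pop(&s, &v) && v == 20 && stack_valid(&s);
}
```

Push and pop only ever touch cells below `m_max`, and pop only reads cells below `m_top`, which is exactly what `__field_ecount_part(m_max, m_top)` records.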

Conclusions: Developers will accept the use of appropriate lightweight specifications! But you must understand the problem and tailor custom solutions. Generic recipe: 1) Write the problem down. 2) Think real hard. 3) Write the solution down. 4) Repeat!

Questions?

© 2007 Microsoft Corporation. All rights reserved. This presentation is for informational purposes only. MICROSOFT MAKES NO WARRANTIES, EXPRESS OR IMPLIED, IN THIS SUMMARY.