“Tracking Pointers with Path and Context Sensitivity for Bug Detection in C Programs” CMSC 838Z – Spring 2004 V. Benjamin Livshits and Monica S. Lam presented.

Presentation transcript:

“Tracking Pointers with Path and Context Sensitivity for Bug Detection in C Programs” CMSC 838Z – Spring 2004 V. Benjamin Livshits and Monica S. Lam presented by Mujtaba Ali (Based on a presentation given at ACM SIGSOFT FSE-11, September 2003)

2 Bugs are Bad Costs lots of money to fix after deployment Nastiest bugs == security violations –Hard to discover effects of malicious attacks –Legal/liability issues

3 CVE Classification Study of security reports in 2002 (source: SecurityFocus.com) [Chart: vulnerability classes; the portion labeled "Would like to address these" is 62%]

4 Motivation Goal: Report hard-to-find security violations –Errors spanning many functions and files –Reduce false positives Applications: –Buffer overflows –Format string vulnerabilities

5 Examples Two security vulnerabilities: –Buffer overrun in gzip, a compression utility –Format string violation in muh, an IRC bouncer Cause: Unsafe use of user-supplied data –gzip copies data to a statically-sized buffer, which may result in an overrun –muh uses data as the format argument in a call to vsnprintf, so a user can maliciously embed %n into the format string

6 Buffer Overrun in gzip
  gzip.c:233
    0233 char ifname[MAX_PATH_LEN]; /*input file name*/
  gzip.c:593
    0592 while (optind < argc) {
    0593     treat_file(argv[optind++]);
    0594 }
  gzip.c:716
    0704 local void treat_file(char *iname){
             if (get_istat(iname, &istat) != OK) return;
  gzip.c:1009
    0997 local int get_istat(char *iname, struct stat *sbuf){
             strcpy(ifname, iname);
  Need a model of strcpy
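The overrun arises because strcpy copies a command-line argument of arbitrary length into a fixed-size buffer. A minimal sketch of a length-checked copy follows. The names copy_name, name_buf and NAME_BUF_LEN are hypothetical stand-ins chosen for illustration; this is not the fix applied in gzip.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define NAME_BUF_LEN 16  /* hypothetical stand-in for MAX_PATH_LEN */

static char name_buf[NAME_BUF_LEN];  /* plays the role of ifname */

/* Copy src into name_buf only if it fits, including the terminating NUL.
 * Returns 0 on success and -1 if the input is too long. */
int copy_name(const char *src)
{
    assert(src != NULL);
    size_t len = strlen(src);
    if (len >= sizeof name_buf)
        return -1;                   /* reject instead of overrunning */
    memcpy(name_buf, src, len + 1);  /* len + 1 copies the NUL as well */
    return 0;
}
```

A caller would check the return value and skip the file when the name does not fit, leaving the buffer untouched.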

7 Format String Violation in muh
  muh.c:839
    0838 s = ( char * )malloc( 1024 );
    0839 while( fgets( s, 1023, messagelog ) ) {
    0841     irc_notice(&c_client, status.nickname, s);
    0842 }
    0843 FREESTRING( s );
  irc.c:263
    257 void irc_notice(con_type *con, char nick[], char *format,... ){
    259     va_list va;
    260     char buffer[ BUFFERSIZE ];
            va_start( va, format );
    263     vsnprintf( buffer, BUFFERSIZE - 10, format, va );
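The violation is that text read from a file reaches the format parameter of a printf-family function. The usual remedy, sketched below, is to pass untrusted text as an argument to a constant "%s" format. The names format_notice, notice_from_untrusted and NOTICE_BUF_LEN are hypothetical; this is an illustration, not muh's code.

```c
#include <assert.h>
#include <stdarg.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

#define NOTICE_BUF_LEN 64  /* hypothetical stand-in for BUFFERSIZE */

/* Variadic formatter in the style of irc_notice: format is interpreted. */
static int format_notice(char *out, size_t out_len, const char *format, ...)
{
    va_list va;
    va_start(va, format);
    int n = vsnprintf(out, out_len, format, va);
    va_end(va);
    return n;
}

/* Untrusted text is passed as data to a constant "%s" format,
 * so conversion specifiers inside it are copied, not interpreted. */
int notice_from_untrusted(char *out, size_t out_len, const char *untrusted)
{
    assert(out != NULL && untrusted != NULL);
    return format_notice(out, out_len, "%s", untrusted);
}
```

With this wrapper, an input such as "50%n off %x" is copied into the output verbatim.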

8 Easy Bugs are Boring Programs have security violations despite code reviews and years of use Common observation about hard errors: –Errors on interface boundaries – need to follow data flow between procedures –Errors occur along complicated control-flow paths: need to follow long definition-use chains

9 Need for Alias Analysis Both examples involved complex flow of data Tracking data flow in modern languages requires alias analysis Steensgaard’s or Andersen’s analysis? –Flow- and context- insensitive –Fast, but imprecise: too many false positives –But flow- and context- sensitive analyses do not scale

10 Tradeoff: Scalability vs Precision [Figure: analyses plotted by speed/scalability (fast to slow and expensive) against precision (low to high); analyses shown: Steensgaard, Andersen, this analysis, Wilson & Lam, 3-value logic]

11 A Hybrid Analysis Maintain precision selectively. Analyze precisely: –Local variables –Function parameters –Global variables –…their dereferences and fields –These are essentially access paths, e.g. p.next.data. The rest: break into equivalence classes, represented by abstract locations: –Recursive data structures –Arrays –Locations accessed through pointer arithmetic

12 Linearity
  Regular assignments result in strong updates:
    x = 1;       x0 = 1
    x = 2;       x1 = 2
    y = x;       y0 = x1           (x is 2)
  Assignments to abstract memory locations result in weak updates:
    A[i] = 1;    m0 = 1
    A[j] = 2;    m1 = φ(m0, 2)
    b = A[k];    b0 = m1           (either 1 or 2)
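The difference between the two kinds of update can be sketched with a tiny abstract domain. The representation below (a bit set of possible small integer values) and the names ValueSet, strong_update and weak_update are assumptions made for illustration, not the data structures of the analysis.

```c
#include <assert.h>

/* A set of possible small integer values (0..31), one bit per value. */
typedef unsigned int ValueSet;

static ValueSet singleton(int v)
{
    assert(v >= 0 && v < 32);
    return 1u << v;
}

/* Strong update: the location is one concrete variable,
 * so the new value replaces whatever was known before. */
ValueSet strong_update(ValueSet old, int v)
{
    (void)old;  /* the previous contents are killed */
    return singleton(v);
}

/* Weak update: the location stands for many concrete cells,
 * so the new value is merged with the old ones, as in m1 = phi(m0, 2). */
ValueSet weak_update(ValueSet old, int v)
{
    return old | singleton(v);
}
```

After x = 1; x = 2; the strong update leaves only {2}, while after A[i] = 1; A[j] = 2; the weak update leaves {1, 2}.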

13 Static Single Assignment Sparse representation of program Propagate facts about definition where needed –Definition-use relationships Each variable is defined only once –Give new names (subscripts) to definitions –Solves flow-sensitivity problem But a definition may be used many times

14 SSA: Join points In standard SSA, use φ function at joins –d3 = φ(d1, d2) In Gated SSA, use γ function at joins –d3 = γ(<P, d1>, <¬P, d2>) –Solves path-sensitivity problem

15 IPSSA – Intraprocedurally Extension of Gated SSA Provides pointer resolution –Replace indirect pointer dereferences with direct accesses of potentially new temporary locations

16 Example of Pointer Resolution
    int a=0,b=1;          a0 = 0, b0 = 1
    int c=2,d=3;          c0 = 2, d0 = 3
    if(Q){
        p = &a;           p1 = &a
    }else{
        p = &b;           p2 = &b
    }                     p3 = γ(<Q, p1>, <¬Q, p2>)
    c = *p;               c1 = γ(<Q, a0>, <¬Q, b0>)     Load resolution
    *p = d;               a1 = γ(<Q, d0>, <¬Q, a0>)     Store resolution
                          b1 = γ(<Q, b0>, <¬Q, d0>)
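In the example, the load c = *p reads a when Q holds and b otherwise, so the resolved definition carries a guard on each arm. A small sketch of selecting a gated value once the predicate is known is given below; GammaArm and gamma_select are hypothetical names, and real gated SSA guards are arbitrary predicates rather than a single boolean.

```c
#include <assert.h>
#include <stdbool.h>

/* One arm of a two-way gamma: a guard (Q or !Q) and the value
 * that flows to the join when that guard holds. */
typedef struct {
    bool guard_is_negated;  /* false: <Q, def>, true: <!Q, def> */
    int  def_value;
} GammaArm;

/* Select the value of gamma(<Q, x>, <!Q, y>) once Q is known. */
int gamma_select(const GammaArm arms[2], bool q)
{
    for (int i = 0; i < 2; i++) {
        bool guard = arms[i].guard_is_negated ? !q : q;
        if (guard)
            return arms[i].def_value;
    }
    assert(!"guards of a gamma must be exhaustive");
    return 0;
}
```

For the load above, with a initially 0 and b initially 1, the arms are {Q: 0} and {!Q: 1}.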

17 Pointer Resolution Rules When resolving definition d, next step depends on RHS of d Expressed as conditional rewrite rules A few sample rules: –d = &x, result is x –d = ι(…), result is d^ –d = γ(<P1, d1>, …, <Pn, dn>), follow d1…dn Refer to the paper for details

18 Interprocedural Example Data flow in and out of functions: –Create links between formal and actual parameters –Reflect stores and assignments to globals at the callee
    int f(int* p){        p0 = ι(<c, q0>)       Formal-actual connection for call site c
        *p = 100;         p^1 = 100
    }
    int main(){
        int x = 0;        x0 = 0
        int *q = &x;      q0 = &x
    c:  f(q);             x1 = ρ(<c, p^1>)      Reflect store inside of f within main
    }

19 Unsound Unaliasing Assumption
                 | A1: No aliased parameters                                       | A2: No aliased abstract locations
  Assumption     | Locations accessible through different parameters are distinct | Things pulled out of an abstract location are not aliased
  Justification  | Matches how good interfaces are written                         | Holds in most usage cases
  Consequence    | Context-independent procedure summaries                         | Give unique names when we get data from an abstract location

20 Interprocedural Algorithm Process one SCC of the call graph at a time –Bottom-up –SCC == strongly-connected component For each SCC, within each procedure: –Resolve all pointer operations (loads and stores) –Create links between formal and actual parameters –Reflect stores and assignments to globals at call sites Iterate within SCC until the representation stabilizes
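The "iterate until the representation stabilizes" step is a fixed-point computation. The sketch below simplifies heavily: it iterates over all procedures at once rather than one SCC at a time, and the summary is a single boolean ("may store to a global") instead of IPSSA. Proc and compute_summaries are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_PROCS 8  /* hypothetical size limits for the sketch */
#define MAX_CALLS 4

typedef struct {
    bool writes_global;       /* local fact: the body stores to a global */
    int  callees[MAX_CALLS];  /* indices of procedures called directly */
    int  n_callees;
} Proc;

/* summary[i] becomes true if procedure i, or anything it may call,
 * stores to a global. Returns the number of passes until stable. */
int compute_summaries(const Proc procs[], int n, bool summary[])
{
    int passes = 0;
    bool changed = true;

    assert(n >= 0 && n <= MAX_PROCS);
    for (int i = 0; i < n; i++)
        summary[i] = procs[i].writes_global;

    while (changed) {
        changed = false;
        passes++;
        for (int i = 0; i < n; i++) {
            for (int k = 0; k < procs[i].n_callees && !summary[i]; k++) {
                if (summary[procs[i].callees[k]]) {
                    summary[i] = true;  /* reflect the callee's effect */
                    changed = true;
                }
            }
        }
    }
    return passes;
}
```

Mutually recursive procedures (a cycle in the call graph) are handled by the repeated passes, which is the reason iteration is needed within an SCC.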

21 Framework [Diagram: program sources → IPSSA construction → IP data flow info → tools (buffer overruns, format violations, …others…) → error traces] IPSSA construction abstracts away many details, which makes it easy to write tools. The framework makes it easy to add new analyses.

22 Application 1. Start at roots – sources of user input such as –argv[] elements –Input functions: fgets, gets, recv, getenv, etc. 2. Follow data flow chains provided by IPSSA: for every definition, IPSSA provides a list of its uses 3. A sink is a potentially dangerous usage such as –A buffer of a statically defined length –A format argument of vulnerable functions: printf, fprintf, snprintf, vsnprintf 4. Report bug, record full path
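The four steps amount to a reachability search from sources to sinks over definition-use edges. A minimal worklist sketch follows; the Def structure and count_tainted_sinks are hypothetical, and the sketch only counts the sinks reached, whereas the tool also records the full path.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_DEFS 8  /* hypothetical size limits for the sketch */
#define MAX_USES 4

typedef struct {
    bool is_source;       /* e.g. a definition produced by fgets or argv[] */
    bool is_sink;         /* e.g. a format argument or a fixed-size buffer */
    int  uses[MAX_USES];  /* indices of definitions that use this one */
    int  n_uses;
} Def;

/* Mark every definition reachable from a source along definition-use
 * edges and return how many sinks were reached. tainted has n entries. */
int count_tainted_sinks(const Def defs[], int n, bool tainted[])
{
    int worklist[MAX_DEFS];
    int top = 0, hits = 0;

    assert(n >= 0 && n <= MAX_DEFS);
    for (int i = 0; i < n; i++) {
        tainted[i] = defs[i].is_source;
        if (tainted[i])
            worklist[top++] = i;
    }
    while (top > 0) {
        int d = worklist[--top];
        if (defs[d].is_sink)
            hits++;  /* a tainted value reaches a dangerous use */
        for (int k = 0; k < defs[d].n_uses; k++) {
            int u = defs[d].uses[k];
            assert(u >= 0 && u < n);
            if (!tainted[u]) {  /* each definition is pushed at most once */
                tainted[u] = true;
                worklist[top++] = u;
            }
        }
    }
    return hits;
}
```

Modelling the muh example as three definitions (the fgets result, the format parameter, the vsnprintf argument) yields one tainted sink.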

23 Experimental Setup Implementation w/ SUIF2 compiler suite –Pentium IV 2GHz, 2GB of RAM running Linux

24 Summary of Results [Table: results per benchmark; callouts: many definitions, many procedures]

25 False Positive in pcre Copying "tainted" user data to a statically-sized buffer may be unsafe –Turns out to be safe in this case: sprintf(buffer, "%.512s", filename) –filename is tainted data –The precision %.512s limits the length of copied data –Buffer is big enough!
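The call is safe because a precision on %s caps the number of characters written, which a taint-only analysis does not see. The sketch below shows the same effect with a smaller, hypothetical limit of 8 characters; copy_bounded is an illustrative name, not pcre code.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Writes at most 8 characters of name plus a terminating NUL into out.
 * The caller must supply a buffer of at least 9 bytes. */
int copy_bounded(char *out, const char *name)
{
    assert(out != NULL && name != NULL);
    return sprintf(out, "%.8s", name);  /* precision truncates long input */
}
```

A 16-character input is truncated to its first 8 characters, so a 9-byte buffer cannot be overrun.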

26 Related Work xgcc –Also unsound –Many more false positives –Even “regular” developers can specify new analyses Cqual –Interprocedural, sound analysis –Requires annotations –Flow-, context-, and path-insensitive More false positives

27 The Good Unsoundness gives very few false positives Unsoundness allows efficient path- and context-sensitive analysis Tested on real code Good presentations to steal slides from

28 The Bad Cqual is much improved now –Very few false positives on these types of security vulnerabilities –Let’s see some more interesting vulnerabilities Does not detect buffer overflows due to array bounds violations Not tested with large programs

29 Singular Key Idea IPSSA coupled with a (slightly) unsound alias analysis facilitates efficient detection of hard-to-find security violations with very few false positives