Slide 1 Vitaly Shmatikov CS 380S Static Defenses against Memory Corruption.

Slides:



Advertisements
Similar presentations
Dataflow Analysis for Datarace-Free Programs (ESOP 11) Arnab De Joint work with Deepak DSouza and Rupesh Nasre Indian Institute of Science, Bangalore.
Advertisements

Advanced programming tools at Microsoft
Automated Theorem Proving Lecture 1. Program verification is undecidable! Given program P and specification S, does P satisfy S?
Buffer Overflows Nick Feamster CS 6262 Spring 2009 (credit to Vitaly S. from UT for slides)
De necessariis pre condiciones consequentia sine machina P. Consobrinus, R. Consobrinus M. Aquilifer, F. Oratio.
A Program Transformation For Faster Goal-Directed Search Akash Lal, Shaz Qadeer Microsoft Research.
Pointer Analysis – Part I Mayur Naik Intel Research, Berkeley CS294 Lecture March 17, 2009.
The Interface Definition Language for Fail-Safe C Kohei Suenaga, Yutaka Oiwa, Eijiro Sumii, Akinori Yonezawa University of Tokyko.
Names and Bindings.
CSc 352 Programming Hygiene Saumya Debray Dept. of Computer Science The University of Arizona, Tucson
INF 212 ANALYSIS OF PROG. LANGS Type Systems Instructors: Crista Lopes Copyright © Instructors.
Various languages….  Could affect performance  Could affect reliability  Could affect language choice.
CSCI 171 Presentation 11 Pointers. Pointer Basics.
1 Lecture 05 – Pointer Analyses & CSSV Eran Yahav.
CSSV: Towards a Realistic Tool for Statically Detecting All Buffer Overflows in C Nurit Dor (TAU), Michael Rodeh (IBM Research Haifa), Mooly Sagiv (TAU)
Gabe Kanzelmeyer CS 450 4/14/10.  What is buffer overflow?  How memory is processed and the stack  The threat  Stack overrun attack  Dangers  Prevention.
Introduction The Approach ’ s Overview A Language of Pointers The Type System Operational Semantics Type Safety Type Inference The Rest of C Experiments.
Type-Safe Programming in C George Necula EECS Department University of California, Berkeley.
Program analysis Mooly Sagiv html://
Memory Arrangement Memory is arrange in a sequence of addressable units (usually bytes) –sizeof( ) return the number of units it takes to store a type.
4/30/2008Prof. Hilfinger CS 164 Lecture 391 Language Security Lecture 39 (from notes by G. Necula)
Program analysis Mooly Sagiv html://
Static Detection of Buffer Overrun In C Code Lucas Silacci CSE 231 Spring 2000 University of California San Diego.
C pointers (Reek, Ch. 6) 1CS 3090: Safety Critical Programming in C.
May 9, 2001OSQ Retreat 1 Run-Time Type Checking for Pointers and Arrays in C Wes Weimer, George Necula Scott McPeak, S.P. Rahul, Raymond To.
1 ES 314 Advanced Programming Lec 2 Sept 3 Goals: Complete the discussion of problem Review of C++ Object-oriented design Arrays and pointers.
May 22, 2002OSQ Retreat 1 CCured: Taming C Pointers George Necula Scott McPeak Wes Weimer
Statically Detecting Likely Buffer Overflow Vulnerabilities David Larochelle David Evans University of Virginia Department of Computer Science Supported.
C++ Programming: Program Design Including Data Structures, Fourth Edition Chapter 13: Pointers, Classes, Virtual Functions, and Abstract Classes.
Data Structures Using C++ 2E Chapter 3 Pointers and Array-Based Lists.
C++ Programming: From Problem Analysis to Program Design, Fourth Edition Chapter 14: Pointers, Classes, Virtual Functions, and Abstract Classes.
Security Exploiting Overflows. Introduction r See the following link for more info: operating-systems-and-applications-in-
CS 11 C track: lecture 5 Last week: pointers This week: Pointer arithmetic Arrays and pointers Dynamic memory allocation The stack and the heap.
COP4020 Programming Languages
Chapter 6 Buffer Overflow. Buffer Overflow occurs when the program overwrites data outside the bounds of allocated memory It was one of the first exploited.
Names Variables Type Checking Strong Typing Type Compatibility 1.
Computer Security and Penetration Testing
A Survey of Dynamic Techniques for Detecting Device Driver Errors Olatunji Ruwase LBA Reading Group 18 th May 2010.
CSC-682 Cryptography & Computer Security Sound and Precise Analysis of Web Applications for Injection Vulnerabilities Pompi Rotaru Based on an article.
CMPSC 16 Problem Solving with Computers I Spring 2014 Instructor: Tevfik Bultan Lecture 12: Pointers continued, C strings.
Overloading Binary Operators Two ways to overload –As a member function of a class –As a friend function As member functions –General syntax Data Structures.
Type Systems CS Definitions Program analysis Discovering facts about programs. Dynamic analysis Program analysis by using program executions.
Pointers OVERVIEW.
Advanced Computer Architecture Lab University of Michigan USENIX Security ’03 Slide 1 High Coverage Detection of Input-Related Security Faults Eric Larson.
ECE 103 Engineering Programming Chapter 47 Dynamic Memory Alocation Herbert G. Mayer, PSU CS Status 6/4/2014 Initial content copied verbatim from ECE 103.
Data Structures Using C++ 2E Chapter 3 Pointers. Data Structures Using C++ 2E2 Objectives Learn about the pointer data type and pointer variables Explore.
Overflow Examples 01/13/2012. ACKNOWLEDGEMENTS These slides where compiled from the Malware and Software Vulnerabilities class taught by Dr Cliff Zou.
Static Analysis of Memory Errors Mooly Sagiv Tel Aviv University.
Buffer Overflow Proofing of Code Binaries By Ramya Reguramalingam Graduate Student, Computer Science Advisor: Dr. Gopal Gupta.
Structure of Programming Languages Names, Bindings, Type Checking, and Scopes.
Protecting C Programs from Attacks via Invalid Pointer Dereferences Suan Hsi Yong, Susan Horwitz University of Wisconsin – Madison.
Computer Organization and Design Pointers, Arrays and Strings in C Montek Singh Sep 18, 2015 Lab 5 supplement.
Pointers. Variable Declarations Declarations served dual purpose –Specification of range of values and operations –Specification of Storage requirement.
How to execute Program structure Variables name, keywords, binding, scope, lifetime Data types – type system – primitives, strings, arrays, hashes – pointers/references.
CSSV – C String Static Verifier Nurit Dor Michael Rodeh Mooly Sagiv Greta Yorsh Tel-Aviv University
Semantic Analysis II Type Checking EECS 483 – Lecture 12 University of Michigan Wednesday, October 18, 2006.
CSSV: Towards a Realistic Tool for Statically Detecting All Buffer Overflows in C Nurit Dor, Michael Rodeh, Mooly Sagiv PLDI’2003 DAEDALUS project.
© 2006 Carnegie Mellon University Introduction to CBMC: Part 1 Software Engineering Institute Carnegie Mellon University Pittsburgh, PA Arie Gurfinkel,
CS 330 Programming Languages 10 / 23 / 2007 Instructor: Michael Eckmann.
/ PSWLAB Evidence-Based Analysis and Inferring Preconditions for Bug Detection By D. Brand, M. Buss, V. C. Sreedhar published in ICSM 2007.
1 Dynamic Memory Allocation. 2 In everything we have done so far, our variables have been declared at compile time. In these slides, we will see how to.
CMPSC 16 Problem Solving with Computers I Spring 2014 Instructor: Lucas Bang Lecture 11: Pointers.
Announcements You will receive your scores back for Assignment 2 this week. You will have an opportunity to correct your code and resubmit it for partial.
MOPS: an Infrastructure for Examining Security Properties of Software Authors Hao Chen and David Wagner Appears in ACM Conference on Computer and Communications.
© 2006 Carnegie Mellon University Introduction to CBMC: Part 1 Software Engineering Institute Carnegie Mellon University Pittsburgh, PA Arie Gurfinkel,
Language-Based Security: Overview of Types Deepak Garg Foundations of Security and Privacy October 27, 2009.
Dynamic Allocation in C
Object Lifetime and Pointers
Dynamic Storage Allocation
High Coverage Detection of Input-Related Security Faults
Presentation transcript:

slide 1 Vitaly Shmatikov CS 380S Static Defenses against Memory Corruption

slide 2 Reading Assignment uWagner et al. “A first step towards automated detection of buffer overrun vulnerabilities” (NDSS 2000). uGanapathy et al. “Buffer overrun detection using linear programming and static analysis” (CCS 2003). uDor, Rodeh, Sagiv. “CSSV: Towards a realistic tool for statically detecting all buffer overflows in C” (PLDI 2003).

slide 3 Static Analysis uGoal: catch buffer overflow bugs by analyzing the source code of the program Typically at compile-time, but also binary analysis uStatic analysis is necessarily imprecise Soundness: finds all instances of buffer overflow –Problem: false positives (good code erroneously flagged) Completeness: every reported problem is indeed an instance of buffer overflow –Problem: false negatives (misses some buffer overflows) No technique is both sound and complete (why?) Maybe don’t need either…

slide 4 Static vs. Dynamic uBoth static and dynamic approaches have their advantages and disadvantages (what are they?) uHybrid approaches (example: CCured) Try to verify absence of memory errors statically, then insert runtime checks where static verification failed uPerformance and usability are always important Does source code need to be modified? Does source code need to be recompiled? How is backward compatibility (if any) achieved? –Rewriting binaries vs. special runtime environment

slide 5 BOON uTreat C strings as abstract data types Assume that C strings are accessed only through library functions: strcpy, strcat, etc. Pointer arithmetic is greatly simplified (what does this imply for soundness?) uCharacterize each buffer by its allocated size and current length (number of bytes in use) uFor each of these values, statically determine acceptable range at each point of the program Done at compile-time, thus necessarily conservative (what does this imply for completeness?) [Wagner et al.]

slide 6 uLet s be some string variable used in the program ulen(s) is the set of possible lengths Why is len(s) not a single integer, but a set? ualloc(s) is the set of possible values for the number of bytes allocated for s Is it possible to compute len(s) and alloc(s) precisely at compile-time? uAt each point in program execution, want len(s)  alloc(s) Safety Condition

slide 7 Integer Constraints uEvery string operation is associated with a constraint describing its effects strcpy(dst,src) strncpy(dst,src,n) gets(s) s=“Hello!” s[n]=‘\0’ len(src)  len(dst) min(len(src),n)  len(dst) [1,  ]  len(s) 7  len(s), 7  alloc(s) min(len(s),n+1))  len(s) and so on Does this fully capture what strncpy does? Range of possible values

slide 8 Constraint Generation Example char buf[128]; while (fgets(buf, 128, stdin)) { if (!strchr(buf, ‘\n’)) { char error[128]; sprintf(error,“Line too long: %s\n,buf); die(error); } … } 128  alloc(buf) [1,128]  len(buf) 128  alloc(error) len(buf)+16  len(error) [Wagner]

slide 9 Imprecision uSimplifies pointer arithmetic and pointer aliasing For example, q=p+j is associated with this constraint: alloc(p)-j  alloc(q), len(p)-j  len(q) This is unsound (why?) uIgnores function pointers uIgnores control flow and order of statements Consequence: every non-trivial strcat() must be flagged as a potential buffer overflow (why?) uMerges information from all call sites of a function into one variable

slide 10 Constraint Solving u“Bounding-box” algorithm (see paper) Imprecise, but scalable: sendmail (32K LoC) yields a system with 9,000 variables and 29,000 constraints uSuppose analysis discovers len(s) is in [a,b] range, and alloc(s) is in [c,d] range at some point If b  c, then code is “safe” –Does not completely rule out buffer overflow (why?) If a > d, then buffer overflow always occurs here If ranges overlap, overflow is possible uGanapathy et al.: model and solve the constraints as a linear program (see paper)

slide 11 BOON: Practical Results uFound new vulnerabilities in real systems code Exploitable buffer overflows in nettools and sendmail uLots of false positives, but still a dramatic improvement over hand search sendmail: over 700 calls to unsafe string functions, of them 44 flagged as dangerous, 4 are real errors Example of a false alarm: if (sizeof from e_from.q_paddr)+1) break; strcpy(from, e->e_from.q_paddr);

foo () { bar () { int x; int y; x = foobar(5); y = foobar(30); } } int foobar (int z) { int i; i = z + 1; return i; } slide 12 False path Result: x = y = [6..31] Context-Insensitivity is Imprecise

Adding Context Sensitivity uMake user functions context-sensitive For example, wrappers around library calls uInefficient method: constraint inlining Can separate calling contexts  Large number of constraint variables  Cannot support recursion uEfficient method: procedure summaries Summarize the called procedure Insert the summary at the callsite in the caller Remove false paths [Ganapathy et al.] slide 13

foo () { bar () { int x; int y; x = foobar(5); y = foobar(30); } } int foobar (int z) { int i; i = z + 1; return i; } slide 14 Context-Sensitive Analysis [Ganapathy et al.] x = y = Summary: i = z + 1

foo () { bar () { int x; int y; x = foobar(5); y = foobar(30); } } int foobar (int z) { int i; i = z + 1; return i; } slide 15 No False Paths [Ganapathy et al.] Jump functions Constraints x = [6..6] y = [31..31] i = [6..31]

Computing Procedure Summaries uIf function produces only difference constraints, reduces to an all-pairs shortest-path problem uOtherwise, Fourier-Motzkin variable elimination uTradeoff between precision and efficiency Constraint inlining: rename local variables of the called function at each callsite –Precise, but a huge number of variables and constraints Procedure summaries: merge variables across callsites –For example, constraint for i in the foobar example slide 16 [Ganapathy et al.]

Off-by-one Bug in sendmail orderq() reads a file from the queue directory, copies its name into d->d_name and w->w_name –As long as 21 bytes, including the ’\0’ terminator runqueue() calls dowork(w->w_name+2,...), dowork() stores its first argument into e->e_id queuename() concatenates "qf" and e->e_id, copies the result into 20-byte dfname buffer slide bytes 21 bytes uWagner et al.: a pointer to a structure of type T can point to all structures of type T Finds the bug, but do you see any issues? uGanapathy et al.: precise points-to analysis

slide 18 CSSV uGoal: sound static detection of buffer overflows What does this mean? uSeparate analysis for each procedure u“Contracts” specify procedure’s pre- and post- conditions, potential side effects Analysis only meaningful if contracts are correct uFlow-insensitive “points-to” pointer analysis uTransform C into a procedure over integers, apply integer analysis to find variable constraints Any potential buffer overflow in the original program violates an “assert” statement in this integer program [Dor, Rodeh, Sagiv]

char* strcpy(char* dst, char *src) requires modifies ensures string(src)  alloc(dst) > len(src) dst.strlen, dst.is_nullt len(dst) = =  return = = Example: strcpy Contract slide 19 [Dor, Rodeh, Sagiv]

#define BUFSIZ 1024 #include "insert_long.h" char buf[BUFSIZ]; char * insert_long (char *cp) { char temp[BUFSIZ]; int i; for (i=0; &buf[i] < cp; ++i){ temp[i] = buf[i]; } strcpy (&temp[i],"(long)"); strcpy (&temp[i + 6], cp); strcpy (buf, temp); return cp + 6; } Example: insert_long() slide 20 cp buf (long)temp cp buf (long)temp [Dor, Rodeh, Sagiv]

#define BUFSIZ 1024 #include "insert_long.h" char buf[BUFSIZ]; char * insert_long (char *cp) { char temp[BUFSIZ]; int i; for (i=0; &buf[i] < cp; ++i){ temp[i] = buf[i]; } strcpy (&temp[i],"(long)"); strcpy (&temp[i + 6], cp); strcpy (buf, temp); return cp + 6; } insert_long() Contract slide 21 char * insert_long(char *cp) requires string(cp)  buf ≤ cp < buf + BUFSIZ modifies cp.strlen ensures cp.strlen = = pre[cp.strlen] + 6  return_value = = cp + 6 ; [Dor, Rodeh, Sagiv]

slide 22 Pointer Analysis uGoal: compute points-to relation This is highly nontrivial for C programs (see paper) Pointer arithmetic, typeless memory locations, etc. uAbstract interpretation of memory accesses For each allocation, keep base and size in bytes Map each variable to their abstract locations We’ll see something similar in CCured uSound approximation of may-point-to For each pointer, set of abstract locations it can point to More conservative than actual points-to relation [Dor, Rodeh, Sagiv]

slide 23 C2IP: C to Integer Program uInteger variables only uNo function calls uNon-deterministic uConstraint variables uUpdate statements uAssert statements Any string manipulation error in the original C program is guaranteed to violate an assertion in integer program Based on points-to information [Dor, Rodeh, Sagiv]

slide 24 Transformations for C Statements For pointer p, l p - its location r p - location it points to (if several possibilities, use nondeterministic assignment) For abstract location l, l.val - potential values stored in the locations represented by l l.offset - potential values of the pointers represented by l l.aSize - allocation size l.is_nullt - null-terminated? l.len - length of the string [Dor, Rodeh, Sagiv]

slide 25 Correctness Assertions All dereferenced pointers point to valid locations Results of pointer arithmetic are valid [Dor, Rodeh, Sagiv]

slide 26 Example p = q + 5; Assert statement: Update statement: p.offset = q.offset + 5; assert ( 5 <= q.alloc && (!q.is_nullt || 5 <= q.len) ) [Dor, Rodeh, Sagiv]

slide 27 Nondeterminism *p = 0; if (…) { aloc 1.len = p.offset; aloc 1.is_nullt = true; } else { alloc 5.len = p.offset; alloc 5.is_nullt = true; } p aloc 1 aloc 5 [Dor, Rodeh, Sagiv]

Integer Analysis uInterval analysis not enough Loses relationships between variables uInfer variable constraints using abstract domain of polyhedra [Cousot and Halbwachs, 1978] a 1 * var 1 + a 2 * var 2 + … + a n * var n ≤ b slide 28 y  1 x + y  3 -x + y ≤ x y V = R = join [Dor, Rodeh, Sagiv]

#define BUFSIZ 1024 #include "insert_long.h" char buf[BUFSIZ]; char * insert_long (char *cp) { char temp[BUFSIZ]; int i; for (i=0; &buf[i] < cp; ++i){ temp[i] = buf[i]; } strcpy (&temp[i],"(long)"); strcpy (&temp[i + 6], cp); strcpy (buf, temp); return cp + 6; } insert_long() Redux slide 29 cp buf (long)temp cp buf (long)temp [Dor, Rodeh, Sagiv]

Integer Analysis of insert_long() slide 30 cp buf (long)temp assert(0  i < s temp.msize - 6); // strcpy(&temp[i],"(long)"); buf.offset = 0 temp.offset = 0 0  cp.offset = i i  s buf.len < s buf.msize s buf.msize = 1024 s temp.msize= 1024 Potential violation when cp.offset  1018 cp.offset  1018 [Dor, Rodeh, Sagiv]

CCured uGoal: make legacy C code type-safe uTreat C as a mixture of a strongly typed, statically checked language and an “unsafe” language checked at runtime All values belong either to “safe,” or “unsafe” world uCombination of static and dynamic checking Check type safety at compile-time whenever possible When compile-time checking fails, compiler inserts run-time checks in the code Fewer run-time checks  better performance slide 31 [Necula et al.]

Safe Pointers uEither NULL, or a valid address of type T uAliases are either safe pointers, or sequence pointers of base type T uWhat is legal to do with a safe pointer? Set to NULL Cast from a sequence pointer of base type T Cast to an integer uWhat runtime checks are required? Not equal to NULL when dereferenced slide 32

Sequence Pointers uAt runtime, either an integer, or points to a known memory area containing values of type T uAliases are safe, or sequence ptrs of base type T uWhat is legal to do with a sequence pointer? Perform pointer arithmetic Cast to a safe pointer of base type T Cast to or from an integer uWhat runtime checks are required? Points to a valid address when dereferenced –Subsumes NULL checking Bounds check when dereferenced or cast to safe ptr slide 33

Dynamic Pointers uAt runtime, either an integer, or points to a known memory area containing values of type T uThe memory area to which it points has tags that distinguish integers from pointers uAliases are dynamic pointers uWhat is legal to do with a dynamic pointer? Perform pointer arithmetic Cast to or from an integer or any dynamic pointer type uRuntime checks of address validity and bounds Maintain tags when reading & writing to base area slide 34

slide 35 int **a; int i; int acc; int **p; int *e; acc=0; for(i=0; i<100;i++){ p= a + i; e = *p; while((int) e % 2 == 0){ e = *(int **) e;} acc+=((int) e >> 1); } safe pointer sequence pointer dynamic pointer Example

For sequence and dynamic pointers, must keep track of the address and size of the pointed area for runtime bounds checking uEach allocated memory area is called a home (H), with a starting address h and a size uValid runtime values for a given type: Integers: ||int|| = N Safe pointers: ||τ ref SAFE|| = { h+i | h  H and 0  i<size(h) and (h=0 or kind(h)=Typed(τ)) } Sequence pointers: ||τ ref SEQ|| = { | h  H and (h=0 or kind(h)=Typed(τ)) } Dynamic pointers: ||DYNAMIC|| = { | h  H and (h=0 or kind(h)=Untyped) } slide 36 Modified Pointer Representation Safe pointers are integers, same as standard C

slide 37 Runtime Memory Safety uEach memory home (i.e., allocated memory area) has typing constraints Either contains values of type τ, or is untyped uIf a memory address belong to a home, its contents at runtime must satisfy the home’s typing constraints  h  H\{0}  i  N if 0  i<size(h) then (kind(h)=Untyped  Memory[h+i]  ||DYNAMIC|| and kind(h)=Typed(τ)  Memory[h+i]  ||τ||)

Runtime Checks uMemory accesses If via safe pointer, only check for non-NULL If via sequence or dynamic pointer, also bounds check uTypecasts From sequence pointers to safe pointers –This requires a bounds check! From pointers to integers From integers to sequence or dynamic pointers –But the home of the resulting pointer is NULL and it cannot be dereferenced; this breaks C programs that cast pointers into integers and back into pointers slide 38

slide 39 uManual: programmer annotates code uBetter: type inference Analyze the source code to find as many safe and sequence pointers as possible uThis is done by resolving a set of constraints If p is used in pointer arithmetic, p is not safe If p1 is cast to p2 –Either they are of the same kind, or p1 is a sequence pointer and p2 is a safe pointer –Pointed areas must be of same type, unless both are dynamic If p1 points to p2 and p1 is dynamic, then p2 dynamic uSee the CCured paper for more details Inferring Pointer Types

slide 40 Various CCured Issues uConverting a pointer to an integer and back to a pointer no longer works Sometimes fixed by forcing the pointer to be dynamic uModified pointer representation Not interoperable with libraries that are not recompiled using CCured (use wrappers) Breaks sizeof() on pointer types uIf program stores addresses of stack variables in memory, these variables must be moved to heap uGarbage collection instead of explicit deallocation

slide 41 Performance uMost pointers in benchmark programs were inferred safe, performance penalty under 90% Less than 20% in half the cases Minimal slowdown on I/O-bound applications –Linux kernel modules, Apache If all pointers were made dynamic, then 6 to 20 times slower (similar to a pure runtime-checks approach) On the other hand, pure runtime-checks approach does not require access to source code and recompilation uVarious bugs found in test programs Array bounds violations, uninitialized array indices

slide 42 Other Static Analysis Tools uCoverity uPREfix and PREfast (from Microsoft) uPolySpace uCyclone dialect of C uMany, many others For example, see