Presentation is loading. Please wait.

Presentation is loading. Please wait.

“Tracking Pointers with Path and Context Sensitivity for Bug Detection in C Programs” CMSC 838Z – Spring 2004 V. Benjamin Livshits and Monica S. Lam presented.

Similar presentations


Presentation on theme: "“Tracking Pointers with Path and Context Sensitivity for Bug Detection in C Programs” CMSC 838Z – Spring 2004 V. Benjamin Livshits and Monica S. Lam presented."— Presentation transcript:

1 “Tracking Pointers with Path and Context Sensitivity for Bug Detection in C Programs” CMSC 838Z – Spring 2004 V. Benjamin Livshits and Monica S. Lam presented by Mujtaba Ali (Based on a presentation given at ACM SIGSOFT FSE-11, September 2003)

2 2 Bugs are Bad Costs lots of money to fix after deployment Nastiest bugs == security violations –Hard to discover effects of malicious attacks –Legal/liability issues

3 3 CVE Classification 62% Would like to address these Study of security reports in 2002 –source: SecurityFocus.com

4 4 Motivation Goal: Report hard-to-find security violations –Errors spanning many functions and files –Reduce false positives Applications: –Buffer overflows –Format string vulnerabilities

5 5 Examples Two security vulnerabilities: –Buffer overrun in gzip, compression utility –Format string violation in muh, network game Cause: Unsafe use of user-supplied data –gzip copies data to statically-sized buffer may result in an overrun –muh uses data as format argument call to vsnprintf user can maliciously embed %n into format string

6 6 Buffer Overrun in gzip 0592 while (optind < argc) { 0593 treat_file(argv[optind++]); 0594 } 0704 local void treat_file(char *iname){... 0716 if (get_istat(iname, &istat) != OK) retu rn; 0997 local int get_istat(char *iname, struct stat *sbuf){... 1009 strcpy(ifname, iname); gzip.c:593 gzip.c:1009 gzip.c:716 Need a model of strcpy 0233 char ifname[MAX_PATH_LEN]; /*input file name*/ gzip.c:233

7 7 Format String Violation in muh 0838 s = ( char * )malloc( 1024 ); 0839 while( fgets( s, 1023, messagelog ) ) { 0841 irc_notice(&c_client, status.nickname, s); 0842 } 0843 FREESTRING( s ); 257 void irc_notice(con_type *con, char nick[], char *format,... ){ 259 va_list va; 260 char buffer[ BUFFERSIZE ]; 261 262 va_start( va, format ); 263 vsnprintf( buffer, BUFFERSIZE - 10, format, va ); muh.c:839 irc.c:263

8 8 Easy Bugs are Boring Programs have security violations despite code reviews and years of use Common observation about hard errors: –Errors on interface boundaries – need to follow data flow between procedures –Errors occur along complicated control-flow paths: need to follow long definition-use chains

9 9 Need for Alias Analysis Both examples involved complex flow of data Tracking data flow in modern languages requires alias analysis Steensgaard’s or Andersen’s analysis? –Flow- and context- insensitive –Fast, but imprecise: too many false positives –But flow- and context- sensitive analyses do not scale

10 10 Tradeoff: Scalability vs Precision slow and expensive Speed / Scalability Precision low high Steensgaard 3-value logic This analysis Andersen Wilson & Lam fast

11 11 A Hybrid Analysis Analyze precisely: –Local variables –Function parameters –Global variables –…their dereferences and fields –These are essentially access paths, i.e. p.next.data. The rest: Break into equivalence classes Represent by abstract locations: –Recursive data structures –Arrays –Locations accessed through pointer arithmetic Maintain precision selectively

12 12 Linearity Regular assignments result in strong updates Assignments to abstract memory locations – weak updates x = 1; x 0 = 1 x = 2; x 1 = 2 y = x; y 0 = x 1 x is 2 A[i] = 1; m 0 = 1 A[j] = 2; m 1 =  (m 0, 2) b = A[k]; b 0 = m 1 Either 1 or 2

13 13 Static Single Assignment Sparse representation of program Propagate facts about definition where needed –Definition-use relationships Each variable is defined only once –Give new names (subscripts) to definitions –Solves flow-sensitivity problem But a definition may be used many times

14 14 SSA: Join points In standard SSA, use  function at joins –d3 =  (d1, d2) In Gated SSA, use  function at joins –d3 =  (, ) –Solves path-sensitivity problem

15 15 IPSSA – Intraprocedurally Extension of Gated SSA Provides pointer resolution –Replace indirect pointer dereferences with direct accesses of potentially new temporary locations

16 16 Example of Pointer Resolution Load resolution Store resolution int a=0,b=1; int c=2,d=3; if(Q){ p = &a; }else{ p = &b; } c = *p; *p = d; p 1 = &a p 2 = &b p 3 =  (, ) c 1 =  (, ) a 1 =  (, ) b 1 =  (, ) a 0 = 0, b 0 = 1 c 0 = 2, d 0 = 3

17 17 Pointer Resolution Rules When resolving definition d, next step depends on RHS of d Expressed as conditional rewrite rules A few sample rules: –d = &x, result is x –d =  (…), result is d^ –d =  (,…, ), follow d 1 …d n Refer to the paper for details

18 18 Interprocedural Example Data flow in and out of functions: –Create links between formal and actual parameters –Reflect stores and assignments to globals at the callee int f(int* p){ *p = 100; } int main(){ int x = 0; int *q = &x; c: f(q); } p 0 =  ( ) p^ 1 = 100 x 1 = ρ( ) q 0 = &x Formal-actual connection for call site c Reflect store inside of f within main x 0 = 0

19 19 Unsound Unaliasing Assumption A1: No aliased parameters A2: No aliased abstract locations Assumption Locations accessible through different parameters are distinct Things pulled out of an abstract location is not aliased Justification Matches how good interfaces are written Holds in most usage cases Consequence Context-independent procedure summaries Give unique names when we get data from abstract location

20 20 Interprocedural Algorithm Process one SCC of the call graph at a time –Bottom-up –SCC == strongly-connected component For each SCC, within each procedure: –Resolve all pointer operations (loads and stores) –Create links between formal and actual parameters –Reflect stores and assignments to globals at call sites Iterate within SCC until the representation stabilizes

21 21 Framework Program sources IPSSA construction Buffer overruns Error traces Format violations …others… Abstracts away many details. Makes it easy to write tools IP data flow info Framework makes it easy to add new analyses

22 22 Application 1.Start at roots – sources of user input such as –argv[] elements –Input functions: fgets, gets, recv, getenv, etc. 2.Follow data flow chains provided by IPSSA: for every definition, IPSSA provides a list of its uses 3.A sink is a potentially dangerous usage such as –A buffer of a statically defined length –A format argument of vulnerable functions: printf, fprintf, snprintf, vsnprintf 4.Report bug, record full path

23 23 Experimental Setup Implementation w/ SUIF2 compiler suite –Pentium IV 2GHz, 2GB of RAM running Linux

24 24 Summary of Results Many definitions Many procedures

25 25 False Positive in pcre Copying “tainted” user data to a statically- sized buffer may be unsafe –Turns out to be safe in this case sprintf(buffer, “%.512s”, filename) Limits the length of copied data. Buffer is big enough! Tainted data

26 26 Related Work xgcc –Also unsound –Many more false positives –Even “regular” developers can specify new analyses Cqual –Interprocedural, sound analysis –Requires annotations –Flow-, context-, and path-insensitive More false positives

27 27 The Good Unsoundness gives very few false positives Unsoundness allows efficient path- and context-sensitive analysis Tested on real code Good presentations to steal slides from

28 28 The Bad Cqual is much improved now –Very few false positives on these types of security vulnerabilities –Let’s see some more interesting vulnerabilities Does not detect buffer overflows due to array bounds violations Not tested with large programs

29 29 Singular Key Idea IPSSA coupled with a (slightly) unsound alias analysis facilitates efficient detection of hard- to-find security violations with very few false positives


Download ppt "“Tracking Pointers with Path and Context Sensitivity for Bug Detection in C Programs” CMSC 838Z – Spring 2004 V. Benjamin Livshits and Monica S. Lam presented."

Similar presentations


Ads by Google