Download presentation
Presentation is loading. Please wait.
1
“Tracking Pointers with Path and Context Sensitivity for Bug Detection in C Programs” CMSC 838Z – Spring 2004 V. Benjamin Livshits and Monica S. Lam presented by Mujtaba Ali (Based on a presentation given at ACM SIGSOFT FSE-11, September 2003)
2
2 Bugs are Bad Costs lots of money to fix after deployment Nastiest bugs == security violations –Hard to discover effects of malicious attacks –Legal/liability issues
3
3 CVE Classification 62% Would like to address these Study of security reports in 2002 –source: SecurityFocus.com
4
4 Motivation Goal: Report hard-to-find security violations –Errors spanning many functions and files –Reduce false positives Applications: –Buffer overflows –Format string vulnerabilities
5
5 Examples Two security vulnerabilities: –Buffer overrun in gzip, compression utility –Format string violation in muh, network game Cause: Unsafe use of user-supplied data –gzip copies data to statically-sized buffer may result in an overrun –muh uses data as format argument call to vsnprintf user can maliciously embed %n into format string
6
6 Buffer Overrun in gzip 0592 while (optind < argc) { 0593 treat_file(argv[optind++]); 0594 } 0704 local void treat_file(char *iname){... 0716 if (get_istat(iname, &istat) != OK) retu rn; 0997 local int get_istat(char *iname, struct stat *sbuf){... 1009 strcpy(ifname, iname); gzip.c:593 gzip.c:1009 gzip.c:716 Need a model of strcpy 0233 char ifname[MAX_PATH_LEN]; /*input file name*/ gzip.c:233
7
7 Format String Violation in muh 0838 s = ( char * )malloc( 1024 ); 0839 while( fgets( s, 1023, messagelog ) ) { 0841 irc_notice(&c_client, status.nickname, s); 0842 } 0843 FREESTRING( s ); 257 void irc_notice(con_type *con, char nick[], char *format,... ){ 259 va_list va; 260 char buffer[ BUFFERSIZE ]; 261 262 va_start( va, format ); 263 vsnprintf( buffer, BUFFERSIZE - 10, format, va ); muh.c:839 irc.c:263
8
8 Easy Bugs are Boring Programs have security violations despite code reviews and years of use Common observation about hard errors: –Errors on interface boundaries – need to follow data flow between procedures –Errors occur along complicated control-flow paths: need to follow long definition-use chains
9
9 Need for Alias Analysis Both examples involved complex flow of data Tracking data flow in modern languages requires alias analysis Steensgaard’s or Andersen’s analysis? –Flow- and context- insensitive –Fast, but imprecise: too many false positives –But flow- and context- sensitive analyses do not scale
10
10 Tradeoff: Scalability vs Precision slow and expensive Speed / Scalability Precision low high Steensgaard 3-value logic This analysis Andersen Wilson & Lam fast
11
11 A Hybrid Analysis Analyze precisely: –Local variables –Function parameters –Global variables –…their dereferences and fields –These are essentially access paths, i.e. p.next.data. The rest: Break into equivalence classes Represent by abstract locations: –Recursive data structures –Arrays –Locations accessed through pointer arithmetic Maintain precision selectively
12
12 Linearity Regular assignments result in strong updates Assignments to abstract memory locations – weak updates x = 1; x 0 = 1 x = 2; x 1 = 2 y = x; y 0 = x 1 x is 2 A[i] = 1; m 0 = 1 A[j] = 2; m 1 = (m 0, 2) b = A[k]; b 0 = m 1 Either 1 or 2
13
13 Static Single Assignment Sparse representation of program Propagate facts about definition where needed –Definition-use relationships Each variable is defined only once –Give new names (subscripts) to definitions –Solves flow-sensitivity problem But a definition may be used many times
14
14 SSA: Join points In standard SSA, use function at joins –d3 = (d1, d2) In Gated SSA, use function at joins –d3 = (, ) –Solves path-sensitivity problem
15
15 IPSSA – Intraprocedurally Extension of Gated SSA Provides pointer resolution –Replace indirect pointer dereferences with direct accesses of potentially new temporary locations
16
16 Example of Pointer Resolution Load resolution Store resolution int a=0,b=1; int c=2,d=3; if(Q){ p = &a; }else{ p = &b; } c = *p; *p = d; p 1 = &a p 2 = &b p 3 = (, ) c 1 = (, ) a 1 = (, ) b 1 = (, ) a 0 = 0, b 0 = 1 c 0 = 2, d 0 = 3
17
17 Pointer Resolution Rules When resolving definition d, next step depends on RHS of d Expressed as conditional rewrite rules A few sample rules: –d = &x, result is x –d = (…), result is d^ –d = (,…, ), follow d 1 …d n Refer to the paper for details
18
18 Interprocedural Example Data flow in and out of functions: –Create links between formal and actual parameters –Reflect stores and assignments to globals at the callee int f(int* p){ *p = 100; } int main(){ int x = 0; int *q = &x; c: f(q); } p 0 = ( ) p^ 1 = 100 x 1 = ρ( ) q 0 = &x Formal-actual connection for call site c Reflect store inside of f within main x 0 = 0
19
19 Unsound Unaliasing Assumption A1: No aliased parameters A2: No aliased abstract locations Assumption Locations accessible through different parameters are distinct Things pulled out of an abstract location is not aliased Justification Matches how good interfaces are written Holds in most usage cases Consequence Context-independent procedure summaries Give unique names when we get data from abstract location
20
20 Interprocedural Algorithm Process one SCC of the call graph at a time –Bottom-up –SCC == strongly-connected component For each SCC, within each procedure: –Resolve all pointer operations (loads and stores) –Create links between formal and actual parameters –Reflect stores and assignments to globals at call sites Iterate within SCC until the representation stabilizes
21
21 Framework Program sources IPSSA construction Buffer overruns Error traces Format violations …others… Abstracts away many details. Makes it easy to write tools IP data flow info Framework makes it easy to add new analyses
22
22 Application 1.Start at roots – sources of user input such as –argv[] elements –Input functions: fgets, gets, recv, getenv, etc. 2.Follow data flow chains provided by IPSSA: for every definition, IPSSA provides a list of its uses 3.A sink is a potentially dangerous usage such as –A buffer of a statically defined length –A format argument of vulnerable functions: printf, fprintf, snprintf, vsnprintf 4.Report bug, record full path
23
23 Experimental Setup Implementation w/ SUIF2 compiler suite –Pentium IV 2GHz, 2GB of RAM running Linux
24
24 Summary of Results Many definitions Many procedures
25
25 False Positive in pcre Copying “tainted” user data to a statically- sized buffer may be unsafe –Turns out to be safe in this case sprintf(buffer, “%.512s”, filename) Limits the length of copied data. Buffer is big enough! Tainted data
26
26 Related Work xgcc –Also unsound –Many more false positives –Even “regular” developers can specify new analyses Cqual –Interprocedural, sound analysis –Requires annotations –Flow-, context-, and path-insensitive More false positives
27
27 The Good Unsoundness gives very few false positives Unsoundness allows efficient path- and context-sensitive analysis Tested on real code Good presentations to steal slides from
28
28 The Bad Cqual is much improved now –Very few false positives on these types of security vulnerabilities –Let’s see some more interesting vulnerabilities Does not detect buffer overflows due to array bounds violations Not tested with large programs
29
29 Singular Key Idea IPSSA coupled with a (slightly) unsound alias analysis facilitates efficient detection of hard- to-find security violations with very few false positives
Similar presentations
© 2025 SlidePlayer.com. Inc.
All rights reserved.