CSSV: Towards a Realistic Tool for Statically Detecting All Buffer Overflows in C
Nurit Dor (TAU), Michael Rodeh (IBM Research Haifa), Mooly Sagiv (TAU), Greta Yorsh (TAU)?
Seminar in Program Analysis for Cyber-Security
Ittay Eyal, March 2011
High-Level Structure
Example
    void RTC_Si_SkipLine(const INT32 NbLine, char ** const PtrEndText)
    {
        INT32 indice;
        for (indice=0; indice<NbLine; indice++) {
            **PtrEndText = '\n';
            (*PtrEndText)++;
        }
        **PtrEndText = '\0';
        return;
    }
Core C
- The only control-flow statements are if, goto, break, and continue.
- Expressions are side-effect free and cannot be nested.
- All assignments are statements.
- Declarations do not have initializers.
- Taking the address of formal parameters is not allowed.
The example in Core C:
    void SkipLine(int NbLine, char** PtrEndText)
    {
        int indice;
        char* PtrEndLoc;
        indice = 0;
    begin_loop:
        if (indice >= NbLine) goto end_loop;
        PtrEndLoc = *PtrEndText;
        *PtrEndLoc = '\n';
        *PtrEndText = PtrEndLoc + 1;
        indice = indice + 1;
        goto begin_loop;
    end_loop:
        PtrEndLoc = *PtrEndText;
        *PtrEndLoc = '\0';
    }
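Contracts
Contracts describe a procedure's inputs, side effects, and outputs:
- requires: the precondition on the inputs
- modifies: the locations the procedure may change (its side effects)
- ensures: the postcondition on the outputs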
Contract for SkipLine:
    void SkipLine(int NbLine, char** PtrEndText)
        requires is_within_bounds(*PtrEndText) && *PtrEndText.alloc > NbLine && NbLine >= 0
        modifies *PtrEndText, *PtrEndText.is_nullt, *PtrEndText.strlen
        ensures  *PtrEndText.is_nullt && *PtrEndText.strlen == 0
                 && *PtrEndText == [*PtrEndText]_pre + NbLine;
    void main()
    {
        char buf[SIZE];
        char *r, *s;
        r = buf;
        SkipLine(1,&r);
        fgets(r,SIZE-1,stdin);
        s = r + strlen(r);
        SkipLine(1,&s);
    }
P → inline(P):
- Function entry point: assume the preconditions; store the inputs ([x]_pre) in temporary variables for the postcondition check.
- Return statement: assign the returned value to return_value.
- Function exit: assert the postconditions.
- Function call and its result: assert the callee's preconditions; assume its postconditions (possibly w.r.t. the inputs).
Pointer Analysis
- Goal: determine which objects may be updated through a pointer.
- First, a whole-program points-to state is computed; then it is specialized per procedure.
Pointer Analysis
    foo(char *p, char *q)
    {
        char local[100];
        …
        p = local;
        *q = 0;
        …
    }
    main()
    {
        char s[10], t[20], r[30];
        char *temp;
        foo(s,t);
        foo(s,r);
        …
        temp = s;
        …
    }
[Figure: whole-program points-to graph over s, t, r, temp, local, p, q]
Pointer Analysis
Same program; parametrization for foo:
[Figure: points-to graph for foo with nodes PARAM #1, PARAM #2, local, p, q]
C to Integer Program
C2IP
inline(P) + pointer info → C2IP → integer program
Constraint variables for a location l:
- l.val: possible values
- l.offset: offset w.r.t. the base address
- l.aSize: allocation size
- l.is_nullt: is the string null-terminated?
- l.len: string length (including \0)
C to Integer Program
Expression Check
C to Integer Program
Constructs to Statements
C to Integer Program
Notation:
- V: the number of variables and allocation sites.
- S: the number of C expressions.
Integer program complexity:
- O(V) constraint variables.
- Each pointer may point to O(V) locations.
- Total complexity: O(S·V).
Integer Analysis
- Conservatively computes the linear inequalities that hold at each program point.
- Each assertion is verified against these inequalities.
Integer Analysis
Precondition to check at each call: *PtrEndText.alloc > NbLine
    void main()
    {
        char buf[SIZE];
        char *r, *s;
        r = buf;
        SkipLine(1,&r);
        fgets(r,SIZE-1,stdin);
        s = r + strlen(r);
        SkipLine(1,&s);
    }
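The check can be worked by hand for the second call. The derivation below assumes that fgets stores at most SIZE - 2 characters followed by '\0' (or leaves r unchanged at end of input), uses SkipLine's postcondition for the first call, and writes x.alloc for the number of bytes from x to the end of buf:

```latex
\begin{align*}
\texttt{SkipLine(1,\&r)}:&\quad r = buf + 1, \quad r.\mathit{alloc} = SIZE - 1\\
\texttt{fgets(r,SIZE-1,stdin)}:&\quad 0 \le \mathit{strlen}(r) \le SIZE - 2\\
\texttt{s = r + strlen(r)}:&\quad s.\mathit{alloc} = SIZE - 1 - \mathit{strlen}(r) \in [1,\ SIZE - 1]\\
\text{requires of } \texttt{SkipLine(1,\&s)}:&\quad s.\mathit{alloc} > 1 \iff \mathit{strlen}(r) < SIZE - 2
\end{align*}
```

When the input fills r, strlen(r) = SIZE - 2 and s.alloc = 1, so the precondition fails: SkipLine would write '\n' into the last byte of buf and its terminating '\0' one byte past the end.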
Integer Analysis - Contracts
To optimize the contracts, do the following:
1. Assume true preconditions. Use ASPost [1] to calculate the linear inequalities at the exit point, and deduce the postconditions.
2. Use AWPre to calculate, backwards, the most liberal preconditions.
[6] P. Cousot and N. Halbwachs. Automatic discovery of linear constraints among variables of a program. In Symp. on Princ. of Prog. Lang.
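For SkipLine the two steps can be followed by hand; this is a hand derivation, not output of CSSV's tools. Write p for *PtrEndText, p_pre for [*PtrEndText]_pre, a_pre for p_pre.alloc, and n for NbLine, and note that each iteration advances p by one byte:

```latex
\begin{align*}
\text{write in iteration } k \ (0 \le k < n):&\quad p = p_{pre} + k, \quad a_{pre} - k > 0\\
\text{final write of the terminator}:&\quad p = p_{pre} + n, \quad a_{pre} - n > 0\\
\text{AWPre}:&\quad \textstyle\bigwedge_{k=0}^{n} \left(a_{pre} - k > 0\right) \iff a_{pre} > n \qquad (n \ge 0)\\
\text{ASPost at exit } (n \ge 0):&\quad p = p_{pre} + n, \quad p.\mathit{is\_nullt}, \quad p.\mathit{strlen} = 0
\end{align*}
```

This matches the alloc and pointer parts of SkipLine's hand-written contract. If n < 0 the loop body never runs and p = p_pre, so the postcondition p = p_pre + n also needs n >= 0, which is why the contract requires NbLine >= 0.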
Implementation
- C to CoreC: based on the AST-Toolkit [32]
- Points-to analysis: Golf [8, 9]
- Integer analysis: Polyhedra library [6, 19]
[6] P. Cousot and N. Halbwachs. Automatic discovery of linear constraints among variables of a program. In Symp. on Princ. of Prog. Lang.
[8] M. Das. Unification-based pointer analysis with directional assignments. In SIGPLAN Conf. on Prog. Lang. Design and Impl.
[9] M. Das, B. Liblit, M. Fähndrich, and J. Rehof. Estimating the impact of scalable pointer analysis on optimization. In Static Analysis Symp.
[19] B. Jeannet. New polka library. Available at “
[32] Microsoft Research. AST-toolkit
Empirical Results
Source code from two real-world projects:
- A string-manipulation library from EADS Airbus code: 11 procedures, 400 lines.
- Part of the WEB2c converter: 8 procedures, 460 lines.
Conclusion
- C is not easy to analyze.
- Plenty of techniques and tools.
- High false-positive ratio without hand-crafted contracts.
- The experimental-results section is slim: high variance over little data (the authors had to write all the contracts…).
- How would CSSV fare on normal code?