Checking the World’s Software for Exploitable Bugs David Brumley Carnegie Mellon University dbrumley@cmu.edu http://security.ece.cmu.edu/ Stand up straight. PPW
vs. An epic battle White Black format c: First sentence: I love computer security. I love that it’s an epic battle between white vs black, us vs them, good vs. evil. It’s the only area of computer science that brings alive the notion of an adversary. In security, adversaries really exist. format c:
Exploit bugs Bug White Black format c:
OK Exploit $ iwconfig accesspoint $ iwconfig # 01ad 0101 0101 0101 0101 0101 0101 0101 0101 0101 0101 0101 0101 0101 0101 0101 0101 0101 fce8 bfff 0101 0101 0101 0101 0101 0101 0101 0101 0101 0101 0101 0101 0101 0101 0101 3101 50c0 2f68 732f 6868 622f 6e69 e389 5350 e189 d231 0bb0 80cd Superuser
Bug Fixed! White Black format c:
Fact: Ubuntu Linux has over 99,000 known bugs
1009 Linux programs. 13 minutes. 52 new bugs in 29 programs. inp=`perl –e '{print "A"x8000}'` for program in /usr/bin/*; do for opt in {a..z} {A..Z}; do timeout –s 9 1s $program -$opt $inp done 1009 Linux programs. 13 minutes. 52 new bugs in 29 programs.
Which bugs are exploitable? Evil David
Plaid Parliament of Pwning CMU Hacking Team
DEF CON 2012 scoreboard CMU Time (3 days total)
A Manual Process
DEF CON 2013
I skate to where the puck is going to be, not where it has been. --- Wayne Gretzky Hockey Hall of Fame
White Our Vision: Automatically Check the World’s Software for Exploitable Bugs
We owned the machine in seconds Evil David
Verification, but with a twist Correct Safe paths Verification Program Incorrect Exploit Correctness Property Un-exploitability Property 33,248 programs 152 new exploitable bugs
Outline Basic exploitation Symbolic execution for exploit generation Automatic exploit generation on real code Experiments Related projects and the future
attacker gains control of execution Control flow hijack attacker gains control of execution buffer overflow format string attack heap metadata overwrite use-after-free ... Same principle, different mechanism
Basic execution semantics of compiled code Process Memory Instruction Pointer points to next instruction to execute Fetch, decode, execute Code Data ... Processor EIP ... Stack read and write Start with code in memory Heap Control Flow Hijack: EIP = Attacker Code
Buffer overflows and the runtime stack int vulnerable(char *input) { char buf[32]; int x; if(...){ x = 1; } else { x = 0; } strcpy(buf,input); return x; execution semantics, including call/return local variables Control flow hijack when input length > buffer length Go over program.
locals allocated on stack ... char *input saved eip caller’s ebp 32 bytes for buf int x char *buf lower addresses vulnerable’s initial stack frame int vulnerable(char *input) { char buf[32]; int x; ... strcpy(buf,input); return x; } locals allocated on stack
input = “ABC\0” Writes go up! ABC\0 int vulnerable(char *input) { ... char *input saved eip caller’s ebp 32 bytes for buf int x char *buf lower addresses writes Writes go up! ABC\0 int vulnerable(char *input) { char buf[32]; int x; ... strcpy(buf,input); return x; }
“return address” Processor caller(){ i: vulnerable(input); i+1: ... char *input saved eip caller’s ebp 32 bytes for buf int x char *buf caller(){ i: vulnerable(input); i+1: ... saved eip lower addresses ABC\0 int vulnerable(char *input) { char buf[32]; int x; ... strcpy(buf,input); return x; } Processor EIP
Traditionally we show exploitability by running shellcode A buffer overflow occurs when data is written outside of the space allocated for the buffer. C does not check that writes are in-bound ... char *input saved eip caller’s ebp 32 bytes for buf int x char *buf writes Classic Exploit: overwrite saved EIP The key here is predetermined allocation. * More advanced methods, like Return-Oriented Programming, can also be automatically generated in our research Traditionally we show exploitability by running shellcode
Shellcode is a string execve(“/bin/sh”, 0, 0); Compile \x31\xc9\xf7\xe1\x51\x68\x2f\x2f \x73\x68\x68\x2f\x62\x69\x6e\x89 \xe3\xb0\x0b\xcd\x80 Executable String Author: kernel_panik, http://www.shell-storm.org/shellcode/files/shellcode-752.php
input = shellcode . address of buf ... char *input saved eip caller’s ebp 32 bytes for buf int x char *buf &buf \x31\xc9\xf7\xe1\x51\x68\x02\x02\x73\x68\x68\x2f ... int vulnerable(char *input) { char buf[32]; int x; ... strcpy(buf,input); return x; } &buf Processor EIP
Owned! input = shellcode . address of buf %eip = <shellcode> ... char *input saved eip caller’s ebp 32 bytes for buf int x char *buf %eip = <shellcode> execve(“/bin/sh”, NULL) Owned! &buf \x31\xc9\xf7\xe1\x51\x68\x02\x02\x73\x68\x68\x2f ... int vulnerable(char *input) { char buf[32]; int x; ... strcpy(buf,input); return x; } &buf Processor EIP
Automatically finding exploitable bugs
Verification, but with a twist Correct Safe path Verification Program Incorrect Exploitable Correctness Property Un-exploitability Property We use symbolic execution to test paths [Boyer75, Howden75, King76]
Basic symbolic execution x = input() x can be anything x > 42 if x > 42 f t (x > 42) ∧ (x*x != MAXINT) if x*x = MAXINT Symbolic execution runs on symbolic input f t (x > 42) ∧ (x*x != MAXINT) ∧ !(x < 42) if x < 42 jmp stack[x] f t
Path formula (true for inputs that take path) x = input() x can be anything x > 42 if x > 42 Path formula (true for inputs that take path) f t (x > 42) ∧ (x*x != MAXINT) if x*x = MAXINT Symbolic execution runs on symbolic input f t (x > 42) ∧ (x*x != MAXINT) ∧ !(x < 42) if x < 42 jmp stack[x] f t
Basic symbolic execution Satisfiable (x = 43) x = input() path test case! Satisfiability Modulo Theory (SMT) Solver if x > 42 f t if x*x = MAXINT Symbolic execution runs on symbolic input f t (x > 42) ∧ (x*x != MAXINT) ∧ !(x < 42) if x < 42 jmp stack[x] f t
Basic symbolic execution ∧ (x*x != MAXINT) ∧ (x <= 42) UNSAT (infeasible) x = input() SMT Solver if x > 42 f t if x*x = MAXINT Symbolic execution runs on symbolic input f t if x < 42 jmp stack[x] f t
Checking non-exploitability x = input() Un-exploitability property: EIP != user input if x > 42 f t (x > 42) ∧ (x*x == MAXINT) ∧ Un-exploitable if x*x = MAXINT f t if x < 42 jmp stack[x] f t
Checking non-exploitability SAT (safe) UNSAT (exploit) ... char *input saved eip caller’s ebp 32 bytes for buf SMT <path formula> ∧ eip != user input For each path
Exploit generation can be cast as a verification problem.
Real world exploit generation a brief history Ours Others 2005 Automatic Discovery of API-Level Exploits [Ganapathy et al., Conference on Software Engineering] 2008 Automatic Patch-Based Exploit Generation [Brumley et al., IEEE Security and Privacy Symposium] 2010 Automatic Generation of Control Flow Hijack Exploits for Commodity Software [Heelan, MS Thesis] 2011 Automatic Exploit Generation [Avgerinos et al., Network and Distributed System Security Symposium] Q: Exploit Hardening Made Easy [Schwartz et al., USENIX Security Symposium] 2012 Unleashing Mayhem on Binary Code [Cha et al., IEEE Security and Privacy Symposium] And >150 papers on symbolic execution
Exploiting Real Code: The Mayhem Architecture Principles: Require only the binary e.g., BAP, our binary analysis platform Use intelligent analysis to reduce state space e.g., preconditioned symbolic execution Make queries to SMT as easy as possible e.g., symbolic memories
Potentially infinite state space strcpy(buf, input); if (input[0] != 0) t f if (input[1] != 0) t f while(input[i] != 0){ buf[i] = input[i]; i++; } buf[i] = 0; … if (input[n] != 0) t f
check every branch blindly if (input[0] != 0) t f if (input[1] != 0) if (input[n] != 0) … 20 min exploration 30 min exploration x min exploration Exploitable bug found KLEE [Cadar’08] does this
Preconditioned symbolic execution All Inputs Trigger bug Preconditions focus search, e.g.: input > len Control Hijack input vs bugs doesn’t typecheck other examples in [Avgerinos11]
Static and online analysis determines likely exploit conditions 40 bytes All non-NULL ... char *input saved eip caller’s ebp 32 bytes for buf char buf[32]; int x; ... strcpy(buf, input);
Example: length precondition Precondition Check: length(input) > 40 ∧ input[0] == 0 Unsatisfiable If (input[0] != 0) t f If (input[1] != 0) If (input[n] != 0) … Unsatisfiable Not explored. Saved 20 min Precondition Check: length(input) > 40 ∧ input[1] == 0 Not explored. Saved 30min Not explored. Saved x min Exploitable bug found
Don’t treat as a black box! SAT. (x = 43) SMT Solver “program” the SMT Symbolic execution runs on symbolic input (x > 42) ∧ (x*x != 0xffffffff) ∧ !(x < 42)
Symbolic memory indices x can be anything x := user_input(); <executed path> y := mem[x]; assert(y = 42); vulnerable(); Which memory cell contains 42? There are more causes… 232 cells to check Memory 232 -1
Symbolic addresses occur often c = get_char(); ... to_lower(c); to_lower(char c){ c >= -128 && c < 256 ? tbl[c] : c; } ... a b c d tbl+’A’ Other causes Parsing: sscanf, vfprintf, etc. Character test: isspace, isalpha, etc. Conversion: toupper, tolower, mbtowc, etc. … Address is symbolic
Concretization: test case generation e.g., SAGE, DART, CUTE, KLEE ✓ Solvable ✗ Exploits x := user_input(); <executed path> y := mem[30]; assert(y = 42); vulnerable(); Misses over 40% of exploits There are more causes… 1 cell to check Memory 232 -1 30
Observation Path formula constrains range of symbolic memory accesses y = mem[x] assert(y==42) f t x > 0 x can be anything x < 5 0 < x < 5 Use symbolic execution state to: Step 1: Bound memory addresses referenced Step 2: Reduce to linear formulas
piecewise linear equations Ind. Value 4 20 22 12 10 know: 0 < x < 5 y = mem[x] 40% more exploits (strength reduction) 1 y = - 2*x + 28 Index Value 10 12 22 20 See Paper for Details y = 2*x + 10
Experiments with Mayhem Known exploitable bugs Coverage for 997 programs Checking Debian
2 Unknown Bugs: FreeRadius, GnuGol [Cha et al, NDSS’12] Windows 2 Unknown Bugs: FreeRadius, GnuGol Linux
coverage 50% Unchecked 50% on average tested Code coverage measures percentage of statements executed at least once by symbolic executor Mayhem coverage measured on 997 programs compiled with gcov from /usr/bin and /bin 50% Unchecked 50% on average tested
gcov coverage per program Programs
Unique code lines covered total unique lines (all programs): 2,245,632 lines covered (all programs): 437,455 absolute coverage: 19.48% Achieving 100% impossible due to dead code and other factors !
Checking Debian 152 new exploits 33,248 programs 2,727 days CPU time 15,914,407,892 SMT queries 199,685,594 test cases 2,365,154 crashes 28 cents a bug 21 dollars an exploit. 11,690 unique bugs 152 new exploits
public data http://forallsecure.com/summaries
mining data Q: How long do queries take on average? A: 3.67ms on average with 0.34 variance Q: Should I optimize hard or easy formulae? A: 99.99% take less than 1 second and account for 78% of total time Q: Do queries get harder? A: Good question... > basicStat() Q. How many programs do you have? #program 957 Q. How many SMT formulae have you queried and solved (within timeout)? #query 20223626 Q. Among those, how many are SAT? UNSAT? #sat #unsat 2544620 17679006 Q. How many programs yield *fresh* formulae that take at least 1 second to solve? 563 Q. How many *distinct* SMT formulae take at least 1 second to solve? #formula 18663 Q. What are the basic statistics on the TIME it took to solve these formulae? bothtimesum sattimesum unsattimesum 387213.9 95185.1 292028.8 bothtimemax sattimemax unsattimemax 277.2653 107.1409 277.2653 bothtimeavg sattimeavg unsattimeavg 0.01914661 0.03740641 0.0165184 bothtimevar sattimevar unsattimevar 0.06179814 0.07019839 0.06053416 Q. What are the basic statistics on the number of VARIABLES in these formulae? bothvarsmax satvarsmax unsatvarsmax 111 89 111 bothvarsavg satvarsavg unsatvarsavg 15.12656 16.42357 14.93987 bothvarsvar satvarsvar unsatvarsvar 35.7964 53.92014 32.91078 Q. What are the basic statistics on the number of CLAUSES in these formulae? bothclausesmax satclausesmax unsatclausesmax 275990 275990 275990 bothclausesavg satclausesavg unsatclausesavg 968.1475 669.0946 1011.192 bothclausesvar satclausesvar unsatclausesvar 6801531 11601246 6095962 Q. What are the basic statistics on the number of AST Nodes in these formulae? bothastnodesmax satastnodesmax unsatastnodesmax 3354063 3354063 3354060 bothastnodesavg satastnodesavg unsatastnodesavg 15234.39 11408.75 15785.04 bothastnodesvar satastnodesvar unsatastnodesvar 1044798320 1788943562 935280428 Q. What are the basic statistics on the DEPTH of exploration when generating these formulae? bothdepthmax satdepthmax unsatdepthmax 19557 3587 19557 bothdepthavg satdepthavg unsatdepthavg 283.9193 189.2816 297.5409 bothdepthvar satdepthvar unsatdepthvar 129693.1 73239.32 136344.1 optimize fast queries
500 sec timeout No dominant upward trend in time to solve
Size not strongly correlated with hardness SAT UNSAT
hardness is (likely) localized Sym Exe. Thread Depth 0 (Pointer Res.) Depth 1 Depth 2 Hard Query
Only 39 programs create hard formulas a/10 replaced with (a*0xcccccccd) >> 3
But each report is actionable. We are not perfect We don’t claim to find all exploitable bugs “Exploitability” vs “safe” wrt to fixed input size Better symbolic execution <and many more reasons...> But each report is actionable.
symbolic execution thrusts 2. Binary Program Verification path merging, faster SMT 1. Formalize Exploit control flow hijack, information leaks, command injection 3. Real Code Handle messy details, transactional rollback
the larger pipeline BAP [Brumley11] Decompiler [Schwartz13] 15,546 vulns [Jang12] BAP [Brumley11] Decompiler [Schwartz13] 2 year total: 27,659 bugs 15,698 vulns Triage unpatched code clones scheduling Program Analysis symbolic execution static analysis fuzzing Check OS Distribution 423 from fuzzing + 15,546 from redebug + 11690 from mayhem Weighted coupon bug collecting with randomized MAB algs. 1.55x more bugs [Woo13] Mayhem 11,690 bugs, 152 exploits [Cha11,Avgerinos12]
we’re not even close to done GE A C Refinement-Based Component Analysis for Binary Code (DARPA, with Engler) Breaking the Satisfiability Barrier (NSF, with Tinelli and Barrett) High School Hacking Competition And others: SMT Hardness (w/ Williams) Exploiting multi-core for behavior-based detection and repair (NSF, w/ Mutlu, Mowry) Vetting commodity systems (DARPA, w/ Gligor, Jaeger) ....
It seems wrong to not try. White Our Vision: Automatically Check the World’s Software for Exploitable Bugs It seems wrong to not try.
Thank You! Questions? Credits Postdocs: Manuel Egele Maverick Woo PhD Students: Thanassis Avgerinos Tiffany Bao Sang Kil Cha Peter Chapman Samantha Gottlieb Jiyong Jang Matt Maurer Alex Rebert Ed Schwartz Jonathan Burket Undergrads: David Kohlbrenner Tyler Nighswander Brian Pak Collaborators: Robert Brumley Jonathan Diamond Brent Ledvina Special Thanks: Coherent Navigation Mike Carns Pete Kind Barbara McNamara Funding: Core Security DARPA Google Lockheed Martin Northrop Grumman NSA NSF SEI ODNI Symantec Microsoft Wiley Pearson Amazon AWS
END