Unleashing Mayhem on Binary Code

Slides:



Advertisements
Similar presentations
All You Ever Wanted to Know About Dynamic Taint Analysis & Forward Symbolic Execution (but might have been afraid to ask) Edward J. Schwartz, ThanassisAvgerinos,
Advertisements

Smashing the Stack for Fun and Profit
ByteWeight: Learning to Recognize Functions in Binary Code
Defenses. Preventing hijacking attacks 1. Fix bugs: – Audit software Automated tools: Coverity, Prefast/Prefix. – Rewrite software in a type safe languange.
David Brumley Carnegie Mellon University Credit: Some slides from Ed Schwartz.
Fuzzing and Patch Analysis: SAGEly Advice. Introduction.
Native x86 Decompilation Using Semantics-Preserving Structural Analysis and Iterative Control-Flow Structuring Edward J. Schwartz *, JongHyup Lee ✝, Maverick.
CSc 352 Programming Hygiene Saumya Debray Dept. of Computer Science The University of Arizona, Tucson
SMU SRG reading by Tey Chee Meng: Automatic Patch-Based Exploit Generation is Possible: Techniques and Implications by David Brumley, Pongsin Poosankam,
David Brumley, Pongsin Poosankam, Dawn Song and Jiang Zheng Presented by Nimrod Partush.
Review: Software Security David Brumley Carnegie Mellon University.
Intro to Exploitation Stack Overflows James McFadyen UTD Computer Security Group 10/20/2011.
Stack-Based Buffer Overflows Attacker – Can take over a system remotely across a network. local malicious users – To elevate their privileges and gain.
Computer Security Buffer Overflow lab Eu-Jin Goh.
Additional project suggestions Finish lecturing on BDDs Survey some symbolic analysis papers Ganesh, Week 8 CS 6110, Spring 2011.
Control hijacking attacks Attacker’s goal: – Take over target machine (e.g. web server) Execute arbitrary code on target by hijacking application control.
Some Example C Programs. These programs show how to use the exec function.
By David Brumley, James Newsome, Dawn Song and Hao Wang and Somesh Jha.
Security Exploiting Overflows. Introduction r See the following link for more info: operating-systems-and-applications-in-
CUTE: A Concolic Unit Testing Engine for C Technical Report Koushik SenDarko MarinovGul Agha University of Illinois Urbana-Champaign.
CAP6135: Malware and Software Vulnerability Analysis Buffer Overflow : Example of Using GDB to Check Stack Memory Cliff Zou Spring 2011.
Exploiting Buffer Overflows on AIX/PowerPC HP-UX/PA-RISC Solaris/SPARC.
Buffer Overflows Lesson 14. Example of poor programming/errors Buffer Overflows result of poor programming practice use of functions such as gets and.
Mitigation of Buffer Overflow Attacks
EECS 583 – Class 21 Research Topic 3: Dynamic Taint Analysis University of Michigan December 5, 2012.
Buffer Overflow CS461/ECE422 Spring Reading Material Based on Chapter 11 of the text.
Overflows & Exploits. In the beginning 11/02/1988 Robert Morris, Jr., a graduate student in Computer Science at Cornell, wrote an experimental, self-replicating,
CPS4200 Unix Systems Programming Chapter 2. Programs, Processes and Threads A program is a prepared sequence of instructions to accomplish a defined task.
Buffer Overflow Attack Proofing of Code Binary Gopal Gupta, Parag Doshi, R. Reghuramalingam, Doug Harris The University of Texas at Dallas.
What is exactly Exploit writing?  Writing a piece of code which is capable of exploit the vulnerability in the target software.
A Tool for Pro-active Defense Against the Buffer Overrun Attack D. Bruschi, E. Rosti, R. Banfi Presented By: Warshavsky Alex.
CSV 889: Concurrent Software Verification Subodh Sharma Indian Institute of Technology Delhi Scalable Symbolic Execution: KLEE.
Introduction to Information Security ROP – Recitation 5.
 Static Analysis ◦ Towards Automatic Signature Generation of Vulnerability-based Signature  Dynamic Analysis ◦ Unleashing Mayhem on Binary Code  Automatic.
Buffer overflow and stack smashing attacks Principles of application software security.
CS 155 Section 1 PP1 Eu-Jin Goh. Setting up Environment Demo.
A Survey on Runtime Smashed Stack Detection 坂井研究室 M 豊島隆志.
Information Security - 2. A Stack Frame. Pushed to stack on function CALL The return address is copied to the CPU Instruction Pointer when the function.
CUTE: A Concolic Unit Testing Engine for C Koushik SenDarko MarinovGul Agha University of Illinois Urbana-Champaign.
About Exploits Writing ABOUT EXPLOITS WRITING Gerardo Richarte 
CS426Fall 2010/Lecture 141 Computer Security CS 426 Lecture 14 Software Vulnerabilities: Format String and Integer Overflow Vulnerabilities.
CAP6135: Malware and Software Vulnerability Analysis Buffer Overflow : Example of Using GDB to Check Stack Memory Cliff Zou Spring 2014.
Software Dataplane Verification Mihai Dobrescu and Katerina Argyraki EPFL, Switzerland Awarded Best NSDI, 2014 Presented by YH.
CSC 482/582: Computer Security
Content Coverity Static Analysis Use cases of Coverity Examples
Introduction to Information Security
Buffer Overflow By Collin Donaldson.
Mitigation against Buffer Overflow Attacks
Checking the World’s Software for Exploitable Bugs
APEx: Automated Inference of Error Specifications for C APIs
Introduction to Information Security
MobiSys 2017 Symbolic Execution of Android Framework with Applications to Vulnerability Discovery and Exploit Generation Qiang Zeng joint work with Lannan.
Taint tracking Suman Jana.
Basic Control Hijacking Attacks
Software Security.
Q: Exploit Hardening Made Easy
Edward J. Schwartz, Thanassis Avgerinos, David Brumley
CS 465 Buffer Overflow Slides by Kent Seamons and Tim van der Horst
Software Security Lesson Introduction
All You Ever Wanted to Know About Dynamic Taint Analysis & Forward Symbolic Execution (but might have been afraid to ask) Edward J. Schwartz, Thanassis.
CSC 495/583 Topics of Software Security Format String Bug (2) & Heap
CAP6135: Malware and Software Vulnerability Analysis Buffer Overflow : Example of Using GDB to Check Stack Memory Cliff Zou Spring 2015.
Introduction to Static Analyzer
CSC-682 Advanced Computer Security
CUTE: A Concolic Unit Testing Engine for C
CAP6135: Malware and Software Vulnerability Analysis Buffer Overflow : Example of Using GDB to Check Stack Memory Cliff Zou Spring 2016.
CAP6135: Malware and Software Vulnerability Analysis Buffer Overflow : Example of Using GDB to Check Stack Memory Cliff Zou Spring 2013.
CAP6135: Malware and Software Vulnerability Analysis Buffer Overflow : Example of Using GDB to Check Stack Memory Cliff Zou Spring 2010.
Format String Vulnerability
Presentation transcript:

Unleashing Mayhem on Binary Code Sang Kil Cha Thanassis Avgerinos Alexandre Rebert David Brumley Carnegie Mellon University

Automatic Exploit Generation Challenge Automatically Find Bugs & Generate Exploits I = input(); if (I < 42) vuln(); else safe(); AEG Exploits Program

Automatic Exploit Generation Challenge Automatically Find Bugs & Generate Exploits Explore Program

Ghostscript v8.62 Bug Buffer overflow int outprintf( const char *fmt, … ) { int count; char buf[1024]; va_list args; va_start( args, fmt ); count = vsprintf( buf, fmt, args ); outwrite( buf, count ); // print out } int main( int argc, char* argv[] ) const char *arg; while( (arg = *argv++) != 0 ) { switch ( arg[0] ) { case ‘-’: { switch ( arg[1] ) { case 0: … default: outprintf( “unknown switch %s\n”, arg[1] ); default: … Buffer overflow Reading user input from command line CVE-2009-4270

Multiple Paths Many Branches! int outprintf( const char *fmt, … ) { int count; char buf[1024]; va_list args; va_start( args, fmt ); count = vsprintf( buf, fmt, args ); outwrite( buf, count ); // print out } int main( int argc, char* argv[] ) const char *arg; while( (arg = *argv++) != 0 ) { switch ( arg[0] ) { case ‘-’: { switch ( arg[1] ) { case 0: … default: outprintf( “unknown switch %s\n”, arg[1] ); default: … Many Branches! We need to consider multiple path

Automatic Exploit Generation Challenge Automatically Find Bugs & Generate Exploits Transfer Control to Attacker Code (exec “/bin/sh”)

Generating Exploits … fmt ret addr count args buf user input esp main int outprintf( const char *fmt, … ) { int count; char buf[1024]; va_list args; va_start( args, fmt ); count = vsprintf( buf, fmt, args ); outwrite( buf, count ); // print out } int main( int argc, char* argv[] ) const char *arg; while( (arg = *argv++) != 0 ) { switch ( arg[0] ) { case ‘-’: { switch ( arg[1] ) { case 0: … default: outprintf( “unknown switch %s\n”, arg[1] ); default: … … fmt ret addr count args buf main user input outprintf esp

Generating Exploits … fmt ret addr count args buf int outprintf( const char *fmt, … ) { int count; char buf[1024]; va_list args; va_start( args, fmt ); count = vsprintf( buf, fmt, args ); outwrite( buf, count ); // print out } int main( int argc, char* argv[] ) const char *arg; while( (arg = *argv++) != 0 ) { switch ( arg[0] ) { case ‘-’: { switch ( arg[1] ) { case 0: … default: outprintf( “unknown switch %s\n”, arg[1] ); default: … … fmt ret addr count args buf main Read Return Address from Stack Pointer (esp) user input Control Hijack Possible outprintf esp 8

Automatically Find Bugs & Generate Exploits Unleashing Mayhem Automatically Find Bugs & Generate Exploits for Executables Source int main( int argc, char* argv[] ) { const char *arg; while( (arg = *argv++) != 0 ) { … Executables (Binary) 01010010101010100101010010101010100101010101010101000100001000101001001001001000000010100010010101010010101001001010101001010101001010000110010101010111011001010101010101010100101010111110100101010101010101001010101010101010101010 Raw binary

Demo

How Mayhem Works: Symbolic Execution x = input() x can be anything x > 42 if x > 42 f t (x > 42) ∧ (x*x != 0xffffffff) if x*x = 0xffffffff Symbolic execution runs on symbolic input f t (x > 42) ∧ (x*x != 0xffffffff) ∧ (x >= 100) if x < 100 vuln() f t

How Mayhem Works: Symbolic Execution x = input() x can be anything x > 42 if x > 42 f t (x > 42) ∧ (x*x == 0xffffffff) if x*x = 0xffffffff f t if x < 100 vuln() f t

Π = Path Predicate = Π x = input() x can be anything x > 42 if x > 42 f t Π = (x > 42) ∧ (x*x == 0xffffffff) if x*x = 0xffffffff f t if x < 100 vuln() f t

How Mayhem Works: Symbolic Execution x = input() x can be anything x > 42 if x > 42 f t (x > 42) ∧ (x*x == 0xffffffff) if x*x = 0xffffffff During exploration, mayhem will detect bug when it sees safety policy violation f t Violates Safety Policy if x < 100 vuln() f t

Safety Policy in Mayhem Instruction Pointer (EIP) level: EIP not affected by user input … fmt ret addr count args buf main esp user input int outprintf( const char *fmt, … ) { int count; char buf[1024]; va_list args; va_start( args, fmt ); count = vsprintf( buf, fmt, args ); outwrite( buf, count ); // print out } outprintf Return to user-controlled address

Exploit Generation Π Exploit is an input that satisfies the predicate: Can position attack code? Π ∧ input[0-31] = attack code input[1038-1042] = attack code address Exploit Predicate Now we know Can transfer control to attack code?

Challenges Efficient Resource Management Symbolic Index Challenge Symbolic Execution Exploit Generation Efficient Resource Management Symbolic Index Challenge Hybrid Execution Index-based Memory Model

Challenge 1: Resource Management in Symbolic Execution

Current Resource Management in Symbolic Execution Offline Symbolic Execution (a.k.a. Concolic) Online Symbolic Execution

Re-executed every time Offline Execution One path at a time Re-executed every time Method 1: Re-run from scratch ⟹ Inefficient By re-running a program from scratch for every path, we only need to consider a single program state at at time

Online Execution Fork at branches Method 2: Stop forking ⟹ Miss paths Method 3: Snapshot process ⟹ Huge disk image Hit Resource Cap

Mayhem: Hybrid Execution Fork at branches Our Method: Don’t snapshot state; use path predicate to recreate state “Checkpoint” Ghostscript 8.62 Our Idea: Path predicate is a succinct state representation that allows you to suspend and resume Suspend and resume using Path Predicate 9.4M  500K Hit Resource Cap

Hybrid Execution ✓ ✓ ✓ Manage #executors in memory within resource cap Minimize duplicated work ✓ 30KB Lightweight checkpoints ✓

Challenge 2: Symbolic Indices

Which memory cell contains 42? Symbolic Indices x = user_input(); y = mem[x]; assert (y == 42); x can be anything Which memory cell contains 42? There are more causes… 232 cells to check Memory 232 -1

One Cause: Overwritten Pointers 42 mem[0x11223344] … arg ret addr ptr buf ptr = 0x11223344 ptr address 11223344 user input … assert(*ptr==42); return; mem[input] The following stack diagram contains a buffer and a local pointer. And the local pointer points to a memory object.

Another Cause: Table Lookups Table lookups in standard APIs: Parsing: sscanf, vfprintf, etc. Character test: isspace, isalpha, etc. Conversion: toupper, tolower, mbtowc, etc. …

Method 1: Concretization Π ∧ mem[x] = 42 ∧ Π’ Π ∧ x = 17 ∧ mem[x] = 42 ∧ Π’ ✓ Solvable ✗ Exploits Over-constrained Misses 40% of exploits in our experiments

Method 2: Fully Symbolic Π ∧ mem[x] = 42 ∧ Π’ Π ∧ mem[x] = 42 ∧ mem[0] = v0 ∧…∧ mem[232-1] = v232-1 ∧ Π’ mem[2] = v2 ✗ Solvable ✓ Exploits

Our Observation Path predicate (Π) constrains range of symbolic memory accesses y = mem[x] f t x <= 42 x can be anything x >= 50 Π  42 < x < 50 Use symbolic execution state to: Step 1: Bound memory addresses referenced Step 2: Make search tree for memory address values

Step 1 — Find Bounds mem[ x & 0xff ] Lowerbound = 0, Upperbound = 0xff Value Set Analysis1 provides initial bounds Over-approximation Query solver to refine bounds Without vsa 54 -> with vsa 14 [1] Balakrishnan et al., Analyzing memory accesses in x86 executables, ICCC 2004

Step 2 — Index Search Tree Construction if x = 1 then y = 10 y = mem[x] ite( x < 3, left, right ) if x = 2 then y = 12 ite( x < 2, left, right ) if x = 3 then y = 22 if x = 4 then y = 20 Index Memory Value 10 12 22 20

Fully Symbolic vs. Index-based Memory Modeling atphttpd v0.4b Time Timeout

Index Search Tree Optimization: Piecewise Linear Approximation y = - 2*x + 28 Index Memory Value See Paper for Details y = 2*x + 10

Piecewise Linear Approximation atphttpd v0.4b Time 2x faster

Exploit Generation

Windows (7) Linux (22)

2 Unknown Bugs: FreeRadius, GnuGol

But Every Report is Actionable Limitations We do not claim to find all exploitable bugs Given an exploitable bug, we do not guarantee we will always find an exploit Lots of room for improving symbolic execution, generating other types of exploits (e.g., info leaks), etc. We do not consider defenses, which may defend against otherwise exploitable bugs Q [Schwartz et al., USENIX 2011] But Every Report is Actionable

Related Work APEG [Brumley et al., IEEE S&P 2008] Uses patch to locate bug, no shellcode executed Automatic Generation of Control Flow Hijacking Exploits for Software Vulnerabilities [Heelan, MS Thesis, U. of Oxford 2009] Creates control flow hijack from crashing input AEG [Avgerinos et al., NDSS 2011] Find and generate exploits from source code BitBlaze, KLEE, Sage, S2E, etc. Symbolic execution frameworks

Conclusion Mayhem automatically generated 29 exploits against Windows and Linux programs Hybrid Execution Efficient resource management for symbolic execution Index-based Memory Modeling Handle symbolic memory in real-world applications You can find many more details in the paper

Thank You Our shepherd: Cristian Cadar Anonymous reviewers Maverick Woo, Spencer Whitman DARPA grant to CyLab at Carnegie Mellon University (N11AP20005/D11AP00262) NSF Career grant (CNS0953751) CyLab ARO support from grant DAAD19-02-1-0389 and W911NF-09-1- 0273

Q&A Sang Kil Cha (sangkilc@cmu.edu) http://www.ece.cmu.edu/~sangkilc/ http://security.ece.cmu.edu/aeg/index.html Add video