 Static Analysis ◦ Towards Automatic Signature Generation of Vulnerability-based Signature  Dynamic Analysis ◦ Unleashing Mayhem on Binary Code  Automatic.

Slides:



Advertisements
Similar presentations
Finding bugs: Analysis Techniques & Tools Symbolic Execution & Constraint Solving CS161 Computer Security Cho, Chia Yuan.
Advertisements

Masahiro Fujita Yoshihisa Kojima University of Tokyo May 2, 2008
Defenses. Preventing hijacking attacks 1. Fix bugs: – Audit software Automated tools: Coverity, Prefast/Prefix. – Rewrite software in a type safe languange.
Compilation 2011 Static Analysis Johnni Winther Michael I. Schwartzbach Aarhus University.
David Brumley Carnegie Mellon University Credit: Some slides from Ed Schwartz.
Fuzzing and Patch Analysis: SAGEly Advice. Introduction.
A survey of techniques for precise program slicing Komondoor V. Raghavan Indian Institute of Science, Bangalore.
SMU SRG reading by Tey Chee Meng: Automatic Patch-Based Exploit Generation is Possible: Techniques and Implications by David Brumley, Pongsin Poosankam,
David Brumley, Pongsin Poosankam, Dawn Song and Jiang Zheng Presented by Nimrod Partush.
Bouncer securing software by blocking bad input Miguel Castro Manuel Costa, Lidong Zhou, Lintao Zhang, and Marcus Peinado Microsoft Research.
Review: Software Security David Brumley Carnegie Mellon University.
Unleashing Mayhem on Binary Code
Detecting Format String Vulnerabilities with Type Qualifier Umesh Shankar, Kunal Talwar, Jeffrey S. Foster, David Wanger University of California at Berkeley.
Models and Security Requirements for IDS. Overview The system and attack model Security requirements for IDS –Sensitivity –Detection Analysis methodology.
Vertically Integrated Analysis and Transformation for Embedded Software John Regehr University of Utah.
1 Achieving Trusted Systems by Providing Security and Reliability (Research Project #22) Project Members: Ravishankar K. Iyer, Zbigniew Kalbarczyk, Jun.
Validating High-Level Synthesis Sudipta Kundu, Sorin Lerner, Rajesh Gupta Department of Computer Science and Engineering, University of California, San.
Achieving Trusted Systems by Providing Security and Reliability Ravishankar K. Iyer, Zbigniew Kalbarczyk, Jun Xu, Shuo Chen, Nithin Nakka and Karthik Pattabiraman.
CSE S. Tanimoto Syntax and Types 1 Representation, Syntax, Paradigms, Types Representation Formal Syntax Paradigms Data Types Type Inference.
Program Analysis Mooly Sagiv Tel Aviv University Sunday Scrieber 8 Monday Schrieber.
Processes in Unix, Linux, and Windows CS-502 Fall Processes in Unix, Linux, and Windows CS502 Operating Systems (Slides include materials from Operating.
Control hijacking attacks Attacker’s goal: – Take over target machine (e.g. web server) Execute arbitrary code on target by hijacking application control.
272: Software Engineering Fall 2012 Instructor: Tevfik Bultan Lecture 4: SMT-based Bounded Model Checking of Concurrent Software.
By David Brumley, James Newsome, Dawn Song and Hao Wang and Somesh Jha.
Dynamic Test Generation To Find Integer Bugs in x86 Binary Linux Programs David Molnar Xue Cong Li David Wagner.
Security Exploiting Overflows. Introduction r See the following link for more info: operating-systems-and-applications-in-
DART: Directed Automated Random Testing Koushik Sen University of Illinois Urbana-Champaign Joint work with Patrice Godefroid and Nils Klarlund.
CUTE: A Concolic Unit Testing Engine for C Technical Report Koushik SenDarko MarinovGul Agha University of Illinois Urbana-Champaign.
Process in Unix, Linux, and Windows CS-3013 A-term Processes in Unix, Linux, and Windows CS-3013 Operating Systems (Slides include materials from.
CAP6135: Malware and Software Vulnerability Analysis Buffer Overflow : Example of Using GDB to Check Stack Memory Cliff Zou Spring 2011.
Computer Security and Penetration Testing
CSC-682 Cryptography & Computer Security Sound and Precise Analysis of Web Applications for Injection Vulnerabilities Pompi Rotaru Based on an article.
Effective Real-time Android Application Auditing
Mitigation of Buffer Overflow Attacks
Packet Vaccine: Black-box Exploit Detection and Signature Generation
COMPUTER SECURITY MIDTERM REVIEW CS161 University of California BerkeleyApril 4, 2012.
Jose Sanchez 1 o Tielei Wang†, TaoWei†, Zhiqiang Lin‡, Wei Zou†. o Purdue University & Peking University o Proceedings of NDSS'09: Network and Distributed.
Automatic Diagnosis and Response to Memory Corruption Vulnerabilities Presenter: Jianyong Dai Jun Xu, Peng Ning, Chongkyung Kil, Yan Zhai, Chris Bookhot.
MICHALIS POLYCHRONAKIS(COLUMBIA UNIVERSITY,USA), KOSTAS G. ANAGNOSTAKIS(NIOMETRICS, SINGAPORE), EVANGELOS P. MARKATOS(FORTH-ICS, GREECE) ACSAC,2010 Comprehensive.
Introduction to Problem Solving. Steps in Programming A Very Simplified Picture –Problem Definition & Analysis – High Level Strategy for a solution –Arriving.
CPS4200 Unix Systems Programming Chapter 2. Programs, Processes and Threads A program is a prepared sequence of instructions to accomplish a defined task.
Xusheng Xiao North Carolina State University CSC 720 Project Presentation 1.
Buffer Overflow Group 7Group 8 Nathaniel CrowellDerek Edwards Punna ChalasaniAxel Abellard Steven Studniarz.
Buffer Overflow Attack Proofing of Code Binary Gopal Gupta, Parag Doshi, R. Reghuramalingam, Doug Harris The University of Texas at Dallas.
Symbolic and Concolic Execution of Programs Information Security, CS 526 Omar Chowdhury 10/7/2015Information Security, CS 5261.
Paradyn Project Paradyn / Dyninst Week Madison, Wisconsin April 12-14, 2010 Binary Concolic Execution for Automatic Exploit Generation Todd Frederick.
CS412/413 Introduction to Compilers and Translators Spring ’99 Lecture 11: Functions and stack frames.
Buffer overflow and stack smashing attacks Principles of application software security.
Dynamic Taint Analysis for Automatic Detection, Analysis, and Signature Generation of Exploits on Commodity Software Paper by: James Newsome and Dawn Song.
CUTE: A Concolic Unit Testing Engine for C Koushik SenDarko MarinovGul Agha University of Illinois Urbana-Champaign.
/ PSWLAB Evidence-Based Analysis and Inferring Preconditions for Bug Detection By D. Brand, M. Buss, V. C. Sreedhar published in ICSM 2007.
1 Test Coverage Coverage can be based on: –source code –object code –model –control flow graph –(extended) finite state machines –data flow graph –requirements.
MOPS: an Infrastructure for Examining Security Properties of Software Authors Hao Chen and David Wagner Appears in ACM Conference on Computer and Communications.
Beyond Stack Smashing: Recent Advances In Exploiting Buffer Overruns Jonathan Pincus and Brandon Baker Microsoft Researchers IEEE Security and.
CAP6135: Malware and Software Vulnerability Analysis Buffer Overflow : Example of Using GDB to Check Stack Memory Cliff Zou Spring 2014.
Content Coverity Static Analysis Use cases of Coverity Examples
Mitigation against Buffer Overflow Attacks
CMSC 345 Defensive Programming Practices from Software Engineering 6th Edition by Ian Sommerville.
Taint tracking Suman Jana.
High Coverage Detection of Input-Related Security Faults
CSCI1600: Embedded and Real Time Software
All You Ever Wanted to Know About Dynamic Taint Analysis & Forward Symbolic Execution (but might have been afraid to ask) Edward J. Schwartz, Thanassis.
Representation, Syntax, Paradigms, Types
CSC-682 Advanced Computer Security
CS5123 Software Validation and Quality Assurance
CUTE: A Concolic Unit Testing Engine for C
CAP6135: Malware and Software Vulnerability Analysis Buffer Overflow : Example of Using GDB to Check Stack Memory Cliff Zou Spring 2016.
Model Checking and Its Applications
CSCI1600: Embedded and Real Time Software
Format String Vulnerability
Presentation transcript:

 Static Analysis ◦ Towards Automatic Signature Generation of Vulnerability-based Signature  Dynamic Analysis ◦ Unleashing Mayhem on Binary Code  Automatic exploit detection: attack and defense

 Towards Automatic Signature Generation of Vulnerability-based Signature

 Definition ◦ Vulnerability - A vulnerability is a type of bug that can be used by an attacker to alter the intended operation of the software in a malicious way. ◦ Exploit - An exploit is an actual input that triggers a software vulnerability, typically with malicious intent and devastating consequences

 Zero-day attacks that exploit unknown vulnerabilities represent a serious threat  No patch or signature available  Symantec:20 unknown vulnerabilities exploited 07/2005 – 06/2007  Current practice is new vulnerability analysis and protection generation is mostly manual  Our goal: automate the process of protection generation for unknown vulnerabilities

 Software Patch: patch the binary of vulnerable application  Input Filter: a network firewall or a module on the I/O path  Data Patch: patch the data input instead of binary  Signature: signature-based input filtering Data Input Input Filter Vulnerable application Dropped

 Automatic signature generation  Reason: ◦ Manual signature generation is slow and error ◦ Fast generation is important – previously unknown or unpatched vulnerabilities can be exploited orders of magnitude faster than a human can respond ◦ More accurate

 There are usually several different polymorphic exploit variants that can trigger a software vulnerability  Exploit variants may differ syntactically but be semantically equivalent  To be effective -- the signature should be constructed based on the property of the vulnerability, instead of an exploit

 Require manual steps  Employ heuristics which may fail in many settings  Techniques rely on specific properties of an exploit – return addresses  Only work for specific vulnerabilities in specific circumstances

 At a high level, our main contribution is a new class of signature, that is not specific to details such as whether an exploit successfully hijacks control of the program, but instead whether executing an input will (potentially) result in an unsafe execution state.

 vulnerability signature ◦ whether executing an input potentially results in an unsafe program state  T(P, x) ◦ the execution trace obtained by executing a program P on input x  Vulnerability condition ◦ representation (how to express a vulnerability as a signature) ◦ coverage (measured by false positive rate)

 vulnerability signature ◦ representation for set of inputs that define a specified vulnerability condition  trade-offs ◦ representation: matching accuracy vs. efficiency ◦ signature creation: creation time vs. coverage  Tuple {P,T,x,c} ◦ binary program (P), instruction trace (T), exploit string (x), vulnerability condition (c)

 (P,c) = (,c)  T(P,x) is the execution trace of running P with input x means T satisfies vulnerability condition c  L P,c consists of the set of all inputs x to a program P such that  Formally:  An exploit for a vulnerability (P,c) is an input

 P given in box  x = g/AAAA  T={1,2,3,4,6,7, 8,9,8,10,11,10, 11,10,11,10, 11,10,11}  c = heap overflow (on 5th iteration of line 11)

 A vulnerability signature is a matching function MATCH which for an input x returns either EXPLOIT or BENIGN for a program P without running the program  A perfect vulnerability signature satisfies  Completeness:  Soundness:

 C: Ґ×D×M×K×I ->{BENIGN, EXPLOIT}  Ґ is a memory  D is the set of variables defined  M is the program’s map from memory to values  K is the continuation stack  I is the next instruction to execute

 Turing machine signatures ◦ precise (no false positive or negatives) ◦ may not terminate (in presence of loops, e.g.)  symbolic constraint signatures ◦ approximates looping, aliasing ◦ guaranteed to terminate  regular expression signatures ◦ approximates elementary constructs (counting) ◦ very efficient

 Can provide a precise, even exact, characterization of the vulnerability condition in a particular program  A TM that exactly emulates the program has no error rate

 says that for 10-char input, the first char is ‘g’ or ‘G’, up to four of the next chars may be spaces and at least 5 chars are non-spaces

 says ‘g’ or ‘G’ followed by 0 or more spaces and at least 5 non-spaces  E.g: [g|G][ ]*[ˆ ]{5,}

 TM - inlining vulnerability condition takes poly time  Symb. Constraint - poly-time transformations on TM  Regexp - solve constraint (exp time; PSPACE- complete)  or data-flow on TM (poly time)

 Input: ◦ Vulnerable program P ◦ Vul condition c ◦ Sample exploit x ◦ Instruction trace T  Output: ◦ TM sig ◦ Symbolic constraint sig ◦ RegEx sig

 MEP is a straight-line program -- e.g. the path that the exploit took to reach the vulnerability  PEP includes different paths to the vulnerability  a complete PEP coverage signature accepts all inputs in L P,c  complete coverage through a chop of the program includes all paths from the input read (v init ) to the vulnerability point (v final )

 Statically estimate effects of memory updates and loops  Memory updates: SSA analysis  Loops: static unrolling

 9000 lines C++ code ◦ CBMC model checker to build/solve symbolic constraints, generate RegEx’s ◦ disassembler based on Kruegel; IR new  ATPhttpd ◦ various vulnerabilities; sprintf-style string too long ◦ 10 distinct subpaths to RegEx in sec  BIND ◦ stack overflow vulnerability; TSIG vulnerability ◦ 10 distinct graphs in symbolic constraint ◦ 30ms for chopping ◦ 88% of functions were reachable between entry and vulnerability

 Propose a framework on automatically generate vulnerability signatures ◦ Turing Machine ◦ Symbolic Constraints ◦ Regular Expressions  Preliminary work on the feasibility of a grand challenge problem for decades

Attack: Dynamic Analysis

Automatically Find Bugs & Generate Exploits 27 Explore Program

int outprintf( const char *fmt, … ) { int count; char buf[1024]; va_list args; va_start( args, fmt ); count = vsprintf( buf, fmt, args ); outwrite( buf, count ); // print out } int main( int argc, char* argv[] ) { const char *arg; while( (arg = *argv++) != 0 ) { switch ( arg[0] ) { case ‘-’: { switch ( arg[1] ) { case 0: … default: outprintf( “unknown switch %s\n”, arg[1] ); } default: … } … 28 Reading user input from command line Buffer overflow CVE

int outprintf( const char *fmt, … ) { int count; char buf[1024]; va_list args; va_start( args, fmt ); count = vsprintf( buf, fmt, args ); outwrite( buf, count ); // print out } int main( int argc, char* argv[] ) { const char *arg; while( (arg = *argv++) != 0 ) { switch ( arg[0] ) { case ‘-’: { switch ( arg[1] ) { case 0: … default: outprintf( “unknown switch %s\n”, arg[1] ); } default: … } … 29 Many Branches!

Automatically Find Bugs & Generate Exploits 30 Transfer Control to Attacker Code (exec “/bin/sh”)

int outprintf( const char *fmt, … ) { int count; char buf[1024]; va_list args; va_start( args, fmt ); count = vsprintf( buf, fmt, args ); outwrite( buf, count ); // print out } int main( int argc, char* argv[] ) { const char *arg; while( (arg = *argv++) != 0 ) { switch ( arg[0] ) { case ‘-’: { switch ( arg[1] ) { case 0: … default: outprintf( “unknown switch %s\n”, arg[1] ); } default: … } … 31 outprintf … fmt ret addr count args buf user input main esp

int outprintf( const char *fmt, … ) { int count; char buf[1024]; va_list args; va_start( args, fmt ); count = vsprintf( buf, fmt, args ); outwrite( buf, count ); // print out } int main( int argc, char* argv[] ) { const char *arg; while( (arg = *argv++) != 0 ) { switch ( arg[0] ) { case ‘-’: { switch ( arg[1] ) { case 0: … default: outprintf( “unknown switch %s\n”, arg[1] ); } default: … } … 32 Read Return Address from Stack Pointer (esp) 32 outprintf … fmt ret addr count args buf user input main esp Control Hijack Possible

Source int main( int argc, char* argv[] ) { const char *arg; while( (arg = *argv++) != 0 ) { … int main( int argc, char* argv[] ) { const char *arg; while( (arg = *argv++) != 0 ) { … Executables (Binary) Automatically Find Bugs & Generate Exploits for Executables

f t f t f t x = input() 34 if x > 42 if x*x = 0xffffffff vuln() x can be anything x > 42 (x > 42) ∧ (x*x == 0xffffffff) (x > 42) ∧ (x*x == 0xffffffff) if x < 100

f t f t f t x = input() if x > 42 if x*x = 0xffffffff vuln() 35 x can be anything x > 42 (x > 42) ∧ (x*x == 0xffffffff) (x > 42) ∧ (x*x == 0xffffffff) Π = if x < 100

f t f t f t x = input() 36 if x > 42 if x*x = 0xffffffff vuln() x can be anything x > 42 (x > 42) ∧ (x*x == 0xffffffff) (x > 42) ∧ (x*x == 0xffffffff) Violates Safety Policy if x < 100

int outprintf( const char *fmt, … ) { int count; char buf[1024]; va_list args; va_start( args, fmt ); count = vsprintf( buf, fmt, args ); outwrite( buf, count ); // print out } 37 outprintf … fmt ret addr count args buf user input main esp Return to user-controlled address EIP not affected by user input Instruction Pointer (EIP) level:

38 Π ∧ input[0-31]= attack code ∧ input[ ] = attack code address Exploit is an input that satisfies the predicate: Exploit Predicate Can transfer control to attack code? Can position attack code?

Symbolic ExecutionExploit Generation 39 Efficient Resource Management Symbolic Index Challenge Hybrid Execution Index-based Memory Model

40

41 Online Symbolic Execution Offline Symbolic Execution (a.k.a. Concolic)

42 One path at a time Method 1: Re-run from scratch Inefficient Re-executed every time

43 Method 2: Stop forking Miss paths Method 3: Snapshot process Huge disk image Hit Resource Cap Fork at branches

44 Our Method: Don’t snapshot state; use path predicate to recreate state 9.4M  500K Hit Resource Cap Fork at branches Ghostscript 8.62 “Checkpoin t”

45 Manage #executors in memory within resource cap ✓ Minimize duplicated work ✓ Lightweight checkpoints ✓

46

47 x = user_input(); y = mem[x]; assert (y == 42); x = user_input(); y = mem[x]; assert (y == 42); x can be anything Which memory cell contains 42? 2 32 cells to check Memory

Table lookups in standard APIs:  Parsing: sscanf, vfprintf, etc.  Character test: isspace, isalpha, etc.  Conversion: toupper, tolower, mbtowc, etc.  … 48

Over-constrained  Misses 40% of exploits in our experiments 49 Π ∧ mem[x] = 42 ∧ Π’ Π ∧ mem[x] = 42 ∧ Π’ Π ∧ x = 17 ∧ mem[x] = 42 ∧ Π’ ✓ Solvable ✗ Exploits

50 Π ∧ mem[x] = 42 ∧ Π’ ✗ Solvable ✓ Exploits

Path predicate (Π) constrains range of symbolic memory accesses 51 y = mem[x] f t x <= 42 x can be anything f t x >= 50 Use symbolic execution state to: Step 1: Bound memory addresses referenced Step 2: Make search tree for memory address values Π  42 < x < 50

52 mem[ x & 0xff ] 1.Value Set Analysis 1 provides initial bounds Over-approximation 2.Query solver to refine bounds Lowerbound = 0, Upperbound = 0xff [1] Balakrishnan et al., Analyzing memory accesses in x86 executables, ICCC 2004

53 y = mem[x] if x = 1 then y = 10 Index Memory Value if x = 2 then y = 12 if x = 3 then y = 22 if x = 4 then y = 20 ite( x < 3, left, right ) ite( x < 2, left, right )

54

55 Linux (22) Windows (7)

56 2 Unknown Bugs: FreeRadius, GnuGol

 We do not claim to find all exploitable bugs  Given an exploitable bug, we do not guarantee we will always find an exploit  Lots of room for improving symbolic execution, generating other types of exploits (e.g., info leaks), etc.  We do not consider defenses, which may defend against otherwise exploitable bugs ◦ Q [Schwartz et al., USENIX 2011] 57 But Every Report is Actionable

 APEG [Brumley et al., IEEE S&P 2008] ◦ Uses patch to locate bug, no shellcode executed  Automatic Generation of Control Flow Hijacking Exploits for Software Vulnerabilities [Heelan, MS Thesis, U. of Oxford 2009] ◦ Creates control flow hijack from crashing input  AEG [Avgerinos et al., NDSS 2011] ◦ Find and generate exploits from source code  BitBlaze, KLEE, Sage, S2E, etc. ◦ Symbolic execution frameworks 58

 Mayhem automatically generated 29 exploits against Windows and Linux programs  Hybrid Execution ◦ Efficient resource management for symbolic execution  Index-based Memory Modeling ◦ Handle symbolic memory in real-world applications 59

PPre-process ◦D◦Disassemble binary ◦C◦Convert to an intermediate representation (IR) CChop ◦A◦A chop is a partial program P’ that starts at T 0 and ends at exploit point ◦C◦Call-graph level CCompute the sig ◦G◦Get TM sig ◦T◦TM -> Symbolic constraint ◦S◦Symbolic constraint -> RegEx

 Chopping reduces the size of program to be analyzed  Performed on call- graph level  No function pointer support yet

 Replace outgoing JMP with RET BENIGN

 Solution 1: Solve constraint system S and or- ing together all members  Solution 2: Data-flow analysis optimization

if x < 100 if x*x = 0xffffffff x = input() 65 if x > 42 vuln() x can be anything x > 42 (x > 42) ∧ (x*x != 0xffffffff) (x > 42) ∧ (x*x != 0xffffffff) (x > 42) ∧ (x*x != 0xffffffff) ∧ (x >= 100) (x > 42) ∧ (x*x != 0xffffffff) ∧ (x >= 100) f t f t f t

66 42 mem[0x ] mem[input] … arg ret addr ptr buf user input … assert(*ptr==42); return; ptr address ptr = 0x

67 y = 2*x + 10 y = - 2*x + 28 Index Memory Value

68 Time 2x faster atphttpd v0.4b