Binary Concolic Execution for Automatic Exploit Generation
Todd Frederick
Paradyn Project
Paradyn / Dyninst Week, Madison, Wisconsin, April 12-14, 2010
Vulnerabilities are everywhere…
An exploit
o rtm (Robert Morris)
o Exploit bytes: DD8F2F736800DD8F2F6 2696ED05E5ADD00DD00 DD5ADD03D05E5CBC3B
o Target: Finger Server, 1987
o Result: shell#
The problem: exploiting vulnerable code
o Find an exploit state in a program
  o Use a known existing vulnerability
  o Previous work automatically finds vulnerable states [Giffin, Jha, Miller 2006]
o Find input that drives the program down a path to the exploit state
  o Analyze program control flow
  o Walk through the program, finding inputs to reach the current point
  o Explore paths in the program to reach the vulnerability
The problem
o Assume we know of a vulnerability
[Diagram: a Program that receives either normal input or an exploit]
Running example
Program prompts "login:"
o Input good: the program continues to "password:"
o Input bad: the program prints "Using backdoor!"
Working with binary code
Program:
: lea 0x4(%esp),%ecx
: and $0xfffffff0,%esp
: pushl 0xfffffffc(%ecx)
c: push %ebp
d: mov %esp,%ebp
f: push %ebx
: push %ecx
: sub $0x10,%esp
: call
: mov $0x3,%eax
e: mov $0x0,%ebx
80482a3: mov $0x80bd884,%ecx
80482a8: mov $0x10,%edx
80482ad: int $0x
af: mov %eax,0xfffffff0(%ebp)
80482b2: movzbl 0x80bd886,%eax
80482b9: movsbl %al,%edx
80482bc: movzbl 0x80bd884,%eax
80482c3: movsbl %al,%eax
80482c6: mov %edx,%ecx
80482c8: sub %eax,%ecx
80482ca: mov %ecx,%eax
80482cc: cmp $0x2,%eax
80482cf: jne
d1: movzbl 0x80bd886,%eax
80482d8: movsbl %al,%edx
80482db: movzbl 0x80bd885,%eax
80482e2: movsbl %al,%eax
80482e5: mov %edx,%ecx
80482e7: sub %eax,%ecx
80482e9: mov %ecx,%eax
80482eb: cmp $0x3,%eax
80482ee: jne
f0: movzbl 0x80bd886,%eax
80482f7: cmp $0x64,%al
80482f9: jne
fb: call c
: jmp
: call
: mov $0x1,%eax
c: mov $0x0,%ebx
: int $0x
: mov %eax,0xfffffff4(%ebp)
: mov $0x0,%eax
b: add $0x10,%esp
e: pop %ecx
f: pop %ebx
: pop %ebp
: lea 0xfffffffc(%ecx),%esp
: ret
(slide label: exploit)
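The listing appears to read up to 0x10 bytes into a buffer at 0x80bd884 and then run three compare-and-branch checks on bytes 0, 1, and 2 of that buffer. A rough Python model of those checks follows. The names `sign_extend8` and `takes_backdoor_path` are illustrative and not from the slides, and it is an assumption that passing all three checks leads to the path labeled exploit.

```python
def sign_extend8(byte: int) -> int:
    """Model movzbl followed by movsbl: treat the low 8 bits as a signed byte."""
    byte &= 0xFF
    return byte - 0x100 if byte & 0x80 else byte


def takes_backdoor_path(buf: bytes) -> bool:
    """Return True when all three checks in the listing would pass for buf."""
    if len(buf) < 3:
        return False  # assumption: this model ignores shorter inputs
    b0, b1, b2 = (sign_extend8(b) for b in buf[:3])
    # 80482cc: cmp $0x2,%eax  after computing input[2] - input[0]
    if (b2 - b0) & 0xFFFFFFFF != 2:
        return False
    # 80482eb: cmp $0x3,%eax  after computing input[2] - input[1]
    if (b2 - b1) & 0xFFFFFFFF != 3:
        return False
    # 80482f7: cmp $0x64,%al  compares input[2] with 'd' (0x64)
    return buf[2] == 0x64
```

Under this model `takes_backdoor_path(b"bad")` holds, while `b"good"` fails the first check.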
Conceptual approach
o Run program, tracking variables as expressions instead of actual (concrete) values
o Collect expressions along the current path
o Find concrete input to satisfy these expressions
[Diagram components: Program, Symbolic Executor, Path Conditions, Solver, Generated Input]
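The three steps above can be sketched in a few lines of Python. This is a toy under stated assumptions, not the system described in the talk: the `Sym` class, `lift`, `input_byte`, and the brute-force `solve` are all hypothetical stand-ins, and the conditions are the three checks from the running example's binary.

```python
from itertools import product


class Sym:
    """Symbolic expression: readable text plus a function that evaluates it on an input."""

    def __init__(self, text, evaluate):
        self.text = text
        self.evaluate = evaluate

    def __sub__(self, other):
        other = lift(other)
        return Sym(f"({self.text} - {other.text})",
                   lambda data: self.evaluate(data) - other.evaluate(data))

    def equals(self, other):
        other = lift(other)
        return Sym(f"({self.text} == {other.text})",
                   lambda data: self.evaluate(data) == other.evaluate(data))


def lift(value):
    """Wrap a plain integer as a constant expression."""
    return value if isinstance(value, Sym) else Sym(str(value), lambda data: value)


def input_byte(index):
    """Expression standing for one byte of program input."""
    return Sym(f"input[{index}]", lambda data: data[index])


def solve(conditions, length=3, alphabet=b"abcdefghijklmnopqrstuvwxyz"):
    """Brute-force stand-in for a solver: first candidate satisfying every condition."""
    for candidate in product(alphabet, repeat=length):
        if all(cond.evaluate(candidate) for cond in conditions):
            return bytes(candidate)
    return None


# Expressions collected along the path that passes all three checks.
path_conditions = [
    (input_byte(2) - input_byte(0)).equals(2),
    (input_byte(2) - input_byte(1)).equals(3),
    input_byte(2).equals(ord("d")),
]
```

Calling `solve(path_conditions)` searches three-letter lowercase inputs for one that satisfies every collected expression. A real solver reasons about the expressions instead of enumerating candidates.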
Conceptual approach
o Exponential number of paths
o Limit and prioritize the paths we will explore
[Diagram components: Program, Symbolic Executor, Path Conditions, Path Selector, Solver, Generated Input]
Traditional symbolic execution
read_input()
if( input[2]-input[0] == 2 )
    if( input[2]-input[1] == 3 )
        if( input[2] == 'd' )
            backdoor()
login()    (reached when any check fails)
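Traditional symbolic execution forks at every branch and keeps a state for each side. A minimal sketch of that enumeration for the example, assuming the check order used by the binary; `BRANCHES`, `enumerate_paths`, and the brute-force `solve` are hypothetical names introduced here for illustration.

```python
from itertools import product

# Branch predicates of the example, in the order the binary checks them.
BRANCHES = [
    ("input[2]-input[0] == 2", lambda b: b[2] - b[0] == 2),
    ("input[2]-input[1] == 3", lambda b: b[2] - b[1] == 3),
    ("input[2] == 'd'", lambda b: b[2] == ord("d")),
]


def enumerate_paths():
    """Fork at every branch: the false side ends at login(), the true side continues."""
    paths, prefix = [], []
    for text, pred in BRANCHES:
        paths.append((prefix + [(text, pred, False)], "login()"))
        prefix = prefix + [(text, pred, True)]
    paths.append((prefix, "backdoor()"))
    return paths


def solve(path, alphabet=b"abcdefghijklmnopqrstuvwxyz"):
    """Brute-force stand-in for a solver over three lowercase bytes."""
    for cand in product(alphabet, repeat=3):
        if all(pred(cand) == want for _, pred, want in path):
            return bytes(cand)
    return None


for path, target in enumerate_paths():
    condition = " and ".join(t if want else f"not ({t})" for t, _, want in path)
    print(f"{target}: {condition}")
```

This chain of three checks has only four paths. Programs with loops or independent branches grow far faster, which is why every path's state cannot be kept in general.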
Problems with symbolic execution
o Must maintain exponentially many symbolic states
o Expressions may be difficult or infeasible to solve
Solution: run the program concretely and symbolically
Concrete Execution + Symbolic Execution = Concolic Execution
Concolic execution overview
o Symbolic execution follows concrete path
o Some expressions use concrete values
[Diagram components: Input, Program, Concrete Executor, Instructions, Symbolic Executor, Path Conditions, Path Selector, Solver, Generated Input]
Concolic execution
Advantages
o Track less state in parallel by following a single path at a time
o Simplify expressions by substituting concrete values for difficult subexpressions
Disadvantage
o Concrete values only hold for a specific set of concrete inputs, so mixing concrete values and expressions may produce inaccurate expressions
Concolic execution example: input good
o Concrete memory buffer: g,o,o,d
o input[2]-input[0] == 2 is false ('o' - 'g' = 8), so this run reaches login()
o Solving for input[2]-input[0] == 2 gives generated input egg
Concolic execution example: input egg
o Concrete memory buffer: e,g,g
o input[2]-input[0] == 2 is true ('g' - 'e' = 2)
o input[2]-input[1] == 3 is false ('g' - 'g' = 0), so this run reaches login()
o Solving for input[2]-input[0] == 2 and input[2]-input[1] == 3 gives generated input port
Concolic execution example: input port
o Concrete memory buffer: p,o,r,t
o input[2]-input[0] == 2 is true ('r' - 'p' = 2)
o input[2]-input[1] == 3 is true ('r' - 'o' = 3)
o input[2] == 'd' is false (input[2] is 'r'), so this run reaches login()
o Solving for all three conditions gives generated input bad
Concolic execution example: input bad
o Concrete memory buffer: b,a,d
o input[2]-input[0] == 2 is true ('d' - 'b' = 2)
o input[2]-input[1] == 3 is true ('d' - 'a' = 3)
o input[2] == 'd' is true
o Success: this run reaches backdoor()
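The run-solve-rerun cycle of the example can be sketched as a loop. This is a toy under stated assumptions: `CHECKS`, `run_concrete`, `solve`, and `concolic_search` are hypothetical names, the solver is a brute-force search over three lowercase letters, and the intermediate inputs it generates will generally differ from egg and port.

```python
from itertools import product

# The three checks of the running example, in the order the binary performs them.
CHECKS = [
    lambda b: b[2] - b[0] == 2,
    lambda b: b[2] - b[1] == 3,
    lambda b: b[2] == ord("d"),
]


def run_concrete(data):
    """Run on concrete input; return the branch outcomes seen and the function reached."""
    taken = []
    for check in CHECKS:
        outcome = check(data)
        taken.append(outcome)
        if not outcome:
            return taken, "login"
    return taken, "backdoor"


def solve(wanted, alphabet=b"abcdefghijklmnopqrstuvwxyz"):
    """Brute-force stand-in for the solver: match the first len(wanted) branch outcomes."""
    for cand in product(alphabet, repeat=3):
        if all(CHECKS[i](cand) == w for i, w in enumerate(wanted)):
            return bytes(cand)
    return None


def concolic_search(seed, max_runs=10):
    """Repeat: run concretely, flip the branch that failed, solve for a new input."""
    data, history = seed, []
    for _ in range(max_runs):
        taken, reached = run_concrete(data)
        history.append((data, reached))
        if reached == "backdoor":
            return history
        # Keep the path prefix and negate the last (failed) branch.
        data = solve(taken[:-1] + [not taken[-1]])
        if data is None:
            break
    return history
```

Starting from `concolic_search(b"good")`, each run follows a single path, and the search ends once a generated input reaches the backdoor.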
Inaccurate expressions
o Some variables depend on input
o Replacing these variables with concrete values may yield inaccurate expressions
o Solving an inaccurate path condition may produce input that does not take the desired path
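A small made-up case shows how this can happen. The toy `program` below is not from the talk: its branch compares input[1] with a variable k that itself depends on input[0]. The sketch assumes k is replaced by its concrete value from one run, and that the brute-force `solve` is free to change input[0].

```python
from itertools import product


def program(data):
    """Toy program: the branch depends on input both directly and through k."""
    k = data[0] * 2        # k depends on input[0]
    return data[1] == k    # True means the desired branch is taken


def concretized_condition(concrete_input):
    """Path condition with k replaced by its value from one concrete run."""
    k_concrete = concrete_input[0] * 2
    return lambda data: data[1] == k_concrete


def solve(condition, values=range(8)):
    """Brute-force stand-in for a solver over two small integers."""
    for cand in product(values, repeat=2):
        if condition(cand):
            return cand
    return None


seed = (3, 0)                            # concrete run: k = 6, branch not taken
condition = concretized_condition(seed)  # inaccurate: says input[1] == 6
generated = solve(condition)             # satisfies the condition
print(generated, program(generated))     # but may not take the branch
```

The generated input satisfies the concretized condition, yet the solver also changed input[0], so k no longer equals the value the condition assumed and the branch is still not taken.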
Concolic execution system design
o Concrete execution: Concrete Executor, Dyninst, ProcControl API
o Symbolic execution: Symbolic Executor, SymEval
o Path searching: Path Conditions, Path Selector, STP (Solver)
o Data passed between components: Program, Input, Instructions, Generated Input
Concrete execution components
o Concrete Executor
  o Redirects program input
  o Reads actual values of instruction operands
  o Tracks path taken
o Dyninst
  o Assists with static analysis
o ProcControl API
  o Runs program using single-stepping or breakpoints
Symbolic execution components
o Symbolic Executor
  o Symbolic memory
  o Identify input
  o Update symbolic memory
  o Extract conditional predicates
o SymEval
  o Represents instruction semantics as ASTs
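To make the symbolic executor's job concrete, here is a minimal sketch of symbolic register state updated from AST-like nodes. It is not SymEval's API: the node classes, the simplified instruction tuples, and `execute` are hypothetical, sign extension is ignored, and each movzbl/movsbl pair from the listing is collapsed into a single "load".

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Input:
    index: int


@dataclass(frozen=True)
class Const:
    value: int


@dataclass(frozen=True)
class Sub:
    left: object
    right: object


@dataclass(frozen=True)
class Eq:
    left: object
    right: object


BUFFER = 0x80bd884  # input buffer address taken from the listing


def execute(trace):
    """Update symbolic registers for a tiny instruction subset; collect branch predicates."""
    regs, predicates, last_cmp = {}, [], None
    for op, *args in trace:
        if op == "load":      # load <address>, <reg>: identify the byte as input
            addr, dst = args
            regs[dst] = Input(addr - BUFFER)
        elif op == "mov":     # mov <src reg>, <dst reg>
            src, dst = args
            regs[dst] = regs[src]
        elif op == "sub":     # sub <src reg>, <dst reg>: dst = dst - src
            src, dst = args
            regs[dst] = Sub(regs[dst], regs[src])
        elif op == "cmp":     # cmp <immediate>, <reg>
            imm, reg = args
            last_cmp = Eq(regs[reg], Const(imm))
        elif op == "jne":     # conditional branch: record the predicate it tests
            predicates.append(last_cmp)
    return predicates


# Simplified form of the first check in the listing (80482b2 through 80482cf).
first_check = [
    ("load", 0x80bd886, "edx"),
    ("load", 0x80bd884, "eax"),
    ("mov", "edx", "ecx"),
    ("sub", "eax", "ecx"),
    ("mov", "ecx", "eax"),
    ("cmp", 2, "eax"),
    ("jne",),
]
```

Running `execute(first_check)` yields one predicate, the AST for input[2] - input[0] == 2, which is the kind of term a path condition collects per branch.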
Path searching components
o STP (Solver)
  o Designed for program analysis applications
  o Handles bit-vector data types
o Path Conditions
  o One term for each branch taken
o Path Selector
  o Decides where to branch off from current path
  o Is a depth-first search for now
  o Other strategies will use static CFG analysis
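One way to read "depth-first" here is that the selector tries to branch off at the deepest branch of the current path first. A minimal sketch under that assumption follows; `next_targets`, the tuple-of-booleans path encoding (one entry per branch taken), and the `explored` set are hypothetical and not the talk's implementation.

```python
def next_targets(path, explored):
    """Depth-first order: flip the deepest branch first, skipping prefixes already tried."""
    targets = []
    for depth in range(len(path) - 1, -1, -1):
        flipped = tuple(path[:depth]) + (not path[depth],)
        if flipped not in explored:
            targets.append(flipped)
    return targets


# Current path: first branch taken, second branch not taken.
current = (True, False)
# The prefix (False,) was already covered by an earlier run.
print(next_targets(current, explored={(False,)}))
```

Each returned tuple is a path prefix to hand to the solver: keep the earlier branch outcomes and negate the last one.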
Previous Work in Binary Concolic Execution
o IDS signature generation [Song, et al. 2008]
  o Combined exploit strings to create signatures
  o Required an initial exploit, or a patch for the vulnerability
o Program testing [Godefroid, et al. 2008]
  o Created test cases with maximum code coverage in mind
  o Used instruction-level tracing for concrete execution
Potential Benefits of our Approach
o Our approach will be capable of finding the initial exploit
o We will do concrete execution with instrumentation, which gives us the flexibility to instrument selectively
o We plan to develop smarter path selection techniques using static control flow analysis
Status
o Concrete execution partially implemented using ProcControlAPI
  o Using standard input
  o Will support network and environment as inputs
o Symbolic execution and path selection not implemented yet
o Driving development of SymEval
  o Instruction semantics
  o AST simplification
Conclusion
Finding the first exploit with binary concolic execution using instrumentation
[Figure: basic blocks from the running example (the cmp $0x2, cmp $0x3, and cmp $0x64 checks), with the cmp $0x64,%al block annotated as input[2] == 'd']