© 2006 Nathan Rosenblum, March 2006, Unconventional Code Constructs. The New Dyninst Code Parser: Binary Code Isn't as Simple as it Used to Be. Nathan Rosenblum.


© 2006 Nathan Rosenblum, March 2006, Unconventional Code Constructs
The New Dyninst Code Parser: Binary Code Isn't as Simple as it Used to Be
Nathan Rosenblum, University of Wisconsin

– 2 – Binary Analysis
- Processing of the binary code to extract syntactic and symbolic information from many sources:
  - Symbol tables (if present)
  - Decode (disassemble) instructions
  - Control-flow information: basic blocks, loops, functions
  - Data-flow information: from basic register information to highly sophisticated (and expensive) analyses

– 3 – Products of Binary Analysis
- High-level organization and characteristics
  - Function entry/exit points
  - Inter-procedural call graph
  - Intra-procedural control-flow graph
  - Exception handlers
  - Jump tables
  - Virtual function tables
- Abstract assembly representation
- Data-flow characteristics
  - Register liveness (for instrumentation, modification)

– 4 – Uses of Binary Analysis
- Debugging
- Testing
- Performance profiling
- Performance modeling
- Behavior modeling
- Dynamic modification
- Binary rewriting
- Reverse engineering

– 5 – Binary Analysis Tool Goals
- Safe: Eliminate false positives to make instrumentation safe
- Accurate: Minimize false negatives for complete view of the binary
- Opportunistic: Use all available information and techniques to maximum effect
- Resilient: Tools are robust to unexpected and unusual applications
- Automated: Analysis does not depend on human interaction
- Complementary: Produce products compatible with source-level analysis tools

– 6 – Why is Binary Analysis Hard?
Source Code:
    Func foo() { … switch(a) { … } … }
The Compiler
Binary:
    push %ebp
    mov %esp, %ebp
    …
    mov [0x1d], %eax
    jmp *%eax
    …

– 7 – Current Approaches
- Linear disassembly of binaries is insufficient
  - Symbol tables often lie, or are absent
  - Functions are not address ranges, may be non-contiguous
- Parsing based on program control flow
  - Commonly used approach: UQBT, LEEL, RAD, IDA-Pro, Dyninst
  - Must contend with gaps in known code regions after parsing

– 8 – Dyninst Control Flow Parsing
- Opportunistic parsing: utilizes symbol table and other information when available (and sensible)
- Provides more accurate view of the binary than linear disassembly
- Addresses problem of gaps in the binary through speculative parsing
  - Heuristics to identify function preambles

– 9 – Control Flow Traversal Illustrated
    00: mov [a8], r1
    04: mov [ac], r2
    08: add r1, r2, r3
    0c: cmp r3, 0
    10: bne 24
    14: call
    18: add r3, 8, r3
    1c: call
    20: jmp 28
    24: mul r2, 2, r3
    28: sub r1, r3, r1
    ...
- Parsing follows control flow
- Control transfers are edges in the CFG
- Target blocks can be parsed in any order

– 10 – Control Flow Traversal Illustrated
    00: mov [a8], r1
    04: mov [ac], r2
    08: add r1, r2, r3
    0c: cmp r3, 0
    10: bne 24
    14: call
    18: add r3, 8, r3
    1c: call
    20: jmp 28
    24: mul r2, 2, r3
    28: sub r1, r3, r1
    ...
- Call sites determine location of functions
- Targets of calls are added to the function parsing work list
- Known Functions: foo, quux, quuux, bar, baz
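
The traversal on slides 9 and 10 can be sketched as two work lists: one of instruction addresses inside the current function and one of function entry points discovered at call sites. This is a minimal Python sketch, not Dyninst's implementation; the toy instruction set, the fixed 4-byte instruction width, the addresses, and the call target are all hypothetical.

```python
from collections import deque

# Hypothetical toy program: address -> (mnemonic, branch/call target or None).
PROGRAM = {
    0x00: ("mov", None),
    0x04: ("cmp", None),
    0x08: ("bne", 0x14),   # conditional branch: taken and fall-through edges
    0x0C: ("call", 0x40),  # call: the target is a newly discovered function
    0x10: ("jmp", 0x18),   # unconditional branch: taken edge only
    0x14: ("mul", None),
    0x18: ("ret", None),
    0x1C: ("junk", None),  # never reached by control flow, so never parsed
    0x40: ("add", None),
    0x44: ("ret", None),
}
INSN_SIZE = 4  # assumption: fixed-width instructions keep the sketch short


def parse(entry):
    """Follow control flow from `entry`; return (functions, edges)."""
    functions = {}                     # function entry -> instruction addresses
    edges = set()                      # (source, target) control transfers
    func_worklist = deque([entry])
    while func_worklist:
        func = func_worklist.popleft()
        if func in functions:
            continue
        seen = functions.setdefault(func, set())
        addr_worklist = [func]
        while addr_worklist:
            addr = addr_worklist.pop()
            if addr in seen or addr not in PROGRAM:
                continue
            seen.add(addr)
            op, target = PROGRAM[addr]
            fallthrough = addr + INSN_SIZE
            if op == "ret":
                continue               # no successors inside this function
            if op == "jmp":
                edges.add((addr, target))
                addr_worklist.append(target)
                continue
            if op == "bne":
                edges.add((addr, target))
                edges.add((addr, fallthrough))
                addr_worklist.append(target)
            elif op == "call":
                func_worklist.append(target)   # parse the callee later
            addr_worklist.append(fallthrough)
    return functions, edges


functions, edges = parse(0x00)
```

Because targets are pulled from a work list, the order in which blocks are parsed does not affect the result. The unreachable instruction at `0x1C` stays unparsed, which is the kind of gap the later slides address.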

– 11 – Binary Parsing Challenges
- Pointer-based control transfer
- Non-returning calls
- Non-contiguous code sections
- Tail calls
- Gaps in the binary
- Exception handlers
- Shared code and multiple entry representation

– 12 – Non-returning Call Sites
- Some functions will not return
  - Examples: abort, exit
- Code following call site may not be valid
- Even if names are available, calls may be hard to detect: dfaerror fatal exit

– 13 – Detecting Non-Returning Functions
- Goal: detect non-returning functions from first principles
- Identify distinguishing features of non-returning functions
  - Wide variety of behavior in non-returning functions makes this difficult
- Example: operations in abort
    abort() -> sigaction()
               IO_flush_all()
               raise(SIGABRT) -> kill(getpid(),sig)
               hlt [privileged instruction]

– 14 – Non-returning Call Sites
    d0 : f: e8 cc db 0a 00 call cf1e : e8 07 7f call : 90 nop
    2161a: 90 nop
    2161b: 90 nop
    2161c: 90 nop
    2161d: 90 nop
    2161e: 90 nop
    2161f: 90 nop
    :
    21620: 55 push %ebp
    21621: 89 e5 mov %esp,%ebp
    ...
- Example: GNU libc library routines
- Call to abort does not return
- Parser will naively follow control into the following region
- Bytes following call site may not be code (e.g., jump tables, other functions, string data)
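
To make the hazard concrete, here is a small Python sketch of how far a parser walks past a call site with and without knowledge of non-returning callees. It assumes a name list, which the slides point out is not sufficient in general (names may be absent, and wrappers are hard to detect); the listing and its addresses are hypothetical.

```python
# Names taken from the slides; treating a name list as ground truth is an
# assumption of this sketch, not the first-principles detection the talk seeks.
NON_RETURNING = {"abort", "exit"}

# Hypothetical listing: (address, mnemonic, callee name or None).
LISTING = [
    (0x100, "mov", None),
    (0x104, "call", "abort"),
    (0x109, "nop", None),    # padding after the call site
    (0x10A, "nop", None),
    (0x110, "push", None),   # start of the next function
    (0x111, "ret", None),
]


def linear_extent(listing, non_returning):
    """Addresses a parser visits before it must stop following fall-through."""
    visited = []
    for addr, op, callee in listing:
        visited.append(addr)
        if op in ("ret", "jmp"):
            break                      # control leaves unconditionally
        if op == "call" and callee in non_returning:
            break                      # bytes after the call may not be code
    return visited


naive = linear_extent(LISTING, set())          # walks into padding and beyond
aware = linear_extent(LISTING, NON_RETURNING)  # stops at the call to abort
```

The naive walk absorbs the padding and the next function's prologue into the current function; the aware walk ends at the call.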

– 15 – Non-contiguous Code
[Diagram: Func Foo]
- Functions are not address ranges
  - Symbol table representation fails
- Many sources of non-contiguous layout:
  - Jump tables
  - Data (strings, etc.)
  - Unparsed code
  - Exception handlers
  - Padding or junk bytes

– 16 – Non-contiguous Code
    ...
    77e7b1cb: addl $0x4,0x4(%ecx)
    77e7b1cf: 5d pop %ebp
    77e7b1d0: c2 0c 00 ret $0xc
    77e7b1d3: 68 f push $0x6f5
    77e7b1d8: eb 05 jmp 0x77e7b1df
    77e7b1da: 68 e push $0x6e6
    77e7b1df: e8 bb call 0x77ea389f
    77e7b1e4: 4c ba e
    e7b1e8: 34 b2 e
    e7b1ec: b5 b1 e
    e7b1f0: 0c 9f e
    e7b1f4: e
    e7b1f8: cf b1 e
    e7b1fc:
    e7b20c: 3c 10 cmp $0x10,%al
    77e7b20e: 0f 85 a6 3b jne 0x77e9edba
    ...
- Example: Microsoft Word
- Jump table separates valid instruction sequences
- Control following call site is invalid
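
One way a parser can step over an embedded jump table is to read consecutive 32-bit little-endian words for as long as each one points into the text section. The Python sketch below illustrates that idea under stated assumptions: the section bounds, table contents, and the `max_entries` cap are hypothetical, and the values are not taken from the listing above.

```python
import struct


def read_jump_table(data, offset, text_lo, text_hi, max_entries=256):
    """Read 4-byte little-endian entries while they fall in [text_lo, text_hi).

    Returns (targets, offset_of_first_byte_after_table).
    """
    targets = []
    while len(targets) < max_entries and offset + 4 <= len(data):
        (value,) = struct.unpack_from("<I", data, offset)
        if not (text_lo <= value < text_hi):
            break                      # no longer looks like a code pointer
        targets.append(value)
        offset += 4
    return targets, offset


# Hypothetical table of three code pointers, followed by the bytes of
# `cmp $0x10,%al` (3c 10) and the start of a `jne` (0f 85).
data = struct.pack("<3I", 0x1010, 0x1040, 0x1080) + bytes([0x3C, 0x10, 0x0F, 0x85])
targets, resume = read_jump_table(data, 0, text_lo=0x1000, text_hi=0x2000)
```

Each recovered target becomes a parsing work-list entry, and `resume` marks where ordinary code may continue. The range test is only a heuristic: data that happens to look like a code address would be misread as a table entry.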

– 17 – Named Non-contiguous Sections
    : f0: lock cmpxchg %ecx,0x2968(%ebx)
    210f8: jne 2118e
    210fe: xor %esi,%esi
    21100: cmp $0x6,%esi
    e :
    2118e: lea 0x2968(%ebx),%ecx
    21194: call ea0f
    : jmp 210fe
- Example: GNU libc library routines
- Looks like shared code
- Fragment is not a real function

– 18 – Named Non-contiguous Sections
- Recognizing function fragments
  - Have a symbol table entry
  - Reached by branches from one function
  - Branch back to one function
- Use combination of CFG and symbol table clues
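
The three clues can be combined into a simple predicate. This Python sketch is illustrative only: the `Region` record and its field names are hypothetical, and the extra requirement that the region is never the target of a call is an assumption added here, not something the slide states.

```python
from dataclasses import dataclass, field


@dataclass
class Region:
    """Hypothetical summary of a named code region, built from CFG and symbols."""
    name: str
    has_symbol: bool
    branched_from: set = field(default_factory=set)  # functions that jump into it
    called_from: set = field(default_factory=set)    # functions that call it
    branches_to: set = field(default_factory=set)    # functions it jumps back to


def is_fragment(region):
    """True when the region looks like an outlined piece of a single function."""
    return (
        region.has_symbol
        and not region.called_from            # assumption: real functions get called
        and len(region.branched_from) == 1    # reached by branches from one function
        and len(region.branches_to) == 1      # branches back to one function
    )


fragment = Region("lock_slow_path", True, {"foo"}, set(), {"foo"})
real_function = Region("bar", True, set(), {"foo", "baz"}, set())
```

A region that passes the test can be merged into its parent function instead of being reported as a function of its own.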

– 19 – Tail Calls
[Diagram: Func Foo (... call), Func Bar (... jmp), Func Quux (... ret)]
- Compiler has joined two functions into one
- Looks like non-contiguous shared code

– 20 – Gap Parsing
[Diagram: Func Foo, unidentified section of code, Func Bar]
- Gaps between known code regions may contain undiscovered functions
  - Targets of indirect calls
- Speculative parsing: pattern-based heuristics to recognize function prologues in gaps
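
A minimal Python sketch of pattern-based gap parsing: compute the gaps between already-parsed ranges, then scan each gap for the common x86 prologue `push %ebp; mov %esp,%ebp` (bytes `55 89 e5`, as seen in the libc listing on slide 14). The image contents, base address, and function names are hypothetical, and a real parser would use more than one pattern and then validate each hit by parsing from it.

```python
PROLOGUE = bytes([0x55, 0x89, 0xE5])  # push %ebp; mov %esp,%ebp


def find_gaps(known, start, end):
    """known: (lo, hi) half-open parsed ranges. Return the unparsed ranges."""
    gaps = []
    cursor = start
    for lo, hi in sorted(known):
        if lo > cursor:
            gaps.append((cursor, lo))
        cursor = max(cursor, hi)
    if cursor < end:
        gaps.append((cursor, end))
    return gaps


def candidate_entries(image, base, gaps):
    """Addresses inside gaps where the prologue byte pattern occurs."""
    hits = []
    for lo, hi in gaps:
        chunk = image[lo - base:hi - base]
        pos = chunk.find(PROLOGUE)
        while pos != -1:
            hits.append(lo + pos)
            pos = chunk.find(PROLOGUE, pos + 1)
    return hits


# Hypothetical image: parsed code, a gap hiding one function, parsed code.
BASE = 0x1000
image = b"\xc3" * 4 + b"\x90" * 4 + PROLOGUE + b"\x90" * 5 + b"\x55\x89\xe5\xc3"
known = [(0x1000, 0x1004), (0x1010, 0x1014)]
gaps = find_gaps(known, BASE, BASE + len(image))
entries = candidate_entries(image, BASE, gaps)
```

Hits are only candidates: the same three bytes can occur inside data or in the middle of another instruction, which is why the talk calls this parsing speculative.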

– 21 – Exceptions
- Exception handling code is normally unreachable
- Use information in the binary where available
  - Example: Linux ELF exception tables
C++ style exception catch block:
    push %ebp
    mov %esp,%ebp
    push %ebx
    sub $0x24,%esp
    movl $0x6,0xfffffff8(%ebp)
    mov 0x8(%ebp),%eax
    mov %eax,(%esp)
    call 804aafa
    jmp 804abe9
    mov %eax,0xfffffff4(%ebp)
    cmp $0x2,%edx
    je 804ab58
    ...
    mov 0xfffffff4(%ebp),%eax
    mov %eax,(%esp)
    call 804a388
    add $0x24,%esp
    pop %ebx
    pop %ebp
    ret

– 22 – Shared Code Models
[Diagram: Shared Code between Func A and Func B]
- Code may be shared between functions
  - Multiple entry functions
  - Compiler optimizations
- Analysis tools must be able to recognize and handle overlapping control flow
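
One way to recognize overlapping control flow is to collect the blocks reachable from each function entry and report the blocks owned by more than one function. The Python sketch below assumes an intra-procedural CFG given as a successor map; the block and function names are hypothetical.

```python
def reachable(cfg, entry):
    """Blocks reachable from `entry` along intra-procedural edges."""
    seen, stack = set(), [entry]
    while stack:
        block = stack.pop()
        if block not in seen:
            seen.add(block)
            stack.extend(cfg.get(block, ()))
    return seen


def shared_blocks(cfg, entries):
    """Map each block owned by more than one function to its set of owners."""
    owners = {}
    for name, entry in entries.items():
        for block in reachable(cfg, entry):
            owners.setdefault(block, set()).add(name)
    return {block: names for block, names in owners.items() if len(names) > 1}


# Hypothetical CFG: Func A and Func B each have a private entry block and
# then merge into a common tail S0 -> S1.
cfg = {"A0": ["S0"], "B0": ["S0"], "S0": ["S1"], "S1": []}
entries = {"A": "A0", "B": "B0"}
shared = shared_blocks(cfg, entries)
```

Here `S0` and `S1` are marked as shared while each function keeps its own entry block, which keeps the two functions independent from the point of view of later analysis.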

– 23 – Summary of Binary Analysis Techniques
- Control flow traversal is a powerful tool for addressing the challenges of modern binaries
  - Lying/missing symbol tables
  - Data/code disambiguation
  - Jump tables
- Speculative parsing techniques can be useful for expanding code coverage
  - Gaps in code
  - Indirect calls and branches

– 24 – Incidence of Shared Code in Binaries
- Parsed 828 Linux/x86 binaries
  - 238 contained shared code
- Most binaries contain only a few code-sharing functions
- Some code sharing may be due to non-returning call sites

– 25 – Where Do We Go From Here?
- Are there good solutions from first principles?
  - Almost certainly. We are just starting to explore the limits of such techniques.
- Are special case solutions necessary?
  - Again, almost certainly. We will try to use these as sparingly as possible.

– 26 – Future Directions in Binary Analysis
- Problem: code exists but is unreachable through standard control-flow traversal parsing
  - Heuristics are a moving target
- Existing opportunistic parsing techniques can help, but only to an extent
  - Exception handlers, virtual function tables may be recoverable from the binary
- Given the information we can recover from traditional techniques, can we synthesize additional information that will increase coverage of the binary?

– 27 – Statistical Binary Parsing
- Can we utilize known code to find unknown code?
  - We have a partial parse of the binary
  - Code in unknown regions of the binary will likely share characteristics with previously identified code
- Identify code in unknown regions:
  - Create a probabilistic model of valid code
  - Identify sections of unknown regions in the binary that are similar to valid code

– 28 – Binary Modeling Techniques
- Code idioms are one possibility for validating potential code
  - Function preambles, jump table bounds tests, system call stubs, case statements
- Idioms can be identified manually
- Model can be trained to identify new idioms with machine learning techniques
  - n-gram models, long-distance interaction
- Unparsed code can be scored to indicate its statistical similarity to known code
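
As a rough illustration of scoring unparsed bytes against known code, the Python sketch below trains a bigram (2-gram) model over instruction mnemonics and scores a candidate sequence by its average log probability. Using mnemonics as tokens, add-one smoothing, and the training sequences themselves are all assumptions of this sketch; the slides do not specify a model.

```python
import math
from collections import Counter

START = "<s>"  # hypothetical start-of-sequence token


def train_bigrams(sequences):
    """Count mnemonic bigrams in sequences decoded from already-parsed code."""
    unigrams, bigrams, vocab = Counter(), Counter(), set()
    for seq in sequences:
        padded = [START] + list(seq)
        vocab.update(padded)
        for prev, cur in zip(padded, padded[1:]):
            unigrams[prev] += 1
            bigrams[(prev, cur)] += 1
    return unigrams, bigrams, vocab


def score(seq, model):
    """Average log probability per instruction, with add-one smoothing."""
    unigrams, bigrams, vocab = model
    v = len(vocab) + 1                 # one extra slot for unseen mnemonics
    padded = [START] + list(seq)
    total = 0.0
    for prev, cur in zip(padded, padded[1:]):
        p = (bigrams[(prev, cur)] + 1) / (unigrams[prev] + v)
        total += math.log(p)
    return total / max(len(seq), 1)


# Hypothetical training data standing in for the partial parse of a binary.
known_code = [
    ["push", "mov", "sub", "call", "ret"],
    ["push", "mov", "call", "pop", "ret"],
]
model = train_bigrams(known_code)
code_like = score(["push", "mov", "call", "ret"], model)
data_like = score(["aaa", "das", "insb", "outsl"], model)
```

A sequence that resembles the training code scores higher than one made of mnemonics that typically appear when data is decoded as instructions. A threshold on the score is one way to decide which regions are worth parsing speculatively.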

– 29 – Open Questions in Binary Analysis
- What learning techniques will yield the best results?
- How can we overcome the relative dearth of information in binaries with very little code reachable through control flow analysis?
  - Incorporate information from analysis of other binaries
- What techniques will allow us to accurately identify the range of recognizable code?

– 30 – Questions?

– 31 – Backup Slides

– 32 – Shared Code Models
[Diagram: Shared Code (Func A, Func B) versus Multiple Entry (Entry A, Entry B)]
- What is the difference from the perspective of the parser?

– 33 – A Choice of Abstraction
- Shared code and multiple entry models are similar
  - Represent independent flows of control merging together
- Shared model is a better fit for Dyninst
  - Preserves semantic guarantees of function independence

– 34 – Shared Code
    000a94c0 :
    a94c0: cmpl $0x0,%gs:0xc
    a94c8: jne a94e7
    000a94ca :
    a94ca: push %ebx
    a94cb: mov 0x10(%esp,1),%edx
    a94cf: mov 0xc(%esp,1),%ecx
    a94d3: mov 0x8(%esp,1),%ebx
    a94d7: mov $0x7,%eax
    a94dc: int $0x80
    a94de: pop %ebx
    a94df: cmp $0xfffff001,%eax
    a94e4: jae a
- Code common to the two functions is marked as shared.
- Example: GNU libc library routines