Automatic Reverse Engineering of Program Data Structures from Binary Execution Zhiqiang Lin Xiangyu Zhang, Dongyan Xu Dept. of Computer Science and CERIAS.

Presentation transcript:

Automatic Reverse Engineering of Program Data Structures from Binary Execution
Zhiqiang Lin, Xiangyu Zhang, Dongyan Xu
Dept. of Computer Science and CERIAS, Purdue University
March 3rd, 2010
The 17th Annual Network and Distributed System Security Symposium

Problem Definition
- Recover data structure specifications from a binary
  - Syntactic
    - Layout
    - Offset
    - Size
  - Semantic
    - ip_addr_t, pid_t, …
    - input_t
    - malloc_arg_t, format_string_t, …
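
To make "layout, offset, size" concrete, the sketch below computes that syntactic information from source for a made-up record. The struct `conn_record`, the `field_spec` type, and both helper functions are hypothetical names introduced for illustration; they do not come from the slides. A binary-only tool has to recover the same numbers without seeing the declaration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical record, used only for illustration. Fixed-width fields
   keep the layout identical across common ABIs. */
struct conn_record {
    uint32_t pid;       /* semantic tag one would like to recover: pid_t     */
    uint32_t ip;        /* semantic tag one would like to recover: ip_addr_t */
    char     name[16];  /* plain character array                             */
};

/* One line of a recovered specification. */
struct field_spec {
    size_t      offset;
    size_t      size;
    const char *type_tag;
};

/* Fill `out` with one entry per field; returns the number of entries. */
size_t describe_conn_record(struct field_spec out[3])
{
    out[0] = (struct field_spec){ offsetof(struct conn_record, pid),
                                  sizeof(uint32_t), "pid_t" };
    out[1] = (struct field_spec){ offsetof(struct conn_record, ip),
                                  sizeof(uint32_t), "ip_addr_t" };
    out[2] = (struct field_spec){ offsetof(struct conn_record, name),
                                  16, "char[16]" };
    return 3;
}

/* Print the spec in the "+offset: type" style used on the slides. */
void print_spec(const struct field_spec *spec, size_t n)
{
    for (size_t i = 0; i < n; i++)
        printf("+%02zu: %s (%zu bytes)\n",
               spec[i].offset, spec[i].type_tag, spec[i].size);
}
```

Calling `describe_conn_record` and then `print_spec` lists one `+offset: type` line per field, which is the shape of output the rest of the talk refers to as a data structure specification.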

Security Applications -- Memory Forensics
[Figure: kernel data structures and their fields: thread_info, task_struct, mm_struct (count, pgd, mmap, mmap_avl, mmap_sem), vm_area_struct (vm_start, vm_end, vm_flags, vm_inode, vm_ops, vm_next), fdtable, and file *fd_array[32]]

Security Applications -- Vulnerability Discovery

    #include <string.h>

    int main(int argc, char* argv[])
    {
        char tempname[1024];
        strcpy(tempname, argv[1]);   /* no bounds check on the input */
        return 0;
    }

    $ ./a.out aaa'\n'
    $ ./a.out aaaaaaaaaaaaaaaa…a'\n'

Recovered stack layout of main, with input-derived data tagged input_t: char[4], unused[1020], stack_frame_pointer, return_address, char*

Security Applications -- Program Signature
- Laika [Cozzie et al., OSDI’08]: using data structures as a classifier
- Malware signature

    struct A { int a; char b; int c; float d; };
    struct B { int h; char i; int j; float k; };

struct A = struct B: the two declarations have the same field types in the same order.
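
The point of the struct A / struct B comparison can be sketched in code: once field names are gone, only offsets, sizes, and coarse kinds remain, and on those the two structs are indistinguishable. This is not Laika's algorithm; `struct field`, the `FIELD` macro, and `same_layout` are hypothetical helpers that only illustrate why layout rather than naming carries the signal.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

struct A { int a; char b; int c; float d; };
struct B { int h; char i; int j; float k; };

/* One field as a binary-level observer would see it: where it sits,
   how wide it is, and a coarse kind ('i' int, 'c' char, 'f' float). */
struct field { size_t offset; size_t size; char kind; };

#define FIELD(type, member, kind) \
    { offsetof(type, member), sizeof(((type *)0)->member), (kind) }

const struct field sig_a[4] = {
    FIELD(struct A, a, 'i'), FIELD(struct A, b, 'c'),
    FIELD(struct A, c, 'i'), FIELD(struct A, d, 'f'),
};

const struct field sig_b[4] = {
    FIELD(struct B, h, 'i'), FIELD(struct B, i, 'c'),
    FIELD(struct B, j, 'i'), FIELD(struct B, k, 'f'),
};

/* Two layouts match when every field agrees on offset, size, and kind. */
int same_layout(const struct field *x, size_t nx,
                const struct field *y, size_t ny)
{
    if (nx != ny)
        return 0;
    for (size_t i = 0; i < nx; i++) {
        if (x[i].offset != y[i].offset ||
            x[i].size   != y[i].size   ||
            x[i].kind   != y[i].kind)
            return 0;
    }
    return 1;
}
```

`same_layout(sig_a, 4, sig_b, 4)` returns 1 even though no field name is shared, while changing the kind of any single field makes the comparison fail.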

Security Applications
- Memory forensics
- Vulnerability discovery
- Program signature
- Protocol format reverse engineering
- Virtual machine introspection
- …

Related Work
- Type inference
  - Hindley-Milner algorithm [Milner, 1978]
  - Iterative type analysis [Ungar et al, PLDI’90]
  - Object-oriented type inference [Palsberg et al, OOPSLA’91]
- Abstract type inference
  - Lackwit [O’Callahan et al, ICSE’97]
  - Dynamic Abstract Type Inference [Guo et al, ISSTA’06]
- All of these require source code

Related Work (binary-based)
- Variable discovery (DIVINE) [Balakrishnan and Reps, 2004, 2007, 2008]
  - Static binary points-to analysis, alias analysis
  - No semantics
- Decompilation [Mycroft, ESOP’99]
  - Execution-equivalent code
- Protocol format reverse engineering
  - Polyglot [Caballero et al, CCS’07], AutoFormat [Lin et al, NDSS’08], ANPA [Wondracek et al, NDSS’08], Tupni [Cui et al, CCS’08], Prospex [Comparetti et al., Oakland’09], ReFormat [Wang et al, ESORICS’09], Dispatcher [Caballero, CCS’09], …
  - Input and output messages

Overview
REWARDS: Reverse Engineering Work for Automatic Revelation of Data Structures
[Figure: application binary -> REWARDS (type resolution points; data flow tracking and type resolution) -> data structure specification]

Key Ideas

Source:

    #include <string.h>
    #include <unistd.h>

    struct {
        unsigned int pid;
        char data[16];
    } test;

    void foo() {
        char *p = "hello world";
        test.pid = getpid();
        strcpy(test.data, p);
    }

Disassembly (fragment):

    a0: call 0x80480b
    a5: mov $0x1,%eax
    aa: mov $0x0,%ebx
    …
    c1: call 0x
    c6: mov eax, 0x
    …
    : mov $0x14,%eax
    : int $0x
    : ret

Recovered data structures (from type resolution points plus data flow):

    bss_0x { +00: pid_t, +04: char[12], +16: unused[4] }    (global var test)
    fun_0x080480b4 { -28: unused[20], -08: char *, -04: stack_frame_t, +00: ret_addr_t }    (foo)
    fun_0x { +00: ret_addr_t }    (getpid)

Type Resolution Point - I
- System calls
  - Syscall num
  - Syscall_enter: type the parameter-passing registers (i.e., ebx, ecx, edx, esi, edi, and ebp) if they are involved
  - Syscall_exit: type the return value (eax)

    : mov $0x14,%eax
    : int $0x
    : ret
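
A system-call type resolution point can be pictured as a lookup keyed by the syscall number in eax. The sketch below is a minimal illustration under stated assumptions: the table layout, the use of strings for type names, and the functions `lookup_syscall`, `type_at_enter`, and `type_at_exit` are all hypothetical, and only three i386 Linux entries are listed (read = 3, write = 4, getpid = 20, the 0x14 seen in the slide's `mov $0x14,%eax`).

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_ARGS 6  /* ebx, ecx, edx, esi, edi, ebp */

struct syscall_sig {
    int         num;                  /* value in eax at syscall entry   */
    const char *name;
    int         nargs;
    const char *arg_types[MAX_ARGS];  /* types for ebx, ecx, edx, ...    */
    const char *ret_type;             /* type given to eax at exit       */
};

/* A few i386 Linux entries; a real table would cover every syscall. */
static const struct syscall_sig table[] = {
    { 3,  "read",   3, { "int", "void*", "size_t" },       "ssize_t" },
    { 4,  "write",  3, { "int", "const void*", "size_t" }, "ssize_t" },
    { 20, "getpid", 0, { 0 },                               "pid_t"   },
};

/* Find the signature for the syscall number held in eax, or NULL. */
const struct syscall_sig *lookup_syscall(int eax)
{
    for (size_t i = 0; i < sizeof table / sizeof table[0]; i++)
        if (table[i].num == eax)
            return &table[i];
    return NULL;
}

/* Syscall_enter: type of argument register `reg_index`
   (0 = ebx ... 5 = ebp), or NULL when the register is not involved. */
const char *type_at_enter(int eax, int reg_index)
{
    const struct syscall_sig *sig = lookup_syscall(eax);
    if (sig == NULL || reg_index < 0 || reg_index >= sig->nargs)
        return NULL;
    return sig->arg_types[reg_index];
}

/* Syscall_exit: type of the return value in eax, or NULL if unknown. */
const char *type_at_exit(int eax)
{
    const struct syscall_sig *sig = lookup_syscall(eax);
    return sig ? sig->ret_type : NULL;
}
```

With this table, `type_at_exit(0x14)` yields `"pid_t"`, which is the tag that later shows up on the global variable receiving the result of `getpid()`.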

Type Resolution Point - II
- Standard library calls (API)
  - Type the corresponding arguments and return value
  - Richer in type information than system calls
    - 2016 APIs in libc.so.6 vs. 289 system calls (2.6.15)

    ce: mov %eax,0x4(%esp)
    d2: movl $0x , (%esp)
    d9: call 0x80480e0

Type Resolution Point - III
- Other type-revealing instructions in binary code
  - String-related (e.g., MOVS/B/D/W, LODS/STOS/B/D/W)
  - Floating-point-related (e.g., FADD, FABS, FST)
  - Pointer-related (e.g., MOV (%edx), %ebx)

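Backward Type Resolution

    movl $0x ,%eax
    mov %eax, (%esp)
    call 0x80480e0        <- strcpy(char*, char*): arg1 is char*

The type char* of arg1 is resolved at the call and propagates backward to the instructions that produced the value.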

Forward Type Resolution

    call 0x               <- returns pid_t in %eax
    mov %eax, 0x

The type pid_t is resolved at the call and propagates forward from %eax to the memory location it is stored in.

Data Flow Propagation
- Similar to taint analysis, using shadow memory to keep the variable attributes and track the propagation
- New challenges
  - Two-way type resolution
  - Dynamic lifetime of stack and heap variables
    - Memory locations will be reused
  - Multiple instances of the same type
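
One way to picture shadow memory with two-way resolution is to give every freshly produced value a group id, let copies share that id, and store the resolved type per group. The sketch below is an assumption-laden toy, not the REWARDS implementation: slot indices stand in for addresses and registers, and `struct shadow` and the `shadow_*` functions are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define SHADOW_SLOTS 64    /* toy address space: slot index = location */
#define MAX_GROUPS   256   /* upper bound on distinct values tracked   */

struct shadow {
    int         group[SHADOW_SLOTS];     /* group id per slot, -1 if undefined */
    const char *group_type[MAX_GROUPS];  /* resolved type per group, or NULL   */
    int         next_group;
};

void shadow_init(struct shadow *s)
{
    for (int i = 0; i < SHADOW_SLOTS; i++)
        s->group[i] = -1;
    for (int i = 0; i < MAX_GROUPS; i++)
        s->group_type[i] = NULL;
    s->next_group = 0;
}

/* A fresh value is written to `slot` (constant load, call return, ...).
   Reusing a slot starts a new group, so the old type does not leak. */
void shadow_define(struct shadow *s, int slot)
{
    assert(slot >= 0 && slot < SHADOW_SLOTS);
    assert(s->next_group < MAX_GROUPS);
    s->group[slot] = s->next_group++;
}

/* mov src -> dst: dst now holds the same value, so it joins src's group. */
void shadow_move(struct shadow *s, int src, int dst)
{
    assert(src >= 0 && src < SHADOW_SLOTS);
    assert(dst >= 0 && dst < SHADOW_SLOTS);
    s->group[dst] = s->group[src];
}

/* A type resolution point reveals the type of the value in `slot`.
   Every location in the same group is typed at once, whether it was
   written before (backward) or after (forward) this point. */
void shadow_resolve(struct shadow *s, int slot, const char *type)
{
    assert(slot >= 0 && slot < SHADOW_SLOTS);
    if (s->group[slot] >= 0)
        s->group_type[s->group[slot]] = type;
}

const char *shadow_type_of(const struct shadow *s, int slot)
{
    assert(slot >= 0 && slot < SHADOW_SLOTS);
    return s->group[slot] < 0 ? NULL : s->group_type[s->group[slot]];
}
```

Forward resolution corresponds to `shadow_resolve` followed by `shadow_move`; backward resolution is the reverse order, where the copy happens first and the type arrives later. Redefining a slot models memory reuse: the slot leaves its old group and is untyped again.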

Implementations
- Dynamic binary instrumentation
  - A Pin tool on top of Pin-2.6
- Data structures of interest:
  - File information
  - Network communication
  - Process information
  - Input-influenced (for vulnerability discovery)

Evaluations
- Experimental setup
  - 10 utility binaries (e.g., ls, ps, ping, netstat, …)
  - Linux, gcc-3.4, 2 GB memory
- Ground truth
  - Debug information

Experimental Results
- False negatives: data structures we missed (compared with the ground truth)
  - Global: 70%
  - Heap: 55%
  - Stack: 60%
- Why false negatives
  - Dynamic analysis

Experimental Results
- False positives (we get the wrong data type)
  - Global: 3%
  - Heap: 0%
  - Stack: 15%

Sources of False Positives
- Loss of data structure hierarchy
  - There is no corresponding type resolution point with the same hierarchical type
  - Heuristics exist (e.g., AutoFormat [Lin et al, NDSS’08])
- Path-sensitive memory reuse
  - The compiler might assign different local variables declared in different program paths to the same memory address
  - Path-sensitive analysis
- Type cast
  - int a=(float)b;

Application I: Memory Forensics
[Figure: hex dump of a memory region, interpreted with the recovered layout below]

    struct 0x804dd4f {
      00: pthread_t;
      04: int;
      08: socket;
      12: struct sockaddr_in;
      28: time_t;
      32: time_t;
      36: unused[4];
      40: struct 0x804ddfb*;
    };

Application I: Memory Forensics (continued)
The struct sockaddr_in field at offset 12 of struct 0x804dd4f expands to:

    {
      00: sin_family;
      02: sin_port;
      04: sin_addr;
      08: sin_zero;
    };

The sin_addr bytes in the dump are tagged ip_addr_t.
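
Once the layout is known, turning raw dump bytes into meaningful values is a matter of reading fixed offsets. The sketch below decodes a sockaddr_in using the offsets listed on the slide (sin_family at 0, sin_port at 2, sin_addr at 4). The function and struct names are hypothetical. It assumes a little-endian x86 dump, with port and address stored in network byte order as is conventional for sockaddr_in.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Offsets inside struct sockaddr_in as listed on the slide. */
enum {
    SIN_FAMILY       = 0,
    SIN_PORT         = 2,
    SIN_ADDR         = 4,
    SIN_ZERO         = 8,
    SOCKADDR_IN_SIZE = 16
};

struct decoded_sockaddr {
    uint16_t family;   /* stored in host order (little-endian on x86) */
    uint16_t port;     /* converted from network byte order           */
    char     ip[16];   /* dotted quad plus terminating NUL            */
};

/* Decode a sockaddr_in located at byte offset `base` in a raw dump.
   Returns 0 on success, -1 if the record does not fit in the dump. */
int decode_sockaddr_in(const uint8_t *dump, size_t len, size_t base,
                       struct decoded_sockaddr *out)
{
    if (base > len || len - base < SOCKADDR_IN_SIZE)
        return -1;

    const uint8_t *p = dump + base;
    out->family = (uint16_t)(p[SIN_FAMILY] | (p[SIN_FAMILY + 1] << 8));
    out->port   = (uint16_t)((p[SIN_PORT] << 8) | p[SIN_PORT + 1]);
    snprintf(out->ip, sizeof out->ip, "%u.%u.%u.%u",
             (unsigned)p[SIN_ADDR],     (unsigned)p[SIN_ADDR + 1],
             (unsigned)p[SIN_ADDR + 2], (unsigned)p[SIN_ADDR + 3]);
    return 0;
}
```

For the structure shown above, the call would use `base = 12`, since that is where struct sockaddr_in sits inside struct 0x804dd4f. The 16-byte size also matches the gap between offset 12 and the time_t at offset 28.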

Application II: Vulnerability Discovery
[Table: per-program counts of suspects (#total) and true vulnerabilities (#real) for buffer overflow, integer overflow, and format string; cell values not preserved]
Programs: ncompress, bftpd, gzip, nullhttpd, xzgv, gnuPG, ipgrab, cfingerd, ngircd

Discussion
- Dynamic analysis
  - Full coverage of data structures is not guaranteed
- Only user-level data structures
  - OS kernel
- An obfuscated binary can cheat REWARDS
  - No library calls
  - Memory casts
- Lack of other application-specific type resolution points
  - Other APIs
    - E.g., browser-related

Summary
- REWARDS
  - Binary only
  - Dynamic analysis
  - Data flow tracking
- Key insight
  - Using system calls, APIs, and type-revealing instructions as type resolution points
  - Two-way type resolution
- Unique benefits to memory forensics and vulnerability discovery


Backup: Path-sensitive memory reuse (pudclient.c)

Fragment starting at line 1061:

    if (udp.len != 0) if (!audp_recv(&udp,&tmpclient,buf,3000)) {
        struct header *rc;
        struct llheader *llrc;
        char k=0;
        buf+=sizeof(struct llheader);
        udp.len-=sizeof(struct llheader);
        llrc=(struct llheader *)(buf-sizeof(struct llheader));
        rc=(struct header *)buf;
        if (llrc->type == 0) {
            struct llheader ll;
            memset((void*)&ll,0,sizeof(struct llheader));

Fragment starting at line 919:

    case 'e': {
        if (strlen(params[0]) > 1) switch (tolower(params[0][1])) {
            case 's': {
                struct escan_rec rc;
                if (num_params \n");
                else {
                    memset((void*)&rc,0,sizeof(struct escan_rec));

Backup: Offline resolution is sometimes needed
[Figure: timeline from source to sinks, with the type resolved backward]
1. Variable A gets defined
2. Variable A dies (its record is kept in the shadow log file)
3. The type of Variable A gets resolved
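
The timeline above can be sketched as a two-phase process: online, a dying variable is appended to a log even if its type is still unknown; offline, a later resolution patches the logged records. The code below is a hypothetical illustration. `struct shadow_log`, `value_id` (an identifier linking a variable to the value it held), and both functions are assumptions introduced here, not the format of the actual shadow log file.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define LOG_CAP 32

struct log_entry {
    unsigned long addr;      /* where the variable lived              */
    int           value_id;  /* id of the value it held when it died  */
    const char   *type;      /* NULL while the type is unresolved     */
};

struct shadow_log {
    struct log_entry entries[LOG_CAP];
    size_t           count;
};

/* Online phase: called when a variable dies, for example when its
   stack frame is popped. Returns 0 on success, -1 if the log is full. */
int log_dead_variable(struct shadow_log *log, unsigned long addr,
                      int value_id, const char *type_if_known)
{
    if (log->count == LOG_CAP)
        return -1;
    log->entries[log->count++] =
        (struct log_entry){ addr, value_id, type_if_known };
    return 0;
}

/* Offline phase: a later resolution point typed `value_id`, so patch
   every still-untyped logged variable that held it.
   Returns the number of entries patched. */
size_t resolve_offline(struct shadow_log *log, int value_id,
                       const char *type)
{
    size_t patched = 0;
    for (size_t i = 0; i < log->count; i++) {
        struct log_entry *e = &log->entries[i];
        if (e->value_id == value_id && e->type == NULL) {
            e->type = type;
            patched++;
        }
    }
    return patched;
}
```

Keeping dead variables in a log is what lets backward resolution reach variables whose storage has already been reclaimed by the time the type becomes known.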

Backup: Vulnerability Discovery
- Buffer overflow (input_t from argv)
    fun_0x08048e76 { : char[13], : unused[1023], : stack_frame_t, +0000: ret_addr_t, +0004: char* }
- Format string (input_t from recv)
    fun_0x0805f9a5 { ..., : format_string_t[76], : unused[204], : stack_frame_t, +0000: ret_addr_t, ... }
- Integer overflow (input_t from fread)
    bss_0x0809ac80 { : int (malloc_arg_t), : int (malloc_arg_t), ... }
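
A recovered frame specification like the buffer overflow example can be screened mechanically. The sketch below applies one hypothetical rule, which is an assumption for illustration rather than the criterion used in the talk: flag a frame when an input-tagged character buffer sits at a lower offset than the return address. The `frame_field` type and `is_overflow_suspect` function are invented names.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

struct frame_field {
    int         offset;        /* relative to the return address slot   */
    const char *type;          /* e.g. "char[13]", "ret_addr_t"         */
    int         input_tagged;  /* 1 when the bytes derive from input_t  */
};

/* Hypothetical screening rule: a frame is a buffer overflow suspect
   when an input-tagged char buffer lies below the return address, so
   that writing past the buffer's end moves toward ret_addr_t. */
int is_overflow_suspect(const struct frame_field *f, size_t n)
{
    int ret_off = 0;
    int have_ret = 0;

    for (size_t i = 0; i < n; i++) {
        if (strcmp(f[i].type, "ret_addr_t") == 0) {
            ret_off = f[i].offset;
            have_ret = 1;
        }
    }
    if (!have_ret)
        return 0;

    for (size_t i = 0; i < n; i++) {
        if (f[i].input_tagged &&
            strncmp(f[i].type, "char[", 5) == 0 &&
            f[i].offset < ret_off)
            return 1;
    }
    return 0;
}
```

A frame shaped like the first example (an input-tagged char[13] followed by unused space, the saved frame pointer, and the return address) is flagged, while the same frame without the input tag is not. Such a result is only a suspect and still needs confirmation, which is why the results table distinguishes #total from #real.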