Information Hiding in Program Binaries Rakan El-Khalil xvr  xvr  net.

Slides:



Advertisements
Similar presentations
Practical Malware Analysis
Advertisements

Fabián E. Bustamante, Spring 2007 Machine-Level Programming II: Control Flow Today Condition codes Control flow structures Next time Procedures.
Smita Thaker 1 Polymorphic & Metamorphic Viruses Presented By : Smita Thaker Dated : Nov 18, 2003.
Introduction to Watermarking Anna Ukovich Image Processing Laboratory (IPL)
Information Hiding: Watermarking and Steganography
White-Box Cryptography
C Programming and Assembly Language Janakiraman V – NITK Surathkal 2 nd August 2014.
Dr. Richard Ford  Szor 7  Another way viruses try to evade scanners.
Wmobf.1 1/5/00 Clark Thomborson Watermarking, Tamper-Proofing and Obfuscation – Tools for Software Protection Christian Collberg & Clark Thomborson Computer.
Joshua Mason, Sam Small Johns Hopkins University Fabian Monrose University of North Carolina Greg MacManus iSIGHT Partners 16th ACM CCS.
Impeding Malware Analysis Using Conditional Code Obfuscation Paper by: Monirul Sharif, Andrea Lanzi, Jonathon Giffin, and Wenke Lee Conference: Network.
1 ICS 51 Introductory Computer Organization Fall 2006 updated: Oct. 2, 2006.
In the last part of the course we make a review of selected technical problems in multimedia signal processing First problem: CONTENT SECURITY AND WATERMARKING.
Practical Session 3. The Stack The stack is an area in memory that its purpose is to provide a space for temporary storage of addresses and data items.
C Prog. To Object Code text text binary binary Code in files p1.c p2.c
Recitation 2: Assembly & gdb Andrew Faulring Section A 16 September 2002.
Application Security Tom Chothia Computer Security, Lecture 14.
Watermarking University of Palestine Eng. Wisam Zaqoot May 2010.
Y86 Processor State Program Registers
Discovering Similarity of Short Programs by Canonical Form Baohua Wu University of Pennsylvania.
Digital image processing is the use of computer algorithms to perform image processing on digital images which is a subfield of digital signal processing.
EECS 354 Network Security Reverse Engineering. Introduction Preventing Reverse Engineering Reversing High Level Languages Reversing an ELF Executable.
Recitation 2 – 2/11/02 Outline Stacks & Procedures Homogenous Data –Arrays –Nested Arrays Mengzhi Wang Office Hours: Thursday.
Johann A. Briffa Mahesh Theru Manohar Das A Robust Method For Imperceptible High- Capacity Information Hiding in Images. INTRODUCTION  The art of Hidden.
RIVERSIDE RESEARCH INSTITUTE Deobfuscator: An Automated Approach to the Identification and Removal of Code Obfuscation Eric Laspe, Reverse Engineer Jason.
Machine-Level Programming 3 Control Flow Topics Control Flow Switch Statements Jump Tables.
Analyzing Memory Accesses in Obfuscated x86 Executables Michael Venable Mohamed R. Choucane Md. Enamul Karim Arun Lakhotia (Presenter) DIMVA 2005 Wien.
Protecting Software Code By Guards The George Washington University Cs297 YU-HAO HU.
Assembly and Bomb Lab : Introduction to Computer Systems Recitation 4: Monday, Sept. 16, 2013 Marjorie Carlson Section A.
Assembly Language. Symbol Table Variables.DATA var DW 0 sum DD 0 array TIMES 10 DW 0 message DB ’ Welcome ’,0 char1 DB ? Symbol Table Name Offset var.
1 Understanding Pointers Buffer Overflow. 2 Outline Understanding Pointers Buffer Overflow Suggested reading –Chap 3.10, 3.12.
Operating System Protection Through Program Evolution Fred Cohen Computers and Security 1992.
Buffer Overflow Attack- proofing of Code Binaries Ramya Reguramalingam Gopal Gupta Gopal Gupta Department of Computer Science University of Texas at Dallas.
University of Amsterdam Computer Systems – the instruction set architecture Arnoud Visser 1 Computer Systems The instruction set architecture.
Overview of Back-end for CComp Zhaopeng Li Software Security Lab. June 8, 2009.
Paradyn Project Paradyn / Dyninst Week Madison, Wisconsin May 2-4, 2011 unstrip: Restoring Function Information to Stripped Binaries Using Dyninst Emily.
Paradyn Project Paradyn / Dyninst Week Madison, Wisconsin April 12-14, 2010 Paradyn Project Safe and Efficient Instrumentation Andrew Bernat.
Practical Session 3.
A job ad at a game programming company
Machine-Level Programming 2 Control Flow
Static and dynamic analysis of binaries
Recitation 2 – 2/11/02 Outline Stacks & Procedures
Aaron Miller David Cohen Spring 2011
Homework In-line Assembly Code Machine Language
Computer Architecture
Recitation 2 – 2/4/01 Outline Machine Model
Emily Jacobson and Nathan Rosenblum
Chapter 3 Machine-Level Representation of Programs
Computer Architecture adapted by Jason Fritts then by David Ferry
Y86 Processor State Program Registers
BIC 10503: COMPUTER ARCHITECTURE
Instruction Scheduling for Instruction-Level Parallelism
Code Generation.
C Prog. To Object Code text text binary binary Code in files p1.c p2.c
Software Watermarking Deterring Software Piracy
Machine-Level Programming 2 Control Flow
Systems I Pipelining II
Assembly Language Programming II: C Compiler Calling Sequences
Machine-Level Programming 2 Control Flow
Machine-Level Representation of Programs III
Machine-Level Programming 2 Control Flow
第七章 資訊隱藏 張真誠 國立中正大學資訊工程研究所.
Machine Level Representation of Programs (IV)
Chapter 3 Machine-Level Representation of Programs
RopSteg: Program Steganography with Return Oriented Programming
Systems I Pipelining II
Computer Architecture and System Programming Laboratory
Systems I Pipelining II
Computer Architecture and System Programming Laboratory
Presentation transcript:

Information Hiding in Program Binaries Rakan El-Khalil xvr  xvr  net

Warning: Steganography vs. Stenography.

Intro Information hiding overview Theoretical aspects of Software marking In practice… Applications

Types of Information Hiding Steganography Covert channels Anonymity Copyright marking  Robust marks Fingerprinting Watermarking [imperceptible or visible]  Fragile marks

General Methods Security through obscurity  Mostly used historically  Sometimes used today Camouflage  Hiding in plain sight  Hiding the location of the embedded data Spreading the hidden information

Strength Evaluation Data-Rate Stealth Resilience

Attacks Subtractive Distortive Additive w

Attacks Subtractive Distortive Additive w

Attacks Subtractive Distortive Additive w w’

Attacks Subtractive Distortive Additive w w+w’ w’

Mediums Sound Image Video Text Relational Databases Sets of Numbers Etc…

Binary Info Hiding Overview Low redundancy medium Static marks Dynamic marks

Static Data Marks char mark[] = “All your base…” switch (a) { case 1: return “are”; case 2: return “belong”; case 3: return “to us”; … } Etc…

Static Code Marks { int gonads, strife; gonads = 1; strife = 1; printf (“weeeeee”); } { int gonads, strife; printf (“weeeeee”); gonads = 1; strife = 1; }

Static Code Marks [cont.] … if(…) goto a if(…) goto d if(…) goto b a b c d

Static Code Marks [fin.] Pro: easy to implement Con: easy to break.

Dynamic Marks Mark stored in program’s execution state Types of marks:  Data structure  Execution trace  Easter egg

Easter Egg

Dynamic Data Structure Var[0] = 0x ; Var[1] = 0x ; Var[2] = 0x ; Var[3] = 0x ; Var[0] = 0x ; Var[1] = 0x ; Var[2] = 0x74204d61; Var[3] = 0x ; Op1 OpN … “The Great Mahir” Input1 InputN

Dynamic Graph Watermarking Generate a number n = P x Q Watermark is the graph topology Eg: Radix-5 encoding = 7*191 =

Dynamic Execution Trace 80480d3: 85 db test %ebx,%ebx 80480d5: 7e 29 jle 0x d7: 83 7d cmpl $0x0,0x8(%ebp) 80480db: je 0x dd: 8b mov 0x8(%ebp),%eax 80480e0: a3 40 bc mov %eax,0x808bc e5: cmpb $0x0,(%eax) … : b mov $0x0,%eax : 85 c0 test %eax,%eax : 74 0c je 0x : 83 c4 f4 add $0xfffffff4,%esp

Attacks Semantics preserving transformations { char c1, c2, c3; c1 = ‘u’; c2 = ‘n’; c3 = ‘f’; } char c1, c2, c3; int i; for (i=1; i <= 3; i++) { switch (i) { case 1: c1 = ‘u’ - 2; break; case 2: c2 = ‘n’ - 1; c1++; break; case 3: c3 = ‘f’; c1++; c2++; break; }}

Attacks on Dynamic Marking Add extra pointers Rename and reorder fields Add levels of indirection

In Practice… Difference between bytecode [Java,.Net, etc] and machine code [x86 asm]. Data vs. Code Problem.

In Practice Can’t use advanced techniques. Little work done on machine code watermarking.

Virus Intro Virus Code Program Entry Point Control Hijacking Control Return

Virus Code Obfuscation Encrypted  Fixed decryption routine  Changing virus body Polymorphic  Mutation engine that randomizes decryption routine Metamorphic  No decryptor  Randomizes its code

Metamorphic Tricks Register Swapping: 89 c4 mov %eax,%eax 80 3e 2f cmpb $0x2f,(%esi) jne 0x80480fa 8d lea 0x1(%esi),%ebx 46 inc %esi 80 3e 00 cmpb $0x0,(%esi) 89 f6 mov %esi,%esi f cmpb $0x2f,(%eax) jne 0x80480fa 8d lea 0x1(%eax),%ecx 40 inc %eax cmpb $0x0,(%eax)

More Tricks Instruction substitutions Changing of data values Nop and garbage insertions Branch reversals Alternate opcode encodings …

Hydan Generic information hiding tool Works with instruction substitution

Hydan Demo

Example Substitutions 83 c0 40: add %eax, $0x40 83 e8 c0: sub %eax, $-0x40 85 c0: test %eax, %eax 09 c0: or %eax, %eax 0b c0: or %eax, %eax 21 c0: and %eax, %eax

Hydan Examples [cont.] Embed 0100 into a listing: 83 c4 10 add %esp,$0x10 21 c0 and %eax,%eax je 0x804cbc0 83 ec 04 sub %esp,$0x4 50 push %eax 83 c4 10 add %esp,$0x10 ob c0 or %eax,%eax je 0x804cbc0 83 c4 fc add %esp,$-0x4 50 push %eax

Extra Security Techniques Message Whitening:  ASCII text easily recognizable  Text encrypted with Blowfish in CBC mode  Length masked with SHA hash of password

Random Walk:

Flag Collision Detection: Some ‘equivalent’ instructions set flags differently:  Eg: add vs. sub What to do? Scan forwards. xorl %eax, %eax addl $0xffffff, %eax xorl %eax, %eax subl $0x1, %eax %eax: -1, OF = 0, CF = 0 %eax: -1, OF = 0, CF = 1

Codebook Embedding: Build a codebook of equivalent instructions:  2 insns --> 1 bit  4 insns --> 2 bits  8 insns --> 3 bits What happens if we have 7 insns? Encoding Rate: { log 2 (N): N is a power of 2 log 2 (N-1): otherwise

Hydan Issues Detectable  Instructions are not created equal Low bandwidth  1/110 vs. Outguess’ 1/17 Easy to tamper with Breaks SMC

Statistics

Future Work: Reordering Given a list of n objects:  Can embed: floor [ log 2 (n!) ] bits: NBits

Reordering Method How does one encode data with an ordering? Eg: n = 3: 000  abc 001  acb 010  bac 011  bca 100  cab 101  cba

Encoding Algorithm More specifically:  floor [ log 2 (n!) ] bits of input  Take input and decompose it along its factorials.  Each factor represents the index of item in the sorted substring. Eg: n = 4, input =  Floor [ log2(4!) ]  4bits  First 4 bits: 1101 == 13  Decomposition: 13 == 2*3! + 0*2! + 1*1!  Resulting string: abcd  cabd  cabd  cadb

Decoding Algorithm Similarly:  floor [ log 2 (strlen(input)!) ] bits of output  Take input string and decompose it along its factorials.  Each factor represents the index of item in the sorted substring. Eg: input = cadb  Floor [ log2(4!) ]  4bits  Decomposition: 2*3! + 0*2! + 1*1! == 13  Result: 1101

Reorderables What can be reordered?  Functions  Independent instructions  Arguments  Register allocation  Data  …

Instruction Reordering mov $2, %eax mov $3, %ebx push %ecx call $0x mov $2, %eax mov $3, %ebx push %ecx mov $2, %eax push %ecx mov $3, %ebx mov $2, %eax push %ecx mov $3, %ebx push %ecx mov $2, %eax push %ecx mov $2, %eax mov $3, %ebx push %ecx mov $3, %ebx mov $2, %eax

Instruction Reordering Problem mov $2, %eax mov $3, %ebx push %ecx call $0x808000

Insn Reordering Prob [cont] Need to use a VM to emulate:  All Execution paths  Stack  Heap

Uses Traditional info hiding Security Polymorphism

Conclusion + QA