Detecting Obfuscated Code Using Cosine Similarity


Overview
- Motivation
- Code Obfuscation Techniques and examples
- Proposed Approach
- Experiments and Results
- Extended Work and Results
- Future Work
- Limitations

Motivation
- Virus writers keep devising new ways to evade detection.
- Polymorphic and metamorphic viruses morph their code to evade detection.
- Virus writers play a code obfuscation-deobfuscation game.
- String-based detection is not suitable for "smart" viruses, such as those that morph their code.

PE Header

Code Obfuscation Techniques
- Dead code insertion
- Code transposition
- Register reassignment
- Instruction substitution

Example code and virus scanner signature

Original code:

Hex Opcodes    Assembly
51             push ecx
50             push eax
5B             pop ebx
8D 4B 38       lea ecx, [ebx + 38h]
50             push eax
E8 00000000    call 0h
5B             pop ebx
83 C3 1C       add ebx, 1Ch
FA             cli
8B 2B          mov ebp, [ebx]
5B             pop ebx

Signature:
5150 5B8D 4B38 50E8 0000 0000 5B83 C31C FA8B 2B5B

Dead code insertion

Hex Opcodes    Assembly
51             push ecx
90             nop
50             push eax
5B             pop ebx
8D 4B 38       lea ecx, [ebx + 38h]
50             push eax
90             nop
E8 00000000    call 0h
5B             pop ebx
83 C3 1C       add ebx, 1Ch
FA             cli
90             nop
8B 2B          mov ebp, [ebx]
5B             pop ebx

New signature:
5190 505B 8D4B 3850 90E8 0000 0000 5B83 C31C FA90 8B2B 5B
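To make the effect on string-based detection concrete, the following minimal sketch treats the two signatures above as raw byte strings and checks whether the original signature still occurs in the morphed code. The function names are hypothetical, and the NOP stripper is deliberately naive: it deletes every 0x90 byte, which is only safe here because no operand in this snippet contains that value.

```python
# Byte strings transcribed from the "Signature" and "New signature" lines.
SIGNATURE = bytes.fromhex("5150 5B8D 4B38 50E8 0000 0000 5B83 C31C FA8B 2B5B")
MORPHED = bytes.fromhex("5190 505B 8D4B 3850 90E8 0000 0000 5B83 C31C FA90 8B2B 5B")


def signature_matches(code: bytes, signature: bytes) -> bool:
    """String-based detection: the signature must appear verbatim in the code."""
    return signature in code


def strip_nops(code: bytes) -> bytes:
    """Naive dead-code removal (illustrative assumption): drop every 0x90 byte.

    A real tool would have to disassemble first, because 0x90 can also
    appear as an operand or immediate value.
    """
    return code.replace(b"\x90", b"")


if __name__ == "__main__":
    print(signature_matches(MORPHED, SIGNATURE))              # False
    print(signature_matches(strip_nops(MORPHED), SIGNATURE))  # True
```

Three inserted NOPs are enough to defeat the verbatim match, while removing them restores it.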

Code transposition

    push ecx
    push eax
    jmp A
C:  pop ebx
    add ebx, 1Ch
    cli
    mov ebp, [ebx]
    jmp D
A:  pop ebx
    lea ecx, [ebx + 38h]
    jmp B
B:  push eax
    call 0h
    jmp C
D:  pop ebx

Instruction substitution

Example:
1. add eax, 1
   can be substituted with
   sub eax, -1
2. mov eax, 5
   mul eax, 2
   can be substituted with
   mov eax, 0
   mov eax, 5
   add eax, 5

Proposed Approach
Given two programs, Program A and Program B, we would like to determine their degree of similarity.
- Disassemble the programs.
- Run a trace through the disassembled code to extract functional blocks.
- Encode each functional block as a one-dimensional vector based on the frequency of instructions within the block.
- Use cosine similarity to compute the similarity of two functional block vectors, one from Program A and one from Program B.

Proposed Approach (cont’d)
- Based on the functions with maximum similarity, compute the overall program similarity.
- This similarity allows us to infer whether one program is an obfuscated version of the other.

Data Modeling
Figure 1: Functional block data structure
Fun1: MOV 5, ADD, NOP 4, JMP 5, JE 6, SUB 8
Fun2: ...
Fun3: MOV 6, ADD 7, NOP 18, JMP 22, JE, SUB 3
Fun4: ...
Fun5: MOV, ADD 2, NOP 54, JMP 5, JE 16, SUB 4
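The per-function instruction counts in Figure 1 can be produced with a simple counting pass. The sketch below is one possible encoding, not the authors' implementation: the six-mnemonic vocabulary is taken from the figure purely for illustration (the method counts over the IA-32 instruction set), the name `block_vector` is hypothetical, and each input line is assumed to start with its mnemonic, with no label or address prefix.

```python
from collections import Counter

# Illustrative vocabulary taken from Figure 1 (assumption: a real run would
# use one position per IA-32 instruction).
VOCABULARY = ["MOV", "ADD", "NOP", "JMP", "JE", "SUB"]


def block_vector(instructions, vocabulary=VOCABULARY):
    """Encode one functional block as a vector of mnemonic frequencies."""
    counts = Counter(
        line.split()[0].upper()       # first token is assumed to be the mnemonic
        for line in instructions
        if line.strip()               # skip blank lines
    )
    # Counter returns 0 for mnemonics that never occur in the block.
    return [counts[mnemonic] for mnemonic in vocabulary]


if __name__ == "__main__":
    block = ["mov eax, 5", "add eax, 5", "nop", "mov ebx, eax", "jmp done"]
    print(block_vector(block))  # [2, 1, 1, 1, 0, 0]
```

Mnemonics outside the vocabulary are ignored in this sketch, which is why a full instruction vocabulary matters in practice.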

Experimental Methodology
[Figure 2: Comparison Engine, consisting of Module 1, Module 2, and Module 3]

Cosine Similarity
Given two vectors A and B:
$A = [x_1, x_2, x_3, x_4, \dots, x_n]$
$B = [y_1, y_2, y_3, y_4, \dots, y_n]$
Cosine similarity is given by
\[ \cos(A, B) = \frac{\sum_{i=1}^{n} x_i y_i}{\sqrt{\sum_{i=1}^{n} x_i^2}\,\sqrt{\sum_{i=1}^{n} y_i^2}} \]
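As a worked illustration of the formula, here is a minimal Python sketch. The function name and the choice to return 0.0 when either vector is all zeros are assumptions of this example; the slides do not say how an empty block is handled.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length frequency vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # assumption: an all-zero vector is similar to nothing
    return dot / (norm_a * norm_b)


if __name__ == "__main__":
    # Same instruction mix at twice the frequency: similarity is (close to) 1.
    print(cosine_similarity([1, 0, 2], [2, 0, 4]))
    # No instruction in common: similarity is 0.
    print(cosine_similarity([1, 0], [0, 1]))
```

Because the measure depends only on direction, a block whose instruction counts are uniformly scaled still scores as nearly identical.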

Algorithm to compute cosine similarity
For each function vector in targetA, find its maximum cosine similarity against the function vectors in targetB:

for (i = 1 to m) {              // m = number of functions in targetA
    max_sim[i] = 0
    for (j = 1 to k) {          // k = number of functions in targetB
        Sum_XY = 0; Sum_denomX = 0; Sum_denomY = 0
        for (z = 1 to n) {      // n = number of IA-32 instructions
            Sum_XY     += targetA[i][z] * targetB[j][z]
            Sum_denomX += targetA[i][z] * targetA[i][z]
            Sum_denomY += targetB[j][z] * targetB[j][z]
        }
        cosine_sim = Sum_XY / SQRT(Sum_denomX * Sum_denomY)
        if (max_sim[i] < cosine_sim) {
            max_sim[i] = cosine_sim
        }
    }
}
avg_max_sim = (max_sim[1] + ... + max_sim[m]) / m
If avg_max_sim ≥ Threshold, predict that the two programs are similar.
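The algorithm above can be sketched in Python as follows. This is an illustrative reading, not the authors' code: it assumes the running sums accumulate over all instructions and that `avg_max_sim` is the mean of the per-function maxima. The threshold of 0.9 is a placeholder, since the slides do not state the value used, and the function names are hypothetical.

```python
import math


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    denom = math.sqrt(sum(x * x for x in a) * sum(y * y for y in b))
    return dot / denom if denom else 0.0


def program_similarity(target_a, target_b, threshold=0.9):
    """Average, over functions of A, of the best cosine match found in B.

    target_a and target_b are lists of function vectors. The threshold is
    a hypothetical value chosen for this sketch.
    """
    if not target_a or not target_b:
        raise ValueError("both programs need at least one function vector")
    max_sims = [max(cosine(fa, fb) for fb in target_b) for fa in target_a]
    avg_max_sim = sum(max_sims) / len(max_sims)
    return avg_max_sim, avg_max_sim >= threshold


if __name__ == "__main__":
    program_a = [[5, 1, 4], [0, 2, 2]]
    program_b = [[0, 1, 1], [10, 2, 8]]   # same blocks, reordered and scaled
    print(program_similarity(program_a, program_b))
```

Note that the measure is asymmetric: it averages over the functions of the first program only, so swapping the arguments can change the result.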

Experiments

Program Name        Description
Win32.Evol.a / b    Metamorphic versions of Win32.Evol
Win32.Evul.a / b    Metamorphic versions of Win32.Evul
Win32.Oroch.a / b   Versions of Win32.Oroch
Java1.4 / 1.5       Versions of the JRE
PPT2001 / PPT2003   2001 and 2003 versions of Microsoft PowerPoint
Mcg.exe             A program written using MS VS6 for a Software Engineering course at UCF
Logoff              Windows Logoff program
MsPaint             Windows MS Paint program
Calc                Windows Calculator

Viruses experimented on
- Win32.Evol: known to be a true metamorphic virus that morphs code snippets to evade signature detection. Symantec describes Win32.Evol as a “32-bit metamorphic virus. It is the first W32 virus using a 32-bit true metamorphic engine. It can replicate on Windows 9x as well as Windows NT and Windows 2000.” Two versions, Evol.a and Evol.b, were compared.
- Win32.Oroch: a non-memory-resident encrypted Win32 virus. It replicates under Win32 systems and infects PE files. The virus uses anti-debugging tricks in its decryption routine and is quite stable under WinNT. Two versions, Oroch.a and Oroch.b, were compared.
- Win32.Evul: two of a possible eight versions were obtained.

Results

Source Program   Target Program   Avg. Cosine Similarity
Win32.Evol.a     Win32.Evol.b     0.999
Oroch.a          Oroch.b          1.000
Win32.Evul.a     Win32.Evul.b     0.996
Java1.5          Logoff.exe       0.886
                 Java1.4          0.998
PPT2001          PPT2003
                 Mcg.exe          0.870
                 Calc.exe         0.761
                                  0.866
                 Mspaint.exe      0.909

Extended Work
- Three similarity measures used:
  - Cosine similarity
  - Pearson correlation
  - Jaccard similarity
- Utilized a voting scheme based on matching functions

Jaccard Similarity
Given two vectors A and B:
$A = [x_1, x_2, x_3, x_4, \dots, x_n]$
$B = [y_1, y_2, y_3, y_4, \dots, y_n]$
Jaccard similarity is given by
\[ J(A, B) = \frac{\sum_{i=1}^{n} x_i y_i}{\sum_{i=1}^{n} x_i^2 + \sum_{i=1}^{n} y_i^2 - \sum_{i=1}^{n} x_i y_i} \]
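A minimal Python sketch of this vector form of the Jaccard measure follows. The function name and the zero-denominator convention (returning 0.0 when both vectors are all zeros) are assumptions of the example.

```python
def jaccard_similarity(a, b):
    """Jaccard similarity for real-valued vectors:
    dot(a, b) / (|a|^2 + |b|^2 - dot(a, b))."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    denom = sum(x * x for x in a) + sum(y * y for y in b) - dot
    return dot / denom if denom else 0.0  # assumption: two zero vectors score 0


if __name__ == "__main__":
    print(jaccard_similarity([1, 2], [1, 2]))  # 1.0
    print(jaccard_similarity([1, 0], [0, 1]))  # 0.0
    print(jaccard_similarity([2, 0], [1, 0]))  # 2 / 3
```

Unlike cosine similarity, this measure is sensitive to magnitude: the last call compares two vectors pointing in the same direction and still scores below 1.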

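Pearson Correlation
Given two vectors A and B:
$A = [x_1, x_2, x_3, x_4, \dots, x_n]$
$B = [y_1, y_2, y_3, y_4, \dots, y_n]$
With $\bar{x}$ and $\bar{y}$ denoting the means of the components of A and B, the Pearson correlation, rescaled to lie in [0, 1], is given by
\[ P(A, B) = \frac{1}{2}\left( \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2 \, \sum_{i=1}^{n} (y_i - \bar{y})^2}} + 1 \right) \]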
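The sketch below implements the Pearson measure under one reading of the formula above, namely ½(r + 1), where r is the ordinary correlation coefficient, so that the result lies in [0, 1] like the other two measures. That reading, the function name, and the handling of constant vectors (treated as r = 0) are assumptions of this example.

```python
import math


def pearson_similarity(a, b):
    """Pearson correlation r rescaled to [0, 1] as (r + 1) / 2."""
    if len(a) != len(b) or not a:
        raise ValueError("vectors must be non-empty and of equal length")
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    spread = sum((x - mean_a) ** 2 for x in a) * sum((y - mean_b) ** 2 for y in b)
    if spread == 0:
        return 0.5  # assumption: r is undefined for a constant vector, use r = 0
    return 0.5 * (cov / math.sqrt(spread) + 1)


if __name__ == "__main__":
    print(pearson_similarity([1, 2, 3], [2, 4, 6]))  # 1.0
    print(pearson_similarity([1, 2, 3], [3, 2, 1]))  # 0.0
```

Subtracting the means makes this measure insensitive to a constant offset in the counts, which cosine similarity is not.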

Final Functional Match
[Figure: voting table. Columns: Cosine, Pearson, Jaccard, Match. Functions of Program A and Program B shown: F1 to F5. Final functional matches shown: F3 (max similarity), F4 (2/3 max similarity), F5 (-1).]
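The slide gives only the outcome of the voting scheme, so the following is a guess at one plausible rule rather than the authors' procedure: each of the three measures nominates its best-matching function in Program B, a function nominated by at least two of the three measures becomes the final match, and -1 marks a function with no agreed match. The name `final_match` and the two-of-three rule are assumptions.

```python
from collections import Counter

NO_MATCH = -1  # sentinel used on the slide for an unmatched function


def final_match(cosine_pick, pearson_pick, jaccard_pick):
    """Majority vote over the best matches nominated by the three measures."""
    votes = Counter([cosine_pick, pearson_pick, jaccard_pick])
    winner, count = votes.most_common(1)[0]
    # Assumed rule: at least two of the three measures must agree.
    return winner if count >= 2 else NO_MATCH


if __name__ == "__main__":
    print(final_match("F3", "F3", "F3"))  # F3 (all measures agree)
    print(final_match("F4", "F4", "F1"))  # F4 (two of three agree)
    print(final_match("F1", "F2", "F5"))  # -1 (no agreement)
```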

Results

Source Program   Target Program   Avg. Similarity
Win32.Evol.a     Win32.Evol.b     0.998
Oroch.a          Oroch.b          1.000
Win32.Evul.a     Win32.Evul.b     0.994
Java1.5          Logoff.exe       0.689
                 Java1.4          0.988
PPT2001          PPT2003          0.997
                 Mcg.exe          0.657
                 Calc.exe         0.579
                                  0.627
                 Mspaint.exe      0.573

Future Work
- Automate disassembly output for Module 1.
- Use other machine learning algorithms for similarity analysis.
- Use other heuristic approaches (such as more extensive identification of dead code insertion techniques).
- Write a program obfuscator to minimize the effort of hand-coding obfuscations.
- Obtain and experiment on more obfuscated code samples.
- Test our method of detection against popular antivirus scanners.

Limitations
- The method does not work for packed code or for code that uses anti-disassembly techniques.
- Identification may be slower, since it requires additional computation compared to simple signature-based detection.
- Dead code insertion identification is limited (currently to NOPs).
- The method does not work for instruction substitution.

References
- M. Christodorescu, S. Jha. Testing Malware Detectors. International Symposium on Software Testing and Analysis, 2004.
- M. Christodorescu, S. Jha. Static Analysis of Executables to Detect Malicious Patterns. 12th USENIX Security Symposium, 2003.
- C. Collberg, C. Thomborson, D. Low. A Taxonomy of Obfuscating Transformations. Technical Report 148, Department of Computer Science, University of Auckland, New Zealand, July 1997.
- M. Weber, M. Schmid, M. Schatz, D. Geyer. A Toolkit for Detecting and Analyzing Malicious Software. Proceedings of ACSAC 2002.
- http://vx.netlux.org/
- http://www.symantec.com
- Peter Szor. The Art of Computer Virus Research and Defense. Symantec Press.
- M. Steinbach, G. Karypis, V. Kumar. A Comparison of Document Clustering Techniques. KDD Workshop on Text Mining, 2000.
- www.stanford.edu/class/cs276/handouts/lecture13-vector-classify.ppt

References (cont’d)
- M. Christodorescu, S. Jha, S. A. Seshia, D. Song, and R. E. Bryant. “Semantics-aware malware detection”. In Proceedings of the 2005 IEEE Symposium on Security and Privacy (S&P’05), pages 32–46, Oakland, CA, USA, May 8–11, 2005. IEEE Computer Society.
- Mila Dalla Preda, Mihai Christodorescu, Somesh Jha, and Saumya Debray. “A Semantics-Based Approach to Malware Detection”. In Proceedings of the 34th Annual Symposium on Principles of Programming Languages (POPL’07), 2007.
- Christian S. Collberg and Clark Thomborson. “Watermarking, tamper-proofing, and obfuscation - tools for software protection”. IEEE Transactions on Software Engineering, volume 28, pages 735–746, August 2002.
- A. Lakhotia, E. U. Kumar, and M. Venable. “A Method for Detecting Obfuscated Calls in Malicious Binaries”. IEEE Transactions on Software Engineering, Vol. 31, No. 11, November 2005.
- A. H. Sung, J. Xu, P. Chavez, and S. Mukkamala. “Static Analyzer of Vicious Executables (SAVE)”. In Proceedings of the 20th Annual Computer Security Applications Conference (ACSAC’04).