Published by Hermann Grosser. Modified over 6 years ago.
1
Detecting Obfuscated Code Using Cosine Similarity
2
Overview
Motivation
Code Obfuscation Techniques and examples
Proposed Approach
Experiments and Results
Extended Work and Results
Future Work
Limitations
3
Motivation
Virus writers keep coming up with new ways to evade detection
Polymorphic and metamorphic viruses morph their code to evade detection
There is a code obfuscation-deobfuscation game played by virus writers
String-based detection is not suitable for “smart” viruses, such as those that morph their code
4
PE Header
5
Code Obfuscation Techniques
Dead code insertion
Code transposition
Register reassignment
Instruction substitution
6
Example code and virus scanner signature
Original Code
Hex Opcodes    Assembly
51             push ecx
50             push eax
5B             pop ebx
8D 4B 38       lea ecx, [ebx + 38h]
E              call 0h
83 C3 1C       add ebx, 1Ch
FA             cli
8B 2B          mov ebp, [ebx]
Signature: 5150 5B8D 4B38 50E B83 C31C FA8B 2B5B
7
Dead code insertion
Hex Opcodes    Assembly
51             push ecx
90             nop
50             push eax
5B             pop ebx
8D 4B 38       lea ecx, [ebx + 38h]
E              call 0h
83 C3 1C       add ebx, 1Ch
FA             cli
8B 2B          mov ebp, [ebx]
New Signature: B 8D4B E B83 C31C FA90 8B2B 5B
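The effect of dead code insertion on string-based detection can be illustrated with a minimal sketch. It uses only the first four instructions of the listing (`51 50 5B 8D 4B 38`); the `matches` helper and the choice of the raw byte string as the signature are assumptions made for illustration, not the scanner used in the slides.

```python
# Bytes of the first four instructions of the example listing:
# push ecx; push eax; pop ebx; lea ecx, [ebx + 38h]
original = bytes.fromhex("51 50 5B 8D 4B 38")

# The same instructions with a single NOP (0x90) inserted after push ecx.
obfuscated = bytes.fromhex("51 90 50 5B 8D 4B 38")

# Hypothetical scanner signature: the exact byte string of the original code.
signature = original


def matches(signature: bytes, code: bytes) -> bool:
    """Return True if the signature occurs as a contiguous byte string."""
    return signature in code


print(matches(signature, original))    # True
print(matches(signature, obfuscated))  # False
```

One inserted byte is enough to break a contiguous byte-string match, even though the behavior of the code is unchanged.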
8
Code transposition
    push ecx
    push eax
    jmp A
C:  pop ebx
    add ebx, 1Ch
    cli
    mov ebp, [ebx]
    jmp D
A:  pop ebx
    lea ecx, [ebx + 38h]
    jmp B
B:  push eax
    call 0h
    jmp C
D:  pop ebx
9
Instruction substitution
Example:
1. add eax, 1
   Can be substituted as
   sub eax, -1
2. mov eax, 5
   mul eax, 2
   Can be substituted as
   mov eax, 0
   mov eax, 5
   add eax, 5
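As a quick check of the first substitution, the sketch below models a 32-bit register in Python and confirms that `add eax, 1` and `sub eax, -1` leave the same value in `eax`. The helper names `add_imm` and `sub_imm` are hypothetical, and the model covers only the register value, not the processor flags.

```python
MASK32 = 0xFFFFFFFF  # IA-32 general-purpose registers are 32 bits wide


def add_imm(eax: int, imm: int) -> int:
    """Model 'add eax, imm' on a 32-bit register (flags are ignored)."""
    return (eax + imm) & MASK32


def sub_imm(eax: int, imm: int) -> int:
    """Model 'sub eax, imm' on a 32-bit register (flags are ignored)."""
    return (eax - imm) & MASK32


# Both forms produce the same register value, including at wraparound.
for eax in (0, 5, 0x7FFFFFFF, 0xFFFFFFFF):
    assert add_imm(eax, 1) == sub_imm(eax, -1)
```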
10
Proposed Approach
Given two programs, Program A and Program B, we would like to determine their degree of similarity
Disassemble the programs
Run a trace through the disassembled code to extract functional blocks
Encode each functional block as a one-dimensional vector based on the frequency of instructions within the block
Use cosine similarity to compute the similarity of two functional block vectors, one from Program A and one from Program B
11
Proposed Approach (cont’d)
Based on functions with maximum similarity, compute overall program similarity. This similarity allows us to infer whether one program is an obfuscated version of the other.
12
Data Modeling
Figure 1: Functional block Data Structure
Fun1: MOV 5, ADD, NOP 4, JMP 5, JE 6, SUB 8
Fun2: ………………..
Fun3: MOV 6, ADD 7, NOP 18, JMP 22, JE, SUB 3
Fun4: ………………..
Fun5: MOV, ADD 2, NOP 54, JMP 5, JE 16, SUB 4
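The data structure in Figure 1 can be sketched as one frequency vector per functional block. The code below is a minimal illustration, assuming a six-mnemonic vocabulary taken from the figure; `VOCABULARY` and `encode_block` are hypothetical names, and the full approach would use the complete IA-32 instruction set.

```python
from collections import Counter

# Hypothetical, tiny vocabulary using the mnemonics shown in Figure 1.
VOCABULARY = ["MOV", "ADD", "NOP", "JMP", "JE", "SUB"]


def encode_block(mnemonics, vocabulary=VOCABULARY):
    """Encode a functional block as a vector of instruction frequencies."""
    counts = Counter(m.upper() for m in mnemonics)
    # One entry per vocabulary mnemonic; absent mnemonics count as 0.
    return [counts[m] for m in vocabulary]


block = ["mov", "mov", "add", "nop", "jmp", "mov", "sub"]
print(encode_block(block))  # [3, 1, 1, 1, 0, 1]
```

Because the vector records only counts, reordering the instructions of a block leaves its encoding unchanged.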
13
Experimental Methodology
Figure 2: Comparison Engine (Module 1, Module 2, Module 3)
14
Cosine Similarity
Given two vectors A and B
Vector A = [x1 x2 x3 x4……xn]
Vector B = [y1 y2 y3 y4……yn]
Cosine similarity is given by
$$\cos(A, B) = \frac{\sum_{i=1}^{n} x_i y_i}{\left(\sum_{i=1}^{n} x_i^2\right)^{1/2} \left(\sum_{i=1}^{n} y_i^2\right)^{1/2}}$$
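The formula translates directly into a few lines of Python. This is a minimal sketch; returning 0.0 when either vector is all zeros is an assumption of this example, since the slides do not say how an empty block is handled.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length frequency vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # assumption: an all-zero vector matches nothing
    return dot / (norm_a * norm_b)


# Vectors pointing in the same direction score close to 1,
# vectors with no instruction in common score 0.
print(cosine_similarity([5, 0, 4], [10, 0, 8]))
print(cosine_similarity([1, 0], [0, 1]))
```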
15
Algorithm to compute cosine similarity
For each function vector in targetA[], do the following:
for (i = 1 to m) {                  // m = number of functions in targetA
    Cosine_sim[i] = 0
    for (j = 1 to k) {              // k = number of functions in targetB
        Initialize Sum_XY, Sum_denomX, Sum_denomY to 0
        for (z = 1 to n) {          // n = number of IA-32 instructions
            Sum_XY     += targetA[i][z] * targetB[j][z]
            Sum_denomX += targetA[i][z] * targetA[i][z]
            Sum_denomY += targetB[j][z] * targetB[j][z]
        }
        cosine_sim = Sum_XY / SQRT(Sum_denomX * Sum_denomY)
        if (Cosine_sim[i] < cosine_sim) {
            Cosine_sim[i] = cosine_sim      // keep the best match for function i
        }
    }
}
avg_max_sim = (Cosine_sim[1] + ... + Cosine_sim[m]) / m
If avg_max_sim ≥ Threshold, then predict that the two programs are similar
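The algorithm can be sketched in Python as follows. The sketch assumes that the overall score is the average, over the functions of program A, of each function's best cosine match in program B, which is how the name `avg_max_sim` reads; the `THRESHOLD` value is hypothetical, since the slides do not state one.

```python
import math

THRESHOLD = 0.9  # hypothetical value; the slides do not give the threshold


def cosine(a, b):
    """Cosine similarity of two equal-length frequency vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    denom = math.sqrt(sum(x * x for x in a) * sum(y * y for y in b))
    return dot / denom if denom else 0.0


def program_similarity(target_a, target_b):
    """Average, over functions in A, of the best cosine match in B."""
    if not target_a or not target_b:
        return 0.0
    best = [max(cosine(fa, fb) for fb in target_b) for fa in target_a]
    return sum(best) / len(best)


def predict_similar(target_a, target_b, threshold=THRESHOLD):
    return program_similarity(target_a, target_b) >= threshold


# Two toy programs whose functions are scaled copies of each other.
prog_a = [[1, 0, 2], [0, 3, 0]]
prog_b = [[0, 6, 0], [2, 0, 4]]
print(predict_similar(prog_a, prog_b))  # True
```

The cost is proportional to m × k × n, which is the additional computation compared with a single signature scan.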
16
Experiments
Program Name         Description
Win32.Evol.a / b     Metamorphic versions of Win32.Evol
Win32.Evul.a / b     Metamorphic versions of Win32.Evul
Win32.Oroch.a / b    Versions of Win32.Oroch
Java1.4 / 1.5        Versions of JRE
PPT2001 / PPT2003    2001 and 2003 versions of Microsoft PowerPoint
Mcg.exe              A program written using MS VS6 for a Software Engineering course at UCF
Logoff               Windows Logoff program
MsPaint              Windows MS Paint program
Calc                 Windows Calculator
17
Viruses experimented on
Win32.Evol is known to be a true metamorphic virus that morphs code snippets to evade signature detection. Symantec describes Win32.Evol as a “32-bit metamorphic virus. It is the first W32 virus using a 32-bit true metamorphic engine. It can replicate on Windows 9x as well as Windows NT and Windows 2000.” Two versions, Evol.a and Evol.b, were compared.
Win32.Oroch is a non-memory-resident encrypted Win32 virus. It replicates under Win32 systems and infects PE files. The virus uses anti-debugging tricks in its decryption routine and is quite stable under WinNT. Two versions, Oroch.a and Oroch.b, were compared.
Win32.Evul: two of a possible 8 versions were obtained.
18
Results
Source Program    Target Program    Avg. Cosine Similarity
Win32.Evol.a      Win32.Evol.b      0.999
Oroch.a           Oroch.b           1.000
Win32.Evul.a      Win32.Evul.b      0.996
Java1.5           Logoff.exe        0.886
                  Java1.4           0.998
PPT2001           PPT2003
                  Mcg.exe           0.870
                  Calc.exe          0.761
                                    0.866
                  Mspaint.exe       0.909
19
Extended Work
Three similarity measures used
Cosine similarity
Pearson correlation
Jaccard similarity
Utilized a voting scheme based on matching functions
20
Jaccard Similarity
Given two vectors A and B
Vector A = [x1 x2 x3 x4……xn]
Vector B = [y1 y2 y3 y4……yn]
Jaccard similarity is given by
$$J(A, B) = \frac{\sum_{i=1}^{n} x_i y_i}{\sum_{i=1}^{n} x_i^2 + \sum_{i=1}^{n} y_i^2 - \sum_{i=1}^{n} x_i y_i}$$
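A minimal Python sketch of this extended (real-valued) Jaccard measure follows. Returning 0.0 when both vectors are all zeros is an assumption of this example; the slides do not cover that case.

```python
def jaccard_similarity(a, b):
    """Extended Jaccard similarity of two equal-length frequency vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    denom = sum(x * x for x in a) + sum(y * y for y in b) - dot
    if denom == 0:
        return 0.0  # assumption: two all-zero vectors are treated as no match
    return dot / denom


print(jaccard_similarity([1, 2], [1, 2]))  # 1.0
print(jaccard_similarity([1, 0], [0, 1]))  # 0.0
```

Unlike cosine similarity, this measure is sensitive to vector length: `[2, 0]` and `[1, 0]` point in the same direction but score 2/3.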
21
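Pearson Correlation
Given two vectors A and B
Vector A = [x1 x2 x3 x4……xn]
Vector B = [y1 y2 y3 y4……yn]
Pearson correlation, rescaled to the range [0, 1], is given by
$$P(A, B) = \frac{1}{2}\left[\frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\left[\sum_{i=1}^{n} (x_i - \bar{x})^2 \, \sum_{i=1}^{n} (y_i - \bar{y})^2\right]^{1/2}} + 1\right]$$
where $\bar{x}$ and $\bar{y}$ are the means of the entries of A and B.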
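The sketch below computes this measure in Python, reading the slide's formula as the Pearson coefficient r rescaled to ½(r + 1) so that it lies in [0, 1] like the other two measures; that reading is an assumption. Returning 0.5 when either vector has zero variance (where r is undefined) is also an assumption of this example.

```python
import math


def pearson_similarity(a, b):
    """Pearson correlation of two vectors, rescaled from [-1, 1] to [0, 1]."""
    if len(a) != len(b) or not a:
        raise ValueError("vectors must be non-empty and of equal length")
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return 0.5  # assumption: treat an undefined correlation as r = 0
    r = cov / math.sqrt(var_a * var_b)
    return 0.5 * (r + 1)


print(pearson_similarity([1, 2, 3], [2, 4, 6]))  # perfectly correlated
print(pearson_similarity([1, 2, 3], [3, 2, 1]))  # perfectly anti-correlated
```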
22
Final Functional Match
Figure: matches between functions of Program A and Program B under the Cosine, Pearson, and Jaccard measures (function labels shown: F1, F3, F2, F4, F5)
Final Functional Match: F3 (Max Similarity), F4 (2/3 Max Similarity), F5, -1
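The slide does not spell out the voting rule, so the following is only a sketch under a stated assumption: each of the three measures nominates its best-matching function in Program B, a nomination backed by at least two of the three measures becomes the final match, and -1 marks a function with no agreed match. The function name `vote` and the labels are hypothetical.

```python
from collections import Counter

NO_MATCH = -1  # marker used in the slide for a function without a final match


def vote(cosine_match, pearson_match, jaccard_match):
    """Pick the function nominated by at least two of the three measures."""
    counts = Counter([cosine_match, pearson_match, jaccard_match])
    candidate, votes = counts.most_common(1)[0]
    return candidate if votes >= 2 else NO_MATCH


print(vote("F3", "F3", "F3"))  # F3 (all three measures agree)
print(vote("F4", "F4", "F2"))  # F4 (two of three agree)
print(vote("F1", "F2", "F5"))  # -1 (no agreement)
```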
23
Results
Source Program    Target Program    Avg. Similarity
Win32.Evol.a      Win32.Evol.b      0.998
Oroch.a           Oroch.b           1.000
Win32.Evul.a      Win32.Evul.b      0.994
Java1.5           Logoff.exe        0.689
                  Java1.4           0.988
PPT2001           PPT2003           0.997
                  Mcg.exe           0.657
                  Calc.exe          0.579
                                    0.627
                  Mspaint.exe       0.573
24
Future Work
Automate disassembly output for Module 1
Use other machine learning algorithms for similarity analysis
Use other heuristic approaches (such as extensive identification of dead code insertion techniques)
Write a program obfuscator to reduce the effort of hand-coding obfuscations
Obtain and experiment on more obfuscated code samples
Test our method of detection against popular anti-virus scanners
25
Limitations
Method does not work for packed code or code that uses anti-disassembly techniques
Identification may be slower than simple signature-based detection, since it requires additional computation
Dead code insertion identification is limited (currently to NOPs)
Method does not work for instruction substitution