Published by Edwina Walters. Modified over 6 years ago.
1
Malware Recognition with Binary Fingerprint: Final Meeting
Students: Tal Greenshpan & Offer Akrabi
Supervisors: Ben Herzog & Amir Mizrahi (CheckPoint)
2
Goals
- Build an automated classifier for new malware
  - Using static analysis methods
- Help reverse engineers classify new malware
  - Comparing new functions to known functions
3
Methodology
- Static analysis
  - PE files
- Research important features in function comparison
  - Reverse engineering
- Extract key features in order to identify resemblance between functions
  - Keep only key features
- Develop an algorithm to determine feature similarity
  - Compare functions
  - Feature contribution
4
Methodology
- Build a database of known functions
  - MSSQL
- Develop extractor and classifier
  - Python
  - IDAPython
- Testing
- Extra: GUI
5
Achievements
- Decided on a set of features to be used to differentiate functions
  - Function size
  - Number of API calls: all API calls made and the number of occurrences of each
  - Register count: number of times the registers were accessed
  - Memory count: number of times the memory was accessed
  - Arguments count: number of arguments the function has
  - Local variables size: number of local variables
- Features from the Function Call Graph (generated by IDA)
  - Number of nodes
  - Min/max out-degree
  - Min/max in-degree
  - Min/max Well Connected Components size
  - Ratio of out-degrees that are larger than 1
  - Ratio of in-degrees that are larger than 1
  - Ratio of Well Connected Components that are larger than 1
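The degree-based graph features listed above can be sketched in a few lines of Python. This is an illustration only, not the project's extractor: it assumes the graph has already been exported from IDA as a list of (source, destination) edges, and it reads "ratio of degrees larger than 1" as the fraction of nodes whose degree exceeds 1. The function name `graph_degree_features` and the dictionary keys are hypothetical.

```python
from collections import defaultdict


def graph_degree_features(edges):
    """Degree features for a directed graph given as (src, dst) pairs.

    Assumption: the ratios are taken over all nodes of the graph.
    Repeated edges are counted once per occurrence.
    """
    out_deg = defaultdict(int)
    in_deg = defaultdict(int)
    nodes = set()
    for src, dst in edges:
        nodes.update((src, dst))
        out_deg[src] += 1
        in_deg[dst] += 1
    if not nodes:
        raise ValueError("graph has no nodes")

    outs = [out_deg[n] for n in nodes]
    ins = [in_deg[n] for n in nodes]
    return {
        "num_nodes": len(nodes),
        "min_out_degree": min(outs),
        "max_out_degree": max(outs),
        "min_in_degree": min(ins),
        "max_in_degree": max(ins),
        "ratio_out_gt1": sum(d > 1 for d in outs) / len(nodes),
        "ratio_in_gt1": sum(d > 1 for d in ins) / len(nodes),
    }


# Made-up four-node graph, used only to exercise the function.
example_edges = [("a", "b"), ("a", "c"), ("b", "c"), ("c", "d")]
features = graph_degree_features(example_edges)
```

Component sizes are left out of the sketch because the slides do not define "Well Connected Components" precisely.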
6
Achievements
- Automated mass feature extraction
  - Low runtime complexity
- Created an algorithm to differentiate functions
  - Feature contribution
  - Standard deviation
  - Using the NumPy Python library
- Distance algorithm: Contribution = -log(distance)
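One possible reading of the contribution formula is sketched below with NumPy. The slides give only Contribution = -log(distance) and mention the standard deviation, so the rest is assumed: the per-feature distance is taken as the absolute difference divided by that feature's standard deviation across the database, a small floor `EPS` keeps identical values from producing log(0), and the resemblance score is the sum of the contributions. All function names are hypothetical.

```python
import numpy as np

EPS = 1e-6  # assumed floor so identical feature values do not hit log(0)


def feature_contributions(query, candidate, database):
    """Per-feature contribution = -log(normalized distance)."""
    query = np.asarray(query, dtype=float)
    candidate = np.asarray(candidate, dtype=float)
    database = np.asarray(database, dtype=float)
    # Spread of each feature over all known functions (one value per column).
    std = database.std(axis=0)
    std = np.where(std > 0, std, 1.0)  # constant features: avoid dividing by 0
    distance = np.abs(query - candidate) / std
    return -np.log(np.maximum(distance, EPS))


def resemblance(query, candidate, database):
    """Assumed score: sum of the per-feature contributions."""
    return float(feature_contributions(query, candidate, database).sum())


def best_match(query, database):
    """Index and score of the database row that most resembles the query."""
    scores = [resemblance(query, row, database) for row in database]
    best = int(np.argmax(scores))
    return best, scores[best]


# Made-up feature vectors: [function size, API calls, register count].
db = np.array([[120, 3, 40], [500, 10, 200], [118, 3, 42]])
idx, score = best_match([120, 3, 41], db)
```

With this reading, close feature values yield large positive contributions and distant ones yield negative contributions, so the floor `EPS` effectively caps the score of a perfect match.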
7
Achievements
- Successfully matched functions from actual malware samples!
8
Example
- Two very similar, simple, malware-like C++ programs
  - Different number of arguments
  - Different number of local variables
  - Different order of declaration
- Database containing about 2,500 functions
9
Perfect match: Resemblance = 34
10
Function Call Graphs (generated by IDA) for the encryption function in twin1.exe and twin2.exe
11
Live Demonstration
- Database containing about 1,000 functions
  - Suspected Zeus malware related files
  - Locky ransomware samples
- Analysis of a different Locky sample, not in the database
  - File analyzed: 0deb_U.exe
  - Function analyzed: sub_402743
12
Conclusions
- Efficient classification of functions with selected features
  - The first set of features we selected did not produce sufficient results
  - Euclidean distance is not good enough to differentiate functions
- Good classification accuracy
- Run-time complexity for very large databases could be problematic
  - Run time can be improved significantly, at a cost to accuracy
    - Removing only one feature
  - Most of the run time is spent calculating the contribution of each feature; therefore, if the database is left unchanged, there is no need to calculate it again, which saves a lot of time
  - Our run-time complexity is O(n^2)
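The remark about not recalculating when the database is unchanged can be illustrated with a small cache. The sketch below uses only the standard library and is not the project's code: `FunctionIndex` is a hypothetical in-memory stand-in for the MSSQL database, the cached quantity is assumed to be the per-feature standard deviation, and the scoring rule is the same assumed reading of Contribution = -log(distance) with a small floor `eps`.

```python
import math
from statistics import pstdev


class FunctionIndex:
    """Hypothetical in-memory index that caches per-feature statistics."""

    def __init__(self, rows, eps=1e-6):
        self.rows = [tuple(float(v) for v in row) for row in rows]
        self.eps = eps              # assumed floor to avoid log(0)
        self.std_computations = 0   # counts how often the statistics are rebuilt
        self._std = None            # cached per-feature standard deviations

    def _feature_std(self):
        # Rebuilt only when the cache is empty, i.e. after the database changed.
        if self._std is None:
            columns = zip(*self.rows)
            self._std = [pstdev(col) or 1.0 for col in columns]
            self.std_computations += 1
        return self._std

    def add(self, row):
        self.rows.append(tuple(float(v) for v in row))
        self._std = None  # database changed: invalidate the cache

    def score(self, query, row):
        std = self._feature_std()
        return sum(
            -math.log(max(abs(q - r) / s, self.eps))
            for q, r, s in zip(query, row, std)
        )

    def best_match(self, query):
        scores = [self.score(query, row) for row in self.rows]
        best = max(range(len(scores)), key=scores.__getitem__)
        return best, scores[best]


# Made-up feature vectors; two queries share one statistics computation.
index = FunctionIndex([[120, 3, 40], [500, 10, 200], [118, 3, 42]])
first = index.best_match([120, 3, 41])
second = index.best_match([500, 10, 199])
computations_before_add = index.std_computations
index.add([121, 4, 39])
third = index.best_match([121, 4, 39])
```

Each query still scans every row, so the cache removes the repeated statistics pass rather than the per-query comparison cost.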
13
Thank you!