Malware Recognition with Binary Fingerprint Final Meeting

Presentation transcript:

Malware Recognition with Binary Fingerprint: Final Meeting
Students: Tal Greenshpan & Offer Akrabi
Supervisors: Ben Herzog & Amir Mizrahi (CheckPoint)

Goals
- Build an automated classifier for new malware, using static analysis methods
- Help reverse engineers classify new malware by comparing new functions to known functions

Methodology
- Static analysis of PE files
- Research the important features for function comparison (reverse engineering)
- Extract key features in order to identify resemblance between functions
- Keep only key features
- Develop an algorithm to determine feature similarity: compare functions, feature contribution

Methodology (continued)
- Build a database of known functions (MSSQL)
- Develop the extractor and classifier (Python, IDAPython)
- Testing
- Extra: GUI

Achievements
Decided on a set of features to be used to differentiate functions:
- Function size
- Number of API calls: all API calls made and the number of occurrences of each
- Register count: number of times the registers were accessed
- Memory count: number of times memory was accessed
- Arguments count: number of arguments the function has
- Local variables size: number of local variables
Features from the Function Call Graph (generated by IDA):
- Number of nodes
- Min/max out-degree
- Min/max in-degree
- Min/max Well Connected Components size
- Ratio of out-degrees that are larger than 1
- Ratio of in-degrees that are larger than 1
- Ratio of Well Connected Components that are larger than 1
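The degree-based graph features listed above can be illustrated with a short Python sketch. It is not the project's extractor: the function name `graph_features`, the edge-list input format, and the dictionary keys are all hypothetical, and the graph is assumed to be already exported from IDA as (source, destination) pairs. The connected-component features are left out because the slides do not define them precisely.

```python
from collections import Counter


def graph_features(edges):
    """Degree-based features of a directed graph given as (src, dst) pairs.

    Hypothetical helper: the names and the input format are assumptions,
    not the project's actual extractor. Repeated edges are counted each time.
    """
    out_deg = Counter()
    in_deg = Counter()
    nodes = set()
    for src, dst in edges:
        nodes.update((src, dst))
        out_deg[src] += 1
        in_deg[dst] += 1

    if not nodes:
        # An empty graph has no degrees; report zeros instead of failing.
        return {"num_nodes": 0, "min_out": 0, "max_out": 0,
                "min_in": 0, "max_in": 0,
                "ratio_out_gt1": 0.0, "ratio_in_gt1": 0.0}

    # Counter returns 0 for nodes that never appear as a source or target.
    outs = [out_deg[n] for n in nodes]
    ins = [in_deg[n] for n in nodes]
    count = len(nodes)
    return {
        "num_nodes": count,
        "min_out": min(outs), "max_out": max(outs),
        "min_in": min(ins), "max_in": max(ins),
        "ratio_out_gt1": sum(d > 1 for d in outs) / count,
        "ratio_in_gt1": sum(d > 1 for d in ins) / count,
    }


if __name__ == "__main__":
    example = [("a", "b"), ("a", "c"), ("b", "c"), ("c", "d")]
    print(graph_features(example))
```

Keeping the output as a flat dictionary of numbers makes it straightforward to store one row per function in a database table.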

Achievements
- Automated mass feature extraction with low runtime complexity
- Created an algorithm to differentiate functions
- Feature contribution: standard deviation, computed using the NumPy Python library
- Distance algorithm: Contribution = -log(distance)
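The slide gives only the formula Contribution = -log(distance) and mentions the standard deviation, so the following NumPy sketch fills in the rest with assumptions: the per-feature distance is taken to be the absolute difference scaled by that feature's standard deviation across the database, and a small `eps` keeps the logarithm finite for identical values. The function name `feature_contributions` and both of those choices are hypothetical, not the authors' algorithm.

```python
import numpy as np


def feature_contributions(query, candidate, database, eps=1e-9):
    """Per-feature contribution = -log(distance), under stated assumptions.

    Assumption (not from the slides): distance is |query - candidate|
    divided by the per-feature standard deviation over the database.
    eps is a hypothetical guard so that a zero distance stays finite.
    """
    query = np.asarray(query, dtype=float)
    candidate = np.asarray(candidate, dtype=float)
    database = np.asarray(database, dtype=float)

    # Standard deviation of each feature (column) across known functions.
    std = database.std(axis=0)
    # Constant features have std 0; fall back to 1 to avoid division by zero.
    std = np.where(std > 0, std, 1.0)

    distance = np.abs(query - candidate) / std
    return -np.log(distance + eps)


if __name__ == "__main__":
    db = [[10, 2], [20, 4], [30, 6]]
    print(feature_contributions([10, 2], [10, 2], db))  # identical functions
    print(feature_contributions([10, 2], [30, 2], db))  # first feature differs
```

With this scaling, a feature that matches exactly contributes a large positive value, while a feature that differs by more than one standard deviation contributes a negative one.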

Achievements
- Successfully matched functions from actual malware samples!
- Distance algorithm: Contribution = -log(distance)

Example
Two very similar, simple, malware-like C++ programs that differ in:
- Number of arguments
- Number of local variables
- Order of declaration
Database containing about 2,500 functions

Perfect match: Resemblance = 34

Function Call Graphs (generated by IDA) for the encryption function in twin1.exe and twin2.exe

Live Demonstration
Database containing about 1,000 functions:
- Suspected Zeus malware related files
- Locky ransomware samples
Analysis of a different Locky sample, not in the database:
- File analyzed: 0deb_U.exe
- Function analyzed: sub_402743

Conclusions
- Efficient classification of functions with selected features
- The first set of features we selected did not give sufficient results
- Euclidean distance is not good enough to differentiate functions
- Good classification accuracy
- Run-time complexity for very large databases could be problematic
- Run time can be improved significantly, at a cost to accuracy, by removing only one feature
- Most of the run time is spent calculating the contribution of each feature; therefore, if the database is left unchanged, there is no need to calculate it again, which saves a lot of time
- Our run-time complexity is O(n^2)
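The point about not recalculating when the database is unchanged can be sketched as a small cache. This is only an illustration of the idea: the class name `FeatureStatsCache` is hypothetical, and the cached quantity here is the per-feature standard deviation, standing in for whatever per-feature statistics the real contribution calculation needs.

```python
import statistics


class FeatureStatsCache:
    """Cache per-feature statistics until the database changes.

    Hypothetical sketch: the per-feature standard deviation stands in for
    the expensive per-feature calculation mentioned in the conclusions.
    """

    def __init__(self):
        self._rows = []
        self._std = None          # cached result, None means "stale"
        self.recomputations = 0   # counts how often the statistics were rebuilt

    def add_function(self, features):
        """Add one function's feature vector and invalidate the cache."""
        self._rows.append(tuple(features))
        self._std = None

    def std(self):
        """Return per-feature standard deviations, recomputing only if stale."""
        if self._std is None:
            columns = zip(*self._rows)
            self._std = tuple(statistics.pstdev(col) for col in columns)
            self.recomputations += 1
        return self._std


if __name__ == "__main__":
    cache = FeatureStatsCache()
    for row in ([10, 2], [20, 4], [30, 6]):
        cache.add_function(row)
    cache.std()
    cache.std()  # served from the cache, no recomputation
    print(cache.recomputations)
```

Invalidating on every insert is the simplest policy; a database that grows often might batch its updates before asking for the statistics again.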

Thank you!