CISC 879 - Machine Learning for Solving Systems Problems Presented by: Satyajeet Dept of Computer & Information Sciences University of Delaware Automatic.

Presentation transcript:

CISC Machine Learning for Solving Systems Problems
Presented by: Satyajeet, Dept of Computer & Information Sciences, University of Delaware
Automatic Analysis of Malware Behavior using Machine Learning
Authors: Konrad Rieck, Philipp Trinius, Carsten Willems, and Thorsten Holz

Abstract & Introduction
Malware poses a major threat to the security of computer systems.
It is very diverse: viruses, Internet worms, Trojan horses.
The amount of malware is large: millions of hosts are infected.
Obfuscation and polymorphism impede detection at the file level.
Dynamic analysis helps in characterizing malware and defending against it.

Abstract & Introduction (contd.)
A framework for the automatic analysis of malware behavior using machine learning.
Clustering: the framework automatically identifies novel classes of malware with similar behavior.
Classification: unknown malware is assigned to these discovered classes.
An incremental approach to behavior-based analysis builds on both.

Automatic Analysis of Malware Behavior
Framework steps and procedure:
Malware binaries are executed and monitored in a sandbox environment, which generates a report of the system calls and their arguments.
The sequential reports are embedded in a vector space where each dimension is associated with a behavioral pattern.
ML techniques are then applied to the embedded reports to identify and classify malware.
Incremental analysis progresses by alternating between clustering and classification.

Report Representation
Reports can be textual or XML.
These formats are human-readable and suitable for computing general statistics, but they are not efficient for automatic analysis.
Hence MIST (Malware Instruction Set), inspired by instruction sets used in processor design.

MIST
Category of system calls.
Operation: reflects a particular system call.
Arguments, given as argblocks.

Sandbox and MIST Representation

Representation
These sequential reports identify typical malware behavior, such as changing registry keys or modifying system files.
They are still not suitable for efficient analysis techniques.
Hence the behavior reports are embedded in a vector space using instruction q-grams.
This embedding expresses the similarity of behavior geometrically, as a distance between vectors.
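The q-gram embedding and distance computation described on this slide could look roughly like the sketch below. It is a minimal illustration, not the authors' implementation. It assumes a report is a plain list of instruction strings, that each dimension records only the presence of a q-gram (scaled to unit length), and that Euclidean distance is used. The function names and the sample instruction names are hypothetical.

```python
from math import sqrt


def qgram_embed(report, q=2):
    """Embed a report (list of instruction strings) as a sparse unit vector.

    Assumption: each dimension marks the presence of one q-gram, and the
    vector is scaled to unit length so long and short reports are comparable.
    """
    grams = {tuple(report[i:i + q]) for i in range(len(report) - q + 1)}
    if not grams:
        return {}  # report shorter than q: no q-grams to embed
    scale = 1.0 / sqrt(len(grams))
    return {gram: scale for gram in grams}


def distance(u, v):
    """Euclidean distance between two sparse vectors stored as dicts."""
    keys = set(u) | set(v)
    return sqrt(sum((u.get(k, 0.0) - v.get(k, 0.0)) ** 2 for k in keys))


# Hypothetical instruction names, for illustration only.
report_a = ["load_dll", "open_file", "write_file"]
report_b = ["load_dll", "open_file", "write_file"]
report_c = ["create_socket", "connect", "send"]

same = distance(qgram_embed(report_a), qgram_embed(report_b))
different = distance(qgram_embed(report_a), qgram_embed(report_c))
```

Under these assumptions, identical reports are at distance 0 and reports that share no q-grams are at the maximum distance of sqrt(2).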

Clustering and Classification
Once reports are embedded in the vector space, they are ready for ML techniques.
Clustering of behavior identifies classes of malware with similar behavior.
Classification of behavior assigns malware to known classes of behavior.
What allows us to do this? Malware binaries typically come in families of similar variants with similar behavior patterns.

Clustering and Classification (contd.)

Algorithms
Prototype extraction: an iterative algorithm extracts a small set of prototypes from the set of reports. The first one is chosen at random.
Clustering using prototypes: at the beginning, each prototype is its own cluster. The algorithm repeatedly determines and merges the nearest pair of clusters.
Classification using prototypes: allows learning to discriminate between classes of malware.
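The prototype extraction and prototype clustering steps on this slide can be sketched as follows. This is a hedged illustration under several assumptions: reports are already embedded as numeric tuples, prototypes are picked farthest-first until every report lies within `max_dist` of some prototype, and clusters are merged with complete linkage until the closest pair is at least `min_dist` apart. The slide does not state the linkage rule or the stopping thresholds, so `max_dist`, `min_dist`, and both function names are hypothetical. The first prototype is fixed to index 0 here (instead of a random pick) only to keep the example reproducible.

```python
from math import dist  # Euclidean distance, Python 3.8+


def extract_prototypes(points, max_dist):
    """Return indices of prototypes chosen farthest-first.

    Stops once every point is within max_dist of its nearest prototype.
    """
    if not points:
        return []
    prototypes = [0]  # assumption: fixed start instead of a random choice
    nearest = [dist(points[0], p) for p in points]
    while True:
        far = max(range(len(points)), key=nearest.__getitem__)
        if nearest[far] <= max_dist:
            return prototypes
        prototypes.append(far)
        for i, p in enumerate(points):
            nearest[i] = min(nearest[i], dist(points[far], p))


def cluster_prototypes(points, prototypes, min_dist):
    """Agglomerative clustering of prototypes (complete linkage assumed)."""
    clusters = [[p] for p in prototypes]  # each prototype starts alone

    def linkage(a, b):
        return max(dist(points[i], points[j]) for i in a for j in b)

    while len(clusters) > 1:
        d, i, j = min(
            (linkage(clusters[i], clusters[j]), i, j)
            for i in range(len(clusters))
            for j in range(i + 1, len(clusters))
        )
        if d >= min_dist:
            break  # nearest pair is too far apart to merge
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters


points = [(0, 0), (0.1, 0), (5, 5), (5.1, 5), (10, 0)]
protos = extract_prototypes(points, max_dist=1.0)
groups = cluster_prototypes(points, protos, min_dist=8.0)
```

Clustering only the prototypes, rather than all reports, is what keeps the pairwise distance computation small.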

Algorithms (contd.)
For each report, the algorithm determines the nearest prototype of the clusters in the training data. If the report lies within the radius, it is assigned to that cluster; otherwise it is rejected and held back for later incremental analysis.
Incremental analysis:
Reports to be analyzed are received from a source.
They are first classified using prototypes of known clusters, so variants of known malware are identified for further analysis.
Prototypes are then extracted from the remaining reports, which are clustered again.
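The nearest-prototype classification with rejection, and one pass of the incremental loop, might be sketched like this. It is an assumption-laden illustration: reports are numeric tuples, prototypes are stored as `(vector, label)` pairs, and `radius` is a single global threshold. The names `classify`, `incremental_step`, and the labels are hypothetical.

```python
from math import dist  # Euclidean distance, Python 3.8+


def classify(report, prototypes, radius):
    """Return the label of the nearest prototype, or None if it is too far.

    prototypes: list of (vector, label) pairs from already known clusters.
    """
    if not prototypes:
        return None
    vector, label = min(prototypes, key=lambda p: dist(report, p[0]))
    return label if dist(report, vector) <= radius else None


def incremental_step(batch, prototypes, radius):
    """Split a batch into reports of known classes and rejected reports.

    The rejected reports are the ones that would next go through
    prototype extraction and clustering to discover new classes.
    """
    assigned, rejected = [], []
    for report in batch:
        label = classify(report, prototypes, radius)
        if label is None:
            rejected.append(report)
        else:
            assigned.append((report, label))
    return assigned, rejected


known = [((0.0, 0.0), "family_A"), ((5.0, 5.0), "family_B")]
batch = [(0.2, 0.1), (5.1, 4.9), (9.0, 9.0)]
assigned, rejected = incremental_step(batch, known, radius=1.0)
```

In this run the first two reports fall within the radius of a known prototype, while the third is held back for the next clustering round.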

Experiments and Results

Evaluating Components
Prototype extraction: evaluated using precision, recall, and compression. Precision is 0.99 when the corpus is compressed by 2.9% and 7%.
Clustering: evaluated using F-measure. F-measure in the experiments: MIST 1 = 0.93 and MIST 2 = 0.95, better than previous related work.
Classification: F-measure in the experiments: MIST 1 = 0.96 and MIST 2 = 0.99.
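For reference, the F-measure used above is conventionally the harmonic mean of precision and recall. The helper below is a small sketch assuming that standard definition. The slide does not spell out how precision and recall are computed for clusters, so the function only combines two values that are already given.

```python
def f_measure(precision, recall):
    """Harmonic mean of precision and recall (standard F-measure)."""
    if precision + recall == 0:
        return 0.0  # avoid division by zero when both are zero
    return 2 * precision * recall / (precision + recall)


# Illustrative values only, not results from the experiments.
score = f_measure(0.5, 1.0)
```

The harmonic mean is high only when both precision and recall are high.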

Experiments and Results (contd.)

Experiments and Results (contd.)

Conclusion
A new framework is introduced that overcomes several deficiencies of previous work.
The framework is learning-based and can be implemented in practice.
Steps: collect malware, study it in a sandbox environment, embed the observed behavior in a vector space, and apply learning algorithms (clustering and classification).
The process is efficient and learns automatically after the initial setup and run.

Thank you!