In recent years, the amount of malicious code (viruses, worms, Trojans, etc.) sent through the Internet has increased sharply. As a result, malware is renewed and improved much faster than the anti-virus software sold today can be updated.

Our solution focuses on the signature generation process. We have developed an automatic system whose goal is to extract simple, unique and optimal signatures for malicious files, so that any IDS/IPS can neutralize hostile code in real time. In addition, we have developed an evaluation environment whose objective is to determine the best configuration for generating an optimal signature for malicious files.

Ido Levin, Ofir Nissel, Yotam Katzman
Academic Advisor: Dr. Yuval Elovici
Professional Advisor: Mr. Asaf Shabtai

Extraction methods:

The Interactive Disassembler (IDA): IDA is a commercial disassembler widely used for reverse engineering; it takes a binary file and recovers its assembly code. Using a dedicated plug-in, IDA can identify, extract and normalize all the functions in the file.

Data mining: A classifier is trained on a set of byte segments, each labeled as a function start, a function end, or neither. It then classifies byte segments from a suspicious file in the same way, which lets us extract the functions from a given file.

Selection methods:

Random Selector: Chooses a signature at random from the candidates.
Minimum Entropy Selector: Calculates the entropy of each candidate and selects the one with the minimum entropy.
Cluster Selector: Groups the candidates by their distance from each other and scores each cluster by the chance that it contains the best signature.
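The Random Selector and the Minimum Entropy Selector can be sketched in a few lines of Python. This is only an illustration: it assumes that "entropy" means Shannon entropy over byte frequencies, that candidates are plain byte strings, and the function names are hypothetical rather than taken from the project.

```python
import math
import random
from collections import Counter


def shannon_entropy(data: bytes) -> float:
    """Shannon entropy of a byte string, in bits per byte (assumed definition)."""
    if not data:
        return 0.0
    counts = Counter(data)  # occurrences of each byte value
    total = len(data)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def random_selector(candidates, rng=random):
    """Pick one candidate signature uniformly at random."""
    return rng.choice(candidates)


def minimum_entropy_selector(candidates):
    """Pick the candidate signature with the lowest entropy."""
    return min(candidates, key=shannon_entropy)
```

For example, `minimum_entropy_selector([b"abcd", b"aaab", b"aabb"])` picks `b"aaab"`, the candidate with the most skewed byte distribution.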
For the Cluster Selector, each cluster receives a score that reflects this chance, computed from the following quantities:

Cs denotes the cluster size in bytes
Fs denotes the file size
Fc denotes the number of functions in the cluster
T denotes the total number of functions in the file
Fl denotes the sum of the function lengths in the cluster

Probability Selector: The key idea is to estimate the probability that each candidate signature will match a randomly chosen block of bytes in the code of a randomly chosen program, and then to select one or more signatures whose estimated false positive probability is the lowest among the candidates and below a predefined threshold.

Entropy: Let S be a string (signature), let Sc be a character in S, and let |Sc| be the number of times Sc appears in S. The entropy of S is computed from these counts.

Probability estimate: For a given sequence of S bytes B = B1B2…BS, we estimate the probability p(B) that B occurs in a large body of normal, uninfected code, where TS is the number of S-byte sequences in a large corpus of uninfected programs and f(B) is the number of occurrences of B among those TS sequences.

Signature Builder: In general, the system operates as follows. It builds a common functions library (CFL). Given a malicious file, it extracts the file's functions and filters out the common ones using the CFL. It then generates a signature by choosing, from the remaining functions (the candidates), the best one to act as the malicious file's signature. The system extracts functions from malware using several algorithms and provides a signature for each malware file.

Workflow steps (from the system flowchart): Initialize Configuration; CFL Handling; Receive File from Client; Initialize the System; Extract Functions; Filter Common Functions; Generate Candidates; Select Best Candidate; Return Signature.

Evaluation Environment: The evaluation environment evaluates the different configurations of the Signature Builder in order to judge the quality of the signatures they produce. The main idea is to check whether the signature of a malicious file appears in a control group of benign files. A good signature for a malicious file should not appear in benign files.
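Returning to the Probability Selector, the estimate p(B) can be sketched as follows. The transcript does not preserve the formula, so this sketch assumes the simplest reading of the definitions, p(B) = f(B) / TS; the function names, the in-memory corpus and the threshold value are hypothetical.

```python
from collections import Counter


def count_sequences(corpus, length):
    """Count every `length`-byte sequence in a corpus of benign programs.

    Returns (counts, total), where total plays the role of TS.
    """
    counts = Counter()
    total = 0
    for program in corpus:
        # Programs shorter than `length` contribute no sequences.
        for i in range(len(program) - length + 1):
            counts[program[i:i + length]] += 1
            total += 1
    return counts, total


def estimate_probability(candidate, corpus):
    """Assumed estimate p(B) = f(B) / TS for a candidate signature B."""
    counts, total = count_sequences(corpus, len(candidate))
    if total == 0:
        return 0.0
    return counts[candidate] / total


def probability_selector(candidates, corpus, threshold):
    """Return the candidate with the lowest estimate below the threshold, or None."""
    scored = [(estimate_probability(c, corpus), c) for c in candidates]
    below = [(p, c) for p, c in scored if p < threshold]
    if not below:
        return None
    return min(below, key=lambda pc: pc[0])[1]
```

A raw frequency count gives probability zero to any sequence absent from the corpus, so a real system would likely need a larger corpus or some form of smoothing; that refinement is outside this sketch.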
The output of the evaluation consists of the following:

Processed: the number of malware files for which the system managed to generate a signature.
Processed (%): Processed / Total Malware Files.
Signature Hits: the number of unique malware files whose signature produced at least one false alarm.
Signature Hits (%): Signature Hits / Processed.
Unique Signature: the number of unique signatures that did not produce a false alarm.
Different Files: the number of distinct files in the control group that have at least one hit.
Different Files (%): Different Files / Total Control Group Files.

Each configuration consists of the following inputs:

CFL size in MB
Maximum signature length in bytes
Function similarity threshold
Offset size in bytes
Function extractor
Function selection
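The output metrics above can be computed with a short Python sketch. It assumes that a "hit" means the signature bytes occur anywhere inside a benign file and that all files fit in memory; the function name and the dictionary layout are hypothetical.

```python
def evaluate(signatures, total_malware, control_group):
    """Compute the evaluation metrics for one configuration.

    signatures:    dict of malware file name -> signature bytes
                   (only files for which a signature was generated)
    total_malware: total number of malware files submitted
    control_group: dict of benign file name -> file contents (bytes)
    """
    hit_malware = set()  # malware files whose signature raised a false alarm
    hit_benign = set()   # benign files matched by at least one signature
    for malware, signature in signatures.items():
        for benign, content in control_group.items():
            if signature in content:  # substring match counts as a hit
                hit_malware.add(malware)
                hit_benign.add(benign)

    processed = len(signatures)
    clean = {s for m, s in signatures.items() if m not in hit_malware}
    return {
        "processed": processed,
        "processed_pct": processed / total_malware if total_malware else 0.0,
        "signature_hits": len(hit_malware),
        "signature_hits_pct": len(hit_malware) / processed if processed else 0.0,
        "unique_signatures": len(clean),
        "different_files": len(hit_benign),
        "different_files_pct": (
            len(hit_benign) / len(control_group) if control_group else 0.0
        ),
    }
```

Running `evaluate` once per configuration and comparing the resulting dictionaries is one way to rank configurations, with lower Signature Hits and Different Files values indicating better signatures.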