Department of Computer Science Yasmine Kandissounon.

The problem
In an attempt to create an undetectable virus, malware writers have devised and used many strategies, the most common and effective being “metamorphism”. Metamorphism is a strategy that helps a virus hide its malicious behavior and change its appearance at each generation. Metamorphism has led to a profusion of viruses that anti-virus scanners cannot keep up with. The invention of virus generation kits made things worse, as these kits allow people with little or even no programming skill to generate a metamorphic virus in no time [1]. Most malicious programs created by virus generation kits are able to avoid detection because the techniques used by anti-virus scanners are simply not effective enough against them.

Related Works
Metamorphic malware has challenged scholars and inspired serious research. IBM researchers applied neural networks to detect boot-sector viruses [2]. They extracted short byte strings (three-byte sequences called trigrams) from a set of training examples and used them as features for virus detection. According to IBM, this technique helped detect about 75% of known boot-sector viruses, but it failed to recognize programs whose malicious code was obscured. Chouchane et al. have proposed a detection method using Instruction Frequency Vectors (IFVs) based on Markov Chains [3]. They computed the IFV matrices of the original (“eve”) version of a virus and of its variant after a number of generations, in order to decide whether the variant was generated by a particular engine.
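The following minimal Python sketch makes trigram features concrete by counting every overlapping three-byte substring of a binary. It illustrates only the feature-extraction idea described above. It is not IBM's implementation and performs no training or classification. The helper name `byte_trigrams` and the sample bytes are hypothetical.

```python
from collections import Counter

def byte_trigrams(data: bytes) -> Counter:
    """Count every overlapping 3-byte substring (trigram) of a byte string."""
    # A string of length L has L - 2 overlapping trigrams (none if L < 3).
    return Counter(data[i:i + 3] for i in range(len(data) - 2))

# Hypothetical byte fragment, chosen so that one trigram repeats.
sample = bytes.fromhex("33c08e33c08e")
features = byte_trigrams(sample)
for trigram, count in features.most_common():
    print(trigram.hex(), count)
```

In a detector, such counts (or simply the presence or absence of each trigram) would serve as the input features of a trained classifier.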

Our solution
Eugene Spafford’s analysis of software authorship inspired our solution. Spafford applied to programs the idea behind forensic linguistics, which can accurately identify the author of an English text [4]. By combining software metrics with other features such as variable naming and code indentation, Spafford showed that a program could be attributed to a specific author. His technique is even easier to apply to virus generation kits, given that their signature is more consistent than a human’s. Our solution consists of using Markov Chains to attribute the authorship of a virus to an engine. We extracted and studied the opcodes of a number of variants of popular generation kits. This showed that an opcode is independent of the opcode two steps before it. In other words, an opcode appears to depend only on its immediate predecessor, which is the (first-order) Markov property. Hence, Markov Chains can be applied to viruses generated by kits to obtain the engine’s signature.
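The independence observation can be probed informally from an opcode sequence. For each observed context, compare the probability of the next opcode given the two previous opcodes with the probability given only the previous one. The Python sketch below is an illustrative heuristic, not the procedure used in this study. A maximum gap near 0 is consistent with first-order Markov behavior, while a large gap suggests the opcode two steps back still matters. The function name and the toy sequences are hypothetical.

```python
from collections import Counter
from typing import List

def max_second_order_gap(ops: List[str]) -> float:
    """Largest |P(next | prev2, prev) - P(next | prev)| over the observed triples.

    A gap near 0 suggests that, once the previous opcode is known, the opcode two
    positions back adds little information (first-order Markov behavior).
    This is an informal heuristic, not a statistical test.
    """
    pair_counts = Counter(zip(ops, ops[1:]))             # (prev, next)
    prev_counts = Counter(ops[:-1])                       # prev occurrences with a successor
    triple_counts = Counter(zip(ops, ops[1:], ops[2:]))   # (prev2, prev, next)
    context_counts = Counter(zip(ops, ops[1:-1]))         # (prev2, prev) with a successor
    gap = 0.0
    for (prev2, prev, nxt), count in triple_counts.items():
        second_order = count / context_counts[(prev2, prev)]
        first_order = pair_counts[(prev, nxt)] / prev_counts[prev]
        gap = max(gap, abs(second_order - first_order))
    return gap

# Hypothetical toy sequences, not extracted from real variants.
periodic = ["push", "mov", "add"] * 3
context_dependent = ["xor", "push", "call", "jmp", "push", "ret"] * 2
print(max_second_order_gap(periodic))
print(max_second_order_gap(context_dependent))
```

In the second sequence, `push` is followed by `call` after `xor` but by `ret` after `jmp`. The opcode two steps back therefore changes the prediction, and the gap is large.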

The culture
We decided to work with both the Next Generation Virus Construction Kit (NGVCK) and the Virus Creation Lab (VCL32). Mark Stamp from San Jose State University showed that the similarity among NGVCK variants is less than 2%, which makes NGVCK a highly metamorphic engine and thus relevant to our study [6]. VCL32 variants also present an interestingly low degree of similarity. In addition, VCL32 has inspired many other virus generation kits that strive to reproduce its metamorphic features. Sun Microsystems’ VirtualBox was used as our isolated platform. We downloaded our kits from vx.netlux.org, an almost complete repository of all known virus engines, constructors and simulators. This website also provides some documentation for each virus in its library.

NGVCK’s graphical interface
VCL32’s graphical interface

The work
Using the preceding GUIs, we created 50 variants with each kit and extracted the opcodes of each variant using a small homemade Java program. The next phase in finding a common signature for the variants of each kit consists of two steps. First, we compute a Markov Chain transition matrix for each variant. Second, we calculate the average of these matrices, which constitutes the signature for the variants of NGVCK and of VCL32, respectively.
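Building the signature amounts to an entry-wise mean of the per-variant transition matrices. The minimal Python sketch below assumes each matrix is stored sparsely as a dictionary keyed by (current opcode, next opcode), with a missing entry meaning probability 0. The function name and the toy variants are hypothetical. They are not the study's Java program or data.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Sparse transition matrix: (current_opcode, next_opcode) -> probability.
Matrix = Dict[Tuple[str, str], float]

def average_matrix(matrices: List[Matrix]) -> Matrix:
    """Entry-wise mean of the per-variant matrices; absent entries count as 0."""
    if not matrices:
        raise ValueError("at least one variant matrix is required")
    totals: Dict[Tuple[str, str], float] = defaultdict(float)
    for matrix in matrices:
        for entry, probability in matrix.items():
            totals[entry] += probability
    return {entry: total / len(matrices) for entry, total in totals.items()}

# Two hypothetical variants (toy values, not measured data).
variant_a = {("push", "add"): 1.0}
variant_b = {("push", "add"): 0.5, ("push", "mov"): 0.5}
signature = average_matrix([variant_a, variant_b])
print(signature)
```

With this representation, an opcode that never occurs in some variant receives a zero row from that variant. Its averaged row therefore sums to less than 1, and whether to renormalize such rows is a design choice left open.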

A Markov Chain is a set of states linked by probabilities. Let $S = \{s_1, s_2, s_3, \ldots, s_n\}$ be a set of states. A process in state $s_1$ moves to state $s_2$ with probability $p_{12}$, called a transition probability, and so on. In general, $p_{ij}$ is the probability of moving from $s_i$ to $s_j$ in one step. More generally, the probability $p_{ij}^{(2)}$ that a process with $n$ states moves from $s_i$ to $s_j$ in two steps is:

$$p_{ij}^{(2)} = \sum_{k=1}^{n} p_{ik}\, p_{kj}$$

A transition matrix is a matrix that holds the transition probabilities $p_{ij}$ between the states of the Markov Chain, with row $i$ for the current state and column $j$ for the next state. In our case, the states are the different opcodes.
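A sum of the form $\sum_k p_{ik}\, p_{kj}$ is exactly the $(i, j)$ entry of the matrix product $P \cdot P$, where $P$ holds the one-step probabilities $p_{ij}$. To go from $s_i$ to $s_j$ in two steps, the process must pass through some intermediate state $s_k$. Below is a minimal pure-Python sketch with a hypothetical two-state chain. The function name and the numbers are illustrative only.

```python
from typing import List

Matrix = List[List[float]]

def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Product of two square matrices: entry (i, j) is sum over k of a[i][k] * b[k][j]."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

# Hypothetical two-state chain; each row sums to 1.
P = [[0.9, 0.1],
     [0.5, 0.5]]
P2 = matmul(P, P)  # P2[i][j]: probability of going from s_i to s_j in two steps
print(P2)
```

Multiplying by $P$ once more gives the three-step probabilities, and so on.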

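For each opcode, the transition probabilities are the relative frequencies of the opcodes that follow it. Suppose an opcode $O_i$ is followed by another opcode $n$ times in a variant; an occurrence at the very end of the sequence has no successor and is not counted. If $x$ of those times the follower is $O_j$, then in our transition matrix the probability that state (here, opcode) $O_i$ is followed by state $O_j$ is $p_{ij} = x/n$. As an example, let’s compute the transition matrix of a simple program whose opcodes are common in our variants:

call push add sub jmp push add call

This yields the following transition matrix $M$. Rows are the current opcode and columns the next opcode. Notice that the probabilities in each row must sum to 1. The final call has no successor, so only the first call contributes to the call row.

$$
M = \begin{array}{c|ccccc}
 & \text{call} & \text{add} & \text{sub} & \text{jmp} & \text{push} \\ \hline
\text{call} & 0 & 0 & 0 & 0 & 1 \\
\text{add} & 1/2 & 0 & 1/2 & 0 & 0 \\
\text{sub} & 0 & 0 & 0 & 1 & 0 \\
\text{jmp} & 0 & 0 & 0 & 0 & 1 \\
\text{push} & 0 & 1 & 0 & 0 & 0
\end{array}
$$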
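The counting rule $p_{ij} = x/n$ can be sketched in a few lines of Python and applied to the example program. The sketch assumes the opcodes have already been extracted into a list. The study used its own Java program for that step, which is not reproduced here. The function name `transition_matrix` is hypothetical.

```python
from collections import Counter, defaultdict
from typing import Dict, List

def transition_matrix(opcodes: List[str]) -> Dict[str, Dict[str, float]]:
    """Row for opcode i: p[i][j] = x / n, where x counts i -> j transitions and
    n counts the occurrences of i that have a successor (so each row sums to 1)."""
    followers: Dict[str, Counter] = defaultdict(Counter)
    for current, nxt in zip(opcodes, opcodes[1:]):  # consecutive pairs only
        followers[current][nxt] += 1
    matrix: Dict[str, Dict[str, float]] = {}
    for current, counts in followers.items():
        n = sum(counts.values())
        matrix[current] = {nxt: x / n for nxt, x in counts.items()}
    return matrix

program = "call push add sub jmp push add call".split()
for opcode, row in transition_matrix(program).items():
    print(opcode, row)
```

Only transitions that actually occur are stored, so absent entries are implicitly 0. An opcode that appears only as the last element of a sequence gets no row at all.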

Expected Impact
We expect our solution to offer accuracy as well as space and time efficiency. Using Markov Chains should help reduce the percentage of false positives. We expect to define a reasonable threshold that separates malicious programs from benign ones without producing large numbers of false negatives. In addition, storing only one signature for a whole set of metamorphic variants with a common origin is more space-efficient than storing a signature for each variant, as anti-virus companies seem to do. Finally, our solution should be time-efficient. Comparing our computed signature against a potentially malicious program takes time linear in the size of the matrix, that is, in its number of entries.
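The comparison metric is not specified, so the sketch below makes two assumptions. It uses an L1 distance (the sum of absolute differences of corresponding entries), and it uses a hypothetical, untuned threshold of 0.5. Each entry of the two sparse matrices is visited once, so the cost is linear in the number of matrix entries. The function names and toy matrices are illustrative only.

```python
from typing import Dict, Tuple

# Sparse transition matrix: (current_opcode, next_opcode) -> probability.
Matrix = Dict[Tuple[str, str], float]

def l1_distance(signature: Matrix, suspect: Matrix) -> float:
    """Sum of absolute entry differences (assumed metric); one pass over both matrices."""
    entries = signature.keys() | suspect.keys()
    return sum(abs(signature.get(e, 0.0) - suspect.get(e, 0.0)) for e in entries)

def matches_signature(signature: Matrix, suspect: Matrix, threshold: float) -> bool:
    """Attribute the suspect to the engine if it lies within `threshold` of the signature."""
    return l1_distance(signature, suspect) <= threshold

# Toy values only; the threshold 0.5 is arbitrary, not a tuned value.
signature = {("push", "add"): 0.75, ("push", "mov"): 0.25}
variant = {("push", "add"): 0.7, ("push", "mov"): 0.3}
unrelated = {("mov", "ret"): 1.0}
print(matches_signature(signature, variant, 0.5))    # close to the signature
print(matches_signature(signature, unrelated, 0.5))  # far from the signature
```

Lowering the threshold trades false positives for false negatives and vice versa. That trade-off is what makes choosing it delicate.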

Limitations
Although our solution seems very appealing, it also has some downsides. One disadvantage stems from the very fact that the signature is an average matrix. Choosing a threshold to back up the average matrix may be tricky: it must be tight enough to avoid false positives yet loose enough to avoid false negatives. Also, because our culture is very limited (50 variants each of NGVCK and VCL32), we will test the signature only on a small scale and can only assume that it works on a larger one.

References
[1] http://packetstormsecurity.org/mag/40hex/40HEX-10/40HEX
[2] Nets.html
[3] M. R. Chouchane, A. Walenstein, A. Lakhotia. Using Markov Chains to Filter Machine-morphed Variants of Malicious Programs.
[4] I. Krsul and E. H. Spafford. Authorship Analysis: Identifying the Author of a Program.
[5] P. Szor. Advanced Code Evolution Techniques and Computer Virus Generation Kits.
[6] W. Wong and M. Stamp. Hunting for Metamorphic Engines.