Masquerade Detection
Mark Stamp

 Masquerader --- someone who makes unauthorized use of a computer
 How to detect a masquerader?
 Here, we consider…
 Anomaly-based intrusion detection (IDS)
 Detection is based on UNIX commands
 Lots and lots of prior work on this problem
 We attempt to apply PHMMs
 For comparison, we also implement other techniques (HMM and N-gram)

Schonlau Data Set  Schonlau, et al, collected large data set  Contains UNIX commands for 50 users  50 files, one for each user  Each file has 15k commands, 5k from user plus 10k for masquerade test data  Test data: 100 blocks, 100 commands each  Dataset includes map file  100 rows (test blocks), 50 columns (users)  0 if block is user data, 1 if masquerade data Masquerade Detection3

Schonlau Data Set  Map file structure  This data set used for many studies  Approximately, 50 published papers Masquerade Detection4

Previous Work
 Approaches to masquerade detection:
 Information theoretic
 Text mining
 Hidden Markov models (HMM)
 Naïve Bayes
 Sequences and bioinformatics
 Support vector machines (SVM)
 Other approaches
 We briefly look at each of these

Information Theoretic
 Original work by Schonlau included a compression technique
 Based on the theory (hope?) that legitimate commands compress better than attack commands
 Results were disappointing
 Some additional recent work
 Still not competitive with the best approaches

Text Mining
 A few papers in this area
 One approach extracts repetitive sequences from the training data
 Another paper uses principal component analysis (PCA)
 A method of “exploratory data analysis”
 Good results on the Schonlau data set
 But high cost during the training phase

Hidden Markov Models
 Several authors have used HMMs
 One of the best-known approaches
 We have implemented an HMM detector
 We do sensitivity analysis on the parameters
 In particular, determine the optimal N (number of hidden states)
 We also use HMMs for comparison with our PHMM results

Naïve Bayes
 In its simplest form, relies only on command frequencies
 That is, no sequence information is used
 Several papers analyze this approach
 Among the simplest approaches
 And the results are good
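A minimal sketch of this frequency-only scoring (the add-one smoothing and the toy command streams are illustrative assumptions, not a specific published implementation):

```python
import math
from collections import Counter

def train_freqs(train_cmds):
    """Estimate per-command counts from a user's training commands."""
    return Counter(train_cmds), len(train_cmds)

def score_block(block, model):
    """Log-likelihood of a test block under the user's command frequencies.
    Add-one smoothing handles commands never seen in training; a lower
    score suggests a masquerader. No sequence information is used."""
    counts, total = model
    vocab = len(counts) + 1  # crude smoothing denominator
    return sum(math.log((counts.get(c, 0) + 1) / (total + vocab))
               for c in block)

model = train_freqs(["ls"] * 90 + ["cd"] * 10)
familiar = score_block(["ls", "cd", "ls"], model)
unfamiliar = score_block(["rm", "wget", "nc"], model)
# The block of never-seen commands scores lower (more suspicious)
```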

Sequences
 In a sense, this is the opposite extreme from naïve Bayes
 Naïve Bayes only considers frequency statistics
 Sequence/bioinformatics approaches focus on sequence-related information
 Schonlau’s original work included an elementary sequence-based analysis

Bioinformatics
 We are aware of only one previous paper that uses a bioinformatics approach
 Uses the Smith-Waterman algorithm to create local alignments
 Alignments are then used directly for detection
 In contrast, we do pairwise alignments, MSA, and PHMM
 The PHMM is used for scoring (forward algorithm)
 Our scoring is much more efficient
 Also, our results are at least as strong
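For reference, Smith-Waterman local alignment over command sequences (as in that earlier work) can be sketched like this; the scoring parameters are illustrative assumptions:

```python
def smith_waterman(a, b, match=2, mismatch=-1, gap=-1):
    """Score of the best local alignment between two command sequences
    (score matrix only; no traceback)."""
    H = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    best = 0
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            diag = H[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # Local alignment: cell scores never drop below zero
            H[i][j] = max(0, diag, H[i - 1][j] + gap, H[i][j - 1] + gap)
            best = max(best, H[i][j])
    return best

# A shared "cd ls" subsequence yields a local alignment of two matches
score = smith_waterman(["vi", "cd", "ls"], ["cd", "ls", "rm"])
```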

Support Vector Machines
 Support vector machines (SVM)
 Machine learning technique
 Separates data points (i.e., classifies) based on hyperplanes in a high-dimensional space
 Original data is mapped to a higher dimension, where separation is likely easier
 SVMs maximize the separation
 And have low computational costs
 Used for classification and regression analysis

SVMs & Masquerade Detection
 SVMs have been applied to the masquerade detection problem
 Results are good
 Comparable to naïve Bayes
 Recent work using SVMs has focused on improved efficiency

Other Approaches
 The following have also been studied:
 Detection using low-frequency commands
 Detection using high-frequency commands
 Hybrid Bayes “one-step Markov”
 Natural to consider hybrid approaches
 Multistep Markov
 Markov process of order greater than 1
 None of these particularly successful

Other Approaches (Continued)
 Non-negative matrix factorization (NMF)
 At least 2 papers on this topic
 Appears to be competitive
 Other hybrids attempt to combine several approaches
 So far, no significant improvement over individual techniques

HMMs
 See previous presentation

HMM for Masquerade Detection
 Using the Schonlau data set, we…
 Train an HMM for each user
 Set thresholds
 Test the models and plot the results
 Note that this has been done before
 Here, we perform sensitivity analysis
 That is, we test different numbers of hidden states, N
 We also use it for comparison with the PHMM
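Once an HMM is trained for a user, each test block is scored with the forward algorithm; a rough sketch with per-step scaling follows (the tiny two-state model is a made-up example, not a trained model):

```python
import math

def forward_log_prob(obs, pi, A, B):
    """Log P(observations | HMM) via the forward algorithm, with per-step
    scaling to avoid underflow on long command sequences.
    pi: initial state distribution, A: state transition matrix,
    B: emission matrix, obs: commands mapped to integer symbols."""
    N = len(pi)
    alpha = [pi[i] * B[i][obs[0]] for i in range(N)]
    s = sum(alpha)
    log_prob = math.log(s)
    alpha = [a / s for a in alpha]
    for o in obs[1:]:
        alpha = [B[i][o] * sum(alpha[j] * A[j][i] for j in range(N))
                 for i in range(N)]
        s = sum(alpha)
        log_prob += math.log(s)
        alpha = [a / s for a in alpha]
    return log_prob

# Toy 2-state model (N = 2), purely illustrative
pi = [1.0, 0.0]
A = [[1.0, 0.0], [0.0, 1.0]]
B = [[0.9, 0.1], [0.5, 0.5]]
lp = forward_log_prob([0, 0], pi, A, B)  # P = 0.9 * 0.9 = 0.81
```

A block whose log-probability falls below the user's threshold is flagged as a possible masquerade.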

HMM Experiments
 Plotted as “ROC” curves
 Closer to the origin is better
 Useful region
 That is, false positives below 5%
 The shaded region
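The ROC points behind such a plot can be computed by sweeping a threshold over the scores; the score convention below (lower score = flagged as masquerade) is an assumption:

```python
def roc_points(self_scores, masq_scores):
    """(false-positive rate, detection rate) pairs as the threshold sweeps
    across every observed score. A block is flagged when score <= threshold."""
    points = []
    for t in sorted(set(self_scores + masq_scores)):
        fpr = sum(s <= t for s in self_scores) / len(self_scores)  # false alarms
        tpr = sum(s <= t for s in masq_scores) / len(masq_scores)  # detections
        points.append((fpr, tpr))
    return points

# Perfectly separated scores reach (0% false positives, 100% detection)
pts = roc_points(self_scores=[-3.0, -4.0], masq_scores=[-8.0, -9.0])
```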

HMM Conclusion
 The number of hidden states does not matter
 So, use N = 2
 Since it is the most efficient

PHMM
 See previous presentation

PHMM Experiments
 A problem with the Schonlau data…
 For a given user, 5000 training commands
 No begin/end session markers
 So, we must split the data up to obtain multiple sequences
 But where to split the sequence?
 And what about the tradeoff between the number of sequences and the length of each sequence?
 That is, how to decide length/number?

PHMM Experiments
 Experiments done for the following cases
 See next slide…

PHMM Experiments
 Tested various numbers of sequences
 Best results: 5 sequences, 1k commands each
 This case shown on the next slide
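The best configuration corresponds to a simple equal split of the 5k training commands; a minimal sketch (the placement of the split points is a guess, since the data has no session markers):

```python
def split_training(train, num_seqs):
    """Split one user's training stream into num_seqs equal-length
    sequences for PHMM training. Split points are arbitrary because the
    Schonlau data carries no begin/end session markers."""
    seq_len = len(train) // num_seqs
    return [train[i * seq_len:(i + 1) * seq_len] for i in range(num_seqs)]

seqs = split_training(["cmd"] * 5000, num_seqs=5)
# 5 sequences of 1000 commands each
```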

PHMM Comparison
 Compare PHMM to “weighted N-gram” and HMM
 HMM is best
 PHMM is competitive

PHMM Detector
 PHMM is at a disadvantage on the Schonlau data
 PHMM uses positional information
 Such info is not available in the Schonlau data
 We have to guess the positions for the PHMM
 How to get a fairer comparison between HMM and PHMM?
 We need a different data set
 Only option is a simulated data set

Simulated Data
 We generate simulated data as follows
 Using the Schonlau data, construct a Markov chain for each user
 Use the resulting Markov chain to generate sequences representing user behavior
 Restrict “begin” to the more common commands
 What’s the point?
 Simulated sequences have sensible begins and ends

Simulated Data
 Training data and user data for scoring are generated using the Markov chain
 Attack data is taken from the Schonlau data
 How much data to generate?
 For the first test, we generate the same amount of simulated data as is in the Schonlau set
 That is, 5k commands per user
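A minimal sketch of this generation step (the toy training stream, the fixed seed, and the choice of start command are illustrative assumptions):

```python
import random
from collections import Counter, defaultdict

def build_chain(commands):
    """First-order Markov chain: counts of which command follows which
    in a user's training stream."""
    chain = defaultdict(Counter)
    for cur, nxt in zip(commands, commands[1:]):
        chain[cur][nxt] += 1
    return chain

def generate(chain, start, length, rng=None):
    """Generate a simulated command sequence, starting from a chosen
    (preferably common) begin command."""
    rng = rng or random.Random(0)
    seq = [start]
    while len(seq) < length and chain[seq[-1]]:
        succ = chain[seq[-1]]
        cmds = list(succ)
        weights = [succ[c] for c in cmds]
        seq.append(rng.choices(cmds, weights=weights)[0])
    return seq

chain = build_chain(["ls", "cd", "ls", "cd", "ls"])
sim = generate(chain, start="ls", length=4)
```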

Detection with Simulated Data
 PHMM vs HMM
 Round 2
 It’s close, but HMM still wins!

Limited Training Data
 What if less training data is available?
 In a real application, training data is initially limited
 Can’t detect attacks until sufficient training data has been accumulated
 So, the less data required, the better
 Experiments, using simulated data, with limited training data
 Used 200 to 800 commands for training

Limited Training Data
 PHMM vs HMM
 Round 3
 With 400 commands or fewer, PHMM wins big!

Conclusion
 PHMM is competitive with the best approaches
 PHMM is likely to do better given better training data (begin/end info)
 PHMM is much better than HMM when limited training data is available
 Of practical importance
 Why does it make sense that PHMM would do better with limited training data?

Conclusion
 Given the current state of research…
 Optimal masquerade detection approach:
 Initially, collect a small training set
 Train a PHMM and use it for detection
 If no attack, then continue to collect data
 When sufficient data is available, train an HMM
 From then on, use the HMM for detection

Future Work
 Collect a better real data set!
 Many problems/limitations with the Schonlau data
 An improved data set could be the basis for a great deal of research
 Directly compare PHMM/bioinformatics approaches with previous work (HMM, naïve Bayes, SVM, etc.)
 Consider hybrid techniques
 Other techniques?

References
 Masquerade detection using profile hidden Markov models, L. Huang and M. Stamp, to appear in Computers and Security
 Masquerading user data, M. Schonlau