Panacea: Automating Attack Classification for Anomaly-Based Network Intrusion Detection Systems
By Damiano Bolzoni, Sandro Etalle, Pieter H. Hartel. Presented by James O'Reilly.

Intrusion Detection Systems (IDS): Signature-based IDS (SBS)
- Matches activity/payload against known attacks; attacks are classified at the same moment they are detected.
- Not able to recognize non-trivial variations of known attacks.
- Useless against new, zero-day attacks.
- With classification, security personnel can automate responses and prioritize alerts.

Intrusion Detection Systems (IDS): Anomaly-based IDS (ABS)
- Recognizes suspicious activity, even novel attacks, but cannot classify it.
- High false positive rate, which burdens support staff.
- Without classification, prioritization and automated alert response are impossible.

Problem Statement: Anomaly-based IDSes have more potential than signature-based IDSes because they can handle novel attacks, but they are handicapped: they cannot tell security personnel anything about, e.g., a packet other than that it is "odd."

Panacea's Goal: To accurately classify the anomalies raised by an anomaly-based IDS, based on payload information, into different attack classes. The ability of an ABS to identify attacks is finally paired with a system that can efficiently classify attacks as they happen, making ABSes far less costly in man-hours to operate.

Training and Classification

Alert Information Extractor
"Boils down" an alert into a Bloom filter representation that the classification engine can analyze. It works in two stages:
1. Building the n-gram bitmap
2. Computing the Bloom filter
(During training only, it additionally passes along classification information, i.e., labels.)
The more samples the Alert Classification Engine has, the more accurately it can classify alerts. As we will see, storing all of the alert payloads outright is impractical.

Representing the data features
- There are 256^L possible messages of length L bytes.
- The problem for classification is which features to pick: how to represent the payload without losing "meaning".
- Each feature has a space and time cost, especially during training.
- Too few features means a lack of resolution, and the classifier's task becomes hampered or impossible.

N-grams
- The information in the payload is represented using binary n-gram analysis; n, the n-gram order, is the number of adjacent bytes analyzed together.
- The feature is the presence or absence of an n-gram in the payload, stored in a bitmap.
- The n-gram bitmap size is on the order of 256^n bits (see the sketch below).
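
To make the bitmap idea concrete, here is a minimal Python sketch (not the authors' code; the function name and interface are assumptions) that maps each n-gram of adjacent bytes to its index in a 256^n bitmap and records which indices occur:

def ngram_indices(payload: bytes, n: int) -> set:
    """Map each n-gram of adjacent bytes to an integer in [0, 256**n).

    A full bitmap would need 256**n bits; this set holds only the
    indices of the n-grams actually observed in the payload.
    """
    indices = set()
    for i in range(len(payload) - n + 1):
        gram = payload[i:i + n]
        indices.add(int.from_bytes(gram, "big"))  # position in the bitmap
    return indices

# Example: 3-grams of a short HTTP-like payload
print(len(ngram_indices(b"GET /index.html", 3)))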

Bloom Filter
- The 3-gram bitmap is about 2 MB; a 5-gram bitmap is about 128 GB.
- A Bloom filter offers an aggressive compression of the n-gram features, at the risk of false positives when reading the data back.
- The authors state that a 10 KB space would be acceptable in the 5-gram case.
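
These figures check out: a full binary n-gram bitmap needs 256^n bits, one per possible gram. A quick Python sanity check (the helper name is mine):

def bitmap_bytes(n: int) -> int:
    """Bytes needed for a full binary n-gram bitmap of 256**n bits."""
    return 256 ** n // 8

print(bitmap_bytes(3) / 2**20, "MiB for 3-grams")  # 2.0 MiB
print(bitmap_bytes(5) / 2**30, "GiB for 5-grams")  # 128.0 GiB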

Bloom Filter
- The binary Bloom filter data structure is essentially a bit vector of some length, used for determining set membership.
- Insertion: hash the value with k different hash functions (each ranging over the vector length) and mark the positions hashed to.
- Membership: hash the value with the same hash functions and look the positions up in the vector; if all positions are marked, the value is reported present (possibly a false positive), otherwise it is absent. (A minimal sketch follows.)
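
A minimal, self-contained Python sketch of such a filter (the sizes, hash construction, and names are illustrative assumptions, not the paper's implementation):

import hashlib

class BloomFilter:
    """Minimal binary Bloom filter: a bit vector plus k hash functions."""

    def __init__(self, size_bits: int = 80_000, num_hashes: int = 7):
        self.size = size_bits
        self.k = num_hashes
        self.bits = bytearray((size_bits + 7) // 8)

    def _positions(self, item: bytes):
        # Derive k positions from salted SHA-256 digests; this stands in
        # for k independent hash functions with range [0, size).
        for i in range(self.k):
            digest = hashlib.sha256(bytes([i]) + item).digest()
            yield int.from_bytes(digest[:8], "big") % self.size

    def add(self, item: bytes) -> None:
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item: bytes) -> bool:
        # "Present" may be a false positive; "absent" is always correct.
        return all(self.bits[pos // 8] & (1 << (pos % 8))
                   for pos in self._positions(item))

bf = BloomFilter()
bf.add(b"GET /ind")
print(b"GET /ind" in bf)  # True
print(b"PUT /etc" in bf)  # False (with high probability)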

Inserting into a Bloom Filter
"The error rate can be decreased by increasing the number of hash transforms and the space allocated to store the table." [1]

Collisions in the Map
A collision (false positive) will occur with probability approximately (1 - e^(-kn/l))^k, where l is the size of the bit space, k is the number of hash functions, and n is the number of insertions.
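
As a sketch, the approximation can be evaluated directly in Python (the numbers below are illustrative assumptions, not figures from the paper):

from math import exp

def bloom_fp_rate(l_bits: int, k: int, n: int) -> float:
    """Approximate false positive rate: (1 - e^(-k*n/l))**k."""
    return (1 - exp(-k * n / l_bits)) ** k

# e.g., a 10 KB filter (80,000 bits), 7 hashes, 1,000 inserted n-grams
print(f"{bloom_fp_rate(80_000, 7, 1_000):.2e}")  # roughly 3e-08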

Alert Classification Engine
- The engine has essentially two modes: training and classification.
- Training is when the classifier (SVM or RIPPER) learns how to classify the attacks. It does this with labeled Bloom filter data (supervised learning).
- Once trained, the classifier can be given unlabeled Bloom filter data and classify it.

Training and Classification

Accuracy and Training Set
- The accuracy of the classifier depends on the training set's size and, of course, its quality. Quality is affected by the way the training data is labeled (more on this shortly).
- The training set needs to be fairly large: the larger the better, though with diminishing returns.
- Bolzoni et al. chose SVM and RIPPER for their accuracy, but both are non-iterative learners: to incorporate new samples, they must essentially add the samples to the original data and completely retrain. The data must therefore be as compact as possible without destroying the distinguishing features of the payloads.

Classifier
- A classifier takes an input and assigns it to a class.
- A binary classifier takes an input and decides, essentially, whether or not it is a member of a single class.
- Training a supervised-learning classifier means taking labeled data and minimizing the error on that training data, using whatever implementation the classifier relies on.

SVM
- Training: the sample set is mapped into a high-dimensional space using a non-linear function (a kernel), and the data is then divided by a hyperplane (a plane in a higher-dimensional space).
- The signed distance from the hyperplane is the metric for evaluating class membership (a hyperplane has a positive side and a negative side).
- Multiple classes are handled essentially by adding multiple hyperplanes. (A sketch follows.)
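
A hedged sketch with scikit-learn (an assumed stand-in; the paper's own SVM implementation may differ), treating Bloom filter bit vectors as features and attack classes as labels:

import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X = rng.integers(0, 2, size=(200, 1024)).astype(float)  # stand-in filter bits
y = rng.integers(0, 3, size=200)                        # stand-in attack classes

clf = SVC(kernel="rbf")   # non-linear mapping via a kernel function
clf.fit(X, y)             # multi-class handled internally via multiple hyperplanes
print(clf.predict(X[:5]))                  # predicted class labels
print(clf.decision_function(X[:1]).shape)  # signed-distance scores, one per class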

RIPPER
- RIPPER is a rule-based classifier. It begins with an empty rule set and adds rules until there is no error on the growing set.
- Handles multiple classes by learning rules for the least common class first, then the second-least common, and so on.
- Has an optimization step to reduce the size of the rule set. (A toy sketch of the underlying idea follows.)
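
For flavor, here is a toy Python sketch of the sequential-covering idea that RIPPER belongs to; it is not RIPPER itself (no rule pruning or MDL-based optimization), and all names and data are made up:

def covering_rules(X, y, target):
    """Greedily pick 'bit f is set' rules that fire on no other class,
    until all examples of the target class are covered (or no clean
    rule is left; real RIPPER tolerates impure rules and prunes)."""
    negatives = [x for x, lbl in zip(X, y) if lbl != target]
    remaining = [x for x, lbl in zip(X, y) if lbl == target]
    rules = []
    while remaining:
        candidates = [f for f in range(len(X[0]))
                      if not any(neg[f] for neg in negatives)]
        if not candidates:
            break
        best = max(candidates, key=lambda f: sum(x[f] for x in remaining))
        if not any(x[best] for x in remaining):
            break
        rules.append(best)
        remaining = [x for x in remaining if not x[best]]
    return rules

# Toy data: feature 2 fires only for the "web" class
X = [[1, 0, 1], [0, 1, 1], [1, 1, 0]]
y = ["web", "web", "overflow"]
print(covering_rules(X, y, "web"))  # [2]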

Labeling Alerts (input to the Alert Information Extractor)
Three methods:
1. Automatic: use the input from an SBS.
2. Semi-automatic: use the SBS input and add ABS data with manual labeling.
3. Manual: all alerts are manually classified.

Test 1: Automatic
DS_A: Data set A is 3200 automatically generated Snort (SBS) alerts, triggered with vulnerability assessment tools, spanning 14 classes. 4 classes were excluded because they had fewer than 10 samples.

n-gram length vs. accuracy

Selected Classes: Classifier vs. Sample Size

Test 2: Web attacks, semi-automatic
DS_B: as DS_A, but focused on web attacks alone, with the addition of some Milw0rm attack alerts, all manually classified.

Live attacks
DS_C: Manually classified alerts from a university server; no attacks were injected, the alerts were raised by the ABSes Poseidon and Sphinx (100 alerts over 2 weeks). Panacea is trained on DS_B and tested against the 100 ABS alerts.

Novelty: SVM vs. RIPPER
Extra buffer overflows were created by mutating known ones with the Sploit framework.

With Confidence Evaluation

Results
Bolzoni et al. present an attack payload classifier for anomaly-based intrusion detection systems. The paper also provides a framework for plugging in other classifiers, and this framework can be extended to hybrid responses (SVM early on, RIPPER once the sample size exceeds some threshold, SVM for high-risk cases, ...).
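
A sketch of what such a hybrid dispatch might look like (the classifier names come from the paper; the function, threshold, and flags are assumptions):

def pick_classifier(num_samples: int, high_risk: bool,
                    threshold: int = 10_000) -> str:
    """Choose a classifier per the hybrid idea floated on this slide."""
    if high_risk:
        return "SVM"      # SVM for high-risk cases
    if num_samples < threshold:
        return "SVM"      # SVM early, while samples are scarce
    return "RIPPER"       # RIPPER once the sample size is large enough

print(pick_classifier(500, False), pick_classifier(50_000, False))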

References. Questions?
[1] J. Bluestein and A. El-Maazawi. "Bloom Filters: A Tutorial, Analysis, and Survey". Technical Report CS, Faculty of Computer Science, Dalhousie University, Canada.