Learning Rules from System Call Arguments and Sequences for Anomaly Detection. Gaurav Tandon and Philip Chan, Department of Computer Sciences, Florida Institute of Technology.

Presentation transcript:

Learning Rules from System Call Arguments and Sequences for Anomaly Detection Gaurav Tandon and Philip Chan Department of Computer Sciences Florida Institute of Technology

Overview
- Related work in system call sequence-based systems
- Problem statement: can system call arguments as attributes improve anomaly detection algorithms?
- Approach
  - LERAD (a conditional rule learning algorithm)
  - Variants of attributes
- Experimental evaluation
- Conclusions and future work

Related Work
- tide (time-delay embedding): Forrest et al., 1996
- stide (sequence time-delay embedding): Hofmeyr et al., 1999
- t-stide (stide with frequency threshold): Warrender et al., 1999
- Variable-length sequence-based techniques (Wespi et al., 1999, 2000; Jiang et al., 2001)
- False alarms!

Problem Statement
- Current models: system call sequences
- What else can we model? System call arguments
- Example: open("/etc/passwd") vs. open("/users/readme")

Approach
- Models based upon system calls
- 3 sets of attributes:
  - system call sequence
  - system call arguments
  - system call arguments + sequence
- Adopt a rule learning approach: Learning Rules for Anomaly Detection (LERAD)

Learning Rules for Anomaly Detection (LERAD) [Mahoney and Chan, 2003]
- A, B, and X are attributes
- a, b, x1, x2 are values of the corresponding attributes
- p: probability of observing a value not in the consequent
- r: cardinality of the set {x1, x2, …} in the consequent
- n: number of samples that satisfy the antecedent
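The definitions above can be made concrete with a small sketch. In Mahoney and Chan's description, a LERAD rule has the form A=a, B=b ⇒ X ∈ {x1, x2, …}, and p is estimated as r/n. The class name `Rule`, its field names, and the example values below are illustrative assumptions, not taken from the slides.

```python
from dataclasses import dataclass, field


@dataclass
class Rule:
    """Hypothetical container for one LERAD-style rule (names are assumptions)."""
    antecedent: dict                            # conditions, e.g. {"A": 1, "B": 2}
    target: str                                 # consequent attribute X
    allowed: set = field(default_factory=set)   # the set {x1, x2, ...}
    n: int = 0                                  # samples that satisfied the antecedent

    @property
    def r(self) -> int:
        # Cardinality of the consequent set
        return len(self.allowed)

    @property
    def p(self) -> float:
        # Estimated probability of seeing a value outside the consequent
        return self.r / self.n if self.n else 0.0

    def matches(self, sample: dict) -> bool:
        return all(sample.get(k) == v for k, v in self.antecedent.items())

    def violated_by(self, sample: dict) -> bool:
        # A rule is violated only when its antecedent holds and the
        # target value is not one of the allowed values.
        return self.matches(sample) and sample[self.target] not in self.allowed


rule = Rule(antecedent={"A": 1, "B": 2}, target="X", allowed={"x1", "x2"}, n=100)
print(rule.p)
print(rule.violated_by({"A": 1, "B": 2, "X": "x9"}))
```

A small p means the rule has rarely been surprised during training, so a violation of it is stronger evidence of an anomaly.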

Overview of LERAD
4 steps involved in rule generation:
1. From a small training sample, generate candidate rules and associate probabilities with them
2. Coverage test to minimize the rule set
3. Update rules beyond the small training sample
4. Validate rules on a separate validation set

Step 1a: Generate Candidate Rules
- Two samples are picked at random (say S1 and S2)
- Matching attributes A, B and C are picked in random order (say B, C and A)
- These attributes are used to form rules with 0, 1, 2 conditions in the antecedent
Training data:
Set            Sample  A  B  C  D
Random sample  S1      1  2  3  4
Random sample  S2      1  2  3  5
Random sample  S3      6  7  8  4
Training       S4      1  0  9  5
Training       S5      1  2  3  4
Validation     S6      6  3  8  5
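A minimal sketch of this step, under one stated assumption: the first attribute in the random order becomes the consequent, and the remaining ones are added one at a time as antecedent conditions (this follows the LERAD description by Mahoney and Chan; the slide itself does not spell it out). The function name `candidate_rules` and the optional `order` argument are hypothetical.

```python
import random


def candidate_rules(s1, s2, max_conditions=2, order=None):
    """Build candidate rules from the attributes on which s1 and s2 agree.

    Assumption: the first attribute in the random order is the consequent;
    the following ones are added one at a time as antecedent conditions.
    """
    matching = [a for a in s1 if s1[a] == s2[a]]
    if order is None:
        order = matching[:]
        random.shuffle(order)
    if not order:
        return []
    target, rest = order[0], order[1:]
    rules = []
    for k in range(min(max_conditions, len(rest)) + 1):
        antecedent = {a: s1[a] for a in rest[:k]}
        rules.append((antecedent, target, {s1[target]}))
    return rules


S1 = {"A": 1, "B": 2, "C": 3, "D": 4}
S2 = {"A": 1, "B": 2, "C": 3, "D": 5}

# Fix the order to B, C, A to mirror the slide's example.
for antecedent, target, values in candidate_rules(S1, S2, order=["B", "C", "A"]):
    print(antecedent, "=>", target, "in", values)
```

D never appears in a rule here because S1 and S2 disagree on it.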

Step 1b: Generate Candidate Rules
Training data:
Set            Sample  A  B  C  D
Random sample  S1      1  2  3  4
Random sample  S2      1  2  3  5
Random sample  S3      6  7  8  4
Training       S4      1  0  9  5
Training       S5      1  2  3  4
Validation     S6      6  3  8  5
- Values are added to the consequent based on a subset of the training set (say S1-S3)
- A probability estimate p is associated with every rule, for when it is violated (… instead of … in each rule)
- Rules are sorted in increasing order of p
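The following sketch fills in consequent values and estimates p on the subset S1-S3, then sorts the rules. It assumes p = r/n as in Mahoney and Chan's LERAD description, and it assumes the three candidate rules are those obtained when B is the consequent; the function name `fit` and the dictionary keys are hypothetical.

```python
def fit(rules, samples):
    """Collect consequent values and estimate p = r/n on a set of samples."""
    fitted = []
    for antecedent, target in rules:
        hits = [s for s in samples
                if all(s[a] == v for a, v in antecedent.items())]
        values = {s[target] for s in hits}
        p = len(values) / len(hits) if hits else 0.0
        fitted.append({"if": antecedent, "target": target,
                       "values": values, "n": len(hits), "p": p})
    # Rules with the lowest p (least often surprised) come first.
    return sorted(fitted, key=lambda r: r["p"])


subset = [
    {"A": 1, "B": 2, "C": 3, "D": 4},  # S1
    {"A": 1, "B": 2, "C": 3, "D": 5},  # S2
    {"A": 6, "B": 7, "C": 8, "D": 4},  # S3
]
# Assumed candidates: 0, 1 and 2 conditions, all with consequent B.
candidates = [({}, "B"), ({"C": 3}, "B"), ({"C": 3, "A": 1}, "B")]

for rule in fit(candidates, subset):
    print(rule["if"], "=> B in", rule["values"], "p =", rule["p"])
```

On this subset the unconditional rule collects both observed values of B and ends up with the largest p, so it is sorted last.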

Step 2: Coverage Test
Training data (values of B marked with the rule that covers them):
Set            Sample  A  B           C  D
Random sample  S1      1  2 (Rule 2)  3  4
Random sample  S2      1  2 (Rule 2)  3  5
Random sample  S3      6  7 (Rule 1)  8  4
Training       S4      1  0           9  5
Training       S5      1  2           3  4
Validation     S6      6  3           8  5
- Obtain a minimal set of rules
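One way to read the coverage test: walk the rules in increasing order of p and keep a rule only if it covers at least one value in the sample that no earlier rule has covered. The sketch below follows that reading. The labels "Rule 1" and "Rule 2" come from the slide; "Rule 3" and the exact rule contents are assumptions made for illustration.

```python
def coverage_test(sorted_rules, samples):
    """Keep only rules that cover a not-yet-covered (sample, attribute) cell."""
    covered = set()
    kept = []
    for rule in sorted_rules:
        newly = set()
        for i, s in enumerate(samples):
            matches = all(s[a] == v for a, v in rule["if"].items())
            if matches and s[rule["target"]] in rule["values"]:
                cell = (i, rule["target"])
                if cell not in covered:
                    newly.add(cell)
        if newly:
            covered |= newly
            kept.append(rule)
    return kept


subset = [
    {"A": 1, "B": 2, "C": 3, "D": 4},  # S1
    {"A": 1, "B": 2, "C": 3, "D": 5},  # S2
    {"A": 6, "B": 7, "C": 8, "D": 4},  # S3
]
# Assumed rule set, already sorted by increasing p.
rules = [
    {"name": "Rule 2", "if": {"C": 3}, "target": "B", "values": {2}},
    {"name": "Rule 3", "if": {"A": 1, "C": 3}, "target": "B", "values": {2}},
    {"name": "Rule 1", "if": {}, "target": "B", "values": {2, 7}},
]

kept = coverage_test(rules, subset)
print([r["name"] for r in kept])
```

Under these assumptions Rule 2 covers B in S1 and S2, the two-condition rule adds nothing new and is dropped, and Rule 1 is kept because it covers B in S3, which matches the markings on the slide.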


Step 3: Updating rules beyond the training samples
Training data:
Set            Sample  A  B  C  D
Random sample  S1      1  2  3  4
Random sample  S2      1  2  3  5
Random sample  S3      6  7  8  4
Training       S4      1  0  9  5
Training       S5      1  2  3  4
Validation     S6      6  3  8  5
- Extend rules to the entire training (minus validation) set (samples S1-S5)
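A sketch of the update step, assuming it means re-counting n and extending each consequent set over S1-S5, with p = r/n. The row for S4 is read here as A=1, B=0, C=9, D=5, which is an interpretation of the flattened table; the two rules and the function name `update` are likewise assumptions.

```python
def update(rule, training):
    """Extend a rule's consequent set and recompute n and p on more data."""
    hits = [s for s in training
            if all(s[a] == v for a, v in rule["if"].items())]
    values = rule["values"] | {s[rule["target"]] for s in hits}
    p = len(values) / len(hits) if hits else 0.0
    return {**rule, "values": values, "n": len(hits), "p": p}


training = [
    {"A": 1, "B": 2, "C": 3, "D": 4},  # S1
    {"A": 1, "B": 2, "C": 3, "D": 5},  # S2
    {"A": 6, "B": 7, "C": 8, "D": 4},  # S3
    {"A": 1, "B": 0, "C": 9, "D": 5},  # S4 (assumed reading of the table)
    {"A": 1, "B": 2, "C": 3, "D": 4},  # S5
]
rule2 = update({"if": {"C": 3}, "target": "B", "values": {2}}, training)
rule1 = update({"if": {}, "target": "B", "values": {2, 7}}, training)

print(rule2["n"], rule2["values"], rule2["p"])
print(rule1["n"], rule1["values"], rule1["p"])
```

With this reading, the rule conditioned on C=3 now rests on three samples, while the unconditional rule gains the new value of B seen in S4.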

Step 4: Validating rules
Training data:
Set            Sample  A  B  C  D
Random sample  S1      1  2  3  4
Random sample  S2      1  2  3  5
Random sample  S3      6  7  8  4
Training       S4      1  0  9  5
Training       S5      1  2  3  4
Validation     S6      6  3  8  5
- Test the set of rules on the validation set (S6)
- Remove rules that produce an anomaly
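The validation step can be sketched as a filter: any rule violated by a validation sample is discarded, on the reasoning that the validation data is attack-free and such a rule would only raise false alarms. The rule contents and the function name `validate` are assumptions for illustration.

```python
def is_violated(rule, sample):
    """True if the antecedent holds but the target value is not allowed."""
    matches = all(sample[a] == v for a, v in rule["if"].items())
    return matches and sample[rule["target"]] not in rule["values"]


def validate(rules, validation):
    """Drop every rule that is violated by at least one validation sample."""
    return [r for r in rules
            if not any(is_violated(r, s) for s in validation)]


validation = [{"A": 6, "B": 3, "C": 8, "D": 5}]  # S6
rules = [
    {"name": "Rule 2", "if": {"C": 3}, "target": "B", "values": {2}},
    {"name": "Rule 1", "if": {}, "target": "B", "values": {0, 2, 7}},
]

print([r["name"] for r in validate(rules, validation)])
```

Here S6 has B=3, which the unconditional rule does not allow, so that rule is removed; the rule conditioned on C=3 does not apply to S6 and survives.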


Learning Rules for Anomaly Detection (LERAD)
- t: time interval since the last anomalous event
- i: index of the rule violated
- Non-stationary model: only the last occurrence of an event is important
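As described by Mahoney and Chan, the anomaly score of a sample is the sum, over the rules i that it violates, of t_i / p_i, which equals t_i * n_i / r_i. A hedged sketch of that scoring follows; the class name `Scorer`, the `start_time` default, and the example rule are assumptions.

```python
class Scorer:
    """Sum of t_i * n_i / r_i over violated rules (hypothetical sketch)."""

    def __init__(self, rules, start_time=0.0):
        self.rules = rules
        # Time at which each rule last produced an anomaly.
        self.last_violation = [start_time] * len(rules)

    def score(self, sample, now):
        total = 0.0
        for i, rule in enumerate(self.rules):
            matches = all(sample[a] == v for a, v in rule["if"].items())
            if matches and sample[rule["target"]] not in rule["values"]:
                t = now - self.last_violation[i]
                total += t * rule["n"] / len(rule["values"])
                self.last_violation[i] = now  # only the last occurrence matters
        return total


rules = [{"if": {"C": 3}, "target": "B", "values": {2}, "n": 3}]
scorer = Scorer(rules)
odd = {"A": 1, "B": 9, "C": 3, "D": 4}

print(scorer.score(odd, now=60.0))
print(scorer.score(odd, now=61.0))
```

The second call scores far lower than the first because the same rule was violated one second earlier, which is the non-stationary behaviour the slide refers to.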

Variants of attributes
3 variants:
(i) S-LERAD: system call sequence
(ii) A-LERAD: system call arguments
(iii) M-LERAD: system call arguments + sequence

S-LERAD
- System call sequence-based LERAD
- Samples comprising 6 contiguous system call tokens are input to LERAD
SC1       SC2       SC3       SC4       SC5      SC6
mmap()    munmap()  mmap()    munmap()  open()   close()
munmap()  mmap()    munmap()  open()    close()  open()
mmap()    munmap()  open()    close()   open()   mmap()
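The three rows above are consecutive windows over a single trace of eight calls. A short sketch of that windowing, with the hypothetical function name `windows`:

```python
def windows(trace, width=6):
    """Return every run of `width` contiguous system calls in the trace."""
    return [tuple(trace[i:i + width]) for i in range(len(trace) - width + 1)]


trace = ["mmap", "munmap", "mmap", "munmap", "open", "close", "open", "mmap"]

for sample in windows(trace):
    print(sample)
```

Each tuple then becomes one sample with attributes SC1 to SC6. A trace shorter than the window yields no samples.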

A-LERAD
- Samples containing system call along with arguments
- System call will always be a condition in the antecedent of the rule
Attributes: SC, Arg1, Arg2, Arg3, Arg4, Arg5
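A sketch of how one audit record might be turned into such a sample, and of how the system call can be forced into every antecedent. Padding missing arguments with `None` and truncating after five arguments are assumptions, as are the function names `to_sample` and `pivot_antecedent`.

```python
def to_sample(call, args, max_args=5, pad=None):
    """Build a fixed-width sample: SC plus Arg1..Arg5 (padded, an assumption)."""
    kept = list(args[:max_args])
    kept += [pad] * (max_args - len(kept))
    sample = {"SC": call}
    for i, value in enumerate(kept, start=1):
        sample[f"Arg{i}"] = value
    return sample


def pivot_antecedent(sample, extra_conditions=()):
    """Antecedent that always contains the system call as a condition."""
    antecedent = {"SC": sample["SC"]}
    for attr in extra_conditions:
        antecedent[attr] = sample[attr]
    return antecedent


s = to_sample("open", ["/etc/passwd"])
print(s)
print(pivot_antecedent(s, extra_conditions=["Arg2"]))
```

Keeping SC in every antecedent means a rule about Arg1 is only ever learned per system call, so file names seen with open() are not mixed with the first argument of unrelated calls.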

M-LERAD Combination of system call sequences and arguments

1999 DARPA IDS Evaluation [Lippmann et al., 2000]
- Week 3: training data (~2.1 million system calls)
- Weeks 4 and 5: test data (over 7 million system calls)
- Total: 51 attacks on the Solaris host

Experimental Procedures
- Preprocessing the data: BSM audit log, split into applications and then processes
- Model per application
- Merge all alarms
[Diagram: processes Pi, Pj, Pk grouped under Application 1, Application 2, …, Application N]

Evaluation Criteria
- Attack detected if an alarm is generated within 60 seconds of occurrence of the attack
- Number of attacks detected at 10 false alarms/day
- Time and storage requirements
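The detection criterion can be sketched as a simple time comparison. This version treats each attack as a single timestamp in seconds and counts an alarm on either side of it within the window, both of which are simplifying assumptions; the function names and the example times are hypothetical.

```python
def detected_attacks(attack_times, alarm_times, window=60.0):
    """Attacks that have at least one alarm within `window` seconds."""
    return [a for a in attack_times
            if any(abs(alarm - a) <= window for alarm in alarm_times)]


def false_alarms_per_day(attack_times, alarm_times, days, window=60.0):
    """Alarms not within `window` seconds of any attack, averaged per day."""
    false = [alarm for alarm in alarm_times
             if not any(abs(alarm - a) <= window for a in attack_times)]
    return len(false) / days


attacks = [100.0, 5000.0]
alarms = [130.0, 900.0, 5200.0]

print(detected_attacks(attacks, alarms))
print(false_alarms_per_day(attacks, alarms, days=10))
```

Sweeping the alarm threshold until the second function returns 10 gives the operating point at which the number of detected attacks is reported.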

Detections vs. false alarms

Percentage detections per attack type

Comparison of CPU times
Columns: training time in seconds on 1 week of data (t-stide, M-LERAD); testing time in seconds on 2 weeks of data (t-stide, M-LERAD)
Applications: ftpd, telnetd, ufsdump, tcsh, login, sendmail, quota, sh

Storage Requirements
- More data extracted (system calls + arguments) means more space
- Only during training, which can be done offline
- Small rule set vs. large database (stide, t-stide)
- e.g. for the tcsh application: 1.5 KB file for the set of rules (M-LERAD) vs. 5 KB for the sequence database (stide)

Summary of contributions
- Introduced argument information to model systems
- Enhanced LERAD to form rules with system calls as pivotal attributes
- LERAD with argument information detects more attacks than existing system call sequence-based algorithms (tide, stide, t-stide)
- The sequence + argument based system generally detected the most attacks at different false alarm rates
- Argument information alone can be used effectively to detect attacks at lower false alarm rates
- Lower memory requirements during detection compared to sequence-based techniques

Future Work More $$$$$$$$$$

Future Work
- A richer representation
- More attributes: time between subsequent system calls
- Anomaly score
- t-stide vs. LERAD

Thank You