Anomaly Detection of Web-based Attacks. Kruegel, C. and Vigna, G., University of California, Santa Barbara. The 10th ACM Conference on Computer and Communications Security (CCS 2003).

Similar presentations
Applications of one-class classification

Size-estimation framework with applications to transitive closure and reachability Presented by Maxim Kalaev Edith Cohen AT&T Bell Labs 1996.
Introduction to Computer Science 2 Lecture 7: Extended binary trees
CPSC 502, Lecture 15Slide 1 Introduction to Artificial Intelligence (AI) Computer Science cpsc502, Lecture 15 Nov, 1, 2011 Slide credit: C. Conati, S.
Fast Algorithms For Hierarchical Range Histogram Constructions
HMM II: Parameter Estimation. Reminder: Hidden Markov Model Markov Chain transition probabilities: p(S i+1 = t|S i = s) = a st Emission probabilities:
Chapter 11- Confidence Intervals for Univariate Data Math 22 Introductory Statistics.
An Introduction to Variational Methods for Graphical Models.
CSC 380 Algorithm Project Presentation Spam Detection Algorithms Kyle McCombs Bridget Kelly.
Hidden Markov Models Ellen Walker Bioinformatics Hiram College, 2008.
Hidden Markov Models Theory By Johan Walters (SR 2003)
Service Discrimination and Audit File Reduction for Effective Intrusion Detection by Fernando Godínez (ITESM) In collaboration with Dieter Hutter (DFKI)
Planning under Uncertainty
Models and Security Requirements for IDS. Overview The system and attack model Security requirements for IDS –Sensitivity –Detection Analysis methodology.
Hidden Markov Models Pairwise Alignments. Hidden Markov Models Finite state automata with multiple states as a convenient description of complex dynamic.
1 The scanning process Goal: automate the process Idea: –Start with an RE –Build a DFA How? –We can build a non-deterministic finite automaton (Thompson's.
Chapter 7 Sampling and Sampling Distributions
Ensemble Learning: An Introduction
Evaluating Hypotheses
Lecture 9 Hidden Markov Models BioE 480 Sept 21, 2004.
Aho-Corasick String Matching An Efficient String Matching.
Efficient Estimation of Emission Probabilities in profile HMM By Virpi Ahola et al Reviewed By Alok Datar.
Business Statistics: A Decision-Making Approach, 6e © 2005 Prentice-Hall, Inc. Chap 8-1 Business Statistics: A Decision-Making Approach 6 th Edition Chapter.
Part III: Inference Topic 6 Sampling and Sampling Distributions
Copyright © 2010 Pearson Education, Inc. Publishing as Prentice Hall Statistics for Business and Economics 7 th Edition Chapter 9 Hypothesis Testing: Single.
Computer vision: models, learning and inference Chapter 10 Graphical Models.
Experimental Evaluation
Statistics for Managers Using Microsoft® Excel 5th Edition
Time Series Data Analysis - II
©2003/04 Alessandro Bogliolo Background Information theory Probability theory Algorithms.
Intrusion and Anomaly Detection in Network Traffic Streams: Checking and Machine Learning Approaches ONR MURI area: High Confidence Real-Time Misuse and.
Chapter 10 Hypothesis Testing
Intrusion Detection Jie Lin. Outline Introduction A Frame for Intrusion Detection System Intrusion Detection Techniques Ideas for Improving Intrusion.
Chap 20-1 Statistics for Business and Economics, 6e © 2007 Pearson Education, Inc. Chapter 20 Sampling: Additional Topics in Sampling Statistics for Business.
PARAMETRIC STATISTICAL INFERENCE
BINF6201/8201 Hidden Markov Models for Sequence Analysis
2nd Joint Workshop between Security Research Labs in JAPAN and KOREA Profile-based Web Application Security System Kyungtae Kim High Performance.
Bayesian networks Classification, segmentation, time series prediction and more. Website: Twitter:
A Markov Random Field Model for Term Dependencies Donald Metzler W. Bruce Croft Present by Chia-Hao Lee.
Querying Structured Text in an XML Database By Xuemei Luo.
Join Synopses for Approximate Query Answering Swarup Achrya Philip B. Gibbons Viswanath Poosala Sridhar Ramaswamy Presented by Bhushan Pachpande.
Hidden Markov Models. Overview Markov models Hidden Markov models (HMM) Issues regarding HMM Algorithmic approach to issues of HMM.
Automatically Generating Models for Botnet Detection Presenter: 葉倚任 Authors: Peter Wurzinger, Leyla Bilge, Thorsten Holz, Jan Goebel, Christopher Kruegel,
Major objective of this course is: Design and analysis of modern algorithms Different variants Accuracy Efficiency Comparing efficiencies Motivation thinking.
1 CS 391L: Machine Learning: Experimental Evaluation Raymond J. Mooney University of Texas at Austin.
1 Swaddler: An Approach for the Anomaly-based Detection of State Violations in Web Application Marco Cova, Davide Balzarotti, Viktoria Felmetsger, and.
Confidence intervals and hypothesis testing Petter Mostad
BLAST: Basic Local Alignment Search Tool Altschul et al. J. Mol Bio CS 466 Saurabh Sinha.
© Copyright McGraw-Hill Correlation and Regression CHAPTER 10.
Chapter 11 Statistical Techniques. Data Warehouse and Data Mining Chapter 11 2 Chapter Objectives  Understand when linear regression is an appropriate.
Hidden Markovian Model. Some Definitions Finite automation is defined by a set of states, and a set of transitions between states that are taken based.
STATISTICS AND OPTIMIZATION Dr. Asawer A. Alwasiti.
Chapter 20 Classification and Estimation Classification – Feature selection Good feature have four characteristics: –Discrimination. Features.
Statistical Inference Statistical inference is concerned with the use of sample data to make inferences about unknown population parameters. For example,
Effective Anomaly Detection with Scarce Training Data Presenter: 葉倚任 Author: W. Robertson, F. Maggi, C. Kruegel and G. Vigna NDSS
Computational Biology, Part 3 Representing and Finding Sequence Features using Frequency Matrices Robert F. Murphy Copyright  All rights reserved.
ECE 8443 – Pattern Recognition ECE 8527 – Introduction to Machine Learning and Pattern Recognition Objectives: Statistical Significance Hypothesis Testing.
SAMPLING DISTRIBUTION OF MEANS & PROPORTIONS. SAMPLING AND SAMPLING VARIATION Sample Knowledge of students No. of red blood cells in a person Length of.
SAMPLING DISTRIBUTION OF MEANS & PROPORTIONS. SAMPLING AND SAMPLING VARIATION Sample Knowledge of students No. of red blood cells in a person Length of.
Design and Analysis of Algorithms (onlinedeeneislam.blogspot.com).
Hidden Markov Model Parameter Estimation BMI/CS 576 Colin Dewey Fall 2015.
Copyright © 2013 Pearson Education, Inc. Publishing as Prentice Hall Statistics for Business and Economics 8 th Edition Chapter 9 Hypothesis Testing: Single.
Chapter 6 Sampling and Sampling Distributions
Hidden Markov Models BMI/CS 576
LECTURE 33: STATISTICAL SIGNIFICANCE AND CONFIDENCE (CONT.)
Objective of This Course
Parametric Methods Berlin Chen, 2005 References:
Chapter 9 Hypothesis Testing: Single Population
Error Correction Coding
Some Key Ingredients for Inferential Statistics
Presentation transcript:

Anomaly Detection of Web-based Attacks. Kruegel, C. and Vigna, G., University of California, Santa Barbara. The 10th ACM Conference on Computer and Communications Security (CCS 2003). Presenter: Liaw, Yun

Outline
- Introduction
- Related Works
- Data Model
- Detection Models
  1. Attribute Length
  2. Attribute Character Distribution
  3. Structural Inference
  4. Token Finder
  5. Attribute Presence or Absence
  6. Attribute Order
- Evaluation
  1. Model Validation
  2. Detection Effectiveness
- Conclusions & Comments

Introduction (1/2) - Background
- Vulnerabilities of web servers:
  - They are accessible through corporate firewalls.
  - They are usually developed without a sound security methodology.
- Between April 2001 and March 2002, web-related attacks accounted for 23% of the total number of vulnerabilities disclosed.

Introduction (2/2) - The System
- An anomaly detection system for web-based attacks.
  - INPUT: the logs of the web server.
  - OUTPUT: an anomaly score for each web request.
- It analyzes the parameters of HTTP GET requests and compares them to profiles specific to the program being referenced.


Related Works (1/1)
- Anomaly detection systems rely on models of normal behavior and interpret deviations from "normal" behavior as attacks. Assumptions of this kind of system:
  - Attack patterns differ from normal behavior.
  - The difference can be expressed quantitatively.
- Techniques used include data mining, statistical analysis, and sequence analysis.
- Techniques that learn detection parameters from data (1): extract features that are useful for building intrusion classification models, then use labeled data to derive the best feature set for classification.

(1) W. Lee and S. Stolfo. A Framework for Constructing Features and Models for Intrusion Detection Systems, 2000.


Data Model (1/2)
- INPUT: an ordered set of URIs U = {u_1, u_2, …, u_m} extracted from successful GET requests.
- The composition of u_i:
  - the path to the desired resource (path_i),
  - an optional path information component (pinfo_i),
  - an optional query string (q).
- q = (a_1, v_1), (a_2, v_2), …, (a_n, v_n), where each a_i ∈ A (the set of all attributes) and each v_i is a string.
- S_q = {a_j, …, a_k} denotes the subset of attributes of query q.

Data Model (2/2)
- URIs that do not contain a query string are removed from U.
- U is partitioned into subsets U_r according to the resource path.
- The anomaly detection algorithms are run on each set of queries U_r.
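As a concrete illustration of this data model, here is a minimal Python sketch of the partitioning step; the function name and the use of urllib.parse are our own choices, not something the paper prescribes:

```python
from collections import defaultdict
from urllib.parse import urlparse, parse_qsl

def partition_queries(uris):
    """Partition the URIs of successful GET requests into the sets U_r,
    keyed by resource path, keeping only URIs that carry a query string."""
    partitions = defaultdict(list)
    for uri in uris:
        parsed = urlparse(uri)
        if not parsed.query:          # URIs without a query string are removed
            continue
        # q = (a_1, v_1), ..., (a_n, v_n): ordered attribute/value pairs
        query = parse_qsl(parsed.query, keep_blank_values=True)
        partitions[parsed.path].append(query)
    return partitions
```

Each resulting set U_r is then fed to the per-program detection models described next.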


Detection Models (1/2)
- A model is used to evaluate a certain feature of a query attribute, or of a query as a whole.
- The task of a model is to assign a probability to the input query or its attributes; a low probability indicates a potential attack.
- When the anomaly score exceeds the threshold determined during training, the query is marked as anomalous:

  Anomaly Score = Σ_m w_m · (1 − p_m)

  where w_m is the weight associated with model m and p_m is the probability the model returns.

Detection Models (2/2)
- Training phase:
  - Create a profile for each server-side program and each of its attributes.
  - Establish a suitable threshold: for each program and each of its attributes, store the highest anomaly score observed, then increase it by an adjustable percentage (usually 10%).
- Detection phase:
  - Calculate the anomaly score and report anomalous queries.
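A hedged sketch of how the two phases combine, following the weighted-sum scoring above; the model weights and the 10% slack are illustrative parameters, and all names are our own:

```python
def anomaly_score(model_probs, weights):
    """Models that return a low probability p_m contribute w_m * (1 - p_m)."""
    return sum(weights[m] * (1.0 - p) for m, p in model_probs.items())

def train_threshold(training_scores, slack=0.10):
    """Store the highest score seen during training and raise it by an
    adjustable percentage (10% here, as suggested on the slide)."""
    return max(training_scores) * (1.0 + slack)

def is_anomalous(model_probs, weights, threshold):
    """Detection phase: report the query when its score exceeds the threshold."""
    return anomaly_score(model_probs, weights) > threshold
```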


Attribute Length (1/3) - Learning
- Goal: approximate the unknown distribution of parameter lengths and detect instances that deviate from it.
- Calculate the sample mean μ and the sample variance σ² of the observed parameter lengths l_1, l_2, …, l_n.

Attribute Length (2/3) - Detection
- Use the Chebyshev inequality: p(|x − μ| > t) < σ² / t².
- Let t = |l − μ|, where l is the observed attribute length and μ is the sample mean.
- p(l) is then an upper bound on the probability that any length x "deviates more" than l does; as l grows, p(l) decreases.

Attribute Length (3/3)
- The bound computed by the Chebyshev inequality is weak, giving a high degree of tolerance to deviations.
- With this model, only obvious outliers are flagged as suspicious, leading to a reduced number of false alarms.
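A minimal sketch of the length model, assuming the Chebyshev bound above; class and method names are our own:

```python
class AttributeLengthModel:
    """Flags attribute lengths that deviate strongly from the training mean."""

    def train(self, lengths):
        n = len(lengths)
        self.mean = sum(lengths) / n
        self.var = sum((l - self.mean) ** 2 for l in lengths) / n

    def probability(self, length):
        # p(l) = sigma^2 / (l - mu)^2, capped at 1: only gross outliers
        # receive a small probability, matching the weak Chebyshev bound.
        deviation = (length - self.mean) ** 2
        if deviation == 0:
            return 1.0
        return min(1.0, self.var / deviation)
```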


Attribute Character Distribution (1/3)
- Analyze the relative frequencies of all 256 characters, sorted in descending order; for normal input the sorted frequencies decrease slowly.
- The model does not rely on the occurrence of any particular character.
- Malicious input shows sorted frequencies that either drop extremely fast or nearly not at all, e.g., a long sequence of 0x90 (NOP) bytes, or payloads obfuscated by XOR operations and character shifting.

Attribute Character Distribution (2/3) - Learning
- ICD (idealized character distribution): the perfectly normal distribution of an attribute's characters.
  - ICD(n) is the n-th highest relative character frequency.
  - e.g., for "passwd": ICD(0) = 0.33, ICD(1) to ICD(4) = 0.17, and ICD(5) to ICD(255) = 0.
- The ICD is calculated by storing the character distribution of each query attribute seen during training, then averaging them.

Attribute Character Distribution (3/3) - Detection
- Use the Pearson χ²-test as a goodness-of-fit test to check whether a query attribute is a sample drawn from the ICD:
  - Divide ICD(0) to ICD(255) into six (2) segments.
  1. Calculate the observed frequencies O_i (given) and the expected frequencies E_i (ICD segment mass × attribute length).
  2. Compute the χ²-value as χ² = Σ_i (O_i − E_i)² / E_i.
  3. Determine the degrees of freedom (5 in this case) and obtain the significance probability.

(2) K. Tan and R. Maxion. Why 6? Defining the Operational Limits of Stide, an Anomaly-Based Intrusion Detector, May 2002.
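A sketch of the detection step in Python. The six segment boundaries below are an assumption (the slide only says six segments are used), and a real implementation would convert the χ²-value into a significance probability with 5 degrees of freedom:

```python
from collections import Counter

# Assumed rank segments over the descending character frequencies.
SEGMENTS = [(0, 1), (1, 4), (4, 7), (7, 12), (12, 16), (16, 256)]

def chi_square_value(attribute, icd):
    """Pearson chi-square statistic of an attribute's sorted character
    distribution against the trained ICD (a list of 256 relative
    frequencies in descending order)."""
    data = attribute.encode()
    n = len(data)
    observed = sorted(Counter(data).values(), reverse=True)
    observed += [0] * (256 - len(observed))
    chi2 = 0.0
    for lo, hi in SEGMENTS:
        o = sum(observed[lo:hi])     # observed count in segment
        e = sum(icd[lo:hi]) * n      # expected count: ICD mass * length
        if e > 0:
            chi2 += (o - e) ** 2 / e
    return chi2
```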


Structural Inference (1/6)
- The structure of a parameter is the regular grammar that describes its normal values.
- Pitfalls in inferring that structure:
  - Over-simplification: the grammar can derive only the learned data.
  - Over-generalization: the grammar can generate all possible strings.
- Use a hidden Markov model and Bayesian probability to generalize the simplified grammar to a "reasonable" degree.

Structural Inference (2/6) - Learning
- Probabilistic grammar:
  - A grammar that assigns probabilities to each of its productions, i.e., some words are more likely to be produced than others.
  - Can be transformed into a non-deterministic finite automaton (NFA).
- The goal is to find the NFA that has the highest likelihood for the given training data:
  - Use a Bayesian technique (3) to derive the Markov model from empirical data.
  - Calculate P(Model|TrainingData).

(3) Andreas Stolcke and Stephen Omohundro. Hidden Markov Model Induction by Bayesian Model Merging, 1993.

Structural Inference (3/6) - Learning
- p(w), the probability of an output word (a sequence of symbols), is the sum of the probabilities of all paths through the automaton that produce it. For the word "ab" in the NFA of Figure 2:

  p(w) = Σ_paths Π_i p_{s_i}(o_i) · p(t_i)

  where o_i is an output symbol, p_{s_i}(o_i) is the probability of emitting o_i in state s_i, and p(t_i) is the probability of taking transition t_i.
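The path sum can be written directly as a recursion. The automaton encoding below (each state emits a symbol, then takes a transition; a non-emitting accept state) is our own simplification for illustration:

```python
def word_probability(word, start, accept, transitions, emissions):
    """p(w): sum over all paths of the product of emission probabilities
    p_s(o) and transition probabilities p(t). `transitions[s]` is a list
    of (next_state, prob) pairs; `emissions[s]` maps symbols to probs."""
    def walk(state, i):
        if i == len(word):
            return 1.0 if state == accept else 0.0
        p_emit = emissions.get(state, {}).get(word[i], 0.0)
        if p_emit == 0.0:
            return 0.0
        return sum(p_emit * p_trans * walk(nxt, i + 1)
                   for nxt, p_trans in transitions.get(state, []))
    return walk(start, 0)
```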

Structural Inference (4/6) - Learning
- Use Bayes' theorem to maximize P(Model|TrainingData) ∝ P(TrainingData|Model) · P(Model):
  - P(TrainingData) is treated as a scaling factor.
  - P(TrainingData|Model) is obtained by multiplying the probabilities of the individual training inputs under the automaton.
  - P(Model) should reflect that smaller models are preferred; it is computed from N (the total number of states) and, for each state S, the number of transitions and the number of emissions of S.

Structural Inference (5/6) - Learning
- Outcome of P(Model|TrainingData):
  - A simple model has a high P(Model) but a low P(TrainingData|Model).
  - A complex model has a high P(TrainingData|Model) but a low P(Model).

Structural Inference (6/6) - Learning and Detection
- Model building process:
  1. Start with an automaton that exactly reflects the input data.
  2. Keep merging states until the a posteriori probability no longer increases (using an optimization algorithm such as the Viterbi path approximation).
- Detection: a valid output may receive a small probability, since the probabilities of all words sum to 1. Therefore, if the word is a valid output of the model, return 1; otherwise, return 0.


Token Finder (1/2) - Learning
- Many attributes are drawn from a limited set of alternatives (enumerations): the number of different values such a parameter takes is bounded by some unknown threshold t.
- When the number of different values grows proportionally to the total number of attribute instances, a random value is indicated.
- Calculate the correlation ρ between two functions of x, the number of attribute instances processed so far (increasing from 1): f(x) = x, and g(x), which increases by one when the x-th value is new and decreases by one otherwise.

Token Finder (2/2) - Learning and Detection
- Outcome of ρ:
  - ρ > 0: mark the attribute as taking random values.
  - ρ < 0: mark the attribute as an enumeration.
- Detection:
  - If the attribute is marked as an enumeration, return 1 for a known value, otherwise 0.
  - If the attribute is marked as random, always return 1.
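A sketch of the token finder, with g(x) constructed as we read the paper's description (up one for a new value, down one for a repeated one); statistics.correlation requires Python 3.10+:

```python
from statistics import correlation

def classify_attribute(values):
    """Correlate f(x) = x with g(x); positive correlation suggests random
    values, negative correlation suggests an enumeration."""
    seen, g, gs = set(), 0, []
    for v in values:
        g += 1 if v not in seen else -1
        seen.add(v)
        gs.append(g)
    xs = list(range(1, len(values) + 1))
    rho = correlation(xs, gs)      # needs at least two observations
    return "random" if rho > 0 else "enumeration"
```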


Attribute Presence or Absence (1/1)
- Many hand-crafted attacks focus on a certain parameter and pay little attention to the others.
- The analysis is performed on the query as a whole.
- Learning: record each distinct subset S_q = {a_i, …, a_k} of attributes that is seen during training.
- Detection: for each query, look up its attribute subset; if it matches a recorded subset, return 1, otherwise 0.
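This model reduces to set bookkeeping. A minimal sketch (names are our own), where a query is the ordered list of (attribute, value) pairs from the data model:

```python
class AttributePresenceModel:
    """Returns 1 only for attribute sets already observed in training."""

    def __init__(self):
        self.known_sets = set()

    def train(self, query):
        self.known_sets.add(frozenset(a for a, _ in query))

    def probability(self, query):
        return 1.0 if frozenset(a for a, _ in query) in self.known_sets else 0.0
```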

Attribute Order (1/1)
- Server-side programs often present their parameters in the same relative order; even when some parameters are omitted, the order of the remaining ones is preserved.
- Learning:
  1. Process the queries into ordered lists of attributes.
  2. Build a directed graph G with one vertex per distinct attribute; for each pair (a_s, a_t) where a_s precedes a_t, insert an edge from v_s to v_t.
  3. Use Tarjan's algorithm (4) to identify strongly connected components and remove the cycles.
  4. Add all reachable vertex pairs to the set of order constraints O.
- Detection: if any attribute pair violates an element of O, return 0; otherwise, return 1.

(4) Robert Tarjan. Depth-First Search and Linear Graph Algorithms, 1972.
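A sketch of the order model; for brevity it replaces the Tarjan-based cycle removal with plain reachability checks (a pair reachable in both directions lies in a cycle and yields no constraint), which gives the same constraint set O at higher cost:

```python
from collections import defaultdict

def learn_order_constraints(queries):
    """Build the precedence graph and keep the one-way reachable pairs as
    the set of order constraints O."""
    graph = defaultdict(set)
    for query in queries:
        names = [a for a, _ in query]
        for i, a_s in enumerate(names):
            for a_t in names[i + 1:]:
                graph[a_s].add(a_t)          # a_s was observed before a_t

    def reachable(u, v):
        stack, seen = [u], set()
        while stack:
            node = stack.pop()
            if node == v:
                return True
            if node not in seen:
                seen.add(node)
                stack.extend(graph[node])
        return False

    nodes = set(graph) | {t for ts in graph.values() for t in ts}
    return {(u, v) for u in nodes for v in nodes
            if u != v and reachable(u, v) and not reachable(v, u)}

def order_probability(query, constraints):
    """Detection: 0 if any pair in the query violates a constraint in O."""
    names = [a for a, _ in query]
    for i, first in enumerate(names):
        for later in names[i + 1:]:
            if (later, first) in constraints:   # `later` should precede `first`
                return 0.0
    return 1.0
```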


Evaluation (1/1)
- Data sets: Apache log files from three sites:
  - Google, Inc. (access was restricted because of privacy issues),
  - University of California, Santa Barbara,
  - Technical University, Vienna.
- Both universities' logs were fully accessible.

Model Validation (1/2)
- The length of the training phase was set to 1000 queries for all of the following experiments.
- [Figure: number of queries per program on a logarithmic scale; one data set shows a long tail much higher than the other two.]

Model Validation (2/2)
- In Figures 3 and 4, most attributes have high probability values; the values drop from above 90% to below 1%.
- In Table 3, the number of queries is smaller than the number of attributes, because one query may contain many attributes.
- The Google data requests vary to the greatest extent, because search strings are included.


Detection Effectiveness (1/2)
- It is assumed that the training data contains no real attacks.
- The authors wanted to include the Nimda and Code Red worms, but Apache is unable to execute them (they target Microsoft IIS).
- Google has the highest number of alarms per day, because the system parameters were chosen for the university logs; its alarms are dominated by:
  - non-printable characters (probably because of incompatible character sets),
  - extremely long strings (such as URLs pasted directly into the search field).
- Several anomalous but not malicious queries appear in the two universities' logs (some users were testing the system); the false alarm rate is low.

Detection Effectiveness (2/2)
- Used 11 real-world exploits and Code Red:
  - 1 buffer overflow attack - phorum
  - 3 directory traversal attacks (../ attacks) - htmlscript
  - 2 XSS exploits - imp
  - 2 XSS exploits - csSearch
  - 3 input validation errors - Webwho
- No single model can raise an alert for all attacks.
- Reliance on web server logs is the limitation of the system; it produces few false alarms and is effective against many attacks that inject malicious payloads.


Conclusions (1/1)
- The first anomaly detection system for web-based attacks.
- It takes advantage of the correlation between a server-side program and the characteristics of its parameters.
- The parameter characteristics are learned from the input data.
- Future work: decreasing the number of false positives by refining the algorithms.

Comments (1/1)
- Similar to our system in being an anomaly detection system, but it detects deviations through content-based analysis.
- It provides several methods for analyzing log content.
- Machine learning techniques are quite useful for developing and enhancing an anomaly detection system.