Anomaly Detection of Web-based Attacks
Kruegel, C. and Vigna, G., University of California, Santa Barbara
The 10th ACM Conference on Computer and Communications Security (CCS), 2003
Presenter: Liaw, Yun
Outline
Introduction
Related Works
Data Model
Detection Models
  1. Attribute Length
  2. Attribute Character Distribution
  3. Structural Inference
  4. Token Finder
  5. Attribute Presence or Absence
  6. Attribute Order
Evaluation
  1. Model Validation
  2. Detection Effectiveness
Conclusions & Comments
Introduction (1/2) - Background
Vulnerabilities of web servers:
- Accessible through corporate firewalls
- Usually developed without a sound security methodology
- Between April 2001 and March 2002, web-related attacks accounted for 23% of the total number of vulnerabilities disclosed.
Introduction (2/2) - The System
An anomaly detection system for web-based attacks.
INPUT: logs of the web server
OUTPUT: an anomaly score for each web request
The system analyzes the parameters of HTTP GET requests and compares them against profiles associated with the specific server-side program being referenced.
Related Works (1/1)
Anomaly detection systems:
- Rely on models of normal behavior and interpret deviations from "normal" behavior as possible attacks.
- Assumptions: attack patterns differ from normal behavior, and the difference can be expressed quantitatively.
- Techniques include data mining, statistical analysis, and sequence analysis.
Techniques that learn detection parameters from data (1):
- Extract features that are useful for building intrusion classification models.
- Use labeled data to derive the best feature set for classification.
(1) W. Lee and S. Stolfo. A Framework for Constructing Features and Models for Intrusion Detection Systems, 2000.
Data Model (1/2)
INPUT: an ordered set of URIs U = {u_1, u_2, ..., u_m} extracted from successful GET requests.
Each u_i is composed of:
- the path to the desired resource (path_i)
- an optional path information component (pinfo_i)
- an optional query string (q)
q = (a_1, v_1), (a_2, v_2), ..., (a_n, v_n), where a_i ∈ A (the set of all attributes) and v_i is a string.
S_q = {a_j, ..., a_k} denotes the subset of attributes used in query q.
Data Model (2/2)
URIs that do not contain a query string are removed from U.
U is partitioned into subsets U_r according to the resource path.
The anomaly detection algorithms are run on each set of queries U_r (see the sketch below).
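A minimal sketch of this preprocessing step (not the authors' code), using Python's standard urllib.parse; the example URIs and program names are hypothetical:

```python
from urllib.parse import urlsplit, parse_qsl
from collections import defaultdict

def partition_queries(uris):
    """Group query attribute lists by resource path, mirroring the data model:
    URIs without a query string are dropped, the rest are split into
    (attribute, value) pairs and partitioned into U_r by resource path."""
    partitions = defaultdict(list)            # resource path r -> list of queries U_r
    for uri in uris:
        parts = urlsplit(uri)
        if not parts.query:                   # drop URIs with no query string
            continue
        attrs = parse_qsl(parts.query)        # list of (attribute, value) pairs
        partitions[parts.path].append(attrs)
    return partitions

# Hypothetical example: two requests to the same server-side program
U = ["/cgi-bin/login.pl?user=alice&passwd=secret",
     "/cgi-bin/login.pl?user=bob&passwd=hunter2",
     "/index.html"]
print(partition_queries(U))
```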
Detection Models (1/2)
A model is used to evaluate a certain feature of a query attribute, or of a query as a whole.
The task of a model is to assign a probability to the input query or its attributes; a low probability indicates a potential attack.
When the anomaly score exceeds the threshold determined during training, the query is marked as anomalous. The score is computed as
Anomaly Score = Σ_m w_m · (1 − p_m),
where w_m is the weight associated with model m and p_m is the probability returned by that model.
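A small sketch of this weighted scoring step under the formula above (the model names and the example threshold are illustrative, not from the paper):

```python
def anomaly_score(model_probs, weights):
    """Weighted anomaly score: sum over models of w_m * (1 - p_m)."""
    return sum(weights[m] * (1.0 - p) for m, p in model_probs.items())

# Hypothetical example with three models and equal weights
probs = {"length": 0.9, "char_dist": 0.8, "token": 1.0}
weights = {name: 1.0 for name in probs}
score = anomaly_score(probs, weights)
threshold = 0.5                       # placeholder; learned per program during training
is_anomalous = score > threshold
```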
Detection Models (2/2)
Training phase:
- Create a profile for each server-side program and each of its attributes.
- Establish a suitable threshold: for each program and its attributes, store the highest anomaly score seen during training and raise it by an adjustable percentage (typically 10%).
Detection phase:
- Calculate the anomaly score of each query and report the anomalous ones.
Attribute Length (1/3) - Learning
Goal: approximate the unknown distribution of parameter lengths and detect instances that deviate from it.
Learning: calculate the sample mean μ and the sample variance σ² of the observed parameter lengths l_1, l_2, ..., l_n.
Attribute Length (2/3) - Detection
Detection uses the Chebyshev inequality. Let t = |l − μ|, where l is the observed attribute length and μ is the sample mean:
p(|x − μ| > |l − μ|) < σ² / (l − μ)²
p(l) is the probability that any length x deviates from the mean by more than l does; as l grows, p(l) decreases.
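A minimal sketch of this length model, assuming the Chebyshev bound is simply capped at 1; the class name and training values are hypothetical:

```python
class AttributeLengthModel:
    """Length model: learn mean/variance, score new lengths with the Chebyshev bound."""

    def fit(self, lengths):
        n = len(lengths)
        self.mu = sum(lengths) / n
        self.var = sum((l - self.mu) ** 2 for l in lengths) / n

    def probability(self, length):
        t_sq = (length - self.mu) ** 2
        if t_sq == 0:
            return 1.0
        # Chebyshev bound on seeing a deviation at least this large, capped at 1
        return min(1.0, self.var / t_sq)

model = AttributeLengthModel()
model.fit([5, 6, 7, 6, 5, 8])       # hypothetical training lengths
p = model.probability(400)          # a 400-character value receives a very low probability
```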
Attribute Length (3/3)
The bound computed by the Chebyshev inequality is weak, which gives a high degree of tolerance to deviations.
With this model only obvious outliers are flagged as suspicious, leading to a reduced number of false alarms.
Attribute Character Distribution (1/3)
The model analyzes the relative frequencies of all 256 characters, sorted in descending order.
For normal input, the sorted frequencies decrease slowly; the model does not rely on the occurrence of any particular character.
For malicious input, the sorted frequencies either drop off extremely fast or hardly drop at all, e.g., a long sequence of 0x90 (NOP) bytes, or payloads obfuscated by XOR operations and character shifting.
Attribute Character Distribution (2/3) - Learning
ICD (Idealized Character Distribution): a perfectly normal distribution of an attribute's characters.
ICD(n) is the n-th highest relative character frequency.
E.g., for "passwd": ICD(0) = 0.33 (the letter 's'), ICD(1) through ICD(4) = 0.17, and ICD(5) through ICD(255) = 0.
The ICD is calculated by storing the sorted character distribution of each query attribute seen during training and averaging them.
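A minimal sketch of ICD learning along these lines; the helper names and the training values are made up for illustration:

```python
from collections import Counter

def sorted_char_distribution(value):
    """Relative byte frequencies of `value`, sorted descending and padded to 256 bins."""
    data = value.encode("latin-1", "replace")
    counts = Counter(data)
    freqs = sorted((c / len(data) for c in counts.values()), reverse=True)
    return freqs + [0.0] * (256 - len(freqs))

def learn_icd(training_values):
    """Average the sorted distributions of all training samples to get the ICD."""
    dists = [sorted_char_distribution(v) for v in training_values]
    n = len(dists)
    return [sum(d[i] for d in dists) / n for i in range(256)]

icd = learn_icd(["passwd", "secret", "hunter2"])   # hypothetical training values
```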
Attribute Character Distribution (3/3) - Detection
Detection uses the Pearson χ²-test as a goodness-of-fit test to check whether a query attribute is a sample drawn from the ICD.
ICD(0) to ICD(255) is divided into six (2) segments.
χ²-test:
1. Calculate the observed frequencies O_i (from the attribute) and the expected frequencies E_i (ICD × length of the attribute).
2. Compute the χ² value as χ² = Σ_i (O_i − E_i)² / E_i.
3. Determine the degrees of freedom (5 in this case) and obtain the significance probability.
(2) K. Tan and R. Maxion. Why 6? Defining the Operational Limits of Stide, an Anomaly-Based Intrusion Detector, May 2002.
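A sketch of the detection step, reusing sorted_char_distribution() from the ICD sketch above; the exact segment boundaries below are an assumption for illustration, not necessarily the ones used in the paper:

```python
# Segment boundaries over the sorted character ranks (assumed for illustration)
SEGMENTS = [(0, 1), (1, 4), (4, 7), (7, 12), (12, 16), (16, 256)]

def chi_square_value(attribute_value, icd):
    """Pearson chi-square statistic comparing an attribute's sorted character
    distribution against the learned ICD."""
    length = len(attribute_value)
    observed_dist = sorted_char_distribution(attribute_value)   # helper from the ICD sketch
    chi2 = 0.0
    for lo, hi in SEGMENTS:
        observed = sum(observed_dist[lo:hi]) * length   # observed count O_i in this segment
        expected = sum(icd[lo:hi]) * length             # expected count E_i = ICD * length
        if expected > 0:
            chi2 += (observed - expected) ** 2 / expected
    return chi2   # look up the significance probability at 5 degrees of freedom
```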
Structural Inference (1/6)
The structure of a parameter is the regular grammar that describes its normal values.
Two extremes when inferring that structure:
- Over-simplification: the grammar can only derive exactly the learned data.
- Over-generalization: the grammar can generate all possible strings.
A hidden Markov model and Bayesian probability are used to generalize the simplified grammar to a "reasonable" degree.
Structural Inference (2/6) - Learning
Probabilistic grammar: a grammar that assigns probabilities to each of its productions, i.e., some words are more likely to be produced than others.
It can be transformed into a non-deterministic finite automaton (NFA).
The goal is to find the NFA that has the highest likelihood for the training data, using a Bayesian technique (3) to derive the Markov model from empirical data, i.e., maximizing P(Model | TrainingData).
(3) Andreas Stolcke and Stephen Omohundro. Hidden Markov Model Induction by Bayesian Model Merging, 1993.
Structural Inference (3/6) - Learning
p(w): the probability of an output word w (a sequence of symbols), computed as the sum of the probabilities of all paths through the automaton that emit w. For the word "ab" in the Figure 2 NFA:
p(w) = Σ_paths Π_i p_{s_i}(o_i) · p(t_i)
where o_i is an output symbol, p_{s_i}(o_i) is the probability of emitting o_i in state s_i, and p(t_i) is the probability of taking transition t_i.
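A small sketch of this word-probability computation; the automaton data structure and the two-state example are hypothetical (they are not the paper's Figure 2 automaton):

```python
def word_probability(word, start_state, automaton):
    """Sum over all paths of the product of emission and transition probabilities.
    Each automaton entry pairs the state's emission probabilities with one
    outgoing transition; reaching the end of the word counts as acceptance in
    this simplified sketch."""
    def walk(state, remaining):
        if not remaining:
            return 1.0
        symbol, rest = remaining[0], remaining[1:]
        total = 0.0
        for emissions, p_trans, next_state in automaton.get(state, []):
            p_emit = emissions.get(symbol, 0.0)
            if p_emit > 0.0:
                total += p_emit * p_trans * walk(next_state, rest)
        return total
    return walk(start_state, word)

# Hypothetical two-state automaton that emits 'a' and then 'b' or 'c'
nfa = {
    "s0": [({"a": 1.0}, 1.0, "s1")],
    "s1": [({"b": 0.5, "c": 0.5}, 1.0, "end")],
}
print(word_probability("ab", "s0", nfa))   # 0.5
```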
Structural Inference (4/6) - Learning
Bayes' theorem is used to maximize P(Model | TrainingData) ∝ P(TrainingData | Model) · P(Model); P(TrainingData) acts only as a scaling factor.
P(TrainingData | Model): obtained by aggregating the probability of each training input under the candidate automaton.
P(Model): a prior that should reflect that smaller models are preferred. It depends on N, the total number of states, and, for each state S, on S_trans, the number of transitions in S, and S_emit, the number of emissions in S.
Structural Inference (5/6) - Learning
The two factors of P(Model | TrainingData) compete:
- A simple model has a high prior P(Model) but a low likelihood P(TrainingData | Model).
- A complex model has a high likelihood P(TrainingData | Model) but a low prior P(Model).
Structural Inference (6/6) - Learning and Detection
Model building process:
1. Start with an automaton that exactly reflects the input data.
2. Keep merging states until the a posteriori probability no longer increases (using optimization techniques such as the Viterbi path approximation).
Detection: even a valid output word may receive a small probability, since the probabilities of all words sum to 1. Therefore, if the word is a valid output of the model, return 1; otherwise return 0.
Token Finder (1/2) - Learning
Many attributes are drawn from a limited set of alternatives (enumerations); for such attributes, the number of distinct parameter values is bounded by some unknown threshold t.
If the number of distinct values keeps growing proportionally to the total number of attribute instances, a random value is indicated instead.
To decide between the two cases, calculate the correlation ρ between two functions over the training instances: f(x) = x, where x increases from 1, and a function g that increases when the x-th value is new and decreases otherwise.
Token Finder (2/2) - Learning and Detection
Outcome of ρ:
- ρ > 0: mark the attribute as a random value.
- ρ < 0: mark the attribute as an enumeration.
Detection:
- If an attribute is marked as an enumeration, return 1 for values seen during training and 0 otherwise.
- If an attribute is marked as a random value, always return 1.
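A sketch of this classifier under the f/g construction described above; the class name and the training data are illustrative, and statistics.correlation requires Python 3.10+ and non-constant inputs:

```python
import statistics

class TokenFinder:
    """Classify an attribute as enumeration or random value, then score queries."""

    def fit(self, values):
        seen, f, g, g_val = set(), [], [], 0
        for x, v in enumerate(values, start=1):
            g_val += 1 if v not in seen else -1   # g grows on new values, shrinks on repeats
            seen.add(v)
            f.append(x)                           # f(x) = x
            g.append(g_val)
        rho = statistics.correlation(f, g)        # Pearson correlation
        self.is_enumeration = rho < 0
        self.known_values = seen

    def probability(self, value):
        if not self.is_enumeration:
            return 1.0                            # random values: always considered normal
        return 1.0 if value in self.known_values else 0.0

tf = TokenFinder()
tf.fit(["en", "de", "en", "fr", "de", "en"])      # hypothetical enumeration attribute
print(tf.probability("ru"))                       # 0.0: unseen token for an enumeration
```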
Attribute Presence or Absence (1/1)
Many hand-crafted attacks focus on one particular parameter and pay little attention to the others.
This analysis is performed on the query as a whole.
Learning: record each distinct subset S_q = {a_i, ..., a_k} of attributes that is seen during training.
Detection: for each query, look up its attribute subset; if it matches a recorded subset, return 1, otherwise return 0 (see the sketch below).
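A minimal sketch of this presence/absence check (the class name and the example queries are hypothetical):

```python
class AttributePresenceModel:
    """Record the distinct attribute sets seen during training, then test membership."""

    def fit(self, training_queries):
        # Each query is a list of (attribute, value) pairs; only the attribute names matter here.
        self.known_subsets = {frozenset(a for a, _ in q) for q in training_queries}

    def probability(self, query):
        return 1.0 if frozenset(a for a, _ in query) in self.known_subsets else 0.0

model = AttributePresenceModel()
model.fit([[("user", "alice"), ("passwd", "x")],
           [("user", "bob"), ("passwd", "y")]])   # hypothetical training queries
print(model.probability([("user", "eve")]))       # 0.0: the 'passwd' attribute is missing
```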
Attribute Order (1/1)
Server-side programs often impose the same relative parameter order; even when some parameters are omitted, the order of the remaining ones is preserved.
Learning:
1. Build the set of order constraints O as follows.
2. Construct a directed graph G with one vertex per distinct attribute; for each attribute pair (a_s, a_t) where a_s precedes a_t in a training query, insert an edge from v_s to v_t.
3. Use Tarjan's algorithm (4) to identify strongly connected components and remove the cycles.
4. Add all reachable vertex pairs to O.
Detection: if any attribute pair in the query violates an element of O, return 0; otherwise return 1 (a simplified sketch follows).
(4) Robert Tarjan. Depth-First Search and Linear Graph Algorithms, 1972.
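A deliberately simplified sketch of this model: it records observed precedence pairs and flags queries that contain a reversed pair. The cycle removal via Tarjan's SCC algorithm and the reachability closure are omitted for brevity, so this is not the paper's full procedure:

```python
from itertools import combinations

class AttributeOrderModel:
    """Learn attribute precedence pairs, then flag queries that reverse them."""

    def fit(self, training_queries):
        self.order = set()                       # set O of (earlier, later) attribute pairs
        for query in training_queries:
            attrs = [a for a, _ in query]
            for a_s, a_t in combinations(attrs, 2):
                self.order.add((a_s, a_t))

    def probability(self, query):
        attrs = [a for a, _ in query]
        for a_s, a_t in combinations(attrs, 2):
            # Violation: the learned order only ever saw a_t before a_s, never the reverse
            if (a_t, a_s) in self.order and (a_s, a_t) not in self.order:
                return 0.0
        return 1.0

model = AttributeOrderModel()
model.fit([[("user", "alice"), ("passwd", "x"), ("lang", "en")]])   # hypothetical
print(model.probability([("passwd", "x"), ("user", "alice")]))      # 0.0: order reversed
```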
Evaluation (1/1)
Data sets: Apache log files from
- Google, Inc. (access was restricted because of privacy issues)
- University of California, Santa Barbara
- Technical University of Vienna
Both universities' logs were fully accessible.
Model Validation (1/2)
The length of the training phase was set to 1,000 queries for all following experiments.
[Figure, plotted on a logarithmic scale: one data set shows a long tail that is much higher than the other two.]
Model Validation (2/2)
In Figures 3 and 4: most attributes receive a high probability value, and the curves drop from above 90% to below 1%.
In Table 3: the number of queries is smaller than the number of attributes, because one query may contain many attributes.
The Google requests vary to the greatest extent, because the search string is included as an attribute.
Detection Effectiveness (1/2)
It is assumed that the training data contains no real attacks.
The authors wanted to include the Nimda and Code Red worms, but Apache is unable to execute them.
Google has the highest number of alarms per day, because the system parameters were chosen for the university logs; its alarms are caused by
- non-printable characters (probably because of incompatible character sets), and
- extremely long strings (such as URLs pasted directly).
The two universities' logs contain several anomalous but not malicious queries (e.g., users testing the system).
Overall, the false alarm rate is low.
Detection Effectiveness (2/2)
Eleven real-world exploits plus Code Red were used:
- 1 buffer overflow attack (phorum)
- 3 directory traversal attacks ("../" attacks) (htmlscript)
- 2 XSS exploits (imp)
- 2 XSS exploits (csSearch)
- 3 input validation errors (Webwho)
No single model raises alerts for all of the attacks.
The reliance on web server logs is the main limitation of the system.
The system produces few false alarms and is effective against many attacks that inject malicious payloads.
Conclusions (1/1)
The first anomaly detection system targeted at web-based attacks.
It takes advantage of the correlation between a server-side program and the characteristics of its parameters.
The parameter characteristics are learned from input data.
Future work: decreasing the number of false positives by refining the algorithms.
Comments (1/1)
Like our system, this is an anomaly detection system, but it detects deviations through content-based analysis.
It provides several methods for analyzing log content.
Machine learning techniques are quite useful for developing and enhancing an anomaly detection system.