Modeling IDS using hybrid intelligent systems Peddabachigari et al. (2007) Presenter: Andy Tang
Background Intrusion detection system (IDS): 2nd line of defense, after authentication / encryption. Two main types: Misuse intrusion – well-defined patterns of attack, encoded in advance. Anomaly intrusion – deviation from baseline behaviors, approximated by differences. Machine learning (ML) paradigm of IDS: model intrusion detection as a binary classification task with input features from various sensors. Models: Neural nets (NN), Genetic programming (GP), Fuzzy logic (FL), Decision trees (DT), Support vector machines (SVM)
General IDS model
Sequence of events E(1) – E(2) – … – E(n) ⇒ classification @ E(n)
Rules (statistical or rule-based) updated based on recent behaviors
Updates baseline behaviors
Some limitations: A misuse-detection paradigm; requires complementary anomaly detection. Rule-based (expert system), limited representational power w.r.t. transition patterns. Does not scale to complex (i.e., non-linear) sequences of behaviors. Depends on quality and quantity of pre-defined rules.
ML style of IDS
Goal: represent characteristics of intrusion behavior by adaptive behavioral “models”. End-to-end pipeline from input data streams to classification rules. Direct feedback from empirical performance to adjust extracted patterns and model parameters.
Neural Nets (NN) Inputs: a sliding window of w previous records. Output: classification rule. Learning mechanism: backpropagation via vector-Jacobian products. “Hidden layers” are intermediate transformed feature spaces ≈ behavioral patterns.
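A minimal sketch (not from the paper) of how a stream of connection records could be turned into sliding-window inputs for such a neural net; the record features, window size `w`, and helper name `sliding_windows` are illustrative assumptions:

```python
# Sketch: build sliding-window training examples for a neural-net IDS.
# Each example concatenates the w most recent records; the label is the
# class of the newest event E(n). Names and toy data are hypothetical.

def sliding_windows(records, labels, w):
    """Return (window, label) pairs; window = records[i-w+1 .. i] flattened."""
    examples = []
    for i in range(w - 1, len(records)):
        window = [feat for rec in records[i - w + 1 : i + 1] for feat in rec]
        examples.append((window, labels[i]))
    return examples

# Toy stream: 2 features per record, binary labels (1 = intrusion).
records = [[0.1, 0.0], [0.2, 0.1], [0.9, 0.8], [0.8, 0.9]]
labels  = [0, 0, 1, 1]
pairs = sliding_windows(records, labels, w=2)
# Each input now has w * 2 = 4 features, ready for an MLP classifier.
```

The window turns a per-record classifier into one that can condition on short behavioral sequences, matching the E(1)…E(n) view above.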
Fuzzy logic (FL) Inputs: set of rules, operators, and knowledge base (i.e., user-specified graph dependencies). Output: classification rule. Learning mechanism: linear combinations of pre-defined rules on transformed features.
Support Vector Machines (SVM) Inputs: feature space. Output: classification rule. Learning mechanism: linear combination of support vectors. Deals with nonlinearity in the decision space by transforming the feature dimensions (kernel trick). More on this later.
SVM Background
SVM Background e = vector of 1’s Q is SPD, also called the “kernel” matrix. SVM Background
Kernel Trick
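A small sketch of the kernel trick: a degree-2 polynomial kernel k(x, z) = (x·z + 1)² equals the inner product of explicit quadratic feature maps φ(x)·φ(z), so the SVM never materializes the higher-dimensional space. The helper names here are illustrative:

```python
import math

def poly_kernel(x, z):
    """Degree-2 polynomial kernel: (x . z + 1)^2."""
    return (sum(a * b for a, b in zip(x, z)) + 1) ** 2

def phi(x):
    """Explicit feature map matching the kernel above, for 2-D input."""
    x1, x2 = x
    s = math.sqrt(2)
    return [1, s * x1, s * x2, x1 * x1, x2 * x2, s * x1 * x2]

x, z = [1.0, 2.0], [3.0, 0.5]
lhs = poly_kernel(x, z)                              # kernel in input space
rhs = sum(a * b for a, b in zip(phi(x), phi(z)))     # dot product in feature space
# lhs == rhs up to floating-point error; here both equal 25.0
```

The Q matrix in the dual is built from exactly such kernel evaluations, which is why only k, not φ, needs to be computed.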
SVM Cost Sample complexity and QP-solver runtime for support vector selection differ with sparse vs. dense feature spaces. Key assumption in the paper: kernel selection (and support vector computation) is counted separately from the “runtime” of the SVM during training / testing!
Decision Trees (DT) Inputs: joint feature and output space. Output: a set of decision rules in the joint space. Learning mechanism: weighted combination of decision rules. User pre-defines the max number of leaves, leading to fixed complexity. Already an ensemble system!
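To make the decision-rule idea concrete, here is a toy sketch (not the paper's learner) of fitting a one-level tree, a decision stump, by exhaustive search over (feature, threshold) splits; real DT learners grow such splits recursively up to the user-set leaf limit:

```python
# Sketch: learn a decision stump by minimizing misclassification error.
# Function name and toy data are hypothetical illustrations.

def best_stump(X, y):
    best = None  # (error, feature, threshold, left_label, right_label)
    for f in range(len(X[0])):
        for t in sorted({row[f] for row in X}):
            for left in (0, 1):
                right = 1 - left
                err = sum(
                    (left if row[f] <= t else right) != label
                    for row, label in zip(X, y)
                )
                if best is None or err < best[0]:
                    best = (err, f, t, left, right)
    return best

# Toy data: feature 0 separates the classes perfectly at threshold 2.0.
X = [[1.0, 5.0], [2.0, 4.0], [8.0, 1.0], [9.0, 2.0]]
y = [0, 0, 1, 1]
err, f, t, left, right = best_stump(X, y)
```

Each internal node of a full tree is exactly such a threshold rule, which is why a depth/leaf cap directly bounds model complexity.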
Decision tree example
Others: Evolutionary computation (EC) and genetic programming (GP). Hybrid system: DT-SVM.
Hybrid DT-SVM model
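A hedged sketch of the hybrid idea: the decision tree's output for each record is appended to the original feature vector, and the augmented vector is fed to the SVM. The stand-in models `tree_predict` and `svm_predict` below are toy placeholders, not the paper's trained models:

```python
# Sketch of a DT-SVM hybrid: DT output becomes an extra SVM input feature.
# Both base models are hypothetical hand-set stand-ins for trained ones.

def tree_predict(x):
    """Toy stand-in for a trained decision tree (binary output)."""
    return 1 if x[0] > 5.0 else 0

def svm_predict(x):
    """Toy stand-in for a trained linear SVM on the augmented features."""
    score = 0.5 * x[0] - 0.2 * x[1] + 2.0 * x[-1] - 2.5
    return 1 if score > 0 else 0

def hybrid_dt_svm(x):
    augmented = x + [tree_predict(x)]   # append DT output as a feature
    return svm_predict(augmented)
```

The SVM can then learn when to trust or override the tree's vote, rather than the two models being combined by a fixed rule.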
Experiments KDD Cup 1999 data: 5 million connection records, 24 attack types categorized into 4 classes of intrusions: Denial of service (DoS): sequence of behaviors leading to overload of resources for service. Remote to user (R2L): static attack sending packets over the network to gain access to vulnerable nodes. User to root (U2R): static attack with access to a normal user account, gains root access via system vulnerabilities. Probing: sends sequential information over the network to identify new vulnerabilities.
SVM vs. DT vs. DT-SVM Experiment design: Input features: 41 attributes per connection (e.g., content features, no. of failed logins). Labels: multilabel (1 per class of attack) for decision trees; multiclass (5 classes) for SVM and DT-SVM. “Ensemble” = DT-SVM + SVM + DT. Train-test split over 11,982 records (5,092 train, 6,890 test). Kernel selection for SVM on training sets; polynomial kernel (p = 2) selected.
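One plausible way to combine the DT, SVM, and DT-SVM outputs per record is a simple majority vote; the paper's exact combination rule is not reproduced here, so this is an illustrative assumption:

```python
from collections import Counter

# Sketch: majority vote over the three base models' class predictions
# for a single connection record. Purely illustrative.

def majority_vote(preds):
    """Return the most common class label among the base models' votes."""
    return Counter(preds).most_common(1)[0][0]

# One record, votes from (DT, SVM, DT-SVM):
decision = majority_vote([1, 0, 1])   # two of three vote "intrusion"
```

With an odd number of voters, ties cannot occur for binary labels, which is one practical reason to use exactly three base models.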
DT vs. SVM performance
Comparison w/ DT-SVM and ensemble Key observations: Performance gain comes almost exclusively from the DT. No statistical significance was investigated: 1 train-test split per task, no variance of performance reported. Remark: DT is already an ensemble approach. Lack of a systematic approach for picking ensemble combinations.
Moving Forward: Addition of Transfer Learning?
Discussion 1) It is interesting to consider how this ensemble combination (DT-SVM) would fare against other ensemble combinations: decision tree with neural nets (DT-NN), SVM-NN, RBF-SVM with Linear-SVM, etc. Do you think there is a general rule for choosing ensemble pairs (or feature / kernel spaces and loss functions) that best complement each other? 2) What feasibility issues (e.g., cost, runtime issues) may occur when trying to implement this ensemble IDS in CAVs? 3) Why do you think the U2R attack predictions had the lowest performance across all the learning models? Can you think of other classes of attacks similar to U2R that may be difficult for the proposed method to identify correctly?
Q & A