Evolving Insider Threat Detection

Evolving Insider Threat Detection. Pallabi Parveen; advisor: Dr. Bhavani Thuraisingham. Dept. of Computer Science, University of Texas at Dallas. Funded by AFOSR.

Outline: Evolving Insider Threat Detection; Unsupervised Learning

Evolving Insider Threat Detection (architecture figure): System traces are gathered from the system log for week i and passed through feature extraction and selection. A learning algorithm (supervised: one-class SVM, OCSVM; unsupervised: graph-based anomaly detection, GBAD) builds a model, and ensemble-based stream mining maintains an ensemble of models. System traces from week i+1 go through the same feature extraction and selection and are tested against the ensemble to decide whether they are anomalous. The models are then updated (online learning).

Insider Threat Detection Using Graph-Based Unsupervised Learning

Outline (unsupervised learning): Insider Threat; Related Work; Proposed Method; Experiments & Results

Definition of an Insider: An insider is someone who exploits, or intends to exploit, their legitimate access to assets for unauthorised purposes.

Insider Threat Is a Real Threat: The 2001 Computer Crime and Security Survey recorded $377 million in financial losses due to attacks, and 49% reported incidents of unauthorized network access by insiders.

Insider Threat (continued): The two responses are detection and prevention. The detection-based approach uses unsupervised learning (graph-based anomaly detection) and ensemble-based stream mining.

Related Work: "Intrusion Detection Using Sequences of System Calls" (supervised learning, by Hofmeyr). "Mining for Structural Anomalies in Graph-Based Data Representations (GBAD) for Insider Threat Detection" (unsupervised learning, by Staniford-Chen and Lawrence Holder). All of these are static in nature and cannot learn from an evolving data stream.

Related approaches compared with the proposed solution (table). Columns: technique proposed by; supervised/unsupervised; challenges addressed (concept drift, insider threat, graph-based).
Forrest, Hofmeyr: Supervised; X; √
Masud, Fan (stream mining): N/A
Liu: Unsupervised
Holder (GBAD)
Our approach (EIT)

Why Unsupervised Learning? One approach to detecting insider threats is supervised learning, where models are built from labeled training data. However, only about 0.03% of the training data is associated with insider threats (the minority class), while 99.97% is associated with non-threat activity (the majority class). Unsupervised learning is an alternative that does not depend on labeled examples of the rare class.

Why Stream Mining? Existing approaches are static in nature and cannot learn from an evolving data stream. (Figure: a data stream divided into data chunks, showing the previous and current decision boundaries between normal and anomalous data, with the instances that are victims of concept drift marked.)

Proposed Method: Graph-based anomaly detection (GBAD, unsupervised learning) [2] + ensemble-based stream mining

GBAD Approach: Determine the normative pattern S using the SUBDUE minimum description length (MDL) heuristic, which minimizes M(S,G) = DL(G|S) + DL(S).

Unsupervised Pattern Discovery: Graph compression and the minimum description length (MDL) principle. The best graphical pattern S minimizes the sum of the description length of S and the description length of the graph G compressed with pattern S, where the description length DL(S) is the minimum number of bits needed to represent S (SUBDUE). Compression can be based on inexact matches to the pattern. (Figure: a graph with repeated instances of patterns S1 and S2.)
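The MDL selection rule above can be sketched with a toy calculation. This is only an illustration under loud assumptions: the description length is approximated as one unit per vertex and per edge (SUBDUE counts bits), each instance of a pattern is assumed to collapse into a single vertex when the graph is compressed, and the candidate patterns and graph sizes are made-up numbers.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    name: str
    vertices: int   # vertices in the pattern S
    edges: int      # edges in the pattern S
    instances: int  # non-overlapping occurrences of S in G


def description_length(vertices: int, edges: int) -> int:
    # Assumption: a crude stand-in for DL, one unit per vertex and per edge.
    return vertices + edges


def mdl_score(graph_vertices: int, graph_edges: int, cand: Candidate) -> int:
    # Compressing G with S: each instance of S collapses into one vertex.
    compressed_vertices = graph_vertices - cand.instances * (cand.vertices - 1)
    compressed_edges = graph_edges - cand.instances * cand.edges
    dl_g_given_s = description_length(compressed_vertices, compressed_edges)
    dl_s = description_length(cand.vertices, cand.edges)
    return dl_g_given_s + dl_s  # M(S,G) = DL(G|S) + DL(S)


# Hypothetical graph size and candidate patterns.
G_VERTICES, G_EDGES = 30, 40
candidates = [
    Candidate("S1", vertices=4, edges=3, instances=5),
    Candidate("S2", vertices=2, edges=1, instances=3),
]

scores = {c.name: mdl_score(G_VERTICES, G_EDGES, c) for c in candidates}
best = min(candidates, key=lambda c: mdl_score(G_VERTICES, G_EDGES, c))
print(scores, "->", best.name)
```

Under these assumptions the larger, more frequent pattern compresses the graph more and wins, which is the behaviour the MDL heuristic is meant to capture.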

Three Types of Anomalies: Three algorithms, one for each anomaly category, all based on graph compression and the minimum description length (MDL) principle. GBAD-MDL finds anomalous modifications. GBAD-P (Probability) finds anomalous insertions. GBAD-MPS (Maximum Partial Substructure) finds anomalous deletions.

Example of a graph with a normative pattern and the different types of anomalies. (Figure: a graph whose normative structure has vertices A, B, C, D, with one anomaly of each type marked: GBAD-P (insertion), GBAD-MPS (deletion), and GBAD-MDL (modification).)

Characteristics of a Data Stream: A continuous flow of data. Examples: network traffic, sensor data, call center records.

Data Stream Classification: The two options are single-model incremental classification and ensemble-model-based classification. The ensemble-based approach is more effective than the incremental approach.

Ensemble of Classifiers (figure): An input (x, ?) is passed to several classifiers; their individual outputs (+ or -) are combined by voting to produce the ensemble output.

Proposed Ensemble-Based Insider Threat Detection (EIT): Maintain K GBAD models, each with q normative patterns. Predict by majority voting. When updating the ensemble, always maintain K models by dropping the least accurate model.

Ensemble-Based Classification of Data Streams (unsupervised learning, GBAD): Build a model (with q normative patterns) from each data chunk, and keep the best K such models as the ensemble. (Figure, example for K = 3: data chunks D1 to D6 and models C1 to C5; the ensemble of models with normative patterns makes a prediction on the testing chunk and is then updated.)

EIT-U pseudocode
Ensemble(Ensemble A, test graph t, chunk S)
LABEL/TEST THE NEW MODEL:
1: Compute new model with q normative substructures using GBAD from S
2: Add new model to A
3: For each model M in A
4:   For each class/normative substructure q in M
5:     Results1 ← Run GBAD-P with test graph t and q
6:     Results2 ← Run GBAD-MDL with test graph t and q
7:     Results3 ← Run GBAD-MPS with test graph t and q
8:     Anomalies ← Parse results (Results1, Results2, Results3)
   End For
9: For each anomaly N in Anomalies
10:   If greater than half of the models agree
11:     Agreed Anomalies ← N
12:     Add 1 to incorrect values of the disagreeing models
13:     Add 1 to correct values of the agreeing models
UPDATE THE ENSEMBLE:
14: Remove model with lowest (correct/(correct + incorrect)) ratio
End Ensemble
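The voting and update steps of the pseudocode above can be sketched in Python. This is a minimal illustration, not the authors' implementation: GBAD itself is replaced by hypothetical stand-in detectors that simply return a fixed set of anomaly identifiers, the names Member and ensemble_step are invented here, and steps 12 and 13 are assumed to apply only to anomalies that reach a majority.

```python
from collections import Counter
from typing import Callable, List, Set

# Stand-in for a GBAD model: any function from a test graph to the set of
# anomalies it reports (hypothetical; real GBAD runs P, MDL and MPS).
Detector = Callable[[object], Set[str]]


class Member:
    def __init__(self, name: str, detect: Detector):
        self.name = name
        self.detect = detect
        self.correct = 0
        self.incorrect = 0

    def ratio(self) -> float:
        total = self.correct + self.incorrect
        return self.correct / total if total else 0.0


def ensemble_step(ensemble: List[Member], new_member: Member, test_graph) -> Set[str]:
    ensemble.append(new_member)                                  # steps 1-2
    reports = {m.name: m.detect(test_graph) for m in ensemble}   # steps 3-8
    votes = Counter(a for found in reports.values() for a in found)
    agreed: Set[str] = set()
    for anomaly, count in votes.items():                         # steps 9-13
        if count > len(ensemble) / 2:
            agreed.add(anomaly)
            for m in ensemble:
                if anomaly in reports[m.name]:
                    m.correct += 1
                else:
                    m.incorrect += 1
    worst = min(ensemble, key=Member.ratio)                      # step 14
    ensemble.remove(worst)
    return agreed


# Hypothetical detectors with fixed outputs, for K = 3.
ensemble = [
    Member("m1", lambda g: {"a1", "a2"}),
    Member("m2", lambda g: {"a1"}),
    Member("m3", lambda g: {"a3"}),
]
agreed = ensemble_step(ensemble, Member("m4", lambda g: {"a1", "a2"}), test_graph=None)
remaining = [m.name for m in ensemble]
print(agreed, remaining)
```

Here only a1 is reported by more than half of the four models, so m3, the one model that missed it, ends up with the lowest ratio and is dropped, leaving K = 3 models.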

Experiments: 1998 MIT Lincoln Laboratory dataset, 500,000+ vertices. K = 1, 3, 5, 7, 9 models; q = 5 normative substructures per model/chunk. 9 weeks of data; each chunk covers 1 week.

A sample system call record from the MIT Lincoln dataset:
header,150,2, execve(2),,Fri Jul 31 07:46:33 1998, + 652468777 msec
path,/usr/lib/fs/ufs/quota
attribute,104555,root,bin,8388614,187986,0
exec_args,1, /usr/sbin/quota
subject,2110,root,rjm,2110,rjm,280,272,0-0-172.16.112.50
return,success,0
trailer,150
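A record like the one above could be turned into the fields used later as features (time, user, machine IP, command, argument, path, return) roughly as follows. This is a sketch under assumptions: each token of the record is taken to sit on its own line, the function name parse_record is invented here, and the choice of which subject field represents the user is a guess rather than something the slides state.

```python
RECORD = """header,150,2, execve(2),,Fri Jul 31 07:46:33 1998, + 652468777 msec
path,/usr/lib/fs/ufs/quota
attribute,104555,root,bin,8388614,187986,0
exec_args,1, /usr/sbin/quota
subject,2110,root,rjm,2110,rjm,280,272,0-0-172.16.112.50
return,success,0
trailer,150"""


def parse_record(text: str) -> dict:
    # Index each token line by its leading keyword (header, path, ...).
    tokens = {}
    for line in text.splitlines():
        fields = [f.strip() for f in line.split(",")]
        tokens[fields[0]] = fields[1:]

    subject = tokens["subject"]
    return {
        "time": tokens["header"][4],
        "command": tokens["header"][2],
        # Assumption: the first subject field identifies the user.
        "user": subject[0],
        # The last subject field ends with the machine address.
        "machine_ip": subject[-1].rsplit("-", 1)[-1],
        "path": tokens["path"][0],
        # exec_args starts with an argument count, then the arguments.
        "arguments": tokens["exec_args"][1:],
        "return": tokens["return"][0],
    }


rec = parse_record(RECORD)
for key, value in rec.items():
    print(f"{key}: {value}")
```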

Token Sub-graph

Performance: total ensemble accuracy (table of true positives, false positives, and false negatives by number of models). Normal GBAD: 9 true positives, 920 false positives. False positives with the ensemble: K=3: 188; K=5: 180; K=7: 179; K=9: 150.

Performance (contd.): 0 false negatives. Significant decrease in false positives. As the number of models increases, false positives keep decreasing, but only slowly after K = 3.

Performance Contd.. Distribution of False Positives

Performance (contd.): summary of datasets A and B
Entry           Dataset A   Dataset B
User            Donaldh     William
# of vertices   269         1283
# of edges      556         469
Week            2-8         4-7
Day             Friday      Thursday

Performance (contd.), figures for fixed K = 6 on dataset A: the effect of q on TP rates; the effect of q on FP rates; the effect of q on runtime.

Performance (contd.), figures: true positives vs. number of normative substructures for fixed K = 6 on dataset A; the effect of K on runtime for fixed q = 4 on dataset A; the effect of K on TP rates for fixed q = 4 on dataset A.

Evolving Insider Threat Detection using Supervised Learning

Outline (supervised learning): Related Work; Proposed Method; Experiments & Results

Related approaches compared with the proposed solutions (table). Columns: technique proposed by; supervised/unsupervised; challenges addressed (concept drift, insider threat, graph-based).
Liu: Unsupervised; X; √
Holder (GBAD)
Masud, Fan (stream mining): Supervised; N/A
Forrest, Hofmeyr
Our approach (EIT-U)
Our approach (EIT-S)

Why One-Class SVM? Insider threat data is the minority class. Traditional support vector machines (SVMs) trained on such an imbalanced dataset are likely to perform poorly on test data, especially on the minority class. One-class SVMs (OCSVMs) address the rare-class issue by building a model from normal (i.e., non-threat) data only. During the testing phase, test data is classified as normal or anomalous based on its geometric deviation from the model.

Proposed Method: One-class SVM (OCSVM, supervised learning) + ensemble-based stream mining

One-Class SVM (OCSVM): Maps the training data into a high-dimensional feature space (via a kernel), then iteratively finds the maximal-margin hyperplane that best separates the training data from the origin. The hyperplane corresponds to the classification rule f(x) = <w, x> + b, where w is the normal vector and b is a bias term. For testing, if f(x) < 0 we label x as an anomaly; otherwise we label it as normal data.
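The classification rule can be illustrated with a few lines of Python. This sketch covers only the testing step and assumes a linear kernel, so that <w, x> is an ordinary dot product in input space; the values of w and b are hypothetical stand-ins for what training would produce.

```python
from typing import Sequence


def decision_value(w: Sequence[float], x: Sequence[float], b: float) -> float:
    # f(x) = <w, x> + b
    return sum(wi * xi for wi, xi in zip(w, x)) + b


def classify(w: Sequence[float], x: Sequence[float], b: float) -> str:
    # f(x) < 0 means x falls on the origin side of the hyperplane.
    return "anomaly" if decision_value(w, x, b) < 0 else "normal"


# Hypothetical trained parameters (not taken from the slides).
W = [0.5, 0.5]
B = -1.0

points = {
    "near_training_data": [2.0, 2.0],  # f = 1.0 + 1.0 - 1.0 = 1.0
    "near_origin": [0.2, 0.3],         # f = 0.1 + 0.15 - 1.0 = -0.75
}
labels = {name: classify(W, x, B) for name, x in points.items()}
print(labels)
```

With a non-linear kernel the same sign test applies, but the dot product is evaluated in the kernel's feature space rather than on the raw inputs.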

Proposed Ensemble-Based Insider Threat Detection (EIT): Maintain K OCSVM (one-class SVM) models. Predict by majority voting. When updating the ensemble, always maintain K models by dropping the least accurate model.

Ensemble-Based Classification of Data Streams (supervised learning): Divide the data stream into equal-sized chunks, train a classifier from each data chunk, and keep the best K OCSVM classifiers as the ensemble. This addresses both infinite length and concept drift. (Figure, example for K = 3: data chunks D1 to D6, labeled and unlabeled, with classifiers C1 to C5; the ensemble makes a prediction on the unlabeled chunk.)

EIT-S pseudocode (testing)
Algorithm 1: Testing
Input: A ← Build-initial-ensemble(); Du ← latest chunk of unlabeled instances
Output: prediction/label of Du
1: Fu ← Extract&Select-Features(Du)  // feature set for Du
2: for each xj ∈ Fu do
3:   Results ← NULL
4:   for each model M in A
5:     Results ← Results ∪ Prediction(xj, M)
     end for
6:   Anomalies ← Majority Voting(Results)
   end for
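The testing loop can be sketched as follows. The models here are hypothetical threshold rules on a single number, standing in for trained OCSVM classifiers, and feature extraction is skipped; only the per-instance majority vote follows the algorithm above.

```python
from collections import Counter
from typing import Callable, List

Model = Callable[[float], str]


def make_threshold_model(threshold: float) -> Model:
    # Hypothetical stand-in for a trained one-class SVM.
    def predict(x: float) -> str:
        return "anomaly" if x > threshold else "normal"
    return predict


def predict_chunk(ensemble: List[Model], chunk: List[float]) -> List[str]:
    predictions = []
    for x in chunk:                                       # line 2
        results = [model(x) for model in ensemble]        # lines 3-5
        label, _ = Counter(results).most_common(1)[0]     # line 6
        predictions.append(label)
    return predictions


ensemble = [make_threshold_model(t) for t in (5, 7, 9)]
predictions = predict_chunk(ensemble, [4, 8, 10])
print(predictions)
```

An odd K avoids ties between the two labels, which is one reason the examples in the slides use K = 3.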

EIT-S pseudocode (updating)
Algorithm 2: Updating the classifier ensemble
Input: Dn: the most recently labeled data chunk; A: the current ensemble of best K classifiers
Output: an updated ensemble A
1: for each model M ∈ A do
2:   Test M on Dn and compute its expected error
3: end for
4: Mn ← newly trained one-class SVM classifier (OCSVM) from data Dn
5: Test Mn on Dn and compute its expected error
6: A ← best K classifiers from Mn ∪ A based on expected error
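A minimal sketch of the update step, assuming the expected error is simply the fraction of misclassified instances in the labeled chunk. The ThresholdModel class is a hypothetical stand-in for a trained OCSVM, and the labeled chunk is made-up data.

```python
from typing import List, Tuple

LabeledChunk = List[Tuple[float, str]]


class ThresholdModel:
    # Hypothetical stand-in for a trained one-class SVM.
    def __init__(self, threshold: float):
        self.threshold = threshold

    def __call__(self, x: float) -> str:
        return "anomaly" if x > self.threshold else "normal"


def expected_error(model: ThresholdModel, chunk: LabeledChunk) -> float:
    # Assumption: expected error = fraction of misclassified instances.
    wrong = sum(1 for x, label in chunk if model(x) != label)
    return wrong / len(chunk)


def update_ensemble(ensemble: List[ThresholdModel], new_model: ThresholdModel,
                    chunk: LabeledChunk, k: int) -> List[ThresholdModel]:
    candidates = ensemble + [new_model]                               # Mn ∪ A
    ranked = sorted(candidates, key=lambda m: expected_error(m, chunk))
    return ranked[:k]                                                 # best K


# Hypothetical labeled chunk Dn and current ensemble A with K = 3.
chunk = [(1, "normal"), (3, "normal"), (6, "anomaly"), (9, "anomaly")]
current = [ThresholdModel(0), ThresholdModel(4), ThresholdModel(7)]
updated = update_ensemble(current, ThresholdModel(5), chunk, k=3)
kept = [m.threshold for m in updated]
print(kept)
```

The model with threshold 0 mislabels half of the chunk and is the one dropped, so the ensemble size stays at K.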

Feature set extracted: time, userID, machine IP, command, argument, path, return. Example feature vector: 1 1:29669 6:1 8:1 21:1 32:1 36:0
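The example vector looks like the sparse "label index:value" layout used by LIBSVM-style tools. Assuming that reading (the slides do not name the format), a line can be parsed as below; the function name parse_sparse_line is invented for this sketch.

```python
from typing import Dict, Tuple


def parse_sparse_line(line: str) -> Tuple[int, Dict[int, float]]:
    # Assumption: the first field is the class label, and the remaining
    # fields are index:value pairs; indices not listed are zero.
    label_text, *pairs = line.split()
    features: Dict[int, float] = {}
    for pair in pairs:
        index, value = pair.split(":")
        features[int(index)] = float(value)
    return int(label_text), features


label, features = parse_sparse_line("1 1:29669 6:1 8:1 21:1 32:1 36:0")
print(label, features)
```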

PERFORMANCE…..

Performance (contd.): updating vs. non-updating stream approach. False positives: 13774 vs. 24426. True negatives: 44362 vs. 33710. False negatives: 1. True positives: 9. Accuracy: 0.76 vs. 0.58. False positive rate: 0.24 vs. 0.42. False negative rate: 0.1.
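The reported rates follow from the confusion counts. The short check below recomputes them for the updating approach, assuming the single false negative and true positive values on the slide (1 and 9) belong to that column.

```python
def rates(tp: int, fn: int, fp: int, tn: int) -> dict:
    total = tp + fn + fp + tn
    return {
        "accuracy": (tp + tn) / total,
        "false_positive_rate": fp / (fp + tn),
        "false_negative_rate": fn / (fn + tp),
    }


# Counts from the slide, updating approach (TP and FN assignment assumed).
updating = rates(tp=9, fn=1, fp=13774, tn=44362)
rounded = {name: round(value, 2) for name, value in updating.items()}
print(rounded)
```

The rounded values match the slide: accuracy 0.76, false positive rate 0.24, and false negative rate 0.1.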

Performance (contd.): supervised (EIT-S) vs. unsupervised (EIT-U) learning on dataset A. False positives: 55 vs. 95. True negatives: 122 vs. 82. False negatives: 5. True positives: 12 vs. 7. Accuracy: 0.71 vs. 0.56. False positive rate: 0.31 vs. 0.54. False negative rate: 0.42. Summary of dataset A: user Donaldh; 189 records; weeks 2-7 (Friday only).

Conclusion & Future Work: Evolving insider threat detection using stream mining, with both unsupervised and supervised learning. Future work: misuse detection on mobile devices; cloud computing to improve processing time.

Publications (conference papers):
"Insider Threat Detection Using Stream Mining and Graph Mining," in Proc. of the Third IEEE International Conference on Information Privacy, Security, Risk and Trust (PASSAT 2011), October 2011, MIT, Boston, USA (full paper acceptance rate: 8%).
"Supervised Learning for Insider Threat Detection Using Stream Mining," in 23rd IEEE International Conference on Tools with Artificial Intelligence (ICTAI 2011), Nov. 2011, Boca Raton, Florida, USA (full paper acceptance rate: 30%).

References
W. Eberle and L. Holder, "Anomaly Detection in Data Represented as Graphs," Intelligent Data Analysis, vol. 11, no. 6, 2007.
http://ailab.wsu.edu/subdue
W. Ling Chen, Shan Zhang, Li Tu, "An Algorithm for Mining Frequent Items on Data Stream Using Fading Factor," COMPSAC (2) 2009, pp. 172-177.
S. A. Hofmeyr, S. Forrest, and A. Somayaji, "Intrusion Detection Using Sequences of System Calls," Journal of Computer Security, vol. 6, pp. 151-180, 1998.
M. Masud, J. Gao, L. Khan, J. Han, B. Thuraisingham, "A Practical Approach to Classify Evolving Data Streams: Training with Limited Amount of Labeled Data," Int. Conf. on Data Mining, Pisa, Italy, December 2010.

Thank You