Unsupervised Ensemble Based Learning for Insider Threat Detection

1 Unsupervised Ensemble Based Learning for Insider Threat Detection
Pallabi Parveen, Nate McDaniel, Varun S. Hariharan, Bhavani Thuraisingham and Latifur Khan Department of Computer Science at The University of Texas at Dallas

2 Outline
Insider Threat; LZW & Quantized Dictionary; Concept Drift; Experiments & Results.
I will present this outline in the context of unsupervised learning. First, I will explain what an insider threat is and how it can be detected. Second, I will cover some existing work on insider threats. Next, I will describe our proposed method for detecting insider threats. Finally, I will discuss our experiments and their outcomes.

3 Definition of an Insider
What is the problem? Attacks by people with legitimate access to an organization's computers and networks represent a growing problem in our digital world.
Definition of an insider: an insider is someone who exploits, or has the intention to exploit, their legitimate access to assets for unauthorised purposes.
Insiders today are not just employees. They can include contractors, business partners, auditors, and even an alumnus with a valid address. The term can also apply to an outside person who poses as an employee or officer by obtaining false credentials. An insider threat can be a malicious hacker (also called a cracker or a black hat). For example, over time, legitimate users may enter commands that read or write private data, or install malicious software.

4 Motivation
Computer Crime and Security Survey 2001: $377 million in financial losses due to attacks; 49% reported incidents of unauthorized network access by insiders.
WikiLeaks Breach Highlights Insider Security Threat: even the toughest security systems sometimes have a soft center that can be exploited by someone who has passed rigorous screening.
The 2001 Computer Crime and Security Survey found financial losses of more than 300 million dollars due to attacks. Among these attacks, 49% of the reported incidents were unauthorized access by insiders.

5 Challenges/Issues
Reduce the false alarm rate without sacrificing the threat detection rate. Threat detection is challenging because insiders mask and adapt their behavior to resemble legitimate system use.

6 Unsupervised Sequence Learning
Normal users produce repetitive sequences of commands, system calls, etc. A sudden deviation from normal behavior raises an alarm indicating an insider threat. To find an insider threat, we need to collect these repeated sequences of commands in an unsupervised fashion.
First challenge: variability in sequence length. Potential variations that could emerge within the data include the commencement of new events, the omission or modification of existing events, or the reordering of events in the sequence.
Overcome: generate an LZW dictionary of candidate patterns from the gathered data using the Lempel-Ziv-Welch (LZW) algorithm.
Second challenge: the huge size of the dictionary.
Overcome: compress the dictionary.

7 Ensemble of Models
Using an ensemble of models increases the accuracy of anomaly detection. Each new data chunk creates a new model.
Problem: the ensemble holds K models, and a new chunk makes K+1.
Solution: remove the least accurate model. Majority voting by all models is used to determine which model is performing the worst.
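The pruning step on this slide can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: it assumes each model is a callable that returns True for "anomaly", treats the majority vote of all K+1 candidates as the reference label, and drops the model that disagrees with that vote most often. The names `majority_vote` and `update_ensemble` and the toy models are hypothetical.

```python
from typing import Callable, List, Sequence

# Assumption: a model is any callable mapping an item to True ("anomaly") or False.
Model = Callable[[str], bool]


def majority_vote(models: Sequence[Model], item: str) -> bool:
    """Return True if a strict majority of the models flag the item."""
    votes = sum(1 for m in models if m(item))
    return votes * 2 > len(models)


def update_ensemble(ensemble: List[Model], new_model: Model,
                    chunk: Sequence[str], k: int) -> List[Model]:
    """Add new_model; if that gives more than k models, drop the one that
    disagrees most often with the majority vote on the given chunk."""
    candidates = list(ensemble) + [new_model]
    if len(candidates) <= k:
        return candidates
    disagreements = [
        sum(1 for item in chunk if m(item) != majority_vote(candidates, item))
        for m in candidates
    ]
    worst = max(range(len(candidates)), key=lambda i: disagreements[i])
    return [m for i, m in enumerate(candidates) if i != worst]


# Toy usage: three models agree, the fourth flags everything as anomalous.
a = lambda s: "x" in s
b = lambda s: "x" in s
c = lambda s: "x" in s
always_alarm = lambda s: True
kept = update_ensemble([a, b, c], always_alarm, ["ab", "cd", "xx"], k=3)
```

In the toy usage the always-alarming model disagrees with the majority on two of the three items, so it is the one removed. Ties between equally bad models are broken here by position, which is an arbitrary choice of this sketch.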

8 Unsupervised Sequence Learning
Technical Approach: Stream-based Sequence Learning for Insider Threat Detection
Diagram labels: System log; system call/command; Chunk i, Chunk i+1; gather data; index the system calls with Unicode; unsupervised sequence learning; generate an LZW Dictionary (D) containing all possible patterns using the Lempel-Ziv-Welch algorithm; compress the dictionary into a Quantized Dictionary (QD); online learning (incremental stream mining): update the previous QD, update models; testing on data from week i+1; anomaly?

9 Example of LZW & Quantized Dictionary
Unlabeled data stream: liftliftlifliftliftliftliftliftliftliftliftliftliftlift
LZW Dictionary: li, lif, lift, if, ift, iftl, ft, ftl, ftli, tl, tli, tlif
Quantized Dictionary (after lossy compression): lift
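How the LZW dictionary in this example arises can be illustrated with a short sketch of the standard LZW phrase-building loop. It assumes the stream is a plain string with one character per indexed command, and that single characters seed the dictionary; the function name `lzw_dictionary` is hypothetical. The lossy quantization step that reduces the dictionary to "lift" is not shown here.

```python
from typing import List


def lzw_dictionary(stream: str) -> List[str]:
    """Collect the multi-character phrases that standard LZW adds to its
    dictionary while scanning the stream, in order of insertion."""
    seen = set(stream)      # single symbols are known from the start
    phrases: List[str] = []
    w = ""                  # longest known phrase matched so far
    for c in stream:
        wc = w + c
        if wc in seen:
            w = wc          # keep extending the current match
        else:
            seen.add(wc)    # first time this phrase appears: record it
            phrases.append(wc)
            w = c           # restart matching from the current symbol
    return phrases


# A stream made of the repeated pattern "lift".
patterns = lzw_dictionary("lift" * 7)
```

On a stream of seven repetitions of "lift", this yields the same twelve patterns listed on the slide (li, if, ft, tl, lif, ftl, lift, tli, ift, tlif, ftli, iftl), in insertion order rather than grouped by prefix.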

10 Construct a Quantized Dictionary

11 Block Diagram of Incremental Learning
Diagram labels: Previous chunk: LZW Dictionary, Old Quantized Dictionary (OQD). New chunk (Session 1, Session 2, ..., Session n): LZW Dictionary. Compression: New Quantized Dictionary (NQD).

12 Anomaly Detection
Given a test data stream S and a quantized dictionary QD = {qd1, qd2, …}, an anomaly is a phrase/pattern in the stream that is more than α edit distance from all the patterns in QD.
Steps in identifying non-matching phrases:
Compute the edit distance matrix L for each phrase in the dictionary and the data stream S.
If the edit distance is within α, delete the matching part from the stream S.
The patterns remaining in the stream S are considered anomalies.
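The matching steps above can be sketched as follows. This is only an illustration under stated assumptions: it uses Levenshtein distance as the edit distance, and it compares each dictionary pattern against a same-length window of the stream while scanning left to right, which is a simplification chosen for this sketch rather than the method from the slides. The names `edit_distance` and `find_anomalies` are hypothetical.

```python
from typing import List, Sequence


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance computed row by row with dynamic programming."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (ca != cb)))  # substitution or match
        prev = cur
    return prev[-1]


def find_anomalies(stream: str, qd: Sequence[str], alpha: int) -> List[str]:
    """Delete stream windows within alpha of some QD pattern and return
    the leftover segments, which are treated as anomalies."""
    residue: List[str] = []
    current: List[str] = []
    patterns = sorted(qd, key=len, reverse=True)  # try longer patterns first
    i = 0
    while i < len(stream):
        matched = 0
        for p in patterns:
            window = stream[i:i + len(p)]
            # Windows cut short by the end of the stream never match.
            if len(window) == len(p) and edit_distance(window, p) <= alpha:
                matched = len(p)
                break
        if matched:
            if current:                  # close the open anomalous segment
                residue.append("".join(current))
                current = []
            i += matched                 # "delete" the matching part
        else:
            current.append(stream[i])    # symbol stays in the residue
            i += 1
    if current:
        residue.append("".join(current))
    return residue


anomalies = find_anomalies("liftliftxyzqlift", ["lift"], alpha=0)
```

Here the two leading and one trailing "lift" occurrences are removed and "xyzq" is left over as the anomalous segment. Raising `alpha` makes matching more tolerant, so fewer symbols remain in the residue.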

13 Concept Drift
User command patterns shift over time, e.g. a programmer slowly evolves into an advanced programmer. Changes in users' habits should not be identified as anomalies; such natural changes are attributed to concept drift. Concept drift can be added artificially, and anomalies are still detected.

14 Drift Formula

15 Drift Example
drift = [.7071, 1.1180, 1.5811, 1.5811, 1.5811]
drift = [.7071, , , , ]
Min/Max distributions = [.42929/.57071, /.31180, 0/.25811, 0/.25811, 0/.25811]

16 Compared Algorithms
Modified Naïve Bayes that uses an incremental approach (NB-INC)*
Unsupervised ensemble approach (USSL-GG) that incrementally tests for anomalies and performs best with an ensemble size of 3
(*) R. A. Maxion, "Masquerade detection using enriched command lines," in Proc. IEEE International Conference on Dependable Systems & Networks (DSN), 2003, pp. 5–14.

17 Results - Definitions

18 Results TPR FPR Accuracy Time(sec) Drift NB-INC USSL-GG 0.000001 0.34
0.49 0.12 0.10 0.80 0.85 0.44 0.47 52.0 3.60 0.36 0.58 0.09 0.79 0.87 0.50 0.54 50.8 3.54 0.0001 0.37 0.51 0.11 0.82 0.86 0.45 51.0 3.55 0.001 0.38 0.81 53.4

19 TPR – Ensemble Size 3

20 FPR – Ensemble Size 3

21 TPR – Various Drift Values

22 FPR– Various Drift Values

23 Accuracy– Various Drift Values

26 Accomplishment so far
Ensemble-based stream mining effectively detects insider threats while coping with evolving concept drift. Our approach adopts advantages from stream mining, compression and ensembles:
Compression gives unsupervised learning.
Stream mining offers adaptive learning.
Ensembles increase accuracy under concept drift.

27 Comparison of Related Approaches
Un/Supervised Drift Insider Threat Sequence Ju S N Y Maxion Liu U Wang Szymanski Masud Parveen USSL-GG

28 What remains to be accomplished?
Update existing models based on user feedback.
Update and refine models on ground truth when it is available.

29 THANKS

