
1 Detection of Masqueraders Based on Graph Partitioning of File System Access Events
Flavio Toffalini, Ivan Homoliak, Athul Harilal, Alexander Binder, and Martín Ochoa
ST Electronics-SUTD Cyber Security Laboratory
Workshop on Research for Insider Threats, 39th IEEE Symposium on Security and Privacy. San Francisco, USA, May 24th, 2018.

2 SUTD - CorpLab
A laboratory co-founded by:
- Singapore University of Technology and Design (SUTD)
- ST Electronics
- National Research Foundation (NRF)
One of its projects deals with insider threats.

3 SUTD - CorpLab
- TWOS: A Dataset of Malicious Insider Threat Behavior Based on a Gamified Competition
- Insight into Insiders: A Survey of Insider Threat Taxonomies, Analysis, Modeling, and Countermeasures
- Others about host detection and continuous authentication

4 Agenda
- Our goal
- Attacker model
- Previous approaches
- Intuition
- Markov Cluster Chain
- Overview
- Implementation
- Qualitative evaluation
- Efficiency Analysis

5 Our Goal
We aim to catch masqueraders by using an anomaly detection system.
[Figure label: Masquerader]

6 Attacker Model
What is a masquerader?
[Figure labels: Masquerader, Legitimate]

7 Attacker Model
What is a masquerader?
- He is any (malicious) user who acts on behalf of a legitimate user
- He has already bypassed previous system access controls
- He can exfiltrate data
- He can sabotage the machine

8 Previous approaches
Scenarios (logs): file systems, network, login/logout
Feature extraction + machine learning
Alternative detection: “raw data” + deep learning

9 Intuition
- Legitimate tasks follow patterns
- Masquerader tasks are different
- A task generates events (e.g., file system access)
- Modeling events as graphs

10 Intuition
Users’ file system access:
Timestamp          Action     File
05/05/ :10:05..    Open       C:\File1
05/05/ :10:06..    Read       C:\File2
05/05/ :10:07..
05/05/ :11:13..    Close
05/05/ :12:05..    Delete
05/05/ :12:10..
05/05/ :13:11..    List Dir   C:\Temp\
05/05/ :13:21..    Create     C:\Temp\ToExfiltrate

11 Markov Cluster Graph
[Figure: graph of file paths, with the label "Vertex Cluster"]
C:\User\Alice\Documents\Project1.doc
C:\User\Alice\Documents\Project2.doc
C:\User\Alice\Documents\Project3.doc
C:\User\Alice\Documents\Administration\..

12 Markov Cluster Graph
What does a Vertex Cluster mean?
A Vertex Cluster is a set of resources (i.e., files) used to achieve a task.

13 Overview
similarityFunction(H, ES) -> [0, 1]
-> 0: ES is not so similar to the history H
-> 1: ES is very similar to the history H
History H: built from file access logs previously generated
Event Sequence ES: new list of file access logs

14 Overview
similarityFunction(H, ES) > t, where t is a threshold
- true: ES is legitimate
- false: ES is malicious
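The decision rule on the two Overview slides can be sketched in Python as follows. The function and parameter names are illustrative and not taken from the talk; the similarity score is assumed to have been computed already and to lie in [0, 1].

```python
def is_legitimate(similarity: float, threshold: float) -> bool:
    """Apply the rule similarityFunction(H, ES) > t.

    `similarity` stands for the already computed similarityFunction(H, ES);
    `threshold` is t. Both names are hypothetical.
    """
    if not 0.0 <= similarity <= 1.0:
        raise ValueError("similarity must lie in [0, 1]")
    # Strictly greater than t means legitimate; anything else is flagged.
    return similarity > threshold


if __name__ == "__main__":
    print(is_legitimate(0.9, 0.5))  # similar to the history
    print(is_legitimate(0.2, 0.5))  # dissimilar, flagged as malicious
```

Note that a score exactly equal to t is treated as malicious here, because the slide uses a strict inequality.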

15 Implementation
similarityFunction(…)
Markov Cluster Graph

16 Implementation
- History
- Similarity functions (we tried different approaches)

17 History
History of a user U: file system logs

18 History
History of a user U: time windows

19 History
History of a user U: graphs

20 History
History of a user U: Markov Cluster Graph
History: VC1, VC2 (a set of Vertex Clusters)

21 Implementation
similarityFunction(HU, ES) -> [0, 1]
History HU
Event Sequence ES: still a list of events

22 Implementation
Split ES into time windows (as for the history), then extract Vertex Clusters.
Event Sequence ESW: the events of ES in time window W
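As a rough illustration of the windowing step, the sketch below splits a list of events into consecutive fixed-length time windows. The event layout (timestamp in seconds, action, file path), the function name, and the choice of non-overlapping windows aligned to the first event are all assumptions; the slides do not say how windows are aligned or whether they overlap.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Hypothetical event layout: (timestamp in seconds, action, file path).
Event = Tuple[float, str, str]


def split_into_windows(events: List[Event], window_seconds: float) -> List[List[Event]]:
    """Group events into consecutive, non-overlapping time windows.

    Windows start at the earliest event; empty windows are skipped.
    """
    if window_seconds <= 0:
        raise ValueError("window_seconds must be positive")
    if not events:
        return []
    ordered = sorted(events, key=lambda event: event[0])
    start = ordered[0][0]
    buckets: Dict[int, List[Event]] = defaultdict(list)
    for event in ordered:
        index = int((event[0] - start) // window_seconds)
        buckets[index].append(event)
    return [buckets[index] for index in sorted(buckets)]


if __name__ == "__main__":
    log = [
        (0.0, "Open", r"C:\File1"),
        (10.0, "Read", r"C:\File2"),
        (35.0, "Close", r"C:\File1"),
        (95.0, "Delete", r"C:\File2"),
    ]
    for window in split_into_windows(log, window_seconds=30.0):
        print(window)
```

Each returned window would then be turned into a graph and clustered, a step this sketch does not cover.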

23 Similarity Function Inside
similarityFunction(HU, ESW) -> [0, 1]
History HU, Event Sequence ESW
How to compare H and ES?

24 Similarity Function Inside
History HU, Event Sequence ESW
- History: a set of Vertex Clusters
- Event Sequence: a set of Vertex Clusters

26 Similarity Function Inside
History HU, Event Sequence ESW
We built 7 similarity functions based on set comparison operators (e.g., equal, subset, superset).

27 Similarity Function Inside
Just an example, the simplest:
SimilaryByEqual(HU, ESW) {
    m <- 0
    for all s in ESW do
        for all h in HU do
            if s == h then
                m <- m + 1
                break
    return m / |ESW|
}
Idea: count how many elements of ESW are contained in HU.
Other, more complex versions are in the paper.
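The equality-based similarity function on this slide translates to Python roughly as follows. Representing a Vertex Cluster as a frozenset of file paths, the function name, and returning 0 for an empty window are assumptions made for this sketch, not details given in the talk.

```python
from typing import FrozenSet, Iterable

# Assumption: a Vertex Cluster is modeled as a frozenset of file paths.
VertexCluster = FrozenSet[str]


def similarity_by_equal(history: Iterable[VertexCluster],
                        window_clusters: Iterable[VertexCluster]) -> float:
    """Fraction of window clusters that appear verbatim in the history."""
    clusters = list(window_clusters)
    if not clusters:
        return 0.0  # assumption: an empty window scores 0
    history_set = set(history)
    # Set membership replaces the inner loop with its early break.
    matches = sum(1 for cluster in clusters if cluster in history_set)
    return matches / len(clusters)


if __name__ == "__main__":
    history = [frozenset({"a.doc", "b.doc"}), frozenset({"c.doc"})]
    window = [frozenset({"a.doc", "b.doc"}), frozenset({"x.doc"})]
    print(similarity_by_equal(history, window))
```

Using a set for the history gives the same count as the nested loops, since each window cluster contributes at most one match.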

28 Similarity Function Inside
Something more complex:
SimilaryBySubsetWeight(HU, ESW) {
    m <- 0
    n <- 0
    for all s in ESW do
        for all h in HU do
            if s isSubset h then
                m <- m + |s|/|h|
                n <- n + 1
    m’ <- m/n
    return m’ / |ESW|
}
Annotations on the slide: sum of element ratios; normalization.
Idea 2.0: we propose a weighted sum between HU and ESW.
Other, more complex versions are in the paper.
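A Python sketch of the subset-weighted function follows. It takes one literal reading of the slide's pseudocode, in which the averaging step (m/n) happens once after both loops; the original indentation is lost in the transcript, so the paper remains the authoritative source. The frozenset representation, the function name, and the handling of empty inputs are assumptions.

```python
from typing import FrozenSet, Iterable

# Assumption: a Vertex Cluster is modeled as a frozenset of file paths.
VertexCluster = FrozenSet[str]


def similarity_by_subset_weight(history: Iterable[VertexCluster],
                                window_clusters: Iterable[VertexCluster]) -> float:
    """Average |s|/|h| over all pairs with s a subset of h, divided by |ESW|."""
    # Assumption: empty history clusters are skipped to avoid dividing by zero.
    history_list = [h for h in history if h]
    clusters = list(window_clusters)
    if not clusters:
        return 0.0  # assumption: an empty window scores 0
    ratio_sum = 0.0   # m in the slide
    match_count = 0   # n in the slide
    for s in clusters:
        for h in history_list:
            if s <= h:  # s is a subset of h
                ratio_sum += len(s) / len(h)
                match_count += 1
    if match_count == 0:
        return 0.0  # assumption: no subset relation at all scores 0
    mean_ratio = ratio_sum / match_count  # m' in the slide
    return mean_ratio / len(clusters)


if __name__ == "__main__":
    history = [frozenset({"a", "b"}), frozenset({"a", "b", "c", "d"})]
    window = [frozenset({"a", "b"})]
    print(similarity_by_subset_weight(history, window))
```

The weight |s|/|h| rewards window clusters that cover a large share of a historical cluster, so a small subset of a large historical cluster contributes less than an exact match.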

29 Evaluation
It is hard to find a good dataset:
- WUIL: The Windows-Users and -Intruder simulations Logs dataset
- TWOS: A Dataset of Malicious Insider Threat Behavior Based on a Gamified Competition
Both contain file system access logs.

30 WUIL Dataset
- Legitimate user activities: from real users
- Masquerader user activities: synthetic
- For each user: 3 sessions of malicious logs, 5 min long each (around 15 min in total)
- Around 70 users in total

31 TWOS Dataset
- Legitimate user activities: from real users
- Masquerader user activities: from real users too
- For each user: 1 session of malicious logs, 1 hour long
- Around 20 users in total
- User behaviors from a gamified experiment

32 Setting
- WUIL: time windows of 30s, 1m, 2m
- TWOS: time windows of 10m, 20m, 30m
- 5-fold cross validation

33 Setting
WUIL and TWOS have labeled data.
User U: Legitimate, Masquerader

34 Setting
WUIL and TWOS, for each user U:
- History (training set): only legitimate
- Test (test set): masquerade and legitimate

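35 Setting
WUIL and TWOS: 5-fold cross validation
The legitimate data of user U is split into folds 1 to 5; M is the masquerade data.
Some folds are used for making the History; the remaining legitimate data and M are used for making the Test.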
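One way to realize this per-user setup is sketched below: the legitimate items are split into k folds, each fold is held out in turn, the other folds form the History, and the held-out fold plus all masquerade items form the Test. The function name, the interleaved fold assignment, and the reuse of every masquerade item in every round are assumptions; the slides do not specify how folds are drawn.

```python
from typing import Any, Iterator, List, Tuple

# Each test item is paired with a flag: True means masquerade.
LabeledItem = Tuple[Any, bool]


def k_fold_splits(legitimate: List[Any],
                  masquerade: List[Any],
                  k: int = 5) -> Iterator[Tuple[List[Any], List[LabeledItem]]]:
    """Yield (history, test) pairs for one user."""
    if k < 2:
        raise ValueError("k must be at least 2")
    # Assumption: folds are formed by interleaving, not by shuffling.
    folds = [legitimate[i::k] for i in range(k)]
    for held_out in range(k):
        history = [item
                   for index, fold in enumerate(folds) if index != held_out
                   for item in fold]
        test = [(item, False) for item in folds[held_out]]
        test += [(item, True) for item in masquerade]
        yield history, test


if __name__ == "__main__":
    for history, test in k_fold_splits(list(range(10)), ["m1", "m2"]):
        print(len(history), test)
```

Keeping masquerade data out of the History matches the training set described as "only legitimate".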

36 Results
- Area Under the Curve (AUC)
- Receiver Operating Characteristic Curve (ROC)
- Best Configuration (threshold)
- Efficiency Analysis (time)

37 Area Under the Curve
On average per user. Sim. function used: Subset OR Superset w/ weight
WUIL (2m) 0.851 TWOS (30m)

38 Receiver Operating Characteristic Curve
On average per user. Sim. function used: Subset OR Superset w/ weight

39 Best Configuration
WUIL dataset (real legitimate user activities + synthetic attacks):
                    True Positive Ratio    False Positive Ratio
Our results         95%                    9%
Previous results    91.5%                  11.81%
TWOS dataset (real legitimate user activities + real attacks):
                    True Positive Ratio    False Positive Ratio
Our results         91%                    11%
Previous results    No previous results
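For reference, the two ratios reported here can be computed from per-window similarity scores and ground-truth labels as sketched below. The function name and inputs are illustrative; a window is counted as flagged when its score is not strictly above the threshold, mirroring the rule similarityFunction(H, ES) > t for legitimate sequences.

```python
from typing import Sequence, Tuple


def detection_rates(scores: Sequence[float],
                    is_masquerade: Sequence[bool],
                    threshold: float) -> Tuple[float, float]:
    """Return (true positive ratio, false positive ratio).

    A positive is a masquerade window; a window is flagged as malicious
    when its similarity score is not above the threshold.
    """
    if len(scores) != len(is_masquerade):
        raise ValueError("scores and labels must have the same length")
    true_pos = false_pos = positives = negatives = 0
    for score, malicious in zip(scores, is_masquerade):
        flagged = not (score > threshold)
        if malicious:
            positives += 1
            true_pos += int(flagged)
        else:
            negatives += 1
            false_pos += int(flagged)
    tpr = true_pos / positives if positives else 0.0
    fpr = false_pos / negatives if negatives else 0.0
    return tpr, fpr


if __name__ == "__main__":
    scores = [0.9, 0.8, 0.2, 0.1, 0.6]
    labels = [False, False, False, True, True]
    print(detection_rates(scores, labels, threshold=0.5))
```

Sweeping the threshold over [0, 1] and plotting the resulting pairs gives the ROC curve from which a best configuration can be picked.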

40 Efficiency Analysis
How expensive is the Markov Cluster Graph algorithm?
Mean time to analyze an ES (from a list of logs to a vertex cluster):
- WUIL: 0.015s (2 min TW)
- TWOS: 0.016s (30 min TW)

41 Future works
- Reducing the False Positive Ratio
- Developing auto-tune techniques
- Trying different types of logs (network, SQL queries, HTTP logs)

42 THANK YOU

