1
Detection of Masqueraders Based on Graph Partitioning of File System Access Events
Flavio Toffalini, Ivan Homoliak, Athul Harilal, Alexander Binder, and Martín Ochoa
ST Electronics-SUTD Cyber Security Laboratory
39th IEEE Symposium on Security and Privacy, Workshop on Research for Insider Threats. San Francisco, USA, May 24th, 2018.
2
SUTD - CorpLab
It is a laboratory co-founded by:
- Singapore University of Technology and Design (aka SUTD)
- ST Electronics
- National Research Foundation (aka NRF)
One of its projects deals with insider threats.
3
SUTD - CorpLab
- TWOS: A Dataset of Malicious Insider Threat Behavior Based on a Gamified Competition
- Insight into Insiders: A Survey of Insider Threat Taxonomies, Analysis, Modeling, and Countermeasures
- Others about host detection and continuous authentication
4
Agenda
- Our goal
- Attacker model
- Previous approaches
- Intuition
- Markov Cluster Chain
- Overview
- Implementation
- Qualitative evaluation
- Efficiency analysis
5
Our Goal
We aim to catch masqueraders by using an anomaly detection system.
(Figure label: Masquerader)
6
Attacker Model
What is a masquerader?
(Figure labels: Masquerader, Legitimate)
7
Attacker Model
What is a masquerader?
A masquerader is any (malicious) user who acts on behalf of a legitimate user:
- He has already bypassed the previous system access controls
- He can exfiltrate data
- He can sabotage the machine
8
Previous approaches
- Scenarios / logs: file systems, network, login/logout, …
- Detection: features extraction + machine learning
- Alternative detection: “raw data” + deep learning
9
Intuition
- Legitimate tasks follow patterns
- Masquerader tasks are different
- A task generates events (e.g., file system access)
- Modeling events as graphs
10
Intuition
Users’ file system access:
Timestamp          Action     File
05/05/ :10:05..    Open       C:\File1
05/05/ :10:06..    Read       C:\File2
05/05/ :10:07..
05/05/ :11:13..    Close
05/05/ :12:05..    Delete
05/05/ :12:10..
05/05/ :13:11..    List Dir   C:\Temp\
05/05/ :13:21..    Create     C:\Temp\ToExfiltrate
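To make "modeling events as graphs" concrete, here is a minimal sketch. The slides do not state how edges are defined, so the rule used here (each accessed path is a vertex, and two paths accessed one after the other are linked) is an assumption, and `build_access_graph` and the sample records are hypothetical names and data.

```python
from collections import defaultdict

# Hypothetical (action, path) records in access order; timestamps are omitted.
EVENTS = [
    ("Open", "C:\\File1"),
    ("Read", "C:\\File2"),
    ("List Dir", "C:\\Temp\\"),
    ("Create", "C:\\Temp\\ToExfiltrate"),
]


def build_access_graph(events):
    """Return an undirected weighted graph as {vertex: {neighbour: weight}}.

    Assumption: every accessed path is a vertex, and two paths accessed
    consecutively are connected; the weight counts how often that happens.
    """
    graph = defaultdict(lambda: defaultdict(int))
    paths = [path for _action, path in events]
    for path in paths:
        graph[path]  # touching the key keeps isolated vertices in the graph
    for left, right in zip(paths, paths[1:]):
        if left != right:  # ignore repeated accesses to the same path
            graph[left][right] += 1
            graph[right][left] += 1
    return {vertex: dict(nbrs) for vertex, nbrs in graph.items()}


access_graph = build_access_graph(EVENTS)
```

A graph in this adjacency-dictionary form can then be handed to a clustering step.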
11
Markov Cluster Graph
C:\User\Alice\Documents\Project1.doc
C:\User\Alice\Documents\Project3.doc
C:\User\Alice\Documents\Project2.doc
C:\User\Alice\Documents\Administration\.. .
(Figure label: Vertex Cluster)
12
Markov Cluster Graph
What does a Vertex Cluster mean?
A set of resources (i.e., files) used to achieve a task.
(Figure label: Vertex Cluster)
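The slides do not spell out how vertex clusters are extracted. The sketch below assumes the standard Markov Cluster (MCL) procedure of alternating expansion and inflation on a column-stochastic matrix; the function names, the inflation value of 2.0, and the iteration count are all assumptions, not the authors' settings.

```python
def _normalize_columns(m):
    """Scale every column so that it sums to 1 (column-stochastic matrix)."""
    n = len(m)
    sums = [sum(m[i][j] for i in range(n)) for j in range(n)]
    return [[m[i][j] / sums[j] if sums[j] else 0.0 for j in range(n)]
            for i in range(n)]


def _matmul(a, b):
    """Plain square matrix product."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def markov_cluster(adjacency, inflation=2.0, iterations=50, eps=1e-6):
    """Cluster an undirected graph given as {vertex: {neighbour: weight}}.

    Every vertex must appear as a key. Returns a list of frozensets.
    """
    vertices = sorted(adjacency)
    index = {v: i for i, v in enumerate(vertices)}
    n = len(vertices)
    m = [[0.0] * n for _ in range(n)]
    for v, nbrs in adjacency.items():
        for u, weight in nbrs.items():
            m[index[u]][index[v]] = float(weight)
    for i in range(n):
        m[i][i] = 1.0  # self-loops, as commonly added before running MCL
    m = _normalize_columns(m)
    for _ in range(iterations):
        m = _matmul(m, m)                                  # expansion
        m = [[x ** inflation for x in row] for row in m]   # inflation
        m = _normalize_columns(m)
    # Rows that keep non-zero mass are attractors; the columns they still
    # reach form one cluster. Identical clusters are deduplicated.
    clusters = set()
    for row in m:
        members = frozenset(vertices[j] for j, x in enumerate(row) if x > eps)
        if members:
            clusters.add(members)
    return sorted(clusters, key=sorted)
```

This pure-Python version costs cubic time per iteration, so it only illustrates the idea on small graphs.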
13
Overview
similarityFunction(H, ES) -> [0, 1]
-> 0: ES is not very similar to the history H
-> 1: ES is very similar to the history H
- History H: built from previously generated file access logs
- Event Sequence ES: a new list of file access logs
14
Overview
similarityFunction(H, ES) > t, with threshold t
- true: ES is legitimate
- false: ES is malicious
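The decision rule on this slide can be sketched as follows. `classify_event_sequence` and `toy_similarity` are hypothetical names; the toy similarity is only a stand-in to make the example runnable and is not one of the similarity functions from the talk.

```python
def classify_event_sequence(history, event_sequence, similarity_function, threshold):
    """Return True if the event sequence is judged legitimate, False if malicious.

    Follows the rule on the slide: legitimate when similarity > threshold.
    """
    score = similarity_function(history, event_sequence)
    if not 0.0 <= score <= 1.0:
        raise ValueError("similarity must lie in [0, 1]")
    return score > threshold


def toy_similarity(history, event_sequence):
    """Hypothetical stand-in: fraction of accessed files already in the history."""
    if not event_sequence:
        return 0.0
    known = set(history)
    return sum(1 for f in event_sequence if f in known) / len(event_sequence)


# Half of the accessed files are known, so the score is 0.5.
verdict = classify_event_sequence(["a", "b"], ["a", "c"], toy_similarity, 0.4)
```

Because the comparison is strict, a score exactly equal to the threshold is treated as malicious.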
15
Implementation
similarityFunction(…)
Markov Cluster Graph
16
Implementation
- History
- Similarity functions (we tried different approaches)
17
History
History of a user U: file system logs
18
History
History of a user U: time windows
19
History
History of a user U: graphs
20
History
History of a user U: Markov Cluster Graph
History: VC1, VC2 (a set of Vertex Clusters)
21
Implementation
similarityFunction(HU, ES) -> [0, 1]
- History HU
- Event Sequence ES: still a list of events
22
Implementation
Split ES into time windows (as for the history), then extract Vertex Clusters.
Event Sequence ESW: the part of ES in time window W
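A minimal sketch of the time-window split, assuming fixed-size windows aligned to the first event and skipping empty windows; the slides do not state either choice. `split_into_windows` and the event tuple layout `(timestamp, action, path)` are hypothetical.

```python
from datetime import datetime, timedelta


def split_into_windows(events, window):
    """Group (timestamp, action, path) events into consecutive time windows.

    Assumptions: windows have a fixed positive length, start at the earliest
    event, and windows without any event are not returned.
    """
    if not events:
        return []
    ordered = sorted(events, key=lambda event: event[0])
    start = ordered[0][0]
    buckets = {}
    for event in ordered:
        slot = (event[0] - start) // window  # timedelta // timedelta gives an int
        buckets.setdefault(slot, []).append(event)
    return [buckets[slot] for slot in sorted(buckets)]


# Hypothetical sample data; the dates are made up for illustration.
t0 = datetime(2018, 5, 5, 10, 10, 5)
sample = [
    (t0, "Open", "C:\\File1"),
    (t0 + timedelta(seconds=20), "Read", "C:\\File2"),
    (t0 + timedelta(seconds=70), "Close", "C:\\File1"),
]
windows = split_into_windows(sample, timedelta(seconds=30))
```

Each returned window would then be turned into a graph and clustered, in the same way as the history.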
23
Similarity Function Inside
similarityFunction(HU, ESW) -> [0, 1] History HU Event Sequence ESW How to compare H and ES?
24
Similarity Function Inside
History HU, Event Sequence ESW
- History: a set of Vertex Clusters
- Event Sequence: a set of Vertex Clusters
26
Similarity Function Inside
History HU, Event Sequence ESW
We built 7 similarity functions based on set comparison operators (e.g., equal, subset, superset).
27
Similarity Function Inside
Just an example, the simplest:
SimilaryByEqual(HU, ESW) {
  m <- 0
  for all s in ESW do
    for all h in HU do
      if s == h then
        m <- m + 1
        break
  return m / |ESW|
}
Idea: count how many elements of ESW are contained in HU.
Other, more complex versions are in the paper.
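The pseudocode above translates almost directly to Python if each vertex cluster is represented as a `frozenset` of file paths. The function name `similarity_by_equal` and the choice to return 0.0 for an empty event sequence are assumptions of this sketch.

```python
def similarity_by_equal(history, window_clusters):
    """Fraction of clusters in the event sequence that also occur in the history.

    Both arguments are iterables of vertex clusters (collections of file paths).
    """
    known = {frozenset(cluster) for cluster in history}
    window = [frozenset(cluster) for cluster in window_clusters]
    if not window:
        return 0.0  # assumption: nothing to compare counts as "not similar"
    matches = sum(1 for cluster in window if cluster in known)
    return matches / len(window)


history = [{"a.doc", "b.doc"}, {"c.doc"}]
score = similarity_by_equal(history, [{"a.doc", "b.doc"}, {"d.doc"}])
```

Storing the history as a set of frozensets replaces the inner loop over HU with a hash lookup while giving the same result.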
28
Similarity Function Inside
Something more complex:
SimilaryBySubsetWeight(HU, ESW) {
  m <- 0
  n <- 0
  for all s in ESW do
    for all h in HU do
      if s isSubset h then
        m <- m + |s|/|h|
        n <- n + 1
  m’ <- m/n
  return m’ / |ESW|
}
Slide annotations: sum of elements ratio; normalization.
Idea 2.0: we propose a weighted sum between HU and ESW.
Other, more complex versions are in the paper.
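The flattened slide does not show whether the normalization by n happens once at the end or separately for each element of ESW. The sketch below takes the second reading, averaging the |s|/|h| ratios per window cluster so that a perfect match scores 1; this is an interpretation, not necessarily the authors' exact algorithm, and `similarity_by_subset_weight` is a hypothetical name.

```python
def similarity_by_subset_weight(history, window_clusters):
    """Weighted subset similarity between a history and an event sequence.

    For each window cluster s, average |s|/|h| over all history clusters h
    that contain s, then average those values over the window clusters.
    """
    known = [frozenset(h) for h in history if h]
    window = [frozenset(s) for s in window_clusters]
    if not window:
        return 0.0  # assumption: an empty event sequence is "not similar"
    total = 0.0
    for s in window:
        ratios = [len(s) / len(h) for h in known if s <= h]  # s is a subset of h
        if ratios:
            total += sum(ratios) / len(ratios)
    return total / len(window)


history = [{"a", "b", "c", "d"}, {"a", "b"}]
score = similarity_by_subset_weight(history, [{"a", "b"}])
```

Here `{"a", "b"}` is contained in both history clusters with ratios 2/4 and 2/2, so the per-cluster average is 0.75.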
29
Evaluation
It is hard to find a good dataset:
- WUIL: The Windows-Users and -Intruder simulations Logs dataset
- TWOS: A Dataset of Malicious Insider Threat Behavior Based on a Gamified Competition
Both contain file system access logs.
30
WUIL Dataset
- Legitimate user activities: from real users
- Masquerader user activities: synthetic
- For each user: 3 sessions of malicious logs, 5 min long each (around 15 min in total)
- Around 70 users in total
31
TWOS Dataset
- Legitimate user activities: from real users
- Masquerader user activities: from real users too
- For each user: 1 session of malicious logs, 1 hour long
- Around 20 users in total
- User behaviors from a gamified experiment
32
Setting
- WUIL: time windows of 30s, 1m, 2m
- TWOS: time windows of 10m, 20m, 30m
- 5-fold cross validation
33
Setting
WUIL and TWOS have labeled data.
User U: Legitimate, Masquerader
34
Setting
WUIL and TWOS, user U:
- History (training set): only legitimate
- Test (test set): masquerade and legitimate
35
Setting
WUIL and TWOS: 5-fold cross validation
(Figure: the data of user U split into parts 1 2 3 4 5 and M, with labels “Used for making the History” and “Used for making the Test”)
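One plausible reading of this setup is sketched below: the legitimate data of a user is split into 5 folds, 4 folds build the history, and the held-out fold plus the masquerade data form the test set. The slide's figure does not make the exact assignment recoverable, so the fold layout, the interleaved split, and the name `five_fold_rounds` are assumptions.

```python
def five_fold_rounds(legit_windows, masquerade_windows, k=5):
    """Yield-style helper returning k (history, test) pairs for one user.

    history: legitimate windows only (training set).
    test: list of (window, is_legitimate) pairs mixing the held-out
          legitimate fold with all masquerade windows.
    """
    folds = [legit_windows[i::k] for i in range(k)]  # interleaved split (assumed)
    rounds = []
    for held_out in range(k):
        history = [w for i, fold in enumerate(folds) if i != held_out for w in fold]
        test = [(w, True) for w in folds[held_out]]
        test += [(w, False) for w in masquerade_windows]
        rounds.append((history, test))
    return rounds


# Hypothetical placeholders standing in for time windows of one user.
rounds = five_fold_rounds(list(range(10)), ["m1", "m2"])
```

Keeping masquerade windows out of every history matches the slide's statement that the training set contains only legitimate activity.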
36
Results
- Area Under the Curve (AUC)
- Receiver Operating Characteristic Curve (ROC)
- Best Configuration (threshold)
- Efficiency Analysis (time)
37
Area Under the Curve
On average per user
Sim. function used: Subset OR Superset w/ weight
WUIL (2m) 0.851 TWOS (30m)
38
Receiver Operating Characteristic Curve
On average per user Sim. Function used: Subset OR Superset w/ weight
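For readers who want to reproduce this kind of curve from similarity scores, here is a minimal sketch. It assumes that a masquerade window is the positive class and that a window is flagged when its similarity is less than or equal to the threshold, which mirrors the decision rule from the overview slides; `roc_points` and `auc` are hypothetical helper names and both score lists must be non-empty.

```python
def roc_points(scores_legit, scores_masq):
    """Return (false positive ratio, true positive ratio) pairs over all thresholds.

    A window is flagged as masquerade when its similarity score is <= t.
    """
    thresholds = sorted(set(scores_legit) | set(scores_masq))
    points = [(0.0, 0.0)]  # threshold below every score: nothing is flagged
    for t in thresholds:
        tpr = sum(s <= t for s in scores_masq) / len(scores_masq)
        fpr = sum(s <= t for s in scores_legit) / len(scores_legit)
        points.append((fpr, tpr))
    return points


def auc(points):
    """Area under the curve by the trapezoidal rule."""
    pts = sorted(points)
    return sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(pts, pts[1:]))


# Hypothetical scores: legitimate windows score high, masquerade windows low.
curve = roc_points([0.9, 0.8], [0.1, 0.2])
area = auc(curve)
```

Averaging the per-user areas would then give a figure comparable in kind to the "on average per user" values reported in the talk.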
39
Best Configuration
WUIL dataset (real legitimate user activities + synthetic attacks):
                   True Positive Ratio   False Positive Ratio
Our results        95%                   9%
Previous results   91.5%                 11.81%
TWOS dataset (real legitimate user activities + real attacks):
                   True Positive Ratio   False Positive Ratio
Our results        91%                   11%
Previous results   No previous results
40
Efficiency Analysis
How expensive is the Markov Cluster Graph algorithm?
Mean time to analyze an ES (from a list of logs to a vertex cluster):
- WUIL: 0.015s (2 min TW)
- TWOS: 0.016s (30 min TW)
41
Future works
- Reducing the False Positive Ratio
- Developing auto-tune techniques
- Trying different types of logs (network, SQL queries, HTTP logs)
42
THANK YOU