Flavio Toffalini, Ivan Homoliak, Athul Harilal,

Detection of Masqueraders Based on Graph Partitioning of File System Access Events
Flavio Toffalini, Ivan Homoliak, Athul Harilal, Alexander Binder, and Martín Ochoa
ST Electronics-SUTD Cyber Security Laboratory
Workshop on Research for Insider Threats, 39th IEEE Symposium on Security and Privacy. San Francisco, USA, May 24th, 2018.

SUTD - CorpLab
It is a laboratory co-founded by:
- Singapore University of Technology and Design (aka SUTD)
- ST Electronics
- National Research Foundation (aka NRF)
One of its projects deals with insider threats.

SUTD - CorpLab
- TWOS: A Dataset of Malicious Insider Threat Behavior Based on a Gamified Competition
- Insight into Insiders: A Survey of Insider Threat Taxonomies, Analysis, Modeling, and Countermeasures
- Others about host detection and continuous authentication

Agenda
- Our goal
- Attacker model
- Previous approaches
- Intuition
- Markov Cluster Chain
- Overview
- Implementation
- Qualitative evaluation
- Efficiency Analysis

Our Goal
We aim to catch masqueraders by using an anomaly detection system.

Attacker Model
What is a masquerader?
- Any (malicious) user who acts on behalf of a legitimate user
- Has already bypassed previous system access controls
- Can exfiltrate data
- Can sabotage the machine

Previous approaches
Scenarios: Logs, File systems, Network, Login/logout, …
Detection: Features Extraction + Machine Learning
Alternative detection: “Raw data” + Deep Learning

Intuition
- Legitimate tasks follow patterns
- Masquerader tasks are different
- A task generates events (e.g., file system access)
- Modeling events as graphs

Intuition
Users’ file system access:
Timestamp          Action     File
05/05/ :10:05..    Open       C:\File1
05/05/ :10:06..    Read       C:\File2
05/05/ :10:07..
05/05/ :11:13..    Close
05/05/ :12:05..    Delete
05/05/ :12:10..
05/05/ :13:11..    List Dir   C:\Temp\
05/05/ :13:21..    Create     C:\Temp\ToExfiltrate

Markov Cluster Graph
Example of a Vertex Cluster:
- C:\User\Alice\Documents\Project1.doc
- C:\User\Alice\Documents\Project2.doc
- C:\User\Alice\Documents\Project3.doc
- C:\User\Alice\Documents\Administration\..

Markov Cluster Graph
What does a Vertex Cluster mean?
A set of resources (i.e., files) used to achieve a task.
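As an illustration of how vertex clusters could be obtained from a graph, here is a minimal pure-Python sketch of the generic Markov Cluster (MCL) procedure: alternate expansion (matrix squaring) and inflation (element-wise power plus column normalization) until the flow matrix stops changing. The function names, the inflation value, and the use of integer vertex indices are assumptions made for this sketch; the slides do not state the parameters or the implementation the authors used.

```python
from typing import FrozenSet, List, Set

Matrix = List[List[float]]


def normalize_columns(matrix: Matrix) -> Matrix:
    """Scale every column so that it sums to 1 (column-stochastic)."""
    size = len(matrix)
    sums = [sum(matrix[r][c] for r in range(size)) for c in range(size)]
    return [[matrix[r][c] / sums[c] if sums[c] else 0.0 for c in range(size)]
            for r in range(size)]


def multiply(a: Matrix, b: Matrix) -> Matrix:
    """Plain matrix product of two square matrices of the same size."""
    size = len(a)
    return [[sum(a[r][k] * b[k][c] for k in range(size)) for c in range(size)]
            for r in range(size)]


def markov_cluster(adjacency: List[List[int]], inflation: float = 2.0,
                   max_iterations: int = 100, eps: float = 1e-6) -> Set[FrozenSet[int]]:
    """Return vertex clusters (sets of vertex indices) of an undirected graph."""
    size = len(adjacency)
    # Add self-loops (a common MCL convention), then make the matrix column-stochastic.
    flow = normalize_columns([[float(adjacency[r][c]) + (1.0 if r == c else 0.0)
                               for c in range(size)] for r in range(size)])
    for _ in range(max_iterations):
        expanded = multiply(flow, flow)                      # expansion step
        inflated = normalize_columns(                        # inflation step
            [[value ** inflation for value in row] for row in expanded])
        change = max((abs(inflated[r][c] - flow[r][c])
                      for r in range(size) for c in range(size)), default=0.0)
        flow = inflated
        if change < eps:
            break
    # Rows with a positive diagonal are attractors; the columns with positive
    # flow in such a row are the vertices attracted to it.
    clusters: Set[FrozenSet[int]] = set()
    for r in range(size):
        if flow[r][r] > eps:
            clusters.add(frozenset(c for c in range(size) if flow[r][c] > eps))
    return clusters
```

In a file-access setting, each vertex index would map back to a file path, so a returned cluster corresponds to a group of files that tend to be used together.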

Overview
similarityFunction(H, ES) -> [0, 1]
- History H: built from file access logs previously generated
- Event Sequence ES: new list of file access logs
- -> 0 : ES not so similar to the history H
- -> 1 : ES very similar to the history H

Overview
similarityFunction(H, ES) > t, with threshold t
- true: ES is legitimate
- false: ES is malicious
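The decision rule on this slide can be sketched in a few lines. The function name `is_legitimate` and the range check are assumptions for illustration; the similarity score is taken as already computed.

```python
def is_legitimate(similarity: float, threshold: float) -> bool:
    """Accept the event sequence as legitimate when its similarity exceeds t."""
    if not 0.0 <= similarity <= 1.0:
        raise ValueError("similarity must lie in [0, 1]")
    # Strict comparison, as on the slide: similarityFunction(H, ES) > t
    return similarity > threshold
```

A higher threshold flags more sequences as malicious, which raises both the detection rate and the false alarm rate.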

Implementation
similarityFunction(…)
Markov Cluster Graph

Implementation
- History
- Similarity functions (we tried different approaches)

History
History of a user U, built in steps:
- File system logs
- Time windows
- Graphs
- Markov Cluster Graph
The resulting History is a set of Vertex Clusters (VC1, VC2, …).
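The first steps of this pipeline (logs, then time windows, then one graph per window) can be sketched as follows. The event tuple layout, the function names, the example timestamps, and above all the edge rule are assumptions: the slides do not say how edges are defined, so this sketch simply links files that are accessed one after the other within a window.

```python
from collections import defaultdict
from datetime import datetime, timedelta
from typing import Dict, FrozenSet, List, Set, Tuple

Event = Tuple[datetime, str, str]  # (timestamp, action, file path)


def split_into_windows(events: List[Event], window: timedelta) -> List[List[Event]]:
    """Group events into consecutive fixed-size time windows (empty windows are skipped)."""
    if not events:
        return []
    ordered = sorted(events, key=lambda event: event[0])
    start = ordered[0][0]
    buckets: Dict[int, List[Event]] = defaultdict(list)
    for event in ordered:
        index = int((event[0] - start) / window)  # timedelta / timedelta -> float
        buckets[index].append(event)
    return [buckets[index] for index in sorted(buckets)]


def build_graph(window_events: List[Event]) -> Tuple[Set[str], Set[FrozenSet[str]]]:
    """Build an undirected graph: vertices are files, edges link consecutive accesses.

    The edge rule is an assumption made for this sketch only.
    """
    paths = [path for _, _, path in window_events]
    edges = {frozenset((a, b)) for a, b in zip(paths, paths[1:]) if a != b}
    return set(paths), edges


# Hypothetical events; the timestamps are invented for the example.
t0 = datetime(2018, 1, 1, 10, 0, 0)
events = [
    (t0, "Open", r"C:\File1"),
    (t0 + timedelta(seconds=30), "Read", r"C:\File2"),
    (t0 + timedelta(seconds=130), "Create", r"C:\Temp\ToExfiltrate"),
]
windows = split_into_windows(events, timedelta(minutes=2))
vertices, edges = build_graph(windows[0])
```

Each per-window graph would then be partitioned into vertex clusters, and the union of those clusters over all legitimate windows forms the History.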

Implementation
similarityFunction(HU, ES) -> [0, 1]
- History HU
- Event Sequence ES: still a list of events

Implementation
Split ES into time windows (as for the history), then extract Vertex Clusters.
Event Sequence ESW: the part of ES that falls in time window W.

Similarity Function Inside
similarityFunction(HU, ESW) -> [0, 1]
How to compare HU and ESW?
- History HU: a set of Vertex Clusters
- Event Sequence ESW: a set of Vertex Clusters
We built 7 similarity functions based on set comparison operators (e.g., equal, subset, superset).

Just an example, the simplest:
SimilarityByEqual(HU, ESW) {
    m <- 0
    for all s in ESW do
        for all h in HU do
            if s == h then
                m <- m + 1
                break
    return m / |ESW|
}
Idea: count how many elements of ESW are contained in HU.
Other, more complex versions are in the paper.
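The simplest similarity function translates almost directly into Python if each vertex cluster is represented as a `frozenset` of file paths. The name `similarity_by_equal`, the cluster representation, and the choice to return 0.0 for an empty window are assumptions of this sketch.

```python
from typing import FrozenSet, Iterable, List

Cluster = FrozenSet[str]


def similarity_by_equal(history: Iterable[Cluster], window_clusters: List[Cluster]) -> float:
    """Fraction of window clusters that appear unchanged in the history."""
    if not window_clusters:
        return 0.0  # assumption: an empty window carries no evidence of similarity
    known = set(history)  # set lookup replaces the inner loop with its early break
    matches = sum(1 for cluster in window_clusters if cluster in known)
    return matches / len(window_clusters)
```

For example, with a history holding the clusters {a, b} and {c}, a window holding {a, b} and {d} scores 0.5, because one of its two clusters is already known.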

Something more complex:
SimilarityBySubsetWeight(HU, ESW) {
    m <- 0
    n <- 0
    for all s in ESW do
        for all h in HU do
            if s isSubset h then
                m <- m + |s|/|h|      (sum of element ratios)
                n <- n + 1
    m’ <- m/n                         (normalization)
    return m’ / |ESW|
}
Idea 2.0: we propose a weighted sum between HU and ESW.
Other, more complex versions are in the paper.
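A hedged Python reading of the subset-weighted function follows. It keeps the statement order shown on the slide, but the nesting of the normalization step is not recoverable from the transcript, so placing it after both loops is an assumption; the paper remains the authoritative definition. The function name, the `frozenset` representation, and the handling of empty inputs are also assumptions.

```python
from typing import FrozenSet, Iterable, List

Cluster = FrozenSet[str]


def similarity_by_subset_weight(history: Iterable[Cluster],
                                window_clusters: List[Cluster]) -> float:
    """Weighted subset similarity, following the slide's statement order."""
    if not window_clusters:
        return 0.0  # assumption: nothing to compare
    ratio_sum = 0.0   # m on the slide: sum of |s| / |h| over matching pairs
    match_count = 0   # n on the slide: number of matching pairs
    history_clusters = [h for h in history if h]  # skip empty clusters (avoids |h| = 0)
    for s in window_clusters:
        for h in history_clusters:
            if s <= h:  # s is a subset of h
                ratio_sum += len(s) / len(h)
                match_count += 1
    if match_count == 0:
        return 0.0  # assumption: no subset relation found means no similarity
    mean_ratio = ratio_sum / match_count      # m' on the slide
    return mean_ratio / len(window_clusters)
```

The weight |s|/|h| rewards window clusters that cover most of a known history cluster and discounts those that touch only a small part of it.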

Evaluation
It is hard to find a good dataset. We used two:
- WUIL: The Windows-Users and -Intruder simulations Logs dataset
- TWOS: A Dataset of Malicious Insider Threat Behavior Based on a Gamified Competition
Both contain file system access logs.

WUIL Dataset
- Legitimate user activities: from real users
- Masquerader user activities: synthetic
- For each user: 3 sessions of malicious logs, 5 min long each (around 15 min in total)
- Around 70 users in total

TWOS Dataset
- Legitimate user activities: from real users
- Masquerader user activities: from real users too
- For each user: 1 session of malicious logs, 1 hour long
- Around 20 users in total
- User behaviors from a gamified experiment

Setting
- WUIL: time windows of 30s, 1m, 2m
- TWOS: time windows of 10m, 20m, 30m
- 5-fold cross validation

Setting
WUIL and TWOS have labeled data (legitimate and masquerader) for each user U.
- History (training set): only legitimate
- Test (test set): masquerade and legitimate

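Setting
WUIL and TWOS: 5-fold cross validation for each user U.
The legitimate data of U is split into folds 1 to 5, and M denotes the masquerade data. Legitimate folds are used either for making the History or for making the Test; the Test also contains M.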
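One way to realize this per-user setup is sketched below. The 4-to-1 split between history folds and test fold, the round-robin fold assignment, and the function name are assumptions; the slides only state that the history is built from legitimate data and that the test set mixes legitimate and masquerade data.

```python
from typing import Any, List, Tuple

Labeled = Tuple[Any, str]  # (time window, label)


def five_fold_rounds(legit_windows: List[Any], masquerade_windows: List[Any],
                     k: int = 5) -> List[Tuple[List[Any], List[Labeled]]]:
    """Yield k (history, test) pairs for one user.

    History: legitimate windows from k - 1 folds.
    Test: the held-out legitimate fold plus all masquerade windows.
    """
    folds = [legit_windows[i::k] for i in range(k)]  # round-robin assignment
    rounds = []
    for held_out in range(k):
        history = [w for i, fold in enumerate(folds) if i != held_out for w in fold]
        test = ([(w, "legitimate") for w in folds[held_out]]
                + [(w, "masquerader") for w in masquerade_windows])
        rounds.append((history, test))
    return rounds
```

Because the masquerade windows never enter the history, every round evaluates the detector on attacks it has not seen.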

Results Area Under the Curve (AUC)
Receiver Operating Characteristic Curve (ROC) Best Configuration (threshold) Efficiency Analysis (time)

Area Under the Curve
On average per user. Sim. function used: Subset OR Superset w/ weight
WUIL (2m) 0.851 TWOS (30m)

Receiver Operating Characteristic Curve
On average per user Sim. Function used: Subset OR Superset w/ weight

Best Configuration
WUIL dataset (real legitimate user activities + synthetic attacks):
                    True Positive Ratio   False Positive Ratio
Our results         95%                   9%
Previous results    91.5%                 11.81%
TWOS dataset (real legitimate user activities + real attacks):
                    True Positive Ratio   False Positive Ratio
Our results         91%                   11%
Previous results    No previous results
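For reference, the two ratios in these tables can be computed from per-window similarity scores as sketched below. The sketch assumes that a "positive" is a window flagged as a masquerader, that a window is flagged when its score does not exceed the threshold, and that undefined ratios are reported as 0.0; the function name and input layout are hypothetical.

```python
from typing import List, Tuple


def detection_rates(scored: List[Tuple[float, bool]], threshold: float) -> Tuple[float, float]:
    """Return (true positive ratio, false positive ratio).

    scored: (similarity score, is_masquerader) per test window.
    """
    tp = fp = fn = tn = 0
    for score, is_masquerader in scored:
        flagged = not (score > threshold)  # legitimate only if score > t
        if is_masquerader and flagged:
            tp += 1
        elif is_masquerader:
            fn += 1
        elif flagged:
            fp += 1
        else:
            tn += 1
    tpr = tp / (tp + fn) if (tp + fn) else 0.0
    fpr = fp / (fp + tn) if (fp + tn) else 0.0
    return tpr, fpr
```

Sweeping the threshold from 0 to 1 and plotting the resulting pairs gives the ROC curve, and the best configuration is the threshold with the preferred trade-off between the two ratios.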

Efficiency Analysis
How expensive is the Markov Cluster Graph algorithm?
Mean time to analyze an ES (from a list of logs to a vertex cluster):
- WUIL: 0.015s (2 min TW)
- TWOS: 0.016s (30 min TW)

Future works
- Reducing the False Positive Ratio
- Developing auto-tune techniques
- Trying the approach on different types of logs (network, SQL queries, HTTP logs)

THANK YOU 
