

Detection of Masqueraders Based on Graph Partitioning of File System Access Events Flavio Toffalini, Ivan Homoliak, Athul Harilal, Alexander Binder, and Martín Ochoa ST Electronics-SUTD Cyber Security Laboratory Workshop on Research for Insider Threats at the 39th IEEE Symposium on Security and Privacy. San Francisco, USA, May 24th, 2018.

SUTD - CorpLab A laboratory co-founded by: the Singapore University of Technology and Design (SUTD), ST Electronics, and the National Research Foundation (NRF). One of its projects deals with insider threats.

SUTD - CorpLab TWOS: A Dataset of Malicious Insider Threat Behavior Based on a Gamified Competition (MIST-CSS@2017 + JoWUA@2018). Insight into Insiders: A Survey of Insider Threat Taxonomies, Analysis, Modeling, and Countermeasures (preprint@arxiv.org). Other works on host detection and continuous authentication.

Agenda Our goal Attacker model Previous approaches Intuition Markov Cluster Chain Overview Implementation Qualitative evaluation Efficiency Analysis

Our Goal We aim to catch masqueraders using an anomaly detection system.

Attacker Model What is a masquerader?

Attacker Model What is a masquerader? Any (malicious) user who acts on behalf of a legitimate user. They have already bypassed the system's access controls, and they can exfiltrate data or sabotage the machine.

Previous Approaches Scenarios and log sources: file systems, network, login/logout, email, … Typical pipeline: feature extraction + machine learning. Alternative detection: "raw data" + deep learning.

Intuition Legitimate tasks follow patterns; masquerader tasks are different. A task generates events (e.g., file system accesses), and we model those events as graphs.

Intuition Users' file system accesses:

Timestamp              Action    File
05/05/2018 10:10:05..  Open      C:\File1
05/05/2018 10:10:06..  Read      C:\File2
05/05/2018 10:10:07..
05/05/2018 10:11:13..  Close
05/05/2018 10:12:05..  Delete
05/05/2018 10:12:10..
05/05/2018 10:13:11..  List Dir  C:\Temp\
05/05/2018 10:13:21..  Create    C:\Temp\ToExfiltrate
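To make the "events as graphs" idea concrete, here is one plausible way to turn such an access log into a graph: link files that are accessed consecutively, so resources touched in the same task become densely connected. This construction is a hypothetical sketch for illustration; the paper's exact graph-building rules may differ.

```python
from collections import defaultdict

def build_access_graph(events):
    """events: time-ordered (timestamp, action, path) tuples.
    Vertices are file paths; an undirected weighted edge links two
    different files accessed consecutively (hypothetical scheme)."""
    graph = defaultdict(lambda: defaultdict(int))
    paths = [p for _, _, p in events if p]   # skip rows without a file path
    for a, b in zip(paths, paths[1:]):
        if a != b:
            graph[a][b] += 1                 # co-occurrence count as weight
            graph[b][a] += 1
    return {v: dict(nbrs) for v, nbrs in graph.items()}

# Example in the spirit of the log above
events = [
    ("10:10:05", "Open", r"C:\File1"),
    ("10:10:06", "Read", r"C:\File2"),
    ("10:13:11", "List Dir", r"C:\Temp"),
]
g = build_access_graph(events)  # edges File1-File2 and File2-Temp
```

Running a graph-clustering step (next slides) on such a graph then groups the files belonging to one task.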

Markov Cluster Graph
C:\User\Alice\Documents\Project1.doc
C:\User\Alice\Documents\Project3.doc
C:\User\Alice\Documents\Project2.doc
C:\User\Alice\Documents\Administration\..
Vertex Cluster

Markov Cluster Graph What does a Vertex Cluster mean? A set of resources (i.e., files) used to achieve a task.
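The Markov Cluster (MCL) algorithm that produces these Vertex Clusters alternates two steps on a column-stochastic matrix built from the graph: expansion (matrix power, spreading random-walk flow) and inflation (elementwise power plus renormalization, sharpening strong flows) until flow concentrates inside clusters. A minimal pure-Python sketch; the parameters and the attractor-based cluster extraction are conventional MCL defaults, not values taken from the paper:

```python
def mcl(adj, expansion=2, inflation=2.0, iterations=50):
    """Markov Cluster algorithm sketch on an adjacency matrix (list of
    lists). Returns a set of frozensets: the Vertex Clusters."""
    n = len(adj)
    # add self-loops, then make columns stochastic
    M = [[float(adj[i][j]) + (i == j) for j in range(n)] for i in range(n)]

    def normalize(X):
        for j in range(n):
            s = sum(X[i][j] for i in range(n))
            for i in range(n):
                X[i][j] /= s
        return X

    def matmul(A, B):
        return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
                for i in range(n)]

    M = normalize(M)
    for _ in range(iterations):
        E = M
        for _ in range(expansion - 1):
            E = matmul(E, M)                       # expansion: spread flow
        M = normalize([[E[i][j] ** inflation for j in range(n)]
                       for i in range(n)])         # inflation: sharpen flow
    # rows that keep mass are attractors; their nonzero columns
    # form the clusters (duplicates collapse via the set)
    clusters = set()
    for i in range(n):
        members = frozenset(j for j in range(n) if M[i][j] > 1e-6)
        if members:
            clusters.add(members)
    return clusters
```

On a graph made of two triangles joined by a single weak edge, this returns the two triangles as separate clusters, which is exactly the "files of one task" grouping the slides describe.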

Overview similarityFunction(H, ES) -> [0, 1] History H: built from previously generated file access logs. Event Sequence ES: a new list of file access logs. Result near 0: ES not very similar to the history H. Result near 1: ES very similar to the history H.

Overview similarityFunction(H, ES) > t, for a threshold t. True: ES is legitimate. False: ES is malicious.

Implementation similarityFunction(…) is built on top of the Markov Cluster Graph.

Implementation History Similarity functions (we tried different approaches)

History History of a user U: file system logs -> time windows -> graphs -> Markov Cluster Graph. The resulting History is a set of Vertex Clusters (VC1, VC2, …).

Implementation similarityFunction(HU, ES) -> [0, 1] History HU; Event Sequence ES (still a plain list of events).

Implementation Split ES into time windows (as for the History), then extract Vertex Clusters. ESW denotes the Event Sequence restricted to time window W.
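Splitting an event sequence into fixed-length time windows can be sketched as follows (timestamps are assumed to be comparable values such as seconds or datetime objects; the exact windowing policy is an assumption for illustration):

```python
def split_into_windows(events, window_len):
    """events: time-ordered list of (timestamp, action, path).
    A new window starts once window_len has elapsed since the
    current window began; returns a list of event lists."""
    windows, current, start = [], [], None
    for ev in events:
        ts = ev[0]
        if start is None:
            start = ts
        elif ts - start >= window_len:
            windows.append(current)          # close the current window
            current, start = [], ts
        current.append(ev)
    if current:
        windows.append(current)              # flush the last partial window
    return windows
```

Each resulting window is then turned into a graph and clustered, yielding the Vertex Clusters of ESW.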

Similarity Function Inside similarityFunction(HU, ESW) -> [0, 1] History HU Event Sequence ESW How to compare H and ES?

Similarity Function Inside History HU Event Sequence ESW Both the History and the Event Sequence are now sets of Vertex Clusters.


Similarity Function Inside We built 7 similarity functions based on set-comparison operators (e.g., equal, subset, superset).

Similarity Function Inside
Just an example, the simplest:

SimilarityByEqual(HU, ESW) {
  m <- 0
  for all s in ESW do
    for all h in HU do
      if s == h then
        m <- m + 1
        break
  return m / |ESW|
}

Idea: estimate how many elements of ESW are contained in HU. Other, more complex versions are in the paper.

Similarity Function Inside
Something more complex:

SimilarityBySubsetWeight(HU, ESW) {
  m <- 0
  n <- 0
  for all s in ESW do
    for all h in HU do
      if s isSubset h then
        m <- m + |s|/|h|    // sum element ratios
        n <- n + 1
  m' <- m/n                 // normalization
  return m' / |ESW|
}

Idea 2.0: we propose a weighted sum between HU and ESW. Other, more complex versions are in the paper.
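The two pseudocode functions above can be written as runnable Python, representing each Vertex Cluster as a frozenset of file paths (this representation, and the example data below, are assumptions for illustration):

```python
def similarity_by_equal(H, ES):
    """Fraction of Vertex Clusters in the event sequence ES that
    appear verbatim in the history H."""
    matches = sum(1 for s in ES if s in H)
    return matches / len(ES)

def similarity_by_subset_weight(H, ES):
    """Weighted variant: each cluster s in ES that is a subset of a
    history cluster h contributes |s|/|h|; the sum is normalized by
    the number of matches n, then by |ES|, as in the pseudocode."""
    m, n = 0.0, 0
    for s in ES:
        for h in H:
            if s <= h:                 # s isSubset h
                m += len(s) / len(h)   # sum element ratios
                n += 1
    if n == 0:
        return 0.0
    return (m / n) / len(ES)           # normalization

# Hypothetical vertex clusters for illustration
H = {frozenset({"a", "b", "c"}), frozenset({"d", "e"})}
ES = [frozenset({"a", "b"}), frozenset({"d", "e"}), frozenset({"x"})]
print(similarity_by_equal(H, ES))          # = 1/3: one exact match
print(similarity_by_subset_weight(H, ES))  # = 5/18 under this normalization
```

Note how the subset variant rewards partial overlap: {"a", "b"} earns credit against {"a", "b", "c"} even though the clusters are not equal.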

Evaluation It is hard to find a good dataset. WUIL: The Windows-Users and -Intruder simulations Logs dataset. TWOS: A Dataset of Malicious Insider Threat Behavior Based on a Gamified Competition. Both contain file system access logs.

WUIL Dataset Legitimate user activities: from real users Masquerader user activities: synthetic For each user: 3 sessions of malicious logs, 5 min long each (around 15 min in total) Around 70 users in total

TWOS Dataset Legitimate user activities: from real users. Masquerader user activities: from real users too. For each user: 1 session of malicious logs, 1 hour long. Around 20 users in total. User behaviors come from a gamified experiment.

Setting WUIL: time windows of 30s, 1m, 2m. TWOS: time windows of 10m, 20m, 30m. 5-fold cross validation.

Setting WUIL and TWOS have labeled data: for each user U, sessions are labeled legitimate or masquerader.

Setting For WUIL and TWOS, per user U: History (training set): only legitimate logs. Test (test set): masquerader and legitimate logs.

Setting WUIL and TWOS: 5-fold cross validation. For each user U, the legitimate logs are split into 5 folds: 4 folds build the History, and the remaining fold, together with the masquerader logs M, forms the Test.
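The per-user evaluation loop just described can be sketched as follows. The round-robin fold assignment is an assumption for illustration; masq_windows stands for this user's masquerader sessions, which always go to the test set:

```python
def five_fold_runs(legit_windows, masq_windows, k=5):
    """Yield (history, test) pairs: the History uses k-1 folds of
    legitimate windows only; the Test mixes the held-out legitimate
    fold with the masquerader windows (label True = legitimate)."""
    folds = [legit_windows[i::k] for i in range(k)]   # round-robin split
    for i in range(k):
        history = [w for j, f in enumerate(folds) if j != i for w in f]
        test = ([(w, True) for w in folds[i]] +
                [(w, False) for w in masq_windows])
        yield history, test
```

Training on legitimate data only is what makes this a one-class / anomaly-detection setup: the masquerader windows are never seen when the History is built.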

Results Area Under the Curve (AUC) Receiver Operating Characteristic Curve (ROC) Best Configuration (threshold) Efficiency Analysis (time)

Area Under the Curve On average per user, with similarity function "Subset OR Superset w/ weight": WUIL (2m): 0.944. TWOS (30m): 0.851.

Receiver Operating Characteristic Curve On average per user Sim. Function used: Subset OR Superset w/ weight
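AUC can be computed directly from per-window similarity scores as the probability that a randomly chosen legitimate window outscores a randomly chosen masquerader window (the Mann-Whitney formulation, equivalent to the area under the ROC curve). A generic sketch, not tied to the paper's tooling:

```python
def auc(legit_scores, masq_scores):
    """ROC AUC via pairwise ranking: fraction of (legitimate,
    masquerader) score pairs ranked correctly; ties count 0.5."""
    wins = sum(1.0 if l > m else 0.5 if l == m else 0.0
               for l in legit_scores for m in masq_scores)
    return wins / (len(legit_scores) * len(masq_scores))
```

An AUC of 1.0 means some threshold t separates the two classes perfectly; 0.5 means the similarity scores carry no signal.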

Best Configuration
WUIL dataset (real legitimate user activities + synthetic attacks):
                  True Positive Ratio   False Positive Ratio
Our results       95%                   9%
Previous results  91.5%                 11.81%

TWOS dataset (real legitimate user activities + real attacks):
                  True Positive Ratio   False Positive Ratio
Our results       91%                   11%
Previous results  (no previous results)

Efficiency Analysis How expensive is the Markov Cluster Graph algorithm? Mean time to analyze an ES (from a list of logs to a Vertex Cluster): WUIL: 0.015s (2 min TW). TWOS: 0.016s (30 min TW).

Future works Reducing the False Positive Ratio. Developing auto-tuning techniques. Trying different types of logs (network, SQL queries, HTTP logs).

THANK YOU 