Detecting Insider Information Theft Using Features from File Access Logs


1 Detecting Insider Information Theft Using Features from File Access Logs
Every action, on your phone, on your computer, online, has some risk associated with it.
Christopher Gates, Ninghui Li, Zenglin Xu (Purdue University)
Suresh N. Chari, Ian Molloy, Youngja Park (IBM TJ Watson Research)

2 Detecting Malicious File Access
- Intellectual property theft is an important security problem
- Insider threat
  - Legitimate access
  - In-depth knowledge of resources
  - Knowledge of deployed security mechanisms
- Stolen credentials
  - Can utilize another person's legitimate access

3 Current Prevention Techniques
- Limit exposure via access control
  - Users need access
  - Productivity is often seen as more important
- Encrypt data at rest
  - Does not stop legitimate access
- Use high-level statistics for detection
  - Does not capture fine-grained detail
  - Does not give specific guidance about a violation

4 Goal
- Exploit knowledge about resources to detect deviation from access history
- Can also be viewed as estimating/controlling the risk of aggregated accesses by one user
- Two kinds of malicious insiders
  - Impetuous
  - Patient

5 Our Approach
Generate a score for a set of accesses given a history:
- Score between two files
- Relate each file to all of the history
- Combine over all files in the current period
- Normalize
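The four steps on this slide can be sketched as a small scoring pipeline. This is a minimal sketch, not the authors' implementation: the names `period_score` and `toy_distance`, the default of `min` as the aggregate, and normalization by the number of current files are all assumptions made for illustration, since the slide's formulas are not part of the transcript.

```python
from typing import Callable, Iterable, List


def period_score(
    current: Iterable[str],
    history: Iterable[str],
    distance: Callable[[str, str], float],
    aggregate: Callable[[List[float]], float] = min,
    normalize: bool = True,
) -> float:
    """Score a set of current-period accesses against a history of accesses."""
    current = list(current)
    history = list(history)
    if not current or not history:
        # Assumed convention: with nothing to compare, return a neutral 0.0.
        return 0.0
    total = 0.0
    for f in current:
        # Step 1: pairwise score between f and every file in the history.
        pair_scores = [distance(f, h) for h in history]
        # Step 2: relate f to the whole history with a single aggregate value.
        total += aggregate(pair_scores)
    # Step 3 is the running sum over all files in the current period.
    # Step 4 (assumed form): divide by the number of current files.
    return total / len(current) if normalize else total


def toy_distance(a: str, b: str) -> float:
    """Hypothetical stand-in distance: 0 for the same file, 1 otherwise."""
    return 0.0 if a == b else 1.0


history = ["src/a.c", "src/b.c"]
current = ["src/a.c", "docs/plan.txt"]
print(period_score(current, history, toy_distance))  # 0.5
```

Passing the distance and aggregate in as parameters keeps the pipeline independent of which score function or aggregation is chosen.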

6 Similarity between Files
- Files are not accessed randomly within a hierarchy
- There are reasons to access specific areas
  - Job function
  - Project
  - Related content
- Similarity can also have many facets
  - Distance
  - Access similarity
  - File type/content (source-code/file-system/web)

7 Distance Score Functions
Score functions (Name / Formula): Binary, Full Distance, Lowest Common Ancestor (LCA), Log LCA, Access Similarity
- Binary: exact match. Good if there is high overlap with files from previous time periods (like source code).
- Full Distance: when similarity up both sides of the branch matters.
- LCA: when distance to an accessed branch is useful.
- LogLCA: penalizes things closer to the root differently than things deeply nested in the hierarchy.
- Access: can capture similarity for other reasons, so source code and documents in otherwise unrelated areas of the hierarchy can still be similar based on user overlap.
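Because the formula column is not part of the transcript, the following is only one plausible reading of the five score functions, written as distances over relative paths. Every definition here is an assumption: in particular, the log scaling in `log_lca` and the use of Jaccard distance over user sets in `access_distance` are illustrative choices, not the formulas from the slide.

```python
import math
from pathlib import PurePosixPath


def parts(path: str) -> tuple:
    """Split a relative path such as 'src/net/tcp.c' into its components."""
    return PurePosixPath(path).parts


def shared_depth(a: str, b: str) -> int:
    """Number of leading components a and b share (depth of their LCA)."""
    n = 0
    for x, y in zip(parts(a), parts(b)):
        if x != y:
            break
        n += 1
    return n


def binary(a: str, b: str) -> float:
    """Exact match: 0 if the two paths are identical, 1 otherwise."""
    return 0.0 if parts(a) == parts(b) else 1.0


def full_distance(a: str, b: str) -> float:
    """Steps from a up to the LCA plus steps from the LCA down to b."""
    k = shared_depth(a, b)
    return float((len(parts(a)) - k) + (len(parts(b)) - k))


def lca(a: str, b: str) -> float:
    """Steps from a up to the lowest common ancestor only."""
    return float(len(parts(a)) - shared_depth(a, b))


def log_lca(a: str, b: str) -> float:
    """Assumed form: log-scaled LCA distance, normalized into [0, 1]."""
    depth = len(parts(a))
    if depth == 0:
        return 0.0
    up = depth - shared_depth(a, b)
    return math.log1p(up) / math.log1p(depth)


def access_distance(users_a: set, users_b: set) -> float:
    """Assumed form: Jaccard distance between the sets of users of two files."""
    union = users_a | users_b
    if not union:
        return 1.0
    return 1.0 - len(users_a & users_b) / len(union)


print(full_distance("src/net/tcp.c", "src/net/udp.c"))  # 2.0
print(lca("src/net/tcp.c", "src/net/udp.c"))            # 1.0
```

Under these definitions, two files in the same directory are close by every path-based measure, while `access_distance` can still rate files in unrelated directories as similar when the same users touch both.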

8 Aggregation Function
Relates f to all files in the history. Three aggregation functions:
- min: the lowest
- ave: average over all
- k-nearest: compares to the k lowest
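The three aggregations can be sketched over a list of pairwise scores between a file f and the files in the history. The slide does not say how the k lowest scores are combined, so taking their mean in `agg_k_nearest` is an assumption, as are the function names and the default `k`.

```python
import heapq
from typing import List


def _require_scores(scores: List[float]) -> None:
    if not scores:
        raise ValueError("need at least one pairwise score")


def agg_min(scores: List[float]) -> float:
    """min: the single lowest pairwise score."""
    _require_scores(scores)
    return min(scores)


def agg_ave(scores: List[float]) -> float:
    """ave: the average over all pairwise scores."""
    _require_scores(scores)
    return sum(scores) / len(scores)


def agg_k_nearest(scores: List[float], k: int = 3) -> float:
    """k-nearest: mean of the k lowest scores (assumed combination rule)."""
    _require_scores(scores)
    nearest = heapq.nsmallest(k, scores)
    return sum(nearest) / len(nearest)


scores = [0.2, 0.9, 0.4, 0.6]
print(agg_min(scores))  # 0.2
```

With distances, `min` rewards any single close match in the history, `ave` is pulled up by unrelated files, and k-nearest sits between the two.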

9 Data
CMVC (Configuration Management Version Control) source code management system
- Log data: [user, timestamp, action, resource]
- For evaluation we used 1 year of log data
  - ~512k unique files
  - ~133k unique directories
  - ~2k users
- 1 period to bootstrap, 10 to train, 1 to test
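The record layout and the 1/10/1 split could be handled along the following lines. This is a sketch under assumptions: that a period is a calendar month (the slide says only that one year of data was divided into 1 + 10 + 1 periods), that timestamps are already parsed into `datetime` objects, and that the sample records, action names, and function names are hypothetical.

```python
from collections import defaultdict
from datetime import datetime


def group_by_period(records):
    """Map (user, 'YYYY-MM') to the set of resources accessed in that month."""
    accesses = defaultdict(set)
    for user, timestamp, action, resource in records:
        period = timestamp.strftime("%Y-%m")
        accesses[(user, period)].add(resource)
    return accesses


def split_periods(periods):
    """Split 12 periods into bootstrap (1), train (10), and test (1), in order."""
    ordered = sorted(periods)
    if len(ordered) != 12:
        raise ValueError("expected exactly 12 periods")
    return ordered[:1], ordered[1:11], ordered[11:]


# Hypothetical sample records in the [user, timestamp, action, resource] layout.
records = [
    ("alice", datetime(2010, 1, 5), "checkout", "src/a.c"),
    ("alice", datetime(2010, 1, 9), "checkin", "src/a.c"),
    ("alice", datetime(2010, 2, 1), "checkout", "docs/plan.txt"),
]
by_period = group_by_period(records)
months = ["2010-%02d" % m for m in range(1, 13)]
bootstrap, train, test = split_periods(months)
```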

10 Self Similarity
Check a user's current accesses against their history
- Simple
- Easy to understand
- Detects deviations from past behavior

11 Adversary
- Self similarity can catch an impetuous attacker.
- A patient adversary can seed file accesses in previous time periods to affect the similarity of distance-based scores.

12 Similarity Between Users
- Gives a relation of expected behavior across all profiles.
- A malicious user can only affect their own history.
[Figure: user1 compared against user1, user2, ..., userN, giving u1Score, u2Score, ..., uNScore]
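One way to read the figure is that a user's current accesses are scored against every user's history, producing one score per profile. The sketch below assumes that reading; `user_score_vector` and the stand-in score `new_file_fraction` are hypothetical names, and any history-based score could be passed in instead.

```python
def user_score_vector(current, histories, score):
    """Score one user's current accesses against each user's history."""
    return {other: score(current, hist) for other, hist in histories.items()}


def new_file_fraction(current, history):
    """Hypothetical stand-in score: fraction of current files absent from history."""
    current = set(current)
    if not current:
        return 0.0
    return len(current - set(history)) / len(current)


histories = {
    "user1": {"src/a.c", "src/b.c"},
    "user2": {"docs/plan.txt"},
}
vector = user_score_vector({"src/a.c", "src/b.c"}, histories, new_file_fraction)
print(vector)  # {'user1': 0.0, 'user2': 1.0}
```

Seeding their own history changes only a malicious user's own entry in this vector; the scores against other users' histories are out of their control.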

13 Features to Find Anomalies
Feature: Description
- Unique File Count: main technique currently used in practice.
- New Unique File Count: binary method, new unique files in the window.
- Average Similarity Score: LogLCA self score, values in [0,1].
- Sum Similarity Score: LogLCA sum of score values.
- Mean Distance:
  - Find a single point that summarizes previous periods over the similarity-between-users features.
  - Use cosine similarity to find the distance between the current point and the expected point.
- Mean Distance * New Unique: since the goal is to detect theft of files, and mean distance has no component representing the number of files accessed, we multiply the mean distance by the number of new unique files.
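The Mean Distance and Mean Distance * New Unique features can be sketched as follows. Two assumptions are made here and are not stated on the slide: the "single point" is taken to be the component-wise mean of the user's past per-period score vectors, and the distance is taken to be one minus cosine similarity. The function names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def cosine_distance(u: Sequence[float], v: Sequence[float]) -> float:
    """1 - cosine similarity; returns 1.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 1.0
    return 1.0 - dot / (norm_u * norm_v)


def mean_point(vectors: List[Sequence[float]]) -> List[float]:
    """Component-wise mean of past score vectors (assumed 'expected point')."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def mean_distance_features(
    past_vectors: List[Sequence[float]],
    current_vector: Sequence[float],
    new_unique_count: int,
) -> Tuple[float, float]:
    """Return (mean distance, mean distance * new unique file count)."""
    expected = mean_point(past_vectors)
    d = cosine_distance(current_vector, expected)
    return d, d * new_unique_count


past = [[1.0, 0.0], [1.0, 0.0]]
print(mean_distance_features(past, [0.0, 1.0], new_unique_count=5))  # (1.0, 5.0)
```

Cosine distance ignores vector magnitude, which is why the product with the new unique file count is needed to reflect how many files were taken.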

14 Exposure to Data

15 Values for Self Scores

16 Self Identification Performance

17 Generating Malicious Behavior
- No ground truth data for malicious behavior
- Generate simulated attacks by injecting directories
  - Represents targeted attacks on specific data
- Three size ranges for the injection
  - [...]: 10 unique attacks
  - [...]: 12 unique attacks
  - 5000+: 2 unique attacks
- Inject in two ways
  - Impetuous attacker: inject X accesses in the current period
  - Patient attacker: seed the current user's history with files from the injection, then inject
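The two injection styles can be sketched over sets of file paths. This is an illustrative sketch only: the function names, the random sampling of files from the target directory, and the `n_seed` parameter controlling how many files are seeded into the history are assumptions, since the slide does not specify how files were chosen.

```python
import random


def inject_impetuous(current, target_files, x, rng):
    """Impetuous attacker: add x files from the target to the current period."""
    pool = sorted(target_files)
    chosen = rng.sample(pool, min(x, len(pool)))
    return set(current) | set(chosen)


def inject_patient(history, current, target_files, x, n_seed, rng):
    """Patient attacker: seed the history with target files, then inject."""
    pool = sorted(target_files)
    seed = rng.sample(pool, min(n_seed, len(pool)))
    seeded_history = set(history) | set(seed)
    attacked_current = inject_impetuous(current, target_files, x, rng)
    return seeded_history, attacked_current


rng = random.Random(0)  # fixed seed so a simulated attack is repeatable
target = {"secret/f%d.doc" % i for i in range(5)}
attacked = inject_impetuous({"src/a.c"}, target, x=3, rng=rng)
seeded, attacked_patient = inject_patient(
    {"src/a.c"}, {"src/b.c"}, target, x=3, n_seed=2, rng=rng
)
```

Scoring `attacked` against the untouched history models the impetuous case, while scoring `attacked_patient` against `seeded` models the patient case, where some target files already look familiar.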

18 Impetuous Attacker

19 Impetuous Attacker

20 Patient Attacker Injecting

21 Patient Attacker Injecting

22 Discussion: How to Present to Users
- Similarity scores may help in communicating events
- Better detection of truly anomalous activity
  - Go beyond simple file counts
  - Create a ranking of the most anomalous users
- Better understanding of what is causing the score
  - Ranking the files that a user is accessing
  - Allows an incident response team to more quickly understand why a user received a high score

23 Summary
- Explored using file similarity features to identify malicious insiders
- Evaluated with real access logs and synthetic attacks

