Published by Phyllis Lee. Modified over 8 years ago.
1
CMU SCS: Anomaly Detection in Large Graphs. Christos Faloutsos, CMU
2
Thank you! Prof. Bill Scherlis; Sharon Blazevich. hotSoS, 2016. (c) 2016, C. Faloutsos
3
Roadmap: Introduction (Motivation; Why study (big) graphs?). Part #1: Patterns in graphs. Part #2: Time-evolving graphs; tensors. Conclusions.
4
Graphs - why should we care? Internet Map [lumeta.com]. Computer network security: email traffic; IP traffic (src, dst, dst-port, t); malware propagation (machine-id, infected-file-id).
5
Graphs - why should we care? >$10B; ~1B users
6
Graphs - why should we care? ‘YahooWeb graph’: ~1B nodes (web sites), ~6B edges (http links). U Kang, Jay-Yoon Lee, Danai Koutra, and Christos Faloutsos. Net-Ray: Visualizing and Mining Billion-Scale Graphs. PAKDD 2014, Tainan, Taiwan.
7
Graphs - why should we care? Web-log (‘blog’) news propagation; recommendation systems; documents & terms (and plagiarism; fake appstore reviews, ...); many-to-many db relationship -> graph.
9
Motivating problems. P1: patterns? Fraud detection? P2: patterns in time-evolving graphs / tensors. [Figure: tensor with axes source, destination, time.] Patterns; anomalies.
11
Part 1: Patterns & fraud detection
13
Laws and patterns. Q1: Are real graphs random? A1: NO!! –Diameter (‘6 degrees’; ‘Kevin Bacon’) –in- and out-degree distributions –other (surprising) patterns. So, let’s look at the data.
15
Solution #S.1: Power law in the degree distribution [Faloutsos x 3 SIGCOMM99; + Siganos]. [Plot: log(degree) vs. log(rank) for internet domains, with att.com and ibm.com labeled; slope -0.82.]
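A minimal sketch of how a slope like the -0.82 above could be estimated: sort the degrees in decreasing order, then fit a least-squares line to log(degree) against log(rank). The function name `rank_slope` and the synthetic data are assumptions made for illustration, not the procedure of the cited paper.

```python
import math

def rank_slope(degrees):
    """Least-squares slope of log(degree) against log(rank); hypothetical helper."""
    ranked = sorted((d for d in degrees if d > 0), reverse=True)
    xs = [math.log(r) for r in range(1, len(ranked) + 1)]
    ys = [math.log(d) for d in ranked]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx

# Synthetic (made-up) degrees that follow degree = 1000 * rank^(-0.82) exactly.
synthetic = [1000.0 * r ** -0.82 for r in range(1, 501)]
slope = rank_slope(synthetic)
```

On real data the points only roughly follow a line, so the fitted slope is an estimate rather than an exact exponent.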
16
S2: connected component sizes. Connected components, 4 observations. [Plot: count vs. size; 1.4B nodes, 6B edges.]
17
S2: connected component sizes. Observation 1: the largest component is 10K x larger than the next.
18
S2: connected component sizes. Observation 2: ~0.7B singleton nodes.
19
S2: connected component sizes. Observation 3: SLOPE!
20
S2: connected component sizes. Observation 4: Spikes! 300-size cmpt x 500. Why? 1100-size cmpt x 65. Why?
21
S2: connected component sizes. Suspicious financial-advice sites (not existing now).
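The size-vs-count plot in these slides can be produced from an edge list with a union-find pass. The sketch below is a single-machine illustration on a toy graph; the function name `component_size_counts` is an assumption, and the billion-node graph in the talk would need a distributed implementation instead.

```python
from collections import Counter

def component_size_counts(num_nodes, edges):
    """Return {component size: number of components of that size}."""
    parent = list(range(num_nodes))

    def find(x):
        # Follow parent pointers to the root, halving the path on the way.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for u, v in edges:
        ru, rv = find(u), find(v)
        if ru != rv:
            parent[ru] = rv

    sizes = Counter(find(x) for x in range(num_nodes))  # root -> component size
    return Counter(sizes.values())                      # size -> count

# Toy graph (made up): components {0,1,2}, {3,4}, {5,6} and the singleton {7}.
edges = [(0, 1), (1, 2), (3, 4), (5, 6)]
counts = component_size_counts(8, edges)
```

A spike is then a size whose count is far above what its neighboring sizes suggest, like the 300-size and 1100-size components on the slide.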
22
Roadmap: Introduction (Motivation). Part #1: Patterns in graphs (Patterns: Degree, Triangles; Anomaly/fraud detection). Part #2: Time-evolving graphs; tensors. Conclusions.
24
Solution #S.3: Triangle ‘Laws’. Real social networks have a lot of triangles –Friends of friends are friends. Any patterns? –2x the friends, 2x the triangles?
25
Triangle Law: #S.3 [Tsourakakis ICDM 2008]. [Plots: SN, Reuters, Epinions; X-axis: degree; Y-axis: mean # triangles.] n friends -> ~n^1.6 triangles.
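To make the two axes of the triangle law concrete, here is a small sketch that counts the triangles each node takes part in and averages them per degree. It is a brute-force illustration for toy graphs; the function name `triangle_stats` is an assumption, and it is not the counting method of the cited paper.

```python
from collections import defaultdict
from itertools import combinations

def triangle_stats(edges):
    """Return (triangles per node, mean triangles per degree) for an undirected graph."""
    adj = defaultdict(set)
    for u, v in edges:
        if u != v:  # ignore self-loops
            adj[u].add(v)
            adj[v].add(u)
    adj = dict(adj)
    # A node's triangle count is the number of linked pairs among its neighbors.
    triangles = {
        node: sum(1 for a, b in combinations(nbrs, 2) if b in adj[a])
        for node, nbrs in adj.items()
    }
    by_degree = defaultdict(list)
    for node, nbrs in adj.items():
        by_degree[len(nbrs)].append(triangles[node])
    mean_by_degree = {d: sum(t) / len(t) for d, t in by_degree.items()}
    return triangles, mean_by_degree

# Toy graph (made up): one triangle 1-2-3 plus a pendant node 4.
edges = [(1, 2), (2, 3), (1, 3), (3, 4)]
triangles, mean_by_degree = triangle_stats(edges)
```

Plotting `mean_by_degree` on log-log axes and fitting a line would give the exponent; the slide reports roughly 1.6.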
26
Triangle counting for large graphs? Anomalous nodes in Twitter (~3 billion edges) [U Kang, Brendan Meeder, +, PAKDD’11]
30
MORE graph patterns. RTG: A Recursive Realistic Graph Generator using Random Typing. Leman Akoglu and Christos Faloutsos. PKDD’09.
31
MORE graph patterns. Mary McGlohon, Leman Akoglu, Christos Faloutsos. Statistical Properties of Social Networks. In "Social Network Data Analytics" (Ed.: Charu Aggarwal). Deepayan Chakrabarti and Christos Faloutsos. Graph Mining: Laws, Tools, and Case Studies. Morgan Claypool, Oct. 2012.
32
Roadmap: Introduction (Motivation). Part #1: Patterns in graphs (Patterns; Anomaly / fraud detection: CopyCatch, Spectral methods (‘fBox’), Belief Propagation). Part #2: Time-evolving graphs; tensors. Conclusions.
33
Fraud. Given: who ‘likes’ what page, and when. Find: suspicious users and suspicious products. CopyCatch: Stopping Group Attacks by Spotting Lockstep Behavior in Social Networks. Alex Beutel, Wanhong Xu, Venkatesan Guruswami, Christopher Palow, Christos Faloutsos. WWW, 2013.
37
Our intuition: lockstep behavior (same Likes, same time). [Figure: graph patterns and lockstep behavior, with the suspicious lockstep behavior highlighted.]
38
MapReduce overview. Use Hadoop to search for many clusters in parallel: 1. Start with a random seed. 2. Update the set of Pages and the center Like times for each cluster. 3. Repeat until convergence.
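The lockstep intuition (same Likes, same time) can be illustrated with a brute-force pairwise check. This is explicitly not the CopyCatch algorithm or its MapReduce implementation: it is a toy, under the assumption of fixed time buckets, that flags pairs of users who liked several of the same pages in the same bucket. The names `lockstep_pairs`, `window` and `min_shared` are hypothetical.

```python
from collections import defaultdict
from itertools import combinations

def lockstep_pairs(likes, window, min_shared):
    """likes: iterable of (user, page, timestamp). Toy check, not CopyCatch."""
    # Group users by (page, time bucket); fixed buckets are an assumption and
    # miss Likes that fall just across a bucket boundary.
    buckets = defaultdict(set)
    for user, page, t in likes:
        buckets[(page, t // window)].add(user)

    # For every pair of users, collect the pages they liked in the same bucket.
    shared = defaultdict(set)
    for (page, _), users in buckets.items():
        for a, b in combinations(sorted(users), 2):
            shared[(a, b)].add(page)

    return {pair: pages for pair, pages in shared.items() if len(pages) >= min_shared}

# Made-up data: u1 and u2 like p1 and p2 within seconds of each other; u3 does not.
likes = [
    ("u1", "p1", 100), ("u2", "p1", 105),
    ("u1", "p2", 300), ("u2", "p2", 310),
    ("u3", "p1", 5000), ("u3", "p2", 9000),
]
suspicious = lockstep_pairs(likes, window=60, min_shared=2)
```

The pairwise loop is quadratic in the number of users per bucket, which is one reason a real system would use a clustering search like the one outlined on the slide.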
39
Deployment at Facebook. CopyCatch runs regularly (along with many other security mechanisms, and a large Site Integrity team). [Plot: 3 months of CopyCatch @ Facebook; #users caught vs. time.]
40
Deployment at Facebook. Manually labeled 22 randomly selected clusters from February 2013. Most clusters (77%) come from real but compromised users. [Chart label: Fake acct.]
42
Problem: Social Network Link Fraud. Target: find “stealthy” attackers missed by other algorithms. [Figure: clique; bipartite core; 41.7M nodes, 1.5B edges.]
43
Problem: Social Network Link Fraud. Target: find “stealthy” attackers missed by other algorithms. Takeaway: use reconstruction error between true/latent representation! Neil Shah, Alex Beutel, Brian Gallagher and Christos Faloutsos. Spotting Suspicious Link Behavior with fBox: An Adversarial Perspective. ICDM 2014, Shenzhen, China.
45
E-bay fraud detection, w/ Polo Chau & Shashank Pandit, CMU [www’07]
48
E-bay fraud detection - NetProbe
49
Popular press. And less desirable attention: e-mail from ‘Belgium police’ (‘copy of your code?’)
50
Roadmap: Introduction (Motivation). Part #1: Patterns in graphs (Patterns; Anomaly / fraud detection: CopyCatch, Spectral methods (‘fBox’), Belief Propagation; antivirus app). Part #2: Time-evolving graphs; tensors. Conclusions.
51
Polonium: Tera-Scale Graph Mining and Inference for Malware Detection. SDM 2011, Mesa, Arizona. PATENT PENDING. Polo Chau (Machine Learning Dept); Carey Nachenberg (Vice President & Fellow); Jeffrey Wilhelm (Principal Software Engineer); Adam Wright (Software Engineer); Prof. Christos Faloutsos (Computer Science Dept).
52
Polonium: The Data. 60+ terabytes of data anonymously contributed by participants of the worldwide Norton Community Watch program: 50+ million machines; 900+ million executable files. Constructed a machine-file bipartite graph (0.2 TB+): 1 billion nodes (machines and files); 37 billion edges.
53
Polonium: Key Ideas. Use Belief Propagation to propagate domain knowledge in the machine-file graph to detect malware. Use “guilt-by-association” (i.e., homophily) –E.g., files that appear on machines with many bad files are more likely to be bad. Scalability: handles a 37 billion-edge graph.
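The guilt-by-association idea can be shown with a much simpler scheme than the one Polonium uses. The sketch below is not Belief Propagation: it is a plain score-averaging loop on a made-up machine-file graph, where known files keep their labels and unknown files take the mean score of the machines they appear on. The function name `propagate`, the 0.5 prior and the example data are all assumptions.

```python
from collections import defaultdict

def propagate(edges, known, rounds=10):
    """edges: (machine, file) pairs; known: file -> badness in [0, 1].

    Simplified score averaging, NOT Belief Propagation.
    """
    files_on = defaultdict(list)       # machine -> files seen on it
    machines_with = defaultdict(list)  # file -> machines it appears on
    for machine, file in edges:
        files_on[machine].append(file)
        machines_with[file].append(machine)

    # Unknown files start at 0.5 (an assumed neutral prior).
    score = {f: known.get(f, 0.5) for f in machines_with}
    for _ in range(rounds):
        machine_score = {
            m: sum(score[f] for f in fs) / len(fs) for m, fs in files_on.items()
        }
        score = {
            f: known[f] if f in known
            else sum(machine_score[m] for m in ms) / len(ms)
            for f, ms in machines_with.items()
        }
    return score

# Made-up example: file "x" sits next to known-bad files, "y" next to known-good ones.
edges = [("m1", "bad1"), ("m1", "bad2"), ("m1", "x"),
         ("m2", "good1"), ("m2", "good2"), ("m2", "y")]
known = {"bad1": 1.0, "bad2": 1.0, "good1": 0.0, "good2": 0.0}
score = propagate(edges, known)
```

In this toy the unlabeled file on the infected machine drifts toward 1 and the one on the clean machine toward 0, which is the homophily effect the slide describes.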
54
Polonium: One-Interaction Results. 84.9% true positive rate at 1% false positive rate. True positive rate: % of malware correctly identified. False positive rate: % of non-malware wrongly labeled as malware. [Plot: true positive rate vs. false positive rate, with ‘Ideal’ marked.]
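The two rates defined on this slide can be computed directly from labeled predictions. A small sketch with made-up labels follows; the function name `rates` is an assumption.

```python
def rates(labels, predictions):
    """labels/predictions: parallel lists of booleans, True = malware."""
    pairs = list(zip(labels, predictions))
    tp = sum(1 for y, p in pairs if y and p)          # malware flagged as malware
    fn = sum(1 for y, p in pairs if y and not p)      # malware missed
    fp = sum(1 for y, p in pairs if not y and p)      # clean file flagged as malware
    tn = sum(1 for y, p in pairs if not y and not p)  # clean file left alone
    true_positive_rate = tp / (tp + fn)
    false_positive_rate = fp / (fp + tn)
    return true_positive_rate, false_positive_rate

# Made-up example: 4 malware files and 4 clean files.
labels = [True, True, True, True, False, False, False, False]
predictions = [True, True, True, False, True, False, False, False]
tpr, fpr = rates(labels, predictions)
```

Sweeping the decision threshold and recording one (false positive rate, true positive rate) pair per threshold gives a curve like the one on the slide.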
56
Part 2: Time-evolving graphs; tensors
57
Graphs over time -> tensors! Problem #2: –Given who calls whom, and when –Find patterns / anomalies. [Figure labels: smith, johnson.]
59
Graphs over time -> tensors! Problem #2: –Given who calls whom, and when –Find patterns / anomalies. [Figure labels: Mon, Tue.]
60
Graphs over time -> tensors! Problem #2: –Given who calls whom, and when –Find patterns / anomalies. [Tensor axes: caller, callee, time.]
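A short sketch of how who-calls-whom-and-when records can become a 3-mode tensor: each (caller, callee, time bin) cell holds the number of calls. The sparse dictionary layout, the function name `build_tensor` and the one-day bin size are assumptions for illustration.

```python
from collections import Counter

def build_tensor(calls, seconds_per_bin=86400):
    """calls: iterable of (caller, callee, unix_timestamp).

    Returns a sparse tensor as {(caller, callee, time_bin): call count}.
    """
    tensor = Counter()
    for caller, callee, ts in calls:
        tensor[(caller, callee, ts // seconds_per_bin)] += 1
    return tensor

# Made-up call records; timestamps are seconds since an arbitrary start.
calls = [
    ("smith", "johnson", 10),
    ("smith", "johnson", 20),
    ("smith", "johnson", 86400 + 5),
    ("johnson", "smith", 30),
]
tensor = build_tensor(calls)
```

Only the nonzero cells are stored, which matters because call tensors of this kind are extremely sparse.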
61
Graphs over time -> tensors! Problem #2’: –Given author-keyword-date –Find patterns / anomalies. [Tensor axes: author, keyword, date.] MANY more settings, with >2 ‘modes’.
62
Graphs over time -> tensors! Problem #2’’: –Given subject – verb – object facts –Find patterns / anomalies. [Tensor axes: subject, verb, object.] MANY more settings, with >2 ‘modes’.
63
Graphs over time -> tensors! Problem #2’’’: –Given –Find patterns / anomalies. [Tensor axes: mode1, mode2, mode3.] MANY more settings, with >2 ‘modes’ (and 4, 5, etc. modes).
64
Roadmap: Introduction (Motivation). Part #1: Patterns in graphs. Part #2: Time-evolving graphs; tensors (Intro to tensors; Results; Speed). Conclusions.
65
Answer to both: tensor factorization. Recall: (SVD) matrix factorization finds blocks. [Figure: N users x M products matrix ~ ‘meat-eaters’ x ‘steaks’ + ‘vegetarians’ x ‘plants’ + ‘kids’ x ‘cookies’.]
66
Answer to both: tensor factorization. PARAFAC decomposition. [Figure: subject x verb x object tensor = politicians + artists + athletes.]
67
Answer: tensor factorization. PARAFAC decomposition. Results for who-calls-whom-when –4M x 15 days. [Figure: caller x callee x time tensor = ?? + ...]
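To show what one PARAFAC component looks like, here is a sketch that extracts a single rank-1 component from a sparse 3-mode tensor with alternating power-iteration updates. It is a minimal illustration, assuming a nonempty tensor with positive entries; it is not the decomposition code used in the talk, and the function names and the toy data are made up.

```python
import math
from collections import defaultdict

def normalize(vec):
    """Scale a sparse vector to unit length; return (unit vector, original norm)."""
    norm = math.sqrt(sum(v * v for v in vec.values()))
    return {key: v / norm for key, v in vec.items()}, norm

def rank1_cp(tensor, iters=50):
    """tensor: {(i, j, k): positive value}. Returns (weight, a, b, c)."""
    a = {i: 1.0 for i, _, _ in tensor}
    b = {j: 1.0 for _, j, _ in tensor}
    c = {k: 1.0 for _, _, k in tensor}
    weight = 0.0
    for _ in range(iters):
        # Update each mode's vector while holding the other two fixed.
        new_a = defaultdict(float)
        for (i, j, k), x in tensor.items():
            new_a[i] += x * b[j] * c[k]
        a, _ = normalize(new_a)
        new_b = defaultdict(float)
        for (i, j, k), x in tensor.items():
            new_b[j] += x * a[i] * c[k]
        b, _ = normalize(new_b)
        new_c = defaultdict(float)
        for (i, j, k), x in tensor.items():
            new_c[k] += x * a[i] * b[j]
        c, weight = normalize(new_c)
    return weight, a, b, c

# Made-up data: one caller phones 5 receivers 200 times on each of 4 days,
# on top of a few ordinary single calls.
tensor = {("spammer", f"r{j}", day): 200.0 for j in range(1, 6) for day in range(1, 5)}
tensor.update({("u1", "u2", 0): 1.0, ("u2", "u3", 2): 1.0, ("u3", "u1", 5): 1.0})
weight, a, b, c = rank1_cp(tensor)
top_receivers = sorted(b, key=b.get, reverse=True)[:5]
```

The large entries of `a`, `b` and `c` pick out the caller, the receivers and the days of the dense block, which is how a single component can be read as a group of entities acting together.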
68
Anomaly detection in time-evolving graphs. Anomalous communities in phone call data: –European country, 4M clients, data over 2 weeks. 1 caller, 5 receivers, 4 days of activity: ~200 calls to EACH receiver on EACH day!
70
Anomaly detection in time-evolving graphs. Miguel Araujo, Spiros Papadimitriou, Stephan Günnemann, Christos Faloutsos, Prithwish Basu, Ananthram Swami, Evangelos Papalexakis, Danai Koutra. Com2: Fast Automatic Discovery of Temporal (Comet) Communities. PAKDD 2014, Tainan, Taiwan.
71
Roadmap: Introduction (Motivation). Part #1: Patterns in graphs. Part #2: Time-evolving graphs; tensors (Tensors for intrusion detection; Inter-arrival times). Conclusions.
72
Honeypots. MalSpot: Multi^2 Malicious Network Behavior Patterns Analysis. Ching-Hao Mao, Chung-Jung Wu, Kuo-Chen Lee, Evangelos E. Papalexakis, Christos Faloutsos, and Tien-Cheu Kao. PAKDD’14, Tainan, TW.
73
Data. The typical: –IP source –IP destination (honeypot) –Operation id (e.g., ssh, ftp) –timestamp. [Figure: example IPs 128.1.1.1, 128.2.2.2, 128.3.3.3, 128.4.4.4 with ftp and ssh operations.]
74
Preliminary analysis/visualization. [Plots: op-id vs. time for Honeypot ‘A’, Honeypot ‘B’, Honeypot ‘C’.]
75
Tensor analysis. [Figure: tensor = 1st latent variable + 2nd latent variable.] Findings: attacker, brute-force on POP3 (port 110); attacker on port 25.
76
Roadmap: Introduction (Motivation; Why study (big) graphs?). Part #1: Patterns in graphs. Part #2: Time-evolving graphs; tensors (Inter-arrival time patterns). Acknowledgements and Conclusions.
77
RSC: Mining and Modeling Temporal Activity in Social Media. Alceu F. Costa*, Yuto Yamaguchi, Agma J. M. Traina, Caetano Traina Jr., Christos Faloutsos. Universidade de São Paulo. KDD 2015, Sydney, Australia. * alceufc@icmc.usp.br
78
Pattern Mining: Datasets. Reddit dataset: time-stamps from comments; 21,198 users; 20 million time-stamps. Twitter dataset: time-stamps from tweets; 6,790 users; 16 million time-stamps. For each user we have: the sequence of posting time-stamps T = (t_1, t_2, t_3, …); the inter-arrival times (IAT) of postings (∆_1, ∆_2, ∆_3, …). [Figure: timeline with t_1, t_2, t_3, t_4 and the gaps ∆_1, ∆_2, ∆_3 between them.]
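The inter-arrival times are simply the gaps between consecutive postings of one user. A minimal sketch, with a made-up list of timestamps in seconds and an assumed function name:

```python
def inter_arrival_times(timestamps):
    """Gaps between consecutive postings, after sorting the time-stamps."""
    ordered = sorted(timestamps)
    return [later - earlier for earlier, later in zip(ordered, ordered[1:])]

# Made-up posting times in seconds: t_1..t_4 give three gaps.
iats = inter_arrival_times([10, 70, 75, 3675])
```

A user with n postings yields n - 1 gaps, so users with a single posting contribute nothing to the IAT distribution.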
79
Pattern Mining. Pattern 1: Distribution of IAT is heavy-tailed. Users can be inactive for long periods of time before making new postings. [Plots: IAT complementary cumulative distribution function (CCDF), log-log axes, for Reddit users and Twitter users.]
80
Pattern Mining. Pattern 2: Bimodal IAT distribution. Users have highly active sessions and resting periods. [Plot: log-binned histogram of postings IAT, Twitter users; 1st mode (1 min), 2nd mode (3 h).]
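A log-binned histogram groups gaps by order of magnitude, which is what makes two modes as far apart as 1 minute and 3 hours visible in one plot. The sketch below uses base-2 bins; the bin base, the function name and the sample gaps are assumptions.

```python
import math
from collections import Counter

def log_binned_histogram(iats):
    """Count gaps per power-of-two bin: bin b covers [2**b, 2**(b + 1))."""
    bins = Counter()
    for delta in iats:
        if delta > 0:  # zero gaps have no logarithm and are skipped
            bins[math.floor(math.log2(delta))] += 1
    return dict(sorted(bins.items()))

# Made-up gaps in seconds.
hist = log_binned_histogram([1, 2, 3, 60, 64, 100, 10000])
```

With real data, two separated peaks in such a histogram would correspond to the active and resting regimes described on the slide.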
82
Human? Robots? [Plots: log and linear scales; marked IATs: 2’, 3h, 1 day.]
83
Experiments: Can RSC-Spotter detect bots? Precision vs. sensitivity curves (good performance: curve close to the top). Twitter: precision > 94%, sensitivity > 70%, with strongly imbalanced datasets (# humans >> # bots).
84
Experiments: Can RSC-Spotter detect bots? Precision vs. sensitivity curves (good performance: curve close to the top). Reddit: precision > 96%, sensitivity > 47%, with strongly imbalanced datasets (# humans >> # bots).
85
Work under progress (Hemank Lamba). Active Directory log data from a large Asian institution: (user-id, operation-id, timestamp).
86
30 sec, 1 min and 2 min -> scripts/attacks? [Plot: PDF (log scale) vs. IAT (log scale).]
88
Thanks to: NSF IIS-0705359, IIS-0534205, CTA-INARC; Yahoo (M45), LLNL, IBM, SPRINT, Google, INTEL, HP, iLab. Disclaimer: All opinions are mine; not necessarily reflecting the opinions of the funding agencies.
89
Cast: Akoglu, Leman; Chau, Polo; Kang, U; Prakash, Aditya; Koutra, Danai; Beutel, Alex; Papalexakis, Vagelis; Shah, Neil; Lee, Jay Yoon; Araujo, Miguel.
90
CONCLUSION #1: Big data. Patterns; anomalies. Large datasets reveal patterns/outliers that are invisible otherwise.
91
CONCLUSION #2: tensors are a powerful tool. Example: 1 caller, 5 receivers, 4 days of activity.
92
References. D. Chakrabarti, C. Faloutsos: Graph Mining – Laws, Tools and Case Studies, Morgan Claypool 2012. http://www.morganclaypool.com/doi/abs/10.2200/S00449ED1V01Y201209DMK006
93
TAKE HOME MESSAGE: Cross-disciplinarity
94
Cross-disciplinarity. Thank you!