Published by Thomasina Little. Modified over 8 years ago.
1
CMU SCS
Mining Large Graphs: Fraud Detection, and Algorithms
Christos Faloutsos, CMU
2
Graphs - why should we care?
>$10B; ~1B users
CMU Cylab. (c) 2015, C. Faloutsos
3
Graphs - why should we care?
Internet Map [lumeta.com]
Food Web [Martinez ’91]
4
Graphs - why should we care?
web-log (‘blog’) news propagation
computer network security: email/IP traffic and anomaly detection
Recommendation systems
....
Many-to-many db relationship -> graph
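The last point (a many-to-many relationship is a graph) can be made concrete with a small sketch. It assumes the relationship arrives as (left, right) rows; the function name `relation_to_bipartite` and the sample "user likes page" rows are hypothetical, chosen only for illustration.

```python
from collections import defaultdict

def relation_to_bipartite(rows):
    """Turn (left, right) rows of a many-to-many table into two adjacency maps."""
    left_to_right = defaultdict(set)
    right_to_left = defaultdict(set)
    for left, right in rows:
        left_to_right[left].add(right)
        right_to_left[right].add(left)
    return dict(left_to_right), dict(right_to_left)

# Hypothetical "user likes page" rows; each row becomes one edge.
likes = [("alice", "p1"), ("alice", "p2"), ("bob", "p2")]
users, pages = relation_to_bipartite(likes)
```

Each row is one edge of a bipartite graph, so both directions (user to pages, page to users) are available for the graph methods that follow.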
5
Motivating problems
P1: patterns? Fraud detection?
P2: patterns in time-evolving graphs / tensors
(figure: tensor with axes source, destination, time)
6
Roadmap
Introduction – Motivation
–Why study (big) graphs?
Part#1: Patterns & fraud detection
Part#2: time-evolving graphs; tensors
Conclusions
7
Part 1: Anomalies & fraud detection
8
Triangle counting for large graphs?
Anomalous nodes in Twitter (~3 billion edges) [U Kang, Brendan Meeder, +, PAKDD’11]
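As a rough illustration of what "triangle counting" computes, here is a minimal in-memory sketch that counts triangles per node by intersecting neighbor sets. It is not the large-scale method of the cited PAKDD’11 work, only a small stand-in; the function name and the toy edge list are assumptions, and node ids are assumed to be mutually comparable.

```python
from collections import defaultdict

def triangles_per_node(edges):
    """Count, for every node of an undirected graph, the triangles it belongs to."""
    adj = defaultdict(set)
    for u, v in edges:
        if u != v:  # ignore self-loops
            adj[u].add(v)
            adj[v].add(u)
    counts = {node: 0 for node in adj}
    for u in adj:
        for v in adj[u]:
            if u < v:
                # Common neighbors w with v < w: each triangle is visited once.
                for w in adj[u] & adj[v]:
                    if v < w:
                        counts[u] += 1
                        counts[v] += 1
                        counts[w] += 1
    degrees = {node: len(neighbors) for node, neighbors in adj.items()}
    return counts, degrees

# Toy graph: one triangle (1, 2, 3) plus a pendant edge (3, 4).
counts, degrees = triangles_per_node([(1, 2), (2, 3), (1, 3), (3, 4)])
```

Comparing a node's triangle count against its degree is one way to look for nodes that deviate from the bulk of the graph.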
13
Roadmap
Introduction – Motivation
Part#1: Anomaly / fraud detection
–CopyCatch
–Malware & Belief Propagation
Part#2: time-evolving graphs; tensors
Conclusions
14
Fraud
Given
–Who ‘likes’ what page, and when
Find
–Suspicious users and suspicious products
CopyCatch: Stopping Group Attacks by Spotting Lockstep Behavior in Social Networks. Alex Beutel, Wanhong Xu, Venkatesan Guruswami, Christopher Palow, Christos Faloutsos. WWW, 2013.
18
Our intuition
▪ Lockstep behavior: Same Likes, same time
(figure: Graph Patterns and Lockstep Behavior; Likes; Suspicious Lockstep Behavior)
19
MapReduce Overview
▪ Use Hadoop to search for many clusters in parallel:
1. Start with a random seed
2. Update set of Pages and center Like times for each cluster
3. Repeat until convergence
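The three steps above can be sketched for a single cluster on one machine. This is a heavily simplified illustration, not the CopyCatch algorithm itself: the page set is held fixed (only the user set and the center Like times are updated), and the function name `lockstep_cluster`, the window half-width `dt` and the coverage fraction `rho` are assumptions made for the example.

```python
def lockstep_cluster(likes, centers, dt, rho, max_iter=20):
    """Alternate between picking lockstep users and re-centering Like times.

    likes:   dict mapping (user, page) -> time of the Like
    centers: dict mapping page -> seed center time (step 1)
    dt:      a Like counts only if it is within dt of the page's center
    rho:     fraction of the cluster's pages a user must hit to be included
    """
    users = set()
    for _ in range(max_iter):
        # Per user, the cluster pages liked close to the current center time.
        hits = {}
        for (user, page), t in likes.items():
            if page in centers and abs(t - centers[page]) <= dt:
                hits.setdefault(user, {})[page] = t
        new_users = {u for u, h in hits.items() if len(h) >= rho * len(centers)}
        # Step 2: move each center to the mean Like time of the selected users.
        new_centers = dict(centers)
        for page in centers:
            times = [hits[u][page] for u in new_users if page in hits[u]]
            if times:
                new_centers[page] = sum(times) / len(times)
        # Step 3: stop once neither the users nor the centers change.
        if new_users == users and new_centers == centers:
            break
        users, centers = new_users, new_centers
    return users, centers

# Toy data: u1..u3 like pages A and B at nearly the same times; u4 does not.
likes = {
    ("u1", "A"): 100, ("u2", "A"): 101, ("u3", "A"): 102,
    ("u1", "B"): 200, ("u2", "B"): 201, ("u3", "B"): 199,
    ("u4", "A"): 500, ("u4", "B"): 900,
}
group, final_centers = lockstep_cluster(likes, {"A": 101, "B": 199}, dt=5, rho=1.0)
```

In a parallel setting, many such clusters would be grown from different seeds at once, which is where Hadoop comes in on the slide.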
20
Deployment at Facebook
▪ CopyCatch runs regularly (along with many other security mechanisms, and a large Site Integrity team)
(figure: 3 months of CopyCatch @ Facebook; axes: #users caught, time)
22
Deployment at Facebook
Manually labeled 22 randomly selected clusters from February 2013
Most clusters (77%) come from real but compromised users
(figure label: Fake acct)
23
Roadmap
Introduction – Motivation
Part#1: Anomaly / fraud detection
–CopyCatch
–Malware & Belief Propagation
Part#2: time-evolving graphs; tensors
Conclusions
24
Polonium: Tera-Scale Graph Mining and Inference for Malware Detection
SDM 2011, Mesa, Arizona
PATENT PENDING
Polo Chau, Machine Learning Dept
Carey Nachenberg, Vice President & Fellow
Jeffrey Wilhelm, Principal Software Engineer
Adam Wright, Software Engineer
Prof. Christos Faloutsos, Computer Science Dept
25
Polonium: The Data
60+ terabytes of data anonymously contributed by participants of worldwide Norton Community Watch program
–50+ million machines
–900+ million executable files
Constructed a machine-file bipartite graph (0.2 TB+)
–1 billion nodes (machines and files)
–37 billion edges
26
Polonium: Key Ideas
Use Belief Propagation to propagate domain knowledge in machine-file graph to detect malware
Use “guilt-by-association” (i.e., homophily)
–E.g., files that appear on machines with many bad files are more likely to be bad
Scalability: handles 37 billion-edge graph
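The guilt-by-association idea can be illustrated with a much simpler stand-in for Belief Propagation: iterative score averaging over the machine-file bipartite graph. This is not Polonium's algorithm; the function name `propagate_badness`, the neutral prior of 0.5 and the toy data are assumptions for the sketch.

```python
def propagate_badness(edges, known, rounds=10):
    """Spread 'badness' scores over a machine-file bipartite graph by averaging.

    edges: iterable of (machine, file) pairs
    known: dict file -> score in [0, 1] for labeled files (1 = bad, 0 = good)
    """
    machines, files = {}, {}
    for m, f in edges:
        machines.setdefault(m, set()).add(f)
        files.setdefault(f, set()).add(m)
    # Unlabeled files start at a neutral 0.5 (an assumption of this sketch).
    file_score = {f: known.get(f, 0.5) for f in files}
    machine_score = {}
    for _ in range(rounds):
        # A machine looks as bad as the average of the files it holds.
        machine_score = {m: sum(file_score[f] for f in fs) / len(fs)
                         for m, fs in machines.items()}
        # An unlabeled file looks as bad as the machines it appears on.
        for f, ms in files.items():
            if f not in known:
                file_score[f] = sum(machine_score[m] for m in ms) / len(ms)
    return file_score, machine_score

# Toy data: file "x" sits next to known-bad files, file "y" next to known-good ones.
edges = [("m1", "bad1"), ("m1", "bad2"), ("m1", "x"),
         ("m2", "good1"), ("m2", "good2"), ("m2", "y")]
known = {"bad1": 1.0, "bad2": 1.0, "good1": 0.0, "good2": 0.0}
file_score, machine_score = propagate_badness(edges, known)
```

The unlabeled file on the machine full of bad files drifts toward 1, and the one on the clean machine drifts toward 0, which is the homophily effect the slide describes.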
27
Polonium: One-Interaction Results
84.9% True Positive Rate
1% False Positive Rate
(figure: True Positive Rate, % of malware correctly identified, plotted against False Positive Rate, % of non-malware wrongly labeled as malware; ‘Ideal’ point marked)
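The two rates on this slide follow directly from the definitions in the figure. A small sketch, with a hypothetical helper `tpr_fpr` and made-up labels (not Polonium's data):

```python
def tpr_fpr(is_malware, flagged):
    """Return (true positive rate, false positive rate) for boolean labels."""
    tp = fn = fp = tn = 0
    for truth, pred in zip(is_malware, flagged):
        if truth and pred:
            tp += 1      # malware, correctly identified
        elif truth:
            fn += 1      # malware, missed
        elif pred:
            fp += 1      # non-malware, wrongly labeled as malware
        else:
            tn += 1      # non-malware, correctly left alone
    tpr = tp / (tp + fn) if tp + fn else 0.0
    fpr = fp / (fp + tn) if fp + tn else 0.0
    return tpr, fpr

# Made-up example: 3 malware files (2 caught), 4 clean files (1 wrongly flagged).
truth = [True, True, True, False, False, False, False]
flags = [True, True, False, True, False, False, False]
tpr, fpr = tpr_fpr(truth, flags)
```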
28
Part 2: Time evolving graphs; tensors
29
Graphs over time -> tensors!
Problem #2.1:
–Given who calls whom, and when
–Find patterns / anomalies
(figure: who-calls-whom graphs, e.g. smith and johnson, for Mon, Tue, ..., stacked into a tensor with axes caller, callee, time)
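A minimal sketch of how who-calls-whom-and-when records become a caller x callee x time tensor. It assumes records are (caller, callee, day) triples and stores the tensor sparsely as a count per index triple; `calls_to_tensor` and the sample records are hypothetical.

```python
from collections import Counter

def calls_to_tensor(records):
    """Build a sparse 3-mode count tensor from (caller, callee, day) records."""
    callers, callees, days = {}, {}, {}
    tensor = Counter()
    for caller, callee, day in records:
        # Assign each distinct name/day the next free index on its axis.
        i = callers.setdefault(caller, len(callers))
        j = callees.setdefault(callee, len(callees))
        k = days.setdefault(day, len(days))
        tensor[(i, j, k)] += 1  # one more call in this cell
    return tensor, (callers, callees, days)

records = [
    ("smith", "johnson", "Mon"),
    ("smith", "johnson", "Mon"),
    ("smith", "johnson", "Tue"),
    ("johnson", "smith", "Tue"),
]
tensor, (callers, callees, days) = calls_to_tensor(records)
```

Each day's graph is one slice of the tensor; only non-zero cells are stored, which matters when most caller-callee pairs never talk.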
33
Answer to both: tensor factorization
Recall: (SVD) matrix factorization: finds blocks
(figure: N users x M products matrix ~ sum of blocks: ‘meat-eaters’ x ‘steaks’ + ‘vegetarians’ x ‘plants’ + ‘kids’ x ‘cookies’)
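To see how a factorization "finds blocks", the sketch below extracts only the top singular triplet of a small users x products matrix by power iteration (a full SVD would return all components). The function name, the toy matrix and the iteration count are assumptions, and the matrix is assumed to be non-zero.

```python
def top_singular_triplet(matrix, iters=100):
    """Power iteration for the largest singular value and its two vectors."""
    rows, cols = len(matrix), len(matrix[0])
    v = [1.0] * cols
    sigma, u = 0.0, [0.0] * rows
    for _ in range(iters):
        # u <- M v, normalized
        u = [sum(matrix[i][j] * v[j] for j in range(cols)) for i in range(rows)]
        norm_u = sum(x * x for x in u) ** 0.5
        u = [x / norm_u for x in u]
        # v <- M^T u; its norm is the current singular value estimate
        v = [sum(matrix[i][j] * u[i] for i in range(rows)) for j in range(cols)]
        sigma = sum(x * x for x in v) ** 0.5
        v = [x / sigma for x in v]
    return sigma, u, v

# Toy matrix: users 0-2 buy products 0-1 heavily; users 3-4 buy products 2-3 lightly.
purchases = [
    [3, 3, 0, 0],
    [3, 3, 0, 0],
    [3, 3, 0, 0],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
]
sigma, u, v = top_singular_triplet(purchases)
```

The large entries of `u` pick out the user group and the large entries of `v` the product group of the strongest block; the remaining blocks would show up in later components.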
34
Answer to both: tensor factorization
PARAFAC decomposition
(figure: subject x object x verb tensor = sum of three components: politicians + artists + athletes)
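In symbols, the standard PARAFAC (CP) model writes a 3-mode tensor as a sum of rank-one terms. The notation here is an assumption, not taken from the slides: $\mathcal{X}$ is the tensor, $R$ the number of components (three in the figure), and $\mathbf{a}_r, \mathbf{b}_r, \mathbf{c}_r$ the subject, object and verb vectors of component $r$.

```latex
\mathcal{X} \;\approx\; \sum_{r=1}^{R} \mathbf{a}_r \circ \mathbf{b}_r \circ \mathbf{c}_r,
\qquad
x_{ijk} \;\approx\; \sum_{r=1}^{R} a_{ir}\, b_{jr}\, c_{kr}
```

Here $\circ$ is the outer product; for a matrix (two modes) the same form reduces to the sum of blocks produced by the SVD.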
35
Answer: tensor factorization
PARAFAC decomposition
Results for who-calls-whom-when
–4M x 15 days
(figure: caller x callee x time tensor = sum of components, marked ‘??’)
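A minimal sketch of the idea on a toy caller x callee x time tensor: it fits only a single rank-one component with the higher-order power method, which is a simplification of a full PARAFAC fit with several components. The function name `rank1_cp`, the sparse dict layout and the toy data are assumptions; the data is made up and is not the phone dataset from the talk.

```python
def rank1_cp(tensor, shape, iters=50):
    """Fit one rank-one component lam * (a o b o c) to a sparse 3-mode tensor.

    tensor: dict mapping (i, j, k) -> value; shape: (I, J, K). Assumes non-zero data.
    """
    def normalize(vec):
        norm = sum(x * x for x in vec) ** 0.5
        return [x / norm for x in vec], norm

    a, b, c = ([1.0] * n for n in shape)
    lam = 0.0
    for _ in range(iters):
        # Update each mode in turn while holding the other two fixed.
        new_a = [0.0] * shape[0]
        for (i, j, k), x in tensor.items():
            new_a[i] += x * b[j] * c[k]
        a, _ = normalize(new_a)
        new_b = [0.0] * shape[1]
        for (i, j, k), x in tensor.items():
            new_b[j] += x * a[i] * c[k]
        b, _ = normalize(new_b)
        new_c = [0.0] * shape[2]
        for (i, j, k), x in tensor.items():
            new_c[k] += x * a[i] * b[j]
        c, lam = normalize(new_c)
    return lam, a, b, c

# Toy tensor: caller 0 makes 200 calls to each of callees 0-4 on each of days 1-4,
# plus a few scattered background calls elsewhere.
calls = {(0, j, k): 200 for j in range(5) for k in range(1, 5)}
calls.update({(1, 5, 0): 1, (2, 6, 5): 2, (1, 6, 2): 1})
lam, a, b, c = rank1_cp(calls, shape=(3, 7, 6))
heavy_callees = [j for j, x in enumerate(b) if x > 0.1]
active_days = [k for k, x in enumerate(c) if x > 0.1]
```

The non-negligible entries of `a`, `b` and `c` name the callers, callees and days of the dominant component, which is how a dense, suspicious block can be read off the factors.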
36
Anomaly detection in time-evolving graphs
Anomalous communities in phone call data:
–European country, 4M clients, data over 2 weeks
~200 calls to EACH receiver on EACH day!
(figure: 1 caller, 5 receivers, 4 days of activity)
Miguel Araujo, Spiros Papadimitriou, Stephan Günnemann, Christos Faloutsos, Prithwish Basu, Ananthram Swami, Evangelos Papalexakis, Danai Koutra. Com2: Fast Automatic Discovery of Temporal (Comet) Communities. PAKDD 2014, Tainan, Taiwan.
39
Cast
Akoglu, Leman
Chau, Polo
Kang, U
Prakash, Aditya
Koutra, Danai
Beutel, Alex
Papalexakis, Vagelis
Shah, Neil
Lee, Jay Yoon
Araujo, Miguel
40
CONCLUSION#1 – Big data
Large datasets reveal patterns/outliers that are invisible otherwise
41
CONCLUSION#2 – tensors
powerful tool
(figure: 1 caller, 5 receivers, 4 days of activity)
42
TAKE HOME MESSAGE: Cross-disciplinarity
(figure: 1 caller, 5 receivers, 4 days of activity)
43
Cross-disciplinarity
Thank you!