CMU SCS Mining Large Graphs: Fraud Detection, and Algorithms Christos Faloutsos CMU
Graphs - why should we care? >$10B; ~1B users. (c) 2015, C. Faloutsos, CMU Cylab
Graphs - why should we care? [Figures: Internet Map [lumeta.com]; Food Web [Martinez ’91]]
Graphs - why should we care? web-log (‘blog’) news propagation; computer network security: /IP traffic and anomaly detection; Recommendation systems; .... Many-to-many db relationship -> graph
Motivating problems: P1: patterns? Fraud detection? P2: patterns in time-evolving graphs / tensors. [Figure: tensor with axes source, destination, time]
Roadmap: Introduction (Motivation; Why study (big) graphs?); Part#1: Patterns & fraud detection; Part#2: time-evolving graphs, tensors; Conclusions
Part 1: Anomalies & fraud detection
Triangle counting for large graphs? Anomalous nodes in Twitter (~3 billion edges) [U Kang, Brendan Meeder, +, PAKDD’11]
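To make the triangle-counting idea concrete, here is a minimal in-memory sketch. It is an illustration only, not the method of the cited work, and it would not scale to billions of edges. The function name `triangles_per_node` and the edge-list input format are assumptions made for this example.

```python
from itertools import combinations


def triangles_per_node(edges):
    """Count, for every node, the triangles it participates in.

    `edges` is an iterable of undirected (u, v) pairs; self-loops are ignored.
    """
    adjacency = {}
    for u, v in edges:
        if u == v:
            continue
        adjacency.setdefault(u, set()).add(v)
        adjacency.setdefault(v, set()).add(u)

    counts = {}
    for node, neighbors in adjacency.items():
        # A triangle at `node` is a pair of its neighbors that are also linked.
        counts[node] = sum(
            1 for a, b in combinations(neighbors, 2) if b in adjacency[a]
        )
    return counts


# Hypothetical toy graph: one triangle (1, 2, 3) plus a pendant node 4.
toy_edges = [(1, 2), (2, 3), (1, 3), (3, 4)]
print(triangles_per_node(toy_edges))
```

Nodes whose triangle count is far out of line with their degree are the kind of candidates the slide calls anomalous.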
Roadmap: Introduction (Motivation); Part#1: Anomaly / fraud detection (CopyCatch; Malware & Belief Propagation); Part#2: time-evolving graphs, tensors; Conclusions
Fraud: Given who ‘likes’ what page, and when; find suspicious users and suspicious products. [CopyCatch: Stopping Group Attacks by Spotting Lockstep Behavior in Social Networks, Alex Beutel, Wanhong Xu, Venkatesan Guruswami, Christopher Palow, Christos Faloutsos, WWW, 2013.]
Our intuition ▪ Lockstep behavior: Same Likes, same time. [Figure: Graph Patterns and Lockstep Behavior; Suspicious Lockstep Behavior; edges are Likes]
MapReduce Overview ▪ Use Hadoop to search for many clusters in parallel: 1. Start with a random seed 2. Update set of Pages and center Like times for each cluster 3. Repeat until convergence
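The three steps above can be sketched for a single cluster as a toy, in-memory alternating update. This is a heavily simplified illustration and not the CopyCatch algorithm itself: it keeps the page set fixed (only the center Like times and the user set are updated), it requires a user to match every page, and it does not use Hadoop. The names `lockstep_cluster`, `seed_centers`, and `window` are assumptions introduced for this example.

```python
def lockstep_cluster(likes, seed_centers, window, max_iter=20):
    """Toy alternating update for one lockstep cluster.

    likes:        dict user -> dict page -> Like time
    seed_centers: dict page -> initial center Like time (the seed)
    window:       maximum allowed distance between a Like time and its center
    """
    centers = dict(seed_centers)
    users = set()
    for _ in range(max_iter):
        # Step A: users who liked every page close to that page's center time.
        new_users = {
            user
            for user, page_times in likes.items()
            if all(
                page in page_times and abs(page_times[page] - center) <= window
                for page, center in centers.items()
            )
        }
        if not new_users:
            return set(), centers
        # Step B: move each center to the mean Like time of the current users.
        new_centers = {
            page: sum(likes[user][page] for user in new_users) / len(new_users)
            for page in centers
        }
        if new_users == users and new_centers == centers:
            break  # converged: nothing changed in this round
        users, centers = new_users, new_centers
    return users, centers


# Hypothetical data: u1..u3 like p1 and p2 at nearly the same times.
toy_likes = {
    "u1": {"p1": 100, "p2": 200},
    "u2": {"p1": 102, "p2": 203},
    "u3": {"p1": 101, "p2": 198},
    "u4": {"p1": 500, "p2": 900},
    "u5": {"p1": 100},
}
print(lockstep_cluster(toy_likes, {"p1": 100, "p2": 200}, window=5))
```

In a parallel setting, many such seeds would be refined independently, which is what makes the search a natural fit for MapReduce.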
Deployment at Facebook ▪ CopyCatch runs regularly (along with many other security mechanisms, and a large Site Integrity team). [Figure: #users caught vs. time, 3 months of Facebook]
Deployment at Facebook: Manually labeled 22 randomly selected clusters from February 2013. Most clusters (77%) come from real but compromised users. [Figure: cluster labels, including ‘Fake acct’]
Roadmap: Introduction (Motivation); Part#1: Anomaly / fraud detection (CopyCatch; Malware & Belief Propagation); Part#2: time-evolving graphs, tensors; Conclusions
Polonium: Tera-Scale Graph Mining and Inference for Malware Detection (PATENT PENDING). SDM 2011, Mesa, Arizona. Polo Chau (Machine Learning Dept); Carey Nachenberg (Vice President & Fellow); Jeffrey Wilhelm (Principal Software Engineer); Adam Wright (Software Engineer); Prof. Christos Faloutsos (Computer Science Dept)
Polonium: The Data: 60+ terabytes of data anonymously contributed by participants of worldwide Norton Community Watch program; 50+ million machines; 900+ million executable files. Constructed a machine-file bipartite graph (0.2 TB+): 1 billion nodes (machines and files); 37 billion edges
Polonium: Key Ideas: Use Belief Propagation to propagate domain knowledge in machine-file graph to detect malware. Use “guilt-by-association” (i.e., homophily), e.g., files that appear on machines with many bad files are more likely to be bad. Scalability: handles 37 billion-edge graph
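The guilt-by-association idea can be illustrated with a small loopy belief propagation sketch on a machine-file graph. This is a generic two-state (good/bad) formulation written for this example, not Polonium's actual implementation. The function name, the homophily strength `eps`, the default prior of 0.5, and the toy node names are all assumptions.

```python
def belief_propagation(edges, priors, eps=0.1, iters=10):
    """Two-state loopy BP; returns node -> estimated P(bad).

    edges:  undirected (u, v) pairs, e.g. (machine, file)
    priors: node -> prior P(bad); nodes not listed default to 0.5
    eps:    homophily strength; neighbors agree with probability 0.5 + eps
    """
    neighbors = {}
    for u, v in edges:
        neighbors.setdefault(u, []).append(v)
        neighbors.setdefault(v, []).append(u)

    # Node potentials as (good, bad) pairs.
    phi = {n: (1 - priors.get(n, 0.5), priors.get(n, 0.5)) for n in neighbors}
    # Edge potential: psi[x_u][x_v], larger when both endpoints agree.
    psi = ((0.5 + eps, 0.5 - eps), (0.5 - eps, 0.5 + eps))
    messages = {(u, v): (0.5, 0.5) for u in neighbors for v in neighbors[u]}

    for _ in range(iters):
        updated = {}
        for (u, v) in messages:
            outgoing = []
            for x_v in (0, 1):
                total = 0.0
                for x_u in (0, 1):
                    # Combine u's prior with messages from all neighbors but v.
                    product = phi[u][x_u]
                    for w in neighbors[u]:
                        if w != v:
                            product *= messages[(w, u)][x_u]
                    total += product * psi[x_u][x_v]
                outgoing.append(total)
            norm = outgoing[0] + outgoing[1]
            updated[(u, v)] = (outgoing[0] / norm, outgoing[1] / norm)
        messages = updated

    beliefs = {}
    for n in neighbors:
        good, bad = phi[n]
        for w in neighbors[n]:
            good *= messages[(w, n)][0]
            bad *= messages[(w, n)][1]
        beliefs[n] = bad / (good + bad)
    return beliefs


# Hypothetical toy graph: machine m1 holds a known-bad file, m2 a known-good one.
toy_edges = [("m1", "f_bad"), ("m1", "f_unknown"), ("m2", "f_good"), ("m2", "f_other")]
toy_priors = {"f_bad": 0.99, "f_good": 0.01}
print(belief_propagation(toy_edges, toy_priors))
```

In this toy graph the unlabeled file that shares a machine with the known-bad file ends up with a belief above 0.5, and the one sharing a machine with the known-good file ends up below 0.5.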
Polonium: One-Interaction Results: 84.9% True Positive Rate at 1% False Positive Rate. [Figure: ROC curve; y-axis: True Positive Rate (% of malware correctly identified); x-axis: False Positive Rate (% of non-malware wrongly labeled as malware); ‘Ideal’ marked]
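The two rates on this slide follow directly from their definitions. The short sketch below computes them from labels and predictions; the function name `tpr_fpr` and the toy labels are assumptions for illustration and are unrelated to the Polonium data.

```python
def tpr_fpr(is_malware, flagged):
    """Return (true positive rate, false positive rate).

    is_malware: ground-truth booleans, one per file
    flagged:    predicted booleans, one per file
    """
    pairs = list(zip(is_malware, flagged))
    true_pos = sum(1 for truth, pred in pairs if truth and pred)
    false_neg = sum(1 for truth, pred in pairs if truth and not pred)
    false_pos = sum(1 for truth, pred in pairs if not truth and pred)
    true_neg = sum(1 for truth, pred in pairs if not truth and not pred)
    # TPR: share of malware correctly identified.
    # FPR: share of non-malware wrongly labeled as malware.
    return true_pos / (true_pos + false_neg), false_pos / (false_pos + true_neg)


# Hypothetical labels: 4 malware files and 4 benign files.
truth = [True, True, True, True, False, False, False, False]
predicted = [True, True, True, False, True, False, False, False]
print(tpr_fpr(truth, predicted))
```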
Part 2: Time evolving graphs; tensors
Graphs over time -> tensors! Problem #2.1: Given who calls whom, and when; find patterns / anomalies. [Figure: who-calls-whom matrices (e.g., smith, johnson), one per day (Mon, Tue, ...), stacked into a tensor with axes caller, callee, time]
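As a small illustration of the data structure, call records can be turned into a sparse three-mode tensor by counting (caller, callee, day) triples. The function name `build_call_tensor` and the record format are assumptions made for this sketch.

```python
from collections import Counter


def build_call_tensor(records):
    """Build a sparse caller x callee x time tensor from call records.

    records: iterable of (caller, callee, day) tuples, one per call.
    Returns a Counter mapping (caller, callee, day) -> number of calls.
    """
    tensor = Counter()
    for caller, callee, day in records:
        tensor[(caller, callee, day)] += 1
    return tensor


# Hypothetical call log.
toy_calls = [
    ("smith", "johnson", "Mon"),
    ("smith", "johnson", "Mon"),
    ("smith", "johnson", "Tue"),
    ("johnson", "smith", "Tue"),
]
print(build_call_tensor(toy_calls))
```

Fixing the day gives back one who-calls-whom matrix per day, which is the stacking shown on the slide.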
Answer to both: tensor factorization. Recall: (SVD) matrix factorization: finds blocks. [Figure: N users x M products matrix ~ sum of blocks: ‘meat-eaters’ x ‘steaks’ + ‘vegetarians’ x ‘plants’ + ‘kids’ x ‘cookies’]
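To show how a factorization "finds blocks", the sketch below extracts only the leading singular triplet of a small users x products matrix with plain power iteration. It is a minimal stand-in for a full SVD, and the function name and the toy matrix are assumptions for illustration.

```python
def top_singular_triplet(matrix, iters=100):
    """Leading singular value and vectors of `matrix` via power iteration.

    matrix: list of rows (users) by columns (products).
    Returns (sigma, u, v) with unit-length u (users) and v (products).
    """
    n_rows, n_cols = len(matrix), len(matrix[0])
    v = [1.0] * n_cols
    sigma = 0.0
    for _ in range(iters):
        # u <- A v, normalized
        u = [sum(matrix[i][j] * v[j] for j in range(n_cols)) for i in range(n_rows)]
        norm_u = sum(x * x for x in u) ** 0.5
        u = [x / norm_u for x in u]
        # v <- A^T u; its length is the current singular value estimate
        v = [sum(matrix[i][j] * u[i] for i in range(n_rows)) for j in range(n_cols)]
        sigma = sum(x * x for x in v) ** 0.5
        v = [x / sigma for x in v]
    return sigma, u, v


# Hypothetical matrix: users 0-1 buy products 0-1 heavily, users 2-3 buy products 2-3 lightly.
toy_matrix = [
    [5, 5, 0, 0],
    [5, 5, 0, 0],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
]
sigma, u, v = top_singular_triplet(toy_matrix)
print(sigma, u, v)
```

The nonzero entries of `u` and `v` mark the users and products of the strongest block; further blocks would come from the next singular triplets.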
Answer to both: tensor factorization: PARAFAC decomposition. [Figure: subject x object x verb tensor = sum of three components: politicians + artists + athletes]
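For reference, the PARAFAC decomposition pictured here is usually written as a sum of rank-one terms. The symbols below are notation chosen for this note, not taken from the slide: $\mathcal{X}$ is the three-mode tensor, $R$ the number of components (three in the figure), and $\mathbf{a}_r, \mathbf{b}_r, \mathbf{c}_r$ the factor vectors of component $r$ for the three modes, with $\circ$ the outer product.

```latex
\mathcal{X} \;\approx\; \sum_{r=1}^{R} \mathbf{a}_r \circ \mathbf{b}_r \circ \mathbf{c}_r,
\qquad
x_{ijk} \;\approx\; \sum_{r=1}^{R} a_{ir}\, b_{jr}\, c_{kr}
```

With $R = 1$ and the third mode dropped, this reduces to a single rank-one block of a matrix factorization.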
Answer: tensor factorization: PARAFAC decomposition. Results for who-calls-whom-when: 4M x 15 days. [Figure: caller x callee x time tensor = sum of components (??)]
Anomaly detection in time-evolving graphs: Anomalous communities in phone call data: European country, 4M clients, data over 2 weeks. ~200 calls to EACH receiver on EACH day! [Figure: one component = 1 caller, 5 receivers, 4 days of activity] [Miguel Araujo, Spiros Papadimitriou, Stephan Günnemann, Christos Faloutsos, Prithwish Basu, Ananthram Swami, Evangelos Papalexakis, Danai Koutra. Com2: Fast Automatic Discovery of Temporal (Comet) Communities. PAKDD 2014, Tainan, Taiwan.]
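A toy version of this finding can be reproduced with a rank-one PARAFAC fit computed by alternating power updates on a sparse tensor. This is a generic sketch and not the Com2 algorithm of the cited paper. The function name, the tensor shape, and the planted block (caller 0, receivers 1-5, days 2-5, 200 calls each) are assumptions that only mimic the pattern on the slide.

```python
def rank1_parafac(entries, shape, iters=50):
    """Rank-one PARAFAC of a sparse 3-mode tensor via alternating updates.

    entries: dict (i, j, k) -> value, with 0-based integer indices
    shape:   (I, J, K) sizes of the caller, callee and time modes
    Returns (weight, a, b, c) with unit-length factor vectors a, b, c.
    """
    size_i, size_j, size_k = shape
    a, b, c = [1.0] * size_i, [1.0] * size_j, [1.0] * size_k

    def normalize(vec):
        norm = sum(x * x for x in vec) ** 0.5
        return [x / norm for x in vec], norm

    weight = 0.0
    for _ in range(iters):
        # Update each mode while holding the other two fixed.
        new_a = [0.0] * size_i
        for (i, j, k), value in entries.items():
            new_a[i] += value * b[j] * c[k]
        a, _ = normalize(new_a)

        new_b = [0.0] * size_j
        for (i, j, k), value in entries.items():
            new_b[j] += value * a[i] * c[k]
        b, _ = normalize(new_b)

        new_c = [0.0] * size_k
        for (i, j, k), value in entries.items():
            new_c[k] += value * a[i] * b[j]
        c, weight = normalize(new_c)
    return weight, a, b, c


# Hypothetical tensor: sparse background calls plus one dense planted block.
toy_entries = {(1, 0, 0): 1, (2, 6, 1): 1, (3, 7, 6): 1, (4, 0, 3): 1, (5, 6, 0): 1}
for receiver in range(1, 6):
    for day in range(2, 6):
        toy_entries[(0, receiver, day)] = 200

weight, a, b, c = rank1_parafac(toy_entries, shape=(6, 8, 7))
print("callers:", [i for i, x in enumerate(a) if abs(x) > 0.1])
print("receivers:", [j for j, x in enumerate(b) if abs(x) > 0.1])
print("days:", [k for k, x in enumerate(c) if abs(x) > 0.1])
```

The large entries of the three factor vectors name the caller, the receivers, and the active days of the dominant component, which is how a single component can be read as "1 caller, 5 receivers, 4 days of activity".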
Cast: Akoglu, Leman; Chau, Polo; Kang, U; Prakash, Aditya; Koutra, Danai; Beutel, Alex; Papalexakis, Vagelis; Shah, Neil; Lee, Jay Yoon; Araujo, Miguel
CONCLUSION#1 – Big data: Large datasets reveal patterns/outliers that are invisible otherwise
CONCLUSION#2 – tensors: powerful tool. [Figure: 1 caller, 5 receivers, 4 days of activity]
TAKE HOME MESSAGE: Cross-disciplinarity. [Figure: 1 caller, 5 receivers, 4 days of activity] Thank you!